Red Hat Articles / Blogs / Perficient: Expert Digital Insights

Deep Dive into IBM Sterling Certified Containers and Cloud Solutions
https://blogs.perficient.com/2024/01/19/deep-dive-into-ibm-sterling-certified-containers-and-cloud-solutions/ (January 19, 2024)

Many retailers are embarking on a digital transformation to modernize and scale their order management system (OMS). The modernized solution wraps order management business services in Docker containers, an architecture that streamlines application management and the release of new functionality. Container technology also accommodates teams with varying levels of technical acumen while supporting business continuity, security, and compliance. If you want to reduce capital and operational expenditures, speed time to market, and improve scalability, elasticity, security, and compliance, consider moving your on-premises IBM Sterling application to IBM's supported native SaaS or to whichever other cloud solution best suits your business.

Tailored Hybrid Cloud Solutions from IBM

IBM offers retailers three distinct hybrid cloud solutions tailored to their specific needs. The first option involves a do-it-yourself (DIY) approach with containers on any platform. While offering flexibility, it comes with potential downsides such as slower time to market, increased operational costs, and higher risk due to the intricacies of self-managing containerized environments. The second option introduces a more robust solution with IBM Certified Containers deployed using Kubernetes, striking a balance between customization and manageability. Option three, the most advanced choice, employs IBM Certified Containers deployed through the Red Hat OpenShift Container Platform. This enterprise-grade solution prioritizes faster time to market, reduced operational costs, and lower risk, providing a secure and comprehensive hybrid cloud environment for organizations seeking efficiency and reliability in their IT transformation endeavors.

*K8s refers to Kubernetes. **RHOCP refers to Red Hat OpenShift Container Platform.

IBM Sterling Certified Container Overview

IBM Sterling Order Management certified containers are distributed as three images (om-base, om-app, and om-agent) via the IBM Entitled Registry. Distribution is gated by licensed entitlement keys, making it straightforward for customers to pull these images into their local registries or incorporate them into their CI/CD pipelines (see the pull sketch after the list below).

  • om-base: Serving as the foundational image, om-base is provisioned on the IBM Cloud Container Registry (Image Registry). It is equipped for the addition of product extensions and customizations, allowing customers to create a customized runtime tailored to their specific needs.
  • om-app: This image is the Order Management application server designed to handle synchronous traffic patterns. It incorporates the IBM WebSphere Liberty application server. Each om-app image built from the customized runtime can be deployed with a dedicated route or ingress to expose its applications. Routes are created only on a Red Hat OpenShift Container Platform cluster; for any other Kubernetes cluster, an ingress is created.
  • om-agent: This container serves as the Order Management workflow agent and integration server, specifically tailored to handle asynchronous traffic patterns.
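
For teams scripting this retrieval, a rough sketch of mirroring one of the images into a local registry with Podman might look like the commands below. This is an assumption-laden illustration: the repository path and tag are placeholders, and the entitlement key comes from your IBM account, so check IBM's documentation for the exact image names for your OMS version.

# Log in to the IBM Entitled Registry with your entitlement key
podman login cp.icr.io --username cp --password <entitlement-key>
# Pull the certified base image (repository path and tag are illustrative placeholders)
podman pull cp.icr.io/cp/ibm-oms-enterprise/om-base:<tag>
# Re-tag and push it to the local registry your CI/CD pipeline uses
podman tag cp.icr.io/cp/ibm-oms-enterprise/om-base:<tag> registry.example.com/oms/om-base:<tag>
podman push registry.example.com/oms/om-base:<tag>
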
Basic Architecture of Sterling OMS – Post Deployment on K8s or RHOCP
(Architecture diagram image courtesy of IBM.)

Key Benefits of IBM Sterling Certified Containers

  • Flexibility: Validated across multiple clouds and platforms, so applications run anywhere seamlessly.
  • Speed: Faster start-up times and new instance creation, with easy installation and configuration.
  • Efficient Scaling and Deployment Management: Auto-scaling with standardized deployment across all environments, optimized infrastructure through capacity scaling and reduced compute resources, better logging and monitoring, and support for continuous integration and delivery.
  • Security: Safeguard brand reputation with top-tier security standards.
  • Seamless Upgrades with Zero Downtime: Simplify application deployment and maintenance with zero down-time upgrades.

Cloud Solutions for IBM Sterling Order Management

IBM offers its native Software as a Service (SaaS), commonly known as IBM Cloud or CoC, taking on the responsibility for hosting, managing, maintaining, and monitoring the entire Order Management (OM) ecosystem. This allows customers to direct their focus toward achieving their business requirements and enhancing business services. IBM’s ownership and management of the DevOps process facilitate automatic upgrades of the OMS application with new features, alongside activities such as backup, database reorganization, and upgrades/patches for WebSphere Application Server (WAS) Liberty, MQ, DB2, and Red Hat Enterprise Linux (RHEL). The proactive monitoring of system performance, coupled with the establishment of automatic alerts and remediation procedures for instances of high CPU/memory usage, ensures a seamless experience for customers. Convenient access to detailed audits/graphs of system performance is provided through a self-serve tool, complemented by log monitoring via Graylog.

In contrast, three other well-regarded cloud solutions compatible with IBM Sterling Certified containers—Amazon AWS, Microsoft Azure, and Oracle Cloud Infrastructure (OCI)—present unique advantages. However, customers opting for these alternatives bear the responsibility of implementing measures to manage, maintain, and monitor the entire Order Management (OM) ecosystem. This encompasses tasks such as database backups, infrastructure upgrades, and system performance monitoring. Additionally, customers must seamlessly integrate with logging tools of their choice when opting for these alternatives.

Conclusion: A Path to Modernization and Efficiency

In conclusion, the shift towards a modernized and scalable Order Management System (OMS) is becoming imperative for retailers undergoing digital transformation. The adoption of IBM Sterling Certified Containers and Software as a Service (SaaS) solutions presents a strategic pathway to enhance flexibility, speed, efficiency, and security in managing the OMS ecosystem. IBM’s hybrid cloud offerings provide retailers with tailored choices, allowing them to align their preferences with the desired level of customization, manageability, and risk. The option to leverage IBM’s native SaaS or explore alternate cloud solutions like Amazon AWS, Microsoft Azure or Oracle Cloud underscores the adaptability of IBM Sterling solutions to diverse business needs. As retailers navigate the complexities of modernizing their OMS, the comprehensive support provided by IBM’s SaaS offerings stands out, ensuring a secure, efficient, and future-ready infrastructure for their digital endeavors.

Key Links

Installing IBM Sterling Order Management System Software using Certified Container – IBM Documentation

A Step-by-Step Guide for Deploying IBM Sterling Order Management on AWS | AWS for Industries (amazon.com)

Deploy Sterling Order Management on Azure Red Hat OpenShift – IBM Developer

Deploy IBM Sterling Order Management Software in a Virtual Machine on Oracle Cloud Infrastructure

Red Hat Ansible Accelerator
https://blogs.perficient.com/2023/07/28/red-hat-ansible-accelerator/ (July 28, 2023)

Automate with Ansible

Most server management infrastructure tasks have been automated for some time, but network changes can still create a bottleneck. Red Hat Ansible enables you to automate many IT tasks including cloud provisioning, configuration management, application deployment, and intra-service orchestration. With Ansible you can configure systems, deploy software, and coordinate more advanced IT tasks such as continuous integration/continuous deployment (CI/CD) or zero downtime rolling updates.
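
To make that concrete, here is a minimal playbook sketch of the kind of configuration task Ansible automates. The inventory group and package name are illustrative assumptions, not part of the accelerator content itself.

---
- name: Install and start a web server
  hosts: webservers            # illustrative inventory group
  become: true

  tasks:
    - name: Install the httpd package
      ansible.builtin.package:
        name: httpd
        state: present

    - name: Ensure the service is running and enabled at boot
      ansible.builtin.service:
        name: httpd
        state: started
        enabled: true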

Our Ansible Accelerator provides an overview of what Ansible can do to help modernize and streamline your DevOps and IT operations. The accelerator is available at three different engagement levels: a workshop, technical enablement, or full team consulting. In 6-12 weeks, we architect a proof of concept that delivers a more secure, compliant, reliable, and automated solution for you and your business.

What’s Included

  • An Ansible pilot with demo playbooks for common use cases
  • Ansible Engine core workshop
  • Playbook authoring
  • Security encryption
  • Writing custom roles
  • Tips for how to leverage cloud providers and dynamic inventories
  • Using Ansible Engine as part of a CI/CD pipeline with other tools
  • Ansible Tower accelerator
  • Documentation including best practices and sample style guide to assist developers in adhering to corporate standards

Use Cases

  • Enable automated deployment across devices in a hybrid model (cloud and on premises)
  • Network automation in a hybrid model
  • Automating Windows
  • Application deployment for Windows/Linux
  • Solving classic infrastructure-as-code challenges
  • Support security scanning for code and infrastructure to enable compliance and remediation (DevSecOps)
  • Support for automation of all cloud platforms including Azure and AWS

Ready to Accelerate?

Perficient + Red Hat

Red Hat provides open-source technologies that enable strategic cloud-native development, DevOps, and enterprise integration solutions to make it easier for enterprises to work across platforms and environments. As a Red Hat Premier Partner and a Red Hat Apex Partner, we help drive strategic initiatives around cloud-native development, DevOps, and enterprise integration to ensure successful application modernization and cloud implementations and migrations.

Contact Us

Red Hat Summit & Ansible Fest 2023 Recap
https://blogs.perficient.com/2023/06/05/red-hat-summit-ansible-fest-2023-recap/ (June 5, 2023)

Last week, our team attended Red Hat Summit in Boston, MA. This event marked the first time Red Hat combined Ansible Fest with Red Hat Summit. During the three-day conference, Red Hat partners, clients, and vendors got together to hear from Red Hat leadership and industry experts and to get hands-on experience with Red Hat platforms. The Perficient team learned about new capabilities and technologies, heard new product announcements, and connected with peers and clients from across industries.

The stars of Ansible Fest at this year's Red Hat Summit were undoubtedly the imminent general availability of Event-Driven Ansible and the announcement of a developer plugin called Ansible Lightspeed, which takes advantage of generative AI to help speed up automation development.

Event Driven Ansible

General availability for Event-Driven Ansible (EDA) is set to be included with the next version of Ansible Automation Platform (AAP) 2.4, which should be available for subscribers beginning in June of this year. So, right around the corner.

EDA introduces the concept of rulebooks, which define sources, rules, and actions to kick off automation. Sources are things like metrics from an APM such as Prometheus or Dynatrace, security events from a SIEM, changes to files, and so on. Rules are the conditions to act on from the source. Finally, actions are the defined automation tasks to carry out, like running a playbook or launching a job or workflow from AAP thanks to its close integration with the AAP controller.
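
As a hedged sketch of what a rulebook looks like, the YAML below wires a webhook source to a rule that launches an AAP job template. The source plugin is the ansible.eda webhook plugin; the condition, payload field, and job template name are illustrative assumptions rather than anything shown at the conference.

---
- name: Remediate a failing web service
  hosts: all
  sources:
    # Listen for alerts posted to a simple webhook endpoint
    - ansible.eda.webhook:
        host: 0.0.0.0
        port: 5000
  rules:
    # Condition evaluated against each incoming event from the source
    - name: Restart the service when an alert reports it down
      condition: event.payload.alert == "service_down"
      action:
        # Launch an existing job template on the AAP controller
        run_job_template:
          name: Restart web service
          organization: Default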

Event-driven content is already being certified by Red Hat for the AAP 2.4 launch. More content will be released in the Automation hub as partners certify and release it.

Some Key Features of EDA within AAP:

  • A new Event Driven Ansible Controller which runs the always-on EDA listeners. The controller will have a familiar user experience for existing users of Ansible Automation Platform.
  • Tight integration with the Automation Controller to launch job and workflow templates when expected events are triggered.
  • Event throttling, which allows developers and admins to constrain the number of events that can trigger actions.

Ansible Lightspeed

With Ansible expanding to include the new EDA rulebooks, it’s important to maintain development velocity and quality. Red Hat’s strategy to help is to provide a powerful new VS Code extension leveraging IBM Watson-based generative AI.

Formerly known as “Project Wisdom”, Red Hat will soon be providing a targeted generative AI plugin for Microsoft’s Visual Studio Code editor. The demos presented in Boston last week were exciting. My take was that the plugin works like a slick combination of ChatGPT, intellisense, and tab-completion. If the final release is anything like the demos, developers will be able to prompt code generation with the name: block of an ansible task. Lightspeed will process for a moment and offer generated code complete with fully qualified collection names, parameters, and even variables inferred from vars blocks and imported vars files.
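
Purely as a hypothetical illustration of that flow (this is not captured Lightspeed output), the developer would type only the task name and the suggestion would fill in a module call along these lines:

- name: Create the application log directory
  # A generated suggestion might complete the task roughly like this:
  ansible.builtin.file:
    path: /var/log/myapp      # hypothetical path, for illustration only
    state: directory
    mode: '0755'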

Lightspeed is still in a closed beta, so to get on the waiting list to try it out you’ll want to visit https://www.redhat.com/en/engage/project-wisdom#sign-up and make sure to include your GitHub ID.

For anyone concerned about being required to participate in the language model for Lightspeed, Red Hat made it a point to emphasize that the data collection for your ansible code will be an opt-in option, meaning data collection is off until explicitly turned on by you, the developer.

 

Interested in More Red Hat Summit Content?

Check out this video!

 

Perficient + Red Hat

Red Hat provides open-source technologies that enable strategic cloud-native development, DevOps, and enterprise integration solutions to make it easier for enterprises to work across platforms and environments. As a Red Hat Premier Partner and a Red Hat Apex Partner, we help drive strategic initiatives around cloud-native development, DevOps, and enterprise integration to ensure successful application modernization and cloud implementations and migrations.

The Open-Source Philosophy
https://blogs.perficient.com/2023/05/31/whatisopensource/ (May 31, 2023)

Open-Source vs. Proprietary Software – What’s the Difference?

To thoroughly grasp what open source is, one should understand what it is not. Open source is not locked behind restrictive proprietary licensing agreements, and users of open-source software are not forbidden to change, edit, study, or redistribute modified versions of it.

Open-source software grants its users a degree of accessibility that is not possible through its proprietary counterpart. Open-source codes are published publicly for all who wish to study and manipulate them, whereas proprietary software keeps users more restricted, inside hard, iron-clad lines.

Richard Stallman, founder of the free Unix-style GNU operating system and a leading voice in the free software movement, asserts that there are four essential software freedoms:

  1. The freedom to run the program as you wish, for any purpose.
  2. The freedom to study how the program works and change it so it does your computing as you wish. Access to the source code is a precondition for this.
  3. The freedom to redistribute copies so you can help others.
  4. The freedom to distribute copies of your modified versions to others. Doing this gives the whole community a chance to benefit from your changes. Access to the source code is a precondition for this.

Open Source Is Essential for Modern Development

These freedoms, for Stallman and open-source advocates everywhere, are part of what makes open-source a huge driver of innovation. Due to its free nature, open source inevitably cultivates collaboration and prompts interaction among those in the software world. Code can be constantly shared in open-source environments. This leads to increased productivity because coders waste less time searching for solutions to problems, and it supports the diversity of skill sets.

If a glitch occurs when using proprietary software, especially in the business realm, one typically must go through many channels to get it fixed; open-source software, on the other hand, gives the user a greater sense of agency over issues and user experience. This is convenient for expert software engineers and is integral for educational purposes, as it allows students to learn through application. Any student of code, whether they be pursuing a degree in computer science, or a hobbyist trying to make their own program from scratch, can click “view source” in their web browser and dive deeply into the recipe of the site’s open-source code.

This education is also driven by the open-source community’s expectation that users will be active participants in its democracy. Open source follows the philosophy that all can contribute to the pot of knowledge, and discoveries should not be withheld under the guise of intellectual property.

Open source empowers the user over the program and encourages the utmost technological collaboration and education. It allows users the liberty to change the source, making it do what they want it to do. Rather than the user remaining stuck inside the constraints instilled by a proprietary developer, the open-source experience allows a higher potential to execute the exact desire of the user. The philosophy of open source flips the notion that one must maneuver code in the bounds of the preexisting and promotes a more dispersed power dynamic.

***

Perficient partners with many open-source companies to deliver innovative, scalable solutions to our clients. Interested in learning more about how your company can reap the benefits of open source? Contact one of our strategists today.

The IT Leader's Ultimate Multicloud Toolbox
https://blogs.perficient.com/2023/05/11/the-it-leaders-ultimate-multicloud-toolbox/ (May 11, 2023)

As an IT leader, you know that adopting a multicloud strategy is a must-have in today’s digital landscape but selecting the right tools can be a bit of a headache. The “IT Leader’s Guide to Multicloud Readiness” is a practical guide that provides key insights and important factors to consider in your multicloud strategy. In the guide, we speak to effective cloud-agnostic tools to help you achieve your goals.

Today, we'll take a deeper dive into five of the most popular tools mentioned in the guide – Terraform, Azure DevOps, Ansible Automation Platform, Red Hat OpenShift, and CloudBolt – covering their use cases, strengths, and weaknesses to help you determine whether they are the right fit for your organization.

 

Terraform

Terraform is a popular infrastructure as code (IaC) tool that allows you to run code and deploy infrastructure across multiple cloud platforms. It has gained widespread popularity due to its modern syntax that is easy to read and pick up, even for beginners. Terraform is a powerful tool for provisioning infrastructure, whether it’s in the cloud or on-premises. DevOps teams will love Terraform’s ability to work with CI/CD tools, making infrastructure deployment a breeze. Additionally, Terraform’s cloud-agnostic nature means it can deploy infrastructure across multiple cloud platforms, making it perfect for multi-cloud environments.

Strengths:

  1. Cloud Agnostic: Terraform’s superpower is that it is cloud agnostic. Whether you’re targeting Azure, AWS, or Google Cloud, Terraform has got you covered. Terraform can also be used on-premises, with VMware.
  2. Robust Code Base: While native cloud tools such as Azure ARM templates and Bicep offer rudimentary checking to ensure that variables you are referencing are present, Terraform takes it a step further by providing built-in commands and extra packages. It’s like having your own personal code checker.
  3. Providers: Terraform has collections of code called providers, specific to each target platform. Providers are open-source, and HashiCorp, the makers of Terraform, provides documentation for all their providers. This makes it more usable and friendly for users.
  4. Community: Terraform has a mature community of online tools that can plug into Terraform, making it even more robust.
  5. Base Code and Package of Resources: Terraform provides a code base for each provider, and there is a package of resources for each provider that is based on the same underlying language.

Weaknesses:

  1. Learning Curve: While Terraform is relatively easy to pick up, having development experience and platform knowledge on Azure, AWS, Google Cloud or the platform you’re targeting would definitely help.
  2. Limitations: There are always going to be a few things you can’t do with the Terraform code, such as certain settings on certain resources. It’s not 100% complete but Terraform releases a new version every week for the providers they support.


 

 

Azure DevOps

Azure DevOps is a cloud-based service that provides a comprehensive platform for software development and delivery. It offers a range of tools and services to help teams plan, build, test, and deploy applications with ease. With Azure DevOps, teams can collaborate seamlessly, streamline their workflows, and deliver high-quality software products faster and more reliably. It offers a range of use cases, such as Continuous Integration and Continuous Deployment (CI/CD), Agile Project Management, Version Control, and Infrastructure as Code (IaC). With Azure DevOps pipelines, teams can create workflows that integrate with other Microsoft tools such as Visual Studio and many cloud platforms and services, allowing them to spend less time on manual work and more time on coding.
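
For a flavor of what a pipeline definition looks like, here is a minimal azure-pipelines.yml sketch; the trigger branch, agent image, and script steps are illustrative placeholders rather than a recommended setup.

# azure-pipelines.yml
trigger:
  - main                      # run the pipeline on pushes to main

pool:
  vmImage: ubuntu-latest      # Microsoft-hosted build agent

steps:
  - script: echo "Building the application..."
    displayName: Build

  - script: echo "Running tests..."
    displayName: Test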

Strengths:

  1. Integration with Azure: As an Azure SaaS service, Azure DevOps integrates seamlessly with other Azure services, making it easy for developers building Azure-based applications.
  2. Agile Project Management: With Azure DevOps, you get all the Agile tools you need to manage your projects, from sprints to releases. It’s like having a project manager in the cloud.
  3. Customizable Pipelines: Azure DevOps allows teams to customize their pipelines to meet their specific needs. You can choose from a variety of pre-built templates or create your own custom workflows if you want to stray from cookie-cutter DevOps!
  4. Version Control: Azure DevOps offers Git repositories and TFVC, which are widely used in the software development industry. You won’t have to worry about your code disappearing into the void.
  5. Security: Azure DevOps offers enterprise-grade security and compliance features, including role-based access control, multi-factor authentication, and audit trails. You can sleep soundly knowing your code is safe and sound.

Weaknesses:

  1. Learning Curve: Azure DevOps, like any DevOps tool, may have a steep learning curve for teams who are new to the DevOps process.
  2. Limited Support for On-Premises Deployment: While an Azure DevOps server version is available for on-premises deployment, it may require additional resources or expertise to set up and maintain, and may not provide all the features and benefits of the cloud-based version.
  3. Cost: Azure DevOps can be expensive for small organizations or teams if you need Test Plans as part of your membership. However, the potential savings in cloud spend and increased efficiency can often justify the investment.

 

Red Hat Ansible Automation Platform

Red Hat Ansible Automation Platform, formerly known as Ansible Tower, is a popular IT automation tool that allows organizations to manage, automate, and orchestrate their infrastructure from a central location. It provides a web-based interface and REST API that helps users to manage Ansible workflows, job scheduling, and inventory management. Ansible is commonly used in DevOps environments as it enables easy configuration management of infrastructure and applications across multiple environments. With Ansible, users can automate tasks such as deployment, scaling of infrastructure, software updates, security patching, and backups, which saves time and reduces errors.

Ansible is also great for configuration management of infrastructure such as VMs, switches, and load balancers. It allows users to write configurations once and apply them to many different machines, which reduces configuration drift and provides a single source of truth. Additionally, Ansible can integrate with other tools to create more complex workflows and automation, such as triggering another automation job, opening tickets for users who need assistance, or trying a configuration a different way.

Strengths:

  1. Easy to Learn: You don’t need to be a programming genius to use Ansible. All you need is a basic understanding of YAML, which is a simple language to learn, and you’re good to go.
  2. Powerful Automation Capabilities: Ansible’s powerful automation capabilities make it a go-to tool for infrastructure management and automation of tasks.
  3. Single Source of Truth: Ansible's single configuration management database (CMDB) provides a single source of truth for infrastructure configurations, making it easy to keep track of changes.
  4. Integration: Ansible can integrate with other tools to create more complex workflows and automation. It’s like the glue that holds your IT infrastructure together.

Weaknesses:

  1. Limited Support for Newer Technologies: Ansible’s lack of support for newer technologies like containers and Kubernetes might make it less useful for automation of ephemeral services.
  2. Lack of Version Control: Ansible’s limited support for version control might make it challenging to manage large-scale changes to infrastructure configurations.

 

Red Hat OpenShift

Red Hat OpenShift is an enterprise-grade, open-source container application platform that simplifies the building, deployment, and management of containerized applications across a variety of environments. Built on top of Kubernetes, it provides robust security, compliance, and monitoring capabilities, as well as features and tools to enhance the development process. With OpenShift, you can automate many aspects of the software development and deployment process, save time and energy, and build, test, and deploy your applications without worrying about managing underlying infrastructure. OpenShift simplifies the process of deploying and managing containerized applications across a wide range of environments, making it an ideal platform for Application deployment and management, DevOps automation, and Platform as a Service (PaaS) scenarios.
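
As one small example of those OpenShift-specific conveniences, exposing an application outside the cluster is a single Route object layered on top of a standard Kubernetes Service. The service name, port, and hostname below are illustrative assumptions.

apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: storefront
spec:
  host: storefront.apps.example.com   # illustrative cluster wildcard domain
  to:
    kind: Service
    name: storefront                  # existing Service to expose
  port:
    targetPort: 8080
  tls:
    termination: edge                 # terminate TLS at the OpenShift router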

Strengths:

  1. Turnkey Application Platform: OpenShift is a comprehensive application platform that offers a complete set of tools and services for deploying and managing containerized applications.
  2. Secure and Compliant: OpenShift provides a secure and compliant environment for running applications, with features like image scanning, network isolation, and compliance checks based on industry standards.
  3. Multicloud Support: OpenShift is designed to run on multiple cloud platforms, including AWS, Azure, and Google Cloud, as well as on-premises data centers. It’s like having a travel adapter for your applications.

Weaknesses:

  1. Complexity: OpenShift is a complex platform with many features and capabilities, which can make it difficult for some users to configure and manage.
  2. Cost: While OpenShift is available in open-source and community editions, the enterprise version requires a subscription and can be costly for some organizations.
  3. Learning curve: Learning to use OpenShift effectively may require training and experience, particularly for developers and DevOps engineers who are new to containerization and Kubernetes.

 

CloudBolt

CloudBolt is a cloud management platform designed to help organizations manage their cloud infrastructure across different platforms and providers. It allows teams to provision, manage, and optimize resources in the cloud from a single interface. With CloudBolt, organizations can optimize their cloud spend by providing clear visibility, recommendations, and monitoring tools to help them cut costs and maximize ROI. Additionally, managing resources across multiple cloud providers becomes easier, eliminating the need for different tools and portals. CloudBolt also empowers teams to self-provision resources on demand, reducing the burden on central IT teams.

Strengths:

  1. Flexibility: Whether it’s AWS, Azure, or Google Cloud, CloudBolt can work with different cloud platforms, providers, and technologies. It can also be integrated with other tools to enhance functionality.
  2. Self-Service: CloudBolt’s self-service portal lets users take control of their own workflows and resource management, freeing up IT teams to focus on more important tasks.
  3. Automation: CloudBolt offers automation capabilities to help optimize resource usage and reduce waste, such as power scheduling, bulk cleanup, and rightsizing.
  4. Scalability: CloudBolt can scale to meet the needs of any organization, whether you’re a small business or a large enterprise.

Weaknesses:

  1. Learning Curve: As with any tool, there may be a learning curve associated with CloudBolt, especially for non-technical users. However, the intuitive interface and user-friendly features can help ease the transition.
  2. Cost: While CloudBolt may be considered costly by some organizations, the potential savings in cloud spend and increased efficiency can often justify the investment.

Next Steps

The tools we discussed above are crucial for IT leaders who are preparing for multicloud adoption. They provide valuable insights and help organizations streamline workload deployment and management, optimize costs, and maintain critical app and service reliability.

For those seeking a deeper dive into multicloud, the ‘IT Leader’s Guide to Multicloud Readiness’ is an excellent resource that offers practical guidance on developing a multicloud strategy that aligns with business objectives, identifying the right cloud providers and services, and aligning IT teams and stakeholders for successful multicloud adoption. With the right tools and guidance, IT leaders can confidently navigate the challenges of multicloud adoption and unlock the full potential of cloud technology.

Custom Ansible Execution Environments
https://blogs.perficient.com/2023/05/03/custom-ansible-execution-environments/ (May 3, 2023)

Ansible Tower users are gearing up for a big migration to Ansible Automation Platform 2. Ansible Tower 3.8 is technically AAP 1.2, which sunsets in September of 2023. AAP brings a few usability updates, like an improved job search that lets you look up specific job details such as the limit right in the web UI, and stdout/stderr output that is easier to read. A Private Automation Hub is available, which acts as a combination container registry and on-LAN Ansible Galaxy server. Automation Hub brings frequently used collections closer to the controllers, which can speed up job execution by removing the need to pull collections directly from the internet. It also hosts the execution environments that AAP uses to run Ansible plays in containerized runtimes. This is one of the biggest fundamental differences between AAP and the outgoing Tower product: where Tower relies on Python virtual environments to organize competing Python dependencies, its replacement uses container runtimes. Execution environments are more portable than the old virtual environments, which must be created or recreated for each developer. Having a container image that runs Ansible jobs means developers can pull what they need and get to writing their automation and configuration instead of wrangling their different Python environments.

This post will walk through a typical Python virtual environment setup and then the creation and use of a simple execution environment. At the end I'll demonstrate Ansible's text-based user interface (TUI) with ansible-navigator. There is a lot more to the tool than what I talk about here. This post also assumes a basic familiarity with Python, Red Hat Enterprise Linux, and container tools like Docker or Podman. I'd encourage anyone working with AAP or AWX to also look into ansible-builder for building execution environments. For now, we're just customizing containers and using them to run Ansible jobs.

Traditional Python Virtual Environments

Until recently, developing Ansible automation has been pretty straightforward once you get the hang of it. For the most part it's just a matter of writing some YAML, double-checking your spacing, maybe installing a collection or two, and testing. But what if one of your collections requires a Python package outside of the standard library? Your best bet is usually to create a Python virtual environment that contains everything you need for your plays. A Python virtual environment is just that: a way to isolate your Python development environment with packages that are only available when the environment is active, keeping unnecessary Python packages from polluting your system Python environment.

To create a virtual environment just run

python3 -m venv ansible-test

Then activate it by sourcing the activate script

. ansible-test/bin/activate (note the leading dot .)

With the virtual environment active, you can now install any prerequisite packages you might need to support a galaxy collection and the collection itself. For instance the MySQL collection has both system and python dependencies. For Red Hat based distros the system packages are: gcc, python-devel and mysql-devel. Additionally there are also two python package requirements: PyMySQL and mysqlclient. We’ll want to install the system packages with dnf/yum, then pip install our python dependencies. Here’s our barebones requirements.txt for pip packages and requirements.yml for galaxy content – we’ll use these later in the execution environment as well:

requirements.txt

PyMySQL
mysqlclient

requirements.yml

collections:
  - name: community.mysql
    version: 3.6.0

Now we Install our System Dependencies with:

dnf install gcc mysql-devel python-devel -y

And Our Python Dependencies with:

pip install -r requirements.txt && ansible-galaxy collection install -r requirements.yml

So, not too bad. Just set up a virtual environment, activate it, install your requirements, and get to developing. But what if you don't want to install system packages? Maybe they conflict with something else installed on your development machine. How do you collaborate with your team? How do you keep your virtual environments in sync? Of course you can use Ansible to create a virtual environment with the pip module (sketched briefly below), but there might be a better way altogether using a containerized execution environment.
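
For completeness, that pip-module approach looks something like the task below; the paths are illustrative assumptions, and it reuses the same requirements.txt from earlier.

- name: Create a virtual environment and install the Python requirements into it
  ansible.builtin.pip:
    requirements: /home/dev/ansible/requirements.txt   # PyMySQL, mysqlclient
    virtualenv: /home/dev/ansible-test                 # created if it does not exist
    virtualenv_command: python3 -m venv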

Containerized Execution Environments and Ansible Content Navigator

If you're just getting started with Ansible today, or if your organization is using AAP or AWX, you might want to look at the latest Ansible content navigator tool: ansible-navigator. Ansible-navigator combines a lot of the break-out CLI commands listed earlier into a single executable and provides an optional TUI interface to drill down into playbook execution. Better still, it eliminates the need for a Python virtual environment and swaps it for a more portable and modern containerized execution environment (EE). It's still on the developer to customize the execution environment, but the upside is you can now push the whole environment to a registry, and the Ansible content you write will run the same from anywhere that container can run. This is how Red Hat's AAP and the upstream AWX work, so if you're using one of those you'll want to be sure your dev environment is consistent with your production automation platform. Developing automation using the same container image that the controller uses is the trick.

AAP comes with a few standard execution environments out of the box that automation developers can pull to their dev box. Each image is an ansible aware container with some base collections to run your playbooks. The base image I’m using in this example is at quay.io/ansible/creator-ee. It’s got a few base collections, ansible-core, and an ansible-runner to execute the plays. All the same container customizations apply here as they would with any other container image. Future posts might go into using ansible-builder, but for today I’m sticking to plain vanilla container customization.

Let's take that MySQL example, for instance. Here's an example Containerfile that we might use to get started running our community.mysql plays:

MySQLContainerfile

FROM quay.io/ansible/creator-ee

COPY requirements.txt .
COPY requirements.yml .
RUN microdnf install gcc python3-devel mysql-devel -y
RUN pip install --upgrade pip
RUN pip install -r requirements.txt
RUN ansible-galaxy collection install -r requirements.yml -p /usr/share/ansible/collections

Note: Here we’ve offloaded those system-wide packages to the container instead of our own system. Also I’ve instructed ansible-galaxy to install the collection to the container’s collections directory inside the image. This ensures the collection persists beyond the initial image creation. It’s where the rest of the default collections like ansible.posix and kubernetes.core are, so it’s good enough for me for now.

Save that to a containerfile called MySQLContainerfile (or whatever you want to call it) and build your image. I’m using podman here, but feel free to use docker if that’s your jam.

podman build -t registry.example.com/ansible-demo/mysql-ee:0.0.1 -f MySQLContainerfile

Now we can create and test our plays using the new execution environment we just created and if all goes well, we’ll push the image to our registry to be used by other developers or make it available to AAP.

podman push registry.example.com/ansible-demo/mysql-ee:0.0.1

Let's start with a simple play that installs MariaDB (an open fork of MySQL), initializes a new schema, and adds a user that can connect with a password on localhost and from our Ansible controller.

Here’s the playbook itself:

mysqldb.yml

---
- name: Install and initialize a mysql database
  hosts: db
  become: true

  vars_files:
    - secrets.yml

  tasks:
    - name: Install SQL packages
      ansible.builtin.package:
        name: "{{ db_packages }}"
        state: present

    - name: Open Host Firewall for SQL Connections
      ansible.posix.firewalld:
        service: mysql
        permanent: true
        immediate: true
        state: enabled

    - name: Start SQL Server Service
      ansible.builtin.service:
        name: "{{ db_service }}"
        state: started

    - name: Create .my.cnf
      ansible.builtin.template:
        src: templates/my.j2
        dest: "{{ db_super_path }}.my.cnf"

    - name: Create Database
      community.mysql.mysql_db:
        login_unix_socket: /var/lib/mysql/mysql.sock
        name: "{{ db_name }}"
        state: present

    - name: Add user to {{ db_name }}
      community.mysql.mysql_user:
        login_unix_socket: /var/lib/mysql/mysql.sock
        name: "{{ db_user }}"
        password: "{{ db_pass }}"
        priv: '{{ db_name }}.*:ALL'
        host: "{{ item }}"
        state: present
      loop: "{{ db_hosts }}"

And my secrets.yml file that I'm decrypting by passing --vault-id to the ansible-navigator command. More on that in just a bit.

Here’s my secrets.yml with swapped out passwords. Please don’t use passwords this crummy 👎.

secrets.yml

db_user: "demo_user"
db_pass: "password123"
db_super_pass: "$uper$ecure"

Finally, we just have a simple template file to create root’s .my.cnf credential file in the fourth task.

templates/my.j2

[client]
user=root
password={{ db_super_pass }}

I’m including the secrets here because using ansible-vault with ansible-navigator can be a little tricky but easy to demonstrate.

For the decryption, I’ve just set an environment variable

export ANSIBLE_VAULT_PASSWORD=myverysecurepassword

And have a simple bash script that I pass to navigator that just echoes that back out.

vault-pass.sh

#!/bin/bash

echo ${ANSIBLE_VAULT_PASSWORD}

Now, we can run our playbook with the following command, using the containerized execution environment we built and pushed earlier:

ansible-navigator run mysqldb.yml --eei registry.example.com/ansible-demo/mysql-ee:0.0.1 --vault-id "vault-pass.sh" -m stdout

PLAY [Install and initialize a mysql database] *********************************

TASK [Gathering Facts] *********************************************************
ok: [fedora1]

TASK [Install MySQL package] ***************************************************
changed: [fedora1]

TASK [Open Host Firewall for SQL Connections] **********************************
changed: [fedora1]

TASK [Start SQL Server Service] ************************************************
changed: [fedora1]

TASK [Copy .my.cnf] ************************************************************
changed: [fedora1]

TASK [Create Database] *********************************************************
changed: [fedora1]

TASK [Add user to demo_db] *****************************************************
changed: [fedora1] => (item=192.168.64.2)
changed: [fedora1] => (item=localhost)

PLAY RECAP *********************************************************************
fedora1                    : ok=8    changed=7    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0

 

Passing -m stdout displays the results in the traditional ansible format. To use the TUI interface instead, leave off the -m stdout.

The TUI interface allows you to drill into your play’s execution.

Just type the number of the row you want to look into (use esc to go back).

  Play name.                                  Ok Changed  Unreachable  Failed  Skipped  Ignored  In progress  Task count   Progress
0│Install and initialize a mysql database      7       6            0       0        0        0            0           7   Complete

^b/PgUp page up         ^f/PgDn page down         ↑↓ scroll         esc back         [0-9] goto         :help help      Successful

 

Typing 0 here will take us into the play execution below:

  Result   Host      Number   Changed   Task                                           Task action                      Duration
0│Ok       fedora1        0   False     Gathering Facts                                gather_facts                           1s
1│Ok       fedora1        1   True      Install MySQL package                          ansible.builtin.package                9s
2│Ok       fedora1        2   True      Open Host Firewall for SQL Connections         ansible.posix.firewalld                0s
3│Ok       fedora1        3   True      Start SQL Server Service                       ansible.builtin.service                3s
4│Ok       fedora1        4   True      Copy .my.cnf                                   ansible.builtin.template               1s
5│Ok       fedora1        5   True      Create Database                                community.mysql.mysql_db               0s
6│Ok       fedora1        6   True      Add user to demo_db                            community.mysql.mysql_user             1s


^b/PgUp page up         ^f/PgDn page down         ↑↓ scroll         esc back         [0-9] goto         :help help      Successful

 

Now let’s look at line 5 to see more information on the Create Database task.

The output below shows us all of the parameters and results of a given task in YAML format:

Play name: Install and initialize a mysql database:5
Task name: Create Database
CHANGED: fedora1
 0│---                                                                                                                             
 1│duration: 0.392973                                                                                                              
 2│end: '2023-05-02T07:32:19.647181'                                                                                               
 3│event_loop: null                                                                                                                
 4│host: fedora1                                                                                                                   
 5│play: Install and initialize a mysql database                                                                                   
 6│play_pattern: db                                                                                                                
 7│playbook: /home/lxadmin/mysql/mysqldb.yml                                                                                       
 8│remote_addr: fedora1                                                                                                            
 9│res:                                                                                                                            
10│  _ansible_no_log: null                                                                                                         
11│  changed: true                                                                                                                 
12│  db: demo_db
13│  db_list:
14│  - demo_db
15│  executed_commands:
16│  - CREATE DATABASE `demo_db`
17│  invocation:
18│    module_args:
19│      ca_cert: null
20│      chdir: null
21│      check_hostname: null
22│      check_implicit_admin: false
23│      client_cert: null
24│      client_key: null
25│      collation: ''
^b/PgUp page up    ^f/PgDn page down    ↑↓ scroll    esc back    - previous    + next    [0-9] goto    :help help       Successful

Finally, let's test the MySQL connection from our Ansible controller, using our provisioned user to connect to the new database:

mysql -u demo_user -p -h fedora1
Enter password:
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 8
Server version: 5.5.5-10.5.18-MariaDB MariaDB Server

Copyright (c) 2000, 2022, Oracle and/or its affiliates.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> use demo_db;
Database changed

So there's a little more setup on the front end to get started using execution environments compared to the old Python virtual environments, but I think it's worth trying out, especially for teams who are really digging into what AAP and AWX offer and want that consistent development environment.

Connect with us at Red Hat Summit 2023!
https://blogs.perficient.com/2023/04/25/connect-with-us-at-red-hat-summit-2023/ (April 25, 2023)

Red Hat Summit is only a few weeks away! The conference will take place on May 23-25, 2023, in Boston, Massachusetts, at the Boston Convention and Exhibition Center. Still not registered? Click here!

Red Hat Summit is the premier open-source technology event, bringing together customers, partners, industry thought leaders, and community contributors to learn, network, and experience the full potential of open source.

Be sure to visit us at booth #116, where our Red Hat experts will answer your questions about Ansible and OpenShift and showcase how Perficient can help you succeed with Red Hat. As a Red Hat Premier Partner and a Red Hat Apex Partner, Perficient helps drive strategic initiatives around cloud-native development, DevOps, and enterprise integration to ensure successful application modernization and cloud implementations and migrations.

Are you attending Red Hat Summit? Reach out to connect with our team.

Cloud Takes Center Stage at 2023 Bank Automation Summit
https://blogs.perficient.com/2023/03/30/cloud-takes-center-stage-at-2023-bank-automation-summit/ (March 30, 2023)

Recently, I attended the 2023 Bank Automation Summit, where one of the significant topics of discussion was how banks navigate their transition to the cloud.

The “cloud” refers to a global network of servers, each with a unique function, that works in tandem to enable users to access files stored within from any approved device. The computing and storage of cloud data occur in a data center, rather than on a locally sourced device.

Cloud computing makes data more accessible, cheaper, and scalable. For these reasons, Gartner predicts that by 2025, 85% of enterprises will have a cloud-first principle. However, due to their sensitive and regulated natures, some industries – especially the financial services industry – have had more complicated cloud transformation journeys than others.

Given their unique vulnerabilities, what do financial services institutions need to consider when migrating to the cloud? 

Security

Traditionally, information was said to be most secure when separated and segmented. However, the cloud’s structure makes data segmentation more complex and potentially more vulnerable if the correct security measures are not followed. For example, as a start, companies should leverage the cloud for initiatives surrounding verification methods, access security, and anti-phishing training.

A Hybrid Road to a Long-term Solution

Migration to the cloud and data transformation does not need to happen overnight. Especially for larger, older institutions, it might take some warming up to cloud-based applications before adopting them at full capacity. And, many have the impression that everything should move to the cloud, but depending on an institution’s needs, it might make sense for them to keep certain things on premises. Institutions should implement cloud technologies in a way that makes sense for their needs. For many, this means starting their journey using microservices.

Compliance

Financial services institutions must be hypervigilant regarding where customer data is located, who has data access, and how data is managed in a cloud environment. There are also certain global and regional regulatory considerations for migrating to the cloud in a phased approach, and institutions must have a thorough understanding and awareness of the implications.

Interested in discussing how Perficient can support your cloud transformation journey? Contact one of our experts today.

So… What is Ansible?
https://blogs.perficient.com/2023/03/21/so-what-is-ansible/ (March 21, 2023)

IT organizations are responsible for an ever-increasing number of applications, infrastructure providers, operating system versions, network devices, platforms, monitoring, ticket systems, and more. The challenges that go along with managing it can range anywhere from time-consuming to downright Sisyphean. The rising adoption of cloud services adds a financial component, a new challenge for many organizations starting their cloud journey. It’s more important than ever for organizations to know as much as possible about their infrastructure, how it’s configured, and how it’s all integrated.

There are many enterprise organizations that have long-standing legacy technology which can't be containerized or launched in the cloud. The idea that servers should be cattle and not pets is a fantastic goal, but sometimes that livestock gets a name and special treatment, turning it into a big pet. There's a constellation of IoT devices out there that might fall under one regulatory agency's OT security guidelines or another. IT engineers need to be able to keep their systems and applications flowing with changing business needs, security updates, and regulatory controls. If you're looking for a solution to all these problems, that's where Ansible comes in.

Ansible

At its heart, Ansible is a configuration management and automation tool written in Python. That doesn’t mean Ansible developers need to know anything about the Python language to use it (although it is extensible with plugins and custom modules); instead, automation definitions are written in YAML. Sorry, there’s no escaping YAML in today’s IT landscape. Like it or not, it’s the language of configuration for now at least.

Teams using Ansible can define and execute desired states for devices, automate the installation of tools to support an application, and even deploy and configure the application itself using the same tool. Need to update a ServiceNow ticket after modifying a config file on a prod instance? Or add a Jira task if something that wasn’t accounted for pops up? Ansible has modules for that.

Think of a traditional IT application deployment on new infrastructure – let’s say a web server running a simple Flask app in the DMZ VLAN feeding off a PostgreSQL database on the internal VLAN. The Dev team has tested their code and hands it off to the operations team to deploy it on the prod servers with some step-by-step instructions as to what goes where, what required services need to be in place, required versions, and so on. Operations needs to prepare those servers in accordance with their own guidelines, install the Dev team’s prerequisites, then deploy the application. Meanwhile, network engineers need to ensure that the servers have valid IP addresses and that the firewalls on both sides of the DMZ are allowing the correct traffic though so that users can get to the app, and the web server can talk to the database.

What If?

What if, instead of step-by-step instructions, it was a simple Ansible role that could be called from a playbook along with the network team's IP and firewall roles and the operations team's server-compliance configuration (sketched below)? Now everything needed to build that application is defined in code, packaged together, and tracked in source control. Ansible enables teams to do just that. When done carefully, Ansible playbooks and roles can be self-documenting. Ansible has a shallow learning curve, fantastic documentation, and a no-cost barrier to entry if using ansible-core to get started.
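
A playbook tying those roles together might look something like this sketch; the role names and inventory groups are illustrative assumptions about how the teams package their work.

---
- name: Prepare the network path for the new app
  hosts: dmz_firewalls           # illustrative inventory groups throughout
  roles:
    - ip_allocation              # network team's addressing role
    - firewall_rules             # network team's DMZ firewall role

- name: Build and deploy the Flask web tier
  hosts: web_dmz
  become: true
  roles:
    - server_compliance          # operations baseline configuration
    - flask_app                  # dev team's application role

- name: Deploy the PostgreSQL database
  hosts: db_internal
  become: true
  roles:
    - server_compliance
    - postgresql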

AAP Highlights

Ansible core can take a small team a long way. Larger teams and teams who might be outgrowing the command-line-only Ansible tools will want to look at Red Hat’s Ansible Automation Platform. Ansible Automation Platform (AAP for short) is a full suite of tools that expands on the capabilities of Ansible core. Some of the highlights of what AAP provides are:

  • Role-based access control (RBAC) with multiple authentication sources (Active Directory, OAuth2, SAML, and more)
  • SCM project integration supporting Git and Subversion
  • Secure secrets management
  • Job scheduling
  • Job templates to execute playbooks
  • Workflow Jobs can combine job templates from multiple projects and teams with pass/fail paths for workflow actions.
  • Useful web UI for the Automation Controller and Private Automation Hub.
  • Prebuilt inventory plugins for many third party CMDBs
  • Full-featured REST API
  • Access to the Red Hat supported content catalog full of partner resources by way of collections.
  • A Private Automation Hub to host execution environments and Red Hat Certified Automation Content, as well as community and even custom collections, to support an organization’s automation requirements.

Perficient + Red Hat

Red Hat provides open-source technologies that enable strategic cloud-native development, DevOps, and enterprise integration solutions to make it easier for enterprises to work across platforms and environments. As a Red Hat Premier Partner and a Red Hat Apex Partner, we help drive strategic initiatives around cloud-native development, DevOps, and enterprise integration to ensure successful application modernization and cloud implementations and migrations.

Contact Us

Openshift Essentials and Modern App Dev on Kubernetes https://blogs.perficient.com/2023/03/13/openshift-essentials-and-modern-app-dev/ https://blogs.perficient.com/2023/03/13/openshift-essentials-and-modern-app-dev/#respond Mon, 13 Mar 2023 19:02:00 +0000 https://blogs.perficient.com/?p=322779

Introduction

Whether you have already adopted Openshift or are considering it, this article will help you increase your ROI and productivity by listing the 12 essential features included with any Openshift subscription. This is where Openshift shines as a platform when compared to pure Kubernetes engine distributions like EKS, AKS, etc. which are more barebones and require quite a bit of setup to be production and/or enterprise ready. When you consider the total value of Openshift, and factor in the total cost of ownership for the alternative, Openshift is a very competitive option not only for cost-conscious buyers but also organizations that like to get things done, and get things done the right way. Here we go:

  1. Managed Openshift in the cloud
  2. Operators
  3. GitOps
  4. Cluster Monitoring
  5. Cluster Logging
  6. Distributed Tracing
  7. Pipelines
  8. Autoscaling
  9. Service Mesh
  10. Serverless
  11. External Secrets
  12. Hyperscaler Operators

Special Bonus: API Management

ROSA, ARO, ROKS: Managed Openshift in the cloud

If you want an easy way to manage your Openshift cloud infrastructure, these managed Openshift solutions are an excellent value and a great way to get ROI fast. They run pay-as-you-go on the hyperscaler’s infrastructure, and you can save a ton of money by using reserved instances with a one-year commitment. Red Hat manages the control plane (master and infra nodes) and you pay a small fee per worker. We like the seamless integration with native hyperscaler services: storage, node pools for easy autoscaling, zone awareness for HA, networking, and RBAC security with IAM or AAD. Definitely worth considering over the more barebones EKS/AKS-style solutions.

Check out our Openshift Spring Boot Accelerator for ROSA, which leverages most of the tools I’m introducing down below…

Operators

Available by default on Openshift, the OperatorHub is pretty much the app store for Kubernetes. Operators manage the installation, upgrade and lifecycle of complex Kubernetes-based solutions like the tools we’re going to present in this list. They also are based on the controller pattern which is at the core of the Kubernetes architecture and enable declarative configuration through the use of Custom Resource Definitions (CRD). Operators are a very common way to distribute 3rd party software nowadays, and the Operator Framework makes it easy to create custom controllers to automate common Kubernetes operations tasks in your organization.

The OperatorHub included with Openshift out-of-the-box allows you to install said 3rd party tools with the click of a button, so you can set up a full-featured cluster in just minutes, instead of spending days, weeks, or months gathering installation packages from all over. The Operator Framework supports Helm, Ansible and plain Go based controllers to manage your own CRDs and extend the Kubernetes APIs. At Perficient, we leverage custom operators to codify operations of high-level resources like a SpringBootApp. To me, Operators represent the pinnacle of devsecops automation, or at least a giant leap forward.
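
As an illustration of that last point, a custom resource managed by such an operator might look something like the sketch below; the API group, version, and fields are entirely hypothetical:

  apiVersion: apps.example.com/v1alpha1     # hypothetical API group/version
  kind: SpringBootApp
  metadata:
    name: echo-service
  spec:
    image: registry.example.com/echo-service:1.4.2   # placeholder image
    replicas: 3
    expose: true        # the operator would create the Service and Route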

Openshift GitOps (AKA ArgoCD)

The first thing you should install on your clusters is GitOps, to centralize the management of your cluster configuration with Git. Openshift GitOps is Red Hat’s distribution of ArgoCD, delivered as an Operator, and it integrates seamlessly with Openshift RBAC and single sign-on authentication. Instead of relying on a CI/CD pipeline and the oc (kubectl) CLI to implement changes in your clusters, ArgoCD works as an agent running on your cluster which automatically pulls your configuration manifests from a Git repository. This is the single most important tool in my opinion, for so many reasons, the main ones being:

  1. Central management and synchronization of multi-cluster configuration (think multi-region active/active setups at the minimum)
  2. Ability to version control cluster states (auditing, rollback, git flow for change management)
  3. Reduction of learning curve for development teams (no new tool required, just git, manage simple yaml files)
  4. Governance and security (quickly propagating policy changes, no need to give non-admin users access to clusters’ APIs)

I have a very detailed series on GitOps on my Perficient blog; it’s a must-read whether you’re new to Openshift or not.
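
For reference, a minimal ArgoCD Application pointing the local cluster at a configuration repository looks roughly like this; the repository URL, path, and namespaces are placeholders:

  apiVersion: argoproj.io/v1alpha1
  kind: Application
  metadata:
    name: cluster-baseline
    namespace: openshift-gitops
  spec:
    project: default
    source:
      repoURL: https://git.example.com/platform/cluster-baseline.git   # placeholder repository
      targetRevision: main
      path: overlays/prod
    destination:
      server: https://kubernetes.default.svc    # the local cluster (pull model)
      namespace: openshift-gitops
    syncPolicy:
      automated:
        prune: true
        selfHeal: true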

Cluster Monitoring

Openshift comes with a pre-configured monitoring stack powered by Prometheus and Grafana. Openshift Monitoring manages the collection and visualization of internal metrics like resource utilization, which can be leveraged to create alerts and used as the source of data for autoscaling. This is generally a cheaper and more powerful alternative to the native monitoring systems provided by the hyperscalers like CloudWatch and Azure Monitor. Like other Red Hat managed operators, it comes already integrated with Openshift RBAC and authentication. The best part is it can be managed through GitOps by using the provided, super simple CRDs.

A less-known feature is the ability to leverage Cluster Monitoring to collect your own application metrics. This is called user-workload monitoring and can be enabled with one line in a manifest file. You can then create ServiceMonitor resources to indicate where Prometheus can scrape your application custom metrics, which can then be used to build custom alerts, framework-aware dashboards, and best of all, used as a source for autoscaling (beyond CPU/memory). All with a declarative approach which you can manage across clusters with GitOps!
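
For illustration, once user-workload monitoring is enabled (on recent Openshift versions this is a one-line enableUserWorkload flag in the cluster-monitoring-config ConfigMap), a ServiceMonitor for a Spring Boot service might look like the sketch below; the namespace, label, and port name are placeholders:

  apiVersion: monitoring.coreos.com/v1
  kind: ServiceMonitor
  metadata:
    name: echo-service
    namespace: echo             # placeholder namespace
  spec:
    selector:
      matchLabels:
        app: echo-service       # placeholder label on the Service
    endpoints:
      - port: http              # named Service port exposing the metrics endpoint
        path: /actuator/prometheus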

Cluster Logging

Based on a Fluentd-Elasticsearch stack, cluster logging can be deployed through the OperatorHub and comes with production-ready configuration to collect logs from the Kubernetes engine as well as all your custom workloads in one place. Like Cluster Monitoring, Cluster Logging is generally a much cheaper and more powerful alternative to the hyperscalers’ native services. Again, the integration with Openshift RBAC and single sign-on makes it very easy to secure on day one. The built-in Kibana deployment allows you to visualize all your logs through a web browser without requiring access to the Kubernetes API or CLI. The ability to visualize logs from multiple pods simultaneously, sort and filter messages based on specific fields and create custom analytics dashboards makes Cluster Logging a must-have.

Another feature of Cluster Logging is log forwarding. Through a simple LogForwarder CRD, you can easily (and through GitOps too!) forward logs to external systems for additional processing such as real-time notifications, anomaly detection, or simply integrate with the rest of your organization’s logging systems. A great use case of log forwarding is to selectively send log messages to a central location which is invaluable when managing multiple clusters in active-active configuration for example.

Last but not least is the addition of custom Elasticsearch index schema in recent versions, which allows developers to output structured log messages (JSON) and build application-aware dashboards and analytics. This feature is invaluable when it comes to filtering log messages based on custom fields like log levels, or a trace ID, to track logs across distributed transactions (think Kafka messages transiting through multiple topics and consumers). Bonus points for being able to use Elasticsearch as a metrics source for autoscaling with KEDA for example.

Openshift Distributed Tracing

Based on Jaeger and Opentracing, Distributed Tracing can again be quickly installed through the OperatorHub and makes implementing Opentracing for your applications ridiculously easy. Just deploy a Jaeger instance in your namespace and annotate any Deployment resource in that namespace with a single line to start collecting traces. Opentelemetry is invaluable for pinpointing performance bottlenecks in distributed systems. Alongside Cluster Logging with structured logs as mentioned above, it makes up a complete solution for troubleshooting transactions across multiple services if you just log your Opentracing trace IDs.

Openshift Distributed Tracing also integrates with Service Mesh, which we’ll introduce further down, to monitor and troubleshoot traffic between services inside a mesh, even for applications which are not configured with Opentelemetry to begin with.

Openshift Pipelines

Based on Tekton, Openshift Pipelines allow you to create declarative pipelines for all kinds of purposes. Pipelines are the recommended way to create CI/CD workflows and replace the original Jenkins integration. The granular declarative nature of Tekton makes creating re-usable pipeline steps, tasks and entire pipelines a breeze, and again can be managed through GitOps (!) and custom operators. Openshift Pipelines can be deployed through the OperatorHub in one click and comes with a very intuitive (Jenkins-like) UI and pre-defined tasks like S2I to containerize applications easily. Creating custom tasks is easy as tasks are simply containers, which allows you to leverage the massive ecosystem of 3rd party containers without having to install anything additional.

You can use Openshift Pipelines for any kind of workflow, from standard CI/CD for application deployments, to on-demand integration tests, to executing operations maintenance tasks, or even step functions. As Openshift-native, Pipelines are very scalable as they leverage the Openshift infrastructure to execute tasks on pods, which can be finely tuned for maximum performance and high availability, and they integrate with Openshift RBAC and storage.
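
To give a sense of how lightweight a custom task is (the image and parameter below are placeholders), a Tekton task is little more than a container running a script:

  apiVersion: tekton.dev/v1beta1
  kind: Task
  metadata:
    name: smoke-test
  spec:
    params:
      - name: base-url
        type: string
    steps:
      - name: check-health
        image: registry.access.redhat.com/ubi9/ubi   # any image that provides curl
        script: |
          #!/bin/sh
          set -e
          curl -fsS "$(params.base-url)/health"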

Autoscaling

Openshift supports the three types of autoscalers: horizontal pod autoscaler, vertical pod autoscaler, and cluster autoscaler. The horizontal pod autoscaler is included OOTB alongside the node autoscaler, and the vertical pod autoscaler can be installed through the OperatorHub.

Horizontal pod autoscaler is a controller which increases and decreases the number of pod replicas for a deployment based on CPU and memory metric thresholds. It leverages Cluster Monitoring to source the Kubernetes pod metrics from the included Prometheus server and can be extended to use custom application metrics. The HPA is great for scaling stateless REST services up and down to maximize utilization and increase responsiveness when traffic grows.
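
For reference, a basic CPU-based HPA manifest looks like the sketch below; the deployment name is a placeholder, and depending on your Kubernetes version the API may be autoscaling/v2 or v2beta2:

  apiVersion: autoscaling/v2
  kind: HorizontalPodAutoscaler
  metadata:
    name: echo-service
  spec:
    scaleTargetRef:
      apiVersion: apps/v1
      kind: Deployment
      name: echo-service        # placeholder deployment
    minReplicas: 2
    maxReplicas: 10
    metrics:
      - type: Resource
        resource:
          name: cpu
          target:
            type: Utilization
            averageUtilization: 60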

Vertical pod autoscaler is another controller which analyses utilization patterns to optimize pod resource configuration. It automatically tweaks your deployment resources’ CPU and memory requests to reduce waste or undercommitment and ensure maximum performance. It’s worth noting that a drawback of VPA is that pods have to be shut down and replaced during scaling operations. Use with caution.

Finally, the cluster autoscaler is used to increase or decrease the number of nodes (machines) in the cluster to adapt to the number of pods and requested resources. The cluster autoscaler paired with the hyperscaler integration with machine pools can automatically create new nodes when additional space is required and remove the nodes when the load decreases. There are a lot of considerations to account for before turning on cluster autoscaling related to cost, stateful workloads requiring local storage, multi-zone setups, etc.  Use with caution too.

Special Mention

Special mention for KEDA, which is not commercially supported by RedHat (yet), although it is actually a RedHat-Microsoft-led project. KEDA is an event-driven scaler which sits on top of the built-in HPA and provides extensions to integrate with 3rd party metrics aggregating systems like Prometheus, Datadog, Azure App Insight, and many more. It’s most well-known for autoscaling serverless or event-driven applications backed by tools like Kafka, AMQ, Azure EventHub, etc. but it’s very useful to autoscale REST services as well. Really cool tech if you want to move your existing AWS Lambda or Azure Functions over to Kubernetes.

Service Mesh

Service mesh is supported by default and can also be installed through the OperatorHub. It leverages Istio and integrates nicely with other Openshift operators such as Distributed Tracing, Monitoring & Logging, as well as SSO. Service mesh serves many different functions that you might be managing inside your application today (For example if you’re using Netflix OSS apps like Eureka, Hystrix, Ribbon, etc):

  1. Blue/green deployments
  2. Canary deployments (weighted traffic)
  3. A/B testing
  4. Chaos testing
  5. Traffic encryption
  6. OAuth and OpenID authentication
  7. Distributed tracing
  8. APM

You don’t even need to use microservices to take advantage of Service Mesh, a lot of these features apply to re-platformed monoliths as well.

Finally you can leverage Service Mesh as a simple API Management tool thanks to the Ingress Gateway components, in order to expose APIs outside of the cluster behind a single pane of glass.

Serverless

Now we’re getting into real modern application development and deployment. If you want peak performance and maximize your compute resources and/or bring down your cost, serverless is the way to go for APIs. Openshift Serverless is based on KNative and provides 2 main components: serving and eventing. Serving is for HTTP APIs containers autoscaling and basic routing, while eventing is for event-driven architecture with CloudEvents.

If you’re familiar with AWS Lambda or Azure Functions, Serverless is the equivalent in the Kubernetes world, and there are ways to migrate from one to the other if you want to leverage more Kubernetes in your infrastructure.

We can build a similar solution with some of the tools we already discussed like KEDA and Service Mesh, but KNative is a more opinionated model for HTTP-based serverless applications. You will get better results with KNative if you’re starting from scratch.
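
To give a flavor of the serving model (the image and concurrency target below are placeholders), a KNative Service is a single manifest that covers deployment, routing, and request-based autoscaling:

  apiVersion: serving.knative.dev/v1
  kind: Service
  metadata:
    name: echo-service
  spec:
    template:
      metadata:
        annotations:
          autoscaling.knative.dev/target: "50"   # target concurrent requests per pod
      spec:
        containers:
          - image: registry.example.com/echo-service:1.4.2   # placeholder image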

The big new thing is eventing which promotes a message-based approach to service-to-service communication (as opposed to point-to-point). If you’ve used that kind of decoupling before, you might have used Kafka, or AWS SQS or other types of queues to decouple your applications, and maybe Mulesoft or Spring Integration or Camel (Fuse) to produce and consume messages. KNative eventing is a unified model for message format with CloudEvent and abstracts the transport layer with a concept called event mesh. Check it out: https://knative.dev/docs/eventing/event-mesh/#knative-event-mesh.

External Secrets Add-On

One of the first things to address when deploying applications to Kubernetes is the management of sensitive configuration variables like passwords to external systems. Though Openshift doesn’t officially support loading secrets from external vaults, there are widely used solutions which are easily setup on Openshift clusters:

  • Sealed Secrets: if you just want to manage your secrets in Git, you cannot have them in cleartext even if you’re using GitHub or other Git providers. SealedSecrets allows you to encrypt secrets in Git which can only be read by your Openshift cluster. This requires an extra step before committing, using the provided client certificate, but doesn’t require a 3rd party store.
  • External Secrets: this operator allows you to map secrets stored in external vaults like Hashicorp, Azure Vault and AWS Secret Manager to internal Openshift secrets. Very similar to the CSI driver below, it essentially creates a Secret resource automatically, but doesn’t require an application deployment manifest to be modified in order to be leveraged.
  • Secrets Store CSI Driver: another operator which syncs an external secrets store to an internal secret in Openshift but works differently than the External Secrets operator above. Secrets managed by the CSI driver only exist as long as a pod using it is running, and the application’s deployment manifest has to explicitly “invoke” it. It’s not usable for 3rd party containers which are not built with CSI driver support out-of-the-box.

Each have their pros and cons depending on whether you’re in the cloud, use GitOps, your organization policies, existing secrets management processes, etc. If you’re starting from scratch and are not sure of which one to use, I recommend starting with External Secrets and your Cloud provider secret store like AWS Secret Manager or Azure Vault.
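
As a rough sketch of the External Secrets approach (the store name, secret names, and remote key below are placeholders, and the API version can differ between operator releases), mapping an AWS Secrets Manager entry to a native Secret looks like this:

  apiVersion: external-secrets.io/v1beta1
  kind: ExternalSecret
  metadata:
    name: orders-db-credentials
  spec:
    refreshInterval: 1h
    secretStoreRef:
      name: aws-secrets-manager      # a SecretStore/ClusterSecretStore defined separately
      kind: ClusterSecretStore
    target:
      name: orders-db-credentials    # the Kubernetes Secret the operator creates
    data:
      - secretKey: password
        remoteRef:
          key: prod/orders-db        # placeholder path in AWS Secrets Manager
          property: password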

Special Mention: Hyperscalers Operators

If you’re running on AWS or Azure, each cloud provider has released their own operators to manage cloud infrastructure components through GitOps (think vaults, databases, disks, etc), allowing you to consolidate all your cloud configuration in one place, instead of using additional tools like Terraform and CI/CD. This is particularly useful when automating integration or end-to-end tests with ephemeral Helm charts to setup various components of an application.

API Management Add-On

Mulesoft, Boomi or Cloud Pak for Integration customers, this is an add-on but it’s well worth considering if you want to reduce your APIM costs: Red Hat Application Foundation and Integration. These suites include a bunch of cool tech like Kafka (with a registry) and AMQ, SSO (SAML, OIDC, OAuth), runtimes like Quarkus, Spring and Camel, 3Scale for API Management (usage plans, keys, etc), CDC, caching and more.

Again because it’s all packaged as an operator, you can install and start using all these things in just a few minutes, with the declarative configuration goodness that enables GitOps and custom operators.

6 Steps to successful autoscaling on Kubernetes https://blogs.perficient.com/2023/01/06/6-steps-to-successful-autoscaling-on-kubernetes/ https://blogs.perficient.com/2023/01/06/6-steps-to-successful-autoscaling-on-kubernetes/#respond Fri, 06 Jan 2023 18:44:17 +0000 https://blogs.perficient.com/?p=324354

Introduction

One of the big drivers of adopting containers to deploy microservices is the elasticity provided by platforms like Kubernetes. The ability to quickly scale applications up and down according to current demand can cut your spending by more than half, and add a few 9s to your SLAs. Because it’s so easy to set up nowadays, there’s really no good reason for autoscaling not to be one of your top priorities for a successful adoption of Kubernetes. In this post I’m going to give you the 6 easy steps to establish a solid autoscaling foundation using KEDA, and trust me, you’ll go a long way with just these basic principles.

TL;DR

  1. Rightsize your deployment container
  2. Get a performance baseline for your application
  3. Use the baseline measurement as the KEDA ScaledObject target
  4. Test your KEDA configuration with realistic load
  5. Refine the metric to minimize the number of pods running
  6. Iterate

Understand these principles

Before you jump into autoscaling, please consider the following

  • Autoscaling is not a silver bullet to solve performance problems
  • “Enabling HPA is not the same as having a working autoscaling solution” (credit: Sasidhar Sekar)
  • It’s a powerful tool that needs to be used with caution, bad configuration can lead to large cost overruns
  • Autoscaling is better suited for non-spiky load patterns
  • Autoscaling tuning can be different for each application
  • Tuning requires a solid understanding of traffic patterns, application performance bottlenecks
  • Sometimes it’s good to not auto-scale (you might want backpressure)
  • Careful with async workloads
  • Think about the whole system, external dependencies, tracing is invaluable
  • Tuning autoscaling is a process, to be refined over time

Now that we got that out of the way, let’s get started…

Autoscaling options

Let’s super quickly review the different types of autoscaling available for Kubernetes:

Vertical Autoscaling: resizes individual pods to increase the load capacity. Great for rightsizing applications that don’t scale horizontally easily such as Stateful services (databases for example) or applications that are CPU or memory bound in general. Scaling a pod vertically requires replacing the pod, which might cause downtime. Note that for certain types of services, resizing a pod might have no effect at all on its capacity to process more requests. That’s because Spring Boot services for example have a set number of threads per instance, so you would need to explicitly increase the number of threads to leverage the additional CPU.

Horizontal Autoscaling: creates additional identical pods to increase the overall load capacity. Best option to use whenever possible in order to optimize pod density on a node. Supports CPU and memory-based scaling out-of-the-box, as well as custom metrics. Well-suited for stateless services and event-driven consumers.

Node Autoscaling: creates additional identical nodes (machines) in order to run more pods when existing nodes are at capacity. This is a great companion for horizontal autoscaling but… there are many considerations to take into account before turning it on. The two main concerns are waste (new nodes might get provisioned for only a minor capacity increase) and scaling down (when nodes run Stateful pods which might be tied to specific zones).

The rest of this article will be focused on horizontal pods autoscaling.

Understanding the Horizontal Pod Autoscaler

HPA ships with Kubernetes and consists of a controller that manages the scaling up and down of the number of pods in a deployment.

(Diagram: HPA flow)

In a nutshell:

  1. You create a manifest to configure autoscaling for one of your deployments
  2. The manifests specifies what metric and threshold to use to make a scaling decision
  3. The operator constantly monitors the K8s metrics or some metrics API
  4. When a threshold is breached, the operator updates the number of replicas for your deployment

HPA is limited in terms of what metrics you can use by default though: CPU & memory. So this is fine if your service is CPU or memory bound but if you want to use anything else, you’ll need to provide HPA with a custom API to serve other types of metrics.

This is the basic formula that the HPA uses to calculate the desired number of pods to schedule:

desiredReplicas = ceil[currentReplicas * ( currentMetricValue / desiredMetricValue )]

This is calculated on every “tick” of the HPA, which can be configured per deployment but defaults to 30 seconds.

Example:

An HPA configured with a target CPU usage of 60% will try to maintain an average usage of 60% CPU across all of the deployment’s pods.

If the current deployment is running 8 pods averaging 70% usage, desiredReplicas = ceil[8*(70/60)] = ceil(9.33) = 10. The HPA will add 2 pods.

Introducing KEDA

According to the KEDA website:

KEDA is a Kubernetes-based Event Driven Autoscaler. With KEDA, you can drive the scaling of any container in Kubernetes based on the number of events needing to be processed.

That’s actually a bit misleading and reducing. The common misconception is that KEDA can only be used when doing event-driven architecture like MQ or Kafka. In reality KEDA provides that API I mentioned earlier for serving custom metrics to the HPA. Any type of metrics, like response time or requests/second, etc

(Diagram: KEDA flow)

So say you want to use Prometheus metrics, or CloudWatch metrics, etc. KEDA has a lot of scalers to integrate with all these services. This is a very easy way to augment the default HPA and not write custom metrics APIs.

KEDA Workflow

  1. A ScaledObject Kubernetes manifest tells KEDA about your deployment and desired scaling configuration
  2. KEDA initially scales down the deployment’s pod to 0
  3. When pod activity is first detected, KEDA scales the deployment to the min number of pods specified in the config file
  4. KEDA also creates a native Kubernetes HorizontalPodAutoscaler (HPA) resource
  5. The HPA monitors the targeted metric for autoscaling by querying a KEDA metric server
  6. The KEDA metric server acts as a broker for the actual metric server (Azure Monitor, App Insight, Prometheus, etc)
  7. When the metric threshold is breached, the HPA adds more pods according to the formula above
  8. When no more traffic is detected, the HPA scales back the pods down to the min number of pods
  9. Eventually KEDA will scale back down to 0 and de-activate the HPA

A note about HTTP services

One of the interesting features of KEDA is the ability to scale down to 0 when there’s nothing to do. KEDA will just query the metric system until activity is detected. This is pretty easy to understand when you’re looking at things like queue size or Kafka record age, etc. The underlying service (i.e. Kafka) still runs and is able to receive messages, even if there aren’t any consumers doing work. No message will be lost.

When you consider HTTP services though, it doesn’t work quite the same. You need at least one instance of the service to process the first incoming HTTP request so KEDA cannot scale that type of deployment to 0.

(There is an add-on to handle HttpScaledObjects that creates a sort of HTTP proxy, but if you really need to scale down services to 0, I recommend looking at KNative instead)

You can still leverage KEDA as the HPA backend to scale on things like requests/seconds and this is what we’re going to do next.

Rightsizing your application pods

What we call rightsizing in Kubernetes is determining the ideal CPU and Memory requirements for your pod to maximize utilization while preserving performance.

Rightsizing serves 2 main purposes:

  • Optimize the density of pods on a node to minimize waste
  • Understand your application capacity for a single pod so we know when to scale

Optimizing density

This is more related to cost control and utilization of compute resources. If you picture the node as a box, and the pods as little balls, the smaller the balls, the less wasted space between the balls.

Also you’re sharing the node with other application pods, so the less you use, the more resources you leave to other applications.

Let’s walk through an example. Say your node pools are made of machine with 16 vCPUs and your pods are configured to request 2 vCPUs, you can put 8 pods on that node. If your pod actually only uses 1 vCPU, then you’re wasting 50% capacity on that node.

If you request a big vCPU number, also keep in mind that every time a new pod comes up, you might only use a fraction of that pod capacity while usage goes up. Say your pod uses 4 vCPU / 1000 concurrent requests. At 1250 requests for example, a new pod would be created, but only ¼ of the requested vCPU would be used. So you’re blocking resources that another application might need to use.

You get the idea… smaller pods = smaller scaling increment

Understanding performance

This is to give us a baseline for the metrics to scale on. The idea is to establish a relation between the pod resources and its capacity to reach a target, so multiply by 2 and you get twice the capacity, multiply by 3 and you get 3 times the capacity, etc.

I recommend using a performance-based metric for autoscaling as opposed to a utilization metric. That’s because a lot of HTTP services don’t necessarily use more resources to process more requests. Check out the following load test of a simple Spring Boot application.

(Graph: load test of a simple Spring Boot application showing CPU utilization as concurrent requests increase)

In this test I’m doubling the number of concurrent requests at each peak. You can see that the max CPU utilization doesn’t change.

So what’s the right size? In a nutshell, the minimum CPU and memory size to insure a quick startup of the service and provide enough capacity to handle the first few requests.

Typical steps for a microservice with a single container per pod (not counting sidecar containers which should be negligible):

  1. To determine the initial CPU and memory requests, the easiest approach is to deploy a single pod and run a few typical requests against it. Defaults depend on the language and framework used by your application. In general, the CPU request is tied to the response time of your application, so if you’re expecting ~250ms response time, 250m CPU and 500Mi memory is a good start
  2. Observe your pod metrics and adjust the memory request to be around the memory used +- 10%.
  3. Observe your application’s startup time. In some cases, requests impact how fast an application pod will start so increase/decrease CPU requests until the startup time is stable

Avoid specifying CPU limits, at least at this point, to avoid throttling.
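
As a starting point, the container resources from step 1 would look like the snippet below in your Deployment spec; the numbers are the example values above, to be adjusted from your own observations:

  resources:
    requests:
      cpu: 250m        # roughly tied to the expected ~250ms response time
      memory: 500Mi    # adjust to observed usage +/- 10%
    # no CPU limit for now, to avoid throttling while tuning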

To really optimize costs, this will need to be refined over time by observing the utilization trends in production.

Getting a performance baseline

This step measures how much load a single pod is able to handle. The right measure depends on the type of services that you’re running. For typical APIs, requests/seconds is the preferred metric. For event-driven consumers, throughput or queue-size is best.

A good article about calculating a performance baseline can be found here: https://blog.avenuecode.com/how-to-determine-a-performance-baseline-for-a-web-api

The general idea is to find the maximum load the pod can sustain without degradation of service, which is indicated by a drop in response time.

Don’t feel bad if your application cannot serve 1000s of RPS, that’s what HPA is for, and this is highly dependent on your application response time to begin with.

  1. Start a load test with a “low” number of threads, without a timer, to approximate your application response time
  2. Now double the number of threads and add a timer according to the formula in the above article
  3. Observe your application average response time
  4. Repeat until the response time goes up
  5. Iterate until the response time is stable
  6. You now have your maximum RPS for a rightsized pod

Keep an eye on your pod CPU utilization and load. A sharp increase might indicate an incorrect CPU request setting on your pod or a problem inside your application (async processes, web server threading configuration, etc)

Example: REST Service expecting 350ms response time

We rightsized our Spring Boot application and chose 500m CPU and 600Mi memory requests for our pod. We’ve also created a deployment in our Kubernetes cluster with a single replica. Using JMeter and Azure Load Testing we were able to get the following results. The graphs show number of concurrent threads (users) on the top left, response time on the top right, and requests/seconds (RPS) on the bottom left.

1 POD (500m CPU) – 200 users (load test graph)

1 POD (500m CPU) – 400 users (load test graph)

1 POD (500m CPU) – 500 users (load test graph)

1 POD (500m CPU) – 600 users (load test graph)

Observe the response time degrading at 600 users (460ms vs 355ms before). So our pod performance baseline is 355ms @ 578 rps (500 users).

Interestingly, the CPU load plateaued at around 580 RPS. That’s because Spring Boot rest services are typically not CPU bound. The requests are still accepted but land in the thread queue until capacity is available to process the request again. That’s why you see an increase of the response time despite the CPU load staying the same. This is a perfect example of why using CPU for autoscaling doesn’t work sometimes, since in this case, you would just never reach a high CPU utilization. We still want the CPU request to be higher because of startup time for Spring Boot apps.

Now let’s scale our deployment to 2 replicas and run the tests again.

2 PODS (500m CPU) – 1000 users (load test graph)

2 PODS (500m CPU) – 1200 users (load test graph)

This confirms our baseline so we know we can double the number of pods to double the capacity (353.11ms @ ~1.17k rps)

Configuring KEDA

I’ve previously explained that the HPA only supports CPU and memory metrics for autoscaling out-of-the-box. Since we’ll be using RPS instead, we need to provide the HPA an API to access the metric. This is where KEDA comes in handy.

KEDA provides access to 3rd party metrics monitoring systems through the concept of Scalers. Available scalers include Azure Monitor, Kafka, App Insights, Prometheus, etc. For our use case, the RPS metric is exposed by our Spring Boot application through the Actuator, then scraped by Prometheus. So we’ll be using the Prometheus scaler.

The ScaledObject resource

In order to register a deployment with KEDA, you will need to create a ScaledObject resource, similar to a deployment or service manifest. Here’s an example:

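The following is a minimal sketch of such a ScaledObject using the Prometheus scaler; the service name, Prometheus server address, metric name, and query are assumptions to adapt to your own setup:

  apiVersion: keda.sh/v1alpha1
  kind: ScaledObject
  metadata:
    name: echo-service-spring-boot
  spec:
    minReplicaCount: 1
    maxReplicaCount: 10
    scaleTargetRef:
      name: echo-service-spring-boot       # the Deployment (or DeploymentConfig) to scale
    triggers:
      - type: prometheus
        metadata:
          serverAddress: http://prometheus.example.svc:9090   # placeholder Prometheus endpoint
          metricName: echo_service_rps
          threshold: "500"
          query: sum(rate(http_server_requests_seconds_count{app="echo-service-spring-boot"}[1m]))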

Let’s discuss the main fields:

  • minReplicaCount is the number of replicas we want to maintain. Remember in the case of an HTTP service, we always want at least one at all time (see discussion above)
  • scaleTargetRef is a reference to your deployment resource (here we’re using Openshift DeploymentConfig, but normally you’d target a Deployment)
  • metadata.type indicates that we want to use the Prometheus scaler
  • metadata.query specifies the PromQL query to calculate the average RPS across all pods tagged with “echo-service-spring-boot”
  • metadata.threshold is the target for the HPA. Remember the formula at the beginning of the post “desiredMetricValue”, this is it
  • metadata.metricName is whatever you want and has to be unique across scalers

That’s pretty much it. Apply that resource to the namespace where your deployment is running and you can start testing

Tuning Autoscaling

Let’s first look at the basic steps and we’ll discuss details down below:

  1. Start with 1 minReplicaCount
  2. Start your test
  3. Observe the response time graph
  4. If things are configured properly, we expect the response time to remain constant as the number of RPS increases
  5. Increase the threshold until you start seeing spikes in response time, which would indicate that the autoscaler is scaling too late
  6. If the ramp-up time is too short, you will see a spike in response time and possibly errors coming up
  7. Change minReplicaCount to at least 2 for HA but match real-world normal traffic expectations

Understanding timing

Pay attention, this part is very important: always test for realistic load. Testing with a ramp-up of 10k users/s is probably not realistic and most likely will not work. Understanding your traffic patterns is critical.

Remember that the various components in the autoscaling system are not real-time. Prometheus has a scraping interval, the HPA has a query interval, KEDA has a scaling interval, and then you have your pod startup time, etc. This can add up to a few minutes in the worst case scenario.

During load increase, only the current number of pods will be able to handle the incoming traffic, until KEDA detects the breach of threshold and triggers a scaling event. So you might experience more or less serious degradation of service until your new pods come up. Can your users tolerate a few seconds of latency? Up to you to decide what’s acceptable.

Example:

Let me try to illustrate what’s going on. Imagine an application which can serve 5 RPM, an autoscaling threshold set to 4 RPM, and a test configured with 10 threads and a ramp-up time of 150 seconds; this means we have a ramp-up rate of 4 threads per minute. We calculated that it would take 1.5 minutes for KEDA to trigger a scale-up and for a new pod to be ready to receive requests. We can trace the following graph:

(Graph: simulated ramp-up of users versus single-pod capacity and autoscaling threshold, described below)

In blue we show the number of users/min simulated by our load test, in orange, the capacity of a single pod and in purple, the threshold set in the autoscaler.

At the 1 minute mark, the threshold will be breached (blue line crossing), so 1.5 minutes after that – in the worst case – our second pod will be ready at the 2.5 minute mark.

The vertical black line shows that the number of users at the 2.5 min would have already reached 10 so the single first pod will have to deal with up to 2x its RPM capacity until the second pod comes up.

We know our application can handle up to 5 RPM without service degradation, so we want to configure our tests so the ramp-up rate falls under the orange line. That’s a 2 threads/min ramp-up, hence we need to increase our ramp-up time in JMeter to 300 seconds and make sure our overall test duration is at least 300 seconds.

Tuning the threshold

In our previous example, what if your actual ramp-up in production is just that high? Before messing with the threshold, try this first:

  • Decrease your pod startup time
  • Decrease the autoscaler timers (not super recommended)
  • Improve your app performance so the RPS goes up
  • Be OK with slower response times for a short time

If none of that helps you achieve your goal, you can try lowering the threshold BUT you need to understand the tradeoffs. Let’s go back to our formula:

desiredReplicas = ceil[currentReplicas * ( currentMetricValue / desiredMetricValue )]

You can see that the number of pods is directly related to the ratio between the threshold and the actual load. Now let’s say you want to handle a load of 1000 RPS.

If you set the threshold to 100 RPS, the HPA will scale to 10 pods. Now, change the threshold to 50 RPS and the HPA will scale to 20 pods – i.e. twice the amount of pods – for the same load and same pod capacity!

A lower threshold will result in more pods for the same load, which will increase cost, waste resources (under-utilized pods) and potentially impact overall cluster performance. At the same time a lower threshold will result in less risk of degraded service.

Example of autoscaling based on our previously tested REST API

Autoscaled – 1400 users – 120 seconds ramp-up – 500 rps threshold


Ramp-up time is too short and threshold is too high, resulting in a serious increase in response time for the first couple pods

Autoscaled – 1400 users – 240 seconds ramp-up – 500 rps threshold


Double ramp-up time, still small spike in response time but better overall

Autoscaled – 1400 users – 240 seconds ramp-up – 400 rps threshold


Decreased threshold improves response time degradation

Autoscaled – 1400 users – 240 seconds ramp-up – 300 rps threshold


Lower threshold improved response time even more BUT…


HPA scales to 4 pods @ 400 RPS threshold


HPA scales to 6 pods @ 300 RPS threshold

In this case, we determined that 400 RPS was the correct threshold to avoid overly degraded response time during initial scale-up while maximizing resource utilization.

Impact of application performance problems in production

Autoscaling a part of a system means making sure the other parts can scale too

If an application response time starts increasing significantly, autoscaling can become a big problem if you’re not using the right metric.

A misconfigured autoscaler can result in much higher costs without benefit and negatively impact other systems.

For example, if an application becomes really slow because of a downstream problem with a database, adding more pods will not improve the situation. In some cases, that would actually aggravate the problem by putting more pressure on the downstream system.

An increase in response time means a drop in RPS. By using RPS as the scaling metric, in that case, we would actually decrease the number of pods to match what the system is actually capable of serving. If you instead scaled on response time, the number of pods would increase but the throughput would remain exactly the same. You’d just have stuck requests spread out across more pods.

Monitoring key metrics is critical to avoid runaway costs

Monitor HPA, understand how often pods come up and down and detect anomalies like unusually long response times. Sometimes autoscaling will mask critical problems and waste a lot of resources.

Improve your application’s resilience first

Sometimes it is actually better to not autoscale when you want back-pressure to avoid overwhelming downstream systems and provide feedback to users. It’s a good idea to implement circuit breakers, application firewalls, etc to guard against these problems

Continuous improvement

Autoscaling tuning CI/CD

All the steps above can be automated as part of your CI/CD pipeline. JMeter and Azure Load Tests can be scripted with ADO and ARM or Terraform templates.

This is to proactively track changes in application baseline performance which would result in changing the target value for the autoscaling metric.

You can easily deploy a temporary complete application stack in Kubernetes by using Helm. Run your scripted load tests, compare with previous results, and automatically update your ScaledObject manifest.

Reactive optimization

Monitoring the right platform and application metrics will surface optimization opportunities (and anticipate problems). Following are some of the metrics you want to keep an eye on:

Application response time: if the response time generally goes up, it might be time to re-evaluate your baseline performance and adjust your target RPS accordingly

Number of active pods: changes in active pods patterns usually indicate a sub-optimized autoscaling configuration. Spikes in number of pods can be an indication of a too low target

Pod CPU & memory utilization %: monitor your pods utilization to adjust your rightsizing settings

Request per seconds per pod: if the RPS of single pods is much below the configured target, the target is too low which results in underutilized pods

This process can also be automated to a certain extent. An alerting mechanism which provides recommendations is best; in most cases you want a human looking at the metrics and deciding on the appropriate action.

Conclusion

I’ll repeat what I’ve said at the very beginning of this article: autoscaling is not the solution to poor application performance problems. That being said if your application is optimized and you’re able to predictably scale horizontally, KEDA is the easiest way to get started with autoscaling. Just remember that KEDA is just a tool and in my experience, the number one impediment to a successful autoscaling implementation is a lack of understanding of testing procedures or lack of tests altogether. If you don’t want to end up with a huge bill at the end of the month, reach out to Perficient for help!

 

Kubernetes Multi-Cluster Management – Part 3 https://blogs.perficient.com/2022/12/14/kubernetes-multi-cluster-management-part-3-2/ https://blogs.perficient.com/2022/12/14/kubernetes-multi-cluster-management-part-3-2/#respond Wed, 14 Dec 2022 17:18:01 +0000 https://blogs.perficient.com/?p=323506

Introduction

In part I of our Kubernetes multi-cluster management series, we’ve talked about the basics of GitOps and explained why you should really consider GitOps as a central tenet of your Kubernetes management strategy. In part II, we looked at a reference implementation of GitOps using ArgoCD, and how to organize your GitOps repositories for security and release management.

In this final installment of our Kubernetes multi-cluster management series, we’re taking our approach to the next level and looking at enterprise scale, operational efficiency, compliance for regulated industries, and so much more. To help alleviate some of the shortcomings of the GitOps-only approach, we’re going to introduce a new tool which integrates really nicely with ArgoCD: RedHat Advanced Cluster Management (ACM) and the upstream community project Open Cluster Management.

I should mention right off the bat that ACM is not limited to RedHat Openshift clusters. If your hub cluster is Openshift, then you can use ACM and import your other clusters (AKS, EKS, etc) into it, but you can also use other clusters as hubs with Open Cluster Management.

Motivation

When I first heard about Advanced Cluster Management, I thought, why do I need this? After all, our GitOps approach discussed in Part II works and scales really well, up to thousands of clusters, but I realized there are a few shortcomings when you’re deploying in the enterprise:

Visibility: because we use a pull model, where each cluster is independent of the others and manages its own ArgoCD instance, identifying what is actually deployed on a whole fleet of clusters is not straightforward. You can only look at one ArgoCD instance at a time.

Compliance: while it’s a good idea to have standard configuration for all your clusters in your organization, it’s not always possible, in practice, to force every team to be in sync all at the same time. You can certainly configure GitOps to sync manually let individual teams pick and choose what/when they want deployed in their cluster, but how do you keep track of gaps at a global level?

Management: GitOps only deals with configuration, not cluster lifecycle, and by that I mean provisioning, starting, pausing clusters. Depending on the organization, we’ve been relying on the cloud provider console or CLIs, and/or external automation tools like Terraform to do this. But that’s a lot more complicated in hybrid environments.

Use Case

Let’s focus on a specific use case and see how to address the above concerns by introducing advanced cluster management. The organization is a large retailer with thousands of locations all over the US. Each individual store runs an application which collects Bluetooth data and computes the information locally to customize the customer’s experience. The organization also run applications in the cloud like their website, pipelines to aggregate collected store data, etc. all on Kubernetes.

Requirements

  1. All the stores run the same version of the application
  2. Application updates need to happen at scale over-the-air
  3. All the clusters (on location and in the cloud) must adhere to specific standards like PCI
  4. Development teams need to be able to simulate store clusters in the cloud
  5. The infrastructure team needs to quickly provision new clusters
  6. The compliance and security teams regularly produce reports to show auditors

Solution

GitOps-only

Requirements 1-4 can mostly be addressed with our GitOps-only approach. This is the overall architecture, following what we presented in part II:

(Diagram: GitOps architecture for the retail edge use case)

  • Baseline configuration for all our clusters is in a shared repository managed by the infrastructure team
  • The baseline contains dependencies and tools like operators, monitoring and logging manifests, etc. And some of the PCI-related configuration.
  • Various teams can contribute to the baseline repository by submitting pull-requests
  • Store clusters are connected to a second repository which contains the BT application manifests
  • Stores clusters can be simulated in the cloud by connecting them to the same repository
  • Dev teams have dedicated repositories which they can use to deploy applications in the cloud clusters
  • For simplicity’s sake, we only have two environments: dev and prod. Each environment is connected to the repository branch of the same name
  • New releases are deployed in the dev branches, and promoted to production with Git merge

So GitOps-only goes a long way here. From a pure operational perspective, we covered most of the requirements and the devops flow is pretty straightforward:

  1. Dev teams can simulate a store cluster in the cloud
  2. They can deploy and test changes by committing to the dev branches of the repos
  3. When they’re ready to go, changes are merge to prod and automatically deployed to all the edge clusters
  4. All the clusters share the same baseline configuration with the necessary PCI requirements

ACM and OCM overview

Red Hat Advanced Cluster Management and the upstream community project Open Cluster Management together form an orchestration platform for Kubernetes clusters. I’m not going to go into too many details in this article. Feel free to check out the documentation on the OCM website for a more complete explanation of architecture and concepts.

For the moment all you need to know is:

  • ACM uses a hub-and-spoke model, where the actual ACM platform is installed on the hub
  • The hub is where global configuration is stored
  • Agents run on the spokes to pull configuration from the hub

(Diagram: hub-and-spoke architecture)

  • ACM provides 4 main features:
    • Clusters lifecycle management (create, start/stop/pause, group, destroy clusters)
    • Workload management to place application onto spokes using cluster groups
    • Governance to manage compliance of the clusters through policies
    • Monitoring data aggregation

Both tools can be installed in a few minutes through the operators’ marketplace, with ACM being more tightly integrated with Openshift specifically and backed by Red Hat enterprise support.

Cluster Provisioning

GitOps cannot be used to provision clusters so you need to rely on something else to create the cluster infrastructure and bootstrap the cluster with GitOps. You have 2 options:

  • If you’re deploying non-Openshift clusters, you will need to use some kind of automation tool like Ansible or Terraform or provider specific services like Cloudformation or ARM. I recommend wrapping those inside a CI/CD pipeline and manage the infrastructure state in Git so you can easily add a cluster by committing a change to your central infrastructure repo.
  • If you’re deploying Opensihft clusters, then you can leverage ACM directly, which integrates with the main cloud providers and even virtualized datacenters. A cool feature of ACM is cluster pools, which allows you to pre-configure Openshift clusters and assign them to teams with one click. More on that later…

Regardless of the approach, you need to register the cluster with the Hub. This is a very straightforward operation, which can be included as a step in your CI/CD pipeline, and is managed by ACM automatically if you’re using it to create the clusters.

Cluster bootstrapping

Once the blank cluster is available, we need to start installing dependencies, basic tooling, security, etc. Normally we would add a step in our provisioning CI/CD, but this is where Advanced Cluster Management comes in handy. You don’t need an additional automation tool to handle it: you can create policies in ACM, which are just standard Kubernetes manifests, to automatically install ArgoCD on newly registered clusters and create an ArgoCD application which bootstraps the clusters using the baseline repository.

(Screenshot: the 3 bootstrapping policies installed on our hub)
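
As a rough sketch of such a policy (the API group and operator channel may differ in your environment, and it would be bound to a set of clusters with a separate PlacementRule and PlacementBinding), enforcing the presence of the GitOps operator on managed clusters looks something like this:

  apiVersion: policy.open-cluster-management.io/v1
  kind: Policy
  metadata:
    name: install-openshift-gitops
    namespace: rhacm-policies          # placeholder namespace on the hub
  spec:
    remediationAction: enforce         # or "inform" to only report compliance
    disabled: false
    policy-templates:
      - objectDefinition:
          apiVersion: policy.open-cluster-management.io/v1
          kind: ConfigurationPolicy
          metadata:
            name: openshift-gitops-subscription
          spec:
            remediationAction: enforce
            severity: medium
            object-templates:
              - complianceType: musthave
                objectDefinition:
                  apiVersion: operators.coreos.com/v1alpha1
                  kind: Subscription
                  metadata:
                    name: openshift-gitops-operator
                    namespace: openshift-operators
                  spec:
                    channel: latest                  # placeholder channel
                    name: openshift-gitops-operator
                    source: redhat-operators
                    sourceNamespace: openshift-marketplace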

ACM governance is how we address points 5 and 6 in the requirements. You can pick existing policies and/or create new ones to apply specific configuration to your clusters, which are tied to a regulatory requirement like PCI. Security and compliance teams, as well as auditors, can quickly identify gaps through the ACM interface.

(Screenshot: ACM Governance Dashboard)

You can choose whether you want policies to be enforced or just monitored. This gives your teams flexibility to decide when they’re ready to upgrade their cluster configuration to get into compliance.

In our case, our policy enforces the installation and initialization of GitOps on every single cluster, as a requirement for disaster recovery. This approach allows us to quickly provision new edge clusters and keep them all in sync:

  • Provision the new cluster hardware
  • Install Kubernetes
  • Register with the hub
  • As soon as the cluster is online, it bootstraps itself with GitOps
  • Each cluster then keeps itself up-to-date by syncing with the edge repo prod branch

Workload Management

You have 2 ways to manage workloads with Advanced Cluster Management:

Subscriptions: this is essentially ACM-native GitOps if you’re not using ArgoCD. You create channels to monitor Git repositories and use cluster sets to target groups of clusters to deploy the applications into. The main difference with GitOps is the ability to deploy to multiple clusters at the same time

ArgoCD ApplicationSets: this is a fairly new addition to ArgoCD which addresses the multi-cluster deployment scenario. You can use ApplicationSets without ACM, but ACM auto-configures the sets to leverage your existing cluster groups, so you don’t have to maintain that in two places

ApplicationSets are the recommended way but Subscriptions have some cool features which are not available with ArgoCD like scheduled deployments.
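
For illustration, here’s a minimal ApplicationSet using ArgoCD’s plain cluster generator (ACM can instead wire in its placement-based generator for you); the repository URL, label, and namespaces are placeholders:

  apiVersion: argoproj.io/v1alpha1
  kind: ApplicationSet
  metadata:
    name: bt-edge-app
    namespace: openshift-gitops
  spec:
    generators:
      - clusters:
          selector:
            matchLabels:
              environment: store            # placeholder label on registered clusters
    template:
      metadata:
        name: 'bt-app-{{name}}'
      spec:
        project: default
        source:
          repoURL: https://git.example.com/retail/edge-apps.git   # placeholder repository
          targetRevision: prod
          path: bt-app
        destination:
          server: '{{server}}'
          namespace: bt-app
        syncPolicy:
          automated:
            prune: true
            selfHeal: true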

I have one main concern with ApplicationSets, and it’s the fact that it relies on a push model. ApplicationSets generates one ArgoCD Application per target cluster on the hub and syncs the changes from the one ArgoCD server on the hub. When you’re talking 100s or 1000s of clusters it will put a lot of load on the hub.

Another potential issue is ingress networking for the retail locations to allow the hub to push things out to the spoke. It’s usually easier to configure a NAT on the local network, going out to the internet to pull the manifests from Git.

So… in this case, we’re just not going to use any of the ACM-provided methods and stick to our approach where each cluster manages its own ArgoCD server 😀

But… we can still use ACM for visibility. The ACM application page can show you all the workloads deployed on all your clusters in one place, show you deployment status and basic information, and makes it very easy to drill down into logs, for example, and navigate to each individual cluster’s ArgoCD UI.

(Screenshot: ACM application dashboard)

Note on cluster pools

I want to go back to cluster pools for a little bit because it’s a really neat feature if you are using Openshift.

This is an example of a cluster pool on AWS:

(Screenshot: AWS cluster pool)

This cluster pool is configured to always have one cluster ready to be assigned. Notice the “claim cluster” link on the right. As soon as a team claims this cluster, the pool will automatically provision a new one, if so configured.

(Screenshot: cluster pool configuration)

In this case the pool is configured with 10 clusters, with one warm cluster (not bootstrapped) available to be claimed immediately at all times.

(Screenshot: cluster pool dashboard)

Here we see 2 clusters provisioned through the pool, which we previously claimed. Notice the auto-generated cluster name indicating the origin of the cluster as rhug-.

Note that clusters deployed on AWS that way are not managed by AWS and Red Hat, unlike AWS ROSA clusters, although they are provisioned using the same installation method. Something to keep in mind. If unmanaged clusters are a deal breaker, you can always provision a cluster on ROSA and register it with the hub explicitly.

There are two great use cases for cluster pools & cluster lifecycle management:

  • Quickly provisioning turnkey new clusters for development teams. If you maintain your shared config repositories correctly, you can build and bootstrap brand new, fully compliant and ready-to-use clusters in a matter of minutes. You can terminate a cluster at the end of the work day and re-create it identically the next morning to save on cost, reset clusters to the base configuration if you messed them up, etc
  • Active-passive disaster recovery across regions or even cloud providers. If you want to save on your redundant environment infrastructure cost, and you don’t have Stateful applications (no storage), create a second cluster pool in the other region/cloud and hibernate your clusters. When you need to fail over, resume the clusters in the standby region, and they will very quickly start and get up-to-date with the last-known good config.

Conclusion

There is so much to discuss on the topics of GitOps and ACM that this is barely scratching the surface, but hopefully it gave you enough information to get started on the right foot and to get excited about the possibilities. Even if you only have one cluster, GitOps is very easy to set up, and you’ll reap the benefits on day 1. I strongly recommend you check out part II of this series for a reference. As a reminder, I also shared a CloudFormation template which deploys an implementation of that architecture using AWS ROSA (minus ACM as of today). When you’re ready to scale, also consider Advanced Cluster Management for the ultimate multi-cluster experience. I haven’t had a chance to include it in my reference implementation yet, but check back later.

Stay tuned for my next GitOps articles on ArgoCD security and custom operators…
