GitLab CI/CD (Continuous Integration/Continuous Deployment) is a powerful, integrated toolset within GitLab that automates the software development lifecycle (SDLC). It simplifies the process of building, testing, and deploying code, enabling teams to deliver high-quality software faster and more efficiently.
Getting started with GitLab CI/CD is simple. Start by creating a GitLab account and setting up a project for your application if you don’t already have one. Then install and configure a GitLab Runner, the tool responsible for executing the tasks defined in your .gitlab-ci.yml file. The runner handles building, testing, and deploying your code, ensuring the pipeline works as intended. This setup streamlines your development process and helps automate workflows efficiently.
A pipeline automates the process of building, testing, and deploying applications. CI (Continuous Integration) means regularly merging code changes into a shared repository. CD (Continuous Deployment/Delivery) automates releasing the application to its target environment.
In this step, you commit any updates or modifications and push your local code changes to the remote repository.
CI Pipeline: Once your code changes are committed and merged, you can run the build and test jobs defined in your pipeline. After completing these jobs, the code is ready to be deployed to staging and production environments.
A .gitlab-ci.yml file in a GitLab repository is used to define the Continuous Integration/Continuous Deployment (CI/CD) pipeline configuration. This file contains instructions on building, testing, and deploying your project.
In GitLab CI/CD, a “runner” refers to the agent that executes the jobs defined in the .gitlab-ci.yml pipeline configuration. Runners can be either shared or specific to the project.
Pipelines are made up of jobs, which define what to do, and stages, which define when to run the jobs.
There are two common ways to trigger a pipeline. The first is on commit: whenever you commit or merge changes into the code, the pipeline is triggered directly. The second is by using rules; for that, you need to create a scheduled job.
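For example, a job can be limited to scheduled pipelines with a rules clause like the one below in .gitlab-ci.yml (the job name and script are placeholders, not taken from the original project):

nightly-job:
  script:
    - echo "Running the scheduled work..."
  rules:
    - if: '$CI_PIPELINE_SOURCE == "schedule"'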
We use scheduled jobs to automate pipeline execution. To create a scheduled job, go to Build > Pipeline schedules (CI/CD > Schedules in older GitLab versions), click New schedule, define a description, a cron interval, a timezone, and the target branch, and then save the schedule.
After installing GitLab Runner, proceed to register it. Navigate to GitLab, go to Settings, then CI/CD, and under Runners, click on the three dots to access the registration options.
Copy the registration command and token shown there.
Run the following command on your EC2 instance and provide the necessary details for configuring the runner based on your requirements:
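For reference, the registration command generally takes this shape (the URL and token are placeholders copied from your project’s runner settings); it will then prompt you for a description, tags, and an executor such as shell or docker:

sudo gitlab-runner register --url https://gitlab.com/ --registration-token <your-registration-token>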
Check that the GitLab Runner service is installed and active using the commands below:
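These checks typically look like the following; both should report that the runner service is running:

sudo gitlab-runner status
sudo systemctl status gitlab-runner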
Also verify that the runner shows as active in GitLab:
Navigate to GitLab, then go to Settings and select GitLab Runners.
Note: If needed, you can add a test job similar to the BUILD and DEPLOY jobs.
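As a minimal sketch of the kind of .gitlab-ci.yml behind these jobs (stage names, job names, and commands are placeholders, not the project’s actual configuration):

stages:
  - build
  - test
  - deploy

build-job:
  stage: build
  script:
    - echo "Compiling the application..."

test-job:
  stage: test
  script:
    - echo "Running tests..."

deploy-job:
  stage: deploy
  script:
    - echo "Deploying to the target environment..."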
Since the Cron job is already configured in the schedule, simply click the Play button to automatically trigger your pipeline.
To check pipeline status, go to Build and then Pipelines. Once the Build Job is successfully completed, the Test Job will start, and once the Test Job is completed, the Deploy Job will start.
We successfully completed BUILD & DEPLOY Jobs.
Conclusion
As we can see, the BUILD & DEPLOY jobs pipeline has successfully passed.
We’ve provided a brief overview of GitLab CI/CD pipelines and a practical demonstration of how its components work together. Hopefully, everything is running smoothly on your end!
Many retailers are embarking on a digital transformation to modernize and scale their order management system (OMS) solution. Built on a modern architecture, the solution wraps Docker containers around order management business services. This architecture streamlines application management and the release of new functionality. The container technology also supports varying levels of technical acumen, business continuity, security, and compliance. If you want to reduce capital and operational expenditures, speed time to market, and improve scalability, elasticity, security, and compliance, you should consider moving your on-premises IBM Sterling application to IBM-supported native SaaS or another cloud solution that best suits your business.
IBM offers retailers three distinct hybrid cloud solutions tailored to their specific needs. The first option involves a do-it-yourself (DIY) approach with containers on any platform. While offering flexibility, it comes with potential downsides such as slower time to market, increased operational costs, and higher risk due to the intricacies of self-managing containerized environments. The second option introduces a more robust solution with IBM Certified Containers deployed using Kubernetes, striking a balance between customization and manageability. Option three, the most advanced choice, employs IBM Certified Containers deployed through the Red Hat OpenShift Container Platform. This enterprise-grade solution prioritizes faster time to market, reduced operational costs, and lower risk, providing a secure and comprehensive hybrid cloud environment for organizations seeking efficiency and reliability in their IT transformation endeavors.
IBM Sterling Order Management certified containers are distributed in the form of three images—om-base, om-app, and om-agent—via the IBM Entitled Registry. This distribution utilizes licensed API keys, streamlining the process for customers to conveniently retrieve and access these containers in their local registries or incorporate them seamlessly into their CI/CD pipelines.
IBM offers its native Software as a Service (SaaS), commonly known as IBM Cloud or CoC, taking on the responsibility for hosting, managing, maintaining, and monitoring the entire Order Management (OM) ecosystem. This allows customers to direct their focus toward achieving their business requirements and enhancing business services. IBM’s ownership and management of the DevOps process facilitate automatic upgrades of the OMS application with new features, alongside activities such as backup, database reorganization, and upgrades/patches for WebSphere Application Server (WAS) Liberty, MQ, DB2, and Red Hat Enterprise Linux (RHEL). The proactive monitoring of system performance, coupled with the establishment of automatic alerts and remediation procedures for instances of high CPU/memory usage, ensures a seamless experience for customers. Convenient access to detailed audits/graphs of system performance is provided through a self-serve tool, complemented by log monitoring via Graylog.
In contrast, three other well-regarded cloud solutions compatible with IBM Sterling Certified containers—Amazon AWS, Microsoft Azure, and Oracle Cloud Infrastructure (OCI)—present unique advantages. However, customers opting for these alternatives bear the responsibility of implementing measures to manage, maintain, and monitor the entire Order Management (OM) ecosystem. This encompasses tasks such as database backups, infrastructure upgrades, and system performance monitoring. Additionally, customers must seamlessly integrate with logging tools of their choice when opting for these alternatives.
In conclusion, the shift towards a modernized and scalable Order Management System (OMS) is becoming imperative for retailers undergoing digital transformation. The adoption of IBM Sterling Certified Containers and Software as a Service (SaaS) solutions presents a strategic pathway to enhance flexibility, speed, efficiency, and security in managing the OMS ecosystem. IBM’s hybrid cloud offerings provide retailers with tailored choices, allowing them to align their preferences with the desired level of customization, manageability, and risk. The option to leverage IBM’s native SaaS or explore alternate cloud solutions like Amazon AWS, Microsoft Azure or Oracle Cloud underscores the adaptability of IBM Sterling solutions to diverse business needs. As retailers navigate the complexities of modernizing their OMS, the comprehensive support provided by IBM’s SaaS offerings stands out, ensuring a secure, efficient, and future-ready infrastructure for their digital endeavors.
Key Links-
Deploy Sterling Order Management on Azure Red Hat OpenShift – IBM Developer
Deploy IBM Sterling Order Management Software in a Virtual Machine on Oracle Cloud Infrastructure
Most server management infrastructure tasks have been automated for some time, but network changes can still create a bottleneck. Red Hat Ansible enables you to automate many IT tasks including cloud provisioning, configuration management, application deployment, and intra-service orchestration. With Ansible you can configure systems, deploy software, and coordinate more advanced IT tasks such as continuous integration/continuous deployment (CI/CD) or zero downtime rolling updates.
Our Ansible Accelerator provides an overview of what Ansible can do to help modernize and streamline your DevOps and IT operations. The accelerator is available at three different intervention levels: a workshop, technical enablement, or full team consulting. In 6-12 weeks, we architect a proof of concept that delivers a more secure, compliant, reliable, and automated solution for you and your business.
Ready to Accelerate?
Red Hat provides open-source technologies that enable strategic cloud-native development, DevOps, and enterprise integration solutions to make it easier for enterprises to work across platforms and environments. As a Red Hat Premier Partner and a Red Hat Apex Partner, we help drive strategic initiatives around cloud-native development, DevOps, and enterprise integration to ensure successful application modernization and cloud implementations and migrations.
Last week, our team attended Red Hat Summit, in Boston, MA. This past event marks the first time Red Hat combined Ansible Fest with Red Hat Summit. During the three-day conference, Red Hat partners, clients, and vendors got together to hear from Red Hat leadership, industry experts, and get hands-on experience with Red Hat platforms. The Perficient team learned about new capabilities and technologies, heard new product announcements, and connected with peers and clients from across industries.
The stars of Ansible Fest at this year’s Red Hat Summit were undoubtedly the imminent general availability of Event-Driven Ansible and the release of a developer plugin called Lightspeed, which takes advantage of generative AI to help speed up automation development.
General availability for Event-Driven Ansible (EDA) is set to be included with the next version of Ansible Automation Platform (AAP) 2.4, which should be available for subscribers beginning in June of this year. So, right around the corner.
EDA introduces the concept of rulebooks, which define sources, rules, and actions to kick off automation. Sources are things like metrics from an APM such as Prometheus or Dynatrace, security events from a SIEM, changes to files, and so on. Rules are the conditions to act on from the source. Finally, actions are the automation tasks to carry out, like running a playbook or launching a job or workflow from AAP thanks to its close integration with the AAP controller.
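As a rough illustration of the format (the webhook source, condition, and playbook name below are invented for the example, not taken from the announcement), a rulebook might look like this:

- name: Respond to monitoring alerts
  hosts: all
  sources:
    - ansible.eda.webhook:
        host: 0.0.0.0
        port: 5000
  rules:
    - name: Remediate critical alerts
      condition: event.payload.severity == "critical"
      action:
        run_playbook:
          name: remediate.yml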
Event-driven content is already being certified by Red Hat for the AAP 2.4 launch. More content will be released in the Automation hub as partners certify and release it.
With Ansible expanding to include the new EDA rulebooks, it’s important to maintain development velocity and quality. Red Hat’s strategy to help is to provide a powerful new VS Code extension leveraging IBM Watson-based generative AI.
Formerly known as “Project Wisdom”, Red Hat will soon be providing a targeted generative AI plugin for Microsoft’s Visual Studio Code editor. The demos presented in Boston last week were exciting. My take was that the plugin works like a slick combination of ChatGPT, IntelliSense, and tab-completion. If the final release is anything like the demos, developers will be able to prompt code generation with the name: block of an Ansible task. Lightspeed will process for a moment and offer generated code complete with fully qualified collection names, parameters, and even variables inferred from vars blocks and imported vars files.
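As a purely hypothetical illustration (this is not an actual Lightspeed suggestion), a developer might type only the first line below, and the plugin would propose a completion along these lines:

- name: Install nginx on the web servers
  ansible.builtin.package:
    name: nginx
    state: present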
Lightspeed is still in a closed beta, so to get on the waiting list to try it out you’ll want to visit https://www.redhat.com/en/engage/project-wisdom#sign-up and make sure to include your GitHub ID.
For anyone concerned about being required to participate in the language model for Lightspeed, Red Hat made it a point to emphasize that the data collection for your Ansible code will be opt-in, meaning data collection is off until explicitly turned on by you, the developer.
Red Hat provides open-source technologies that enable strategic cloud-native development, DevOps, and enterprise integration solutions to make it easier for enterprises to work across platforms and environments. As a Red Hat Premier Partner and a Red Hat Apex Partner, we help drive strategic initiatives around cloud-native development, DevOps, and enterprise integration to ensure successful application modernization and cloud implementations and migrations.
To thoroughly grasp what open source is, one should understand what it is not. Open source is not restricted by licensing agreements, and the user of open-source software is not forbidden to change, edit, study, or redistribute modified versions of it.
Open-source software grants its users a degree of accessibility that is not possible through its proprietary counterpart. Open-source codes are published publicly for all who wish to study and manipulate them, whereas proprietary software keeps users more restricted, inside hard, iron-clad lines.
Richard Stallman, founder of the free Unix-style operating system GNU and a leading voice in the free software movement that underpins open source, asserts that there are four essential freedoms: the freedom to run the program as you wish, the freedom to study and change its source code, the freedom to redistribute copies, and the freedom to distribute modified versions.
These freedoms, for Stallman and open-source advocates everywhere, are part of what makes open-source a huge driver of innovation. Due to its free nature, open source inevitably cultivates collaboration and prompts interaction among those in the software world. Code can be constantly shared in open-source environments. This leads to increased productivity because coders waste less time searching for solutions to problems, and it supports the diversity of skill sets.
If a glitch occurs when using proprietary software, especially in the business realm, one typically must go through many channels to get it fixed; open-source software, on the other hand, gives the user a greater sense of agency over issues and user experience. This is convenient for expert software engineers and is integral for educational purposes, as it allows students to learn through application. Any student of code, whether pursuing a degree in computer science or building a program from scratch as a hobbyist, can click “view source” in their web browser and dive deeply into the recipe of the site’s open-source code.
This education is also driven by the open-source community’s expectation that users will be active participants in its democracy. Open source follows the philosophy that all can contribute to the pot of knowledge, and discoveries should not be withheld under the guise of intellectual property.
Open source empowers the user over the program and encourages the utmost technological collaboration and education. It allows users the liberty to change the source, making it do what they want it to do. Rather than the user remaining stuck inside the constraints instilled by a proprietary developer, the open-source experience allows a higher potential to execute the exact desire of the user. The philosophy of open source flips the notion that one must maneuver code in the bounds of the preexisting and promotes a more dispersed power dynamic.
***
Perficient partners with many open-source companies to deliver innovative, scalable solutions to our clients. Interested in learning more about how your company can reap the benefits of open source? Contact one of our strategists today.
As an IT leader, you know that adopting a multicloud strategy is a must-have in today’s digital landscape, but selecting the right tools can be a bit of a headache. The “IT Leader’s Guide to Multicloud Readiness” is a practical guide that provides key insights and important factors to consider in your multicloud strategy. In the guide, we speak to effective cloud-agnostic tools to help you achieve your goals.
Today, we’ll take a deeper dive into five of the most popular tools mentioned in the guide – Terraform, Azure DevOps, Ansible Automation Platform, Red Hat OpenShift, and CloudBolt – covering their use cases, strengths, and weaknesses to help you determine if they are the right fit for your organization.
Terraform is a popular infrastructure as code (IaC) tool that allows you to run code and deploy infrastructure across multiple cloud platforms. It has gained widespread popularity due to its modern syntax that is easy to read and pick up, even for beginners. Terraform is a powerful tool for provisioning infrastructure, whether it’s in the cloud or on-premises. DevOps teams will love Terraform’s ability to work with CI/CD tools, making infrastructure deployment a breeze. Additionally, Terraform’s cloud-agnostic nature means it can deploy infrastructure across multiple cloud platforms, making it perfect for multi-cloud environments.
Azure DevOps is a cloud-based service that provides a comprehensive platform for software development and delivery. It offers a range of tools and services to help teams plan, build, test, and deploy applications with ease. With Azure DevOps, teams can collaborate seamlessly, streamline their workflows, and deliver high-quality software products faster and more reliably. It offers a range of use cases, such as Continuous Integration and Continuous Deployment (CI/CD), Agile Project Management, Version Control, and Infrastructure as Code (IaC). With Azure DevOps pipelines, teams can create workflows that integrate with other Microsoft tools such as Visual Studio and many cloud platforms and services, allowing them to spend less time on manual work and more time on coding.
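For a sense of what that looks like, here is a minimal azure-pipelines.yml sketch (the trigger branch and the build and test commands are placeholders, not a recommended configuration):

trigger:
  - main

pool:
  vmImage: ubuntu-latest

steps:
  - script: echo "Build the application"
    displayName: Build
  - script: echo "Run the tests"
    displayName: Test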
Red Hat Ansible Automation Platform, formerly known as Ansible Tower, is a popular IT automation tool that allows organizations to manage, automate, and orchestrate their infrastructure from a central location. It provides a web-based interface and REST API that helps users to manage Ansible workflows, job scheduling, and inventory management. Ansible is commonly used in DevOps environments as it enables easy configuration management of infrastructure and applications across multiple environments. With Ansible, users can automate tasks such as deployment, scaling of infrastructure, software updates, security patching, and backups, which saves time and reduces errors.
Ansible is also great for configuration management of infrastructure such as VMs, switches, and load balancers. It allows users to write configurations once and apply them to many different machines, which reduces configuration drift and provides a single source of truth. Additionally, Ansible can integrate with other tools to create more complex workflows and automation, such as triggering another automation job, opening tickets for users who need assistance, or trying a configuration a different way.
Red Hat OpenShift is an enterprise-grade, open-source container application platform that simplifies the building, deployment, and management of containerized applications across a variety of environments. Built on top of Kubernetes, it provides robust security, compliance, and monitoring capabilities, as well as features and tools to enhance the development process. With OpenShift, you can automate many aspects of the software development and deployment process, save time and energy, and build, test, and deploy your applications without worrying about managing underlying infrastructure. OpenShift simplifies the process of deploying and managing containerized applications across a wide range of environments, making it an ideal platform for Application deployment and management, DevOps automation, and Platform as a Service (PaaS) scenarios.
CloudBolt is a cloud management platform designed to help organizations manage their cloud infrastructure across different platforms and providers. It allows teams to provision, manage, and optimize resources in the cloud from a single interface. With CloudBolt, organizations can optimize their cloud spend by providing clear visibility, recommendations, and monitoring tools to help them cut costs and maximize ROI. Additionally, managing resources across multiple cloud providers becomes easier, eliminating the need for different tools and portals. CloudBolt also empowers teams to self-provision resources on demand, reducing the burden on central IT teams.
The tools we discussed above are crucial for IT leaders who are preparing for multicloud adoption. They provide valuable insights and help organizations streamline workload deployment and management, optimize costs, and maintain critical app and service reliability.
For those seeking a deeper dive into multicloud, the ‘IT Leader’s Guide to Multicloud Readiness’ is an excellent resource that offers practical guidance on developing a multicloud strategy that aligns with business objectives, identifying the right cloud providers and services, and aligning IT teams and stakeholders for successful multicloud adoption. With the right tools and guidance, IT leaders can confidently navigate the challenges of multicloud adoption and unlock the full potential of cloud technology.
Ansible Tower users are gearing up for a big migration to Ansible Automation Platform 2. Ansible Tower 3.8 is technically AAP 1.2, which sunsets in September of 2023. AAP has a few usability updates, like the improved job search, which now lets you search the web UI for specific job details like the limit, which is welcome; the stdout and stderr in the UI are also more readable. A Private Automation Hub is available, which acts as a combination container registry and on-LAN Ansible Galaxy server. Automation Hub brings frequently used collections closer to the controllers, which can speed up job execution by removing the need to pull collections directly from the internet. It also hosts the execution environments which AAP uses to execute Ansible plays in containerized runtimes. This is one of the biggest fundamental differences between AAP and the outgoing Tower product: where Tower relies on Python virtual environments to organize competing Python dependencies, its replacement uses container runtimes. The execution environments are more portable than the old virtual environments, which must be created or recreated for each developer. Having a container image that runs Ansible jobs means developers can pull what they need and get to writing their automation and configuration instead of wrangling their different Python environments.
This post will walk through a typical Python virtual environment and a simple execution environment creation and execution. At the end, I’ll demonstrate Ansible’s text-based user interface (TUI) with ansible-navigator. There is a lot more to the tool than what I talk about here. This post also assumes a basic familiarity with Python, Red Hat Enterprise Linux, and container tools like Docker or Podman. I’d encourage anyone working with AAP or AWX to also look into ansible-builder for building execution environments. For now, we’re just customizing containers and using them to run Ansible jobs.
Until recently, developing Ansible automation has been pretty straightforward once you get the hang of it. For the most part it’s just a matter of writing some YAML, double-checking your spacing, maybe installing a collection or two, and testing. But what if one of your collections requires a Python package outside of the standard library? Your best bet is usually to create a Python virtual environment that contains everything you need for your plays. A Python virtual environment is exactly what it sounds like: a way to isolate your Python development environment with packages that are only available when the environment is active, keeping unnecessary Python packages from polluting your system Python environment.
To create a virtual environment, just run:
python3 -m venv ansible-test
Then activate it by sourcing the activate script
. ansible-test/bin/activate
(note the leading dot)
With the virtual environment active, you can now install any prerequisite packages you might need to support a Galaxy collection, plus the collection itself. For instance, the MySQL collection has both system and Python dependencies. For Red Hat based distros, the system packages are gcc, python-devel, and mysql-devel. Additionally, there are two Python package requirements: PyMySQL and mysqlclient. We’ll want to install the system packages with dnf/yum, then pip install our Python dependencies. Here’s our barebones requirements.txt for pip packages and requirements.yml for Galaxy content; we’ll use these later in the execution environment as well:
requirements.txt
PyMySQL
mysqlclient
requirements.yml
collections:
  - name: community.mysql
    version: 3.6.0
Now we install our system dependencies with:
dnf install gcc mysql-devel python-devel -y
And our Python dependencies with:
pip install -r requirements.txt && ansible-galaxy collection install -r requirements.yml
So, not too bad. Just set up a virtual environment, activate it, install your requirements, and get to developing. But what if you don’t want to install system packages? Maybe they conflict with something else installed on your development machine. How do you collaborate with your team? How do you keep your virtual environments in sync? Of course you can use Ansible to create a virtual environment with the pip module, but there might be a better way altogether using a containerized execution environment.
If you’re just getting started with Ansible today, or if your organization is using AAP or AWX, you might want to look at the latest Ansible Content Navigator tool: ansible-navigator. Ansible-navigator combines a lot of the break-out CLI commands listed earlier into a single executable and provides an optional TUI interface to drill down into playbook execution. Better yet, it eliminates the need for a Python virtual environment and swaps it for a more portable and modern containerized execution environment (EE). It’s still on the developer to customize the execution environment, but the upside is you can now push the whole environment to a registry, and the Ansible content you write will run the same from anywhere that container can run. This is how Red Hat’s AAP and the upstream AWX work, so if you’re using one of those you’ll want to be sure your dev environment is consistent with your production automation platform. Developing automation using the same container image that the controller uses is the trick.
AAP comes with a few standard execution environments out of the box that automation developers can pull to their dev box. Each image is an ansible aware container with some base collections to run your playbooks. The base image I’m using in this example is at quay.io/ansible/creator-ee. It’s got a few base collections, ansible-core, and an ansible-runner to execute the plays. All the same container customizations apply here as they would with any other container image. Future posts might go into using ansible-builder, but for today I’m sticking to plain vanilla container customization.
Let’s take that MySQL example, for instance. Here’s an example Containerfile that we might use to get started running our community.mysql plays:
MySQLContainerfile
FROM quay.io/ansible/creator-ee
COPY requirements.txt .
COPY requirements.yml .
RUN microdnf install gcc python3-devel mysql-devel -y
RUN pip install --upgrade pip
RUN pip install -r requirements.txt
RUN ansible-galaxy collection install -r requirements.yml -p /usr/share/ansible/collections
Note: Here we’ve offloaded those system-wide packages to the container instead of our own system. Also I’ve instructed ansible-galaxy to install the collection to the container’s collections directory inside the image. This ensures the collection persists beyond the initial image creation. It’s where the rest of the default collections like ansible.posix and kubernetes.core are, so it’s good enough for me for now.
Save that to a containerfile called MySQLContainerfile (or whatever you want to call it) and build your image. I’m using podman here, but feel free to use docker if that’s your jam.
podman build -t registry.example.com/ansible-demo/mysql-ee:0.0.1 -f MySQLContainerfile
Now we can create and test our plays using the new execution environment we just created, and if all goes well, we’ll push the image to our registry to be used by other developers or made available to AAP.
podman push registry.example.com/ansible-demo/mysql-ee:0.0.1
Let’s start with a simple play that installs MariaDB (an open fork of MySQL), initializes a new schema, and adds a user that can connect with a password on the localhost and from our Ansible controller.
Here’s the playbook itself:
mysqldb.yml
---
- name: Install and initialize a mysql database
  hosts: db
  become: true
  vars_files:
    - secrets.yml
  tasks:
    - name: Install SQL packages
      ansible.builtin.package:
        name: "{{ db_packages }}"
        state: present

    - name: Open Host Firewall for SQL Connections
      ansible.posix.firewalld:
        service: mysql
        permanent: true
        immediate: true
        state: enabled

    - name: Start SQL Server Service
      ansible.builtin.service:
        name: "{{ db_service }}"
        state: started

    - name: Create .my.cnf
      ansible.builtin.template:
        src: templates/my.j2
        dest: "{{ db_super_path }}.my.cnf"

    - name: Create Database
      community.mysql.mysql_db:
        login_unix_socket: /var/lib/mysql/mysql.sock
        name: "{{ db_name }}"
        state: present

    - name: Add user to {{ db_name }}
      community.mysql.mysql_user:
        login_unix_socket: /var/lib/mysql/mysql.sock
        name: "{{ db_user }}"
        password: "{{ db_pass }}"
        priv: '{{ db_name }}.*:ALL'
        host: "{{ item }}"
        state: present
      loop: "{{ db_hosts }}"
And here is my secrets.yml file, which I’m decrypting by passing --vault-id to the ansible-navigator command. More on that in just a bit.
Here’s secrets.yml with swapped-out passwords. Please don’t use passwords this crummy.
secrets.yml
db_user: "demo_user"
db_pass: "password123"
db_super_pass: "$uper$ecure"
Finally, we just have a simple template file to create root’s .my.cnf credential file in the fourth task.
templates/my.j2
[client]
user=root
password={{ db_super_pass }}
I’m including the secrets here because using ansible-vault with ansible-navigator can be a little tricky but easy to demonstrate.
For the decryption, I’ve just set an environment variable
export ANSIBLE_VAULT_PASSWORD=myverysecurepassword
And I have a simple bash script that I pass to navigator, which just echoes that back out.
vault-pass.sh
#!/bin/bash
echo ${ANSIBLE_VAULT_PASSWORD}
Now we can run our playbook with the following command, using the containerized execution environment we built and made available to AAP:
ansible-navigator run mysqldb.yml --eei registry.example.com/ansible-demo/mysql-ee:0.0.1 --vault-id "vault-pass.sh" -m stdout
PLAY [Install and initialize a mysql database] *********************************

TASK [Gathering Facts] *********************************************************
ok: [fedora1]

TASK [Install MySQL package] ***************************************************
changed: [fedora1]

TASK [Open Host Firewall for SQL Connections] **********************************
changed: [fedora1]

TASK [Start SQL Server Service] ************************************************
changed: [fedora1]

TASK [Copy .my.cnf] ************************************************************
changed: [fedora1]

TASK [Create Database] *********************************************************
changed: [fedora1]

TASK [Add user to demo_db] *****************************************************
changed: [fedora1] => (item=192.168.64.2)
changed: [fedora1] => (item=localhost)

PLAY RECAP *********************************************************************
fedora1 : ok=8 changed=7 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
Passing -m stdout displays the results in the traditional Ansible format. To use the TUI interface instead, leave off the -m stdout.
Just type the number of the row you want to look into (use esc to go back).
  Play name                                  Ok  Changed  Unreachable  Failed  Skipped  Ignored  In progress  Task count  Progress
0│Install and initialize a mysql database     7        6            0       0        0        0            0           7  Complete

^b/PgUp page up  ^f/PgDn page down  ↑↓ scroll  esc back  [0-9] goto  :help help
Successful
Typing 0 here will take us into the play execution below:
  Result  Host     Number  Changed  Task                                    Task action                 Duration
0│Ok      fedora1       0  False    Gathering Facts                         gather_facts                1s
1│Ok      fedora1       1  True     Install MySQL package                   ansible.builtin.package    9s
2│Ok      fedora1       2  True     Open Host Firewall for SQL Connections  ansible.posix.firewalld     0s
3│Ok      fedora1       3  True     Start SQL Server Service                ansible.builtin.service     3s
4│Ok      fedora1       4  True     Copy .my.cnf                            ansible.builtin.template    1s
5│Ok      fedora1       5  True     Create Database                         community.mysql.mysql_db    0s
6│Ok      fedora1       6  True     Add user to demo_db                     community.mysql.mysql_user  1s

^b/PgUp page up  ^f/PgDn page down  ↑↓ scroll  esc back  [0-9] goto  :help help
Successful
Now let’s look at line 5 to see more information on the Create Database task.
The output below shows us all of the parameters and results of a given task in YAML format:
Play name: Install and initialize a mysql database:5
Task name: Create Database
CHANGED: fedora1
 0│---
 1│duration: 0.392973
 2│end: '2023-05-02T07:32:19.647181'
 3│event_loop: null
 4│host: fedora1
 5│play: Install and initialize a mysql database
 6│play_pattern: db
 7│playbook: /home/lxadmin/mysql/mysqldb.yml
 8│remote_addr: fedora1
 9│res:
10│  _ansible_no_log: null
11│  changed: true
12│  db: demo_db
13│  db_list:
14│  - demo_db
15│  executed_commands:
16│  - CREATE DATABASE `demo_db`
17│  invocation:
18│    module_args:
19│      ca_cert: null
20│      chdir: null
21│      check_hostname: null
22│      check_implicit_admin: false
23│      client_cert: null
24│      client_key: null
25│      collation: ''

^b/PgUp page up  ^f/PgDn page down  ↑↓ scroll  esc back  - previous  + next  [0-9] goto  :help help
Successful
Finally, let’s test the MySQL connection from our Ansible controller, using our provisioned user to connect to the new database:
mysql -u demo_user -p -h fedora1
Enter password:
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 8
Server version: 5.5.5-10.5.18-MariaDB MariaDB Server

Copyright (c) 2000, 2022, Oracle and/or its affiliates.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> use demo_db;
Database changed
So there’s a little more setup on the front end to get started using execution environments over the old Python virtual environments, but I think it’s worth trying out, especially for teams who are really digging into what AAP and AWX offer and want that consistent development environment.
Red Hat Summit is only a few weeks away! The conference will take place on May 23-25, 2023, in Boston, Massachusetts, at the Boston Convention and Exhibition Center. Still not registered? Click here!
Red Hat Summit is the premier open-source technology event that brings together customers, partners, industry thought leaders, and community contributors to learn, network, and experience the full potential of open source.
Be sure to visit us at booth #116, where our Red Hat experts will answer your questions about Ansible and OpenShift and showcase how Perficient can help you succeed with Red Hat. As a Red Hat Premier Partner and a Red Hat Apex Partner, Perficient helps drive strategic initiatives around cloud-native development, DevOps, and enterprise integration to ensure successful application modernization and cloud implementations and migrations.
Are you attending Red Hat Summit? Reach out to connect with our team.
Recently, I attended the 2023 Bank Automation Summit, where one of the significant topics of discussion was how banks navigate their transition to the cloud.
The “cloud” refers to a global network of servers, each with a unique function, that work in tandem to enable users to access files stored within from any approved device. The computing and storage of cloud data occur in a data center rather than on a locally sourced device.
Cloud computing makes data more accessible, cheaper, and scalable. For these reasons, Gartner predicts that by 2025, 85% of enterprises will have a cloud-first principle. However, due to their sensitive and regulated natures, some industries – especially the financial services industry – have had more complicated cloud transformation journeys than others.
Traditionally, information was said to be most secure when separated and segmented. However, the cloud’s structure makes data segmentation more complex and potentially more vulnerable if the correct security measures are not followed. For example, as a start, companies should leverage the cloud for initiatives surrounding verification methods, access security, and anti-phishing training.
Migration to the cloud and data transformation does not need to happen overnight. Especially for larger, older institutions, it might take some warming up to cloud-based applications before adopting them at full capacity. And, many have the impression that everything should move to the cloud, but depending on an institution’s needs, it might make sense for them to keep certain things on premises. Institutions should implement cloud technologies in a way that makes sense for their needs. For many, this means starting their journey using microservices.
Financial services institutions must be hypervigilant regarding where customer data is located, who has data access, and how data is managed in a cloud environment. There are also certain global and regional regulatory considerations for migrating to the cloud in a phased approach, and institutions must have a thorough understanding and awareness of the implications.
Interested in discussing how Perficient can support your cloud transformation journey? Contact one of our experts today.
]]>IT organizations are responsible for an ever-increasing number of applications, infrastructure providers, operating system versions, network devices, platforms, monitoring, ticket systems, and more. The challenges that go along with managing it can range anywhere from time-consuming to downright Sisyphean. The rising adoption of cloud services adds a financial component, a new challenge for many organizations starting their cloud journey. It’s more important than ever for organizations to know as much as possible about their infrastructure, how it’s configured, and how it’s all integrated.
There are many enterprise organizations that have long-standing legacy technology which can’t be containerized or launched in the cloud. The idea that servers should be cattle and not pets is a fantastic goal, but sometimes that livestock gets a name and special treatment, turning it into a big pet. There’s a constellation of IoT devices out there that might fall under one regulatory agency’s OT security guidelines or another. IT engineers need to be able to keep their systems and applications flowing with changing business needs, security updates, and regulatory controls. If you’re looking for a solution to all these problems, that’s where Ansible comes in.
At its heart, Ansible is a configuration management and automation tool written in Python. That doesn’t mean Ansible developers need to know anything about the Python language to use it (although it is extensible with plugins and custom modules); instead, automation definitions are written in YAML. Sorry, there’s no escaping YAML in today’s IT landscape. Like it or not, it’s the language of configuration for now at least.
Teams using Ansible can define and execute desired states for devices, automate the installation of tools to support an application, and even deploy and configure the application itself using the same tool. Need to update a ServiceNow ticket after modifying a config file on a prod instance? Or add a Jira task if something that wasn’t accounted for pops up? Ansible has modules for that.
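As a hedged sketch of that pattern (the file path, host group, and ticket fields below are invented for illustration, and the ticketing task assumes the servicenow.itsm collection is installed and configured):

- name: Patch a config file and record the change
  hosts: prod_web
  become: true
  tasks:
    - name: Ensure the connection limit is set
      ansible.builtin.lineinfile:
        path: /etc/myapp/app.conf
        regexp: '^max_connections='
        line: 'max_connections=500'
      register: config_change

    - name: Open a ServiceNow incident if the file had drifted
      servicenow.itsm.incident:
        state: new
        caller: ansible.automation
        short_description: "app.conf drift corrected on {{ inventory_hostname }}"
      delegate_to: localhost
      when: config_change.changed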
Think of a traditional IT application deployment on new infrastructure – let’s say a web server running a simple Flask app in the DMZ VLAN feeding off a PostgreSQL database on the internal VLAN. The Dev team has tested their code and hands it off to the operations team to deploy it on the prod servers with some step-by-step instructions as to what goes where, what required services need to be in place, required versions, and so on. Operations needs to prepare those servers in accordance with their own guidelines, install the Dev team’s prerequisites, then deploy the application. Meanwhile, network engineers need to ensure that the servers have valid IP addresses and that the firewalls on both sides of the DMZ are allowing the correct traffic though so that users can get to the app, and the web server can talk to the database.
What if instead of step-by-step instructions, it was a simple Ansible role that could be called from a playbook along with the network team’s IP and firewall roles and operations server-compliance configuration? Now everything needed to build that application is defined in code, packaged together, and tracked in source control. Ansible enables teams to do just that. When done carefully, Ansible playbooks and roles can be self-documenting. Ansible has a shallow learning curve, fantastic documentation, and a no-cost barrier to entry if using ansible-core to get started.
Ansible core can take a small team a long way. Larger teams and teams who might be outgrowing the command-line-only Ansible tools will want to look at Red Hat’s Ansible Automation Platform. Ansible Automation Platform (AAP for short) is a full suite of tools that expands on the capabilities of Ansible core, including a web-based automation controller with role-based access control, job scheduling, and a REST API; a private Automation Hub for hosting certified collections and execution environments; and analytics backed by Red Hat support.
Red Hat provides open-source technologies that enable strategic cloud-native development, DevOps, and enterprise integration solutions to make it easier for enterprises to work across platforms and environments. As a Red Hat Premier Partner and a Red Hat Apex Partner, we help drive strategic initiatives around cloud-native development, DevOps, and enterprise integration to ensure successful application modernization and cloud implementations and migrations.
Whether you have already adopted Openshift or are considering it, this article will help you increase your ROI and productivity by listing the 12 essential features included with any Openshift subscription. This is where Openshift shines as a platform when compared to pure Kubernetes engine distributions like EKS, AKS, etc., which are more barebones and require quite a bit of setup to be production- and/or enterprise-ready. When you consider the total value of Openshift and factor in the total cost of ownership of the alternatives, Openshift is a very competitive option, not only for cost-conscious buyers but also for organizations that like to get things done, and get things done the right way. Here we go:
Special Bonus: API Management
If you want an easy way to manage your Openshift cloud infra, these managed Openshift solutions (such as ROSA on AWS and ARO on Azure) are an excellent value and a great way to get ROI fast. Paying as you go on the hyperscaler’s infrastructure, you can save a ton of money by using reserved instances with a one-year commitment. RedHat manages the control plane (master and infra nodes) and you pay a small fee per worker. We like the seamless integration with native hyperscaler services like storage and node pools for easy autoscaling, zone awareness for HA, networking, and RBAC security with IAM or AAD. Definitely worth a consideration over the EKS/AKS, etc. solutions, which are more barebones.
Check out our Openshift Spring Boot Accelerator for ROSA, which leverages most of the tools I’m introducing down below…
Available by default on Openshift, the OperatorHub is pretty much the app store for Kubernetes. Operators manage the installation, upgrade, and lifecycle of complex Kubernetes-based solutions like the tools we’re going to present in this list. They are also based on the controller pattern, which is at the core of the Kubernetes architecture, and they enable declarative configuration through the use of Custom Resource Definitions (CRDs). Operators are a very common way to distribute 3rd party software nowadays, and the Operator Framework makes it easy to create custom controllers to automate common Kubernetes operations tasks in your organization.
The OperatorHub included with Openshift out of the box allows you to install said 3rd party tools with the click of a button, so you can set up a full-featured cluster in just minutes instead of spending days, weeks, or months gathering installation packages from all over. The Operator Framework supports Helm, Ansible, and plain Go based controllers to manage your own CRDs and extend the Kubernetes APIs. At Perficient, we leverage custom operators to codify operations of high-level resources like a SpringBootApp. To me, Operators represent the pinnacle of devsecops automation, or at least a giant leap forward.
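Under the hood, installing an operator boils down to a small manifest. Here’s a sketch of a Subscription for the GitOps operator discussed next (the channel and namespaces follow common defaults and may differ in your catalog):

apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: openshift-gitops-operator
  namespace: openshift-operators
spec:
  channel: latest
  name: openshift-gitops-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace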
The first thing you should install on your clusters to centralize the management of your cluster configuration with Git is GitOps. GitOps is Red Hat’s distribution of ArgoCD, delivered as an Operator, and it integrates seamlessly with Openshift RBAC and single sign-on authentication. Instead of relying on a CI/CD pipeline and the oc (kubectl) CLI to implement changes in your clusters, ArgoCD works as an agent running on your cluster which automatically pulls your configuration manifests from a Git repository. This is the single most important tool in my opinion, for many reasons.
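As a minimal sketch (the repository URL, path, and destination namespace are placeholders), an ArgoCD Application that keeps a cluster in sync with a Git repo looks like this:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: cluster-config
  namespace: openshift-gitops
spec:
  project: default
  source:
    repoURL: https://git.example.com/platform/cluster-config.git
    targetRevision: main
    path: overlays/prod
  destination:
    server: https://kubernetes.default.svc
    namespace: openshift-gitops
  syncPolicy:
    automated:
      prune: true
      selfHeal: true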
I have a very detailed series on GitOps on my Perficient’s blog, this is a must-read whether you’re new to Openshift or not.
Openshift comes with a pre-configured monitoring stack powered by Prometheus and Grafana. Openshift Monitoring manages the collection and visualization of internal metrics like resource utilization, which can be leveraged to create alerts and used as the source of data for autoscaling. This is generally a cheaper and more powerful alternative to the native monitoring systems provided by the hyperscalers like CloudWatch and Azure Monitoring. Like other RedHat’s managed operators, it comes already integrated with Openshift RBAC and authentication. The best part is it can be managed through GitOps by using the provided, super simple CRDs.
A lesser-known feature is the ability to leverage Cluster Monitoring to collect your own application metrics. This is called user workload monitoring and can be enabled with one line in a manifest file. You can then create ServiceMonitor resources to indicate where Prometheus can scrape your application’s custom metrics, which can then be used to build custom alerts, framework-aware dashboards and, best of all, serve as a source for autoscaling (beyond CPU/memory). All with a declarative approach which you can manage across clusters with GitOps!
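Here’s roughly what both pieces look like (the application namespace, labels, and port name are placeholders): the one-line enablement in the cluster monitoring ConfigMap, followed by a ServiceMonitor pointing Prometheus at an application’s metrics endpoint.

apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    enableUserWorkload: true
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app
  namespace: my-namespace
spec:
  endpoints:
    - port: metrics
      interval: 30s
  selector:
    matchLabels:
      app: my-app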
Based on a Fluentd-Elasticsearch stack, Cluster Logging can be deployed through the OperatorHub and comes with production-ready configuration to collect logs from the Kubernetes engine as well as all your custom workloads in one place. Like Cluster Monitoring, Cluster Logging is generally a much cheaper and more powerful alternative to the hyperscalers’ native services. Again, the integration with Openshift RBAC and single sign-on makes it very easy to secure on day one. The built-in Kibana deployment allows you to visualize all your logs through a web browser without requiring access to the Kubernetes API or CLI. The ability to visualize logs from multiple pods simultaneously, sort and filter messages based on specific fields, and create custom analytics dashboards makes Cluster Logging a must-have.
Another feature of Cluster Logging is log forwarding. Through a simple ClusterLogForwarder CRD, you can easily (and through GitOps, too!) forward logs to external systems for additional processing such as real-time notifications, anomaly detection, or simply integration with the rest of your organization’s logging systems. A great use case of log forwarding is to selectively send log messages to a central location, which is invaluable when managing multiple clusters in an active-active configuration, for example.
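A minimal forwarder sketch (the output type and URL are placeholders for whatever system you forward to) looks something like this:

apiVersion: logging.openshift.io/v1
kind: ClusterLogForwarder
metadata:
  name: instance
  namespace: openshift-logging
spec:
  outputs:
    - name: central-logs
      type: loki
      url: https://loki.example.com
  pipelines:
    - name: forward-app-logs
      inputRefs:
        - application
      outputRefs:
        - central-logs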
Last but not least is the addition of custom Elasticsearch index schema in recent versions, which allows developers to output structured log messages (JSON) and build application-aware dashboards and analytics. This feature is invaluable when it comes to filtering log messages based on custom fields like log levels, or a trace ID, to track logs across distributed transactions (think Kafka messages transiting through multiple topics and consumers). Bonus points for being able to use Elasticsearch as a metrics source for autoscaling with KEDA for example.
Based on Jaeger and Opentracing, Distributed Tracing can again be quickly installed through the OperatorHub and makes implementing Opentracing for your applications ridiculously easy. Just deploy a Jaeger instance in your namespace, and you can annotate any Deployment resource in that namespace with one single line to start collecting your traces. Opentelemetry is invaluable for pinpointing performance bottlenecks in distributed systems. Alongside Cluster Logging with structured logs as mentioned above, it makes up a complete solution for troubleshooting transactions across multiple services if you just log your Opentracing trace IDs.
Openshift Distributed Tracing also integrates with Service Mesh, which we’ll introduce further down, to monitor and troubleshoot traffic between services inside a mesh, even for applications which are not configured with Opentelemetry to begin with.
Based on Tekton, Openshift Pipelines allows you to create declarative pipelines for all kinds of purposes. Pipelines are the recommended way to create CI/CD workflows and replace the original Jenkins integration. The granular, declarative nature of Tekton makes creating reusable pipeline steps, tasks, and entire pipelines a breeze, and again, they can be managed through GitOps (!) and custom operators. Openshift Pipelines can be deployed through the OperatorHub in one click and comes with a very intuitive (Jenkins-like) UI and pre-defined tasks like S2I to containerize applications easily. Creating custom tasks is a breeze as tasks are simply containers, which allows you to leverage the massive ecosystem of 3rd party containers without having to install anything additional.
You can use Openshift pipelines for any kind of workflow, from standard ci/cd for application deployments to on demand integration tests, to executing operations maintenance tasks, or even step functions. As Openshift native, Pipelines are very scalable as they leverage the Openshift infrastructure to execute tasks on pods, which can be very finely tuned for maximum performance and high availability, integrate with the Openshift RBAC and storage.
Openshift supports the three types of autoscalers: the horizontal pod autoscaler, the vertical pod autoscaler, and the cluster autoscaler. The horizontal pod autoscaler is included OOTB alongside the node autoscaler, and the vertical pod autoscaler can be installed through the OperatorHub.
The horizontal pod autoscaler is a controller which increases and decreases the number of pod replicas for a deployment based on CPU and memory metrics thresholds. It leverages Cluster Monitoring to source the Kubernetes pod metrics from the included Prometheus server and can be extended to use custom application metrics. The HPA is great for scaling stateless REST services up and down to maximize utilization and increase responsiveness during traffic spikes.
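A typical CPU-based HPA manifest looks like this (the deployment name, replica bounds, and utilization target are placeholders):

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-rest-service
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-rest-service
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70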
The vertical pod autoscaler is another controller which analyzes utilization patterns to optimize pod resource configuration. It automatically tweaks your deployment’s CPU and memory requests to reduce waste or undercommitment and ensure maximum performance. It’s worth noting that a drawback of VPA is that pods have to be shut down and replaced during scaling operations. Use with caution.
Finally, the cluster autoscaler is used to increase or decrease the number of nodes (machines) in the cluster to adapt to the number of pods and requested resources. The cluster autoscaler, paired with the hyperscaler integration with machine pools, can automatically create new nodes when additional space is required and remove nodes when the load decreases. There are a lot of considerations to account for before turning on cluster autoscaling, related to cost, stateful workloads requiring local storage, multi-zone setups, etc. Use with caution here too.
Special mention for KEDA, which is not commercially supported by RedHat (yet), although it is actually a RedHat-Microsoft-led project. KEDA is an event-driven scaler which sits on top of the built-in HPA and provides extensions to integrate with 3rd party metrics-aggregating systems like Prometheus, Datadog, Azure App Insights, and many more. It’s best known for autoscaling serverless or event-driven applications backed by tools like Kafka, AMQ, or Azure Event Hubs, but it’s very useful for autoscaling REST services as well. Really cool tech if you want to move your existing AWS Lambda or Azure Functions over to Kubernetes.
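For example, a KEDA ScaledObject that scales a Kafka consumer on lag might look like this (the broker address, topic, consumer group, and thresholds are placeholders):

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: order-consumer
spec:
  scaleTargetRef:
    name: order-consumer
  minReplicaCount: 0
  maxReplicaCount: 20
  triggers:
    - type: kafka
      metadata:
        bootstrapServers: kafka.example.com:9092
        consumerGroup: order-consumer
        topic: orders
        lagThreshold: "50"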
Service Mesh is supported by default and can also be installed through the OperatorHub. It leverages Istio and integrates nicely with other Openshift operators such as Distributed Tracing, Monitoring & Logging, as well as SSO. Service Mesh serves many different functions that you might be managing inside your application today (for example, if you’re using Netflix OSS components like Eureka, Hystrix, or Ribbon): service discovery, client-side load balancing, retries and circuit breaking, mutual TLS between services, and fine-grained traffic routing.
You don’t even need to use microservices to take advantage of Service Mesh, a lot of these features apply to re-platformed monoliths as well.
Finally you can leverage Service Mesh as a simple API Management tool thanks to the Ingress Gateway components, in order to expose APIs outside of the cluster behind a single pane of glass.
Now we’re getting into real modern application development and deployment. If you want peak performance, maximum use of your compute resources, and/or lower costs, serverless is the way to go for APIs. Openshift Serverless is based on KNative and provides two main components: Serving and Eventing. Serving handles autoscaling and basic routing for HTTP API containers, while Eventing is for event-driven architecture with CloudEvents.
If you’re familiar with AWS Lambda or Azure Functions, Serverless is the equivalent in the Kubernetes world, and there are ways to migrate from one to the other if you want to leverage more Kubernetes in your infrastructure.
We can build a similar solution with some of the tools we already discussed like KEDA and Service Mesh, but KNative is a more opinionated model for HTTP-based serverless applications. You will get better results with KNative if you’re starting from scratch.
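For reference, a minimal KNative Service sketch looks like this (the image and concurrency target are placeholders); Serving scales it based on incoming requests, down to zero when idle:

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: hello-api
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/target: "50"
    spec:
      containers:
        - image: quay.io/example/hello-api:latest
          ports:
            - containerPort: 8080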
The big new thing is eventing which promotes a message-based approach to service-to-service communication (as opposed to point-to-point). If you’ve used that kind of decoupling before, you might have used Kafka, or AWS SQS or other types of queues to decouple your applications, and maybe Mulesoft or Spring Integration or Camel (Fuse) to produce and consume messages. KNative eventing is a unified model for message format with CloudEvent and abstracts the transport layer with a concept called event mesh. Check it out: https://knative.dev/docs/eventing/event-mesh/#knative-event-mesh.
One of the first things to address when deploying applications to Kubernetes is the management of sensitive configuration variables like passwords to external systems. Though Openshift doesn’t officially support loading secrets from external vaults, there are widely used solutions that are easily set up on Openshift clusters, such as the External Secrets Operator, HashiCorp Vault’s agent injector, and Sealed Secrets.
Each has its pros and cons depending on whether you’re in the cloud, whether you use GitOps, your organization’s policies, existing secrets management processes, etc. If you’re starting from scratch and are not sure which one to use, I recommend starting with External Secrets and your cloud provider’s secret store, like AWS Secrets Manager or Azure Key Vault.
If you’re running on AWS or Azure, each cloud provider has released its own operators to manage cloud infrastructure components through GitOps (think vaults, databases, disks, etc.), allowing you to consolidate all your cloud configuration in one place instead of using additional tools like Terraform and CI/CD. This is particularly useful when automating integration or end-to-end tests with ephemeral Helm charts to set up various components of an application.
For MuleSoft, Boomi, or Cloud Pak for Integration customers: this is an add-on, but it's well worth considering if you want to reduce your API management (APIM) costs: Red Hat Application Foundations and Red Hat Integration. These suites include a lot of great tech, such as Kafka (with a schema registry) and AMQ, SSO (SAML, OIDC, OAuth), runtimes like Quarkus, Spring, and Camel, 3scale for API management (usage plans, keys, etc.), change data capture, caching, and more.
Again because it’s all packaged as an operator, you can install and start using all these things in just a few minutes, with the declarative configuration goodness that enables GitOps and custom operators.
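For example, installing one of these operators declaratively comes down to a small Subscription manifest that you can keep in Git alongside the rest of your configuration (the package and channel below are assumptions; check the OperatorHub entry for the exact values):

```yaml
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: amq-streams                # e.g. the AMQ Streams (Kafka) operator
  namespace: openshift-operators
spec:
  channel: stable                  # assumed channel name; varies per operator
  name: amq-streams                # package name from the catalog
  source: redhat-operators
  sourceNamespace: openshift-marketplace
```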
One of the big drivers for adopting containers to deploy microservices is the elasticity provided by platforms like Kubernetes. The ability to quickly scale applications up and down according to current demand can cut your spending by more than half and add a few 9s to your SLAs. Because it's so easy to set up nowadays, there's really no good reason for autoscaling not to be one of your top priorities for a successful Kubernetes adoption. In this post I'm going to give you six easy steps to establish a solid autoscaling foundation using KEDA, and trust me, you'll go a long way with just these basic principles.
Before you jump into autoscaling, keep one thing in mind: autoscaling is not the solution to poor application performance problems, so optimize your application first.
Now that that's out of the way, let's get started…
Let’s super quickly review the different types of autoscaling available for Kubernetes:
Vertical Autoscaling: resizes individual pods to increase load capacity. Great for rightsizing applications that don't scale horizontally easily, such as stateful services (databases, for example) or applications that are CPU- or memory-bound in general. Scaling a pod vertically requires replacing the pod, which might cause downtime. Note that for certain types of services, resizing a pod might have no effect at all on its capacity to process more requests. Spring Boot services, for example, have a set number of worker threads per instance, so you would need to explicitly increase the thread count to take advantage of the additional CPU (see the configuration sketch after this list).
Horizontal Autoscaling: creates additional identical pods to increase overall load capacity. The best option to use whenever possible in order to optimize pod density on a node. It supports CPU- and memory-based scaling out of the box as well as custom metrics, and is well suited for stateless services and event-driven consumers.
Node Autoscaling: creates additional identical nodes (machines) in order to run more pods when existing nodes are at capacity. This is a great companion for horizontal autoscaling, but there are many considerations to take into account before turning it on. The two main concerns are waste (new nodes might get provisioned for only a minor capacity increase) and scaling down (nodes may be running stateful pods tied to specific zones).
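As an aside, here's a sketch of the kind of Spring Boot (embedded Tomcat) setting mentioned under vertical autoscaling; the value is hypothetical and would need to be sized alongside the pod's CPU request:

```yaml
# application.yml -- the default Tomcat worker pool is 200 threads regardless of CPU,
# so giving the pod more CPU alone won't increase request concurrency.
server:
  tomcat:
    threads:
      max: 400        # hypothetical value; raise together with the CPU request
```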
The rest of this article focuses on horizontal pod autoscaling.
The HPA (Horizontal Pod Autoscaler) ships with Kubernetes and consists of a controller that manages scaling the number of pods in a deployment up and down.
In a nutshell: on a regular interval, the HPA compares an observed metric against a target value and adjusts the deployment's replica count accordingly.
Out of the box, however, the HPA is limited in the metrics it can use: CPU and memory. That's fine if your service is CPU- or memory-bound, but if you want to scale on anything else, you'll need to provide the HPA with a custom API that serves other types of metrics.
This is the basic formula the HPA uses to calculate the desired number of pods to schedule:
desiredReplicas = ceil[currentReplicas * ( currentMetricValue / desiredMetricValue )]
This is evaluated on every "tick" of the autoscaling loop, which can be configured (with KEDA, per ScaledObject) and defaults to 30 seconds.
Example:
An HPA configured with a target CPU usage of 60% will try to maintain an average usage of 60% CPU across all of the deployment's pods.
If the deployment is currently running 8 pods averaging 70% usage, desiredReplicas = ceil[8 * (70 / 60)] = ceil(9.33) = 10, so the HPA will add 2 pods.
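For reference, a minimal sketch (hypothetical names and replica bounds) of an autoscaling/v2 HPA configured with the 60% CPU target from the example above:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-api
  minReplicas: 2
  maxReplicas: 12
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60   # the 60% average CPU target from the example
```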
According to the KEDA website:
KEDA is a Kubernetes-based Event Driven Autoscaler. With KEDA, you can drive the scaling of any container in Kubernetes based on the number of events needing to be processed.
That's actually a bit misleading and reductive. The common misconception is that KEDA can only be used with event-driven architectures built on MQ or Kafka. In reality, KEDA provides the custom metrics API I mentioned earlier to feed the HPA with any type of metric: response time, requests per second, and so on.
So say you want to use Prometheus metrics, CloudWatch metrics, etc.: KEDA has a long list of scalers to integrate with all of these services. It's a very easy way to augment the default HPA without writing custom metrics APIs.
One of the interesting features of KEDA is the ability to scale down to 0 when there's nothing to do; KEDA simply keeps querying the metric system until activity is detected. This is easy to reason about when you're looking at things like queue size or the age of Kafka records: the underlying service (i.e. Kafka) still runs and is able to receive messages even if there aren't any consumers doing work, so no message is lost.
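For illustration, a minimal sketch (hypothetical names) of a KEDA ScaledObject that scales a Kafka consumer down to zero based on consumer-group lag:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: orders-consumer
spec:
  scaleTargetRef:
    name: orders-consumer          # the consumer Deployment to scale
  minReplicaCount: 0               # scale to zero when the topic has no lag
  maxReplicaCount: 10
  triggers:
    - type: kafka
      metadata:
        bootstrapServers: my-kafka-bootstrap.kafka.svc:9092   # hypothetical cluster
        consumerGroup: orders-consumer-group
        topic: orders
        lagThreshold: "50"         # target lag per replica
```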
When you consider HTTP services though, it doesn’t work quite the same. You need at least one instance of the service to process the first incoming HTTP request so KEDA cannot scale that type of deployment to 0.
(There is an add-on that handles HTTPScaledObjects by creating a sort of HTTP proxy, but if you really need to scale services down to 0, I recommend looking at Knative instead.)
You can still leverage KEDA as the HPA backend to scale on things like requests/seconds and this is what we’re going to do next.
What we call rightsizing in Kubernetes is determining the ideal CPU and Memory requirements for your pod to maximize utilization while preserving performance.
Rightsizing serves 2 main purposes:
The first purpose is cost control and utilization of compute resources. If you picture the node as a box and the pods as little balls, the smaller the balls, the less wasted space between them.
Also you’re sharing the node with other application pods, so the less you use, the more resources you leave to other applications.
Let's walk through an example. Say your node pools are made of machines with 16 vCPUs and your pods are configured to request 2 vCPUs; you can put 8 pods on each node. If a pod actually only uses 1 vCPU, you're wasting 50% of the capacity of that node.
If you request a large number of vCPUs, also keep in mind that every time a new pod comes up, only a fraction of its capacity is used while load ramps up. Say your pod uses 4 vCPUs per 1,000 concurrent requests: at 1,250 requests, a new pod is created, but only a quarter of its requested vCPUs are actually used. You're blocking resources that another application might need.
You get the idea… smaller pods = smaller scaling increment
The second purpose is to give us a baseline for the metric we'll scale on. The idea is to establish a relationship between a pod's resources and its capacity to meet a target: multiply the number of pods by 2 and you get twice the capacity, multiply by 3 and you get three times the capacity, and so on.
I recommend using a performance-based metric for autoscaling rather than a utilization metric, because many HTTP services don't necessarily use more resources to process more requests. Check out the following load test of a simple Spring Boot application.
In this test I’m doubling the number of concurrent requests at each peak. You can see that the max CPU utilization doesn’t change.
So what's the right size? In a nutshell, the minimum CPU and memory that ensure a quick startup of the service and provide enough capacity to handle the first few requests.
Typical steps for a microservice with a single container per pod (not counting sidecar containers, which should be negligible): measure the CPU and memory used during startup and under light load, set the requests just above those numbers, and avoid specifying CPU limits, at least at this point, to avoid throttling.
To really optimize costs, this will need to be refined over time by observing the utilization trends in production.
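Putting it together, here's a sketch of what these settings look like on the deployment, using the 500m CPU / 600Mi memory requests chosen later in this article and deliberately omitting a CPU limit (the name and image are hypothetical):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-api
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-api
  template:
    metadata:
      labels:
        app: my-api
    spec:
      containers:
        - name: my-api
          image: registry.example.com/my-api:latest
          resources:
            requests:
              cpu: 500m        # enough for a quick Spring Boot startup
              memory: 600Mi
            limits:
              memory: 600Mi    # memory limit only; no CPU limit to avoid throttling
```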
This step measures how much load a single pod is able to handle. The right measure depends on the type of service you're running. For typical APIs, requests per second is the preferred metric; for event-driven consumers, throughput or queue size is best.
A good article about calculating a performance baseline can be found here: https://blog.avenuecode.com/how-to-determine-a-performance-baseline-for-a-web-api
The general idea is to find the maximum load the pod can sustain without degradation of service, where degradation shows up as an increase in response time.
Don’t feel bad if your application cannot serve 1000s of RPS, that’s what HPA is for, and this is highly dependent on your application response time to begin with.
Keep an eye on your pod CPU utilization and load. A sharp increase might indicate an incorrect CPU request setting on your pod or a problem inside your application (async processes, web server threading configuration, etc)
We rightsized our Spring Boot application and chose 500m CPU and 600Mi memory requests for our pod. We’ve also created a deployment in our Kubernetes cluster with a single replica. Using JMeter and Azure Load Testing we were able to get the following results. The graphs show number of concurrent threads (users) on the top left, response time on the top right, and requests/seconds (RPS) on the bottom left.
[Load test result charts: 1 pod (500m CPU) at 200, 400, 500, and 600 users]
Observe the response time degrading at 600 users (460ms vs 355ms before). So our pod performance baseline is 355ms @ 578 rps (500 users).
Interestingly, the CPU load plateaued at around 580 RPS. That's because Spring Boot REST services are typically not CPU-bound: requests are still accepted but sit in the thread queue until capacity is available to process them. That's why you see response time increase while CPU load stays flat. This is a perfect example of why scaling on CPU doesn't always work; in this case you would simply never reach high CPU utilization. We still want a relatively high CPU request because of the startup time of Spring Boot apps.
Now let’s scale our deployment to 2 replicas and run the tests again.
[Load test result charts: 2 pods (500m CPU) at 1000 and 1200 users]
This confirms our baseline, so we know we can double the number of pods to double the capacity (353.11 ms @ ~1.17k RPS).
I explained earlier that the HPA only supports CPU and memory metrics out of the box. Since we'll be using RPS instead, we need to give the HPA an API to access that metric. This is where KEDA comes in handy.
KEDA provides access to 3rd party metrics monitoring systems through the concept of Scalers. Available scalers include Azure Monitor, Kafka, App Insights, Prometheus, etc. For our use case, the RPS metric is exposed by our Spring Boot application through the Actuator, then scraped by Prometheus. So we’ll be using the Prometheus scaler.
In order to register a deployment with KEDA, you will need to create a ScaledObject resource, similar to a deployment or service manifest. Here’s an example:
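(The original example isn't reproduced here, so the following is a minimal sketch with hypothetical names: a Prometheus trigger querying the Spring Boot Actuator request-rate metric, with the 400 RPS per-pod threshold we end up choosing later.)

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: my-api-scaledobject
spec:
  scaleTargetRef:
    name: my-api                   # the Deployment to scale
  minReplicaCount: 1
  maxReplicaCount: 10
  pollingInterval: 30              # seconds between metric queries (default 30)
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus-operated.monitoring.svc:9090
        metricName: http_requests_per_second   # optional on recent KEDA versions
        # total RPS across all pods; KEDA divides this by the threshold below
        query: sum(rate(http_server_requests_seconds_count{app="my-api"}[2m]))
        threshold: "400"           # target RPS per pod
```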
The main fields are scaleTargetRef (the Deployment to scale), minReplicaCount and maxReplicaCount (the scaling bounds), and triggers (the scaler type, the query, and the per-pod threshold).
That's pretty much it. Apply that resource to the namespace where your deployment is running and you can start testing.
Let's first go over the basics; we'll discuss the details below.
Pay attention, this part is very important: always test for realistic load. Testing with a ramp-up of 10k users/s is probably not realistic and most likely will not work. Understanding your traffic patterns is critical.
Remember that the various components in the autoscaling chain are not real-time. Prometheus has a scraping interval, the HPA has a query interval, KEDA has a polling interval, and then there's your pod startup time, etc. In the worst case this can add up to a few minutes.
During load increase, only the current number of pods will be able to handle the incoming traffic, until KEDA detects the breach of threshold and triggers a scaling event. So you might experience more or less serious degradation of service until your new pods come up. Can your users tolerate a few seconds of latency? Up to you to decide what’s acceptable.
Example:
Let me try to illustrate what's going on. Imagine an application that can serve 5 RPM, an autoscaling threshold set to 4 RPM, and a test configured with 10 threads and a ramp-up time of 150 seconds; that gives a ramp-up rate of 4 threads per minute. We calculated that it would take 1.5 minutes for KEDA to trigger a scale-up and for a new pod to be ready to receive requests. We can trace the following graph:
In blue we show the number of users per minute simulated by our load test, in orange the capacity of a single pod, and in purple the threshold set in the autoscaler.
At the 1 minute mark, the threshold will be breached (blue line crossing), so 1.5 minutes after that – in the worst case – our second pod will be ready at the 2.5 minutes mark.
The vertical black line shows that the number of users at the 2.5-minute mark will have already reached 10, so the first pod will have to deal with up to 2x its RPM capacity until the second pod comes up.
We know our application can handle up to 5 RPM without service degradation, so we want to configure our test so that the ramp-up rate stays under the orange line. That's a 2 threads/min ramp-up, hence we need to increase our ramp-up time in JMeter to 300 seconds and make sure our overall test duration is at least 300 seconds.
In our previous example, what if your actual ramp-up in production is just that high? Before messing with the threshold, try this first: reduce your pod startup time, tighten the scraping and polling intervals, or raise your minimum replica count during known peak periods.
If none of that helps you achieve your goal, you can try lowering the threshold BUT you need to understand the tradeoffs. Let’s go back to our formula:
desiredReplicas = ceil[currentReplicas * ( currentMetricValue / desiredMetricValue )]
You can see that the number of pods is directly proportional to the ratio between the actual load and the threshold. Now let's say you need to handle a load of 1,000 RPS.
If you set the threshold to 100 RPS, the HPA will scale to 10 pods. Change the threshold to 50 RPS and the HPA will scale to 20 pods, i.e. twice the number of pods, for the same load and the same pod capacity!
A lower threshold will result in more pods for the same load, which will increase cost, waste resources (under-utilized pods) and potentially impact overall cluster performance. At the same time a lower threshold will result in less risk of degraded service.
Autoscaled – 1400 users – 120 seconds ramp-up – 500 RPS threshold: the ramp-up time is too short and the threshold is too high, resulting in a serious increase in response time for the first couple of pods.
Autoscaled – 1400 users – 240 seconds ramp-up – 500 RPS threshold: with double the ramp-up time there is still a small spike in response time, but it's better overall.
Autoscaled – 1400 users – 240 seconds ramp-up – 400 RPS threshold: the decreased threshold reduces the response time degradation.
Autoscaled – 1400 users – 240 seconds ramp-up – 300 RPS threshold: the lower threshold improves response time even more, BUT…
[Charts: the HPA scales to 4 pods at the 400 RPS threshold versus 6 pods at the 300 RPS threshold.]
In this case, we determined that 400 RPS was the correct threshold to avoid overly degraded response time during initial scale-up while maximizing resource utilization.
Autoscaling a part of a system means making sure the other parts can scale too
If an application response time starts increasing significantly, autoscaling can become a big problem if you’re not using the right metric.
A misconfigured autoscaler can result in much higher costs without benefit and negatively impact other systems.
For example, if an application becomes really slow because of a downstream problem with a database, adding more pods will not improve the situation. In some cases, that would actually aggravate the problem by putting more pressure on the downstream system.
When response time degrades because of a downstream bottleneck, throughput (RPS) drops with it. By using RPS as the scaling metric, we would actually decrease the number of pods to match what the system is currently capable of serving. If you scaled on response time instead, the number of pods would increase while throughput stayed exactly the same; you'd just have stuck requests spread out across more pods.
Monitoring key metrics is critical to avoid runaway costs
Monitor HPA, understand how often pods come up and down and detect anomalies like unusually long response times. Sometimes autoscaling will mask critical problems and waste a lot of resources.
Improve your application’s resilience first
Sometimes it's actually better not to autoscale when you want back-pressure, to avoid overwhelming downstream systems and to provide feedback to users. It's a good idea to implement circuit breakers, application firewalls, etc. to guard against these problems.
All of the steps above can be automated as part of your CI/CD pipeline. JMeter and Azure Load Testing can be scripted with Azure DevOps (ADO) and ARM or Terraform templates.
This is to proactively track changes in application baseline performance which would result in changing the target value for the autoscaling metric.
You can easily deploy a temporary complete application stack in Kubernetes by using Helm. Run your scripted load tests, compare with previous results, and automatically update your ScaledObject manifest.
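As a rough sketch (the chart, file, and namespace names are hypothetical), the Azure DevOps pipeline steps could look something like this:

```yaml
# azure-pipelines.yml: stand up an ephemeral stack with Helm, run the scripted
# JMeter load test, then tear everything down.
steps:
  - script: |
      helm install my-api-perf ./charts/my-api \
        --namespace perf-test --create-namespace --wait
    displayName: Deploy ephemeral test stack

  - script: |
      jmeter -n -t tests/baseline.jmx -l results/baseline.jtl
      # compare results/baseline.jtl with the previous baseline and update the
      # ScaledObject threshold if the sustainable RPS per pod has changed
    displayName: Run load test and compare baseline

  - script: |
      helm uninstall my-api-perf --namespace perf-test
    displayName: Tear down test stack
    condition: always()
```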
Monitoring the right platform and application metrics will surface optimization opportunities (and anticipate problems). Following are some of the metrics you want to keep an eye on:
Application response time: if the response time generally goes up, it might be time to re-evaluate your baseline performance and adjust your target RPS accordingly
Number of active pods: changes in the active-pod pattern usually indicate a sub-optimal autoscaling configuration; spikes in the number of pods can be an indication of a target that is too low.
Pod CPU & memory utilization %: monitor your pods utilization to adjust your rightsizing settings
Requests per second per pod: if the RPS of individual pods is consistently well below the configured target, your pods are underutilized and the scaling configuration needs to be revisited.
This process can also be automated to a certain extent. An alerting mechanism that provides recommendations works best; in most cases you still want a human looking at the metrics and deciding on the appropriate action.
I'll repeat what I said at the very beginning of this article: autoscaling is not the solution to poor application performance problems. That being said, if your application is optimized and you're able to scale horizontally in a predictable way, KEDA is the easiest way to get started with autoscaling. Just remember that KEDA is only a tool, and in my experience the number one impediment to a successful autoscaling implementation is a lack of understanding of testing procedures, or a lack of tests altogether. If you don't want to end up with a huge bill at the end of the month, reach out to Perficient for help!