
Docker Bootcamp – Understanding Performance and Performance Tuning


Welcome back to Docker Bootcamp. In a previous post, we learned how to set resource limits for our containers. In this post, we will dig deep into performance. We’ll start with a brief history lesson about CPUs and hyperthreading, then move on to an example that helps explain how different CPUs can impact performance. Finally, we’ll consider some ways to inspect our containers and adjust them for better performance.

Terminology

  • Socket – A physical connector on the motherboard where a CPU can be attached. Some motherboards have multiple sockets and can support multiple CPUs.
  • CPU (Central Processing Unit AKA Processor) – A complex electronic circuit that contains several subcomponents required to run applications.
  • CPU Core – The subcomponent of the CPU that includes the L1 cache and executes instructions. Many CPUs have multiple cores.
  • Multi-Core CPU – Allows multiple instructions to be executed in parallel on different cores.
  • CPU Scheduler – Allows a single CPU to share system resources between multiple processes. It selects a process in the queue that is ready for execution.
  • Simultaneous Multi-Threading (AKA Hyperthreading, Intel’s name for it) – Allows the CPU scheduler to assign two threads to the same CPU core at the same time.
  • Physical Cores – A count of the real number of cores for all CPUs.
  • Logical Processors – The number of processors the operating system sees. This is the number of physical cores, multiplied by two if hyperthreading is enabled.

CPU History

The first modern computers had one processor and could only execute one instruction at a time. Large server computers had multiple sockets for multiple processors to allow parallel execution and increase performance. In 2002, Intel introduced hyperthreading technology to improve the performance of a single CPU core. With hyperthreading, one CPU core appears to the operating system as two logical processors. Each logical processor can execute instructions and be interrupted or halted independently. This does not double the performance of the CPU, because both logical processors share the execution resources of the same core. Later processors included multiple cores to allow true parallel execution. Today’s processors have multiple cores and also support hyperthreading to improve the performance of all available cores.

Understanding CPU Cores and Hyperthreading

Let’s use an example to understand how our CPU works.  Imagine a bowl of candies.  The candies represent the computer instructions.  Your mouth represents the CPU that processes the instructions.  Your hands represent the CPU scheduler that selects an instruction to execute.  The goal is to consume the bowl of candy as quickly as possible. You can only pick up one piece of candy with each hand at a time.  You must fully chew and swallow that piece of candy before picking up the next piece of candy.  Let’s apply this scenario to different types of CPUs.

  • Single core CPU without hyperthreading – Use only your right hand. Pick up one piece of candy and eat.
  • Single core CPU with hyperthreading – Use both hands. Pick up one piece of candy with your right hand and eat.  While you are eating, pick up another piece of candy with your left hand and have it waiting by your mouth ready for when you have finished the first piece of candy.
  • Dual core CPU without hyperthreading – Have one other person join you. Each person can only use their right hand.  Each person picks up one piece of candy and eats.
  • Dual core CPU with hyperthreading – Have one other person join you. Each person can use both hands. Each person picks up one piece of candy with their right hand and eats. While eating, each person picks up another piece of candy with their left hand and has it waiting by their mouth, ready for when they have finished the first piece.

Expand the example to four, six, or eight people (cores) with and without hyperthreading.  You can see that adding more people greatly increases the speed at which you can empty the bowl. Being able to use both hands increases the speed as well, but each mouth can only chew one piece of candy at a time, so the increase is less than adding another person.
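The candy analogy can be turned into a rough toy model. The sketch below is only an illustration under stated assumptions: the function name, the pick-up time of 1 unit, and the chew time of 4 units are all made up for the example, and real CPUs are far more complicated. It shows that a second person roughly halves the time, while a second hand only hides the pick-up time.

```python
import math


def time_to_empty_bowl(candies, people, both_hands, pick_up=1.0, chew=4.0):
    """Toy model of the candy analogy; times are in arbitrary units."""
    # Each person (core) takes an equal share of the bowl.
    share = math.ceil(candies / people)
    if share == 0:
        return 0.0
    if both_hands:
        # Hyperthreading: the next candy is fetched while the current one
        # is being chewed, so only the first pick-up is fully exposed.
        return pick_up + (share - 1) * max(pick_up, chew) + chew
    # One hand: every candy costs a pick-up followed by a chew.
    return share * (pick_up + chew)


for people in (1, 2):
    for both_hands in (False, True):
        t = time_to_empty_bowl(120, people, both_hands)
        print(f"{people} person(s), both hands={both_hands}: {t:.0f} time units")
```

With these assumed timings, one person with one hand needs 600 units, one person with both hands needs 481, and two people with one hand each need 300.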

Performance and Hyperthreading

Hyperthreading improves performance because it allows the CPU core to work on a second thread while the first is waiting on other hardware (e.g. memory, disk, network). This comes at the cost of slightly higher heat output and slightly decreased battery life. An application must be multithreaded to take advantage of hyperthreading. Tasks that must be done in sequential order will not benefit from it. Heavily threaded, CPU-intensive workloads such as video editing, 3D rendering, and gaming tend to benefit the most.

Performance and Virtual Machines

I have a computer with one socket, 6 physical cores, and 12 logical processors. When I create a new virtual machine, I am offered the ability to assign 12 processors to the VM. If I assign all 12 logical processors to the VM, the guest’s CPU scheduler will think it has 12 physical cores. This will cause each of the hyperthreads to run at 50% utilization and the cores to run at 100% utilization, which degrades performance for both the VM and the host. It is best not to assign more processors to a single VM than the host has physical cores.

I like to leave at least one core fully available to the host. If you run multiple VMs on the same host, the total number of cores assigned to VMs can exceed the number of physical cores. This is called overcommitment. With multiple VMs running under normal load, performance should not degrade, as each VM gets a chance to use the host CPU. You can experience decreased performance if one or more of the VMs is under high load continuously. You can easily change the number of CPUs assigned to a VM by stopping the VM and changing the settings. The guest will recognize the newly assigned CPUs the next time it starts. Deciding the number of cores to assign to a VM is an art. You do not want to assign too few or too many resources. The final values depend on the workload being performed by the VM.
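The two rules of thumb above (leave a core for the host, and keep an eye on overcommitment) can be written down as a small calculation. This is a minimal sketch: the function names and the two 4-vCPU VMs are hypothetical, and the one-core reserve is just the author's stated preference.

```python
def max_vcpus_for_single_vm(physical_cores, reserved_for_host=1):
    """Largest vCPU count for one VM that still leaves cores for the host."""
    return max(1, physical_cores - reserved_for_host)


def overcommit_ratio(vcpus_per_vm, physical_cores):
    """Total vCPUs assigned across all VMs divided by physical cores."""
    return sum(vcpus_per_vm) / physical_cores


physical_cores = 6  # the host described above: 6 cores, 12 logical processors
print("max vCPUs for one VM:", max_vcpus_for_single_vm(physical_cores))

vms = [4, 4]  # hypothetical: two VMs with 4 vCPUs each
ratio = overcommit_ratio(vms, physical_cores)
print(f"overcommit ratio: {ratio:.2f}")  # above 1.0 means overcommitted
```

A ratio above 1.0 is not a problem by itself. As noted above, it only hurts when one or more VMs stay under high load.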

Performance and Containers

Whether you run your containers on your host or in a VM, your application will take a performance hit. Luckily, we can tune the performance of our containers by adjusting CPUs and memory. You can adjust the number of CPUs available to the container with the --cpus flag on the docker create/run command or in a docker-compose file. If no value is provided, Docker uses a default. On Windows, a container defaults to two CPUs. If hyperthreading is available, this is one core and two logical processors. If hyperthreading is not available, this is two cores and two logical processors. On Linux, a container defaults to using all available CPUs of the host. Just like virtual machines, finding the right amount of resources is an art. The final values depend on the workload being performed by the container. Unlike a virtual machine, a container’s configuration generally cannot be edited in place: you destroy the container, change the settings, and create a new container. (For Linux containers, docker update can adjust some resource limits on an existing container; it is not supported for Windows containers.)
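As an illustration of the flags mentioned above, here is a minimal sketch that assembles a docker run command line. The helper name and the image name my-site:latest are hypothetical; --cpus, --cpuset-cpus, and --memory are the Docker CLI flags. The sketch only builds the argument list and does not start a container.

```python
def docker_run_args(image, cpus=None, cpuset_cpus=None, memory=None):
    """Build an argument list for `docker run` with optional resource limits."""
    args = ["docker", "run", "--detach"]
    if cpus is not None:
        args += ["--cpus", str(cpus)]  # cap on CPU time, e.g. 4 or 1.5
    if cpuset_cpus is not None:
        args += ["--cpuset-cpus", cpuset_cpus]  # Linux containers only
    if memory is not None:
        args += ["--memory", memory]  # e.g. "2g"
    args.append(image)
    return args


# "my-site:latest" is a made-up image name for illustration.
cmd = docker_run_args("my-site:latest", cpus=4, memory="2g")
print(" ".join(cmd))
```

To actually launch the container, the list could be passed to subprocess.run(cmd, check=True) on a machine where Docker is installed.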

Rightsizing for Performance

There is no magic formula for determining the “right” size for a VM or a container. It depends on the amount of available resources on the host, the ability to scale the host, the requirements of the application, the average load of the application, the peak load of the application, and other applications running on the host. On Windows, Docker creates a process called “Vmmem” that represents the CPU and memory usage of the container. If the CPU for this process is constantly running at 30% or above, consider increasing the number of CPUs for the container. If the memory is constantly running at 100% of the allocated amount, consider increasing the amount of RAM for the container. Memory is a bit harder to judge than CPU, as memory is meant to be utilized at a higher rate to keep application data quickly accessible. Having too little memory causes heavy load on the CPU as it swaps data between memory and disk. Assigning too much memory can cause performance issues for the host.
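The two thresholds above can be expressed as a simple check over a series of usage samples. This is a rough sketch, not a formula: the function name and the sample values are hypothetical, and "constantly" is interpreted here as "every sample is at or above the threshold".

```python
def rightsizing_hints(cpu_percent_samples, mem_percent_samples,
                      cpu_threshold=30.0, mem_threshold=100.0):
    """Return suggestions when usage stays at or above a threshold throughout."""
    hints = []
    # "Constantly" is read as: the lowest sample still reaches the threshold.
    if cpu_percent_samples and min(cpu_percent_samples) >= cpu_threshold:
        hints.append("consider increasing CPUs")
    if mem_percent_samples and min(mem_percent_samples) >= mem_threshold:
        hints.append("consider increasing memory")
    return hints


# Hypothetical samples, e.g. read off Task Manager once a minute.
cpu = [42.0, 55.5, 38.0, 61.2]
mem = [100.0, 97.5, 100.0, 100.0]
print(rightsizing_hints(cpu, mem))
```

With these samples only the CPU hint is returned, because memory dipped below 100% once.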

Setup

  • Host
    • My host has 6 cores and 12 logical processors
    • docker system info shows 12 CPUs
  • Virtual Machine
    • My VM has 4 cores and 4 logical processors
    • docker system info shows 4 CPUs

Linux Commands

  • List CPU information
    • cat /proc/cpuinfo
  • List number of processors in use
    • nproc
  • List memory information
    • free -m
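The same CPU numbers can be read from inside a Linux container with a few lines of Python. This is a sketch under assumptions: the helper names are made up, and the /proc/cpuinfo parsing expects the common "processor : N" layout, which not every architecture uses.

```python
import os


def count_cpuinfo_processors(cpuinfo_text):
    """Count 'processor : N' entries, as listed by cat /proc/cpuinfo."""
    return sum(
        1
        for line in cpuinfo_text.splitlines()
        if line.split(":")[0].strip() == "processor"
    )


def usable_cpus():
    """CPUs the current process may run on (what nproc reports by default)."""
    if hasattr(os, "sched_getaffinity"):  # available on Linux
        return len(os.sched_getaffinity(0))
    return os.cpu_count()  # fallback on other platforms


if os.path.exists("/proc/cpuinfo"):
    with open("/proc/cpuinfo") as f:
        print("cpuinfo processors:", count_cpuinfo_processors(f.read()))
print("usable CPUs:", usable_cpus())
```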

Windows Commands

  • List system information
    • systeminfo
      • Notice the Processor(s) section lists the number of CPUs installed
      • Notice the memory section lists the total memory, available memory, and virtual memory
  • List CPU information
    • WMIC CPU Get DeviceId,NumberOfCores,NumberOfLogicalProcessors (no spaces between field names)
      • If hyperthreading is enabled, the number of logical processors will be twice the number of cores
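The hyperthreading check above can be automated by parsing the WMIC output. The sketch below is an assumption-heavy illustration: the function name is made up, and the sample text is a hand-written approximation of WMIC's column layout using the 6-core, 12-logical-processor host from this post, not captured output.

```python
def parse_wmic_cpu(output):
    """Parse whitespace-separated WMIC CPU output into one dict per CPU."""
    lines = [line.split() for line in output.splitlines() if line.strip()]
    header = [name.lower() for name in lines[0]]
    cpus = []
    for row in lines[1:]:
        record = dict(zip(header, row))
        cores = int(record["numberofcores"])
        logical = int(record["numberoflogicalprocessors"])
        cpus.append({
            "id": record["deviceid"],
            "cores": cores,
            "logical": logical,
            # Rule from the text: twice as many logical processors as cores.
            "hyperthreading": logical == 2 * cores,
        })
    return cpus


# Illustrative sample only; real output may differ in spacing.
sample = """DeviceID  NumberOfCores  NumberOfLogicalProcessors
CPU0      6              12
"""
print(parse_wmic_cpu(sample))
```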

Docker Commands

  • List docker system information
    • docker system info
      • Notice the section for CPUs lists how many CPUs are available to docker
      • Notice the section for Total Memory lists how much RAM is available to docker
  • List container information
    • docker inspect <containername>
      • Notice the sections for HostConfig:NanoCpus, HostConfig:CpuShares, HostConfig:CpusetCpus, HostConfig:Memory
  • List container stats
    • docker stats
      • Shows CPU usage, memory usage, memory limit
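The HostConfig fields listed above are easier to read once converted. Here is a minimal sketch that pulls them out of docker inspect JSON. The function name is hypothetical and the sample document is trimmed down to the four fields for illustration; real docker inspect output contains many more keys. A value of 0 (or an empty string for CpusetCpus) means no limit was set.

```python
import json


def resource_limits(inspect_output):
    """Extract CPU and memory settings from `docker inspect` JSON output."""
    # docker inspect prints a JSON array with one object per container.
    container = json.loads(inspect_output)[0]
    host_config = container["HostConfig"]
    return {
        # NanoCpus is in billionths of a CPU: 4000000000 means --cpus=4.
        "cpus": host_config.get("NanoCpus", 0) / 1_000_000_000,
        "cpu_shares": host_config.get("CpuShares", 0),
        "cpuset_cpus": host_config.get("CpusetCpus", ""),
        "memory_bytes": host_config.get("Memory", 0),
    }


# Trimmed, hand-written sample standing in for real docker inspect output.
sample = json.dumps([{
    "HostConfig": {
        "NanoCpus": 4000000000,
        "CpuShares": 0,
        "CpusetCpus": "",
        "Memory": 2147483648,
    }
}])
print(resource_limits(sample))
```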

Examples – CPU

I’ve created several containers on both my host and inside a VM, using both Linux and Windows containers. The tables below list the configurations and the number of CPUs for each scenario.

Linux Container Configuration | NanoCpus | nproc | Processors in /proc/cpuinfo
Host – Default | 0 | 12 | 12
Host – cpus=4 | 4000000000 | 12 | 12
Host – cpuset-cpus=0,2,4 | 0 | 3 | 12
VM – Default | 0 | 4 | 4
VM – cpus=4 | 4000000000 | 4 | 4
VM – cpuset-cpus=0,1,2 | 0 | 3 | 4

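Note that Linux containers see all available CPUs by default. The cpus flag does not hide any processors (nproc still reports every CPU); it caps the total CPU time the container may use, here the equivalent of four CPUs. The cpuset-cpus flag restricts the container to specific cores, so nproc reports three even though /proc/cpuinfo still lists them all.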
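To see why a cpuset of 0,2,4 leads nproc to report three processors, it helps to expand the cpuset string into the individual CPU indexes. The helper below is a small sketch with a made-up name. It handles comma-separated lists and ranges such as 0-2, which are the two forms Docker accepts for this flag.

```python
def cpus_in_cpuset(cpuset):
    """Expand a cpuset string such as '0,2,4' or '0-2' into CPU indexes."""
    cpus = set()
    for part in cpuset.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            # A range like "0-2" includes both end points.
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


for cpuset in ("0,2,4", "0,1,2", "0-2"):
    cpus = cpus_in_cpuset(cpuset)
    print(f"{cpuset} selects {len(cpus)} CPUs: {cpus}")
```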

 

Windows Container Configuration | NanoCpus | Number Of Cores | Number Of Logical Processors
Host – Default | 0 | 1 | 2
Host – cpus=4 | 4000000000 | 2 | 4
VM – Default | 0 | 2 | 2
VM – cpus=4 | 4000000000 | 4 | 4

Note that Windows containers only see two CPUs by default. Unlike on Linux, using the cpus flag changes the number of logical processors the container sees (four in both cases above). The cpuset-cpus flag is not available for Windows containers.

Examples – Memory

I’ve created a Windows container that runs IIS and hosts a content-managed website, then adjusted the memory limit to find the best setting. As mentioned before, with no mem_limit set, Windows containers default to 1GB of physical RAM. It makes sense that this container was performing slowly, as it had only 36MB of free RAM. Testing values of 2GB, 3GB, and 4GB, I noticed that the amount of free RAM increased with each step. For my container, it seems that I gain no benefit from setting a mem_limit higher than 2GB.

Windows Container Configuration | Total RAM | Available RAM | Virtual Memory | Available Virtual Memory
Host | 16GB | 6.4GB | 22.5GB | 5.5GB
VM (no mem_limit set) | 1GB | 36MB | 2.7GB | 1.4GB
VM – mem_limit=2GB | 2.5GB | 1.2GB | 4.2GB | 2.8GB
VM – mem_limit=3GB | 3.5GB | 2.2GB | 7.1GB | 5.6GB
VM – mem_limit=4GB | 4.5GB | 3.0GB | 9.2GB | 7.4GB
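One way to compare the rows is to look at the share of RAM that is still free in each configuration. The sketch below recomputes that from the table values. The function name is made up, and the conversion assumes 1GB = 1024MB.

```python
def free_percent(total_mb, available_mb):
    """Share of total RAM that is still available, as a percentage."""
    return 100 * available_mb / total_mb


# (configuration, total RAM in MB, available RAM in MB), taken from the table.
measurements = [
    ("no mem_limit", 1 * 1024, 36),
    ("mem_limit=2GB", 2.5 * 1024, 1.2 * 1024),
    ("mem_limit=3GB", 3.5 * 1024, 2.2 * 1024),
    ("mem_limit=4GB", 4.5 * 1024, 3.0 * 1024),
]

for label, total_mb, available_mb in measurements:
    print(f"{label}: {free_percent(total_mb, available_mb):.1f}% of RAM free")
```

With no limit set, only about 3.5% of RAM is free, which matches the sluggish behavior described above. At 2GB nearly half is already free.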

 

Final Thoughts

To have your containers perform well, you should not trust the default Docker settings. Use the cpus and cpuset-cpus flags when possible. Watch your task manager for high usage and adjust your container settings as needed. Connect to the console of your container to see what resources your container can see and how they are being used. You can improve the performance of your containers by adjusting CPU and memory settings, but your application will likely never run as fast as it could outside of a container.
