OOMKilled Containers and Number of CPUs

Recently, a few of my colleagues hit OOMKilled issues with their application pods when migrating from an OpenShift cluster with smaller worker nodes (in terms of CPUs and memory) to another cluster with bigger worker nodes (more CPUs and memory).

The application worked fine on the smaller cluster. However, many of its pods failed to start and threw OOMKilled errors after being migrated to the bigger cluster. This is when some of us were pulled in to help analyse the issue from the OpenShift and container runtime side.

This article summarizes that work, in the hope that it might be useful for some of you.

Kubernetes provides a way to specify minimum and maximum resource requirements for containers. Here is an example of a pod specifying minimum (50M) and maximum (200M) memory requirements:

containers:
- name: memory-demo
  image: polinux/stress
  resources:
    limits:
      memory: "200M"
    requests:
      memory: "50M"

Note that you can specify the minimum (requests), the maximum (limits), or both.

If you want to understand all the available options and nuances, take a look at link-1 and link-2.

When a container has a memory 'limit' (maximum) defined and its memory usage crosses that limit, the container is killed and its status is reported as OOMKilled. Note that this happens even when the node has enough free memory. More details are available in the following article.
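To confirm that a failure really is a memory-limit kill, you can inspect the container statuses in the pod object, for example the JSON returned by `oc get pod <name> -o json` or `kubectl get pod <name> -o json`. The following is a minimal sketch that assumes the standard `status.containerStatuses[].state` / `lastState` layout of a pod; the helper `oom_killed_containers` and the script name `oom_check.py` are hypothetical.

```python
import json
import sys


def oom_killed_containers(pod):
    """Return names of containers whose current or previous termination reason is OOMKilled."""
    killed = []
    for status in pod.get("status", {}).get("containerStatuses", []):
        # A restarted container keeps its previous termination in lastState;
        # a container that has not restarted yet may still show it in state.
        for key in ("state", "lastState"):
            terminated = (status.get(key) or {}).get("terminated") or {}
            if terminated.get("reason") == "OOMKilled":
                killed.append(status["name"])
                break
    return killed


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: kubectl get pod <name> -o json > pod.json && python3 oom_check.py pod.json
    with open(sys.argv[1]) as fh:
        for name in oom_killed_containers(json.load(fh)):
            print(f"{name}: OOMKilled")
```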

In our case, the container memory limits worked on the smaller cluster but not on the bigger one. Other than the size of the cluster nodes, nothing had changed, yet the containers failed to even start on the new cluster with bigger nodes. So what was going wrong?

We started with a basic experiment: a single container was run on different nodes, each with a different number of CPUs. We found that the memory usage of the same container increased with the number of CPUs on the node, and the increase was more significant on nodes with a very large number of CPUs (100+).
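A measurement like this can be sketched from inside the container. The snippet below is a minimal sketch, assuming the container's memory cgroup is visible at /sys/fs/cgroup (the usual layout): it reads `memory.current` on cgroup v2 and falls back to `memory/memory.usage_in_bytes` on cgroup v1. The function name and default path are illustrative assumptions, not part of the original experiment.

```python
import os
from pathlib import Path


def container_memory_bytes(cgroup_root="/sys/fs/cgroup"):
    """Return the memory usage charged to the cgroup in bytes, or None if not found."""
    root = Path(cgroup_root)
    # cgroup v2 exposes memory.current; cgroup v1 exposes memory/memory.usage_in_bytes
    for candidate in (root / "memory.current",
                      root / "memory" / "memory.usage_in_bytes"):
        if candidate.is_file():
            return int(candidate.read_text().strip())
    return None


if __name__ == "__main__":
    # os.cpu_count() reports the CPUs the kernel sees on the node, not the
    # container's CPU limit; per-CPU kernel structures follow the node's CPUs.
    print(f"node CPUs: {os.cpu_count()}")
    usage = container_memory_bytes()
    if usage is None:
        print("no cgroup memory usage file found")
    else:
        print(f"container memory usage: {usage / 2**20:.1f} MiB")
```

Running the same container image on nodes with different CPU counts and comparing the two printed values is one way to reproduce the trend described above.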

So what contributes to this increase? Remember, the container (workload) is the same and there is no change to the application parameters.

Let’s dig deeper into what contributes to container memory usage. There are three aspects that contribute to the memory usage of a container:

  1. per-cpu object caches used by the memory cgroup controller
  2. per-cpu kernel data structures
  3. process memory allocation

In summary, container memory usage includes both application memory usage and kernel memory usage. Application memory usage is largely independent of the number of CPUs, whereas kernel memory usage is directly related to the number of CPUs. Consequently, if a container’s memory limit was sized on a node with fewer CPUs, it won’t be sufficient when the same container runs on a node with more CPUs, because of the additional kernel memory allocated for those extra CPUs.
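On cgroup v2, the `memory.stat` file of a container's cgroup breaks its usage down by type, which makes it possible to see the application versus kernel split directly. This is a rough sketch, assuming cgroup v2 and treating `anon` + `file` as the application side and slab, per-CPU, kernel stack, page table and socket memory as the kernel side; that grouping is an approximation chosen for illustration, and keys missing on older kernels simply count as zero.

```python
from pathlib import Path

# Assumed grouping of cgroup v2 memory.stat keys; an approximation, not an official split.
USER_KEYS = ("anon", "file")
KERNEL_KEYS = ("kernel_stack", "pagetables", "percpu", "sock",
               "slab_reclaimable", "slab_unreclaimable")


def parse_memory_stat(text):
    """Parse 'key value' lines from memory.stat into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def split_usage(stats):
    """Return (application_bytes, kernel_bytes) using the key groups above."""
    user = sum(stats.get(k, 0) for k in USER_KEYS)
    kernel = sum(stats.get(k, 0) for k in KERNEL_KEYS)
    return user, kernel


if __name__ == "__main__":
    stat_file = Path("/sys/fs/cgroup/memory.stat")  # cgroup v2 layout assumed
    if stat_file.is_file():
        user, kernel = split_usage(parse_memory_stat(stat_file.read_text()))
        print(f"application ~{user / 2**20:.1f} MiB, kernel ~{kernel / 2**20:.1f} MiB")
```

Comparing the kernel figure for the same workload on a small node and a large node should show which part of the usage moves with the CPU count.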

The effect is more pronounced on CPU architectures with large page sizes. For example, on the Power architecture, which uses a 64K page size (compared with the 4K page size used on Intel), the increase in container memory usage with a large number of CPUs is considerably larger.
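A back-of-the-envelope model helps show why both the CPU count and the page size matter. Suppose, as a deliberate simplification, that a cgroup keeps one slab of `page_size * 2**order` bytes per CPU for each slab cache it uses. The memory pinned that way then grows linearly with both the number of CPUs and the page size. Real SLUB behaviour (per-CPU partial lists, object sizes, per-cache orders) is more complex, and the cache count below is a made-up illustrative value, not a measurement.

```python
def pinned_percpu_slab_bytes(ncpus, ncaches, page_size, order):
    """Simplified model: one slab of page_size * 2**order bytes per CPU per cache."""
    return ncpus * ncaches * page_size * (1 << order)


if __name__ == "__main__":
    NCACHES = 20  # hypothetical number of slab caches the container touches
    ORDER = 0     # smallest possible slab (a single page); real caches often use higher orders
    for page_size, label in ((4096, "4K pages"), (65536, "64K pages")):
        for ncpus in (16, 160):
            mib = pinned_percpu_slab_bytes(ncpus, NCACHES, page_size, ORDER) / 2**20
            print(f"{label:>9}, {ncpus:>3} CPUs: ~{mib:,.1f} MiB")
```

Under this model, going from 16 to 160 CPUs multiplies the pinned memory by 10, and moving from 4K to 64K pages multiplies it by another 16, which is consistent with the trend described above even though the absolute numbers are only illustrative.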

The following options are available to handle the OOMKilled issues:

  1. You’ll need to size the container workload separately for different node configurations when using memory limits. Unfortunately, there is no formula that can be applied to calculate how fast container memory usage grows with the number of CPUs on the node.
  2. One kernel tunable that can help reduce the memory usage of containers is the slub_max_order boot parameter. A value of 0 (the default is 3) can bring down the overall memory usage of the container, but it can have negative performance implications for certain workloads. It’s advisable to benchmark the container workload with this tunable.
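Before and after changing slub_max_order, it can help to check what the kernel is actually using. This is a minimal sketch, assuming a SLUB kernel that exposes per-cache attributes under /sys/kernel/slab/<cache>/order and that the parameter, if set, appears on /proc/cmdline; the helper names are hypothetical, and reading some slab attributes may require root. On OpenShift, kernel arguments such as this one are typically applied through the `kernelArguments` field of a MachineConfig.

```python
from collections import Counter
from pathlib import Path


def slub_max_order_from_cmdline(cmdline):
    """Return slub_max_order if set on the kernel command line, else None (kernel default)."""
    for arg in cmdline.split():
        if arg.startswith("slub_max_order="):
            return int(arg.split("=", 1)[1])
    return None


def slab_order_histogram(slab_root="/sys/kernel/slab"):
    """Count slab caches by page order (0 means single-page slabs)."""
    orders = Counter()
    root = Path(slab_root)
    if not root.is_dir():
        return orders
    for cache_dir in root.iterdir():
        if cache_dir.is_symlink():
            continue  # merged caches appear as symlinks to a shared directory
        try:
            orders[int((cache_dir / "order").read_text().strip())] += 1
        except (OSError, ValueError):
            continue  # missing or unreadable attribute
    return orders


if __name__ == "__main__":
    cmdline_path = Path("/proc/cmdline")
    cmdline = cmdline_path.read_text() if cmdline_path.is_file() else ""
    print("slub_max_order on cmdline:", slub_max_order_from_cmdline(cmdline))
    for order, count in sorted(slab_order_histogram().items()):
        print(f"order {order}: {count} caches")
```

With slub_max_order=0 in effect, the histogram would be expected to show only order 0, which is one quick way to verify the setting took effect before benchmarking.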

A reworked slab memory controller for memory cgroups is in the works and shows a lot of potential: https://www.phoronix.com/scan.php?page=news_item&px=Slab-Controller-Improvements-V6

I would like to thank Aneesh and Bharata who are Linux kernel experts and helped with this analysis.
