Introduction
Are your performance-critical Kubernetes workloads suffering from resource contention? Kubernetes v1.36 introduces Pod-Level Resource Managers (alpha), a game-changing feature that lets you allocate exclusive, NUMA-aligned resources to your primary application containers while sidecars share the rest of the pod’s budget. This guide walks you through enabling and using this feature to achieve predictable performance without wasting cores on auxiliary containers.
What You Need
- A Kubernetes cluster running v1.36 (or later) with alpha feature gates enabled
- Access to the kubelet configuration on each node
- Basic familiarity with Kubernetes pods and resource requests/limits
- Performance-sensitive workloads (e.g., ML training, low-latency databases) that benefit from NUMA alignment
- Understanding of the Topology Manager, CPU Manager, and Memory Manager concepts
Step-by-Step Instructions
Step 1: Enable the Alpha Feature Gates
Before you can use pod-level resources, you need to enable two feature gates on the kubelet. Add the following flag to the kubelet command line (or the equivalent featureGates entries in kubelet-config.yaml):

--feature-gates=PodLevelResourceManagers=true,PodLevelResources=true

Restart the kubelet after making changes. Because PodLevelResources adds the pod-level spec.resources field to the Pod API, it generally needs to be enabled on the kube-apiserver as well. Verify that the gates are active by checking the kubelet logs for messages like PodLevelResourceManagers feature gate enabled.
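If you manage the kubelet through its configuration file rather than command-line flags, the equivalent featureGates stanza (a sketch, assuming the standard KubeletConfiguration format) looks like this:

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  PodLevelResourceManagers: true
  PodLevelResources: true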
Step 2: Configure the Kubelet for Pod-Level Resources
Set the Topology Manager policy to one that suits your workload. For pod-level allocation, the pod scope is recommended. Edit your kubelet configuration:
topologyManagerPolicy: "best-effort" # or "restricted" or "single-numa-node"
topologyManagerScope: "pod" # container scope is the default; change to pod
cpuManagerPolicy: "static" # required for exclusive CPU allocation
cpuManagerReconcilePeriod: "5s" # optional tuning
memoryManagerPolicy: "Static" # optional; only needed for memory pinning

These settings tell the kubelet to align resources for the entire pod as a unit, rather than per container.
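Putting it together, here is a minimal sketch of a complete kubelet-config.yaml. Note that the static CPU manager requires an explicit CPU reservation, and the Static memory manager requires a reservedMemory block whose totals match your reservations plus eviction thresholds; the values below are illustrative, not recommendations:

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  PodLevelResourceManagers: true
  PodLevelResources: true
topologyManagerPolicy: "single-numa-node"
topologyManagerScope: "pod"
cpuManagerPolicy: "static"
cpuManagerReconcilePeriod: "5s"
memoryManagerPolicy: "Static"
# The static CPU manager needs CPUs reserved for system daemons:
reservedSystemCPUs: "0,1"
# reservedMemory totals must equal kubeReserved + systemReserved
# + evictionHard for memory (924Mi + 100Mi = 1Gi here):
kubeReserved:
  memory: "924Mi"
evictionHard:
  memory.available: "100Mi"
reservedMemory:
- numaNode: 0
  limits:
    memory: "1Gi"

If you change cpuManagerPolicy on a node that previously ran with a different policy, you may need to delete the kubelet’s cpu_manager_state file (and similarly memory_manager_state) before restarting.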
Step 3: Define Your Pod’s Resource Budget
In your Pod spec, set the overall CPU and memory budget in the pod-level spec.resources field. This budget determines the NUMA alignment size; container-level requests and limits then carve exclusive slices out of it. Here’s an example for a tightly coupled database pod:
apiVersion: v1
kind: Pod
metadata:
  name: tightly-coupled-database
spec:
  resources:
    requests:
      cpu: "8"
      memory: "16Gi"
    limits:
      cpu: "8"
      memory: "16Gi"
  containers:
  - name: database
    image: mydb:latest
    resources:
      requests:
        cpu: "4"
        memory: "8Gi"
      limits:
        cpu: "4"
        memory: "8Gi"
  - name: metrics-exporter
    image: metrics-exporter:v1
  - name: backup-agent
    image: backup-agent:v1

In this example, the database container gets 4 exclusive CPUs and 8 GiB of memory from the pod’s budget (as with the node-level static CPU manager, exclusive allocation requires an integer CPU request equal to the limit). The remaining 4 CPUs and 8 GiB form a pod-wide shared pool for the metrics exporter and backup agent. Those sidecars run without dedicated CPUs, using the shared pool, but they are isolated from the database’s resources.
Step 4: Choose Your Topology Manager Scope
Two scopes determine how allocation decisions are made:
- Pod scope: The kubelet performs a single NUMA alignment for the entire pod budget. All containers are laid out on the same NUMA node (if possible). Ideal for tightly-coupled workloads that benefit from co-location.
- Container scope (default): Each container is aligned independently. Pod-level resources still act as a budget, but the NUMA node can differ for each container. This is useful when sidecars don’t need to be on the same node as the main container.
Set the topologyManagerScope in your kubelet configuration accordingly. Restart the kubelet after changes.
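Switching scopes is a one-line kubelet configuration change followed by a restart; a sketch showing both options:

# One alignment decision for the whole pod budget:
topologyManagerScope: "pod"

# Or per-container alignment (the default):
# topologyManagerScope: "container"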
Step 5: Deploy and Verify
Deploy your pod and check resource allocation:
kubectl apply -f pod.yaml
kubectl describe pod tightly-coupled-database

Look for topology-related events: under the restricted or single-numa-node policies, a TopologyAffinityError event means the kubelet could not align the pod, while a clean admission indicates alignment succeeded. You can also inspect the kubelet logs for per-pod allocation details. Use kubectl exec to check CPU affinity from inside the database container (e.g., cat /sys/fs/cgroup/cpuset.cpus on cgroup v2). The database should have exclusive CPUs, while the sidecars share the remaining ones.
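To make the affinity check concrete, here is a sketch assuming the example pod from Step 3 and cgroup v2 (if cpuset.cpus is empty, check cpuset.cpus.effective instead):

# Exclusive CPU set assigned to the main container
kubectl exec tightly-coupled-database -c database -- cat /sys/fs/cgroup/cpuset.cpus

# Shared pool used by a sidecar; this should be a different, non-overlapping set
kubectl exec tightly-coupled-database -c metrics-exporter -- cat /sys/fs/cgroup/cpuset.cpus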
Tips and Best Practices
- Start with the pod scope for maximum performance predictability. Switch to container scope only if sidecars need to be on a different NUMA node for isolation.
- Monitor resource usage to ensure sidecar containers don’t exceed the shared pool. Use tools like Prometheus and Kubecost to track allocation.
- Combine with node-level resource managers if you have multiple performance-critical pods on the same node. The Topology Manager will try to avoid conflicts.
- Test in a non-production environment first. This is an alpha feature and the API may change in future releases.
- Consider using initContainers for setup tasks that don’t need long-term resources – they are also covered by the pod-level budget but exit once complete; see the sketch after this list.
- Be aware of the trade-off: setting pod resources too large may waste capacity if sidecars are idle. Right-size based on actual usage.
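Here is a minimal sketch of that init-container pattern, extending the Step 3 Pod spec; the schema-loader name, image, and command are hypothetical placeholders:

# Fragment added under the Pod's spec, alongside containers:
initContainers:
- name: schema-loader        # hypothetical setup task
  image: schema-loader:v1    # hypothetical image
  # Draws on the pod-level budget while running and releases
  # that share back to the pod once it exits.
  command: ["sh", "-c", "load-schema"]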