Predictive Vertical Pod Rightsizing
Predictive Vertical Pod Rightsizing is the process of preemptively optimizing the CPU and memory resource requests and limits for Kubernetes pods so that they have just enough resources to run efficiently, without over- or under-provisioning. Unlike horizontal scaling, which adjusts the number of pod replicas, vertical rightsizing focuses on tuning individual pod configurations to better match their actual usage patterns. This helps reduce wasted resources, lower cloud costs, and improve cluster utilization, while also minimizing the risk of resource starvation or out-of-memory (OOM) errors.
Prerequisites
Before you begin, make sure:
- Thoras is installed and running in your Kubernetes cluster.
- The Kubernetes Metrics Server is installed in your cluster.
- Your workloads are instrumented with resource requests and have some usage history.
Sample Configuration
Here’s an example AIScaleTarget that configures rightsizing.
Vertical Breakdown
- spec.vertical.mode set to recommendation ensures Thoras suggests optimal requests and limits but does not apply them.
- spec.vertical.containers[].name should be set to the name of the container in the pods for the workload you are targeting with this AIScaleTarget.
- spec.vertical.containers[].RESOURCE.lowerbound is the lowest value Thoras is allowed to set the request to for that resource. It is a required field.
- spec.vertical.containers[].RESOURCE.upperbound is the highest value Thoras is allowed to set the request to for that resource. It is an optional field.
- spec.vertical.containers[].RESOURCE.limit allows Thoras to modify the container’s limit along with the request for this resource. It is optional. If unset, Thoras will not modify limits for this resource.
- spec.vertical.containers[].RESOURCE.limit.ratio is the ratio used to keep the limit in line with the suggested request for this resource. For example, with a ratio of 2.5, Thoras will set the limit to 2.5G if the suggested request is 1G.
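The interaction between the bounds and the limit ratio can be sketched in a few lines of Python. This is only an illustration of the arithmetic described above, not Thoras's implementation: the function names bounded_request and limit_for are hypothetical, and clamping the suggestion into the configured range is an assumption about how the bounds are enforced. Values are plain numbers in gigabytes to keep the example simple.

```python
from typing import Optional


def bounded_request(suggested: float, lowerbound: float,
                    upperbound: Optional[float] = None) -> float:
    """Keep a suggested request inside the configured bounds (assumed clamping)."""
    value = max(suggested, lowerbound)   # lowerbound is required
    if upperbound is not None:           # upperbound is optional
        value = min(value, upperbound)
    return value


def limit_for(request: float, ratio: Optional[float] = None) -> Optional[float]:
    """Return request * ratio, or None when no limit.ratio is configured,
    meaning the limit would be left untouched."""
    if ratio is None:
        return None
    return request * ratio


# Memory in gigabytes, mirroring the 1G request / 2.5 ratio example above.
request = bounded_request(suggested=1.0, lowerbound=0.5, upperbound=2.0)
print(request, limit_for(request, ratio=2.5))  # prints: 1.0 2.5
```

A suggestion below the lower bound would be raised to it, and one above the upper bound would be capped, before the ratio is applied.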
Additional Details
- Thoras recommendations are based on historical usage data collected from the Kubernetes Metrics Server.
- Recommendations are updated on the schedule defined by spec.model.forecast_cron.
- If resource usage patterns change significantly, Thoras will adjust its suggestions accordingly.
- You can monitor the impact of rightsizing by reviewing resource utilization and cost metrics in the Thoras Dashboard.
- For workloads with multiple containers, you can specify vertical settings for each container individually under spec.vertical.containers.
- spec.vertical.containers[].RESOURCE can be either cpu or memory. If either is unset for a container, Thoras will not scale that resource for that container.
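To make the per-container rule concrete, the sketch below models the entries under spec.vertical.containers as plain Python dictionaries and reports which resources would be scaled for each container. The helper managed_resources, the container names, and the bound values are all hypothetical and exist only for illustration.

```python
def managed_resources(containers):
    """Map each container name to the resources (cpu, memory) configured for it."""
    managed = {}
    for container in containers:
        managed[container["name"]] = [
            resource for resource in ("cpu", "memory") if resource in container
        ]
    return managed


# Hypothetical two-container workload: the sidecar sets memory only.
containers = [
    {"name": "app", "cpu": {"lowerbound": "100m"}, "memory": {"lowerbound": "128Mi"}},
    {"name": "sidecar", "memory": {"lowerbound": "64Mi"}},
]
print(managed_resources(containers))
# prints: {'app': ['cpu', 'memory'], 'sidecar': ['memory']}
```

Here the sidecar's CPU request would be left alone because no cpu block is set for it.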
Viewing Recommendations
Once the AIScaleTarget is applied, the Thoras Reasoning Engine will begin analyzing resource usage and generate vertical sizing recommendations on the schedule defined by spec.model.forecast_cron.
You can view the suggested requests and their projected effects on your workload in the Thoras dashboard under Resource Allocation Settings.
Next Steps
- Change spec.vertical.mode to autonomous if you want Thoras to automatically apply recommendations.
- Add the optional scaling_behavior if you want to fine-tune the thresholds Thoras uses to determine whether a recommendation is worth restarting the entire workload.
Autonomous Mode
The Thoras Reasoning Engine will make usage predictions on the schedule set by spec.model.forecast_cron for the length of spec.model.forecast_blocks. If no spec.vertical.scaling_behavior is set, then Thoras will check all resource requests to determine if they match the current suggestion. If any resource request differs from the current suggestion, Thoras triggers a rollout restart of the workload. As the pods restart, their resource requests and limits will be updated to match the suggestion made by Thoras.
Note: Thoras mutates the resource requests and limits of running pods directly. It does not modify the Deployment or StatefulSet object itself.
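The default check described above (no spec.vertical.scaling_behavior set) amounts to an "any difference triggers a restart" rule. The Python sketch below illustrates that rule only. The function needs_rollout and the sample values are hypothetical, and comparing quantities as plain strings is a simplification; it is not how Thoras is known to compare them.

```python
def needs_rollout(current_requests, suggested_requests):
    """Return True if any suggested request differs from the current request."""
    for container, suggestion in suggested_requests.items():
        current = current_requests.get(container, {})
        for resource, value in suggestion.items():
            if current.get(resource) != value:
                return True
    return False


# Hypothetical workload: only the memory suggestion has changed.
current = {"app": {"cpu": "250m", "memory": "512Mi"}}
suggested = {"app": {"cpu": "250m", "memory": "640Mi"}}
print(needs_rollout(current, suggested))  # prints: True
```

Because a single differing value is enough to restart the workload under this rule, a threshold-based scaling_behavior can be useful for workloads where restarts are costly.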