Predictive Vertical Pod Rightsizing is the process of preemptively optimizing the CPU and memory resource requests and limits for Kubernetes pods so that they have just enough resources to run efficiently, without over- or under-provisioning. Unlike horizontal scaling, which adjusts the number of pod replicas, vertical rightsizing tunes individual pod configurations to better match their actual usage patterns. This reduces wasted resources, lowers cloud costs, and improves cluster utilization, while also minimizing the risk of resource starvation or out-of-memory (OOM) errors.
Define an `AIScaleTarget` that configures rightsizing.
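As a rough sketch, a recommendation-mode manifest might look like the following. Only the `spec.vertical` and `spec.model` field names come from the field descriptions on this page. The `apiVersion`, `metadata`, `scaleTargetRef`, and every concrete value (names, bounds, cron expression, block length) are illustrative assumptions and should be checked against the Thoras reference.

```yaml
# Illustrative sketch only. apiVersion, metadata, scaleTargetRef, and all
# concrete values are assumptions, not taken from this page.
apiVersion: thoras.ai/v1            # assumed API group and version
kind: AIScaleTarget
metadata:
  name: my-app                      # hypothetical name
  namespace: default                # hypothetical namespace
spec:
  scaleTargetRef:                   # assumed field for selecting the workload
    kind: Deployment
    name: my-app
  model:
    forecast_cron: "*/15 * * * *"   # schedule on which recommendations are generated
    forecast_blocks: 15m            # assumed value format
  vertical:
    mode: recommendation            # suggest requests and limits, do not apply them
    containers:
      - name: my-app                # must match a container name in the workload's pods
        cpu:
          lowerbound: 100m          # required: lowest request Thoras may set
          upperbound: "1"           # optional: highest request Thoras may set
        memory:
          lowerbound: 256Mi         # required
          upperbound: 2G            # optional
          limit:
            ratio: 2.5              # a suggested request of 1G yields a limit of 2.5G
```

In this sketch the `cpu` block has no `limit`, so Thoras would leave CPU limits untouched while keeping the memory limit at 2.5 times the suggested memory request.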
- `spec.vertical.mode` set to `recommendation` ensures Thoras suggests optimal requests and limits but does not apply them.
- `spec.vertical.containers[].name` should be set to the name of the container in the pods for the workload you are targeting with this `AIScaleTarget`.
- `spec.vertical.containers[].RESOURCE.lowerbound` is the lowest value Thoras is allowed to set the request to for that resource. This field is required.
- `spec.vertical.containers[].RESOURCE.upperbound` is the highest value Thoras is allowed to set the request to for that resource. This field is optional.
- `spec.vertical.containers[].RESOURCE.limit` allows Thoras to modify the container's limit along with the request for this resource. This field is optional. If unset, Thoras will not modify limits for this resource.
- `spec.vertical.containers[].RESOURCE.limit.ratio` is the ratio used to keep the limit in line with the suggested request for this resource. For example, with a ratio of `2.5`, Thoras sets the limit to `2.5G` when the suggested request is `1G`.
- `spec.model.forecast_cron` sets the schedule on which Thoras generates recommendations.
- `spec.vertical.containers` lists the containers Thoras should rightsize.
- `spec.vertical.containers[].RESOURCE` can be either `cpu` or `memory`. If either is unset for a container, Thoras will not scale that resource for that container.

Once the `AIScaleTarget` is applied, the Thoras Reasoning Engine begins analyzing resource usage and generates vertical sizing recommendations on the schedule defined by `spec.model.forecast_cron`.
You can view the suggested requests and their projected effects on your workload
in the Thoras dashboard under Resource Allocation Settings.
- Set `spec.vertical.mode` to `autonomous` if you want Thoras to automatically apply recommendations.
- Set `spec.vertical.scaling_behavior` if you want to fine-tune the thresholds Thoras uses to decide whether a recommendation is worth restarting the entire workload.

In autonomous mode, Thoras makes a suggestion on the schedule set by `spec.model.forecast_cron` for the length of `spec.model.forecast_blocks`. If no `spec.vertical.scaling_behavior` is set, Thoras checks every resource request against the current suggestion. If any resource request differs from the current suggestion, Thoras triggers a rollout restart of the workload. As the pods restart, their resource requests and limits are updated to match the suggestion made by Thoras.
Note: Thoras applies the new resource requests and limits to pods as they are recreated during the rollout restart; it does not resize pods that are already running. It does not modify the Deployment or StatefulSet object itself.
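As a sketch, the relevant part of a manifest in autonomous mode might look like the fragment below. The container name and the memory bound are hypothetical values, and the sub-fields of `spec.vertical.scaling_behavior` are not described on this page, so it is deliberately left unset rather than guessed.

```yaml
# Fragment only: the rest of the AIScaleTarget (apiVersion, metadata,
# spec.model, and so on) is omitted. Names and values are hypothetical.
spec:
  vertical:
    mode: autonomous          # Thoras applies its recommendations automatically
    # spec.vertical.scaling_behavior is not set here, so any resource request
    # that differs from the current suggestion triggers a rollout restart.
    containers:
      - name: my-app          # hypothetical container name
        memory:
          lowerbound: 256Mi   # required lower bound for the memory request
```

Starting in `recommendation` mode and switching to `autonomous` only after reviewing the suggestions in the dashboard is one cautious way to roll this out, since autonomous mode can restart the workload.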