Prerequisites
Before you begin, make sure:
- Thoras is installed and running in your Kubernetes cluster.
- The Kubernetes Metrics Server is installed in your cluster.
- Your workloads have resource requests set and some usage history.
Sample Configuration
Here’s an example AIScaleTarget that configures rightsizing.
ast.yaml
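A minimal sketch of such a manifest might look like the following. The field names under spec.model and spec.vertical are the ones described on this page; the apiVersion, the scaleTargetRef block, and every name and value are assumptions made for illustration, so check them against the AIScaleTarget CRD installed in your cluster.

```yaml
# Illustrative sketch only. apiVersion, scaleTargetRef, names, and values are assumptions.
apiVersion: thoras.ai/v1            # assumed API group/version
kind: AIScaleTarget
metadata:
  name: my-app                      # hypothetical name
  namespace: default                # hypothetical namespace
spec:
  scaleTargetRef:                   # assumed field for selecting the workload
    kind: Deployment
    name: my-app
  model:
    forecast_blocks: 15m            # assumed value format
    forecast_cron: "*/15 * * * *"   # assumed schedule
  vertical:
    mode: recommendation            # suggest requests and limits, do not apply them
    containers:
      - name: my-app                # must match the container name in the pod spec
        cpu:
          lowerbound: 100m          # required
          upperbound: "1"           # optional
        memory:
          lowerbound: 128Mi         # required
          upperbound: 1Gi           # optional
```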
Vertical Breakdown
- spec.vertical.mode set to recommendation ensures Thoras suggests optimal requests and limits but does not apply them.
- spec.vertical.containers[].name should be set to the name of the container in the pods for the workload you are targeting with this AIScaleTarget.
- spec.vertical.containers[].RESOURCE.lowerbound is the lowest Thoras is allowed to set the request for that resource and is a required field.
- spec.vertical.containers[].RESOURCE.upperbound is the highest Thoras is allowed to set the request for that resource and is an optional field.
- spec.vertical.containers[].RESOURCE.limit allows Thoras to modify the container’s limit along with the request for this resource and is optional. If unset, Thoras will not modify limits for this resource.
- spec.vertical.containers[].RESOURCE.limit.ratio is the ratio used to keep the limit in line with the suggested request for this resource. E.g. for a ratio of 2.5, Thoras will make the limit 2.5G if the suggestion for the request was 1G.
- spec.vertical.in_place_resizing.enabled, when set to true, enables Kubernetes in-place pod resizing, which allows Thoras to adjust container resources without recreating pods (requires Kubernetes 1.33+).
- spec.vertical.in_place_resizing.allow_restart_on_memory_limit_decrease must be set to true if you’re using memory limits and have in-place resizing enabled. This allows Kubernetes to restart containers when memory limits decrease.
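To make the bounds and the limit ratio concrete, the fragment below sketches one container's memory settings. Only the field names are taken from the breakdown above; the container name and the quantities are hypothetical.

```yaml
# Fragment of an AIScaleTarget spec; container name and quantities are hypothetical.
spec:
  vertical:
    mode: recommendation
    containers:
      - name: my-app
        memory:
          lowerbound: 256Mi   # required: the request is never suggested below this
          upperbound: 4G      # optional: the request is never suggested above this
          limit:
            ratio: 2.5        # a suggested request of 1G gives a limit of 2.5G
```

With this fragment, only memory is rightsized for my-app, and the limit follows the request at 2.5 times its value.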
In-Place Resizing
When spec.vertical.in_place_resizing.enabled is set to true, Thoras will
attempt to resize pods without eviction, significantly reducing disruption
during vertical scaling operations.
Note: Enabling in-place resizing does not guarantee that pods will never restart during resizing operations. Pods may still be evicted and restarted if the resize would change the pod’s QoS class or if the cluster does not support the specific resize operation.
How it works:
- Thoras attempts to resize pods in place using the Kubernetes resize API
- If the pod’s QoS class would change or resize is not supported, Thoras falls back to evicting the pod
- Evicted pods are rolled out gradually to minimize service disruption
Benefits:
- Reduced disruption: Pods continue running during resource adjustments
- Faster scaling: Changes take effect immediately without pod recreation
- Better node utilization: Enables more efficient bin-packing and cost savings, especially when combined with node autoscaling tools like Karpenter
- Graceful fallback: Automatic eviction when in-place resizing isn’t possible
Requirements:
- Kubernetes 1.33+
- If using memory limits, set allow_restart_on_memory_limit_decrease: true
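The in-place resizing settings could be combined with a memory limit roughly as follows. This is a sketch: the field names come from this page, while the container name and quantities are hypothetical.

```yaml
# Fragment of an AIScaleTarget spec; container name and quantities are hypothetical.
spec:
  vertical:
    mode: autonomous
    in_place_resizing:
      enabled: true                                  # requires Kubernetes 1.33+
      allow_restart_on_memory_limit_decrease: true   # needed because a memory limit is managed below
    containers:
      - name: my-app
        memory:
          lowerbound: 256Mi
          limit:
            ratio: 1.5                               # Thoras manages the memory limit as well
```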
Additional Details
- Thoras recommendations are based on historical usage data collected from the Kubernetes Metrics Server.
- Recommendations are updated on the schedule defined by spec.model.forecast_cron.
- If resource usage patterns change significantly, Thoras will adjust its suggestions accordingly.
- You can monitor the impact of rightsizing by reviewing resource utilization and cost metrics in the Thoras Dashboard.
- For workloads with multiple containers, you can specify vertical settings for each container individually under spec.vertical.containers.
- spec.vertical.containers[].RESOURCE can be either cpu or memory. If either is unset for a container, Thoras will not scale that resource for that container.
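As an illustration of per-container settings, the fragment below sketches a two-container workload. The container names and quantities are hypothetical; the structure follows the fields described on this page.

```yaml
# Fragment of an AIScaleTarget spec; container names and quantities are hypothetical.
spec:
  vertical:
    mode: recommendation
    containers:
      - name: app            # both cpu and memory are rightsized
        cpu:
          lowerbound: 100m
          upperbound: "2"
        memory:
          lowerbound: 256Mi
          upperbound: 2Gi
      - name: log-shipper    # cpu is unset, so only memory is rightsized
        memory:
          lowerbound: 64Mi
```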
Viewing Recommendations
Once the AIScaleTarget is applied, the Thoras Reasoning Engine will begin
analyzing resource usage and generate vertical sizing recommendations on the
schedule defined by spec.model.forecast_cron.
You can view the suggested requests and their projected effects on your workload
in the Thoras dashboard under Resource Allocation Settings.

Next Steps
- Change spec.vertical.mode to autonomous if you want Thoras to automatically apply recommendations.
- Add optional scaling_behavior if you want to fine-tune the thresholds Thoras uses to determine if a recommendation is worth restarting the entire workload.
ast.yaml
Autonomous Mode
The Thoras Reasoning Engine will make usage predictions on the schedule set by spec.model.forecast_cron for the length of spec.model.forecast_blocks. If no
spec.vertical.scaling_behavior is set, Thoras will check all resource
requests to determine if they match the current suggestion. If any resource
request differs from the current suggestion, Thoras triggers a rollout restart
of the workload. As the pods restart, their resource requests and limits will be
updated to match the suggestion made by Thoras.
Note: Thoras mutates the resource requests and limits of running pods directly. It does not modify the Deployment or StatefulSet object itself.
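A fragment for autonomous mode might look like the sketch below. The field names are the ones mentioned in this section; the cron expression, the forecast_blocks value format, the container name, and the quantities are assumptions for illustration.

```yaml
# Fragment of an AIScaleTarget spec; schedule, value formats, names, and quantities are assumptions.
spec:
  model:
    forecast_cron: "*/15 * * * *"   # assumed schedule: predict every 15 minutes
    forecast_blocks: 15m            # assumed value format for the forecast length
  vertical:
    mode: autonomous                # Thoras applies its suggestions
    # scaling_behavior is omitted here, so any request that differs from
    # the current suggestion triggers a rollout restart of the workload.
    containers:
      - name: my-app
        cpu:
          lowerbound: 100m
        memory:
          lowerbound: 128Mi
```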