The AIScaleTarget is a Custom Resource that defines how Thoras scales
workloads.
The following is a sample AIScaleTarget definition:
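As a minimal sketch assembling the fields described in the sections below (the CRD apiVersion, the cron syntax, and all concrete values are illustrative assumptions, not confirmed Thoras defaults):

```yaml
# Illustrative sketch only: the apiVersion and all values are assumptions.
apiVersion: thoras.ai/v1alpha1
kind: AIScaleTarget
metadata:
  name: my-app                      # matches the name of the workload being scaled
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment                # Deployment, StatefulSet, Rollout, etc.
    name: my-app
  model:
    forecast_blocks: 15m            # "m" or "h" unit is required
    forecast_buffer_percentage: 10
    forecast_cron: "*/15 * * * *"   # assumed cron syntax
  vertical:
    mode: recommendation            # start here before switching to autonomous
```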
metadata
metadata.name matches the name of the workload being
scaled.
scaleTargetRef
scaleTargetRef is the reference to a specific workload that Thoras will scale
up or down. The workload must reside in the same namespace as the
AIScaleTarget.
apiVersion specifies the API group and version that defines the resource,
while kind identifies the type of resource (e.g., Deployment, StatefulSet,
Rollout, etc).
Note: scaleTargetRef and selector are mutually exclusive. Use
scaleTargetRef to target a single workload by name, or use selector to
target multiple workloads by label.
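A sketch of the reference block (the workload name is illustrative):

```yaml
spec:
  scaleTargetRef:
    apiVersion: apps/v1   # API group and version of the target resource
    kind: Deployment      # Deployment, StatefulSet, Rollout, etc.
    name: my-app          # must be in the same namespace as the AIScaleTarget
```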
selector
selector allows you to target multiple workloads by
pod labels
instead of specifying a single workload by name. This is useful when you want to
apply the same scaling policy to multiple workloads that share common labels.
Important constraints:
- selector and scaleTargetRef are mutually exclusive. You must use one or the other, not both.
- When using selector, horizontal scaling is not supported. You must leave spec.horizontal empty or omit it entirely.
- Pods must be managed by a Deployment, StatefulSet, or Argo Rollout. Standalone pods or other controller types are not supported.
When selector is defined, Thoras identifies all pod controllers
(Deployments, StatefulSets, Rollouts) in the same namespace whose pods match the
label selector.
In autonomous mode:
- Thoras restarts each matching pod controller one by one when a resource request suggestion comes in from the forecaster.
- If update_policy.update_mode is set to in_place or in_place_or_recreate, pods are resized in place without restarts (when possible).
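A sketch of a selector-based spec, assuming the field accepts a Kubernetes-style matchLabels map (the label key and value are illustrative):

```yaml
spec:
  selector:
    matchLabels:
      app.kubernetes.io/part-of: checkout   # illustrative label
  # scaleTargetRef must be omitted: it is mutually exclusive with selector
  # spec.horizontal must be empty or omitted: not supported with selector
  vertical:
    mode: recommendation
```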
model
model.forecast_blocks
Describes how far into the future Thoras’ scaler should prepare you for. For
example, if forecast_blocks is 15m, Thoras will forecast the maximum load of
this workload over the next 15 minutes and then scale to that maximum.
Important: forecast_blocks should be at least equal to your
forecast_cron interval. For example, if your forecast_cron is set to run
every 15m, your forecast_blocks should be at least 15m. This ensures
forecasts always cover the next scaling window and prevents gaps in scaling
coverage.
You will need to specify either the “m” (for minutes) or “h” (for hours) unit
for the value of model.forecast_blocks.
model.forecast_buffer_percentage
Defines an additional buffer applied on top of the forecasted resource usage to
reduce the risk of under-provisioning.
For example, if the forecasted CPU usage is 500m and
forecast_buffer_percentage is set to 10%, the final recommended value will be
550m. Setting this to 0% means no buffer will be added.
model.forecast_cron
The forecast_cron setting determines how often forecasts are generated and,
consequently, how often Thoras may trigger scaling events. Shorter intervals
(e.g., every 5 minutes) can lead to more accurate forecasts but may also result
in more frequent scaling.
forecast_cron controls how often forecasts are made, while
model.forecast_blocks defines the time window each
forecast covers.
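Putting the three model settings together, a sketch with illustrative values (the cron syntax and the bare-number percentage format are assumptions):

```yaml
spec:
  model:
    forecast_cron: "*/15 * * * *"    # generate a forecast every 15 minutes (assumed syntax)
    forecast_blocks: 15m             # cover at least the next forecast interval
    forecast_buffer_percentage: 10   # 500m forecast -> 550m recommendation
```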
horizontal
The mode field controls whether Thoras provides recommendations or actively
scales your workload:
In autonomous mode:
- Horizontally-scaled workloads must have CPU and/or memory requests defined in the pod spec if you plan to scale on averageUtilization.
- The target deployment of your AIScaleTarget must have a single existing Horizontal Pod Autoscaler (HPA).

Note: Only one scaling direction (horizontal or vertical) can be in
autonomous mode at a time. See Understanding Vertical and Horizontal Scaling Modes for additional details.
In recommendation mode:
- The system does not require an existing HPA. Thoras will assume an 80% utilization target for CPU and memory when generating scaling suggestions.
- You can opt in additional metrics using spec.horizontal.additional_metrics.
- You can opt out of default metrics using spec.horizontal.exclude_metrics.
It is recommended to start with spec.horizontal.mode set to recommendation,
which allows the model time to learn workload patterns and lets you validate
scaling suggestions. Once the recommendations align with your performance
expectations, you can switch to autonomous mode for automated scaling.
Feel free to reach out to the Thoras Engineering Team to discuss model
performance before switching into autonomous mode.
Visit Predictive Horizontal Pod Autoscaling with Thoras Guide
for more info.
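A sketch of a horizontal policy in recommendation mode (the value is the only field this page confirms for a minimal policy):

```yaml
spec:
  horizontal:
    mode: recommendation   # switch to autonomous once suggestions match expectations
    # additional_metrics / exclude_metrics may also be set here; see the
    # additional_metrics and exclude_metrics section
```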
vertical
The mode field controls whether Thoras provides recommendations or actively
scales your workload:
To define a vertical scaling policy, you’ll want to set the following fields:
- vertical.mode: recommendation. We recommend running Thoras in recommendation mode for at least a day so the model can train before enabling autonomous.
- vertical.containers[0].memory.lowerbound is always required, and vertical.containers[0].memory.upperbound is optional if you have a preference for a memory floor and ceiling for your target workload.
- vertical.containers[].RESOURCE.limit allows Thoras to modify the container’s limit along with the request for this resource and is optional. If unset, Thoras will not modify limits for this resource.
- vertical.containers[].RESOURCE.limit.ratio is the ratio used to keep the limit in line with the suggested request for this resource. E.g., for a ratio of 2.5, Thoras will make the limit 2.5G if the suggestion for the request was 1G.

Note: Only one scaling direction (horizontal or vertical) can be in
autonomous mode at a time. See Understanding Vertical and Horizontal Scaling Modes for details on how vertical and horizontal modes interact.
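A sketch of a vertical policy using the fields above (the container name field, bounds, and ratio are illustrative assumptions):

```yaml
spec:
  vertical:
    mode: recommendation      # run here for at least a day before autonomous
    containers:
      - name: app             # illustrative container name (assumed field)
        memory:
          lowerbound: 512Mi   # required: memory floor
          upperbound: 4Gi     # optional: memory ceiling
        cpu:
          limit:
            ratio: 2.0        # keep the cpu limit at 2x the suggested request
```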
update_policy
The update_policy field controls pod update behavior during vertical scaling
operations.
Prerequisites for in-place updates:
- Kubernetes 1.33+
- update_mode (string): Determines how pod updates are applied:
  - recreate: Pods are recreated when resource changes are applied.
  - in_place: Changes are applied without pod recreation (requires Kubernetes 1.33+).
  - in_place_or_recreate: Attempts in-place updates, with fallback to recreation (recommended).
  - initial: Recommendations apply only during pod creation.
- recreate_resources (optional array): Specifies which resources (memory, cpu) trigger pod recreation when using recreate or fallback modes.
- When update_mode is in_place or in_place_or_recreate, Thoras attempts to resize pods in place using the Kubernetes resize API.
- When update_mode is in_place_or_recreate and the pod’s QoS class would change or the resize operation is not supported, Thoras falls back to evicting the pod.
- Evicted pods are rolled out gradually rather than all at once, minimizing service disruption.
Benefits of in-place updates:
- Reduced disruption: Pods are not recreated unless necessary when using in-place updates
- Faster scaling: Resource adjustments take effect immediately
- Better node utilization: Enables more efficient bin-packing and cost savings, especially when combined with node autoscaling tools like Karpenter
- Graceful fallback: Automatic eviction when in-place resizing is not possible
- Fine-grained control: Specify which resources trigger pod recreation
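A sketch combining the update_policy fields above (values illustrative):

```yaml
spec:
  update_policy:
    update_mode: in_place_or_recreate   # recommended; in-place needs Kubernetes 1.33+
    recreate_resources:                 # optional: which resources force recreation
      - memory
```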
additional_configuration
scaling_behavior
Controls scaling thresholds to prevent unnecessary scaling actions. Thresholds
can be configured independently for scale_up and scale_down. Both
percentage and absolute thresholds can be used for horizontal and
vertical scaling.
Percentage thresholds trigger scaling only when the change exceeds a percentage
of current usage (works for both horizontal and vertical). Absolute thresholds
trigger scaling only when the change exceeds a fixed amount: pods for
horizontal scaling, and memory and cpu for vertical scaling.
- Use absolute thresholds for small workloads (percentage changes can be misleading)
- Use percentage thresholds for large workloads (scales naturally with deployment size)
- Set asymmetric thresholds (conservative scale-up, aggressive scale-down)
- Start with higher thresholds (20-30%) to avoid scaling churn
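This page names scale_up and scale_down but not the exact threshold keys, so the shape below is a guess for illustration only:

```yaml
spec:
  additional_configuration:
    scaling_behavior:
      scale_up:
        percentage: 20    # assumed key name; value per the 20-30% guidance above
      scale_down:
        percentage: 20    # thresholds may be set asymmetrically
```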
additional_metrics and exclude_metrics
You can configure Thoras to:
- Exclude certain metrics (e.g., CPU) from its predictions, even if they are used by the HPA.
- Add new metrics for prediction that are not part of the HPA.
Example 1: Exclude CPU from Predictive Scaling
This configuration tells Thoras not to use CPU for predictive scaling, even if it’s included in the HPA:
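A sketch of what such a configuration might look like (the metric name spelling and list shape are assumptions):

```yaml
spec:
  horizontal:
    mode: recommendation
    exclude_metrics:
      - cpu   # ignore CPU in predictions even if the HPA uses it
```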
Example 2: Add Custom Metric for Predictive Scaling
In this example, Thoras is configured to predictively scale on CPU by
explicitly including it in additional_metrics. Thoras will predictively scale
on CPU even though it is not defined in the HPA.
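A sketch of this configuration (the metric name spelling and list shape are assumptions):

```yaml
spec:
  horizontal:
    mode: recommendation
    additional_metrics:
      - cpu   # predict on CPU even though the HPA does not define it
```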