AIScaleTarget
An AIScaleTarget is a Custom Resource that defines how Thoras scales workloads.
The following is a sample AIScaleTarget definition:
metadata
metadata.name matches the name of the workload being scaled.
scaleTargetRef
scaleTargetRef is the required reference to the workload that Thoras will scale up or down. The workload must reside in the same namespace as the AIScaleTarget.
apiVersion specifies the API group and version that defines the resource, while kind identifies the type of resource (e.g., Deployment, StatefulSet, Rollout).
model
model.forecast_blocks
If forecast_blocks is 15m, Thoras will forecast the maximum load of this workload over the next 15 minutes and then scale to that maximum.
When updating forecast_blocks, always ensure that your forecast_cron schedule runs at least as often as the forecast_blocks window length.
For example, if forecast_blocks is set to 10m, your cron should run at least every 10 minutes. A 5m or 10m cron would work, but a 15m cron would not trigger often enough for optimal scaling.
The value of model.forecast_blocks must include a unit: either "m" (for minutes) or "h" (for hours).
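The relationship between the forecast_blocks window and the cron interval can be checked with a few lines of Python. This is a minimal sketch, not part of Thoras: the helper names block_minutes and cron_keeps_up are hypothetical, the cron schedule is represented only by its interval in minutes, and whole-number values such as 10m or 1h are assumed.

```python
import re


def block_minutes(value: str) -> int:
    """Convert a forecast_blocks value such as "10m" or "1h" to minutes.

    Assumes a whole number followed by "m" or "h" (hypothetical helper).
    """
    match = re.fullmatch(r"(\d+)([mh])", value)
    if match is None:
        raise ValueError(f"expected a number followed by 'm' or 'h', got {value!r}")
    amount, unit = int(match.group(1)), match.group(2)
    return amount * 60 if unit == "h" else amount


def cron_keeps_up(forecast_blocks: str, cron_interval_minutes: int) -> bool:
    """Return True when the cron fires at least once per forecast_blocks window."""
    return cron_interval_minutes <= block_minutes(forecast_blocks)


print(cron_keeps_up("10m", 5))   # True: a 5m cron is frequent enough
print(cron_keeps_up("10m", 10))  # True: a 10m cron matches the window
print(cron_keeps_up("10m", 15))  # False: a 15m cron runs too rarely
```

A value without a unit, such as "10", raises ValueError in this sketch, mirroring the requirement that the unit be given.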
model.forecast_buffer_percentage
For example, if the forecast is 500m and forecast_buffer_percentage is set to 10%, the final recommended value will be 550m. Setting this to 0% means no buffer will be added.
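The buffer arithmetic can be worked through in Python. This is a sketch under stated assumptions: the buffer is taken to be a simple percentage added on top of the forecast, the 500m starting forecast is inferred from the 550m result at 10%, and the function name with_buffer and the round-down behavior are illustrative rather than anything Thoras documents.

```python
def with_buffer(forecast_milli: int, buffer_percentage: int) -> int:
    """Add a percentage buffer to a forecast expressed in milli-units (e.g. 500 for 500m).

    Integer arithmetic is used so the result is exact; fractional results
    are rounded down (an assumption made for this sketch).
    """
    return forecast_milli * (100 + buffer_percentage) // 100


print(with_buffer(500, 10))  # 550, i.e. 550m
print(with_buffer(500, 0))   # 500, i.e. no buffer added
```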
model.forecast_cron
The forecast_cron setting determines how often forecasts are generated and, consequently, how often Thoras may trigger scaling events. Shorter intervals (e.g., every 5 minutes) can lead to more accurate forecasts but may also result in more frequent scaling.
forecast_cron controls how often forecasts are made, while model.forecast_blocks defines the time window each forecast covers.
horizontal
autonomous mode:
averageUtilization.
The AIScaleTarget must have a single existing Horizontal Pod Autoscaler (HPA).
recommendation mode:
spec.horizontal.additional_metrics
spec.horizontal.exclude_metrics
Set spec.horizontal.mode to recommendation. It is recommended to start in recommendation mode to allow the model time to learn workload patterns and validate scaling suggestions. Once the recommendations align with your performance expectations, you can switch to autonomous mode for automated scaling.
Feel free to reach out to the Thoras Engineering Team to discuss model
performance before switching into autonomous mode.
Visit the Predictive Horizontal Pod Autoscaling with Thoras Guide for more info.
vertical
vertical.mode: recommendation. We recommend running Thoras in recommendation mode for at least a day so the model can train before enabling autonomous mode.
vertical.containers[0].memory.lowerbound is always required; vertical.containers[0].memory.upperbound is optional. Use them to set a memory floor and ceiling for your target workload.
scaling_behavior
The scaling_behavior field controls the rate and pattern of scaling actions. It helps smooth out rapid scaling and prevents Thoras from scaling too aggressively or too conservatively.
scale_up defines how quickly Thoras is allowed to increase the number of replicas.
scale_down defines how quickly it can decrease the number of replicas.
scaling_behavior gives you a dial to control how readily Thoras triggers a scaling event. This setting helps rate-limit how much Thoras can scale up or down, making your scaling behavior more predictable and safe, based on your needs.
additional_configuration
With CPU listed in additional_metrics, Thoras will predictively scale on CPU even though it is not defined in the HPA.