AIScaleTarget Custom Resource (CRD) in your Kubernetes cluster.
This guide will walk you through doing this by applying an AIScaleTarget
resource with kubectl.
Prerequisites
- A running Kubernetes cluster with Kubernetes metrics server installed
- Thoras installed and running (see quickstart guide)
0. (Optional) Auto-configure AIScaleTargets using the dashboard
On a fresh installation, Thoras can auto-discover workloads and make suggestions
to auto-populate AIScaleTargets in the cluster. This enables metric collection
and resource recommendations for the selected workloads (see
quickstart guide).
Eventually you will probably want to integrate your AIScaleTarget
configurations into your delivery pipelines, but the auto-populate feature is a
particularly useful way to try out the product without having to manage any
configuration.
1. Define Initial AIScaleTarget Custom Resource
Create a new file named my-ast.yaml that defines the following fields:
- metadata.name: The name for your AIScaleTarget resource (should match your workload name for clarity).
- metadata.namespace: The namespace where your workload is running.
- spec.scaleTargetRef: Points to the Kubernetes resource (usually a Deployment) you want to scale.
- spec.model.forecast_blocks: How far into the future to forecast.
- spec.model.forecast_cron: How often to make a forecast.
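The fields above might fit together roughly as follows. This is a minimal sketch rather than the authoritative schema: the apiVersion value, the value formats for forecast_blocks and forecast_cron, and the names my-app and default are assumptions for illustration, and the installed CRD may require additional fields.

```shell
# Write a minimal AIScaleTarget manifest to my-ast.yaml.
# Assumptions (not confirmed by this guide): the apiVersion, the value
# formats under spec.model, and the placeholder names my-app / default.
cat > my-ast.yaml <<'EOF'
apiVersion: thoras.ai/v1            # assumed API group/version
kind: AIScaleTarget
metadata:
  name: my-app                      # matches the workload name for clarity
  namespace: default                # namespace where the workload runs
spec:
  scaleTargetRef:                   # the workload to scale
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  model:
    forecast_blocks: 15m            # how far ahead to forecast (assumed format)
    forecast_cron: "*/15 * * * *"   # how often to forecast (assumed format)
EOF
```

If the CRD is installed, kubectl explain aiscaletarget.spec should show which fields the cluster actually accepts.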
2. Apply the AIScaleTarget Using the CLI
Use kubectl apply to register the AIScaleTarget with your cluster:
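Assuming the manifest was saved as my-ast.yaml in the current directory and kubectl is pointed at the intended cluster, the command would look like this:

```shell
# Create or update the AIScaleTarget described in my-ast.yaml.
# The namespace is taken from metadata.namespace in the file.
kubectl apply -f my-ast.yaml
```

Running the same command again after editing the file updates the existing resource.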
3. Verify the AIScaleTarget
Check that your AIScaleTarget was created:
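A sketch of the check, assuming the CRD is exposed to kubectl under the resource name aiscaletarget (kubectl normally accepts the lowercased kind; kubectl api-resources lists the exact name) and using my-app and default as placeholder name and namespace:

```shell
# List all AIScaleTargets in the namespace (placeholder: default).
kubectl get aiscaletarget -n default

# Inspect a single resource in detail (placeholder name: my-app).
kubectl describe aiscaletarget my-app -n default
```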
An existing AIScaleTarget can be exported to YAML for version control
or editing with:
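One way to do this, again treating the resource name aiscaletarget, the object name my-app, and the namespace default as assumptions to replace with your own values:

```shell
# Dump the live resource as YAML into a file that can be committed or edited.
# "aiscaletarget", "my-app", and "default" are placeholders.
kubectl get aiscaletarget my-app -n default -o yaml > my-ast.yaml
```

The exported YAML also contains server-populated fields such as status and metadata.resourceVersion, which are usually removed before committing the file.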
Understanding Vertical and Horizontal Scaling Modes
An AIScaleTarget can have both vertical and horizontal sections
configured, each with its own mode setting (recommendation or autonomous).
However, it is important to understand how these interact.
Recommendation Mode (Both Directions)
When both vertical and horizontal are in recommendation mode, Thoras provides
suggestions for both scaling directions. These suggestions are alternatives:
apply one or the other, not both.
For example, if Thoras suggests 1Gi memory (vertical) and 3 pods (horizontal),
the workload could be rightsized by either:
- Applying the vertical suggestion (1Gi memory), OR
- Applying the horizontal suggestion (3 pods)
Autonomous Mode (One Direction Only)
Only one scaling direction can be in autonomous mode at a time. When enabling
autonomous mode, you choose whether Thoras should automatically scale vertically
or horizontally.
Valid configurations:
| Horizontal Mode | Vertical Mode | Valid |
|---|---|---|
| recommendation | recommendation | ✓ |
| autonomous | recommendation | ✓ |
| recommendation | autonomous | ✓ |
| autonomous | autonomous | ✗ |
- Horizontal autonomous is recommended for workloads with spiky usage that require rapid scaling to handle traffic bursts.
- Vertical autonomous is ideal for workloads that prioritize efficient cluster bin packing and cost optimization.
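As an illustration of one valid combination from the table, the sketch below writes a small patch body that makes horizontal scaling autonomous while vertical scaling stays in recommendation mode. The file name ast-modes-patch.yaml is hypothetical, and the sketch assumes both sections already exist on the resource and that mode sits directly under spec.horizontal and spec.vertical.

```shell
# Merge-patch body: horizontal scaling is autonomous, vertical stays advisory.
# Setting both modes to "autonomous" is the one invalid combination.
cat > ast-modes-patch.yaml <<'EOF'
spec:
  horizontal:
    mode: autonomous
  vertical:
    mode: recommendation
EOF
```

It could then be applied with something like kubectl patch aiscaletarget my-app -n default --type merge --patch-file ast-modes-patch.yaml, where aiscaletarget, my-app, and default are placeholders for your own resource name, object name, and namespace.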
Using Label Selectors to Target Multiple Workloads
Instead of targeting a single workload by name with scaleTargetRef, you can
use selector to apply the same vertical scaling policy to multiple workloads
that share common pod labels.
When to Use Label Selectors
Label selectors are useful when you want to:
- Scale a workload that uses a blue/green deployment strategy
- Scale workloads dynamically based on labels without hardcoding specific names
- Manage scaling configurations at a higher level of abstraction
Example: Targeting Multiple Workloads by Label
Create an AIScaleTarget using a label selector:
For instance, a selector on app: my-service and environment: production
targets every workload in the namespace whose pods carry both labels.
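A sketch of what such a manifest could look like. The matchLabels layout under spec.selector is an assumption borrowed from the standard Kubernetes label selector shape; the apiVersion, the spec.model values, the file name, and the production namespace are likewise illustrative rather than confirmed by this guide.

```shell
# Write a selector-based AIScaleTarget manifest (vertical scaling only).
# Assumptions: apiVersion, the matchLabels layout, the spec.model value
# formats, and the placeholder name/namespace.
cat > my-service-ast.yaml <<'EOF'
apiVersion: thoras.ai/v1            # assumed API group/version
kind: AIScaleTarget
metadata:
  name: my-service
  namespace: production             # must be the namespace of the targeted workloads
spec:
  selector:                         # replaces the single-workload reference
    matchLabels:                    # assumed layout
      app: my-service
      environment: production
  model:
    forecast_blocks: 15m            # assumed format
    forecast_cron: "*/15 * * * *"   # assumed format
  vertical:
    mode: recommendation
  # No horizontal section: it is not supported together with selector.
EOF
```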
Important Constraints
- selector and scaleTargetRef are mutually exclusive; use one or the other
- Horizontal scaling is not supported with selector; omit spec.horizontal entirely
- Only vertical scaling is supported when using label selectors
- All targeted workloads must be in the same namespace as the AIScaleTarget
- Pods must be managed by a Deployment, StatefulSet, or Argo Rollout. Standalone pods or other controller types are not supported.
Behavior in Autonomous Mode
When using selector in autonomous mode:
- Thoras identifies all matching pod controllers in the namespace
- When the forecaster suggests new resource requests, Thoras updates each matching controller one by one
- If update_policy.update_mode is set to in_place or in_place_or_recreate, pods are resized in place if possible (Kubernetes 1.33+)
- If in_place_or_recreate is specified and resizing is not possible or fails, pods are recreated with the new resource requests. If in_place is specified and resizing is not possible or fails, existing pods are not recreated, but any future pods will be right-sized.