AIScaleTarget Custom Resource (CRD) in your Kubernetes cluster.
This guide walks you through doing this by applying an AIScaleTarget
resource with kubectl.
Prerequisites
- A running Kubernetes cluster with the Kubernetes Metrics Server installed
- Thoras installed and running (see quickstart guide)
0. (Optional) Auto-configure AIScaleTargets using the dashboard
On a fresh installation, Thoras can auto-discover workloads and make suggestions
to auto-populate AIScaleTargets in the cluster. This enables metric collection
and resource recommendations for the selected workloads (see
quickstart guide).
Eventually you will probably want to integrate your AIScaleTarget
configurations into your delivery pipelines, but the auto-populate feature is a
particularly useful way to try out the product without having to manage any
configuration.
1. Define Initial AIScaleTarget Custom Resource
Create a new file named my-ast.yaml that sets the following fields:
- metadata.name: The name for your AIScaleTarget resource (should match your workload name for clarity).
- metadata.namespace: The namespace where your workload is running.
- spec.scaleTargetRef: Points to the Kubernetes resource (usually a Deployment) you want to scale.
- spec.model.forecast_blocks: How far into the future to forecast.
- spec.model.forecast_cron: How often to make a forecast.
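As a minimal sketch, a my-ast.yaml covering the fields above might look like the following. The apiVersion, the sub-fields of scaleTargetRef, the my-app and default names, and the value formats shown for forecast_blocks and forecast_cron are assumptions for illustration. Check them against the AIScaleTarget CRD installed in your cluster before applying.

```yaml
# Sketch only: anything marked "assumed" is not confirmed by this guide.
apiVersion: thoras.ai/v1            # assumed API group and version
kind: AIScaleTarget
metadata:
  name: my-app                      # hypothetical; match your workload name
  namespace: default                # hypothetical; namespace of the workload
spec:
  scaleTargetRef:                   # the workload Thoras should scale
    apiVersion: apps/v1             # assumed sub-fields, following the usual
    kind: Deployment                # Kubernetes object-reference convention
    name: my-app
  model:
    forecast_blocks: 15m            # assumed format: how far ahead to forecast
    forecast_cron: "*/15 * * * *"   # assumed format: how often to forecast
```

Naming the AIScaleTarget after the workload, as suggested above, makes it easier to find the matching resource later.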
2. Apply the AIScaleTarget Using the CLI
Run kubectl apply -f my-ast.yaml to register the AIScaleTarget with your cluster.
3. Verify the AIScaleTarget
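Check that your AIScaleTarget was created by listing the AIScaleTarget
resources in your workload's namespace with kubectl get.
An AIScaleTarget can be exported to YAML for version control or editing by
adding -o yaml to the kubectl get command.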
Understanding Vertical and Horizontal Scaling Modes
An AIScaleTarget can have both vertical and horizontal sections
configured, each with its own mode setting (recommendation or autonomous).
However, it is important to understand how these interact.
Recommendation Mode (Both Directions)
When both vertical and horizontal are in recommendation mode, Thoras provides
suggestions for both scaling directions. These suggestions are mutually
exclusive: apply one or the other, not both.
For example, if Thoras suggests 1Gi memory (vertical) and 3 pods (horizontal),
the workload could be rightsized by either:
- Applying the vertical suggestion (1Gi memory), OR
- Applying the horizontal suggestion (3 pods)
Autonomous Mode (One Direction Only)
Only one scaling direction can be in autonomous mode at a time. When enabling
autonomous mode, you choose whether Thoras should automatically scale vertically
or horizontally.
Valid configurations:
| Horizontal Mode | Vertical Mode | Valid |
|---|---|---|
| recommendation | recommendation | ✓ |
| autonomous | recommendation | ✓ |
| recommendation | autonomous | ✓ |
| autonomous | autonomous | ✗ |
- Horizontal autonomous is recommended for workloads with spiky usage that require rapid scaling to handle traffic bursts.
- Vertical autonomous is ideal for workloads that prioritize efficient cluster bin packing and cost optimization.
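To make the table concrete, here is a sketch of one valid combination: horizontal scaling in autonomous mode with vertical scaling left in recommendation mode. Only the mode settings described in this section are shown. Any other fields that the horizontal and vertical sections may need are omitted, so treat this as a fragment rather than a complete spec.

```yaml
# Fragment of an AIScaleTarget spec (other required fields omitted).
spec:
  horizontal:
    mode: autonomous        # Thoras scales horizontally on its own
  vertical:
    mode: recommendation    # Thoras only suggests vertical changes
# Setting both modes to autonomous is the one invalid combination.
```

Swapping the two values gives the vertical-autonomous variant suited to bin packing and cost optimization.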
Choosing a Forecast Optimization Strategy
Thoras forecasts future demand for your workload and recommends resource levels accordingly. You can choose an optimization strategy that reflects your operational priorities. All strategies forecast demand and include safety guardrails. The difference is how tightly recommendations track expected usage versus how much additional margin you prefer.
Quick Reference
| Strategy | Approach | Recommended For |
|---|---|---|
| balanced (default) | Balanced reliability and efficiency | General production workloads |
| cost_optimized | Tighter margins with increased efficiency focus | Async, retry-safe workloads |
| undershoot | Most aggressive efficiency with reduced safety margins | Non-critical or resilient jobs |
Mode Details
balanced (default)
- Balances reliability and efficiency with comfortable safety margins
- Provides strong coverage against historically observed demand volatility
- Choose this for production workloads where you want both optimization and reliability
cost_optimized
- Uses tighter margins to increase efficiency
- May allow occasional brief resource pressure during rare demand spikes
- Best for workloads that tolerate moderate variability: background jobs, async services, retry-safe systems
undershoot
- Most aggressive cost-saving strategy
- Accepts a higher probability of temporary resource pressure (e.g., throttling or backlog)
- Best for non-critical workloads, batch processing, or systems with strong retry logic and fallback behavior
If you do not choose a strategy, the default is mode: balanced.
Using Label Selectors to Target Multiple Workloads
Instead of targeting a single workload by name with scaleTargetRef, you can
use selector to apply the same vertical scaling policy to multiple workloads
that share common pod labels.
When to Use Label Selectors
Label selectors are useful when you want to:
- Scale a workload that uses a blue/green deployment strategy
- Scale workloads dynamically based on labels without hardcoding specific names
- Manage scaling configurations at a higher level of abstraction
Example: Targeting Multiple Workloads by Label
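Create an AIScaleTarget that uses a label selector instead of scaleTargetRef.
For example, a selector on app: my-service and environment: production targets
all workloads whose pods carry both labels.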
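As a sketch, a label-selector spec for the app: my-service and environment: production labels could look like the following. The placement of selector directly under spec and the Kubernetes-style matchLabels key are assumptions, since this guide does not show the selector schema. Verify both against the AIScaleTarget CRD in your cluster.

```yaml
# Fragment of an AIScaleTarget spec (metadata and model omitted).
spec:
  selector:                     # assumed location: directly under spec
    matchLabels:                # assumed Kubernetes-style label selector
      app: my-service
      environment: production
  vertical:
    mode: recommendation        # only vertical scaling works with selector
  # No scaleTargetRef and no horizontal section: both must be left out
  # when selector is used.
```

The AIScaleTarget itself must live in the same namespace as the workloads it selects.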
Important Constraints
- selector and scaleTargetRef are mutually exclusive; use one or the other
- Horizontal scaling is not supported with selector; omit spec.horizontal entirely
- Only vertical scaling is supported when using label selectors
- All targeted workloads must be in the same namespace as the AIScaleTarget
- Pods must be managed by a Deployment, StatefulSet, or Argo Rollout. Standalone pods or other controller types are not supported.
Behavior in Autonomous Mode
When using selector in autonomous mode:
- Thoras identifies all matching pod controllers in the namespace
- When the forecaster suggests new resource requests, Thoras updates each matching controller one by one
- If update_policy.update_mode is set to in_place or in_place_or_recreate, pods are resized in place if possible (Kubernetes 1.33+)
- If in_place_or_recreate is specified and resizing is not possible or fails, pods are recreated with the new resource requests
- If in_place is specified and resizing is not possible or fails, existing pods are not recreated, but any future pods will be right-sized
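The update mode setting described above can be sketched as the fragment below. Only the update_policy.update_mode path and its two values come from this guide. Where update_policy sits within the AIScaleTarget spec is not shown here, so the fragment deliberately omits its parent key; consult the CRD reference for the exact location.

```yaml
# Fragment only: the parent key of update_policy is not shown in this guide.
update_policy:
  # in_place_or_recreate: resize in place when possible, otherwise recreate pods.
  # in_place: resize in place when possible; never recreate existing pods.
  update_mode: in_place_or_recreate
```

Choosing in_place avoids pod restarts at the cost of leaving some running pods at their old resource requests until they are replaced for another reason.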