To enable Thoras to manage or recommend scaling for your workload, you must register an AIScaleTarget custom resource in your Kubernetes cluster. This guide walks you through applying an AIScaleTarget resource with kubectl.

Prerequisites

0. (Optional) Auto-configure AIScaleTargets using the dashboard

On a fresh installation, Thoras can auto-discover workloads and make suggestions to auto-populate AIScaleTargets in the cluster. This enables metric collection and resource recommendations for the selected workloads (see quickstart guide). Eventually you will probably want to integrate your AIScaleTarget configurations into your delivery pipelines, but the auto-populate feature is a particularly useful way to try out the product without having to manage any configuration.

1. Define Initial AIScaleTarget Custom Resource

Create a new file named my-ast.yaml with the following contents:
apiVersion: thoras.ai/v1
kind: AIScaleTarget
metadata:
  name: {{YOUR_AST_NAME}}
  namespace: {{YOUR_NAMESPACE}}
spec:
  model:
    forecast_blocks: 15m
    forecast_cron: "*/15 * * * *"
  scaleTargetRef:
    kind: Deployment
    name: {{YOUR_AST_NAME}}
  horizontal:
    mode: recommendation
  • metadata.name: The name for your AIScaleTarget resource (should match your workload name for clarity).
  • metadata.namespace: The namespace where your workload is running.
  • spec.scaleTargetRef: Points to the Kubernetes resource (usually a Deployment) you want to scale.
  • spec.model.forecast_blocks: How far into the future to forecast.
  • spec.model.forecast_cron: How often to make a forecast.
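For reference, here is the same template with the placeholders filled in for a hypothetical Deployment named checkout-api running in the prod namespace:
apiVersion: thoras.ai/v1
kind: AIScaleTarget
metadata:
  name: checkout-api
  namespace: prod
spec:
  model:
    forecast_blocks: 15m
    forecast_cron: "*/15 * * * *"
  scaleTargetRef:
    kind: Deployment
    name: checkout-api
  horizontal:
    mode: recommendation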

2. Apply the AIScaleTarget Using the CLI

Use kubectl apply to register the AIScaleTarget with your cluster:
kubectl apply -f my-ast.yaml
You should see output like:
aiscaletarget.thoras.ai/my-name created

3. Verify the AIScaleTarget

Check that your AIScaleTarget was created:
kubectl get aiscaletarget -n my-namespace
To see details:
kubectl describe aiscaletarget my-app -n my-namespace
Note: An existing AIScaleTarget can be exported to a YAML file for version control or editing with:
kubectl get aiscaletarget <name> -n <namespace> -o yaml > my-ast.yaml
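If kubectl reports that the resource type is unknown (rather than simply listing no resources), verify that the Thoras CRD is installed. Assuming the conventional plural name for the CRD, that check looks like:
kubectl get crd aiscaletargets.thoras.ai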

Understanding Vertical and Horizontal Scaling Modes

An AIScaleTarget can have both vertical and horizontal sections configured, each with its own mode setting (recommendation or autonomous). However, it is important to understand how these interact.

Recommendation Mode (Both Directions)

When both vertical and horizontal are in recommendation mode, Thoras provides suggestions for both scaling directions. These suggestions are alternatives and should not be combined. For example, if Thoras suggests 1Gi memory (vertical) and 3 pods (horizontal), the workload could be rightsized by either:
  • Applying the vertical suggestion (1Gi memory), OR
  • Applying the horizontal suggestion (3 pods)
Applying both suggestions would result in an over-provisioned workload.
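A partial spec sketch with both directions in recommendation mode (the container name and resource bounds are illustrative; the vertical fields mirror the selector example later in this guide):
spec:
  scaleTargetRef:
    kind: Deployment
    name: {{YOUR_AST_NAME}}
  horizontal:
    mode: recommendation
  vertical:
    mode: recommendation
    containers:
      - name: {{CONTAINER_NAME}}
        cpu:
          lowerbound: 20m
          upperbound: 1
        memory:
          lowerbound: 50Mi
          upperbound: 2Gi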

Autonomous Mode (One Direction Only)

Only one scaling direction can be in autonomous mode at a time. When enabling autonomous mode, you choose whether Thoras should automatically scale vertically or horizontally. Valid configurations:
Horizontal Mode    Vertical Mode     Valid
recommendation     recommendation    Yes
autonomous         recommendation    Yes
recommendation     autonomous        Yes
autonomous         autonomous        No
When enrolling a workload into autonomous mode, consider which scaling direction best fits your workload:
  • Horizontal autonomous is recommended for workloads with spiky usage that require rapid scaling to handle traffic bursts.
  • Vertical autonomous is ideal for workloads that prioritize efficient cluster bin packing and cost optimization.
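For example, a valid combination that enables autonomous horizontal scaling while keeping vertical scaling in recommendation mode could be sketched as the following partial spec:
spec:
  horizontal:
    mode: autonomous        # Thoras adjusts replica counts automatically
  vertical:
    mode: recommendation    # resource-request changes are suggested only
    # containers and bounds as in the examples above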
Contact Thoras support with any questions about which scaling direction best fits your workloads and business needs.

Using Label Selectors to Target Multiple Workloads

Instead of targeting a single workload by name with scaleTargetRef, you can use selector to apply the same vertical scaling policy to multiple workloads that share common pod labels.

When to Use Label Selectors

Label selectors are useful when you want to:
  • Scale a workload that uses a blue/green deployment strategy
  • Scale workloads dynamically based on labels without hardcoding specific names
  • Manage scaling configurations at a higher level of abstraction

Example: Targeting Multiple Workloads by Label

Create an AIScaleTarget using a label selector:
apiVersion: thoras.ai/v1
kind: AIScaleTarget
metadata:
  name: production-services
  namespace: {{YOUR_NAMESPACE}}
spec:
  selector:
    matchLabels:
      app: my-service
      environment: production
  model:
    forecast_blocks: 15m
    forecast_cron: "*/15 * * * *"
  vertical:
    containers:
      - name: {{CONTAINER_NAME}}
        cpu:
          lowerbound: 20m
          upperbound: 1
        memory:
          lowerbound: 50Mi
          upperbound: 2G
    mode: recommendation
    update_policy:
      update_mode: in_place_or_recreate
This configuration targets all Deployments, StatefulSets, and Rollouts in the namespace whose pods have both app: my-service and environment: production labels.
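Note that the selector matches pod labels, so the labels must appear on the workload's pod template. A minimal sketch, assuming a hypothetical Deployment named my-service:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-service
  namespace: {{YOUR_NAMESPACE}}
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-service
  template:
    metadata:
      labels:
        app: my-service
        environment: production
    spec:
      containers:
        - name: {{CONTAINER_NAME}}
          image: {{YOUR_IMAGE}}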

Important Constraints

  • selector and scaleTargetRef are mutually exclusive—use one or the other
  • Horizontal scaling is not supported with selector—omit spec.horizontal entirely
  • Only vertical scaling is supported when using label selectors
  • All targeted workloads must be in the same namespace as the AIScaleTarget
  • Pods must be managed by a Deployment, StatefulSet, or Argo Rollout. Standalone pods or other controller types are not supported.

Behavior in Autonomous Mode

When using selector in autonomous mode:
  1. Thoras identifies all matching pod controllers in the namespace
  2. When the forecaster suggests new resource requests, Thoras updates each matching controller one by one
  3. If update_policy.update_mode is set to in_place or in_place_or_recreate, pods are resized in place if possible (Kubernetes 1.33+)
  4. If in_place_or_recreate is specified and in-place resizing is not possible or fails, pods are recreated with the new resource requests. If in_place is specified and in-place resizing is not possible or fails, existing pods are left unchanged, but any future pods will be right-sized (see the sketch after this list).
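For reference, the update policy lives under the vertical section, as in the selector example above. A sketch showing the two in-place modes described in this list:
spec:
  vertical:
    mode: autonomous
    update_policy:
      update_mode: in_place_or_recreate   # resize in place when possible, otherwise recreate pods
      # update_mode: in_place             # resize in place only; existing pods stay unchanged if resizing fails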
Additionally, it can be useful to sync your workload’s active resource requests back to the manifests in your source code repository, keeping the baseline workload definition aligned with reality. Reach out to your Thoras point-of-contact for information on how to automate this process.