AIScaleTarget

This specification describes the AIScaleTarget Custom Resource, which defines how Thoras scales workloads.

The following is a sample AIScaleTarget definition:

ast.yaml

apiVersion: thoras.ai/v1
kind: AIScaleTarget
metadata:
  name: "{{ YOUR_AST_NAME }}"
  namespace: "{{ YOUR_NAMESPACE }}"
spec:
  scaleTargetRef:
    kind: Deployment
    name: "{{ YOUR_AST_NAME }}"
  model:
    forecast_blocks: 15m
    forecast_buffer_percentage: 0%
    forecast_cron: "*/15 * * * *"
  horizontal:
    mode: recommendation
    scaling_behavior:
      scale_up:
        type: percent
        percent: 50
      scale_down:
        type: percent
        percent: 50
  vertical:
    containers:
      - name: "{{ CONTAINER_NAME }}"
        cpu:
          lowerbound: 20m
          upperbound: 1
        memory:
          lowerbound: 50Mi
          upperbound: 2G
    mode: recommendation
    scaling_behavior:
      scale_up:
        type: percent
        percent: 50
      scale_down:
        type: percent
        percent: 50

`metadata`

ast.yaml

metadata:
  name: "{{ YOUR_AST_NAME }}"
  namespace: "{{ YOUR_NAMESPACE }}"

It’s recommended that metadata.name matches the name of the workload being scaled.

`scaleTargetRef`

ast.yaml

scaleTargetRef:
  kind: Deployment
  name: "{{ YOUR_AST_NAME }}"
  namespace: "{{ YOUR_NAMESPACE }}"
  apiVersion: apps/v1

scaleTargetRef is the required reference to the workload that Thoras will scale up or down. The workload must reside in the same namespace as the AIScaleTarget.

apiVersion specifies the API group and version that defines the resource, while kind identifies the type of resource (e.g., Deployment, StatefulSet, or Rollout).
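As an illustration of a non-Deployment target, the following sketch points scaleTargetRef at a StatefulSet. The placeholder name is hypothetical; apps/v1 is the standard Kubernetes API group and version for StatefulSets.

```yaml
scaleTargetRef:
  kind: StatefulSet
  name: "{{ YOUR_STATEFULSET_NAME }}" # hypothetical placeholder
  namespace: "{{ YOUR_NAMESPACE }}" # same namespace as the AIScaleTarget
  apiVersion: apps/v1
```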

`model`

ast.yaml

model:
  forecast_blocks: 15m
  forecast_buffer_percentage: 0%
  forecast_cron: "*/15 * * * *"

`model.forecast_blocks`

Describes how far into the future Thoras’ scaler should prepare you for. For example, if forecast_blocks is 15m, Thoras will forecast the maximum load of this workload over the next 15 minutes and then scale to that maximum.

When updating forecast_blocks, you should always ensure your forecast_cron schedule will run as often or more often than the forecast_blocks window length.

For example, if your forecast_blocks is set to 10m, your cron should run at least every 10m. A 5m or 10m cron would work, but a 15m cron would not trigger often enough for optimal scaling.
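The pairing described here could look like the following sketch, where a 10-minute forecast window is refreshed every 5 minutes. The values are illustrative rather than recommended settings.

```yaml
model:
  forecast_blocks: 10m
  forecast_buffer_percentage: 0%
  # Runs every 5 minutes, which is more often than the 10m window.
  forecast_cron: "*/5 * * * *"
```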

You will need to specify either the “m” (for minutes) or “h” (for hours) unit for the value of model.forecast_blocks.
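For an hour-based window, a sketch might look like this. The 30-minute cron interval is an illustrative choice; it keeps forecasts running at least as often as the window length.

```yaml
model:
  forecast_blocks: 1h
  forecast_buffer_percentage: 0%
  # Runs at minute 0 and minute 30 of every hour, twice per 1h window.
  forecast_cron: "*/30 * * * *"
```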

`model.forecast_buffer_percentage`

Defines an additional buffer applied on top of the forecasted resource usage to reduce the risk of under-provisioning.

For example, if the forecasted CPU usage is 500m and forecast_buffer_percentage is set to 10%, the final recommended value will be 550m. Setting this to 0% means no buffer will be added.
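The 10% case from this example would be configured roughly as follows. The other model values are carried over from the sample definition and are illustrative.

```yaml
model:
  forecast_blocks: 15m
  # With a forecasted CPU usage of 500m, a 10% buffer adds 50m,
  # so the final recommended value is 550m.
  forecast_buffer_percentage: 10%
  forecast_cron: "*/15 * * * *"
```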

`model.forecast_cron`

The forecast_cron setting determines how often forecasts are generated and, consequently, how often Thoras may trigger scaling events. Shorter intervals (e.g., every 5 minutes) can lead to more accurate forecasts but may also result in more frequent scaling.

forecast_cron controls how often forecasts are made, while model.forecast_blocks defines the time window each forecast covers.

`horizontal`

ast.yaml

horizontal:
  mode: recommendation
  scaling_behavior:
    scale_up:
      type: percent
      percent: 50
    scale_down:
      type: percent
      percent: 50

Thoras horizontal scaling adjusts the number of replicas based on forecasted or observed workload demands.

In autonomous mode:

Horizontally-scaled workloads must have CPU and/or memory requests defined in the pod spec if you plan to scale on averageUtilization.
The target deployment of your AIScaleTarget must have a single existing Horizontal Pod Autoscaler (HPA).
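A minimal sketch of these prerequisites follows. It assumes a Deployment named my-app (a hypothetical name) that is scaled on CPU averageUtilization. The HPA uses the standard Kubernetes autoscaling/v2 API; the image, replica counts, request sizes, and utilization target are illustrative.

```yaml
# Deployment excerpt: the container declares CPU and memory requests.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app # hypothetical workload name
spec:
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: my-app:latest # hypothetical image
          resources:
            requests:
              cpu: 250m
              memory: 256Mi
---
# The single existing HPA that targets the same Deployment.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80
```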

In recommendation mode:

Thoras does not require an existing HPA. It assumes an 80% utilization target for CPU and memory when generating scaling suggestions.

Thoras does not replace your HPA—it works alongside it by feeding forecasted metrics into your existing HPA configuration. You can customize behavior by:

Opting in additional metrics using spec.horizontal.additional_metrics
Opting out of default metrics using spec.horizontal.exclude_metrics

This setup gives you full control over how scaling decisions are made while benefiting from Thoras’ predictive intelligence.

To enable horizontal scaling recommendations, set spec.horizontal.mode to recommendation. It is recommended to start in recommendation mode to allow the model time to learn workload patterns and validate scaling suggestions. Once the recommendations align with your performance expectations, you can switch to autonomous mode for automated scaling.
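Switching modes might then be a one-line change, as sketched below. The literal value autonomous is an assumption based on the mode name used in this document; confirm the exact accepted value before applying it.

```yaml
spec:
  horizontal:
    # Assumed value; this field was "recommendation" during the learning period.
    mode: autonomous
```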

Feel free to reach out to the Thoras Engineering Team to discuss model performance before switching into autonomous mode.

Visit Predictive Horizontal Pod Autoscaling with Thoras Guide for more info.

`vertical`

ast.yaml

vertical:
  containers:
    - name: "{{ CONTAINER_NAME }}"
      cpu:
        lowerbound: 20m # Mandatory
        upperbound: 1 # Optional
      memory:
        lowerbound: 50Mi # Mandatory
        upperbound: 2G # Optional
  mode: recommendation
  scaling_behavior:
    scale_up:
      type: percent
      percent: 50
    scale_down:
      type: percent
      percent: 50

The Thoras vertical scaler adjusts container-level CPU and memory resource requests based on forecasted utilization.

To define a vertical scaling policy, you’ll want to set the following two fields:

vertical.mode: recommendation. We recommend running Thoras in recommendation mode for at least a day so the model can train before you enable autonomous mode.
vertical.containers[0].memory.lowerbound, which is always required. vertical.containers[0].memory.upperbound is optional; set it if you want a memory ceiling as well as a floor for your target workload.
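A pared-down policy that sets only the lower bounds might look like the sketch below. The values are copied from the sample in this section. scaling_behavior is kept because this document does not say whether it can be omitted.

```yaml
vertical:
  containers:
    - name: "{{ CONTAINER_NAME }}"
      cpu:
        lowerbound: 20m # CPU floor; no upperbound set
      memory:
        lowerbound: 50Mi # memory floor; no upperbound set
  mode: recommendation
  scaling_behavior:
    scale_up:
      type: percent
      percent: 50
    scale_down:
      type: percent
      percent: 50
```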

Feel free to reach out to the Thoras Engineering Team to discuss model performance before switching into autonomous mode.

Visit Predictive Vertical Pod Rightsizing Guide for more info.

`scaling_behavior`

The scaling_behavior field controls the rate and pattern of scaling actions. It helps smooth out rapid scaling and prevents Thoras from scaling too aggressively or too conservatively.

scale_up defines how quickly Thoras is allowed to increase the number of replicas.
scale_down defines how quickly it can decrease the number of replicas.

scaling_behavior:
  scale_up:
    type: percent
    percent: 50
  scale_down:
    type: percent
    percent: 50

scaling_behavior gives you a dial to control how sensitive Thoras is to scaling triggers. This setting rate-limits how much Thoras can scale up or down, making your scaling behavior more predictable and safe for your needs.
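An asymmetric configuration, sketched below, would let Thoras scale up faster than it scales down. The values 100 and 10 are illustrative. The comments assume that percent caps each change relative to the current value, which this document does not spell out.

```yaml
scaling_behavior:
  scale_up:
    type: percent
    percent: 100 # assumption: allows up to doubling per scaling action
  scale_down:
    type: percent
    percent: 10 # assumption: allows at most a 10% reduction per scaling action
```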

`additional_configuration`

ast.yaml

horizontal:
  mode: recommendation
  additional_metrics:
    - external:
        metric:
          name: "{{ EXTERNAL_METRIC_NAME }}"
          selector:
            matchLabels:
              app: "{{ APP_NAME }}"
        target:
          averageValue: "1"
          type: AverageValue
      type: External
  exclude_metrics:
    - name: cpu
      type: Resource
  scaling_behavior:
    scale_up:
      type: percent
      percent: 50
    scale_down:
      type: percent
      percent: 50

When you enable horizontal scaling in Thoras, it defaults to using the same metrics as your HPA (Horizontal Pod Autoscaler) to guide predictive scaling. However, there may be cases where you want Thoras to:

Exclude certain metrics (e.g., CPU) from its predictions, even if they are used by the HPA.
Add new metrics for prediction that are not part of the HPA.

Example 1: Exclude CPU from Predictive Scaling

This configuration tells Thoras not to use CPU for predictive scaling, even if it’s included in the HPA:

ast.yaml

spec:
  horizontal:
    exclude_metrics:
      - name: cpu
        type: Resource

Example 2: Add Custom Metric for Predictive Scaling

In this example, an external metric is explicitly included in additional_metrics. Thoras will predictively scale on this metric even though it is not defined in the HPA.

ast.yaml

spec:
  horizontal:
    additional_metrics:
      - external:
          metric:
            name: "{{ EXTERNAL_METRIC_NAME }}"
            selector:
              matchLabels:
                app: "{{ APP_NAME }}"
          target:
            averageValue: "1"
            type: AverageValue
        type: External
