Prerequisites

Before you begin, make sure:

  • Thoras is installed and running in your Kubernetes cluster.
  • The Kubernetes Metrics Server is installed in your cluster.
  • Your target workloads set resource requests on their containers and have accumulated some usage history.

Sample Configuration

Here’s an example AIScaleTarget that configures rightsizing.

ast.yaml
apiVersion: thoras.ai/v1
kind: AIScaleTarget
metadata:
  name: {{ YOUR_AST_NAME }}
  namespace: {{ YOUR_NAMESPACE }}
spec:
  scaleTargetRef:
    kind: Deployment
    name: {{ YOUR_DEPLOYMENT_NAME }}
  model:
    forecast_blocks: 15m0s
    forecast_buffer_percentage: 0%
    forecast_cron: "*/15 * * * *"
  vertical:
    mode: recommendation
    containers:
      - name: {{ YOUR_CONTAINER_NAME }}
        cpu:
          lowerbound: 20m
          upperbound: 1
        memory:
          lowerbound: 50Mi
          limit:
            ratio: 1.5
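The quantities in this sample use Kubernetes resource units: 20m is 20 millicores (0.02 CPU), 1 is one full core, and 50Mi is 50 × 2^20 bytes. As a reading aid, here is a minimal Python sketch of a parser for these values. The function name parse_quantity is hypothetical, and the sketch handles only plain numbers and the m, k, M, G, Ki, Mi and Gi suffixes. The full Kubernetes quantity grammar also supports other suffixes and exponent notation.

```python
from decimal import Decimal

# Hypothetical helper covering only a subset of Kubernetes quantity suffixes.
# Binary suffixes (Ki, Mi, Gi) are powers of two; the others are powers of ten.
_SUFFIXES = {
    "m": Decimal("0.001"),   # milli: 20m CPU = 0.02 cores
    "k": Decimal(10) ** 3,
    "M": Decimal(10) ** 6,
    "G": Decimal(10) ** 9,
    "Ki": Decimal(2) ** 10,
    "Mi": Decimal(2) ** 20,  # mebi: 50Mi = 52,428,800 bytes
    "Gi": Decimal(2) ** 30,
}


def parse_quantity(quantity: str) -> Decimal:
    """Convert a quantity such as '20m', '1' or '50Mi' into a plain number."""
    for suffix, factor in _SUFFIXES.items():
        if quantity.endswith(suffix):
            return Decimal(quantity[: -len(suffix)]) * factor
    # No suffix: the value is already in base units (cores for CPU, bytes for memory).
    return Decimal(quantity)


print(parse_quantity("20m"))   # CPU lowerbound, in cores
print(parse_quantity("1"))     # CPU upperbound, in cores
print(parse_quantity("50Mi"))  # memory lowerbound, in bytes
```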

Vertical Breakdown

  • Setting spec.vertical.mode to recommendation makes Thoras suggest new requests and limits without applying them.
  • spec.vertical.containers[].name is the name of the container, in the pods of the targeted workload, that these settings apply to.
  • spec.vertical.containers[].RESOURCE.lowerbound is the lowest value Thoras may set as the request for that resource. This field is required.
  • spec.vertical.containers[].RESOURCE.upperbound is the highest value Thoras may set as the request for that resource. This field is optional.
  • spec.vertical.containers[].RESOURCE.limit is optional and lets Thoras adjust the container’s limit along with the request for that resource. If it is unset, Thoras does not modify the limit for that resource.
  • spec.vertical.containers[].RESOURCE.limit.ratio is the multiplier Thoras uses to keep the limit in line with the suggested request. For example, with a ratio of 2.5 and a suggested request of 1G, Thoras sets the limit to 2.5G.
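To make the interaction between the bounds and the limit ratio concrete, here is a minimal Python sketch. It assumes that Thoras keeps a suggested request within [lowerbound, upperbound] and then multiplies that request by limit.ratio, as described above. The function names are hypothetical. Expressing CPU in millicores and memory in Mi is an illustrative choice, not part of the AIScaleTarget API.

```python
from typing import Optional, Tuple


def bounded_request(suggested: float, lowerbound: float,
                    upperbound: Optional[float] = None) -> float:
    """Keep a suggested request within the configured bounds (upperbound is optional)."""
    request = max(suggested, lowerbound)
    if upperbound is not None:
        request = min(request, upperbound)
    return request


def request_and_limit(suggested: float, lowerbound: float,
                      upperbound: Optional[float] = None,
                      ratio: Optional[float] = None) -> Tuple[float, Optional[float]]:
    """Return (request, limit). The limit is None when no limit block is set,
    meaning the existing limit is left untouched."""
    request = bounded_request(suggested, lowerbound, upperbound)
    limit = request * ratio if ratio is not None else None
    return request, limit


# CPU from the sample, in millicores: lowerbound 20m, upperbound 1 core = 1000m.
print(request_and_limit(5, lowerbound=20, upperbound=1000))     # → (20, None)
print(request_and_limit(1500, lowerbound=20, upperbound=1000))  # → (1000, None)
# Memory from the sample, in Mi: lowerbound 50Mi, limit.ratio 1.5, no upperbound.
print(request_and_limit(200, lowerbound=50, ratio=1.5))         # → (200, 300.0)
```

The request is bounded before the ratio is applied, so under this assumption the limit always follows the final request rather than the raw suggestion.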

Additional Details

  • Thoras recommendations are based on historical usage data collected from the Kubernetes Metrics Server.
  • Recommendations are updated on the schedule defined by spec.model.forecast_cron.
  • If resource usage patterns change significantly, Thoras will adjust its suggestions accordingly.
  • You can monitor the impact of rightsizing by reviewing resource utilization and cost metrics in the Thoras Dashboard.
  • For workloads with multiple containers, you can specify vertical settings for each container individually under spec.vertical.containers.
  • In the field paths above, RESOURCE is either cpu or memory. If either is unset for a container, Thoras does not scale that resource for that container.
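The per-container rules above can be summarized with a short Python sketch. It walks a vertical block, represented here as a plain dictionary that mirrors the YAML, and reports which resources Thoras would scale for each container and whether it would also manage limits. The container names app and log-shipper are hypothetical. The function illustrates the documented behavior and is not Thoras code.

```python
from typing import Dict


def scaled_resources(vertical: dict) -> Dict[str, Dict[str, str]]:
    """Map each listed container to the resources Thoras scales for it."""
    summary: Dict[str, Dict[str, str]] = {}
    for container in vertical["containers"]:
        resources: Dict[str, str] = {}
        for resource in ("cpu", "memory"):
            settings = container.get(resource)
            if settings is None:
                continue  # resource unset: Thoras does not scale it for this container
            # A limit block lets Thoras adjust the limit as well as the request.
            resources[resource] = "request and limit" if "limit" in settings else "request only"
        summary[container["name"]] = resources
    return summary


# Hypothetical two-container workload; memory is left unset for the sidecar.
vertical = {
    "mode": "recommendation",
    "containers": [
        {"name": "app",
         "cpu": {"lowerbound": "20m", "upperbound": "1"},
         "memory": {"lowerbound": "50Mi", "limit": {"ratio": 1.5}}},
        {"name": "log-shipper",
         "cpu": {"lowerbound": "10m"}},
    ],
}
print(scaled_resources(vertical))
```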

Viewing Recommendations

Once the AIScaleTarget is applied, the Thoras Reasoning Engine begins analyzing resource usage and generates vertical sizing recommendations on the schedule defined by spec.model.forecast_cron.
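In the sample configuration, forecast_cron is "*/15 * * * *". In standard cron syntax this means every 15 minutes, at minutes 0, 15, 30 and 45 of every hour. The Python sketch below lists upcoming run times for a minute-step schedule of this shape so you can see roughly when new recommendations would be generated. It handles only the */N minute form with every other field set to *. The function name is hypothetical, and the sketch makes no assumption about which time zone Thoras evaluates the schedule in.

```python
from datetime import datetime, timedelta
from typing import List


def next_minute_step_runs(start: datetime, step: int, count: int) -> List[datetime]:
    """Return the next `count` times at or after `start` that match the
    cron expression '*/step * * * *'."""
    # Cron fires on whole minutes, so round `start` up to the next minute boundary.
    t = start.replace(second=0, microsecond=0)
    if t < start:
        t += timedelta(minutes=1)
    runs: List[datetime] = []
    while len(runs) < count:
        # '*/step' in the minute field matches minutes 0, step, 2*step, ... below 60.
        if t.minute % step == 0:
            runs.append(t)
        t += timedelta(minutes=1)
    return runs


# Prints 10:15, 10:30, 10:45 and 11:00, one per line.
for run in next_minute_step_runs(datetime(2024, 1, 1, 10, 7, 30), step=15, count=4):
    print(run.strftime("%H:%M"))
```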

You can view the suggested requests and their projected effects on your workload in the Thoras Dashboard under Resource Allocation Settings.

Next Steps

  • Change spec.vertical.mode to autonomous if you want Thoras to automatically apply recommendations.
  • Add the optional spec.vertical.scaling_behavior field to fine-tune the thresholds Thoras uses to decide whether a recommendation justifies restarting the workload.

ast.yaml
  ...
  vertical:
    mode: autonomous
    # Only restart the workload if a new resource recommendation is at least 50% higher or lower than the current pod resource allocation
    scaling_behavior:
      scale_up:
        type: percent
        percent: 50
      scale_down:
        type: percent
        percent: 50
    ...
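The percent thresholds in this snippet can be read as a relative-change check against the pod's current allocation. The Python sketch below assumes that the comparison is made per resource value and that a change of exactly the configured percentage qualifies, matching the "at least 50%" wording in the comment. The function name and signature are hypothetical.

```python
def exceeds_percent_threshold(current: float, recommended: float,
                              scale_up_percent: float,
                              scale_down_percent: float) -> bool:
    """True if the recommendation differs from the current allocation by at
    least the configured percentage in its direction of change."""
    if current <= 0:
        raise ValueError("current allocation must be positive")
    change_percent = abs(recommended - current) / current * 100
    if recommended > current:
        return change_percent >= scale_up_percent
    if recommended < current:
        return change_percent >= scale_down_percent
    return False  # identical values never trigger a restart


# Current request of 200m CPU with 50% thresholds in both directions.
print(exceeds_percent_threshold(200, 300, 50, 50))  # True: +50% meets scale_up
print(exceeds_percent_threshold(200, 260, 50, 50))  # False: +30% is below scale_up
print(exceeds_percent_threshold(200, 100, 50, 50))  # True: -50% meets scale_down
```

Keeping separate scale_up and scale_down percentages mirrors the two blocks in the YAML, so each direction can be tuned independently.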

Autonomous Mode

The Thoras Reasoning Engine makes usage predictions on the schedule set by spec.model.forecast_cron, each covering the length of time set by spec.model.forecast_blocks. If spec.vertical.scaling_behavior is not set, Thoras compares every resource request with the current suggestion and, if any request differs, triggers a rollout restart of the workload. If scaling_behavior is set, Thoras restarts the workload only when a recommendation crosses the configured thresholds. As the pods restart, their resource requests and limits are updated to match the Thoras suggestion.
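When no scaling_behavior is set, the decision described above reduces to an exact comparison between each current request and the latest suggestion. This Python sketch models requests as a dictionary keyed by (container, resource) pairs, with CPU in millicores and memory in Mi. That representation and the function name are illustrative assumptions, not part of Thoras.

```python
from typing import Dict, Tuple

# (container name, resource) -> request value
Requests = Dict[Tuple[str, str], float]


def needs_rollout_restart(current: Requests, suggestion: Requests) -> bool:
    """Without scaling_behavior, any request that differs from the current
    suggestion triggers a rollout restart of the workload."""
    return any(current.get(key) != value for key, value in suggestion.items())


current = {("app", "cpu"): 250, ("app", "memory"): 512}
print(needs_rollout_restart(current, {("app", "cpu"): 250, ("app", "memory"): 512}))  # False: all match
print(needs_rollout_restart(current, {("app", "cpu"): 240, ("app", "memory"): 512}))  # True: CPU differs
```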

Note: Thoras applies the new resource requests and limits directly to the workload's pods as they are recreated. It does not modify the Deployment or StatefulSet object itself.