
Prerequisites

Before you begin, make sure:
  • Thoras is installed and running in your Kubernetes cluster.
  • The Kubernetes Metrics Server is installed in your cluster.
  • Your workloads declare resource requests and have accumulated some usage history (a minimal example follows this list).
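For reference, a workload that satisfies the last prerequisite only needs resource requests declared on its containers. Below is a minimal sketch; the names (example-api, api) and image are hypothetical placeholders you would replace with your own:
deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-api
  namespace: default
spec:
  replicas: 2
  selector:
    matchLabels:
      app: example-api
  template:
    metadata:
      labels:
        app: example-api
    spec:
      containers:
        - name: api
          image: registry.example.com/api:1.0   # placeholder image
          resources:
            requests:
              cpu: 100m
              memory: 128Mi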

Sample Configuration

Here’s an example AIScaleTarget that configures rightsizing.
ast.yaml
apiVersion: thoras.ai/v1
kind: AIScaleTarget
metadata:
  name: {{YOUR_AST_NAME}}
  namespace: {{YOUR_NAMESPACE}}
spec:
  scaleTargetRef:
    kind: Deployment
    name: {{YOUR_DEPLOYMENT_NAME}}
  model:
    forecast_blocks: 15m0s
    forecast_buffer_percentage: 0%
    forecast_cron: "*/15 * * * *"
  vertical:
    mode: recommendation
    containers:
      - name: {{YOUR_CONTAINER_NAME}}
        cpu:
          lowerbound: 20m
          upperbound: 1
        memory:
          lowerbound: 50Mi
          limit:
            ratio: 1.5
    in_place_resizing:
      enabled: true
      allow_restart_on_memory_limit_decrease: true

Vertical Breakdown

  • spec.vertical.mode set to recommendation ensures Thoras suggests optimal requests and limits but does not apply them.
  • spec.vertical.containers[].name should be set to the name of the container in the pods for the workload you are targeting with this AIScaleTarget.
  • spec.vertical.containers[].RESOURCE.lowerbound is the lowest value Thoras is allowed to set the request to for that resource. This field is required.
  • spec.vertical.containers[].RESOURCE.upperbound is the highest value Thoras is allowed to set the request to for that resource. This field is optional.
  • spec.vertical.containers[].RESOURCE.limit allows Thoras to modify the container’s limit along with the request for this resource and is optional. If unset, Thoras will not modify limits for this resource.
  • spec.vertical.containers[].RESOURCE.limit.ratio is the ratio used to keep the limit in line with the suggested request for this resource. For example, with a ratio of 2.5, Thoras sets the limit to 2.5G when the suggested request is 1G (see the worked example after this list).
  • spec.vertical.in_place_resizing.enabled when set to true, enables Kubernetes in-place pod resizing, which allows Thoras to adjust container resources without recreating pods (requires Kubernetes 1.33+).
  • spec.vertical.in_place_resizing.allow_restart_on_memory_limit_decrease must be set to true if you’re using memory limits and have in-place resizing enabled. This allows Kubernetes to restart containers when memory limits decrease.
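To make limit.ratio concrete, here is a sketch of the container resources a recommendation could translate into for the sample configuration above. The suggested values (200m CPU, 256Mi memory) are illustrative, not actual Thoras output:
# Suppose Thoras suggests a 200m CPU request and a 256Mi memory request
# for the container configured in ast.yaml above.
resources:
  requests:
    cpu: 200m        # suggested request; no cpu limit block, so the CPU limit is left unchanged
    memory: 256Mi    # suggested request
  limits:
    memory: 384Mi    # 256Mi * 1.5, from memory.limit.ratio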

In-Place Resizing

When spec.vertical.in_place_resizing.enabled is set to true, Thoras will attempt to resize pods without eviction, significantly reducing disruption during vertical scaling operations.
Note: Enabling in-place resizing does not guarantee that pods will never restart during resizing operations. Pods may still be evicted and restarted if the resize would change the pod’s QoS class or if the cluster does not support the specific resize operation.
How it works:
  1. Thoras attempts to resize pods in place using the Kubernetes resize API
  2. If the pod’s QoS class would change or resize is not supported, Thoras falls back to evicting the pod
  3. Evicted pods are rolled out gradually to minimize service disruption
Benefits:
  • Reduced disruption: Pods continue running during resource adjustments
  • Faster scaling: Changes take effect immediately without pod recreation
  • Better node utilization: Enables more efficient bin-packing and cost savings, especially when combined with node autoscaling tools like Karpenter
  • Graceful fallback: Automatic eviction when in-place resizing isn’t possible
Requirements:
  • Kubernetes 1.33+
  • If using memory limits, set allow_restart_on_memory_limit_decrease: true
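For background, in-place resizing builds on the container-level resizePolicy field of the upstream Kubernetes Pod API (the in-place pod resize feature that drives the Kubernetes 1.33+ requirement above). The snippet below is a sketch of that upstream field only, with a hypothetical container name; it is not a Thoras configuration option:
# Upstream Kubernetes Pod spec fragment (illustrative):
containers:
  - name: api                           # hypothetical container name
    resizePolicy:
      - resourceName: cpu
        restartPolicy: NotRequired      # CPU can be resized without restarting the container
      - resourceName: memory
        restartPolicy: RestartContainer # memory changes restart the container
    resources:
      requests:
        cpu: 100m
        memory: 128Mi
      limits:
        memory: 192Mi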

Additional Details

  • Thoras recommendations are based on historical usage data collected from the Kubernetes Metrics Server.
  • Recommendations are updated on the schedule defined by spec.model.forecast_cron.
  • If resource usage patterns change significantly, Thoras will adjust its suggestions accordingly.
  • You can monitor the impact of rightsizing by reviewing resource utilization and cost metrics in the Thoras Dashboard.
  • For workloads with multiple containers, you can specify vertical settings for each container individually under spec.vertical.containers.
  • spec.vertical.containers[].RESOURCE can be either cpu or memory. If either is unset for a container, Thoras will not scale that resource for that container (see the example after this list).
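For example, a two-container workload could be configured as follows. The container names (web, sidecar) and bounds are hypothetical; with this spec, Thoras would manage CPU and memory for web but only CPU for sidecar, leaving its memory untouched:
ast.yaml
  ...
  vertical:
    mode: recommendation
    containers:
      - name: web            # hypothetical container name
        cpu:
          lowerbound: 50m
        memory:
          lowerbound: 64Mi
      - name: sidecar        # hypothetical container name
        cpu:
          lowerbound: 10m
        # no memory block: Thoras will not scale memory for this container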

Viewing Recommendations

Once the AIScaleTarget is applied, the Thoras Reasoning Engine will begin analyzing resource usage and generating vertical sizing recommendations on the schedule defined by spec.model.forecast_cron. You can view the suggested requests and their projected effects on your workload in the Thoras Dashboard under Resource Allocation Settings.

Next Steps

  • Change spec.vertical.mode to autonomous if you want Thoras to automatically apply recommendations.
  • Add the optional scaling_behavior if you want to fine-tune the thresholds Thoras uses to decide whether a recommendation justifies restarting the workload.
ast.yaml
  ...
  vertical:
    mode: autonomous
    # Only restart the workload if a new resource recommendation is at least 50% higher or lower than the current pod resource allocation
    scaling_behavior:
      scale_up:
        type: percent
        percent: 50
      scale_down:
        type: percent
        percent: 50
    ...

Autonomous Mode

The Thoras Reasoning Engine will make usage predictions on the schedule set by spec.model.forecast_cron, each covering the length of spec.model.forecast_blocks. If no spec.vertical.scaling_behavior is set, Thoras checks every resource request against the current suggestion; if any request differs, Thoras triggers a rollout restart of the workload. If scaling_behavior is set, a rollout is only triggered when a suggestion differs from the current allocation by at least the configured threshold. As the pods restart, their resource requests and limits are updated to match the suggestion made by Thoras.
Note: Thoras mutates the resource requests and limits of running pods directly. It does not modify the Deployment or StatefulSet object itself.