When a container is killed due to an out-of-memory (OOM) event, Thoras can automatically increase the container’s memory request to stabilize the workload. This prevents the workload from cycling through repeated OOM kills while it waits for a scheduled forecast to produce a sufficient memory recommendation. OOM remediation is opt-in per workload and requires autonomous vertical scaling to be enabled.

How It Works

When OOM kills are detected on a workload, Thoras:
  1. Bumps memory requests for affected containers by a 1.2× multiplier, computed from the average current container memory request across all running pods.
  2. Applies the adjustment to running pods and any newly created pods. For running pods, Thoras uses your update_policy: it attempts an in-place resize first (if enabled), and falls back to a rolling restart (if enabled) when resize fails or is not supported. Any pods created or rescheduled in the meantime (e.g., evictions, scale-out) automatically receive the adjusted memory regardless of whether the resize or restart succeeds.
  3. Repeats every 2 minutes while OOM kills continue, compounding each time (1.2×, then 1.44×, then 1.73×, and so on).
  4. Holds the memory floor for a configurable stabilization window after the last OOM. During this window, forecasts can raise memory above the floor but cannot lower it. This prevents a new forecast, which may not yet reflect the OOM episode, from reversing the adjustment down to a level likely to OOM again.
  5. Returns full control to the forecaster once the stabilization window expires. If OOMs recur, the cycle begins again.
The adjustment is a short-term stabilization measure, not a permanent right-sizing decision. The goal is to keep the workload running long enough for the forecaster to observe the higher memory usage and incorporate it into future recommendations.
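The compounding in steps 1 and 3 can be illustrated with a short simulation. This is a minimal sketch, assuming every pod is resized successfully and OOMs continue for a fixed number of 2-minute checks; `next_memory_request` and `simulate_oom_cycles` are hypothetical names, not part of Thoras, and memory is tracked in whole MiB for readability.

```python
from statistics import mean

# Hypothetical names for illustration; Thoras does not expose these functions.
OOM_MULTIPLIER = 1.2  # per-cycle bump described above


def next_memory_request(current_requests_mi: list[int]) -> int:
    """Target request (MiB) for one container: 1.2x the average of its
    current memory requests across all running pods, rounded to whole MiB."""
    return round(mean(current_requests_mi) * OOM_MULTIPLIER)


def simulate_oom_cycles(initial_mi: int, pods: int, cycles: int) -> list[int]:
    """Assume OOMs continue for `cycles` consecutive checks (every 2 minutes)
    and every pod is resized successfully; return each cycle's target."""
    requests = [initial_mi] * pods
    targets = []
    for _ in range(cycles):
        target = next_memory_request(requests)
        targets.append(target)
        requests = [target] * pods  # every pod now runs at the new request
    return targets


if __name__ == "__main__":
    base = 1000
    for cycle, target in enumerate(simulate_oom_cycles(base, pods=3, cycles=3), start=1):
        print(f"cycle {cycle}: {target}Mi ({target / base:.2f}x)")
```

Starting from 1000Mi, the three cycles target 1200Mi, 1440Mi and 1728Mi, matching the 1.2×, 1.44× and 1.73× sequence. Because each cycle multiplies the current average, the growth is geometric rather than linear.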

Enabling OOM Remediation

Add oom_remediation to your spec.vertical configuration:
```yaml
spec:
  vertical:
    mode: autonomous
    oom_remediation:
      enabled: true
    containers:
      - name: my-container
        memory:
          lowerbound: 256Mi
          upperbound: 4Gi
```
oom_remediation.enabled must be true and spec.vertical.mode must be autonomous. Workloads in recommendation mode are not affected.
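The two activation conditions can be expressed as a single predicate. This is a sketch only: `oom_remediation_active` is a hypothetical helper, and it assumes the `spec` section has already been parsed into a plain dict.

```python
def oom_remediation_active(spec: dict) -> bool:
    """Return True if OOM remediation applies to an AIScaleTarget `spec`.

    Hypothetical helper: `spec` is the parsed `spec:` mapping as a plain dict.
    """
    vertical = spec.get("vertical") or {}
    remediation = vertical.get("oom_remediation") or {}
    # Both conditions must hold: the per-workload opt-in flag and autonomous mode.
    return remediation.get("enabled") is True and vertical.get("mode") == "autonomous"


if __name__ == "__main__":
    autonomous = {"vertical": {"mode": "autonomous", "oom_remediation": {"enabled": True}}}
    recommendation = {"vertical": {"mode": "recommendation", "oom_remediation": {"enabled": True}}}
    print(oom_remediation_active(autonomous))      # True
    print(oom_remediation_active(recommendation))  # False
```

Setting `enabled: true` alone is not enough; a workload left in recommendation mode is skipped even with the flag on.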

Stabilization Window

After Thoras applies an OOM memory adjustment, it holds a memory floor for the duration of the stabilization window. During this window:
  • Incoming forecasts cannot lower memory below the adjusted value.
  • Forecasts can raise memory above the adjusted value if the forecast recommends it.
  • Each new OOM resets the stabilization window, extending it from the time of the most recent OOM. The workload must go without an OOM for the full configured interval before memory can be scaled down from the multiplied value.
Once the window expires, Thoras clears the floor and the forecaster resumes full control of memory sizing.

Default duration: max(forecast_interval, 1h).

This default always covers at least one full forecast cycle, ensuring the forecaster has had at least one opportunity to observe the workload at the adjusted memory level before the floor is removed. The stabilization window should be long enough to include a period in which memory usage reaches the level that would have been needed to prevent the previously observed OOMs, so that the forecaster can account for that high usage. To override the default, set stabilization_window explicitly:
```yaml
oom_remediation:
  enabled: true
  stabilization_window: 2h
```
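The window rules above (default duration, explicit override, and reset on each new OOM) can be sketched with the standard `datetime` module. The function names are hypothetical, and treating the floor as cleared exactly at `last_oom + window` is an assumption about boundary behavior, not documented Thoras semantics.

```python
from datetime import datetime, timedelta
from typing import Optional


def stabilization_window(forecast_interval: timedelta,
                         override: Optional[timedelta] = None) -> timedelta:
    """Hypothetical helper: default is max(forecast_interval, 1h);
    an explicit stabilization_window setting takes precedence."""
    if override is not None:
        return override
    return max(forecast_interval, timedelta(hours=1))


def floor_active(last_oom: datetime, now: datetime, window: timedelta) -> bool:
    """The floor holds until `window` has elapsed since the most recent OOM.
    Each new OOM moves `last_oom` forward, which resets the window.
    (Assumption: the floor clears exactly at last_oom + window.)"""
    return now < last_oom + window


if __name__ == "__main__":
    window = stabilization_window(timedelta(minutes=30))  # 1h default
    last_oom = datetime(2024, 1, 1, 10, 45)  # a second OOM reset the window here
    print(floor_active(last_oom, datetime(2024, 1, 1, 11, 30), window))  # True
    print(floor_active(last_oom, datetime(2024, 1, 1, 11, 45), window))  # False
```

With a 30-minute forecast interval the default still yields a 1-hour window, and an OOM at 10:45 keeps the floor in place until 11:45 regardless of any earlier OOMs.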

Upper Bounds

If memory.upperbound is configured on a container, OOM adjustments are clamped to that value. Adjustments will not exceed the upper bound regardless of how many times the multiplier compounds. If no upper bound is configured, adjustments compound without a ceiling. For workloads where runaway memory growth would be harmful, configure memory.upperbound.
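The effect of the clamp on a compounding sequence can be sketched as follows. This is an illustration under simplifying assumptions (a single value per container, whole MiB, OOMs on every cycle); `clamped_targets` is a hypothetical name, not a Thoras API.

```python
from typing import Optional

OOM_MULTIPLIER = 1.2  # per-cycle bump; the function below is hypothetical


def clamped_targets(start_mi: int, upperbound_mi: Optional[int], cycles: int) -> list[int]:
    """Compound the request once per OOM cycle, clamping each target to
    memory.upperbound when one is configured (None means no ceiling)."""
    targets = []
    current = start_mi
    for _ in range(cycles):
        target = round(current * OOM_MULTIPLIER)
        if upperbound_mi is not None:
            target = min(target, upperbound_mi)  # never exceed the upper bound
        targets.append(target)
        current = target
    return targets


if __name__ == "__main__":
    print(clamped_targets(1024, 4096, cycles=10))  # stops growing at 4096
    print(clamped_targets(1024, None, cycles=10))  # keeps compounding
```

Starting from 1Gi with a 4Gi bound, the target reaches the bound on the eighth cycle and stays there; without a bound it keeps growing geometrically for as long as OOMs continue.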

Interaction with Forecasts

During an active stabilization window, Thoras applies max(forecast, floor) to memory when scaling. CPU is always sourced from the forecast unchanged. When a new forecast arrives after the window has expired, the floor is cleared and the forecast value is applied directly. If the workload OOMs again, the remediation cycle restarts from the new forecast baseline.
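The merge rule for an incoming forecast can be sketched in a few lines. `Resources` and `apply_forecast` are hypothetical names; the sketch assumes the floor is represented as `None` once the stabilization window has expired.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Resources:
    cpu_millicores: int
    memory_mi: int


def apply_forecast(forecast: Resources, oom_floor_mi: Optional[int]) -> Resources:
    """Hypothetical merge rule: memory = max(forecast, floor) while the window
    is active; CPU always comes from the forecast. `oom_floor_mi` is None once
    the stabilization window has expired and the floor is cleared."""
    if oom_floor_mi is None:
        return forecast
    return Resources(cpu_millicores=forecast.cpu_millicores,
                     memory_mi=max(forecast.memory_mi, oom_floor_mi))


if __name__ == "__main__":
    floor = 1440
    print(apply_forecast(Resources(500, 1100), floor))  # memory held at 1440
    print(apply_forecast(Resources(500, 2000), floor))  # forecast wins: 2000
    print(apply_forecast(Resources(500, 1100), None))   # floor cleared: 1100
```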

Considerations

  • Memory may be held above the forecast during the stabilization window. If an OOM was caused by a one-off spike and the workload returns to lower memory usage afterward, requests remain elevated until the window expires. Thoras favors avoiding another OOM over reclaiming memory immediately; you can tune this trade-off through the duration of the stabilization window.
  • New pods after the window expires start at the forecast value. Once the floor clears, pods created from that point forward use the forecaster’s recommendation. If that recommendation is insufficient, another OOM may occur and trigger a new remediation cycle.
  • Partial in-place resize failures. If some pods cannot be resized in place (e.g., the node lacks capacity), those pods continue at their previous memory until the node has room or they are rescheduled. The next adjustment cycle computes its target from the blended average of resized and unresized pods, so it targets a somewhat higher value, but less than a full 1.2× step above the previous target.
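The blended-average case can be made concrete with a small worked example. This sketch assumes four pods, a previous target of 1200Mi, and one pod whose in-place resize failed so it is still at 1000Mi; `next_target` is a hypothetical name.

```python
from statistics import mean

OOM_MULTIPLIER = 1.2  # per-cycle bump; `next_target` is a hypothetical helper


def next_target(pod_requests_mi: list[int]) -> int:
    """Next OOM target: 1.2x the average current request across running pods."""
    return round(mean(pod_requests_mi) * OOM_MULTIPLIER)


if __name__ == "__main__":
    all_resized = [1200, 1200, 1200, 1200]
    one_failed = [1200, 1200, 1200, 1000]  # one pod could not be resized in place
    print(next_target(all_resized))  # 1440
    print(next_target(one_failed))   # 1380: above 1200, below the full 1440 step
```

The failed pod pulls the average down from 1200Mi to 1150Mi, so the next cycle targets 1380Mi instead of 1440Mi.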

Full Example

```yaml
apiVersion: thoras.ai/v1
kind: AIScaleTarget
metadata:
  name: my-service
  namespace: my-namespace
spec:
  scaleTargetRef:
    kind: Deployment
    name: my-service
    apiVersion: apps/v1
  model:
    mode: balanced
    forecast_interval: 1h
    forecast_blocks: 1h
  vertical:
    mode: autonomous
    oom_remediation:
      enabled: true
      stabilization_window: 2h
    update_policy:
      update_mode: in_place_or_recreate
    containers:
      - name: my-service
        memory:
          lowerbound: 256Mi
          upperbound: 4Gi
        cpu:
          lowerbound: 100m
          upperbound: 2
```
In this configuration:
  • OOM remediation is enabled with a 2-hour stabilization window
  • Memory adjustments are clamped to 4 GiB (upperbound)
  • Pods are resized in place where possible, with recreation as fallback