When a container is killed due to an out-of-memory (OOM) event, Thoras can automatically increase the container’s memory request and limit to stabilize the workload. This prevents a workload from cycling through repeated OOM kills while waiting for a scheduled forecast to produce a sufficient memory recommendation. OOM remediation requires autonomous vertical scaling to be enabled. It is opt-in per workload.

How It Works

When OOM kills are detected on a workload, Thoras:
  1. Raises the memory request and limit of affected containers by a 1.2× multiplier. The request and the limit are each computed independently from their current average across all running pods. Both are raised because the kernel OOM-kills a container when it exceeds its limit, while the request determines how much node memory is guaranteed to the pod.
  2. Applies the adjustment to running pods and any newly created pods. For running pods, Thoras uses your update_policy: it attempts an in-place resize first (if enabled), and falls back to a rolling restart (if enabled) when resize fails or is not supported. Any pods created or rescheduled in the meantime (e.g., evictions, scale-out) automatically receive the adjusted memory regardless of whether the resize or restart succeeds.
  3. Repeats every 2 minutes as long as OOM kills continue, compounding each time (1.2×, then 1.44×, then 1.73×, and so on) until OOMs stop.
  4. Holds the memory floor for a configurable stabilization window after the last OOM. During this window, forecasts can raise memory above the floor but cannot lower it. This prevents a new forecast, which may not yet reflect the OOM episode, from reversing the adjustment to a level where the workload is likely to OOM again.
  5. Returns full control to the forecaster once the stabilization window expires. If OOMs recur, the cycle begins again.
The adjustment is a short-term stabilization measure, not a permanent right-sizing decision. The goal is to keep the workload running long enough for the forecaster to observe the higher memory usage and incorporate it into future recommendations.
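The compounding in step 3 can be illustrated with a short calculation. This is a minimal sketch, not Thoras's implementation: the function name and the MiB units are hypothetical, and it assumes every running pod receives each adjustment, so the average the next cycle reads equals the previously adjusted value.

```python
# Illustrative only: models the 1.2x compounding described in the steps above.
MULTIPLIER = 1.2  # multiplier stated in the text


def adjusted_memory_mib(current_mib: float, cycles: int) -> float:
    """Return memory after `cycles` consecutive OOM adjustments.

    Assumes each adjustment was fully applied to all pods (hypothetical
    simplification), so each cycle multiplies the previous result.
    """
    value = current_mib
    for _ in range(cycles):
        value *= MULTIPLIER
    return value
```

For a container that starts at 512 MiB, three consecutive cycles give roughly 614 MiB, 737 MiB, and 885 MiB, matching the 1.2×, 1.44×, and 1.73× progression.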

Enabling OOM Remediation

Add oom_remediation to your spec.vertical configuration:
spec:
  vertical:
    mode: autonomous
    oom_remediation:
      enabled: true
    containers:
      - name: my-container
        memory:
          lowerbound: 256Mi
          upperbound: 4Gi
oom_remediation.enabled must be true and spec.vertical.mode must be autonomous. Workloads in recommendation mode are not affected.

Stabilization Window

After Thoras applies an OOM memory adjustment, it holds a memory floor for the duration of the stabilization window. During this window:
  • Incoming forecasts cannot lower memory below the adjusted value.
  • Forecasts can raise memory above the adjusted value if the forecast recommends it.
  • Each new OOM resets the stabilization window, extending it from the time of the most recent OOM. The workload must be stable for the configured interval before memory may be scaled down from the multiplied value.
Once the window expires, Thoras clears the floor and the forecaster resumes full control of memory sizing.
Default duration: max(forecast_interval, 1h). The window is at minimum one full forecast cycle, ensuring the forecaster has had at least one opportunity to observe the workload at the adjusted memory level before the floor is removed.
The stabilization window should be long enough for the forecaster to observe memory usage at the adjusted level, so future recommendations account for the higher usage that triggered the OOMs.
To override the default, set stabilization_window explicitly:
oom_remediation:
  enabled: true
  stabilization_window: 2h
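The window rules above can be sketched as follows. This is a hedged illustration: the function names are hypothetical, and it assumes an explicit stabilization_window is used as-is, which the text does not spell out for values shorter than the forecast interval.

```python
from datetime import datetime, timedelta
from typing import Optional

DEFAULT_MINIMUM = timedelta(hours=1)  # the "1h" in max(forecast_interval, 1h)


def stabilization_window(forecast_interval: timedelta,
                         override: Optional[timedelta] = None) -> timedelta:
    """Return the effective window: the override if set, else the default."""
    if override is not None:
        # Assumption: an explicit value replaces the default unchanged.
        return override
    return max(forecast_interval, DEFAULT_MINIMUM)


def floor_expiry(last_oom: datetime, window: timedelta) -> datetime:
    """Each new OOM resets the window, so expiry is measured from the latest OOM."""
    return last_oom + window
```

With a 15-minute forecast interval the default window is 1 hour; with a 6-hour interval it is 6 hours. An OOM at 12:00 with a 2-hour window holds the floor until 14:00, and a further OOM at 13:30 would move that to 15:30.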

Upper Bounds

memory.upperbound is a steady-state policy and is intentionally bypassed during OOM remediation. OOM is an emergency response: clamping the adjustment to upperbound would leave the workload exposed to the same kill threshold that triggered the cycle in the first place. Adjustments may therefore land above the configured upper bound while the stabilization window is active. Once the stabilization window expires, the forecaster regains full control and steady-state recommendations are clamped to upperbound as usual.

Memory Limits

OOM remediation raises both the memory request and the memory limit. How the limit is determined depends on whether Thoras manages it:
  • Limit managed by Thoras (memory.limit.ratio is set): during the stabilization window, the limit is max(adjusted_request × ratio, adjusted_limit). The ratio is preserved on the way up, and the OOM-adjusted limit acts as a floor so a non-OOM reconcile cannot snap the limit back below it mid-episode. After the window expires, the limit follows the ratio normally as the forecaster regains control.
  • Limit not managed by Thoras (no memory.limit.ratio): the limit is raised by the same multiplier as the request during the stabilization window to give the workload room to recover. When the window expires, Thoras reverts the limit to the value in your workload’s pod template (deployment, statefulset, etc.). If your workload’s true steady-state memory usage permanently exceeds the limit set in your pod template, OOMs will recur each time the limit reverts, visible as repeated OOMKilled events. The resolution is to raise the limit in the workload’s pod template, or to configure memory.limit.ratio so Thoras can manage limits directly.
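The two cases above can be summarized in a small sketch. The function and parameter names are hypothetical, values are in MiB, and the post-window managed case assumes that "follows the ratio" means request × ratio.

```python
from typing import Optional


def limit_during_window(adjusted_request: float,
                        adjusted_limit: float,
                        ratio: Optional[float]) -> float:
    """Limit applied while the stabilization window is active.

    `adjusted_limit` is the limit after the OOM multiplier was applied.
    `ratio` stands in for memory.limit.ratio; None means Thoras does not
    manage the limit.
    """
    if ratio is not None:
        # Managed: keep the ratio on the way up, never drop below the floor.
        return max(adjusted_request * ratio, adjusted_limit)
    # Unmanaged: the multiplied limit is used directly.
    return adjusted_limit


def limit_after_window(forecast_request: float,
                       ratio: Optional[float],
                       template_limit: float) -> float:
    """Limit once the window has expired."""
    if ratio is not None:
        return forecast_request * ratio  # assumption: ratio applied to the request
    return template_limit  # reverts to the pod template value
```

For example, with a ratio of 1.5, an adjusted request of 600 MiB, and an adjusted limit of 720 MiB, the limit during the window is 900 MiB. Without a ratio it is 720 MiB during the window and returns to the pod template value afterward.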

Interaction with Forecasts

During an active stabilization window, Thoras applies max(forecast, floor) to memory when scaling. CPU is always sourced from the forecast unchanged. When a new forecast arrives after the window has expired, the floor is cleared and the forecast value is applied directly. If the workload OOMs again, the remediation cycle restarts from the new forecast baseline.
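A minimal sketch of that rule, with hypothetical names and values in MiB:

```python
from typing import Optional


def memory_to_apply(forecast: float,
                    floor: Optional[float],
                    window_active: bool) -> float:
    """Return the memory value used when scaling.

    While the window is active, the OOM floor wins unless the forecast is
    higher. Once the window has expired, the forecast is applied directly.
    """
    if window_active and floor is not None:
        return max(forecast, floor)
    return forecast
```

With a floor of 737 MiB, a 600 MiB forecast is held at 737 MiB during the window, a 900 MiB forecast is applied as 900 MiB, and after expiry the 600 MiB forecast is applied unchanged.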

Considerations

  • Memory may be held above the forecast during the stabilization window. If an OOM was caused by a one-off spike and the workload returns to lower memory usage afterward, requests will remain elevated until the window expires. Thoras favors avoiding another OOM over reclaiming memory immediately, and allows configuring the duration of the stabilization window.
  • New pods after the window expires start at the forecast value. Once the floor clears, pods created from that point forward use the forecaster’s recommendation. If that recommendation is insufficient, another OOM may occur and trigger a new remediation cycle.
  • Partial in-place resize failures. If some pods cannot be resized in place (e.g., the node lacks capacity), those pods continue at their previous memory until the node has room or they are rescheduled. The next adjustment cycle reads a blended average and targets a slightly higher value.
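The blended average in the partial-resize case can be made concrete with a small calculation. This sketch assumes the next target is simply the mean memory across running pods multiplied by 1.2, as described in How It Works; the function name is hypothetical and values are in MiB.

```python
from math import fsum
from typing import Sequence

MULTIPLIER = 1.2  # multiplier stated in How It Works


def next_target_mib(pod_memory_mib: Sequence[float]) -> float:
    """Next adjustment target from the average memory of running pods."""
    average = fsum(pod_memory_mib) / len(pod_memory_mib)
    return average * MULTIPLIER
```

If two of three pods were resized from 512 MiB to 614.4 MiB and one is still at 512 MiB, the average is about 580 MiB and the next target is about 696 MiB, compared with about 737 MiB had all three pods been resized.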

Full Example

apiVersion: thoras.ai/v1
kind: AIScaleTarget
metadata:
  name: my-service
  namespace: my-namespace
spec:
  scaleTargetRef:
    kind: Deployment
    name: my-service
    apiVersion: apps/v1
  model:
    mode: balanced
    forecast_interval: 1h
    forecast_blocks: 4h
  vertical:
    mode: autonomous
    oom_remediation:
      enabled: true
      stabilization_window: 2h
    update_policy:
      update_mode: in_place_or_recreate
    containers:
      - name: my-service
        memory:
          lowerbound: 256Mi
          upperbound: 4Gi
        cpu:
          lowerbound: 100m
          upperbound: 2
In this configuration:
  • OOM remediation is enabled with a 2-hour stabilization window
  • Steady-state memory recommendations are clamped to 4 GiB (upperbound); OOM adjustments may exceed this temporarily during stabilization
  • Pods are resized in place where possible, with recreation as fallback