JVM Predictive Auto-Scaling

Thoras can predictively tune JVM heap settings (-Xmx and -Xms) alongside container CPU and memory requests for Java workloads. Instead of hard-coding heap sizes or relying on JVM ergonomics, Thoras forecasts heap and non-heap memory usage and derives new values on every pod creation.

How It Works

Overview

Traditional JVM memory configuration is static: you set -Xmx (maximum heap) and -Xms (initial heap) at deployment time and hope they’re sufficient. If heap usage grows beyond what you configured, the application risks OutOfMemoryError failures or OOM kills. If it shrinks, you waste resources. Thoras solves this by:
  1. Collecting JVM metrics (jvm_memory_used_bytes for heap and non-heap areas) directly from your application
  2. Forecasting future heap and non-heap memory needs using historical patterns
  3. Computing optimal -Xmx, -Xms, and container memory request/limit values
  4. Injecting these as environment variables into pods at creation time via a mutating admission webhook
This happens automatically on each pod creation, so your JVM settings stay in sync with actual demand.

The Calculation

Thoras uses three configurable parameters to compute JVM settings from its forecasts:

| Parameter | Default | Purpose |
| --- | --- | --- |
| xmx_buffer | 10% | Headroom above observed/predicted heap usage for -Xmx |
| xms_ratio | 80% | Ratio of -Xms to -Xmx (initial heap as a fraction of max heap) |
| memory_buffer | 10% | Headroom above total JVM memory (heap + non-heap) for the container memory request |

The calculation follows four steps:

Step 1: Determine effective heap and non-heap usage

Thoras takes the greater of the forecasted value and current observed value for both heap and non-heap memory. This ensures the JVM is never sized below what it’s currently using.
effective_heap    = max(heap_forecast, heap_current)
effective_nonheap = max(nonheap_forecast, nonheap_current)

Step 2: Calculate Xmx (maximum heap)

The maximum heap size is the effective heap usage plus the xmx_buffer:
Xmx = effective_heap * (1 + xmx_buffer)
If xmx_bounds are configured, the value is clamped within those bounds.

Step 3: Calculate Xms (initial heap)

The initial heap size is a ratio of the maximum:
Xms = Xmx * xms_ratio

Step 4: Calculate container memory request

The container memory request accounts for the full JVM footprint (heap + non-heap) plus a buffer for native memory, thread stacks, metaspace, etc.:
container_memory = (Xmx + effective_nonheap) * (1 + memory_buffer)
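
The four steps above can be sketched in Python as follows. This is an illustrative sketch, not Thoras’s implementation: the function name compute_jvm_sizing, the JvmSizing type, the use of whole-number percentages, and rounding down to whole bytes are all assumptions made for the example.

```python
from typing import NamedTuple, Optional


class JvmSizing(NamedTuple):
    xmx_bytes: int
    xms_bytes: int
    container_memory_bytes: int


def compute_jvm_sizing(
    heap_forecast: int,
    heap_current: int,
    nonheap_forecast: int,
    nonheap_current: int,
    xmx_buffer_pct: int = 10,
    xms_ratio_pct: int = 80,
    memory_buffer_pct: int = 10,
    xmx_lower: Optional[int] = None,
    xmx_upper: Optional[int] = None,
) -> JvmSizing:
    """All sizes are in bytes; percentages are whole numbers (10 means 10%)."""
    # Step 1: never size below what the JVM is currently using.
    effective_heap = max(heap_forecast, heap_current)
    effective_nonheap = max(nonheap_forecast, nonheap_current)

    # Step 2: add headroom, then clamp to xmx_bounds if they are set.
    xmx = effective_heap * (100 + xmx_buffer_pct) // 100
    if xmx_lower is not None:
        xmx = max(xmx, xmx_lower)
    if xmx_upper is not None:
        xmx = min(xmx, xmx_upper)

    # Step 3: initial heap as a fraction of the (possibly clamped) maximum.
    xms = xmx * xms_ratio_pct // 100

    # Step 4: full JVM footprint (heap + non-heap) plus the memory buffer.
    container_memory = (xmx + effective_nonheap) * (100 + memory_buffer_pct) // 100
    return JvmSizing(xmx, xms, container_memory)


MIB = 1024 * 1024
# Forecasts of 200 MiB heap and 50 MiB non-heap, with lower current usage.
sizing = compute_jvm_sizing(200 * MIB, 150 * MIB, 50 * MIB, 40 * MIB)
print(sizing)
```

Integer arithmetic is used here so that the byte values come out exact; how Thoras itself rounds fractional results is not stated in this document.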

Worked Example

Given:
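  • Heap forecast: 200 MiB
  • Non-heap forecast: 50 MiB
  • Current observed heap and non-heap usage at or below the forecasts, so the forecasts are the effective values
  • All parameters at defaults (10% xmx_buffer, 80% xms_ratio, 10% memory_buffer)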

| Step | Calculation | Result |
| --- | --- | --- |
| Xmx | 200 MiB * 1.10 | 220 MiB |
| Xms | 220 MiB * 0.80 | 176 MiB |
| Container memory | (220 MiB + 50 MiB) * 1.10 | 297 MiB |

The pod would receive:
  • THORAS_JVM_HEAP_MEMORY_XMX=-Xmx230686720 (220 MiB in bytes)
  • THORAS_JVM_HEAP_MEMORY_XMS=-Xms184549376 (176 MiB in bytes)
  • Container memory request: 297 MiB

What Gets Injected

Thoras injects two environment variables into the JVM container on pod creation:

| Environment Variable | Format | Description |
| --- | --- | --- |
| THORAS_JVM_HEAP_MEMORY_XMX | -Xmx<bytes> | Maximum heap size flag (e.g., -Xmx230686720) |
| THORAS_JVM_HEAP_MEMORY_XMS | -Xms<bytes> | Initial/minimum heap size flag (e.g., -Xms184549376) |

Values are specified in bytes for precision. Thoras prepends these variables to the container’s environment so that later entries, such as JAVA_OPTS or your application’s JVM argument configuration, can reference them.
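
When checking what a pod actually received, it can help to turn the byte-valued flags back into MiB. The helper below is a minimal sketch for that purpose; the function names parse_heap_flag and describe_injected_heap are hypothetical, and it assumes the flag format shown in the table above.

```python
import os
import re
from typing import Dict, Mapping, Optional

# Matches flags such as "-Xmx230686720" or "-Xms184549376".
FLAG_PATTERN = re.compile(r"^-(Xmx|Xms)(\d+)$")

THORAS_VARS = ("THORAS_JVM_HEAP_MEMORY_XMX", "THORAS_JVM_HEAP_MEMORY_XMS")


def parse_heap_flag(value: str) -> Optional[int]:
    """Return the byte count from a heap flag, or None if it does not match."""
    match = FLAG_PATTERN.match(value)
    return int(match.group(2)) if match else None


def describe_injected_heap(env: Mapping[str, str] = os.environ) -> Dict[str, Optional[float]]:
    """Map each Thoras variable to its size in MiB, or None if absent or malformed."""
    result: Dict[str, Optional[float]] = {}
    for name in THORAS_VARS:
        size = parse_heap_flag(env.get(name, ""))
        result[name] = None if size is None else size / (1024 * 1024)
    return result


example_env = {
    "THORAS_JVM_HEAP_MEMORY_XMX": "-Xmx230686720",
    "THORAS_JVM_HEAP_MEMORY_XMS": "-Xms184549376",
}
print(describe_injected_heap(example_env))
```

Called with no argument, describe_injected_heap reads the current process environment, which is only meaningful when run inside the JVM container.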

Configuration Guide

Prerequisites

  1. Prometheus JVM metric endpoint: Your Java application must expose jvm_memory_used_bytes metrics with area="heap" and area="nonheap" labels. Most JVM metric exporters (Micrometer, JMX Exporter, etc.) produce these by default.
  2. Vertical scaling enabled on the AIScaleTarget in auto (autonomous) mode.

Step 1: Configure Your Application to Use the Environment Variables

Your application must reference the Thoras-injected environment variables for its JVM heap settings. The simplest approach is to reference them in JAVA_OPTS or JAVA_TOOL_OPTIONS:
containers:
  - name: my-jvm-app
    image: my-app:latest
    env:
      - name: JAVA_TOOL_OPTIONS
        value: "$(THORAS_JVM_HEAP_MEMORY_XMX) $(THORAS_JVM_HEAP_MEMORY_XMS)"
Or if your entrypoint uses JAVA_OPTS:
env:
  - name: JAVA_OPTS
    value: "$(THORAS_JVM_HEAP_MEMORY_XMX) $(THORAS_JVM_HEAP_MEMORY_XMS) -XX:+UseG1GC"
Kubernetes will resolve the $(...) references at container startup using the environment variables that Thoras injects.
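
Kubernetes expands a $(VAR) reference only from variables defined earlier in the container’s env list, and leaves a reference it cannot resolve as literal text. The following is a simplified Python model of that behavior, written for illustration only: it is not Kubernetes code, it ignores service environment variables and edge cases, and the function name expand_env is hypothetical.

```python
import re
from typing import Dict, List, Tuple

# Either an escaped "$$" or a "$(NAME)" reference.
_REF = re.compile(r"\$\$|\$\(([^)]*)\)")


def expand_env(env: List[Tuple[str, str]]) -> Dict[str, str]:
    """Expand $(VAR) references in list order (simplified model)."""
    resolved: Dict[str, str] = {}

    def substitute(match: re.Match) -> str:
        if match.group(0) == "$$":
            return "$"  # "$$" is an escaped "$"
        # Only variables defined earlier are known; others stay as written.
        return resolved.get(match.group(1), match.group(0))

    for name, value in env:
        resolved[name] = _REF.sub(substitute, value)
    return resolved


injected = [
    ("THORAS_JVM_HEAP_MEMORY_XMX", "-Xmx230686720"),
    ("THORAS_JVM_HEAP_MEMORY_XMS", "-Xms184549376"),
]
app_env = [
    ("JAVA_TOOL_OPTIONS", "$(THORAS_JVM_HEAP_MEMORY_XMX) $(THORAS_JVM_HEAP_MEMORY_XMS)"),
]

with_injection = expand_env(injected + app_env)["JAVA_TOOL_OPTIONS"]
without_injection = expand_env(app_env)["JAVA_TOOL_OPTIONS"]
print(with_injection)
print(without_injection)
```

In this model the first result is the pair of heap flags, while the second keeps the literal $(...) text. In a pod where the variables were not injected, the JVM would therefore see that literal text rather than a heap flag, so it is worth confirming that injection is active before relying on these references.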

Step 2: Create the AIScaleTarget

Configure an AIScaleTarget with JVM options enabled on the target container. The key requirements are:
  • Vertical mode must be set to auto
  • The container must have memory.limit.ratio set (required when JVM auto-sizing is enabled)
  • jvm_options.enable_auto_heap_size must be true
apiVersion: thoras.ai/v1
kind: AIScaleTarget
metadata:
  name: my-jvm-app
  namespace: my-namespace
spec:
  scaleTargetRef:
    kind: Deployment
    name: my-jvm-app
    apiVersion: apps/v1
  model:
    mode: balanced
  vertical:
    mode: auto
    containers:
      - name: my-jvm-app
        memory:
          lowerbound: 128Mi
          limit:
            ratio: "1.0"
        jvm_options:
          enable_auto_heap_size: true
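
The key requirements listed above can be checked before applying a manifest. The sketch below works on the spec as a plain Python dictionary (for example, after loading the YAML with a parser of your choice); the function name jvm_config_problems and the message wording are hypothetical, and the checks cover only the requirements stated in this section.

```python
from typing import Any, Dict, List


def jvm_config_problems(spec: Dict[str, Any]) -> List[str]:
    """Return human-readable problems with the JVM-related parts of a spec."""
    problems: List[str] = []
    vertical = spec.get("vertical", {})
    jvm_containers = [
        c for c in vertical.get("containers", [])
        if c.get("jvm_options", {}).get("enable_auto_heap_size") is True
    ]
    if not jvm_containers:
        return problems  # JVM auto-sizing is not enabled anywhere

    # Vertical mode must be set to auto.
    if vertical.get("mode") != "auto":
        problems.append("vertical.mode must be 'auto'")

    # memory.limit.ratio is required when JVM auto-sizing is enabled.
    for c in jvm_containers:
        ratio = c.get("memory", {}).get("limit", {}).get("ratio")
        if ratio is None:
            problems.append(f"{c.get('name')}: memory.limit.ratio is required")
    return problems


# Mirrors the spec section of the manifest above.
spec = {
    "vertical": {
        "mode": "auto",
        "containers": [
            {
                "name": "my-jvm-app",
                "memory": {"lowerbound": "128Mi", "limit": {"ratio": "1.0"}},
                "jvm_options": {"enable_auto_heap_size": True},
            }
        ],
    }
}
print(jvm_config_problems(spec))
```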

AIScaleTarget JVM Fields Reference

All JVM configuration lives under spec.vertical.containers[].jvm_options:

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| enable_auto_heap_size | bool | Yes | false | Enables automatic JVM heap sizing for this container. |
| xmx_buffer | percentage | No | 10% | Buffer added above observed/predicted heap usage when computing -Xmx. Provides headroom to absorb heap growth between forecast cycles. |
| xms_ratio | percentage | No | 80% | Ratio of -Xms to -Xmx. Controls how much heap the JVM pre-allocates at startup. Higher values reduce GC pressure from heap expansion. |
| memory_buffer | percentage | No | 10% | Buffer added above total JVM memory (Xmx + non-heap) when computing the container memory request. Accounts for native memory, thread stacks, metaspace growth, etc. |
| xmx_bounds.lowerbound | resource quantity | No | | Minimum allowed Xmx value (e.g., 256Mi). Prevents the heap from being sized too small regardless of forecast. |
| xmx_bounds.upperbound | resource quantity | No | | Maximum allowed Xmx value (e.g., 4Gi). Caps the heap to prevent runaway growth. |

The following fields on the parent container spec are also relevant:

| Field | Relevance |
| --- | --- |
| memory.limit.ratio | Required when enable_auto_heap_size is true. Defines the ratio of memory limit to request (e.g., "1.0" sets limit equal to request). |
| memory.lowerbound | Minimum container memory request. Acts as a floor. |
| memory.upperbound | Optional. Maximum container memory request. Acts as a ceiling. |

Full Example with Custom Buffers

apiVersion: thoras.ai/v1
kind: AIScaleTarget
metadata:
  name: my-jvm-app
  namespace: my-namespace
spec:
  scaleTargetRef:
    kind: Deployment
    name: my-jvm-app
    apiVersion: apps/v1
  model:
    mode: balanced
  vertical:
    mode: auto
    containers:
      - name: my-jvm-app
        memory:
          lowerbound: 256Mi
          upperbound: 8Gi
          limit:
            ratio: "2.0"
        jvm_options:
          enable_auto_heap_size: true
          xmx_buffer: "15%"
          xms_ratio: "75%"
          memory_buffer: "20%"
          xmx_bounds:
            lowerbound: 512Mi
            upperbound: 4Gi
This configuration:
  • Adds 15% headroom on heap forecasts when computing Xmx
  • Sets Xms to 75% of Xmx
  • Adds 20% headroom for the container memory request
  • Clamps Xmx between 512 MiB and 4 GiB regardless of forecast
  • Keeps the container memory request between 256 MiB and 8 GiB
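
To see how these custom values interact, the sketch below parses the percentage and quantity strings from the example and applies them to a hypothetical effective usage of 200 MiB heap and 50 MiB non-heap. Several things here are assumptions rather than documented behavior: the helper names, support for only the Ki, Mi, and Gi suffixes, rounding down to whole bytes, applying the memory bounds after the memory buffer, and computing the limit as request times ratio.

```python
from fractions import Fraction

_SUFFIXES = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}


def parse_percentage(text: str) -> Fraction:
    """Turn a string such as '15%' into the exact fraction 15/100."""
    return Fraction(text.rstrip("%")) / 100


def parse_quantity(text: str) -> int:
    """Convert a binary-suffixed quantity such as '512Mi' to bytes."""
    for suffix, factor in _SUFFIXES.items():
        if text.endswith(suffix):
            return int(text[: -len(suffix)]) * factor
    return int(text)  # plain byte count


def clamp(value: int, lower: int, upper: int) -> int:
    return max(lower, min(value, upper))


MIB = 1024**2
heap, nonheap = 200 * MIB, 50 * MIB  # hypothetical effective usage

# Xmx: 15% headroom, then clamped to xmx_bounds (512Mi to 4Gi).
xmx = clamp(int(heap * (1 + parse_percentage("15%"))),
            parse_quantity("512Mi"), parse_quantity("4Gi"))
# Xms: 75% of the clamped Xmx.
xms = int(xmx * parse_percentage("75%"))
# Request: 20% headroom over Xmx + non-heap, kept within 256Mi to 8Gi.
request = clamp(int((xmx + nonheap) * (1 + parse_percentage("20%"))),
                parse_quantity("256Mi"), parse_quantity("8Gi"))
# Limit: request multiplied by memory.limit.ratio.
limit = int(request * Fraction("2.0"))

print(xmx // MIB, xms // MIB, request // MIB, limit // MIB)
```

With these inputs the unclamped Xmx would be 230 MiB, so the 512 MiB lower bound takes over and the remaining values are derived from it.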

Multi-Container Pods

JVM auto-scaling supports one JVM container per pod. In multi-container pods, only the container with jvm_options.enable_auto_heap_size: true will have heap settings managed. Other containers (sidecars, init containers, etc.) are unaffected:
vertical:
  mode: auto
  containers:
    - name: my-jvm-app
      memory:
        lowerbound: 256Mi
        limit:
          ratio: "2.0"
      jvm_options:
        enable_auto_heap_size: true
    - name: sidecar
      memory:
        lowerbound: 64Mi
      cpu:
        lowerbound: 50m
In this example, my-jvm-app gets Thoras-managed heap settings while sidecar gets standard vertical scaling (or no scaling, depending on configuration).
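
A quick way to confirm the one-JVM-container rule for a given containers list is sketched below. The function name jvm_managed_containers is hypothetical, and the list is written as a Python structure mirroring the YAML above.

```python
from typing import Any, Dict, List


def jvm_managed_containers(containers: List[Dict[str, Any]]) -> List[str]:
    """Names of containers that opt in to automatic heap sizing."""
    return [
        c["name"] for c in containers
        if c.get("jvm_options", {}).get("enable_auto_heap_size") is True
    ]


containers = [
    {
        "name": "my-jvm-app",
        "memory": {"lowerbound": "256Mi", "limit": {"ratio": "2.0"}},
        "jvm_options": {"enable_auto_heap_size": True},
    },
    {
        "name": "sidecar",
        "memory": {"lowerbound": "64Mi"},
        "cpu": {"lowerbound": "50m"},
    },
]

managed = jvm_managed_containers(containers)
if len(managed) > 1:
    # Only one container per pod may enable auto heap sizing.
    raise ValueError(f"more than one JVM-managed container: {managed}")
print(managed)
```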

Constraints and Considerations

  • One JVM container per pod: Only one container in the pod can have enable_auto_heap_size: true.
  • Requires memory.limit.ratio: The container must have memory.limit.ratio configured. This ensures the JVM runs under a bounded memory limit. A ratio of "2.0" is typical for JVM workloads since the JVM manages its own memory within the heap boundary.
  • Never scales below current usage: Thoras always takes the maximum of the forecast and current observed usage to prevent sizing below what the JVM is actively using.
  • Metrics must be available: The application must expose jvm_memory_used_bytes with area="heap" and area="nonheap" labels. If these metrics are missing, JVM mutation is skipped for that pod.
  • Applied at pod creation: JVM settings are injected via the admission webhook when pods are created. Existing running pods are not modified in place, so a rollout is required for new settings to take effect.