Thoras Documentation

Thoras can predictively tune JVM heap settings (-Xmx and -Xms) alongside container CPU and memory requests for Java workloads. Instead of hard-coding heap sizes or relying on JVM ergonomics, Thoras forecasts heap and non-heap memory usage and derives new values each time a pod is created.

How It Works

Overview

Traditional JVM memory configuration is static: you set -Xmx (maximum heap) and -Xms (initial heap) at deployment time and hope they are sufficient. If heap usage outgrows those settings, the application runs out of memory or the container is OOM-killed. If usage shrinks, the reserved memory is wasted. Thoras solves this by:

1. Collecting JVM metrics (jvm_memory_used_bytes for heap and non-heap areas) directly from your application
2. Forecasting future heap and non-heap memory needs using historical patterns
3. Computing optimal -Xmx, -Xms, and container memory request/limit values
4. Injecting these as environment variables into pods at creation time via a mutating admission webhook

This happens automatically on each pod creation, so your JVM settings stay in sync with actual demand.

The Calculation

Thoras uses three configurable parameters (two buffers and one ratio) to compute JVM settings from its forecasts:

Parameter	Default	Purpose
`xmx_buffer`	10%	Headroom above observed/predicted heap usage for `-Xmx`
`xms_ratio`	80%	Ratio of `-Xms` to `-Xmx` (initial heap as a fraction of max heap)
`memory_buffer`	10%	Headroom above total JVM memory (heap + non-heap) for the container memory request

The calculation follows four steps:

Step 1: Determine effective heap and non-heap usage

Thoras takes the greater of the forecasted value and the current observed value for both heap and non-heap memory. This ensures the JVM is never sized below what it is currently using.

effective_heap    = max(heap_forecast, heap_current)
effective_nonheap = max(nonheap_forecast, nonheap_current)

Step 2: Calculate Xmx (maximum heap)

The maximum heap size is the effective heap usage plus the xmx_buffer:

Xmx = effective_heap * (1 + xmx_buffer)

If xmx_bounds are configured, the value is clamped within those bounds.

Step 3: Calculate Xms (initial heap)

The initial heap size is a fixed fraction (xms_ratio) of the maximum:

Xms = Xmx * xms_ratio

Step 4: Calculate container memory request

The container memory request accounts for the full JVM footprint (heap + non-heap) plus a buffer for native memory, thread stacks, metaspace growth, etc.:

container_memory = (Xmx + effective_nonheap) * (1 + memory_buffer)
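
The four steps above can be sketched in Python. This is an illustration of the formulas, not Thoras's actual implementation: the names `compute_jvm_sizing` and `JvmSizing` are hypothetical, and both the rounding to whole bytes and the choice to derive Xms from the clamped Xmx are assumptions.

```python
from dataclasses import dataclass
from typing import Optional

MIB = 1024 * 1024


@dataclass
class JvmSizing:
    xmx_bytes: int
    xms_bytes: int
    container_memory_bytes: int


def compute_jvm_sizing(
    heap_forecast: float,
    heap_current: float,
    nonheap_forecast: float,
    nonheap_current: float,
    xmx_buffer: float = 0.10,
    xms_ratio: float = 0.80,
    memory_buffer: float = 0.10,
    xmx_lower: Optional[float] = None,
    xmx_upper: Optional[float] = None,
) -> JvmSizing:
    # Step 1: never size below what the JVM is using right now.
    effective_heap = max(heap_forecast, heap_current)
    effective_nonheap = max(nonheap_forecast, nonheap_current)

    # Step 2: add headroom, then clamp to xmx_bounds if they are set.
    xmx = effective_heap * (1 + xmx_buffer)
    if xmx_lower is not None:
        xmx = max(xmx, xmx_lower)
    if xmx_upper is not None:
        xmx = min(xmx, xmx_upper)

    # Step 3: initial heap as a fraction of the (assumed: clamped) maximum.
    xms = xmx * xms_ratio

    # Step 4: heap plus non-heap, with the memory buffer on top.
    container_memory = (xmx + effective_nonheap) * (1 + memory_buffer)

    # Rounding to whole bytes is an assumption made for this sketch.
    return JvmSizing(round(xmx), round(xms), round(container_memory))


sizing = compute_jvm_sizing(200 * MIB, 150 * MIB, 50 * MIB, 40 * MIB)
print(sizing.xmx_bytes // MIB, sizing.xms_bytes // MIB, sizing.container_memory_bytes // MIB)
# 220 176 297
```

With default parameters and forecasts above current usage, the sketch yields 220 MiB, 176 MiB, and 297 MiB.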

Worked Example

Given:

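Heap forecast: 200 MiB
Non-heap forecast: 50 MiB
Current observed heap and non-heap usage: at or below the forecasts, so the forecasts are the effective values
All parameters at defaults (10% xmx_buffer, 80% xms_ratio, 10% memory_buffer)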

Step	Calculation	Result
Xmx	200 MiB * 1.10	220 MiB
Xms	220 MiB * 0.80	176 MiB
Container memory	(220 MiB + 50 MiB) * 1.10	297 MiB

The pod would receive:

THORAS_JVM_HEAP_MEMORY_XMX=-Xmx230686720 (220 MiB in bytes)
THORAS_JVM_HEAP_MEMORY_XMS=-Xms184549376 (176 MiB in bytes)
Container memory request: 297 MiB
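
The byte values in these flags can be reproduced with a short Python check, using 1 MiB = 1,048,576 bytes. The helper name `heap_flag` is hypothetical and exists only for this illustration.

```python
MIB = 1024 * 1024  # bytes per mebibyte


def heap_flag(prefix: str, mib: int) -> str:
    """Format a JVM heap flag with its size expressed in bytes."""
    return f"{prefix}{mib * MIB}"


xmx_flag = heap_flag("-Xmx", 220)
xms_flag = heap_flag("-Xms", 176)
print(xmx_flag)  # -Xmx230686720
print(xms_flag)  # -Xms184549376
```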

What Gets Injected

Thoras injects two environment variables into the JVM container on pod creation:

Environment Variable	Format	Description
`THORAS_JVM_HEAP_MEMORY_XMX`	`-Xmx<bytes>`	Maximum heap size flag (e.g., `-Xmx230686720`)
`THORAS_JVM_HEAP_MEMORY_XMS`	`-Xms<bytes>`	Initial/minimum heap size flag (e.g., `-Xms184549376`)

Values are specified in bytes for precision. These variables are prepended to the container’s environment so they’re available for reference in JAVA_OPTS or your application’s JVM argument configuration.

Configuration Guide

Prerequisites

Prometheus JVM metric endpoint: Your Java application must expose jvm_memory_used_bytes metrics with area="heap" and area="nonheap" labels. Most JVM metric exporters (Micrometer, JMX Exporter, etc.) produce these by default.
Vertical scaling enabled on the AIScaleTarget in auto (autonomous) mode.
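
One way to check the metrics prerequisite is to scan the application's Prometheus exposition text for both areas. The following is a minimal sketch, assuming you have already fetched the text from your metrics endpoint. `missing_jvm_areas` is a hypothetical helper, not part of Thoras, and the parsing is simplified (it does not handle label values that contain `}`).

```python
import re

REQUIRED_AREAS = {"heap", "nonheap"}

# Matches sample lines such as:
#   jvm_memory_used_bytes{area="heap",id="G1 Eden Space"} 1.2E7
_SAMPLE = re.compile(r"^jvm_memory_used_bytes\{([^}]*)\}\s+\S+")
_AREA = re.compile(r'area="([^"]*)"')


def missing_jvm_areas(metrics_text: str) -> set:
    """Return the required area labels that do not appear in the metrics text."""
    found = set()
    for line in metrics_text.splitlines():
        sample = _SAMPLE.match(line.strip())
        if not sample:
            continue  # comment line, blank line, or a different metric
        area = _AREA.search(sample.group(1))
        if area:
            found.add(area.group(1))
    return REQUIRED_AREAS - found


example = """\
# TYPE jvm_memory_used_bytes gauge
jvm_memory_used_bytes{area="heap",id="G1 Eden Space"} 1.2E7
jvm_memory_used_bytes{area="nonheap",id="Metaspace"} 4.5E7
"""
print(missing_jvm_areas(example))  # set()
```

An empty result means both series are present; any returned name is an area the exporter is not publishing.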

Step 1: Configure Your Application to Use the Environment Variables

Your application must reference the Thoras-injected environment variables for its JVM heap settings. The simplest approach is to reference them in JAVA_TOOL_OPTIONS, which the JVM reads directly at startup:

containers:
  - name: my-jvm-app
    image: my-app:latest
    env:
      - name: JAVA_TOOL_OPTIONS
        value: "$(THORAS_JVM_HEAP_MEMORY_XMX) $(THORAS_JVM_HEAP_MEMORY_XMS)"

Or if your entrypoint uses JAVA_OPTS:

env:
  - name: JAVA_OPTS
    value:
      "$(THORAS_JVM_HEAP_MEMORY_XMX) $(THORAS_JVM_HEAP_MEMORY_XMS) -XX:+UseG1GC"

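Kubernetes resolves the $(...) references when it creates the container. This works because the Thoras-injected variables are placed ahead of your own entries in the container's environment, and a reference can only resolve against variables defined earlier in the list.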
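
The ordering dependency can be illustrated with a simplified Python emulation of Kubernetes `$(VAR)` expansion. This is a sketch for intuition only, not Kubernetes code: `expand_env` is a hypothetical function, and it ignores the `$$(VAR)` escape syntax and service environment variables.

```python
import re

_REF = re.compile(r"\$\(([A-Za-z_][A-Za-z0-9_.-]*)\)")


def expand_env(env_list):
    """Expand $(VAR) references using only variables defined earlier in the list."""
    resolved = {}
    for name, value in env_list:
        # Unresolvable references are left unchanged, as Kubernetes does.
        resolved[name] = _REF.sub(
            lambda m: resolved.get(m.group(1), m.group(0)), value
        )
    return resolved


java_opts = ("JAVA_TOOL_OPTIONS", "$(THORAS_JVM_HEAP_MEMORY_XMX) $(THORAS_JVM_HEAP_MEMORY_XMS)")

with_injection = expand_env([
    ("THORAS_JVM_HEAP_MEMORY_XMX", "-Xmx230686720"),
    ("THORAS_JVM_HEAP_MEMORY_XMS", "-Xms184549376"),
    java_opts,
])
print(with_injection["JAVA_TOOL_OPTIONS"])
# -Xmx230686720 -Xms184549376

without_injection = expand_env([java_opts])
print(without_injection["JAVA_TOOL_OPTIONS"])
# $(THORAS_JVM_HEAP_MEMORY_XMX) $(THORAS_JVM_HEAP_MEMORY_XMS)
```

The second case shows that when the variables are absent, the references stay in the value as literal text.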

Step 2: Create the AIScaleTarget

Configure an AIScaleTarget with JVM options enabled on the target container. The key requirements are:

Vertical mode must be set to auto
The container must have memory.limit.ratio set (required when JVM auto-sizing is enabled)
jvm_options.enable_auto_heap_size must be true

apiVersion: thoras.ai/v1
kind: AIScaleTarget
metadata:
  name: my-jvm-app
  namespace: my-namespace
spec:
  scaleTargetRef:
    kind: Deployment
    name: my-jvm-app
    apiVersion: apps/v1
  model:
    mode: balanced
  vertical:
    mode: auto
    containers:
      - name: my-jvm-app
        memory:
          lowerbound: 128Mi
          limit:
            ratio: "1.0"
        jvm_options:
          enable_auto_heap_size: true

AIScaleTarget JVM Fields Reference

All JVM configuration lives under spec.vertical.containers[].jvm_options:

Field	Type	Required	Default	Description
`enable_auto_heap_size`	bool	Yes	`false`	Enables automatic JVM heap sizing for this container.
`xmx_buffer`	percentage	No	`10%`	Buffer added above observed/predicted heap usage when computing `-Xmx`. Provides headroom to absorb heap growth between forecast cycles.
`xms_ratio`	percentage	No	`80%`	Ratio of `-Xms` to `-Xmx`. Controls how much heap the JVM pre-allocates at startup. Higher values reduce GC pressure from heap expansion.
`memory_buffer`	percentage	No	`10%`	Buffer added above total JVM memory (Xmx + non-heap) when computing the container memory request. Accounts for native memory, thread stacks, metaspace growth, etc.
`xmx_bounds.lowerbound`	resource quantity	No	-	Minimum allowed Xmx value (e.g., `256Mi`). Prevents the heap from being sized too small regardless of forecast.
`xmx_bounds.upperbound`	resource quantity	No	-	Maximum allowed Xmx value (e.g., `4Gi`). Caps the heap to prevent runaway growth.

The parent container spec also requires:

Field	Relevance
`memory.limit.ratio`	Required when `enable_auto_heap_size` is `true`. Defines the ratio of memory limit to request (e.g., `"1.0"` sets limit equal to request).
`memory.lowerbound`	Minimum container memory request. Acts as a floor.
`memory.upperbound`	(Optional) Maximum container memory request. Acts as a steady-state ceiling. Bypassed during opt-in OOM remediation.

Full Example with Custom Buffers

apiVersion: thoras.ai/v1
kind: AIScaleTarget
metadata:
  name: my-jvm-app
  namespace: my-namespace
spec:
  scaleTargetRef:
    kind: Deployment
    name: my-jvm-app
    apiVersion: apps/v1
  model:
    mode: balanced
  vertical:
    mode: auto
    containers:
      - name: my-jvm-app
        memory:
          lowerbound: 256Mi
          upperbound: 8Gi
          limit:
            ratio: "2.0"
        jvm_options:
          enable_auto_heap_size: true
          xmx_buffer: "15%"
          xms_ratio: "75%"
          memory_buffer: "20%"
          xmx_bounds:
            lowerbound: 512Mi
            upperbound: 4Gi

This configuration:

Adds 15% headroom on heap forecasts when computing Xmx
Sets Xms to 75% of Xmx
Adds 20% headroom for the container memory request
Clamps Xmx between 512 MiB and 4 GiB regardless of forecast
Keeps the container memory request between 256 MiB and 8 GiB
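
To see how these settings interact, the sketch below applies them to a hypothetical pair of inputs (300 MiB effective heap, 88 MiB effective non-heap). The function name, the integer arithmetic, and the order of operations (clamp the request first, then derive the limit from it) are assumptions for illustration. OOM remediation, which can bypass the memory upperbound, is not modeled.

```python
MIB = 1024 * 1024
GIB = 1024 * MIB


def clamp(value: int, lower: int, upper: int) -> int:
    """Restrict value to the inclusive range [lower, upper]."""
    return max(lower, min(value, upper))


def size_with_custom_buffers(effective_heap: int, effective_nonheap: int) -> dict:
    # xmx_buffer 15%, clamped by xmx_bounds (512Mi to 4Gi)
    xmx = clamp(effective_heap * 115 // 100, 512 * MIB, 4 * GIB)
    # xms_ratio 75%
    xms = xmx * 75 // 100
    # memory_buffer 20%, clamped by memory bounds (256Mi to 8Gi)
    request = clamp((xmx + effective_nonheap) * 120 // 100, 256 * MIB, 8 * GIB)
    # memory.limit.ratio "2.0": limit is twice the request
    limit = request * 2
    return {"xmx": xmx, "xms": xms, "request": request, "limit": limit}


result = size_with_custom_buffers(300 * MIB, 88 * MIB)
print({key: value // MIB for key, value in result.items()})
# {'xmx': 512, 'xms': 384, 'request': 720, 'limit': 1440}
```

Here the unclamped Xmx would be 345 MiB, so the 512 MiB lower bound takes over and drives the remaining values.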

Multi-Container Pods

JVM auto-scaling supports one JVM container per pod. In multi-container pods, only the container with jvm_options.enable_auto_heap_size: true will have heap settings managed. Other containers (sidecars, init containers, etc.) are unaffected:

vertical:
  mode: auto
  containers:
    - name: my-jvm-app
      memory:
        lowerbound: 256Mi
        limit:
          ratio: "2.0"
      jvm_options:
        enable_auto_heap_size: true
    - name: sidecar
      memory:
        lowerbound: 64Mi
      cpu:
        lowerbound: 50m

In this example, my-jvm-app gets Thoras-managed heap settings while sidecar gets standard vertical scaling (or no scaling, depending on configuration).

Constraints and Considerations

One JVM container per pod: Only one container in the pod can have enable_auto_heap_size: true.
Requires memory.limit: The container must have a memory.limit.ratio configured. This ensures the JVM has a bounded memory limit. A ratio of "2.0" is typical for JVM workloads since the JVM manages its own memory within the heap boundary.
Never scales below current usage: Thoras always takes the maximum of the forecast and current observed usage to prevent sizing below what the JVM is actively using.
Metrics must be available: The application must expose jvm_memory_used_bytes with area="heap" and area="nonheap" labels. If these metrics are missing, JVM mutation is skipped for that pod.
Applied at pod creation: JVM settings are injected via the admission webhook when pods are created. Existing running pods are not modified in place — a rollout is required for new settings to take effect.
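
The configuration constraints above can be checked before applying a manifest. The following is a hypothetical pre-flight check, not a Thoras tool: it assumes the `spec` section of an AIScaleTarget has already been loaded into a Python dict, and `check_jvm_target` is an illustrative name.

```python
def check_jvm_target(spec: dict) -> list:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    vertical = spec.get("vertical", {})

    if vertical.get("mode") != "auto":
        problems.append("vertical.mode must be 'auto'")

    # Containers that opt in to automatic heap sizing.
    jvm_containers = [
        c for c in vertical.get("containers", [])
        if c.get("jvm_options", {}).get("enable_auto_heap_size") is True
    ]
    if len(jvm_containers) > 1:
        problems.append("only one container may set enable_auto_heap_size: true")

    for container in jvm_containers:
        ratio = container.get("memory", {}).get("limit", {}).get("ratio")
        if ratio is None:
            problems.append(f"{container.get('name')}: memory.limit.ratio is required")

    return problems


spec = {
    "vertical": {
        "mode": "auto",
        "containers": [
            {
                "name": "my-jvm-app",
                "memory": {"lowerbound": "128Mi", "limit": {"ratio": "1.0"}},
                "jvm_options": {"enable_auto_heap_size": True},
            }
        ],
    }
}
print(check_jvm_target(spec))  # []
```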
