Documentation Index

Fetch the complete documentation index at: https://docs.thoras.ai/llms.txt

Use this file to discover all available pages before exploring further.

Thoras is a self-hosted, AI/ML-powered platform that replaces reactive Kubernetes autoscaling with predictive, autonomous resource management. It eliminates waste, prevents capacity outages, and safely optimizes critical infrastructure. Thoras is designed to earn your trust before running autonomously.

↓ 40%+ waste

Reduction in cloud compute waste from CPU and memory over-provisioning.

↑ Performance

OOM kill loops resolved automatically. Latency spikes prevented through rightsized compute.

0 toil

Continuous right-sizing without manual tuning or on-call escalations.

Philosophy

Autonomy is earned, not assumed. Thoras doesn’t arrive in your cluster and start making decisions. It observes, learns, surfaces recommendations, and only acts autonomously once you’ve validated its reasoning. Your workload and infrastructure context are always respected.

Performance first, efficiency second. Thoras reclaims waste from under-utilized CPU and memory without compromising availability. It knows when not to scale down as clearly as it knows when to scale up.

How it works

1. Observe

Thoras ingests real-time telemetry from your existing metric sources (Prometheus, the Kubernetes metrics server, and any custom or external metrics you already use) and persists historical usage data inside your cluster. It builds per-workload demand profiles that account for seasonality, traffic patterns, and resource trends.

2. Forecast

AI/ML models continuously predict horizontal and vertical scaling needs ahead of time. Forecasts update on configurable intervals (typically minutes to hours), with model accuracy improving as patterns repeat.
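The observe-and-forecast loop above can be sketched as a toy seasonal model: bucket historical usage samples by hour of day to form a per-workload demand profile, then predict a future hour from that profile plus headroom. This is an illustrative simplification, not Thoras's actual model; `build_profile`, `forecast`, and the `headroom` parameter are hypothetical names.

```python
from collections import defaultdict

def build_profile(samples):
    """Aggregate (hour, cpu_millicores) samples into a per-hour average
    demand profile -- a stand-in for a per-workload demand model that
    captures daily seasonality."""
    buckets = defaultdict(list)
    for hour, cpu in samples:
        buckets[hour % 24].append(cpu)
    return {h: sum(v) / len(v) for h, v in buckets.items()}

def forecast(profile, hour, headroom=1.15):
    """Predict demand for a future hour from the profile, with headroom.
    Falls back to the profile-wide mean for unseen hours."""
    baseline = profile.get(hour % 24)
    if baseline is None:
        baseline = sum(profile.values()) / len(profile)
    return baseline * headroom

# Two days of hourly samples with a daily peak at hour 12.
samples = [(h, 500 + (300 if h % 24 == 12 else 0)) for h in range(48)]
profile = build_profile(samples)
print(round(forecast(profile, 36)))  # hour 12 of day 2 -> 920
```

As patterns repeat across days, the per-hour averages stabilize, which is the toy analogue of "model accuracy improving as patterns repeat."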

3. Recommend

Before any autonomous action, Thoras surfaces predictions and right-sizing recommendations in a dashboard. Teams validate model accuracy first. Autonomous mode requires a minimum of 3 hours of historical data, and benefits from at least 48 hours for workloads with daily seasonality.

4. Act

Once enabled, Thoras scales pods and replica counts in advance of demand, remediates OOM kill loops in real time with compounding memory adjustments, and maintains configurable upper and lower bounds at all times. You can pause autonomous scaling cluster-wide at any moment.
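Scaling "in advance of demand" while "maintaining configurable upper and lower bounds" amounts to: convert the forecast into a desired replica count, then clamp it. A minimal sketch, where `plan_replicas` and its parameters are hypothetical, not Thoras's real API:

```python
import math

def plan_replicas(forecast_mcores, per_pod_mcores, min_replicas, max_replicas):
    """Pick a replica count ahead of forecast demand, clamped to the
    configured bounds so autonomous action can never exceed them."""
    desired = math.ceil(forecast_mcores / per_pod_mcores)
    return max(min_replicas, min(desired, max_replicas))

# 920 millicores forecast, 250 mcores per pod -> 4 replicas within [2, 10].
print(plan_replicas(920, 250, min_replicas=2, max_replicas=10))  # -> 4
```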

Capabilities

Predictive vertical rightsizing

Pre-emptive CPU and memory request optimization. Day-one support for K8s 1.33+ in-place pod resize, with rolling restart fallback.

Predictive horizontal scaling

Replica counts adjusted ahead of demand spikes, not after. Integrates with your existing HPA.

OOM remediation

Detects kill loops and stabilizes workloads with compounding memory adjustments (1.2× per cycle) until the forecaster catches up.
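The compounding adjustment is simple arithmetic: each OOM-kill cycle multiplies the memory request by 1.2, capped at an upper bound. A sketch under those assumptions (the cap parameter and function name are illustrative):

```python
def remediate_oom(request_mib, limit_mib, cycles):
    """Apply the 1.2x compounding memory bump per OOM-kill cycle,
    capped at the workload's configured upper bound."""
    history = [request_mib]
    for _ in range(cycles):
        request_mib = min(request_mib * 1.2, limit_mib)
        history.append(round(request_mib))
    return history

# A 512 MiB request after four kill cycles, capped at 2048 MiB.
print(remediate_oom(512, 2048, 4))  # -> [512, 614, 737, 885, 1062]
```

Geometric growth closes the gap quickly (roughly doubling every four cycles) without the overshoot a single large fixed bump would cause.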

JVM-aware scaling

Purpose-built handling for Java workloads, with heap and GC pressure awareness.

Cost visibility and ROI

Real-time waste quantification against node pricing data. Savings estimated per workload.

Fleet policies

ClusterAIScaleTemplate applies scaling policies across namespaces without per-workload config.

Get started

Quickstart

Install Thoras and see resource recommendations in minutes.

Integrations

How Thoras works alongside KEDA, Cluster Autoscaler, and Karpenter.