Thoras is a self-hosted, AI/ML-powered platform that replaces reactive Kubernetes autoscaling with predictive, autonomous resource management. It eliminates waste, prevents capacity outages, and safely optimizes critical infrastructure. Thoras is designed to earn your trust before running autonomously.
Documentation Index
Fetch the complete documentation index at: https://docs.thoras.ai/llms.txt
Use this file to discover all available pages before exploring further.
↓ 40%+ waste
Reduction in cloud compute waste from CPU and memory over-provisioning.
↑ Performance
OOM kill loops resolved automatically. Latency spikes prevented through
rightsized compute.
0 toil
Continuous right-sizing without manual tuning or on-call escalations.
Philosophy
Autonomy is earned, not assumed. Thoras doesn’t arrive in your cluster and start making decisions. It observes, learns, surfaces recommendations, and only acts autonomously once you’ve validated its reasoning. Your workload and infrastructure context are always respected.
Performance first, efficiency second. Thoras reclaims waste from under-utilized CPU and memory without compromising availability. It knows when not to scale down as clearly as it knows when to scale up.
How it works
Observe
Thoras ingests real-time telemetry from your existing metric sources
(Prometheus, the Kubernetes metrics server, and any custom or external
metrics you already use) and persists historical usage data inside your
cluster. It builds per-workload demand profiles that account for
seasonality, traffic patterns, and resource trends.
Forecast
AI/ML models continuously predict horizontal and vertical scaling needs
ahead of time. Forecasts update on configurable intervals (typically minutes
to hours), with model accuracy improving as patterns repeat.
Recommend
Before any autonomous action, Thoras surfaces predictions and right-sizing
recommendations in a dashboard. Teams validate model accuracy first.
Autonomous mode requires a minimum of 3 hours of historical data, and
benefits from at least 48 hours for workloads with daily seasonality.
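The observe → forecast → recommend loop can be sketched in a few lines. This is an illustrative toy, not Thoras's actual models: the function names, the hour-of-day bucketing, and the p95-plus-headroom rule are all assumptions chosen to show the shape of the pipeline.

```python
from collections import defaultdict
from statistics import quantiles

def build_profile(samples):
    """Observe: bucket (hour_of_day, cpu_millicores) usage samples into an
    hourly demand profile, capturing daily seasonality."""
    profile = defaultdict(list)
    for hour, usage in samples:
        profile[hour].append(usage)
    return profile

def recommend_request(profile, hour, headroom=1.15):
    """Forecast + recommend: suggest a CPU request for the given hour as the
    95th percentile of observed demand plus a headroom factor.
    (Both the percentile and the 15% headroom are illustrative choices.)"""
    p95 = quantiles(profile[hour], n=20, method="inclusive")[-1]
    return p95 * headroom
```

For example, a workload that used 100–400 millicores during the 9:00 hour would get a recommendation of roughly 443 millicores (p95 of 385, times 1.15 headroom).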
Capabilities
Predictive vertical rightsizing
Pre-emptive CPU and memory request optimization. Day-one support for K8s
1.33+ in-place pod resize, with rolling restart fallback.
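The in-place versus rolling-restart fallback reduces to a server version check. A minimal sketch, with a hypothetical function name and version-string handling (not Thoras code):

```python
def resize_strategy(server_version: str) -> str:
    """Choose in-place pod resize on Kubernetes 1.33+, otherwise fall back
    to a rolling restart. `server_version` is e.g. "1.33.2"."""
    major, minor = (int(part) for part in server_version.split(".")[:2])
    return "in-place" if (major, minor) >= (1, 33) else "rolling-restart"
```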
Predictive horizontal scaling
Replica counts adjusted ahead of demand spikes, not after. Integrates with
your existing HPA.
OOM remediation
Detects kill loops and stabilizes workloads with compounding memory
adjustments (1.2× per cycle) until the forecaster catches up.
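The compounding adjustment is simple arithmetic: each cycle multiplies the current limit by 1.2. A small sketch (function names are illustrative) shows how quickly a 512 MiB limit grows to cover a workload that actually needs 1 GiB:

```python
def memory_after_cycles(limit_mib: float, cycles: int, factor: float = 1.2) -> float:
    """Memory limit after N compounding adjustments of `factor` each."""
    return limit_mib * factor ** cycles

def cycles_to_cover(limit_mib: float, demand_mib: float, factor: float = 1.2) -> int:
    """Count adjustment cycles until the limit covers observed demand."""
    cycles = 0
    while limit_mib < demand_mib:
        limit_mib *= factor
        cycles += 1
    return cycles
```

Four cycles are enough: 512 × 1.2⁴ ≈ 1062 MiB, clearing the 1024 MiB demand.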
JVM-aware scaling
Purpose-built handling for Java workloads, with heap and GC pressure
awareness.
Cost visibility and ROI
Real-time waste quantification against node pricing data. Savings estimated
per workload.
Fleet policies
ClusterAIScaleTemplate applies scaling policies across namespaces without
per-workload config.
Get started
Quickstart
Install Thoras and see resource recommendations in minutes.
Integrations
How Thoras works alongside KEDA, Cluster Autoscaler, and Karpenter.

