What Thoras is, how it uses machine learning to predict Kubernetes workload demand, and how it autonomously scales infrastructure to optimize resource usage, improve reliability, and reduce operational overhead.
Thoras is a machine learning-driven system built to optimize Kubernetes workload
scaling by forecasting resource demand and adjusting infrastructure proactively.
Unlike reactive autoscalers such as the Horizontal Pod Autoscaler (HPA) or the
Vertical Pod Autoscaler (VPA), which make decisions based on real-time utilization,
Thoras predicts future usage trends and ensures infrastructure is allocated
before demand occurs.
At its core, Thoras continuously analyzes historical usage data, seasonality
patterns, and real-time telemetry to produce workload demand forecasts. These
forecasts are dynamically updated at configurable intervals, allowing Thoras to
anticipate scaling needs with high precision. By leveraging predictions, Thoras
adjusts replica counts ahead of time, preventing latency issues, improving
service reliability, and avoiding the operational risks associated with reactive
scaling.
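To make the idea of forecast-driven scaling concrete, here is a minimal sketch of one of the simplest possible demand forecasts, a seasonal-naive average. This is an illustration only, not Thoras's actual model; the function name, window sizes, and sample data are all hypothetical.

```python
from statistics import mean

def seasonal_naive_forecast(usage, period, horizon):
    """Forecast future demand by averaging the same phase of past cycles.

    usage:   historical utilization samples (e.g., CPU cores), oldest first
    period:  samples per seasonal cycle (e.g., 24 for hourly data with a
             daily pattern)
    horizon: number of future samples to forecast
    """
    forecast = []
    for step in range(horizon):
        # Which point in the cycle does this future sample land on?
        phase = (len(usage) + step) % period
        # Average every historical sample that fell on the same phase.
        forecast.append(mean(usage[phase::period]))
    return forecast

# Two identical "days" of hourly demand with an evening peak.
day = [2, 2, 2, 3, 5, 8, 8, 5]
forecast = seasonal_naive_forecast(day * 2, period=8, horizon=8)
# The forecast reproduces the daily pattern, so replicas can be
# raised before the peak arrives rather than after it is observed.
```

A production forecaster would account for trend, multiple seasonalities, and anomalies, but even this toy version shows how a recurring peak can be anticipated rather than reacted to.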
Thoras integrates with Kubernetes through a custom resource definition called
AIScaleTarget. This resource works in tandem with existing HPA
configurations. For every scaling decision, Kubernetes applies the higher of
the real-time HPA target and the predictive Thoras target. This ensures
workloads stay responsive to live traffic fluctuations while also benefiting
from proactive, forecast-driven scaling adjustments.
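The selection rule described above can be sketched in a few lines. This is a simplified illustration of the "higher of the two targets" logic, not Thoras's implementation; the function and parameter names are assumptions.

```python
def desired_replicas(hpa_target: int, thoras_target: int,
                     min_replicas: int, max_replicas: int) -> int:
    """Combine a reactive HPA target with a predictive target.

    The effective replica count is the higher of the two recommendations,
    clamped to the workload's configured bounds, so a live traffic spike
    and a forecasted peak can each raise capacity, but neither can pull
    it below what the other requires.
    """
    combined = max(hpa_target, thoras_target)
    return max(min_replicas, min(combined, max_replicas))

# The forecast expects a rush before live metrics reflect it:
replicas = desired_replicas(hpa_target=4, thoras_target=9,
                            min_replicas=2, max_replicas=20)
# replicas == 9
```

Taking the maximum rather than replacing the HPA outright is what keeps the workload safe when the forecast is wrong: an unpredicted spike still drives the reactive target up.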
One of Thoras’ primary benefits is the ability to increase infrastructure
utilization safely. Standard practices typically limit utilization to 50-60% to
allow for unplanned spikes. Thoras enables teams to confidently increase
utilization targets to 80-90%, drastically reducing overprovisioning without
risking service degradation or downtime.
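The savings from a higher utilization target follow directly from the arithmetic of headroom. The sketch below uses hypothetical numbers (a 40-core peak on 2-core replicas) purely to show the shape of the calculation.

```python
import math

def replicas_for_demand(peak_demand_cores: float,
                        cores_per_replica: float,
                        target_utilization: float) -> int:
    """Replicas needed to serve peak demand at a given utilization target."""
    usable_per_replica = cores_per_replica * target_utilization
    return math.ceil(peak_demand_cores / usable_per_replica)

# 40 cores of peak demand on 2-core replicas:
reactive = replicas_for_demand(40, 2, 0.55)    # headroom for surprises
predictive = replicas_for_demand(40, 2, 0.85)  # forecast absorbs surprises
# reactive == 37, predictive == 24
```

At a 55% target, surprises are absorbed by idle capacity; at 85%, they are absorbed by the forecast scaling ahead of them. In this example that difference is 13 replicas, roughly a third of the fleet, carrying the same peak.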
Thoras accounts for the complex relationships between distributed services.
Scaling decisions are made with awareness of how changes to one workload can
impact others. This system-level perspective ensures that scaling upstream or
downstream dependencies happens intelligently, maintaining stability across the
entire Kubernetes environment.
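One way to reason about dependency-aware scaling is as an ordering problem over the service call graph: capacity should exist downstream before callers grow. The sketch below is an illustration of that idea, not Thoras's mechanism; the service names and graph are invented.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency graph: each service maps to the services it
# calls. Growing the frontend implies its dependencies must be ready
# to absorb the extra load.
calls = {
    "frontend": ["checkout", "catalog"],
    "checkout": ["payments"],
    "catalog": [],
    "payments": [],
}

# Topological order puts leaf dependencies first, so each service is
# scaled only after everything it depends on has capacity.
scale_order = list(TopologicalSorter(calls).static_order())
```

Scaling in this order avoids the failure mode where a freshly enlarged frontend floods an undersized payments service, turning a capacity increase into an outage.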
Beyond scaling, Thoras also addresses observability costs. The system analyzes
telemetry ingested by monitoring platforms, identifying redundant or low-value
metrics. By decommissioning unnecessary metrics, Thoras helps teams reduce
observability costs while preserving the visibility required for effective
operations and incident management.
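As a rough illustration of what "low-value" can mean, the sketch below flags metrics whose values barely vary, one of the simplest signals that a series carries little information relative to its ingestion cost. The heuristic, threshold, and metric names are assumptions, not Thoras's actual analysis.

```python
from statistics import pstdev

def low_value_metrics(series: dict, min_stddev: float = 0.01) -> list:
    """Flag metric series whose values barely vary: near-constant series
    carry little signal and are candidates for decommissioning review."""
    return [name for name, values in series.items()
            if pstdev(values) < min_stddev]

samples = {
    "http_requests_total_rate": [120.0, 340.0, 95.0, 410.0],
    "build_info":               [1.0, 1.0, 1.0, 1.0],  # constant gauge
}
candidates = low_value_metrics(samples)
# candidates == ["build_info"]
```

A real analysis would also weigh cardinality, query frequency, and alert references before recommending removal; variance alone is just the cheapest first filter.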
Thoras is deployed via Helm chart and operates entirely within the customer’s
Kubernetes cluster. The architecture is stateless and designed for minimal
resource consumption. Thoras requires no external APIs or SaaS dependencies and
can run in air-gapped environments. Its machine learning models are optimized
for CPU-only infrastructure, avoiding the need for costly GPUs.
Thoras integrates seamlessly with popular observability tools such as Prometheus
and Datadog. The platform augments existing Kubernetes scaling mechanisms and
workflows, requiring minimal configuration to start delivering results. Thoras
is engineered to reduce manual scaling management, automate resource
optimization, and enhance operational efficiency.
Thoras was created by former Site Reliability Engineers who experienced
firsthand the limitations of Kubernetes’ native autoscaling tools. The system is
designed to solve practical, day-to-day scaling and optimization challenges
faced by platform engineering teams. Thoras equips engineers with predictive,
autonomous scaling that improves reliability, reduces waste, and eliminates the
need for constant manual tuning.