How Forecasting Works
Thoras continuously collects workload metrics and uses those timeseries metrics to generate forecasts. When a forecast is generated, Thoras predicts future resource usage within the forecast block. When a target is in autonomous mode and the delta between forecasted and current usage exceeds configured thresholds, the target is proactively scaled to the predicted maximum within the forecast block, plus a small buffer. Thoras generates forecasts for two scaling dimensions:

- Horizontal - Predicts pod replica count based on forecasted CPU, memory, and custom metrics. See Predictive Horizontal Pod Autoscaling for details.
- Vertical - Predicts container resource requests based on historical consumption patterns. See Vertical Pod Rightsizing for details.
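As an illustration of the scaling decision described above (the numbers are assumed, not defaults from the product): suppose the forecast block predicts a peak of 800m CPU for a workload currently using 500m, and that delta exceeds the configured threshold. With a 10% forecast buffer, Thoras would proactively scale the target to cover roughly 800m × 1.10 = 880m before the predicted peak arrives.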
Forecast Configuration
Forecasting behavior is configured in the `model` section of an AIScaleTarget.
See the AIScaleTarget Definition for
configuration options including:
- `forecast_blocks` - the time horizon for each forecast (i.e., how far into the future to look)
- `forecast_buffer_percentage` - safety margin on top of forecasted values
- `forecast_cron` - how frequently forecasts are generated
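A minimal sketch of how these options might appear in an AIScaleTarget's `model` section. The field nesting, units, and values below are illustrative assumptions only; consult the AIScaleTarget Definition for the authoritative schema and defaults.

```yaml
# Illustrative sketch: exact nesting, units, and values are assumptions.
model:
  forecast_blocks: 12               # time horizon for each forecast (assumed format)
  forecast_buffer_percentage: 10    # safety margin added on top of forecasted values
  forecast_cron: "*/30 * * * *"     # how frequently forecasts are generated (assumed schedule)
```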
Model Training
Models continuously adapt to a workload’s usage patterns, becoming better tuned over time.

Training Period
- Newly Enrolled AIScaleTarget: Models begin learning a workload’s usage patterns after collecting initial workload metrics. Initial forecasts are based on a small sample of usage data and improve as the sample grows. AIScaleTargets require 3 hours of metrics before becoming eligible for auto-scaling.
- Minimum: Allow at least 24 hours in `recommendation` mode before switching to `autonomous` mode.
- Optimal: 2 weeks of data provides the best accuracy, especially for workloads with weekly patterns.
Data Requirements
For accurate forecasts, Thoras needs:

- Consistent metric collection from workloads
- Representative traffic patterns (including peak and off-peak periods)
Container Startup Resource Spikes
Software often experiences temporary spikes in CPU and memory usage during initialization. These spikes can significantly exceed the container’s steady-state resource consumption.

How Forecasting Handles This
Newly started containers are automatically detected, and their metrics are excluded from forecasts by default, giving each container time to reach steady-state operation. The forecaster then trains on adjusted metrics that represent your workload’s actual operational resource usage, resulting in:

- More accurate resource predictions and reduced thrashing
- Right-sized resource allocations
- Reduced infrastructure costs
Viewing Startup Spikes
In the dashboard, you can toggle “Show restarts” on the Actual vs Predicted Usage chart to see the difference between:

- Adjusted usage (default) - Resource usage with startup spikes filtered out. This adjusted data is what the forecaster uses to make predictions.
- Actual usage - Raw resource usage including all startup spikes.
Opting Out Of Excluding Startup Metrics From Forecasts
You may opt out of excluding metrics from newly started containers. Note that opting out may increase the volatility of both usage and forecasts, especially for low-usage workloads. To include startup metrics in forecast model training, set `ignoreNewPods` to `false` in your Helm values.
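A minimal sketch of what that might look like, assuming `ignoreNewPods` sits at the top level of the chart's values (the exact key path may differ; confirm against your chart's values.yaml):

```yaml
# Sketch only: the placement of ignoreNewPods in the values hierarchy is an assumption.
# Setting it to false includes metrics from newly started containers in model training.
ignoreNewPods: false
```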