Overview
Action Metrics
See Thoras in action. These metrics provide insight into how Thoras actively manages your application resources across your environment. They track when Thoras takes autonomous action to scale horizontally or vertically, when replicas or resource allocations differ from Thoras’ desired values, and how each AIScaleTarget is currently operating. Together, these metrics give teams the visibility they need to understand scaling behavior, make informed decisions, and maintain efficient, well-performing applications.
| Metric Name | Type | Description | Labels |
|---|---|---|---|
| thoras_horizontal_scale_total | Counter | Counts horizontal scaling actions in autonomous mode | ai_scale_target, scale_metric, namespace |
| thoras_vertical_scale_total | Counter | Counts vertical scaling adjustments in autonomous mode | ai_scale_target, namespace |
| thoras_recommendation | Gauge | Current scaling recommendation per AIScaleTarget | ai_scale_target, resource, namespace, container, unit |
| thoras_scale_targets | Gauge | The count of AIScaleTargets in various operational modes (autonomous or recommendation) | vertical, horizontal |
| thoras_provisioning_ratio | Gauge | Ratio of current resource allocation to forecasted/recommended value. Values > 1.0 indicate over-provisioning, values < 1.0 indicate under-provisioning. | ai_scale_target, namespace, scaling_mode, resource, container, mode |
| thoras_provisioning_delta | Gauge | Absolute difference between current and recommended/forecasted values. Units depend on resource type (bytes for memory, cores for CPU). | ai_scale_target, namespace, scaling_mode, resource, container, mode |
System Health Metrics
Observe the state of Thoras’ system health. These metrics offer a window into the health and performance of the Thoras platform. They can also be integrated into popular observability tools such as Datadog and Grafana, giving your team the flexibility to monitor Thoras wherever you already track system performance.

| Metric Name | Type | Description | Labels |
|---|---|---|---|
| thoras_api_http_response_total | Counter | Tracks total HTTP responses from internal API; spikes may indicate issues | path, method, code |
| thoras_api_http_request_duration_seconds | Gauge | Captures average internal API response time; highlights potential slowdowns | path, method |
| thoras_forecast_queue_pending_count | Gauge | Number of pending forecast jobs currently in the queue | |
| thoras_forecast_queue_pending_oldest_duration_seconds | Gauge | Duration in seconds that the oldest forecast job has been pending in the queue | |
| thoras_forecast_queue_status_count | Counter | Number of forecast jobs by status | status |
| thoras_job_completions_total | Counter | Total number of completed jobs by status and kind | status, kind |
| thoras_job_duration_seconds | Gauge | Duration of jobs in seconds by status and kind | status, kind |
Advanced Usage
Provisioning Ratio
Metric: thoras_provisioning_ratio
The provisioning ratio compares your current resource allocation to Thoras’
forecasted or recommended values.
Interpretation:
- ratio > 1.0: Current usage exceeds recommendation
- ratio < 1.0: Recommendations exceed current usage
Note: Recommendations are based on the predicted maximum usage over the forecast
window. The utilization ratio decreases when preemptive up-scaling occurs.
Formula:
- Vertical scaling: current_avg_request / recommended_request
- Horizontal scaling: current_total_usage / forecasted_value
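To make the ratio concrete, here is a minimal Python sketch of the vertical-scaling calculation. The function names and the sample values are hypothetical and are not part of Thoras; the over/under-provisioning labels follow the description of thoras_provisioning_ratio in the metrics table.

```python
def provisioning_ratio(current: float, recommended: float) -> float:
    """Ratio of the current value to the recommended or forecasted value."""
    if recommended <= 0:
        raise ValueError("recommended value must be positive")
    return current / recommended


def interpret(ratio: float) -> str:
    """Label a ratio relative to the 1.0 threshold."""
    if ratio > 1.0:
        return "over-provisioned"
    if ratio < 1.0:
        return "under-provisioned"
    return "matched"


# Hypothetical example: 2.0 CPU cores requested, 1.6 cores recommended.
ratio = provisioning_ratio(2.0, 1.6)
print(f"{ratio:.2f} {interpret(ratio)}")  # prints "1.25 over-provisioned"
```

A ratio is unitless, so the same comparison works for CPU and memory without any unit conversion.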
Provisioning Delta
Metric: thoras_provisioning_delta
The provisioning delta shows the absolute difference between your current
allocation and Thoras recommendations.
Units:
- Memory: bytes
- CPU: cores
Formula: abs(current_value - recommended_value)
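The delta calculation can be sketched in Python as follows. The helper name and the memory figures are illustrative assumptions; only the abs() formula and the byte unit come from the text above.

```python
MIB = 1024 ** 2  # bytes per mebibyte, used only to make the output readable


def provisioning_delta(current: float, recommended: float) -> float:
    """Absolute difference between the current and recommended values."""
    return abs(current - recommended)


# Hypothetical example: 512 MiB currently requested, 384 MiB recommended.
delta_bytes = provisioning_delta(512 * MIB, 384 * MIB)
print(delta_bytes / MIB)  # prints 128.0
```

Because the delta is an absolute value, it does not say whether the workload is above or below the recommendation. Reading it alongside thoras_provisioning_ratio gives the direction.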
Horizontal Scale Total
Metric: thoras_horizontal_scale_total
Tracks the total number of horizontal scaling actions Thoras has performed in
autonomous mode. Each increment represents a scaling event where Thoras adjusted
replica counts.
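Because this metric is a counter, the raw value matters less than how much it grew over a time window. The Python sketch below shows one common way to derive that increase from a series of scraped values, treating any drop as a counter reset (for example, after a restart). The function name and sample values are hypothetical, and the reset handling is a general convention for counters, not something this page specifies.

```python
def counter_increase(samples: list[float]) -> float:
    """Total increase across consecutive samples, tolerating counter resets."""
    total = 0.0
    for prev, curr in zip(samples, samples[1:]):
        # A drop means the counter restarted from zero, so the new value
        # itself is the increase since the reset.
        total += curr - prev if curr >= prev else curr
    return total


# Hypothetical scrapes of the counter, with a reset between 5 and 1.
print(counter_increase([3, 5, 5, 1, 4]))  # prints 6.0
```

The same approach applies to any other counter on this page, such as thoras_vertical_scale_total.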
Vertical Scale Total
Metric: thoras_vertical_scale_total
Tracks the total number of vertical scaling actions Thoras has performed in
autonomous mode. Each increment represents a scaling event where Thoras adjusted
resource requests or limits.
Recommendation
Metric: thoras_recommendation
Shows Thoras’ current scaling recommendation for each AIScaleTarget. The value
and unit depend on the resource type being recommended.
Units:
- Memory: bytes
- CPU: cores
- Replicas: count
Scale Targets
Metric: thoras_scale_targets
Reports the count of AIScaleTargets currently managed by Thoras, broken down
by operational mode (autonomous or recommendation) and scaling dimension
(vertical or horizontal).
API HTTP Response Total
Metric: thoras_api_http_response_total
Tracks the total number of HTTP responses from Thoras’ internal API. This
counter increments for every API response and can help identify unusual traffic
patterns or issues.
API HTTP Request Duration
Metric: thoras_api_http_request_duration_seconds
Measures the average response time for Thoras’ internal API requests in seconds.
Rising values may indicate performance degradation or capacity issues.
Forecast Queue Pending Count
Metric: thoras_forecast_queue_pending_count
Shows the current number of forecast jobs waiting in the queue. High values
indicate that forecast requests are backing up, which may signal worker capacity
issues or processing bottlenecks.
Forecast Queue Pending Oldest
Metric: thoras_forecast_queue_pending_oldest_duration_seconds
Tracks how long the oldest forecast job has been waiting in the queue. Rising
values indicate jobs are not being processed quickly enough.
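The two queue gauges, thoras_forecast_queue_pending_count and thoras_forecast_queue_pending_oldest_duration_seconds, are often most useful when read together. The Python sketch below shows one possible check. The function name and both default thresholds are assumptions chosen for illustration, not values recommended by Thoras; tune them to your own workload.

```python
def queue_is_backed_up(
    pending_count: float,
    oldest_pending_seconds: float,
    max_pending: float = 50,        # hypothetical threshold
    max_age_seconds: float = 300,   # hypothetical threshold
) -> bool:
    """True when the queue depth or the oldest wait exceeds its threshold."""
    return pending_count > max_pending or oldest_pending_seconds > max_age_seconds


print(queue_is_backed_up(pending_count=12, oldest_pending_seconds=45))   # prints False
print(queue_is_backed_up(pending_count=12, oldest_pending_seconds=900))  # prints True
```

Checking the oldest wait as well as the depth catches a short queue in which a single job is stuck.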
Forecast Queue Status Count
Metric: thoras_forecast_queue_status_count
Tracks the total number of forecast jobs processed, categorized by status. This
counter helps monitor forecast job lifecycle and identify issues with job
processing.
Job Completions Total
Metric: thoras_job_completions_total
Tracks the total number of completed jobs, categorized by status (success,
failure, etc.) and job kind. This counter helps identify job failure rates and
patterns across different job types.
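As a rough illustration of how a failure rate can be derived from this counter, the Python sketch below groups completion counts by job kind. The function name, the kind values ("forecast", "reconcile"), and the counts are hypothetical. The status values "success" and "failure" are the ones named above, and any other status is counted toward the total only.

```python
from collections import defaultdict


def failure_rate_by_kind(
    completions: dict[tuple[str, str], float],
) -> dict[str, float]:
    """Map each job kind to failures / all completions for that kind."""
    totals: dict[str, float] = defaultdict(float)
    failures: dict[str, float] = defaultdict(float)
    for (status, kind), count in completions.items():
        totals[kind] += count
        if status == "failure":
            failures[kind] += count
    # Skip kinds with no completions to avoid dividing by zero.
    return {kind: failures[kind] / total for kind, total in totals.items() if total > 0}


# Hypothetical counter values keyed by (status, kind).
completions = {
    ("success", "forecast"): 95.0,
    ("failure", "forecast"): 5.0,
    ("success", "reconcile"): 40.0,
}
print(failure_rate_by_kind(completions))  # prints {'forecast': 0.05, 'reconcile': 0.0}
```

In practice the inputs would be increases over a time window rather than lifetime totals, so that old failures do not dominate the rate.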
Job Duration Seconds
Metric: thoras_job_duration_seconds
Measures the duration of completed jobs in seconds, broken down by status and
job kind. Use this to identify slow jobs or performance regressions.