Health & Metrics

Monitor agent health with HTTP health endpoints and agent performance with Prometheus metrics.


Health Endpoints

Each agent exposes HTTP endpoints for liveness and readiness checks. They are compatible with Kubernetes liveness and readiness probes.

```text
# Liveness probe - agent is running
GET /health/live
# Response: {"status": "ok"}

# Readiness probe - agent can serve requests
GET /health/ready
# Response: {"status": "ready", "metadata": "connected", "storage": "connected"}

# Detailed health with metrics
GET /api/health
# Response: {"status": "healthy", "version": "0.1.0", "uptime_seconds": 3600}
```
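
Outside Kubernetes (for example, in a deploy script), you can interpret the readiness response yourself. The following Python sketch assumes the agent listens on localhost:8080 (a hypothetical address), that a ready agent answers HTTP 200 with "status": "ready", and that every other field in the response names a dependency that must report "connected". The helper names is_ready and check_agent are illustrative, not part of StreamHouse.

```python
import json
import urllib.request

# Hypothetical agent address; substitute your agent's actual host and port.
AGENT_URL = "http://localhost:8080"


def is_ready(status_code: int, body: str) -> bool:
    """Interpret a /health/ready response.

    Assumption: a ready agent returns HTTP 200 with "status": "ready", and
    every other field is a dependency whose value must be "connected".
    """
    if status_code != 200:
        return False
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    if not isinstance(payload, dict) or payload.get("status") != "ready":
        return False
    dependencies = {k: v for k, v in payload.items() if k != "status"}
    return all(state == "connected" for state in dependencies.values())


def check_agent(base_url: str = AGENT_URL, timeout: float = 2.0) -> bool:
    """Fetch /health/ready and report whether the agent can serve requests."""
    try:
        with urllib.request.urlopen(f"{base_url}/health/ready", timeout=timeout) as resp:
            return is_ready(resp.status, resp.read().decode("utf-8"))
    except OSError:
        # urlopen raises HTTPError (a URLError, itself an OSError) for non-2xx
        # responses, and URLError/OSError for refused connections or timeouts.
        return False
```

Because check_agent() returns False on connection errors, timeouts, and non-2xx responses alike, a script can simply poll it until the agent reports ready.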

Prometheus Metrics

Each agent exports Prometheus metrics at the /metrics endpoint, covering request counts, latencies, S3 transfer volumes, cache hit ratios, and connection and write-buffer usage.

```text
# Key metrics to monitor
streamhouse_produce_requests_total          # Total produce requests
streamhouse_produce_latency_seconds         # Produce latency histogram
streamhouse_consume_requests_total          # Total consume requests
streamhouse_consume_latency_seconds         # Consume latency histogram
streamhouse_s3_upload_bytes_total           # Total bytes uploaded to S3
streamhouse_s3_download_bytes_total         # Total bytes downloaded from S3
streamhouse_segment_cache_hit_ratio         # Cache hit rate (aim for >90%)
streamhouse_metadata_cache_hit_ratio        # Metadata cache hit rate
streamhouse_active_connections              # Current active connections
streamhouse_writer_buffer_bytes             # Current write buffer usage
```
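
To spot-check these metrics without a Prometheus server, you can scrape /metrics directly. This minimal Python sketch parses the Prometheus text exposition format, computes a per-second rate from two samples of a counter such as streamhouse_produce_requests_total, and compares the segment cache hit ratio with the 90% target. It assumes streamhouse_segment_cache_hit_ratio is exported without labels as a fraction between 0 and 1; the base URL and all function names are hypothetical.

```python
import math
import urllib.request

# Hypothetical agent address; substitute your agent's actual host and port.
BASE_URL = "http://localhost:8080"
# From the metrics list: aim for a segment cache hit ratio above 90%.
CACHE_TARGET = 0.90


def parse_exposition(text: str) -> dict[str, float]:
    """Parse Prometheus text exposition output into {series: value}.

    Minimal sketch: skips blank lines and # HELP / # TYPE comments, keeps the
    label set (verbatim, not unescaped) in the key, and ignores timestamps.
    """
    samples: dict[str, float] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if "{" in line:
            # Split right after the closing brace of the label set.
            series, rest = line.rsplit("}", 1)
            series += "}"
        else:
            series, _, rest = line.partition(" ")
        samples[series] = float(rest.split()[0])  # float() accepts NaN and +Inf
    return samples


def scrape(base_url: str = BASE_URL, timeout: float = 2.0) -> dict[str, float]:
    """Fetch and parse an agent's /metrics endpoint."""
    with urllib.request.urlopen(f"{base_url}/metrics", timeout=timeout) as resp:
        return parse_exposition(resp.read().decode("utf-8"))


def counter_rate(prev: float, curr: float, seconds: float) -> float:
    """Per-second increase of a counter between two scrapes.

    A drop in value means the counter was reset (e.g. the agent restarted),
    so the current value is taken as the increase since the reset.
    """
    if seconds <= 0:
        raise ValueError("interval must be positive")
    increase = curr - prev if curr >= prev else curr
    return increase / seconds


def cache_hit_ok(samples: dict[str, float]) -> bool:
    """True if the (assumed unlabeled, 0-1) segment cache hit ratio beats the target."""
    ratio = samples.get("streamhouse_segment_cache_hit_ratio")
    return ratio is not None and not math.isnan(ratio) and ratio > CACHE_TARGET
```

For example, call scrape() twice 15 seconds apart and pass the two streamhouse_produce_requests_total values to counter_rate() to get produce requests per second. In production, PromQL's rate() does this job and also extrapolates across the query window; the helper above only mirrors its counter-reset handling.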

Grafana Dashboard

StreamHouse includes pre-built Grafana dashboards. Import the dashboard JSON from the repository, or use the web console's built-in monitoring page instead. The dashboards cover four areas:

Overview: Request rate, error rate, latency P50/P95/P99
Producers: Produce throughput, batch sizes, flush rates
Consumers: Consumer lag, fetch latency, partition assignments
Storage: S3 upload/download rates, segment sizes, cache hit ratios
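
Latency percentiles such as P50/P95/P99 are typically estimated from histogram buckets. Assuming streamhouse_produce_latency_seconds is a standard Prometheus histogram (exposed as _bucket series with an le label, plus _sum and _count), a query such as histogram_quantile(0.99, sum by (le) (rate(streamhouse_produce_latency_seconds_bucket[5m]))) yields the P99. The Python sketch below reproduces the core of that estimate, linear interpolation inside the bucket that contains the target rank, so you can sanity-check a panel against a raw scrape. It is an illustration only; it does not claim to be how the bundled dashboards are built.

```python
import math


def histogram_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate the q-quantile from cumulative histogram buckets.

    buckets: (upper_bound, cumulative_count) pairs, i.e. the le label and value
    of each *_bucket series, including the +Inf bucket. Interpolates linearly
    within the matching bucket, like PromQL's histogram_quantile(). The lower
    edge of the first bucket is taken as 0, which suits latency histograms.
    """
    if not 0.0 < q < 1.0:
        raise ValueError("q must be strictly between 0 and 1")
    buckets = sorted(buckets)
    if not buckets or not math.isinf(buckets[-1][0]):
        raise ValueError("a +Inf bucket is required")
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations yet
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Rank falls in the open-ended bucket: report the highest finite bound.
                return prev_bound
            # prev_count < rank <= count, so the denominator is positive.
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / (count - prev_count)
        prev_bound, prev_count = bound, count
    raise AssertionError("unreachable: the +Inf bucket always satisfies count >= rank")
```

With the buckets [(0.005, 10), (0.01, 50), (0.025, 90), (0.05, 99), (0.1, 100), (math.inf, 100)], the estimated P50 is 0.01 s and the P99 is 0.05 s. Accuracy depends on the bucket boundaries: the estimate can never be finer than the bucket that contains the rank.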
