Monitoring
Set up monitoring for your StreamHouse deployment.
Monitoring Overview
Monitoring is critical for running StreamHouse in production. StreamHouse exposes comprehensive Prometheus metrics, structured logs, and health endpoints. The web console provides built-in dashboards for real-time monitoring.
Key Metrics to Monitor
Focus on these metrics for a healthy StreamHouse deployment.
- Produce/consume latency P99: Should be under 200ms for produce and under 100ms for consume.
- Consumer lag: The difference between the latest offset and the consumer's committed offset. Rising lag indicates consumers can't keep up.
- S3 error rate: Should be near zero. Spikes indicate S3 throttling or network issues.
- Segment cache hit ratio: Aim for >90%. Low hit rates mean more S3 reads and higher latency.
- Agent CPU/memory: Scale out when agents consistently exceed 70% CPU utilization.
- Metadata query latency: PostgreSQL query time. Should be under 10ms P99.
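As a rough illustration of how the thresholds above could be evaluated together, the following Python sketch checks one snapshot of readings. The `Snapshot` class, its field names, and the `lag_is_rising` heuristic are hypothetical and not part of StreamHouse; only the threshold values and the lag definition come from the list above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Snapshot:
    """Hypothetical point-in-time readings; field names are illustrative only."""
    produce_p99_ms: float
    consume_p99_ms: float
    latest_offset: int
    committed_offset: int
    cache_hits: int
    cache_misses: int
    agent_cpu_percent: float
    metadata_p99_ms: float


def consumer_lag(latest_offset: int, committed_offset: int) -> int:
    """Lag is the latest offset minus the consumer's committed offset."""
    return max(latest_offset - committed_offset, 0)


def cache_hit_ratio(hits: int, misses: int) -> float:
    """Fraction of segment reads served from cache (1.0 when there were no reads)."""
    total = hits + misses
    return hits / total if total else 1.0


def lag_is_rising(samples: List[int]) -> bool:
    """Assumed heuristic: lag counts as rising if every sample exceeds the previous one."""
    return len(samples) >= 2 and all(b > a for a, b in zip(samples, samples[1:]))


def check(s: Snapshot) -> List[str]:
    """Return one warning per threshold from the list above that the snapshot misses."""
    warnings = []
    if s.produce_p99_ms >= 200:
        warnings.append("produce P99 latency is not under 200ms")
    if s.consume_p99_ms >= 100:
        warnings.append("consume P99 latency is not under 100ms")
    if cache_hit_ratio(s.cache_hits, s.cache_misses) <= 0.90:
        warnings.append("segment cache hit ratio is not above 90%")
    if s.agent_cpu_percent > 70:
        warnings.append("agent CPU exceeds 70%")
    if s.metadata_p99_ms >= 10:
        warnings.append("metadata query P99 is not under 10ms")
    return warnings


if __name__ == "__main__":
    snap = Snapshot(250, 80, 5000, 4200, 850, 150, 65, 4)
    print("lag:", consumer_lag(snap.latest_offset, snap.committed_offset))
    for warning in check(snap):
        print("WARN:", warning)
```

In practice these checks would more likely live in Prometheus alerting rules; the sketch is only meant to make the arithmetic behind each threshold explicit.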
Prometheus Setup
Configure Prometheus to scrape StreamHouse agents.
yaml
# prometheus.yml
scrape_configs:
  - job_name: 'streamhouse'
    scrape_interval: 15s
    static_configs:
      - targets:
          - 'agent-1:8080'
          - 'agent-2:8080'
          - 'agent-3:8080'
    metrics_path: '/metrics'
Structured Logging
StreamHouse outputs structured JSON logs that can be collected by any log aggregation system. Log levels can be configured per-module for targeted debugging.
bash
# Set log level via environment variable
export RUST_LOG=streamhouse=info,streamhouse::storage=debug
# Example log output
{"timestamp":"2026-01-15T10:30:00Z","level":"INFO","module":"streamhouse::agent","message":"Segment flushed","topic":"events","partition":0,"offset_range":"1000-1999","size_bytes":67108864,"duration_ms":450}
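Because each log line is a standalone JSON object, it can be processed with any JSON parser. The Python sketch below derives flush throughput from the example line above. The function name is hypothetical, and it assumes that `size_bytes` and `duration_ms` appear on every "Segment flushed" record, as they do in the example.

```python
import json
from typing import Optional

# The example log line shown above, copied verbatim.
LOG_LINE = (
    '{"timestamp":"2026-01-15T10:30:00Z","level":"INFO",'
    '"module":"streamhouse::agent","message":"Segment flushed",'
    '"topic":"events","partition":0,"offset_range":"1000-1999",'
    '"size_bytes":67108864,"duration_ms":450}'
)


def flush_throughput_mib_s(line: str) -> Optional[float]:
    """Return flush throughput in MiB/s, or None if the line is not a flush record."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return None  # not JSON, e.g. a stray plain-text line in the stream
    if not isinstance(record, dict) or record.get("message") != "Segment flushed":
        return None
    seconds = record["duration_ms"] / 1000
    if seconds <= 0:
        return None
    return record["size_bytes"] / (1024 * 1024) / seconds


if __name__ == "__main__":
    print(f"{flush_throughput_mib_s(LOG_LINE):.1f} MiB/s")  # prints "142.2 MiB/s"
```

For the example record, a 64 MiB segment flushed in 450 ms works out to roughly 142 MiB/s.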