Monitoring

Set up monitoring for your StreamHouse deployment.


Monitoring Overview

Monitoring is critical for running StreamHouse in production. StreamHouse exposes Prometheus metrics, structured JSON logs, and health endpoints, and the web console provides built-in dashboards for real-time monitoring.

Key Metrics to Monitor

Focus on these metrics for a healthy StreamHouse deployment.

  • Produce/consume latency (P99): Should be under 200ms for produce and under 100ms for consume.
  • Consumer lag: The difference between the latest offset and the consumer's committed offset. Rising lag indicates consumers can't keep up.
  • S3 error rate: Should be near zero. Spikes indicate S3 throttling or network issues.
  • Segment cache hit ratio: Aim for >90%. Low hit rates mean more S3 reads and higher latency.
  • Agent CPU/memory: Scale out when agents consistently exceed 70% CPU utilization.
  • Metadata query latency: Time spent on PostgreSQL metadata queries. P99 should be under 10ms.
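
The consumer lag and cache hit ratio definitions above can be sketched in a few lines of Python. This is a minimal illustration, not part of StreamHouse: the `PartitionOffsets` type and the function names are hypothetical, and treating an idle cache as healthy is an assumption made here for simplicity.

```python
from dataclasses import dataclass


@dataclass
class PartitionOffsets:
    """Hypothetical container for one partition's offsets."""
    latest_offset: int     # newest offset written to the partition
    committed_offset: int  # last offset the consumer committed


def consumer_lag(p: PartitionOffsets) -> int:
    """Lag is the latest offset minus the committed offset, floored at zero."""
    return max(p.latest_offset - p.committed_offset, 0)


def is_lag_rising(samples: list[int]) -> bool:
    """True when each lag sample is strictly greater than the previous one."""
    return len(samples) >= 2 and all(b > a for a, b in zip(samples, samples[1:]))


def cache_hit_ratio(hits: int, misses: int) -> float:
    """Fraction of segment reads served from cache.

    Assumption: with no reads at all, report 1.0 so an idle agent
    does not look unhealthy.
    """
    total = hits + misses
    return hits / total if total else 1.0
```

A lag series that keeps growing across consecutive scrapes, rather than a single high reading, is the signal that consumers cannot keep up, which is why `is_lag_rising` looks at a sequence of samples.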

Prometheus Setup

Configure Prometheus to scrape StreamHouse agents.

yaml
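# prometheus.yml
scrape_configs:
  - job_name: 'streamhouse'
    scrape_interval: 15s
    # '/metrics' is also the Prometheus default; stated here for clarity
    metrics_path: '/metrics'
    static_configs:
      # One entry per StreamHouse agent (host:port)
      - targets:
        - 'agent-1:8080'
        - 'agent-2:8080'
        - 'agent-3:8080'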
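
To sanity-check what an agent returns before pointing Prometheus at it, the text exposition format served at `/metrics` can be read with a small script. The sketch below is a simplified parser, assuming label values contain no `}` characters. The metric name `streamhouse_s3_errors_total` is hypothetical and used only for illustration; check the actual `/metrics` output for the real names.

```python
import re

# Matches sample lines such as: metric_name{label="x"} 12.5
# Simplification: label values containing "}" are not handled.
_SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{[^}]*\})?\s+(\S+)')


def parse_metrics(text: str) -> dict[str, float]:
    """Sum sample values per metric name, ignoring comments and labels."""
    totals: dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = _SAMPLE_RE.match(line)
        if match is None:
            continue
        name, _labels, value = match.groups()
        try:
            totals[name] = totals.get(name, 0.0) + float(value)
        except ValueError:
            continue  # skip values that are not numeric
    return totals


# Hypothetical scrape output; the metric name is an assumption.
SAMPLE = """\
# HELP streamhouse_s3_errors_total Hypothetical S3 error counter
# TYPE streamhouse_s3_errors_total counter
streamhouse_s3_errors_total{op="get"} 2
streamhouse_s3_errors_total{op="put"} 1
"""

totals = parse_metrics(SAMPLE)
```

In practice the text would come from an HTTP GET against one of the targets listed in `prometheus.yml`, for example `http://agent-1:8080/metrics`.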

Structured Logging

StreamHouse outputs structured JSON logs that can be collected by any log aggregation system. Log levels can be configured per-module for targeted debugging.

bash
# Set log level via environment variable
export RUST_LOG=streamhouse=info,streamhouse::storage=debug

# Example log output
{"timestamp":"2026-01-15T10:30:00Z","level":"INFO","module":"streamhouse::agent","message":"Segment flushed","topic":"events","partition":0,"offset_range":"1000-1999","size_bytes":67108864,"duration_ms":450}
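
Because each log line is a standalone JSON object, it can be processed with any JSON parser. As a rough sketch, the following Python reads the example line above and derives flush throughput from `size_bytes` and `duration_ms`. The function name is hypothetical, and it assumes the field names shown in the example are stable.

```python
import json
from typing import Optional

# The example log line from this page.
SAMPLE_LINE = (
    '{"timestamp":"2026-01-15T10:30:00Z","level":"INFO",'
    '"module":"streamhouse::agent","message":"Segment flushed",'
    '"topic":"events","partition":0,"offset_range":"1000-1999",'
    '"size_bytes":67108864,"duration_ms":450}'
)


def flush_throughput_mib_s(line: str) -> Optional[float]:
    """Return flush throughput in MiB/s, or None for other log lines."""
    record = json.loads(line)
    if record.get("message") != "Segment flushed":
        return None
    duration_ms = record.get("duration_ms", 0)
    if duration_ms <= 0:
        return None  # avoid dividing by zero on malformed records
    size_mib = record["size_bytes"] / (1024 * 1024)
    return size_mib / (duration_ms / 1000)


throughput = flush_throughput_mib_s(SAMPLE_LINE)
```

For the sample line, 67108864 bytes is 64 MiB flushed in 0.45 s, which works out to roughly 142 MiB/s.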