Introduction
Real-time analytics pipelines traditionally require multiple systems: Kafka for ingestion, Flink for processing, and a database for serving. StreamHouse consolidates these roles into a single platform.
Architecture Overview
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ Events │────▶│ StreamHouse │────▶│ Dashboard │
│ (API) │ │ SQL │ │ (Grafana) │
└─────────────┘ └─────────────┘ └─────────────┘
Step 1: Set Up Event Ingestion
Create a topic for raw events:
streamctl topics create raw_events --partitions 4
Produce events from your API:
// Express middleware: emit an event when the response finishes,
// so the measured latency covers the full request. (Express does not
// set a res.responseTime property; we measure it ourselves.)
app.use((req, res, next) => {
  const start = Date.now();
  res.on('finish', () => {
    producer.send('raw_events', {
      timestamp: start,
      path: req.path,
      method: req.method,
      user_id: req.user?.id,
      response_time_ms: Date.now() - start
    });
  });
  next();
});
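At high traffic, sending one event per request adds overhead. A minimal batching sketch that flushes on a size threshold or timeout; the batch-send function is an assumption here, not part of any documented StreamHouse API:

```javascript
// Buffer events and flush them in batches to cut per-request send overhead.
// `sendBatch` is any function that ships an array of events (hypothetical).
class EventBatcher {
  constructor(sendBatch, { maxSize = 100, maxWaitMs = 1000 } = {}) {
    this.sendBatch = sendBatch;
    this.maxSize = maxSize;
    this.maxWaitMs = maxWaitMs;
    this.buffer = [];
    this.timer = null;
  }

  add(event) {
    this.buffer.push(event);
    if (this.buffer.length >= this.maxSize) {
      this.flush(); // size threshold reached: ship immediately
    } else if (!this.timer) {
      this.timer = setTimeout(() => this.flush(), this.maxWaitMs);
    }
  }

  flush() {
    if (this.timer) { clearTimeout(this.timer); this.timer = null; }
    if (this.buffer.length === 0) return;
    const batch = this.buffer;
    this.buffer = [];
    this.sendBatch(batch);
  }
}
```

The middleware above would then call `batcher.add(event)` instead of `producer.send(...)` directly.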
Step 2: Create Aggregation Streams
Use SQL to compute real-time aggregates:
-- Requests per minute by endpoint
CREATE STREAM rpm_by_endpoint AS
SELECT
  path,
  COUNT(*) AS request_count,
  AVG(response_time_ms) AS avg_latency,
  TUMBLE_START(timestamp, INTERVAL '1' MINUTE) AS window_start
FROM raw_events
GROUP BY
  path,
  TUMBLE(timestamp, INTERVAL '1' MINUTE);
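Under the hood, a tumbling window simply buckets each event by truncating its timestamp to the nearest window boundary. A rough JavaScript sketch of the same per-minute aggregation (names are illustrative, not StreamHouse internals):

```javascript
const WINDOW_MS = 60 * 1000; // one-minute tumbling windows

// Truncate a timestamp to the start of its window, like TUMBLE_START.
const windowStart = (ts) => ts - (ts % WINDOW_MS);

// Group events by (path, window) and compute count + average latency.
function aggregate(events) {
  const buckets = new Map();
  for (const e of events) {
    const key = `${e.path}|${windowStart(e.timestamp)}`;
    const b = buckets.get(key) ?? {
      path: e.path,
      window_start: windowStart(e.timestamp),
      request_count: 0,
      latency_sum: 0,
    };
    b.request_count += 1;
    b.latency_sum += e.response_time_ms;
    buckets.set(key, b);
  }
  return [...buckets.values()].map((b) => ({
    path: b.path,
    window_start: b.window_start,
    request_count: b.request_count,
    avg_latency: b.latency_sum / b.request_count,
  }));
}
```

Because each event falls into exactly one bucket, tumbling windows never double-count, unlike hopping or sliding windows.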
Step 3: Connect to Grafana
StreamHouse exposes a Prometheus endpoint for metrics:
# prometheus.yml
scrape_configs:
  - job_name: 'streamhouse'
    static_configs:
      - targets: ['agent:8080']
Create Grafana dashboards to visualize your streams.
Step 4: Add Alerting
Configure alerts for anomalies:
CREATE STREAM high_latency_alerts AS
SELECT *
FROM rpm_by_endpoint
WHERE avg_latency > 500;
Connect alerts to PagerDuty or Slack via webhooks.
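The webhook side is plain HTTP. A minimal sketch that turns an alert row into a Slack incoming-webhook payload and posts it; the field names follow the stream above, and the webhook URL is whatever your Slack app issues:

```javascript
// Format a high-latency alert row as a Slack incoming-webhook payload.
function formatSlackAlert(alert) {
  return {
    text: `:warning: High latency on ${alert.path}: ` +
      `${alert.avg_latency.toFixed(0)} ms avg over ` +
      `${alert.request_count} requests ` +
      `(window ${new Date(alert.window_start).toISOString()})`,
  };
}

// Post the payload; fetch is built into Node 18+.
async function sendAlert(webhookUrl, alert) {
  await fetch(webhookUrl, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(formatSlackAlert(alert)),
  });
}
```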
Performance Considerations
For high-throughput analytics:
- Partition wisely: Use high-cardinality keys for even distribution
- Window size: Larger windows reduce output volume but increase latency
- Materialized views: Cache expensive aggregations
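To see why key choice matters: events are routed to partitions by hashing the key, so a high-cardinality key like user_id spreads load across all partitions, while a low-cardinality one like HTTP method funnels everything into a few. A toy sketch of key-based routing (the hash is illustrative, not StreamHouse's actual partitioner):

```javascript
// Simple FNV-1a string hash, used here only to illustrate key-based routing.
function fnv1a(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0; // multiply mod 2^32, keep unsigned
  }
  return h;
}

// An event's partition is its key hashed modulo the partition count.
const partitionFor = (key, partitions) => fnv1a(key) % partitions;

// Count how evenly a set of keys lands across partitions.
function distribution(keys, partitions) {
  const counts = new Array(partitions).fill(0);
  for (const k of keys) counts[partitionFor(k, partitions)]++;
  return counts;
}
```

With many distinct user IDs, the counts come out roughly even across the 4 partitions created in Step 1; keying by a handful of values would leave most partitions idle.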
Conclusion
StreamHouse's SQL processing eliminates the need for separate stream-processing infrastructure. Build real-time analytics pipelines with familiar SQL syntax and minimal operational overhead.