Back to blog
TutorialDecember 20, 20238 min read

Building a Real-Time Analytics Pipeline with StreamHouse

Learn how to build an end-to-end real-time analytics pipeline using StreamHouse SQL processing, from ingestion to dashboard.

Introduction

Real-time analytics pipelines traditionally require multiple systems: Kafka for ingestion, Flink for processing, and a database for serving. StreamHouse consolidates this into a single platform.

Architecture Overview

┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│   Events    │────▶│ StreamHouse │────▶│  Dashboard  │
│   (API)     │     │    SQL      │     │  (Grafana)  │
└─────────────┘     └─────────────┘     └─────────────┘

Step 1: Set Up Event Ingestion

Create a topic for raw events:

streamctl topics create raw_events --partitions 4

Produce events from your API:

// Express middleware
app.use((req, res, next) => {
  producer.send('raw_events', {
    timestamp: Date.now(),
    path: req.path,
    method: req.method,
    user_id: req.user?.id,
    response_time_ms: res.responseTime
  });
  next();
});

Step 2: Create Aggregation Streams

Use SQL to compute real-time aggregates:

-- Requests per minute by endpoint
CREATE STREAM rpm_by_endpoint AS
SELECT
  path,
  COUNT(*) as request_count,
  AVG(response_time_ms) as avg_latency,
  TUMBLE_START(timestamp, INTERVAL '1' MINUTE) as window_start
FROM raw_events
GROUP BY
  path,
  TUMBLE(timestamp, INTERVAL '1' MINUTE);

Step 3: Connect to Grafana

StreamHouse exposes a Prometheus endpoint for metrics:

# prometheus.yml
scrape_configs:
  - job_name: 'streamhouse'
    static_configs:
      - targets: ['agent:8080']

Create Grafana dashboards to visualize your streams.

Step 4: Add Alerting

Configure alerts for anomalies:

CREATE STREAM high_latency_alerts AS
SELECT *
FROM rpm_by_endpoint
WHERE avg_latency > 500;

Connect alerts to PagerDuty or Slack via webhooks.

Performance Considerations

For high-throughput analytics:

  1. Partition wisely: Use high-cardinality keys for even distribution
  2. Window size: Larger windows reduce output volume but increase latency
  3. Materialized views: Cache expensive aggregations

Conclusion

StreamHouse SQL processing eliminates the need for separate stream processing infrastructure. Build real-time analytics pipelines with familiar SQL syntax and minimal operational overhead.

Read the full tutorial →

Tags:analyticssqlgrafanatutorial

Ready to try StreamHouse?

Get started with S3-native event streaming in minutes.