Creating Streams

Define SQL streams over StreamHouse topics.


Creating a Stream

A stream is a SQL view over a StreamHouse topic. Streams define the schema of the data in a topic, enabling typed queries and validation.

```sql
-- Create a stream over an existing topic
CREATE STREAM user_events (
  user_id VARCHAR,
  event VARCHAR,
  page VARCHAR,
  timestamp TIMESTAMP,
  metadata JSON
) WITH (
  topic = 'user-events',
  format = 'json',
  timestamp_field = 'timestamp'
);

-- Query the stream
SELECT user_id, event, count(*)
FROM user_events
WHERE timestamp > NOW() - INTERVAL '1 hour'
GROUP BY user_id, event;
```
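
To make "typed queries and validation" more concrete, the following Python sketch shows what checking a JSON message against the user_events schema could look like. It is illustrative only: StreamHouse's actual validation rules are not described on this page. The SCHEMA mapping, the validate function, and the decision to accept ISO 8601 strings for the TIMESTAMP column are all assumptions made for this example.

```python
import json
from datetime import datetime


def _is_timestamp(value):
    """Assumption: TIMESTAMP values arrive as ISO 8601 strings."""
    if not isinstance(value, str):
        return False
    try:
        datetime.fromisoformat(value)
        return True
    except ValueError:
        return False


# Hypothetical mapping from the columns declared in CREATE STREAM to
# Python-side type checks. Not StreamHouse's real validation logic.
SCHEMA = {
    "user_id": lambda v: isinstance(v, str),
    "event": lambda v: isinstance(v, str),
    "page": lambda v: isinstance(v, str),
    "timestamp": _is_timestamp,
    "metadata": lambda v: isinstance(v, (dict, list)),
}


def validate(raw_message):
    """Return a list of problems; an empty list means the message conforms."""
    try:
        record = json.loads(raw_message)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(record, dict):
        return ["top-level value is not an object"]

    problems = []
    for column, check in SCHEMA.items():
        if column not in record:
            problems.append(f"missing column: {column}")
        elif not check(record[column]):
            problems.append(f"wrong type for column: {column}")
    return problems
```

A message that carries all five columns with matching types yields an empty list. A message with a numeric user_id or no metadata field yields one entry per problem, in schema order.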

Continuous Queries

Continuous queries run indefinitely, processing each new message as it arrives and writing results to an output topic.

```sql
-- Create a continuous query that counts events per minute
CREATE CONTINUOUS QUERY event_counts AS
  SELECT
    event,
    count(*) AS event_count,
    window_start,
    window_end
  FROM TUMBLE(user_events, timestamp, INTERVAL '1 minute')
  GROUP BY event, window_start, window_end
  OUTPUT TO 'event-counts-per-minute';
```
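
The tumbling-window grouping in this query can be sketched in Python to show how each message lands in exactly one fixed, non-overlapping window. This is a batch approximation under stated assumptions, not how StreamHouse executes the query: the tumble and count_events helpers are hypothetical, windows are assumed to align to the Unix epoch, and each window is assumed to include its start and exclude its end.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def tumble(ts, size):
    """Return (window_start, window_end) of the fixed window containing ts.

    Assumption: windows are aligned to the Unix epoch, start inclusive,
    end exclusive. ts must be a timezone-aware datetime.
    """
    offset = (ts - EPOCH) % size
    window_start = ts - offset
    return window_start, window_start + size


def count_events(events, size=timedelta(minutes=1)):
    """Count messages per (event, window_start, window_end).

    events is an iterable of (event_name, timestamp) pairs. The key
    mirrors the GROUP BY columns of the continuous query.
    """
    counts = Counter()
    for event, ts in events:
        window_start, window_end = tumble(ts, size)
        counts[(event, window_start, window_end)] += 1
    return counts
```

With one-minute windows, two click events at 12:00:05 and 12:00:50 share the window starting at 12:00:00, while a click at exactly 12:01:00 opens the next window. A real continuous query would emit these rows incrementally as messages arrive rather than over a finished list.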

Data Formats

Streams support several serialization formats, selected with the format option in the WITH clause.

  • JSON: Self-describing format, flexible schema. Best for development and mixed-type data.
  • Avro: Binary format with schema registry integration. Best for production with strict schemas.
  • Protobuf: Google's binary format. Best for gRPC-heavy environments.
  • CSV: Simple text format. Useful for log data and integration with legacy systems.
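
The difference between a self-describing format and a positional one can be illustrated with Python's standard library. The sketch below encodes the same record as JSON and as CSV. The to_json, to_csv, and from_csv helpers are hypothetical names for this example and say nothing about how StreamHouse encodes messages internally. Avro and Protobuf are omitted because they require third-party libraries and a schema definition.

```python
import csv
import io
import json

record = {"user_id": "u1", "event": "click", "page": "/home"}


def to_json(rec):
    """JSON repeats the field names in every message (self-describing)."""
    return json.dumps(rec)


def to_csv(rec, columns):
    """CSV carries only values; the column order must be agreed separately."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerow(rec[c] for c in columns)
    return buffer.getvalue()


def from_csv(line, columns):
    """Rebuild a record from one CSV line using externally known columns."""
    row = next(csv.reader(io.StringIO(line)))
    return dict(zip(columns, row))
```

A JSON message can be decoded without outside knowledge, whereas decoding the CSV line with the wrong column order silently assigns values to the wrong fields. That trade-off is one reason a declared stream schema matters more for compact or positional formats.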