Every Event Has a Home
When you produce a message to StreamHouse, it doesn't just vanish into the cloud. It follows a precise, deterministic path: from your producer, through an agent's memory buffer, into a compressed binary segment, and finally into an S3 object where it will live with 11 nines of durability.
This post walks through exactly what happens to your bytes at every step.
The Storage Hierarchy
StreamHouse organizes data in four levels:
Records → Blocks → Segments → Partitions
   │         │         │           │
Single     ~1MB      ~64MB      Ordered
event     batches   S3 files      log
Records are individual events — a key, value, timestamp, and optional headers. Blocks group ~1MB of records together for compression. Segments are the unit of storage on S3, typically 64MB containing many blocks. Partitions are the ordered sequence of segments that form a topic's log.
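In code, the four levels can be sketched with a few plain data types. These names and fields are illustrative, inferred from the description above, not StreamHouse's actual structs:

```python
from dataclasses import dataclass, field

@dataclass
class Record:
    """A single event: key, value, timestamp, optional headers."""
    key: bytes
    value: bytes
    timestamp_ms: int
    headers: dict[str, bytes] = field(default_factory=dict)

@dataclass
class Block:
    """~1MB of records, grouped for compression."""
    records: list[Record]

@dataclass
class Segment:
    """~64MB S3 object holding many blocks, covering an offset range."""
    blocks: list[Block]
    start_offset: int
    end_offset: int

@dataclass
class Partition:
    """A topic partition: the ordered sequence of segments forming the log."""
    segments: list[Segment]
```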
The Segment Binary Format
Every segment is a self-contained, immutable file with four sections:
┌─────────────────────────────────┐
│ Header (64 bytes) │
│ - Magic: 0x5348 ("SH") │
│ - Version: u16 │
│ - Flags: u32 │
│ - Compression: u8 │
│ - Created timestamp: i64 │
│ - Record count: u64 │
│ - Start/End offset: u64 │
├─────────────────────────────────┤
│ Block 0 │
│ - Compressed records (~1MB) │
│ - CRC32 checksum │
├─────────────────────────────────┤
│ Block 1 │
│ - Compressed records (~1MB) │
│ - CRC32 checksum │
├─────────────────────────────────┤
│ ...more blocks... │
├─────────────────────────────────┤
│ Sparse Index │
│ - Offset → byte position │
│ - One entry per block │
├─────────────────────────────────┤
│ Footer (16 bytes) │
│ - Index offset: u64 │
│ - File CRC32: u32 │
│ - Magic: 0x454E ("EN") │
└─────────────────────────────────┘
The header is read first to verify the file and understand its contents. The footer is read to locate the index. The index maps offsets to byte positions so consumers can jump directly to the block containing a target offset without scanning the entire file.
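The index lookup amounts to finding the last block whose first offset is at or below the target. A minimal sketch, assuming the sparse index has already been decoded into sorted (first offset, byte position) pairs:

```python
import bisect

def locate_block(index: list[tuple[int, int]], target_offset: int) -> int:
    """Return the byte position of the block containing target_offset.

    index holds one (first offset in block, byte position) entry per block,
    sorted by offset -- the sparse index described above.
    """
    # Last entry whose first offset is <= target_offset.
    i = bisect.bisect_right(index, (target_offset, float("inf"))) - 1
    if i < 0:
        raise KeyError(f"offset {target_offset} precedes this segment")
    return index[i][1]

# Three blocks starting at offsets 0, 1000, 2000:
index = [(0, 64), (1000, 1_048_640), (2000, 2_097_216)]
locate_block(index, 1500)  # -> 1048640, the block starting at offset 1000
```

One seek into the segment instead of a scan: the consumer fetches only the block at that byte position and decompresses it.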
Record Encoding
Inside each block, records use varint encoding and delta compression for maximum space efficiency:
┌─────────────────────────────────┐
│ Record │
│ - Offset delta (varint) │
│ - Timestamp delta (varint) │
│ - Key length (varint) │
│ - Key bytes │
│ - Value length (varint) │
│ - Value bytes │
│ - Header count (varint) │
│ - Header entries │
└─────────────────────────────────┘
Instead of storing absolute offsets and timestamps, we store deltas from the previous record. A sequence of offsets like [1000, 1001, 1002, 1003] becomes [1000, 1, 1, 1] — each delta of 1 encodes as a single-byte varint (only the initial 1000 needs two bytes).
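A minimal sketch of the delta-plus-varint scheme, using LEB128-style varints (7 payload bits per byte, high bit as the continuation flag); the exact wire encoding StreamHouse uses may differ:

```python
def encode_varint(n: int) -> bytes:
    """LEB128-style varint: 7 bits per byte, high bit = "more bytes follow"."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

def encode_offsets(offsets: list[int]) -> bytes:
    """Delta-encode a run of ascending offsets, then varint each delta."""
    out = bytearray()
    prev = 0
    for off in offsets:
        out += encode_varint(off - prev)
        prev = off
    return bytes(out)

encode_offsets([1000, 1001, 1002, 1003])
# 5 bytes total: two for the initial 1000, one for each delta of 1
```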
Why LZ4?
We chose LZ4 as the default compression algorithm after extensive benchmarking:
- JSON payloads: 4.3x compression ratio, 3.2 GB/s decompression
- Protobuf payloads: 1.4x ratio, 3.8 GB/s decompression
- Text logs: 8x ratio, 2.9 GB/s decompression
LZ4 decompresses at nearly memory bandwidth speed, which matters because consumers need to decompress segments on every read. We also support Zstd for workloads where storage cost matters more than latency — it achieves roughly 2x better compression at 5-10x slower decompression.
# Choose compression per topic
streamctl topic create --name logs --partitions 6 --compression lz4
streamctl topic create --name archive --partitions 3 --compression zstd
Why 64MB Segments?
The 64MB target isn't arbitrary. It balances three forces: S3 operation cost, metadata overhead, and read amplification.

Smaller segments mean:
- More S3 PUT operations = higher cost ($0.005 per 1,000 PUTs)
- More metadata entries in PostgreSQL
- Better read granularity for small range queries

Larger segments mean:
- Fewer S3 operations = lower cost
- Less metadata overhead
- Higher read amplification — consumers must download more unused data
At 64MB with LZ4, a typical segment holds 100K-500K records and costs ~$0.000005 to PUT. The segment flushes every 10 seconds or when the buffer hits 64MB, whichever comes first.
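The arithmetic is easy to check. A back-of-envelope sketch, assuming one PUT per full 64MB segment and a 30-day month:

```python
PUT_COST = 0.005 / 1000   # $ per S3 PUT ($0.005 per 1,000 PUTs)
SEGMENT_MB = 64

def monthly_put_cost(gb_per_day: float) -> float:
    """Monthly S3 PUT cost if every segment is flushed as a single 64MB PUT."""
    segments_per_day = gb_per_day * 1024 / SEGMENT_MB
    return segments_per_day * 30 * PUT_COST

monthly_put_cost(100)  # 100 GB/day -> 1,600 PUTs/day -> ~$0.24/month
```

Halve the segment size and the PUT bill doubles; at 64MB, request cost is noise next to storage cost.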
CRC32 at Every Level
Data integrity is non-negotiable. StreamHouse computes CRC32 checksums at three levels:
- Per-block: Each compressed block has a CRC32 of its compressed bytes. If a block fails validation during read, the agent retries the S3 fetch.
- Per-file: The footer contains a CRC32 of the entire segment. Corrupted segments are detected during any read operation.
- Per-record (WAL): The Write-Ahead Log checksums every individual record before it enters the buffer.
If any checksum fails, StreamHouse rejects the data and logs a corruption event rather than serving bad records.
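The per-block check might look like this, using Python's zlib.crc32 as a stand-in for the agent's CRC32 routine (names and error handling are simplified sketches, not the real implementation):

```python
import zlib

def validate_block(compressed: bytes, stored_crc: int) -> bytes:
    """Verify a block's CRC32 before decompressing.

    On mismatch, raise instead of returning data -- the real agent would
    log a corruption event and retry the S3 fetch rather than serve
    bad records.
    """
    if zlib.crc32(compressed) != stored_crc:
        raise ValueError("block CRC32 mismatch: refusing to serve bad records")
    return compressed

payload = b"compressed block bytes"
validate_block(payload, zlib.crc32(payload))  # passes, returns the bytes
```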
The Lifecycle of a Segment
1. Buffering: Records arrive via gRPC/HTTP and enter the agent's in-memory SegmentBuffer, organized by partition
2. Flushing: When the buffer reaches 64MB or 10 seconds elapse, the agent compresses blocks with LZ4, builds the index, and computes checksums
3. Upload: The segment is uploaded to S3 as a single PUT operation
4. Registration: The agent records the segment's S3 path, offset range, and size in the PostgreSQL metadata store
5. Sealing: The segment is now immutable — it will never be modified, only eventually deleted by retention policies
S3 Path Layout
Segments are organized in S3 with a predictable path structure:
s3://streamhouse-data/
  topics/
    user-events/
      partitions/
        0/
          segments/
            00000000-00000999.seg
            00001000-00001999.seg
        1/
          segments/
            00000000-00000499.seg
This structure enables efficient prefix listing when an agent needs to discover segments for a partition, and makes it easy to configure S3 lifecycle rules for cost optimization.
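Building a segment key from this layout is a one-liner; the 8-digit zero padding is inferred from the example filenames above:

```python
def segment_key(topic: str, partition: int, start: int, end: int) -> str:
    """S3 key for a segment covering [start, end], following the layout above.

    Zero-padding offsets to a fixed width makes lexicographic listing order
    match offset order, so a prefix listing returns segments in log order.
    """
    return (
        f"topics/{topic}/partitions/{partition}/"
        f"segments/{start:08d}-{end:08d}.seg"
    )

segment_key("user-events", 0, 0, 999)
# 'topics/user-events/partitions/0/segments/00000000-00000999.seg'
```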
What This Means for You
The segment format is entirely transparent to users — you never interact with it directly. But understanding it explains several StreamHouse behaviors:
- Why produce latency is 50-100ms: Records are buffered until a segment is ready to flush
- Why tail reads are fast: Recent data is still in the agent's memory buffer, no S3 fetch needed
- Why storage is cheap: LZ4 compression + S3 pricing = pennies per GB per month
- Why data is durable: CRC32 checksums + S3's 11 nines = you won't lose events
The segment format is open and documented. Build tools on top of it, inspect your data directly, or just rest easy knowing your events are stored with care.