Retention Policies

Configure how long StreamHouse retains data.

Data Retention

Retention policies control how long messages are kept in a topic before being deleted. StreamHouse supports time-based retention, size-based retention, or both. When a retention limit is reached, the oldest segments are deleted from S3 and their metadata is removed.

Time-Based Retention

Time-based retention deletes segments older than the specified duration. This is the most common retention strategy.

bash
# Set 30-day retention
streamctl topic create --name events --retention 30d

# Supported units: m (minutes), h (hours), d (days)
# Examples: 1h, 7d, 90d

# Set infinite retention (never delete)
streamctl topic create --name audit-log --retention infinite
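
To make the rule concrete, here is a minimal Python sketch of how a time-based check could work. It is not StreamHouse's implementation: `parse_retention`, `Segment`, `last_record_at`, and `expired_segments` are hypothetical names, and the sketch assumes a segment's age is measured from its newest record. Only the units (m, h, d) and the `infinite` value come from the text above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional

# Units from the list above; the mapping itself is illustrative.
_UNITS = {"m": "minutes", "h": "hours", "d": "days"}


def parse_retention(value: str) -> Optional[timedelta]:
    """Turn '30d', '1h', or 'infinite' into a timedelta (None means never delete)."""
    if value == "infinite":
        return None
    amount, unit = value[:-1], value[-1:]
    if unit not in _UNITS or not amount.isdigit():
        raise ValueError(f"unsupported retention value: {value!r}")
    return timedelta(**{_UNITS[unit]: int(amount)})


@dataclass
class Segment:
    name: str
    last_record_at: datetime  # assumption: age is measured from the newest record


def expired_segments(
    segments: List[Segment], retention: Optional[timedelta], now: datetime
) -> List[Segment]:
    """Return the segments whose newest record falls outside the retention window."""
    if retention is None:  # infinite retention: nothing ever expires
        return []
    cutoff = now - retention
    return [s for s in segments if s.last_record_at < cutoff]


now = datetime(2024, 6, 30, tzinfo=timezone.utc)
segments = [
    Segment("seg-0", datetime(2024, 5, 1, tzinfo=timezone.utc)),
    Segment("seg-1", datetime(2024, 6, 15, tzinfo=timezone.utc)),
]
expired = expired_segments(segments, parse_retention("30d"), now)
print([s.name for s in expired])  # prints ['seg-0']
```

With a 30-day window the cutoff is 31 May, so only `seg-0` qualifies for deletion. With `infinite`, the function returns an empty list regardless of age.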

Size-Based Retention

Size-based retention caps the total size of a topic. When the limit is exceeded, the oldest segments are deleted.

bash
# Cap topic at 100GB
streamctl topic create --name metrics --retention-bytes 100GB

# Combine time and size limits (segments are deleted as soon as either limit is reached)
streamctl topic create --name logs \
  --retention 7d \
  --retention-bytes 500GB
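
The interaction between the two limits can be sketched as follows. This is an illustration under assumptions, not the actual deletion logic: `Segment`, `segments_to_delete`, and the field names are hypothetical, segments are assumed to be listed oldest first, and sizes are plain integers in GB to keep the arithmetic readable.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    name: str
    size_gb: int
    age_days: int


def segments_to_delete(
    segments: List[Segment],
    max_age_days: Optional[int],
    max_size_gb: Optional[int],
) -> List[Segment]:
    """Walk segments oldest first and collect those that violate either limit.

    A limit of None means that limit is not configured.
    """
    doomed = []
    remaining_gb = sum(s.size_gb for s in segments)
    for seg in segments:
        too_old = max_age_days is not None and seg.age_days > max_age_days
        too_big = max_size_gb is not None and remaining_gb > max_size_gb
        if not (too_old or too_big):
            break  # newer segments are younger, and the topic is within its cap
        doomed.append(seg)
        remaining_gb -= seg.size_gb
    return doomed


segments = [
    Segment("seg-0", size_gb=60, age_days=10),
    Segment("seg-1", size_gb=60, age_days=5),
    Segment("seg-2", size_gb=60, age_days=1),
]
doomed = segments_to_delete(segments, max_age_days=7, max_size_gb=100)
print([s.name for s in doomed])  # prints ['seg-0', 'seg-1']
```

Here `seg-0` is removed because it is older than 7 days, and `seg-1` is removed because the topic would otherwise still hold 120 GB against a 100 GB cap. Either limit alone is enough to trigger deletion.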

Log Compaction

For topics that represent state (like a changelog), log compaction keeps only the latest value for each key. This is useful for maintaining a materialized view that can be rebuilt from the topic.

  • Compaction runs as a background process on the agent
  • Only the most recent record per key is retained
  • Tombstone records (null value) mark a key for deletion
  • A compacted topic can still set time-based retention, which applies to its tombstones
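
The compaction rules above can be sketched in a few lines of Python. This is a simplified in-memory model, not the agent's background process: the `Record` tuple layout, the `compact` function, and the `drop_tombstones` flag are all hypothetical, with the flag standing in for tombstones whose retention period has passed.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# (offset, key, value); a value of None is a tombstone.
Record = Tuple[int, str, Optional[bytes]]


def compact(records: Iterable[Record], drop_tombstones: bool = False) -> List[Record]:
    """Keep only the most recent record per key, preserving offset order."""
    latest: Dict[str, Record] = {}
    for record in records:  # assumed to arrive in offset order
        latest[record[1]] = record  # a later record overwrites an earlier one
    survivors = sorted(latest.values(), key=lambda r: r[0])
    if drop_tombstones:
        # Models tombstones being removed once their retention has expired.
        survivors = [r for r in survivors if r[2] is not None]
    return survivors


log = [
    (0, "user-1", b"alice"),
    (1, "user-2", b"bob"),
    (2, "user-1", b"alicia"),
    (3, "user-2", None),  # tombstone: user-2 is marked for deletion
]
print(compact(log))
# prints [(2, 'user-1', b'alicia'), (3, 'user-2', None)]
print(compact(log, drop_tombstones=True))
# prints [(2, 'user-1', b'alicia')]
```

Keeping the tombstone in the first pass matters: a consumer rebuilding state from the topic needs to see it to learn that `user-2` was deleted. Only after that can the tombstone itself be discarded.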