## Overview

Migrating from Kafka to StreamHouse is straightforward thanks to our Kafka-compatible protocol. This guide walks through the process step by step.
## Prerequisites
- StreamHouse 0.6+ deployed
- Access to your Kafka cluster
- Producer/consumer applications ready to update
## Step 1: Deploy StreamHouse

Start with a minimal StreamHouse deployment:

```bash
# Start infrastructure
docker compose up -d minio postgres

# Run an agent
cargo run --bin agent
```
## Step 2: Create Topics

Mirror your Kafka topic configuration, keeping partition counts identical so that keyed messages land in the same partitions on both sides:

```bash
streamctl topics create orders --partitions 8
streamctl topics create events --partitions 16
```
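To see why matching partition counts matters, consider how keyed routing typically works: the key is hashed, then reduced modulo the partition count. The sketch below is illustrative only; it uses Rust's `DefaultHasher`, not StreamHouse's actual partitioner, and `partition_for` is a hypothetical helper name.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Illustrative key-to-partition routing: hash the key, then take the
/// result modulo the partition count. This demonstrates why partition
/// counts must match across clusters; change the count and the same key
/// can map to a different partition.
fn partition_for(key: &str, num_partitions: u64) -> u64 {
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    hasher.finish() % num_partitions
}
```

With 8 partitions on one side and 16 on the other, `partition_for("order-1234", 8)` and `partition_for("order-1234", 16)` can disagree, which breaks per-key ordering guarantees during the dual-write phase.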
## Step 3: Set Up Dual-Write

Update producers to write to both Kafka and StreamHouse:

```rust
// Pseudo-code for dual-write
async fn produce(event: Event) -> Result<()> {
    // Write to Kafka (existing)
    kafka_producer.send(event.clone()).await?;
    // Write to StreamHouse (new)
    streamhouse_producer.send(event).await?;
    Ok(())
}
```
Monitor both systems to ensure data consistency.
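One lightweight way to monitor consistency is to count successful writes on each side and alert when the totals drift apart. The sketch below is a minimal illustration; the `DualWriteStats` type is hypothetical, not something StreamHouse ships.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical dual-write bookkeeping: producers bump a counter per
/// successful write, and a periodic check compares the two totals.
#[derive(Default)]
struct DualWriteStats {
    kafka_ok: AtomicU64,
    streamhouse_ok: AtomicU64,
}

impl DualWriteStats {
    fn record_kafka(&self) {
        self.kafka_ok.fetch_add(1, Ordering::Relaxed);
    }

    fn record_streamhouse(&self) {
        self.streamhouse_ok.fetch_add(1, Ordering::Relaxed);
    }

    /// Absolute difference between the two success counts. A steadily
    /// growing divergence means one side is failing or dropping writes.
    fn divergence(&self) -> u64 {
        let k = self.kafka_ok.load(Ordering::Relaxed);
        let s = self.streamhouse_ok.load(Ordering::Relaxed);
        k.abs_diff(s)
    }
}
```

In practice you would call `record_kafka` and `record_streamhouse` after each successful send in the dual-write function above and export `divergence()` to your metrics system.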
## Step 4: Migrate Consumers

Once dual-write is stable, migrate consumers one at a time:

```rust
// Update the connection string to point at a StreamHouse agent
let consumer = StreamHouseConsumer::new(
    "streamhouse://agent:9090", // agent endpoint
    "orders",                   // topic
    "order-processor",          // consumer group
).await?;
```
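Committed Kafka offsets do not carry over to StreamHouse, so a migrating consumer needs a way to pick a starting position. One common approach is to resume from the dual-write cutover time. The sketch below is illustrative: `offset_for_timestamp` is a hypothetical helper operating on a sorted `(offset, timestamp_ms)` index, for example sampled from segment metadata, not an actual StreamHouse API.

```rust
/// Hypothetical sketch: find the first offset whose record timestamp is
/// at or after the cutover time. `index` must be sorted by timestamp.
fn offset_for_timestamp(index: &[(u64, u64)], cutover_ms: u64) -> Option<u64> {
    // partition_point returns the index of the first entry for which
    // the predicate is false, i.e. the first timestamp >= cutover_ms.
    let pos = index.partition_point(|&(_, ts)| ts < cutover_ms);
    index.get(pos).map(|&(offset, _)| offset)
}
```

Returning `None` means no record at or after the cutover time exists yet, in which case the consumer can simply start at the log's end.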
## Step 5: Disable Kafka Writes

After all consumers are migrated, remove the Kafka write path from producers:

```rust
async fn produce(event: Event) -> Result<()> {
    streamhouse_producer.send(event).await?;
    Ok(())
}
```
## Step 6: Decommission Kafka

- Stop Kafka consumers
- Verify no active producers
- Back up Kafka data if needed
- Shut down the Kafka cluster
## Common Issues

### Consumer Lag After Migration

If you see consumer lag after migration, it is likely due to offset differences: committed Kafka offsets do not map to StreamHouse offsets. Reset offsets to earliest:

```bash
streamctl consumer-groups reset order-processor --topic orders --to-earliest
```
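Lag itself is just the gap between a partition's high watermark and the group's committed offset, which explains what you see after a reset. A small sketch (the `lag` helper is illustrative, not a StreamHouse API):

```rust
/// Consumer lag for one partition: how far the committed offset trails
/// the high watermark (the offset of the next record to be written).
/// saturating_sub guards against a committed offset that momentarily
/// reads ahead of a stale watermark.
fn lag(high_watermark: u64, committed: u64) -> u64 {
    high_watermark.saturating_sub(committed)
}
```

Immediately after resetting to earliest, the committed offset is at the start of the log, so lag equals roughly the full partition size and then shrinks as the consumer catches up; a large-but-shrinking number here is expected, not a problem.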
### Missing Messages

Enable dual-read temporarily to verify data consistency:

```rust
// Poll both systems and compare payloads for the same logical message
let kafka_msg = kafka_consumer.poll().await?;
let sh_msg = streamhouse_consumer.poll().await?;
assert_eq!(kafka_msg.payload, sh_msg.payload);
```
## Performance Tuning

StreamHouse defaults work well for most workloads, but you may want to tune:

```toml
[agent]
segment_flush_interval = "5s" # Increase for better batching
cache_size_mb = 512           # Increase for read-heavy workloads
```
## Conclusion

Migration from Kafka to StreamHouse can be completed in a few hours for simple deployments, or a few days for complex production systems. The key is the dual-write phase: take your time to verify data consistency before cutting over.
Questions? Join our Discord for help.