Background
Currently, Bulker supports async inserts but processes them in a single-threaded manner. In stream mode, it listens to new messages on the incoming topic and sends a separate HTTP request to ClickHouse for each message, waiting for a response before continuing. This approach is significantly slower than batching due to the per-message roundtrip latency.
Opportunity
Since ClickHouse async inserts don't require waiting for a response, we can safely improve performance by sending inserts concurrently. Introducing a pool of N concurrent connections will allow Bulker to process multiple inserts in parallel, significantly increasing throughput.
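For reference, ClickHouse acknowledges an async insert immediately when the `async_insert=1` and `wait_for_async_insert=0` settings are used, so there is no per-insert result worth blocking on. Below is a minimal sketch of such a request over the ClickHouse HTTP interface; the endpoint, table, and payload are hypothetical, not Bulker's actual code:

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"strings"
)

func main() {
	// Hypothetical endpoint and table; ClickHouse accepts settings as URL parameters.
	params := url.Values{}
	params.Set("query", "INSERT INTO events FORMAT JSONEachRow")
	params.Set("async_insert", "1")          // buffer rows server-side
	params.Set("wait_for_async_insert", "0") // acknowledge as soon as the buffer accepts the data

	body := strings.NewReader(`{"id": 1, "name": "example"}`)
	resp, err := http.Post("http://localhost:8123/?"+params.Encode(), "application/json", body)
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status) // returns quickly; the flush happens asynchronously in ClickHouse
}
```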
Proposed Implementation
- Create a fixed-size pool of HTTP connections (or workers).
- Route insert requests to a random or round-robin worker in the pool.
- Do not await the HTTP response (fire-and-forget model), but optionally handle errors asynchronously (e.g., logging failed responses in the background).
- Ensure the pool size (N) is configurable for tuning based on deployment needs (see the sketch after this list).
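A minimal sketch of the approach above, assuming a hypothetical `InsertRequest` payload and an `N` taken from configuration; the names are illustrative and do not reflect Bulker's actual internals:

```go
package main

import (
	"bytes"
	"log"
	"net/http"
	"sync/atomic"
	"time"
)

// InsertRequest is a hypothetical payload destined for ClickHouse.
type InsertRequest struct {
	URL  string // ClickHouse HTTP endpoint with async insert settings
	Body []byte // rows in JSONEachRow format
}

// InsertPool fans insert requests out to a fixed number of workers.
type InsertPool struct {
	queues  []chan InsertRequest
	counter atomic.Uint64
}

// NewInsertPool starts n workers, each with its own bounded queue; the shared
// HTTP client keeps connections alive, so workers reuse connections.
func NewInsertPool(n int) *InsertPool {
	p := &InsertPool{queues: make([]chan InsertRequest, n)}
	for i := range p.queues {
		p.queues[i] = make(chan InsertRequest, 100)
		go p.worker(p.queues[i])
	}
	return p
}

// Submit routes the request to a worker round-robin and returns immediately.
func (p *InsertPool) Submit(req InsertRequest) {
	i := p.counter.Add(1) % uint64(len(p.queues))
	p.queues[i] <- req
}

// worker sends requests without blocking the caller; failures are only logged.
func (p *InsertPool) worker(queue <-chan InsertRequest) {
	client := &http.Client{}
	for req := range queue {
		resp, err := client.Post(req.URL, "application/json", bytes.NewReader(req.Body))
		if err != nil {
			log.Printf("async insert failed: %v", err)
			continue
		}
		if resp.StatusCode >= 300 {
			log.Printf("async insert rejected: %s", resp.Status)
		}
		resp.Body.Close()
	}
}

func main() {
	pool := NewInsertPool(8) // N = 8 workers; in practice this would come from config
	pool.Submit(InsertRequest{
		URL:  "http://localhost:8123/?query=INSERT%20INTO%20events%20FORMAT%20JSONEachRow&async_insert=1&wait_for_async_insert=0",
		Body: []byte(`{"id": 1}`),
	})
	time.Sleep(time.Second) // demo only: give the worker time to send before exiting
}
```

`Submit` returns as soon as the request is queued; whether a bounded queue (back-pressure, as above) or pure fire-and-forget is preferable depends on how Bulker commits consumer offsets.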
Benefits
- Improved message processing throughput in stream mode.
- Better resource utilization on high-throughput pipelines.
- Minimal changes required since async insert semantics already support non-blocking behavior.
Benchmark (TODO)
We’ll benchmark performance before and after introducing concurrency to measure the improvement (a rough measurement sketch follows the metrics below). Metrics to compare:
- Messages processed per second
- Latency per insert
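A rough sketch of how those two metrics could be collected around the insert path; `sendInsert` is a placeholder for the actual per-message call, not Bulker's API:

```go
package main

import (
	"fmt"
	"time"
)

// sendInsert is a placeholder for the real per-message insert call.
func sendInsert() error {
	time.Sleep(time.Millisecond) // stand-in for the network roundtrip
	return nil
}

func main() {
	const messages = 1000
	var totalLatency time.Duration

	start := time.Now()
	for i := 0; i < messages; i++ {
		t := time.Now()
		if err := sendInsert(); err != nil {
			continue
		}
		totalLatency += time.Since(t)
	}
	elapsed := time.Since(start)

	fmt.Printf("messages/sec: %.0f\n", float64(messages)/elapsed.Seconds())
	fmt.Printf("avg latency per insert: %s\n", totalLatency/time.Duration(messages))
}
```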