ClickHouse: Improve Async Insert Performance by Adding Concurrency #27

@absorbb

Description

Background

Currently, Bulker supports async inserts but processes them sequentially on a single thread. In stream mode, it listens for new messages on the incoming topic and sends a separate HTTP request to ClickHouse for each message, waiting for the response before moving on to the next one. This is significantly slower than batching because every message pays a full round-trip latency.

Opportunity

ClickHouse buffers async inserts on the server and, when `wait_for_async_insert=0` is set, acknowledges each one before the data is flushed, so Bulker does not need to wait for one response before sending the next insert. Introducing a pool of N concurrent connections will allow Bulker to send multiple inserts in parallel, significantly increasing throughput.

Proposed Implementation

  • Create a fixed-size pool of HTTP connections (or workers).
  • Route insert requests to a random or round-robin worker in the pool.
  • Do not block the consumer on the HTTP response (fire-and-forget model); optionally handle errors asynchronously (e.g., by logging failed responses in the background).
  • Ensure the pool size (N) is configurable for tuning based on deployment needs.
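
The steps above could be sketched roughly as follows. This is a minimal illustration, not Bulker's actual code: `InsertPool`, `SendFunc`, `NewHTTPSend`, and `ErrInsertFailed` are hypothetical names, and the insert URL is assumed to already carry the query plus `async_insert=1&wait_for_async_insert=0`. Each worker owns a bounded queue, `Submit` picks a worker round-robin, and failures are reported through a callback instead of being returned to the caller.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"io"
	"net/http"
	"sync"
	"sync/atomic"
)

// ErrInsertFailed marks a non-200 response (hypothetical sentinel).
var ErrInsertFailed = errors.New("insert failed")

// SendFunc performs one insert; it is injected so the pool can be tested
// without a running ClickHouse server.
type SendFunc func(payload []byte) error

// InsertPool is a fixed-size set of workers, each with its own bounded queue.
type InsertPool struct {
	queues []chan []byte
	next   atomic.Uint64 // round-robin counter
	wg     sync.WaitGroup
}

// NewInsertPool starts n workers. onError runs on the worker goroutine,
// so the caller of Submit never sees or waits for the insert result.
func NewInsertPool(n, queueSize int, send SendFunc, onError func(error)) *InsertPool {
	p := &InsertPool{queues: make([]chan []byte, n)}
	for i := range p.queues {
		q := make(chan []byte, queueSize)
		p.queues[i] = q
		p.wg.Add(1)
		go func() {
			defer p.wg.Done()
			for payload := range q {
				if err := send(payload); err != nil {
					onError(err)
				}
			}
		}()
	}
	return p
}

// Submit hands the payload to the next worker. It blocks only when that
// worker's queue is full, which gives the consumer natural backpressure.
func (p *InsertPool) Submit(payload []byte) {
	i := (p.next.Add(1) - 1) % uint64(len(p.queues))
	p.queues[i] <- payload
}

// Close stops accepting work and waits for queued inserts to finish.
// Submit must not be called after Close.
func (p *InsertPool) Close() {
	for _, q := range p.queues {
		close(q)
	}
	p.wg.Wait()
}

// NewHTTPSend builds a SendFunc that POSTs the payload to insertURL.
// Assumption: insertURL already contains the INSERT query and the
// async insert settings as URL parameters.
func NewHTTPSend(client *http.Client, insertURL string) SendFunc {
	return func(payload []byte) error {
		resp, err := client.Post(insertURL, "application/octet-stream", bytes.NewReader(payload))
		if err != nil {
			return err
		}
		defer resp.Body.Close()
		// Drain the body so the underlying connection can be reused.
		_, _ = io.Copy(io.Discard, resp.Body)
		if resp.StatusCode != http.StatusOK {
			return fmt.Errorf("%w: status %s", ErrInsertFailed, resp.Status)
		}
		return nil
	}
}
```

In this sketch the stream consumer would call `pool.Submit(msg)` for each message and `pool.Close()` on shutdown, with the worker count `n` read from configuration. Round-robin routing does not preserve message order across workers, so it only fits pipelines where insert order does not matter.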

Benefits

  • Improved message processing throughput in stream mode.
  • Better resource utilization on high-throughput pipelines.
  • Minimal changes required since async insert semantics already support non-blocking behavior.

Benchmark (TODO)

We’ll benchmark performance before and after introducing concurrency to measure improvement. Metrics to compare:

  • Messages processed per second
  • Latency per insert
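
One possible shape for collecting these two metrics is sketched below. It is an assumption-laden illustration, not an existing Bulker benchmark: `Measure`, `Result`, and `SimulatedInsert` are hypothetical names, and `SimulatedInsert` merely sleeps to stand in for an HTTP round trip. Running `Measure` once with one worker and once with N workers gives a before/after comparison.

```go
package main

import (
	"sort"
	"sync"
	"time"
)

// Result holds the metrics to compare before and after adding concurrency.
type Result struct {
	Count       int           // inserts attempted
	Errors      int           // inserts that returned an error
	MsgPerSec   float64       // throughput over the whole run
	MeanLatency time.Duration // average time per insert call
	P95Latency  time.Duration // 95th percentile (nearest rank)
}

// SimulatedInsert returns a stand-in insert that only sleeps for roundTrip.
// A real benchmark would replace it with an actual ClickHouse request.
func SimulatedInsert(roundTrip time.Duration) func() error {
	return func() error {
		time.Sleep(roundTrip)
		return nil
	}
}

// Measure runs total inserts spread across the given number of workers
// and records the latency of each individual call.
func Measure(total, workers int, insert func() error) Result {
	jobs := make(chan struct{}, total)
	for i := 0; i < total; i++ {
		jobs <- struct{}{}
	}
	close(jobs)

	var (
		mu        sync.Mutex
		wg        sync.WaitGroup
		latencies = make([]time.Duration, 0, total)
		failures  int
	)

	start := time.Now()
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for range jobs {
				t0 := time.Now()
				err := insert()
				d := time.Since(t0)
				mu.Lock()
				latencies = append(latencies, d)
				if err != nil {
					failures++
				}
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	elapsed := time.Since(start)

	res := Result{Count: len(latencies), Errors: failures}
	if res.Count == 0 {
		return res
	}
	sort.Slice(latencies, func(i, j int) bool { return latencies[i] < latencies[j] })
	var sum time.Duration
	for _, d := range latencies {
		sum += d
	}
	res.MsgPerSec = float64(res.Count) / elapsed.Seconds()
	res.MeanLatency = sum / time.Duration(res.Count)
	res.P95Latency = latencies[(res.Count*95+99)/100-1]
	return res
}
```

For example, `Measure(1000, 1, insert)` would approximate the current sequential behavior and `Measure(1000, 8, insert)` a pool of eight workers. Per-insert latency is expected to stay roughly the same, while messages per second should rise with the worker count until the server or network becomes the bottleneck.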
