[Enhancement] Add a cleaner for BrpcStubCache to cleanup unused connections #61417
Conversation
Hi @kevincai, are there any suggestions for this case, or any comments on this PR?
Will take a look today.
@duanyyyyyyy gentle ping =)
Seems some unrelated tests failed; let me retry them.
Pull Request Overview
This PR introduces a cleanup mechanism for BRPC stub caches to prevent accumulation of unused connections, particularly in Kubernetes deployments where pod IP addresses change frequently. The implementation adds periodic cleanup of expired stubs across three cache types.
Key changes:
- Added BrpcStubManager to periodically clean up expired stubs from all cache types (sketched below)
- Implemented expiration tracking by recording last access times for cache entries
- Added configuration options for cleanup interval and expiration timeout
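To make the mechanism concrete, here is a minimal, hypothetical sketch of the manager idea in isolation. `StubCacheCleaner`, `register_cache`, and the interval handling are placeholders for illustration, not the PR's actual `BrpcStubManager` code:

```cpp
#include <atomic>
#include <chrono>
#include <functional>
#include <thread>
#include <utility>
#include <vector>

// Placeholder sketch, not the PR's BrpcStubManager: one background thread wakes
// up at a fixed interval and invokes the cleanup callback registered by each
// stub cache, which is expected to drop its expired entries.
class StubCacheCleaner {
public:
    explicit StubCacheCleaner(std::chrono::seconds interval) : _interval(interval) {}

    ~StubCacheCleaner() { stop(); }

    // Each cache registers a callback that erases its expired stubs.
    void register_cache(std::function<void()> cleanup_fn) {
        _cleanups.push_back(std::move(cleanup_fn));
    }

    void start() {
        _worker = std::thread([this] {
            while (!_stop.load()) {
                std::this_thread::sleep_for(_interval);
                for (auto& cleanup : _cleanups) {
                    cleanup();  // evict entries not accessed within the timeout
                }
            }
        });
    }

    void stop() {
        _stop.store(true);
        if (_worker.joinable()) {
            _worker.join();
        }
    }

private:
    std::chrono::seconds _interval;
    std::vector<std::function<void()>> _cleanups;
    std::atomic<bool> _stop{false};
    std::thread _worker;
};
```

In the PR itself the registered caches are the three stub caches listed below, and the interval and expiration timeout come from the new configuration options.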
Reviewed Changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| be/src/util/brpc_stub_cache.h | Added expiration tracking fields and cleanup methods to cache classes, introduced BrpcStubManager |
| be/src/util/brpc_stub_cache.cpp | Implemented cleanup logic and manager thread for periodic stub expiration |
| be/src/runtime/exec_env.h | Added BrpcStubManager member to execution environment |
| be/src/runtime/exec_env.cpp | Integrated BrpcStubManager lifecycle with ExecEnv initialization/destruction |
| be/src/common/config.h | Added configuration parameters for cleanup interval and expiration timeout |
| be/test/util/brpc_stub_cache_test.cpp | Added unit tests for cleanup functionality across all cache types |
be/src/util/brpc_stub_cache.cpp (outdated diff):
    // Re-arm the cleanup task for this endpoint: drop any previously scheduled
    // task, then schedule a fresh one that fires after brpc_stub_expire_s seconds.
    auto old_task = _cleanup_task_map.seek(endpoint);
    if (old_task != nullptr) {
        _pipeline_timer->unschedule(old_task->get());
    }
    timespec tm = butil::seconds_from_now(config::brpc_stub_expire_s);
    auto task = std::make_shared<EndpointCleanupTask>(this, endpoint);
    auto st = _pipeline_timer->schedule(task.get(), tm);
    if (st.ok()) {
        _cleanup_task_map[endpoint] = task;
    }

    auto stub_pool = _stub_map.seek(endpoint);
    if (stub_pool == nullptr) {
        StubPool* pool = new StubPool();
How about using the same map?
butil::FlatMap<butil::EndPoint, std::pair<StubPool*, std::shared_ptr<EndpointCleanupTask>>> _stub_map;
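As a rough illustration of that suggestion, the fragment below shows what the lookup path could look like with the combined map. The `touch` helper is hypothetical, and the fragment reuses `StubPool`, `EndpointCleanupTask`, `_pipeline_timer`, and `config::brpc_stub_expire_s` from the diff above rather than defining them:

```cpp
// Hypothetical fragment sketching the reviewer's suggestion; not code from the PR.
// The StubPool and its cleanup task live side by side in one FlatMap, so a
// single seek() finds both.
using CacheEntry = std::pair<StubPool*, std::shared_ptr<EndpointCleanupTask>>;
butil::FlatMap<butil::EndPoint, CacheEntry> _stub_map;

void touch(const butil::EndPoint& endpoint) {
    CacheEntry* entry = _stub_map.seek(endpoint);
    if (entry == nullptr) {
        return;  // not cached yet; the caller creates the pool and the task together
    }
    // The cleanup task sits next to the stub pool, so re-arming it needs no
    // lookup in a separate _cleanup_task_map.
    _pipeline_timer->unschedule(entry->second.get());
    timespec tm = butil::seconds_from_now(config::brpc_stub_expire_s);
    (void)_pipeline_timer->schedule(entry->second.get(), tm);
}
```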
[Java-Extensions Incremental Coverage Report] ✅ pass : 0 / 0 (0%)
[FE Incremental Coverage Report] ✅ pass : 0 / 0 (0%)
[BE Incremental Coverage Report] ✅ pass : 105 / 111 (94.59%)
[Enhancement] Add a cleaner for BrpcStubCache to cleanup unused connections
Why I'm doing:
Entries in the BRPC stub cache are never evicted; once created, a stub stays in the cache forever.


This causes a real problem in practice. When users deploy the StarRocks BEs on K8s, after the cluster has been running for a while the BE logs fill up with repeated BRPC messages, which can be traced to the BRPC code at https://github.com/apache/brpc/blob/1.8.0/src/brpc/socket.cpp#L1340.
The Grafana monitor shows the effect: the cluster has 148 BE nodes, so normally there should be about 148 BRPC stubs, but after running for some time, as pods restarted and their IPs changed, the stub count on some nodes increased to over 200.
Since a BRPC endpoint is made of an IP and a port, once a pod's IP changes the old endpoint is never looked up again, yet it still stays in the cache.
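To illustrate why the count keeps growing, here is a deliberately simplified model: a plain map keyed by an "ip:port" string stands in for the real FlatMap-based cache, and the IP addresses are made up.

```cpp
#include <cstdio>
#include <map>
#include <string>

// Simplified model, not the real cache: stubs are keyed by "ip:port". When a
// pod restarts with a new IP, lookups use the new key, so the old entry is
// never touched again -- but it is also never removed.
int main() {
    std::map<std::string, std::string> stub_cache;  // endpoint -> stub (stand-in)

    stub_cache["10.0.3.17:8060"] = "stub for BE pod";    // before the restart
    // The pod restarts and gets a new IP; callers now resolve the new endpoint.
    stub_cache["10.0.7.42:8060"] = "stub for same pod";  // same logical node, new key

    // One logical node, two cached endpoints; with 148 BEs and repeated pod
    // restarts this is how the stub count climbs past 148.
    std::printf("cached endpoints: %zu\n", stub_cache.size());  // prints 2
    return 0;
}
```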
What I'm doing:
Here I introduce a BrpcStubManager to periodically check and clean up expired BRPC stubs for BrpcStubCache, HttpBrpcStubCache and LakeServiceBrpcStubCache. It iterates the `_stub_map` or `_last_access_time_map` and checks whether `now - last_access_time` is larger than the expire time.
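To make that check concrete, here is a rough sketch with placeholder names; a plain `std::unordered_map` keyed by a string stands in for the PR's actual members (the real key is a BRPC endpoint), and `record_access`, `cleanup_expired`, and `erase_stub` are illustrative only:

```cpp
#include <chrono>
#include <string>
#include <unordered_map>

// Placeholder sketch of the expiration check described above, not the PR's code.
using Clock = std::chrono::steady_clock;

std::unordered_map<std::string, Clock::time_point> _last_access_time_map;

void record_access(const std::string& endpoint) {
    _last_access_time_map[endpoint] = Clock::now();  // refreshed on every stub lookup
}

template <typename EraseStubFn>
void cleanup_expired(std::chrono::seconds expire, EraseStubFn erase_stub) {
    const auto now = Clock::now();
    for (auto it = _last_access_time_map.begin(); it != _last_access_time_map.end();) {
        if (now - it->second > expire) {
            erase_stub(it->first);                 // drop the stub for this stale endpoint
            it = _last_access_time_map.erase(it);  // and forget its timestamp
        } else {
            ++it;
        }
    }
}
```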
What type of PR is this:
Does this PR entail a change in behavior?
If yes, please specify the type of change:
Checklist:
Bugfix cherry-pick branch check: