[Enhancement] Add a cleaner for BrpcStubCache to cleanup unused connections #61417
Conversation
Hi @kevincai, are there any suggestions for this case, or any comments on this PR?
Will take a look today.
@duanyyyyyyy gentle ping =)
Seems some unrelated tests failed; let me retry them.
Pull Request Overview
This PR introduces a cleanup mechanism for BRPC stub caches to prevent accumulation of unused connections, particularly in Kubernetes deployments where pod IP addresses change frequently. The implementation adds periodic cleanup of expired stubs across three cache types.
Key changes:
- Added BrpcStubManager to periodically clean up expired stubs from all cache types (sketched below)
- Implemented expiration tracking by recording last access times for cache entries
- Added configuration options for cleanup interval and expiration timeout
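To make the mechanism concrete, here is a minimal, hypothetical sketch of the manager idea in isolation. `StubCacheCleaner`, `register_cache`, and the interval handling are placeholders for illustration, not the PR's actual `BrpcStubManager` code:

```cpp
#include <atomic>
#include <chrono>
#include <functional>
#include <thread>
#include <utility>
#include <vector>

// Placeholder sketch, not the PR's BrpcStubManager: one background thread wakes
// up at a fixed interval and invokes the cleanup callback registered by each
// stub cache, which is expected to drop its expired entries.
class StubCacheCleaner {
public:
    explicit StubCacheCleaner(std::chrono::seconds interval) : _interval(interval) {}

    ~StubCacheCleaner() { stop(); }

    // Each cache registers a callback that erases its expired stubs.
    void register_cache(std::function<void()> cleanup_fn) {
        _cleanups.push_back(std::move(cleanup_fn));
    }

    void start() {
        _worker = std::thread([this] {
            while (!_stop.load()) {
                std::this_thread::sleep_for(_interval);
                for (auto& cleanup : _cleanups) {
                    cleanup();  // evict entries not accessed within the timeout
                }
            }
        });
    }

    void stop() {
        _stop.store(true);
        if (_worker.joinable()) {
            _worker.join();
        }
    }

private:
    std::chrono::seconds _interval;
    std::vector<std::function<void()>> _cleanups;
    std::atomic<bool> _stop{false};
    std::thread _worker;
};
```

In the PR itself the registered caches are the three stub caches listed below, and the interval and expiration timeout come from the new configuration options.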
Reviewed Changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| be/src/util/brpc_stub_cache.h | Added expiration tracking fields and cleanup methods to cache classes, introduced BrpcStubManager |
| be/src/util/brpc_stub_cache.cpp | Implemented cleanup logic and manager thread for periodic stub expiration |
| be/src/runtime/exec_env.h | Added BrpcStubManager member to execution environment |
| be/src/runtime/exec_env.cpp | Integrated BrpcStubManager lifecycle with ExecEnv initialization/destruction |
| be/src/common/config.h | Added configuration parameters for cleanup interval and expiration timeout |
| be/test/util/brpc_stub_cache_test.cpp | Added unit tests for cleanup functionality across all cache types |
be/src/util/brpc_stub_cache.cpp (outdated diff):
    // Re-arm the cleanup task for this endpoint: drop any previously scheduled
    // task, then schedule a fresh one that fires after brpc_stub_expire_s seconds.
    auto old_task = _cleanup_task_map.seek(endpoint);
    if (old_task != nullptr) {
        _pipeline_timer->unschedule(old_task->get());
    }
    timespec tm = butil::seconds_from_now(config::brpc_stub_expire_s);
    auto task = std::make_shared<EndpointCleanupTask>(this, endpoint);
    auto st = _pipeline_timer->schedule(task.get(), tm);
    if (st.ok()) {
        _cleanup_task_map[endpoint] = task;
    }

    auto stub_pool = _stub_map.seek(endpoint);
    if (stub_pool == nullptr) {
        StubPool* pool = new StubPool();
How about using the same map?
butil::FlatMap<butil::EndPoint, std::pair<StubPool*, std::shared_ptr<EndpointCleanupTask>>> _stub_map;
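As a rough illustration of that suggestion, the fragment below shows what the lookup path could look like with the combined map. The `touch` helper is hypothetical, and the fragment reuses `StubPool`, `EndpointCleanupTask`, `_pipeline_timer`, and `config::brpc_stub_expire_s` from the diff above rather than defining them:

```cpp
// Hypothetical fragment sketching the reviewer's suggestion; not code from the PR.
// The StubPool and its cleanup task live side by side in one FlatMap, so a
// single seek() finds both.
using CacheEntry = std::pair<StubPool*, std::shared_ptr<EndpointCleanupTask>>;
butil::FlatMap<butil::EndPoint, CacheEntry> _stub_map;

void touch(const butil::EndPoint& endpoint) {
    CacheEntry* entry = _stub_map.seek(endpoint);
    if (entry == nullptr) {
        return;  // not cached yet; the caller creates the pool and the task together
    }
    // The cleanup task sits next to the stub pool, so re-arming it needs no
    // lookup in a separate _cleanup_task_map.
    _pipeline_timer->unschedule(entry->second.get());
    timespec tm = butil::seconds_from_now(config::brpc_stub_expire_s);
    (void)_pipeline_timer->schedule(entry->second.get(), tm);
}
```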
[Java-Extensions Incremental Coverage Report] ✅ pass : 0 / 0 (0%)
[FE Incremental Coverage Report] ✅ pass : 0 / 0 (0%)
[BE Incremental Coverage Report] ✅ pass : 105 / 111 (94.59%)
[Enhancement] Add a cleaner for BrpcStubCache to cleanup unused connections
Why I'm doing:
Entries in the BRPC stub cache are never evicted; once created, a stub stays in the cache forever.


This causes a real problem in practice. When users deploy the StarRocks BEs on K8s, after the cluster has been running for a while the BE logs fill up with repeated BRPC messages, which can be traced to the BRPC code at https://github.com/apache/brpc/blob/1.8.0/src/brpc/socket.cpp#L1340.
The Grafana monitor shows the effect: the cluster has 148 BE nodes, so normally there should be about 148 BRPC stubs, but after running for some time, as pods restarted and their IPs changed, the stub count on some nodes increased to over 200.
Since a BRPC endpoint is made of an IP and a port, once a pod's IP changes the old endpoint is never looked up again, yet it still stays in the cache.
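To illustrate why the count keeps growing, here is a deliberately simplified model: a plain map keyed by an "ip:port" string stands in for the real FlatMap-based cache, and the IP addresses are made up.

```cpp
#include <cstdio>
#include <map>
#include <string>

// Simplified model, not the real cache: stubs are keyed by "ip:port". When a
// pod restarts with a new IP, lookups use the new key, so the old entry is
// never touched again -- but it is also never removed.
int main() {
    std::map<std::string, std::string> stub_cache;  // endpoint -> stub (stand-in)

    stub_cache["10.0.3.17:8060"] = "stub for BE pod";    // before the restart
    // The pod restarts and gets a new IP; callers now resolve the new endpoint.
    stub_cache["10.0.7.42:8060"] = "stub for same pod";  // same logical node, new key

    // One logical node, two cached endpoints; with 148 BEs and repeated pod
    // restarts this is how the stub count climbs past 148.
    std::printf("cached endpoints: %zu\n", stub_cache.size());  // prints 2
    return 0;
}
```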
What I'm doing:
Here I introduce a BrpcStubManager to periodically check and clean up expired BRPC stubs for BrpcStubCache, HttpBrpcStubCache and LakeServiceBrpcStubCache. It iterates the `_stub_map` or `_last_access_time_map` and checks whether `now - last_access_time` is larger than the expire time.
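To make that check concrete, here is a rough sketch with placeholder names; a plain `std::unordered_map` keyed by a string stands in for the PR's actual members (the real key is a BRPC endpoint), and `record_access`, `cleanup_expired`, and `erase_stub` are illustrative only:

```cpp
#include <chrono>
#include <string>
#include <unordered_map>

// Placeholder sketch of the expiration check described above, not the PR's code.
using Clock = std::chrono::steady_clock;

std::unordered_map<std::string, Clock::time_point> _last_access_time_map;

void record_access(const std::string& endpoint) {
    _last_access_time_map[endpoint] = Clock::now();  // refreshed on every stub lookup
}

template <typename EraseStubFn>
void cleanup_expired(std::chrono::seconds expire, EraseStubFn erase_stub) {
    const auto now = Clock::now();
    for (auto it = _last_access_time_map.begin(); it != _last_access_time_map.end();) {
        if (now - it->second > expire) {
            erase_stub(it->first);                 // drop the stub for this stale endpoint
            it = _last_access_time_map.erase(it);  // and forget its timestamp
        } else {
            ++it;
        }
    }
}
```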
What type of PR is this:
Does this PR entail a change in behavior?
If yes, please specify the type of change:
Checklist:
Bugfix cherry-pick branch check: