Skip to content

Conversation

duanyyyyyyy
Copy link
Contributor

@duanyyyyyyy duanyyyyyyy commented Jul 29, 2025

[Enhancement] Add a cleaner for BrpcStubCache to cleanup unused connections

Why I'm doing:

The BRPC Stub cache will never be evicted and will always stored in the cache.
Therefore there will be some bug here.
In the case that when users are using K8s to deploy the StarRocks backend, after sometime running there will be many log like this.
image
After check the BRPC code in https://github.com/apache/brpc/blob/1.8.0/src/brpc/socket.cpp#L1340
And from the grafana monitor
image
This cluster have 148 BE nodes, normally it will be about 148 BRPC stub but after some times running, some pods restarted and the IP changed, some of the node BRPC stub creased to over 200.
And as we all known the endpoint in BRPC are made of ip and host, that means if the IP is changed the endpoint will still in the cache.

What I'm doing:

Here I introduce a BrpcStubManager to periodically check and cleanup expired BRPC stub for BrpcStubCache, HttpBrpcStubCache and LakeServiceBrpcStubCache

  1. Record the last access time when get the stub for given endpoint
  2. Loop all the endpoint in _stub_map or _last_access_time_map check if now - last_access_time is larger than expire time or not
  3. If less, will not clean up the endpoint
  4. If larger or equal, will try to clean up the endpoint

What type of PR is this:

  • BugFix
  • Feature
  • Enhancement
  • Refactor
  • UT
  • Doc
  • Tool

Does this PR entail a change in behavior?

  • Yes, this PR will result in a change in behavior.
  • No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

  • Interface/UI changes: syntax, type conversion, expression evaluation, display information
  • Parameter changes: default values, similar parameters but with different default values
  • Policy changes: use new policy to replace old one, functionality automatically enabled
  • Feature removed
  • Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

  • I have added test cases for my bug fix or my new feature
  • This pr needs user documentation (for new or modified features or behaviors)
    • I have added documentation for my new feature or new function
  • This is a backport pr

Bugfix cherry-pick branch check:

  • I have checked the version labels which the pr will be auto-backported to the target branch
    • 4.0
    • 3.5
    • 3.4
    • 3.3

@duanyyyyyyy duanyyyyyyy requested a review from a team as a code owner July 29, 2025 16:44
@duanyyyyyyy duanyyyyyyy force-pushed the add_cleaner_brpc_stub branch from 3b4e99e to e354a96 Compare July 29, 2025 16:44
@duanyyyyyyy duanyyyyyyy force-pushed the add_cleaner_brpc_stub branch 6 times, most recently from 314606b to 35038ec Compare July 30, 2025 17:24
@kevincai kevincai self-requested a review July 31, 2025 09:18
@duanyyyyyyy
Copy link
Contributor Author

Hi @kevincai master, are there any suggestion for this case? Or any comment for this PR?

@kevincai
Copy link
Contributor

kevincai commented Aug 1, 2025

Hi @kevincai master, are there any suggestion for this case? Or any comment for this PR?

will take a look today.

@github-actions github-actions bot added the 3.5 label Aug 1, 2025
@xhumanoid
Copy link
Contributor

@duanyyyyyyy gentle ping =)

@duanyyyyyyy duanyyyyyyy reopened this Aug 11, 2025
@duanyyyyyyy
Copy link
Contributor Author

@duanyyyyyyy gentle ping =)

sry for forget.

@duanyyyyyyy duanyyyyyyy force-pushed the add_cleaner_brpc_stub branch from 35038ec to 81b4f55 Compare August 12, 2025 18:53
@duanyyyyyyy duanyyyyyyy requested a review from a team as a code owner August 12, 2025 18:53
@duanyyyyyyy duanyyyyyyy force-pushed the add_cleaner_brpc_stub branch 5 times, most recently from a3dbfc0 to ae36e29 Compare August 13, 2025 00:54
@duanyyyyyyy
Copy link
Contributor Author

@duanyyyyyyy gentle ping =)

Seems some unrelated test failed, let me retry for it

@duanyyyyyyy duanyyyyyyy force-pushed the add_cleaner_brpc_stub branch from ae36e29 to 40fb8af Compare August 13, 2025 03:46
@github-actions github-actions bot added the 4.0 label Aug 13, 2025
@kevincai kevincai requested a review from Copilot August 13, 2025 08:13
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR introduces a cleanup mechanism for BRPC stub caches to prevent accumulation of unused connections, particularly in Kubernetes deployments where pod IP addresses change frequently. The implementation adds periodic cleanup of expired stubs across three cache types.

Key changes:

  • Added BrpcStubManager to periodically clean up expired stubs from all cache types
  • Implemented expiration tracking by recording last access times for cache entries
  • Added configuration options for cleanup interval and expiration timeout

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
be/src/util/brpc_stub_cache.h Added expiration tracking fields and cleanup methods to cache classes, introduced BrpcStubManager
be/src/util/brpc_stub_cache.cpp Implemented cleanup logic and manager thread for periodic stub expiration
be/src/runtime/exec_env.h Added BrpcStubManager member to execution environment
be/src/runtime/exec_env.cpp Integrated BrpcStubManager lifecycle with ExecEnv initialization/destruction
be/src/common/config.h Added configuration parameters for cleanup interval and expiration timeout
be/test/util/brpc_stub_cache_test.cpp Added unit tests for cleanup functionality across all cache types

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

@github-actions github-actions bot removed the documentation Improvements or additions to documentation label Aug 24, 2025
Comment on lines 48 to 61
auto task = _cleanup_task_map.seek(endpoint);
if (task != nullptr) {
_pipeline_timer->unschedule(task.get());
}
timespec tm = butil::seconds_from_now(config::brpc_stub_expire_s);
auto task = std::make_shared<EndpointCleanupTask>(this, endpoint);
auto st = _pipeline_timer->schedule(task.get(), tm);
if (st.ok()) {
_cleanup_task_map[endpoint] = task;
}

auto stub_pool = _stub_map.seek(endpoint);
if (stub_pool == nullptr) {
StubPool* pool = new StubPool();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about use the same map ?.

    butil::FlatMap<butil::EndPoint, std::pair<StubPool*, std::shared_ptr<EndpointCleanupTask>>> _stub_map;

@duanyyyyyyy duanyyyyyyy force-pushed the add_cleaner_brpc_stub branch 11 times, most recently from c9782bc to fc5b948 Compare September 2, 2025 09:59
@github-actions github-actions bot added the documentation Improvements or additions to documentation label Sep 2, 2025
@duanyyyyyyy duanyyyyyyy force-pushed the add_cleaner_brpc_stub branch 4 times, most recently from 8746ce1 to 30c82c3 Compare September 2, 2025 11:54
@duanyyyyyyy
Copy link
Contributor Author

@kevincai @stdpain Finished

@duanyyyyyyy duanyyyyyyy force-pushed the add_cleaner_brpc_stub branch 2 times, most recently from 2abe155 to 2e639ff Compare September 2, 2025 14:51
@duanyyyyyyy
Copy link
Contributor Author

@kevincai @stdpain Finished

Sry for miss one comment from stdpain, let me fix it.

@duanyyyyyyy duanyyyyyyy force-pushed the add_cleaner_brpc_stub branch from 2e639ff to 35abe95 Compare September 2, 2025 15:15
…ctions

Signed-off-by: duanyyyyyyy <yan.duan9759@gmail.com>
…ctions

Signed-off-by: duanyyyyyyy <yan.duan9759@gmail.com>
…ctions

Signed-off-by: duanyyyyyyy <yan.duan9759@gmail.com>
@duanyyyyyyy duanyyyyyyy force-pushed the add_cleaner_brpc_stub branch from 35abe95 to d2fc1f6 Compare September 2, 2025 16:46
Copy link

github-actions bot commented Sep 2, 2025

[Java-Extensions Incremental Coverage Report]

pass : 0 / 0 (0%)

Copy link

github-actions bot commented Sep 2, 2025

[FE Incremental Coverage Report]

pass : 0 / 0 (0%)

Copy link

github-actions bot commented Sep 2, 2025

[BE Incremental Coverage Report]

pass : 105 / 111 (94.59%)

file detail

path covered_line new_line coverage not_covered_line_detail
🔵 be/src/util/brpc_stub_cache.cpp 99 105 94.29% [68, 183, 184, 195, 258, 269]
🔵 be/src/runtime/exec_env.cpp 1 1 100.00% []
🔵 be/src/util/brpc_stub_cache.h 5 5 100.00% []

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants