-
Notifications
You must be signed in to change notification settings - Fork 2.1k
[BugFix] Fix redundant replica handling after clone #62542
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Hongkun Xu <xuhongkun666@163.com>
|
[Java-Extensions Incremental Coverage Report]✅ pass : 0 / 0 (0%) |
[BE Incremental Coverage Report]✅ pass : 0 / 0 (0%) |
@@ -1185,7 +1185,13 @@ private boolean deleteLocationMismatchReplica(TabletSchedCtx tabletCtx, boolean | |||
if (backend != null) { | |||
Pair<String, String> location = backend.getSingleLevelLocationKV(); | |||
if (location != null && matchedLocations.contains(location)) { | |||
dupReplica = replica; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about iterating replicas backwards and remove the replica with duplicate location?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @wyb Iterating replicas backwards and removing the replica with duplicate location is indeed one solution, but in that case, wouldn’t this mean that
// tabletId -> replicaId
// used to delete src replica after copy task success
private final Map<Long, Long> cachedReplicaId = new ConcurrentHashMap<>();
becomes useless? This set operation only happens in completeSchedCtx inside createBalanceTask. And since rebalance will inevitably cause duplicate replicas, if we clean them up directly here, doesn’t that mean the subsequent deleteReplicaChosenByRebalancer can also be removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
current implement doesn't guarantee that the source replica must be deleted via deleteReplicaChosenByRebalancer
after balancing.
Why I'm doing:
Under the current rebalance + deletion mechanism, when a new BE node is added, data cannot be successfully migrated to it.
Here’s why:
As a result, the new BE node never keeps the migrated data, making the rebalance ineffective.
What I'm doing:
I added an extra check in the redundant replica detection logic:
Fixes #62541
What type of PR is this:
Does this PR entail a change in behavior?
If yes, please specify the type of change:
Checklist:
Bugfix cherry-pick branch check: