Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -321,6 +321,11 @@ public Long getToDeleteReplicaId(Long tabletId) {
return replicaId == null ? -1L : replicaId;
}

@Override
public boolean containsCachedReplicaId(Long tabletId) {
return cachedReplicaId.containsKey(tabletId);
}

private void setCachedReplicaId(Long tabletId, Long replicaId) {
cachedReplicaId.put(tabletId, replicaId);
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -95,6 +95,10 @@ public Long getToDeleteReplicaId(Long tabletId) {
return -1L;
}

public boolean containsCachedReplicaId(Long tabletId) {
return false;
}

public ClusterLoadStatistic getClusterLoadStatistic() {
return loadStatistic.get();
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -1185,7 +1185,13 @@ private boolean deleteLocationMismatchReplica(TabletSchedCtx tabletCtx, boolean
if (backend != null) {
Pair<String, String> location = backend.getSingleLevelLocationKV();
if (location != null && matchedLocations.contains(location)) {
dupReplica = replica;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about iterating replicas backwards and remove the replica with duplicate location?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @wyb Iterating replicas backwards and removing the replica with duplicate location is indeed one solution, but in that case, wouldn’t this mean that

// tabletId -> replicaId
// used to delete src replica after copy task success
private final Map<Long, Long> cachedReplicaId = new ConcurrentHashMap<>();

becomes useless? This set operation only happens in completeSchedCtx inside createBalanceTask. And since rebalance will inevitably cause duplicate replicas, if we clean them up directly here, doesn’t that mean the subsequent deleteReplicaChosenByRebalancer can also be removed?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

current implement doesn't guarantee that the source replica must be deleted via deleteReplicaChosenByRebalancer after balancing.

// This check is to handle the case where a new BE node is added and a disk balance occurs.
// After the replica from srcBE has been successfully cloned to destBE, the replica on srcBE
// will already be placed into the "to-be-deleted" queue. Therefore, at this point, a redundant
// replica is guaranteed to exist, and we can rely on the later cleanup queue to remove it.
if (!rebalancer.containsCachedReplicaId(tabletCtx.getTabletId())) {
dupReplica = replica;
}
} else {
matchedLocations.add(location);
}
Expand Down
Loading