Feature Request / Improvement
Problem Statement
When a MERGE operation affects a large number of partitions, Iceberg currently plans and commits it as a single atomic logical operation.
All affected partitions are therefore included in one execution plan, and Spark must shuffle data across all of them simultaneously.
The planner cannot isolate partition-specific operations, even though partitions are physically independent storage units.
This creates excessive shuffle overhead in Spark that scales poorly with the number of affected partitions, even when data volume is modest.
We've observed that this shuffle dominates processing time, and reducing batch size doesn't help if the partition count remains high.
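For illustration, a statement of the kind that triggers this behavior might look like the sketch below (the catalog, table, and column names are hypothetical; the target is assumed to be partitioned by day):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical example: an events table partitioned by event_day receives a
# batch of late-arriving updates that touch many historical partitions in a
# single MERGE. Spark plans one MERGE covering every affected partition.
spark.sql("""
    MERGE INTO warehouse.db.events AS t
    USING updates_batch AS s
    ON t.event_id = s.event_id AND t.event_day = s.event_day
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")
```

Even though each updated row lands in exactly one partition, the resulting plan spans every affected partition, so the shuffle scope grows with partition count rather than with data volume.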
Proposed Enhancement
Enhance Iceberg to natively support partition-aware parallelism for MERGE operations (one possible user-facing shape is sketched after this list):
- Add capability to automatically split MERGE operations by partition boundaries.
- Leverage partition independence to reduce shuffle scope per operation.
- Process partition groups concurrently while maintaining proper commit semantics.
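Purely for illustration, such support might be exposed through table properties that opt a table into partition-scoped MERGE planning. The property names below are hypothetical and do not exist in Iceberg today:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Illustrative only: hypothetical properties sketching how partition-aware
# MERGE parallelism might be exposed; neither property exists in Iceberg today.
spark.sql("""
    ALTER TABLE warehouse.db.events SET TBLPROPERTIES (
        'write.merge.partition-aware.enabled' = 'true',
        'write.merge.partition-aware.max-concurrency' = '8'
    )
""")
```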
Impact
This enhancement would:
- Improve performance for workloads with high partition counts.
- Reduce resource consumption for merge operations.
- Handle late-arriving data that touches historical partitions more effectively.
- Eliminate the need for application-level workarounds.
Current Workaround
The current workaround is an application-level solution (a minimal sketch follows this list) that:
- Splits the source batch into partition-specific chunks, each covering a small subset of partitions.
- Runs a separate MERGE operation per chunk, scoped to that chunk's partitions.
- Executes multiple chunks concurrently up to a configured limit.
- Reduces shuffle cost per operation by limiting partition scope.
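A minimal sketch of this workaround, assuming a PySpark driver, a target table partitioned by a date column `event_day`, and a source batch registered as a temp view `updates_batch` (all names and the concurrency limit are placeholders):

```python
from concurrent.futures import ThreadPoolExecutor
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Placeholder: how many partition-scoped MERGEs to run at once.
MAX_CONCURRENT_MERGES = 4

def merge_partition_chunk(day):
    # Each chunk scopes both the source and the join to a single partition
    # value, which keeps the shuffle limited to that partition's data.
    spark.sql(f"""
        MERGE INTO warehouse.db.events AS t
        USING (SELECT * FROM updates_batch WHERE event_day = DATE'{day}') AS s
        ON t.event_id = s.event_id AND t.event_day = s.event_day
        WHEN MATCHED THEN UPDATE SET *
        WHEN NOT MATCHED THEN INSERT *
    """)

# Find the distinct partitions touched by this batch, then run one MERGE per
# partition with bounded concurrency.
days = [row.event_day for row in
        spark.sql("SELECT DISTINCT event_day FROM updates_batch").collect()]

with ThreadPoolExecutor(max_workers=MAX_CONCURRENT_MERGES) as pool:
    # list() forces evaluation so any failed chunk raises in the driver.
    list(pool.map(merge_partition_chunk, days))
```

Each concurrent MERGE still produces its own snapshot and goes through Iceberg's optimistic commit path, which is where the downsides below come from.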
Downsides of Workaround
- Increased Complexity: Requires custom application logic to handle chunking, concurrency control, and error handling.
- Metadata Bloat: Multiple small commits generate more snapshots, requiring more aggressive compaction and cleanup.
- Consistency Challenges: Application must ensure idempotent processing across job restarts when chunks are partially processed.
- Parameter Tuning: Requires careful configuration of Iceberg's commit retry parameters to handle the increased commit concurrency (the relevant table properties are shown after this list).
- Limited Optimization: The application has far less visibility into the underlying storage layout than Iceberg itself has.
- Resource Contention: Concurrent operations can lead to resource contention at the metadata service level.
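For the parameter-tuning point above, the knobs involved are Iceberg's standard commit retry table properties; a typical adjustment looks like the following (the values are illustrative, not recommendations):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Raise the commit retry limits so concurrent per-partition MERGE commits can
# tolerate snapshot contention. The property names are standard Iceberg table
# properties; the values shown are illustrative only.
spark.sql("""
    ALTER TABLE warehouse.db.events SET TBLPROPERTIES (
        'commit.retry.num-retries' = '10',
        'commit.retry.min-wait-ms' = '250'
    )
""")
```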
Iceberg version: 1.9.1
Spark version: 3.5.5
Query engine
Spark
Willingness to contribute
- I can contribute this improvement/feature independently
- I would be willing to contribute this improvement/feature with guidance from the Iceberg community
- I cannot contribute this improvement/feature at this time