[Performance] Docker Build & Runtime Performance Optimization #70

# Docker Performance Analysis: Critical Issues & Optimization Plan

**Analysis Date**: 2025-06-22  
**Impact**: Builds take 10-15 minutes; container startup is delayed by 2-10 minutes  
**Priority**: High - affects developer productivity and deployment speed  

## 🔴 Critical Performance Issues

### 1. Registry Container: Massive Dependency Hell
**Location**: `docker/Dockerfile.registry:33-50`  
**Impact**: 5-10 minutes build time  

- **Heavy ML Dependencies**: Installing `torch>=1.6.0`, `sentence-transformers>=2.2.2`, `faiss-cpu>=1.7.4`
- **PyTorch alone is 800MB+** and takes 5-10 minutes to download and install
- **FAISS with CPU optimizations** may require compilation when no prebuilt wheel matches the platform
- **Sentence-transformers** pulls additional model dependencies

### 2. Model Download During Runtime
**Location**: `docker/registry-entrypoint.sh:81-113`  
**Impact**: 2-10 minutes startup delay  

- **Downloads 400MB+ sentence-transformer model** at container startup
- **Multiple fallback attempts** with SSL verification disabled
- **Network-dependent startup time** - varies by connection speed
- **Blocks container readiness** until model download completes

### 3. Inefficient Layer Caching
**Location**: `docker/Dockerfile.registry:24`  
**Impact**: Full rebuild on any code change  

- `COPY . /app/` invalidates cache on ANY file change
- Dependencies reinstalled on every code change
- No multi-stage builds to separate dependencies from application code

### 4. Sequential Service Dependencies
**Location**: `docker-compose.yml:31-32`  
**Impact**: Cascading startup delays  

- Registry depends on auth-server
- MCPGW depends on registry
- Each service waits for previous to be ready
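
How each service currently waits is not shown in this issue. As a rough sketch of one alternative to fixed delays, a dependent service could poll a readiness check with a timeout and start as soon as the upstream responds. The helper name `wait_until` and the `AUTH_HEALTH_URL` variable are hypothetical and not taken from the project.

```bash
# Hypothetical helper: retry a readiness check until it succeeds or times out.
# Usage: wait_until <timeout_seconds> <command> [args...]
wait_until() {
  timeout="$1"; shift
  elapsed=0
  until "$@"; do
    # Give up once the timeout (in seconds) has been reached.
    [ "$elapsed" -ge "$timeout" ] && return 1
    sleep 1
    elapsed=$((elapsed + 1))
  done
  return 0
}

# Example (AUTH_HEALTH_URL is a placeholder, not a real project setting):
# wait_until 60 curl -fsS "$AUTH_HEALTH_URL" || echo "auth-server not ready" >&2
```

The check command is passed as arguments, so the same helper works for `curl`, `nc`, or a custom script.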

### 5. Build Context Size
**Impact**: Slow context transfer to Docker daemon  

- Entire project copied to each container (including `.venv`, logs, etc.)
- No `.dockerignore` to exclude unnecessary files
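
To see what is inflating the build context before writing `.dockerignore`, it can help to list the largest top-level entries in the project directory. This is a minimal sketch; the function name `largest_entries` is illustrative, and it reports on-disk size rather than exactly what Docker sends to the daemon.

```bash
# Hypothetical helper: list the largest top-level entries of a directory.
# Output is "size_in_KiB<TAB>name", largest first.
# Usage: largest_entries [dir] [count]
largest_entries() {
  dir="${1:-.}"
  count="${2:-10}"
  # The two globs cover regular entries and dot-prefixed ones such as .venv and .git.
  ( cd "$dir" && du -sk -- * .[!.]* 2>/dev/null | sort -rn | head -n "$count" )
}

# Example: show the ten largest entries in the current project
# largest_entries . 10
```

Entries such as `.venv` or `logs` that show up near the top are the first candidates for exclusion.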

## 🚀 Immediate Wins (1-2 days implementation)

### 1. Add .dockerignore
**Impact**: 50-80% faster build context transfer  

```dockerignore
.venv/
__pycache__/
*.pyc
.git/
logs/
*.log
.pytest_cache/
.coverage
node_modules/
.DS_Store
```

### 2. Multi-stage Dockerfile for Registry
**Impact**: 60-70% faster rebuilds  

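```dockerfile
# Stage 1: Dependencies (rebuilt only when requirements.txt changes)
FROM python:3.12-slim AS deps
ENV PYTHONUNBUFFERED=1
RUN pip install uv
COPY requirements.txt .
# uv needs --system to install outside a virtual environment
RUN uv pip install --system -r requirements.txt

# Stage 2: Runtime
FROM python:3.12-slim
# ENV does not carry over between stages, so set it again here
ENV PYTHONUNBUFFERED=1
COPY --from=deps /usr/local/lib/python3.12/site-packages /usr/local/lib/python3.12/site-packages
# Console scripts installed by the dependencies live in /usr/local/bin
COPY --from=deps /usr/local/bin /usr/local/bin
COPY . /app/
WORKDIR /app
```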

### 3. Parallel Builds
**Impact**: 40-50% faster total build time  

```bash
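# In build_and_run.sh
# Note: Compose v2 already builds in parallel by default; there the flag is deprecated and has no effect
docker-compose build --parallel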
```
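
To confirm how much each step actually gains, `build_and_run.sh` could wrap its commands in a small timer. The sketch below is an assumption about how such instrumentation might look, not the script's current contents; `timed` is a hypothetical helper name, and the resolution is whole seconds.

```bash
# Hypothetical helper: run a command and report how long it took on stderr.
# Usage: timed "label" command [args...]
timed() {
  label="$1"; shift
  start=$(date +%s)
  status=0
  "$@" || status=$?
  end=$(date +%s)
  echo "[timing] $label: $((end - start))s (exit $status)" >&2
  # Preserve the wrapped command's exit status for the caller.
  return "$status"
}

# Example (the service name is an assumption based on this issue):
# timed "build registry" docker-compose build registry
# timed "build all"      docker-compose build
```

Because the timing line goes to stderr, the wrapped command's stdout can still be piped or captured unchanged.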

### 4. Pre-download Models in Build Stage
**Impact**: Eliminates 2-10 minute startup delay  

```dockerfile
# Add to Dockerfile.registry after dependencies
RUN mkdir -p /app/registry/models/all-MiniLM-L6-v2 && \
    huggingface-cli download sentence-transformers/all-MiniLM-L6-v2 \
    --local-dir /app/registry/models/all-MiniLM-L6-v2
```
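
With the model baked into the image, the entrypoint can skip its runtime download path. The following is a minimal sketch rather than the contents of `registry-entrypoint.sh`: `model_is_cached` is a hypothetical helper, and using a non-empty `config.json` as the marker for a complete model directory is an assumption.

```bash
# Hypothetical helper: treat the model as cached when its directory
# contains a non-empty config.json (assumed marker file).
model_is_cached() {
  model_dir="$1"
  [ -s "$model_dir/config.json" ]
}

# Example entrypoint logic, using the model path from the build step:
# MODEL_DIR=/app/registry/models/all-MiniLM-L6-v2
# if model_is_cached "$MODEL_DIR"; then
#   echo "Model found in image, skipping download"
# else
#   huggingface-cli download sentence-transformers/all-MiniLM-L6-v2 \
#     --local-dir "$MODEL_DIR"
# fi
```

Keeping the download as a fallback means images built without the pre-download step still start, only more slowly.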

## 📈 Expected Performance Improvements

| Optimization | Build Time Reduction | Startup Time Reduction |
|--------------|---------------------|------------------------|
| .dockerignore | 30-50% | 0% |
| Multi-stage builds | 60-70% | 0% |
| Parallel builds | 40-50% | 0% |
| Pre-download models | 0% | 90-95% |
| **Combined** | **85-95%** | **90-95%** |
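
The combined figure is not the sum of the individual rows. One way to sanity-check it is to assume the reductions stack multiplicatively, so each one applies to whatever time remains after the others. That independence assumption is a simplification (caching and parallelism overlap in practice), and `combined_reduction` is an illustrative helper, not part of the project.

```bash
# Hypothetical helper: combine percentage reductions, assuming each applies
# independently to the time remaining after the others.
# Usage: combined_reduction 30 60 40
combined_reduction() {
  printf '%s\n' "$@" | awk '
    BEGIN { remaining = 1 }
    { remaining *= (1 - $1 / 100) }
    END { printf "%.1f\n", (1 - remaining) * 100 }'
}

# Lower and upper bounds of the three build-time rows in the table:
# combined_reduction 30 60 40   # prints 83.2
# combined_reduction 50 70 50   # prints 92.5
```

Under this assumption the build-time rows combine to roughly 83-93%, close to the 85-95% estimate in the table.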

## 🎯 Implementation Plan

### Phase 1 (Week 1) - Immediate Wins
- [ ] Add comprehensive .dockerignore file
- [ ] Implement parallel builds in build_and_run.sh
- [ ] Pre-download ML models during build stage
- [ ] Add build timing instrumentation

### Phase 2 (Week 2) - Architecture Improvements  
- [ ] Implement multi-stage Dockerfile for registry
- [ ] Optimize health check intervals
- [ ] Improve dependency layer caching

### Phase 3 (Week 3) - Advanced Optimizations
- [ ] Create base ML image with pre-installed dependencies
- [ ] Implement model caching volumes
- [ ] Optimize service startup dependencies

## 📋 Success Metrics

### Target Performance Goals
- **Total build time**: <2 minutes (currently 10-15 minutes)
- **Container startup time**: <30 seconds (currently 2-10 minutes)  
- **Image sizes**: <1GB per service
- **Cache hit rates**: >80%

### Validation Commands
```bash
# Measure build time
time docker-compose build

# Measure startup time (Compose v2: --wait blocks until services are running or healthy)
time docker compose up -d --wait

# Check image sizes
docker images --format "table {{.Repository}}\t{{.Size}}"
```
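
The image size goal can also be checked automatically instead of by reading the table. This is a sketch under stated assumptions: the 1GB target is read as decimal gigabytes, and `size_within_limit` is a hypothetical helper. `docker image inspect` reports `.Size` in bytes.

```bash
# 1GB target from the goals above, in bytes (decimal GB assumed).
MAX_IMAGE_BYTES=1000000000

# Hypothetical helper: succeed when a size in bytes is under the limit.
# Usage: size_within_limit <bytes> [limit_bytes]
size_within_limit() {
  [ "$1" -lt "${2:-$MAX_IMAGE_BYTES}" ]
}

# Example: check each image name passed to the script
# for image in "$@"; do
#   bytes=$(docker image inspect --format '{{.Size}}' "$image")
#   size_within_limit "$bytes" || echo "FAIL: $image is $bytes bytes" >&2
# done
```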

## 🔧 Technical Details

The full performance analysis includes:
- Detailed instrumentation strategy for build_and_run.sh
- Layer-by-layer Docker build analysis
- Medium-term solutions (base images, caching volumes)
- Long-term architectural improvements (dependency splitting)

---

This issue addresses critical performance bottlenecks that significantly impact developer productivity and deployment efficiency. The proposed optimizations can reduce build times from 10-15 minutes to under 2 minutes, and startup times from 2-10 minutes to under 30 seconds.
