Docker Performance Analysis: Critical Issues & Optimization Plan
Analysis Date: 2025-06-22
Impact: Build times of 10-15 minutes, startup delays of 2-10 minutes
Priority: High - affects developer productivity and deployment speed
🔴 Critical Performance Issues
1. Registry Container: Massive Dependency Hell
Location: docker/Dockerfile.registry:33-50
Impact: 5-10 minutes build time
- Heavy ML Dependencies: Installing torch>=1.6.0, sentence-transformers>=2.2.2, faiss-cpu>=1.7.4
- PyTorch alone is 800MB+ and takes 5-10 minutes to download and install
- FAISS with CPU optimizations requires compilation
- Sentence-transformers pulls additional model dependencies
2. Model Download During Runtime
Location: docker/registry-entrypoint.sh:81-113
Impact: 2-10 minutes startup delay
- Downloads 400MB+ sentence-transformer model at container startup
- Multiple fallback attempts with SSL verification disabled
- Network-dependent startup time - varies by connection speed
- Blocks container readiness until model download completes
3. Inefficient Layer Caching
Location: docker/Dockerfile.registry:24
Impact: Full rebuild on any code change
- COPY . /app/ invalidates the cache on ANY file change
- Dependencies reinstalled on every code change
- No multi-stage builds to separate dependencies from application code
4. Sequential Service Dependencies
Location: docker-compose.yml:31-32
Impact: Cascading startup delays
- Registry depends on auth-server
- MCPGW depends on registry
- Each service waits for previous to be ready
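To see how much each link in this chain contributes, one option is to time how long each service takes to become ready. The helper below is a minimal bash sketch, not something taken from the repository: wait_until is a hypothetical name, and the health URL in the usage comment is an assumed placeholder.

```shell
# Polls a command until it succeeds or the timeout (in whole seconds) elapses.
# Returns 0 on success, 1 on timeout. Interval must be a whole number of seconds.
wait_until() {
    local timeout="$1" interval="$2"
    shift 2
    local waited=0
    until "$@"; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done
}

# Usage (assumed endpoint, adjust to the real health check of each service):
#   time wait_until 600 2 curl -fsS http://localhost:<port>/health
```

Running this once per service after docker-compose up -d gives a rough per-service readiness time, which shows whether the auth-server, the registry, or MCPGW dominates the cascade.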
5. Build Context Size
Impact: Slow context transfer to Docker daemon
- Entire project copied to each container (including .venv, logs, etc.)
- No .dockerignore to exclude unnecessary files
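Before writing the ignore file, it can help to check which directories actually dominate the build context. The function below is a small bash sketch under that assumption; report_context_candidates is a hypothetical name, and the entries passed in the usage comment are simply the ones this issue already mentions.

```shell
# Prints the size in KiB (via du -sk) of each named entry that exists under
# the project root. Entries that do not exist are skipped silently.
report_context_candidates() {
    local root="$1"
    shift
    local entry
    for entry in "$@"; do
        if [ -e "${root}/${entry}" ]; then
            du -sk "${root}/${entry}"
        fi
    done
}

# Usage, from the repository root:
#   report_context_candidates . .venv logs .git node_modules
```

Comparing these numbers with du -sk . gives a rough idea of how much of the context transfer a .dockerignore could remove.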
🚀 Immediate Wins (1-2 days implementation)
1. Add .dockerignore
Impact: 50-80% faster build context transfer
.venv/
__pycache__/
*.pyc
.git/
logs/
*.log
.pytest_cache/
.coverage
node_modules/
.DS_Store
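To keep these entries from silently disappearing later, a build script could verify that the expected patterns are present. This is a hedged bash sketch; missing_ignore_patterns is a hypothetical helper, and it only does exact line matching, so it does not understand .dockerignore globbing or negation rules.

```shell
# Prints "missing: <pattern>" for every pattern that is not present as an
# exact line in the ignore file. Returns 1 if anything is missing, else 0.
missing_ignore_patterns() {
    local ignore_file="$1"
    shift
    local pattern missing=0
    for pattern in "$@"; do
        if ! grep -Fxq -- "$pattern" "$ignore_file" 2>/dev/null; then
            echo "missing: ${pattern}"
            missing=1
        fi
    done
    return "$missing"
}

# Usage:
#   missing_ignore_patterns .dockerignore .venv/ .git/ logs/ '*.log'
```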
2. Multi-stage Dockerfile for Registry
Impact: 60-70% faster rebuilds
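# Stage 1: Dependencies (layer stays cached until requirements.txt changes)
FROM python:3.12-slim AS deps
ENV PYTHONUNBUFFERED=1
RUN pip install --no-cache-dir uv
COPY requirements.txt .
# --system installs into the image's Python; without it uv expects a virtual environment
RUN uv pip install --system -r requirements.txt
# Stage 2: Runtime
FROM python:3.12-slim
ENV PYTHONUNBUFFERED=1
# Copies installed packages only; console scripts in /usr/local/bin are not carried over
COPY --from=deps /usr/local/lib/python3.12/site-packages /usr/local/lib/python3.12/site-packages
WORKDIR /app
COPY . /app/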
3. Parallel Builds
Impact: 40-50% faster total build time
# In build_and_run.sh
docker-compose build --parallel
4. Pre-download Models in Build Stage
Impact: Eliminates 2-10 minute startup delay
# Add to Dockerfile.registry after dependencies
RUN mkdir -p /app/registry/models/all-MiniLM-L6-v2 && \
huggingface-cli download sentence-transformers/all-MiniLM-L6-v2 \
--local-dir /app/registry/models/all-MiniLM-L6-v2
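Once the model is baked into the image, the entrypoint no longer needs to download it on every start. A minimal bash sketch of such a guard follows; model_is_cached and ensure_model are hypothetical helpers, not the current contents of registry-entrypoint.sh, and treating config.json as the marker of a complete download is an assumption.

```shell
# Returns 0 if the directory looks like a downloaded model, 1 otherwise.
# Assumption: the presence of config.json marks a complete download.
model_is_cached() {
    local model_dir="$1"
    [ -f "${model_dir}/config.json" ]
}

# Runs the given download command only when the model is missing.
ensure_model() {
    local model_dir="$1"
    shift
    if model_is_cached "$model_dir"; then
        echo "model found in ${model_dir}, skipping download"
    else
        echo "model missing in ${model_dir}, downloading"
        "$@"
    fi
}

# Usage in the entrypoint:
#   ensure_model /app/registry/models/all-MiniLM-L6-v2 \
#       huggingface-cli download sentence-transformers/all-MiniLM-L6-v2 \
#       --local-dir /app/registry/models/all-MiniLM-L6-v2
```

Keeping the download as a fallback means an image built without the model still starts, just more slowly.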
📈 Expected Performance Improvements
| Optimization | Build Time Reduction | Startup Time Reduction |
|---|---|---|
| .dockerignore | 30-50% | 0% |
| Multi-stage builds | 60-70% | 0% |
| Parallel builds | 40-50% | 0% |
| Pre-download models | 0% | 90-95% |
| Combined | 85-95% | 90-95% |
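The issue does not say how the Combined row was derived. One way to sanity-check it is to assume each optimization removes its percentage from whatever build time remains after the others, so the reductions multiply rather than add. That independence is an assumption, and combined_reduction below is a hypothetical bash helper written only for this check.

```shell
# Combines percentage reductions multiplicatively:
#   combined = 1 - (1 - r1) * (1 - r2) * ...
# Arguments are percentages (e.g. 30 for 30%); prints the combined percentage.
combined_reduction() {
    awk 'BEGIN {
        remaining = 1.0
        for (i = 1; i < ARGC; i++) {
            remaining *= (1 - ARGV[i] / 100)
        }
        printf "%.1f\n", (1 - remaining) * 100
    }' "$@"
}

combined_reduction 30 60 40   # lower bounds from the table, prints 83.2
combined_reduction 50 70 50   # upper bounds from the table, prints 92.5
```

Under this assumption the build-time range comes out at roughly 83-93%, slightly below the 85-95% in the table, so the Combined figure is best read as an estimate to confirm by measurement.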
🎯 Implementation Plan
Phase 1 (Week 1) - Immediate Wins
- Add comprehensive .dockerignore file
- Implement parallel builds in build_and_run.sh
- Pre-download ML models during build stage
- Add build timing instrumentation
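For the timing instrumentation, a small wrapper in build_and_run.sh may be enough to start with. The following is a bash sketch under that assumption; time_step is a hypothetical name and the labels in the usage comment are illustrative.

```shell
# Runs a command, then reports its wall-clock duration in seconds on stderr.
# The command's exit status is passed through unchanged.
time_step() {
    local label="$1"
    shift
    local start end status=0
    start="$(date +%s)"
    "$@" || status=$?
    end="$(date +%s)"
    echo "[timing] ${label}: $((end - start))s (exit ${status})" >&2
    return "$status"
}

# Usage in build_and_run.sh:
#   time_step "build" docker-compose build --parallel
#   time_step "up"    docker-compose up -d
```

Writing the timings to stderr keeps them out of any output that other tooling parses, and one-second resolution is sufficient for builds measured in minutes.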
Phase 2 (Week 2) - Architecture Improvements
- Implement multi-stage Dockerfile for registry
- Optimize health check intervals
- Improve dependency layer caching
Phase 3 (Week 3) - Advanced Optimizations
- Create base ML image with pre-installed dependencies
- Implement model caching volumes
- Optimize service startup dependencies
📋 Success Metrics
Target Performance Goals
- Total build time: <2 minutes (currently 10-15 minutes)
- Container startup time: <30 seconds (currently 2-10 minutes)
- Image sizes: <1GB per service
- Cache hit rates: >80%
Validation Commands
# Measure build time
time docker-compose build
# Measure startup time
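# --wait blocks until services are running/healthy (requires Docker Compose v2);
# without it, time only measures how long "up -d" takes to return
time docker-compose up -d --wait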
# Check image sizes
docker images --format "table {{.Repository}}\t{{.Size}}"
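The image-size target can also be checked automatically instead of by reading the table. This is a hedged bash sketch: check_image_sizes is a hypothetical helper, the image names in the usage comment are placeholders, and reading "<1GB" as 1000000000 bytes is an assumption.

```shell
# Reads "<name> <size-in-bytes>" lines on stdin and reports every entry whose
# size exceeds the limit. Returns 1 if any entry is over the limit, else 0.
check_image_sizes() {
    local limit_bytes="$1"
    awk -v limit="$limit_bytes" '
        $2 > limit { print $1 " exceeds limit: " $2 " bytes"; bad = 1 }
        END { exit bad }
    '
}

# Usage (image names are placeholders):
#   for img in registry auth-server; do
#       echo "$img $(docker image inspect --format '{{.Size}}' "$img")"
#   done | check_image_sizes 1000000000
```

The non-zero exit status makes this usable as a CI gate once the target sizes are agreed.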
🔧 Technical Details
The full performance analysis includes:
- Detailed instrumentation strategy for build_and_run.sh
- Layer-by-layer Docker build analysis
- Medium-term solutions (base images, caching volumes)
- Long-term architectural improvements (dependency splitting)
This issue addresses critical performance bottlenecks that significantly impact developer productivity and deployment efficiency. The proposed optimizations can reduce build times from 10-15 minutes to under 2 minutes, and startup times from 2-10 minutes to under 30 seconds.