Skip to content

Conversation

vrushankportkey
Copy link
Collaborator

Description

New Gateway strategy: Sample

Quick info:
The sample strategy fans out a single gateway request to multiple targets in parallel and returns the first-success response. It’s useful for shaving tail latency, probing multiple providers at once, and improving reliability without waiting for sequential fallbacks.

How it works

  • Targets listed under strategy.mode = "sample" are invoked concurrently.
  • Winner selection:
    • If strategy.on_status_codes is set: first response whose HTTP status is NOT in that list wins. Gateway exceptions also count as a win to break out early.
    • Otherwise: first OK (2xx) response wins.
  • Cancellation:
    • If strategy.cancel_others = true, once the winner is decided, all in-flight sibling requests are aborted.
    • If false, non-winners are allowed to finish silently.
  • Circuit breaker: unhealthy targets are filtered (parity with other strategies).
  • Caching, hooks, retries, logging: same pipeline as single/fallback/loadbalance. Each target logs independently; the gateway still returns a single response.

Current limitations

  • Streaming requests (SSE/chunked/ReadableStream input) are not supported under sample in v1.
  • Costs: fan-out increases spend and may hit per-provider rate limits. Use on_status_codes thoughtfully to prevent fast-fail responses from “winning.”

Configuration

  • mode: must be "sample".
  • cancel_others: optional boolean; aborts losers when true.
  • on_status_codes: optional array; statuses to treat as “fail” for winner selection.

Minimal example

{
  "strategy": {
    "mode": "sample",
    "cancel_others": false
  },
  "targets": [
    {
      "provider": "openai",
      "api_key": "<OPENAI_KEY>",
      "override_params": { "model": "gpt-4o-mini" }
    },
    {
      "provider": "google",
      "api_key": "<GEMINI_KEY>",
      "override_params": { "model": "gpt-2.5-flash" }
    }
  ]
}

With winner policy guardrails

{
  "strategy": {
    "mode": "sample",
    "cancel_others": true,
    "on_status_codes": [400, 401, 403, 404, 409, 422, 429, 500, 502, 503, 504]
  },
  "targets": [
    { "provider": "openai", "api_key": "<OPENAI_KEY>", "override_params": { "model": "gpt-4o" } },
    { "provider": "groq", "api_key": "<GROQ_KEY>", "override_params": { "model": "meta-llama/llama-4-scout-17b-16e-instruct" } },
    { "provider": "google", "api_key": "<GOOGLE_KEY>", "override_params": { "model": "gemini-2.0-flash" } }
  ]
}

Request example (curl)

curl -s -X POST http://localhost:8787/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "x-portkey-config: {\"strategy\":{\"mode\":\"sample\",\"cancel_others\":true,\"on_status_codes\":[400,401,403,404,409,422,429,500,502,503,504]},\"targets\":[{\"provider\":\"openai\",\"api_key\":\"<OPENAI_KEY>\",\"override_params\":{\"model\":\"gpt-4o\"}},{\"provider\":\"groq\",\"api_key\":\"<GROQ_KEY>\",\"override_params\":{\"model\":\"meta-llama/llama-4-scout-17b-16e-instruct\"}},{\"provider\":\"google\",\"api_key\":\"<GOOGLE_KEY>\",\"override_params\":{\"model\":\"gemini-2.0-flash\"}}]}" \
  -d '{"messages":[{"role":"user","content":"Say hello from sample routing."}]}'

Observability

  • Logs/metrics: each target produces an independent log entry (transformed request/response, timings, cache). With cancel_others=true aborted targets show abort errors; with false, they complete normally.

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • [s] New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Refactoring (no functional changes)

How Has This Been Tested?

  • Unit Tests
  • Integration Tests
  • Manual Testing

Screenshots (if applicable)

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Related Issues

Copy link
Contributor

matter-code-review bot commented Sep 1, 2025

Code Quality refactoring

Description

Summary By MatterAI MatterAI logo

🔄 What Changed

The start-test.js script has been modified to remove the explicit port configuration for the spawned start-server.js process. Previously, the script attempted to use process.env.PORT or '8790' via a --port argument; now, start-server.js is launched without this specific port argument, implying it will use its internal default port.

🔍 Impact of the Change

This change simplifies the start-test.js script by removing the logic for setting a custom port. The start-server.js will now rely on its default port, which might streamline local testing setups if the default is consistent. However, it removes the flexibility to easily override the test server's port via an environment variable directly from start-test.js. This modification is likely a refinement related to the broader "Sample Strategy" feature, ensuring the test environment aligns with new architectural decisions.

📁 Total Files Changed

  • start-test.js: Removed explicit port configuration for the spawned server process.

🧪 Test Added

N/A - This pull request modifies an existing test utility script but does not introduce new functional tests for the application logic.

🔒Security Vulnerabilities

N/A - No new security vulnerabilities were introduced or addressed by this change.

Motivation

This change refines the test environment setup for the "Sample Strategy" by simplifying the start-test.js script's port management, aligning it with the intended default behavior of the start-server.js process.

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Refactoring (no functional changes)

How Has This Been Tested?

  • Unit Tests
  • Integration Tests
  • Manual Testing (Verification that the start-test.js script still successfully launches the server for testing purposes.)

Screenshots (if applicable)

N/A

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Related Issues

N/A

Tip

Quality Recommendations

  1. Clarify the new default port for start-server.js and document how it should be configured or accessed in the test environment.

  2. Ensure that start-server.js has a well-defined default port or a clear mechanism for port configuration when the --port argument is omitted.

Tanka Poem ♫

Port removed from script,
Server finds its own true home.
Default now reigns,
Simpler tests, a quiet hum.
Code evolves, a cosmic dance. ✨

Sequence Diagram

sequenceDiagram
    participant TestScript as start-test.js
    participant NodeChildProcess as node:child_process
    participant ServerApp as build/start-server.js

    TestScript->>+NodeChildProcess: spawn('node', ['build/start-server.js', '--headless'], { stdio: 'inherit' })
    Note over NodeChildProcess: Executes 'node build/start-server.js --headless'
    NodeChildProcess->>+ServerApp: Start server process
    ServerApp-->>-NodeChildProcess: Server initialized (using default port)
    NodeChildProcess-->>-TestScript: Process spawned successfully
    TestScript->>TestScript: app.on('exit', (err) => { ... })

Loading

Copy link
Contributor

@matter-code-review matter-code-review bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implements SAMPLE strategy with abort signal support. Good implementation but has a critical type bug and missing error handling.

const cancelOthers = currentTarget.strategy?.cancelOthers;

// v1 limitation: do not support sampling when request body is a ReadableStream
if (request instanceof ReadableStream) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🐛 Bug Fix

Issue: Type confusion bug - request instanceof ReadableStream will always be false. The request parameter is a Request object, not a ReadableStream. This check should verify if the request body is a stream.
Fix: Check the request body type instead of the request object type
Impact: Prevents incorrect rejection of streaming requests, ensuring proper validation

Suggested change
if (request instanceof ReadableStream) {
// v1 limitation: do not support sampling when request body is a ReadableStream
if (request.body instanceof ReadableStream) {

Comment on lines +774 to +778
const isSuccess =
(Array.isArray(onStatusCodes) &&
!onStatusCodes.includes(resp?.status)) ||
(!onStatusCodes && resp?.ok) ||
gatewayException;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🐛 Bug Fix

Issue: Logic error in success condition - the boolean OR operator | should be logical OR ||. This causes incorrect type coercion and wrong success evaluation.
Fix: Use logical OR operator for proper boolean evaluation
Impact: Fixes success detection logic to properly identify successful responses

Suggested change
const isSuccess =
(Array.isArray(onStatusCodes) &&
!onStatusCodes.includes(resp?.status)) ||
(!onStatusCodes && resp?.ok) ||
gatewayException;
const isSuccess =
(Array.isArray(onStatusCodes) &&
!onStatusCodes.includes(resp?.status)) ||
(!onStatusCodes && resp?.ok) ||
gatewayException;

for (const ctl of controllers) {
try {
ctl.abort();
} catch {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Code Refactor

Issue: Empty catch block silently ignores abort errors, making debugging difficult
Fix: Add minimal error logging for abort operations
Impact: Improves debugging capability while maintaining non-blocking behavior

Suggested change
} catch {}
} catch (e) {
// Ignore abort errors but log for debugging
console.debug('Abort controller error:', e);
}

Comment on lines +780 to +788
winnerResolved = true;
resolveWinner(resp);
if (cancelOthers) {
for (const ctl of controllers) {
try {
ctl.abort();
} catch {}
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Code Refactor

Issue: Duplicate abort controller cleanup logic - same code block appears twice (lines 782-788 and 812-818)
Fix: Extract abort cleanup into a reusable function
Impact: Reduces code duplication and improves maintainability

Suggested change
winnerResolved = true;
resolveWinner(resp);
if (cancelOthers) {
for (const ctl of controllers) {
try {
ctl.abort();
} catch {}
}
}
const cleanupControllers = () => {
if (cancelOthers) {
for (const ctl of controllers) {
try {
ctl.abort();
} catch (e) {
console.debug('Abort controller error:', e);
}
}
}
};
winnerResolved = true;
resolveWinner(resp);
cleanupControllers();

Comment on lines +13 to +19
if (externalAbortSignal) {
if (externalAbortSignal.aborted) {
controller.abort();
} else {
externalAbortSignal.addEventListener('abort', () => controller.abort());
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🐛 Bug Fix

Issue: Race condition - external abort signal could be triggered between the aborted check and event listener registration
Fix: Register event listener first, then check abort status
Impact: Prevents missed abort signals and ensures reliable cancellation

Suggested change
if (externalAbortSignal) {
if (externalAbortSignal.aborted) {
controller.abort();
} else {
externalAbortSignal.addEventListener('abort', () => controller.abort());
}
}
if (externalAbortSignal) {
externalAbortSignal.addEventListener('abort', () => controller.abort());
if (externalAbortSignal.aborted) {
controller.abort();
}
}

Copy link
Contributor

Clean simplification removing hardcoded port configuration. Code change is straightforward with no critical issues detected.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant