This repository contains everything necessary to replicate a load test against a vLLM server running the openai/gpt-oss-20b model, in which 98.5% of requests completed successfully. (The test was run on a single NVIDIA H100; three NVIDIA RTX 5090s are expected to deliver similar performance.)
Command to start the vLLM server:
docker run --gpus all \
-p 8000:8000 \
--ipc=host \
vllm/vllm-openai:v0.10.1 \
--model openai/gpt-oss-20b \
--api-key dummyapikey \
--async-scheduling
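Once the container is up, the server exposes an OpenAI-compatible API on port 8000. The following is a minimal sketch of how to verify it is responding before starting the load test (the base URL and API key simply mirror the flags in the command above):

import requests  # any HTTP client works; requests is used here for brevity

# OpenAI-compatible endpoint exposed by the vLLM container started above.
BASE_URL = "http://localhost:8000/v1"
API_KEY = "dummyapikey"  # must match the --api-key flag passed to the server

resp = requests.get(
    f"{BASE_URL}/models",
    headers={"Authorization": f"Bearer {API_KEY}"},
)
resp.raise_for_status()
print(resp.json())  # should list openai/gpt-oss-20b as an available model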
And to run the load test:
python load_test_vllm_gpt_oss_20b.py
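The actual script lives in this repository; the sketch below only illustrates the kind of load test it performs, assuming the official openai Python client and the server started above. The request count, concurrency level, prompt, and token limit are illustrative placeholders, not the script's real settings:

import asyncio
from openai import AsyncOpenAI  # pip install openai

client = AsyncOpenAI(base_url="http://localhost:8000/v1", api_key="dummyapikey")

N_REQUESTS = 200   # illustrative values, not taken from the script
CONCURRENCY = 32

async def one_request(sem: asyncio.Semaphore) -> bool:
    # Returns True on success, False on any failure (timeout, HTTP error, ...).
    async with sem:
        try:
            await client.chat.completions.create(
                model="openai/gpt-oss-20b",
                messages=[{"role": "user", "content": "Say hello in one word."}],
                max_tokens=16,
                timeout=60,
            )
            return True
        except Exception:
            return False

async def main() -> None:
    # Cap in-flight requests so the client, not the server, sets the pace.
    sem = asyncio.Semaphore(CONCURRENCY)
    results = await asyncio.gather(*(one_request(sem) for _ in range(N_REQUESTS)))
    ok = sum(results)
    print(f"success rate: {ok / N_REQUESTS:.1%} ({ok}/{N_REQUESTS})")

asyncio.run(main())

The success rate printed at the end is the same metric the 98.5% figure above refers to: the fraction of requests that completed without an error.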