Commit 2e4a590

[router] Release router 0.1.0 with dynamic scaling and fault tolerance (#2455)
1 parent c0ee46f commit 2e4a590

2 files changed: +59 -1 lines changed

rust/pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"

 [project]
 name = "sglang-router"
-version = "0.0.11"
+version = "0.1.0"
 description = "SGLang router is a standalone module implemented in Rust to achieve data parallelism across SGLang instances."
 authors = [{name = "Byron Hsu", email = "byronhsu1230@gmail.com"}]
 requires-python = ">=3.8"

rust/v0.1.0.md

Lines changed: 58 additions & 0 deletions
@@ -0,0 +1,58 @@
# SGLang Router v0.1.0: Dynamic Scaling and Fault Tolerance

We have released `sglang-router` v0.1.0 with dynamic scaling and fault tolerance! A router must be able to change its number of workers at runtime and to handle worker failures. To achieve this, we have implemented the following features:

## 1. Dynamic scaling: The router can dynamically scale the number of workers based on the request load.

We offer `/add_worker` and `/remove_worker` APIs to dynamically add workers to, or remove workers from, the router.

- `/add_worker`

Usage:

```bash
$ curl -X POST "http://localhost:30000/add_worker?url=http://worker_url_1"
```

Example:

```bash
# Launch a new worker (keep it running, e.g. in a separate terminal)
$ python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3.1-8B-Instruct --port 30001
# Register the worker with the router
$ curl -X POST "http://localhost:30000/add_worker?url=http://127.0.0.1:30001"
Successfully added worker: http://127.0.0.1:30001
```

- `/remove_worker`

Usage:

```bash
$ curl -X POST "http://localhost:30000/remove_worker?url=http://worker_url_1"
```

Example:

```bash
$ curl -X POST "http://localhost:30000/remove_worker?url=http://127.0.0.1:30001"
Successfully removed worker: http://127.0.0.1:30001
```

Note:

- For the cache-aware router, the worker will also be removed from the tree and the queues.
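
The two endpoints above can also be called from a script, for example from an autoscaling hook. The following is a minimal sketch using only the Python standard library. The helper names `build_worker_request` and `call_router` are hypothetical and not part of `sglang-router`. The router address and the plain-text reply are taken from the curl examples above.

```python
import urllib.parse
import urllib.request

# Assumption: the router listens on the address used in the curl examples.
ROUTER_URL = "http://localhost:30000"


def build_worker_request(
    action: str, worker_url: str, router_url: str = ROUTER_URL
) -> urllib.request.Request:
    """Build the POST request for /add_worker or /remove_worker (hypothetical helper)."""
    if action not in ("add_worker", "remove_worker"):
        raise ValueError(f"unsupported action: {action}")
    # Leave ':' and '/' unescaped so the query string matches the curl examples.
    query = "url=" + urllib.parse.quote(worker_url, safe=":/")
    return urllib.request.Request(f"{router_url}/{action}?{query}", method="POST")


def call_router(action: str, worker_url: str) -> str:
    """Send the request and return the router's plain-text reply."""
    request = build_worker_request(action, worker_url)
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read().decode()
```

With a router and a worker running as in the examples above, `call_router("add_worker", "http://127.0.0.1:30001")` sends the same request as the first curl example, and `call_router("remove_worker", "http://127.0.0.1:30001")` sends the second.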

## 2. Fault tolerance: The router can handle worker failures and automatically remove the failed worker from the router.

We provide retry-based fault tolerance.

1. If a request to a worker fails `max_worker_retries` times, the router removes that worker and moves on to the next worker.
2. If the total number of retries exceeds `max_total_retries`, the router returns an error.

Note:

- `max_worker_retries` is 3 and `max_total_retries` is 6 by default.
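
The retry policy described above can be illustrated with a small simulation. This is a sketch of the stated rules only, not the router's actual Rust implementation. The names `route_with_retries`, `send`, and `AllRetriesFailed` are hypothetical. Picking the first remaining worker and giving up once the failure count reaches `max_total_retries` are both assumptions.

```python
from typing import Callable, List


class AllRetriesFailed(Exception):
    """Raised when the total retry budget is used up or no workers remain."""


def route_with_retries(
    workers: List[str],
    send: Callable[[str], str],
    max_worker_retries: int = 3,
    max_total_retries: int = 6,
) -> str:
    """Try workers in order; drop a worker after max_worker_retries failures."""
    total_failures = 0
    while workers and total_failures < max_total_retries:
        worker = workers[0]  # assumption: always pick the first remaining worker
        worker_failures = 0
        while (
            worker_failures < max_worker_retries
            and total_failures < max_total_retries
        ):
            try:
                return send(worker)
            except ConnectionError:
                worker_failures += 1
                total_failures += 1
        if worker_failures >= max_worker_retries:
            # Rule 1: this worker failed too often, so remove it and move on.
            workers.remove(worker)
    # Rule 2: the total budget is exhausted (or every worker was removed).
    raise AllRetriesFailed(f"gave up after {total_failures} failed attempts")
```

With the default limits, two consecutive unhealthy workers use up the whole budget (3 + 3 failed attempts), so the sketch raises an error even if a third, healthy worker is registered.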

Closing remarks:

1. Please read the full usage at https://sgl-project.github.io/router/router.html
2. The feature is still under active improvement, so please don't hesitate to raise issues or submit PRs if you have any suggestions or feedback.
