
Commit 0f79162

tengomucho and danieldk authored
chore: prepare version 3.3.5 (#3314)
* chore: prepare version 3.3.5
* black
* neuron: black
* Update hf-xet in uv lockfile
* Attempt to fix API doc check failure
  Add `error_type` where missing.
* Pin redocly version
* Sync redocly with Nix for now

Co-authored-by: Daniël de Kok <me@danieldk.eu>
1 parent 06d9d88 commit 0f79162

37 files changed: +143, -112 lines

.github/workflows/autodocs.yaml

Lines changed: 1 addition & 1 deletion
@@ -41,5 +41,5 @@ jobs:
 
       - name: Check that documentation is up-to-date
         run: |
-          npm install -g @redocly/cli
+          npm install -g @redocly/cli@1.34.2
           python update_doc.py --check
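In effect, the doc check now runs against a fixed redocly release instead of whatever `latest` resolves to. A minimal sketch of reproducing the check locally, assuming Node.js and the repository's Python environment are set up:

```sh
# Reproduce the CI step above locally (sketch; run from the repo root).
npm install -g @redocly/cli@1.34.2   # pinned, matching the workflow
python update_doc.py --check         # fails if docs/openapi.json is out of date
```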

Cargo.lock

Lines changed: 8 additions & 8 deletions
Some generated files are not rendered by default.

Cargo.toml

Lines changed: 1 addition & 1 deletion
@@ -21,7 +21,7 @@ default-members = [
 resolver = "2"
 
 [workspace.package]
-version = "3.3.4-dev0"
+version = "3.3.5-dev0"
 edition = "2021"
 authors = ["Olivier Dehaene"]
 homepage = "https://github.com/huggingface/text-generation-inference"

README.md

Lines changed: 3 additions & 3 deletions
@@ -84,7 +84,7 @@ model=HuggingFaceH4/zephyr-7b-beta
 volume=$PWD/data
 
 docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data \
-    ghcr.io/huggingface/text-generation-inference:3.3.4 --model-id $model
+    ghcr.io/huggingface/text-generation-inference:3.3.5 --model-id $model
 ```
 
 And then you can make requests like
@@ -121,7 +121,7 @@ curl localhost:8080/v1/chat/completions \
 
 **Note:** To use NVIDIA GPUs, you need to install the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html). We also recommend using NVIDIA drivers with CUDA version 12.2 or higher. For running the Docker container on a machine with no GPUs or CUDA support, it is enough to remove the `--gpus all` flag and add `--disable-custom-kernels`, please note CPU is not the intended platform for this project, so performance might be subpar.
 
-**Note:** TGI supports AMD Instinct MI210 and MI250 GPUs. Details can be found in the [Supported Hardware documentation](https://huggingface.co/docs/text-generation-inference/installation_amd#using-tgi-with-amd-gpus). To use AMD GPUs, please use `docker run --device /dev/kfd --device /dev/dri --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:3.3.4-rocm --model-id $model` instead of the command above.
+**Note:** TGI supports AMD Instinct MI210 and MI250 GPUs. Details can be found in the [Supported Hardware documentation](https://huggingface.co/docs/text-generation-inference/installation_amd#using-tgi-with-amd-gpus). To use AMD GPUs, please use `docker run --device /dev/kfd --device /dev/dri --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:3.3.5-rocm --model-id $model` instead of the command above.
 
 To see all options to serve your models (in the [code](https://github.com/huggingface/text-generation-inference/blob/main/launcher/src/main.rs) or in the cli):
 ```
@@ -152,7 +152,7 @@ volume=$PWD/data # share a volume with the Docker container to avoid downloading
 token=<your cli READ token>
 
 docker run --gpus all --shm-size 1g -e HF_TOKEN=$token -p 8080:80 -v $volume:/data \
-    ghcr.io/huggingface/text-generation-inference:3.3.4 --model-id $model
+    ghcr.io/huggingface/text-generation-inference:3.3.5 --model-id $model
 ```
 
 ### A note on Shared Memory (shm)
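For reference, the CPU-only invocation described in the NVIDIA note above is not spelled out in the README; a sketch of it, following the note's own instructions (drop `--gpus all`, add the `--disable-custom-kernels` launcher flag):

```sh
# Sketch of the CPU-only variant implied by the note above; per the
# README, CPU is not the intended platform and performance may be subpar.
docker run --shm-size 1g -p 8080:80 -v $volume:/data \
    ghcr.io/huggingface/text-generation-inference:3.3.5 \
    --model-id $model --disable-custom-kernels
```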

backends/gaudi/examples/docker_commands/docker_commands.md

Lines changed: 5 additions & 5 deletions
@@ -19,7 +19,7 @@ docker run -p 8080:80 \
     --ipc=host \
     -v $volume:/data \
     -e HF_TOKEN=$hf_token \
-    ghcr.io/huggingface/text-generation-inference:3.3.4-gaudi \
+    ghcr.io/huggingface/text-generation-inference:3.3.5-gaudi \
     --model-id $model \
     --max-input-tokens 1024 --max-total-tokens 2048 \
     --max-batch-prefill-tokens 2048 --max-batch-size 32 \
@@ -39,7 +39,7 @@ docker run -p 8080:80 \
     --ipc=host \
     -v $volume:/data \
     -e HF_TOKEN=$hf_token \
-    ghcr.io/huggingface/text-generation-inference:3.3.4-gaudi \
+    ghcr.io/huggingface/text-generation-inference:3.3.5-gaudi \
     --model-id $model \
     --sharded true --num-shard 8 \
     --max-input-tokens 1024 --max-total-tokens 2048 \
@@ -58,7 +58,7 @@ docker run -p 8080:80 \
     --cap-add=sys_nice \
     --ipc=host \
     -v $volume:/data \
-    ghcr.io/huggingface/text-generation-inference:3.3.4-gaudi \
+    ghcr.io/huggingface/text-generation-inference:3.3.5-gaudi \
     --model-id $model \
     --max-input-tokens 4096 --max-batch-prefill-tokens 16384 \
     --max-total-tokens 8192 --max-batch-size 4
@@ -81,7 +81,7 @@ docker run -p 8080:80 \
     --ipc=host \
     -v $volume:/data \
     -e HF_TOKEN=$hf_token \
-    ghcr.io/huggingface/text-generation-inference:3.3.4-gaudi \
+    ghcr.io/huggingface/text-generation-inference:3.3.5-gaudi \
     --model-id $model \
     --kv-cache-dtype fp8_e4m3fn \
     --max-input-tokens 1024 --max-total-tokens 2048 \
@@ -102,7 +102,7 @@ docker run -p 8080:80 \
     --ipc=host \
     -v $volume:/data \
     -e HF_TOKEN=$hf_token \
-    ghcr.io/huggingface/text-generation-inference:3.3.4-gaudi \
+    ghcr.io/huggingface/text-generation-inference:3.3.5-gaudi \
     --model-id $model \
     --kv-cache-dtype fp8_e4m3fn \
     --sharded true --num-shard 8 \

backends/neuron/tests/server/test_prefill.py

Lines changed: 1 addition & 0 deletions
@@ -56,6 +56,7 @@ def _test_prefill(config_name, generator, batch_size, do_sample):
     assert tokens.ids[0] == expectations[0]
     assert tokens.texts[0] == expectations[1]
 
+
 def test_prefill_truncate(neuron_model_config):
     config_name = neuron_model_config["name"]
     neuron_model_path = neuron_model_config["neuron_model_path"]

docs/openapi.json

Lines changed: 41 additions & 21 deletions
@@ -10,7 +10,7 @@
       "name": "Apache 2.0",
       "url": "https://www.apache.org/licenses/LICENSE-2.0"
     },
-    "version": "3.3.4-dev0"
+    "version": "3.3.5-dev0"
   },
   "paths": {
     "/": {
@@ -57,7 +57,8 @@
               "$ref": "#/components/schemas/ErrorResponse"
             },
             "example": {
-              "error": "Input validation error"
+              "error": "Input validation error",
+              "error_type": "validation"
             }
           }
         }
@@ -70,7 +71,8 @@
               "$ref": "#/components/schemas/ErrorResponse"
             },
            "example": {
-              "error": "Request failed during generation"
+              "error": "Request failed during generation",
+              "error_type": "generation"
             }
           }
         }
@@ -83,7 +85,8 @@
               "$ref": "#/components/schemas/ErrorResponse"
             },
             "example": {
-              "error": "Model is overloaded"
+              "error": "Model is overloaded",
+              "error_type": "overloaded"
             }
           }
         }
@@ -96,7 +99,8 @@
               "$ref": "#/components/schemas/ErrorResponse"
             },
             "example": {
-              "error": "Incomplete generation"
+              "error": "Incomplete generation",
+              "error_type": "incomplete_generation"
             }
           }
         }
@@ -181,7 +185,8 @@
               "$ref": "#/components/schemas/ErrorResponse"
             },
             "example": {
-              "error": "Input validation error"
+              "error": "Input validation error",
+              "error_type": "validation"
             }
           }
         }
@@ -194,7 +199,8 @@
               "$ref": "#/components/schemas/ErrorResponse"
             },
             "example": {
-              "error": "Request failed during generation"
+              "error": "Request failed during generation",
+              "error_type": "generation"
             }
           }
         }
@@ -207,7 +213,8 @@
               "$ref": "#/components/schemas/ErrorResponse"
             },
             "example": {
-              "error": "Model is overloaded"
+              "error": "Model is overloaded",
+              "error_type": "overloaded"
             }
           }
         }
@@ -220,7 +227,8 @@
               "$ref": "#/components/schemas/ErrorResponse"
             },
             "example": {
-              "error": "Incomplete generation"
+              "error": "Incomplete generation",
+              "error_type": "incomplete_generation"
             }
           }
         }
@@ -264,7 +272,8 @@
               "$ref": "#/components/schemas/ErrorResponse"
             },
             "example": {
-              "error": "Input validation error"
+              "error": "Input validation error",
+              "error_type": "validation"
             }
           }
         }
@@ -277,7 +286,8 @@
               "$ref": "#/components/schemas/ErrorResponse"
             },
             "example": {
-              "error": "Request failed during generation"
+              "error": "Request failed during generation",
+              "error_type": "generation"
             }
           }
         }
@@ -290,7 +300,8 @@
               "$ref": "#/components/schemas/ErrorResponse"
             },
             "example": {
-              "error": "Model is overloaded"
+              "error": "Model is overloaded",
+              "error_type": "overloaded"
             }
           }
         }
@@ -303,7 +314,8 @@
               "$ref": "#/components/schemas/ErrorResponse"
             },
             "example": {
-              "error": "Incomplete generation"
+              "error": "Incomplete generation",
+              "error_type": "incomplete_generation"
             }
           }
         }
@@ -558,7 +570,8 @@
               "$ref": "#/components/schemas/ErrorResponse"
             },
             "example": {
-              "error": "Input validation error"
+              "error": "Input validation error",
+              "error_type": "validation"
             }
           }
         }
@@ -571,7 +584,8 @@
               "$ref": "#/components/schemas/ErrorResponse"
             },
             "example": {
-              "error": "Request failed during generation"
+              "error": "Request failed during generation",
+              "error_type": "generation"
             }
           }
         }
@@ -584,7 +598,8 @@
               "$ref": "#/components/schemas/ErrorResponse"
             },
             "example": {
-              "error": "Model is overloaded"
+              "error": "Model is overloaded",
+              "error_type": "overloaded"
             }
           }
         }
@@ -597,7 +612,8 @@
               "$ref": "#/components/schemas/ErrorResponse"
             },
             "example": {
-              "error": "Incomplete generation"
+              "error": "Incomplete generation",
+              "error_type": "incomplete_generation"
             }
           }
         }
@@ -646,7 +662,8 @@
               "$ref": "#/components/schemas/ErrorResponse"
             },
             "example": {
-              "error": "Input validation error"
+              "error": "Input validation error",
+              "error_type": "validation"
             }
           }
         }
@@ -659,7 +676,8 @@
               "$ref": "#/components/schemas/ErrorResponse"
             },
             "example": {
-              "error": "Request failed during generation"
+              "error": "Request failed during generation",
+              "error_type": "generation"
             }
           }
         }
@@ -672,7 +690,8 @@
               "$ref": "#/components/schemas/ErrorResponse"
             },
             "example": {
-              "error": "Model is overloaded"
+              "error": "Model is overloaded",
+              "error_type": "overloaded"
             }
          }
        }
@@ -685,7 +704,8 @@
               "$ref": "#/components/schemas/ErrorResponse"
             },
             "example": {
-              "error": "Incomplete generation"
+              "error": "Incomplete generation",
+              "error_type": "incomplete_generation"
             }
           }
         }
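These example updates mean every documented error response now carries both `error` and `error_type`. A hedged sketch of triggering the validation case against a running server (the exact message and limits depend on the deployment; the parameter value here is chosen only to fail validation):

```sh
# Sketch: an out-of-range parameter should return HTTP 422 with the
# documented shape, e.g.
#   {"error": "Input validation error: ...", "error_type": "validation"}
curl localhost:8080/generate \
    -X POST \
    -H 'Content-Type: application/json' \
    -d '{"inputs": "Hello", "parameters": {"temperature": -1}}'
```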
