
Commit 081b926: v0.8.0
Parent: b8b950b

6 files changed: 12 additions and 18 deletions


Cargo.lock

Lines changed: 4 additions & 4 deletions
Generated file; diff not rendered by default.

Cargo.toml

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@ members = [
 ]

 [workspace.package]
-version = "0.7.0"
+version = "0.8.0"
 edition = "2021"
 authors = ["Olivier Dehaene"]
 homepage = "https://github.com/huggingface/text-generation-inference"

README.md

Lines changed: 5 additions & 2 deletions
@@ -42,7 +42,7 @@ to power LLMs api-inference widgets.
 - Serve the most popular Large Language Models with a simple launcher
 - Tensor Parallelism for faster inference on multiple GPUs
 - Token streaming using Server-Sent Events (SSE)
-- [Continous batching of incoming requests](https://github.com/huggingface/text-generation-inference/tree/main/router) for increased total throughput
+- [Continuous batching of incoming requests](https://github.com/huggingface/text-generation-inference/tree/main/router) for increased total throughput
 - Optimized transformers code for inference using [flash-attention](https://github.com/HazyResearch/flash-attention) on the most popular architectures
 - Quantization with [bitsandbytes](https://github.com/TimDettmers/bitsandbytes)
 - [Safetensors](https://github.com/huggingface/safetensors) weight loading
@@ -61,6 +61,9 @@ to power LLMs api-inference widgets.
 - [Llama](https://github.com/facebookresearch/llama)
 - [OPT](https://huggingface.co/facebook/opt-66b)
 - [SantaCoder](https://huggingface.co/bigcode/santacoder)
+- [Starcoder](https://huggingface.co/bigcode/starcoder)
+- [Falcon 7B](https://huggingface.co/tiiuae/falcon-7b)
+- [Falcon 40B](https://huggingface.co/tiiuae/falcon-40b)

 Other architectures are supported on a best effort basis using:

@@ -81,7 +84,7 @@ model=bigscience/bloom-560m
 num_shard=2
 volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run

-docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:0.7 --model-id $model --num-shard $num_shard
+docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:0.8 --model-id $model --num-shard $num_shard
 ```
 **Note:** To use GPUs, you need to install the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html). We also recommend using NVIDIA drivers with CUDA version 11.8 or higher.
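Once the container from the hunk above is running, a request can be sent against it to verify the retagged 0.8 image end to end. A minimal sketch of such a request, assuming the generate route this repository's README documents alongside the quickstart and the -p 8080:80 port mapping from the command above (prompt and parameters are illustrative):

```
# POST a generation request to the locally mapped port; the JSON body
# follows the schema described in docs/openapi.json.
curl 127.0.0.1:8080/generate \
    -X POST \
    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
    -H 'Content-Type: application/json'
```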

docs/openapi.json

Lines changed: 1 addition & 1 deletion
@@ -10,7 +10,7 @@
       "name": "Apache 2.0",
       "url": "https://www.apache.org/licenses/LICENSE-2.0"
     },
-    "version": "0.7.0"
+    "version": "0.8.0"
   },
   "paths": {
     "/": {

server/pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 [tool.poetry]
 name = "text-generation-server"
-version = "0.7.0"
+version = "0.8.0"
 description = "Text Generation Inference Python gRPC Server"
 authors = ["Olivier Dehaene <olivier@huggingface.co>"]
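As the hunks above show, the same release version is duplicated across Cargo.toml, docs/openapi.json, and server/pyproject.toml, so a bump like this one has to touch all three in lockstep. A quick sanity check that they agree, run from the repository root (a sketch; any equivalent search works):

```
# Surface the version lines in the three bumped files; each should
# now read 0.8.0 (grep may also match unrelated "version" keys).
grep -n 'version' Cargo.toml docs/openapi.json server/pyproject.toml
```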
66

supported_models.json

Lines changed: 0 additions & 9 deletions
This file was deleted.
