<imgalt="Swagger API documentation"src="https://img.shields.io/badge/API-Swagger-informational">
</a>
A Rust, Python and gRPC server for text generation inference. Used in production at [HuggingFace](https://huggingface.co)
to power Hugging Chat, the Inference API and Inference Endpoints.
</div>
## Table of contents
The easiest way of getting started is using the official Docker container:

```shell
model=tiiuae/falcon-7b-instruct
volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run
docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:1.0.0 --model-id $model
```
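
Once the container is running you can query it over HTTP. As a minimal sketch, assuming the server was started with the port mapping above and exposes its `/generate` route on port 8080, a request looks like this (the prompt and `max_new_tokens` value are only examples):

```shell
# Send a single generation request to the server started above.
# The JSON payload carries the prompt ("inputs") and generation parameters.
curl 127.0.0.1:8080/generate \
    -X POST \
    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
    -H 'Content-Type: application/json'
```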
**Note:** To use GPUs, you need to install the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html). We also recommend using NVIDIA drivers with CUDA version 11.8 or higher.
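
Before launching the server it can be worth verifying that Docker actually sees your GPUs. One quick check is to run `nvidia-smi` inside a disposable CUDA container; the image tag below is only an example, any CUDA base image will do:

```shell
# If the NVIDIA Container Toolkit is set up correctly, this prints the
# GPUs visible from inside a container, mirroring the host's nvidia-smi.
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi
```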