
Commit a790803

Falcon-180B // Any 🤗 model via --model meta/llama
1 parent 3e8416a commit a790803

File tree

1 file changed: +23 -5 lines changed


README.md

Lines changed: 23 additions & 5 deletions
### Change the Model

For `gpt-3.5-turbo`, use fast mode:

```shell
interpreter --fast
```

In Python, you will need to set the model manually:

```python
interpreter.model = "gpt-3.5-turbo"
```
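For example, a minimal Python sketch (the prompt string is just an illustration):

```python
import interpreter

# Use gpt-3.5-turbo instead of the default model
interpreter.model = "gpt-3.5-turbo"

# Start a chat with a single instruction
interpreter.chat("Plot the first 20 Fibonacci numbers.")
```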
### Running Open Interpreter locally
ⓘ **Issues running locally?** Read our new [GPU setup guide](/docs/GPU.md) and [Windows setup guide](/docs/WINDOWS.md).

You can run `interpreter` in local mode from the command line to use `Code Llama`:
```shell
interpreter --local
```

Or run any Hugging Face model **locally** by using its repo ID (e.g. "tiiuae/falcon-180B"):

```shell
interpreter --model tiiuae/falcon-180B
```
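The equivalent from Python would presumably look like the sketch below; `interpreter.local = True` is an assumption here, mirroring the `--local` flag:

```python
import interpreter

# Assumed to mirror the CLI's --local flag
interpreter.local = True

# Point the interpreter at a Hugging Face repo ID
interpreter.model = "tiiuae/falcon-180B"

# Start an interactive session
interpreter.chat()
```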

#### Local model params

You can easily modify the `max_tokens` and `context_window` (in tokens) of locally running models.

Smaller context windows will use less RAM, so we recommend trying a shorter window if your GPU is failing.

```shell
interpreter --max_tokens 2000 --context_window 16000
```
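If you are configuring this from Python, the same limits can presumably be set as attributes before chatting; the attribute names below are an assumption, mirroring the CLI flags:

```python
import interpreter

interpreter.local = True
interpreter.model = "tiiuae/falcon-180B"

# Assumed to mirror --max_tokens and --context_window
interpreter.max_tokens = 2000
interpreter.context_window = 16000

interpreter.chat()
```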

### Azure Support

0 commit comments
