swap: true doesn't do anything #274
-
If I make 3 requests at the same time to different models in the same group, `swap: true` doesn't do anything (a minimal sketch of the behavior I expected follows the config below). My config:

```yaml
healthCheckTimeout: 1800
models:
  "gemma-3-27b-it":
    cmd: |
      llama-server
      -m /models/gemma-3-27b-it-UD-IQ2_M.gguf
      --ctx-size 2048
      --cache-type-k q8_0
      --cache-type-v q8_0
      --flash-attn
      --mlock
      --port ${PORT}
  "Mistral-Small-24B-Instruct-2501":
    cmd: |
      ik_llama-server
      -m /models/Mistral-Small-24B-Instruct-2501.i1-IQ3_XXS.gguf
      --ctx-size 2048
      --cache-type-k q8_0
      --cache-type-v q8_0
      --mlock
      --gpu-layers 0
      --temp 0.15
      --top-p 1.00
      --flash-attn
      --port ${PORT}
  "EtherealAurora-12B-v2":
    cmd: |
      llama-server
      -m /models/EtherealAurora-12B-v2.i1-Q4_K_M.gguf
      --ctx-size 2048
      --cache-type-k q8_0
      --cache-type-v q8_0
      --mlock
      --gpu-layers 0
      --flash-attn
      --port ${PORT}
  "solar-10.7b-instruct-v1.0":
    cmd: |
      llama-server
      -m /models/solar-10.7b-instruct-v1.0.Q4_K_M.gguf
      --ctx-size 2048
      --cache-type-k q8_0
      --cache-type-v q8_0
      --flash-attn
      --port ${PORT}
  "TiTan-Qwen2.5-0.5B":
    cmd: |
      llama-server
      -m /models/TiTan-Qwen2.5-0.5B-q4_k_m.gguf
      --ctx-size 2048
      --cache-type-k q8_0
      --cache-type-v q8_0
      --temp 0.7
      --top-p 0.9
      --flash-attn
      --keep -1
      --port ${PORT}
groups:
  group1:
    swap: true
    exclusive: false
    members:
      - gemma-3-27b-it
      - Mistral-Small-24B-Instruct-2501
      - EtherealAurora-12B-v2
      - solar-10.7b-instruct-v1.0
  group2:
    swap: false
    exclusive: false
    members:
      - TiTan-Qwen2.5-0.5B
```
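For clarity, this is the behavior I expected from the documented group semantics. A minimal sketch of my understanding, with placeholder model names rather than my real ones:

```yaml
# Minimal sketch of the group semantics I expected (model-a and
# model-b are placeholder names, not models from my real config).
groups:
  group1:
    # swap: true should allow only one member of the group to run at a
    # time: a request for model-b while model-a is loaded should unload
    # model-a before model-b starts.
    swap: true
    # exclusive: false should leave models in other groups running.
    exclusive: false
    members:
      - model-a
      - model-b
```

Instead, with the config above, the simultaneously requested group1 models do not appear to get swapped at all.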
-
Hi, please set the logging level to debug and share the proxy logs from when you're requesting the models simultaneously.
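If it helps, the debug level should be a one-line change at the top of your config. I'm going from memory here, so double-check the key name against the README:

```yaml
# Assumption: llama-swap reads a top-level logLevel key; valid values
# include debug, info, warn, and error (info being the default).
logLevel: debug
```

You can also watch the proxy's log stream in a browser via its /logs endpoint, if I recall correctly.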
-
@Garaymie thanks for reporting this. I fixed it in #277. It was a small fix but tricky to get testing right.