Enhancement: Custom endpoints support #245
HenkieTenkie62 started this conversation in Ideas
Replies: 1 comment · 1 reply
-
llama-swap will stay as close as possible to the OpenAI API. We have allowed some llama-server-specific endpoints, but these are handled on a case-by-case basis.
-
Like most people, I'm working with limited resources and thus need model swapping.
For ASR/STT and document OCR, I'm currently unloading models manually to free up VRAM for processing.
Would it be an idea to support custom endpoints to automate this?
These could reside under llamahost:8080/custom/.
This way llama-swap would be in full control of the process.
Endpoints could be configured like models and placed inside groups, to avoid large models/applications being loaded at the same time and causing an OOM.
Proxy addresses can be kept the same, and the start/stop command lines and the health check endpoint can also be repurposed for this exact goal (see the sketch below).
I guess these requests probably don't follow any OpenAI-style API and thus need no reprocessing, only rerouting.
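Roughly what I'm imagining, as a purely hypothetical config sketch — the `custom` section and its field names don't exist in llama-swap today (they're assumed here, modeled on the existing model and group settings), and the whisper server command is just a placeholder:

```yaml
# Hypothetical sketch -- a "custom" section does not exist in llama-swap;
# the field names below are assumed, mirroring the existing model settings.
custom:
  whisper:
    # placeholder command: any process that serves HTTP would do
    cmd: whisper-server --host 127.0.0.1 --port 9001 -m models/ggml-large-v3.bin
    proxy: http://127.0.0.1:9001
    checkEndpoint: /health    # assumed, mirroring the model health check option

groups:
  # assumed semantics: members swap, so the ASR process and the big LLM
  # are never resident in VRAM at the same time
  vram-heavy:
    swap: true
    members:
      - whisper
      - my-large-llm          # placeholder entry defined under models:
```

A request to llamahost:8080/custom/whisper/inference would then be rerouted unchanged to http://127.0.0.1:9001/inference, with llama-swap only handling the load/unload ordering.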
Would this be an interesting option, or would it go beyond the scope of the project?
I think it would be a great addition.
I've now glued the function in, but the implementation is not very flexible:
main...HenkieTenkie62:llama-swap:main#diff-36e6873dfbfa67366814a9233117652a77b868fa2de6082e6d060fcbb91a5fb8