Description
I'm trying to use different models with LMQL, but it seems that each new model is loaded onto the GPU. Is it possible to unload a model before loading a new one? I've searched through the code but haven't been able to figure out how to unload a model.
Here is the code I use to load a model:
self._llm = lmql.model(
    f"local:llama.cpp:{model.get_model_absolute_path()}",
    tokenizer=model.tokenizer,
    n_gpu_layers=-1,
    n_ctx=4096,
)
I found issue #228, but it refers to loading a model via the CLI ("lmql serve-model"), not in-process loading.
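For what it's worth, since LMQL does not appear to expose an explicit unload API, one workaround that sometimes works with llama.cpp-backed models is to drop every reference to the loaded model and force a garbage-collection pass, so that the underlying llama.cpp object's destructor can release its VRAM before the next model is loaded. The helper below is only a sketch: `unload_model` and `holder` are hypothetical names, and whether this actually frees GPU memory depends on LMQL not keeping its own internal reference to the model.

```python
import gc


def unload_model(holder):
    """Drop the model reference held by our wrapper and run the garbage
    collector, so the backend's destructor can free GPU memory.

    NOTE: this is an assumption-based workaround, not a documented LMQL
    API. If LMQL caches the model internally, the VRAM may not be freed.
    """
    holder._llm = None  # drop our reference to the lmql.model handle
    gc.collect()        # trigger finalizers (e.g. llama.cpp cleanup) now


# Hypothetical usage, mirroring the loading code from the issue:
#
#   unload_model(self)              # release the old model first
#   self._llm = lmql.model(
#       f"local:llama.cpp:{model.get_model_absolute_path()}",
#       tokenizer=model.tokenizer,
#       n_gpu_layers=-1,
#       n_ctx=4096,
#   )
```

If this does not release VRAM, the fallback is to run each model in a separate process (e.g. via `lmql serve-model` as discussed in #228) so the memory is reclaimed when the process exits.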