Unload model from GPU #362

@miqaP

Description

I'm trying to use different models with LMQL, but it seems that each new model is loaded onto the GPU. Is it possible to unload a model before loading a new one? I've searched through the code but haven't been able to figure out how to unload a model.
Here is the code I use to load a model:

self._llm = lmql.model(
    f"local:llama.cpp:{model.get_model_absolute_path()}",
    tokenizer=model.tokenizer,
    n_gpu_layers=-1,
    n_ctx=4096,
)

I found issue #228, but it refers to models loaded via the CLI command `lmql serve-model`.
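For context, with in-process llama.cpp backends GPU memory is typically released when the last Python reference to the model object is dropped and its destructor runs. Whether that works through LMQL depends on whether LMQL keeps an internal reference to the model, which is exactly what is unclear here. The sketch below only demonstrates the generic Python pattern (drop every reference, then force a garbage-collection pass); `FakeModel` is a hypothetical stand-in, not an LMQL or llama.cpp API:

```python
import gc
import weakref

class FakeModel:
    """Hypothetical stand-in for a loaded model. A real llama.cpp
    model object would free its GPU buffers when it is destroyed."""
    def __init__(self):
        self.loaded = True

# Load and use the model, then drop every reference to it
# before constructing the next one.
model = FakeModel()
ref = weakref.ref(model)  # lets us verify the object is really gone

del model     # drop our only reference (in the issue's code: self._llm = None)
gc.collect()  # break any lingering reference cycles

assert ref() is None  # destructor ran, so its memory could be reclaimed
```

If LMQL caches model instances internally, this pattern alone would not free the GPU memory; the cached reference would keep the object alive.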
