
Quantisation with LLaMA.NET #3

@hpretila

Description


LLaMA.NET should support quantisation. This introduces a dilemma between two options:

  • Conversion from model state dicts to ggml is done by invoking whatever Python interpreter is available on the PATH; the library itself only performs the conversion from ggml to quantised ggml.
  • None of it is handled by the library; instead, the conversion scripts are packed alongside it so quantisation stays consistent.
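The first option could be sketched roughly as follows. This is only an illustration, not the LLaMA.NET implementation: the helper name, the paths, and the assumption that llama.cpp's `convert-pth-to-ggml.py` script sits next to the library are all hypothetical.

```python
import shutil

def build_convert_command(model_dir: str, outfile: str) -> list[str]:
    # Locate whichever Python interpreter is currently on the PATH.
    python = shutil.which("python3") or shutil.which("python")
    if python is None:
        raise RuntimeError("no Python interpreter found on PATH")
    # "convert-pth-to-ggml.py" is llama.cpp's conversion script; treating it
    # as available alongside the library is an assumption of this sketch.
    return [python, "convert-pth-to-ggml.py", model_dir, outfile]

# The library would run this command via a subprocess and then handle
# only the ggml -> quantised-ggml step itself.
cmd = build_convert_command("models/7B", "models/7B/ggml-model-f16.bin")
```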

Done

  • Create any relevant shims or implementation for quantisation.
  • Create documentation for quantisation.
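A minimal shim for the ggml → quantised-ggml step might look like the sketch below. The binary name `quantize` and its argument order follow llama.cpp's command-line tool; whether LLaMA.NET would shell out this way is an open question, so treat everything here as an assumption.

```python
import subprocess
from pathlib import Path

def quantise(f16_path: str, out_path: str, qtype: str = "q4_0") -> str:
    # Fail early with a clear error if the input ggml file is missing.
    src = Path(f16_path)
    if not src.exists():
        raise FileNotFoundError(src)
    # Assumes llama.cpp's `quantize` binary has been built and is on the
    # PATH; the exact argument form is an assumption of this sketch.
    subprocess.run(["quantize", str(src), out_path, qtype], check=True)
    return out_path
```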

Metadata


    Labels

    enhancement: New feature or request
    good first issue: Good for newcomers
    story: Story derived from bug or feature request
