
Quantisation with LLaMA.NET #3

@hpretila

Description


LLaMA.NET should support quantisation. This introduces a dilemma between two options:

  • Conversion from model state dicts to ggml is done by invoking whatever Python interpreter is available on the PATH; the library itself only performs the conversion from ggml to quantised ggml.
  • None of it is handled by the library; instead, the conversion scripts are packed alongside it so quantisation stays consistent.
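The first option could be sketched roughly as follows. This is only an illustration, not the LLaMA.NET implementation: the helper name, the paths, and the assumption that llama.cpp's `convert-pth-to-ggml.py` script sits next to the library are all hypothetical.

```python
import shutil

def build_convert_command(model_dir: str, outfile: str) -> list[str]:
    # Locate whichever Python interpreter is currently on the PATH.
    python = shutil.which("python3") or shutil.which("python")
    if python is None:
        raise RuntimeError("no Python interpreter found on PATH")
    # "convert-pth-to-ggml.py" is llama.cpp's conversion script; treating it
    # as available alongside the library is an assumption of this sketch.
    return [python, "convert-pth-to-ggml.py", model_dir, outfile]

# The library would run this command via a subprocess and then handle
# only the ggml -> quantised-ggml step itself.
cmd = build_convert_command("models/7B", "models/7B/ggml-model-f16.bin")
```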

Done

  • Create any relevant shims or implementation for quantisation.
  • Create documentation for quantisation.
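A minimal shim for the ggml → quantised-ggml step might look like the sketch below. The binary name `quantize` and its argument order follow llama.cpp's command-line tool; whether LLaMA.NET would shell out this way is an open question, so treat everything here as an assumption.

```python
import subprocess
from pathlib import Path

def quantise(f16_path: str, out_path: str, qtype: str = "q4_0") -> str:
    # Fail early with a clear error if the input ggml file is missing.
    src = Path(f16_path)
    if not src.exists():
        raise FileNotFoundError(src)
    # Assumes llama.cpp's `quantize` binary has been built and is on the
    # PATH; the exact argument form is an assumption of this sketch.
    subprocess.run(["quantize", str(src), out_path, qtype], check=True)
    return out_path
```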

Metadata


    Labels

    enhancement: New feature or request
    good first issue: Good for newcomers
    story: Story derived from bug or feature request
