NativeChat is a simple, Win32 native frontend for llama-cli. It redirects the console input and output of llama-cli into a GUI. By using llama-cli, you can explore small models directly from your old laptop, with no need for an expensive graphics card.
- **Install C++ Runtime:** Locate and install the C++ runtime in the `redist` directory (`VC_redist 2015-2022.x64.exe`).
- **Download Required Models:** Download the necessary GGUF models from Hugging Face (e.g., `qwen2.5-0.5b-instruct-q5_k_m.gguf`) and place them in the `models` directory.
- Run `NativeChat.exe`.
- Select the required model and prompt.
- When changing the model or prompt, the program will restart automatically.
- To run another instance, simply execute the program again.
- Once the model is loaded, the Send button will become active.
- The lower text area allows you to send input to the model.
- The upper text area displays the model's response.
NativeChat supports a single model with various LoRA adapters, rather than requiring a separate fine-tuned model for each task. To use adapters:
- Use the `convert_lora_to_gguf.py` script from llama.cpp to convert the adapter into a GGUF file.
- Place this adapter GGUF file in the `adapters` directory.
- In the `adapters` directory, create a UTF-8 text file named after your model (e.g., `"your_model".txt`).
- List the adapter file names intended for use with that model in the text file, separated by new lines (`\r\n`).
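The adapter list file described above can be generated with a few lines of Python. This is a minimal sketch; the adapter file names and the assumption that the list file is named after the model's GGUF base name are illustrative examples, not files shipped with NativeChat.

```python
# Sketch: write a per-model adapter list file in the format the README
# describes (one adapter file name per line, CRLF-separated, UTF-8).
from pathlib import Path

adapters_dir = Path("adapters")
adapters_dir.mkdir(exist_ok=True)

# Hypothetical adapter GGUF file names already placed in the adapters directory.
adapter_names = ["summarize-lora.gguf", "translate-lora.gguf"]

# Assumed naming: a .txt file matching your model's name.
list_file = adapters_dir / "qwen2.5-0.5b-instruct-q5_k_m.txt"
list_file.write_bytes("\r\n".join(adapter_names).encode("utf-8"))
```

Writing bytes directly (rather than text mode) guarantees the `\r\n` separators are not translated by the platform's newline handling.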
- Download `nomic-embed-text-v1.5.Q8_0.gguf` and place it in the program directory.
- Create a UTF-8 formatted text file containing the knowledge data, with each paragraph separated by `\r\n\r\n`.
- Import this text file into the vector database using the Add to Vector Database option.
- Enable the Answer Using Vector Database option to utilize embedded text responses.
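A knowledge file in the format described above can be prepared like this. The paragraph contents and the `knowledge.txt` file name are hypothetical examples; only the UTF-8 encoding and the blank-line (`\r\n\r\n`) paragraph separator come from the steps above.

```python
# Sketch: build a UTF-8 knowledge file for the vector database,
# with paragraphs separated by a blank CRLF line (\r\n\r\n).
paragraphs = [
    "NativeChat is a Win32 frontend for llama-cli.",
    "Models are stored in the models directory as GGUF files.",
]

# Write bytes directly so \r\n is preserved exactly on every platform.
with open("knowledge.txt", "wb") as f:
    f.write("\r\n\r\n".join(paragraphs).encode("utf-8"))
```

Each paragraph becomes one chunk when imported, so keep paragraphs short and self-contained for better embedding matches.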
- **Show Initialization:** To view the system prompt on loading, enable the "Show Initialization" option in the settings.
- **Using CUDA for Improved Performance:** For larger models or improved inference speed, replace the following files with their CUDA-equivalent versions: `ggml.dll`, `llama.dll`, `llama-cli.exe`, `llama-embedding.exe`. By default, the supplied files only utilize the CPU, allowing simple models to run on older systems.
- If the program fails to load or crashes, delete the `settings.cfg` file.
- Use Task Manager to terminate any non-responsive `llama-cli.exe` processes.
NativeChat installation is portable. You can run multiple instances from different folders independently.

