6.5940 Final Project
Kavya Anbarasu, Gilford Ting, Sarah Wang, Jessica Xu, and Joyce Yuan
[paper (link to be updated)] [poster] [video demo]
By integrating StreamingLLM with Retrieval-Augmented Generation (RAG), we dynamically retrieve and reuse relevant context that would otherwise have been evicted from the cache, allowing for infinite-length inputs without sacrificing performance.
Large language models (LLMs) have made significant advancements, yet they remain constrained by a finite attention window, limiting their ability to process information beyond a fixed sequence length. Efficient Streaming Language Models with Attention Sinks (StreamingLLM) partially addresses this by enabling LLMs to generalize to infinite sequence lengths without fine-tuning. However, StreamingLLM cannot access tokens that have been evicted from its cache, so it loses previous context. To overcome this limitation, we propose deploying StreamingLLM with Retrieval-Augmented Generation (RAG) to create Knowledge Optimized Augmentation for Long-context Access (KOALA). This hybrid approach dynamically retrieves previously evicted tokens, effectively simulating "infinite memory" by reintroducing relevant information into the model's attention span as needed. KOALA demonstrates improved results on the Needle in a Haystack evaluation as well as lower perplexity compared to StreamingLLM. This approach holds promise for LLM applications requiring sustained, contextually aware responses in real-time, long-context tasks.
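To make the idea concrete, here is a minimal, self-contained sketch of the KOALA cache concept in plain Python. It is illustrative only, not the repo's implementation: the class, its parameters, and the word-overlap "retriever" are stand-ins for the real StreamingLLM KV cache and the LlamaIndex-based retrieval.

```python
# Toy illustration of the KOALA idea (not the actual implementation):
# keep a few attention-sink tokens plus a rolling window of recent tokens,
# archive everything that gets evicted in small chunks, and pull the most
# relevant archived chunks back into the context when a query needs them.
from collections import deque


class ToyKoalaCache:
    def __init__(self, num_sinks=4, window_size=16, chunk_size=8):
        self.num_sinks = num_sinks      # attention-sink tokens, never evicted
        self.window_size = window_size  # rolling window of recent tokens
        self.chunk_size = chunk_size    # evicted tokens are archived in chunks
        self.sinks = []
        self.recent = deque()
        self.evicted = []               # buffer of tokens evicted from the window
        self.archive = []               # archived chunks (the "RAG" store)

    def add(self, token):
        if len(self.sinks) < self.num_sinks:
            self.sinks.append(token)
            return
        self.recent.append(token)
        if len(self.recent) > self.window_size:
            self.evicted.append(self.recent.popleft())
            if len(self.evicted) == self.chunk_size:
                self.archive.append(" ".join(self.evicted))
                self.evicted = []

    def retrieve(self, query, top_k=1):
        # Toy retriever: rank archived chunks by word overlap with the query.
        query_words = {w.strip("?.,") for w in query.lower().split()}
        ranked = sorted(
            self.archive,
            key=lambda chunk: -len(query_words & set(chunk.lower().split())),
        )
        return ranked[:top_k]

    def context_for(self, query):
        # Sinks + retrieved evicted chunks + the recent window.
        return self.sinks + self.retrieve(query) + list(self.recent)


if __name__ == "__main__":
    cache = ToyKoalaCache()
    text = "filler " * 7 + "the secret passphrase is koala " + "more filler " * 20
    for token in text.split():
        cache.add(token)
    print(cache.context_for("What is the secret passphrase?"))
```

A real deployment replaces the word-overlap lookup with embedding-based retrieval and operates on the KV cache rather than raw token strings.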
To set up the environment:
conda create -yn streaming python=3.8
conda activate streaming
pip install torch torchvision torchaudio
pip install transformers==4.33.0 accelerate datasets evaluate wandb scikit-learn scipy sentencepiece
pip install llama-index
python setup.py develop
An OpenAI API key is required for LlamaIndex. It can be set in your ~/.bashrc or by running:
export OPENAI_API_KEY="{key}"
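As an optional sanity check (not part of the repo), you can confirm the key is visible to Python before running the examples:
python -c "import os; print('OPENAI_API_KEY is set:', bool(os.environ.get('OPENAI_API_KEY')))"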
To run the KOALA demo:
python examples/koala_demo.py
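LlamaIndex (installed above) provides the retrieval side. The snippet below is a rough, standalone illustration of that retrieval step only, not the demo's actual code; the example chunks are made up, and the imports assume a 2023-era llama-index release (newer versions import from llama_index.core instead).

```python
# Illustrative only: index some "evicted" text chunks and retrieve the one
# most relevant to the current query. Uses the OPENAI_API_KEY set above for
# the default OpenAI embeddings.
from llama_index import Document, VectorStoreIndex

evicted_chunks = [
    "The meeting was moved to Thursday at 3pm.",
    "The project codename is KOALA.",
    "Lunch orders are due by noon.",
]

index = VectorStoreIndex.from_documents([Document(text=c) for c in evicted_chunks])
retriever = index.as_retriever(similarity_top_k=1)

for result in retriever.retrieve("What is the project codename?"):
    print(result.node.get_content())
```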
To run the Needle in a Haystack evaluation:
python examples/eval_haystack.py
To evaluate perplexity with the KOALA cache:
python examples/koala_eval_ppl.py --num_eval_tokens 1000
To compare against the original StreamingLLM cache:
python examples/original_eval_ppl.py --num_eval_tokens 1000
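Both scripts report perplexity, i.e. the exponential of the mean per-token negative log-likelihood over the evaluated tokens. The sketch below is a stripped-down illustration of that computation, not the repo's streaming evaluation; the model name and input text are placeholders.

```python
# Minimal perplexity sketch: exp(mean negative log-likelihood) over a text.
# The model name and input text are placeholders, not the eval scripts' defaults.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

text = "Replace this with the evaluation text. " * 50
input_ids = tokenizer(text, return_tensors="pt").input_ids[:, :1000]  # cf. --num_eval_tokens

with torch.no_grad():
    # With labels=input_ids, the model returns the mean cross-entropy loss
    # over the shifted next-token predictions.
    loss = model(input_ids, labels=input_ids).loss

print("perplexity:", torch.exp(loss).item())
```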
Note: You can prefix any of the Python commands above with CUDA_VISIBLE_DEVICES=<id> to select a specific GPU.
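For example:
CUDA_VISIBLE_DEVICES=0 python examples/koala_eval_ppl.py --num_eval_tokens 1000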
Thank you to the 6.5940 staff for all your support and a great semester!
Our project was based on the following StreamingLLM paper:
@article{xiao2023streamingllm,
  title={Efficient Streaming Language Models with Attention Sinks},
  author={Xiao, Guangxuan and Tian, Yuandong and Chen, Beidi and Han, Song and Lewis, Mike},
  journal={arXiv},
  year={2023}
}
