This is a sample application that demonstrates how to use Couchbase as a cache for Large Language Models (LLMs) like OpenAI's GPT, using the LangChain framework.
The application makes an initial call to an LLM and caches the response in Couchbase. Subsequent identical requests will be served directly from the cache, resulting in significantly faster response times and reduced API costs.
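The cache-aside flow the application implements can be sketched in plain Python, with a dict standing in for the Couchbase collection and a hypothetical `slow_llm_call` standing in for a real OpenAI request:

```python
import time

cache = {}  # stands in for the Couchbase collection

def slow_llm_call(prompt: str) -> str:
    """Simulates the network round trip to the LLM."""
    time.sleep(0.2)
    return f"response to: {prompt}"

def cached_llm_call(prompt: str) -> str:
    # Cache hit: return the stored response immediately.
    if prompt in cache:
        return cache[prompt]
    # Cache miss: call the LLM, then store the result for next time.
    response = slow_llm_call(prompt)
    cache[prompt] = response
    return response

start = time.perf_counter()
first = cached_llm_call("Tell me a joke")   # miss: calls the LLM
miss_time = time.perf_counter() - start

start = time.perf_counter()
second = cached_llm_call("Tell me a joke")  # hit: served from the cache
hit_time = time.perf_counter() - start

print(f"miss: {miss_time:.2f}s, hit: {hit_time:.4f}s")
```

In the real application LangChain performs this lookup transparently on every LLM call, with Couchbase as the backing store instead of an in-process dict.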
Features

- Integrates LangChain with Couchbase for LLM caching.
- Uses environment variables for secure credential management.
- Sets a Time-to-Live (TTL) on cached documents to ensure they expire automatically.
- Provides a clear demonstration of cache hits and misses with timing information.
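The TTL behavior mentioned in the features can be illustrated with a minimal in-memory sketch (in the real application, Couchbase expires documents server-side; the 0.1-second TTL here is only for illustration):

```python
import time

TTL_SECONDS = 0.1  # illustrative; a real cache would use a much longer TTL

cache = {}  # key -> (expiry timestamp, cached value)

def put(key, value):
    # Record when this entry should stop being served.
    cache[key] = (time.monotonic() + TTL_SECONDS, value)

def get(key):
    entry = cache.get(key)
    if entry is None:
        return None
    expiry, value = entry
    if time.monotonic() > expiry:
        del cache[key]  # entry has expired; treat as a miss
        return None
    return value

put("joke", "too many bytes")
print(get("joke"))   # fresh entry: value is returned
time.sleep(0.15)
print(get("joke"))   # past the TTL: entry is gone, None is returned
```

With a server-side TTL, stale LLM responses are evicted automatically without any cleanup code in the application.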
Prerequisites

- Python 3.8+
- A running Couchbase Server instance (or Couchbase Capella).
- An OpenAI API key. You will need OpenAI API credits to run this demo.

Setup and Installation

Clone the Repository

git clone
cd llm_cache

Create a Python Virtual Environment

It is highly recommended to use a virtual environment to manage project dependencies.
python3 -m venv venv
source venv/bin/activate
Install the required Python packages using the requirements.txt file.
pip install -r requirements.txt
This application uses environment variables for configuration. You must set the following variables in your terminal before running the script.
- export OPENAI_API_KEY="your-openai-api-key"
- export COUCHBASE_CONN_STR="couchbases://your-couchbase-url"
- export COUCHBASE_USERNAME="your-username"
- export COUCHBASE_PASSWORD="your-password"
- export COUCHBASE_BUCKET_NAME="your-bucket-name"
- export COUCHBASE_SCOPE_NAME="your-scope-name"
- export COUCHBASE_COLLECTION_NAME="your-collection-name"
Note: The script provides default values for a local Couchbase setup (couchbases://localhost, Administrator, password), but it is best practice to set these explicitly. This project includes a _env_sample file that sets these variables once the values are updated for your runtime environment.
On Linux/macOS:
source _env_sample
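The fallback behavior described in the note above can be sketched with `os.getenv`, which returns a default when a variable is unset (the variable names come from the list above; the defaults shown are the local-setup values the note mentions):

```python
import os

# Each setting falls back to a local-development default if the
# corresponding environment variable is not set.
conn_str = os.getenv("COUCHBASE_CONN_STR", "couchbases://localhost")
username = os.getenv("COUCHBASE_USERNAME", "Administrator")
password = os.getenv("COUCHBASE_PASSWORD", "password")

# The OpenAI key has no safe default; warn if it is missing.
if not os.getenv("OPENAI_API_KEY"):
    print("Warning: OPENAI_API_KEY is not set")

print(f"Connecting to {conn_str} as {username}")
```

Keeping credentials in the environment (rather than in source) is what the README means by secure credential management.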
Once you have installed the dependencies and configured the environment variables, run the application with the following command:
python app.py
Expected Output

The script will make two identical calls to the LLM. The first call will query the LLM and will be slower. The second call will be served from the Couchbase cache and will be significantly faster.
You should see output similar to this:
Couchbase connection successful.
Couchbase LLM cache has been set.

--- First LLM Call (should not be cached) ---
Response: Why did the robot go on a diet? Because it had too many bytes!
Time taken: 1.23 seconds

--- Second LLM Call (should be cached) ---
Response: Why did the robot go on a diet? Because it had too many bytes!
Time taken: 0.01 seconds
Couchbase connection closed.