
This repository processes Latin medieval accounting book data for the project "Aldersbach Digital" using different LLM providers (OpenAI, Mistral, Gemini, Anthropic).

MaxVogeltanz/LLMProcessing_HistoricalAccountingBookData

This folder processes Latin medieval accounting book data for the project Aldersbach Digital using different LLMs and LLM providers (Mistral, OpenAI, Anthropic, Gemini).

--- JSONtoRDF.py ---
Script that iterates a specially designed system prompt over a list of user prompts, each a JSON object representing a structured accounting book entry and its context. The prompting is designed to make the respective LLM create very specific structured RDF/XML content from each input JSON object; the results are combined into a single XML file at the end. A log file is also created. Parameters and file paths can be set in config.yaml. Make sure all requirements are installed in your Python environment (see requirements.txt).
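
The loop described above can be sketched roughly as follows. This is a minimal illustration, not the code in JSONtoRDF.py: the function names `fake_llm` and `encode_entries`, the `id` field, and the `urn:example:` identifiers are all hypothetical, and the provider call is replaced by a stub so the sketch runs without an API key.

```python
import json
import logging
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def fake_llm(system_prompt: str, user_prompt: str) -> str:
    """Hypothetical stand-in for a provider call; returns a fixed RDF/XML fragment."""
    entry = json.loads(user_prompt)
    return (
        f'<rdf:Description xmlns:rdf="{RDF_NS}" '
        f'rdf:about="urn:example:entry:{entry["id"]}"/>'
    )


def encode_entries(entries, system_prompt, call_llm=fake_llm):
    """Send each JSON object as a user prompt and merge the answers into one document."""
    ET.register_namespace("rdf", RDF_NS)
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    for index, entry in enumerate(entries):
        user_prompt = json.dumps(entry, ensure_ascii=False)
        try:
            fragment = call_llm(system_prompt, user_prompt)
            root.append(ET.fromstring(fragment))
            logging.info("entry %d encoded", index)
        except ET.ParseError as exc:
            # A malformed answer is logged and skipped instead of aborting the run.
            logging.error("entry %d skipped: %s", index, exc)
    return ET.tostring(root, encoding="unicode")


combined = encode_entries([{"id": 1}, {"id": 2}], "system prompt text")
```

Parsing each answer before appending it is one possible design choice: it keeps a single malformed response from corrupting the combined XML file.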

--- config.yaml ---
Sets the model name, provider, parameters, and file paths. Edit it before running JSONtoRDF.py.

--- data/JSONtoRDF ---
This folder contains all input files necessary for the rdf_encoder scripts.
prompts/systemprompt.txt contains the system prompt, the encoding rules and 5 few-shot examples. The entire system prompt has been carefully created and tested.
The plaintextentries_[number].json files contain the JSON objects described above. These files were originally created elsewhere.
The [number] is the number of the specific Latin accounting book. The file with "Test" in its name is a good test file because it contains only 4 objects.

--- providers/... ---
Accompanying scripts for JSONtoRDF.py and add_transactiontype_to_json.py. Each script defines the provider-specific code for one proprietary LLM provider (OpenAI, Mistral AI, Anthropic, Gemini). The main scripts use them according to the provider set in config.yaml.
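
One common way to select a provider module from a configuration value is a small registry, sketched below. The registry, the `provider` config key, and the placeholder functions are assumptions made for illustration; the actual scripts in providers/ may be organized differently, and the placeholders do not call any real API.

```python
from typing import Callable, Dict

# Signature assumed for this sketch: (system_prompt, user_prompt, model) -> response text
ProviderFn = Callable[[str, str, str], str]
PROVIDERS: Dict[str, ProviderFn] = {}


def register(name: str):
    """Decorator that stores a provider function under a lowercase name."""
    def decorator(fn: ProviderFn) -> ProviderFn:
        PROVIDERS[name] = fn
        return fn
    return decorator


@register("openai")
def _openai(system_prompt: str, user_prompt: str, model: str) -> str:
    return f"[openai:{model}] placeholder response"


@register("mistral")
def _mistral(system_prompt: str, user_prompt: str, model: str) -> str:
    return f"[mistral:{model}] placeholder response"


def get_provider(config: dict) -> ProviderFn:
    """Look up the provider named in an already loaded config dictionary."""
    name = str(config["provider"]).lower()
    try:
        return PROVIDERS[name]
    except KeyError:
        raise ValueError(
            f"unknown provider {name!r}; expected one of {sorted(PROVIDERS)}"
        ) from None


call = get_provider({"provider": "OpenAI"})
reply = call("system prompt", "user prompt", "some-model")
```

With this layout the main script only depends on `get_provider`, so adding a provider means adding one registered function.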

--- .env ---
Contains the API keys registered with the different LLM providers used.
It is currently set to my (= Max Vogeltanz) personal keys.
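
A hedged sketch of how a script might read a key once the values from .env are available as environment variables. The variable names in `ENV_VAR_BY_PROVIDER` are assumptions, not taken from this repository; check the actual .env for the names it uses. Loading the file itself (for example with a package such as python-dotenv) is left out so the sketch needs only the standard library.

```python
import os

# Assumed variable names, for illustration only.
ENV_VAR_BY_PROVIDER = {
    "openai": "OPENAI_API_KEY",
    "mistral": "MISTRAL_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "gemini": "GEMINI_API_KEY",
}


def load_api_key(provider: str, environ=os.environ) -> str:
    """Return the key for a provider, failing early with a readable message."""
    var = ENV_VAR_BY_PROVIDER[provider]
    key = environ.get(var, "").strip()
    if not key:
        raise RuntimeError(f"{var} is not set; add it to .env or the environment")
    return key


# Usage with an explicit mapping, so no real key is needed to try it out.
demo_key = load_api_key("openai", {"OPENAI_API_KEY": " abc "})
```

Replace the personal keys in .env with your own before running the scripts, and keep the file out of version control.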


--- output/JSONtoRDF ---
Folder with subfolders named after the providers used. The /raw subfolders should be used for the output files generated by JSONtoRDF.py.
Postprocessing can be applied with postprocess_output.py, which transforms files in "raw" into files in "postprocessed".
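
The folder convention above can be expressed as a small path helper. This sketch only maps a file in a "raw" folder to its counterpart in "postprocessed"; it does not reproduce what postprocess_output.py does to the file contents, and the function name and example file name are hypothetical.

```python
from pathlib import Path


def postprocessed_path(raw_file: Path) -> Path:
    """Map .../<provider>/raw/<file> to .../<provider>/postprocessed/<file>."""
    dirs = list(raw_file.parent.parts)
    if "raw" not in dirs:
        raise ValueError(f"{raw_file} is not inside a 'raw' folder")
    # Replace the innermost "raw" directory and keep everything else unchanged.
    index = len(dirs) - 1 - dirs[::-1].index("raw")
    dirs[index] = "postprocessed"
    return Path(*dirs) / raw_file.name


target = postprocessed_path(Path("output/JSONtoRDF/openai/raw/example.xml"))
```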

--- GPTAPI_basic.py ---
Simple standalone script that uses the OpenAI API to process a simple text prompt.

--- ChatGPT_API_Tutorial.ipynb ---
Small Jupyter notebook explaining GPTAPI_basic.py.
