This repository processes Latin medieval accounting book data for the project "Aldersbach Digital" using different LLM providers (OpenAI, Mistral, Gemini, Anthropic).
MaxVogeltanz/LLMProcessing_HistoricalAccountingBookData
This folder is designed to process Latin medieval accounting book data for the project Aldersbach Digital using different LLMs and LLM providers (Mistral, OpenAI, Anthropic, Gemini).

--- JSONtoRDF.py ---
Script that iterates a specially designed system prompt over a list of user prompts. The user prompts are JSON files representing structured accounting book entries and their context. The prompting is designed to make the respective LLM create very specific structured RDF-XML content out of the individual input JSON objects; the results are combined into one single XML file at the end. A log file is also created. Parameters and file paths can be set in config.yaml. Make sure all Python requirements are installed (see requirements.txt).

--- config.yaml ---
Used to set the model name, provider, parameters and file paths before running JSONtoRDF.py.

--- data/JSONtoRDF ---
This folder contains all input files necessary for the rdf_encoder scripts. prompts/systemprompt.txt contains the system prompt, the encoding rules and five few-shot examples. The entire system prompt has been carefully created and tested. The plaintextentries_[number].json files contain the JSON objects described above. These files were originally created elsewhere. The [number] is the number of the specific Latin accounting book. The one file with "Test" in its name is a good test file because it contains only 4 objects.

--- providers/... ---
Accompanying scripts for JSONtoRDF.py and add_transactiontype_to_json.py. Each script defines the code required for one proprietary LLM provider (OpenAI, Mistral AI, Anthropic, Gemini). The main scripts use them depending on how config.yaml is set.

--- .env ---
Contains the API keys registered with the different LLM providers. It is currently set to my (Max Vogeltanz) personal keys.

--- output/JSONtoRDF ---
Folder with subfolders named after the providers used. The /raw subfolders should be used for the output files generated by JSONtoRDF.py. Postprocessing can be applied with postprocess_output.py, which transforms the files in "raw" and writes the results to "postprocessed".

--- GPTAPI_basic.py ---
Simple standalone script that uses the OpenAI API to process a simple text prompt.

--- ChatGPT_API_Tutorial.ipynb ---
Small Jupyter notebook explaining GPTAPI_basic.py.
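
--- Illustrative sketch ---
The loop described for JSONtoRDF.py (one system prompt, many JSON entries, one combined XML file plus a log) can be sketched roughly as follows. This is a minimal sketch, not the actual code of JSONtoRDF.py: the function names encode_entries and fake_llm, the JSON fields "id" and "text", and the <entry> element are hypothetical and chosen only for illustration. The provider call is replaced by a local stand-in so the example runs without API keys.

```python
import json
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def encode_entries(system_prompt, entries, call_llm):
    """Send every entry with the same system prompt and merge the replies.

    call_llm is any function (system_prompt, user_prompt) -> XML string.
    Returns the combined RDF/XML document as a string and a list of log lines.
    """
    ET.register_namespace("rdf", RDF_NS)
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    log = []
    for index, entry in enumerate(entries):
        # Each JSON object becomes one user prompt.
        user_prompt = json.dumps(entry, ensure_ascii=False)
        fragment = call_llm(system_prompt, user_prompt)
        try:
            # Only well-formed fragments are added to the combined file.
            root.append(ET.fromstring(fragment))
            log.append(f"entry {index}: ok")
        except ET.ParseError as exc:
            log.append(f"entry {index}: invalid XML ({exc})")
    return ET.tostring(root, encoding="unicode"), log


def fake_llm(system_prompt, user_prompt):
    """Hypothetical stand-in for a real provider call (assumption, not a real API)."""
    entry = json.loads(user_prompt)
    element = ET.Element("entry", id=str(entry["id"]))
    element.text = entry["text"]
    return ET.tostring(element, encoding="unicode")


if __name__ == "__main__":
    # Placeholder entries; the real input files use their own structure.
    sample_entries = [
        {"id": 1, "text": "placeholder entry one"},
        {"id": 2, "text": "placeholder entry two"},
    ]
    combined_xml, log_lines = encode_entries("system prompt text", sample_entries, fake_llm)
    print(combined_xml)
    print("\n".join(log_lines))
```

Passing the provider call in as a function mirrors the idea of the providers/... scripts: the loop stays the same and only the function that talks to the chosen provider changes.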