StudyFriend is a collection of AI tools that help me study.
StudyFriend runs AI models locally, thanks to the open-source community.
StudyFriend can:
- Generate a Q&A file from PDFs and images, useful for self-evaluation of study materials.
- Display the Q&A file.
- Convert PDFs into images.
You're more than welcome to fix, add, or suggest study tools.
To build, use Python >= 3.10:
python -m pip install .
Important
On Windows and Linux: install CUDA Toolkit >= 11.8 and torch >= 2.4+cu118.
On Windows: add poppler's /bin directory to your PATH, as stated in pdf2image.
On Linux: install poppler via: apt-get install poppler-utils
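As a rough sketch, a Linux setup could look like this (assuming a Debian/Ubuntu system and the official PyTorch cu118 wheel index; the CUDA Toolkit itself is installed separately):
sudo apt-get install poppler-utils
python -m pip install "torch>=2.4" --index-url https://download.pytorch.org/whl/cu118
python -m pip install .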
Look at the cookbooks!
To generate a Q&A file from PDFs:
python -m study_friend.query -d ./samples
Note
On Mac: ~6 GB of unified memory is required.
On Windows/Linux: 4 GB of GPU VRAM is required.
To display the Q&A file:
python -m study_friend.display -f ./samples/output.md
To generate images from PDFs:
python -m study_friend.convert -d ./samples
To print help:
python -m study_friend.query -h
python -m study_friend.convert -h
python -m study_friend.display -h
Tip
Use --image_size to control the size of converted images.
The smaller the image size, the smaller the amount of memory needed to store the prompt tokens, at the cost of reduced legibility of the images.
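For example, a smaller image size can be requested like this (the value 400 is only illustrative; adjust it to your material and memory budget):
python -m study_friend.query -d ./samples --image_size 400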
Tip
Use --title_prompt, --question_prompt, --answer_prompt to control the prompts used to query the AI model.
You can find the default prompts in utils.py.
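For instance, the prompts can be overridden on the command line like this (the prompt texts below are placeholders, not the defaults from utils.py):
python -m study_friend.query -d ./samples --question_prompt "Write three exam-style questions about this page." --answer_prompt "Answer each question concisely."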
Warning
The Markdown beautifier heavily depends on the prompt templates; change it accordingly or disable it.
output.md is a Q&A file automatically generated from the slides in presentation.pdf (taken from this repo of mine), after being transformed into these images.
This command was used:
python -m study_friend.query -d ./samples -im 700
On my Mac M1, using the default 🤗Qwen2.5-VL-7B-Instruct-4bit and an --image_size of 500, it yields:
Prompt: 69.904 tokens-per-sec
Generation: 12.865 tokens-per-sec
Peak memory: 6.206 GB
On my Mac M1, using the default 🤗Qwen2.5-VL-7B-Instruct-4bit and an --image_size of 700, it yields:
Prompt: 58.693 tokens-per-sec
Generation: 11.566 tokens-per-sec
Peak memory: 7.351 GB
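For reference, the 500-pixel run above presumably corresponds to an invocation like:
python -m study_friend.query -d ./samples -im 500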
A brief and incomplete list of things to do or fix in this project:
- MLX support
- CUDA support
- 🤗Transformers integration
- 🤗smolagents integration
Thanks go to the open-source community that makes this possible.
mlx-vlm - Vision model inference for MLX.
mlx-community/Qwen2.5-VL-7B-Instruct-4bit - 🤗HuggingFace quantized version of Qwen2.5-VL from the MLX community.
unsloth/Qwen2.5-VL-7B-Instruct-unsloth-bnb-4bit - 🤗HuggingFace quantized version of Qwen2.5-VL from unsloth.
flask - Pallets' lightweight WSGI web application framework.
- I cannot load the model with the following error in the 🤗Transformers library:
ValueError: Unrecognized image processor ...
Try installing this commit of 🤗Transformers v4.49.0, as stated here.
Alternatively, avoid installing 🤗Transformers on build:
python -m pip install git+https://github.com/huggingface/transformers.git@1931a351408dbd1d0e2c4d6d7ee0eb5e8807d7bf
python -m pip install . --no-dependencies
- I cannot load the model with the following error:
... CUDAOutOfMemory ... or similar.
Try playing with the --group_size argument, starting from 1 and going upwards; if needed, also adjust the --image_size argument:
python -m study_friend.query -d ./samples -g 1 -im 250
- How can I make the model generate faster?
Lower the computational burden by lowering the --image_size and --group_size arguments; optionally, use --max_tokens to limit output generation to a specified length.
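For example, a combined invocation could look like this (the values are illustrative; -im and -g are the short forms of --image_size and --group_size used elsewhere in this README):
python -m study_friend.query -d ./samples -im 350 -g 2 --max_tokens 512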
- I don't have a GPU. What can I do?
Use a free Colab account and start with the cookbooks.
Marco Sangiorgi
2025©