
Refactor: Implement Singleton ModelManager to fix VRAM memory leak#521

Open
Siddhazntx wants to merge 5 commits into AOSSIE-Org:main from Siddhazntx:enhancement/model-manager-singleton

Conversation


@Siddhazntx Siddhazntx commented Mar 3, 2026

Addressed Issues:

Fixes #520

Screenshots/Recordings:

Server Startup Output:

> [nltk_data]    |
> [nltk_data]  Done downloading collection popular
> Starting Flask App...
> Initializing Shared ModelManager... Loading massive models into memory ONCE.
> You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>.

Additional Notes:

Summary of Changes:

  • Memory Optimization: Implemented a ModelManager Singleton in main.py to prevent server.py from loading the 3GB t5-large model and spacy tools multiple times into RAM/VRAM.
  • Refactoring: Updated MCQGenerator, ShortQGenerator, and ParaphraseGenerator to utilize lightweight pointers to the shared models via the ModelManager. This drops the overall memory footprint from ~9GB down to ~3GB and significantly speeds up server boot times.
  • Test Fix: Updated the stale /get_answer endpoint string in test_server.py to the correct /get_shortq_answer route. All backend tests now pass locally without any Out-Of-Memory (OOM) crashes.

This is strictly an internal architectural improvement; the external API behavior remains completely unchanged.
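The singleton-plus-shared-references pattern described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the heavy loads (the t5-large tokenizer/model, spaCy, sense2vec) are replaced with placeholders so the sketch stays self-contained, and `MCQGenerator` is a simplified stand-in for the real generator classes.

```python
class ModelManager:
    """Process-wide holder for heavy shared resources (illustrative sketch)."""
    _instance = None

    def __new__(cls):
        if cls._instance is None:
            instance = super().__new__(cls)
            instance._load_models()  # heavy work runs exactly once
            cls._instance = instance
        return cls._instance

    def _load_models(self):
        # Placeholders for the real loads, e.g.
        # T5Tokenizer.from_pretrained("t5-large"), spacy.load(...), etc.
        self.tokenizer = object()
        self.model = object()
        self.device = "cpu"


class MCQGenerator:
    """Simplified stand-in: holds references to the shared models, not copies."""
    def __init__(self):
        shared = ModelManager()
        self.tokenizer = shared.tokenizer
        self.model = shared.model
        self.device = shared.device
```

Because every generator resolves to the same `ModelManager` instance, constructing additional generators adds only a few attribute references instead of another multi-gigabyte model load.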

Checklist

  • My PR addresses a single issue, fixes a single bug or makes a single improvement.
  • My code follows the project's code style and conventions
  • If applicable, I have made corresponding changes or additions to the documentation
  • If applicable, I have made corresponding changes or additions to tests
  • My changes generate no new warnings or errors
  • I have joined the Discord server and I will share a link to this PR with the project maintainers there
  • I have read the Contribution Guidelines
  • Once I submit my PR, CodeRabbit AI will automatically review it and I will address CodeRabbit's comments.

AI Usage Disclosure

Check one of the checkboxes below:

  • This PR does not contain AI-generated code at all.
  • This PR contains AI-generated code. I have tested the code locally and I am responsible for it.

Summary by CodeRabbit

  • Refactor

    • Share heavy ML resources across question generators to reduce startup cost and improve inference consistency
    • Improved device handling and disabled unnecessary gradient tracking for more reliable, faster predictions
  • Chores

    • Renamed test API endpoint for short-question answers to better reflect its purpose


coderabbitai bot commented Mar 3, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 97bf585 and 687653a.

📒 Files selected for processing (1)
  • backend/Generator/main.py

📝 Walkthrough


Introduces a ModelManager singleton that lazily loads and shares heavy ML models and NLP tools across generators; refactors MCQGenerator, ShortQGenerator, ParaphraseGenerator, and BoolQGenerator to use the shared resources. AnswerPredictor now moves its NLI model to the detected device, sets eval mode, and runs boolean prediction without grad. Test endpoint renamed.

Changes

  • Model manager & generators (backend/Generator/main.py): Adds ModelManager singleton for lazy, shared loading of tokenizer/model/device/nlp/s2v/fdist/normalized_levenshtein; refactors MCQGenerator, ShortQGenerator, ParaphraseGenerator, and BoolQGenerator to obtain shared resources instead of constructing duplicates.
  • Answer prediction fixes (backend/Generator/main.py): Updates AnswerPredictor to load/push the NLI model to self.device, call model.eval(), decorate predict_boolean_answer with @torch.no_grad(), and move input tensors to the correct device before inference.
  • Test endpoint rename (backend/test_server.py): Renames the test route and test function from /get_answer / test_get_answer to /get_shortq_answer / test_get_shortq_answer and updates the printed output formatting.
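The AnswerPredictor changes summarized above (move the model to the detected device, switch to eval mode, disable gradient tracking) follow a standard PyTorch inference pattern. A hedged sketch, with a tiny linear layer standing in for the real NLI model and only the method name taken from the walkthrough:

```python
import torch

class AnswerPredictor:
    """Inference-only wrapper (sketch; the real class loads an NLI model)."""

    def __init__(self, model):
        # Detect the device once and keep the model and inputs on it
        self.device = "cuda" if torch.cuda.is_available() else "cpu"
        self.model = model.to(self.device)
        self.model.eval()  # disable dropout/batch-norm updates for inference

    @torch.no_grad()  # skip autograd bookkeeping: faster, lower memory use
    def predict_boolean_answer(self, inputs):
        inputs = inputs.to(self.device)  # inputs must match the model's device
        return self.model(inputs)

# Tiny stand-in for the real NLI model
predictor = AnswerPredictor(torch.nn.Linear(4, 2))
logits = predictor.predict_boolean_answer(torch.ones(1, 4))
```

Running under `@torch.no_grad()` means the returned tensor carries no autograd graph, which is what keeps inference memory flat across requests.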

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant ShortQGenerator
    participant ModelManager
    participant NLIModel

    Client->>ShortQGenerator: POST /get_shortq_answer (input)
    ShortQGenerator->>ModelManager: request shared resources (tokenizer, model, device, nlp, s2v, fdist)
    ModelManager-->>ShortQGenerator: return shared references (lazily loaded if needed)
    ShortQGenerator->>ShortQGenerator: prepare inputs (tokenize, preprocess, move to device)
    ShortQGenerator->>NLIModel: call AnswerPredictor.predict_boolean_answer (inputs on device, no-grad, eval)
    NLIModel-->>ShortQGenerator: boolean prediction
    ShortQGenerator-->>Client: response (answer)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐇 I nibble code with care and cheer,
A single Manager now sits near.
No triple loads that made me grieve,
Generators share — less to achieve.
Hoppity hop, the server's clear!

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

  • Docstring Coverage (⚠️ Warning): Docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
  • Out of Scope Changes check (❓ Inconclusive): The PR includes a minor test endpoint rename (test_get_answer to test_get_shortq_answer), a necessary correction unrelated to the core objective but internally justified. Resolution: clarify whether the rename was part of the original scope or should be separated into a distinct PR for better change isolation.
✅ Passed checks (3 passed)
  • Description Check (✅ Passed): Check skipped because CodeRabbit's high-level summary is enabled.
  • Title check (✅ Passed): The title 'Refactor: Implement Singleton ModelManager to fix VRAM memory leak' accurately summarizes the main change: introducing a ModelManager singleton to resolve memory inefficiency.
  • Linked Issues check (✅ Passed): The PR implements the ModelManager singleton pattern to eliminate redundant model loading across generators, achieving the primary objective of issue #520.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
backend/test_server.py (1)

77-77: Align test naming/log label with the new endpoint.

Endpoint update is correct, but keeping old test_get_answer naming/output text is misleading during debugging.

✏️ Suggested cleanup
-def test_get_answer():
+def test_get_shortq_answer():
     endpoint = '/get_shortq_answer'
@@
-    print(f'/get_answer Response: {response}')
+    print(f'/get_shortq_answer Response: {response}')
@@
-    test_get_answer()
+    test_get_shortq_answer()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/test_server.py` at line 77, The test still uses the old name/label
"test_get_answer" while the endpoint variable was changed to
'/get_shortq_answer'; rename the test function and any log/assert messages to
match the new endpoint (e.g., change function name test_get_answer ->
test_get_shortq_answer and update any printed/logged strings or pytest ids), and
update any references to the endpoint in the test body that mention the old name
so test output and labels reflect endpoint variable endpoint =
'/get_shortq_answer'.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/Generator/main.py`:
- Around line 34-57: The ModelManager singleton is not thread-safe: both __new__
and __init__ use unsynchronized check-then-act on _instance and _is_initialized
causing multiple heavy initializations; fix by adding a class-level
threading.Lock (e.g., _init_lock) and use it to guard creation/initialization
with double-checked locking — acquire the lock in __new__ around the _instance
check before calling super().__new__, and/or acquire it at start of __init__ and
re-check _is_initialized inside the lock before loading models (update
references to ModelManager._instance and self._is_initialized accordingly) so
only one thread performs the heavy model loads.

---

Nitpick comments:
In `@backend/test_server.py`:
- Line 77: The test still uses the old name/label "test_get_answer" while the
endpoint variable was changed to '/get_shortq_answer'; rename the test function
and any log/assert messages to match the new endpoint (e.g., change function
name test_get_answer -> test_get_shortq_answer and update any printed/logged
strings or pytest ids), and update any references to the endpoint in the test
body that mention the old name so test output and labels reflect endpoint
variable endpoint = '/get_shortq_answer'.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fc3bf1a and f525d73.

📒 Files selected for processing (2)
  • backend/Generator/main.py
  • backend/test_server.py


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/Generator/main.py`:
- Line 25: Remove the duplicate import of the threading module by keeping a
single "import threading" statement and deleting the redundant one; locate both
occurrences of the "import threading" import statements in main.py and remove
the second/redundant import so the module is only imported once (resolving the
Ruff F811 redefinition).

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f525d73 and 97bf585.

📒 Files selected for processing (2)
  • backend/Generator/main.py
  • backend/test_server.py

@Siddhazntx
Author

@coderabbitai, I have addressed both of these findings in the latest commit:

  1. Thread-Safety: Added a class-level threading.Lock() and implemented double-checked locking in both __new__ and __init__ within the ModelManager class. This ensures heavy model initialization is strictly single-threaded and prevents race conditions.
  2. Test Naming: Renamed test_get_answer to test_get_shortq_answer in test_server.py and updated the corresponding print statements to accurately reflect the endpoint being tested.

Thanks for catching these!
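For reference, the double-checked locking described in point 1 typically looks like the following sketch. Names and structure are assumed for illustration; the body of `_load_models` is a placeholder for the real model loading.

```python
import threading

class ModelManager:
    """Thread-safe lazy singleton via double-checked locking (sketch)."""
    _instance = None
    _lock = threading.Lock()

    def __new__(cls):
        if cls._instance is None:          # fast path: no lock once created
            with cls._lock:
                if cls._instance is None:  # re-check while holding the lock
                    instance = super().__new__(cls)
                    instance._load_models()  # heavy work, exactly once
                    cls._instance = instance
        return cls._instance

    def _load_models(self):
        # Placeholder for loading t5-large, spaCy, sense2vec, etc.
        self.loaded = True
```

The second check inside the lock is the important part: it stops two threads that both saw `_instance` as None from each performing the expensive load.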


coderabbitai bot commented Mar 3, 2026

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify thread-safety implementation in ModelManager
echo "=== Checking ModelManager for threading.Lock implementation ==="
ast-grep --pattern $'class ModelManager:
  $$$
'

echo -e "\n=== Checking for threading import ==="
rg -n "import threading" backend/Generator/main.py

echo -e "\n=== Checking for Lock usage in ModelManager ==="
rg -n -A5 -B5 "threading.Lock" backend/Generator/main.py

echo -e "\n=== Checking test_server.py for test_get_shortq_answer ==="
rg -n "def test_get_shortq_answer" backend/test_server.py

echo -e "\n=== Checking for old test_get_answer function ==="
rg -n "def test_get_answer" backend/test_server.py

echo -e "\n=== Checking endpoint usage in test ==="
rg -n "get_shortq_answer" backend/test_server.py

Length of output: 3120


@Siddhazntx, excellent work! I've verified both fixes:

Thread-Safety: Confirmed that threading.Lock() is properly declared as a class-level variable and double-checked locking is correctly implemented in both __new__ and __init__. The pattern ensures thread-safe singleton initialization.

Test Naming: Verified that test_get_answer has been successfully renamed to test_get_shortq_answer throughout the test file, including the endpoint variable and test runner invocation.

Both issues are now fully addressed. Great job on the quick turnaround and thorough implementation! 🎉


If you found this review helpful, would you consider giving us a shout-out on X?

Thank you for using CodeRabbit!



Development

Successfully merging this pull request may close these issues.

[Enhancement]: Fix redundant model loading with Singleton ModelManager
