
[FEATURE] Add RAG-powered Chat with Quiz (multi-turn support)#522

Open
mahek2016 wants to merge 11 commits into AOSSIE-Org:main from mahek2016:feature/chat-with-quiz-rag

Conversation


@mahek2016 mahek2016 commented Mar 3, 2026

Description

Fixes #508

This PR introduces an AI-powered “Chat with Quiz” feature using a Retrieval-Augmented Generation (RAG) pipeline.

The feature enables users to ask follow-up questions about quiz source content, such as:

  • Why is this answer correct?
  • Explain this in simpler terms.
  • Give an example of this concept.

Implementation Details

  • Text chunking with overlap for improved semantic retrieval
  • Sentence Transformers (all-MiniLM-L6-v2) for embeddings
  • FAISS vector index using cosine similarity (IndexFlatIP with L2 normalization)
  • T5-base for grounded answer generation
  • Multi-turn conversational support via chat history integration
  • Efficient re-index prevention to avoid rebuilding FAISS for the same document
  • Input validation for production-safe API handling
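The retrieval setup listed above leans on a standard identity: after L2 normalization, inner product equals cosine similarity, which is why a FAISS IndexFlatIP can serve as a cosine index. A minimal pure-Python sketch, with toy vectors standing in for real SentenceTransformer embeddings:

```python
import math

def l2_normalize(v):
    # Scale a vector to unit length, mirroring faiss.normalize_L2.
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

# Toy vectors standing in for SentenceTransformer embeddings.
chunk = l2_normalize([0.3, -1.2, 0.8])
query = l2_normalize([1.0, 0.5, -0.2])

# On unit vectors, inner product == cosine similarity, so an
# IndexFlatIP over normalized embeddings ranks chunks by cosine.
assert abs(dot(chunk, query) - cosine(chunk, query)) < 1e-9
```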

Scope

This PR implements:

  • Chat with Quiz functionality only

PDF-based chat and persistent storage are intentionally excluded and can be introduced in a separate PR to keep changes scoped and reviewable.


📸 Screenshots / Recordings

Not applicable.
This PR introduces backend functionality (RAG-based conversational layer) and does not modify the UI.


Testing

  • Verified semantic retrieval returns relevant chunks
  • Confirmed grounded generation works correctly with retrieved context
  • Tested multi-turn conversational flow
  • Ensured no regression in existing quiz generation features
  • No new warnings or errors observed

Checklist

  • My PR addresses a single issue
  • My code follows the project's code style and conventions
  • Documentation updated where applicable
  • No breaking changes introduced
  • My changes generate no new warnings or errors
  • I have joined the Discord server and will share this PR
  • I have read the Contribution Guidelines
  • I will address CodeRabbit review comments

🤖 AI Usage Disclosure

  • This PR contains AI-generated code.
  • This PR does not contain AI-generated code.

Summary by CodeRabbit

  • New Features

    • OCR support to extract text from images.
    • Document-informed chat using a retrieval-augmented generation service (new chat endpoint).
  • Bug Fixes / Improvements

    • Safer file handling: sanitized filenames, directory-traversal protection, robust extension-based processing, and uploaded files cleaned up after processing.
    • Keyword/paraphrase handling updated for more robust tokenization behavior.
  • Chores

    • Server startup and error logging refinements.


coderabbitai bot commented Mar 3, 2026

Warning

Rate limit exceeded

@mahek2016 has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 6 minutes and 46 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.


📥 Commits

Reviewing files that changed from the base of the PR and between 4c28f60 and e768c17.

📒 Files selected for processing (2)
  • backend/Generator/main.py
  • backend/Generator/mcq.py
📝 Walkthrough

Walkthrough

Adds a new RAGService (FAISS + SentenceTransformer + T5) and a /chat endpoint; introduces OCR-based image text extraction and safer file handling in FileProcessor; modernizes tokenizer padding/truncation usage in MCQ generation and paraphrasing calls.

Changes

  • RAG feature — backend/Generator/rag.py: New RAGService class with chunking, embedding (SentenceTransformer), L2-normalized FAISS IndexFlatIP indexing, atomic index_text, and a query method that assembles context + history and generates answers via T5 (device-aware).
  • Server integration / endpoint — backend/server.py: Adds global rag_service and rag_lock, lazy initialization of RAGService, and a new POST /chat endpoint that validates input, indexes document_text, queries RAGService with optional chat_history, and returns generated answers with error handling.
  • File processing & OCR — backend/Generator/main.py: Adds extract_text_from_image(self, file_path) using OpenCV + pytesseract with cross-platform tesseract discovery; updates process_file() to use secure_filename, enforce absolute-path safety, handle .txt/.pdf/.docx and common image formats, save uploads only after checks, and ensure cleanup of saved files.
  • MCQ / tokenizer updates — backend/Generator/mcq.py: Replaces deprecated pad_to_max_length=True usages with padding="max_length", truncation=True, return_tensors="pt" in batch encoding calls (mirrored in paraphrase/tokenizer usage).
  • Paraphrase / minor fixes — backend/Generator/main.py (paraphrase sections): Adjusts tokenizer encoding to modern padding/truncation args and fixes minor indentation/bracket placement around model.generate calls; no functional change to generation logic.
  • Formatting / cleanup — backend/Generator/...: Small whitespace/formatting adjustments, removal of extraneous newlines, and changed some Sense2Vec usage to initialize s2v = None and pass None to keyword extraction calls.
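The tokenizer change in the MCQ cohort swaps the deprecated pad_to_max_length=True for the explicit padding="max_length", truncation=True arguments. For readers unfamiliar with those arguments, this toy function sketches their per-sequence semantics; it is an illustration, not the transformers implementation:

```python
def pad_or_truncate(ids, max_length, pad_id=0):
    # truncation=True: cut sequences longer than max_length.
    ids = list(ids)[:max_length]
    # padding="max_length": right-pad shorter sequences to max_length.
    return ids + [pad_id] * (max_length - len(ids))

print(pad_or_truncate([5, 6, 7, 8, 9], 3))  # [5, 6, 7]
print(pad_or_truncate([5, 6], 4))           # [5, 6, 0, 0]
```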

Sequence Diagram(s)

sequenceDiagram
    actor User
    participant Server as Server
    participant RAG as RAGService
    participant Embed as SentenceTransformer
    participant Index as FAISSIndex
    participant Gen as T5Generator

    User->>Server: POST /chat { document_text, question, chat_history? }
    Server->>RAG: index_text(document_text)
    RAG->>RAG: chunk_text(text, chunk_size=400, overlap=50)
    RAG->>Embed: embed(chunks)
    Embed-->>RAG: chunk embeddings
    RAG->>Index: build normalized FAISS index
    Index-->>RAG: index ready

    Server->>RAG: query(question, chat_history, top_k=3)
    RAG->>Embed: embed(question)
    Embed-->>RAG: question embedding
    RAG->>Index: search(top_k)
    Index-->>RAG: top_k chunks
    RAG->>RAG: assemble prompt (system + context + history + question)
    RAG->>Gen: generate(answer_prompt)
    Gen-->>RAG: decoded answer
    RAG-->>Server: answer
    Server-->>User: { answer, status: "success" }
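The flow above can be exercised with a request body like the following. The field names come from the diagram; the role/content shape of each chat_history turn is an assumption for illustration:

```python
import json

# Hypothetical request body for POST /chat.
payload = {
    "document_text": "Photosynthesis converts light energy into chemical energy.",
    "question": "Why is this answer correct?",
    "chat_history": [
        {"role": "user", "content": "What does chlorophyll do?"},
        {"role": "assistant", "content": "It absorbs light for photosynthesis."},
    ],
}
body = json.dumps(payload)
```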

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I nibbled through chunks and climbed each stack,
Embeddings and OCR on my floppy back,
I built a small index and whispered a chat,
“Ask about your quiz — I’ll fetch answers back!” 🥕✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage — ⚠️ Warning: Docstring coverage is 8.33%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

  • Description Check — ✅ Passed: Check skipped; CodeRabbit’s high-level summary is enabled.
  • Title Check — ✅ Passed: The title clearly and accurately describes the main feature being added: RAG-powered chat with multi-turn support for quiz interactions.
  • Linked Issues Check — ✅ Passed: The PR implements the core objectives of #508: RAG-based semantic search via FAISS, multi-turn chat with history, grounded answer generation with T5-base, and relevant chunk retrieval for quiz content context.
  • Out of Scope Changes Check — ✅ Passed: Code changes align with the PR scope (RAG chat for quiz only); minor refactoring such as tokenizer API updates and OCR extraction in main.py is supportive rather than out of scope.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.




@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 7

🧹 Nitpick comments (2)
backend/server.py (1)

33-35: Update startup logging to match lazy initialization behavior.

Line 34 says the RAG service is initialized, but Line 33 sets it to None until first /chat request.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/server.py` around lines 33 - 35, The startup log incorrectly claims
the RAG service is initialized while rag_service is set to None; update the
message to reflect lazy initialization by either removing or changing the
print("RAG SERVICE INITIALIZED SUCCESSFULLY") to a message like "RAG service
will be initialized on first /chat request" or move the success log into the
actual initialization code path that creates rag_service (the handler for the
/chat endpoint) so the success message is emitted only when rag_service is
assigned.
backend/Generator/rag.py (1)

96-96: Drop the unused FAISS distance binding.

distances is never consumed; replacing it with _ keeps intent clear.

Proposed fix
-        distances, indices = self.index.search(question_embedding, top_k)
+        _, indices = self.index.search(question_embedding, top_k)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/Generator/rag.py` at line 96, Replace the unused FAISS distance
binding by discarding it: in the call to self.index.search(question_embedding,
top_k) ignore the first returned value (currently assigned to distances) and
only capture indices (e.g., use _ , indices = self.index.search(...)); update
the assignment where distances is currently declared so intent is clear and no
unused variable remains.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/Generator/main.py`:
- Around line 4-5: There are duplicate imports of T5ForConditionalGeneration and
T5Tokenizer causing a lint (Ruff F811); update the import statement that
currently lists AutoModelForSequenceClassification, AutoTokenizer,
AutoModelForSeq2SeqLM, T5ForConditionalGeneration, T5Tokenizer to remove the
duplicated T5ForConditionalGeneration and T5Tokenizer symbols so only one import
of those classes remains (keep AutoModelForSequenceClassification,
AutoTokenizer, AutoModelForSeq2SeqLM in that statement and rely on the separate
T5 import line).
- Around line 406-413: process_file currently trusts file.filename when building
file_path which allows path traversal; before constructing file_path (and before
calling file.save) sanitize the uploaded name (e.g., use a safe helper like
werkzeug.utils.secure_filename or at minimum os.path.basename) and then join
with self.upload_folder; additionally resolve and validate the final absolute
path is inside self.upload_folder (compare
os.path.abspath(file_path).startswith(os.path.abspath(self.upload_folder))) and
reject or raise if the check fails to prevent ../ traversal (update references
in process_file, file_path, upload_folder, file.save, and filename).
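A stdlib-only sketch of the containment check that prompt describes, using os.path.basename in place of werkzeug's secure_filename; the folder name is illustrative:

```python
import os

def safe_upload_path(upload_folder, filename):
    # Drop any directory components the client supplied.
    name = os.path.basename(filename)
    path = os.path.abspath(os.path.join(upload_folder, name))
    # Reject anything that resolves outside the upload folder.
    root = os.path.abspath(upload_folder)
    if not path.startswith(root + os.sep):
        raise ValueError(f"unsafe upload name: {filename!r}")
    return path
```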

In `@backend/Generator/rag.py`:
- Around line 31-36: The chunk_text method currently computes the range step as
(chunk_size - overlap) which can be zero or negative if overlap >= chunk_size;
add a parameter validation at the start of chunk_text to raise a ValueError (or
adjust behavior) when overlap is not less than chunk_size and when chunk_size <=
0 or overlap < 0; update the function (chunk_text) to either normalize the
values or explicitly raise with a clear message referencing chunk_size and
overlap before using them to compute the range step so the loop for i in
range(0, len(words), chunk_size - overlap) never receives a non-positive step.
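One way to realize that validation, sketched against the chunking behavior described in the walkthrough (word chunks with overlap); everything beyond the guard itself is illustrative:

```python
def chunk_text(text, chunk_size=400, overlap=50):
    # Guard first: the loop step below must stay positive.
    if chunk_size <= 0 or overlap < 0 or overlap >= chunk_size:
        raise ValueError(
            f"need chunk_size > 0 and 0 <= overlap < chunk_size, "
            f"got chunk_size={chunk_size}, overlap={overlap}"
        )
    words = text.split()
    step = chunk_size - overlap  # positive by the check above
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), step)]
```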
- Around line 47-74: The update to self.current_text is happening before
embedding/index creation succeeds, which can leave the object in a stale state;
change the function to build everything into temporary variables first (e.g.,
temp_chunks = self.chunk_text(text), temp_embeddings =
self.embedder.encode(...), normalize and cast temp_embeddings, create temp_index
= faiss.IndexFlatIP(...); temp_index.add(...)) and only after all steps succeed
assign self.text_chunks = temp_chunks, self.index = temp_index, self.dimension =
temp_embeddings.shape[1], and finally self.current_text = text; also wrap the
embedding/index creation in a try/except to avoid partial assignments on failure
(references: self.current_text, chunk_text, embedder.encode, faiss.normalize_L2,
faiss.IndexFlatIP, self.index, self.dimension).
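The commit-on-success pattern that prompt asks for, reduced to a stub: the embed callable stands in for SentenceTransformer encoding plus FAISS index construction, and the naive sentence split is only a placeholder for chunk_text:

```python
class RagIndexSketch:
    def __init__(self, embed):
        self.embed = embed          # stand-in for encode + FAISS build
        self.current_text = None
        self.text_chunks = None
        self.index = None

    def index_text(self, text):
        if text == self.current_text:
            return                  # re-index prevention
        # Build everything into temporaries first; any failure here
        # leaves the previously committed state untouched.
        temp_chunks = text.split(". ")
        temp_index = self.embed(temp_chunks)   # may raise
        # Commit only after every step succeeded.
        self.text_chunks = temp_chunks
        self.index = temp_index
        self.current_text = text
```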

In `@backend/server.py`:
- Around line 39-48: Several service objects (MCQGen, AnswerPredictor/answer,
BoolQGen, ShortQGen, qg/QuestionGenerator, docs_service/GoogleDocsService,
qa_model) were commented out but remain referenced by route handlers, causing
NameError; either restore their initializations (uncomment and ensure proper
imports/credentials like SERVICE_ACCOUNT_FILE/SCOPES for GoogleDocsService and
model pipeline call for qa_model) or add guards in each dependent handler to
check for None/availability and return a 503/meaningful error. Locate the
commented declarations (MCQGen, answer, BoolQGen, ShortQGen, qg, docs_service,
qa_model) and either re-enable their creation with proper error
handling/logging, or wrap routes that use methods on these instances with
existence checks (e.g., if not MCQGen: return error) to prevent runtime
NameError.
- Around line 499-520: Concurrent requests can interleave calls to the shared
rag_service (created from RAGService) causing wrong document state; fix by
adding a synchronization primitive (e.g., a global threading.Lock named
rag_lock) and acquire it around initialization and the pair of operations that
must be atomic: rag_service.index_text(...) and rag_service.query(...); ensure
you also protect the RAGService instantiation (the rag_service = RAGService()
block) with the same lock so initialization and per-request index/query are
serialized, and release the lock in a finally block or via a context manager to
avoid deadlocks.
- Around line 505-528: The exception handler in the /chat endpoint returns raw
exception text (str(e)) to clients; change it to log the full exception
server-side (including stack trace) and return a safe, generic JSON error
message to the client (e.g., {"error":"Internal server error"}) with a 500
status; keep the existing input validation that returns 400 for missing
document_text/question, and consider catching and mapping known/expected
exceptions from rag_service.index_text or rag_service.query to appropriate 4xx
responses, but never expose internal exception strings in the response.
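A framework-agnostic sketch of that handler shape; Flask specifics are omitted, and the (body, status) return convention merely mirrors Flask view functions:

```python
import logging

logger = logging.getLogger("chat")

def chat_endpoint(process, data):
    # 400 for client errors: malformed body or missing required fields.
    if not isinstance(data, dict) or not data.get("document_text") or not data.get("question"):
        return {"error": "document_text and question are required"}, 400
    try:
        return {"answer": process(data), "status": "success"}, 200
    except Exception:
        # Full traceback stays in the server logs; the client only
        # ever sees a generic message, never str(e).
        logger.exception("chat request failed")
        return {"error": "Internal server error"}, 500
```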

---

Nitpick comments:
In `@backend/Generator/rag.py`:
- Line 96: Replace the unused FAISS distance binding by discarding it: in the
call to self.index.search(question_embedding, top_k) ignore the first returned
value (currently assigned to distances) and only capture indices (e.g., use _ ,
indices = self.index.search(...)); update the assignment where distances is
currently declared so intent is clear and no unused variable remains.

In `@backend/server.py`:
- Around line 33-35: The startup log incorrectly claims the RAG service is
initialized while rag_service is set to None; update the message to reflect lazy
initialization by either removing or changing the print("RAG SERVICE INITIALIZED
SUCCESSFULLY") to a message like "RAG service will be initialized on first /chat
request" or move the success log into the actual initialization code path that
creates rag_service (the handler for the /chat endpoint) so the success message
is emitted only when rag_service is assigned.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fc3bf1a and 7263bc7.

📒 Files selected for processing (4)
  • backend/Generator/main.py
  • backend/Generator/mcq.py
  • backend/Generator/rag.py
  • backend/server.py


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

♻️ Duplicate comments (1)
backend/server.py (1)

507-512: ⚠️ Potential issue | 🟠 Major

Validate request JSON shape in /chat before field access.

Malformed or non-object JSON currently falls into a 500 path instead of a client 400.

Proposed fix
-        data = request.get_json()
+        data = request.get_json(silent=True) or {}
+        if not isinstance(data, dict):
+            return jsonify({"error": "Invalid JSON body"}), 400

         document_text = data.get("document_text")
         question = data.get("question")
         chat_history = data.get("chat_history", [])
+
+        if not isinstance(chat_history, list):
+            return jsonify({"error": "chat_history must be a list"}), 400
+        if any(not isinstance(turn, dict) for turn in chat_history):
+            return jsonify({"error": "chat_history entries must be objects"}), 400

Also applies to: 532-534

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/server.py` around lines 507 - 512, The handler reads
request.get_json() and directly accesses data.get(...), causing 500s on
malformed or non-object JSON; update the /chat request handling to validate that
request.get_json() returned a dict (and not None) before field access, returning
a 400 client error for invalid JSON; then validate required fields document_text
and question are present and are strings, and ensure chat_history if provided is
a list (or default to an empty list) before using it; apply the same validation
pattern to the other spot where request.get_json() is used (the block that also
accesses document_text/question/chat_history).
🧹 Nitpick comments (1)
backend/server.py (1)

520-526: Consider reducing lock scope for better chat throughput.

Current global lock includes retrieval + generation, so one slow request blocks all users. A per-document index cache (keyed by content hash) plus narrower write locking would scale better.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/server.py` around lines 520 - 526, The current global rag_lock around
rag_service creation, index_text and query blocks all requests; narrow the lock
to only create or update per-document indexes by introducing a per-document
cache keyed by a content hash (use document_text hash) and protect that cache
with a short-lived lock instead of rag_lock; keep RAGService.query(chat_history,
question) outside the lock so concurrent queries run in parallel; modify code
that references rag_service, RAGService, index_text and query to check the
per-document cache, acquire the lock only when inserting a new RAGService or
calling index_text for that document, and then release before calling query.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/Generator/main.py`:
- Around line 393-428: The new extract_text_from_image function is implemented
but never called from the upload pipeline; update process_file to detect image
uploads (by MIME type or file extension) and call
extract_text_from_image(file_path) when an uploaded file is an image, then
attach the returned OCR text (e.g., into the file metadata, a text field, or the
same return structure process_file uses) so image uploads produce extracted
text; ensure errors from extract_text_from_image are handled the same way other
extraction failures are handled in process_file.
- Line 2: Remove the conflicting import of safe_name from pkg_resources and rely
on the local safe_name used in process_file; specifically delete the line
importing safe_name (the top-level "from pkg_resources import safe_name") so it
no longer collides with the local variable named safe_name in the function
process_file, then run linters to confirm Ruff F811 is resolved.
- Around line 443-460: The uploaded temp file saved via file.save(file_path) is
never removed; update the function to delete file_path after extraction (whether
success or error) by moving the extraction logic into a try/except and adding a
finally block that calls os.remove(file_path) (or pathlib.Path.unlink() with an
exists check) so both branches that call self.extract_text_from_pdf and
self.extract_text_from_docx (and the .txt read branch) always remove the temp
file; ensure removal itself is wrapped to ignore FileNotFoundError and does not
mask the original extraction exception.
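The cleanup pattern in stdlib form; the plain file read stands in for the PDF/DOCX extraction calls:

```python
import os

def extract_and_cleanup(path):
    try:
        with open(path, "r", encoding="utf-8") as f:
            return f.read()      # stand-in for extract_text_from_pdf/docx
    finally:
        try:
            os.remove(path)      # runs on success and on error alike
        except FileNotFoundError:
            pass                 # removal must not mask the real exception
```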

---

Duplicate comments:
In `@backend/server.py`:
- Around line 507-512: The handler reads request.get_json() and directly
accesses data.get(...), causing 500s on malformed or non-object JSON; update the
/chat request handling to validate that request.get_json() returned a dict (and
not None) before field access, returning a 400 client error for invalid JSON;
then validate required fields document_text and question are present and are
strings, and ensure chat_history if provided is a list (or default to an empty
list) before using it; apply the same validation pattern to the other spot where
request.get_json() is used (the block that also accesses
document_text/question/chat_history).

---

Nitpick comments:
In `@backend/server.py`:
- Around line 520-526: The current global rag_lock around rag_service creation,
index_text and query blocks all requests; narrow the lock to only create or
update per-document indexes by introducing a per-document cache keyed by a
content hash (use document_text hash) and protect that cache with a short-lived
lock instead of rag_lock; keep RAGService.query(chat_history, question) outside
the lock so concurrent queries run in parallel; modify code that references
rag_service, RAGService, index_text and query to check the per-document cache,
acquire the lock only when inserting a new RAGService or calling index_text for
that document, and then release before calling query.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7263bc7 and 4fda52a.

📒 Files selected for processing (3)
  • backend/Generator/main.py
  • backend/Generator/rag.py
  • backend/server.py


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/Generator/main.py`:
- Around line 433-443: The code currently writes uploads directly to file_path
built from safe_name (in Generator.main.py using self.upload_folder, safe_name,
file_path and file.save), which allows concurrent requests with the same
filename to overwrite each other; change the upload flow to generate a unique
temp filename (e.g., include a UUID or use tempfile to create a unique name in
self.upload_folder), write to that unique path instead of file_path, then pass
that unique path to the downstream extraction logic and cleanup; ensure
references to file_path/file.save are updated to use the unique filename and
that any function expecting safe_name or file_path (e.g., the extraction caller)
receives the generated unique path.
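A minimal sketch of the unique-name step; uuid4 is chosen here, though tempfile.mkstemp in the upload folder would work equally well:

```python
import os
import uuid

def unique_upload_name(filename):
    base = os.path.basename(filename)   # drop client-supplied directories
    # A UUID prefix keeps concurrent uploads of "notes.pdf" from colliding.
    return f"{uuid.uuid4().hex}_{base}"
```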
- Around line 61-69: The crash occurs because is_word_available in mcq.py calls
s2v_model.get_best_sense without handling s2v_model=None; update
is_word_available to check if s2v_model is None before calling get_best_sense
(e.g., skip sense2vec-based filtering and return the non-sense2vec availability
result or True/appropriate fallback) so identify_keywords can safely pass
s2v_model=None; reference the symbols is_word_available, s2v_model, and
get_best_sense (and identify_keywords which passes s2v_model) when making the
change.
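The guard itself is a two-line change. The sketch below shows only the pattern, with a stub model; the True fallback when no model is loaded is an assumption about the desired behavior, not the project's actual semantics:

```python
class StubS2V:
    # Stand-in for a loaded Sense2Vec model.
    def get_best_sense(self, word):
        return "cat|NOUN" if word == "cat" else None

def is_word_available(word, s2v_model):
    if s2v_model is None:
        return True              # no model loaded: skip sense2vec filtering
    return s2v_model.get_best_sense(word) is not None
```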

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4fda52a and 4c28f60.

📒 Files selected for processing (1)
  • backend/Generator/main.py



Development

Successfully merging this pull request may close these issues.

[FEATURE]: AI-powered "Chat with your Quiz" — ask follow-up questions on quiz content

1 participant