Skip to content

feat: Add PPTX file upload support for quiz generation (#361)#492

Open
zohaib-7035 wants to merge 5 commits intoAOSSIE-Org:mainfrom
zohaib-7035:feature/add-pptx-support
Open

feat: Add PPTX file upload support for quiz generation (#361)#492
zohaib-7035 wants to merge 5 commits intoAOSSIE-Org:mainfrom
zohaib-7035:feature/add-pptx-support

Conversation

@zohaib-7035
Copy link

@zohaib-7035 zohaib-7035 commented Feb 26, 2026

Summary

Closes #361

Adds support for .pptx (PowerPoint) file uploads, allowing users to extract text from presentations for quiz generation.

Changes

Backend

  • Added python-pptx dependency to requirements.txt
  • Implemented extract_text_from_pptx() method in FileProcessor class (backend/Generator/main.py)
    • Extracts text from text frames (titles, text boxes, body placeholders)
    • Extracts text from table cells across all slides
  • Updated process_file() to route .pptx files to the new extractor

Frontend

  • Updated file upload label from "PDF, MP3 supported" → "PDF, PPTX, MP3 supported" in:
    • eduaid_web/src/pages/Text_Input.jsx
    • extension/src/pages/text_input/TextInput.jsx
  • Added accept attribute to file inputs to filter for supported file types

Tests

  • Added backend/test_pptx_extraction.py with 6 unit tests:
    • Simple text extraction from textboxes
    • Table cell text extraction
    • Empty presentation handling
    • process_file() routing for .pptx
    • Unsupported .ppt (legacy binary) rejection
    • Multi-slide text concatenation

How to Test

cd backend
pip install python-pptx
python -m pytest test_pptx_extraction.py -v

All 6 tests pass ✅

Notes

  • Only .pptx (modern XML-based format) is supported. Legacy .ppt (binary format) is not supported by python-pptx and will gracefully return empty content.
  • The implementation follows the same pattern as the existing PDF (PyMuPDF) and DOCX (mammoth) file processors.

Summary by CodeRabbit

  • New Features

    • PowerPoint (.pptx) text extraction support
    • Wikipedia enrichment fallback: requests proceed and return a user-facing warning if enrichment fails
    • File upload UI now lists supported formats (PDF, PPTX, TXT, DOCX, MP3) and restricts selectable types
  • Tests

    • Added tests for PPTX extraction and for Wikipedia enrichment fallback across endpoints

zohaib-7035 and others added 3 commits February 13, 2026 23:16
AOSSIE-Org#428)

When use_mediawiki=1 is passed but the MediaWiki API call fails (SSL
error, timeout, network unreachable, etc.), the entire request crashed
with a 500 response. Users saw a generic error and lost their input.

Changes:
- Wrap mediawikiapi.summary() in try/except inside process_input_text()
- On failure, log a warning and continue with the original input text
- Return a "warning" field in the JSON response so the frontend can
  notify users that Wikipedia enrichment was skipped
- All 7 endpoints that use MediaWiki are updated: /get_mcq, /get_boolq,
  /get_shortq, /get_problems, /get_shortq_hard, /get_mcq_hard,
  /get_boolq_hard
- Add conftest.py with session-scoped fixture to prevent heavy ML model
  loading during tests
- Add 30 new pytest tests covering SSL errors, connection errors,
  timeouts, successful calls, and text preservation

Co-authored-by: Cursor <cursoragent@cursor.com>
- Use defensive .get() with default [] on output["questions"] and
  output["Boolean_Questions"] to prevent KeyError when generator
  returns an empty dict (major)
- Rename test_ssl_error_hard_endpoints to
  test_connection_error_hard_endpoints to match actual exception (minor)

Co-authored-by: Cursor <cursoragent@cursor.com>
- Add python-pptx dependency to requirements.txt
- Implement extract_text_from_pptx() in FileProcessor class
  - Extracts text from text frames (titles, text boxes, placeholders)
  - Extracts text from table cells across all slides
- Update process_file() to route .pptx files to the new extractor
- Update frontend labels and file accept filters in:
  - eduaid_web (Text_Input.jsx)
  - extension (TextInput.jsx)
- Add 6 unit tests covering:
  - Simple text extraction
  - Table text extraction
  - Empty presentations
  - process_file routing for .pptx
  - Unsupported .ppt rejection
  - Multi-slide extraction

Closes AOSSIE-Org#361
@coderabbitai
Copy link

coderabbitai bot commented Feb 26, 2026

📝 Walkthrough

Walkthrough

Adds PPTX upload/extraction support (new FileProcessor in backend/utils/file_processor.py), moves FileProcessor export from Generator.main, updates requirements and frontends to accept .pptx, adds tests for PPTX extraction and MediaWiki fallback, and introduces test fixtures that mock heavy ML/models and services; server now logs and returns optional MediaWiki warning.

Changes

Cohort / File(s) Summary
FileProcessor relocation & implementation
backend/Generator/main.py, backend/utils/file_processor.py, backend/Generator/__init__.py
Removed FileProcessor from Generator/main.py; added full FileProcessor implementation in backend/utils/file_processor.py with extract_text_from_pdf/docx/pptx and process_file; __init__ now imports FileProcessor from utils.file_processor.
PPTX dependency
requirements.txt
Added python-pptx to support PPTX parsing.
Backend server: wiki enrichment & logging
backend/server.py
Added logging setup and process_input_text() to attempt MediaWiki enrichment; endpoints unpack (text, wiki_warning) and include optional warning in responses when enrichment fails.
Testing: fixtures & mocks
backend/conftest.py
New comprehensive pytest fixtures that mock heavy ML models, NLTK/transformers downloads, MediaWiki, QA pipeline, file processor and docs service to enable fast, isolated tests.
PPTX extraction tests
backend/test_pptx_extraction.py
New test module with in-file FileProcessor variant and tests for PPTX extraction (text frames, tables, multi-slide), process_file routing, and unsupported .ppt.
MediaWiki fallback tests
backend/test_wikipedia_fallback.py
New tests covering MediaWiki failures (SSL, Connection, Timeout, generic) across endpoints; expect HTTP 200 outputs plus warning when enrichment fails and no warning on success or when disabled.
Frontend: accept .pptx
eduaid_web/src/pages/Text_Input.jsx, extension/src/pages/text_input/TextInput.jsx
Added ALLOWED_FILE_TYPES constant; UI text and file input accept updated to include .pptx (and other supported types); minor import formatting fixes.
New/changed tests & CI surface
backend/test_*
Added test modules and mocks; no production API signature changes but response payloads may include optional warning.

Sequence Diagrams

sequenceDiagram
    participant User as User/Client
    participant Frontend as Frontend
    participant Backend as Backend Server
    participant FileProc as File Processor
    participant Generator as Quiz Generator

    User->>Frontend: Upload PPTX file
    Frontend->>Backend: POST /upload (file)
    Backend->>FileProc: process_file(file)
    FileProc->>FileProc: extract_text_from_pptx()
    FileProc-->>Backend: return extracted text
    Backend->>Generator: generate_quiz(extracted_text)
    Generator-->>Backend: quiz results
    Backend-->>Frontend: HTTP 200 JSON (output)
    Frontend-->>User: Display quiz
Loading
sequenceDiagram
    participant User as User/Client
    participant Frontend as Frontend
    participant Backend as Backend Server
    participant MediaWiki as MediaWiki API
    participant Generator as Quiz Generator

    User->>Frontend: Submit text (use_mediawiki=1)
    Frontend->>Backend: POST /get_mcq (text, flag)
    Backend->>Backend: process_input_text(text)
    Backend->>MediaWiki: attempt enrichment
    alt MediaWiki Success
        MediaWiki-->>Backend: enriched_text
        Backend->>Generator: generate_mcq(enriched_text)
        Generator-->>Backend: MCQs
        Backend-->>Frontend: response (output, no warning)
    else MediaWiki Failure
        MediaWiki--xBackend: error/timeout
        Backend->>Backend: log warning, use original text
        Backend->>Generator: generate_mcq(original_text)
        Generator-->>Backend: MCQs
        Backend-->>Frontend: response (output + warning)
    end
    Frontend-->>User: Display quiz (+/- warning)
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 I nibble slides and hop through frames so bright,
I pull the words from tables, make quizzes overnight.
Tests guard my burrow and warnings keep us wise,
PPTX now hops along — hooray for learning skies! 🥕📊

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 43.64% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and concisely describes the main feature addition—PPTX file upload support for quiz generation—which aligns with the primary objective of the PR.
Linked Issues check ✅ Passed The PR fulfills all coding-related requirements from issue #361: PPTX file uploads are supported with text extraction from slides and tables, backend and frontend are updated, file type validation is added via accept attributes, and tests validate the implementation.
Out of Scope Changes check ✅ Passed All changes are directly related to PPTX support and issue #361. Additional improvements include refactored conftest.py for testing infrastructure and MediaWiki integration with graceful error handling, which extend quality assurance but remain within the reasonable scope of the feature work.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
backend/Generator/main.py (1)

394-410: ⚠️ Potential issue | 🟠 Major

Temp upload cleanup is not guaranteed on extraction failures.

If any extractor raises (e.g., malformed PPTX), execution won’t reach Line 409, leaving orphaned files in uploads/.

🛠️ Proposed hardening
 def process_file(self, file):
     file_path = os.path.join(self.upload_folder, file.filename)
     file.save(file_path)
     content = ""
-
-    if file.filename.endswith('.txt'):
-        with open(file_path, 'r') as f:
-            content = f.read()
-    elif file.filename.endswith('.pdf'):
-        content = self.extract_text_from_pdf(file_path)
-    elif file.filename.endswith('.docx'):
-        content = self.extract_text_from_docx(file_path)
-    elif file.filename.endswith('.pptx'):
-        content = self.extract_text_from_pptx(file_path)
-
-    os.remove(file_path)
-    return content
+    ext = os.path.splitext(file.filename)[1].lower()
+    try:
+        if ext == '.txt':
+            with open(file_path, 'r', encoding='utf-8') as f:
+                content = f.read()
+        elif ext == '.pdf':
+            content = self.extract_text_from_pdf(file_path)
+        elif ext == '.docx':
+            content = self.extract_text_from_docx(file_path)
+        elif ext == '.pptx':
+            content = self.extract_text_from_pptx(file_path)
+    finally:
+        if os.path.exists(file_path):
+            os.remove(file_path)
+    return content
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/Generator/main.py` around lines 394 - 410, The process_file function
can leave uploaded files in uploads/ if an extractor raises because
os.remove(file_path) is only called on the normal path; wrap the save/processing
in a try/finally so file_path is always removed (or use a context-managed temp
file), e.g., ensure the block that calls extract_text_from_pdf,
extract_text_from_docx, extract_text_from_pptx or reads the .txt is inside a try
and move os.remove(file_path) into the finally so cleanup always runs; keep
raising or logging the original exception after cleanup so calling code can
handle extractor errors.
backend/server.py (1)

456-464: ⚠️ Potential issue | 🔴 Critical

/get_boolq_hard currently calls the generator with an invalid answer style.

answer_style="true_false" is not supported by QuestionGenerator and will fail at runtime. Also, Line 463 applies make_question_harder to each whole item instead of question text.

✅ Safe fix (use existing bool generator path)
 `@app.route`("/get_boolq_hard", methods=["POST"])
 def get_boolq_hard():
     data = request.get_json()
     input_text = data.get("input_text", "")
     use_mediawiki = data.get("use_mediawiki", 0)
-    input_questions = data.get("input_question", [])
+    max_questions = data.get("max_questions", 4)

     input_text, wiki_warning = process_input_text(input_text, use_mediawiki)

-    # Generate questions using the same QG model
-    generated = qg.generate(
-        article=input_text,
-        num_questions=input_questions,
-        answer_style="true_false"
-    )
-
-    # Apply transformation to make each question harder
-    harder_questions = [make_question_harder(q) for q in generated]
+    generated = BoolQGen.generate_boolq(
+        {"input_text": input_text, "max_questions": max_questions}
+    )
+    harder_questions = [
+        make_question_harder(q)
+        for q in generated.get("Boolean_Questions", [])
+    ]

     result = {"output": harder_questions}
     if wiki_warning:
         result["warning"] = wiki_warning
     return jsonify(result)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/server.py` around lines 456 - 464, The call to qg.generate uses an
invalid answer_style ("true_false") and then maps make_question_harder over
entire items instead of their question text; update the qg.generate invocation
to use the boolean-compatible generator path or a supported answer_style (e.g.,
the existing bool generator variant used elsewhere in the project) so generation
succeeds, assign its result to generated as before, and change the list
comprehension that builds harder_questions to call make_question_harder on the
question string within each generated item (e.g.,
make_question_harder(item["question"])) rather than on the whole item.
🧹 Nitpick comments (1)
eduaid_web/src/pages/Text_Input.jsx (1)

188-190: Consider deriving upload copy from a shared allowed-types constant.

This same list is duplicated in both web and extension UIs. Defining one shared constant (or helper) for accepted extensions + display label would prevent drift like the current copy mismatch.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@eduaid_web/src/pages/Text_Input.jsx` around lines 188 - 190, Derive the
upload copy and accept list from a single shared constant instead of hardcoding
them in the JSX: create an exported constant (e.g., ALLOWED_FILE_TYPES or
allowedFileTypes with shape {accept: ".pdf,.pptx,.txt,.docx,.mp3", label: "PDF,
PPTX, MP3 supported"}) and import it where Text_Input.jsx uses fileInputRef and
handleFileUpload; replace the hardcoded accept attribute and the paragraph text
("Choose a file...") to reference allowedFileTypes.accept and
allowedFileTypes.label so both the input accept and visible copy stay in sync
(also update the extension UI to import the same constant).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/Generator/main.py`:
- Around line 406-407: The routing currently only matches
file.filename.endswith('.pptx') and therefore rejects .ppt files; add explicit
handling for '.ppt' by detecting file.filename.endswith('.ppt') and either (a)
call a new helper extract_text_from_ppt(file_path) that extracts text from
legacy PPTs, or (b) convert the .ppt to .pptx (e.g., via soffice/LibreOffice
headless conversion) and then call the existing
extract_text_from_pptx(file_path_converted); update or add the helper function
name (extract_text_from_ppt) or conversion flow and ensure the branch logic
calls the correct helper so both .ppt and .pptx are supported by the upload
processing path.

In `@backend/test_pptx_extraction.py`:
- Around line 16-55: The test file currently defines its own FileProcessor class
which duplicates production logic and masks regressions; replace the local class
by importing the real implementation (or extract the shared functionality into a
small module and import that from both places). Specifically, remove the
duplicate FileProcessor in this test and import the production FileProcessor
from the module containing the real class (or move
extract_text_from_pptx/process_file into a new shared module and have both
backend/Generator/main.py and backend/test_pptx_extraction.py import that
module) so tests exercise the actual production symbols (FileProcessor,
extract_text_from_pptx, process_file).

In `@extension/src/pages/text_input/TextInput.jsx`:
- Around line 237-244: The upload hint text "PDF, PPTX, MP3 supported" is out of
sync with the input accept list (which also allows .txt and .docx); update the
hint text near the JSX where the upload message is rendered to reflect the
actual accepted types or remove .txt/.docx from the accept attribute if those
types shouldn't be allowed—locate the hint element adjacent to the <input
type="file" ref={fileInputRef} onChange={handleFileUpload}
accept=".pdf,.pptx,.txt,.docx,.mp3" and make the text match the accept list (or
vice versa) so users see the correct supported file types.

---

Outside diff comments:
In `@backend/Generator/main.py`:
- Around line 394-410: The process_file function can leave uploaded files in
uploads/ if an extractor raises because os.remove(file_path) is only called on
the normal path; wrap the save/processing in a try/finally so file_path is
always removed (or use a context-managed temp file), e.g., ensure the block that
calls extract_text_from_pdf, extract_text_from_docx, extract_text_from_pptx or
reads the .txt is inside a try and move os.remove(file_path) into the finally so
cleanup always runs; keep raising or logging the original exception after
cleanup so calling code can handle extractor errors.

In `@backend/server.py`:
- Around line 456-464: The call to qg.generate uses an invalid answer_style
("true_false") and then maps make_question_harder over entire items instead of
their question text; update the qg.generate invocation to use the
boolean-compatible generator path or a supported answer_style (e.g., the
existing bool generator variant used elsewhere in the project) so generation
succeeds, assign its result to generated as before, and change the list
comprehension that builds harder_questions to call make_question_harder on the
question string within each generated item (e.g.,
make_question_harder(item["question"])) rather than on the whole item.

---

Nitpick comments:
In `@eduaid_web/src/pages/Text_Input.jsx`:
- Around line 188-190: Derive the upload copy and accept list from a single
shared constant instead of hardcoding them in the JSX: create an exported
constant (e.g., ALLOWED_FILE_TYPES or allowedFileTypes with shape {accept:
".pdf,.pptx,.txt,.docx,.mp3", label: "PDF, PPTX, MP3 supported"}) and import it
where Text_Input.jsx uses fileInputRef and handleFileUpload; replace the
hardcoded accept attribute and the paragraph text ("Choose a file...") to
reference allowedFileTypes.accept and allowedFileTypes.label so both the input
accept and visible copy stay in sync (also update the extension UI to import the
same constant).

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fc3bf1a and 5c9f086.

📒 Files selected for processing (8)
  • backend/Generator/main.py
  • backend/conftest.py
  • backend/server.py
  • backend/test_pptx_extraction.py
  • backend/test_wikipedia_fallback.py
  • eduaid_web/src/pages/Text_Input.jsx
  • extension/src/pages/text_input/TextInput.jsx
  • requirements.txt

Comment on lines +406 to +407
elif file.filename.endswith('.pptx'):
content = self.extract_text_from_pptx(file_path)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Current implementation does not satisfy the stated .ppt / .pptx objective.

Routing was added only for .pptx. .ppt remains unsupported (and tests currently codify rejection). If .ppt is truly in scope for issue #361 objectives, this is still incomplete.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/Generator/main.py` around lines 406 - 407, The routing currently only
matches file.filename.endswith('.pptx') and therefore rejects .ppt files; add
explicit handling for '.ppt' by detecting file.filename.endswith('.ppt') and
either (a) call a new helper extract_text_from_ppt(file_path) that extracts text
from legacy PPTs, or (b) convert the .ppt to .pptx (e.g., via
soffice/LibreOffice headless conversion) and then call the existing
extract_text_from_pptx(file_path_converted); update or add the helper function
name (extract_text_from_ppt) or conversion flow and ensure the branch logic
calls the correct helper so both .ppt and .pptx are supported by the upload
processing path.

- Add explicit .ppt handling with warning log for unsupported legacy format
- Rewrite tests to import the real FileProcessor from Generator.main
  (mocking heavy ML dependencies via sys.modules to avoid model loading)
- Sync upload hint text with accept filter in both web and extension
  ('PDF, PPTX, TXT, DOCX, MP3 supported')
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (4)
backend/utils/file_processor.py (1)

13-18: Consider closing the PDF document explicitly.

fitz.open() returns a Document object that holds file resources. While it will be garbage-collected eventually, explicitly closing it is more robust, especially if processing many files.

♻️ Proposed fix using context manager
     def extract_text_from_pdf(self, file_path):
-        doc = fitz.open(file_path)
-        text = ""
-        for page in doc:
-            text += page.get_text()
-        return text
+        with fitz.open(file_path) as doc:
+            text = ""
+            for page in doc:
+                text += page.get_text()
+            return text
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/utils/file_processor.py` around lines 13 - 18, The
extract_text_from_pdf function opens a fitz Document with fitz.open(file_path)
but never closes it, risking leaked file handles; update extract_text_from_pdf
to either use a context manager (with fitz.open(file_path) as doc:) or ensure
doc.close() in a finally block so the Document is always closed after reading
pages (refer to extract_text_from_pdf and the fitz.open/doc usage to locate the
change).
backend/test_pptx_extraction.py (1)

82-82: Minor: Unused lambda parameter.

The dest parameter in the mock's side_effect lambda is unused. You can use _ to indicate it's intentionally ignored.

♻️ Optional fix
-        mock_file.save = MagicMock(side_effect=lambda dest: None)
+        mock_file.save = MagicMock(side_effect=lambda _: None)

Apply similarly on line 97.

Also applies to: 97-97

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/test_pptx_extraction.py` at line 82, The test uses
MagicMock(side_effect=lambda dest: None) for mock_file.save with an unused
parameter name; update the side_effect lambdas to use a throwaway parameter name
(e.g., lambda _: None) for mock_file.save (and the other occurrence at the
second lambda mentioned) so the unused argument is explicitly ignored; locate
the MagicMock side_effect definitions for mock_file.save to apply the change.
backend/server.py (1)

68-79: Consider catching more specific exceptions for MediaWiki failures.

While the broad Exception catch ensures graceful degradation (per the docstring), it may mask unrelated programming errors. The MediaWikiAPI typically raises specific exceptions like MediaWikiException or network-related errors.

Also, the log message on line 72 uses an EN DASH () instead of a standard hyphen (-).

♻️ Suggested improvement
-        except Exception:
+        except (OSError, ConnectionError, TimeoutError, Exception) as exc:
             logger.warning(
-                "Wikipedia enrichment failed – continuing without it.",
+                "Wikipedia enrichment failed - continuing without it.",
                 exc_info=True,
             )

If you want to be more precise, you could import and catch mediawikiapi.MediaWikiException specifically, falling back to a broader catch for unexpected errors.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/server.py` around lines 68 - 79, The broad except around
mediawikiapi.summary should be narrowed: import and catch
mediawikiapi.MediaWikiException (and/or the library's network error class)
specifically to handle expected MediaWiki failures, log via logger.warning(...,
exc_info=True) for that specific exception, and then add a final fallback except
Exception to preserve graceful degradation; also replace the EN DASH in the
logger warning message with a standard hyphen. Reference mediawikiapi.summary,
mediawikiapi.MediaWikiException, logger.warning and the existing return block
when applying the change.
backend/Generator/main.py (1)

25-25: Remove unused import: Presentation from pptx.

This import is not used anywhere in main.py. Since FileProcessor (which uses Presentation) was moved to backend/utils/file_processor.py, this is leftover dead code from the refactoring.

♻️ Proposed fix
-from pptx import Presentation
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/Generator/main.py` at line 25, Remove the unused import
"Presentation" from pptx at the top of main.py; this was left over after
FileProcessor was moved (FileProcessor now uses Presentation in its own module),
so delete the line "from pptx import Presentation" and run a quick search in
main.py for any remaining references to Presentation to confirm no other changes
are needed.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@backend/Generator/main.py`:
- Line 25: Remove the unused import "Presentation" from pptx at the top of
main.py; this was left over after FileProcessor was moved (FileProcessor now
uses Presentation in its own module), so delete the line "from pptx import
Presentation" and run a quick search in main.py for any remaining references to
Presentation to confirm no other changes are needed.

In `@backend/server.py`:
- Around line 68-79: The broad except around mediawikiapi.summary should be
narrowed: import and catch mediawikiapi.MediaWikiException (and/or the library's
network error class) specifically to handle expected MediaWiki failures, log via
logger.warning(..., exc_info=True) for that specific exception, and then add a
final fallback except Exception to preserve graceful degradation; also replace
the EN DASH in the logger warning message with a standard hyphen. Reference
mediawikiapi.summary, mediawikiapi.MediaWikiException, logger.warning and the
existing return block when applying the change.

In `@backend/test_pptx_extraction.py`:
- Line 82: The test uses MagicMock(side_effect=lambda dest: None) for
mock_file.save with an unused parameter name; update the side_effect lambdas to
use a throwaway parameter name (e.g., lambda _: None) for mock_file.save (and
the other occurrence at the second lambda mentioned) so the unused argument is
explicitly ignored; locate the MagicMock side_effect definitions for
mock_file.save to apply the change.

In `@backend/utils/file_processor.py`:
- Around line 13-18: The extract_text_from_pdf function opens a fitz Document
with fitz.open(file_path) but never closes it, risking leaked file handles;
update extract_text_from_pdf to either use a context manager (with
fitz.open(file_path) as doc:) or ensure doc.close() in a finally block so the
Document is always closed after reading pages (refer to extract_text_from_pdf
and the fitz.open/doc usage to locate the change).

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5c9f086 and ce560b7.

📒 Files selected for processing (8)
  • backend/Generator/__init__.py
  • backend/Generator/main.py
  • backend/conftest.py
  • backend/server.py
  • backend/test_pptx_extraction.py
  • backend/utils/file_processor.py
  • eduaid_web/src/pages/Text_Input.jsx
  • extension/src/pages/text_input/TextInput.jsx
🚧 Files skipped from review as they are similar to previous changes (1)
  • eduaid_web/src/pages/Text_Input.jsx

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add Support for Uploading PPT Files to Generate Quizzes

1 participant