
Feature/pdf chunking rag prep #513

Open
zohaib-7035 wants to merge 5 commits into AOSSIE-Org:main from zohaib-7035:feature/pdf-chunking-rag-prep

Conversation

zohaib-7035 commented Mar 2, 2026

Addressed Issues:

Adds a PDF chunking utility (utils/text_processor.py) for RAG preparation to prevent OOM errors and pipeline hangs when processing large documents (50+ pages).

Screenshots/Recordings:

image

All 38 unit tests pass locally in 0.36s.

Additional Notes:

  • Zero new dependencies — pure Python implementation using only stdlib
  • Backward-compatible: /upload endpoint still returns content as before, plus new chunks and num_chunks fields
  • Chunking uses a RecursiveCharacterTextSplitter-style separator hierarchy: paragraph → line → sentence → word → character
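The backward-compatible response shape described above can be sketched as follows. The field names `content`, `chunks`, and `num_chunks` come from this PR description; the layout of each chunk dict is an assumption for illustration only:

```python
# Hypothetical shape of the /upload response after this PR. Existing
# clients keep reading "content"; RAG consumers read "chunks".
response = {
    "content": "Full extracted document text...",
    "chunks": [
        {"text": "Full extracted document text...", "chunk_index": 0},
    ],
    "num_chunks": 1,
}

# The chunk count should always agree with the chunk list.
assert response["num_chunks"] == len(response["chunks"])
```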

Checklist

  • My PR addresses a single issue, fixes a single bug or makes a single improvement.
  • My code follows the project's code style and conventions
  • If applicable, I have made corresponding changes or additions to the documentation
  • If applicable, I have made corresponding changes or additions to tests
  • My changes generate no new warnings or errors
  • I have read the Contribution Guidelines
  • Once I submit my PR, CodeRabbit AI will automatically review it and I will address CodeRabbit's comments.

I have used the following AI models and tools: Gemini (Antigravity coding assistant)

zohaib-7035 and others added 5 commits February 13, 2026 23:16
AOSSIE-Org#428)

When use_mediawiki=1 is passed but the MediaWiki API call fails (SSL
error, timeout, network unreachable, etc.), the entire request crashed
with a 500 response. Users saw a generic error and lost their input.

Changes:
- Wrap mediawikiapi.summary() in try/except inside process_input_text()
- On failure, log a warning and continue with the original input text
- Return a "warning" field in the JSON response so the frontend can
  notify users that Wikipedia enrichment was skipped
- All 7 endpoints that use MediaWiki are updated: /get_mcq, /get_boolq,
  /get_shortq, /get_problems, /get_shortq_hard, /get_mcq_hard,
  /get_boolq_hard
- Add conftest.py with session-scoped fixture to prevent heavy ML model
  loading during tests
- Add 30 new pytest tests covering SSL errors, connection errors,
  timeouts, successful calls, and text preservation

Co-authored-by: Cursor <cursoragent@cursor.com>
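The fallback pattern in the commit message above (wrap the MediaWiki call, log a warning, continue with the original text, surface a warning to the caller) can be sketched like this. The function names and warning wording are illustrative, not the PR's exact code:

```python
import logging

def enrich_with_mediawiki(text, summary_fn):
    """Try Wikipedia enrichment; fall back to the original text on failure.

    Returns (text_to_use, warning_or_none).
    """
    try:
        return summary_fn(text), None
    except Exception as exc:  # SSL error, timeout, network unreachable, ...
        logging.warning("MediaWiki enrichment skipped: %s", exc)
        return text, "Wikipedia enrichment was skipped: %s" % exc

# Simulate the failure mode the commit describes:
def fake_summary(_text):
    raise TimeoutError("Connection timed out")

text, warning = enrich_with_mediawiki("original input", fake_summary)
# The user's input survives, and a warning is available for the JSON response.
```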
- Use defensive .get() with default [] on output["questions"] and
  output["Boolean_Questions"] to prevent KeyError when generator
  returns an empty dict (major)
- Rename test_ssl_error_hard_endpoints to
  test_connection_error_hard_endpoints to match actual exception (minor)

Co-authored-by: Cursor <cursoragent@cursor.com>
- Add python-pptx dependency to requirements.txt
- Implement extract_text_from_pptx() in FileProcessor class
  - Extracts text from text frames (titles, text boxes, placeholders)
  - Extracts text from table cells across all slides
- Update process_file() to route .pptx files to the new extractor
- Update frontend labels and file accept filters in:
  - eduaid_web (Text_Input.jsx)
  - extension (TextInput.jsx)
- Add 6 unit tests covering:
  - Simple text extraction
  - Table text extraction
  - Empty presentations
  - process_file routing for .pptx
  - Unsupported .ppt rejection
  - Multi-slide extraction

Closes AOSSIE-Org#361
- Add explicit .ppt handling with warning log for unsupported legacy format
- Rewrite tests to import the real FileProcessor from Generator.main
  (mocking heavy ML dependencies via sys.modules to avoid model loading)
- Sync upload hint text with accept filter in both web and extension
  ('PDF, PPTX, TXT, DOCX, MP3 supported')
Add a TextProcessor utility that breaks large documents into overlapping
'Context Blocks' before they reach the question-generation pipeline.
This prevents OOM errors and pipeline hangs when processing large PDFs
(50+ pages).

Changes:
- Add backend/utils/text_processor.py with RecursiveCharacterTextSplitter-
  style chunking (zero new dependencies, pure Python stdlib)
- Add process_file_chunked() method to FileProcessor in Generator/main.py
- Update /upload endpoint in server.py to return chunks alongside content
  (backward-compatible: existing 'content' field is preserved)
- Add 38 unit tests in test_text_processor.py covering chunking logic,
  overlap, edge cases, and metadata generation

The splitter uses a separator hierarchy (paragraph > line > sentence >
word > character) with configurable chunk_size (default 1000 chars) and
chunk_overlap (default 200 chars), keeping each chunk within the T5
model's 512-token limit.
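The separator-hierarchy strategy described above can be sketched as follows. This is not the PR's text_processor.py, just an illustration of the recursive idea: try the coarsest separator first, recurse with finer ones for oversized pieces, and hard-split on characters as a last resort (chunk overlap is omitted here for brevity):

```python
def recursive_split(text, chunk_size,
                    separators=("\n\n", "\n", ". ", " ", "")):
    """Split text into chunks of at most chunk_size characters,
    preferring paragraph, then line, sentence, and word boundaries."""
    if len(text) <= chunk_size:
        return [text] if text else []
    sep, rest = separators[0], separators[1:]
    if sep == "":
        # Last resort: hard character split.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    chunks, current = [], ""
    for piece in (p for p in text.split(sep) if p):
        if len(piece) > chunk_size:
            # Piece too big for this level: flush and recurse with finer separators.
            if current:
                chunks.append(current)
                current = ""
            chunks.extend(recursive_split(piece, chunk_size, rest))
        elif len(current) + len(sep) + len(piece) <= chunk_size:
            current = piece if not current else current + sep + piece
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks

chunks = recursive_split(
    "para one.\n\npara two is a bit longer.\n\npara three.", 20)
```

Short paragraphs survive intact while the oversized middle paragraph is split at word boundaries, keeping every chunk within the size budget.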

coderabbitai bot commented Mar 2, 2026

📝 Walkthrough

This pull request introduces PowerPoint (.pptx) text extraction support, chunked text processing via a new TextProcessor utility, and MediaWiki-based text enrichment with warning propagation. FileProcessor gains PPTX extraction and chunked processing methods; the Flask server integrates these features and propagates warnings across MCQ/boolean/short-answer generation endpoints. Comprehensive test coverage validates PPTX extraction, TextProcessor chunking behavior, and graceful MediaWiki fallback. Web UI updates reflect expanded file format support (PPTX, TXT, DOCX).

Changes

  • Core Text Processing (backend/utils/text_processor.py, backend/utils/__init__.py): Introduces TextProcessor class with recursive boundary-based chunking algorithm supporting configurable chunk_size, chunk_overlap, and multiple separator granularities for RAG preparation.
  • File Processing Enhancements (backend/Generator/main.py): Adds extract_text_from_pptx method to extract text from PPTX slides, text frames, and tables; adds process_file_chunked for large document handling; initializes TextProcessor instance for chunked processing.
  • Server Integration (backend/server.py): Integrates TextProcessor and MediaWiki text enrichment; propagates wiki_warning across MCQ, BoolQ, ShortQ, and hard-variant endpoints; implements file chunking on upload with chunk metadata in responses.
  • Test Infrastructure (backend/conftest.py): Establishes centralized pytest configuration with session-scoped autouse fixtures to mock heavy ML dependencies (torch, transformers, nltk, MediaWikiAPI) before import; provides mock factories and public fixtures for Flask testing.
  • PPTX & Text Processing Tests (backend/test_pptx_extraction.py, backend/test_text_processor.py): Validates PPTX text extraction across text boxes, tables, and multiple slides; validates TextProcessor chunking behavior including boundary handling, overlap preservation, and metadata generation across edge cases.
  • MediaWiki Fallback Tests (backend/test_wikipedia_fallback.py): Tests graceful fallback when MediaWiki API fails across all generation endpoints; verifies warning propagation on failure, absence of warning on success, and correct text routing to generators.
  • UI Format Updates (eduaid_web/src/pages/Text_Input.jsx, extension/src/pages/text_input/TextInput.jsx): Updates supported file types text and input accept attributes to include PPTX, TXT, and DOCX alongside existing PDF and MP3 formats.
  • Dependencies (requirements.txt): Adds python-pptx for PPTX file handling.
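The conftest.py strategy summarized above (stub heavy ML modules in sys.modules before the app imports them, so tests never load real models) can be sketched as follows. The helper name and module list mirror the summary; the exact fixture wiring in the PR may differ:

```python
import sys
from unittest.mock import MagicMock

# Modules the summary says are mocked before import.
HEAVY_MODULES = ["torch", "transformers", "nltk", "mediawikiapi"]

def install_ml_stubs():
    """Insert MagicMock stand-ins so importing the app never loads real models.

    setdefault leaves any already-imported real module untouched.
    """
    for name in HEAVY_MODULES:
        sys.modules.setdefault(name, MagicMock())

install_ml_stubs()
import torch  # resolves to the stub when the real package is absent
```

In a real conftest.py this would run from a session-scoped autouse fixture so the stubs are in place before the Flask app module is first imported by any test.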

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Server as Flask Server
    participant FileProc as FileProcessor
    participant TextProc as TextProcessor
    
    Client->>Server: POST /upload (PPTX file)
    Server->>FileProc: process_file(file)
    FileProc->>FileProc: extract_text_from_pptx()
    FileProc->>TextProc: chunk_document(text)
    TextProc->>TextProc: _recursive_split() with overlaps
    TextProc-->>FileProc: [chunks with metadata]
    FileProc-->>Server: extracted text + chunks
    Server-->>Client: {content, chunks, num_chunks}
sequenceDiagram
    participant Client
    participant Server as Flask Server
    participant MediaWiki as MediaWiki API
    participant Generator as MCQ/BoolQ/ShortQ
    
    Client->>Server: POST /get_mcq?text=...&use_mediawiki=1
    Server->>MediaWiki: fetch_enriched_text()
    alt MediaWiki Success
        MediaWiki-->>Server: enriched_text
        Server->>Generator: generate(enriched_text)
    else MediaWiki Failure
        MediaWiki-->>Server: Exception
        Server->>Generator: generate(original_text)
        Server->>Server: wiki_warning = "API failed"
    end
    Generator-->>Server: questions
    Server-->>Client: {output, warning?}

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Poem

🐰 Hop, hop! New features bloom,
PPTX slides light up the room!
Chunks and warnings, flows so clean,
TextProcessor's the best we've seen.

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Title check ✅ Passed: The PR title 'Feature/pdf chunking rag prep' accurately describes a key feature introduced in the changeset. The PR adds TextProcessor for chunking documents, process_file_chunked() method, /upload endpoint updates for RAG preparation, and comprehensive test coverage. While the title uses 'pdf' and the actual scope is broader (PPTX, TXT, DOCX support also added), the primary technical feature of chunking for RAG is clearly communicated.
  • Docstring Coverage ✅ Passed: No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
  • Description Check ✅ Passed: Check skipped - CodeRabbit's high-level summary is enabled.


coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (6)
backend/server.py (1)

472-492: Consider passing source_type to chunk_document for better metadata.

The upload endpoint processes various file types (PDF, PPTX, DOCX, TXT), but chunking doesn't capture the source type in metadata. This could help downstream RAG pipelines.

💡 Suggested enhancement
     content = file_processor.process_file(file)
     
     if content:
+        # Derive source type from file extension
+        ext = file.filename.rsplit('.', 1)[-1].lower() if '.' in file.filename else 'unknown'
-        chunks = text_processor.chunk_document(content)
+        chunks = text_processor.chunk_document(content, source_type=ext)
         return jsonify({
             "content": content,
             "chunks": chunks,
             "num_chunks": len(chunks),
         })
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/server.py` around lines 472 - 492, The upload_file endpoint currently
calls file_processor.process_file(file) and then
text_processor.chunk_document(content) without informing chunk_document of the
original source; update upload_file to determine a source_type (e.g., from
file.mimetype or file.filename extension) and pass it into
text_processor.chunk_document as a second argument (e.g.,
chunk_document(content, source_type)), propagate that source_type into the chunk
metadata inside chunk_document, and include source_type in the upload_file JSON
response alongside content, chunks, and num_chunks so downstream RAG pipelines
can use the source information.
backend/Generator/main.py (2)

408-415: Legacy .ppt format returns empty content silently.

The warning is logged, but the caller receives an empty string without any indication that the file format was unsupported. Consider returning an error indicator or raising an exception so upstream code can inform the user.

💡 Suggested improvement to surface the warning
         elif file.filename.endswith('.pptx'):
             content = self.extract_text_from_pptx(file_path)
         elif file.filename.endswith('.ppt'):
             import logging
             logging.warning(
                 "Legacy .ppt format is not supported. "
                 "Please convert to .pptx and try again."
             )
+            # Return a sentinel or raise so callers know extraction failed
+            content = ""  # Explicitly mark as unsupported
 
         os.remove(file_path)
-        return content
+        return content

Alternatively, return a tuple (content, warning) or raise a custom exception so the API layer can return a proper error response to the user.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/Generator/main.py` around lines 408 - 415, The branch handling legacy
.ppt files currently only logs a warning and returns no content; change it so
the caller can detect the unsupported format by raising an exception (e.g.,
raise ValueError or a custom UnsupportedFormatError) when
file.filename.endswith('.ppt') instead of only logging, and include a clear
message like "Legacy .ppt format is not supported; please convert to .pptx" so
upstream code (that calls extract_text_from_pptx / the file-handling path) can
catch the exception and return an appropriate user-facing error.

420-436: Verify file object state after process_file consumes it.

process_file calls file.save(file_path), which may consume the file stream. The subsequent access to file.filename (Line 432) should work since filename is a metadata attribute, but if process_file were ever modified to alter the file object, this could break.

The implementation is correct as-is, but consider extracting the filename before calling process_file for defensive coding.

♻️ Optional defensive refactor
     def process_file_chunked(self, file, chunk_size=1000, chunk_overlap=200):
         """Process file and return chunked text for large documents.

         Returns a list of chunk dicts (see TextProcessor.chunk_document).
         Falls back to an empty list when the file type is unsupported or
         the extracted text is empty.
         """
+        # Capture filename before process_file potentially modifies file object
+        filename = file.filename
         content = self.process_file(file)
         if not content:
             return []

         # Determine source type from filename
-        ext = os.path.splitext(file.filename)[1].lstrip('.').lower()
+        ext = os.path.splitext(filename)[1].lstrip('.').lower()
         return self.text_processor.chunk_document(
             content, source_type=ext or "unknown",
             chunk_size=chunk_size, chunk_overlap=chunk_overlap,
         )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/Generator/main.py` around lines 420 - 436, Extract the
filename/extension from the incoming file before calling process_file to avoid
relying on file state after consumption: in process_file_chunked capture ext =
os.path.splitext(file.filename)[1].lstrip('.').lower() (or set "unknown" if
filename falsy) before calling self.process_file(file), then pass that ext into
self.text_processor.chunk_document instead of reading file.filename after
process_file; keep the existing fallback to an empty list when content is falsy.
backend/test_text_processor.py (3)

183-203: Overlap test has a weak assertion fallback.

The test first checks for substring overlap, then falls back to checking for shared words between the first two chunks only. The fallback could pass even if overlap isn't working correctly (e.g., common words like "the", "is").

Consider strengthening the assertion or documenting why word overlap is an acceptable proxy.
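One way to phrase the stronger check suggested here is to assert that the tail of each chunk literally reappears in the next one. This assumes the splitter carries overlap as a verbatim suffix/prefix, which may not match the real TextProcessor exactly; the helper name is illustrative:

```python
def assert_adjacent_overlap(chunks, chunk_overlap):
    """Assert every adjacent chunk pair shares the trailing overlap text."""
    for left, right in zip(chunks, chunks[1:]):
        tail = left[-chunk_overlap:]
        assert tail and tail in right, (
            "expected the last %d chars of one chunk to appear in the next"
            % chunk_overlap
        )

# Sanity check on hand-built chunks with a 4-character overlap:
assert_adjacent_overlap(["abcdWXYZ", "WXYZefgh"], 4)
```

Unlike a shared-word fallback, this cannot pass by accident on common stopwords, because it checks a contiguous character span of the configured overlap length.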

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/test_text_processor.py` around lines 183 - 203, The fallback
assertion in test_overlap_present is too weak because it only checks shared
words between chunks[0] and chunks[1], which can pass due to common stopwords;
update the test_overlap_present to iterate all adjacent chunk pairs returned by
TextProcessor.chunk_text (use the same chunk_size and chunk_overlap values), and
for each pair assert a stronger overlap: either that the last chunk_overlap
characters of chunks[i] appear at the start of chunks[i+1], or that there is at
least one shared n-gram of length >= min(3, chunk_overlap_word_count) after
removing common stopwords; reference the test function test_overlap_present, the
TextProcessor.chunk_text method, LONG_PARAGRAPH, and the chunk_overlap parameter
when implementing the stronger check.

165-173: Large tolerance (25%) for chunk size validation.

The test allows chunks up to 250 characters when chunk_size=200, which is a 25% tolerance. While the comment explains this is for edge cases with indivisible tokens, this seems generous.

Consider tightening the tolerance or adding a comment explaining the specific scenario that requires such a large margin.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/test_text_processor.py` around lines 165 - 173, The test
test_chunks_respect_size_limit uses a very large 25% tolerance for chunk size
when constructing TextProcessor(chunk_size=200) and asserting chunks from
TextProcessor.chunk_text(LONG_PARAGRAPH) are <=250; tighten this by reducing the
allowed overshoot (e.g., assert len(chunk) <= 220) or, if 250 is required,
replace the generic comment with a precise rationale describing the exact edge
case (e.g., unbreakable token/very long word or specific punctuation sequences)
that forces the larger overshoot so the test documents why
TextProcessor.chunk_text may emit chunks larger than chunk_size.

113-115: Update type hint to Optional[str] or remove the test.

The implementation does handle None gracefully via short-circuit evaluation (line 84: if not text or not text.strip(): return [] prevents calling .strip() on None). However, the type hint declares text: str, not Optional[str], which creates a contract violation.

Decide whether None handling is intentional: if yes, update the type hint to text: Optional[str]; if no, remove this test and rely on type checking to catch invalid calls.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/test_text_processor.py` around lines 113 - 115, The test
test_none_returns_empty asserts TextProcessor.chunk_text accepts None but the
method signature uses text: str; decide intent and fix accordingly—if None
should be supported, change the chunk_text signature to accept Optional[str]
(import Optional from typing) and update any docstring/annotations to reflect
Optional[str]; if None should not be accepted, remove the test
test_none_returns_empty from backend/test_text_processor.py so the type contract
remains text: str. Reference: TextProcessor.chunk_text and the test
test_none_returns_empty.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fc3bf1a and 7892037.

📒 Files selected for processing (11)
  • backend/Generator/main.py
  • backend/conftest.py
  • backend/server.py
  • backend/test_pptx_extraction.py
  • backend/test_text_processor.py
  • backend/test_wikipedia_fallback.py
  • backend/utils/__init__.py
  • backend/utils/text_processor.py
  • eduaid_web/src/pages/Text_Input.jsx
  • extension/src/pages/text_input/TextInput.jsx
  • requirements.txt

Comment on lines +115 to +128
    def test_ssl_error_get_problems_returns_200_with_warning(self, client):
        """SSL failure should NOT crash the combined /get_problems endpoint."""
        with patch("server.mediawikiapi") as wiki_mock:
            wiki_mock.summary.side_effect = TimeoutError("Connection timed out")
            resp = client.post(
                "/get_problems",
                json={"input_text": SAMPLE_TEXT, "use_mediawiki": 1},
            )
            assert resp.status_code == 200
            data = resp.get_json()
            assert "output_mcq" in data
            assert "output_boolq" in data
            assert "output_shortq" in data
            assert "warning" in data

⚠️ Potential issue | 🟡 Minor

Test name doesn't match the exception being tested.

The test is named test_ssl_error_get_problems_returns_200_with_warning but raises TimeoutError instead of an SSL-related exception.

🔧 Suggested fix
-    def test_ssl_error_get_problems_returns_200_with_warning(self, client):
-        """SSL failure should NOT crash the combined /get_problems endpoint."""
+    def test_timeout_error_get_problems_returns_200_with_warning(self, client):
+        """Timeout failure should NOT crash the combined /get_problems endpoint."""
         with patch("server.mediawikiapi") as wiki_mock:
             wiki_mock.summary.side_effect = TimeoutError("Connection timed out")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/test_wikipedia_fallback.py` around lines 115 - 128, The test function
test_ssl_error_get_problems_returns_200_with_warning currently sets
wiki_mock.summary.side_effect = TimeoutError(...) but is intended to simulate an
SSL failure; update the test to raise an SSL-related exception (e.g., set
wiki_mock.summary.side_effect = ssl.SSLError("...") or
requests.exceptions.SSLError(...)) and update the docstring/name if you prefer
to reflect the actual exception; locate the failure injection at
wiki_mock.summary.side_effect in the
test_ssl_error_get_problems_returns_200_with_warning function and replace the
TimeoutError with an appropriate SSLError instance.
