Conversation
AOSSIE-Org#428) When use_mediawiki=1 is passed but the MediaWiki API call fails (SSL error, timeout, network unreachable, etc.), the entire request crashed with a 500 response. Users saw a generic error and lost their input.

Changes:
- Wrap mediawikiapi.summary() in try/except inside process_input_text()
- On failure, log a warning and continue with the original input text
- Return a "warning" field in the JSON response so the frontend can notify users that Wikipedia enrichment was skipped
- Update all 7 endpoints that use MediaWiki: /get_mcq, /get_boolq, /get_shortq, /get_problems, /get_shortq_hard, /get_mcq_hard, /get_boolq_hard
- Add conftest.py with a session-scoped fixture to prevent heavy ML model loading during tests
- Add 30 new pytest tests covering SSL errors, connection errors, timeouts, successful calls, and text preservation

Co-authored-by: Cursor <cursoragent@cursor.com>
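The fallback described above can be sketched roughly as follows. This is a minimal illustration, not the PR's exact code: `fetch_summary` stands in for `mediawikiapi.summary`, and the function name and warning wording are hypothetical.

```python
import logging

def enrich_text(input_text, fetch_summary):
    """Try to enrich input_text via a summary fetcher; fall back on failure.

    fetch_summary is a stand-in for mediawikiapi.summary. Any exception
    (SSL error, timeout, unreachable network) triggers the fallback.
    """
    warning = None
    try:
        text = fetch_summary(input_text)
    except Exception as exc:  # broad on purpose: network failures vary widely
        logging.warning("Wikipedia enrichment skipped: %s", exc)
        text = input_text  # preserve the user's original input
        warning = "Wikipedia enrichment was skipped due to an API error."
    return text, warning

# A failing fetcher leaves the original text intact and sets a warning:
def boom(_):
    raise TimeoutError("Connection timed out")

text, warning = enrich_text("photosynthesis", boom)
```

The endpoint can then generate questions from `text` as usual and attach `warning` to the JSON response only when it is set.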
- Use defensive .get() with default [] on output["questions"] and output["Boolean_Questions"] to prevent KeyError when the generator returns an empty dict (major)
- Rename test_ssl_error_hard_endpoints to test_connection_error_hard_endpoints to match the actual exception (minor)

Co-authored-by: Cursor <cursoragent@cursor.com>
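The defensive-access change can be illustrated with a tiny sketch; the helper name here is hypothetical, since the PR applies `.get()` inline at the call sites:

```python
def extract_question_lists(output):
    """Read generator output defensively: a missing key yields []
    instead of raising KeyError when the generator returns an empty dict."""
    questions = output.get("questions", [])
    boolean_questions = output.get("Boolean_Questions", [])
    return questions, boolean_questions

# An empty generator result no longer crashes the endpoint:
empty = extract_question_lists({})
```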
- Add python-pptx dependency to requirements.txt
- Implement extract_text_from_pptx() in the FileProcessor class
  - Extracts text from text frames (titles, text boxes, placeholders)
  - Extracts text from table cells across all slides
- Update process_file() to route .pptx files to the new extractor
- Update frontend labels and file-accept filters in:
  - eduaid_web (Text_Input.jsx)
  - extension (TextInput.jsx)
- Add 6 unit tests covering:
  - Simple text extraction
  - Table text extraction
  - Empty presentations
  - process_file() routing for .pptx
  - Unsupported .ppt rejection
  - Multi-slide extraction

Closes AOSSIE-Org#361
- Add explicit .ppt handling with warning log for unsupported legacy format
- Rewrite tests to import the real FileProcessor from Generator.main
(mocking heavy ML dependencies via sys.modules to avoid model loading)
- Sync upload hint text with accept filter in both web and extension
('PDF, PPTX, TXT, DOCX, MP3 supported')
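A rough sketch of the traversal that extract_text_from_pptx() performs. To stay runnable without python-pptx installed, it walks duck-typed objects that mirror the python-pptx model (slide.shapes, shape.has_text_frame, shape.has_table), so the stand-in classes below are illustrative rather than the PR's exact code:

```python
def extract_slide_text(slides):
    """Collect text from text frames and table cells across all slides."""
    parts = []
    for slide in slides:
        for shape in slide.shapes:
            # Titles, text boxes, and placeholders expose a text frame
            if getattr(shape, "has_text_frame", False):
                text = shape.text_frame.text.strip()
                if text:
                    parts.append(text)
            # Tables are walked row by row, cell by cell
            if getattr(shape, "has_table", False):
                for row in shape.table.rows:
                    for cell in row.cells:
                        if cell.text.strip():
                            parts.append(cell.text.strip())
    return "\n".join(parts)

# Minimal stand-ins for a quick check (real code would open the file with
# pptx.Presentation and iterate presentation.slides):
class _Cell:
    def __init__(self, text): self.text = text

class _Row:
    def __init__(self, cells): self.cells = cells

class _Table:
    def __init__(self, rows): self.rows = rows

class _Frame:
    def __init__(self, text): self.text = text

class _Shape:
    def __init__(self, frame=None, table=None):
        self.has_text_frame = frame is not None
        self.has_table = table is not None
        self.text_frame, self.table = frame, table

class _Slide:
    def __init__(self, shapes): self.shapes = shapes

slides = [_Slide([
    _Shape(frame=_Frame("Title")),
    _Shape(table=_Table([_Row([_Cell("A1"), _Cell("B1")])])),
])]
```

With python-pptx, the same loop runs over `Presentation(path).slides` directly; legacy .ppt files never reach it because the router rejects them first.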
Add a TextProcessor utility that breaks large documents into overlapping 'Context Blocks' before they reach the question-generation pipeline. This prevents OOM errors and pipeline hangs when processing large PDFs (50+ pages).

Changes:
- Add backend/utils/text_processor.py with RecursiveCharacterTextSplitter-style chunking (zero new dependencies, pure Python stdlib)
- Add process_file_chunked() method to FileProcessor in Generator/main.py
- Update the /upload endpoint in server.py to return chunks alongside content (backward-compatible: the existing 'content' field is preserved)
- Add 38 unit tests in test_text_processor.py covering chunking logic, overlap, edge cases, and metadata generation

The splitter uses a separator hierarchy (paragraph > line > sentence > word > character) with configurable chunk_size (default 1000 chars) and chunk_overlap (default 200 chars), keeping each chunk within the T5 model's 512-token limit.
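A minimal sketch of the overlap mechanics, assuming fixed-size character windows only. The actual splitter additionally prefers paragraph, line, sentence, and word boundaries before falling back to raw characters, and its metadata fields may differ from the illustrative ones below:

```python
def chunk_text(text, chunk_size=1000, chunk_overlap=200):
    """Split text into overlapping chunks with simple positional metadata."""
    if not text or not text.strip():
        return []
    step = chunk_size - chunk_overlap  # each new window starts 800 chars later
    chunks = []
    for start in range(0, len(text), step):
        chunks.append({
            "index": len(chunks),
            "start": start,
            "text": text[start:start + chunk_size],
        })
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the document
    return chunks

doc = "".join(str(i % 10) for i in range(2500))
chunks = chunk_text(doc)
# Consecutive chunks share chunk_overlap characters:
# chunks[0]["text"][-200:] == chunks[1]["text"][:200]
```

The overlap means a sentence cut at a window boundary still appears whole in the next chunk, which keeps each block self-contained for downstream question generation.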
📝 Walkthrough

This pull request introduces PowerPoint (.pptx) text extraction support, chunked text processing via a new TextProcessor utility, and MediaWiki-based text enrichment with warning propagation. FileProcessor gains PPTX extraction and chunked-processing methods; the Flask server integrates these features and propagates warnings across the MCQ/boolean/short-answer generation endpoints. Comprehensive test coverage validates PPTX extraction, TextProcessor chunking behavior, and graceful MediaWiki fallback. Web UI updates reflect expanded file-format support (PPTX, TXT, DOCX).
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant Server as Flask Server
    participant FileProc as FileProcessor
    participant TextProc as TextProcessor
    Client->>Server: POST /upload (PPTX file)
    Server->>FileProc: process_file(file)
    FileProc->>FileProc: extract_text_from_pptx()
    FileProc->>TextProc: chunk_document(text)
    TextProc->>TextProc: _recursive_split() with overlaps
    TextProc-->>FileProc: [chunks with metadata]
    FileProc-->>Server: extracted text + chunks
    Server-->>Client: {content, chunks, num_chunks}
```
```mermaid
sequenceDiagram
    participant Client
    participant Server as Flask Server
    participant MediaWiki as MediaWiki API
    participant Generator as MCQ/BoolQ/ShortQ
    Client->>Server: POST /get_mcq?text=...&use_mediawiki=1
    Server->>MediaWiki: fetch_enriched_text()
    alt MediaWiki Success
        MediaWiki-->>Server: enriched_text
        Server->>Generator: generate(enriched_text)
    else MediaWiki Failure
        MediaWiki-->>Server: Exception
        Server->>Generator: generate(original_text)
        Server->>Server: wiki_warning = "API failed"
    end
    Generator-->>Server: questions
    Server-->>Client: {output, warning?}
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
🚥 Pre-merge checks: ✅ Passed checks (3 passed)
Actionable comments posted: 1
🧹 Nitpick comments (6)
backend/server.py (1)
472-492: Consider passing `source_type` to `chunk_document` for better metadata.

The upload endpoint processes various file types (PDF, PPTX, DOCX, TXT), but chunking doesn't capture the source type in metadata. This could help downstream RAG pipelines.
💡 Suggested enhancement
```diff
  content = file_processor.process_file(file)
  if content:
+     # Derive source type from file extension
+     ext = file.filename.rsplit('.', 1)[-1].lower() if '.' in file.filename else 'unknown'
-     chunks = text_processor.chunk_document(content)
+     chunks = text_processor.chunk_document(content, source_type=ext)
      return jsonify({
          "content": content,
          "chunks": chunks,
          "num_chunks": len(chunks),
      })
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/server.py` around lines 472-492: The upload_file endpoint currently calls file_processor.process_file(file) and then text_processor.chunk_document(content) without informing chunk_document of the original source; update upload_file to determine a source_type (e.g., from file.mimetype or the file.filename extension) and pass it into text_processor.chunk_document as a second argument (e.g., chunk_document(content, source_type)), propagate that source_type into the chunk metadata inside chunk_document, and include source_type in the upload_file JSON response alongside content, chunks, and num_chunks so downstream RAG pipelines can use the source information.

backend/Generator/main.py (2)
408-415: Legacy `.ppt` format returns empty content silently.

The warning is logged, but the caller receives an empty string without any indication that the file format was unsupported. Consider returning an error indicator or raising an exception so upstream code can inform the user.
💡 Suggested improvement to surface the warning
```diff
  elif file.filename.endswith('.pptx'):
      content = self.extract_text_from_pptx(file_path)
  elif file.filename.endswith('.ppt'):
      import logging
      logging.warning(
          "Legacy .ppt format is not supported. "
          "Please convert to .pptx and try again."
      )
+     # Return a sentinel or raise so callers know extraction failed
+     content = ""  # Explicitly mark as unsupported
  os.remove(file_path)
- return content
+ return content
```

Alternatively, return a tuple `(content, warning)` or raise a custom exception so the API layer can return a proper error response to the user.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/Generator/main.py` around lines 408 - 415, The branch handling legacy .ppt files currently only logs a warning and returns no content; change it so the caller can detect the unsupported format by raising an exception (e.g., raise ValueError or a custom UnsupportedFormatError) when file.filename.endswith('.ppt') instead of only logging, and include a clear message like "Legacy .ppt format is not supported; please convert to .pptx" so upstream code (that calls extract_text_from_pptx / the file-handling path) can catch the exception and return an appropriate user-facing error.
420-436: Verify `file` object state after `process_file` consumes it.

`process_file` calls `file.save(file_path)`, which may consume the file stream. The subsequent access to `file.filename` (Line 432) should work since `filename` is a metadata attribute, but if `process_file` were ever modified to alter the file object, this could break.

The implementation is correct as-is, but consider extracting the filename before calling `process_file` for defensive coding.

♻️ Optional defensive refactor
```diff
  def process_file_chunked(self, file, chunk_size=1000, chunk_overlap=200):
      """Process file and return chunked text for large documents.

      Returns a list of chunk dicts (see TextProcessor.chunk_document).
      Falls back to an empty list when the file type is unsupported
      or the extracted text is empty.
      """
+     # Capture filename before process_file potentially modifies file object
+     filename = file.filename
      content = self.process_file(file)
      if not content:
          return []
      # Determine source type from filename
-     ext = os.path.splitext(file.filename)[1].lstrip('.').lower()
+     ext = os.path.splitext(filename)[1].lstrip('.').lower()
      return self.text_processor.chunk_document(
          content,
          source_type=ext or "unknown",
          chunk_size=chunk_size,
          chunk_overlap=chunk_overlap,
      )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/Generator/main.py` around lines 420-436: Extract the filename/extension from the incoming file before calling process_file to avoid relying on file state after consumption: in process_file_chunked capture ext = os.path.splitext(file.filename)[1].lstrip('.').lower() (or set "unknown" if filename falsy) before calling self.process_file(file), then pass that ext into self.text_processor.chunk_document instead of reading file.filename after process_file; keep the existing fallback to an empty list when content is falsy.

backend/test_text_processor.py (3)
183-203: Overlap test has a weak assertion fallback.

The test first checks for substring overlap, then falls back to checking for shared words between the first two chunks only. The fallback could pass even if overlap isn't working correctly (e.g., common words like "the", "is").
Consider strengthening the assertion or documenting why word overlap is an acceptable proxy.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/test_text_processor.py` around lines 183 - 203, The fallback assertion in test_overlap_present is too weak because it only checks shared words between chunks[0] and chunks[1], which can pass due to common stopwords; update the test_overlap_present to iterate all adjacent chunk pairs returned by TextProcessor.chunk_text (use the same chunk_size and chunk_overlap values), and for each pair assert a stronger overlap: either that the last chunk_overlap characters of chunks[i] appear at the start of chunks[i+1], or that there is at least one shared n-gram of length >= min(3, chunk_overlap_word_count) after removing common stopwords; reference the test function test_overlap_present, the TextProcessor.chunk_text method, LONG_PARAGRAPH, and the chunk_overlap parameter when implementing the stronger check.
165-173: Large tolerance (25%) for chunk size validation.

The test allows chunks up to 250 characters when `chunk_size=200`, a 25% tolerance. While the comment explains this is for edge cases with indivisible tokens, this seems generous.

Consider tightening the tolerance or adding a comment explaining the specific scenario that requires such a large margin.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/test_text_processor.py` around lines 165 - 173, The test test_chunks_respect_size_limit uses a very large 25% tolerance for chunk size when constructing TextProcessor(chunk_size=200) and asserting chunks from TextProcessor.chunk_text(LONG_PARAGRAPH) are <=250; tighten this by reducing the allowed overshoot (e.g., assert len(chunk) <= 220) or, if 250 is required, replace the generic comment with a precise rationale describing the exact edge case (e.g., unbreakable token/very long word or specific punctuation sequences) that forces the larger overshoot so the test documents why TextProcessor.chunk_text may emit chunks larger than chunk_size.
113-115: Update the type hint to `Optional[str]` or remove the test.

The implementation does handle `None` gracefully via short-circuit evaluation (line 84: `if not text or not text.strip(): return []` prevents calling `.strip()` on `None`). However, the type hint declares `text: str`, not `Optional[str]`, which creates a contract violation.

Decide whether `None` handling is intentional: if yes, update the type hint to `text: Optional[str]`; if no, remove this test and rely on type checking to catch invalid calls.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/test_text_processor.py` around lines 113 - 115, The test test_none_returns_empty asserts TextProcessor.chunk_text accepts None but the method signature uses text: str; decide intent and fix accordingly—if None should be supported, change the chunk_text signature to accept Optional[str] (import Optional from typing) and update any docstring/annotations to reflect Optional[str]; if None should not be accepted, remove the test test_none_returns_empty from backend/test_text_processor.py so the type contract remains text: str. Reference: TextProcessor.chunk_text and the test test_none_returns_empty.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/test_wikipedia_fallback.py`:
- Around line 115-128: The test function
test_ssl_error_get_problems_returns_200_with_warning currently sets
wiki_mock.summary.side_effect = TimeoutError(...) but is intended to simulate an
SSL failure; update the test to raise an SSL-related exception (e.g., set
wiki_mock.summary.side_effect = ssl.SSLError("...") or
requests.exceptions.SSLError(...)) and update the docstring/name if you prefer
to reflect the actual exception; locate the failure injection at
wiki_mock.summary.side_effect in the
test_ssl_error_get_problems_returns_200_with_warning function and replace the
TimeoutError with an appropriate SSLError instance.
---
Nitpick comments:
In `@backend/Generator/main.py`:
- Around line 408-415: The branch handling legacy .ppt files currently only logs
a warning and returns no content; change it so the caller can detect the
unsupported format by raising an exception (e.g., raise ValueError or a custom
UnsupportedFormatError) when file.filename.endswith('.ppt') instead of only
logging, and include a clear message like "Legacy .ppt format is not supported;
please convert to .pptx" so upstream code (that calls extract_text_from_pptx /
the file-handling path) can catch the exception and return an appropriate
user-facing error.
- Around line 420-436: Extract the filename/extension from the incoming file
before calling process_file to avoid relying on file state after consumption: in
process_file_chunked capture ext =
os.path.splitext(file.filename)[1].lstrip('.').lower() (or set "unknown" if
filename falsy) before calling self.process_file(file), then pass that ext into
self.text_processor.chunk_document instead of reading file.filename after
process_file; keep the existing fallback to an empty list when content is falsy.
In `@backend/server.py`:
- Around line 472-492: The upload_file endpoint currently calls
file_processor.process_file(file) and then
text_processor.chunk_document(content) without informing chunk_document of the
original source; update upload_file to determine a source_type (e.g., from
file.mimetype or file.filename extension) and pass it into
text_processor.chunk_document as a second argument (e.g.,
chunk_document(content, source_type)), propagate that source_type into the chunk
metadata inside chunk_document, and include source_type in the upload_file JSON
response alongside content, chunks, and num_chunks so downstream RAG pipelines
can use the source information.
In `@backend/test_text_processor.py`:
- Around line 183-203: The fallback assertion in test_overlap_present is too
weak because it only checks shared words between chunks[0] and chunks[1], which
can pass due to common stopwords; update the test_overlap_present to iterate all
adjacent chunk pairs returned by TextProcessor.chunk_text (use the same
chunk_size and chunk_overlap values), and for each pair assert a stronger
overlap: either that the last chunk_overlap characters of chunks[i] appear at
the start of chunks[i+1], or that there is at least one shared n-gram of length
>= min(3, chunk_overlap_word_count) after removing common stopwords; reference
the test function test_overlap_present, the TextProcessor.chunk_text method,
LONG_PARAGRAPH, and the chunk_overlap parameter when implementing the stronger
check.
- Around line 165-173: The test test_chunks_respect_size_limit uses a very large
25% tolerance for chunk size when constructing TextProcessor(chunk_size=200) and
asserting chunks from TextProcessor.chunk_text(LONG_PARAGRAPH) are <=250;
tighten this by reducing the allowed overshoot (e.g., assert len(chunk) <= 220)
or, if 250 is required, replace the generic comment with a precise rationale
describing the exact edge case (e.g., unbreakable token/very long word or
specific punctuation sequences) that forces the larger overshoot so the test
documents why TextProcessor.chunk_text may emit chunks larger than chunk_size.
- Around line 113-115: The test test_none_returns_empty asserts
TextProcessor.chunk_text accepts None but the method signature uses text: str;
decide intent and fix accordingly—if None should be supported, change the
chunk_text signature to accept Optional[str] (import Optional from typing) and
update any docstring/annotations to reflect Optional[str]; if None should not be
accepted, remove the test test_none_returns_empty from
backend/test_text_processor.py so the type contract remains text: str.
Reference: TextProcessor.chunk_text and the test test_none_returns_empty.
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (11)
- backend/Generator/main.py
- backend/conftest.py
- backend/server.py
- backend/test_pptx_extraction.py
- backend/test_text_processor.py
- backend/test_wikipedia_fallback.py
- backend/utils/__init__.py
- backend/utils/text_processor.py
- eduaid_web/src/pages/Text_Input.jsx
- extension/src/pages/text_input/TextInput.jsx
- requirements.txt
```python
def test_ssl_error_get_problems_returns_200_with_warning(self, client):
    """SSL failure should NOT crash the combined /get_problems endpoint."""
    with patch("server.mediawikiapi") as wiki_mock:
        wiki_mock.summary.side_effect = TimeoutError("Connection timed out")
        resp = client.post(
            "/get_problems",
            json={"input_text": SAMPLE_TEXT, "use_mediawiki": 1},
        )
        assert resp.status_code == 200
        data = resp.get_json()
        assert "output_mcq" in data
        assert "output_boolq" in data
        assert "output_shortq" in data
        assert "warning" in data
```
Test name doesn't match the exception being tested.
The test is named test_ssl_error_get_problems_returns_200_with_warning but raises TimeoutError instead of an SSL-related exception.
🔧 Suggested fix
```diff
- def test_ssl_error_get_problems_returns_200_with_warning(self, client):
-     """SSL failure should NOT crash the combined /get_problems endpoint."""
+ def test_timeout_error_get_problems_returns_200_with_warning(self, client):
+     """Timeout failure should NOT crash the combined /get_problems endpoint."""
      with patch("server.mediawikiapi") as wiki_mock:
          wiki_mock.summary.side_effect = TimeoutError("Connection timed out")
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
def test_timeout_error_get_problems_returns_200_with_warning(self, client):
    """Timeout failure should NOT crash the combined /get_problems endpoint."""
    with patch("server.mediawikiapi") as wiki_mock:
        wiki_mock.summary.side_effect = TimeoutError("Connection timed out")
        resp = client.post(
            "/get_problems",
            json={"input_text": SAMPLE_TEXT, "use_mediawiki": 1},
        )
        assert resp.status_code == 200
        data = resp.get_json()
        assert "output_mcq" in data
        assert "output_boolq" in data
        assert "output_shortq" in data
        assert "warning" in data
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/test_wikipedia_fallback.py` around lines 115 - 128, The test function
test_ssl_error_get_problems_returns_200_with_warning currently sets
wiki_mock.summary.side_effect = TimeoutError(...) but is intended to simulate an
SSL failure; update the test to raise an SSL-related exception (e.g., set
wiki_mock.summary.side_effect = ssl.SSLError("...") or
requests.exceptions.SSLError(...)) and update the docstring/name if you prefer
to reflect the actual exception; locate the failure injection at
wiki_mock.summary.side_effect in the
test_ssl_error_get_problems_returns_200_with_warning function and replace the
TimeoutError with an appropriate SSLError instance.
Addressed Issues:
Adds a PDF chunking utility (utils/text_processor.py) for RAG preparation to prevent OOM errors and pipeline hangs when processing large documents (50+ pages).
Screenshots/Recordings:
All 38 unit tests pass locally in 0.36s.
Additional Notes:
- The `/upload` endpoint still returns content as before, plus new `chunks` and `num_chunks` fields
- RecursiveCharacterTextSplitter-style separator hierarchy: paragraph → line → sentence → word → character

Checklist
I have used the following AI models and tools: Gemini (Antigravity coding assistant)