
feat: update CustomIngestionPipeline to accept transformations parameter #33

Merged
amindadgar merged 3 commits into main from feat/support-custom-chunking-strategy
Oct 14, 2025

amindadgar (Member) commented Oct 14, 2025

Bumped the version in setup.py to 1.4.8. CustomIngestionPipeline.run_pipeline now accepts an optional list of transformations, so callers can supply their own document-processing steps (for example, a custom chunking strategy) instead of the built-in defaults.

Summary by CodeRabbit

  • New Features

    • Added an option to customize the ingestion pipeline’s transformation steps, allowing custom processing flows while keeping the previous default behavior when none are provided.
  • Documentation

    • Updated usage guidance to explain the new ingestion customization option, defaults, and how to supply custom transformations.
  • Chores

    • Package version bumped to 1.4.8.

coderabbitai bot commented Oct 14, 2025

Walkthrough

Version bumped to 1.4.8 in setup.py. In tc_hivemind_backend/ingest_qdrant.py, the CustomIngestionPipeline.run_pipeline signature now accepts an optional transformations: list[TransformComponent] = None. If it is not provided, a default transformations pipeline is constructed; either way, the chosen pipeline is passed to IngestionPipeline.

Changes

| Cohort / File(s) | Summary of Changes |
| --- | --- |
| Version bump (setup.py) | Incremented package version from 1.4.7 to 1.4.8. |
| Ingestion pipeline API (tc_hivemind_backend/ingest_qdrant.py) | Added import of TransformComponent. Updated the signature to run_pipeline(self, docs: list[Document], transformations: list[TransformComponent] = None). If transformations is None, builds the default pipeline (SemanticSplitterNodeParser + embedding); otherwise uses the provided list. The chosen transformations_pipeline is passed to IngestionPipeline. Docstring updated. |

Sequence Diagram(s)

```mermaid
sequenceDiagram
  autonumber
  actor Client
  participant CIP as CustomIngestionPipeline
  participant IP as IngestionPipeline
  participant Split as SemanticSplitterNodeParser
  participant Embed as EmbeddingComponent

  Client->>CIP: run_pipeline(docs, transformations?)
  alt transformations provided
    CIP->>CIP: use provided transformations list
  else no transformations
    CIP->>Split: instantiate default splitter
    CIP->>Embed: instantiate default embedding
    note right of CIP: build default transformations_pipeline
  end
  CIP->>IP: construct IngestionPipeline(with transformations_pipeline)
  IP->>IP: run pipeline on docs
  IP-->>CIP: return nodes
  CIP-->>Client: return nodes
```
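The diagram's "run pipeline on docs" step treats each transformation as a callable that takes the current nodes and returns new ones. The sketch below assumes that contract (modelled loosely on llama_index's TransformComponent, which is called with a sequence of nodes) and shows a hypothetical custom chunking step, WordWindowChunker, of the kind a caller might now pass in. run_transformations is a stand-in for IngestionPipeline, and nodes are plain strings.

```python
from dataclasses import dataclass
from typing import Any, Callable, Sequence


@dataclass
class WordWindowChunker:
    """Hypothetical custom chunking step: cuts each node into windows of chunk_size words."""

    chunk_size: int = 200

    def __post_init__(self) -> None:
        if self.chunk_size < 1:
            raise ValueError("chunk_size must be at least 1")

    def __call__(self, nodes: Sequence[str], **kwargs: Any) -> list[str]:
        chunks: list[str] = []
        for node in nodes:
            words = node.split()
            # Non-overlapping windows; the last window may be shorter than chunk_size.
            for start in range(0, len(words), self.chunk_size):
                chunks.append(" ".join(words[start : start + self.chunk_size]))
        return chunks


def run_transformations(
    docs: Sequence[str], transformations: Sequence[Callable[[Sequence[str]], list[str]]]
) -> list[str]:
    """Stand-in for IngestionPipeline: applies each transformation in order and returns the nodes."""
    nodes = list(docs)
    for transform in transformations:
        nodes = transform(nodes)
    return nodes


chunker = WordWindowChunker(chunk_size=2)
print(run_transformations(["one two three four five"], [chunker]))
# ['one two', 'three four', 'five']
```

Since a supplied list replaces the default pipeline (splitter plus embedding), a caller who still wants embeddings presumably has to include an embedding step in the same list; this sketch leaves embedding out.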

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

I twitch my whiskers, version set to new,
Paths for splitting — default or brought by you.
I stitch the transforms, tidy and spry,
Nodes hop out neat beneath the sky. 🥕🐇

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title Check | ✅ Passed | The title clearly and concisely describes the primary change in the pull request by specifying the CustomIngestionPipeline enhancement to accept a transformations parameter, matching the key objective without extraneous information. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%. |

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 449dad5 and 243f400.

📒 Files selected for processing (1)
  • tc_hivemind_backend/ingest_qdrant.py (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • tc_hivemind_backend/ingest_qdrant.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: ci / test / Test

coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 84d2e8d and 449dad5.

📒 Files selected for processing (2)
  • setup.py (1 hunks)
  • tc_hivemind_backend/ingest_qdrant.py (4 hunks)
🧰 Additional context used
🪛 Ruff (0.14.0)
tc_hivemind_backend/ingest_qdrant.py

92-92: PEP 484 prohibits implicit Optional. Convert to T | None (RUF013).

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: ci / test / Test
🔇 Additional comments (4)
setup.py (1)

9-9: LGTM!

Version bump to 1.4.8 appropriately reflects the feature addition in this PR.

tc_hivemind_backend/ingest_qdrant.py (3)

11-11: LGTM!

Import of TransformComponent is necessary for the type annotation in the updated run_pipeline method signature.


102-103: LGTM!

Documentation clearly describes the new transformations parameter, improving the method's usability.


127-138: LGTM!

The implementation correctly provides backward compatibility by using default transformations when none are specified, while allowing users to supply custom transformations for flexible document processing.

amindadgar merged commit 1ca5735 into main on Oct 14, 2025
2 checks passed