Open
Labels
bug (Something isn't working)
Description
Version
main
Which installation method(s) does this occur on?
Source
Describe the bug.
Summary
An audit of api/ found two problems: core dependencies that are imported in api/src but not declared, and declared dependencies that production code does not need (one is used only by tests and should move to an optional test extra).
1. Add missing core dependencies
These packages are imported in api/src but not listed in api/pyproject.toml. They should be added under [project] dependencies.
| Package | PyPI name | Notes |
|---|---|---|
| numpy | numpy | Used in yolox, ocr, pdfium, transforms, table_and_chart, etc. |
| pypdfium2 | pypdfium2 | Used in pdf util, PDF engines, metadata aggregators, pptx_helper. |
| requests | requests | Used in rest client, nim client, helpers, tika engine. |
| OpenCV | opencv-python | Imported as cv2 in transforms and model_interface/helpers. |
| Pillow | Pillow | Imported as PIL in transforms, aggregators, image_helpers, cached. |
| gRPC | grpcio | Imported as grpc in parakeet model interface. |
| scikit-learn | scikit-learn | Imported as sklearn; used in table_and_chart.py for sklearn.cluster.DBSCAN. |
| redis | redis | Used in util/service_clients/redis/redis_client.py. |
| python-docx | python-docx | Imported as docx in docx extractor (internal/extract/docx/.../docxreader.py). |
| python-pptx | python-pptx | Imported as pptx in pptx helper (internal/extract/pptx/engines/pptx_helper.py). |
| minio | minio | Used in internal/store/embed_text_upload.py for Minio client. |
| pymilvus | pymilvus | Used in internal/store/embed_text_upload.py for Collection, connections, bulk writer. |
| aiohttp | aiohttp | Used in internal/extract/pdf/engines/llama.py for async HTTP. |
| scipy | scipy | Used in internal/primitives/nim/model_interface/parakeet.py (scipy.io.wavfile). |
| nvidia-riva-client | nvidia-riva-client | Imported as riva.client in parakeet model interface. |
| unstructured-client | unstructured-client | Used in internal/extract/pdf/engines/unstructured_io.py. |
| tqdm | tqdm | Used in util/dataloader/dataloader.py. |
| python-dateutil | python-dateutil | Imported as dateutil in util/converters/datetools.py. |
| fastparquet | fastparquet | Used in util/converters/dftools.py. |
Optional: Add openai if the LLM summarizer UDF (api/src/udfs/llm_summarizer_udf.py) is part of the shipped package.
GPU / optional: cudf is used in util/converters/dftools.py; consider adding as an optional extra (e.g. gpu or cudf) rather than a core dependency.
2. Move test-only dependencies out of core
These packages are currently listed in dependencies but are not needed by the code in api/src. Move the test-only package into [project.optional-dependencies] (e.g. a test extra) and drop the unused one.
| Package | Action |
|---|---|
| moviepy | Remove from core dependencies; add to optional-dependencies (e.g. test). Only used in api_tests/util/dataloader/ (dataloader_test_tools, test_dataloader_video). |
| pydantic-settings | Remove from core dependencies (not used in api/src or api_tests). Add to an optional extra later if needed. |
Acceptance criteria
- All 19 core packages above are listed in api/pyproject.toml under dependencies (and optionally openai if applicable; consider cudf as an optional extra).
- moviepy and pydantic-settings are removed from core dependencies.
- An optional-dependencies group (e.g. test) exists and includes moviepy (and optionally pytest/ray if desired for test runs).
- Install with no extras works for production code; install with the test extra works for running the full test suite.