# Semantra 0.2.0

A redesign of Semantra to make it more efficient and versatile. With these changes, Semantra will be easy to install and able to run stand-alone, with documents added and removed through the UI. The changes will likely not be backwards-compatible with previously stored embeddings, but they will be a strong step towards stability.

## Robustness

* Unit tests
* Linting
* Pre-commit hooks
* GitHub Actions, including one to deploy to PyPI

## Faster document storage and retrieval

* Use [annlite](https://github.com/jina-ai/annlite) and [docarray](https://github.com/docarray/docarray)
* Deprecate Annoy, since it doesn't scale well to large document collections and causes installation problems

## Additional formats

* Rewrite the frontend PDF renderer to use [PDF.js](https://mozilla.github.io/pdf.js/), removing the need for backend PDF rendering
* CSV, with indexing of selected columns
* Audio and video, with transcription using [faster-whisper](https://github.com/guillaumekln/faster-whisper)
* Ability to store different processing options per file and memoize the results (may require a central SQLite database)

## Ease of installation

* Use [PyInstaller](https://pyinstaller.org/en/stable/) to build an installer that non-technical users can run
* Ability to export document collections as fully web-runnable demos using [Transformers.js](https://xenova.github.io/transformers.js/)

## Website

* A dedicated documentation and demo website at semantra.ai (domain already registered)

## Extensibility and documentation

* A plug-in system for building additional document loaders and frontend document renderers
* Well-documented APIs
* A welcoming experience for contributors
* Additional guides (contributing, installing, deploying on a server, recipes, how embeddings are stored and cached)

## Probably not for this release

* A terminal-only search UI using [Textual](https://textual.textualize.io/tutorial/)
No due date · 0/8 issues closed