Singular is a high-performance file analysis and deduplication daemon that scans the filesystem, fingerprints files using SHA-256, and maintains a persistent metadata database to detect duplicates efficiently over time.
It is designed to support long-running background operation and to allow future extensibility without being constrained by rigid SQL schemas.
- Duplicate file detection using SHA-256 hashing
- Performance-oriented design
- Metadata reuse across runs
- Reduced re-hashing for unchanged files
- Persistent metadata store (JSON-based, schema-flexible)
- Daemon / service mode support (systemd)
- Rust-accelerated file I/O for critical paths
- Modular, extensible architecture
.
├── deamon.sh
├── LICENSE
├── pyproject.toml
├── README.md
├── singular
│ ├── analysis_pulgins
│ ├── analysis.py
│ ├── cli_texts.py
│ ├── config.py
│ ├── data_base
│ │ └── __init__.py
│ ├── data_base_manager.py
│ ├── file_io.rs
│ ├── file.py
│ ├── __init__.py
│ ├── logger.py
│ ├── __main__.py
│ ├── process.py
│ └── utils.py
├── singular_config.json
└── singular.service
Filesystem Scan
- Files are discovered and passed through the processing pipeline.
- The processing pipeline handles discovered files that are not yet registered in parallel.
- The pipeline uses Rust to read file bytes, which speeds up I/O.
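The scan step above can be sketched roughly as follows. This is a minimal illustration, not Singular's actual code: the names `discover_files`, `process_unregistered`, `registered`, and `worker` are hypothetical, and a thread pool is an assumption, since the concurrency model the project uses is not stated here.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def discover_files(root):
    """Yield the path of every regular file found under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path):
                yield path


def process_unregistered(root, registered, worker, max_workers=4):
    """Run worker(path) in parallel for files not already in `registered`.

    `registered` is any container of known paths (hypothetical stand-in
    for the metadata database). Returns {path: worker result}.
    """
    pending = [p for p in discover_files(root) if p not in registered]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as `pending`.
        return dict(zip(pending, pool.map(worker, pending)))
```

For example, `process_unregistered("/data", known_paths, os.path.getsize)` would return the size of every file under `/data` that is not in `known_paths`.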
Hashing & Metadata Collection
- SHA-256 hash is computed.
- File size, path, and processing time are recorded.
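A minimal sketch of this step, assuming a pure-Python reader (the project reads bytes through Rust, which is not shown here). The function name `fingerprint`, the record keys, and the 1 MiB chunk size are illustrative choices, not the project's actual schema.

```python
import hashlib
import os
import time


def fingerprint(path, chunk_size=1024 * 1024):
    """Return a metadata record for one file: hash, size, path, processing time."""
    started = time.perf_counter()
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in fixed-size chunks so large files are never fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return {
        "path": os.path.abspath(path),
        "size": os.path.getsize(path),
        "sha256": digest.hexdigest(),
        "processing_time": time.perf_counter() - started,  # seconds
    }
```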
Persistent Storage
- Metadata is stored in a JSON-based database.
- Existing entries are reused to avoid unnecessary disk reads.
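One way to reuse entries is sketched below. It assumes the store is a single JSON object keyed by file path and that a file counts as unchanged when its size and modification time match the stored entry. The `mtime` field and the helper names are assumptions made for illustration; the project may decide reuse differently.

```python
import json
import os


def load_store(db_path):
    """Load the metadata store, or start empty if it does not exist yet."""
    if not os.path.exists(db_path):
        return {}
    with open(db_path, "r", encoding="utf-8") as handle:
        return json.load(handle)


def save_store(db_path, store):
    """Write to a temporary file, then swap it in over the old store."""
    tmp_path = db_path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as handle:
        json.dump(store, handle, indent=2)
    # os.replace swaps the file in one step, so an interrupted write
    # leaves the previous store in place rather than a partial file.
    os.replace(tmp_path, db_path)


def needs_hashing(store, path):
    """True when the file is new or its size/mtime differ from the stored entry."""
    entry = store.get(path)
    if entry is None:
        return True
    stat = os.stat(path)
    return entry.get("size") != stat.st_size or entry.get("mtime") != stat.st_mtime
```

Checking `needs_hashing` costs one `os.stat` call per file, which is far cheaper than reading the file's contents again.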
Analysis
- Files with identical hashes are grouped as duplicates.
- Missing or deleted files are handled gracefully.
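The analysis step can be sketched as a group-by on the stored hash. This assumes the same hypothetical store layout as a dictionary of `{path: {"sha256": ...}}` entries; `find_duplicates` is an illustrative name, not the project's API.

```python
import os
from collections import defaultdict


def find_duplicates(store):
    """Group stored paths by hash; return only hashes shared by 2+ existing files."""
    groups = defaultdict(list)
    for path, entry in store.items():
        if not os.path.exists(path):
            # The file was deleted or moved since it was recorded: skip it
            # instead of raising, so one stale entry cannot abort the run.
            continue
        groups[entry["sha256"]].append(path)
    return {digest: sorted(paths) for digest, paths in groups.items() if len(paths) > 1}
```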
Check out this
Currently supports only Linux.
- Clone the git repo
git clone https://github.com/HostServer001/singular/
- Change directory
cd singular
- Change permissions
sudo chmod +x daemon.sh
- Start the singular daemon
./daemon.sh
- Confirm
systemctl status singular.service
- Analysis
singular analysis
- More help
singular --help
- Avoid SQL when flexibility matters: JSON storage allows evolving metadata without migrations.
- Disk I/O is the real bottleneck: the system is optimized to reduce repeated reads and writes.
- Composable internals: each component (scanner, database, analyzer) can evolve independently.
- Future-proof: Rust is used selectively where Python overhead becomes significant.
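To illustrate the first principle, here is a small sketch of how a JSON record can gain fields without a migration. The fields `mime_type` and `tags` and the helper `read_entry` are hypothetical examples, not part of Singular's current metadata.

```python
import copy

# Hypothetical fields introduced after older records were already written.
DEFAULTS = {"mime_type": None, "tags": []}


def read_entry(raw):
    """Overlay a stored record on the defaults so old records still load."""
    entry = copy.deepcopy(DEFAULTS)  # deep copy so callers never share the list
    entry.update(raw)
    return entry
```

An old record such as `{"path": "/a", "sha256": "abc"}` loads with `tags` set to an empty list, while records that already carry the new fields keep their stored values. No existing data has to be rewritten.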
Early but functional. The project is actively evolving, and its performance characteristics are already measurable and improving.
Licensed under the terms specified in the LICENSE file.
- More analysis functions
- File similarity feature (approach not decided yet)
- More Rust optimizations, including for database reads and writes
Singular is built as a systems-level learning project with real-world constraints in mind: performance, correctness, and long-running reliability.
Contributions, reviews, and architectural discussions are welcome.