Security features, threat model, and compliance information for RexLit M0.
- Security Overview
- Threat Model
- Security Features
- Path Traversal Protection
- Audit Trail Security
- Legal Compliance
- Security Testing
- Reporting Security Issues
RexLit is designed for adversarial document sets where malicious actors may attempt to:
- Exploit path traversal vulnerabilities
- Tamper with audit trails
- Compromise chain-of-custody
- Inject malicious content
Security Philosophy: Defense-in-depth with cryptographic guarantees.
- ✅ 0 Critical Vulnerabilities
- ✅ 13 Security Tests Passing
- ✅ Path Traversal Protection
- ✅ Tamper-Evident Audit Trail
- ✅ Legal Compliance (FRCP Rule 26)
- Opposing Counsel: May submit documents designed to exploit vulnerabilities
- Malicious Insiders: Users with file system access attempting to tamper with evidence
- Automated Attacks: Scripts generating malicious file structures
Goal: Access files outside the designated document root
Attack Techniques:
- Symlinks pointing to `/etc/passwd`, `/home/user/.ssh/id_rsa`
- `../` sequences to escape the document directory
- Absolute paths like `/tmp/malicious.pdf`
- Nested symlink chains
- Mixed techniques (symlink + `../`)
Impact: Confidential data exposure, unauthorized file access
Mitigation: ✅ Implemented (see Path Traversal Protection)
Goal: Modify or delete audit entries to hide actions
Attack Techniques:
- Direct modification of `audit.jsonl`
- Entry deletion
- Entry reordering
- Hash manipulation
- Replay attacks
Impact: Loss of legal defensibility, inadmissible evidence
Mitigation: ✅ Implemented (see Audit Trail Security)
Goal: Crash or slow down RexLit
Attack Techniques:
- Extremely large files (10GB+ PDFs)
- Deeply nested directory structures
- Circular symlinks
- Malformed document headers
Mitigation: 🟡 Partial
- Memory limits enforced
- Timeout handling (future)
- Resource quotas (future)
Goal: Inject malicious content into indexed documents
Attack Techniques:
- JavaScript in PDF annotations
- Macro-enabled DOCX files
- Embedded executables in metadata
Mitigation: ✅ Text-only extraction
- No script execution
- No macro evaluation
- Metadata sanitization
Status: ✅ Production-Ready
Every file path is validated through a 3-layer security check:
```python
import logging
from pathlib import Path

logger = logging.getLogger(__name__)


def validate_path(path: Path, allowed_root: Path) -> bool:
    """Validate path is within allowed boundary."""
    # Layer 1: Resolve symlinks and relative paths
    resolved = path.resolve()

    # Layer 2: Check if within boundary
    try:
        resolved.relative_to(allowed_root.resolve())
        return True
    except ValueError:
        # Layer 3: Log security event
        logger.warning(f"PATH_TRAVERSAL blocked: {path} → {resolved}")
        return False
```

Implementation detail: RexLit resolves every candidate path with `Path.resolve()`, which follows every symlink in a chain and collapses `..` components, before performing the boundary comparison. This ensures that symlink chains cannot escape `REXLIT_HOME` even if the original path string appears safe.
✅ Symlinks outside document root
✅ ../ path traversal attempts
✅ Absolute paths
✅ Nested traversal attacks
✅ Symlink chains
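The blocked cases above can be reproduced with a short self-contained script. It is a sketch, not RexLit's test suite: it re-declares a `validate_path` equivalent to the one shown earlier so it runs on its own, the directory layout and names (`docs`, `outside`, `link_a`, `link_b`) are made up, and it assumes a POSIX system where unprivileged symlink creation is allowed.

```python
import logging
import tempfile
from pathlib import Path
from typing import Dict

logger = logging.getLogger("rexlit.security")


def validate_path(path: Path, allowed_root: Path) -> bool:
    """Return True only if `path` resolves to a location inside `allowed_root`."""
    resolved = path.resolve()  # follows every symlink in the chain and collapses ".."
    try:
        resolved.relative_to(allowed_root.resolve())
        return True
    except ValueError:
        logger.warning("PATH_TRAVERSAL blocked: %s -> %s", path, resolved)
        return False


def demo() -> Dict[str, bool]:
    """Build a hostile directory layout and report which paths pass validation."""
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        root = base / "docs"
        outside = base / "outside"
        root.mkdir()
        outside.mkdir()
        secret = outside / "secret.txt"
        secret.write_text("confidential")
        (root / "ok.txt").write_text("fine")

        # Symlink that stays inside the boundary
        (root / "inside_link.txt").symlink_to(root / "ok.txt")
        # Direct symlink out of the boundary
        (root / "evil.txt").symlink_to(secret)
        # Two-hop chain: link_a -> link_b -> outside/secret.txt
        (root / "link_b").symlink_to(secret)
        (root / "link_a").symlink_to(root / "link_b")

        candidates = {
            "regular file": root / "ok.txt",
            "symlink inside root": root / "inside_link.txt",
            "symlink outside root": root / "evil.txt",
            "symlink chain": root / "link_a",
            "dotdot traversal": root / ".." / "outside" / "secret.txt",
            "absolute path": secret,
        }
        return {name: validate_path(p, root) for name, p in candidates.items()}


if __name__ == "__main__":
    for name, allowed in demo().items():
        print(f"{name:22} {'allowed' if allowed else 'BLOCKED'}")
```

Because the comparison happens on the fully resolved path, the two-hop chain is judged by its final target, not by the innocent-looking `docs/link_a` string.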
```bash
# Attacker creates malicious symlink
cd /litigation/docs
ln -s /etc/passwd evil.txt

# RexLit detects and blocks
$ rexlit ingest /litigation/docs
Warning: Skipping evil.txt (path traversal detected)
Blocked: /litigation/docs/evil.txt → /etc/passwd
```

All traversal attempts are logged to the audit trail:
```json
{
  "timestamp": "2025-10-23T10:15:42Z",
  "action": "PATH_TRAVERSAL_BLOCKED",
  "details": {
    "path": "/litigation/docs/evil.txt",
    "resolved": "/etc/passwd",
    "boundary": "/litigation/docs"
  },
  "severity": "WARNING"
}
```

Status: ✅ Production-Ready
Each audit entry contains:
```json
{
  "timestamp": "2025-10-23T09:15:23.123456Z",
  "operation": "index.build",
  "sequence": 42,
  "previous_hash": "9f1e8a4b2c5d7f3e1a6b9d4c2a7b3c2d...",
  "entry_hash": "4a7b3c2d9f1e8a4b2c5d7f3e1a6b9d4c...",
  "signature": "5fd5b996e0a9b0c1..."
}
```

Hash Computation:
```
hash = SHA256(
    timestamp +
    action +
    JSON(details) +
    previous_hash
)
```
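The formula above can be sketched with Python's `hashlib`. Two details are assumptions rather than documented behavior: the fields are concatenated as UTF-8 strings with no separator, and `JSON(details)` is serialized with sorted keys and compact separators so identical details always hash to the same bytes. Field names follow the formula (`action`, `details`); `build_entry` is a hypothetical helper.

```python
import hashlib
import json
from datetime import datetime, timezone

GENESIS_HASH = "0" * 64  # previous_hash of the first entry in the chain


def compute_hash(timestamp: str, action: str, details: dict, previous_hash: str) -> str:
    """SHA256(timestamp + action + JSON(details) + previous_hash), hex-encoded."""
    # Assumption: canonical JSON (sorted keys, no whitespace) keeps the hash stable
    canonical_details = json.dumps(details, sort_keys=True, separators=(",", ":"))
    payload = timestamp + action + canonical_details + previous_hash
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_entry(action: str, details: dict, previous_hash: str, sequence: int) -> dict:
    """Hypothetical helper: create an entry linked to the previous entry's hash."""
    timestamp = datetime.now(timezone.utc).isoformat().replace("+00:00", "Z")
    return {
        "timestamp": timestamp,
        "action": action,
        "details": details,
        "sequence": sequence,
        "previous_hash": previous_hash,
        "entry_hash": compute_hash(timestamp, action, details, previous_hash),
    }


if __name__ == "__main__":
    first = build_entry("ingest", {"path": "/litigation/docs"}, GENESIS_HASH, 1)
    second = build_entry("index.build", {"docs": 120}, first["entry_hash"], 2)
    # The second entry commits to the first entry's hash: that link forms the chain
    print(second["previous_hash"] == first["entry_hash"])
```

Because each `entry_hash` feeds into the next entry's input, editing any earlier entry changes every hash that follows it.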
Genesis Entry:
- The first entry has `previous_hash = "0000000000000000..."`, which establishes the chain's starting point
- Each entry is signed with an HMAC keyed by a secret stored under `~/.config/rexlit/audit-ledger.key`.
- The ledger tip (`last_sequence`, `last_hash`) is replicated in `audit.jsonl.meta` and sealed with the same HMAC.
- Verification fails if:
  - An entry signature does not match (content tampering)
  - The ledger file is truncated or deleted (metadata mismatch)
  - Metadata is altered without the secret key (HMAC mismatch)
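The three failure conditions above can be sketched with the standard library's `hmac` module. Assumptions, not documented behavior: HMAC-SHA256 as the algorithm, each entry's `entry_hash` as the signed message, and a meta record holding `last_sequence`, `last_hash`, and a `seal` field. The key is passed in directly instead of being read from `~/.config/rexlit/audit-ledger.key`; `seal_ledger_tip` and `check_ledger` are hypothetical names.

```python
import hashlib
import hmac
import json
from typing import Dict, List, Optional


def sign(key: bytes, message: str) -> str:
    """HMAC-SHA256 over a UTF-8 message, hex-encoded."""
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).hexdigest()


def seal_ledger_tip(key: bytes, last_sequence: int, last_hash: str) -> Dict:
    """Build the metadata record that pins the current end of the ledger."""
    tip = {"last_sequence": last_sequence, "last_hash": last_hash}
    return {**tip, "seal": sign(key, json.dumps(tip, sort_keys=True))}


def check_ledger(key: bytes, meta: Dict, entries: List[Dict]) -> Optional[str]:
    """Return None if entries and metadata are consistent, else the failure reason."""
    # 1. Metadata seal: editing the tip without the key breaks the HMAC
    tip = {"last_sequence": meta["last_sequence"], "last_hash": meta["last_hash"]}
    if not hmac.compare_digest(sign(key, json.dumps(tip, sort_keys=True)), meta["seal"]):
        return "metadata altered (HMAC mismatch)"
    # 2. Per-entry signatures: editing an entry hash without the key breaks its HMAC
    for entry in entries:
        if not hmac.compare_digest(sign(key, entry["entry_hash"]), entry["signature"]):
            return f"entry {entry['sequence']} signature mismatch"
    # 3. Tip comparison: a truncated or deleted ledger no longer ends at the sealed tip
    last = entries[-1] if entries else None
    if (
        last is None
        or last["sequence"] != tip["last_sequence"]
        or last["entry_hash"] != tip["last_hash"]
    ):
        return "ledger truncated or deleted (tip mismatch)"
    return None
```

`hmac.compare_digest` is used instead of `==` so the comparison time does not leak how many leading characters of a forged value were correct.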
- Immutability: Changing any entry breaks all subsequent hashes
- Append-Only: No deletions without breaking chain or metadata seal
- Temporal Ordering: Reordering breaks linkage
- Tamper-Evidence: Verification detects content, signature, or metadata tampering
- Deletion Detection: Missing files or truncated tails trigger verification failure
Scenario: Attacker modifies entry #5 to hide a search query
```
# Before tampering
Entry 5: hash=ABC123..., previous_hash=DEF456...
Entry 6: hash=GHI789..., previous_hash=ABC123...

# After tampering (attacker changes Entry 5)
Entry 5: hash=XYZ999..., previous_hash=DEF456...  # Hash changed!
Entry 6: hash=GHI789..., previous_hash=ABC123...  # Still points to old hash

# Verification fails
$ rexlit audit verify
✗ FAILED: Entry 6 has invalid previous_hash
  Expected: XYZ999...
  Actual: ABC123...
  Tampering detected at entry 6
```

Every audit write is followed by:
```python
file.write(json.dumps(entry) + "\n")
file.flush()
os.fsync(file.fileno())  # Force write to disk
```

Guarantee: Even if the system crashes immediately after the write, the entry is persisted.
Legal Significance: Meets FRCP requirements for defensible preservation.
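The write sequence above can be wrapped in a small append helper. This is a sketch; the name `append_audit_entry` is hypothetical. Append mode (`"a"`) keeps every write at the end of the file, `flush()` moves Python's buffer into the OS, and `os.fsync` asks the OS to push it to stable storage before the function returns. An fsync of the file alone does not make a newly created file's directory entry durable, so the sketch also fsyncs the parent directory the first time the log is created (POSIX only).

```python
import json
import os
from pathlib import Path


def append_audit_entry(log_path: Path, entry: dict) -> None:
    """Append one JSON line and force it to disk before returning."""
    is_new = not log_path.exists()
    with open(log_path, "a", encoding="utf-8") as file:
        file.write(json.dumps(entry) + "\n")
        file.flush()             # move Python's buffer into the OS page cache
        os.fsync(file.fileno())  # ask the OS to write the page cache to disk
    if is_new and hasattr(os, "O_DIRECTORY"):
        # POSIX-only: persist the directory entry for a freshly created file
        dir_fd = os.open(log_path.parent, os.O_RDONLY | os.O_DIRECTORY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)
```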
Status: ✅ Production-Ready
- PII findings are persisted via `EncryptedPIIStore`, which encrypts every record using Fernet (AES-128-CBC + HMAC-SHA256).
- Encryption keys are generated on first use and stored at `~/.config/rexlit/pii.key` with `0600` permissions.
- The encrypted findings file (`pii_findings.enc`) contains only ciphertext; document identifiers, entity text, and coordinates are never written in plaintext.
- Decryption occurs in memory only when calling the `EncryptedPIIStore.read_*` helpers.
- `EncryptedPIIStore.purge()` securely removes all stored ciphertext for breach response workflows.
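Creating a key file with `0600` permissions on first use can be sketched with the standard library alone. This illustrates the idea and is not `EncryptedPIIStore`'s code; `load_or_create_key` is a hypothetical name. The key format mirrors what `cryptography.fernet.Fernet.generate_key()` returns (32 random bytes, URL-safe base64-encoded), so the result could be passed to `Fernet(key)`.

```python
import base64
import os
from pathlib import Path


def load_or_create_key(key_path: Path) -> bytes:
    """Return the existing key, or create one readable only by the owner."""
    if key_path.exists():
        return key_path.read_bytes().strip()
    key_path.parent.mkdir(mode=0o700, parents=True, exist_ok=True)
    # Same format as Fernet.generate_key(): 32 random bytes, URL-safe base64
    key = base64.urlsafe_b64encode(os.urandom(32))
    # O_EXCL: fail instead of overwriting a key another process just created
    fd = os.open(key_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    try:
        os.write(fd, key)
    finally:
        os.close(fd)
    os.chmod(key_path, 0o600)  # set the mode explicitly, independent of the umask
    return key
```

Passing the mode to `os.open` means the file never exists with looser permissions, even briefly; the trailing `chmod` only pins the exact bits.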
Status: ✅ Implemented
```python
ALLOWED_EXTENSIONS = {".pdf", ".docx", ".txt", ".md"}
if path.suffix.lower() not in ALLOWED_EXTENSIONS:
    raise ValueError(f"Unsupported file type: {path.suffix}")
```

- Resolve all symlinks with `.resolve()`
- Normalize paths with `.absolute()`
- Validate UTF-8 encoding
- Max file size: 100MB (configurable)
- Max path length: 4096 characters
- Max directory depth: 50 levels
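The checks above can be combined into one pre-flight function. This is a sketch: the name `check_input`, measuring depth relative to the ingest root, and reading "Validate UTF-8 encoding" as a check on file names are assumptions; only the extension list and limit values come from this section.

```python
from pathlib import Path

ALLOWED_EXTENSIONS = {".pdf", ".docx", ".txt", ".md"}
MAX_FILE_SIZE = 100 * 1024 * 1024  # 100MB default (configurable)
MAX_PATH_LENGTH = 4096             # characters
MAX_DEPTH = 50                     # directory levels below the ingest root


def check_input(path: Path, root: Path, max_size: int = MAX_FILE_SIZE) -> None:
    """Raise ValueError if `path` breaks any of the input limits."""
    resolved = path.resolve()  # follow symlinks and collapse ".."
    root = root.resolve()
    if path.suffix.lower() not in ALLOWED_EXTENSIONS:
        raise ValueError(f"Unsupported file type: {path.suffix}")
    if len(str(resolved)) > MAX_PATH_LENGTH:
        raise ValueError(f"Path longer than {MAX_PATH_LENGTH} characters")
    try:
        # Assumption: depth counts directories between the root and the file
        depth = len(resolved.relative_to(root).parts) - 1
    except ValueError:
        raise ValueError(f"Outside document root: {path}") from None
    if depth > MAX_DEPTH:
        raise ValueError(f"Directory depth {depth} exceeds {MAX_DEPTH}")
    # Assumption: UTF-8 validation applies to file names. On POSIX, undecodable
    # bytes surface as surrogates and fail to encode here; UnicodeEncodeError
    # is a subclass of ValueError.
    str(resolved).encode("utf-8")
    size = resolved.stat().st_size
    if size > max_size:
        raise ValueError(f"File is {size} bytes; limit is {max_size}")
```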
Status: ✅ By Design
RexLit is offline-by-default:
- No HTTP clients
- No external API calls
- No phone-home telemetry
Exception: Future --online flag for case law lookups (explicit opt-in)
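Offline-by-default is a design property rather than a runtime switch, but it can be checked in tests. One possible approach, sketched here and not taken from RexLit's test suite, is a context manager that makes outbound socket connections fail while the code under test runs. `no_network` and `NetworkAccessError` are hypothetical names; the sketch patches `connect`/`connect_ex` only, so DNS lookups through `socket.getaddrinfo` are not intercepted.

```python
import socket
from contextlib import contextmanager
from typing import Iterator


class NetworkAccessError(RuntimeError):
    """Raised when code under test tries to open a network connection."""


@contextmanager
def no_network() -> Iterator[None]:
    """Make socket.connect/connect_ex raise for the duration of the block."""
    original_connect = socket.socket.connect
    original_connect_ex = socket.socket.connect_ex

    def refuse(self, address):
        raise NetworkAccessError(f"network access attempted: {address!r}")

    socket.socket.connect = refuse
    socket.socket.connect_ex = refuse
    try:
        yield
    finally:
        # Always restore the real methods, even if the block raised
        socket.socket.connect = original_connect
        socket.socket.connect_ex = original_connect_ex
```

Wrapping an ingest or search call in `with no_network():` would turn any accidental HTTP client or telemetry call into an immediate test failure.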
- PDF: Text extraction only, no JavaScript evaluation
- DOCX: XML parsing only, no macro execution
- No `eval()`, `exec()`, or `subprocess` calls on untrusted input
Core: Tantivy, PyMuPDF, python-docx, Pydantic
Dev: pytest, ruff, black, mypy
All dependencies pinned with hash verification (future).
```python
from pathlib import Path
from typing import Iterator, Optional


def discover_documents(
    root: Path,
    recursive: bool = True,
    allowed_root: Optional[Path] = None,
) -> Iterator[DocumentMetadata]:
    """Discover documents with path validation."""
    if allowed_root is None:
        allowed_root = root

    for path in scan_directory(root, recursive):
        # Validate before processing
        if not validate_path(path, allowed_root):
            logger.warning(f"Blocked path traversal: {path}")
            continue  # Skip malicious file

        yield DocumentMetadata(path=str(path), ...)
```

When ingesting a single file, the boundary check is bypassed:
```python
if path.is_file():
    # Direct file access allowed
    return discover_single_file(path)
```

Rationale: The user explicitly specified the file path, so there is no traversal risk.
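```bash
$ rexlit audit verify
```

```python
import json

# Step 1: Load all entries
with open("audit.jsonl", encoding="utf-8") as f:
    entries = [json.loads(line) for line in f]

# Step 2: The first entry must link to the genesis hash
first_entry = entries[0]
assert first_entry["previous_hash"] == "0" * 64

# Step 3: Recompute each hash and verify linkage
for i, entry in enumerate(entries):
    computed = compute_hash(entry)  # applies the Hash Computation formula above
    assert computed == entry["entry_hash"], f"Entry {i} hash mismatch"
    if i > 0:
        assert entry["previous_hash"] == entries[i - 1]["entry_hash"]
```

Output:
- PASSED: All entries valid, chain intact
- FAILED: Specific entry number and error details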
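The verification steps above, assembled into one runnable function that reports the first failure instead of stopping on an `assert`. This sketch assumes each line carries the fields named in the hash formula (`timestamp`, `action`, `details`, `previous_hash`) plus `entry_hash`, and that `details` is serialized as sorted, compact JSON. It checks chain integrity only, not HMAC signatures or the `audit.jsonl.meta` seal; `verify_audit_log` is a hypothetical name, not the code behind `rexlit audit verify`.

```python
import hashlib
import json
from pathlib import Path
from typing import Tuple

GENESIS_HASH = "0" * 64


def compute_hash(entry: dict) -> str:
    # Assumption: canonical JSON for details, plain string concatenation
    details = json.dumps(entry["details"], sort_keys=True, separators=(",", ":"))
    payload = entry["timestamp"] + entry["action"] + details + entry["previous_hash"]
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def verify_audit_log(path: Path) -> Tuple[bool, str]:
    """Return (True, summary) if the chain is intact, else (False, first error)."""
    with open(path, encoding="utf-8") as f:
        entries = [json.loads(line) for line in f if line.strip()]
    if not entries:
        return False, "FAILED: audit log is empty"
    if entries[0]["previous_hash"] != GENESIS_HASH:
        return False, "FAILED: entry 0 is not a genesis entry"
    for i, entry in enumerate(entries):
        # Content check: the stored hash must match the recomputed one
        if compute_hash(entry) != entry["entry_hash"]:
            return False, f"FAILED: entry {i} hash mismatch (content tampering)"
        # Linkage check: catches deleted, inserted, or reordered entries
        if i > 0 and entry["previous_hash"] != entries[i - 1]["entry_hash"]:
            return False, f"FAILED: entry {i} has invalid previous_hash"
    return True, f"PASSED: {len(entries)} entries valid, chain intact"
```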
RexLit provides:
- Preservation: Fsync guarantees prevent data loss
- Documentation: Audit trail of all actions
- Chain-of-Custody: Cryptographic hash chain
- Authenticity: SHA-256 fingerprints for every document
- Completeness: All documents tracked in manifest
Audit Trail:
- Tamper-evident by design
- Cryptographically verifiable
- Detailed timestamp records
- Meets business records exception (FRE 803(6))
Document Hashes:
- SHA-256 fingerprints for integrity
- Detect unauthorized modifications
- Prove document unchanged since collection
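A SHA-256 fingerprint can be computed in fixed-size chunks so that large files are never loaded into memory at once. A minimal sketch with `hashlib`; the function names and the 1 MiB chunk size are illustrative choices, not RexLit's API.

```python
import hashlib
from pathlib import Path

CHUNK_SIZE = 1024 * 1024  # 1 MiB per read keeps memory use flat for large files


def sha256_file(path: Path) -> str:
    """Return the hex SHA-256 digest of a file's contents, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(CHUNK_SIZE), b""):
            digest.update(chunk)
    return digest.hexdigest()


def unchanged_since_collection(path: Path, recorded_hash: str) -> bool:
    """True if the file still matches the fingerprint recorded at collection."""
    return sha256_file(path) == recorded_hash
```

Any single-byte change to the file produces a different digest, which is what lets a stored fingerprint prove a document is unchanged.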
If a party deletes or tampers with documents:
- Detection: `rexlit audit verify` shows the chain break
- Evidence: The audit log identifies the exact entry where the chain breaks, along with its recorded timestamp
- Preservation: The original manifest shows all collected documents
Path Traversal Tests (13 tests):
- `test_discover_document_symlink_within_boundary` ✅
- `test_discover_document_symlink_outside_boundary` ✅
- `test_discover_document_path_traversal_dotdot` ✅
- `test_discover_document_absolute_path_outside_root` ✅
- `test_discover_documents_nested_path_traversal` ✅
- `test_discover_documents_system_file_access_attempt` ✅
- `test_symlink_chain_outside_boundary` ✅
- And 6 more...
Audit Tests (10 tests):
- `test_audit_genesis_hash` ✅
- `test_audit_chain_entry_linking` ✅
- `test_audit_tampering_modified_content` ✅
- `test_audit_tampering_deleted_entry` ✅
- `test_audit_tampering_reordered_entries` ✅
- And 5 more...
```python
# Test: Symlink to /etc/passwd
def test_symlink_outside_boundary(temp_dir):
    """Verify symlink to system file is blocked."""
    malicious = temp_dir / "evil.txt"
    malicious.symlink_to("/etc/passwd")

    docs = list(discover_documents(temp_dir, allowed_root=temp_dir))
    assert len(docs) == 0, "Should block symlink outside boundary"
```

Result: ✅ All attacks successfully blocked
- File Permissions: Restrict `audit.jsonl` to read-only after creation:
  ```bash
  chmod 444 ~/.local/share/rexlit/audit.jsonl
  ```
- Backup Audit Trail: Make regular backups to immutable storage:
  ```bash
  cp audit.jsonl /backup/audit-$(date +%Y%m%d).jsonl
  ```
- Verify Regularly: Run verification before critical deadlines:
  ```bash
  rexlit audit verify || alert "Audit verification failed!"
  ```
- Monitor Logs: Watch for path traversal warnings:
  ```bash
  rexlit audit show | grep PATH_TRAVERSAL
  ```
- Don't Edit Audit Trail: Any modification breaks legal defensibility
- Verify Before Production: Always run `rexlit audit verify` before producing documents
- Keep Manifests: Store document manifests separately for redundancy
- Report Anomalies: Unusual path traversal warnings may indicate malicious documents
Issue: Very large files (10GB+) can exhaust memory
Mitigation:
- Monitor resource usage
- Implement file size limits (future)
- Use streaming extraction (future)
Severity: LOW (DoS only, no data compromise)
Issue: File could change between validation and read
Mitigation:
- Minimal time window
- Read-only mode recommended
- File system snapshots (user responsibility)
Severity: LOW (requires attacker write access)
Issue: No validation of PDF/DOCX embedded metadata
Mitigation:
- Metadata stored as-is, not executed
- Future: Sanitization layer
Severity: LOW (no execution risk)
Email: security@rexlit.example.com
PGP Key: [Public key block here]
- Detailed description of vulnerability
- Steps to reproduce
- Proof-of-concept code (if applicable)
- Suggested fix (optional)
- 24 hours: Initial acknowledgment
- 7 days: Preliminary assessment
- 30 days: Fix or mitigation plan
- 90 days: Public disclosure (coordinated)
Contributors to RexLit security:
- Your name here?
- File size limits
- Timeout handling for extraction
- Metadata sanitization
- Dependency hash verification
- Encrypted audit trail option
- Digital signatures for entries
- Multi-party audit verification
- Hardware security module (HSM) support
- Security audit by third party
- Penetration testing
- CVE monitoring for dependencies
- SBOM (Software Bill of Materials)
- ✅ FRCP Rule 26 (Federal Rules of Civil Procedure)
- ✅ FRE 803(6) (Business Records Exception)
- SOC 2 Type II
- ISO 27001
- NIST 800-53
- GDPR (if applicable)
Last Updated: 2025-10-23 (M0 Release)
Security Contact: security@rexlit.example.com