A lightweight, fast DOCX text extraction library for Rust with minimal dependencies.
- 🚀 Fast - Optimized for speed with streaming XML parsing
- 🪶 Lightweight - Minimal dependencies (only `zip`, `quick-xml`, and `thiserror`)
- 🛡️ Safe - Zero unsafe code
- 📊 Tables - Full support for table text extraction
- 🎯 Simple API - One-call text extraction, plus a structured API for advanced use
- 🔧 Robust - Handles malformed documents gracefully
Add this to your `Cargo.toml`:

```toml
[dependencies]
docx-lite = "0.2.0"
```

```rust
use docx_lite::extract_text;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Read the file and return all of its text as a single String
    let text = extract_text("document.docx")?;
    println!("{}", text);
    Ok(())
}
```

```rust
use docx_lite::{parse_document_from_path, ExtractOptions};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let doc = parse_document_from_path("document.docx")?;

    // Extract text with all options enabled
    let options = ExtractOptions::all();
    let text = doc.extract_text_with_options(&options);
    println!("{}", text);

    // Or customize extraction
    let custom_options = ExtractOptions {
        include_headers: true,
        include_footers: true,
        include_footnotes: false,
        include_endnotes: false,
        include_list_markers: true,
    };
    let custom_text = doc.extract_text_with_options(&custom_options);
    println!("{}", custom_text);

    // Access specific elements
    for list_item in &doc.lists {
        println!("List item (level {}): {}", list_item.level, list_item.text);
    }

    for footnote in &doc.footnotes {
        // Use first() rather than [0] so a footnote with no paragraphs cannot panic
        if let Some(first) = footnote.paragraphs.first() {
            println!("Footnote {}: {}", footnote.id, first.to_text());
        }
    }

    Ok(())
}
```

- `extract_text(path)` - Extract all text from a DOCX file
- `extract_text_from_bytes(bytes)` - Extract text from DOCX bytes
- `extract_text_from_reader(reader)` - Extract text from any reader
- `parse_document(reader)` - Parse DOCX into a structured Document
- `parse_document_from_path(path)` - Parse DOCX file into a structured Document
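The path, bytes, and reader variants listed above follow a common Rust pattern: one core routine that works on a reader, with thin wrappers for the other input kinds. The sketch below shows that layering with the standard library only. The names `text_from_reader`, `text_from_bytes`, and `text_from_path` are hypothetical, the "parsing" step is a stand-in that just reads UTF-8 text, and none of this is taken from docx-lite's source.

```rust
use std::fs::File;
use std::io::{self, BufReader, Cursor, Read};
use std::path::Path;

// Hypothetical names for illustration; this is not docx-lite's implementation.

/// Core routine: accepts anything that implements `Read`.
/// Stand-in for real parsing: it simply reads the input as UTF-8 text.
fn text_from_reader<R: Read>(mut reader: R) -> io::Result<String> {
    let mut text = String::new();
    reader.read_to_string(&mut text)?;
    Ok(text)
}

/// In-memory input is wrapped in a `Cursor` and forwarded to the core routine.
fn text_from_bytes(bytes: &[u8]) -> io::Result<String> {
    text_from_reader(Cursor::new(bytes))
}

/// A path is opened as a buffered file and forwarded to the core routine.
fn text_from_path<P: AsRef<Path>>(path: P) -> io::Result<String> {
    text_from_reader(BufReader::new(File::open(path)?))
}
```

With this shape, callers that already hold the document in memory (for example an upload buffer) can skip the filesystem entirely, while all three entry points share one code path.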
- ✅ Paragraphs
- ✅ Runs (with bold, italic, underline formatting)
- ✅ Tables (with rows and cells)
- ✅ Lists (bullets and numbering) - NEW in v0.2.0
- ✅ Headers/Footers - NEW in v0.2.0
- ✅ Footnotes/Endnotes - NEW in v0.2.0
- ✅ Advanced text extraction with options
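To make the idea of option-driven extraction concrete, here is a minimal sketch of how boolean flags can gate which parts of a parsed document end up in the output. `Options`, `Parsed`, and `render` are hypothetical stand-ins written for this example; they only mirror the spirit of `ExtractOptions` and are not docx-lite's actual types or logic.

```rust
// Hypothetical stand-ins for illustration only; not docx-lite's types.
#[derive(Default)]
struct Options {
    include_headers: bool,
    include_footnotes: bool,
}

struct Parsed {
    headers: Vec<String>,
    body: Vec<String>,
    footnotes: Vec<String>,
}

/// Joins the selected sections with newlines. The body is always included;
/// headers and footnotes are added only when their flag is set.
fn render(doc: &Parsed, opts: &Options) -> String {
    let mut parts: Vec<&str> = Vec::new();
    if opts.include_headers {
        parts.extend(doc.headers.iter().map(String::as_str));
    }
    parts.extend(doc.body.iter().map(String::as_str));
    if opts.include_footnotes {
        parts.extend(doc.footnotes.iter().map(String::as_str));
    }
    parts.join("\n")
}
```

Keeping the flags in a plain struct means callers can start from a default and switch on only what they need.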
docx-lite is designed for speed and efficiency:
- Streaming XML parsing (no full DOM loading)
- Minimal memory allocation
- Zero-copy where possible
- Optimized for text extraction use case
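The streaming approach described above can be illustrated with a small, standard-library-only sketch: walk the XML once, track whether the cursor is inside a `<w:t>` (text run) element, and append character data as it is seen, without ever building a tree. `collect_run_text` is a hypothetical helper written for this example. It is deliberately naive (no entity decoding, no CDATA, no `>` inside attribute values) and is not how docx-lite is implemented; the library relies on `quick-xml` for real parsing.

```rust
/// Collects the text inside `<w:t>` elements in a single forward pass.
/// Hypothetical helper for illustration; it is not part of docx-lite.
fn collect_run_text(xml: &str) -> String {
    let mut out = String::new();
    let mut rest = xml;
    let mut in_text = false; // true while inside a <w:t> element

    while let Some(lt) = rest.find('<') {
        // Character data before the next tag is kept only inside <w:t>.
        if in_text {
            out.push_str(&rest[..lt]);
        }
        // Stop at an unterminated tag instead of panicking.
        let gt = match rest[lt..].find('>') {
            Some(offset) => offset,
            None => break,
        };
        let tag = &rest[lt + 1..lt + gt];
        rest = &rest[lt + gt + 1..];

        if tag == "/w:t" {
            in_text = false;
        } else if tag == "/w:p" {
            out.push('\n'); // end of paragraph
        } else if !tag.ends_with('/') && (tag == "w:t" || tag.starts_with("w:t ")) {
            in_text = true;
        }
    }
    out
}
```

Because the function only keeps the output string and a single flag, its working memory does not grow with the depth or size of the XML tree, which is the property the list above refers to.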
Unlike other DOCX libraries in the Rust ecosystem, docx-lite:
- Compiles on modern Rust - No issues with latest Rust versions
- Minimal dependencies - Reduces compilation time and security surface
- Production-ready - Used in production at V-Lawyer
- Focused scope - Does one thing well: text extraction
Contributions are welcome! Please feel free to submit a Pull Request.
This project is dual-licensed under MIT OR Apache-2.0.
Developed by the V-Lawyer team as part of our commitment to open source.