This repository contains the API reference documentation for GroupDocs.Parser, a document parsing and extraction SDK that lets developers extract text, images, metadata, and structured data from more than 50 document formats.
GroupDocs.Parser provides APIs for extracting data from documents without external dependencies or additional software installations. It supports popular document formats including PDF, Microsoft Word, Excel, PowerPoint, OneNote, and Outlook, among others.
This repository contains API references for the following platforms:
- GroupDocs.Parser for .NET - API references for .NET applications
- GroupDocs.Parser for Java - API references for Java-based applications
- GroupDocs.Parser for Python via .NET - API references for Python applications
The documentation is available in multiple languages:
- English
- Arabic
- Chinese
- French
- German
- Greek
- Hindi
- Indonesian
- Italian
- Japanese
- Korean
- Dutch
- Russian
- Spanish
- Swedish
- Turkish
Key features:
- Text Extraction: Extract raw or formatted text from entire documents or specific pages
- Image Extraction: Extract images from documents with support for various image formats
- Metadata Extraction: Retrieve document properties, creation dates, author information, and more
- Structured Data Parsing: Extract tables, forms, and structured data using template-based parsing
- Container Extraction: Extract attachments and embedded documents from container formats
- Cross-Platform Support: Available for .NET, Java, and Python platforms
- No External Dependencies: Parse documents without requiring Microsoft Office, Adobe Acrobat, or other third-party software
GroupDocs.Parser supports a wide range of document formats:
- Word Processing: DOC, DOCX, DOT, DOTX, RTF, ODT, OTT
- Spreadsheets: XLS, XLSX, XLSM, XLSB, CSV, ODS, OTS
- Presentations: PPT, PPTX, PPS, PPSX, ODP, OTP
- PDF Documents: PDF, PDF/A
- Email: MSG, EML, EMLX, PST, OST
- Archives: ZIP, TAR, RAR
- Other Formats: OneNote, Markdown, EPUB, and more
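The format groups above can be read as a simple lookup from file extension to category. The following is a minimal sketch of that idea, not part of the GroupDocs.Parser API: the names `FORMAT_CATEGORIES` and `format_category` are hypothetical, and the mapping assumes each format name in the list doubles as its file extension. The open-ended "Other Formats" group is left out because the list does not give extensions for it.

```python
from pathlib import Path

# Hypothetical lookup table built from the format list in this README.
# Assumption: each listed format name is also the file extension.
# PDF/A files use the .pdf extension, so only "pdf" appears here.
FORMAT_CATEGORIES = {
    "Word Processing": {"doc", "docx", "dot", "dotx", "rtf", "odt", "ott"},
    "Spreadsheets": {"xls", "xlsx", "xlsm", "xlsb", "csv", "ods", "ots"},
    "Presentations": {"ppt", "pptx", "pps", "ppsx", "odp", "otp"},
    "PDF Documents": {"pdf"},
    "Email": {"msg", "eml", "emlx", "pst", "ost"},
    "Archives": {"zip", "tar", "rar"},
}


def format_category(filename):
    """Return the README category for a file name, or None if it is not listed."""
    # Take the last extension, lowercase it, and drop the leading dot.
    ext = Path(filename).suffix.lower().lstrip(".")
    for category, extensions in FORMAT_CATEGORIES.items():
        if ext in extensions:
            return category
    return None
```

For example, `format_category("Q3-report.XLSX")` returns `"Spreadsheets"`, while a file with an unlisted extension returns `None`. A `None` result only means the extension is not named in this README; the product's own documentation is the authority on which formats are actually supported.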
Repository structure:
Groupdocs.Parser-References/
├── english/ # English documentation
│ ├── net/ # .NET API references
│ ├── java/ # Java API references
│ └── python-net/ # Python via .NET API references
├── arabic/ # Arabic documentation
├── chinese/ # Chinese documentation
├── french/ # French documentation
├── german/ # German documentation
└── ... # Other language directories
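To locate the reference pages for a given language and platform, the layout above can be expressed as a small path helper. This is a sketch under one stated assumption: the tree only shows platform subdirectories for `english/`, and the helper assumes every language directory follows the same pattern. The names `PLATFORM_DIRS` and `reference_dir` are hypothetical.

```python
from pathlib import PurePosixPath

# Platform directory names as shown under english/ in the tree above.
PLATFORM_DIRS = {
    ".NET": "net",
    "Java": "java",
    "Python via .NET": "python-net",
}


def reference_dir(language, platform, root="Groupdocs.Parser-References"):
    """Build the relative path to one platform's references for one language.

    Assumption: every language directory mirrors the english/ layout.
    """
    if platform not in PLATFORM_DIRS:
        raise ValueError(f"unknown platform: {platform!r}")
    # Language directories are lowercase in the tree (english, arabic, ...).
    return PurePosixPath(root) / language.lower() / PLATFORM_DIRS[platform]
```

Calling `reference_dir("English", "Java")` yields the path `Groupdocs.Parser-References/english/java`. Check the repository itself before relying on a path for a language whose subdirectories the tree does not show.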
Each platform's documentation includes:
- Namespaces/Packages: Core API namespaces and packages
- Classes: API classes and their members
- Interfaces: API interfaces and contracts
- Enumerations: API enumerations and constants
- Exceptions: Exception classes for error handling
Related resources:
- Product Overview: Learn about features, supported formats, and use cases
- Developer Documentation: Comprehensive guides, tutorials, and code examples
- Releases & Downloads: Download the latest versions and release notes
Common use cases:
- Document indexing and search engine integration
- Content management systems (CMS)
- Data migration and conversion projects
- Document analysis and reporting
- Automated document processing workflows
- Text mining and content extraction
- Metadata cataloging and organization
This repository contains auto-generated API reference documentation. For issues, suggestions, or contributions related to the GroupDocs.Parser product itself, please visit our Free Support Forum.