groupdocs-parser/Groupdocs.Parser-References

GroupDocs.Parser API References

This repository contains the API reference documentation for GroupDocs.Parser, a document parsing and extraction SDK that lets developers extract text, images, metadata, and structured data from more than 50 document formats.

Overview

GroupDocs.Parser provides APIs for extracting data from documents without external dependencies or additional software installations. The library supports parsing and extraction from popular document formats, including PDF, Microsoft Word, Excel, PowerPoint, OneNote, and Outlook.

Available Platforms

This repository contains API references for the following platforms:

  • .NET
  • Java
  • Python via .NET

Supported Languages

The documentation is available in multiple languages:

  • English
  • Arabic
  • Chinese
  • French
  • German
  • Greek
  • Hindi
  • Indonesian
  • Italian
  • Japanese
  • Korean
  • Dutch
  • Russian
  • Spanish
  • Swedish
  • Turkish

Key Features

  • Text Extraction: Extract raw or formatted text from entire documents or specific pages
  • Image Extraction: Extract images from documents with support for various image formats
  • Metadata Extraction: Retrieve document properties, creation dates, author information, and more
  • Structured Data Parsing: Extract tables, forms, and structured data using template-based parsing
  • Container Extraction: Extract attachments and embedded documents from container formats
  • Cross-Platform Support: Available for .NET, Java, and Python platforms
  • No External Dependencies: Parse documents without requiring Microsoft Office, Adobe Acrobat, or other third-party software

Supported File Formats

GroupDocs.Parser supports a wide range of document formats:

  • Word Processing: DOC, DOCX, DOT, DOTX, RTF, ODT, OTT
  • Spreadsheets: XLS, XLSX, XLSM, XLSB, CSV, ODS, OTS
  • Presentations: PPT, PPTX, PPS, PPSX, ODP, OTP
  • PDF Documents: PDF, PDF/A
  • Email: MSG, EML, EMLX, PST, OST
  • Archives: ZIP, TAR, RAR
  • Other Formats: OneNote, Markdown, EPUB, and more
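The format groups above can be turned into a simple lookup, for example to decide up front whether a file belongs to one of the listed categories. The following is a minimal sketch in plain Python. It does not use the GroupDocs.Parser API, and the names `FORMAT_CATEGORIES` and `format_category` are hypothetical helpers introduced only for illustration. Entries that the list names without a file extension (PDF/A, OneNote, Markdown, EPUB) are left out rather than guessed.

```python
from pathlib import PurePath

# Extensions copied from the "Supported File Formats" list above.
# Entries named there without an extension are intentionally omitted.
FORMAT_CATEGORIES = {
    "Word Processing": ["doc", "docx", "dot", "dotx", "rtf", "odt", "ott"],
    "Spreadsheets": ["xls", "xlsx", "xlsm", "xlsb", "csv", "ods", "ots"],
    "Presentations": ["ppt", "pptx", "pps", "ppsx", "odp", "otp"],
    "PDF Documents": ["pdf"],
    "Email": ["msg", "eml", "emlx", "pst", "ost"],
    "Archives": ["zip", "tar", "rar"],
}

# Invert the table once so each lookup is a single dictionary access.
EXTENSION_TO_CATEGORY = {
    ext: category
    for category, extensions in FORMAT_CATEGORIES.items()
    for ext in extensions
}


def format_category(filename):
    """Return the listed category for a file name, or None if it is not listed.

    Hypothetical helper: it only inspects the file extension and says nothing
    about whether the library can actually parse the file.
    """
    ext = PurePath(filename).suffix.lower().lstrip(".")
    return EXTENSION_TO_CATEGORY.get(ext)
```

A call such as `format_category("report.DOCX")` matches case-insensitively, and a file whose extension is not in the table yields `None`, which a caller can treat as "check the full format list".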

Repository Structure

Groupdocs.Parser-References/
├── english/          # English documentation
│   ├── net/         # .NET API references
│   ├── java/        # Java API references
│   └── python-net/  # Python via .NET API references
├── arabic/          # Arabic documentation
├── chinese/         # Chinese documentation
├── french/          # French documentation
├── german/          # German documentation
└── ...              # Other language directories
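The layout above suggests that a reference directory is addressed by a language folder followed by a platform folder. The sketch below builds such a path in Python. It is an illustration only: `reference_dir` is a hypothetical helper, the language set contains just the directories the tree shows explicitly (the rest are elided there), and it assumes that every language directory repeats the platform subdirectories shown under `english/`, which the tree does not confirm.

```python
from pathlib import PurePosixPath

REPO_ROOT = PurePosixPath("Groupdocs.Parser-References")

# Platform directories shown under english/ in the tree above.
PLATFORM_DIRS = {"net", "java", "python-net"}

# Only the language directories the tree names explicitly; others are elided.
LANGUAGE_DIRS = {"english", "arabic", "chinese", "french", "german"}


def reference_dir(language, platform):
    """Build the repository-relative path for one language/platform pair.

    Assumption: non-English language directories mirror the english/ layout.
    """
    if language not in LANGUAGE_DIRS:
        raise ValueError(f"unknown language directory: {language!r}")
    if platform not in PLATFORM_DIRS:
        raise ValueError(f"unknown platform directory: {platform!r}")
    return REPO_ROOT / language / platform
```

Validating both arguments against fixed sets keeps a typo from silently producing a path that does not exist in the repository.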

Documentation Structure

Each platform documentation includes:

  • Namespaces/Packages: Core API namespaces and packages
  • Classes: API classes and their members
  • Interfaces: API interfaces and contracts
  • Enumerations: API enumerations and constants
  • Exceptions: Exception classes for error handling

Common Use Cases

  • Document indexing and search engine integration
  • Content management systems (CMS)
  • Data migration and conversion projects
  • Document analysis and reporting
  • Automated document processing workflows
  • Text mining and content extraction
  • Metadata cataloging and organization

Contributing

This repository contains auto-generated API reference documentation. For issues, suggestions, or contributions related to the GroupDocs.Parser product itself, please visit our Free Support Forum.
