@groupdocs-parser

GroupDocs.Parser Product Family

Transform your document reading and analyzing process to extract raw or formatted text from different file formats

Document Parsing API & SDKs

Product Page | Docs | Demos | API | Blog | Search | Support | Temp License

GroupDocs.Parser is a document parsing and data extraction API. Extract text, metadata, barcodes, structured fields, images, tables, and document entities from PDFs, Office files, emails, eBooks, and archives—built for search indexing, compliance, data capture, and content ingestion workflows.

📰 Latest Parser News & Updates

  • See the latest release notes on NuGet and Maven Central for parser engine improvements, faster template-based extraction, and better table detection.
  • Updated sample apps show invoice data extraction, email parsing, and PDF text extraction scenarios.
  • The documentation includes new how-tos on template-based parsing and container file processing.

📂 Supported Platforms & Repository Groups

🌐 .NET Document Parsing (C#, ASP.NET, WinForms)

High-performance APIs for document parsing on .NET Framework and .NET Core.

  • GroupDocs.Parser-for-.NET: Core C# API for text, metadata, tables, and template-based extraction.
  • Samples & Demos: Explore runnable examples in the repository to parse PDFs, DOCX, XLSX, PPTX, MSG/EML, EPUB, ZIP, and more.
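// Quick .NET Parsing Example
using System;
using System.IO;
using GroupDocs.Parser;

using (Parser parser = new Parser("invoice.pdf"))
{
    // GetText() returns null when text extraction isn't supported for the format
    using (TextReader reader = parser.GetText())
    {
        Console.WriteLine(reader == null
            ? "Text extraction isn't supported for this document."
            : reader.ReadToEnd());
    }
}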

☕ Java Document Parsing (Maven, Spring)

Native Java library for text, metadata, and structured data extraction.

// Quick Java Parsing Example
import com.groupdocs.parser.Parser;
import com.groupdocs.parser.data.TextReader;

try (Parser parser = new Parser("contract.docx")) {
    // getText() returns a GroupDocs TextReader (not java.io.Reader),
    // or null when text extraction isn't supported for the format
    try (TextReader reader = parser.getText()) {
        System.out.println(reader == null
                ? "Text extraction isn't supported for this document."
                : reader.readToEnd());
    }
}

🐍 Python Document Parsing (Python via .NET)

Cross-platform Python bindings for text, metadata, and structured data extraction.

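# Quick Python Parsing Example
from groupdocs.parser import Parser

with Parser("sample.pdf") as parser:
    # get_text() returns a reader object (or None if unsupported), not a string
    reader = parser.get_text()
    if reader is not None:
        print(reader.read_to_end())
    else:
        print("Text extraction isn't supported for this document.")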

🧠 Business Use-Cases

  • Invoice & receipt data extraction: pull totals, dates, vendors, and line items via templates.
  • Email & attachment parsing: extract headers, bodies, attachments, and metadata from MSG/EML.
  • Contract analysis: capture clauses, signatures, and key fields from DOCX/PDF.
  • PDF table extraction: pull line items and financial tables from PDFs (see table extraction sample).
  • Content migration: normalize mixed file types into structured outputs.
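As a rough illustration of the invoice use-case: once plain text has been extracted from a document, simple fields can be pulled out with ordinary pattern matching. The sketch below uses only Python's standard library and is not the library's template engine. The regular expressions, the `extract_invoice_fields` helper, and the sample text are all hypothetical and would need adapting to real invoice layouts.

```python
import re
from datetime import date
from decimal import Decimal

# Hypothetical patterns for illustration; real invoices vary in wording and layout.
# \b keeps "Total" from matching inside "Subtotal".
TOTAL_RE = re.compile(r"\bTotal\s*:?\s*\$?\s*([\d,]+\.\d{2})", re.IGNORECASE)
DATE_RE = re.compile(r"\bDate\s*:?\s*(\d{4})-(\d{2})-(\d{2})", re.IGNORECASE)


def extract_invoice_fields(text: str) -> dict:
    """Return the first total and ISO-style date found in text (None if absent)."""
    fields = {"total": None, "date": None}

    total_match = TOTAL_RE.search(text)
    if total_match:
        # Strip thousands separators before converting to an exact decimal.
        fields["total"] = Decimal(total_match.group(1).replace(",", ""))

    date_match = DATE_RE.search(text)
    if date_match:
        year, month, day = (int(part) for part in date_match.groups())
        fields["date"] = date(year, month, day)

    return fields


# Made-up text standing in for the output of a text-extraction step.
sample = (
    "ACME Corp\n"
    "Invoice Date: 2024-03-15\n"
    "Subtotal: $1,200.00\n"
    "Total: $1,234.50\n"
)
fields = extract_invoice_fields(sample)
print(fields)
```

Regex matching like this is brittle when layouts change, which is the gap that template-based extraction is meant to address.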

✅ API Key Features & Benefits

  • High-fidelity text extraction for PDF, DOC/DOCX, XLS/XLSX, PPT/PPTX, HTML, RTF, TXT, EPUB.
  • Template-based extraction to capture labeled fields, tables, and repeating blocks reliably.
  • Table recognition with cell-by-cell extraction for spreadsheets and tabular PDFs.
  • Metadata parsing (built-in and custom) for compliance and governance.
  • Container support for ZIP, OST/PST, MSG/EML, and attachments within archived files.
  • Image & embedded object extraction for logos, signatures, and inline graphics.
  • Page-level & area-limited parsing to target specific regions for faster processing.
  • Performance & scaling tuned for server-side, multi-document workloads.
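For multi-document workloads, one common pattern is to fan extraction out over a worker pool and collect the results per file. The sketch below is a generic, standard-library illustration of that pattern, not part of the GroupDocs API. `extract_many` and `fake_extract` are hypothetical names, and the extractor is passed in as a plain callable so the example runs without the library installed.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, Optional


def extract_many(
    paths: Iterable[str],
    extract: Callable[[str], Optional[str]],
    max_workers: int = 4,
) -> Dict[str, Optional[str]]:
    """Run extract over every path; map each path to its text, or None on failure."""

    def safe(path: str) -> Optional[str]:
        # One failing document should not abort the whole batch.
        try:
            return extract(path)
        except Exception:
            return None

    path_list = list(paths)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as path_list.
        return dict(zip(path_list, pool.map(safe, path_list)))


def fake_extract(path: str) -> str:
    """Stand-in extractor (assumption): pretends .bin files are unsupported."""
    if path.endswith(".bin"):
        raise ValueError("unsupported format")
    return f"text of {path}"


results = extract_many(["a.pdf", "b.docx", "c.bin"], fake_extract)
print(results)
```

In practice `extract` could wrap a per-file parser call such as the quick examples shown earlier. Whether a parser instance can be shared between threads is not stated here, so this sketch assumes each call opens and closes its own.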

🆘 Technical Support & Resources

🏷️ Tags

groupdocs-parser document-parser pdf-parser text-extraction data-extraction metadata-parser email-parser invoice-parsing table-extraction template-based-parsing content-ingestion document-ai search-indexing enterprise-parsing

Pinned

  1. GroupDocs.Parser-for-.NET (Public)

    GroupDocs.Parser for .NET examples, plugins and showcase projects

  2. GroupDocs.Parser-for-Java (Public)

    GroupDocs.Parser for Java examples, plugins and showcase projects
