docuglean-ocr/node-ocr at main · cernis-intelligence/docuglean-ocr

Intelligent document processing using state-of-the-art AI models.

If you find Docuglean helpful, please ⭐ this repository to show your support!

What is Docuglean AI?

Docuglean is a unified SDK for intelligent document processing using state-of-the-art AI models. It provides multilingual, multimodal, plug-and-play APIs for document OCR, structured data extraction, annotation, classification, summarization, and translation. It also ships with built-in tools and supports many document types out of the box.

Features

🚀 Easy to Use: Simple, intuitive API with detailed documentation. Pass in a file and get Markdown back.
🔍 OCR Capabilities: Extract text from images and scanned documents
📊 Structured Data Extraction: Use Zod schemas for type-safe data extraction
📑 Document Classification: Intelligently split multi-section documents by category with automatic chunking
📄 Multimodal Support: Process PDFs and images with ease
🤖 Multiple AI Providers: Support for OpenAI, Mistral, and Google Gemini, with more coming soon
⚡ Batch Processing: Process multiple documents concurrently with automatic error handling
🔒 Type Safety: Full TypeScript support with comprehensive types
📝 Document Parsers: Local parsing for DOC, DOCX, PPTX, XLSX, XLS, ODS, ODT, ODP, CSV, TSV, and PDF files (no API required)

Coming Soon

📝 summarize(): TLDRs of long documents
🌐 translate(): Support for multilingual documents
🏷️ classify(): Document type classifier (receipt, ID, invoice, etc.)
🔍 search(query): LLM-powered search across documents
🤖 More Models and Providers: Integration with Meta's Llama, Together AI, OpenRouter, and more.
🌍 Multilingual: Support for multiple languages
🎯 Smart Classification: Automatic document type detection

Quick Start

Installation

npm i docuglean-ocr

Features in Detail

OCR Function - Pure OCR Processing

Extracts text from documents and images. Returns text content with basic metadata (varies by provider).

import { ocr, extract } from 'docuglean-ocr';

// Extract raw text from documents (supports URLs and local files)
const ocrResult = await ocr({
  filePath: 'https://arxiv.org/pdf/2302.12854',
  provider: 'openai',
  model: 'gpt-4o-mini',
  apiKey: 'your-api-key'
});

// Mistral OCR with local file
const mistralResult = await ocr({
  filePath: './document.pdf',
  provider: 'mistral',
  model: 'mistral-ocr-latest',
  apiKey: 'your-api-key'
});

// Local OCR (no API, PDFs only) using pdf2json
const localResult = await ocr({
  filePath: './document.pdf',
  provider: 'local',
  apiKey: 'local'
});
console.log('Local text:', (localResult as any).text.substring(0, 200) + '...');
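
Because `filePath` accepts both URLs and local paths, and the local provider is described above as PDF-only, it can help to check a config before spending an API call. The following is a minimal sketch of such a check. `sourceKind` and `preflight` are hypothetical helpers written for this example, not part of docuglean-ocr, and the library may validate inputs differently.

```typescript
// Hypothetical pre-flight check; not part of the docuglean-ocr API.
type SourceKind = 'url' | 'local';

// Treat anything starting with http:// or https:// as a remote document.
function sourceKind(filePath: string): SourceKind {
  return /^https?:\/\//i.test(filePath) ? 'url' : 'local';
}

// Collect every problem up front instead of stopping at the first one.
function preflight(config: { filePath: string; provider?: string; apiKey: string }): string[] {
  const problems: string[] = [];
  if (config.filePath.trim() === '') problems.push('filePath is empty');
  if (config.apiKey.trim() === '') problems.push('apiKey is empty');
  // Assumption taken from the comment above: local OCR handles PDFs only.
  if (config.provider === 'local' && !/\.pdf$/i.test(config.filePath)) {
    problems.push('local OCR expects a .pdf file');
  }
  return problems;
}
```

An empty array from `preflight` means the config passed these basic checks; it says nothing about whether the file exists or the key is valid.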

Extract Function - Document Analysis & Information Extraction

Structured extraction for analyzing document content and extracting specific information based on custom prompts.

import { extract } from 'docuglean-ocr';
import { z } from 'zod';

// Define schema for structured extraction
const ReceiptSchema = z.object({
  date: z.string(),
  total: z.number(),
  items: z.array(z.object({
    name: z.string(),
    price: z.number()
  }))
});

// Extract structured data from documents
const extractResult = await extract({
  filePath: './receipt.pdf',
  provider: 'openai',
  model: 'gpt-4o-mini',
  apiKey: 'your-api-key',
  responseFormat: ReceiptSchema,
  prompt: 'Extract receipt details including date, total, and items'
});

// You can now access fields directly:
console.log('Date:', extractResult.date);
console.log('Total:', extractResult.total);
console.log('First item name:', extractResult.items[0]?.name);
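
A schema guarantees the shape of the result, not that the model read the numbers correctly. One possible sanity check, sketched here, compares the extracted total with the sum of the item prices. The `Receipt` interface mirrors `ReceiptSchema` above; `totalsMatch` and its tolerance are assumptions for this example and not part of docuglean-ocr.

```typescript
// Plain TypeScript mirror of ReceiptSchema, so the sketch needs no dependencies.
interface Receipt {
  date: string;
  total: number;
  items: { name: string; price: number }[];
}

// Hypothetical helper: true when the item prices add up to the stated total,
// within a small tolerance that absorbs floating-point rounding.
function totalsMatch(receipt: Receipt, tolerance = 0.01): boolean {
  const sum = receipt.items.reduce((acc, item) => acc + item.price, 0);
  return Math.abs(sum - receipt.total) <= tolerance;
}
```

A mismatch is best treated as a flag for review rather than an error, since real receipts often include tax, tips, or discounts that are not listed as items.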

Document Classification - Split Documents by Category

Intelligently classify and split documents into categories based on content. Perfect for processing multi-section documents like medical records, legal contracts, or research papers.

import { classify } from 'docuglean-ocr';

// Classify a patient medical record
const result = await classify(
  './patient-record.pdf',
  [
    {
      name: 'Patient Intake Forms',
      description: 'Pages with patient registration, insurance information, and consent forms'
    },
    {
      name: 'Medical History',
      description: 'Pages containing past medical history, medications, allergies, and family history'
    },
    {
      name: 'Lab Results',
      description: 'Pages with laboratory test results, blood work, and diagnostic reports'
    },
    {
      name: 'Treatment Notes',
      description: 'Pages with doctor\'s notes, treatment plans, and prescriptions'
    }
  ],
  'your-api-key',
  'mistral' // or 'openai', 'gemini'
);

// Access the results
result.splits.forEach(split => {
  console.log(`\n${split.name}:`);
  console.log(`  Pages: ${split.pages}`);
  console.log(`  Confidence: ${split.conf}`);
});

// Example output:
// Patient Intake Forms:
//   Pages: 1,2,3,4
//   Confidence: high
// Medical History:
//   Pages: 5,6,7
//   Confidence: high
// Lab Results:
//   Pages: 8,9,10,11,12
//   Confidence: high
// Treatment Notes:
//   Pages: 13,14,15,16
//   Confidence: high

Key Features:

🎯 Automatic Chunking: Handles large documents (100+ pages) by automatically splitting into chunks
⚡ Concurrent Processing: Processes chunks in parallel for faster results
🎚️ Confidence Scores: Returns "high" or "low" confidence for each classification
📊 Page-Level Granularity: Get exact page numbers for each category
🔧 Configurable: Adjust chunk size and concurrency limits
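
For long documents, printing every page number gets noisy. The sketch below collapses a page list into compact ranges. It assumes `split.pages` is an array of page numbers, which the example output above suggests but does not confirm; `toRanges` is a hypothetical helper, not part of docuglean-ocr.

```typescript
// Hypothetical formatter: [1, 2, 3, 4, 8, 9] becomes "1-4, 8-9".
function toRanges(pages: number[]): string {
  // Drop duplicates and sort ascending without mutating the input.
  const sorted = pages
    .filter((page, index) => pages.indexOf(page) === index)
    .sort((a, b) => a - b);

  const parts: string[] = [];
  let i = 0;
  while (i < sorted.length) {
    // Extend j while the next page continues the current run.
    let j = i;
    while (j + 1 < sorted.length && sorted[j + 1] === sorted[j] + 1) j++;
    parts.push(i === j ? `${sorted[i]}` : `${sorted[i]}-${sorted[j]}`);
    i = j + 1;
  }
  return parts.join(', ');
}
```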

Advanced Options:

const result = await classify(
  './large-document.pdf',
  [...],
  'your-api-key',
  'openai',
  {
    model: 'gpt-4o-mini', // Optional: specify model
    chunkSize: 75, // Pages per chunk (default: 75)
    maxConcurrent: 5 // Max parallel requests (default: 5)
  }
);
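
To make the effect of `chunkSize` and `maxConcurrent` concrete, here is a minimal sketch of page chunking combined with a concurrency cap. It illustrates the general technique only and is not the library's implementation; `chunkPages` and `mapWithLimit` are hypothetical names, and only the defaults of 75 and 5 come from the options above.

```typescript
// Split pages 1..totalPages into inclusive [start, end] chunks.
function chunkPages(totalPages: number, chunkSize = 75): Array<[number, number]> {
  if (chunkSize < 1) throw new Error('chunkSize must be at least 1');
  const chunks: Array<[number, number]> = [];
  for (let start = 1; start <= totalPages; start += chunkSize) {
    chunks.push([start, Math.min(start + chunkSize - 1, totalPages)]);
  }
  return chunks;
}

// Run fn over items with at most `limit` calls in flight; results keep input order.
async function mapWithLimit<T, R>(
  items: T[],
  limit: number,
  fn: (item: T, index: number) => Promise<R>
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let next = 0;

  // Each worker repeatedly claims the next unprocessed index until none remain.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await fn(items[index], index);
    }
  }

  const workerCount = Math.max(1, Math.min(limit, items.length));
  const workers: Promise<void>[] = [];
  for (let w = 0; w < workerCount; w++) workers.push(worker());
  await Promise.all(workers);
  return results;
}
```

Under these assumptions, a 160-page document with the default chunk size yields three chunks: pages 1-75, 76-150, and 151-160. A larger `chunkSize` means fewer requests but more pages per request; a larger limit means more parallel requests and a higher chance of hitting provider rate limits.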

Batch Processing - Process Multiple Documents Concurrently

Process multiple documents concurrently. A failure in one file is reported in its result and does not stop the rest of the batch.

import { batchOcr, batchExtract } from 'docuglean-ocr';
import { z } from 'zod';

// Batch OCR - Process multiple files
const ocrResults = await batchOcr([
  {
    filePath: './invoice1.pdf',
    provider: 'openai',
    apiKey: 'your-api-key',
    model: 'gpt-4o-mini'
  },
  {
    filePath: './invoice2.pdf',
    provider: 'mistral',
    apiKey: 'your-api-key',
    model: 'pixtral-12b-2409'
  },
  {
    filePath: './receipt.pdf', // local OCR handles PDFs only
    provider: 'local',
    apiKey: 'not-needed'
  }
]);

// Handle results - errors don't stop processing
ocrResults.forEach((result, index) => {
  if (result.success) {
    console.log(`File ${index + 1} processed:`, result.result);
  } else {
    console.error(`File ${index + 1} failed:`, result.error);
  }
});

// Batch Extract - Extract structured data from multiple files
const InvoiceSchema = z.object({
  invoice_number: z.string(),
  vendor: z.string(),
  total: z.number()
});

const extractResults = await batchExtract([
  {
    filePath: './invoice1.pdf',
    provider: 'openai',
    apiKey: 'your-api-key',
    responseFormat: InvoiceSchema
  },
  {
    filePath: './invoice2.pdf',
    provider: 'openai',
    apiKey: 'your-api-key',
    responseFormat: InvoiceSchema
  }
]);

// Get successful extractions
const successful = extractResults.filter(r => r.success);
console.log(`Processed ${successful.length}/${extractResults.length} files`);

Key Features:

✅ Automatic error handling
✅ Results returned in same order as input
✅ Mix different providers in single batch
✅ Simple success/failure status for each file
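
The behaviour listed above (per-file status, input order preserved, one failure not affecting the others) can be sketched with plain promises. This is an illustration of the pattern, not the library's code: `settleInOrder` is a hypothetical name, and the `BatchResult` shape is inferred from the `success`, `result`, and `error` fields used in the example, with `error` assumed to be a string.

```typescript
// Result shape inferred from the batch example above; the real type may differ.
type BatchResult<T> =
  | { success: true; result: T }
  | { success: false; error: string };

// Run every task concurrently and convert rejections into failure entries,
// so one bad file never rejects the whole batch.
async function settleInOrder<T>(tasks: Array<() => Promise<T>>): Promise<BatchResult<T>[]> {
  return Promise.all(
    tasks.map(async (task): Promise<BatchResult<T>> => {
      try {
        return { success: true, result: await task() };
      } catch (err) {
        const message = err instanceof Error ? err.message : String(err);
        return { success: false, error: message };
      }
    })
  );
}
```

`Promise.all` returns results in the order of the input array regardless of which task finishes first, which is what keeps results aligned with the files that produced them.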

Provider Options

Currently supported providers and models:

OpenAI: gpt-4.1-mini, gpt-4.1, gpt-4o-mini, gpt-4o, o1-mini, o1, o3, o4-mini
Mistral: mistral-ocr-latest for OCR. All currently available models except for codestral-mamba are supported for structured outputs.
Google Gemini: gemini-2.5-flash, gemini-2.5-pro, gemini-1.5-flash, gemini-1.5-pro
Local: No API required - supports DOC, DOCX, PPTX, XLSX, XLS, ODS, ODT, ODP, CSV, TSV, and PDF files
More coming soon: Together AI, OpenRouter, Anthropic, etc.

Document Parsers (Local - No API Required)

Extract text from various document formats without any AI provider:

import {
  parseDocumentLocal, parsePdf, parseDocx, parsePptx, parseSpreadsheet,
  parseCsv, parseOdt, parseOdp, parseOds
} from 'docuglean-ocr';

// Parse any supported document format
const result = await parseDocumentLocal('./document.pdf');
console.log(result.text);

// Or use specific parsers
const pdf = await parsePdf('./document.pdf');           // PDF
const docx = await parseDocx('./document.docx');        // DOCX (also supports DOC)
const pptx = await parsePptx('./presentation.pptx');    // PowerPoint
const xlsx = await parseSpreadsheet('./data.xlsx');     // Excel (XLSX, XLS)
const csv = await parseCsv('./data.csv');               // CSV/TSV
const odt = await parseOdt('./document.odt');           // OpenDocument Text
const odp = await parseOdp('./presentation.odp');       // OpenDocument Presentation
const ods = await parseOds('./spreadsheet.ods');        // OpenDocument Spreadsheet

Supported Formats:

Word: DOC, DOCX (via mammoth)
PowerPoint: PPTX (via officeparser)
Excel: XLSX, XLS (via officeparser)
CSV/TSV: CSV, TSV (via d3-dsv)
OpenDocument: ODT, ODP, ODS (via officeparser)
PDF: PDF (via pdf2json, or convert to images via pdf-poppler)
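
When the file type is only known at runtime, one option is to look up the parser by extension. The sketch below maps extensions to the parser names shown above. It returns the name as a string so that it stays self-contained; `parserFor` and the lookup table are hypothetical, and `parseDocumentLocal` presumably performs a similar dispatch internally, though its actual logic is not shown here.

```typescript
// Extension-to-parser table built from the examples above.
const PARSER_BY_EXTENSION: Record<string, string> = {
  pdf: 'parsePdf',
  doc: 'parseDocx',
  docx: 'parseDocx',
  pptx: 'parsePptx',
  xlsx: 'parseSpreadsheet',
  xls: 'parseSpreadsheet',
  csv: 'parseCsv',
  tsv: 'parseCsv',
  odt: 'parseOdt',
  odp: 'parseOdp',
  ods: 'parseOds',
};

// Hypothetical helper: returns the parser name for a path, or undefined
// when the extension is missing or unsupported.
function parserFor(filePath: string): string | undefined {
  const match = /\.([a-z0-9]+)$/i.exec(filePath);
  if (!match) return undefined;
  const extension = match[1].toLowerCase();
  // Own-property check avoids matching inherited keys such as "constructor".
  return Object.prototype.hasOwnProperty.call(PARSER_BY_EXTENSION, extension)
    ? PARSER_BY_EXTENSION[extension]
    : undefined;
}
```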

Configuration

OCR Configuration

interface OCRConfig {
  filePath: string;
  provider?: 'openai' | 'mistral' | 'gemini';
  model?: string;
  apiKey: string;
  prompt?: string;
  options?: {
    mistral?: {
      includeImageBase64?: boolean;
    };
    openai?: {
      maxTokens?: number;
    };
    gemini?: {
      temperature?: number;
      topP?: number;
      topK?: number;
    };
  };
}

Extraction Configuration

interface ExtractConfig {
  filePath: string;
  apiKey: string;
  provider?: 'openai' | 'mistral' | 'gemini';
  model?: string;
  prompt?: string;
  responseFormat?: z.ZodType<any>;
  systemPrompt?: string;
}

Additional Examples

// These examples assume `extract`, `z`, and `ReceiptSchema` from the sections above.

// Structured extraction with Gemini
const geminiReceipt = await extract({
  filePath: './receipt.pdf',
  provider: 'gemini',
  apiKey: 'your-gemini-api-key',
  responseFormat: ReceiptSchema,
  prompt: 'Extract receipt information including date, total, and all items'
});

// Structured extraction with different schema
const DocumentSchema = z.object({
  title: z.string(),
  authors: z.array(z.string()),
  summary: z.string()
});

const documentInfo = await extract({
  filePath: './research-paper.pdf',
  provider: 'openai',
  apiKey: 'your-api-key',
  responseFormat: DocumentSchema,
  prompt: 'Extract document metadata and summary'
});

// Summarization via extract
const SummarySchema = z.object({
  title: z.string().optional(),
  summary: z.string(),
  keyPoints: z.array(z.string()),
});
const summary = await extract({
  filePath: './long-report.pdf',
  provider: 'openai',
  apiKey: 'your-api-key',
  responseFormat: SummarySchema,
  prompt: 'Provide a concise 3-sentence summary of the document.'
});
console.log('Summary:', summary.summary);

Note: you can also use extract with a targeted "search" prompt (e.g., "Find all occurrences of X and return matching passages") to perform semantic search within a document.

Check out our test folder for more comprehensive examples and use cases, including:

Receipt parsing
Document summarization
Image OCR
Structured data extraction
Custom schema validation

Stay Up to Date

⭐ Star this repo, and watch it to get notified about new releases and updates!

Contributing

We welcome contributions, including issues, questions, and pull requests. Please refer to the CONTRIBUTING.md file for information about how to get involved.

License

Apache 2.0 - see the LICENSE file for details.
