theolodocoder/resume-parser-lib

Resume Parser Library

A powerful, production-ready TypeScript library for extracting structured resume data from PDF files. Built from scratch with intelligent pattern matching and feature scoring algorithms.

Features

Extract Complete Resume Data

  • Personal information (name, email, phone, location, URL, summary)
  • Work experience (company, title, duration, descriptions)
  • Education (school, degree, field, graduation date, GPA)
  • Projects (name, date, descriptions)
  • Skills (featured and other skills)
  • Custom sections and certifications

Smart Recognition

  • Pattern-based matching for contact information
  • Intelligent section detection and parsing
  • Automatic subsection division (multiple jobs/schools)
  • Bullet point handling with 10+ bullet types
  • Formatting hints detection (bold, size)

Production Ready

  • Full TypeScript support with strict typing
  • Comprehensive error handling
  • Configurable parsing options
  • Debug mode for troubleshooting
  • Timeout protection

Fully Customizable

  • Modular architecture - use what you need
  • Extend feature sets for custom attributes
  • Adjust scoring weights for your use cases
  • Create custom section extractors

Installation

npm install pdfjs-dist @yourusername/resume-parser

Quick Start

import { ResumeParser, initializePdfJs } from '@yourusername/resume-parser';
import * as pdfjsLib from 'pdfjs-dist';
import pdfjsWorker from 'pdfjs-dist/build/pdf.worker.mjs';

// Initialize PDF.js (required once)
initializePdfJs(pdfjsLib, pdfjsWorker);

// Create parser instance
const parser = new ResumeParser({ debug: true });

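// Parse a resume. `file` is a File or Blob, e.g. from an <input type="file">;
// alternatively, pass a file path instead of an object URL.
const fileUrl = URL.createObjectURL(file);
const result = await parser.parse(fileUrl);
URL.revokeObjectURL(fileUrl); // Release the object URL once parsing is done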

if (result.success) {
  console.log(result.data);
  // {
  //   profile: { name, email, phone, ... },
  //   experience: [...],
  //   education: [...],
  //   projects: [...],
  //   skills: { featured, other },
  //   ...
  // }
} else {
  console.error(result.error);
}

React Integration

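import { useEffect, useState } from 'react';
import { ResumeParser, initializePdfJs } from '@yourusername/resume-parser';
import * as pdfjsLib from 'pdfjs-dist';
import pdfjsWorker from 'pdfjs-dist/build/pdf.worker.mjs';

// Derive the result type from parse() itself, so no extra type export is assumed
type ParseOutcome = Awaited<ReturnType<ResumeParser['parse']>>;

function ResumeParsing() {
  const [result, setResult] = useState<ParseOutcome | null>(null);
  const [loading, setLoading] = useState(false);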

  // Initialize once
  useEffect(() => {
    initializePdfJs(pdfjsLib, pdfjsWorker);
  }, []);

  async function handleFileUpload(file: File) {
    setLoading(true);
    const parser = new ResumeParser();
    const fileUrl = URL.createObjectURL(file);

    try {
      const parseResult = await parser.parse(fileUrl);
      setResult(parseResult);
    } finally {
      setLoading(false);
      URL.revokeObjectURL(fileUrl);
    }
  }

  return (
    <div>
      <input
        type="file"
        accept=".pdf"
        onChange={(e) => {
          // `files` is null or empty when the user cancels the dialog
          const file = e.target.files?.[0];
          if (file) handleFileUpload(file);
        }}
      />
      {loading && <p>Parsing...</p>}
      {result?.success && <pre>{JSON.stringify(result.data, null, 2)}</pre>}
      {result?.error && <p>Error: {result.error}</p>}
    </div>
  );
}

Advanced Usage

Custom Feature Scoring

import {
  TextLine,
  findBestCandidate,
  Feature
} from '@yourusername/resume-parser';

const customFeatures: Feature[] = [
  {
    matcher: (line: TextLine) => line.text.includes('CEO'),
    weight: 4,
  },
  {
    matcher: (line: TextLine) => line.text.length > 30,
    weight: -2,
  },
];

// `lines` is a TextLine[], e.g. the output of groupElementsIntoLines()
const result = findBestCandidate(lines, customFeatures);

Extract Specific Sections

import {
  groupElementsIntoLines,
  groupLinesIntoSections,
  extractProfileSection,
  extractExperienceSection
} from '@yourusername/resume-parser';

// `elements` is the PdfTextElement[] extracted from the PDF
const lines = groupElementsIntoLines(elements);
const sections = groupLinesIntoSections(lines);

const experienceSection = sections.find(s =>
  s.name.includes('EXPERIENCE')
);

if (experienceSection) {
  const jobs = extractExperienceSection(experienceSection);
  console.log(jobs);
}

API Reference

ResumeParser

Constructor

new ResumeParser(options?: {
  debug?: boolean;      // Enable console logging
  timeout?: number;     // Parse timeout in ms (default: 30000)
})

Methods

async parse(fileUrl: string): Promise<ParseResult>

interface ParseResult {
  success: boolean;
  data?: ResumeData;
  error?: string;
  duration?: number;
}
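
Because `success` is a plain boolean and `data` and `error` are both optional, callers have to check `data` themselves before using it. The sketch below shows one way to do that. `ParseResultLike` and `unwrapResult` are hypothetical names introduced for illustration and are not part of the library; the interface is a local, generic mirror of the documented shape.

```typescript
// Local mirror of the documented ParseResult shape, made generic so the
// sketch does not depend on the library's ResumeData type.
interface ParseResultLike<T> {
  success: boolean;
  data?: T;
  error?: string;
  duration?: number;
}

// Hypothetical helper: returns the data, or throws with the reported error.
function unwrapResult<T>(result: ParseResultLike<T>): T {
  // `success` is not a discriminant, so `data` must be checked explicitly
  // before TypeScript treats it as defined.
  if (result.success && result.data !== undefined) {
    return result.data;
  }
  throw new Error(result.error ?? 'Parsing failed without an error message');
}

const unwrapped = unwrapResult({
  success: true,
  data: { fullName: 'Jane Doe' },
  duration: 120,
});
console.log(unwrapped.fullName);
```

A discriminated union would let the compiler do this narrowing automatically, but the wrapper above works with the interface as documented.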

Utility Functions

String Helpers

cleanText(text: string): string
splitIntoWords(text: string): string[]
isAllUpperCase(text: string): boolean
hasLetters(text: string): boolean
countWords(text: string): number

Pattern Matching

matchEmail(text: string): string | null
matchPhone(text: string): string | null
matchLocation(text: string): { city: string; state: string } | null
matchYear(text: string): string | null
matchDegree(text: string): string | null
hasJobTitleKeyword(text: string): boolean
isBulletPoint(text: string): boolean
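
To make the `string | null` contract of these matchers concrete, here is a minimal sketch of an email matcher with the same signature. The regular expression and the name `matchEmailSketch` are assumptions for illustration; the library's actual pattern is not documented here and may differ.

```typescript
// Assumed pattern: deliberately simple, not a full RFC 5322 validator.
const EMAIL_PATTERN = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/;

// Hypothetical stand-in with the same shape as matchEmail():
// returns the first email-like substring, or null if none is found.
function matchEmailSketch(text: string): string | null {
  const match = text.match(EMAIL_PATTERN);
  return match ? match[0] : null;
}

console.log(matchEmailSketch('Jane Doe | jane.doe@example.com | 555-0100'));
console.log(matchEmailSketch('No contact details on this line'));
```

Returning `null` rather than an empty string lets callers distinguish "no match" from a matched but empty value.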

Text Processing

groupElementsIntoLines(elements: PdfTextElement[]): TextLine[]
groupLinesIntoSections(lines: TextLine[]): ResumeSection[]
detectSubsections(lines: TextLine[]): TextLine[][]
isSectionHeader(line: TextLine, lineIndex: number): { isSectionHeader: boolean; sectionName?: string }

Feature Scoring

findBestCandidate(lines, features, options?): ScoredCandidate
findCandidatesAboveThreshold(lines, features, threshold): ScoredCandidate[]
scoreLineByFeatures(line, features, lineIndex?): ScoredCandidate

// Feature matcher helpers
hasAnyOf(keywords, caseSensitive?)
startsWithAny(keywords, caseSensitive?)
matchesRegex(pattern, extractMatch?)
hasFormatting(type: 'bold' | 'large')
onlyLettersAndSpaces()
minWordCount(count)
isAllUpperCase()
isShort(maxWords?)
maxLineLength(chars)

Data Structure

ResumeData

{
  profile: Profile;
  experience: WorkExperience[];
  education: Education[];
  projects: Project[];
  skills: { featured: Skill[]; other: string[] };
  certifications?: string[];
  languages?: string[];
  custom: { title: string; items: string[] }[];
}

Profile

{
  name: string;
  email: string;
  phone: string;
  location: string;
  url: string;
  summary: string;
}

WorkExperience

{
  company: string;
  title: string;
  duration: string;
  description: string[];
}

Education

{
  school: string;
  degree: string;
  field?: string;
  graduationDate: string;
  gpa?: string;
  details: string[];
}

Supported Resume Formats

Standard Formats

  • Single-column resumes
  • English language content
  • Bullet point descriptions
  • Common section names (Experience, Education, Skills, etc.)

⚠️ Limitations

  • Multi-column layouts (parsed, but text may come out in the wrong order)
  • Scanned/image PDFs (run OCR first; the parser needs a text layer)
  • Non-English resumes (use patterns appropriate to the language)
  • Complex table layouts

Performance

  • Typical resume: 50-200ms parse time
  • Large resume (3+ pages): 200-500ms
  • Timeout: 30 seconds (configurable)
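
One common way to enforce a configurable timeout like the one listed above is to race the work against a timer. The sketch below is an illustration of that technique, not the library's implementation; `withTimeout` is a hypothetical helper.

```typescript
// Hypothetical helper: rejects if `work` has not settled within `ms` milliseconds.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Timed out after ${ms} ms`)), ms);
  });
  // Whichever promise settles first wins; the timer is cleared either way.
  return Promise.race([work, timeout]).finally(() => clearTimeout(timer));
}

// Stand-in for a parse call that finishes after 10 ms.
const fakeParse = new Promise<string>((resolve) => {
  setTimeout(() => resolve('parsed'), 10);
});

withTimeout(fakeParse, 1000).then((value) => console.log(value));
```

Note that racing only stops waiting for the result; it does not cancel the underlying work, which keeps running until it finishes on its own.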

Testing

npm test
npm run test:watch
npm run coverage

Building

npm run build      # Build distribution files
npm run dev        # Watch mode
npm run lint       # Run ESLint
npm run format     # Format with Prettier

Architecture

src/
├── types/              # Type definitions
├── pdf/                # PDF text extraction
├── processing/         # Text grouping and section detection
├── features/           # Feature scoring engine
├── extraction/         # Section-specific extractors
├── utils/              # Helper functions
└── index.ts            # Main export

Key Algorithms

Feature Scoring System

Scores resume candidates using weighted feature matching:

Score = Σ(weight × matches)
Best candidate = max(scores)
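
The formula above can be sketched in a few lines. This is an illustration under stated assumptions, not the library's code: `Line`, `ScoringFeature`, `scoreLine`, and `pickBest` are hypothetical names, and each matcher is assumed to return a boolean, so "matches" is either 0 or 1.

```typescript
// Hypothetical stand-ins for the library's TextLine and Feature types.
interface Line {
  text: string;
}

interface ScoringFeature {
  matcher: (line: Line) => boolean;
  weight: number;
}

// Score = sum of the weights of every feature whose matcher fires.
function scoreLine(line: Line, features: ScoringFeature[]): number {
  return features.reduce(
    (total, feature) => total + (feature.matcher(line) ? feature.weight : 0),
    0,
  );
}

// Best candidate = the line with the highest score (the first one wins ties).
function pickBest(lines: Line[], features: ScoringFeature[]): Line | undefined {
  let best: Line | undefined;
  let bestScore = -Infinity;
  for (const line of lines) {
    const score = scoreLine(line, features);
    if (score > bestScore) {
      best = line;
      bestScore = score;
    }
  }
  return best;
}

const titleFeatures: ScoringFeature[] = [
  { matcher: (line) => line.text.includes('CEO'), weight: 4 },
  { matcher: (line) => line.text.length > 30, weight: -2 },
];

const sampleLines: Line[] = [
  { text: 'Jane Doe' },
  { text: 'CEO' },
  { text: 'CEO of a company with a very long descriptive name' },
];

console.log(pickBest(sampleLines, titleFeatures)?.text);
```

Negative weights act as penalties: the long line still contains "CEO", but its length costs it two points, so the short line ranks higher.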

Section Detection

  • Primary: Bold + All Uppercase text
  • Fallback: Keyword matching with constraints
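
The two rules above might look like this in code. It is a sketch only: `HeaderLine`, `SECTION_KEYWORDS`, and `looksLikeSectionHeader` are hypothetical, the keyword list is invented for illustration, and the "constraint" on the fallback is assumed to be a maximum of three words, which the README does not specify.

```typescript
// Hypothetical line shape: text plus a bold flag from the PDF font data.
interface HeaderLine {
  text: string;
  bold: boolean;
}

// Assumed keyword list; the library's actual list is not documented here.
const SECTION_KEYWORDS = ['experience', 'education', 'skills', 'projects'];

// True when the text contains at least one letter and no lowercase letters.
function isUpperCaseText(text: string): boolean {
  return /[A-Za-z]/.test(text) && text === text.toUpperCase();
}

function looksLikeSectionHeader(line: HeaderLine): boolean {
  const text = line.text.trim();

  // Primary rule: bold and all uppercase.
  if (line.bold && isUpperCaseText(text)) {
    return true;
  }

  // Fallback rule: contains a known keyword and (by assumption) is short,
  // so that body sentences mentioning "experience" are not mistaken for headers.
  const wordCount = text.split(/\s+/).filter(Boolean).length;
  const lower = text.toLowerCase();
  const hasKeyword = SECTION_KEYWORDS.some((keyword) => lower.includes(keyword));
  return hasKeyword && wordCount <= 3;
}

console.log(looksLikeSectionHeader({ text: 'WORK EXPERIENCE', bold: true }));
console.log(looksLikeSectionHeader({ text: 'Five years of experience in sales', bold: false }));
```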

Subsection Division

  • Primary: Line gap > 1.4× typical gap
  • Fallback: Bold text after non-bold text
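
The primary rule above can be sketched as follows. The README does not say how the "typical gap" is computed, so this sketch assumes the median gap between consecutive lines; `PositionedLine`, `median`, and `splitByGap` are hypothetical names, and only the gap-based rule is shown (not the bold-text fallback).

```typescript
// Hypothetical line shape: only differences between y values matter here.
interface PositionedLine {
  text: string;
  y: number;
}

// Median of a list of numbers; returns 0 for an empty list.
function median(values: number[]): number {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  const upper = sorted[mid] ?? 0;
  const lower = sorted.length % 2 === 0 ? (sorted[mid - 1] ?? upper) : upper;
  return (lower + upper) / 2;
}

// Start a new subsection whenever the gap to the previous line exceeds
// `factor` times the typical (here: median) gap.
function splitByGap(lines: PositionedLine[], factor = 1.4): PositionedLine[][] {
  const gaps: number[] = [];
  let previous: PositionedLine | undefined;
  for (const line of lines) {
    if (previous) gaps.push(Math.abs(line.y - previous.y));
    previous = line;
  }
  const typicalGap = median(gaps);

  const groups: PositionedLine[][] = [];
  let current: PositionedLine[] = [];
  previous = undefined;
  for (const line of lines) {
    if (previous && Math.abs(line.y - previous.y) > factor * typicalGap) {
      groups.push(current);
      current = [];
    }
    current.push(line);
    previous = line;
  }
  if (current.length > 0) groups.push(current);
  return groups;
}

const jobLines: PositionedLine[] = [
  { text: 'Acme Corp', y: 0 },
  { text: 'Engineer', y: 12 },
  { text: 'Built things', y: 24 },
  { text: 'Globex', y: 48 }, // gap of 24 > 1.4 × 12, so a new subsection starts
  { text: 'Developer', y: 60 },
  { text: 'Shipped things', y: 72 },
];

console.log(splitByGap(jobLines).map((group) => group.map((line) => line.text)));
```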

Text Grouping

  • Groups elements into lines using Y-coordinate clustering
  • Merges adjacent elements within character width
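
A minimal sketch of the two steps above, under stated assumptions: `TextElement`, `groupIntoLines`, and `joinRow` are hypothetical, the Y tolerance of 2 units is invented, and elements farther apart than one average character width are assumed to be separated by a space. The library's own thresholds may differ.

```typescript
// Hypothetical element shape, loosely modeled on PDF text items.
interface TextElement {
  text: string;
  x: number; // left edge
  y: number; // vertical position
  width: number; // rendered width of `text`
}

// Join one row left to right. Elements closer than one average character
// width are merged directly; wider gaps become a single space.
function joinRow(row: TextElement[]): string {
  let text = '';
  let previous: TextElement | undefined;
  for (const element of row) {
    if (previous) {
      const charWidth = previous.width / Math.max(previous.text.length, 1);
      const gap = element.x - (previous.x + previous.width);
      text += gap <= charWidth ? '' : ' ';
    }
    text += element.text;
    previous = element;
  }
  return text;
}

// Cluster elements whose y values lie within `yTolerance` of the first
// element in the current row, then sort each row by x and join it.
function groupIntoLines(elements: TextElement[], yTolerance = 2): string[] {
  const rows: TextElement[][] = [];
  for (const element of [...elements].sort((a, b) => a.y - b.y)) {
    const row = rows[rows.length - 1];
    const rowY = row?.[0]?.y;
    if (row && rowY !== undefined && Math.abs(element.y - rowY) <= yTolerance) {
      row.push(element);
    } else {
      rows.push([element]);
    }
  }
  return rows.map((row) => joinRow(row.sort((a, b) => a.x - b.x)));
}

const sampleElements: TextElement[] = [
  { text: 'Doe', x: 60, y: 10.5, width: 30 },
  { text: 'Jane', x: 0, y: 10, width: 40 },
  { text: 'Soft', x: 0, y: 30, width: 40 },
  { text: 'ware', x: 41, y: 30, width: 40 },
];

console.log(groupIntoLines(sampleElements));
```

Clustering by Y before sorting by X matters: elements on the same visual line often have slightly different baselines, so a single combined sort could interleave them incorrectly.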

Tech Stack

  • Language: TypeScript 5.0+
  • PDF Processing: PDF.js (Mozilla)
  • Testing: Jest
  • Build: TypeScript Compiler + ESM conversion
  • Code Quality: ESLint + Prettier

Contributing

Contributions welcome! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Add tests for new functionality
  4. Submit a pull request

License

MIT - See LICENSE file for details

Support

For issues, questions, or suggestions:

  • GitHub Issues: [Your repo]/issues
  • Documentation: [Your repo]/wiki

Changelog

v1.0.0

  • Initial release
  • Complete resume extraction pipeline
  • Support for standard resume sections
  • Comprehensive testing
