Resume Parser Library

A TypeScript library for extracting structured resume data from PDF files. It is built from scratch and combines pattern matching with weighted feature scoring.

Features

✅ Extract Complete Resume Data

Personal information (name, email, phone, location, URL, summary)
Work experience (company, title, duration, descriptions)
Education (school, degree, field, graduation date, GPA)
Projects (name, date, descriptions)
Skills (featured and other skills)
Custom sections and certifications

✅ Smart Recognition

Pattern-based matching for contact information
Intelligent section detection and parsing
Automatic subsection division (multiple jobs/schools)
Bullet point handling with 10+ bullet types
Formatting hints detection (bold, size)

✅ Production Ready

Full TypeScript support with strict typing
Comprehensive error handling
Configurable parsing options
Debug mode for troubleshooting
Timeout protection

✅ Fully Customizable

Modular architecture - use what you need
Extend feature sets for custom attributes
Adjust scoring weights for your use cases
Create custom section extractors

Installation

npm install pdfjs-dist @yourusername/resume-parser

Quick Start

import { ResumeParser, initializePdfJs } from '@yourusername/resume-parser';
import * as pdfjsLib from 'pdfjs-dist';
import pdfjsWorker from 'pdfjs-dist/build/pdf.worker.mjs';

// Initialize PDF.js (required once)
initializePdfJs(pdfjsLib, pdfjsWorker);

// Create parser instance
const parser = new ResumeParser({ debug: true });

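// Parse a resume. `file` is a File or Blob (e.g. from an <input type="file">);
// a URL or file path string can be passed to parse() directly instead.
const fileUrl = URL.createObjectURL(file);
const result = await parser.parse(fileUrl);
URL.revokeObjectURL(fileUrl); // Release the object URL once parsing is done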

if (result.success) {
  console.log(result.data);
  // {
  //   profile: { name, email, phone, ... },
  //   experience: [...],
  //   education: [...],
  //   projects: [...],
  //   skills: { featured, other },
  //   ...
  // }
} else {
  console.error(result.error);
}

React Integration

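import { useEffect, useState } from 'react';
import { ResumeParser, initializePdfJs } from '@yourusername/resume-parser';
import * as pdfjsLib from 'pdfjs-dist';
import pdfjsWorker from 'pdfjs-dist/build/pdf.worker.mjs';

// Result type derived from parse() so the state is typed under strict mode
type ParseOutcome = Awaited<ReturnType<ResumeParser['parse']>>;

function ResumeParsing() {
  const [result, setResult] = useState<ParseOutcome | null>(null);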
  const [loading, setLoading] = useState(false);

  // Initialize once
  useEffect(() => {
    initializePdfJs(pdfjsLib, pdfjsWorker);
  }, []);

  async function handleFileUpload(file: File) {
    setLoading(true);
    const parser = new ResumeParser();
    const fileUrl = URL.createObjectURL(file);

    try {
      const parseResult = await parser.parse(fileUrl);
      setResult(parseResult);
    } finally {
      setLoading(false);
      URL.revokeObjectURL(fileUrl);
    }
  }

  return (
    <div>
      <input
        type="file"
        accept=".pdf"
        onChange={(e) => {
          // `files` is null or empty when the user cancels the dialog
          const file = e.target.files?.[0];
          if (file) handleFileUpload(file);
        }}
      />
      {loading && <p>Parsing...</p>}
      {result?.success && <pre>{JSON.stringify(result.data, null, 2)}</pre>}
      {result?.error && <p>Error: {result.error}</p>}
    </div>
  );
}

Advanced Usage

Custom Feature Scoring

import {
  TextLine,
  findBestCandidate,
  Feature
} from '@yourusername/resume-parser';

const customFeatures: Feature[] = [
  {
    matcher: (line: TextLine) => line.text.includes('CEO'),
    weight: 4,
  },
  {
    matcher: (line: TextLine) => line.text.length > 30,
    weight: -2,
  },
];

// `lines` is a TextLine[], e.g. the output of groupElementsIntoLines(elements)
const result = findBestCandidate(lines, customFeatures);

Extract Specific Sections

import {
  groupElementsIntoLines,
  groupLinesIntoSections,
  extractProfileSection,
  extractExperienceSection
} from '@yourusername/resume-parser';

// `elements` is the PdfTextElement[] extracted from the PDF
const lines = groupElementsIntoLines(elements);
const sections = groupLinesIntoSections(lines);

const experienceSection = sections.find(s =>
  s.name.includes('EXPERIENCE')
);

if (experienceSection) {
  const jobs = extractExperienceSection(experienceSection);
  console.log(jobs);
}

API Reference

ResumeParser

Constructor

new ResumeParser(options?: {
  debug?: boolean;      // Enable console logging
  timeout?: number;     // Parse timeout in ms (default: 30000)
})
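
The timeout option can be pictured as a race between the parse and a timer. The sketch below is only an illustration of that idea, not the library's implementation; `withTimeout` is a hypothetical helper name.

```typescript
// Hypothetical helper (not part of the library): rejects if `work`
// has not settled within `ms` milliseconds.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`Timed out after ${ms} ms`)),
      ms,
    );
    work.then(
      (value) => {
        clearTimeout(timer); // Stop the timer so it cannot fire later
        resolve(value);
      },
      (err) => {
        clearTimeout(timer);
        reject(err);
      },
    );
  });
}
```

A caller could wrap any promise this way, for example `withTimeout(parser.parse(fileUrl), 30000)`, assuming the built-in timeout is not already sufficient.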

Methods

async parse(fileUrl: string): Promise<ParseResult>

interface ParseResult {
  success: boolean;
  data?: ResumeData;
  error?: string;
  duration?: number;
}

Utility Functions

String Helpers

cleanText(text: string): string
splitIntoWords(text: string): string[]
isAllUpperCase(text: string): boolean
hasLetters(text: string): boolean
countWords(text: string): number

Pattern Matching

matchEmail(text: string): string | null
matchPhone(text: string): string | null
matchLocation(text: string): { city: string; state: string } | null
matchYear(text: string): string | null
matchDegree(text: string): string | null
hasJobTitleKeyword(text: string): boolean
isBulletPoint(text: string): boolean
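
To make the return shapes above concrete, here is a minimal sketch of three such matchers. The regular expressions and the `...Sketch` function names are assumptions chosen for illustration; the library's actual patterns may differ.

```typescript
// Illustrative patterns only, not the library's own.
const EMAIL_RE = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/;
const YEAR_RE = /\b(?:19|20)\d{2}\b/;
const BULLET_RE = /^\s*[•●○◦▪■·*\-–]\s+/;

// Returns the first email-like substring, or null when none is found.
function matchEmailSketch(text: string): string | null {
  const m = EMAIL_RE.exec(text);
  return m ? m[0] : null;
}

// Returns the first four-digit year between 1900 and 2099, or null.
function matchYearSketch(text: string): string | null {
  const m = YEAR_RE.exec(text);
  return m ? m[0] : null;
}

// True when the line starts with a bullet glyph followed by whitespace.
function isBulletPointSketch(text: string): boolean {
  return BULLET_RE.test(text);
}
```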

Text Processing

groupElementsIntoLines(elements: PdfTextElement[]): TextLine[]
groupLinesIntoSections(lines: TextLine[]): ResumeSection[]
detectSubsections(lines: TextLine[]): TextLine[][]
isSectionHeader(line: TextLine, lineIndex: number): { isSectionHeader: boolean; sectionName?: string }

Feature Scoring

findBestCandidate(lines, features, options?): ScoredCandidate
findCandidatesAboveThreshold(lines, features, threshold): ScoredCandidate[]
scoreLineByFeatures(line, features, lineIndex?): ScoredCandidate

// Feature matcher helpers
hasAnyOf(keywords, caseSensitive?)
startsWithAny(keywords, caseSensitive?)
matchesRegex(pattern, extractMatch?)
hasFormatting(type: 'bold' | 'large')
onlyLettersAndSpaces()
minWordCount(count)
isAllUpperCase()
isShort(maxWords?)
maxLineLength(chars)
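
The helpers above read like factories that return a matcher for use in a feature list. The sketch below shows one plausible shape for two of them. `LineLike`, `Matcher`, and the `...Sketch` names are hypothetical stand-ins; the library's real return types are not documented here.

```typescript
// Hypothetical stand-ins for the library's TextLine and matcher types.
interface LineLike {
  text: string;
}
type Matcher = (line: LineLike) => boolean;

// Matches when the line contains any of the keywords.
function hasAnyOfSketch(keywords: string[], caseSensitive = false): Matcher {
  const needles = caseSensitive ? keywords : keywords.map((k) => k.toLowerCase());
  return (line) => {
    const haystack = caseSensitive ? line.text : line.text.toLowerCase();
    return needles.some((k) => haystack.includes(k));
  };
}

// Matches when the line has at most `maxWords` whitespace-separated words.
function isShortSketch(maxWords = 5): Matcher {
  return (line) =>
    line.text.trim().split(/\s+/).filter(Boolean).length <= maxWords;
}
```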

Data Structure

ResumeData

{
  profile: Profile;
  experience: WorkExperience[];
  education: Education[];
  projects: Project[];
  skills: { featured: Skill[]; other: string[] };
  certifications?: string[];
  languages?: string[];
  custom: { title: string; items: string[] }[];
}

Profile

{
  name: string;
  email: string;
  phone: string;
  location: string;
  url: string;
  summary: string;
}

WorkExperience

{
  company: string;
  title: string;
  duration: string;
  description: string[];
}

Education

{
  school: string;
  degree: string;
  field?: string;
  graduationDate: string;
  gpa?: string;
  details: string[];
}
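
As a small illustration of consuming these shapes, the sketch below restates `Profile` and `WorkExperience` as documented and builds a one-line headline from them. The `headline` helper is hypothetical, and it assumes the first experience entry is the most recent, which the source does not state.

```typescript
// Shapes restated from the documentation above.
interface Profile {
  name: string;
  email: string;
  phone: string;
  location: string;
  url: string;
  summary: string;
}
interface WorkExperience {
  company: string;
  title: string;
  duration: string;
  description: string[];
}

// Hypothetical helper: formats "Name (Title at Company)".
// Assumes experience[0] is the most recent role.
function headline(profile: Profile, experience: WorkExperience[]): string {
  const latest = experience[0];
  const role = latest
    ? `${latest.title} at ${latest.company}`
    : 'no experience listed';
  return `${profile.name || 'Unknown'} (${role})`;
}
```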

Supported Resume Formats

✅ Standard Formats

Single-column resumes
English language content
Bullet point descriptions
Common section names (Experience, Education, Skills, etc.)

⚠️ Limitations

Multi-column layouts (parsed, but the reading order may come out wrong)
Scanned/image PDFs (no text layer; run OCR first)
Non-English resumes (need patterns adapted to the target language)
Complex table layouts

Performance

Typical resume: 50-200ms parse time
Large resume (3+ pages): 200-500ms
Timeout: 30 seconds (configurable)

Testing

npm test
npm run test:watch
npm run coverage

Building

npm run build      # Build distribution files
npm run dev        # Watch mode
npm run lint       # Run ESLint
npm run format     # Format with Prettier

Architecture

src/
├── types/              # Type definitions
├── pdf/                # PDF text extraction
├── processing/         # Text grouping and section detection
├── features/           # Feature scoring engine
├── extraction/         # Section-specific extractors
├── utils/              # Helper functions
└── index.ts            # Main export

Key Algorithms

Feature Scoring System

Scores resume candidates using weighted feature matching:

Score = Σ(weight × matches)
Best candidate = max(scores)
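
The formula can be sketched as follows, assuming each matcher returns a boolean so that a feature contributes its full weight or nothing. `LineLike`, `FeatureLike`, `scoreLine`, and `bestLine` are hypothetical names for this illustration, not the library's exports.

```typescript
interface LineLike {
  text: string;
}
interface FeatureLike {
  matcher: (line: LineLike) => boolean;
  weight: number;
}

// Score = sum of the weights of all features whose matcher fires.
function scoreLine(line: LineLike, features: FeatureLike[]): number {
  return features.reduce(
    (sum, f) => sum + (f.matcher(line) ? f.weight : 0),
    0,
  );
}

// Best candidate = the line with the highest score (first one wins ties).
function bestLine(
  lines: LineLike[],
  features: FeatureLike[],
): { line: LineLike; score: number } | null {
  let best: { line: LineLike; score: number } | null = null;
  for (const line of lines) {
    const score = scoreLine(line, features);
    if (best === null || score > best.score) best = { line, score };
  }
  return best;
}

// Example: features that favour a short, letters-only line (a name).
const nameFeatures: FeatureLike[] = [
  { matcher: (l) => /^[A-Za-z ]+$/.test(l.text), weight: 3 },
  { matcher: (l) => l.text.includes('@'), weight: -4 },
  { matcher: (l) => l.text.length > 30, weight: -2 },
];
const candidateLines: LineLike[] = [
  { text: 'jane.doe@example.com' },
  { text: 'Jane Doe' },
  { text: 'Experienced engineer with ten years of backend work' },
];
const winner = bestLine(candidateLines, nameFeatures);
```

With these weights the three lines score -4, 3, and 1 respectively, so `winner` holds the 'Jane Doe' line.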

Section Detection

Primary: Bold + All Uppercase text
Fallback: Keyword matching with constraints
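
A minimal sketch of this two-step check follows. The keyword list and the fallback constraints (at most two words, leading capital letter) are assumptions chosen for illustration; the source does not spell out which constraints the library applies.

```typescript
interface LineLike {
  text: string;
  bold: boolean;
}

// Assumed keyword list for the fallback path.
const SECTION_KEYWORDS = ['experience', 'education', 'skills', 'projects'];

// True when the text has at least one letter and no lowercase letters.
function isUpperCaseText(text: string): boolean {
  return /[A-Za-z]/.test(text) && text === text.toUpperCase();
}

function looksLikeSectionHeader(line: LineLike): boolean {
  const text = line.text.trim();

  // Primary rule: bold and all uppercase.
  if (line.bold && isUpperCaseText(text)) return true;

  // Fallback rule: a known keyword, constrained to short title-like lines.
  const wordCount = text.split(/\s+/).filter(Boolean).length;
  const startsWithCapital = /^[A-Z]/.test(text);
  const hasKeyword = SECTION_KEYWORDS.some((k) =>
    text.toLowerCase().includes(k),
  );
  return wordCount <= 2 && startsWithCapital && hasKeyword;
}
```

The constraints on the fallback keep body text such as "Led education programs for new hires" from being mistaken for a header.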

Subsection Division

Primary: Line gap > 1.4× typical gap
Fallback: Bold text after non-bold text
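
The primary rule can be sketched as below. The source does not say how the "typical gap" is computed, so this sketch assumes the median of consecutive line gaps; `PositionedLine` and `splitByLineGap` are hypothetical names.

```typescript
interface PositionedLine {
  text: string;
  y: number; // Vertical position of the line on the page
}

function median(values: number[]): number {
  const sorted = values.slice().sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 0
    ? (sorted[mid - 1] + sorted[mid]) / 2
    : sorted[mid];
}

// Starts a new subsection wherever the gap to the previous line
// exceeds `factor` times the typical (median) gap.
function splitByLineGap(
  lines: PositionedLine[],
  factor = 1.4,
): PositionedLine[][] {
  if (lines.length === 0) return [];
  // gaps[i] is the distance between lines[i] and lines[i + 1].
  const gaps = lines.slice(1).map((line, i) => Math.abs(line.y - lines[i].y));
  const typical = gaps.length > 0 ? median(gaps) : 0;

  const groups: PositionedLine[][] = [[lines[0]]];
  for (let i = 1; i < lines.length; i++) {
    if (gaps[i - 1] > factor * typical) groups.push([]);
    groups[groups.length - 1].push(lines[i]);
  }
  return groups;
}
```

For lines at y = 700, 688, 676, 650, 638 the gaps are 12, 12, 26, 12, the median is 12, and only the 26-unit gap exceeds 1.4 × 12 = 16.8, which yields two subsections of three and two lines.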

Text Grouping

Groups elements into lines using Y-coordinate clustering
Merges adjacent elements within character width
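
Both steps can be sketched as follows. The element shape, the 2-unit Y tolerance, and the reading of "within character width" as "no space is inserted when the horizontal gap is at most one average character width" are assumptions made for this illustration.

```typescript
interface TextElement {
  text: string;
  x: number;
  y: number; // PDF coordinates: larger y is higher on the page
  width: number;
}

// Clusters elements whose y values lie within `yTolerance` of the
// first element of the current line, then orders each line left to right.
function groupIntoLines(
  elements: TextElement[],
  yTolerance = 2,
): TextElement[][] {
  const sorted = elements.slice().sort((a, b) => b.y - a.y || a.x - b.x);
  const lines: TextElement[][] = [];
  for (const el of sorted) {
    const current = lines[lines.length - 1];
    if (current && Math.abs(current[0].y - el.y) <= yTolerance) {
      current.push(el);
    } else {
      lines.push([el]);
    }
  }
  return lines.map((line) => line.slice().sort((a, b) => a.x - b.x));
}

// Joins a line's elements, inserting a space only when the horizontal
// gap exceeds one average character width of the next element.
function mergeLine(line: TextElement[]): string {
  let out = '';
  let prevEnd: number | null = null;
  for (const el of line) {
    const charWidth = el.text.length > 0 ? el.width / el.text.length : 0;
    if (prevEnd !== null && el.x - prevEnd > charWidth) out += ' ';
    out += el.text;
    prevEnd = el.x + el.width;
  }
  return out;
}
```

Fragments such as "Soft" and "ware" that touch horizontally are joined into one word, while a wider gap before "Engineer" produces a space.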

Tech Stack

Language: TypeScript 5.0+
PDF Processing: PDF.js (Mozilla)
Testing: Jest
Build: TypeScript Compiler + ESM conversion
Code Quality: ESLint + Prettier

Contributing

Contributions welcome! Please:

Fork the repository
Create a feature branch
Add tests for new functionality
Submit a pull request

License

MIT - See LICENSE file for details

Support

For issues, questions, or suggestions:

GitHub Issues: [Your repo]/issues
Documentation: [Your repo]/wiki

Changelog

v1.0.0

Initial release
Complete resume extraction pipeline
Support for standard resume sections
Comprehensive testing
