A production-ready TypeScript library for extracting structured resume data from PDF files. It uses pattern matching and weighted feature scoring to identify contact details, work experience, education, projects, and skills.
- Features
- Installation
- Quick Start
- React Integration
- Advanced Usage
- API Reference
- Data Structure
- Architecture
- Performance
- Testing
- Contributing
- License
✅ Extract Complete Resume Data
- Personal information (name, email, phone, location, URL, summary)
- Work experience (company, title, duration, descriptions)
- Education (school, degree, field, graduation date, GPA)
- Projects (name, date, descriptions)
- Skills (featured and other skills)
- Custom sections and certifications
✅ Smart Recognition
- Pattern-based matching for contact information
- Intelligent section detection and parsing
- Automatic subsection division (multiple jobs/schools)
- Bullet point handling with 10+ bullet types
- Formatting hints detection (bold, size)
✅ Production Ready
- Full TypeScript support with strict typing
- Comprehensive error handling
- Configurable parsing options
- Debug mode for troubleshooting
- Timeout protection
✅ Fully Customizable
- Modular architecture - use what you need
- Extend feature sets for custom attributes
- Adjust scoring weights for your use cases
- Create custom section extractors
Installation

```bash
npm install pdfjs-dist @yourusername/resume-parser
```

Quick Start

```typescript
import { ResumeParser, initializePdfJs } from '@yourusername/resume-parser';
import * as pdfjsLib from 'pdfjs-dist';
import pdfjsWorker from 'pdfjs-dist/build/pdf.worker.mjs';

// Initialize PDF.js (required once)
initializePdfJs(pdfjsLib, pdfjsWorker);

// Create parser instance
const parser = new ResumeParser({ debug: true });

// Parse resume from file (`file` is a File, e.g. from an <input type="file">)
const fileUrl = URL.createObjectURL(file); // Or file path
const result = await parser.parse(fileUrl);

if (result.success) {
  console.log(result.data);
  // {
  //   profile: { name, email, phone, ... },
  //   experience: [...],
  //   education: [...],
  //   projects: [...],
  //   skills: { featured, other },
  //   ...
  // }
} else {
  console.error(result.error);
}
```

React Integration

```tsx
import { useEffect, useState } from 'react';
import { ResumeParser, initializePdfJs } from '@yourusername/resume-parser';
import * as pdfjsLib from 'pdfjs-dist';
import pdfjsWorker from 'pdfjs-dist/build/pdf.worker.mjs';

// Derive the result type from parse() so the state is correctly typed
type ParseResult = Awaited<ReturnType<ResumeParser['parse']>>;

function ResumeParsing() {
  const [result, setResult] = useState<ParseResult | null>(null);
  const [loading, setLoading] = useState(false);

  // Initialize once
  useEffect(() => {
    initializePdfJs(pdfjsLib, pdfjsWorker);
  }, []);

  async function handleFileUpload(file: File) {
    setLoading(true);
    const parser = new ResumeParser();
    const fileUrl = URL.createObjectURL(file);
    try {
      const parseResult = await parser.parse(fileUrl);
      setResult(parseResult);
    } finally {
      setLoading(false);
      // Release the object URL whether or not parsing succeeded
      URL.revokeObjectURL(fileUrl);
    }
  }

  return (
    <div>
      <input
        type="file"
        accept=".pdf"
        onChange={(e) => {
          // `files` is null or empty when the dialog is cancelled
          const file = e.target.files?.[0];
          if (file) void handleFileUpload(file);
        }}
      />
      {loading && <p>Parsing...</p>}
      {result?.success && <pre>{JSON.stringify(result.data, null, 2)}</pre>}
      {result?.error && <p>Error: {result.error}</p>}
    </div>
  );
}
```

Advanced Usage

Custom feature scoring:

```typescript
import {
  TextLine,
  findBestCandidate,
  Feature
} from '@yourusername/resume-parser';

const customFeatures: Feature[] = [
  {
    // Reward lines that mention the keyword
    matcher: (line: TextLine) => line.text.includes('CEO'),
    weight: 4,
  },
  {
    // Penalize long lines
    matcher: (line: TextLine) => line.text.length > 30,
    weight: -2,
  },
];

// `lines` is a TextLine[], e.g. the output of groupElementsIntoLines
const result = findBestCandidate(lines, customFeatures);
```

Using individual pipeline stages:

```typescript
import {
  groupElementsIntoLines,
  groupLinesIntoSections,
  extractProfileSection,
  extractExperienceSection
} from '@yourusername/resume-parser';

// `elements` is a PdfTextElement[] from the PDF text extraction step
const lines = groupElementsIntoLines(elements);
const sections = groupLinesIntoSections(lines);

// Compare case-insensitively so "Experience" and "EXPERIENCE" both match
const experienceSection = sections.find(s =>
  s.name.toUpperCase().includes('EXPERIENCE')
);

if (experienceSection) {
  const jobs = extractExperienceSection(experienceSection);
  console.log(jobs);
}
```

API Reference

Constructor
```typescript
new ResumeParser(options?: {
  debug?: boolean;   // Enable console logging
  timeout?: number;  // Parse timeout in ms (default: 30000)
})
```

Methods
```typescript
async parse(fileUrl: string): Promise<ParseResult>

interface ParseResult {
  success: boolean;
  data?: ResumeData;
  error?: string;
  duration?: number;
}
```

String Helpers
```typescript
cleanText(text: string): string
splitIntoWords(text: string): string[]
isAllUpperCase(text: string): boolean
hasLetters(text: string): boolean
countWords(text: string): number
```

Pattern Matching
```typescript
matchEmail(text: string): string | null
matchPhone(text: string): string | null
matchLocation(text: string): { city: string; state: string } | null
matchYear(text: string): string | null
matchDegree(text: string): string | null
hasJobTitleKeyword(text: string): boolean
isBulletPoint(text: string): boolean
```

Text Processing
```typescript
groupElementsIntoLines(elements: PdfTextElement[]): TextLine[]
groupLinesIntoSections(lines: TextLine[]): ResumeSection[]
detectSubsections(lines: TextLine[]): TextLine[][]
isSectionHeader(line: TextLine, lineIndex: number): { isSectionHeader: boolean; sectionName?: string }
```

Feature Scoring
```typescript
findBestCandidate(lines, features, options?): ScoredCandidate
findCandidatesAboveThreshold(lines, features, threshold): ScoredCandidate[]
scoreLineByFeatures(line, features, lineIndex?): ScoredCandidate

// Feature matcher helpers
hasAnyOf(keywords, caseSensitive?)
startsWithAny(keywords, caseSensitive?)
matchesRegex(pattern, extractMatch?)
hasFormatting(type: 'bold' | 'large')
onlyLettersAndSpaces()
minWordCount(count)
isAllUpperCase()
isShort(maxWords?)
maxLineLength(chars)
```

Data Structure

```typescript
interface ResumeData {
  profile: Profile;
  experience: WorkExperience[];
  education: Education[];
  projects: Project[];
  skills: { featured: Skill[]; other: string[] };
  certifications?: string[];
  languages?: string[];
  custom: { title: string; items: string[] }[];
}

interface Profile {
  name: string;
  email: string;
  phone: string;
  location: string;
  url: string;
  summary: string;
}

interface WorkExperience {
  company: string;
  title: string;
  duration: string;
  description: string[];
}

interface Education {
  school: string;
  degree: string;
  field?: string;
  graduationDate: string;
  gpa?: string;
  details: string[];
}
```

✅ Standard Formats
- Single-column resumes
- English language content
- Bullet point descriptions
- Common section names (Experience, Education, Skills, etc.)
⚠️ Limited or Unsupported
- Multi-column layouts (parsed, but text order may be wrong)
- Scanned/image PDFs (require OCR before parsing)
- Non-English resumes (require language-appropriate patterns)
- Complex table layouts
Performance
- Typical resume: 50-200 ms parse time
- Large resume (3+ pages): 200-500 ms
- Timeout: 30 seconds (configurable)
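
Timeout protection of this kind is commonly built by racing the work against a timer. The following is a minimal sketch of that general technique, not the library's actual implementation; `withTimeout` is a hypothetical helper name introduced only for illustration.

```typescript
// Hypothetical helper: rejects if `work` does not settle within `ms` milliseconds.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Timed out after ${ms} ms`)), ms);
  });
  // Whichever promise settles first decides the outcome.
  return Promise.race([work, timeout]).finally(() => {
    // Clear the timer so it does not keep the process alive after `work` wins.
    if (timer !== undefined) clearTimeout(timer);
  });
}

// Usage: a slow task standing in for a long-running parse.
const slowTask = new Promise<string>((resolve) => setTimeout(() => resolve('parsed'), 5));
withTimeout(slowTask, 200).then((value) => console.log(value));
```

Note that racing only stops waiting for the result; it does not cancel the underlying work, which would need its own cancellation mechanism.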
Testing

```bash
npm test
npm run test:watch
npm run coverage
```

Development scripts:

```bash
npm run build    # Build distribution files
npm run dev      # Watch mode
npm run lint     # Run ESLint
npm run format   # Format with Prettier
```

Project structure:

```
src/
├── types/        # Type definitions
├── pdf/          # PDF text extraction
├── processing/   # Text grouping and section detection
├── features/     # Feature scoring engine
├── extraction/   # Section-specific extractors
├── utils/        # Helper functions
└── index.ts      # Main export
```
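Feature scoring: each candidate line is scored by weighted feature matching, and the highest-scoring line is selected:
Score = Σ(weight × match), summed over all features, where match is 1 when a feature's matcher accepts the line and 0 otherwise
Best candidate = the line with the maximum score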
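
The scoring rule above can be illustrated with a small self-contained sketch. The types `SketchLine` and `SketchFeature` and the functions `scoreLine` and `pickBest` are hypothetical stand-ins written for this example; the library's own `TextLine`, `Feature`, and `findBestCandidate` may behave differently (for instance in how ties or thresholds are handled).

```typescript
// Illustrative types only; not the library's TextLine / Feature definitions.
interface SketchLine { text: string; }
interface SketchFeature {
  matcher: (line: SketchLine) => boolean;
  weight: number;
}

// Sum the weights of every feature whose matcher accepts the line.
function scoreLine(line: SketchLine, features: SketchFeature[]): number {
  return features.reduce((sum, f) => (f.matcher(line) ? sum + f.weight : sum), 0);
}

// Return the highest-scoring line (first one wins on ties), or undefined for empty input.
function pickBest(lines: SketchLine[], features: SketchFeature[]): SketchLine | undefined {
  let best: SketchLine | undefined;
  let bestScore = -Infinity;
  for (const line of lines) {
    const score = scoreLine(line, features);
    if (score > bestScore) {
      best = line;
      bestScore = score;
    }
  }
  return best;
}

// Example: features that favor a short, letters-only line such as a person's name.
const nameFeatures: SketchFeature[] = [
  { matcher: (l) => /^[A-Za-z\s.]+$/.test(l.text), weight: 3 },
  { matcher: (l) => l.text.includes('@'), weight: -4 },
  { matcher: (l) => l.text.split(/\s+/).length > 4, weight: -2 },
];
const sampleLines: SketchLine[] = [
  { text: 'Jane Doe' },                                      // score 3
  { text: 'jane.doe@example.com' },                          // score -4
  { text: 'Senior engineer with ten years of experience' },  // score 1
];
const bestNameLine = pickBest(sampleLines, nameFeatures);
console.log(bestNameLine?.text); // prints "Jane Doe"
```

Negative weights let a feature veto unlikely candidates without any special-case logic: the email line is rejected simply because its total score is lowest.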
Section header detection:
- Primary: Bold + All Uppercase text
- Fallback: Keyword matching with constraints
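
A rough sketch of the two-tier header rule follows. The keyword list and the fallback constraints (at most two words, starting with a capital letter) are assumptions chosen for illustration, since the exact constraints are not listed here; `HeaderLine` and `looksLikeSectionHeader` are hypothetical names, not the library's `isSectionHeader`.

```typescript
// Illustrative line shape; the library's TextLine may carry different fields.
interface HeaderLine { text: string; bold: boolean; }

// Assumed keyword list for the fallback rule.
const SECTION_KEYWORDS = ['experience', 'education', 'skills', 'projects'];

// True when the text contains at least one letter and no lowercase letters.
function isUpperCaseText(text: string): boolean {
  return /[A-Za-z]/.test(text) && text === text.toUpperCase();
}

function looksLikeSectionHeader(line: HeaderLine): boolean {
  const text = line.text.trim();
  // Primary rule: bold and entirely uppercase.
  if (line.bold && isUpperCaseText(text)) return true;
  // Fallback rule: contains a known keyword, is short, and starts with a capital.
  const words = text.split(/\s+/).filter(Boolean);
  const hasKeyword = SECTION_KEYWORDS.some((k) => text.toLowerCase().includes(k));
  return hasKeyword && words.length <= 2 && /^[A-Z]/.test(text);
}

console.log(looksLikeSectionHeader({ text: 'WORK EXPERIENCE', bold: true }));  // true
console.log(looksLikeSectionHeader({ text: 'Education', bold: false }));       // true
console.log(looksLikeSectionHeader({ text: 'Led education outreach for new hires', bold: false })); // false
```

The constraints on the fallback matter: without the word limit, any bullet that happens to mention "education" or "skills" would be mistaken for a header.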
Subsection detection:
- Primary: Line gap > 1.4× typical gap
- Fallback: Bold text after non-bold text
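
The gap rule can be sketched as below. This example assumes the "typical gap" is the median vertical distance between consecutive lines, and that the bold fallback is used only when no gap exceeds the threshold; both are assumptions, and `PositionedLine` and `splitIntoSubsections` are hypothetical names rather than the library's `detectSubsections`.

```typescript
// Illustrative line shape with a vertical position and a bold flag.
interface PositionedLine { text: string; y: number; bold: boolean; }

function median(values: number[]): number {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 0 ? (sorted[mid - 1] + sorted[mid]) / 2 : sorted[mid];
}

function splitIntoSubsections(lines: PositionedLine[]): PositionedLine[][] {
  if (lines.length === 0) return [];
  // gaps[i] is the vertical distance between lines[i] and lines[i + 1].
  const gaps = lines.slice(1).map((line, i) => Math.abs(line.y - lines[i].y));
  const threshold = (gaps.length > 0 ? median(gaps) : 0) * 1.4;

  // Primary rule: an unusually large gap before line i starts a new subsection.
  const isGapBreak = (i: number) => gaps[i - 1] > threshold;
  // Fallback rule: a bold line directly after a non-bold line starts one.
  const isBoldBreak = (i: number) => lines[i].bold && !lines[i - 1].bold;
  const isBreak = gaps.some((g) => g > threshold) ? isGapBreak : isBoldBreak;

  const groups: PositionedLine[][] = [[lines[0]]];
  for (let i = 1; i < lines.length; i++) {
    if (isBreak(i)) groups.push([]);
    groups[groups.length - 1].push(lines[i]);
  }
  return groups;
}

// Two jobs separated by a 26-unit gap where the usual gap is 12 units.
const jobLines: PositionedLine[] = [
  { text: 'Acme Corp', y: 700, bold: true },
  { text: 'Engineer', y: 688, bold: false },
  { text: '- Built things', y: 676, bold: false },
  { text: 'Globex', y: 650, bold: true },
  { text: 'Intern', y: 638, bold: false },
  { text: '- Learned things', y: 626, bold: false },
];
console.log(splitIntoSubsections(jobLines).length); // prints 2
```

Using a median rather than a mean keeps the threshold stable when a section contains one or two very large gaps.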
Line grouping:
- Groups elements into lines using Y-coordinate clustering
- Merges adjacent elements within character width
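
One way to picture the grouping step is the sketch below. It assumes PDF-style coordinates (y grows upward), a fixed vertical tolerance of 2 units, and that "within character width" means the horizontal gap between two elements is no larger than the previous element's average character width. `TextElement` and `groupIntoLines` are hypothetical names, not the library's `PdfTextElement` or `groupElementsIntoLines`.

```typescript
// Illustrative element shape: position of the left edge plus rendered width.
interface TextElement { text: string; x: number; y: number; width: number; }

function groupIntoLines(elements: TextElement[], yTolerance = 2): string[] {
  // Sort top-to-bottom (larger y first), then left-to-right.
  const sorted = [...elements].sort((a, b) => b.y - a.y || a.x - b.x);

  // Cluster elements whose y is within the tolerance of the row's first element.
  const rows: TextElement[][] = [];
  for (const el of sorted) {
    const row = rows[rows.length - 1];
    if (row && Math.abs(row[0].y - el.y) <= yTolerance) row.push(el);
    else rows.push([el]);
  }

  return rows.map((row) => {
    row.sort((a, b) => a.x - b.x);
    let text = row[0].text;
    for (let i = 1; i < row.length; i++) {
      const prev = row[i - 1];
      const gap = row[i].x - (prev.x + prev.width);
      const charWidth = prev.width / Math.max(prev.text.length, 1);
      // Gaps within one character width are treated as part of the same word.
      text += (gap <= charWidth ? '' : ' ') + row[i].text;
    }
    return text;
  });
}

const sampleElements: TextElement[] = [
  { text: 'Jane', x: 50, y: 700, width: 24 },
  { text: 'Doe', x: 82, y: 700.5, width: 18 },
  { text: 'Soft', x: 50, y: 680, width: 20 },
  { text: 'ware', x: 70.5, y: 680, width: 20 },
  { text: 'Engineer', x: 96, y: 680, width: 40 },
];
console.log(groupIntoLines(sampleElements)); // prints [ 'Jane Doe', 'Software Engineer' ]
```

PDF text is often emitted in fragments that split words arbitrarily, which is why the merge step compares gaps against a character width instead of always inserting a space.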
Technology stack:
- Language: TypeScript 5.0+
- PDF Processing: PDF.js (Mozilla)
- Testing: Jest
- Build: TypeScript Compiler + ESM conversion
- Code Quality: ESLint + Prettier
Contributions welcome! Please:
- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Submit a pull request
MIT - See LICENSE file for details
For issues, questions, or suggestions:
- GitHub Issues: [Your repo]/issues
- Documentation: [Your repo]/wiki
- Initial release
- Complete resume extraction pipeline
- Support for standard resume sections
- Comprehensive testing