This repository contains code and data for the NeurIPS paper "pinDef: A Benchmark Dataset for Extracting Pin Definitions from Electronic Component Datasheets".
This project provides a benchmark dataset and tools for extracting pin definitions from electronic component datasheets, supporting research and development in automated datasheet analysis.
This project requires Python 3.10.

- Install the requirements:

  ```
  pip install -r requirements.txt
  ```

- Rename the file `.env.sample` to `.env`.
- Replace the placeholders in `.env` with your actual API keys and variables.
The dataset introduced in the paper is provided in the file `components.json`. To use the dataset in this project, insert it into the database by running the script `import_components.py`:

```
python import_components.py
```

To download all the PDF datasheets, run the file `download_datasheets.py`:

```
python download_datasheets.py
```

- `components.json`: Dataset of sensor components with pin details and datasheet links.
- `src/`: Contains all the Python code needed for the experiments as well as the code for the web server of the web frontend.
- `webFrontend/`: Contains tools to collect new components, review already collected components, and a page to perform manual grading.
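As a rough illustration of how the dataset file can be consumed, the following sketch loads `components.json` with the standard library. The record fields shown (`name`, `pins`, `datasheet`) are assumptions for the example, not the dataset's actual schema:

```python
import json
from pathlib import Path

def load_components(path: str) -> list[dict]:
    """Load the component dataset from a JSON file (e.g. components.json)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)

# Illustrative record; the real schema of components.json may differ.
sample = [{"name": "EXAMPLE-SENSOR",
           "pins": [{"number": 1, "name": "VDD", "description": "Supply voltage"}],
           "datasheet": "https://example.com/ds.pdf"}]
Path("demo_components.json").write_text(json.dumps(sample), encoding="utf-8")

components = load_components("demo_components.json")
for component in components:
    print(component["name"], "has", len(component["pins"]), "pin(s)")
```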
To obtain the results from the tables, execute the three pipelines:

- `proprietary_pipeline.py`
- `vision_pipeline.py`
- `text_pipeline.py`
Each pipeline processes the sensor component datasheets differently, leveraging various models and techniques.
The `execution_policy` controls whether a pipeline step should run or use cached results. It has three modes:

- `OVERWRITE`: Always run the step and overwrite any cached results.
- `CACHE`: Use cached results if available; otherwise run the step.
- `CACHE_ONLY`: Only use cached results; do not run the step if no cache exists.
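The three modes could be modeled roughly as follows. This is a minimal sketch with a hypothetical `run_step` helper and an in-memory cache, not the project's actual implementation:

```python
from enum import Enum, auto

class ExecutionPolicy(Enum):
    OVERWRITE = auto()   # always run the step and overwrite the cache
    CACHE = auto()       # use the cache if present, otherwise run
    CACHE_ONLY = auto()  # never run; return only what is cached

_cache: dict[str, object] = {}

def run_step(name, step, policy):
    """Run a pipeline step according to the given execution policy."""
    if policy is ExecutionPolicy.OVERWRITE:
        _cache[name] = step()
        return _cache[name]
    if name in _cache:
        return _cache[name]
    if policy is ExecutionPolicy.CACHE_ONLY:
        return None  # no cached result, and running the step is not allowed
    _cache[name] = step()
    return _cache[name]
```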
The `exception_policy` defines how exceptions during step execution are handled:

- `TRY`: Attempt to run the step, save exceptions if they occur, and continue.
- `THROW`: Raise exceptions immediately and do not save them.
- `IGNORE`: Ignore exceptions, do not save them, and return `None`.
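An illustrative sketch of how such a policy might be applied around a step (the function and variable names here are assumptions, not the repository's actual code):

```python
from enum import Enum, auto

class ExceptionPolicy(Enum):
    TRY = auto()     # record the exception and continue
    THROW = auto()   # re-raise immediately
    IGNORE = auto()  # swallow the exception and return None

saved_exceptions: list[Exception] = []

def run_with_policy(step, policy):
    """Execute a step, handling exceptions according to the policy."""
    try:
        return step()
    except Exception as exc:
        if policy is ExceptionPolicy.THROW:
            raise
        if policy is ExceptionPolicy.TRY:
            saved_exceptions.append(exc)  # recorded for later inspection
        return None  # TRY and IGNORE both continue with no result
```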
Together, these policies provide flexible and reliable control over pipeline execution, allowing customization based on the use case or experimental needs.
To obtain the quantitative results for Figure 4 in the paper, execute the following notebook in the root of the project:

```
quantitative_analysis.ipynb
```

To obtain the statistical data presented in the paper, execute the following notebook in the root of the project:

```
statistics.ipynb
```

- Start the backend server:

  ```
  cd webFrontend/src/server/
  fastapi dev src/server/main.py
  ```

- Install the frontend requirements:

  ```
  cd webFrontend
  npm install
  ```

- Start the frontend:

  ```
  cd webFrontend
  npm run dev
  ```

The frontend offers three functionalities:
- Collecting components, which are then stored in the MongoDB database. The components can then be downloaded using the script `export_components.py`.
- Reviewing all components in the database.
- Manually grading pins.
For manual grading, the file `random_pins.json` is read; it serves as the basis for the experiment in Section 3.4 of the paper and was generated by the script `get_random_pins.py`. The agreement between human and LLM grading can then be evaluated using the script `compare_gradings.py`.
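As a sketch of what such an agreement evaluation might compute (the actual metric used by `compare_gradings.py` is not specified here), the following calculates raw percentage agreement and Cohen's kappa over paired grading labels:

```python
from collections import Counter

def agreement(human: list[str], llm: list[str]) -> tuple[float, float]:
    """Return (raw agreement, Cohen's kappa) for two parallel label sequences."""
    assert len(human) == len(llm) and human, "sequences must be non-empty and equal length"
    n = len(human)
    # Fraction of pins where human and LLM gave the same grade.
    p_observed = sum(h == m for h, m in zip(human, llm)) / n
    # Expected chance agreement from the marginal label frequencies.
    h_counts, m_counts = Counter(human), Counter(llm)
    p_expected = sum(h_counts[c] * m_counts[c] for c in set(human) | set(llm)) / n**2
    kappa = (p_observed - p_expected) / (1 - p_expected) if p_expected < 1 else 1.0
    return p_observed, kappa
```

For example, with four pins graded `["correct", "correct", "wrong", "correct"]` by the human and `["correct", "wrong", "wrong", "correct"]` by the LLM, raw agreement is 0.75 and kappa is 0.5.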