Python APIs Data Integration Exercise

Overview

This project demonstrates how to integrate data from the Open Library API into a Neo4j graph database. The goal is to fetch book data based on a user-specified subject, clean and transform the data, and load it into a Neo4j database for further analysis and visualization. The project also includes functionality to run analytics queries on the data stored in Neo4j.

Features

  1. Fetch Data from Open Library API:

    • Dynamically fetch book data by subject using the Open Library API (a minimal fetch sketch follows this list).
    • Handle API errors (e.g., retry with exponential backoff on 503 Service Unavailable).
  2. Data Transformation:

    • Extract relevant book information (e.g., title, authors, publication year, subject).
    • Clean and enhance the data (e.g., normalize titles, handle missing values, generate unique IDs).
  3. Neo4j Integration:

    • Load the cleaned data into a Neo4j graph database.
    • Create nodes for books, authors, subjects, and publication years.
    • Establish relationships between these entities (e.g., WRITTEN_BY, HAS_SUBJECT, PUBLISHED_IN).
  4. Analytics Queries:

    • Run queries to analyze the data in Neo4j, such as:
      • Top authors by the number of books.
      • Distribution of books by decade.
      • Most popular subjects.
  5. Error Handling:

    • Gracefully handle API errors and database connection issues.
    • Log errors and provide meaningful feedback to the user.
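
As a reference for the fetch step above, the sketch below shows one way it could work, assuming the public Open Library Subjects endpoint (https://openlibrary.org/subjects/<subject>.json). The function name and field handling are illustrative rather than the repository's exact code; retry handling is sketched separately under Troubleshooting.

from typing import Dict, List

import requests

OPEN_LIBRARY_SUBJECTS_URL = "https://openlibrary.org/subjects/{subject}.json"


def fetch_books_by_subject(subject: str, limit: int = 50) -> List[Dict]:
    """Fetch raw book records for a subject from the Open Library Subjects API."""
    url = OPEN_LIBRARY_SUBJECTS_URL.format(subject=subject.strip().lower().replace(" ", "_"))
    response = requests.get(url, params={"limit": limit}, timeout=10)
    response.raise_for_status()  # surface HTTP errors; retries are shown under Troubleshooting
    # Each entry under "works" carries title, authors and first_publish_year fields.
    return response.json().get("works", [])


if __name__ == "__main__":
    for work in fetch_books_by_subject("ships", limit=5):
        authors = [a.get("name") for a in work.get("authors", [])]
        print(work.get("title"), "-", ", ".join(filter(None, authors)))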

Project Structure

.
├── config.py                # Configuration file for environment variables
├── neo4j_connection.py      # Handles Neo4j database operations
├── open_library_API.py      # Fetches and processes data from the Open Library API
├── test_neo4j_connection.py # Script to test the Neo4j connection
├── requirements.txt         # Python dependencies
├── .env                     # Environment variables (e.g., Neo4j credentials)
└── README.md                # Project documentation

Prerequisites

  • Python 3.8 or higher
  • Neo4j database installed and running locally or remotely
  • .env file with the following variables:
    NEO4J_URI=bolt://localhost:7687
    NEO4J_USER=neo4j
    NEO4J_PASSWORD=<your_password>
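
For reference, config.py could load these variables along the lines of the sketch below. It assumes the python-dotenv package is used to read the .env file; the defaults shown are only illustrative.

# config.py -- illustrative sketch; assumes python-dotenv reads the .env file
import os

from dotenv import load_dotenv

load_dotenv()  # pull the NEO4J_* variables from the local .env file into the environment

NEO4J_URI = os.getenv("NEO4J_URI", "bolt://localhost:7687")
NEO4J_USER = os.getenv("NEO4J_USER", "neo4j")
NEO4J_PASSWORD = os.getenv("NEO4J_PASSWORD")

if not NEO4J_PASSWORD:
    raise RuntimeError("NEO4J_PASSWORD is not set; check your .env file")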
    

Installation

  1. Clone the repository:

    git clone <repository_url>
    cd Python-APIs-data-integration-exercise
  2. Install dependencies:

    pip install -r requirements.txt
  3. Set up the .env file with your Neo4j credentials.

  4. Start your Neo4j database.

Usage

  1. Run the main script to fetch data and load it into Neo4j:

    python open_library_API.py
  2. Follow the prompts to enter a subject (e.g., "python", "ships").

  3. View the data in Neo4j or run analytics queries.

  4. Test the Neo4j connection (optional):

    python test_neo4j_connection.py
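
The connection test can be as small as the sketch below, which reuses the NEO4J_* values from the config.py sketch earlier and calls the official neo4j driver's verify_connectivity(); the actual test_neo4j_connection.py may differ in detail.

# Illustrative connection check using the official neo4j driver
from neo4j import GraphDatabase

from config import NEO4J_PASSWORD, NEO4J_URI, NEO4J_USER


def main() -> None:
    driver = GraphDatabase.driver(NEO4J_URI, auth=(NEO4J_USER, NEO4J_PASSWORD))
    try:
        driver.verify_connectivity()  # raises if the database is unreachable or credentials are wrong
        print(f"Connected to Neo4j at {NEO4J_URI}")
    finally:
        driver.close()


if __name__ == "__main__":
    main()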

Example Workflow

  1. Enter a subject (e.g., "ships").
  2. The script fetches book data from the Open Library API.
  3. The data is cleaned and transformed into a structured format.
  4. The cleaned data is loaded into Neo4j as nodes and relationships.
  5. Analytics queries are run to extract insights from the data.
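
To make step 3 concrete, the sketch below cleans one raw Open Library work record: it normalizes the title, fills missing values, and derives a deterministic ID. The field and helper names are assumptions for illustration, not necessarily those used in open_library_API.py.

import hashlib
from typing import Dict


def clean_record(work: Dict, subject: str) -> Dict:
    """Turn one raw Open Library 'work' into a flat, graph-ready record (illustrative)."""
    title = (work.get("title") or "Unknown Title").strip()
    authors = [a.get("name", "Unknown Author") for a in work.get("authors", [])] or ["Unknown Author"]
    year = work.get("first_publish_year")  # may be None; kept as-is and skipped when loading PUBLISHED_IN
    # Deterministic ID so re-running the pipeline does not create duplicate Book nodes
    book_id = hashlib.md5(f"{title}|{authors[0]}".encode("utf-8")).hexdigest()[:12]
    return {
        "book_id": book_id,
        "title": title,
        "authors": authors,
        "publication_year": year,
        "subject": subject.strip().lower(),
    }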

Key Files

  • open_library_API.py: Handles API requests, data extraction, cleaning, and Neo4j integration.
  • neo4j_connection.py: Manages Neo4j database operations (e.g., creating nodes, relationships, and constraints).
  • test_neo4j_connection.py: Tests the connection to the Neo4j database.
  • config.py: Loads environment variables for Neo4j credentials.
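
For orientation, loading one cleaned record could look like the sketch below, which uses MERGE so the Book, Author, Subject and Year nodes and the WRITTEN_BY / HAS_SUBJECT / PUBLISHED_IN relationships are created idempotently. The class name, labels and property keys are assumptions, not necessarily what neo4j_connection.py uses.

from neo4j import GraphDatabase

# Cypher that upserts one book together with its authors, subject and publication year
LOAD_BOOK_QUERY = """
MERGE (b:Book {book_id: $book_id})
  SET b.title = $title
MERGE (s:Subject {name: $subject})
MERGE (b)-[:HAS_SUBJECT]->(s)
FOREACH (name IN $authors |
  MERGE (a:Author {name: name})
  MERGE (b)-[:WRITTEN_BY]->(a))
FOREACH (_ IN CASE WHEN $publication_year IS NULL THEN [] ELSE [1] END |
  MERGE (y:Year {value: $publication_year})
  MERGE (b)-[:PUBLISHED_IN]->(y))
"""


class Neo4jConnection:
    """Thin wrapper around the official driver (illustrative, not the repository's exact class)."""

    def __init__(self, uri: str, user: str, password: str) -> None:
        self._driver = GraphDatabase.driver(uri, auth=(user, password))

    def close(self) -> None:
        self._driver.close()

    def load_book(self, record: dict) -> None:
        with self._driver.session() as session:
            session.run(LOAD_BOOK_QUERY, **record)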

Analytics Queries

The following analytics queries are run on the Neo4j database:

  1. Node Counts:

    • Count the number of nodes by type (e.g., books, authors, subjects).
  2. Top Authors:

    • Find the top 5 authors with the most books.
  3. Books by Decade:

    • Analyze the distribution of books published by decade.
  4. Top Subjects:

    • Identify the top 5 subjects with the most books.
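
The snippet below shows what these four queries could look like in Cypher, wrapped as Python strings so they can be passed to the driver's session.run(); the labels and property names follow the loading sketch above and are assumptions rather than the repository's exact queries.

# Illustrative Cypher for the four analytics queries
ANALYTICS_QUERIES = {
    "node_counts": """
        MATCH (n)
        RETURN labels(n)[0] AS label, count(n) AS nodes
        ORDER BY nodes DESC
    """,
    "top_authors": """
        MATCH (:Book)-[:WRITTEN_BY]->(a:Author)
        RETURN a.name AS author, count(*) AS books
        ORDER BY books DESC LIMIT 5
    """,
    "books_by_decade": """
        MATCH (:Book)-[:PUBLISHED_IN]->(y:Year)
        RETURN (y.value / 10) * 10 AS decade, count(*) AS books
        ORDER BY decade
    """,
    "top_subjects": """
        MATCH (:Book)-[:HAS_SUBJECT]->(s:Subject)
        RETURN s.name AS subject, count(*) AS books
        ORDER BY books DESC LIMIT 5
    """,
}


def run_analytics(session) -> None:
    """Print each query's rows; `session` is an open neo4j session."""
    for name, query in ANALYTICS_QUERIES.items():
        print(f"--- {name} ---")
        for row in session.run(query):
            print(row.data())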

Troubleshooting

  • API Errors: If the API returns a 503 error, the script retries the request with exponential backoff.
  • Neo4j Connection Issues: Ensure the Neo4j database is running and the credentials in the .env file are correct.
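
A minimal version of that retry behavior could look like the sketch below, using plain requests plus time.sleep; the retry count and delays are illustrative choices, not necessarily the script's.

import time
from typing import Optional

import requests


def get_with_retries(url: str, params: Optional[dict] = None, max_retries: int = 5) -> requests.Response:
    """GET a URL, retrying on 503 with exponential backoff (1s, 2s, 4s, ...)."""
    delay = 1.0
    for attempt in range(1, max_retries + 1):
        response = requests.get(url, params=params, timeout=10)
        if response.status_code != 503:
            response.raise_for_status()  # fail immediately on other 4xx/5xx errors
            return response
        if attempt < max_retries:
            print(f"503 from the API (attempt {attempt}/{max_retries}); retrying in {delay:.0f}s")
            time.sleep(delay)
            delay *= 2
    raise RuntimeError(f"API still unavailable after {max_retries} attempts: {url}")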

License

This project is for educational purposes and is not licensed for commercial use.
