This project automates fetching article data from the PERS page on IngentaConnect, saving it to a CSV file, and generating an HTML file that displays the articles dynamically on a website.
- Scrape Data from PERS Webpage: Use Python to read research articles from the IngentaConnect PERS webpage, extracting titles, authors, page numbers, access status, and URLs.
- Save to CSV: Store the scraped article data in a CSV file (`filtered_articles_info.csv`).
- Generate HTML: Convert the CSV file content into an HTML file (`articles.html`) for embedding in a web page.
- Load Articles HTML Dynamically: Integrate `articles.html` into your main webpage using JavaScript to load the content dynamically.
- Execute the Python script to scrape articles from the PERS webpage on IngentaConnect.
- The script saves the article data to `filtered_articles_info.csv`.
Sample Python Code:
Ensure the Python script is set up correctly to fetch the articles and output them to filtered_articles_info.csv.
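A minimal sketch of such a scraper, using only the Python standard library, is shown below. The PERS listing URL and the anchor class name are assumptions and must be verified against the live IngentaConnect page; the real script also captures authors, page numbers, and access status.

```python
# Illustrative sketch only: PERS_URL and the "title" anchor class are
# assumptions; inspect the live IngentaConnect page to confirm them.
import csv
import urllib.request
from html.parser import HTMLParser

PERS_URL = "https://www.ingentaconnect.com/content/asprs/pers"  # assumed URL


class ArticleLinkParser(HTMLParser):
    """Collect (title, href) pairs from <a> tags whose class contains 'title'
    (the class name is a guess; adjust after inspecting the real page)."""

    def __init__(self):
        super().__init__()
        self.articles = []
        self._in_title_link = False
        self._href = ""
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "title" in (attrs.get("class") or ""):
            self._in_title_link = True
            self._href = attrs.get("href", "")
            self._text = []

    def handle_data(self, data):
        if self._in_title_link:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_title_link:
            self._in_title_link = False
            self.articles.append(
                {"title": "".join(self._text).strip(), "url": self._href}
            )


def scrape_to_csv(url=PERS_URL, path="filtered_articles_info.csv"):
    """Download the listing page, parse article links, and write the CSV."""
    with urllib.request.urlopen(url) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    parser = ArticleLinkParser()
    parser.feed(html)
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["title", "url"])
        writer.writeheader()
        writer.writerows(parser.articles)
```

Call `scrape_to_csv()` to fetch the page and produce `filtered_articles_info.csv`. A production version would add more columns and error handling.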
- After obtaining `filtered_articles_info.csv`, run the second Python script to generate `articles.html` based on the CSV data.
- The HTML file will format each article according to the desired layout.
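The CSV-to-HTML step could look roughly like the sketch below. The column names (`title`, `authors`, `pages`, `access`, `url`) and the `<article>` markup are assumptions about the actual script's input and output format.

```python
# Hypothetical sketch: the CSV column names and the <article> markup
# are assumptions; match them to your actual files.
import csv
from html import escape


def generate_articles_html(csv_path="filtered_articles_info.csv",
                           html_path="articles.html"):
    """Render each CSV row as an <article> block and write articles.html."""
    blocks = []
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            blocks.append(
                "<article>\n"
                f'  <h3><a href="{escape(row["url"])}">'
                f'{escape(row["title"])}</a></h3>\n'
                f'  <p>{escape(row["authors"])}, pp. {escape(row["pages"])}'
                f' ({escape(row["access"])})</p>\n'
                "</article>"
            )
    with open(html_path, "w", encoding="utf-8") as f:
        f.write("\n".join(blocks))
```

Note the `html.escape` calls: titles and author names scraped from the web may contain characters that would otherwise break the generated markup.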
- Add `articles.html` to your GitHub repository.
- Commit and push the file to make it available for GitHub Pages.
- Go to your repository on GitHub.
- Navigate to Settings > Pages.
- In the Source section, select the branch where `articles.html` is stored (typically `main`) and set the root directory.
- Save your settings. GitHub Pages will provide a URL for your repository, such as `https://yourusername.github.io/your-repository-name/`.
Once GitHub Pages is set up, you can access `articles.html` at `https://yourusername.github.io/your-repository-name/articles.html`.
- Create a Placeholder `<div>`: In your main HTML file where you want the articles to appear, add a `<div>` with an ID (e.g., `articles-content`):

  ```html
  <div id="articles-content"></div>
  ```
- Add JavaScript to Load `articles.html`: Include the following JavaScript code in your main HTML file to fetch and inject `articles.html` into the `#articles-content` div:

  ```js
  document.addEventListener("DOMContentLoaded", function () {
    // Fetch the articles HTML content and inject it into the placeholder div
    fetch('https://yourusername.github.io/your-repository-name/articles.html')
      .then(response => {
        if (!response.ok) {
          throw new Error("Network response was not ok " + response.statusText);
        }
        return response.text();
      })
      .then(html => {
        document.getElementById('articles-content').innerHTML = html;
      })
      .catch(error => {
        console.error("Failed to load articles.html:", error);
        document.getElementById('articles-content').innerHTML =
          "<p>Failed to load articles. Please try again later.</p>";
      });
  });
  ```
The following JavaScript reads an open-access (OA) HTML pool and a non-OA HTML pool, randomly selects 3 articles from each, and refreshes the selection on every page load.
```js
async function loadRandomArticles() {
  // URLs of the HTML files
  const openAccessURL = 'path/to/open_access_articles.html';
  const memberOnlyURL = 'path/to/member_only_articles.html';

  // Fetch and parse the articles from each HTML file
  const openAccessArticles = await fetchArticles(openAccessURL);
  const memberOnlyArticles = await fetchArticles(memberOnlyURL);

  // Randomly select 3 articles from each group
  const selectedOpenAccess = selectRandomArticles(openAccessArticles, 3);
  const selectedMemberOnly = selectRandomArticles(memberOnlyArticles, 3);

  // Combine and display the selected articles
  const articlesContainer = document.getElementById('articles-content');
  articlesContainer.innerHTML = [...selectedOpenAccess, ...selectedMemberOnly].join('');
}

// Fetch and parse articles from a URL
async function fetchArticles(url) {
  try {
    const response = await fetch(url);
    const htmlText = await response.text();

    // Create a temporary DOM element to parse the HTML
    const parser = new DOMParser();
    const doc = parser.parseFromString(htmlText, 'text/html');

    // Extract all article elements
    return Array.from(doc.querySelectorAll('article')).map(article => article.outerHTML);
  } catch (error) {
    console.error(`Failed to load articles from ${url}`, error);
    return [];
  }
}

// Select a specified number of random articles from the list
// (Fisher-Yates shuffle on a copy: unbiased, and leaves the input intact)
function selectRandomArticles(articles, count) {
  const shuffled = [...articles];
  for (let i = shuffled.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [shuffled[i], shuffled[j]] = [shuffled[j], shuffled[i]];
  }
  return shuffled.slice(0, count);
}

// Run the function to load random articles on page load
loadRandomArticles();
```
The following JavaScript reads the same OA and non-OA HTML pools, but seeds the random selection with the current date, so the 3 articles from each pool rotate once per day instead of on every page load.
```js
async function loadDailyArticles() {
  // URLs of the HTML files
  const openAccessURL = 'path/to/open_access_articles.html';
  const memberOnlyURL = 'path/to/member_only_articles.html';

  // Fetch and parse the articles from each HTML file
  const openAccessArticles = await fetchArticles(openAccessURL);
  const memberOnlyArticles = await fetchArticles(memberOnlyURL);

  // Generate a daily seed based on the current date
  const dailySeed = generateDailySeed();

  // Select 3 articles from each list using the daily seed
  const selectedOpenAccess = selectSeededArticles(openAccessArticles, 3, dailySeed);
  const selectedMemberOnly = selectSeededArticles(memberOnlyArticles, 3, dailySeed + 1);

  // Combine and display the selected articles
  const articlesContainer = document.getElementById('articles-content');
  articlesContainer.innerHTML = [...selectedOpenAccess, ...selectedMemberOnly].join('');
}

// Generate a daily seed from the date (year * 1000 + day of year)
function generateDailySeed() {
  const now = new Date();
  return now.getFullYear() * 1000 + Math.floor((now - new Date(now.getFullYear(), 0, 0)) / 86400000);
}

// Fetch and parse articles from a URL
async function fetchArticles(url) {
  try {
    const response = await fetch(url);
    const htmlText = await response.text();

    // Create a temporary DOM element to parse the HTML
    const parser = new DOMParser();
    const doc = parser.parseFromString(htmlText, 'text/html');

    // Extract all article elements
    return Array.from(doc.querySelectorAll('article')).map(article => article.outerHTML);
  } catch (error) {
    console.error(`Failed to load articles from ${url}`, error);
    return [];
  }
}

// Select a specified number of articles based on a seeded shuffle
function selectSeededArticles(articles, count, seed) {
  const seededRandom = mulberry32(seed);
  const shuffled = articles
    .map((article) => ({ article, sortValue: seededRandom() }))
    .sort((a, b) => a.sortValue - b.sortValue)
    .map(({ article }) => article);
  return shuffled.slice(0, count);
}

// Pseudo-random number generator using a seed (mulberry32)
function mulberry32(seed) {
  return function () {
    let t = (seed += 0x6D2B79F5);
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Run the function to load daily articles on page load
loadDailyArticles();
```
Replace `https://yourusername.github.io/your-repository-name/articles.html` with your actual GitHub Pages URL, for example `https://tang1693.github.io/PERShtml/articles.html`.
Re-run the Python scripts to scrape the latest data and regenerate `filtered_articles_info.csv` and `articles.html`.
Commit and push the updated articles.html to your repository. GitHub Pages will automatically serve the latest version.