This project automates fetching article data from the PERS page on IngentaConnect, saving it to a CSV file, and generating an HTML file that displays the articles dynamically on a website.
- Scrape Data from PERS Webpage: Use Python to read research articles from the IngentaConnect PERS webpage, extracting titles, authors, page numbers, access status, and URLs.
- Save to CSV: Store the scraped article data in a CSV file (`filtered_articles_info.csv`).
- Generate HTML: Convert the CSV file content into an HTML file (`articles.html`) for embedding in a web page.
- Load Articles HTML Dynamically: Integrate `articles.html` into your main webpage using JavaScript to load the content dynamically.
- Execute the Python script to scrape articles from the PERS webpage on IngentaConnect.
- The script saves the article data to `filtered_articles_info.csv`.
Sample Python Code:
Ensure the Python script is set up correctly to fetch the articles and output them to filtered_articles_info.csv.
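A minimal sketch of such a scraper, using only the Python standard library, is shown below. The PERS listing URL and the anchor class name are assumptions and must be verified against the live IngentaConnect page; the real script also captures authors, page numbers, and access status.

```python
# Illustrative sketch only: PERS_URL and the "title" anchor class are
# assumptions; inspect the live IngentaConnect page to confirm them.
import csv
import urllib.request
from html.parser import HTMLParser

PERS_URL = "https://www.ingentaconnect.com/content/asprs/pers"  # assumed URL


class ArticleLinkParser(HTMLParser):
    """Collect (title, href) pairs from <a> tags whose class contains 'title'
    (the class name is a guess; adjust after inspecting the real page)."""

    def __init__(self):
        super().__init__()
        self.articles = []
        self._in_title_link = False
        self._href = ""
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "title" in (attrs.get("class") or ""):
            self._in_title_link = True
            self._href = attrs.get("href", "")
            self._text = []

    def handle_data(self, data):
        if self._in_title_link:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_title_link:
            self._in_title_link = False
            self.articles.append(
                {"title": "".join(self._text).strip(), "url": self._href}
            )


def scrape_to_csv(url=PERS_URL, path="filtered_articles_info.csv"):
    """Download the listing page, parse article links, and write the CSV."""
    with urllib.request.urlopen(url) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    parser = ArticleLinkParser()
    parser.feed(html)
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["title", "url"])
        writer.writeheader()
        writer.writerows(parser.articles)
```

Call `scrape_to_csv()` to fetch the page and produce `filtered_articles_info.csv`. A production version would add more columns and error handling.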
- After obtaining `filtered_articles_info.csv`, run the second Python script to generate `articles.html` based on the CSV data.
- The HTML file will format each article according to the desired layout.
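The CSV-to-HTML step could look roughly like the sketch below. The column names (`title`, `authors`, `pages`, `access`, `url`) and the `<article>` markup are assumptions about the actual script's input and output format.

```python
# Hypothetical sketch: the CSV column names and the <article> markup
# are assumptions; match them to your actual files.
import csv
from html import escape


def generate_articles_html(csv_path="filtered_articles_info.csv",
                           html_path="articles.html"):
    """Render each CSV row as an <article> block and write articles.html."""
    blocks = []
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            blocks.append(
                "<article>\n"
                f'  <h3><a href="{escape(row["url"])}">'
                f'{escape(row["title"])}</a></h3>\n'
                f'  <p>{escape(row["authors"])}, pp. {escape(row["pages"])}'
                f' ({escape(row["access"])})</p>\n'
                "</article>"
            )
    with open(html_path, "w", encoding="utf-8") as f:
        f.write("\n".join(blocks))
```

Note the `html.escape` calls: titles and author names scraped from the web may contain characters that would otherwise break the generated markup.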
- Add `articles.html` to your GitHub repository.
- Commit and push the file to make it available for GitHub Pages.
- Go to your repository on GitHub.
- Navigate to Settings > Pages.
- In the Source section, select the branch where `articles.html` is stored (typically `main`) and set the root directory.
- Save your settings. GitHub Pages will provide a URL for your repository, such as `https://yourusername.github.io/your-repository-name/`.
Once GitHub Pages is set up, you can access `articles.html` at `https://yourusername.github.io/your-repository-name/articles.html`.
- Create a Placeholder `<div>`: In your main HTML file where you want the articles to appear, add a `<div>` with an ID (e.g., `articles-content`):

  ```html
  <div id="articles-content"></div>
  ```
- Add JavaScript to Load `articles.html`: Include the following JavaScript code in your main HTML file to fetch and inject `articles.html` into the `#articles-content` div:

  ```js
  document.addEventListener("DOMContentLoaded", function () {
    // Fetch the articles HTML content and inject it into the placeholder div
    fetch('https://yourusername.github.io/your-repository-name/articles.html')
      .then(response => {
        if (!response.ok) {
          throw new Error("Network response was not ok " + response.statusText);
        }
        return response.text();
      })
      .then(html => {
        document.getElementById('articles-content').innerHTML = html;
      })
      .catch(error => {
        console.error("Failed to load articles.html:", error);
        document.getElementById('articles-content').innerHTML =
          "<p>Failed to load articles. Please try again later.</p>";
      });
  });
  ```
The following JavaScript reads an open-access (OA) HTML pool and a non-OA HTML pool, randomly selects 3 articles from each, and refreshes the selection on every page load.
```js
async function loadRandomArticles() {
  // URLs of the HTML files
  const openAccessURL = 'path/to/open_access_articles.html';
  const memberOnlyURL = 'path/to/member_only_articles.html';

  // Fetch and parse the articles from each HTML file
  const openAccessArticles = await fetchArticles(openAccessURL);
  const memberOnlyArticles = await fetchArticles(memberOnlyURL);

  // Randomly select 3 articles from each group
  const selectedOpenAccess = selectRandomArticles(openAccessArticles, 3);
  const selectedMemberOnly = selectRandomArticles(memberOnlyArticles, 3);

  // Combine and display the selected articles
  const articlesContainer = document.getElementById('articles-content');
  articlesContainer.innerHTML = [...selectedOpenAccess, ...selectedMemberOnly].join('');
}

// Fetch and parse articles from a URL
async function fetchArticles(url) {
  try {
    const response = await fetch(url);
    const htmlText = await response.text();

    // Create a temporary DOM element to parse the HTML
    const parser = new DOMParser();
    const doc = parser.parseFromString(htmlText, 'text/html');

    // Extract all article elements
    return Array.from(doc.querySelectorAll('article')).map(article => article.outerHTML);
  } catch (error) {
    console.error(`Failed to load articles from ${url}`, error);
    return [];
  }
}

// Select a specified number of random articles from the list
// (Fisher-Yates shuffle on a copy: unbiased, and leaves the input intact)
function selectRandomArticles(articles, count) {
  const shuffled = [...articles];
  for (let i = shuffled.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [shuffled[i], shuffled[j]] = [shuffled[j], shuffled[i]];
  }
  return shuffled.slice(0, count);
}

// Run the function to load random articles on page load
loadRandomArticles();
```
The following JavaScript reads the same OA and non-OA HTML pools, but seeds the random selection with the current date, so the 3 articles from each pool rotate once per day instead of on every page load.
```js
async function loadDailyArticles() {
  // URLs of the HTML files
  const openAccessURL = 'path/to/open_access_articles.html';
  const memberOnlyURL = 'path/to/member_only_articles.html';

  // Fetch and parse the articles from each HTML file
  const openAccessArticles = await fetchArticles(openAccessURL);
  const memberOnlyArticles = await fetchArticles(memberOnlyURL);

  // Generate a daily seed based on the current date
  const dailySeed = generateDailySeed();

  // Select 3 articles from each list using the daily seed
  const selectedOpenAccess = selectSeededArticles(openAccessArticles, 3, dailySeed);
  const selectedMemberOnly = selectSeededArticles(memberOnlyArticles, 3, dailySeed + 1);

  // Combine and display the selected articles
  const articlesContainer = document.getElementById('articles-content');
  articlesContainer.innerHTML = [...selectedOpenAccess, ...selectedMemberOnly].join('');
}

// Generate a daily seed from the date (year * 1000 + day of year)
function generateDailySeed() {
  const now = new Date();
  return now.getFullYear() * 1000 + Math.floor((now - new Date(now.getFullYear(), 0, 0)) / 86400000);
}

// Fetch and parse articles from a URL
async function fetchArticles(url) {
  try {
    const response = await fetch(url);
    const htmlText = await response.text();

    // Create a temporary DOM element to parse the HTML
    const parser = new DOMParser();
    const doc = parser.parseFromString(htmlText, 'text/html');

    // Extract all article elements
    return Array.from(doc.querySelectorAll('article')).map(article => article.outerHTML);
  } catch (error) {
    console.error(`Failed to load articles from ${url}`, error);
    return [];
  }
}

// Select a specified number of articles based on a seeded shuffle
function selectSeededArticles(articles, count, seed) {
  const seededRandom = mulberry32(seed);
  const shuffled = articles
    .map((article) => ({ article, sortValue: seededRandom() }))
    .sort((a, b) => a.sortValue - b.sortValue)
    .map(({ article }) => article);
  return shuffled.slice(0, count);
}

// Pseudo-random number generator using a seed (mulberry32)
function mulberry32(seed) {
  return function () {
    let t = (seed += 0x6D2B79F5);
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Run the function to load daily articles on page load
loadDailyArticles();
```
Replace `https://yourusername.github.io/your-repository-name/articles.html` with your actual GitHub Pages URL, for example `https://tang1693.github.io/PERShtml/articles.html`.
Re-run the Python scripts to scrape the latest data and regenerate `filtered_articles_info.csv` and `articles.html`.
Commit and push the updated articles.html to your repository. GitHub Pages will automatically serve the latest version.