generated from TechLadies/nodejs-backend-starterkit
Open
Description
The current scrape implementation only fetches events from the 100 most active tech groups, because of the default sort order and pagination imposed by the Meetup website. @danielepolencic therefore suggested fetching newer groups and adding them to the DB first, then fetching events based on the groups in the DB.
This issue should be addressed with the following solution:
Step by Step description
Current Implementation:
- Fetch the 100 most active groups and their RSS URLs
- Parse the RSS URLs to get the relevant event URLs
- Fetch event details from the event URLs
- If an event doesn't already exist in the events table, add it. Otherwise update the state of the existing event.
Proposed New Implementation:
Getting groups
- Get the 100 newest groups
- If a group doesn't already exist in the groups table, add it. Otherwise update the state of the existing group.
- If any already exist, stop this task.
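The group-harvesting steps above could be sketched roughly as follows. This is only an illustration under stated assumptions: `fetchNewestGroups(page)` is a hypothetical function that returns one page of groups sorted newest first, `urlname` is an assumed unique key for a group, and a `Map` stands in for the groups table. It follows the "stop as soon as a known group is seen" rule and does not update existing rows.

```javascript
// Sketch only: `fetchNewestGroups` and `groupsTable` are hypothetical stand-ins
// for the real Meetup scraper and the groups table in the DB.
async function harvestNewGroups(fetchNewestGroups, groupsTable) {
  const added = [];
  let page = 0;
  while (true) {
    // Assumed contract: resolves to an array of groups, newest first.
    const groups = await fetchNewestGroups(page);
    if (groups.length === 0) {
      return added; // no more results to scrape
    }
    for (const group of groups) {
      if (groupsTable.has(group.urlname)) {
        // Reached a group we already store, so older ones are assumed known.
        return added;
      }
      groupsTable.set(group.urlname, group);
      added.push(group.urlname);
    }
    page += 1;
  }
}
```

Passing the fetcher and the table in as arguments keeps the stopping rule testable without network or DB access.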
Getting events
- Based on the groups table, get the RSS URLs and parse them to get the relevant event URLs
- Fetch event details from the event URLs
- If an event doesn't already exist in the events table, add it. Otherwise update the state of the existing event.
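A minimal sketch of the event steps above, again under assumptions: `getEventUrls(rssUrl)` and `getEventDetails(eventUrl)` are hypothetical helpers for the RSS parsing and event scraping, `rssUrl` is an assumed field on each group row, the event URL is assumed to be the unique key, and a `Map` stands in for the events table.

```javascript
// Sketch only: the two helper functions and the Map-based table are
// hypothetical stand-ins for the real RSS parser, scraper, and events table.
async function syncEvents(groups, eventsTable, getEventUrls, getEventDetails) {
  const summary = { inserted: 0, updated: 0 };
  for (const group of groups) {
    const eventUrls = await getEventUrls(group.rssUrl);
    for (const url of eventUrls) {
      const details = await getEventDetails(url);
      if (eventsTable.has(url)) {
        // Known event: keep the stored row and refresh its fields (e.g. status).
        eventsTable.set(url, { ...eventsTable.get(url), ...details });
        summary.updated += 1;
      } else {
        // New event: insert it keyed by its URL.
        eventsTable.set(url, { url, ...details });
        summary.inserted += 1;
      }
    }
  }
  return summary;
}
```

Keying on the event URL means the same event seen in a later run is updated rather than duplicated. In a real DB the same effect would usually come from a unique constraint plus an upsert.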
High level overview of tasks:
- Configure the harvester service to continuously scrape for groups & add them to the DB until it finds a group that already exists in the DB
- Configure the harvester service to parse RSS from groups existing in the DB
- Check for duplication of groups & events
- Add integration tests