Simeon Flühmann (fluehsi2)
Elias Hager (hagereli)
Joanna Gutbrod (gutbrjoa)
WAVE aims to link current events with relevant Wikipedia articles, providing a platform to visualize how changes to these articles relate to ongoing events.
More information about this project is available in the blog post.
During WAVE’s development, we explored various approaches for identifying the most relevant Wikipedia articles for a given topic. The code and evaluation of these approaches can be found in the research branch of this repository.
Before deploying the project, ensure the following tools are installed on your system:
- Docker
- Docker Compose
- Git
- Clone this repository:
git clone <repository-url>
- Move into the repository directory:
cd WAVE
- Initialize and clone the Git submodules:
git submodule update --init --recursive
This ensures that all necessary submodules are included in your local copy of the repository.
- Move into the source directory:
cd src
- Copy the .env.example file to .env:
cp .env.example .env
Then fill in the required values in .env, such as database credentials, API keys, and ports.
- Build the required Docker images:
docker compose --profile build build
- Start the Docker Compose stack:
docker compose --profile deploy up -d
The -d flag runs the containers in detached mode (in the background).
Once the stack is running, the services are available at:
- Frontend: http://localhost:<FRONTEND_PORT> (default port: 5000)
- Orchestrator Dashboard: http://localhost:<DASHBOARD_PORT>/admin (default port: 5050, default password: changeme)
- Set up recurring tasks in the Orchestrator Dashboard.
- To test the data collector, run it once from the command line:
docker run --rm --env-file .env --name data-collector-test --network wave_default data-collector --date "latest"
- For a production deployment, configure Nginx Proxy Manager (NginxPM) through its dashboard at http://localhost:81.
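The services listed above can be checked with a small HTTP probe once the stack is up. The following is a minimal standard-library sketch, not part of WAVE itself: it assumes the ports are configured under the keys FRONTEND_PORT and DASHBOARD_PORT (taken from the URL placeholders above; the actual names in .env.example may differ) and falls back to the documented defaults 5000 and 5050.

```python
import os
import urllib.error
import urllib.request

# Default ports documented in this README. The key names FRONTEND_PORT and
# DASHBOARD_PORT are assumed from the URL placeholders; check .env.example.
DEFAULT_PORTS = {"FRONTEND_PORT": "5000", "DASHBOARD_PORT": "5050"}


def service_urls(env: dict[str, str]) -> dict[str, str]:
    """Build the frontend and dashboard URLs, falling back to the default ports."""
    frontend = env.get("FRONTEND_PORT") or DEFAULT_PORTS["FRONTEND_PORT"]
    dashboard = env.get("DASHBOARD_PORT") or DEFAULT_PORTS["DASHBOARD_PORT"]
    return {
        "frontend": f"http://localhost:{frontend}",
        "dashboard": f"http://localhost:{dashboard}/admin",
    }


def probe(url: str, timeout: float = 5.0) -> str:
    """Send one GET request and summarize the outcome as a short string."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return f"HTTP {response.status}"
    except urllib.error.HTTPError as exc:
        # The service answered, but with an error status such as 401 or 500.
        return f"HTTP {exc.code}"
    except OSError as exc:
        # Covers refused connections, DNS errors and timeouts (URLError is an OSError).
        return f"unreachable ({exc})"


if __name__ == "__main__":
    # Ports are read from the environment (e.g. after exporting the .env values);
    # unset or empty keys fall back to the defaults.
    for name, url in service_urls(dict(os.environ)).items():
        print(f"{name:<10} {url:<35} {probe(url)}")
```

Any HTTP status means the container is listening; "unreachable" usually means the container failed to start or the port mapping differs, in which case docker logs <container_name> is the next thing to check.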
Troubleshooting:
- Docker container issues: check the logs of the affected container (docker ps -a lists all container names, including stopped ones):
docker logs <container_name>
- Environment variable issues: ensure that the .env file defines every variable listed in .env.example, with one KEY=value entry per line.
- Port conflicts: verify that the ports specified in the .env file are not already in use by other processes.
- Rebuilding images: if you change the code or configuration, rebuild the images:
docker compose --profile build build
Then run docker compose --profile deploy up -d again so that the containers are recreated from the new images.
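The environment-variable check from the troubleshooting list above can be automated by comparing .env with .env.example. This is a hypothetical helper, not shipped with WAVE, and its parser is deliberately simple: it skips blank lines and # comments, accepts KEY=value (whitespace around = allowed), strips one pair of surrounding quotes, and flags everything else. It does not implement the full Docker Compose .env syntax (inline comments, multi-line values, interpolation).

```python
import re
from pathlib import Path

# KEY=value, where KEY is a shell-style variable name.
LINE_RE = re.compile(r"^([A-Za-z_][A-Za-z0-9_]*)\s*=\s*(.*)$")


def parse_env(text: str) -> tuple[dict[str, str], list[int]]:
    """Parse KEY=value lines; return the values and the numbers of malformed lines."""
    values: dict[str, str] = {}
    bad_lines: list[int] = []
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments are ignored
        match = LINE_RE.match(line)
        if match is None:
            bad_lines.append(number)
            continue
        key, value = match.group(1), match.group(2).strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        values[key] = value
    return values, bad_lines


def check_env(example_text: str, env_text: str) -> list[str]:
    """List problems in .env compared to .env.example."""
    expected, _ = parse_env(example_text)
    actual, bad_lines = parse_env(env_text)
    problems = [f".env line {n} is not in KEY=value form" for n in bad_lines]
    for key in expected:
        if key not in actual:
            problems.append(f"{key} is missing from .env")
        elif actual[key] == "":
            problems.append(f"{key} is empty in .env")
    return problems


if __name__ == "__main__":
    # Run from the src directory, next to .env and .env.example.
    example, env = Path(".env.example"), Path(".env")
    if not (example.is_file() and env.is_file()):
        print("Run this from the directory containing .env and .env.example.")
    else:
        issues = check_env(example.read_text(), env.read_text())
        print("\n".join(issues) if issues else "No problems found.")
```

Empty values are reported because every key in .env.example is assumed to be required here; ignore those reports for settings that are optional in your setup.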
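The port-conflict check from the troubleshooting list above can also be scripted before starting the stack. In this standard-library sketch, a port counts as in use if a TCP connection to it on localhost succeeds; connecting instead of binding avoids the elevated privileges that ports below 1024 (such as 81) would need. The port list uses the defaults mentioned in this README and is an assumption; replace it with the values from your .env.

```python
import socket


def is_port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising on connection errors.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    # Defaults from this README; adjust them to the ports set in your .env.
    ports = {"frontend": 5000, "orchestrator dashboard": 5050, "Nginx Proxy Manager": 81}
    for name, port in ports.items():
        state = "IN USE" if is_port_in_use(port) else "free"
        print(f"{port:>5} ({name}): {state}")
```

Run it while the WAVE stack is stopped, since its own ports naturally report IN USE once it is up. A process listening only on a non-loopback interface is not detected by this check.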
For local development, you can modify the services and rebuild the images as needed. To stop and remove the stack's containers, run:
docker compose --profile deploy down
To clean up unused Docker resources (note that this applies to the whole Docker host, not only to WAVE), run:
docker system prune -f
Below is the directory structure of the project, with explanations for the most important files:
WAVE
│
├── README.md # Project documentation (this file)
├── docker-compose.yml # Docker Compose configuration for all services
├── .env.example # Example environment variables file
│
└── src # Main source folder for all services
├── data-collector # Service for collecting and processing data
│ ├── run.py # Main script for data collection
│ ├── clean_data.py # Cleans and preprocesses collected data
│ ├── clustering.py # Clustering logic for data analysis
│ ├── Dockerfile # Dockerfile for the data-collector service
│ └── ... # Other scripts for data collection
│
├── frontend # Service for the user interface
│ ├── app.py # Main Flask application for the frontend
│ ├── visualisation.py # Visualization logic for displaying data
│ ├── Dockerfile # Dockerfile for the frontend service
│ ├── static # Static assets (CSS, JS, images)
│ ├── templates # HTML templates for the frontend
│ └── ... # Other frontend-related files
│
├── history-collector # Service for collecting Wikipedia history data
│ ├── run.py # Main script for collecting historical data
│ ├── safe_wiki_to_db.py # Saves Wikipedia data to the database
│ ├── Dockerfile # Dockerfile for the history-collector service
│ └── ... # Other scripts for history collection
│
└── orchestrator # Service for orchestrating tasks
├── app.py # Main Flask application for orchestrating tasks
├── queue_api.py # API for managing task queues
├── utils.py # Utility functions for the orchestrator
├── Dockerfile # Dockerfile for the orchestrator service
└── ... # Other orchestrator-related files
