Changes from all commits

99 commits
d19e426
Enhance Airflow environment setup and PDF browsing functionality (#3)
falric05 Apr 24, 2025
ae6d3aa
Add bootstrap script and configuration for Airflow with UID and GID
falric05 Apr 29, 2025
9be2f48
Add DL3 Explorer plugin with web interface for viewing result pipelin…
falric05 Apr 29, 2025
72ad439
Removed the service in the webgui compose, because it is integrated i…
falric05 Apr 29, 2025
14f8256
Enhance DL3 Explorer UI with improved styling and navigation links
falric05 Apr 29, 2025
fc6901b
Re-add the accidentally deleted scripts to generate input.yaml and in…
falric05 Apr 29, 2025
effbac4
Removed docker-compose.override file and updated bootsrap file to cha…
falric05 Apr 29, 2025
edb15fe
Update README.md
falric05 Apr 29, 2025
3661f42
Add exception handler for Airflow tasks and improve logger configurat…
falric05 May 13, 2025
dc122d1
Added alert monitoring system with email sending and Mailhog integration
falric05 May 14, 2025
e3e983b
Implement email alert system for task failures with configuration and…
falric05 Jun 23, 2025
ae3450a
Remove alert monitoring system and related logging configuration files
falric05 Jun 23, 2025
be07b1a
Refactor task handler for Airflow tasks and update logging to print s…
falric05 Jun 23, 2025
42cca52
Update Docker configuration and entrypoint script for COSI directory …
falric05 Jul 28, 2025
04e3b14
Add HEASARC Data Explorer plugin with folder browsing and file downlo…
falric05 Jul 28, 2025
546aa0d
Enhance data explorer functionality by displaying current path and al…
falric05 Jul 28, 2025
6d79d91
Refactor DataPipeline initialization to use environment variables for…
falric05 Jul 28, 2025
75a8778
Update Dockerfile for Airflow environment to include unzip and modify…
falric05 Sep 19, 2025
bc163fb
Add tags to fail_task DAG and enhance explorer.html with CSS comments…
falric05 Sep 19, 2025
9f992ce
Add new DAGs for COSI TS Map computation and single-task optimization
falric05 Sep 19, 2025
b567b4a
Add new parallel DAGs for testing
falric05 Sep 19, 2025
6508223
Update docker-compose.yaml to include pipeline volume for Airflow ser…
falric05 Sep 19, 2025
8b11bcd
Add new DAG and utility functions for GRB light curve generation
falric05 Oct 7, 2025
0275569
Enhance data explorer with file preview functionality and improved na…
falric05 Oct 7, 2025
f73a133
Add .env.example file for Airflow environment configuration
falric05 Oct 15, 2025
a82e6ad
Update Dockerfile for Airflow to allow dynamic user and group IDs
falric05 Oct 15, 2025
6b55034
Add BOOTSTRAP ID variables to .env.example for user configuration
falric05 Oct 15, 2025
1f3ecfc
Update docker-compose.yaml to enhance user configuration and volume m…
falric05 Oct 15, 2025
8f1f61b
Update Dockerfile for Airflow to set default user and group IDs
falric05 Oct 15, 2025
7c4bb9f
Update PostgreSQL connection string in airflow.cfg.postgresql
falric05 Oct 15, 2025
0470b82
Update README.md to enhance setup instructions and environment config…
falric05 Oct 15, 2025
c91222c
Update README.md to clarify environment setup instructions
falric05 Oct 16, 2025
9fc2baa
Update README.md and Dockerfile.airflow for user and group ID consist…
falric05 Oct 16, 2025
1cc70ef
Update Dockerfile.airflow to allow empty user and group ID arguments
falric05 Oct 16, 2025
b126d0a
Update .gitignore to exclude postgres data folder
falric05 Oct 16, 2025
56bff0d
Update .gitignore to refine data folder exclusions
falric05 Oct 16, 2025
deeb8ae
Refine .gitignore and update docker-compose volume mapping
falric05 Oct 16, 2025
4aa50dd
Refactor .gitignore and enhance README.md for PostgreSQL data setup
falric05 Oct 16, 2025
a309713
Fix typo in README.md for PostgreSQL data directory creation
falric05 Oct 17, 2025
af4356b
Update README.md to include note about Cosiflow introduction talk at …
falric05 Oct 17, 2025
b5ae435
Update README.md to improve PostgreSQL data directory setup instructions
falric05 Oct 17, 2025
ea65fc3
Enhance docker-compose and Dockerfile for improved Airflow setup
falric05 Oct 17, 2025
90bb630
Update docker-compose.yaml to include PostgreSQL password for Airflow DB
falric05 Oct 18, 2025
3df208b
Add binning scripts and input YAML files for GRB and background data …
falric05 Oct 18, 2025
f8ef935
Refactor input validation in cosipipe_lc_ops.py to remove unused YAML…
falric05 Oct 18, 2025
d941bd3
Add download_data.py script and update README.md for data retrieval i…
falric05 Oct 18, 2025
bd0f72a
Update DAG tags in cosipipe_lightcurve.py to include 'handson' and 't…
falric05 Oct 18, 2025
b4e8d86
Add hello_world_dag.py for a minimal Airflow example
falric05 Oct 18, 2025
3ef8505
Refactor TSMap pipeline and update related scripts and documentation
falric05 Oct 18, 2025
51de0d4
Add DAG Tutorial section to README.md
falric05 Oct 18, 2025
dda7576
Swap sections for Light Curve and TSmap plot pipelines in README.md f…
falric05 Oct 18, 2025
12555a7
Old DAG backup
falric05 Oct 18, 2025
284af83
Update owner in cosipipe_tsmap.py from 'cosipy_team' to 'gamma' for c…
falric05 Oct 18, 2025
6d4f8fe
Update hello_world_dag.py to include default arguments and modify DAG…
falric05 Oct 18, 2025
0a1651b
Add tutorials DAG
falric05 Oct 19, 2025
52624af
Add COSIfest Airflow tutorial slides
falric05 Oct 20, 2025
ebc2806
- Renamed Alice and Bob as A and B
falric05 Oct 20, 2025
3d2fd58
Renamed README.md
falric05 Oct 20, 2025
447259f
minor changes
falric05 Oct 20, 2025
75859fd
typo
falric05 Oct 20, 2025
71c76ae
Update .gitignore to include specific data file for tracking
falric05 Oct 21, 2025
1043ec3
Refactor data handling in pipeline scripts
falric05 Oct 21, 2025
b5280a3
Update .gitignore to track all data files in the data directory
falric05 Oct 21, 2025
23873d2
Enhance .gitignore to manage raw data files and add specific backgrou…
falric05 Oct 21, 2025
225478a
Refactor ts_map plotting functions to remove redundant skycoord param…
falric05 Oct 21, 2025
773cf8e
Add background window extraction script
falric05 Nov 11, 2025
6b1d0ca
Add paths module for file path management
falric05 Nov 11, 2025
7b89854
Add COSI pipeline initialization and staging script
falric05 Nov 11, 2025
d2d5f75
Add example usage for paths module in new test script
falric05 Nov 11, 2025
1b8933a
Add COSIDAG module for automated folder monitoring and processing
falric05 Nov 11, 2025
98260a8
Enhance COSIDAG module with input resolution and failure notification
falric05 Nov 13, 2025
f4a9248
Add example DAG for COSIDAG module
falric05 Nov 13, 2025
027a37f
Update DAG ID in cosidag_example.py for clarity and consistency
falric05 Nov 13, 2025
3db574e
Update .gitignore and remove background data file
falric05 Nov 13, 2025
2699d0a
Add configuration helpers to COSIDAG module
falric05 Nov 13, 2025
9dc34d9
Add COSIDAG TSMap DAG and associated operations
falric05 Nov 13, 2025
dbc7f0b
Add controlled parallelism settings to COSIDAG configuration
falric05 Nov 13, 2025
e29e593
Enhance COSIDAG with automatic retriggering feature
falric05 Nov 19, 2025
9507021
Refactor COSIDAG date filtering logic and enhance folder monitoring
falric05 Nov 24, 2025
4e813cb
Add date helper module for COSIDAG
falric05 Nov 24, 2025
eaa5d64
Update COSIDAG date handling and add new lcurve DAG
falric05 Nov 24, 2025
eb03d6b
Add utility functions for GRB data binning and light curve plotting
falric05 Nov 24, 2025
2786e14
Refactor mailhog_link_plugin.py for clarity and consistency
falric05 Nov 24, 2025
653c384
Refactor tutorial DAGs for improved structure and clarity
falric05 Nov 24, 2025
b73f81e
Add README.md for COSIDAG module
falric05 Nov 24, 2025
c6a740a
Update README.md
falric05 Dec 3, 2025
4c37a63
Add module directory to Docker Compose configuration
falric05 Dec 10, 2025
aa365d2
Add development environment for COSIPY tools in Dockerfile
falric05 Dec 15, 2025
418781b
Minor changes
falric05 Dec 15, 2025
ebf58b5
Remove deprecated DAGs and pipeline scripts
falric05 Dec 23, 2025
c8626e7
Add new tutorial DAGs for COSIDAG
falric05 Dec 23, 2025
b7d5978
Enhance COSIDAG functionality with optional task disabling and improv…
falric05 Dec 23, 2025
7263025
Refactor background window saving logic in bkg_cut.py
falric05 Dec 23, 2025
c06abc8
Refactor file naming logic in binning functions
falric05 Dec 23, 2025
7eca147
Add new DAG for initializing and staging COSI pipeline data
falric05 Dec 23, 2025
7ebc960
Update background file naming pattern in COSIDAG configurations
falric05 Dec 23, 2025
3afd942
Revise pipeline README to reflect new workflow and structure
falric05 Dec 23, 2025
dfb6eeb
Add comprehensive README for DAG & COSIDAG catalog
falric05 Dec 23, 2025
14782c3
Update README to introduce COSIDAG concept and enhance tutorial sections
falric05 Dec 23, 2025
6 changes: 6 additions & 0 deletions .gitignore
@@ -160,3 +160,9 @@ cython_debug/
 # and can be added to the global gitignore or merged into this file. For a more nuclear
 # option (not recommended) you can uncomment the following to ignore the entire idea folder.
 #.idea/
+
+# Data folder
+data/*
+# TODO: Remove once the cut background file is no longer needed
+# Raw data folder, excluding only the background cut file
+data/raw/
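Editor's note: the comment above says the raw data folder should exclude everything except the background cut file, but `data/raw/` on its own ignores the whole directory, and Git does not descend into ignored directories, so a later `!` negation cannot re-include a file under it. The usual pattern ignores the directory's *contents* instead; a minimal sketch, with a hypothetical filename:

```gitignore
# Ignore everything under data/raw/ except one kept file.
# NOTE: 'background_cut.fits' is a placeholder, not the repo's actual filename.
data/raw/*
!data/raw/background_cut.fits
```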
224 changes: 170 additions & 54 deletions README.md
@@ -1,101 +1,217 @@
-# cosiflow
+# Cosiflow

-The COSI SDOC pipeline based on Apache Airflow
+Cosiflow provides an Airflow-based orchestration environment for managing and monitoring scientific pipelines for COSI.

-## Build the cosiflow docker
+---

-We assume that the cosiflow repository is in your $HOME directory.
+### 1. REQUIREMENTS

-```bash
-cd $HOME/cosiflow/env
-```
+#### PREPARE THE ENVIRONMENT FILE

-Mac:
+1. Copy and rename the `.env.example` file as `.env`:
+```bash
+cd env
+cp .env.example .env
+```

-```bash
-docker build --platform linux/arm64 -t airflow:1.0.0 -f Dockerfile .
-```
+2. Open the `.env` file and set a secure password for Airflow:
+```bash
+AIRFLOW_ADMIN_PASSWORD=<YOUR_AIRFLOW_PASSWORD>
+```

-Linux:
+3. Manually bootstrap user and group IDs for the container:

-```bash
-docker build -t airflow:1.1.0 -f Dockerfile .
-```
+```bash
+id -u
+# Copy the output as YOUR_USER_ID
+
-## Execute the docker compose to start containers
+id -g
+# Copy the output as YOUR_GROUP_ID
+```

-```bash
-docker compose up -d
-```
+4. Open `.env` and modify the following environment variables:

-If you want to enter into the postgres docker container: `docker compose exec postgres bash`
+```bash
+UID=<YOUR_USER_ID>
+GID=<YOUR_GROUP_ID>
+```

-If you want to enter into the airflow docker container: `docker compose exec airflow bash`
+#### PREPARE THE DOCKERFILE

-## Connect to the web server using a browser
+Open the `Dockerfile.airflow` file and paste the same values:

-localhost:8080
+```dockerfile
+ARG UID=<YOUR_USER_ID>
+ARG GID=<YOUR_GROUP_ID>
+```
+
+#### PREPARE THE FOLDER FOR STORING POSTGRES DATA
+```bash
+cd ..
+mkdir -p data/postgres_data
+```
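Editor's note: steps 3 and 4 above can be scripted instead of copied by hand. A minimal sketch, assuming you are in the `env/` directory and `.env` already exists from step 1:

```bash
# Append the current user's IDs to .env (run from the env/ directory).
echo "UID=$(id -u)" >> .env
echo "GID=$(id -g)" >> .env
# Verify the values landed in .env
grep -E '^(UID|GID)=' .env
```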

-Note: if you use a remote server you can change the `docker-compose.yaml` file to use another port.
+---

-For example:

-```yaml
-ports:
-- "28080:8080"
-```
+### 2. BUILD THE COMPOSE

-then from your local pc you can forward the port in this way:
+Build all containers defined in `docker-compose.yml`:

-```bash
-ssh -N -L 28080:localhost:28080 [user]@[remote machine]
-```
+```bash
+cd env
+docker compose build
+```

-and open the airflow webpage from your local pc at `localhost:28080`
+⏱ Estimated build time: **~490 seconds**

-Login with username: `admin` password: `<password>`
+---

-To obtain the password `<password>` execute this command after the initialization of the containers
+### 3. RUN THE CONTAINER

-```bash
-docker compose logs | grep pass
-```
+To run with logs visible:

-### Shutdown the dockers
+```bash
+docker compose up
+```

-```bash
-docker compose down -v
-```
+To run in detached mode (no logs):
+
+```bash
+docker compose up -d
+```

-## Test the cosipy DAG
+---
+
+### 4. ENTER THE CONTAINER

-Enter in the docker airflow
+To open a terminal inside the running Airflow container:

 ```bash
 docker compose exec airflow bash
 ```

-First download the data file from wasabi.
+---

-```bash
-cd /shared_dir/pipeline
-source activate cosipy
-python initialize_pipeline.py
-```
+### 5. CONNECT TO THE AIRFLOW WEB UI

+1. Open your web browser and go to:
+
-This script downloads the input file from wasabi and move it in `/home/gamma/workspace/data`
+[http://localhost:8080/home](http://localhost:8080/home)

-Now we must activate the DAG named `"cosipt_test_v0"` from the airflow website
+2. Enter the user credentials:
+```text
+user: admin
+password: <YOUR_AIRFLOW_PASSWORD>
+```

-Then we have to copy the file in the input directory to trigger the DAG
+---
+
+### 6. STOP THE CONTAINER
+
+To stop and remove all running containers, networks, and volumes:

-```bash
-cd /home/gamma/workspace/data
-cp GalacticScan.inc1.id1.crab2hr.extracted.tra.gz input
-```
+```bash
+docker compose down -v
+```

-We should see that the DAG started to process the data.
+---

+### 7. CONFIGURATIONS
+
+Below is the list of environment variables defined in `.env` with their purpose:
+
+| Variable | Description |
+|-----------|--------------|
+| **UID** | User ID for container bootstrap |
+| **GID** | Group ID for container bootstrap |
+| **DISPLAY** | Display variable for X11 forwarding (optional) |
+| **AIRFLOW_ADMIN_USERNAME** | Default Airflow Web UI username |
+| **AIRFLOW_ADMIN_EMAIL** | Email associated with Airflow admin user |
+| **AIRFLOW_ADMIN_PASSWORD** | Secure password for Airflow Web UI |
+| **ALERT_USERS_LIST_PATH** | Path to YAML file containing user alert configurations |
+| **ALERT_SMTP_SERVER** | SMTP server used for alert notifications |
+| **ALERT_SMTP_PORT** | Port of the SMTP server |
+| **ALERT_EMAIL_SENDER** | Email address used as sender for system alerts |
+| **ALERT_LOG_PATH** | Path to Airflow log file monitored by alert system |
+| **AIRFLOW__SMTP__SMTP_STARTTLS** | Enables/disables STARTTLS (default: False) |
+| **AIRFLOW__SMTP__SMTP_SSL** | Enables/disables SMTP over SSL (default: False) |
+| **MAILHOG_WEBUI_URL** | URL for MailHog web interface (for testing alerts) |
+| **COSI_DATA_DIR** | Root directory for COSI data |
+| **COSI_INPUT_DIR** | Directory for COSI input data |
+| **COSI_LOG_DIR** | Directory for COSI log files |
+| **COSI_OBS_DIR** | Directory for observation data |
+| **COSI_TRANSIENT_DIR** | Directory for transient event data |
+| **COSI_TRIGGER_DIR** | Directory for trigger event data |
+| **COSI_MAPS_DIR** | Directory for map data products |
+| **COSI_SOURCE_DIR** | Directory for source-level data products |
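Editor's note: for reference, a minimal `.env` fragment using the variables from the table above. All values are illustrative placeholders, not defaults shipped with the repository:

```bash
UID=1000
GID=1000
AIRFLOW_ADMIN_USERNAME=admin
AIRFLOW_ADMIN_EMAIL=admin@example.org
AIRFLOW_ADMIN_PASSWORD=change-me        # set your own secure password
ALERT_SMTP_SERVER=mailhog               # MailHog is used for testing alerts
ALERT_SMTP_PORT=1025
COSI_DATA_DIR=/home/gamma/workspace/data
```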

+---
+
+### NOTES
+
+- Make sure `.env` and `docker-compose.yml` are located in the same directory.
+- Do **not** commit your personal `.env` file to version control.
+- To inspect container logs, use:
+```bash
+docker compose logs -f airflow
+```
+
+---
+
+**Cosiflow environment ready for use.**
+
+---
+
+## What is COSIDAG
+
+A **COSIDAG** (COSI DAG) is a structured abstraction built on top of Apache Airflow DAGs.
+
+It provides a **standardized workflow layout** for scientific pipelines, reducing boilerplate and enforcing consistent patterns across different analyses.
+
+In particular, a COSIDAG:
+
+* defines a common execution skeleton (input resolution, optional monitoring, result handling)
+* encapsulates best practices for:
+  * file discovery
+  * parameter propagation
+  * XCom-based communication
+* allows developers to focus only on **scientific tasks**, while orchestration logic is handled automatically
+
+COSIDAGs are used for all production scientific pipelines (e.g. Light Curve, TS Map), while standard DAGs are reserved for orchestration, testing, or utilities.
+
+**How to write and customize a COSIDAG** is explained in detail in the [tutorial section](modules/README.md).
+
+---
+
+## Tutorials and developer guide
+
+A complete, step-by-step guide on how to:
+
+* understand the COSIDAG execution model
+* write new COSIDAGs
+* add custom tasks
+* use XCom correctly
+* integrate external Python environments
+
+is available in:
+
+[tutorial section](tutorials/README.md).
+
+This is the **recommended starting point for developers**.
+
+---
+
+## Available DAGs and COSIDAGs
+
+A complete and up-to-date list of all DAGs and COSIDAGs implemented in this repository — including:
+
+* workflow purpose
+* inputs and outputs
+* task structure
+* operators used
+* XCom usage
+
-This directory `/home/gamma/workspace/heasarc/dl0` contains several folders with this format `2025-01-24_14-31-56`.
+is documented in: [DAG and COSIDAG LIST README](dags/README.md)
+
-Inside the folder we have the results of the analysis.
+This document serves as the **catalog and reference** for all workflows available in Cosiflow.
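Editor's note: the COSIDAG idea described in the README above (a fixed execution skeleton wrapping user-supplied science tasks) can be sketched in plain Python. This is an illustrative model only: the names `CosiDag`, `resolve_inputs`, and `handle_results` are hypothetical, the real API lives in `modules/` (see its README), and plain return values stand in for XCom:

```python
from typing import Callable


class CosiDag:
    """Sketch of the COSIDAG pattern: a fixed skeleton around science tasks."""

    def __init__(self, dag_id: str, science_tasks: list[Callable]):
        self.dag_id = dag_id
        self.science_tasks = science_tasks

    def resolve_inputs(self) -> dict:
        # In the real module this step discovers input files and propagates parameters.
        return {"inputs": ["run_001.h5"], "params": {"emin": 100}}

    def handle_results(self, results: list) -> list:
        # Result-handling / notification hook; pass-through in this sketch.
        return results

    def run(self) -> list:
        context = self.resolve_inputs()
        # Plain return values stand in for XCom-based communication.
        results = [task(context) for task in self.science_tasks]
        return self.handle_results(results)


# The developer only writes the science tasks; orchestration is handled above.
def bin_data(ctx: dict) -> str:
    return f"binned:{ctx['inputs'][0]}"


def make_lightcurve(ctx: dict) -> str:
    return f"lightcurve(emin={ctx['params']['emin']})"


dag = CosiDag("lc_example", [bin_data, make_lightcurve])
print(dag.run())  # → ['binned:run_001.h5', 'lightcurve(emin=100)']
```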
Empty file added callbacks/__init__.py
Empty file.
82 changes: 82 additions & 0 deletions callbacks/on_failure_callback.py
@@ -0,0 +1,82 @@
+import yaml
+import os
+from airflow.utils.email import send_email
+import urllib.parse
+
+ALERT_CONFIG_PATH = os.getenv("ALERT_USERS_LIST_PATH", "/home/gamma/env/alert_users.yaml")
+
+
+def load_alert_config():
+    with open(ALERT_CONFIG_PATH, "r") as f:
+        return yaml.safe_load(f)
+
+
+def get_recipients(keyword: str) -> list[str]:
+    config = load_alert_config()
+    matched_groups = set()
+
+    for rule in config.get("rules", []):
+        if rule["pattern"] == keyword:
+            matched_groups.update(rule["notify"])
+
+    emails = set()
+    for group in matched_groups:
+        group_data = config["groups"].get(group)
+        if group_data:
+            emails.update(group_data.get("emails", []))
+    return sorted(emails)
+
+
+def notify_email(context):
+    task = context["task_instance"]
+    dag_id = task.dag_id
+    task_id = task.task_id
+    run_id = task.run_id
+    execution_date = context.get("execution_date")
+
+    # URL-encode the query parameters for safety
+    base_url = "http://localhost:8080"
+    query = urllib.parse.urlencode({
+        "execution_date": execution_date.isoformat(),
+        "tab": "logs",
+        "dag_run_id": run_id,
+        "task_id": task_id
+    })
+    log_url = f"{base_url}/dags/{dag_id}/grid?{query}"
+
+    # Local log path (customizable)
+    log_path = f"/home/gamma/airflow/logs/dag_id={dag_id}/run_id={run_id}/task_id={task_id}/attempt=1.log"
+    if not os.path.exists(log_path):
+        log_preview = "⚠️ Log file not found."
+    else:
+        with open(log_path, "r") as f:
+            lines = f.readlines()[-30:]  # last 30 lines
+        log_preview = "".join(lines)
+        log_preview = log_preview.replace("<", "&lt;").replace(">", "&gt;")  # escape HTML
+
+    recipients = get_recipients("ALERT_FAIL")
+    if not recipients:
+        return  # no recipients, skip
+
+    subject = f"[ALERT] Task {task.task_id} in DAG {task.dag_id} has failed"
+    html_content = f"""
+    <html>
+    <body style="font-family:Arial, sans-serif; font-size:14px; color:#333;">
+        <h2 style="color:#c0392b;">⚠️ Task Failure Alert</h2>
+        <table style="border-collapse:collapse;">
+            <tr><td><strong>DAG:</strong></td><td>{dag_id}</td></tr>
+            <tr><td><strong>Task:</strong></td><td>{task_id}</td></tr>
+            <tr><td><strong>Execution Time:</strong></td><td>{execution_date}</td></tr>
+            <tr><td><strong>Log URL:</strong></td><td><a href="{log_url}">{log_url}</a></td></tr>
+        </table>
+        <h3 style="margin-top:20px;">🔍 Log Preview</h3>
+        <pre style="background-color:#f5f5f5; padding:10px; border:1px solid #ccc; max-height:300px; overflow:auto;">
+{log_preview}
+        </pre>
+        <p style="color:#888; font-size:12px;">Full log available at the link above.</p>
+    </body>
+    </html>
+    """
+
+    send_email(to=recipients, subject=subject, html_content=html_content)
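Editor's note: `load_alert_config()` and `get_recipients()` above imply a YAML schema with `rules` (pattern → notify groups) and `groups` (group → email lists). Below is a sketch of such a file and the same resolution logic, rewritten to take the already-parsed structure as an argument so it runs without PyYAML or Airflow; the group names and addresses are hypothetical:

```python
# A hypothetical alert_users.yaml matching the schema get_recipients() expects:
#
#   rules:
#     - pattern: ALERT_FAIL
#       notify: [operators, developers]
#   groups:
#     operators:
#       emails: [ops@example.org]
#     developers:
#       emails: [dev@example.org]

# The parsed structure, written out as a plain dict:
config = {
    "rules": [{"pattern": "ALERT_FAIL", "notify": ["operators", "developers"]}],
    "groups": {
        "operators": {"emails": ["ops@example.org"]},
        "developers": {"emails": ["dev@example.org"]},
    },
}


def get_recipients(config: dict, keyword: str) -> list[str]:
    # Collect the groups whose rule pattern matches the keyword...
    matched = set()
    for rule in config.get("rules", []):
        if rule["pattern"] == keyword:
            matched.update(rule["notify"])
    # ...then flatten their email lists, deduplicated and sorted (deterministic order).
    emails = set()
    for group in matched:
        data = config["groups"].get(group)
        if data:
            emails.update(data.get("emails", []))
    return sorted(emails)


print(get_recipients(config, "ALERT_FAIL"))  # → ['dev@example.org', 'ops@example.org']
```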