
hrss implementation #66

Open

katyagovorkova wants to merge 19 commits into main from for-nplm

Conversation

katyagovorkova (Collaborator) commented Feb 25, 2026

  1. Add script to make an offline O3 dataset
  2. cWB-like cuts
  3. Added hrss computation in the dataloader. For this, ML4GW has to be modified to return SNR scaling constants. The implementation was fixed by @asasli.
  4. Optional whitening in the dataloader
  5. Added hrss and correlation variables to the offline dataset generation
  6. Temporarily removed correlation cuts from plotting, but added an hrss-based efficiency
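The dataloader code itself is not shown in this conversation, but the quantity being added has a standard definition: hrss is the root-sum-squared strain amplitude of an injected waveform, h_rss = sqrt(∫ (h₊² + h×²) dt). A minimal numpy sketch (the function name and arguments are illustrative, not the PR's actual API):

```python
import numpy as np

def hrss(hplus: np.ndarray, hcross: np.ndarray, sample_rate: float) -> float:
    """Root-sum-squared strain amplitude of an injected waveform.

    h_rss = sqrt( integral (h_+^2 + h_x^2) dt ), approximated here
    as a discrete sum over samples with dt = 1 / sample_rate.
    """
    dt = 1.0 / sample_rate
    return float(np.sqrt(np.sum(hplus**2 + hcross**2) * dt))

# A unit-amplitude 100 Hz sine over exactly 1 s has mean(sin^2) = 0.5,
# so its hrss is sqrt(0.5) ~ 0.707.
fs = 4096.0
t = np.arange(0, 1.0, 1.0 / fs)
h = np.sin(2 * np.pi * 100 * t)
print(hrss(h, np.zeros_like(h), fs))  # ≈ 0.707
```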

Katya Govorkova and others added 17 commits October 28, 2025 08:06
This script:
- Loads JIT-compiled ResNet embedding model
- Processes continuous strain data from h5 files
- Applies whitening and bandpassing (matching training)
- Segments data into 1-second windows
- Computes embeddings for each window
- Handles BBC background (with valid clean times) and O4 signals
- Saves embeddings to HDF5 files

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
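The segmentation step in the commit above (continuous strain split into 1-second windows) can be sketched as follows; `segment_strain` is a hypothetical helper, not the script's actual function:

```python
import numpy as np

def segment_strain(strain: np.ndarray, sample_rate: int,
                   window_s: float = 1.0) -> np.ndarray:
    """Split a continuous strain series into non-overlapping windows.

    Returns an array of shape (n_windows, window_s * sample_rate);
    any trailing partial window is dropped.
    """
    n = int(window_s * sample_rate)
    n_windows = len(strain) // n
    return strain[: n_windows * n].reshape(n_windows, n)

x = np.arange(10000.0)            # ~2.44 s of samples at 4096 Hz
windows = segment_strain(x, 4096)
print(windows.shape)              # (2, 4096): the partial third window is dropped
```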
Documents usage, arguments, workflow, and data formats for the
compute_embeddings.py script.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
This script:
- Loads computed embeddings from HDF5 or .npy files
- Loads reference embeddings for comparison
- Computes mean and std for each embedding dimension
- Compares statistics and reports differences
- Checks similarity within tolerance
- Provides detailed dimension-by-dimension comparison

Usage:
  python scripts/test_embeddings.py --computed output/embeddings/o4_test_embeddings.h5
  python scripts/test_embeddings.py --help

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
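The statistics comparison described in the commit above (per-dimension mean/std checked within tolerance) can be sketched like this; `compare_embeddings` and its tolerances are assumptions, not the script's real interface:

```python
import numpy as np

def compare_embeddings(computed: np.ndarray, reference: np.ndarray,
                       rtol: float = 1e-3, atol: float = 1e-5) -> bool:
    """Compare per-dimension mean and std of two embedding sets.

    Both arrays have shape (n_samples, n_dims); returns True when both
    statistics agree within tolerance for every dimension.
    """
    ok_mean = np.allclose(computed.mean(axis=0), reference.mean(axis=0),
                          rtol=rtol, atol=atol)
    ok_std = np.allclose(computed.std(axis=0), reference.std(axis=0),
                         rtol=rtol, atol=atol)
    return bool(ok_mean and ok_std)

rng = np.random.default_rng(0)
a = rng.normal(size=(1000, 8))
print(compare_embeddings(a, a.copy()))   # True: identical statistics
print(compare_embeddings(a, a + 1.0))    # False: means shifted by 1
```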
Issues fixed:
- Validate that valid_times have enough data BEFORE them for PSD
- Add explicit segment size validation before stacking
- Skip segments that don't have expected number of samples
- Prevent empty batches from being processed

This fixes "Number of samples 0 in input x is insufficient for
number of fft samples 8192" errors at file boundaries.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
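The boundary fix above amounts to filtering out short segments before stacking and refusing to process an empty batch. A minimal sketch under those assumptions (the helper name is illustrative):

```python
import numpy as np

def stack_valid_segments(segments, expected_samples):
    """Keep only segments with the expected sample count before stacking.

    Segments at file boundaries can come up short; stacking ragged
    arrays fails, and whitening an empty batch raises an FFT error.
    """
    valid = [s for s in segments if len(s) == expected_samples]
    if not valid:          # prevent an empty batch from being processed
        return None
    return np.stack(valid)

segs = [np.zeros(8192), np.zeros(8192), np.zeros(100)]  # last one truncated
batch = stack_valid_segments(segs, 8192)
print(batch.shape)  # (2, 8192)
```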
Changes all default path arguments and documentation to use
gwak/output/ instead of output/ to match the actual directory
structure on the cluster.

Files updated:
- scripts/compute_embeddings.py: Default args for model, data dirs, output
- scripts/test_embeddings.py: Auto-detect paths
- scripts/README.md: Documentation examples

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Corrects the default model path to use ResNet_HL instead of
ResNet_HK to match the actual model directory.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
This script calculates signal detection efficiencies by:
- Converting embedding trigger indices to GPS times
- Matching triggers to injections from h5 injection files
- Computing efficiency vs SNR for each signal type
- Generating efficiency plots and summary statistics

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
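Two of the steps in the commit above, converting a trigger's window index to a GPS time and binning recovered injections by SNR, can be sketched as follows (function names, bin edges, and the window length are assumptions for illustration):

```python
import numpy as np

def index_to_gps(index: int, file_start_gps: float, window_s: float = 1.0) -> float:
    """GPS start time of the window at a given index within a file."""
    return file_start_gps + index * window_s

def efficiency_vs_snr(injection_snrs, detected_mask, bins):
    """Fraction of injections recovered in each SNR bin (NaN for empty bins)."""
    which = np.digitize(injection_snrs, bins)
    eff = np.full(len(bins) + 1, np.nan)
    for b in range(len(bins) + 1):
        in_bin = which == b
        if in_bin.any():
            eff[b] = detected_mask[in_bin].mean()
    return eff

print(index_to_gps(3, 1_000_000_000.0))  # 1000000003.0

snrs = np.array([2.0, 5.0, 5.5, 12.0, 15.0])
found = np.array([False, True, False, True, True])
# Bins: below 4, 4-8, above 8
print(efficiency_vs_snr(snrs, found, bins=np.array([4.0, 8.0])))  # [0.  0.5 1. ]
```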
katyagovorkova changed the title from "hrss and NPLM" to "hrss implementation" Feb 25, 2026

katyagovorkova (Author) commented:

@AndyC80297, why are all the embedding-related new files from you not in main? I can't figure out why they are in my branch but not in main...

katyagovorkova mentioned this pull request Feb 25, 2026

3 participants