Improve dataset output information#92

Merged
loucerac merged 1 commit into develop from feature_improve_outputs
Nov 20, 2025

@dlopez-bioinfo
Collaborator

Summary

Replace basic shape tuple printing with formatted, descriptive output messages in the dataset loading functions.

Changes

  • get_disease_data() (line 494): Changed print(pathvals.shape) to formatted message showing "samples × circuits"
  • get_data() (lines 524-526): Replaced print(gene_xpr.shape, pathvals.shape) with structured "Data Summary" section showing labeled dimensions for both gene expression features and pathway activities
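
For illustration, the kind of labeled output described in the list above could be sketched as follows. This is not the code from this PR: the helper names (format_shape, format_data_summary), the exact wording of the labels, and the example shapes are all assumptions made for the sketch, and it works on plain shape tuples so that it runs without the project's data.

```python
def format_shape(label, shape, row_name="samples", col_name="features"):
    """Return a labeled one-line description of a 2-D shape tuple.

    Hypothetical helper: the name and the wording are illustrative only.
    """
    n_rows, n_cols = shape
    return f"{label}: {n_rows} {row_name} × {n_cols} {col_name}"


def format_data_summary(gene_xpr_shape, pathvals_shape):
    """Build a small 'Data Summary' block from two shape tuples."""
    lines = [
        "Data Summary",
        "  " + format_shape("Gene expression", gene_xpr_shape, col_name="genes"),
        "  " + format_shape("Pathway activities", pathvals_shape, col_name="circuits"),
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up shapes, only to show the layout of the message.
    print(format_data_summary((5, 100), (5, 20)))
```

Compared with printing the raw tuple, each number carries its meaning, so a reader does not need to remember which axis holds samples and which holds circuits.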

Benefits

  • More user-friendly output with clear labels
  • Better readability for understanding data dimensions
  • Consistent with informative logging practices

@dlopez-bioinfo
Collaborator Author

CI Build/Test Failure — Possibly Related to Test Input Data

The CI build failure may be caused by inconsistencies in the test input data files.
Specifically, some expected pathway or circuit IDs are missing from the test dataset columns.

Observed Behavior

pathvals test input (temporary file):

pathvals /tmp/tmp2dmsa5ts/pathvals.tsv.gz
                              P.hsa03320.28  ...  P.hsa04920.43
index                                        ...               
GTEX-1KXAM-0005-SM-DIPEC           0.244546  ...       0.380935
GTEX-1J8Q2-2226-SM-CM2TZ           0.286403  ...       0.511682
GTEX-11TTK-2726-SM-5GU58           0.201513  ...       0.245063
GTEX-18A7A-1726-SM-7LT93           0.177306  ...       0.337708
GTEX-14BIM-0011-R5b-SM-5S2RM       0.197391  ...       0.074667

circuits2genes resource file:

/home/runner/work/drexml/drexml/drexml/resources/circuits2genes_gtex-v8_hipathia-v2-14-0.tsv.gz
      circuit_id  10000  10010  100132074  ...  998  9985  999  9992
0  P-hsa03320-10      0      0          0  ...    0     0    0     0
1  P-hsa03320-20      0      0          0  ...    0     0    0     0
2  P-hsa03320-21      0      0          0  ...    0     0    0     0
3  P-hsa03320-22      0      0          0  ...    0     0    0     0
4  P-hsa03320-23      0      0          0  ...    0     0    0

Test Failure

FAILED tests/test_datasets.py::test_get_disease_data[False-True]
KeyError: "None of [Index(['P-hsa04920-43', 'P-hsa03320-28'], dtype='object')] are in the [columns]"

Possible Cause

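The test expects certain circuit or pathway identifiers (P-hsa04920-43, P-hsa03320-28),
but these are not present in the loaded dataset columns. This could be due to name re-formatting: the pathvals columns listed under Observed Behavior use dots (P.hsa03320.28), whereas the expected IDs and the circuit_id values in circuits2genes use hyphens (P-hsa03320-28).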
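
A quick way to check this hypothesis is to compare the expected IDs against the column names before and after normalizing the separator. The sketch below uses only the two column names visible in the pathvals printout and the two IDs from the KeyError. The helper names (to_hyphenated, find_missing) are hypothetical, and whether drexml performs such a normalization anywhere is not established here.

```python
def to_hyphenated(circuit_id):
    """Convert a dot-separated circuit ID (P.hsa03320.28) to hyphens."""
    return circuit_id.replace(".", "-")


def find_missing(expected, columns):
    """Return the expected IDs that do not appear in columns, in order."""
    available = set(columns)
    return [cid for cid in expected if cid not in available]


# Column names as printed for the temporary pathvals test file.
pathvals_columns = ["P.hsa03320.28", "P.hsa04920.43"]
# IDs reported as missing in the KeyError.
expected = ["P-hsa04920-43", "P-hsa03320-28"]

# With the raw dotted names, neither expected ID matches.
missing_raw = find_missing(expected, pathvals_columns)

# After replacing dots with hyphens, both expected IDs are found.
normalized_columns = [to_hyphenated(c) for c in pathvals_columns]
missing_normalized = find_missing(expected, normalized_columns)

if __name__ == "__main__":
    print("missing (raw):", missing_raw)
    print("missing (normalized):", missing_normalized)
```

If the normalized comparison comes back empty on the real test file as well, the failure is a naming mismatch rather than missing data.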

@loucerac loucerac merged commit a3682a9 into develop Nov 20, 2025
1 check failed