Description
Received via email
I'm working with TWAS-GSEA and encountering some challenges with gene identifier handling (specifically ENSG IDs).
For example, I have a TWAS results file created using the GTExv8 multi-tissue expression files (like the one available here: https://s3.us-west-1.amazonaws.com/gtex.v8.fusion/ALL/GTExv8.ALL.Brain_Substantia_nigra.tar.gz). This file uses ENSG IDs instead of gene symbols. I used the same files to create a matching reference expression txt.gz file with the FeaturePred script. However, when I run the TWAS-GSEA script, I get no results; the log only reports that 0 features have Entrez IDs and that 0 gene sets have a sufficient number of genes available in the TWAS (please see the log file). I tried it with and without the --use_alt_id parameter, but the outcome is the same.
As a potential solution, I replaced the ENSG IDs in the GTExv8 weight files with gene symbols and generated new TWAS results and a new reference expression file. However, many genes (~15%) do not have an assigned gene symbol, so I had to remove them from the weight files. This time the TWAS-GSEA script completed successfully, suggesting that the problem was indeed due to the ENSG identifiers. However, since a substantial portion of the genes was removed during this preparation step, I'm worried that it might significantly affect the results (~15% of the weight files were removed and, consequently, 31 of 237 entries in the TWAS results file were lost when using the processed weight files).
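To make the ID replacement step concrete, here is a minimal sketch of the kind of mapping described above. It is not the actual script used and not part of TWAS-GSEA: the function names `strip_version` and `map_ids` and the tiny lookup table are hypothetical, and the third ID is a made-up placeholder standing in for a gene without an assigned symbol. It only shows how versioned ENSG IDs can be reduced to their stable part and how the unmapped fraction can be counted.

```python
def strip_version(ensg_id: str) -> str:
    """Drop a trailing '.version' suffix: 'ENSG00000141510.16' -> 'ENSG00000141510'."""
    return ensg_id.split(".", 1)[0]


def map_ids(ensg_ids, ensg_to_symbol):
    """Split IDs into those with a gene symbol and those without one.

    ensg_to_symbol is assumed to be keyed by unversioned ENSG IDs.
    """
    mapped = {}    # original ID -> gene symbol
    unmapped = []  # original IDs with no symbol in the table
    for raw_id in ensg_ids:
        symbol = ensg_to_symbol.get(strip_version(raw_id))
        if symbol:
            mapped[raw_id] = symbol
        else:
            unmapped.append(raw_id)
    return mapped, unmapped


# Illustrative lookup table (assumption: in practice this would come from an
# annotation file matching the GTEx v8 gene models).
table = {"ENSG00000141510": "TP53", "ENSG00000012048": "BRCA1"}

# The last ID is a made-up placeholder for a gene with no assigned symbol.
ids = ["ENSG00000141510.16", "ENSG00000012048.20", "ENSG00000999999.1"]

mapped, unmapped = map_ids(ids, table)
fraction_lost = len(unmapped) / len(ids)
print(f"{len(unmapped)} of {len(ids)} IDs have no gene symbol ({fraction_lost:.0%})")
```

Keeping the list of unmapped IDs, rather than silently dropping them, makes it possible to check which TWAS entries disappear after the conversion.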
So, I have a couple of questions regarding this problem and I would be grateful if you could find some time to answer:
- Can files containing ENSG identifiers be used with the TWAS-GSEA script? If yes, should they be in the format "ENSGXXXXX" or "ENSGXXXXX.version"?
- If not, is there any reliable way to transform the weight files that contain ENSG identifiers so they can be used with the TWAS-GSEA script, without losing any information?
TWAS-GSEA_SLURM_JOB_LOG.log
TWAS-GSEA_SLURM_JOBS.txt
TWAS-GSEA_LOG.log