ID specification #10

@opain

Received via email

I'm working with TWAS-GSEA and encountering some challenges with gene identifier handling (specifically ENSG IDs).

For example, I have a TWAS results file created using the GTExv8 multi-tissue expression files (like the one available here: https://s3.us-west-1.amazonaws.com/gtex.v8.fusion/ALL/GTExv8.ALL.Brain_Substantia_nigra.tar.gz). This file uses ENSG IDs instead of gene symbols. I used the same files to create an appropriate reference expression txt.gz file with the FeaturePred script. However, when I run the TWAS-GSEA script, I do not get any results, only log messages reporting 0 features with Entrez IDs and 0 gene sets with a sufficient number of genes available in the TWAS (please see the log file). I tried it with and without the --use_alt_id parameter, but the outcome is the same.

As a potential solution, I replaced the ENSG IDs in the GTExv8 weight files with gene symbols and obtained new TWAS results and a new reference expression file. However, many genes (~15%) do not have an assigned gene symbol, so I had to remove them from the weight files. This time the TWAS-GSEA script completed successfully, suggesting that the problem was indeed due to the ENSG identifiers. However, since a substantial portion of the genes was removed during preparation, I'm worried this might noticeably affect the results (~15% of the weight files were removed and, consequently, we lost 31 of 237 entries in the TWAS results file when using the processed weight files).

So, I have a couple of questions regarding this problem and I would be grateful if you could find some time to answer:

  1. Can files containing ENSG identifiers be used with the TWAS-GSEA script? If yes, should they be in the format "ENSGXXXXX" or "ENSGXXXXX.version"?

  2. If not, is there a reliable way to transform weight files that contain ENSG identifiers so they can be used with the TWAS-GSEA script without losing any information?
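
To make the two ID formats in question 1 and the conversion described above concrete, here is a minimal Python sketch of the kind of preprocessing I mean. It is an illustration only and not part of TWAS-GSEA: the function names (`strip_version`, `convert_ids`), the toy mapping, and the placeholder IDs are all hypothetical, and the real ENSG-to-symbol table would have to come from an annotation source of your choice.

```python
import re

# Matches "ENSG<digits>" with an optional ".<version>" suffix.
ENSG_PATTERN = re.compile(r"^(ENSG\d+)(?:\.\d+)?$")


def strip_version(gene_id):
    """Return the ENSG ID without its version suffix.

    IDs that do not look like ENSG identifiers are returned unchanged.
    """
    match = ENSG_PATTERN.match(gene_id)
    return match.group(1) if match else gene_id


def convert_ids(gene_ids, ensg_to_symbol):
    """Map ENSG IDs to gene symbols using a caller-supplied dictionary.

    Returns (converted, unmapped): `converted` holds (original_id, symbol)
    pairs, and `unmapped` holds the original IDs that had no symbol, so the
    amount of information lost in the conversion can be reported.
    """
    converted, unmapped = [], []
    for gene_id in gene_ids:
        symbol = ensg_to_symbol.get(strip_version(gene_id))
        if symbol is None:
            unmapped.append(gene_id)
        else:
            converted.append((gene_id, symbol))
    return converted, unmapped


if __name__ == "__main__":
    # Placeholder IDs and symbols, not real annotations.
    toy_mapping = {"ENSG00000000001": "GENE_A", "ENSG00000000002": "GENE_B"}
    ids = ["ENSG00000000001.5", "ENSG00000000002", "ENSG00000000003.1"]
    converted, unmapped = convert_ids(ids, toy_mapping)
    print("converted:", converted)
    print("unmapped:", unmapped)
```

The `unmapped` list corresponds to the genes I had to drop (about 15% of the weight files in my case), which is the loss I would like to avoid.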

TWAS-GSEA_SLURM_JOB_LOG.log
TWAS-GSEA_SLURM_JOBS.txt
TWAS-GSEA_LOG.log
