ENQUIRE

References Link
Methods
Application
Source Code
Updates (Latest Release)
Media: ISMB/ECCB 2025 in Liverpool, UK
How to use Start here
Implementation (Link) Requires Containerizes

ABSTRACT

The accelerating growth of scientific literature overwhelms our capacity to manually distil complex phenomena like molecular networks linked to diseases. Moreover, biases in biomedical research and database annotation limit our interpretation of facts and generation of hypotheses. ENQUIRE (Expanding Networks by Querying Unexpectedly Inter-Related Entities) offers a time- and resource-efficient alternative to manual literature curation and database mining. ENQUIRE reconstructs and expands co-occurrence networks of genes and biomedical ontologies from user-selected input corpora and network-inferred PubMed queries. The integration of text mining, automatic querying, and network-based statistics mitigating literature biases makes ENQUIRE unique in its broad-scope applications. For example, ENQUIRE can generate co-occurrence gene networks that reflect high-confidence, functional networks. When tested on case studies spanning cancer, cell differentiation and immunity, ENQUIRE identified interlinked genes and enriched pathways unique to each topic, thereby preserving their underlying diversity. ENQUIRE supports biomedical researchers by easing literature annotation, boosting hypothesis formulation, and facilitating the identification of molecular targets for subsequent experimentation.

INSTRUCTION MANUAL

INSTALLATION

ENQUIRE currently runs on Linux systems and Linux virtual machines using Apptainer/Singularity, and on Linux, macOS, and Windows using Docker. If you would rather use Docker than Apptainer/Singularity, please follow the dedicated README available here. Please check the implementation table for the latest available images and requirements.

If you want to use ENQUIRE with Apptainer/Singularity, please install it following the steps for Linux or Windows/Mac. The file called ENQUIRE.sif is a compressed Singularity Image File (SIF) that already contains all the code, dependencies, and stable metadata needed to run ENQUIRE, so no further installation steps are needed. The original and latest SIF files are available on Figshare (see the implementation table). We recommend adding the path to the apptainer executable to your PATH variable (e.g. by editing your .bashrc file). This lets you execute ENQUIRE.sif directly, like any other executable (./ENQUIRE.sif).

To follow the next steps in the tutorial, clone the repository:

git clone https://github.com/Muszeb/ENQUIRE.git
cd ENQUIRE

Then download the SIF image file ENQUIRE.sif from Figshare and place it in the repository. We provide checksum files (md5sum_ENQUIRE_sif.txt and md5sum_original_ENQUIRE_sif.txt) so that you can verify that the download completed successfully. Remember to also make the SIF file executable.

md5sum -c md5sum_ENQUIRE_sif.txt
chmod +x ENQUIRE.sif

You can then move the ENQUIRE directory or ENQUIRE.sif wherever you wish, and optionally add its location to your PATH variable to make it easier to call.
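The checksum verification above can also be done from Python, for example on a system where md5sum is not available. This is a minimal sketch, assuming the checksum file follows the usual md5sum layout (one "<hex digest>  <file name>" entry per line); the function names are illustrative and not part of ENQUIRE.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_md5sum_line(line):
    """Split one md5sum-style line into (hex digest, file name)."""
    expected, name = line.strip().split(maxsplit=1)
    # md5sum marks binary mode with a leading '*' on the file name
    return expected.lower(), name.lstrip("*")


def verify(checksum_file, directory="."):
    """Yield (file name, matches) for every entry of an md5sum-style file."""
    for line in Path(checksum_file).read_text().splitlines():
        if line.strip():
            expected, name = parse_md5sum_line(line)
            yield name, md5_of_file(Path(directory) / name) == expected
```

Calling list(verify("md5sum_ENQUIRE_sif.txt")) from the ENQUIRE directory should then report whether ENQUIRE.sif matches its recorded digest.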

Back to the beginning of the instruction manual

USAGE

The example code snippets assume that the apptainer location has been added to your PATH variable and that you are running the commands from the ENQUIRE main directory (run cd /path/to/ENQUIRE first to test them). Here is how to call ENQUIRE scripts using ENQUIRE.sif:

# assuming the `apptainer` location is in your PATH variable and you did `cd ENQUIRE` or `ENQUIRE.sif` is in your working directory
Usage: ./ENQUIRE.sif <script_name> [script_argument]

Where <script_name> is one of:

  • efetch_references.py
  • ENQUIRE.sh
  • context_aware_gene_sets.R
  • context_aware_pathway_enrichment.R

Back to the beginning of the instruction manual

INPUT FILE

A valid input file is a plain text file containing a list of PubMed Identifiers (PMIDs), one PMID per line. The easiest way to generate a valid ENQUIRE input file is to run a PubMed query on the NCBI website. Use of MeSH terms and exclusion of review articles is recommended but not mandatory. Then click on Save, choose Selection: All results and Format: PMID, and click Create file.

[Figure: example of a PubMed query with ENQUIRE-compliant Save options]

Alternatively, we also offer a Python script to extract the PubMed identifiers of all papers cited in a reading of interest (e.g. a review paper on a particular topic). From the ENQUIRE folder, type on the command line:

# assuming the `apptainer` location is in your PATH variable and you did `cd ENQUIRE` or `ENQUIRE.sif` is in your working directory
./ENQUIRE.sif efetch_references.py tag ref1 ref2 ref3 ...

where tag is the name of the plain text output file, and ref1 ref2 ref3 ... are the PMIDs of the papers whose references you want to extract. The output follows the input format described above (one PMID per line) and is therefore ready to be used as ENQUIRE input. DISCLAIMER: if the references are not annotated in PubMed's API, the script will silently return no match; this may go unnoticed when fetching references from multiple articles. As a rule of thumb, look for "References" in the "page navigation" menu on the PubMed page of the article of interest to check its web-annotation status.
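Before launching a job, it can help to check that a file really follows the one-PMID-per-line format. The following is a minimal sketch, not part of ENQUIRE: the function name is hypothetical, and skipping blank lines and duplicates is an assumption made for convenience rather than documented ENQUIRE behaviour. The minimum of 3 entries comes from the ENQUIRE.sh help message.

```python
def read_pmids(lines, minimum=3):
    """Return the unique PMIDs found in an iterable of lines, in input order.

    Raises ValueError on a non-numeric entry or on fewer than `minimum` PMIDs.
    """
    pmids = []
    seen = set()
    for number, raw in enumerate(lines, start=1):
        entry = raw.strip()
        if not entry:
            continue  # tolerate blank lines, e.g. a trailing newline
        if not (entry.isascii() and entry.isdigit()):
            raise ValueError(f"line {number}: {entry!r} is not a PMID")
        if entry not in seen:
            seen.add(entry)
            pmids.append(entry)
    if len(pmids) < minimum:
        raise ValueError(f"only {len(pmids)} PMIDs found, need at least {minimum}")
    return pmids


# Usage with a file on disk (path taken from the example further below):
# with open("test_input/pmid-ICI_and_Colitis.txt") as handle:
#     pmids = read_pmids(handle)
```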

Back to the beginning of the instruction manual

LAUNCHING ENQUIRE
  • Before running an actual task, take a look at ENQUIRE_methods_overview.png: the figure briefly illustrates the main steps of the algorithm.

  • The next example snippets assume that you cloned this repository and that ENQUIRE is your current working directory.

  • IMPORTANT NOTE: it is highly recommended to get an NCBI API key before running ENQUIRE. Getting one is very easy. You can then copy the API key and set it as an environment variable on the command line, like so:

export NCBI_API_KEY=your_api_key_here

This will ensure your API key is passed as an environment variable to all ENQUIRE runs within the same terminal session.

  • You can inspect the built-in help by running (from the ENQUIRE directory) ./ENQUIRE.sif ENQUIRE.sh -h:
####################################################################################

Expanding Networks by Querying Unexpectedly Inter-Related Entities

####################################################################################

####################################################################################

Usage: ./ENQUIRE.sif ENQUIRE.sh [script_arguments]

Legend:	[-flag_short|--flag_long|config file variable, if available]:

[-p|--path|wd] = the path to the working directory (wd), where the output directory will be written in.
	It must be the ENQUIRE main folder, with ./code and ./input as subfolders.
	The default is the current working directory.

[-i|--input|to_py] = input.txt: a 'seed' input text file containing one PMID per line.
	It can be obtained from a PubMed querying specifying 'PMID' as the download format option.
	A minimun of 3 entries is required, but a list at least a few dozens articles is highly recommended.

[-t|--tag|tag] = A tag definining the task.
	It must be an alphanumeric string (underline_spaced_words are accepted).

[-j|--ncores|ncores] = The max number of CPU cores to be used.
	Default is 6.

[-c|--combine-set|comb] = how many k entities should be intersected to construct a query?
	3: loose searches, 4: moderate (default), 5: very strict queries.

[-r|--representativeness|thr] = representativeness threshold (%) for a subgraph to be included in the network expansion steps (default: 1 %).
	Example: if a subgraph contains nodes exclusively mentioned in 1 paper out of a total of 100, that subgraph has a 1% representativeness.

[-a|--attempts|A] = how many query attempts (i.e. k-sized graphlets) should be run to connect any two network communities?
	1: conservative, 2: moderate (default), 3: greedy.

[-k|--connectivity|K] = minimal community connectivity (K), which applies to any expansion-derived entities:
	each gene/MeSH term must be connected to at least K original communities to be incorporated in the expanded network - default: 2.

[-e|--entity|etype] = which entity type ('gene','MeSH') are you interested into? Omit or 'all' to textmine both entities.

[-f|--config] = if a config file is being used, specify its full path (e.g. input/textmining_config.txt).
	This option overwrites any parameter set by a different option.

[-w|--rscript|rscript] = path to the Rscript compiler. If using 'ENQUIRE.sif', it defaults to the containerized version of R.

[-d|--inputdata|sd] = path to the input data folder. If using 'ENQUIRE.sif', it defaults to the containerized input folder.
	WARNING: this option is still under development, to allow users to set different species targets
	and subsequently change the H.s. specific metadata.

[-m|--cellentitymodule|CELLTAGSBOOL] = Boolean, enable removing of character spans tagged as cell lines or types (e.g. 'CD8+ T-cell')?
	Default: False.

[-h|--help] = print this help message.

You might be seeing this Help because of an input error.

####################################################################################

Let's set up an example: we want to extract biomedical information from publications dealing with chemically-induced colitis in melanoma patients undergoing checkpoint-inhibitor therapy. Our ENQUIRE job might then look something like this:

# assuming the `apptainer` location is in your PATH variable and you did `cd ENQUIRE` or `ENQUIRE.sif` is in your working directory
./ENQUIRE.sif ENQUIRE.sh -t ICI_and_Colitis -i test_input/pmid-ICI_and_Colitis.txt

Here, all the other parameters described in the help message of ENQUIRE.sh are left at their default values. Passing parameters can be made easier by using the ENQUIRE_config.txt file that resides in the main ENQUIRE directory: the left-hand side of each variable assignment must be kept unchanged, while the right-hand side can be adjusted to your needs. Additional information on the parameters is given in ENQUIRE_flowchart.png. The program can then be launched by running:

# assuming the `apptainer` location is in your PATH variable and you did `cd ENQUIRE` or `ENQUIRE.sif` is in your working directory
./ENQUIRE.sif ENQUIRE.sh -f ENQUIRE_config.txt
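When ENQUIRE is launched from a script rather than by hand, the command line can be assembled programmatically. The sketch below is an illustration under stated assumptions: the helper and its keyword names are hypothetical, while the short flags are those listed in the ENQUIRE.sh help message above. It only builds and prints the command.

```python
import shlex

# Short flags as listed in the ENQUIRE.sh help message.
FLAGS = {
    "tag": "-t",
    "input": "-i",
    "ncores": "-j",
    "combine_set": "-c",
    "representativeness": "-r",
    "attempts": "-a",
    "connectivity": "-k",
    "entity": "-e",
}


def enquire_command(image="./ENQUIRE.sif", **options):
    """Build the argument list for an ENQUIRE.sh run from keyword options."""
    unknown = sorted(set(options) - set(FLAGS))
    if unknown:
        raise ValueError(f"unsupported options: {unknown}")
    command = [image, "ENQUIRE.sh"]
    for name in FLAGS:  # fixed order keeps the command reproducible
        if name in options:
            command += [FLAGS[name], str(options[name])]
    return command


command = enquire_command(
    tag="ICI_and_Colitis",
    input="test_input/pmid-ICI_and_Colitis.txt",
    ncores=6,
    combine_set=4,
)
print(shlex.join(command))
```

The resulting list can be handed to subprocess.run(command, check=True) to start the job; options that are omitted keep the defaults described in the help message.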

Back to the beginning of the instruction manual

EXPLANATION OF THE OUTPUT DATA STRUCTURE
  • Provided a valid tag has been passed to the text-mining algorithm, a typical run produces a folder named tmp-tag, which in turn contains as many subdirectories as the number of steps/iterations performed. For example, if the algorithm performed

    1. Reconstruction of a Gene/MeSH network from the original set of papers;
    2. One query expansion and network reconstruction, as the Gene/MeSH network was not fully connected yet;
    3. One query expansion and network reconstruction, as the gene-gene network was not fully connected yet, then stopped;

    then there will be three subfolders, namely tag, tag_subgraph_expansion1, tag_subgraph_expansion2. The counter attached to folder and file names records the successive attempts at expanding and reconstructing the co-occurrence networks.

    Typically, within each of these sub-folders/iterations, three pairs of edge and node tables can be found, respectively corresponding to "Complete" (Gene/Mesh), "Gene"- and "Mesh"-only networks (TSV files). These files can be easily imported in Cytoscape or similar graph visualization tools.

    Whenever it wasn't possible to obtain one or more of the aforementioned networks, the pipeline should print a message with information on the most meaningful files to look at. It is worth mentioning that the file tag...Complete_literature_links.tsv within each subfolder allows fast retrieval of specific edge-associated papers by means of encoded hyperlinks.

    The batch of queries that were tested in each iteration is stored in tag...ordered_queries.tsv within each respective subfolder. Additional metadata can be explored under the data/ subfolder. Besides node and edge tables for individual subgraphs (i.e. Gene/MeSH or gene-only connected components), here you can also explore what the original co-occurrence multigraph looked like before the network-based test statistics were applied (tag...edge_list_allxall.tsv).

    Furthermore, under tmp-tag, the file source_pmids.txt contains all the inspected articles for the given ENQUIRE job. These can also be consulted specifically for each iteration under tmp-tag/efetch_inputs. Starting from release v4.0.0, this subdirectory also contains literature metadata for all iterations under CitationToPMID_record.tsv.

    Please don't hesitate to contact us for any clarification on the purposes of any file.

  • Interactive .html networks

    It is also possible to visually inspect Gene-MeSH networks and the reduced networks containing only cliques in two .html files, respectively stored within each iteration's subfolder as tag...interactive_Gene-MeSH_Network.html and tag...interactive_Cliques_Network.html.
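The folder layout and the TSV edge tables described above can also be explored with a few lines of Python instead of Cytoscape. This is a minimal sketch under stated assumptions: the function names are hypothetical, the folder-name pattern tolerates both the "subgraph" and "subgraphs" spellings, and the column names of the edge table are not documented here, so they are passed in by the caller.

```python
import csv
import re
from collections import defaultdict
from pathlib import Path

# Suffix appended to the tag for each expansion round.
EXPANSION = re.compile(r"_subgraphs?_expansion(\d+)$")


def iteration_dirs(output_dir, tag):
    """Return the iteration subfolders of an output folder, ordered by counter."""
    def counter(path):
        match = EXPANSION.search(path.name)
        return int(match.group(1)) if match else 0  # the initial round has no suffix

    found = [
        path for path in Path(output_dir).iterdir()
        if path.is_dir() and EXPANSION.sub("", path.name) == tag
    ]
    return sorted(found, key=counter)


def read_edges(handle, source_column, target_column):
    """Build an undirected adjacency map from a tab-separated edge table."""
    neighbours = defaultdict(set)
    for row in csv.DictReader(handle, delimiter="\t"):
        a, b = row[source_column], row[target_column]
        neighbours[a].add(b)
        neighbours[b].add(a)
    return dict(neighbours)
```

Inspect the header line of an edge table first to find the two node columns, then pass their names to read_edges.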

Back to the beginning of the instruction manual

EXECUTING POST-HOC ANALYSES

Context-aware gene set annotation

  • Run ./ENQUIRE.sif context_aware_gene_sets.R [options] to perform automatic annotation of gene sets, using ENQUIRE-generated, Gene/MeSH edge and node tables and Fuzzy-C-Means (FCM). See the original manuscript for further information.
Usage: ./ENQUIRE.sif context_aware_gene_sets.R [options]

Options:
	-w PATH, --directory=PATH
		Output directory [default to current working directory]

	-e PATH, --edgetable=PATH
		Path to an ENQUIRE-generated, Gene/MeSH edge table file (required)

	-n PATH, --nodetable=PATH
		Path to an ENQUIRE-generated, Gene/MeSH node table file (required)

	-t TAG, --tag=TAG
		tag prefix for all output files (default to 'ENQUIRE')

	-o MODALITY, --modality=MODALITY
		node embedding modality used for clustering.
		Default is node2vec+ (Liu et al. 2023), using `ztPois.cdf` as weights, as implemented in https://github.com/krishnanlab/PecanPy.
		Type 'invlogweight' to reproduce the method described in ENQUIRE's original publication (Musella et al. 2025).

	--num-walks=NUMWALKS
		node2vec parameter. Number of walks per source. (default: 150)

	--walk-length=WALKLENGTH
		node2vec parameter. Length of walk per source. (default: 150)

	--n2vp=N2VP
		node2vec parameter. Return hyperparameter. (default: 1)

	--n2vq=N2VQ
		node2vec parameter. Inout hyperparameter. (default: 2)

	--window-size=WINDOWSIZE
		node2vec parameter. Context size for optimization. (default: 10)

	--dimensions=DIMENSIONS
		node2vec parameter. Number of dimensions. (default: 32)

	-d PARAMETER, --membdeg=PARAMETER
		minimal membership degree for gene-to-cluster association (default: 0.05), range [0-1]

	-r PARAMETER, --round=PARAMETER
		Should membership degrees be rounded to the first significant digit (helps the stability of the results)?
		default: True [T,F]

	-s PARAMETER, --setsize=PARAMETER
		minimal gene set size (default: 2)

	-v VARIANCE, --varthreshold=VARIANCE
		Dimensionality reduction based on the chosen proportion of Variance
		observed upon PCA-transforming the inverse-log-similarity between nodes (default: 0.99. range [0-1]).
		Set it to 1 to use untrasformed, scaled node similarities.

	-m MESH, --meshxgs=MESH
		How many MeSH terms which are closest to the cluster centroids should be used to describe a gene set? (default:3)

	-p PATH, --netpathdata=PATH
		Path to 'ENQUIRE-KNet_STRING_RefNet_Reactome_Paths.RData.gz' (required).
		If using the ENQUIRE.sif singularity image, the default path should point to the containerized copy of the file.

	-h, --help
		Show this help message and exit
  • You can use the exemplary output files contained in tmp-Ferroptosis_and_Immune_System to test the script. As of release v4.0.0 the default node embedding modality is node2vec+ (Liu et al. 2023). Set -o invlogweight for original behaviour. For comparison, both modalities have been precomputed and distributed under tmp-Ferroptosis_and_Immune_System/Ferroptosis_and_Immune_System/.
# assuming the `apptainer` location is in your PATH variable and you did `cd ENQUIRE` or `ENQUIRE.sif` is in your working directory
./ENQUIRE.sif context_aware_gene_sets.R -e tmp-Ferroptosis_and_Immune_System/Ferroptosis_and_Immune_System/Ferroptosis_and_Immune_System_Complete_edges_table_subgraph.tsv \
-n tmp-Ferroptosis_and_Immune_System/Ferroptosis_and_Immune_System/Ferroptosis_and_Immune_System_Complete_nodes_table_subgraph.tsv

The output will be saved in the default-tagged spreadsheet file ENQUIRE_context_aware_gene_sets.xlsx, together with a PNG image showing the reconstructed gene sets. Please note that the script may take a long time to run because of the FCM algorithm.
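To make the roles of the --membdeg, --round and --setsize options more concrete, here is a small illustration of how fuzzy membership degrees can be turned into overlapping gene sets. It is a sketch of the general idea only, not the code of context_aware_gene_sets.R: the function names and the toy membership values are invented for the example, and the exact rounding rule used by the script is an assumption.

```python
import math


def round_first_significant(value):
    """Round a positive number to its first significant digit (0.0449 -> 0.04)."""
    if value <= 0:
        return 0.0
    return round(value, -math.floor(math.log10(value)))


def gene_sets(memberships, min_degree=0.05, min_size=2, rounding=True):
    """Assign genes to every cluster whose membership degree passes the threshold.

    `memberships` maps gene -> {cluster: degree}; because the clustering is
    fuzzy, one gene may end up in several sets.
    """
    sets = {}
    for gene, degrees in memberships.items():
        for cluster, degree in degrees.items():
            if rounding:
                degree = round_first_significant(degree)
            if degree >= min_degree:
                sets.setdefault(cluster, set()).add(gene)
    # discard clusters that collected too few genes
    return {cluster: genes for cluster, genes in sets.items() if len(genes) >= min_size}


toy = {
    "GPX4": {"c1": 0.62, "c2": 0.38},
    "SLC7A11": {"c1": 0.97, "c2": 0.03},
    "CD36": {"c1": 0.044, "c2": 0.956},
}
print(gene_sets(toy))
```

In this toy input GPX4 belongs to both clusters, while SLC7A11 and CD36 each fall below the 0.05 threshold in one of them.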

Context-aware pathway enrichment analysis

  • Run ./ENQUIRE.sif context_aware_pathway_enrichment.R [options] to perform topology-based pathway enrichment analysis of an ENQUIRE-generated gene-gene edge table, using SANTA, Reactome H. sapiens pathways, and STRING's H. sapiens physical PPI network. See the original manuscript for further information.
Usage: Rscript code/context_aware_pathway_enrichment.R [options]

Options:
	-w PATH, --directory=PATH
		Working directory (default to current working directory)

	-o PATH, --outdirectory=PATH
		Output directory (default to current working directory, and must preexist)

	-n PATH, --netpathdata=PATH
		Path to 'ENQUIRE-KNet_STRING_RefNet_Reactome_Paths.RData.gz' (required).
		If using the ENQUIRE.sif singularity image, the default path should point to the containerized copy of the file.

	-e PATH, --edgetable=PATH
		Path to an ENQUIRE-generated, gene-gene edge table file (required).

	-c PARAMETER, --cores=PARAMETER
		max number of cores used (PSOCK parallelization) (default: 4), >1 recommended.

	-t TAG, --tag=TAG
		tag prefix (default to 'ENQUIRE').

	-s PARAMETER, --setsize=PARAMETER
		maximum Reactome pathway size (default: 100, minimum 3).

	-p PARAMETER, --permutations=PARAMETER
		number of permutations to infer KNet null distribution
		(default: 100, the higher the more accurate the test statistics).

	-f PARAMETER, --padjust=PARAMETER
		P-value adjustment method, must be one of [holm, hochberg, hommel, bonferroni, BH, BY, fdr, none].
		Default: holm.

	-q QSCORENET, --qscorenet=QSCORENET
		Do you want to save a copy of the STRING network in GRAPHML format with ENQUIRE-inferred QScores as node weights?
		default: False [T,F]

	-h, --help
		Show this help message and exit
  • You can use the example output files contained in tmp-Ferroptosis_and_Immune_System to test the script (we reduce the number of tested pathways with the -s parameter to speed up the process):
# assuming the `apptainer` location is in your PATH variable and you did `cd ENQUIRE` or `ENQUIRE.sif` is in your working directory
./ENQUIRE.sif context_aware_pathway_enrichment.R -e tmp-Ferroptosis_and_Immune_System/Ferroptosis_and_Immune_System/Ferroptosis_and_Immune_System_Genes_edges_table_subgraph.tsv -s 30

The output will be saved in the default-tagged spreadsheet file ENQUIRE_context_aware_pathway_enrichment.xlsx, together with two PNG images showing the p-value distribution of the test statistics and the correlation between the node score and degree. Please note that the script may take a long time to finish and benefits from a high-performance computer, if available.
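The default value of --padjust is holm. For readers unfamiliar with it, the Holm step-down correction can be sketched in a few lines; this is a generic textbook implementation written for illustration (the function name is made up), not code taken from the R script.

```python
def holm_adjust(p_values):
    """Holm step-down adjustment; returns adjusted p-values in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda index: p_values[index])
    adjusted = [0.0] * n
    running_max = 0.0
    for rank, index in enumerate(order):
        # the smallest p-value is multiplied by n, the next by n - 1, and so on
        candidate = min(1.0, (n - rank) * p_values[index])
        running_max = max(running_max, candidate)  # keep adjusted values monotone
        adjusted[index] = running_max
    return adjusted


print(holm_adjust([0.01, 0.04, 0.03]))
```

Because the multiplier shrinks with each rank, Holm is never more conservative than the Bonferroni correction while still controlling the family-wise error rate.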

Back to the beginning of the instruction manual

UPDATE (September 2025): TRANSFORM ENQUIRE NETWORKS INTO GRAPH DATABASES

Usage

The latest Apptainer and Docker images also retrieve bibliographic data associated with the queried PMIDs and are shipped with Neo4j Community Edition (v5.25), allowing for easy graph database construction starting from ENQUIRE's *_Complete_* TSV files. The SIF image is complemented by the shell script ENQUIRE2KG.sh (also available in the GitHub repository), which orchestrates the construction and start-up of the database. If you downloaded the script from Figshare, remember to make ENQUIRE2KG.sh executable via chmod +x. In short, ENQUIRE2KG.sh:

  • creates (if it does not already exist) an enquire2kg-tag directory and mounts it under a containerized path to which the graph database will be written;
  • converts ENQUIRE's Complete edge and node files into Neo4j-friendly CSV files;
  • uses neo4j-admin to establish a graph database and test its functionality;
  • runs neo4j console to establish a (remote) connection via http://localhost:7474/.

Unfortunately, ENQUIRE2KG does not work with ENQUIRE output generated using the original image.

############# TURN ENQUIRE NETWORKS INTO KNOWLEDGE GRAPHS USING NEO4J - UTILITY SCRIPT ##############
Path to code: /path/to/ENQUIRE2KG.sh

####################################################################################

Expanding Networks by Querying Unexpectedly Inter-Related Entities

####################################################################################

####################################################################################

Usage: ENQUIRE2KG.sh [script_arguments]

Legend:	[-flag_short|--flag_long|config file variable, if available]:

[-i|--image|image] = the path to the singularity image file (.sif). Defaults to 'ENQUIRE.sif'.

[-p|--path|wd] = the path to the working directory (wd), where the output directory will be written in.
	It must be the ENQUIRE main folder, with ./code and ./input as subfolders.
	The default is the current working directory.

[-t|--tag|tag] = A tag definining the task.
	It must be an alphanumeric string (underline_spaced_words are accepted).

[-d|--inputdir|input] = path to the input data folder. It must point to an ENQUIRE-generated directory containing co-occurrence network data
	(e.g https://github.com/Muszeb/ENQUIRE/tree/main/tmp-Ferroptosis_and_Immune_System/Ferroptosis_and_Immune_System).

[-f|--config] = if a config file is being used, specify its full path (e.g. input/textmining_config.txt).
	This option overwrites any parameter set by a different option.

[-h|--help] = print this help message.

You might be seeing this Help because of an input error.

####################################################################################

Here is how you can test this with the example output data tmp-Ferroptosis_and_Immune_System available in the GitHub repository.

# assuming the `apptainer` location is in your PATH variable, you did `cd ENQUIRE`, and `ENQUIRE.sif` is in your working directory
./ENQUIRE2KG.sh -i ENQUIRE.sif -t Ferroptosis_and_Immune_System -d tmp-Ferroptosis_and_Immune_System/Ferroptosis_and_Immune_System_subgraphs_expansion2/  

Eventually, it should print the following:

[...previously printed messages and log data...]
[...here neo4j console is executed...]
Starting Neo4j.
2025-09-23 16:15:59.033+0000 INFO  Logging config in use: File '/etc/neo4j/user-logs.xml'
2025-09-23 16:15:59.048+0000 INFO  Starting...
2025-09-23 16:15:59.650+0000 INFO  This instance is ServerId{68729ca1} (68729ca1-634f-409d-83b9-0a41c2ce8fc2)
2025-09-23 16:16:00.448+0000 INFO  ======== Neo4j 2025.04.0 ========
2025-09-23 16:16:01.398+0000 INFO  Anonymous Usage Data is being sent to Neo4j, see https://neo4j.com/docs/usage-data/
2025-09-23 16:16:01.517+0000 INFO  Bolt enabled on ------0-01vpnedf0-0002d-admin-nocnocnoc-us-uforms.gbc.criteo.com:7687.
2025-09-23 16:16:01.989+0000 INFO  HTTP enabled on 0.0.0.0:7474.
2025-09-23 16:16:01.990+0000 INFO  Remote interface available at http://localhost:7474/
2025-09-23 16:16:01.991+0000 INFO  id: D5F5F166C7623343039979E7682345DA2E4F9E9D41BB6A402A579BAB34544889
2025-09-23 16:16:01.991+0000 INFO  name: system
2025-09-23 16:16:01.991+0000 INFO  creationDate: 2025-09-23T16:15:27.007Z
2025-09-23 16:16:01.992+0000 INFO  Started.

As long as the session stays open (or is detached via screen or tmux), http://localhost:7474/ points to Neo4j Browser, allowing you to inspect and query the ENQUIRE-derived graph database.
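If the database is started inside a detached session, a quick scripted check can tell whether the HTTP interface is already answering. The sketch below uses only the Python standard library; the function name is hypothetical and the default URL is the one printed in the log above.

```python
import urllib.error
import urllib.request


def neo4j_http_is_up(url="http://localhost:7474/", timeout=3.0):
    """Return True if a request to `url` succeeds, False on any connection error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError):
        # refused connection, timeout, DNS failure or HTTP error status
        return False


if __name__ == "__main__":
    print("Neo4j HTTP interface reachable:", neo4j_http_is_up())
```

A False result right after launching ENQUIRE2KG.sh usually just means that Neo4j has not finished starting, so retrying after a few seconds is reasonable.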

You can also use Neo4j Desktop - here's how:

  1. Initialize a "New Project", then add a "Remote connection";
  2. Keep everything as default and hit "Next";
  3. Set a username and password;
  4. Click on "Connect", wait for the Remote DBMS to be active, then click "Open" to access Neo4j Browser.

Examples

A

Suppose you want to know which entities are related to the concept of neoplasms within a broader search concerning the interrelation between ferroptosis and the immune system. As a proxy, we can write a query that matches MeSH terms containing the word "neoplasm" and returns Gene (orange), MeSH (turquoise), and Literature (red) nodes from the example ENQUIRE network, like so:

MATCH (m:MeSH)-[:HAS_SOURCE]-(l:Literature)-[:HAS_SOURCE]-(g:Gene)
WHERE m.ENTITY =~ '.*neoplasm.*'
RETURN m,l,g

yielding

[Figure: resulting network of MeSH, Literature, and Gene nodes]

B

Suppose you have conducted a differential expression analysis and obtained a list of differentially expressed genes (DEGs). Researchers often want to compare their DEG list with findings from previously published studies to contextualize their results. However, traditional literature searches that explicitly include specific DEGs as search terms are susceptible to cherry-picking bias, where curators may (unconsciously) select papers that confirm their expectations. With ENQUIRE, you can first query for all papers relevant to your experimental topic without specifying individual genes, then extract significantly co-occurring entities, and finally examine the literature support and co-occurrence patterns of your DEGs. This workflow is reproducible and minimizes selection bias. We employed this validation strategy in this publication. Here is how to construct such a query, using genes contained in the example ENQUIRE network (we also demonstrate additional filtering options, such as year of publication):

MATCH (g1:Gene)-[:CO_OCCURS]-(g:Gene)-[:HAS_SOURCE]-(p:Literature)
WHERE any(x IN g.ENTITY WHERE x IN [
'CD36',
'FAM126A',
'ROS1',
'SLC7A11',
'GPX4',
'IFNA1',
'ACSL3', // will not appear in the output network
'ACSL4'
]) AND p.Year > 2023
RETURN g,p

yielding

[Figure: resulting network of Gene and Literature nodes]
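For longer DEG lists, typing gene symbols into the query by hand is error-prone. A common alternative is to pass the list as a Cypher parameter. The sketch below only builds the query text and its parameter dictionary; the function name is hypothetical, while the node labels, relationship types and property names are those used in example B.

```python
def deg_query(genes, min_year=None):
    """Return (query, parameters) for literature linked to a list of DEGs."""
    lines = [
        "MATCH (g1:Gene)-[:CO_OCCURS]-(g:Gene)-[:HAS_SOURCE]-(p:Literature)",
        "WHERE any(x IN g.ENTITY WHERE x IN $genes)",
    ]
    # de-duplicate and sort so that repeated runs send identical parameters
    parameters = {"genes": sorted(set(genes))}
    if min_year is not None:
        lines.append("AND p.Year > $min_year")
        parameters["min_year"] = int(min_year)
    lines.append("RETURN g, p")
    return "\n".join(lines), parameters


query, parameters = deg_query(["CD36", "SLC7A11", "GPX4", "ACSL4"], min_year=2023)
print(query)
print(parameters)
```

Assuming the official neo4j Python driver is installed, the pair can be passed to session.run(query, parameters). In Neo4j Browser, the same parameters can be set with the :param command before running the query.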

Back to the beginning of the instruction manual

POSSIBLE SOURCES OF ERRORS
  • Run which apptainer: if the apptainer location is not in your PATH variable, you need to invoke apptainer by specifying its path, that is, running /path/to/apptainer run /path/to/ENQUIRE.sif ... instead of ./path/to/ENQUIRE.sif ...

  • Test the command awk '/MemAvailable/ {print $2}' /proc/meminfo on your command line: this is the way ENQUIRE checks the available RAM on Linux systems, in order to avoid overflows. Make sure awk is installed on your system. If you witness a non-awk related issue, contact us with information on your system and possible solutions to alternatively track the available memory on your OS.

  • When computing large networks, an error related to the default stack size can appear, especially when running R scripts, such as Error: C stack usage is too close to the limit. In this case, set a higher stack size to allow the script to complete, via

    ulimit -s N 
    

    where N is the maximum stack size, expressed in kilobytes. You can first check the values returned by Cstack_info() in an active R session. You can read more about the issue here and here.

  • If you get a curl-related error of the form

HTTP/1.1 400 Bad Request
 WARNING:  FAILURE ( Thu Feb 15 10:24:24 AM CET 2024 )

This means that NCBI is not willing to process your request. Sometimes this is due to a server hiccup, but most of the time using an API key fixes the issue. Getting one is very easy. You can then copy the API key and set it as an environment variable on the command line, like so:

export NCBI_API_KEY=your_api_key_here

This will ensure your API key is passed as an environment variable to all ENQUIRE runs within the same terminal session.
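Several of the checks listed above can be gathered in one small pre-flight script. This is a sketch for Linux hosts using only the Python standard library (the resource module is Unix-only); the function names are hypothetical, and the script reads the same MemAvailable field of /proc/meminfo that the awk command above extracts.

```python
import re
import resource
import shutil


def mem_available_kb(meminfo_text):
    """Extract MemAvailable (in kB) from the contents of /proc/meminfo."""
    match = re.search(r"^MemAvailable:\s+(\d+)\s+kB", meminfo_text, flags=re.MULTILINE)
    return int(match.group(1)) if match else None


def stack_limit_kb():
    """Return the soft stack limit in kB, or None when it is unlimited."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_STACK)
    return None if soft == resource.RLIM_INFINITY else soft // 1024


def preflight(meminfo_path="/proc/meminfo"):
    """Collect the apptainer location, available memory and stack limit."""
    try:
        with open(meminfo_path) as handle:
            memory = mem_available_kb(handle.read())
    except OSError:
        memory = None  # e.g. not a Linux host
    return {
        "apptainer": shutil.which("apptainer"),  # None if not in PATH
        "mem_available_kb": memory,
        "stack_limit_kb": stack_limit_kb(),
    }


if __name__ == "__main__":
    for name, value in preflight().items():
        print(f"{name}: {value}")
```

The stack limit is reported in kilobytes, the same unit that ulimit -s expects.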

REPRODUCIBILITY

Two identical runs of ENQUIRE should produce identical co-occurrence networks and query formulations, as long as NCBI has not updated the MeSH indexing of the PubMed articles involved in the time between the two runs. If such updates have occurred, the later run should produce queries that are supersets of those of the earlier one. The example output directory tmp-Ferroptosis_and_Immune_System was generated between 10.10.23 and 11.10.23 and was used to generate the results illustrated in the ENQUIRE manuscript. The output was found to be reproducible on 3 different Linux machines (2 Ubuntu and 1 Arch Linux distributions). The use of a containerized image (the SIF file) should guarantee reproducibility irrespective of the host operating system. While several other tests on different operating systems show consistency in the network reconstruction steps, we cannot rule out the possibility that the network expansion step might diverge in some cases, despite the internally coded, fixed seeds.

Back to the beginning of the instruction manual

IMPORTANT INFORMATION ON PUBMED ACCESSIBILITY

As of 21.11.22, important changes have been applied to NCBI's e-utilities. In particular, it is no longer possible to stream more than 10,000 PMIDs from any single query to the PubMed database. This required us to redesign how ENQUIRE uses the e-utilities. While its overall functionality is preserved, we cannot guarantee the retrieval of all matching records if the network-based queries obtained by intersecting relevant entities match more than 10,000 records (this is typically a rare event when intersecting at least 4 distinct entities).
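To see whether a given intersection of entities would hit the 10,000-record limit, one can ask ESearch for the record count only. The sketch below builds the request URL and interprets a JSON response without contacting NCBI. The helper names are hypothetical, and the way entities are quoted and joined with AND is an assumption made for illustration, not ENQUIRE's actual query syntax.

```python
import json
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
RECORD_CAP = 10_000  # maximum number of PMIDs retrievable per query, as stated above


def intersect_entities(entities):
    """Join entity names with AND, i.e. require all of them in a record."""
    return " AND ".join(f'"{entity}"' for entity in entities)


def esearch_count_url(term, api_key=None):
    """Build an ESearch URL that only asks for the number of matching records."""
    parameters = {"db": "pubmed", "term": term, "retmode": "json", "retmax": 0}
    if api_key:
        parameters["api_key"] = api_key
    return f"{ESEARCH}?{urlencode(parameters)}"


def exceeds_cap(esearch_json):
    """Tell whether a JSON ESearch response reports more records than the cap."""
    count = int(json.loads(esearch_json)["esearchresult"]["count"])
    return count > RECORD_CAP


term = intersect_entities(["GPX4", "Ferroptosis", "Neoplasms", "SLC7A11"])
print(esearch_count_url(term))
```

Fetching the printed URL (for example with urllib.request.urlopen) returns a JSON document whose text can be passed to exceeds_cap; supplying the NCBI_API_KEY value as api_key raises the allowed request rate.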

TESTED OPERATING SYSTEMS

Below is a list of operating systems tested for installation and running of Singularity/Apptainer and ENQUIRE:

  • Linux 6.4.12-arch1-1 #1 SMP PREEMPT_DYNAMIC (x86_64 GNU/LINUX)
  • Linux 5.15.0-84-generic #93~20.04.1-Ubuntu SMP (x86_64 GNU/LINUX)
  • Virtual Machine created using Oracle Virtual Box and running Ubuntu 20 LTS
  • macOS Catalina 10.15.7 (Docker implementation, mid-2012 MacBook Pro)
  • Windows 10 (Docker implementation)

Back to the beginning of the instruction manual
