A multiplexed FISH image analysis pipeline for multi-FOV datasets using a pixel-vector decoding approach. Will work for any coding scheme. Allows for visualization of correlation to bulk FPKM counts, and stitches images and spots.
- Prerequisites
- Getting started
- Parameter settings explained
- Citations and code dependencies
- Authors
- License
- Acknowledgements
A computer that can run Python, with at least 16 GB of RAM. No non-standard hardware is required.
- Split-FISH analysis package
- Python 3.7.3
- numpy: 1.18.1
- pandas: 1.0.3
- h5py: 2.9.0
- matplotlib: 3.1.0
- seaborn: 0.10.0
- scikit-image: 0.16.2
- scikit-learn: 0.22.1
- scipy: 1.4.1
- probe design requires Matlab
This software has been tested on the following versions of software packages:
- Python: 3.7.0, 3.7.3, 3.7.4
- Spyder: 3.3.6, 4.1.1
- numpy: 1.15.1, 1.16.4, 1.16.5, 1.18.1
- pandas: 0.23.4, 0.24.2, 0.25.1, 1.0.3
- h5py: 2.8.0, 2.9.0
- matplotlib: 2.2.3, 3.1.0, 3.1.1
- seaborn: 0.9.0, 0.10.0
- scikit-image: 0.14.0, 0.15.0, 0.16.2
- scikit-learn: 0.19.2, 0.21.2. 0.21.3, 0.22.1
- scipy: 1.1.0, 1.4.1
- Download and install Anaconda.
- Spyder and the required packages will be installed with Anaconda. No further library installation is required.
- Installation of Anaconda typically takes less than 30 minutes.
- Code has been tested to run on Spyder and PyCharm but also can also be run from command line (run
MainScript.py) or any other Python 3 compatible IDE.
-
MainScript.py
This script analyzes multiplexed-FISH data. Requires example datasets . Output: The analysis script generates an output folder with the gene coordinates (.hdf5files), gene counts, and imagedata.hdf5files containing metadata and intermediate processing stage outputs. Plots comparing total counts to bulk FPKM values, visualizations of per-FOV FPKM correlation and counts on a grid, and tables of gene counts are saved in theqcfolder within theoutputfolder. A summary file is also generated. -
stitchMstClasses.py
This script stitches together images and spot coordinate positions after analysis. Output from the main script is required. When run, this script will allow selection of the main data directory as well as the directory where the output of the main script is stored. The bit used for stitching should match the reference bit used for registration in the main script. Output: A combined coordinates file of the gene spot coordinates (coords_..._<date>.hdf5) as well as a stitched image (stitched.hdf5) file. -
spotViewer.py– This simple GUI allows visualization of gene spot locationscoords_..._<date>.hdf5overlaid on the stitched imagestitched.hdf5, either for 4 genes at a time or for all genes. A stitched image file is required to view the spots.
MainScript.py took 7-8 minutes to run on the example dataset, while stitchMstClasses.py took less than 1 minute to run. Both analysis steps were done on a computer with Intel Core i7-6700 CPU @ 3.40GHz, 8 cores and 32GB RAM.
Scripts and data used for the manuscript figures can be found in the splitFISH_manuscript_figures folder. When running the code, a dialog window will open up asking for the directory containing data. Select the main folder for each figure, e.g for Figure 1, navigate to this folder.
When using Spyder to run split-FISH_manuscript_figures:
- Go to tools > Preferences > Current working directory.
- Set the current working directory to the home folder of the split-FISH library.
follow these steps:
-
Open the
MainScript.pyfile in Spyder. -
Look for the
Define user’s base parametersline: fields below that need changing. Refer to parameter descriptions for more information. For the purpose of this guide we will only be exploring some basic parameters. -
file_params[“fpkm_filepath”]specifies Location of the FPKM file for the dataset. Change the file name / location accordingly. In this example, we will run the brain dataset. -
“fovs_to_process”: Provide a list of FOVs for analysis. Four FOVs are provided in the brain dataset, from 0 to 4. You can either type in the list manually, or by using Python'slist(range())function. -
“hyb_list”and“type_list”are important parameters that tell the script which images correspond to which bit of the codebook. The correct“hyb_list”and“type_list”for the example dataset has been provided in theMainScript.py, but as an example, for a dataset with 6 bits run in 3 hybridization rounds using only the Cy5 and Cy7 channel:
split_fish_params["hyb_list"] = [0, 1, 2, 0, 1, 2]
split_fish_params["type_list"] = ["Cy5", "Cy5", "Cy5", "Cy7", "Cy7", "Cy7"]Here, Bit 1, 2, and 3 are ran in hybridzation rounds 0, 1, and 2 respectively, and imaged in the Cy5 channel.
-
Run the script. A pop up window will appear behind the Spyder window asking you to select the data directory. Navigate to the location of the demo dataset and hit select folder.
-
The code will start running the analysis. The analysis is complete when the console reports the time taken to run the whole script.
-
The code will create three folders in the data directory – output, data_tables, and filters. The output folder is where you will find results – FPKM plots, gene coordinates, gene counts, and more. A
.tsvfile summary will also be generated. Example:
| correlation | p_value | total_count | percent_above_blank | gene_blank_ratio | |
|---|---|---|---|---|---|
| split_fish_4fovs_test | 0.74 | 1.94E-3 | 14498 | 76 | 20 |
- Stitching can only be done after the main analysis has been completed. The script will look for the image from bit
basebit, but the approprate bit to use can also be specified by a combination ofbasetype(referring to colour channel) andbasehyb(referring to hybridization round) if you setbasebit = None. For this example we use bit 14 as ourbasebit. Important: Thebasebitmust be the same as thereference_bitset in the main script, otherwise the spots will be offset from the image.
# Define parameters
# -----------------
microscope_type = "Dory"
basebit = 14
basetype = "Cy5"
basehyb = 0- Run the code. A popup window will appear behind the Spyder window, asking you to choose the main data directory. Navigate to the folder where the images are and select folder. Next, a window will appear asking to choose the directory with the processed images and spots data. Navigate to the specific output folder where the hdf5 coordinates files are located. In this guide, with the provided example output, this would be the “
split_fish_4FOVs_mag0_40_ssth0_80_<date>” folder. Select the folder and let the stitching program run. Note For example dataset, field correction masks (estimated from all 49 FOVs) are not provided and may result in suboptimal stitcing output.
-
SpotViewer can be used to open
.daxand.hdf5image files, as well as.hdf5spot coordinate files for gene location visualization. Open thespotViewer.pyscript with Spyder and run the script. Click the ‘Raw images file’ button, and select thestitched.hdf5file generated from the stitching code. Wait a moment for the stitched image to appear. You can then adjust the contrast of the image using the figure button. Click the ‘Genes file’ button, and select thecoords_combined_merged_<date>.hdf5file generated from the stitching code. -
Clicking on the 'View ALL Genes' button will show all the genes in the
.hdf5spot coordinate files; clicking on 'Show Genes' will display the genes selected in the drop down menus. Clicking on the 'Eraser' button will remove all gene spots. Gene spot size can be adjusted in the 'SPOT SIZE' field. -
The displayed image can be saved using the save button; hi-res images can be saved by using the 'Save Hi-res' button.
An explanation of each of the parameters used for decoding in MainScript.py.
Detailed explanations:
-
hyb_list, colour_list, num_bits
2 sets of lists of the same length that specify the hybridization round number (typeint) and colour channel (typestr) for each bit of the coding scheme.num_bitsis the total number of bits and must match the lengths ofhyb_listandcolour_list. -
fovs_to_process
Subset of FOVs to process. Can be specified as a list ofints orstrs. If usingstrto represent, you must match the filename FOV exactly. e.g. if filename is<colour>_<hyb>_01, FOV must be specified as01. -
fovstr_length
Length of the FOV string representation e.g.01has length of 2,001has length of 3. Typically dependent on number of FOVs taken e.g. 10-99 will have string length of 2, more than 100 would have string length of 3. -
reference_bit
The bit to use as a reference for registration. Make sure to use the same bit when stitching, otherwise spots will be offset from the image. -
stage_pixel_matrix
Conversion matrix from stage coordinates (in um) to image coordinates (pixels). The configuration of the matrix accounts for any kind of rotation or flipping relative to physical stage coordinates. This is used to generate a grid of FOVs as well as set up starting estimates for stitching FOVs. Hence needs to be reasonably accurate but no need to be exact. e.g. Microscope used in split-probe paper has approximately 8 pixels per micron, and is flipped and has y and x axes swapped:8 * np.array([[0, -1], [-1, 0]]) -
num_iterations
Not implemented yet in this version. Set to1. -
drop_genes
Not implemented. Do not use. -
roi
Ignore. Not applicable to microscope used in split-probe paper. Set toNone. -
fpkm_structname
Ignore. Legacy parameter for when we were reading matlab .mat files that contained FPKM values.
The following paramters toggle whether various corrections (background subtraction, field and chromatic distortion correction) are applied to images
-
correct_field
Whether to perform field correction. If set toTrue, make sure to have an appropriate field correction .hdf5 file in the calibration folder. This file should contain field maps for all colours used in your imaging run. -
imgcolour_to_maskcolour
A dictionary that matches the corresponding colours in the image filenames to those in the correction .hdf5 file. e.g. if you choose to use washed / bleached images for correction:
params["imgcolour_to_maskcolour"] = {
"Cy5": "Cy5_Bleach",
"Cy7": "Cy7_Bleach",
}-
correct_distortion
Whether to perform chromatic distortion correction. If set toTrue, make sure to have the appropriate colour-to-colour .tsv files. -
type_to_colour
A dictionary that matches the corresponding references in the image filenames to the colour channel specifications in the distortion correction .tsv files. e.g.
params["type_to_colour"] = {
'Cy3': '558',
'Alexa594': '620',
'Cy5': '684',
'Cy7': '776',
}-
reference_colour
The reference colour to use for correcting distortion. All other colour channels will be distorted to match this channel. -
subtract_background
Whether to subtract background. Not used in split-probe paper.
The following parameters change the properties of the filter used to preprocess the image
-
filter_order
The order of the 2D frequency domain filter. 2 seems to work best. -
low_cut
The low frequency cutoff for the 2D frequency domain filter. Specified in pixels. -
high_cut
The high frequency cutoff for the 2D frequency domain filter. Specified in pixels.
The following parameters affect the normalization of images across each bit.
-
percentile_norm_high
The percentile of image intensities in a single image to use for normalization, if global normalization is not used. -
global_perfov_percentile, global_pooled_percentile, globalnorm_fovsubset
Parameters for global normalization. For each bit, all pixel intensities in each FOV aboveglobal_perfov_percentileare pooled. Then, theglobal_pooled_percentileintensity value of the pooled pixel intensities is used for normalization.globalnorm_fovsubsetis optional and allows use of a subset of the FOVs for global normalization. -
clip_normalized Whether to clip values above the normalization value. e.g. if normalization value for filtered image is 2000, a pixel intensity value of 2200 will have a normalized value of 1.1 if this is set to
Falseand will be clipped to 1 if this is set toTrue.
The following thresholds are applied on Corrected, Registered, Filtered, Normalized and (optional) Clipped images.
- magnitude_threshold
This is the root mean square of intensities across all bits. Used to eliminate pixels that are too dim. Note that pixels that do not pass this threshold are not decoded. Setting this higher may slightly reduce decoding time.
The following thresholds are applied after unit-normalization of each pixel (i.e. each pixel vector is normalized to 1).
- distance_threshold
Maximum vector distance from unit-normalized codeword within which to call the pixel as a particular codeword
The following are applied after spots are grouped into connected regions i.e. they apply to spots instead of pixels.
- min_region_size
set minimum size for spots (connected regions) to be greater than or equal to this number of pixels
Detailed explanations:
- codebook_filepath Location of the codebook .tsv file. This should be a tab-separated text file with bit values as its first columns and gene names as its right-most column.
- fpkm_filepath Location of the FPKM values .tsv or .txt file. This should be a tab-separated text file with gene names in its first column and FPKM values as its second column. The gene names should match those in codebook_filepath. Only genes common to the gene name columns in both files will be used for analysis.
- calibration_path The folder where the calibration files for both field and chromatic correction are stored. Chromatic correction files are
.tsvfiles. Field correction files are generated byfieldCorrMaskGenerator.py. There should only be one field correction file in the folder.
-
The pixel decoding approach, in particular the use of normalized vector distances to compare pixel vectors to codewords, was originally implemented here: https://github.com/ZhuangLab/MERFISH_analysis
-
ReadDax and WriteDax are modified versions of readers/writers for the
.daxbinary image format found in https://github.com/zhuanglab/storm-control, which is under the MIT License. -
LHS implementation is from PyDOE which is under the BSD License (3-Clause). An unmodified version of the code for Latin-Hypercube sampling is included in the "external" folder so that the user does not need to install the full PyDOE package.
-
The probe design script was modified from the Zhuang Lab github (https://github.com/ZhuangLab/MERFISH_analysis/blob/master/example_scripts/library_design_example.m) to 'split' the probes and attach the bridge sequences
- NumPy
- Pandas
- SciPy
- scikit-image
A version ofregister_translationis included in the "utils" folder. This has been modified to output a value proportional to the size of the phase correlation peak (the original implemenation outputs a translation-invariant value). The main algorithm is unchanged. This modification is only used for stitching FOVs. - scikit-learn
- matplotlib
- Seaborn
- h5py
- pyQt5
-
Nigel Chou - main contributor - (https://github.com/chousn)
-
Mike Huang - contributed initial stitching pipeline - (https://github.com/mikejhuang)
-
Vipul Singhal - contributed LHS implementation - (https://github.com/vipulsinghal02)
-
Wan Yi Seow - performed code testing and created documentation - (https://github.com/seowwy)
See full license here.
- Jolene Goh
- Shyam Prabhakar
- Yun-ching Chang
- Maurice Lee
- CHEN Kok Hao, chenkh at gis.a-star.edu.sg
- CHOU Shijie Nigel, Nigel_Chou at gis.a-star.edu.sg
