Open
82 commits
25408c6
Increament version.
emanuega Dec 3, 2019
450ab95
Added ability to put text labels on the mosaic to indicate the fov. (…
timblosser Dec 5, 2019
5ba40d8
Assign cell fix (#32)
Dec 5, 2019
907f4c2
Merge pull request #33 from emanuega/v0.1.3
Dec 5, 2019
60f4ca9
Increment version.
emanuega Dec 7, 2019
0f0dc83
Update docs and requirements to better reflect the current version of…
emanuega Dec 8, 2019
f08e0be
Parallel cleancell (#34)
Dec 11, 2019
e3f3c95
Reduced snakemake overhead by adding analysis task to check completio…
Jan 11, 2020
63aebc5
Fix bug in GenerateAdaptiveThreshold, avoids crash when resubmit (#40)
leonardosepulveda Jan 22, 2020
a9409a7
Merge pull request #41 from emanuega/v0.1.4
Jan 22, 2020
fe5c4de
Increment version number.
emanuega Jan 22, 2020
3e82b52
Updated aws bucket name.
emanuega Jan 30, 2020
ec5eb87
Updated shapely requirements.
emanuega Jan 30, 2020
861b4e8
Merge pull request #45 from emanuega/aws_test_bucket_update
Jan 31, 2020
b688a08
Fix docutil version requirements.
leonardosepulveda Feb 15, 2020
f221a3d
Updated filemap to only include filename so that the path can be easi…
timblosser Feb 27, 2020
ab9a4ee
Moved graph read and write to within dataset.
emanuega Feb 9, 2020
9b8a43a
Added ancient tag for snakemake for all inputs.
emanuega Feb 10, 2020
7640da5
Updated how tracking of completion of parallel analysis tasks is done.
emanuega Mar 10, 2020
96af66e
Removed print statement.
emanuega Mar 10, 2020
9aef305
Reduced time to check if analysis task is idle. Now a analysis task c…
emanuega Mar 10, 2020
bcb7eb8
Merge pull request #52 from emanuega/gpickle_readwrite_to_dataset
Mar 20, 2020
559a115
Merge branch 'v0.1.5' into cleanup_duplicate_barcodes
Mar 24, 2020
502ba49
adding functionality to remove zplane duplicates
Mar 25, 2020
703282c
fixing imports, adding to tests, updating docs
Mar 25, 2020
d21f9a0
pep8 and changelog
Mar 25, 2020
5ff99cf
pep8
Mar 25, 2020
c24585c
pep8
Mar 25, 2020
01fbb7b
pep8
Mar 25, 2020
43c7f73
Improved decoding speed
Mar 28, 2020
a120ba3
restructuring to address comments
Mar 30, 2020
47893d8
updating for unit test
Mar 30, 2020
d0f6523
updating for unit test
Mar 30, 2020
e37fb10
adding test
Mar 30, 2020
ba99438
updating item method to address future warning
Mar 30, 2020
f1a5c8d
updating test
Mar 30, 2020
c505efc
fixing function call
Mar 30, 2020
f94c64e
changing sequential high pass to false by default
Mar 30, 2020
5d7767f
updating docs
Mar 30, 2020
2e36ddb
Merge branch 'v0.1.5' into cleanup_duplicate_barcodes
emanuega Mar 30, 2020
3b3b783
pep8
Mar 31, 2020
bd661cd
Merge pull request #53 from emanuega/cleanup_duplicate_barcodes
Mar 31, 2020
9024430
Merge pull request #55 from emanuega/v0.1.5
Mar 31, 2020
4818343
Increment version number.
emanuega Mar 31, 2020
cde887b
Add Lucy-Richardson deconvolution algorithm that uses projectors as d…
HazenBabcock Apr 8, 2020
9dd91d7
Fix super-class. Add FIXME about setting 'decon_iterations'.
HazenBabcock Apr 9, 2020
fa5376f
Update FIXME.
HazenBabcock Apr 9, 2020
0cfd0f7
Fix default value for 'decon_iterations' as suggested by George E. Im…
HazenBabcock Apr 9, 2020
1360338
Add DeconvolutionPreprocessGuo to the documentation. Update CHANGELOG.
HazenBabcock Apr 9, 2020
6217aca
addressing edge case of no barcodes and correcting z position indexin…
Apr 10, 2020
1855b12
updating test z barcode z indexes to reflect typical barcode z indexing
Apr 10, 2020
e38e9b2
updating changelog
Apr 10, 2020
d09954d
Rename filter module to imagefilters. Change high_pass_filter functio…
HazenBabcock Apr 10, 2020
7beed88
Move image high pass filtering into it's own method. Use imagefilters…
HazenBabcock Apr 10, 2020
e1f7b1c
Remove normalization as the back projector is already normalized. Red…
HazenBabcock Apr 10, 2020
61212f7
Cast high pass image to float for deconvolve_lucyrichardson() function.
HazenBabcock Apr 10, 2020
4f3f481
Fix whitespace.
HazenBabcock Apr 10, 2020
1b59e3d
Changed high_pass_filter to maintain the data type of the input image.
emanuega Apr 11, 2020
2fa7927
Add deconvolution tests.
HazenBabcock Apr 11, 2020
e7d61a4
Merge branch 'v0.1.6' into faster_decon
emanuega Apr 13, 2020
b940770
Merge pull request #56 from HazenBabcock/faster_decon
emanuega Apr 13, 2020
0af265e
Add optional 'fov_index' parameter that specifies which fov and z sec…
HazenBabcock Apr 13, 2020
d8a60fc
Add documentation for the fov_index parameter. Move random choice of …
HazenBabcock Apr 15, 2020
64532d2
Remove whitespace.
HazenBabcock Apr 15, 2020
cce491a
Text tweak.
HazenBabcock Apr 15, 2020
0e734e0
moving z plane duplicate removal to decode step
Apr 15, 2020
74e08b8
updating docs and changelog
Apr 15, 2020
9c2bfd6
adding test for case where no barcodes are present but get passed to …
Apr 15, 2020
7f2551c
pep8
Apr 15, 2020
b907c68
updating tests analysis parameters
Apr 15, 2020
fd627bd
updating test analysis parameters
Apr 15, 2020
85af00b
moving paraams and function from barcodesaving task to decode
Apr 17, 2020
175480d
Merge branch 'v0.1.6' into filterfix
emanuega Apr 17, 2020
6fc6d46
Merge pull request #57 from emanuega/filterfix
Apr 17, 2020
22002ff
Merge branch 'v0.1.6' into deterministic_optimization
emanuega Apr 17, 2020
c802058
Merge pull request #59 from HazenBabcock/deterministic_optimization
emanuega Apr 17, 2020
cfd8aba
Merge pull request #60 from emanuega/v0.1.6
emanuega Apr 17, 2020
1436fc4
updating license
Apr 20, 2020
616d8e9
adding citation
Apr 20, 2020
29bda2c
adding doi badge
Apr 20, 2020
be3c994
Merge pull request #61 from ZhuangLab/master
emanuega Apr 20, 2020
5056b1c
Change .format to % in dataset.py
LesikDee Nov 16, 2020
28 changes: 28 additions & 0 deletions CHANGELOG.md
100644 → 100755
@@ -17,3 +17,31 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- Exposed tolerance parameter in the adaptive filter barcodes method
- Added plot for scale factor magnitude vs bit index
- Fixed barcode partitioning to include cells from adjacent fields of view when a cell falls across fov boundaries

## [0.1.3] - 2019-12-04
### Fixed
- Addressed bugs present in cleaning overlapping cells and assigning them to a fov
### Added
- Added option to draw field of view labels overlaid on the mosaic

## [0.1.4] - 2019-12-05
### Added
- Added task to evaluate whether a parallel analysis task has completed
### Changed
- Changed the cleaning of overlapping cells to run in parallel
- Snakemake job inputs were simplified using the ParallelCompleteTask to improve DAG construction speed and overall snakemake runtime performance

## [0.1.5] - 2020-01-22
### Changed
- Updated the filemap to only store the file name so that it can easily be pointed to new data home directories. This change maintains backward compatibility.
- Improved decoding speed
### Added
- Parameters to filter tasks that enable removing barcodes that were putatively duplicated across adjacent z planes.

## [0.1.6] -
### Fixed
- Fixed bug and edge cases in removal of barcodes duplicated across z planes. Moved to the decode step to prevent unintended conflict with misidentification rate determination.

### Added
- An alternative Lucy-Richardson deconvolution approach that requires ~10x fewer iterations.

4 changes: 4 additions & 0 deletions README.md
@@ -1,5 +1,6 @@
[![CircleCI](https://circleci.com/gh/emanuega/MERlin/tree/master.svg?style=svg)](https://circleci.com/gh/emanuega/MERlin/tree/master)
[![codecov](https://codecov.io/gh/emanuega/MERlin/branch/master/graph/badge.svg)](https://codecov.io/gh/emanuega/MERlin)
[![DOI](https://zenodo.org/badge/202668055.svg)](https://zenodo.org/badge/latestdoi/202668055)

# MERlin - Extensible pipeline for scalable data analysis

@@ -9,6 +10,9 @@ single task or split among many subtasks that can be executed in parallel. MERli
execute workflows on a single computer, on a high performance cluster, or on the cloud
(AWS and Google Cloud).

If MERlin is useful for your research, consider citing:
Emanuel, G., Eichhorn, S. W., Zhuang, X. 2020, MERlin - scalable and extensible MERFISH analysis software, v0.1.6, Zenodo, doi:10.5281/zenodo.3758540

Please find the most recent version of MERlin [here](https://github.com/emanuega/merlin).

## MERFISH data analysis
11 changes: 9 additions & 2 deletions docs/installation.rst
@@ -58,13 +58,14 @@ MERlin can be installed by cloning the repository and installing with pip:
Specifying paths with a .env file
==================================

A .env file is required to specify the search locations for the various input and output files. The following variables should be defined in a file named .env in the user home directory (~\.env on linux or C:\users\UserName\.env on Windows):
A .merlinenv file is required to specify the search locations for the various input and output files. The following variables should be defined in a file named .merlinenv in the user home directory (~/.merlinenv on Linux or C:\\users\\UserName\\.merlinenv on Windows):

* DATA\_HOME - The path of the root directory to the raw data.
* ANALYSIS\_HOME - The path of the root directory where analysis results should be stored.
* PARAMETERS\_HOME - The path to the directory where the merfish-parameters directory resides.

The PARAMETERS_HOME directory should contain the following folders:

* analysis - Contains the analysis parameters json files.
* codebooks - Contains the codebook csv files.
* dataorganization - Contains the data organization csv files.
@@ -76,10 +77,16 @@ The PARAMETERS_HOME directory should contain the following folders:
An example PARAMETERS_HOME directory with typical files can be found in the
`merlin-parameters-example <https://github.com/emanuega/merlin-parameters-example>`_ repository.

The contents of an example .env file are below:
The contents of an example .merlinenv file are below:

.. code-block:: none

    DATA_HOME=D:/data
    ANALYSIS_HOME=D:/analysis
    PARAMETERS_HOME=D:/merfish-parameters

MERlin can create a .merlinenv file for you using the command:

.. code-block:: none

    merlin --configure .
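
As a rough illustration of the file format described above, the sketch below parses ``KEY=VALUE`` lines and reports which of the three documented variables are missing. The function names ``parse_env_text`` and ``missing_keys`` are hypothetical and are not part of MERlin; how MERlin itself loads the file is not shown here.

```python
# Hypothetical helper names; this is not MERlin's own loader.
REQUIRED_KEYS = ("DATA_HOME", "ANALYSIS_HOME", "PARAMETERS_HOME")


def parse_env_text(text):
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first '=' only, so values may contain '='.
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(values):
    """Return the documented variables that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]


example = (
    "DATA_HOME=D:/data\n"
    "ANALYSIS_HOME=D:/analysis\n"
    "PARAMETERS_HOME=D:/merfish-parameters\n"
)
settings = parse_env_text(example)
print(settings["DATA_HOME"])
print(missing_keys(settings))
```

A check like this can be run before starting an analysis to catch a misspelled variable name early.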
44 changes: 40 additions & 4 deletions docs/tasks.rst
100644 → 100755
@@ -35,6 +35,19 @@ Parameters:
* decon\_iterations -- The number of Lucy-Richardson deconvolution iterations to perform on each image.
* decon\_filter\_size -- The size of the gaussian filter to use for the deconvolution. It is not recommended to set this parameter manually.

preprocess.DeconvolutionPreprocessGuo
--------------------------------------

Description: High-pass filters and deconvolves the image data in preparation for bit-calling. This version uses the Lucy-Richardson deconvolution approach described in `Guo et al. <http://dx.doi.org/10.1101/647370>`_.

Parameters:

* warp\_task -- The name of the warp task that provides the aligned image stacks.
* highpass\_sigma -- The standard deviation to use for the high pass filter.
* decon\_sigma -- The standard deviation to use for the Lucy-Richardson deconvolution.
* decon\_iterations -- The number of Lucy-Richardson deconvolution iterations to perform on each image. The default value is 2.
* decon\_filter\_size -- The size of the gaussian filter to use for the deconvolution. It is not recommended to set this parameter manually.

optimize.Optimize
------------------

@@ -43,7 +56,8 @@ Description: Determines the optimal per-bit scale factors for barcode decoding.
Parameters:

* iteration\_count -- The number of iterations to perform for the optimization.
* fov\_per\_iteration -- The number of fields of view to decode in each round of optimization.
* fov\_index -- (Optional) A list of [[fov_1, z_value_1], [fov_2, z_value_2], ..] specifying which fields of view and what z values should be used for optimization.
* fov\_per\_iteration -- The number of fields of view to decode in each round of optimization. This will be set to the length of ``fov_index`` if the ``fov_index`` parameter is specified.
* estimate\_initial\_scale\_factors\_from\_cdf -- Flag indicating if the initial scale factors should be estimated from the pixel intensity cdf. If false, the initial scale factors are all set to 1. If true, the initial scale factors are based on the 90th percentile of the pixel intensity cdf.
* area\_threshold -- The minimum barcode area for barcodes to be used in the calculation of the scale factors.
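
The interaction between ``fov_index`` and ``fov_per_iteration`` can be sketched as follows. This is only an illustration under stated assumptions: the function ``choose_fov_z_pairs`` and its arguments are hypothetical, and the random fallback shown here is one plausible way to pick fields of view when ``fov_index`` is not given, not necessarily what MERlin does.

```python
import random


def choose_fov_z_pairs(all_fovs, z_values, fov_per_iteration,
                       fov_index=None, seed=None):
    """Return the [fov, z] pairs to decode in one optimization round."""
    if fov_index is not None:
        # An explicit list takes precedence, so the number of pairs
        # equals len(fov_index) regardless of fov_per_iteration.
        return [list(pair) for pair in fov_index]
    # Assumed fallback: draw fov_per_iteration random [fov, z] pairs.
    rng = random.Random(seed)
    return [[rng.choice(all_fovs), rng.choice(z_values)]
            for _ in range(fov_per_iteration)]


explicit = choose_fov_z_pairs([0, 1, 2, 3], [0.0, 1.5], 10,
                              fov_index=[[0, 1.5], [3, 0.0]])
sampled = choose_fov_z_pairs([0, 1, 2, 3], [0.0, 1.5], 4, seed=1)
print(len(explicit), len(sampled))
```

Passing a fixed ``fov_index`` makes the set of decoded images the same from run to run, which is useful when comparing parameter choices.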

@@ -58,6 +72,9 @@ Parameters:
* write\_decoded\_images -- Flag indicating if the decoded and intensity images should be written.
* minimum\_area -- The area threshold, below which decoded barcodes are ignored.
* lowpass\_sigma -- The standard deviation for the low pass filter prior to decoding.
* remove\_z\_duplicated\_barcodes -- Flag indicating if putative duplicate barcode counts from adjacent z planes should be removed.
* z\_duplicate\_zPlane\_threshold -- If removing putative duplicate barcodes, the number of adjacent z planes to consider. Generally, planes within 2 µm of each other are worth considering.
* z\_duplicate\_xy\_pixel\_threshold -- If removing putative duplicate barcodes, the maximum Euclidean distance in xy pixels that can separate the centroids of putative duplicates.
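
The idea behind the three z-duplicate parameters can be sketched as below. This is a simplified illustration, not MERlin's implementation: the function name, the dictionary keys (``barcode_id``, ``x``, ``y``, ``z_index``) and the default thresholds are all assumptions made for the example.

```python
import math


def remove_z_duplicates(barcodes, z_plane_threshold=1,
                        xy_pixel_threshold=1.414):
    """Drop barcodes that repeat a kept barcode in a nearby z plane.

    A barcode is treated as a duplicate when an already kept barcode
    has the same id, lies in a different z plane at most
    z_plane_threshold planes away, and its centroid is within
    xy_pixel_threshold pixels in xy.
    """
    kept = []
    ordered = sorted(barcodes,
                     key=lambda b: (b["barcode_id"], b["z_index"]))
    for candidate in ordered:
        is_duplicate = False
        for other in kept:
            if other["barcode_id"] != candidate["barcode_id"]:
                continue
            dz = abs(candidate["z_index"] - other["z_index"])
            if dz == 0 or dz > z_plane_threshold:
                continue
            dxy = math.hypot(candidate["x"] - other["x"],
                             candidate["y"] - other["y"])
            if dxy <= xy_pixel_threshold:
                is_duplicate = True
                break
        if not is_duplicate:
            kept.append(candidate)
    return kept


barcodes = [
    {"barcode_id": 5, "x": 10.0, "y": 10.0, "z_index": 0},
    {"barcode_id": 5, "x": 10.5, "y": 10.5, "z_index": 1},  # near copy
    {"barcode_id": 5, "x": 40.0, "y": 10.0, "z_index": 1},  # far away
    {"barcode_id": 7, "x": 10.0, "y": 10.0, "z_index": 1},  # other id
]
filtered = remove_z_duplicates(barcodes)
print(len(filtered))
```

This sketch always keeps the barcode from the lowest z plane; a real implementation might instead keep the brightest or largest of the duplicates.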

filterbarcodes.FilterBarcodes
------------------------------
@@ -98,10 +115,20 @@ Parameters:
* seed\_channel\_name -- The name of the data channel to use to find seeds.
* watershed\_channel\_name -- The name of the data channel to use as the watershed image.

segment.AssignCellFOV
segment.CleanCellBoundaries
--------------------------------

Description: For a FOV of interest, this task identifies all other FOVs with any overlapping regions and constructs a graph containing the cells from the FOV of interest together with every cell, from that FOV or an overlapping FOV, that overlaps one of them. Edges connect overlapping cells.

segment.CombineCleanedBoundaries
--------------------------------

Description: Assigns each cell to the FOV centroid they are closest to, and eliminates overlapping cells from the dataset, keeping 1.
Description: Combines the cleaned cell boundaries generated for each fov, and eliminates overlapping cells, preferentially removing cells that overlap with the largest number of other cells until there is no more overlap in a given group of cells.
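
The elimination rule described for this task can be sketched as a greedy procedure on an overlap graph. The code below is an assumption-laden illustration, not MERlin's code: ``remove_overlapping_cells`` is a hypothetical name, the graph is a plain dictionary mapping each cell id to the set of cells it overlaps (symmetric, with every cell present as a key), and ties are broken by cell id only to keep the result deterministic.

```python
def remove_overlapping_cells(overlaps):
    """Greedily remove the most-overlapping cell until none overlap.

    Returns (kept_cells, removed_cells).
    """
    # Copy so the caller's dictionary is not modified.
    graph = {cell: set(neighbors) for cell, neighbors in overlaps.items()}
    removed = []
    while graph:
        # Cell with the largest number of remaining overlaps.
        worst = max(graph, key=lambda c: (len(graph[c]), str(c)))
        if not graph[worst]:
            break  # no overlaps remain anywhere
        for neighbor in graph.pop(worst):
            graph[neighbor].discard(worst)
        removed.append(worst)
    return sorted(graph), removed


overlaps = {
    "a": {"b", "c"},
    "b": {"a"},
    "c": {"a"},
    "d": set(),
}
kept, removed = remove_overlapping_cells(overlaps)
print(kept, removed)
```

Removing the most connected cell first tends to preserve more cells overall than removing cells in arbitrary order, since one removal can resolve several overlaps at once.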

segment.RefineCellDatabases
--------------------------------

Description: Creates a new cell database based on an initial cell database and a set of cells to keep.

segment.ExportCellMetadata
--------------------------------
@@ -119,7 +146,7 @@ Parameters:
* data\_channels -- The names of the data channels to export, corresponding to the data organization. If not provided, all data channels are exported.
* z\_indexes -- The z index to export. If not provided all z indexes are exported.
* fov\_crop\_width -- The number of pixels to remove from each edge of each fov before inserting it into the mosaic.

* draw\_fov\_labels -- Flag indicating if the fov index should be drawn on top of each fov in the mosaic.

sequential.SumSignal
-------------------------------

@@ -169,3 +196,12 @@ Parameters:
* sum\_task
* partition\_task
* global\_align\_task

paralleltaskcomplete.ParallelTaskComplete
-----------------------------------------

Description: Checks whether a parallel analysis task has completed all jobs and, if so, creates a done file for that task. This task does not need to be invoked by the user; it is used by the snakewriter.

Parameters:

* dependent\_task -- the parallel analysis task to check to see if it has completed
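
The completion check can be pictured with the sketch below. It is a minimal illustration under assumptions: the per-job marker names (``job_0.done``, ``job_1.done``, ...), the task-level marker ``task.done`` and the function ``mark_done_if_complete`` are all hypothetical and do not reflect MERlin's actual file layout.

```python
import tempfile
from pathlib import Path


def mark_done_if_complete(task_dir, job_count):
    """Create a task-level done file once every job has finished.

    Each job is assumed to leave a marker named job_<i>.done in
    task_dir. Returns True if the task is complete.
    """
    task_dir = Path(task_dir)
    all_done = all((task_dir / f"job_{i}.done").exists()
                   for i in range(job_count))
    if all_done:
        (task_dir / "task.done").touch()
    return all_done


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "job_0.done").touch()
    first_check = mark_done_if_complete(tmp, 2)   # job 1 still pending
    (Path(tmp) / "job_1.done").touch()
    second_check = mark_done_if_complete(tmp, 2)  # all jobs finished
    done_file_exists = (Path(tmp) / "task.done").exists()

print(first_check, second_check, done_file_exists)
```

With a single done file per parallel task, a downstream rule only needs to depend on one input instead of one input per job, which is consistent with the changelog note about simplifying snakemake job inputs.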
150 changes: 21 additions & 129 deletions license.md
@@ -1,129 +1,21 @@
**Academic and/or Non-Commercial Research Use Software License and Terms of
Use**

This Agreement concerns MERLIN (the “Software”), useful for analysis of images
obtained using MERFISH. The Software was developed by George Emanuel, Stephen
Eichhorn, and Xiaowei Zhuang at Harvard University.

Using the Software indicates your agreement to be bound by the terms of this
Software Use Agreement (“Agreement”). Absent your agreement to the terms below,
you (the “End User”) have no rights to hold or use the Software whatsoever.

President and Fellows of Harvard College (“Harvard”) agrees to grant hereunder
the limited non-exclusive license to End User for the use of the Software in the
performance of End User’s internal, non-commercial research and/or academic use
at End User’s institution or company (“Institution”) on the following terms and
conditions:

1. **NO REDISTRIBUTION.** The Software remains the property of Harvard, and
except as set forth in Section 4, End User shall not publish, distribute, or
otherwise transfer or make available the Software to any other party.

2. **NO COMMERCIAL USE.** End User shall not use the Software for commercial
purposes and any such use of the Software is expressly prohibited. This
prohibition includes, but is not limited to, use of the Software in
fee-for-service arrangements or to provide research services to (or in
collaboration with) third parties for a fee. This prohibition does not
extend to use for internal research purposes within a for-profit entity. If
End User wishes to use the Software for commercial purposes prohibited
herein, or for any other restricted purpose, End User must execute a
separate license agreement with Harvard.

> *Requests for use of the Software for or on behalf of for-profit entities or
> for any commercial purposes, please contact*:
>
> Office of Technology Development
Harvard University
Smith Campus Center, Suite 727E
1350 Massachusetts Avenue
Cambridge, MA 02138 USA
Telephone: (617) 495-3067
E-mail: otd\@harvard.edu

3. **OWNERSHIP AND COPYRIGHT NOTICE.** Harvard owns all intellectual property
in the Software. End User shall gain no ownership to the Software. End User
shall not remove or delete and shall retain in the Software, in any
modifications to Software and in any Derivative Works, the copyright,
trademark, or other notices pertaining to Software as provided with the
Software.

4. **DERIVATIVE WORKS.** End User may create and use Derivative Works, as such
term is defined under U.S. copyright laws, provided that any such Derivative
Works shall be restricted to non-commercial, internal research and/or
academic use at End User’s Institution. End User may distribute Derivative
Works to other institutions solely for the performance of non-commercial,
internal research and/or academic use on terms substantially similar to this
License and Terms of Use.

5. **FEEDBACK.** In order to improve the Software, comments from End Users may
be useful. End User agrees to provide Harvard with feedback on the End
User’s use of the Software (e.g., any bugs in the Software, the user
experience, etc.). Harvard is permitted to use such information provided by
End User in making changes and improvements to the Software without
compensation or an accounting to End User.

6. **NON ASSERT.** End User acknowledges that Harvard may develop modifications
to the Software that may be based on the feedback provided by End User under
Section 5 above. Harvard shall not be restricted in any way by End User
regarding the use of such information. End User acknowledges the right of
Harvard to prepare, publish, display, reproduce, transmit and or use
modifications to the Software that may be substantially similar or
functionally equivalent to End User’s modifications and/or improvements if
any. In the event that End User obtains patent protection for any
modification or improvement to Software, End User agrees not to allege or
enjoin infringement of End User’s patent against Harvard or any of its
researchers, medical or research staff, officers, directors and employees.

7. **PUBLICATION & ATTRIBUTION.** End User has the right to publish, present,
or share results from the use of the Software.  In accordance with customary
academic practice, End User will acknowledge Xiaowei Zhuang Laboratory at
Harvard as the provider of the Software.

8. **NO WARRANTIES.** THE SOFTWARE IS PROVIDED "AS IS." TO THE FULLEST EXTENT
PERMITTED BY LAW, HARVARD HEREBY DISCLAIMS ALL WARRANTIES OF ANY KIND
(EXPRESS, IMPLIED OR OTHERWISE) REGARDING THE SOFTWARE, INCLUDING BUT NOT
LIMITED TO ANY IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A
PARTICULAR PURPOSE, OWNERSHIP, AND NON-INFRINGEMENT. HARVARD MAKES NO
WARRANTY ABOUT THE ACCURACY, RELIABILITY, COMPLETENESS, TIMELINESS,
SUFFICIENCY OR QUALITY OF THE SOFTWARE. HARVARD DOES NOT WARRANT THAT THE
SOFTWARE WILL OPERATE WITHOUT ERROR OR INTERRUPTION.

9. **Limitations of Liability and Remedies**. USE OF THE SOFTWARE IS AT END
USER’S OWN RISK. IF END USER IS DISSATISFIED WITH THE SOFTWARE, ITS
EXCLUSIVE REMEDY IS TO STOP USING IT. IN NO EVENT SHALL HARVARD BE LIABLE TO
END USER OR ITS INSTITUTION, IN CONTRACT, TORT OR OTHERWISE, FOR ANY DIRECT,
INDIRECT, SPECIAL, INCIDENTAL, CONSEQUENTIAL, PUNITIVE OR OTHER DAMAGES OF
ANY KIND WHATSOEVER ARISING OUT OF OR IN CONNECTION WITH THE SOFTWARE, EVEN
IF HARVARD IS NEGLIGENT OR OTHERWISE AT FAULT, AND REGARDLESS OF WHETHER
HARVARD IS ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

10. **INDEMNIFICATION.** To the extent permitted by law, End User shall
indemnify, defend and hold harmless Harvard and its current or future
directors, trustees, officers, faculty, medical and professional staff,
employees, students and agents and their respective successors, heirs and
assigns (the "Indemnitees"), against any liability, damage, loss or expense
(including reasonable attorney's fees and expenses of litigation) incurred
by or imposed upon the Indemnitees or any one of them in connection with any
claims, suits, actions, demands or judgments arising from End User’s breach
of this Agreement or its Institution’s use of the Software except to the
extent caused by the gross negligence or willful misconduct of Harvard. This
indemnification provision shall survive expiration or termination of this
Agreement.

11. **GOVERNING LAW.** This Agreement shall be construed and governed by the
laws of the Commonwealth of Massachusetts regardless of otherwise applicable
choice of law standards.

12. **NON-USE OF NAME.** Nothing in this License and Terms of Use shall be
construed as granting End Users or their Institutions any rights or licenses
to use any trademarks, service marks or logos associated with the Software.
You may not use the terms “Harvard” (or a substantially similar term) in any
way that is inconsistent with the permitted uses described herein. You agree
not to use any name or emblem of Harvard or any of its subdivisions for any
purpose, or to falsely suggest any relationship between End User (or its
Institution) and Harvard, or in any manner that would infringe or violate
any of Harvard’s rights.

13. End User represents and warrants that it has the legal authority to enter
into this License and Terms of Use on behalf of itself and its Institution.

The MIT License

Copyright (c) 2019 Harvard University

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.