Simulation studies to evaluate clustering concordance measures

ms609/split-support

Clustering concordance - simulation evaluation

This repository contains the simulation studies employed in Smith (forthcoming). Analyses and analytical output are contained within the data-raw directory.

R scripts

To reproduce the simulation workflow, execute the .R scripts in numerical sequence.
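As a rough sketch of what "numerical sequence" means here, the helper below lists numbered scripts sorted by their numeric prefix. `numbered_scripts()` is a hypothetical helper, not part of the repository; it only prints the order and leaves running each script (for example with `source()`) to you, because 20_MrBayes.R and 20_MrBayes_HPC.R are alternatives rather than consecutive steps.

```r
# Hypothetical helper (not part of the repository): list scripts whose names
# begin with a number, sorted by that number rather than alphabetically.
numbered_scripts <- function(dir = ".") {
  files <- list.files(dir, pattern = "^[0-9]+_.*\\.R$")
  prefix <- as.integer(sub("_.*$", "", files))
  files[order(prefix, files)]
}

# Show the execution order for the current directory.
# _config.R has no numeric prefix, so it is not listed.
print(numbered_scripts("."))
```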

First ensure that you have installed a current version of TreeSearch: packageVersion("TreeSearch") should return at least 1.8.0.

To install the development version of the package, run devtools::install_github("ms609/TreeSearch").
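The version check can also be scripted. The following is a minimal sketch; `needs_install()` is a hypothetical helper name, and the snippet only prints a reminder rather than installing anything.

```r
# Hypothetical helper: TRUE if `pkg` is missing or older than `minimum`.
needs_install <- function(pkg, minimum) {
  !requireNamespace(pkg, quietly = TRUE) ||
    utils::packageVersion(pkg) < minimum
}

if (needs_install("TreeSearch", "1.8.0")) {
  message("TreeSearch >= 1.8.0 not found; consider running ",
          "devtools::install_github(\"ms609/TreeSearch\")")
}
```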

Configuration and simulation

  • _config.R: Sets up the analysis and defines utility functions. Edit this file to specify:

    • The size of the tree and the datasets to be simulated.
    • The location of your own executables of TNT, MrBayes and IQ-TREE.
    • Your HPC login credentials.
  • 10_simluate.R: Simulates alignments using six rate categories drawn from a discretized gamma distribution.

    Outputs:

    • reference-gam.tre: Reference topology used for simulation, in Newick format;
    • alignments/gam####.nex: Simulated alignment in NEXUS format; #### denotes replicate ID.
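To illustrate what "six rate categories drawn from a discretized gamma distribution" can look like, here is a minimal sketch in base R. It assumes the common median-based discretization with equal-probability categories and an arbitrary shape parameter (`alpha = 1`); the method and parameter values actually used in 10_simluate.R are not shown here and may differ. `discrete_gamma_rates()` is a hypothetical name.

```r
# Hypothetical helper: relative rates for k equal-probability categories
# of a gamma distribution with mean 1 (shape = rate = alpha).
discrete_gamma_rates <- function(alpha, k = 6) {
  # Quantile at the midpoint of each category (median-based discretization)
  probs <- (2 * seq_len(k) - 1) / (2 * k)
  rates <- qgamma(probs, shape = alpha, rate = alpha)
  # Rescale so that the mean rate across categories is exactly 1
  rates / mean(rates)
}

# alpha = 1 is an assumed value, chosen only for illustration
rates <- discrete_gamma_rates(alpha = 1, k = 6)
print(round(rates, 3))
```

Smaller values of `alpha` spread the categories further apart, giving stronger among-site rate heterogeneity.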

Inference and support calculation

  • 20_MrBayes.R: Conduct Bayesian inference locally in MrBayes.

    Inputs:

    • Alignments generated by 10_simluate.R;
    • mb-gam.nex: MrBayes block, in NEXUS format, specifying analytical parameters.

    Outputs:

    • MrBayes/gam####.con.tre: Majority-rule consensus tree in NEXUS format;
    • MrBayes/gam####.con.parts: Partition identifiers in plain text format;
    • MrBayes/gam####.con.pstat: Parameter estimates (tree length, alpha) in tabular format;
    • MrBayes/gam####.con.tstat: Split probabilities in tabular format.
  • 20_MrBayes_HPC.R: Alternative to 20_MrBayes.R; uses the SLURM template slurm.sh to schedule Bayesian inference on a remote server.

  • 25_MrBayes_HPC_retrieve.R: Retrieve available output files from completed tasks on the remote server.

  • 30_iqtree.R: Conduct maximum likelihood analysis using IQ-TREE.

    Inputs:

    • Alignments generated by 10_simluate.R.

    Outputs:

    • gam####.phy.splits.nex: Split identifiers in NEXUS format, labelled with ultrafast bootstrap support values.
    • gam####.phy.treefile: Maximum likelihood tree with edge lengths, in Newick format; each node is labelled with SH-aLRT support (%) / local bootstrap support (%) / aBayes support / ultrafast bootstrap support (%).
  • 40_tnt.R: Conduct parsimony analysis in TNT.

    Inputs:

    • Alignments generated by 10_simluate.R;
    • tnt-ew.run: TNT script for parsimony analysis and calculation of edge support;
    • bremer.run: TNT script to compute Bremer support, from Goloboff et al. (2008).

    Outputs:

    • gam####.ew.out: Single most parsimonious tree with edge support values, in TNT format.
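The node labels in the IQ-TREE tree file listed above combine four support values separated by slashes. The sketch below shows one way to split such a label into named numbers; `parse_iqtree_label()` and the example label are hypothetical, and the order of fields is taken from the description of gam####.phy.treefile above.

```r
# Hypothetical helper: split "SH-aLRT/local bootstrap/aBayes/UFBoot"
# into a named numeric vector.
parse_iqtree_label <- function(label) {
  values <- as.numeric(strsplit(label, "/", fixed = TRUE)[[1]])
  stopifnot(length(values) == 4)
  names(values) <- c("SH_aLRT", "local_bootstrap", "aBayes", "UFBoot")
  values
}

# Made-up label for illustration only
support <- parse_iqtree_label("95.2/88/0.99/100")
print(support)
```

Note that aBayes is reported as a probability, whereas the other three values are percentages, so the fields are not directly comparable without rescaling.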

Analysis and visualization

These scripts process the outputs obtained above to compute and display statistics. Results are cached when first calculated, in the alignments, concordance and entropy subdirectories.
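The cache-on-first-calculation pattern can be sketched as follows. This is an illustration only: `cached()` is a hypothetical helper, and storing results as .rds files is an assumption, not necessarily the format the scripts use.

```r
# Hypothetical helper: return the stored result if `path` exists;
# otherwise run `compute()`, store its result at `path`, and return it.
cached <- function(path, compute) {
  if (file.exists(path)) {
    return(readRDS(path))
  }
  result <- compute()
  dir.create(dirname(path), showWarnings = FALSE, recursive = TRUE)
  saveRDS(result, path)
  result
}

# Usage: the slow calculation runs on the first call only
cache_file <- file.path(tempdir(), "concordance", "example.rds")
value <- cached(cache_file, function() sum(seq_len(10)))
```

Deleting a cached file forces that result to be recalculated on the next run.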

  • 80_byEdge.R: Edgewise character concordance statistics.

    Outputs:

    • Fig 2 - edge concordance.pdf: Figure 2 from Smith (forthcoming).
  • 90_byChar.R: Characterwise character concordance statistics.

    Outputs:

    • Fig 3 - character concordance.pdf: Figure 3 from Smith (forthcoming).

References

Goloboff, P.A., Farris, J.S., and Nixon, K.C. (2008). TNT, a free program for phylogenetic analysis. Cladistics 24(5): 774–786. doi:10.1111/j.1096-0031.2008.00217.x

Smith, M.R. (forthcoming). Which characters support which clades? Exploring the distribution of phylogenetic signal using concordant information.