Eric T Dawson edited this page Sep 15, 2016 · 1 revision

Intro

vg doesn't provide any rendering tools, instead opting to generate a graphviz-formatted output stream. Here, I'll show you some techniques that are available for the visualization of various elements in the vg universe.

Let's viz!

Basic visualization in vg can be accomplished by processing the graphviz output from vg view. For example, starting in the test/ directory, we can use the following to visualize the graph in chromium:

vg construct -v tiny/tiny.vcf.gz -r tiny/tiny.fa \
    | vg view -d - \
    | dot -Tsvg -o x.svg
chromium-browser x.svg

screenshot from 2015-10-28 12 49 54

You will need the graphviz tools installed (for example, via sudo apt-get install graphviz on Debian or Ubuntu Linux).

Visualizing bidirectional sequence graphs

Variation graphs in vg are "train track graphs": they implicitly include their reverse complement, and edges can "reverse" and go from the forward to the reverse strand, akin to the way that the two rails of a train track work.

This lets us represent inversions without duplicating the inverted sequence, which achieves one of the goals of using variation graphs: that annotations and information about variation can be represented with minimal duplication. Similarly, it means we can avoid the multiple mapping problem that might result if there were two disparate positions in the graph that encode a particular sequence. Finally, this is a standard way of modeling graphs and exactly matches the model encoded in GFA, so this ensures we can use graphs from any source that produces GFA.

Let's refer to the two sides of each node as "start" and "end". We go from the start to the end in the forward direction and from the end to the start in the reverse. If we allow an edge to leave from either side of one node and arrive at either side of the next, then we get four types of edges.

We record the edge type in protobuf/json by indicating which ends are in the non-standard orientation using the from_start and to_end flags.

The default goes "from the end to the start", or "from_start": false, "to_end": false in our serialization format:

If we want to express a transition from the forward strand of one node to the reverse of the next, we'd say "from_start": false, "to_end": true:

And a transition from the reverse strand of one node to the forward of the next would be "from_start": true, "to_end": false:

We render the reverse strand version of the default ("from_start": true, "to_end": true) the same way as the forward strand, as it doesn't provide any additional information. It's just an alternative way of saying the same thing.
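The four flag combinations can be tabulated with a minimal Python sketch. It only uses the from_start and to_end flags described above; the "from" and "to" keys for the node ids and the helper names attachment_sides and canonical are assumptions made for illustration, not vg's API.

```python
def attachment_sides(edge):
    """Return ((from_node, side), (to_node, side)) for one edge record."""
    # By default an edge leaves the end of its source node...
    from_side = "start" if edge.get("from_start", False) else "end"
    # ...and arrives at the start of its target node.
    to_side = "end" if edge.get("to_end", False) else "start"
    return (edge["from"], from_side), (edge["to"], to_side)


def canonical(edge):
    """Rewrite a doubly reversed edge as the equivalent default edge."""
    if edge.get("from_start", False) and edge.get("to_end", False):
        # "start of A -> end of B" joins the same two node sides as
        # "end of B -> start of A", so swap the nodes and clear both flags.
        return {"from": edge["to"], "to": edge["from"],
                "from_start": False, "to_end": False}
    return dict(edge)


if __name__ == "__main__":
    # Print all four edge types between hypothetical nodes 1 and 2.
    for from_start in (False, True):
        for to_end in (False, True):
            e = {"from": 1, "to": 2,
                 "from_start": from_start, "to_end": to_end}
            print(e, "->", attachment_sides(e))
```

The canonical helper shows why the doubly reversed case carries no extra information: it touches the same two node sides as a default edge written in the other direction.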

These edge types can all be represented in vg format, in GFA, and also in the graphviz output which was used to render these images. In the graphviz output, the different types of edges are modeled using graphviz ports, which let us attach an edge to a particular corner of a node.
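To make the port idea concrete, here is a small Python sketch that writes graphviz edges with compass-point ports. The particular corners chosen (ne, nw, sw, se) and the node label format are my assumptions for illustration and may not match what vg view -d actually emits; edge_to_dot and graph_to_dot are hypothetical helpers.

```python
def edge_to_dot(edge):
    """Format one edge as a graphviz statement using corner ports."""
    # Assumed convention: the forward strand uses the top corners of a node
    # and the reverse strand uses the bottom corners.
    tail_port = "sw" if edge.get("from_start", False) else "ne"
    head_port = "se" if edge.get("to_end", False) else "nw"
    return (f'{edge["from"]}:{tail_port} -> {edge["to"]}:{head_port} '
            f'[arrowhead=none];')


def graph_to_dot(nodes, edges):
    """Build a complete graphviz digraph from node sequences and edges."""
    lines = ["digraph graphname {", "    rankdir=LR;", "    node [shape=box];"]
    for node_id, seq in nodes.items():
        lines.append(f'    {node_id} [label="{node_id}:{seq}"];')
    for edge in edges:
        lines.append("    " + edge_to_dot(edge))
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    nodes = {1: "CAAATAAG", 2: "A", 3: "G"}  # made-up sequences
    edges = [
        {"from": 1, "to": 2},                  # default edge
        {"from": 1, "to": 3, "to_end": True},  # forward to reverse strand
    ]
    print(graph_to_dot(nodes, edges))
```

Saving the printed text to a file and running dot -Tsvg on it should give a picture in which each edge type touches a different pair of corners.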

Note that in practice we don't usually need to render the node arrows, although this can sometimes help with ambiguous visualizations as in the preceding examples. You can add them back in by piping the graphviz output from vg through sed 's/arrowhead=none/arrowhead=normal/g' before passing it to dot.

Some complex examples

There are a few test cases which we've used during the extension of vg to handle cyclic graphs and bidirectional edges. They exhibit a mixture of cases that were initially problematic, and here they provide an example of what's possible to express and visualize with vg:

screenshot from 2015-10-28 14 39 52

Extending vg view -d to visualize paths

The graph isn't just nodes and edges. We also have paths, which are a critical component of the variation graph reference architecture. We can render them in a few ways by combining the -d flag with -p (show paths as external subgraphs), -n (label edges), and -w (walk edges, adding a new edge to the graph for each path between two nodes).

Here they are in action on a trivial test graph produced above (vg construct -v tiny/tiny.vcf.gz -r tiny/tiny.fa):

vg view -dp

image

vg view -dn

screenshot from 2015-10-28 14 11 25

vg view -dw

screenshot from 2015-10-28 14 12 07

They can be combined. I find vg view -dpn to be very useful:

screenshot from 2015-10-28 14 12 51

Path labeling in visualization

Right now you may be wondering where the grey saxophone is coming from. For visualizing many graphs, vg uses 766 Unicode pictographs (emoji) and the 8 colors of the Brewer dark28 palette to generate 6128 possible color/symbol combinations (766 × 8) that allow easy differentiation of the paths based on tiny symbols. The color/symbol combination is chosen on the basis of a hash of the path name, so as long as the path names are unique we should typically expect unique symbolic identifiers within graphs with reasonable numbers of paths.
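The selection scheme can be sketched in a few lines of Python. This is only an illustration of hashing a name into one of 766 × 8 slots: the SHA-1 hash, the stand-in symbol table (766 consecutive code points starting at U+1F300), and the path_label helper are all assumptions, and vg's own hash function and emoji table will give different results.

```python
import hashlib

N_SYMBOLS = 766  # number of pictographs, as stated above
N_COLORS = 8     # colors in the dark28 palette

# Stand-in symbol table, NOT vg's actual list of pictographs.
SYMBOLS = [chr(0x1F300 + i) for i in range(N_SYMBOLS)]


def path_label(path_name):
    """Map a path name to a (color number, symbol) pair deterministically."""
    # Assumed hash: the first 8 bytes of SHA-1 read as a big-endian integer.
    digest = hashlib.sha1(path_name.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % (N_SYMBOLS * N_COLORS)
    symbol_index, color_index = divmod(index, N_COLORS)
    # Brewer color schemes in graphviz are numbered from 1.
    return color_index + 1, SYMBOLS[symbol_index]


if __name__ == "__main__":
    for name in ("x", "ref", "haplotype_1"):  # hypothetical path names
        color, symbol = path_label(name)
        print(f"{name}: color {color} of dark28, symbol {symbol}")
```

If the hash spreads names evenly over the 6128 slots, nine paths collide with a probability of roughly 36/6128, or about 0.6%, which is why unique labels are the typical case for modest numbers of paths.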

This doesn't really help for single paths, but as the number of paths increases it can really help debugging. For instance, this is a fragment of the MHC which has 9 haplotypes:

image

(You can render this with vg msga -f GRCh38_alts/FASTA/HLA/K-3138.fa -B 256 -k 22 -K 11 -X 1 -E 4 -Q 22 -D | vg view -dpn - | dot -Tsvg -o K-3138.svg.)

Viewing alignments

Given the importance of alignments in sequence analysis, it should be easy to view them. Existing tools that work on linear references really won't cut it in the graph world. A solution is to treat the alignments like paths and add some visual indicators that help us interpret the alignment orientation and mismatches between the alignment and the graph.

If you have your alignments in GAM (graph alignment / map) format, you can visualize them against a graph using another extension to vg view -d:

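# build the test graph and its xg and GCSA indexes
vg construct -v tiny/tiny.vcf.gz -r tiny/tiny.fa >t.vg
vg index -x t.xg -g t.gcsa -k 11 t.vg
# simulate 10 reads of length 20 with some errors, then map them back to the graph
vg sim -l 20 -n 10 -e 0.05 -i 0.02 t.vg >t.reads
vg map -r t.reads -x t.xg -g t.gcsa -k 22 >t.gam
# render the graph together with the alignments
vg view -d t.vg -A t.gam | dot -Tsvg -o aln.svg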

The result shows blue segments for exact matches, yellow for mismatches, and green and purple ends to the alignments to indicate their relative orientation.

image

Large graphs

It's not feasible to use standard visualization algorithms for large graphs. For larger graphs, you can use neato instead of dot. You can also greatly reduce layout time by running the mars graph layout algorithm on the graphviz-format graph before feeding it into neato. This remains an active area of research, and is extremely important for improving our ability to work with sequence graphs of all types!
