Skip to content

Compressed Suffix Tree Example

simongog edited this page May 19, 2012 · 12 revisions

Compressed Suffix Tree Example

This example shows how to use libsdsl to perform operations on a compressed suffix tree.

Creating a compressed suffix tree (cst)

In this section we show how to create a cst. The creation of a suffix tree is handled by the construct_cst function in conjunction with the template parameters of the cst type. The different template parameters are described in more detail below.

Construction:

#include <sdsl/suffixtrees.hpp>
#include <sdsl/util.hpp>

typedef cst_sct3<csa_wt<wt_huff<>, 4>, lcp_dac<> > tCST;

int main(int argc,char** argv) {
    const char* inputfile = argv[1];
    /* tCST defines the type of the cst */
    tCST cst;
    /* construct the suffix tree */
    construct_cst( inputfile, cst );
    return EXIT_SUCCESS; 
}

Saving and Loading a compressed suffix tree

In this section we will show how to write and load a cst from disk:

Saving and loading a suffix tree is managed by the util::store_to_file and util::load_from_file helper functions:

saving:

#include "sdsl/suffixtrees.hpp"

typedef cst_sct3<csa_wt<wt_huff<>, 4>, lcp_dac<> > tCST;

int main(int argc,char** argv) {
    const char* cstfile = argv[1];

    /* assume cst contains a constructed/loaded cst */
    tCST cst;

    util::store_to_file(cst,cstfile);
    return EXIT_SUCCESS; 
}

loading:

#include "sdsl/suffixtrees.hpp"

typedef cst_sct3<csa_wt<wt_huff<>, 4>, lcp_dac<> > tCST;

int main(int argc,char** argv) {
    const char* cstfile = argv[1];
    tCST cst;
    util::load_from_file(cst,cstfile);
    return EXIT_SUCCESS; 
}

Using a compressed suffix tree

Iterating over all nodes in a cst

There are several iterators defined for a cst. The default one does a depth-first search traversal. The iterator takes constant space and constant time per movement. The type of the iterator can be accessed by tCST::const_iterator. As in the STL we get the beginning iterator by calling the begin() method of our object. The iterator points to the root of the cst. We can increment the iterator it (i.e. go to the next node in the depth-first search) by calling ++it. To access the actual content of an iterator, i.e. the actual node, we call the dereference operator *it. Since we visit a node v in the depth-first search traversal two times

  • first, when we have not traversed the subtree rooted at v
  • and second, when we have already traversed the subtree rooted at v

there is method visit() in the corresponding iterator of the node which provides this information. In the first case it returns 1 and in the second case it returns 2. If you want to skip a whole subtree rooted at a node v of an iterator it than you can use it.skip_subtree() to do that. Finally, to check whether you are done with the traversal there is an iterator end() which is reached when all nodes were traversed.

    tCST cst;
    util::load_from_file(cst,cstfile);
    for (tCST::const_iterator it = cst.begin(), end = cst.end(); it != end; ++it){
        if (it.visit()==1){
          tCST::node_type v = *it;
          // do something with the node
        }
    }

Different compressed suffix tree implementations in libsdsl

The are myriads of possible configurations for a cst in the library with different time and space trade-offs since a compressed suffix tree is composed of three substructures (a compressed suffix array, a compressed LCP representation, and a navigation structure). The two cst classes which support the whole set of functionality are:

  • cst_sada and
  • cst_sct3

They differ in construction speed and traversal speed. While cst_sada takes approximately 50% longer to construct, the traversal of the cst is significantly faster.

Clone this wiki locally