GitHub - swarminglogic/gmcmc: [C++] General Any-Type N-order Markov Chain Monte Carlo for Random Data Generation

`gmcmc` General N-order Markov Chain Monte Carlo

A C++ demonstration/implementation of a type-agnostic markov chain table generator, together with random sampling to generate similar data (the monte carlo part).

This is designed to be completely agnostic to the data type it represents and generates, and to be very easily extended to handle any custom type.

The concept is as following:

I have set of data element sequences (e.g. set of words)
Generate more sequences that are similar (i.e. new words)

Keywords elements and sequences can represent anything, as long as the following is true:

A sequence can be broken up into elements.
Elements can be put together to form a sequence.

To extend the code to support your own custom data type T, you only need to write the logic for sequence->{elements}, and {elements}->sequence transitions, where elements are of type T.

Examples

Random sentence generator

(sequences, elements) = (sentences, words)

Random word generator

(sequences, elements) = (words, letters)

However, by parsing syllables, it could also be:

(sequences, elements) = (words, syllables)

Random arpeggio generator

(sequences, elements) = (arpeggios, music-notes)

Features

N-order: Supports any order (1, 2, ...). This is the size of the N-grams in each markov chain node. In other words 'N previous elements should matter when determining the next transition'.

N_init-order: Since many sequences have special starting properties, it is possible when generating new random data, that the first N_init elements should match that of an input data sequence.

It is required that 1 <= N_init <= N, although a TODO is to have the lower bound be 0, at which point the N_init feature becomes a NOP.

Complexity

Assuming fixed average length of input sequences, perparing the markov chain table from input data is O(K lg K), where K is the number of input sequences.

Generating a new sequence is O(1) average case.

Compiling

Requirements: boost headers, can be installed with your favourite package manager, e.g.:

sudo apt-get install libboost-all-dev

The basic example program is a word generator, and can be built with SCons:

First make sure scons will find boost by first calling . scripts/initialize.sh from the root directory.

After that call scons from the root directory.

To cross-compile for windows, call scons --win64.

See other options with scons --help.

Alternatively:

g++ -o wordgen -std=c++11 -Wall src/wordgen.cpp src/markov/*.cpp \
    src/util/*.cpp -Isrc/ -isystem [BOOST_HEADER_PATH]

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
scripts		scripts
src		src
LICENSE		LICENSE
README.md		README.md
SConstruct		SConstruct

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

`gmcmc` General N-order Markov Chain Monte Carlo

Examples

Features

Complexity

Compiling

About

Uh oh!

Releases

Packages

Languages

License

swarminglogic/gmcmc

Folders and files

Latest commit

History

Repository files navigation

gmcmc General N-order Markov Chain Monte Carlo

Examples

Features

Complexity

Compiling

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

`gmcmc` General N-order Markov Chain Monte Carlo

Packages