optimize atom representation

Atoms (especially those used as functor names) are really just fancy integers with a human-readable form.  I suspect that about 85% of all atoms can be mapped to a 64-bit integer.  Those that can't be fall back to the current string representation.
- create a `CodedAtom` type which implements the `Atom` interface
- have `NewAtom` try to create a coded atom then fall back to a string atom
- unification of two coded atoms is just integer comparison
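
A minimal sketch of how those three pieces could fit together. Everything here is an assumption made for illustration: the `Atom` interface is reduced to a single `Name` method, `StringAtom` stands in for the current string representation, and `pack`/`unpack` are a placeholder codec (one byte per octet, names of at most 8 bytes) rather than the coding proposed in this issue.

```go
package main

import "fmt"

// Atom stands in for the project's existing interface (assumed shape).
type Atom interface {
	Name() string
}

// StringAtom is the fallback: the current string representation.
type StringAtom string

func (a StringAtom) Name() string { return string(a) }

// CodedAtom stores the whole name inside one 64-bit integer.
type CodedAtom uint64

func (a CodedAtom) Name() string { return unpack(uint64(a)) }

// pack is a placeholder codec: it stores up to 8 non-zero bytes, one per
// octet of the integer. It reports false when the name does not fit.
func pack(name string) (uint64, bool) {
	if len(name) > 8 {
		return 0, false
	}
	var code uint64
	for i := 0; i < len(name); i++ {
		if name[i] == 0 {
			return 0, false // a zero byte would make decoding ambiguous
		}
		code |= uint64(name[i]) << (8 * uint(i))
	}
	return code, true
}

// unpack reverses pack by reading octets until none remain.
func unpack(code uint64) string {
	buf := make([]byte, 0, 8)
	for ; code != 0; code >>= 8 {
		buf = append(buf, byte(code))
	}
	return string(buf)
}

// NewAtom tries the coded form first and falls back to a string atom.
func NewAtom(name string) Atom {
	if code, ok := pack(name); ok {
		return CodedAtom(code)
	}
	return StringAtom(name)
}

// Unify compares two atoms; two coded atoms need only an integer compare.
func Unify(a, b Atom) bool {
	ca, okA := a.(CodedAtom)
	cb, okB := b.(CodedAtom)
	if okA && okB {
		return ca == cb
	}
	return a.Name() == b.Name()
}

func main() {
	fmt.Println(Unify(NewAtom("foo"), NewAtom("foo")))
	fmt.Println(Unify(NewAtom("foo"), NewAtom("bar")))
	fmt.Println(NewAtom("a_rather_long_functor").Name())
}
```

Because `NewAtom` always prefers the coded form, a given name ends up with exactly one representation, so the string comparison in `Unify` is only reached when at least one side did not fit.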

The simplest coding that could possibly work is:
- treat an atom string as a base-27 (lowercase letters + underscore) integer
- accumulate the value one character at a time: multiply the running total by 27, then add the character's digit
- if an unknown letter is encountered or we run out of numeric precision, fall back to a string representation.
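
The steps above might look roughly like this. One detail is my own assumption rather than part of the proposal: digits run from 1 to 27 instead of 0 to 26, because with `a` as zero the names `a` and `aa` would collapse to the same integer. The names `encode` and `decode` are hypothetical.

```go
package main

import (
	"fmt"
	"math"
	"strings"
)

// alphabet holds the 27 symbols: lowercase letters plus underscore.
const alphabet = "abcdefghijklmnopqrstuvwxyz_"

const base = uint64(len(alphabet)) // 27

// encode maps name to an integer using digits 1..27 (assumption: no zero
// digit, so "a" and "aa" get different codes). ok is false when a
// character is outside the alphabet or the value would overflow 64 bits.
func encode(name string) (code uint64, ok bool) {
	for i := 0; i < len(name); i++ {
		idx := strings.IndexByte(alphabet, name[i])
		if idx < 0 {
			return 0, false // unknown letter
		}
		digit := uint64(idx) + 1
		if code > (math.MaxUint64-digit)/base {
			return 0, false // out of numeric precision
		}
		code = code*base + digit
	}
	return code, true
}

// decode reverses encode by peeling off digits from the low end.
func decode(code uint64) string {
	var buf []byte
	for code > 0 {
		code-- // shift digits 1..27 back to indexes 0..26
		buf = append(buf, alphabet[code%base])
		code /= base
	}
	// Digits came out least significant first, so reverse them.
	for i, j := 0, len(buf)-1; i < j; i, j = i+1, j-1 {
		buf[i], buf[j] = buf[j], buf[i]
	}
	return string(buf)
}

func main() {
	for _, name := range []string{"foo", "append", "Foo", "a_rather_long_functor"} {
		code, ok := encode(name)
		fmt.Println(name, code, ok, decode(code))
	}
}
```

With this scheme every name of up to 13 symbols fits in a `uint64`, along with some 14-symbol names that start with an early letter.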

Optimize that by extending the base-27 alphabet to a base-64 alphabet where the extra 37 symbols are the 37 [most common bigrams](http://www.sttmedia.com/syllablefrequency-english) in English text.  A base-32 alphabet with only 5 bigrams might work better.
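
A rough sketch of the base-32 variant, assuming a greedy tokenizer that prefers a bigram whenever one matches. The five bigrams (`th`, `he`, `in`, `er`, `an`) are illustrative placeholders, not values taken from the linked table, and `encode`/`decode` are hypothetical names. As an extra assumption, digits run from 1 to 32 so that a leading `a` is not lost.

```go
package main

import (
	"fmt"
	"math"
	"strings"
)

// symbols lists the 32 digits: 27 single characters, then 5 bigrams.
// The bigrams are illustrative placeholders, not from the linked table.
var symbols = append(strings.Split("abcdefghijklmnopqrstuvwxyz_", ""),
	"th", "he", "in", "er", "an")

// index maps each symbol to its digit value (1..32).
var index = func() map[string]uint64 {
	m := make(map[string]uint64, len(symbols))
	for i, s := range symbols {
		m[s] = uint64(i) + 1
	}
	return m
}()

// encode greedily consumes a bigram when one matches, else one character.
// It reports false on an unknown character or on 64-bit overflow.
func encode(name string) (uint64, bool) {
	base := uint64(len(symbols))
	var code uint64
	for i := 0; i < len(name); {
		digit, n := uint64(0), 0
		if i+2 <= len(name) {
			if d, ok := index[name[i:i+2]]; ok {
				digit, n = d, 2
			}
		}
		if n == 0 {
			d, ok := index[name[i:i+1]]
			if !ok {
				return 0, false
			}
			digit, n = d, 1
		}
		if code > (math.MaxUint64-digit)/base {
			return 0, false
		}
		code = code*base + digit
		i += n
	}
	return code, true
}

// decode expands each digit back into its one- or two-character symbol.
func decode(code uint64) string {
	base := uint64(len(symbols))
	var parts []string
	for code > 0 {
		code-- // shift digits 1..32 back to indexes 0..31
		parts = append(parts, symbols[code%base])
		code /= base
	}
	// Symbols came out least significant first, so reverse them.
	for i, j := 0, len(parts)-1; i < j; i, j = i+1, j-1 {
		parts[i], parts[j] = parts[j], parts[i]
	}
	return strings.Join(parts, "")
}

func main() {
	for _, name := range []string{"another_thing", "father_in_another", "Foo"} {
		code, ok := encode(name)
		fmt.Println(name, ok, decode(code))
	}
}
```

Under these assumptions any name that tokenizes to 12 symbols or fewer fits, so `father_in_another` (17 characters, 11 symbols) is encodable even though it is too long for the plain base-27 scheme. Whether the shorter alphabet or the extra bigrams wins overall would have to be measured against real atoms.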

Optimize that with [Huffman coding](https://en.wikipedia.org/wiki/Huffman_coding) or [arithmetic coding](https://en.wikipedia.org/wiki/Arithmetic_coding) to fit more symbols into a 64-bit integer (or float).

The real goal here is not to achieve the highest possible compression ratio (encoding long atoms into small integers).  Rather, we want to encode the highest percentage of atoms into a fixed space.

If I've correctly understood [Go's internal string representation](http://golang.org/src/pkg/runtime/runtime.h), a single-character string occupies 17 bytes on a 64-bit machine.

