Stemex is a NIF wrapper above the snowball language (http://snowball.tartarus.org/index.php).
Add any snowball algorithm in the algorithms directory, for instance
algorithms/ALGONAME.sbl containing an external procedure named stem and
Stemex compiler will:
- Compile the nif shared library
priv/Elixir.Stemex_nif.so - Add dynamically the Elixir function
Stemex.ALGONAME/1which takes an UTF8 binary string and returned the result of theALGONAME.sblalgorithm
It is MEANT to be used in your project with custom stemming snowball algorithms, with a clone of this project. But by default porter stem implementations are included and published in the HEX package.
They are the ones present in the snowball distribution, available stemmers are :
Stemex.danish/1
Stemex.dutch/1
Stemex.english/1
Stemex.finnish/1
Stemex.french/1
Stemex.german/1
Stemex.german2/1
Stemex.hungarian/1
Stemex.italian/1
Stemex.kraaij_pohlmann/1
Stemex.lovins/1
Stemex.norwegian/1
Stemex.portuguese/1
Stemex.romanian/1
Stemex.russian/1
Stemex.spanish/1
Stemex.swedish/1
Stemex.turkish/12 compilers are included in the mix.exs :
:stemex_snowballcompilesc_src/gen/ALGONAME.(c|h)from youralgorithms/ALGONAME.sbl- it needs to have the
snowballexecutable in your PATH, explain you how to get it otherwise - so this compiler is only used on
:devMix env in order to help you developing your snowball algorithms
- it needs to have the
:stemex_nifcompilespriv/Elixir.Stemex_nif.soused as the nif library.
All files in test/diffs/ALGONAME.txt must contains one pair of word per line,
every test case will be then tested : Stemex.ALGONAME(first_elem) == second_elem.
The format is compatible with snowball test files and contains by default all the tests from the snowball website but you can easily add tests for your own snowball algorithm.