The WordGen library allows creating random words (and names) by providing simple text files or rules for syllables and/or characters.
The library is located in the wgen-lib module. The main parts of this library are:
- a parser and word generator (WordGenParser, WordGen)
- a text/language scanner (Scanner)
The Scanner is used to scan text files for words and to generate syllable rules (= how syllables and characters can be concatenated together). These rules are used by WordGenParser and WordGen to generate random words and/or names.
To use the word generator, the class WordGen has to be instantiated. The easiest method is to provide a rules file (see Syllable Rules below) and load it by using the WordGenParser.
To create a new instance of WordGenParser, it’s recommended to use the provided WordGenParser.Builder.
With WordGenParser instantiated, it’s easy to parse a rules file and create an instance of WordGen (the word generator itself):
public WordGen fromFile(final File file) throws IOException
public WordGen fromFile(final File file, final long seed) throws IOException
Once this is done, random words are generated by using the following method of the WordGen class:
public String nextWord(final int minLength, final int maxLength)
where minLength and maxLength define the minimum and maximum (both inclusive) length of a word. NOTE: the length of the word is not measured in characters but in the number of syllable rules used.
To generate rules for a specific language, text files can be scanned by a Scanner. The more words a text file provides, the better and more accurate the rules become.
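To make the scanning idea concrete, the following is a minimal sketch of collecting the distinct words of a text. It is not the Scanner implementation from wgen-lib: the class name WordCollectorSketch, the choice to split on non-letter characters, and the lower-casing are all assumptions made for illustration.

```java
import java.util.Locale;
import java.util.TreeSet;

// Illustrative sketch only; not the wgen-lib Scanner.
final class WordCollectorSketch {

    // Splits the text on every run of non-letter characters and returns
    // the distinct, lower-cased words in sorted order.
    static TreeSet<String> collectWords(final String text) {
        final TreeSet<String> words = new TreeSet<>();
        for (final String token : text.split("[^\\p{L}]+")) {
            // A leading separator produces one empty token; skip it.
            if (!token.isEmpty()) {
                words.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return words;
    }

    public static void main(final String[] args) {
        // prints [kaksi, kalaa, kissaa]
        System.out.println(collectWords("Kaksi kalaa, kaksi KISSAA."));
    }
}
```

A sorted set removes duplicates automatically, which is why a larger input text adds new words without inflating the result with repeats.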
To instantiate a new Scanner a Scanner.Builder is provided:
final Scanner scan = new Scanner.Builder().build();
To scan files for words, scanFilesForWords is used. This method returns a Set of words found in all provided text files.
public TreeSet<String> scanFilesForWords(final File... files) throws IOException
To generate syllable rules (see below) for the provided text files, scanFilesForRules is used. This method returns a List of generated syllable rules, one rule per entry.
public List<String> scanFilesForRules(final File... files) throws IOException
To generate random words, syllables have to be defined. These syllables need to be put into a simple text file. Only one syllable rule per line is allowed. Rules can be defined manually or by using a Scanner (see above).
Valid syllable rules consist of a position rule (+, -, = or nothing) and an array (list) of syllables/characters to be chosen from. The word generator randomly picks a syllable from the given list. Some examples:
- +[a,e,i,o,u]
- [a,e,i,o,u]
- -[as,is,es,os,us]
Position rules define the allowed position of the syllable in a word:
- +: only at the end of a word
- -: only at the start of a word
- =: everywhere
- no position rule means: this syllable can only occur inside a word (not allowed at start or end of a word)
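As a rough illustration of the rule format described above, the sketch below splits one rule line into its position rule and its syllable list. It is not the WordGenParser implementation; the class RuleLineSketch, the Position enum and the error handling are hypothetical, and the sketch assumes a line that contains only the position rule and the bracketed list.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; not the WordGenParser implementation.
final class RuleLineSketch {

    enum Position { START, MIDDLE, END, ANYWHERE }

    final Position position;
    final List<String> syllables;

    RuleLineSketch(final Position position, final List<String> syllables) {
        this.position = position;
        this.syllables = syllables;
    }

    // Parses a line such as "-[as,is,es,os,us]" or "[ka]".
    static RuleLineSketch parse(final String line) {
        final String rule = line.trim();
        final int open = rule.indexOf('[');
        final int close = rule.indexOf(']');
        if (open < 0 || close < open) {
            throw new IllegalArgumentException("Not a syllable rule: " + line);
        }
        final Position position;
        switch (rule.substring(0, open)) {
            case "-": position = Position.START; break;    // only at the start of a word
            case "+": position = Position.END; break;      // only at the end of a word
            case "=": position = Position.ANYWHERE; break; // everywhere
            case "":  position = Position.MIDDLE; break;   // only inside a word
            default:
                throw new IllegalArgumentException("Unknown position rule: " + line);
        }
        final String[] parts = rule.substring(open + 1, close).split(",");
        return new RuleLineSketch(position, Arrays.asList(parts));
    }

    public static void main(final String[] args) {
        final RuleLineSketch rule = parse("-[as,is,es,os,us]");
        System.out.println(rule.position + " " + rule.syllables);
    }
}
```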
-[a,e,i,o,u]
[ka]
[ga]
[rra,tta]
+[m,n]
In this example every word will begin with a vowel, because only the rule -[a,e,i,o,u] is allowed to start a word. The word generator will then randomly pick a vowel from the list. For all middle parts of the word the following syllable rules are relevant: [ka], [ga] and [rra,tta]. So every time a syllable is needed that is neither the start nor the end of the word, one of these three syllable rules is randomly picked. The end of a word can only be m or n, because only the rule +[m,n] is allowed to end a word.
Some output examples using these rules:
- A
- Ekakam
- Igarran
- Orragakan
- Un
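The way the example rules combine can be sketched in a few lines of Java. This is a simplified illustration, not the WordGen implementation: the class PositionGeneratorSketch and its methods are hypothetical, the rules are hard-coded instead of parsed, and every word is assumed to consist of exactly one start syllable, a fixed number of middle syllables and one end syllable. Shorter words such as "A" and the capitalization seen in the output above are not covered.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Illustrative sketch only; not the WordGen implementation.
final class PositionGeneratorSketch {

    // Each inner list is one syllable rule from the example.
    static final List<List<String>> START =
            Arrays.asList(Arrays.asList("a", "e", "i", "o", "u"));
    static final List<List<String>> MIDDLE = Arrays.asList(
            Arrays.asList("ka"), Arrays.asList("ga"), Arrays.asList("rra", "tta"));
    static final List<List<String>> END =
            Arrays.asList(Arrays.asList("m", "n"));

    // Picks a random rule first, then a random syllable from that rule.
    static String pick(final List<List<String>> rules, final Random random) {
        final List<String> rule = rules.get(random.nextInt(rules.size()));
        return rule.get(random.nextInt(rule.size()));
    }

    // Builds a word from one start syllable, middleCount middle syllables
    // and one end syllable.
    static String nextWord(final Random random, final int middleCount) {
        final StringBuilder word = new StringBuilder(pick(START, random));
        for (int i = 0; i < middleCount; i++) {
            word.append(pick(MIDDLE, random));
        }
        return word.append(pick(END, random)).toString();
    }

    public static void main(final String[] args) {
        final Random random = new Random();
        for (int i = 0; i < 5; i++) {
            System.out.println(nextWord(random, i % 3));
        }
    }
}
```

Picking the rule before the syllable means that [ka], [ga] and [rra,tta] are equally likely as rules, so rra and tta each appear less often than ka or ga.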
Syllable rules can additionally hold expressions and flags. Expressions are specific behaviours of syllable rules (a kind of condition). A flag does not have any functionality by itself, but flags are needed for some expressions. Expressions and flags can be appended to a syllable rule. They are separated by a space.
- +c, +v and +n: The next (+) syllable needs to start with a consonant (c), a vowel (v) or a number (n)
- -c, -v and -n: The previous (-) syllable needs to end with a consonant (c), a vowel (v) or a number (n)
The word generator comes with a small set of vowels and numbers:
public static final String VOWELS = "aeiouyäöüáéíóúýàèìòùỳâêîôûŷ";
public static final String NUMBERS = "0123456789";
They can be extended by using the following methods that are provided by the Builder:
public Builder setVowels(final String vowels)
public Builder setNumbers(final String numbers)
All other characters are treated as consonants.
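The classification into vowels, numbers and consonants, and the +c/-c style checks built on it, could look roughly like the sketch below. It is an assumption-laden illustration rather than the library's code: the class CharClassSketch and its methods are hypothetical, the lower-casing is an assumption, and the demo uses a shortened ASCII-only vowel string as a stand-in for the VOWELS constant.

```java
// Illustrative sketch only; not the wgen-lib implementation.
final class CharClassSketch {

    enum CharClass { VOWEL, NUMBER, CONSONANT }

    private final String vowels;
    private final String numbers;

    CharClassSketch(final String vowels, final String numbers) {
        this.vowels = vowels;
        this.numbers = numbers;
    }

    // Everything that is neither a vowel nor a number counts as a consonant.
    CharClass classify(final char c) {
        final char lower = Character.toLowerCase(c);
        if (vowels.indexOf(lower) >= 0) {
            return CharClass.VOWEL;
        }
        if (numbers.indexOf(lower) >= 0) {
            return CharClass.NUMBER;
        }
        return CharClass.CONSONANT;
    }

    // +c / +v / +n: the next syllable has to start with the expected class.
    boolean nextStartsWith(final String next, final CharClass expected) {
        return !next.isEmpty() && classify(next.charAt(0)) == expected;
    }

    // -c / -v / -n: the previous syllable has to end with the expected class.
    boolean previousEndsWith(final String previous, final CharClass expected) {
        return !previous.isEmpty()
                && classify(previous.charAt(previous.length() - 1)) == expected;
    }

    public static void main(final String[] args) {
        final CharClassSketch classes = new CharClassSketch("aeiouy", "0123456789");
        System.out.println(classes.nextStartsWith("ka", CharClass.CONSONANT));   // true
        System.out.println(classes.previousEndsWith("ka", CharClass.CONSONANT)); // false
    }
}
```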
- -accept(a,b), +accept(a,b): the next syllable (+) has to start with a or b, or the previous (-) syllable has to end with a or b.
- +minlen(5), +maxlen(5): minimum or maximum length (in characters) of the current word plus the next syllable.
- -minlen(5), -maxlen(5): minimum or maximum length (in characters) of the current word.
- -flag(A,B): the previous syllable rule needs to contain the flags A and B.
- +flag(A,B): the next syllable rule needs to contain the flags A and B.
- +noRepeat: the next syllable (+) must not be the same as the current one.
- #A: set the flag A for the current syllable rule
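Two of the expressions listed above, flag and accept, reduce to simple membership tests. The sketch below shows one possible reading of them; the class ExpressionCheckSketch and its method names are hypothetical and do not come from wgen-lib.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch only; not the wgen-lib implementation.
final class ExpressionCheckSketch {

    // flag(A,B): the neighbouring rule has to carry every listed flag.
    static boolean hasAllFlags(final Set<String> ruleFlags, final List<String> required) {
        return ruleFlags.containsAll(required);
    }

    // +accept(a,b): the next syllable has to start with one of the listed values.
    static boolean startsWithAny(final String syllable, final List<String> accepted) {
        for (final String prefix : accepted) {
            if (syllable.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    // -accept(a,b): the previous syllable has to end with one of the listed values.
    static boolean endsWithAny(final String syllable, final List<String> accepted) {
        for (final String suffix : accepted) {
            if (syllable.endsWith(suffix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(final String[] args) {
        final Set<String> flags = new HashSet<>(Arrays.asList("A", "B", "C"));
        System.out.println(hasAllFlags(flags, Arrays.asList("A", "B")));   // true
        System.out.println(startsWithAny("ba", Arrays.asList("a", "b")));  // true
        System.out.println(endsWithAny("rra", Arrays.asList("b", "c")));   // false
    }
}
```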
Some examples:
Only a consonant can be appended to this syllable rule:
[a,e,i,o,u] +c
This syllable rule can only be attached to a consonant and only a consonant can be appended to this syllable rule:
[a,e,i,o,u] -c +c
Only a syllable rule that contains flag A can be appended to this syllable rule:
[a,e,i,o,u] +flag(A)
Only a syllable rule that contains flag A can be appended to this syllable rule. Additionally flag B is set for this rule:
[a,e,i,o,u] +flag(A) #B
Only a syllable rule that contains flag A and starts with a consonant can be appended to this syllable rule. Additionally, flags B, C and D are set for this rule:
[a,e,i,o,u] +flag(A) +c #B #C #D
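Since expressions and flags are separated from the syllable list by spaces, a full rule line can be taken apart by splitting on whitespace. The sketch below assumes that neither the bracketed list nor an expression's arguments contain spaces; the class RuleTokensSketch is hypothetical and only illustrates the line format, not how WordGenParser actually reads it.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; not the WordGenParser implementation.
final class RuleTokensSketch {

    final String syllablePart;                              // e.g. "[a,e,i,o,u]"
    final List<String> expressions = new ArrayList<>();     // e.g. "+flag(A)", "+c"
    final List<String> flags = new ArrayList<>();           // e.g. "B" for "#B"

    private RuleTokensSketch(final String syllablePart) {
        this.syllablePart = syllablePart;
    }

    static RuleTokensSketch parse(final String line) {
        final String[] tokens = line.trim().split("\\s+");
        // The first token is the position rule plus the syllable list.
        final RuleTokensSketch rule = new RuleTokensSketch(tokens[0]);
        for (int i = 1; i < tokens.length; i++) {
            if (tokens[i].startsWith("#")) {
                rule.flags.add(tokens[i].substring(1));  // "#B" sets flag B
            } else {
                rule.expressions.add(tokens[i]);         // anything else is an expression
            }
        }
        return rule;
    }

    public static void main(final String[] args) {
        final RuleTokensSketch rule = parse("[a,e,i,o,u] +flag(A) +c #B #C #D");
        // prints [a,e,i,o,u] [+flag(A), +c] [B, C, D]
        System.out.println(rule.syllablePart + " " + rule.expressions + " " + rule.flags);
    }
}
```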
Examples can be found in the wgen-examples module, including six fictional languages (three of them have been generated with the help of Scanners).
There is also a Hispanic name generator.
- Pseudo-finnish has been defined manually and uses flags to simulate vowel harmony.
- Pseudo-english, pseudo-norwegian, pseudo-polish and pseudo_german have been generated by scanning simple text files containing words of these languages.
- brarto and simpli are just simple manually defined languages.