Hi,
I saw that CLTS knows the "ultra-long" length:
However, pyclts only parses it for a closed list of vowels, which are coded extensively in the vowels file, e.g.:
I am currently working with Nuer, where morphological contrasts can combine tone, three levels of length, vowel quality, and breathiness, which leads to many attested combinations. When diacritics other than length are involved, pyclts incorrectly parses "ultra-long" as if it were "long":
import pyclts
clts = pyclts.CLTS()
o1 = clts.bipa["oːː"]
print(o1, o1.featuredict["duration"])
o2 = clts.bipa["o̤ːː"]
print(o2, o2.featuredict["duration"])
o3 = clts.bipa["ó̤ːː"]
print(o3, o3.featuredict["duration"])
outputs:
oːː ultra-long
o̤ː long
ó̤ː long
I saw that the ultra-long diacritic is not in the diacritics file:
However, adding a row to that file with a double "ː" is not enough, so I would guess that the parser does not accept arbitrary combinations of a sound with compatible diacritics.
Is this intended behavior? I understand that for some applications, losing such fine-grained sound resolution might not matter. For morphology, where I was hoping to use CLTS as a parser (to obtain featural definitions from grapheme sequences while trusting the data sources), the contrast between, for example, long and ultra-long is sometimes crucial.
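To make the expected result concrete, here is a minimal sketch of the check I have in mind. It does not use pyclts at all: it only counts length marks in the grapheme, whatever other diacritics are present. The helper name `duration_from_length_marks` is hypothetical (it is not part of pyclts), and the mapping of one mark to "long" and two or more to "ultra-long" is my assumption, based on the feature values shown in the output above.

```python
import unicodedata

# U+02D0 MODIFIER LETTER TRIANGULAR COLON, the IPA length mark "ː"
LENGTH_MARK = "\u02d0"


def duration_from_length_marks(grapheme):
    """Guess a duration value from the number of length marks.

    Hypothetical helper, not part of pyclts. Assumes one mark means
    "long" and two or more mean "ultra-long"; returns None otherwise.
    """
    # Decompose first so precomposed letters and combining diacritics
    # are handled uniformly; the length mark itself is unaffected.
    decomposed = unicodedata.normalize("NFD", grapheme)
    count = decomposed.count(LENGTH_MARK)
    if count >= 2:
        return "ultra-long"
    if count == 1:
        return "long"
    return None


# The three graphemes from the example above
for grapheme in ["oːː", "o̤ːː", "ó̤ːː"]:
    print(grapheme, duration_from_length_marks(grapheme))
```

Such a check could be used to flag graphemes where the `duration` feature returned by pyclts disagrees with the number of length marks in the input, but it is only a workaround and not a replacement for proper parsing.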