CLTS knows the "ultra-long" length, but pyclts only parses it for a closed list of vowels.

Hi,

I saw that CLTS knows the "ultra-long" length:

https://github.com/cldf-clts/clts/blob/d52aa50ec524b590f7715334b58c89b839dc585b/pkg/transcriptionsystems/features.json#L65

However, pyclts only parses it for a closed list of vowels, each of which is listed explicitly in the vowels file, e.g.:

https://github.com/cldf-clts/clts/blob/cccee296b1e54e653e1b4bea103bf0e870072765/pkg/transcriptionsystems/bipa/vowels.tsv#L52

I am working with Nuer right now, where morphological contrasts can combine tone, three levels of length, vowel quality, and breathiness, which leads to many attested combinations. When diacritics other than length are involved, pyclts incorrectly parses "ultra-long" as if it were "long":

~~~
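import pyclts
clts = pyclts.CLTS()
# Plain vowel + two length marks: listed in vowels.tsv, parsed correctly
o1 = clts.bipa["oːː"]
print(o1, o1.featuredict["duration"])
# Breathy vowel + two length marks: one length mark is lost
o2 = clts.bipa["o̤ːː"]
print(o2, o2.featuredict["duration"])
# Breathy, high-tone vowel + two length marks: one length mark is lost
o3 = clts.bipa["ó̤ːː"]
print(o3, o3.featuredict["duration"])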
~~~

outputs:

~~~
oːː ultra-long
o̤ː long
ó̤ː long
~~~
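A check along these lines could flag affected graphemes without relying on the parser. It is only a sketch: `expected_duration`, `is_misparsed` and `DURATION_BY_MARKS` are hypothetical names, not part of pyclts, and the mapping assumes that length is always written with zero, one or two "ː" marks and that a sound without a length mark has a duration of `None`.

```python
LENGTH_MARK = "\u02d0"  # MODIFIER LETTER TRIANGULAR COLON, i.e. "ː"

# Assumed mapping from the number of length marks to the duration
# labels seen in the output above (None = no length mark).
DURATION_BY_MARKS = {0: None, 1: "long", 2: "ultra-long"}


def expected_duration(grapheme):
    """Return the duration label implied by the length marks in `grapheme`."""
    marks = grapheme.count(LENGTH_MARK)
    if marks not in DURATION_BY_MARKS:
        raise ValueError(f"unexpected number of length marks: {marks}")
    return DURATION_BY_MARKS[marks]


def is_misparsed(grapheme, parsed_duration):
    """True if the parsed duration disagrees with the written length marks."""
    return expected_duration(grapheme) != parsed_duration
```

With the three sounds above, one would call `is_misparsed(grapheme, sound.featuredict["duration"])` for each input string and its parsed sound. Given the output shown, only the first case would pass.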


I saw that the ultra-long diacritic is not in the diacritics file:

https://github.com/cldf-clts/clts/blob/cccee296b1e54e653e1b4bea103bf0e870072765/pkg/transcriptionsystems/bipa/diacritics.tsv#L82

However, adding a row to that file for the double "ː" is not enough, so I would guess that the parser does not allow arbitrary combinations of a sound with compatible diacritics.

Is this intended behavior? I understand that for some applications, losing such fine-grained distinctions might not matter. For morphology, where I was hoping to use CLTS as a parser (to obtain featural definitions from grapheme sequences, and where I want to trust the data sources), the contrast between, for example, long and ultra-long is sometimes crucial.

