
CLTS knows the "ultra-long" length, but pyclts only parses it for a closed list of vowels. #45

@XachaB

Hi,

I noticed that CLTS defines an "ultra-long" length:

https://github.com/cldf-clts/clts/blob/d52aa50ec524b590f7715334b58c89b839dc585b/pkg/transcriptionsystems/features.json#L65

However, pyclts only parses it for a closed list of vowels, which are listed explicitly in the vowels file, e.g.:

https://github.com/cldf-clts/clts/blob/cccee296b1e54e653e1b4bea103bf0e870072765/pkg/transcriptionsystems/bipa/vowels.tsv#L52

I am working with Nuer right now, where morphological contrasts can combine tone, three levels of length, vowel quality, and breathiness, which leads to many attested combinations. When diacritics other than length are involved, pyclts incorrectly parses "ultra-long" as if it were "long":

import pyclts
clts = pyclts.CLTS()
o1 = clts.bipa["oːː"]
print(o1, o1.featuredict["duration"])
o2 = clts.bipa["o̤ːː"]
print(o2, o2.featuredict["duration"])
o3 = clts.bipa["ó̤ːː"]
print(o3, o3.featuredict["duration"])

This outputs:

oːː ultra-long
o̤ː long
ó̤ː long
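To see what the parser actually receives, the three test graphemes can be broken down into code points. This sketch uses only the standard-library `unicodedata` module (not pyclts); the helper name `code_points` is purely illustrative. It shows that the ultra-long mark is not a single character but two consecutive U+02D0 length marks, placed after the breathy (U+0324) and tone (U+0301) diacritics.

```python
import unicodedata

def code_points(grapheme: str) -> list:
    """Return 'U+XXXX NAME' for each code point of the NFD-normalized grapheme."""
    decomposed = unicodedata.normalize("NFD", grapheme)
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in decomposed]

# The three test sounds, written with explicit escapes so the combining
# characters are unambiguous:
# o + breathy (U+0324) + acute/high tone (U+0301) + 2 x length mark (U+02D0).
samples = {
    "plain": "o\u02d0\u02d0",
    "breathy": "o\u0324\u02d0\u02d0",
    "breathy, high tone": "o\u0324\u0301\u02d0\u02d0",
}

for label, s in samples.items():
    print(label, s, code_points(s))
```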

I also noticed that there is no ultra-long diacritic in the diacritics file:

https://github.com/cldf-clts/clts/blob/cccee296b1e54e653e1b4bea103bf0e870072765/pkg/transcriptionsystems/bipa/diacritics.tsv#L82

However, adding a row with a double "ː" to that file is not enough, which suggests that the parser does not allow arbitrary combinations of a base sound with compatible diacritics.
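For illustration only, here is a minimal sketch of the kind of compositional handling I have in mind. It is not how pyclts works: the names `split_duration`, `LENGTH_MARK` and `DURATION_BY_COUNT` are hypothetical. The idea is to count the trailing length marks separately, map the count to a duration value, and leave the remaining grapheme (e.g. "o̤" or "ó̤") to be resolved through the existing vowel and diacritic tables.

```python
from typing import Optional, Tuple

# Hypothetical constants: the IPA length mark and a mapping from the number
# of trailing length marks to a CLTS-style duration value.
LENGTH_MARK = "\u02d0"  # ː MODIFIER LETTER TRIANGULAR COLON
DURATION_BY_COUNT = {0: None, 1: "long", 2: "ultra-long"}

def split_duration(grapheme: str) -> Tuple[str, Optional[str]]:
    """Strip trailing length marks and return (remaining grapheme, duration)."""
    rest = grapheme.rstrip(LENGTH_MARK)
    count = len(grapheme) - len(rest)
    if count not in DURATION_BY_COUNT:
        raise ValueError(f"unsupported number of length marks: {count}")
    return rest, DURATION_BY_COUNT[count]

# The breathy and toned vowels keep their ultra-long duration, because length
# is counted independently of the other diacritics.
for s in ["o\u02d0\u02d0", "o\u0324\u02d0\u02d0", "o\u0324\u0301\u02d0\u02d0", "o\u02d0", "o"]:
    print(repr(s), split_duration(s))
```

Because length is handled independently, each new combination of tone, breathiness and length would not need its own row in the vowels file.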

Is this the intended behavior? I understand that for some applications, losing such fine phonetic resolution might not matter. For morphology, however, the contrast between, for example, long and ultra-long is sometimes crucial. That is where I was hoping to use CLTS as a parser, to obtain featural definitions from grapheme sequences while being able to trust the data sources.
