Unicode confusables and normalization #23

We have more or less clarified this in code already:

- normalize is a one-to-many conversion procedure; only single characters are allowed, and it is transcription-system specific, since different systems may normalize in different ways
- confusables that go beyond this are excluded from normalize and placed in the alias section
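To make the split concrete, here is a minimal sketch of how the two tables could be kept apart. It is not the pyclts implementation: the names `NORMALIZE`, `ALIASES`, `normalize` and `resolve_alias` are hypothetical, the entries are only illustrative, and reading "only single characters are allowed" as a constraint on the keys is an assumption.

```python
# Hypothetical per-system tables; the entries are examples, not the real data.
NORMALIZE = {
    ":": "\u02d0",       # ASCII colon -> MODIFIER LETTER TRIANGULAR COLON
    "\u03b5": "\u025b",  # GREEK SMALL LETTER EPSILON -> LATIN SMALL LETTER OPEN E
    "g": "\u0261",       # LATIN SMALL LETTER G -> LATIN SMALL LETTER SCRIPT G
}

# Cases that are not plain character-level lookalikes go to the alias table
# and are resolved per sound, not per character.
ALIASES = {
    "\u02a6": "ts",  # LATIN SMALL LETTER TS DIGRAPH -> "ts"
}


def normalize(text, table=NORMALIZE):
    """Replace each character of `text` using a single-character lookup."""
    for key in table:
        if len(key) != 1:
            # Assumed reading of "only single characters are allowed".
            raise ValueError(f"normalize keys must be single characters: {key!r}")
    return "".join(table.get(char, char) for char in text)


def resolve_alias(sound, aliases=ALIASES):
    """Map a whole sound to its preferred form, if an alias is registered."""
    return aliases.get(sound, sound)
```

Because each transcription system would carry its own `NORMALIZE` table, the same input can normalize differently depending on the system passed in.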

But we also started to collect such cases in [cldf/multicode](https://github.com/cldf/multicode). Many of the examples there are things we would use to normalize a dataset, but not all of them.

I think we can drop multicode: it was never really followed up, and we would have to work out how to integrate it into any of our tools. (One option would be to use it for normalization in [linse](https://github.com/lingpy/linse), which already has a small normalization procedure for bipa only, so that linse can be used without depending on pyclts.) But we should check thoroughly that we have harvested all major characters from the Unicode confusables list.
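One way to run that check is sketched below, assuming the list is the `confusables.txt` file from the Unicode security data, whose data lines have the form `source ; target ; type # comment` with code points in hex. The function names and the `SAMPLE` lines are hypothetical; the sample only imitates the file format and was not copied from the real file.

```python
def parse_confusables(lines):
    """Parse lines in the confusables.txt format into {source: target}."""
    pairs = {}
    for line in lines:
        # Drop the trailing comment; skip blank and comment-only lines.
        data = line.split("#", 1)[0].strip()
        fields = [field.strip() for field in data.split(";")]
        if len(fields) < 2 or not fields[0] or not fields[1]:
            continue
        try:
            source = "".join(chr(int(cp, 16)) for cp in fields[0].split())
            target = "".join(chr(int(cp, 16)) for cp in fields[1].split())
        except ValueError:
            continue  # not a data line (e.g. a stray byte order mark)
        pairs[source] = target
    return pairs


def unharvested(pairs, table):
    """List single-character confusable sources missing from `table`."""
    return sorted(src for src in pairs if len(src) == 1 and src not in table)


# Illustrative lines in the same layout as confusables.txt.
SAMPLE = [
    "# sample data",
    "",
    "0441 ;\t0063 ;\tMA\t# CYRILLIC SMALL LETTER ES -> LATIN SMALL LETTER C",
    "0430 ;\t0061 ;\tMA\t# CYRILLIC SMALL LETTER A -> LATIN SMALL LETTER A",
]

pairs = parse_confusables(SAMPLE)
missing = unharvested(pairs, {"\u0441": "c"})
```

In practice the result would still need to be filtered down to characters that actually matter for a given transcription system, since the full list covers far more scripts than any one system uses.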
