
orgize-wasm: data-range links are broken with non-ASCII characters #91

@parasyte

Description


I have documents containing Unicode characters, which rowan handles correctly. However, the orgize-wasm frontend builds the data-range links without accounting for the fact that rowan's indices are UTF-8 byte offsets, while JavaScript strings are indexed by UTF-16 code units. This mismatch quickly pushes the links out of sync.

For instance, parse this input in the web frontend:

Hello

world

Foo 💩 Bar

Baz

- 💩
- Lorem ipsum dolor sit amet
- 💩
- consectetur adipiscing elit
- 💩
- sed do eiusmod tempor incididunt ut labore et dolore magna aliqua

In the "Syntax" tab, clicking the range in TEXT@0..7 "Hello\r\n" correctly highlights "Hello", and the same works for "world". After the poo emoji, however, the links fall out of sync with the input: the range in TEXT@34..39 "Baz\r\n" highlights the "z" and all of the newlines up to the hyphen. The list is worse, because each emoji shifts the ranges by 2 more "phantom" UTF-16 characters (💩 is 4 bytes in UTF-8 but only 2 code units in UTF-16).
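One possible fix is to translate each byte offset into a UTF-16 code-unit offset before it is written into a data-range link. The Rust sketch below assumes the conversion happens on the wasm side and that the input uses `\r\n` line endings, as the reported ranges suggest. `utf8_to_utf16_offset` and `utf8_to_utf16_range` are hypothetical helpers and are not part of orgize or orgize-wasm.

```rust
/// Hypothetical helper: converts a UTF-8 byte offset (as reported by rowan)
/// into the UTF-16 code-unit offset that JavaScript string indexing uses.
/// Panics if `byte_offset` is not on a char boundary, just like `&text[..n]`.
fn utf8_to_utf16_offset(text: &str, byte_offset: usize) -> usize {
    // Count how many UTF-16 code units the prefix occupies.
    text[..byte_offset].encode_utf16().count()
}

/// Hypothetical helper: converts a byte range `start..end` into the
/// equivalent UTF-16 range.
fn utf8_to_utf16_range(text: &str, start: usize, end: usize) -> (usize, usize) {
    (
        utf8_to_utf16_offset(text, start),
        utf8_to_utf16_offset(text, end),
    )
}

fn main() {
    // The first part of the example input, assuming CRLF line endings.
    let input = "Hello\r\n\r\nworld\r\n\r\nFoo 💩 Bar\r\n\r\nBaz\r\n";

    // Before the emoji, byte offsets and UTF-16 offsets agree.
    assert_eq!(&input[0..7], "Hello\r\n");
    assert_eq!(utf8_to_utf16_range(input, 0, 7), (0, 7));

    // rowan reports TEXT@34..39 for "Baz\r\n" in bytes. The emoji takes
    // 4 bytes but only 2 UTF-16 code units, so the JS range is 32..37.
    assert_eq!(&input[34..39], "Baz\r\n");
    let (start, end) = utf8_to_utf16_range(input, 34, 39);
    assert_eq!((start, end), (32, 37));

    println!("TEXT@34..39 -> {}..{}", start, end);
}
```

Rescanning the prefix for every offset is linear in the document length per lookup; for large documents, walking the text once and converting all offsets in order would avoid the repeated work.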

Metadata

Assignees: No one assigned
Labels: No labels
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests