What to do with Unicode strings #1

@Ashvin-Ranjan

The issue

Currently EMP removes all non-ASCII characters from its strings using this line of code. This might make conversion harder, because both JSON and NBT support Unicode of some variety.

JSON

In JSON:

A string is a sequence of zero or more Unicode characters, wrapped in double quotes, using backslash escapes.

From json.org

There are some useful things to note from this definition. It might not work to include UTF-8 or UTF-16 directly, because strings over length 16 cannot contain the byte 00000100 (0x04); otherwise the string terminates prematurely, which messes up the rest of the decoding. We can still include \b, \f, \n, \t, and \r, though, as none of those conflict with the current system.
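To see where the byte 00000100 actually turns up, here is a minimal sketch that checks encoded strings for it. The helper name `contains_terminator` is hypothetical, and the check assumes the terminator is the single byte 0x04 as described above. In UTF-8 every byte of a multi-byte sequence has its high bit set, so 0x04 should only appear when the text contains U+0004 itself, while UTF-16 can produce it for ordinary characters such as Cyrillic letters.

```python
TERMINATOR = 0x04  # the byte 00000100 (assumed to be the string terminator)


def contains_terminator(text: str, encoding: str) -> bool:
    """Return True if encoding `text` produces the byte 0x04 anywhere."""
    return TERMINATOR in text.encode(encoding)


# Hypothetical sample strings, chosen only for illustration.
samples = ["héllo", "Привет", "日本語", "a\x04b", "line\n\ttab\r\b\f"]

for s in samples:
    print(
        repr(s),
        "utf-8:", contains_terminator(s, "utf-8"),
        "utf-16-le:", contains_terminator(s, "utf-16-le"),
    )
```

If this holds for EMP's framing, UTF-8 may be usable as long as U+0004 itself is rejected or escaped, whereas UTF-16 would need more work.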

NBT

According to wiki.vg, NBT uses Modified UTF-8. This could perhaps be modified further to disallow the byte 00000100.
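One way this could look, as a rough sketch only: Modified UTF-8 already avoids the byte 0x00 by writing U+0000 as the overlong pair C0 80, and the same trick could be applied to U+0004 (written as C0 84). The function name `encode_modified_utf8` and the `avoid_eot` flag are hypothetical, and the C0 84 form is an assumption made for this issue, not part of Modified UTF-8 or NBT.

```python
def encode_modified_utf8(text: str, avoid_eot: bool = True) -> bytes:
    """Encode `text` as Modified UTF-8, optionally hiding U+0004 as well."""
    out = bytearray()
    # Modified UTF-8 works on UTF-16 code units, so characters outside the
    # BMP become two surrogates, each encoded as a 3-byte sequence.
    units = text.encode("utf-16-be", "surrogatepass")
    for i in range(0, len(units), 2):
        cu = (units[i] << 8) | units[i + 1]
        if cu == 0x0000 or (avoid_eot and cu == 0x0004):
            # Overlong two-byte form: C0 80 for NUL (standard Modified UTF-8),
            # C0 84 for U+0004 (hypothetical extension, an assumption here).
            out += bytes([0xC0, 0x80 | cu])
        elif cu < 0x80:
            out.append(cu)
        elif cu < 0x800:
            out += bytes([0xC0 | (cu >> 6), 0x80 | (cu & 0x3F)])
        else:
            out += bytes([
                0xE0 | (cu >> 12),
                0x80 | ((cu >> 6) & 0x3F),
                0x80 | (cu & 0x3F),
            ])
    return bytes(out)


encoded = encode_modified_utf8("a\x04b Привет")
print(encoded.hex(" "))
print(0x04 in encoded)
```

A matching decoder would have to accept C0 84 explicitly, since standard UTF-8 decoders reject overlong sequences.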

Labels: help wanted, question
