What to do with Unicode strings

## The issue

Currently, EMP removes all non-ASCII characters from its strings using [this line of code](https://github.com/Ashvin-Ranjan/EMP/blob/893c0e9a01f777ae66f62f41cd0738dca498d276/src/encode.rs#L109). This might make conversion harder, because both JSON and NBT support Unicode of some variety.
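To make the current behaviour concrete, here is a minimal sketch of what stripping non-ASCII characters looks like in Rust. It is not copied from the linked line; the function name `strip_non_ascii` is hypothetical and the real code may differ.

```rust
/// Keep only ASCII characters and silently drop everything else.
/// `strip_non_ascii` is a hypothetical name, not taken from the EMP source.
fn strip_non_ascii(input: &str) -> String {
    input.chars().filter(|c| c.is_ascii()).collect()
}
```

With this kind of filter, a string made up entirely of non-ASCII characters becomes empty, so a JSON or NBT value cannot be round-tripped through EMP without data loss.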

### JSON

In JSON:
> A string is a sequence of zero or more Unicode characters, wrapped in double quotes, using backslash escapes.

*From [json.org](https://www.json.org/json-en.html)*

This definition has some useful consequences. Full UTF-8 or UTF-16 might not work, because strings of length over 16 cannot contain the byte `00000100`: that byte would terminate the string prematurely and corrupt the rest of the decoding. We can still include `\b`, `\f`, `\n`, `\t`, and `\r`, since none of those conflict with the current system.
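One way to see how often the terminator byte actually shows up is to scan the encoded bytes directly. The sketch below assumes the terminator is the single byte `0x04` (binary `00000100`) as described above; the function and constant names are hypothetical. Note that in standard UTF-8 every byte of a multi-byte sequence is `0x80` or above, so `0x04` appears only when the string contains U+0004 itself, whereas in UTF-16 it can appear as one half of many code units.

```rust
/// Terminator byte described in the issue (binary 00000100).
/// The constant name is an assumption made for this sketch.
const TERMINATOR: u8 = 0x04;

/// True if the UTF-8 encoding of `s` contains the terminator byte.
fn utf8_has_terminator(s: &str) -> bool {
    s.as_bytes().contains(&TERMINATOR)
}

/// True if the big-endian UTF-16 encoding of `s` contains the terminator byte.
fn utf16_has_terminator(s: &str) -> bool {
    s.encode_utf16()
        .flat_map(|unit| unit.to_be_bytes())
        .any(|b| b == TERMINATOR)
}
```

For example, U+0400 encodes to `D0 80` in UTF-8 but to `04 00` in big-endian UTF-16, so only the UTF-16 form collides with the terminator.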

### NBT

According to [wiki.vg](https://wiki.vg/NBT), NBT uses [Modified UTF-8](https://docs.oracle.com/javase/8/docs/api/java/io/DataInput.html#modified-utf-8). This encoding could perhaps be modified further to disallow the byte `00000100`.
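As a rough illustration of that idea, the sketch below encodes a string as Modified UTF-8 (each UTF-16 code unit is encoded on its own, and U+0000 becomes the two bytes `C0 80`) and optionally applies the same trick to U+0004, writing it as the overlong pair `C0 84`. The `escape_terminator` flag and the `C0 84` choice are assumptions made for this example, not part of Modified UTF-8 or of EMP, and a matching decoder would need to accept the overlong form.

```rust
/// Encode `s` as Modified UTF-8.
/// When `escape_terminator` is true, U+0004 is written as the overlong
/// pair C0 84 so the raw byte 0x04 never appears in the output.
/// That extension is hypothetical, modelled on how NUL is handled.
fn encode_modified_utf8(s: &str, escape_terminator: bool) -> Vec<u8> {
    let mut out = Vec::new();
    // Modified UTF-8 works on UTF-16 code units, so supplementary
    // characters become two three-byte surrogate encodings.
    for unit in s.encode_utf16() {
        match unit {
            // NUL is always written in two-byte form.
            0x0000 => out.extend_from_slice(&[0xC0, 0x80]),
            // Hypothetical extension: hide the terminator the same way.
            0x0004 if escape_terminator => out.extend_from_slice(&[0xC0, 0x84]),
            // Remaining ASCII range: one byte.
            0x0001..=0x007F => out.push(unit as u8),
            // Two-byte form.
            0x0080..=0x07FF => {
                out.push(0xC0 | (unit >> 6) as u8);
                out.push(0x80 | (unit & 0x3F) as u8);
            }
            // Three-byte form, including surrogate halves.
            _ => {
                out.push(0xE0 | (unit >> 12) as u8);
                out.push(0x80 | ((unit >> 6) & 0x3F) as u8);
                out.push(0x80 | (unit & 0x3F) as u8);
            }
        }
    }
    out
}
```

With the flag enabled, every output byte is either an ASCII byte other than `0x04` or a byte of `0x80` and above, so the terminator cannot occur inside an encoded string.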
