Accepts unpaired surrogate codepoints in both encode and decode #156

@DavidBuchanan314

import { encode, decode } from 'cborg'

const wtf8 = "😎".slice(1);
console.log(wtf8); // �

const encoded = encode(wtf8); // I expected an error (or a U+FFFD substitution) here, but neither happens
console.log(encoded); // Uint8Array(4) [ 99, 237, 184, 142 ]

const decoded = decode(new Uint8Array([ 99, 237, 184, 142 ])); // I expect an error here although U+FFFD substitution is also reasonable
console.log(decoded); // ��� (this is in fact U+FFFD repeated 3x)

const roundtrip = encode(decoded);
console.log(roundtrip); // Uint8Array(10) [ 105, 239, 191, 189, 239, 191, 189, 239, 191, 189 ]

Encoding unpaired surrogate code points produces invalid UTF-8 byte sequences in the CBOR output.
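
As a rough illustration of what a caller could do before encoding, here is a minimal sketch of a pre-encode check. `hasLoneSurrogate` and `assertWellFormed` are hypothetical helper names, not part of cborg, and this is not a proposal for how the library itself should be fixed. Newer runtimes also provide `String.prototype.isWellFormed()`, which could replace the manual loop where it is available.

```javascript
// Hypothetical helper (not part of cborg): reports whether a JS string
// contains a UTF-16 surrogate code unit that is not part of a valid pair.
function hasLoneSurrogate(str) {
  for (let i = 0; i < str.length; i++) {
    const unit = str.charCodeAt(i);
    if (unit >= 0xd800 && unit <= 0xdbff) {
      // High surrogate: it must be followed by a low surrogate.
      const next = str.charCodeAt(i + 1); // NaN at the end of the string
      if (!(next >= 0xdc00 && next <= 0xdfff)) return true;
      i++; // skip the low half of the valid pair
    } else if (unit >= 0xdc00 && unit <= 0xdfff) {
      // Low surrogate with no preceding high surrogate.
      return true;
    }
  }
  return false;
}

// Hypothetical guard: throw instead of letting invalid UTF-8 be produced.
function assertWellFormed(str) {
  if (hasLoneSurrogate(str)) {
    throw new TypeError('string contains an unpaired surrogate');
  }
  return str;
}
```

A caller would run `assertWellFormed(value)` on each string before handing it to `encode`, so the failure surfaces at the source rather than in whichever decoder reads the bytes later.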

Decoding the invalid UTF-8 produces strings containing U+FFFD (the replacement character). That is arguably reasonable, but other CBOR decoders may throw an error in the same situation.
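
To make the two decoding behaviours concrete, the sketch below runs the three payload bytes from the report through the standard `TextDecoder`, once in its default lenient mode and once with `fatal: true`. It says nothing about how cborg decodes strings internally; it only shows the difference between substituting and throwing.

```javascript
// Payload bytes of the CBOR text string from the report,
// without the leading 0x63 (99) header byte.
const payload = new Uint8Array([237, 184, 142]);

// Default mode: each invalid byte is replaced with U+FFFD.
const lenientUtf8 = new TextDecoder('utf-8');
const lenient = lenientUtf8.decode(payload);

// Strict mode: invalid input raises a TypeError instead.
const strictUtf8 = new TextDecoder('utf-8', { fatal: true });
let strictError = null;
try {
  strictUtf8.decode(payload);
} catch (err) {
  strictError = err;
}

console.log(lenient === '\ufffd\ufffd\ufffd'); // true
console.log(strictError instanceof TypeError); // true
```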
