Skip to content

Some JSON loader issues #1581

@AnyOldName3

Description

@AnyOldName3

I noticed a few problems with the JSON loader yesterday, so thought I better write them down so they're not forgotten.

  • JSON supports arbitrary characters in keys, including quote marks, so:

    • Places like JSONParser::read_object that detect the end of a key name by searching for a " character will fail when that character's in the key. If you want to find the end, it's not the first ", it's the first " that's preceeeded by an even number of \ characters (obviously zero backslashes is an even number), i.e. the first one that's not escaped.
    • It might make sense just to reuse JSONParser::read_string to read keys as that handles escaping of special characters already. The grammar and example state machine on json.org don't distinguish between a string that's a value and a string that's a key, so it's right and proper to use one function to consume that token in both contexts.
    • If the key strings are modified as they're loaded, then having a std::string_view into the original buffer is no longer viable for the schema read overloads that take a key name - an rvalue reference might make more sense so the key can be moved rather than copied into place.
  • Commas separating object members and array elements are handled oddly. {"key1":"one",,,,,,,,,,"key2":"two"} and {"key1":"one""key2":"two"} aren't valid JSON, but the loader currently treats them exactly the same way as {"key1":"one","key2":"two"}. This isn't a devastating problem on its own, but might mean that some users don't notice corrupt data as quickly as they would otherwise.

  • The default handling for unexpected fields is error-prone.

    • the default JSONParser::Schema::read_X implementations for string, object and array don't call JSONParser::read_X, so it's pos member isn't advanced past the value. If a schema isn't expecting a field of that type at that time, then these functions will be called. If the value is a string, it'll then be read as a key with no value and emit a delimiter warning, or if it's an array or object, then a non-descriptive warning is emitted followed by a parser stuck warning. In both cases, any remaining members won't be read.
    • The approach taken in the 3D tiles schema (i.e. things that look like
      if (property == "refine")
          parser.read_string(refine);
      else
          parser.warning();
      where recognised members are read but unrecognised members aren't read) can end up with the same results as the reader's pos isn't advanced past the field.

    Numbers, booleans and null are already advanced past when they're read, so they're not a concern. Skipping over a string is easy - you can just scan for the next unescaped quote mark or even call read_string and discard the result. Skipping over recursive structures like objects and arrays is harder, though - they might contain another object or array as a child, so scanning for the next } or ] isn't adequate and they therefore have to be parsed.

    Because of that, I think a pretty reasonable solution would be to:

    • define a schema that just silently skips everything, but still makes the necessary recursive parse_object and parse_array calls, and either explicitly calls parse_string or skips over strings itself.
    • make the JSONParser::Schema::read_x implementations emit a warning about an unexpected field then consume it (potentially by telling the parser to read with the ignore-everything schema to ensure objects and arrays are handled correctly).
    • update the existing glTF and 3D tiles schemas to call the base class implementation of a method when it's called with an unexpected field name so they don't need to handle emitting a warning and skipping it explicitly..

    I originally noticed this class of problem when I attempted to load this tileset: https://s3.eu-west-2.wasabisys.com/ems-sgct-photomaillage/ODACIT/EMS_PM2022/tileset.json, which has "refine":"REPLACE" as part of the top-level tileset object, which the VSG doesn't handle yet. It caused the "root" object and everything after it not to load, so it was completely useless.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions