-
Notifications
You must be signed in to change notification settings - Fork 255
Description
I noticed a few problems with the JSON loader yesterday, so thought I better write them down so they're not forgotten.
-
JSON supports arbitrary characters in keys, including quote marks, so:
- Places like
JSONParser::read_objectthat detect the end of a key name by searching for a"character will fail when that character's in the key. If you want to find the end, it's not the first", it's the first"that's preceeeded by an even number of\characters (obviously zero backslashes is an even number), i.e. the first one that's not escaped. - It might make sense just to reuse
JSONParser::read_stringto read keys as that handles escaping of special characters already. The grammar and example state machine on json.org don't distinguish between a string that's a value and a string that's a key, so it's right and proper to use one function to consume that token in both contexts. - If the key strings are modified as they're loaded, then having a
std::string_viewinto the original buffer is no longer viable for the schema read overloads that take a key name - an rvalue reference might make more sense so the key can be moved rather than copied into place.
- Places like
-
Commas separating object members and array elements are handled oddly.
{"key1":"one",,,,,,,,,,"key2":"two"}and{"key1":"one""key2":"two"}aren't valid JSON, but the loader currently treats them exactly the same way as{"key1":"one","key2":"two"}. This isn't a devastating problem on its own, but might mean that some users don't notice corrupt data as quickly as they would otherwise. -
The default handling for unexpected fields is error-prone.
- the default
JSONParser::Schema::read_Ximplementations forstring,objectandarraydon't callJSONParser::read_X, so it'sposmember isn't advanced past the value. If a schema isn't expecting a field of that type at that time, then these functions will be called. If the value is a string, it'll then be read as a key with no value and emit a delimiter warning, or if it's an array or object, then a non-descriptive warning is emitted followed by a parser stuck warning. In both cases, any remaining members won't be read. - The approach taken in the 3D tiles schema (i.e. things that look like
where recognised members are read but unrecognised members aren't read) can end up with the same results as the reader's
if (property == "refine") parser.read_string(refine); else parser.warning();
posisn't advanced past the field.
Numbers, booleans and null are already advanced past when they're read, so they're not a concern. Skipping over a string is easy - you can just scan for the next unescaped quote mark or even call
read_stringand discard the result. Skipping over recursive structures like objects and arrays is harder, though - they might contain another object or array as a child, so scanning for the next}or]isn't adequate and they therefore have to be parsed.Because of that, I think a pretty reasonable solution would be to:
- define a schema that just silently skips everything, but still makes the necessary recursive
parse_objectandparse_arraycalls, and either explicitly callsparse_stringor skips over strings itself. - make the
JSONParser::Schema::read_ximplementations emit a warning about an unexpected field then consume it (potentially by telling the parser to read with the ignore-everything schema to ensure objects and arrays are handled correctly). - update the existing glTF and 3D tiles schemas to call the base class implementation of a method when it's called with an unexpected field name so they don't need to handle emitting a warning and skipping it explicitly..
I originally noticed this class of problem when I attempted to load this tileset: https://s3.eu-west-2.wasabisys.com/ems-sgct-photomaillage/ODACIT/EMS_PM2022/tileset.json, which has
"refine":"REPLACE"as part of the top-level tileset object, which the VSG doesn't handle yet. It caused the"root"object and everything after it not to load, so it was completely useless. - the default