Skip to content

Add refactored/revamped JSON schema codegen#307

Closed
krupkad wants to merge 5 commits intogltf-schema-prettierfrom
new-json-schema-codegen
Closed

Add refactored/revamped JSON schema codegen#307
krupkad wants to merge 5 commits intogltf-schema-prettierfrom
new-json-schema-codegen

Conversation

@krupkad
Copy link
Contributor

@krupkad krupkad commented Aug 11, 2021

This is a fairly revamped JSON schema / C++ codegen, which I used to generate the 3D Tiles struts and writers in #306.

Main functional differences:

  • Handling for recursive types (types with properties of the same type, e.g. trees)
  • Handling of "native" enums (e.g. enum: ["value1", "value2"], which is used in 3D Tiles but not glTF and not handled by the current codegen)
  • Handling of JSON pointer references, which 3D Tiles requires
  • Added writer generation

Reader/struct generation:

  • Mostly unchanged - output code is largely the same
  • Removal of -Spec and final in the struct output - felt somewhat "extra" and not really needed

Writer generation:

  • Writers are primarily implemented as overloads of writeJson in an anonymous namespace in cpp files
  • They are exposed as classes (e.g. TilesetWriter) with a single static method write, which just calls writeJson
  • The cpp files contain writeJson implementations for primitive types (double, string, integer, bool), composites (string->X dictionary, array), and optional

Code structure:

  • Refactored resolveProperty.json to utilize JS clases to better organize the handlers for the various JSON schema types
  • Refactored generate.json to use handlebars.js and external templates (see the files in templates/) do the actual codegen (vs. embedded format strings in the JS itself)

@cesium-concierge
Copy link

Thanks again for your contribution @krupkad!

No one has commented on this pull request in 30 days. Maintainers, can you review, merge or close to keep things tidy?

I'm going to re-bump this in 30 days. If you'd like me to stop, just comment with @cesium-concierge stop. If you want me to start again, just delete the comment.

@lilleyse
Copy link
Contributor

I'd like to take over this branch and help move it along so that we no longer have two code generators that are slowly getting more and more out of sync with each other.

My plan is this:

  • Get this branch to a state where the output glTF reader code looks like the generated code in #351 (which is branched off #350)
  • Remove generate-gltf-classes and use this generator instead. @kring, @baothientran, @javagl do you have any reservations about this PR? The handlebars templates while a bit old school are nicely organized.
  • Open a PR that replaces the hand-coded glTF writer with autogenerated code. We can test this in asset-pipeline.
  • Get 3d tiles writer #306 to a mergeable state. We've been using this branch in tilers for a while now.
  • Get 3d tiles reader #308 to a mergeable state.

I plan on starting on this ASAP so please post any thoughts if this is not the right direction.

@lilleyse lilleyse changed the base branch from main to gltf-schema-updates September 26, 2021 17:34
@lilleyse
Copy link
Contributor

lilleyse commented Sep 27, 2021

Some notes after diving a little bit deeper today

  • The -Spec suffix should be added back to distinguish the autogenerated base class ModelSpec.h from the subclass with additional helper functions Model.h
  • Need to evaluate whether the attachTo mechanism needs to be added back. Other extensions like EXT_feature_metadata, EXT_meshopt_compression, and 3DTILES_metadata attach themselves to different objects. EXT_feature_metadata has since been added to generate-gltf-classes and will be a good reference for this.
    • It's possible that attachTo could be inferred by the schema path. Though primitive.EXT_feature_metadata.schema.json would need to be renamed to mesh.primitive.EXT_feature_metadata.schema.json to make this work. Might have to be careful about schema file naming for some 3DTILES_ extensions as well.
  • All the generated headers in CesiumGltf and CesiumGltfReader have been combined into a single header file with the exception of extensions, similar to Put all generated glTF JSON handlers into a single C++ source file, rather than one file per class #279. I want to double check that this is good with everyone.
  • Scripts to run schema-gen should be added to package.json. tools/schema-gen/package.json needs to be reworked so that glTF and 3D Tiles aren't writing to the same test_output folder. There should also be scripts to put the autogenerated files in the actual code folders like generate-gltf-classes does.
  • This PR has code for autogenerating Tileset.h but it's not really clear what the plan was for integrating this with Cesium3DTilesSelection. I see that 3d tiles writer #306 adds a new project Cesium3DTiles similar to CesiumGltf but it doesn't integrate with Cesium3DTilesSelection.
    • Side note: the autogenerated Tileset.h seems to be renamed to Cesium3DTiles.h in 3d tiles writer #306. I'm good with that, but why isn't the generated file called Cesium3DTiles.h in this PR?
  • Not sure yet how we're going to support implicit tiling subtrees since they are an "orphan" schema, aka not part of tileset.json in any way. https://github.com/CesiumGS/3d-tiles/tree/3d-tiles-next/extensions/3DTILES_implicit_tiling/schema/subtree
  • Extensions that start with 3DTILES_ are obviously troublesome for generating C++ classes because they start with a number. Though the approach here of prefixing all class names with EXT_ doesn't seem super robust, like if we add support for EXT_meshopt_compression it will be called EXT_EXT_meshopt_compression.
  • Will need to add support for 3DTILES_content_gltf, 3DTILES_implicit_tiling, 3DTILES_metadata, EXT_feature_metadata, EXT_meshopt_compression, and KHR_mesh_quantization. These are required by other repos using cesium-native.
  • This PR caught a schema bug in 3DTILES_content_gltf: CesiumGS/3d-tiles@3cc445d

@lilleyse
Copy link
Contributor

@krupkad let me know if I'm missing anything in my comments above.

@javagl
Copy link
Contributor

javagl commented Sep 27, 2021

There are some deeply technical questions in there, and I haven't looked at the details e.g. of the extension mechanisms, so cannot say much about that. But ... apologies for the apparent bike-shedding here:

The -Spec suffix should be added back to distinguish the autogenerated base class ModelSpec.h from the subclass with additional helper functions Model.h

I can see the point of adding convenience functions in Model or Accessor. But I think it would be more consistent to let the generated classes be exactly what is defined by the schema. The convenience functions could then be in classes that could be named ModelExt, AccessorExt or similar, to make the destinction between the auto-generated and the extending classes clearer.

On the other hand, I see the rationale of giving the extending classes the "simple, straightforward" names, because these are the ones that are actually used by clients. So that's not a really strong opinion, but I think it would be nice if one could state a rule like "Entity X is found in X.h", without having to list exceptions, and without having to define them, technically and manually, via the toBeInherited flag. (On top of that: If we ever wanted to add convenience functions to one of the other classes, then that would imply renaming that class from Foo to FooSpec...)

@krupkad
Copy link
Contributor Author

krupkad commented Sep 27, 2021

In no particular order:

  • I don't have a strong opinion on class naming or restoring base classes / interfaces
  • Implicit tiling and other orphan schemas (e.g. the various feature-table-related ones) - the current system should be adequate for handling this? i would just encourage orthogonality, a la making sure the schemas can still be processed independently
  • extension naming - i had ideas of using namespaces or such scoping to deal with this, more configuration options for naming also feels acceptable

@kring
Copy link
Member

kring commented Sep 29, 2021

The handlebars templates while a bit old school are nicely organized.

It's ok. I still have a slight preference for template literals, but I won't complain too much. The way I see it:

  • Pros of Handlebars: Much better and easier indentation handling. #if and #each let us write familiar procedural code.
  • Cons of Handlebars: It's a "new language" to learn (even if it's widely used) rather than plain JavaScript. Harder to use arbitrary JavaScript code in your template (functions have to be explicitly exposed to Handlebars, and the syntax is far less expressive than javascript).
  • Pros of template literals: It's just JavaScript. No new library, no new syntax to learn. Use any JavaScript code you want in your generator.
  • Cons of template literals: Indentation handling is awkward. Code must use functional style because if statements and for loops are not supported.

But if we are sending the generated code through clang-format, the indentation handling is mostly irrelevant. And having spent a lot of time with React, which also doesn't have if or for statements (By design, see: https://reactjs.org/docs/conditional-rendering.html), I find the functional style quite comfortable and the ability to use regular javascript is a really nice advantage.

The -Spec suffix should be added back to distinguish the autogenerated base class ModelSpec.h from the subclass with additional helper functions Model.h

Yes agreed!

I can see the point of adding convenience functions in Model or Accessor. But I think it would be more consistent to let the generated classes be exactly what is defined by the schema.

I don't think that can work. The problem is that the Model class has a vector of, say, Accessor, right? We could define an AccessorExt with some extra stuff, but the Model would still refer to the original Accessor. Creating an AccessorExt and pushing it into the Model's vector would slice it down to an Accessor (lose the extra data). We can't solve this by defining a ModelExt that inherits from Model because there's no way to change the type of a field in the base class. As usual we can probably sort this out with another layer of abstraction (i.e. virtual functions, heap allocations) but IMO it's definitely not worth it.

But in general, we should try to avoid using this toBeInherited mechanism to add data. We should only add behavior (methods). The current exception to that is the cesium field that we've added to Image and Buffer. Perhaps a better solution would be add a cesium-native-private extension to these types that provides access to the image and buffer data.

So that's not a really strong opinion, but I think it would be nice if one could state a rule like "Entity X is found in X.h", without having to list exceptions, and without having to define them, technically and manually, via the toBeInherited flag. (On top of that: If we ever wanted to add convenience functions to one of the other classes, then that would imply renaming that class from Foo to FooSpec...)

Hmm most of this doesn't make sense to me, not sure what I'm missing. Entity X is always found in X.h. There's just a generated XSpec.h under the hood in same cases, too, but users don't need to worry about those. Adding convenience functions to other classes requires us to tweak the generator config to create a XSpec class (set toBeInherited), and then manually define an X class, but having done so it's transparent to our users. Well, it's an ABI-breaking change, sure, but not a source-level breaking change.

Need to evaluate whether the attachTo mechanism needs to be added back.

I think it's not really used anymore. Instead, the assocation between extensions and the objects they extend is hard-coded in the GltfReader constructor where it registers extensions on particular types. If we can generate this code from the extension JSON Schema (e.g. based on the extension file name as you suggested), that would be excellent, but it probably needs work on the spec side. In the absence of that, defining the attachment manually in code seems just as good as definining it manually in a JSON config file.

All the generated headers in CesiumGltf and CesiumGltfReader have been combined into a single header file with the exception of extensions

I don't think this will work once we're generating some of the types a -Spec again. Because the single generated header file will have e.g. AccessorSpec but not Accessor, but all the other types in there (e.g. Model) will reference a type called Accessor that doesn't exist. Including the hand-written Accessor.h first won't help, because that will reference AccessorSpec which is only in the generated header file.

Scripts to run schema-gen should be added to package.json. tools/schema-gen/package.json needs to be reworked so that glTF and 3D Tiles aren't writing to the same test_output folder. There should also be scripts to put the autogenerated files in the actual code folders like generate-gltf-classes does.

👍

I see that 3d tiles writer #306 adds a new project Cesium3DTiles similar to CesiumGltf but it doesn't integrate with Cesium3DTilesSelection.

Yeah, it's going to take some work. I don't know what the answer is here.

Not sure yet how we're going to support implicit tiling subtrees since they are an "orphan" schema, aka not part of tileset.json in any way.

I'm not sure I totally understand the issue here.

Extensions that start with 3DTILES_ are obviously troublesome for generating C++ classes because they start with a number. Though the approach here of prefixing all class names with EXT_ doesn't seem super robust, like if we add support for EXT_meshopt_compression it will be called EXT_EXT_meshopt_compression.

The extension names in the existing code are pretty cumbersome. Maybe we should normalize them by prefixing with Extension and then a de-underscored, case-normalized version of the extension name. So EXT_meshopt_compression becomes ExtensionExtMeshoptCompression and 3DTiles_implicit_tiling becomes Extension3dtilesImplicitTiling. And when we have different types for extending different objects, ExtensionMeshPrimitiveExtFeatureMetadata. Still kinda cumbersome, but I don't have any better ideas short of totally manual names that wouldn't seem connected to the extension names at all, such as MeshPrimitiveFeatureMetadata for that last one.

Daniel's idea of using namespaces is an interesting one! Put all the extensions in CesiumGltf::Extensions and then remove that Extension prefix?

Will need to add support for 3DTILES_content_gltf, 3DTILES_implicit_tiling, 3DTILES_metadata, EXT_feature_metadata, EXT_meshopt_compression, and KHR_mesh_quantization. These are required by other repos using cesium-native.

Of these, EXT_feature_metadata is already (at least partially) supported in main. 3DTILES_content_gltf is supported too but not via an extension. cesium-native just happily loads a GLB as tile content when it sees one.

@javagl
Copy link
Contributor

javagl commented Oct 4, 2021

3DTILES_content_gltf is supported too but not via an extension. cesium-native just happily loads a GLB as tile content when it sees one.

It still would have to check whether there are extensionsRequired, and whether these extensions are supported. Otherwise, it would load the glTF that required extensions that are not supported. (Maybe it does check for the extensionsRequired in the glTF, and hopefully bails out gracefully, but it should not even attempt to load that glTF in the first place...)

Creating an AccessorExt and pushing it into the Model's vector would slice it down to an Accessor (lose the extra data).
...
But in general, we should try to avoid using this toBeInherited mechanism to add data.

The second part is IMHO the crucial one here. And if these objects are not extended with data, then I don't see a problem. The options are then, roughly:

  • to offer these convenience functions in classes like AccessorExt, which can trivially be created as

    AccessorExt accessorExt(accessor);
    int64_t bytes = accessorExt.computeBytesPerVertex();
    
  • or just offer them as static functions,

    int64_t bytes = AccessorExt.computeBytesPerVertex(accessor);
    

(I mean, there currently is a convenience function in Accessor that requires a Model as a parameter - somehow, this doesn't feel so consistent anyhow...)

Regarding the cases where this is not possible (namely, Buffer and Image): I already mentioned elsewhere that this cesium field is something that raised an eyebrow for me. I think that it does not belong there (and even if it is there, it should not be named cesium - imagine trying to use some JSON parser, and each json_node element contains some property that you don't need, an which has the name of a company...). It will probably not be changed (because that's the way it is, and changing things is a lot of effort), but in doubt, I can dig out the thread where the talked about the alternatives for that.

@lilleyse lilleyse changed the base branch from gltf-schema-updates to gltf-schema-prettier October 4, 2021 23:11
@lilleyse lilleyse force-pushed the new-json-schema-codegen branch from db235b2 to 6cde110 Compare October 9, 2021 17:32
@lilleyse
Copy link
Contributor

Thanks for the insights everyone. I'll still be referring back to these comments but now that #361 is merged this PR can be closed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants

Comments