Skip to content

Conversation

@chadrik
Copy link
Contributor

@chadrik chadrik commented Sep 3, 2024

This PR makes it easier to create python stub files, which are used for IDE completion and static type analysis. The direct benefactors of this change are a very niche group, but we're a group that support the community by providing tools that make developing with OpenUSD a more productive and pleasurable experience.

I'm using these changes to create stubs which are published to pypi here. You can peruse the stub files here. For the past year or two I've had to maintain my own fork of OpenUSD to make this possible.

In order to sell you on this change, I'm going to demonstrate that the function signatures in the stub files that I'm creating are superior to those included with docstrings that ship with OpenUSD, and I'd like to plant the idea that this PR is the first step towards an eventual integration of stub generation into the native OpenUSD build process.

Background

As someone who wants very accurate python stubs, I've faced a handful of challenges:

  • Boost does not have native support for generating stubs. In 2019 I created a ticket to add stub generation support to Boost, and while it's gotten some +1's there is still zero movement. There seems to be little hope of side-stepping this issue unless OpenUSD plans to move to pybind11, which apparently does have stubgen support.
  • Using the C++ Doxygen documentation alone does not provide enough information to generate highly accurate stubs. The actual number of python overloads often differs from the number of C++ overloads. A cause of this and numerous other hard-to-solve idiosyncracies is that there are many manual adjustments made in the boost wrappers within this project.
  • Boost does have the ability to generate docstrings with python function signatures, but they are incomplete: many types are simply refered to as object. However, it does give us the actual number of python overloads, and is reliable about certain result types like list and tuple, and this extra information can help fill the gap between live inspection of python objects and the rich information in the C++ docs.

Comparison between python signatures: yours (native) vs mine (cg-stubs)

Overview

As part of its build process, OpenUSD generates docstrings in side-car python files named __DOC.py, which are loaded at import time. In order to make use of these docstrings within your IDE, it must be configured to be able to import the OpenUSD python modules to inspect them.

By comparison, pyi stub files are a lightweight approach to load not only function and class documentation into your IDE but also type information, which is informative, and is used to drive code completion. Additionally, stub files can be used by static type analysis tools like mypy to validate your code. Providing access to stub files within your IDE is typically as simple as running pip install types-usd within your project's virtualenv.

Quality

This is the docstring that OpenUSD generates for the method Usd.ClipsAPI.GetClipAssetPaths:

GetClipAssetPaths(assetPaths, clipSet) -> bool

assetPaths : VtArray[AssetPath]
clipSet : str


List of asset paths to the clips in the clip set named C{clipSet}.


This list is unordered, but elements in this list are referred to by
index in other clip-related fields.


----------------------------------------------------------------------
GetClipAssetPaths(assetPaths) -> bool

assetPaths : VtArray[AssetPath]


This is an overloaded member function, provided for convenience. It
differs from the above function only in what argument(s) it accepts.
This function operates on the default clip set.



UsdClipsAPISetNames

These are my stubs generated for the same methods:

class ClipsAPI(APISchemaBase):
    ...
    @overload
    def GetClipAssetPaths(self, clipSet: str | pxr.Ar.ResolvedPath) -> list[pxr.Sdf.AssetPath]:
        """
        List of asset paths to the clips in the clip set named C{clipSet}.


        This list is unordered, but elements in this list are referred to by
        index in other clip-related fields.
        """
    @overload
    def GetClipAssetPaths(self) -> list[pxr.Sdf.AssetPath]:
        """
        This is an overloaded member function, provided for convenience. It
        differs from the above function only in what argument(s) it accepts.
        This function operates on the default clip set.



        UsdClipsAPISetNames
        """

You'll notice that the docstrings present in yours and mine are equivalent (and share the same whitespace idiosyncracies) because I'm using the same doxygenlib modules to process docstrings.

The primary difference is in the overload signatures. Here's a more direct side-by-side comparsion:

First overload

# yours
def GetClipAssetPaths(assetPaths: VtArray[AssetPath], clipSet: str) -> bool: ...
# mine
def GetClipAssetPaths(self, clipSet: str | pxr.Ar.ResolvedPath) -> list[pxr.Sdf.AssetPath]: ...

Second overload

# yours
def GetClipAssetPaths(assetPaths: VtArray[AssetPath]) -> bool: ...
# mine
def GetClipAssetPaths(self) -> list[pxr.Sdf.AssetPath]: ...

tl;dr The native signatures are wrong in many ways. Below is a table explaining the sources of these differences:

OpenUSD cg-stubs
number of overloads may not be accurate number of overloads always matches those in python
doesn't handle pointers that become results inspects C++ signatures for pointer arguments and converts them to python result when the the boost signatures indicate a match
only converts std::vector and std::sequence also converts std::set, std::unordered_set, std::function, std::map, std::unordered_map, std::optional
doesn't handle implicitly convertible types like Ar.ResolvedPath scans headers for typedef and using statements to substitute these aliases for their actual types
intended for documentation reference intended for documentation reference, IDE code completion, and static type analysis
does not include self or cls args adds self and cls args for methods
not validated validated against actual python code
only visits functions which have matches in Doxygen processes every function in every mdoule

Ok, I'm convinced. What next?

Here's my proposed 3-step plan:

Step 1

Merge this PR.

Step 2

I will contribute type annotations to several open source project which I can use to further validate my stubs, and continue to broadcast the existence of these stubs far and wide to drive adoption. I've already created a PR for the great USD Cookbook repo. Ideally, if Pixar is open to it, I would like to contribute type annotations to OpenUSD itself, for example in Usdviewq. To ease into this, I have a followup to this PR to add annotations to doxygenlib.

Step 3

If we are all in agreement, then the final step would be to move the pyi stub generation process into OpenUSD itself. I'm a pretty busy guy and moving the project into OpenUSD would ensure that it undergoes more regular releases than I can manage. I will be around to help out with maintenance of the generator.

Q & A

What has changed under the hood?

There are two main changes in this PR, both limited to the build process:

  1. OpenUSD python bindings are built with the boost option to include boost-generated signatures.
  2. doxygenlib is updated to hide these signatures from users.

Tf.PreparePythonModule overrides function __doc__ attributes with values from __DOC.py. This behavior has remained unchanged. What has changed is the functions that are included in __DOC.py files during the build process. Prior to this change, cdWriterDocstring omitted entries in __DOC.py for functions which had a docstring provided by Boost, because an existing docstring indicated that it had been manually authored in the C++ wrapper.

After this PR, all functions have docstrings generated by Boost (because of the new Boost signatures), so we have to do a little extra work to determine if the Boost docstrings are just the auto-generated python signatures, or custom docstrings from the wrappers. Now when cdWriterDocstring processes an existing docstring, it removes the Boost signatures and if there's anything left it writes it to __DOC.py.

The upshot is that the additional docstrings make the python lib directory slightly larger, 25MB, or about 0.6% to a full OpenUSD build:

before: 4141 MB
after:  4166 MB

What are the differences for the user?

As mentioned, I've taken great pains to preserve the original behavior as much as possible, however, there are some edge cases.

Functions for which cdWriterDocstring found a C++ match in doxygen will remain the same, because they will recieve an entry in __DOC.py. However, there are functions which cdWriterDocstring never visits and which therefore do not receive an override in __DOC.py. These functions will now have docstrings generated by Boost.

For example, before this PR, the following function had no docstring:

$ python3.9 -c "import pxr.Sdf as Sdf;print(Sdf.ListEditorProxy_SdfNameKeyPolicy.ContainsItemEdit.__doc__)"
None

After this PR, it has a docstring generated by Boost:

$ python3.9 -c "import pxr.Sdf as Sdf;print(Sdf.ListEditorProxy_SdfNameKeyPolicy.ContainsItemEdit.__doc__)"
ContainsItemEdit( (ListEditorProxy_SdfNameKeyPolicy)arg1, (object)item [, (bool)onlyAddOrExplicit=False]) -> bool

Why enable the boost signatures by default?

If we follow through with the 3-step plan above, building with signatures will need to be the default, because we will not want to build one version of OpenUSD with signatures enabled to create the stubs, and then rebuild with signatures disabled.

Why not provide an option to enable the Boost signatures?

I'm open to doing this, but adding an option to enable the Boost signatures adds complexity to the doxygenlib code because it needs to produce parity in output with both scenarios.


I know this is a lot to read, but I thought I'd front load all of the context necessary to make a decision on this PR. Take your time and ask as many questions as you like.

Thanks!

@jesschimein
Copy link
Collaborator

Filed as internal issue #USD-10058

@jesschimein
Copy link
Collaborator

/AzurePipelines run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@spiffmon
Copy link
Member

Hi @chadrik - sorry for the delay; we finally had a chance to talk this over, and we like the idea of at least your step #1 at this point. We're going to do a little more testing for build-time impact, and we're pretty sure the new pxr_boost should support this, though if you had a chance to rebase against current dev (or 24.11) to verify, that would be awesome also!

@chadrik
Copy link
Contributor Author

chadrik commented Jun 26, 2025

Hi all, I'm back on this! I'm going to rebase onto dev, and see if we can get this merged.

By the way, I've succeeded in moving my stub generation process into OpenImageIO and I'm getting close on OpenColorIO. OpenEXR is next. I'm hoping to hit get the entire ASWF-and-friends done before SIGGRAPH.

@spiffmon is there any new info I should be aware of?

@spiffmon
Copy link
Member

Nothing new to add, @chadrik , pxr_boost is there for you to experiment with. Full disclosure we won't have time to help or get you feedback till after siggraph!

Copy link
Contributor

@pmolodo pmolodo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had a few nitpicks, but I think it would be ok to merge even if they're not addressed...

Comment on lines +45 to +47
if (NOT PXR_BUILD_PYTHON_DOCUMENTATION)
_add_define(BOOST_PYTHON_NO_PY_SIGNATURES)
endif()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the documentation builds also build TF, I'm thinking that this comment may now be out of date, and we can remove the define unconditionally?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The setting of the BOOST_PYTHON_NO_PY_SIGNATURES define has moved here: https://github.com/PixarAnimationStudios/OpenUSD/blob/dev/pxr/external/boost/python/CMakeLists.txt#L327

I tried to port the same behavior, like so:

--- a/pxr/external/boost/python/CMakeLists.txt
+++ b/pxr/external/boost/python/CMakeLists.txt
@@ -321,11 +321,13 @@ pxr_library(python
     DISABLE_PRECOMPILED_HEADERS
 )
 
-# Disable insertion of C++ signatures in docstrings. We generate this
-# information separately via our Python doc build process.
-target_compile_definitions(python
-    PUBLIC PXR_BOOST_PYTHON_NO_PY_SIGNATURES
-)
+if (NOT PXR_BUILD_PYTHON_DOCUMENTATION)
+    # Disable insertion of C++ signatures in docstrings. We generate this
+    # information separately via our Python doc build process.
+    target_compile_definitions(python
+        PUBLIC PXR_BOOST_PYTHON_NO_PY_SIGNATURES
+    )
+endif()

Unfortunately, this leads to a build error that I don't understand.

/Users/chad/dev/USD/pxr/external/boost/python/type_id.hpp:88:9: error: 'typeid' of incomplete type 'pxrInternal_v0_25_8__pxrReserved__::ArAsset'
   88 |         typeid(T)
      |         ^
/Users/chad/dev/USD/pxr/external/boost/python/converter/registered.hpp:111:31: note: in instantiation of function template specialization 'pxrInternal_v0_25_8__pxrReserved__::pxr_boost::python::type_id<const volatile pxrInternal_v0_25_8__pxrReserved__::ArAsset &>' requested here
  111 |       return registry::lookup(type_id<T&>());
      |                               ^
/Users/chad/dev/USD/pxr/external/boost/python/converter/registered.hpp:118:14: note: in instantiation of function template specialization 'pxrInternal_v0_25_8__pxrReserved__::pxr_boost::python::converter::detail::registry_lookup2<const volatile pxrInternal_v0_25_8__pxrReserved__::ArAsset>' requested here
  118 |       return registry_lookup2((T(*)())0);
      |              ^
/Users/chad/dev/USD/pxr/external/boost/python/converter/registered.hpp:129:64: note: in instantiation of function template specialization 'pxrInternal_v0_25_8__pxrReserved__::pxr_boost::python::converter::detail::registry_lookup1<const volatile pxrInternal_v0_25_8__pxrReserved__::ArAsset &>' requested here
  129 |   registration const& registered_base<T>::converters = detail::registry_lookup1(type<T>());
      |                                                                ^
/Users/chad/dev/USD/pxr/external/boost/python/to_python_value.hpp:117:54: note: in instantiation of function template specialization 'pxrInternal_v0_25_8__pxrReserved__::pxr_boost::python::detail::shared_ptr_to_python_value<const std::shared_ptr<pxrInternal_v0_25_8__pxrReserved__::ArAsset> &>::get_pytype<pxrInternal_v0_25_8__pxrReserved__::ArAsset>' requested here
  117 |       PyTypeObject const* get_pytype() const {return get_pytype((type<argument_type>*)0);}
      |                                                      ^
/Users/chad/dev/USD/pxr/external/boost/python/detail/caller.hpp:137:98: note: in instantiation of member function 'pxrInternal_v0_25_8__pxrReserved__::pxr_boost::python::detail::shared_ptr_to_python_value<const std::shared_ptr<pxrInternal_v0_25_8__pxrReserved__::ArAsset> &>::get_pytype' requested here
  137 |         return create_result_converter((PyObject*)0, (ResultConverter *)0, (ResultConverter *)0).get_pytype();
      |                                                                                                  ^
/Users/chad/dev/USD/pxr/external/boost/python/detail/caller.hpp:159:61: note: (skipping 6 contexts in backtrace; use -ftemplate-backtrace-limit=0 to see all)
  159 |         , &detail::converter_target_type<result_converter>::get_pytype
      |                                                             ^
/Users/chad/dev/USD/pxr/external/boost/python/make_function.hpp:148:20: note: in instantiation of function template specialization 'pxrInternal_v0_25_8__pxrReserved__::pxr_boost::python::detail::make_function_aux<std::shared_ptr<pxrInternal_v0_25_8__pxrReserved__::ArAsset> (pxrInternal_v0_25_8__pxrReserved__::ArResolver::*)(const pxrInternal_v0_25_8__pxrReserved__::ArResolvedPath &) const, pxrInternal_v0_25_8__pxrReserved__::pxr_boost::python::default_call_policies, pxrInternal_v0_25_8__pxrReserved__::pxr_boost::python::detail::type_list<std::shared_ptr<pxrInternal_v0_25_8__pxrReserved__::ArAsset>, pxrInternal_v0_25_8__pxrReserved__::ArResolver &, const pxrInternal_v0_25_8__pxrReserved__::ArResolvedPath &>, std::integral_constant<int, 1>>' requested here
  148 |     return detail::make_function_aux(
      |                    ^
/Users/chad/dev/USD/pxr/external/boost/python/class.hpp:458:13: note: in instantiation of function template specialization 'pxrInternal_v0_25_8__pxrReserved__::pxr_boost::python::make_function<std::shared_ptr<pxrInternal_v0_25_8__pxrReserved__::ArAsset> (pxrInternal_v0_25_8__pxrReserved__::ArResolver::*)(const pxrInternal_v0_25_8__pxrReserved__::ArResolvedPath &) const, pxrInternal_v0_25_8__pxrReserved__::pxr_boost::python::default_call_policies, pxrInternal_v0_25_8__pxrReserved__::pxr_boost::python::detail::keywords<1>, pxrInternal_v0_25_8__pxrReserved__::pxr_boost::python::detail::type_list<std::shared_ptr<pxrInternal_v0_25_8__pxrReserved__::ArAsset>, pxrInternal_v0_25_8__pxrReserved__::ArResolver &, const pxrInternal_v0_25_8__pxrReserved__::ArResolvedPath &>>' requested here
  458 |           , make_function(
      |             ^
/Users/chad/dev/USD/pxr/external/boost/python/class.hpp:527:15: note: in instantiation of function template specialization 'pxrInternal_v0_25_8__pxrReserved__::pxr_boost::python::class_<pxrInternal_v0_25_8__pxrReserved__::ArResolver, pxrInternal_v0_25_8__pxrReserved__::pxr_boost::python::noncopyable>::def_impl<pxrInternal_v0_25_8__pxrReserved__::ArResolver, std::shared_ptr<pxrInternal_v0_25_8__pxrReserved__::ArAsset> (pxrInternal_v0_25_8__pxrReserved__::ArResolver::*)(const pxrInternal_v0_25_8__pxrReserved__::ArResolvedPath &) const, pxrInternal_v0_25_8__pxrReserved__::pxr_boost::python::detail::def_helper<pxrInternal_v0_25_8__pxrReserved__::pxr_boost::python::detail::keywords<1>>>' requested here
  527 |         this->def_impl(
      |               ^
/Users/chad/dev/USD/pxr/external/boost/python/class.hpp:205:15: note: in instantiation of function template specialization 'pxrInternal_v0_25_8__pxrReserved__::pxr_boost::python::class_<pxrInternal_v0_25_8__pxrReserved__::ArResolver, pxrInternal_v0_25_8__pxrReserved__::pxr_boost::python::noncopyable>::def_maybe_overloads<std::shared_ptr<pxrInternal_v0_25_8__pxrReserved__::ArAsset> (pxrInternal_v0_25_8__pxrReserved__::ArResolver::*)(const pxrInternal_v0_25_8__pxrReserved__::ArResolvedPath &) const, pxrInternal_v0_25_8__pxrReserved__::pxr_boost::python::detail::keywords<1>>' requested here
  205 |         this->def_maybe_overloads(name, a1, a2, &a2);
      |               ^
/Users/chad/dev/USD/pxr/usd/ar/wrapResolver.cpp:97:10: note: in instantiation of function template specialization 'pxrInternal_v0_25_8__pxrReserved__::pxr_boost::python::class_<pxrInternal_v0_25_8__pxrReserved__::ArResolver, pxrInternal_v0_25_8__pxrReserved__::pxr_boost::python::noncopyable>::def<std::shared_ptr<pxrInternal_v0_25_8__pxrReserved__::ArAsset> (pxrInternal_v0_25_8__pxrReserved__::ArResolver::*)(const pxrInternal_v0_25_8__pxrReserved__::ArResolvedPath &) const, pxrInternal_v0_25_8__pxrReserved__::pxr_boost::python::detail::keywords<1>>' requested here
   97 |         .def("OpenAsset", &This::OpenAsset,
      |          ^
/Users/chad/dev/USD/pxr/usd/ar/resolver.h:25:7: note: forward declaration of 'pxrInternal_v0_25_8__pxrReserved__::ArAsset'
   25 | class ArAsset;
      |       ^
1 error generated.
gmake[2]: *** [pxr/usd/ar/CMakeFiles/_ar.dir/build.make:233: pxr/usd/ar/CMakeFiles/_ar.dir/wrapResolver.cpp.o] Error 1

My goal in not setting BOOST_PYTHON_NO_PY_SIGNATURES is to allow boost to add its representation of the python signatures to the python docstrings.

In an ideal world, instead of adding these to the __doc__ attribute of each object we would write them to disk as a build artifact that can be read by the custom docstring generator. Now that we have our own fork of boost that might actually be on the table, but either way this build error needs to be resolved.

Unfortunately this has reached a level of complexity that I won't be able to solve it on my own, which means I can't make stubs for any versions of USD after the switch to pxr_boost. Any ideas?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure how far on the journey I can go, but this particular error is saying that wrapResolver.cpp, with that option enabled, requires a complete declaration of ArAsset, whereas resolver.h is only forward declaring it. So I think you just need to include asset.h in wrapResolver.cpp

not parentPath[-1].name.startswith(obj.__module__.split(".")[1]):
# The doxygen object clearly indicates it's from another module.
obj = None
elif type(obj).__module__ != 'Boost.Python':
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the module name changed with pxr_boost, but seeing this reminds me that you didn't explicitly confirm if you've tested this with pxr_boost yet...

Comment on lines +720 to +722
# boost signature has been prepended. that means if there is
# an existing description provided in the C++ wrapper it will be
# indented. strip the signature and dedent the description.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think an example before / after docstring (either real or made up) would greatly help understanding here. I had to re-read this a few times before I thought I understood what's going on, as my initial assumption about what these would look like was wrong.

Comment on lines +725 to +730
for line in lines:
if not found and line.startswith(' '):
found = True
if found:
newLines.append(line)
return textwrap.dedent('\n'.join(newLines))
Copy link
Contributor

@pmolodo pmolodo Jul 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, basically we're finding the first indented line, grabbing it and all following, and then dedenting.

I'm assuming you haven't encountered the situation where unindented text follows indented, ie:

A
B
    C
    D
E
    F

...but we should consider what to do if it happens. Current behavior would be to return:

    C
    D
E
    F

...which is likely not desired.

Simplest solution is to just error if we're fairly sure this should never happen, and/or can't make any guesses at what the best behavior would be if it does. Other options would be to return None, or one of:

C
D
F

or

C
D
E
F

...all of which feel more likely to be correct.

found = False
newLines = []
for line in lines:
if not found and line.startswith(' '):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we also add a and line.strip() check here? ie, if we have several lines of indented-but-empty lines preceeding the start of the non-whitespace section, we'll probably want to strip those too.

Would also (help) prevent us from returning a result that's nothing but whitespace-only lines. (We'd also need to check if found=False after the loop).

Comment on lines +725 to +730
for line in lines:
if not found and line.startswith(' '):
found = True
if found:
newLines.append(line)
return textwrap.dedent('\n'.join(newLines))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we return None if found = False, or error?

return None
if len(doc) > 0 and not overloads[0].isModule():
newDoc = self.__stripBoostSig(doc)
if newDoc is None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are some cases where __stripBoostSig() could return a non-None string that's either empty or nothing but whitespace. We should either clean those up, or change this check to if newDoc is None or not newDoc.strip():, or both (an effectively empty docstring is probably never the correct thing to return, I'm guessing).

and not overloads[0].isModule():
Debug("Docstring exists for %s - skipping" % pypath)
return None
if len(doc) > 0 and not overloads[0].isModule():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be nice to add the old we always override the module doc string for now... comment back here

try:
module = importlib.import_module(".__DOC", f_locals["__name__"])
module.Execute(f_locals)
if os.environ.get("PXR_USD_PYTHON_DISABLE_DOCS", "false").lower() not in ("1", "true", "yes"):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tf.GetEnvBool() also returns true for "on" - we should probably follow that convention:

https://github.com/PixarAnimationStudios/OpenUSD/blob/v25.05.01/pxr/base/tf/getenv.h#L48

@chadrik
Copy link
Contributor Author

chadrik commented Aug 3, 2025

Had a few nitpicks, but I think it would be ok to merge even if they're not addressed...

@pmolodo, thanks for the review!

Sorry it's taken me awhile to get back to this. I'll address a few of your notes now, but I'm unfortunately a bit stuck putting this back together after the pxr_boost change. I'm hoping that someone can point me in the right direction.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants