Skip to content

Conversation

@NicksWorld
Copy link
Contributor

This replaces the flawed manual demangling with proper demanglers capable of supporting additional language features such as namespaces.

For example, the mangled symbol N6DFHack21dfhack_lua_viewscreenE was previously demangled as DFHack21dhack_lua_viewscreenE. This patch ensures demangling returns the proper DFHack::dfhack_lua_viewscreen. This also ensures that class names returned remain consistent between platforms.

Basic functionality has been tested on Linux, Windows, and Wine, with no unexpected results seen so far.

@lethosor
Copy link
Member

I wish I had seen the discussion sooner. I had a similar effort going on https://github.com/lethosor/dfhack/commits/cxx-demangle-windows/ about a year ago. I hadn't touched doReadClassName, though.

It would be great if cxx_demangle in MiscUtils could use your changes too: https://github.com/DFHack/dfhack/blob/e4f3cd94ce621d02ae82e82a68a418d8c45bd01c/library/MiscUtils.cpp#L672C27-L672C39
From what I recall, my project here had stalled because ab9rf and I were debating what to do on Windows (well, MSVC) where class names and function names are demangled differently. Maybe our current API in MiscUtils is insufficient. But there is Lua dev tooling that would benefit from demangling capability on Windows.

@NicksWorld
Copy link
Contributor Author

@lethosor I absolutely should be able to update cxx_demangle as well. I had learned that the . prefix on a symbol denotes that it is a type and should be demangled as such. Using UNDNAME_NO_ARGUMENTS a type can be demangled, and functions are demangled without. The main concern was what information do we want to exclude from the demangled symbols?

Take the following demangled symbol for example:

class std::_Func_impl_no_alloc<class `public: void __cdecl widgets::widget::set_custom_render(class std::function<void __cdecl(class widgets::widget * __ptr64)>) __ptr64'::`2'::<lambda_1>,void,class widgets::widget * __ptr64,unsigned int>

Do we want to include the class specifiers, visibility modifiers, or any other markers? I know that a number of those don't seem to appear in a gcc mangled name.

@lethosor
Copy link
Member

I would care more about what more "normal" symbols look like, i.e. things that we're more likely to demangle. Mainly things like global type names (viewscreenst should be demangled just like that), and maybe stuff like std::vector<int> or std::function<void(widgets::widget*)>. I know some of those can get messy with default template arguments and such - that's fine. I guess I would rather cut down on the __cdecl and class bits, but IIRC MSVC mangles struct, class and union differently (just with a different prefix, e.g. .AU, .AV).

I think the bigger question is what DFHack currently cares about. doReadClassName is only used for matching vtable names in-memory, isn't it? Is the goal here to just make that more accurate for the DFHack viewscreen classes, or are you trying to solve a different issue here? If we don't have a pressing need to demangle any STL types, I don't think we need to care too much about what they look like. In devel/find-vtables.lua, all I wanted demangling for was to catch and filter out types starting with std::, while keeping types starting with widgets::.

Thanks for looking into the UNDNAME flags.

@NicksWorld
Copy link
Contributor Author

It is true doReadClassName is used mainly for checking viewscreen names. The function is also used sparingly in other places like virtual_identity checking to provide debug information.

The main reason for the change was for cross-platform consistency and ensuring the function returns a result that isn't trimmed in unusual manners. This mainly affects classes inside of namespaces or including template parameters. Class names that aren't in the format of viewscreen_<name>st are already modified via trimming the first 10 and last 2 chars when creating a focus string, losing more of their symbol.

If this is a change that doesn't seem warranted at this time, I am glad to simply repurpose portions of this experiment into improving cxx_demangle.

@NicksWorld
Copy link
Contributor Author

NicksWorld commented Dec 14, 2024

Through further testing I have discovered inconsistencies can occur when running under wine. The affected symbols were discovered via a devel/scan-vtables test, and appear to be in only specific highly-nested template parameters. Wine also returns the mangled string in place of the demangled string upon complete failure, which may be undesirable.

An example of an offending symbol:

.?AV?$_Func_impl_no_alloc@V<lambda_1>@?1??add_happiness_tex@widgets@@YAXV?$shared_ptr@Vcontainer@widgets@@@std@@PEAX@Z@X$$V@std@@

is (improperly) demangled to

std::shared_ptr<widgets::container>::_Func_impl_no_alloc

In the simpler symbols with fewer template arguments I am yet to see any unusual results.

@NicksWorld NicksWorld marked this pull request as draft December 14, 2024 02:58
@lethosor
Copy link
Member

Class names that aren't in the format of viewscreen_<name>st are already modified via trimming the first 10 and last 2 chars when creating a focus string, losing more of their symbol.

Isn't this also the case for classes that do match that pattern? (or did you mean "are" instead of "aren't"?) The focus string logic is very specific to viewscreens, so I don't think doReadClassName should be concerned with it at all. If the viewscreen focus string generation has a problem, it should be fixed there.

is (improperly) demangled to

for comparison, what's the "proper" result?

@NicksWorld
Copy link
Contributor Author

All strings returned by doClassName are trimmed as described, not just ones not matching the standard pattern. At this point, it looks like adding Windows support to cxxDemangle would be preferable and introduce fewer potential issues.

The proper demangling of the symbol I previously provided would be:

std::_Func_impl_no_alloc<`widgets::add_happiness_tex'::`2'::<lambda_1>,void>

The demangled result still looks somewhat incorrect, but at least seems more accurate.

Wine contains a flawed implementation of demangling which causes some symbols to fail demangling as well. As such, the use of the UndecorateSymbolName should likely only occur on native windows, which may be an acceptable requirement for cxxDemangle as it is only used by devel/scan-vtables, but likely is unacceptable in a core function used for viewscreens and lua.

For the time being I am closing this PR as the changes are not critical and may cause additional problems in some environments.

@NicksWorld NicksWorld closed this Dec 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

No open projects
Status: Done

Development

Successfully merging this pull request may close these issues.

2 participants