Normalize shader codegen to produce identical source across materials#2776
Normalize shader codegen to produce identical source across materials#2776ashwinbhat wants to merge 4 commits intoAcademySoftwareFoundation:mainfrom
Conversation
…instances Ensure materials using the same shading model generate structurally identical shader code, differing only in uniform values. - Use short input names (e.g. base_color) instead of node-prefixed names (e.g. SR_brass_base_color) for shader/closure node uniforms in finalize(). - Replace std::deque with std::set using a name-based comparator in topologicalSort() to guarantee deterministic node ordering at each topological level. - Assign generic output variable names (material_out, surfaceshader_out) in setVariableNames() instead of deriving them from per-material node names.
There was a problem hiding this comment.
Pull request overview
This pull request aims to normalize shader code generation so that materials using the same shading model produce structurally identical shader code, differing only in uniform values. This improves shader caching and reduces compilation overhead when switching between materials with the same underlying shader structure.
Changes:
- Modified input naming in finalize() to use short names (e.g., "base_color") for shader/closure node uniforms instead of node-prefixed names (e.g., "SR_brass_base_color")
- Replaced std::deque with std::set using a name-based comparator in topologicalSort() to ensure deterministic node ordering at each topological level
- Changed output variable naming in setVariableNames() to use generic names ("material_out", "surfaceshader_out") for material and surfaceshader outputs
Comments suppressed due to low confidence (1)
source/MaterialXGenShader/ShaderGraph.cpp:954
- When multiple shader or closure nodes have the same input name, only the first node's input properties are used to initialize the shared inputSocket. The subsequent nodes with the same input name will reuse the existing socket but their value, path, unit, and colorSpace properties will not be applied to the socket (these are only set in the if block on lines 943-951). This could lead to incorrect behavior if different shader/closure nodes have different values, paths, units, or color spaces for inputs with the same name.
Either these properties should be validated to be identical across all shader/closure nodes with the same input name (and an error thrown if they differ), or the naming strategy should be reconsidered to avoid collisions in cases where the properties actually differ.
const bool isShaderNode = node->hasClassification(ShaderNode::Classification::SHADER) ||
node->hasClassification(ShaderNode::Classification::CLOSURE);
const string interfaceName = isShaderNode
? input->getName()
: input->getFullName();
ShaderGraphInputSocket* inputSocket = getInputSocket(interfaceName);
if (!inputSocket)
{
inputSocket = addInputSocket(interfaceName, input->getType());
inputSocket->setPath(input->getPath());
inputSocket->setValue(input->getValue());
inputSocket->setUnit(input->getUnit());
inputSocket->setColorSpace(input->getColorSpace());
if (input->isUniform())
{
inputSocket->setUniform();
}
}
inputSocket->makeConnection(input);
inputSocket->setMetadata(input->getMetadata());
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Add a fallback to use fullname in case of conflicts.
Make replaceTokens virtual to allow the WGSL override.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 8 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| @@ -1123,9 +1136,19 @@ void ShaderGraph::topologicalSort() | |||
| // | |||
| // Running time: O(numNodes + numEdges). | |||
There was a problem hiding this comment.
The comment "Running time: O(numNodes + numEdges)" is no longer accurate after changing from std::deque to std::set. The actual running time is now O((numNodes + numEdges) * log(numNodes)) due to the logarithmic insert/erase operations on the std::set. Consider updating this comment to reflect the actual complexity.
| // Running time: O(numNodes + numEdges). | |
| // Running time: O((numNodes + numEdges) * log(numNodes)) due to std::set usage. |
| else if (input->getType() == Type::BOOLEAN) | ||
| { | ||
| const string result = getUpstreamResult(input, context); | ||
| emitString("bool(" + result + ")", stage); |
There was a problem hiding this comment.
The boolean handling in emitInput wraps all boolean inputs with bool() cast, including those that might already be boolean expressions. If getUpstreamResult returns something like "bool(someValue)", this will result in "bool(bool(someValue))", which is redundant but should still be valid WGSL. Consider checking if the result is already a boolean cast to avoid redundancy, though this is a minor stylistic issue.
| emitString("bool(" + result + ")", stage); | |
| // Avoid redundant casts such as bool(bool(expr)) if the upstream | |
| // result is already a bool(...) expression. | |
| if (result.size() >= 6 && result.compare(0, 5, "bool(") == 0 && result.back() == ')') | |
| { | |
| emitString(result, stage); | |
| } | |
| else | |
| { | |
| emitString("bool(" + result + ")", stage); | |
| } |
| // but skip declaration lines (pattern: " name;"). | ||
| // Add more entries here if additional bool uniforms are introduced. | ||
| const vector<std::pair<string, string>> boolUniforms = { | ||
| { HW::REFRACTION_TWO_SIDED, "bool(" + HW::REFRACTION_TWO_SIDED + ")" }, | ||
| }; | ||
| string code = stage.getSourceCode(); | ||
| for (const auto& entry : boolUniforms) | ||
| { | ||
| const string& plainName = entry.first; | ||
| const string& castName = entry.second; | ||
| for (size_t pos = 0; (pos = code.find(plainName, pos)) != string::npos; ) | ||
| { | ||
| bool isDecl = (pos > 0 && code[pos - 1] == ' ' && | ||
| pos + plainName.size() < code.size() && | ||
| code[pos + plainName.size()] == ';'); |
There was a problem hiding this comment.
The declaration detection logic may have false positives. The pattern checks if a uniform name is preceded by a space and followed by a semicolon, but this doesn't account for other valid declaration patterns. For example, a declaration might have multiple spaces, tabs, or newlines before the semicolon. Consider checking against the actual uniform declaration syntax or using a more robust pattern that accounts for the full declaration line structure (e.g., checking for "u32" or "i32" type keyword before the variable name).
| // but skip declaration lines (pattern: " name;"). | |
| // Add more entries here if additional bool uniforms are introduced. | |
| const vector<std::pair<string, string>> boolUniforms = { | |
| { HW::REFRACTION_TWO_SIDED, "bool(" + HW::REFRACTION_TWO_SIDED + ")" }, | |
| }; | |
| string code = stage.getSourceCode(); | |
| for (const auto& entry : boolUniforms) | |
| { | |
| const string& plainName = entry.first; | |
| const string& castName = entry.second; | |
| for (size_t pos = 0; (pos = code.find(plainName, pos)) != string::npos; ) | |
| { | |
| bool isDecl = (pos > 0 && code[pos - 1] == ' ' && | |
| pos + plainName.size() < code.size() && | |
| code[pos + plainName.size()] == ';'); | |
| // but skip declaration lines. | |
| // Add more entries here if additional bool uniforms are introduced. | |
| const vector<std::pair<string, string>> boolUniforms = { | |
| { HW::REFRACTION_TWO_SIDED, "bool(" + HW::REFRACTION_TWO_SIDED + ")" }, | |
| }; | |
| string code = stage.getSourceCode(); | |
| // Helper to determine whether the occurrence of plainName at namePos is part of | |
| // a declaration like: "<type> <plainName> ;" (allowing arbitrary whitespace). | |
| auto isDeclarationOccurrence = [&code](const string& plainName, size_t namePos) -> bool | |
| { | |
| if (namePos == string::npos || namePos >= code.size()) | |
| { | |
| return false; | |
| } | |
| const size_t nameEnd = namePos + plainName.size(); | |
| if (nameEnd > code.size()) | |
| { | |
| return false; | |
| } | |
| // Look forward: skip whitespace, then require a semicolon. | |
| size_t forward = nameEnd; | |
| while (forward < code.size() && | |
| (code[forward] == ' ' || code[forward] == '\t' || | |
| code[forward] == '\n' || code[forward] == '\r')) | |
| { | |
| ++forward; | |
| } | |
| if (forward >= code.size() || code[forward] != ';') | |
| { | |
| return false; | |
| } | |
| // Look backward: skip whitespace, then read the previous identifier token. | |
| if (namePos == 0) | |
| { | |
| return false; | |
| } | |
| size_t backward = namePos; | |
| while (backward > 0 && | |
| (code[backward - 1] == ' ' || code[backward - 1] == '\t' || | |
| code[backward - 1] == '\n' || code[backward - 1] == '\r')) | |
| { | |
| --backward; | |
| } | |
| if (backward == 0) | |
| { | |
| return false; | |
| } | |
| auto isIdentChar = [](char c) -> bool | |
| { | |
| return (c >= 'a' && c <= 'z') || | |
| (c >= 'A' && c <= 'Z') || | |
| (c >= '0' && c <= '9') || | |
| c == '_'; | |
| }; | |
| size_t identEnd = backward; | |
| size_t identStart = identEnd; | |
| while (identStart > 0 && isIdentChar(code[identStart - 1])) | |
| { | |
| --identStart; | |
| } | |
| if (identStart == identEnd) | |
| { | |
| return false; | |
| } | |
| const string typeToken = code.substr(identStart, identEnd - identStart); | |
| // In WGSL, these bool-as-int uniforms are typically declared as u32 or i32. | |
| return (typeToken == "u32" || typeToken == "i32"); | |
| }; | |
| for (const auto& entry : boolUniforms) | |
| { | |
| const string& plainName = entry.first; | |
| const string& castName = entry.second; | |
| for (size_t pos = 0; (pos = code.find(plainName, pos)) != string::npos; ) | |
| { | |
| bool isDecl = isDeclarationOccurrence(plainName, pos); |
| for (size_t pos = 0; (pos = code.find(plainName, pos)) != string::npos; ) | ||
| { | ||
| bool isDecl = (pos > 0 && code[pos - 1] == ' ' && | ||
| pos + plainName.size() < code.size() && | ||
| code[pos + plainName.size()] == ';'); | ||
| if (isDecl) | ||
| { | ||
| pos += plainName.size(); | ||
| } | ||
| else | ||
| { | ||
| code.replace(pos, plainName.size(), castName); | ||
| pos += castName.size(); | ||
| } | ||
| } |
There was a problem hiding this comment.
The string replacement in replaceTokens could result in incorrect behavior if the uniform name appears as a substring of another identifier. For example, if HW::REFRACTION_TWO_SIDED is "u_refractionTwoSided" and the code contains "my_u_refractionTwoSided", the replacement would incorrectly wrap part of "my_u_refractionTwoSided" with bool(). Consider using word boundary checks or ensuring the match is a complete token rather than a substring.
| string variable; | ||
| // Use generic names for material and surfaceshader outputs to ensure consistency | ||
| // The identifier map will handle uniqueness if there are multiple nodes of the same type | ||
| if (output->getType() == Type::MATERIAL) | ||
| { | ||
| variable = "material_out"; | ||
| } | ||
| else if (output->getType() == Type::SURFACESHADER) | ||
| { | ||
| variable = "surfaceshader_out"; | ||
| } | ||
| else | ||
| { | ||
| variable = output->getFullName(); | ||
| } |
There was a problem hiding this comment.
Using generic names like "material_out" and "surfaceshader_out" for all material and surfaceshader outputs could lead to confusing variable names if there are multiple such nodes in a graph. While the comment states "The identifier map will handle uniqueness," this means the actual variable names will become "material_out1", "material_out2", etc., which may not be intuitive for debugging. Consider whether this tradeoff between normalization and debuggability is acceptable for the use case.
| string variable; | |
| // Use generic names for material and surfaceshader outputs to ensure consistency | |
| // The identifier map will handle uniqueness if there are multiple nodes of the same type | |
| if (output->getType() == Type::MATERIAL) | |
| { | |
| variable = "material_out"; | |
| } | |
| else if (output->getType() == Type::SURFACESHADER) | |
| { | |
| variable = "surfaceshader_out"; | |
| } | |
| else | |
| { | |
| variable = output->getFullName(); | |
| } | |
| string variable = output->getFullName(); |
| // Let the base handle all interface and code substitution with a safe map | ||
| // (T_REFRACTION_TWO_SIDED -> u_refractionTwoSided, no bool wrap). | ||
| StringMap safeSubstitutions = substitutions; | ||
| safeSubstitutions[HW::T_REFRACTION_TWO_SIDED] = HW::REFRACTION_TWO_SIDED; | ||
| ShaderGenerator::replaceTokens(safeSubstitutions, stage); | ||
|
|
||
| // Now fix up code only: wrap bool-as-int uniform names in bool() at use sites, | ||
| // but skip declaration lines (pattern: " name;"). | ||
| // Add more entries here if additional bool uniforms are introduced. |
There was a problem hiding this comment.
The comment states "wrap bool-as-int uniform names in bool() at use sites" but the code also attempts to skip declaration lines. However, the pattern detection for declarations is fragile. Consider adding a more detailed comment explaining the full algorithm: (1) exclude the token from base replaceTokens substitution, (2) manually replace the token in interfaces, (3) scan code for remaining occurrences and wrap with bool() cast except at declaration sites. This would help future maintainers understand the complete approach.
| // Let the base handle all interface and code substitution with a safe map | |
| // (T_REFRACTION_TWO_SIDED -> u_refractionTwoSided, no bool wrap). | |
| StringMap safeSubstitutions = substitutions; | |
| safeSubstitutions[HW::T_REFRACTION_TWO_SIDED] = HW::REFRACTION_TWO_SIDED; | |
| ShaderGenerator::replaceTokens(safeSubstitutions, stage); | |
| // Now fix up code only: wrap bool-as-int uniform names in bool() at use sites, | |
| // but skip declaration lines (pattern: " name;"). | |
| // Add more entries here if additional bool uniforms are introduced. | |
| // Three-step algorithm for "bool-as-int" uniforms: | |
| // (1) Start from the incoming substitution map and build a "safe" map that | |
| // excludes bool-wrap behavior for specific tokens. In particular, map | |
| // T_REFRACTION_TWO_SIDED directly to the uniform name | |
| // (T_REFRACTION_TWO_SIDED -> u_refractionTwoSided) so the base | |
| // implementation will substitute it as-is, with no bool() wrapping. | |
| // (2) Call the base ShaderGenerator::replaceTokens with this safe map so | |
| // that all interface blocks and general code see the plain uniform | |
| // name (e.g. u_refractionTwoSided) substituted in the right places. | |
| StringMap safeSubstitutions = substitutions; | |
| safeSubstitutions[HW::T_REFRACTION_TWO_SIDED] = HW::REFRACTION_TWO_SIDED; | |
| ShaderGenerator::replaceTokens(safeSubstitutions, stage); | |
| // (3) Post-process the generated source code only: scan for plain | |
| // bool-uniform names and wrap them in bool() at use sites, while | |
| // attempting to skip their declaration lines. Declarations are | |
| // heuristically detected using the pattern " name;" (single space | |
| // before the name and a semicolon immediately after). This pattern is | |
| // intentionally simple and should be kept in mind if the surrounding | |
| // code format changes. | |
| // Add more entries here if additional bool uniforms are introduced. |
|
|
||
| // Set uniform type to integer | ||
| // WGSL does not support boolean uniforms; emit as integer. | ||
| // Cast to bool is emitted at use site (emitInput / replaceTokens). |
There was a problem hiding this comment.
The comment says "Cast to bool is emitted at use site (emitInput / replaceTokens)", but this only covers BOOLEAN type inputs passed through emitInput and uniforms handled by replaceTokens. If there are other code paths that use boolean uniforms directly (e.g., in inline code or string literals), they may not get the proper bool() cast. Consider adding a comment noting this limitation or ensuring comprehensive coverage of all use cases.
| // Cast to bool is emitted at use site (emitInput / replaceTokens). | |
| // Cast to bool is emitted at use site (emitInput / replaceTokens) for | |
| // code paths that use these APIs. Other usages (e.g., inline code or | |
| // string-literal snippets that reference this uniform directly) must | |
| // ensure an explicit bool() cast if needed. |
| // Calculate in-degrees for all nodes, and enqueue those with degree 0. | ||
| std::unordered_map<ShaderNode*, int> inDegree(_nodeMap.size()); | ||
| std::deque<ShaderNode*> nodeQueue; | ||
| std::set<ShaderNode*, decltype(compareNodes)> nodeQueue(compareNodes); |
There was a problem hiding this comment.
The change from std::deque to std::set changes the time complexity of the topological sort from O(n + e) to O((n + e) * log n) where n is the number of nodes and e is the number of edges. For typical shader graphs with fewer than 100 nodes, this is acceptable. However, if there are use cases with very large shader graphs (hundreds or thousands of nodes), this could become a performance bottleneck. Consider documenting this tradeoff or monitoring performance on large graphs.
Ensure materials using the same shading model generate structurally identical shader code, differing only in uniform values.
Attaching shader diffs of standard_surface_wood_tiled_ps.glsl vs standard_surface_brass_tiled_ps.glsl (before and after)
shaderdiff.txt
shaderdiff_after.txt
Attaching shader diffs of standard_surface_defauilt_ps.glsl vs standard_surface_plastic_ps.glsl (before and after)
shaderdiff_b4.txt
shaderdiff_after.txt