Fix exponential expansion based DoS in merge key processing (duplicate alias references)#916
Open
Fix exponential expansion based DoS in merge key processing (duplicate alias references)#916
Conversation
1475feb to
b46710a
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes #897
Summary
flatten_mapping() exhibits exponential time and memory growth when a merge sequence contains duplicate references to the same alias:
Because YAML aliases resolve to shared MappingNode instances, both entries in the sequence refer to the same object. During merge processing, the mapping’s .value list is extended and then mutated in place. When the same node appears twice in the same merge sequence, its pairs are copied twice within a single call.
With nested constructions, the pair count doubles at each level:
2^(n+1) - 1.A document of 847 bytes at depth 22 produces 8,388,607 pairs and consumes ~12 seconds and ~288MB on CPython 3.11.
All Python loaders that construct mappings are affected (
SafeLoader,FullLoader,Loader,UnsafeLoader). BaseLoader is unaffected.Root Cause
Three behaviors interact:
The composer resolves aliases to shared node objects. Duplicate references inside a merge sequence therefore refer to the same MappingNode instance.
flatten_mapping() iterates the merge sequence and performs the below for each operation:
If the same node appears twice in a merge sequence, its (already expanded)
.valuelist is appended twice within the same call. Nested merges therefore cause exponential growth.Fix
Skip duplicate alias references within a single merge sequence by tracking node identity:
elif isinstance(value_node, SequenceNode): submerge = [] + seen = set() for subnode in value_node.value: if not isinstance(subnode, MappingNode): raise ConstructorError(...) + if id(subnode) in seen: + continue + seen.add(id(subnode)) self.flatten_mapping(subnode) submerge.append(subnode.value)Alias resolution guarantees that repeated references to the same anchor resolve to the same node instance, so identity comparison is sufficient. The seen set is local to each merge sequence. No global state is introduced.
Semantics
Merging the same mapping twice in a single merge sequence produces the same constructed mapping as merging it once. Skipping duplicate references therefore preserves observable behavior.
YAML merge specification tests produce identical output before and after this change.
Performance Impact
A nested document of 847 bytes at depth 22 produces:
The growth prior to this change is exponential (
2^(n+1) -1pairs), meaning small inputs can trigger disproportionately large CPU and memory consumption.When parsing untrusted YAML, this behavior can exhaust system resources.
After this change, processing time and memory usage scale linearly with input size.
Tests
All existing tests pass (1,283 total).
No public API changes.