
Fix exponential expansion based DoS in merge key processing (duplicate alias references) #916

Open
akshat-sj wants to merge 1 commit into yaml:main from akshat-sj:merge-key-dos


Fixes #897

Summary

flatten_mapping() exhibits exponential time and memory growth when a merge sequence contains duplicate references to the same alias:

<<: [*A, *A]

Because YAML aliases resolve to shared MappingNode instances, both entries in the sequence refer to the same object. During merge processing, the mapping’s .value list is extended and then mutated in place. When the same node appears twice in the same merge sequence, its pairs are copied twice within a single call.

With nested constructions, the pair count roughly doubles at each level, reaching 2^(n+1) - 1 pairs at depth n.
A document of 847 bytes at depth 22 produces 8,388,607 pairs and consumes ~12 seconds and ~288MB on CPython 3.11.
All Python loaders that construct mappings are affected (SafeLoader, FullLoader, Loader, UnsafeLoader). BaseLoader is unaffected.
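The closed form above can be checked with a few lines of arithmetic. This is a minimal sketch that assumes each nesting level contributes one key of its own and merges the level below twice; that document shape and the helper name `pairs_at_depth` are illustrative assumptions, chosen because they reproduce the 2^(n+1) - 1 figure.

```python
def pairs_at_depth(depth: int) -> int:
    """Pair count after flattening, under the assumed document shape:
    every level has one own key and merges the level below twice."""
    count = 1  # depth 0: a plain mapping holding a single pair
    for _ in range(depth):
        # Two copies of the level below, plus this level's own key.
        count = 2 * count + 1
    return count


# Compare the recurrence with the closed form 2^(n+1) - 1.
for depth in (1, 2, 3, 22):
    print(depth, pairs_at_depth(depth), 2 ** (depth + 1) - 1)
```

At depth 22 both columns give 8,388,607, the figure quoted above.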

Root Cause

Three behaviors interact:

  1. The composer resolves aliases to shared node objects. Duplicate references inside a merge sequence therefore refer to the same MappingNode instance.

  2. flatten_mapping() iterates over the merge sequence and, for each element, performs:

     merge.extend(subnode.value)

  3. The parent mapping is rebuilt via:

     node.value = merge + node.value

If the same node appears twice in a merge sequence, its (already expanded) .value list is appended twice within the same call. Nested merges therefore cause exponential growth.
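The interaction can be illustrated for a single level without PyYAML. In this sketch plain lists stand in for the `.value` lists of MappingNode objects, and `merge_sequence_once` is a hypothetical helper that mirrors only the two statements quoted above, not the library's full function.

```python
def merge_sequence_once(sequence_values, own_pairs):
    """Mirror the extend-then-rebuild steps for one merge sequence."""
    merge = []
    for value in sequence_values:
        merge.extend(value)      # merge.extend(subnode.value)
    return merge + own_pairs     # node.value = merge + node.value


# One anchored mapping, referenced twice as in `<<: [*A, *A]`.
anchor_pairs = [("a", 1)]
duplicated = [anchor_pairs, anchor_pairs]

# Both sequence entries are the very same object, as with resolved aliases.
assert duplicated[0] is duplicated[1]

rebuilt = merge_sequence_once(duplicated, [("own", 2)])
print(len(rebuilt), rebuilt)
```

The one pair of the anchored mapping shows up twice in the rebuilt list. Because the rebuilt list becomes the input of the next level up, the duplication compounds with depth.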

Fix

Skip duplicate alias references within a single merge sequence by tracking node identity:

elif isinstance(value_node, SequenceNode):
    submerge = []
+   seen = set()
    for subnode in value_node.value:
        if not isinstance(subnode, MappingNode):
            raise ConstructorError(...)
+       if id(subnode) in seen:
+           continue
+       seen.add(id(subnode))
        self.flatten_mapping(subnode)
        submerge.append(subnode.value)

Alias resolution guarantees that repeated references to the same anchor resolve to the same node instance, so identity comparison is sufficient. The seen set is local to each merge sequence. No global state is introduced.
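To see the effect of the seen set in isolation, the following is a simplified, self-contained model of the flattening logic. It is a sketch, not the library code: the node classes are bare stand-ins, string keys replace key nodes, `"<<"` replaces the merge tag, error handling is omitted, and the `dedupe` flag and `build_chain` helper are hypothetical names added for illustration.

```python
MERGE = "<<"  # stand-in for a key node carrying the merge tag


class MappingNode:
    def __init__(self, value):
        self.value = value  # list of (key, value_node) pairs


class SequenceNode:
    def __init__(self, value):
        self.value = value  # list of nodes


def flatten_mapping(node, dedupe=False):
    """Simplified model of merge flattening; dedupe=True applies the fix."""
    merge = []
    index = 0
    while index < len(node.value):
        key, value_node = node.value[index]
        if key != MERGE:
            index += 1
            continue
        del node.value[index]
        if isinstance(value_node, MappingNode):
            flatten_mapping(value_node, dedupe)
            merge.extend(value_node.value)
        elif isinstance(value_node, SequenceNode):
            submerge = []
            seen = set()  # local to this merge sequence
            for subnode in value_node.value:
                if dedupe:
                    if id(subnode) in seen:
                        continue  # same node object already merged
                    seen.add(id(subnode))
                flatten_mapping(subnode, dedupe)
                submerge.append(subnode.value)
            submerge.reverse()
            for value in submerge:
                merge.extend(value)
    if merge:
        node.value = merge + node.value


def build_chain(depth):
    """Assumed shape: each level merges the level below twice, plus one key."""
    node = MappingNode([("k0", None)])
    for level in range(1, depth + 1):
        shared = SequenceNode([node, node])  # two references, one object
        node = MappingNode([(MERGE, shared), (f"k{level}", None)])
    return node


def pair_count(depth, dedupe):
    root = build_chain(depth)
    flatten_mapping(root, dedupe)
    return len(root.value)


for depth in (1, 5, 10):
    print(depth, pair_count(depth, dedupe=False), pair_count(depth, dedupe=True))
```

In this model the unpatched path follows 2^(n+1) - 1, while the deduplicated path grows by one pair per level.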

Semantics

Merging the same mapping twice in a single merge sequence produces the same constructed mapping as merging it once. Skipping duplicate references therefore preserves observable behavior.
YAML merge specification tests produce identical output before and after this change.
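The equivalence can be illustrated with plain dictionaries. This sketch assumes the constructed mapping is filled by assigning each (key, value) pair in order, so a repeated pair only overwrites a key with the value it already has; `construct` is a hypothetical stand-in for that step, and the sample keys are made up.

```python
def construct(pairs):
    """Build a dict by assigning pairs in order; later pairs win."""
    mapping = {}
    for key, value in pairs:
        mapping[key] = value
    return mapping


anchor = [("host", "db"), ("port", 5432)]  # pairs of the merged mapping
own = [("port", 6543)]                     # the parent's own pairs come last

once = construct(anchor + own)             # duplicate reference skipped
twice = construct(anchor + anchor + own)   # duplicate reference merged again

print(once == twice, list(once) == list(twice))
```

Both the contents and the key order match, since re-assigning an existing key does not move it within a dict.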

Performance Impact

A nested document of 847 bytes at depth 22 produces:

  • Before: 12.2s execution time, +288MB memory
  • After: 0.002s execution time, negligible memory growth

The growth prior to this change is exponential (2^(n+1) - 1 pairs), meaning small inputs can trigger disproportionately large CPU and memory consumption.
When parsing untrusted YAML, this behavior can exhaust system resources.
After this change, processing time and memory usage scale linearly with input size.
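For readers who want to take comparable measurements, here is a minimal stdlib-only harness. It is a sketch under assumptions: the PR does not state how its figures were collected, `measure` is a hypothetical helper, and tracemalloc reports peak Python-level allocations, which need not match process-level memory growth.

```python
import time
import tracemalloc


def measure(fn):
    """Run fn() once; return (elapsed seconds, peak traced bytes)."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        fn()
    finally:
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
    return elapsed, peak


# Example with a stand-in workload. With PyYAML installed, the callable
# could instead be: lambda: yaml.safe_load(document)
elapsed, peak = measure(lambda: list(range(100_000)))
print(f"{elapsed:.4f}s, peak {peak / 1e6:.1f} MB")
```

Tracing allocations slows execution, so timings taken this way are best compared against each other (before and after the change) rather than against the figures above.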

Tests

All existing tests pass (1,283 total).
No public API changes.


Development

Successfully merging this pull request may close these issues.

[Bug]: YAML Merge Keys Drive Exponential Expansion DoS
