Fix DataFrame mutation bug by copying data on initialization #206
Fixes #205
Summary
This PR fixes the mutation bug described in #205, where the input DataFrame is modified in memory when include_null=True.
Implemented Fix
We implemented a conditional copy to balance safety and performance. Instead of copying the DataFrame on every instantiation, we copy it only when mutation would actually occur.
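The PR text does not include the patched code, so the following is only a minimal sketch of the conditional-copy pattern, not the project's implementation. The class name CategoryEncoder, the NULL_TOKEN placeholder, and the choice to fill missing values in categorical columns are all hypothetical; the point is that only the include_null=True branch writes to columns, so only that branch pays for a copy.

```python
import pandas as pd

NULL_TOKEN = "__NULL__"  # hypothetical placeholder category for missing values


class CategoryEncoder:
    """Hypothetical stand-in for the class fixed in this PR."""

    def __init__(self, df: pd.DataFrame, include_null: bool = False):
        self.include_null = include_null
        if include_null:
            # This branch assigns columns, so work on a private copy
            # instead of the caller's DataFrame.
            df = df.copy()
            for col in df.select_dtypes(include="category").columns:
                df[col] = df[col].cat.add_categories([NULL_TOKEN]).fillna(NULL_TOKEN)
        # Without include_null nothing is written, so keeping a reference is safe
        # and avoids copying large DataFrames.
        self.df = df
```

With include_null=True the caller's frame is left untouched while the encoder holds the filled copy; with the default, the encoder simply references the input.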
Changes
Tests: Added 5 regression tests covering numeric and string categorical types, plus edge cases.
Code: Copy the DataFrame only on the code path that mutates it, minimizing the performance impact on large DataFrames.
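The regression tests themselves are not shown here. One plausible way to structure such a test, sketched under assumptions, is to snapshot the input, run the code under test, and compare with pandas.testing.assert_frame_equal. The helper assert_input_not_mutated, the make_cases factory, and the two simulated builders below are hypothetical illustrations; mutating_build reproduces the kind of in-place write reported in #205.

```python
import pandas as pd


def assert_input_not_mutated(build, df: pd.DataFrame) -> None:
    """Call build(df), then fail if df differs from a deep snapshot taken beforehand."""
    snapshot = df.copy(deep=True)
    build(df)
    pd.testing.assert_frame_equal(df, snapshot)  # raises AssertionError on any change


def mutating_build(df: pd.DataFrame) -> None:
    # Simulates the bug: overwrites categorical columns in the caller's frame.
    for col in df.select_dtypes(include="category").columns:
        df[col] = df[col].astype(object).fillna("__NULL__")


def copying_build(df: pd.DataFrame) -> None:
    # Same transformation, applied to a private copy.
    mutating_build(df.copy())


def make_cases() -> dict:
    # Fresh frames per call: one numeric and one string categorical column.
    return {
        "numeric": pd.DataFrame({"x": pd.Categorical([1, 2, None])}),
        "string": pd.DataFrame({"x": pd.Categorical(["a", None, "b"])}),
    }
```

Running copying_build through the helper should pass for both cases, while mutating_build should raise AssertionError, which is the behavior a regression test for #205 needs to distinguish.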
Verification
All 53 tests pass, including the new regression tests, which failed before this fix.