@mystic74
Fixes #205

Summary

This PR fixes the mutation bug described in #205, where the caller's input DataFrame is modified in place when include_null=True.

Implemented Fix

We implemented a conditional copy to balance safety and performance. Instead of copying the DataFrame on every instantiation, we copy only when a mutation would actually occur, that is, when categorical columns are specified and include_null is enabled:

if self._categorical and self._include_null:
    # Create a copy to avoid mutating the user's original DataFrame
    data = data.copy()
    data[self._categorical] = handle_categorical_nulls(data[self._categorical], self._categorical)
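
For readers who want to try the pattern outside the library, here is a minimal, self-contained sketch of the conditional copy. The `Preprocessor` class, the `NULL_LABEL` placeholder, and the body of `handle_categorical_nulls` are hypothetical stand-ins written for illustration; the real class and helper are not shown in this PR and may behave differently.

```python
import pandas as pd

NULL_LABEL = "__null__"  # hypothetical placeholder for missing categories


def handle_categorical_nulls(frame, columns):
    """Hypothetical stand-in: replace missing values with NULL_LABEL."""
    out = frame.copy()
    for col in columns:
        as_object = out[col].astype(object)
        # Keep non-null values, substitute the placeholder elsewhere.
        out[col] = as_object.where(as_object.notna(), NULL_LABEL)
    return out


class Preprocessor:
    """Hypothetical class that stores a DataFrame; not the library's API."""

    def __init__(self, data, categorical=None, include_null=False):
        self._categorical = list(categorical or [])
        self._include_null = include_null

        if self._categorical and self._include_null:
            # Copy only on the path that writes to the frame, so the
            # caller's DataFrame is never modified.
            data = data.copy()
            data[self._categorical] = handle_categorical_nulls(
                data[self._categorical], self._categorical
            )

        # On the read-only path the original object is stored as-is.
        self.data = data


df = pd.DataFrame({"color": ["red", None, "blue"], "x": [1.0, 2.0, 3.0]})
with_nulls = Preprocessor(df, categorical=["color"], include_null=True)
without_nulls = Preprocessor(df, categorical=["color"], include_null=False)
```

In this sketch, `with_nulls.data` is a modified copy while `df` still contains its missing value, and `without_nulls.data` is the same object as `df`, so no copy cost is paid when nothing is written.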

Changes

Tests: Added 5 regression tests covering numeric and string categorical types, plus edge cases.
Code: Added a conditional copy that runs only on the mutating code path, minimizing the performance impact on large DataFrames.
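
The actual test code is not reproduced in this description. As a rough illustration of how a non-mutation regression check can be written, the sketch below snapshots the input, calls the code under test, and compares the input against the snapshot. The helper `assert_input_not_mutated` and the two `fill_*` functions are hypothetical names invented for this example.

```python
import pandas as pd
from pandas.testing import assert_frame_equal


def assert_input_not_mutated(func, frame):
    """Call func(frame) and raise AssertionError if frame changed in place."""
    snapshot = frame.copy(deep=True)
    func(frame)
    assert_frame_equal(frame, snapshot)


def fill_in_place(frame):
    # Hypothetical buggy behaviour: writes into the caller's DataFrame.
    frame["cat"] = frame["cat"].fillna(-1)
    return frame


def fill_on_copy(frame):
    # Hypothetical fixed behaviour: copies before writing.
    frame = frame.copy()
    frame["cat"] = frame["cat"].fillna(-1)
    return frame


def test_numeric_categorical_input_is_not_mutated():
    df = pd.DataFrame({"cat": [1.0, None, 2.0]})
    assert_input_not_mutated(fill_on_copy, df)
```

A check of this shape fails against `fill_in_place` and passes against `fill_on_copy`, which mirrors the "failed before, passes after" behaviour expected of a regression test.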

Verification

All 53 tests pass, including the new regression tests, which failed before this fix.

Development

Successfully merging this pull request may close these issues.

DataFrame Mutation Bug with Categorical NaN Values