Fix TypeError when computing p-values with missing categorical values #188

tompollard · 2025-04-08T02:20:50Z

This PR fixes a TypeError that occurs when computing p-values for categorical variables containing missing values. The error arises because missing values are replaced with the string 'None', leading to mixed-type categories (e.g. int and str) which cannot be sorted during internal processing (ref #161 and #160).

Previously, the following code would raise TypeError: '<' not supported between instances of 'str' and 'int':

from tableone import TableOne
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'group': ['A', 'B', 'A', 'B', 'A'],
    'numeric_cat': [1, 2, np.nan, 2, 1]
})

t1 = TableOne(df, columns=['numeric_cat'], categorical=['numeric_cat'], groupby='group', pval=True)
print(t1.tableone)
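
The root cause can be illustrated without tableone. This is a minimal sketch, assuming the category levels end up as numbers mixed with the placeholder string 'None'; the `categories` list is illustrative and is not taken from the library's internals.

```python
# Illustrative category levels: numeric values plus the 'None' placeholder string.
categories = [1, 2, 'None']

try:
    sorted(categories)  # Python 3 cannot order a str against an int
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```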

The fixes are:

Modified handle_categorical_nulls() to convert entire columns to string before replacing nulls with 'None', avoiding mixed-type category issues.
Added a key=str argument when sorting categories to prevent sorting errors in edge cases.
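
The two changes above can be sketched roughly as follows. This is not the actual tableone code: `fill_categorical_nulls` is a hypothetical stand-in for handle_categorical_nulls(), and its signature and the `null_value` parameter are assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def fill_categorical_nulls(df, categorical, null_value='None'):
    """Hypothetical sketch: cast each column to str, then fill missing entries."""
    out = df.copy()  # work on a copy so the caller's frame is left untouched
    for col in categorical:
        missing = out[col].isna()
        # Cast first so every level is a string, then overwrite the entries
        # that were originally missing with the placeholder.
        out[col] = out[col].astype(str).mask(missing, null_value)
    return out


df = pd.DataFrame({
    'group': ['A', 'B', 'A', 'B', 'A'],
    'numeric_cat': [1, 2, np.nan, 2, 1],
})

filled = fill_categorical_nulls(df, ['numeric_cat'])
# All levels are now strings, so a plain sort works.
levels = sorted(filled['numeric_cat'].unique())

# Second safeguard: key=str compares string forms, so mixed types still sort.
mixed_sorted = sorted([2, 'None', 1], key=str)
```

Because the column holds NaN, pandas stores it as float, so in this sketch the numeric levels become strings such as '1.0' rather than '1'. The copy at the top of the function is a design choice of the sketch, not something stated in the PR.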

tompollard added 3 commits April 7, 2025 18:45

replicate data type error in tests. ref #160.

4e0b3c6

Add tests.

b120399

tompollard merged commit 6a641f0 into main Apr 8, 2025
3 checks passed

tompollard deleted the tp/issue_160 branch April 8, 2025 02:41

This was referenced Apr 8, 2025

TypeError: '<' not supported between instances of 'str' and 'int' when setting pval=True #160

Closed

Problem - The truth value of an array with more than one elemnet is ambigous. #161

Closed

mystic74 mentioned this pull request Jan 10, 2026

DataFrame Mutation Bug with Categorical NaN Values #205

Open
