BUG: string replace results in invalid regular expression: invalid perl operator: (?<= #63385

@jorisvandenbossche

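The pyjanitor clean_names method performs a number of regex .str.replace(...) calls, including one with the regex below. It uses a lookbehind, (?<=...), which the pyarrow-backed string dtype rejects, while the Python-backed string dtype handles it fine: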

>>> ser = pd.Series(['a@b', 'c@'], dtype='str')
>>> ser.str.replace('(?<=\\w)@(?=\\w)', '_', regex=True)
...
ArrowInvalid: Invalid regular expression: invalid perl operator: (?<=
>>> ser = ser.astype(pd.StringDtype(storage="python"))
>>> ser.str.replace('(?<=\\w)@(?=\\w)', '_', regex=True)
0    a_b
1     c@
dtype: string
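
One possible workaround, sketched here with the standard library `re` module only, is to rewrite the pattern without lookarounds by capturing the neighbouring characters and putting them back in the replacement. The helper names below are hypothetical. The sketch also assumes that an engine without lookaround support still accepts capture groups and `\1`-style backreferences in the replacement, which has not been checked against pyarrow here. The two forms are not fully equivalent: the capture-group version consumes the character after the `@`, so adjacent matches such as `a@b@c` are handled differently.

```python
import re

# Original pattern from the report: lookbehind and lookahead around "@".
LOOKAROUND = re.compile(r"(?<=\w)@(?=\w)")

# Lookaround-free variant (an assumption, not the pyjanitor implementation):
# capture the word characters on both sides and re-emit them.
CAPTURE = re.compile(r"(\w)@(\w)")


def replace_lookaround(value: str) -> str:
    """Replace '@' between word characters using lookarounds."""
    return LOOKAROUND.sub("_", value)


def replace_capture(value: str) -> str:
    """Replace '@' between word characters using capture groups."""
    return CAPTURE.sub(r"\1_\2", value)


if __name__ == "__main__":
    # "a@b@c" is the case where the two variants diverge, because the
    # capture-group match consumes "b" before the second "@" is examined.
    for value in ["a@b", "c@", "a@b@c"]:
        print(value, replace_lookaround(value), replace_capture(value))
```

For the two values in the report, both variants give the same result; whether the difference on overlapping matches matters depends on the column names being cleaned.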

Labels: Arrow (pyarrow functionality), Strings (String extension data type and string data)