
Handle pandas dependency properly #180

@lisad

I'm trying to pull together the right way to handle pandas as a dependency. Idiotically, I thought we were down to zero dependencies for real, but it only looked that way because my venv still had pandas installed.

Ideas

  1. Separate packages: have a 'phaser-pandas' package that phaser users can import if they want the pandas features. Move dataframe_step into that package. Are there any other steps we'd add that would nicely package pandas features?

  2. Cheese it: when checking whether a parameter is a DataFrame, look at the string name of its class instead of using isinstance (which would require importing pandas). In the one place we really need to construct a DataFrame, import pandas locally inside that function. I THINK I've made this work, but I don't think this is the long-term plan.

  3. Conditional tests: At the very least, we should have tests that don't fail if pandas isn't installed. It's hard to maintain the package with the try/except import and tests that only work when pandas is installed. But if we install pandas as devs, we're going to make the same kind of mistake I made originally and ship versions of phaser that unintentionally require pandas. One suggestion here is to mock pandas if it can't be imported. Or can we dynamically skip some tests? See https://stackoverflow.com/questions/6076770/ignore-importerror-when-exec-source-code (the response about returning a dummy module) and https://codereview.stackexchange.com/questions/222872/skipping-over-failed-imports-until-they-are-needed-if-ever.
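Option 2 might look roughly like the sketch below. The helper names (`is_dataframe`, `is_dataframe_if_loaded`, `rows_to_dataframe`) are hypothetical, not existing phaser functions. The first check compares class and module names as strings, walking the MRO so DataFrame subclasses also match. The second relies on the fact that an object can only be a DataFrame if pandas has already been imported somewhere, so it looks pandas up in `sys.modules` instead of importing it.

```python
import sys


def is_dataframe(obj) -> bool:
    """Return True if obj is a pandas DataFrame (or subclass) without importing pandas.

    Compares class and module names as strings; walking the MRO means
    subclasses of DataFrame are recognised too.
    """
    return any(
        cls.__name__ == "DataFrame" and cls.__module__.split(".")[0] == "pandas"
        for cls in type(obj).__mro__
    )


def is_dataframe_if_loaded(obj) -> bool:
    """Alternative check: a DataFrame can only exist if pandas is already in sys.modules."""
    pd = sys.modules.get("pandas")
    return pd is not None and isinstance(obj, pd.DataFrame)


def rows_to_dataframe(rows):
    """Build a DataFrame, importing pandas only when this function is actually called."""
    try:
        import pandas as pd  # local import: importing this module never needs pandas
    except ImportError as exc:
        raise ImportError(
            "This step needs pandas, which is an optional dependency: pip install pandas"
        ) from exc
    return pd.DataFrame(rows)
```

The `sys.modules` variant avoids hard-coding pandas' internal module path, while the string check keeps working even if someone passes in an object from a pandas that was loaded in an unusual way.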

About 1:
It might be possible to import pandas at the top of a module rather than deep within a function: separate dataframe_step (and any other pandas-based steps, if any are made) into its own file, and make those files a sub-library or something similar. The dataframe_step function would not appear in the main __init__.py, which defines what is automatically imported when somebody runs "import phaser". Users would add "import phaser.dfsteps" or "from phaser.pandas import dataframe_step" or something like that to bring dataframe_step into scope. It's the same as importing 'django': I don't load all of 'django.contrib' into context, and if I want the features in django.contrib, they're additional imports.
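The layout proposed in "About 1" can be checked with a throwaway package. This sketch writes a hypothetical `phaser_demo` package (standing in for phaser) to a temp directory: `__init__.py` holds the core API and never mentions the pandas submodule, while `dfsteps.py` imports pandas at its top level. Importing the core package then leaves the submodule, and therefore pandas, unloaded. The `dataframe_step` body here is a placeholder, not phaser's real implementation.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

# Hypothetical stand-in for phaser; file contents are placeholders.
FILES = {
    "phaser_demo/__init__.py": """
        # Core, pandas-free API lives here.
        # Note there is no `from .dfsteps import dataframe_step` in this file.
    """,
    "phaser_demo/dfsteps.py": """
        # Opt-in submodule: pandas is imported at module level, but only when
        # a user explicitly imports phaser_demo.dfsteps.
        import pandas as pd

        def dataframe_step(func):
            # Placeholder: hand the step a DataFrame built from a list of row dicts.
            def wrapper(rows):
                return func(pd.DataFrame(rows))
            return wrapper
    """,
}


def build_demo_package(root: Path) -> None:
    """Write FILES under root so the package can be imported from there."""
    for rel_path, source in FILES.items():
        path = root / rel_path
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(textwrap.dedent(source))


root = Path(tempfile.mkdtemp())
build_demo_package(root)
sys.path.insert(0, str(root))
importlib.invalidate_caches()  # make sure the import system sees the new files

core = importlib.import_module("phaser_demo")  # what "import phaser" would do
# The opt-in submodule was not loaded as a side effect of the core import.
print("dfsteps loaded:", "phaser_demo.dfsteps" in sys.modules)  # → dfsteps loaded: False
```

A user who wants the pandas steps would then write `from phaser_demo.dfsteps import dataframe_step`, which only succeeds when pandas is installed. Declaring pandas under `[project.optional-dependencies]` in pyproject.toml would let users install it alongside phaser as an extra (the extra's name would be our choice).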
