Choose a new approach for deciding whether or not to check dataframe size in a step - without making step less testable #190

We originally wanted developers using phaser to be able to write steps and unit-test them to confirm they work as expected.  The addition of the "check_size" flag in the step decorator unfortunately made this a little harder.  Testing a step now looks like this:

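```
import pandas as pd
# Assumes dataframe_step is importable from the top-level phaser package
from phaser import dataframe_step

@dataframe_step
def sum_bonuses(df, context):
    # Add a 'total' column holding the sum of the numeric columns in each row
    df['total'] = df.sum(axis=1, numeric_only=True)
    return df

def test_sum_bonuses():
    data = {'eid': ['001', '001'], 'commission': [1000, 1000], 'performance': [9000, 1000]}
    output = [{'eid': '001', 'commission': 1000, 'performance': 9000, 'total': 10000},
              {'eid': '001', 'commission': 1000, 'performance': 1000, 'total': 2000}]
    bonus_df = pd.DataFrame(data)
    # The decorated step returns the check_size flag along with its output,
    # so the test has to unpack the flag even though it never uses it
    test_step_output, check_size_flag = sum_bonuses(bonus_df)
    assert test_step_output == output
```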

To test the step, the developer has to unpack and discard the check_size flag that the decorated step returns alongside its output.  We should fix that.  In retrospect, check_size should have been handled differently; we probably overlooked this impact on testing when we built it this way.  The same issue affects row_step and batch_step.
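
One possible direction, sketched below purely for discussion: keep the flag as an attribute on the wrapped function instead of returning it, so calling a step in a test yields only its output. Everything here is hypothetical and not phaser's current API: `sized_step` and `run_step` are made-up names, the step works on a list of row dicts rather than a dataframe to keep the sketch dependency-free, and the size check is assumed to compare row counts before and after the step.

```python
import functools


def sized_step(check_size=True):
    """Hypothetical decorator: stores check_size on the wrapper instead of returning it."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(rows, context=None):
            # Return only the step's output so tests can assert on it directly.
            return func(rows, context)
        # The flag travels as metadata for a pipeline runner to read.
        wrapper.check_size = check_size
        return wrapper
    return decorator


def run_step(step, rows, context=None):
    """Hypothetical runner: applies the size check only when the step asks for it."""
    output = step(rows, context)
    # Assumption: "checking size" means the row count must not change.
    if getattr(step, "check_size", False) and len(output) != len(rows):
        raise ValueError(
            f"{step.__name__} changed row count: {len(rows)} -> {len(output)}"
        )
    return output


@sized_step(check_size=True)
def sum_bonuses(rows, context=None):
    return [{**row, "total": row["commission"] + row["performance"]} for row in rows]


def test_sum_bonuses():
    rows = [{"eid": "001", "commission": 1000, "performance": 9000}]
    # No flag to unpack: the step returns just its output.
    assert sum_bonuses(rows) == [
        {"eid": "001", "commission": 1000, "performance": 9000, "total": 10000}
    ]
```

With this shape, only the code that runs a pipeline needs to know about check_size, and the same pattern could apply to row_step and batch_step.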

Even better would be if there were some magical way to test sum_bonuses output as a dataframe.  That seems even more challenging than fixing our approach to check_size, but maybe there's some way.
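
It may not need much magic. A minimal sketch, assuming the step's output reaches the test as a list of row dicts: rebuild a dataframe from the records and compare it with `pandas.testing.assert_frame_equal`. The helper name `assert_records_match_frame` is hypothetical, and `sum_bonuses_plain` is an undecorated stand-in for the step so the sketch runs without phaser.

```python
import pandas as pd
from pandas.testing import assert_frame_equal


def sum_bonuses_plain(df):
    # Undecorated stand-in for the step body, used only for this sketch.
    df = df.copy()
    df["total"] = df.sum(axis=1, numeric_only=True)
    return df


def assert_records_match_frame(records, expected_df):
    """Hypothetical helper: compare list-of-dict step output against a dataframe."""
    actual_df = pd.DataFrame(records)
    # Checks values, column names and order, index, and dtypes.
    assert_frame_equal(actual_df, expected_df)


def test_sum_bonuses_as_dataframe():
    bonus_df = pd.DataFrame(
        {"eid": ["001", "001"], "commission": [1000, 1000], "performance": [9000, 1000]}
    )
    expected = bonus_df.assign(total=[10000, 2000])
    # Assumption: the step hands back row dicts, emulated here with to_dict.
    records = sum_bonuses_plain(bonus_df).to_dict(orient="records")
    assert_records_match_frame(records, expected)
```

Because `assert_frame_equal` also compares dtypes and column order, a helper like this is stricter than comparing lists of dicts.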
