We originally wanted developers using phaser to be able to write steps and then write unit tests to confirm that those steps work as expected. Adding the "check_size" flag to the step decorator unfortunately made this a little harder. Testing a step now looks like this:

    @dataframe_step
    def sum_bonuses(df, context):
        df['total'] = df.sum(axis=1, numeric_only=True)
        return df

    def test_sum_bonuses():
        data = {'eid': ['001', '001'], 'commission': [1000, 1000], 'performance': [9000, 1000]}
        output = [{'eid': '001', 'commission': 1000, 'performance': 9000, 'total': 10000},
                  {'eid': '001', 'commission': 1000, 'performance': 1000, 'total': 2000}]
        bonus_df = pd.DataFrame(data)
        test_step_output, check_size_flag = sum_bonuses(bonus_df)
        assert test_step_output == output

To test the step, the developer has to unpack and discard the check_size flag that the decorated step returns alongside its output. We should fix that. In retrospect I think check_size should have been handled differently; we probably overlooked this impact on testing when we built it. The same issue affects row_step and batch_step.
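
One possible direction, sketched here under assumptions rather than taken from phaser's code: the decorator could store check_size as an attribute on the wrapped function instead of returning it, so a direct call returns only the step output. The names sketch_row_step and add_total are hypothetical and exist only for this illustration.

```python
import functools


def sketch_row_step(func=None, *, check_size=False):
    """Hypothetical decorator, NOT phaser's implementation."""
    def decorate(f):
        @functools.wraps(f)
        def wrapper(row, context=None):
            # Return only the step's result; the flag is not part of the return value.
            return f(row, context)
        # Expose the flag as metadata that a pipeline runner could read.
        wrapper.check_size = check_size
        return wrapper

    # Support both @sketch_row_step and @sketch_row_step(check_size=True).
    if func is not None:
        return decorate(func)
    return decorate


@sketch_row_step(check_size=True)
def add_total(row, context=None):
    # Hypothetical step used only for this illustration.
    return {**row, 'total': row['commission'] + row['performance']}


# A test calls the step directly, with no flag to unpack.
result = add_total({'commission': 1000, 'performance': 9000})
```

With this shape, whatever runs the steps would read add_total.check_size, while a unit test compares result against the expected row.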
Even better would be some way to test the output of sum_bonuses as a dataframe. That seems more challenging than fixing our approach to check_size, but there may be a way.
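
A minimal sketch of what a dataframe-level check could look like today, assuming the step hands back a list of row dicts: rebuild a DataFrame from the records and compare it with pandas.testing.assert_frame_equal. The function sum_bonuses_logic is a hypothetical undecorated copy of the step body, used here so the example runs without phaser.

```python
import pandas as pd
from pandas.testing import assert_frame_equal


def sum_bonuses_logic(df):
    # Hypothetical undecorated version of the step body.
    df = df.copy()
    df['total'] = df.sum(axis=1, numeric_only=True)
    return df


data = {'eid': ['001', '001'], 'commission': [1000, 1000], 'performance': [9000, 1000]}

# Stand-in for the records a decorated step might return (an assumption).
records = sum_bonuses_logic(pd.DataFrame(data)).to_dict('records')

expected = pd.DataFrame({
    'eid': ['001', '001'],
    'commission': [1000, 1000],
    'performance': [9000, 1000],
    'total': [10000, 2000],
})

# Rebuild a DataFrame from the records and compare frame to frame.
actual = pd.DataFrame(records)
assert_frame_equal(actual, expected)
```

On a mismatch, assert_frame_equal reports the differing column and values, which is usually easier to read than a failed comparison of two lists of dicts.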