Description
In the shuffle_experiment function, I believe the indexing into the experiment_data array used to calculate the mean is off (see the two .mean() calls below). The boolean condition correctly selects the rows labeled 0 or 1, but calling .mean() on those rows averages over both columns, so the label values end up in the result. After selecting the rows, we need to take only the second column so that the label is excluded from the mean calculation.
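To make the effect concrete, here is a small sketch on made-up data (the four rows below are hypothetical and not taken from shoe_sales). It compares the mean over the selected rows with the mean over their value column only:

```python
import numpy as np

# Hypothetical toy data: column 0 is the label, column 1 is the value.
experiment_data = np.array([[1, 10.0],
                            [0, 20.0],
                            [1, 30.0],
                            [0, 40.0]])

# Boolean indexing keeps whole rows, so both columns are still present.
rows_labeled_1 = experiment_data[experiment_data[:, 0] == 1]

# .mean() with no axis averages every element, labels included:
# (1 + 10 + 1 + 30) / 4 = 10.5
mean_with_labels = rows_labeled_1.mean()

# Selecting column 1 first averages only the values: (10 + 30) / 2 = 20.0
mean_values_only = rows_labeled_1[:, 1].mean()

print(mean_with_labels, mean_values_only)
```

With exactly two columns, the unfixed difference works out to half of the true difference plus 0.5, because the rows labeled 1 each contribute a 1 to the average and the rows labeled 0 each contribute a 0.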
Current code:
    def shuffle_experiment(number_of_times):
        experiment_diff_mean = np.empty([number_of_times,1])
        for times in np.arange(number_of_times):
            experiment_label = np.random.randint(0,2,shoe_sales.shape[0])
            experiment_data = np.array([experiment_label, shoe_sales[:,1]]).T
            experiment_diff_mean[times] = (experiment_data[experiment_data[:,0]==1].mean()
                                           - experiment_data[experiment_data[:,0]==0].mean())
        return experiment_diff_mean
Proposed code:
    def shuffle_experiment(number_of_times):
        experiment_diff_mean = np.empty([number_of_times,1])
        for times in np.arange(number_of_times):
            experiment_label = np.random.randint(0,2,shoe_sales.shape[0])
            experiment_data = np.array([experiment_label, shoe_sales[:,1]]).T
            experiment_diff_mean[times] = (experiment_data[experiment_data[:,0]==1][:,1].mean()
                                           - experiment_data[experiment_data[:,0]==0][:,1].mean())
        return experiment_diff_mean
The same issue exists in this block (shown here with the column index already added):
    experiment_diff_mean = (experiment_data[experiment_data[:,0]==1][:,1].mean()
                            - experiment_data[experiment_data[:,0]==0][:,1].mean())
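As a possible alternative, the label never needs to be stacked next to the values at all: the random labels can be used directly as a boolean mask on the value column, which removes the chance of averaging the label by accident. The sketch below is only an illustration, not the notebook's code. The function name shuffle_experiment_sketch, the values and seed parameters, and the use of np.random.default_rng are all assumptions introduced here, and it returns a 1-D array rather than the (number_of_times, 1) array of the original.

```python
import numpy as np

def shuffle_experiment_sketch(values, number_of_times, seed=None):
    """Hypothetical variant: difference in means under random 0/1 labels.

    values is assumed to be a 1-D array, e.g. shoe_sales[:, 1].
    """
    rng = np.random.default_rng(seed)
    diffs = np.empty(number_of_times)
    for i in range(number_of_times):
        # Draw a random 0/1 label for every observation.
        labels = rng.integers(0, 2, size=values.shape[0])
        # Mask the values directly, so labels never enter the mean.
        diffs[i] = values[labels == 1].mean() - values[labels == 0].mean()
    return diffs

# Usage on made-up numbers (not the shoe_sales data).
toy_values = np.array([10.0, 20.0, 30.0, 40.0, 50.0, 60.0])
toy_diffs = shuffle_experiment_sketch(toy_values, 5, seed=0)
print(toy_diffs.shape)
```

If a draw happens to assign every observation to the same label, one of the groups is empty and its mean is NaN; that is unlikely for a dataset of realistic size but worth keeping in mind for very small inputs.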