The MaxPooling2D layers already provide a nonlinearity (max is a nonlinear op), much like ReLU, so they don't need an activation function.
Replacing them with strided Conv2D loses that nonlinearity, so a ReLU activation should be added; otherwise the layer is basically useless, since a linear conv can be merged into the next layer (two consecutive linear operations compose into a single linear one).
Also, the paper indicates that all Conv2D layers use ReLU activation.
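
A minimal sketch of what I mean (Keras-style; the filter counts and kernel sizes here are placeholders, not the actual model config):

```python
from tensorflow import keras
from tensorflow.keras import layers

# Original block: MaxPooling2D is itself nonlinear, so the
# downsampling step needed no extra activation.
pooling_block = keras.Sequential([
    layers.Conv2D(64, 3, padding="same", activation="relu"),
    layers.MaxPooling2D(pool_size=2),
])

# Strided-conv replacement: the conv is a linear op, so it needs
# its own ReLU; without it, it would just compose with the next
# linear layer.
strided_block = keras.Sequential([
    layers.Conv2D(64, 3, padding="same", activation="relu"),
    layers.Conv2D(64, 3, strides=2, padding="same",
                  activation="relu"),  # <-- the missing nonlinearity
])
```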
@marcj maybe this explains your missing performance in #4