CUB is reportedly faster than Thrust (http://nvlabs.github.io/cub/). We could use it for the most time-consuming computations in step_sync. Before switching, check the performance of the newest version of Thrust first!
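A minimal sketch of how the comparison could be timed, assuming a plain wall-clock measurement is good enough. `best_time_ms` and `stand_in_sum` are hypothetical names introduced here for illustration and are not part of Thrust, CUB or this code base. The stand-in runs on the CPU; in the real check it would be swapped for the Thrust call from step_sync and for its CUB counterpart.

```cpp
#include <chrono>
#include <limits>
#include <numeric>
#include <vector>

// Hypothetical helper: runs `fn` `reps` times and returns the best
// wall-clock time in milliseconds.
// For GPU code, `fn` has to block until the device has finished
// (e.g. by ending with cudaDeviceSynchronize()); otherwise only the
// kernel launch overhead is measured.
template <class Fn>
double best_time_ms(Fn &&fn, int reps = 10)
{
  double best = std::numeric_limits<double>::max();
  for (int i = 0; i < reps; ++i)
  {
    const auto t0 = std::chrono::steady_clock::now();
    fn();
    const auto t1 = std::chrono::steady_clock::now();
    const double ms =
      std::chrono::duration<double, std::milli>(t1 - t0).count();
    if (ms < best) best = ms;
  }
  return best;
}

// CPU stand-in for the operation under test (assumption: a reduction).
// Replace with the Thrust version and the CUB version to compare them.
inline double stand_in_sum(const std::vector<double> &v)
{
  return std::accumulate(v.begin(), v.end(), 0.0);
}
```

Usage would be along the lines of `best_time_ms([&]{ result = stand_in_sum(v); })`, called once per candidate implementation on the same input. Taking the best of several repetitions keeps one-off costs such as allocation or first-launch initialisation from dominating the result.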