How do we get gradients w.r.t. the reward value prediction error?

Here we predict the reward value by multiplying the **immediate state representations with the reward prediction vector W**. When we use the mean squared error between the predicted and observed rewards as the loss, we get gradients w.r.t. the variables of the reward prediction vector, right?
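
To make the question concrete, here is a minimal sketch of that loss and its gradient w.r.t. the reward vector, assuming a purely linear prediction `r_hat_i = phi_i . w`. The names `phi`, `r`, `w` and all numbers are made up for illustration and are not taken from the code being asked about. In this sketch the state representations are plain constants, so only `w` receives a gradient. In an autodiff framework the same loss would generally also send gradients back into whatever produced the representations, unless they are detached.

```python
# Sketch: gradient of the reward-prediction MSE w.r.t. the weight vector w.
# phi, r and w are hypothetical illustrative values, not from the original code.

def predict(phi, w):
    """Predicted reward per state: dot(phi_i, w)."""
    return [sum(x * wj for x, wj in zip(row, w)) for row in phi]

def mse(phi, w, r):
    """Mean squared error between predicted and observed rewards."""
    preds = predict(phi, w)
    return sum((p - t) ** 2 for p, t in zip(preds, r)) / len(r)

def grad_w(phi, w, r):
    """Analytic gradient: (2/N) * sum_i (phi_i . w - r_i) * phi_i."""
    preds = predict(phi, w)
    n = len(r)
    grad = [0.0] * len(w)
    for row, p, t in zip(phi, preds, r):
        for j, x in enumerate(row):
            grad[j] += 2.0 * (p - t) * x / n
    return grad

def numeric_grad_w(phi, w, r, eps=1e-6):
    """Central finite differences, used only to check the analytic gradient."""
    grads = []
    for j in range(len(w)):
        up, down = list(w), list(w)
        up[j] += eps
        down[j] -= eps
        grads.append((mse(phi, up, r) - mse(phi, down, r)) / (2 * eps))
    return grads

phi = [[1.0, 0.0], [0.5, 2.0], [-1.0, 1.0]]  # one row per state
r = [1.0, 0.0, 2.0]                          # observed immediate rewards
w = [0.1, -0.2]                              # reward prediction vector

analytic = grad_w(phi, w, r)
numeric = numeric_grad_w(phi, w, r)

# One plain gradient-descent step on w (learning rate chosen arbitrarily).
w_new = [wj - 0.1 * g for wj, g in zip(w, analytic)]
loss_before, loss_after = mse(phi, w, r), mse(phi, w_new, r)
```

Here `analytic` and `numeric` should agree up to rounding, and `loss_after` should be smaller than `loss_before`, which is what updating only the reward vector from this loss looks like.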