An implementation of reinforcement learning for CartPole-v0 using policy optimization

Step plot of the result

Histogram of the results of 100 simulations (mean value 199)

Reference

CartPole-v0: https://gym.openai.com/envs/CartPole-v0/

Ilyas, Andrew, et al. "A closer look at deep policy gradients." arXiv preprint arXiv:1811.02553 (2018).
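
As an illustration of the policy optimization idea named in the title, here is a minimal sketch of a plain REINFORCE-style policy gradient on a cart-pole task. It is not the code of this repository, whose exact algorithm is not described here. To stay dependency-free it assumes a hand-written cart-pole simulation using the commonly published benchmark constants, a logistic policy over the four state variables, and a mean-return baseline. All function names (`step`, `prob_right`, `run_episode`, `discounted_returns`, `train`) and hyperparameters are hypothetical choices for this example.

```python
import math
import random

# Assumed constants: the commonly used cart-pole benchmark values, not taken from this repo.
GRAVITY, MASS_CART, MASS_POLE, HALF_LENGTH = 9.8, 1.0, 0.1, 0.5
FORCE_MAG, TAU, MAX_STEPS = 10.0, 0.02, 200
X_LIMIT, THETA_LIMIT = 2.4, 12 * math.pi / 180


def step(state, action):
    """Advance the hand-written cart-pole simulation by one Euler step."""
    x, x_dot, theta, theta_dot = state
    force = FORCE_MAG if action == 1 else -FORCE_MAG
    total_mass = MASS_CART + MASS_POLE
    pole_ml = MASS_POLE * HALF_LENGTH
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    temp = (force + pole_ml * theta_dot ** 2 * sin_t) / total_mass
    theta_acc = (GRAVITY * sin_t - cos_t * temp) / (
        HALF_LENGTH * (4.0 / 3.0 - MASS_POLE * cos_t ** 2 / total_mass))
    x_acc = temp - pole_ml * theta_acc * cos_t / total_mass
    state = (x + TAU * x_dot, x_dot + TAU * x_acc,
             theta + TAU * theta_dot, theta_dot + TAU * theta_acc)
    # The episode ends when the cart leaves the track or the pole tilts too far.
    done = abs(state[0]) > X_LIMIT or abs(state[2]) > THETA_LIMIT
    return state, done


def prob_right(weights, state):
    """Logistic policy: probability of pushing the cart to the right."""
    z = sum(w * s for w, s in zip(weights, state))
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def run_episode(weights, rng):
    """Roll out one episode; return visited states, actions, and rewards."""
    state = tuple(rng.uniform(-0.05, 0.05) for _ in range(4))
    states, actions, rewards = [], [], []
    for _ in range(MAX_STEPS):
        action = 1 if rng.random() < prob_right(weights, state) else 0
        states.append(state)
        actions.append(action)
        state, done = step(state, action)
        rewards.append(1.0)  # reward of 1 for every step taken
        if done:
            break
    return states, actions, rewards


def discounted_returns(rewards, gamma):
    """Return G_t = r_t + gamma * G_{t+1} for every time step t."""
    returns, running = [], 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    return returns[::-1]


def train(episodes=300, lr=0.01, gamma=0.99, seed=0):
    """REINFORCE with a mean-return baseline (a simple variance-reduction heuristic)."""
    rng = random.Random(seed)
    weights = [0.0] * 4
    lengths = []
    for _ in range(episodes):
        states, actions, rewards = run_episode(weights, rng)
        returns = discounted_returns(rewards, gamma)
        baseline = sum(returns) / len(returns)
        grad = [0.0] * 4
        for s, a, g in zip(states, actions, returns):
            p = prob_right(weights, s)
            # For a logistic policy, d log pi(a|s) / d w_i = (a - p) * s_i.
            for i in range(4):
                grad[i] += (g - baseline) * (a - p) * s[i]
        # Gradient ascent on the expected return.
        weights = [w + lr * gi for w, gi in zip(weights, grad)]
        lengths.append(len(rewards))
    return weights, lengths
```

Calling `weights, lengths = train()` returns the learned policy weights and the length of each training episode, which could then be plotted as a step plot or summarized in a histogram. How quickly episode lengths approach the 200-step cap depends on the seed and the hyperparameters, so treat the defaults as a starting point rather than tuned values.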