```python
self.qc_thres = cost_limit * (1 - self.gamma**self.episode_len) / (
    1 - self.gamma) / self.episode_len
```
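In COptiDICE, the Bellman flow constraint is normalized by multiplying the initial-state distribution term by (1 - gamma), so the discounted stationary distribution d(s, a) sums to 1. The cost constraint on E_d[c(s, a)] therefore bounds (1 - gamma) times the expected discounted cost, not the discounted cost itself.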
So shouldn't self.qc_thres be cost_limit / self.episode_len (a per-step budget) rather than cost_limit * (1 - self.gamma**self.episode_len) / (1 - self.gamma) / self.episode_len?
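To make the gap concrete, here is a small numeric sketch comparing the two expressions. The values of cost_limit, gamma and episode_len are hypothetical, and the helper function names are made up for illustration only. The normalized cost assumes a constant per-step cost of cost_limit / episode_len over a T-step episode.

```python
def code_threshold(cost_limit: float, gamma: float, episode_len: int) -> float:
    # Formula from the snippet: per-step budget times the truncated
    # geometric sum sum_{t=0}^{T-1} gamma^t (an unnormalized discounted cost).
    return cost_limit * (1 - gamma**episode_len) / (1 - gamma) / episode_len


def proposed_threshold(cost_limit: float, episode_len: int) -> float:
    # Per-step budget, as suggested in the question.
    return cost_limit / episode_len


def normalized_discounted_cost(per_step_cost: float, gamma: float, episode_len: int) -> float:
    # (1 - gamma) * sum_{t=0}^{T-1} gamma^t * c: the expected cost under a
    # (1 - gamma)-normalized discounted occupancy, truncated at T steps.
    return (1 - gamma) * sum(gamma**t * per_step_cost for t in range(episode_len))


# Hypothetical example values, not taken from any particular config.
cost_limit, gamma, episode_len = 10.0, 0.99, 1000
per_step = cost_limit / episode_len

print("snippet threshold: ", code_threshold(cost_limit, gamma, episode_len))
print("per-step budget:   ", proposed_threshold(cost_limit, episode_len))
print("normalized cost:   ", normalized_discounted_cost(per_step, gamma, episode_len))
```

Under these assumptions the snippet's threshold exceeds the per-step budget by roughly a factor of 1 / (1 - gamma), which is exactly the normalization factor in question. Multiplying it by (1 - gamma) gives cost_limit * (1 - gamma**episode_len) / episode_len, which is close to cost_limit / episode_len whenever gamma**episode_len is small.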