Hi, thank you for your contribution!
It seems that the "factor" in "model.py" is None at first, is set to a fixed value after the first batch of the first epoch, and is then kept fixed for the rest of training. What is the benefit of this strategy?
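
Here is a minimal sketch of how I currently read that behavior; aside from `factor`, the names (`Encoder`, `proj`) and the norm-based scaling are my own placeholders, not the actual code in "model.py":

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Sketch of the pattern I am asking about (placeholder names)."""
    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim, dim)
        self.factor = None  # None at first

    def forward(self, x):
        h = self.proj(x)
        if self.factor is None:
            # set once from the very first batch, then kept fixed
            # for the rest of training
            self.factor = h.norm(dim=1).mean().detach()
        return h / self.factor
```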
Besides, should the "factor" in "model.py" be removed, or should it only consider each sample's own norm when obtaining the embedding? Otherwise it would also take the other data in the same batch into account unless we set batch_size = 1. A toy snippet of the concern I mean is below.
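
This is only illustrative (a stand-in encoder, not the repo's code), showing that a batch-derived scale gives the same sample a different embedding depending on what else is in the batch:

```python
import torch

def embed(batch, factor=None):
    # Stand-in encoder; the point is only that `factor` is derived
    # from the whole batch when it has not been set yet.
    h = batch * 2.0
    if factor is None:
        factor = h.norm(dim=1).mean()  # depends on every row in the batch
    return h / factor

torch.manual_seed(0)
x = torch.randn(1, 4)
others = torch.randn(3, 4)

alone = embed(x)                              # factor from x only
in_batch = embed(torch.cat([x, others]))[:1]  # factor from x + others
print(torch.allclose(alone, in_batch))        # False unless batch_size == 1
```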
Thanks again!