Why do you store a separate prior mean and covariance for each sample within a batch? https://github.com/chaiyujin/glow-pytorch/blob/487a6b149295f4ec4b36e408f63604c593ff2031/glow/models.py#L199 This doesn't make sense to me. As I understand it, this means each element of the batch is sampled from its own distribution, rather than all elements sharing one learned prior.
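For comparison, here is a minimal sketch of what I would have expected: one learned prior (mean and log standard deviation) with a leading dimension of 1, broadcast over whatever batch size comes in. It uses NumPy instead of the repo's PyTorch, assumes a diagonal Gaussian prior, and `SharedLearnedPrior`, `params`, `sample`, and `log_prob` are hypothetical names for illustration, not code from the repository.

```python
import numpy as np


class SharedLearnedPrior:
    """Hypothetical learned-top prior: one (mean, log_std) map shared by all samples."""

    def __init__(self, channels, height, width):
        # A single parameter map with leading dimension 1 (not batch_size).
        # In a real model this would be a trainable parameter; zero-init here.
        self.h = np.zeros((1, 2 * channels, height, width))

    def params(self, batch_size):
        # Split channels into mean and log std, then broadcast the one prior
        # over the batch dimension, so every sample uses the same distribution.
        mean, log_std = np.split(self.h, 2, axis=1)
        shape = (batch_size,) + mean.shape[1:]
        return np.broadcast_to(mean, shape), np.broadcast_to(log_std, shape)

    def sample(self, batch_size, rng):
        # Reparameterized draw: z = mean + std * eps, eps ~ N(0, I).
        mean, log_std = self.params(batch_size)
        eps = rng.standard_normal(mean.shape)
        return mean + np.exp(log_std) * eps

    def log_prob(self, z):
        # Diagonal Gaussian log-density, summed over C, H, W for each sample.
        mean, log_std = self.params(z.shape[0])
        per_elem = -0.5 * (
            np.log(2 * np.pi) + 2 * log_std + ((z - mean) / np.exp(log_std)) ** 2
        )
        return per_elem.reshape(z.shape[0], -1).sum(axis=1)


prior = SharedLearnedPrior(channels=4, height=2, width=2)
z = prior.sample(3, np.random.default_rng(0))
print(z.shape)
```

With this layout the number of prior parameters doesn't depend on the training batch size, and sampling at test time works for any batch size (e.g. `prior.sample(7, rng)`), which is why a per-sample prior seemed odd to me.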