Approximating probability distributions with short vectors, via information-theoretic distance measures