Comment by xk3

With a large corpus (10,000+ sentences, where each sentence is a "document" in my use case), I can get similar results by k-means clustering TF-IDF spmatrix vectors, but it looks like this has a lot of utilities for making the k-means part faster (binarization, etc.).
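
A minimal sketch of that kind of baseline, assuming scikit-learn's `TfidfVectorizer` and `KMeans` (the comment does not name the library actually used). The toy sentences and `n_clusters=2` are made up for illustration and stand in for the real 10,000+ sentence corpus:

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy corpus: each sentence is treated as one "document".
sentences = [
    "the cat sat on the mat",
    "a cat and a kitten sat on the mat",
    "the kitten chased the cat",
    "stock prices fell on the market today",
    "the market saw stock prices rise",
    "investors watch stock prices on the market",
]

# fit_transform returns a SciPy sparse matrix with one TF-IDF row per sentence.
vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(sentences)

# KMeans accepts the sparse matrix directly.
# n_clusters=2 is an assumption that only fits this toy data.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

for label, sentence in zip(labels, sentences):
    print(label, sentence)
```

On a real corpus the number of clusters would need tuning, and the k-means step is usually where the time goes, which is the part worth benchmarking.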

Looking forward to doing some benchmarking over the next couple of weeks.