Comment by kianN
Some statistical notes for those interested:
Under the hood, this model resembles LDA, but replaces its Dirichlet priors with Pitman–Yor Processes (PYPs), which better capture the power-law behavior of word distributions. It also supports arbitrary hierarchical priors, allowing metadata-aware modeling.
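To make the power-law point concrete, here's a minimal sketch (not our actual implementation) of the Pitman–Yor "Chinese restaurant" sampler: with a positive discount, new tables keep opening as the sample grows, which produces the heavy-tailed size distribution that a Dirichlet prior can't capture:

    import random

    def pyp_crp(n, discount=0.5, concentration=1.0, seed=0):
        # Pitman-Yor CRP: customer n+1 opens a new table with probability
        # (concentration + discount*K) / (concentration + n), where K is the
        # current table count; otherwise they join table k with probability
        # proportional to (count_k - discount).
        rng = random.Random(seed)
        counts = []   # counts[k] = customers seated at table k
        total = 0
        for _ in range(n):
            k = len(counts)
            if total == 0 or rng.random() < (concentration + discount * k) / (concentration + total):
                counts.append(1)  # open a new table
            else:
                r = rng.random() * (total - discount * k)
                for i, c in enumerate(counts):
                    r -= c - discount
                    if r <= 0:
                        counts[i] += 1
                        break
                else:
                    counts[-1] += 1  # guard against floating-point fall-through
            total += 1
        return counts

    # With discount > 0 the table sizes are heavy-tailed (power law);
    # discount = 0 recovers the ordinary Dirichlet-process CRP.
    print(sorted(pyp_crp(10000), reverse=True)[:10])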
For example, in an earnings-transcript corpus, a typical LDA might have a flat structure: Prior → Document
Our model instead uses a hierarchical graph: Uniform Prior → Global Topics → Ticker → Quarter → Paragraph
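As a rough illustration (level names below just mirror the diagram; the real API looks different), you can think of the hierarchy as a chain where each level's PYP uses the level above it as its base distribution:

    # Hypothetical representation, for intuition only.
    HIERARCHY = [
        "uniform_prior",   # root base measure
        "global_topics",   # corpus-wide topics
        "ticker",          # per-company
        "quarter",         # per-earnings-call
        "paragraph",       # leaf: observed words
    ]

    def base_distribution(level):
        # Each level's PYP draws from the parent level, so paragraph-level
        # topics are smoothed toward quarter, ticker, and global ones.
        i = HIERARCHY.index(level)
        return HIERARCHY[i - 1] if i > 0 else None

    print(base_distribution("quarter"))  # -> "ticker"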
This hierarchical structure, combined with the PYP statistics, consistently yields more coherent and fine-grained topics than standard LDA does. There's also a "fast mode" that collapses some hierarchy levels for quicker runs; it's a handy option if you're curious to see the impact the hierarchy has on the results (or are in a rush).
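Conceptually, fast mode just prunes intermediate levels from that chain; exactly which levels get collapsed is an implementation detail, so treat this as a hypothetical sketch:

    FULL = ["uniform_prior", "global_topics", "ticker", "quarter", "paragraph"]

    def fast_mode(levels, collapse=("ticker", "quarter")):
        # Assumption: which levels are collapsed isn't specified above.
        return [lvl for lvl in levels if lvl not in collapse]

    print(fast_mode(FULL))  # ['uniform_prior', 'global_topics', 'paragraph']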
Curious what you use to productionize this; it's so cool and inspiring to see hierarchical Bayes applications like this.
What's the go-to "production" stack for something like this nowadays? Is Stan dead? Do you do HMC, or approximations with e.g. Pyro?