An attempt to intuitively explain the Hierarchical Dirichlet Process (HDP), and why it makes sense to use HDPs for joint mixture modeling of multiple related datasets.
Suppose every document discusses a set of topics, and our task is to learn those topics. A priori, we do not know how many topics a document contains. A salient approach to this problem is non-parametric Bayesian modeling, wherein we use a Dirichlet Process (DP) prior, which places probability on infinitely many topics a priori. Using the document data, we can then recover the required finite set of topics a posteriori.
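To make this concrete, here is a minimal sketch of the DP's predictive (CRP) rule in Python with NumPy. The function name, the concentration parameter value `alpha=1.0`, and the 100-customer simulation length are illustrative choices of mine, not from the article; the point is only that the prior allows unboundedly many topics, yet any finite dataset occupies a handful of them.

```python
import numpy as np

rng = np.random.default_rng(0)

def crp(n_customers, alpha):
    """Table sizes after seating customers by the DP's predictive rule."""
    tables = []  # tables[k] = number of customers at table (topic) k
    for i in range(n_customers):
        # Existing table k gets weight tables[k]; a new table gets alpha.
        weights = np.array(tables + [alpha], dtype=float)
        k = rng.choice(len(weights), p=weights / (i + alpha))
        if k == len(tables):
            tables.append(1)   # open a brand new table, i.e. a new topic
        else:
            tables[k] += 1     # join an existing table
    return tables

# Infinitely many topics are possible a priori, yet 100 customers
# end up occupying only a handful of tables a posteriori.
print(crp(100, alpha=1.0))
```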
Now, it is natural to expect that various documents share topics. For example, a document on Beta Processes and a document on Gaussian Processes might share a topic like 'Stochastic'. Learning topics for each document separately denies us the opportunity to share data across documents: we could have learned the topic 'Stochastic' better by learning jointly from both documents. Note that we don't want to club both documents into one! That destroys the individual document structure. A solution in statistics that enables joint learning while preserving the document structure is the hierarchical modeling technique.
Let's take 2 documents, and put up a hierarchical model over them, the standard HDP construction: a global measure $G_0$ is drawn from a DP with base measure $H$, and each document $j$ draws its own measure $G_j$ from a DP centered on $G_0$:

$$G_0 \mid \gamma, H \sim \mathrm{DP}(\gamma, H), \qquad G_j \mid \alpha_0, G_0 \sim \mathrm{DP}(\alpha_0, G_0), \quad j = 1, 2.$$

Because $G_0$ is almost surely discrete, both $G_1$ and $G_2$ place their mass on the same countable set of atoms (topics), which is precisely what lets the documents share topics while keeping their own topic proportions.
One appealing aspect of the model is its recursiveness: we could easily extend it to another level of hierarchy by sampling $H$ itself from another DP. This can be used to learn topics jointly from multiple corpora. For example, it is likely that documents from a Statistics corpus and a Computer Science corpus share a topic like 'Python'.
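The atom-sharing argument above can be seen in a few lines of code. Below is a minimal sketch, assuming a truncated stick-breaking (Sethuraman) construction; the truncation level `K`, the concentration values, and all function names are my own illustrative choices. Each document-level measure draws its atoms from the discrete $G_0$, so the two documents necessarily reuse topics from the same global pool.

```python
import numpy as np

rng = np.random.default_rng(1)

def stick_breaking(alpha, num_weights):
    """Truncated GEM(alpha) stick-breaking weights."""
    betas = rng.beta(1.0, alpha, size=num_weights)
    remaining = np.concatenate([[1.0], np.cumprod(1.0 - betas[:-1])])
    return betas * remaining

# Top level: G0 ~ DP(gamma, H). Its atoms (the global topics) are draws
# from H; here H is just integer topic labels, for illustration.
K = 50  # truncation level (an assumption, not part of the model)
global_weights = stick_breaking(alpha=5.0, num_weights=K)
global_weights /= global_weights.sum()

# Document level: G_j ~ DP(alpha0, G0). Because G0 is discrete, each
# document's atoms are drawn from the SAME global pool of topics,
# just reweighted -- this is how topics get shared across documents.
def document_measure(alpha0, n_atoms=K):
    doc_weights = stick_breaking(alpha0, n_atoms)
    atoms = rng.choice(K, size=n_atoms, p=global_weights)
    return atoms, doc_weights

atoms1, w1 = document_measure(alpha0=2.0)
atoms2, w2 = document_measure(alpha0=2.0)
print("Topics shared by the two documents:", sorted(set(atoms1) & set(atoms2)))
```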
How are topics shared?
The CRP metaphor for the Dirichlet Process can be extended to the Chinese Restaurant Franchise (CRF) metaphor to explain the HDP intuitively. The core idea is that, in addition to customers going to restaurants and picking tables, tables now take up a customer role and go to a parent restaurant. For simplicity's sake, let's take the case of 2 restaurants, X and Y. A, B, and C are tables in restaurant X, which has 4 customers; D and E are tables in restaurant Y, which has 5 customers. The parent restaurant has tables p, q, and r, whose customers are the 5 tables from the two restaurants. The seating arrangement is as shown in Figure 1. Since A and D sit at table p, the dish served at tables A and D is the same dish served at p.
Suppose a new customer 5 comes into restaurant X. She would sit at table A with probability $\frac{n_A}{4 + \alpha_0}$, where $n_A$ is the number of customers already at table A (and similarly for tables B and C), and she would open a new table with probability $\frac{\alpha_0}{4 + \alpha_0}$; the denominator is $4 + \alpha_0$ because restaurant X currently has 4 customers. If she opens a new table, a waiter goes to the parent restaurant as a customer and picks a table, i.e., a dish, according to another CRP:
| Table/Dish | Probability |
|---|---|
| p | $\frac{m_p}{5 + \gamma}$ |
| q | $\frac{m_q}{5 + \gamma}$ |
| r | $\frac{m_r}{5 + \gamma}$ |
| New Dish | $\frac{\gamma}{5 + \gamma}$ |

Here $m_k$ is the number of tables (across both restaurants) currently serving dish $k$, and the denominator is $5 + \gamma$ because the parent restaurant currently has 5 customers, namely the tables A through E. From Figure 1, $m_p = 2$, since tables A and D both sit at p.
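As a worked example (the counts $m_q$ and $m_r$ are not given in the text, so assume $m_q = 2$, $m_r = 1$, and $\gamma = 1$): the waiter picks dish p with probability $\frac{2}{6} = \frac{1}{3}$, dish q with $\frac{2}{6}$, dish r with $\frac{1}{6}$, and orders a brand new dish with probability $\frac{1}{6}$; the four options sum to 1, as they must.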
In case the waiter sits at table p, she'd call restaurant X and tell the chef to serve dish p at the new table that customer 5 chose. If instead the waiter sits at a new parent table, a new dish, say s, is ordered, and this new dish s will be served to customer 5.
The process is similar in restaurant Y too. A new customer either sits at one of the existing tables or chooses to sit at a new table. If she prefers a new table, a waiter goes to the parent restaurant and chooses a dish according to another CRP. Notice that there is a single global set of dishes; each restaurant serves a subset of those dishes.
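Putting the whole franchise together, here is a minimal simulation sketch of the CRF just described. The class and variable names are my own, and the concentration values are illustrative; the two restaurants X and Y mirror the running example. Running it shows the key property: both restaurants draw their dishes from one shared global menu.

```python
import numpy as np

rng = np.random.default_rng(2)

class ChineseRestaurantFranchise:
    """Minimal CRF sampler: restaurants share a global menu of dishes."""

    def __init__(self, alpha0, gamma):
        self.alpha0 = alpha0   # restaurant-level concentration
        self.gamma = gamma     # franchise (parent) level concentration
        self.dish_counts = []  # dish_counts[d] = # tables serving dish d
        self.restaurants = {}  # name -> list of (dish, n_customers) tables

    def _order_dish_for_new_table(self):
        """A waiter visits the parent restaurant: a CRP over dishes."""
        weights = np.array(self.dish_counts + [self.gamma], dtype=float)
        d = rng.choice(len(weights), p=weights / weights.sum())
        if d == len(self.dish_counts):
            self.dish_counts.append(0)  # a brand new dish joins the menu
        self.dish_counts[d] += 1
        return d

    def seat_customer(self, restaurant):
        """A customer enters: a CRP over tables, deferring to the parent."""
        tables = self.restaurants.setdefault(restaurant, [])
        sizes = [n for _, n in tables]
        weights = np.array(sizes + [self.alpha0], dtype=float)
        t = rng.choice(len(weights), p=weights / weights.sum())
        if t == len(tables):
            tables.append((self._order_dish_for_new_table(), 1))
        else:
            dish, n = tables[t]
            tables[t] = (dish, n + 1)

franchise = ChineseRestaurantFranchise(alpha0=1.0, gamma=1.0)
for _ in range(50):
    franchise.seat_customer("X")
    franchise.seat_customer("Y")
for name, tables in franchise.restaurants.items():
    print(name, "serves dishes:", sorted({d for d, _ in tables}))
```

The overlap between the two printed dish sets is exactly the topic sharing that motivated the hierarchy in the first place.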
The dishes p, q, and r are analogous to the topics: they are the atoms of the global measure $G_0$, shared across all documents, with each restaurant (document) serving a subset of them.
This article was prepared using the Distill template.
I wrote this article as part of the course CS698X: Probabilistic Modeling and Inference by Prof. Piyush Rai.
If you see mistakes or want to suggest changes, please contact me.