Nomenclature for stochastic processes and Bayesian nonparametric statistics
Stochastic processes are now a key part of mainstream Bayesian statistics and probabilistic machine learning. Yet many authors do not have a solid foundation in probability theory, and so many papers commit basic errors when attempting to talk precisely about stochastic processes. This article tackles a few key ideas and misconceptions. Ideas for new sections are welcome.
Nomenclature for stochastic processes
In probability theory, a stochastic process is an indexed collection of random variables defined on the same space. That is, a stochastic process is a collection $(X_t)_{t \in T}$ of random variables on a (measurable) space $(\Omega, \mathcal{A})$. More precisely, a stochastic process is a collection of $\mathcal{A}/\mathcal{B}$-measurable functions $X_t : \Omega \to S$, for $t \in T$, where $(\Omega, \mathcal{A}, P)$ is a probability space and $(S, \mathcal{B})$ is a measurable space. You can alternatively think about a stochastic process as a function $X : T \times \Omega \to S$, but, crucially, this function is not necessarily measurable itself: only the functions $X(t, \cdot)$ are presumed to be measurable. Indeed, we have not even specified a $\sigma$-algebra on the index set $T$, and so we cannot even speak formally about the joint measurability of $X$. That said, the index set often has some structure (say, the real line and its Borel structure), and we may want $X$ to be measurable. In that case, we are interested in the existence, and construction, of a measurable version. But this is beyond the scope of this article. (See Shalizi's course notes for a relatively gentle introduction.)
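As a toy illustration of the definition, here is a minimal Python sketch in which everything is finite, so measurability is automatic. The sample space (three coin flips), the index set $\{0, 1, 2, 3\}$ and the names `OMEGA`, `P`, `X`, `law` and `path` are all assumptions made for this example. Fixing the index gives a random variable; fixing the outcome gives a sample path. All of these functions exist from the outset.

```python
from fractions import Fraction
from itertools import product

# Toy sample space: all outcomes of three coin flips, with the uniform measure.
OMEGA = list(product([0, 1], repeat=3))
P = {omega: Fraction(1, len(OMEGA)) for omega in OMEGA}

# Index set T = {0, 1, 2, 3}.
T = range(4)


def X(t, omega):
    """The process viewed as one function on T x Omega:
    the number of heads among the first t flips."""
    return sum(omega[:t])


def law(t):
    """Distribution of the random variable X_t = X(t, .) under P."""
    dist = {}
    for omega in OMEGA:
        value = X(t, omega)
        dist[value] = dist.get(value, 0) + P[omega]
    return dist


def path(omega):
    """Sample path t -> X(t, omega) for one fixed outcome omega."""
    return [X(t, omega) for t in T]
```

For instance, `law(2)` is the distribution of the number of heads in two flips, and `path((1, 0, 1))` is the list of running head counts for one particular outcome.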
Confusion with meaning of "stochastic process" in English
A lot of confusion in machine learning comes from understanding the term "stochastic process" in terms of its English meaning: a "process" (i.e., series of actions or steps) that unfolds in a "stochastic" (i.e., randomly determined) way. Indeed, a Markov chain in discrete or continuous time is a stochastic process modeling a process unfolding in a random way. But when we move from stochastic processes indexed by time, i.e., by $\mathbb{N}$ or $\mathbb{R}_+$, to stochastic processes indexed by, say, the collection of all Borel measurable subsets of the real line, then the intuitive English meaning becomes misleading.
Confusingly, many stochastic processes defined on more exotic index sets are defined in terms of stochastic processes indexed by $\mathbb{N}$. But, mathematically, an indexed collection of random variables is simply a collection of (measurable) functions, and so, while they may be defined recursively, or defined in terms of a stochastic process modeling a "process" unfolding in time, these functions simply exist at the outset: they don't appear individually out of thin air when some random event happens.
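A random measure on a measurable space $(S, \mathcal{B})$ is a $[0, \infty]$-valued stochastic process $\mu$ with index set $\mathcal{B}$ such that
$$\mu(\emptyset) = 0 \quad \text{a.s.}$$
and, for every countable sequence $A_1, A_2, \dots$ of disjoint measurable sets,
$$\mu\Bigl(\bigcup_{n} A_n\Bigr) = \sum_{n} \mu(A_n) \quad \text{a.s.},$$
where we write $\mu(A)$ for the random variable at index $A \in \mathcal{B}$, as is usual.
(A random probability measure also satisfies $\mu(S) = 1$ a.s.)
One often demands more measurability of $\mu$: for example, enough measurability to be able to demand that countable additivity holds for all possible countable collections of sets simultaneously. Alternatively, one can think about a random measure as a random element in the space of measures on $(S, \mathcal{B})$, where the $\sigma$-algebra is that generated by the functions of the form $\nu \mapsto \nu(A)$, for $A \in \mathcal{B}$. It is typical to also demand that there exists some measurable partition $A_1, A_2, \dots$ of $S$ such that $\mu(A_n) < \infty$ a.s. for every $n$.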
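To make the definition concrete, here is a minimal Python sketch of a random measure on a three-point space, where the index set is simply the collection of all subsets. The space `S`, the choice of independent Gamma weights and the function names are assumptions for this example, not part of the definition. One realization is a dictionary from subsets to values; additivity then holds for every realization, not just almost surely.

```python
import random
from itertools import combinations

# Toy space with three points; every subset is measurable.
S = ("a", "b", "c")


def all_subsets(points):
    """The index set: all subsets of the space, as frozensets."""
    return [frozenset(c)
            for r in range(len(points) + 1)
            for c in combinations(points, r)]


def sample_random_measure(rng, shape=1.0):
    """One realization of the process (mu(A)) indexed by subsets A of S.

    Assumption for this sketch: each point gets an independent
    Gamma(shape, 1) weight, and mu(A) is the total weight of A.
    """
    weights = {s: rng.gammavariate(shape, 1.0) for s in S}
    return {A: sum(weights[s] for s in A) for A in all_subsets(S)}


def normalize(mu):
    """Turn a realization with finite, positive total mass into a
    realization of a random probability measure."""
    total = mu[frozenset(S)]
    return {A: value / total for A, value in mu.items()}


rng = random.Random(0)
mu = sample_random_measure(rng)
p = normalize(mu)
```

Normalizing independent Gamma weights in this way gives a Dirichlet-distributed vector of point masses, so `p` is one draw of a (finite-dimensional) Dirichlet random probability measure on `S`.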
Priors versus processes
A very common misstatement found in machine learning papers is "The Dirichlet process is a distribution on the space of probability measures". Confusingly, this statement could be true, but it is probably false. Consider the standard setup:
$$G \sim \mathrm{DP}(\alpha H),$$
$$X_1, X_2, \dots \overset{\text{iid}}{\sim} G.$$
(To be precise, we must say that the $X_i$ are conditionally i.i.d. given $G$. Alternatively, we could have written, e.g., $X_1, X_2, \dots \mid G \overset{\text{iid}}{\sim} G$, which implies this statement.)
Let us assume that $H$ is a probability measure on the real line. The way to read the first statement is: "$G$ is a Dirichlet process." Emphatically, $G$ is NOT a "sample from a Dirichlet process". (The $X_i$ would fit this description, though.) The confusion comes from the English meaning discussed above and the fact that one often first encounters Dirichlet processes by way of stick-breaking constructions, which are themselves stochastic processes indexed by $\mathbb{N}$.
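The distinction can be seen in a small simulation. The following is a minimal Python sketch of a truncated stick-breaking construction, under several assumptions made for illustration: the base measure is a standard normal, the concentration parameter is 2, and the infinite sequence of sticks is cut off after 200 atoms (an approximation, with the leftover mass assigned to the last atom). The object `G` plays the role of one realization of the random probability measure, that is, of the Dirichlet process itself; the values in `xs` are samples from it.

```python
import random


def stick_breaking_dp(rng, alpha, base_sampler, truncation=200):
    """Approximate realization of a Dirichlet process via stick-breaking.

    Returns (atoms, weights) describing a discrete probability measure.
    The truncation level is an approximation chosen for this sketch.
    """
    atoms, weights = [], []
    remaining = 1.0  # length of the stick not yet broken off
    for _ in range(truncation):
        v = rng.betavariate(1.0, alpha)   # fraction of the remaining stick
        atoms.append(base_sampler(rng))   # atom location drawn from the base measure
        weights.append(remaining * v)
        remaining *= 1.0 - v
    weights[-1] += remaining  # put the leftover mass on the last atom
    return atoms, weights


def sample_from(rng, G, n):
    """Draw n values i.i.d. from the realized measure G."""
    atoms, weights = G
    return rng.choices(atoms, weights=weights, k=n)


rng = random.Random(1)
# G: one realization of the random measure (the Dirichlet process).
G = stick_breaking_dp(rng, alpha=2.0, base_sampler=lambda r: r.gauss(0.0, 1.0))
# xs: samples from G, conditionally i.i.d. given G.
xs = sample_from(rng, G, 10)
```

Because the realized measure is discrete, repeated values among the `xs` are expected; this is the familiar clustering behavior of samples drawn from a Dirichlet process.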
In the setting of a Bayesian statistics paper, the easiest way to fix the statement at the start of the section is to replace "Dirichlet process" with "Dirichlet process prior", although perhaps a better statement would pluralize "prior" to "priors" and "distribution" to "distributions". A Dirichlet process is a random probability measure. (Note that a vector in $\mathbb{R}^k$ with a finite-dimensional Dirichlet distribution is also a Dirichlet process when the vector is viewed as a distribution on $\{1, \dots, k\}$.)
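The parenthetical remark follows from the usual defining property of a Dirichlet process, stated here as a reminder. The notation is assumed for this note: $G$ is the random probability measure, $\alpha > 0$ a concentration parameter, and $H$ a base probability measure with $H(A_i) > 0$ for each set in the partition.

```latex
\[
  \bigl(G(A_1), \dots, G(A_m)\bigr)
  \sim \mathrm{Dirichlet}\bigl(\alpha H(A_1), \dots, \alpha H(A_m)\bigr)
  \quad \text{for every finite measurable partition } A_1, \dots, A_m .
\]
% Finite case: on the space \{1, \dots, k\}, a vector (p_1, \dots, p_k) with
% distribution Dirichlet(\alpha H(\{1\}), \dots, \alpha H(\{k\})) defines
% G(A) = \sum_{i \in A} p_i, which satisfies the property above because sums
% over disjoint blocks of Dirichlet coordinates are again Dirichlet.
```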
As I mentioned above, there is also the real possibility of ambiguity in the statement at the start of the section: a Dirichlet process could be a random probability measure on the space of probability measures. Let $\Pi$ be a Dirichlet process prior. Then
$$\tilde{G} \sim \mathrm{DP}(\alpha \Pi)$$
is a random probability measure on the space of probability measures. This actually appears in the literature.