Dan Roy aka daniel roy, daniel m roy, droy

Research Fellow, Emmanuel College
University of Cambridge

research

My research blends computer science, statistics and probability theory; I study "probabilistic programming" and develop computational perspectives on fundamental ideas in probability theory and statistics. I am particularly interested in: representation theorems that connect computability, complexity, and probabilistic structures; stochastic processes, the use of recursion to define stochastic processes, and applications to nonparametric Bayesian statistics; and the complexity of probabilistic and statistical inference, especially in the context of probabilistic programming. Ultimately, I am motivated by the long term goal of making lasting contributions to our understanding of complex adaptive systems and especially Artificial Intelligence.

news

JAN 2014
I will be giving an invited talk at the Mathematical Foundations of Programming Semantics conference this June at Cornell.

OCT 2013
I will be serving on the program committee for Computability and Complexity in Analysis, which will be held in Darmstadt this coming July.

SEP 2013
I will be applying for tenure-track positions this academic year in Computer Science and in Statistics. If you feel your institution would be a good fit, please contact me.

JUL 2013
I am giving a plenary talk at the Computability and Complexity in Analysis conference in Nancy, France this July, 2013.

APR 2013
I have completed my Newton Fellowship and would like to thank the Royal Society, the Engineering Department of the University of Cambridge, and my host, Zoubin Ghahramani, for this unique opportunity. I will be continuing my postdoctoral research at Cambridge as a Research Fellow of Emmanuel College until October 2014.

DEC 2012
Cameron Freer, Josh Tenenbaum and I have just released a forthcoming book chapter connecting legacies of Turing and the probabilistic programming perspective on AI.

DEC 2012
I am co-organizing another NIPS workshop on probabilistic programming with Vikash Mansinghka and Noah Goodman. See the workshop website for more information.

Show More ▼

probabilistic programming

Programs can be used to give compact representations of distributions: in order to represent a distribution, one simply gives a program that would generate an exact sample were the random number generator to produce realizations of independent and identically distributed random variables. This approach to representing distributions by probabilistic programs works not only for simple distributions on numbers like Poissons, Gaussians, etc., and combinations thereof, but also for more exotic distributions on, e.g., phrases in natural language, rendered 2D images of 3D scenes, and climate sensor measurements.

Probabilistic programming systems support statistical inference on models defined by probabilistic programs. By constraining some variables of a program (e.g., simulated sensor readings in some climate model) and studying the conditional distribution of other variables (e.g., the parameters of the climate model), we can identify plausible variable settings that agree with the constraints. Conditional inferences like this would allow us to, e.g., build predictive text systems for mobile phones, guess the 3D shape of an object from only a photograph, or study the underlying mechanisms driving observed climate measurements.

Probabilistic programming systems for machine learning and statistics are still in their infancy, and there are many interesting theoretical and applied problems yet to be tackled. My own work focuses on theoretical questions around representing stochastic processes and the computational complexity of sampling-based approaches to inference. I was involved in the definition of the probabilistic programming language Church, and its first implementation, MIT-Church, a Markov Chain Monte Carlo algorithm operating on the space of execution histories of an interpreter. Some of my key theoretical work includes a study of the computability of conditional probability and de Finetti measures, both central notions in Bayesian statistics. Readers looking for an overview of these results are directed to the introduction of my doctoral dissertation. A less technical description of a probabilistic programming approach to artificial intelligence can be found in a recent book chapter on legacies of Alan Turing, co-authored with Freer and Tenenbaum.

More information on probabilistic programming can be found on probabilistic-programming.org, a wiki that I maintain. In particular, look at the list of research articles and papers on probabilistic programming and the tutorials.

NIPS Workshop on Probabilistic Programming

I co-organized the first workshop on probabilistic programming for statistics and machine learning at NIPS*2008 (with Vikash Mansinghka, John Winn, David McAllester and Josh Tenenbaum). Four years later, I co-organized the second workshop on probabilistic programming at NIPS*2012 (with Vikash Mansinghka and Noah Goodman).

academic

Hello, my name is Daniel M. Roy (or simply, Dan) and I am a Research Fellow of Emmanuel College at the University of Cambridge. I am also a member of the Machine Learning Group, headed by Zoubin Ghahramani and part of the Computational and Biological Learning Lab in the Department of Engineering.

I am a graduate of the EECS PhD program in computer science at the Massachusetts Institute of Technology Computer Science and Artificial Intelligence Laboratory (CSAIL), where I was advised by Leslie Kaelbling. I also collaborated with members of Josh Tenenbaum's Computational Cognitive Science group.

You may find my abbreviated curriculum vitæ online.

coauthors

Nate Ackerman, William Beebee, Keith Bonawitz, Cristian Cadar, Hal Daumé III, Brian Demsky, Daniel Dumitran, Cameron Freer, Zoubin Ghahramani, Noah Goodman, Creighton Heaukulani, Eric Jonas, Leslie Kaelbling, Charles Kemp, Balaji Lakshminarayanan, Tudor Leu, James Lloyd, Vikash Mansinghka, Peter Orbanz, Ryan Rifkin, Martin Rinard, Virginia Savova, Lauren Schmidt, David Sontag, Yee Whye Teh, Josh Tenenbaum, David Wingate

chalk-talk-style slides

Several template .TEX files for producing slides for mathematical talks that more closely mimic the chalk talk aesthetic.

valid xhtml

Valid XHTML 1.1 re-validate
Valid CSS /styles/screen

preprints and working papers

Computability and a.e.-continuous disintegration
(with Nate Ackerman and Cameron Freer)
Preprint available upon request.

Beta processes and a continuum of urns process
Preprint available upon request.

The combinatorial structure of negative binomial processes
(with Creighton Heaukulani)

Bayesian Models of Graphs, Arrays and Other Exchangeable Random Structures
(with Peter Orbanz)
Slides for talk at CIMAT

On the computability of conditional probability
(with Nate Ackerman and Cameron Freer)

Chapter 5. Distributions on data structures: a case study
(Mondrian process theory)

Those seeking a more formal presentation of the Mondrian process than the NIPS paper should see Chapter 5 of my dissertation (linked above). A revised version of these and additional results is in preparation.

Note: Some articles, distinguished by "(with ...)", have alphabetical author lists, as is the convention in mathematics and theoretical computer science.

thesis

Computability, inference and modeling in probabilistic programming
Daniel M. Roy
Ph.D. thesis, Massachusetts Institute of Technology, 2011.
MIT/EECS George M. Sprowls Doctoral Dissertation Award

research articlesand publications

Top-down particle filtering for Bayesian decision trees
Balaji Lakshminarayanan, Daniel M. Roy, Yee Whye Teh
Proc. Int. Conf. on Machine Learning  (ICML), 2013.
[ code ]

Towards common-sense reasoning via conditional simulation:
Legacies of Turing in Artificial Intelligence

(with Cameron Freer and Josh Tenenbaum)
Turing's Legacy (ASL Lecture Notes in Logic), 2012.

Random function priors for exchangeable arrays with applications to graphs and relational data
James Lloyd, Peter Orbanz, Zoubin Ghahramani, Daniel M. Roy
Adv. Neural Information Processing Systems 25 (NIPS), 2012.

Computable de Finetti measures
(with Cameron Freer)
Annals of Pure and Applied Logic, 2012.

Complexity of Inference in Latent Dirichlet Allocation
David Sontag and Daniel Roy
Adv. Neural Information Processing Systems 24 (NIPS), 2011.

On the computability and complexity of Bayesian reasoning
NIPS Philosophy and Machine Learning Workshop, 2011.

Noncomputable conditional distributions
(with Nate Ackerman and Cameron Freer)
Proc. Logic in Computer Science (LICS), 2011.

Probabilistically Accurate Program Transformations
Sasa Misailovic, Daniel M. Roy, and Martin C. Rinard
Proc. Int. Static Analysis Symp. (SAS), 2011.

Bayesian Policy Search with Policy Priors
David Wingate, Noah D. Goodman, Daniel M. Roy, Leslie P. Kaelbling, and Joshua B. Tenenbaum
Proc. Int. Joint Conf. on Artificial Intelligience (IJCAI), 2011.

When are probabilistic programs probably computationally tractable?
(with Cameron Freer and Vikash Mansinghka)
NIPS Workshop on Monte Carlo Methods for Modern Applications, 2010.

Posterior distributions are computable from predictive distributions
(with Cameron Freer)
Proc. Artificial Intelligence and Statistics (AISTATS), 2010.

Complexity of Inference in Topic Models
David Sontag and Daniel Roy
NIPS Workshop on Applications for Topic Models: Text and Beyond, 2009.

The Infinite Latent Events Model
David Wingate, Noah D. Goodman, Daniel M. Roy, and Joshua B. Tenenbaum
Proc. Uncertainty in Artificial Intelligence (UAI), 2009.

Computable exchangeable sequences have computable de Finetti measures
(with Cameron Freer)
Proc. Computability in Europe (CiE), 2009.

Exact and Approximate Sampling by Systematic Stochastic Search
Vikash Mansinghka, Daniel M. Roy, Eric Jonas, and Joshua Tenenbaum
Proc. Artificial Intelligence and Statistics (AISTATS), 2009.

The Mondrian Process
(with Yee Whye Teh)
Adv. Neural Information Processing Systems 21 (NIPS), 2009.

Video animation of the Mondrian process as one zooms into the origin (under a beta Levy rate measure at time t=1.0). See also the time evolution of a Mondrian process on the plane as we zoom in with rate proportional to time. In both cases, the colors are chosen at random from a palette. These animations were produced by Yee Whye in Matlab. For now, we reserve copyright, but please email me and we'll be more than likely happy to let you use them.

A stochastic programming perspective on nonparametric Bayes
Daniel M. Roy, Vikash Mansinghka, Noah Goodman, and Joshua Tenenbaum
ICML Workshop on Nonparametric Bayesian, 2008.

Church: a language for generative models
Noah Goodman, Vikash Mansinghka, Daniel M. Roy, Keith Bonawitz, and Joshua Tenenbaum
Proc. Uncertainty in Artificial Intelligence (UAI), 2008.

Bayesian Agglomerative Clustering with Coalescents
Yee Whye Teh, Hal Daumé III, and Daniel M. Roy
Adv. Neural Information Processing Systems 20 (NIPS), 2008.

Discovering Syntactic Hierarchies
Virginia Savova, Daniel M. Roy, Lauren Schmidt, and Joshua B. Tenenbaum
Proc. Cognitive Science (COGSCI), 2007.

AClass: An online algorithm for generative classification
Vikash K. Mansinghka, Daniel M. Roy, Ryan Rifkin, and Joshua B. Tenenbaum
Proc. Artificial Intelligence and Statistics (AISTATS), 2007.

Efficient Bayesian Task-level Transfer Learning
Daniel M. Roy and Leslie P. Kaelbling
Proc. Int. Joint Conf. on Artificial Intelligience (IJCAI), 2007.

Learning Annotated Hierarchies from Relational Data
Daniel M. Roy, Charles Kemp, Vikash Mansinghka, and Joshua B. Tenenbaum
Adv. Neural Information Processing Systems 19 (NIPS), 2007.

Clustered Naive Bayes
MEng thesis, Massachusetts Institute of Technology, 2006.

Enhancing Server Availability and Security Through Failure-Oblivious Computing
Martin Rinard, Cristian Cadar, Daniel Dumitran, Daniel M. Roy, Tudor Leu, and William S. Beebee, Jr.
Proc. Operating Systems Design and Implementation (OSDI), 2004.

A Dynamic Technique for Eliminating Buffer Overflow Vulnerabilities (and Other Memory Errors)
Martin Rinard, Cristian Cadar, Daniel Dumitran, Daniel M. Roy, and Tudor Leu
Proc. Annual Computer Security Applications Conference (ACSAC), 2004.

Efficient Specification-Assisted Error Localization
Brian Demsky, Cristian Cadar, Daniel M. Roy, and Martin C. Rinard
Proc. Workshop on Dynamic Analysis (WODA), 2004.

Efficient Specification-Assisted Error Localization and Correction
Brian Demsky, Cristian Cadar, Daniel M. Roy, and Martin C. Rinard
MIT CSAIL Technical Report 927. November, 2003.

Implementation of Constraint Systems for Useless Variable Elimination
(advised by Mitchell Wand)
Research Science Institute. August, 1998.

Candid

contact

(email)
d.roy@eng.cam.ac.uk

(mail)
Emmanuel College
St. Andrew's Street
Cambridge CB2 3AP
United Kingdom

(UK mobile)
+44 (0) 7552 784 664

(US mobile)
+1 310 913 6380

recruiters: no, thank you.