
David Robinson

Principal Data Scientist at Heap, works in R and Python.


About Me

I’m a data scientist at Heap. My interests include statistics, data analysis, education, and programming in R.

I’m the co-author with Julia Silge of the tidytext package and the O’Reilly book Text Mining with R. I’m also the author of the broom and fuzzyjoin packages, and of the e-book Introduction to Empirical Bayes.

I previously worked as Chief Data Scientist at DataCamp and as a data scientist at Stack Overflow, and received a PhD in Quantitative and Computational Biology from Princeton University.



  • broom: Convert messy model outputs to a tidy format, for use with tools such as dplyr and tidyr.
  • fuzzyjoin: Join tables based on inexact matching of columns.
  • tidytext: Analyze text using tidy packages such as dplyr, ggplot2, and tidyr.
  • stackr: R package for connecting to the Stack Exchange API.


  1. Johnson, E.L., Robinson, D.G., and Coller, H.A. (2017) Widespread changes in mRNA stability contribute to quiescence-specific gene expression patterns in a fibroblast model of quiescence. BMC Genomics, 18(1): 123.
  2. Robinson, D.G. (2015) broom: An R package for converting statistical analysis objects into tidy data frames. arXiv preprint. arXiv:1412.3565 [stat.CO].
  3. Robinson, D.G., Wang, J., and Storey, J.D. (2014) A nested parallel experiment demonstrates differences in intensity-dependence between RNA-seq and microarrays. bioRxiv preprint. doi:10.1101/013342.
  4. Robinson, D.G. and Storey, J.D. (2014) subSeq: Determining appropriate sequencing depth through efficient read subsampling. Bioinformatics, 30 (23): 3424-3426. doi: 10.1093/bioinformatics/btu552.
  5. Robinson, D.G., Chen, W., Storey, J.D., and Gresham, D. (2014) Design and Analysis of Bar-seq Experiments. G3: Genes/Genomes/Genetics, 4(1), 11-18.
  6. Robinson, D.G., Lee, M.C., and Marx, C.J. (2012) OASIS: an automated program for global investigation of bacterial and archaeal insertion sequences. Nucleic Acids Research, doi: 10.1093/nar/gks778.



About This Site

This site is powered by Jekyll using the Minimal Mistakes theme. All blog posts are released under a Creative Commons Attribution-ShareAlike 4.0 International License. The favicon and logo were created by Thomas Lin Pedersen.

All blog posts are compiled from R Markdown with knitr using this script. You can find the reproducible sources of each blog post here.

All opinions and views are my own and do not represent my employer.