Ph.D.: Ecology and Evolutionary Biology, Princeton University, 2001
Postdoctoral Fellow: University of Colorado, Boulder, 2001-2004
Genomics, molecular evolution and the microbiome
Advances in high-throughput sequencing and in computational techniques allow us to address large-scale questions about evolution that have never before been accessible. Our research combines computational and experimental techniques to ask questions about the evolution of the composition of biomolecules, genomes, and communities.
Community composition and the Human Microbiome: We are developing new methods to test factors that make environments more or less similar in terms of the phylogenetic diversity of the organisms they contain. For example, in hot springs in Yellowstone, the driving factors might be temperature, pH, hydrogen sulfide, or any of a number of other physical and chemical factors. In our own bodies, the microbial symbionts we carry with us outnumber our so-called human cells by as much as an order of magnitude, and these microbial communities have profound implications for health and disease. The Human Microbiome Project [1] seeks to understand these communities.
We recently developed UniFrac [2], a distance metric that uses a phylogenetic tree to measure the biological distance between each pair of environments represented in the tree. We can then use clustering methods, such as hierarchical clustering, and ordination methods, such as PCA, to identify environments that are more similar or more different, and to correlate these differences with physical and biological properties of the environment. We found that microbial diversity in the mouse gut is inherited primarily through parent-offspring contact, but that the relative abundance of different taxa depends on the host genotype [3]. UniFrac allows all the information in a phylogeny to be brought to bear on the clustering problem, yielding new insights into the factors that govern community assembly. For example, we have used UniFrac to discover that salinity is the main driving factor across a broad range of distinct physical habitats [4], that mammalian gut communities cluster primarily by diet [5], and that the gut is far more distinct from other habitats than those habitats are from one another (e.g. the difference between the gut and mouth communities can be larger than the difference between the communities living in a hot spring and on an ice cap). We expect UniFrac to have a wide impact in a range of environmental and medical applications.
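To make the idea concrete, here is a minimal Python sketch of the unweighted (presence/absence) form of UniFrac: the branch length in the tree that leads to descendants from only one of two environments, divided by all branch length leading to either. The parent-pointer tree encoding, the function names, and the toy tree and tip labels are hypothetical choices for illustration, not the published implementation, and the weighted (abundance-aware) variant is not shown.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical tree encoding: node -> (parent, length of the branch above node).
# The root is the only node whose parent is None.
Tree = Dict[str, Tuple[Optional[str], float]]


def envs_below(tree: Tree, tip_envs: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """For every node, collect the environments observed in any tip beneath it."""
    below: Dict[str, Set[str]] = {node: set() for node in tree}
    for tip, envs in tip_envs.items():
        node: Optional[str] = tip
        while node is not None:  # walk from the tip up to the root
            below[node] |= envs
            node = tree[node][0]
    return below


def unweighted_unifrac(tree: Tree, tip_envs: Dict[str, Set[str]],
                       env_a: str, env_b: str) -> float:
    """Fraction of observed branch length that leads to only one of two environments."""
    below = envs_below(tree, tip_envs)
    unique = observed = 0.0
    for node, (parent, length) in tree.items():
        if parent is None:
            continue  # the root has no branch above it
        has_a = env_a in below[node]
        has_b = env_b in below[node]
        if has_a or has_b:  # branch leads to tips from at least one environment
            observed += length
            if has_a != has_b:  # branch leads to tips from only one of them
                unique += length
    return unique / observed if observed else 0.0


# Toy example (made-up tree and tip labels, for illustration only).
TREE: Tree = {
    "root": (None, 0.0),
    "n1": ("root", 1.0), "n2": ("root", 1.0),
    "t1": ("n1", 0.5), "t2": ("n1", 0.5),
    "t3": ("n2", 0.5), "t4": ("n2", 0.5),
}
TIP_ENVS = {"t1": {"gut"}, "t2": {"gut", "mouth"}, "t3": {"mouth"}, "t4": {"soil"}}

print(round(unweighted_unifrac(TREE, TIP_ENVS, "gut", "mouth"), 3))  # 0.571
```

Computing this value for every pair of samples gives the distance matrix that clustering and ordination methods take as input. In the toy tree, gut and mouth share the branches leading to n1 and t2, so their distance (about 0.57) is lower than gut versus soil, which share no observed branches (1.0).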
A key advance we have made is barcoded pyrosequencing [6-7], especially the use of formal error-correcting codes, which allows us to use 454 pyrosequencing to study hundreds of microbial communities simultaneously. Barcoded pyrosequencing opens up whole new vistas of unexplored microbial diversity. For example, we have used this technology to study water quality and samples from cystic fibrosis patients’ lungs, to show that, if you’re typical of our study population, your left and right hands probably share only about 18% of their species [8], and that systematic shifts in the gut community occur with obesity [9]. Remarkably, this latter paper shows that radically different species assemblages can maintain a core at the level of gene functions, paralleling trends in macroecology where, e.g., grasslands on different continents may share none of their species yet look extremely similar in physical and chemical conditions when compared with, e.g., rainforests. We have also been exploring the biogeography of the human body [11].
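The decoding step behind error-correcting barcodes can be sketched in Python as nearest-barcode assignment: if every pair of barcodes differs at d or more positions, a read whose barcode carries at most (d - 1) // 2 substitutions is still closest to its true barcode. The barcode sequences and sample names below are made up, the functions are hypothetical, and the sketch does not construct a formal code such as a Hamming code; it only checks the minimum distance of whatever set it is given. It also ignores insertions and deletions, a common error mode in 454 reads.

```python
from itertools import combinations
from typing import Dict, Iterable, Optional


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_distance(barcodes: Iterable[str]) -> int:
    """Smallest pairwise Hamming distance in a set of two or more barcodes."""
    return min(hamming(a, b) for a, b in combinations(list(barcodes), 2))


def assign_read(read: str, barcode_to_sample: Dict[str, str]) -> Optional[str]:
    """Return the sample whose barcode is nearest the read's prefix, or None.

    A set with minimum distance d can correct up to (d - 1) // 2 substitutions,
    and within that radius the nearest barcode is guaranteed to be unique.
    Reads farther than that from every barcode are left unassigned.
    """
    length = len(next(iter(barcode_to_sample)))
    if len(read) < length:
        return None
    max_errors = (min_distance(barcode_to_sample) - 1) // 2
    prefix = read[:length]
    best = min(barcode_to_sample, key=lambda bc: hamming(prefix, bc))
    if hamming(prefix, best) > max_errors:
        return None
    return barcode_to_sample[best]


# Hypothetical 8-nt barcodes mapped to hypothetical sample names.
BARCODES = {"ACGTACGT": "left_hand", "TGCATGCA": "right_hand", "AACCGGTT": "gut"}

print(min_distance(BARCODES))                      # 6
print(assign_read("ACGTACGAGGTTCC", BARCODES))     # left_hand (1 substitution)
print(assign_read("GGGGGGGGAAAA", BARCODES))       # None
```

With a minimum distance of 6, this set tolerates up to two substitutions per barcode; reads beyond that radius are discarded rather than guessed, which keeps misassigned reads from contaminating other samples.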
Collaborators on this project include Jeffrey Gordon, Ruth Ley, Frederick Bushman, Noah Fierer, Scott Kelley, Norman Pace, Eric Triplett, Allan Konopka, Henry Tufo, Gary Andersen, Todd DeSantis, and a host of others (list intended to be illustrative rather than exhaustive). We are or have been funded by the Keck Foundation, the NIH (NIDDK for obesity, NHGRI for new methods development and to participate in the Human Microbiome Project Data Analysis and Coordination Center), the Crohn’s and Colitis Foundation of America (for applying these techniques to IBD), the Bill and Melinda Gates Foundation (for applying these techniques to malnutrition), PNNL (radioisotope immobilization), NSF/USDA (for studies of soil archaea), and HHMI.
RNA composition: At a much smaller scale, an experimental technique called SELEX (systematic evolution of ligands by exponential enrichment), or in vitro selection, allows functional RNA molecules to be isolated from large pools of random RNA sequences. Typically, these pools are designed to contain equal proportions of the four nucleotides. However, it is unclear whether this is the best region of composition space in which to search for functional RNA molecules.
We are comparing RNA molecules isolated by SELEX with biological RNA molecules to test whether there are general rules that govern the nucleotide composition of specific RNA structural features. Several researchers, including Erik Schultes and Donald Forsdyke, have shown that biological RNAs are specifically biased towards purines. We are testing whether functional molecules of defined overall composition differ statistically from random molecules of the same composition, and whether there are rules that govern how many of the A’s, C’s, G’s, and U’s in a random sequence end up in different structural categories such as stems, loops, bulges and junctions [12]. These patterns are heavily influenced by 3D structural features [13]. We expect that these rules [14] will help us improve our RNA secondary structure prediction software, BayesFold [15]. BayesFold uses the information contained in an alignment of sequences that share the same function, and therefore presumably the same structure, to provide highly accurate secondary structure predictions for alignments of short RNA sequences. We also expect to find general rules that influence the assembly of particular RNA architectures.
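As a rough illustration of how composition by structural category can be tallied, the Python sketch below reads a secondary structure in dot-bracket notation, labels each position as paired (stem), part of a hairpin loop, or some other unpaired position, and reports the base composition of each class. The three-way split is a simplification introduced only for illustration: it lumps bulges, internal loops, junctions, and single-stranded ends into one class and does not handle pseudoknots. All function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def pair_table(structure: str) -> List[int]:
    """Partner index for each position in a dot-bracket string (-1 if unpaired)."""
    partner = [-1] * len(structure)
    stack: List[int] = []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            j = stack.pop()
            partner[i], partner[j] = j, i
        elif ch != ".":
            raise ValueError(f"unsupported symbol {ch!r} (pseudoknots not handled)")
    if stack:
        raise ValueError("unbalanced structure")
    return partner


def categorize(structure: str) -> List[str]:
    """Label each position as 'stem', 'hairpin_loop', or 'other_unpaired'."""
    partner = pair_table(structure)
    labels = ["stem" if p >= 0 else "other_unpaired" for p in partner]
    for i, j in enumerate(partner):
        # A pair (i, j) closes a hairpin loop if nothing between them is paired.
        if j > i and all(partner[k] == -1 for k in range(i + 1, j)):
            for k in range(i + 1, j):
                labels[k] = "hairpin_loop"
    return labels


def composition_by_category(sequence: str, structure: str) -> Dict[str, Dict[str, float]]:
    """Fraction of A, C, G, and U within each structural category."""
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure lengths differ")
    counts: Dict[str, Counter] = {}
    for base, label in zip(sequence.upper(), categorize(structure)):
        counts.setdefault(label, Counter())[base] += 1
    return {
        label: {b: c[b] / sum(c.values()) for b in "ACGU"}
        for label, c in counts.items()
    }


print(composition_by_category("GGGAAACCCA", "(((...))).")["stem"])
# {'A': 0.0, 'C': 0.5, 'G': 0.5, 'U': 0.0}
```

Comparing these per-category fractions between selected molecules and random sequences of the same overall composition is the kind of test described above; a real analysis would need the finer categories and a statistical test, neither of which is shown here.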
We are also testing whether the information contained in minimal functional RNA motifs is sufficient, as well as necessary, for function. SELEX experiments typically isolate short, degenerate sequences that are necessary for function from many different random-sequence backgrounds. Continuing work in Michael Yarus’s lab has shown that the minimal motif that performs a particular task, such as binding or catalysis, can be found by "squeezing" the random region into shorter and shorter lengths. If these sequences and their specific secondary structure configuration are sufficient for activity, we should be able to obtain functional sequences by embedding them in longer, random sequences. We are currently determining whether this is the case, or whether additional identity elements are needed. Because we can accurately predict how many random sequences are required to obtain a specified sequence and secondary structure motif [16-18], this work is crucial for estimating the information required to perform different catalytic or binding functions.
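A back-of-the-envelope version of this calculation, for primary sequence only, can be sketched in Python. Assuming each position of the random region is drawn independently with a given nucleotide composition, the chance that one window matches a motif is the product of the per-position probabilities, and a Poisson approximation converts the expected number of matching windows into the expected number of sequences screened per hit. This is not the method of [16-18], which also require the motif to fold correctly; ignoring folding makes motifs look more abundant than they are. The function names and the purine-rich composition are hypothetical, chosen only to show how pool composition (the question raised above for SELEX pools) shifts the estimate.

```python
import math
from typing import Dict

UNIFORM = {"A": 0.25, "C": 0.25, "G": 0.25, "U": 0.25}
# Hypothetical purine-enriched pool composition, for comparison only.
PURINE_RICH = {"A": 0.3, "C": 0.2, "G": 0.3, "U": 0.2}


def site_probability(motif: str, composition: Dict[str, float]) -> float:
    """Probability that one window of a random sequence matches the motif.

    'N' in the motif matches any nucleotide; other letters must match exactly.
    """
    if not math.isclose(sum(composition.values()), 1.0):
        raise ValueError("composition must sum to 1")
    p = 1.0
    for base in motif:
        if base != "N":
            p *= composition[base]
    return p


def sequences_needed(motif: str, random_length: int,
                     composition: Dict[str, float]) -> float:
    """Expected number of random sequences screened per sequence containing the motif.

    Poisson approximation: windows are treated as independent and overlapping
    matches are ignored. Only primary sequence is considered, not folding.
    """
    windows = random_length - len(motif) + 1
    if windows <= 0:
        raise ValueError("random region is shorter than the motif")
    expected_hits = windows * site_probability(motif, composition)
    # P(at least one hit) = 1 - exp(-expected_hits); expm1 keeps precision for tiny values.
    return 1.0 / -math.expm1(-expected_hits)


for name, comp in [("uniform", UNIFORM), ("purine-rich", PURINE_RICH)]:
    print(name, round(sequences_needed("GGAANNGA", 50, comp)))
```

For a purine-rich motif like the one above, the purine-enriched pool needs fewer sequences per hit than the equimolar pool, which is one way composition could matter when designing a selection.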
Collaborators on this project include Michael Yarus, Manuel Lladser, Hans De Sterck, Sandra Smit, and Jana Chocholousova. This project is funded by NASA’s Astrobiology program.
New bioinformatics tool development: An important component of our research is the development of new bioinformatics tools. In addition to UniFrac [2], its web interface [19], and the much faster Fast UniFrac [20], we have been developing tools to improve proteomics analyses [21-22], identify motifs associated with function [23-24], untangle pseudoknots [25], and more. We depend heavily on an open-source toolkit we develop, PyCogent [26], which supports many of our tools and web sites. We are also heavily engaged in benchmarking efforts through the HMP DACC, the RNA Ontology Consortium, and other standards efforts.
Collaborators on these projects include Gavin Huttley, Kristian Rother, Jaap Heringa, Shelley Copley, Neocles Leontis, Eric Westhof, Todd DeSantis, Jennifer Wortman, Owen White, and a host of others. These projects are or have been funded by the Jane and Charlie Butcher Foundation, the Keck Foundation, the US Air Force, NIDDK, NHGRI, the Crohn’s and Colitis Foundation of America, the Bill and Melinda Gates Foundation, and HHMI.
1 Turnbaugh, P.J., Hamady, M., Ley, R., Fraser, C., Knight, R., and Gordon, J.I. (2007) "The human microbiome project: exploring the microbial side of ourselves". Nature 449:804.
2 Lozupone, C.A. and Knight, R. (2005). "UniFrac: A New Phylogenetic Method For Comparing Microbial Communities." Appl Environ Microbiol 71:8228-35.
3 Ley, R.E., Backhed, F., Turnbaugh, P., Lozupone, C.A., Knight, R.D., and Gordon, J.I. (2005) "Obesity alters gut microbial ecology." Proceedings of the National Academy of Sciences 102:11070-11075.
4 Lozupone, C.A., and Knight, R. (2007) "Global patterns in bacterial diversity." PNAS 104:11436-40.
5 Ley, R.E., Hamady, M., Lozupone, C., Turnbaugh, P.J., Ramey, R.R., Bircher, J.S., Schlegel, M.L., Tucker, T.A., Schrenzel, M.D., Knight, R., and Gordon, J.I. (2008) "Evolution of Mammals and their Gut Microbes." Science 320:1647-51.
6 Hamady, M., Walker, J.J., Harris, J.K., Gold, N., and Knight, R. (2008) "Error-correcting barcoded primers for pyrosequencing hundreds of samples in multiplex". Nature Methods 5:235-7.
7 McKenna, P., Hoffman, C., Aye, P.P., Lackner, A., Liu, Z., Lozupone, C.A., Hamady, M., Knight, R., and Bushman, F.D. (2008) "The Macaque Gut Microbiome in Health, Lentiviral Infection and Inflammatory Bowel Disease." PLoS Pathogens 4:e20.
8 Fierer, N., Hamady, M., Lauber, C., and Knight, R. (2008) "The influence of sex, handedness, and washing on the diversity of hand surface bacteria." PNAS 105:17994.
9 Turnbaugh, P.J., Hamady, M., Yatsunenko, T., Cantarel, B.L., Duncan, A., Ley, R.E., Sogin, M.L., Jones, W.J., Roe, B.A., Affourtit, J.P., Egholm, M., Henrissat, B., Heath, A.C., Knight, R., and Gordon, J.I. (2009) "A core gut microbiome in obese and lean twins." Nature 457:480-4.
10 Lozupone, C.A., Hamady, M., Cantarel, B.L., Coutinho, P.M., Henrissat, B., Gordon, J.I., and Knight, R. (2008). "The convergence of carbohydrate active gene repertoires in human gut microbes." PNAS 105:15076.
11 Costello, E.K., Lauber, C.L., Hamady, M., Fierer, N., Gordon, J.I., and Knight, R. (2009). "Bacterial community variation in human body habitats over space and time." Science 326:1694.
12 Smit, S., Yarus, M., and Knight, R. (2006). "Natural selection is not required to explain universal compositional patterns in rRNA structural categories." RNA 12:1-14.
13 Smit, S., Widmann, J., and Knight, R. (2007) "Evolutionary rates vary among rRNA structural elements." Nucleic Acids Res 35:3339-54.
14 Smit, S., Knight, R., and Heringa, J. (2009) "RNA structure prediction from evolutionary patterns of nucleotide composition". NAR 37:1378-86.
15 Knight, R., Birmingham, A. E., and Yarus, M. (2004). "Bayesfold: Rational secondary folds that combine thermodynamic, covariation and chemical data for aligned RNA sequences". RNA 10(9):1323-36.
16 Knight, R. and Yarus, M. (2003). "Finding specific RNA motifs: Function in a zeptomole world?" RNA 9:218-230.
17 Knight, R., De Sterck, H., Markel, R., Smit, S., Oshmyansky, A., and Yarus, M. (2005) "Abundance of correctly folded RNA motifs in sequence space, calculated on computational grids". Nucleic Acids Research 33:5924-35.
18 Legiewicz, M., Lozupone, C., Knight, R. and Yarus, M. (2005). "Size and constant sequences alter selection". RNA 11:1701-9.
19 Lozupone, C., Hamady, M. and Knight, R. (2006). "UniFrac: An Online Tool for Comparing Microbial Community Diversity in a Phylogenetic Context." BMC Bioinformatics 7:371.
20 Lozupone, C., Hamady, M. and Knight, R. (2010). "Fast UniFrac: facilitating high-throughput phylogenetic analyses of microbial communities including analysis of pyrosequencing and PhyloChip data." ISME J 4:7.
21 Resing, K.A., Meyer-Arendt, K.E., Mendoza, A.M., Aveline-Wolf, L.D., Jonscher, K.R., Pierce, K.G., Old, W.M., Cheung, H.T., Russell, S., Wattawa, J.L., Goehle, G.R., Knight, R.D., and Ahn, N.G. (2004). "Improving reproducibility and sensitivity in identifying human proteins by shotgun proteomics." Anal Chem 76(13):3556-68.
22 Ruth, M.C., Old, W.M., Emrick, M.A., Meyer-Arendt, K., Aveline-Wolf, L.D., Pierce, K.G., Mendoza, A.M., Sevinsky, J.R., Hamady, M., Knight, R.D., Resing, K.A., and Ahn, N.G. (2006) "Analysis of Membrane Proteins from Human Chronic Myelogenous Leukemia Cells: Comparison of Extraction Methods for Multidimensional LC-MS/MS". Journal of Proteome Research 5:709-19.
23 Widmann, J., Hamady, M., and Knight, R. (2006). "DivergentSet: Picking non-redundant sequences from large sequence collections." Mol Cell Proteomics 8:1520-1532.
24 Hamady, M., Widmann, J., Copley, S.D., and Knight, R. (2008) "MotifCluster: An interactive online tool for clustering and visualizing sequences using shared motifs." Genome Biol 9:R128.
25 Smit, S., Rother, K., Heringa, J., and Knight, R. (2008) "From knotted to nested RNA structures: a variety of computational methods to objectively untangle RNA pseudoknots." RNA 14:410-6.
26 Knight, R., Maxwell, P., Birmingham, A., Carnes, J., Caporaso, J.G., Easton, B.C., Hamady, M., Liu, Z., Lozupone, C., Sammut, R., Smit, S., Wakefield, M., Widmann, J., Wikman, S., Wilson, S., and Huttley, G.A. (2007) "PyCogent: a toolkit for making sense from sequence." Genome Biology 8:R171.