Integrative visual and computational exploratory analysis of genomics data
High-throughput genomics is now shifting from a data generation field to a data analysis field. Rapid advances in sequencing technologies, and their use in large consortium projects such as ENCODE, the 1000 Genomes Project and the Human Epigenome Roadmap, make it possible for biomedical scientists to posit and test hypotheses about complex mechanisms of development and disease by integrating massive publicly available data as context for their own experimental data.
The R/Bioconductor project is a success story in high-throughput genomics data analysis. It has a large software repository, well-established software development and dissemination practices, and an extensive user base. The core project provides infrastructure for leading-edge analysis of a wide range of genomics data, chiefly from high-throughput sequencing and microarrays. Bioconductor is well suited to primary and integrative analysis of, e.g., RNA-seq differential expression, copy number, SNPs and other variants, and methylation and other epigenetic data.
There is a significant opportunity to develop integrative and interactive visualization facilities built on the infrastructure Bioconductor provides. Such tools would be immediately accessible to the large international community of software developers who use Bioconductor to implement analytic methods, and to established and nascent user communities hungry for effective, flexible, statistically informed visualization tools.
Our group has extensive experience developing statistical, computational and visualization tools for genomics data. We also collaborate closely with biomedical researchers on substantive, cutting-edge research, which gives us first-hand knowledge of this community's needs. Our group has consistently demonstrated a commitment to disseminating its tools publicly as open-source software.
In this project we will develop interactive visualization methods and systems that are tightly coupled with computational and statistical modeling and data analysis. We will use this framework to translate and implement cutting-edge methods for visualizing large datasets, and apply them to three important areas of genomics: epigenomics, transcriptomics and metagenomics, all of which hold great promise for understanding human development and disease.
Public Health Relevance
High-throughput genomics is now shifting from a data generation field to a data analysis field. Biomedical scientists need software tools that support flexible data integration and analysis, collaboration and dissemination if the full promise of genomics as a data analysis field is to be realized. We propose to leverage our extensive experience in building computational, statistical and visualization systems to create tools that support exploratory, nimble and creative data analysis workflows in an integrative, reproducible and collaborative environment.