Unlocking sequence data archives with scalable software and resources

March 1, 2017 @ 04:00 pm to 05:00 pm

Weekly Wednesday Wartik Genomics Seminar. Speaker: Benjamin Langmead, Johns Hopkins University

501 Wartik Lab

Abstract: The Sequence Read Archive contains data for over 450K RNA-seq samples, including over 140K from human samples. Large-scale projects like GTEx and ICGC are generating RNA-seq data on many thousands of samples. Such huge and carefully designed datasets are valuable, but unwieldy for typical researchers, especially when access to computational resources is limited. I will describe work toward the goal of making it easy for biological researchers to use the archived RNA-seq data available today. I will highlight the Rail-RNA software (http://rail.bio), its dbGaP-protected version (http://docs.rail.bio/dbgap/), as well as the recount (https://jhubiostatistics.shinyapps.io/recount/) and Snaptron (http://snaptron.cs.jhu.edu) resources. The Rail-RNA software uses the Amazon Web Services commercial cloud to analyze many samples at once. We used Rail-RNA to study tens of thousands of public RNA-seq accessions, yielding new insights about the completeness of existing gene annotations and about how our knowledge of human splicing diversity has evolved over time. I will demonstrate how the recount resource can be used to answer questions about expression and differential expression across tens of thousands of RNA-seq samples, and how the Snaptron API can be used to rapidly answer sophisticated queries against the splicing patterns in recount. Much of this is joint work with Abhinav Nellore, Jeff Leek, Kasper Hansen, Andrew Jaffe and others.

Bio: Ben Langmead is an Assistant Professor of Computer Science at Johns Hopkins University. He earned a Ph.D. in Computer Science from the University of Maryland in 2012. His group seeks to make high-throughput biological datasets easy for biomedical researchers to use. The group studies and applies ideas from sequence alignment, text indexing, statistics and parallel programming. He has released several high-impact software tools (e.g. Bowtie, Bowtie 2) that address common genomics research questions. His paper describing Bowtie won the Genome Biology award for outstanding paper in 2009. He has also released scalable software tools (e.g. Myrna, Rail-RNA) that use the MapReduce parallel programming model and commercial cloud computing services to analyze large collections of sequencing data. Ben's lab also collaborates with biostatisticians and biologists to create resources that allow biological researchers to easily query the huge amount of sequencing data available in public archives (e.g. ReCount, Intropolis). He is the recipient of a Sloan Research Fellowship (2014), a National Science Foundation CAREER award (2014) and the Benjamin Franklin award for contributions to open access (2016).

This event is part of the Weekly Wednesday Genomics Seminar Series. All seminars will be held at 4:00 pm in 501 Wartik Lab unless otherwise noted. Coffee, tea and cookies will be available at 3:45 pm.

To subscribe to the weekly seminar announcement list, send a blank email to l-wwwgls-subscribe-request@lists.psu.edu. To unsubscribe, send a blank email to l-wwwgls-unsubscribe-request@lists.psu.edu.

Contact

Donna McMinn
dlp18@psu.edu
814-867-5973