TY  - GEN
AB  - Genomic structural variations are an important class of genetic variants with a wide variety of functional impacts. The detection of structural variations using high-throughput short-read sequencing data is a difficult problem, and published algorithms do not provide the sensitivity and specificity required in research and clinical settings. Meanwhile, high-throughput sequencing is rapidly generating ever-larger data sets, necessitating the development of algorithms that can provide results rapidly and scale to use cloud and cluster infrastructures. MapReduce and Hadoop are becoming a standard for managing the distributed processing of large data sets, but existing structural variation detection approaches are difficult to translate into the MapReduce framework. We have formulated a general framework for structural variation detection in MapReduce, and implemented a software package called Cloudbreak, which detects genomic deletions and insertions with very high accuracy compared to existing popular tools.
AD  - Oregon Health and Science University
AU  - Whelan, Christopher
DA  - 2014
DO  - 10.6083/M4DJ5CZ2
ED  - Sönmez, Kemal
ED  - Advisor
ID  - 2707
KW  - Computational Biology
KW  - Machine Learning
KW  - Artificial Intelligence
KW  - Genomics
KW  - Genomic Structural Variation
L1  - https://digitalcollections.ohsu.edu/record/2707/files/3482_etd.pdf
L2  - https://digitalcollections.ohsu.edu/record/2707/files/3482_etd.pdf
L4  - https://digitalcollections.ohsu.edu/record/2707/files/3482_etd.pdf
LK  - https://digitalcollections.ohsu.edu/record/2707/files/3482_etd.pdf
N2  - Genomic structural variations are an important class of genetic variants with a wide variety of functional impacts. The detection of structural variations using high-throughput short-read sequencing data is a difficult problem, and published algorithms do not provide the sensitivity and specificity required in research and clinical settings. Meanwhile, high-throughput sequencing is rapidly generating ever-larger data sets, necessitating the development of algorithms that can provide results rapidly and scale to use cloud and cluster infrastructures. MapReduce and Hadoop are becoming a standard for managing the distributed processing of large data sets, but existing structural variation detection approaches are difficult to translate into the MapReduce framework. We have formulated a general framework for structural variation detection in MapReduce, and implemented a software package called Cloudbreak, which detects genomic deletions and insertions with very high accuracy compared to existing popular tools.
PY  - 2014
T1  - Detecting and analyzing genomic structural variation using distributed computing
TI  - Detecting and analyzing genomic structural variation using distributed computing
UR  - https://digitalcollections.ohsu.edu/record/2707/files/3482_etd.pdf
Y1  - 2014
ER  - 