In Partial Fulfillment of the Requirements for the Degree of
Master of Science
Will defend her thesis
DNA sequencing technologies introduced in this decade have reduced the time needed to sequence a genome to days or weeks. Next-generation sequencing technologies generate a huge amount of data in the form of short DNA sequences called ‘reads’, along with a quality score for each base in the read. Using these reads and quality scores, we can perform genome assembly and mapping and distinguish SNPs (single nucleotide polymorphisms) from sequencing errors. As part of this project, new data structures and algorithms are developed to handle this volume of data and to analyze it. The key steps of the design are:
• Developing data structures for handling sequencing reads of arbitrary lengths.
• Developing data structures for quick access to reads along with the quality scores of each base in the read.
• Incorporating the quality scores of the reads into a mapping algorithm to map sequenced data to reference genomes.
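
The three steps above can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the names ReadStore, mismatch_penalty and best_position are hypothetical, quality strings are assumed to use the common FASTQ Phred+33 encoding, and the scoring rule (summing the Phred scores of mismatching bases) is one simple way to let base quality influence mapping, chosen here only for illustration.

```python
from array import array


class ReadStore:
    """Hypothetical store for variable-length reads and per-base Phred scores."""

    def __init__(self):
        self._bases = bytearray()         # all read bases, concatenated
        self._quals = bytearray()         # Phred scores, parallel to _bases
        self._offsets = array("Q", [0])   # read i spans offsets[i]..offsets[i+1]

    def add(self, seq, qual_ascii, phred_offset=33):
        """Append one read; Phred+33 quality encoding is an assumption."""
        if len(seq) != len(qual_ascii):
            raise ValueError("sequence and quality string differ in length")
        self._bases.extend(seq.encode("ascii"))
        self._quals.extend(ord(c) - phred_offset for c in qual_ascii)
        self._offsets.append(len(self._bases))
        return len(self._offsets) - 2     # index of the read just added

    def __len__(self):
        return len(self._offsets) - 1

    def get(self, i):
        """Return (sequence, list of Phred scores) for read i via two slices."""
        start, end = self._offsets[i], self._offsets[i + 1]
        return self._bases[start:end].decode("ascii"), list(self._quals[start:end])


def mismatch_penalty(seq, quals, reference, pos):
    """Sum the Phred scores of bases that disagree with the reference at pos.

    A mismatch at a low-quality base is more likely a sequencing error,
    so it contributes less to the penalty than a high-quality mismatch.
    """
    window = reference[pos:pos + len(seq)]
    return sum(q for base, q, ref in zip(seq, quals, window) if base != ref)


def best_position(seq, quals, reference):
    """Brute-force scan for the lowest-penalty position (None if read is too long)."""
    candidates = range(len(reference) - len(seq) + 1)
    return min(
        candidates,
        key=lambda p: (mismatch_penalty(seq, quals, reference, p), p),
        default=None,
    )


if __name__ == "__main__":
    store = ReadStore()
    store.add("ACGT", "IIII")
    idx = store.add("ACCTTG", "II#III")   # third base has a low quality score
    seq, quals = store.get(idx)
    print(best_position(seq, quals, "TTACGTTGCA"))
```

Storing all reads in flat buffers with an offset table allows reads of any length and gives constant-time lookup of a read together with its quality scores. The linear scan in best_position is only meant to show where quality scores enter the mapping decision; a practical mapper would typically replace it with an index over the reference.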