This is an edited repost of a year-old article from my blog Genetic Inference. It explains how state-of-the-art Second Generation sequencing works, and how it is being used to sequence thousands of genomes per day. I also try to explain some of the distinctions between First, Second and Third Generation sequencing.
This post follows on from an even older post that explained First Generation sequencing, the tech that was used in the Human Genome Project.
Recap: What are we trying to do?
In a previous post, we saw how DNA is made up of little strings of nucleotides, and we used different shapes to represent the different bases (A = triangle, C = diamond, G = circle, T = pentagon). For instance, circle-diamond-triangle-pentagon is GCAT.
We looked at how the DNA polymerase enzyme can be used to amplify DNA using the Polymerase Chain Reaction, and how we can determine the sequence of DNA using ddNTPs: nucleotides that, when incorporated into DNA, stop the polymerase working.
In First Generation (Sanger) sequencing, we run a PCR in the presence of a bunch of ddNTPs, with the ddNTP for each base dyed a different colour. We then measure the length and colour of the resulting fragments of DNA, and use that to work out the sequence; a bit of DNA 35 bases long ending in a blue ddNTP tells us that the sequence has a “C” at the 35th position.
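To make the length-and-colour readout concrete, here is a minimal Python sketch. It assumes each terminated fragment has already been measured as a (length, dye colour) pair. Only blue = C comes from the text above; the other three colours, and the names `DYE_TO_BASE` and `read_sequence`, are made up for illustration.

```python
# Hypothetical dye colours: only blue = C is taken from the text,
# the other three assignments are assumptions for this sketch.
DYE_TO_BASE = {"green": "A", "blue": "C", "yellow": "G", "red": "T"}


def read_sequence(fragments):
    """Rebuild a sequence from (length, dye_colour) pairs.

    A fragment of length n ending in a given dye tells us the base
    at position n, so sorting by length gives the bases in order.
    """
    ordered = sorted(fragments)  # shortest fragment first
    return "".join(DYE_TO_BASE[colour] for _, colour in ordered)


# Fragments arrive in no particular order; length is what orders them.
measured = [(3, "green"), (1, "yellow"), (4, "red"), (2, "blue")]
print(read_sequence(measured))  # GCAT
```

A real instrument has to deal with noisy signals and missing fragment lengths; this sketch assumes exactly one clean fragment per position.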
The problem with this method is that it requires a lot of space: you need a place to run the reaction, and then you need a capillary tube or a gel to determine the length of the DNA. As a result, you could only run perhaps a hundred of these reactions at any one time. There are 3 billion base pairs of DNA in the human genome, meaning about 6 million 500-base-pair fragments of DNA; it would take a very long time to sequence all of these if you had to do them one hundred at a time.
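The arithmetic behind that claim is easy to check. The sketch below uses the round numbers from the paragraph above and assumes the genome is read exactly once with no overlap between fragments, which is a simplification; the function names are just for illustration.

```python
def fragments_needed(genome_bp, fragment_bp):
    """Fragments needed to cover the genome once, with no overlap."""
    return -(-genome_bp // fragment_bp)  # ceiling division on integers


def batches_needed(fragments, reactions_per_batch):
    """Sequencing runs needed if only a fixed number of reactions fit at once."""
    return -(-fragments // reactions_per_batch)


GENOME_BP = 3_000_000_000   # roughly 3 billion base pairs
FRAGMENT_BP = 500           # length of one Sanger read
REACTIONS_PER_BATCH = 100   # reactions run side by side

fragments = fragments_needed(GENOME_BP, FRAGMENT_BP)
print(fragments)                                       # 6000000
print(batches_needed(fragments, REACTIONS_PER_BATCH))  # 60000
```

So even in this idealised case you are looking at tens of thousands of separate runs to get through one genome.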
Second Generation sequencing techniques overcome this restriction by finding ways to sequence the DNA without having to move it around. You stick the bit of DNA you want to sequence in a little dot, called a cluster, and you do the sequencing there; as a result, you can pack many millions of clusters into one machine. Sequencing a strand of DNA while keeping it held in place is tricky, and requires a lot of cleverness. I’ll explain how Illumina’s Second Generation technology achieves this, as it is the most similar to Sanger sequencing.