Chapter 10

Sequencing, forensic analysis and genetic analysis

Introduction

The beginning of the 21st century has witnessed a revolution in our knowledge of the DNA sequences of various organisms, the most notable example being the sequencing of the human genome. Analysis of genomic DNA will enable us to learn a great deal about evolution, the relationship between different organisms, the mechanisms by which genes are controlled, susceptibility to disease, and the hidden languages within the DNA sequence. The sequencing of an entire genome is not yet, however, a routine technique, and other methods of genetic analysis are used to quickly and effectively analyze DNA samples. These methods, and technologies such as DNA fingerprinting, have also transformed forensic science.

Established methods of DNA sequencing, genetic and forensic analysis all depend on the use of labelled oligonucleotides and/or deoxy- or dideoxy-nucleoside triphosphates, and require a DNA polymerization step. This can be the polymerase chain reaction (genetic analysis in STR analysis), a single nucleotide extension (mini-sequencing in SNP analysis), or a combination of polymerization and DNA chain termination (Sanger sequencing).

The Polymerase Chain Reaction (PCR)

The polymerase chain reaction (PCR) is a technique used widely in molecular biology, diagnostics, forensic science and molecular genetics, to amplify a specific region (the amplicon) of a DNA sample. PCR can amplify a few molecules of a precious DNA sample (e.g. at the scene of a crime) to produce large quantities of DNA, from 50 to over 25 000 base pairs in length.

In PCR, two short oligonucleotides (PCR primers) are designed such that each is complementary to the 3'-end of one of the two target strands at the region to be amplified: the two PCR primers define the amplicon. The region of the template bound by the primers is amplified in a series of cycles (Figure 1). PCR requires a DNA polymerase enzyme. While all organisms contain DNA polymerases, the polymerase that is used in PCR comes from the thermophilic bacterium Thermus aquaticus. This Taq polymerase is heat-resistant, meaning that temperatures of up to 95 °C can be used in PCR, conditions of low DNA duplex stability.
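The way two primers define an amplicon can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the template and the 6-base primers are hypothetical (real primers are typically around 20 bases), matching is exact, and the function names are invented for this example.

```python
from typing import Optional

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def find_amplicon(template: str, forward: str, reverse: str) -> Optional[str]:
    """Return the stretch of the top strand bounded by the two primers.

    The forward primer has the same sequence as the start of the amplicon on
    the top strand. The reverse primer anneals to the top strand, so it is its
    reverse complement that appears at the end of the amplicon.
    """
    start = template.find(forward)
    if start == -1:
        return None
    reverse_site = reverse_complement(reverse)
    end = template.find(reverse_site, start + len(forward))
    if end == -1:
        return None
    return template[start:end + len(reverse_site)]


# Hypothetical 30-base template and 6-base primers, for illustration only.
template = "TTGACCTGAAGCTTACGGATCCATGCAAGT"
amplicon = find_amplicon(template, forward="GACCTG", reverse="GCATGG")
print(amplicon, len(amplicon))
```

The bases outside the two primer sites (TT at one end, AAGT at the other) are not part of the amplicon.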

Figure 1

The polymerase chain reaction (PCR)

In the first cycle, the double stranded target is separated into two single template strands by heating to 95 °C. It is then cooled to 55 °C to allow the synthetic oligonucleotide primers to anneal to the template strands with their 3' ends facing each other. The temperature is then increased to 72 °C, the optimum temperature for activity of the thermostable Taq DNA polymerase. The polymerase utilizes deoxyribonucleoside triphosphates (dNTPs) to extend the primers along the length of the template producing two new double strands of DNA (Figure 3).

Figure 2

Structures of standard deoxynucleoside triphosphates (dNTPs)

Figure 3

The PCR cycle A single cycle of the polymerase chain reaction.

The second cycle of PCR is a repeat of the first cycle, and each newly synthesized single strand also acts as a template for primer annealing and extension. The polymerase can only extend the DNA as far as the locus of the first primer, producing DNA duplexes of a specific length. In all subsequent cycles amplification produces PCR products of a length specified by the loci of the two primers, and these PCR products soon outnumber the original target molecules. In theory, n cycles of PCR will produce 2ⁿ PCR products.
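The theoretical 2ⁿ yield is easy to check numerically. The sketch below adds an `efficiency` parameter, which is an assumption of this example and not part of the text: a value of 1.0 corresponds to perfect doubling in every cycle, and lower values model a reaction in which only a fraction of templates is copied per cycle.

```python
def pcr_copies(initial_copies: int, cycles: int, efficiency: float = 1.0) -> float:
    """Theoretical number of copies after a given number of PCR cycles.

    efficiency = 1.0 means every template is copied in every cycle (doubling).
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


print(pcr_copies(1, 30))        # perfect doubling from a single molecule
print(pcr_copies(1, 30, 0.9))   # hypothetical 90 % efficiency per cycle
```

With perfect doubling, 30 cycles turn a single molecule into just over a billion copies; even a modest drop in per-cycle efficiency reduces the final yield substantially.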

Invented in the 1980s by Kary Mullis, the polymerase chain reaction has had such a pervasive effect on molecular biology, DNA diagnostics and forensic science that Mullis was awarded the Nobel Prize in Chemistry in 1993.

Primer dimer formation

Incorrect amplicons are sometimes generated in PCR owing to primer dimer formation, in which the PCR primers hybridize, and are amplified instead of the template DNA (Figure 4). Primer dimers are most likely to form at the beginning of PCR amplification, when the primers are present in high concentrations relative to the template. Primer dimer formation can occur even if there are several mismatches in the primer-dimer duplexes, which are stabilized by binding to Taq polymerase. The formation of primer dimers is a particular problem when several PCR reactions are carried out in the same tube (multiplex PCR).

Figure 4

Primer dimer formation The formation and amplification of primer dimers, instead of amplification of the DNA template, can have a detrimental effect on PCR.

An obvious step in reducing primer dimer formation is to design the primers so that they have low self-complementarity, but this is not always possible, and even weak interactions can result in amplification of primer dimers. "Hot-start" PCR has been developed to alleviate primer dimer formation. In hot-start PCR the reaction mixture initially lacks a crucial component, such as the Taq polymerase, or Mg²⁺ (which the Taq polymerase requires for activity). Prolonged heating at the beginning of the first cycle ensures that all primer dimers are denatured, then the missing component is added. PCR can now continue as normal. The major disadvantage of this technique is the possibility of contamination during the addition of the essential component to the open reaction tube. An alternative involves the use of a noncovalent inhibitor of Taq polymerase (e.g. a peptide or antibody) that dissociates on heating. In another method, an essential component of the PCR reaction is enclosed in a wax pellet, which melts upon heating, liberating its contents.
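Screening primers for self-complementarity can be approximated in code. The following sketch checks only for a perfect overlap between the 3'-ends of two primers, which is one common route to primer dimers; as noted above, dimers can also form with mismatches, so this is a deliberately simplified test. The primer sequences and function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def three_prime_overlap(primer_a: str, primer_b: str) -> int:
    """Length of the longest perfect duplex formed by the two 3'-ends.

    The last k bases of primer_a pair with the last k bases of primer_b when
    they equal the reverse complement of those bases. Each k is a different
    alignment register, so every k is tested independently.
    """
    rc_b = reverse_complement(primer_b)
    best = 0
    for k in range(1, min(len(primer_a), len(primer_b)) + 1):
        if primer_a[-k:] == rc_b[:k]:
            best = k
    return best


# Hypothetical primers that both end in the self-complementary site GAATTC.
primer_1 = "AGGTCACTGAATTC"
primer_2 = "CCTAGTGAATTC"
print(three_prime_overlap(primer_1, primer_2))
```

A long 3'-overlap such as the one in this example would be a reason to redesign one of the primers.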

DNA sequencing

Modern DNA sequencing has been made possible by methods developed by Fred Sanger in Cambridge in the 1970s. Sanger developed a novel method of DNA sequencing (the dideoxy method) for which he was awarded his second Nobel Prize, in 1980.

Sanger (dideoxy) DNA sequencing

Sanger's dideoxy method of DNA sequencing was the first method that was used routinely for sequencing of DNA in the laboratory. The following components are required for Sanger sequencing:

A DNA template to be sequenced
An oligonucleotide primer labelled at the 5'-end with ³²P
A DNA sequencing polymerase
Four deoxynucleoside triphosphates: dATP, dGTP, dCTP, dTTP
Four dideoxy nucleoside triphosphates (nucleoside triphosphates lacking both 2'- and 3'-hydroxy groups): ddATP, ddGTP, ddCTP, ddTTP (Figure 5)

Sanger sequencing is a modified form of DNA replication. The primer hybridizes to a specific locus on the template and the polymerase binds and incorporates nucleotides to assemble a reverse complementary copy of the template. On its own, this process would not provide any information on the sequence of the template, so four sequencing reactions are carried out in separate tubes. To each tube is added a small quantity of the key ingredient, a 2',3'-dideoxy nucleoside triphosphate: ddATP to tube 1, ddGTP to tube 2, ddCTP to tube 3 and ddTTP to tube 4.

Figure 5

Structures of dideoxy nucleoside triphosphates (ddNTPs)

The polymerase enzyme does not discriminate between deoxynucleoside triphosphates (dNTPs) and dideoxynucleoside triphosphates (ddNTPs), so either can be added at each step. If a dNTP is added, the DNA chain will continue to grow; if a ddNTP is added, the DNA chain will terminate, as it has no 3'-hydroxyl group to react with an incoming nucleoside triphosphate: no further nucleosides can be added. The result in each tube is a mixture of oligonucleotides of different lengths, all terminated with a particular ddNTP: in tube 1 all the terminations will be at A, in tube 2 at G, in tube 3 at C, and in tube 4 at T. The oligos can then be separated according to their size by electrophoresis. If all four ladders are run side by side on a polyacrylamide gel, and the gel is exposed to a photographic film, the ³²P-labelled fragments will produce an image that can be used to read the DNA sequence (which will be the reverse-complement of the template) (Figure 6). In practice it is possible to sequence around 300 bases of DNA by this method.
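The logic of reading a sequence from four ladders can be illustrated with a small simulation. This sketch is idealized: it assumes that termination occurs at every possible position in every tube, it ignores band intensities, and the template and primer length are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def sanger_ladders(template: str, primer_len: int) -> dict:
    """Fragment lengths expected in each of the four ddNTP tubes.

    The new strand is the reverse complement of the template; its first
    primer_len bases are the primer. A fragment of length i + 1 appears in
    the tube whose ddNTP matches base i of the new strand.
    """
    new_strand = reverse_complement(template)
    ladders = {"A": [], "C": [], "G": [], "T": []}
    for i in range(primer_len, len(new_strand)):
        ladders[new_strand[i]].append(i + 1)
    return ladders


def read_gel(ladders: dict) -> str:
    """Read the bands from shortest to longest across all four lanes."""
    bands = sorted(
        (length, base) for base, lengths in ladders.items() for length in lengths
    )
    return "".join(base for _, base in bands)


# Hypothetical 12-base template with a 4-base primer, for illustration only.
ladders = sanger_ladders("ACGTTGCAAGCT", primer_len=4)
print(ladders)
print(read_gel(ladders))
```

The sequence read from the gel is that of the newly synthesized strand beyond the primer, i.e. the reverse complement of the template.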

Figure 6

Sanger (dideoxy) sequencing

Fluorescence-based dideoxy-DNA sequencing

In the automated high-throughput fluorescent version of Sanger sequencing, an unlabelled oligonucleotide primer is used, along with a thermostable DNA polymerase, four normal deoxynucleoside triphosphates, and four dideoxy nucleoside triphosphates with different fluorescent labels on them (Figure 7).

Now only one sequencing reaction is necessary because termination in ddA gives the DNA fragment a particular fluorescent colour, ddG a different colour, ddC a third colour and ddT a fourth colour. The nature of the fluorescent dyes depends upon the DNA sequencer used, but the basic requirement is four dyes with well-resolved fluorescence emission spectra. A common system uses FAM, JOE, TAMRA and ROX as the four dyes.

Dye	Max. absorption wavelength / nm	Max. emission wavelength / nm	Colour
FAM	495	520	blue
JOE	530	555	green
TAMRA	550	575	yellow
ROX	580	605	red

The fragments are separated by electrophoresis and the fluorescent dyes are excited by a laser. The gel image can then be analyzed by a computer and a DNA sequence is produced. Over 800 bases can be read off in a single gel lane. Automated DNA sequencers can analyze 96 different lanes on a single gel (i.e. around 76 800 bases from one gel) and can analyze 3 gels a day, giving a throughput of around 230 400 bases per day or more than 50 000 000 bases a year. Machines have been developed to analyze 384 sequencing reactions simultaneously. Other recent innovations include the development of dideoxy nucleoside triphosphates labelled with two fluorescent dyes ("Big Dye chemistry"). Excitation of one dye (usually fluorescein) at its λ_max of 495 nm results in emission at 520 nm which is transferred by FRET to the second dye which has a λ_max close to 520 nm. The second dye fluoresces strongly at a higher wavelength. This produces a stronger fluorescent signal than would be obtained by direct excitation of the second fluorescent dye at 495 nm. Advances have also been made in gel technology. The use of capillary gels rather than flat bed gels has facilitated automation of sample loading and analysis, providing even higher throughput.
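The throughput figures quoted above follow from simple multiplication, shown here as a worked check. The text does not state how many days per year a sequencer runs, so the last line only computes how many running days at this rate would be needed to pass 50 000 000 bases; that framing is an assumption of this example.

```python
import math

BASES_PER_LANE = 800   # read length per lane, from the text
LANES_PER_GEL = 96
GELS_PER_DAY = 3

bases_per_gel = BASES_PER_LANE * LANES_PER_GEL
bases_per_day = bases_per_gel * GELS_PER_DAY

# Number of running days needed to exceed 50 million bases at this rate.
days_for_50_million = math.ceil(50_000_000 / bases_per_day)

print(bases_per_gel, bases_per_day, days_for_50_million)
```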

Figure 7

Fluorescent Sanger (dideoxy) sequencing

Next-generation sequencing

The Human Genome Project achieved the complete sequencing of the 3 billion base pairs of the human genome, as a result of a huge collaborative effort between 1990 and 2003. The Human Genome Project relied on (albeit highly optimized and automated) Sanger sequencing. It is now possible to sequence an entire human genome in a matter of days, thanks to a new generation of sequencing technologies known collectively as next-generation DNA sequencing.

Sequencing modified DNA

Existing DNA sequencing methods (including next-generation sequencing) are not able to detect modified bases. With the recent surge in interest in epigenetics, the failure to distinguish between cytosine and 5-methylcytosine (both of which form Watson-Crick base pairs with guanine) is a serious drawback of current sequencing technologies.

Bisulfite sequencing

Bisulfite (HSO₃⁻) deaminates unmethylated cytosine to uracil, but does not react with methylcytosine (Figure 8). This provides a method for sequencing DNA containing 5-methylcytosine bases. The DNA is sequenced before and after bisulfite treatment: any change from cytosine to uracil is ascribed to unmethylated cytosine, while cytosine bases that remain after bisulfite treatment are assumed to be methylated in the original sample.

Figure 8

Bisulfite conversion of cytosine to uracil Bisulfite (HSO₃⁻) converts unmethylated cytosine to uracil, but does not convert methylcytosine to thymine.

Figure 9

Bisulfite sequencing Sequencing of DNA before and after treatment with bisulfite (HSO₃⁻), which deaminates unmethylated cytosine bases to uracil, allows the methylation state of a DNA sample to be determined.

Commercially available bisulfite conversion kits make the procedure routine, but it is nevertheless costly: care must be taken that all unmethylated cytosines are deaminated, and each DNA sample must (of course) be sequenced twice.
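The before-and-after comparison used in bisulfite sequencing can be mimicked in code. This sketch assumes complete conversion of every unmethylated cytosine and works on a single strand; the sequence and the methylated position are hypothetical. Uracil is written as U to follow the text, although after amplification it would be read as T.

```python
def bisulfite_convert(seq: str, methylated: set) -> str:
    """Convert every unmethylated C to U.

    `methylated` holds the 0-based positions of 5-methylcytosine, which
    bisulfite leaves unchanged.
    """
    return "".join(
        "U" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq)
    )


def call_methylation(before: str, after: str) -> list:
    """Positions where a C survives treatment and is inferred to be methylated."""
    return [
        i for i, (b, a) in enumerate(zip(before, after)) if b == "C" and a == "C"
    ]


# Hypothetical sequence in which only the C at position 1 is methylated.
original = "ACGTCCGA"
treated = bisulfite_convert(original, methylated={1})
print(treated)
print(call_methylation(original, treated))
```

If conversion were incomplete, unconverted cytosines would be miscalled as methylated, which is why complete deamination matters.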

DNA Fingerprinting

DNA fingerprinting was invented in the 1980s by Sir Alec Jeffreys at the University of Leicester. Human DNA can be analyzed by this method to identify individuals at the genetic level with a far greater degree of certainty than previous forensic methods such as blood group determination or traditional fingerprint analysis. DNA fingerprinting (also called DNA profiling, DNA typing or genetic fingerprinting) can also be used to determine the relationship between individuals (e.g. paternity testing).

DNA fingerprinting quickly became the technological basis of the UK National DNA database, which contains millions of personal profiles. It is now a major resource that is used routinely in the detection and conviction of criminals, and produces hundreds of matches to DNA found at crime scenes every week. The analysis of short tandem repeats (STRs) forms the basis of forensic DNA profiling systems used throughout the world.

Short tandem repeats

The DNA of two randomly chosen humans differs by around 1 in 1000 bases; in other words, we are 99.9% the same. It is this similarity that makes the sequencing of the "reference" human genome relevant to all of us. Certain regions of our DNA contain more differences than others, and short tandem repeats are an example of a region of DNA that exhibits large variations between individuals.

DNA fingerprinting depends on the analysis of short tandem repeats (STRs), short repeating patterns of two or more nucleotides (e.g. (CA)_n or (ACGT)_n, where n is several hundred). For example, in the sequence CGTCAGCACACACACACACACACACACACACATGGCGTG, the dinucleotide CA is repeated 13 times (n = 13).
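Counting the repeats in a sequence like the one above is a short exercise with a regular expression. In this sketch the sequence is built programmatically from the flanking bases of the example and 13 copies of CA, so that the repeat number is explicit; the function name is invented for illustration.

```python
import re


def longest_tandem_run(seq: str, unit: str) -> int:
    """Number of repeats in the longest uninterrupted run of `unit`."""
    runs = re.findall(f"(?:{re.escape(unit)})+", seq)
    return max((len(run) // len(unit) for run in runs), default=0)


# Flanking bases from the example above, with the CA repeat written as 13 copies.
sequence = "CGTCAG" + "CA" * 13 + "TGGCGTG"
print(longest_tandem_run(sequence, "CA"))
```

The isolated CA inside the left-hand flank (TCAG) is found as a run of one and is ignored because only the longest run is reported.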

Tens of thousands of different short tandem repeats, or microsatellites, have been identified in the human genome. STRs are observed at the same positions on chromosomes (loci) in different members of the population, but the number of repeats (n) varies between individuals. This variation in number of repeats is an example of polymorphism.

STR Analysis

STR analysis uses PCR to measure the number of repeats at specific loci. Primers bind to the DNA at specific STR loci and are extended by PCR. The length of the PCR product depends on the number of repeats. If the PCR primers are labelled, the PCR products will be labelled, allowing the products to be detected at the end of the reaction. For each STR locus, there will be two PCR products (one for each of the two alleles), or a single product if both alleles carry the same number of repeats.
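The dependence of product length on repeat number can be made explicit. In this sketch the product is taken to be a fixed flanking length (primers plus any non-repeat sequence between them) plus the repeat unit length multiplied by the number of repeats. All numbers are hypothetical.

```python
def str_product_length(flank_bp: int, unit_bp: int, repeats: int) -> int:
    """Length of the PCR product for one allele of an STR locus."""
    return flank_bp + unit_bp * repeats


def band_lengths(flank_bp: int, unit_bp: int, alleles: tuple) -> list:
    """Distinct band lengths for an individual's two alleles at one locus."""
    return sorted(str_product_length(flank_bp, unit_bp, n) for n in set(alleles))


# Hypothetical tetranucleotide locus with 120 bp of flanking sequence.
print(band_lengths(120, 4, (8, 11)))  # two different alleles: two bands
print(band_lengths(120, 4, (9, 9)))   # same allele twice: one band
```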

The simultaneous analysis of multiple different STR loci enables a unique profile of an individual to be built up. Several PCR reactions are carried out simultaneously in a single tube at different STR loci, giving several products (two for each locus). The following components are required:

A DNA sample, e.g. a single human hair from the scene of a crime, or buccal cells from a mouth scrape of a suspect
Two oligonucleotide PCR primers: one primer labelled at the 5'-end with ³²P, and one unlabelled reverse primer
A thermostable DNA polymerase
Four deoxynucleoside triphosphates: dATP, dGTP, dCTP, dTTP.

When the labelled PCR products are run on a polyacrylamide gel, they separate according to size. The result is a "DNA ladder" that is characteristic of an individual (Figure 10).

Figure 10

DNA Fingerprinting by STR analysis Short tandem repeats (STRs), di-, tri- or tetranucleotide units that are repeated several times, exist in everyone's genome. When amplified by PCR using labelled primers, labelled DNA fragments of different lengths are generated. These fragments, separated by gel electrophoresis or capillary electrophoresis, give a unique DNA "barcode" of an individual.

The use of multiple loci provides a very high degree of certainty that no two individuals in a population will have the same profile (unless they are identical twins). Some current forensic systems use 10 (e.g. United Kingdom) or 13 (e.g. United States) STR loci. Kits containing PCR primers for the standard STR loci are sold commercially.
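Why additional loci increase certainty can be seen from a rough calculation. The sketch below assumes that loci are inherited independently and that genotype frequencies follow the Hardy-Weinberg proportions (p² for two copies of the same allele, 2pq for two different alleles); neither assumption is stated in the text, and the allele frequencies are invented for illustration.

```python
from math import prod
from typing import Optional


def genotype_frequency(p: float, q: Optional[float] = None) -> float:
    """Expected frequency of a genotype under Hardy-Weinberg proportions.

    Pass one allele frequency for a homozygote, two for a heterozygote.
    """
    return p * p if q is None else 2.0 * p * q


def match_probability(genotypes: list) -> float:
    """Probability that a random individual matches at every locus,
    assuming the loci are independent."""
    return prod(genotype_frequency(*alleles) for alleles in genotypes)


# Hypothetical allele frequencies at three loci.
profile = [(0.10, 0.20), (0.15,), (0.05, 0.30)]
print(match_probability(profile))
```

Each extra locus multiplies the match probability by another small number, so a profile built from 10 or 13 loci quickly becomes extremely rare in the population.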

Fluorescent STR analysis

In a more modern variant of STR analysis, the PCR primers are labelled with fluorescent dyes. Primers for different STR loci are labelled with different fluorescent dyes, adding a second dimension to the assay (Figure 11). As it has so far been possible to develop only a limited number of fluorescent dyes with well-resolved spectral characteristics, three different fluorescent dyes are typically used.

Figure 11

Fluorescent STR analysis In fluorescent STR analysis, the PCR primers are fluorescently labelled, which leads to fluorescently-labelled PCR products. The use of different fluorescent labels means that the products and bands originating from different STR loci can be more easily distinguished.

Paternity testing

For each STR locus, an individual inherits one allele from each parent (the same allele may be inherited from both parents). Therefore, half of the bands in a child's DNA profile are inherited from the mother, and half from the father. If a child's DNA profile contains bands that are from neither the mother nor the supposed father, then the supposed father is not the child's natural father (Figure 12).
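The exclusion logic can be written as a simple check. This sketch requires that, at every locus, one of the child's alleles can be assigned to the mother and the other to the alleged father. It ignores the possibility of mutation and of typing errors, which real casework has to consider, and the locus names and repeat numbers are hypothetical.

```python
def is_excluded(child: dict, mother: dict, alleged_father: dict) -> bool:
    """Return True if the alleged father cannot be the biological father.

    Each profile maps a locus name to a pair of repeat numbers (two alleles).
    """
    for locus, (a, b) in child.items():
        maternal = set(mother[locus])
        paternal = set(alleged_father[locus])
        consistent = (a in maternal and b in paternal) or (
            b in maternal and a in paternal
        )
        if not consistent:
            return True
    return False


# Hypothetical profiles at three loci.
child = {"locus1": (12, 14), "locus2": (9, 9), "locus3": (17, 20)}
mother = {"locus1": (12, 15), "locus2": (9, 11), "locus3": (17, 18)}
man_a = {"locus1": (14, 16), "locus2": (9, 10), "locus3": (20, 21)}
man_b = {"locus1": (13, 16), "locus2": (9, 10), "locus3": (20, 21)}

print(is_excluded(child, mother, man_a))
print(is_excluded(child, mother, man_b))
```

Failing to exclude a man does not prove paternity; it only means that the profiles are consistent with it.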

Figure 12

STR analysis in a parentage dispute A simplified example of the use of STR (short tandem repeat) analysis in a parentage dispute (using 3 STR loci). Half of the STR bands come from the mother and half from the father. Some bands are inherited from both the mother and the father. If a child's DNA profile contains bands that are present in neither the mother nor the supposed father's DNA profiles, then the supposed father is not the child's natural father. Should the alleged father in this example be instructed to pay child support?

Single nucleotide polymorphisms (SNPs)

Mutations occur in DNA as a result of mistakes in DNA replication and chemical damage. These mutations are potentially damaging to an organism, and a panel of DNA repair mechanisms exist to reverse them. Polymorphisms are variations in a DNA sequence that are established and stable in the population, and are not hazardous. There are two or more equally acceptable alternatives, one of which happens to be more common than the others. In general, if a variation has a frequency of 1 per cent or more in the population, it is a polymorphism rather than a mutation.

Single nucleotide polymorphisms (SNPs, pronounced snips) are the most common type of polymorphism in the human genome. One of two specific bases (alleles) will occur at a SNP site; for example, A will occur in some members of the population, and G in others. SNPs occur about once every 1000 base pairs, making up the bulk of the 3 × 10⁶ variations in the genome, and tend to remain stable in the population. SNPs occur in genes and also in the surrounding regions of the genome that control gene expression.

The effect of one allele of a particular SNP on a gene may not be large - perhaps influencing the activity of the encoded protein in a subtle way - but even subtle effects can influence susceptibility to common diseases, such as coronary thrombosis or Alzheimer's disease. By studying large numbers of SNPs in known sufferers of a disease it is possible to determine whether the disease has a genetic link. In such cases, the larger the number of individuals studied, the greater the likelihood of a genetic link emerging. Once such a link is found, the susceptibility of any individual in the population to contracting the disease can be predicted. SNP analysis is likely to play an increasingly influential role in medicine and diagnostics in the future; for example, in predicting an individual's reaction to a drug, so that the appropriate treatment can be prescribed (pharmacogenetics).

In many SNPs the two alleles have about the same frequency in the population, i.e. if the two bases that can occur at a particular locus are A and G, half the population will have adenine and the other half will have guanine. Clearly there has been no evolutionary pressure to favour one or other of the alleles at such loci. These kinds of SNPs are very useful in genetic analysis. In contrast, if an allele occurs at a very low level, the SNP is much less useful as an analytical tool, because it will be encountered rarely. Consequently, in DNA that has been pooled from a large number of individuals (e.g. from a group of people suffering from heart disease), it will be difficult to detect the presence of the minor allele, even if it is over-represented in this group relative to a group that is not suffering from the disease.

The analysis of single nucleotide polymorphisms can also provide information on physical characteristics. An example of this is the first commercial test for eye colour (Retinome™). In addition to eye colour, SNPs have been linked with other features of human phenotype such as hair and skin colour. Analysis of a large number of phenotypic SNPs is equivalent to building an "identikit" picture of an individual, and is therefore a powerful accompaniment to STR analysis for forensic investigators. The challenge in analyzing SNPs is equivalent to that of analyzing DNA for single point mutations, as the phenomena of polymorphism and mutation are related.

DNA diagnostics and mutation detection

DNA diagnostics involves identification and analysis of a region of genomic or mitochondrial DNA associated with a particular disease. Often this DNA sequence will be responsible for expression of a faulty protein, but it could also be a region that controls gene expression. When such a DNA sequence has been identified, its presence (or absence) in the DNA sample of an individual must be determined. DNA screening for mutations depends upon the development of sensitive, rapid, accurate and economical procedures, in which PCR amplification is used in conjunction with an appropriate probe technology (real-time PCR).

DNA probe-based technologies can be used to detect differences as small as single nucleotide substitutions, insertions and deletion mutations. Such point mutations can cause genetic diseases such as sickle cell anaemia, cystic fibrosis, phenylketonuria and Huntington's disease. Genetic analysis is important in pre- and postnatal diagnosis, and genetic information can also be used to predict the susceptibility of individuals to exogenous risks such as dietary or environmental factors. Infectious diseases such as measles, rubella, HIV, Hepatitis A and B, and diseases caused by pathogenic organisms such as Salmonella and Candida, can be diagnosed using DNA probe technologies. Some oligonucleotide probe-based technologies are sufficiently selective to differentiate between closely related organisms/viruses such as the type 1 and 2 variants of Herpes simplex virus (HSV-1 and HSV-2).

Real-time PCR

Real-time PCR is a variation on the PCR theme that combines normal PCR amplification of DNA with simultaneous detection of the PCR product, usually in a single reaction tube. In PCR, the amount of double-stranded DNA increases with each cycle. After multiple cycles of PCR, there is a large increase in the amount of DNA. In real-time PCR (also called quantitative PCR, or qPCR), an agent that binds to double-stranded DNA is added to the PCR reaction. As double-stranded DNA is produced, the agent binds to the newly-synthesized DNA, and produces a signal allowing the reaction to be monitored in real time. While the aim of PCR is the amplification of DNA, the purpose of real-time PCR is the analysis of a DNA sample or reaction.
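Monitoring the reaction in real time makes it possible to estimate how much template was present at the start, because a sample with more starting template reaches a detectable signal in fewer cycles. The sketch below illustrates this with a "threshold cycle", a standard qPCR term that is not used in the text above. It is a simplification: it tracks copy numbers directly, whereas an instrument measures fluorescence, and the threshold, efficiency and starting quantities are hypothetical.

```python
from typing import Optional


def threshold_cycle(
    initial_copies: float,
    threshold_copies: float,
    efficiency: float = 1.0,
    max_cycles: int = 40,
) -> Optional[int]:
    """First cycle at which the product reaches the detection threshold.

    Returns None if the threshold is not reached within max_cycles.
    """
    copies = float(initial_copies)
    for cycle in range(1, max_cycles + 1):
        copies *= 1.0 + efficiency
        if copies >= threshold_copies:
            return cycle
    return None


# Hypothetical detection threshold of one billion copies.
print(threshold_cycle(1000, 1e9))  # more starting template: earlier signal
print(threshold_cycle(10, 1e9))    # less starting template: later signal
```

With perfect doubling, a hundredfold difference in starting template shifts the threshold cycle by about seven cycles.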

Fluorescent real-time PCR is a combination of PCR amplification and fluorescence detection. In its simplest form, fluorescent real-time PCR involves the use of an organic dye that is fluorescent only when bound to a DNA duplex. When such a dye is added at the beginning of a PCR reaction an increase in fluorescence occurs as the number of DNA duplexes increases, and this is indicative of successful PCR. SYBR Green (Figure 13) is an example of a molecule that binds to double-stranded DNA and becomes fluorescent on binding (the dsDNA-dye complex is fluorescent).

Figure 13

SYBR Green Structure of SYBR Green I, a fluorescent molecule that binds to double-stranded DNA.

The SYBR-Green real-time PCR method has severe limitations as it is non-specific, i.e. a positive result is obtained regardless of the nature of the PCR product. As PCR is prone to artefacts such as primer-dimer formation, simple amplification using unselective dyes is not always very informative, and probe-based methods provide more meaningful results.

Probe-based real-time PCR

When using PCR in human diagnostics it is important to be certain of the precise nature of the product. Identification of a key sequence in the PCR product (the amplicon) can be achieved by adding a fluorogenic DNA probe (a short synthetic oligonucleotide that is complementary to a specific sequence in the PCR amplicon, and does not fluoresce unless it binds to the amplicon) to the PCR reaction. Several different types of probes fulfil these criteria, and these are discussed in detail below.

When a DNA probe is used in real-time PCR, a positive signal is obtained only if the PCR amplicon contains the complementary sequence to the fluorogenic probe: the fluorescent signal is sequence-specific. For example, clinical samples can be tested for genes that cause cystic fibrosis as follows: a probe that is complementary to a mutated region of the cystic fibrosis gene (the target) is synthesized, and PCR is performed on the sample in the presence of this probe. During the course of PCR amplification the number of DNA molecules (PCR amplicons) increases. If the mutation is present, the concentration of probe-target hybrids also increases, and the fluorescence signal grows in a predictable manner, indicating that the mutated gene is indeed present. However, if no fluorescent signal is generated, this is indicative that the patient does not carry that specific mutation. In a real clinical application both mutant and wild-type probes are used in parallel. A person suffering from the disease will give a positive signal from the mutant probe, while an unaffected person will give a positive signal from the wild-type probe. A carrier of the disease might give a positive signal from both probes.
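The interpretation of a two-probe assay can be summarized as a small decision function. The three outcomes described above are taken from the text; the reading of them as homozygous or heterozygous genotypes, and the fourth case in which neither probe gives a signal, are assumptions added here for completeness.

```python
def call_genotype(mutant_signal: bool, wild_type_signal: bool) -> str:
    """Interpret the signals from parallel mutant and wild-type probes."""
    if mutant_signal and wild_type_signal:
        return "carrier"
    if mutant_signal:
        return "affected"
    if wild_type_signal:
        return "unaffected"
    # Neither probe fluoresces: assumed here to mean the assay failed.
    return "no result"


for signals in [(True, False), (False, True), (True, True), (False, False)]:
    print(signals, call_genotype(*signals))
```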

Specialized equipment is necessary for fluorogenic real-time PCR and a number of instruments have been designed for this purpose. They all consist of a thermal cycler to drive the PCR reaction, a light source to excite the fluorescent dye(s), and a fluorescence detector, together with a computer to control the instrument and process the data.

In general, "fluorogenic" probes contain a fluorescent dye and a fluorescence quencher. They are non-fluorescent in the absence of a target nucleic acid because the quencher absorbs energy from the excited fluorophore, and this energy is dissipated as heat or as radiation at a longer wavelength. There are two separate mechanisms for fluorescence quenching: collisional quenching and FRET quenching.

The TaqMan assay

The TaqMan assay is the most widely used real-time method for the analysis of PCR products, and is used extensively in SNP analysis and mutation detection. A TaqMan probe consists of an oligonucleotide labelled with a fluorophore at one end, e.g. 5'-FAM (5'-fluorescein), and a fluorescence quencher at the other, e.g. 3'-TAMRA. Excitation of fluorescein at its absorption wavelength of 495 nm would normally lead to fluorescence emission at 525 nm. However, this falls within the broad absorption spectrum of the TAMRA dye, which is in close proximity in the TaqMan probe, so energy is absorbed by the TAMRA dye owing to fluorescence resonance energy transfer (FRET) and fluorescence is observed at the emission wavelength of TAMRA (585 nm) rather than at the emission wavelength of FAM.

The TaqMan method is described in Figure 14. In each cooling cycle, prior to the extension phase, the TaqMan probe hybridizes to its complementary sequence in the PCR amplicon. When Taq polymerase encounters the TaqMan probe during polymerization, the 5' to 3' exonuclease activity of the enzyme leads to digestion of the probe. This separates the fluorescence donor (FAM) from the acceptor (TAMRA) and excitation at 495 nm now leads to fluorescence emission by FAM at 525 nm. No energy transfer from FAM to TAMRA can now take place, as the two fluorescent dyes are too far apart. Therefore the TAMRA dye does not fluoresce when the TaqMan probe has been digested. Overall, an increase in fluorescence emission at 525 nm and a decrease at 585 nm is indicative of a positive PCR reaction, and importantly proves the presence of the correct amplicon. Other fluorescent dyes can be used in TaqMan probes provided they have suitable fluorescence emission and absorption spectra.

Figure 14

The TaqMan assay

Fluorescence resonance energy transfer (FRET)

The TaqMan assay utilizes fluorescence resonance energy transfer (FRET), a powerful tool that has been used extensively as a spectroscopic "ruler" to examine DNA secondary structure. In FRET, an excited fluorophore (energy donor) transfers its energy to a neighbouring chromophore or fluorophore (acceptor) non-radiatively through induced dipole-dipole interactions when the dipoles are in approximately parallel orientations. For FRET to occur there must be an overlap between the emission spectrum of the donor and the absorption spectrum of the acceptor. The rate of energy transfer between the donor and the acceptor is inversely proportional to the sixth power of the distance between the two fluorophores (1 / r⁶), so the efficiency of transfer falls off steeply with distance. The optimum distance (r) for non-radiative transfer of energy is between 10 and 100 Å for most common fluorophores.
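The steep distance dependence can be illustrated with the standard Förster expression for transfer efficiency, E = 1 / (1 + (r / R₀)⁶), where R₀ is the donor-acceptor distance at which half of the excitation energy is transferred. R₀ is not given in the text; the value of 50 Å used below is a hypothetical figure chosen to fall within the 10–100 Å range quoted above.

```python
def fret_efficiency(r: float, r0: float) -> float:
    """Fraction of donor excitation energy transferred to the acceptor.

    r and r0 must be in the same units (here ångströms).
    """
    return 1.0 / (1.0 + (r / r0) ** 6)


R0 = 50.0  # hypothetical Förster distance in ångströms

for distance in (25.0, 50.0, 100.0):
    print(distance, round(fret_efficiency(distance, R0), 3))
```

Halving the distance from R₀ takes the efficiency from one half to almost complete transfer, while doubling it reduces transfer to under 2 %, which is why cleaving a TaqMan probe or opening a hairpin switches the signal so cleanly.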

Figure 15

Mechanism of fluorescence quenching

Molecular Beacons

A Molecular Beacon is a chemically modified oligonucleotide that adopts a stem-loop structure in the absence of a complementary target sequence. The loop consists of a probe that is designed to hybridize to a specific sequence in the amplicon during PCR, and the stem is constructed of two short complementary oligonucleotides, one labelled with a fluorophore and the other with a non-fluorescent quencher. In the stem-loop form, the fluorophore and quencher are held in very close proximity by the short duplex structure, and, if the fluorophore is excited, energy is absorbed by the quencher and is dissipated as heat (non-radiative energy transfer). This is the "closed" or "dark" state of the Molecular Beacon (Figure 16, top).

In each cycle of PCR, during the denaturation step, the stem-loop of the Molecular Beacon opens (i.e. the base pairs in the stem dissociate or "melt"). During the subsequent cooling (annealing) phase, the loop of the Molecular Beacon hybridizes to its complementary sequence in the amplicon, to form a short duplex. In this form (the "open" form) the stems are too far apart to associate, the fluorophore and quencher are kept apart, and irradiation produces a fluorescent signal (Figure 16, bottom). During real-time PCR, fluorescence must be measured at a temperature at which the stem-loop form of the Beacon is stable (normally around 50 °C), so that the unhybridized probes do not produce undesirable background fluorescence.

Figure 16

Molecular Beacons

Originally, Molecular Beacons were designed with the fluorophore 5-(2'-aminoethyl)-amino-naphthalene-1-sulfonic acid (EDANS) at the 5'-end and the fluorescence quencher 4-(4'-dimethylaminophenylazo) benzoic acid (DABCYL) at the 3'-end (Figure 17).

Figure 17

DABCYL and EDANS: structures of the fluorescence quencher DABCYL and the fluorophore EDANS.

More recently, a wider range of fluorophores has been used in conjunction with the same DABCYL quencher. Collisional fluorescence quenching is very efficient in the closed form and does not depend strongly on the absorption spectrum of the quencher. However, the level of fluorescence in the open form is limited because the fluorophore and quencher remain relatively close to each other.

Molecular Beacons can be used to quantify the concentration of the amplicon during PCR by measuring the intensity of fluorescence at the annealing stage in each PCR cycle (Quantitative PCR). They have also been used extensively in SNP and mutation analysis. Several Molecular Beacons, each with a unique fluorophore, can be used in multiplex PCR reactions to analyze SNPs at different loci simultaneously.
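
One common way of reducing per-cycle fluorescence readings to a single number is to find the cycle at which the signal first crosses a fixed threshold. The Python sketch below does this by linear interpolation between adjacent cycles. It is a generic real-time PCR convention and not a procedure specified in this chapter; the readings and the threshold value are invented for illustration.

```python
from typing import Optional, Sequence


def threshold_cycle(fluorescence: Sequence[float], threshold: float) -> Optional[float]:
    """Return the (fractional) cycle at which the signal first reaches threshold.

    fluorescence[0] is the reading taken at the annealing stage of cycle 1.
    Returns None if the threshold is never reached.
    """
    if not fluorescence:
        return None
    if fluorescence[0] >= threshold:
        return 1.0
    for i in range(1, len(fluorescence)):
        prev, curr = fluorescence[i - 1], fluorescence[i]
        if prev < threshold <= curr:
            # prev belongs to cycle i, curr to cycle i + 1
            return i + (threshold - prev) / (curr - prev)
    return None


# Invented readings (arbitrary units), one per cycle.
readings = [1.0, 1.0, 2.0, 4.0, 8.0, 16.0]

if __name__ == "__main__":
    print(threshold_cycle(readings, threshold=6.0))
```

A sample containing more starting template crosses the threshold at an earlier cycle, so comparing threshold cycles between samples gives a relative measure of amplicon concentration.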

Scorpion primers

A Scorpion is a PCR primer with a Molecular Beacon attached via a linker that acts as a PCR stopper. This linker, normally hexaethylene glycol, isolates the beacon moiety from the primer. After PCR extension of the primer portion of the Scorpion, the resulting amplicon contains a sequence that is complementary to the probe portion. In the denaturation stage of the PCR cycle the amplicon is rendered single-stranded, and on cooling the probe element binds to this complement to form an intramolecular duplex. In this form the quencher is no longer positioned close to the fluorophore and a fluorescent signal is produced (Figure 18).

Figure 18

Scorpion primers

Scorpion probes contain several chemical modifications in a single oligonucleotide and therefore are relatively complex molecules to synthesize. However, they have some major advantages over Molecular Beacons. Formation of the active Scorpion during PCR is an intramolecular process and is therefore much faster than the equivalent intermolecular reaction that occurs with Molecular Beacons. Moreover, the active form of Scorpions is kinetically more stable than that of Molecular Beacons, which tend to fall off their target and fold into a non-fluorescent intramolecular hairpin loop. In contrast to TaqMan probes, Scorpions do not depend upon enzymatic cleavage to produce a fluorescent signal, and rapid PCR cycling is therefore possible, resulting in a very fast and reliable detection system.

Other diagnostic methods

DNA microarrays

Oligonucleotides can be chemically attached to the surface of materials such as glass or silicon, on which they form small "spots" of around 100 μm (10⁻⁴ m) in diameter. Large numbers of oligonucleotides can be laid down on a single slide to form a microarray, and single strands of fluorescently-labelled DNA (labelled PCR products or cDNA) can be captured by hybridization. (cDNA is single-stranded DNA complementary to the RNA from which it is synthesized by reverse transcription; it gives indirect information on the nature of the various RNA messages expressed in a cell, which is known as expression analysis.) If such a microarray contains 1000 spots then in theory it is possible to hybridize a unique complementary nucleic acid sequence to each spot. A fluorescence scanner locates the spot to which a DNA sequence has hybridized, and the identity of the sequence is deduced from that location.
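
The idea that identity follows from location can be sketched as a simple lookup: each spot position maps to a known probe, and a fluorescent spot implies that the sample contained the complement of that probe. The Python example below is a toy illustration only. The grid coordinates, probe sequences and list of fluorescent spots are all hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


# Hypothetical array layout: (row, column) -> immobilized probe sequence.
layout = {
    (0, 0): "ACGTTGCA",
    (0, 1): "GGATCCAA",
    (1, 0): "TTAGGCAT",
}


def captured_sequences(fluorescent_spots, layout):
    """Map each fluorescent spot to the sequence that must have hybridized there."""
    return {
        pos: reverse_complement(layout[pos])
        for pos in fluorescent_spots
        if pos in layout
    }


if __name__ == "__main__":
    # Positions reported by the scanner (invented for this example).
    print(captured_sequences([(0, 1), (1, 0)], layout))
```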

The fluorescent label attached to the captured nucleic acid strand can be added by a number of different methods. PCR products can be labelled at the 5'-end simply by using a PCR primer containing a 5'-fluorescent dye. PCR primers can be labelled with multiple fluorophores, but these tend to quench each other and also inhibit the PCR reaction. A better way to introduce multiple labels into the PCR product is to use fluorescently labelled deoxynucleoside triphosphates in the PCR or reverse transcriptase reaction (e.g. fluorescein-labelled dT). However, the efficiency of the PCR reaction may be compromised by the chemical modification on the heterocyclic base, which can inhibit the Taq polymerase. A carefully determined mixture of unlabelled and labelled deoxynucleoside triphosphates must therefore be used, and it is rare to achieve labelling densities greater than one fluorophore per 30 nucleotides. Microarray assays can also be carried out in the reverse format by attaching individual PCR products to the slide as discrete spots and probing with a pool of fluorescently labelled oligonucleotides (Figure 19).

Figure 19

DNA microarrays

DNA microarrays are useful in high-throughput mutation, SNP and gene expression analysis because very large numbers of DNA strands can be attached to a single array. Microarrays are amenable to automation by robotic systems, allowing very high throughput. However, they present challenges owing to some undesirable chemical and biophysical properties of molecules on surfaces. Firstly, it is difficult to create very dense arrays. A spot size of 100 μm is achievable, but smaller spots (e.g. 1 μm) would allow far higher numbers of spots per array, permitting the use of smaller volumes of solution-phase DNA and greater throughput. Secondly, the hybridization of complementary DNA molecules on a surface is not nearly as efficient as solution hybridization. To make the system workable, the properties of the surface and the nature of the linker between the surface and the attached DNA must be carefully controlled.
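
The effect of spot size on array density can be estimated with simple geometry, since the number of spots that fit in a fixed area scales with the inverse square of the spot spacing. The Python sketch below assumes a square grid, a centre-to-centre pitch of twice the spot diameter, and a printable area of 20 mm × 60 mm. All three are illustrative assumptions and are not figures from this chapter.

```python
def spots_per_array(
    spot_diameter_um: int,
    width_um: int = 20_000,   # assumed printable width (20 mm)
    height_um: int = 60_000,  # assumed printable height (60 mm)
    pitch_factor: int = 2,    # assumed centre-to-centre pitch = 2 x diameter
) -> int:
    """Number of spots on a square grid within the printable area."""
    if spot_diameter_um <= 0 or pitch_factor <= 0:
        raise ValueError("diameter and pitch factor must be positive")
    pitch_um = spot_diameter_um * pitch_factor
    return (width_um // pitch_um) * (height_um // pitch_um)


if __name__ == "__main__":
    for diameter in (100, 10, 1):
        print(f"{diameter:>3} um spots: {spots_per_array(diameter):,} per array")
```

Under these assumptions, reducing the spot diameter from 100 μm to 1 μm increases the number of spots per array by a factor of 10⁴.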

Fluorescence in situ hybridization (FISH)

In situ hybridization (ISH), which allows the identification and visualization of specific DNA sequences on chromosomes using radioactive labels, is discussed in The synthesis and applications of chemically modified oligonucleotides. Fluorescence in situ hybridization (FISH), which extends ISH by employing fluorescence-based detection and visualization by fluorescence microscopy, is an important tool in genetic analysis. The principle of FISH lies in the annealing of a labelled probe to its complementary strand within the chromosomes of fixed cells or tissues, followed by detection of the fluorescent label. The probes (DNA or RNA) are usually prepared by one of three polymerase enzyme-based methods (nick translation, random priming or PCR) which allow the incorporation of fluorescently-labelled deoxynucleoside triphosphates. An average incorporation level of one fluorescent label per 30 nucleotides is typical. The length of a DNA probe can be between 100 bp and 1000 bp. Longer probes increase non-specific background fluorescence, but short probes can be difficult to detect owing to insufficient hybridization and low levels of labelling. It is important that the target is accessible to the probe, is retained in situ, and is not degraded by nuclease enzymes. Visualization limits span from an entire chromosome to a 40 kb chromosomal section.

Fluorescence in situ hybridization has been used to identify the positions of genes within a chromosome (although this has become less useful now that the human genome has been sequenced), in chromosome "painting" (a technique for visualizing entire chromosomes), and in the characterization and diagnosis of diseases.
