Sequencing, forensic analysis and genetic analysis

Background

Established methods of DNA sequencing, genetic and forensic analysis all depend upon the use of fluorescently-labelled oligonucleotides and/or fluorescent deoxy- or dideoxy-nucleoside triphosphates. These technologies have one other common thread – they require a DNA polymerisation step. This can be the polymerase chain reaction (genetic analysis in STR analysis), a single nucleotide extension (mini-sequencing in SNP analysis), or a combination of polymerisation and DNA chain termination (Sanger DNA sequencing).

The Polymerase Chain Reaction (PCR)

PCR is a technique used widely in molecular biology, diagnostics, forensic science and molecular genetics to amplify a specific region of target DNA (Figure 1). PCR is immensely valuable, permitting the amplification of a few molecules of a precious DNA sample (e.g. at the scene of a crime) to produce microgram quantities of DNA consisting of the specific region of interest only (the amplicon). PCR can be used to amplify any region of DNA from 50 to 25 000 base pairs in length. In PCR, two short oligonucleotides (PCR primers) are designed such that each is complementary to the 3′-end of one of the two target strands at the locus to be amplified.

Figure 1 | The polymerase chain reaction (PCR)
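The primer placement described above can be sketched in Python. This is only an illustration under stated assumptions: the template sequence, the function names and the 6-base primer length are hypothetical choices made to keep the example short, and real primers are considerably longer.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq))


def design_primers(top_strand: str, start: int, end: int, length: int = 6):
    """Pick two primers flanking top_strand[start:end] (0-based, end exclusive).

    The forward primer copies the first `length` bases of the region, so it
    is complementary to the 3'-end of the region on the bottom strand.
    The reverse primer is the reverse complement of the last `length` bases,
    so it is complementary to the 3'-end of the region on the top strand.
    """
    region = top_strand[start:end]
    if len(region) < 2 * length:
        raise ValueError("region is too short for two non-overlapping primers")
    forward = region[:length]
    reverse = reverse_complement(region[-length:])
    return forward, reverse


# Hypothetical template (top strand, 5'->3'); not taken from any real gene.
template = "TTACGGATCCGATTGCAAGCTTAACC"
forward_primer, reverse_primer = design_primers(template, 2, 24)
print("forward primer:", forward_primer)
print("reverse primer:", reverse_primer)
```

Both primers are written 5′ to 3′, so after annealing their 3′-ends point towards each other across the region to be amplified.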

The region of the template bounded by the primers is amplified in a series of cycles (Figure 2). In the first cycle, the double-stranded target is separated into two single template strands by heating to 95 °C. It is then cooled to 55 °C to allow the synthetic oligonucleotide primers to anneal to the template strands with their 3′-ends facing each other. The temperature is then increased to 72 °C, the optimum temperature for activity of the thermostable Taq DNA polymerase. The polymerase, which requires Mg²⁺, utilizes deoxyribonucleoside triphosphates (dNTPs) to extend the primers along the length of the template, producing two new double strands of DNA. The second cycle of PCR is a repeat of the first, and each newly synthesized single strand also acts as a template for primer annealing and extension. In this case the primer can only be extended as far as the locus of the first primer, producing DNA duplexes of a specific length. In all subsequent cycles amplification produces PCR products of a length specified by the loci of the two primers, and these PCR products soon outnumber the original target molecules. In theory, n cycles of PCR will produce 2ⁿ PCR products from each target molecule. The DNA polymerase used in PCR (Taq polymerase) was originally isolated from the thermophilic bacterium Thermus aquaticus. It has a half-life of more than two hours at 95 °C and is therefore stable to the conditions of PCR.

Figure 2 | A single cycle of the polymerase chain reaction
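The theoretical doubling per cycle can be checked with a few lines of Python. This is a sketch rather than a model of a real reaction: the `efficiency` parameter is an assumption added here (the text only describes the ideal case, which corresponds to `efficiency = 1.0`), and the function name is hypothetical.

```python
def pcr_copies(initial_copies: int, cycles: int, efficiency: float = 1.0) -> float:
    """Theoretical number of copies after a given number of PCR cycles.

    With efficiency = 1.0 every template is copied in every cycle, giving
    initial_copies * 2**cycles. Lower values (an assumption, not from the
    text) model a reaction in which only a fraction of templates is copied.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must lie between 0 and 1")
    if cycles < 0:
        raise ValueError("cycles must not be negative")
    return initial_copies * (1.0 + efficiency) ** cycles


for n in (10, 20, 30):
    print(f"{n} cycles: {pcr_copies(1, n):.0f} copies from one molecule")
```

In the ideal case, 30 cycles turn a single molecule into about a billion copies, which is why a few molecules of sample are enough to start from.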

Incorrect amplicons are sometimes generated in PCR owing to primer-dimer formation. This is a particular problem when several PCR reactions are carried out in the same tube (multiplex PCR). It is most likely to occur at the beginning of PCR when the primers are present in high concentrations relative to the template. They can interact with each other and bind to Taq polymerase to produce mini-amplicons. This can happen even if there are several mismatches in the primer-dimer duplexes, as these are stabilized by binding to Taq polymerase. "Hot-start" PCR has been developed to alleviate this problem. In hot-start PCR the reaction mixture initially lacks a crucial component, such as Mg²⁺ or Taq polymerase. Prolonged heating at the beginning of the first cycle ensures primer dimers are denatured, and at this point the tube can be opened and the missing component added. PCR can now continue as normal. The major disadvantage of this technique is the possibility of contamination during the addition of the essential component to the open reaction tube. A superior alternative involves the use of a reagent that can bind to the Taq polymerase and inhibit its action (e.g. an antibody). The antibody is denatured in the first long heating cycle and the Taq polymerase becomes active. In another method, an essential component of the PCR reaction is enclosed in a wax pellet, which melts upon heating and liberates its contents.

The polymerase chain reaction was invented by Kary Mullis in the 1980s and at that time thermostable polymerases were not available. It was therefore necessary to add a fresh aliquot of normal DNA polymerase at every cycle of PCR, making it a very tedious and difficult procedure. With the advent of thermostable polymerases, PCR is now a routine process and is central to molecular biology, DNA diagnostics and forensic science. PCR is of such far reaching importance that Kary Mullis was awarded the Nobel Prize for his work.

DNA sequencing

The beginning of the 21st century has witnessed a revolution in our knowledge of the DNA sequences of various organisms, the most notable example being the sequencing of the entire human genome. This is of immense importance and analysis of the data will enable us to learn a great deal about evolution, the relationship between different organisms, the mechanisms by which genes are controlled and the hidden languages within the DNA sequence. The next 20 years will undoubtedly lead to major discoveries and will give us a much deeper understanding of genetics at the molecular level. All this has been possible because of developments made by Fred Sanger in Cambridge over 30 years ago. Sanger developed a novel method of DNA sequencing (the dideoxy method) for which he was awarded his second Nobel Prize, in 1980.

Sanger (dideoxy) DNA sequencing

The following components are required:
  • A DNA template to be sequenced
  • An oligonucleotide primer labelled at the 5′-end with ³²P
  • A DNA sequencing polymerase
  • Four deoxynucleoside triphosphates: dATP, dGTP, dCTP, dTTP
  • Four dideoxy nucleoside triphosphates (nucleoside triphosphates lacking both 2′- and 3′- hydroxy groups): ddATP, ddGTP, ddCTP, ddTTP (Figure 3)

Sanger sequencing is a modified form of DNA replication. The primer hybridizes to a specific locus on the template and the polymerase binds and incorporates nucleotides to assemble a reverse-complementary copy of the template. On its own, this process would not provide any information on the sequence of the template, so four sequencing reactions are carried out in separate tubes. To each tube a small quantity of the key ingredient, a 2′,3′-dideoxynucleoside triphosphate, is added: ddATP is added to tube 1, ddGTP to tube 2, ddCTP to tube 3 and ddTTP to tube 4.

Figure 3 | Structures of dideoxy nucleotide triphosphates (ddNTPs)

The polymerase enzyme does not discriminate between the deoxynucleoside triphosphates (dNTPs) and the dideoxynucleoside triphosphates (ddNTPs), so either can be added at each step. When a dNTP is added, the DNA chain will continue to grow; when a ddNTP is added, the DNA chain will terminate as it has no 3′-hydroxyl group to react with an incoming nucleoside triphosphate, so no further nucleotides can be added. The result in each tube is a mixture of oligonucleotides of different lengths, all terminated with the ddNTP for that tube: in tube 1 all the terminations will be at A, in tube 2 at G, in tube 3 at C, and in tube 4 at T. The oligonucleotides can then be separated according to their size by gel electrophoresis. If all four ladders are run side by side on a polyacrylamide gel, and the gel is exposed to a photographic film, the ³²P-labelled fragments will produce an image that can be used to read the DNA sequence, which will be the reverse complement of the template. In practice it is possible to sequence around 300 bases of DNA by this method. A schematic representation of a DNA sequencing gel is shown in Figure 4.

Figure 4 | Schematic representation of Sanger sequencing
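The four-tube logic can be illustrated with a small Python sketch. It is a deliberately idealised picture: it assumes that every possible termination product is present in each tube, ignores the length of the primer, and uses a short hypothetical template; the function names are not from any library.

```python
from typing import Dict, List

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def synthesized_strand(template: str) -> str:
    """Strand built by the polymerase: the reverse complement of the template."""
    return "".join(COMPLEMENT[base] for base in reversed(template))


def sanger_ladders(template: str) -> Dict[str, List[int]]:
    """Fragment lengths expected in each of the four ddNTP tubes.

    A fragment of length k appears in the tube whose ddNTP matches the
    k-th base of the newly synthesized strand (primer length is ignored).
    """
    ladders: Dict[str, List[int]] = {base: [] for base in "AGCT"}
    for length, base in enumerate(synthesized_strand(template), start=1):
        ladders[base].append(length)
    return ladders


def read_gel(ladders: Dict[str, List[int]]) -> str:
    """Read the sequence from the shortest band to the longest."""
    bands = sorted(
        (length, base) for base, lengths in ladders.items() for length in lengths
    )
    return "".join(base for _, base in bands)


template = "ATGCCGTA"  # hypothetical template region, written 5'->3'
ladders = sanger_ladders(template)
for base in "AGCT":
    print(f"dd{base}TP tube: fragment lengths {ladders[base]}")
print("sequence read from gel:", read_gel(ladders))
```

Reading the bands from shortest to longest reproduces the synthesized strand, i.e. the reverse complement of the template, as stated above.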

Fluorescence-based dideoxy-DNA sequencing

In the modern automated high-throughput fluorescent version of Sanger sequencing, an unlabelled oligonucleotide primer is used, along with a thermostable DNA polymerase, four normal deoxynucleoside triphosphates, and four dideoxy nucleoside triphosphates with different fluorescent labels on them (Figure 5).

Now only one sequencing reaction is necessary because termination in ddA gives the DNA fragment a particular fluorescent colour, ddG a different colour, ddC a third colour and ddT a fourth colour. The nature of the fluorescent dyes depends upon the DNA sequencer used, but the basic requirement is four dyes with well-resolved fluorescence emission spectra. A common system uses FAM, JOE, TAMRA and ROX as the four dyes (Table 1).

Table 1 | Fluorescent dyes used in fluorescence-based sequencing: their wavelengths of maximum absorption (excitation) and emission, and colours

Dye      Max. absorption wavelength / nm    Max. emission wavelength / nm    Colour
FAM      495                                520                              blue
JOE      530                                555                              green
TAMRA    550                                575                              yellow
ROX      580                                605                              red

The fragments are separated by gel electrophoresis and the fluorescent dyes are excited by a laser. The gel image can then be analysed by a computer and a DNA sequence is produced. Over 800 bases can be read off in a single gel lane. Automated DNA sequencers can analyse 96 different lanes on a single gel (i.e. around 76 800 bases from one gel) and can analyse 3 gels a day, giving a throughput of around 230 400 bases per day or more than 50 000 000 bases a year. Machines have been developed to analyse 384 sequencing reactions simultaneously. Other recent innovations include the development of dideoxy nucleoside triphosphates labelled with two fluorescent dyes ("Big Dye chemistry"). Excitation of one dye (usually fluorescein) at its λmax of 495 nm would normally result in emission at 520 nm; instead the energy is transferred by FRET to the second dye, which has a λmax close to 520 nm. The second dye then fluoresces strongly at a longer wavelength. This produces a stronger fluorescent signal than would be obtained by direct excitation of the second fluorescent dye at 495 nm. Advances have also been made in gel technology. The use of capillary gels rather than flat-bed gels has facilitated automation of sample loading and analysis, providing even higher throughput.
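The throughput figures quoted above follow from simple multiplication, as the short Python sketch below shows. The three constants come from the text; the number of running days per year is not stated there, so the two values used (260 and 365) are assumptions chosen only to bracket the yearly figure.

```python
BASES_PER_LANE = 800   # "over 800 bases" per lane, taken as 800
LANES_PER_GEL = 96
GELS_PER_DAY = 3


def bases_per_gel() -> int:
    """Bases read from one 96-lane gel."""
    return BASES_PER_LANE * LANES_PER_GEL


def bases_per_day() -> int:
    """Bases read in a day at three gels per day."""
    return bases_per_gel() * GELS_PER_DAY


def bases_per_year(running_days: int) -> int:
    """Yearly throughput; running_days is an assumption, not from the text."""
    return bases_per_day() * running_days


print("per gel:", bases_per_gel())
print("per day:", bases_per_day())
for days in (260, 365):  # assumed: weekdays only, or every day of the year
    print(f"per year ({days} running days):", bases_per_year(days))
```

Either assumption about running days gives a yearly total above the 50 000 000 bases quoted.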

Figure 5 | Schematic representation of fluorescent dideoxy sequencing

Fluorescent Sanger sequencing is based on some fundamental chemical and physical principles. The key chemical reaction is nucleophilic attack on the triphosphate moiety of the incoming dNTP or ddNTP by the 3′-OH group of the previously incorporated nucleotide. Diphosphate (pyrophosphate) is a good leaving group and the specificity of nucleotide incorporation is controlled by a combination of Watson-Crick base pairing and enzyme binding.

STR analysis in Forensic science and Genetics

The powerful technique of DNA fingerprinting, initially involving the analysis of minisatellites and subsequently microsatellites, was invented by Alec Jeffreys in Leicester. The technique produces a "bar code" for each person, and human DNA can be analysed by this method to identify individuals at the genetic level with a far greater degree of certainty than traditional blood group determination or physical analysis of fingerprints. DNA fingerprinting can also be used to determine the relationship between individuals (e.g. paternity testing).

The method was developed for forensic applications by the UK Forensic Science Service and quickly became the technological basis of the National DNA database, which by 1999 contained over 700,000 personal profiles, giving rise to around 700 matches per week. It is now a major resource that has been very effective in the detection and conviction of criminals.

The analysis of short tandem repeats (STRs) forms the basis of the current forensic DNA profiling system which is now used throughout the world. In this methodology several PCR amplification reactions are carried out simultaneously in a single tube at a number of STR loci (multiplex PCR), and the resultant fluorescent fragments are analysed by automated gel-based methods.

For STR analysis the following components are required:
  • A DNA sample, e.g. a single human hair from the scene of a crime, or buccal cells from a mouth scrape of a suspect
  • Two oligonucleotide PCR primers, one labelled at the 5′-end with ³²P, and one unlabelled reverse primer
  • A thermostable DNA polymerase
  • Four deoxynucleoside triphosphates: dATP, dGTP, dCTP, dTTP.

The method relies on the fact that certain regions in genomic DNA contain di-, tri- or tetranucleotide repeats (consecutive repeats of a unit 2–4 nucleotides long), for example the CA dinucleotide repeat in the following sequence:

CGTCAGCACACACACACACACACACACACACATGGCGTG
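Finding and counting such a repeat is straightforward to sketch in Python with a regular expression. The function name is hypothetical, and real STR typing measures amplicon length by electrophoresis rather than scanning sequence text, so this is only an illustration of what "number of repeats" means.

```python
import re
from typing import Tuple


def longest_repeat_run(sequence: str, unit: str) -> Tuple[int, int]:
    """Return (repeat_count, start_index) of the longest tandem run of `unit`.

    Returns (0, -1) if the unit does not occur in the sequence at all.
    """
    pattern = re.compile(f"(?:{re.escape(unit)})+")
    best = max(
        pattern.finditer(sequence), key=lambda m: len(m.group()), default=None
    )
    if best is None:
        return 0, -1
    return len(best.group()) // len(unit), best.start()


# The example sequence shown above.
example = "CGTCAGCACACACACACACACACACACACACATGGCGTG"
count, start = longest_repeat_run(example, "CA")
print(f"longest CA run: {count} repeats, starting at index {start}")
```

Two individuals who differ in the number of CA units at this locus would give PCR products that differ in length by a multiple of two bases, which is what the gel resolves.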

Figure 6 | An example of the use of STR analysis in a parentage dispute. In this example it is clear that half of the STR bands of the child originate from the mother. Should the alleged father be instructed to pay child support?

In a modern variant of STR analysis, one PCR primer of each pair is labelled at the 5′-end with a fluorescent dye. Several different fluorescent dyes can be used and this adds another dimension to the assay. The different DNA fragments now have different colours, so in theory it is possible to determine the locus from which each PCR amplicon originated. However, it has so far only been possible to develop a limited number of fluorescent dyes with well-resolved spectral characteristics, so in the current forensic system 11 different loci are amplified using three different fluorescent dyes. The simultaneous analysis of 11 loci provides a very high degree of certainty: in this system it is extremely unlikely that two individuals in the UK will share the same profile (unless they are identical twins). Analogues of fluorescein have been used in STR analysis, e.g. FAM, TET and HEX.

Single nucleotide polymorphisms (SNPs)

The most common variations in the human genome are single nucleotide polymorphisms (SNPs, pronounced "snips"). Normally one of two specific bases (alleles) will occur at an SNP site, for example A or G. The frequency of occurrence of the less common of the pair will be greater than 1 per cent. SNPs occur about once every 1000 base pairs in the genome, making up the bulk of the 3 × 10⁶ variations, and the polymorphism tends to remain stable in the population. SNPs occur in genes and also in the surrounding regions of the genome that control gene expression. The effect of one allele of a particular SNP on a gene may not be large (perhaps influencing the activity of the encoded protein in a subtle way), but even subtle effects can influence susceptibility to common diseases, such as coronary thrombosis or Alzheimer's disease. Thus by studying large numbers of SNPs in known sufferers of a disease it is possible to determine if the disease has a genetic link. In such cases, the larger the number of individuals studied, the greater is the likelihood of a genetic link emerging. Once such a link is found, the susceptibility of any individual in the population to contracting the disease can be evaluated. In the field of pharmacogenetics it is theoretically possible to predict the reaction of an individual to a particular drug from SNP analysis, so that the most effective treatment can be administered. SNP analysis is therefore of great importance in medicine and is likely to have a major impact on diagnostics in the future.

In many SNPs the two alleles have about the same frequency in the population, i.e. if the two bases that can occur at a particular locus are A and G, half the population will have adenine and the other half will have guanine. Clearly there has been no evolutionary pressure to favour one or other of the alleles at such loci. These kinds of SNPs are very useful in genetic analysis. In contrast, if the less frequent allele of a pair occurs at a level of 1 per cent or less the SNP is much less useful as a statistical/analytical tool. This is because the less common allele will be rarely encountered. Consequently in DNA that has been pooled from a large number of individuals (e.g. from a group of people suffering from heart disease) it will be difficult to detect the presence of the minor allele, even if it is over-represented in this group relative to a group that is not suffering from the disease.
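The allele frequencies discussed above can be computed from genotype counts. The following Python sketch assumes a two-allele A/G site in a diploid population; the genotype counts and function names are hypothetical, and the 1 per cent threshold is the one quoted in the text.

```python
from typing import Tuple


def allele_frequencies(n_aa: int, n_ag: int, n_gg: int) -> Tuple[float, float]:
    """Frequencies of the A and G alleles from diploid genotype counts."""
    total_alleles = 2 * (n_aa + n_ag + n_gg)
    if total_alleles == 0:
        raise ValueError("at least one individual is required")
    freq_a = (2 * n_aa + n_ag) / total_alleles
    freq_g = (2 * n_gg + n_ag) / total_alleles
    return freq_a, freq_g


def minor_allele_frequency(n_aa: int, n_ag: int, n_gg: int) -> float:
    """Frequency of the less common of the two alleles."""
    return min(allele_frequencies(n_aa, n_ag, n_gg))


def is_useful_snp(n_aa: int, n_ag: int, n_gg: int, threshold: float = 0.01) -> bool:
    """True if the minor allele is more frequent than the threshold."""
    return minor_allele_frequency(n_aa, n_ag, n_gg) > threshold


# Hypothetical genotype counts (AA, AG, GG) for two different SNP sites.
print(allele_frequencies(25, 50, 25), is_useful_snp(25, 50, 25))
print(allele_frequencies(999, 1, 0), is_useful_snp(999, 1, 0))
```

In the second case the G allele accounts for only one chromosome in 2000, so it would be very hard to detect in pooled DNA.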

The analysis of single nucleotide polymorphisms can also yield information on physical characteristics. An example of this is the first commercial test for eye colour ("Retinome"). In addition to eye colour, SNPs have been linked with other features of human phenotype such as hair and skin colour. Analysis of a large number of phenotypic SNPs is equivalent to building an "identikit" picture of an individual, and is therefore the subject of intense study by forensic scientists.

SNP analysis, DNA diagnostics and mutation detection

The challenge in analysing SNPs is equivalent to that of analysing DNA for single point mutations, as the two phenomena are related. Mutations tend to cause diseases or at least confer a disadvantage to the individual, whereas the effects of SNPs are more subtle. Highly accurate medium throughput methods of SNP and mutation analysis such as TaqMan and Molecular Beacons are described in detail below, followed by a short analysis of high-throughput methods (which are more applicable to population studies).

DNA diagnostics involves identification and analysis of a region of genomic or mitochondrial DNA associated with a particular disease. Often this DNA sequence will be responsible for expression of a faulty protein, but it could also be a region that controls gene expression. When such a DNA sequence has been identified, its presence (or absence) in the DNA sample of an individual must be determined. DNA screening for mutations depends upon the development of sensitive, rapid, accurate and economical procedures, in which PCR amplification is used in conjunction with an appropriate probe technology.

DNA probe-based technologies can be used to detect differences as small as single nucleotide substitutions, insertions and deletion mutations. Such "point mutations" can cause genetic diseases such as sickle cell anaemia, cystic fibrosis, phenylketonuria and Huntington's disease. Genetic analysis is important in pre- and postnatal diagnosis, and genetic information can also be used to determine the susceptibility of individuals to exogenous risks such as dietary or environmental factors. Infectious diseases such as measles, rubella, HIV, and hepatitis A and B, and diseases caused by pathogenic microorganisms such as Salmonella and Candida, can be clinically diagnosed using DNA probe technologies. Some oligonucleotide probe technologies are sufficiently selective to differentiate between closely related organisms or viruses, such as the type 1 and type 2 variants of Herpes Simplex Virus.

Real-time PCR

Real-time PCR is a recent development in PCR technology that combines normal PCR amplification of DNA with simultaneous detection of the PCR product, usually in a single reaction tube. Fluorescent real-time PCR is a combination of PCR amplification and fluorescence detection. In its simplest form it involves the use of an organic dye that is fluorescent only when bound to a DNA duplex. When such a dye is added at the beginning of a PCR reaction an increase in fluorescence occurs as the number of DNA duplexes increases, and this is indicative of successful PCR. SYBR Green (Figure 7) is an example of a molecule that binds to double-stranded DNA and becomes fluorescent on binding (the dsDNA-dye complex is fluorescent).

This method has severe limitations as it is non-specific, i.e. a positive result is obtained regardless of the nature of the PCR product. As PCR is prone to artefacts such as primer-dimer formation, simple amplification using indiscriminate dyes is not always very informative.

Figure 7 | Structure of SYBR Green I, a fluorescent molecule that binds to double-stranded DNA

Probe-based real-time PCR

When using PCR in human diagnostics it is very important to be certain of the precise nature of the PCR product. A method must be used that involves the identification of a key sequence in the PCR product (the amplicon). This can be achieved by adding a fluorogenic DNA probe to the PCR reaction, i.e. a short synthetic oligonucleotide that is complementary to a specific sequence in the PCR amplicon, and does not fluoresce unless it binds to the amplicon. A number of probe designs fulfil this criterion and these are discussed in detail below. When such a probe is used, a positive signal is obtained only if the PCR amplicon contains the complementary sequence to the fluorogenic probe: the fluorescent signal is sequence-specific. For example, clinical samples can be tested for genes that cause cystic fibrosis as follows: a probe that is complementary to a mutated region of the cystic fibrosis gene (the target) is synthesized, and PCR is performed on the sample in the presence of this probe. During the course of PCR amplification the number of DNA molecules (PCR amplicons) increases. If the mutation is present, the number of probe-target hybrids also increases, and the fluorescence signal grows in a predictable manner. This demonstrates that the mutated gene is indeed present. However, if no fluorescent signal is generated, this indicates that the patient does not carry that specific mutation. In a real clinical application both mutant and "wild-type" probes are used in parallel. A person suffering from the disease will give a positive signal from the mutant probe, while an unaffected person will give a positive signal from the wild-type probe; a carrier of the disease might give a positive signal from both probes.
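The interpretation of the two parallel probes amounts to a small decision table, sketched here in Python. The fluorescence values and the threshold are hypothetical numbers in arbitrary units, and the "no call" outcome (neither probe gives a signal, e.g. because amplification failed) is an assumption added for completeness rather than something the text describes.

```python
def call_genotype(
    mutant_signal: float, wild_type_signal: float, threshold: float = 1.0
) -> str:
    """Interpret end-point fluorescence from mutant and wild-type probes.

    A probe counts as positive when its signal exceeds `threshold`
    (hypothetical units; a real instrument needs calibrated cut-offs).
    """
    mutant_positive = mutant_signal > threshold
    wild_type_positive = wild_type_signal > threshold
    if mutant_positive and wild_type_positive:
        return "carrier"
    if mutant_positive:
        return "affected"
    if wild_type_positive:
        return "unaffected"
    return "no call"  # assumption: treat a double negative as a failed assay


# Hypothetical readings: (mutant probe, wild-type probe)
for reading in [(5.2, 0.1), (0.2, 4.8), (2.9, 3.1), (0.1, 0.2)]:
    print(reading, "->", call_genotype(*reading))
```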

Specialized equipment is necessary for fluorogenic real-time PCR and a number of instruments have been designed for this purpose. They all consist of a thermocycler to drive the PCR reaction, a light source to excite the fluorescent dye(s), and a fluorescence detector, together with a PC to control the instrument and process the data.

In general "fluorogenic" probes contain a fluorescent dye and a fluorescence quencher. They are non-fluorescent in the absence of a target nucleic acid because the quencher absorbs energy from the excited fluorophore, and this energy is dissipated as heat or as radiation at a longer wavelength. There are two separate mechanisms for fluorescence quenching: collisional quenching and FRET quenching. The most important fluorogenic real-time PCR methods are discussed in detail below.

The TaqMan assay

The TaqMan assay is the most widely used real-time method for the analysis of PCR products and is used extensively in SNP analysis and mutation detection. A TaqMan probe consists of an oligonucleotide labelled with a fluorophore at one end (e.g. 5′-FAM) and a fluorescence quencher at the other (e.g. 3′-TAMRA). Excitation of fluorescein at its absorption wavelength of 495 nm would normally lead to fluorescence emission at 525 nm. However, this falls within the broad absorption spectrum of the TAMRA dye, which is in close proximity in the TaqMan probe, so energy is absorbed by the TAMRA dye owing to fluorescence resonance energy transfer (FRET) and fluorescence is observed at the emission wavelength of TAMRA (585 nm) rather than at the emission wavelength of FAM.

TaqMan probes are used in real-time PCR (Figure 8) as follows. During the cooling step of each cycle, prior to the extension phase, the TaqMan probe hybridizes to its complementary sequence in the PCR amplicon. When Taq polymerase encounters the TaqMan probe during polymerisation, the 5′ to 3′ exonuclease activity of the enzyme leads to digestion of the probe. This separates the fluorescence donor (FAM) from the acceptor (TAMRA) and excitation at 495 nm now leads to fluorescence emission by FAM at 525 nm. There is now negligible energy transfer to TAMRA as the two fluorescent dyes are too far apart (FRET is described in more detail below). Therefore the TAMRA dye does not fluoresce when the TaqMan probe has been digested. Overall, an increase in fluorescence emission at 525 nm and a decrease at 585 nm is indicative of a positive PCR reaction, and importantly proves the presence of the correct amplicon. Other fluorescent dyes can be used in TaqMan probes provided they have suitable fluorescence emission and absorption spectra.

Figure 8 | Schematic representation of the TaqMan assay

Fluorescence resonance energy transfer (FRET)

The TaqMan assay utilizes fluorescence resonance energy transfer (FRET), a powerful tool that has been used extensively as a spectroscopic "ruler" to examine DNA secondary structure. In FRET, an excited fluorophore (energy donor) transfers its energy to a neighbouring chromophore or fluorophore (acceptor) non-radiatively through induced dipole-dipole interactions when the dipoles are in approximately parallel orientations. For FRET to occur there must be an overlap between the emission spectrum of the donor and the absorption spectrum of the acceptor. The rate of energy transfer between the donor and the acceptor is inversely proportional to the sixth power of the distance between the two fluorophores (1/r⁶), so the transfer efficiency falls steeply as the dyes move apart. The optimum distance (r) for non-radiative transfer of energy is between 10 and 100 Å for most common fluorophores.
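The steepness of the sixth-power distance dependence is easier to appreciate with numbers. The Python sketch below uses the standard Förster expression for transfer efficiency, E = 1 / (1 + (r/R₀)⁶), where R₀ is the distance at which transfer is 50 per cent efficient. Neither this formula nor R₀ appears in the text above, and the value of 50 Å used here is an assumed, purely illustrative figure for a donor-acceptor pair.

```python
def fret_efficiency(distance: float, forster_radius: float) -> float:
    """Förster transfer efficiency E = 1 / (1 + (r / R0)**6).

    `distance` and `forster_radius` must be in the same units (e.g. Å).
    """
    if distance <= 0 or forster_radius <= 0:
        raise ValueError("distances must be positive")
    return 1.0 / (1.0 + (distance / forster_radius) ** 6)


R0 = 50.0  # assumed Förster radius in Å, for illustration only
for r in (10.0, 25.0, 50.0, 75.0, 100.0):
    print(f"r = {r:5.1f} Å  ->  E = {fret_efficiency(r, R0):.3f}")
```

With these assumed numbers the efficiency drops from nearly complete at half of R₀ to under 2 per cent at twice R₀, which is why cleaving a TaqMan probe or opening a hairpin switches the quenching off so cleanly.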

Figure 9 | Mechanism of fluorescence quenching

See the section on FRET quenchers for information on specific FRET quenchers such as BHQs.

Molecular beacons

A molecular beacon is a chemically modified oligonucleotide that adopts a stem-loop structure in the absence of a complementary target sequence. The loop consists of a probe that is designed to hybridize to a specific sequence in the amplicon during PCR, and the stem is constructed of two short complementary oligonucleotides, one labelled with a fluorophore and the other with a non-fluorescent quencher. In the stem-loop form, the fluorophore and quencher are held in very close proximity by the short duplex structure, and, if the fluorophore is excited, energy is absorbed by the quencher and is dissipated as heat (non-radiative energy transfer). This is the "closed" or "dark" state of the Molecular Beacon (Figure 10, top).

In each cycle of PCR, during the denaturation step, the stem-loop of the Molecular Beacon opens (i.e. the base pairs in the stem dissociate or "melt"). During the subsequent cooling (annealing) phase, the loop of the Molecular Beacon hybridizes to its complementary sequence in the amplicon to form a short duplex. In this form ("open form") the stems are too far apart to associate, the fluorophore and quencher are kept apart and irradiation produces a fluorescent signal (Figure 10, bottom). During real-time PCR fluorescence must be measured at a temperature at which the stem-loop form of the Beacon is stable, so that the unhybridized Beacons do not produce undesirable background fluorescence. This is normally around 50 °C.

Figure 10 | Molecular beacons

Originally molecular beacons were synthesized with the fluorophore 5-(2′-aminoethyl)aminonaphthalene-1-sulfonic acid (EDANS) at the 5′-end and the fluorescence quencher 4-(4′-dimethylaminophenylazo)benzoic acid (DABCYL) at the 3′-end (Figure 11).

Figure 11 | Structures of the fluorescence quencher DABCYL and the fluorophore EDANS

More recently a wider range of fluorophores has been used in conjunction with the same DABCYL quencher. In Molecular Beacons, collisional fluorescence quenching in the closed form is very efficient and does not depend strongly on the absorption spectrum of the quencher. However, the level of fluorescence in the open form is limited as the fluorophore and quencher remain relatively close to each other.

Molecular beacons can be used to quantify the concentration of the amplicon during PCR by measuring the intensity of fluorescence at the annealing stage in each PCR cycle (Quantitative PCR). They have also been used extensively in SNP and mutation analysis. Several molecular beacons, each with a unique fluorophore, can be used in multiplex PCR reactions to analyse SNPs at different loci simultaneously.

Scorpion primers

A Scorpion is a PCR primer with a Molecular Beacon attached via a linker which acts as a PCR stopper. This linker (normally hexaethylene glycol) isolates the Beacon element from the primer. After PCR extension of the primer element of the Scorpion, the resultant amplicon contains a sequence that is complementary to the probe element. In the denaturation stage of the PCR cycle the amplicon is rendered single-stranded and on cooling the probe element binds to this complement to form an intramolecular duplex. In this form the quencher is no longer close to the fluorophore and a fluorescent signal is produced (Figure 12).

Figure 12 | Scorpion primers

Scorpions contain several chemical modifications in a single oligonucleotide and therefore are relatively complex molecules to synthesize. However, they have some major advantages over Molecular Beacons. Formation of the active Scorpion during PCR is an intra-molecular process and is therefore very fast, unlike Molecular Beacons that rely upon intermolecular probing. Moreover the active form of Scorpions is kinetically more stable than that of Molecular Beacons, which tend to fall off their target and fold into a non-fluorescent intramolecular hairpin loop. In contrast to TaqMan, Scorpions do not depend upon enzymic cleavage to produce a fluorescent signal, and therefore rapid PCR cycling is possible resulting in a very fast and reliable detection system.

High throughput methods of SNP and mutation analysis

DNA microarrays

Oligonucleotides can be chemically attached to the surface of materials such as glass or silicon on which they form small "spots" of around 100 μm (10⁻⁴ m) in diameter. Large numbers of oligonucleotides can be laid down on a single microscope slide to form a microarray, and single strands of fluorescently-labelled DNA (labelled PCR products or cDNA) can be captured by hybridization. (cDNA is single-stranded DNA complementary to the RNA from which it is synthesized by reverse transcription; it gives indirect information on the nature of the various RNA messages expressed in a cell, i.e. expression analysis.) If such a microarray contains 1000 spots then in theory it is possible to hybridize a unique complementary nucleic acid sequence to each spot. The identity of the DNA sequence is deduced from the location of the spot to which it hybridizes using a fluorescence scanner.

The fluorescent label attached to the captured nucleic acid strand can be added by a number of methods. PCR products can be labelled at the 5′-end simply by using a PCR primer with a 5′-fluorescent dye. PCR primers can be labelled with multiple fluorophores, but these tend to quench each other and also inhibit the PCR reaction. A better way to introduce multiple labels into the PCR product is to use fluorescently labelled deoxynucleoside triphosphates in the PCR or reverse transcriptase reaction (e.g. fluorescein-labelled dT). However, the efficiency of the PCR reaction may be compromised by the chemical modification on the heterocyclic base, which can inhibit the Taq polymerase. Therefore a carefully determined mixture of unlabelled and labelled deoxynucleotide triphosphates must be used and it is rare to achieve labelling densities greater than one fluorophore per 30 nucleotides. Microarray assays can also be carried out in the reverse format by attaching individual PCR products to the slide as discrete spots and probing with a pool of fluorescently labelled oligonucleotides (Figure 13).

Figure 13 | DNA microarrays

DNA microarrays are potentially important in high-throughput mutation, SNP and gene expression analysis because very large numbers of DNA strands can be attached to a single array. DNA microarrays are also amenable to automation by robotic systems, allowing very high throughput. However, microarrays present a number of challenges owing to some undesirable chemical and biophysical properties of molecules on surfaces. Firstly, it is difficult to create very dense arrays. A spot size of 100 μm is achievable, but much smaller spots (e.g. 1 μm) would allow far higher numbers of spots per array, permitting the use of smaller volumes of solution-phase DNA and far greater throughput. Secondly, the hybridization of complementary DNA molecules on a surface is not nearly as efficient as solution hybridization. To make the system workable, the properties of the surface and the nature of the linker between the surface and the attached DNA must be carefully controlled. Many attachment chemistries have been investigated and a standard method is to link the amino group of a 5′-amino-modified DNA strand to a carboxylic acid on the surface of the microarray by a diimide-mediated coupling reaction (Figure 14).
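The gain in density from smaller spots can be estimated with a short calculation, sketched here in Python. The printable area (20 mm × 60 mm) and the gap between spots (equal to one spot diameter) are illustrative assumptions and not figures from the text; only the 100 μm and 1 μm spot sizes are taken from the paragraph above.

```python
def grid_capacity(width_um: int, height_um: int, spot_diameter_um: int, gap_um: int) -> int:
    """Upper bound on the number of spots in a square grid.

    The centre-to-centre pitch is the spot diameter plus the gap between
    neighbouring spots; all lengths are in micrometres.
    """
    pitch = spot_diameter_um + gap_um
    return (width_um // pitch) * (height_um // pitch)


if __name__ == "__main__":
    # Assumed printable region of a slide: 20 mm x 60 mm (hypothetical).
    width_um, height_um = 20_000, 60_000

    coarse = grid_capacity(width_um, height_um, spot_diameter_um=100, gap_um=100)
    fine = grid_capacity(width_um, height_um, spot_diameter_um=1, gap_um=1)

    print(coarse)          # spots available with 100 um spots
    print(fine)            # spots available with 1 um spots
    print(fine // coarse)  # gain in density
```

Because capacity scales with the inverse square of the pitch, a 100-fold reduction in spot size gives a 10 000-fold increase in the number of spots on the same area, whatever printable area is assumed.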

Figure 14 | Scheme showing the steps involved in array functionalization

The mechanism of the coupling reaction is shown in Figure 16. The acid is converted into an active ester (an O-acylisourea) by reaction with a diimide such as diisopropylcarbodiimide (DIC, Figure 15). This active ester is then reacted with a nucleophile such as 1-hydroxybenzotriazole (HOBt) in a fast reaction to form a second active ester; the side-product of this step is a urea. If the HOBt step is omitted, slow rearrangement of the first active ester to the unreactive N-acylurea can occur. Finally, reaction of an amine with the HOBt ester forms the amide.

Figure 15 | Structures of the diimide coupling reagents dicyclohexylcarbodiimide (DCC), diisopropylcarbodiimide (DIC) and 1-ethyl-3-(3-dimethylaminopropyl)carbodiimide (EDC)

Different coupling agents produce ureas with different properties, which makes different diimides suitable for use in different conditions (Table 2).

Table 2 | Solubility and use of diimide coupling reagents

Diimide | Solubility of urea | Use
Dicyclohexylcarbodiimide (DCC) | Poorly soluble; removed by filtration | Solution-phase chemistry
Diisopropylcarbodiimide (DIC) | Soluble in DCM; removed by DCM washes | Solid-phase chemistry
1-Ethyl-3-(3-dimethylaminopropyl)carbodiimide HCl salt (EDC) | Soluble in water; removed by aqueous washes | Solution-phase chemistry

Figure 16 | Mechanism of the diimide coupling reaction

Other coupling methods include linking the amine to an aldehyde on the surface by imine formation followed by reduction to stabilize the linkage, and the use of epoxy-coated slides (Figure 17).

Figure 17 | Array functionalization

Fluorescence in situ hybridization (FISH)

The principles of in situ hybridization are discussed in The synthesis and applications of chemically modified oligonucleotides, but fluorescence in situ hybridization (FISH) is worthy of particular attention. FISH is an important tool in genetic analysis because it allows cellular DNA or RNA to be detected and located within morphologically preserved chromosome preparations. The principle lies in the annealing of a labelled probe to its complementary strand within the chromosomes of fixed cells or tissues, followed by detection of the fluorescent label. In FISH, the target is embedded in a complex matrix that can hinder probe access and destabilize the probe:target hybrid. The probes (DNA or RNA) are usually prepared by one of three polymerase-based methods (nick translation, random priming or PCR), all of which allow the incorporation of fluorescently-labelled deoxynucleoside triphosphates (e.g. Figure 17). An average of one fluorescent label per 30 nucleotides is a typical level of incorporation. The length of a DNA probe can be between 100 bp and 1000 bp. Longer probes increase non-specific background, but short probes can be difficult to detect owing to insufficient hybridization and low levels of labelling. The target must be accessible to the probe, and it must be retained in situ and not degraded by nuclease enzymes. The targets that can be visualised range from an entire chromosome down to a 40 kb chromosomal section.

FISH has been used in toxicological studies to monitor the effect of radiation on chromosomal aberrations (structural and numerical). Chromosome painting by combinatorial or ratio labelling of specific probes has made it possible to paint all 24 different human chromosomes in distinct colours, which can provide a general screening test for chromosome abnormalities such as Down's syndrome. Other applications include monitoring aneuploidy in sperm (aneuploidy is a major cause of birth defects). Genetic mapping with probes labelled in multiple colours allows the order of genes along a chromosome to be determined.

This article on Sequencing, forensic analysis and genetic analysis is part of the Nucleic Acids Book.