Instrumentation and Informatics

From the start, it has been clear that the Human Genome Project would require advanced instrumentation and automation if its mapping and sequencing goals were to be met. Here, especially, the DOE's engineering infrastructure and tradition of instrumentation development have been crucial contributors to the international effort. Significant DOE resources have been committed to innovations in instrumentation, ranging from straightforward applications of automation to improve the speed and efficiency of conventional laboratory protocols (see, for example, Faster, smaller, cheaper, top image) to the development of technologies on the cutting edge -- technologies that might increase mapping and sequencing efficiencies by orders of magnitude.

On the first of these fronts, genome researchers are seeing significant improvements in the rate, efficiency, and economy of large-scale mapping and sequencing efforts as a result of improved laboratory automation tools. In many cases, commercial robots have simply been mechanically reconfigured and reprogrammed to perform repetitive tasks, including the replication of large clone libraries, the pooling of libraries as a prelude to various assays, and the arraying of clone libraries for hybridization studies. In other cases, custom-designed instruments have proved more efficient. A notable illustration is the world's fastest cell and chromosome sorter, developed at Livermore and now being commercialized, which is used to sort human chromosomes for chromosome-specific libraries. Other examples include a high-speed, robotics-compatible thermal cycler developed at Berkeley, which greatly accelerates PCR amplifications, and instruments developed at Utah for automated hybridization in multiplex sequencing schemes.

[Figure: Faster, smaller, cheaper]

Smaller is better -- and other developments

Beyond "mere" automation are efforts aimed at more fundamental enhancements of established techniques. In particular, a number of DOE-supported efforts aim at improved versions of the automated gel-based Sanger sequencing technique. For example, in place of the conventional slab gels, ultrathin gels, less than 0.1 millimeter thick, can be used to obtain 400 bases of sequence from each lane in an hour's run, a fivefold improvement in throughput over conventional systems. Even greater speedups are seen when arrays of 0.1-millimeter capillaries are used as the separation medium. Both of these approaches exploit higher electric field strengths to increase DNA mobility and to reduce analysis times. And Livermore scientists are looking beyond even capillaries, to sequencing arrays of rigid glass microchannels, supplemented by automated gel and sample loading.

The capillary approach is especially ripe for further development. Challenges include providing uniform excitation over arrays of 50 to 100 capillaries and then efficiently detecting the fluorescence emitted by labeled samples. Technologies under investigation include fiber-optic arrays, scanning confocal microscopy, and cooled CCD cameras. Some of this effort has already been transferred to the private sector, and tenfold improvements in speed, economy, and efficiency are projected in future commercial instruments.

The move toward miniaturization is afoot elsewhere as well. Building on experiences in the electronics industry, several DOE-supported groups are exploring ways to adapt high-resolution photolithographic methods to the manipulation of minuscule quantities of biological reagents, followed by assays performed on the same "chip." Current thrusts of this "nanotechnology" approach include the design of microscopic electrophoresis systems and ultrasmall-volume, high-speed thermal cycling systems for PCR. A miniaturized, computer-controlled PCR device under development at Livermore operates on 9-volt batteries and might ultimately lead to arrays of thousands of individually controlled micro-PCR chambers.

Another miniaturization effort aims at the fabrication of high-density combinatorial arrays of custom oligomers (short chains of nucleotides), which would make feasible large-scale hybridization assays, including sequencing by hybridization. This innovative technique uses short oligomers that pair up with corresponding sequences of DNA. The oligomers are placed on an array by a process similar to that of making silicon chips for electronics. Successful matches between oligomers and genomic DNA are then detected by fluorescence, and the application of sophisticated statistical analyses reassembles the target sequence. This same technology has already been used for genetic screening and cDNA fingerprinting. Faster, smaller, cheaper (middle image) illustrates a DOE-supported application of high-density oligonucleotide arrays to the detection of mutations in the HIV-1 genome. Similar approaches can be envisioned to understand differences in patterns of gene expression: Which genes are active (which are producing mRNA) in which cells? Which are active at different times during an organism's development? Which are active, or inactive, in disease?
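The reassembly step in sequencing by hybridization can be illustrated with a small sketch. It assumes an idealized, error-free experiment in which the array reports exactly the set of length-k oligomers present in the target; the function names (`spectrum`, `reconstruct`) are hypothetical, and the Eulerian-path method shown is a textbook approach, not necessarily the statistical analysis used by the groups described above.

```python
from collections import defaultdict


def spectrum(sequence, k):
    """Return every length-k substring: the idealized hybridization signal."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def reconstruct(kmers):
    """Rebuild a sequence from its k-mers by walking an Eulerian path.

    Nodes are (k-1)-mers; each k-mer is an edge from its prefix to its suffix.
    Assumes the k-mers are error-free and admit an Eulerian path.
    """
    graph = defaultdict(list)
    balance = defaultdict(int)  # out-degree minus in-degree per node
    for kmer in kmers:
        prefix, suffix = kmer[:-1], kmer[1:]
        graph[prefix].append(suffix)
        balance[prefix] += 1
        balance[suffix] -= 1

    # Start at the node with one surplus outgoing edge, if there is one.
    start = next((n for n, b in balance.items() if b == 1), kmers[0][:-1])

    # Hierholzer's algorithm, written iteratively.
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if graph[node]:
            stack.append(graph[node].pop())
        else:
            path.append(stack.pop())
    path.reverse()

    # Spell the sequence: first node, then the last letter of each later node.
    return path[0] + "".join(node[-1] for node in path[1:])


if __name__ == "__main__":
    target = "ATGGCGTGCA"
    probes = sorted(spectrum(target, 4))  # probe order carries no information
    print(reconstruct(probes))
```

When a (k-1)-mer occurs more than once in the target, several reconstructions can be consistent with the same set of matches. That ambiguity, together with imperfect hybridization signals, is one reason real analyses need more than this sketch.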

Sequencing by hybridization is only one of several forward-looking ideas for revolutionizing sequencing technology. In spite of continuing improvements to sequencers based on the classic methods, it is nonetheless desirable to explore altogether new approaches, with an eye to simplifying sample preparation, reducing measurement times, increasing the length of the strands that can be analyzed in a single run, and facilitating interpretation of the results. Over the course of the past few years, several alternative approaches to direct sequencing have been explored, including atomic-resolution molecular scanning, single-molecule detection of individual bases, and mass spectrometry of DNA fragments.

All of these alternatives look promising in the long term, but mass spectrometry has perhaps demonstrated the greatest near-term potential. Mass spectrometry measures the masses of ionized DNA fragments by recording their time-of-flight in vacuum. It would therefore replace traditional gel electrophoresis as the last step in a conventional sequencing scheme. Routine application of this technique still lies in the future, but fragments of up to 500 bases have been analyzed, and practical systems based on high-resolution mass separations of DNA fragments of fewer than 100 bases are currently being developed at several universities and national laboratories.
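A rough calculation shows how time-of-flight depends on fragment length. The sketch below assumes singly charged ions, an accelerating potential of 20 kilovolts, a 1-meter field-free flight tube, and an average mass of about 330 daltons per nucleotide; all of these values are illustrative assumptions, not parameters reported for any of the instruments mentioned above.

```python
import math

ELEMENTARY_CHARGE = 1.602176634e-19  # coulombs
DALTON = 1.66053906660e-27           # kilograms
AVG_NUCLEOTIDE_MASS_DA = 330.0       # assumption: rough average per nucleotide


def flight_time(n_bases, voltage=20_000.0, tube_length=1.0, charge=1):
    """Seconds for an ion of n_bases nucleotides to drift down the tube.

    Energy balance: charge * e * V = (1/2) * m * v**2,
    so t = L / v = L * sqrt(m / (2 * charge * e * V)).
    """
    mass = n_bases * AVG_NUCLEOTIDE_MASS_DA * DALTON
    velocity = math.sqrt(2 * charge * ELEMENTARY_CHARGE * voltage / mass)
    return tube_length / velocity


if __name__ == "__main__":
    for n in (50, 100, 500):
        t = flight_time(n)
        gap = flight_time(n + 1) - t  # separation from the next-longer fragment
        print(f"{n:4d} bases: {t * 1e6:7.2f} us, gap to n+1: {gap * 1e9:6.1f} ns")
```

Because flight time grows only as the square root of mass, the relative gap between fragments of n and n+1 bases shrinks roughly as 1/(2n). Under these assumptions, telling adjacent fragments apart becomes steadily harder as they get longer.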

Another innovative sequencing method is under investigation at Los Alamos. As depicted in Faster, smaller, cheaper (bottom image), each of the four bases (A, T, C, G) in a single strand of DNA receives a different fluorescent label, then the bases are enzymatically detached, one at a time. The characteristic fluorescence is detected by a laser system, thereby yielding the sequence, base by base. This approach is beset by major technical challenges, and direct sequencing has not yet been achieved. But the potential benefits are great, and much of the instrumentation for sensitive detection of fluorescence signals has already proved useful for molecular sizing in mapping applications.

Dealing with the data

Among the less visible challenges of the Human Genome Project is the daunting prospect of coping with all the data that success implies. Appropriate information systems are needed not only during data acquisition, but also for sophisticated data analysis and for the management and public distribution of unprecedented quantities of biological information. Further, because much of the challenge is interpreting genomic data and making the results available for scientific and technological applications, the challenge extends not just to the Human Genome Project, but also to the microbial genome program and to public- and private-sector programs focused on areas such as health effects, structural biology, and environmental remediation. Efforts in all these areas are the mandate of the DOE genome informatics program, whose products are already widely used in genome laboratories, general molecular biology and medical laboratories, biotechnology companies, and biopharmaceutical companies around the world.

Laboratory data acquisition and management systems support the construction of genetic and physical maps, DNA sequencing, and gene expression analysis. These systems typically comprise databases for tracking biological materials and experimental procedures, software for controlling robots or other automated systems, and software for acquiring laboratory data and presenting it in useful form. Among such systems are physical mapping databases developed at Livermore and Los Alamos, robot control software developed at Berkeley and Livermore, and DNA sequence assembly software developed at the University of Arizona. These systems are the keys to efficient, cost-effective data production in both DOE laboratories and the many other laboratories that use them.

[Figure: Gene hunts]

The interpretation of map and sequence data is the job of data analysis systems. These systems typically include task-specific computational engines, together with graphics and user-friendly interfaces that invite their use by biologists and other non-computer scientists. The genome informatics program is the world leader in developing automated systems for identifying genes in DNA sequence data from humans and other organisms, supporting efforts at Oak Ridge National Laboratory and elsewhere. The Oak Ridge-developed GRAIL system, illustrated in Gene hunts, is a world-standard gene identification tool. In 1995 alone, more than 180 million base pairs of DNA were analyzed with GRAIL.
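To give a feel for what "identifying genes in DNA sequence data" involves computationally, here is a deliberately simple stand-in: a scan for open reading frames (a start codon followed, in the same reading frame, by a stop codon). This is not GRAIL's method, and the function name `find_orfs` and its `min_codons` parameter are hypothetical. A scan like this also ignores introns and the reverse strand, so it would miss most human genes.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(sequence, min_codons=3):
    """Return (start, end, frame) for ATG...stop stretches on the forward strand.

    Coordinates are 0-based and end-exclusive; the stop codon is included.
    min_codons counts every codon in the stretch, start and stop included.
    """
    sequence = sequence.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(sequence) - 2, 3):
            codon = sequence[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a candidate reading frame
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # close it and look for the next start codon
    return orfs


if __name__ == "__main__":
    dna = "CCATGAAATTTGGGTAACC"
    for start, end, frame in find_orfs(dna):
        print(frame, start, end, dna[start:end])
```

Practical gene finders combine many weaker signals of this kind with statistical or learned models, which is why purpose-built tools are needed for genomic-scale data.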

A third area of informatics reflects, in a sense, the ultimate product of the Human Genome Project -- information readily available to the scientific and lay communities. Public resource databases must provide data and interpretive analyses to a worldwide research and development community. As this community of researchers expands and as the quantity of data grows, the challenges of maintaining accessible and useful databases likewise increase. For example, it is critical to develop scientific databases that "interoperate," sharing data and protocols so that users can expect answers to complex questions that demand information from geographically distributed data resources. As the genome project continues to provide data that interlink structural and functional biochemistry, molecular, cellular, and developmental biology, physiology and medicine, and environmental science, such interoperable databases will be the critical resources for both research and technology development. The DOE genome informatics program is crucial to the multiagency effort to develop just such databases. Systems now in place include the Genome Database of human genome map data at Johns Hopkins University, the Genome Sequence DataBase at the National Center for Genome Resources in Santa Fe, and the Molecular Structure Database at Brookhaven National Laboratory.


Information provided by: http://www.ornl.gov