Computational Biology


Imaging has rapidly become a defining tool of the current era in biological research. But finding the right method and optimizing it for data collection can be a daunting process, even for an established imaging laboratory. Cold Spring Harbor Protocols is one of the world’s leading sources for detailed technical instruction for implementation of imaging methods, and the November issue features articles detailing standard and cutting-edge laboratory techniques.

The confocal microscope is a workhorse of the modern life science laboratory. Its popularity stems from its ability to permit volume objects to be imaged and rendered in three dimensions. But the confocal microscope itself does not produce three-dimensional images; in fact, it only images very thin sections of a specimen that lie within its focal region. To produce a three-dimensional image, a series of thin optical sections is collected, and computer processing is used to combine them into a volumetric rendering. In the first of November’s featured articles, Spinning-Disk Microscopy Systems, Oxford University’s Tony Wilson reviews the many methods for producing optical sections, of which the confocal optical system is just one. He also describes a number of convenient methods of implementation that can lead to, among other things, real-time image formation. The paper, like all our featured articles, is freely available to subscribers and non-subscribers alike.

Large segments of DNA can vary in copy number between individuals. Such copy number variations (CNVs) contribute greatly to genetic diversity and are also thought to be associated with susceptibility or resistance to some diseases, including cancer. Simple Copy Number Determination with Reference Query Pyrosequencing (RQPS), featured in the September issue of Cold Spring Harbor Protocols, provides an assay for determining the copy number of any allele in the genome. The method, from Raphael Kopan and colleagues at Washington University, takes advantage of the fact that pyrosequencing can accurately measure the ratio of DNA fragments in a mixture that differ by a single nucleotide. A reference allele with a known copy number and a query allele with an unknown copy number are engineered with single nucleotide variations, and the ratio seen between these probes and genomic DNA reflects the copy number. RQPS can be used to measure copy number of any transgene, differentiate homozygotes from heterozygotes, detect the CNV of endogenous genes, and screen embryonic stem cells targeted with bacterial artificial chromosome (BAC) vectors. RQPS is rapid, inexpensive, sensitive, and adaptable to high-throughput approaches. As one of our featured articles, the protocol is freely available to subscribers and non-subscribers alike.
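To make the ratio idea concrete, here is a minimal sketch in Python. It assumes a simplified proportional model that is not taken from the protocol itself: at each site, pyrosequencing reports the fraction of signal coming from the genomic allele (the rest coming from the probe), and the query copy number scales with the query ratio relative to the reference ratio. The function names and the example fractions are hypothetical.

```python
def allele_ratio(genomic_fraction):
    """Convert the genomic-allele fraction at one site into a genomic:probe ratio."""
    if not 0.0 < genomic_fraction < 1.0:
        raise ValueError("fraction must be strictly between 0 and 1")
    return genomic_fraction / (1.0 - genomic_fraction)


def estimate_copy_number(ref_copies, ref_genomic_fraction, query_genomic_fraction):
    """Estimate the query copy number from a reference of known copy number.

    Assumption (illustrative only): the probe contributes reference and query
    sequences in equal amounts, so dividing the two genomic:probe ratios
    cancels out the unknown amount of probe in the mixture.
    """
    ref_ratio = allele_ratio(ref_genomic_fraction)
    query_ratio = allele_ratio(query_genomic_fraction)
    return ref_copies * query_ratio / ref_ratio


if __name__ == "__main__":
    # Hypothetical readout: a two-copy reference gene reads 50% genomic,
    # while the query site reads 60% genomic.
    estimate = estimate_copy_number(2, 0.50, 0.60)
    print(f"estimated query copy number: {estimate:.2f}")
```

In practice the published protocol should be followed for calibration and controls; the sketch only shows why a ratio measured against a known reference is enough to recover an unknown copy number.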

The dynamic nature of biological processes has long been difficult to document, as researchers have been limited to static studies based on fixed specimens. Methods like immunocytochemistry or in situ hybridization can only provide accurate information on one organism at one particular time point. As Scott Fraser has remarked, it’s akin to trying to figure out the rules of football from looking at a set of still photographs taken during a game. But recent developments in imaging techniques, particularly the use of Green Fluorescent Protein (GFP) and its variants, have provided nondestructive ways to study dynamic processes over time, taking our understanding into the fourth dimension.

These new imaging techniques generate an enormous amount of digital image data, which can be difficult to cope with as it builds up over time. Computer-based image analysis is required for the extraction of reproducible and quantitative information. Previously, Cold Spring Harbor Protocols has featured Khuloud Jaqaman and Gaudenz Danuser’s case study using particle tracking to study cellular dynamics. In the June issue of the journal, Roland Eils and colleagues present Tracking and Quantitative Analysis of Dynamic Movements of Cells and Particles. The article sketches a general workflow for quantitative analysis of live cell images and details automated methods for image analysis including preprocessing, segmentation, registration, tracking and classification.

While 454-based pyrosequencing has led to great advances, an intrinsic artifact of the process leads to artificial over-representation of more than 10% of the original DNA sequencing templates. This is particularly problematic in metagenomic studies, where the abundance of any sequence in a dataset is often used for comparative community analysis: the over-representation can skew data interpretation when making comparisons between datasets, so it’s important to remove these artificial replicates before analysis. As metagenome datasets become more plentiful, the ability to apply more robust statistical tests becomes increasingly important, and the validity of the input datasets becomes more crucial. Tools such as MG-RAST (covered in the January issue of Cold Spring Harbor Protocols in Using the Metagenomics RAST Server (MG-RAST) for Analyzing Shotgun Metagenomes) have the capability to remove exact duplicates, but this captures only a subset of the artificial replicates. In the April issue of Cold Spring Harbor Protocols, Tracy Teal and Thomas Schmidt from Michigan State University present an instruction set for Identifying and Removing Artificial Replicates from 454 Pyrosequencing Data. Their 454 Replicate Filter is a web-based tool that incorporates the cd-hit algorithm. This protocol provides details on how to use the replicate filter and obtain a file of unique sequences for use in metagenomic or transcriptomic analyses. This allows users to obtain a more accurate quantitative representation of the sequence diversity in a dataset.
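The difference between removing exact duplicates and removing a broader class of replicates can be illustrated with a short Python sketch. This is not the 454 Replicate Filter or cd-hit: as a simplifying assumption, it treats reads that share the same leading bases as replicates of one template and keeps the longest of them. The function names, the prefix length, and the reads are hypothetical.

```python
def remove_exact_duplicates(reads):
    """Keep the first occurrence of each identical read."""
    seen = set()
    unique = []
    for read in reads:
        if read not in seen:
            seen.add(read)
            unique.append(read)
    return unique


def remove_prefix_replicates(reads, prefix_len=20):
    """Keep one read (the longest) per shared prefix.

    Assumption (illustrative only): reads starting with the same
    `prefix_len` bases came from the same template. A real tool would
    cluster by sequence similarity rather than by an exact prefix.
    """
    best = {}    # prefix -> longest read seen so far
    order = []   # prefixes in order of first appearance
    for read in reads:
        key = read[:prefix_len]
        if key not in best:
            best[key] = read
            order.append(key)
        elif len(read) > len(best[key]):
            best[key] = read
    return [best[key] for key in order]


if __name__ == "__main__":
    reads = ["ACGTACGTAA", "ACGTACGTAA", "ACGTACGTAAGG", "TTTTCCCCGG"]
    print(len(remove_exact_duplicates(reads)), "reads after exact-duplicate removal")
    print(len(remove_prefix_replicates(reads, prefix_len=8)), "reads after prefix grouping")
```

The exact-duplicate pass leaves the two reads that differ only in length, which is the kind of replicate an exact match cannot catch.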

Some recent articles discussing computer software designed for use by biologists (I can’t personally vouch for any of these programs, but thought they might be of interest to readers of CSH Protocols):

Even Better Free Molecular Biology Software: Serial Cloner. The always valuable Bitesize Bio website has a review of Serial Cloner, a cross-platform program for molecular biologists:

“It is very intuitive and is packed with features; from basics like constructing importing sequences, constructing plasmid maps and restriction mapping, through more complex things like sequence alignment, Gateway cloning and siRNA design.”

One of the commenters on the article also suggests PlasmaDNA.

iPhone apps every biologist needs: article from The Scientist, detailing 10 apps of interest. While many look useful, I’m not sure how many of them have added appeal on a mobile device (as opposed to use on a laptop or desktop computer). How often do you need to consult the periodic table while you’re on-the-go?

Also, I may be a Luddite, but in my lab days, you wouldn’t even take a lab manual to the bench; you’d photocopy the protocol you were going to use because you didn’t want the expensive manual exposed to harsh chemicals and other contaminants. Are people really using expensive and fragile items like the iPhone at the lab bench? Do you set your iPhone down next to the phenol, just behind the HCl? Can you use it while wearing gloves? Wouldn’t you worry about all the E. coli contaminating your gloves from the plasmid preps you’re doing? Do you really want to smear that all over the device you’ll be holding next to your face?

9/12/09–Edited to add: Here’s another list, of 50 Useful iPhone Apps for Science Students & Teachers.

To aid in the study of genetic diseases, the International Haplotype Map Project has developed a haplotype map of the human genome, a tool that displays common patterns of genetic variation. While data from the project are available for unrestricted public use from the project’s website, the new tools needed to browse the map can be difficult to master for the beginner. This month’s issue of Cold Spring Harbor Protocols features a set of articles with clear, step-by-step instructions for the analysis of HapMap data.

Browsing HapMap Data Using the Genome Browser provides details on how to navigate to and explore HapMap data for a gene or region of interest. Written by Albert Vernon Smith, this protocol shows how to analyze a candidate gene to find out whether there are any common single nucleotide polymorphisms (SNPs) in the immediate vicinity, what those SNPs’ alleles are, and the relative frequencies of the alleles in the population. As one of our featured articles for the month, it’s freely available to subscribers and non-subscribers.
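For readers who prefer to see the arithmetic, relative allele frequencies are simply allele counts divided by the total number of alleles sampled. The Python sketch below assumes genotypes written as two-letter strings such as "AG"; the function name and the sample data are hypothetical and are not HapMap output.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Return the relative frequency of each allele in a list of genotypes.

    Each genotype is a string of allele letters, e.g. "AG" for a
    heterozygote carrying one A and one G.
    """
    counts = Counter()
    for genotype in genotypes:
        counts.update(genotype)  # count each allele letter separately
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical genotypes for one SNP in four individuals.
    sample = ["CC", "CT", "CC", "CC"]
    for allele, freq in sorted(allele_frequencies(sample).items()):
        print(allele, freq)
```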

The other articles in the set (subscribers only) are Generating HapMap Data Text Reports Using the Genome Browser, Manipulating HapMap Data Using HaploView, Retrieving HapMap Data Using HapMart, and Retrieving HapMap Data via Bulk Download. If your institution does not yet subscribe and you’d like to see these articles, you can sign up here for a free three-month trial.

Tandem repeats are short stretches of DNA that are repeated head-to-tail. These are increasingly used as markers in forensic and genotyping research. But not all tandem repeats are created equal, as they display varying degrees of stability. A repeat must be unstable enough to generate the heterozygosity needed to discriminate between individuals in a population. Too much instability, though, makes it hard to see relatedness between samples over large evolutionary distances. To determine which repeats are useful as markers, Kevin Verstrepen’s lab at Harvard has created the SERV (“Sequence-Based Estimation of Repeats Variability”) applet, which finds repeats in DNA sequences and estimates their variability. Having first introduced it in this Genome Research paper, Sequence-based estimation of minisatellite and microsatellite repeat variability, Verstrepen and colleagues have now written a guide to using the SERV Applet, available in this month’s issue of CSH Protocols.
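As a rough illustration of what "head-to-tail" repetition looks like in sequence data, the Python sketch below finds exact tandem repeats with a regular expression. It is not the SERV algorithm: it only detects perfect repeats of a fixed unit length and makes no attempt to estimate variability. The function name and the example sequence are hypothetical.

```python
import re


def find_tandem_repeats(seq, unit_len, min_copies=3):
    """Find exact tandem repeats of a fixed unit length.

    Returns a list of (start, unit, copies) tuples, where `start` is the
    0-based position of the first copy. Only perfect repeats are found;
    interrupted or imperfect repeats are missed.
    """
    # A unit of `unit_len` bases followed by at least (min_copies - 1)
    # further copies of exactly the same unit.
    pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_copies - 1))
    hits = []
    for match in pattern.finditer(seq):
        unit = match.group(1)
        copies = len(match.group(0)) // unit_len
        hits.append((match.start(), unit, copies))
    return hits


if __name__ == "__main__":
    for start, unit, copies in find_tandem_repeats("TTGACACACACAGG", unit_len=2):
        print(f"{unit} x{copies} at position {start}")
```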

Legendre, M., Pochet, N., Pak, T., Verstrepen, K.J. (2007). Sequence-based estimation of minisatellite and microsatellite repeat variability. Genome Research, 17(12), 1787-1796. DOI: 10.1101/gr.6554007
Legendre, M., Verstrepen, K.J. (2008). Using the SERV Applet to Detect Tandem Repeats in DNA Sequences and to Predict Their Variability. Cold Spring Harbor Protocols, 2008(2), pdb.ip50. DOI: 10.1101/pdb.ip50

Over the last few months, we’ve presented a series of protocols for genotyping and DNA isolation in a variety of model organisms, and more will follow in the coming months. Much of this material was adapted in advance from Genetic Variation: A Laboratory Manual, which is now available from CSHL Press. It’s a difficult subject for a laboratory manual because the field is advancing so rapidly, and the question when it was proposed was: is it possible to put together a manual that isn’t obsolete the moment it’s published?

September’s issue of CSH Protocols is upon us (my, that was a quick summer–Kurt Weill sure knew what he was talking about). We’re featuring two protocols this month, both available freely to non-subscribers (as are all of our sample protocols). The first is for generating mouse models for squamous cell carcinoma and the second is for the Oligonucleotide Ligation Assay (OLA), used for finding Single-Nucleotide Polymorphisms (SNPs).

The July issue of CSH Protocols is available and features a set of articles detailing the basics of common bioinformatics techniques (and just in time for the 4th of July, allowing me to make a “blast” pun).