
ARTICLE: IEEE Spectrum: This is Your Brain Online

Paul Allen's $100 million for mapping the brain will produce the largest trove of biological data ever

By Philip E. Ross

Wed. 31 Dec 03 18:16 1600 GMT

A new discipline has bloomed at the intersection of biology and computer science. Called bioinformatics, it is already
so far advanced that many life scientists spend more time at their computers than they do at the laboratory bench. They
gobble processing power like peanuts, burying themselves in the massive comparison of genes and the chemical
instructions the genes issue to the body's cells.

"Massive" took on new meaning in September, when Paul G. Allen, cofounder of Microsoft Corp., in Redmond, Wash.,
unveiled the greatest bioinformatics initiative yet: a map called the Allen Brain Atlas, indicating which of 20 000
genes is doing what in the brain and where it is doing it. The project is being undertaken by the Allen Institute for
Brain Science, in Seattle, Wash.

Allen put up the US $100 million that the three-year mission is expected to consume, and he has promised to put the
entire thing on the Web, in quarterly installments, for individual researchers to access free of charge. Then the real
magic will begin: neuroscientists will sift the data for insight into the workings of the mind and clues to the causes,
and possibly the cures, of such devastating ailments as Parkinson's disease, epilepsy, depression, obsessive-compulsive
disorder, alcoholism, and schizophrenia.

These brain disorders mostly resist today's drug treatments, and because they generally torture people for years
without killing them, they have created a huge pool of patients whose care costs tens of billions of dollars—not to
mention indirect economic costs, notably from lost workdays. New ideas for drug therapy are desperately required;
Allen's Atlas promises to provide them.

Up to now the biggest numbers game in biology had been run by the publicly financed Human Genome Project, which
sequenced each of the three billion letters in the DNA code for a human being. "I know a neuroscientist who downloaded
the Human Genome Project onto an Apple iPod," says Mark Boguski, an M.D. and Ph.D. who is a veteran of that project and
who now directs the Atlas [see photo, "Cranial Cartographer"]. "But that was 3 gigabytes, and we will be producing
petabytes."

The orders-of-magnitude calculation is simple: multiply 20 000 genes by a trillion neurons. Nobody will be downloading
this mass of data—that's for sure. Drug companies and other power users that want to get their arms around the entire
data set to apply their own algorithms will have to pay for special access to the Atlas computers or to other computers
carrying a copy of all its data.
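The orders-of-magnitude claim can be checked with a few lines of Python. This is a back-of-envelope sketch that takes the article's round figures at face value (20 000 genes, "a trillion neurons", three billion DNA letters) and adds one assumption of our own: a single byte per gene-neuron entry and per DNA letter. Under those assumptions the genome comes to the 3 gigabytes Boguski cites, and the gene-by-neuron table comes to about 20 petabytes.

```python
# Back-of-envelope sketch using the article's own round numbers.
# Assumption (not from the article): one byte per gene-neuron entry
# and one byte per DNA letter. Decimal units throughout.
GENES = 20_000
NEURONS = 10**12          # "a trillion neurons", as stated in the text
GENOME_BASES = 3 * 10**9  # "three billion letters"


def entries(genes: int, neurons: int) -> int:
    """Number of cells in a gene-by-neuron expression table."""
    return genes * neurons


atlas_bytes = entries(GENES, NEURONS)  # 1 byte per entry (assumed)
genome_bytes = GENOME_BASES            # 1 byte per DNA letter (assumed)

print(f"Atlas table: {atlas_bytes / 10**15:.0f} PB")  # Atlas table: 20 PB
print(f"Genome text: {genome_bytes / 10**9:.0f} GB")  # Genome text: 3 GB
print(f"Ratio: {atlas_bytes // genome_bytes:,}x")     # Ratio: 6,666,666x
```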

The project is negotiating with an unnamed supplier for 24 servers (or nodes) for its computer farm, says Brian Crook,
a software engineer who has been with the project for the past six months. "That's just a start. In my last job, I
worked on a system that had just 15 terabytes in compressed form, and it required 60 nodes." The Atlas will certainly
need hundreds.

It may seem strange that so much more information can come from a single organ than from a genome that specifies the
entire body, but DNA is really more like a recipe than a blueprint. It tells you not where every last component will go
but merely what instructions to follow to get them there. Just as a very few ingredients can give rise to the
delicately detailed swirls of foam in a soufflé, so can a handful of genes specify the stupendous complexity of the
cerebral cortex—in this case, the mouse cortex.

Big as the Brain Atlas will be, its managers intend to start small, by scanning the gray matter of a little black-
furred thoroughbred called the C57BL/6 mouse. It is a genetically uniform, fast-growing critter that has the added
advantage of having all its genes decoded. With the transcript of that code—which includes about as many genes as
humans have—the project's directors hope to get a lot of work done fast and show some medically valuable results.

Only later will they begin associating structures in the mouse brain with those in the human one, a painstaking slog
that must await the development of technology that can safely zero in on the physiology of individual neurons without
first having to kill the subject.

"Paul Allen did not come to us and say, 'Make an atlas of the mouse brain,' " Boguski says. "But he very quickly came
to realize that the mouse brain was a very powerful tool." By "us," he means the board of advisers, an august bunch
that includes the linguist Steven Pinker, professor of psychology at Harvard University in Cambridge, Mass., and author
of The Language Instinct, and the molecular biologist James D. Watson, a codiscoverer of the structure of DNA, who is
now president of Cold Spring Harbor Laboratory on Long Island, N.Y.

Allen was fascinated by the Human Genome Project, and like many computer people, he had also been interested in
modeling the mind. The brain project fit both interests and also promised to give a lot of bang for his philanthropic
buck. For several years, such a project had been on the wish list of the government-funded National Institutes of
Health, in Bethesda, Md. Only now, however, had the technology become equal to the task. Methods of speeding and
automating biological research had ripened, the mouse genome had been sequenced, and the ability to manage large data
sets had matured.

WINNER: ALLEN BRAIN ATLAS

GOAL In three years, create a comprehensive map of the mouse brain showing in which cells each of 20 000 genes is
active, and make it available to researchers free of charge on the Web. Later, make an equivalent map of the human
brain.

WHY IT'S A WINNER Generating more biological data than any project that has come before, the Atlas will provide the
most detailed map of the most complex organ. It will be a key resource for understanding and then combating many
intractable brain disorders.

ORGANIZATION Allen Institute for Brain Science

CENTER OF ACTIVITY Seattle, Wash.

NUMBER OF PEOPLE ON THE PROJECT 45, going to 100 in two years

BUDGET US $100 million provided by Paul Allen

How can a mere mouse yield medical wonders? There are plenty of physical diseases for which the mouse has proved to be
a useful model, and there is no reason it cannot uncover the roots of mental disorders as well. True, a mouse does not
have all that much upstairs. Yet, though the rodents do not suffer from the same sort of depression that afflicts
people—they do not despise themselves or pray for death—a few seem to experience something rather like it. A normal
mouse, set afloat in a tank with a hidden underwater perch, will tread water until its feet find purchase.
"Quasidepressive" mice give up quickly and sink. However, treated with antidepressants, those same mice will persevere.


The mouse leg of the Allen Brain Atlas is just getting started in a long, low building in the Seattle area. A walk
through the place gives me a distinct feeling of déjà vu: multiple office kitchens filled with appliances, sunny
conference rooms strewn with $1000 Aeron executive chairs. Yep, this place was previously occupied by a dot-com company
that apparently closed up shop before its employees had time to leave a single coffee stain on the carpet.

As visitors enter the laboratory proper, they pass a few big pieces of equipment from Germany, still in their crates; a
lot more are on order. People are on order, too, as Boguski quietly makes clear to a delegation of British scientists.
Advertisements in the science journal Nature have elicited a flood of résumés from experts in animal care,
neuroscience, and computer science—the current hiring rate runs at about three per week.

Right now, just three people are sitting in the lab, all huddled along a pair of benches. Another 20 or so have yet to
relocate from temporary quarters in the downtown Seattle offices of Vulcan Ventures Inc., the investment firm that
manages Allen's many biotech enterprises. By spring, the project should be staffed with up to about 45 people; in
another couple of years, it will reach 100, including a number of top scientists holding joint academic appointments.

The basic goal is to show what the brain genes do and where they do it. Each gene directs the manufacture of a
particular protein. Famous examples of genes identified by the proteins they make include the one for hemoglobin (which
carries oxygen in the blood) and those for the enzymes that produce sex hormones such as estrogen (which feminizes
women) and testosterone (which turns men into fools).

In the brain, some of the most interesting proteins are receptors, so called because they sit on the cell membrane and
receive chemical messengers. For instance, the dopamine D4 receptor detects the messenger dopamine as it comes in from
neighboring neurons; this receptor is thought to play a role in schizophrenia, depression, attention-deficit disorder,
and even the penchant for novel experiences.

Ideally, the Atlas would study the proteins directly, but because proteins are hard to detect in fine detail and at
high production speeds, the Atlas will instead follow an easier target: messenger RNA, the nucleic acids that carry each
gene's instructions out of the nucleus to the ribosomes, which translate them into protein. The technique involves
slicing the brain into many thin sections, putting each one on a slide, and exposing it to chemical probes that bind to
the particular nucleic acids you're interested in.

Chemical reactions turn the attached acids a color so that their location in the brain can be scanned and digitized.
The resulting deluge of data must be cataloged so that various search algorithms can fit it all into physiologically
meaningful patterns.
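To make the cataloging step concrete, here is a minimal sketch of how digitized stain images might be turned into a searchable index. Everything in it is hypothetical: the threshold value, the class and method names, the gene labels in the usage lines, and the assumption that each probe's color arrives as its own grid of 0-255 intensities. It simply records where each gene's stain meets a threshold and lets you query either by gene or by location.

```python
from collections import defaultdict

# Hypothetical stain-intensity threshold on a 0-255 scale; a real pipeline would tune this.
THRESHOLD = 128


def stained_pixels(image, threshold=THRESHOLD):
    """Return (row, col) of every pixel whose stain intensity meets the threshold."""
    return [(r, c)
            for r, row in enumerate(image)
            for c, value in enumerate(row)
            if value >= threshold]


class ExpressionIndex:
    """Toy catalog mapping genes to locations and locations to genes."""

    def __init__(self):
        self._by_gene = defaultdict(set)      # gene -> {(slice_id, row, col)}
        self._by_location = defaultdict(set)  # (slice_id, row, col) -> {gene}

    def add_slice(self, slice_id, gene, image):
        """Record every above-threshold pixel of one gene's stain image."""
        for r, c in stained_pixels(image):
            self._by_gene[gene].add((slice_id, r, c))
            self._by_location[(slice_id, r, c)].add(gene)

    def where(self, gene):
        """All locations where the gene was detected, sorted."""
        return sorted(self._by_gene.get(gene, ()))

    def genes_at(self, slice_id, row, col):
        """All genes detected at one pixel of one slice, sorted."""
        return sorted(self._by_location.get((slice_id, row, col), ()))


# Usage with two tiny made-up 2x2 images from the same slice.
idx = ExpressionIndex()
idx.add_slice(1, "Drd4", [[0, 200], [130, 10]])
idx.add_slice(1, "Gad1", [[0, 255], [0, 0]])
print(idx.where("Drd4"))      # [(1, 0, 1), (1, 1, 0)]
print(idx.genes_at(1, 0, 1))  # ['Drd4', 'Gad1']
```

Keeping two dictionaries doubles the memory but makes both "where is this gene on?" and "what is on here?" single lookups.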

To get a first pass through 20 000 genes within three years, the Atlas project will rear mice to a precise age, then
sacrifice them minutes, even seconds, before cutting their brains into 25-µm slices, about three cells thick. Speed is
of the essence: other organs can be put on ice, but brains need glucose and oxygen from second to second or they begin
to die. To help freeze important molecules in place, the workers will inject a preservative into the mouse while it's
still alive and use the heart to pump the liquid through the brain.

The entire process will be standardized and, to the extent possible, roboticized to increase productivity. A slice will
be wafted to a slide with a puff of air, lightly glued to it (rather as a Post-it note is stuck to a desk), then
examined under a microscope and its image digitized [see photo, "One Down..."].

No brains are being dissected right now, and the various ways of automating the slicing, dicing, dissecting, and
staining are still being worked out. It's clear, though, that the factory will have to work fast. Given a 1.5-cm-long
brain and a 25-µm-thick slice, you get 600 slices per brain. Because each slice can reveal just three genes, every
triplet of genes needs a whole brain's worth of slices, so the goal of looking at 20 000 genes altogether requires about
20 000 / 3, or roughly 7000 brains (perhaps 8000 for good measure). That comes to some four million slices, most of
which will have to be processed in the last two years of the three-year run, when the system should be generating some
30 000 slides per week.
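The slide-count arithmetic in the paragraph above can be reproduced directly. This sketch uses only the article's figures plus two assumptions of ours: the final two years are treated as 104 weeks, and each brain's full set of slices is devoted to one triplet of genes.

```python
import math

BRAIN_LENGTH_UM = 15_000   # 1.5 cm brain
SLICE_UM = 25              # section thickness
GENES = 20_000
GENES_PER_SLICE = 3
WEEKS_IN_TWO_YEARS = 104   # assumption: 52 weeks per year
STATED_RATE = 30_000       # slides per week, from the article

slices_per_brain = BRAIN_LENGTH_UM // SLICE_UM        # slices per brain
brains = math.ceil(GENES / GENES_PER_SLICE)           # one brain per gene triplet
total_slices = brains * slices_per_brain              # total slides to process

# Weekly rate if every slide had to be processed in the final two years.
rate_if_all = total_slices / WEEKS_IN_TWO_YEARS
# Fraction of all slides the stated rate covers over those two years.
share_at_stated_rate = STATED_RATE * WEEKS_IN_TWO_YEARS / total_slices

print(slices_per_brain, brains, total_slices)
print(round(rate_if_all), f"{share_at_stated_rate:.0%}")
```

Spreading every slide evenly over the last two years would take roughly 38 000 slides a week; at the stated 30 000 per week, those two years would cover about 78 percent of the total, which fits the article's "most of which."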

Already, Baylor College of Medicine, in Houston, Texas, has produced some sample slides for the programmers to play
with while they optimize the software [see screen shot, "Brainscape Navigator"]. By thus tackling the information
technology (IT) challenge first, they will have the project ready when the flood of homegrown data starts pouring in. A
slide's worth of data may not seem like much, coming as it does from just three genes, but because each gene's
protein-making activity is mapped over two dimensions, the yield comes to some 50 MB.
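Combining the 50 MB-per-slide figure with the four million slices gives a rough sense of the raw image volume, and scaling Crook's previous system (15 terabytes on 60 nodes) gives a crude node count. Both steps rest on assumptions of ours: decimal units, raw slide images only (no 3-D reconstructions, indexes, or backup copies), and a linear relation between data volume and node count, even though Crook's 15 terabytes were compressed.

```python
SLIDES = 4_000_000          # "four million slices"
MB_PER_SLIDE = 50           # "some 50 MB" per slide
REF_TB, REF_NODES = 15, 60  # Crook's earlier system (compressed data)

# Decimal units: 1 TB = 10**6 MB.
total_tb = SLIDES * MB_PER_SLIDE / 1_000_000

# Assumes node count grows linearly with data volume (a strong simplification).
nodes = total_tb * REF_NODES / REF_TB

print(f"{total_tb:.0f} TB of slide images -> ~{nodes:.0f} nodes")
```

The result, on the order of 800 nodes, is at least consistent with Crook's expectation that the Atlas will need hundreds.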

That's the kind of specificity that medical researchers need. "It's like real estate—what matters is location,
location, location," says Boguski. "It matters not just what area the molecule's in but what cell it's in."

Rather than carefully dissect a single slice for hours, the project will shove many identical slices through its mill
fast, taking care to slice at the same angle every time. This strategy of trading quality of data for quantity is as
foreign to the painstaking world of neuroanatomy as it is familiar to that of computer science. Chess software, for
example, incorporates little chess knowledge but applies it to so many millions of possible lines of play that it can
give headaches even to Garry Kasparov, the world's top player.

Neuroanatomists, like chess masters, don't like the idea of an automated factory beating them at their own game. "One I
talked to said that with 25-micron sections, we often wouldn't even get the nucleus [the cell's central DNA archive],"
says Boguski. "I asked him, if it were cheap enough to do the same experiment 1000 times, wouldn't that be better than
doing it once, thoroughly?" A thousand slices should get the nucleus most of the time.
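Boguski's argument for repetition is a simple probability calculation. If a single 25-µm section captures a given nucleus with probability p, and sections are treated as independent (an idealization), the chance that at least one of n sections captures it is 1 - (1 - p)^n. The p values below are made up for illustration; the article gives none.

```python
def p_at_least_once(p_single: float, trials: int) -> float:
    """Chance that at least one of `trials` independent sections captures the nucleus."""
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be a probability")
    return 1.0 - (1.0 - p_single) ** trials


# Hypothetical per-section hit rates, for illustration only.
for p in (0.01, 0.1, 0.3):
    once = p_at_least_once(p, 1)
    thousand = p_at_least_once(p, 1000)
    print(f"p={p}: one section {once:.3f}, 1000 sections {thousand:.6f}")
```

Even with a 1 percent hit rate per section, a thousand repetitions push the chance of catching the nucleus above 99.99 percent.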

Data from each two-dimensional slice will be fed to programs that reconstruct the brain's structures in three
dimensions. Say you have a neuron whose cell body—containing the nucleus—sits in the middle of the brain and whose
axon—the long, communicating stalk analogous to an interconnect in an IC—reaches to the frontal part of the brain. The
software will have to tease out the long, skinny, possibly oblique path frame by frame.
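One simple way to follow a thin structure such as an axon from section to section is nearest-neighbor linking: start from a labeled point, then in each following slice pick the closest labeled pixel within some search radius. The sketch below does exactly that and nothing more. It assumes the slices are already aligned to one another and already segmented into sets of axon pixels; the function name and the `max_jump` parameter are hypothetical, and this is not the Atlas's reconstruction software.

```python
import math

SLICE_UM = 25  # section thickness from the article


def trace(stack, start, max_jump=2.0):
    """Follow a thin structure through a stack of segmented 2-D slices.

    stack: list of slices; each slice is a set of (x, y) pixels labeled as axon.
    start: (x, y) in slice 0.
    Returns a list of (x, y, z_um) points. Tracing stops at the first slice
    with no labeled pixel within max_jump pixels of the previous point.
    """
    x, y = start
    if (x, y) not in stack[0]:
        raise ValueError("start point is not labeled in the first slice")
    path = [(x, y, 0)]
    for z, pixels in enumerate(stack[1:], start=1):
        # Distance from the current point to every labeled pixel in this slice.
        candidates = [(math.dist((x, y), p), p) for p in pixels]
        candidates = [c for c in candidates if c[0] <= max_jump]
        if not candidates:
            break
        _, (x, y) = min(candidates)  # closest pixel; ties broken by coordinates
        path.append((x, y, z * SLICE_UM))
    return path


# Usage: a short oblique path, plus an unrelated pixel far away in slice 1.
example = [{(0, 0)}, {(1, 0), (5, 5)}, {(2, 1)}, {(9, 9)}]
print(trace(example, (0, 0)))  # [(0, 0, 0), (1, 0, 25), (2, 1, 50)]
```

The search radius is the key design trade-off: too small and an oblique axon is lost between slices, too large and the trace can jump onto a neighboring fiber.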

Unlike most other big biology projects, the Atlas will study genes just as they come up, in no particular order. "That
was a mistake at the Human Genome Project—the scientists stayed with their favorite areas and the work never got done,"
says Boguski. "The real surprises will come when we look at all genes, agnostically. Scientists are trained to be
hypothesis-driven, but the Atlas will be data-driven."

Boguski, a pathologist by training, worked in bioinformatics on the Human Genome Project before either the project or
the field bore those names. He later worked at a bioinformatics company, Rosetta Inpharmatics LLC, now in Kirkland,
Wash. His background makes him particularly sensitive to another mistake the Atlas means to avoid: shortchanging
bioinformatics. "The original funders of the Human Genome Project underestimated the IT element, the National
Institutes of Health have not come to grips with it, and GenSat [a government mouse-brain anatomy program] paid only
for data production, not for bioinformatics," Boguski says.

That oversight in the Human Genome Project led to a last-minute scramble in the spring of 2000 to develop a program
that assembled all the various fragments of genetic code into a mostly coherent whole. "If you can't use the data, what
good is it?" Boguski asks. "We have twice as many computer science people as biologists right now, and even when the
project reaches full employment, the ratio will probably still be 1:1."
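Assembling overlapping fragments into a longer sequence can be illustrated with the textbook greedy approach: repeatedly merge the two fragments with the longest suffix-prefix overlap. This toy version is not the program written in 2000; it ignores sequencing errors, repeats, and reverse-complement strands, all of which real assemblers must handle, and the `min_len` cutoff is an arbitrary choice.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a equal to a prefix of b, if >= min_len; else 0."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(fragments, min_len=3):
    """Greedily merge the pair of fragments with the largest overlap until none remain."""
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    # Drop fragments wholly contained in another fragment.
    frags = [f for i, f in enumerate(frags)
             if not any(i != j and f in g for j, g in enumerate(frags))]
    while len(frags) > 1:
        best = (0, None, None)  # (overlap length, left index, right index)
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        if k == 0:
            break  # no overlaps left: remaining pieces stay as separate contigs
        merged = frags[i] + frags[j][k:]
        frags = [f for n, f in enumerate(frags) if n not in (i, j)] + [merged]
    return frags


reads = ["ATTAGACCTG", "CCTGCCGGAA", "AGACCTGCCG", "GCCGGAATAC"]
print(greedy_assemble(reads))  # ['ATTAGACCTGCCGGAATAC']
```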

Lin Chen, a bioinformaticist at the Atlas, works in a number of software environments. Strewn around his workstation
are manuals from Red Hat Inc., in Raleigh, N.C., whose software is based on Linux, the open-source operating system
modeled on Unix. "Unix is big in bioinformatics because it's good for big-batch processing," Chen says. "Also, a lot of
the software is written in Perl, which is easy, fast, and loaded with functions to deal with pattern search and string
search. Most things here, I wrote in Perl." Perl is the programming language developed by linguist Larry Wall to take
in large amounts of data and manipulate it flexibly; its devotees call it a "Swiss Army chainsaw."
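Chen's point about pattern and string search is easy to illustrate. This sketch uses Python's standard `re` module rather than Perl, but the regular-expression idea carries over directly. It converts a DNA motif containing a few IUPAC ambiguity codes (R for A or G, Y for C or T, N for any base) into a regular expression and reports every match position, including overlapping ones. Only those codes are mapped here; any other letter raises a `KeyError`.

```python
import re

# A subset of the IUPAC nucleotide ambiguity codes, as regex character classes.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "N": "[ACGT]"}


def motif_positions(sequence: str, motif: str) -> list[int]:
    """0-based start positions of every (possibly overlapping) match of motif."""
    pattern = "".join(IUPAC[base] for base in motif.upper())
    # A zero-width lookahead lets matches overlap, e.g. "AA" in "AAA" at 0 and 1.
    return [m.start() for m in re.finditer(f"(?={pattern})", sequence.upper())]


print(motif_positions("AAA", "AA"))                # [0, 1]
print(motif_positions("GAATTCGGATCC", "GRATYC"))   # [0, 6]
```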

Chen used to work for Celera Genomics, part of Applera Corp., based in Norwalk, Conn. Celera made itself a big name
(though no money) by sequencing the human genome faster than the Human Genome Project could manage. Unlike the Human
Genome Project, Celera did not underestimate the IT challenge: it spent $50 million on what was one of the largest
computing centers outside government weapons laboratories.

The Atlas governing body, the Allen Institute for Brain Science, won't make a dime from this, either—it can't, as it is
a not-for-profit organization. However, that isn't stopping it from acting entrepreneurially. Boguski is looking for
corporate and government money to extend and enhance the project. "We are not just building an atlas, we're building a
platform that can be used for other experiments," he says.

One next-generation project would be to study mice in which certain genes have been disabled, so that their role in the
brain can be deduced. Another would be to construct maps not of the immediate chemical messengers of the genes, as the
scientists are doing now, but of the final chemical products—the entire set of proteins in the brain, its so-called
proteome.

Next might be the extension of this static image of the brain to a more dynamic one, part of the ever-increasing
dimensionality: first a line (the string of DNA code), then a plane (gene activity in a cross section), next a rendered
solid, and finally, perhaps, a representation of how the solid structure changes over time. Of course, to keep Paul
Allen happy, the researchers will endeavor to link this picture of mice to men, in a point-to-point correspondence
between the brains of the two species.

It is not yet clear how this will be done. One way might be to tag nucleic acids performing protein synthesis with
magnetic molecules, then to scan the living brain electromagnetically, as in functional magnetic resonance imaging. The
result, assembled by a computer, would then be a 3-D depiction of the synthesis of the protein in question.

"We'd start by scanning small animals noninvasively, then move to humans," Boguski says. Even if only a few proteins
could be outlined in this fashion, the resulting information could serve as signposts for the proper alignment of other
data that can be physiologically linked to it.
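Using a few proteins as "signposts" amounts to landmark-based registration: given matched points in two images, fit a transform that maps one onto the other, then apply it to everything else. As an illustration of the general technique only, this sketch fits a 2-D similarity transform (scale, rotation, translation) by least squares, treating points as complex numbers so the fit reduces to two sums. The function name and landmark coordinates are hypothetical; aligning real mouse and human brains would need 3-D, nonlinear warps.

```python
def fit_similarity(src, dst):
    """Least-squares 2-D similarity transform from matched landmarks.

    src, dst: equal-length lists of (x, y) landmark pairs (at least two).
    Points are treated as complex numbers z = x + iy and we solve w ~ a*z + b,
    where the complex a encodes scale and rotation and b encodes translation.
    Returns a function mapping an (x, y) point from src space to dst space.
    """
    if len(src) != len(dst) or len(src) < 2:
        raise ValueError("need at least two matching landmark pairs")
    zs = [complex(x, y) for x, y in src]
    ws = [complex(x, y) for x, y in dst]
    z_mean = sum(zs) / len(zs)
    w_mean = sum(ws) / len(ws)
    # Closed-form least-squares solution on mean-centered coordinates.
    num = sum((w - w_mean) * (z - z_mean).conjugate() for z, w in zip(zs, ws))
    den = sum(abs(z - z_mean) ** 2 for z in zs)
    if den == 0:
        raise ValueError("source landmarks must not all coincide")
    a = num / den
    b = w_mean - a * z_mean

    def transform(point):
        w = a * complex(*point) + b
        return (w.real, w.imag)

    return transform


# Usage: landmarks related by a 2x scaling, a 90-degree rotation, and a shift of (3, 1).
src = [(0, 0), (1, 0), (0, 1)]
dst = [(3, 1), (3, 3), (1, 1)]
to_dst = fit_similarity(src, dst)
print(to_dst((1, 1)))  # approximately (1.0, 3.0)
```

With noisy landmarks the same code returns the best compromise in the least-squares sense rather than an exact fit.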

Like a detective's magnifying glass, the Atlas will aid sight, not confer it. A lot of inspired pattern-sifting will be
required to unravel the common psychiatric disorders, which mostly stem not from the actions of a single misbegotten
gene but from those of many genes, all reacting to environmental cues, one another, and each one's reactions.

"If the cause of a disease is like a needle in a haystack, then we're making the haystack smaller," Boguski says.
"There are pathways to disease, and once you find them, you're well on your way to finding the ultimate cause."

SOURCE: IEEE Spectrum On-line
http://www.spectrum.ieee.org/WEBONLY/publicfeature/jan04/0104bio1.html
