Bioinformatics

John Van Hemert, February 2006

What is Bioinformatics?

Bioinformatics is the marriage of information technology and biology. It is the science of using computing power to analyze biological data sets. At a more concrete level, bioinformatics combines computer science, biology, chemistry, statistics and mathematics. See the curriculum for the University of Northern Iowa undergraduate program in bioinformatics. In "Medical Informatics," Shortliffe and Blois describe the relationship between bioinformatics and medical informatics with a diagram similar to the following:

Computer Science in Bioinformatics

Computers give bioinformatics the processing speed needed to handle large data sets, such as the sequence data of entire genomes. They are needed to repeat tasks millions of times. Computers are also used for their problem-solving power on problems such as protein folding prediction (predicting the three-dimensional shape a protein takes on from its amino acid sequence, which is encoded in its gene's DNA). Computers have only recently become powerful enough to process and generate usable genetic data. Large-scale research is needed to find methods for storing such information, because conventional methods are simply inadequate.
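
One common idea for storing sequence data compactly is to exploit the fact that DNA has only four bases, so each base needs just two bits rather than a full byte per character as in plain text. The following is a minimal sketch in Python, assuming sequences contain only the letters A, C, G and T; the function names `pack` and `unpack` and the particular bit assignment are hypothetical choices for illustration, not a standard file format.

```python
# Hypothetical 2-bit code for each nucleotide (assumes only A, C, G, T occur).
ENCODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
DECODE = "ACGT"  # index i gives the base whose code is i


def pack(seq: str) -> bytes:
    """Pack an A/C/G/T string into bytes, four bases per byte."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            # Shift earlier bases left and append this base's 2-bit code.
            byte = (byte << 2) | ENCODE[base]
        # Left-align a short final chunk so the first base is always in the top bits.
        byte <<= 2 * (4 - len(chunk))
        out.append(byte)
    return bytes(out)


def unpack(data: bytes, length: int) -> str:
    """Recover the first `length` bases from bytes produced by pack()."""
    bases = []
    for byte in data:
        # Read the four 2-bit fields from the most significant bits down.
        for shift in (6, 4, 2, 0):
            bases.append(DECODE[(byte >> shift) & 0b11])
    # Drop the padding bases added to fill the last byte.
    return "".join(bases[:length])
```

For example, a sequence of one million bases stored as plain ASCII text takes 1,000,000 bytes, while the packed form takes 250,000 bytes. Practical storage formats must also handle ambiguous bases such as N, which this sketch does not support (`pack` raises a `KeyError` on them).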

Biology/Chemistry in Bioinformatics

Bioinformatics is the application of computer science to biology. It is a very young field. Most bioinformaticians come from a background in either biology or computer science, but not both. There is a large difference between these two routes into the field. When a computer scientist enters bioinformatics, she must gain domain knowledge in biology. When a biologist enters bioinformatics, he probably has a more difficult road to travel, because he must learn computer science. This difference exists because biology is a very difficult science to discover but a very easy science to understand: discoveries in biology are of Nobel stature, yet once a biological idea has been discovered, it can be explained to and understood by a child. Computer science, on the other hand, is very difficult to learn and easier to discover. It is this disparity that draws more computer scientists than biologists into bioinformatics.

Statistics/Mathematics in Bioinformatics

DNA sequencing involves breaking long DNA chains into short fragments that can be read individually; Expressed Sequence Tags (ESTs), short sequences read from expressed genes, are one example of such fragments. These fragments are then processed, usually on distributed or multiprocessor systems. When reassembling them, statisticians and mathematicians must use complex models to determine their original order. Bioinformatics involves complex algorithms and procedures. Many of these procedures are "wet": they are physically conducted by scientists in the laboratory and involve many steps. An example is the use of microarrays (see the chapter on Genetic Testing). With many human-performed steps, there are many chances for error to occur. This creates the need for statistical methods to estimate the accuracy of procedures such as microarray creation.
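
The idea of reassembling overlapping fragments into their original order can be illustrated with a simple greedy heuristic: repeatedly merge the pair of fragments that share the longest suffix/prefix overlap. This is a minimal sketch in Python and not the complex statistical models the text refers to; the function names, the `min_len` threshold, and the assumption of error-free fragments from a sequence without repeats are all hypothetical simplifications.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Return the length of the longest suffix of `a` that matches a prefix of `b`.

    Overlaps shorter than `min_len` count as no overlap (returns 0).
    """
    # Try the longest possible overlap first and shrink until a match is found.
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(fragments: list[str], min_len: int = 3) -> list[str]:
    """Greedily merge fragments by their longest overlaps.

    Returns one string per contig; more than one string means some
    fragments could not be joined.
    """
    frags = list(fragments)
    while len(frags) > 1:
        # Find the ordered pair (i, j) where frags[i] overlaps frags[j] the most.
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no remaining overlaps: stop with separate contigs
        # Join the two fragments, writing the shared overlap only once.
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for idx, f in enumerate(frags) if idx not in (best_i, best_j)]
        frags.append(merged)
    return frags
```

Calling `greedy_assemble(["ATCCAG", "TTACGG", "CGGATC"])` merges the three shuffled fragments back into the single sequence `TTACGGATCCAG`. Greedy merging can pick the wrong order when the original sequence contains repeats longer than the overlaps, which is one reason real assemblers rely on more elaborate graph-based and probabilistic methods.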

Bioinformatics "Gateways"

In their Instant Notes book, "Bioinformatics," Westhead, Parish and Twyman list several good starting points for bioinformatics on the World Wide Web:

National Center for Biotechnology Information (NCBI): Established in 1988 as a national resource for molecular biology information, NCBI creates public databases, conducts research in computational biology, develops software tools for analyzing genome data, and disseminates biomedical information, all for the better understanding of molecular processes affecting human health and disease.

European Bioinformatics Institute (EBI): The EBI is a non-profit academic organisation that forms part of the European Molecular Biology Laboratory (EMBL). The EBI is a centre for research and services in bioinformatics. The Institute manages databases of biological data including nucleic acid sequences, protein sequences and macromolecular structures.

ExPASy (Expert Protein Analysis System): The ExPASy proteomics server of the Swiss Institute of Bioinformatics (SIB) is dedicated to the analysis of protein sequences and structures as well as 2-D PAGE (two-dimensional polyacrylamide gel electrophoresis).

European Molecular Biology Laboratory (EMBL) homepage: EMBL Heidelberg is the main laboratory of EMBL's five sites, dedicated to basic research in molecular biology, the provision of research services to its member states, and advanced training.

German National Center for Information Technology (GMD) homepage: GMD conducts cutting-edge research at the national and international level. It will continue doing so by cooperating with partners from trade and industry, including the media industry. As an application-oriented research institution, GMD will extend existing contacts and establish new contacts with potential partners. GMD sees itself as a catalyst of innovative ideas in the field of information technology, a place where researchers can develop innovative ideas in cooperation with partners from trade and industry who transform the research results into new products.

In "Bioinformatics for Dummies," Claverie and Notredame list some important bioinformatics databases:

Ensembl: Ensembl is a joint project between the EMBL European Bioinformatics Institute (EBI) and the Wellcome Trust Sanger Institute (WTSI) to develop a software system that produces and maintains automatic annotation on selected eukaryotic genomes.

RCSB PDB (Protein Data Bank): The RCSB PDB provides a variety of tools and resources for studying the structures of biological macromolecules and their relationships to sequence, function, and disease.

They also list some important software in bioinformatics:

SRS by Lion Bioscience AG: EBI database search.

T-Coffee: A web server for mixing sequences and structures into multiple sequence alignments.

GENSCAN at MIT: Identification of complete gene structures in genomic DNA.

PHYLIP: Phylogenetic tree reconstruction.

RasMol: Structure visualization.