The genomes of more than 100 species have been sequenced, and the biological functions of the encoded proteins are now being actively researched. Protein function is based on interactions between proteins and other molecules. One approach to inferring protein function from genomic sequence is therefore to predict interactions between an encoded protein and other molecules. As a data source for such predictions, known protein-small molecule interactions need to be compiled. We have therefore surveyed interactions between proteins and other molecules in the Protein Data Bank (PDB), the database of protein three-dimensional (3D) structures. Among 20,685 entries in PDB (April 2003), 4,189 types of small molecules were found to interact with proteins. The biologically relevant small molecules most often found in PDB were metal ions, such as calcium, zinc, and magnesium. Sugars and nucleotides were the next most common. These molecules are known to act as cofactors for enzymes and/or as stabilizers of proteins. For each type of interaction between a protein and a small molecule, we found preferred amino acid residues at the interaction sites. These preferences can serve as a basis for predicting protein function from genomic sequence and protein 3D structures. The data pertaining to these small molecules were collected in a database named Het-PDB Navi., which is freely available at http://daisy.nagahama-i-bio.ac.jp/golab/hetpdbnavi.html and linked to the official PDB home page.
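As an illustration of the kind of survey described above, the following minimal Python sketch tallies which amino acid residues lie near each small molecule in a single PDB-format file. It is not the procedure used to build Het-PDB Navi.; the function names, the 4.0 Å distance cutoff, and the decision to skip water (`HOH`) are all assumptions made for this example. It relies only on the fixed-column layout of `ATOM` and `HETATM` records in the PDB file format.

```python
import math
from collections import Counter


def parse_atoms(pdb_text):
    """Split fixed-column PDB records into protein (ATOM) and hetero (HETATM) atoms."""
    protein, hetero = [], []
    for line in pdb_text.splitlines():
        record = line[0:6].strip()
        if record not in ("ATOM", "HETATM"):
            continue
        atom = {
            "res_name": line[17:20].strip(),  # columns 18-20
            "chain": line[21],                # column 22
            "res_seq": line[22:26].strip(),   # columns 23-26
            "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
        }
        if record == "ATOM":
            protein.append(atom)
        elif atom["res_name"] != "HOH":  # assumption: water is not of interest
            hetero.append(atom)
    return protein, hetero


def contact_residue_counts(pdb_text, cutoff=4.0):
    """Count (small molecule, amino acid) contact pairs within `cutoff` angstroms.

    Each pairing of one small-molecule instance with one protein residue is
    counted once, however many of their atoms are close. The cutoff value is
    an illustrative assumption, not a value taken from the source.
    """
    protein, hetero = parse_atoms(pdb_text)
    seen = set()
    counts = Counter()
    for het in hetero:
        for atom in protein:
            if math.dist(het["xyz"], atom["xyz"]) > cutoff:
                continue
            pair = (
                het["res_name"], het["chain"], het["res_seq"],
                atom["res_name"], atom["chain"], atom["res_seq"],
            )
            if pair not in seen:
                seen.add(pair)
                counts[(het["res_name"], atom["res_name"])] += 1
    return counts
```

Summing such counts over many entries and comparing them with the background amino acid composition would give a rough picture of residue preferences at interaction sites; the all-pairs distance loop is kept for clarity and would need a spatial index to scale to the whole archive.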