The Network of Sequence Flow Between Protein Structures
Presenter
January 15, 2008
Keywords:
- Proteins
MSC:
- 92D20
Abstract
Sequence-structure relationships in proteins are highly asymmetric since
many sequences fold into relatively few structures. What is the number of
sequences that fold into a particular protein structure? Is it possible to
switch between stable protein folds by point mutations? To address these
questions we compute a directed graph of sequences and structures of
proteins, which is based on experimentally determined protein shapes. Two
thousand and sixty experimental structures from the Protein Data Bank were
considered, providing good coverage of fold families. The graph is
computed using an energy function that measures stability of a sequence in a
fold. A node in the graph is an experimental structure (and the
computationally matching sequences). The weight of a directed edge from
node A to node B is the number of sequences of A that switch to B because
their energy in B is lower. The directed graph is highly connected at native
energies with "sinks" that attract many sequences from other folds. The
sinks are rich in beta sheets. The in-degree of a particular protein shape
correlates with the number of sequences that match this shape in
empirically determined genomes. Properties of strongly connected components
of the graph are correlated with protein length and secondary structure.
Joint work with Leonid Meyerguz and Jon Kleinberg
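
The graph construction described in the abstract can be illustrated with a minimal sketch. It rests on assumptions that are not part of the talk: the energy function is replaced by a hypothetical lookup table of made-up values, the names `home`, `energy`, `build_flow_graph` and `weighted_in_degree` are invented for illustration, and a sequence of A is counted on the edge from A to B whenever its energy in B is lower than its energy in A. The restriction to native energies is not modeled.

```python
from collections import defaultdict


def build_flow_graph(home, energy):
    """Return weighted directed edges {(A, B): count}.

    home   -- maps each sequence to the node (structure) it belongs to
    energy -- maps each sequence to {node: energy of the sequence in that fold}
    Assumption: a sequence of A "switches" to B if its energy in B is
    strictly lower than its energy in A.
    """
    edges = defaultdict(int)
    for seq, a in home.items():
        e_home = energy[seq][a]
        for b, e_b in energy[seq].items():
            if b != a and e_b < e_home:
                edges[(a, b)] += 1
    return dict(edges)


def weighted_in_degree(edges):
    """Sum the weights of incoming edges for every node that has any."""
    totals = defaultdict(int)
    for (_, b), weight in edges.items():
        totals[b] += weight
    return dict(totals)


# Hypothetical toy data: four sequences, three structures (not real energies).
home = {"s1": "A", "s2": "A", "s3": "B", "s4": "C"}
energy = {
    "s1": {"A": -1.0, "B": -2.0, "C": -0.5},
    "s2": {"A": -1.5, "B": -1.8, "C": -1.0},
    "s3": {"A": -0.2, "B": -3.0, "C": -1.0},
    "s4": {"A": -0.1, "B": -2.5, "C": -2.0},
}

edges = build_flow_graph(home, energy)
in_deg = weighted_in_degree(edges)
print(edges)
print(in_deg)
```

In this toy input, structure B plays the role of a sink: it receives sequences from both A and C and loses none.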