The Network of Sequence Flow Between Protein Structures

Presenter
January 15, 2008
Keywords:
  • Proteins
MSC:
  • 92D20
Abstract
Sequence-structure relationships in proteins are highly asymmetric, since many sequences fold into relatively few structures. How many sequences fold into a particular protein structure? Is it possible to switch between stable protein folds by point mutations? To address these questions we compute a directed graph of protein sequences and structures, based on experimentally determined protein shapes. We considered 2,060 experimental structures from the Protein Data Bank, which provide good coverage of fold families. The graph is computed using an energy function that measures the stability of a sequence in a fold. A node in the graph is an experimental structure (together with the sequences that computationally match it). A directed edge from node A to node B is weighted by the number of sequences of A that switch to B because they have lower energy in B. At native energies the directed graph is highly connected, with "sinks" that attract many sequences from other folds. The sinks are rich in beta sheets. The in-degree of a particular protein shape correlates with the number of sequences that match this shape in empirically determined genomes. Properties of the strongly connected components of the graph are correlated with protein length and secondary structure. Joint work with Leonid Meyerguz and Jon Kleinberg.
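The graph construction described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the names `home`, `energy`, `build_flow_graph`, `weighted_in_degree` and `strongly_connected_components` are hypothetical, the energies are invented toy numbers rather than output of the energy function used in the talk, and the sketch assumes that a sequence contributes to edge A to B for every structure B in which its energy is strictly lower than in its home structure A. The abstract does not say whether the original work counts all such B or only the lowest-energy one.

```python
from collections import defaultdict

def build_flow_graph(home, energy):
    """Return {(A, B): weight}: number of sequences of A with lower energy in B.

    home   : sequence id -> structure it is assigned to (hypothetical input)
    energy : sequence id -> {structure: energy of that sequence in that fold}
    """
    edges = defaultdict(int)
    for seq, a in home.items():
        e_home = energy[seq][a]
        for b, e_b in energy[seq].items():
            if b != a and e_b < e_home:  # assumption: every lower-energy fold counts
                edges[(a, b)] += 1
    return dict(edges)

def weighted_in_degree(edges):
    """Sum incoming edge weights per node; large values suggest 'sinks'."""
    indeg = defaultdict(int)
    for (_, b), w in edges.items():
        indeg[b] += w
    return dict(indeg)

def strongly_connected_components(nodes, edges):
    """Kosaraju's algorithm with an iterative depth-first search."""
    fwd, rev = defaultdict(list), defaultdict(list)
    for a, b in edges:
        fwd[a].append(b)
        rev[b].append(a)
    visited = set()

    def dfs(start, graph, out):
        # Appends nodes to `out` in order of DFS completion.
        visited.add(start)
        stack = [(start, iter(graph[start]))]
        while stack:
            node, neighbours = stack[-1]
            for nxt in neighbours:
                if nxt not in visited:
                    visited.add(nxt)
                    stack.append((nxt, iter(graph[nxt])))
                    break
            else:
                stack.pop()
                out.append(node)

    order = []
    for n in nodes:
        if n not in visited:
            dfs(n, fwd, order)
    visited.clear()
    components = []
    for n in reversed(order):
        if n not in visited:
            comp = []
            dfs(n, rev, comp)
            components.append(sorted(comp))
    return components

# Toy data (invented): five sequences, three structures A, B, C.
home = {"s1": "A", "s2": "A", "s3": "B", "s4": "C", "s5": "B"}
energy = {
    "s1": {"A": -1.0, "B": -2.0, "C": -0.5},  # prefers B over its home A
    "s2": {"A": -1.0, "B": -0.2, "C": -3.0},  # prefers C over its home A
    "s3": {"A": -0.1, "B": -1.0, "C": -2.5},  # prefers C over its home B
    "s4": {"A": 0.0, "B": -0.5, "C": -2.0},   # stays in C
    "s5": {"A": -2.0, "B": -1.0, "C": 0.0},   # prefers A over its home B
}

edges = build_flow_graph(home, energy)
in_degree = weighted_in_degree(edges)
components = strongly_connected_components(["A", "B", "C"], edges)
print(edges)
print(in_degree)
print(components)
```

In this toy example structure C has the largest weighted in-degree and no outgoing edges, so it plays the role of a sink, while A and B exchange sequences in both directions and therefore form one strongly connected component.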