Extend-NMR: Extending NMR for Functional and Structural Genomics
Partner 3: Institut Pasteur, Paris, France
Partner 1: University of Cambridge, UK
People involved: Wolfgang Rieping, Michael Habeck, Darima Lamazhapova, Michael Nilges

ISD: Inferential Structure Determination

Structure determination is conventionally done by minimisation. Experimental data (e.g. NOE peak intensities) are converted to constraints (e.g. distance bounds), combined with a molecular force field, and the resulting target function is minimised. In practice you need to estimate (guess) parameters such as calibration constants for peak intensities, distance cut-offs, correlation times, and the weighting constants for different data points. The final result is a structure or a family of structures, but there is no reliable way of determining the uncertainty of the coordinates.
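To make the role of the guessed parameters concrete, here is a minimal sketch of the conventional conversion from a peak intensity to a distance bound. It assumes the isolated spin pair approximation (intensity proportional to r^-6) and a calibration against one known reference distance. The function names and the 20% margins are hypothetical choices for illustration, not taken from any particular program.

```python
def calibrate(ref_intensity, ref_distance):
    """Calibration constant gamma such that intensity = gamma * r**-6.

    Assumes the isolated spin pair approximation and a single
    reference peak whose distance is known.
    """
    return ref_intensity * ref_distance ** 6


def distance_bounds(intensity, gamma, margin=0.2):
    """Convert a peak intensity to (lower, upper) distance bounds.

    The margin is an arbitrary guess (here 20%), which is exactly the
    kind of parameter that has to be chosen by hand in this approach.
    """
    r = (gamma / intensity) ** (1.0 / 6.0)
    return r * (1.0 - margin), r * (1.0 + margin)


def restraint_penalty(distance, lower, upper):
    """Flat-bottom harmonic penalty: zero inside the bounds."""
    if distance < lower:
        return (lower - distance) ** 2
    if distance > upper:
        return (distance - upper) ** 2
    return 0.0


# Reference peak: intensity 1.0 at a known distance of 2.0 (arbitrary units).
gamma = calibrate(ref_intensity=1.0, ref_distance=2.0)
lower, upper = distance_bounds(intensity=0.25, gamma=gamma)
print(f"gamma = {gamma:.1f}, bounds = ({lower:.2f}, {upper:.2f})")
```

Changing the reference peak or the margin changes every bound, and nothing in the minimisation itself says which choice is right.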

ISD (Inferential Structure Determination) uses Bayesian statistics to calculate the probability distribution of structures given the input information. The program randomly samples different structures and calculates the measurement results that each would have produced. The difference between the calculated and measured results determines how likely each structure is, and the total set of sampled structures gives you the distribution of likely structures given the input data and prior information. There is no need to estimate calibration constants, data weightings, etc. beforehand - the program samples over their values and finds the likely ones in the same way as it finds the likely coordinates for the structures. Ultimately the program gives you a family of structures with uncertainty estimates for all coordinates and all calibration constants. The data weightings calculated for the data points tell you how well each point fits, and so give you a built-in validation of the data.
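The idea can be illustrated with a toy problem. The sketch below is not ISD code. It assumes a one-dimensional "structure" (a single atom at position x between fixed anchor atoms), intensities proportional to r^-6, a log-normal error model, and flat priors on x, log(gamma) and log(sigma) within bounds. All names and numbers are hypothetical. A plain Metropolis sampler then samples the coordinate, the calibration constant gamma and the error scale sigma together.

```python
import math
import random

ANCHORS = (0.0, 8.0, 12.0)  # fixed atom positions on a line (toy geometry)


def predicted_log_intensities(x, log_gamma):
    """Log intensities the 'structure' x would produce: log(gamma * d**-6)."""
    return [log_gamma - 6.0 * math.log(abs(x - a)) for a in ANCHORS]


def log_posterior(x, log_gamma, log_sigma, log_data):
    """Unnormalised log posterior with flat priors inside the bounds."""
    if not 0.0 < x < 8.0:
        return -math.inf
    if not math.log(0.1) <= log_sigma <= math.log(2.0):
        return -math.inf
    sigma = math.exp(log_sigma)
    predicted = predicted_log_intensities(x, log_gamma)
    chi2 = sum((d - p) ** 2 for d, p in zip(log_data, predicted))
    # Gaussian errors on the log scale (log-normal error model).
    return -len(log_data) * log_sigma - 0.5 * chi2 / sigma ** 2


def sample_posterior(log_data, n_steps=30000, burn_in=6000, seed=0):
    """Metropolis sampling over (x, log_gamma, log_sigma), one at a time."""
    rng = random.Random(seed)
    state = [4.0, 0.0, 0.0]       # deliberately poor starting guess
    step_sizes = (0.1, 0.1, 0.3)
    current = log_posterior(*state, log_data)
    samples = []
    for step in range(n_steps):
        i = step % 3              # cycle through the three parameters
        proposal = list(state)
        proposal[i] += rng.gauss(0.0, step_sizes[i])
        candidate = log_posterior(*proposal, log_data)
        if rng.random() < math.exp(min(0.0, candidate - current)):
            state, current = proposal, candidate
        if step >= burn_in:
            samples.append((state[0], math.exp(state[1]), math.exp(state[2])))
    return samples


# Synthetic, noise-free "measurements" from x = 3.0 and gamma = 100.0.
log_data = predicted_log_intensities(3.0, math.log(100.0))
samples = sample_posterior(log_data)
xs = [s[0] for s in samples]
mean_x = sum(xs) / len(xs)
std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / len(xs))
print(f"x = {mean_x:.2f} +/- {std_x:.2f}")
```

The spread of the sampled x values is the "error bar" on the coordinate, and the sampled gamma and sigma values play the roles of the calibration constant and the data weighting that would otherwise have to be guessed.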

Random sampling over large numbers of structures is very computationally demanding. ISD uses Markov Chain Monte Carlo sampling with replica exchange, simulating many structures in parallel at different temperatures. The program has been set up to run on clusters to provide faster results.
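As an illustration of replica exchange, the sketch below runs several copies of a toy one-dimensional system at different temperatures and occasionally swaps neighbouring copies using the standard Metropolis swap criterion. The double-well energy function, the temperature ladder and the step size are assumptions chosen for the example, and the replicas are updated one after another here rather than in parallel on a cluster.

```python
import math
import random


def energy(x):
    """Toy double-well energy with minima at x = -2 and x = +2."""
    return (x * x - 4.0) ** 2 / 2.0


def replica_exchange(temperatures, n_sweeps=20000, step=0.5, seed=0):
    """Return the samples of the coldest replica (temperatures[0])."""
    rng = random.Random(seed)
    betas = [1.0 / t for t in temperatures]
    xs = [-2.0] * len(betas)      # every replica starts in the left well
    cold_samples = []
    for _ in range(n_sweeps):
        # 1. Ordinary Metropolis move inside each replica.
        for i, beta in enumerate(betas):
            proposal = xs[i] + rng.gauss(0.0, step)
            delta = energy(proposal) - energy(xs[i])
            if rng.random() < math.exp(min(0.0, -beta * delta)):
                xs[i] = proposal
        # 2. Try to swap one randomly chosen pair of neighbouring replicas.
        if len(betas) > 1:
            i = rng.randrange(len(betas) - 1)
            log_ratio = (betas[i] - betas[i + 1]) * (
                energy(xs[i]) - energy(xs[i + 1])
            )
            if rng.random() < math.exp(min(0.0, log_ratio)):
                xs[i], xs[i + 1] = xs[i + 1], xs[i]
        cold_samples.append(xs[0])
    return cold_samples


samples = replica_exchange([1.0, 2.0, 4.0, 8.0])
right = sum(1 for x in samples if x > 0.0) / len(samples)
print(f"fraction of cold-replica samples in the right-hand well: {right:.2f}")
```

The hot replicas cross the energy barrier easily, and the swaps pass those configurations down to the cold replica, which would otherwise tend to stay near the well it started in.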

The graphical interface to ISD

ISD Benefits

  • NMR structures with uncertainties ('error bars') on the coordinates
  • No free parameters
  • No need to set calibrations, scaling constants, etc. - ISD estimates them in the calculation
  • Built-in validation of the data
  • Optimal use of available information
  • Protection against over-fitting

Probabilistic assessment of protein structure conformations
A structure ensemble from ISD