Newsletter 105 - 112
September 26, 2005 - January 2, 2006
The NIH X-Ray Diffraction Interest Group
Newsletter web site:
Item 3: Topic Discussion
PHASER: MR with Maximum Likelihood
Although numerous crystal structures of several retroviral proteases (PR) have been solved in the past (e.g. from HIV, RSV, FIV, EIAV) and all these enzymes share significant sequence and structure homology, the new structure of human T-cell leukemia virus (HTLV) PR could not be solved (in our hands) by any of the standard MR programs (AMoRe, EPMR, MolRep). However, the structure was solved in a straightforward manner using the PHASER program, as implemented in ccp4i.
Some explanation of the failure of the "older" MR programs is provided by the comparatively low levels of sequence identity/similarity between the models and the target (ranging in simple CLUSTAL comparisons from 23.9/59.3% for FIV PR to 32.3/63.6% for HIV PR) and by the complicated architecture of the unknown structure (three homodimeric molecules forming a highly pseudosymmetric trimer), combined with a high degree of uncertainty about the asymmetric unit contents.
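Uncertainty about the asymmetric unit contents is commonly narrowed down with the Matthews coefficient. The following is only a minimal sketch of that estimate, not the procedure used for HTLV PR: the cell volume, number of symmetry operators, and molecular weight are hypothetical placeholders, and the solvent fraction uses the usual 1.23/Vm approximation.

```python
# Matthews coefficient sketch. All numeric inputs below are hypothetical
# placeholders, NOT values from the HTLV PR study.

def matthews_coefficient(cell_volume_a3, n_symops, mw_da, n_copies):
    """Return (Vm in A^3/Da, approximate solvent fraction)."""
    vm = cell_volume_a3 / (n_symops * n_copies * mw_da)
    solvent = 1.0 - 1.23 / vm  # standard approximation for proteins
    return vm, solvent


if __name__ == "__main__":
    CELL_VOLUME = 1.0e6  # hypothetical unit-cell volume, A^3
    N_SYMOPS = 4         # hypothetical number of symmetry operators
    MW = 25000.0         # hypothetical molecular weight of one copy, Da
    for n in range(1, 7):
        vm, solv = matthews_coefficient(CELL_VOLUME, N_SYMOPS, MW, n)
        print(f"{n} copies: Vm = {vm:.2f} A^3/Da, solvent = {solv:.0%}")
```

Copy numbers that give an implausibly low or high solvent fraction can be set aside before the MR search, although several values often remain plausible, as was evidently the case here.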
In automatic runs, which included the full available resolution of 2.6 Å, PHASER correctly solved the trimeric model, but failed (correctly again) to locate any additional copies of the molecule because of packing considerations. The solutions could be unambiguously identified by their high Z-scores (> 10) for the more similar models (HIV PR, EIAV PR). The less similar models gave Z-scores just below 8, which corresponded to a correct (7.96, RSV PR) or an incorrect (7.80, FIV PR) solution. The quality of the solutions is also recognizable from the final LL-gain parameter: a value of about 110 marked the boundary between incorrect and correct solutions, "strong", unambiguous solutions (Z-score about 11) reached about 180, and the absolutely best case (Z-score 14.54) rose sharply to 270.
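The Z-scores quoted above can be summarized in a small triage sketch. The cutoff of 10 is only a rule of thumb read off this one case, not a PHASER setting, and the function and label names are made up for illustration.

```python
# Triage of MR solutions by Z-score, using only values quoted in the text.
# Z_CLEAR is an assumed rule-of-thumb cutoff, not a PHASER parameter.
Z_CLEAR = 10.0

# (search model, Z-score, solution later found to be correct?)
REPORTED = [
    ("HIV PR (atomic-resolution model)", 11.16, True),
    ("RSV PR", 7.96, True),
    ("FIV PR", 7.80, False),
]


def triage(z_score):
    """Label a solution as clear or as needing independent checks."""
    return "clear" if z_score > Z_CLEAR else "needs checking"


if __name__ == "__main__":
    for model, z, correct in REPORTED:
        outcome = "correct" if correct else "incorrect"
        print(f"{model:34s} Z = {z:5.2f}  {triage(z):14s} (was {outcome})")
```

The two borderline entries show why the Z-score alone cannot settle the question below the cutoff: 7.96 was a correct solution and 7.80 was not.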
In post factum analyses, the levels of sequence identity/similarity are reflected in the r.m.s. deviations between the Cα atoms of the corresponding models and the final HTLV PR structure, which are high even for the best models: HIV PR 1.6 Å, EIAV PR 1.7 Å, FIV PR 1.8 Å, RSV PR 1.9 Å.
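The link between sequence identity and coordinate error can be illustrated numerically. To my knowledge, PHASER documentation of this period describes converting the IDENtity value into an expected r.m.s. error with a Chothia and Lesk type formula, max(0.8, 0.4 exp(1.87 H)), where H is the fraction of non-identical residues. Treat the constants as an assumption to verify against the manual for your version; the function name below is hypothetical.

```python
import math


def expected_rms(identity_percent):
    """Expected coordinate error (A) from sequence identity (%).

    Assumed Chothia-Lesk style relation; check the constants against
    the documentation of the PHASER version in use.
    """
    h = 1.0 - identity_percent / 100.0  # fraction of non-identical residues
    return max(0.8, 0.4 * math.exp(1.87 * h))


if __name__ == "__main__":
    # Identities and observed Calpha r.m.s.d. values quoted in the text.
    cases = [("HIV PR", 32.3, 1.6), ("FIV PR", 23.9, 1.8)]
    for model, identity, observed in cases:
        est = expected_rms(identity)
        print(f"{model}: identity {identity}% -> expected {est:.2f} A, "
              f"observed {observed} A")
```

Under this assumption the estimates come out at about 1.4 Å for HIV PR and 1.7 Å for FIV PR, somewhat below the observed deviations, and the 0.8 Å floor applies to any model above roughly 63% identity.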
In practical terms, the structure was cracked using the best available model of HIV PR determined at atomic resolution (Z-score 11.16). However, in a posteriori calculations the clearest solution was obtained with a medium-resolution HIV PR model, which in Cα comparison with the final target did not show any obvious superiority. This observation reinforces the notion that the performance of even the most robust MR algorithms may critically depend on the initial conditions and that, whenever available, different models should be tried.
Xinhua Ji (NCI): All Data and Small Search Models
My group has solved about 10 structures using PHASER. Our limited experience suggests using all the data and small search models. At higher resolution, the solution is more likely to be unique. Small models are often solid and likely more accurate. Below is a script for the first, and often successful, MR attempt with PHASER, where protein-1 contains a single domain while protein-2 contains two different domains.
phaser << eof > auto.log # PHASER v1.3
TITLe your-project automatic
MODE MR_AUTO # automated MR; assumed, not in the original listing
HKLIn your-project.mtz # placeholder reflection file; assumed
LABIn F=FP SIGF=SIGFP
ENSEmble model-1 PDBfile model-1.pdb IDENtity 95
ENSEmble model-2a PDBfile model-2a.pdb IDENtity 90
ENSEmble model-2b PDBfile model-2b.pdb IDENtity 90
COMPosition PROTein MW 32000 NUM 1 # protein-1
COMPosition PROTein MW 66000 NUM 1 # protein-2
SEARch ENSEmble model-1 NUM 1
SEARch ENSEmble model-2a NUM 1
SEARch ENSEmble model-2b NUM 1
eof
Multiple search models (as shown below) help, especially when the search model is not that solid, as indicated by high B factors (of each model), high RMS values (between the models), and low sequence identity (to the unknown).
ENSEmble model-2b PDBfile model-2b1.pdb IDENtity 90 &
PDBfile model-2b2.pdb IDENtity 90 &
PDBfile model-2b3.pdb IDENtity 90
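When several ensembles are tried in turn, the multi-model card above can be generated rather than typed. The helper below is a hypothetical convenience sketch: it only reproduces the ENSEmble/PDBfile/IDENtity layout and the "&" continuation shown in this newsletter, and it does not validate anything against PHASER itself.

```python
def ensemble_card(name, pdb_files, identity):
    """Build an ENSEmble card with one PDBfile entry per model.

    Hypothetical helper; it mirrors the layout used in this newsletter,
    joining the entries with the '&' line continuation.
    """
    if not pdb_files:
        raise ValueError("an ensemble needs at least one PDB file")
    entries = [f"PDBfile {pdb} IDENtity {identity}" for pdb in pdb_files]
    lines = [f"ENSEmble {name} {entries[0]}"] + entries[1:]
    return " &\n".join(lines)


if __name__ == "__main__":
    files = ["model-2b1.pdb", "model-2b2.pdb", "model-2b3.pdb"]
    print(ensemble_card("model-2b", files, 90))
```

The printed text can be pasted into the keyword script in place of the single-model ENSEmble line for model-2b.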
Dr. Mark Mayer (NICHD): Keeping up with new software is time consuming, especially for small labs. The NIH X-Ray Diffraction Interest Group Newsletter gives us a chance as a community to share information about newer programs and tips for using them. On the Mac side of things, the Structural Biology Grid has versions of CNS and Refmac that support multithreading and fast FFT calculations on G4/G5 systems, which significantly speed up the calculation of composite omit maps. The binaries are available from: http://www.sbgrid.org/osx.php?software=1&id=0
A relatively new molecular replacement program, PHASER, has just been updated to version 1.3.1 and appears to be a very useful tool.
PHASER was written by Randy Read, Ralf Grosse-Kunstleve and colleagues as part of PHENIX and is available from: http://www-structmed.cimr.cam.ac.uk/phaser/.
Binaries are available for multiple platforms including Mac OS X, Linux, SGIs, and PCs. An interface to CCP4i is also available. Eventually the program will be incorporated into the CCP4 suite.
Among many nice features are use of maximum likelihood targets; anisotropy correction; a very user friendly interface when run from CCP4i; Z scoring as an indication of the success of the fit; automatic testing of enantiomorphic space groups; the ability to use an ensemble of models as targets; and the ability to search for and fit domains independently. The latter feature was attractive to me in the case of a 2.2-angstrom data set from my home facility (P41212 or P43212) for which I was expecting a large conformational change from my search model. PHASER identified the space group and found the two domains in their new orientations without intervention.
The program has many features I have not tried, which extend those available in MolRep and AMoRe.
This site is maintained by Dr. Xinhua Ji (email@example.com) on the NCI-CCR-MCL server (http://mcl1.ncifcrf.gov).