Newsletter No. 200
July 27, 2009
TOPIC DISCUSSION - Test Set and R-free

Editor: Please share your practice and theory on Test Set and R-free with fellow members of the group. A list of suggested subtopics is shown below. 

(1) How many reflections do you put in the Test Set for R-free calculations: a certain percentage or a certain number?

(2) For isomorphous structures, such as a wild type and several mutant structures crystallized in the same lattice, do you use the same Test Set for all these structures?

(3) Do you use the same Test Set if you re-process your data during refinement or obtain a better data set?

(4) Do you use the same Test Set when you switch refinement programs?

(5) What do you do to the Test Set if you find out, partway through a refinement carried out without twinning considerations, that the crystal is twinned?

(6) Do you include some or all data in the Test Set for the final rounds of refinement?

Recommended Reading: CCP4 wiki: R-factors

Mark Mayer: How about an addition?

(7) When you have NCS, do you pick the Test Set using thin shells?


Xinhua Ji: The more complete the data, the more accurate the structure. It appears that R-free should be used to monitor initial refinement only. Once the model is complete and refinement has converged, all data should be included in the final refinement. Besides, refinement with all data is likely to reveal additional features. However, modern ML refinement targets require the use of a test set, and R-free is mandatory. Any ideas?

Phenix.refine allows one to set the maximal number of reflections in the test set. I usually set this number at 1000 using the command line shown below.


At lower resolution, the program selects a default number (<1000) of test-set reflections. At higher resolution, when the default number of reflections is greater than 1000, 1000 reflections will be selected. I discussed this with Dr. Paul Adams during the recent Mid-Atlantic Macromolecular Crystallography Meeting and he agreed with me.

Phenix.refine also allows the use of all reflections in the refinement. The command line shown below illustrates how this is done.


This is a great option for refining structures at very low resolution where every reflection counts or at very high resolution where the risk of overfitting is low. In addition, phenix.refine makes it easy to use all data in the final refinement of all crystal structures.

Copyright © NIH X-Ray Diffraction Group                       Maintained by Dr. Xinhua Ji
on the NIH-NCI-CCR-MCL server (http://mcl1.ncifcrf.gov)