Chemical Shift validation programs

Summary

Software that works on stand-alone chemical shifts operates on

  • one entity
  • one data set
  • proteins only
  • a variety of input formats

Software that back-calculates chemical shifts from coordinates and compares them to observed values operates on

  • one chain
  • one model
  • one data set
  • coordinates in PDB format
    • matching PDB chain to BMRB entity is not trivial and requires mmCIF and/or NMR-STAR metadata
  • chemical shifts in a variety of formats

Of the programs reviewed here, SHIFTS is the only one that works on nucleic acids.

Common outputs

Reference offsets
Program   Nucleus & detected offset
LACS
PANAV
SPARTA+   if detected

Example:

loop_
   _CS_reference_offset.Software_name
   _CS_reference_offset.Nucleus
   _CS_reference_offset.Val

   LACS     CA  0.1
   PANAV    CA  0.0
   SPARTA+  CA  .
stop_

(A real table will have more tags: Software_ID, Entry_ID, Entity_ID, Assigned_chemical_shift_list_ID, etc.)

CS Outliers
Program   Residue & atom ID   Expected/predicted value
AVS
LACS
PANAV
SPARTA+
SHIFTS

If a program provides an expected value, the observed chemical shift and the difference between the expected and observed values can be trivially added to the table above.

Example:

loop_
   _CS_outlier.Software_name
   _CS_outlier.Comp_index_ID
   _CS_outlier.Atom_ID
   _CS_outlier.Val_expected
   _CS_outlier.Val_observed
   _CS_outlier.Val_delta

   LACS     1  CA  55.1  65.5  10.4
   PANAV    1  CA  55.2  65.5  10.3
   SPARTA+  1  CA  53.3  65.5  12.2
   SHIFTS   1  CA  52.4  65.5  13.1
stop_

(A real table will have more tags: Software_ID, Entry_ID, Entity_ID, Assigned_chemical_shift_list_ID, etc.)
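Filling in Val_observed and Val_delta is mechanical once the expected value is known. A minimal sketch, assuming the observed shifts have already been read into a dict keyed by (Comp_index_ID, Atom_ID); the function name `add_observed_and_delta` is hypothetical. The sign convention follows the example above: delta = observed minus expected (65.5 - 55.1 = 10.4).

```python
from typing import Dict, List, Optional, Tuple

def add_observed_and_delta(outliers: List[dict],
                           observed: Dict[Tuple[int, str], float]) -> List[dict]:
    """Return copies of outlier rows with Val_observed and Val_delta filled in.

    Rows whose atom has no observed shift get None (NMR-STAR '.') for both.
    """
    result = []
    for row in outliers:
        obs: Optional[float] = observed.get((row["Comp_index_ID"], row["Atom_ID"]))
        new_row = dict(row, Val_observed=obs)
        # Round to suppress floating-point noise such as 10.399999999999999.
        new_row["Val_delta"] = (None if obs is None
                                else round(obs - row["Val_expected"], 3))
        result.append(new_row)
    return result

observed = {(1, "CA"): 65.5}
outliers = [
    {"Software_name": "LACS", "Comp_index_ID": 1, "Atom_ID": "CA", "Val_expected": 55.1},
    {"Software_name": "SHIFTS", "Comp_index_ID": 1, "Atom_ID": "CA", "Val_expected": 52.4},
]
for row in add_observed_and_delta(outliers, observed):
    print(row["Software_name"], row["Val_observed"], row["Val_delta"])
```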

Problem: for coordinate-based programs we have to

  1. have a Model_ID column in both tables, which will always be NULL for the shift-only programs;
  2. decide what to do with multiple models. If we put everything in one table, it may be harder to derive a “consensus” report from it.
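One possible way to get a consensus from a single table that includes Model_ID: for each (program, residue, atom), count in how many models it was flagged and keep it only if that fraction reaches a threshold. The function name `consensus_outliers` and the 50% default threshold are assumptions for illustration. Rows from shift-only programs (Model_ID NULL, here `None`) are treated as flagged in every model.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Set, Tuple

# (Software_name, Model_ID, Comp_index_ID, Atom_ID)
Row = Tuple[str, Optional[int], int, str]

def consensus_outliers(rows: Iterable[Row], n_models: int,
                       min_fraction: float = 0.5) -> List[Tuple[str, int, str, float]]:
    """Keep (program, residue, atom) flagged in at least min_fraction of models."""
    flagged: Dict[Tuple[str, int, str], Set[Optional[int]]] = defaultdict(set)
    for software, model_id, comp_index, atom in rows:
        flagged[(software, comp_index, atom)].add(model_id)
    result = []
    for (software, comp_index, atom), models in sorted(flagged.items()):
        # A NULL Model_ID means the program does not use coordinates at all.
        fraction = 1.0 if None in models else len(models) / n_models
        if fraction >= min_fraction:
            result.append((software, comp_index, atom, fraction))
    return result

rows = [
    ("SPARTA+", 1, 1, "CA"), ("SPARTA+", 2, 1, "CA"), ("SPARTA+", 3, 1, "CA"),
    ("SPARTA+", 2, 7, "N"),   # flagged in one model out of four only
    ("LACS", None, 1, "CA"),
]
print(consensus_outliers(rows, n_models=4))
```

Keeping the per-model rows and deriving the consensus on demand avoids choosing between the two table layouts up front.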

Details

Programs that work on Chemical Shifts only

Compared features: reference offsets, outlier atoms, outlier CS values, other.

  • AVS: classifies errors into “anomalous” and “suspicious”; detects duplicate assignments. Full report includes CS statistics.
  • LACS: reference offsets ✔ (H, N, C, CA/CB)
  • PANAV: reference offsets ✔ (N, C, CA, CB); classifies outliers into “deviant” and “suspicious” (outliers after removing “deviant” ones?); “expected” CS values, probability scores(?), CSI(?)

AVS

Language: Perl

Input: NMR-STAR 3.1

Output:

  1. “anomalous” report to author:
    • expected CS
    • observed CS
    • STDDEV
    • Error code: anomalous, suspicious, or duplicate
  2. “anomalous” report in STAR format:
    • the above plus some extra details
    • software saveframe
    • citation saveframe
  3. “full” report: a text file listing AVG, STDDEV, and observed CS values for each residue and atom, plus the analysis result.
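Since the reports list expected (AVG) values and STDDEV, the error codes can be pictured as a per-atom z-score test. This is a simplified stand-in, not AVS's actual method: the cut-offs below are placeholders, and reading “duplicate” as “the same residue/atom assigned more than once in one list” is an assumption. Function names are hypothetical.

```python
from collections import Counter
from typing import Hashable, Iterable, List, Optional

# Placeholder cut-offs in units of STDDEV; NOT taken from AVS.
ANOMALOUS_Z = 5.0
SUSPICIOUS_Z = 3.0

def classify_shift(observed: float, expected: float, stddev: float) -> Optional[str]:
    """Return 'anomalous', 'suspicious', or None for an acceptable shift."""
    if stddev <= 0:
        raise ValueError("stddev must be positive")
    z = abs(observed - expected) / stddev
    if z > ANOMALOUS_Z:
        return "anomalous"
    if z > SUSPICIOUS_Z:
        return "suspicious"
    return None

def find_duplicate_assignments(keys: Iterable[Hashable]) -> List[Hashable]:
    """Keys such as (Comp_index_ID, Atom_ID) that occur more than once."""
    counts = Counter(keys)
    return sorted(k for k, n in counts.items() if n > 1)

print(classify_shift(65.5, 55.1, 2.0))   # z = 5.2
print(find_duplicate_assignments([(1, "CA"), (1, "CB"), (1, "CA")]))
```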

What is “PRTL”?
Can it handle multiple entities, hybrid entries, multiple CS lists?
NSTD residues whose Comp_ID contains a digit crash the program.
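Until the crash is fixed, one workaround is to screen the entry before calling AVS. A minimal sketch, assuming the Comp_ID values have already been read from the NMR-STAR entry; the function name is hypothetical and the input list is made up.

```python
from typing import Iterable, List

def comp_ids_with_digits(comp_ids: Iterable[str]) -> List[str]:
    """Return distinct Comp_IDs containing a digit, in first-seen order."""
    found: List[str] = []
    for comp_id in comp_ids:
        if any(ch.isdigit() for ch in comp_id) and comp_id not in found:
            found.append(comp_id)
    return found

bad = comp_ids_with_digits(["ALA", "GLY", "5MC", "SER", "5MC"])
if bad:
    print("skipping AVS; Comp_IDs with digits:", ", ".join(bad))
```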

LACS

Language: MATLAB

Input: own format

Output:

  1. own format:
    • detected reference offsets
    • X & Y plot coordinates for “trend lines”
    • X & Y coordinates for each observed CS
    • CS outliers are flagged
  2. STAR format (used by DEVise):
    • tables of X and Y coordinates
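The “trend lines” are regression lines through the per-residue points. Below is a deliberately simplified sketch of reading a reference offset off such lines. It assumes, for illustration only, that X is the difference between CA and CB secondary shifts and Y is a secondary shift. It fits one ordinary least-squares line to the X > 0 points and another to the X < 0 points, then takes the mean of their Y-intercepts as the offset estimate, because correctly referenced data should give lines through the origin. This is not LACS's actual algorithm, which may differ in how it fits the lines and handles outliers; the function names are hypothetical.

```python
from statistics import fmean
from typing import List, Tuple

Point = Tuple[float, float]  # (X, Y)

def ols_line(points: List[Point]) -> Tuple[float, float]:
    """Ordinary least-squares fit y = slope * x + intercept."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx

def offset_from_trend_lines(points: List[Point]) -> float:
    """Mean Y-intercept of separate fits to the X > 0 and X < 0 branches."""
    pos = [p for p in points if p[0] > 0]
    neg = [p for p in points if p[0] < 0]
    return fmean([ols_line(pos)[1], ols_line(neg)[1]])

# Synthetic data: both branches cross X = 0 at Y = 1.0, i.e. a +1.0 ppm offset.
points = [(1, 1.5), (2, 2.0), (3, 2.5), (-1, 0.7), (-2, 0.4), (-3, 0.1)]
print(round(offset_from_trend_lines(points), 3))
```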

Works on one amino-acid entity only.
If there is a gap in the sequence and the missing residue is PRO, LACS may give an incorrect result for the following residue.
Nitrogen analysis is apparently sensitive to pH.

PANAV

Language: Java

Input:

  1. NMR-STAR 2.1 or SHIFTY
  2. With our custom wrapper: NMR-STAR 3.1 CS table (from loop_ to stop_)
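A naive sketch of cutting the assigned-shift loop out of an NMR-STAR 3.1 file for the wrapper. It assumes the loop's first tag starts with `_Atom_chem_shift.` (the NMR-STAR 3.1 category for assigned shifts) and that only the first such loop is wanted. The function name is hypothetical. This line-based scan ignores STAR quoting and semicolon-delimited text blocks, so a real STAR parser is the safer choice.

```python
from typing import Optional

def extract_shift_loop(star_text: str,
                       category: str = "_Atom_chem_shift.") -> Optional[str]:
    """Return the first loop_ ... stop_ block whose first tag is in `category`."""
    lines = star_text.splitlines()
    for i, line in enumerate(lines):
        if line.strip() != "loop_":
            continue
        # The first non-blank line after loop_ is the loop's first tag.
        j = i + 1
        while j < len(lines) and not lines[j].strip():
            j += 1
        if j == len(lines) or not lines[j].strip().startswith(category):
            continue
        for k in range(j, len(lines)):
            if lines[k].strip() == "stop_":
                return "\n".join(lines[i:k + 1])
        return None  # loop_ without a matching stop_
    return None

sample = """save_assigned_chem_shift_list_1
   loop_
      _Chem_shift_experiment.Experiment_ID
      1
   stop_

   loop_
      _Atom_chem_shift.ID
      _Atom_chem_shift.Comp_index_ID
      _Atom_chem_shift.Atom_ID
      _Atom_chem_shift.Val

      1 1 CA 55.1
   stop_
save_
"""
print(extract_shift_loop(sample))
```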

Output:

  1. PANAV is a GUI application,
  2. Our custom wrapper pulls out and prints to stdout
    1. CS referencing offsets
    2. Outliers
    3. (optional) table of expected chemical shifts after referencing correction and secondary structure prediction

Table format:

  1. residue, e.g. A1
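  2. C: CS after ref. correction
  3. CA: CS after ref. correction
  4. CB: CS after ref. correction
  5. N: CS after ref. correction
  6. H: CS after ref. correction
  7. HA: CS after ref. correction
  8. “B-prob”: probability of beta-sheet (?)
  9. “C-prob”: probability of random coil (?)
  10. “H-prob”: probability of alpha-helix (?)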
  11. secondary structure code: B, C, or H
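A sketch of reading one row of this table, assuming whitespace-separated columns in the order listed and a residue label made of a one-letter code followed by the residue number (as in “A1”). Treating any non-numeric shift cell as “no value” is an assumption, as are the class and function names. The numbers in the example row are made up.

```python
import re
from dataclasses import dataclass
from typing import Dict, Optional

COLUMNS = ["C", "CA", "CB", "N", "H", "HA", "B_prob", "C_prob", "H_prob"]
RESIDUE_RE = re.compile(r"^([A-Za-z])(-?\d+)$")

@dataclass
class PanavRow:
    residue_type: str                     # one-letter code, e.g. "A"
    residue_number: int
    values: Dict[str, Optional[float]]    # corrected shifts and probabilities
    ss_code: str                          # "B", "C", or "H"

def _to_float(token: str) -> Optional[float]:
    try:
        return float(token)
    except ValueError:
        return None  # assumption: a non-numeric cell means "no value"

def parse_panav_row(line: str) -> PanavRow:
    fields = line.split()
    if len(fields) != 11:
        raise ValueError(f"expected 11 columns, got {len(fields)}: {line!r}")
    match = RESIDUE_RE.match(fields[0])
    if not match:
        raise ValueError(f"unrecognized residue label {fields[0]!r}")
    values = {name: _to_float(tok) for name, tok in zip(COLUMNS, fields[1:10])}
    return PanavRow(match.group(1), int(match.group(2)), values, fields[10])

row = parse_panav_row("A1 177.6 52.5 19.1 123.8 8.24 4.32 0.10 0.80 0.10 C")
print(row.residue_type, row.residue_number, row.values["CA"], row.ss_code)
```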

Probably cannot handle multiple entities, hybrid entries, or multiple CS lists (the SHIFTY format does not support them).

Programs back-calculating Chemical Shifts from coordinates

Compared features: reference offsets, predicted CS, prediction error estimate, CS outliers / difference between predicted and observed CS, other.

  • SPARTA+: reference offsets if detected; other: secondary shift, ring current, random coil, EFE, structural parameters
  • SHIFTS: reference offsets: ?; other:
    • protons: ring currents, electrostatic contribution (backbone only for now), peptide group anisotropy, constant contribution, random coil shift
    • other nuclei: preceding backbone effect, self backbone effect, following backbone effect, preceding chi effect, self chi effect, direct HB effect (NH ←), indirect HB effect (C=O <)
  • SHIFTX

Common problems:

  • Mapping from PDB chain ID to BMRB entity ID is non-trivial and typically requires parsing the mmCIF and NMR-STAR to prepare the input files.
  • These programs typically have trouble with protons in PDB files: whether they are present or absent, and how they are named.
  • Most programs work on one chain only. We have to
    1. split the input files by entity/chain
    2. run the program on each chunk,
    3. combine the results somehow.
  • Most programs work on one model only. We have to run either on the representative conformer only or on all models. Either way,
    1. figuring out the representative conformer often requires parsing the mmCIF,
    2. some PDB structures have more than one representative conformer,
    3. the PDB file has to be split and results of multiple runs recombined as with multiple chains.
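The splitting step can be sketched as follows, assuming a standard fixed-column PDB file: chain ID in column 22 of ATOM/HETATM records, and MODEL records carrying the model number. The function name is hypothetical. The sketch does not resolve the chain-to-entity mapping or pick the representative conformer; both still need the mmCIF. HETATM records (ligands, waters) are kept and may need filtering.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

def split_pdb(pdb_text: str) -> Dict[Tuple[int, str], str]:
    """Split PDB text into per-(model, chain) chunks of ATOM/HETATM records.

    Files without MODEL records are treated as a single model 1. Each chunk
    ends with TER and END so it can be fed to a one-chain, one-model program.
    """
    chunks: Dict[Tuple[int, str], List[str]] = defaultdict(list)
    model = 1
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        if record == "MODEL":
            model = int(line.split()[1])
        elif record in ("ATOM", "HETATM"):
            chain = line[21] if len(line) > 21 else " "  # column 22
            chunks[(model, chain)].append(line)
    return {key: "\n".join(lines + ["TER", "END"]) + "\n"
            for key, lines in chunks.items()}
```

Results from each chunk can then be tagged with their (model, chain) key before being recombined, which is also where a Model_ID column would come from.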

SPARTA+

Language: C++

Input:

  1. NMRPipe/TALOS
  2. PDB

Output:

  1. own format: three files with
    1. back-calculated CS incl.
      1. predicted CS
      2. secondary CS
      3. random coil CS
      4. ring current shift
      5. electrical field effect adjustment
      6. prediction error estimate
    2. structural parameters:
      1. torsion angles,
      2. H-bond distances
      3. NH order parameter
    3. observed vs predicted CS analysis incl.
      1. observed CS
      2. CS referencing offset, if detected and applied (in the REMARK section)
      3. corrected observed CS
      4. predicted CS
      5. difference between observed (corrected?) and predicted CS
      6. estimated prediction error
      7. CS statistics

The NMR-STAR output is produced by splitting the PDB file and running SPARTA+ in a loop over every model; it attempts to capture some of the above, but does not include

  • CS referencing offset
  • Struct. parameters,
  • back-calculated CS data other than “predicted”

“Outliers can be identified if observed and predicted CS differ by more than 5 STDDEVs or by more than prediction error” (quoted from the README file).
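A literal reading of the README rule, as a sketch. The README does not say what STDDEV refers to; here it is assumed to be a standard deviation supplied by the caller, for example of the observed-minus-predicted differences for that atom type. Note that with “or”, the effective threshold is the smaller of the two limits. The function name is hypothetical.

```python
def is_outlier(observed: float, predicted: float, pred_error: float,
               stddev: float, n_sd: float = 5.0) -> bool:
    """Flag a shift if |observed - predicted| exceeds n_sd * stddev
    or exceeds the SPARTA+ prediction error estimate.

    `stddev` is an assumed, caller-supplied spread of the differences.
    """
    delta = abs(observed - predicted)
    return delta > n_sd * stddev or delta > pred_error

print(is_outlier(65.5, 53.3, pred_error=1.0, stddev=3.0))
```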
Works on one model, one chain, one amino-acid entity.
Mapping between PDB chains and BMRB entities is not trivial and requires parsing the mmCIF and/or NMR-STAR.
Best model information may not be in the PDB file (mmCIF should have it).
PDB file must have protons.
Generates useless warning messages when there are gaps in the CS list.
Has (had?) problems running multiple copies in parallel: they trample over each other's temporary files.

SHIFTS

Language: NAB. NAB is AMBER's scripting-language component, written in C and available as a separate package.

Input:

  1. PDB
  2. space-separated table of observed CS: NMR-STAR 2.1 without the tags

Output:

  1. with an observed CS file: an RDB file (tab-separated, with a header)? Not tested.
  2. back-calculated CS in .emp and .qdb files in their own formats – (somewhat) documented at http://casegroup.rutgers.edu/qshifts/qshifts.htm

Requires separate runs for protons and for carbons/nitrogens, hence the separate .emp and .qdb files.
Runs on one model only (multiple chains: not documented)
Works on nucleic acids
Multiple copies running in parallel write all over each other's temporary files.
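A possible workaround for SHIFTS (and for SPARTA+) is to give each run its own scratch directory. This sketch assumes the program writes its temporary files relative to the current working directory; if it uses a fixed absolute path, this does not help. The actual command lines are not shown here, and the function name is hypothetical.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import Dict, Optional, Sequence, Tuple

def run_isolated(cmd: Sequence[str], inputs: Dict[str, str],
                 outputs: Sequence[str] = (), timeout: float = 600.0
                 ) -> Tuple[subprocess.CompletedProcess, Dict[str, Optional[str]]]:
    """Run `cmd` inside a private temporary directory.

    Input files are written into the directory first; the named output files
    are read back before the directory is deleted (None if not produced).
    """
    with tempfile.TemporaryDirectory() as workdir:
        for name, text in inputs.items():
            Path(workdir, name).write_text(text)
        proc = subprocess.run(list(cmd), cwd=workdir, capture_output=True,
                              text=True, timeout=timeout, check=False)
        results: Dict[str, Optional[str]] = {}
        for name in outputs:
            path = Path(workdir, name)
            results[name] = path.read_text() if path.exists() else None
        return proc, results
```

Because each call owns its directory, several calls can run concurrently (for example from a thread pool) without sharing scratch files.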

SHIFTX

Language: C

Input: PDB

Output: back-calculated H, HA, N, CA, CB, and C CS as a space-delimited table with a header.

Works on one chain only, and probably on one model only, too.
Does not validate observed CS

SHIFTX2

Language: C, Java, Python, (Fortran?). SHIFTX2 bundles BLAST, SHIFTX+, SHIFTY+, and a couple of other programs.

(according to the README)

Input: PDB

Output:

  1. back-calculated CS, comma or tab-delimited
  2. SHIFTX+ results file
  3. SHIFTY+ results file

Works on multiple chains
Can average multiple models or save SHIFTX+/SHIFTY+ results for each model
Does not validate observed CS

Neither SHIFTX nor SHIFTX2 does anything with observed CS.
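The comparison with observed shifts therefore has to be done on our side. This sketch assumes a long-format predicted-shift table with a header containing NUM, ATOMNAME and SHIFT columns, comma- or tab-delimited. These column names are an assumption to be checked against the README of the SHIFTX2 version in use. The observed shifts are assumed to be keyed by (residue number, atom name), and the function names are hypothetical.

```python
import csv
import io
from typing import Dict, List, Tuple

Key = Tuple[int, str]  # (residue number, atom name)

def read_predicted(text: str) -> Dict[Key, float]:
    """Read a comma- or tab-delimited predicted-shift table into a dict."""
    delimiter = "\t" if "\t" in text.splitlines()[0] else ","
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter,
                            skipinitialspace=True)
    return {(int(row["NUM"]), row["ATOMNAME"].strip()): float(row["SHIFT"])
            for row in reader}

def compare(predicted: Dict[Key, float],
            observed: Dict[Key, float]) -> List[Tuple[int, str, float, float, float]]:
    """(residue, atom, predicted, observed, observed - predicted) for shared atoms."""
    rows = []
    for key in sorted(predicted.keys() & observed.keys()):
        pred, obs = predicted[key], observed[key]
        rows.append((key[0], key[1], pred, obs, round(obs - pred, 3)))
    return rows

text = "NUM,RES,ATOMNAME,SHIFT\n1,A,CA,53.3\n1,A,CB,19.0\n2,G,CA,45.1\n"
print(compare(read_predicted(text), {(1, "CA"): 65.5, (3, "N"): 120.0}))
```

The resulting differences can feed the same outlier table as the other programs.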

VASCO

May or may not be usable as a chemical shift validation tool. Not available for download.

http://www.ebi.ac.uk/pdbe-apps/nmr/vasco/
