Generating dictionary files for ADIT-NMR and validator

Overview

The “master” NMR-STAR dictionary is maintained by Eldon as an Excel spreadsheet plus a number of text files: CSV files exported from the spreadsheet, some STAR files, and a few others. They are stored in the nmr-star-dictionary Subversion repository.

Three main pieces of software use (subsets of) the information from the /production_adit_files/adit_input version of the dictionary:

  1. Web interface to the dictionary in relational database format,
  2. ADIT-NMR deposition system,
  3. BMRB entry validation software (“the validator”).

The latter two use subsets of the dictionary in their own specially formatted files. Those files are generated from the relational database.

The code

The current version is a collection of Python scripts kept in the svn repo https://svn.bmrb.wisc.edu/svn/nmr-star-dictionary-scripts/.

Generating the dictionary

The easy way

cd to nmr-star-dictionary-scripts and run make. Then generate the dictionaries for the “production” validator as described below (that step is not part of make because it takes a while).

The long way

(See also HOWTO and README files in the dictionary scripts repo.)

Create the database first:

  1. (From scratch)
    1. get the code and Eldon's dictionary files:
      svn co https://svn.bmrb.wisc.edu/svn/nmr-star-dictionary-scripts
      cd nmr-star-dictionary-scripts
      svn co https://svn.bmrb.wisc.edu/svn/nmr-star-dictionary/bmrb_star_v3_files/adit_input
    2. edit dictionary.properties and fix all the paths,
  2. or (not from scratch) just run svn up in adit_input subdirectory.
  3. Run
    ./load_dict.py
    ./add_adit_if_dict.py
    ./add_comments.py
    ./add_ddl_types.py
    ./add_enums.py

This creates an sqlite3 relational database in the file dict.sqlt3.

You must run load_dict.py first; the rest can run in any order. The loader script generates some warning messages that can be ignored; the add_*.py scripts don't produce any output.
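The ordering rule above can be sketched as a small Python driver. This is only an illustration, not part of the scripts repository: build_commands and run_all are hypothetical names, and the sketch assumes it is run from the nmr-star-dictionary-scripts directory with the scripts executable.

```python
import subprocess

# Script names are taken from the steps above; load_dict.py must run first.
LOADER = "load_dict.py"
ADD_SCRIPTS = [
    "add_adit_if_dict.py",
    "add_comments.py",
    "add_ddl_types.py",
    "add_enums.py",
]


def build_commands(loader=LOADER, add_scripts=ADD_SCRIPTS):
    """Return the commands in a valid order: the loader first, then the rest."""
    return [["./" + loader]] + [["./" + name] for name in add_scripts]


def run_all(commands, runner=subprocess.run):
    """Run each command in turn; stop at the first one that fails."""
    for cmd in commands:
        # check=True raises CalledProcessError on a non-zero exit status.
        runner(cmd, check=True)
```

Calling run_all(build_commands()) would then run the five scripts in order. The runner parameter exists only so that the sequence can be checked without actually running anything.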

Web interface

Simply dump the contents of the database to CSV files, copy them to the master webserver, and update its PostgreSQL database:

./to_csv.py
mv *.csv /website/ftp/pub/bmrb/relational_tables/bmrb

Then log on to the master webserver and run /website/ftp/pub/bmrb/relational_tables/bmrb/load_db.sh there.

The update will be mirrored to public servers automatically.
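For readers who want to see what “dump the database to CSV” amounts to, here is a minimal sketch using only the Python standard library. It is not the actual to_csv.py: the function name dump_tables_to_csv is hypothetical, and writing one file per table with a header row is an assumption that may not match what load_db.sh expects.

```python
import csv
import os
import sqlite3


def dump_tables_to_csv(db_path, out_dir):
    """Write every user table in the SQLite file to <out_dir>/<table>.csv."""
    written = []
    conn = sqlite3.connect(db_path)
    try:
        names = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
            "ORDER BY name")]
        for name in names:
            # Quote the table name as an SQL identifier.
            quoted = '"{}"'.format(name.replace('"', '""'))
            cursor = conn.execute("SELECT * FROM " + quoted)
            path = os.path.join(out_dir, name + ".csv")
            with open(path, "w", newline="") as handle:
                writer = csv.writer(handle)
                # Header row (an assumption; drop it if the loader wants raw rows).
                writer.writerow([col[0] for col in cursor.description])
                writer.writerows(cursor)
            written.append(path)
    finally:
        conn.close()
    return written
```

For example, dump_tables_to_csv("dict.sqlt3", ".") would write the CSV files into the current directory.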

Software dictionaries

Generated dictionary files are kept in their own svn repository, https://svn.bmrb.wisc.edu/svn/nmr-star-software-dictionaries/, which makes updating the software a bit more convenient. So first get a local copy of it:

svn co file:///svn/nmr-star-software-dictionaries

(Note: use the file: URL so that you can commit later; the https: URL is “read-only”.)

Once you've generated all the files and copied them to nmr-star-software-dictionaries, commit them:

cd nmr-star-software-dictionaries
svn commit -m `date +%F`

(I usually put the date in the commit message.)

ADIT-NMR

Scripts for generating ADIT-NMR files are in the adit subdirectory.

cd adit
./make_dict.py
mv View-bmrb.cif adit_nmr_upload_tags.csv bmrb_1.view default-entry.cif dict.cif mmcif_bmrb.dic nmrcifmatch.cif table_dict.str ../nmr-star-software-dictionaries/adit-nmr/
cd ..

Validator

The validator currently exists in 3 versions:

  1. “Production”: Java code used by annotators for interactive work and by ADIT-NMR to post-process new depositions.
  2. “Development”: Python code that does more stringent checking and should eventually become “the validator”.
  3. “Entry release”: a one-off temporary hack that should have been replaced by the Python version a long time ago (but of course wasn't, at least as of the time of this writing).

Each uses its own dictionary file(s).

Note: if you're using a fresh checkout (“from scratch”), fix paths in *.properties files in each subdirectory before running the scripts.
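To see which files need attention after a fresh checkout, a short sketch like the following can list every *.properties file under the scripts directory. The helper name find_properties_files is hypothetical, and the sketch only locates the files; it makes no assumption about their format and does not edit them.

```python
import os


def find_properties_files(root):
    """Return the sorted paths of all *.properties files under root."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Do not descend into Subversion metadata directories.
        dirnames[:] = [d for d in dirnames if d != ".svn"]
        for filename in filenames:
            if filename.endswith(".properties"):
                found.append(os.path.join(dirpath, filename))
    return sorted(found)
```

For example, printing each entry of find_properties_files(".") from the top of the checkout gives a checklist of files to review.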

Production

The scripts are in the production subdirectory. You need to generate two versions of the dictionary: one for annotation (“mode 3”) and one for released BMRB entries (“mode 1”). The main script takes a while. Running off a local hard drive (rather than a network share) helps.

cd production

If you've started from scratch (fresh svn code checkout), first edit dictionary.properties and fix the paths.

./make_validict.py
./print_validict.py > validict.3.str
./make_validict.py -m 0
./print_validict.py > validict.1.str
mv validict.?.str ../nmr-star-software-dictionaries/validator/production/
cd ..
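The same four steps can be scripted. The sketch below mirrors the commands exactly as listed above, including the -m 0 argument, and uses subprocess to reproduce the shell's output redirect. run_to_file, PRODUCTION_STEPS and run_production are hypothetical names, and the sketch assumes it is run from the production subdirectory.

```python
import subprocess


def run_to_file(cmd, out_path, runner=subprocess.run):
    """Run cmd and write its standard output to out_path, like `cmd > out_path`."""
    with open(out_path, "wb") as handle:
        runner(cmd, stdout=handle, check=True)


# The steps from the listing above, in order. None means "no redirect".
PRODUCTION_STEPS = [
    (["./make_validict.py"], None),
    (["./print_validict.py"], "validict.3.str"),
    (["./make_validict.py", "-m", "0"], None),
    (["./print_validict.py"], "validict.1.str"),
]


def run_production(steps=PRODUCTION_STEPS, runner=subprocess.run):
    """Run every step in order, stopping at the first failure."""
    for cmd, out_path in steps:
        if out_path is None:
            runner(cmd, check=True)
        else:
            run_to_file(cmd, out_path, runner=runner)
```

As with the shell redirect, the output file is created (and truncated) before the command runs, so a failed step can leave an empty or partial validict file behind.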

Entry release

As above, but the scripts are in the entry_release directory. Fix dictionary.properties in there, then run ./make_validict.py followed by ./print_validict.py (note: no output redirect for that one). Finally,

mv validict.str ../nmr-star-software-dictionaries/validator/entry_release/
cd ..

Development

  • For postgres:
    • run ./make_validict.py, then
    • ./print_validict.py -c OUTPUT DIRECTORY

The CSV files to load into postgres will be in the OUTPUT DIRECTORY. The schema is in validict.psql.

  • For sqlite3:
    • run ./make_validict.py, then
    • just copy dict.sqlt3 to wherever it is needed.
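Before copying the sqlite3 file anywhere, it can be worth a quick sanity check that it contains the expected tables. The following is a minimal sketch using the standard sqlite3 module; table_row_counts is a hypothetical helper, and no particular table names are assumed.

```python
import os
import sqlite3


def table_row_counts(db_path):
    """Return {table name: row count} for every user table in the SQLite file."""
    # sqlite3.connect() would silently create a missing file, so check first.
    if not os.path.exists(db_path):
        raise FileNotFoundError(db_path)
    conn = sqlite3.connect(db_path)
    try:
        names = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
            "ORDER BY name")]
        counts = {}
        for name in names:
            quoted = '"{}"'.format(name.replace('"', '""'))
            counts[name] = conn.execute(
                "SELECT COUNT(*) FROM " + quoted).fetchone()[0]
        return counts
    finally:
        conn.close()
```

An empty result, or tables with zero rows, would suggest that make_validict.py did not complete.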