Manual page for apps.RmsCalc program

RmsCalc is an extremely flexible program for crmsd and drmsd calculations. It can compare a reference structure against one or more target structures. It is also possible to point RmsCalc to a directory of PDB files. File names may be filtered by a regular expression.
The program can calculate crmsd, drmsd, GDT, TM_score and MaxSub scores. These parameters may be evaluated solely on C-alpha atoms, on the protein backbone, or on all atoms that are common to the two structures being compared. It is also possible to compare structures based on a sequence alignment.
-align.gap_extend=<value> Penalty for gap extension that will be used for sequence or profile alignment. When used with BLOSUM or PAM matrices, it is usually in the range [-2,-1].
-align.gap_open=<value> Penalty for gap opening that will be used for sequence or profile alignment. When used with BLOSUM or PAM matrices, it is usually in the range [-12,-10].
-align.global -global
-align.local -local
-align.matrix=<matrix name> Defines which substitution matrix will be used to score sequence alignments. Default is BLOSUM62.
-align.query.pdb=<file name> an input query (target) file in the PDB format
-align.query.pdbdir=<dir name> provides directory with input PDB files that will be used as queries
-align.query.pdblist=<file name> an input text file that lists names of query structures in PDB format (with paths if necessary)
-align.sequences=<T|F> aligns the given sequences. The alignment method depends on the flags given: the user may specify either -local or -global. The default is -global.
-align.template.pdb=<file name> an input template (or in general, the reference) protein structure(s) in the PDB format
-align.template.pdblist=<file name> an input text file that lists names of template structures in PDB format (with paths if necessary)
-calc.crmsd calculates crmsd distance between protein structures
-calc.crmsd.all_pairs=<file name> calculates the distances between all query and all template structures. This option enforces reading all the models from the query file. Results are printed to a file or to stdout if no output file is given.
-calc.drmsd calculates drmsd distance between protein structures
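To illustrate what a distance RMSD measures, the sketch below compares all intra-structure atom pair distances of two structures. It is not RmsCalc's implementation: the function name drmsd and the coordinates are hypothetical, and the normalisation by the number of atom pairs is the common definition, assumed here because this page does not state which one RmsCalc uses. Unlike crmsd, this measure needs no superposition.

```python
import math
from itertools import combinations


def drmsd(coords_a, coords_b):
    """Distance RMSD between two equally long lists of (x, y, z) tuples."""
    if len(coords_a) != len(coords_b):
        raise ValueError("both structures must have the same number of atoms")
    if len(coords_a) < 2:
        raise ValueError("at least two atoms are needed")
    total = 0.0
    n_pairs = 0
    # Compare the i-j distance in structure A with the i-j distance in B.
    for i, j in combinations(range(len(coords_a)), 2):
        d_a = math.dist(coords_a[i], coords_a[j])
        d_b = math.dist(coords_b[i], coords_b[j])
        total += (d_a - d_b) ** 2
        n_pairs += 1
    # Assumption: normalise by the number of atom pairs, N(N-1)/2.
    return math.sqrt(total / n_pairs)


# Hypothetical C-alpha coordinates, in Angstroms.
native = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
shifted = [(x + 10.0, y, z) for x, y, z in native]          # rigid translation
stretched = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (8.6, 0.0, 0.0)]

print(round(drmsd(native, shifted), 6))    # 0.0: a rigid shift keeps all distances
print(round(drmsd(native, stretched), 3))  # 0.816
```

Because only internal distances enter the sum, the value is insensitive to rigid-body motion, but also to mirror images.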
-calc.gdt=<value> calculates gdt score between protein structures based on a given distance cutoff, which by default is set to 2.0 Angstroms
-calc.gdt_ts calculates gdt_ts score between protein structures
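The two GDT options can be pictured with the following sketch. It assumes the model has already been superimposed on the native structure and that residues are paired one to one. A full GDT calculation searches over many superpositions to maximise the count, which is omitted here. The function names are hypothetical, the result is reported as a fraction (whether RmsCalc prints a fraction or a percentage is not stated on this page), and GDT_TS is taken in its usual form as the average over the cutoffs 1, 2, 4 and 8 Angstroms.

```python
import math


def gdt(model, native, cutoff=2.0):
    """Fraction of paired atoms lying within `cutoff` Angstroms of each other.

    Assumption: `model` is already superimposed on `native`.
    """
    if len(model) != len(native):
        raise ValueError("both structures must have the same number of atoms")
    within = sum(1 for a, b in zip(model, native) if math.dist(a, b) <= cutoff)
    return within / len(native)


def gdt_ts(model, native):
    """Average of gdt() over the cutoffs 1, 2, 4 and 8 Angstroms."""
    return sum(gdt(model, native, c) for c in (1.0, 2.0, 4.0, 8.0)) / 4.0


# Hypothetical C-alpha coordinates; the model is displaced along z by
# 0.5, 1.5, 3.0 and 9.0 Angstroms at successive residues.
native = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
model = [(0.0, 0.0, 0.5), (3.8, 0.0, 1.5), (7.6, 0.0, 3.0), (11.4, 0.0, 9.0)]

print(gdt(model, native, 2.0))  # 0.5
print(gdt_ts(model, native))    # 0.5625
```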
-calc.lcs=<value> calculates LCS (Longest Continuous Segment) score between protein structures
-calc.max_sub calculates MaxSub score between protein structures
-calc.q calculates drmsd distance between protein structures
-calc.tm_score calculates TM-score between protein structures
-calc.tm_score.ref_length=<number> provides a reference chain length for structure similarity measures. By default LCS, GDT and TM-score measures are normalized to the length of a target structure. This behaviour may be changed with this option.
-h print a brief summary of available options
-help=<name-part> print a help message on the screen - ANSI terminal version with visual enhancements. If the <name-part> argument is given, the program will print only those options that contain that substring
-help.dox=<name> print a help message in doxygen (*.dox) format on the screen for the RmsCalc program
-help.md=<name> print a help message in markdown (*.md) format on the screen
-help.option=<option-name> print a help message for a single option on the screen.
-help.plain=<name-part> print a help message on the screen - plain text version. If the <name-part> argument is given, the program will print only those options that contain that substring
-in.pdb.all_models=<T|F> forces PDB reader to take all the models from a PDB file.
-in.pdb.comma_separated forces the -in.pdb option to accept several PDB file names separated by commas. In this case none of the file names may contain a comma character.
Example:
-in.pdb.comma_separated -in.pdb=2gb1.pdb,2aza.pdb
-in.pdb.create_bu=<T|F> forces PDB reader to create a biological unit for each structure. Biological unit creation is based solely on information stored in the PDB file header. Any error in the header will affect the resulting biological unit. This option forces the PDB reading mechanism to create an array of structures for each structure (MODEL data) in the given PDB. Therefore (to avoid handling too many molecules) it is advised NOT to use -in.pdb.all_models combined with this option.
-in.pdb.file_mask=<mask_regexp> provides a mask for directory-based input, e.g. for -input_pdb_dir option. Without file_mask, directory related options try to read all possible files in a directory. This is a way to change it and narrow the selection. In general a file mask should follow the rules of regular expressions in Java. The only exceptions are >.< (dot character) and >*< (asterisk), which should be given explicitly (with no escaping). Example masks are: *.pdb 1[ABC]*.pdb 1(MBA|;mba).pdb
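One plausible reading of the mask rules above is that a dot stands for a literal dot and an asterisk for any run of characters, while everything else is passed through as regular expression syntax. The sketch below implements that reading. It is an assumption about the behaviour, not RmsCalc's code: the function name mask_to_regex and the file names are hypothetical, and Python's re module is used in place of Java's regular expressions.

```python
import re


def mask_to_regex(mask):
    """Translate a file mask into a compiled regular expression.

    Assumed rules: '.' is a literal dot, '*' matches any run of characters,
    all other characters keep their regular expression meaning.
    """
    # Escape dots first, so the dots introduced by '.*' stay unescaped.
    pattern = mask.replace(".", r"\.").replace("*", ".*")
    return re.compile(pattern)


# Hypothetical directory listing.
names = ["1ABC.pdb", "1abc.pdb", "2gb1.pdb", "1Axyz.pdb", "notes.txt"]

all_pdb = [n for n in names if mask_to_regex("*.pdb").fullmatch(n)]
subset = [n for n in names if mask_to_regex("1[ABC]*.pdb").fullmatch(n)]

print(all_pdb)  # ['1ABC.pdb', '1abc.pdb', '2gb1.pdb', '1Axyz.pdb']
print(subset)   # ['1ABC.pdb', '1Axyz.pdb']
```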
-in.pdb.first_model_only=<T|F> forces PDB reader to take only the first model from a PDB file.
-in.pdb.online=<T|F> download a PDB file from www.rcsb.org rather than reading a file. In this case the parameter given to -ip must provide a valid four-character PDB code
-in.pdb.read_hydrogens=<T|F> forces PDB reader to read in all hydrogen atoms. This by default is switched off and all hydrogens are discarded
-in.pdb.search_path=<path string> provides a path where -ip and other PDB-reading options will look for PDB data. In this case the parameter given to -ip must provide a valid four-character PDB code rather than a file name. Then, for the code (say, 1abc), several possible file locations will be tested, e.g.:
PATH/1abc
PATH/1abc.pdb
PATH/1abc.pdb.gz
PATH/pdb1abc.ent
PATH/pdb1abc.ent.gz
PATH/1ABC
PATH/1ABC.PDB
PATH/ab/pdb1abc.ent
PATH/ab/pdb1abc.ent.gz
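The locations listed above follow a simple pattern, which the sketch below reproduces for an arbitrary four-character code. The function names are hypothetical, the order in which RmsCalc actually tests the locations is not stated on this page, and the list covers only the examples given here, which may not be exhaustive.

```python
import os.path


def candidate_paths(search_path, code):
    """Build the example file locations for a four-character PDB code."""
    lower, upper = code.lower(), code.upper()
    middle = lower[1:3]  # two middle characters, e.g. "ab" for "1abc"
    names = [
        lower,
        lower + ".pdb",
        lower + ".pdb.gz",
        "pdb" + lower + ".ent",
        "pdb" + lower + ".ent.gz",
        upper,
        upper + ".PDB",
        middle + "/pdb" + lower + ".ent",
        middle + "/pdb" + lower + ".ent.gz",
    ]
    return [search_path + "/" + name for name in names]


def find_pdb(search_path, code):
    """Return the first candidate that exists on disk, or None."""
    for path in candidate_paths(search_path, code):
        if os.path.isfile(path):
            return path
    return None


for path in candidate_paths("PATH", "1abc"):
    print(path)
```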
-in.pdb.skip_header=<T|F> skip a header when parsing a PDB file.
-mute suppress all messages from a given package or class, e.g. “-mute=jbcl.data.formats”, or “-mute=jbcl.calc.structural.Crmsd”. It is also possible to switch off a whole branch of the jbcl library, e.g. “-mute=jbcl.data” will mute all messages coming from jbcl.data.formats, jbcl.data.types, jbcl.data.dict and jbcl.data.basic. To switch off all the messages, say: “-mute=jbcl” or simply “-mute”, because the default behaviour is to mute everything. This option is executed AFTER -verbose, so the user may increase the verbosity level to a desired value and then selectively switch off logging from some packages.
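The branch-wise muting described above amounts to a prefix match on dotted package names. The sketch below shows that idea only. The function is_muted is hypothetical and says nothing about how the jbcl library implements its logging.

```python
def is_muted(logger_name, muted_prefixes):
    """True if `logger_name` equals a muted name or lies in a muted branch."""
    for prefix in muted_prefixes:
        # Match whole name components, so "jbcl.data" does not
        # accidentally mute a sibling such as "jbcl.database".
        if logger_name == prefix or logger_name.startswith(prefix + "."):
            return True
    return False


muted = ["jbcl.data"]  # as if the user had passed -mute=jbcl.data
print(is_muted("jbcl.data.formats", muted))           # True
print(is_muted("jbcl.calc.structural.Crmsd", muted))  # False
print(is_muted("jbcl.data.formats", ["jbcl"]))        # True: whole library muted
```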
-out.pdb=<file name> prints relevant structures in the PDB format. The default behaviour is to print on standard output; the user may give a file name as a parameter.
-out.pdb.chain_ids=<chars> provides characters for new chain IDs. Output chains will be renamed with these new IDs. In the case of writing an alignment, the first character will be used for a query and the second one for a template structure
-rmscalc.all_atom=<T|F> uses all atoms from both structures for distance calculations. By default all measures are computed on CA atoms.
-rmscalc.show_alignment=<T|F> prints the alignment for which the similarity measures were evaluated. By default all measures are computed on CA atoms.
-rmscalc.superimpose_all=<T|F> superimposes the whole structure, no matter which atoms have been used to find the optimal transformation. With this option the output PDB file will contain all the atoms specified in the input structures.
-select.bb filters input protein structure(s) and removes all the atoms except its backbone.
-select.ca filters input protein structure(s) and removes all the atoms except alpha carbons.
-select.fragment=<selection expression> selects residues and chains defined by their PDB residue ID and chain ID. The selection string may be a combination of selectors, separated with a semicolon. For example the following: -select.fragment=A.43:78;B.12:89 selects residues 43:78 from chain A and residues 12:89 from chain B.
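The structure of a selection string can be made concrete with a small parser. This is a sketch under assumptions, not RmsCalc's parser: the function name is hypothetical, residue IDs are taken to be plain integers (no insertion codes), and a selector without a colon is treated as a single residue, which this page does not confirm.

```python
def parse_fragment_selection(expression):
    """Parse e.g. 'A.43:78;B.12:89' into [(chain, first, last), ...]."""
    selections = []
    for selector in expression.split(";"):
        selector = selector.strip()
        if not selector:
            continue  # tolerate a trailing semicolon
        chain, _, residues = selector.partition(".")
        first, _, last = residues.partition(":")
        if not last:
            last = first  # assumption: 'A.43' means the single residue 43
        selections.append((chain, int(first), int(last)))
    return selections


print(parse_fragment_selection("A.43:78;B.12:89"))
# [('A', 43, 78), ('B', 12, 89)]
```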
-select.residues_by_id=<residue selection> selects residues defined by their PDB ID. All chains will be processed separately.
-select.residues_by_index=<residue selection> selects residues defined by the order they appear in a structure. The first residue has index 0. All chains will be processed separately.
-verbose=<integer> Sets the verbosity level to a given value. The argument should be an integer in the range from 0 (no messages at all, which is equivalent to -mute=jbcl) to 6 (everything is logged). See -mute for additional information.

EXAMPLES

      (1) Calculate several similarity (distance) measures between two
chains (PDB format).
    java apps.RmsCalc -tp=1ixa_.pdb -qp=model.pdb


      (2) Calculate tm-score between two chains (PDB format).
    java apps.RmsCalc -tp=1ixa_.pdb -qp=model.pdb -tm


      (3) Compute crmsd on sequence alignment of two structures:
    java apps.RmsCalc -tp=1ixa_.pdb -qp=model.pdb -align.sequences -rms


      (4) Compute GDT(2A) between a bunch of models and a native structure.
The input models are chosen from a given directory by a file mask:
    java apps.RmsCalc -qpd=./models -in.pdb.file_mask=models_75_*.pdb -tp=3d4oA.pdb -align.sequences -gdt=2


      (5) Read-in all models predicted for a given protein sequence and
evaluate various similarity parameters to the native:
    java apps.RmsCalc -align.query.pdbdir=./models/T0387 -tp=./casp_natives/T0387.pdb -gdt=2.0 -rms -calc.gdt_ts -calc.tm_score


      (6) Compares all-vs-all structures from the query set. Template
structure is not used.
    java apps.RmsCalc -qpd=2gb1-models/ -calc.crmsd.all_pairs -in.pdb.file_mask=*.pdb

      (7) Compute crmsd between all models in a query PDB (multimodel
format)
    java apps.RmsCalc -qp=models.pdb.gz -in.pdb.all_models -calc.crmsd -calc.crmsd.all_pairs=crmsd_all

      (8) Calculate structural similarity based on a local sequence
alignment.
    java apps.RmsCalc -qp=model.pdb -tp=native.pdb -align.sequences -align.local -rmscalc.show_alignment

      (9) Superimpose model on the native structure using the selected
residues as a reference; alter chain IDs and save the superimposition.
    java apps.RmsCalc -qp=model.pdb -tp=2gb1A.pdb -out.pdb.chain_ids=AB -op -select.residues_by_id=1:14,18:56