Manual page for the apps.Strc program

Strc is a tool for manipulating protein structures. The program can read a protein structure from PDB or DSSP files. It is also possible to combine XYZ coordinates with an amino acid sequence from a FASTA or SEQ file. The program can also change the protein representation (e.g. from all-atom to Rosetta, CABS or Refin models).

-h print a brief summary of available options
-help=<name-part> print a help message on the screen - ANSI terminal version with visual enhancements. If the <name-part> argument is given, the program will print only those options whose names contain that substring
-help.dox=<name> print a help message in doxygen (*.dox) format on the screen for the Strc program
-help.md=<name> print a help message in markdown (*.md) format on the screen
-help.option=<option-name> print a help message for a single option on the screen.
-help.plain=<name-part> print a help message on the screen - plain text version. If the <name-part> argument is given, the program will print only those options whose names contain that substring
-in.dssp=<file name> an input file in the DSSP format
-in.fasta=<file name> an input file in the FASTA format
-in.pdb=<file name> input data in the PDB format. This option accepts the following types of input:
- a file in PDB format (possibly gzipped)
- several files in PDB format (possibly gzipped); in this case you must use the -in.pdb.comma_separated option to let the program know it should split the input string
- just a PDB code; with the -online option the data are downloaded from www.rcsb.org
- just a PDB code; with the -in.pdb.search_path option the program will look for the right file
-in.pdb.all_models=<T|F> forces the PDB reader to take all the models from a PDB file.
-in.pdb.comma_separated forces the -in.pdb option to look for several PDB file names, separated by commas. In this case none of the file names may contain a comma character.
Example:
-in.pdb.comma_separated -in.pdb=2gb1.pdb,2aza.pdb
-in.pdb.create_bu=<T|F> forces the PDB reader to create a biological unit for each structure. Biological unit creation is based solely on information stored in the PDB file header. Any error in the header will affect the resulting biological unit. This option forces the PDB reading mechanism to create an array of structures for each structure (MODEL data) in the given PDB file. Therefore (to avoid handling too many molecules) it is advised NOT to use -in.pdb.all_models together with this option.
-in.pdb.dir=<dir name> provides a directory with PDB files
-in.pdb.file_mask=<mask_regexp> provides a file mask for directory-based input, e.g. for the -in.pdb.dir option. Without a file mask, directory-related options try to read every file in a directory; a mask narrows the selection. In general a file mask should follow the rules of Java regular expressions. The only exceptions are the dot (.) and the asterisk (*), which should be given as-is, with no escaping. Example masks are: *.pdb 1[ABC]*.pdb 1(MBA|mba).pdb
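The mask rules above can be illustrated with a minimal sketch. It assumes that a mask is turned into a Java regular expression by escaping every dot and expanding every asterisk to `.*`, while all other characters (brackets, parentheses, `|`) keep their regex meaning. The class `FileMaskSketch` and its methods are hypothetical and are not taken from Strc itself.

```java
import java.util.regex.Pattern;

/** Hypothetical sketch (not Strc's code): turns a file mask into a Java regular expression. */
class FileMaskSketch {

    /** '.' becomes a literal dot and '*' matches any run of characters; other characters keep their regex meaning. */
    static String toRegex(String mask) {
        StringBuilder regex = new StringBuilder();
        for (char c : mask.toCharArray()) {
            if (c == '.') {
                regex.append("\\.");   // literal dot, not "any character"
            } else if (c == '*') {
                regex.append(".*");    // wildcard, as in a shell glob
            } else {
                regex.append(c);       // [ABC], (x|y) etc. are passed through unchanged
            }
        }
        return regex.toString();
    }

    /** True when the whole file name matches the mask. */
    static boolean matches(String mask, String fileName) {
        return Pattern.matches(toRegex(mask), fileName);
    }

    public static void main(String[] args) {
        System.out.println(matches("*.pdb", "2gb1.pdb"));          // true
        System.out.println(matches("1[ABC]*.pdb", "1Bxyz.pdb"));   // true
        System.out.println(matches("1(MBA|mba).pdb", "1mba.pdb")); // true
        System.out.println(matches("*.pdb", "2gb1.pdb.gz"));       // false
    }
}
```

Under this assumption the whole file name must match, so `*.pdb` would not pick up gzipped files such as `2gb1.pdb.gz`; a mask such as `*.pdb*` would be needed for those.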
-in.pdb.first_model_only=<T|F> forces the PDB reader to take only the first model from a PDB file.
-in.pdb.list=<file name> an input text file that lists names of PDB files (with paths if necessary)
-in.pdb.online=<T|F> download a PDB file from www.rcsb.org rather than reading a local file. In this case the parameter given to -ip must be a valid four-character PDB code
-in.pdb.read_hydrogens=<T|F> forces the PDB reader to read in all hydrogen atoms. This is switched off by default, and all hydrogens are discarded
-in.pdb.search_path=<path string> provides a path where -ip and other PDB-reading options will look for PDB data. In this case the parameter given to -ip must be a valid four-character PDB code rather than a file name. Then, for the code (say, 1abc), several possible file locations will be tested, e.g.:
PATH/1abc
PATH/1abc.pdb
PATH/1abc.pdb.gz
PATH/pdb1abc.ent
PATH/pdb1abc.ent.gz
PATH/1ABC
PATH/1ABC.PDB
PATH/ab/pdb1abc.ent
PATH/ab/pdb1abc.ent.gz
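The list of locations above can be sketched as a simple candidate generator. This is an illustration only: it assumes the locations are tried in the order listed and that the `ab` subdirectory is formed from the second and third characters of the code, as in the `1abc` example. Since the list is introduced with "e.g.", Strc may test other locations as well. `PdbSearchPathSketch` is a hypothetical name.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch (not Strc's code): file locations tried for a PDB code under a search path. */
class PdbSearchPathSketch {

    /** Candidates in the order listed in this manual; "ab" is the middle pair of "1abc". */
    static List<String> candidates(String path, String code) {
        String lc = code.toLowerCase();
        String uc = code.toUpperCase();
        String mid = lc.substring(1, 3);   // divided-layout subdirectory, e.g. "ab" for 1abc
        List<String> out = new ArrayList<>();
        out.add(path + "/" + lc);
        out.add(path + "/" + lc + ".pdb");
        out.add(path + "/" + lc + ".pdb.gz");
        out.add(path + "/pdb" + lc + ".ent");
        out.add(path + "/pdb" + lc + ".ent.gz");
        out.add(path + "/" + uc);
        out.add(path + "/" + uc + ".PDB");
        out.add(path + "/" + mid + "/pdb" + lc + ".ent");
        out.add(path + "/" + mid + "/pdb" + lc + ".ent.gz");
        return out;
    }

    /** Returns the first candidate that is an existing regular file, or null if none is found. */
    static String firstExisting(String path, String code) {
        for (String candidate : candidates(path, code)) {
            if (Files.isRegularFile(Paths.get(candidate))) return candidate;
        }
        return null;
    }

    public static void main(String[] args) {
        candidates("/net/wwpdb", "3dty").forEach(System.out::println);
        System.out.println("found: " + firstExisting("/net/wwpdb", "3dty"));
    }
}
```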
-in.pdb.skip_header=<T|F> skip a header when parsing a PDB file.
-in.pir=<file name> an input file in the PIR format
-in.seq=<file name> an input file in the SEQ format
-in.xyz=<file name> an input file in the XYZ format
-mute suppresses all messages from a given package or class, e.g. “-mute=jbcl.data.formats” or “-mute=jbcl.calc.structural.Crmsd”. It is also possible to switch off a whole branch of the jbcl library, e.g. “-mute=jbcl.data” will mute all messages coming from jbcl.data.formats, jbcl.data.types, jbcl.data.dict and jbcl.data.basic. To switch off all messages, use “-mute=jbcl” or simply “-mute”, because the default behaviour is to mute everything. This option is executed AFTER -verbose, so the user may increase the verbosity level to a desired value and then selectively switch off logging from some packages
-out.pdb=<file name> prints relevant structures in the PDB format. By default the structures are printed on standard output; the user may give a file name as a parameter to write them to a file instead.
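The branch-muting behaviour described for -mute amounts to matching logger names against package prefixes. The sketch below is hypothetical (it does not reproduce jbcl's logging code) and assumes that a prefix covers only whole package segments, so that "jbcl.data" mutes jbcl.data.formats but not a package that merely starts with the same letters.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch (not jbcl's code): deciding whether a logger falls under a muted package branch. */
class MuteSketch {

    /** True when loggerName is a muted name itself or lies inside a muted package branch. */
    static boolean isMuted(String loggerName, List<String> mutedPrefixes) {
        for (String prefix : mutedPrefixes) {
            // Compare on package boundaries: "jbcl.data" covers "jbcl.data.formats"
            // but not a (hypothetical) "jbcl.database"
            if (loggerName.equals(prefix) || loggerName.startsWith(prefix + ".")) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> muted = Arrays.asList("jbcl.data");
        System.out.println(isMuted("jbcl.data.formats", muted));          // true
        System.out.println(isMuted("jbcl.calc.structural.Crmsd", muted)); // false
    }
}
```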
-representation.defined=<name> selects a predefined representation that will be used to convert a protein structure. The available representations are:
BACKBONE_CB
BACKBONE_CEN
CA_ONLY
CEN_ONLY
BACKBONE_ONLY
CA_CEN
CABS
REFIN
ROSETTA
-select.aa selects only amino acid residues.
-select.atoms=<strings> selects atoms defined by their PDB name. Example: -select.atoms=CA,N,C,O selects all backbone atoms
-select.bb filters input protein structure(s) and removes all the atoms except its backbone.
-select.bb_cb filters input protein structure(s) and removes all the atoms except the backbone atoms and the beta carbon (CB).
-select.ca filters input protein structure(s) and removes all the atoms except alpha carbons.
-select.chains=<characters> selects chains defined by their PDB id (single character per chain). Example: -select.chains=ABD
-select.elements=<strings> selects atoms defined by their chemical element.
Example: -select.elements=N,O selects all nitrogen and oxygen atoms
-select.elements=C -select.bb_cb selects all carbons from the backbone + CB, effectively the carbonyl C, CA and CB
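The last example suggests that several selection options are combined with a logical AND: only atoms accepted by every filter survive. A minimal sketch under that assumption, with a hypothetical `AtomFilterSketch` class and a stripped-down atom record (not Strc's own data types):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;
import java.util.stream.Collectors;

/** Hypothetical sketch (not Strc's code): atom selections combined with logical AND. */
class AtomFilterSketch {

    /** Minimal atom: PDB atom name and chemical element symbol. */
    static final class Atom {
        final String name;
        final String element;
        Atom(String name, String element) { this.name = name; this.element = element; }
    }

    /** Backbone atom names (as in -select.atoms=CA,N,C,O) plus the beta carbon. */
    private static final Set<String> BB_CB = new HashSet<>(Arrays.asList("N", "CA", "C", "O", "CB"));

    /** Counterpart of -select.elements=... */
    static Predicate<Atom> elements(String... symbols) {
        Set<String> wanted = new HashSet<>(Arrays.asList(symbols));
        return a -> wanted.contains(a.element);
    }

    /** Counterpart of -select.bb_cb */
    static Predicate<Atom> bbCb() {
        return a -> BB_CB.contains(a.name);
    }

    /** Keeps atoms accepted by the filter and returns their names in input order. */
    static List<String> select(List<Atom> atoms, Predicate<Atom> filter) {
        return atoms.stream().filter(filter).map(a -> a.name).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Atom> residue = Arrays.asList(new Atom("N", "N"), new Atom("CA", "C"), new Atom("C", "C"),
                new Atom("O", "O"), new Atom("CB", "C"), new Atom("CG", "C"));
        // -select.elements=C -select.bb_cb
        System.out.println(select(residue, elements("C").and(bbCb()))); // [CA, C, CB]
    }
}
```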
-select.fragment=<selection expression> selects residues and chains defined by their PDB residue ID and chain ID. The selection string may be a combination of selectors separated with semicolons. For example, -select.fragment=A.43:78;B.12:89 selects residues 43:78 from chain A and residues 12:89 from chain B.
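A selection expression such as `A.43:78;B.12:89` can be read as a list of (chain, first, last) triples. The sketch below is a hypothetical parser, not Strc's own: it assumes ranges are inclusive and that residue IDs are plain integers (PDB insertion codes are ignored).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch (not Strc's code): parsing a -select.fragment expression such as "A.43:78;B.12:89". */
class FragmentSelectionSketch {

    /** One selector: a chain ID and an inclusive range of PDB residue numbers (insertion codes ignored). */
    static final class Range {
        final String chain;
        final int first;
        final int last;
        Range(String chain, int first, int last) { this.chain = chain; this.first = first; this.last = last; }

        boolean contains(String chainId, int residueId) {
            return chain.equals(chainId) && residueId >= first && residueId <= last;
        }
    }

    /** Splits on ';' into selectors, then each selector into a chain ID and "first:last". */
    static List<Range> parse(String expression) {
        List<Range> ranges = new ArrayList<>();
        for (String selector : expression.split(";")) {
            String[] chainAndRange = selector.trim().split("\\.", 2);
            String[] ends = chainAndRange[1].split(":");
            ranges.add(new Range(chainAndRange[0],
                    Integer.parseInt(ends[0].trim()), Integer.parseInt(ends[1].trim())));
        }
        return ranges;
    }

    /** True when any selector accepts the residue. */
    static boolean selected(List<Range> ranges, String chainId, int residueId) {
        for (Range r : ranges) if (r.contains(chainId, residueId)) return true;
        return false;
    }

    public static void main(String[] args) {
        List<Range> sel = parse("A.43:78;B.12:89");
        System.out.println(selected(sel, "A", 43)); // true
        System.out.println(selected(sel, "A", 12)); // false
        System.out.println(selected(sel, "B", 89)); // true
    }
}
```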
-select.fragment.by_sequence=<sequence or file> selects a fragment of a protein (or nucleic acid) structure based on a sequence fragment. The user may specify either a file name (FASTA format) or the exact sequence string
-select.models=<selection> picks up selected frames (models) for further operations. Frame indices start from 0. Example: -select.models=0:45,90:99
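Selecting by sequence boils down to finding the fragment in a chain's sequence and turning the match into a residue-index range. A minimal sketch, assuming an exact, case-insensitive match, that only the first occurrence is used, and 0-based indices (as in -select.residues_by_index); the class name and the example sequence are hypothetical.

```java
/** Hypothetical sketch (not Strc's code): locating a sequence fragment within a chain sequence. */
class SequenceFragmentSketch {

    /** Returns {first, last} as 0-based inclusive residue indices of the first exact match, or null. */
    static int[] locate(String chainSequence, String fragment) {
        if (fragment.isEmpty()) return null;   // an empty fragment selects nothing
        int start = chainSequence.toUpperCase().indexOf(fragment.toUpperCase());
        if (start < 0) return null;
        return new int[] {start, start + fragment.length() - 1};
    }

    public static void main(String[] args) {
        int[] range = locate("MTYKLILNGKTLKG", "LILNG");
        System.out.println(range[0] + ":" + range[1]); // 4:8
    }
}
```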
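A model selection such as `0:45,90:99` (or `238,190,242`, as in the last example below) can be expanded into a list of 0-based model indices. The sketch assumes that ranges are inclusive at both ends and that the order given by the user is preserved; both points are assumptions, and `ModelSelectionSketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch (not Strc's code): expanding a -select.models argument into model indices. */
class ModelSelectionSketch {

    /** Expands "0:45,90:99" or "238,190,242"; ranges are assumed inclusive, input order is kept. */
    static List<Integer> expand(String selection) {
        List<Integer> indices = new ArrayList<>();
        for (String token : selection.split(",")) {
            String t = token.trim();
            int colon = t.indexOf(':');
            if (colon < 0) {
                indices.add(Integer.parseInt(t));               // a single model index
            } else {
                int from = Integer.parseInt(t.substring(0, colon));
                int to = Integer.parseInt(t.substring(colon + 1));
                for (int i = from; i <= to; i++) indices.add(i); // an inclusive range
            }
        }
        return indices;
    }

    public static void main(String[] args) {
        System.out.println(expand("0:45,90:99").size()); // 56
        System.out.println(expand("238,190,242"));       // [238, 190, 242]
    }
}
```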
-select.residues_by_id=<residue selection> selects residues defined by their PDB ID. All chains will be processed separately.
-select.residues_by_index=<residue selection> selects residues defined by the order they appear in a structure. The first residue has index 0. All chains will be processed separately.
-strc.renumber=<firstId> renumbers residues and atoms starting from the index specified by the user
-strc.set_bfactors=<string> provides a file with new b-factors. They will be used to substitute the original temperature factors in the loaded structures. The values should be provided as a single column in a file. If they exceed 1.0, they will automatically be renormalized.
-strc.show_summary=<T|F> prints a summary of a PDB file.
-verbose=<integer> sets the verbosity level to a given value. The argument should be an integer in the range from 0 (no messages at all, which is equivalent to -mute=jbcl) to 6, when everything is logged. See -mute for additional information.
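The manual does not say how the renormalization is done. The sketch below assumes one plausible rule: when any value exceeds 1.0, every value is divided by the maximum, so the largest b-factor becomes 1.0. The reader expects one number per non-empty line. `BFactorSketch` and the division-by-maximum rule are hypothetical, not Strc's documented behaviour.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

/** Hypothetical sketch (not Strc's code): reading a one-column b-factor file and renormalizing it. */
class BFactorSketch {

    /** Reads one number per non-empty line. */
    static double[] read(Path file) throws IOException {
        return Files.readAllLines(file).stream()
                .map(String::trim)
                .filter(line -> !line.isEmpty())
                .mapToDouble(Double::parseDouble)
                .toArray();
    }

    /** Assumed rule: if any value exceeds 1.0, divide every value by the maximum. */
    static double[] renormalize(double[] values) {
        double max = Double.NEGATIVE_INFINITY;
        for (double v : values) max = Math.max(max, v);
        if (max <= 1.0) return values.clone();   // nothing exceeds 1.0: keep the values as they are
        double[] scaled = new double[values.length];
        for (int i = 0; i < values.length; i++) scaled[i] = values[i] / max;
        return scaled;
    }

    public static void main(String[] args) throws IOException {
        double[] raw = args.length > 0 ? read(Paths.get(args[0])) : new double[] {0.5, 2.0, 4.0};
        System.out.println(Arrays.toString(renormalize(raw)));
    }
}
```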

EXAMPLES

      (1) Read a given PDB file (3dty protein) and extract the chain 'E'
into a file.
    java apps.Strc -ip=3dty.pdb -op=3dtyE.pdb -select.chains=E


      (2) As above, but download the protein from the RCSB website.
    java apps.Strc -ip=3dty -online  -select.chains=E -op=3dtyE.pdb


      (3) As above, but look for the protein in a local PDB mirror. Extract
both 'A' and 'E' chains:
    java apps.Strc -ip=3dty -in.pdb.search_path=/net/wwpdb -op=3dtyE.pdb  -select.chains=EA


      (4) Read a directory of PDB files and combine them into a single
multimodel PDB
    java apps.Strc -ipd=test/dataset/2gb1-models/ -in.pdb.file_mask=*.pdb -out.pdb=2gb1-multimodel.pdb


      (5) Transform a protein into another representation (CA + united SC
in this case)
    java apps.Strc -representation.defined=CA_CEN -ip=2gb1.pdb

      (6) Read a PDB and print CA-only PDB on the screen
    java apps.Strc -ip=3dty.pdb -select.ca -op


      (7) Read a multimodel-PDB and print selected model(s) into a file;
print also hydrogen atoms, if present
    java apps.Strc -ip=out.pdb -in.pdb.all_models=T -select.models=238,190,242 -op=models.pdb -in.pdb.read_hydrogens=T