Manual page for the Seqc program

Seqc is a tool for manipulating protein sequences.

Seqc reads files in the following formats: PDB, FASTA, PIR, DSSP, SEQ, Blast (sequence profile), and PsiPred (secondary structure prediction output). The information may be written in one of the following formats: FASTA, SEQ, or PsiPred SS2.

-h print a brief summary of available options
-help=<name-part> print a help message on the screen - ANSI terminal version with visual enhancements. If the <name-part> argument is given, the program will print only those options that contain that substring.
-help.dox=<name> print a help message in doxygen (*.dox) format on the screen for the Seqc program
-help.md=<name> print a help message in markdown (*.md) format on the screen
-help.option=<option-name> print a help message for a single option on the screen.
-help.plain=<name-part> print a help message on the screen - plain text version. If the <name-part> argument is given, the program will print only those options that contain that substring.
-in.chk=<file name> an input PsiBlast profile read from a binary checkpoint file. Such a file may be prepared by running PsiBlast with the -C option.
-in.chk.list=<file name> an input file that lists PsiBlast checkpoint files.
-in.dssp=<file name> an input file in the DSSP format
-in.fasta=<file name> an input file in the FASTA format
-in.pdb=<file name> input data in the PDB format. This option accepts the following types of input:
- a file in PDB format (possibly gzip’ped)
- several files in PDB format (possibly gzip’ped) - in this case you must use the -in.pdb.comma_separated option to tell the program to split the input string
- just a PDB code; with the -online option the data is downloaded from www.rcsb.org
- just a PDB code; with the -in.pdb.search_path option the program will look for the right file
-in.pdb.all_models=<T|F> forces PDB reader to take all the models from a PDB file.
-in.pdb.comma_separated forces the -in.pdb option to look for several PDB file names, separated by commas. In this case none of the file names may contain a comma character.
Example:
-in.pdb.comma_separated -in.pdb=2gb1.pdb,2aza.pdb
-in.pdb.create_bu=<T|F> forces the PDB reader to create a biological unit for each structure. Biological unit creation is based solely on information stored in the PDB file header. Any error in the header will affect the resulting biological unit. This option forces the PDB reading mechanism to create an array of structures for each structure (MODEL data) in the given PDB. Therefore (to avoid handling too many molecules) it is advised NOT to use -in.pdb.all_models combined with this option.
-in.pdb.dir=<dir name> provides a directory with PDB files
-in.pdb.file_mask=<mask_regexp> provides a mask for directory-based input, e.g. for -input_pdb_dir option. Without a file mask, directory-related options try to read all possible files in a directory. The mask changes this and narrows the selection. In general a file mask should follow the rules of regular expressions in Java. The only exceptions are >.< (dot character) and >*< (asterisk), which should be given explicitly (with no escaping). Example masks are: *.pdb, 1[ABC]*.pdb, and 1(MBA|mba).pdb
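The mask rules above can be illustrated with a minimal Python sketch. It is not the program's own code: it assumes that an unescaped dot stands for a literal dot and that an asterisk matches any run of characters, while everything else is passed through as regular-expression syntax. The function names mask_to_regex and select_files are hypothetical, and Python's re module is used here in place of Java's regular expressions.

```python
import re
from typing import List


def mask_to_regex(mask: str) -> re.Pattern:
    """Translate a file mask into a compiled regular expression.

    Assumption: '.' is a literal dot and '*' is a wildcard for any run of
    characters; the rest of the mask is kept as regular-expression syntax.
    """
    translated = mask.replace(".", r"\.").replace("*", ".*")
    return re.compile(translated)


def select_files(names: List[str], mask: str) -> List[str]:
    """Keep only the file names that match the whole mask."""
    pattern = mask_to_regex(mask)
    return [name for name in names if pattern.fullmatch(name)]


if __name__ == "__main__":
    files = ["1ABC.pdb", "1abc.pdb", "2gb1.pdb", "1MBA.pdb", "1mba.pdb", "notes.txt"]
    for mask in ("*.pdb", "1[ABC]*.pdb", "1(MBA|mba).pdb"):
        print(mask, "->", select_files(files, mask))
```

Matching the whole file name (rather than searching inside it) is a design choice of this sketch; the program itself may behave differently.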
-in.pdb.first_model_only=<T|F> forces PDB reader to take only the first model from a PDB file.
-in.pdb.online=<T|F> download a PDB file from www.rcsb.org rather than reading a file. In this case the parameter given to -ip must provide a valid four-character PDB code.
-in.pdb.read_hydrogens=<T|F> forces the PDB reader to read in all hydrogen atoms. This is switched off by default, so all hydrogens are discarded.
-in.pdb.search_path=<path string> provides a path where -ip and other PDB-reading options will look for PDB data. In this case the parameter given to -ip must provide a valid four-character PDB code rather than a file name. Then, for the code (say, 1abc), several possible file locations will be tested, e.g.:
PATH/1abc
PATH/1abc.pdb
PATH/1abc.pdb.gz
PATH/pdb1abc.ent
PATH/pdb1abc.ent.gz
PATH/1ABC
PATH/1ABC.PDB
PATH/ab/pdb1abc.ent
PATH/ab/pdb1abc.ent.gz
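As an illustration only, the candidate locations listed above can be generated with a short Python sketch. The function name candidate_paths is hypothetical, the order simply follows the list above, and the list the program actually tests may be longer or ordered differently (the manual gives the locations only as examples).

```python
from pathlib import PurePosixPath
from typing import List


def candidate_paths(search_path: str, code: str) -> List[str]:
    """Return candidate file locations for a four-character PDB code.

    Assumption: the subdirectory name is made of the two middle characters
    of the code (1abc -> ab), as in the example list.
    """
    root = PurePosixPath(search_path)
    lower, upper = code.lower(), code.upper()
    middle = lower[1:3]
    relative = [
        lower,
        f"{lower}.pdb",
        f"{lower}.pdb.gz",
        f"pdb{lower}.ent",
        f"pdb{lower}.ent.gz",
        upper,
        f"{upper}.PDB",
        f"{middle}/pdb{lower}.ent",
        f"{middle}/pdb{lower}.ent.gz",
    ]
    return [str(root / name) for name in relative]


if __name__ == "__main__":
    for path in candidate_paths("/net/wwpdb", "1abc"):
        print(path)
```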
-in.pdb.skip_header=<T|F> skip a header when parsing a PDB file.
-in.pir=<file name> an input file in the PIR format
-in.psipred=<file name> reads a file with PsiPred secondary structure prediction (simple PsiPred format)
-in.pssm=<file name> an input file in the PsiBlast profile (PSSM) format
-in.seq=<file name> an input file in the SEQ format
-in.ss2=<file name> reads a file with PsiPred secondary structure prediction (SS2 file format) with H, E, C probabilities
-mute suppress all messages from a given package or class, e.g. “-mute=jbcl.data.formats”, or “-mute=jbcl.calc.structural.Crmsd”. It is also possible to switch off a whole branch of the jbcl library, e.g. “-mute=jbcl.data” will mute all messages coming from jbcl.data.formats, jbcl.data.types, jbcl.data.dict and jbcl.data.basic. To switch off all the messages, say: “-mute=jbcl” or simply “-mute”, because the default behaviour is to mute everything. This option is executed AFTER -verbose, so the user may increase the verbosity level to a desired value and then selectively switch off logging from some packages.
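The branch-wise muting described for -mute can be pictured as prefix matching on dotted package names. The Python sketch below is an assumption about that behaviour, not the jbcl implementation; the function name is_muted is hypothetical.

```python
from typing import Iterable


def is_muted(source: str, muted: Iterable[str]) -> bool:
    """Return True if messages from `source` fall under a muted branch.

    Assumption: a muted name silences the package or class itself and
    everything below it in the dotted hierarchy, but not unrelated names
    that merely share a textual prefix.
    """
    for name in muted:
        if source == name or source.startswith(name + "."):
            return True
    return False


if __name__ == "__main__":
    muted = ["jbcl.data"]
    for source in ("jbcl.data.formats", "jbcl.data.types", "jbcl.calc.structural.Crmsd"):
        print(source, "muted" if is_muted(source, muted) else "logged")
```

Comparing against `name + "."` keeps a mask such as jbcl.data from accidentally silencing a sibling package whose name only begins with the same letters.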
-out.fasta=<file name> prints relevant sequences in the FASTA format
-out.fasta.width=<number> sets the line width for FASTA output. Use 0 or a negative number to allow an unlimited number of columns and print the whole sequence on a single line.
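The effect of the width setting can be sketched in a few lines of Python. This is an illustration of the documented behaviour, not the program's code; the function name format_fasta and the default width of 60 columns are assumptions made for the example.

```python
def format_fasta(header: str, sequence: str, width: int = 60) -> str:
    """Format one FASTA record, wrapping the sequence at `width` columns.

    A width of 0 or a negative number means no wrapping: the whole
    sequence is written on a single line.
    """
    lines = [">" + header]
    if width <= 0:
        lines.append(sequence)
    else:
        # Cut the sequence into consecutive chunks of at most `width` characters.
        lines.extend(sequence[i:i + width] for i in range(0, len(sequence), width))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(format_fasta("example", "MKTAYIAKQRQISFVKSHFSRQ", width=10), end="")
    print(format_fasta("example", "MKTAYIAKQRQISFVKSHFSRQ", width=0), end="")
```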
-out.seq=<file name> prints relevant sequences in the SEQ format
-out.ss2=<file name> prints relevant sequence and secondary structure probabilities in the PsiPred-SS2 format
-select.chains=<characters> selects chains defined by their PDB id (single character per chain). Example: -select.chains=ABD
-seqc.detect_secondary=<T|F> detects secondary structure using the DSSP algorithm for each protein structure given as input. Note that a protein structure must have a full backbone to detect hydrogen bonds properly.
-seqc.show_ss prints secondary structure. This option is used only in FASTA output. Secondary structure in a three-letter code (H, E and C) is printed just after the relevant amino acid sequence.
-seqc.sse=<T|F> list all secondary structure elements
-verbose=<integer> sets the verbosity level to a given value. The argument should be an integer in the range from 0 (no messages at all, which is equivalent to -mute=jbcl) to 6, when everything is logged. See -mute for additional information.

EXAMPLES

      (1) Convert a multiline sequence in a FASTA file into a single line (by
setting the number of columns to 0).
    java apps.Seqc -if=long_sequence.fasta -op=one_line.fasta -out.fasta.width=0


      (2) Download a protein from the RCSB website and extract chain 'E':
    java apps.Strc -ip=3dty -online  -select.chains=E -op=3dtyE.pdb


      (3) As above, but look for the protein in a local PDB mirror. Extract
both 'A' and 'E' chains:
    java apps.Strc -ip=3dty -in.pdb.search_path=/net/wwpdb -op=3dtyE.pdb  -select.chains=EA


      (4) Detect secondary structure for a protein based on coordinates
from a PDB file and save it in SS2 format:
    java apps.Seqc -ip=193l.pdb -seqc.detect_secondary -out.ss2=193l.dssp.ss2


      (5) Convert a DSSP file into SS2 format:
    java apps.Seqc -in.dssp=2azaA.dssp -out.ss2=2azaA.dssp.ss2


      (6) Read a PDB file and print a CA-only PDB on the screen:
    java apps.Strc -ip=3dty.pdb -select.ca -op