Manual page for apps.PsiBlastAnalyse program

PsiBlastAnalyse - parser and filter for PsiBlast results. The program is intended to parse the results of the PsiBlastSearch tool, although it can read any output from the Blast or PsiBlast programs, provided that it is in format “0” (i.e. Blast was run with -m 0). The program can read and combine several Blast output files. Results may be filtered in several different ways.
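The three simplest filters (by e-value, by query coverage and by minimum score) can be pictured with the sketch below. It is not the jbcl implementation: the Hsp record, its fields and the filter method are hypothetical, and coverage is assumed to be the length of the HSP's query span divided by the query length.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: these names are illustrative and are NOT part of the jbcl library.
public class HitFilterSketch {

    /** Assumed minimal HSP record: hit id, query span (1-based, inclusive), score and e-value. */
    record Hsp(String id, int queryFrom, int queryTo, double score, double evalue) {
        /** Fraction of the query covered by this HSP (an assumed definition of coverage). */
        double coverage(int queryLength) {
            return (queryTo - queryFrom + 1) / (double) queryLength;
        }
    }

    /** Keeps only the HSPs that pass all three cutoffs. */
    static List<Hsp> filter(List<Hsp> hits, int queryLength,
                            double maxEvalue, double minCoverage, double minScore) {
        List<Hsp> kept = new ArrayList<>();
        for (Hsp h : hits) {
            if (h.evalue() > maxEvalue) continue;                // worse than the e-value cutoff
            if (h.coverage(queryLength) < minCoverage) continue; // covers too little of the query
            if (h.score() < minScore) continue;                  // score too low
            kept.add(h);
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Hsp> hits = List.of(
                new Hsp("hitA", 1, 90, 150.0, 1e-30),
                new Hsp("hitB", 10, 30, 40.0, 1e-3),
                new Hsp("hitC", 5, 95, -2.0, 5.0));
        // Query of length 100; keep e-value <= 1e-5, coverage >= 0.5, score >= 0
        for (Hsp h : filter(hits, 100, 1e-5, 0.5, 0.0)) {
            System.out.println(h.id()); // prints hitA
        }
    }
}
```

A minimum score of 0 mirrors the documented default of discarding hits with a negative score.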

-align.query.fasta=<file name> an input query (target) sequence(s) in FASTA format
-align.query.pdb=<file name> an input query (target) file in the PDB format
-blast.db=<string> full path to the database to be used in blast searches
-blast.dry_run in dry run mode the program does not run (Psi)Blast; all the commands are printed on stdout
-blast.filter_hits.by_coverage=<value> discard all hits that cover less than a given ratio of the query sequence
-blast.filter_hits.by_evalue=<value> discard all hits worse than a given e-value cutoff
-blast.filter_hits.by_seqid=<value> remove redundant HSPs by applying sequence identity cutoff
-blast.filter_hits.longest_gap=<number> remove HSPs whose longest gap is longer than the given number
-blast.filter_hits.min_score=<value> remove HSPs whose BLAST score is below the given value (by default, hits with a negative score are removed)
-blast.filter_hits.query_seq_id=<value> remove HSPs whose sequence identity to the query is below the given value
-blast.filter_hits.uniq_gid remove sequences with a redundant (duplicate) GI
-blast.input_hits=<strings> provide one or more psiblast output files to be processed
-blast.input_hits.mask=<string> provide a bash-like mask to select input files; the path must be specified with -blast.input_hits.path option
-blast.input_hits.path=<string> provide a path to the input files (PsiBlast output). Files in this directory are selected with the mask given by the -blast.input_hits.mask flag, which by default is set to: *.outfile
-blast.show_cfg prints the Psi-Blast configuration on stdout
-h print a brief summary of available options
-help=<name-part> print a help message on the screen - ANSI terminal version with visual enhancements. If the <name-part> argument is given, the program will print only those options that contain that substring
-help.dox=<name> print a help message in doxygen (*.dox) format on the screen for the PsiBlastAnalyse program
-help.md=<name> print a help message in markdown (*.md) format on the screen
-help.option=<option-name> print a help message for a single option on the screen.
-help.plain=<name-part> print a help message on the screen - plain text version. If the <name-part> argument is given, the program will print only those options that contain that substring
-in.pdb.all_models=<T|F> forces PDB reader to take all the models from a PDB file.
-in.pdb.comma_separated forces the -in.pdb option to look for several PDB file names, separated by commas. In this case none of the file names may contain a comma character.
Example:
-in.pdb.comma_separated -in.pdb=2gb1.pdb,2aza.pdb
-in.pdb.create_bu=<T|F> forces PDB reader to create a biological unit for each structure. Biological unit creation is based solely on information stored in the PDB file header. Any error in the header will affect the resulting biological unit. This option forces the PDB reading mechanism to create an array of structures for each structure (MODEL data) in the given PDB file. Therefore (to avoid handling too many molecules) it is advised NOT to use -in.pdb.all_models combined with this option.
-in.pdb.first_model_only=<T|F> forces PDB reader to take only the first model from a PDB file.
-in.pdb.online=<T|F> download a PDB file from www.rcsb.org rather than reading a file. In this case the parameter given to -ip must provide a valid four-character PDB code
-in.pdb.read_hydrogens=<T|F> forces PDB reader to read in all hydrogen atoms. This is switched off by default, and all hydrogens are discarded
-in.pdb.search_path=<path string> provides a path where -ip and other PDB-reading options will look for PDB data. In this case the parameter given to -ip must be a valid four-character PDB code rather than a file name. Then, for the code (say, 1abc), several possible file locations will be tested, e.g.:
PATH/1abc
PATH/1abc.pdb
PATH/1abc.pdb.gz
PATH/pdb1abc.ent
PATH/pdb1abc.ent.gz
PATH/1ABC
PATH/1ABC.PDB
PATH/ab/pdb1abc.ent
PATH/ab/pdb1abc.ent.gz
-in.pdb.skip_header=<T|F> skip a header when parsing a PDB file.
-mute suppress all messages from a given package or class, e.g. “-mute=jbcl.data.formats” or “-mute=jbcl.calc.structural.Crmsd”. It is also possible to switch off a whole branch of the jbcl library, e.g. “-mute=jbcl.data” will mute all messages coming from jbcl.data.formats, jbcl.data.types, jbcl.data.dict and jbcl.data.basic. To switch off all messages, say “-mute=jbcl” or simply “-mute”, because the default behaviour is to mute everything. This option is executed AFTER -verbose, so the user may increase the verbosity level to a desired value and then selectively switch off logging from some packages
-out.fasta=<file name> prints relevant sequences in the FASTA format
-out.fasta.remove_gaps remove all gaps when saving a sequence into a FASTA format
-out.fasta.width=<number> sets the line width for the FASTA format. Use 0 or a negative number for unlimited width, so that the whole sequence is printed on a single line.
-verbose=<integer> sets the verbosity level to a given value. The argument should be an integer in the range from 0 (no messages at all, which is equivalent to -mute=jbcl) to 6 (everything is logged). See -mute for additional information.
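The lookup performed for -in.pdb.search_path can be sketched as below, assuming the program tries exactly the nine locations listed for that option, in that order (the option text only says "e.g.", so the real list may differ). The "ab" subdirectory is taken to be the middle two characters of the code, as in the wwPDB divided directory layout. The class and method names are hypothetical.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch; the class and method names are illustrative, not part of jbcl.
public class PdbSearchPathSketch {

    /** Candidate locations for a four-character PDB code, assumed to follow the listed order. */
    static List<String> candidates(String path, String code) {
        String lc = code.toLowerCase(Locale.ROOT);
        String uc = code.toUpperCase(Locale.ROOT);
        String mid = lc.substring(1, 3); // middle two characters, e.g. "ab" for 1abc
        return List.of(
                path + "/" + lc,
                path + "/" + lc + ".pdb",
                path + "/" + lc + ".pdb.gz",
                path + "/pdb" + lc + ".ent",
                path + "/pdb" + lc + ".ent.gz",
                path + "/" + uc,
                path + "/" + uc + ".PDB",
                path + "/" + mid + "/pdb" + lc + ".ent",
                path + "/" + mid + "/pdb" + lc + ".ent.gz");
    }

    public static void main(String[] args) {
        String path = args.length > 0 ? args[0] : ".";
        // Report the first candidate that exists on disk as a regular file
        for (String c : candidates(path, "1abc")) {
            if (Files.isRegularFile(Path.of(c))) {
                System.out.println("found: " + c);
                return;
            }
        }
        System.out.println("no file found for 1abc under " + path);
    }
}
```

Stopping at the first existing file means an uncompressed copy would win over a gzipped one under this assumed ordering.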

EXAMPLES

      (1) Read a PsiBlast results file and write a summary of all the hits
in a table.
    java apps.PsiBlastAnalyse -blast.input_hits=1.outfile -align.query.fasta=query.fasta


      (2) Read a PsiBlast results file and filter the results. Select
sequences with a unique GID, keep only those that share at least 30% identical
residues with the query, and remove redundancy at the 90% seqID level
  java apps.PsiBlastAnalyse -blast.input_hits=q.outfile -blast.filter_hits.by_seqid=0.9 -v -qf=q.fasta -blast.filter_hits.query_seq_id=0.3 -blast.filter_hits.uniq_gid
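Example (2) chains three filters. The sketch below shows one plausible way to combine them, under several assumptions: each hit carries a GI and a row of a query-anchored alignment; sequence identity is the number of identical residues divided by the number of columns where both rows have a residue; and redundancy is removed greedily in input order, keeping a hit only if its identity to every already kept hit does not exceed the cutoff. The record, the methods and the identity definition are hypothetical; the program's actual definitions may differ.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; names and the identity definition are assumptions, not the jbcl code.
public class RedundancyFilterSketch {

    /** Assumed hit: a GI identifier and its row of the query-anchored alignment ('-' marks a gap). */
    record Hit(String gi, String alignedSeq) {}

    /** Identical residues divided by columns where both rows have a residue (assumed definition). */
    static double identity(String a, String b) {
        int both = 0, same = 0;
        for (int i = 0; i < Math.min(a.length(), b.length()); i++) {
            char x = a.charAt(i), y = b.charAt(i);
            if (x == '-' || y == '-') continue;
            both++;
            if (x == y) same++;
        }
        return both == 0 ? 0.0 : same / (double) both;
    }

    /** Greedy filter: hits are visited in input order (e.g. best e-value first). */
    static List<Hit> filter(String query, List<Hit> hits, double minQueryId, double maxMutualId) {
        Set<String> seenGi = new HashSet<>();
        List<Hit> kept = new ArrayList<>();
        for (Hit h : hits) {
            if (identity(query, h.alignedSeq()) < minQueryId) continue; // too distant from the query
            if (!seenGi.add(h.gi())) continue;                          // GI already seen
            boolean redundant = false;
            for (Hit k : kept) {
                if (identity(k.alignedSeq(), h.alignedSeq()) > maxMutualId) {
                    redundant = true;                                   // too similar to a kept hit
                    break;
                }
            }
            if (!redundant) kept.add(h);
        }
        return kept;
    }

    public static void main(String[] args) {
        String query = "ACDEFGHIKL";
        List<Hit> hits = List.of(
                new Hit("1", "ACDEFGHIKL"),
                new Hit("2", "ACDEFGHIKL"),  // identical to hit 1: redundant
                new Hit("1", "ACDEFGHIKM"),  // duplicate GI
                new Hit("4", "WWWWWWWWWL"),  // 10% identity to the query
                new Hit("5", "ACDWWWWIKL")); // 60% identity to the query and to hit 1
        // Same cutoffs as example (2): query identity >= 0.3, mutual identity <= 0.9
        for (Hit h : filter(query, hits, 0.3, 0.9)) {
            System.out.println(h.gi());      // prints 1, then 5
        }
    }
}
```

Because the filter is greedy, the input order decides which member of a redundant group survives; sorting hits by e-value first is a natural choice, but that ordering is also an assumption.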

      (3) Read several output files and filter sequences by their mutual
sequence identity (keeping only pairs at or below the 0.9 threshold)
    java apps.PsiBlastAnalyse -blast.input_hits.mask=*.outfile -blast.input_hits.path=./ -align.query.fasta=query.fasta -blast.filter_hits.by_seqid=0.9
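Example (3) selects input files with -blast.input_hits.path and -blast.input_hits.mask. The sketch below shows how a bash-like mask such as *.outfile can select files in a directory using the standard java.nio glob support (Files.newDirectoryStream with a glob string). Whether PsiBlastAnalyse uses this mechanism internally is an assumption, and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch; not the PsiBlastAnalyse implementation.
public class InputMaskSketch {

    /** Lists regular files in dir whose names match a glob mask such as "*.outfile", sorted by path. */
    static List<Path> selectInputs(Path dir, String mask) throws IOException {
        List<Path> files = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, mask)) {
            for (Path p : stream) {
                if (Files.isRegularFile(p)) files.add(p);
            }
        }
        Collections.sort(files); // deterministic processing order
        return files;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Path.of(args.length > 0 ? args[0] : ".");
        for (Path p : selectInputs(dir, "*.outfile")) {
            System.out.println(p);
        }
    }
}
```

Sorting is not required by the glob match itself; it only makes the order in which files are combined reproducible, which is another assumption of this sketch.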