Manual page for the apps.PRAline program

With the PRAline (Profile Alignment) tool one can calculate optimal and sub-optimal alignments between two amino acid sequences or sequence profiles.

In general, with PRAline one can:

  • align two amino acid sequences: use one of the flags [-qp -qf -qs] to provide the query sequence and one of the flags [-tp -tf -ts] to provide the template sequence

  • align two amino acid profiles: use the -qb and -tb options to provide input profiles

  • align two amino acid sequences (or profiles) with secondary structure information. In this case the secondary structure must be provided by SEQ files (-qs and -ts options).

-align.cross_product Use cross product to score profile-profile sequence similarity
-align.dot_product Use dot product to score profile-profile sequence similarity
-align.gap_extend=<value> Penalty for gap extension that will be used for sequence or profile alignment. When used with BLOSUM or PAM matrices, it is usually in the range [-2,-1].
-align.gap_open=<value> Penalty for gap opening that will be used for sequence or profile alignment. When used with BLOSUM or PAM matrices, it is usually in the range [-12,-10].
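The way the two gap penalties combine can be sketched as follows. This is a minimal illustration, not PRAline's own code: the function name is hypothetical, and it assumes the common affine convention in which the first gapped position costs the opening penalty and every further position costs the extension penalty. PRAline may use a different convention.

```python
def affine_gap_penalty(gap_length, gap_open=-10.0, gap_extend=-1.0):
    """Score contribution of a single gap spanning `gap_length` positions.

    Assumed convention (not confirmed for PRAline): the first gapped
    position costs `gap_open`, each additional position costs `gap_extend`.
    Both penalties are negative numbers, as in the ranges quoted above.
    """
    if gap_length < 0:
        raise ValueError("gap_length must be non-negative")
    if gap_length == 0:
        return 0.0
    return gap_open + (gap_length - 1) * gap_extend


if __name__ == "__main__":
    for n in (1, 2, 5):
        print(n, affine_gap_penalty(n))
```

With an opening penalty much larger in magnitude than the extension penalty, one long gap is penalised less than several short gaps of the same total length.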
-align.global calculate a global alignment
-align.l1score Use the L1 metric to score profile-profile sequence similarity
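To illustrate what the dot product (-align.dot_product) and the L1 metric (-align.l1score) measure, the sketch below compares two profile columns, each given as a vector of residue frequencies. It is only an illustration under stated assumptions: the function names and the toy four-letter alphabet are hypothetical, and how PRAline turns these quantities into alignment scores is not described here.

```python
def dot_product(p, q):
    """Dot product of two profile columns (larger means more similar)."""
    if len(p) != len(q):
        raise ValueError("profile columns must have the same length")
    return sum(a * b for a, b in zip(p, q))


def l1_distance(p, q):
    """L1 (city-block) distance of two profile columns (smaller means more similar)."""
    if len(p) != len(q):
        raise ValueError("profile columns must have the same length")
    return sum(abs(a - b) for a, b in zip(p, q))


if __name__ == "__main__":
    # Toy columns over a hypothetical 4-letter alphabet; real amino acid
    # profiles would have 20 entries per column.
    query_column = [0.5, 0.25, 0.25, 0.0]
    template_column = [0.5, 0.0, 0.25, 0.25]
    print("dot product:", dot_product(query_column, template_column))
    print("L1 distance:", l1_distance(query_column, template_column))
```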
-align.local calculate a local alignment
-align.matrix=<matrix name> Defines which substitution matrix will be used to score sequence alignments. Default is BLOSUM62.
-align.out.all=<T|F> asks the global alignment to write all the optimal alignments
-align.out.edinburgh save output alignment(s) in Edinburgh format
-align.out.fasta save output alignment(s) in FASTA format
-align.out.intelligenetics save output alignment(s) in Intelligenetics format
-align.out.just_scores save only scores for output alignment(s)
-align.picasso3 Use Picasso3 to score profile-profile sequence similarity
-align.prof_sim Use ProfSim method to score profile-profile sequence similarity
-align.query.chk=<file name> an input PsiBlast profile from a binary checkpoint file. Such a file may be prepared by running PsiBlast with -C option.
-align.query.chk.list=<file name> an input file that lists query checkpoint files.
-align.query.fasta=<file name> an input query (target) sequence(s) in FASTA format
-align.query.pdb=<file name> an input query (target) file in the PDB format
-align.query.pir=<file name> reads an input file in the PIR format and extracts all sequences. These will be used as queries (targets) in alignment calculations
-align.query.pssm=<file name> an input file in the PsiBlast profile (PSSM) format
-align.query.seq=<file name> an input query sequence in SEQ format
-align.query.seqname=<string> provide the default name for a query sequence. The string will be used to identify each query sequence if any other name is not available
-align.query.ss2=<file name> an input query secondary structure prediction in PsiPred’s SS2 format.
-align.query.string=<seq_string> provide a query sequence as a FASTA-like string
-align.score_bias=<value> A per-position constant value added to profile alignment score. It influences only local alignment
-align.ss_weight=<value> A weight value for the secondary structure similarity score.
-align.template.chk=<file name> an input PsiBlast profile from a binary checkpoint file. Such a file may be prepared by running PsiBlast with -C option.
-align.template.chk.list=<file name> an input file that lists template checkpoint files.
-align.template.fasta=<file name> an input template sequence(s) in FASTA format
-align.template.pdb=<file name> an input template (or in general, the reference) protein structure(s) in the PDB format
-align.template.pir=<file name> reads an input file in the PIR format and extracts all sequences. These will be used as templates in alignment calculations
-align.template.pssm=<file name> an input file in the PsiBlast profile (PSSM) format
-align.template.seq=<file name> an input template sequence in SEQ format
-align.template.seqname=<string> provide the default name for a template sequence. The string will be used to identify each template sequence if any other name is not available
-align.template.ss2=<file name> an input template secondary structure prediction in PsiPred’s SS2 format.
-align.template.string=<seq_string> provide a template sequence as a FASTA-like string
-h print a brief summary of available options
-help=<name-part> print a help message on the screen - ANSI terminal version with visual enhancements. If the <name-part> argument is given, the program will print only those options whose names contain that substring
-help.dox=<name> print a help message in doxygen (*.dox) format on the screen for the PRAline program
-help.md=<name> print a help message in markdown (*.md) format on the screen
-help.option=<option-name> print a help message for a single option on the screen.
-help.plain=<name-part> print a help message on the screen - plain text version. If the <name-part> argument is given, the program will print only those options whose names contain that substring
-in.pdb.all_models=<T|F> forces PDB reader to take all the models from a PDB file.
-in.pdb.comma_separated forces the -in.pdb option to look for several PDB file names, separated by commas. In this case none of the file names may contain a comma character.
Example:
-in.pdb.comma_separated -in.pdb=2gb1.pdb,2aza.pdb
-in.pdb.create_bu=<T|F> forces the PDB reader to create a biological unit for each structure. Biological unit creation is based solely on information stored in the PDB file header. Any error in the header will affect the resulting biological unit. This option forces the PDB reading mechanism to create an array of structures for each structure (MODEL data) in the given PDB file. Therefore (to avoid handling too many molecules) it is advised NOT to use -in.pdb.all_models combined with this option.
-in.pdb.first_model_only=<T|F> forces PDB reader to take only the first model from a PDB file.
-in.pdb.online=<T|F> download a PDB file from www.rcsb.org rather than reading a file. In this case the parameter given to -ip must provide a valid four-character PDB code
-in.pdb.read_hydrogens=<T|F> forces PDB reader to read in all hydrogen atoms. This by default is switched off and all hydrogens are discarded
-in.pdb.search_path=<path string> provides a path where -ip and other PDB-reading options will look for PDB data. In this case the parameter given to -ip must provide a valid four-character PDB code rather than a file name. Then, for the code (say, 1abc), several possible file locations will be tested, e.g.:
PATH/1abc
PATH/1abc.pdb
PATH/1abc.pdb.gz
PATH/pdb1abc.ent
PATH/pdb1abc.ent.gz
PATH/1ABC
PATH/1ABC.PDB
PATH/ab/pdb1abc.ent
PATH/ab/pdb1abc.ent.gz
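The lookup described for -in.pdb.search_path can be sketched as below. This is a hedged illustration rather than the program's actual logic: the function name is hypothetical, only the nine example locations listed above are generated (the real reader may test more), and the subdirectory name is assumed to be the two middle characters of the PDB code, as the 1abc example suggests.

```python
def candidate_pdb_paths(search_path, code):
    """Return candidate file locations for a four-character PDB code.

    Only the example patterns from the option description are produced;
    the order follows that list.
    """
    if len(code) != 4:
        raise ValueError("a PDB code must have exactly four characters")
    lower, upper = code.lower(), code.upper()
    subdir = lower[1:3]  # assumed: middle two characters, e.g. "ab" for 1abc
    names = [
        lower,
        lower + ".pdb",
        lower + ".pdb.gz",
        "pdb" + lower + ".ent",
        "pdb" + lower + ".ent.gz",
        upper,
        upper + ".PDB",
        subdir + "/pdb" + lower + ".ent",
        subdir + "/pdb" + lower + ".ent.gz",
    ]
    base = search_path.rstrip("/")
    return [base + "/" + name for name in names]


if __name__ == "__main__":
    for path in candidate_pdb_paths("PATH", "1abc"):
        print(path)
```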
-in.pdb.skip_header=<T|F> skip a header when parsing a PDB file.
-mute suppress all messages from a given package or class, e.g. “-mute=jbcl.data.formats” or “-mute=jbcl.calc.structural.Crmsd”. It is also possible to switch off a whole branch of the jbcl library, e.g. “-mute=jbcl.data” will mute all messages coming from jbcl.data.formats, jbcl.data.types, jbcl.data.dict and jbcl.data.basic. To switch off all the messages, say “-mute=jbcl” or simply “-mute”, because the default behaviour is to mute everything. This option is executed AFTER -verbose, so the user may increase the verbosity level to a desired value and then selectively switch off logging from some packages.
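The branch-wise behaviour of -mute can be pictured as prefix matching on dotted package names. The sketch below is an assumption about the semantics, not the jbcl implementation: the function name is hypothetical, and matching is assumed to happen on whole name components, so that muting jbcl.data would not affect a package whose name merely starts with the same letters.

```python
def is_muted(logger_name, muted_prefixes):
    """Return True if `logger_name` lies in any muted package branch.

    A name is considered muted when it equals a muted prefix or sits
    below it in the dotted package hierarchy (assumed semantics).
    """
    for prefix in muted_prefixes:
        if logger_name == prefix or logger_name.startswith(prefix + "."):
            return True
    return False


if __name__ == "__main__":
    muted = ["jbcl.data"]
    for name in ("jbcl.data.formats", "jbcl.data.types",
                 "jbcl.calc.structural.Crmsd"):
        print(name, "muted" if is_muted(name, muted) else "logged")
```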
-out.fasta.width=<number> sets the line width for FASTA output. Give 0 or a negative number to allow an unlimited number of columns and print the whole sequence on a single line.
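The effect of the width setting on FASTA output can be sketched as follows. This is an illustrative example only: the function name and the default width of 60 are assumptions, not values taken from PRAline.

```python
def format_fasta(header, sequence, width=60):
    """Format one FASTA record, wrapping the sequence at `width` columns.

    A width of 0 or a negative number prints the whole sequence on a
    single line, mirroring the behaviour described for -out.fasta.width.
    """
    lines = [">" + header]
    if width <= 0:
        lines.append(sequence)
    else:
        for start in range(0, len(sequence), width):
            lines.append(sequence[start:start + width])
    return "\n".join(lines)


if __name__ == "__main__":
    seq = "MTYKLILNGKTLKGETTTEAVDAATAEKVFKQYANDNGVDGEWTYDDATKTFTVTE"
    print(format_fasta("example", seq, width=20))
    print(format_fasta("example", seq, width=0))
```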
-out.pdb=<file name> prints relevant structures in the PDB format. The default behaviour is to print to standard output; the user may give a file name as a parameter.
-verbose=<integer> Sets the verbosity level to a given value. The argument should be an integer in the range from 0 (no messages at all, which is equivalent to -mute=jbcl) to 6, when everything is logged. See -mute for additional information.