BioShell ver. 2.x Cookbook

This page collects handy one-liners that solve common bioinformatics problems, such as crmsd calculation, sequence and structure handling, basic calculations and many others. All these commands use only BioShell programs, so Jython is not required.

Data conversion - sequence related

To extract a FASTA sequence from a PDB file (result will be printed on the screen):

java apps.Seqc -ip=2gb1.pdb -of

Data conversion - secondary structure

To convert a DSSP file into a FASTA-style string that contains the secondary structure in HEC code:

java apps.Seqc -id=2gb1.dssp -of -seqc.show_ss

To convert a DSSP file into SS2 file format:

java apps.Seqc -in.dssp=2gb1.dssp -out.ss2=2gb1.dssp.ss2

To count residues by secondary structure: H, E or C (Helix, Extended or Coil):

java apps.Seqc -id=2gb1.dssp -of -seqc.show_ss | tail -1 | awk 'BEGIN{FS=""}{for(i=1;i<=NF;i++) m[$i]++} END{for(i in m) print i,m[i]}'

The above pipeline consists of three commands. The first uses the Seqc app of BioShell to extract the secondary structure string, which is the last line of its output. The second (tail -1) keeps only that last line. Finally, awk counts the occurrences of the H, E and C characters.
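
The counting step can be tried without BioShell on a made-up string. The sketch below is only an illustration under stated assumptions: the string in ss is hypothetical rather than real Seqc output, and count_ss is a helper name introduced here. It uses fold, sort and uniq instead of awk with an empty FS, because splitting a record into single characters with FS="" is not guaranteed by POSIX awk (gawk and mawk support it).

```shell
# count_ss: hypothetical helper; reads one secondary-structure string from stdin
# and prints each symbol followed by the number of times it occurs.
count_ss() {
    fold -w1 | sort | uniq -c | awk '{print $2, $1}'
}

# Made-up HEC string standing in for the last line of the Seqc output.
ss="CEEEECCHHHC"
printf '%s\n' "$ss" | count_ss
```

In the real pipeline, count_ss would take the place of the awk command that follows tail -1.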

Data conversion - tertiary structure

To extract chain A from a PDB file (the -select.aa flag removes everything but amino acids):

java apps.Strc -ip=pdb10mh.ent -op=10mhA.pdb -select.chains=A -select.aa
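
For readers unfamiliar with the fixed-column PDB format, the selection can be pictured as a filter on ATOM records. The following is only a rough sketch, not how Strc is implemented: select_chain is a hypothetical helper, the three records are made up, and keeping only ATOM lines merely approximates -select.aa (it drops HETATM records such as water, but does not check residue names).

```shell
# select_chain: hypothetical helper; keeps ATOM records whose chain ID
# (column 22 of the fixed-width PDB format) equals the first argument.
select_chain() {
    awk -v chain="$1" 'substr($0, 1, 6) == "ATOM  " && substr($0, 22, 1) == chain'
}

# Three made-up records: chain A, chain B, and a water molecule in chain A.
select_chain A <<'EOF'
ATOM      1  CA  MET A   1      27.340  24.430   2.614  1.00  9.67           C
ATOM      2  CA  THR B   1      26.266  25.413   2.842  1.00 10.38           C
HETATM    3  O   HOH A 101      10.000  10.000  10.000  1.00 20.00           O
EOF
```

Only the first record passes the filter: the second belongs to chain B and the third is a HETATM record.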

Superpositions and other crmsd-related calculations

To calculate the crmsd distance between two structures with the same number of residues (the value is computed on alpha-carbons only):

java apps.RmsCalc -qp=model.pdb -tp=2gb1.pdb -rms
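
For reference, crmsd is understood here as the coordinate root-mean-square deviation after optimal superposition. Assuming that standard definition (the symbols below are introduced for illustration and are not BioShell names), let x_i and y_i be the alpha-carbon positions of residue i in the query and target structures, N the number of residues, R a rotation and t a translation:

```latex
\mathrm{crmsd} = \min_{R,\,t} \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left\lVert R\,x_i + t - y_i \right\rVert^{2}}
```

The one-to-one pairing of x_i with y_i is why both structures must have the same number of residues.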

To calculate crmsd, drmsd, GDT and LCS distances (or scores) between two protein structures with the same number of residues (the values are computed on alpha-carbons only):

java apps.RmsCalc -qp=model.pdb -tp=2gb1.pdb

To superimpose one protein structure (-qp) on another (-tp); the transformation (rotation + translation) is computed from alpha-carbons only, but it is applied to all atoms of the query structure:

java apps.RmsCalc -qp=model.pdb -tp=2gb1.pdb -op

To calculate the crmsd distance between models from a subdirectory and a reference structure; when the file mask is omitted, all files from the subdirectory are used. Omit the -rms flag to compute crmsd, drmsd, GDT, TMscore and LCS rather than just crmsd:

java apps.RmsCalc -align.query.pdbdir=./models/ -in.pdb.file_mask=2*.pdb -tp=2gb1.pdb -rms

To calculate pairwise crmsd between all models in a directory; note that in this case no reference structure is provided:

java apps.RmsCalc -align.query.pdbdir=./models/ -calc.crmsd.all_pairs