Free Download reciprocal_smallest_distance for Linux ::: Bioinformatics

reciprocal_smallest_distance

Software Screenshot:

Software Details:

Version: 1.1.5

Upload Date: 20 Feb 15

Developer: Todd F. DeLuca and Dennis P. Wall

Distribution Type: Freeware

Downloads: 10

Download

Currently nan/5
1
2
3
4
5

Rating: nan/5 (Total Votes: 0)

reciprocal_smallest_distance is a pairwise orthology algorithm that uses global sequence alignment and maximum likelihood evolutionary distance between sequences to accurately detects orthologs between genomes.

Installing From a Tarball

Download and untar the latest version from github:

cd ~
curl -L https://github.com/downloads/todddeluca/reciprocal_smallest_distance/reciprocal_smallest_distance-VERSION.tar.gz | tar xvz

Install reciprocal_smallest_distance, making sure to use Python 2.7:

cd reciprocal_smallest_distance-VERSION
python setup.py install

Using RSD to Find Othologs

The following example commands demonstrate the main ways to run rsd_search. Every invocation of rsd_search requires specifying the location of a FASTA-formatted sequence file for two genomes, called the query and subject genomes. Their order is arbitrary, but if you use the --ids option, the ids must come from the query genome. You must also specify a file to write the results of the orthologs found by the RSD algorithm. The format of the output file contains one ortholog per line. Each line contains the query sequence id, subject sequence id, and distance (calculated by codeml) between the sequences. You can optionally specify a file containing ids using the --ids option. Then rsd will only search for orthologs for those ids. Using --divergence and --evalue, you have the option of using different thresholds from the defaults.

Get help on how to run rsd_search, rsd_blast, or rsd_format:

rsd_search -h
rsd_blast -h
rsd_format -h

Find orthologs between all the sequences in the query and subject genomes, using default divergence and evalue thresholds

rsd_search -q examples/genomes/Mycoplasma_genitalium.aa/Mycoplasma_genitalium.aa \
--subject-genome=examples/genomes/Mycobacterium_leprae.aa/Mycobacterium_leprae.aa \
-o Mycoplasma_genitalium.aa_Mycobacterium_leprae.aa_0.8_1e-5.orthologs.txt

Find orthologs using several non-default divergence and evalue thresholds

rsd_search -q examples/genomes/Mycoplasma_genitalium.aa/Mycoplasma_genitalium.aa \
--subject-genome=examples/genomes/Mycobacterium_leprae.aa/Mycobacterium_leprae.aa \
-o Mycoplasma_genitalium.aa_Mycobacterium_leprae.aa.several.orthologs.txt \
--de 0.2 1e-20 --de .5 0.00001 --de 0.8 0.1

It is not necessary to format a FASTA file for BLAST or compute BLAST hits because rsd_search does it for you.
However if you plan on running rsd_search multiple times for the same genomes, especially for large genomes, you can save time by using rsd_format to preformatting the FASTA files and rsd_blast to precomputing the BLAST hits. When running rsd_blast, make sure to use an --evalue as large as the largest evalue threshold you intend to give to rsd_search.

Here is how to format a pair of FASTA files in place:

rsd_format -g examples/genomes/Mycoplasma_genitalium.aa/Mycoplasma_genitalium.aa
rsd_format -g examples/genomes/Mycobacterium_leprae.aa/Mycobacterium_leprae.aa

And here is how to format the FASTA files, putting the results in another directory (the current directory in this case)

rsd_format -g examples/genomes/Mycoplasma_genitalium.aa/Mycoplasma_genitalium.aa -d .
rsd_format -g examples/genomes/Mycobacterium_leprae.aa/Mycobacterium_leprae.aa -d .

Here is how to compute forward and reverse blast hits (using the default evalue):

rsd_blast -v -q examples/genomes/Mycoplasma_genitalium.aa/Mycoplasma_genitalium.aa \
--subject-genome=examples/genomes/Mycobacterium_leprae.aa/Mycobacterium_leprae.aa \
--forward-hits q_s.hits --reverse-hits s_q.hits

Here is how to compute forward and reverse blast hits for rsd_search, using genomes that have already been formatted for blast and a non-default evalue

rsd_blast -v -q Mycoplasma_genitalium.aa \
--subject-genome=Mycobacterium_leprae.aa \
--forward-hits q_s.hits --reverse-hits s_q.hits \
--no-format --evalue 0.1

Find orthologs between all the sequences in the query and subject genomes using genomes that have already been formatted for blast

rsd_search -q Mycoplasma_genitalium.aa \
--subject-genome=Mycobacterium_leprae.aa \
-o Mycoplasma_genitalium.aa_Mycobacterium_leprae.aa_0.8_1e-5.orthologs.txt \
--no-format

Find orthologs between all the sequences in the query and subject genomes using hits that have already been computed. Notice that --no-format is included, because since the blast hits have already been computed the genomes do not need to be formatted for blast.

rsd_search -v --query-genome Mycoplasma_genitalium.aa \
--subject-genome=Mycobacterium_leprae.aa \
-o Mycoplasma_genitalium.aa_Mycobacterium_leprae.aa.default.orthologs.txt \
--forward-hits q_s.hits --reverse-hits s_q.hits --no-format

Find orthologs for specific sequences in the query genome. For finding orthologs for only a few sequences, using --no-blast-cache can speed up computation. YMMV.

rsd_search -q examples/genomes/Mycoplasma_genitalium.aa/Mycoplasma_genitalium.aa \
--subject-genome=examples/genomes/Mycobacterium_leprae.aa/Mycobacterium_leprae.aa \
-o examples/Mycoplasma_genitalium.aa_Mycobacterium_leprae.aa_0.8_1e-5.orthologs.txt \
--ids examples/Mycoplasma_genitalium.aa.ids.txt --no-blast-cache

Output Formats

Orthologs can be saved in several different formats using the --outfmt option of rsd_search. The default format, --outfmt -1, refers to --outfmt 3. Inspired by Uniprot dat files, a set of orthologs starts with a parameters line, then has 0 or more ortholog lines, then has an end line. The parametes are the query genome name, subject genome name, divergence threshold, and evalue threshold. Each ortholog is on a single line listing the query sequence id, the subject sequence id, and the maximum likelihood distance estimate. This format can represent orthologs for multiple sets of parameters in a single file as well as sets of parameters with no orthologs. Therefore it is suitable for use with rsd_search when specifying multiple divergence and evalue thresholds.

Here is an example containing 2 parameter combinations, one of which has no orthologs:

PA\tLACJO\tYEAS7\t0.2\t1e-15
OR\tQ74IU0\tA6ZM40\t1.7016
OR\tQ74K17\tA6ZKK5\t0.8215
//
PA\tMYCGE\tMYCHP\t0.2\t1e-15
//

The original format of RSD, --outfmt 1, is provided for backward compatibility. Each line contains an ortholog, represented as subject sequence id, query sequence id, and maximum likelihood distance estimate. It can only represent a single set of orthologs in a file.

Example:

A6ZM40\tQ74IU0\t1.7016
A6ZKK5\tQ74K17\t0.8215

Also provided for backward compatibility is a format used internally by Roundup (http://roundup.hms.harvard.edu/) which is like the original RSD format, except the query sequence id column is before the subject sequence id.

Example:

Q74IU0\tA6ZM40\t1.7016
Q74K17\tA6ZKK5\t0.8215

Search by Category

reciprocal_smallest_distance

Similar Software

CLC DNA Workbench

Genepidgin

VULCAN

seriesoftubes

Comments to reciprocal_smallest_distance

Comments not found

Add Comment

Search by Category

Search by Category

Popular software

pyNetConv 3 Jun 15

reciprocal_smallest_distance 20 Feb 15

GDIS 3 Jun 15

goby 14 Apr 15

Adun 3 Jun 15

MacMolPlt 2 Jun 15

E-Cell System 11 May 15

reciprocal_smallest_distance

Similar Software

CLC DNA Workbench

Genepidgin

VULCAN

seriesoftubes

Comments to reciprocal_smallest_distance

Comments not found

Add Comment

Search by Category

Popular software

ProteinShop 12 May 15

Geant4 20 Feb 15

pyNetConv 3 Jun 15

AHREA 11 May 15

edittag 20 Feb 15

CLC DNA Workbench 2 Jun 15

STEME 20 Feb 15