1. How does Procom work?
2. What are ANCHOR, INTERSECTION, and SUBTRACTION organisms?
3. What are the parameters used in BLASTP?
4. Why do I have to specify both "Intersection" and "Subtraction" E-values?
5. Where did you download the proteomes? Are the genomes complete?
6. Contact and citation
INTERSECTION:
(1) INTERSECTION is used as the database in the BLASTP comparisons;
(2) The output ANCHOR proteins must have a match in all INTERSECTION organisms that are chosen;
(3) INTERSECTION organisms often share the trait of interest with the ANCHOR.
SUBTRACTION:
(1) SUBTRACTION is used as the database in the BLASTP comparisons;
(2) The output ANCHOR proteins must NOT have a match in any of the SUBTRACTION organisms that are chosen;
(3) SUBTRACTION organisms do not share the trait of interest with the ANCHOR.
E=1: Only the matches with E-value <= 1 are reported.
V=1: One-line descriptions are reported for only one database sequence.
B=1: High-scoring segment pairs (HSPs) are reported for only one database sequence.
-filter SEG+XNU: Masks the low-complexity regions.
The BLASTP output file is parsed, and the query (ANCHOR) protein name is retrieved when the corresponding E-value is lower than the cutoff specified by the user. The collections of query protein names are then compared with each other: names found for every INTERSECTION organism are kept (the overlap), and names found for any SUBTRACTION organism are removed.
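The comparison step described above can be sketched with plain set operations. This is only an illustration, not Procom's actual code: it assumes the BLASTP output has already been parsed into one dictionary per database organism, mapping each ANCHOR protein name to its best E-value. The function names, the organism labels (orgA, orgB, orgC), the protein names, and the cutoff values are all hypothetical, and the strict "<" comparison is an assumption based on the wording "lower than".

```python
from typing import Dict, Iterable, Set

# Hypothetical data shape: organism name -> {ANCHOR protein name -> best E-value}
HitTable = Dict[str, Dict[str, float]]


def passing_queries(hits: Dict[str, float], cutoff: float) -> Set[str]:
    """ANCHOR protein names whose E-value is lower than the cutoff."""
    return {name for name, evalue in hits.items() if evalue < cutoff}


def compare_proteomes(
    anchor_proteins: Iterable[str],
    intersection_hits: HitTable,
    subtraction_hits: HitTable,
    intersection_cutoff: float,
    subtraction_cutoff: float,
) -> Set[str]:
    """Keep ANCHOR proteins matched in every INTERSECTION organism
    and in none of the SUBTRACTION organisms."""
    result = set(anchor_proteins)
    # Overlap: a protein must pass the cutoff for each INTERSECTION organism.
    for hits in intersection_hits.values():
        result &= passing_queries(hits, intersection_cutoff)
    # Removal: drop any protein that passes the cutoff for a SUBTRACTION organism.
    for hits in subtraction_hits.values():
        result -= passing_queries(hits, subtraction_cutoff)
    return result


# Made-up example values, for illustration only.
anchor = {"P1", "P2", "P3", "P4"}
intersection_hits = {
    "orgA": {"P1": 1e-30, "P2": 1e-12, "P3": 0.5},
    "orgB": {"P1": 1e-20, "P2": 1e-15, "P4": 1e-40},
}
subtraction_hits = {"orgC": {"P1": 0.2, "P2": 1e-25}}

result = compare_proteomes(anchor, intersection_hits, subtraction_hits, 1e-10, 1e-5)
print(sorted(result))
```

In this sketch the two cutoffs are applied independently, so an ANCHOR protein with a weak match to a SUBTRACTION organism (P1 here, E-value 0.2) is not removed as long as that match is not below the Subtraction cutoff.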
Anopheles gambiae [PubMed]
   ftp://ftp.ensembl.org/pub/current_mosquito/data/fasta/pep/
Arabidopsis thaliana [PubMed]
   ftp://tairpub:tairpub@ftp.arabidopsis.org/home/tair/Sequences/blast_datasets/OLD
Aspergillus nidulans [Coverage: 13X; Percentage: 96%]
   http://www.broad.mit.edu/cgi-bin/annotation/aspergillus/download_license.cgi
Brugia malayi [Coverage: 5.1X]
   ftp://ftp.tigr.org/private/euk/b_malayi_fh574/
(license needed)
Caenorhabditis briggsae [PubMed]
   ftp://ftp.wormbase.org/pub/wormbase/briggsae-current_release/gff_db_load_files/run_25/
Caenorhabditis elegans [PubMed]
   ftp://ftp.wormbase.org/pub/wormbase/elegans/WS131/wormpep131.tar.gz
Chlamydomonas reinhardtii [Coverage: 8X]
   http://genome.jgi-psf.org/chlre2/chlre2.download.ftp.html
Ciona intestinalis [PubMed]
   http://genome.jgi-psf.org/ciona4/ciona4.download.ftp.html
Cryptococcus neoformans [Coverage: 10.5X]
   ftp://ftp.tigr.org/private/euk/c_neoformans_64hr/
Danio rerio [Coverage: 5.7X]
   ftp://ftp.ensembl.org/pub/current_zebrafish/data/fasta/pep/
Dictyostelium discoideum [PubMed]
   http://dictybase.org/db/cgi-bin/dictyBase/download/download.pl
Drosophila melanogaster [PubMed]
   ftp://ftp.ensembl.org/pub/current_fly/data/fasta/pep/
Encephalitozoon cuniculi [PubMed]
   ftp://ftp.ncbi.nlm.nih.gov/genomes/Encephalitozoon_cuniculi/
Entamoeba histolytica [PubMed]
   ftp://ftp.tigr.org/pub/data/Eukaryotic_Projects/e_histolytica/annotation_dbs/EHA1.pep [Newly Added 2005-12-5]
Fugu rubripes [PubMed]
   ftp://ftp.ensembl.org/pub/current_fugu/data/fasta/pep/
Gallus gallus [Coverage: 6.6X]
   ftp://ftp.ensembl.org/pub/current_chicken/data/fasta/pep/
Giardia lamblia
   http://gmod.mbl.edu/perl/site/giardia?page=download_tool&file=orfs_aa&type=orfs&noheader=T [Newly Added 2005-12-5]
Guillardia theta [PubMed]
   http://www.ebi.ac.uk/integr8/FtpSearch.do;jsessionid=1302BCF1222FD416655F54964C6F0C7C?orgTaxID=55529
Homo sapiens [PubMed]
   ftp://ftp.ensembl.org/pub/current_human/data/fasta/pep/
Leishmania major [Coverage: 10X]
   ftp://ftp.sanger.ac.uk/pub/databases/L.major_sequences/LEISHPEP/GeneDB_protein_database_270404
   ftp://ftp.sanger.ac.uk/pub/databases/L.major_sequences/LEISHPEP/GeneDB_Protein_database_100505 [New Version]
Mus musculus [PubMed]
   ftp://ftp.ensembl.org/pub/current_mouse/data/fasta/pep/
Neurospora crassa [PubMed]
   http://www.broad.mit.edu/cgi-bin/annotation/neurospora/download_license.cgi
Oryza sativa [PubMed]
   ftp://ftp.tigr.org/pub/data/Eukaryotic_Projects/o_sativa/annotation_dbs/pseudomolecules/version_2.0/all_chrs/
Plasmodium falciparum [PubMed]
   http://www.plasmodb.org/restricted/data/P_falciparum/WG/cds.aa/
Rattus norvegicus [PubMed]
   ftp://ftp.ensembl.org/pub/current_rat/data/fasta/pep/
Saccharomyces cerevisiae [PubMed]
   ftp://genome-ftp.stanford.edu/pub/yeast/data_download/sequence/genomic_sequence/orf_protein/
Schizosaccharomyces pombe [PubMed]
   ftp://ftp.sanger.ac.uk/pub/yeast/pombe/Protein_data/pompep/
Tetrahymena thermophila
   ftp://ftp.tigr.org/pub/data/Eukaryotic_Projects/t_thermophila/Gene_Predictions/
Thalassiosira pseudonana [PubMed]
   http://genome.jgi-psf.org/thaps1/thaps1.download.ftp.html
Toxoplasma gondii [Coverage: 8X]
   http://toxodb.org/restricted/data/Genome/pep/Tg10x_TwinScan_20040527.gz
(license needed)
Trypanosoma brucei [PubMed (ChrI)] [PubMed (ChrII)]
   ftp://ftp.tigr.org/private/euk/t_brucei_fnzm1/annotation_dbs/TBA1.pep
(license needed)
   ftp://ftp.tigr.org/pub/data/Eukaryotic_Projects/t_brucei/annotation_dbs/TBA1.pep [New Version]
Trypanosoma cruzi [Coverage: 19X]
   ftp://ftp.tigr.org/private/euk/t_cruzi_q98122/annotation_dbs/TCA1.pep
(license needed)
Please email "billy [AT] ural.wustl.edu" for questions and comments.
Please cite:
Li, J.B., Zhang, M., Dutcher, S.K., and Stormo, G.D. (2005) Procom: a web-based tool to compare multiple eukaryotic proteomes. Bioinformatics. 21: 1693-1694. [Abstract] [PDF]
Last update: Jin Billy Li, December 6, 2005