PHASTEST Statistics


Quick Links
Table 1: PHASTEST's performance upgrades
Table 2: Feature comparison between PHAST, PHASTER and PHASTEST
Table 3: PHASTEST's evaluation (summary)
Table 4: PHASTEST's evaluation on GenBank annotated genomes
Table 5: PHASTEST's evaluation on unannotated genomes
Table 6: Comparison of Numbers of Proteins Annotated by Lite (Swissprot) and Deep (PHAST-BSD) Annotation Modes
Figure 1: Numbers of correct and false positive phage regions for E. coli
Figure 2: Performance Comparison of Prodigal and GLIMMER

Table 1: PHASTEST's performance upgrades
Cumulative set of performance enhancements | BLAST vs. phage DB runtime (s) | BLAST vs. bacterial DB runtime (s) | Total runtime on GenBank annotated genome (s) | Total runtime on unannotated genome (s)
PHAST (baseline) current DBs, no other upgrades | 191 | 576 | 270 | 899
PHASTER (Past) - 2016 Data | 47 | 48 | 126 | 227
PHASTER (Baseline) - Current DBs, no other upgrades | 116 | 83 | 162 | 277
PHASTEST (Upgrade 1) - BLAST+ parameter adjustment | 82 | 82 | 144 | 229
PHASTEST (Upgrade 2) - Whole-sequence Prodigal | 81 | 71 | 141 | 201
PHASTEST (Upgrade 3) - Parallel Diamond* | 84 | 124 | 266 | 118
PHASTEST (Upgrade 4) - Swissprot DB* | 80 | 64 | 110 | 195

Details of PHASTEST's performance upgrades and their impact on a 5.5 Mbp test genome (Escherichia coli O157:H7, GenBank accession NC_002655). Pipeline configurations capable of full genome-wide annotation are marked with (*).
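As a rough illustration of what these runtimes imply, the sketch below computes speedup factors from the total runtime on the unannotated test genome (last column of Table 1). The dictionary keys and the `speedup` helper are illustrative names, not part of PHASTEST, and the figures describe this single test genome only.

```python
# Total runtimes (s) on the unannotated test genome, read from the last column of Table 1.
total_runtime_s = {
    "PHAST (baseline)": 899,
    "PHASTER (baseline)": 277,
    "PHASTEST (Upgrade 4)": 195,
}


def speedup(reference: str, target: str) -> float:
    """Return how many times faster `target` ran than `reference`."""
    return total_runtime_s[reference] / total_runtime_s[target]


for name in ("PHAST (baseline)", "PHASTER (baseline)"):
    factor = speedup(name, "PHASTEST (Upgrade 4)")
    print(f"{name} -> PHASTEST (Upgrade 4): {factor:.2f}x")
```

On these numbers, the Swissprot configuration is roughly 4.6 times faster than the PHAST baseline and about 1.4 times faster than the PHASTER baseline for sequence-only input.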



Table 2: Feature comparison between PHAST, PHASTER and PHASTEST
Feature | PHAST (as of Jan 2011) | PHASTER | PHASTEST
Viral sequence database | ~45,000 sequences | ~187,000 sequences | ~410,000 sequences
Bacterial sequence database | ~4 million sequences | ~9 million sequences, streamlined through CD-HIT filtering | Swissprot database, ~560,000 sequences
Computing cluster | 32 CPU cores | 112 CPU cores | 116 CPU cores
BLAST | Legacy version 2.2.16 | BLAST+ version 2.3.0+ | BLAST+ version 2.3.0+, Diamond v2.0.14
Cluster use optimization | Rudimentary | Smart partitioning of query sequences and target bacterial DB; optimized execution parameters | Optimized execution parameters
Front-end server | Shared, single CPU | 50% faster, dedicated | Dedicated 4 CPU cores
Front-end website | Perl and CGI | Ruby on Rails | Ruby on Rails
Genome viewer | Adobe Flash | JavaScript, AngularPlasmid and D3 | CGView.js
Queuing system | Flat file | Uses Sidekiq for threading submissions | Uses Sidekiq for threading submissions
Recall previous user submissions | Bookmark page | "My Searches" feature or bookmark | "My Searches" feature or bookmark
Pre-computed genome results for quick query searching | 0 | >14,000 | >14,000
Retrieve previously annotated genome results | GenBank accession or GI number only | GenBank accession, GI number, or full sequence | GenBank accession, GI number, or full sequence
Metagenomic data handling | NA | For raw sequence files only | For raw sequence files and whole-genome shotgun GenBank records
Annotation Target | Phage region only | Phage region only | Full genome


Table 3: PHASTEST's evaluation (summary)
 | PHAST (2011) | PHAST (2011) | PHASTER (2016) | PHASTER (2016) | PHASTEST (2022) | PHASTEST (2022)
Input data | GenBank annotated genome | Sequence only | GenBank annotated genome | Sequence only | GenBank annotated genome | Sequence only
Sensitivity | 85.4% | 79.4% | 86.9% | 85.0% | 87.6% | 85.0%
Positive predictive value (PPV) | 94.2% | 86.5% | 91.0% | 87.3% | 91.4% | 91.2%
Prophages annotated in evaluation set | 267 | 267 | 267 | 267 | 267 | 267
Prophages matched | 228 | 212 | 232 | 227 | 234 | 227
Predicted prophages not present in evaluation set | 14 | 33 | 23 | 33 | 22 | 22
Predicted prophages not in the evaluation set with evidence suggestive of being true prophages | N/A | N/A | 12 | 11 | 8 | 7
Adjusted sensitivity | N/A | N/A | 87.5% | 85.6% | 88.1% | 85.4%
Adjusted PPV | N/A | N/A | 95.7% | 91.5% | 94.5% | 94.0%

Comparison of accuracy between PHAST (2011), PHASTER (2016), and PHASTEST (2022). Evaluation was performed against manual prophage annotations in 54 bacterial genomes from Casjens, S. (2003) Molecular Microbiology, 49, 277-300. Because these manual annotations are a number of years old, predicted prophages absent from the evaluation set were manually inspected, and PHASTER and PHASTEST were found to provide good evidence that several of these are in fact true prophages (these are marked below in Tables 4 and 5). Once the evaluation standard was adjusted to count these prophages as true positives, adjusted sensitivity and PPV values were calculated (last two rows).
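The relationship between the count rows and the percentage rows of Table 3 can be sketched as follows. The `Evaluation` class and its field names are illustrative. The adjusted formulas are an assumption: they add the reclassified prophages to both the true positives and the evaluation set, which agrees with the tabulated values to within 0.1 percentage points but is not stated explicitly in the source.

```python
from dataclasses import dataclass


@dataclass
class Evaluation:
    annotated: int        # prophages annotated in the evaluation set
    matched: int          # predicted prophages that match the evaluation set
    unmatched: int        # predicted prophages not present in the evaluation set
    likely_true: int = 0  # unmatched predictions with evidence of being true prophages

    def sensitivity(self) -> float:
        return self.matched / self.annotated

    def ppv(self) -> float:
        return self.matched / (self.matched + self.unmatched)

    def adjusted_sensitivity(self) -> float:
        # Assumption: reclassified prophages join both the true positives
        # and the evaluation set.
        true_positives = self.matched + self.likely_true
        return true_positives / (self.annotated + self.likely_true)

    def adjusted_ppv(self) -> float:
        # The total number of predictions is unchanged; only the count of
        # true positives grows.
        true_positives = self.matched + self.likely_true
        return true_positives / (self.matched + self.unmatched)


# PHASTER (2016) on GenBank annotated genomes, counts taken from Table 3.
phaster_genbank = Evaluation(annotated=267, matched=232, unmatched=23, likely_true=12)
print(f"Sensitivity:          {phaster_genbank.sensitivity():.1%}")
print(f"PPV:                  {phaster_genbank.ppv():.1%}")
print(f"Adjusted sensitivity: {phaster_genbank.adjusted_sensitivity():.1%}")
print(f"Adjusted PPV:         {phaster_genbank.adjusted_ppv():.1%}")
```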



Table 4: PHASTEST's evaluation on GenBank annotated genomes
Organism | Reference | PHAST | PHASTER | PHASTEST
>NC_000962, Mycobacterium tuberculosis H37Rv, 4411532 bp | 2970551-2981576 | 2970065-2983874 | 2970063-2983874 | 2970063-2984654
 | 1780643-1788505 | 1780643-1788505 | 1766989-1788505 | 1766987-1788505
>NC_000913, Escherichia coli K12, 4639675 bp | 262552-296320 | 262124-296432 | 262898-297206 | 262898-297719
 | 2464567-2475651 | 2464378-2475651 | 2465301-2477629 | 2465301-2477629
 | 2754181-2775804 | 2753821-2780748 | 2754896-2777782 | 2746434-2777782
 | 564038-584856 | 563980-585282 | 564755-586057 | 564755-586057
 | 1410024-1432281 | 1404587-1432838 | 1395952-1435051 | 1395952-1435051
 | 1196090-1210402 | 1191881-1218961 | 1196867-1216671 | 1196867-1216671
 | 1631063-1650732 | 1631063-1662537 | 1619557-1656744 | 1619557-1656744
 | 2556793-2563354 | False Negative | False Negative | 2558575-2567625
 | 2064329-2076158 | False Negative | False Negative | 2068952-2082547
 | False Positives | 4505466-4540762 | 4491197-4548442 | 4491197-4548442 **
 | False Positives | n/a | 3716547-3722049 | n/a
>NC_003112, Neisseria meningitidis MC58, 2272360 bp | 1001560-1005455 | 998362-1007364 | False Negative | False Negative
 | 1099910-1133980 | 1101164-1133760 | 1101164-1133760 | 1101164-1133760

Detailed evaluation results for PHAST (2011), PHASTER (2016), and PHASTEST (2022) run on GenBank annotated genomes. Evaluation was made against manual prophage annotations in 54 bacterial genomes from Casjens, S. (2003) Molecular Microbiology, 49, 277-300. Prophages predicted by PHASTER or PHASTEST that were absent in the evaluation set were manually inspected, and several (marked with ** in the table above) were deemed to have good evidence of being true prophages rather than false positives.
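The source does not state the exact rule used to decide whether a predicted region matches a reference prophage. As a minimal sketch, assuming that any coordinate overlap counts as a match, the comparison could look like the following. The function names are hypothetical, and the coordinates are the Escherichia coli K12 reference regions and two predicted regions from Table 4.

```python
from typing import List, Tuple

Region = Tuple[int, int]


def parse_region(text: str) -> Region:
    """Parse a 'start-end' coordinate string into an integer pair."""
    start, end = (int(part) for part in text.split("-"))
    return start, end


def overlap_bp(a: Region, b: Region) -> int:
    """Number of base pairs shared by two inclusive coordinate ranges."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def is_matched(predicted: Region, references: List[Region]) -> bool:
    """Assumed rule: a prediction matches if it overlaps any reference region."""
    return any(overlap_bp(predicted, ref) > 0 for ref in references)


# Reference prophage regions for Escherichia coli K12 (NC_000913) from Table 4.
ecoli_reference = [parse_region(r) for r in (
    "262552-296320", "2464567-2475651", "2754181-2775804",
    "564038-584856", "1410024-1432281", "1196090-1210402",
    "1631063-1650732", "2556793-2563354", "2064329-2076158",
)]

phastest_hit = parse_region("2558575-2567625")    # PHASTEST prediction
phast_unmatched = parse_region("4505466-4540762")  # listed under False Positives for PHAST

print(is_matched(phastest_hit, ecoli_reference))     # overlaps 2556793-2563354
print(is_matched(phast_unmatched, ecoli_reference))  # overlaps no reference region
```

A stricter rule, such as a minimum overlap fraction, would change which predictions count as matches; the any-overlap rule here is only the simplest choice.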

For statistics on more organisms click here.

Table 5: PHASTEST's evaluation on unannotated genomes
Organism | Reference | PHAST | PHASTER | PHASTEST
>NC_000962, Mycobacterium tuberculosis H37Rv, 4411532 bp | 2970551-2981576 | 2970065-2980835 | 2970063-2980833 | 2970063-2984654
 | 1780643-1788505 | False Negative | 1766987-1788115 | 1766987-1788487
>NC_000913, Escherichia coli K12, 4639675 bp | 262552-296320 | False Negative | False Negative | False Negative
 | 2464567-2475651 | False Negative | 2465301-2477629 | 2465301-2477629
 | 2754181-2775804 | False Negative | False Negative | False Negative
 | 564038-584856 | 563980-585282 | 564755-586057 | 564755-586057
 | 1410024-1432281 | 1409925-1432985 | 1411899-1434959 | 1411969-1434812
 | 1196090-1210402 | 1197865-1215896 | 1198640-1216671 | 1198640-1216671
 | 1631063-1650732 | 1616881-1650732 | 1627517-1644304 | 1618855-1653915
 | 2556793-2563354 | False Negative | False Negative | False Negative
 | 2064329-2076158 | False Negative | False Negative | False Negative
 | False Positives | n/a | 3716547-3722049 | n/a
>NC_003112, Neisseria meningitidis MC58, 2272360 bp | 1001560-1005455 | 998266-1007364 | False Negative | False Negative
 | 1099910-1133980 | 1099110-1133760 | 1101164-1133966 | 1098337-1133966
 | False Positives | 916111-930967 | 916109-929811 | n/a

Detailed evaluation results for PHAST (2011), PHASTER (2016), and PHASTEST (2022) run on unannotated genome sequences, i.e., not GenBank annotated (only the sequence is used). Evaluation was made against manual prophage annotations in 54 bacterial genomes from Casjens, S. (2003) Molecular Microbiology, 49, 277-300. Prophages predicted by PHASTER or PHASTEST that were absent in the evaluation set were manually inspected, and several (marked with **) were deemed to have good evidence of being true prophages rather than false positives.

For statistics on more organisms click here.

Table 6: Comparison of Numbers of Proteins Annotated by Lite (Swissprot) and Deep (PHAST-BSD) Annotation Modes
Sample | Swissprot Hits | Swissprot Misses | PHAST-BSD Hits | PHAST-BSD Misses
NC_000907.1 | 1647 | 22 | 1666 | 3
NC_000913.3 | 3955 | 28 | 3959 | 24
NC_000962.3 | 2832 | 938 | 3763 | 7
NC_000964.3 | 3698 | 48 | 3737 | 9
NC_002488.3 | 1174 | 574 | 1748 | 0

Comparison of the numbers of non-phage proteins annotated via PHASTEST's lite mode (using the Swissprot database) and deep mode (using the PHAST-BSD database). Swissprot Misses denotes the number of proteins annotated in deep mode but not in lite mode, whereas PHAST-BSD Misses denotes the number of proteins annotated in lite mode but not in deep mode.
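The hit and miss definitions above amount to set differences between the proteins annotated in each mode. The sketch below illustrates this with made-up protein identifiers, assuming that "Hits" means the number of proteins annotated by a mode; `compare_modes` is a hypothetical helper, not a PHASTEST function. Under these definitions, hits plus misses equals the size of the union for both modes, a property the rows of Table 6 satisfy.

```python
from typing import Dict, Set


def compare_modes(lite: Set[str], deep: Set[str]) -> Dict[str, int]:
    """Count hits and misses for lite (Swissprot) and deep (PHAST-BSD) annotation."""
    return {
        "swissprot_hits": len(lite),
        "swissprot_misses": len(deep - lite),  # annotated in deep mode only
        "phast_bsd_hits": len(deep),
        "phast_bsd_misses": len(lite - deep),  # annotated in lite mode only
    }


# Hypothetical protein identifiers, for illustration only.
lite_annotated = {"p1", "p2", "p3", "p5"}
deep_annotated = {"p1", "p2", "p3", "p4", "p6"}
counts = compare_modes(lite_annotated, deep_annotated)
print(counts)

# Consistency check on the Table 6 rows:
# (Swissprot hits, Swissprot misses, PHAST-BSD hits, PHAST-BSD misses).
table6 = {
    "NC_000907.1": (1647, 22, 1666, 3),
    "NC_000913.3": (3955, 28, 3959, 24),
    "NC_000962.3": (2832, 938, 3763, 7),
    "NC_000964.3": (3698, 48, 3737, 9),
    "NC_002488.3": (1174, 574, 1748, 0),
}
for sample, (sp_hits, sp_miss, bsd_hits, bsd_miss) in table6.items():
    consistent = sp_hits + sp_miss == bsd_hits + bsd_miss
    print(sample, sp_hits + sp_miss, consistent)
```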

For statistics on more organisms click here.


Figure 1: Numbers of correct and false positive phage regions for E. coli

Numbers of true and false positive phage regions for E. coli O157:H7 (NC_002655.2) contig inputs of varying length. The graph shows the number of phage regions correctly identified relative to the annotation result for the full E. coli O157:H7 sequence, which is predicted to contain 15 phage regions in total.




Figure 2: Performance Comparison of Prodigal and GLIMMER

Accuracy comparison between Prodigal (used in PHASTEST) and parallel GLIMMER (used in PHASTER). Both programs were given FASTA-format sequence input for the 54 genome samples outlined in the table. The ORF predictions were compared against the pre-annotated CDS regions in the GenBank record of each genome. Sensitivity was calculated as (number of correct region predictions / total CDS identified in the GenBank record), whereas accuracy was calculated as (number of correct predictions / (correct predictions + false positives)).
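The two formulas can be sketched as below. The source does not say when a predicted ORF counts as correct, so this example assumes an exact match on start, end, and strand; the coordinates are invented for illustration and `orf_metrics` is a hypothetical helper. Note that "accuracy" as defined here is the same quantity usually called precision or positive predictive value.

```python
from typing import Dict, Set, Tuple

Orf = Tuple[int, int, str]  # (start, end, strand)


def orf_metrics(predicted: Set[Orf], reference: Set[Orf]) -> Dict[str, float]:
    """Sensitivity and accuracy of ORF predictions against reference CDS regions."""
    correct = len(predicted & reference)          # assumed rule: exact coordinate match
    false_positives = len(predicted - reference)  # predictions with no reference CDS
    return {
        "sensitivity": correct / len(reference),
        "accuracy": correct / (correct + false_positives),
    }


# Invented coordinates, for illustration only.
reference_cds = {(100, 400, "+"), (500, 1100, "+"), (1200, 1500, "-"), (2000, 2600, "+")}
predicted_orfs = {(100, 400, "+"), (500, 1100, "+"), (1200, 1500, "-"),
                  (1600, 1750, "+"), (2700, 2900, "-")}

metrics = orf_metrics(predicted_orfs, reference_cds)
print(metrics)
```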

For detailed statistics on more organisms click here.
