{"vars":[{"name":"vars","containerName":"","line":164,"kind":2},{"name":"overload","containerName":"","line":167,"kind":2},{"name":"to_string","line":167,"kind":12},{"containerName":"","name":"base","line":169,"kind":2},{"containerName":null,"name":"$GAP_SYMBOL","line":171,"kind":13},{"name":"%STRAND_SYMBOL","containerName":null,"kind":13,"line":172},{"signature":{"documentation":"1;\n#-----------------------------------------------------------------\n# $Id: BlastHSP.pm 16123 2009-09-17 12:57:27Z cjfields $\n#\n# BioPerl module Bio::Search::HSP::BlastHSP\n#\n# (This module was originally called Bio::Tools::Blast::HSP)\n#\n# Please direct questions and support issues to <bioperl-l@bioperl.org> \n#\n# Cared for by Steve Chervitz <sac@bioperl.org>\n#\n# You may distribute this module under the same terms as perl itself\n#-----------------------------------------------------------------\n\n## POD Documentation:\n\n=head1 NAME\n\nBio::Search::HSP::BlastHSP - Bioperl BLAST High-Scoring Pair object\n\n=head1 SYNOPSIS\n\nSee L<Bio::Search::Hit::BlastHit>.\n\n=head1 DESCRIPTION\n\nA Bio::Search::HSP::BlastHSP object provides an interface to data\nobtained in a single alignment section of a Blast report (known as a\n\"High-scoring Segment Pair\"). This is essentially a pairwise\nalignment with score information.\n\nBlastHSP objects are accessed via L<Bio::Search::Hit::BlastHit>\nobjects after parsing a BLAST report using the L<Bio::SearchIO>\nsystem.\n\nThe construction of BlastHSP objects is performed by\nBio::Factory::BlastHitFactory in a process that is\norchestrated by the Blast parser (L<Bio::SearchIO::psiblast>).\nThe resulting BlastHSPs are then accessed via\nL<Bio::Search::Hit::BlastHit>). Therefore, you do not need to\nuse L<Bio::Search::HSP::BlastHSP>) directly. 
If you need to construct\nBlastHSPs directly, see the new() function for details.\n\nFor L<Bio::SearchIO> BLAST parsing usage examples, see the\nC<examples/searchio> directory of the Bioperl distribution.\n\n=head2 Start and End coordinates\n\nSequence endpoints are swapped so that start is always less than\nend. This affects For TBLASTN/X hits on the minus strand. Strand\ninformation can be recovered using the strand() method. This\nnormalization step is standard Bioperl practice. It also facilitates\nuse of range information by methods such as match().\n\n=over 1\n\n* * Supports BLAST versions 1.x and 2.x, gapped and ungapped.\n\n\nBio::Search::HSP::BlastHSP.pm has the ability to extract a list of all\nresidue indices for identical and conservative matches along both\nquery and sbjct sequences. Since this degree of detail is not always\nneeded, this behavior does not occur during construction of the BlastHSP\nobject.  These data will automatically be collected as necessary as\nthe BlastHSP.pm object is used.\n\n=head1 DEPENDENCIES\n\nBio::Search::HSP::BlastHSP.pm is a concrete class that inherits from\nL<Bio::SeqFeature::SimilarityPair> and L<Bio::Search::HSP::HSPI>.\nL<Bio::Seq> and L<Bio::SimpleAlign> are employed for creating\nsequence and alignment objects, respectively.\n\n=head2 Relationship to SimpleAlign.pm & Seq.pm\n\nBlastHSP.pm can provide the query or sbjct sequence as a L<Bio::Seq>\nobject via the L<seq()|seq> method. The BlastHSP.pm object can also create a\ntwo-sequence L<Bio::SimpleAlign> alignment object using the the query\nand sbjct sequences via the L<get_aln()|get_aln> method. Creation of alignment\nobjects is not automatic when constructing the BlastHSP.pm object since\nthis level of functionality is not always required and would generate\na lot of extra overhead when crunching many reports.\n\n\n=head1 FEEDBACK\n\n=head2 Mailing Lists\n\nUser feedback is an integral part of the evolution of this and other\nBioperl modules.  
Send your comments and suggestions preferably to one\nof the Bioperl mailing lists.  Your participation is much appreciated.\n\n  bioperl-l@bioperl.org                  - General discussion\n  http://bioperl.org/wiki/Mailing_lists  - About the mailing lists\n\n=head2 Support \n\nPlease direct usage questions or support issues to the mailing list:\n\nI<bioperl-l@bioperl.org>\n\nrather than to the module maintainer directly. Many experienced and \nreponsive experts will be able look at the problem and quickly \naddress it. Please include a thorough description of the problem \nwith code and data examples if at all possible.\n\n=head2 Reporting Bugs\n\nReport bugs to the Bioperl bug tracking system to help us keep track\nthe bugs and their resolution. Bug reports can be submitted via the\nweb:\n\n  http://bugzilla.open-bio.org/\n\n=head1 AUTHOR\n\nSteve Chervitz E<lt>sac-at-bioperl.orgE<gt>\n\nSee L<the FEEDBACK section | FEEDBACK> for where to send bug reports and comments.\n\n=head1 ACKNOWLEDGEMENTS\n\nThis software was originally developed in the Department of Genetics\nat Stanford University. I would also like to acknowledge my\ncolleagues at Affymetrix for useful feedback.\n\n=head1 SEE ALSO\n\n Bio::Search::Hit::BlastHit.pm          - Blast hit object.\n Bio::Search::Result::BlastResult.pm    - Blast Result object.\n Bio::Seq.pm                            - Biosequence object\n\n=head2 Links:\n\n http://bio.perl.org/                       - Bioperl Project Homepage\n\n=head1 COPYRIGHT\n\nCopyright (c) 1996-2001 Steve Chervitz. 
All Rights Reserved.\n\n=head1 DISCLAIMER\n\nThis software is provided \"as is\" without warranty of any kind.\n\n\n\n# END of main POD documentation.\n\n=head1 APPENDIX\n\nThe rest of the documentation details each of the object methods.\nInternal methods are usually preceded with a _\n\n\n# Let the code begin...\n\npackage Bio::Search::HSP::BlastHSP;\n\nuse strict;\nuse Bio::SeqFeature::Similarity;\n\nuse vars qw($GAP_SYMBOL %STRAND_SYMBOL);\n\nuse overload\n    '\"\"' => \\&to_string;\n\nuse base qw(Bio::SeqFeature::SimilarityPair Bio::Search::HSP::HSPI);\n\n$GAP_SYMBOL    = '-';          # Need a more general way to handle gap symbols.\n%STRAND_SYMBOL = ('Plus' => 1, 'Minus' => -1 );\n\n\n=head2 new\n\n Usage     : $hsp = Bio::Search::HSP::BlastHSP->new( %named_params );\n           : Bio::Search::HSP::BlastHSP objects are constructed\n           : automatically by Bio::SearchIO::BlastHitFactory,\n           : so there is no need for direct instantiation.\n Purpose   : Constructs a new BlastHSP object and Initializes key variables\n           : for the HSP.\n Returns   : A Bio::Search::HSP::BlastHSP object\n Argument  : Named parameters:\n           : Parameter keys are case-insensitive.\n           :      -RAW_DATA  => array ref containing raw BLAST report data for\n           :                    for a single HSP. 
This includes all lines\n           :                    of the HSP alignment from a traditional BLAST\n                                or PSI-BLAST (non-XML) report,\n           :      -RANK         => integer (1..n).\n           :      -PROGRAM      => string ('TBLASTN', 'BLASTP', etc.).\n           :      -QUERY_NAME   => string, id of query sequence\n           :      -HIT_NAME     => string, id of hit sequence\n           :\n Comments  : Having the raw data allows this object to do lazy parsing of\n           : the raw HSP data (i.e., not parsed until needed).\n           :\n           : Note that there is a fair amount of basic parsing that is\n           : currently performed in this module that would be more appropriate\n           : to do within a separate factory object.\n           : This parsing code will likely be relocated and more initialization\n           : parameters will be added to new().\n           :\nSee Also   : L<Bio::SeqFeature::SimilarityPair::new()>, L<Bio::SeqFeature::Similarity::new()>","parameters":[{"label":"$class"},{"label":"@args"}],"label":"new($class,@args)"},"detail":"($class,@args)","definition":"sub","containerName":"main::","children":[{"localvar":"my","containerName":"new","definition":"my","name":"$class","line":211,"kind":13},{"kind":13,"line":211,"name":"@args","containerName":"new"},{"kind":13,"line":213,"definition":"my","name":"$self","containerName":"new","localvar":"my"},{"name":"$class","containerName":"new","kind":13,"line":213},{"kind":13,"line":213,"containerName":"new","name":"@args"},{"line":215,"kind":13,"containerName":"new","name":"$self"},{"containerName":"new","name":"$self","line":215,"kind":13},{"localvar":"my","name":"$raw_data","definition":"my","containerName":"new","line":216,"kind":13},{"name":"$qname","containerName":"new","kind":13,"line":216},{"name":"$hname","containerName":"new","kind":13,"line":216},{"line":216,"kind":13,"name":"$qlen","containerName":"new"},{"name":"$hlen","containerName":"ne
w","line":216,"kind":13},{"name":"$self","containerName":"new","line":218,"kind":13},{"line":218,"kind":13,"name":"$self","containerName":"new"},{"name":"$raw_data","containerName":"new","kind":13,"line":218},{"name":"$qname","containerName":"new","kind":13,"line":219},{"containerName":"new","name":"$hname","kind":13,"line":219},{"line":220,"kind":13,"containerName":"new","name":"$self"},{"name":"_rearrange","containerName":"new","kind":12,"line":220},{"name":"@args","containerName":"new","line":225,"kind":13},{"name":"$self","containerName":"new","kind":13,"line":229},{"name":"_set_data","containerName":"new","kind":12,"line":229},{"line":229,"kind":13,"name":"$raw_data","containerName":"new"}],"line":209,"kind":12,"range":{"start":{"character":0,"line":209},"end":{"line":229,"character":9999}},"name":"new"},{"kind":12,"line":213,"name":"SUPER","containerName":"new"},{"name":"$qb","definition":"my","containerName":null,"localvar":"my","kind":13,"line":231},{"containerName":null,"name":"$hb","kind":13,"line":231},{"kind":13,"line":231,"containerName":null,"name":"$self"},{"containerName":"main::","name":"start","line":231,"kind":12},{"containerName":null,"definition":"my","name":"$qe","localvar":"my","kind":13,"line":232},{"line":232,"kind":13,"name":"$he","containerName":null},{"line":232,"kind":13,"containerName":null,"name":"$self"},{"containerName":"main::","name":"end","kind":12,"line":232},{"localvar":"my","containerName":null,"name":"$qs","definition":"my","line":233,"kind":13},{"containerName":null,"name":"$hs","kind":13,"line":233},{"kind":13,"line":233,"containerName":null,"name":"$self"},{"kind":12,"line":233,"name":"strand","containerName":"main::"},{"name":"$qf","definition":"my","containerName":null,"localvar":"my","kind":13,"line":234},{"line":234,"kind":13,"containerName":null,"name":"$hf"},{"line":234,"kind":13,"name":"$self","containerName":null},{"line":234,"kind":12,"containerName":"main::","name":"query"},{"containerName":"main::","name":"frame"
,"kind":12,"line":234},{"line":235,"kind":13,"name":"$self","containerName":null},{"line":235,"kind":12,"name":"hit","containerName":"main::"},{"containerName":"main::","name":"frame","line":235,"kind":12},{"name":"$self","containerName":null,"line":237,"kind":13},{"containerName":"main::","name":"query","kind":12,"line":237},{"name":"Bio","containerName":"SeqFeature::Similarity","line":237,"kind":12},{"containerName":"main::","name":"new","kind":12,"line":237},{"line":237,"kind":13,"containerName":null,"name":"$qb"},{"name":"$qe","containerName":null,"line":238,"kind":13},{"kind":13,"line":239,"containerName":null,"name":"$qs"},{"line":240,"kind":13,"containerName":null,"name":"$self"},{"line":240,"kind":12,"containerName":"main::","name":"bits"},{"line":241,"kind":13,"containerName":null,"name":"$self"},{"kind":12,"line":241,"name":"score","containerName":"main::"},{"line":242,"kind":13,"name":"$qf","containerName":null},{"kind":13,"line":243,"name":"$qname","containerName":null},{"line":244,"kind":13,"containerName":null,"name":"%self"},{"name":"$self","containerName":null,"line":246,"kind":13},{"name":"hit","containerName":"main::","line":246,"kind":12},{"containerName":"SeqFeature::Similarity","name":"Bio","line":246,"kind":12},{"kind":12,"line":246,"name":"new","containerName":"main::"},{"kind":13,"line":246,"containerName":null,"name":"$hb"},{"containerName":null,"name":"$he","line":247,"kind":13},{"containerName":null,"name":"$hs","kind":13,"line":248},{"line":249,"kind":13,"name":"$self","containerName":null},{"line":249,"kind":12,"name":"bits","containerName":"main::"},{"containerName":null,"name":"$self","kind":13,"line":250},{"line":250,"kind":12,"containerName":"main::","name":"score"},{"name":"$hf","containerName":null,"kind":13,"line":251},{"containerName":null,"name":"$hname","kind":13,"line":252},{"line":253,"kind":13,"name":"%self","containerName":null},{"name":"$self","containerName":null,"line":256,"kind":13},{"containerName":"main::","name":"que
ry","line":256,"kind":12},{"containerName":"main::","name":"seqlength","line":256,"kind":12},{"name":"$qlen","containerName":null,"line":256,"kind":13},{"name":"$self","containerName":null,"kind":13,"line":257},{"line":257,"kind":12,"containerName":"main::","name":"hit"},{"kind":12,"line":257,"containerName":"main::","name":"seqlength"},{"kind":13,"line":257,"name":"$hlen","containerName":null},{"name":"$self","containerName":null,"kind":13,"line":259},{"containerName":"main::","name":"query","kind":12,"line":259},{"containerName":"main::","name":"frac_identical","kind":12,"line":259},{"kind":13,"line":259,"name":"$self","containerName":null},{"name":"frac_identical","containerName":"main::","kind":12,"line":259},{"line":260,"kind":13,"containerName":null,"name":"$self"},{"name":"hit","containerName":"main::","line":260,"kind":12},{"containerName":"main::","name":"frac_identical","line":260,"kind":12},{"kind":13,"line":260,"containerName":null,"name":"$self"},{"name":"frac_identical","containerName":"main::","kind":12,"line":260},{"containerName":null,"name":"$self","kind":13,"line":261},{"range":{"end":{"line":283,"character":9999},"start":{"character":0,"line":275}},"definition":"sub","name":"_id_str","containerName":"main::","children":[{"kind":13,"line":276,"name":"$self","definition":"my","containerName":"_id_str","localvar":"my"},{"containerName":"_id_str","name":"$self","line":277,"kind":13},{"localvar":"my","name":"$qname","definition":"my","containerName":"_id_str","line":278,"kind":13},{"containerName":"_id_str","name":"$self","line":278,"kind":13},{"line":278,"kind":12,"containerName":"_id_str","name":"query"},{"containerName":"_id_str","name":"seq_id","kind":12,"line":278},{"line":279,"kind":13,"localvar":"my","definition":"my","name":"$hname","containerName":"_id_str"},{"name":"$self","containerName":"_id_str","kind":13,"line":279},{"line":279,"kind":12,"name":"hit","containerName":"_id_str"},{"kind":12,"line":279,"name":"seq_id","containerName":"_id_st
r"},{"kind":13,"line":280,"containerName":"_id_str","name":"$self"},{"name":"$self","containerName":"_id_str","line":282,"kind":13}],"line":275,"kind":12},{"range":{"end":{"character":9999,"line":308},"start":{"line":304,"character":0}},"name":"algorithm","signature":{"documentation":"1;\n#-----------------------------------------------------------------\n# $Id: BlastHSP.pm 16123 2009-09-17 12:57:27Z cjfields $\n#\n# BioPerl module Bio::Search::HSP::BlastHSP\n#\n# (This module was originally called Bio::Tools::Blast::HSP)\n#\n# Please direct questions and support issues to <bioperl-l@bioperl.org> \n#\n# Cared for by Steve Chervitz <sac@bioperl.org>\n#\n# You may distribute this module under the same terms as perl itself\n#-----------------------------------------------------------------\n\n## POD Documentation:\n\n=head1 NAME\n\nBio::Search::HSP::BlastHSP - Bioperl BLAST High-Scoring Pair object\n\n=head1 SYNOPSIS\n\nSee L<Bio::Search::Hit::BlastHit>.\n\n=head1 DESCRIPTION\n\nA Bio::Search::HSP::BlastHSP object provides an interface to data\nobtained in a single alignment section of a Blast report (known as a\n\"High-scoring Segment Pair\"). This is essentially a pairwise\nalignment with score information.\n\nBlastHSP objects are accessed via L<Bio::Search::Hit::BlastHit>\nobjects after parsing a BLAST report using the L<Bio::SearchIO>\nsystem.\n\nThe construction of BlastHSP objects is performed by\nBio::Factory::BlastHitFactory in a process that is\norchestrated by the Blast parser (L<Bio::SearchIO::psiblast>).\nThe resulting BlastHSPs are then accessed via\nL<Bio::Search::Hit::BlastHit>). Therefore, you do not need to\nuse L<Bio::Search::HSP::BlastHSP>) directly. 
If you need to construct\nBlastHSPs directly, see the new() function for details.\n\nFor L<Bio::SearchIO> BLAST parsing usage examples, see the\nC<examples/searchio> directory of the Bioperl distribution.\n\n=head2 Start and End coordinates\n\nSequence endpoints are swapped so that start is always less than\nend. This affects For TBLASTN/X hits on the minus strand. Strand\ninformation can be recovered using the strand() method. This\nnormalization step is standard Bioperl practice. It also facilitates\nuse of range information by methods such as match().\n\n=over 1\n\n* * Supports BLAST versions 1.x and 2.x, gapped and ungapped.\n\n\nBio::Search::HSP::BlastHSP.pm has the ability to extract a list of all\nresidue indices for identical and conservative matches along both\nquery and sbjct sequences. Since this degree of detail is not always\nneeded, this behavior does not occur during construction of the BlastHSP\nobject.  These data will automatically be collected as necessary as\nthe BlastHSP.pm object is used.\n\n=head1 DEPENDENCIES\n\nBio::Search::HSP::BlastHSP.pm is a concrete class that inherits from\nL<Bio::SeqFeature::SimilarityPair> and L<Bio::Search::HSP::HSPI>.\nL<Bio::Seq> and L<Bio::SimpleAlign> are employed for creating\nsequence and alignment objects, respectively.\n\n=head2 Relationship to SimpleAlign.pm & Seq.pm\n\nBlastHSP.pm can provide the query or sbjct sequence as a L<Bio::Seq>\nobject via the L<seq()|seq> method. The BlastHSP.pm object can also create a\ntwo-sequence L<Bio::SimpleAlign> alignment object using the the query\nand sbjct sequences via the L<get_aln()|get_aln> method. Creation of alignment\nobjects is not automatic when constructing the BlastHSP.pm object since\nthis level of functionality is not always required and would generate\na lot of extra overhead when crunching many reports.\n\n\n=head1 FEEDBACK\n\n=head2 Mailing Lists\n\nUser feedback is an integral part of the evolution of this and other\nBioperl modules.  
Send your comments and suggestions preferably to one\nof the Bioperl mailing lists.  Your participation is much appreciated.\n\n  bioperl-l@bioperl.org                  - General discussion\n  http://bioperl.org/wiki/Mailing_lists  - About the mailing lists\n\n=head2 Support \n\nPlease direct usage questions or support issues to the mailing list:\n\nI<bioperl-l@bioperl.org>\n\nrather than to the module maintainer directly. Many experienced and \nreponsive experts will be able look at the problem and quickly \naddress it. Please include a thorough description of the problem \nwith code and data examples if at all possible.\n\n=head2 Reporting Bugs\n\nReport bugs to the Bioperl bug tracking system to help us keep track\nthe bugs and their resolution. Bug reports can be submitted via the\nweb:\n\n  http://bugzilla.open-bio.org/\n\n=head1 AUTHOR\n\nSteve Chervitz E<lt>sac-at-bioperl.orgE<gt>\n\nSee L<the FEEDBACK section | FEEDBACK> for where to send bug reports and comments.\n\n=head1 ACKNOWLEDGEMENTS\n\nThis software was originally developed in the Department of Genetics\nat Stanford University. I would also like to acknowledge my\ncolleagues at Affymetrix for useful feedback.\n\n=head1 SEE ALSO\n\n Bio::Search::Hit::BlastHit.pm          - Blast hit object.\n Bio::Search::Result::BlastResult.pm    - Blast Result object.\n Bio::Seq.pm                            - Biosequence object\n\n=head2 Links:\n\n http://bio.perl.org/                       - Bioperl Project Homepage\n\n=head1 COPYRIGHT\n\nCopyright (c) 1996-2001 Steve Chervitz. 
All Rights Reserved.\n\n=head1 DISCLAIMER\n\nThis software is provided \"as is\" without warranty of any kind.\n\n\n\n# END of main POD documentation.\n\n=head1 APPENDIX\n\nThe rest of the documentation details each of the object methods.\nInternal methods are usually preceded with a _\n\n\n# Let the code begin...\n\npackage Bio::Search::HSP::BlastHSP;\n\nuse strict;\nuse Bio::SeqFeature::Similarity;\n\nuse vars qw($GAP_SYMBOL %STRAND_SYMBOL);\n\nuse overload\n    '\"\"' => \\&to_string;\n\nuse base qw(Bio::SeqFeature::SimilarityPair Bio::Search::HSP::HSPI);\n\n$GAP_SYMBOL    = '-';          # Need a more general way to handle gap symbols.\n%STRAND_SYMBOL = ('Plus' => 1, 'Minus' => -1 );\n\n\n=head2 new\n\n Usage     : $hsp = Bio::Search::HSP::BlastHSP->new( %named_params );\n           : Bio::Search::HSP::BlastHSP objects are constructed\n           : automatically by Bio::SearchIO::BlastHitFactory,\n           : so there is no need for direct instantiation.\n Purpose   : Constructs a new BlastHSP object and Initializes key variables\n           : for the HSP.\n Returns   : A Bio::Search::HSP::BlastHSP object\n Argument  : Named parameters:\n           : Parameter keys are case-insensitive.\n           :      -RAW_DATA  => array ref containing raw BLAST report data for\n           :                    for a single HSP. 
This includes all lines\n           :                    of the HSP alignment from a traditional BLAST\n                                or PSI-BLAST (non-XML) report,\n           :      -RANK         => integer (1..n).\n           :      -PROGRAM      => string ('TBLASTN', 'BLASTP', etc.).\n           :      -QUERY_NAME   => string, id of query sequence\n           :      -HIT_NAME     => string, id of hit sequence\n           :\n Comments  : Having the raw data allows this object to do lazy parsing of\n           : the raw HSP data (i.e., not parsed until needed).\n           :\n           : Note that there is a fair amount of basic parsing that is\n           : currently performed in this module that would be more appropriate\n           : to do within a separate factory object.\n           : This parsing code will likely be relocated and more initialization\n           : parameters will be added to new().\n           :\nSee Also   : L<Bio::SeqFeature::SimilarityPair::new()>, L<Bio::SeqFeature::Similarity::new()>\n\n\n#----------------\nsub new {\n#----------------\n    my ($class, @args ) = @_;\n\n    my $self = $class->SUPER::new( @args );\n    # Initialize placeholders\n    $self->{'_queryGaps'} = $self->{'_sbjctGaps'} = 0;\n    my ($raw_data, $qname, $hname, $qlen, $hlen);\n\n    ($self->{'_prog'}, $self->{'_rank'}, $raw_data,\n     $qname, $hname) =\n\t $self->_rearrange([qw( PROGRAM\n\t\t\t\tRANK\n\t\t\t\tRAW_DATA\n\t\t\t\tQUERY_NAME\n\t\t\t\tHIT_NAME\n\t\t\t\t)], @args );\n\n    # _set_data() does a fair amount of parsing.\n    # This will likely change (see comment above.)\n    $self->_set_data( @{$raw_data} );\n    # Store the aligned query as sequence feature\n    my ($qb, $hb) = ($self->start());\n    my ($qe, $he) = ($self->end());\n    my ($qs, $hs) = ($self->strand());\n    my ($qf,$hf) = ($self->query->frame(),\n\t\t    $self->hit->frame);\n\n    $self->query( Bio::SeqFeature::Similarity->new (-start   =>$qb,\n\t\t\t\t\t\t    -end     
=>$qe,\n\t\t\t\t\t\t    -strand  =>$qs,\n\t\t\t\t\t\t    -bits    =>$self->bits,\n\t\t\t\t\t\t    -score   =>$self->score,\n\t\t\t\t\t\t    -frame   =>$qf,\n\t\t\t\t\t\t    -seq_id  => $qname,\n\t\t\t\t\t\t    -source  =>$self->{'_prog'} ));\n\n    $self->hit( Bio::SeqFeature::Similarity->new (-start   =>$hb,\n\t\t\t\t\t\t  -end     =>$he,\n\t\t\t\t\t\t  -strand  =>$hs,\n\t\t\t\t\t\t  -bits    =>$self->bits,\n\t\t\t\t\t\t  -score   =>$self->score,\n                                                  -frame   =>$hf,\n\t\t\t\t\t\t  -seq_id  => $hname,\n\t\t\t\t\t\t  -source  =>$self->{'_prog'} ));\n\n    # set lengths\n    $self->query->seqlength($qlen); # query\n    $self->hit->seqlength($hlen); # subject\n\n    $self->query->frac_identical($self->frac_identical('query'));\n    $self->hit->frac_identical($self->frac_identical('hit'));\n    return $self;\n}\n\n#sub DESTROY {\n#    my $self = shift;\n#    #print STDERR \"--->DESTROYING $self\\n\";\n#}\n\n\n# Title   : _id_str;\n# Purpose : Intended for internal use only to provide a string for use\n#           within exception messages to help users figure out which\n#           query/hit caused the problem.\n# Returns : Short string with name of query and hit seq\nsub _id_str {\n    my $self = shift;\n    if( not defined $self->{'_id_str'}) {\n        my $qname = $self->query->seq_id;\n        my $hname = $self->hit->seq_id;\n        $self->{'_id_str'} = \"QUERY=\\\"$qname\\\" HIT=\\\"$hname\\\"\";\n    }\n    return $self->{'_id_str'};\n}\n\n#=================================================\n# Begin Bio::Search::HSP::HSPI implementation\n#=================================================\n\n=head2 algorithm\n\n Title   : algorithm\n Usage   : $alg = $hsp->algorithm();\n Function: Gets the algorithm specification that was used to obtain the hsp\n           For BLAST, the algorithm denotes what type of sequence was aligned\n           against what (BLASTN: dna-dna, BLASTP prt-prt, BLASTX translated\n           dna-prt, 
TBLASTN prt-translated dna, TBLASTX translated\n           dna-translated dna).\n Returns : a scalar string\n Args    : none","parameters":[{"label":"$self"},{"label":"@args"}],"label":"algorithm($self,@args)"},"detail":"($self,@args)","definition":"sub","containerName":"main::","children":[{"line":306,"kind":13,"localvar":"my","containerName":"algorithm","name":"$self","definition":"my"},{"name":"@args","containerName":"algorithm","line":306,"kind":13},{"line":307,"kind":13,"containerName":"algorithm","name":"$self"}],"line":304,"kind":12},{"line":331,"children":[{"containerName":"signif","definition":"my","name":"$self","localvar":"my","kind":13,"line":333},{"line":334,"kind":13,"localvar":"my","containerName":"signif","definition":"my","name":"$val"},{"name":"$self","containerName":"signif","line":334,"kind":13},{"line":334,"kind":13,"containerName":"signif","name":"$self"},{"name":"$self","containerName":"signif","line":334,"kind":13},{"containerName":"signif","name":"$val","line":335,"kind":13}],"kind":12,"range":{"end":{"character":9999,"line":336},"start":{"line":331,"character":0}},"containerName":"main::","name":"signif","definition":"sub"},{"children":[],"line":356,"kind":12,"range":{"start":{"character":0,"line":356},"end":{"line":356,"character":9999}},"definition":"sub","name":"evalue","containerName":"main::"},{"kind":12,"children":[{"kind":13,"line":377,"containerName":"p","name":"$self","definition":"my","localvar":"my"},{"line":377,"kind":13,"name":"$self","containerName":"p"}],"line":377,"definition":"sub","name":"p","containerName":"main::","range":{"end":{"line":377,"character":9999},"start":{"line":377,"character":0}}},{"name":"pvalue","definition":"sub","containerName":"main::","range":{"start":{"character":0,"line":381},"end":{"line":381,"character":9999}},"kind":12,"children":[{"containerName":"pvalue","name":"p","kind":12,"line":381}],"line":381},{"range":{"end":{"line":417,"character":9999},"start":{"line":401,"character":0}},"name":"length
","children":[{"localvar":"my","containerName":"length","name":"$self","definition":"my","line":405,"kind":13},{"line":405,"kind":13,"containerName":"length","name":"$seqType"},{"line":405,"kind":13,"name":"$data","containerName":"length"},{"line":406,"kind":13,"name":"$seqType","containerName":"length"},{"containerName":"length","name":"$seqType","kind":13,"line":407},{"name":"$seqType","containerName":"length","kind":13,"line":407},{"name":"$seqType","containerName":"length","kind":13,"line":409},{"name":"$self","containerName":"length","kind":13,"line":409},{"kind":12,"line":409,"name":"_set_seq_data","containerName":"length"},{"kind":13,"line":409,"name":"$self","containerName":"length"},{"name":"$seqType","containerName":"length","kind":13,"line":412},{"containerName":"length","name":"$data","kind":13,"line":413},{"containerName":"length","name":"$self","line":414,"kind":13},{"containerName":"length","name":"$seqType","kind":13,"line":414},{"name":"$data","containerName":"length","line":414,"kind":13},{"name":"$self","containerName":"length","line":416,"kind":13},{"name":"$seqType","containerName":"length","kind":13,"line":416}],"line":401,"kind":12,"signature":{"documentation":"1;\n#-----------------------------------------------------------------\n# $Id: BlastHSP.pm 16123 2009-09-17 12:57:27Z cjfields $\n#\n# BioPerl module Bio::Search::HSP::BlastHSP\n#\n# (This module was originally called Bio::Tools::Blast::HSP)\n#\n# Please direct questions and support issues to <bioperl-l@bioperl.org> \n#\n# Cared for by Steve Chervitz <sac@bioperl.org>\n#\n# You may distribute this module under the same terms as perl itself\n#-----------------------------------------------------------------\n\n## POD Documentation:\n\n=head1 NAME\n\nBio::Search::HSP::BlastHSP - Bioperl BLAST High-Scoring Pair object\n\n=head1 SYNOPSIS\n\nSee L<Bio::Search::Hit::BlastHit>.\n\n=head1 DESCRIPTION\n\nA Bio::Search::HSP::BlastHSP object provides an interface to data\nobtained in a single 
alignment section of a Blast report (known as a\n\"High-scoring Segment Pair\"). This is essentially a pairwise\nalignment with score information.\n\nBlastHSP objects are accessed via L<Bio::Search::Hit::BlastHit>\nobjects after parsing a BLAST report using the L<Bio::SearchIO>\nsystem.\n\nThe construction of BlastHSP objects is performed by\nBio::Factory::BlastHitFactory in a process that is\norchestrated by the Blast parser (L<Bio::SearchIO::psiblast>).\nThe resulting BlastHSPs are then accessed via\nL<Bio::Search::Hit::BlastHit>). Therefore, you do not need to\nuse L<Bio::Search::HSP::BlastHSP>) directly. If you need to construct\nBlastHSPs directly, see the new() function for details.\n\nFor L<Bio::SearchIO> BLAST parsing usage examples, see the\nC<examples/searchio> directory of the Bioperl distribution.\n\n=head2 Start and End coordinates\n\nSequence endpoints are swapped so that start is always less than\nend. This affects For TBLASTN/X hits on the minus strand. Strand\ninformation can be recovered using the strand() method. This\nnormalization step is standard Bioperl practice. It also facilitates\nuse of range information by methods such as match().\n\n=over 1\n\n* * Supports BLAST versions 1.x and 2.x, gapped and ungapped.\n\n\nBio::Search::HSP::BlastHSP.pm has the ability to extract a list of all\nresidue indices for identical and conservative matches along both\nquery and sbjct sequences. Since this degree of detail is not always\nneeded, this behavior does not occur during construction of the BlastHSP\nobject.  
These data will automatically be collected as necessary as\nthe BlastHSP.pm object is used.\n\n=head1 DEPENDENCIES\n\nBio::Search::HSP::BlastHSP.pm is a concrete class that inherits from\nL<Bio::SeqFeature::SimilarityPair> and L<Bio::Search::HSP::HSPI>.\nL<Bio::Seq> and L<Bio::SimpleAlign> are employed for creating\nsequence and alignment objects, respectively.\n\n=head2 Relationship to SimpleAlign.pm & Seq.pm\n\nBlastHSP.pm can provide the query or sbjct sequence as a L<Bio::Seq>\nobject via the L<seq()|seq> method. The BlastHSP.pm object can also create a\ntwo-sequence L<Bio::SimpleAlign> alignment object using the the query\nand sbjct sequences via the L<get_aln()|get_aln> method. Creation of alignment\nobjects is not automatic when constructing the BlastHSP.pm object since\nthis level of functionality is not always required and would generate\na lot of extra overhead when crunching many reports.\n\n\n=head1 FEEDBACK\n\n=head2 Mailing Lists\n\nUser feedback is an integral part of the evolution of this and other\nBioperl modules.  Send your comments and suggestions preferably to one\nof the Bioperl mailing lists.  Your participation is much appreciated.\n\n  bioperl-l@bioperl.org                  - General discussion\n  http://bioperl.org/wiki/Mailing_lists  - About the mailing lists\n\n=head2 Support \n\nPlease direct usage questions or support issues to the mailing list:\n\nI<bioperl-l@bioperl.org>\n\nrather than to the module maintainer directly. Many experienced and \nreponsive experts will be able look at the problem and quickly \naddress it. Please include a thorough description of the problem \nwith code and data examples if at all possible.\n\n=head2 Reporting Bugs\n\nReport bugs to the Bioperl bug tracking system to help us keep track\nthe bugs and their resolution. 
Bug reports can be submitted via the\nweb:\n\n  http://bugzilla.open-bio.org/\n\n=head1 AUTHOR\n\nSteve Chervitz E<lt>sac-at-bioperl.orgE<gt>\n\nSee L<the FEEDBACK section | FEEDBACK> for where to send bug reports and comments.\n\n=head1 ACKNOWLEDGEMENTS\n\nThis software was originally developed in the Department of Genetics\nat Stanford University. I would also like to acknowledge my\ncolleagues at Affymetrix for useful feedback.\n\n=head1 SEE ALSO\n\n Bio::Search::Hit::BlastHit.pm          - Blast hit object.\n Bio::Search::Result::BlastResult.pm    - Blast Result object.\n Bio::Seq.pm                            - Biosequence object\n\n=head2 Links:\n\n http://bio.perl.org/                       - Bioperl Project Homepage\n\n=head1 COPYRIGHT\n\nCopyright (c) 1996-2001 Steve Chervitz. All Rights Reserved.\n\n=head1 DISCLAIMER\n\nThis software is provided \"as is\" without warranty of any kind.\n\n\n\n# END of main POD documentation.\n\n=head1 APPENDIX\n\nThe rest of the documentation details each of the object methods.\nInternal methods are usually preceded with a _\n\n\n# Let the code begin...\n\npackage Bio::Search::HSP::BlastHSP;\n\nuse strict;\nuse Bio::SeqFeature::Similarity;\n\nuse vars qw($GAP_SYMBOL %STRAND_SYMBOL);\n\nuse overload\n    '\"\"' => \\&to_string;\n\nuse base qw(Bio::SeqFeature::SimilarityPair Bio::Search::HSP::HSPI);\n\n$GAP_SYMBOL    = '-';          # Need a more general way to handle gap symbols.\n%STRAND_SYMBOL = ('Plus' => 1, 'Minus' => -1 );\n\n\n=head2 new\n\n Usage     : $hsp = Bio::Search::HSP::BlastHSP->new( %named_params );\n           : Bio::Search::HSP::BlastHSP objects are constructed\n           : automatically by Bio::SearchIO::BlastHitFactory,\n           : so there is no need for direct instantiation.\n Purpose   : Constructs a new BlastHSP object and Initializes key variables\n           : for the HSP.\n Returns   : A Bio::Search::HSP::BlastHSP object\n Argument  : Named parameters:\n           : Parameter keys are 
case-insensitive.\n           :      -RAW_DATA  => array ref containing raw BLAST report data for\n           :                    for a single HSP. This includes all lines\n           :                    of the HSP alignment from a traditional BLAST\n                                or PSI-BLAST (non-XML) report,\n           :      -RANK         => integer (1..n).\n           :      -PROGRAM      => string ('TBLASTN', 'BLASTP', etc.).\n           :      -QUERY_NAME   => string, id of query sequence\n           :      -HIT_NAME     => string, id of hit sequence\n           :\n Comments  : Having the raw data allows this object to do lazy parsing of\n           : the raw HSP data (i.e., not parsed until needed).\n           :\n           : Note that there is a fair amount of basic parsing that is\n           : currently performed in this module that would be more appropriate\n           : to do within a separate factory object.\n           : This parsing code will likely be relocated and more initialization\n           : parameters will be added to new().\n           :\nSee Also   : L<Bio::SeqFeature::SimilarityPair::new()>, L<Bio::SeqFeature::Similarity::new()>\n\n\n#----------------\nsub new {\n#----------------\n    my ($class, @args ) = @_;\n\n    my $self = $class->SUPER::new( @args );\n    # Initialize placeholders\n    $self->{'_queryGaps'} = $self->{'_sbjctGaps'} = 0;\n    my ($raw_data, $qname, $hname, $qlen, $hlen);\n\n    ($self->{'_prog'}, $self->{'_rank'}, $raw_data,\n     $qname, $hname) =\n\t $self->_rearrange([qw( PROGRAM\n\t\t\t\tRANK\n\t\t\t\tRAW_DATA\n\t\t\t\tQUERY_NAME\n\t\t\t\tHIT_NAME\n\t\t\t\t)], @args );\n\n    # _set_data() does a fair amount of parsing.\n    # This will likely change (see comment above.)\n    $self->_set_data( @{$raw_data} );\n    # Store the aligned query as sequence feature\n    my ($qb, $hb) = ($self->start());\n    my ($qe, $he) = ($self->end());\n    my ($qs, $hs) = ($self->strand());\n    my ($qf,$hf) = 
($self->query->frame(),\n\t\t    $self->hit->frame);\n\n    $self->query( Bio::SeqFeature::Similarity->new (-start   =>$qb,\n\t\t\t\t\t\t    -end     =>$qe,\n\t\t\t\t\t\t    -strand  =>$qs,\n\t\t\t\t\t\t    -bits    =>$self->bits,\n\t\t\t\t\t\t    -score   =>$self->score,\n\t\t\t\t\t\t    -frame   =>$qf,\n\t\t\t\t\t\t    -seq_id  => $qname,\n\t\t\t\t\t\t    -source  =>$self->{'_prog'} ));\n\n    $self->hit( Bio::SeqFeature::Similarity->new (-start   =>$hb,\n\t\t\t\t\t\t  -end     =>$he,\n\t\t\t\t\t\t  -strand  =>$hs,\n\t\t\t\t\t\t  -bits    =>$self->bits,\n\t\t\t\t\t\t  -score   =>$self->score,\n                                                  -frame   =>$hf,\n\t\t\t\t\t\t  -seq_id  => $hname,\n\t\t\t\t\t\t  -source  =>$self->{'_prog'} ));\n\n    # set lengths\n    $self->query->seqlength($qlen); # query\n    $self->hit->seqlength($hlen); # subject\n\n    $self->query->frac_identical($self->frac_identical('query'));\n    $self->hit->frac_identical($self->frac_identical('hit'));\n    return $self;\n}\n\n#sub DESTROY {\n#    my $self = shift;\n#    #print STDERR \"--->DESTROYING $self\\n\";\n#}\n\n\n# Title   : _id_str;\n# Purpose : Intended for internal use only to provide a string for use\n#           within exception messages to help users figure out which\n#           query/hit caused the problem.\n# Returns : Short string with name of query and hit seq\nsub _id_str {\n    my $self = shift;\n    if( not defined $self->{'_id_str'}) {\n        my $qname = $self->query->seq_id;\n        my $hname = $self->hit->seq_id;\n        $self->{'_id_str'} = \"QUERY=\\\"$qname\\\" HIT=\\\"$hname\\\"\";\n    }\n    return $self->{'_id_str'};\n}\n\n#=================================================\n# Begin Bio::Search::HSP::HSPI implementation\n#=================================================\n\n=head2 algorithm\n\n Title   : algorithm\n Usage   : $alg = $hsp->algorithm();\n Function: Gets the algorithm specification that was used to obtain the hsp\n           For BLAST, the 
algorithm denotes what type of sequence was aligned\n           against what (BLASTN: dna-dna, BLASTP prt-prt, BLASTX translated\n           dna-prt, TBLASTN prt-translated dna, TBLASTX translated\n           dna-translated dna).\n Returns : a scalar string\n Args    : none\n\n\n#----------------\nsub algorithm {\n#----------------\n    my ($self,@args) = @_;\n    return $self->{'_prog'};\n}\n\n\n\n\n=head2 signif()\n\n Usage     : $hsp_obj->signif()\n Purpose   : Get the P-value or Expect value for the HSP.\n Returns   : Float (0.001 or 1.3e-43)\n           : Returns P-value if it is defined, otherwise, Expect value.\n Argument  : n/a\n Throws    : n/a\n Comments  : Provided for consistency with BlastHit::signif()\n           : Support for returning the significance data in different\n           : formats (e.g., exponent only), is not provided for HSP objects.\n           : This is only available for the BlastHit or Blast object.\n\nSee Also   : L</p>, L</expect>, L<Bio::Search::Hit::BlastHit::signif()|Bio::Search::Hit::BlastHit>\n\n\n#-----------\nsub signif {\n#-----------\n    my $self = shift;\n    my $val ||= defined($self->{'_p'}) ? 
$self->{'_p'} : $self->{'_expect'};\n    $val;\n}\n\n\n\n=head2 evalue\n\n Usage     : $hsp_obj->evalue()\n Purpose   : Get the Expect value for the HSP.\n Returns   : Float (0.001 or 1.3e-43)\n Argument  : n/a\n Throws    : n/a\n Comments  : Support for returning the expectation data in different\n           : formats (e.g., exponent only), is not provided for HSP objects.\n           : This is only available for the BlastHit or Blast object.\n\nSee Also   : L</p>\n\n\n#----------\nsub evalue { shift->{'_expect'} }\n#----------\n\n\n=head2 p\n\n Usage     : $hsp_obj->p()\n Purpose   : Get the P-value for the HSP.\n Returns   : Float (0.001 or 1.3e-43) or undef if not defined.\n Argument  : n/a\n Throws    : n/a\n Comments  : P-value is not defined with NCBI Blast2 reports.\n           : Support for returning the expectation data in different\n           : formats (e.g., exponent only) is not provided for HSP objects.\n           : This is only available for the BlastHit or Blast object.\n\nSee Also   : L<expect()|expect>\n\n\n#-----\nsub p { my $self = shift; $self->{'_p'}; }\n#-----\n\n# alias\nsub pvalue { shift->p(@_); }\n\n=head2 length\n\n Usage     : $hsp->length( [seq_type] )\n Purpose   : Get the length of the aligned portion of the query or sbjct.\n Example   : $hsp->length('query')\n Returns   : integer\n Argument  : seq_type: 'query' | 'hit' or 'sbjct' | 'total'  (default = 'total')\n             ('sbjct' is synonymous with 'hit')\n Throws    : n/a\n Comments  : 'total' length is the full length of the alignment\n           : as reported in the denominators in the alignment section:\n           : \"Identical = 34/120 Positives = 67/120\".\n\nSee Also   : 
L</gaps>","parameters":[{"label":"$self"},{"label":"$seqType"},{"label":"$data"}],"label":"length($self,$seqType,$data)"},"detail":"($self,$seqType,$data)","definition":"sub","containerName":"main::"},{"name":"gaps","range":{"end":{"character":9999,"line":463},"start":{"character":0,"line":443}},"kind":12,"children":[{"localvar":"my","name":"$self","definition":"my","containerName":"gaps","line":445,"kind":13},{"line":445,"kind":13,"containerName":"gaps","name":"$seqType"},{"line":447,"kind":13,"containerName":"gaps","name":"$self"},{"kind":12,"line":447,"containerName":"gaps","name":"_set_seq_data"},{"kind":13,"line":447,"name":"$self","containerName":"gaps"},{"line":449,"kind":13,"containerName":"gaps","name":"$seqType"},{"name":"$seqType","containerName":"gaps","kind":13,"line":450},{"name":"$seqType","containerName":"gaps","line":450,"kind":13},{"line":452,"kind":13,"containerName":"gaps","name":"$seqType"},{"line":453,"kind":13,"containerName":"gaps","name":"$self"},{"containerName":"gaps","name":"$self","kind":13,"line":453},{"kind":13,"line":456,"containerName":"gaps","name":"$seqType"},{"line":457,"kind":13,"name":"$self","containerName":"gaps"},{"line":457,"kind":13,"name":"$self","containerName":"gaps"},{"kind":13,"line":460,"name":"$seqType","containerName":"gaps"},{"name":"$self","containerName":"gaps","line":461,"kind":13},{"kind":13,"line":461,"containerName":"gaps","name":"$seqType"}],"line":443,"definition":"sub","containerName":"main::","signature":{"label":"gaps($self,$seqType)","parameters":[{"label":"$self"},{"label":"$seqType"}],"documentation":"1;\n#-----------------------------------------------------------------\n# $Id: BlastHSP.pm 16123 2009-09-17 12:57:27Z cjfields $\n#\n# BioPerl module Bio::Search::HSP::BlastHSP\n#\n# (This module was originally called Bio::Tools::Blast::HSP)\n#\n# Please direct questions and support issues to <bioperl-l@bioperl.org> \n#\n# Cared for by Steve Chervitz <sac@bioperl.org>\n#\n# You may distribute this 
module under the same terms as perl itself\n#-----------------------------------------------------------------\n\n## POD Documentation:\n\n=head1 NAME\n\nBio::Search::HSP::BlastHSP - Bioperl BLAST High-Scoring Pair object\n\n=head1 SYNOPSIS\n\nSee L<Bio::Search::Hit::BlastHit>.\n\n=head1 DESCRIPTION\n\nA Bio::Search::HSP::BlastHSP object provides an interface to data\nobtained in a single alignment section of a Blast report (known as a\n\"High-scoring Segment Pair\"). This is essentially a pairwise\nalignment with score information.\n\nBlastHSP objects are accessed via L<Bio::Search::Hit::BlastHit>\nobjects after parsing a BLAST report using the L<Bio::SearchIO>\nsystem.\n\nThe construction of BlastHSP objects is performed by\nBio::Factory::BlastHitFactory in a process that is\norchestrated by the Blast parser (L<Bio::SearchIO::psiblast>).\nThe resulting BlastHSPs are then accessed via\nL<Bio::Search::Hit::BlastHit>. Therefore, you do not need to\nuse L<Bio::Search::HSP::BlastHSP> directly. If you need to construct\nBlastHSPs directly, see the new() function for details.\n\nFor L<Bio::SearchIO> BLAST parsing usage examples, see the\nC<examples/searchio> directory of the Bioperl distribution.\n\n=head2 Start and End coordinates\n\nSequence endpoints are swapped so that start is always less than\nend. This affects TBLASTN/X hits on the minus strand. Strand\ninformation can be recovered using the strand() method. This\nnormalization step is standard Bioperl practice. It also facilitates\nuse of range information by methods such as match().\n\n=over 1\n\n* * Supports BLAST versions 1.x and 2.x, gapped and ungapped.\n\n\nBio::Search::HSP::BlastHSP.pm has the ability to extract a list of all\nresidue indices for identical and conservative matches along both\nquery and sbjct sequences. Since this degree of detail is not always\nneeded, this behavior does not occur during construction of the BlastHSP\nobject.  
These data will automatically be collected as necessary as\nthe BlastHSP.pm object is used.\n\n=head1 DEPENDENCIES\n\nBio::Search::HSP::BlastHSP.pm is a concrete class that inherits from\nL<Bio::SeqFeature::SimilarityPair> and L<Bio::Search::HSP::HSPI>.\nL<Bio::Seq> and L<Bio::SimpleAlign> are employed for creating\nsequence and alignment objects, respectively.\n\n=head2 Relationship to SimpleAlign.pm & Seq.pm\n\nBlastHSP.pm can provide the query or sbjct sequence as a L<Bio::Seq>\nobject via the L<seq()|seq> method. The BlastHSP.pm object can also create a\ntwo-sequence L<Bio::SimpleAlign> alignment object using the query\nand sbjct sequences via the L<get_aln()|get_aln> method. Creation of alignment\nobjects is not automatic when constructing the BlastHSP.pm object since\nthis level of functionality is not always required and would generate\na lot of extra overhead when crunching many reports.\n\n\n=head1 FEEDBACK\n\n=head2 Mailing Lists\n\nUser feedback is an integral part of the evolution of this and other\nBioperl modules.  Send your comments and suggestions preferably to one\nof the Bioperl mailing lists.  Your participation is much appreciated.\n\n  bioperl-l@bioperl.org                  - General discussion\n  http://bioperl.org/wiki/Mailing_lists  - About the mailing lists\n\n=head2 Support \n\nPlease direct usage questions or support issues to the mailing list:\n\nI<bioperl-l@bioperl.org>\n\nrather than to the module maintainer directly. Many experienced and \nresponsive experts will be able to look at the problem and quickly \naddress it. Please include a thorough description of the problem \nwith code and data examples if at all possible.\n\n=head2 Reporting Bugs\n\nReport bugs to the Bioperl bug tracking system to help us keep track\nof the bugs and their resolution. 
Bug reports can be submitted via the\nweb:\n\n  http://bugzilla.open-bio.org/\n\n=head1 AUTHOR\n\nSteve Chervitz E<lt>sac-at-bioperl.orgE<gt>\n\nSee L<the FEEDBACK section | FEEDBACK> for where to send bug reports and comments.\n\n=head1 ACKNOWLEDGEMENTS\n\nThis software was originally developed in the Department of Genetics\nat Stanford University. I would also like to acknowledge my\ncolleagues at Affymetrix for useful feedback.\n\n=head1 SEE ALSO\n\n Bio::Search::Hit::BlastHit.pm          - Blast hit object.\n Bio::Search::Result::BlastResult.pm    - Blast Result object.\n Bio::Seq.pm                            - Biosequence object\n\n=head2 Links:\n\n http://bio.perl.org/                       - Bioperl Project Homepage\n\n=head1 COPYRIGHT\n\nCopyright (c) 1996-2001 Steve Chervitz. All Rights Reserved.\n\n=head1 DISCLAIMER\n\nThis software is provided \"as is\" without warranty of any kind.\n\n\n\n# END of main POD documentation.\n\n=head1 APPENDIX\n\nThe rest of the documentation details each of the object methods.\nInternal methods are usually preceded with a _\n\n\n# Let the code begin...\n\npackage Bio::Search::HSP::BlastHSP;\n\nuse strict;\nuse Bio::SeqFeature::Similarity;\n\nuse vars qw($GAP_SYMBOL %STRAND_SYMBOL);\n\nuse overload\n    '\"\"' => \\&to_string;\n\nuse base qw(Bio::SeqFeature::SimilarityPair Bio::Search::HSP::HSPI);\n\n$GAP_SYMBOL    = '-';          # Need a more general way to handle gap symbols.\n%STRAND_SYMBOL = ('Plus' => 1, 'Minus' => -1 );\n\n\n=head2 new\n\n Usage     : $hsp = Bio::Search::HSP::BlastHSP->new( %named_params );\n           : Bio::Search::HSP::BlastHSP objects are constructed\n           : automatically by Bio::SearchIO::BlastHitFactory,\n           : so there is no need for direct instantiation.\n Purpose   : Constructs a new BlastHSP object and Initializes key variables\n           : for the HSP.\n Returns   : A Bio::Search::HSP::BlastHSP object\n Argument  : Named parameters:\n           : Parameter keys are 
case-insensitive.\n           :      -RAW_DATA  => array ref containing raw BLAST report data for\n           :                    for a single HSP. This includes all lines\n           :                    of the HSP alignment from a traditional BLAST\n                                or PSI-BLAST (non-XML) report,\n           :      -RANK         => integer (1..n).\n           :      -PROGRAM      => string ('TBLASTN', 'BLASTP', etc.).\n           :      -QUERY_NAME   => string, id of query sequence\n           :      -HIT_NAME     => string, id of hit sequence\n           :\n Comments  : Having the raw data allows this object to do lazy parsing of\n           : the raw HSP data (i.e., not parsed until needed).\n           :\n           : Note that there is a fair amount of basic parsing that is\n           : currently performed in this module that would be more appropriate\n           : to do within a separate factory object.\n           : This parsing code will likely be relocated and more initialization\n           : parameters will be added to new().\n           :\nSee Also   : L<Bio::SeqFeature::SimilarityPair::new()>, L<Bio::SeqFeature::Similarity::new()>\n\n\n#----------------\nsub new {\n#----------------\n    my ($class, @args ) = @_;\n\n    my $self = $class->SUPER::new( @args );\n    # Initialize placeholders\n    $self->{'_queryGaps'} = $self->{'_sbjctGaps'} = 0;\n    my ($raw_data, $qname, $hname, $qlen, $hlen);\n\n    ($self->{'_prog'}, $self->{'_rank'}, $raw_data,\n     $qname, $hname) =\n\t $self->_rearrange([qw( PROGRAM\n\t\t\t\tRANK\n\t\t\t\tRAW_DATA\n\t\t\t\tQUERY_NAME\n\t\t\t\tHIT_NAME\n\t\t\t\t)], @args );\n\n    # _set_data() does a fair amount of parsing.\n    # This will likely change (see comment above.)\n    $self->_set_data( @{$raw_data} );\n    # Store the aligned query as sequence feature\n    my ($qb, $hb) = ($self->start());\n    my ($qe, $he) = ($self->end());\n    my ($qs, $hs) = ($self->strand());\n    my ($qf,$hf) = 
($self->query->frame(),\n\t\t    $self->hit->frame);\n\n    $self->query( Bio::SeqFeature::Similarity->new (-start   =>$qb,\n\t\t\t\t\t\t    -end     =>$qe,\n\t\t\t\t\t\t    -strand  =>$qs,\n\t\t\t\t\t\t    -bits    =>$self->bits,\n\t\t\t\t\t\t    -score   =>$self->score,\n\t\t\t\t\t\t    -frame   =>$qf,\n\t\t\t\t\t\t    -seq_id  => $qname,\n\t\t\t\t\t\t    -source  =>$self->{'_prog'} ));\n\n    $self->hit( Bio::SeqFeature::Similarity->new (-start   =>$hb,\n\t\t\t\t\t\t  -end     =>$he,\n\t\t\t\t\t\t  -strand  =>$hs,\n\t\t\t\t\t\t  -bits    =>$self->bits,\n\t\t\t\t\t\t  -score   =>$self->score,\n                                                  -frame   =>$hf,\n\t\t\t\t\t\t  -seq_id  => $hname,\n\t\t\t\t\t\t  -source  =>$self->{'_prog'} ));\n\n    # set lengths\n    $self->query->seqlength($qlen); # query\n    $self->hit->seqlength($hlen); # subject\n\n    $self->query->frac_identical($self->frac_identical('query'));\n    $self->hit->frac_identical($self->frac_identical('hit'));\n    return $self;\n}\n\n#sub DESTROY {\n#    my $self = shift;\n#    #print STDERR \"--->DESTROYING $self\\n\";\n#}\n\n\n# Title   : _id_str;\n# Purpose : Intended for internal use only to provide a string for use\n#           within exception messages to help users figure out which\n#           query/hit caused the problem.\n# Returns : Short string with name of query and hit seq\nsub _id_str {\n    my $self = shift;\n    if( not defined $self->{'_id_str'}) {\n        my $qname = $self->query->seq_id;\n        my $hname = $self->hit->seq_id;\n        $self->{'_id_str'} = \"QUERY=\\\"$qname\\\" HIT=\\\"$hname\\\"\";\n    }\n    return $self->{'_id_str'};\n}\n\n#=================================================\n# Begin Bio::Search::HSP::HSPI implementation\n#=================================================\n\n=head2 algorithm\n\n Title   : algorithm\n Usage   : $alg = $hsp->algorithm();\n Function: Gets the algorithm specification that was used to obtain the hsp\n           For BLAST, the 
algorithm denotes what type of sequence was aligned\n           against what (BLASTN: dna-dna, BLASTP prt-prt, BLASTX translated\n           dna-prt, TBLASTN prt-translated dna, TBLASTX translated\n           dna-translated dna).\n Returns : a scalar string\n Args    : none\n\n\n#----------------\nsub algorithm {\n#----------------\n    my ($self,@args) = @_;\n    return $self->{'_prog'};\n}\n\n\n\n\n=head2 signif()\n\n Usage     : $hsp_obj->signif()\n Purpose   : Get the P-value or Expect value for the HSP.\n Returns   : Float (0.001 or 1.3e-43)\n           : Returns P-value if it is defined, otherwise, Expect value.\n Argument  : n/a\n Throws    : n/a\n Comments  : Provided for consistency with BlastHit::signif()\n           : Support for returning the significance data in different\n           : formats (e.g., exponent only), is not provided for HSP objects.\n           : This is only available for the BlastHit or Blast object.\n\nSee Also   : L</p>, L</expect>, L<Bio::Search::Hit::BlastHit::signif()|Bio::Search::Hit::BlastHit>\n\n\n#-----------\nsub signif {\n#-----------\n    my $self = shift;\n    my $val ||= defined($self->{'_p'}) ? 
$self->{'_p'} : $self->{'_expect'};\n    $val;\n}\n\n\n\n=head2 evalue\n\n Usage     : $hsp_obj->evalue()\n Purpose   : Get the Expect value for the HSP.\n Returns   : Float (0.001 or 1.3e-43)\n Argument  : n/a\n Throws    : n/a\n Comments  : Support for returning the expectation data in different\n           : formats (e.g., exponent only), is not provided for HSP objects.\n           : This is only available for the BlastHit or Blast object.\n\nSee Also   : L</p>\n\n\n#----------\nsub evalue { shift->{'_expect'} }\n#----------\n\n\n=head2 p\n\n Usage     : $hsp_obj->p()\n Purpose   : Get the P-value for the HSP.\n Returns   : Float (0.001 or 1.3e-43) or undef if not defined.\n Argument  : n/a\n Throws    : n/a\n Comments  : P-value is not defined with NCBI Blast2 reports.\n           : Support for returning the expectation data in different\n           : formats (e.g., exponent only) is not provided for HSP objects.\n           : This is only available for the BlastHit or Blast object.\n\nSee Also   : L<expect()|expect>\n\n\n#-----\nsub p { my $self = shift; $self->{'_p'}; }\n#-----\n\n# alias\nsub pvalue { shift->p(@_); }\n\n=head2 length\n\n Usage     : $hsp->length( [seq_type] )\n Purpose   : Get the length of the aligned portion of the query or sbjct.\n Example   : $hsp->length('query')\n Returns   : integer\n Argument  : seq_type: 'query' | 'hit' or 'sbjct' | 'total'  (default = 'total')\n             ('sbjct' is synonymous with 'hit')\n Throws    : n/a\n Comments  : 'total' length is the full length of the alignment\n           : as reported in the denominators in the alignment section:\n           : \"Identical = 34/120 Positives = 67/120\".\n\nSee Also   : L</gaps>\n\n\n#-----------\nsub length {\n#-----------\n## Developer note: when using the built-in length function within\n##                 this module, call it as CORE::length().\n    my( $self, $seqType,$data ) = @_;\n    $seqType  ||= 'total';\n    $seqType = 'sbjct' if $seqType eq 'hit';\n\n    
$seqType ne 'total' and $self->_set_seq_data() unless $self->{'_set_seq_data'};\n\n    ## Sensitive to member name format.\n    $seqType = \"_\\L$seqType\\E\";\n    if( defined $data  ) {\n\t$self->{$seqType.'Length'} = $data;\n    }\n    $self->{$seqType.'Length'};\n}\n\n\n\n=head2 gaps\n\n Usage     : $hsp->gaps( [seq_type] )\n Purpose   : Get the number of gap characters in the query, sbjct, or total alignment.\n           : Also can return query gap chars and sbjct gap chars as a two-element list\n           : when in array context.\n Example   : $total_gaps      = $hsp->gaps();\n           : ($qgaps, $sgaps) = $hsp->gaps();\n           : $qgaps           = $hsp->gaps('query');\n Returns   : scalar context: integer\n           : array context without args: (int, int) = ('queryGaps', 'sbjctGaps')\n Argument  : seq_type: 'query' or 'hit' or 'sbjct' or 'total'\n           :  ('sbjct' is synonymous with 'hit')\n           : (default = 'total', scalar context)\n           : Array context can be \"induced\" by providing an argument of 'list' or 'array'.\n Throws    : n/a\n\nSee Also   : L</length>, L</matches>"},"detail":"($self,$seqType)"},{"detail":"($self,$seqType)","signature":{"label":"frac_identical($self,$seqType)","documentation":"1;\n#-----------------------------------------------------------------\n# $Id: BlastHSP.pm 16123 2009-09-17 12:57:27Z cjfields $\n#\n# BioPerl module Bio::Search::HSP::BlastHSP\n#\n# (This module was originally called Bio::Tools::Blast::HSP)\n#\n# Please direct questions and support issues to <bioperl-l@bioperl.org> \n#\n# Cared for by Steve Chervitz <sac@bioperl.org>\n#\n# You may distribute this module under the same terms as perl itself\n#-----------------------------------------------------------------\n\n## POD Documentation:\n\n=head1 NAME\n\nBio::Search::HSP::BlastHSP - Bioperl BLAST High-Scoring Pair object\n\n=head1 SYNOPSIS\n\nSee L<Bio::Search::Hit::BlastHit>.\n\n=head1 DESCRIPTION\n\nA Bio::Search::HSP::BlastHSP object 
provides an interface to data\nobtained in a single alignment section of a Blast report (known as a\n\"High-scoring Segment Pair\"). This is essentially a pairwise\nalignment with score information.\n\nBlastHSP objects are accessed via L<Bio::Search::Hit::BlastHit>\nobjects after parsing a BLAST report using the L<Bio::SearchIO>\nsystem.\n\nThe construction of BlastHSP objects is performed by\nBio::Factory::BlastHitFactory in a process that is\norchestrated by the Blast parser (L<Bio::SearchIO::psiblast>).\nThe resulting BlastHSPs are then accessed via\nL<Bio::Search::Hit::BlastHit>. Therefore, you do not need to\nuse L<Bio::Search::HSP::BlastHSP> directly. If you need to construct\nBlastHSPs directly, see the new() function for details.\n\nFor L<Bio::SearchIO> BLAST parsing usage examples, see the\nC<examples/searchio> directory of the Bioperl distribution.\n\n=head2 Start and End coordinates\n\nSequence endpoints are swapped so that start is always less than\nend. This affects TBLASTN/X hits on the minus strand. Strand\ninformation can be recovered using the strand() method. This\nnormalization step is standard Bioperl practice. It also facilitates\nuse of range information by methods such as match().\n\n=over 1\n\n* * Supports BLAST versions 1.x and 2.x, gapped and ungapped.\n\n\nBio::Search::HSP::BlastHSP.pm has the ability to extract a list of all\nresidue indices for identical and conservative matches along both\nquery and sbjct sequences. Since this degree of detail is not always\nneeded, this behavior does not occur during construction of the BlastHSP\nobject.  
These data will automatically be collected as necessary as\nthe BlastHSP.pm object is used.\n\n=head1 DEPENDENCIES\n\nBio::Search::HSP::BlastHSP.pm is a concrete class that inherits from\nL<Bio::SeqFeature::SimilarityPair> and L<Bio::Search::HSP::HSPI>.\nL<Bio::Seq> and L<Bio::SimpleAlign> are employed for creating\nsequence and alignment objects, respectively.\n\n=head2 Relationship to SimpleAlign.pm & Seq.pm\n\nBlastHSP.pm can provide the query or sbjct sequence as a L<Bio::Seq>\nobject via the L<seq()|seq> method. The BlastHSP.pm object can also create a\ntwo-sequence L<Bio::SimpleAlign> alignment object using the query\nand sbjct sequences via the L<get_aln()|get_aln> method. Creation of alignment\nobjects is not automatic when constructing the BlastHSP.pm object since\nthis level of functionality is not always required and would generate\na lot of extra overhead when crunching many reports.\n\n\n=head1 FEEDBACK\n\n=head2 Mailing Lists\n\nUser feedback is an integral part of the evolution of this and other\nBioperl modules.  Send your comments and suggestions preferably to one\nof the Bioperl mailing lists.  Your participation is much appreciated.\n\n  bioperl-l@bioperl.org                  - General discussion\n  http://bioperl.org/wiki/Mailing_lists  - About the mailing lists\n\n=head2 Support \n\nPlease direct usage questions or support issues to the mailing list:\n\nI<bioperl-l@bioperl.org>\n\nrather than to the module maintainer directly. Many experienced and \nresponsive experts will be able to look at the problem and quickly \naddress it. Please include a thorough description of the problem \nwith code and data examples if at all possible.\n\n=head2 Reporting Bugs\n\nReport bugs to the Bioperl bug tracking system to help us keep track\nof the bugs and their resolution. 
Bug reports can be submitted via the\nweb:\n\n  http://bugzilla.open-bio.org/\n\n=head1 AUTHOR\n\nSteve Chervitz E<lt>sac-at-bioperl.orgE<gt>\n\nSee L<the FEEDBACK section | FEEDBACK> for where to send bug reports and comments.\n\n=head1 ACKNOWLEDGEMENTS\n\nThis software was originally developed in the Department of Genetics\nat Stanford University. I would also like to acknowledge my\ncolleagues at Affymetrix for useful feedback.\n\n=head1 SEE ALSO\n\n Bio::Search::Hit::BlastHit.pm          - Blast hit object.\n Bio::Search::Result::BlastResult.pm    - Blast Result object.\n Bio::Seq.pm                            - Biosequence object\n\n=head2 Links:\n\n http://bio.perl.org/                       - Bioperl Project Homepage\n\n=head1 COPYRIGHT\n\nCopyright (c) 1996-2001 Steve Chervitz. All Rights Reserved.\n\n=head1 DISCLAIMER\n\nThis software is provided \"as is\" without warranty of any kind.\n\n\n\n# END of main POD documentation.\n\n=head1 APPENDIX\n\nThe rest of the documentation details each of the object methods.\nInternal methods are usually preceded with a _\n\n\n# Let the code begin...\n\npackage Bio::Search::HSP::BlastHSP;\n\nuse strict;\nuse Bio::SeqFeature::Similarity;\n\nuse vars qw($GAP_SYMBOL %STRAND_SYMBOL);\n\nuse overload\n    '\"\"' => \\&to_string;\n\nuse base qw(Bio::SeqFeature::SimilarityPair Bio::Search::HSP::HSPI);\n\n$GAP_SYMBOL    = '-';          # Need a more general way to handle gap symbols.\n%STRAND_SYMBOL = ('Plus' => 1, 'Minus' => -1 );\n\n\n=head2 new\n\n Usage     : $hsp = Bio::Search::HSP::BlastHSP->new( %named_params );\n           : Bio::Search::HSP::BlastHSP objects are constructed\n           : automatically by Bio::SearchIO::BlastHitFactory,\n           : so there is no need for direct instantiation.\n Purpose   : Constructs a new BlastHSP object and Initializes key variables\n           : for the HSP.\n Returns   : A Bio::Search::HSP::BlastHSP object\n Argument  : Named parameters:\n           : Parameter keys are 
case-insensitive.\n           :      -RAW_DATA  => array ref containing raw BLAST report data for\n           :                    for a single HSP. This includes all lines\n           :                    of the HSP alignment from a traditional BLAST\n                                or PSI-BLAST (non-XML) report,\n           :      -RANK         => integer (1..n).\n           :      -PROGRAM      => string ('TBLASTN', 'BLASTP', etc.).\n           :      -QUERY_NAME   => string, id of query sequence\n           :      -HIT_NAME     => string, id of hit sequence\n           :\n Comments  : Having the raw data allows this object to do lazy parsing of\n           : the raw HSP data (i.e., not parsed until needed).\n           :\n           : Note that there is a fair amount of basic parsing that is\n           : currently performed in this module that would be more appropriate\n           : to do within a separate factory object.\n           : This parsing code will likely be relocated and more initialization\n           : parameters will be added to new().\n           :\nSee Also   : L<Bio::SeqFeature::SimilarityPair::new()>, L<Bio::SeqFeature::Similarity::new()>\n\n\n#----------------\nsub new {\n#----------------\n    my ($class, @args ) = @_;\n\n    my $self = $class->SUPER::new( @args );\n    # Initialize placeholders\n    $self->{'_queryGaps'} = $self->{'_sbjctGaps'} = 0;\n    my ($raw_data, $qname, $hname, $qlen, $hlen);\n\n    ($self->{'_prog'}, $self->{'_rank'}, $raw_data,\n     $qname, $hname) =\n\t $self->_rearrange([qw( PROGRAM\n\t\t\t\tRANK\n\t\t\t\tRAW_DATA\n\t\t\t\tQUERY_NAME\n\t\t\t\tHIT_NAME\n\t\t\t\t)], @args );\n\n    # _set_data() does a fair amount of parsing.\n    # This will likely change (see comment above.)\n    $self->_set_data( @{$raw_data} );\n    # Store the aligned query as sequence feature\n    my ($qb, $hb) = ($self->start());\n    my ($qe, $he) = ($self->end());\n    my ($qs, $hs) = ($self->strand());\n    my ($qf,$hf) = 
($self->query->frame(),\n\t\t    $self->hit->frame);\n\n    $self->query( Bio::SeqFeature::Similarity->new (-start   =>$qb,\n\t\t\t\t\t\t    -end     =>$qe,\n\t\t\t\t\t\t    -strand  =>$qs,\n\t\t\t\t\t\t    -bits    =>$self->bits,\n\t\t\t\t\t\t    -score   =>$self->score,\n\t\t\t\t\t\t    -frame   =>$qf,\n\t\t\t\t\t\t    -seq_id  => $qname,\n\t\t\t\t\t\t    -source  =>$self->{'_prog'} ));\n\n    $self->hit( Bio::SeqFeature::Similarity->new (-start   =>$hb,\n\t\t\t\t\t\t  -end     =>$he,\n\t\t\t\t\t\t  -strand  =>$hs,\n\t\t\t\t\t\t  -bits    =>$self->bits,\n\t\t\t\t\t\t  -score   =>$self->score,\n                                                  -frame   =>$hf,\n\t\t\t\t\t\t  -seq_id  => $hname,\n\t\t\t\t\t\t  -source  =>$self->{'_prog'} ));\n\n    # set lengths\n    $self->query->seqlength($qlen); # query\n    $self->hit->seqlength($hlen); # subject\n\n    $self->query->frac_identical($self->frac_identical('query'));\n    $self->hit->frac_identical($self->frac_identical('hit'));\n    return $self;\n}\n\n#sub DESTROY {\n#    my $self = shift;\n#    #print STDERR \"--->DESTROYING $self\\n\";\n#}\n\n\n# Title   : _id_str;\n# Purpose : Intended for internal use only to provide a string for use\n#           within exception messages to help users figure out which\n#           query/hit caused the problem.\n# Returns : Short string with name of query and hit seq\nsub _id_str {\n    my $self = shift;\n    if( not defined $self->{'_id_str'}) {\n        my $qname = $self->query->seq_id;\n        my $hname = $self->hit->seq_id;\n        $self->{'_id_str'} = \"QUERY=\\\"$qname\\\" HIT=\\\"$hname\\\"\";\n    }\n    return $self->{'_id_str'};\n}\n\n#=================================================\n# Begin Bio::Search::HSP::HSPI implementation\n#=================================================\n\n=head2 algorithm\n\n Title   : algorithm\n Usage   : $alg = $hsp->algorithm();\n Function: Gets the algorithm specification that was used to obtain the hsp\n           For BLAST, the 
algorithm denotes what type of sequence was aligned\n           against what (BLASTN: dna-dna, BLASTP prt-prt, BLASTX translated\n           dna-prt, TBLASTN prt-translated dna, TBLASTX translated\n           dna-translated dna).\n Returns : a scalar string\n Args    : none\n\n\n#----------------\nsub algorithm {\n#----------------\n    my ($self,@args) = @_;\n    return $self->{'_prog'};\n}\n\n\n\n\n=head2 signif()\n\n Usage     : $hsp_obj->signif()\n Purpose   : Get the P-value or Expect value for the HSP.\n Returns   : Float (0.001 or 1.3e-43)\n           : Returns P-value if it is defined, otherwise, Expect value.\n Argument  : n/a\n Throws    : n/a\n Comments  : Provided for consistency with BlastHit::signif()\n           : Support for returning the significance data in different\n           : formats (e.g., exponent only), is not provided for HSP objects.\n           : This is only available for the BlastHit or Blast object.\n\nSee Also   : L</p>, L</expect>, L<Bio::Search::Hit::BlastHit::signif()|Bio::Search::Hit::BlastHit>\n\n\n#-----------\nsub signif {\n#-----------\n    my $self = shift;\n    my $val ||= defined($self->{'_p'}) ? 
$self->{'_p'} : $self->{'_expect'};\n    $val;\n}\n\n\n\n=head2 evalue\n\n Usage     : $hsp_obj->evalue()\n Purpose   : Get the Expect value for the HSP.\n Returns   : Float (0.001 or 1.3e-43)\n Argument  : n/a\n Throws    : n/a\n Comments  : Support for returning the expectation data in different\n           : formats (e.g., exponent only), is not provided for HSP objects.\n           : This is only available for the BlastHit or Blast object.\n\nSee Also   : L</p>\n\n\n#----------\nsub evalue { shift->{'_expect'} }\n#----------\n\n\n=head2 p\n\n Usage     : $hsp_obj->p()\n Purpose   : Get the P-value for the HSP.\n Returns   : Float (0.001 or 1.3e-43) or undef if not defined.\n Argument  : n/a\n Throws    : n/a\n Comments  : P-value is not defined with NCBI Blast2 reports.\n           : Support for returning the expectation data in different\n           : formats (e.g., exponent only) is not provided for HSP objects.\n           : This is only available for the BlastHit or Blast object.\n\nSee Also   : L<expect()|expect>\n\n\n#-----\nsub p { my $self = shift; $self->{'_p'}; }\n#-----\n\n# alias\nsub pvalue { shift->p(@_); }\n\n=head2 length\n\n Usage     : $hsp->length( [seq_type] )\n Purpose   : Get the length of the aligned portion of the query or sbjct.\n Example   : $hsp->length('query')\n Returns   : integer\n Argument  : seq_type: 'query' | 'hit' or 'sbjct' | 'total'  (default = 'total')\n             ('sbjct' is synonymous with 'hit')\n Throws    : n/a\n Comments  : 'total' length is the full length of the alignment\n           : as reported in the denominators in the alignment section:\n           : \"Identical = 34/120 Positives = 67/120\".\n\nSee Also   : L</gaps>\n\n\n#-----------\nsub length {\n#-----------\n## Developer note: when using the built-in length function within\n##                 this module, call it as CORE::length().\n    my( $self, $seqType,$data ) = @_;\n    $seqType  ||= 'total';\n    $seqType = 'sbjct' if $seqType eq 'hit';\n\n    
$seqType ne 'total' and $self->_set_seq_data() unless $self->{'_set_seq_data'};\n\n    ## Sensitive to member name format.\n    $seqType = \"_\\L$seqType\\E\";\n    if( defined $data  ) {\n\t$self->{$seqType.'Length'} = $data;\n    }\n    $self->{$seqType.'Length'};\n}\n\n\n\n=head2 gaps\n\n Usage     : $hsp->gaps( [seq_type] )\n Purpose   : Get the number of gap characters in the query, sbjct, or total alignment.\n           : Also can return query gap chars and sbjct gap chars as a two-element list\n           : when in array context.\n Example   : $total_gaps      = $hsp->gaps();\n           : ($qgaps, $sgaps) = $hsp->gaps();\n           : $qgaps           = $hsp->gaps('query');\n Returns   : scalar context: integer\n           : array context without args: (int, int) = ('queryGaps', 'sbjctGaps')\n Argument  : seq_type: 'query' or 'hit' or 'sbjct' or 'total'\n           :  ('sbjct' is synonymous with 'hit')\n           : (default = 'total', scalar context)\n           : Array context can be \"induced\" by providing an argument of 'list' or 'array'.\n Throws    : n/a\n\nSee Also   : L</length>, L</matches>\n\n\n#---------\nsub gaps {\n#---------\n    my( $self, $seqType ) = @_;\n\n    $self->_set_seq_data() unless $self->{'_set_seq_data'};\n\n    $seqType  ||= (wantarray ? 
'list' : 'total');\n    $seqType = 'sbjct' if $seqType eq 'hit';\n\n    if($seqType =~ /list|array/i) {\n\treturn (($self->{'_queryGaps'} || 0), ($self->{'_sbjctGaps'} || 0));\n    }\n\n    if($seqType eq 'total') {\n\treturn ($self->{'_queryGaps'} + $self->{'_sbjctGaps'}) || 0;\n    } else {\n\t## Sensitive to member name format.\n\t$seqType = \"_\\L$seqType\\E\";\n\treturn $self->{$seqType.'Gaps'} || 0;\n    }\n}\n\n\n=head2 frac_identical\n\n Usage     : $hsp_object->frac_identical( [seq_type] );\n Purpose   : Get the fraction of identical positions within the given HSP.\n Example   : $frac_iden = $hsp_object->frac_identical('query');\n Returns   : Float (2-decimal precision, e.g., 0.75).\n Argument  : seq_type: 'query' or 'hit' or 'sbjct' or 'total'\n           :  ('sbjct' is synonymous with 'hit')\n           : default = 'total' (but see comments below).\n Throws    : n/a\n Comments  : Different versions of Blast report different values for the total\n           : length of the alignment. 
This is the number reported in the\n           : denominators in the stats section:\n           : \"Identical = 34/120 Positives = 67/120\".\n           : NCBI-BLAST uses the total length of the alignment (with gaps)\n           : WU-BLAST uses the length of the query sequence (without gaps).\n           : Therefore, when called without an argument or an argument of 'total',\n           : this method will report different values depending on the\n           : version of BLAST used.\n           :\n           : To get the fraction identical among only the aligned residues,\n           : ignoring the gaps, call this method with an argument of 'query'\n           : or 'sbjct' ('sbjct' is synonymous with 'hit').\n\nSee Also   : L</frac_conserved>, L</num_identical>, L</matches>","parameters":[{"label":"$self"},{"label":"$seqType"}]},"containerName":"main::","definition":"sub","line":495,"children":[{"kind":13,"line":501,"containerName":"frac_identical","definition":"my","name":"$self","localvar":"my"},{"containerName":"frac_identical","name":"$seqType","line":501,"kind":13},{"name":"$seqType","containerName":"frac_identical","kind":13,"line":502},{"line":503,"kind":13,"name":"$seqType","containerName":"frac_identical"},{"name":"$seqType","containerName":"frac_identical","line":503,"kind":13},{"kind":13,"line":505,"containerName":"frac_identical","name":"$seqType"},{"name":"$self","containerName":"frac_identical","kind":13,"line":506},{"containerName":"frac_identical","name":"_set_seq_data","kind":12,"line":506},{"kind":13,"line":506,"name":"$self","containerName":"frac_identical"},{"kind":13,"line":509,"containerName":"frac_identical","name":"$seqType"},{"containerName":"frac_identical","name":"$self","kind":13,"line":511},{"name":"$self","containerName":"frac_identical","line":511,"kind":13},{"kind":13,"line":511,"containerName":"frac_identical","name":"$seqType"}],"kind":12,"range":{"end":{"line":512,"character":9999},"start":{"line":495,"character":0}},"name":"frac_id
entical"},{"name":"frac_conserved","range":{"start":{"character":0,"line":546},"end":{"line":564,"character":9999}},"kind":12,"children":[{"definition":"my","name":"$self","containerName":"frac_conserved","localvar":"my","kind":13,"line":552},{"line":552,"kind":13,"containerName":"frac_conserved","name":"$seqType"},{"name":"$seqType","containerName":"frac_conserved","kind":13,"line":553},{"line":554,"kind":13,"containerName":"frac_conserved","name":"$seqType"},{"name":"$seqType","containerName":"frac_conserved","kind":13,"line":554},{"name":"$seqType","containerName":"frac_conserved","kind":13,"line":556},{"name":"$self","containerName":"frac_conserved","line":557,"kind":13},{"kind":12,"line":557,"containerName":"frac_conserved","name":"_set_seq_data"},{"name":"$self","containerName":"frac_conserved","line":557,"kind":13},{"containerName":"frac_conserved","name":"$seqType","line":561,"kind":13},{"line":563,"kind":13,"name":"$self","containerName":"frac_conserved"},{"kind":13,"line":563,"containerName":"frac_conserved","name":"$self"},{"kind":13,"line":563,"name":"$seqType","containerName":"frac_conserved"}],"line":546,"definition":"sub","containerName":"main::","signature":{"label":"frac_conserved($self,$seqType)","parameters":[{"label":"$self"},{"label":"$seqType"}],"documentation":"1;\n#-----------------------------------------------------------------\n# $Id: BlastHSP.pm 16123 2009-09-17 12:57:27Z cjfields $\n#\n# BioPerl module Bio::Search::HSP::BlastHSP\n#\n# (This module was originally called Bio::Tools::Blast::HSP)\n#\n# Please direct questions and support issues to <bioperl-l@bioperl.org> \n#\n# Cared for by Steve Chervitz <sac@bioperl.org>\n#\n# You may distribute this module under the same terms as perl itself\n#-----------------------------------------------------------------\n\n## POD Documentation:\n\n=head1 NAME\n\nBio::Search::HSP::BlastHSP - Bioperl BLAST High-Scoring Pair object\n\n=head1 SYNOPSIS\n\nSee L<Bio::Search::Hit::BlastHit>.\n\n=head1 
DESCRIPTION\n\nA Bio::Search::HSP::BlastHSP object provides an interface to data\nobtained in a single alignment section of a Blast report (known as a\n\"High-scoring Segment Pair\"). This is essentially a pairwise\nalignment with score information.\n\nBlastHSP objects are accessed via L<Bio::Search::Hit::BlastHit>\nobjects after parsing a BLAST report using the L<Bio::SearchIO>\nsystem.\n\nThe construction of BlastHSP objects is performed by\nBio::Factory::BlastHitFactory in a process that is\norchestrated by the Blast parser (L<Bio::SearchIO::psiblast>).\nThe resulting BlastHSPs are then accessed via\nL<Bio::Search::Hit::BlastHit>. Therefore, you do not need to\nuse L<Bio::Search::HSP::BlastHSP> directly. If you need to construct\nBlastHSPs directly, see the new() function for details.\n\nFor L<Bio::SearchIO> BLAST parsing usage examples, see the\nC<examples/searchio> directory of the Bioperl distribution.\n\n=head2 Start and End coordinates\n\nSequence endpoints are swapped so that start is always less than\nend. This affects TBLASTN/X hits on the minus strand. Strand\ninformation can be recovered using the strand() method. This\nnormalization step is standard Bioperl practice. It also facilitates\nuse of range information by methods such as match().\n\n=over 1\n\n* * Supports BLAST versions 1.x and 2.x, gapped and ungapped.\n\n\nBio::Search::HSP::BlastHSP.pm has the ability to extract a list of all\nresidue indices for identical and conservative matches along both\nquery and sbjct sequences. Since this degree of detail is not always\nneeded, this behavior does not occur during construction of the BlastHSP\nobject.  
These data will automatically be collected as necessary as\nthe BlastHSP.pm object is used.\n\n=head1 DEPENDENCIES\n\nBio::Search::HSP::BlastHSP.pm is a concrete class that inherits from\nL<Bio::SeqFeature::SimilarityPair> and L<Bio::Search::HSP::HSPI>.\nL<Bio::Seq> and L<Bio::SimpleAlign> are employed for creating\nsequence and alignment objects, respectively.\n\n=head2 Relationship to SimpleAlign.pm & Seq.pm\n\nBlastHSP.pm can provide the query or sbjct sequence as a L<Bio::Seq>\nobject via the L<seq()|seq> method. The BlastHSP.pm object can also create a\ntwo-sequence L<Bio::SimpleAlign> alignment object using the query\nand sbjct sequences via the L<get_aln()|get_aln> method. Creation of alignment\nobjects is not automatic when constructing the BlastHSP.pm object since\nthis level of functionality is not always required and would generate\na lot of extra overhead when crunching many reports.\n\n\n=head1 FEEDBACK\n\n=head2 Mailing Lists\n\nUser feedback is an integral part of the evolution of this and other\nBioperl modules.  Send your comments and suggestions preferably to one\nof the Bioperl mailing lists.  Your participation is much appreciated.\n\n  bioperl-l@bioperl.org                  - General discussion\n  http://bioperl.org/wiki/Mailing_lists  - About the mailing lists\n\n=head2 Support \n\nPlease direct usage questions or support issues to the mailing list:\n\nI<bioperl-l@bioperl.org>\n\nrather than to the module maintainer directly. Many experienced and \nresponsive experts will be able to look at the problem and quickly \naddress it. Please include a thorough description of the problem \nwith code and data examples if at all possible.\n\n=head2 Reporting Bugs\n\nReport bugs to the Bioperl bug tracking system to help us keep track\nof the bugs and their resolution. 
Bug reports can be submitted via the\nweb:\n\n  http://bugzilla.open-bio.org/\n\n=head1 AUTHOR\n\nSteve Chervitz E<lt>sac-at-bioperl.orgE<gt>\n\nSee L<the FEEDBACK section | FEEDBACK> for where to send bug reports and comments.\n\n=head1 ACKNOWLEDGEMENTS\n\nThis software was originally developed in the Department of Genetics\nat Stanford University. I would also like to acknowledge my\ncolleagues at Affymetrix for useful feedback.\n\n=head1 SEE ALSO\n\n Bio::Search::Hit::BlastHit.pm          - Blast hit object.\n Bio::Search::Result::BlastResult.pm    - Blast Result object.\n Bio::Seq.pm                            - Biosequence object\n\n=head2 Links:\n\n http://bio.perl.org/                       - Bioperl Project Homepage\n\n=head1 COPYRIGHT\n\nCopyright (c) 1996-2001 Steve Chervitz. All Rights Reserved.\n\n=head1 DISCLAIMER\n\nThis software is provided \"as is\" without warranty of any kind.\n\n\n\n# END of main POD documentation.\n\n=head1 APPENDIX\n\nThe rest of the documentation details each of the object methods.\nInternal methods are usually preceded with a _\n\n\n# Let the code begin...\n\npackage Bio::Search::HSP::BlastHSP;\n\nuse strict;\nuse Bio::SeqFeature::Similarity;\n\nuse vars qw($GAP_SYMBOL %STRAND_SYMBOL);\n\nuse overload\n    '\"\"' => \\&to_string;\n\nuse base qw(Bio::SeqFeature::SimilarityPair Bio::Search::HSP::HSPI);\n\n$GAP_SYMBOL    = '-';          # Need a more general way to handle gap symbols.\n%STRAND_SYMBOL = ('Plus' => 1, 'Minus' => -1 );\n\n\n=head2 new\n\n Usage     : $hsp = Bio::Search::HSP::BlastHSP->new( %named_params );\n           : Bio::Search::HSP::BlastHSP objects are constructed\n           : automatically by Bio::SearchIO::BlastHitFactory,\n           : so there is no need for direct instantiation.\n Purpose   : Constructs a new BlastHSP object and Initializes key variables\n           : for the HSP.\n Returns   : A Bio::Search::HSP::BlastHSP object\n Argument  : Named parameters:\n           : Parameter keys are 
case-insensitive.\n           :      -RAW_DATA  => array ref containing raw BLAST report data for\n           :                    for a single HSP. This includes all lines\n           :                    of the HSP alignment from a traditional BLAST\n                                or PSI-BLAST (non-XML) report,\n           :      -RANK         => integer (1..n).\n           :      -PROGRAM      => string ('TBLASTN', 'BLASTP', etc.).\n           :      -QUERY_NAME   => string, id of query sequence\n           :      -HIT_NAME     => string, id of hit sequence\n           :\n Comments  : Having the raw data allows this object to do lazy parsing of\n           : the raw HSP data (i.e., not parsed until needed).\n           :\n           : Note that there is a fair amount of basic parsing that is\n           : currently performed in this module that would be more appropriate\n           : to do within a separate factory object.\n           : This parsing code will likely be relocated and more initialization\n           : parameters will be added to new().\n           :\nSee Also   : L<Bio::SeqFeature::SimilarityPair::new()>, L<Bio::SeqFeature::Similarity::new()>\n\n\n#----------------\nsub new {\n#----------------\n    my ($class, @args ) = @_;\n\n    my $self = $class->SUPER::new( @args );\n    # Initialize placeholders\n    $self->{'_queryGaps'} = $self->{'_sbjctGaps'} = 0;\n    my ($raw_data, $qname, $hname, $qlen, $hlen);\n\n    ($self->{'_prog'}, $self->{'_rank'}, $raw_data,\n     $qname, $hname) =\n\t $self->_rearrange([qw( PROGRAM\n\t\t\t\tRANK\n\t\t\t\tRAW_DATA\n\t\t\t\tQUERY_NAME\n\t\t\t\tHIT_NAME\n\t\t\t\t)], @args );\n\n    # _set_data() does a fair amount of parsing.\n    # This will likely change (see comment above.)\n    $self->_set_data( @{$raw_data} );\n    # Store the aligned query as sequence feature\n    my ($qb, $hb) = ($self->start());\n    my ($qe, $he) = ($self->end());\n    my ($qs, $hs) = ($self->strand());\n    my ($qf,$hf) = 
($self->query->frame(),\n\t\t    $self->hit->frame);\n\n    $self->query( Bio::SeqFeature::Similarity->new (-start   =>$qb,\n\t\t\t\t\t\t    -end     =>$qe,\n\t\t\t\t\t\t    -strand  =>$qs,\n\t\t\t\t\t\t    -bits    =>$self->bits,\n\t\t\t\t\t\t    -score   =>$self->score,\n\t\t\t\t\t\t    -frame   =>$qf,\n\t\t\t\t\t\t    -seq_id  => $qname,\n\t\t\t\t\t\t    -source  =>$self->{'_prog'} ));\n\n    $self->hit( Bio::SeqFeature::Similarity->new (-start   =>$hb,\n\t\t\t\t\t\t  -end     =>$he,\n\t\t\t\t\t\t  -strand  =>$hs,\n\t\t\t\t\t\t  -bits    =>$self->bits,\n\t\t\t\t\t\t  -score   =>$self->score,\n                                                  -frame   =>$hf,\n\t\t\t\t\t\t  -seq_id  => $hname,\n\t\t\t\t\t\t  -source  =>$self->{'_prog'} ));\n\n    # set lengths\n    $self->query->seqlength($qlen); # query\n    $self->hit->seqlength($hlen); # subject\n\n    $self->query->frac_identical($self->frac_identical('query'));\n    $self->hit->frac_identical($self->frac_identical('hit'));\n    return $self;\n}\n\n#sub DESTROY {\n#    my $self = shift;\n#    #print STDERR \"--->DESTROYING $self\\n\";\n#}\n\n\n# Title   : _id_str;\n# Purpose : Intended for internal use only to provide a string for use\n#           within exception messages to help users figure out which\n#           query/hit caused the problem.\n# Returns : Short string with name of query and hit seq\nsub _id_str {\n    my $self = shift;\n    if( not defined $self->{'_id_str'}) {\n        my $qname = $self->query->seq_id;\n        my $hname = $self->hit->seq_id;\n        $self->{'_id_str'} = \"QUERY=\\\"$qname\\\" HIT=\\\"$hname\\\"\";\n    }\n    return $self->{'_id_str'};\n}\n\n#=================================================\n# Begin Bio::Search::HSP::HSPI implementation\n#=================================================\n\n=head2 algorithm\n\n Title   : algorithm\n Usage   : $alg = $hsp->algorithm();\n Function: Gets the algorithm specification that was used to obtain the hsp\n           For BLAST, the 
algorithm denotes what type of sequence was aligned\n           against what (BLASTN: dna-dna, BLASTP prt-prt, BLASTX translated\n           dna-prt, TBLASTN prt-translated dna, TBLASTX translated\n           dna-translated dna).\n Returns : a scalar string\n Args    : none\n\n\n#----------------\nsub algorithm {\n#----------------\n    my ($self,@args) = @_;\n    return $self->{'_prog'};\n}\n\n\n\n\n=head2 signif()\n\n Usage     : $hsp_obj->signif()\n Purpose   : Get the P-value or Expect value for the HSP.\n Returns   : Float (0.001 or 1.3e-43)\n           : Returns P-value if it is defined, otherwise, Expect value.\n Argument  : n/a\n Throws    : n/a\n Comments  : Provided for consistency with BlastHit::signif()\n           : Support for returning the significance data in different\n           : formats (e.g., exponent only), is not provided for HSP objects.\n           : This is only available for the BlastHit or Blast object.\n\nSee Also   : L</p>, L</expect>, L<Bio::Search::Hit::BlastHit::signif()|Bio::Search::Hit::BlastHit>\n\n\n#-----------\nsub signif {\n#-----------\n    my $self = shift;\n    my $val ||= defined($self->{'_p'}) ? 
$self->{'_p'} : $self->{'_expect'};\n    $val;\n}\n\n\n\n=head2 evalue\n\n Usage     : $hsp_obj->evalue()\n Purpose   : Get the Expect value for the HSP.\n Returns   : Float (0.001 or 1.3e-43)\n Argument  : n/a\n Throws    : n/a\n Comments  : Support for returning the expectation data in different\n           : formats (e.g., exponent only), is not provided for HSP objects.\n           : This is only available for the BlastHit or Blast object.\n\nSee Also   : L</p>\n\n\n#----------\nsub evalue { shift->{'_expect'} }\n#----------\n\n\n=head2 p\n\n Usage     : $hsp_obj->p()\n Purpose   : Get the P-value for the HSP.\n Returns   : Float (0.001 or 1.3e-43) or undef if not defined.\n Argument  : n/a\n Throws    : n/a\n Comments  : P-value is not defined with NCBI Blast2 reports.\n           : Support for returning the expectation data in different\n           : formats (e.g., exponent only) is not provided for HSP objects.\n           : This is only available for the BlastHit or Blast object.\n\nSee Also   : L<expect()|expect>\n\n\n#-----\nsub p { my $self = shift; $self->{'_p'}; }\n#-----\n\n# alias\nsub pvalue { shift->p(@_); }\n\n=head2 length\n\n Usage     : $hsp->length( [seq_type] )\n Purpose   : Get the length of the aligned portion of the query or sbjct.\n Example   : $hsp->length('query')\n Returns   : integer\n Argument  : seq_type: 'query' | 'hit' or 'sbjct' | 'total'  (default = 'total')\n             ('sbjct' is synonymous with 'hit')\n Throws    : n/a\n Comments  : 'total' length is the full length of the alignment\n           : as reported in the denominators in the alignment section:\n           : \"Identical = 34/120 Positives = 67/120\".\n\nSee Also   : L</gaps>\n\n\n#-----------\nsub length {\n#-----------\n## Developer note: when using the built-in length function within\n##                 this module, call it as CORE::length().\n    my( $self, $seqType,$data ) = @_;\n    $seqType  ||= 'total';\n    $seqType = 'sbjct' if $seqType eq 'hit';\n\n    
$seqType ne 'total' and $self->_set_seq_data() unless $self->{'_set_seq_data'};\n\n    ## Sensitive to member name format.\n    $seqType = \"_\\L$seqType\\E\";\n    if( defined $data  ) {\n\t$self->{$seqType.'Length'} = $data;\n    }\n    $self->{$seqType.'Length'};\n}\n\n\n\n=head2 gaps\n\n Usage     : $hsp->gaps( [seq_type] )\n Purpose   : Get the number of gap characters in the query, sbjct, or total alignment.\n           : Also can return query gap chars and sbjct gap chars as a two-element list\n           : when in array context.\n Example   : $total_gaps      = $hsp->gaps();\n           : ($qgaps, $sgaps) = $hsp->gaps();\n           : $qgaps           = $hsp->gaps('query');\n Returns   : scalar context: integer\n           : array context without args: (int, int) = ('queryGaps', 'sbjctGaps')\n Argument  : seq_type: 'query' or 'hit' or 'sbjct' or 'total'\n           :  ('sbjct' is synonymous with 'hit')\n           : (default = 'total', scalar context)\n           : Array context can be \"induced\" by providing an argument of 'list' or 'array'.\n Throws    : n/a\n\nSee Also   : L</length>, L</matches>\n\n\n#---------\nsub gaps {\n#---------\n    my( $self, $seqType ) = @_;\n\n    $self->_set_seq_data() unless $self->{'_set_seq_data'};\n\n    $seqType  ||= (wantarray ? 
'list' : 'total');\n    $seqType = 'sbjct' if $seqType eq 'hit';\n\n    if($seqType =~ /list|array/i) {\n\treturn (($self->{'_queryGaps'} || 0), ($self->{'_sbjctGaps'} || 0));\n    }\n\n    if($seqType eq 'total') {\n\treturn ($self->{'_queryGaps'} + $self->{'_sbjctGaps'}) || 0;\n    } else {\n\t## Sensitive to member name format.\n\t$seqType = \"_\\L$seqType\\E\";\n\treturn $self->{$seqType.'Gaps'} || 0;\n    }\n}\n\n\n=head2 frac_identical\n\n Usage     : $hsp_object->frac_identical( [seq_type] );\n Purpose   : Get the fraction of identical positions within the given HSP.\n Example   : $frac_iden = $hsp_object->frac_identical('query');\n Returns   : Float (2-decimal precision, e.g., 0.75).\n Argument  : seq_type: 'query' or 'hit' or 'sbjct' or 'total'\n           :  ('sbjct' is synonymous with 'hit')\n           : default = 'total' (but see comments below).\n Throws    : n/a\n Comments  : Different versions of Blast report different values for the total\n           : length of the alignment. 
This is the number reported in the\n           : denominators in the stats section:\n           : \"Identical = 34/120 Positives = 67/120\".\n           : NCBI-BLAST uses the total length of the alignment (with gaps)\n           : WU-BLAST uses the length of the query sequence (without gaps).\n           : Therefore, when called without an argument or an argument of 'total',\n           : this method will report different values depending on the\n           : version of BLAST used.\n           :\n           : To get the fraction identical among only the aligned residues,\n           : ignoring the gaps, call this method with an argument of 'query'\n           : or 'sbjct' ('sbjct' is synonymous with 'hit').\n\nSee Also   : L</frac_conserved>, L</num_identical>, L</matches>\n\n\n#-------------------\nsub frac_identical {\n#-------------------\n# The value is calculated as opposed to storing it from the parsed results.\n# This saves storage and also permits flexibility in determining for which\n# sequence (query or sbjct) the figure is to be calculated.\n\n    my( $self, $seqType ) = @_;\n    $seqType ||= 'total';\n    $seqType = 'sbjct' if $seqType eq 'hit';\n\n    if($seqType ne 'total') {\n      $self->_set_seq_data() unless $self->{'_set_seq_data'};\n    }\n    ## Sensitive to member name format.\n    $seqType = \"_\\L$seqType\\E\";\n\n    sprintf( \"%.2f\", $self->{'_numIdentical'}/$self->{$seqType.'Length'});\n}\n\n\n=head2 frac_conserved\n\n Usage     : $hsp_object->frac_conserved( [seq_type] );\n Purpose   : Get the fraction of conserved positions within the given HSP.\n           : (Note: 'conservative' positions are called 'positives' in the\n\t   : Blast report.)\n Example   : $frac_cons = $hsp_object->frac_conserved('query');\n Returns   : Float (2-decimal precision, e.g., 0.75).\n Argument  : seq_type: 'query' or 'hit' or 'sbjct' or 'total'\n           :  ('sbjct' is synonymous with 'hit')\n           : default = 'total' (but see comments below).\n 
Throws    : n/a\n Comments  : Different versions of Blast report different values for the total\n           : length of the alignment. This is the number reported in the\n           : denominators in the stats section:\n           : \"Identical = 34/120 Positives = 67/120\".\n           : NCBI-BLAST uses the total length of the alignment (with gaps)\n           : WU-BLAST uses the length of the query sequence (without gaps).\n           : Therefore, when called without an argument or an argument of 'total',\n           : this method will report different values depending on the\n           : version of BLAST used.\n           :\n           : To get the fraction conserved among only the aligned residues,\n           : ignoring the gaps, call this method with an argument of 'query'\n           : or 'sbjct'.\n\nSee Also   : L</frac_conserved>, L</num_conserved>, L</matches>"},"detail":"($self,$seqType)"},{"name":"query_string","definition":"sub","containerName":"main::","range":{"start":{"line":578,"character":0},"end":{"character":9999,"line":578}},"kind":12,"children":[{"containerName":"query_string","name":"seq_str","kind":12,"line":578}],"line":578},{"containerName":"main::","definition":"sub","name":"hit_string","range":{"end":{"line":593,"character":9999},"start":{"character":0,"line":593}},"kind":12,"line":593,"children":[{"containerName":"hit_string","name":"seq_str","kind":12,"line":593}]},{"line":611,"children":[{"line":611,"kind":12,"name":"seq_str","containerName":"homology_string"}],"kind":12,"range":{"end":{"character":9999,"line":611},"start":{"line":611,"character":0}},"containerName":"main::","definition":"sub","name":"homology_string"},{"name":"expect","definition":"sub","containerName":"main::","range":{"end":{"character":9999,"line":627},"start":{"character":0,"line":627}},"kind":12,"children":[{"name":"evalue","containerName":"expect","kind":12,"line":627}],"line":627},{"range":{"end":{"character":9999,"line":644},"start":{"line":644,"character":0}
},"containerName":"main::","name":"rank","definition":"sub","line":644,"children":[],"kind":12},{"kind":12,"children":[{"containerName":"name","name":"rank","line":649,"kind":12}],"line":649,"definition":"sub","name":"name","containerName":"main::","range":{"start":{"line":649,"character":0},"end":{"character":9999,"line":649}}},{"line":668,"children":[{"containerName":"to_string","definition":"my","name":"$self","localvar":"my","kind":13,"line":670},{"kind":13,"line":671,"containerName":"to_string","name":"$self"},{"line":671,"kind":12,"name":"rank","containerName":"to_string"}],"kind":12,"range":{"end":{"character":9999,"line":672},"start":{"line":668,"character":0}},"containerName":"main::","name":"to_string","definition":"sub"},{"line":691,"children":[{"definition":"my","name":"$self","containerName":"_set_data","localvar":"my","kind":13,"line":693},{"line":694,"kind":13,"localvar":"my","name":"@data","definition":"my","containerName":"_set_data"},{"localvar":"my","containerName":"_set_data","definition":"my","name":"@queryList","line":695,"kind":13},{"kind":13,"line":696,"definition":"my","name":"@sbjctList","containerName":"_set_data","localvar":"my"},{"localvar":"my","containerName":"_set_data","definition":"my","name":"@matchList","line":697,"kind":13},{"line":698,"kind":13,"localvar":"my","name":"$matchLine","definition":"my","containerName":"_set_data"},{"containerName":"_set_data","name":"@linedat","definition":"my","localvar":"my","kind":13,"line":699},{"name":"$line","definition":"my","containerName":"_set_data","localvar":"my","kind":13,"line":703},{"line":703,"kind":13,"name":"$aln_row_len","containerName":"_set_data"},{"name":"$length_diff","containerName":"_set_data","line":703,"kind":13},{"name":"$length_diff","containerName":"_set_data","kind":13,"line":704},{"line":715,"kind":13,"name":"$line","containerName":"_set_data"},{"containerName":"_set_data","name":"@data","kind":13,"line":715},{"line":716,"kind":13,"name":"$line","containerName":"_set_d
ata"},{"name":"$line","containerName":"_set_data","kind":13,"line":718},{"containerName":"_set_data","name":"$self","line":719,"kind":13},{"line":719,"kind":12,"name":"_set_score_stats","containerName":"_set_data"},{"name":"$line","containerName":"_set_data","kind":13,"line":719},{"containerName":"_set_data","name":"$line","line":720,"kind":13},{"line":721,"kind":13,"name":"$self","containerName":"_set_data"},{"name":"_set_match_stats","containerName":"_set_data","line":721,"kind":12},{"kind":13,"line":721,"name":"$line","containerName":"_set_data"},{"containerName":"_set_data","name":"$line","line":722,"kind":13},{"kind":13,"line":726,"containerName":"_set_data","definition":"my","name":"$frame","localvar":"my"},{"name":"$self","containerName":"_set_data","kind":13,"line":727},{"containerName":"_set_data","name":"frame","line":727,"kind":12},{"line":727,"kind":13,"containerName":"_set_data","name":"$frame"},{"kind":13,"line":728,"name":"$line","containerName":"_set_data"},{"kind":13,"line":729,"name":"@queryList","containerName":"_set_data"},{"kind":13,"line":729,"name":"$line","containerName":"_set_data"},{"containerName":"_set_data","name":"$self","kind":13,"line":730},{"containerName":"_set_data","name":"$aln_row_len","kind":13,"line":731},{"kind":13,"line":732,"containerName":"_set_data","name":"$matchLine"},{"containerName":"_set_data","name":"$matchLine","line":733,"kind":13},{"line":735,"kind":13,"name":"$length_diff","containerName":"_set_data"},{"line":735,"kind":13,"containerName":"_set_data","name":"$aln_row_len"},{"name":"$line","containerName":"_set_data","kind":13,"line":735},{"containerName":"_set_data","name":"$length_diff","line":736,"kind":13},{"line":736,"kind":13,"name":"$line","containerName":"_set_data"},{"containerName":"_set_data","name":"$length_diff","kind":13,"line":736},{"kind":13,"line":737,"name":"@matchList","containerName":"_set_data"},{"name":"$line","containerName":"_set_data","kind":13,"line":737},{"name":"$matchLine","containerNa
me":"_set_data","line":738,"kind":13},{"kind":13,"line":739,"name":"$line","containerName":"_set_data"},{"name":"@sbjctList","containerName":"_set_data","kind":13,"line":740},{"line":740,"kind":13,"name":"$line","containerName":"_set_data"},{"kind":13,"line":745,"containerName":"_set_data","name":"$self"},{"name":"@queryList","containerName":"_set_data","kind":13,"line":745},{"name":"$self","containerName":"_set_data","line":746,"kind":13},{"containerName":"_set_data","name":"@sbjctList","kind":13,"line":746},{"name":"$self","containerName":"_set_data","line":749,"kind":13},{"kind":13,"line":749,"name":"@matchList","containerName":"_set_data"},{"line":751,"kind":13,"containerName":"_set_data","name":"$self"},{"name":"$id_str","definition":"my","containerName":"_set_data","localvar":"my","kind":13,"line":752},{"containerName":"_set_data","name":"$self","line":752,"kind":13},{"line":752,"kind":12,"name":"_id_str","containerName":"_set_data"},{"name":"$self","containerName":"_set_data","line":753,"kind":13},{"line":753,"kind":12,"containerName":"_set_data","name":"throw"},{"kind":13,"line":756,"containerName":"_set_data","name":"@queryList"},{"line":756,"kind":13,"name":"@sbjctList","containerName":"_set_data"},{"localvar":"my","containerName":"_set_data","name":"$id_str","definition":"my","line":757,"kind":13},{"containerName":"_set_data","name":"$self","kind":13,"line":757},{"line":757,"kind":12,"name":"_id_str","containerName":"_set_data"},{"containerName":"_set_data","name":"$self","kind":13,"line":758},{"containerName":"_set_data","name":"throw","kind":12,"line":758}],"kind":12,"range":{"start":{"character":0,"line":691},"end":{"line":760,"character":9999}},"containerName":"main::","name":"_set_data","definition":"sub"},{"kind":12,"line":730,"containerName":"length","name":"CORE"},{"containerName":"length","name":"CORE","kind":12,"line":731},{"line":731,"kind":12,"name":"CORE","containerName":"length"},{"line":736,"kind":12,"containerName":"length","name":"CORE"},
{"range":{"end":{"character":9999,"line":825},"start":{"character":0,"line":780}},"name":"_set_score_stats","detail":"($self,$data)","signature":{"label":"_set_score_stats($self,$data)","parameters":[{"label":"$self"},{"label":"$data"}],"documentation":"1;\n#-----------------------------------------------------------------\n# $Id: BlastHSP.pm 16123 2009-09-17 12:57:27Z cjfields $\n#\n# BioPerl module Bio::Search::HSP::BlastHSP\n#\n# (This module was originally called Bio::Tools::Blast::HSP)\n#\n# Please direct questions and support issues to <bioperl-l@bioperl.org> \n#\n# Cared for by Steve Chervitz <sac@bioperl.org>\n#\n# You may distribute this module under the same terms as perl itself\n#-----------------------------------------------------------------\n\n## POD Documentation:\n\n=head1 NAME\n\nBio::Search::HSP::BlastHSP - Bioperl BLAST High-Scoring Pair object\n\n=head1 SYNOPSIS\n\nSee L<Bio::Search::Hit::BlastHit>.\n\n=head1 DESCRIPTION\n\nA Bio::Search::HSP::BlastHSP object provides an interface to data\nobtained in a single alignment section of a Blast report (known as a\n\"High-scoring Segment Pair\"). This is essentially a pairwise\nalignment with score information.\n\nBlastHSP objects are accessed via L<Bio::Search::Hit::BlastHit>\nobjects after parsing a BLAST report using the L<Bio::SearchIO>\nsystem.\n\nThe construction of BlastHSP objects is performed by\nBio::Factory::BlastHitFactory in a process that is\norchestrated by the Blast parser (L<Bio::SearchIO::psiblast>).\nThe resulting BlastHSPs are then accessed via\nL<Bio::Search::Hit::BlastHit>). Therefore, you do not need to\nuse L<Bio::Search::HSP::BlastHSP>) directly. If you need to construct\nBlastHSPs directly, see the new() function for details.\n\nFor L<Bio::SearchIO> BLAST parsing usage examples, see the\nC<examples/searchio> directory of the Bioperl distribution.\n\n=head2 Start and End coordinates\n\nSequence endpoints are swapped so that start is always less than\nend. 
This affects TBLASTN/X hits on the minus strand. Strand\ninformation can be recovered using the strand() method. This\nnormalization step is standard Bioperl practice. It also facilitates\nuse of range information by methods such as match().\n\n=over 1\n\n* * Supports BLAST versions 1.x and 2.x, gapped and ungapped.\n\n\nBio::Search::HSP::BlastHSP.pm has the ability to extract a list of all\nresidue indices for identical and conservative matches along both\nquery and sbjct sequences. Since this degree of detail is not always\nneeded, this behavior does not occur during construction of the BlastHSP\nobject.  These data will automatically be collected as necessary as\nthe BlastHSP.pm object is used.\n\n=head1 DEPENDENCIES\n\nBio::Search::HSP::BlastHSP.pm is a concrete class that inherits from\nL<Bio::SeqFeature::SimilarityPair> and L<Bio::Search::HSP::HSPI>.\nL<Bio::Seq> and L<Bio::SimpleAlign> are employed for creating\nsequence and alignment objects, respectively.\n\n=head2 Relationship to SimpleAlign.pm & Seq.pm\n\nBlastHSP.pm can provide the query or sbjct sequence as a L<Bio::Seq>\nobject via the L<seq()|seq> method. The BlastHSP.pm object can also create a\ntwo-sequence L<Bio::SimpleAlign> alignment object using the query\nand sbjct sequences via the L<get_aln()|get_aln> method. Creation of alignment\nobjects is not automatic when constructing the BlastHSP.pm object since\nthis level of functionality is not always required and would generate\na lot of extra overhead when crunching many reports.\n\n\n=head1 FEEDBACK\n\n=head2 Mailing Lists\n\nUser feedback is an integral part of the evolution of this and other\nBioperl modules.  Send your comments and suggestions preferably to one\nof the Bioperl mailing lists.  
Your participation is much appreciated.\n\n  bioperl-l@bioperl.org                  - General discussion\n  http://bioperl.org/wiki/Mailing_lists  - About the mailing lists\n\n=head2 Support \n\nPlease direct usage questions or support issues to the mailing list:\n\nI<bioperl-l@bioperl.org>\n\nrather than to the module maintainer directly. Many experienced and \nresponsive experts will be able to look at the problem and quickly \naddress it. Please include a thorough description of the problem \nwith code and data examples if at all possible.\n\n=head2 Reporting Bugs\n\nReport bugs to the Bioperl bug tracking system to help us keep track\nof the bugs and their resolution. Bug reports can be submitted via the\nweb:\n\n  http://bugzilla.open-bio.org/\n\n=head1 AUTHOR\n\nSteve Chervitz E<lt>sac-at-bioperl.orgE<gt>\n\nSee L<the FEEDBACK section | FEEDBACK> for where to send bug reports and comments.\n\n=head1 ACKNOWLEDGEMENTS\n\nThis software was originally developed in the Department of Genetics\nat Stanford University. I would also like to acknowledge my\ncolleagues at Affymetrix for useful feedback.\n\n=head1 SEE ALSO\n\n Bio::Search::Hit::BlastHit.pm          - Blast hit object.\n Bio::Search::Result::BlastResult.pm    - Blast Result object.\n Bio::Seq.pm                            - Biosequence object\n\n=head2 Links:\n\n http://bio.perl.org/                       - Bioperl Project Homepage\n\n=head1 COPYRIGHT\n\nCopyright (c) 1996-2001 Steve Chervitz. 
All Rights Reserved.\n\n=head1 DISCLAIMER\n\nThis software is provided \"as is\" without warranty of any kind.\n\n\n\n# END of main POD documentation.\n\n=head1 APPENDIX\n\nThe rest of the documentation details each of the object methods.\nInternal methods are usually preceded with a _\n\n\n# Let the code begin...\n\npackage Bio::Search::HSP::BlastHSP;\n\nuse strict;\nuse Bio::SeqFeature::Similarity;\n\nuse vars qw($GAP_SYMBOL %STRAND_SYMBOL);\n\nuse overload\n    '\"\"' => \\&to_string;\n\nuse base qw(Bio::SeqFeature::SimilarityPair Bio::Search::HSP::HSPI);\n\n$GAP_SYMBOL    = '-';          # Need a more general way to handle gap symbols.\n%STRAND_SYMBOL = ('Plus' => 1, 'Minus' => -1 );\n\n\n=head2 new\n\n Usage     : $hsp = Bio::Search::HSP::BlastHSP->new( %named_params );\n           : Bio::Search::HSP::BlastHSP objects are constructed\n           : automatically by Bio::SearchIO::BlastHitFactory,\n           : so there is no need for direct instantiation.\n Purpose   : Constructs a new BlastHSP object and Initializes key variables\n           : for the HSP.\n Returns   : A Bio::Search::HSP::BlastHSP object\n Argument  : Named parameters:\n           : Parameter keys are case-insensitive.\n           :      -RAW_DATA  => array ref containing raw BLAST report data for\n           :                    for a single HSP. 
This includes all lines\n           :                    of the HSP alignment from a traditional BLAST\n                                or PSI-BLAST (non-XML) report,\n           :      -RANK         => integer (1..n).\n           :      -PROGRAM      => string ('TBLASTN', 'BLASTP', etc.).\n           :      -QUERY_NAME   => string, id of query sequence\n           :      -HIT_NAME     => string, id of hit sequence\n           :\n Comments  : Having the raw data allows this object to do lazy parsing of\n           : the raw HSP data (i.e., not parsed until needed).\n           :\n           : Note that there is a fair amount of basic parsing that is\n           : currently performed in this module that would be more appropriate\n           : to do within a separate factory object.\n           : This parsing code will likely be relocated and more initialization\n           : parameters will be added to new().\n           :\nSee Also   : L<Bio::SeqFeature::SimilarityPair::new()>, L<Bio::SeqFeature::Similarity::new()>\n\n\n#----------------\nsub new {\n#----------------\n    my ($class, @args ) = @_;\n\n    my $self = $class->SUPER::new( @args );\n    # Initialize placeholders\n    $self->{'_queryGaps'} = $self->{'_sbjctGaps'} = 0;\n    my ($raw_data, $qname, $hname, $qlen, $hlen);\n\n    ($self->{'_prog'}, $self->{'_rank'}, $raw_data,\n     $qname, $hname) =\n\t $self->_rearrange([qw( PROGRAM\n\t\t\t\tRANK\n\t\t\t\tRAW_DATA\n\t\t\t\tQUERY_NAME\n\t\t\t\tHIT_NAME\n\t\t\t\t)], @args );\n\n    # _set_data() does a fair amount of parsing.\n    # This will likely change (see comment above.)\n    $self->_set_data( @{$raw_data} );\n    # Store the aligned query as sequence feature\n    my ($qb, $hb) = ($self->start());\n    my ($qe, $he) = ($self->end());\n    my ($qs, $hs) = ($self->strand());\n    my ($qf,$hf) = ($self->query->frame(),\n\t\t    $self->hit->frame);\n\n    $self->query( Bio::SeqFeature::Similarity->new (-start   =>$qb,\n\t\t\t\t\t\t    -end     
=>$qe,\n\t\t\t\t\t\t    -strand  =>$qs,\n\t\t\t\t\t\t    -bits    =>$self->bits,\n\t\t\t\t\t\t    -score   =>$self->score,\n\t\t\t\t\t\t    -frame   =>$qf,\n\t\t\t\t\t\t    -seq_id  => $qname,\n\t\t\t\t\t\t    -source  =>$self->{'_prog'} ));\n\n    $self->hit( Bio::SeqFeature::Similarity->new (-start   =>$hb,\n\t\t\t\t\t\t  -end     =>$he,\n\t\t\t\t\t\t  -strand  =>$hs,\n\t\t\t\t\t\t  -bits    =>$self->bits,\n\t\t\t\t\t\t  -score   =>$self->score,\n                                                  -frame   =>$hf,\n\t\t\t\t\t\t  -seq_id  => $hname,\n\t\t\t\t\t\t  -source  =>$self->{'_prog'} ));\n\n    # set lengths\n    $self->query->seqlength($qlen); # query\n    $self->hit->seqlength($hlen); # subject\n\n    $self->query->frac_identical($self->frac_identical('query'));\n    $self->hit->frac_identical($self->frac_identical('hit'));\n    return $self;\n}\n\n#sub DESTROY {\n#    my $self = shift;\n#    #print STDERR \"--->DESTROYING $self\\n\";\n#}\n\n\n# Title   : _id_str;\n# Purpose : Intended for internal use only to provide a string for use\n#           within exception messages to help users figure out which\n#           query/hit caused the problem.\n# Returns : Short string with name of query and hit seq\nsub _id_str {\n    my $self = shift;\n    if( not defined $self->{'_id_str'}) {\n        my $qname = $self->query->seq_id;\n        my $hname = $self->hit->seq_id;\n        $self->{'_id_str'} = \"QUERY=\\\"$qname\\\" HIT=\\\"$hname\\\"\";\n    }\n    return $self->{'_id_str'};\n}\n\n#=================================================\n# Begin Bio::Search::HSP::HSPI implementation\n#=================================================\n\n=head2 algorithm\n\n Title   : algorithm\n Usage   : $alg = $hsp->algorithm();\n Function: Gets the algorithm specification that was used to obtain the hsp\n           For BLAST, the algorithm denotes what type of sequence was aligned\n           against what (BLASTN: dna-dna, BLASTP prt-prt, BLASTX translated\n           dna-prt, 
TBLASTN prt-translated dna, TBLASTX translated\n           dna-translated dna).\n Returns : a scalar string\n Args    : none\n\n\n#----------------\nsub algorithm {\n#----------------\n    my ($self,@args) = @_;\n    return $self->{'_prog'};\n}\n\n\n\n\n=head2 signif()\n\n Usage     : $hsp_obj->signif()\n Purpose   : Get the P-value or Expect value for the HSP.\n Returns   : Float (0.001 or 1.3e-43)\n           : Returns P-value if it is defined, otherwise, Expect value.\n Argument  : n/a\n Throws    : n/a\n Comments  : Provided for consistency with BlastHit::signif()\n           : Support for returning the significance data in different\n           : formats (e.g., exponent only), is not provided for HSP objects.\n           : This is only available for the BlastHit or Blast object.\n\nSee Also   : L</p>, L</expect>, L<Bio::Search::Hit::BlastHit::signif()|Bio::Search::Hit::BlastHit>\n\n\n#-----------\nsub signif {\n#-----------\n    my $self = shift;\n    my $val ||= defined($self->{'_p'}) ? 
$self->{'_p'} : $self->{'_expect'};\n    $val;\n}\n\n\n\n=head2 evalue\n\n Usage     : $hsp_obj->evalue()\n Purpose   : Get the Expect value for the HSP.\n Returns   : Float (0.001 or 1.3e-43)\n Argument  : n/a\n Throws    : n/a\n Comments  : Support for returning the expectation data in different\n           : formats (e.g., exponent only), is not provided for HSP objects.\n           : This is only available for the BlastHit or Blast object.\n\nSee Also   : L</p>\n\n\n#----------\nsub evalue { shift->{'_expect'} }\n#----------\n\n\n=head2 p\n\n Usage     : $hsp_obj->p()\n Purpose   : Get the P-value for the HSP.\n Returns   : Float (0.001 or 1.3e-43) or undef if not defined.\n Argument  : n/a\n Throws    : n/a\n Comments  : P-value is not defined with NCBI Blast2 reports.\n           : Support for returning the expectation data in different\n           : formats (e.g., exponent only) is not provided for HSP objects.\n           : This is only available for the BlastHit or Blast object.\n\nSee Also   : L<expect()|expect>\n\n\n#-----\nsub p { my $self = shift; $self->{'_p'}; }\n#-----\n\n# alias\nsub pvalue { shift->p(@_); }\n\n=head2 length\n\n Usage     : $hsp->length( [seq_type] )\n Purpose   : Get the length of the aligned portion of the query or sbjct.\n Example   : $hsp->length('query')\n Returns   : integer\n Argument  : seq_type: 'query' | 'hit' or 'sbjct' | 'total'  (default = 'total')\n             ('sbjct' is synonymous with 'hit')\n Throws    : n/a\n Comments  : 'total' length is the full length of the alignment\n           : as reported in the denominators in the alignment section:\n           : \"Identical = 34/120 Positives = 67/120\".\n\nSee Also   : L</gaps>\n\n\n#-----------\nsub length {\n#-----------\n## Developer note: when using the built-in length function within\n##                 this module, call it as CORE::length().\n    my( $self, $seqType,$data ) = @_;\n    $seqType  ||= 'total';\n    $seqType = 'sbjct' if $seqType eq 'hit';\n\n    
$seqType ne 'total' and $self->_set_seq_data() unless $self->{'_set_seq_data'};\n\n    ## Sensitive to member name format.\n    $seqType = \"_\\L$seqType\\E\";\n    if( defined $data  ) {\n\t$self->{$seqType.'Length'} = $data;\n    }\n    $self->{$seqType.'Length'};\n}\n\n\n\n=head2 gaps\n\n Usage     : $hsp->gaps( [seq_type] )\n Purpose   : Get the number of gap characters in the query, sbjct, or total alignment.\n           : Also can return query gap chars and sbjct gap chars as a two-element list\n           : when in array context.\n Example   : $total_gaps      = $hsp->gaps();\n           : ($qgaps, $sgaps) = $hsp->gaps();\n           : $qgaps           = $hsp->gaps('query');\n Returns   : scalar context: integer\n           : array context without args: (int, int) = ('queryGaps', 'sbjctGaps')\n Argument  : seq_type: 'query' or 'hit' or 'sbjct' or 'total'\n           :  ('sbjct' is synonymous with 'hit')\n           : (default = 'total', scalar context)\n           : Array context can be \"induced\" by providing an argument of 'list' or 'array'.\n Throws    : n/a\n\nSee Also   : L</length>, L</matches>\n\n\n#---------\nsub gaps {\n#---------\n    my( $self, $seqType ) = @_;\n\n    $self->_set_seq_data() unless $self->{'_set_seq_data'};\n\n    $seqType  ||= (wantarray ? 
'list' : 'total');\n    $seqType = 'sbjct' if $seqType eq 'hit';\n\n    if($seqType =~ /list|array/i) {\n\treturn (($self->{'_queryGaps'} || 0), ($self->{'_sbjctGaps'} || 0));\n    }\n\n    if($seqType eq 'total') {\n\treturn ($self->{'_queryGaps'} + $self->{'_sbjctGaps'}) || 0;\n    } else {\n\t## Sensitive to member name format.\n\t$seqType = \"_\\L$seqType\\E\";\n\treturn $self->{$seqType.'Gaps'} || 0;\n    }\n}\n\n\n=head2 frac_identical\n\n Usage     : $hsp_object->frac_identical( [seq_type] );\n Purpose   : Get the fraction of identical positions within the given HSP.\n Example   : $frac_iden = $hsp_object->frac_identical('query');\n Returns   : Float (2-decimal precision, e.g., 0.75).\n Argument  : seq_type: 'query' or 'hit' or 'sbjct' or 'total'\n           :  ('sbjct' is synonymous with 'hit')\n           : default = 'total' (but see comments below).\n Throws    : n/a\n Comments  : Different versions of Blast report different values for the total\n           : length of the alignment. 
This is the number reported in the\n           : denominators in the stats section:\n           : \"Identical = 34/120 Positives = 67/120\".\n           : NCBI-BLAST uses the total length of the alignment (with gaps)\n           : WU-BLAST uses the length of the query sequence (without gaps).\n           : Therefore, when called without an argument or an argument of 'total',\n           : this method will report different values depending on the\n           : version of BLAST used.\n           :\n           : To get the fraction identical among only the aligned residues,\n           : ignoring the gaps, call this method with an argument of 'query'\n           : or 'sbjct' ('sbjct' is synonymous with 'hit').\n\nSee Also   : L</frac_conserved>, L</num_identical>, L</matches>\n\n\n#-------------------\nsub frac_identical {\n#-------------------\n# The value is calculated as opposed to storing it from the parsed results.\n# This saves storage and also permits flexibility in determining for which\n# sequence (query or sbjct) the figure is to be calculated.\n\n    my( $self, $seqType ) = @_;\n    $seqType ||= 'total';\n    $seqType = 'sbjct' if $seqType eq 'hit';\n\n    if($seqType ne 'total') {\n      $self->_set_seq_data() unless $self->{'_set_seq_data'};\n    }\n    ## Sensitive to member name format.\n    $seqType = \"_\\L$seqType\\E\";\n\n    sprintf( \"%.2f\", $self->{'_numIdentical'}/$self->{$seqType.'Length'});\n}\n\n\n=head2 frac_conserved\n\n Usage     : $hsp_object->frac_conserved( [seq_type] );\n Purpose   : Get the fraction of conserved positions within the given HSP.\n           : (Note: 'conservative' positions are called 'positives' in the\n\t   : Blast report.)\n Example   : $frac_cons = $hsp_object->frac_conserved('query');\n Returns   : Float (2-decimal precision, e.g., 0.75).\n Argument  : seq_type: 'query' or 'hit' or 'sbjct' or 'total'\n           :  ('sbjct' is synonymous with 'hit')\n           : default = 'total' (but see comments below).\n 
Throws    : n/a\n Comments  : Different versions of Blast report different values for the total\n           : length of the alignment. This is the number reported in the\n           : denominators in the stats section:\n           : \"Identical = 34/120 Positives = 67/120\".\n           : NCBI-BLAST uses the total length of the alignment (with gaps)\n           : WU-BLAST uses the length of the query sequence (without gaps).\n           : Therefore, when called without an argument or an argument of 'total',\n           : this method will report different values depending on the\n           : version of BLAST used.\n           :\n           : To get the fraction conserved among only the aligned residues,\n           : ignoring the gaps, call this method with an argument of 'query'\n           : or 'sbjct'.\n\nSee Also   : L</frac_conserved>, L</num_conserved>, L</matches>\n\n\n#--------------------\nsub frac_conserved {\n#--------------------\n# The value is calculated as opposed to storing it from the parsed results.\n# This saves storage and also permits flexibility in determining for which\n# sequence (query or sbjct) the figure is to be calculated.\n\n    my( $self, $seqType ) = @_;\n    $seqType ||= 'total';\n    $seqType = 'sbjct' if $seqType eq 'hit';\n\n    if($seqType ne 'total') {\n      $self->_set_seq_data() unless $self->{'_set_seq_data'};\n    }\n\n    ## Sensitive to member name format.\n    $seqType = \"_\\L$seqType\\E\";\n\n    sprintf( \"%.2f\", $self->{'_numConserved'}/$self->{$seqType.'Length'});\n}\n\n=head2 query_string\n\n Title   : query_string\n Usage   : my $qseq = $hsp->query_string;\n Function: Retrieves the query sequence of this HSP as a string\n Returns : string\n Args    : none\n\n\n\n#----------------\nsub query_string{ shift->seq_str('query'); }\n#----------------\n\n=head2 hit_string\n\n Title   : hit_string\n Usage   : my $hseq = $hsp->hit_string;\n Function: Retrieves the hit sequence of this HSP as a string\n Returns : string\n 
Args    : none\n\n\n\n#----------------\nsub hit_string{ shift->seq_str('hit'); }\n#----------------\n\n\n=head2 homology_string\n\n Title   : homology_string\n Usage   : my $homo_string = $hsp->homology_string;\n Function: Retrieves the homology sequence for this HSP as a string.\n         : The homology sequence is the string of symbols in between the\n         : query and hit sequences in the alignment indicating the degree\n         : of conservation (e.g., identical, similar, not similar).\n Returns : string\n Args    : none\n\n\n#----------------\nsub homology_string{ shift->seq_str('match'); }\n#----------------\n\n#=================================================\n# End Bio::Search::HSP::HSPI implementation\n#=================================================\n\n# Older method delegating to method defined in HSPI.\n\n=head2 expect\n\nSee L<Bio::Search::HSP::HSPI::expect()|Bio::Search::HSP::HSPI>\n\n\n#----------\nsub expect { shift->evalue( @_ ); }\n#----------\n\n\n=head2 rank\n\n Usage     : $hsp->rank( [string] );\n Purpose   : Get the rank of the HSP within a given Blast hit.\n Example   : $rank = $hsp->rank;\n Returns   : Integer (1..n) corresponding to the order in which the HSP\n             appears in the BLAST report.\n\n\n#'\n\n#----------\nsub rank { shift->{'_rank'} }\n#----------\n\n# For backward compatibility\n#----------\nsub name { shift->rank }\n#----------\n\n=head2 to_string\n\n Title   : to_string\n Usage   : print $hsp->to_string;\n Function: Returns a string representation for the Blast HSP.\n           Primarily intended for debugging purposes.\n Example : see usage\n Returns : A string of the form:\n           [BlastHSP] <rank>\n           e.g.:\n           [BlastHSP] 1\n Args    : None\n\n\n#----------\nsub to_string {\n#----------\n    my $self = shift;\n    return \"[BlastHSP] \" . 
$self->rank();\n}\n\n\n=head2 _set_data\n\n Usage     : called automatically during object construction.\n Purpose   : Parses the raw HSP section from a flat BLAST report and\n             sets the query sequence, sbjct sequence, and the \"match\" data\n           : which consists of the symbols between the query and sbjct lines\n           : in the alignment.\n Argument  : Array (all lines for a single, complete HSP, from a raw,\n             flat (i.e., non-XML) BLAST report)\n Throws    : Propagates any exceptions from the methods called (\"See Also\")\n\nSee Also   : L</_set_seq>, L</_set_score_stats>, L</_set_match_stats>\n\n\n#--------------\nsub _set_data {\n#--------------\n    my $self = shift;\n    my @data = @_;\n    my @queryList  = ();  # 'Query' = SEQUENCE USED TO QUERY THE DATABASE.\n    my @sbjctList  = ();  # 'Sbjct' = HOMOLOGOUS SEQUENCE FOUND IN THE DATABASE.\n    my @matchList  = ();\n    my $matchLine  = 0;   # Alternating boolean: when true, load 'match' data.\n    my @linedat = ();\n\n    #print STDERR \"BlastHSP: set_data()\\n\";\n\n    my($line, $aln_row_len, $length_diff);\n    $length_diff = 0;\n\n    # Collecting data for all lines in the alignment\n    # and then storing the collections for possible processing later.\n    #\n    # Note that \"match\" lines may not be properly padded with spaces.\n    # This loop now properly handles such cases:\n    # Query: 1141 PSLVELTIRDCPRLEVGPMIRSLPKFPMLKKLDLAVANIIEEDLDVIGSLEELVIXXXXX 1200\n    #             PSLVELTIRDCPRLEVGPMIRSLPKFPMLKKLDLAVANIIEEDLDVIGSLEELVI\n    # Sbjct: 1141 PSLVELTIRDCPRLEVGPMIRSLPKFPMLKKLDLAVANIIEEDLDVIGSLEELVILSLKL 1200\n\n    foreach $line( @data ) {\n\tnext if $line =~ /^\\s*$/;\n\n\tif( $line =~ /^ ?Score/ ) {\n\t    $self->_set_score_stats( $line );\n\t} elsif( $line =~ /^ ?(Identities|Positives|Strand)/ ) {\n\t    $self->_set_match_stats( $line );\n\t} elsif( $line =~ /^ ?Frame = ([\\d+-]+)/ ) {\n\t  # Version 2.0.8 has Frame information on a separate line.\n\t  # 
Storing frame according to SeqFeature::Generic::frame()\n\t  # which does not contain strand info (use strand()).\n\t  my $frame = abs($1) - 1;\n\t  $self->frame( $frame );\n\t} elsif( $line =~ /^(Query:?[\\s\\d]+)([^\\s\\d]+)/ ) {\n\t    push @queryList, $line;\n\t    $self->{'_match_indent'} = CORE::length $1;\n\t    $aln_row_len = (CORE::length $1) + (CORE::length $2);\n\t    $matchLine = 1;\n\t} elsif( $matchLine ) {\n\t    # Pad the match line with spaces if necessary.\n\t    $length_diff = $aln_row_len - CORE::length $line;\n\t    $length_diff and $line .= ' 'x $length_diff;\n\t    push @matchList, $line;\n\t    $matchLine = 0;\n\t} elsif( $line =~ /^Sbjct/ ) {\n\t    push @sbjctList, $line;\n\t}\n    }\n    # Storing the query and sbjct lists in case they are needed later.\n    # We could make this conditional to save memory.\n    $self->{'_queryList'} = \\@queryList;\n    $self->{'_sbjctList'} = \\@sbjctList;\n\n    # Storing the match list in case it is needed later.\n    $self->{'_matchList'} = \\@matchList;\n\n    if(not defined ($self->{'_numIdentical'})) {\n        my $id_str = $self->_id_str;\n        $self->throw( -text  => \"Can't parse match statistics. Possibly a new or unrecognized Blast format. ($id_str)\");\n    }\n\n    if(!scalar @queryList or !scalar @sbjctList) {\n        my $id_str = $self->_id_str;\n        $self->throw( \"Can't find query or sbjct alignment lines. Possibly unrecognized Blast format. 
($id_str)\");\n    }\n}\n\n\n=head2 _set_score_stats\n\n Usage     : called automatically by _set_data()\n Purpose   : Sets various score statistics obtained from the HSP listing.\n Argument  : String with any of the following formats:\n           : blast2:  Score = 30.1 bits (66), Expect = 9.2\n           : blast2:  Score = 158.2 bits (544), Expect(2) = e-110\n           : blast1:  Score = 410 (144.3 bits), Expect = 1.7e-40, P = 1.7e-40\n           : blast1:  Score = 55 (19.4 bits), Expect = 5.3, Sum P(3) = 0.99\n Throws    : Exception if the stats cannot be parsed, probably due to a change\n           : in the Blast report format.\n\nSee Also   : L</_set_data>"},"containerName":"main::","definition":"sub","line":780,"children":[{"name":"$self","definition":"my","containerName":"_set_score_stats","localvar":"my","kind":13,"line":782},{"kind":13,"line":782,"containerName":"_set_score_stats","name":"$data"},{"line":784,"kind":13,"localvar":"my","containerName":"_set_score_stats","name":"$expect","definition":"my"},{"containerName":"_set_score_stats","name":"$p","kind":13,"line":784},{"name":"$data","containerName":"_set_score_stats","line":786,"kind":13},{"kind":13,"line":788,"containerName":"_set_score_stats","name":"$self"},{"containerName":"_set_score_stats","name":"bits","line":788,"kind":12},{"name":"$self","containerName":"_set_score_stats","line":789,"kind":13},{"line":789,"kind":12,"name":"score","containerName":"_set_score_stats"},{"containerName":"_set_score_stats","name":"$expect","line":790,"kind":13},{"name":"$data","containerName":"_set_score_stats","kind":13,"line":791},{"kind":13,"line":793,"name":"$self","containerName":"_set_score_stats"},{"containerName":"_set_score_stats","name":"bits","line":793,"kind":12},{"kind":13,"line":794,"containerName":"_set_score_stats","name":"$self"},{"line":794,"kind":12,"containerName":"_set_score_stats","name":"score"},{"name":"$self","containerName":"_set_score_stats","line":795,"kind":13},{"kind":13,"line":796,"co
ntainerName":"_set_score_stats","name":"$expect"},{"line":798,"kind":13,"name":"$data","containerName":"_set_score_stats"},{"line":800,"kind":13,"name":"$self","containerName":"_set_score_stats"},{"kind":12,"line":800,"containerName":"_set_score_stats","name":"score"},{"name":"$self","containerName":"_set_score_stats","line":801,"kind":13},{"kind":12,"line":801,"name":"bits","containerName":"_set_score_stats"},{"name":"$expect","containerName":"_set_score_stats","line":802,"kind":13},{"containerName":"_set_score_stats","name":"$p","line":803,"kind":13},{"containerName":"_set_score_stats","name":"$data","kind":13,"line":805},{"name":"$self","containerName":"_set_score_stats","kind":13,"line":807},{"containerName":"_set_score_stats","name":"score","kind":12,"line":807},{"name":"$self","containerName":"_set_score_stats","line":808,"kind":13},{"line":808,"kind":12,"name":"bits","containerName":"_set_score_stats"},{"kind":13,"line":809,"containerName":"_set_score_stats","name":"$expect"},{"kind":13,"line":810,"containerName":"_set_score_stats","name":"$self"},{"line":811,"kind":13,"containerName":"_set_score_stats","name":"$p"},{"name":"$id_str","definition":"my","containerName":"_set_score_stats","localvar":"my","kind":13,"line":814},{"name":"$self","containerName":"_set_score_stats","kind":13,"line":814},{"kind":12,"line":814,"name":"_id_str","containerName":"_set_score_stats"},{"name":"$self","containerName":"_set_score_stats","kind":13,"line":815},{"line":815,"kind":12,"containerName":"_set_score_stats","name":"throw"},{"line":817,"kind":13,"containerName":"_set_score_stats","name":"$data"},{"kind":13,"line":819,"containerName":"_set_score_stats","name":"$expect"},{"containerName":"_set_score_stats","name":"$expect","kind":13,"line":819},{"line":820,"kind":13,"name":"$p","containerName":"_set_score_stats"},{"kind":13,"line":820,"name":"$p","containerName":"_set_score_stats"},{"line":820,"kind":13,"name":"$p","containerName":"_set_score_stats"},{"containerName":"_set_
score_stats","name":"$self","line":822,"kind":13},{"name":"$expect","containerName":"_set_score_stats","kind":13,"line":822},{"kind":13,"line":823,"name":"$self","containerName":"_set_score_stats"},{"name":"$p","containerName":"_set_score_stats","kind":13,"line":823},{"name":"$self","containerName":"_set_score_stats","kind":13,"line":824},{"kind":12,"line":824,"name":"significance","containerName":"_set_score_stats"},{"kind":13,"line":824,"name":"$p","containerName":"_set_score_stats"},{"name":"$expect","containerName":"_set_score_stats","line":824,"kind":13}],"kind":12},{"name":"_set_match_stats","range":{"end":{"line":883,"character":9999},"start":{"character":0,"line":851}},"definition":"sub","containerName":"main::","signature":{"label":"_set_match_stats($self,$data)","documentation":"1;\n#-----------------------------------------------------------------\n# $Id: BlastHSP.pm 16123 2009-09-17 12:57:27Z cjfields $\n#\n# BioPerl module Bio::Search::HSP::BlastHSP\n#\n# (This module was originally called Bio::Tools::Blast::HSP)\n#\n# Please direct questions and support issues to <bioperl-l@bioperl.org> \n#\n# Cared for by Steve Chervitz <sac@bioperl.org>\n#\n# You may distribute this module under the same terms as perl itself\n#-----------------------------------------------------------------\n\n## POD Documentation:\n\n=head1 NAME\n\nBio::Search::HSP::BlastHSP - Bioperl BLAST High-Scoring Pair object\n\n=head1 SYNOPSIS\n\nSee L<Bio::Search::Hit::BlastHit>.\n\n=head1 DESCRIPTION\n\nA Bio::Search::HSP::BlastHSP object provides an interface to data\nobtained in a single alignment section of a Blast report (known as a\n\"High-scoring Segment Pair\"). 
This is essentially a pairwise\nalignment with score information.\n\nBlastHSP objects are accessed via L<Bio::Search::Hit::BlastHit>\nobjects after parsing a BLAST report using the L<Bio::SearchIO>\nsystem.\n\nThe construction of BlastHSP objects is performed by\nBio::Factory::BlastHitFactory in a process that is\norchestrated by the Blast parser (L<Bio::SearchIO::psiblast>).\nThe resulting BlastHSPs are then accessed via\nL<Bio::Search::Hit::BlastHit>. Therefore, you do not need to\nuse L<Bio::Search::HSP::BlastHSP> directly. If you need to construct\nBlastHSPs directly, see the new() function for details.\n\nFor L<Bio::SearchIO> BLAST parsing usage examples, see the\nC<examples/searchio> directory of the Bioperl distribution.\n\n=head2 Start and End coordinates\n\nSequence endpoints are swapped so that start is always less than\nend. This affects TBLASTN/X hits on the minus strand. Strand\ninformation can be recovered using the strand() method. This\nnormalization step is standard Bioperl practice. It also facilitates\nuse of range information by methods such as match().\n\n=over 1\n\n* * Supports BLAST versions 1.x and 2.x, gapped and ungapped.\n\n\nBio::Search::HSP::BlastHSP.pm has the ability to extract a list of all\nresidue indices for identical and conservative matches along both\nquery and sbjct sequences. Since this degree of detail is not always\nneeded, this behavior does not occur during construction of the BlastHSP\nobject.  These data will automatically be collected as necessary as\nthe BlastHSP.pm object is used.\n\n=head1 DEPENDENCIES\n\nBio::Search::HSP::BlastHSP.pm is a concrete class that inherits from\nL<Bio::SeqFeature::SimilarityPair> and L<Bio::Search::HSP::HSPI>.\nL<Bio::Seq> and L<Bio::SimpleAlign> are employed for creating\nsequence and alignment objects, respectively.\n\n=head2 Relationship to SimpleAlign.pm & Seq.pm\n\nBlastHSP.pm can provide the query or sbjct sequence as a L<Bio::Seq>\nobject via the L<seq()|seq> method. 
The BlastHSP.pm object can also create a\ntwo-sequence L<Bio::SimpleAlign> alignment object using the query\nand sbjct sequences via the L<get_aln()|get_aln> method. Creation of alignment\nobjects is not automatic when constructing the BlastHSP.pm object since\nthis level of functionality is not always required and would generate\na lot of extra overhead when crunching many reports.\n\n\n=head1 FEEDBACK\n\n=head2 Mailing Lists\n\nUser feedback is an integral part of the evolution of this and other\nBioperl modules.  Send your comments and suggestions preferably to one\nof the Bioperl mailing lists.  Your participation is much appreciated.\n\n  bioperl-l@bioperl.org                  - General discussion\n  http://bioperl.org/wiki/Mailing_lists  - About the mailing lists\n\n=head2 Support \n\nPlease direct usage questions or support issues to the mailing list:\n\nI<bioperl-l@bioperl.org>\n\nrather than to the module maintainer directly. Many experienced and \nresponsive experts will be able to look at the problem and quickly \naddress it. Please include a thorough description of the problem \nwith code and data examples if at all possible.\n\n=head2 Reporting Bugs\n\nReport bugs to the Bioperl bug tracking system to help us keep track\nof the bugs and their resolution. Bug reports can be submitted via the\nweb:\n\n  http://bugzilla.open-bio.org/\n\n=head1 AUTHOR\n\nSteve Chervitz E<lt>sac-at-bioperl.orgE<gt>\n\nSee L<the FEEDBACK section | FEEDBACK> for where to send bug reports and comments.\n\n=head1 ACKNOWLEDGEMENTS\n\nThis software was originally developed in the Department of Genetics\nat Stanford University. 
I would also like to acknowledge my\ncolleagues at Affymetrix for useful feedback.\n\n=head1 SEE ALSO\n\n Bio::Search::Hit::BlastHit.pm          - Blast hit object.\n Bio::Search::Result::BlastResult.pm    - Blast Result object.\n Bio::Seq.pm                            - Biosequence object\n\n=head2 Links:\n\n http://bio.perl.org/                       - Bioperl Project Homepage\n\n=head1 COPYRIGHT\n\nCopyright (c) 1996-2001 Steve Chervitz. All Rights Reserved.\n\n=head1 DISCLAIMER\n\nThis software is provided \"as is\" without warranty of any kind.\n\n\n\n# END of main POD documentation.\n\n=head1 APPENDIX\n\nThe rest of the documentation details each of the object methods.\nInternal methods are usually preceded with a _\n\n\n# Let the code begin...\n\npackage Bio::Search::HSP::BlastHSP;\n\nuse strict;\nuse Bio::SeqFeature::Similarity;\n\nuse vars qw($GAP_SYMBOL %STRAND_SYMBOL);\n\nuse overload\n    '\"\"' => \\&to_string;\n\nuse base qw(Bio::SeqFeature::SimilarityPair Bio::Search::HSP::HSPI);\n\n$GAP_SYMBOL    = '-';          # Need a more general way to handle gap symbols.\n%STRAND_SYMBOL = ('Plus' => 1, 'Minus' => -1 );\n\n\n=head2 new\n\n Usage     : $hsp = Bio::Search::HSP::BlastHSP->new( %named_params );\n           : Bio::Search::HSP::BlastHSP objects are constructed\n           : automatically by Bio::SearchIO::BlastHitFactory,\n           : so there is no need for direct instantiation.\n Purpose   : Constructs a new BlastHSP object and Initializes key variables\n           : for the HSP.\n Returns   : A Bio::Search::HSP::BlastHSP object\n Argument  : Named parameters:\n           : Parameter keys are case-insensitive.\n           :      -RAW_DATA  => array ref containing raw BLAST report data for\n           :                    for a single HSP. 
This includes all lines\n           :                    of the HSP alignment from a traditional BLAST\n                                or PSI-BLAST (non-XML) report,\n           :      -RANK         => integer (1..n).\n           :      -PROGRAM      => string ('TBLASTN', 'BLASTP', etc.).\n           :      -QUERY_NAME   => string, id of query sequence\n           :      -HIT_NAME     => string, id of hit sequence\n           :\n Comments  : Having the raw data allows this object to do lazy parsing of\n           : the raw HSP data (i.e., not parsed until needed).\n           :\n           : Note that there is a fair amount of basic parsing that is\n           : currently performed in this module that would be more appropriate\n           : to do within a separate factory object.\n           : This parsing code will likely be relocated and more initialization\n           : parameters will be added to new().\n           :\nSee Also   : L<Bio::SeqFeature::SimilarityPair::new()>, L<Bio::SeqFeature::Similarity::new()>\n\n\n#----------------\nsub new {\n#----------------\n    my ($class, @args ) = @_;\n\n    my $self = $class->SUPER::new( @args );\n    # Initialize placeholders\n    $self->{'_queryGaps'} = $self->{'_sbjctGaps'} = 0;\n    my ($raw_data, $qname, $hname, $qlen, $hlen);\n\n    ($self->{'_prog'}, $self->{'_rank'}, $raw_data,\n     $qname, $hname) =\n\t $self->_rearrange([qw( PROGRAM\n\t\t\t\tRANK\n\t\t\t\tRAW_DATA\n\t\t\t\tQUERY_NAME\n\t\t\t\tHIT_NAME\n\t\t\t\t)], @args );\n\n    # _set_data() does a fair amount of parsing.\n    # This will likely change (see comment above.)\n    $self->_set_data( @{$raw_data} );\n    # Store the aligned query as sequence feature\n    my ($qb, $hb) = ($self->start());\n    my ($qe, $he) = ($self->end());\n    my ($qs, $hs) = ($self->strand());\n    my ($qf,$hf) = ($self->query->frame(),\n\t\t    $self->hit->frame);\n\n    $self->query( Bio::SeqFeature::Similarity->new (-start   =>$qb,\n\t\t\t\t\t\t    -end     
=>$qe,\n\t\t\t\t\t\t    -strand  =>$qs,\n\t\t\t\t\t\t    -bits    =>$self->bits,\n\t\t\t\t\t\t    -score   =>$self->score,\n\t\t\t\t\t\t    -frame   =>$qf,\n\t\t\t\t\t\t    -seq_id  => $qname,\n\t\t\t\t\t\t    -source  =>$self->{'_prog'} ));\n\n    $self->hit( Bio::SeqFeature::Similarity->new (-start   =>$hb,\n\t\t\t\t\t\t  -end     =>$he,\n\t\t\t\t\t\t  -strand  =>$hs,\n\t\t\t\t\t\t  -bits    =>$self->bits,\n\t\t\t\t\t\t  -score   =>$self->score,\n                                                  -frame   =>$hf,\n\t\t\t\t\t\t  -seq_id  => $hname,\n\t\t\t\t\t\t  -source  =>$self->{'_prog'} ));\n\n    # set lengths\n    $self->query->seqlength($qlen); # query\n    $self->hit->seqlength($hlen); # subject\n\n    $self->query->frac_identical($self->frac_identical('query'));\n    $self->hit->frac_identical($self->frac_identical('hit'));\n    return $self;\n}\n\n#sub DESTROY {\n#    my $self = shift;\n#    #print STDERR \"--->DESTROYING $self\\n\";\n#}\n\n\n# Title   : _id_str;\n# Purpose : Intended for internal use only to provide a string for use\n#           within exception messages to help users figure out which\n#           query/hit caused the problem.\n# Returns : Short string with name of query and hit seq\nsub _id_str {\n    my $self = shift;\n    if( not defined $self->{'_id_str'}) {\n        my $qname = $self->query->seq_id;\n        my $hname = $self->hit->seq_id;\n        $self->{'_id_str'} = \"QUERY=\\\"$qname\\\" HIT=\\\"$hname\\\"\";\n    }\n    return $self->{'_id_str'};\n}\n\n#=================================================\n# Begin Bio::Search::HSP::HSPI implementation\n#=================================================\n\n=head2 algorithm\n\n Title   : algorithm\n Usage   : $alg = $hsp->algorithm();\n Function: Gets the algorithm specification that was used to obtain the hsp\n           For BLAST, the algorithm denotes what type of sequence was aligned\n           against what (BLASTN: dna-dna, BLASTP prt-prt, BLASTX translated\n           dna-prt, 
TBLASTN prt-translated dna, TBLASTX translated\n           dna-translated dna).\n Returns : a scalar string\n Args    : none\n\n\n#----------------\nsub algorithm {\n#----------------\n    my ($self,@args) = @_;\n    return $self->{'_prog'};\n}\n\n\n\n\n=head2 signif()\n\n Usage     : $hsp_obj->signif()\n Purpose   : Get the P-value or Expect value for the HSP.\n Returns   : Float (0.001 or 1.3e-43)\n           : Returns P-value if it is defined, otherwise, Expect value.\n Argument  : n/a\n Throws    : n/a\n Comments  : Provided for consistency with BlastHit::signif()\n           : Support for returning the significance data in different\n           : formats (e.g., exponent only), is not provided for HSP objects.\n           : This is only available for the BlastHit or Blast object.\n\nSee Also   : L</p>, L</expect>, L<Bio::Search::Hit::BlastHit::signif()|Bio::Search::Hit::BlastHit>\n\n\n#-----------\nsub signif {\n#-----------\n    my $self = shift;\n    my $val ||= defined($self->{'_p'}) ? 
$self->{'_p'} : $self->{'_expect'};\n    $val;\n}\n\n\n\n=head2 evalue\n\n Usage     : $hsp_obj->evalue()\n Purpose   : Get the Expect value for the HSP.\n Returns   : Float (0.001 or 1.3e-43)\n Argument  : n/a\n Throws    : n/a\n Comments  : Support for returning the expectation data in different\n           : formats (e.g., exponent only), is not provided for HSP objects.\n           : This is only available for the BlastHit or Blast object.\n\nSee Also   : L</p>\n\n\n#----------\nsub evalue { shift->{'_expect'} }\n#----------\n\n\n=head2 p\n\n Usage     : $hsp_obj->p()\n Purpose   : Get the P-value for the HSP.\n Returns   : Float (0.001 or 1.3e-43) or undef if not defined.\n Argument  : n/a\n Throws    : n/a\n Comments  : P-value is not defined with NCBI Blast2 reports.\n           : Support for returning the expectation data in different\n           : formats (e.g., exponent only) is not provided for HSP objects.\n           : This is only available for the BlastHit or Blast object.\n\nSee Also   : L<expect()|expect>\n\n\n#-----\nsub p { my $self = shift; $self->{'_p'}; }\n#-----\n\n# alias\nsub pvalue { shift->p(@_); }\n\n=head2 length\n\n Usage     : $hsp->length( [seq_type] )\n Purpose   : Get the length of the aligned portion of the query or sbjct.\n Example   : $hsp->length('query')\n Returns   : integer\n Argument  : seq_type: 'query' | 'hit' or 'sbjct' | 'total'  (default = 'total')\n             ('sbjct' is synonymous with 'hit')\n Throws    : n/a\n Comments  : 'total' length is the full length of the alignment\n           : as reported in the denominators in the alignment section:\n           : \"Identical = 34/120 Positives = 67/120\".\n\nSee Also   : L</gaps>\n\n\n#-----------\nsub length {\n#-----------\n## Developer note: when using the built-in length function within\n##                 this module, call it as CORE::length().\n    my( $self, $seqType,$data ) = @_;\n    $seqType  ||= 'total';\n    $seqType = 'sbjct' if $seqType eq 'hit';\n\n    
$seqType ne 'total' and $self->_set_seq_data() unless $self->{'_set_seq_data'};\n\n    ## Sensitive to member name format.\n    $seqType = \"_\\L$seqType\\E\";\n    if( defined $data  ) {\n\t$self->{$seqType.'Length'} = $data;\n    }\n    $self->{$seqType.'Length'};\n}\n\n\n\n=head2 gaps\n\n Usage     : $hsp->gaps( [seq_type] )\n Purpose   : Get the number of gap characters in the query, sbjct, or total alignment.\n           : Also can return query gap chars and sbjct gap chars as a two-element list\n           : when in array context.\n Example   : $total_gaps      = $hsp->gaps();\n           : ($qgaps, $sgaps) = $hsp->gaps();\n           : $qgaps           = $hsp->gaps('query');\n Returns   : scalar context: integer\n           : array context without args: (int, int) = ('queryGaps', 'sbjctGaps')\n Argument  : seq_type: 'query' or 'hit' or 'sbjct' or 'total'\n           :  ('sbjct' is synonymous with 'hit')\n           : (default = 'total', scalar context)\n           : Array context can be \"induced\" by providing an argument of 'list' or 'array'.\n Throws    : n/a\n\nSee Also   : L</length>, L</matches>\n\n\n#---------\nsub gaps {\n#---------\n    my( $self, $seqType ) = @_;\n\n    $self->_set_seq_data() unless $self->{'_set_seq_data'};\n\n    $seqType  ||= (wantarray ? 
'list' : 'total');\n    $seqType = 'sbjct' if $seqType eq 'hit';\n\n    if($seqType =~ /list|array/i) {\n\treturn (($self->{'_queryGaps'} || 0), ($self->{'_sbjctGaps'} || 0));\n    }\n\n    if($seqType eq 'total') {\n\treturn ($self->{'_queryGaps'} + $self->{'_sbjctGaps'}) || 0;\n    } else {\n\t## Sensitive to member name format.\n\t$seqType = \"_\\L$seqType\\E\";\n\treturn $self->{$seqType.'Gaps'} || 0;\n    }\n}\n\n\n=head2 frac_identical\n\n Usage     : $hsp_object->frac_identical( [seq_type] );\n Purpose   : Get the fraction of identical positions within the given HSP.\n Example   : $frac_iden = $hsp_object->frac_identical('query');\n Returns   : Float (2-decimal precision, e.g., 0.75).\n Argument  : seq_type: 'query' or 'hit' or 'sbjct' or 'total'\n           :  ('sbjct' is synonymous with 'hit')\n           : default = 'total' (but see comments below).\n Throws    : n/a\n Comments  : Different versions of Blast report different values for the total\n           : length of the alignment. 
This is the number reported in the\n           : denominators in the stats section:\n           : \"Identical = 34/120 Positives = 67/120\".\n           : NCBI-BLAST uses the total length of the alignment (with gaps)\n           : WU-BLAST uses the length of the query sequence (without gaps).\n           : Therefore, when called without an argument or an argument of 'total',\n           : this method will report different values depending on the\n           : version of BLAST used.\n           :\n           : To get the fraction identical among only the aligned residues,\n           : ignoring the gaps, call this method with an argument of 'query'\n           : or 'sbjct' ('sbjct' is synonymous with 'hit').\n\nSee Also   : L</frac_conserved>, L</num_identical>, L</matches>\n\n\n#-------------------\nsub frac_identical {\n#-------------------\n# The value is calculated as opposed to storing it from the parsed results.\n# This saves storage and also permits flexibility in determining for which\n# sequence (query or sbjct) the figure is to be calculated.\n\n    my( $self, $seqType ) = @_;\n    $seqType ||= 'total';\n    $seqType = 'sbjct' if $seqType eq 'hit';\n\n    if($seqType ne 'total') {\n      $self->_set_seq_data() unless $self->{'_set_seq_data'};\n    }\n    ## Sensitive to member name format.\n    $seqType = \"_\\L$seqType\\E\";\n\n    sprintf( \"%.2f\", $self->{'_numIdentical'}/$self->{$seqType.'Length'});\n}\n\n\n=head2 frac_conserved\n\n Usage     : $hsp_object->frac_conserved( [seq_type] );\n Purpose   : Get the fraction of conserved positions within the given HSP.\n           : (Note: 'conservative' positions are called 'positives' in the\n\t   : Blast report.)\n Example   : $frac_cons = $hsp_object->frac_conserved('query');\n Returns   : Float (2-decimal precision, e.g., 0.75).\n Argument  : seq_type: 'query' or 'hit' or 'sbjct' or 'total'\n           :  ('sbjct' is synonymous with 'hit')\n           : default = 'total' (but see comments below).\n 
Throws    : n/a\n Comments  : Different versions of Blast report different values for the total\n           : length of the alignment. This is the number reported in the\n           : denominators in the stats section:\n           : \"Identical = 34/120 Positives = 67/120\".\n           : NCBI-BLAST uses the total length of the alignment (with gaps)\n           : WU-BLAST uses the length of the query sequence (without gaps).\n           : Therefore, when called without an argument or an argument of 'total',\n           : this method will report different values depending on the\n           : version of BLAST used.\n           :\n           : To get the fraction conserved among only the aligned residues,\n           : ignoring the gaps, call this method with an argument of 'query'\n           : or 'sbjct'.\n\nSee Also   : L</frac_identical>, L</num_conserved>, L</matches>\n\n\n#--------------------\nsub frac_conserved {\n#--------------------\n# The value is calculated as opposed to storing it from the parsed results.\n# This saves storage and also permits flexibility in determining for which\n# sequence (query or sbjct) the figure is to be calculated.\n\n    my( $self, $seqType ) = @_;\n    $seqType ||= 'total';\n    $seqType = 'sbjct' if $seqType eq 'hit';\n\n    if($seqType ne 'total') {\n      $self->_set_seq_data() unless $self->{'_set_seq_data'};\n    }\n\n    ## Sensitive to member name format.\n    $seqType = \"_\\L$seqType\\E\";\n\n    sprintf( \"%.2f\", $self->{'_numConserved'}/$self->{$seqType.'Length'});\n}\n\n=head2 query_string\n\n Title   : query_string\n Usage   : my $qseq = $hsp->query_string;\n Function: Retrieves the query sequence of this HSP as a string\n Returns : string\n Args    : none\n\n\n\n#----------------\nsub query_string{ shift->seq_str('query'); }\n#----------------\n\n=head2 hit_string\n\n Title   : hit_string\n Usage   : my $hseq = $hsp->hit_string;\n Function: Retrieves the hit sequence of this HSP as a string\n Returns : string\n 
Args    : none\n\n\n\n#----------------\nsub hit_string{ shift->seq_str('hit'); }\n#----------------\n\n\n=head2 homology_string\n\n Title   : homology_string\n Usage   : my $homo_string = $hsp->homology_string;\n Function: Retrieves the homology sequence for this HSP as a string.\n         : The homology sequence is the string of symbols in between the\n         : query and hit sequences in the alignment indicating the degree\n         : of conservation (e.g., identical, similar, not similar).\n Returns : string\n Args    : none\n\n\n#----------------\nsub homology_string{ shift->seq_str('match'); }\n#----------------\n\n#=================================================\n# End Bio::Search::HSP::HSPI implementation\n#=================================================\n\n# Older method delegating to method defined in HSPI.\n\n=head2 expect\n\nSee L<Bio::Search::HSP::HSPI::expect()|Bio::Search::HSP::HSPI>\n\n\n#----------\nsub expect { shift->evalue( @_ ); }\n#----------\n\n\n=head2 rank\n\n Usage     : $hsp->rank( [string] );\n Purpose   : Get the rank of the HSP within a given Blast hit.\n Example   : $rank = $hsp->rank;\n Returns   : Integer (1..n) corresponding to the order in which the HSP\n             appears in the BLAST report.\n\n\n#'\n\n#----------\nsub rank { shift->{'_rank'} }\n#----------\n\n# For backward compatibility\n#----------\nsub name { shift->rank }\n#----------\n\n=head2 to_string\n\n Title   : to_string\n Usage   : print $hsp->to_string;\n Function: Returns a string representation for the Blast HSP.\n           Primarily intended for debugging purposes.\n Example : see usage\n Returns : A string of the form:\n           [BlastHSP] <rank>\n           e.g.:\n           [BlastHSP] 1\n Args    : None\n\n\n#----------\nsub to_string {\n#----------\n    my $self = shift;\n    return \"[BlastHSP] \" . 
$self->rank();\n}\n\n\n=head2 _set_data\n\n Usage     : called automatically during object construction.\n Purpose   : Parses the raw HSP section from a flat BLAST report and\n             sets the query sequence, sbjct sequence, and the \"match\" data\n           : which consists of the symbols between the query and sbjct lines\n           : in the alignment.\n Argument  : Array (all lines for a single, complete HSP, from a raw,\n             flat (i.e., non-XML) BLAST report)\n Throws    : Propagates any exceptions from the methods called (\"See Also\")\n\nSee Also   : L</_set_seq>, L</_set_score_stats>, L</_set_match_stats>\n\n\n#--------------\nsub _set_data {\n#--------------\n    my $self = shift;\n    my @data = @_;\n    my @queryList  = ();  # 'Query' = SEQUENCE USED TO QUERY THE DATABASE.\n    my @sbjctList  = ();  # 'Sbjct' = HOMOLOGOUS SEQUENCE FOUND IN THE DATABASE.\n    my @matchList  = ();\n    my $matchLine  = 0;   # Alternating boolean: when true, load 'match' data.\n    my @linedat = ();\n\n    #print STDERR \"BlastHSP: set_data()\\n\";\n\n    my($line, $aln_row_len, $length_diff);\n    $length_diff = 0;\n\n    # Collecting data for all lines in the alignment\n    # and then storing the collections for possible processing later.\n    #\n    # Note that \"match\" lines may not be properly padded with spaces.\n    # This loop now properly handles such cases:\n    # Query: 1141 PSLVELTIRDCPRLEVGPMIRSLPKFPMLKKLDLAVANIIEEDLDVIGSLEELVIXXXXX 1200\n    #             PSLVELTIRDCPRLEVGPMIRSLPKFPMLKKLDLAVANIIEEDLDVIGSLEELVI\n    # Sbjct: 1141 PSLVELTIRDCPRLEVGPMIRSLPKFPMLKKLDLAVANIIEEDLDVIGSLEELVILSLKL 1200\n\n    foreach $line( @data ) {\n\tnext if $line =~ /^\\s*$/;\n\n\tif( $line =~ /^ ?Score/ ) {\n\t    $self->_set_score_stats( $line );\n\t} elsif( $line =~ /^ ?(Identities|Positives|Strand)/ ) {\n\t    $self->_set_match_stats( $line );\n\t} elsif( $line =~ /^ ?Frame = ([\\d+-]+)/ ) {\n\t  # Version 2.0.8 has Frame information on a separate line.\n\t  # 
Storing frame according to SeqFeature::Generic::frame()\n\t  # which does not contain strand info (use strand()).\n\t  my $frame = abs($1) - 1;\n\t  $self->frame( $frame );\n\t} elsif( $line =~ /^(Query:?[\\s\\d]+)([^\\s\\d]+)/ ) {\n\t    push @queryList, $line;\n\t    $self->{'_match_indent'} = CORE::length $1;\n\t    $aln_row_len = (CORE::length $1) + (CORE::length $2);\n\t    $matchLine = 1;\n\t} elsif( $matchLine ) {\n\t    # Pad the match line with spaces if necessary.\n\t    $length_diff = $aln_row_len - CORE::length $line;\n\t    $length_diff and $line .= ' 'x $length_diff;\n\t    push @matchList, $line;\n\t    $matchLine = 0;\n\t} elsif( $line =~ /^Sbjct/ ) {\n\t    push @sbjctList, $line;\n\t}\n    }\n    # Storing the query and sbjct lists in case they are needed later.\n    # We could make this conditional to save memory.\n    $self->{'_queryList'} = \\@queryList;\n    $self->{'_sbjctList'} = \\@sbjctList;\n\n    # Storing the match list in case it is needed later.\n    $self->{'_matchList'} = \\@matchList;\n\n    if(not defined ($self->{'_numIdentical'})) {\n        my $id_str = $self->_id_str;\n        $self->throw( -text  => \"Can't parse match statistics. Possibly a new or unrecognized Blast format. ($id_str)\");\n    }\n\n    if(!scalar @queryList or !scalar @sbjctList) {\n        my $id_str = $self->_id_str;\n        $self->throw( \"Can't find query or sbjct alignment lines. Possibly unrecognized Blast format. 
($id_str)\");\n    }\n}\n\n\n=head2 _set_score_stats\n\n Usage     : called automatically by _set_data()\n Purpose   : Sets various score statistics obtained from the HSP listing.\n Argument  : String with any of the following formats:\n           : blast2:  Score = 30.1 bits (66), Expect = 9.2\n           : blast2:  Score = 158.2 bits (544), Expect(2) = e-110\n           : blast1:  Score = 410 (144.3 bits), Expect = 1.7e-40, P = 1.7e-40\n           : blast1:  Score = 55 (19.4 bits), Expect = 5.3, Sum P(3) = 0.99\n Throws    : Exception if the stats cannot be parsed, probably due to a change\n           : in the Blast report format.\n\nSee Also   : L</_set_data>\n\n\n#--------------------\nsub _set_score_stats {\n#--------------------\n    my ($self, $data) = @_;\n\n    my ($expect, $p);\n\n    if($data =~ /Score = +([\\d.e+-]+) bits \\(([\\d.e+-]+)\\), +Expect = +([\\d.e+-]+)/) {\n\t# blast2 format n = 1\n\t$self->bits($1);\n\t$self->score($2);\n\t$expect            = $3;\n    } elsif($data =~ /Score = +([\\d.e+-]+) bits \\(([\\d.e+-]+)\\), +Expect\\((\\d+)\\) = +([\\d.e+-]+)/) {\n\t# blast2 format n > 1\n\t$self->bits($1);\n\t$self->score($2);\n\t$self->{'_n'}      = $3;\n\t$expect            = $4;\n\n    } elsif($data =~ /Score = +([\\d.e+-]+) \\(([\\d.e+-]+) bits\\), +Expect = +([\\d.e+-]+), P = +([\\d.e-]+)/) {\n\t# blast1 format, n = 1\n\t$self->score($1);\n\t$self->bits($2);\n\t$expect            = $3;\n\t$p                 = $4;\n\n    } elsif($data =~ /Score = +([\\d.e+-]+) \\(([\\d.e+-]+) bits\\), +Expect = +([\\d.e+-]+), +Sum P\\((\\d+)\\) = +([\\d.e-]+)/) {\n\t# blast1 format, n > 1\n\t$self->score($1);\n\t$self->bits($2);\n\t$expect            = $3;\n\t$self->{'_n'}      = $4;\n\t$p                 = $5;\n\n    } else {\n        my $id_str = $self->_id_str;\n\t$self->throw(-class => 'Bio::Root::Exception',\n\t\t     -text => \"Can't parse score statistics: unrecognized format. 
($id_str)\",\n\t\t     -value => $data);\n    }\n    $expect = \"1$expect\" if $expect =~ /^e/i;\n    $p      = \"1$p\"      if defined $p and $p=~ /^e/i;\n\n    $self->{'_expect'} = $expect;\n    $self->{'_p'}      = $p || undef;\n    $self->significance( $p || $expect );\n}\n\n\n=head2 _set_match_stats\n\n Usage     : Private method; called automatically by _set_data()\n Purpose   : Sets various matching statistics obtained from the HSP listing.\n Argument  : blast2: Identities = 23/74 (31%), Positives = 29/74 (39%), Gaps = 17/74 (22%)\n           : blast2: Identities = 57/98 (58%), Positives = 74/98 (75%)\n           : blast1: Identities = 87/204 (42%), Positives = 126/204 (61%)\n           : blast1: Identities = 87/204 (42%), Positives = 126/204 (61%), Frame = -3\n           : WU-blast: Identities = 310/553 (56%), Positives = 310/553 (56%), Strand = Minus / Plus\n Throws    : Exception if the stats cannot be parsed, probably due to a change\n           : in the Blast report format.\n Comments  : The \"Gaps = \" data in the HSP header has a different meaning depending\n           : on the type of Blast: for BLASTP, this number is the total number of\n           : gaps in query+sbjct; for TBLASTN, it is the number of gaps in the\n           : query sequence only. 
Thus, it is safer to collect the data\n           : separately by examining the actual sequence strings as is done\n           : in _set_seq().\n\nSee Also   : L</_set_data>, L</_set_seq>","parameters":[{"label":"$self"},{"label":"$data"}]},"detail":"($self,$data)","kind":12,"children":[{"line":853,"kind":13,"localvar":"my","containerName":"_set_match_stats","name":"$self","definition":"my"},{"name":"$data","containerName":"_set_match_stats","line":853,"kind":13},{"line":855,"kind":13,"name":"$data","containerName":"_set_match_stats"},{"containerName":"_set_match_stats","name":"$self","line":857,"kind":13},{"name":"$self","containerName":"_set_match_stats","kind":13,"line":858},{"containerName":"_set_match_stats","name":"$data","line":861,"kind":13},{"kind":13,"line":863,"containerName":"_set_match_stats","name":"$self"},{"name":"$self","containerName":"_set_match_stats","line":864,"kind":13},{"containerName":"_set_match_stats","name":"$data","line":867,"kind":13},{"kind":13,"line":868,"name":"$self","containerName":"_set_match_stats"},{"kind":12,"line":868,"name":"frame","containerName":"_set_match_stats"},{"line":873,"kind":13,"containerName":"_set_match_stats","name":"$data"},{"containerName":"_set_match_stats","name":"$self","kind":13,"line":874},{"kind":13,"line":875,"containerName":"_set_match_stats","name":"$self"}],"line":851},{"kind":12,"children":[{"line":906,"kind":13,"localvar":"my","containerName":"_set_seq_data","definition":"my","name":"$self"},{"line":908,"kind":13,"name":"$self","containerName":"_set_seq_data"},{"line":908,"kind":12,"containerName":"_set_seq_data","name":"_set_seq"},{"kind":13,"line":908,"containerName":"_set_seq_data","name":"$self"}],"line":904,"name":"_set_seq_data","definition":"sub","containerName":"main::","range":{"start":{"character":0,"line":904},"end":{"character":9999,"line":908}}},{"line":909,"kind":13,"containerName":null,"name":"$self"},{"containerName":"main::","name":"_set_seq","kind":12,"line":909},{"kind":13,"line"
:909,"containerName":null,"name":"%self"},{"kind":13,"line":912,"name":"%self","containerName":null},{"name":"%self","containerName":null,"kind":13,"line":912},{"line":913,"kind":13,"containerName":null,"name":"%self"},{"containerName":null,"name":"%self","kind":13,"line":914},{"containerName":null,"name":"%self","line":916,"kind":13},{"range":{"start":{"line":944,"character":0},"end":{"character":9999,"line":1007}},"definition":"sub","name":"_set_seq","containerName":"main::","children":[{"line":946,"kind":13,"localvar":"my","name":"$self","definition":"my","containerName":"_set_seq"},{"name":"$seqType","definition":"my","containerName":"_set_seq","localvar":"my","kind":13,"line":947},{"kind":13,"line":948,"definition":"my","name":"@data","containerName":"_set_seq","localvar":"my"},{"kind":13,"line":949,"containerName":"_set_seq","definition":"my","name":"@ranges","localvar":"my"},{"kind":13,"line":950,"containerName":"_set_seq","name":"@sequence","definition":"my","localvar":"my"},{"line":951,"kind":13,"localvar":"my","containerName":"_set_seq","definition":"my","name":"$numGaps"},{"kind":13,"line":953,"containerName":"_set_seq","name":"@data"},{"kind":13,"line":955,"containerName":"_set_seq","name":"@ranges"},{"containerName":"_set_seq","name":"@sequence","kind":13,"line":956},{"name":"$self","containerName":"_set_seq","kind":13,"line":959},{"name":"warn","containerName":"_set_seq","kind":12,"line":959},{"line":963,"kind":13,"containerName":"_set_seq","name":"@sequence"},{"containerName":"_set_seq","name":"@ranges","kind":13,"line":963},{"containerName":"_set_seq","definition":"my","name":"$id_str","localvar":"my","kind":13,"line":964},{"line":964,"kind":13,"containerName":"_set_seq","name":"$self"},{"name":"_id_str","containerName":"_set_seq","kind":12,"line":964},{"containerName":"_set_seq","name":"$self","line":965,"kind":13},{"containerName":"_set_seq","name":"throw","kind":12,"line":965},{"containerName":"_set_seq","name":"$seqType","line":969,"kind":13},{"l
ine":970,"kind":13,"containerName":"_set_seq","name":"$self"},{"name":"$seqType","containerName":"_set_seq","line":970,"kind":13},{"kind":13,"line":970,"containerName":"_set_seq","name":"$ranges"},{"kind":13,"line":971,"containerName":"_set_seq","name":"$self"},{"name":"$seqType","containerName":"_set_seq","kind":13,"line":971},{"name":"$ranges","containerName":"_set_seq","line":971,"kind":13},{"line":972,"kind":13,"name":"$self","containerName":"_set_seq"},{"kind":13,"line":972,"containerName":"_set_seq","name":"$seqType"},{"line":972,"kind":13,"name":"@sequence","containerName":"_set_seq"},{"kind":13,"line":974,"containerName":"_set_seq","name":"$self"},{"containerName":"_set_seq","name":"$seqType","line":974,"kind":13},{"line":974,"kind":13,"containerName":"_set_seq","name":"$ranges"},{"name":"$ranges","containerName":"_set_seq","kind":13,"line":974},{"line":979,"kind":13,"localvar":"my","definition":"my","name":"$prog","containerName":"_set_seq"},{"line":979,"kind":13,"name":"$self","containerName":"_set_seq"},{"kind":12,"line":979,"name":"algorithm","containerName":"_set_seq"},{"name":"$prog","containerName":"_set_seq","line":980,"kind":13},{"line":980,"kind":13,"name":"$seqType","containerName":"_set_seq"},{"containerName":"_set_seq","name":"$self","kind":13,"line":981},{"line":981,"kind":13,"name":"$seqType","containerName":"_set_seq"},{"kind":13,"line":982,"containerName":"_set_seq","name":"$prog"},{"kind":13,"line":982,"containerName":"_set_seq","name":"$seqType"},{"kind":13,"line":983,"containerName":"_set_seq","name":"$self"},{"line":983,"kind":13,"containerName":"_set_seq","name":"$seqType"},{"line":984,"kind":13,"name":"$prog","containerName":"_set_seq"},{"line":985,"kind":13,"containerName":"_set_seq","name":"$self"},{"containerName":"_set_seq","name":"$seqType","kind":13,"line":985},{"kind":13,"line":988,"containerName":"_set_seq","name":"$prog"},{"kind":13,"line":989,"name":"$self","containerName":"_set_seq"},{"kind":13,"line":989,"containerName":"_s
et_seq","name":"$seqType"},{"line":989,"kind":13,"containerName":"_set_seq","name":"$prog"},{"name":"$self","containerName":"_set_seq","line":990,"kind":13},{"line":990,"kind":13,"name":"$seqType","containerName":"_set_seq"},{"line":990,"kind":13,"containerName":"_set_seq","name":"$prog"},{"containerName":"_set_seq","name":"$seqType","kind":13,"line":990},{"containerName":"_set_seq","name":"$self","kind":13,"line":993},{"name":"$seqType","containerName":"_set_seq","line":993,"kind":13},{"line":993,"kind":13,"name":"$self","containerName":"_set_seq"},{"kind":13,"line":993,"name":"$seqType","containerName":"_set_seq"},{"containerName":"_set_seq","name":"$self","kind":13,"line":994},{"kind":13,"line":994,"containerName":"_set_seq","name":"$seqType"},{"line":994,"kind":13,"containerName":"_set_seq","name":"$self"},{"line":994,"kind":13,"name":"$seqType","containerName":"_set_seq"},{"containerName":"_set_seq","name":"$self","kind":13,"line":995},{"name":"$seqType","containerName":"_set_seq","line":995,"kind":13},{"name":"$self","containerName":"_set_seq","kind":13,"line":995},{"kind":13,"line":995,"name":"$seqType","containerName":"_set_seq"},{"line":996,"kind":13,"containerName":"_set_seq","name":"$self"},{"name":"$seqType","containerName":"_set_seq","line":996,"kind":13},{"containerName":"_set_seq","definition":"my","name":"$seqstr","localvar":"my","kind":13,"line":1002},{"containerName":"_set_seq","name":"@sequence","line":1002,"kind":13},{"name":"$seqstr","containerName":"_set_seq","kind":13,"line":1003},{"kind":13,"line":1004,"containerName":"_set_seq","definition":"my","name":"$num_gaps","localvar":"my"},{"kind":13,"line":1004,"name":"$seqstr","containerName":"_set_seq"},{"name":"$self","containerName":"_set_seq","kind":13,"line":1004},{"containerName":"_set_seq","name":"$seqType","kind":13,"line":1004},{"kind":13,"line":1005,"name":"$self","containerName":"_set_seq"},{"name":"$seqType","containerName":"_set_seq","line":1005,"kind":13},{"kind":13,"line":1005,"name"
:"$num_gaps","containerName":"_set_seq"},{"kind":13,"line":1005,"containerName":"_set_seq","name":"$num_gaps"}],"line":944,"kind":12},{"name":"ranges","line":971,"kind":12},{"name":"ranges","line":974,"kind":12},{"kind":12,"line":1004,"containerName":"length","name":"CORE"},{"children":[{"name":"$self","definition":"my","containerName":"_set_residues","localvar":"my","kind":13,"line":1029},{"containerName":"_set_residues","name":"@sequence","definition":"my","localvar":"my","kind":13,"line":1030},{"line":1032,"kind":13,"containerName":"_set_residues","name":"$self"},{"name":"_set_seq_data","containerName":"_set_residues","line":1032,"kind":12},{"name":"$self","containerName":"_set_residues","kind":13,"line":1032},{"localvar":"my","containerName":"_set_residues","definition":"my","name":"%identicalList_query","line":1035,"kind":13},{"localvar":"my","name":"%identicalList_sbjct","definition":"my","containerName":"_set_residues","line":1036,"kind":13},{"localvar":"my","name":"%conservedList_query","definition":"my","containerName":"_set_residues","line":1037,"kind":13},{"kind":13,"line":1038,"containerName":"_set_residues","name":"%conservedList_sbjct","definition":"my","localvar":"my"},{"containerName":"_set_residues","definition":"my","name":"$aref","localvar":"my","kind":13,"line":1040},{"name":"$self","containerName":"_set_residues","line":1040,"kind":13},{"name":"_set_match_seq","containerName":"_set_residues","line":1040,"kind":12},{"line":1040,"kind":13,"containerName":"_set_residues","name":"$self"},{"kind":13,"line":1041,"containerName":"_set_residues","name":"$aref"},{"containerName":"_set_residues","name":"$self","kind":13,"line":1041},{"kind":13,"line":1042,"containerName":"_set_residues","name":"$seqString","definition":"my","localvar":"my"},{"containerName":"_set_residues","name":"$aref","line":1042,"kind":13},{"localvar":"my","containerName":"_set_residues","name":"$qseq","definition":"my","line":1044,"kind":13},{"line":1044,"kind":13,"containerName":"_s
et_residues","name":"$self"}],"line":1027,"kind":12,"range":{"start":{"character":0,"line":1027},"end":{"character":9999,"line":1044}},"name":"_set_residues","definition":"sub","containerName":"main::"},{"containerName":null,"definition":"my","name":"$sseq","localvar":"my","kind":13,"line":1045},{"line":1045,"kind":13,"name":"%self","containerName":null},{"name":"$resCount_query","definition":"my","containerName":null,"localvar":"my","kind":13,"line":1046},{"kind":13,"line":1046,"containerName":null,"name":"%self"},{"line":1047,"kind":13,"localvar":"my","definition":"my","name":"$resCount_sbjct","containerName":null},{"kind":13,"line":1047,"containerName":null,"name":"%self"},{"localvar":"my","name":"$prog","definition":"my","containerName":null,"line":1049,"kind":13},{"containerName":null,"name":"$self","line":1049,"kind":13},{"kind":12,"line":1049,"containerName":"main::","name":"algorithm"},{"containerName":null,"name":"%prog","kind":13,"line":1050},{"containerName":null,"name":"%prog","line":1051,"kind":13},{"kind":13,"line":1052,"containerName":null,"name":"$resCount_sbjct"},{"kind":13,"line":1053,"containerName":null,"name":"%prog"},{"line":1054,"kind":13,"name":"$resCount_query","containerName":null},{"line":1055,"kind":13,"name":"%prog","containerName":null},{"line":1056,"kind":13,"containerName":null,"name":"$resCount_query"},{"line":1057,"kind":13,"name":"$resCount_sbjct","containerName":null},{"kind":13,"line":1061,"containerName":null,"definition":"my","name":"$mchar","localvar":"my"},{"containerName":null,"name":"$schar","line":1061,"kind":13},{"kind":13,"line":1061,"name":"$qchar","containerName":null},{"name":"$mchar","containerName":null,"line":1062,"kind":13},{"containerName":null,"name":"%seqString","kind":13,"line":1062},{"containerName":null,"name":"$qchar","line":1063,"kind":13},{"line":1063,"kind":13,"containerName":null,"name":"$schar"},{"kind":13,"line":1063,"containerName":null,"name":"$qseq"},{"kind":13,"line":1063,"name":"$sseq","container
Name":null},{"containerName":null,"name":"%mchar","line":1064,"kind":13},{"containerName":null,"name":"%conservedList_query","kind":13,"line":1065},{"kind":13,"line":1065,"containerName":null,"name":"$resCount_query"},{"name":"%conservedList_sbjct","containerName":null,"kind":13,"line":1066},{"containerName":null,"name":"$resCount_sbjct","line":1066,"kind":13},{"containerName":null,"name":"%mchar","kind":13,"line":1067},{"containerName":null,"name":"%identicalList_query","line":1068,"kind":13},{"kind":13,"line":1068,"name":"$resCount_query","containerName":null},{"kind":13,"line":1069,"containerName":null,"name":"%identicalList_sbjct"},{"containerName":null,"name":"$resCount_sbjct","kind":13,"line":1069},{"containerName":null,"name":"$resCount_query","kind":13,"line":1071},{"name":"$qchar","containerName":null,"line":1071,"kind":13},{"containerName":null,"name":"$GAP_SYMBOL","line":1071,"kind":13},{"containerName":null,"name":"$resCount_sbjct","kind":13,"line":1072},{"kind":13,"line":1072,"name":"$schar","containerName":null},{"containerName":null,"name":"$GAP_SYMBOL","line":1072,"kind":13},{"name":"%self","containerName":null,"kind":13,"line":1074},{"line":1074,"kind":13,"name":"%identicalList_query","containerName":null},{"containerName":null,"name":"%self","line":1075,"kind":13},{"line":1075,"kind":13,"containerName":null,"name":"%conservedList_query"},{"line":1076,"kind":13,"containerName":null,"name":"%self"},{"kind":13,"line":1076,"name":"%identicalList_sbjct","containerName":null},{"name":"%self","containerName":null,"line":1077,"kind":13},{"name":"%conservedList_sbjct","containerName":null,"kind":13,"line":1077},{"name":"_set_match_seq","definition":"sub","containerName":"main::","range":{"start":{"line":1098,"character":0},"end":{"character":9999,"line":1107}},"kind":12,"children":[{"localvar":"my","definition":"my","name":"$self","containerName":"_set_match_seq","line":1100,"kind":13},{"line":1102,"kind":13,"containerName":"_set_match_seq","name":"$self"},
{"kind":13,"line":1103,"name":"$id_str","definition":"my","containerName":"_set_match_seq","localvar":"my"},{"line":1103,"kind":13,"name":"$self","containerName":"_set_match_seq"},{"line":1103,"kind":12,"name":"_id_str","containerName":"_set_match_seq"},{"kind":13,"line":1104,"containerName":"_set_match_seq","name":"$self"},{"name":"throw","containerName":"_set_match_seq","kind":12,"line":1104},{"kind":13,"line":1107,"containerName":"_set_match_seq","name":"@data","definition":"my","localvar":"my"},{"containerName":"_set_match_seq","name":"$self","kind":13,"line":1107}],"line":1098},{"containerName":null,"name":"@sequence","definition":"my","localvar":"my","kind":13,"line":1109},{"line":1110,"kind":13,"containerName":null,"name":"@data"},{"line":1115,"kind":13,"name":"@sequence","containerName":null},{"line":1118,"kind":13,"name":"%self","containerName":null},{"line":1119,"kind":13,"containerName":null,"name":"%self"},{"kind":13,"line":1121,"containerName":null,"name":"%self"},{"kind":13,"line":1121,"name":"@sequence","containerName":null},{"kind":13,"line":1123,"name":"%self","containerName":null},{"range":{"start":{"line":1146,"character":0},"end":{"character":9999,"line":1146}},"definition":"sub","name":"n","containerName":"main::","children":[{"localvar":"my","definition":"my","name":"$self","containerName":"n","line":1146,"kind":13},{"line":1146,"kind":13,"containerName":"n","name":"$self"}],"line":1146,"kind":12},{"kind":12,"line":1175,"children":[{"containerName":"matches","name":"$self","definition":"my","localvar":"my","kind":13,"line":1177},{"containerName":"matches","name":"%param","line":1177,"kind":13},{"line":1178,"kind":13,"localvar":"my","definition":"my","name":"@data","containerName":"matches"},{"definition":"my","name":"$seqType","containerName":"matches","localvar":"my","kind":13,"line":1179},{"containerName":"matches","name":"$beg","kind":13,"line":1179},{"name":"$end","containerName":"matches","line":1179,"kind":13},{"line":1179,"kind":13,"name
":"$param","containerName":"matches"},{"containerName":"matches","name":"$param","line":1179,"kind":13},{"containerName":"matches","name":"$param","line":1179,"kind":13},{"line":1180,"kind":13,"containerName":"matches","name":"$seqType"},{"line":1181,"kind":13,"name":"$seqType","containerName":"matches"},{"name":"$seqType","containerName":"matches","line":1181,"kind":13},{"localvar":"my","definition":"my","name":"$start","containerName":"matches","line":1183,"kind":13},{"name":"$stop","containerName":"matches","line":1183,"kind":13},{"containerName":"matches","name":"$beg","line":1185,"kind":13},{"containerName":"matches","name":"$end","kind":13,"line":1185},{"name":"@data","containerName":"matches","line":1187,"kind":13},{"kind":13,"line":1187,"name":"$self","containerName":"matches"},{"line":1187,"kind":13,"name":"$self","containerName":"matches"},{"kind":13,"line":1190,"name":"$beg","containerName":"matches"},{"containerName":"matches","name":"$end","kind":13,"line":1191},{"containerName":"matches","name":"$start","kind":13,"line":1192},{"kind":13,"line":1192,"containerName":"matches","name":"$stop"},{"kind":13,"line":1192,"containerName":"matches","name":"$self"},{"name":"range","containerName":"matches","kind":12,"line":1192},{"name":"$seqType","containerName":"matches","kind":13,"line":1192},{"kind":13,"line":1193,"containerName":"matches","name":"$beg"},{"name":"$beg","containerName":"matches","line":1193,"kind":13},{"line":1193,"kind":13,"containerName":"matches","name":"$start"},{"containerName":"matches","name":"$end","line":1193,"kind":13},{"line":1193,"kind":13,"name":"$beg","containerName":"matches"},{"containerName":"matches","name":"$end","line":1193,"kind":13},{"containerName":"matches","name":"$end","line":1194,"kind":13},{"kind":13,"line":1194,"name":"$end","containerName":"matches"},{"line":1194,"kind":13,"name":"$stop","containerName":"matches"},{"kind":13,"line":1194,"containerName":"matches","name":"$beg"},{"name":"$end","containerName":"matche
s","line":1194,"kind":13},{"kind":13,"line":1194,"name":"$beg","containerName":"matches"},{"line":1196,"kind":13,"containerName":"matches","name":"$end"},{"name":"$stop","containerName":"matches","kind":13,"line":1196},{"containerName":"matches","name":"$end","line":1196,"kind":13},{"kind":13,"line":1196,"name":"$stop","containerName":"matches"},{"kind":13,"line":1197,"containerName":"matches","name":"$end"},{"kind":13,"line":1200,"name":"$beg","containerName":"matches"},{"containerName":"matches","name":"$start","line":1200,"kind":13},{"name":"$beg","containerName":"matches","kind":13,"line":1200},{"kind":13,"line":1200,"name":"$start","containerName":"matches"},{"kind":13,"line":1206,"definition":"my","name":"$seq","containerName":"matches","localvar":"my"},{"kind":13,"line":1207,"definition":"my","name":"$prog","containerName":"matches","localvar":"my"},{"name":"$self","containerName":"matches","line":1207,"kind":13},{"kind":12,"line":1207,"name":"algorithm","containerName":"matches"},{"name":"$prog","containerName":"matches","kind":13,"line":1208},{"containerName":"matches","name":"$seqType","kind":13,"line":1208},{"line":1210,"kind":13,"name":"$seq","containerName":"matches"},{"line":1210,"kind":13,"name":"$self","containerName":"matches"},{"line":1210,"kind":12,"containerName":"matches","name":"seq_str"},{"kind":13,"line":1211,"name":"$beg","containerName":"matches"},{"kind":13,"line":1211,"name":"$start","containerName":"matches"},{"kind":13,"line":1211,"name":"$end","containerName":"matches"},{"line":1211,"kind":13,"containerName":"matches","name":"$beg"},{"line":1213,"kind":13,"containerName":"matches","name":"$prog"},{"line":1213,"kind":13,"name":"$seqType","containerName":"matches"},{"name":"$seq","containerName":"matches","line":1215,"kind":13},{"kind":13,"line":1215,"containerName":"matches","name":"$self"},{"line":1215,"kind":12,"name":"seq_str","containerName":"matches"},{"kind":13,"line":1216,"containerName":"matches","name":"$beg"},{"name":"$start",
"containerName":"matches","line":1216,"kind":13},{"kind":13,"line":1216,"name":"$end","containerName":"matches"},{"line":1216,"kind":13,"containerName":"matches","name":"$beg"},{"containerName":"matches","name":"$seq","kind":13,"line":1218},{"containerName":"matches","name":"$self","kind":13,"line":1218},{"kind":12,"line":1218,"name":"seq_str","containerName":"matches"},{"line":1219,"kind":13,"name":"$beg","containerName":"matches"},{"kind":13,"line":1219,"containerName":"matches","name":"$start"},{"line":1219,"kind":13,"containerName":"matches","name":"$end"},{"name":"$beg","containerName":"matches","line":1219,"kind":13},{"containerName":"matches","name":"$seq","kind":13,"line":1237},{"line":1238,"kind":13,"localvar":"my","containerName":"matches","name":"$id_str","definition":"my"},{"containerName":"matches","name":"$self","line":1238,"kind":13},{"name":"_id_str","containerName":"matches","line":1238,"kind":12},{"containerName":"matches","name":"$self","kind":13,"line":1239},{"containerName":"matches","name":"throw","line":1239,"kind":12},{"line":1244,"kind":13,"name":"$seq","containerName":"matches"},{"containerName":"matches","definition":"my","name":"$len_cons","localvar":"my","kind":13,"line":1245},{"name":"$seq","containerName":"matches","kind":13,"line":1245},{"name":"$seq","containerName":"matches","line":1246,"kind":13},{"line":1247,"kind":13,"localvar":"my","definition":"my","name":"$len_id","containerName":"matches"},{"containerName":"matches","name":"$seq","line":1247,"kind":13},{"kind":13,"line":1248,"name":"@data","containerName":"matches"},{"name":"$len_id","containerName":"matches","line":1248,"kind":13},{"kind":13,"line":1248,"containerName":"matches","name":"$len_cons"},{"name":"@data","containerName":"matches","kind":13,"line":1251}],"containerName":"main::","definition":"sub","detail":"($self,%param)","signature":{"label":"matches($self,%param)","parameters":[{"label":"$self"},{"label":"%param"}],"documentation":"1;\n#--------------------------
---------------------------------------\n# $Id: BlastHSP.pm 16123 2009-09-17 12:57:27Z cjfields $\n#\n# BioPerl module Bio::Search::HSP::BlastHSP\n#\n# (This module was originally called Bio::Tools::Blast::HSP)\n#\n# Please direct questions and support issues to <bioperl-l@bioperl.org> \n#\n# Cared for by Steve Chervitz <sac@bioperl.org>\n#\n# You may distribute this module under the same terms as perl itself\n#-----------------------------------------------------------------\n\n## POD Documentation:\n\n=head1 NAME\n\nBio::Search::HSP::BlastHSP - Bioperl BLAST High-Scoring Pair object\n\n=head1 SYNOPSIS\n\nSee L<Bio::Search::Hit::BlastHit>.\n\n=head1 DESCRIPTION\n\nA Bio::Search::HSP::BlastHSP object provides an interface to data\nobtained in a single alignment section of a Blast report (known as a\n\"High-scoring Segment Pair\"). This is essentially a pairwise\nalignment with score information.\n\nBlastHSP objects are accessed via L<Bio::Search::Hit::BlastHit>\nobjects after parsing a BLAST report using the L<Bio::SearchIO>\nsystem.\n\nThe construction of BlastHSP objects is performed by\nBio::Factory::BlastHitFactory in a process that is\norchestrated by the Blast parser (L<Bio::SearchIO::psiblast>).\nThe resulting BlastHSPs are then accessed via\nL<Bio::Search::Hit::BlastHit>). Therefore, you do not need to\nuse L<Bio::Search::HSP::BlastHSP>) directly. If you need to construct\nBlastHSPs directly, see the new() function for details.\n\nFor L<Bio::SearchIO> BLAST parsing usage examples, see the\nC<examples/searchio> directory of the Bioperl distribution.\n\n=head2 Start and End coordinates\n\nSequence endpoints are swapped so that start is always less than\nend. This affects For TBLASTN/X hits on the minus strand. Strand\ninformation can be recovered using the strand() method. This\nnormalization step is standard Bioperl practice. 
It also facilitates\nuse of range information by methods such as match().\n\n=over 1\n\n* * Supports BLAST versions 1.x and 2.x, gapped and ungapped.\n\n\nBio::Search::HSP::BlastHSP.pm has the ability to extract a list of all\nresidue indices for identical and conservative matches along both\nquery and sbjct sequences. Since this degree of detail is not always\nneeded, this behavior does not occur during construction of the BlastHSP\nobject.  These data will automatically be collected as necessary as\nthe BlastHSP.pm object is used.\n\n=head1 DEPENDENCIES\n\nBio::Search::HSP::BlastHSP.pm is a concrete class that inherits from\nL<Bio::SeqFeature::SimilarityPair> and L<Bio::Search::HSP::HSPI>.\nL<Bio::Seq> and L<Bio::SimpleAlign> are employed for creating\nsequence and alignment objects, respectively.\n\n=head2 Relationship to SimpleAlign.pm & Seq.pm\n\nBlastHSP.pm can provide the query or sbjct sequence as a L<Bio::Seq>\nobject via the L<seq()|seq> method. The BlastHSP.pm object can also create a\ntwo-sequence L<Bio::SimpleAlign> alignment object using the the query\nand sbjct sequences via the L<get_aln()|get_aln> method. Creation of alignment\nobjects is not automatic when constructing the BlastHSP.pm object since\nthis level of functionality is not always required and would generate\na lot of extra overhead when crunching many reports.\n\n\n=head1 FEEDBACK\n\n=head2 Mailing Lists\n\nUser feedback is an integral part of the evolution of this and other\nBioperl modules.  Send your comments and suggestions preferably to one\nof the Bioperl mailing lists.  Your participation is much appreciated.\n\n  bioperl-l@bioperl.org                  - General discussion\n  http://bioperl.org/wiki/Mailing_lists  - About the mailing lists\n\n=head2 Support \n\nPlease direct usage questions or support issues to the mailing list:\n\nI<bioperl-l@bioperl.org>\n\nrather than to the module maintainer directly. 
Many experienced and \nreponsive experts will be able look at the problem and quickly \naddress it. Please include a thorough description of the problem \nwith code and data examples if at all possible.\n\n=head2 Reporting Bugs\n\nReport bugs to the Bioperl bug tracking system to help us keep track\nthe bugs and their resolution. Bug reports can be submitted via the\nweb:\n\n  http://bugzilla.open-bio.org/\n\n=head1 AUTHOR\n\nSteve Chervitz E<lt>sac-at-bioperl.orgE<gt>\n\nSee L<the FEEDBACK section | FEEDBACK> for where to send bug reports and comments.\n\n=head1 ACKNOWLEDGEMENTS\n\nThis software was originally developed in the Department of Genetics\nat Stanford University. I would also like to acknowledge my\ncolleagues at Affymetrix for useful feedback.\n\n=head1 SEE ALSO\n\n Bio::Search::Hit::BlastHit.pm          - Blast hit object.\n Bio::Search::Result::BlastResult.pm    - Blast Result object.\n Bio::Seq.pm                            - Biosequence object\n\n=head2 Links:\n\n http://bio.perl.org/                       - Bioperl Project Homepage\n\n=head1 COPYRIGHT\n\nCopyright (c) 1996-2001 Steve Chervitz. 
All Rights Reserved.\n\n=head1 DISCLAIMER\n\nThis software is provided \"as is\" without warranty of any kind.\n\n\n\n# END of main POD documentation.\n\n=head1 APPENDIX\n\nThe rest of the documentation details each of the object methods.\nInternal methods are usually preceded with a _\n\n\n# Let the code begin...\n\npackage Bio::Search::HSP::BlastHSP;\n\nuse strict;\nuse Bio::SeqFeature::Similarity;\n\nuse vars qw($GAP_SYMBOL %STRAND_SYMBOL);\n\nuse overload\n    '\"\"' => \\&to_string;\n\nuse base qw(Bio::SeqFeature::SimilarityPair Bio::Search::HSP::HSPI);\n\n$GAP_SYMBOL    = '-';          # Need a more general way to handle gap symbols.\n%STRAND_SYMBOL = ('Plus' => 1, 'Minus' => -1 );\n\n\n=head2 new\n\n Usage     : $hsp = Bio::Search::HSP::BlastHSP->new( %named_params );\n           : Bio::Search::HSP::BlastHSP objects are constructed\n           : automatically by Bio::SearchIO::BlastHitFactory,\n           : so there is no need for direct instantiation.\n Purpose   : Constructs a new BlastHSP object and Initializes key variables\n           : for the HSP.\n Returns   : A Bio::Search::HSP::BlastHSP object\n Argument  : Named parameters:\n           : Parameter keys are case-insensitive.\n           :      -RAW_DATA  => array ref containing raw BLAST report data for\n           :                    for a single HSP. 
This includes all lines\n           :                    of the HSP alignment from a traditional BLAST\n                                or PSI-BLAST (non-XML) report,\n           :      -RANK         => integer (1..n).\n           :      -PROGRAM      => string ('TBLASTN', 'BLASTP', etc.).\n           :      -QUERY_NAME   => string, id of query sequence\n           :      -HIT_NAME     => string, id of hit sequence\n           :\n Comments  : Having the raw data allows this object to do lazy parsing of\n           : the raw HSP data (i.e., not parsed until needed).\n           :\n           : Note that there is a fair amount of basic parsing that is\n           : currently performed in this module that would be more appropriate\n           : to do within a separate factory object.\n           : This parsing code will likely be relocated and more initialization\n           : parameters will be added to new().\n           :\nSee Also   : L<Bio::SeqFeature::SimilarityPair::new()>, L<Bio::SeqFeature::Similarity::new()>\n\n\n#----------------\nsub new {\n#----------------\n    my ($class, @args ) = @_;\n\n    my $self = $class->SUPER::new( @args );\n    # Initialize placeholders\n    $self->{'_queryGaps'} = $self->{'_sbjctGaps'} = 0;\n    my ($raw_data, $qname, $hname, $qlen, $hlen);\n\n    ($self->{'_prog'}, $self->{'_rank'}, $raw_data,\n     $qname, $hname) =\n\t $self->_rearrange([qw( PROGRAM\n\t\t\t\tRANK\n\t\t\t\tRAW_DATA\n\t\t\t\tQUERY_NAME\n\t\t\t\tHIT_NAME\n\t\t\t\t)], @args );\n\n    # _set_data() does a fair amount of parsing.\n    # This will likely change (see comment above.)\n    $self->_set_data( @{$raw_data} );\n    # Store the aligned query as sequence feature\n    my ($qb, $hb) = ($self->start());\n    my ($qe, $he) = ($self->end());\n    my ($qs, $hs) = ($self->strand());\n    my ($qf,$hf) = ($self->query->frame(),\n\t\t    $self->hit->frame);\n\n    $self->query( Bio::SeqFeature::Similarity->new (-start   =>$qb,\n\t\t\t\t\t\t    -end     
=>$qe,\n\t\t\t\t\t\t    -strand  =>$qs,\n\t\t\t\t\t\t    -bits    =>$self->bits,\n\t\t\t\t\t\t    -score   =>$self->score,\n\t\t\t\t\t\t    -frame   =>$qf,\n\t\t\t\t\t\t    -seq_id  => $qname,\n\t\t\t\t\t\t    -source  =>$self->{'_prog'} ));\n\n    $self->hit( Bio::SeqFeature::Similarity->new (-start   =>$hb,\n\t\t\t\t\t\t  -end     =>$he,\n\t\t\t\t\t\t  -strand  =>$hs,\n\t\t\t\t\t\t  -bits    =>$self->bits,\n\t\t\t\t\t\t  -score   =>$self->score,\n                                                  -frame   =>$hf,\n\t\t\t\t\t\t  -seq_id  => $hname,\n\t\t\t\t\t\t  -source  =>$self->{'_prog'} ));\n\n    # set lengths\n    $self->query->seqlength($qlen); # query\n    $self->hit->seqlength($hlen); # subject\n\n    $self->query->frac_identical($self->frac_identical('query'));\n    $self->hit->frac_identical($self->frac_identical('hit'));\n    return $self;\n}\n\n#sub DESTROY {\n#    my $self = shift;\n#    #print STDERR \"--->DESTROYING $self\\n\";\n#}\n\n\n# Title   : _id_str;\n# Purpose : Intended for internal use only to provide a string for use\n#           within exception messages to help users figure out which\n#           query/hit caused the problem.\n# Returns : Short string with name of query and hit seq\nsub _id_str {\n    my $self = shift;\n    if( not defined $self->{'_id_str'}) {\n        my $qname = $self->query->seq_id;\n        my $hname = $self->hit->seq_id;\n        $self->{'_id_str'} = \"QUERY=\\\"$qname\\\" HIT=\\\"$hname\\\"\";\n    }\n    return $self->{'_id_str'};\n}\n\n#=================================================\n# Begin Bio::Search::HSP::HSPI implementation\n#=================================================\n\n=head2 algorithm\n\n Title   : algorithm\n Usage   : $alg = $hsp->algorithm();\n Function: Gets the algorithm specification that was used to obtain the hsp\n           For BLAST, the algorithm denotes what type of sequence was aligned\n           against what (BLASTN: dna-dna, BLASTP prt-prt, BLASTX translated\n           dna-prt, 
TBLASTN prt-translated dna, TBLASTX translated\n           dna-translated dna).\n Returns : a scalar string\n Args    : none\n\n\n#----------------\nsub algorithm {\n#----------------\n    my ($self,@args) = @_;\n    return $self->{'_prog'};\n}\n\n\n\n\n=head2 signif()\n\n Usage     : $hsp_obj->signif()\n Purpose   : Get the P-value or Expect value for the HSP.\n Returns   : Float (0.001 or 1.3e-43)\n           : Returns P-value if it is defined, otherwise, Expect value.\n Argument  : n/a\n Throws    : n/a\n Comments  : Provided for consistency with BlastHit::signif()\n           : Support for returning the significance data in different\n           : formats (e.g., exponent only), is not provided for HSP objects.\n           : This is only available for the BlastHit or Blast object.\n\nSee Also   : L</p>, L</expect>, L<Bio::Search::Hit::BlastHit::signif()|Bio::Search::Hit::BlastHit>\n\n\n#-----------\nsub signif {\n#-----------\n    my $self = shift;\n    my $val ||= defined($self->{'_p'}) ? 
$self->{'_p'} : $self->{'_expect'};\n    $val;\n}\n\n\n\n=head2 evalue\n\n Usage     : $hsp_obj->evalue()\n Purpose   : Get the Expect value for the HSP.\n Returns   : Float (0.001 or 1.3e-43)\n Argument  : n/a\n Throws    : n/a\n Comments  : Support for returning the expectation data in different\n           : formats (e.g., exponent only), is not provided for HSP objects.\n           : This is only available for the BlastHit or Blast object.\n\nSee Also   : L</p>\n\n\n#----------\nsub evalue { shift->{'_expect'} }\n#----------\n\n\n=head2 p\n\n Usage     : $hsp_obj->p()\n Purpose   : Get the P-value for the HSP.\n Returns   : Float (0.001 or 1.3e-43) or undef if not defined.\n Argument  : n/a\n Throws    : n/a\n Comments  : P-value is not defined with NCBI Blast2 reports.\n           : Support for returning the expectation data in different\n           : formats (e.g., exponent only) is not provided for HSP objects.\n           : This is only available for the BlastHit or Blast object.\n\nSee Also   : L<expect()|expect>\n\n\n#-----\nsub p { my $self = shift; $self->{'_p'}; }\n#-----\n\n# alias\nsub pvalue { shift->p(@_); }\n\n=head2 length\n\n Usage     : $hsp->length( [seq_type] )\n Purpose   : Get the length of the aligned portion of the query or sbjct.\n Example   : $hsp->length('query')\n Returns   : integer\n Argument  : seq_type: 'query' | 'hit' or 'sbjct' | 'total'  (default = 'total')\n             ('sbjct' is synonymous with 'hit')\n Throws    : n/a\n Comments  : 'total' length is the full length of the alignment\n           : as reported in the denominators in the alignment section:\n           : \"Identical = 34/120 Positives = 67/120\".\n\nSee Also   : L</gaps>\n\n\n#-----------\nsub length {\n#-----------\n## Developer note: when using the built-in length function within\n##                 this module, call it as CORE::length().\n    my( $self, $seqType,$data ) = @_;\n    $seqType  ||= 'total';\n    $seqType = 'sbjct' if $seqType eq 'hit';\n\n    
$seqType ne 'total' and $self->_set_seq_data() unless $self->{'_set_seq_data'};\n\n    ## Sensitive to member name format.\n    $seqType = \"_\\L$seqType\\E\";\n    if( defined $data  ) {\n\t$self->{$seqType.'Length'} = $data;\n    }\n    $self->{$seqType.'Length'};\n}\n\n\n\n=head2 gaps\n\n Usage     : $hsp->gaps( [seq_type] )\n Purpose   : Get the number of gap characters in the query, sbjct, or total alignment.\n           : Also can return query gap chars and sbjct gap chars as a two-element list\n           : when in array context.\n Example   : $total_gaps      = $hsp->gaps();\n           : ($qgaps, $sgaps) = $hsp->gaps();\n           : $qgaps           = $hsp->gaps('query');\n Returns   : scalar context: integer\n           : array context without args: (int, int) = ('queryGaps', 'sbjctGaps')\n Argument  : seq_type: 'query' or 'hit' or 'sbjct' or 'total'\n           :  ('sbjct' is synonymous with 'hit')\n           : (default = 'total', scalar context)\n           : Array context can be \"induced\" by providing an argument of 'list' or 'array'.\n Throws    : n/a\n\nSee Also   : L</length>, L</matches>\n\n\n#---------\nsub gaps {\n#---------\n    my( $self, $seqType ) = @_;\n\n    $self->_set_seq_data() unless $self->{'_set_seq_data'};\n\n    $seqType  ||= (wantarray ? 
'list' : 'total');\n    $seqType = 'sbjct' if $seqType eq 'hit';\n\n    if($seqType =~ /list|array/i) {\n\treturn (($self->{'_queryGaps'} || 0), ($self->{'_sbjctGaps'} || 0));\n    }\n\n    if($seqType eq 'total') {\n\treturn ($self->{'_queryGaps'} + $self->{'_sbjctGaps'}) || 0;\n    } else {\n\t## Sensitive to member name format.\n\t$seqType = \"_\\L$seqType\\E\";\n\treturn $self->{$seqType.'Gaps'} || 0;\n    }\n}\n\n\n=head2 frac_identical\n\n Usage     : $hsp_object->frac_identical( [seq_type] );\n Purpose   : Get the fraction of identical positions within the given HSP.\n Example   : $frac_iden = $hsp_object->frac_identical('query');\n Returns   : Float (2-decimal precision, e.g., 0.75).\n Argument  : seq_type: 'query' or 'hit' or 'sbjct' or 'total'\n           :  ('sbjct' is synonymous with 'hit')\n           : default = 'total' (but see comments below).\n Throws    : n/a\n Comments  : Different versions of Blast report different values for the total\n           : length of the alignment. 
This is the number reported in the\n           : denominators in the stats section:\n           : \"Identical = 34/120 Positives = 67/120\".\n           : NCBI-BLAST uses the total length of the alignment (with gaps)\n           : WU-BLAST uses the length of the query sequence (without gaps).\n           : Therefore, when called without an argument or an argument of 'total',\n           : this method will report different values depending on the\n           : version of BLAST used.\n           :\n           : To get the fraction identical among only the aligned residues,\n           : ignoring the gaps, call this method with an argument of 'query'\n           : or 'sbjct' ('sbjct' is synonymous with 'hit').\n\nSee Also   : L</frac_conserved>, L</num_identical>, L</matches>\n\n\n#-------------------\nsub frac_identical {\n#-------------------\n# The value is calculated as opposed to storing it from the parsed results.\n# This saves storage and also permits flexibility in determining for which\n# sequence (query or sbjct) the figure is to be calculated.\n\n    my( $self, $seqType ) = @_;\n    $seqType ||= 'total';\n    $seqType = 'sbjct' if $seqType eq 'hit';\n\n    if($seqType ne 'total') {\n      $self->_set_seq_data() unless $self->{'_set_seq_data'};\n    }\n    ## Sensitive to member name format.\n    $seqType = \"_\\L$seqType\\E\";\n\n    sprintf( \"%.2f\", $self->{'_numIdentical'}/$self->{$seqType.'Length'});\n}\n\n\n=head2 frac_conserved\n\n Usage     : $hsp_object->frac_conserved( [seq_type] );\n Purpose   : Get the fraction of conserved positions within the given HSP.\n           : (Note: 'conservative' positions are called 'positives' in the\n\t   : Blast report.)\n Example   : $frac_cons = $hsp_object->frac_conserved('query');\n Returns   : Float (2-decimal precision, e.g., 0.75).\n Argument  : seq_type: 'query' or 'hit' or 'sbjct' or 'total'\n           :  ('sbjct' is synonymous with 'hit')\n           : default = 'total' (but see comments below).\n 
Throws    : n/a\n Comments  : Different versions of Blast report different values for the total\n           : length of the alignment. This is the number reported in the\n           : denominators in the stats section:\n           : \"Identical = 34/120 Positives = 67/120\".\n           : NCBI-BLAST uses the total length of the alignment (with gaps)\n           : WU-BLAST uses the length of the query sequence (without gaps).\n           : Therefore, when called without an argument or an argument of 'total',\n           : this method will report different values depending on the\n           : version of BLAST used.\n           :\n           : To get the fraction conserved among only the aligned residues,\n           : ignoring the gaps, call this method with an argument of 'query'\n           : or 'sbjct'.\n\nSee Also   : L</frac_conserved>, L</num_conserved>, L</matches>\n\n\n#--------------------\nsub frac_conserved {\n#--------------------\n# The value is calculated as opposed to storing it from the parsed results.\n# This saves storage and also permits flexibility in determining for which\n# sequence (query or sbjct) the figure is to be calculated.\n\n    my( $self, $seqType ) = @_;\n    $seqType ||= 'total';\n    $seqType = 'sbjct' if $seqType eq 'hit';\n\n    if($seqType ne 'total') {\n      $self->_set_seq_data() unless $self->{'_set_seq_data'};\n    }\n\n    ## Sensitive to member name format.\n    $seqType = \"_\\L$seqType\\E\";\n\n    sprintf( \"%.2f\", $self->{'_numConserved'}/$self->{$seqType.'Length'});\n}\n\n=head2 query_string\n\n Title   : query_string\n Usage   : my $qseq = $hsp->query_string;\n Function: Retrieves the query sequence of this HSP as a string\n Returns : string\n Args    : none\n\n\n\n#----------------\nsub query_string{ shift->seq_str('query'); }\n#----------------\n\n=head2 hit_string\n\n Title   : hit_string\n Usage   : my $hseq = $hsp->hit_string;\n Function: Retrieves the hit sequence of this HSP as a string\n Returns : string\n 
Args    : none\n\n\n\n#----------------\nsub hit_string{ shift->seq_str('hit'); }\n#----------------\n\n\n=head2 homology_string\n\n Title   : homology_string\n Usage   : my $homo_string = $hsp->homology_string;\n Function: Retrieves the homology sequence for this HSP as a string.\n         : The homology sequence is the string of symbols in between the\n         : query and hit sequences in the alignment indicating the degree\n         : of conservation (e.g., identical, similar, not similar).\n Returns : string\n Args    : none\n\n\n#----------------\nsub homology_string{ shift->seq_str('match'); }\n#----------------\n\n#=================================================\n# End Bio::Search::HSP::HSPI implementation\n#=================================================\n\n# Older method delegating to method defined in HSPI.\n\n=head2 expect\n\nSee L<Bio::Search::HSP::HSPI::expect()|Bio::Search::HSP::HSPI>\n\n\n#----------\nsub expect { shift->evalue( @_ ); }\n#----------\n\n\n=head2 rank\n\n Usage     : $hsp->rank( [string] );\n Purpose   : Get the rank of the HSP within a given Blast hit.\n Example   : $rank = $hsp->rank;\n Returns   : Integer (1..n) corresponding to the order in which the HSP\n             appears in the BLAST report.\n\n\n#'\n\n#----------\nsub rank { shift->{'_rank'} }\n#----------\n\n# For backward compatibility\n#----------\nsub name { shift->rank }\n#----------\n\n=head2 to_string\n\n Title   : to_string\n Usage   : print $hsp->to_string;\n Function: Returns a string representation for the Blast HSP.\n           Primarily intended for debugging purposes.\n Example : see usage\n Returns : A string of the form:\n           [BlastHSP] <rank>\n           e.g.:\n           [BlastHit] 1\n Args    : None\n\n\n#----------\nsub to_string {\n#----------\n    my $self = shift;\n    return \"[BlastHSP] \" . 
$self->rank();\n}\n\n\n=head2 _set_data\n\n Usage     : called automatically during object construction.\n Purpose   : Parses the raw HSP section from a flat BLAST report and\n             sets the query sequence, sbjct sequence, and the \"match\" data\n           : which consists of the symbols between the query and sbjct lines\n           : in the alignment.\n Argument  : Array (all lines for a single, complete HSP, from a raw,\n             flat (i.e., non-XML) BLAST report)\n Throws    : Propagates any exceptions from the methods called (\"See Also\")\n\nSee Also   : L</_set_seq>, L</_set_score_stats>, L</_set_match_stats>\n\n\n#--------------\nsub _set_data {\n#--------------\n    my $self = shift;\n    my @data = @_;\n    my @queryList  = ();  # 'Query' = SEQUENCE USED TO QUERY THE DATABASE.\n    my @sbjctList  = ();  # 'Sbjct' = HOMOLOGOUS SEQUENCE FOUND IN THE DATABASE.\n    my @matchList  = ();\n    my $matchLine  = 0;   # Alternating boolean: when true, load 'match' data.\n    my @linedat = ();\n\n    #print STDERR \"BlastHSP: set_data()\\n\";\n\n    my($line, $aln_row_len, $length_diff);\n    $length_diff = 0;\n\n    # Collecting data for all lines in the alignment\n    # and then storing the collections for possible processing later.\n    #\n    # Note that \"match\" lines may not be properly padded with spaces.\n    # This loop now properly handles such cases:\n    # Query: 1141 PSLVELTIRDCPRLEVGPMIRSLPKFPMLKKLDLAVANIIEEDLDVIGSLEELVIXXXXX 1200\n    #             PSLVELTIRDCPRLEVGPMIRSLPKFPMLKKLDLAVANIIEEDLDVIGSLEELVI\n    # Sbjct: 1141 PSLVELTIRDCPRLEVGPMIRSLPKFPMLKKLDLAVANIIEEDLDVIGSLEELVILSLKL 1200\n\n    foreach $line( @data ) {\n\tnext if $line =~ /^\\s*$/;\n\n\tif( $line =~ /^ ?Score/ ) {\n\t    $self->_set_score_stats( $line );\n\t} elsif( $line =~ /^ ?(Identities|Positives|Strand)/ ) {\n\t    $self->_set_match_stats( $line );\n\t} elsif( $line =~ /^ ?Frame = ([\\d+-]+)/ ) {\n\t  # Version 2.0.8 has Frame information on a separate line.\n\t  # 
Storing frame according to SeqFeature::Generic::frame()\n\t  # which does not contain strand info (use strand()).\n\t  my $frame = abs($1) - 1;\n\t  $self->frame( $frame );\n\t} elsif( $line =~ /^(Query:?[\\s\\d]+)([^\\s\\d]+)/ ) {\n\t    push @queryList, $line;\n\t    $self->{'_match_indent'} = CORE::length $1;\n\t    $aln_row_len = (CORE::length $1) + (CORE::length $2);\n\t    $matchLine = 1;\n\t} elsif( $matchLine ) {\n\t    # Pad the match line with spaces if necessary.\n\t    $length_diff = $aln_row_len - CORE::length $line;\n\t    $length_diff and $line .= ' 'x $length_diff;\n\t    push @matchList, $line;\n\t    $matchLine = 0;\n\t} elsif( $line =~ /^Sbjct/ ) {\n\t    push @sbjctList, $line;\n\t}\n    }\n    # Storing the query and sbjct lists in case they are needed later.\n    # We could make this conditional to save memory.\n    $self->{'_queryList'} = \\@queryList;\n    $self->{'_sbjctList'} = \\@sbjctList;\n\n    # Storing the match list in case it is needed later.\n    $self->{'_matchList'} = \\@matchList;\n\n    if(not defined ($self->{'_numIdentical'})) {\n        my $id_str = $self->_id_str;\n        $self->throw( -text  => \"Can't parse match statistics. Possibly a new or unrecognized Blast format. ($id_str)\");\n    }\n\n    if(!scalar @queryList or !scalar @sbjctList) {\n        my $id_str = $self->_id_str;\n        $self->throw( \"Can't find query or sbjct alignment lines. Possibly unrecognized Blast format. 
($id_str)\");\n    }\n}\n\n\n=head2 _set_score_stats\n\n Usage     : called automatically by _set_data()\n Purpose   : Sets various score statistics obtained from the HSP listing.\n Argument  : String with any of the following formats:\n           : blast2:  Score = 30.1 bits (66), Expect = 9.2\n           : blast2:  Score = 158.2 bits (544), Expect(2) = e-110\n           : blast1:  Score = 410 (144.3 bits), Expect = 1.7e-40, P = 1.7e-40\n           : blast1:  Score = 55 (19.4 bits), Expect = 5.3, Sum P(3) = 0.99\n Throws    : Exception if the stats cannot be parsed, probably due to a change\n           : in the Blast report format.\n\nSee Also   : L</_set_data>\n\n\n#--------------------\nsub _set_score_stats {\n#--------------------\n    my ($self, $data) = @_;\n\n    my ($expect, $p);\n\n    if($data =~ /Score = +([\\d.e+-]+) bits \\(([\\d.e+-]+)\\), +Expect = +([\\d.e+-]+)/) {\n\t# blast2 format n = 1\n\t$self->bits($1);\n\t$self->score($2);\n\t$expect            = $3;\n    } elsif($data =~ /Score = +([\\d.e+-]+) bits \\(([\\d.e+-]+)\\), +Expect\\((\\d+)\\) = +([\\d.e+-]+)/) {\n\t# blast2 format n > 1\n\t$self->bits($1);\n\t$self->score($2);\n\t$self->{'_n'}      = $3;\n\t$expect            = $4;\n\n    } elsif($data =~ /Score = +([\\d.e+-]+) \\(([\\d.e+-]+) bits\\), +Expect = +([\\d.e+-]+), P = +([\\d.e-]+)/) {\n\t# blast1 format, n = 1\n\t$self->score($1);\n\t$self->bits($2);\n\t$expect            = $3;\n\t$p                 = $4;\n\n    } elsif($data =~ /Score = +([\\d.e+-]+) \\(([\\d.e+-]+) bits\\), +Expect = +([\\d.e+-]+), +Sum P\\((\\d+)\\) = +([\\d.e-]+)/) {\n\t# blast1 format, n > 1\n\t$self->score($1);\n\t$self->bits($2);\n\t$expect            = $3;\n\t$self->{'_n'}      = $4;\n\t$p                 = $5;\n\n    } else {\n        my $id_str = $self->_id_str;\n\t$self->throw(-class => 'Bio::Root::Exception',\n\t\t     -text => \"Can't parse score statistics: unrecognized format. 
($id_str)\",\n\t\t     -value => $data);\n    }\n    $expect = \"1$expect\" if $expect =~ /^e/i;\n    $p      = \"1$p\"      if defined $p and $p=~ /^e/i;\n\n    $self->{'_expect'} = $expect;\n    $self->{'_p'}      = $p || undef;\n    $self->significance( $p || $expect );\n}\n\n\n=head2 _set_match_stats\n\n Usage     : Private method; called automatically by _set_data()\n Purpose   : Sets various matching statistics obtained from the HSP listing.\n Argument  : blast2: Identities = 23/74 (31%), Positives = 29/74 (39%), Gaps = 17/74 (22%)\n           : blast2: Identities = 57/98 (58%), Positives = 74/98 (75%)\n           : blast1: Identities = 87/204 (42%), Positives = 126/204 (61%)\n           : blast1: Identities = 87/204 (42%), Positives = 126/204 (61%), Frame = -3\n           : WU-blast: Identities = 310/553 (56%), Positives = 310/553 (56%), Strand = Minus / Plus\n Throws    : Exception if the stats cannot be parsed, probably due to a change\n           : in the Blast report format.\n Comments  : The \"Gaps = \" data in the HSP header has a different meaning depending\n           : on the type of Blast: for BLASTP, this number is the total number of\n           : gaps in query+sbjct; for TBLASTN, it is the number of gaps in the\n           : query sequence only. Thus, it is safer to collect the data\n           : separately by examining the actual sequence strings as is done\n           : in _set_seq().\n\nSee Also   : L</_set_data>, L</_set_seq>\n\n\n#--------------------\nsub _set_match_stats {\n#--------------------\n    my ($self, $data) = @_;\n\n    if($data =~ m!Identities = (\\d+)/(\\d+)!) {\n      # blast1 or 2 format\n      $self->{'_numIdentical'} = $1;\n      $self->{'_totalLength'}  = $2;\n    }\n\n    if($data =~ m!Positives = (\\d+)/(\\d+)!) {\n      # blast1 or 2 format\n      $self->{'_numConserved'} = $1;\n      $self->{'_totalLength'}  = $2;\n    }\n\n    if($data =~ m!Frame = ([\\d+-]+)!) 
{\n      $self->frame($1);\n    }\n\n    # Strand data is not always present in this line.\n    # _set_seq() will also set strand information.\n    if($data =~ m!Strand = (\\w+) / (\\w+)!) {\n\t$self->{'_queryStrand'} = $1;\n\t$self->{'_sbjctStrand'} = $2;\n    }\n\n#    if($data =~ m!Gaps = (\\d+)/(\\d+)!) {\n#\t $self->{'_totalGaps'} = $1;\n#    } else {\n#\t $self->{'_totalGaps'} = 0;\n#    }\n}\n\n\n\n=head2 _set_seq_data\n\n Usage     : called automatically when sequence data is requested.\n Purpose   : Sets the HSP sequence data for both query and sbjct sequences.\n           : Includes: start, stop, length, gaps, and raw sequence.\n Argument  : n/a\n Throws    : Propagates any exception thrown by _set_match_seq()\n Comments  : Uses raw data stored by _set_data() during object construction.\n           : These data are not always needed, so it is conditionally\n           : executed only upon demand by methods such as gaps(), _set_residues(),\n           : etc. _set_seq() does the dirty work.\n\nSee Also   : L</_set_seq>\n\n\n#-----------------\nsub _set_seq_data {\n#-----------------\n    my $self = shift;\n\n    $self->_set_seq('query', @{$self->{'_queryList'}});\n    $self->_set_seq('sbjct', @{$self->{'_sbjctList'}});\n\n    # Liberate some memory.\n    @{$self->{'_queryList'}} = @{$self->{'_sbjctList'}} = ();\n    undef $self->{'_queryList'};\n    undef $self->{'_sbjctList'};\n\n    $self->{'_set_seq_data'} = 1;\n}\n\n\n\n=head2 _set_seq \n\n Usage     : called automatically by _set_seq_data()\n           : $hsp_obj->($seq_type, @data);\n Purpose   : Sets sequence information for both the query and sbjct sequences.\n           : Directly counts the number of gaps in each sequence (if gapped Blast).\n Argument  : $seq_type = 'query' or 'sbjct'\n           : @data = all seq lines with the form:\n           : Query: 61  SPHNVKDRKEQNGSINNAISPTATANTSGSQQINIDSALRDRSSNVAAQPSLSDASSGSN 120\n Throws    : Exception if data strings cannot be parsed, probably due to a 
change\n           : in the Blast report format.\n Comments  : Uses first argument to determine which data members to set\n           : making this method sensitive to data member name changes.\n           : Behavior is dependent on the type of BLAST analysis (TBLASTN, BLASTP, etc).\n Warning   : Sequence endpoints are normalized so that start < end. This affects HSPs\n           : for TBLASTN/X hits on the minus strand. Normalization facilitates use\n           : of range information by methods such as matches().\n\nSee Also   : L</_set_seq_data>, L</matches>, L</range>, L</start>, L</end>\n\n\n#-------------\nsub _set_seq {\n#-------------\n    my $self      = shift;\n    my $seqType   = shift;\n    my @data      = @_;\n    my @ranges    = ();\n    my @sequence  = ();\n    my $numGaps   = 0;\n\n    foreach( @data ) {\n        if( m/(\\d+) *([^\\d\\s]+) *(\\d+)/) {\n            push @ranges, ( $1, $3 ) ;\n            push @sequence, $2;\n        #print STDERR \"_set_seq found sequence \\\"$2\\\"\\n\";\n\t} else {\n\t    $self->warn(\"Bad sequence data: $_\");\n\t}\n    }\n\n    if( !(scalar(@sequence) and scalar(@ranges))) {\n        my $id_str = $self->_id_str;\n\t$self->throw(\"Can't set sequence: missing data. Possibly unrecognized Blast format. 
($id_str) $seqType\");\n   }\n\n    # Sensitive to member name changes.\n    $seqType = \"_\\L$seqType\\E\";\n    $self->{$seqType.'Start'} = $ranges[0];\n    $self->{$seqType.'Stop'}  = $ranges[ $#ranges ];\n    $self->{$seqType.'Seq'}   = \\@sequence;\n\n    $self->{$seqType.'Length'} = abs($ranges[ $#ranges ] - $ranges[0]) + 1;\n\n    # Adjust lengths for BLASTX, TBLASTN, TBLASTX sequences\n    # Converting nucl coords to amino acid coords.\n\n    my $prog = $self->algorithm;\n    if($prog eq 'TBLASTN' and $seqType eq '_sbjct') {\n\t$self->{$seqType.'Length'} /= 3;\n    } elsif($prog eq 'BLASTX' and $seqType eq '_query') {\n\t$self->{$seqType.'Length'} /= 3;\n    } elsif($prog eq 'TBLASTX') {\n\t$self->{$seqType.'Length'} /= 3;\n    }\n\n    if( $prog ne 'BLASTP' ) {\n        $self->{$seqType.'Strand'} = 'Plus' if $prog =~ /BLASTN/;\n        $self->{$seqType.'Strand'} = 'Plus' if ($prog =~ /BLASTX/ and $seqType eq '_query');\n        # Normalize sequence endpoints so that start < end.\n        # Reverse complement or 'minus strand' HSPs get flipped here.\n        if($self->{$seqType.'Start'} > $self->{$seqType.'Stop'}) {\n            ($self->{$seqType.'Start'}, $self->{$seqType.'Stop'}) =\n                ($self->{$seqType.'Stop'}, $self->{$seqType.'Start'});\n            $self->{$seqType.'Strand'} = 'Minus';\n        }\n    }\n\n    ## Count number of gaps in each seq. Only need to do this for gapped Blasts.\n#    if($self->{'_gapped'}) {\n\tmy $seqstr = join('', @sequence);\n\t$seqstr =~ s/\\s//g;\n        my $num_gaps = CORE::length($seqstr) - $self->{$seqType.'Length'};\n\t$self->{$seqType.'Gaps'} = $num_gaps if $num_gaps > 0;\n#    }\n}\n\n\n=head2 _set_residues\n\n Usage     : called automatically when residue data is requested.\n Purpose   : Sets the residue numbers representing the identical and\n           : conserved positions. 
These data are obtained by analyzing the\n           : symbols between query and sbjct lines of the alignments.\n Argument  : n/a\n Throws    : Propagates any exception thrown by _set_seq_data() and _set_match_seq().\n Comments  : These data are not always needed, so it is conditionally\n           : executed only upon demand by methods such as seq_inds().\n           : Behavior is dependent on the type of BLAST analysis (TBLASTN, BLASTP, etc).\n\nSee Also   : L</_set_seq_data>, L</_set_match_seq>, L</seq_inds>\n\n\n#------------------\nsub _set_residues {\n#------------------\n    my $self      = shift;\n    my @sequence  = ();\n\n    $self->_set_seq_data() unless $self->{'_set_seq_data'};\n\n    # Using hashes to avoid saving duplicate residue numbers.\n    my %identicalList_query = ();\n    my %identicalList_sbjct = ();\n    my %conservedList_query = ();\n    my %conservedList_sbjct = ();\n\n    my $aref = $self->_set_match_seq() if not ref $self->{'_matchSeq'};\n    $aref  ||= $self->{'_matchSeq'};\n    my $seqString = join('', @$aref );\n\n    my $qseq = join('',@{$self->{'_querySeq'}});\n    my $sseq = join('',@{$self->{'_sbjctSeq'}});\n    my $resCount_query = $self->{'_queryStop'} || 0;\n    my $resCount_sbjct = $self->{'_sbjctStop'} || 0;\n\n    my $prog = $self->algorithm;\n    if($prog !~ /^BLASTP|^BLASTN/) {\n\tif($prog eq 'TBLASTN') {\n\t    $resCount_sbjct /= 3;\n\t} elsif($prog eq 'BLASTX') {\n\t    $resCount_query /= 3;\n\t} elsif($prog eq 'TBLASTX') {\n\t    $resCount_query /= 3;\n\t    $resCount_sbjct /= 3;\n\t}\n    }\n\n    my ($mchar, $schar, $qchar);\n    while( $mchar = chop($seqString) ) {\n\t($qchar, $schar) = (chop($qseq), chop($sseq));\n\tif( $mchar eq '+' ) {\n\t    $conservedList_query{ $resCount_query } = 1;\n\t    $conservedList_sbjct{ $resCount_sbjct } = 1;\n\t} elsif( $mchar ne ' ' ) {\n\t    $identicalList_query{ $resCount_query } = 1;\n\t    $identicalList_sbjct{ $resCount_sbjct } = 1;\n\t}\n\t$resCount_query-- if $qchar ne 
$GAP_SYMBOL;\n\t$resCount_sbjct-- if $schar ne $GAP_SYMBOL;\n    }\n    $self->{'_identicalRes_query'} = \\%identicalList_query;\n    $self->{'_conservedRes_query'} = \\%conservedList_query;\n    $self->{'_identicalRes_sbjct'} = \\%identicalList_sbjct;\n    $self->{'_conservedRes_sbjct'} = \\%conservedList_sbjct;\n\n}\n\n=head2 _set_match_seq\n\n Usage     : $hsp_obj->_set_match_seq()\n Purpose   : Set the 'match' sequence for the current HSP (symbols in between\n           : the query and sbjct lines.)\n Returns   : Array reference holding the match sequence lines.\n Argument  : n/a\n Throws    : Exception if the _matchList field is not set.\n Comments  : The match information is not always necessary. This method\n           : allows it to be conditionally prepared.\n           : Called by _set_residues() and seq_str().\n\nSee Also   : L</_set_residues>, L</seq_str>\n\n\n#-------------------\nsub _set_match_seq {\n#-------------------\n    my $self = shift;\n\n    if( ! ref($self->{'_matchList'}) ) {\n        my $id_str = $self->_id_str;\n        $self->throw(\"Can't set HSP match sequence: No data ($id_str)\");\n    }\n\n    my @data = @{$self->{'_matchList'}};\n\n    my(@sequence);\n    foreach( @data ) {\n\tchomp($_);\n\t## Remove leading spaces; (note: aln may begin with a space\n\t## which is why we can't use s/^ +//).\n\ts/^ {$self->{'_match_indent'}}//;\n\tpush @sequence, $_;\n    }\n    # Liberate some memory.\n    @{$self->{'_matchList'}} = undef;\n    $self->{'_matchList'} = undef;\n\n    $self->{'_matchSeq'} = \\@sequence;\n\n    return $self->{'_matchSeq'};\n}\n\n\n=head2 n\n\n Usage     : $hsp_obj->n()\n Purpose   : Get the N value (num HSPs on which P/Expect is based).\n           : This value is not defined with NCBI Blast2 with gapping.\n Returns   : Integer or null string if not defined.\n Argument  : n/a\n Throws    : n/a\n Comments  : The 'N' value is listed in parentheses with P/Expect value:\n           : e.g., P(3) = 1.2e-30  ---> (N = 
3).\n           : Not defined in NCBI Blast2 with gaps.\n           : This typically is equal to the number of HSPs but not always.\n           : To obtain the number of HSPs, use Bio::Search::Hit::BlastHit::num_hsps().\n\nSee Also   : L<Bio::SeqFeature::SimilarityPair::score()|Bio::SeqFeature::SimilarityPair>\n\n\n#-----\nsub n { my $self = shift; $self->{'_n'} || ''; }\n#-----\n\n\n=head2 matches\n\n Usage     : $hsp->matches([seq_type], [start], [stop]);\n Purpose   : Get the total number of identical and conservative matches\n           : in the query or sbjct sequence for the given HSP. Optionally can\n           : report data within a defined interval along the seq.\n           : (Note: 'conservative' matches are called 'positives' in the\n\t   : Blast report.)\n Example   : ($id,$cons) = $hsp_object->matches('hit');\n           : ($id,$cons) = $hsp_object->matches('query',300,400);\n Returns   : 2-element array of integers\n Argument  : (1) seq_type = 'query' or 'hit' or 'sbjct' (default = query)\n           :  ('sbjct' is synonymous with 'hit')\n           : (2) start = Starting coordinate (optional)\n           : (3) stop  = Ending coordinate (optional)\n Throws    : Exception if the supplied coordinates are out of range.\n Comments  : Relies on seq_str('match') to get the string of alignment symbols\n           : between the query and sbjct lines which are used for determining\n           : the number of identical and conservative matches.\n\nSee Also   : L</length>, L</gaps>, L</seq_str>, 
L<Bio::Search::Hit::BlastHit::_adjust_contigs()|Bio::Search::Hit::BlastHit>"},"name":"matches","range":{"start":{"line":1175,"character":0},"end":{"line":1252,"character":9999}}},{"name":"CORE","containerName":"length","kind":12,"line":1237},{"kind":12,"line":1246,"name":"CORE","containerName":"length"},{"line":1248,"kind":12,"containerName":"length","name":"CORE"},{"line":1269,"children":[{"line":1271,"kind":13,"localvar":"my","definition":"my","name":"$self","containerName":"num_identical"},{"kind":13,"line":1273,"containerName":"num_identical","name":"$self"}],"kind":12,"range":{"end":{"character":9999,"line":1274},"start":{"character":0,"line":1269}},"containerName":"main::","definition":"sub","name":"num_identical"},{"line":1291,"children":[{"kind":13,"line":1293,"name":"$self","definition":"my","containerName":"num_conserved","localvar":"my"},{"name":"$self","containerName":"num_conserved","kind":13,"line":1295}],"kind":12,"range":{"end":{"character":9999,"line":1296},"start":{"line":1291,"character":0}},"containerName":"main::","definition":"sub","name":"num_conserved"},{"name":"range","range":{"end":{"character":9999,"line":1330},"start":{"character":0,"line":1317}},"definition":"sub","containerName":"main::","signature":{"parameters":[{"label":"$self"},{"label":"$seqType"}],"documentation":"1;\n#-----------------------------------------------------------------\n# $Id: BlastHSP.pm 16123 2009-09-17 12:57:27Z cjfields $\n#\n# BioPerl module Bio::Search::HSP::BlastHSP\n#\n# (This module was originally called Bio::Tools::Blast::HSP)\n#\n# Please direct questions and support issues to <bioperl-l@bioperl.org> \n#\n# Cared for by Steve Chervitz <sac@bioperl.org>\n#\n# You may distribute this module under the same terms as perl itself\n#-----------------------------------------------------------------\n\n## POD Documentation:\n\n=head1 NAME\n\nBio::Search::HSP::BlastHSP - Bioperl BLAST High-Scoring Pair object\n\n=head1 SYNOPSIS\n\nSee 
L<Bio::Search::Hit::BlastHit>.\n\n=head1 DESCRIPTION\n\nA Bio::Search::HSP::BlastHSP object provides an interface to data\nobtained in a single alignment section of a Blast report (known as a\n\"High-scoring Segment Pair\"). This is essentially a pairwise\nalignment with score information.\n\nBlastHSP objects are accessed via L<Bio::Search::Hit::BlastHit>\nobjects after parsing a BLAST report using the L<Bio::SearchIO>\nsystem.\n\nThe construction of BlastHSP objects is performed by\nBio::Factory::BlastHitFactory in a process that is\norchestrated by the Blast parser (L<Bio::SearchIO::psiblast>).\nThe resulting BlastHSPs are then accessed via\nL<Bio::Search::Hit::BlastHit>). Therefore, you do not need to\nuse L<Bio::Search::HSP::BlastHSP>) directly. If you need to construct\nBlastHSPs directly, see the new() function for details.\n\nFor L<Bio::SearchIO> BLAST parsing usage examples, see the\nC<examples/searchio> directory of the Bioperl distribution.\n\n=head2 Start and End coordinates\n\nSequence endpoints are swapped so that start is always less than\nend. This affects For TBLASTN/X hits on the minus strand. Strand\ninformation can be recovered using the strand() method. This\nnormalization step is standard Bioperl practice. It also facilitates\nuse of range information by methods such as match().\n\n=over 1\n\n* * Supports BLAST versions 1.x and 2.x, gapped and ungapped.\n\n\nBio::Search::HSP::BlastHSP.pm has the ability to extract a list of all\nresidue indices for identical and conservative matches along both\nquery and sbjct sequences. Since this degree of detail is not always\nneeded, this behavior does not occur during construction of the BlastHSP\nobject.  
These data will automatically be collected as necessary as\nthe BlastHSP.pm object is used.\n\n=head1 DEPENDENCIES\n\nBio::Search::HSP::BlastHSP.pm is a concrete class that inherits from\nL<Bio::SeqFeature::SimilarityPair> and L<Bio::Search::HSP::HSPI>.\nL<Bio::Seq> and L<Bio::SimpleAlign> are employed for creating\nsequence and alignment objects, respectively.\n\n=head2 Relationship to SimpleAlign.pm & Seq.pm\n\nBlastHSP.pm can provide the query or sbjct sequence as a L<Bio::Seq>\nobject via the L<seq()|seq> method. The BlastHSP.pm object can also create a\ntwo-sequence L<Bio::SimpleAlign> alignment object using the the query\nand sbjct sequences via the L<get_aln()|get_aln> method. Creation of alignment\nobjects is not automatic when constructing the BlastHSP.pm object since\nthis level of functionality is not always required and would generate\na lot of extra overhead when crunching many reports.\n\n\n=head1 FEEDBACK\n\n=head2 Mailing Lists\n\nUser feedback is an integral part of the evolution of this and other\nBioperl modules.  Send your comments and suggestions preferably to one\nof the Bioperl mailing lists.  Your participation is much appreciated.\n\n  bioperl-l@bioperl.org                  - General discussion\n  http://bioperl.org/wiki/Mailing_lists  - About the mailing lists\n\n=head2 Support \n\nPlease direct usage questions or support issues to the mailing list:\n\nI<bioperl-l@bioperl.org>\n\nrather than to the module maintainer directly. Many experienced and \nreponsive experts will be able look at the problem and quickly \naddress it. Please include a thorough description of the problem \nwith code and data examples if at all possible.\n\n=head2 Reporting Bugs\n\nReport bugs to the Bioperl bug tracking system to help us keep track\nthe bugs and their resolution. 
Bug reports can be submitted via the\nweb:\n\n  http://bugzilla.open-bio.org/\n\n=head1 AUTHOR\n\nSteve Chervitz E<lt>sac-at-bioperl.orgE<gt>\n\nSee L<the FEEDBACK section | FEEDBACK> for where to send bug reports and comments.\n\n=head1 ACKNOWLEDGEMENTS\n\nThis software was originally developed in the Department of Genetics\nat Stanford University. I would also like to acknowledge my\ncolleagues at Affymetrix for useful feedback.\n\n=head1 SEE ALSO\n\n Bio::Search::Hit::BlastHit.pm          - Blast hit object.\n Bio::Search::Result::BlastResult.pm    - Blast Result object.\n Bio::Seq.pm                            - Biosequence object\n\n=head2 Links:\n\n http://bio.perl.org/                       - Bioperl Project Homepage\n\n=head1 COPYRIGHT\n\nCopyright (c) 1996-2001 Steve Chervitz. All Rights Reserved.\n\n=head1 DISCLAIMER\n\nThis software is provided \"as is\" without warranty of any kind.\n\n\n\n# END of main POD documentation.\n\n=head1 APPENDIX\n\nThe rest of the documentation details each of the object methods.\nInternal methods are usually preceded with a _\n\n\n# Let the code begin...\n\npackage Bio::Search::HSP::BlastHSP;\n\nuse strict;\nuse Bio::SeqFeature::Similarity;\n\nuse vars qw($GAP_SYMBOL %STRAND_SYMBOL);\n\nuse overload\n    '\"\"' => \\&to_string;\n\nuse base qw(Bio::SeqFeature::SimilarityPair Bio::Search::HSP::HSPI);\n\n$GAP_SYMBOL    = '-';          # Need a more general way to handle gap symbols.\n%STRAND_SYMBOL = ('Plus' => 1, 'Minus' => -1 );\n\n\n=head2 new\n\n Usage     : $hsp = Bio::Search::HSP::BlastHSP->new( %named_params );\n           : Bio::Search::HSP::BlastHSP objects are constructed\n           : automatically by Bio::SearchIO::BlastHitFactory,\n           : so there is no need for direct instantiation.\n Purpose   : Constructs a new BlastHSP object and Initializes key variables\n           : for the HSP.\n Returns   : A Bio::Search::HSP::BlastHSP object\n Argument  : Named parameters:\n           : Parameter keys are 
case-insensitive.\n           :      -RAW_DATA  => array ref containing raw BLAST report data for\n           :                    for a single HSP. This includes all lines\n           :                    of the HSP alignment from a traditional BLAST\n                                or PSI-BLAST (non-XML) report,\n           :      -RANK         => integer (1..n).\n           :      -PROGRAM      => string ('TBLASTN', 'BLASTP', etc.).\n           :      -QUERY_NAME   => string, id of query sequence\n           :      -HIT_NAME     => string, id of hit sequence\n           :\n Comments  : Having the raw data allows this object to do lazy parsing of\n           : the raw HSP data (i.e., not parsed until needed).\n           :\n           : Note that there is a fair amount of basic parsing that is\n           : currently performed in this module that would be more appropriate\n           : to do within a separate factory object.\n           : This parsing code will likely be relocated and more initialization\n           : parameters will be added to new().\n           :\nSee Also   : L<Bio::SeqFeature::SimilarityPair::new()>, L<Bio::SeqFeature::Similarity::new()>\n\n\n#----------------\nsub new {\n#----------------\n    my ($class, @args ) = @_;\n\n    my $self = $class->SUPER::new( @args );\n    # Initialize placeholders\n    $self->{'_queryGaps'} = $self->{'_sbjctGaps'} = 0;\n    my ($raw_data, $qname, $hname, $qlen, $hlen);\n\n    ($self->{'_prog'}, $self->{'_rank'}, $raw_data,\n     $qname, $hname) =\n\t $self->_rearrange([qw( PROGRAM\n\t\t\t\tRANK\n\t\t\t\tRAW_DATA\n\t\t\t\tQUERY_NAME\n\t\t\t\tHIT_NAME\n\t\t\t\t)], @args );\n\n    # _set_data() does a fair amount of parsing.\n    # This will likely change (see comment above.)\n    $self->_set_data( @{$raw_data} );\n    # Store the aligned query as sequence feature\n    my ($qb, $hb) = ($self->start());\n    my ($qe, $he) = ($self->end());\n    my ($qs, $hs) = ($self->strand());\n    my ($qf,$hf) = 
($self->query->frame(),\n\t\t    $self->hit->frame);\n\n    $self->query( Bio::SeqFeature::Similarity->new (-start   =>$qb,\n\t\t\t\t\t\t    -end     =>$qe,\n\t\t\t\t\t\t    -strand  =>$qs,\n\t\t\t\t\t\t    -bits    =>$self->bits,\n\t\t\t\t\t\t    -score   =>$self->score,\n\t\t\t\t\t\t    -frame   =>$qf,\n\t\t\t\t\t\t    -seq_id  => $qname,\n\t\t\t\t\t\t    -source  =>$self->{'_prog'} ));\n\n    $self->hit( Bio::SeqFeature::Similarity->new (-start   =>$hb,\n\t\t\t\t\t\t  -end     =>$he,\n\t\t\t\t\t\t  -strand  =>$hs,\n\t\t\t\t\t\t  -bits    =>$self->bits,\n\t\t\t\t\t\t  -score   =>$self->score,\n                                                  -frame   =>$hf,\n\t\t\t\t\t\t  -seq_id  => $hname,\n\t\t\t\t\t\t  -source  =>$self->{'_prog'} ));\n\n    # set lengths\n    $self->query->seqlength($qlen); # query\n    $self->hit->seqlength($hlen); # subject\n\n    $self->query->frac_identical($self->frac_identical('query'));\n    $self->hit->frac_identical($self->frac_identical('hit'));\n    return $self;\n}\n\n#sub DESTROY {\n#    my $self = shift;\n#    #print STDERR \"--->DESTROYING $self\\n\";\n#}\n\n\n# Title   : _id_str;\n# Purpose : Intended for internal use only to provide a string for use\n#           within exception messages to help users figure out which\n#           query/hit caused the problem.\n# Returns : Short string with name of query and hit seq\nsub _id_str {\n    my $self = shift;\n    if( not defined $self->{'_id_str'}) {\n        my $qname = $self->query->seq_id;\n        my $hname = $self->hit->seq_id;\n        $self->{'_id_str'} = \"QUERY=\\\"$qname\\\" HIT=\\\"$hname\\\"\";\n    }\n    return $self->{'_id_str'};\n}\n\n#=================================================\n# Begin Bio::Search::HSP::HSPI implementation\n#=================================================\n\n=head2 algorithm\n\n Title   : algorithm\n Usage   : $alg = $hsp->algorithm();\n Function: Gets the algorithm specification that was used to obtain the hsp\n           For BLAST, the 
algorithm denotes what type of sequence was aligned\n           against what (BLASTN: dna-dna, BLASTP prt-prt, BLASTX translated\n           dna-prt, TBLASTN prt-translated dna, TBLASTX translated\n           dna-translated dna).\n Returns : a scalar string\n Args    : none\n\n\n#----------------\nsub algorithm {\n#----------------\n    my ($self,@args) = @_;\n    return $self->{'_prog'};\n}\n\n\n\n\n=head2 signif()\n\n Usage     : $hsp_obj->signif()\n Purpose   : Get the P-value or Expect value for the HSP.\n Returns   : Float (0.001 or 1.3e-43)\n           : Returns P-value if it is defined, otherwise, Expect value.\n Argument  : n/a\n Throws    : n/a\n Comments  : Provided for consistency with BlastHit::signif()\n           : Support for returning the significance data in different\n           : formats (e.g., exponent only), is not provided for HSP objects.\n           : This is only available for the BlastHit or Blast object.\n\nSee Also   : L</p>, L</expect>, L<Bio::Search::Hit::BlastHit::signif()|Bio::Search::Hit::BlastHit>\n\n\n#-----------\nsub signif {\n#-----------\n    my $self = shift;\n    my $val ||= defined($self->{'_p'}) ? 
$self->{'_p'} : $self->{'_expect'};\n    $val;\n}\n\n\n\n=head2 evalue\n\n Usage     : $hsp_obj->evalue()\n Purpose   : Get the Expect value for the HSP.\n Returns   : Float (0.001 or 1.3e-43)\n Argument  : n/a\n Throws    : n/a\n Comments  : Support for returning the expectation data in different\n           : formats (e.g., exponent only), is not provided for HSP objects.\n           : This is only available for the BlastHit or Blast object.\n\nSee Also   : L</p>\n\n\n#----------\nsub evalue { shift->{'_expect'} }\n#----------\n\n\n=head2 p\n\n Usage     : $hsp_obj->p()\n Purpose   : Get the P-value for the HSP.\n Returns   : Float (0.001 or 1.3e-43) or undef if not defined.\n Argument  : n/a\n Throws    : n/a\n Comments  : P-value is not defined with NCBI Blast2 reports.\n           : Support for returning the expectation data in different\n           : formats (e.g., exponent only) is not provided for HSP objects.\n           : This is only available for the BlastHit or Blast object.\n\nSee Also   : L<expect()|expect>\n\n\n#-----\nsub p { my $self = shift; $self->{'_p'}; }\n#-----\n\n# alias\nsub pvalue { shift->p(@_); }\n\n=head2 length\n\n Usage     : $hsp->length( [seq_type] )\n Purpose   : Get the length of the aligned portion of the query or sbjct.\n Example   : $hsp->length('query')\n Returns   : integer\n Argument  : seq_type: 'query' | 'hit' or 'sbjct' | 'total'  (default = 'total')\n             ('sbjct' is synonymous with 'hit')\n Throws    : n/a\n Comments  : 'total' length is the full length of the alignment\n           : as reported in the denominators in the alignment section:\n           : \"Identical = 34/120 Positives = 67/120\".\n\nSee Also   : L</gaps>\n\n\n#-----------\nsub length {\n#-----------\n## Developer note: when using the built-in length function within\n##                 this module, call it as CORE::length().\n    my( $self, $seqType,$data ) = @_;\n    $seqType  ||= 'total';\n    $seqType = 'sbjct' if $seqType eq 'hit';\n\n    
$seqType ne 'total' and $self->_set_seq_data() unless $self->{'_set_seq_data'};\n\n    ## Sensitive to member name format.\n    $seqType = \"_\\L$seqType\\E\";\n    if( defined $data  ) {\n\t$self->{$seqType.'Length'} = $data;\n    }\n    $self->{$seqType.'Length'};\n}\n\n\n\n=head2 gaps\n\n Usage     : $hsp->gaps( [seq_type] )\n Purpose   : Get the number of gap characters in the query, sbjct, or total alignment.\n           : Also can return query gap chars and sbjct gap chars as a two-element list\n           : when in array context.\n Example   : $total_gaps      = $hsp->gaps();\n           : ($qgaps, $sgaps) = $hsp->gaps();\n           : $qgaps           = $hsp->gaps('query');\n Returns   : scalar context: integer\n           : array context without args: (int, int) = ('queryGaps', 'sbjctGaps')\n Argument  : seq_type: 'query' or 'hit' or 'sbjct' or 'total'\n           :  ('sbjct' is synonymous with 'hit')\n           : (default = 'total', scalar context)\n           : Array context can be \"induced\" by providing an argument of 'list' or 'array'.\n Throws    : n/a\n\nSee Also   : L</length>, L</matches>\n\n\n#---------\nsub gaps {\n#---------\n    my( $self, $seqType ) = @_;\n\n    $self->_set_seq_data() unless $self->{'_set_seq_data'};\n\n    $seqType  ||= (wantarray ? 
'list' : 'total');\n    $seqType = 'sbjct' if $seqType eq 'hit';\n\n    if($seqType =~ /list|array/i) {\n\treturn (($self->{'_queryGaps'} || 0), ($self->{'_sbjctGaps'} || 0));\n    }\n\n    if($seqType eq 'total') {\n\treturn ($self->{'_queryGaps'} + $self->{'_sbjctGaps'}) || 0;\n    } else {\n\t## Sensitive to member name format.\n\t$seqType = \"_\\L$seqType\\E\";\n\treturn $self->{$seqType.'Gaps'} || 0;\n    }\n}\n\n\n=head2 frac_identical\n\n Usage     : $hsp_object->frac_identical( [seq_type] );\n Purpose   : Get the fraction of identical positions within the given HSP.\n Example   : $frac_iden = $hsp_object->frac_identical('query');\n Returns   : Float (2-decimal precision, e.g., 0.75).\n Argument  : seq_type: 'query' or 'hit' or 'sbjct' or 'total'\n           :  ('sbjct' is synonymous with 'hit')\n           : default = 'total' (but see comments below).\n Throws    : n/a\n Comments  : Different versions of Blast report different values for the total\n           : length of the alignment. 
This is the number reported in the\n           : denominators in the stats section:\n           : \"Identical = 34/120 Positives = 67/120\".\n           : NCBI-BLAST uses the total length of the alignment (with gaps)\n           : WU-BLAST uses the length of the query sequence (without gaps).\n           : Therefore, when called without an argument or an argument of 'total',\n           : this method will report different values depending on the\n           : version of BLAST used.\n           :\n           : To get the fraction identical among only the aligned residues,\n           : ignoring the gaps, call this method with an argument of 'query'\n           : or 'sbjct' ('sbjct' is synonymous with 'hit').\n\nSee Also   : L</frac_conserved>, L</num_identical>, L</matches>\n\n\n#-------------------\nsub frac_identical {\n#-------------------\n# The value is calculated as opposed to storing it from the parsed results.\n# This saves storage and also permits flexibility in determining for which\n# sequence (query or sbjct) the figure is to be calculated.\n\n    my( $self, $seqType ) = @_;\n    $seqType ||= 'total';\n    $seqType = 'sbjct' if $seqType eq 'hit';\n\n    if($seqType ne 'total') {\n      $self->_set_seq_data() unless $self->{'_set_seq_data'};\n    }\n    ## Sensitive to member name format.\n    $seqType = \"_\\L$seqType\\E\";\n\n    sprintf( \"%.2f\", $self->{'_numIdentical'}/$self->{$seqType.'Length'});\n}\n\n\n=head2 frac_conserved\n\n Usage     : $hsp_object->frac_conserved( [seq_type] );\n Purpose   : Get the fraction of conserved positions within the given HSP.\n           : (Note: 'conservative' positions are called 'positives' in the\n\t   : Blast report.)\n Example   : $frac_cons = $hsp_object->frac_conserved('query');\n Returns   : Float (2-decimal precision, e.g., 0.75).\n Argument  : seq_type: 'query' or 'hit' or 'sbjct' or 'total'\n           :  ('sbjct' is synonymous with 'hit')\n           : default = 'total' (but see comments below).\n 
Throws    : n/a\n Comments  : Different versions of Blast report different values for the total\n           : length of the alignment. This is the number reported in the\n           : denominators in the stats section:\n           : \"Identical = 34/120 Positives = 67/120\".\n           : NCBI-BLAST uses the total length of the alignment (with gaps)\n           : WU-BLAST uses the length of the query sequence (without gaps).\n           : Therefore, when called without an argument or an argument of 'total',\n           : this method will report different values depending on the\n           : version of BLAST used.\n           :\n           : To get the fraction conserved among only the aligned residues,\n           : ignoring the gaps, call this method with an argument of 'query'\n           : or 'sbjct'.\n\nSee Also   : L</frac_identical>, L</num_conserved>, L</matches>\n\n\n#--------------------\nsub frac_conserved {\n#--------------------\n# The value is calculated as opposed to storing it from the parsed results.\n# This saves storage and also permits flexibility in determining for which\n# sequence (query or sbjct) the figure is to be calculated.\n\n    my( $self, $seqType ) = @_;\n    $seqType ||= 'total';\n    $seqType = 'sbjct' if $seqType eq 'hit';\n\n    if($seqType ne 'total') {\n      $self->_set_seq_data() unless $self->{'_set_seq_data'};\n    }\n\n    ## Sensitive to member name format.\n    $seqType = \"_\\L$seqType\\E\";\n\n    sprintf( \"%.2f\", $self->{'_numConserved'}/$self->{$seqType.'Length'});\n}\n\n=head2 query_string\n\n Title   : query_string\n Usage   : my $qseq = $hsp->query_string;\n Function: Retrieves the query sequence of this HSP as a string\n Returns : string\n Args    : none\n\n\n\n#----------------\nsub query_string{ shift->seq_str('query'); }\n#----------------\n\n=head2 hit_string\n\n Title   : hit_string\n Usage   : my $hseq = $hsp->hit_string;\n Function: Retrieves the hit sequence of this HSP as a string\n Returns : string\n 
Args    : none\n\n\n\n#----------------\nsub hit_string{ shift->seq_str('hit'); }\n#----------------\n\n\n=head2 homology_string\n\n Title   : homology_string\n Usage   : my $homo_string = $hsp->homology_string;\n Function: Retrieves the homology sequence for this HSP as a string.\n         : The homology sequence is the string of symbols in between the\n         : query and hit sequences in the alignment indicating the degree\n         : of conservation (e.g., identical, similar, not similar).\n Returns : string\n Args    : none\n\n\n#----------------\nsub homology_string{ shift->seq_str('match'); }\n#----------------\n\n#=================================================\n# End Bio::Search::HSP::HSPI implementation\n#=================================================\n\n# Older method delegating to method defined in HSPI.\n\n=head2 expect\n\nSee L<Bio::Search::HSP::HSPI::expect()|Bio::Search::HSP::HSPI>\n\n\n#----------\nsub expect { shift->evalue( @_ ); }\n#----------\n\n\n=head2 rank\n\n Usage     : $hsp->rank( [string] );\n Purpose   : Get the rank of the HSP within a given Blast hit.\n Example   : $rank = $hsp->rank;\n Returns   : Integer (1..n) corresponding to the order in which the HSP\n             appears in the BLAST report.\n\n\n#'\n\n#----------\nsub rank { shift->{'_rank'} }\n#----------\n\n# For backward compatibility\n#----------\nsub name { shift->rank }\n#----------\n\n=head2 to_string\n\n Title   : to_string\n Usage   : print $hsp->to_string;\n Function: Returns a string representation for the Blast HSP.\n           Primarily intended for debugging purposes.\n Example : see usage\n Returns : A string of the form:\n           [BlastHSP] <rank>\n           e.g.:\n           [BlastHSP] 1\n Args    : None\n\n\n#----------\nsub to_string {\n#----------\n    my $self = shift;\n    return \"[BlastHSP] \" . 
$self->rank();\n}\n\n\n=head2 _set_data\n\n Usage     : called automatically during object construction.\n Purpose   : Parses the raw HSP section from a flat BLAST report and\n             sets the query sequence, sbjct sequence, and the \"match\" data\n           : which consists of the symbols between the query and sbjct lines\n           : in the alignment.\n Argument  : Array (all lines for a single, complete HSP, from a raw,\n             flat (i.e., non-XML) BLAST report)\n Throws    : Propagates any exceptions from the methods called (\"See Also\")\n\nSee Also   : L</_set_seq>, L</_set_score_stats>, L</_set_match_stats>\n\n\n#--------------\nsub _set_data {\n#--------------\n    my $self = shift;\n    my @data = @_;\n    my @queryList  = ();  # 'Query' = SEQUENCE USED TO QUERY THE DATABASE.\n    my @sbjctList  = ();  # 'Sbjct' = HOMOLOGOUS SEQUENCE FOUND IN THE DATABASE.\n    my @matchList  = ();\n    my $matchLine  = 0;   # Alternating boolean: when true, load 'match' data.\n    my @linedat = ();\n\n    #print STDERR \"BlastHSP: set_data()\\n\";\n\n    my($line, $aln_row_len, $length_diff);\n    $length_diff = 0;\n\n    # Collecting data for all lines in the alignment\n    # and then storing the collections for possible processing later.\n    #\n    # Note that \"match\" lines may not be properly padded with spaces.\n    # This loop now properly handles such cases:\n    # Query: 1141 PSLVELTIRDCPRLEVGPMIRSLPKFPMLKKLDLAVANIIEEDLDVIGSLEELVIXXXXX 1200\n    #             PSLVELTIRDCPRLEVGPMIRSLPKFPMLKKLDLAVANIIEEDLDVIGSLEELVI\n    # Sbjct: 1141 PSLVELTIRDCPRLEVGPMIRSLPKFPMLKKLDLAVANIIEEDLDVIGSLEELVILSLKL 1200\n\n    foreach $line( @data ) {\n\tnext if $line =~ /^\\s*$/;\n\n\tif( $line =~ /^ ?Score/ ) {\n\t    $self->_set_score_stats( $line );\n\t} elsif( $line =~ /^ ?(Identities|Positives|Strand)/ ) {\n\t    $self->_set_match_stats( $line );\n\t} elsif( $line =~ /^ ?Frame = ([\\d+-]+)/ ) {\n\t  # Version 2.0.8 has Frame information on a separate line.\n\t  # 
Storing frame according to SeqFeature::Generic::frame()\n\t  # which does not contain strand info (use strand()).\n\t  my $frame = abs($1) - 1;\n\t  $self->frame( $frame );\n\t} elsif( $line =~ /^(Query:?[\\s\\d]+)([^\\s\\d]+)/ ) {\n\t    push @queryList, $line;\n\t    $self->{'_match_indent'} = CORE::length $1;\n\t    $aln_row_len = (CORE::length $1) + (CORE::length $2);\n\t    $matchLine = 1;\n\t} elsif( $matchLine ) {\n\t    # Pad the match line with spaces if necessary.\n\t    $length_diff = $aln_row_len - CORE::length $line;\n\t    $length_diff and $line .= ' 'x $length_diff;\n\t    push @matchList, $line;\n\t    $matchLine = 0;\n\t} elsif( $line =~ /^Sbjct/ ) {\n\t    push @sbjctList, $line;\n\t}\n    }\n    # Storing the query and sbjct lists in case they are needed later.\n    # We could make this conditional to save memory.\n    $self->{'_queryList'} = \\@queryList;\n    $self->{'_sbjctList'} = \\@sbjctList;\n\n    # Storing the match list in case it is needed later.\n    $self->{'_matchList'} = \\@matchList;\n\n    if(not defined ($self->{'_numIdentical'})) {\n        my $id_str = $self->_id_str;\n        $self->throw( -text  => \"Can't parse match statistics. Possibly a new or unrecognized Blast format. ($id_str)\");\n    }\n\n    if(!scalar @queryList or !scalar @sbjctList) {\n        my $id_str = $self->_id_str;\n        $self->throw( \"Can't find query or sbjct alignment lines. Possibly unrecognized Blast format. 
($id_str)\");\n    }\n}\n\n\n=head2 _set_score_stats\n\n Usage     : called automatically by _set_data()\n Purpose   : Sets various score statistics obtained from the HSP listing.\n Argument  : String with any of the following formats:\n           : blast2:  Score = 30.1 bits (66), Expect = 9.2\n           : blast2:  Score = 158.2 bits (544), Expect(2) = e-110\n           : blast1:  Score = 410 (144.3 bits), Expect = 1.7e-40, P = 1.7e-40\n           : blast1:  Score = 55 (19.4 bits), Expect = 5.3, Sum P(3) = 0.99\n Throws    : Exception if the stats cannot be parsed, probably due to a change\n           : in the Blast report format.\n\nSee Also   : L</_set_data>\n\n\n#--------------------\nsub _set_score_stats {\n#--------------------\n    my ($self, $data) = @_;\n\n    my ($expect, $p);\n\n    if($data =~ /Score = +([\\d.e+-]+) bits \\(([\\d.e+-]+)\\), +Expect = +([\\d.e+-]+)/) {\n\t# blast2 format n = 1\n\t$self->bits($1);\n\t$self->score($2);\n\t$expect            = $3;\n    } elsif($data =~ /Score = +([\\d.e+-]+) bits \\(([\\d.e+-]+)\\), +Expect\\((\\d+)\\) = +([\\d.e+-]+)/) {\n\t# blast2 format n > 1\n\t$self->bits($1);\n\t$self->score($2);\n\t$self->{'_n'}      = $3;\n\t$expect            = $4;\n\n    } elsif($data =~ /Score = +([\\d.e+-]+) \\(([\\d.e+-]+) bits\\), +Expect = +([\\d.e+-]+), P = +([\\d.e-]+)/) {\n\t# blast1 format, n = 1\n\t$self->score($1);\n\t$self->bits($2);\n\t$expect            = $3;\n\t$p                 = $4;\n\n    } elsif($data =~ /Score = +([\\d.e+-]+) \\(([\\d.e+-]+) bits\\), +Expect = +([\\d.e+-]+), +Sum P\\((\\d+)\\) = +([\\d.e-]+)/) {\n\t# blast1 format, n > 1\n\t$self->score($1);\n\t$self->bits($2);\n\t$expect            = $3;\n\t$self->{'_n'}      = $4;\n\t$p                 = $5;\n\n    } else {\n        my $id_str = $self->_id_str;\n\t$self->throw(-class => 'Bio::Root::Exception',\n\t\t     -text => \"Can't parse score statistics: unrecognized format. 
($id_str)\",\n\t\t     -value => $data);\n    }\n    $expect = \"1$expect\" if $expect =~ /^e/i;\n    $p      = \"1$p\"      if defined $p and $p=~ /^e/i;\n\n    $self->{'_expect'} = $expect;\n    $self->{'_p'}      = $p || undef;\n    $self->significance( $p || $expect );\n}\n\n\n=head2 _set_match_stats\n\n Usage     : Private method; called automatically by _set_data()\n Purpose   : Sets various matching statistics obtained from the HSP listing.\n Argument  : blast2: Identities = 23/74 (31%), Positives = 29/74 (39%), Gaps = 17/74 (22%)\n           : blast2: Identities = 57/98 (58%), Positives = 74/98 (75%)\n           : blast1: Identities = 87/204 (42%), Positives = 126/204 (61%)\n           : blast1: Identities = 87/204 (42%), Positives = 126/204 (61%), Frame = -3\n           : WU-blast: Identities = 310/553 (56%), Positives = 310/553 (56%), Strand = Minus / Plus\n Throws    : Exception if the stats cannot be parsed, probably due to a change\n           : in the Blast report format.\n Comments  : The \"Gaps = \" data in the HSP header has a different meaning depending\n           : on the type of Blast: for BLASTP, this number is the total number of\n           : gaps in query+sbjct; for TBLASTN, it is the number of gaps in the\n           : query sequence only. Thus, it is safer to collect the data\n           : separately by examining the actual sequence strings as is done\n           : in _set_seq().\n\nSee Also   : L</_set_data>, L</_set_seq>\n\n\n#--------------------\nsub _set_match_stats {\n#--------------------\n    my ($self, $data) = @_;\n\n    if($data =~ m!Identities = (\\d+)/(\\d+)!) {\n      # blast1 or 2 format\n      $self->{'_numIdentical'} = $1;\n      $self->{'_totalLength'}  = $2;\n    }\n\n    if($data =~ m!Positives = (\\d+)/(\\d+)!) {\n      # blast1 or 2 format\n      $self->{'_numConserved'} = $1;\n      $self->{'_totalLength'}  = $2;\n    }\n\n    if($data =~ m!Frame = ([\\d+-]+)!) 
{\n      $self->frame($1);\n    }\n\n    # Strand data is not always present in this line.\n    # _set_seq() will also set strand information.\n    if($data =~ m!Strand = (\\w+) / (\\w+)!) {\n\t$self->{'_queryStrand'} = $1;\n\t$self->{'_sbjctStrand'} = $2;\n    }\n\n#    if($data =~ m!Gaps = (\\d+)/(\\d+)!) {\n#\t $self->{'_totalGaps'} = $1;\n#    } else {\n#\t $self->{'_totalGaps'} = 0;\n#    }\n}\n\n\n\n=head2 _set_seq_data\n\n Usage     : called automatically when sequence data is requested.\n Purpose   : Sets the HSP sequence data for both query and sbjct sequences.\n           : Includes: start, stop, length, gaps, and raw sequence.\n Argument  : n/a\n Throws    : Propagates any exception thrown by _set_seq()\n Comments  : Uses raw data stored by _set_data() during object construction.\n           : These data are not always needed, so this method is conditionally\n           : executed only upon demand by methods such as gaps(), _set_residues(),\n           : etc. _set_seq() does the dirty work.\n\nSee Also   : L</_set_seq>\n\n\n#-----------------\nsub _set_seq_data {\n#-----------------\n    my $self = shift;\n\n    $self->_set_seq('query', @{$self->{'_queryList'}});\n    $self->_set_seq('sbjct', @{$self->{'_sbjctList'}});\n\n    # Liberate some memory.\n    @{$self->{'_queryList'}} = @{$self->{'_sbjctList'}} = ();\n    undef $self->{'_queryList'};\n    undef $self->{'_sbjctList'};\n\n    $self->{'_set_seq_data'} = 1;\n}\n\n\n\n=head2 _set_seq\n\n Usage     : called automatically by _set_seq_data()\n           : $hsp_obj->_set_seq($seq_type, @data);\n Purpose   : Sets sequence information for either the query or the sbjct sequence.\n           : Directly counts the number of gaps in the sequence (if gapped Blast).\n Argument  : $seq_type = 'query' or 'sbjct'\n           : @data = all seq lines with the form:\n           : Query: 61  SPHNVKDRKEQNGSINNAISPTATANTSGSQQINIDSALRDRSSNVAAQPSLSDASSGSN 120\n Throws    : Exception if data strings cannot be parsed, probably due to a 
change\n           : in the Blast report format.\n Comments  : Uses first argument to determine which data members to set\n           : making this method sensitive to data member name changes.\n           : Behavior is dependent on the type of BLAST analysis (TBLASTN, BLASTP, etc).\n Warning   : Sequence endpoints are normalized so that start < end. This affects HSPs\n           : for TBLASTN/X hits on the minus strand. Normalization facilitates use\n           : of range information by methods such as matches().\n\nSee Also   : L</_set_seq_data>, L</matches>, L</range>, L</start>, L</end>\n\n\n#-------------\nsub _set_seq {\n#-------------\n    my $self      = shift;\n    my $seqType   = shift;\n    my @data      = @_;\n    my @ranges    = ();\n    my @sequence  = ();\n    my $numGaps   = 0;\n\n    foreach( @data ) {\n        if( m/(\\d+) *([^\\d\\s]+) *(\\d+)/) {\n            push @ranges, ( $1, $3 ) ;\n            push @sequence, $2;\n        #print STDERR \"_set_seq found sequence \\\"$2\\\"\\n\";\n\t} else {\n\t    $self->warn(\"Bad sequence data: $_\");\n\t}\n    }\n\n    if( !(scalar(@sequence) and scalar(@ranges))) {\n        my $id_str = $self->_id_str;\n\t$self->throw(\"Can't set sequence: missing data. Possibly unrecognized Blast format. 
($id_str) $seqType\");\n   }\n\n    # Sensitive to member name changes.\n    $seqType = \"_\\L$seqType\\E\";\n    $self->{$seqType.'Start'} = $ranges[0];\n    $self->{$seqType.'Stop'}  = $ranges[ $#ranges ];\n    $self->{$seqType.'Seq'}   = \\@sequence;\n\n    $self->{$seqType.'Length'} = abs($ranges[ $#ranges ] - $ranges[0]) + 1;\n\n    # Adjust lengths for BLASTX, TBLASTN, TBLASTX sequences\n    # Converting nucl coords to amino acid coords.\n\n    my $prog = $self->algorithm;\n    if($prog eq 'TBLASTN' and $seqType eq '_sbjct') {\n\t$self->{$seqType.'Length'} /= 3;\n    } elsif($prog eq 'BLASTX' and $seqType eq '_query') {\n\t$self->{$seqType.'Length'} /= 3;\n    } elsif($prog eq 'TBLASTX') {\n\t$self->{$seqType.'Length'} /= 3;\n    }\n\n    if( $prog ne 'BLASTP' ) {\n        $self->{$seqType.'Strand'} = 'Plus' if $prog =~ /BLASTN/;\n        $self->{$seqType.'Strand'} = 'Plus' if ($prog =~ /BLASTX/ and $seqType eq '_query');\n        # Normalize sequence endpoints so that start < end.\n        # Reverse complement or 'minus strand' HSPs get flipped here.\n        if($self->{$seqType.'Start'} > $self->{$seqType.'Stop'}) {\n            ($self->{$seqType.'Start'}, $self->{$seqType.'Stop'}) =\n                ($self->{$seqType.'Stop'}, $self->{$seqType.'Start'});\n            $self->{$seqType.'Strand'} = 'Minus';\n        }\n    }\n\n    ## Count number of gaps in each seq. Only need to do this for gapped Blasts.\n#    if($self->{'_gapped'}) {\n\tmy $seqstr = join('', @sequence);\n\t$seqstr =~ s/\\s//g;\n        my $num_gaps = CORE::length($seqstr) - $self->{$seqType.'Length'};\n\t$self->{$seqType.'Gaps'} = $num_gaps if $num_gaps > 0;\n#    }\n}\n\n\n=head2 _set_residues\n\n Usage     : called automatically when residue data is requested.\n Purpose   : Sets the residue numbers representing the identical and\n           : conserved positions. 
These data are obtained by analyzing the\n           : symbols between query and sbjct lines of the alignments.\n Argument  : n/a\n Throws    : Propagates any exception thrown by _set_seq_data() and _set_match_seq().\n Comments  : These data are not always needed, so it is conditionally\n           : executed only upon demand by methods such as seq_inds().\n           : Behavior is dependent on the type of BLAST analysis (TBLASTN, BLASTP, etc).\n\nSee Also   : L</_set_seq_data>, L</_set_match_seq>, L</seq_inds>\n\n\n#------------------\nsub _set_residues {\n#------------------\n    my $self      = shift;\n    my @sequence  = ();\n\n    $self->_set_seq_data() unless $self->{'_set_seq_data'};\n\n    # Using hashes to avoid saving duplicate residue numbers.\n    my %identicalList_query = ();\n    my %identicalList_sbjct = ();\n    my %conservedList_query = ();\n    my %conservedList_sbjct = ();\n\n    my $aref = $self->_set_match_seq() if not ref $self->{'_matchSeq'};\n    $aref  ||= $self->{'_matchSeq'};\n    my $seqString = join('', @$aref );\n\n    my $qseq = join('',@{$self->{'_querySeq'}});\n    my $sseq = join('',@{$self->{'_sbjctSeq'}});\n    my $resCount_query = $self->{'_queryStop'} || 0;\n    my $resCount_sbjct = $self->{'_sbjctStop'} || 0;\n\n    my $prog = $self->algorithm;\n    if($prog !~ /^BLASTP|^BLASTN/) {\n\tif($prog eq 'TBLASTN') {\n\t    $resCount_sbjct /= 3;\n\t} elsif($prog eq 'BLASTX') {\n\t    $resCount_query /= 3;\n\t} elsif($prog eq 'TBLASTX') {\n\t    $resCount_query /= 3;\n\t    $resCount_sbjct /= 3;\n\t}\n    }\n\n    my ($mchar, $schar, $qchar);\n    while( $mchar = chop($seqString) ) {\n\t($qchar, $schar) = (chop($qseq), chop($sseq));\n\tif( $mchar eq '+' ) {\n\t    $conservedList_query{ $resCount_query } = 1;\n\t    $conservedList_sbjct{ $resCount_sbjct } = 1;\n\t} elsif( $mchar ne ' ' ) {\n\t    $identicalList_query{ $resCount_query } = 1;\n\t    $identicalList_sbjct{ $resCount_sbjct } = 1;\n\t}\n\t$resCount_query-- if $qchar ne 
$GAP_SYMBOL;\n\t$resCount_sbjct-- if $schar ne $GAP_SYMBOL;\n    }\n    $self->{'_identicalRes_query'} = \\%identicalList_query;\n    $self->{'_conservedRes_query'} = \\%conservedList_query;\n    $self->{'_identicalRes_sbjct'} = \\%identicalList_sbjct;\n    $self->{'_conservedRes_sbjct'} = \\%conservedList_sbjct;\n\n}\n\n=head2 _set_match_seq\n\n Usage     : $hsp_obj->_set_match_seq()\n Purpose   : Set the 'match' sequence for the current HSP (symbols in between\n           : the query and sbjct lines.)\n Returns   : Array reference holding the match sequence lines.\n Argument  : n/a\n Throws    : Exception if the _matchList field is not set.\n Comments  : The match information is not always necessary. This method\n           : allows it to be conditionally prepared.\n           : Called by _set_residues() and seq_str().\n\nSee Also   : L</_set_residues>, L</seq_str>\n\n\n#-------------------\nsub _set_match_seq {\n#-------------------\n    my $self = shift;\n\n    if( ! ref($self->{'_matchList'}) ) {\n        my $id_str = $self->_id_str;\n        $self->throw(\"Can't set HSP match sequence: No data ($id_str)\");\n    }\n\n    my @data = @{$self->{'_matchList'}};\n\n    my(@sequence);\n    foreach( @data ) {\n\tchomp($_);\n\t## Remove leading spaces; (note: aln may begin with a space\n\t## which is why we can't use s/^ +//).\n\ts/^ {$self->{'_match_indent'}}//;\n\tpush @sequence, $_;\n    }\n    # Liberate some memory.\n    @{$self->{'_matchList'}} = ();\n    $self->{'_matchList'} = undef;\n\n    $self->{'_matchSeq'} = \\@sequence;\n\n    return $self->{'_matchSeq'};\n}\n\n\n=head2 n\n\n Usage     : $hsp_obj->n()\n Purpose   : Get the N value (num HSPs on which P/Expect is based).\n           : This value is not defined with NCBI Blast2 with gapping.\n Returns   : Integer or null string if not defined.\n Argument  : n/a\n Throws    : n/a\n Comments  : The 'N' value is listed in parentheses with P/Expect value:\n           : e.g., P(3) = 1.2e-30  ---> (N = 
3).\n           : Not defined in NCBI Blast2 with gaps.\n           : This typically is equal to the number of HSPs but not always.\n           : To obtain the number of HSPs, use Bio::Search::Hit::BlastHit::num_hsps().\n\nSee Also   : L<Bio::SeqFeature::SimilarityPair::score()|Bio::SeqFeature::SimilarityPair>\n\n\n#-----\n# The N value is stored under '_n' by _set_score_stats().\nsub n { my $self = shift; $self->{'_n'} || ''; }\n#-----\n\n\n=head2 matches\n\n Usage     : $hsp->matches(-SEQ => seq_type, -START => start, -STOP => stop);\n Purpose   : Get the total number of identical and conservative matches\n           : in the query or sbjct sequence for the given HSP. Optionally can\n           : report data within a defined interval along the seq.\n           : (Note: 'conservative' matches are called 'positives' in the\n\t   : Blast report.)\n Example   : ($id,$cons) = $hsp_object->matches(-SEQ => 'hit');\n           : ($id,$cons) = $hsp_object->matches(-SEQ => 'query', -START => 300, -STOP => 400);\n Returns   : 2-element array of integers\n Argument  : Named parameters (keys must be upper-case; all are optional):\n           : -SEQ   = 'query' or 'hit' or 'sbjct' (default = query)\n           :  ('sbjct' is synonymous with 'hit')\n           : -START = Starting coordinate\n           : -STOP  = Ending coordinate\n Throws    : Exception if the supplied coordinates are out of range.\n Comments  : Relies on seq_str('match') to get the string of alignment symbols\n           : between the query and sbjct lines which are used for determining\n           : the number of identical and conservative matches.\n\nSee Also   : L</length>, L</gaps>, L</seq_str>, L<Bio::Search::Hit::BlastHit::_adjust_contigs()|Bio::Search::Hit::BlastHit>\n\n\n#-----------\nsub matches {\n#-----------\n    my( $self, %param ) = @_;\n    my(@data);\n    my($seqType, $beg, $end) = ($param{-SEQ}, $param{-START}, $param{-STOP});\n    $seqType ||= 'query';\n    $seqType = 'sbjct' if $seqType eq 'hit';\n\n    my($start,$stop);\n\n    if(!defined $beg && !defined $end) {\n\t## Get data for the whole alignment.\n\tpush @data, ($self->{'_numIdentical'}, 
$self->{'_numConserved'});\n    } else {\n\t## Get the substring representing the desired sub-section of aln.\n\t$beg ||= 0;\n\t$end ||= 0;\n\t($start,$stop) = $self->range($seqType);\n\tif($beg == 0) { $beg = $start; $end = $beg+$end; }\n\telsif($end == 0) { $end = $stop; $beg = $end-$beg; }\n\n\tif($end >= $stop) { $end = $stop; } ##ML changed from if (end >stop)\n\telse { $end += 1;}   ##ML moved from commented position below, makes\n                             ##more sense here\n#\tif($end > $stop) { $end = $stop; }\n\tif($beg < $start) { $beg = $start; }\n#\telse { $end += 1;}\n\n#\tmy $seq = substr($self->seq_str('match'), $beg-$start, ($end-$beg));\n\n\t## ML: START fix for substr out of range error ------------------\n\tmy $seq = \"\";\n        my $prog = $self->algorithm;\n\tif (($prog eq 'TBLASTN') and ($seqType eq 'sbjct'))\n\t{\n\t    $seq = substr($self->seq_str('match'),\n\t\t\t  int(($beg-$start)/3), int(($end-$beg+1)/3));\n\n\t} elsif (($prog eq 'BLASTX') and ($seqType eq 'query'))\n\t{\n\t    $seq = substr($self->seq_str('match'),\n\t\t\t  int(($beg-$start)/3), int(($end-$beg+1)/3));\n\t} else {\n\t    $seq = substr($self->seq_str('match'),\n\t\t\t  $beg-$start, ($end-$beg));\n\t}\n\t## ML: End of fix for  substr out of range error -----------------\n\n\n\t## ML: debugging code\n\t## This is where we get our exception.  
Try printing out the values going\n\t## into this:\n\t##\n#\t print STDERR\n#\t     qq(*------------MY EXCEPTION --------------------\\nSeq: \") ,\n#\t     $self->seq_str(\"$seqType\"), qq(\"\\n),$self->rank,\",(  index:\";\n#\t print STDERR  $beg-$start, \", len: \", $end-$beg,\" ), (HSPRealLen:\",\n#\t     CORE::length $self->seq_str(\"$seqType\");\n#\t print STDERR \", HSPCalcLen: \", $stop - $start +1 ,\" ),\n#\t     ( beg: $beg, end: $end ), ( start: $start, stop: stop )\\n\";\n\t ## ML: END DEBUGGING CODE----------\n\n\tif(!CORE::length $seq) {\n            my $id_str = $self->_id_str;\n\t    $self->throw(\"Undefined $seqType sub-sequence ($beg,$end). Valid range = $start - $stop ($id_str)\");\n\t}\n\t## Get data for a substring.\n#\tprintf \"Collecting HSP subsection data: beg,end = %d,%d; start,stop = %d,%d\\n%s<---\\n\", $beg, $end, $start, $stop, $seq;\n#\tprintf \"Original match seq:\\n%s\\n\",$self->seq_str('match');\n\t$seq =~ s/ //g;  # remove space (no info).\n\tmy $len_cons = CORE::length $seq;\n\t$seq =~ s/\\+//g;  # remove '+' characters (conservative substitutions)\n\tmy $len_id = CORE::length $seq;\n\tpush @data, ($len_id, $len_cons);\n#\tprintf \"  HSP = %s\\n  id = %d; cons = %d\\n\", $self->rank, $len_id, $len_cons; <STDIN>;\n    }\n    @data;\n}\n\n\n=head2 num_identical\n\n Usage     : $hsp_object->num_identical();\n Purpose   : Get the number of identical positions within the given HSP.\n Example   : $num_iden = $hsp_object->num_identical();\n Returns   : integer\n Argument  : n/a\n Throws    : n/a\n\nSee Also   : L</num_conserved>, L</frac_identical>\n\n\n#-------------------\nsub num_identical {\n#-------------------\n    my( $self) = shift;\n\n    $self->{'_numIdentical'};\n}\n\n\n=head2 num_conserved\n\n Usage     : $hsp_object->num_conserved();\n Purpose   : Get the number of conserved positions within the given HSP.\n Example   : $num_iden = $hsp_object->num_conserved();\n Returns   : integer\n Argument  : n/a\n Throws    : 
n/a\n\nSee Also   : L</num_identical>, L</frac_conserved>\n\n\n#-------------------\nsub num_conserved {\n#-------------------\n    my( $self) = shift;\n\n    $self->{'_numConserved'};\n}\n\n\n\n=head2 range\n\n Usage     : $hsp->range( [seq_type] );\n Purpose   : Gets the (start, end) coordinates for the query or sbjct sequence\n           : in the HSP alignment.\n Example   : ($query_beg, $query_end) = $hsp->range('query');\n           : ($hit_beg, $hit_end) = $hsp->range('hit');\n Returns   : Two-element array of integers\n Argument  : seq_type = string, 'query' or 'hit' or 'sbjct'  (default = 'query')\n           :  ('sbjct' is synonymous with 'hit')\n Throws    : n/a\n\nSee Also   : L</start>, L</end>","label":"range($self,$seqType)"},"detail":"($self,$seqType)","kind":12,"children":[{"kind":13,"line":1319,"name":"$self","definition":"my","containerName":"range","localvar":"my"},{"name":"$seqType","containerName":"range","kind":13,"line":1319},{"containerName":"range","name":"$self","line":1321,"kind":13},{"line":1321,"kind":12,"name":"_set_seq_data","containerName":"range"},{"line":1321,"kind":13,"containerName":"range","name":"$self"},{"kind":13,"line":1323,"containerName":"range","name":"$seqType"},{"containerName":"range","name":"$seqType","kind":13,"line":1324},{"name":"$seqType","containerName":"range","kind":13,"line":1324},{"name":"$seqType","containerName":"range","kind":13,"line":1327},{"line":1329,"kind":13,"name":"$self","containerName":"range"},{"containerName":"range","name":"$seqType","kind":13,"line":1329},{"line":1329,"kind":13,"name":"$self","containerName":"range"},{"containerName":"range","name":"$seqType","kind":13,"line":1329}],"line":1317},{"range":{"end":{"character":9999,"line":1370},"start":{"character":0,"line":1354}},"name":"start","detail":"($self,$seqType)","signature":{"label":"start($self,$seqType)","documentation":"1;\n#-----------------------------------------------------------------\n# $Id: BlastHSP.pm 16123 2009-09-17 
12:57:27Z cjfields $\n#\n# BioPerl module Bio::Search::HSP::BlastHSP\n#\n# (This module was originally called Bio::Tools::Blast::HSP)\n#\n# Please direct questions and support issues to <bioperl-l@bioperl.org> \n#\n# Cared for by Steve Chervitz <sac@bioperl.org>\n#\n# You may distribute this module under the same terms as perl itself\n#-----------------------------------------------------------------\n\n## POD Documentation:\n\n=head1 NAME\n\nBio::Search::HSP::BlastHSP - Bioperl BLAST High-Scoring Pair object\n\n=head1 SYNOPSIS\n\nSee L<Bio::Search::Hit::BlastHit>.\n\n=head1 DESCRIPTION\n\nA Bio::Search::HSP::BlastHSP object provides an interface to data\nobtained in a single alignment section of a Blast report (known as a\n\"High-scoring Segment Pair\"). This is essentially a pairwise\nalignment with score information.\n\nBlastHSP objects are accessed via L<Bio::Search::Hit::BlastHit>\nobjects after parsing a BLAST report using the L<Bio::SearchIO>\nsystem.\n\nThe construction of BlastHSP objects is performed by\nBio::Factory::BlastHitFactory in a process that is\norchestrated by the Blast parser (L<Bio::SearchIO::psiblast>).\nThe resulting BlastHSPs are then accessed via\nL<Bio::Search::Hit::BlastHit>). Therefore, you do not need to\nuse L<Bio::Search::HSP::BlastHSP>) directly. If you need to construct\nBlastHSPs directly, see the new() function for details.\n\nFor L<Bio::SearchIO> BLAST parsing usage examples, see the\nC<examples/searchio> directory of the Bioperl distribution.\n\n=head2 Start and End coordinates\n\nSequence endpoints are swapped so that start is always less than\nend. This affects For TBLASTN/X hits on the minus strand. Strand\ninformation can be recovered using the strand() method. This\nnormalization step is standard Bioperl practice. 
It also facilitates\nuse of range information by methods such as match().\n\n=over 1\n\n* * Supports BLAST versions 1.x and 2.x, gapped and ungapped.\n\n\nBio::Search::HSP::BlastHSP.pm has the ability to extract a list of all\nresidue indices for identical and conservative matches along both\nquery and sbjct sequences. Since this degree of detail is not always\nneeded, this behavior does not occur during construction of the BlastHSP\nobject.  These data will automatically be collected as necessary as\nthe BlastHSP.pm object is used.\n\n=head1 DEPENDENCIES\n\nBio::Search::HSP::BlastHSP.pm is a concrete class that inherits from\nL<Bio::SeqFeature::SimilarityPair> and L<Bio::Search::HSP::HSPI>.\nL<Bio::Seq> and L<Bio::SimpleAlign> are employed for creating\nsequence and alignment objects, respectively.\n\n=head2 Relationship to SimpleAlign.pm & Seq.pm\n\nBlastHSP.pm can provide the query or sbjct sequence as a L<Bio::Seq>\nobject via the L<seq()|seq> method. The BlastHSP.pm object can also create a\ntwo-sequence L<Bio::SimpleAlign> alignment object using the query\nand sbjct sequences via the L<get_aln()|get_aln> method. Creation of alignment\nobjects is not automatic when constructing the BlastHSP.pm object since\nthis level of functionality is not always required and would generate\na lot of extra overhead when crunching many reports.\n\n\n=head1 FEEDBACK\n\n=head2 Mailing Lists\n\nUser feedback is an integral part of the evolution of this and other\nBioperl modules.  Send your comments and suggestions preferably to one\nof the Bioperl mailing lists.  Your participation is much appreciated.\n\n  bioperl-l@bioperl.org                  - General discussion\n  http://bioperl.org/wiki/Mailing_lists  - About the mailing lists\n\n=head2 Support \n\nPlease direct usage questions or support issues to the mailing list:\n\nI<bioperl-l@bioperl.org>\n\nrather than to the module maintainer directly. 
Many experienced and \nresponsive experts will be able to look at the problem and quickly \naddress it. Please include a thorough description of the problem \nwith code and data examples if at all possible.\n\n=head2 Reporting Bugs\n\nReport bugs to the Bioperl bug tracking system to help us keep track of\nthe bugs and their resolution. Bug reports can be submitted via the\nweb:\n\n  http://bugzilla.open-bio.org/\n\n=head1 AUTHOR\n\nSteve Chervitz E<lt>sac-at-bioperl.orgE<gt>\n\nSee L<the FEEDBACK section | FEEDBACK> for where to send bug reports and comments.\n\n=head1 ACKNOWLEDGEMENTS\n\nThis software was originally developed in the Department of Genetics\nat Stanford University. I would also like to acknowledge my\ncolleagues at Affymetrix for useful feedback.\n\n=head1 SEE ALSO\n\n Bio::Search::Hit::BlastHit.pm          - Blast hit object.\n Bio::Search::Result::BlastResult.pm    - Blast Result object.\n Bio::Seq.pm                            - Biosequence object\n\n=head2 Links:\n\n http://bio.perl.org/                       - Bioperl Project Homepage\n\n=head1 COPYRIGHT\n\nCopyright (c) 1996-2001 Steve Chervitz. 
All Rights Reserved.\n\n=head1 DISCLAIMER\n\nThis software is provided \"as is\" without warranty of any kind.\n\n\n\n# END of main POD documentation.\n\n=head1 APPENDIX\n\nThe rest of the documentation details each of the object methods.\nInternal methods are usually preceded with a _\n\n\n# Let the code begin...\n\npackage Bio::Search::HSP::BlastHSP;\n\nuse strict;\nuse Bio::SeqFeature::Similarity;\n\nuse vars qw($GAP_SYMBOL %STRAND_SYMBOL);\n\nuse overload\n    '\"\"' => \\&to_string;\n\nuse base qw(Bio::SeqFeature::SimilarityPair Bio::Search::HSP::HSPI);\n\n$GAP_SYMBOL    = '-';          # Need a more general way to handle gap symbols.\n%STRAND_SYMBOL = ('Plus' => 1, 'Minus' => -1 );\n\n\n=head2 new\n\n Usage     : $hsp = Bio::Search::HSP::BlastHSP->new( %named_params );\n           : Bio::Search::HSP::BlastHSP objects are constructed\n           : automatically by Bio::Factory::BlastHitFactory,\n           : so there is no need for direct instantiation.\n Purpose   : Constructs a new BlastHSP object and initializes key variables\n           : for the HSP.\n Returns   : A Bio::Search::HSP::BlastHSP object\n Argument  : Named parameters:\n           : Parameter keys are case-insensitive.\n           :      -RAW_DATA  => array ref containing raw BLAST report data\n           :                    for a single HSP. 
This includes all lines\n           :                    of the HSP alignment from a traditional BLAST\n                                or PSI-BLAST (non-XML) report,\n           :      -RANK         => integer (1..n).\n           :      -PROGRAM      => string ('TBLASTN', 'BLASTP', etc.).\n           :      -QUERY_NAME   => string, id of query sequence\n           :      -HIT_NAME     => string, id of hit sequence\n           :\n Comments  : Having the raw data allows this object to do lazy parsing of\n           : the raw HSP data (i.e., not parsed until needed).\n           :\n           : Note that there is a fair amount of basic parsing that is\n           : currently performed in this module that would be more appropriate\n           : to do within a separate factory object.\n           : This parsing code will likely be relocated and more initialization\n           : parameters will be added to new().\n           :\nSee Also   : L<Bio::SeqFeature::SimilarityPair::new()>, L<Bio::SeqFeature::Similarity::new()>\n\n\n#----------------\nsub new {\n#----------------\n    my ($class, @args ) = @_;\n\n    my $self = $class->SUPER::new( @args );\n    # Initialize placeholders\n    $self->{'_queryGaps'} = $self->{'_sbjctGaps'} = 0;\n    my ($raw_data, $qname, $hname, $qlen, $hlen);\n\n    ($self->{'_prog'}, $self->{'_rank'}, $raw_data,\n     $qname, $hname) =\n\t $self->_rearrange([qw( PROGRAM\n\t\t\t\tRANK\n\t\t\t\tRAW_DATA\n\t\t\t\tQUERY_NAME\n\t\t\t\tHIT_NAME\n\t\t\t\t)], @args );\n\n    # _set_data() does a fair amount of parsing.\n    # This will likely change (see comment above.)\n    $self->_set_data( @{$raw_data} );\n    # Store the aligned query as sequence feature\n    my ($qb, $hb) = ($self->start());\n    my ($qe, $he) = ($self->end());\n    my ($qs, $hs) = ($self->strand());\n    my ($qf,$hf) = ($self->query->frame(),\n\t\t    $self->hit->frame);\n\n    $self->query( Bio::SeqFeature::Similarity->new (-start   =>$qb,\n\t\t\t\t\t\t    -end     
=>$qe,\n\t\t\t\t\t\t    -strand  =>$qs,\n\t\t\t\t\t\t    -bits    =>$self->bits,\n\t\t\t\t\t\t    -score   =>$self->score,\n\t\t\t\t\t\t    -frame   =>$qf,\n\t\t\t\t\t\t    -seq_id  => $qname,\n\t\t\t\t\t\t    -source  =>$self->{'_prog'} ));\n\n    $self->hit( Bio::SeqFeature::Similarity->new (-start   =>$hb,\n\t\t\t\t\t\t  -end     =>$he,\n\t\t\t\t\t\t  -strand  =>$hs,\n\t\t\t\t\t\t  -bits    =>$self->bits,\n\t\t\t\t\t\t  -score   =>$self->score,\n                                                  -frame   =>$hf,\n\t\t\t\t\t\t  -seq_id  => $hname,\n\t\t\t\t\t\t  -source  =>$self->{'_prog'} ));\n\n    # set lengths\n    $self->query->seqlength($qlen); # query\n    $self->hit->seqlength($hlen); # subject\n\n    $self->query->frac_identical($self->frac_identical('query'));\n    $self->hit->frac_identical($self->frac_identical('hit'));\n    return $self;\n}\n\n#sub DESTROY {\n#    my $self = shift;\n#    #print STDERR \"--->DESTROYING $self\\n\";\n#}\n\n\n# Title   : _id_str;\n# Purpose : Intended for internal use only to provide a string for use\n#           within exception messages to help users figure out which\n#           query/hit caused the problem.\n# Returns : Short string with name of query and hit seq\nsub _id_str {\n    my $self = shift;\n    if( not defined $self->{'_id_str'}) {\n        my $qname = $self->query->seq_id;\n        my $hname = $self->hit->seq_id;\n        $self->{'_id_str'} = \"QUERY=\\\"$qname\\\" HIT=\\\"$hname\\\"\";\n    }\n    return $self->{'_id_str'};\n}\n\n#=================================================\n# Begin Bio::Search::HSP::HSPI implementation\n#=================================================\n\n=head2 algorithm\n\n Title   : algorithm\n Usage   : $alg = $hsp->algorithm();\n Function: Gets the algorithm specification that was used to obtain the hsp\n           For BLAST, the algorithm denotes what type of sequence was aligned\n           against what (BLASTN: dna-dna, BLASTP prt-prt, BLASTX translated\n           dna-prt, 
TBLASTN prt-translated dna, TBLASTX translated\n           dna-translated dna).\n Returns : a scalar string\n Args    : none\n\n\n#----------------\nsub algorithm {\n#----------------\n    my ($self,@args) = @_;\n    return $self->{'_prog'};\n}\n\n\n\n\n=head2 signif()\n\n Usage     : $hsp_obj->signif()\n Purpose   : Get the P-value or Expect value for the HSP.\n Returns   : Float (0.001 or 1.3e-43)\n           : Returns P-value if it is defined, otherwise, Expect value.\n Argument  : n/a\n Throws    : n/a\n Comments  : Provided for consistency with BlastHit::signif()\n           : Support for returning the significance data in different\n           : formats (e.g., exponent only), is not provided for HSP objects.\n           : This is only available for the BlastHit or Blast object.\n\nSee Also   : L</p>, L</expect>, L<Bio::Search::Hit::BlastHit::signif()|Bio::Search::Hit::BlastHit>\n\n\n#-----------\nsub signif {\n#-----------\n    my $self = shift;\n    my $val ||= defined($self->{'_p'}) ? 
$self->{'_p'} : $self->{'_expect'};\n    $val;\n}\n\n\n\n=head2 evalue\n\n Usage     : $hsp_obj->evalue()\n Purpose   : Get the Expect value for the HSP.\n Returns   : Float (0.001 or 1.3e-43)\n Argument  : n/a\n Throws    : n/a\n Comments  : Support for returning the expectation data in different\n           : formats (e.g., exponent only), is not provided for HSP objects.\n           : This is only available for the BlastHit or Blast object.\n\nSee Also   : L</p>\n\n\n#----------\nsub evalue { shift->{'_expect'} }\n#----------\n\n\n=head2 p\n\n Usage     : $hsp_obj->p()\n Purpose   : Get the P-value for the HSP.\n Returns   : Float (0.001 or 1.3e-43) or undef if not defined.\n Argument  : n/a\n Throws    : n/a\n Comments  : P-value is not defined with NCBI Blast2 reports.\n           : Support for returning the expectation data in different\n           : formats (e.g., exponent only) is not provided for HSP objects.\n           : This is only available for the BlastHit or Blast object.\n\nSee Also   : L<expect()|expect>\n\n\n#-----\nsub p { my $self = shift; $self->{'_p'}; }\n#-----\n\n# alias\nsub pvalue { shift->p(@_); }\n\n=head2 length\n\n Usage     : $hsp->length( [seq_type] )\n Purpose   : Get the length of the aligned portion of the query or sbjct.\n Example   : $hsp->length('query')\n Returns   : integer\n Argument  : seq_type: 'query' | 'hit' or 'sbjct' | 'total'  (default = 'total')\n             ('sbjct' is synonymous with 'hit')\n Throws    : n/a\n Comments  : 'total' length is the full length of the alignment\n           : as reported in the denominators in the alignment section:\n           : \"Identical = 34/120 Positives = 67/120\".\n\nSee Also   : L</gaps>\n\n\n#-----------\nsub length {\n#-----------\n## Developer note: when using the built-in length function within\n##                 this module, call it as CORE::length().\n    my( $self, $seqType,$data ) = @_;\n    $seqType  ||= 'total';\n    $seqType = 'sbjct' if $seqType eq 'hit';\n\n    
$seqType ne 'total' and $self->_set_seq_data() unless $self->{'_set_seq_data'};\n\n    ## Sensitive to member name format.\n    $seqType = \"_\\L$seqType\\E\";\n    if( defined $data  ) {\n\t$self->{$seqType.'Length'} = $data;\n    }\n    $self->{$seqType.'Length'};\n}\n\n\n\n=head2 gaps\n\n Usage     : $hsp->gaps( [seq_type] )\n Purpose   : Get the number of gap characters in the query, sbjct, or total alignment.\n           : Also can return query gap chars and sbjct gap chars as a two-element list\n           : when in array context.\n Example   : $total_gaps      = $hsp->gaps();\n           : ($qgaps, $sgaps) = $hsp->gaps();\n           : $qgaps           = $hsp->gaps('query');\n Returns   : scalar context: integer\n           : array context without args: (int, int) = ('queryGaps', 'sbjctGaps')\n Argument  : seq_type: 'query' or 'hit' or 'sbjct' or 'total'\n           :  ('sbjct' is synonymous with 'hit')\n           : (default = 'total', scalar context)\n           : Array context can be \"induced\" by providing an argument of 'list' or 'array'.\n Throws    : n/a\n\nSee Also   : L</length>, L</matches>\n\n\n#---------\nsub gaps {\n#---------\n    my( $self, $seqType ) = @_;\n\n    $self->_set_seq_data() unless $self->{'_set_seq_data'};\n\n    $seqType  ||= (wantarray ? 
'list' : 'total');\n    $seqType = 'sbjct' if $seqType eq 'hit';\n\n    if($seqType =~ /list|array/i) {\n\treturn (($self->{'_queryGaps'} || 0), ($self->{'_sbjctGaps'} || 0));\n    }\n\n    if($seqType eq 'total') {\n\treturn ($self->{'_queryGaps'} + $self->{'_sbjctGaps'}) || 0;\n    } else {\n\t## Sensitive to member name format.\n\t$seqType = \"_\\L$seqType\\E\";\n\treturn $self->{$seqType.'Gaps'} || 0;\n    }\n}\n\n\n=head2 frac_identical\n\n Usage     : $hsp_object->frac_identical( [seq_type] );\n Purpose   : Get the fraction of identical positions within the given HSP.\n Example   : $frac_iden = $hsp_object->frac_identical('query');\n Returns   : Float (2-decimal precision, e.g., 0.75).\n Argument  : seq_type: 'query' or 'hit' or 'sbjct' or 'total'\n           :  ('sbjct' is synonymous with 'hit')\n           : default = 'total' (but see comments below).\n Throws    : n/a\n Comments  : Different versions of Blast report different values for the total\n           : length of the alignment. 
This is the number reported in the\n           : denominators in the stats section:\n           : \"Identical = 34/120 Positives = 67/120\".\n           : NCBI-BLAST uses the total length of the alignment (with gaps)\n           : WU-BLAST uses the length of the query sequence (without gaps).\n           : Therefore, when called without an argument or an argument of 'total',\n           : this method will report different values depending on the\n           : version of BLAST used.\n           :\n           : To get the fraction identical among only the aligned residues,\n           : ignoring the gaps, call this method with an argument of 'query'\n           : or 'sbjct' ('sbjct' is synonymous with 'hit').\n\nSee Also   : L</frac_conserved>, L</num_identical>, L</matches>\n\n\n#-------------------\nsub frac_identical {\n#-------------------\n# The value is calculated as opposed to storing it from the parsed results.\n# This saves storage and also permits flexibility in determining for which\n# sequence (query or sbjct) the figure is to be calculated.\n\n    my( $self, $seqType ) = @_;\n    $seqType ||= 'total';\n    $seqType = 'sbjct' if $seqType eq 'hit';\n\n    if($seqType ne 'total') {\n      $self->_set_seq_data() unless $self->{'_set_seq_data'};\n    }\n    ## Sensitive to member name format.\n    $seqType = \"_\\L$seqType\\E\";\n\n    sprintf( \"%.2f\", $self->{'_numIdentical'}/$self->{$seqType.'Length'});\n}\n\n\n=head2 frac_conserved\n\n Usage     : $hsp_object->frac_conserved( [seq_type] );\n Purpose   : Get the fraction of conserved positions within the given HSP.\n           : (Note: 'conservative' positions are called 'positives' in the\n\t   : Blast report.)\n Example   : $frac_cons = $hsp_object->frac_conserved('query');\n Returns   : Float (2-decimal precision, e.g., 0.75).\n Argument  : seq_type: 'query' or 'hit' or 'sbjct' or 'total'\n           :  ('sbjct' is synonymous with 'hit')\n           : default = 'total' (but see comments below).\n 
Throws    : n/a\n Comments  : Different versions of Blast report different values for the total\n           : length of the alignment. This is the number reported in the\n           : denominators in the stats section:\n           : \"Identical = 34/120 Positives = 67/120\".\n           : NCBI-BLAST uses the total length of the alignment (with gaps)\n           : WU-BLAST uses the length of the query sequence (without gaps).\n           : Therefore, when called without an argument or an argument of 'total',\n           : this method will report different values depending on the\n           : version of BLAST used.\n           :\n           : To get the fraction conserved among only the aligned residues,\n           : ignoring the gaps, call this method with an argument of 'query'\n           : or 'sbjct'.\n\nSee Also   : L</frac_identical>, L</num_conserved>, L</matches>\n\n\n#--------------------\nsub frac_conserved {\n#--------------------\n# The value is calculated as opposed to storing it from the parsed results.\n# This saves storage and also permits flexibility in determining for which\n# sequence (query or sbjct) the figure is to be calculated.\n\n    my( $self, $seqType ) = @_;\n    $seqType ||= 'total';\n    $seqType = 'sbjct' if $seqType eq 'hit';\n\n    if($seqType ne 'total') {\n      $self->_set_seq_data() unless $self->{'_set_seq_data'};\n    }\n\n    ## Sensitive to member name format.\n    $seqType = \"_\\L$seqType\\E\";\n\n    sprintf( \"%.2f\", $self->{'_numConserved'}/$self->{$seqType.'Length'});\n}\n\n=head2 query_string\n\n Title   : query_string\n Usage   : my $qseq = $hsp->query_string;\n Function: Retrieves the query sequence of this HSP as a string\n Returns : string\n 
Args    : none\n\n\n\n#----------------\nsub hit_string{ shift->seq_str('hit'); }\n#----------------\n\n\n=head2 homology_string\n\n Title   : homology_string\n Usage   : my $homo_string = $hsp->homology_string;\n Function: Retrieves the homology sequence for this HSP as a string.\n         : The homology sequence is the string of symbols in between the\n         : query and hit sequences in the alignment indicating the degree\n         : of conservation (e.g., identical, similar, not similar).\n Returns : string\n Args    : none\n\n\n#----------------\nsub homology_string{ shift->seq_str('match'); }\n#----------------\n\n#=================================================\n# End Bio::Search::HSP::HSPI implementation\n#=================================================\n\n# Older method delegating to method defined in HSPI.\n\n=head2 expect\n\nSee L<Bio::Search::HSP::HSPI::expect()|Bio::Search::HSP::HSPI>\n\n\n#----------\nsub expect { shift->evalue( @_ ); }\n#----------\n\n\n=head2 rank\n\n Usage     : $hsp->rank( [string] );\n Purpose   : Get the rank of the HSP within a given Blast hit.\n Example   : $rank = $hsp->rank;\n Returns   : Integer (1..n) corresponding to the order in which the HSP\n             appears in the BLAST report.\n\n\n#'\n\n#----------\nsub rank { shift->{'_rank'} }\n#----------\n\n# For backward compatibility\n#----------\nsub name { shift->rank }\n#----------\n\n=head2 to_string\n\n Title   : to_string\n Usage   : print $hsp->to_string;\n Function: Returns a string representation for the Blast HSP.\n           Primarily intended for debugging purposes.\n Example : see usage\n Returns : A string of the form:\n           [BlastHSP] <rank>\n           e.g.:\n           [BlastHSP] 1\n Args    : None\n\n\n#----------\nsub to_string {\n#----------\n    my $self = shift;\n    return \"[BlastHSP] \" . 
$self->rank();\n}\n\n\n=head2 _set_data\n\n Usage     : called automatically during object construction.\n Purpose   : Parses the raw HSP section from a flat BLAST report and\n             sets the query sequence, sbjct sequence, and the \"match\" data\n           : which consists of the symbols between the query and sbjct lines\n           : in the alignment.\n Argument  : Array (all lines for a single, complete HSP, from a raw,\n             flat (i.e., non-XML) BLAST report)\n Throws    : Propagates any exceptions from the methods called (\"See Also\")\n\nSee Also   : L</_set_seq>, L</_set_score_stats>, L</_set_match_stats>\n\n\n#--------------\nsub _set_data {\n#--------------\n    my $self = shift;\n    my @data = @_;\n    my @queryList  = ();  # 'Query' = SEQUENCE USED TO QUERY THE DATABASE.\n    my @sbjctList  = ();  # 'Sbjct' = HOMOLOGOUS SEQUENCE FOUND IN THE DATABASE.\n    my @matchList  = ();\n    my $matchLine  = 0;   # Alternating boolean: when true, load 'match' data.\n    my @linedat = ();\n\n    #print STDERR \"BlastHSP: set_data()\\n\";\n\n    my($line, $aln_row_len, $length_diff);\n    $length_diff = 0;\n\n    # Collecting data for all lines in the alignment\n    # and then storing the collections for possible processing later.\n    #\n    # Note that \"match\" lines may not be properly padded with spaces.\n    # This loop now properly handles such cases:\n    # Query: 1141 PSLVELTIRDCPRLEVGPMIRSLPKFPMLKKLDLAVANIIEEDLDVIGSLEELVIXXXXX 1200\n    #             PSLVELTIRDCPRLEVGPMIRSLPKFPMLKKLDLAVANIIEEDLDVIGSLEELVI\n    # Sbjct: 1141 PSLVELTIRDCPRLEVGPMIRSLPKFPMLKKLDLAVANIIEEDLDVIGSLEELVILSLKL 1200\n\n    foreach $line( @data ) {\n\tnext if $line =~ /^\\s*$/;\n\n\tif( $line =~ /^ ?Score/ ) {\n\t    $self->_set_score_stats( $line );\n\t} elsif( $line =~ /^ ?(Identities|Positives|Strand)/ ) {\n\t    $self->_set_match_stats( $line );\n\t} elsif( $line =~ /^ ?Frame = ([\\d+-]+)/ ) {\n\t  # Version 2.0.8 has Frame information on a separate line.\n\t  # 
Storing frame according to SeqFeature::Generic::frame()\n\t  # which does not contain strand info (use strand()).\n\t  my $frame = abs($1) - 1;\n\t  $self->frame( $frame );\n\t} elsif( $line =~ /^(Query:?[\\s\\d]+)([^\\s\\d]+)/ ) {\n\t    push @queryList, $line;\n\t    $self->{'_match_indent'} = CORE::length $1;\n\t    $aln_row_len = (CORE::length $1) + (CORE::length $2);\n\t    $matchLine = 1;\n\t} elsif( $matchLine ) {\n\t    # Pad the match line with spaces if necessary.\n\t    $length_diff = $aln_row_len - CORE::length $line;\n\t    $length_diff and $line .= ' 'x $length_diff;\n\t    push @matchList, $line;\n\t    $matchLine = 0;\n\t} elsif( $line =~ /^Sbjct/ ) {\n\t    push @sbjctList, $line;\n\t}\n    }\n    # Storing the query and sbjct lists in case they are needed later.\n    # We could make this conditional to save memory.\n    $self->{'_queryList'} = \\@queryList;\n    $self->{'_sbjctList'} = \\@sbjctList;\n\n    # Storing the match list in case it is needed later.\n    $self->{'_matchList'} = \\@matchList;\n\n    if(not defined ($self->{'_numIdentical'})) {\n        my $id_str = $self->_id_str;\n        $self->throw( -text  => \"Can't parse match statistics. Possibly a new or unrecognized Blast format. ($id_str)\");\n    }\n\n    if(!scalar @queryList or !scalar @sbjctList) {\n        my $id_str = $self->_id_str;\n        $self->throw( \"Can't find query or sbjct alignment lines. Possibly unrecognized Blast format. 
($id_str)\");\n    }\n}\n\n\n=head2 _set_score_stats\n\n Usage     : called automatically by _set_data()\n Purpose   : Sets various score statistics obtained from the HSP listing.\n Argument  : String with any of the following formats:\n           : blast2:  Score = 30.1 bits (66), Expect = 9.2\n           : blast2:  Score = 158.2 bits (544), Expect(2) = e-110\n           : blast1:  Score = 410 (144.3 bits), Expect = 1.7e-40, P = 1.7e-40\n           : blast1:  Score = 55 (19.4 bits), Expect = 5.3, Sum P(3) = 0.99\n Throws    : Exception if the stats cannot be parsed, probably due to a change\n           : in the Blast report format.\n\nSee Also   : L</_set_data>\n\n\n#--------------------\nsub _set_score_stats {\n#--------------------\n    my ($self, $data) = @_;\n\n    my ($expect, $p);\n\n    if($data =~ /Score = +([\\d.e+-]+) bits \\(([\\d.e+-]+)\\), +Expect = +([\\d.e+-]+)/) {\n\t# blast2 format n = 1\n\t$self->bits($1);\n\t$self->score($2);\n\t$expect            = $3;\n    } elsif($data =~ /Score = +([\\d.e+-]+) bits \\(([\\d.e+-]+)\\), +Expect\\((\\d+)\\) = +([\\d.e+-]+)/) {\n\t# blast2 format n > 1\n\t$self->bits($1);\n\t$self->score($2);\n\t$self->{'_n'}      = $3;\n\t$expect            = $4;\n\n    } elsif($data =~ /Score = +([\\d.e+-]+) \\(([\\d.e+-]+) bits\\), +Expect = +([\\d.e+-]+), P = +([\\d.e-]+)/) {\n\t# blast1 format, n = 1\n\t$self->score($1);\n\t$self->bits($2);\n\t$expect            = $3;\n\t$p                 = $4;\n\n    } elsif($data =~ /Score = +([\\d.e+-]+) \\(([\\d.e+-]+) bits\\), +Expect = +([\\d.e+-]+), +Sum P\\((\\d+)\\) = +([\\d.e-]+)/) {\n\t# blast1 format, n > 1\n\t$self->score($1);\n\t$self->bits($2);\n\t$expect            = $3;\n\t$self->{'_n'}      = $4;\n\t$p                 = $5;\n\n    } else {\n        my $id_str = $self->_id_str;\n\t$self->throw(-class => 'Bio::Root::Exception',\n\t\t     -text => \"Can't parse score statistics: unrecognized format. 
($id_str)\",\n\t\t     -value => $data);\n    }\n    $expect = \"1$expect\" if $expect =~ /^e/i;\n    $p      = \"1$p\"      if defined $p and $p=~ /^e/i;\n\n    $self->{'_expect'} = $expect;\n    $self->{'_p'}      = $p || undef;\n    $self->significance( $p || $expect );\n}\n\n\n=head2 _set_match_stats\n\n Usage     : Private method; called automatically by _set_data()\n Purpose   : Sets various matching statistics obtained from the HSP listing.\n Argument  : blast2: Identities = 23/74 (31%), Positives = 29/74 (39%), Gaps = 17/74 (22%)\n           : blast2: Identities = 57/98 (58%), Positives = 74/98 (75%)\n           : blast1: Identities = 87/204 (42%), Positives = 126/204 (61%)\n           : blast1: Identities = 87/204 (42%), Positives = 126/204 (61%), Frame = -3\n           : WU-blast: Identities = 310/553 (56%), Positives = 310/553 (56%), Strand = Minus / Plus\n Throws    : Exception if the stats cannot be parsed, probably due to a change\n           : in the Blast report format.\n Comments  : The \"Gaps = \" data in the HSP header has a different meaning depending\n           : on the type of Blast: for BLASTP, this number is the total number of\n           : gaps in query+sbjct; for TBLASTN, it is the number of gaps in the\n           : query sequence only. Thus, it is safer to collect the data\n           : separately by examining the actual sequence strings as is done\n           : in _set_seq().\n\nSee Also   : L</_set_data>, L</_set_seq>\n\n\n#--------------------\nsub _set_match_stats {\n#--------------------\n    my ($self, $data) = @_;\n\n    if($data =~ m!Identities = (\\d+)/(\\d+)!) {\n      # blast1 or 2 format\n      $self->{'_numIdentical'} = $1;\n      $self->{'_totalLength'}  = $2;\n    }\n\n    if($data =~ m!Positives = (\\d+)/(\\d+)!) {\n      # blast1 or 2 format\n      $self->{'_numConserved'} = $1;\n      $self->{'_totalLength'}  = $2;\n    }\n\n    if($data =~ m!Frame = ([\\d+-]+)!) 
{\n      $self->frame($1);\n    }\n\n    # Strand data is not always present in this line.\n    # _set_seq() will also set strand information.\n    if($data =~ m!Strand = (\\w+) / (\\w+)!) {\n\t$self->{'_queryStrand'} = $1;\n\t$self->{'_sbjctStrand'} = $2;\n    }\n\n#    if($data =~ m!Gaps = (\\d+)/(\\d+)!) {\n#\t $self->{'_totalGaps'} = $1;\n#    } else {\n#\t $self->{'_totalGaps'} = 0;\n#    }\n}\n\n\n\n=head2 _set_seq_data\n\n Usage     : called automatically when sequence data is requested.\n Purpose   : Sets the HSP sequence data for both query and sbjct sequences.\n           : Includes: start, stop, length, gaps, and raw sequence.\n Argument  : n/a\n Throws    : Propagates any exception thrown by _set_match_seq()\n Comments  : Uses raw data stored by _set_data() during object construction.\n           : These data are not always needed, so it is conditionally\n           : executed only upon demand by methods such as gaps(), _set_residues(),\n           : etc. _set_seq() does the dirty work.\n\nSee Also   : L</_set_seq>\n\n\n#-----------------\nsub _set_seq_data {\n#-----------------\n    my $self = shift;\n\n    $self->_set_seq('query', @{$self->{'_queryList'}});\n    $self->_set_seq('sbjct', @{$self->{'_sbjctList'}});\n\n    # Liberate some memory.\n    @{$self->{'_queryList'}} = @{$self->{'_sbjctList'}} = ();\n    undef $self->{'_queryList'};\n    undef $self->{'_sbjctList'};\n\n    $self->{'_set_seq_data'} = 1;\n}\n\n\n\n=head2 _set_seq \n\n Usage     : called automatically by _set_seq_data()\n           : $hsp_obj->($seq_type, @data);\n Purpose   : Sets sequence information for both the query and sbjct sequences.\n           : Directly counts the number of gaps in each sequence (if gapped Blast).\n Argument  : $seq_type = 'query' or 'sbjct'\n           : @data = all seq lines with the form:\n           : Query: 61  SPHNVKDRKEQNGSINNAISPTATANTSGSQQINIDSALRDRSSNVAAQPSLSDASSGSN 120\n Throws    : Exception if data strings cannot be parsed, probably due to a 
change\n           : in the Blast report format.\n Comments  : Uses first argument to determine which data members to set\n           : making this method sensitive data member name changes.\n           : Behavior is dependent on the type of BLAST analysis (TBLASTN, BLASTP, etc).\n Warning   : Sequence endpoints are normalized so that start < end. This affects HSPs\n           : for TBLASTN/X hits on the minus strand. Normalization facilitates use\n           : of range information by methods such as match().\n\nSee Also   : L</_set_seq_data>, L</matches>, L</range>, L</start>, L</end>\n\n\n#-------------\nsub _set_seq {\n#-------------\n    my $self      = shift;\n    my $seqType   = shift;\n    my @data      = @_;\n    my @ranges    = ();\n    my @sequence  = ();\n    my $numGaps   = 0;\n\n    foreach( @data ) {\n        if( m/(\\d+) *([^\\d\\s]+) *(\\d+)/) {\n            push @ranges, ( $1, $3 ) ;\n            push @sequence, $2;\n        #print STDERR \"_set_seq found sequence \\\"$2\\\"\\n\";\n\t} else {\n\t    $self->warn(\"Bad sequence data: $_\");\n\t}\n    }\n\n    if( !(scalar(@sequence) and scalar(@ranges))) {\n        my $id_str = $self->_id_str;\n\t$self->throw(\"Can't set sequence: missing data. Possibly unrecognized Blast format. 
($id_str) $seqType\");\n   }\n\n    # Sensitive to member name changes.\n    $seqType = \"_\\L$seqType\\E\";\n    $self->{$seqType.'Start'} = $ranges[0];\n    $self->{$seqType.'Stop'}  = $ranges[ $#ranges ];\n    $self->{$seqType.'Seq'}   = \\@sequence;\n\n    $self->{$seqType.'Length'} = abs($ranges[ $#ranges ] - $ranges[0]) + 1;\n\n    # Adjust lengths for BLASTX, TBLASTN, TBLASTX sequences\n    # Converting nucl coords to amino acid coords.\n\n    my $prog = $self->algorithm;\n    if($prog eq 'TBLASTN' and $seqType eq '_sbjct') {\n\t$self->{$seqType.'Length'} /= 3;\n    } elsif($prog eq 'BLASTX' and $seqType eq '_query') {\n\t$self->{$seqType.'Length'} /= 3;\n    } elsif($prog eq 'TBLASTX') {\n\t$self->{$seqType.'Length'} /= 3;\n    }\n\n    if( $prog ne 'BLASTP' ) {\n        $self->{$seqType.'Strand'} = 'Plus' if $prog =~ /BLASTN/;\n        $self->{$seqType.'Strand'} = 'Plus' if ($prog =~ /BLASTX/ and $seqType eq '_query');\n        # Normalize sequence endpoints so that start < end.\n        # Reverse complement or 'minus strand' HSPs get flipped here.\n        if($self->{$seqType.'Start'} > $self->{$seqType.'Stop'}) {\n            ($self->{$seqType.'Start'}, $self->{$seqType.'Stop'}) =\n                ($self->{$seqType.'Stop'}, $self->{$seqType.'Start'});\n            $self->{$seqType.'Strand'} = 'Minus';\n        }\n    }\n\n    ## Count number of gaps in each seq. Only need to do this for gapped Blasts.\n#    if($self->{'_gapped'}) {\n\tmy $seqstr = join('', @sequence);\n\t$seqstr =~ s/\\s//g;\n        my $num_gaps = CORE::length($seqstr) - $self->{$seqType.'Length'};\n\t$self->{$seqType.'Gaps'} = $num_gaps if $num_gaps > 0;\n#    }\n}\n\n\n=head2 _set_residues\n\n Usage     : called automatically when residue data is requested.\n Purpose   : Sets the residue numbers representing the identical and\n           : conserved positions. 
These data are obtained by analyzing the\n           : symbols between query and sbjct lines of the alignments.\n Argument  : n/a\n Throws    : Propagates any exception thrown by _set_seq_data() and _set_match_seq().\n Comments  : These data are not always needed, so it is conditionally\n           : executed only upon demand by methods such as seq_inds().\n           : Behavior is dependent on the type of BLAST analysis (TBLASTN, BLASTP, etc).\n\nSee Also   : L</_set_seq_data>, L</_set_match_seq>, L</seq_inds>\n\n\n#------------------\nsub _set_residues {\n#------------------\n    my $self      = shift;\n    my @sequence  = ();\n\n    $self->_set_seq_data() unless $self->{'_set_seq_data'};\n\n    # Using hashes to avoid saving duplicate residue numbers.\n    my %identicalList_query = ();\n    my %identicalList_sbjct = ();\n    my %conservedList_query = ();\n    my %conservedList_sbjct = ();\n\n    my $aref = $self->_set_match_seq() if not ref $self->{'_matchSeq'};\n    $aref  ||= $self->{'_matchSeq'};\n    my $seqString = join('', @$aref );\n\n    my $qseq = join('',@{$self->{'_querySeq'}});\n    my $sseq = join('',@{$self->{'_sbjctSeq'}});\n    my $resCount_query = $self->{'_queryStop'} || 0;\n    my $resCount_sbjct = $self->{'_sbjctStop'} || 0;\n\n    my $prog = $self->algorithm;\n    if($prog !~ /^BLASTP|^BLASTN/) {\n\tif($prog eq 'TBLASTN') {\n\t    $resCount_sbjct /= 3;\n\t} elsif($prog eq 'BLASTX') {\n\t    $resCount_query /= 3;\n\t} elsif($prog eq 'TBLASTX') {\n\t    $resCount_query /= 3;\n\t    $resCount_sbjct /= 3;\n\t}\n    }\n\n    my ($mchar, $schar, $qchar);\n    while( $mchar = chop($seqString) ) {\n\t($qchar, $schar) = (chop($qseq), chop($sseq));\n\tif( $mchar eq '+' ) {\n\t    $conservedList_query{ $resCount_query } = 1;\n\t    $conservedList_sbjct{ $resCount_sbjct } = 1;\n\t} elsif( $mchar ne ' ' ) {\n\t    $identicalList_query{ $resCount_query } = 1;\n\t    $identicalList_sbjct{ $resCount_sbjct } = 1;\n\t}\n\t$resCount_query-- if $qchar ne 
$GAP_SYMBOL;\n\t$resCount_sbjct-- if $schar ne $GAP_SYMBOL;\n    }\n    $self->{'_identicalRes_query'} = \\%identicalList_query;\n    $self->{'_conservedRes_query'} = \\%conservedList_query;\n    $self->{'_identicalRes_sbjct'} = \\%identicalList_sbjct;\n    $self->{'_conservedRes_sbjct'} = \\%conservedList_sbjct;\n\n}\n\n=head2 _set_match_seq\n\n Usage     : $hsp_obj->_set_match_seq()\n Purpose   : Set the 'match' sequence for the current HSP (symbols in between\n           : the query and sbjct lines.)\n Returns   : Array reference holding the match sequences lines.\n Argument  : n/a\n Throws    : Exception if the _matchList field is not set.\n Comments  : The match information is not always necessary. This method\n           : allows it to be conditionally prepared.\n           : Called by _set_residues>() and seq_str().\n\nSee Also   : L</_set_residues>, L</seq_str>\n\n\n#-------------------\nsub _set_match_seq {\n#-------------------\n    my $self = shift;\n\n    if( ! ref($self->{'_matchList'}) ) {\n        my $id_str = $self->_id_str;\n        $self->throw(\"Can't set HSP match sequence: No data ($id_str)\");\n    }\n\n    my @data = @{$self->{'_matchList'}};\n\n    my(@sequence);\n    foreach( @data ) {\n\tchomp($_);\n\t## Remove leading spaces; (note: aln may begin with a space\n\t## which is why we can't use s/^ +//).\n\ts/^ {$self->{'_match_indent'}}//;\n\tpush @sequence, $_;\n    }\n    # Liberate some memory.\n    @{$self->{'_matchList'}} = undef;\n    $self->{'_matchList'} = undef;\n\n    $self->{'_matchSeq'} = \\@sequence;\n\n    return $self->{'_matchSeq'};\n}\n\n\n=head2 n\n\n Usage     : $hsp_obj->n()\n Purpose   : Get the N value (num HSPs on which P/Expect is based).\n           : This value is not defined with NCBI Blast2 with gapping.\n Returns   : Integer or null string if not defined.\n Argument  : n/a\n Throws    : n/a\n Comments  : The 'N' value is listed in parenthesis with P/Expect value:\n           : e.g., P(3) = 1.2e-30  ---> (N = 
3).\n           : Not defined in NCBI Blast2 with gaps.\n           : This typically is equal to the number of HSPs but not always.\n           : To obtain the number of HSPs, use Bio::Search::Hit::BlastHit::num_hsps().\n\nSee Also   : L<Bio::SeqFeature::SimilarityPair::score()|Bio::SeqFeature::SimilarityPair>\n\n\n#-----\nsub n { my $self = shift; $self->{'N'} || ''; }\n#-----\n\n\n=head2 matches\n\n Usage     : $hsp->matches([seq_type], [start], [stop]);\n Purpose   : Get the total number of identical and conservative matches\n           : in the query or sbjct sequence for the given HSP. Optionally can\n           : report data within a defined interval along the seq.\n           : (Note: 'conservative' matches are called 'positives' in the\n\t   : Blast report.)\n Example   : ($id,$cons) = $hsp_object->matches('hit');\n           : ($id,$cons) = $hsp_object->matches('query',300,400);\n Returns   : 2-element array of integers\n Argument  : (1) seq_type = 'query' or 'hit' or 'sbjct' (default = query)\n           :  ('sbjct' is synonymous with 'hit')\n           : (2) start = Starting coordinate (optional)\n           : (3) stop  = Ending coordinate (optional)\n Throws    : Exception if the supplied coordinates are out of range.\n Comments  : Relies on seq_str('match') to get the string of alignment symbols\n           : between the query and sbjct lines which are used for determining\n           : the number of identical and conservative matches.\n\nSee Also   : L</length>, L</gaps>, L</seq_str>, L<Bio::Search::Hit::BlastHit::_adjust_contigs()|Bio::Search::Hit::BlastHit>\n\n\n#-----------\nsub matches {\n#-----------\n    my( $self, %param ) = @_;\n    my(@data);\n    my($seqType, $beg, $end) = ($param{-SEQ}, $param{-START}, $param{-STOP});\n    $seqType ||= 'query';\n    $seqType = 'sbjct' if $seqType eq 'hit';\n\n    my($start,$stop);\n\n    if(!defined $beg && !defined $end) {\n\t## Get data for the whole alignment.\n\tpush @data, ($self->{'_numIdentical'}, 
$self->{'_numConserved'});\n    } else {\n\t## Get the substring representing the desired sub-section of aln.\n\t$beg ||= 0;\n\t$end ||= 0;\n\t($start,$stop) = $self->range($seqType);\n\tif($beg == 0) { $beg = $start; $end = $beg+$end; }\n\telsif($end == 0) { $end = $stop; $beg = $end-$beg; }\n\n\tif($end >= $stop) { $end = $stop; } ##ML changed from if (end >stop)\n\telse { $end += 1;}   ##ML moved from commented position below, makes\n                             ##more sense here\n#\tif($end > $stop) { $end = $stop; }\n\tif($beg < $start) { $beg = $start; }\n#\telse { $end += 1;}\n\n#\tmy $seq = substr($self->seq_str('match'), $beg-$start, ($end-$beg));\n\n\t## ML: START fix for substr out of range error ------------------\n\tmy $seq = \"\";\n        my $prog = $self->algorithm;\n\tif (($prog eq 'TBLASTN') and ($seqType eq 'sbjct'))\n\t{\n\t    $seq = substr($self->seq_str('match'),\n\t\t\t  int(($beg-$start)/3), int(($end-$beg+1)/3));\n\n\t} elsif (($prog eq 'BLASTX') and ($seqType eq 'query'))\n\t{\n\t    $seq = substr($self->seq_str('match'),\n\t\t\t  int(($beg-$start)/3), int(($end-$beg+1)/3));\n\t} else {\n\t    $seq = substr($self->seq_str('match'),\n\t\t\t  $beg-$start, ($end-$beg));\n\t}\n\t## ML: End of fix for  substr out of range error -----------------\n\n\n\t## ML: debugging code\n\t## This is where we get our exception.  
Try printing out the values going\n\t## into this:\n\t##\n#\t print STDERR\n#\t     qq(*------------MY EXCEPTION --------------------\\nSeq: \") ,\n#\t     $self->seq_str(\"$seqType\"), qq(\"\\n),$self->rank,\",(  index:\";\n#\t print STDERR  $beg-$start, \", len: \", $end-$beg,\" ), (HSPRealLen:\",\n#\t     CORE::length $self->seq_str(\"$seqType\");\n#\t print STDERR \", HSPCalcLen: \", $stop - $start +1 ,\" ),\n#\t     ( beg: $beg, end: $end ), ( start: $start, stop: stop )\\n\";\n\t ## ML: END DEBUGGING CODE----------\n\n\tif(!CORE::length $seq) {\n            my $id_str = $self->_id_str;\n\t    $self->throw(\"Undefined $seqType sub-sequence ($beg,$end). Valid range = $start - $stop ($id_str)\");\n\t}\n\t## Get data for a substring.\n#\tprintf \"Collecting HSP subsection data: beg,end = %d,%d; start,stop = %d,%d\\n%s<---\\n\", $beg, $end, $start, $stop, $seq;\n#\tprintf \"Original match seq:\\n%s\\n\",$self->seq_str('match');\n\t$seq =~ s/ //g;  # remove space (no info).\n\tmy $len_cons = CORE::length $seq;\n\t$seq =~ s/\\+//g;  # remove '+' characters (conservative substitutions)\n\tmy $len_id = CORE::length $seq;\n\tpush @data, ($len_id, $len_cons);\n#\tprintf \"  HSP = %s\\n  id = %d; cons = %d\\n\", $self->rank, $len_id, $len_cons; <STDIN>;\n    }\n    @data;\n}\n\n\n=head2 num_identical\n\n Usage     : $hsp_object->num_identical();\n Purpose   : Get the number of identical positions within the given HSP.\n Example   : $num_iden = $hsp_object->num_identical();\n Returns   : integer\n Argument  : n/a\n Throws    : n/a\n\nSee Also   : L</num_conserved>, L</frac_identical>\n\n\n#-------------------\nsub num_identical {\n#-------------------\n    my( $self) = shift;\n\n    $self->{'_numIdentical'};\n}\n\n\n=head2 num_conserved\n\n Usage     : $hsp_object->num_conserved();\n Purpose   : Get the number of conserved positions within the given HSP.\n Example   : $num_iden = $hsp_object->num_conserved();\n Returns   : integer\n Argument  : n/a\n Throws    : 
n/a\n\nSee Also   : L</num_identical>, L</frac_conserved>\n\n\n#-------------------\nsub num_conserved {\n#-------------------\n    my( $self) = shift;\n\n    $self->{'_numConserved'};\n}\n\n\n\n=head2 range\n\n Usage     : $hsp->range( [seq_type] );\n Purpose   : Gets the (start, end) coordinates for the query or sbjct sequence\n           : in the HSP alignment.\n Example   : ($query_beg, $query_end) = $hsp->range('query');\n           : ($hit_beg, $hit_end) = $hsp->range('hit');\n Returns   : Two-element array of integers\n Argument  : seq_type = string, 'query' or 'hit' or 'sbjct'  (default = 'query')\n           :  ('sbjct' is synonymous with 'hit')\n Throws    : n/a\n\nSee Also   : L</start>, L</end>\n\n\n#----------\nsub range {\n#----------\n    my ($self, $seqType) = @_;\n\n    $self->_set_seq_data() unless $self->{'_set_seq_data'};\n\n    $seqType ||= 'query';\n    $seqType = 'sbjct' if $seqType eq 'hit';\n\n    ## Sensitive to member name changes.\n    $seqType = \"_\\L$seqType\\E\";\n\n    return ($self->{$seqType.'Start'},$self->{$seqType.'Stop'});\n}\n\n=head2 start\n\n Usage     : $hsp->start( [seq_type] );\n Purpose   : Gets the start coordinate for the query, sbjct, or both sequences\n           : in the HSP alignment.\n           : NOTE: Start will always be less than end.\n           : To determine strand, use $hsp->strand()\n Example   : $query_beg = $hsp->start('query');\n           : $hit_beg = $hsp->start('hit');\n           : ($query_beg, $hit_beg) = $hsp->start();\n Returns   : scalar context: integer\n           : array context without args: list of two integers\n Argument  : In scalar context: seq_type = 'query' or 'hit' or 'sbjct' (default= 'query')\n           :  ('sbjct' is synonymous with 'hit')\n           : Array context can be \"induced\" by providing an argument of 'list' or 'array'.\n Throws    : n/a\n\nSee Also   : L</end>, 
L</range>","parameters":[{"label":"$self"},{"label":"$seqType"}]},"containerName":"main::","definition":"sub","line":1354,"children":[{"localvar":"my","name":"$self","definition":"my","containerName":"start","line":1356,"kind":13},{"line":1356,"kind":13,"name":"$seqType","containerName":"start"},{"line":1358,"kind":13,"name":"$seqType","containerName":"start"},{"kind":13,"line":1359,"containerName":"start","name":"$seqType"},{"containerName":"start","name":"$seqType","kind":13,"line":1359},{"kind":13,"line":1361,"containerName":"start","name":"$self"},{"kind":12,"line":1361,"name":"_set_seq_data","containerName":"start"},{"containerName":"start","name":"$self","kind":13,"line":1361},{"containerName":"start","name":"$seqType","kind":13,"line":1363},{"kind":13,"line":1364,"name":"$self","containerName":"start"},{"containerName":"start","name":"$self","kind":13,"line":1364},{"kind":13,"line":1367,"containerName":"start","name":"$seqType"},{"kind":13,"line":1368,"containerName":"start","name":"$self"},{"containerName":"start","name":"$seqType","kind":13,"line":1368}],"kind":12},{"name":"end","range":{"end":{"character":9999,"line":1410},"start":{"line":1394,"character":0}},"definition":"sub","containerName":"main::","signature":{"label":"end($self,$seqType)","parameters":[{"label":"$self"},{"label":"$seqType"}],"documentation":"1;\n#-----------------------------------------------------------------\n# $Id: BlastHSP.pm 16123 2009-09-17 12:57:27Z cjfields $\n#\n# BioPerl module Bio::Search::HSP::BlastHSP\n#\n# (This module was originally called Bio::Tools::Blast::HSP)\n#\n# Please direct questions and support issues to <bioperl-l@bioperl.org> \n#\n# Cared for by Steve Chervitz <sac@bioperl.org>\n#\n# You may distribute this module under the same terms as perl itself\n#-----------------------------------------------------------------\n\n## POD Documentation:\n\n=head1 NAME\n\nBio::Search::HSP::BlastHSP - Bioperl BLAST High-Scoring Pair object\n\n=head1 SYNOPSIS\n\nSee 
L<Bio::Search::Hit::BlastHit>.\n\n=head1 DESCRIPTION\n\nA Bio::Search::HSP::BlastHSP object provides an interface to data\nobtained in a single alignment section of a Blast report (known as a\n\"High-scoring Segment Pair\"). This is essentially a pairwise\nalignment with score information.\n\nBlastHSP objects are accessed via L<Bio::Search::Hit::BlastHit>\nobjects after parsing a BLAST report using the L<Bio::SearchIO>\nsystem.\n\nThe construction of BlastHSP objects is performed by\nBio::Factory::BlastHitFactory in a process that is\norchestrated by the Blast parser (L<Bio::SearchIO::psiblast>).\nThe resulting BlastHSPs are then accessed via\nL<Bio::Search::Hit::BlastHit>. Therefore, you do not need to\nuse L<Bio::Search::HSP::BlastHSP> directly. If you need to construct\nBlastHSPs directly, see the new() function for details.\n\nFor L<Bio::SearchIO> BLAST parsing usage examples, see the\nC<examples/searchio> directory of the Bioperl distribution.\n\n=head2 Start and End coordinates\n\nSequence endpoints are swapped so that start is always less than\nend. This affects TBLASTN/X hits on the minus strand. Strand\ninformation can be recovered using the strand() method. This\nnormalization step is standard Bioperl practice. It also facilitates\nuse of range information by methods such as matches().\n\n=over 1\n\n* * Supports BLAST versions 1.x and 2.x, gapped and ungapped.\n\n\nBio::Search::HSP::BlastHSP.pm has the ability to extract a list of all\nresidue indices for identical and conservative matches along both\nquery and sbjct sequences. Since this degree of detail is not always\nneeded, this behavior does not occur during construction of the BlastHSP\nobject.  
These data will automatically be collected as necessary as\nthe BlastHSP.pm object is used.\n\n=head1 DEPENDENCIES\n\nBio::Search::HSP::BlastHSP.pm is a concrete class that inherits from\nL<Bio::SeqFeature::SimilarityPair> and L<Bio::Search::HSP::HSPI>.\nL<Bio::Seq> and L<Bio::SimpleAlign> are employed for creating\nsequence and alignment objects, respectively.\n\n=head2 Relationship to SimpleAlign.pm & Seq.pm\n\nBlastHSP.pm can provide the query or sbjct sequence as a L<Bio::Seq>\nobject via the L<seq()|seq> method. The BlastHSP.pm object can also create a\ntwo-sequence L<Bio::SimpleAlign> alignment object using the query\nand sbjct sequences via the L<get_aln()|get_aln> method. Creation of alignment\nobjects is not automatic when constructing the BlastHSP.pm object since\nthis level of functionality is not always required and would generate\na lot of extra overhead when crunching many reports.\n\n\n=head1 FEEDBACK\n\n=head2 Mailing Lists\n\nUser feedback is an integral part of the evolution of this and other\nBioperl modules.  Send your comments and suggestions preferably to one\nof the Bioperl mailing lists.  Your participation is much appreciated.\n\n  bioperl-l@bioperl.org                  - General discussion\n  http://bioperl.org/wiki/Mailing_lists  - About the mailing lists\n\n=head2 Support \n\nPlease direct usage questions or support issues to the mailing list:\n\nI<bioperl-l@bioperl.org>\n\nrather than to the module maintainer directly. Many experienced and \nresponsive experts will be able to look at the problem and quickly \naddress it. Please include a thorough description of the problem \nwith code and data examples if at all possible.\n\n=head2 Reporting Bugs\n\nReport bugs to the Bioperl bug tracking system to help us keep track of\nthe bugs and their resolution. 
Bug reports can be submitted via the\nweb:\n\n  http://bugzilla.open-bio.org/\n\n=head1 AUTHOR\n\nSteve Chervitz E<lt>sac-at-bioperl.orgE<gt>\n\nSee L<the FEEDBACK section | FEEDBACK> for where to send bug reports and comments.\n\n=head1 ACKNOWLEDGEMENTS\n\nThis software was originally developed in the Department of Genetics\nat Stanford University. I would also like to acknowledge my\ncolleagues at Affymetrix for useful feedback.\n\n=head1 SEE ALSO\n\n Bio::Search::Hit::BlastHit.pm          - Blast hit object.\n Bio::Search::Result::BlastResult.pm    - Blast Result object.\n Bio::Seq.pm                            - Biosequence object\n\n=head2 Links:\n\n http://bio.perl.org/                       - Bioperl Project Homepage\n\n=head1 COPYRIGHT\n\nCopyright (c) 1996-2001 Steve Chervitz. All Rights Reserved.\n\n=head1 DISCLAIMER\n\nThis software is provided \"as is\" without warranty of any kind.\n\n\n\n# END of main POD documentation.\n\n=head1 APPENDIX\n\nThe rest of the documentation details each of the object methods.\nInternal methods are usually preceded with a _\n\n\n# Let the code begin...\n\npackage Bio::Search::HSP::BlastHSP;\n\nuse strict;\nuse Bio::SeqFeature::Similarity;\n\nuse vars qw($GAP_SYMBOL %STRAND_SYMBOL);\n\nuse overload\n    '\"\"' => \\&to_string;\n\nuse base qw(Bio::SeqFeature::SimilarityPair Bio::Search::HSP::HSPI);\n\n$GAP_SYMBOL    = '-';          # Need a more general way to handle gap symbols.\n%STRAND_SYMBOL = ('Plus' => 1, 'Minus' => -1 );\n\n\n=head2 new\n\n Usage     : $hsp = Bio::Search::HSP::BlastHSP->new( %named_params );\n           : Bio::Search::HSP::BlastHSP objects are constructed\n           : automatically by Bio::SearchIO::BlastHitFactory,\n           : so there is no need for direct instantiation.\n Purpose   : Constructs a new BlastHSP object and Initializes key variables\n           : for the HSP.\n Returns   : A Bio::Search::HSP::BlastHSP object\n Argument  : Named parameters:\n           : Parameter keys are 
case-insensitive.\n           :      -RAW_DATA  => array ref containing raw BLAST report data for\n           :                    for a single HSP. This includes all lines\n           :                    of the HSP alignment from a traditional BLAST\n                                or PSI-BLAST (non-XML) report,\n           :      -RANK         => integer (1..n).\n           :      -PROGRAM      => string ('TBLASTN', 'BLASTP', etc.).\n           :      -QUERY_NAME   => string, id of query sequence\n           :      -HIT_NAME     => string, id of hit sequence\n           :\n Comments  : Having the raw data allows this object to do lazy parsing of\n           : the raw HSP data (i.e., not parsed until needed).\n           :\n           : Note that there is a fair amount of basic parsing that is\n           : currently performed in this module that would be more appropriate\n           : to do within a separate factory object.\n           : This parsing code will likely be relocated and more initialization\n           : parameters will be added to new().\n           :\nSee Also   : L<Bio::SeqFeature::SimilarityPair::new()>, L<Bio::SeqFeature::Similarity::new()>\n\n\n#----------------\nsub new {\n#----------------\n    my ($class, @args ) = @_;\n\n    my $self = $class->SUPER::new( @args );\n    # Initialize placeholders\n    $self->{'_queryGaps'} = $self->{'_sbjctGaps'} = 0;\n    my ($raw_data, $qname, $hname, $qlen, $hlen);\n\n    ($self->{'_prog'}, $self->{'_rank'}, $raw_data,\n     $qname, $hname) =\n\t $self->_rearrange([qw( PROGRAM\n\t\t\t\tRANK\n\t\t\t\tRAW_DATA\n\t\t\t\tQUERY_NAME\n\t\t\t\tHIT_NAME\n\t\t\t\t)], @args );\n\n    # _set_data() does a fair amount of parsing.\n    # This will likely change (see comment above.)\n    $self->_set_data( @{$raw_data} );\n    # Store the aligned query as sequence feature\n    my ($qb, $hb) = ($self->start());\n    my ($qe, $he) = ($self->end());\n    my ($qs, $hs) = ($self->strand());\n    my ($qf,$hf) = 
($self->query->frame(),\n\t\t    $self->hit->frame);\n\n    $self->query( Bio::SeqFeature::Similarity->new (-start   =>$qb,\n\t\t\t\t\t\t    -end     =>$qe,\n\t\t\t\t\t\t    -strand  =>$qs,\n\t\t\t\t\t\t    -bits    =>$self->bits,\n\t\t\t\t\t\t    -score   =>$self->score,\n\t\t\t\t\t\t    -frame   =>$qf,\n\t\t\t\t\t\t    -seq_id  => $qname,\n\t\t\t\t\t\t    -source  =>$self->{'_prog'} ));\n\n    $self->hit( Bio::SeqFeature::Similarity->new (-start   =>$hb,\n\t\t\t\t\t\t  -end     =>$he,\n\t\t\t\t\t\t  -strand  =>$hs,\n\t\t\t\t\t\t  -bits    =>$self->bits,\n\t\t\t\t\t\t  -score   =>$self->score,\n                                                  -frame   =>$hf,\n\t\t\t\t\t\t  -seq_id  => $hname,\n\t\t\t\t\t\t  -source  =>$self->{'_prog'} ));\n\n    # set lengths\n    $self->query->seqlength($qlen); # query\n    $self->hit->seqlength($hlen); # subject\n\n    $self->query->frac_identical($self->frac_identical('query'));\n    $self->hit->frac_identical($self->frac_identical('hit'));\n    return $self;\n}\n\n#sub DESTROY {\n#    my $self = shift;\n#    #print STDERR \"--->DESTROYING $self\\n\";\n#}\n\n\n# Title   : _id_str;\n# Purpose : Intended for internal use only to provide a string for use\n#           within exception messages to help users figure out which\n#           query/hit caused the problem.\n# Returns : Short string with name of query and hit seq\nsub _id_str {\n    my $self = shift;\n    if( not defined $self->{'_id_str'}) {\n        my $qname = $self->query->seq_id;\n        my $hname = $self->hit->seq_id;\n        $self->{'_id_str'} = \"QUERY=\\\"$qname\\\" HIT=\\\"$hname\\\"\";\n    }\n    return $self->{'_id_str'};\n}\n\n#=================================================\n# Begin Bio::Search::HSP::HSPI implementation\n#=================================================\n\n=head2 algorithm\n\n Title   : algorithm\n Usage   : $alg = $hsp->algorithm();\n Function: Gets the algorithm specification that was used to obtain the hsp\n           For BLAST, the 
algorithm denotes what type of sequence was aligned\n           against what (BLASTN: dna-dna, BLASTP prt-prt, BLASTX translated\n           dna-prt, TBLASTN prt-translated dna, TBLASTX translated\n           dna-translated dna).\n Returns : a scalar string\n Args    : none\n\n\n#----------------\nsub algorithm {\n#----------------\n    my ($self,@args) = @_;\n    return $self->{'_prog'};\n}\n\n\n\n\n=head2 signif()\n\n Usage     : $hsp_obj->signif()\n Purpose   : Get the P-value or Expect value for the HSP.\n Returns   : Float (0.001 or 1.3e-43)\n           : Returns P-value if it is defined, otherwise, Expect value.\n Argument  : n/a\n Throws    : n/a\n Comments  : Provided for consistency with BlastHit::signif()\n           : Support for returning the significance data in different\n           : formats (e.g., exponent only), is not provided for HSP objects.\n           : This is only available for the BlastHit or Blast object.\n\nSee Also   : L</p>, L</expect>, L<Bio::Search::Hit::BlastHit::signif()|Bio::Search::Hit::BlastHit>\n\n\n#-----------\nsub signif {\n#-----------\n    my $self = shift;\n    my $val ||= defined($self->{'_p'}) ? 
$self->{'_p'} : $self->{'_expect'};\n    $val;\n}\n\n\n\n=head2 evalue\n\n Usage     : $hsp_obj->evalue()\n Purpose   : Get the Expect value for the HSP.\n Returns   : Float (0.001 or 1.3e-43)\n Argument  : n/a\n Throws    : n/a\n Comments  : Support for returning the expectation data in different\n           : formats (e.g., exponent only), is not provided for HSP objects.\n           : This is only available for the BlastHit or Blast object.\n\nSee Also   : L</p>\n\n\n#----------\nsub evalue { shift->{'_expect'} }\n#----------\n\n\n=head2 p\n\n Usage     : $hsp_obj->p()\n Purpose   : Get the P-value for the HSP.\n Returns   : Float (0.001 or 1.3e-43) or undef if not defined.\n Argument  : n/a\n Throws    : n/a\n Comments  : P-value is not defined with NCBI Blast2 reports.\n           : Support for returning the expectation data in different\n           : formats (e.g., exponent only) is not provided for HSP objects.\n           : This is only available for the BlastHit or Blast object.\n\nSee Also   : L<expect()|expect>\n\n\n#-----\nsub p { my $self = shift; $self->{'_p'}; }\n#-----\n\n# alias\nsub pvalue { shift->p(@_); }\n\n=head2 length\n\n Usage     : $hsp->length( [seq_type] )\n Purpose   : Get the length of the aligned portion of the query or sbjct.\n Example   : $hsp->length('query')\n Returns   : integer\n Argument  : seq_type: 'query' | 'hit' or 'sbjct' | 'total'  (default = 'total')\n             ('sbjct' is synonymous with 'hit')\n Throws    : n/a\n Comments  : 'total' length is the full length of the alignment\n           : as reported in the denominators in the alignment section:\n           : \"Identical = 34/120 Positives = 67/120\".\n\nSee Also   : L</gaps>\n\n\n#-----------\nsub length {\n#-----------\n## Developer note: when using the built-in length function within\n##                 this module, call it as CORE::length().\n    my( $self, $seqType,$data ) = @_;\n    $seqType  ||= 'total';\n    $seqType = 'sbjct' if $seqType eq 'hit';\n\n    
$seqType ne 'total' and $self->_set_seq_data() unless $self->{'_set_seq_data'};\n\n    ## Sensitive to member name format.\n    $seqType = \"_\\L$seqType\\E\";\n    if( defined $data  ) {\n\t$self->{$seqType.'Length'} = $data;\n    }\n    $self->{$seqType.'Length'};\n}\n\n\n\n=head2 gaps\n\n Usage     : $hsp->gaps( [seq_type] )\n Purpose   : Get the number of gap characters in the query, sbjct, or total alignment.\n           : Also can return query gap chars and sbjct gap chars as a two-element list\n           : when in array context.\n Example   : $total_gaps      = $hsp->gaps();\n           : ($qgaps, $sgaps) = $hsp->gaps();\n           : $qgaps           = $hsp->gaps('query');\n Returns   : scalar context: integer\n           : array context without args: (int, int) = ('queryGaps', 'sbjctGaps')\n Argument  : seq_type: 'query' or 'hit' or 'sbjct' or 'total'\n           :  ('sbjct' is synonymous with 'hit')\n           : (default = 'total', scalar context)\n           : Array context can be \"induced\" by providing an argument of 'list' or 'array'.\n Throws    : n/a\n\nSee Also   : L</length>, L</matches>\n\n\n#---------\nsub gaps {\n#---------\n    my( $self, $seqType ) = @_;\n\n    $self->_set_seq_data() unless $self->{'_set_seq_data'};\n\n    $seqType  ||= (wantarray ? 
'list' : 'total');\n    $seqType = 'sbjct' if $seqType eq 'hit';\n\n    if($seqType =~ /list|array/i) {\n\treturn (($self->{'_queryGaps'} || 0), ($self->{'_sbjctGaps'} || 0));\n    }\n\n    if($seqType eq 'total') {\n\treturn ($self->{'_queryGaps'} + $self->{'_sbjctGaps'}) || 0;\n    } else {\n\t## Sensitive to member name format.\n\t$seqType = \"_\\L$seqType\\E\";\n\treturn $self->{$seqType.'Gaps'} || 0;\n    }\n}\n\n\n=head2 frac_identical\n\n Usage     : $hsp_object->frac_identical( [seq_type] );\n Purpose   : Get the fraction of identical positions within the given HSP.\n Example   : $frac_iden = $hsp_object->frac_identical('query');\n Returns   : Float (2-decimal precision, e.g., 0.75).\n Argument  : seq_type: 'query' or 'hit' or 'sbjct' or 'total'\n           :  ('sbjct' is synonymous with 'hit')\n           : default = 'total' (but see comments below).\n Throws    : n/a\n Comments  : Different versions of Blast report different values for the total\n           : length of the alignment. 
This is the number reported in the\n           : denominators in the stats section:\n           : \"Identical = 34/120 Positives = 67/120\".\n           : NCBI-BLAST uses the total length of the alignment (with gaps)\n           : WU-BLAST uses the length of the query sequence (without gaps).\n           : Therefore, when called without an argument or an argument of 'total',\n           : this method will report different values depending on the\n           : version of BLAST used.\n           :\n           : To get the fraction identical among only the aligned residues,\n           : ignoring the gaps, call this method with an argument of 'query'\n           : or 'sbjct' ('sbjct' is synonymous with 'hit').\n\nSee Also   : L</frac_conserved>, L</num_identical>, L</matches>\n\n\n#-------------------\nsub frac_identical {\n#-------------------\n# The value is calculated as opposed to storing it from the parsed results.\n# This saves storage and also permits flexibility in determining for which\n# sequence (query or sbjct) the figure is to be calculated.\n\n    my( $self, $seqType ) = @_;\n    $seqType ||= 'total';\n    $seqType = 'sbjct' if $seqType eq 'hit';\n\n    if($seqType ne 'total') {\n      $self->_set_seq_data() unless $self->{'_set_seq_data'};\n    }\n    ## Sensitive to member name format.\n    $seqType = \"_\\L$seqType\\E\";\n\n    sprintf( \"%.2f\", $self->{'_numIdentical'}/$self->{$seqType.'Length'});\n}\n\n\n=head2 frac_conserved\n\n Usage     : $hsp_object->frac_conserved( [seq_type] );\n Purpose   : Get the fraction of conserved positions within the given HSP.\n           : (Note: 'conservative' positions are called 'positives' in the\n\t   : Blast report.)\n Example   : $frac_cons = $hsp_object->frac_conserved('query');\n Returns   : Float (2-decimal precision, e.g., 0.75).\n Argument  : seq_type: 'query' or 'hit' or 'sbjct' or 'total'\n           :  ('sbjct' is synonymous with 'hit')\n           : default = 'total' (but see comments below).\n 
Throws    : n/a\n Comments  : Different versions of Blast report different values for the total\n           : length of the alignment. This is the number reported in the\n           : denominators in the stats section:\n           : \"Identical = 34/120 Positives = 67/120\".\n           : NCBI-BLAST uses the total length of the alignment (with gaps)\n           : WU-BLAST uses the length of the query sequence (without gaps).\n           : Therefore, when called without an argument or an argument of 'total',\n           : this method will report different values depending on the\n           : version of BLAST used.\n           :\n           : To get the fraction conserved among only the aligned residues,\n           : ignoring the gaps, call this method with an argument of 'query'\n           : or 'sbjct'.\n\nSee Also   : L</frac_identical>, L</num_conserved>, L</matches>\n\n\n#--------------------\nsub frac_conserved {\n#--------------------\n# The value is calculated as opposed to storing it from the parsed results.\n# This saves storage and also permits flexibility in determining for which\n# sequence (query or sbjct) the figure is to be calculated.\n\n    my( $self, $seqType ) = @_;\n    $seqType ||= 'total';\n    $seqType = 'sbjct' if $seqType eq 'hit';\n\n    if($seqType ne 'total') {\n      $self->_set_seq_data() unless $self->{'_set_seq_data'};\n    }\n\n    ## Sensitive to member name format.\n    $seqType = \"_\\L$seqType\\E\";\n\n    sprintf( \"%.2f\", $self->{'_numConserved'}/$self->{$seqType.'Length'});\n}\n\n=head2 query_string\n\n Title   : query_string\n Usage   : my $qseq = $hsp->query_string;\n Function: Retrieves the query sequence of this HSP as a string\n Returns : string\n 
Args    : none\n\n\n\n#----------------\nsub hit_string{ shift->seq_str('hit'); }\n#----------------\n\n\n=head2 homology_string\n\n Title   : homology_string\n Usage   : my $homo_string = $hsp->homology_string;\n Function: Retrieves the homology sequence for this HSP as a string.\n         : The homology sequence is the string of symbols in between the\n         : query and hit sequences in the alignment indicating the degree\n         : of conservation (e.g., identical, similar, not similar).\n Returns : string\n Args    : none\n\n\n#----------------\nsub homology_string{ shift->seq_str('match'); }\n#----------------\n\n#=================================================\n# End Bio::Search::HSP::HSPI implementation\n#=================================================\n\n# Older method delegating to method defined in HSPI.\n\n=head2 expect\n\nSee L<Bio::Search::HSP::HSPI::expect()|Bio::Search::HSP::HSPI>\n\n\n#----------\nsub expect { shift->evalue( @_ ); }\n#----------\n\n\n=head2 rank\n\n Usage     : $hsp->rank( [string] );\n Purpose   : Get the rank of the HSP within a given Blast hit.\n Example   : $rank = $hsp->rank;\n Returns   : Integer (1..n) corresponding to the order in which the HSP\n             appears in the BLAST report.\n\n\n#'\n\n#----------\nsub rank { shift->{'_rank'} }\n#----------\n\n# For backward compatibility\n#----------\nsub name { shift->rank }\n#----------\n\n=head2 to_string\n\n Title   : to_string\n Usage   : print $hsp->to_string;\n Function: Returns a string representation for the Blast HSP.\n           Primarily intended for debugging purposes.\n Example : see usage\n Returns : A string of the form:\n           [BlastHSP] <rank>\n           e.g.:\n           [BlastHSP] 1\n Args    : None\n\n\n#----------\nsub to_string {\n#----------\n    my $self = shift;\n    return \"[BlastHSP] \" . 
$self->rank();\n}\n\n\n=head2 _set_data\n\n Usage     : called automatically during object construction.\n Purpose   : Parses the raw HSP section from a flat BLAST report and\n             sets the query sequence, sbjct sequence, and the \"match\" data\n           : which consists of the symbols between the query and sbjct lines\n           : in the alignment.\n Argument  : Array (all lines for a single, complete HSP, from a raw,\n             flat (i.e., non-XML) BLAST report)\n Throws    : Propagates any exceptions from the methods called (\"See Also\")\n\nSee Also   : L</_set_seq>, L</_set_score_stats>, L</_set_match_stats>\n\n\n#--------------\nsub _set_data {\n#--------------\n    my $self = shift;\n    my @data = @_;\n    my @queryList  = ();  # 'Query' = SEQUENCE USED TO QUERY THE DATABASE.\n    my @sbjctList  = ();  # 'Sbjct' = HOMOLOGOUS SEQUENCE FOUND IN THE DATABASE.\n    my @matchList  = ();\n    my $matchLine  = 0;   # Alternating boolean: when true, load 'match' data.\n    my @linedat = ();\n\n    #print STDERR \"BlastHSP: set_data()\\n\";\n\n    my($line, $aln_row_len, $length_diff);\n    $length_diff = 0;\n\n    # Collecting data for all lines in the alignment\n    # and then storing the collections for possible processing later.\n    #\n    # Note that \"match\" lines may not be properly padded with spaces.\n    # This loop now properly handles such cases:\n    # Query: 1141 PSLVELTIRDCPRLEVGPMIRSLPKFPMLKKLDLAVANIIEEDLDVIGSLEELVIXXXXX 1200\n    #             PSLVELTIRDCPRLEVGPMIRSLPKFPMLKKLDLAVANIIEEDLDVIGSLEELVI\n    # Sbjct: 1141 PSLVELTIRDCPRLEVGPMIRSLPKFPMLKKLDLAVANIIEEDLDVIGSLEELVILSLKL 1200\n\n    foreach $line( @data ) {\n\tnext if $line =~ /^\\s*$/;\n\n\tif( $line =~ /^ ?Score/ ) {\n\t    $self->_set_score_stats( $line );\n\t} elsif( $line =~ /^ ?(Identities|Positives|Strand)/ ) {\n\t    $self->_set_match_stats( $line );\n\t} elsif( $line =~ /^ ?Frame = ([\\d+-]+)/ ) {\n\t  # Version 2.0.8 has Frame information on a separate line.\n\t  # 
Storing frame according to SeqFeature::Generic::frame()\n\t  # which does not contain strand info (use strand()).\n\t  my $frame = abs($1) - 1;\n\t  $self->frame( $frame );\n\t} elsif( $line =~ /^(Query:?[\\s\\d]+)([^\\s\\d]+)/ ) {\n\t    push @queryList, $line;\n\t    $self->{'_match_indent'} = CORE::length $1;\n\t    $aln_row_len = (CORE::length $1) + (CORE::length $2);\n\t    $matchLine = 1;\n\t} elsif( $matchLine ) {\n\t    # Pad the match line with spaces if necessary.\n\t    $length_diff = $aln_row_len - CORE::length $line;\n\t    $length_diff and $line .= ' 'x $length_diff;\n\t    push @matchList, $line;\n\t    $matchLine = 0;\n\t} elsif( $line =~ /^Sbjct/ ) {\n\t    push @sbjctList, $line;\n\t}\n    }\n    # Storing the query and sbjct lists in case they are needed later.\n    # We could make this conditional to save memory.\n    $self->{'_queryList'} = \\@queryList;\n    $self->{'_sbjctList'} = \\@sbjctList;\n\n    # Storing the match list in case it is needed later.\n    $self->{'_matchList'} = \\@matchList;\n\n    if(not defined ($self->{'_numIdentical'})) {\n        my $id_str = $self->_id_str;\n        $self->throw( -text  => \"Can't parse match statistics. Possibly a new or unrecognized Blast format. ($id_str)\");\n    }\n\n    if(!scalar @queryList or !scalar @sbjctList) {\n        my $id_str = $self->_id_str;\n        $self->throw( \"Can't find query or sbjct alignment lines. Possibly unrecognized Blast format. 
($id_str)\");\n    }\n}\n\n\n=head2 _set_score_stats\n\n Usage     : called automatically by _set_data()\n Purpose   : Sets various score statistics obtained from the HSP listing.\n Argument  : String with any of the following formats:\n           : blast2:  Score = 30.1 bits (66), Expect = 9.2\n           : blast2:  Score = 158.2 bits (544), Expect(2) = e-110\n           : blast1:  Score = 410 (144.3 bits), Expect = 1.7e-40, P = 1.7e-40\n           : blast1:  Score = 55 (19.4 bits), Expect = 5.3, Sum P(3) = 0.99\n Throws    : Exception if the stats cannot be parsed, probably due to a change\n           : in the Blast report format.\n\nSee Also   : L</_set_data>\n\n\n#--------------------\nsub _set_score_stats {\n#--------------------\n    my ($self, $data) = @_;\n\n    my ($expect, $p);\n\n    if($data =~ /Score = +([\\d.e+-]+) bits \\(([\\d.e+-]+)\\), +Expect = +([\\d.e+-]+)/) {\n\t# blast2 format n = 1\n\t$self->bits($1);\n\t$self->score($2);\n\t$expect            = $3;\n    } elsif($data =~ /Score = +([\\d.e+-]+) bits \\(([\\d.e+-]+)\\), +Expect\\((\\d+)\\) = +([\\d.e+-]+)/) {\n\t# blast2 format n > 1\n\t$self->bits($1);\n\t$self->score($2);\n\t$self->{'_n'}      = $3;\n\t$expect            = $4;\n\n    } elsif($data =~ /Score = +([\\d.e+-]+) \\(([\\d.e+-]+) bits\\), +Expect = +([\\d.e+-]+), P = +([\\d.e-]+)/) {\n\t# blast1 format, n = 1\n\t$self->score($1);\n\t$self->bits($2);\n\t$expect            = $3;\n\t$p                 = $4;\n\n    } elsif($data =~ /Score = +([\\d.e+-]+) \\(([\\d.e+-]+) bits\\), +Expect = +([\\d.e+-]+), +Sum P\\((\\d+)\\) = +([\\d.e-]+)/) {\n\t# blast1 format, n > 1\n\t$self->score($1);\n\t$self->bits($2);\n\t$expect            = $3;\n\t$self->{'_n'}      = $4;\n\t$p                 = $5;\n\n    } else {\n        my $id_str = $self->_id_str;\n\t$self->throw(-class => 'Bio::Root::Exception',\n\t\t     -text => \"Can't parse score statistics: unrecognized format. 
($id_str)\",\n\t\t     -value => $data);\n    }\n    $expect = \"1$expect\" if $expect =~ /^e/i;\n    $p      = \"1$p\"      if defined $p and $p=~ /^e/i;\n\n    $self->{'_expect'} = $expect;\n    $self->{'_p'}      = $p || undef;\n    $self->significance( $p || $expect );\n}\n\n\n=head2 _set_match_stats\n\n Usage     : Private method; called automatically by _set_data()\n Purpose   : Sets various matching statistics obtained from the HSP listing.\n Argument  : blast2: Identities = 23/74 (31%), Positives = 29/74 (39%), Gaps = 17/74 (22%)\n           : blast2: Identities = 57/98 (58%), Positives = 74/98 (75%)\n           : blast1: Identities = 87/204 (42%), Positives = 126/204 (61%)\n           : blast1: Identities = 87/204 (42%), Positives = 126/204 (61%), Frame = -3\n           : WU-blast: Identities = 310/553 (56%), Positives = 310/553 (56%), Strand = Minus / Plus\n Throws    : Exception if the stats cannot be parsed, probably due to a change\n           : in the Blast report format.\n Comments  : The \"Gaps = \" data in the HSP header has a different meaning depending\n           : on the type of Blast: for BLASTP, this number is the total number of\n           : gaps in query+sbjct; for TBLASTN, it is the number of gaps in the\n           : query sequence only. Thus, it is safer to collect the data\n           : separately by examining the actual sequence strings as is done\n           : in _set_seq().\n\nSee Also   : L</_set_data>, L</_set_seq>\n\n\n#--------------------\nsub _set_match_stats {\n#--------------------\n    my ($self, $data) = @_;\n\n    if($data =~ m!Identities = (\\d+)/(\\d+)!) {\n      # blast1 or 2 format\n      $self->{'_numIdentical'} = $1;\n      $self->{'_totalLength'}  = $2;\n    }\n\n    if($data =~ m!Positives = (\\d+)/(\\d+)!) {\n      # blast1 or 2 format\n      $self->{'_numConserved'} = $1;\n      $self->{'_totalLength'}  = $2;\n    }\n\n    if($data =~ m!Frame = ([\\d+-]+)!) 
{\n      $self->frame($1);\n    }\n\n    # Strand data is not always present in this line.\n    # _set_seq() will also set strand information.\n    if($data =~ m!Strand = (\\w+) / (\\w+)!) {\n\t$self->{'_queryStrand'} = $1;\n\t$self->{'_sbjctStrand'} = $2;\n    }\n\n#    if($data =~ m!Gaps = (\\d+)/(\\d+)!) {\n#\t $self->{'_totalGaps'} = $1;\n#    } else {\n#\t $self->{'_totalGaps'} = 0;\n#    }\n}\n\n\n\n=head2 _set_seq_data\n\n Usage     : called automatically when sequence data is requested.\n Purpose   : Sets the HSP sequence data for both query and sbjct sequences.\n           : Includes: start, stop, length, gaps, and raw sequence.\n Argument  : n/a\n Throws    : Propagates any exception thrown by _set_match_seq()\n Comments  : Uses raw data stored by _set_data() during object construction.\n           : These data are not always needed, so it is conditionally\n           : executed only upon demand by methods such as gaps(), _set_residues(),\n           : etc. _set_seq() does the dirty work.\n\nSee Also   : L</_set_seq>\n\n\n#-----------------\nsub _set_seq_data {\n#-----------------\n    my $self = shift;\n\n    $self->_set_seq('query', @{$self->{'_queryList'}});\n    $self->_set_seq('sbjct', @{$self->{'_sbjctList'}});\n\n    # Liberate some memory.\n    @{$self->{'_queryList'}} = @{$self->{'_sbjctList'}} = ();\n    undef $self->{'_queryList'};\n    undef $self->{'_sbjctList'};\n\n    $self->{'_set_seq_data'} = 1;\n}\n\n\n\n=head2 _set_seq \n\n Usage     : called automatically by _set_seq_data()\n           : $hsp_obj->($seq_type, @data);\n Purpose   : Sets sequence information for both the query and sbjct sequences.\n           : Directly counts the number of gaps in each sequence (if gapped Blast).\n Argument  : $seq_type = 'query' or 'sbjct'\n           : @data = all seq lines with the form:\n           : Query: 61  SPHNVKDRKEQNGSINNAISPTATANTSGSQQINIDSALRDRSSNVAAQPSLSDASSGSN 120\n Throws    : Exception if data strings cannot be parsed, probably due to a 
change\n           : in the Blast report format.\n Comments  : Uses first argument to determine which data members to set\n           : making this method sensitive to data member name changes.\n           : Behavior is dependent on the type of BLAST analysis (TBLASTN, BLASTP, etc).\n Warning   : Sequence endpoints are normalized so that start < end. This affects HSPs\n           : for TBLASTN/X hits on the minus strand. Normalization facilitates use\n           : of range information by methods such as matches().\n\nSee Also   : L</_set_seq_data>, L</matches>, L</range>, L</start>, L</end>\n\n\n#-------------\nsub _set_seq {\n#-------------\n    my $self      = shift;\n    my $seqType   = shift;\n    my @data      = @_;\n    my @ranges    = ();\n    my @sequence  = ();\n    my $numGaps   = 0;\n\n    foreach( @data ) {\n        if( m/(\\d+) *([^\\d\\s]+) *(\\d+)/) {\n            push @ranges, ( $1, $3 ) ;\n            push @sequence, $2;\n        #print STDERR \"_set_seq found sequence \\\"$2\\\"\\n\";\n\t} else {\n\t    $self->warn(\"Bad sequence data: $_\");\n\t}\n    }\n\n    if( !(scalar(@sequence) and scalar(@ranges))) {\n        my $id_str = $self->_id_str;\n\t$self->throw(\"Can't set sequence: missing data. Possibly unrecognized Blast format. 
($id_str) $seqType\");\n   }\n\n    # Sensitive to member name changes.\n    $seqType = \"_\\L$seqType\\E\";\n    $self->{$seqType.'Start'} = $ranges[0];\n    $self->{$seqType.'Stop'}  = $ranges[ $#ranges ];\n    $self->{$seqType.'Seq'}   = \\@sequence;\n\n    $self->{$seqType.'Length'} = abs($ranges[ $#ranges ] - $ranges[0]) + 1;\n\n    # Adjust lengths for BLASTX, TBLASTN, TBLASTX sequences\n    # Converting nucl coords to amino acid coords.\n\n    my $prog = $self->algorithm;\n    if($prog eq 'TBLASTN' and $seqType eq '_sbjct') {\n\t$self->{$seqType.'Length'} /= 3;\n    } elsif($prog eq 'BLASTX' and $seqType eq '_query') {\n\t$self->{$seqType.'Length'} /= 3;\n    } elsif($prog eq 'TBLASTX') {\n\t$self->{$seqType.'Length'} /= 3;\n    }\n\n    if( $prog ne 'BLASTP' ) {\n        $self->{$seqType.'Strand'} = 'Plus' if $prog =~ /BLASTN/;\n        $self->{$seqType.'Strand'} = 'Plus' if ($prog =~ /BLASTX/ and $seqType eq '_query');\n        # Normalize sequence endpoints so that start < end.\n        # Reverse complement or 'minus strand' HSPs get flipped here.\n        if($self->{$seqType.'Start'} > $self->{$seqType.'Stop'}) {\n            ($self->{$seqType.'Start'}, $self->{$seqType.'Stop'}) =\n                ($self->{$seqType.'Stop'}, $self->{$seqType.'Start'});\n            $self->{$seqType.'Strand'} = 'Minus';\n        }\n    }\n\n    ## Count number of gaps in each seq. Only need to do this for gapped Blasts.\n#    if($self->{'_gapped'}) {\n\tmy $seqstr = join('', @sequence);\n\t$seqstr =~ s/\\s//g;\n        my $num_gaps = CORE::length($seqstr) - $self->{$seqType.'Length'};\n\t$self->{$seqType.'Gaps'} = $num_gaps if $num_gaps > 0;\n#    }\n}\n\n\n=head2 _set_residues\n\n Usage     : called automatically when residue data is requested.\n Purpose   : Sets the residue numbers representing the identical and\n           : conserved positions. 
These data are obtained by analyzing the\n           : symbols between query and sbjct lines of the alignments.\n Argument  : n/a\n Throws    : Propagates any exception thrown by _set_seq_data() and _set_match_seq().\n Comments  : These data are not always needed, so it is conditionally\n           : executed only upon demand by methods such as seq_inds().\n           : Behavior is dependent on the type of BLAST analysis (TBLASTN, BLASTP, etc).\n\nSee Also   : L</_set_seq_data>, L</_set_match_seq>, L</seq_inds>\n\n\n#------------------\nsub _set_residues {\n#------------------\n    my $self      = shift;\n    my @sequence  = ();\n\n    $self->_set_seq_data() unless $self->{'_set_seq_data'};\n\n    # Using hashes to avoid saving duplicate residue numbers.\n    my %identicalList_query = ();\n    my %identicalList_sbjct = ();\n    my %conservedList_query = ();\n    my %conservedList_sbjct = ();\n\n    my $aref = $self->_set_match_seq() if not ref $self->{'_matchSeq'};\n    $aref  ||= $self->{'_matchSeq'};\n    my $seqString = join('', @$aref );\n\n    my $qseq = join('',@{$self->{'_querySeq'}});\n    my $sseq = join('',@{$self->{'_sbjctSeq'}});\n    my $resCount_query = $self->{'_queryStop'} || 0;\n    my $resCount_sbjct = $self->{'_sbjctStop'} || 0;\n\n    my $prog = $self->algorithm;\n    if($prog !~ /^BLASTP|^BLASTN/) {\n\tif($prog eq 'TBLASTN') {\n\t    $resCount_sbjct /= 3;\n\t} elsif($prog eq 'BLASTX') {\n\t    $resCount_query /= 3;\n\t} elsif($prog eq 'TBLASTX') {\n\t    $resCount_query /= 3;\n\t    $resCount_sbjct /= 3;\n\t}\n    }\n\n    my ($mchar, $schar, $qchar);\n    while( $mchar = chop($seqString) ) {\n\t($qchar, $schar) = (chop($qseq), chop($sseq));\n\tif( $mchar eq '+' ) {\n\t    $conservedList_query{ $resCount_query } = 1;\n\t    $conservedList_sbjct{ $resCount_sbjct } = 1;\n\t} elsif( $mchar ne ' ' ) {\n\t    $identicalList_query{ $resCount_query } = 1;\n\t    $identicalList_sbjct{ $resCount_sbjct } = 1;\n\t}\n\t$resCount_query-- if $qchar ne 
$GAP_SYMBOL;\n\t$resCount_sbjct-- if $schar ne $GAP_SYMBOL;\n    }\n    $self->{'_identicalRes_query'} = \\%identicalList_query;\n    $self->{'_conservedRes_query'} = \\%conservedList_query;\n    $self->{'_identicalRes_sbjct'} = \\%identicalList_sbjct;\n    $self->{'_conservedRes_sbjct'} = \\%conservedList_sbjct;\n\n}\n\n=head2 _set_match_seq\n\n Usage     : $hsp_obj->_set_match_seq()\n Purpose   : Set the 'match' sequence for the current HSP (symbols in between\n           : the query and sbjct lines.)\n Returns   : Array reference holding the match sequence lines.\n Argument  : n/a\n Throws    : Exception if the _matchList field is not set.\n Comments  : The match information is not always necessary. This method\n           : allows it to be conditionally prepared.\n           : Called by _set_residues() and seq_str().\n\nSee Also   : L</_set_residues>, L</seq_str>\n\n\n#-------------------\nsub _set_match_seq {\n#-------------------\n    my $self = shift;\n\n    if( ! ref($self->{'_matchList'}) ) {\n        my $id_str = $self->_id_str;\n        $self->throw(\"Can't set HSP match sequence: No data ($id_str)\");\n    }\n\n    my @data = @{$self->{'_matchList'}};\n\n    my(@sequence);\n    foreach( @data ) {\n\tchomp($_);\n\t## Remove leading spaces; (note: aln may begin with a space\n\t## which is why we can't use s/^ +//).\n\ts/^ {$self->{'_match_indent'}}//;\n\tpush @sequence, $_;\n    }\n    # Liberate some memory.\n    @{$self->{'_matchList'}} = undef;\n    $self->{'_matchList'} = undef;\n\n    $self->{'_matchSeq'} = \\@sequence;\n\n    return $self->{'_matchSeq'};\n}\n\n\n=head2 n\n\n Usage     : $hsp_obj->n()\n Purpose   : Get the N value (num HSPs on which P/Expect is based).\n           : This value is not defined with NCBI Blast2 with gapping.\n Returns   : Integer or null string if not defined.\n Argument  : n/a\n Throws    : n/a\n Comments  : The 'N' value is listed in parentheses with P/Expect value:\n           : e.g., P(3) = 1.2e-30  ---> (N = 
3).\n           : Not defined in NCBI Blast2 with gaps.\n           : This typically is equal to the number of HSPs but not always.\n           : To obtain the number of HSPs, use Bio::Search::Hit::BlastHit::num_hsps().\n\nSee Also   : L<Bio::SeqFeature::SimilarityPair::score()|Bio::SeqFeature::SimilarityPair>\n\n\n#-----\nsub n { my $self = shift; $self->{'N'} || ''; }\n#-----\n\n\n=head2 matches\n\n Usage     : $hsp->matches([seq_type], [start], [stop]);\n Purpose   : Get the total number of identical and conservative matches\n           : in the query or sbjct sequence for the given HSP. Optionally can\n           : report data within a defined interval along the seq.\n           : (Note: 'conservative' matches are called 'positives' in the\n\t   : Blast report.)\n Example   : ($id,$cons) = $hsp_object->matches('hit');\n           : ($id,$cons) = $hsp_object->matches('query',300,400);\n Returns   : 2-element array of integers\n Argument  : (1) seq_type = 'query' or 'hit' or 'sbjct' (default = query)\n           :  ('sbjct' is synonymous with 'hit')\n           : (2) start = Starting coordinate (optional)\n           : (3) stop  = Ending coordinate (optional)\n Throws    : Exception if the supplied coordinates are out of range.\n Comments  : Relies on seq_str('match') to get the string of alignment symbols\n           : between the query and sbjct lines which are used for determining\n           : the number of identical and conservative matches.\n\nSee Also   : L</length>, L</gaps>, L</seq_str>, L<Bio::Search::Hit::BlastHit::_adjust_contigs()|Bio::Search::Hit::BlastHit>\n\n\n#-----------\nsub matches {\n#-----------\n    my( $self, %param ) = @_;\n    my(@data);\n    my($seqType, $beg, $end) = ($param{-SEQ}, $param{-START}, $param{-STOP});\n    $seqType ||= 'query';\n    $seqType = 'sbjct' if $seqType eq 'hit';\n\n    my($start,$stop);\n\n    if(!defined $beg && !defined $end) {\n\t## Get data for the whole alignment.\n\tpush @data, ($self->{'_numIdentical'}, 
$self->{'_numConserved'});\n    } else {\n\t## Get the substring representing the desired sub-section of aln.\n\t$beg ||= 0;\n\t$end ||= 0;\n\t($start,$stop) = $self->range($seqType);\n\tif($beg == 0) { $beg = $start; $end = $beg+$end; }\n\telsif($end == 0) { $end = $stop; $beg = $end-$beg; }\n\n\tif($end >= $stop) { $end = $stop; } ##ML changed from if (end >stop)\n\telse { $end += 1;}   ##ML moved from commented position below, makes\n                             ##more sense here\n#\tif($end > $stop) { $end = $stop; }\n\tif($beg < $start) { $beg = $start; }\n#\telse { $end += 1;}\n\n#\tmy $seq = substr($self->seq_str('match'), $beg-$start, ($end-$beg));\n\n\t## ML: START fix for substr out of range error ------------------\n\tmy $seq = \"\";\n        my $prog = $self->algorithm;\n\tif (($prog eq 'TBLASTN') and ($seqType eq 'sbjct'))\n\t{\n\t    $seq = substr($self->seq_str('match'),\n\t\t\t  int(($beg-$start)/3), int(($end-$beg+1)/3));\n\n\t} elsif (($prog eq 'BLASTX') and ($seqType eq 'query'))\n\t{\n\t    $seq = substr($self->seq_str('match'),\n\t\t\t  int(($beg-$start)/3), int(($end-$beg+1)/3));\n\t} else {\n\t    $seq = substr($self->seq_str('match'),\n\t\t\t  $beg-$start, ($end-$beg));\n\t}\n\t## ML: End of fix for  substr out of range error -----------------\n\n\n\t## ML: debugging code\n\t## This is where we get our exception.  
Try printing out the values going\n\t## into this:\n\t##\n#\t print STDERR\n#\t     qq(*------------MY EXCEPTION --------------------\\nSeq: \") ,\n#\t     $self->seq_str(\"$seqType\"), qq(\"\\n),$self->rank,\",(  index:\";\n#\t print STDERR  $beg-$start, \", len: \", $end-$beg,\" ), (HSPRealLen:\",\n#\t     CORE::length $self->seq_str(\"$seqType\");\n#\t print STDERR \", HSPCalcLen: \", $stop - $start +1 ,\" ),\n#\t     ( beg: $beg, end: $end ), ( start: $start, stop: stop )\\n\";\n\t ## ML: END DEBUGGING CODE----------\n\n\tif(!CORE::length $seq) {\n            my $id_str = $self->_id_str;\n\t    $self->throw(\"Undefined $seqType sub-sequence ($beg,$end). Valid range = $start - $stop ($id_str)\");\n\t}\n\t## Get data for a substring.\n#\tprintf \"Collecting HSP subsection data: beg,end = %d,%d; start,stop = %d,%d\\n%s<---\\n\", $beg, $end, $start, $stop, $seq;\n#\tprintf \"Original match seq:\\n%s\\n\",$self->seq_str('match');\n\t$seq =~ s/ //g;  # remove space (no info).\n\tmy $len_cons = CORE::length $seq;\n\t$seq =~ s/\\+//g;  # remove '+' characters (conservative substitutions)\n\tmy $len_id = CORE::length $seq;\n\tpush @data, ($len_id, $len_cons);\n#\tprintf \"  HSP = %s\\n  id = %d; cons = %d\\n\", $self->rank, $len_id, $len_cons; <STDIN>;\n    }\n    @data;\n}\n\n\n=head2 num_identical\n\n Usage     : $hsp_object->num_identical();\n Purpose   : Get the number of identical positions within the given HSP.\n Example   : $num_iden = $hsp_object->num_identical();\n Returns   : integer\n Argument  : n/a\n Throws    : n/a\n\nSee Also   : L</num_conserved>, L</frac_identical>\n\n\n#-------------------\nsub num_identical {\n#-------------------\n    my( $self) = shift;\n\n    $self->{'_numIdentical'};\n}\n\n\n=head2 num_conserved\n\n Usage     : $hsp_object->num_conserved();\n Purpose   : Get the number of conserved positions within the given HSP.\n Example   : $num_iden = $hsp_object->num_conserved();\n Returns   : integer\n Argument  : n/a\n Throws    : 
n/a\n\nSee Also   : L</num_identical>, L</frac_conserved>\n\n\n#-------------------\nsub num_conserved {\n#-------------------\n    my( $self) = shift;\n\n    $self->{'_numConserved'};\n}\n\n\n\n=head2 range\n\n Usage     : $hsp->range( [seq_type] );\n Purpose   : Gets the (start, end) coordinates for the query or sbjct sequence\n           : in the HSP alignment.\n Example   : ($query_beg, $query_end) = $hsp->range('query');\n           : ($hit_beg, $hit_end) = $hsp->range('hit');\n Returns   : Two-element array of integers\n Argument  : seq_type = string, 'query' or 'hit' or 'sbjct'  (default = 'query')\n           :  ('sbjct' is synonymous with 'hit')\n Throws    : n/a\n\nSee Also   : L</start>, L</end>\n\n\n#----------\nsub range {\n#----------\n    my ($self, $seqType) = @_;\n\n    $self->_set_seq_data() unless $self->{'_set_seq_data'};\n\n    $seqType ||= 'query';\n    $seqType = 'sbjct' if $seqType eq 'hit';\n\n    ## Sensitive to member name changes.\n    $seqType = \"_\\L$seqType\\E\";\n\n    return ($self->{$seqType.'Start'},$self->{$seqType.'Stop'});\n}\n\n=head2 start\n\n Usage     : $hsp->start( [seq_type] );\n Purpose   : Gets the start coordinate for the query, sbjct, or both sequences\n           : in the HSP alignment.\n           : NOTE: Start will always be less than end.\n           : To determine strand, use $hsp->strand()\n Example   : $query_beg = $hsp->start('query');\n           : $hit_beg = $hsp->start('hit');\n           : ($query_beg, $hit_beg) = $hsp->start();\n Returns   : scalar context: integer\n           : array context without args: list of two integers\n Argument  : In scalar context: seq_type = 'query' or 'hit' or 'sbjct' (default= 'query')\n           :  ('sbjct' is synonymous with 'hit')\n           : Array context can be \"induced\" by providing an argument of 'list' or 'array'.\n Throws    : n/a\n\nSee Also   : L</end>, L</range>\n\n\n#----------\nsub start {\n#----------\n    my ($self, $seqType) = @_;\n\n    $seqType ||= 
(wantarray ? 'list' : 'query');\n    $seqType = 'sbjct' if $seqType eq 'hit';\n\n    $self->_set_seq_data() unless $self->{'_set_seq_data'};\n\n    if($seqType =~ /list|array/i) {\n\treturn ($self->{'_queryStart'}, $self->{'_sbjctStart'});\n    } else {\n\t## Sensitive to member name changes.\n\t$seqType = \"_\\L$seqType\\E\";\n\treturn $self->{$seqType.'Start'};\n    }\n}\n\n=head2 end\n\n Usage     : $hsp->end( [seq_type] );\n Purpose   : Gets the end coordinate for the query, sbjct, or both sequences\n           : in the HSP alignment.\n           : NOTE: Start will always be less than end.\n           : To determine strand, use $hsp->strand()\n Example   : $query_end = $hsp->end('query');\n           : $hit_end = $hsp->end('hit');\n           : ($query_end, $hit_end) = $hsp->end();\n Returns   : scalar context: integer\n           : array context without args: list of two integers\n Argument  : In scalar context: seq_type = 'query' or 'hit' or 'sbjct' (default= 'query')\n           :  ('sbjct' is synonymous with 'hit')\n           : Array context can be \"induced\" by providing an argument of 'list' or 'array'.\n Throws    : n/a\n\nSee Also   : L</start>, L</range>, 
L</strand>"},"detail":"($self,$seqType)","kind":12,"children":[{"line":1396,"kind":13,"localvar":"my","containerName":"end","name":"$self","definition":"my"},{"containerName":"end","name":"$seqType","kind":13,"line":1396},{"containerName":"end","name":"$seqType","line":1398,"kind":13},{"line":1399,"kind":13,"name":"$seqType","containerName":"end"},{"containerName":"end","name":"$seqType","kind":13,"line":1399},{"line":1401,"kind":13,"containerName":"end","name":"$self"},{"name":"_set_seq_data","containerName":"end","line":1401,"kind":12},{"kind":13,"line":1401,"name":"$self","containerName":"end"},{"name":"$seqType","containerName":"end","kind":13,"line":1403},{"name":"$self","containerName":"end","line":1404,"kind":13},{"containerName":"end","name":"$self","kind":13,"line":1404},{"name":"$seqType","containerName":"end","kind":13,"line":1407},{"kind":13,"line":1408,"containerName":"end","name":"$self"},{"name":"$seqType","containerName":"end","kind":13,"line":1408}],"line":1394},{"range":{"start":{"character":0,"line":1438},"end":{"character":9999,"line":1477}},"name":"strand","detail":"($self,$seqType)","signature":{"parameters":[{"label":"$self"},{"label":"$seqType"}],"documentation":"1;\n#-----------------------------------------------------------------\n# $Id: BlastHSP.pm 16123 2009-09-17 12:57:27Z cjfields $\n#\n# BioPerl module Bio::Search::HSP::BlastHSP\n#\n# (This module was originally called Bio::Tools::Blast::HSP)\n#\n# Please direct questions and support issues to <bioperl-l@bioperl.org> \n#\n# Cared for by Steve Chervitz <sac@bioperl.org>\n#\n# You may distribute this module under the same terms as perl itself\n#-----------------------------------------------------------------\n\n## POD Documentation:\n\n=head1 NAME\n\nBio::Search::HSP::BlastHSP - Bioperl BLAST High-Scoring Pair object\n\n=head1 SYNOPSIS\n\nSee L<Bio::Search::Hit::BlastHit>.\n\n=head1 DESCRIPTION\n\nA Bio::Search::HSP::BlastHSP object provides an interface to data\nobtained in a single 
alignment section of a Blast report (known as a\n\"High-scoring Segment Pair\"). This is essentially a pairwise\nalignment with score information.\n\nBlastHSP objects are accessed via L<Bio::Search::Hit::BlastHit>\nobjects after parsing a BLAST report using the L<Bio::SearchIO>\nsystem.\n\nThe construction of BlastHSP objects is performed by\nBio::Factory::BlastHitFactory in a process that is\norchestrated by the Blast parser (L<Bio::SearchIO::psiblast>).\nThe resulting BlastHSPs are then accessed via\nL<Bio::Search::Hit::BlastHit>. Therefore, you do not need to\nuse L<Bio::Search::HSP::BlastHSP> directly. If you need to construct\nBlastHSPs directly, see the new() function for details.\n\nFor L<Bio::SearchIO> BLAST parsing usage examples, see the\nC<examples/searchio> directory of the Bioperl distribution.\n\n=head2 Start and End coordinates\n\nSequence endpoints are swapped so that start is always less than\nend. This affects TBLASTN/X hits on the minus strand. Strand\ninformation can be recovered using the strand() method. This\nnormalization step is standard Bioperl practice. It also facilitates\nuse of range information by methods such as match().\n\n=over 1\n\n* * Supports BLAST versions 1.x and 2.x, gapped and ungapped.\n\n\nBio::Search::HSP::BlastHSP.pm has the ability to extract a list of all\nresidue indices for identical and conservative matches along both\nquery and sbjct sequences. Since this degree of detail is not always\nneeded, this behavior does not occur during construction of the BlastHSP\nobject.  
These data will automatically be collected as necessary as\nthe BlastHSP.pm object is used.\n\n=head1 DEPENDENCIES\n\nBio::Search::HSP::BlastHSP.pm is a concrete class that inherits from\nL<Bio::SeqFeature::SimilarityPair> and L<Bio::Search::HSP::HSPI>.\nL<Bio::Seq> and L<Bio::SimpleAlign> are employed for creating\nsequence and alignment objects, respectively.\n\n=head2 Relationship to SimpleAlign.pm & Seq.pm\n\nBlastHSP.pm can provide the query or sbjct sequence as a L<Bio::Seq>\nobject via the L<seq()|seq> method. The BlastHSP.pm object can also create a\ntwo-sequence L<Bio::SimpleAlign> alignment object using the query\nand sbjct sequences via the L<get_aln()|get_aln> method. Creation of alignment\nobjects is not automatic when constructing the BlastHSP.pm object since\nthis level of functionality is not always required and would generate\na lot of extra overhead when crunching many reports.\n\n\n=head1 FEEDBACK\n\n=head2 Mailing Lists\n\nUser feedback is an integral part of the evolution of this and other\nBioperl modules.  Send your comments and suggestions preferably to one\nof the Bioperl mailing lists.  Your participation is much appreciated.\n\n  bioperl-l@bioperl.org                  - General discussion\n  http://bioperl.org/wiki/Mailing_lists  - About the mailing lists\n\n=head2 Support \n\nPlease direct usage questions or support issues to the mailing list:\n\nI<bioperl-l@bioperl.org>\n\nrather than to the module maintainer directly. Many experienced and \nresponsive experts will be able to look at the problem and quickly \naddress it. Please include a thorough description of the problem \nwith code and data examples if at all possible.\n\n=head2 Reporting Bugs\n\nReport bugs to the Bioperl bug tracking system to help us keep track\nof the bugs and their resolution. 
Bug reports can be submitted via the\nweb:\n\n  http://bugzilla.open-bio.org/\n\n=head1 AUTHOR\n\nSteve Chervitz E<lt>sac-at-bioperl.orgE<gt>\n\nSee L<the FEEDBACK section | FEEDBACK> for where to send bug reports and comments.\n\n=head1 ACKNOWLEDGEMENTS\n\nThis software was originally developed in the Department of Genetics\nat Stanford University. I would also like to acknowledge my\ncolleagues at Affymetrix for useful feedback.\n\n=head1 SEE ALSO\n\n Bio::Search::Hit::BlastHit.pm          - Blast hit object.\n Bio::Search::Result::BlastResult.pm    - Blast Result object.\n Bio::Seq.pm                            - Biosequence object\n\n=head2 Links:\n\n http://bio.perl.org/                       - Bioperl Project Homepage\n\n=head1 COPYRIGHT\n\nCopyright (c) 1996-2001 Steve Chervitz. All Rights Reserved.\n\n=head1 DISCLAIMER\n\nThis software is provided \"as is\" without warranty of any kind.\n\n\n\n# END of main POD documentation.\n\n=head1 APPENDIX\n\nThe rest of the documentation details each of the object methods.\nInternal methods are usually preceded with a _\n\n\n# Let the code begin...\n\npackage Bio::Search::HSP::BlastHSP;\n\nuse strict;\nuse Bio::SeqFeature::Similarity;\n\nuse vars qw($GAP_SYMBOL %STRAND_SYMBOL);\n\nuse overload\n    '\"\"' => \\&to_string;\n\nuse base qw(Bio::SeqFeature::SimilarityPair Bio::Search::HSP::HSPI);\n\n$GAP_SYMBOL    = '-';          # Need a more general way to handle gap symbols.\n%STRAND_SYMBOL = ('Plus' => 1, 'Minus' => -1 );\n\n\n=head2 new\n\n Usage     : $hsp = Bio::Search::HSP::BlastHSP->new( %named_params );\n           : Bio::Search::HSP::BlastHSP objects are constructed\n           : automatically by Bio::SearchIO::BlastHitFactory,\n           : so there is no need for direct instantiation.\n Purpose   : Constructs a new BlastHSP object and Initializes key variables\n           : for the HSP.\n Returns   : A Bio::Search::HSP::BlastHSP object\n Argument  : Named parameters:\n           : Parameter keys are 
case-insensitive.\n           :      -RAW_DATA  => array ref containing raw BLAST report data for\n           :                    for a single HSP. This includes all lines\n           :                    of the HSP alignment from a traditional BLAST\n                                or PSI-BLAST (non-XML) report,\n           :      -RANK         => integer (1..n).\n           :      -PROGRAM      => string ('TBLASTN', 'BLASTP', etc.).\n           :      -QUERY_NAME   => string, id of query sequence\n           :      -HIT_NAME     => string, id of hit sequence\n           :\n Comments  : Having the raw data allows this object to do lazy parsing of\n           : the raw HSP data (i.e., not parsed until needed).\n           :\n           : Note that there is a fair amount of basic parsing that is\n           : currently performed in this module that would be more appropriate\n           : to do within a separate factory object.\n           : This parsing code will likely be relocated and more initialization\n           : parameters will be added to new().\n           :\nSee Also   : L<Bio::SeqFeature::SimilarityPair::new()>, L<Bio::SeqFeature::Similarity::new()>\n\n\n#----------------\nsub new {\n#----------------\n    my ($class, @args ) = @_;\n\n    my $self = $class->SUPER::new( @args );\n    # Initialize placeholders\n    $self->{'_queryGaps'} = $self->{'_sbjctGaps'} = 0;\n    my ($raw_data, $qname, $hname, $qlen, $hlen);\n\n    ($self->{'_prog'}, $self->{'_rank'}, $raw_data,\n     $qname, $hname) =\n\t $self->_rearrange([qw( PROGRAM\n\t\t\t\tRANK\n\t\t\t\tRAW_DATA\n\t\t\t\tQUERY_NAME\n\t\t\t\tHIT_NAME\n\t\t\t\t)], @args );\n\n    # _set_data() does a fair amount of parsing.\n    # This will likely change (see comment above.)\n    $self->_set_data( @{$raw_data} );\n    # Store the aligned query as sequence feature\n    my ($qb, $hb) = ($self->start());\n    my ($qe, $he) = ($self->end());\n    my ($qs, $hs) = ($self->strand());\n    my ($qf,$hf) = 
($self->query->frame(),\n\t\t    $self->hit->frame);\n\n    $self->query( Bio::SeqFeature::Similarity->new (-start   =>$qb,\n\t\t\t\t\t\t    -end     =>$qe,\n\t\t\t\t\t\t    -strand  =>$qs,\n\t\t\t\t\t\t    -bits    =>$self->bits,\n\t\t\t\t\t\t    -score   =>$self->score,\n\t\t\t\t\t\t    -frame   =>$qf,\n\t\t\t\t\t\t    -seq_id  => $qname,\n\t\t\t\t\t\t    -source  =>$self->{'_prog'} ));\n\n    $self->hit( Bio::SeqFeature::Similarity->new (-start   =>$hb,\n\t\t\t\t\t\t  -end     =>$he,\n\t\t\t\t\t\t  -strand  =>$hs,\n\t\t\t\t\t\t  -bits    =>$self->bits,\n\t\t\t\t\t\t  -score   =>$self->score,\n                                                  -frame   =>$hf,\n\t\t\t\t\t\t  -seq_id  => $hname,\n\t\t\t\t\t\t  -source  =>$self->{'_prog'} ));\n\n    # set lengths\n    $self->query->seqlength($qlen); # query\n    $self->hit->seqlength($hlen); # subject\n\n    $self->query->frac_identical($self->frac_identical('query'));\n    $self->hit->frac_identical($self->frac_identical('hit'));\n    return $self;\n}\n\n#sub DESTROY {\n#    my $self = shift;\n#    #print STDERR \"--->DESTROYING $self\\n\";\n#}\n\n\n# Title   : _id_str;\n# Purpose : Intended for internal use only to provide a string for use\n#           within exception messages to help users figure out which\n#           query/hit caused the problem.\n# Returns : Short string with name of query and hit seq\nsub _id_str {\n    my $self = shift;\n    if( not defined $self->{'_id_str'}) {\n        my $qname = $self->query->seq_id;\n        my $hname = $self->hit->seq_id;\n        $self->{'_id_str'} = \"QUERY=\\\"$qname\\\" HIT=\\\"$hname\\\"\";\n    }\n    return $self->{'_id_str'};\n}\n\n#=================================================\n# Begin Bio::Search::HSP::HSPI implementation\n#=================================================\n\n=head2 algorithm\n\n Title   : algorithm\n Usage   : $alg = $hsp->algorithm();\n Function: Gets the algorithm specification that was used to obtain the hsp\n           For BLAST, the 
algorithm denotes what type of sequence was aligned\n           against what (BLASTN: dna-dna, BLASTP prt-prt, BLASTX translated\n           dna-prt, TBLASTN prt-translated dna, TBLASTX translated\n           dna-translated dna).\n Returns : a scalar string\n Args    : none\n\n\n#----------------\nsub algorithm {\n#----------------\n    my ($self,@args) = @_;\n    return $self->{'_prog'};\n}\n\n\n\n\n=head2 signif()\n\n Usage     : $hsp_obj->signif()\n Purpose   : Get the P-value or Expect value for the HSP.\n Returns   : Float (0.001 or 1.3e-43)\n           : Returns P-value if it is defined, otherwise, Expect value.\n Argument  : n/a\n Throws    : n/a\n Comments  : Provided for consistency with BlastHit::signif()\n           : Support for returning the significance data in different\n           : formats (e.g., exponent only), is not provided for HSP objects.\n           : This is only available for the BlastHit or Blast object.\n\nSee Also   : L</p>, L</expect>, L<Bio::Search::Hit::BlastHit::signif()|Bio::Search::Hit::BlastHit>\n\n\n#-----------\nsub signif {\n#-----------\n    my $self = shift;\n    my $val ||= defined($self->{'_p'}) ? 
$self->{'_p'} : $self->{'_expect'};\n    $val;\n}\n\n\n\n=head2 evalue\n\n Usage     : $hsp_obj->evalue()\n Purpose   : Get the Expect value for the HSP.\n Returns   : Float (0.001 or 1.3e-43)\n Argument  : n/a\n Throws    : n/a\n Comments  : Support for returning the expectation data in different\n           : formats (e.g., exponent only), is not provided for HSP objects.\n           : This is only available for the BlastHit or Blast object.\n\nSee Also   : L</p>\n\n\n#----------\nsub evalue { shift->{'_expect'} }\n#----------\n\n\n=head2 p\n\n Usage     : $hsp_obj->p()\n Purpose   : Get the P-value for the HSP.\n Returns   : Float (0.001 or 1.3e-43) or undef if not defined.\n Argument  : n/a\n Throws    : n/a\n Comments  : P-value is not defined with NCBI Blast2 reports.\n           : Support for returning the expectation data in different\n           : formats (e.g., exponent only) is not provided for HSP objects.\n           : This is only available for the BlastHit or Blast object.\n\nSee Also   : L<expect()|expect>\n\n\n#-----\nsub p { my $self = shift; $self->{'_p'}; }\n#-----\n\n# alias\nsub pvalue { shift->p(@_); }\n\n=head2 length\n\n Usage     : $hsp->length( [seq_type] )\n Purpose   : Get the length of the aligned portion of the query or sbjct.\n Example   : $hsp->length('query')\n Returns   : integer\n Argument  : seq_type: 'query' | 'hit' or 'sbjct' | 'total'  (default = 'total')\n             ('sbjct' is synonymous with 'hit')\n Throws    : n/a\n Comments  : 'total' length is the full length of the alignment\n           : as reported in the denominators in the alignment section:\n           : \"Identical = 34/120 Positives = 67/120\".\n\nSee Also   : L</gaps>\n\n\n#-----------\nsub length {\n#-----------\n## Developer note: when using the built-in length function within\n##                 this module, call it as CORE::length().\n    my( $self, $seqType,$data ) = @_;\n    $seqType  ||= 'total';\n    $seqType = 'sbjct' if $seqType eq 'hit';\n\n    
$seqType ne 'total' and $self->_set_seq_data() unless $self->{'_set_seq_data'};\n\n    ## Sensitive to member name format.\n    $seqType = \"_\\L$seqType\\E\";\n    if( defined $data  ) {\n\t$self->{$seqType.'Length'} = $data;\n    }\n    $self->{$seqType.'Length'};\n}\n\n\n\n=head2 gaps\n\n Usage     : $hsp->gaps( [seq_type] )\n Purpose   : Get the number of gap characters in the query, sbjct, or total alignment.\n           : Also can return query gap chars and sbjct gap chars as a two-element list\n           : when in array context.\n Example   : $total_gaps      = $hsp->gaps();\n           : ($qgaps, $sgaps) = $hsp->gaps();\n           : $qgaps           = $hsp->gaps('query');\n Returns   : scalar context: integer\n           : array context without args: (int, int) = ('queryGaps', 'sbjctGaps')\n Argument  : seq_type: 'query' or 'hit' or 'sbjct' or 'total'\n           :  ('sbjct' is synonymous with 'hit')\n           : (default = 'total', scalar context)\n           : Array context can be \"induced\" by providing an argument of 'list' or 'array'.\n Throws    : n/a\n\nSee Also   : L</length>, L</matches>\n\n\n#---------\nsub gaps {\n#---------\n    my( $self, $seqType ) = @_;\n\n    $self->_set_seq_data() unless $self->{'_set_seq_data'};\n\n    $seqType  ||= (wantarray ? 
'list' : 'total');\n    $seqType = 'sbjct' if $seqType eq 'hit';\n\n    if($seqType =~ /list|array/i) {\n\treturn (($self->{'_queryGaps'} || 0), ($self->{'_sbjctGaps'} || 0));\n    }\n\n    if($seqType eq 'total') {\n\treturn ($self->{'_queryGaps'} + $self->{'_sbjctGaps'}) || 0;\n    } else {\n\t## Sensitive to member name format.\n\t$seqType = \"_\\L$seqType\\E\";\n\treturn $self->{$seqType.'Gaps'} || 0;\n    }\n}\n\n\n=head2 frac_identical\n\n Usage     : $hsp_object->frac_identical( [seq_type] );\n Purpose   : Get the fraction of identical positions within the given HSP.\n Example   : $frac_iden = $hsp_object->frac_identical('query');\n Returns   : Float (2-decimal precision, e.g., 0.75).\n Argument  : seq_type: 'query' or 'hit' or 'sbjct' or 'total'\n           :  ('sbjct' is synonymous with 'hit')\n           : default = 'total' (but see comments below).\n Throws    : n/a\n Comments  : Different versions of Blast report different values for the total\n           : length of the alignment. 
This is the number reported in the\n           : denominators in the stats section:\n           : \"Identical = 34/120 Positives = 67/120\".\n           : NCBI-BLAST uses the total length of the alignment (with gaps)\n           : WU-BLAST uses the length of the query sequence (without gaps).\n           : Therefore, when called without an argument or an argument of 'total',\n           : this method will report different values depending on the\n           : version of BLAST used.\n           :\n           : To get the fraction identical among only the aligned residues,\n           : ignoring the gaps, call this method with an argument of 'query'\n           : or 'sbjct' ('sbjct' is synonymous with 'hit').\n\nSee Also   : L</frac_conserved>, L</num_identical>, L</matches>\n\n\n#-------------------\nsub frac_identical {\n#-------------------\n# The value is calculated as opposed to storing it from the parsed results.\n# This saves storage and also permits flexibility in determining for which\n# sequence (query or sbjct) the figure is to be calculated.\n\n    my( $self, $seqType ) = @_;\n    $seqType ||= 'total';\n    $seqType = 'sbjct' if $seqType eq 'hit';\n\n    if($seqType ne 'total') {\n      $self->_set_seq_data() unless $self->{'_set_seq_data'};\n    }\n    ## Sensitive to member name format.\n    $seqType = \"_\\L$seqType\\E\";\n\n    sprintf( \"%.2f\", $self->{'_numIdentical'}/$self->{$seqType.'Length'});\n}\n\n\n=head2 frac_conserved\n\n Usage     : $hsp_object->frac_conserved( [seq_type] );\n Purpose   : Get the fraction of conserved positions within the given HSP.\n           : (Note: 'conservative' positions are called 'positives' in the\n\t   : Blast report.)\n Example   : $frac_cons = $hsp_object->frac_conserved('query');\n Returns   : Float (2-decimal precision, e.g., 0.75).\n Argument  : seq_type: 'query' or 'hit' or 'sbjct' or 'total'\n           :  ('sbjct' is synonymous with 'hit')\n           : default = 'total' (but see comments below).\n 
 Throws    : n/a\n Comments  : Different versions of Blast report different values for the total\n           : length of the alignment. This is the number reported in the\n           : denominators in the stats section:\n           : \"Identical = 34/120 Positives = 67/120\".\n           : NCBI-BLAST uses the total length of the alignment (with gaps)\n           : WU-BLAST uses the length of the query sequence (without gaps).\n           : Therefore, when called without an argument or an argument of 'total',\n           : this method will report different values depending on the\n           : version of BLAST used.\n           :\n           : To get the fraction conserved among only the aligned residues,\n           : ignoring the gaps, call this method with an argument of 'query'\n           : or 'sbjct'.\n\nSee Also   : L</frac_identical>, L</num_conserved>, L</matches>\n\n\n#--------------------\nsub frac_conserved {\n#--------------------\n# The value is calculated as opposed to storing it from the parsed results.\n# This saves storage and also permits flexibility in determining for which\n# sequence (query or sbjct) the figure is to be calculated.\n\n    my( $self, $seqType ) = @_;\n    $seqType ||= 'total';\n    $seqType = 'sbjct' if $seqType eq 'hit';\n\n    if($seqType ne 'total') {\n      $self->_set_seq_data() unless $self->{'_set_seq_data'};\n    }\n\n    ## Sensitive to member name format.\n    $seqType = \"_\\L$seqType\\E\";\n\n    sprintf( \"%.2f\", $self->{'_numConserved'}/$self->{$seqType.'Length'});\n}\n\n=head2 query_string\n\n Title   : query_string\n Usage   : my $qseq = $hsp->query_string;\n Function: Retrieves the query sequence of this HSP as a string\n Returns : string\n Args    : none\n\n\n\n#----------------\nsub query_string{ shift->seq_str('query'); }\n#----------------\n\n=head2 hit_string\n\n Title   : hit_string\n Usage   : my $hseq = $hsp->hit_string;\n Function: Retrieves the hit sequence of this HSP as a string\n Returns : string\n 
 Args    : none\n\n\n\n#----------------\nsub hit_string{ shift->seq_str('hit'); }\n#----------------\n\n\n=head2 homology_string\n\n Title   : homology_string\n Usage   : my $homo_string = $hsp->homology_string;\n Function: Retrieves the homology sequence for this HSP as a string.\n         : The homology sequence is the string of symbols in between the\n         : query and hit sequences in the alignment indicating the degree\n         : of conservation (e.g., identical, similar, not similar).\n Returns : string\n Args    : none\n\n\n#----------------\nsub homology_string{ shift->seq_str('match'); }\n#----------------\n\n#=================================================\n# End Bio::Search::HSP::HSPI implementation\n#=================================================\n\n# Older method delegating to method defined in HSPI.\n\n=head2 expect\n\nSee L<Bio::Search::HSP::HSPI::expect()|Bio::Search::HSP::HSPI>\n\n\n#----------\nsub expect { shift->evalue( @_ ); }\n#----------\n\n\n=head2 rank\n\n Usage     : $hsp->rank( [string] );\n Purpose   : Get the rank of the HSP within a given Blast hit.\n Example   : $rank = $hsp->rank;\n Returns   : Integer (1..n) corresponding to the order in which the HSP\n             appears in the BLAST report.\n\n\n#'\n\n#----------\nsub rank { shift->{'_rank'} }\n#----------\n\n# For backward compatibility\n#----------\nsub name { shift->rank }\n#----------\n\n=head2 to_string\n\n Title   : to_string\n Usage   : print $hsp->to_string;\n Function: Returns a string representation for the Blast HSP.\n           Primarily intended for debugging purposes.\n Example : see usage\n Returns : A string of the form:\n           [BlastHSP] <rank>\n           e.g.:\n           [BlastHSP] 1\n Args    : None\n\n\n#----------\nsub to_string {\n#----------\n    my $self = shift;\n    return \"[BlastHSP] \" . 
$self->rank();\n}\n\n\n=head2 _set_data\n\n Usage     : called automatically during object construction.\n Purpose   : Parses the raw HSP section from a flat BLAST report and\n             sets the query sequence, sbjct sequence, and the \"match\" data\n           : which consists of the symbols between the query and sbjct lines\n           : in the alignment.\n Argument  : Array (all lines for a single, complete HSP, from a raw,\n             flat (i.e., non-XML) BLAST report)\n Throws    : Propagates any exceptions from the methods called (\"See Also\")\n\nSee Also   : L</_set_seq>, L</_set_score_stats>, L</_set_match_stats>\n\n\n#--------------\nsub _set_data {\n#--------------\n    my $self = shift;\n    my @data = @_;\n    my @queryList  = ();  # 'Query' = SEQUENCE USED TO QUERY THE DATABASE.\n    my @sbjctList  = ();  # 'Sbjct' = HOMOLOGOUS SEQUENCE FOUND IN THE DATABASE.\n    my @matchList  = ();\n    my $matchLine  = 0;   # Alternating boolean: when true, load 'match' data.\n    my @linedat = ();\n\n    #print STDERR \"BlastHSP: set_data()\\n\";\n\n    my($line, $aln_row_len, $length_diff);\n    $length_diff = 0;\n\n    # Collecting data for all lines in the alignment\n    # and then storing the collections for possible processing later.\n    #\n    # Note that \"match\" lines may not be properly padded with spaces.\n    # This loop now properly handles such cases:\n    # Query: 1141 PSLVELTIRDCPRLEVGPMIRSLPKFPMLKKLDLAVANIIEEDLDVIGSLEELVIXXXXX 1200\n    #             PSLVELTIRDCPRLEVGPMIRSLPKFPMLKKLDLAVANIIEEDLDVIGSLEELVI\n    # Sbjct: 1141 PSLVELTIRDCPRLEVGPMIRSLPKFPMLKKLDLAVANIIEEDLDVIGSLEELVILSLKL 1200\n\n    foreach $line( @data ) {\n\tnext if $line =~ /^\\s*$/;\n\n\tif( $line =~ /^ ?Score/ ) {\n\t    $self->_set_score_stats( $line );\n\t} elsif( $line =~ /^ ?(Identities|Positives|Strand)/ ) {\n\t    $self->_set_match_stats( $line );\n\t} elsif( $line =~ /^ ?Frame = ([\\d+-]+)/ ) {\n\t  # Version 2.0.8 has Frame information on a separate line.\n\t  # 
Storing frame according to SeqFeature::Generic::frame()\n\t  # which does not contain strand info (use strand()).\n\t  my $frame = abs($1) - 1;\n\t  $self->frame( $frame );\n\t} elsif( $line =~ /^(Query:?[\\s\\d]+)([^\\s\\d]+)/ ) {\n\t    push @queryList, $line;\n\t    $self->{'_match_indent'} = CORE::length $1;\n\t    $aln_row_len = (CORE::length $1) + (CORE::length $2);\n\t    $matchLine = 1;\n\t} elsif( $matchLine ) {\n\t    # Pad the match line with spaces if necessary.\n\t    $length_diff = $aln_row_len - CORE::length $line;\n\t    $length_diff and $line .= ' 'x $length_diff;\n\t    push @matchList, $line;\n\t    $matchLine = 0;\n\t} elsif( $line =~ /^Sbjct/ ) {\n\t    push @sbjctList, $line;\n\t}\n    }\n    # Storing the query and sbjct lists in case they are needed later.\n    # We could make this conditional to save memory.\n    $self->{'_queryList'} = \\@queryList;\n    $self->{'_sbjctList'} = \\@sbjctList;\n\n    # Storing the match list in case it is needed later.\n    $self->{'_matchList'} = \\@matchList;\n\n    if(not defined ($self->{'_numIdentical'})) {\n        my $id_str = $self->_id_str;\n        $self->throw( -text  => \"Can't parse match statistics. Possibly a new or unrecognized Blast format. ($id_str)\");\n    }\n\n    if(!scalar @queryList or !scalar @sbjctList) {\n        my $id_str = $self->_id_str;\n        $self->throw( \"Can't find query or sbjct alignment lines. Possibly unrecognized Blast format. 
($id_str)\");\n    }\n}\n\n\n=head2 _set_score_stats\n\n Usage     : called automatically by _set_data()\n Purpose   : Sets various score statistics obtained from the HSP listing.\n Argument  : String with any of the following formats:\n           : blast2:  Score = 30.1 bits (66), Expect = 9.2\n           : blast2:  Score = 158.2 bits (544), Expect(2) = e-110\n           : blast1:  Score = 410 (144.3 bits), Expect = 1.7e-40, P = 1.7e-40\n           : blast1:  Score = 55 (19.4 bits), Expect = 5.3, Sum P(3) = 0.99\n Throws    : Exception if the stats cannot be parsed, probably due to a change\n           : in the Blast report format.\n\nSee Also   : L</_set_data>\n\n\n#--------------------\nsub _set_score_stats {\n#--------------------\n    my ($self, $data) = @_;\n\n    my ($expect, $p);\n\n    if($data =~ /Score = +([\\d.e+-]+) bits \\(([\\d.e+-]+)\\), +Expect = +([\\d.e+-]+)/) {\n\t# blast2 format n = 1\n\t$self->bits($1);\n\t$self->score($2);\n\t$expect            = $3;\n    } elsif($data =~ /Score = +([\\d.e+-]+) bits \\(([\\d.e+-]+)\\), +Expect\\((\\d+)\\) = +([\\d.e+-]+)/) {\n\t# blast2 format n > 1\n\t$self->bits($1);\n\t$self->score($2);\n\t$self->{'_n'}      = $3;\n\t$expect            = $4;\n\n    } elsif($data =~ /Score = +([\\d.e+-]+) \\(([\\d.e+-]+) bits\\), +Expect = +([\\d.e+-]+), P = +([\\d.e-]+)/) {\n\t# blast1 format, n = 1\n\t$self->score($1);\n\t$self->bits($2);\n\t$expect            = $3;\n\t$p                 = $4;\n\n    } elsif($data =~ /Score = +([\\d.e+-]+) \\(([\\d.e+-]+) bits\\), +Expect = +([\\d.e+-]+), +Sum P\\((\\d+)\\) = +([\\d.e-]+)/) {\n\t# blast1 format, n > 1\n\t$self->score($1);\n\t$self->bits($2);\n\t$expect            = $3;\n\t$self->{'_n'}      = $4;\n\t$p                 = $5;\n\n    } else {\n        my $id_str = $self->_id_str;\n\t$self->throw(-class => 'Bio::Root::Exception',\n\t\t     -text => \"Can't parse score statistics: unrecognized format. 
($id_str)\",\n\t\t     -value => $data);\n    }\n    $expect = \"1$expect\" if $expect =~ /^e/i;\n    $p      = \"1$p\"      if defined $p and $p=~ /^e/i;\n\n    $self->{'_expect'} = $expect;\n    $self->{'_p'}      = $p || undef;\n    $self->significance( $p || $expect );\n}\n\n\n=head2 _set_match_stats\n\n Usage     : Private method; called automatically by _set_data()\n Purpose   : Sets various matching statistics obtained from the HSP listing.\n Argument  : blast2: Identities = 23/74 (31%), Positives = 29/74 (39%), Gaps = 17/74 (22%)\n           : blast2: Identities = 57/98 (58%), Positives = 74/98 (75%)\n           : blast1: Identities = 87/204 (42%), Positives = 126/204 (61%)\n           : blast1: Identities = 87/204 (42%), Positives = 126/204 (61%), Frame = -3\n           : WU-blast: Identities = 310/553 (56%), Positives = 310/553 (56%), Strand = Minus / Plus\n Throws    : Exception if the stats cannot be parsed, probably due to a change\n           : in the Blast report format.\n Comments  : The \"Gaps = \" data in the HSP header has a different meaning depending\n           : on the type of Blast: for BLASTP, this number is the total number of\n           : gaps in query+sbjct; for TBLASTN, it is the number of gaps in the\n           : query sequence only. Thus, it is safer to collect the data\n           : separately by examining the actual sequence strings as is done\n           : in _set_seq().\n\nSee Also   : L</_set_data>, L</_set_seq>\n\n\n#--------------------\nsub _set_match_stats {\n#--------------------\n    my ($self, $data) = @_;\n\n    if($data =~ m!Identities = (\\d+)/(\\d+)!) {\n      # blast1 or 2 format\n      $self->{'_numIdentical'} = $1;\n      $self->{'_totalLength'}  = $2;\n    }\n\n    if($data =~ m!Positives = (\\d+)/(\\d+)!) {\n      # blast1 or 2 format\n      $self->{'_numConserved'} = $1;\n      $self->{'_totalLength'}  = $2;\n    }\n\n    if($data =~ m!Frame = ([\\d+-]+)!) 
{\n      $self->frame($1);\n    }\n\n    # Strand data is not always present in this line.\n    # _set_seq() will also set strand information.\n    if($data =~ m!Strand = (\\w+) / (\\w+)!) {\n\t$self->{'_queryStrand'} = $1;\n\t$self->{'_sbjctStrand'} = $2;\n    }\n\n#    if($data =~ m!Gaps = (\\d+)/(\\d+)!) {\n#\t $self->{'_totalGaps'} = $1;\n#    } else {\n#\t $self->{'_totalGaps'} = 0;\n#    }\n}\n\n\n\n=head2 _set_seq_data\n\n Usage     : called automatically when sequence data is requested.\n Purpose   : Sets the HSP sequence data for both query and sbjct sequences.\n           : Includes: start, stop, length, gaps, and raw sequence.\n Argument  : n/a\n Throws    : Propagates any exception thrown by _set_match_seq()\n Comments  : Uses raw data stored by _set_data() during object construction.\n           : These data are not always needed, so it is conditionally\n           : executed only upon demand by methods such as gaps(), _set_residues(),\n           : etc. _set_seq() does the dirty work.\n\nSee Also   : L</_set_seq>\n\n\n#-----------------\nsub _set_seq_data {\n#-----------------\n    my $self = shift;\n\n    $self->_set_seq('query', @{$self->{'_queryList'}});\n    $self->_set_seq('sbjct', @{$self->{'_sbjctList'}});\n\n    # Liberate some memory.\n    @{$self->{'_queryList'}} = @{$self->{'_sbjctList'}} = ();\n    undef $self->{'_queryList'};\n    undef $self->{'_sbjctList'};\n\n    $self->{'_set_seq_data'} = 1;\n}\n\n\n\n=head2 _set_seq \n\n Usage     : called automatically by _set_seq_data()\n           : $hsp_obj->($seq_type, @data);\n Purpose   : Sets sequence information for both the query and sbjct sequences.\n           : Directly counts the number of gaps in each sequence (if gapped Blast).\n Argument  : $seq_type = 'query' or 'sbjct'\n           : @data = all seq lines with the form:\n           : Query: 61  SPHNVKDRKEQNGSINNAISPTATANTSGSQQINIDSALRDRSSNVAAQPSLSDASSGSN 120\n Throws    : Exception if data strings cannot be parsed, probably due to a 
change\n           : in the Blast report format.\n Comments  : Uses first argument to determine which data members to set\n           : making this method sensitive to data member name changes.\n           : Behavior is dependent on the type of BLAST analysis (TBLASTN, BLASTP, etc).\n Warning   : Sequence endpoints are normalized so that start < end. This affects HSPs\n           : for TBLASTN/X hits on the minus strand. Normalization facilitates use\n           : of range information by methods such as match().\n\nSee Also   : L</_set_seq_data>, L</matches>, L</range>, L</start>, L</end>\n\n\n#-------------\nsub _set_seq {\n#-------------\n    my $self      = shift;\n    my $seqType   = shift;\n    my @data      = @_;\n    my @ranges    = ();\n    my @sequence  = ();\n    my $numGaps   = 0;\n\n    foreach( @data ) {\n        if( m/(\\d+) *([^\\d\\s]+) *(\\d+)/) {\n            push @ranges, ( $1, $3 ) ;\n            push @sequence, $2;\n        #print STDERR \"_set_seq found sequence \\\"$2\\\"\\n\";\n\t} else {\n\t    $self->warn(\"Bad sequence data: $_\");\n\t}\n    }\n\n    if( !(scalar(@sequence) and scalar(@ranges))) {\n        my $id_str = $self->_id_str;\n\t$self->throw(\"Can't set sequence: missing data. Possibly unrecognized Blast format. 
($id_str) $seqType\");\n   }\n\n    # Sensitive to member name changes.\n    $seqType = \"_\\L$seqType\\E\";\n    $self->{$seqType.'Start'} = $ranges[0];\n    $self->{$seqType.'Stop'}  = $ranges[ $#ranges ];\n    $self->{$seqType.'Seq'}   = \\@sequence;\n\n    $self->{$seqType.'Length'} = abs($ranges[ $#ranges ] - $ranges[0]) + 1;\n\n    # Adjust lengths for BLASTX, TBLASTN, TBLASTX sequences\n    # Converting nucl coords to amino acid coords.\n\n    my $prog = $self->algorithm;\n    if($prog eq 'TBLASTN' and $seqType eq '_sbjct') {\n\t$self->{$seqType.'Length'} /= 3;\n    } elsif($prog eq 'BLASTX' and $seqType eq '_query') {\n\t$self->{$seqType.'Length'} /= 3;\n    } elsif($prog eq 'TBLASTX') {\n\t$self->{$seqType.'Length'} /= 3;\n    }\n\n    if( $prog ne 'BLASTP' ) {\n        $self->{$seqType.'Strand'} = 'Plus' if $prog =~ /BLASTN/;\n        $self->{$seqType.'Strand'} = 'Plus' if ($prog =~ /BLASTX/ and $seqType eq '_query');\n        # Normalize sequence endpoints so that start < end.\n        # Reverse complement or 'minus strand' HSPs get flipped here.\n        if($self->{$seqType.'Start'} > $self->{$seqType.'Stop'}) {\n            ($self->{$seqType.'Start'}, $self->{$seqType.'Stop'}) =\n                ($self->{$seqType.'Stop'}, $self->{$seqType.'Start'});\n            $self->{$seqType.'Strand'} = 'Minus';\n        }\n    }\n\n    ## Count number of gaps in each seq. Only need to do this for gapped Blasts.\n#    if($self->{'_gapped'}) {\n\tmy $seqstr = join('', @sequence);\n\t$seqstr =~ s/\\s//g;\n        my $num_gaps = CORE::length($seqstr) - $self->{$seqType.'Length'};\n\t$self->{$seqType.'Gaps'} = $num_gaps if $num_gaps > 0;\n#    }\n}\n\n\n=head2 _set_residues\n\n Usage     : called automatically when residue data is requested.\n Purpose   : Sets the residue numbers representing the identical and\n           : conserved positions. 
These data are obtained by analyzing the\n           : symbols between query and sbjct lines of the alignments.\n Argument  : n/a\n Throws    : Propagates any exception thrown by _set_seq_data() and _set_match_seq().\n Comments  : These data are not always needed, so it is conditionally\n           : executed only upon demand by methods such as seq_inds().\n           : Behavior is dependent on the type of BLAST analysis (TBLASTN, BLASTP, etc).\n\nSee Also   : L</_set_seq_data>, L</_set_match_seq>, L</seq_inds>\n\n\n#------------------\nsub _set_residues {\n#------------------\n    my $self      = shift;\n    my @sequence  = ();\n\n    $self->_set_seq_data() unless $self->{'_set_seq_data'};\n\n    # Using hashes to avoid saving duplicate residue numbers.\n    my %identicalList_query = ();\n    my %identicalList_sbjct = ();\n    my %conservedList_query = ();\n    my %conservedList_sbjct = ();\n\n    my $aref = $self->_set_match_seq() if not ref $self->{'_matchSeq'};\n    $aref  ||= $self->{'_matchSeq'};\n    my $seqString = join('', @$aref );\n\n    my $qseq = join('',@{$self->{'_querySeq'}});\n    my $sseq = join('',@{$self->{'_sbjctSeq'}});\n    my $resCount_query = $self->{'_queryStop'} || 0;\n    my $resCount_sbjct = $self->{'_sbjctStop'} || 0;\n\n    my $prog = $self->algorithm;\n    if($prog !~ /^BLASTP|^BLASTN/) {\n\tif($prog eq 'TBLASTN') {\n\t    $resCount_sbjct /= 3;\n\t} elsif($prog eq 'BLASTX') {\n\t    $resCount_query /= 3;\n\t} elsif($prog eq 'TBLASTX') {\n\t    $resCount_query /= 3;\n\t    $resCount_sbjct /= 3;\n\t}\n    }\n\n    my ($mchar, $schar, $qchar);\n    while( $mchar = chop($seqString) ) {\n\t($qchar, $schar) = (chop($qseq), chop($sseq));\n\tif( $mchar eq '+' ) {\n\t    $conservedList_query{ $resCount_query } = 1;\n\t    $conservedList_sbjct{ $resCount_sbjct } = 1;\n\t} elsif( $mchar ne ' ' ) {\n\t    $identicalList_query{ $resCount_query } = 1;\n\t    $identicalList_sbjct{ $resCount_sbjct } = 1;\n\t}\n\t$resCount_query-- if $qchar ne 
$GAP_SYMBOL;\n\t$resCount_sbjct-- if $schar ne $GAP_SYMBOL;\n    }\n    $self->{'_identicalRes_query'} = \\%identicalList_query;\n    $self->{'_conservedRes_query'} = \\%conservedList_query;\n    $self->{'_identicalRes_sbjct'} = \\%identicalList_sbjct;\n    $self->{'_conservedRes_sbjct'} = \\%conservedList_sbjct;\n\n}\n\n=head2 _set_match_seq\n\n Usage     : $hsp_obj->_set_match_seq()\n Purpose   : Set the 'match' sequence for the current HSP (symbols in between\n           : the query and sbjct lines.)\n Returns   : Array reference holding the match sequences lines.\n Argument  : n/a\n Throws    : Exception if the _matchList field is not set.\n Comments  : The match information is not always necessary. This method\n           : allows it to be conditionally prepared.\n           : Called by _set_residues>() and seq_str().\n\nSee Also   : L</_set_residues>, L</seq_str>\n\n\n#-------------------\nsub _set_match_seq {\n#-------------------\n    my $self = shift;\n\n    if( ! ref($self->{'_matchList'}) ) {\n        my $id_str = $self->_id_str;\n        $self->throw(\"Can't set HSP match sequence: No data ($id_str)\");\n    }\n\n    my @data = @{$self->{'_matchList'}};\n\n    my(@sequence);\n    foreach( @data ) {\n\tchomp($_);\n\t## Remove leading spaces; (note: aln may begin with a space\n\t## which is why we can't use s/^ +//).\n\ts/^ {$self->{'_match_indent'}}//;\n\tpush @sequence, $_;\n    }\n    # Liberate some memory.\n    @{$self->{'_matchList'}} = undef;\n    $self->{'_matchList'} = undef;\n\n    $self->{'_matchSeq'} = \\@sequence;\n\n    return $self->{'_matchSeq'};\n}\n\n\n=head2 n\n\n Usage     : $hsp_obj->n()\n Purpose   : Get the N value (num HSPs on which P/Expect is based).\n           : This value is not defined with NCBI Blast2 with gapping.\n Returns   : Integer or null string if not defined.\n Argument  : n/a\n Throws    : n/a\n Comments  : The 'N' value is listed in parenthesis with P/Expect value:\n           : e.g., P(3) = 1.2e-30  ---> (N = 
3).\n           : Not defined in NCBI Blast2 with gaps.\n           : This typically is equal to the number of HSPs but not always.\n           : To obtain the number of HSPs, use Bio::Search::Hit::BlastHit::num_hsps().\n\nSee Also   : L<Bio::SeqFeature::SimilarityPair::score()|Bio::SeqFeature::SimilarityPair>\n\n\n#-----\nsub n { my $self = shift; $self->{'N'} || ''; }\n#-----\n\n\n=head2 matches\n\n Usage     : $hsp->matches([seq_type], [start], [stop]);\n Purpose   : Get the total number of identical and conservative matches\n           : in the query or sbjct sequence for the given HSP. Optionally can\n           : report data within a defined interval along the seq.\n           : (Note: 'conservative' matches are called 'positives' in the\n\t   : Blast report.)\n Example   : ($id,$cons) = $hsp_object->matches('hit');\n           : ($id,$cons) = $hsp_object->matches('query',300,400);\n Returns   : 2-element array of integers\n Argument  : (1) seq_type = 'query' or 'hit' or 'sbjct' (default = query)\n           :  ('sbjct' is synonymous with 'hit')\n           : (2) start = Starting coordinate (optional)\n           : (3) stop  = Ending coordinate (optional)\n Throws    : Exception if the supplied coordinates are out of range.\n Comments  : Relies on seq_str('match') to get the string of alignment symbols\n           : between the query and sbjct lines which are used for determining\n           : the number of identical and conservative matches.\n\nSee Also   : L</length>, L</gaps>, L</seq_str>, L<Bio::Search::Hit::BlastHit::_adjust_contigs()|Bio::Search::Hit::BlastHit>\n\n\n#-----------\nsub matches {\n#-----------\n    my( $self, %param ) = @_;\n    my(@data);\n    my($seqType, $beg, $end) = ($param{-SEQ}, $param{-START}, $param{-STOP});\n    $seqType ||= 'query';\n    $seqType = 'sbjct' if $seqType eq 'hit';\n\n    my($start,$stop);\n\n    if(!defined $beg && !defined $end) {\n\t## Get data for the whole alignment.\n\tpush @data, ($self->{'_numIdentical'}, 
$self->{'_numConserved'});\n    } else {\n\t## Get the substring representing the desired sub-section of aln.\n\t$beg ||= 0;\n\t$end ||= 0;\n\t($start,$stop) = $self->range($seqType);\n\tif($beg == 0) { $beg = $start; $end = $beg+$end; }\n\telsif($end == 0) { $end = $stop; $beg = $end-$beg; }\n\n\tif($end >= $stop) { $end = $stop; } ##ML changed from if (end >stop)\n\telse { $end += 1;}   ##ML moved from commented position below, makes\n                             ##more sense here\n#\tif($end > $stop) { $end = $stop; }\n\tif($beg < $start) { $beg = $start; }\n#\telse { $end += 1;}\n\n#\tmy $seq = substr($self->seq_str('match'), $beg-$start, ($end-$beg));\n\n\t## ML: START fix for substr out of range error ------------------\n\tmy $seq = \"\";\n        my $prog = $self->algorithm;\n\tif (($prog eq 'TBLASTN') and ($seqType eq 'sbjct'))\n\t{\n\t    $seq = substr($self->seq_str('match'),\n\t\t\t  int(($beg-$start)/3), int(($end-$beg+1)/3));\n\n\t} elsif (($prog eq 'BLASTX') and ($seqType eq 'query'))\n\t{\n\t    $seq = substr($self->seq_str('match'),\n\t\t\t  int(($beg-$start)/3), int(($end-$beg+1)/3));\n\t} else {\n\t    $seq = substr($self->seq_str('match'),\n\t\t\t  $beg-$start, ($end-$beg));\n\t}\n\t## ML: End of fix for  substr out of range error -----------------\n\n\n\t## ML: debugging code\n\t## This is where we get our exception.  
Try printing out the values going\n\t## into this:\n\t##\n#\t print STDERR\n#\t     qq(*------------MY EXCEPTION --------------------\\nSeq: \") ,\n#\t     $self->seq_str(\"$seqType\"), qq(\"\\n),$self->rank,\",(  index:\";\n#\t print STDERR  $beg-$start, \", len: \", $end-$beg,\" ), (HSPRealLen:\",\n#\t     CORE::length $self->seq_str(\"$seqType\");\n#\t print STDERR \", HSPCalcLen: \", $stop - $start +1 ,\" ),\n#\t     ( beg: $beg, end: $end ), ( start: $start, stop: stop )\\n\";\n\t ## ML: END DEBUGGING CODE----------\n\n\tif(!CORE::length $seq) {\n            my $id_str = $self->_id_str;\n\t    $self->throw(\"Undefined $seqType sub-sequence ($beg,$end). Valid range = $start - $stop ($id_str)\");\n\t}\n\t## Get data for a substring.\n#\tprintf \"Collecting HSP subsection data: beg,end = %d,%d; start,stop = %d,%d\\n%s<---\\n\", $beg, $end, $start, $stop, $seq;\n#\tprintf \"Original match seq:\\n%s\\n\",$self->seq_str('match');\n\t$seq =~ s/ //g;  # remove space (no info).\n\tmy $len_cons = CORE::length $seq;\n\t$seq =~ s/\\+//g;  # remove '+' characters (conservative substitutions)\n\tmy $len_id = CORE::length $seq;\n\tpush @data, ($len_id, $len_cons);\n#\tprintf \"  HSP = %s\\n  id = %d; cons = %d\\n\", $self->rank, $len_id, $len_cons; <STDIN>;\n    }\n    @data;\n}\n\n\n=head2 num_identical\n\n Usage     : $hsp_object->num_identical();\n Purpose   : Get the number of identical positions within the given HSP.\n Example   : $num_iden = $hsp_object->num_identical();\n Returns   : integer\n Argument  : n/a\n Throws    : n/a\n\nSee Also   : L</num_conserved>, L</frac_identical>\n\n\n#-------------------\nsub num_identical {\n#-------------------\n    my( $self) = shift;\n\n    $self->{'_numIdentical'};\n}\n\n\n=head2 num_conserved\n\n Usage     : $hsp_object->num_conserved();\n Purpose   : Get the number of conserved positions within the given HSP.\n Example   : $num_iden = $hsp_object->num_conserved();\n Returns   : integer\n Argument  : n/a\n Throws    : 
n/a\n\nSee Also   : L</num_identical>, L</frac_conserved>\n\n\n#-------------------\nsub num_conserved {\n#-------------------\n    my( $self) = shift;\n\n    $self->{'_numConserved'};\n}\n\n\n\n=head2 range\n\n Usage     : $hsp->range( [seq_type] );\n Purpose   : Gets the (start, end) coordinates for the query or sbjct sequence\n           : in the HSP alignment.\n Example   : ($query_beg, $query_end) = $hsp->range('query');\n           : ($hit_beg, $hit_end) = $hsp->range('hit');\n Returns   : Two-element array of integers\n Argument  : seq_type = string, 'query' or 'hit' or 'sbjct'  (default = 'query')\n           :  ('sbjct' is synonymous with 'hit')\n Throws    : n/a\n\nSee Also   : L</start>, L</end>\n\n\n#----------\nsub range {\n#----------\n    my ($self, $seqType) = @_;\n\n    $self->_set_seq_data() unless $self->{'_set_seq_data'};\n\n    $seqType ||= 'query';\n    $seqType = 'sbjct' if $seqType eq 'hit';\n\n    ## Sensitive to member name changes.\n    $seqType = \"_\\L$seqType\\E\";\n\n    return ($self->{$seqType.'Start'},$self->{$seqType.'Stop'});\n}\n\n=head2 start\n\n Usage     : $hsp->start( [seq_type] );\n Purpose   : Gets the start coordinate for the query, sbjct, or both sequences\n           : in the HSP alignment.\n           : NOTE: Start will always be less than end.\n           : To determine strand, use $hsp->strand()\n Example   : $query_beg = $hsp->start('query');\n           : $hit_beg = $hsp->start('hit');\n           : ($query_beg, $hit_beg) = $hsp->start();\n Returns   : scalar context: integer\n           : array context without args: list of two integers\n Argument  : In scalar context: seq_type = 'query' or 'hit' or 'sbjct' (default= 'query')\n           :  ('sbjct' is synonymous with 'hit')\n           : Array context can be \"induced\" by providing an argument of 'list' or 'array'.\n Throws    : n/a\n\nSee Also   : L</end>, L</range>\n\n\n#----------\nsub start {\n#----------\n    my ($self, $seqType) = @_;\n\n    $seqType ||= 
(wantarray ? 'list' : 'query');\n    $seqType = 'sbjct' if $seqType eq 'hit';\n\n    $self->_set_seq_data() unless $self->{'_set_seq_data'};\n\n    if($seqType =~ /list|array/i) {\n\treturn ($self->{'_queryStart'}, $self->{'_sbjctStart'});\n    } else {\n\t## Sensitive to member name changes.\n\t$seqType = \"_\\L$seqType\\E\";\n\treturn $self->{$seqType.'Start'};\n    }\n}\n\n=head2 end\n\n Usage     : $hsp->end( [seq_type] );\n Purpose   : Gets the end coordinate for the query, sbjct, or both sequences\n           : in the HSP alignment.\n           : NOTE: Start will always be less than end.\n           : To determine strand, use $hsp->strand()\n Example   : $query_end = $hsp->end('query');\n           : $hit_end = $hsp->end('hit');\n           : ($query_end, $hit_end) = $hsp->end();\n Returns   : scalar context: integer\n           : array context without args: list of two integers\n Argument  : In scalar context: seq_type = 'query' or 'hit' or 'sbjct' (default= 'query')\n           :  ('sbjct' is synonymous with 'hit')\n           : Array context can be \"induced\" by providing an argument of 'list' or 'array'.\n Throws    : n/a\n\nSee Also   : L</start>, L</range>, L</strand>\n\n\n#----------\nsub end {\n#----------\n    my ($self, $seqType) = @_;\n\n    $seqType ||= (wantarray ? 
'list' : 'query');\n    $seqType = 'sbjct' if $seqType eq 'hit';\n\n    $self->_set_seq_data() unless $self->{'_set_seq_data'};\n\n    if($seqType =~ /list|array/i) {\n\treturn ($self->{'_queryStop'}, $self->{'_sbjctStop'});\n    } else {\n\t## Sensitive to member name changes.\n\t$seqType = \"_\\L$seqType\\E\";\n\treturn $self->{$seqType.'Stop'};\n    }\n}\n\n\n\n=head2 strand\n\n Usage     : $hsp_object->strand( [seq_type] )\n Purpose   : Get the strand of the query or sbjct sequence.\n Example   : print $hsp->strand('query');\n           : ($query_strand, $hit_strand) = $hsp->strand();\n Returns   : -1, 0, or 1\n           : -1 = Minus strand, +1 = Plus strand\n           : Returns 0 if strand is not defined, which occurs\n           : for BLASTP reports, and the query of TBLASTN\n           : as well as the hit of BLASTX reports.\n           : In scalar context without arguments, returns queryStrand value.\n           : In array context without arguments, returns a two-element list\n           :    of strings (queryStrand, sbjctStrand).\n           : Array context can be \"induced\" by providing an argument of 'list' or 'array'.\n Argument  : seq_type: 'query' or 'hit' or 'sbjct' or undef\n           :  ('sbjct' is synonymous with 'hit')\n Throws    : n/a\n\nSee Also   : L</_set_seq>, 
L</_set_match_stats>","label":"strand($self,$seqType)"},"containerName":"main::","definition":"sub","line":1438,"children":[{"kind":13,"line":1440,"name":"$self","definition":"my","containerName":"strand","localvar":"my"},{"containerName":"strand","name":"$seqType","line":1440,"kind":13},{"name":"$seqType","containerName":"strand","kind":13,"line":1442},{"containerName":"strand","name":"$seqType","kind":13,"line":1443},{"containerName":"strand","name":"$seqType","kind":13,"line":1443},{"name":"$seqType","containerName":"strand","line":1446,"kind":13},{"containerName":"strand","name":"$self","kind":13,"line":1449},{"line":1449,"kind":13,"name":"$self","containerName":"strand"},{"name":"_set_seq_data","containerName":"strand","kind":12,"line":1449},{"kind":13,"line":1449,"name":"$self","containerName":"strand"},{"kind":13,"line":1451,"containerName":"strand","definition":"my","name":"$prog","localvar":"my"},{"line":1451,"kind":13,"name":"$self","containerName":"strand"},{"name":"algorithm","containerName":"strand","line":1451,"kind":12},{"containerName":"strand","name":"$seqType","line":1453,"kind":13},{"definition":"my","name":"$qstr","containerName":"strand","localvar":"my","kind":13,"line":1454},{"name":"$hstr","containerName":"strand","kind":13,"line":1454},{"name":"$prog","containerName":"strand","kind":13,"line":1455},{"kind":13,"line":1456,"containerName":"strand","name":"$qstr"},{"name":"$hstr","containerName":"strand","line":1457,"kind":13},{"name":"$prog","containerName":"strand","line":1459,"kind":13},{"line":1460,"kind":13,"name":"$qstr","containerName":"strand"},{"line":1461,"kind":13,"name":"$hstr","containerName":"strand"},{"containerName":"strand","name":"$STRAND_SYMBOL","kind":13,"line":1461},{"containerName":"strand","name":"$self","kind":13,"line":1461},{"containerName":"strand","name":"$prog","kind":13,"line":1463},{"line":1464,"kind":13,"name":"$qstr","containerName":"strand"},{"kind":13,"line":1464,"containerName":"strand","name":"$STRAND_SYMBOL"
},{"line":1464,"kind":13,"name":"$self","containerName":"strand"},{"name":"$hstr","containerName":"strand","line":1465,"kind":13},{"line":1468,"kind":13,"name":"$qstr","containerName":"strand"},{"kind":13,"line":1468,"containerName":"strand","name":"$STRAND_SYMBOL"},{"line":1468,"kind":13,"name":"$self","containerName":"strand"},{"kind":13,"line":1468,"name":"$self","containerName":"strand"},{"kind":13,"line":1469,"containerName":"strand","name":"$hstr"},{"line":1469,"kind":13,"name":"$STRAND_SYMBOL","containerName":"strand"},{"containerName":"strand","name":"$self","kind":13,"line":1469},{"line":1469,"kind":13,"containerName":"strand","name":"$self"},{"name":"$qstr","containerName":"strand","kind":13,"line":1471},{"containerName":"strand","name":"$hstr","kind":13,"line":1472},{"line":1473,"kind":13,"name":"$qstr","containerName":"strand"},{"kind":13,"line":1473,"name":"$hstr","containerName":"strand"},{"containerName":"strand","name":"$STRAND_SYMBOL","line":1476,"kind":13},{"line":1476,"kind":13,"name":"$self","containerName":"strand"},{"containerName":"strand","name":"$seqType","kind":13,"line":1476}],"kind":12},{"name":"seq","range":{"start":{"line":1499,"character":0},"end":{"line":1512,"character":9999}},"kind":12,"children":[{"line":1501,"kind":13,"localvar":"my","definition":"my","name":"$self","containerName":"seq"},{"kind":13,"line":1501,"containerName":"seq","name":"$seqType"},{"containerName":"seq","name":"$seqType","kind":13,"line":1502},{"name":"$seqType","containerName":"seq","kind":13,"line":1503},{"kind":13,"line":1503,"name":"$seqType","containerName":"seq"},{"line":1504,"kind":13,"localvar":"my","containerName":"seq","name":"$str","definition":"my"},{"line":1504,"kind":13,"name":"$self","containerName":"seq"},{"kind":12,"line":1504,"containerName":"seq","name":"seq_str"},{"containerName":"seq","name":"$seqType","line":1504,"kind":13},{"name":"new","containerName":"seq","line":1508,"kind":12},{"name":"$self","containerName":"seq","kind":13,"line":15
08},{"name":"to_string","containerName":"seq","line":1508,"kind":12},{"containerName":"seq","name":"$str","kind":13,"line":1509}],"line":1499,"definition":"sub","containerName":"main::","signature":{"parameters":[{"label":"$self"},{"label":"$seqType"}],"documentation":"1;\n#-----------------------------------------------------------------\n# $Id: BlastHSP.pm 16123 2009-09-17 12:57:27Z cjfields $\n#\n# BioPerl module Bio::Search::HSP::BlastHSP\n#\n# (This module was originally called Bio::Tools::Blast::HSP)\n#\n# Please direct questions and support issues to <bioperl-l@bioperl.org> \n#\n# Cared for by Steve Chervitz <sac@bioperl.org>\n#\n# You may distribute this module under the same terms as perl itself\n#-----------------------------------------------------------------\n\n## POD Documentation:\n\n=head1 NAME\n\nBio::Search::HSP::BlastHSP - Bioperl BLAST High-Scoring Pair object\n\n=head1 SYNOPSIS\n\nSee L<Bio::Search::Hit::BlastHit>.\n\n=head1 DESCRIPTION\n\nA Bio::Search::HSP::BlastHSP object provides an interface to data\nobtained in a single alignment section of a Blast report (known as a\n\"High-scoring Segment Pair\"). This is essentially a pairwise\nalignment with score information.\n\nBlastHSP objects are accessed via L<Bio::Search::Hit::BlastHit>\nobjects after parsing a BLAST report using the L<Bio::SearchIO>\nsystem.\n\nThe construction of BlastHSP objects is performed by\nBio::Factory::BlastHitFactory in a process that is\norchestrated by the Blast parser (L<Bio::SearchIO::psiblast>).\nThe resulting BlastHSPs are then accessed via\nL<Bio::Search::Hit::BlastHit>). Therefore, you do not need to\nuse L<Bio::Search::HSP::BlastHSP>) directly. If you need to construct\nBlastHSPs directly, see the new() function for details.\n\nFor L<Bio::SearchIO> BLAST parsing usage examples, see the\nC<examples/searchio> directory of the Bioperl distribution.\n\n=head2 Start and End coordinates\n\nSequence endpoints are swapped so that start is always less than\nend. 
This affects TBLASTN/X hits on the minus strand. Strand\ninformation can be recovered using the strand() method. This\nnormalization step is standard Bioperl practice. It also facilitates\nuse of range information by methods such as matches().\n\n=over 1\n\n* * Supports BLAST versions 1.x and 2.x, gapped and ungapped.\n\n\nBio::Search::HSP::BlastHSP.pm has the ability to extract a list of all\nresidue indices for identical and conservative matches along both\nquery and sbjct sequences. Since this degree of detail is not always\nneeded, this behavior does not occur during construction of the BlastHSP\nobject.  These data will automatically be collected as necessary as\nthe BlastHSP.pm object is used.\n\n=head1 DEPENDENCIES\n\nBio::Search::HSP::BlastHSP.pm is a concrete class that inherits from\nL<Bio::SeqFeature::SimilarityPair> and L<Bio::Search::HSP::HSPI>.\nL<Bio::Seq> and L<Bio::SimpleAlign> are employed for creating\nsequence and alignment objects, respectively.\n\n=head2 Relationship to SimpleAlign.pm & Seq.pm\n\nBlastHSP.pm can provide the query or sbjct sequence as a L<Bio::Seq>\nobject via the L<seq()|seq> method. The BlastHSP.pm object can also create a\ntwo-sequence L<Bio::SimpleAlign> alignment object using the query\nand sbjct sequences via the L<get_aln()|get_aln> method. Creation of alignment\nobjects is not automatic when constructing the BlastHSP.pm object since\nthis level of functionality is not always required and would generate\na lot of extra overhead when crunching many reports.\n\n\n=head1 FEEDBACK\n\n=head2 Mailing Lists\n\nUser feedback is an integral part of the evolution of this and other\nBioperl modules.  Send your comments and suggestions preferably to one\nof the Bioperl mailing lists.  
Your participation is much appreciated.\n\n  bioperl-l@bioperl.org                  - General discussion\n  http://bioperl.org/wiki/Mailing_lists  - About the mailing lists\n\n=head2 Support \n\nPlease direct usage questions or support issues to the mailing list:\n\nI<bioperl-l@bioperl.org>\n\nrather than to the module maintainer directly. Many experienced and \nresponsive experts will be able to look at the problem and quickly \naddress it. Please include a thorough description of the problem \nwith code and data examples if at all possible.\n\n=head2 Reporting Bugs\n\nReport bugs to the Bioperl bug tracking system to help us keep track\nof the bugs and their resolution. Bug reports can be submitted via the\nweb:\n\n  http://bugzilla.open-bio.org/\n\n=head1 AUTHOR\n\nSteve Chervitz E<lt>sac-at-bioperl.orgE<gt>\n\nSee L<the FEEDBACK section | FEEDBACK> for where to send bug reports and comments.\n\n=head1 ACKNOWLEDGEMENTS\n\nThis software was originally developed in the Department of Genetics\nat Stanford University. I would also like to acknowledge my\ncolleagues at Affymetrix for useful feedback.\n\n=head1 SEE ALSO\n\n Bio::Search::Hit::BlastHit.pm          - Blast hit object.\n Bio::Search::Result::BlastResult.pm    - Blast Result object.\n Bio::Seq.pm                            - Biosequence object\n\n=head2 Links:\n\n http://bio.perl.org/                       - Bioperl Project Homepage\n\n=head1 COPYRIGHT\n\nCopyright (c) 1996-2001 Steve Chervitz. 
All Rights Reserved.\n\n=head1 DISCLAIMER\n\nThis software is provided \"as is\" without warranty of any kind.\n\n\n\n# END of main POD documentation.\n\n=head1 APPENDIX\n\nThe rest of the documentation details each of the object methods.\nInternal methods are usually preceded with a _\n\n\n# Let the code begin...\n\npackage Bio::Search::HSP::BlastHSP;\n\nuse strict;\nuse Bio::SeqFeature::Similarity;\n\nuse vars qw($GAP_SYMBOL %STRAND_SYMBOL);\n\nuse overload\n    '\"\"' => \\&to_string;\n\nuse base qw(Bio::SeqFeature::SimilarityPair Bio::Search::HSP::HSPI);\n\n$GAP_SYMBOL    = '-';          # Need a more general way to handle gap symbols.\n%STRAND_SYMBOL = ('Plus' => 1, 'Minus' => -1 );\n\n\n=head2 new\n\n Usage     : $hsp = Bio::Search::HSP::BlastHSP->new( %named_params );\n           : Bio::Search::HSP::BlastHSP objects are constructed\n           : automatically by Bio::SearchIO::BlastHitFactory,\n           : so there is no need for direct instantiation.\n Purpose   : Constructs a new BlastHSP object and Initializes key variables\n           : for the HSP.\n Returns   : A Bio::Search::HSP::BlastHSP object\n Argument  : Named parameters:\n           : Parameter keys are case-insensitive.\n           :      -RAW_DATA  => array ref containing raw BLAST report data for\n           :                    for a single HSP. 
This includes all lines\n           :                    of the HSP alignment from a traditional BLAST\n                                or PSI-BLAST (non-XML) report,\n           :      -RANK         => integer (1..n).\n           :      -PROGRAM      => string ('TBLASTN', 'BLASTP', etc.).\n           :      -QUERY_NAME   => string, id of query sequence\n           :      -HIT_NAME     => string, id of hit sequence\n           :\n Comments  : Having the raw data allows this object to do lazy parsing of\n           : the raw HSP data (i.e., not parsed until needed).\n           :\n           : Note that there is a fair amount of basic parsing that is\n           : currently performed in this module that would be more appropriate\n           : to do within a separate factory object.\n           : This parsing code will likely be relocated and more initialization\n           : parameters will be added to new().\n           :\nSee Also   : L<Bio::SeqFeature::SimilarityPair::new()>, L<Bio::SeqFeature::Similarity::new()>\n\n\n#----------------\nsub new {\n#----------------\n    my ($class, @args ) = @_;\n\n    my $self = $class->SUPER::new( @args );\n    # Initialize placeholders\n    $self->{'_queryGaps'} = $self->{'_sbjctGaps'} = 0;\n    my ($raw_data, $qname, $hname, $qlen, $hlen);\n\n    ($self->{'_prog'}, $self->{'_rank'}, $raw_data,\n     $qname, $hname) =\n\t $self->_rearrange([qw( PROGRAM\n\t\t\t\tRANK\n\t\t\t\tRAW_DATA\n\t\t\t\tQUERY_NAME\n\t\t\t\tHIT_NAME\n\t\t\t\t)], @args );\n\n    # _set_data() does a fair amount of parsing.\n    # This will likely change (see comment above.)\n    $self->_set_data( @{$raw_data} );\n    # Store the aligned query as sequence feature\n    my ($qb, $hb) = ($self->start());\n    my ($qe, $he) = ($self->end());\n    my ($qs, $hs) = ($self->strand());\n    my ($qf,$hf) = ($self->query->frame(),\n\t\t    $self->hit->frame);\n\n    $self->query( Bio::SeqFeature::Similarity->new (-start   =>$qb,\n\t\t\t\t\t\t    -end     
=>$qe,\n\t\t\t\t\t\t    -strand  =>$qs,\n\t\t\t\t\t\t    -bits    =>$self->bits,\n\t\t\t\t\t\t    -score   =>$self->score,\n\t\t\t\t\t\t    -frame   =>$qf,\n\t\t\t\t\t\t    -seq_id  => $qname,\n\t\t\t\t\t\t    -source  =>$self->{'_prog'} ));\n\n    $self->hit( Bio::SeqFeature::Similarity->new (-start   =>$hb,\n\t\t\t\t\t\t  -end     =>$he,\n\t\t\t\t\t\t  -strand  =>$hs,\n\t\t\t\t\t\t  -bits    =>$self->bits,\n\t\t\t\t\t\t  -score   =>$self->score,\n                                                  -frame   =>$hf,\n\t\t\t\t\t\t  -seq_id  => $hname,\n\t\t\t\t\t\t  -source  =>$self->{'_prog'} ));\n\n    # set lengths\n    $self->query->seqlength($qlen); # query\n    $self->hit->seqlength($hlen); # subject\n\n    $self->query->frac_identical($self->frac_identical('query'));\n    $self->hit->frac_identical($self->frac_identical('hit'));\n    return $self;\n}\n\n#sub DESTROY {\n#    my $self = shift;\n#    #print STDERR \"--->DESTROYING $self\\n\";\n#}\n\n\n# Title   : _id_str;\n# Purpose : Intended for internal use only to provide a string for use\n#           within exception messages to help users figure out which\n#           query/hit caused the problem.\n# Returns : Short string with name of query and hit seq\nsub _id_str {\n    my $self = shift;\n    if( not defined $self->{'_id_str'}) {\n        my $qname = $self->query->seq_id;\n        my $hname = $self->hit->seq_id;\n        $self->{'_id_str'} = \"QUERY=\\\"$qname\\\" HIT=\\\"$hname\\\"\";\n    }\n    return $self->{'_id_str'};\n}\n\n#=================================================\n# Begin Bio::Search::HSP::HSPI implementation\n#=================================================\n\n=head2 algorithm\n\n Title   : algorithm\n Usage   : $alg = $hsp->algorithm();\n Function: Gets the algorithm specification that was used to obtain the hsp\n           For BLAST, the algorithm denotes what type of sequence was aligned\n           against what (BLASTN: dna-dna, BLASTP prt-prt, BLASTX translated\n           dna-prt, 
TBLASTN prt-translated dna, TBLASTX translated\n           dna-translated dna).\n Returns : a scalar string\n Args    : none\n\n\n#----------------\nsub algorithm {\n#----------------\n    my ($self,@args) = @_;\n    return $self->{'_prog'};\n}\n\n\n\n\n=head2 signif()\n\n Usage     : $hsp_obj->signif()\n Purpose   : Get the P-value or Expect value for the HSP.\n Returns   : Float (0.001 or 1.3e-43)\n           : Returns P-value if it is defined, otherwise, Expect value.\n Argument  : n/a\n Throws    : n/a\n Comments  : Provided for consistency with BlastHit::signif()\n           : Support for returning the significance data in different\n           : formats (e.g., exponent only), is not provided for HSP objects.\n           : This is only available for the BlastHit or Blast object.\n\nSee Also   : L</p>, L</expect>, L<Bio::Search::Hit::BlastHit::signif()|Bio::Search::Hit::BlastHit>\n\n\n#-----------\nsub signif {\n#-----------\n    my $self = shift;\n    my $val ||= defined($self->{'_p'}) ? 
$self->{'_p'} : $self->{'_expect'};\n    $val;\n}\n\n\n\n=head2 evalue\n\n Usage     : $hsp_obj->evalue()\n Purpose   : Get the Expect value for the HSP.\n Returns   : Float (0.001 or 1.3e-43)\n Argument  : n/a\n Throws    : n/a\n Comments  : Support for returning the expectation data in different\n           : formats (e.g., exponent only), is not provided for HSP objects.\n           : This is only available for the BlastHit or Blast object.\n\nSee Also   : L</p>\n\n\n#----------\nsub evalue { shift->{'_expect'} }\n#----------\n\n\n=head2 p\n\n Usage     : $hsp_obj->p()\n Purpose   : Get the P-value for the HSP.\n Returns   : Float (0.001 or 1.3e-43) or undef if not defined.\n Argument  : n/a\n Throws    : n/a\n Comments  : P-value is not defined with NCBI Blast2 reports.\n           : Support for returning the expectation data in different\n           : formats (e.g., exponent only) is not provided for HSP objects.\n           : This is only available for the BlastHit or Blast object.\n\nSee Also   : L<expect()|expect>\n\n\n#-----\nsub p { my $self = shift; $self->{'_p'}; }\n#-----\n\n# alias\nsub pvalue { shift->p(@_); }\n\n=head2 length\n\n Usage     : $hsp->length( [seq_type] )\n Purpose   : Get the length of the aligned portion of the query or sbjct.\n Example   : $hsp->length('query')\n Returns   : integer\n Argument  : seq_type: 'query' | 'hit' or 'sbjct' | 'total'  (default = 'total')\n             ('sbjct' is synonymous with 'hit')\n Throws    : n/a\n Comments  : 'total' length is the full length of the alignment\n           : as reported in the denominators in the alignment section:\n           : \"Identical = 34/120 Positives = 67/120\".\n\nSee Also   : L</gaps>\n\n\n#-----------\nsub length {\n#-----------\n## Developer note: when using the built-in length function within\n##                 this module, call it as CORE::length().\n    my( $self, $seqType,$data ) = @_;\n    $seqType  ||= 'total';\n    $seqType = 'sbjct' if $seqType eq 'hit';\n\n    
$seqType ne 'total' and $self->_set_seq_data() unless $self->{'_set_seq_data'};\n\n    ## Sensitive to member name format.\n    $seqType = \"_\\L$seqType\\E\";\n    if( defined $data  ) {\n\t$self->{$seqType.'Length'} = $data;\n    }\n    $self->{$seqType.'Length'};\n}\n\n\n\n=head2 gaps\n\n Usage     : $hsp->gaps( [seq_type] )\n Purpose   : Get the number of gap characters in the query, sbjct, or total alignment.\n           : Also can return query gap chars and sbjct gap chars as a two-element list\n           : when in array context.\n Example   : $total_gaps      = $hsp->gaps();\n           : ($qgaps, $sgaps) = $hsp->gaps();\n           : $qgaps           = $hsp->gaps('query');\n Returns   : scalar context: integer\n           : array context without args: (int, int) = ('queryGaps', 'sbjctGaps')\n Argument  : seq_type: 'query' or 'hit' or 'sbjct' or 'total'\n           :  ('sbjct' is synonymous with 'hit')\n           : (default = 'total', scalar context)\n           : Array context can be \"induced\" by providing an argument of 'list' or 'array'.\n Throws    : n/a\n\nSee Also   : L</length>, L</matches>\n\n\n#---------\nsub gaps {\n#---------\n    my( $self, $seqType ) = @_;\n\n    $self->_set_seq_data() unless $self->{'_set_seq_data'};\n\n    $seqType  ||= (wantarray ? 
'list' : 'total');\n    $seqType = 'sbjct' if $seqType eq 'hit';\n\n    if($seqType =~ /list|array/i) {\n\treturn (($self->{'_queryGaps'} || 0), ($self->{'_sbjctGaps'} || 0));\n    }\n\n    if($seqType eq 'total') {\n\treturn ($self->{'_queryGaps'} + $self->{'_sbjctGaps'}) || 0;\n    } else {\n\t## Sensitive to member name format.\n\t$seqType = \"_\\L$seqType\\E\";\n\treturn $self->{$seqType.'Gaps'} || 0;\n    }\n}\n\n\n=head2 frac_identical\n\n Usage     : $hsp_object->frac_identical( [seq_type] );\n Purpose   : Get the fraction of identical positions within the given HSP.\n Example   : $frac_iden = $hsp_object->frac_identical('query');\n Returns   : Float (2-decimal precision, e.g., 0.75).\n Argument  : seq_type: 'query' or 'hit' or 'sbjct' or 'total'\n           :  ('sbjct' is synonymous with 'hit')\n           : default = 'total' (but see comments below).\n Throws    : n/a\n Comments  : Different versions of Blast report different values for the total\n           : length of the alignment. 
This is the number reported in the\n           : denominators in the stats section:\n           : \"Identical = 34/120 Positives = 67/120\".\n           : NCBI-BLAST uses the total length of the alignment (with gaps)\n           : WU-BLAST uses the length of the query sequence (without gaps).\n           : Therefore, when called without an argument or an argument of 'total',\n           : this method will report different values depending on the\n           : version of BLAST used.\n           :\n           : To get the fraction identical among only the aligned residues,\n           : ignoring the gaps, call this method with an argument of 'query'\n           : or 'sbjct' ('sbjct' is synonymous with 'hit').\n\nSee Also   : L</frac_conserved>, L</num_identical>, L</matches>\n\n\n#-------------------\nsub frac_identical {\n#-------------------\n# The value is calculated as opposed to storing it from the parsed results.\n# This saves storage and also permits flexibility in determining for which\n# sequence (query or sbjct) the figure is to be calculated.\n\n    my( $self, $seqType ) = @_;\n    $seqType ||= 'total';\n    $seqType = 'sbjct' if $seqType eq 'hit';\n\n    if($seqType ne 'total') {\n      $self->_set_seq_data() unless $self->{'_set_seq_data'};\n    }\n    ## Sensitive to member name format.\n    $seqType = \"_\\L$seqType\\E\";\n\n    sprintf( \"%.2f\", $self->{'_numIdentical'}/$self->{$seqType.'Length'});\n}\n\n\n=head2 frac_conserved\n\n Usage     : $hsp_object->frac_conserved( [seq_type] );\n Purpose   : Get the fraction of conserved positions within the given HSP.\n           : (Note: 'conservative' positions are called 'positives' in the\n\t   : Blast report.)\n Example   : $frac_cons = $hsp_object->frac_conserved('query');\n Returns   : Float (2-decimal precision, e.g., 0.75).\n Argument  : seq_type: 'query' or 'hit' or 'sbjct' or 'total'\n           :  ('sbjct' is synonymous with 'hit')\n           : default = 'total' (but see comments below).\n 
Throws    : n/a\n Comments  : Different versions of Blast report different values for the total\n           : length of the alignment. This is the number reported in the\n           : denominators in the stats section:\n           : \"Identical = 34/120 Positives = 67/120\".\n           : NCBI-BLAST uses the total length of the alignment (with gaps)\n           : WU-BLAST uses the length of the query sequence (without gaps).\n           : Therefore, when called without an argument or an argument of 'total',\n           : this method will report different values depending on the\n           : version of BLAST used.\n           :\n           : To get the fraction conserved among only the aligned residues,\n           : ignoring the gaps, call this method with an argument of 'query'\n           : or 'sbjct'.\n\nSee Also   : L</frac_identical>, L</num_conserved>, L</matches>\n\n\n#--------------------\nsub frac_conserved {\n#--------------------\n# The value is calculated as opposed to storing it from the parsed results.\n# This saves storage and also permits flexibility in determining for which\n# sequence (query or sbjct) the figure is to be calculated.\n\n    my( $self, $seqType ) = @_;\n    $seqType ||= 'total';\n    $seqType = 'sbjct' if $seqType eq 'hit';\n\n    if($seqType ne 'total') {\n      $self->_set_seq_data() unless $self->{'_set_seq_data'};\n    }\n\n    ## Sensitive to member name format.\n    $seqType = \"_\\L$seqType\\E\";\n\n    sprintf( \"%.2f\", $self->{'_numConserved'}/$self->{$seqType.'Length'});\n}\n\n=head2 query_string\n\n Title   : query_string\n Usage   : my $qseq = $hsp->query_string;\n Function: Retrieves the query sequence of this HSP as a string\n Returns : string\n 
Args    : none\n\n\n\n#----------------\nsub hit_string{ shift->seq_str('hit'); }\n#----------------\n\n\n=head2 homology_string\n\n Title   : homology_string\n Usage   : my $homo_string = $hsp->homology_string;\n Function: Retrieves the homology sequence for this HSP as a string.\n         : The homology sequence is the string of symbols in between the\n         : query and hit sequences in the alignment indicating the degree\n         : of conservation (e.g., identical, similar, not similar).\n Returns : string\n Args    : none\n\n\n#----------------\nsub homology_string{ shift->seq_str('match'); }\n#----------------\n\n#=================================================\n# End Bio::Search::HSP::HSPI implementation\n#=================================================\n\n# Older method delegating to method defined in HSPI.\n\n=head2 expect\n\nSee L<Bio::Search::HSP::HSPI::expect()|Bio::Search::HSP::HSPI>\n\n\n#----------\nsub expect { shift->evalue( @_ ); }\n#----------\n\n\n=head2 rank\n\n Usage     : $hsp->rank( [string] );\n Purpose   : Get the rank of the HSP within a given Blast hit.\n Example   : $rank = $hsp->rank;\n Returns   : Integer (1..n) corresponding to the order in which the HSP\n             appears in the BLAST report.\n\n\n#'\n\n#----------\nsub rank { shift->{'_rank'} }\n#----------\n\n# For backward compatibility\n#----------\nsub name { shift->rank }\n#----------\n\n=head2 to_string\n\n Title   : to_string\n Usage   : print $hsp->to_string;\n Function: Returns a string representation for the Blast HSP.\n           Primarily intended for debugging purposes.\n Example : see usage\n Returns : A string of the form:\n           [BlastHSP] <rank>\n           e.g.:\n           [BlastHSP] 1\n Args    : None\n\n\n#----------\nsub to_string {\n#----------\n    my $self = shift;\n    return \"[BlastHSP] \" . 
$self->rank();\n}\n\n\n=head2 _set_data\n\n Usage     : called automatically during object construction.\n Purpose   : Parses the raw HSP section from a flat BLAST report and\n             sets the query sequence, sbjct sequence, and the \"match\" data\n           : which consists of the symbols between the query and sbjct lines\n           : in the alignment.\n Argument  : Array (all lines for a single, complete HSP, from a raw,\n             flat (i.e., non-XML) BLAST report)\n Throws    : Propagates any exceptions from the methods called (\"See Also\")\n\nSee Also   : L</_set_seq>, L</_set_score_stats>, L</_set_match_stats>\n\n\n#--------------\nsub _set_data {\n#--------------\n    my $self = shift;\n    my @data = @_;\n    my @queryList  = ();  # 'Query' = SEQUENCE USED TO QUERY THE DATABASE.\n    my @sbjctList  = ();  # 'Sbjct' = HOMOLOGOUS SEQUENCE FOUND IN THE DATABASE.\n    my @matchList  = ();\n    my $matchLine  = 0;   # Alternating boolean: when true, load 'match' data.\n    my @linedat = ();\n\n    #print STDERR \"BlastHSP: set_data()\\n\";\n\n    my($line, $aln_row_len, $length_diff);\n    $length_diff = 0;\n\n    # Collecting data for all lines in the alignment\n    # and then storing the collections for possible processing later.\n    #\n    # Note that \"match\" lines may not be properly padded with spaces.\n    # This loop now properly handles such cases:\n    # Query: 1141 PSLVELTIRDCPRLEVGPMIRSLPKFPMLKKLDLAVANIIEEDLDVIGSLEELVIXXXXX 1200\n    #             PSLVELTIRDCPRLEVGPMIRSLPKFPMLKKLDLAVANIIEEDLDVIGSLEELVI\n    # Sbjct: 1141 PSLVELTIRDCPRLEVGPMIRSLPKFPMLKKLDLAVANIIEEDLDVIGSLEELVILSLKL 1200\n\n    foreach $line( @data ) {\n\tnext if $line =~ /^\\s*$/;\n\n\tif( $line =~ /^ ?Score/ ) {\n\t    $self->_set_score_stats( $line );\n\t} elsif( $line =~ /^ ?(Identities|Positives|Strand)/ ) {\n\t    $self->_set_match_stats( $line );\n\t} elsif( $line =~ /^ ?Frame = ([\\d+-]+)/ ) {\n\t  # Version 2.0.8 has Frame information on a separate line.\n\t  # 
Storing frame according to SeqFeature::Generic::frame()\n\t  # which does not contain strand info (use strand()).\n\t  my $frame = abs($1) - 1;\n\t  $self->frame( $frame );\n\t} elsif( $line =~ /^(Query:?[\\s\\d]+)([^\\s\\d]+)/ ) {\n\t    push @queryList, $line;\n\t    $self->{'_match_indent'} = CORE::length $1;\n\t    $aln_row_len = (CORE::length $1) + (CORE::length $2);\n\t    $matchLine = 1;\n\t} elsif( $matchLine ) {\n\t    # Pad the match line with spaces if necessary.\n\t    $length_diff = $aln_row_len - CORE::length $line;\n\t    $length_diff and $line .= ' 'x $length_diff;\n\t    push @matchList, $line;\n\t    $matchLine = 0;\n\t} elsif( $line =~ /^Sbjct/ ) {\n\t    push @sbjctList, $line;\n\t}\n    }\n    # Storing the query and sbjct lists in case they are needed later.\n    # We could make this conditional to save memory.\n    $self->{'_queryList'} = \\@queryList;\n    $self->{'_sbjctList'} = \\@sbjctList;\n\n    # Storing the match list in case it is needed later.\n    $self->{'_matchList'} = \\@matchList;\n\n    if(not defined ($self->{'_numIdentical'})) {\n        my $id_str = $self->_id_str;\n        $self->throw( -text  => \"Can't parse match statistics. Possibly a new or unrecognized Blast format. ($id_str)\");\n    }\n\n    if(!scalar @queryList or !scalar @sbjctList) {\n        my $id_str = $self->_id_str;\n        $self->throw( \"Can't find query or sbjct alignment lines. Possibly unrecognized Blast format. 
($id_str)\");\n    }\n}\n\n\n=head2 _set_score_stats\n\n Usage     : called automatically by _set_data()\n Purpose   : Sets various score statistics obtained from the HSP listing.\n Argument  : String with any of the following formats:\n           : blast2:  Score = 30.1 bits (66), Expect = 9.2\n           : blast2:  Score = 158.2 bits (544), Expect(2) = e-110\n           : blast1:  Score = 410 (144.3 bits), Expect = 1.7e-40, P = 1.7e-40\n           : blast1:  Score = 55 (19.4 bits), Expect = 5.3, Sum P(3) = 0.99\n Throws    : Exception if the stats cannot be parsed, probably due to a change\n           : in the Blast report format.\n\nSee Also   : L</_set_data>\n\n\n#--------------------\nsub _set_score_stats {\n#--------------------\n    my ($self, $data) = @_;\n\n    my ($expect, $p);\n\n    if($data =~ /Score = +([\\d.e+-]+) bits \\(([\\d.e+-]+)\\), +Expect = +([\\d.e+-]+)/) {\n\t# blast2 format n = 1\n\t$self->bits($1);\n\t$self->score($2);\n\t$expect            = $3;\n    } elsif($data =~ /Score = +([\\d.e+-]+) bits \\(([\\d.e+-]+)\\), +Expect\\((\\d+)\\) = +([\\d.e+-]+)/) {\n\t# blast2 format n > 1\n\t$self->bits($1);\n\t$self->score($2);\n\t$self->{'_n'}      = $3;\n\t$expect            = $4;\n\n    } elsif($data =~ /Score = +([\\d.e+-]+) \\(([\\d.e+-]+) bits\\), +Expect = +([\\d.e+-]+), P = +([\\d.e-]+)/) {\n\t# blast1 format, n = 1\n\t$self->score($1);\n\t$self->bits($2);\n\t$expect            = $3;\n\t$p                 = $4;\n\n    } elsif($data =~ /Score = +([\\d.e+-]+) \\(([\\d.e+-]+) bits\\), +Expect = +([\\d.e+-]+), +Sum P\\((\\d+)\\) = +([\\d.e-]+)/) {\n\t# blast1 format, n > 1\n\t$self->score($1);\n\t$self->bits($2);\n\t$expect            = $3;\n\t$self->{'_n'}      = $4;\n\t$p                 = $5;\n\n    } else {\n        my $id_str = $self->_id_str;\n\t$self->throw(-class => 'Bio::Root::Exception',\n\t\t     -text => \"Can't parse score statistics: unrecognized format. 
($id_str)\",\n\t\t     -value => $data);\n    }\n    $expect = \"1$expect\" if $expect =~ /^e/i;\n    $p      = \"1$p\"      if defined $p and $p=~ /^e/i;\n\n    $self->{'_expect'} = $expect;\n    $self->{'_p'}      = $p || undef;\n    $self->significance( $p || $expect );\n}\n\n\n=head2 _set_match_stats\n\n Usage     : Private method; called automatically by _set_data()\n Purpose   : Sets various matching statistics obtained from the HSP listing.\n Argument  : blast2: Identities = 23/74 (31%), Positives = 29/74 (39%), Gaps = 17/74 (22%)\n           : blast2: Identities = 57/98 (58%), Positives = 74/98 (75%)\n           : blast1: Identities = 87/204 (42%), Positives = 126/204 (61%)\n           : blast1: Identities = 87/204 (42%), Positives = 126/204 (61%), Frame = -3\n           : WU-blast: Identities = 310/553 (56%), Positives = 310/553 (56%), Strand = Minus / Plus\n Throws    : Exception if the stats cannot be parsed, probably due to a change\n           : in the Blast report format.\n Comments  : The \"Gaps = \" data in the HSP header has a different meaning depending\n           : on the type of Blast: for BLASTP, this number is the total number of\n           : gaps in query+sbjct; for TBLASTN, it is the number of gaps in the\n           : query sequence only. Thus, it is safer to collect the data\n           : separately by examining the actual sequence strings as is done\n           : in _set_seq().\n\nSee Also   : L</_set_data>, L</_set_seq>\n\n\n#--------------------\nsub _set_match_stats {\n#--------------------\n    my ($self, $data) = @_;\n\n    if($data =~ m!Identities = (\\d+)/(\\d+)!) {\n      # blast1 or 2 format\n      $self->{'_numIdentical'} = $1;\n      $self->{'_totalLength'}  = $2;\n    }\n\n    if($data =~ m!Positives = (\\d+)/(\\d+)!) {\n      # blast1 or 2 format\n      $self->{'_numConserved'} = $1;\n      $self->{'_totalLength'}  = $2;\n    }\n\n    if($data =~ m!Frame = ([\\d+-]+)!) 
{\n      $self->frame($1);\n    }\n\n    # Strand data is not always present in this line.\n    # _set_seq() will also set strand information.\n    if($data =~ m!Strand = (\\w+) / (\\w+)!) {\n\t$self->{'_queryStrand'} = $1;\n\t$self->{'_sbjctStrand'} = $2;\n    }\n\n#    if($data =~ m!Gaps = (\\d+)/(\\d+)!) {\n#\t $self->{'_totalGaps'} = $1;\n#    } else {\n#\t $self->{'_totalGaps'} = 0;\n#    }\n}\n\n\n\n=head2 _set_seq_data\n\n Usage     : called automatically when sequence data is requested.\n Purpose   : Sets the HSP sequence data for both query and sbjct sequences.\n           : Includes: start, stop, length, gaps, and raw sequence.\n Argument  : n/a\n Throws    : Propagates any exception thrown by _set_seq()\n Comments  : Uses raw data stored by _set_data() during object construction.\n           : These data are not always needed, so this method is\n           : executed only upon demand by methods such as gaps(), _set_residues(),\n           : etc. _set_seq() does the dirty work.\n\nSee Also   : L</_set_seq>\n\n\n#-----------------\nsub _set_seq_data {\n#-----------------\n    my $self = shift;\n\n    $self->_set_seq('query', @{$self->{'_queryList'}});\n    $self->_set_seq('sbjct', @{$self->{'_sbjctList'}});\n\n    # Liberate some memory.\n    @{$self->{'_queryList'}} = @{$self->{'_sbjctList'}} = ();\n    undef $self->{'_queryList'};\n    undef $self->{'_sbjctList'};\n\n    $self->{'_set_seq_data'} = 1;\n}\n\n\n\n=head2 _set_seq\n\n Usage     : called automatically by _set_seq_data()\n           : $hsp_obj->_set_seq($seq_type, @data);\n Purpose   : Sets sequence information for both the query and sbjct sequences.\n           : Directly counts the number of gaps in each sequence (if gapped Blast).\n Argument  : $seq_type = 'query' or 'sbjct'\n           : @data = all seq lines with the form:\n           : Query: 61  SPHNVKDRKEQNGSINNAISPTATANTSGSQQINIDSALRDRSSNVAAQPSLSDASSGSN 120\n Throws    : Exception if data strings cannot be parsed, probably due to a 
change\n           : in the Blast report format.\n Comments  : Uses first argument to determine which data members to set\n           : making this method sensitive to data member name changes.\n           : Behavior is dependent on the type of BLAST analysis (TBLASTN, BLASTP, etc).\n Warning   : Sequence endpoints are normalized so that start < end. This affects HSPs\n           : for TBLASTN/X hits on the minus strand. Normalization facilitates use\n           : of range information by methods such as matches().\n\nSee Also   : L</_set_seq_data>, L</matches>, L</range>, L</start>, L</end>\n\n\n#-------------\nsub _set_seq {\n#-------------\n    my $self      = shift;\n    my $seqType   = shift;\n    my @data      = @_;\n    my @ranges    = ();\n    my @sequence  = ();\n    my $numGaps   = 0;\n\n    foreach( @data ) {\n        if( m/(\\d+) *([^\\d\\s]+) *(\\d+)/) {\n            push @ranges, ( $1, $3 ) ;\n            push @sequence, $2;\n        #print STDERR \"_set_seq found sequence \\\"$2\\\"\\n\";\n\t} else {\n\t    $self->warn(\"Bad sequence data: $_\");\n\t}\n    }\n\n    if( !(scalar(@sequence) and scalar(@ranges))) {\n        my $id_str = $self->_id_str;\n\t$self->throw(\"Can't set sequence: missing data. Possibly unrecognized Blast format. 
($id_str) $seqType\");\n   }\n\n    # Sensitive to member name changes.\n    $seqType = \"_\\L$seqType\\E\";\n    $self->{$seqType.'Start'} = $ranges[0];\n    $self->{$seqType.'Stop'}  = $ranges[ $#ranges ];\n    $self->{$seqType.'Seq'}   = \\@sequence;\n\n    $self->{$seqType.'Length'} = abs($ranges[ $#ranges ] - $ranges[0]) + 1;\n\n    # Adjust lengths for BLASTX, TBLASTN, TBLASTX sequences\n    # Converting nucl coords to amino acid coords.\n\n    my $prog = $self->algorithm;\n    if($prog eq 'TBLASTN' and $seqType eq '_sbjct') {\n\t$self->{$seqType.'Length'} /= 3;\n    } elsif($prog eq 'BLASTX' and $seqType eq '_query') {\n\t$self->{$seqType.'Length'} /= 3;\n    } elsif($prog eq 'TBLASTX') {\n\t$self->{$seqType.'Length'} /= 3;\n    }\n\n    if( $prog ne 'BLASTP' ) {\n        $self->{$seqType.'Strand'} = 'Plus' if $prog =~ /BLASTN/;\n        $self->{$seqType.'Strand'} = 'Plus' if ($prog =~ /BLASTX/ and $seqType eq '_query');\n        # Normalize sequence endpoints so that start < end.\n        # Reverse complement or 'minus strand' HSPs get flipped here.\n        if($self->{$seqType.'Start'} > $self->{$seqType.'Stop'}) {\n            ($self->{$seqType.'Start'}, $self->{$seqType.'Stop'}) =\n                ($self->{$seqType.'Stop'}, $self->{$seqType.'Start'});\n            $self->{$seqType.'Strand'} = 'Minus';\n        }\n    }\n\n    ## Count number of gaps in each seq. Only need to do this for gapped Blasts.\n#    if($self->{'_gapped'}) {\n\tmy $seqstr = join('', @sequence);\n\t$seqstr =~ s/\\s//g;\n        my $num_gaps = CORE::length($seqstr) - $self->{$seqType.'Length'};\n\t$self->{$seqType.'Gaps'} = $num_gaps if $num_gaps > 0;\n#    }\n}\n\n\n=head2 _set_residues\n\n Usage     : called automatically when residue data is requested.\n Purpose   : Sets the residue numbers representing the identical and\n           : conserved positions. 
These data are obtained by analyzing the\n           : symbols between query and sbjct lines of the alignments.\n Argument  : n/a\n Throws    : Propagates any exception thrown by _set_seq_data() and _set_match_seq().\n Comments  : These data are not always needed, so it is conditionally\n           : executed only upon demand by methods such as seq_inds().\n           : Behavior is dependent on the type of BLAST analysis (TBLASTN, BLASTP, etc).\n\nSee Also   : L</_set_seq_data>, L</_set_match_seq>, L</seq_inds>\n\n\n#------------------\nsub _set_residues {\n#------------------\n    my $self      = shift;\n    my @sequence  = ();\n\n    $self->_set_seq_data() unless $self->{'_set_seq_data'};\n\n    # Using hashes to avoid saving duplicate residue numbers.\n    my %identicalList_query = ();\n    my %identicalList_sbjct = ();\n    my %conservedList_query = ();\n    my %conservedList_sbjct = ();\n\n    my $aref = $self->_set_match_seq() if not ref $self->{'_matchSeq'};\n    $aref  ||= $self->{'_matchSeq'};\n    my $seqString = join('', @$aref );\n\n    my $qseq = join('',@{$self->{'_querySeq'}});\n    my $sseq = join('',@{$self->{'_sbjctSeq'}});\n    my $resCount_query = $self->{'_queryStop'} || 0;\n    my $resCount_sbjct = $self->{'_sbjctStop'} || 0;\n\n    my $prog = $self->algorithm;\n    if($prog !~ /^BLASTP|^BLASTN/) {\n\tif($prog eq 'TBLASTN') {\n\t    $resCount_sbjct /= 3;\n\t} elsif($prog eq 'BLASTX') {\n\t    $resCount_query /= 3;\n\t} elsif($prog eq 'TBLASTX') {\n\t    $resCount_query /= 3;\n\t    $resCount_sbjct /= 3;\n\t}\n    }\n\n    my ($mchar, $schar, $qchar);\n    while( $mchar = chop($seqString) ) {\n\t($qchar, $schar) = (chop($qseq), chop($sseq));\n\tif( $mchar eq '+' ) {\n\t    $conservedList_query{ $resCount_query } = 1;\n\t    $conservedList_sbjct{ $resCount_sbjct } = 1;\n\t} elsif( $mchar ne ' ' ) {\n\t    $identicalList_query{ $resCount_query } = 1;\n\t    $identicalList_sbjct{ $resCount_sbjct } = 1;\n\t}\n\t$resCount_query-- if $qchar ne 
$GAP_SYMBOL;\n\t$resCount_sbjct-- if $schar ne $GAP_SYMBOL;\n    }\n    $self->{'_identicalRes_query'} = \\%identicalList_query;\n    $self->{'_conservedRes_query'} = \\%conservedList_query;\n    $self->{'_identicalRes_sbjct'} = \\%identicalList_sbjct;\n    $self->{'_conservedRes_sbjct'} = \\%conservedList_sbjct;\n\n}\n\n=head2 _set_match_seq\n\n Usage     : $hsp_obj->_set_match_seq()\n Purpose   : Set the 'match' sequence for the current HSP (symbols in between\n           : the query and sbjct lines.)\n Returns   : Array reference holding the match sequence lines.\n Argument  : n/a\n Throws    : Exception if the _matchList field is not set.\n Comments  : The match information is not always necessary. This method\n           : allows it to be conditionally prepared.\n           : Called by _set_residues() and seq_str().\n\nSee Also   : L</_set_residues>, L</seq_str>\n\n\n#-------------------\nsub _set_match_seq {\n#-------------------\n    my $self = shift;\n\n    if( ! ref($self->{'_matchList'}) ) {\n        my $id_str = $self->_id_str;\n        $self->throw(\"Can't set HSP match sequence: No data ($id_str)\");\n    }\n\n    my @data = @{$self->{'_matchList'}};\n\n    my(@sequence);\n    foreach( @data ) {\n\tchomp($_);\n\t## Remove leading spaces; (note: aln may begin with a space\n\t## which is why we can't use s/^ +//).\n\ts/^ {$self->{'_match_indent'}}//;\n\tpush @sequence, $_;\n    }\n    # Liberate some memory.\n    @{$self->{'_matchList'}} = ();\n    $self->{'_matchList'} = undef;\n\n    $self->{'_matchSeq'} = \\@sequence;\n\n    return $self->{'_matchSeq'};\n}\n\n\n=head2 n\n\n Usage     : $hsp_obj->n()\n Purpose   : Get the N value (num HSPs on which P/Expect is based).\n           : This value is not defined with NCBI Blast2 with gapping.\n Returns   : Integer or null string if not defined.\n Argument  : n/a\n Throws    : n/a\n Comments  : The 'N' value is listed in parentheses with P/Expect value:\n           : e.g., P(3) = 1.2e-30  ---> (N = 
3).\n           : Not defined in NCBI Blast2 with gaps.\n           : This typically is equal to the number of HSPs but not always.\n           : To obtain the number of HSPs, use Bio::Search::Hit::BlastHit::num_hsps().\n\nSee Also   : L<Bio::SeqFeature::SimilarityPair::score()|Bio::SeqFeature::SimilarityPair>\n\n\n#-----\nsub n { my $self = shift; $self->{'_n'} || ''; }\n#-----\n\n\n=head2 matches\n\n Usage     : $hsp->matches( [-SEQ => seq_type], [-START => start], [-STOP => stop] );\n Purpose   : Get the total number of identical and conservative matches\n           : in the query or sbjct sequence for the given HSP. Optionally can\n           : report data within a defined interval along the seq.\n           : (Note: 'conservative' matches are called 'positives' in the\n\t   : Blast report.)\n Example   : ($id,$cons) = $hsp_object->matches(-SEQ => 'hit');\n           : ($id,$cons) = $hsp_object->matches(-SEQ => 'query', -START => 300, -STOP => 400);\n Returns   : 2-element array of integers\n Argument  : Named parameters:\n           : -SEQ   = 'query' or 'hit' or 'sbjct' (default = 'query')\n           :  ('sbjct' is synonymous with 'hit')\n           : -START = Starting coordinate (optional)\n           : -STOP  = Ending coordinate (optional)\n Throws    : Exception if the supplied coordinates are out of range.\n Comments  : Relies on seq_str('match') to get the string of alignment symbols\n           : between the query and sbjct lines which are used for determining\n           : the number of identical and conservative matches.\n\nSee Also   : L</length>, L</gaps>, L</seq_str>, L<Bio::Search::Hit::BlastHit::_adjust_contigs()|Bio::Search::Hit::BlastHit>\n\n\n#-----------\nsub matches {\n#-----------\n    my( $self, %param ) = @_;\n    my(@data);\n    my($seqType, $beg, $end) = ($param{-SEQ}, $param{-START}, $param{-STOP});\n    $seqType ||= 'query';\n    $seqType = 'sbjct' if $seqType eq 'hit';\n\n    my($start,$stop);\n\n    if(!defined $beg && !defined $end) {\n\t## Get data for the whole alignment.\n\tpush @data, ($self->{'_numIdentical'}, 
$self->{'_numConserved'});\n    } else {\n\t## Get the substring representing the desired sub-section of aln.\n\t$beg ||= 0;\n\t$end ||= 0;\n\t($start,$stop) = $self->range($seqType);\n\tif($beg == 0) { $beg = $start; $end = $beg+$end; }\n\telsif($end == 0) { $end = $stop; $beg = $end-$beg; }\n\n\tif($end >= $stop) { $end = $stop; } ##ML changed from if (end >stop)\n\telse { $end += 1;}   ##ML moved from commented position below, makes\n                             ##more sense here\n#\tif($end > $stop) { $end = $stop; }\n\tif($beg < $start) { $beg = $start; }\n#\telse { $end += 1;}\n\n#\tmy $seq = substr($self->seq_str('match'), $beg-$start, ($end-$beg));\n\n\t## ML: START fix for substr out of range error ------------------\n\tmy $seq = \"\";\n        my $prog = $self->algorithm;\n\tif (($prog eq 'TBLASTN') and ($seqType eq 'sbjct'))\n\t{\n\t    $seq = substr($self->seq_str('match'),\n\t\t\t  int(($beg-$start)/3), int(($end-$beg+1)/3));\n\n\t} elsif (($prog eq 'BLASTX') and ($seqType eq 'query'))\n\t{\n\t    $seq = substr($self->seq_str('match'),\n\t\t\t  int(($beg-$start)/3), int(($end-$beg+1)/3));\n\t} else {\n\t    $seq = substr($self->seq_str('match'),\n\t\t\t  $beg-$start, ($end-$beg));\n\t}\n\t## ML: End of fix for  substr out of range error -----------------\n\n\n\t## ML: debugging code\n\t## This is where we get our exception.  
Try printing out the values going\n\t## into this:\n\t##\n#\t print STDERR\n#\t     qq(*------------MY EXCEPTION --------------------\\nSeq: \") ,\n#\t     $self->seq_str(\"$seqType\"), qq(\"\\n),$self->rank,\",(  index:\";\n#\t print STDERR  $beg-$start, \", len: \", $end-$beg,\" ), (HSPRealLen:\",\n#\t     CORE::length $self->seq_str(\"$seqType\");\n#\t print STDERR \", HSPCalcLen: \", $stop - $start +1 ,\" ),\n#\t     ( beg: $beg, end: $end ), ( start: $start, stop: stop )\\n\";\n\t ## ML: END DEBUGGING CODE----------\n\n\tif(!CORE::length $seq) {\n            my $id_str = $self->_id_str;\n\t    $self->throw(\"Undefined $seqType sub-sequence ($beg,$end). Valid range = $start - $stop ($id_str)\");\n\t}\n\t## Get data for a substring.\n#\tprintf \"Collecting HSP subsection data: beg,end = %d,%d; start,stop = %d,%d\\n%s<---\\n\", $beg, $end, $start, $stop, $seq;\n#\tprintf \"Original match seq:\\n%s\\n\",$self->seq_str('match');\n\t$seq =~ s/ //g;  # remove space (no info).\n\tmy $len_cons = CORE::length $seq;\n\t$seq =~ s/\\+//g;  # remove '+' characters (conservative substitutions)\n\tmy $len_id = CORE::length $seq;\n\tpush @data, ($len_id, $len_cons);\n#\tprintf \"  HSP = %s\\n  id = %d; cons = %d\\n\", $self->rank, $len_id, $len_cons; <STDIN>;\n    }\n    @data;\n}\n\n\n=head2 num_identical\n\n Usage     : $hsp_object->num_identical();\n Purpose   : Get the number of identical positions within the given HSP.\n Example   : $num_iden = $hsp_object->num_identical();\n Returns   : integer\n Argument  : n/a\n Throws    : n/a\n\nSee Also   : L</num_conserved>, L</frac_identical>\n\n\n#-------------------\nsub num_identical {\n#-------------------\n    my( $self) = shift;\n\n    $self->{'_numIdentical'};\n}\n\n\n=head2 num_conserved\n\n Usage     : $hsp_object->num_conserved();\n Purpose   : Get the number of conserved positions within the given HSP.\n Example   : $num_iden = $hsp_object->num_conserved();\n Returns   : integer\n Argument  : n/a\n Throws    : 
n/a\n\nSee Also   : L</num_identical>, L</frac_conserved>\n\n\n#-------------------\nsub num_conserved {\n#-------------------\n    my( $self) = shift;\n\n    $self->{'_numConserved'};\n}\n\n\n\n=head2 range\n\n Usage     : $hsp->range( [seq_type] );\n Purpose   : Gets the (start, end) coordinates for the query or sbjct sequence\n           : in the HSP alignment.\n Example   : ($query_beg, $query_end) = $hsp->range('query');\n           : ($hit_beg, $hit_end) = $hsp->range('hit');\n Returns   : Two-element array of integers\n Argument  : seq_type = string, 'query' or 'hit' or 'sbjct'  (default = 'query')\n           :  ('sbjct' is synonymous with 'hit')\n Throws    : n/a\n\nSee Also   : L</start>, L</end>\n\n\n#----------\nsub range {\n#----------\n    my ($self, $seqType) = @_;\n\n    $self->_set_seq_data() unless $self->{'_set_seq_data'};\n\n    $seqType ||= 'query';\n    $seqType = 'sbjct' if $seqType eq 'hit';\n\n    ## Sensitive to member name changes.\n    $seqType = \"_\\L$seqType\\E\";\n\n    return ($self->{$seqType.'Start'},$self->{$seqType.'Stop'});\n}\n\n=head2 start\n\n Usage     : $hsp->start( [seq_type] );\n Purpose   : Gets the start coordinate for the query, sbjct, or both sequences\n           : in the HSP alignment.\n           : NOTE: Start will always be less than end.\n           : To determine strand, use $hsp->strand()\n Example   : $query_beg = $hsp->start('query');\n           : $hit_beg = $hsp->start('hit');\n           : ($query_beg, $hit_beg) = $hsp->start();\n Returns   : scalar context: integer\n           : array context without args: list of two integers\n Argument  : In scalar context: seq_type = 'query' or 'hit' or 'sbjct' (default= 'query')\n           :  ('sbjct' is synonymous with 'hit')\n           : Array context can be \"induced\" by providing an argument of 'list' or 'array'.\n Throws    : n/a\n\nSee Also   : L</end>, L</range>\n\n\n#----------\nsub start {\n#----------\n    my ($self, $seqType) = @_;\n\n    $seqType ||= 
(wantarray ? 'list' : 'query');\n    $seqType = 'sbjct' if $seqType eq 'hit';\n\n    $self->_set_seq_data() unless $self->{'_set_seq_data'};\n\n    if($seqType =~ /list|array/i) {\n\treturn ($self->{'_queryStart'}, $self->{'_sbjctStart'});\n    } else {\n\t## Sensitive to member name changes.\n\t$seqType = \"_\\L$seqType\\E\";\n\treturn $self->{$seqType.'Start'};\n    }\n}\n\n=head2 end\n\n Usage     : $hsp->end( [seq_type] );\n Purpose   : Gets the end coordinate for the query, sbjct, or both sequences\n           : in the HSP alignment.\n           : NOTE: Start will always be less than end.\n           : To determine strand, use $hsp->strand()\n Example   : $query_end = $hsp->end('query');\n           : $hit_end = $hsp->end('hit');\n           : ($query_end, $hit_end) = $hsp->end();\n Returns   : scalar context: integer\n           : array context without args: list of two integers\n Argument  : In scalar context: seq_type = 'query' or 'hit' or 'sbjct' (default= 'query')\n           :  ('sbjct' is synonymous with 'hit')\n           : Array context can be \"induced\" by providing an argument of 'list' or 'array'.\n Throws    : n/a\n\nSee Also   : L</start>, L</range>, L</strand>\n\n\n#----------\nsub end {\n#----------\n    my ($self, $seqType) = @_;\n\n    $seqType ||= (wantarray ? 
'list' : 'query');\n    $seqType = 'sbjct' if $seqType eq 'hit';\n\n    $self->_set_seq_data() unless $self->{'_set_seq_data'};\n\n    if($seqType =~ /list|array/i) {\n\treturn ($self->{'_queryStop'}, $self->{'_sbjctStop'});\n    } else {\n\t## Sensitive to member name changes.\n\t$seqType = \"_\\L$seqType\\E\";\n\treturn $self->{$seqType.'Stop'};\n    }\n}\n\n\n\n=head2 strand\n\n Usage     : $hsp_object->strand( [seq_type] )\n Purpose   : Get the strand of the query or sbjct sequence.\n Example   : print $hsp->strand('query');\n           : ($query_strand, $hit_strand) = $hsp->strand();\n Returns   : -1, 0, or 1\n           : -1 = Minus strand, +1 = Plus strand\n           : Returns 0 if strand is not defined, which occurs\n           : for BLASTP reports, and the query of TBLASTN\n           : as well as the hit of BLASTX reports.\n           : In scalar context without arguments, returns queryStrand value.\n           : In array context without arguments, returns a two-element list\n           :    of integers (queryStrand, sbjctStrand).\n           : Array context can be \"induced\" by providing an argument of 'list' or 'array'.\n Argument  : seq_type: 'query' or 'hit' or 'sbjct' or undef\n           :  ('sbjct' is synonymous with 'hit')\n Throws    : n/a\n\nSee Also   : L</_set_seq>, L</_set_match_stats>\n\n\n#-----------\nsub strand {\n#-----------\n    my( $self, $seqType ) = @_;\n\n    $seqType  ||= (wantarray ? 
'list' : 'query');\n    $seqType = 'sbjct' if $seqType eq 'hit';\n\n    ## Sensitive to member name format.\n    $seqType = \"_\\L$seqType\\E\";\n\n    # $seqType could be '_list'.\n    $self->{'_queryStrand'} or $self->_set_seq_data() unless $self->{'_set_seq_data'};\n\n    my $prog = $self->algorithm;\n\n    if($seqType  =~ /list|array/i) {\n        my ($qstr, $hstr);\n        if( $prog eq 'BLASTP') {\n            $qstr = 0;\n            $hstr = 0;\n        }\n        elsif( $prog eq 'TBLASTN') {\n            $qstr = 0;\n            $hstr = $STRAND_SYMBOL{$self->{'_sbjctStrand'}};\n        }\n        elsif( $prog eq 'BLASTX') {\n            $qstr = $STRAND_SYMBOL{$self->{'_queryStrand'}};\n            $hstr = 0;\n        }\n        else {\n            $qstr = $STRAND_SYMBOL{$self->{'_queryStrand'}} if defined $self->{'_queryStrand'};\n            $hstr = $STRAND_SYMBOL{$self->{'_sbjctStrand'}} if defined $self->{'_sbjctStrand'};\n        }\n        $qstr ||= 0;\n        $hstr ||= 0;\n\treturn ($qstr, $hstr);\n    }\n    local $^W = 0;\n    $STRAND_SYMBOL{$self->{$seqType.'Strand'}} || 0;\n}\n\n\n=head2 seq\n\n Usage     : $hsp->seq( [seq_type] );\n Purpose   : Get the query or sbjct sequence as a Bio::Seq.pm object.\n Example   : $seqObj = $hsp->seq('query');\n Returns   : Object reference for a Bio::Seq.pm object.\n Argument  : seq_type = 'query' or 'hit' or 'sbjct' (default = 'query').\n           :  ('sbjct' is synonymous with 'hit')\n Throws    : Propagates any exception that occurs during construction\n           : of the Bio::Seq.pm object.\n Comments  : The sequence is returned in an array of strings corresponding\n           : to the strings in the original format of the Blast alignment.\n           : (i.e., same spacing).\n\nSee Also   : L</seq_str>, L</seq_inds>, 
L<Bio::Seq>","label":"seq($self,$seqType)"},"detail":"($self,$seqType)"},{"name":"Bio","containerName":"Seq::Bio::Seq","line":1508,"kind":12},{"detail":"($self,$seqType)","signature":{"label":"seq_str($self,$seqType)","parameters":[{"label":"$self"},{"label":"$seqType"}],"documentation":"1;\n#-----------------------------------------------------------------\n# $Id: BlastHSP.pm 16123 2009-09-17 12:57:27Z cjfields $\n#\n# BioPerl module Bio::Search::HSP::BlastHSP\n#\n# (This module was originally called Bio::Tools::Blast::HSP)\n#\n# Please direct questions and support issues to <bioperl-l@bioperl.org> \n#\n# Cared for by Steve Chervitz <sac@bioperl.org>\n#\n# You may distribute this module under the same terms as perl itself\n#-----------------------------------------------------------------\n\n## POD Documentation:\n\n=head1 NAME\n\nBio::Search::HSP::BlastHSP - Bioperl BLAST High-Scoring Pair object\n\n=head1 SYNOPSIS\n\nSee L<Bio::Search::Hit::BlastHit>.\n\n=head1 DESCRIPTION\n\nA Bio::Search::HSP::BlastHSP object provides an interface to data\nobtained in a single alignment section of a Blast report (known as a\n\"High-scoring Segment Pair\"). This is essentially a pairwise\nalignment with score information.\n\nBlastHSP objects are accessed via L<Bio::Search::Hit::BlastHit>\nobjects after parsing a BLAST report using the L<Bio::SearchIO>\nsystem.\n\nThe construction of BlastHSP objects is performed by\nBio::Factory::BlastHitFactory in a process that is\norchestrated by the Blast parser (L<Bio::SearchIO::psiblast>).\nThe resulting BlastHSPs are then accessed via\nL<Bio::Search::Hit::BlastHit>). Therefore, you do not need to\nuse L<Bio::Search::HSP::BlastHSP>) directly. 
If you need to construct\nBlastHSPs directly, see the new() function for details.\n\nFor L<Bio::SearchIO> BLAST parsing usage examples, see the\nC<examples/searchio> directory of the Bioperl distribution.\n\n=head2 Start and End coordinates\n\nSequence endpoints are swapped so that start is always less than\nend. This affects TBLASTN/X hits on the minus strand. Strand\ninformation can be recovered using the strand() method. This\nnormalization step is standard Bioperl practice. It also facilitates\nuse of range information by methods such as match().\n\n=over 1\n\n* * Supports BLAST versions 1.x and 2.x, gapped and ungapped.\n\n\nBio::Search::HSP::BlastHSP.pm has the ability to extract a list of all\nresidue indices for identical and conservative matches along both\nquery and sbjct sequences. Since this degree of detail is not always\nneeded, this behavior does not occur during construction of the BlastHSP\nobject.  These data will automatically be collected as necessary as\nthe BlastHSP.pm object is used.\n\n=head1 DEPENDENCIES\n\nBio::Search::HSP::BlastHSP.pm is a concrete class that inherits from\nL<Bio::SeqFeature::SimilarityPair> and L<Bio::Search::HSP::HSPI>.\nL<Bio::Seq> and L<Bio::SimpleAlign> are employed for creating\nsequence and alignment objects, respectively.\n\n=head2 Relationship to SimpleAlign.pm & Seq.pm\n\nBlastHSP.pm can provide the query or sbjct sequence as a L<Bio::Seq>\nobject via the L<seq()|seq> method. The BlastHSP.pm object can also create a\ntwo-sequence L<Bio::SimpleAlign> alignment object using the query\nand sbjct sequences via the L<get_aln()|get_aln> method. Creation of alignment\nobjects is not automatic when constructing the BlastHSP.pm object since\nthis level of functionality is not always required and would generate\na lot of extra overhead when crunching many reports.\n\n\n=head1 FEEDBACK\n\n=head2 Mailing Lists\n\nUser feedback is an integral part of the evolution of this and other\nBioperl modules.  
Send your comments and suggestions preferably to one\nof the Bioperl mailing lists.  Your participation is much appreciated.\n\n  bioperl-l@bioperl.org                  - General discussion\n  http://bioperl.org/wiki/Mailing_lists  - About the mailing lists\n\n=head2 Support \n\nPlease direct usage questions or support issues to the mailing list:\n\nI<bioperl-l@bioperl.org>\n\nrather than to the module maintainer directly. Many experienced and \nresponsive experts will be able to look at the problem and quickly \naddress it. Please include a thorough description of the problem \nwith code and data examples if at all possible.\n\n=head2 Reporting Bugs\n\nReport bugs to the Bioperl bug tracking system to help us keep track\nof the bugs and their resolution. Bug reports can be submitted via the\nweb:\n\n  http://bugzilla.open-bio.org/\n\n=head1 AUTHOR\n\nSteve Chervitz E<lt>sac-at-bioperl.orgE<gt>\n\nSee L<the FEEDBACK section | FEEDBACK> for where to send bug reports and comments.\n\n=head1 ACKNOWLEDGEMENTS\n\nThis software was originally developed in the Department of Genetics\nat Stanford University. I would also like to acknowledge my\ncolleagues at Affymetrix for useful feedback.\n\n=head1 SEE ALSO\n\n Bio::Search::Hit::BlastHit.pm          - Blast hit object.\n Bio::Search::Result::BlastResult.pm    - Blast Result object.\n Bio::Seq.pm                            - Biosequence object\n\n=head2 Links:\n\n http://bio.perl.org/                       - Bioperl Project Homepage\n\n=head1 COPYRIGHT\n\nCopyright (c) 1996-2001 Steve Chervitz. 
All Rights Reserved.\n\n=head1 DISCLAIMER\n\nThis software is provided \"as is\" without warranty of any kind.\n\n\n\n# END of main POD documentation.\n\n=head1 APPENDIX\n\nThe rest of the documentation details each of the object methods.\nInternal methods are usually preceded with a _\n\n\n# Let the code begin...\n\npackage Bio::Search::HSP::BlastHSP;\n\nuse strict;\nuse Bio::SeqFeature::Similarity;\n\nuse vars qw($GAP_SYMBOL %STRAND_SYMBOL);\n\nuse overload\n    '\"\"' => \\&to_string;\n\nuse base qw(Bio::SeqFeature::SimilarityPair Bio::Search::HSP::HSPI);\n\n$GAP_SYMBOL    = '-';          # Need a more general way to handle gap symbols.\n%STRAND_SYMBOL = ('Plus' => 1, 'Minus' => -1 );\n\n\n=head2 new\n\n Usage     : $hsp = Bio::Search::HSP::BlastHSP->new( %named_params );\n           : Bio::Search::HSP::BlastHSP objects are constructed\n           : automatically by Bio::SearchIO::BlastHitFactory,\n           : so there is no need for direct instantiation.\n Purpose   : Constructs a new BlastHSP object and Initializes key variables\n           : for the HSP.\n Returns   : A Bio::Search::HSP::BlastHSP object\n Argument  : Named parameters:\n           : Parameter keys are case-insensitive.\n           :      -RAW_DATA  => array ref containing raw BLAST report data for\n           :                    for a single HSP. 
This includes all lines\n           :                    of the HSP alignment from a traditional BLAST\n                                or PSI-BLAST (non-XML) report,\n           :      -RANK         => integer (1..n).\n           :      -PROGRAM      => string ('TBLASTN', 'BLASTP', etc.).\n           :      -QUERY_NAME   => string, id of query sequence\n           :      -HIT_NAME     => string, id of hit sequence\n           :\n Comments  : Having the raw data allows this object to do lazy parsing of\n           : the raw HSP data (i.e., not parsed until needed).\n           :\n           : Note that there is a fair amount of basic parsing that is\n           : currently performed in this module that would be more appropriate\n           : to do within a separate factory object.\n           : This parsing code will likely be relocated and more initialization\n           : parameters will be added to new().\n           :\nSee Also   : L<Bio::SeqFeature::SimilarityPair::new()>, L<Bio::SeqFeature::Similarity::new()>\n\n\n#----------------\nsub new {\n#----------------\n    my ($class, @args ) = @_;\n\n    my $self = $class->SUPER::new( @args );\n    # Initialize placeholders\n    $self->{'_queryGaps'} = $self->{'_sbjctGaps'} = 0;\n    my ($raw_data, $qname, $hname, $qlen, $hlen);\n\n    ($self->{'_prog'}, $self->{'_rank'}, $raw_data,\n     $qname, $hname) =\n\t $self->_rearrange([qw( PROGRAM\n\t\t\t\tRANK\n\t\t\t\tRAW_DATA\n\t\t\t\tQUERY_NAME\n\t\t\t\tHIT_NAME\n\t\t\t\t)], @args );\n\n    # _set_data() does a fair amount of parsing.\n    # This will likely change (see comment above.)\n    $self->_set_data( @{$raw_data} );\n    # Store the aligned query as sequence feature\n    my ($qb, $hb) = ($self->start());\n    my ($qe, $he) = ($self->end());\n    my ($qs, $hs) = ($self->strand());\n    my ($qf,$hf) = ($self->query->frame(),\n\t\t    $self->hit->frame);\n\n    $self->query( Bio::SeqFeature::Similarity->new (-start   =>$qb,\n\t\t\t\t\t\t    -end     
=>$qe,\n\t\t\t\t\t\t    -strand  =>$qs,\n\t\t\t\t\t\t    -bits    =>$self->bits,\n\t\t\t\t\t\t    -score   =>$self->score,\n\t\t\t\t\t\t    -frame   =>$qf,\n\t\t\t\t\t\t    -seq_id  => $qname,\n\t\t\t\t\t\t    -source  =>$self->{'_prog'} ));\n\n    $self->hit( Bio::SeqFeature::Similarity->new (-start   =>$hb,\n\t\t\t\t\t\t  -end     =>$he,\n\t\t\t\t\t\t  -strand  =>$hs,\n\t\t\t\t\t\t  -bits    =>$self->bits,\n\t\t\t\t\t\t  -score   =>$self->score,\n                                                  -frame   =>$hf,\n\t\t\t\t\t\t  -seq_id  => $hname,\n\t\t\t\t\t\t  -source  =>$self->{'_prog'} ));\n\n    # set lengths\n    $self->query->seqlength($qlen); # query\n    $self->hit->seqlength($hlen); # subject\n\n    $self->query->frac_identical($self->frac_identical('query'));\n    $self->hit->frac_identical($self->frac_identical('hit'));\n    return $self;\n}\n\n#sub DESTROY {\n#    my $self = shift;\n#    #print STDERR \"--->DESTROYING $self\\n\";\n#}\n\n\n# Title   : _id_str;\n# Purpose : Intended for internal use only to provide a string for use\n#           within exception messages to help users figure out which\n#           query/hit caused the problem.\n# Returns : Short string with name of query and hit seq\nsub _id_str {\n    my $self = shift;\n    if( not defined $self->{'_id_str'}) {\n        my $qname = $self->query->seq_id;\n        my $hname = $self->hit->seq_id;\n        $self->{'_id_str'} = \"QUERY=\\\"$qname\\\" HIT=\\\"$hname\\\"\";\n    }\n    return $self->{'_id_str'};\n}\n\n#=================================================\n# Begin Bio::Search::HSP::HSPI implementation\n#=================================================\n\n=head2 algorithm\n\n Title   : algorithm\n Usage   : $alg = $hsp->algorithm();\n Function: Gets the algorithm specification that was used to obtain the hsp\n           For BLAST, the algorithm denotes what type of sequence was aligned\n           against what (BLASTN: dna-dna, BLASTP prt-prt, BLASTX translated\n           dna-prt, 
TBLASTN prt-translated dna, TBLASTX translated\n           dna-translated dna).\n Returns : a scalar string\n Args    : none\n\n\n#----------------\nsub algorithm {\n#----------------\n    my ($self,@args) = @_;\n    return $self->{'_prog'};\n}\n\n\n\n\n=head2 signif()\n\n Usage     : $hsp_obj->signif()\n Purpose   : Get the P-value or Expect value for the HSP.\n Returns   : Float (0.001 or 1.3e-43)\n           : Returns P-value if it is defined, otherwise, Expect value.\n Argument  : n/a\n Throws    : n/a\n Comments  : Provided for consistency with BlastHit::signif()\n           : Support for returning the significance data in different\n           : formats (e.g., exponent only), is not provided for HSP objects.\n           : This is only available for the BlastHit or Blast object.\n\nSee Also   : L</p>, L</expect>, L<Bio::Search::Hit::BlastHit::signif()|Bio::Search::Hit::BlastHit>\n\n\n#-----------\nsub signif {\n#-----------\n    my $self = shift;\n    my $val ||= defined($self->{'_p'}) ? 
$self->{'_p'} : $self->{'_expect'};\n    $val;\n}\n\n\n\n=head2 evalue\n\n Usage     : $hsp_obj->evalue()\n Purpose   : Get the Expect value for the HSP.\n Returns   : Float (0.001 or 1.3e-43)\n Argument  : n/a\n Throws    : n/a\n Comments  : Support for returning the expectation data in different\n           : formats (e.g., exponent only), is not provided for HSP objects.\n           : This is only available for the BlastHit or Blast object.\n\nSee Also   : L</p>\n\n\n#----------\nsub evalue { shift->{'_expect'} }\n#----------\n\n\n=head2 p\n\n Usage     : $hsp_obj->p()\n Purpose   : Get the P-value for the HSP.\n Returns   : Float (0.001 or 1.3e-43) or undef if not defined.\n Argument  : n/a\n Throws    : n/a\n Comments  : P-value is not defined with NCBI Blast2 reports.\n           : Support for returning the expectation data in different\n           : formats (e.g., exponent only) is not provided for HSP objects.\n           : This is only available for the BlastHit or Blast object.\n\nSee Also   : L<expect()|expect>\n\n\n#-----\nsub p { my $self = shift; $self->{'_p'}; }\n#-----\n\n# alias\nsub pvalue { shift->p(@_); }\n\n=head2 length\n\n Usage     : $hsp->length( [seq_type] )\n Purpose   : Get the length of the aligned portion of the query or sbjct.\n Example   : $hsp->length('query')\n Returns   : integer\n Argument  : seq_type: 'query' | 'hit' or 'sbjct' | 'total'  (default = 'total')\n             ('sbjct' is synonymous with 'hit')\n Throws    : n/a\n Comments  : 'total' length is the full length of the alignment\n           : as reported in the denominators in the alignment section:\n           : \"Identical = 34/120 Positives = 67/120\".\n\nSee Also   : L</gaps>\n\n\n#-----------\nsub length {\n#-----------\n## Developer note: when using the built-in length function within\n##                 this module, call it as CORE::length().\n    my( $self, $seqType,$data ) = @_;\n    $seqType  ||= 'total';\n    $seqType = 'sbjct' if $seqType eq 'hit';\n\n    
$seqType ne 'total' and $self->_set_seq_data() unless $self->{'_set_seq_data'};\n\n    ## Sensitive to member name format.\n    $seqType = \"_\\L$seqType\\E\";\n    if( defined $data  ) {\n\t$self->{$seqType.'Length'} = $data;\n    }\n    $self->{$seqType.'Length'};\n}\n\n\n\n=head2 gaps\n\n Usage     : $hsp->gaps( [seq_type] )\n Purpose   : Get the number of gap characters in the query, sbjct, or total alignment.\n           : Also can return query gap chars and sbjct gap chars as a two-element list\n           : when in array context.\n Example   : $total_gaps      = $hsp->gaps();\n           : ($qgaps, $sgaps) = $hsp->gaps();\n           : $qgaps           = $hsp->gaps('query');\n Returns   : scalar context: integer\n           : array context without args: (int, int) = ('queryGaps', 'sbjctGaps')\n Argument  : seq_type: 'query' or 'hit' or 'sbjct' or 'total'\n           :  ('sbjct' is synonymous with 'hit')\n           : (default = 'total', scalar context)\n           : Array context can be \"induced\" by providing an argument of 'list' or 'array'.\n Throws    : n/a\n\nSee Also   : L</length>, L</matches>\n\n\n#---------\nsub gaps {\n#---------\n    my( $self, $seqType ) = @_;\n\n    $self->_set_seq_data() unless $self->{'_set_seq_data'};\n\n    $seqType  ||= (wantarray ? 
'list' : 'total');\n    $seqType = 'sbjct' if $seqType eq 'hit';\n\n    if($seqType =~ /list|array/i) {\n\treturn (($self->{'_queryGaps'} || 0), ($self->{'_sbjctGaps'} || 0));\n    }\n\n    if($seqType eq 'total') {\n\treturn ($self->{'_queryGaps'} + $self->{'_sbjctGaps'}) || 0;\n    } else {\n\t## Sensitive to member name format.\n\t$seqType = \"_\\L$seqType\\E\";\n\treturn $self->{$seqType.'Gaps'} || 0;\n    }\n}\n\n\n=head2 frac_identical\n\n Usage     : $hsp_object->frac_identical( [seq_type] );\n Purpose   : Get the fraction of identical positions within the given HSP.\n Example   : $frac_iden = $hsp_object->frac_identical('query');\n Returns   : Float (2-decimal precision, e.g., 0.75).\n Argument  : seq_type: 'query' or 'hit' or 'sbjct' or 'total'\n           :  ('sbjct' is synonymous with 'hit')\n           : default = 'total' (but see comments below).\n Throws    : n/a\n Comments  : Different versions of Blast report different values for the total\n           : length of the alignment. 
This is the number reported in the\n           : denominators in the stats section:\n           : \"Identical = 34/120 Positives = 67/120\".\n           : NCBI-BLAST uses the total length of the alignment (with gaps)\n           : WU-BLAST uses the length of the query sequence (without gaps).\n           : Therefore, when called without an argument or an argument of 'total',\n           : this method will report different values depending on the\n           : version of BLAST used.\n           :\n           : To get the fraction identical among only the aligned residues,\n           : ignoring the gaps, call this method with an argument of 'query'\n           : or 'sbjct' ('sbjct' is synonymous with 'hit').\n\nSee Also   : L</frac_conserved>, L</num_identical>, L</matches>\n\n\n#-------------------\nsub frac_identical {\n#-------------------\n# The value is calculated as opposed to storing it from the parsed results.\n# This saves storage and also permits flexibility in determining for which\n# sequence (query or sbjct) the figure is to be calculated.\n\n    my( $self, $seqType ) = @_;\n    $seqType ||= 'total';\n    $seqType = 'sbjct' if $seqType eq 'hit';\n\n    if($seqType ne 'total') {\n      $self->_set_seq_data() unless $self->{'_set_seq_data'};\n    }\n    ## Sensitive to member name format.\n    $seqType = \"_\\L$seqType\\E\";\n\n    sprintf( \"%.2f\", $self->{'_numIdentical'}/$self->{$seqType.'Length'});\n}\n\n\n=head2 frac_conserved\n\n Usage     : $hsp_object->frac_conserved( [seq_type] );\n Purpose   : Get the fraction of conserved positions within the given HSP.\n           : (Note: 'conservative' positions are called 'positives' in the\n\t   : Blast report.)\n Example   : $frac_cons = $hsp_object->frac_conserved('query');\n Returns   : Float (2-decimal precision, e.g., 0.75).\n Argument  : seq_type: 'query' or 'hit' or 'sbjct' or 'total'\n           :  ('sbjct' is synonymous with 'hit')\n           : default = 'total' (but see comments below).\n 
Throws    : n/a\n Comments  : Different versions of Blast report different values for the total\n           : length of the alignment. This is the number reported in the\n           : denominators in the stats section:\n           : \"Identical = 34/120 Positives = 67/120\".\n           : NCBI-BLAST uses the total length of the alignment (with gaps)\n           : WU-BLAST uses the length of the query sequence (without gaps).\n           : Therefore, when called without an argument or an argument of 'total',\n           : this method will report different values depending on the\n           : version of BLAST used.\n           :\n           : To get the fraction conserved among only the aligned residues,\n           : ignoring the gaps, call this method with an argument of 'query'\n           : or 'sbjct'.\n\nSee Also   : L</frac_conserved>, L</num_conserved>, L</matches>\n\n\n#--------------------\nsub frac_conserved {\n#--------------------\n# The value is calculated as opposed to storing it from the parsed results.\n# This saves storage and also permits flexibility in determining for which\n# sequence (query or sbjct) the figure is to be calculated.\n\n    my( $self, $seqType ) = @_;\n    $seqType ||= 'total';\n    $seqType = 'sbjct' if $seqType eq 'hit';\n\n    if($seqType ne 'total') {\n      $self->_set_seq_data() unless $self->{'_set_seq_data'};\n    }\n\n    ## Sensitive to member name format.\n    $seqType = \"_\\L$seqType\\E\";\n\n    sprintf( \"%.2f\", $self->{'_numConserved'}/$self->{$seqType.'Length'});\n}\n\n=head2 query_string\n\n Title   : query_string\n Usage   : my $qseq = $hsp->query_string;\n Function: Retrieves the query sequence of this HSP as a string\n Returns : string\n Args    : none\n\n\n\n#----------------\nsub query_string{ shift->seq_str('query'); }\n#----------------\n\n=head2 hit_string\n\n Title   : hit_string\n Usage   : my $hseq = $hsp->hit_string;\n Function: Retrieves the hit sequence of this HSP as a string\n Returns : string\n 
Args    : none\n\n\n\n#----------------\nsub hit_string{ shift->seq_str('hit'); }\n#----------------\n\n\n=head2 homology_string\n\n Title   : homology_string\n Usage   : my $homo_string = $hsp->homology_string;\n Function: Retrieves the homology sequence for this HSP as a string.\n         : The homology sequence is the string of symbols in between the\n         : query and hit sequences in the alignment indicating the degree\n         : of conservation (e.g., identical, similar, not similar).\n Returns : string\n Args    : none\n\n\n#----------------\nsub homology_string{ shift->seq_str('match'); }\n#----------------\n\n#=================================================\n# End Bio::Search::HSP::HSPI implementation\n#=================================================\n\n# Older method delegating to method defined in HSPI.\n\n=head2 expect\n\nSee L<Bio::Search::HSP::HSPI::expect()|Bio::Search::HSP::HSPI>\n\n\n#----------\nsub expect { shift->evalue( @_ ); }\n#----------\n\n\n=head2 rank\n\n Usage     : $hsp->rank( [string] );\n Purpose   : Get the rank of the HSP within a given Blast hit.\n Example   : $rank = $hsp->rank;\n Returns   : Integer (1..n) corresponding to the order in which the HSP\n             appears in the BLAST report.\n\n\n#'\n\n#----------\nsub rank { shift->{'_rank'} }\n#----------\n\n# For backward compatibility\n#----------\nsub name { shift->rank }\n#----------\n\n=head2 to_string\n\n Title   : to_string\n Usage   : print $hsp->to_string;\n Function: Returns a string representation for the Blast HSP.\n           Primarily intended for debugging purposes.\n Example : see usage\n Returns : A string of the form:\n           [BlastHSP] <rank>\n           e.g.:\n           [BlastHSP] 1\n Args    : None\n\n\n#----------\nsub to_string {\n#----------\n    my $self = shift;\n    return \"[BlastHSP] \" . 
$self->rank();\n}\n\n\n=head2 _set_data\n\n Usage     : called automatically during object construction.\n Purpose   : Parses the raw HSP section from a flat BLAST report and\n             sets the query sequence, sbjct sequence, and the \"match\" data\n           : which consists of the symbols between the query and sbjct lines\n           : in the alignment.\n Argument  : Array (all lines for a single, complete HSP, from a raw,\n             flat (i.e., non-XML) BLAST report)\n Throws    : Propagates any exceptions from the methods called (\"See Also\")\n\nSee Also   : L</_set_seq>, L</_set_score_stats>, L</_set_match_stats>\n\n\n#--------------\nsub _set_data {\n#--------------\n    my $self = shift;\n    my @data = @_;\n    my @queryList  = ();  # 'Query' = SEQUENCE USED TO QUERY THE DATABASE.\n    my @sbjctList  = ();  # 'Sbjct' = HOMOLOGOUS SEQUENCE FOUND IN THE DATABASE.\n    my @matchList  = ();\n    my $matchLine  = 0;   # Alternating boolean: when true, load 'match' data.\n    my @linedat = ();\n\n    #print STDERR \"BlastHSP: set_data()\\n\";\n\n    my($line, $aln_row_len, $length_diff);\n    $length_diff = 0;\n\n    # Collecting data for all lines in the alignment\n    # and then storing the collections for possible processing later.\n    #\n    # Note that \"match\" lines may not be properly padded with spaces.\n    # This loop now properly handles such cases:\n    # Query: 1141 PSLVELTIRDCPRLEVGPMIRSLPKFPMLKKLDLAVANIIEEDLDVIGSLEELVIXXXXX 1200\n    #             PSLVELTIRDCPRLEVGPMIRSLPKFPMLKKLDLAVANIIEEDLDVIGSLEELVI\n    # Sbjct: 1141 PSLVELTIRDCPRLEVGPMIRSLPKFPMLKKLDLAVANIIEEDLDVIGSLEELVILSLKL 1200\n\n    foreach $line( @data ) {\n\tnext if $line =~ /^\\s*$/;\n\n\tif( $line =~ /^ ?Score/ ) {\n\t    $self->_set_score_stats( $line );\n\t} elsif( $line =~ /^ ?(Identities|Positives|Strand)/ ) {\n\t    $self->_set_match_stats( $line );\n\t} elsif( $line =~ /^ ?Frame = ([\\d+-]+)/ ) {\n\t  # Version 2.0.8 has Frame information on a separate line.\n\t  # 
Storing frame according to SeqFeature::Generic::frame()\n\t  # which does not contain strand info (use strand()).\n\t  my $frame = abs($1) - 1;\n\t  $self->frame( $frame );\n\t} elsif( $line =~ /^(Query:?[\\s\\d]+)([^\\s\\d]+)/ ) {\n\t    push @queryList, $line;\n\t    $self->{'_match_indent'} = CORE::length $1;\n\t    $aln_row_len = (CORE::length $1) + (CORE::length $2);\n\t    $matchLine = 1;\n\t} elsif( $matchLine ) {\n\t    # Pad the match line with spaces if necessary.\n\t    $length_diff = $aln_row_len - CORE::length $line;\n\t    $length_diff and $line .= ' 'x $length_diff;\n\t    push @matchList, $line;\n\t    $matchLine = 0;\n\t} elsif( $line =~ /^Sbjct/ ) {\n\t    push @sbjctList, $line;\n\t}\n    }\n    # Storing the query and sbjct lists in case they are needed later.\n    # We could make this conditional to save memory.\n    $self->{'_queryList'} = \\@queryList;\n    $self->{'_sbjctList'} = \\@sbjctList;\n\n    # Storing the match list in case it is needed later.\n    $self->{'_matchList'} = \\@matchList;\n\n    if(not defined ($self->{'_numIdentical'})) {\n        my $id_str = $self->_id_str;\n        $self->throw( -text  => \"Can't parse match statistics. Possibly a new or unrecognized Blast format. ($id_str)\");\n    }\n\n    if(!scalar @queryList or !scalar @sbjctList) {\n        my $id_str = $self->_id_str;\n        $self->throw( \"Can't find query or sbjct alignment lines. Possibly unrecognized Blast format. 
($id_str)\");\n    }\n}\n\n\n=head2 _set_score_stats\n\n Usage     : called automatically by _set_data()\n Purpose   : Sets various score statistics obtained from the HSP listing.\n Argument  : String with any of the following formats:\n           : blast2:  Score = 30.1 bits (66), Expect = 9.2\n           : blast2:  Score = 158.2 bits (544), Expect(2) = e-110\n           : blast1:  Score = 410 (144.3 bits), Expect = 1.7e-40, P = 1.7e-40\n           : blast1:  Score = 55 (19.4 bits), Expect = 5.3, Sum P(3) = 0.99\n Throws    : Exception if the stats cannot be parsed, probably due to a change\n           : in the Blast report format.\n\nSee Also   : L</_set_data>\n\n\n#--------------------\nsub _set_score_stats {\n#--------------------\n    my ($self, $data) = @_;\n\n    my ($expect, $p);\n\n    if($data =~ /Score = +([\\d.e+-]+) bits \\(([\\d.e+-]+)\\), +Expect = +([\\d.e+-]+)/) {\n\t# blast2 format n = 1\n\t$self->bits($1);\n\t$self->score($2);\n\t$expect            = $3;\n    } elsif($data =~ /Score = +([\\d.e+-]+) bits \\(([\\d.e+-]+)\\), +Expect\\((\\d+)\\) = +([\\d.e+-]+)/) {\n\t# blast2 format n > 1\n\t$self->bits($1);\n\t$self->score($2);\n\t$self->{'_n'}      = $3;\n\t$expect            = $4;\n\n    } elsif($data =~ /Score = +([\\d.e+-]+) \\(([\\d.e+-]+) bits\\), +Expect = +([\\d.e+-]+), P = +([\\d.e-]+)/) {\n\t# blast1 format, n = 1\n\t$self->score($1);\n\t$self->bits($2);\n\t$expect            = $3;\n\t$p                 = $4;\n\n    } elsif($data =~ /Score = +([\\d.e+-]+) \\(([\\d.e+-]+) bits\\), +Expect = +([\\d.e+-]+), +Sum P\\((\\d+)\\) = +([\\d.e-]+)/) {\n\t# blast1 format, n > 1\n\t$self->score($1);\n\t$self->bits($2);\n\t$expect            = $3;\n\t$self->{'_n'}      = $4;\n\t$p                 = $5;\n\n    } else {\n        my $id_str = $self->_id_str;\n\t$self->throw(-class => 'Bio::Root::Exception',\n\t\t     -text => \"Can't parse score statistics: unrecognized format. 
($id_str)\",\n\t\t     -value => $data);\n    }\n    $expect = \"1$expect\" if $expect =~ /^e/i;\n    $p      = \"1$p\"      if defined $p and $p=~ /^e/i;\n\n    $self->{'_expect'} = $expect;\n    $self->{'_p'}      = $p || undef;\n    $self->significance( $p || $expect );\n}\n\n\n=head2 _set_match_stats\n\n Usage     : Private method; called automatically by _set_data()\n Purpose   : Sets various matching statistics obtained from the HSP listing.\n Argument  : blast2: Identities = 23/74 (31%), Positives = 29/74 (39%), Gaps = 17/74 (22%)\n           : blast2: Identities = 57/98 (58%), Positives = 74/98 (75%)\n           : blast1: Identities = 87/204 (42%), Positives = 126/204 (61%)\n           : blast1: Identities = 87/204 (42%), Positives = 126/204 (61%), Frame = -3\n           : WU-blast: Identities = 310/553 (56%), Positives = 310/553 (56%), Strand = Minus / Plus\n Throws    : Exception if the stats cannot be parsed, probably due to a change\n           : in the Blast report format.\n Comments  : The \"Gaps = \" data in the HSP header has a different meaning depending\n           : on the type of Blast: for BLASTP, this number is the total number of\n           : gaps in query+sbjct; for TBLASTN, it is the number of gaps in the\n           : query sequence only. Thus, it is safer to collect the data\n           : separately by examining the actual sequence strings as is done\n           : in _set_seq().\n\nSee Also   : L</_set_data>, L</_set_seq>\n\n\n#--------------------\nsub _set_match_stats {\n#--------------------\n    my ($self, $data) = @_;\n\n    if($data =~ m!Identities = (\\d+)/(\\d+)!) {\n      # blast1 or 2 format\n      $self->{'_numIdentical'} = $1;\n      $self->{'_totalLength'}  = $2;\n    }\n\n    if($data =~ m!Positives = (\\d+)/(\\d+)!) {\n      # blast1 or 2 format\n      $self->{'_numConserved'} = $1;\n      $self->{'_totalLength'}  = $2;\n    }\n\n    if($data =~ m!Frame = ([\\d+-]+)!) 
{\n      $self->frame($1);\n    }\n\n    # Strand data is not always present in this line.\n    # _set_seq() will also set strand information.\n    if($data =~ m!Strand = (\\w+) / (\\w+)!) {\n\t$self->{'_queryStrand'} = $1;\n\t$self->{'_sbjctStrand'} = $2;\n    }\n\n#    if($data =~ m!Gaps = (\\d+)/(\\d+)!) {\n#\t $self->{'_totalGaps'} = $1;\n#    } else {\n#\t $self->{'_totalGaps'} = 0;\n#    }\n}\n\n\n\n=head2 _set_seq_data\n\n Usage     : called automatically when sequence data is requested.\n Purpose   : Sets the HSP sequence data for both query and sbjct sequences.\n           : Includes: start, stop, length, gaps, and raw sequence.\n Argument  : n/a\n Throws    : Propagates any exception thrown by _set_match_seq()\n Comments  : Uses raw data stored by _set_data() during object construction.\n           : These data are not always needed, so it is conditionally\n           : executed only upon demand by methods such as gaps(), _set_residues(),\n           : etc. _set_seq() does the dirty work.\n\nSee Also   : L</_set_seq>\n\n\n#-----------------\nsub _set_seq_data {\n#-----------------\n    my $self = shift;\n\n    $self->_set_seq('query', @{$self->{'_queryList'}});\n    $self->_set_seq('sbjct', @{$self->{'_sbjctList'}});\n\n    # Liberate some memory.\n    @{$self->{'_queryList'}} = @{$self->{'_sbjctList'}} = ();\n    undef $self->{'_queryList'};\n    undef $self->{'_sbjctList'};\n\n    $self->{'_set_seq_data'} = 1;\n}\n\n\n\n=head2 _set_seq \n\n Usage     : called automatically by _set_seq_data()\n           : $hsp_obj->($seq_type, @data);\n Purpose   : Sets sequence information for both the query and sbjct sequences.\n           : Directly counts the number of gaps in each sequence (if gapped Blast).\n Argument  : $seq_type = 'query' or 'sbjct'\n           : @data = all seq lines with the form:\n           : Query: 61  SPHNVKDRKEQNGSINNAISPTATANTSGSQQINIDSALRDRSSNVAAQPSLSDASSGSN 120\n Throws    : Exception if data strings cannot be parsed, probably due to a 
change\n           : in the Blast report format.\n Comments  : Uses first argument to determine which data members to set\n           : making this method sensitive to data member name changes.\n           : Behavior is dependent on the type of BLAST analysis (TBLASTN, BLASTP, etc).\n Warning   : Sequence endpoints are normalized so that start < end. This affects HSPs\n           : for TBLASTN/X hits on the minus strand. Normalization facilitates use\n           : of range information by methods such as matches().\n\nSee Also   : L</_set_seq_data>, L</matches>, L</range>, L</start>, L</end>\n\n\n#-------------\nsub _set_seq {\n#-------------\n    my $self      = shift;\n    my $seqType   = shift;\n    my @data      = @_;\n    my @ranges    = ();\n    my @sequence  = ();\n    my $numGaps   = 0;\n\n    foreach( @data ) {\n        if( m/(\\d+) *([^\\d\\s]+) *(\\d+)/) {\n            push @ranges, ( $1, $3 ) ;\n            push @sequence, $2;\n        #print STDERR \"_set_seq found sequence \\\"$2\\\"\\n\";\n\t} else {\n\t    $self->warn(\"Bad sequence data: $_\");\n\t}\n    }\n\n    if( !(scalar(@sequence) and scalar(@ranges))) {\n        my $id_str = $self->_id_str;\n\t$self->throw(\"Can't set sequence: missing data. Possibly unrecognized Blast format. 
($id_str) $seqType\");\n   }\n\n    # Sensitive to member name changes.\n    $seqType = \"_\\L$seqType\\E\";\n    $self->{$seqType.'Start'} = $ranges[0];\n    $self->{$seqType.'Stop'}  = $ranges[ $#ranges ];\n    $self->{$seqType.'Seq'}   = \\@sequence;\n\n    $self->{$seqType.'Length'} = abs($ranges[ $#ranges ] - $ranges[0]) + 1;\n\n    # Adjust lengths for BLASTX, TBLASTN, TBLASTX sequences\n    # Converting nucl coords to amino acid coords.\n\n    my $prog = $self->algorithm;\n    if($prog eq 'TBLASTN' and $seqType eq '_sbjct') {\n\t$self->{$seqType.'Length'} /= 3;\n    } elsif($prog eq 'BLASTX' and $seqType eq '_query') {\n\t$self->{$seqType.'Length'} /= 3;\n    } elsif($prog eq 'TBLASTX') {\n\t$self->{$seqType.'Length'} /= 3;\n    }\n\n    if( $prog ne 'BLASTP' ) {\n        $self->{$seqType.'Strand'} = 'Plus' if $prog =~ /BLASTN/;\n        $self->{$seqType.'Strand'} = 'Plus' if ($prog =~ /BLASTX/ and $seqType eq '_query');\n        # Normalize sequence endpoints so that start < end.\n        # Reverse complement or 'minus strand' HSPs get flipped here.\n        if($self->{$seqType.'Start'} > $self->{$seqType.'Stop'}) {\n            ($self->{$seqType.'Start'}, $self->{$seqType.'Stop'}) =\n                ($self->{$seqType.'Stop'}, $self->{$seqType.'Start'});\n            $self->{$seqType.'Strand'} = 'Minus';\n        }\n    }\n\n    ## Count number of gaps in each seq. Only need to do this for gapped Blasts.\n#    if($self->{'_gapped'}) {\n\tmy $seqstr = join('', @sequence);\n\t$seqstr =~ s/\\s//g;\n        my $num_gaps = CORE::length($seqstr) - $self->{$seqType.'Length'};\n\t$self->{$seqType.'Gaps'} = $num_gaps if $num_gaps > 0;\n#    }\n}\n\n\n=head2 _set_residues\n\n Usage     : called automatically when residue data is requested.\n Purpose   : Sets the residue numbers representing the identical and\n           : conserved positions. 
These data are obtained by analyzing the\n           : symbols between query and sbjct lines of the alignments.\n Argument  : n/a\n Throws    : Propagates any exception thrown by _set_seq_data() and _set_match_seq().\n Comments  : These data are not always needed, so it is conditionally\n           : executed only upon demand by methods such as seq_inds().\n           : Behavior is dependent on the type of BLAST analysis (TBLASTN, BLASTP, etc).\n\nSee Also   : L</_set_seq_data>, L</_set_match_seq>, L</seq_inds>\n\n\n#------------------\nsub _set_residues {\n#------------------\n    my $self      = shift;\n    my @sequence  = ();\n\n    $self->_set_seq_data() unless $self->{'_set_seq_data'};\n\n    # Using hashes to avoid saving duplicate residue numbers.\n    my %identicalList_query = ();\n    my %identicalList_sbjct = ();\n    my %conservedList_query = ();\n    my %conservedList_sbjct = ();\n\n    my $aref = $self->_set_match_seq() if not ref $self->{'_matchSeq'};\n    $aref  ||= $self->{'_matchSeq'};\n    my $seqString = join('', @$aref );\n\n    my $qseq = join('',@{$self->{'_querySeq'}});\n    my $sseq = join('',@{$self->{'_sbjctSeq'}});\n    my $resCount_query = $self->{'_queryStop'} || 0;\n    my $resCount_sbjct = $self->{'_sbjctStop'} || 0;\n\n    my $prog = $self->algorithm;\n    if($prog !~ /^BLASTP|^BLASTN/) {\n\tif($prog eq 'TBLASTN') {\n\t    $resCount_sbjct /= 3;\n\t} elsif($prog eq 'BLASTX') {\n\t    $resCount_query /= 3;\n\t} elsif($prog eq 'TBLASTX') {\n\t    $resCount_query /= 3;\n\t    $resCount_sbjct /= 3;\n\t}\n    }\n\n    my ($mchar, $schar, $qchar);\n    while( $mchar = chop($seqString) ) {\n\t($qchar, $schar) = (chop($qseq), chop($sseq));\n\tif( $mchar eq '+' ) {\n\t    $conservedList_query{ $resCount_query } = 1;\n\t    $conservedList_sbjct{ $resCount_sbjct } = 1;\n\t} elsif( $mchar ne ' ' ) {\n\t    $identicalList_query{ $resCount_query } = 1;\n\t    $identicalList_sbjct{ $resCount_sbjct } = 1;\n\t}\n\t$resCount_query-- if $qchar ne 
$GAP_SYMBOL;\n\t$resCount_sbjct-- if $schar ne $GAP_SYMBOL;\n    }\n    $self->{'_identicalRes_query'} = \\%identicalList_query;\n    $self->{'_conservedRes_query'} = \\%conservedList_query;\n    $self->{'_identicalRes_sbjct'} = \\%identicalList_sbjct;\n    $self->{'_conservedRes_sbjct'} = \\%conservedList_sbjct;\n\n}\n\n=head2 _set_match_seq\n\n Usage     : $hsp_obj->_set_match_seq()\n Purpose   : Set the 'match' sequence for the current HSP (symbols in between\n           : the query and sbjct lines.)\n Returns   : Array reference holding the match sequence lines.\n Argument  : n/a\n Throws    : Exception if the _matchList field is not set.\n Comments  : The match information is not always necessary. This method\n           : allows it to be conditionally prepared.\n           : Called by _set_residues() and seq_str().\n\nSee Also   : L</_set_residues>, L</seq_str>\n\n\n#-------------------\nsub _set_match_seq {\n#-------------------\n    my $self = shift;\n\n    if( ! ref($self->{'_matchList'}) ) {\n        my $id_str = $self->_id_str;\n        $self->throw(\"Can't set HSP match sequence: No data ($id_str)\");\n    }\n\n    my @data = @{$self->{'_matchList'}};\n\n    my(@sequence);\n    foreach( @data ) {\n\tchomp($_);\n\t## Remove leading spaces; (note: aln may begin with a space\n\t## which is why we can't use s/^ +//).\n\ts/^ {$self->{'_match_indent'}}//;\n\tpush @sequence, $_;\n    }\n    # Liberate some memory.\n    @{$self->{'_matchList'}} = undef;\n    $self->{'_matchList'} = undef;\n\n    $self->{'_matchSeq'} = \\@sequence;\n\n    return $self->{'_matchSeq'};\n}\n\n\n=head2 n\n\n Usage     : $hsp_obj->n()\n Purpose   : Get the N value (num HSPs on which P/Expect is based).\n           : This value is not defined with NCBI Blast2 with gapping.\n Returns   : Integer or null string if not defined.\n Argument  : n/a\n Throws    : n/a\n Comments  : The 'N' value is listed in parentheses with P/Expect value:\n           : e.g., P(3) = 1.2e-30  ---> (N = 
3).\n           : Not defined in NCBI Blast2 with gaps.\n           : This typically is equal to the number of HSPs but not always.\n           : To obtain the number of HSPs, use Bio::Search::Hit::BlastHit::num_hsps().\n\nSee Also   : L<Bio::SeqFeature::SimilarityPair::score()|Bio::SeqFeature::SimilarityPair>\n\n\n#-----\nsub n { my $self = shift; $self->{'N'} || ''; }\n#-----\n\n\n=head2 matches\n\n Usage     : $hsp->matches([seq_type], [start], [stop]);\n Purpose   : Get the total number of identical and conservative matches\n           : in the query or sbjct sequence for the given HSP. Optionally can\n           : report data within a defined interval along the seq.\n           : (Note: 'conservative' matches are called 'positives' in the\n\t   : Blast report.)\n Example   : ($id,$cons) = $hsp_object->matches('hit');\n           : ($id,$cons) = $hsp_object->matches('query',300,400);\n Returns   : 2-element array of integers\n Argument  : (1) seq_type = 'query' or 'hit' or 'sbjct' (default = query)\n           :  ('sbjct' is synonymous with 'hit')\n           : (2) start = Starting coordinate (optional)\n           : (3) stop  = Ending coordinate (optional)\n Throws    : Exception if the supplied coordinates are out of range.\n Comments  : Relies on seq_str('match') to get the string of alignment symbols\n           : between the query and sbjct lines which are used for determining\n           : the number of identical and conservative matches.\n\nSee Also   : L</length>, L</gaps>, L</seq_str>, L<Bio::Search::Hit::BlastHit::_adjust_contigs()|Bio::Search::Hit::BlastHit>\n\n\n#-----------\nsub matches {\n#-----------\n    my( $self, %param ) = @_;\n    my(@data);\n    my($seqType, $beg, $end) = ($param{-SEQ}, $param{-START}, $param{-STOP});\n    $seqType ||= 'query';\n    $seqType = 'sbjct' if $seqType eq 'hit';\n\n    my($start,$stop);\n\n    if(!defined $beg && !defined $end) {\n\t## Get data for the whole alignment.\n\tpush @data, ($self->{'_numIdentical'}, 
$self->{'_numConserved'});\n    } else {\n\t## Get the substring representing the desired sub-section of aln.\n\t$beg ||= 0;\n\t$end ||= 0;\n\t($start,$stop) = $self->range($seqType);\n\tif($beg == 0) { $beg = $start; $end = $beg+$end; }\n\telsif($end == 0) { $end = $stop; $beg = $end-$beg; }\n\n\tif($end >= $stop) { $end = $stop; } ##ML changed from if (end >stop)\n\telse { $end += 1;}   ##ML moved from commented position below, makes\n                             ##more sense here\n#\tif($end > $stop) { $end = $stop; }\n\tif($beg < $start) { $beg = $start; }\n#\telse { $end += 1;}\n\n#\tmy $seq = substr($self->seq_str('match'), $beg-$start, ($end-$beg));\n\n\t## ML: START fix for substr out of range error ------------------\n\tmy $seq = \"\";\n        my $prog = $self->algorithm;\n\tif (($prog eq 'TBLASTN') and ($seqType eq 'sbjct'))\n\t{\n\t    $seq = substr($self->seq_str('match'),\n\t\t\t  int(($beg-$start)/3), int(($end-$beg+1)/3));\n\n\t} elsif (($prog eq 'BLASTX') and ($seqType eq 'query'))\n\t{\n\t    $seq = substr($self->seq_str('match'),\n\t\t\t  int(($beg-$start)/3), int(($end-$beg+1)/3));\n\t} else {\n\t    $seq = substr($self->seq_str('match'),\n\t\t\t  $beg-$start, ($end-$beg));\n\t}\n\t## ML: End of fix for  substr out of range error -----------------\n\n\n\t## ML: debugging code\n\t## This is where we get our exception.  
Try printing out the values going\n\t## into this:\n\t##\n#\t print STDERR\n#\t     qq(*------------MY EXCEPTION --------------------\\nSeq: \") ,\n#\t     $self->seq_str(\"$seqType\"), qq(\"\\n),$self->rank,\",(  index:\";\n#\t print STDERR  $beg-$start, \", len: \", $end-$beg,\" ), (HSPRealLen:\",\n#\t     CORE::length $self->seq_str(\"$seqType\");\n#\t print STDERR \", HSPCalcLen: \", $stop - $start +1 ,\" ),\n#\t     ( beg: $beg, end: $end ), ( start: $start, stop: stop )\\n\";\n\t ## ML: END DEBUGGING CODE----------\n\n\tif(!CORE::length $seq) {\n            my $id_str = $self->_id_str;\n\t    $self->throw(\"Undefined $seqType sub-sequence ($beg,$end). Valid range = $start - $stop ($id_str)\");\n\t}\n\t## Get data for a substring.\n#\tprintf \"Collecting HSP subsection data: beg,end = %d,%d; start,stop = %d,%d\\n%s<---\\n\", $beg, $end, $start, $stop, $seq;\n#\tprintf \"Original match seq:\\n%s\\n\",$self->seq_str('match');\n\t$seq =~ s/ //g;  # remove space (no info).\n\tmy $len_cons = CORE::length $seq;\n\t$seq =~ s/\\+//g;  # remove '+' characters (conservative substitutions)\n\tmy $len_id = CORE::length $seq;\n\tpush @data, ($len_id, $len_cons);\n#\tprintf \"  HSP = %s\\n  id = %d; cons = %d\\n\", $self->rank, $len_id, $len_cons; <STDIN>;\n    }\n    @data;\n}\n\n\n=head2 num_identical\n\n Usage     : $hsp_object->num_identical();\n Purpose   : Get the number of identical positions within the given HSP.\n Example   : $num_iden = $hsp_object->num_identical();\n Returns   : integer\n Argument  : n/a\n Throws    : n/a\n\nSee Also   : L</num_conserved>, L</frac_identical>\n\n\n#-------------------\nsub num_identical {\n#-------------------\n    my( $self) = shift;\n\n    $self->{'_numIdentical'};\n}\n\n\n=head2 num_conserved\n\n Usage     : $hsp_object->num_conserved();\n Purpose   : Get the number of conserved positions within the given HSP.\n Example   : $num_iden = $hsp_object->num_conserved();\n Returns   : integer\n Argument  : n/a\n Throws    : 
n/a\n\nSee Also   : L</num_identical>, L</frac_conserved>\n\n\n#-------------------\nsub num_conserved {\n#-------------------\n    my( $self) = shift;\n\n    $self->{'_numConserved'};\n}\n\n\n\n=head2 range\n\n Usage     : $hsp->range( [seq_type] );\n Purpose   : Gets the (start, end) coordinates for the query or sbjct sequence\n           : in the HSP alignment.\n Example   : ($query_beg, $query_end) = $hsp->range('query');\n           : ($hit_beg, $hit_end) = $hsp->range('hit');\n Returns   : Two-element array of integers\n Argument  : seq_type = string, 'query' or 'hit' or 'sbjct'  (default = 'query')\n           :  ('sbjct' is synonymous with 'hit')\n Throws    : n/a\n\nSee Also   : L</start>, L</end>\n\n\n#----------\nsub range {\n#----------\n    my ($self, $seqType) = @_;\n\n    $self->_set_seq_data() unless $self->{'_set_seq_data'};\n\n    $seqType ||= 'query';\n    $seqType = 'sbjct' if $seqType eq 'hit';\n\n    ## Sensitive to member name changes.\n    $seqType = \"_\\L$seqType\\E\";\n\n    return ($self->{$seqType.'Start'},$self->{$seqType.'Stop'});\n}\n\n=head2 start\n\n Usage     : $hsp->start( [seq_type] );\n Purpose   : Gets the start coordinate for the query, sbjct, or both sequences\n           : in the HSP alignment.\n           : NOTE: Start will always be less than end.\n           : To determine strand, use $hsp->strand()\n Example   : $query_beg = $hsp->start('query');\n           : $hit_beg = $hsp->start('hit');\n           : ($query_beg, $hit_beg) = $hsp->start();\n Returns   : scalar context: integer\n           : array context without args: list of two integers\n Argument  : In scalar context: seq_type = 'query' or 'hit' or 'sbjct' (default= 'query')\n           :  ('sbjct' is synonymous with 'hit')\n           : Array context can be \"induced\" by providing an argument of 'list' or 'array'.\n Throws    : n/a\n\nSee Also   : L</end>, L</range>\n\n\n#----------\nsub start {\n#----------\n    my ($self, $seqType) = @_;\n\n    $seqType ||= 
(wantarray ? 'list' : 'query');\n    $seqType = 'sbjct' if $seqType eq 'hit';\n\n    $self->_set_seq_data() unless $self->{'_set_seq_data'};\n\n    if($seqType =~ /list|array/i) {\n\treturn ($self->{'_queryStart'}, $self->{'_sbjctStart'});\n    } else {\n\t## Sensitive to member name changes.\n\t$seqType = \"_\\L$seqType\\E\";\n\treturn $self->{$seqType.'Start'};\n    }\n}\n\n=head2 end\n\n Usage     : $hsp->end( [seq_type] );\n Purpose   : Gets the end coordinate for the query, sbjct, or both sequences\n           : in the HSP alignment.\n           : NOTE: Start will always be less than end.\n           : To determine strand, use $hsp->strand()\n Example   : $query_end = $hsp->end('query');\n           : $hit_end = $hsp->end('hit');\n           : ($query_end, $hit_end) = $hsp->end();\n Returns   : scalar context: integer\n           : array context without args: list of two integers\n Argument  : In scalar context: seq_type = 'query' or 'hit' or 'sbjct' (default= 'query')\n           :  ('sbjct' is synonymous with 'hit')\n           : Array context can be \"induced\" by providing an argument of 'list' or 'array'.\n Throws    : n/a\n\nSee Also   : L</start>, L</range>, L</strand>\n\n\n#----------\nsub end {\n#----------\n    my ($self, $seqType) = @_;\n\n    $seqType ||= (wantarray ? 
'list' : 'query');\n    $seqType = 'sbjct' if $seqType eq 'hit';\n\n    $self->_set_seq_data() unless $self->{'_set_seq_data'};\n\n    if($seqType =~ /list|array/i) {\n\treturn ($self->{'_queryStop'}, $self->{'_sbjctStop'});\n    } else {\n\t## Sensitive to member name changes.\n\t$seqType = \"_\\L$seqType\\E\";\n\treturn $self->{$seqType.'Stop'};\n    }\n}\n\n\n\n=head2 strand\n\n Usage     : $hsp_object->strand( [seq_type] )\n Purpose   : Get the strand of the query or sbjct sequence.\n Example   : print $hsp->strand('query');\n           : ($query_strand, $hit_strand) = $hsp->strand();\n Returns   : -1, 0, or 1\n           : -1 = Minus strand, +1 = Plus strand\n           : Returns 0 if strand is not defined, which occurs\n           : for BLASTP reports, the query of TBLASTN reports,\n           : and the hit of BLASTX reports.\n           : In scalar context without arguments, returns queryStrand value.\n           : In array context without arguments, returns a two-element list\n           :    of integers (queryStrand, sbjctStrand).\n           : Array context can be \"induced\" by providing an argument of 'list' or 'array'.\n Argument  : seq_type: 'query' or 'hit' or 'sbjct' or undef\n           :  ('sbjct' is synonymous with 'hit')\n Throws    : n/a\n\nSee Also   : L</_set_seq>, L</_set_match_stats>\n\n\n#-----------\nsub strand {\n#-----------\n    my( $self, $seqType ) = @_;\n\n    $seqType  ||= (wantarray ? 
'list' : 'query');\n    $seqType = 'sbjct' if $seqType eq 'hit';\n\n    ## Sensitive to member name format.\n    $seqType = \"_\\L$seqType\\E\";\n\n    # $seqType could be '_list'.\n    $self->{'_queryStrand'} or $self->_set_seq_data() unless $self->{'_set_seq_data'};\n\n    my $prog = $self->algorithm;\n\n    if($seqType  =~ /list|array/i) {\n        my ($qstr, $hstr);\n        if( $prog eq 'BLASTP') {\n            $qstr = 0;\n            $hstr = 0;\n        }\n        elsif( $prog eq 'TBLASTN') {\n            $qstr = 0;\n            $hstr = $STRAND_SYMBOL{$self->{'_sbjctStrand'}};\n        }\n        elsif( $prog eq 'BLASTX') {\n            $qstr = $STRAND_SYMBOL{$self->{'_queryStrand'}};\n            $hstr = 0;\n        }\n        else {\n            $qstr = $STRAND_SYMBOL{$self->{'_queryStrand'}} if defined $self->{'_queryStrand'};\n            $hstr = $STRAND_SYMBOL{$self->{'_sbjctStrand'}} if defined $self->{'_sbjctStrand'};\n        }\n        $qstr ||= 0;\n        $hstr ||= 0;\n\treturn ($qstr, $hstr);\n    }\n    local $^W = 0;\n    $STRAND_SYMBOL{$self->{$seqType.'Strand'}} || 0;\n}\n\n\n=head2 seq\n\n Usage     : $hsp->seq( [seq_type] );\n Purpose   : Get the query or sbjct sequence as a Bio::Seq.pm object.\n Example   : $seqObj = $hsp->seq('query');\n Returns   : Object reference for a Bio::Seq.pm object.\n Argument  : seq_type = 'query' or 'hit' or 'sbjct' (default = 'query').\n           :  ('sbjct' is synonymous with 'hit')\n Throws    : Propagates any exception that occurs during construction\n           : of the Bio::Seq.pm object.\n Comments  : The sequence is returned in an array of strings corresponding\n           : to the strings in the original format of the Blast alignment.\n           : (i.e., same spacing).\n\nSee Also   : L</seq_str>, L</seq_inds>, L<Bio::Seq>\n\n\n#-------\nsub seq {\n#-------\n    my($self,$seqType) = @_;\n    $seqType ||= 'query';\n    $seqType = 'sbjct' if $seqType eq 'hit';\n    my $str = 
$self->seq_str($seqType);\n\n    require Bio::Seq;\n\n    Bio::Seq->new(-ID   => $self->to_string,\n\t\t  -SEQ  => $str,\n\t\t  -DESC => \"$seqType sequence\",\n\t\t  );\n}\n\n=head2 seq_str\n\n Usage     : $hsp->seq_str( seq_type );\n Purpose   : Get the full query, sbjct, or 'match' sequence as a string.\n           : The 'match' sequence is the string of symbols in between the\n           : query and sbjct sequences.\n Example   : $str = $hsp->seq_str('query');\n Returns   : String\n Argument  : seq_Type = 'query' or 'hit' or 'sbjct' or 'match'\n           :  ('sbjct' is synonymous with 'hit')\n Throws    : Exception if the argument does not match an accepted seq_type.\n Comments  : Calls _set_seq_data() to set the 'match' sequence if it has\n           : not been set already.\n\nSee Also   : L</seq>, L</seq_inds>, L</_set_match_seq>"},"containerName":"main::","definition":"sub","line":1533,"children":[{"localvar":"my","name":"$self","definition":"my","containerName":"seq_str","line":1535,"kind":13},{"line":1535,"kind":13,"containerName":"seq_str","name":"$seqType"},{"line":1537,"kind":13,"containerName":"seq_str","name":"$seqType"},{"line":1538,"kind":13,"name":"$seqType","containerName":"seq_str"},{"line":1538,"kind":13,"name":"$seqType","containerName":"seq_str"},{"kind":13,"line":1540,"containerName":"seq_str","name":"$seqType"},{"line":1542,"kind":13,"containerName":"seq_str","name":"$self"},{"name":"_set_seq_data","containerName":"seq_str","line":1542,"kind":12},{"containerName":"seq_str","name":"$self","line":1542,"kind":13},{"name":"$seqType","containerName":"seq_str","line":1544,"kind":13},{"containerName":"seq_str","name":"$seq","definition":"my","localvar":"my","kind":13,"line":1545},{"containerName":"seq_str","name":"$self","line":1545,"kind":13},{"containerName":"seq_str","name":"$seqType","line":1545,"kind":13},{"kind":13,"line":1546,"name":"$seq","containerName":"seq_str"},{"kind":13,"line":1547,"name":"$seq","containerName":"seq_str"}],"kind":12,"
range":{"start":{"line":1533,"character":0},"end":{"character":9999,"line":1549}},"name":"seq_str"},{"line":1549,"kind":13,"containerName":null,"name":"%seqType"},{"localvar":"my","containerName":null,"name":"$aref","definition":"my","line":1551,"kind":13},{"kind":13,"line":1551,"containerName":null,"name":"$self"},{"name":"_set_match_seq","containerName":"main::","line":1551,"kind":12},{"name":"%self","containerName":null,"line":1551,"kind":13},{"name":"$aref","containerName":null,"kind":13,"line":1552},{"containerName":null,"name":"%self","kind":13,"line":1552},{"kind":13,"line":1554,"containerName":null,"name":"%aref"},{"kind":13,"line":1557,"containerName":null,"name":"$id_str","definition":"my","localvar":"my"},{"kind":13,"line":1557,"name":"$self","containerName":null},{"kind":12,"line":1557,"containerName":"main::","name":"_id_str"},{"kind":13,"line":1558,"containerName":null,"name":"$self"},{"name":"throw","containerName":"main::","kind":12,"line":1558},{"containerName":null,"name":"$seqType","line":1561,"kind":13},{"range":{"start":{"line":1593,"character":0},"end":{"line":1611,"character":9999}},"name":"seq_inds","line":1593,"children":[{"containerName":"seq_inds","definition":"my","name":"$self","localvar":"my","kind":13,"line":1595},{"containerName":"seq_inds","name":"$seqType","kind":13,"line":1595},{"line":1595,"kind":13,"containerName":"seq_inds","name":"$class"},{"containerName":"seq_inds","name":"$collapse","line":1595,"kind":13},{"containerName":"seq_inds","name":"$seqType","line":1597,"kind":13},{"line":1598,"kind":13,"name":"$class","containerName":"seq_inds"},{"kind":13,"line":1599,"name":"$collapse","containerName":"seq_inds"},{"kind":13,"line":1600,"containerName":"seq_inds","name":"$seqType"},{"containerName":"seq_inds","name":"$seqType","line":1600,"kind":13},{"kind":13,"line":1602,"containerName":"seq_inds","name":"$self"},{"name":"_set_residues","containerName":"seq_inds","line":1602,"kind":12},{"containerName":"seq_inds","name":"$self","k
ind":13,"line":1602},{"line":1604,"kind":13,"containerName":"seq_inds","name":"$seqType"},{"name":"$seqType","containerName":"seq_inds","line":1604,"kind":13},{"kind":13,"line":1605,"containerName":"seq_inds","name":"$class"},{"line":1605,"kind":13,"name":"$class","containerName":"seq_inds"},{"line":1608,"kind":13,"name":"$seqType","containerName":"seq_inds"},{"containerName":"seq_inds","name":"$class","line":1609,"kind":13},{"definition":"my","name":"@ary","containerName":"seq_inds","localvar":"my","kind":13,"line":1611},{"name":"$a","containerName":"seq_inds","kind":13,"line":1611},{"name":"$b","containerName":"seq_inds","line":1611,"kind":13},{"containerName":"seq_inds","name":"$self","kind":13,"line":1611}],"kind":12,"detail":"($self,$seqType,$class,$collapse)","signature":{"label":"seq_inds($self,$seqType,$class,$collapse)","documentation":"1;\n#-----------------------------------------------------------------\n# $Id: BlastHSP.pm 16123 2009-09-17 12:57:27Z cjfields $\n#\n# BioPerl module Bio::Search::HSP::BlastHSP\n#\n# (This module was originally called Bio::Tools::Blast::HSP)\n#\n# Please direct questions and support issues to <bioperl-l@bioperl.org> \n#\n# Cared for by Steve Chervitz <sac@bioperl.org>\n#\n# You may distribute this module under the same terms as perl itself\n#-----------------------------------------------------------------\n\n## POD Documentation:\n\n=head1 NAME\n\nBio::Search::HSP::BlastHSP - Bioperl BLAST High-Scoring Pair object\n\n=head1 SYNOPSIS\n\nSee L<Bio::Search::Hit::BlastHit>.\n\n=head1 DESCRIPTION\n\nA Bio::Search::HSP::BlastHSP object provides an interface to data\nobtained in a single alignment section of a Blast report (known as a\n\"High-scoring Segment Pair\"). 
This is essentially a pairwise\nalignment with score information.\n\nBlastHSP objects are accessed via L<Bio::Search::Hit::BlastHit>\nobjects after parsing a BLAST report using the L<Bio::SearchIO>\nsystem.\n\nThe construction of BlastHSP objects is performed by\nBio::Factory::BlastHitFactory in a process that is\norchestrated by the Blast parser (L<Bio::SearchIO::psiblast>).\nThe resulting BlastHSPs are then accessed via\nL<Bio::Search::Hit::BlastHit>). Therefore, you do not need to\nuse L<Bio::Search::HSP::BlastHSP>) directly. If you need to construct\nBlastHSPs directly, see the new() function for details.\n\nFor L<Bio::SearchIO> BLAST parsing usage examples, see the\nC<examples/searchio> directory of the Bioperl distribution.\n\n=head2 Start and End coordinates\n\nSequence endpoints are swapped so that start is always less than\nend. This affects For TBLASTN/X hits on the minus strand. Strand\ninformation can be recovered using the strand() method. This\nnormalization step is standard Bioperl practice. It also facilitates\nuse of range information by methods such as match().\n\n=over 1\n\n* * Supports BLAST versions 1.x and 2.x, gapped and ungapped.\n\n\nBio::Search::HSP::BlastHSP.pm has the ability to extract a list of all\nresidue indices for identical and conservative matches along both\nquery and sbjct sequences. Since this degree of detail is not always\nneeded, this behavior does not occur during construction of the BlastHSP\nobject.  These data will automatically be collected as necessary as\nthe BlastHSP.pm object is used.\n\n=head1 DEPENDENCIES\n\nBio::Search::HSP::BlastHSP.pm is a concrete class that inherits from\nL<Bio::SeqFeature::SimilarityPair> and L<Bio::Search::HSP::HSPI>.\nL<Bio::Seq> and L<Bio::SimpleAlign> are employed for creating\nsequence and alignment objects, respectively.\n\n=head2 Relationship to SimpleAlign.pm & Seq.pm\n\nBlastHSP.pm can provide the query or sbjct sequence as a L<Bio::Seq>\nobject via the L<seq()|seq> method. 
The BlastHSP.pm object can also create a\ntwo-sequence L<Bio::SimpleAlign> alignment object using the query\nand sbjct sequences via the L<get_aln()|get_aln> method. Creation of alignment\nobjects is not automatic when constructing the BlastHSP.pm object since\nthis level of functionality is not always required and would generate\na lot of extra overhead when crunching many reports.\n\n\n=head1 FEEDBACK\n\n=head2 Mailing Lists\n\nUser feedback is an integral part of the evolution of this and other\nBioperl modules.  Send your comments and suggestions preferably to one\nof the Bioperl mailing lists.  Your participation is much appreciated.\n\n  bioperl-l@bioperl.org                  - General discussion\n  http://bioperl.org/wiki/Mailing_lists  - About the mailing lists\n\n=head2 Support \n\nPlease direct usage questions or support issues to the mailing list:\n\nI<bioperl-l@bioperl.org>\n\nrather than to the module maintainer directly. Many experienced and \nresponsive experts will be able to look at the problem and quickly \naddress it. Please include a thorough description of the problem \nwith code and data examples if at all possible.\n\n=head2 Reporting Bugs\n\nReport bugs to the Bioperl bug tracking system to help us keep track\nof the bugs and their resolution. Bug reports can be submitted via the\nweb:\n\n  http://bugzilla.open-bio.org/\n\n=head1 AUTHOR\n\nSteve Chervitz E<lt>sac-at-bioperl.orgE<gt>\n\nSee L<the FEEDBACK section | FEEDBACK> for where to send bug reports and comments.\n\n=head1 ACKNOWLEDGEMENTS\n\nThis software was originally developed in the Department of Genetics\nat Stanford University. 
I would also like to acknowledge my\ncolleagues at Affymetrix for useful feedback.\n\n=head1 SEE ALSO\n\n Bio::Search::Hit::BlastHit.pm          - Blast hit object.\n Bio::Search::Result::BlastResult.pm    - Blast Result object.\n Bio::Seq.pm                            - Biosequence object\n\n=head2 Links:\n\n http://bio.perl.org/                       - Bioperl Project Homepage\n\n=head1 COPYRIGHT\n\nCopyright (c) 1996-2001 Steve Chervitz. All Rights Reserved.\n\n=head1 DISCLAIMER\n\nThis software is provided \"as is\" without warranty of any kind.\n\n\n\n# END of main POD documentation.\n\n=head1 APPENDIX\n\nThe rest of the documentation details each of the object methods.\nInternal methods are usually preceded with a _\n\n\n# Let the code begin...\n\npackage Bio::Search::HSP::BlastHSP;\n\nuse strict;\nuse Bio::SeqFeature::Similarity;\n\nuse vars qw($GAP_SYMBOL %STRAND_SYMBOL);\n\nuse overload\n    '\"\"' => \\&to_string;\n\nuse base qw(Bio::SeqFeature::SimilarityPair Bio::Search::HSP::HSPI);\n\n$GAP_SYMBOL    = '-';          # Need a more general way to handle gap symbols.\n%STRAND_SYMBOL = ('Plus' => 1, 'Minus' => -1 );\n\n\n=head2 new\n\n Usage     : $hsp = Bio::Search::HSP::BlastHSP->new( %named_params );\n           : Bio::Search::HSP::BlastHSP objects are constructed\n           : automatically by Bio::SearchIO::BlastHitFactory,\n           : so there is no need for direct instantiation.\n Purpose   : Constructs a new BlastHSP object and Initializes key variables\n           : for the HSP.\n Returns   : A Bio::Search::HSP::BlastHSP object\n Argument  : Named parameters:\n           : Parameter keys are case-insensitive.\n           :      -RAW_DATA  => array ref containing raw BLAST report data for\n           :                    for a single HSP. 
This includes all lines\n           :                    of the HSP alignment from a traditional BLAST\n                                or PSI-BLAST (non-XML) report,\n           :      -RANK         => integer (1..n).\n           :      -PROGRAM      => string ('TBLASTN', 'BLASTP', etc.).\n           :      -QUERY_NAME   => string, id of query sequence\n           :      -HIT_NAME     => string, id of hit sequence\n           :\n Comments  : Having the raw data allows this object to do lazy parsing of\n           : the raw HSP data (i.e., not parsed until needed).\n           :\n           : Note that there is a fair amount of basic parsing that is\n           : currently performed in this module that would be more appropriate\n           : to do within a separate factory object.\n           : This parsing code will likely be relocated and more initialization\n           : parameters will be added to new().\n           :\nSee Also   : L<Bio::SeqFeature::SimilarityPair::new()>, L<Bio::SeqFeature::Similarity::new()>\n\n\n#----------------\nsub new {\n#----------------\n    my ($class, @args ) = @_;\n\n    my $self = $class->SUPER::new( @args );\n    # Initialize placeholders\n    $self->{'_queryGaps'} = $self->{'_sbjctGaps'} = 0;\n    my ($raw_data, $qname, $hname, $qlen, $hlen);\n\n    ($self->{'_prog'}, $self->{'_rank'}, $raw_data,\n     $qname, $hname) =\n\t $self->_rearrange([qw( PROGRAM\n\t\t\t\tRANK\n\t\t\t\tRAW_DATA\n\t\t\t\tQUERY_NAME\n\t\t\t\tHIT_NAME\n\t\t\t\t)], @args );\n\n    # _set_data() does a fair amount of parsing.\n    # This will likely change (see comment above.)\n    $self->_set_data( @{$raw_data} );\n    # Store the aligned query as sequence feature\n    my ($qb, $hb) = ($self->start());\n    my ($qe, $he) = ($self->end());\n    my ($qs, $hs) = ($self->strand());\n    my ($qf,$hf) = ($self->query->frame(),\n\t\t    $self->hit->frame);\n\n    $self->query( Bio::SeqFeature::Similarity->new (-start   =>$qb,\n\t\t\t\t\t\t    -end     
=>$qe,\n\t\t\t\t\t\t    -strand  =>$qs,\n\t\t\t\t\t\t    -bits    =>$self->bits,\n\t\t\t\t\t\t    -score   =>$self->score,\n\t\t\t\t\t\t    -frame   =>$qf,\n\t\t\t\t\t\t    -seq_id  => $qname,\n\t\t\t\t\t\t    -source  =>$self->{'_prog'} ));\n\n    $self->hit( Bio::SeqFeature::Similarity->new (-start   =>$hb,\n\t\t\t\t\t\t  -end     =>$he,\n\t\t\t\t\t\t  -strand  =>$hs,\n\t\t\t\t\t\t  -bits    =>$self->bits,\n\t\t\t\t\t\t  -score   =>$self->score,\n                                                  -frame   =>$hf,\n\t\t\t\t\t\t  -seq_id  => $hname,\n\t\t\t\t\t\t  -source  =>$self->{'_prog'} ));\n\n    # set lengths\n    $self->query->seqlength($qlen); # query\n    $self->hit->seqlength($hlen); # subject\n\n    $self->query->frac_identical($self->frac_identical('query'));\n    $self->hit->frac_identical($self->frac_identical('hit'));\n    return $self;\n}\n\n#sub DESTROY {\n#    my $self = shift;\n#    #print STDERR \"--->DESTROYING $self\\n\";\n#}\n\n\n# Title   : _id_str;\n# Purpose : Intended for internal use only to provide a string for use\n#           within exception messages to help users figure out which\n#           query/hit caused the problem.\n# Returns : Short string with name of query and hit seq\nsub _id_str {\n    my $self = shift;\n    if( not defined $self->{'_id_str'}) {\n        my $qname = $self->query->seq_id;\n        my $hname = $self->hit->seq_id;\n        $self->{'_id_str'} = \"QUERY=\\\"$qname\\\" HIT=\\\"$hname\\\"\";\n    }\n    return $self->{'_id_str'};\n}\n\n#=================================================\n# Begin Bio::Search::HSP::HSPI implementation\n#=================================================\n\n=head2 algorithm\n\n Title   : algorithm\n Usage   : $alg = $hsp->algorithm();\n Function: Gets the algorithm specification that was used to obtain the hsp\n           For BLAST, the algorithm denotes what type of sequence was aligned\n           against what (BLASTN: dna-dna, BLASTP prt-prt, BLASTX translated\n           dna-prt, 
TBLASTN prt-translated dna, TBLASTX translated\n           dna-translated dna).\n Returns : a scalar string\n Args    : none\n\n\n#----------------\nsub algorithm {\n#----------------\n    my ($self,@args) = @_;\n    return $self->{'_prog'};\n}\n\n\n\n\n=head2 signif()\n\n Usage     : $hsp_obj->signif()\n Purpose   : Get the P-value or Expect value for the HSP.\n Returns   : Float (0.001 or 1.3e-43)\n           : Returns P-value if it is defined, otherwise, Expect value.\n Argument  : n/a\n Throws    : n/a\n Comments  : Provided for consistency with BlastHit::signif()\n           : Support for returning the significance data in different\n           : formats (e.g., exponent only), is not provided for HSP objects.\n           : This is only available for the BlastHit or Blast object.\n\nSee Also   : L</p>, L</expect>, L<Bio::Search::Hit::BlastHit::signif()|Bio::Search::Hit::BlastHit>\n\n\n#-----------\nsub signif {\n#-----------\n    my $self = shift;\n    my $val ||= defined($self->{'_p'}) ? 
$self->{'_p'} : $self->{'_expect'};\n    $val;\n}\n\n\n\n=head2 evalue\n\n Usage     : $hsp_obj->evalue()\n Purpose   : Get the Expect value for the HSP.\n Returns   : Float (0.001 or 1.3e-43)\n Argument  : n/a\n Throws    : n/a\n Comments  : Support for returning the expectation data in different\n           : formats (e.g., exponent only), is not provided for HSP objects.\n           : This is only available for the BlastHit or Blast object.\n\nSee Also   : L</p>\n\n\n#----------\nsub evalue { shift->{'_expect'} }\n#----------\n\n\n=head2 p\n\n Usage     : $hsp_obj->p()\n Purpose   : Get the P-value for the HSP.\n Returns   : Float (0.001 or 1.3e-43) or undef if not defined.\n Argument  : n/a\n Throws    : n/a\n Comments  : P-value is not defined with NCBI Blast2 reports.\n           : Support for returning the expectation data in different\n           : formats (e.g., exponent only) is not provided for HSP objects.\n           : This is only available for the BlastHit or Blast object.\n\nSee Also   : L<expect()|expect>\n\n\n#-----\nsub p { my $self = shift; $self->{'_p'}; }\n#-----\n\n# alias\nsub pvalue { shift->p(@_); }\n\n=head2 length\n\n Usage     : $hsp->length( [seq_type] )\n Purpose   : Get the length of the aligned portion of the query or sbjct.\n Example   : $hsp->length('query')\n Returns   : integer\n Argument  : seq_type: 'query' | 'hit' or 'sbjct' | 'total'  (default = 'total')\n             ('sbjct' is synonymous with 'hit')\n Throws    : n/a\n Comments  : 'total' length is the full length of the alignment\n           : as reported in the denominators in the alignment section:\n           : \"Identical = 34/120 Positives = 67/120\".\n\nSee Also   : L</gaps>\n\n\n#-----------\nsub length {\n#-----------\n## Developer note: when using the built-in length function within\n##                 this module, call it as CORE::length().\n    my( $self, $seqType,$data ) = @_;\n    $seqType  ||= 'total';\n    $seqType = 'sbjct' if $seqType eq 'hit';\n\n    
$seqType ne 'total' and $self->_set_seq_data() unless $self->{'_set_seq_data'};\n\n    ## Sensitive to member name format.\n    $seqType = \"_\\L$seqType\\E\";\n    if( defined $data  ) {\n\t$self->{$seqType.'Length'} = $data;\n    }\n    $self->{$seqType.'Length'};\n}\n\n\n\n=head2 gaps\n\n Usage     : $hsp->gaps( [seq_type] )\n Purpose   : Get the number of gap characters in the query, sbjct, or total alignment.\n           : Also can return query gap chars and sbjct gap chars as a two-element list\n           : when in array context.\n Example   : $total_gaps      = $hsp->gaps();\n           : ($qgaps, $sgaps) = $hsp->gaps();\n           : $qgaps           = $hsp->gaps('query');\n Returns   : scalar context: integer\n           : array context without args: (int, int) = ('queryGaps', 'sbjctGaps')\n Argument  : seq_type: 'query' or 'hit' or 'sbjct' or 'total'\n           :  ('sbjct' is synonymous with 'hit')\n           : (default = 'total', scalar context)\n           : Array context can be \"induced\" by providing an argument of 'list' or 'array'.\n Throws    : n/a\n\nSee Also   : L</length>, L</matches>\n\n\n#---------\nsub gaps {\n#---------\n    my( $self, $seqType ) = @_;\n\n    $self->_set_seq_data() unless $self->{'_set_seq_data'};\n\n    $seqType  ||= (wantarray ? 
'list' : 'total');\n    $seqType = 'sbjct' if $seqType eq 'hit';\n\n    if($seqType =~ /list|array/i) {\n\treturn (($self->{'_queryGaps'} || 0), ($self->{'_sbjctGaps'} || 0));\n    }\n\n    if($seqType eq 'total') {\n\treturn ($self->{'_queryGaps'} + $self->{'_sbjctGaps'}) || 0;\n    } else {\n\t## Sensitive to member name format.\n\t$seqType = \"_\\L$seqType\\E\";\n\treturn $self->{$seqType.'Gaps'} || 0;\n    }\n}\n\n\n=head2 frac_identical\n\n Usage     : $hsp_object->frac_identical( [seq_type] );\n Purpose   : Get the fraction of identical positions within the given HSP.\n Example   : $frac_iden = $hsp_object->frac_identical('query');\n Returns   : Float (2-decimal precision, e.g., 0.75).\n Argument  : seq_type: 'query' or 'hit' or 'sbjct' or 'total'\n           :  ('sbjct' is synonymous with 'hit')\n           : default = 'total' (but see comments below).\n Throws    : n/a\n Comments  : Different versions of Blast report different values for the total\n           : length of the alignment. 
This is the number reported in the\n           : denominators in the stats section:\n           : \"Identical = 34/120 Positives = 67/120\".\n           : NCBI-BLAST uses the total length of the alignment (with gaps)\n           : WU-BLAST uses the length of the query sequence (without gaps).\n           : Therefore, when called without an argument or an argument of 'total',\n           : this method will report different values depending on the\n           : version of BLAST used.\n           :\n           : To get the fraction identical among only the aligned residues,\n           : ignoring the gaps, call this method with an argument of 'query'\n           : or 'sbjct' ('sbjct' is synonymous with 'hit').\n\nSee Also   : L</frac_conserved>, L</num_identical>, L</matches>\n\n\n#-------------------\nsub frac_identical {\n#-------------------\n# The value is calculated as opposed to storing it from the parsed results.\n# This saves storage and also permits flexibility in determining for which\n# sequence (query or sbjct) the figure is to be calculated.\n\n    my( $self, $seqType ) = @_;\n    $seqType ||= 'total';\n    $seqType = 'sbjct' if $seqType eq 'hit';\n\n    if($seqType ne 'total') {\n      $self->_set_seq_data() unless $self->{'_set_seq_data'};\n    }\n    ## Sensitive to member name format.\n    $seqType = \"_\\L$seqType\\E\";\n\n    sprintf( \"%.2f\", $self->{'_numIdentical'}/$self->{$seqType.'Length'});\n}\n\n\n=head2 frac_conserved\n\n Usage     : $hsp_object->frac_conserved( [seq_type] );\n Purpose   : Get the fraction of conserved positions within the given HSP.\n           : (Note: 'conservative' positions are called 'positives' in the\n\t   : Blast report.)\n Example   : $frac_cons = $hsp_object->frac_conserved('query');\n Returns   : Float (2-decimal precision, e.g., 0.75).\n Argument  : seq_type: 'query' or 'hit' or 'sbjct' or 'total'\n           :  ('sbjct' is synonymous with 'hit')\n           : default = 'total' (but see comments below).\n 
Throws    : n/a\n Comments  : Different versions of Blast report different values for the total\n           : length of the alignment. This is the number reported in the\n           : denominators in the stats section:\n           : \"Identical = 34/120 Positives = 67/120\".\n           : NCBI-BLAST uses the total length of the alignment (with gaps)\n           : WU-BLAST uses the length of the query sequence (without gaps).\n           : Therefore, when called without an argument or an argument of 'total',\n           : this method will report different values depending on the\n           : version of BLAST used.\n           :\n           : To get the fraction conserved among only the aligned residues,\n           : ignoring the gaps, call this method with an argument of 'query'\n           : or 'sbjct'.\n\nSee Also   : L</frac_identical>, L</num_conserved>, L</matches>\n\n\n#--------------------\nsub frac_conserved {\n#--------------------\n# The value is calculated as opposed to storing it from the parsed results.\n# This saves storage and also permits flexibility in determining for which\n# sequence (query or sbjct) the figure is to be calculated.\n\n    my( $self, $seqType ) = @_;\n    $seqType ||= 'total';\n    $seqType = 'sbjct' if $seqType eq 'hit';\n\n    if($seqType ne 'total') {\n      $self->_set_seq_data() unless $self->{'_set_seq_data'};\n    }\n\n    ## Sensitive to member name format.\n    $seqType = \"_\\L$seqType\\E\";\n\n    sprintf( \"%.2f\", $self->{'_numConserved'}/$self->{$seqType.'Length'});\n}\n\n=head2 query_string\n\n Title   : query_string\n Usage   : my $qseq = $hsp->query_string;\n Function: Retrieves the query sequence of this HSP as a string\n Returns : string\n
Args    : none\n\n\n\n#----------------\nsub hit_string{ shift->seq_str('hit'); }\n#----------------\n\n\n=head2 homology_string\n\n Title   : homology_string\n Usage   : my $homo_string = $hsp->homology_string;\n Function: Retrieves the homology sequence for this HSP as a string.\n         : The homology sequence is the string of symbols in between the\n         : query and hit sequences in the alignment indicating the degree\n         : of conservation (e.g., identical, similar, not similar).\n Returns : string\n Args    : none\n\n\n#----------------\nsub homology_string{ shift->seq_str('match'); }\n#----------------\n\n#=================================================\n# End Bio::Search::HSP::HSPI implementation\n#=================================================\n\n# Older method delegating to method defined in HSPI.\n\n=head2 expect\n\nSee L<Bio::Search::HSP::HSPI::expect()|Bio::Search::HSP::HSPI>\n\n\n#----------\nsub expect { shift->evalue( @_ ); }\n#----------\n\n\n=head2 rank\n\n Usage     : $hsp->rank( [string] );\n Purpose   : Get the rank of the HSP within a given Blast hit.\n Example   : $rank = $hsp->rank;\n Returns   : Integer (1..n) corresponding to the order in which the HSP\n             appears in the BLAST report.\n\n\n#'\n\n#----------\nsub rank { shift->{'_rank'} }\n#----------\n\n# For backward compatibility\n#----------\nsub name { shift->rank }\n#----------\n\n=head2 to_string\n\n Title   : to_string\n Usage   : print $hsp->to_string;\n Function: Returns a string representation for the Blast HSP.\n           Primarily intended for debugging purposes.\n Example : see usage\n Returns : A string of the form:\n           [BlastHSP] <rank>\n           e.g.:\n           [BlastHSP] 1\n Args    : None\n\n\n#----------\nsub to_string {\n#----------\n    my $self = shift;\n    return \"[BlastHSP] \" .
$self->rank();\n}\n\n\n=head2 _set_data\n\n Usage     : called automatically during object construction.\n Purpose   : Parses the raw HSP section from a flat BLAST report and\n             sets the query sequence, sbjct sequence, and the \"match\" data\n           : which consists of the symbols between the query and sbjct lines\n           : in the alignment.\n Argument  : Array (all lines for a single, complete HSP, from a raw,\n             flat (i.e., non-XML) BLAST report)\n Throws    : Propagates any exceptions from the methods called (\"See Also\")\n\nSee Also   : L</_set_seq>, L</_set_score_stats>, L</_set_match_stats>\n\n\n#--------------\nsub _set_data {\n#--------------\n    my $self = shift;\n    my @data = @_;\n    my @queryList  = ();  # 'Query' = SEQUENCE USED TO QUERY THE DATABASE.\n    my @sbjctList  = ();  # 'Sbjct' = HOMOLOGOUS SEQUENCE FOUND IN THE DATABASE.\n    my @matchList  = ();\n    my $matchLine  = 0;   # Alternating boolean: when true, load 'match' data.\n    my @linedat = ();\n\n    #print STDERR \"BlastHSP: set_data()\\n\";\n\n    my($line, $aln_row_len, $length_diff);\n    $length_diff = 0;\n\n    # Collecting data for all lines in the alignment\n    # and then storing the collections for possible processing later.\n    #\n    # Note that \"match\" lines may not be properly padded with spaces.\n    # This loop now properly handles such cases:\n    # Query: 1141 PSLVELTIRDCPRLEVGPMIRSLPKFPMLKKLDLAVANIIEEDLDVIGSLEELVIXXXXX 1200\n    #             PSLVELTIRDCPRLEVGPMIRSLPKFPMLKKLDLAVANIIEEDLDVIGSLEELVI\n    # Sbjct: 1141 PSLVELTIRDCPRLEVGPMIRSLPKFPMLKKLDLAVANIIEEDLDVIGSLEELVILSLKL 1200\n\n    foreach $line( @data ) {\n\tnext if $line =~ /^\\s*$/;\n\n\tif( $line =~ /^ ?Score/ ) {\n\t    $self->_set_score_stats( $line );\n\t} elsif( $line =~ /^ ?(Identities|Positives|Strand)/ ) {\n\t    $self->_set_match_stats( $line );\n\t} elsif( $line =~ /^ ?Frame = ([\\d+-]+)/ ) {\n\t  # Version 2.0.8 has Frame information on a separate line.\n\t  # 
Storing frame according to SeqFeature::Generic::frame()\n\t  # which does not contain strand info (use strand()).\n\t  my $frame = abs($1) - 1;\n\t  $self->frame( $frame );\n\t} elsif( $line =~ /^(Query:?[\\s\\d]+)([^\\s\\d]+)/ ) {\n\t    push @queryList, $line;\n\t    $self->{'_match_indent'} = CORE::length $1;\n\t    $aln_row_len = (CORE::length $1) + (CORE::length $2);\n\t    $matchLine = 1;\n\t} elsif( $matchLine ) {\n\t    # Pad the match line with spaces if necessary.\n\t    $length_diff = $aln_row_len - CORE::length $line;\n\t    $length_diff and $line .= ' 'x $length_diff;\n\t    push @matchList, $line;\n\t    $matchLine = 0;\n\t} elsif( $line =~ /^Sbjct/ ) {\n\t    push @sbjctList, $line;\n\t}\n    }\n    # Storing the query and sbjct lists in case they are needed later.\n    # We could make this conditional to save memory.\n    $self->{'_queryList'} = \\@queryList;\n    $self->{'_sbjctList'} = \\@sbjctList;\n\n    # Storing the match list in case it is needed later.\n    $self->{'_matchList'} = \\@matchList;\n\n    if(not defined ($self->{'_numIdentical'})) {\n        my $id_str = $self->_id_str;\n        $self->throw( -text  => \"Can't parse match statistics. Possibly a new or unrecognized Blast format. ($id_str)\");\n    }\n\n    if(!scalar @queryList or !scalar @sbjctList) {\n        my $id_str = $self->_id_str;\n        $self->throw( \"Can't find query or sbjct alignment lines. Possibly unrecognized Blast format. 
($id_str)\");\n    }\n}\n\n\n=head2 _set_score_stats\n\n Usage     : called automatically by _set_data()\n Purpose   : Sets various score statistics obtained from the HSP listing.\n Argument  : String with any of the following formats:\n           : blast2:  Score = 30.1 bits (66), Expect = 9.2\n           : blast2:  Score = 158.2 bits (544), Expect(2) = e-110\n           : blast1:  Score = 410 (144.3 bits), Expect = 1.7e-40, P = 1.7e-40\n           : blast1:  Score = 55 (19.4 bits), Expect = 5.3, Sum P(3) = 0.99\n Throws    : Exception if the stats cannot be parsed, probably due to a change\n           : in the Blast report format.\n\nSee Also   : L</_set_data>\n\n\n#--------------------\nsub _set_score_stats {\n#--------------------\n    my ($self, $data) = @_;\n\n    my ($expect, $p);\n\n    if($data =~ /Score = +([\\d.e+-]+) bits \\(([\\d.e+-]+)\\), +Expect = +([\\d.e+-]+)/) {\n\t# blast2 format n = 1\n\t$self->bits($1);\n\t$self->score($2);\n\t$expect            = $3;\n    } elsif($data =~ /Score = +([\\d.e+-]+) bits \\(([\\d.e+-]+)\\), +Expect\\((\\d+)\\) = +([\\d.e+-]+)/) {\n\t# blast2 format n > 1\n\t$self->bits($1);\n\t$self->score($2);\n\t$self->{'_n'}      = $3;\n\t$expect            = $4;\n\n    } elsif($data =~ /Score = +([\\d.e+-]+) \\(([\\d.e+-]+) bits\\), +Expect = +([\\d.e+-]+), P = +([\\d.e-]+)/) {\n\t# blast1 format, n = 1\n\t$self->score($1);\n\t$self->bits($2);\n\t$expect            = $3;\n\t$p                 = $4;\n\n    } elsif($data =~ /Score = +([\\d.e+-]+) \\(([\\d.e+-]+) bits\\), +Expect = +([\\d.e+-]+), +Sum P\\((\\d+)\\) = +([\\d.e-]+)/) {\n\t# blast1 format, n > 1\n\t$self->score($1);\n\t$self->bits($2);\n\t$expect            = $3;\n\t$self->{'_n'}      = $4;\n\t$p                 = $5;\n\n    } else {\n        my $id_str = $self->_id_str;\n\t$self->throw(-class => 'Bio::Root::Exception',\n\t\t     -text => \"Can't parse score statistics: unrecognized format. 
($id_str)\",\n\t\t     -value => $data);\n    }\n    $expect = \"1$expect\" if $expect =~ /^e/i;\n    $p      = \"1$p\"      if defined $p and $p=~ /^e/i;\n\n    $self->{'_expect'} = $expect;\n    $self->{'_p'}      = $p || undef;\n    $self->significance( $p || $expect );\n}\n\n\n=head2 _set_match_stats\n\n Usage     : Private method; called automatically by _set_data()\n Purpose   : Sets various matching statistics obtained from the HSP listing.\n Argument  : blast2: Identities = 23/74 (31%), Positives = 29/74 (39%), Gaps = 17/74 (22%)\n           : blast2: Identities = 57/98 (58%), Positives = 74/98 (75%)\n           : blast1: Identities = 87/204 (42%), Positives = 126/204 (61%)\n           : blast1: Identities = 87/204 (42%), Positives = 126/204 (61%), Frame = -3\n           : WU-blast: Identities = 310/553 (56%), Positives = 310/553 (56%), Strand = Minus / Plus\n Throws    : Exception if the stats cannot be parsed, probably due to a change\n           : in the Blast report format.\n Comments  : The \"Gaps = \" data in the HSP header has a different meaning depending\n           : on the type of Blast: for BLASTP, this number is the total number of\n           : gaps in query+sbjct; for TBLASTN, it is the number of gaps in the\n           : query sequence only. Thus, it is safer to collect the data\n           : separately by examining the actual sequence strings as is done\n           : in _set_seq().\n\nSee Also   : L</_set_data>, L</_set_seq>\n\n\n#--------------------\nsub _set_match_stats {\n#--------------------\n    my ($self, $data) = @_;\n\n    if($data =~ m!Identities = (\\d+)/(\\d+)!) {\n      # blast1 or 2 format\n      $self->{'_numIdentical'} = $1;\n      $self->{'_totalLength'}  = $2;\n    }\n\n    if($data =~ m!Positives = (\\d+)/(\\d+)!) {\n      # blast1 or 2 format\n      $self->{'_numConserved'} = $1;\n      $self->{'_totalLength'}  = $2;\n    }\n\n    if($data =~ m!Frame = ([\\d+-]+)!) 
{\n      $self->frame($1);\n    }\n\n    # Strand data is not always present in this line.\n    # _set_seq() will also set strand information.\n    if($data =~ m!Strand = (\\w+) / (\\w+)!) {\n\t$self->{'_queryStrand'} = $1;\n\t$self->{'_sbjctStrand'} = $2;\n    }\n\n#    if($data =~ m!Gaps = (\\d+)/(\\d+)!) {\n#\t $self->{'_totalGaps'} = $1;\n#    } else {\n#\t $self->{'_totalGaps'} = 0;\n#    }\n}\n\n\n\n=head2 _set_seq_data\n\n Usage     : called automatically when sequence data is requested.\n Purpose   : Sets the HSP sequence data for both query and sbjct sequences.\n           : Includes: start, stop, length, gaps, and raw sequence.\n Argument  : n/a\n Throws    : Propagates any exception thrown by _set_seq()\n Comments  : Uses raw data stored by _set_data() during object construction.\n           : These data are not always needed, so this method is\n           : executed only upon demand by methods such as gaps(), _set_residues(),\n           : etc. _set_seq() does the dirty work.\n\nSee Also   : L</_set_seq>\n\n\n#-----------------\nsub _set_seq_data {\n#-----------------\n    my $self = shift;\n\n    $self->_set_seq('query', @{$self->{'_queryList'}});\n    $self->_set_seq('sbjct', @{$self->{'_sbjctList'}});\n\n    # Liberate some memory.\n    @{$self->{'_queryList'}} = @{$self->{'_sbjctList'}} = ();\n    undef $self->{'_queryList'};\n    undef $self->{'_sbjctList'};\n\n    $self->{'_set_seq_data'} = 1;\n}\n\n\n\n=head2 _set_seq \n\n Usage     : called automatically by _set_seq_data()\n           : $hsp_obj->_set_seq($seq_type, @data);\n Purpose   : Sets sequence information for both the query and sbjct sequences.\n           : Directly counts the number of gaps in each sequence (if gapped Blast).\n Argument  : $seq_type = 'query' or 'sbjct'\n           : @data = all seq lines with the form:\n           : Query: 61  SPHNVKDRKEQNGSINNAISPTATANTSGSQQINIDSALRDRSSNVAAQPSLSDASSGSN 120\n Throws    : Exception if data strings cannot be parsed, probably due to a
change\n           : in the Blast report format.\n Comments  : Uses first argument to determine which data members to set\n           : making this method sensitive to data member name changes.\n           : Behavior is dependent on the type of BLAST analysis (TBLASTN, BLASTP, etc).\n Warning   : Sequence endpoints are normalized so that start < end. This affects HSPs\n           : for TBLASTN/X hits on the minus strand. Normalization facilitates use\n           : of range information by methods such as matches().\n\nSee Also   : L</_set_seq_data>, L</matches>, L</range>, L</start>, L</end>\n\n\n#-------------\nsub _set_seq {\n#-------------\n    my $self      = shift;\n    my $seqType   = shift;\n    my @data      = @_;\n    my @ranges    = ();\n    my @sequence  = ();\n    my $numGaps   = 0;\n\n    foreach( @data ) {\n        if( m/(\\d+) *([^\\d\\s]+) *(\\d+)/) {\n            push @ranges, ( $1, $3 ) ;\n            push @sequence, $2;\n        #print STDERR \"_set_seq found sequence \\\"$2\\\"\\n\";\n\t} else {\n\t    $self->warn(\"Bad sequence data: $_\");\n\t}\n    }\n\n    if( !(scalar(@sequence) and scalar(@ranges))) {\n        my $id_str = $self->_id_str;\n\t$self->throw(\"Can't set sequence: missing data. Possibly unrecognized Blast format.
($id_str) $seqType\");\n   }\n\n    # Sensitive to member name changes.\n    $seqType = \"_\\L$seqType\\E\";\n    $self->{$seqType.'Start'} = $ranges[0];\n    $self->{$seqType.'Stop'}  = $ranges[ $#ranges ];\n    $self->{$seqType.'Seq'}   = \\@sequence;\n\n    $self->{$seqType.'Length'} = abs($ranges[ $#ranges ] - $ranges[0]) + 1;\n\n    # Adjust lengths for BLASTX, TBLASTN, TBLASTX sequences\n    # Converting nucl coords to amino acid coords.\n\n    my $prog = $self->algorithm;\n    if($prog eq 'TBLASTN' and $seqType eq '_sbjct') {\n\t$self->{$seqType.'Length'} /= 3;\n    } elsif($prog eq 'BLASTX' and $seqType eq '_query') {\n\t$self->{$seqType.'Length'} /= 3;\n    } elsif($prog eq 'TBLASTX') {\n\t$self->{$seqType.'Length'} /= 3;\n    }\n\n    if( $prog ne 'BLASTP' ) {\n        $self->{$seqType.'Strand'} = 'Plus' if $prog =~ /BLASTN/;\n        $self->{$seqType.'Strand'} = 'Plus' if ($prog =~ /BLASTX/ and $seqType eq '_query');\n        # Normalize sequence endpoints so that start < end.\n        # Reverse complement or 'minus strand' HSPs get flipped here.\n        if($self->{$seqType.'Start'} > $self->{$seqType.'Stop'}) {\n            ($self->{$seqType.'Start'}, $self->{$seqType.'Stop'}) =\n                ($self->{$seqType.'Stop'}, $self->{$seqType.'Start'});\n            $self->{$seqType.'Strand'} = 'Minus';\n        }\n    }\n\n    ## Count number of gaps in each seq. Only need to do this for gapped Blasts.\n#    if($self->{'_gapped'}) {\n\tmy $seqstr = join('', @sequence);\n\t$seqstr =~ s/\\s//g;\n        my $num_gaps = CORE::length($seqstr) - $self->{$seqType.'Length'};\n\t$self->{$seqType.'Gaps'} = $num_gaps if $num_gaps > 0;\n#    }\n}\n\n\n=head2 _set_residues\n\n Usage     : called automatically when residue data is requested.\n Purpose   : Sets the residue numbers representing the identical and\n           : conserved positions. 
These data are obtained by analyzing the\n           : symbols between query and sbjct lines of the alignments.\n Argument  : n/a\n Throws    : Propagates any exception thrown by _set_seq_data() and _set_match_seq().\n Comments  : These data are not always needed, so it is conditionally\n           : executed only upon demand by methods such as seq_inds().\n           : Behavior is dependent on the type of BLAST analysis (TBLASTN, BLASTP, etc).\n\nSee Also   : L</_set_seq_data>, L</_set_match_seq>, L</seq_inds>\n\n\n#------------------\nsub _set_residues {\n#------------------\n    my $self      = shift;\n    my @sequence  = ();\n\n    $self->_set_seq_data() unless $self->{'_set_seq_data'};\n\n    # Using hashes to avoid saving duplicate residue numbers.\n    my %identicalList_query = ();\n    my %identicalList_sbjct = ();\n    my %conservedList_query = ();\n    my %conservedList_sbjct = ();\n\n    my $aref = $self->_set_match_seq() if not ref $self->{'_matchSeq'};\n    $aref  ||= $self->{'_matchSeq'};\n    my $seqString = join('', @$aref );\n\n    my $qseq = join('',@{$self->{'_querySeq'}});\n    my $sseq = join('',@{$self->{'_sbjctSeq'}});\n    my $resCount_query = $self->{'_queryStop'} || 0;\n    my $resCount_sbjct = $self->{'_sbjctStop'} || 0;\n\n    my $prog = $self->algorithm;\n    if($prog !~ /^BLASTP|^BLASTN/) {\n\tif($prog eq 'TBLASTN') {\n\t    $resCount_sbjct /= 3;\n\t} elsif($prog eq 'BLASTX') {\n\t    $resCount_query /= 3;\n\t} elsif($prog eq 'TBLASTX') {\n\t    $resCount_query /= 3;\n\t    $resCount_sbjct /= 3;\n\t}\n    }\n\n    my ($mchar, $schar, $qchar);\n    while( $mchar = chop($seqString) ) {\n\t($qchar, $schar) = (chop($qseq), chop($sseq));\n\tif( $mchar eq '+' ) {\n\t    $conservedList_query{ $resCount_query } = 1;\n\t    $conservedList_sbjct{ $resCount_sbjct } = 1;\n\t} elsif( $mchar ne ' ' ) {\n\t    $identicalList_query{ $resCount_query } = 1;\n\t    $identicalList_sbjct{ $resCount_sbjct } = 1;\n\t}\n\t$resCount_query-- if $qchar ne 
$GAP_SYMBOL;\n\t$resCount_sbjct-- if $schar ne $GAP_SYMBOL;\n    }\n    $self->{'_identicalRes_query'} = \\%identicalList_query;\n    $self->{'_conservedRes_query'} = \\%conservedList_query;\n    $self->{'_identicalRes_sbjct'} = \\%identicalList_sbjct;\n    $self->{'_conservedRes_sbjct'} = \\%conservedList_sbjct;\n\n}\n\n=head2 _set_match_seq\n\n Usage     : $hsp_obj->_set_match_seq()\n Purpose   : Set the 'match' sequence for the current HSP (symbols in between\n           : the query and sbjct lines.)\n Returns   : Array reference holding the match sequence lines.\n Argument  : n/a\n Throws    : Exception if the _matchList field is not set.\n Comments  : The match information is not always necessary. This method\n           : allows it to be conditionally prepared.\n           : Called by _set_residues() and seq_str().\n\nSee Also   : L</_set_residues>, L</seq_str>\n\n\n#-------------------\nsub _set_match_seq {\n#-------------------\n    my $self = shift;\n\n    if( ! ref($self->{'_matchList'}) ) {\n        my $id_str = $self->_id_str;\n        $self->throw(\"Can't set HSP match sequence: No data ($id_str)\");\n    }\n\n    my @data = @{$self->{'_matchList'}};\n\n    my(@sequence);\n    foreach( @data ) {\n\tchomp($_);\n\t## Remove leading spaces; (note: aln may begin with a space\n\t## which is why we can't use s/^ +//).\n\ts/^ {$self->{'_match_indent'}}//;\n\tpush @sequence, $_;\n    }\n    # Liberate some memory.\n    @{$self->{'_matchList'}} = ();\n    $self->{'_matchList'} = undef;\n\n    $self->{'_matchSeq'} = \\@sequence;\n\n    return $self->{'_matchSeq'};\n}\n\n\n=head2 n\n\n Usage     : $hsp_obj->n()\n Purpose   : Get the N value (num HSPs on which P/Expect is based).\n           : This value is not defined with NCBI Blast2 with gapping.\n Returns   : Integer or null string if not defined.\n Argument  : n/a\n Throws    : n/a\n Comments  : The 'N' value is listed in parentheses with P/Expect value:\n           : e.g., P(3) = 1.2e-30  ---> (N =
3).\n           : Not defined in NCBI Blast2 with gaps.\n           : This typically is equal to the number of HSPs but not always.\n           : To obtain the number of HSPs, use Bio::Search::Hit::BlastHit::num_hsps().\n\nSee Also   : L<Bio::SeqFeature::SimilarityPair::score()|Bio::SeqFeature::SimilarityPair>\n\n\n#-----\nsub n { my $self = shift; $self->{'_n'} || ''; }\n#-----\n\n\n=head2 matches\n\n Usage     : $hsp->matches( [-SEQ => seq_type], [-START => start], [-STOP => stop] );\n Purpose   : Get the total number of identical and conservative matches\n           : in the query or sbjct sequence for the given HSP. Optionally can\n           : report data within a defined interval along the seq.\n           : (Note: 'conservative' matches are called 'positives' in the\n\t   : Blast report.)\n Example   : ($id,$cons) = $hsp_object->matches(-SEQ => 'hit');\n           : ($id,$cons) = $hsp_object->matches(-SEQ => 'query', -START => 300, -STOP => 400);\n Returns   : 2-element array of integers\n Argument  : Named parameters (keys are upper case):\n           : -SEQ   = 'query' or 'hit' or 'sbjct' (default = query)\n           :  ('sbjct' is synonymous with 'hit')\n           : -START = Starting coordinate (optional)\n           : -STOP  = Ending coordinate (optional)\n Throws    : Exception if the supplied coordinates are out of range.\n Comments  : Relies on seq_str('match') to get the string of alignment symbols\n           : between the query and sbjct lines which are used for determining\n           : the number of identical and conservative matches.\n\nSee Also   : L</length>, L</gaps>, L</seq_str>, L<Bio::Search::Hit::BlastHit::_adjust_contigs()|Bio::Search::Hit::BlastHit>\n\n\n#-----------\nsub matches {\n#-----------\n    my( $self, %param ) = @_;\n    my(@data);\n    my($seqType, $beg, $end) = ($param{-SEQ}, $param{-START}, $param{-STOP});\n    $seqType ||= 'query';\n    $seqType = 'sbjct' if $seqType eq 'hit';\n\n    my($start,$stop);\n\n    if(!defined $beg && !defined $end) {\n\t## Get data for the whole alignment.\n\tpush @data, ($self->{'_numIdentical'},
$self->{'_numConserved'});\n    } else {\n\t## Get the substring representing the desired sub-section of aln.\n\t$beg ||= 0;\n\t$end ||= 0;\n\t($start,$stop) = $self->range($seqType);\n\tif($beg == 0) { $beg = $start; $end = $beg+$end; }\n\telsif($end == 0) { $end = $stop; $beg = $end-$beg; }\n\n\tif($end >= $stop) { $end = $stop; } ##ML changed from if (end >stop)\n\telse { $end += 1;}   ##ML moved from commented position below, makes\n                             ##more sense here\n#\tif($end > $stop) { $end = $stop; }\n\tif($beg < $start) { $beg = $start; }\n#\telse { $end += 1;}\n\n#\tmy $seq = substr($self->seq_str('match'), $beg-$start, ($end-$beg));\n\n\t## ML: START fix for substr out of range error ------------------\n\tmy $seq = \"\";\n        my $prog = $self->algorithm;\n\tif (($prog eq 'TBLASTN') and ($seqType eq 'sbjct'))\n\t{\n\t    $seq = substr($self->seq_str('match'),\n\t\t\t  int(($beg-$start)/3), int(($end-$beg+1)/3));\n\n\t} elsif (($prog eq 'BLASTX') and ($seqType eq 'query'))\n\t{\n\t    $seq = substr($self->seq_str('match'),\n\t\t\t  int(($beg-$start)/3), int(($end-$beg+1)/3));\n\t} else {\n\t    $seq = substr($self->seq_str('match'),\n\t\t\t  $beg-$start, ($end-$beg));\n\t}\n\t## ML: End of fix for  substr out of range error -----------------\n\n\n\t## ML: debugging code\n\t## This is where we get our exception.  
Try printing out the values going\n\t## into this:\n\t##\n#\t print STDERR\n#\t     qq(*------------MY EXCEPTION --------------------\\nSeq: \") ,\n#\t     $self->seq_str(\"$seqType\"), qq(\"\\n),$self->rank,\",(  index:\";\n#\t print STDERR  $beg-$start, \", len: \", $end-$beg,\" ), (HSPRealLen:\",\n#\t     CORE::length $self->seq_str(\"$seqType\");\n#\t print STDERR \", HSPCalcLen: \", $stop - $start +1 ,\" ),\n#\t     ( beg: $beg, end: $end ), ( start: $start, stop: stop )\\n\";\n\t ## ML: END DEBUGGING CODE----------\n\n\tif(!CORE::length $seq) {\n            my $id_str = $self->_id_str;\n\t    $self->throw(\"Undefined $seqType sub-sequence ($beg,$end). Valid range = $start - $stop ($id_str)\");\n\t}\n\t## Get data for a substring.\n#\tprintf \"Collecting HSP subsection data: beg,end = %d,%d; start,stop = %d,%d\\n%s<---\\n\", $beg, $end, $start, $stop, $seq;\n#\tprintf \"Original match seq:\\n%s\\n\",$self->seq_str('match');\n\t$seq =~ s/ //g;  # remove space (no info).\n\tmy $len_cons = CORE::length $seq;\n\t$seq =~ s/\\+//g;  # remove '+' characters (conservative substitutions)\n\tmy $len_id = CORE::length $seq;\n\tpush @data, ($len_id, $len_cons);\n#\tprintf \"  HSP = %s\\n  id = %d; cons = %d\\n\", $self->rank, $len_id, $len_cons; <STDIN>;\n    }\n    @data;\n}\n\n\n=head2 num_identical\n\n Usage     : $hsp_object->num_identical();\n Purpose   : Get the number of identical positions within the given HSP.\n Example   : $num_iden = $hsp_object->num_identical();\n Returns   : integer\n Argument  : n/a\n Throws    : n/a\n\nSee Also   : L</num_conserved>, L</frac_identical>\n\n\n#-------------------\nsub num_identical {\n#-------------------\n    my( $self) = shift;\n\n    $self->{'_numIdentical'};\n}\n\n\n=head2 num_conserved\n\n Usage     : $hsp_object->num_conserved();\n Purpose   : Get the number of conserved positions within the given HSP.\n Example   : $num_iden = $hsp_object->num_conserved();\n Returns   : integer\n Argument  : n/a\n Throws    : 
n/a\n\nSee Also   : L</num_identical>, L</frac_conserved>\n\n\n#-------------------\nsub num_conserved {\n#-------------------\n    my( $self) = shift;\n\n    $self->{'_numConserved'};\n}\n\n\n\n=head2 range\n\n Usage     : $hsp->range( [seq_type] );\n Purpose   : Gets the (start, end) coordinates for the query or sbjct sequence\n           : in the HSP alignment.\n Example   : ($query_beg, $query_end) = $hsp->range('query');\n           : ($hit_beg, $hit_end) = $hsp->range('hit');\n Returns   : Two-element array of integers\n Argument  : seq_type = string, 'query' or 'hit' or 'sbjct'  (default = 'query')\n           :  ('sbjct' is synonymous with 'hit')\n Throws    : n/a\n\nSee Also   : L</start>, L</end>\n\n\n#----------\nsub range {\n#----------\n    my ($self, $seqType) = @_;\n\n    $self->_set_seq_data() unless $self->{'_set_seq_data'};\n\n    $seqType ||= 'query';\n    $seqType = 'sbjct' if $seqType eq 'hit';\n\n    ## Sensitive to member name changes.\n    $seqType = \"_\\L$seqType\\E\";\n\n    return ($self->{$seqType.'Start'},$self->{$seqType.'Stop'});\n}\n\n=head2 start\n\n Usage     : $hsp->start( [seq_type] );\n Purpose   : Gets the start coordinate for the query, sbjct, or both sequences\n           : in the HSP alignment.\n           : NOTE: Start will always be less than end.\n           : To determine strand, use $hsp->strand()\n Example   : $query_beg = $hsp->start('query');\n           : $hit_beg = $hsp->start('hit');\n           : ($query_beg, $hit_beg) = $hsp->start();\n Returns   : scalar context: integer\n           : array context without args: list of two integers\n Argument  : In scalar context: seq_type = 'query' or 'hit' or 'sbjct' (default= 'query')\n           :  ('sbjct' is synonymous with 'hit')\n           : Array context can be \"induced\" by providing an argument of 'list' or 'array'.\n Throws    : n/a\n\nSee Also   : L</end>, L</range>\n\n\n#----------\nsub start {\n#----------\n    my ($self, $seqType) = @_;\n\n    $seqType ||= 
(wantarray ? 'list' : 'query');\n    $seqType = 'sbjct' if $seqType eq 'hit';\n\n    $self->_set_seq_data() unless $self->{'_set_seq_data'};\n\n    if($seqType =~ /list|array/i) {\n\treturn ($self->{'_queryStart'}, $self->{'_sbjctStart'});\n    } else {\n\t## Sensitive to member name changes.\n\t$seqType = \"_\\L$seqType\\E\";\n\treturn $self->{$seqType.'Start'};\n    }\n}\n\n=head2 end\n\n Usage     : $hsp->end( [seq_type] );\n Purpose   : Gets the end coordinate for the query, sbjct, or both sequences\n           : in the HSP alignment.\n           : NOTE: Start will always be less than end.\n           : To determine strand, use $hsp->strand()\n Example   : $query_end = $hsp->end('query');\n           : $hit_end = $hsp->end('hit');\n           : ($query_end, $hit_end) = $hsp->end();\n Returns   : scalar context: integer\n           : array context without args: list of two integers\n Argument  : In scalar context: seq_type = 'query' or 'hit' or 'sbjct' (default= 'query')\n           :  ('sbjct' is synonymous with 'hit')\n           : Array context can be \"induced\" by providing an argument of 'list' or 'array'.\n Throws    : n/a\n\nSee Also   : L</start>, L</range>, L</strand>\n\n\n#----------\nsub end {\n#----------\n    my ($self, $seqType) = @_;\n\n    $seqType ||= (wantarray ? 
'list' : 'query');\n    $seqType = 'sbjct' if $seqType eq 'hit';\n\n    $self->_set_seq_data() unless $self->{'_set_seq_data'};\n\n    if($seqType =~ /list|array/i) {\n\treturn ($self->{'_queryStop'}, $self->{'_sbjctStop'});\n    } else {\n\t## Sensitive to member name changes.\n\t$seqType = \"_\\L$seqType\\E\";\n\treturn $self->{$seqType.'Stop'};\n    }\n}\n\n\n\n=head2 strand\n\n Usage     : $hsp_object->strand( [seq_type] )\n Purpose   : Get the strand of the query or sbjct sequence.\n Example   : print $hsp->strand('query');\n           : ($query_strand, $hit_strand) = $hsp->strand();\n Returns   : -1, 0, or 1\n           : -1 = Minus strand, +1 = Plus strand\n           : Returns 0 if strand is not defined, which occurs\n           : for BLASTP reports, and the query of TBLASTN\n           : as well as the hit of BLASTX reports.\n           : In scalar context without arguments, returns queryStrand value.\n           : In array context without arguments, returns a two-element list\n           :    of strings (queryStrand, sbjctStrand).\n           : Array context can be \"induced\" by providing an argument of 'list' or 'array'.\n Argument  : seq_type: 'query' or 'hit' or 'sbjct' or undef\n           :  ('sbjct' is synonymous with 'hit')\n Throws    : n/a\n\nSee Also   : L</_set_seq>, L</_set_match_stats>\n\n\n#-----------\nsub strand {\n#-----------\n    my( $self, $seqType ) = @_;\n\n    $seqType  ||= (wantarray ? 
'list' : 'query');\n    $seqType = 'sbjct' if $seqType eq 'hit';\n\n    ## Sensitive to member name format.\n    $seqType = \"_\\L$seqType\\E\";\n\n    # $seqType could be '_list'.\n    $self->{'_queryStrand'} or $self->_set_seq_data() unless $self->{'_set_seq_data'};\n\n    my $prog = $self->algorithm;\n\n    if($seqType  =~ /list|array/i) {\n        my ($qstr, $hstr);\n        if( $prog eq 'BLASTP') {\n            $qstr = 0;\n            $hstr = 0;\n        }\n        elsif( $prog eq 'TBLASTN') {\n            $qstr = 0;\n            $hstr = $STRAND_SYMBOL{$self->{'_sbjctStrand'}};\n        }\n        elsif( $prog eq 'BLASTX') {\n            $qstr = $STRAND_SYMBOL{$self->{'_queryStrand'}};\n            $hstr = 0;\n        }\n        else {\n            $qstr = $STRAND_SYMBOL{$self->{'_queryStrand'}} if defined $self->{'_queryStrand'};\n            $hstr = $STRAND_SYMBOL{$self->{'_sbjctStrand'}} if defined $self->{'_sbjctStrand'};\n        }\n        $qstr ||= 0;\n        $hstr ||= 0;\n\treturn ($qstr, $hstr);\n    }\n    local $^W = 0;\n    $STRAND_SYMBOL{$self->{$seqType.'Strand'}} || 0;\n}\n\n\n=head2 seq\n\n Usage     : $hsp->seq( [seq_type] );\n Purpose   : Get the query or sbjct sequence as a Bio::Seq.pm object.\n Example   : $seqObj = $hsp->seq('query');\n Returns   : Object reference for a Bio::Seq.pm object.\n Argument  : seq_type = 'query' or 'hit' or 'sbjct' (default = 'query').\n           :  ('sbjct' is synonymous with 'hit')\n Throws    : Propagates any exception that occurs during construction\n           : of the Bio::Seq.pm object.\n Comments  : The sequence is returned in an array of strings corresponding\n           : to the strings in the original format of the Blast alignment.\n           : (i.e., same spacing).\n\nSee Also   : L</seq_str>, L</seq_inds>, L<Bio::Seq>\n\n\n#-------\nsub seq {\n#-------\n    my($self,$seqType) = @_;\n    $seqType ||= 'query';\n    $seqType = 'sbjct' if $seqType eq 'hit';\n    my $str = 
$self->seq_str($seqType);\n\n    require Bio::Seq;\n\n    Bio::Seq->new(-ID   => $self->to_string,\n\t\t  -SEQ  => $str,\n\t\t  -DESC => \"$seqType sequence\",\n\t\t  );\n}\n\n=head2 seq_str\n\n Usage     : $hsp->seq_str( seq_type );\n Purpose   : Get the full query, sbjct, or 'match' sequence as a string.\n           : The 'match' sequence is the string of symbols in between the\n           : query and sbjct sequences.\n Example   : $str = $hsp->seq_str('query');\n Returns   : String\n Argument  : seq_Type = 'query' or 'hit' or 'sbjct' or 'match'\n           :  ('sbjct' is synonymous with 'hit')\n Throws    : Exception if the argument does not match an accepted seq_type.\n Comments  : Calls _set_seq_data() to set the 'match' sequence if it has\n           : not been set already.\n\nSee Also   : L</seq>, L</seq_inds>, L</_set_match_seq>\n\n\n#------------\nsub seq_str {\n#------------\n    my($self,$seqType) = @_;\n\n    $seqType ||= 'query';\n    $seqType = 'sbjct' if $seqType eq 'hit';\n    ## Sensitive to member name changes.\n    $seqType = \"_\\L$seqType\\E\";\n\n    $self->_set_seq_data() unless $self->{'_set_seq_data'};\n\n    if($seqType =~ /sbjct|query/) {\n\tmy $seq = join('',@{$self->{$seqType.'Seq'}});\n\t$seq =~ s/\\s+//g;\n\treturn $seq;\n\n    } elsif( $seqType =~ /match/i) {\n\t# Only need to call _set_match_seq() if the match seq is requested.\n\tmy $aref = $self->_set_match_seq() unless ref $self->{'_matchSeq'};\n\t$aref =  $self->{'_matchSeq'};\n\n\treturn join('',@$aref);\n\n    } else {\n        my $id_str = $self->_id_str;\n\t$self->throw(-class => 'Bio::Root::BadParameter',\n\t\t     -text => \"Invalid or undefined sequence type: $seqType ($id_str)\\n\" .\n\t\t               \"Valid types: query, sbjct, match\",\n\t\t     -value => $seqType);\n    }\n}\n\n=head2 seq_inds\n\n Usage     : $hsp->seq_inds( seq_type, class, collapse );\n Purpose   : Get a list of residue positions (indices) for all identical\n           : or conserved residues in 
the query or sbjct sequence.\n Example   : @s_ind = $hsp->seq_inds('query', 'identical');\n           : @h_ind = $hsp->seq_inds('hit', 'conserved');\n           : @h_ind = $hsp->seq_inds('hit', 'conserved', 1);\n Returns   : List of integers\n           : May include ranges if collapse is true.\n Argument  : seq_type  = 'query' or 'hit' or 'sbjct'  (default = query)\n           :  ('sbjct' is synonymous with 'hit')\n           : class     = 'identical' or 'conserved' (default = identical)\n           :              (can be shortened to 'id' or 'cons')\n           :              (actually, anything not 'id' will evaluate to 'conserved').\n           : collapse  = boolean, if true, consecutive positions are merged\n           :             using a range notation, e.g., \"1 2 3 4 5 7 9 10 11\"\n           :             collapses to \"1-5 7 9-11\". This is useful for\n           :             consolidating long lists. Default = no collapse.\n Throws    : n/a.\n Comments  : Calls _set_residues() to set the 'match' sequence if it has\n           : not been set already.\n\nSee Also   : L</seq>, L</_set_residues>, L<Bio::Search::BlastUtils::collapse_nums()|Bio::Search::BlastUtils>, 
L<Bio::Search::Hit::BlastHit::seq_inds()|Bio::Search::Hit::BlastHit>","parameters":[{"label":"$self"},{"label":"$seqType"},{"label":"$class"},{"label":"$collapse"}]},"containerName":"main::","definition":"sub"},{"line":1613,"kind":13,"containerName":null,"name":"$collapse"},{"containerName":null,"name":"$collapse","line":1615,"kind":13},{"line":1615,"kind":12,"name":"Bio","containerName":"Search::BlastUtils"},{"name":"Bio","containerName":"Search::BlastUtils::collapse_nums","kind":12,"line":1615},{"name":"@ary","containerName":null,"line":1615,"kind":13},{"name":"@ary","containerName":null,"line":1615,"kind":13},{"containerName":"main::","name":"get_aln","definition":"sub","range":{"end":{"character":9999,"line":1661},"start":{"character":0,"line":1639}},"kind":12,"line":1639,"children":[{"localvar":"my","containerName":"get_aln","definition":"my","name":"$self","line":1641,"kind":13},{"localvar":"my","containerName":"get_aln","definition":"my","name":"$qseq","line":1645,"kind":13},{"name":"$self","containerName":"get_aln","line":1645,"kind":13},{"containerName":"get_aln","name":"seq","line":1645,"kind":12},{"kind":13,"line":1646,"containerName":"get_aln","name":"$sseq","definition":"my","localvar":"my"},{"containerName":"get_aln","name":"$self","kind":13,"line":1646},{"name":"seq","containerName":"get_aln","line":1646,"kind":12},{"containerName":"get_aln","name":"$type","definition":"my","localvar":"my","kind":13,"line":1648},{"kind":13,"line":1648,"containerName":"get_aln","name":"$self"},{"kind":12,"line":1648,"name":"algorithm","containerName":"get_aln"},{"localvar":"my","name":"$aln","definition":"my","containerName":"get_aln","line":1649,"kind":13},{"containerName":"get_aln","name":"new","line":1649,"kind":12},{"containerName":"get_aln","name":"$aln","line":1650,"kind":13},{"kind":12,"line":1650,"name":"add_seq","containerName":"get_aln"},{"containerName":"get_aln","name":"new","line":1650,"kind":12},{"line":1650,"kind":13,"name":"$qseq","containerName":"get_a
ln"},{"line":1650,"kind":12,"name":"seq","containerName":"get_aln"},{"name":"$qseq","containerName":"get_aln","line":1651,"kind":13},{"line":1651,"kind":12,"name":"display_id","containerName":"get_aln"},{"line":1653,"kind":13,"containerName":"get_aln","name":"$qseq"},{"containerName":"get_aln","name":"$aln","kind":13,"line":1655},{"containerName":"get_aln","name":"add_seq","kind":12,"line":1655},{"line":1655,"kind":12,"name":"new","containerName":"get_aln"},{"name":"$sseq","containerName":"get_aln","kind":13,"line":1655},{"containerName":"get_aln","name":"seq","line":1655,"kind":12},{"containerName":"get_aln","name":"$sseq","line":1656,"kind":13},{"line":1656,"kind":12,"name":"display_id","containerName":"get_aln"},{"kind":13,"line":1658,"containerName":"get_aln","name":"$sseq"},{"containerName":"get_aln","name":"$aln","kind":13,"line":1660}]},{"containerName":"SimpleAlign::Bio::LocatableSeq","name":"Bio","kind":12,"line":1645},{"kind":12,"line":1649,"containerName":"SimpleAlign","name":"Bio"},{"containerName":"LocatableSeq","name":"Bio","kind":12,"line":1650},{"name":"CORE","containerName":"length","kind":12,"line":1653},{"name":"Bio","containerName":"LocatableSeq","line":1655,"kind":12},{"name":"CORE","containerName":"length","line":1658,"kind":12}],"version":5}