{"version":5,"vars":[{"name":"vars","containerName":"","line":121,"kind":2},{"containerName":"","name":"base","line":132,"kind":2},{"name":"%FTQUAL_NO_QUOTE","containerName":null,"line":134,"kind":13},{"containerName":"main::","definition":"sub","detail":"($self,@args)","signature":{"label":"_initialize($self,@args)","documentation":"","parameters":[{"label":"$self"},{"label":"@args"}]},"kind":12,"line":152,"children":[{"kind":13,"line":153,"name":"$self","definition":"my","containerName":"_initialize","localvar":"my"},{"name":"@args","containerName":"_initialize","kind":13,"line":153},{"containerName":"_initialize","name":"$self","line":155,"kind":13},{"containerName":"_initialize","name":"@args","line":155,"kind":13},{"name":"$self","containerName":"_initialize","line":157,"kind":13},{"line":159,"kind":13,"containerName":"_initialize","name":"$self"},{"line":159,"kind":12,"name":"_show_dna","containerName":"_initialize"},{"line":160,"kind":13,"name":"$self","containerName":"_initialize"},{"line":160,"kind":12,"containerName":"_initialize","name":"sequence_factory"},{"kind":13,"line":161,"containerName":"_initialize","name":"$self"},{"line":161,"kind":12,"name":"sequence_factory","containerName":"_initialize"},{"name":"new","containerName":"_initialize","kind":12,"line":162},{"kind":13,"line":162,"name":"$self","containerName":"_initialize"},{"line":162,"kind":12,"containerName":"_initialize","name":"verbose"}],"name":"_initialize","range":{"start":{"line":152,"character":0},"end":{"character":9999,"line":165}}},{"kind":12,"line":155,"containerName":"_initialize","name":"SUPER"},{"name":"Bio","containerName":"Seq::SeqFactory","kind":12,"line":161},{"containerName":"main::","definition":"sub","detail":"($self,@args)","signature":{"documentation":"1;\n# $Id: embl.pm 16123 2009-09-17 12:57:27Z cjfields $\n#\n# BioPerl module for Bio::SeqIO::EMBL\n#\n# Please direct questions and support issues to <bioperl-l@bioperl.org> \n#\n# Cared for by Ewan Birney <birney@ebi.ac.uk>\n#\n# Copyright Ewan Birney\n#\n# You may distribute this module under the same terms as perl itself\n\n# POD documentation - main docs before the code\n\n=head1 NAME\n\nBio::SeqIO::embl - EMBL sequence input/output stream\n\n=head1 SYNOPSIS\n\nIt is probably best not to use this object directly, but\nrather go through the SeqIO handler system. Go:\n\n    $stream = Bio::SeqIO->new(-file => $filename, -format => 'EMBL');\n\n    while ( (my $seq = $stream->next_seq()) ) {\n        # do something with $seq\n    }\n\n=head1 DESCRIPTION\n\nThis object can transform Bio::Seq objects to and from EMBL flat\nfile databases.\n\nThere is a lot of flexibility here about how to dump things which\nshould be documented more fully.\n\nThere should be a common object that this and Genbank share (probably\nwith Swissprot). Too much of the magic is identical.\n\n=head2 Optional functions\n\n=over 3\n\n* _show_dna()\n\n(output only) shows the dna or not\n\n* _post_sort()\n\n(output only) provides a sorting func which is applied to the FTHelpers\nbefore printing\n\n* _id_generation_func()\n\nThis is function which is called as\n\n   print \"ID   \", $func($annseq), \"\\n\";\n\nTo generate the ID line. If it is not there, it generates a sensible ID\nline using a number of tools.\n\nIf you want to output annotations in EMBL format they need to be\nstored in a Bio::Annotation::Collection object which is accessible\nthrough the Bio::SeqI interface method L<annotation()|annotation>.\n\nThe following are the names of the keys which are polled from a\nL<Bio::Annotation::Collection> object.\n\n reference  - Should contain Bio::Annotation::Reference objects\n comment    - Should contain Bio::Annotation::Comment objects\n dblink     - Should contain Bio::Annotation::DBLink objects\n\n\n=head1 FEEDBACK\n\n=head2 Mailing Lists\n\nUser feedback is an integral part of the evolution of this and other\nBioperl modules. Send your comments and suggestions preferably to one\nof the Bioperl mailing lists.  Your participation is much appreciated.\n\n  bioperl-l@bioperl.org                  - General discussion\n  http://bioperl.org/wiki/Mailing_lists  - About the mailing lists\n\n=head2 Support \n\nPlease direct usage questions or support issues to the mailing list:\n\nI<bioperl-l@bioperl.org>\n\nrather than to the module maintainer directly. Many experienced and \nreponsive experts will be able look at the problem and quickly \naddress it. Please include a thorough description of the problem \nwith code and data examples if at all possible.\n\n=head2 Reporting Bugs\n\nReport bugs to the Bioperl bug tracking system to help us keep track\nthe bugs and their resolution. Bug reports can be submitted via\nthe web:\n\n  http://bugzilla.open-bio.org/\n\n=head1 AUTHOR - Ewan Birney\n\nEmail birney@ebi.ac.uk\n\n=head1 APPENDIX\n\nThe rest of the documentation details each of the object\nmethods. Internal methods are usually preceded with a _\n\n\n\n# Let the code begin...\n\n\npackage Bio::SeqIO::embl;\nuse vars qw(%FTQUAL_NO_QUOTE);\nuse strict;\nuse Bio::SeqIO::FTHelper;\nuse Bio::SeqFeature::Generic;\nuse Bio::Species;\nuse Bio::Seq::SeqFactory;\nuse Bio::Annotation::Collection;\nuse Bio::Annotation::Comment;\nuse Bio::Annotation::Reference;\nuse Bio::Annotation::DBLink;\n\nuse base qw(Bio::SeqIO);\n\n%FTQUAL_NO_QUOTE=(\n                  'anticodon'=>1,\n                  'citation'=>1,\n                  'codon'=>1,\n                  'codon_start'=>1,\n                  'cons_splice'=>1,\n                  'direction'=>1,\n                  'evidence'=>1,\n                  'label'=>1,\n                  'mod_base'=> 1,\n                  'number'=> 1,\n                  'rpt_type'=> 1,\n                  'rpt_unit'=> 1,\n                  'transl_except'=> 1,\n                  'transl_table'=> 1,\n                  'usedin'=> 1,\n                 );\n\nsub _initialize {\n    my($self,@args) = @_;\n\n    $self->SUPER::_initialize(@args);\n    # hash for functions for decoding keys.\n    $self->{'_func_ftunit_hash'} = {};\n    # sets this to one by default. People can change it\n    $self->_show_dna(1);\n    if ( ! defined $self->sequence_factory ) {\n        $self->sequence_factory(Bio::Seq::SeqFactory->new\n                                (-verbose => $self->verbose(),\n                                 -type => 'Bio::Seq::RichSeq'));\n    }\n}\n\n=head2 next_seq\n\n Title   : next_seq\n Usage   : $seq = $stream->next_seq()\n Function: returns the next sequence in the stream\n Returns : Bio::Seq object\n Args    :","parameters":[{"label":"$self"},{"label":"@args"}],"label":"next_seq($self,@args)"},"kind":12,"line":177,"children":[{"definition":"my","name":"$self","containerName":"next_seq","localvar":"my","kind":13,"line":178},{"kind":13,"line":178,"containerName":"next_seq","name":"@args"},{"localvar":"my","containerName":"next_seq","definition":"my","name":"$pseq","line":179,"kind":13},{"line":179,"kind":13,"name":"$c","containerName":"next_seq"},{"line":179,"kind":13,"name":"$line","containerName":"next_seq"},{"containerName":"next_seq","name":"$name","kind":13,"line":179},{"line":179,"kind":13,"name":"$desc","containerName":"next_seq"},{"containerName":"next_seq","name":"$acc","line":179,"kind":13},{"name":"$seqc","containerName":"next_seq","kind":13,"line":179},{"containerName":"next_seq","name":"$mol","kind":13,"line":179},{"line":179,"kind":13,"name":"$div","containerName":"next_seq"},{"kind":13,"line":180,"name":"$date","containerName":"next_seq"},{"name":"$comment","containerName":"next_seq","kind":13,"line":180},{"line":180,"kind":13,"containerName":"next_seq","name":"@date_arr"},{"name":"$annotation","definition":"my","containerName":"next_seq","localvar":"my","kind":13,"line":182},{"kind":13,"line":182,"containerName":"next_seq","name":"%params"},{"line":182,"kind":13,"containerName":"next_seq","name":"@features"},{"containerName":"next_seq","name":"new","kind":12,"line":183},{"kind":13,"line":185,"containerName":"next_seq","name":"$line"},{"kind":13,"line":185,"containerName":"next_seq","name":"$self"},{"name":"_readline","containerName":"next_seq","kind":12,"line":185},{"line":188,"kind":13,"containerName":"next_seq","name":"$line"},{"kind":13,"line":192,"name":"$line","containerName":"next_seq"},{"name":"$line","containerName":"next_seq","line":193,"kind":13},{"kind":13,"line":193,"containerName":"next_seq","name":"$self"},{"line":193,"kind":12,"containerName":"next_seq","name":"_readline"},{"containerName":"next_seq","name":"$line","line":194,"kind":13},{"name":"$line","containerName":"next_seq","kind":13,"line":198},{"name":"$self","containerName":"next_seq","line":202,"kind":13},{"kind":12,"line":202,"containerName":"next_seq","name":"throw"},{"containerName":"next_seq","name":"$line","line":203,"kind":13},{"name":"$alphabet","definition":"my","containerName":"next_seq","localvar":"my","kind":13,"line":206},{"line":207,"kind":13,"containerName":"next_seq","name":"$line"},{"localvar":"my","name":"$topology","definition":"my","containerName":"next_seq","line":210,"kind":13},{"line":211,"kind":13,"localvar":"my","definition":"my","name":"$sv","containerName":"next_seq"},{"kind":13,"line":215,"name":"$line","containerName":"next_seq"},{"kind":13,"line":216,"containerName":"next_seq","name":"$name"},{"containerName":"next_seq","name":"$sv","line":216,"kind":13},{"name":"$topology","containerName":"next_seq","kind":13,"line":216},{"name":"$mol","containerName":"next_seq","kind":13,"line":216},{"kind":13,"line":216,"name":"$div","containerName":"next_seq"},{"containerName":"next_seq","name":"$sv","kind":13,"line":218},{"kind":13,"line":219,"name":"$params","containerName":"next_seq"},{"line":219,"kind":13,"containerName":"next_seq","name":"$sv"},{"kind":13,"line":220,"name":"$params","containerName":"next_seq"},{"name":"$sv","containerName":"next_seq","kind":13,"line":220},{"name":"$topology","containerName":"next_seq","line":223,"kind":13},{"name":"$params","containerName":"next_seq","kind":13,"line":224},{"name":"$mol","containerName":"next_seq","line":227,"kind":13},{"containerName":"next_seq","name":"$mol","kind":13,"line":228},{"name":"$alphabet","containerName":"next_seq","kind":13,"line":229},{"name":"$mol","containerName":"next_seq","line":230,"kind":13},{"containerName":"next_seq","name":"$alphabet","line":231,"kind":13},{"containerName":"next_seq","name":"$mol","kind":13,"line":232},{"kind":13,"line":233,"containerName":"next_seq","name":"$alphabet"},{"containerName":"next_seq","name":"$line","line":239,"kind":13},{"line":240,"kind":13,"containerName":"next_seq","name":"$name"},{"line":240,"kind":13,"name":"$mol","containerName":"next_seq"},{"line":240,"kind":13,"containerName":"next_seq","name":"$div"},{"containerName":"next_seq","name":"$mol","line":243,"kind":13},{"name":"$mol","containerName":"next_seq","line":244,"kind":13},{"kind":13,"line":245,"name":"$params","containerName":"next_seq"},{"name":"$mol","containerName":"next_seq","line":246,"kind":13},{"line":248,"kind":13,"containerName":"next_seq","name":"$mol"},{"name":"$mol","containerName":"next_seq","kind":13,"line":249},{"kind":13,"line":250,"containerName":"next_seq","name":"$alphabet"},{"name":"$mol","containerName":"next_seq","line":251,"kind":13},{"line":252,"kind":13,"name":"$alphabet","containerName":"next_seq"},{"kind":13,"line":253,"name":"$mol","containerName":"next_seq"},{"name":"$alphabet","containerName":"next_seq","line":254,"kind":13},{"line":260,"kind":13,"name":"$name","containerName":"next_seq"},{"containerName":"next_seq","name":"$name","line":260,"kind":13},{"name":"$name","containerName":"next_seq","kind":13,"line":261},{"localvar":"my","definition":"my","name":"$buffer","containerName":"next_seq","line":265,"kind":13},{"kind":13,"line":265,"containerName":"next_seq","name":"$line"},{"containerName":"next_seq","name":"$buffer","line":268,"kind":13},{"kind":13,"line":269,"name":"$buffer","containerName":"next_seq"},{"kind":13,"line":272,"containerName":"next_seq","name":"$self"},{"name":"_pushback","containerName":"next_seq","line":272,"kind":12},{"line":277,"kind":13,"containerName":"next_seq","name":"$desc"},{"containerName":"next_seq","name":"$desc","line":277,"kind":13},{"kind":13,"line":282,"containerName":"next_seq","definition":"my","name":"@accs","localvar":"my"},{"containerName":"next_seq","name":"$params","kind":13,"line":283},{"kind":13,"line":284,"name":"@accs","containerName":"next_seq"},{"name":"$params","containerName":"next_seq","kind":13,"line":284},{"containerName":"next_seq","name":"$params","line":285,"kind":13},{"line":285,"kind":13,"containerName":"next_seq","name":"@accs"},{"line":290,"kind":13,"localvar":"my","containerName":"next_seq","name":"$sv","definition":"my"},{"kind":13,"line":292,"containerName":"next_seq","name":"$params"},{"containerName":"next_seq","name":"$sv","line":292,"kind":13},{"kind":13,"line":293,"containerName":"next_seq","name":"$params"},{"kind":13,"line":293,"containerName":"next_seq","name":"$sv"},{"localvar":"my","name":"$line","definition":"my","containerName":"next_seq","line":298,"kind":13},{"kind":13,"line":299,"containerName":"next_seq","definition":"my","name":"$date","localvar":"my"},{"kind":13,"line":299,"containerName":"next_seq","name":"$version"},{"kind":13,"line":299,"name":"$line","containerName":"next_seq"},{"name":"$date","containerName":"next_seq","line":300,"kind":13},{"line":301,"kind":13,"containerName":"next_seq","name":"$version"},{"kind":13,"line":302,"containerName":"next_seq","definition":"my","name":"$release","localvar":"my"},{"containerName":"next_seq","name":"new","kind":12,"line":302},{"line":306,"kind":13,"containerName":"next_seq","name":"$annotation"},{"kind":12,"line":306,"name":"add_Annotation","containerName":"next_seq"},{"kind":13,"line":306,"containerName":"next_seq","name":"$release"},{"kind":13,"line":307,"name":"$version","containerName":"next_seq"},{"containerName":"next_seq","definition":"my","name":"$release","localvar":"my","kind":13,"line":308},{"name":"new","containerName":"next_seq","line":308,"kind":12},{"containerName":"next_seq","name":"$annotation","kind":13,"line":312},{"name":"add_Annotation","containerName":"next_seq","line":312,"kind":12},{"name":"$release","containerName":"next_seq","kind":13,"line":312},{"line":314,"kind":13,"localvar":"my","definition":"my","name":"$update","containerName":"next_seq"},{"line":314,"kind":12,"containerName":"next_seq","name":"new"},{"line":318,"kind":13,"name":"$annotation","containerName":"next_seq"},{"containerName":"next_seq","name":"add_Annotation","line":318,"kind":12},{"name":"$update","containerName":"next_seq","kind":13,"line":318},{"line":320,"kind":13,"name":"$params","containerName":"next_seq"},{"containerName":"next_seq","name":"$date","line":320,"kind":13}],"name":"next_seq","range":{"end":{"line":321,"character":9999},"start":{"character":0,"line":177}}},{"name":"Bio","containerName":"Annotation::Collection","line":183,"kind":12},{"name":"BEFORE_FEATURE_TABLE","kind":12,"line":267},{"name":"Bio","containerName":"Annotation::SimpleValue","kind":12,"line":302},{"line":308,"kind":12,"containerName":"Annotation::SimpleValue","name":"Bio"},{"kind":12,"line":314,"containerName":"Annotation::SimpleValue","name":"Bio"},{"line":325,"kind":13,"localvar":"my","definition":"my","name":"@kw","containerName":null},{"line":326,"kind":13,"name":"%params","containerName":null},{"containerName":null,"name":"@kw","kind":13,"line":326},{"localvar":"my","containerName":null,"definition":"my","name":"$species","line":332,"kind":13},{"containerName":null,"name":"$self","line":332,"kind":13},{"kind":12,"line":332,"name":"_read_EMBL_Species","containerName":"main::"},{"kind":13,"line":332,"containerName":null,"name":"$buffer"},{"name":"%params","containerName":null,"line":332,"kind":13},{"containerName":null,"name":"%params","line":333,"kind":13},{"containerName":null,"name":"%species","kind":13,"line":333},{"line":338,"kind":13,"localvar":"my","containerName":null,"name":"@links","definition":"my"},{"line":338,"kind":13,"containerName":null,"name":"$self"},{"line":338,"kind":12,"name":"_read_EMBL_TaxID_DBLink","containerName":"main::"},{"containerName":null,"name":"$buffer","kind":13,"line":338},{"definition":"my","name":"$dblink","containerName":null,"localvar":"my","kind":13,"line":339},{"containerName":null,"name":"@links","kind":13,"line":339},{"kind":13,"line":340,"name":"$annotation","containerName":null},{"line":340,"kind":12,"name":"add_Annotation","containerName":"main::"},{"line":340,"kind":13,"containerName":null,"name":"%dblink"},{"localvar":"my","containerName":null,"name":"@refs","definition":"my","line":346,"kind":13},{"containerName":null,"name":"$self","line":346,"kind":13},{"line":346,"kind":12,"containerName":"main::","name":"_read_EMBL_References"},{"name":"$buffer","containerName":null,"kind":13,"line":346},{"kind":13,"line":347,"containerName":null,"definition":"my","name":"$ref","localvar":"my"},{"containerName":null,"name":"@refs","kind":13,"line":347},{"name":"$annotation","containerName":null,"kind":13,"line":348},{"name":"add_Annotation","containerName":"main::","line":348,"kind":12},{"containerName":null,"name":"%ref","line":348,"kind":13},{"kind":13,"line":354,"name":"@links","definition":"my","containerName":null,"localvar":"my"},{"kind":13,"line":354,"name":"$self","containerName":null},{"containerName":"main::","name":"_read_EMBL_DBLink","kind":12,"line":354},{"name":"$buffer","containerName":null,"line":354,"kind":13},{"definition":"my","name":"$dblink","containerName":null,"localvar":"my","kind":13,"line":355},{"kind":13,"line":355,"name":"@links","containerName":null},{"name":"$annotation","containerName":null,"kind":13,"line":356},{"kind":12,"line":356,"name":"add_Annotation","containerName":"main::"},{"line":356,"kind":13,"name":"%dblink","containerName":null},{"line":362,"kind":13,"containerName":null,"name":"$comment"},{"name":"$comment","containerName":null,"kind":13,"line":363},{"containerName":null,"name":"$self","line":364,"kind":13},{"name":"_readline","containerName":"main::","line":364,"kind":12},{"kind":13,"line":366,"name":"$comment","containerName":null},{"containerName":null,"name":"%comment","kind":13,"line":367},{"containerName":null,"definition":"my","name":"$commobj","localvar":"my","kind":13,"line":372},{"name":"Bio","containerName":"Annotation::Comment","line":372,"kind":12},{"kind":12,"line":372,"containerName":"main::","name":"new"},{"kind":13,"line":373,"containerName":null,"name":"$commobj"},{"name":"text","containerName":"main::","kind":12,"line":373},{"name":"$comment","containerName":null,"kind":13,"line":373},{"line":374,"kind":13,"containerName":null,"name":"$annotation"},{"kind":12,"line":374,"name":"add_Annotation","containerName":"main::"},{"kind":13,"line":374,"name":"$commobj","containerName":null},{"containerName":null,"name":"$comment","kind":13,"line":375},{"containerName":null,"name":"$buffer","kind":13,"line":379},{"line":379,"kind":13,"name":"$self","containerName":null},{"line":379,"kind":12,"name":"_readline","containerName":"main::"},{"name":"$self","containerName":null,"line":382,"kind":13},{"name":"_readline","containerName":"main::","line":382,"kind":12},{"line":387,"kind":13,"name":"$buffer","containerName":null},{"containerName":null,"name":"$buffer","kind":13,"line":389},{"line":389,"kind":13,"containerName":null,"name":"%buffer"},{"kind":13,"line":390,"containerName":null,"name":"%buffer"},{"localvar":"my","containerName":null,"definition":"my","name":"$ftunit","line":391,"kind":13},{"name":"$self","containerName":null,"line":391,"kind":13},{"containerName":"main::","name":"_read_FTHelper_EMBL","kind":12,"line":391},{"line":391,"kind":13,"containerName":null,"name":"$buffer"},{"kind":13,"line":394,"containerName":null,"definition":"my","name":"$feat","localvar":"my"},{"kind":13,"line":395,"containerName":null,"name":"$ftunit"},{"kind":12,"line":395,"containerName":"main::","name":"_generic_seqfeature"},{"containerName":null,"name":"$self","kind":13,"line":395},{"kind":12,"line":395,"containerName":"main::","name":"location_factory"},{"kind":13,"line":395,"containerName":null,"name":"$name"},{"containerName":null,"name":"%params","line":398,"kind":13},{"containerName":null,"name":"$feat","kind":13,"line":398},{"kind":12,"line":398,"name":"primary_tag","containerName":"main::"},{"line":399,"kind":13,"name":"$feat","containerName":null},{"kind":12,"line":399,"containerName":"main::","name":"has_tag"},{"kind":13,"line":400,"containerName":null,"name":"%params"},{"containerName":"main::","name":"ncbi_taxid","line":400,"kind":12},{"localvar":"my","name":"$tagval","definition":"my","containerName":null,"line":401,"kind":13},{"line":401,"kind":13,"containerName":null,"name":"$feat"},{"containerName":"main::","name":"get_tag_values","kind":12,"line":401},{"kind":13,"line":402,"name":"%tagval","containerName":null},{"name":"%params","containerName":null,"kind":13,"line":403},{"kind":12,"line":403,"containerName":"main::","name":"ncbi_taxid"},{"name":"$tagval","containerName":null,"kind":13,"line":403},{"kind":13,"line":410,"name":"@features","containerName":null},{"containerName":null,"name":"$feat","line":410,"kind":13},{"line":412,"kind":13,"containerName":null,"name":"%buffer"},{"line":418,"kind":13,"containerName":null,"name":"$buffer"},{"name":"%buffer","containerName":null,"line":418,"kind":13},{"containerName":null,"name":"$buffer","kind":13,"line":419},{"name":"$self","containerName":null,"kind":13,"line":419},{"containerName":"main::","name":"_readline","kind":12,"line":419},{"line":422,"kind":13,"name":"%buffer","containerName":null},{"name":"%buffer","containerName":null,"kind":13,"line":423},{"kind":13,"line":424,"containerName":null,"definition":"my","name":"$ftunit","localvar":"my"},{"kind":13,"line":424,"containerName":null,"name":"$self"},{"containerName":"main::","name":"_read_FTHelper_EMBL","kind":12,"line":424},{"name":"$buffer","containerName":null,"kind":13,"line":424},{"name":"@features","containerName":null,"line":426,"kind":13},{"name":"$ftunit","containerName":null,"kind":13,"line":427},{"line":427,"kind":12,"containerName":"main::","name":"_generic_seqfeature"},{"line":427,"kind":13,"containerName":null,"name":"$self"},{"line":427,"kind":12,"name":"location_factory","containerName":"main::"},{"containerName":null,"name":"$name","line":428,"kind":13},{"line":430,"kind":13,"name":"%buffer","containerName":null},{"containerName":null,"name":"%buffer","line":435,"kind":13},{"line":436,"kind":13,"containerName":null,"name":"$self"},{"kind":12,"line":436,"name":"_readline","containerName":"main::"},{"name":"$seqc","containerName":null,"line":440,"kind":13},{"line":441,"kind":13,"containerName":null,"name":"$self"},{"name":"_readline","containerName":"main::","kind":12,"line":441},{"kind":13,"line":445,"name":"$seqc","containerName":null},{"containerName":null,"name":"$seq","definition":"my","localvar":"my","kind":13,"line":447},{"name":"$self","containerName":null,"line":447,"kind":13},{"line":447,"kind":12,"name":"sequence_factory","containerName":"main::"},{"name":"create","containerName":"main::","kind":12,"line":448},{"line":448,"kind":13,"containerName":null,"name":"$self"},{"name":"verbose","containerName":"main::","line":448,"kind":12},{"line":449,"kind":13,"name":"$div","containerName":null},{"kind":13,"line":450,"name":"$seqc","containerName":null},{"name":"$desc","containerName":null,"line":451,"kind":13},{"containerName":null,"name":"$name","line":452,"kind":13},{"name":"$annotation","containerName":null,"line":453,"kind":13},{"containerName":null,"name":"$mol","kind":13,"line":454},{"line":455,"kind":13,"containerName":null,"name":"$alphabet"},{"containerName":null,"name":"@features","kind":13,"line":456},{"name":"%params","containerName":null,"kind":13,"line":457},{"containerName":null,"name":"$seq","kind":13,"line":458},{"name":"_write_ID_line","range":{"end":{"character":9999,"line":484},"start":{"line":476,"character":0}},"kind":12,"children":[{"line":478,"kind":13,"localvar":"my","name":"$self","definition":"my","containerName":"_write_ID_line"},{"name":"$seq","containerName":"_write_ID_line","kind":13,"line":478},{"name":"$id_line","definition":"my","containerName":"_write_ID_line","localvar":"my","kind":13,"line":480},{"kind":13,"line":482,"containerName":"_write_ID_line","name":"$self"},{"kind":12,"line":482,"name":"_id_generation_func","containerName":"_write_ID_line"},{"line":483,"kind":13,"containerName":"_write_ID_line","name":"$id_line"},{"kind":13,"line":483,"name":"$self","containerName":"_write_ID_line"},{"line":483,"kind":12,"name":"_id_generation_func","containerName":"_write_ID_line"},{"containerName":"_write_ID_line","name":"$seq","kind":13,"line":483}],"line":476,"definition":"sub","containerName":"main::","signature":{"label":"_write_ID_line($self,$seq)","documentation":"1;\n# $Id: embl.pm 16123 2009-09-17 12:57:27Z cjfields $\n#\n# BioPerl module for Bio::SeqIO::EMBL\n#\n# Please direct questions and support issues to <bioperl-l@bioperl.org> \n#\n# Cared for by Ewan Birney <birney@ebi.ac.uk>\n#\n# Copyright Ewan Birney\n#\n# You may distribute this module under the same terms as perl itself\n\n# POD documentation - main docs before the code\n\n=head1 NAME\n\nBio::SeqIO::embl - EMBL sequence input/output stream\n\n=head1 SYNOPSIS\n\nIt is probably best not to use this object directly, but\nrather go through the SeqIO handler system. Go:\n\n    $stream = Bio::SeqIO->new(-file => $filename, -format => 'EMBL');\n\n    while ( (my $seq = $stream->next_seq()) ) {\n        # do something with $seq\n    }\n\n=head1 DESCRIPTION\n\nThis object can transform Bio::Seq objects to and from EMBL flat\nfile databases.\n\nThere is a lot of flexibility here about how to dump things which\nshould be documented more fully.\n\nThere should be a common object that this and Genbank share (probably\nwith Swissprot). Too much of the magic is identical.\n\n=head2 Optional functions\n\n=over 3\n\n* _show_dna()\n\n(output only) shows the dna or not\n\n* _post_sort()\n\n(output only) provides a sorting func which is applied to the FTHelpers\nbefore printing\n\n* _id_generation_func()\n\nThis is function which is called as\n\n   print \"ID   \", $func($annseq), \"\\n\";\n\nTo generate the ID line. If it is not there, it generates a sensible ID\nline using a number of tools.\n\nIf you want to output annotations in EMBL format they need to be\nstored in a Bio::Annotation::Collection object which is accessible\nthrough the Bio::SeqI interface method L<annotation()|annotation>.\n\nThe following are the names of the keys which are polled from a\nL<Bio::Annotation::Collection> object.\n\n reference  - Should contain Bio::Annotation::Reference objects\n comment    - Should contain Bio::Annotation::Comment objects\n dblink     - Should contain Bio::Annotation::DBLink objects\n\n\n=head1 FEEDBACK\n\n=head2 Mailing Lists\n\nUser feedback is an integral part of the evolution of this and other\nBioperl modules. Send your comments and suggestions preferably to one\nof the Bioperl mailing lists.  Your participation is much appreciated.\n\n  bioperl-l@bioperl.org                  - General discussion\n  http://bioperl.org/wiki/Mailing_lists  - About the mailing lists\n\n=head2 Support \n\nPlease direct usage questions or support issues to the mailing list:\n\nI<bioperl-l@bioperl.org>\n\nrather than to the module maintainer directly. Many experienced and \nreponsive experts will be able look at the problem and quickly \naddress it. Please include a thorough description of the problem \nwith code and data examples if at all possible.\n\n=head2 Reporting Bugs\n\nReport bugs to the Bioperl bug tracking system to help us keep track\nthe bugs and their resolution. Bug reports can be submitted via\nthe web:\n\n  http://bugzilla.open-bio.org/\n\n=head1 AUTHOR - Ewan Birney\n\nEmail birney@ebi.ac.uk\n\n=head1 APPENDIX\n\nThe rest of the documentation details each of the object\nmethods. Internal methods are usually preceded with a _\n\n\n\n# Let the code begin...\n\n\npackage Bio::SeqIO::embl;\nuse vars qw(%FTQUAL_NO_QUOTE);\nuse strict;\nuse Bio::SeqIO::FTHelper;\nuse Bio::SeqFeature::Generic;\nuse Bio::Species;\nuse Bio::Seq::SeqFactory;\nuse Bio::Annotation::Collection;\nuse Bio::Annotation::Comment;\nuse Bio::Annotation::Reference;\nuse Bio::Annotation::DBLink;\n\nuse base qw(Bio::SeqIO);\n\n%FTQUAL_NO_QUOTE=(\n                  'anticodon'=>1,\n                  'citation'=>1,\n                  'codon'=>1,\n                  'codon_start'=>1,\n                  'cons_splice'=>1,\n                  'direction'=>1,\n                  'evidence'=>1,\n                  'label'=>1,\n                  'mod_base'=> 1,\n                  'number'=> 1,\n                  'rpt_type'=> 1,\n                  'rpt_unit'=> 1,\n                  'transl_except'=> 1,\n                  'transl_table'=> 1,\n                  'usedin'=> 1,\n                 );\n\nsub _initialize {\n    my($self,@args) = @_;\n\n    $self->SUPER::_initialize(@args);\n    # hash for functions for decoding keys.\n    $self->{'_func_ftunit_hash'} = {};\n    # sets this to one by default. People can change it\n    $self->_show_dna(1);\n    if ( ! defined $self->sequence_factory ) {\n        $self->sequence_factory(Bio::Seq::SeqFactory->new\n                                (-verbose => $self->verbose(),\n                                 -type => 'Bio::Seq::RichSeq'));\n    }\n}\n\n=head2 next_seq\n\n Title   : next_seq\n Usage   : $seq = $stream->next_seq()\n Function: returns the next sequence in the stream\n Returns : Bio::Seq object\n Args    :\n\n\nsub next_seq {\n    my ($self,@args) = @_;\n    my ($pseq,$c,$line,$name,$desc,$acc,$seqc,$mol,$div,\n        $date, $comment, @date_arr);\n\n    my ($annotation, %params, @features) =\n        Bio::Annotation::Collection->new();\n\n    $line = $self->_readline;\n    # This needs to be before the first eof() test\n\n    if ( !defined $line ) {\n        return;                 # no throws - end of file\n    }\n\n    if ( $line =~ /^\\s+$/ ) {\n        while ( defined ($line = $self->_readline) ) {\n            $line =~/^\\S/ && last;\n        }\n        # return without error if the whole next sequence was just a single\n        # blank line and then eof\n        return unless $line;\n    }\n\n    # no ID as 1st non-blank line, need short circuit and exit routine\n    $self->throw(\"EMBL stream with no ID. Not embl in my book\")\n        unless $line =~ /^ID\\s+\\S+/;\n\n    # At this point we are sure that $line contains an ID header line\n    my $alphabet;\n    if ( $line =~ tr/;/;/ == 6) { # New style headers contain exactly six semicolons.\n\n        # New style header (EMBL Release >= 87, after June 2006)\n        my $topology;\n        my $sv;\n\n        # ID   DQ299383; SV 1; linear; mRNA; STD; MAM; 431 BP.\n        # This regexp comes from the new2old.pl conversion script, from EBI\n        if ($line =~ m/^ID   (\\w+);\\s+SV (\\d+); (\\w+); ([^;]+); (\\w{3}); (\\w{3}); (\\d+) BP./) {\n        ($name, $sv, $topology, $mol, $div) = ($1, $2, $3, $4, $6);\n        }\n        if (defined($sv)) {\n        $params{'-seq_version'} = $sv;\n        $params{'-version'} = $sv;\n        }\n\n        if ($topology eq \"circular\") {\n        $params{'-is_circular'} = 1;\n        }\n    \n    if (defined $mol ) {\n        if ($mol =~ /DNA/) {\n        $alphabet='dna';\n        } elsif ($mol =~ /RNA/) {\n        $alphabet='rna';\n        } elsif ($mol =~ /AA/) {\n        $alphabet='protein';\n        }\n    }\n    } else {\n    \n        # Old style header (EMBL Release < 87, before June 2006)\n        if ($line =~ /^ID\\s+(\\S+)[^;]*;\\s+(\\S+)[^;]*;\\s+(\\S+)[^;]*;/) {\n        ($name, $mol, $div) = ($1, $2, $3);\n        }\n    \n        if ($mol) {\n            if ( $mol =~ /circular/ ) {\n            $params{'-is_circular'} = 1;\n            $mol =~  s|circular ||;\n            }\n            if (defined $mol ) {\n            if ($mol =~ /DNA/) {\n                $alphabet='dna';\n            } elsif ($mol =~ /RNA/) {\n                $alphabet='rna';\n            } elsif ($mol =~ /AA/) {\n                $alphabet='protein';\n            }\n            }\n        }\n    }\n\n    unless( defined $name && length($name) ) {\n    $name = \"unknown_id\";\n    }\n\n    # $self->warn(\"not parsing upper annotation in EMBL file yet!\");\n    my $buffer = $line;\n    local $_;\n    BEFORE_FEATURE_TABLE :\n          until ( !defined $buffer ) {\n              $_ = $buffer;\n              # Exit at start of Feature table\n              if ( /^(F[HT]|SQ)/ ) {\n                  $self->_pushback($_) if( $1 eq 'SQ' || $1 eq 'FT');\n                  last;\n              }\n              # Description line(s)\n              if (/^DE\\s+(\\S.*\\S)/) {\n                  $desc .= $desc ? \" $1\" : $1;\n              }\n\n              #accession number\n              if ( /^AC\\s+(.*)?/ || /^PA\\s+(.*)?/) {\n                  my @accs = split(/[; ]+/, $1); # allow space in addition\n                  $params{'-accession_number'} = shift @accs\n                      unless defined $params{'-accession_number'};\n                  push @{$params{'-secondary_accessions'}}, @accs;\n              }\n\n              #version number\n              if ( /^SV\\s+\\S+\\.(\\d+);?/ ) {\n                  my $sv = $1;\n                  #$sv =~ s/\\;//;\n                  $params{'-seq_version'} = $sv;\n                  $params{'-version'} = $sv;\n              }\n\n              #date (NOTE: takes last date line)\n              if ( /^DT\\s+(.+)$/ ) {\n                  my $line = $1;\n                  my ($date, $version) = split(' ', $line, 2);\n                  $date =~ tr/,//d; # remove comma if new version\n                  if ($version =~ /\\(Rel\\. (\\d+), Created\\)/xms ) {\n                      my $release = Bio::Annotation::SimpleValue->new(\n                                                                      -tagname    => 'creation_release',\n                                                                      -value      => $1\n                                                                     );\n                      $annotation->add_Annotation($release);\n                  } elsif ($version =~ /\\(Rel\\. (\\d+), Last updated, Version (\\d+)\\)/xms ) {\n                      my $release = Bio::Annotation::SimpleValue->new(\n                                                                      -tagname    => 'update_release',\n                                                                      -value      => $1\n                                                                     );\n                      $annotation->add_Annotation($release);\n\n                      my $update = Bio::Annotation::SimpleValue->new(\n                                                                     -tagname    => 'update_version',\n                                                                     -value      => $2\n                                                                    );\n                      $annotation->add_Annotation($update);\n                  }\n                  push @{$params{'-dates'}}, $date;\n              }\n\n              #keywords\n              if ( /^KW   (.*)\\S*$/ ) {\n                  my @kw = split(/\\s*\\;\\s*/,$1);\n                  push @{$params{'-keywords'}}, @kw;\n              }\n\n              # Organism name and phylogenetic information\n              elsif (/^O[SC]/) {\n                  # pass the accession number so we can give an informative throw message if necessary\n                  my $species = $self->_read_EMBL_Species(\\$buffer, $params{'-accession_number'});\n                  $params{'-species'}= $species;\n              }\n\n              # NCBI TaxID Xref\n              elsif (/^OX/) {\n                  my @links = $self->_read_EMBL_TaxID_DBLink(\\$buffer);\n                  foreach my $dblink ( @links ) {\n                      $annotation->add_Annotation('dblink',$dblink);\n                  }\n              }\n\n              # References\n              elsif (/^R/) {\n                  my @refs = $self->_read_EMBL_References(\\$buffer);\n                  foreach my $ref ( @refs ) {\n                      $annotation->add_Annotation('reference',$ref);\n                  }\n              }\n\n              # DB Xrefs\n              elsif (/^DR/) {\n                  my @links = $self->_read_EMBL_DBLink(\\$buffer);\n                  foreach my $dblink ( @links ) {\n                      $annotation->add_Annotation('dblink',$dblink);\n                  }\n              }\n\n              # Comments\n              elsif (/^CC\\s+(.*)/) {\n                  $comment .= $1;\n                  $comment .= \" \";\n                  while (defined ($_ = $self->_readline) ) {\n                      if (/^CC\\s+(.*)/) {\n                          $comment .= $1;\n                          $comment .= \" \";\n                      } else {\n                          last;\n                      }\n                  }\n                  my $commobj = Bio::Annotation::Comment->new();\n                  $commobj->text($comment);\n                  $annotation->add_Annotation('comment',$commobj);\n                  $comment = \"\";\n              }\n\n              # Get next line.\n              $buffer = $self->_readline;\n          }\n\n    while ( defined ($_ = $self->_readline) ) {\n        /^FT\\s{3}\\w/ && last;\n        /^SQ / && last;\n        /^CO / && last;\n    }\n    $buffer = $_;\n\n    if (defined($buffer) && $buffer =~ /^FT /) {\n        until ( !defined ($buffer) ) {\n            my $ftunit = $self->_read_FTHelper_EMBL(\\$buffer);\n\n            # process ftunit\n            my $feat =\n                $ftunit->_generic_seqfeature($self->location_factory(), $name);\n\n            # add taxon_id from source if available\n            if ($params{'-species'} && ($feat->primary_tag eq 'source')\n                && $feat->has_tag('db_xref')\n                && (! $params{'-species'}->ncbi_taxid())) {\n                foreach my $tagval ($feat->get_tag_values('db_xref')) {\n                    if (index($tagval,\"taxon:\") == 0) {\n                        $params{'-species'}->ncbi_taxid(substr($tagval,6));\n                        last;\n                    }\n                }\n            }\n\n            # add feature to list of features\n            push(@features, $feat);\n\n            if ( $buffer !~ /^FT/ ) {\n                last;\n            }\n        }\n    }\n    # skip comments\n    while ( defined ($buffer) && $buffer =~ /^XX/ ) {\n        $buffer = $self->_readline();\n    }\n\n    if ( $buffer =~ /^CO/  ) {\n        until ( !defined ($buffer) ) {\n            my $ftunit = $self->_read_FTHelper_EMBL(\\$buffer);\n            # process ftunit\n            push(@features,\n                 $ftunit->_generic_seqfeature($self->location_factory(),\n                                              $name));\n\n            if ( $buffer !~ /^CO/ ) {\n                last;\n            }\n        }\n    }\n    if ( $buffer !~ /^SQ/  ) {\n        while ( defined ($_ = $self->_readline) ) {\n            /^SQ/ && last;\n        }\n    }\n    $seqc = \"\";\n    while ( defined ($_ = $self->_readline) ) {\n        m{^//} && last;\n        $_ = uc($_);\n        s/[^A-Za-z]//g;\n        $seqc .= $_;\n    }\n    my $seq = $self->sequence_factory->create\n        (-verbose => $self->verbose(),\n         -division => $div,\n         -seq => $seqc,\n         -desc => $desc,\n         -display_id => $name,\n         -annotation => $annotation,\n         -molecule => $mol,\n         -alphabet => $alphabet,\n         -features => \\@features,\n         %params);\n    return $seq;\n}\n\n\n\n=head2 _write_ID_line\n\n Title   : _write_ID_line\n Usage   : $self->_write_ID_line($seq);\n Function: Writes the EMBL Release 87 format ID line to the stream, unless\n         : there is a user-supplied ID line generation function in which\n         : case that is used instead.\n         : ( See Bio::SeqIO::embl::_id_generation_function(). )\n Returns : nothing\n Args    : Bio::Seq object","parameters":[{"label":"$self"},{"label":"$seq"}]},"detail":"($self,$seq)"},{"line":489,"kind":13,"localvar":"my","containerName":null,"definition":"my","name":"$name"},{"containerName":null,"name":"$seq","line":489,"kind":13},{"name":"accession_number","containerName":"main::","kind":12,"line":489},{"containerName":null,"name":"%name","kind":13,"line":490},{"name":"$name","containerName":null,"line":492,"kind":13},{"name":"$seq","containerName":null,"kind":13,"line":492},{"kind":12,"line":492,"name":"id","containerName":"main::"},{"kind":13,"line":495,"containerName":null,"name":"$self"},{"kind":12,"line":495,"containerName":"main::","name":"warn"},{"containerName":null,"name":"$name","line":495,"kind":13},{"line":495,"kind":13,"containerName":null,"name":"$name"},{"containerName":null,"definition":"my","name":"$version","localvar":"my","kind":13,"line":498},{"kind":13,"line":498,"containerName":null,"name":"$seq"},{"line":498,"kind":12,"containerName":"main::","name":"version"},{"line":500,"kind":13,"localvar":"my","containerName":null,"definition":"my","name":"$len"},{"name":"$seq","containerName":null,"line":500,"kind":13},{"kind":12,"line":500,"name":"length","containerName":"main::"},{"localvar":"my","definition":"my","name":"$div","containerName":null,"line":503,"kind":13},{"line":504,"kind":13,"name":"$seq","containerName":null},{"line":504,"kind":12,"name":"can","containerName":"main::"},{"line":504,"kind":13,"name":"$seq","containerName":null},{"containerName":"main::","name":"division","kind":12,"line":504},{"containerName":null,"name":"$self","kind":13,"line":505},{"containerName":"main::","name":"_is_valid_division","line":505,"kind":12},{"containerName":null,"name":"$seq","kind":13,"line":505},{"name":"division","containerName":"main::","kind":12,"line":505},{"line":506,"kind":13,"name":"$div","containerName":null},{"name":"$seq","containerName":null,"line":506,"kind":13},{"line":506,"kind":12,"containerName":"main::","name":"division"},{"name":"$div","containerName":null,"line":508,"kind":13},{"line":511,"kind":13,"localvar":"my","definition":"my","name":"$mol","containerName":null},{"name":"$seq","containerName":null,"line":513,"kind":13},{"name":"can","containerName":"main::","line":513,"kind":12},{"name":"$seq","containerName":null,"line":514,"kind":13},{"containerName":"main::","name":"molecule","line":514,"kind":12},{"line":515,"kind":13,"containerName":null,"name":"$self"},{"line":515,"kind":12,"name":"_is_valid_molecule_type","containerName":"main::"},{"name":"$seq","containerName":null,"line":515,"kind":13},{"containerName":"main::","name":"molecule","kind":12,"line":515},{"name":"$mol","containerName":null,"line":517,"kind":13},{"containerName":null,"name":"$seq","kind":13,"line":517},{"line":517,"kind":12,"name":"molecule","containerName":"main::"},{"line":520,"kind":13,"name":"$seq","containerName":null},{"kind":12,"line":520,"name":"can","containerName":"main::"},{"kind":13,"line":520,"containerName":null,"name":"$seq"},{"containerName":"main::","name":"primary_seq","kind":12,"line":520},{"containerName":"main::","name":"alphabet","line":520,"kind":12},{"line":521,"kind":13,"localvar":"my","containerName":null,"definition":"my","name":"$alphabet"},{"line":521,"kind":13,"name":"$seq","containerName":null},{"name":"primary_seq","containerName":"main::","kind":12,"line":521},{"line":521,"kind":12,"containerName":"main::","name":"alphabet"},{"kind":13,"line":522,"containerName":null,"name":"%alphabet"},{"kind":13,"line":523,"containerName":null,"name":"$mol"},{"containerName":null,"name":"%alphabet","line":524,"kind":13},{"name":"$mol","containerName":null,"line":525,"kind":13},{"kind":13,"line":526,"name":"%alphabet","containerName":null},{"kind":13,"line":527,"containerName":null,"name":"$self"},{"name":"warn","containerName":"main::","line":527,"kind":12},{"containerName":null,"name":"$mol","line":528,"kind":13},{"kind":13,"line":532,"name":"$topology","definition":"my","containerName":null,"localvar":"my"},{"line":533,"kind":13,"name":"$seq","containerName":null},{"line":533,"kind":12,"name":"is_circular","containerName":"main::"},{"line":534,"kind":13,"containerName":null,"name":"$topology"},{"containerName":null,"name":"$mol","kind":13,"line":537},{"kind":13,"line":538,"containerName":null,"name":"$id_line"},{"line":539,"kind":13,"containerName":null,"name":"$self"},{"containerName":"main::","name":"_print","kind":12,"line":539},{"name":"$id_line","containerName":null,"kind":13,"line":539},{"range":{"end":{"line":574,"character":9999},"start":{"line":553,"character":0}},"name":"_is_valid_division","line":553,"children":[{"localvar":"my","containerName":"_is_valid_division","definition":"my","name":"$self","line":554,"kind":13},{"kind":13,"line":554,"containerName":"_is_valid_division","name":"$division"},{"containerName":"_is_valid_division","name":"%EMBL_divisions","definition":"my","localvar":"my","kind":13,"line":556},{"line":573,"kind":13,"name":"$EMBL_divisions","containerName":"_is_valid_division"},{"name":"$division","containerName":"_is_valid_division","line":573,"kind":13}],"kind":12,"detail":"($self,$division)","signature":{"parameters":[{"label":"$self"},{"label":"$division"}],"documentation":"1;\n# $Id: embl.pm 16123 2009-09-17 12:57:27Z cjfields $\n#\n# BioPerl module for Bio::SeqIO::EMBL\n#\n# Please direct questions and support issues to <bioperl-l@bioperl.org> \n#\n# Cared for by Ewan Birney <birney@ebi.ac.uk>\n#\n# Copyright Ewan Birney\n#\n# You may distribute this module under the same terms as perl itself\n\n# POD documentation - main docs before the code\n\n=head1 NAME\n\nBio::SeqIO::embl - EMBL sequence input/output stream\n\n=head1 SYNOPSIS\n\nIt is probably best not to use this object directly, but\nrather go through the SeqIO handler system. Go:\n\n    $stream = Bio::SeqIO->new(-file => $filename, -format => 'EMBL');\n\n    while ( (my $seq = $stream->next_seq()) ) {\n        # do something with $seq\n    }\n\n=head1 DESCRIPTION\n\nThis object can transform Bio::Seq objects to and from EMBL flat\nfile databases.\n\nThere is a lot of flexibility here about how to dump things which\nshould be documented more fully.\n\nThere should be a common object that this and Genbank share (probably\nwith Swissprot). Too much of the magic is identical.\n\n=head2 Optional functions\n\n=over 3\n\n* _show_dna()\n\n(output only) shows the dna or not\n\n* _post_sort()\n\n(output only) provides a sorting func which is applied to the FTHelpers\nbefore printing\n\n* _id_generation_func()\n\nThis is function which is called as\n\n   print \"ID   \", $func($annseq), \"\\n\";\n\nTo generate the ID line. If it is not there, it generates a sensible ID\nline using a number of tools.\n\nIf you want to output annotations in EMBL format they need to be\nstored in a Bio::Annotation::Collection object which is accessible\nthrough the Bio::SeqI interface method L<annotation()|annotation>.\n\nThe following are the names of the keys which are polled from a\nL<Bio::Annotation::Collection> object.\n\n reference  - Should contain Bio::Annotation::Reference objects\n comment    - Should contain Bio::Annotation::Comment objects\n dblink     - Should contain Bio::Annotation::DBLink objects\n\n\n=head1 FEEDBACK\n\n=head2 Mailing Lists\n\nUser feedback is an integral part of the evolution of this and other\nBioperl modules. Send your comments and suggestions preferably to one\nof the Bioperl mailing lists.  Your participation is much appreciated.\n\n  bioperl-l@bioperl.org                  - General discussion\n  http://bioperl.org/wiki/Mailing_lists  - About the mailing lists\n\n=head2 Support \n\nPlease direct usage questions or support issues to the mailing list:\n\nI<bioperl-l@bioperl.org>\n\nrather than to the module maintainer directly. Many experienced and \nreponsive experts will be able look at the problem and quickly \naddress it. Please include a thorough description of the problem \nwith code and data examples if at all possible.\n\n=head2 Reporting Bugs\n\nReport bugs to the Bioperl bug tracking system to help us keep track\nthe bugs and their resolution. Bug reports can be submitted via\nthe web:\n\n  http://bugzilla.open-bio.org/\n\n=head1 AUTHOR - Ewan Birney\n\nEmail birney@ebi.ac.uk\n\n=head1 APPENDIX\n\nThe rest of the documentation details each of the object\nmethods. Internal methods are usually preceded with a _\n\n\n\n# Let the code begin...\n\n\npackage Bio::SeqIO::embl;\nuse vars qw(%FTQUAL_NO_QUOTE);\nuse strict;\nuse Bio::SeqIO::FTHelper;\nuse Bio::SeqFeature::Generic;\nuse Bio::Species;\nuse Bio::Seq::SeqFactory;\nuse Bio::Annotation::Collection;\nuse Bio::Annotation::Comment;\nuse Bio::Annotation::Reference;\nuse Bio::Annotation::DBLink;\n\nuse base qw(Bio::SeqIO);\n\n%FTQUAL_NO_QUOTE=(\n                  'anticodon'=>1,\n                  'citation'=>1,\n                  'codon'=>1,\n                  'codon_start'=>1,\n                  'cons_splice'=>1,\n                  'direction'=>1,\n                  'evidence'=>1,\n                  'label'=>1,\n                  'mod_base'=> 1,\n                  'number'=> 1,\n                  'rpt_type'=> 1,\n                  'rpt_unit'=> 1,\n                  'transl_except'=> 1,\n                  'transl_table'=> 1,\n                  'usedin'=> 1,\n                 );\n\nsub _initialize {\n    my($self,@args) = @_;\n\n    $self->SUPER::_initialize(@args);\n    # hash for functions for decoding keys.\n    $self->{'_func_ftunit_hash'} = {};\n    # sets this to one by default. People can change it\n    $self->_show_dna(1);\n    if ( ! defined $self->sequence_factory ) {\n        $self->sequence_factory(Bio::Seq::SeqFactory->new\n                                (-verbose => $self->verbose(),\n                                 -type => 'Bio::Seq::RichSeq'));\n    }\n}\n\n=head2 next_seq\n\n Title   : next_seq\n Usage   : $seq = $stream->next_seq()\n Function: returns the next sequence in the stream\n Returns : Bio::Seq object\n Args    :\n\n\nsub next_seq {\n    my ($self,@args) = @_;\n    my ($pseq,$c,$line,$name,$desc,$acc,$seqc,$mol,$div,\n        $date, $comment, @date_arr);\n\n    my ($annotation, %params, @features) =\n        Bio::Annotation::Collection->new();\n\n    $line = $self->_readline;\n    # This needs to be before the first eof() test\n\n    if ( !defined $line ) {\n        return;                 # no throws - end of file\n    }\n\n    if ( $line =~ /^\\s+$/ ) {\n        while ( defined ($line = $self->_readline) ) {\n            $line =~/^\\S/ && last;\n        }\n        # return without error if the whole next sequence was just a single\n        # blank line and then eof\n        return unless $line;\n    }\n\n    # no ID as 1st non-blank line, need short circuit and exit routine\n    $self->throw(\"EMBL stream with no ID. Not embl in my book\")\n        unless $line =~ /^ID\\s+\\S+/;\n\n    # At this point we are sure that $line contains an ID header line\n    my $alphabet;\n    if ( $line =~ tr/;/;/ == 6) { # New style headers contain exactly six semicolons.\n\n        # New style header (EMBL Release >= 87, after June 2006)\n        my $topology;\n        my $sv;\n\n        # ID   DQ299383; SV 1; linear; mRNA; STD; MAM; 431 BP.\n        # This regexp comes from the new2old.pl conversion script, from EBI\n        if ($line =~ m/^ID   (\\w+);\\s+SV (\\d+); (\\w+); ([^;]+); (\\w{3}); (\\w{3}); (\\d+) BP./) {\n        ($name, $sv, $topology, $mol, $div) = ($1, $2, $3, $4, $6);\n        }\n        if (defined($sv)) {\n        $params{'-seq_version'} = $sv;\n        $params{'-version'} = $sv;\n        }\n\n        if ($topology eq \"circular\") {\n        $params{'-is_circular'} = 1;\n        }\n    \n    if (defined $mol ) {\n        if ($mol =~ /DNA/) {\n        $alphabet='dna';\n        } elsif ($mol =~ /RNA/) {\n        $alphabet='rna';\n        } elsif ($mol =~ /AA/) {\n        $alphabet='protein';\n        }\n    }\n    } else {\n    \n        # Old style header (EMBL Release < 87, before June 2006)\n        if ($line =~ /^ID\\s+(\\S+)[^;]*;\\s+(\\S+)[^;]*;\\s+(\\S+)[^;]*;/) {\n        ($name, $mol, $div) = ($1, $2, $3);\n        }\n    \n        if ($mol) {\n            if ( $mol =~ /circular/ ) {\n            $params{'-is_circular'} = 1;\n            $mol =~  s|circular ||;\n            }\n            if (defined $mol ) {\n            if ($mol =~ /DNA/) {\n                $alphabet='dna';\n            } elsif ($mol =~ /RNA/) {\n                $alphabet='rna';\n            } elsif ($mol =~ /AA/) {\n                $alphabet='protein';\n            }\n            }\n        }\n    }\n\n    unless( defined $name && length($name) ) {\n    $name = \"unknown_id\";\n    }\n\n    # $self->warn(\"not parsing upper annotation in EMBL file yet!\");\n    my $buffer = $line;\n    local $_;\n    BEFORE_FEATURE_TABLE :\n          until ( !defined $buffer ) {\n              $_ = $buffer;\n              # Exit at start of Feature table\n              if ( /^(F[HT]|SQ)/ ) {\n                  $self->_pushback($_) if( $1 eq 'SQ' || $1 eq 'FT');\n                  last;\n              }\n              # Description line(s)\n              if (/^DE\\s+(\\S.*\\S)/) {\n                  $desc .= $desc ? \" $1\" : $1;\n              }\n\n              #accession number\n              if ( /^AC\\s+(.*)?/ || /^PA\\s+(.*)?/) {\n                  my @accs = split(/[; ]+/, $1); # allow space in addition\n                  $params{'-accession_number'} = shift @accs\n                      unless defined $params{'-accession_number'};\n                  push @{$params{'-secondary_accessions'}}, @accs;\n              }\n\n              #version number\n              if ( /^SV\\s+\\S+\\.(\\d+);?/ ) {\n                  my $sv = $1;\n                  #$sv =~ s/\\;//;\n                  $params{'-seq_version'} = $sv;\n                  $params{'-version'} = $sv;\n              }\n\n              #date (NOTE: takes last date line)\n              if ( /^DT\\s+(.+)$/ ) {\n                  my $line = $1;\n                  my ($date, $version) = split(' ', $line, 2);\n                  $date =~ tr/,//d; # remove comma if new version\n                  if ($version =~ /\\(Rel\\. (\\d+), Created\\)/xms ) {\n                      my $release = Bio::Annotation::SimpleValue->new(\n                                                                      -tagname    => 'creation_release',\n                                                                      -value      => $1\n                                                                     );\n                      $annotation->add_Annotation($release);\n                  } elsif ($version =~ /\\(Rel\\. (\\d+), Last updated, Version (\\d+)\\)/xms ) {\n                      my $release = Bio::Annotation::SimpleValue->new(\n                                                                      -tagname    => 'update_release',\n                                                                      -value      => $1\n                                                                     );\n                      $annotation->add_Annotation($release);\n\n                      my $update = Bio::Annotation::SimpleValue->new(\n                                                                     -tagname    => 'update_version',\n                                                                     -value      => $2\n                                                                    );\n                      $annotation->add_Annotation($update);\n                  }\n                  push @{$params{'-dates'}}, $date;\n              }\n\n              #keywords\n              if ( /^KW   (.*)\\S*$/ ) {\n                  my @kw = split(/\\s*\\;\\s*/,$1);\n                  push @{$params{'-keywords'}}, @kw;\n              }\n\n              # Organism name and phylogenetic information\n              elsif (/^O[SC]/) {\n                  # pass the accession number so we can give an informative throw message if necessary\n                  my $species = $self->_read_EMBL_Species(\\$buffer, $params{'-accession_number'});\n                  $params{'-species'}= $species;\n              }\n\n              # NCBI TaxID Xref\n              elsif (/^OX/) {\n                  my @links = $self->_read_EMBL_TaxID_DBLink(\\$buffer);\n                  foreach my $dblink ( @links ) {\n                      $annotation->add_Annotation('dblink',$dblink);\n                  }\n              }\n\n              # References\n              elsif (/^R/) {\n                  my @refs = $self->_read_EMBL_References(\\$buffer);\n                  foreach my $ref ( @refs ) {\n                      $annotation->add_Annotation('reference',$ref);\n                  }\n              }\n\n              # DB Xrefs\n              elsif (/^DR/) {\n                  my @links = $self->_read_EMBL_DBLink(\\$buffer);\n                  foreach my $dblink ( @links ) {\n                      $annotation->add_Annotation('dblink',$dblink);\n                  }\n              }\n\n              # Comments\n              elsif (/^CC\\s+(.*)/) {\n                  $comment .= $1;\n                  $comment .= \" \";\n                  while (defined ($_ = $self->_readline) ) {\n                      if (/^CC\\s+(.*)/) {\n                          $comment .= $1;\n                          $comment .= \" \";\n                      } else {\n                          last;\n                      }\n                  }\n                  my $commobj = Bio::Annotation::Comment->new();\n                  $commobj->text($comment);\n                  $annotation->add_Annotation('comment',$commobj);\n                  $comment = \"\";\n              }\n\n              # Get next line.\n              $buffer = $self->_readline;\n          }\n\n    while ( defined ($_ = $self->_readline) ) {\n        /^FT\\s{3}\\w/ && last;\n        /^SQ / && last;\n        /^CO / && last;\n    }\n    $buffer = $_;\n\n    if (defined($buffer) && $buffer =~ /^FT /) {\n        until ( !defined ($buffer) ) {\n            my $ftunit = $self->_read_FTHelper_EMBL(\\$buffer);\n\n            # process ftunit\n            my $feat =\n                $ftunit->_generic_seqfeature($self->location_factory(), $name);\n\n            # add taxon_id from source if available\n            if ($params{'-species'} && ($feat->primary_tag eq 'source')\n                && $feat->has_tag('db_xref')\n                && (! $params{'-species'}->ncbi_taxid())) {\n                foreach my $tagval ($feat->get_tag_values('db_xref')) {\n                    if (index($tagval,\"taxon:\") == 0) {\n                        $params{'-species'}->ncbi_taxid(substr($tagval,6));\n                        last;\n                    }\n                }\n            }\n\n            # add feature to list of features\n            push(@features, $feat);\n\n            if ( $buffer !~ /^FT/ ) {\n                last;\n            }\n        }\n    }\n    # skip comments\n    while ( defined ($buffer) && $buffer =~ /^XX/ ) {\n        $buffer = $self->_readline();\n    }\n\n    if ( $buffer =~ /^CO/  ) {\n        until ( !defined ($buffer) ) {\n            my $ftunit = $self->_read_FTHelper_EMBL(\\$buffer);\n            # process ftunit\n            push(@features,\n                 $ftunit->_generic_seqfeature($self->location_factory(),\n                                              $name));\n\n            if ( $buffer !~ /^CO/ ) {\n                last;\n            }\n        }\n    }\n    if ( $buffer !~ /^SQ/  ) {\n        while ( defined ($_ = $self->_readline) ) {\n            /^SQ/ && last;\n        }\n    }\n    $seqc = \"\";\n    while ( defined ($_ = $self->_readline) ) {\n        m{^//} && last;\n        $_ = uc($_);\n        s/[^A-Za-z]//g;\n        $seqc .= $_;\n    }\n    my $seq = $self->sequence_factory->create\n        (-verbose => $self->verbose(),\n         -division => $div,\n         -seq => $seqc,\n         -desc => $desc,\n         -display_id => $name,\n         -annotation => $annotation,\n         -molecule => $mol,\n         -alphabet => $alphabet,\n         -features => \\@features,\n         %params);\n    return $seq;\n}\n\n\n\n=head2 _write_ID_line\n\n Title   : _write_ID_line\n Usage   : $self->_write_ID_line($seq);\n Function: Writes the EMBL Release 87 format ID line to the stream, unless\n         : there is a user-supplied ID line generation function in which\n         : case that is used instead.\n         : ( See Bio::SeqIO::embl::_id_generation_function(). )\n Returns : nothing\n Args    : Bio::Seq object\n\n\nsub _write_ID_line {\n\n    my ($self, $seq) = @_;\n\n    my $id_line;\n    # If there is a user-supplied ID generation function, use it.\n    if ( $self->_id_generation_func ) {\n        $id_line = \"ID   \" . &{$self->_id_generation_func}($seq) . \"\\nXX\\n\";\n    }\n    # Otherwise, generate a standard EMBL release 87 (June 2006) ID line.\n    else {\n\n        # The sequence name is supposed to be the primary accession number,\n        my $name = $seq->accession_number();\n        if (!$name) {\n            # but if it is not present, use the sequence ID.\n            $name = $seq->id();\n        }\n\n        $self->warn(\"No whitespace allowed in EMBL id [\". $name. \"]\") if $name =~ /\\s/;\n\n        # Use the sequence version, or default to 1.\n        my $version = $seq->version() || 1;\n\n        my $len = $seq->length();\n\n        # Taxonomic division.\n        my $div;\n        if ( $seq->can('division') && defined($seq->division) &&\n             $self->_is_valid_division($seq->division) ) {\n            $div = $seq->division();\n        } else {\n            $div ||= 'UNC';     # 'UNC' is the EMBL division code for 'unclassified'.\n        }\n\n        my $mol;\n        # If the molecule type is a valid EMBL type, use it.\n        if (  $seq->can('molecule')\n              && defined($seq->molecule)\n              && $self->_is_valid_molecule_type($seq->molecule)\n           ) {\n            $mol = $seq->molecule();\n        }\n        # Otherwise, choose unassigned DNA or RNA based on the alphabet.\n        elsif ($seq->can('primary_seq') && defined $seq->primary_seq->alphabet) {\n            my $alphabet =$seq->primary_seq->alphabet;\n            if ($alphabet eq 'dna') {\n                $mol ='unassigned DNA';\n            } elsif ($alphabet eq 'rna') {\n                $mol='unassigned RNA';\n            } elsif ($alphabet eq 'protein') {\n                $self->warn(\"Protein sequence found; EMBL is a nucleotide format.\");\n                $mol='AA';  # AA is not a valid EMBL molecule type.\n            }\n        }\n\n        my $topology = 'linear';\n        if ($seq->is_circular) {\n            $topology = 'circular';\n        }\n\n        $mol ||= '';            # 'unassigned'; ?\n        $id_line = \"ID   $name; SV $version; $topology; $mol; STD; $div; $len BP.\\nXX\\n\";\n        $self->_print($id_line);\n    }\n}\n\n=head2 _is_valid_division\n\n Title   : _is_valid_division\n Usage   : $self->_is_valid_division($div)\n Function: tests division code for validity\n Returns : true if $div is a valid EMBL release 87 taxonomic division.\n Args    : taxonomic division code string","label":"_is_valid_division($self,$division)"},"containerName":"main::","definition":"sub"},{"name":"_is_valid_molecule_type","range":{"start":{"character":0,"line":586},"end":{"line":606,"character":9999}},"kind":12,"line":586,"children":[{"name":"$self","definition":"my","containerName":"_is_valid_molecule_type","localvar":"my","kind":13,"line":587},{"name":"$moltype","containerName":"_is_valid_molecule_type","kind":13,"line":587},{"line":589,"kind":13,"localvar":"my","containerName":"_is_valid_molecule_type","name":"%EMBL_molecule_types","definition":"my"},{"containerName":"_is_valid_molecule_type","name":"$EMBL_molecule_types","kind":13,"line":605},{"kind":13,"line":605,"containerName":"_is_valid_molecule_type","name":"$moltype"}],"containerName":"main::","definition":"sub","detail":"($self,$moltype)","signature":{"label":"_is_valid_molecule_type($self,$moltype)","documentation":"1;\n# $Id: embl.pm 16123 2009-09-17 12:57:27Z cjfields $\n#\n# BioPerl module for Bio::SeqIO::EMBL\n#\n# Please direct questions and support issues to <bioperl-l@bioperl.org> \n#\n# Cared for by Ewan Birney <birney@ebi.ac.uk>\n#\n# Copyright Ewan Birney\n#\n# You may distribute this module under the same terms as perl itself\n\n# POD documentation - main docs before the code\n\n=head1 NAME\n\nBio::SeqIO::embl - EMBL sequence input/output stream\n\n=head1 SYNOPSIS\n\nIt is probably best not to use this object directly, but\nrather go through the SeqIO handler system. Go:\n\n    $stream = Bio::SeqIO->new(-file => $filename, -format => 'EMBL');\n\n    while ( (my $seq = $stream->next_seq()) ) {\n        # do something with $seq\n    }\n\n=head1 DESCRIPTION\n\nThis object can transform Bio::Seq objects to and from EMBL flat\nfile databases.\n\nThere is a lot of flexibility here about how to dump things which\nshould be documented more fully.\n\nThere should be a common object that this and Genbank share (probably\nwith Swissprot). Too much of the magic is identical.\n\n=head2 Optional functions\n\n=over 3\n\n* _show_dna()\n\n(output only) shows the dna or not\n\n* _post_sort()\n\n(output only) provides a sorting func which is applied to the FTHelpers\nbefore printing\n\n* _id_generation_func()\n\nThis is function which is called as\n\n   print \"ID   \", $func($annseq), \"\\n\";\n\nTo generate the ID line. If it is not there, it generates a sensible ID\nline using a number of tools.\n\nIf you want to output annotations in EMBL format they need to be\nstored in a Bio::Annotation::Collection object which is accessible\nthrough the Bio::SeqI interface method L<annotation()|annotation>.\n\nThe following are the names of the keys which are polled from a\nL<Bio::Annotation::Collection> object.\n\n reference  - Should contain Bio::Annotation::Reference objects\n comment    - Should contain Bio::Annotation::Comment objects\n dblink     - Should contain Bio::Annotation::DBLink objects\n\n\n=head1 FEEDBACK\n\n=head2 Mailing Lists\n\nUser feedback is an integral part of the evolution of this and other\nBioperl modules. Send your comments and suggestions preferably to one\nof the Bioperl mailing lists.  Your participation is much appreciated.\n\n  bioperl-l@bioperl.org                  - General discussion\n  http://bioperl.org/wiki/Mailing_lists  - About the mailing lists\n\n=head2 Support \n\nPlease direct usage questions or support issues to the mailing list:\n\nI<bioperl-l@bioperl.org>\n\nrather than to the module maintainer directly. Many experienced and \nreponsive experts will be able look at the problem and quickly \naddress it. Please include a thorough description of the problem \nwith code and data examples if at all possible.\n\n=head2 Reporting Bugs\n\nReport bugs to the Bioperl bug tracking system to help us keep track\nthe bugs and their resolution. Bug reports can be submitted via\nthe web:\n\n  http://bugzilla.open-bio.org/\n\n=head1 AUTHOR - Ewan Birney\n\nEmail birney@ebi.ac.uk\n\n=head1 APPENDIX\n\nThe rest of the documentation details each of the object\nmethods. Internal methods are usually preceded with a _\n\n\n\n# Let the code begin...\n\n\npackage Bio::SeqIO::embl;\nuse vars qw(%FTQUAL_NO_QUOTE);\nuse strict;\nuse Bio::SeqIO::FTHelper;\nuse Bio::SeqFeature::Generic;\nuse Bio::Species;\nuse Bio::Seq::SeqFactory;\nuse Bio::Annotation::Collection;\nuse Bio::Annotation::Comment;\nuse Bio::Annotation::Reference;\nuse Bio::Annotation::DBLink;\n\nuse base qw(Bio::SeqIO);\n\n%FTQUAL_NO_QUOTE=(\n                  'anticodon'=>1,\n                  'citation'=>1,\n                  'codon'=>1,\n                  'codon_start'=>1,\n                  'cons_splice'=>1,\n                  'direction'=>1,\n                  'evidence'=>1,\n                  'label'=>1,\n                  'mod_base'=> 1,\n                  'number'=> 1,\n                  'rpt_type'=> 1,\n                  'rpt_unit'=> 1,\n                  'transl_except'=> 1,\n                  'transl_table'=> 1,\n                  'usedin'=> 1,\n                 );\n\nsub _initialize {\n    my($self,@args) = @_;\n\n    $self->SUPER::_initialize(@args);\n    # hash for functions for decoding keys.\n    $self->{'_func_ftunit_hash'} = {};\n    # sets this to one by default. People can change it\n    $self->_show_dna(1);\n    if ( ! defined $self->sequence_factory ) {\n        $self->sequence_factory(Bio::Seq::SeqFactory->new\n                                (-verbose => $self->verbose(),\n                                 -type => 'Bio::Seq::RichSeq'));\n    }\n}\n\n=head2 next_seq\n\n Title   : next_seq\n Usage   : $seq = $stream->next_seq()\n Function: returns the next sequence in the stream\n Returns : Bio::Seq object\n Args    :\n\n\nsub next_seq {\n    my ($self,@args) = @_;\n    my ($pseq,$c,$line,$name,$desc,$acc,$seqc,$mol,$div,\n        $date, $comment, @date_arr);\n\n    my ($annotation, %params, @features) =\n        Bio::Annotation::Collection->new();\n\n    $line = $self->_readline;\n    # This needs to be before the first eof() test\n\n    if ( !defined $line ) {\n        return;                 # no throws - end of file\n    }\n\n    if ( $line =~ /^\\s+$/ ) {\n        while ( defined ($line = $self->_readline) ) {\n            $line =~/^\\S/ && last;\n        }\n        # return without error if the whole next sequence was just a single\n        # blank line and then eof\n        return unless $line;\n    }\n\n    # no ID as 1st non-blank line, need short circuit and exit routine\n    $self->throw(\"EMBL stream with no ID. Not embl in my book\")\n        unless $line =~ /^ID\\s+\\S+/;\n\n    # At this point we are sure that $line contains an ID header line\n    my $alphabet;\n    if ( $line =~ tr/;/;/ == 6) { # New style headers contain exactly six semicolons.\n\n        # New style header (EMBL Release >= 87, after June 2006)\n        my $topology;\n        my $sv;\n\n        # ID   DQ299383; SV 1; linear; mRNA; STD; MAM; 431 BP.\n        # This regexp comes from the new2old.pl conversion script, from EBI\n        if ($line =~ m/^ID   (\\w+);\\s+SV (\\d+); (\\w+); ([^;]+); (\\w{3}); (\\w{3}); (\\d+) BP./) {\n        ($name, $sv, $topology, $mol, $div) = ($1, $2, $3, $4, $6);\n        }\n        if (defined($sv)) {\n        $params{'-seq_version'} = $sv;\n        $params{'-version'} = $sv;\n        }\n\n        if ($topology eq \"circular\") {\n        $params{'-is_circular'} = 1;\n        }\n    \n    if (defined $mol ) {\n        if ($mol =~ /DNA/) {\n        $alphabet='dna';\n        } elsif ($mol =~ /RNA/) {\n        $alphabet='rna';\n        } elsif ($mol =~ /AA/) {\n        $alphabet='protein';\n        }\n    }\n    } else {\n    \n        # Old style header (EMBL Release < 87, before June 2006)\n        if ($line =~ /^ID\\s+(\\S+)[^;]*;\\s+(\\S+)[^;]*;\\s+(\\S+)[^;]*;/) {\n        ($name, $mol, $div) = ($1, $2, $3);\n        }\n    \n        if ($mol) {\n            if ( $mol =~ /circular/ ) {\n            $params{'-is_circular'} = 1;\n            $mol =~  s|circular ||;\n            }\n            if (defined $mol ) {\n            if ($mol =~ /DNA/) {\n                $alphabet='dna';\n            } elsif ($mol =~ /RNA/) {\n                $alphabet='rna';\n            } elsif ($mol =~ /AA/) {\n                $alphabet='protein';\n            }\n            }\n        }\n    }\n\n    unless( defined $name && length($name) ) {\n    $name = \"unknown_id\";\n    }\n\n    # $self->warn(\"not parsing upper annotation in EMBL file yet!\");\n    my $buffer = $line;\n    local $_;\n    BEFORE_FEATURE_TABLE :\n          until ( !defined $buffer ) {\n              $_ = $buffer;\n              # Exit at start of Feature table\n              if ( /^(F[HT]|SQ)/ ) {\n                  $self->_pushback($_) if( $1 eq 'SQ' || $1 eq 'FT');\n                  last;\n              }\n              # Description line(s)\n              if (/^DE\\s+(\\S.*\\S)/) {\n                  $desc .= $desc ? \" $1\" : $1;\n              }\n\n              #accession number\n              if ( /^AC\\s+(.*)?/ || /^PA\\s+(.*)?/) {\n                  my @accs = split(/[; ]+/, $1); # allow space in addition\n                  $params{'-accession_number'} = shift @accs\n                      unless defined $params{'-accession_number'};\n                  push @{$params{'-secondary_accessions'}}, @accs;\n              }\n\n              #version number\n              if ( /^SV\\s+\\S+\\.(\\d+);?/ ) {\n                  my $sv = $1;\n                  #$sv =~ s/\\;//;\n                  $params{'-seq_version'} = $sv;\n                  $params{'-version'} = $sv;\n              }\n\n              #date (NOTE: takes last date line)\n              if ( /^DT\\s+(.+)$/ ) {\n                  my $line = $1;\n                  my ($date, $version) = split(' ', $line, 2);\n                  $date =~ tr/,//d; # remove comma if new version\n                  if ($version =~ /\\(Rel\\. (\\d+), Created\\)/xms ) {\n                      my $release = Bio::Annotation::SimpleValue->new(\n                                                                      -tagname    => 'creation_release',\n                                                                      -value      => $1\n                                                                     );\n                      $annotation->add_Annotation($release);\n                  } elsif ($version =~ /\\(Rel\\. (\\d+), Last updated, Version (\\d+)\\)/xms ) {\n                      my $release = Bio::Annotation::SimpleValue->new(\n                                                                      -tagname    => 'update_release',\n                                                                      -value      => $1\n                                                                     );\n                      $annotation->add_Annotation($release);\n\n                      my $update = Bio::Annotation::SimpleValue->new(\n                                                                     -tagname    => 'update_version',\n                                                                     -value      => $2\n                                                                    );\n                      $annotation->add_Annotation($update);\n                  }\n                  push @{$params{'-dates'}}, $date;\n              }\n\n              #keywords\n              if ( /^KW   (.*)\\S*$/ ) {\n                  my @kw = split(/\\s*\\;\\s*/,$1);\n                  push @{$params{'-keywords'}}, @kw;\n              }\n\n              # Organism name and phylogenetic information\n              elsif (/^O[SC]/) {\n                  # pass the accession number so we can give an informative throw message if necessary\n                  my $species = $self->_read_EMBL_Species(\\$buffer, $params{'-accession_number'});\n                  $params{'-species'}= $species;\n              }\n\n              # NCBI TaxID Xref\n              elsif (/^OX/) {\n                  my @links = $self->_read_EMBL_TaxID_DBLink(\\$buffer);\n                  foreach my $dblink ( @links ) {\n                      $annotation->add_Annotation('dblink',$dblink);\n                  }\n              }\n\n              # References\n              elsif (/^R/) {\n                  my @refs = $self->_read_EMBL_References(\\$buffer);\n                  foreach my $ref ( @refs ) {\n                      $annotation->add_Annotation('reference',$ref);\n                  }\n              }\n\n              # DB Xrefs\n              elsif (/^DR/) {\n                  my @links = $self->_read_EMBL_DBLink(\\$buffer);\n                  foreach my $dblink ( @links ) {\n                      $annotation->add_Annotation('dblink',$dblink);\n                  }\n              }\n\n              # Comments\n              elsif (/^CC\\s+(.*)/) {\n                  $comment .= $1;\n                  $comment .= \" \";\n                  while (defined ($_ = $self->_readline) ) {\n                      if (/^CC\\s+(.*)/) {\n                          $comment .= $1;\n                          $comment .= \" \";\n                      } else {\n                          last;\n                      }\n                  }\n                  my $commobj = Bio::Annotation::Comment->new();\n                  $commobj->text($comment);\n                  $annotation->add_Annotation('comment',$commobj);\n                  $comment = \"\";\n              }\n\n              # Get next line.\n              $buffer = $self->_readline;\n          }\n\n    while ( defined ($_ = $self->_readline) ) {\n        /^FT\\s{3}\\w/ && last;\n        /^SQ / && last;\n        /^CO / && last;\n    }\n    $buffer = $_;\n\n    if (defined($buffer) && $buffer =~ /^FT /) {\n        until ( !defined ($buffer) ) {\n            my $ftunit = $self->_read_FTHelper_EMBL(\\$buffer);\n\n            # process ftunit\n            my $feat =\n                $ftunit->_generic_seqfeature($self->location_factory(), $name);\n\n            # add taxon_id from source if available\n            if ($params{'-species'} && ($feat->primary_tag eq 'source')\n                && $feat->has_tag('db_xref')\n                && (! $params{'-species'}->ncbi_taxid())) {\n                foreach my $tagval ($feat->get_tag_values('db_xref')) {\n                    if (index($tagval,\"taxon:\") == 0) {\n                        $params{'-species'}->ncbi_taxid(substr($tagval,6));\n                        last;\n                    }\n                }\n            }\n\n            # add feature to list of features\n            push(@features, $feat);\n\n            if ( $buffer !~ /^FT/ ) {\n                last;\n            }\n        }\n    }\n    # skip comments\n    while ( defined ($buffer) && $buffer =~ /^XX/ ) {\n        $buffer = $self->_readline();\n    }\n\n    if ( $buffer =~ /^CO/  ) {\n        until ( !defined ($buffer) ) {\n            my $ftunit = $self->_read_FTHelper_EMBL(\\$buffer);\n            # process ftunit\n            push(@features,\n                 $ftunit->_generic_seqfeature($self->location_factory(),\n                                              $name));\n\n            if ( $buffer !~ /^CO/ ) {\n                last;\n            }\n        }\n    }\n    if ( $buffer !~ /^SQ/  ) {\n        while ( defined ($_ = $self->_readline) ) {\n            /^SQ/ && last;\n        }\n    }\n    $seqc = \"\";\n    while ( defined ($_ = $self->_readline) ) {\n        m{^//} && last;\n        $_ = uc($_);\n        s/[^A-Za-z]//g;\n        $seqc .= $_;\n    }\n    my $seq = $self->sequence_factory->create\n        (-verbose => $self->verbose(),\n         -division => $div,\n         -seq => $seqc,\n         -desc => $desc,\n         -display_id => $name,\n         -annotation => $annotation,\n         -molecule => $mol,\n         -alphabet => $alphabet,\n         -features => \\@features,\n         %params);\n    return $seq;\n}\n\n\n\n=head2 _write_ID_line\n\n Title   : _write_ID_line\n Usage   : $self->_write_ID_line($seq);\n Function: Writes the EMBL Release 87 format ID line to the stream, unless\n         : there is a user-supplied ID line generation function in which\n         : case that is used instead.\n         : ( See Bio::SeqIO::embl::_id_generation_function(). )\n Returns : nothing\n Args    : Bio::Seq object\n\n\nsub _write_ID_line {\n\n    my ($self, $seq) = @_;\n\n    my $id_line;\n    # If there is a user-supplied ID generation function, use it.\n    if ( $self->_id_generation_func ) {\n        $id_line = \"ID   \" . &{$self->_id_generation_func}($seq) . \"\\nXX\\n\";\n    }\n    # Otherwise, generate a standard EMBL release 87 (June 2006) ID line.\n    else {\n\n        # The sequence name is supposed to be the primary accession number,\n        my $name = $seq->accession_number();\n        if (!$name) {\n            # but if it is not present, use the sequence ID.\n            $name = $seq->id();\n        }\n\n        $self->warn(\"No whitespace allowed in EMBL id [\". $name. \"]\") if $name =~ /\\s/;\n\n        # Use the sequence version, or default to 1.\n        my $version = $seq->version() || 1;\n\n        my $len = $seq->length();\n\n        # Taxonomic division.\n        my $div;\n        if ( $seq->can('division') && defined($seq->division) &&\n             $self->_is_valid_division($seq->division) ) {\n            $div = $seq->division();\n        } else {\n            $div ||= 'UNC';     # 'UNC' is the EMBL division code for 'unclassified'.\n        }\n\n        my $mol;\n        # If the molecule type is a valid EMBL type, use it.\n        if (  $seq->can('molecule')\n              && defined($seq->molecule)\n              && $self->_is_valid_molecule_type($seq->molecule)\n           ) {\n            $mol = $seq->molecule();\n        }\n        # Otherwise, choose unassigned DNA or RNA based on the alphabet.\n        elsif ($seq->can('primary_seq') && defined $seq->primary_seq->alphabet) {\n            my $alphabet =$seq->primary_seq->alphabet;\n            if ($alphabet eq 'dna') {\n                $mol ='unassigned DNA';\n            } elsif ($alphabet eq 'rna') {\n                $mol='unassigned RNA';\n            } elsif ($alphabet eq 'protein') {\n                $self->warn(\"Protein sequence found; EMBL is a nucleotide format.\");\n                $mol='AA';  # AA is not a valid EMBL molecule type.\n            }\n        }\n\n        my $topology = 'linear';\n        if ($seq->is_circular) {\n            $topology = 'circular';\n        }\n\n        $mol ||= '';            # 'unassigned'; ?\n        $id_line = \"ID   $name; SV $version; $topology; $mol; STD; $div; $len BP.\\nXX\\n\";\n        $self->_print($id_line);\n    }\n}\n\n=head2 _is_valid_division\n\n Title   : _is_valid_division\n Usage   : $self->_is_valid_division($div)\n Function: tests division code for validity\n Returns : true if $div is a valid EMBL release 87 taxonomic division.\n Args    : taxonomic division code string\n\n\nsub _is_valid_division {\n    my ($self, $division) = @_;\n\n    my %EMBL_divisions = (\n                          \"PHG\"    => 1, # Bacteriophage\n                          \"ENV\"    => 1, # Environmental Sample\n                          \"FUN\"    => 1, # Fungal\n                          \"HUM\"    => 1, # Human\n                          \"INV\"    => 1, # Invertebrate\n                          \"MAM\"    => 1, # Other Mammal\n                          \"VRT\"    => 1, # Other Vertebrate\n                          \"MUS\"    => 1, # Mus musculus\n                          \"PLN\"    => 1, # Plant\n                          \"PRO\"    => 1, # Prokaryote\n                          \"ROD\"    => 1, # Other Rodent\n                          \"SYN\"    => 1, # Synthetic\n                          \"UNC\"    => 1, # Unclassified\n                          \"VRL\"    => 1 # Viral\n                         );\n\n    return exists($EMBL_divisions{$division});\n}\n\n=head2 _is_valid_molecule_type\n\n Title   : _is_valid_molecule_type\n Usage   : $self->_is_valid_molecule_type($mol)\n Function: tests molecule type for validity\n Returns : true if $mol is a valid EMBL release 87 molecule type.\n Args    : molecule type string","parameters":[{"label":"$self"},{"label":"$moltype"}]}},{"children":[{"kind":13,"line":620,"definition":"my","name":"$self","containerName":"write_seq","localvar":"my"},{"containerName":"write_seq","name":"@seqs","line":620,"kind":13},{"kind":13,"line":622,"containerName":"write_seq","definition":"my","name":"$seq","localvar":"my"},{"name":"@seqs","containerName":"write_seq","kind":13,"line":622},{"name":"$self","containerName":"write_seq","kind":13,"line":623},{"kind":12,"line":623,"containerName":"write_seq","name":"throw"},{"kind":13,"line":623,"name":"$seq","containerName":"write_seq"},{"kind":13,"line":624,"containerName":"write_seq","name":"$seq"},{"name":"$seq","containerName":"write_seq","line":624,"kind":13},{"name":"isa","containerName":"write_seq","kind":12,"line":624},{"containerName":"write_seq","name":"$self","line":625,"kind":13},{"containerName":"write_seq","name":"warn","kind":12,"line":625},{"kind":13,"line":626,"containerName":"write_seq","name":"$self"},{"containerName":"write_seq","name":"verbose","kind":12,"line":626},{"name":"$seq","containerName":"write_seq","line":627,"kind":13},{"name":"$seq","containerName":"write_seq","line":627,"kind":13},{"name":"isa","containerName":"write_seq","line":627,"kind":12},{"line":628,"kind":13,"containerName":"write_seq","name":"$self"},{"containerName":"write_seq","name":"throw","line":628,"kind":12},{"containerName":"write_seq","name":"$str","definition":"my","localvar":"my","kind":13,"line":631},{"containerName":"write_seq","name":"$seq","kind":13,"line":631},{"line":631,"kind":12,"name":"seq","containerName":"write_seq"},{"name":"$self","containerName":"write_seq","line":634,"kind":13},{"line":634,"kind":12,"name":"_write_ID_line","containerName":"write_seq"},{"containerName":"write_seq","name":"$seq","kind":13,"line":634},{"localvar":"my","definition":"my","name":"$acc","containerName":"write_seq","line":638,"kind":13},{"localvar":"my","name":"$func","definition":"my","containerName":"write_seq","line":640,"kind":13},{"containerName":"write_seq","name":"$self","line":640,"kind":13},{"kind":12,"line":640,"name":"_ac_generation_func","containerName":"write_seq"},{"containerName":"write_seq","name":"$acc","line":641,"kind":13},{"containerName":"write_seq","name":"$func","kind":13,"line":641},{"name":"$seq","containerName":"write_seq","line":641,"kind":13},{"line":642,"kind":13,"containerName":"write_seq","name":"$seq"},{"name":"isa","containerName":"write_seq","kind":12,"line":642},{"containerName":"write_seq","name":"$seq","line":643,"kind":13},{"name":"accession_number","containerName":"write_seq","kind":12,"line":643},{"line":644,"kind":13,"containerName":"write_seq","name":"$acc"},{"containerName":"write_seq","name":"$seq","line":644,"kind":13},{"name":"accession_number","containerName":"write_seq","line":644,"kind":12},{"kind":13,"line":645,"containerName":"write_seq","name":"$acc"},{"name":"$acc","containerName":"write_seq","kind":13,"line":645},{"containerName":"write_seq","name":"$seq","kind":13,"line":645},{"line":645,"kind":12,"containerName":"write_seq","name":"get_secondary_accessions"},{"containerName":"write_seq","name":"$seq","kind":13,"line":646},{"name":"can","containerName":"write_seq","kind":12,"line":646},{"containerName":"write_seq","name":"$acc","kind":13,"line":647},{"kind":13,"line":647,"name":"$seq","containerName":"write_seq"},{"kind":12,"line":647,"name":"accession_number","containerName":"write_seq"},{"name":"$acc","containerName":"write_seq","line":650,"kind":13},{"containerName":"write_seq","name":"$self","kind":13,"line":651},{"name":"_print","containerName":"write_seq","line":651,"kind":12},{"containerName":"write_seq","name":"$switch","definition":"my","localvar":"my","kind":13,"line":657},{"kind":13,"line":658,"containerName":"write_seq","name":"$seq"},{"name":"can","containerName":"write_seq","line":658,"kind":12},{"kind":13,"line":659,"containerName":"write_seq","definition":"my","name":"@dates","localvar":"my"},{"name":"$seq","containerName":"write_seq","kind":13,"line":659},{"name":"get_dates","containerName":"write_seq","line":659,"kind":12},{"containerName":"write_seq","definition":"my","name":"$ct","localvar":"my","kind":13,"line":660},{"kind":13,"line":661,"definition":"my","name":"$date_flag","containerName":"write_seq","localvar":"my"},{"containerName":"write_seq","name":"$cr","definition":"my","localvar":"my","kind":13,"line":662},{"name":"$seq","containerName":"write_seq","kind":13,"line":662},{"kind":12,"line":662,"containerName":"write_seq","name":"annotation"},{"line":662,"kind":12,"name":"get_Annotations","containerName":"write_seq"},{"line":663,"kind":13,"localvar":"my","name":"$ur","definition":"my","containerName":"write_seq"},{"line":663,"kind":13,"containerName":"write_seq","name":"$seq"},{"line":663,"kind":12,"name":"annotation","containerName":"write_seq"},{"containerName":"write_seq","name":"get_Annotations","line":663,"kind":12},{"line":664,"kind":13,"localvar":"my","definition":"my","name":"$uv","containerName":"write_seq"},{"line":664,"kind":13,"name":"$seq","containerName":"write_seq"},{"kind":12,"line":664,"containerName":"write_seq","name":"annotation"},{"containerName":"write_seq","name":"get_Annotations","line":664,"kind":12},{"name":"$cr","containerName":"write_seq","kind":13,"line":666},{"kind":13,"line":666,"containerName":"write_seq","name":"$ur"},{"kind":13,"line":666,"name":"$ur","containerName":"write_seq"},{"name":"$date_flag","containerName":"write_seq","kind":13,"line":667},{"localvar":"my","definition":"my","name":"$dt","containerName":"write_seq","line":670,"kind":13},{"kind":13,"line":670,"containerName":"write_seq","name":"@dates"},{"containerName":"write_seq","name":"$date_flag","line":671,"kind":13},{"containerName":"write_seq","name":"$self","kind":13,"line":672},{"kind":12,"line":672,"containerName":"write_seq","name":"_write_line_EMBL_regex"},{"kind":13,"line":673,"containerName":"write_seq","name":"$dt"},{"containerName":"write_seq","name":"$ct","line":674,"kind":13},{"containerName":"write_seq","name":"$self","line":675,"kind":13},{"containerName":"write_seq","name":"_write_line_EMBL_regex","line":675,"kind":12},{"line":676,"kind":13,"containerName":"write_seq","name":"$dt"},{"kind":13,"line":677,"containerName":"write_seq","name":"$ct"},{"containerName":"write_seq","name":"$self","line":679,"kind":13},{"line":679,"kind":12,"name":"_write_line_EMBL_regex","containerName":"write_seq"},{"line":680,"kind":13,"name":"$dt","containerName":"write_seq"},{"line":682,"kind":13,"name":"$switch","containerName":"write_seq"},{"containerName":"write_seq","name":"$ct","kind":13,"line":683},{"name":"$switch","containerName":"write_seq","kind":13,"line":685},{"line":686,"kind":13,"containerName":"write_seq","name":"$self"},{"line":686,"kind":12,"containerName":"write_seq","name":"_print"},{"name":"$self","containerName":"write_seq","kind":13,"line":691},{"line":691,"kind":12,"containerName":"write_seq","name":"_write_line_EMBL_regex"},{"line":691,"kind":13,"containerName":"write_seq","name":"$seq"},{"kind":12,"line":691,"name":"desc","containerName":"write_seq"},{"kind":13,"line":692,"containerName":"write_seq","name":"$self"},{"kind":12,"line":692,"name":"_print","containerName":"write_seq"},{"localvar":"my","containerName":"write_seq","definition":"my","name":"$kw","line":696,"kind":13},{"line":697,"kind":13,"localvar":"my","definition":"my","name":"$func","containerName":"write_seq"},{"kind":13,"line":697,"containerName":"write_seq","name":"$self"},{"containerName":"write_seq","name":"_kw_generation_func","line":697,"kind":12},{"containerName":"write_seq","name":"$kw","kind":13,"line":698},{"kind":13,"line":698,"containerName":"write_seq","name":"$func"},{"kind":13,"line":698,"containerName":"write_seq","name":"$seq"},{"line":699,"kind":13,"containerName":"write_seq","name":"$seq"},{"containerName":"write_seq","name":"can","line":699,"kind":12},{"kind":13,"line":700,"containerName":"write_seq","name":"$kw"},{"kind":13,"line":700,"containerName":"write_seq","name":"$seq"},{"name":"keywords","containerName":"write_seq","line":700,"kind":12},{"containerName":"write_seq","name":"$kw","line":702,"kind":13},{"name":"$self","containerName":"write_seq","line":703,"kind":13},{"kind":12,"line":703,"name":"_write_line_EMBL_regex","containerName":"write_seq"},{"line":703,"kind":13,"name":"$kw","containerName":"write_seq"},{"line":704,"kind":13,"containerName":"write_seq","name":"$self"},{"name":"_print","containerName":"write_seq","line":704,"kind":12}],"line":619,"kind":12,"signature":{"parameters":[{"label":"$self"},{"label":"@seqs"}],"documentation":"1;\n# $Id: embl.pm 16123 2009-09-17 12:57:27Z cjfields $\n#\n# BioPerl module for Bio::SeqIO::EMBL\n#\n# Please direct questions and support issues to <bioperl-l@bioperl.org> \n#\n# Cared for by Ewan Birney <birney@ebi.ac.uk>\n#\n# Copyright Ewan Birney\n#\n# You may distribute this module under the same terms as perl itself\n\n# POD documentation - main docs before the code\n\n=head1 NAME\n\nBio::SeqIO::embl - EMBL sequence input/output stream\n\n=head1 SYNOPSIS\n\nIt is probably best not to use this object directly, but\nrather go through the SeqIO handler system. Go:\n\n    $stream = Bio::SeqIO->new(-file => $filename, -format => 'EMBL');\n\n    while ( (my $seq = $stream->next_seq()) ) {\n        # do something with $seq\n    }\n\n=head1 DESCRIPTION\n\nThis object can transform Bio::Seq objects to and from EMBL flat\nfile databases.\n\nThere is a lot of flexibility here about how to dump things which\nshould be documented more fully.\n\nThere should be a common object that this and Genbank share (probably\nwith Swissprot). Too much of the magic is identical.\n\n=head2 Optional functions\n\n=over 3\n\n* _show_dna()\n\n(output only) shows the dna or not\n\n* _post_sort()\n\n(output only) provides a sorting func which is applied to the FTHelpers\nbefore printing\n\n* _id_generation_func()\n\nThis is function which is called as\n\n   print \"ID   \", $func($annseq), \"\\n\";\n\nTo generate the ID line. If it is not there, it generates a sensible ID\nline using a number of tools.\n\nIf you want to output annotations in EMBL format they need to be\nstored in a Bio::Annotation::Collection object which is accessible\nthrough the Bio::SeqI interface method L<annotation()|annotation>.\n\nThe following are the names of the keys which are polled from a\nL<Bio::Annotation::Collection> object.\n\n reference  - Should contain Bio::Annotation::Reference objects\n comment    - Should contain Bio::Annotation::Comment objects\n dblink     - Should contain Bio::Annotation::DBLink objects\n\n\n=head1 FEEDBACK\n\n=head2 Mailing Lists\n\nUser feedback is an integral part of the evolution of this and other\nBioperl modules. Send your comments and suggestions preferably to one\nof the Bioperl mailing lists.  Your participation is much appreciated.\n\n  bioperl-l@bioperl.org                  - General discussion\n  http://bioperl.org/wiki/Mailing_lists  - About the mailing lists\n\n=head2 Support \n\nPlease direct usage questions or support issues to the mailing list:\n\nI<bioperl-l@bioperl.org>\n\nrather than to the module maintainer directly. Many experienced and \nreponsive experts will be able look at the problem and quickly \naddress it. Please include a thorough description of the problem \nwith code and data examples if at all possible.\n\n=head2 Reporting Bugs\n\nReport bugs to the Bioperl bug tracking system to help us keep track\nthe bugs and their resolution. Bug reports can be submitted via\nthe web:\n\n  http://bugzilla.open-bio.org/\n\n=head1 AUTHOR - Ewan Birney\n\nEmail birney@ebi.ac.uk\n\n=head1 APPENDIX\n\nThe rest of the documentation details each of the object\nmethods. Internal methods are usually preceded with a _\n\n\n\n# Let the code begin...\n\n\npackage Bio::SeqIO::embl;\nuse vars qw(%FTQUAL_NO_QUOTE);\nuse strict;\nuse Bio::SeqIO::FTHelper;\nuse Bio::SeqFeature::Generic;\nuse Bio::Species;\nuse Bio::Seq::SeqFactory;\nuse Bio::Annotation::Collection;\nuse Bio::Annotation::Comment;\nuse Bio::Annotation::Reference;\nuse Bio::Annotation::DBLink;\n\nuse base qw(Bio::SeqIO);\n\n%FTQUAL_NO_QUOTE=(\n                  'anticodon'=>1,\n                  'citation'=>1,\n                  'codon'=>1,\n                  'codon_start'=>1,\n                  'cons_splice'=>1,\n                  'direction'=>1,\n                  'evidence'=>1,\n                  'label'=>1,\n                  'mod_base'=> 1,\n                  'number'=> 1,\n                  'rpt_type'=> 1,\n                  'rpt_unit'=> 1,\n                  'transl_except'=> 1,\n                  'transl_table'=> 1,\n                  'usedin'=> 1,\n                 );\n\nsub _initialize {\n    my($self,@args) = @_;\n\n    $self->SUPER::_initialize(@args);\n    # hash for functions for decoding keys.\n    $self->{'_func_ftunit_hash'} = {};\n    # sets this to one by default. People can change it\n    $self->_show_dna(1);\n    if ( ! defined $self->sequence_factory ) {\n        $self->sequence_factory(Bio::Seq::SeqFactory->new\n                                (-verbose => $self->verbose(),\n                                 -type => 'Bio::Seq::RichSeq'));\n    }\n}\n\n=head2 next_seq\n\n Title   : next_seq\n Usage   : $seq = $stream->next_seq()\n Function: returns the next sequence in the stream\n Returns : Bio::Seq object\n Args    :\n\n\nsub next_seq {\n    my ($self,@args) = @_;\n    my ($pseq,$c,$line,$name,$desc,$acc,$seqc,$mol,$div,\n        $date, $comment, @date_arr);\n\n    my ($annotation, %params, @features) =\n        Bio::Annotation::Collection->new();\n\n    $line = $self->_readline;\n    # This needs to be before the first eof() test\n\n    if ( !defined $line ) {\n        return;                 # no throws - end of file\n    }\n\n    if ( $line =~ /^\\s+$/ ) {\n        while ( defined ($line = $self->_readline) ) {\n            $line =~/^\\S/ && last;\n        }\n        # return without error if the whole next sequence was just a single\n        # blank line and then eof\n        return unless $line;\n    }\n\n    # no ID as 1st non-blank line, need short circuit and exit routine\n    $self->throw(\"EMBL stream with no ID. Not embl in my book\")\n        unless $line =~ /^ID\\s+\\S+/;\n\n    # At this point we are sure that $line contains an ID header line\n    my $alphabet;\n    if ( $line =~ tr/;/;/ == 6) { # New style headers contain exactly six semicolons.\n\n        # New style header (EMBL Release >= 87, after June 2006)\n        my $topology;\n        my $sv;\n\n        # ID   DQ299383; SV 1; linear; mRNA; STD; MAM; 431 BP.\n        # This regexp comes from the new2old.pl conversion script, from EBI\n        if ($line =~ m/^ID   (\\w+);\\s+SV (\\d+); (\\w+); ([^;]+); (\\w{3}); (\\w{3}); (\\d+) BP./) {\n        ($name, $sv, $topology, $mol, $div) = ($1, $2, $3, $4, $6);\n        }\n        if (defined($sv)) {\n        $params{'-seq_version'} = $sv;\n        $params{'-version'} = $sv;\n        }\n\n        if ($topology eq \"circular\") {\n        $params{'-is_circular'} = 1;\n        }\n    \n    if (defined $mol ) {\n        if ($mol =~ /DNA/) {\n        $alphabet='dna';\n        } elsif ($mol =~ /RNA/) {\n        $alphabet='rna';\n        } elsif ($mol =~ /AA/) {\n        $alphabet='protein';\n        }\n    }\n    } else {\n    \n        # Old style header (EMBL Release < 87, before June 2006)\n        if ($line =~ /^ID\\s+(\\S+)[^;]*;\\s+(\\S+)[^;]*;\\s+(\\S+)[^;]*;/) {\n        ($name, $mol, $div) = ($1, $2, $3);\n        }\n    \n        if ($mol) {\n            if ( $mol =~ /circular/ ) {\n            $params{'-is_circular'} = 1;\n            $mol =~  s|circular ||;\n            }\n            if (defined $mol ) {\n            if ($mol =~ /DNA/) {\n                $alphabet='dna';\n            } elsif ($mol =~ /RNA/) {\n                $alphabet='rna';\n            } elsif ($mol =~ /AA/) {\n                $alphabet='protein';\n            }\n            }\n        }\n    }\n\n    unless( defined $name && length($name) ) {\n    $name = \"unknown_id\";\n    }\n\n    # $self->warn(\"not parsing upper annotation in EMBL file yet!\");\n    my $buffer = $line;\n    local $_;\n    BEFORE_FEATURE_TABLE :\n          until ( !defined $buffer ) {\n              $_ = $buffer;\n              # Exit at start of Feature table\n              if ( /^(F[HT]|SQ)/ ) {\n                  $self->_pushback($_) if( $1 eq 'SQ' || $1 eq 'FT');\n                  last;\n              }\n              # Description line(s)\n              if (/^DE\\s+(\\S.*\\S)/) {\n                  $desc .= $desc ? \" $1\" : $1;\n              }\n\n              #accession number\n              if ( /^AC\\s+(.*)?/ || /^PA\\s+(.*)?/) {\n                  my @accs = split(/[; ]+/, $1); # allow space in addition\n                  $params{'-accession_number'} = shift @accs\n                      unless defined $params{'-accession_number'};\n                  push @{$params{'-secondary_accessions'}}, @accs;\n              }\n\n              #version number\n              if ( /^SV\\s+\\S+\\.(\\d+);?/ ) {\n                  my $sv = $1;\n                  #$sv =~ s/\\;//;\n                  $params{'-seq_version'} = $sv;\n                  $params{'-version'} = $sv;\n              }\n\n              #date (NOTE: takes last date line)\n              if ( /^DT\\s+(.+)$/ ) {\n                  my $line = $1;\n                  my ($date, $version) = split(' ', $line, 2);\n                  $date =~ tr/,//d; # remove comma if new version\n                  if ($version =~ /\\(Rel\\. (\\d+), Created\\)/xms ) {\n                      my $release = Bio::Annotation::SimpleValue->new(\n                                                                      -tagname    => 'creation_release',\n                                                                      -value      => $1\n                                                                     );\n                      $annotation->add_Annotation($release);\n                  } elsif ($version =~ /\\(Rel\\. (\\d+), Last updated, Version (\\d+)\\)/xms ) {\n                      my $release = Bio::Annotation::SimpleValue->new(\n                                                                      -tagname    => 'update_release',\n                                                                      -value      => $1\n                                                                     );\n                      $annotation->add_Annotation($release);\n\n                      my $update = Bio::Annotation::SimpleValue->new(\n                                                                     -tagname    => 'update_version',\n                                                                     -value      => $2\n                                                                    );\n                      $annotation->add_Annotation($update);\n                  }\n                  push @{$params{'-dates'}}, $date;\n              }\n\n              #keywords\n              if ( /^KW   (.*)\\S*$/ ) {\n                  my @kw = split(/\\s*\\;\\s*/,$1);\n                  push @{$params{'-keywords'}}, @kw;\n              }\n\n              # Organism name and phylogenetic information\n              elsif (/^O[SC]/) {\n                  # pass the accession number so we can give an informative throw message if necessary\n                  my $species = $self->_read_EMBL_Species(\\$buffer, $params{'-accession_number'});\n                  $params{'-species'}= $species;\n              }\n\n              # NCBI TaxID Xref\n              elsif (/^OX/) {\n                  my @links = $self->_read_EMBL_TaxID_DBLink(\\$buffer);\n                  foreach my $dblink ( @links ) {\n                      $annotation->add_Annotation('dblink',$dblink);\n                  }\n              }\n\n              # References\n              elsif (/^R/) {\n                  my @refs = $self->_read_EMBL_References(\\$buffer);\n                  foreach my $ref ( @refs ) {\n                      $annotation->add_Annotation('reference',$ref);\n                  }\n              }\n\n              # DB Xrefs\n              elsif (/^DR/) {\n                  my @links = $self->_read_EMBL_DBLink(\\$buffer);\n                  foreach my $dblink ( @links ) {\n                      $annotation->add_Annotation('dblink',$dblink);\n                  }\n              }\n\n              # Comments\n              elsif (/^CC\\s+(.*)/) {\n                  $comment .= $1;\n                  $comment .= \" \";\n                  while (defined ($_ = $self->_readline) ) {\n                      if (/^CC\\s+(.*)/) {\n                          $comment .= $1;\n                          $comment .= \" \";\n                      } else {\n                          last;\n                      }\n                  }\n                  my $commobj = Bio::Annotation::Comment->new();\n                  $commobj->text($comment);\n                  $annotation->add_Annotation('comment',$commobj);\n                  $comment = \"\";\n              }\n\n              # Get next line.\n              $buffer = $self->_readline;\n          }\n\n    while ( defined ($_ = $self->_readline) ) {\n        /^FT\\s{3}\\w/ && last;\n        /^SQ / && last;\n        /^CO / && last;\n    }\n    $buffer = $_;\n\n    if (defined($buffer) && $buffer =~ /^FT /) {\n        until ( !defined ($buffer) ) {\n            my $ftunit = $self->_read_FTHelper_EMBL(\\$buffer);\n\n            # process ftunit\n            my $feat =\n                $ftunit->_generic_seqfeature($self->location_factory(), $name);\n\n            # add taxon_id from source if available\n            if ($params{'-species'} && ($feat->primary_tag eq 'source')\n                && $feat->has_tag('db_xref')\n                && (! $params{'-species'}->ncbi_taxid())) {\n                foreach my $tagval ($feat->get_tag_values('db_xref')) {\n                    if (index($tagval,\"taxon:\") == 0) {\n                        $params{'-species'}->ncbi_taxid(substr($tagval,6));\n                        last;\n                    }\n                }\n            }\n\n            # add feature to list of features\n            push(@features, $feat);\n\n            if ( $buffer !~ /^FT/ ) {\n                last;\n            }\n        }\n    }\n    # skip comments\n    while ( defined ($buffer) && $buffer =~ /^XX/ ) {\n        $buffer = $self->_readline();\n    }\n\n    if ( $buffer =~ /^CO/  ) {\n        until ( !defined ($buffer) ) {\n            my $ftunit = $self->_read_FTHelper_EMBL(\\$buffer);\n            # process ftunit\n            push(@features,\n                 $ftunit->_generic_seqfeature($self->location_factory(),\n                                              $name));\n\n            if ( $buffer !~ /^CO/ ) {\n                last;\n            }\n        }\n    }\n    if ( $buffer !~ /^SQ/  ) {\n        while ( defined ($_ = $self->_readline) ) {\n            /^SQ/ && last;\n        }\n    }\n    $seqc = \"\";\n    while ( defined ($_ = $self->_readline) ) {\n        m{^//} && last;\n        $_ = uc($_);\n        s/[^A-Za-z]//g;\n        $seqc .= $_;\n    }\n    my $seq = $self->sequence_factory->create\n        (-verbose => $self->verbose(),\n         -division => $div,\n         -seq => $seqc,\n         -desc => $desc,\n         -display_id => $name,\n         -annotation => $annotation,\n         -molecule => $mol,\n         -alphabet => $alphabet,\n         -features => \\@features,\n         %params);\n    return $seq;\n}\n\n\n\n=head2 _write_ID_line\n\n Title   : _write_ID_line\n Usage   : $self->_write_ID_line($seq);\n Function: Writes the EMBL Release 87 format ID line to the stream, unless\n         : there is a user-supplied ID line generation function in which\n         : case that is used instead.\n         : ( See Bio::SeqIO::embl::_id_generation_function(). )\n Returns : nothing\n Args    : Bio::Seq object\n\n\nsub _write_ID_line {\n\n    my ($self, $seq) = @_;\n\n    my $id_line;\n    # If there is a user-supplied ID generation function, use it.\n    if ( $self->_id_generation_func ) {\n        $id_line = \"ID   \" . &{$self->_id_generation_func}($seq) . \"\\nXX\\n\";\n    }\n    # Otherwise, generate a standard EMBL release 87 (June 2006) ID line.\n    else {\n\n        # The sequence name is supposed to be the primary accession number,\n        my $name = $seq->accession_number();\n        if (!$name) {\n            # but if it is not present, use the sequence ID.\n            $name = $seq->id();\n        }\n\n        $self->warn(\"No whitespace allowed in EMBL id [\". $name. \"]\") if $name =~ /\\s/;\n\n        # Use the sequence version, or default to 1.\n        my $version = $seq->version() || 1;\n\n        my $len = $seq->length();\n\n        # Taxonomic division.\n        my $div;\n        if ( $seq->can('division') && defined($seq->division) &&\n             $self->_is_valid_division($seq->division) ) {\n            $div = $seq->division();\n        } else {\n            $div ||= 'UNC';     # 'UNC' is the EMBL division code for 'unclassified'.\n        }\n\n        my $mol;\n        # If the molecule type is a valid EMBL type, use it.\n        if (  $seq->can('molecule')\n              && defined($seq->molecule)\n              && $self->_is_valid_molecule_type($seq->molecule)\n           ) {\n            $mol = $seq->molecule();\n        }\n        # Otherwise, choose unassigned DNA or RNA based on the alphabet.\n        elsif ($seq->can('primary_seq') && defined $seq->primary_seq->alphabet) {\n            my $alphabet =$seq->primary_seq->alphabet;\n            if ($alphabet eq 'dna') {\n                $mol ='unassigned DNA';\n            } elsif ($alphabet eq 'rna') {\n                $mol='unassigned RNA';\n            } elsif ($alphabet eq 'protein') {\n                $self->warn(\"Protein sequence found; EMBL is a nucleotide format.\");\n                $mol='AA';  # AA is not a valid EMBL molecule type.\n            }\n        }\n\n        my $topology = 'linear';\n        if ($seq->is_circular) {\n            $topology = 'circular';\n        }\n\n        $mol ||= '';            # 'unassigned'; ?\n        $id_line = \"ID   $name; SV $version; $topology; $mol; STD; $div; $len BP.\\nXX\\n\";\n        $self->_print($id_line);\n    }\n}\n\n=head2 _is_valid_division\n\n Title   : _is_valid_division\n Usage   : $self->_is_valid_division($div)\n Function: tests division code for validity\n Returns : true if $div is a valid EMBL release 87 taxonomic division.\n Args    : taxonomic division code string\n\n\nsub _is_valid_division {\n    my ($self, $division) = @_;\n\n    my %EMBL_divisions = (\n                          \"PHG\"    => 1, # Bacteriophage\n                          \"ENV\"    => 1, # Environmental Sample\n                          \"FUN\"    => 1, # Fungal\n                          \"HUM\"    => 1, # Human\n                          \"INV\"    => 1, # Invertebrate\n                          \"MAM\"    => 1, # Other Mammal\n                          \"VRT\"    => 1, # Other Vertebrate\n                          \"MUS\"    => 1, # Mus musculus\n                          \"PLN\"    => 1, # Plant\n                          \"PRO\"    => 1, # Prokaryote\n                          \"ROD\"    => 1, # Other Rodent\n                          \"SYN\"    => 1, # Synthetic\n                          \"UNC\"    => 1, # Unclassified\n                          \"VRL\"    => 1 # Viral\n                         );\n\n    return exists($EMBL_divisions{$division});\n}\n\n=head2 _is_valid_molecule_type\n\n Title   : _is_valid_molecule_type\n Usage   : $self->_is_valid_molecule_type($mol)\n Function: tests molecule type for validity\n Returns : true if $mol is a valid EMBL release 87 molecule type.\n Args    : molecule type string\n\n\nsub _is_valid_molecule_type {\n    my ($self, $moltype) = @_;\n\n    my %EMBL_molecule_types = (\n                               \"genomic DNA\"    => 1,\n                               \"genomic RNA\"    => 1,\n                               \"mRNA\"           => 1,\n                               \"tRNA\"           => 1,\n                               \"rRNA\"           => 1,\n                               \"snoRNA\"         => 1,\n                               \"snRNA\"          => 1,\n                               \"scRNA\"          => 1,\n                               \"pre-RNA\"        => 1,\n                               \"other RNA\"      => 1,\n                               \"other DNA\"      => 1,\n                               \"unassigned DNA\" => 1,\n                               \"unassigned RNA\" => 1\n                              );\n\n    return exists($EMBL_molecule_types{$moltype});\n}\n\n=head2 write_seq\n\n Title   : write_seq\n Usage   : $stream->write_seq($seq)\n Function: writes the $seq object (must be seq) to the stream\n Returns : 1 for success and undef for error\n Args    : array of 1 to n Bio::SeqI objects","label":"write_seq($self,@seqs)"},"detail":"($self,@seqs)","definition":"sub","containerName":"main::","range":{"end":{"line":706,"character":9999},"start":{"character":0,"line":619}},"name":"write_seq"},{"kind":13,"line":710,"containerName":null,"name":"$seq"},{"kind":12,"line":710,"containerName":"main::","name":"can"},{"localvar":"my","definition":"my","name":"$spec","containerName":null,"line":710,"kind":13},{"kind":13,"line":710,"containerName":null,"name":"$seq"},{"name":"species","containerName":"main::","kind":12,"line":710},{"containerName":null,"definition":"my","name":"@class","localvar":"my","kind":13,"line":711},{"line":711,"kind":13,"name":"$spec","containerName":null},{"name":"classification","containerName":"main::","line":711,"kind":12},{"name":"@class","containerName":null,"line":712,"kind":13},{"localvar":"my","containerName":null,"name":"$OS","definition":"my","line":716,"kind":13},{"name":"$spec","containerName":null,"line":716,"kind":13},{"containerName":"main::","name":"scientific_name","kind":12,"line":716},{"name":"$spec","containerName":null,"kind":13,"line":717},{"kind":12,"line":717,"name":"common_name","containerName":"main::"},{"line":718,"kind":13,"containerName":null,"name":"$OS"},{"kind":13,"line":718,"name":"$spec","containerName":null},{"line":718,"kind":12,"containerName":"main::","name":"common_name"},{"line":720,"kind":13,"containerName":null,"name":"$self"},{"line":720,"kind":12,"containerName":"main::","name":"_print"},{"kind":13,"line":721,"containerName":null,"name":"$OC","definition":"my","localvar":"my"},{"line":721,"kind":13,"containerName":null,"name":"@class"},{"kind":13,"line":722,"containerName":null,"name":"$self"},{"line":722,"kind":12,"name":"_write_line_EMBL_regex","containerName":"main::"},{"containerName":null,"name":"$OC","line":722,"kind":13},{"line":723,"kind":13,"containerName":null,"name":"$spec"},{"name":"organelle","containerName":"main::","line":723,"kind":12},{"name":"$self","containerName":null,"line":724,"kind":13},{"containerName":"main::","name":"_write_line_EMBL_regex","kind":12,"line":724},{"line":724,"kind":13,"containerName":null,"name":"$spec"},{"line":724,"kind":12,"containerName":"main::","name":"organelle"},{"line":726,"kind":13,"containerName":null,"name":"$self"},{"kind":12,"line":726,"containerName":"main::","name":"_print"},{"localvar":"my","containerName":null,"definition":"my","name":"$t","line":730,"kind":13},{"containerName":null,"name":"$seq","line":731,"kind":13},{"kind":12,"line":731,"containerName":"main::","name":"can"},{"containerName":null,"name":"$seq","line":731,"kind":13},{"line":731,"kind":12,"containerName":"main::","name":"annotation"},{"definition":"my","name":"$ref","containerName":null,"localvar":"my","kind":13,"line":732},{"name":"$seq","containerName":null,"line":732,"kind":13},{"name":"annotation","containerName":"main::","line":732,"kind":12},{"kind":12,"line":732,"name":"get_Annotations","containerName":"main::"},{"kind":13,"line":733,"containerName":null,"name":"$self"},{"line":733,"kind":12,"containerName":"main::","name":"_print"},{"kind":13,"line":737,"containerName":null,"name":"$ref"},{"line":737,"kind":12,"containerName":"main::","name":"comment"},{"kind":13,"line":738,"containerName":null,"name":"$self"},{"name":"_write_line_EMBL_regex","containerName":"main::","kind":12,"line":738},{"name":"$ref","containerName":null,"kind":13,"line":738},{"name":"comment","containerName":"main::","line":738,"kind":12},{"kind":13,"line":740,"containerName":null,"definition":"my","name":"$start","localvar":"my"},{"line":740,"kind":13,"containerName":null,"name":"$ref"},{"kind":12,"line":740,"name":"start","containerName":"main::"},{"line":741,"kind":13,"localvar":"my","containerName":null,"name":"$end","definition":"my"},{"line":741,"kind":13,"containerName":null,"name":"$ref"},{"name":"end","containerName":"main::","line":741,"kind":12},{"containerName":null,"name":"$start","line":742,"kind":13},{"kind":13,"line":742,"containerName":null,"name":"%end"},{"line":743,"kind":13,"name":"$self","containerName":null},{"kind":12,"line":743,"name":"_print","containerName":"main::"},{"kind":13,"line":744,"name":"$start","containerName":null},{"containerName":null,"name":"%end","kind":13,"line":744},{"line":745,"kind":13,"containerName":null,"name":"$self"},{"kind":12,"line":745,"containerName":"main::","name":"throw"},{"kind":13,"line":749,"definition":"my","name":"$med","containerName":null,"localvar":"my"},{"name":"$ref","containerName":null,"line":749,"kind":13},{"line":749,"kind":12,"name":"medline","containerName":"main::"},{"name":"$self","containerName":null,"kind":13,"line":750},{"containerName":"main::","name":"_print","kind":12,"line":750},{"kind":13,"line":752,"definition":"my","name":"$pm","containerName":null,"localvar":"my"},{"line":752,"kind":13,"name":"$ref","containerName":null},{"kind":12,"line":752,"name":"pubmed","containerName":"main::"},{"name":"$self","containerName":null,"line":753,"kind":13},{"line":753,"kind":12,"name":"_print","containerName":"main::"},{"kind":13,"line":755,"containerName":null,"name":"$authors","definition":"my","localvar":"my"},{"kind":13,"line":755,"containerName":null,"name":"$ref"},{"line":755,"kind":12,"name":"authors","containerName":"main::"},{"name":"$authors","containerName":null,"kind":13,"line":756},{"kind":13,"line":758,"name":"$self","containerName":null},{"name":"_write_line_EMBL_regex","containerName":"main::","kind":12,"line":758},{"line":759,"kind":13,"containerName":null,"name":"$authors"},{"kind":13,"line":765,"containerName":null,"name":"$ref_title","definition":"my","localvar":"my"},{"name":"$ref","containerName":null,"line":765,"kind":13},{"line":765,"kind":12,"containerName":"main::","name":"title"},{"line":766,"kind":13,"name":"$ref_title","containerName":null},{"containerName":null,"name":"$self","kind":13,"line":767},{"containerName":"main::","name":"_write_line_EMBL_regex","kind":12,"line":767},{"name":"$ref_title","containerName":null,"line":767,"kind":13},{"kind":13,"line":768,"name":"$self","containerName":null},{"name":"_write_line_EMBL_regex","containerName":"main::","line":768,"kind":12},{"kind":13,"line":768,"containerName":null,"name":"$ref"},{"containerName":"main::","name":"location","kind":12,"line":768},{"line":769,"kind":13,"name":"$self","containerName":null},{"line":769,"kind":12,"containerName":"main::","name":"_print"},{"name":"$t","containerName":null,"kind":13,"line":770},{"kind":13,"line":774,"definition":"my","name":"@db_xref","containerName":null,"localvar":"my"},{"line":774,"kind":13,"name":"$seq","containerName":null},{"line":774,"kind":12,"name":"annotation","containerName":"main::"},{"name":"get_Annotations","containerName":"main::","line":774,"kind":12},{"localvar":"my","containerName":null,"name":"$dr","definition":"my","line":775,"kind":13},{"name":"@db_xref","containerName":null,"kind":13,"line":775},{"kind":13,"line":776,"definition":"my","name":"$db_name","containerName":null,"localvar":"my"},{"containerName":null,"name":"$dr","kind":13,"line":776},{"name":"database","containerName":"main::","line":776,"kind":12},{"kind":13,"line":777,"name":"$prim","definition":"my","containerName":null,"localvar":"my"},{"line":777,"kind":13,"name":"$dr","containerName":null},{"kind":12,"line":777,"containerName":"main::","name":"primary_id"},{"line":779,"kind":13,"localvar":"my","name":"$opt","definition":"my","containerName":null},{"kind":13,"line":779,"name":"$dr","containerName":null},{"name":"optional_id","containerName":"main::","kind":12,"line":779},{"containerName":null,"definition":"my","name":"$line","localvar":"my","kind":13,"line":780},{"name":"$opt","containerName":null,"kind":13,"line":780},{"containerName":null,"name":"$self","line":781,"kind":13},{"containerName":"main::","name":"_write_line_EMBL_regex","kind":12,"line":781},{"kind":13,"line":781,"name":"$line","containerName":null},{"line":783,"kind":13,"name":"$self","containerName":null},{"line":783,"kind":12,"name":"_print","containerName":"main::"},{"localvar":"my","containerName":null,"name":"$comment","definition":"my","line":787,"kind":13},{"kind":13,"line":787,"name":"$seq","containerName":null},{"name":"annotation","containerName":"main::","kind":12,"line":787},{"containerName":"main::","name":"get_Annotations","line":787,"kind":12},{"line":788,"kind":13,"name":"$self","containerName":null},{"containerName":"main::","name":"_write_line_EMBL_regex","line":788,"kind":12},{"containerName":null,"name":"$comment","kind":13,"line":788},{"line":788,"kind":12,"containerName":"main::","name":"text"},{"containerName":null,"name":"$self","kind":13,"line":789},{"containerName":"main::","name":"_print","kind":12,"line":789},{"containerName":null,"name":"$self","kind":13,"line":796},{"line":796,"kind":12,"containerName":"main::","name":"_print"},{"containerName":null,"name":"$self","kind":13,"line":797},{"name":"_print","containerName":"main::","kind":12,"line":797},{"line":799,"kind":13,"localvar":"my","containerName":null,"definition":"my","name":"@feats"},{"kind":13,"line":799,"containerName":null,"name":"$seq"},{"kind":12,"line":799,"name":"can","containerName":"main::"},{"containerName":null,"name":"$seq","kind":13,"line":799},{"containerName":"main::","name":"top_SeqFeatures","kind":12,"line":799},{"containerName":null,"name":"@feats","kind":13,"line":800},{"name":"$self","containerName":null,"line":801,"kind":13},{"kind":12,"line":801,"containerName":"main::","name":"_post_sort"},{"kind":13,"line":805,"containerName":null,"definition":"my","name":"$post_sort_func","localvar":"my"},{"line":805,"kind":13,"containerName":null,"name":"$self"},{"line":805,"kind":12,"name":"_post_sort","containerName":"main::"},{"definition":"my","name":"@fth","containerName":null,"localvar":"my","kind":13,"line":806},{"line":808,"kind":13,"localvar":"my","containerName":null,"definition":"my","name":"$sf"},{"kind":13,"line":808,"name":"@feats","containerName":null},{"containerName":null,"name":"@fth","kind":13,"line":809},{"name":"Bio","containerName":"SeqIO::FTHelper::from_SeqFeature","kind":12,"line":809},{"kind":13,"line":809,"containerName":null,"name":"$sf"},{"line":809,"kind":13,"containerName":null,"name":"$seq"},{"line":812,"kind":13,"name":"@fth","containerName":null},{"name":"post_sort_func","line":812,"kind":12},{"name":"$a","containerName":null,"kind":13,"line":812},{"name":"$b","containerName":null,"line":812,"kind":13},{"line":812,"kind":13,"containerName":null,"name":"@fth"},{"definition":"my","name":"$fth","containerName":null,"localvar":"my","kind":13,"line":814},{"name":"@fth","containerName":null,"kind":13,"line":814},{"kind":13,"line":815,"containerName":null,"name":"$self"},{"name":"_print_EMBL_FTHelper","containerName":"main::","line":815,"kind":12},{"line":815,"kind":13,"containerName":null,"name":"%fth"},{"localvar":"my","containerName":null,"definition":"my","name":"$sf","line":821,"kind":13},{"containerName":null,"name":"@feats","line":821,"kind":13},{"line":822,"kind":13,"localvar":"my","definition":"my","name":"@fth","containerName":null},{"name":"Bio","containerName":"SeqIO::FTHelper::from_SeqFeature","kind":12,"line":822},{"line":822,"kind":13,"name":"$sf","containerName":null},{"name":"$seq","containerName":null,"line":822,"kind":13},{"localvar":"my","containerName":null,"name":"$fth","definition":"my","line":823,"kind":13},{"kind":13,"line":823,"name":"@fth","containerName":null},{"containerName":null,"name":"$fth","line":824,"kind":13},{"kind":12,"line":824,"containerName":"main::","name":"key"},{"kind":13,"line":825,"name":"$self","containerName":null},{"kind":12,"line":825,"containerName":"main::","name":"_show_dna"},{"line":827,"kind":13,"containerName":null,"name":"$self"},{"kind":12,"line":827,"name":"_print_EMBL_FTHelper","containerName":"main::"},{"line":827,"kind":13,"name":"$fth","containerName":null},{"containerName":null,"name":"$self","line":833,"kind":13},{"kind":12,"line":833,"name":"_show_dna","containerName":"main::"},{"containerName":null,"name":"$self","kind":13,"line":834},{"line":834,"kind":12,"name":"_print","containerName":"main::"},{"line":837,"kind":13,"name":"$self","containerName":null},{"kind":12,"line":837,"name":"_print","containerName":"main::"},{"line":841,"kind":13,"containerName":null,"name":"$str"},{"line":844,"kind":13,"localvar":"my","name":"$alen","definition":"my","containerName":null},{"line":844,"kind":13,"containerName":null,"name":"$str"},{"line":845,"kind":13,"localvar":"my","containerName":null,"name":"$clen","definition":"my"},{"kind":13,"line":845,"name":"$str","containerName":null},{"localvar":"my","containerName":null,"definition":"my","name":"$glen","line":846,"kind":13},{"line":846,"kind":13,"name":"$str","containerName":null},{"localvar":"my","definition":"my","name":"$tlen","containerName":null,"line":847,"kind":13},{"line":847,"kind":13,"containerName":null,"name":"$str"},{"definition":"my","name":"$len","containerName":null,"localvar":"my","kind":13,"line":849},{"name":"$seq","containerName":null,"line":849,"kind":13},{"kind":12,"line":849,"containerName":"main::","name":"length"},{"line":850,"kind":13,"localvar":"my","containerName":null,"definition":"my","name":"$olen"},{"line":850,"kind":13,"name":"$seq","containerName":null},{"containerName":"main::","name":"length","kind":12,"line":850},{"line":850,"kind":13,"name":"$alen","containerName":null},{"kind":13,"line":850,"containerName":null,"name":"$tlen"},{"containerName":null,"name":"$clen","kind":13,"line":850},{"line":850,"kind":13,"containerName":null,"name":"$glen"},{"containerName":null,"name":"%olen","line":851,"kind":13},{"containerName":null,"name":"$self","line":852,"kind":13},{"containerName":"main::","name":"warn","kind":12,"line":852},{"containerName":null,"name":"$self","kind":13,"line":855},{"containerName":"main::","name":"_print","line":855,"kind":12},{"definition":"my","name":"$nuc","containerName":null,"localvar":"my","kind":13,"line":857},{"line":858,"kind":13,"localvar":"my","containerName":null,"definition":"my","name":"$whole_pat"},{"line":859,"kind":13,"localvar":"my","containerName":null,"definition":"my","name":"$out_pat"},{"kind":13,"line":860,"containerName":null,"definition":"my","name":"$length","localvar":"my"},{"line":860,"kind":13,"name":"$str","containerName":null},{"containerName":null,"definition":"my","name":"$whole","localvar":"my","kind":13,"line":863},{"name":"$length","containerName":null,"line":863,"kind":13},{"containerName":null,"name":"$nuc","kind":13,"line":863},{"kind":13,"line":863,"name":"$nuc","containerName":null},{"kind":13,"line":866,"name":"$i","definition":"my","containerName":null,"localvar":"my"},{"line":867,"kind":13,"name":"$i","containerName":null},{"kind":13,"line":867,"containerName":null,"name":"$i"},{"line":867,"kind":13,"containerName":null,"name":"$whole"},{"kind":13,"line":867,"name":"$i","containerName":null},{"name":"%nuc","containerName":null,"line":867,"kind":13},{"definition":"my","name":"$blocks","containerName":null,"localvar":"my","kind":13,"line":868},{"containerName":null,"name":"$out_pat","line":868,"kind":13},{"containerName":null,"name":"$whole_pat","line":869,"kind":13},{"containerName":null,"name":"$str","kind":13,"line":870},{"line":870,"kind":13,"name":"$i","containerName":null},{"containerName":null,"name":"$nuc","line":870,"kind":13},{"kind":13,"line":871,"name":"$self","containerName":null},{"kind":12,"line":871,"name":"_print","containerName":"main::"},{"kind":13,"line":871,"name":"$i","containerName":null},{"name":"$nuc","containerName":null,"line":871,"kind":13},{"kind":13,"line":875,"name":"$last","definition":"my","containerName":null,"localvar":"my"},{"containerName":null,"name":"$str","line":875,"kind":13},{"line":875,"kind":13,"name":"%i","containerName":null},{"containerName":null,"name":"$last_len","definition":"my","localvar":"my","kind":13,"line":876},{"name":"$last","containerName":null,"kind":13,"line":876},{"localvar":"my","containerName":null,"definition":"my","name":"$last_pat","line":877,"kind":13},{"line":877,"kind":13,"containerName":null,"name":"$last_len"},{"name":"$last_len","containerName":null,"kind":13,"line":877},{"line":878,"kind":13,"localvar":"my","name":"$blocks","definition":"my","containerName":null},{"name":"$out_pat","containerName":null,"kind":13,"line":878},{"containerName":null,"name":"$last_pat","line":879,"kind":13},{"name":"$last","containerName":null,"kind":13,"line":879},{"line":880,"kind":13,"containerName":null,"name":"$self"},{"containerName":"main::","name":"_print","line":880,"kind":12},{"name":"$length","containerName":null,"line":880,"kind":13},{"name":"$self","containerName":null,"line":884,"kind":13},{"containerName":"main::","name":"_print","kind":12,"line":884},{"containerName":null,"name":"$self","kind":13,"line":886},{"line":886,"kind":12,"name":"flush","containerName":"main::"},{"containerName":null,"name":"$self","kind":13,"line":886},{"containerName":"main::","name":"_flush_on_write","kind":12,"line":886},{"kind":13,"line":886,"name":"$self","containerName":null},{"line":886,"kind":12,"name":"_fh","containerName":"main::"},{"name":"_print_EMBL_FTHelper","range":{"end":{"character":9999,"line":923},"start":{"line":902,"character":0}},"kind":12,"children":[{"containerName":"_print_EMBL_FTHelper","definition":"my","name":"$self","localvar":"my","kind":13,"line":903},{"name":"$fth","containerName":"_print_EMBL_FTHelper","line":903,"kind":13},{"name":"$fth","containerName":"_print_EMBL_FTHelper","kind":13,"line":905},{"containerName":"_print_EMBL_FTHelper","name":"$fth","line":905,"kind":13},{"line":905,"kind":12,"containerName":"_print_EMBL_FTHelper","name":"isa"},{"kind":13,"line":906,"name":"$fth","containerName":"_print_EMBL_FTHelper"},{"name":"warn","containerName":"_print_EMBL_FTHelper","kind":12,"line":906},{"line":913,"kind":13,"containerName":"_print_EMBL_FTHelper","name":"$fth"},{"containerName":"_print_EMBL_FTHelper","name":"key","kind":12,"line":913},{"containerName":"_print_EMBL_FTHelper","name":"$self","line":914,"kind":13},{"line":914,"kind":12,"name":"_print","containerName":"_print_EMBL_FTHelper"},{"line":915,"kind":13,"containerName":"_print_EMBL_FTHelper","name":"$self"},{"name":"_write_line_EMBL_regex","containerName":"_print_EMBL_FTHelper","line":915,"kind":12},{"kind":13,"line":916,"name":"$fth","containerName":"_print_EMBL_FTHelper"},{"containerName":"_print_EMBL_FTHelper","name":"loc","kind":12,"line":916},{"name":"$self","containerName":"_print_EMBL_FTHelper","kind":13,"line":920},{"line":920,"kind":12,"name":"_write_line_EMBL_regex","containerName":"_print_EMBL_FTHelper"},{"containerName":"_print_EMBL_FTHelper","name":"$fth","kind":13,"line":920},{"name":"key","containerName":"_print_EMBL_FTHelper","line":920,"kind":12},{"containerName":"_print_EMBL_FTHelper","name":"$fth","kind":13,"line":921},{"name":"loc","containerName":"_print_EMBL_FTHelper","line":921,"kind":12},{"localvar":"my","name":"$tag","definition":"my","containerName":"_print_EMBL_FTHelper","line":923,"kind":13},{"kind":13,"line":923,"name":"$fth","containerName":"_print_EMBL_FTHelper"},{"line":923,"kind":12,"name":"field","containerName":"_print_EMBL_FTHelper"}],"line":902,"definition":"sub","containerName":"main::","signature":{"label":"_print_EMBL_FTHelper($self,$fth)","parameters":[{"label":"$self"},{"label":"$fth"}],"documentation":"1;\n# $Id: embl.pm 16123 2009-09-17 12:57:27Z cjfields $\n#\n# BioPerl module for Bio::SeqIO::EMBL\n#\n# Please direct questions and support issues to <bioperl-l@bioperl.org> \n#\n# Cared for by Ewan Birney <birney@ebi.ac.uk>\n#\n# Copyright Ewan Birney\n#\n# You may distribute this module under the same terms as perl itself\n\n# POD documentation - main docs before the code\n\n=head1 NAME\n\nBio::SeqIO::embl - EMBL sequence input/output stream\n\n=head1 SYNOPSIS\n\nIt is probably best not to use this object directly, but\nrather go through the SeqIO handler system. Go:\n\n    $stream = Bio::SeqIO->new(-file => $filename, -format => 'EMBL');\n\n    while ( (my $seq = $stream->next_seq()) ) {\n        # do something with $seq\n    }\n\n=head1 DESCRIPTION\n\nThis object can transform Bio::Seq objects to and from EMBL flat\nfile databases.\n\nThere is a lot of flexibility here about how to dump things which\nshould be documented more fully.\n\nThere should be a common object that this and Genbank share (probably\nwith Swissprot). Too much of the magic is identical.\n\n=head2 Optional functions\n\n=over 3\n\n* _show_dna()\n\n(output only) shows the dna or not\n\n* _post_sort()\n\n(output only) provides a sorting func which is applied to the FTHelpers\nbefore printing\n\n* _id_generation_func()\n\nThis is function which is called as\n\n   print \"ID   \", $func($annseq), \"\\n\";\n\nTo generate the ID line. If it is not there, it generates a sensible ID\nline using a number of tools.\n\nIf you want to output annotations in EMBL format they need to be\nstored in a Bio::Annotation::Collection object which is accessible\nthrough the Bio::SeqI interface method L<annotation()|annotation>.\n\nThe following are the names of the keys which are polled from a\nL<Bio::Annotation::Collection> object.\n\n reference  - Should contain Bio::Annotation::Reference objects\n comment    - Should contain Bio::Annotation::Comment objects\n dblink     - Should contain Bio::Annotation::DBLink objects\n\n\n=head1 FEEDBACK\n\n=head2 Mailing Lists\n\nUser feedback is an integral part of the evolution of this and other\nBioperl modules. Send your comments and suggestions preferably to one\nof the Bioperl mailing lists.  Your participation is much appreciated.\n\n  bioperl-l@bioperl.org                  - General discussion\n  http://bioperl.org/wiki/Mailing_lists  - About the mailing lists\n\n=head2 Support \n\nPlease direct usage questions or support issues to the mailing list:\n\nI<bioperl-l@bioperl.org>\n\nrather than to the module maintainer directly. Many experienced and \nreponsive experts will be able look at the problem and quickly \naddress it. Please include a thorough description of the problem \nwith code and data examples if at all possible.\n\n=head2 Reporting Bugs\n\nReport bugs to the Bioperl bug tracking system to help us keep track\nthe bugs and their resolution. Bug reports can be submitted via\nthe web:\n\n  http://bugzilla.open-bio.org/\n\n=head1 AUTHOR - Ewan Birney\n\nEmail birney@ebi.ac.uk\n\n=head1 APPENDIX\n\nThe rest of the documentation details each of the object\nmethods. Internal methods are usually preceded with a _\n\n\n\n# Let the code begin...\n\n\npackage Bio::SeqIO::embl;\nuse vars qw(%FTQUAL_NO_QUOTE);\nuse strict;\nuse Bio::SeqIO::FTHelper;\nuse Bio::SeqFeature::Generic;\nuse Bio::Species;\nuse Bio::Seq::SeqFactory;\nuse Bio::Annotation::Collection;\nuse Bio::Annotation::Comment;\nuse Bio::Annotation::Reference;\nuse Bio::Annotation::DBLink;\n\nuse base qw(Bio::SeqIO);\n\n%FTQUAL_NO_QUOTE=(\n                  'anticodon'=>1,\n                  'citation'=>1,\n                  'codon'=>1,\n                  'codon_start'=>1,\n                  'cons_splice'=>1,\n                  'direction'=>1,\n                  'evidence'=>1,\n                  'label'=>1,\n                  'mod_base'=> 1,\n                  'number'=> 1,\n                  'rpt_type'=> 1,\n                  'rpt_unit'=> 1,\n                  'transl_except'=> 1,\n                  'transl_table'=> 1,\n                  'usedin'=> 1,\n                 );\n\nsub _initialize {\n    my($self,@args) = @_;\n\n    $self->SUPER::_initialize(@args);\n    # hash for functions for decoding keys.\n    $self->{'_func_ftunit_hash'} = {};\n    # sets this to one by default. People can change it\n    $self->_show_dna(1);\n    if ( ! defined $self->sequence_factory ) {\n        $self->sequence_factory(Bio::Seq::SeqFactory->new\n                                (-verbose => $self->verbose(),\n                                 -type => 'Bio::Seq::RichSeq'));\n    }\n}\n\n=head2 next_seq\n\n Title   : next_seq\n Usage   : $seq = $stream->next_seq()\n Function: returns the next sequence in the stream\n Returns : Bio::Seq object\n Args    :\n\n\nsub next_seq {\n    my ($self,@args) = @_;\n    my ($pseq,$c,$line,$name,$desc,$acc,$seqc,$mol,$div,\n        $date, $comment, @date_arr);\n\n    my ($annotation, %params, @features) =\n        Bio::Annotation::Collection->new();\n\n    $line = $self->_readline;\n    # This needs to be before the first eof() test\n\n    if ( !defined $line ) {\n        return;                 # no throws - end of file\n    }\n\n    if ( $line =~ /^\\s+$/ ) {\n        while ( defined ($line = $self->_readline) ) {\n            $line =~/^\\S/ && last;\n        }\n        # return without error if the whole next sequence was just a single\n        # blank line and then eof\n        return unless $line;\n    }\n\n    # no ID as 1st non-blank line, need short circuit and exit routine\n    $self->throw(\"EMBL stream with no ID. Not embl in my book\")\n        unless $line =~ /^ID\\s+\\S+/;\n\n    # At this point we are sure that $line contains an ID header line\n    my $alphabet;\n    if ( $line =~ tr/;/;/ == 6) { # New style headers contain exactly six semicolons.\n\n        # New style header (EMBL Release >= 87, after June 2006)\n        my $topology;\n        my $sv;\n\n        # ID   DQ299383; SV 1; linear; mRNA; STD; MAM; 431 BP.\n        # This regexp comes from the new2old.pl conversion script, from EBI\n        if ($line =~ m/^ID   (\\w+);\\s+SV (\\d+); (\\w+); ([^;]+); (\\w{3}); (\\w{3}); (\\d+) BP./) {\n        ($name, $sv, $topology, $mol, $div) = ($1, $2, $3, $4, $6);\n        }\n        if (defined($sv)) {\n        $params{'-seq_version'} = $sv;\n        $params{'-version'} = $sv;\n        }\n\n        if ($topology eq \"circular\") {\n        $params{'-is_circular'} = 1;\n        }\n    \n    if (defined $mol ) {\n        if ($mol =~ /DNA/) {\n        $alphabet='dna';\n        } elsif ($mol =~ /RNA/) {\n        $alphabet='rna';\n        } elsif ($mol =~ /AA/) {\n        $alphabet='protein';\n        }\n    }\n    } else {\n    \n        # Old style header (EMBL Release < 87, before June 2006)\n        if ($line =~ /^ID\\s+(\\S+)[^;]*;\\s+(\\S+)[^;]*;\\s+(\\S+)[^;]*;/) {\n        ($name, $mol, $div) = ($1, $2, $3);\n        }\n    \n        if ($mol) {\n            if ( $mol =~ /circular/ ) {\n            $params{'-is_circular'} = 1;\n            $mol =~  s|circular ||;\n            }\n            if (defined $mol ) {\n            if ($mol =~ /DNA/) {\n                $alphabet='dna';\n            } elsif ($mol =~ /RNA/) {\n                $alphabet='rna';\n            } elsif ($mol =~ /AA/) {\n                $alphabet='protein';\n            }\n            }\n        }\n    }\n\n    unless( defined $name && length($name) ) {\n    $name = \"unknown_id\";\n    }\n\n    # $self->warn(\"not parsing upper annotation in EMBL file yet!\");\n    my $buffer = $line;\n    local $_;\n    BEFORE_FEATURE_TABLE :\n          until ( !defined $buffer ) {\n              $_ = $buffer;\n              # Exit at start of Feature table\n              if ( /^(F[HT]|SQ)/ ) {\n                  $self->_pushback($_) if( $1 eq 'SQ' || $1 eq 'FT');\n                  last;\n              }\n              # Description line(s)\n              if (/^DE\\s+(\\S.*\\S)/) {\n                  $desc .= $desc ? \" $1\" : $1;\n              }\n\n              #accession number\n              if ( /^AC\\s+(.*)?/ || /^PA\\s+(.*)?/) {\n                  my @accs = split(/[; ]+/, $1); # allow space in addition\n                  $params{'-accession_number'} = shift @accs\n                      unless defined $params{'-accession_number'};\n                  push @{$params{'-secondary_accessions'}}, @accs;\n              }\n\n              #version number\n              if ( /^SV\\s+\\S+\\.(\\d+);?/ ) {\n                  my $sv = $1;\n                  #$sv =~ s/\\;//;\n                  $params{'-seq_version'} = $sv;\n                  $params{'-version'} = $sv;\n              }\n\n              #date (NOTE: takes last date line)\n              if ( /^DT\\s+(.+)$/ ) {\n                  my $line = $1;\n                  my ($date, $version) = split(' ', $line, 2);\n                  $date =~ tr/,//d; # remove comma if new version\n                  if ($version =~ /\\(Rel\\. (\\d+), Created\\)/xms ) {\n                      my $release = Bio::Annotation::SimpleValue->new(\n                                                                      -tagname    => 'creation_release',\n                                                                      -value      => $1\n                                                                     );\n                      $annotation->add_Annotation($release);\n                  } elsif ($version =~ /\\(Rel\\. (\\d+), Last updated, Version (\\d+)\\)/xms ) {\n                      my $release = Bio::Annotation::SimpleValue->new(\n                                                                      -tagname    => 'update_release',\n                                                                      -value      => $1\n                                                                     );\n                      $annotation->add_Annotation($release);\n\n                      my $update = Bio::Annotation::SimpleValue->new(\n                                                                     -tagname    => 'update_version',\n                                                                     -value      => $2\n                                                                    );\n                      $annotation->add_Annotation($update);\n                  }\n                  push @{$params{'-dates'}}, $date;\n              }\n\n              #keywords\n              if ( /^KW   (.*)\\S*$/ ) {\n                  my @kw = split(/\\s*\\;\\s*/,$1);\n                  push @{$params{'-keywords'}}, @kw;\n              }\n\n              # Organism name and phylogenetic information\n              elsif (/^O[SC]/) {\n                  # pass the accession number so we can give an informative throw message if necessary\n                  my $species = $self->_read_EMBL_Species(\\$buffer, $params{'-accession_number'});\n                  $params{'-species'}= $species;\n              }\n\n              # NCBI TaxID Xref\n              elsif (/^OX/) {\n                  my @links = $self->_read_EMBL_TaxID_DBLink(\\$buffer);\n                  foreach my $dblink ( @links ) {\n                      $annotation->add_Annotation('dblink',$dblink);\n                  }\n              }\n\n              # References\n              elsif (/^R/) {\n                  my @refs = $self->_read_EMBL_References(\\$buffer);\n                  foreach my $ref ( @refs ) {\n                      $annotation->add_Annotation('reference',$ref);\n                  }\n              }\n\n              # DB Xrefs\n              elsif (/^DR/) {\n                  my @links = $self->_read_EMBL_DBLink(\\$buffer);\n                  foreach my $dblink ( @links ) {\n                      $annotation->add_Annotation('dblink',$dblink);\n                  }\n              }\n\n              # Comments\n              elsif (/^CC\\s+(.*)/) {\n                  $comment .= $1;\n                  $comment .= \" \";\n                  while (defined ($_ = $self->_readline) ) {\n                      if (/^CC\\s+(.*)/) {\n                          $comment .= $1;\n                          $comment .= \" \";\n                      } else {\n                          last;\n                      }\n                  }\n                  my $commobj = Bio::Annotation::Comment->new();\n                  $commobj->text($comment);\n                  $annotation->add_Annotation('comment',$commobj);\n                  $comment = \"\";\n              }\n\n              # Get next line.\n              $buffer = $self->_readline;\n          }\n\n    while ( defined ($_ = $self->_readline) ) {\n        /^FT\\s{3}\\w/ && last;\n        /^SQ / && last;\n        /^CO / && last;\n    }\n    $buffer = $_;\n\n    if (defined($buffer) && $buffer =~ /^FT /) {\n        until ( !defined ($buffer) ) {\n            my $ftunit = $self->_read_FTHelper_EMBL(\\$buffer);\n\n            # process ftunit\n            my $feat =\n                $ftunit->_generic_seqfeature($self->location_factory(), $name);\n\n            # add taxon_id from source if available\n            if ($params{'-species'} && ($feat->primary_tag eq 'source')\n                && $feat->has_tag('db_xref')\n                && (! $params{'-species'}->ncbi_taxid())) {\n                foreach my $tagval ($feat->get_tag_values('db_xref')) {\n                    if (index($tagval,\"taxon:\") == 0) {\n                        $params{'-species'}->ncbi_taxid(substr($tagval,6));\n                        last;\n                    }\n                }\n            }\n\n            # add feature to list of features\n            push(@features, $feat);\n\n            if ( $buffer !~ /^FT/ ) {\n                last;\n            }\n        }\n    }\n    # skip comments\n    while ( defined ($buffer) && $buffer =~ /^XX/ ) {\n        $buffer = $self->_readline();\n    }\n\n    if ( $buffer =~ /^CO/  ) {\n        until ( !defined ($buffer) ) {\n            my $ftunit = $self->_read_FTHelper_EMBL(\\$buffer);\n            # process ftunit\n            push(@features,\n                 $ftunit->_generic_seqfeature($self->location_factory(),\n                                              $name));\n\n            if ( $buffer !~ /^CO/ ) {\n                last;\n            }\n        }\n    }\n    if ( $buffer !~ /^SQ/  ) {\n        while ( defined ($_ = $self->_readline) ) {\n            /^SQ/ && last;\n        }\n    }\n    $seqc = \"\";\n    while ( defined ($_ = $self->_readline) ) {\n        m{^//} && last;\n        $_ = uc($_);\n        s/[^A-Za-z]//g;\n        $seqc .= $_;\n    }\n    my $seq = $self->sequence_factory->create\n        (-verbose => $self->verbose(),\n         -division => $div,\n         -seq => $seqc,\n         -desc => $desc,\n         -display_id => $name,\n         -annotation => $annotation,\n         -molecule => $mol,\n         -alphabet => $alphabet,\n         -features => \\@features,\n         %params);\n    return $seq;\n}\n\n\n\n=head2 _write_ID_line\n\n Title   : _write_ID_line\n Usage   : $self->_write_ID_line($seq);\n Function: Writes the EMBL Release 87 format ID line to the stream, unless\n         : there is a user-supplied ID line generation function in which\n         : case that is used instead.\n         : ( See Bio::SeqIO::embl::_id_generation_function(). )\n Returns : nothing\n Args    : Bio::Seq object\n\n\nsub _write_ID_line {\n\n    my ($self, $seq) = @_;\n\n    my $id_line;\n    # If there is a user-supplied ID generation function, use it.\n    if ( $self->_id_generation_func ) {\n        $id_line = \"ID   \" . &{$self->_id_generation_func}($seq) . \"\\nXX\\n\";\n    }\n    # Otherwise, generate a standard EMBL release 87 (June 2006) ID line.\n    else {\n\n        # The sequence name is supposed to be the primary accession number,\n        my $name = $seq->accession_number();\n        if (!$name) {\n            # but if it is not present, use the sequence ID.\n            $name = $seq->id();\n        }\n\n        $self->warn(\"No whitespace allowed in EMBL id [\". $name. \"]\") if $name =~ /\\s/;\n\n        # Use the sequence version, or default to 1.\n        my $version = $seq->version() || 1;\n\n        my $len = $seq->length();\n\n        # Taxonomic division.\n        my $div;\n        if ( $seq->can('division') && defined($seq->division) &&\n             $self->_is_valid_division($seq->division) ) {\n            $div = $seq->division();\n        } else {\n            $div ||= 'UNC';     # 'UNC' is the EMBL division code for 'unclassified'.\n        }\n\n        my $mol;\n        # If the molecule type is a valid EMBL type, use it.\n        if (  $seq->can('molecule')\n              && defined($seq->molecule)\n              && $self->_is_valid_molecule_type($seq->molecule)\n           ) {\n            $mol = $seq->molecule();\n        }\n        # Otherwise, choose unassigned DNA or RNA based on the alphabet.\n        elsif ($seq->can('primary_seq') && defined $seq->primary_seq->alphabet) {\n            my $alphabet =$seq->primary_seq->alphabet;\n            if ($alphabet eq 'dna') {\n                $mol ='unassigned DNA';\n            } elsif ($alphabet eq 'rna') {\n                $mol='unassigned RNA';\n            } elsif ($alphabet eq 'protein') {\n                $self->warn(\"Protein sequence found; EMBL is a nucleotide format.\");\n                $mol='AA';  # AA is not a valid EMBL molecule type.\n            }\n        }\n\n        my $topology = 'linear';\n        if ($seq->is_circular) {\n            $topology = 'circular';\n        }\n\n        $mol ||= '';            # 'unassigned'; ?\n        $id_line = \"ID   $name; SV $version; $topology; $mol; STD; $div; $len BP.\\nXX\\n\";\n        $self->_print($id_line);\n    }\n}\n\n=head2 _is_valid_division\n\n Title   : _is_valid_division\n Usage   : $self->_is_valid_division($div)\n Function: tests division code for validity\n Returns : true if $div is a valid EMBL release 87 taxonomic division.\n Args    : taxonomic division code string\n\n\nsub _is_valid_division {\n    my ($self, $division) = @_;\n\n    my %EMBL_divisions = (\n                          \"PHG\"    => 1, # Bacteriophage\n                          \"ENV\"    => 1, # Environmental Sample\n                          \"FUN\"    => 1, # Fungal\n                          \"HUM\"    => 1, # Human\n                          \"INV\"    => 1, # Invertebrate\n                          \"MAM\"    => 1, # Other Mammal\n                          \"VRT\"    => 1, # Other Vertebrate\n                          \"MUS\"    => 1, # Mus musculus\n                          \"PLN\"    => 1, # Plant\n                          \"PRO\"    => 1, # Prokaryote\n                          \"ROD\"    => 1, # Other Rodent\n                          \"SYN\"    => 1, # Synthetic\n                          \"UNC\"    => 1, # Unclassified\n                          \"VRL\"    => 1 # Viral\n                         );\n\n    return exists($EMBL_divisions{$division});\n}\n\n=head2 _is_valid_molecule_type\n\n Title   : _is_valid_molecule_type\n Usage   : $self->_is_valid_molecule_type($mol)\n Function: tests molecule type for validity\n Returns : true if $mol is a valid EMBL release 87 molecule type.\n Args    : molecule type string\n\n\nsub _is_valid_molecule_type {\n    my ($self, $moltype) = @_;\n\n    my %EMBL_molecule_types = (\n                               \"genomic DNA\"    => 1,\n                               \"genomic RNA\"    => 1,\n                               \"mRNA\"           => 1,\n                               \"tRNA\"           => 1,\n                               \"rRNA\"           => 1,\n                               \"snoRNA\"         => 1,\n                               \"snRNA\"          => 1,\n                               \"scRNA\"          => 1,\n                               \"pre-RNA\"        => 1,\n                               \"other RNA\"      => 1,\n                               \"other DNA\"      => 1,\n                               \"unassigned DNA\" => 1,\n                               \"unassigned RNA\" => 1\n                              );\n\n    return exists($EMBL_molecule_types{$moltype});\n}\n\n=head2 write_seq\n\n Title   : write_seq\n Usage   : $stream->write_seq($seq)\n Function: writes the $seq object (must be seq) to the stream\n Returns : 1 for success and undef for error\n Args    : array of 1 to n Bio::SeqI objects\n\n\n\nsub write_seq {\n    my ($self,@seqs) = @_;\n\n    foreach my $seq ( @seqs ) {\n        $self->throw(\"Attempting to write with no seq!\") unless defined $seq;\n        unless ( ref $seq && $seq->isa('Bio::SeqI' ) ) {\n            $self->warn(\"$seq is not a SeqI compliant sequence object!\")\n                if $self->verbose >= 0;\n            unless ( ref $seq && $seq->isa('Bio::PrimarySeqI' ) ) {\n                $self->throw(\"$seq is not a PrimarySeqI compliant sequence object!\");\n            }\n        }\n        my $str = $seq->seq || '';\n\n        # Write the ID line.\n        $self->_write_ID_line($seq);\n\n\n        # Write the accession line if present\n        my( $acc );\n        {\n            if ( my $func = $self->_ac_generation_func ) {\n                $acc = &{$func}($seq);\n            } elsif ( $seq->isa('Bio::Seq::RichSeqI') &&\n                      defined($seq->accession_number) ) {\n                $acc = $seq->accession_number;\n                $acc = join(\"; \", $acc, $seq->get_secondary_accessions);\n            } elsif ( $seq->can('accession_number') ) {\n                $acc = $seq->accession_number;\n            }\n\n            if (defined $acc) {\n                $self->_print(\"AC   $acc;\\n\",\n                              \"XX\\n\") || return;\n            }\n        }\n\n        # Date lines\n        my $switch=0;\n        if ( $seq->can('get_dates') ) {\n            my @dates =  $seq->get_dates();\n            my $ct = 1;\n            my $date_flag = 0;\n            my ($cr) = $seq->annotation->get_Annotations(\"creation_release\");\n            my ($ur) = $seq->annotation->get_Annotations(\"update_release\");\n            my ($uv) = $seq->annotation->get_Annotations(\"update_version\");\n\n            unless ($cr && $ur && $ur) {\n                $date_flag = 1;\n            }\n\n            foreach my $dt (@dates) {\n                if (!$date_flag) {\n                    $self->_write_line_EMBL_regex(\"DT   \",\"DT   \",\n                                                  $dt.\" (Rel. $cr, Created)\",\n                                                  '\\s+|$',80) if $ct == 1;\n                    $self->_write_line_EMBL_regex(\"DT   \",\"DT   \",\n                                                  $dt.\" (Rel. $ur, Last updated, Version $uv)\",\n                                                  '\\s+|$',80) if $ct == 2;\n                } else {        # other formats?\n                    $self->_write_line_EMBL_regex(\"DT   \",\"DT   \",\n                                                  $dt,'\\s+|$',80);\n                }\n                $switch =1;\n                $ct++;\n            }\n            if ($switch == 1) {\n                $self->_print(\"XX\\n\") || return;\n            }\n        }\n\n        # Description lines\n        $self->_write_line_EMBL_regex(\"DE   \",\"DE   \",$seq->desc(),'\\s+|$',80) || return; #'\n        $self->_print( \"XX\\n\") || return;\n\n        # if there, write the kw line\n        {\n            my( $kw );\n            if ( my $func = $self->_kw_generation_func ) {\n                $kw = &{$func}($seq);\n            } elsif ( $seq->can('keywords') ) {\n                $kw = $seq->keywords;\n            }\n            if (defined $kw) {\n                $self->_write_line_EMBL_regex(\"KW   \", \"KW   \", $kw, '\\s+|$', 80) || return; #'\n                $self->_print( \"XX\\n\") || return;\n            }\n        }\n\n        # Organism lines\n\n        if ($seq->can('species') && (my $spec = $seq->species)) {\n            my @class = $spec->classification();\n            shift @class;       # get rid of species name. Some embl files include\n                                # the species name in the OC lines, but this seems\n                                # more like an error than something we need to\n                                # emulate\n            my $OS = $spec->scientific_name;\n            if ($spec->common_name) {\n                $OS .= ' ('.$spec->common_name.')';\n            }\n            $self->_print(\"OS   $OS\\n\") || return;\n            my $OC = join('; ', reverse(@class)) .'.';\n            $self->_write_line_EMBL_regex(\"OC   \",\"OC   \",$OC,'; |$',80) || return;\n            if ($spec->organelle) {\n                $self->_write_line_EMBL_regex(\"OG   \",\"OG   \",$spec->organelle,'; |$',80) || return;\n            }\n            $self->_print(\"XX\\n\") || return;\n        }\n\n        # Reference lines\n        my $t = 1;\n        if ( $seq->can('annotation') && defined $seq->annotation ) {\n            foreach my $ref ( $seq->annotation->get_Annotations('reference') ) {\n                $self->_print( \"RN   [$t]\\n\") || return;\n\n                # Having no RP line is legal, but we need both\n                # start and end for a valid location.\n                if ($ref->comment) {\n                    $self->_write_line_EMBL_regex(\"RC   \", \"RC   \", $ref->comment, '\\s+|$', 80) || return; #'\n                }\n                my $start = $ref->start;\n                my $end   = $ref->end;\n                if ($start and $end) {\n                    $self->_print( \"RP   $start-$end\\n\") || return;\n                } elsif ($start or $end) {\n                    $self->throw(\"Both start and end are needed for a valid RP line.\".\n                                 \"  Got: start='$start' end='$end'\");\n                }\n\n                if (my $med = $ref->medline) {\n                    $self->_print( \"RX   MEDLINE; $med.\\n\") || return;\n                }\n                if (my $pm = $ref->pubmed) {\n                    $self->_print( \"RX   PUBMED; $pm.\\n\") || return;\n                }\n                my $authors = $ref->authors;\n                $authors =~ s/([\\w\\.]) (\\w)/$1#$2/g;  # add word wrap protection char '#'\n\n                $self->_write_line_EMBL_regex(\"RA   \", \"RA   \",\n                                              $authors . \";\",\n                                              '\\s+|$', 80) || return; #'\n\n                # If there is no title to the reference, it appears\n                # as a single semi-colon.  All titles must end in\n                # a semi-colon.\n                my $ref_title = $ref->title || '';\n                $ref_title =~ s/[\\s;]*$/;/;\n                $self->_write_line_EMBL_regex(\"RT   \", \"RT   \", $ref_title,    '\\s+|$', 80) || return; #'\n                $self->_write_line_EMBL_regex(\"RL   \", \"RL   \", $ref->location, '\\s+|$', 80) || return; #'\n                $self->_print(\"XX\\n\") || return;\n                $t++;\n            }\n\n            # DB Xref lines\n            if (my @db_xref = $seq->annotation->get_Annotations('dblink') ) {\n                for my $dr (@db_xref) {\n                    my $db_name = $dr->database;\n                    my $prim    = $dr->primary_id;\n\n                    my $opt     = $dr->optional_id || '';\n                    my $line = $opt ? \"$db_name; $prim; $opt.\" : \"$db_name; $prim.\";\n                    $self->_write_line_EMBL_regex(\"DR   \", \"DR   \", $line, '\\s+|$', 80) || return; #'\n                }\n                $self->_print(\"XX\\n\") || return;\n            }\n            \n            # Comment lines\n            foreach my $comment ( $seq->annotation->get_Annotations('comment') ) {\n                $self->_write_line_EMBL_regex(\"CC   \", \"CC   \", $comment->text, '\\s+|$', 80) || return; #'\n                $self->_print(\"XX\\n\") || return;\n            }\n        }\n        # \"\\\\s\\+\\|\\$\"\n\n        ## FEATURE TABLE\n\n        $self->_print(\"FH   Key             Location/Qualifiers\\n\") || return;\n        $self->_print(\"FH\\n\") || return;\n\n        my @feats = $seq->can('top_SeqFeatures') ? $seq->top_SeqFeatures : ();\n        if ($feats[0]) {\n            if ( defined $self->_post_sort ) {\n                # we need to read things into an array.\n                # Process. Sort them. Print 'em\n\n                my $post_sort_func = $self->_post_sort();\n                my @fth;\n\n                foreach my $sf ( @feats ) {\n                    push(@fth,Bio::SeqIO::FTHelper::from_SeqFeature($sf,$seq));\n                }\n\n                @fth = sort { &$post_sort_func($a,$b) } @fth;\n\n                foreach my $fth ( @fth ) {\n                    $self->_print_EMBL_FTHelper($fth) || return;\n                }\n            } else {\n                # not post sorted. And so we can print as we get them.\n                # lower memory load...\n\n                foreach my $sf ( @feats ) {\n                    my @fth = Bio::SeqIO::FTHelper::from_SeqFeature($sf,$seq);\n                    foreach my $fth ( @fth ) {\n                        if ( $fth->key eq 'CONTIG') {\n                            $self->_show_dna(0);\n                        }\n                        $self->_print_EMBL_FTHelper($fth) || return;\n                    }\n                }\n            }\n        }\n\n        if ( $self->_show_dna() == 0 ) {\n            $self->_print( \"//\\n\") || return;\n            return;\n        }\n        $self->_print( \"XX\\n\") || return;\n\n        # finished printing features.\n\n        $str =~ tr/A-Z/a-z/;\n\n        # Count each nucleotide\n        my $alen = $str =~ tr/a/a/;\n        my $clen = $str =~ tr/c/c/;\n        my $glen = $str =~ tr/g/g/;\n        my $tlen = $str =~ tr/t/t/;\n\n        my $len = $seq->length();\n        my $olen = $seq->length() - ($alen + $tlen + $clen + $glen);\n        if ( $olen < 0 ) {\n            $self->warn(\"Weird. More atgc than bases. Problem!\");\n        }\n\n        $self->_print(\"SQ   Sequence $len BP; $alen A; $clen C; $glen G; $tlen T; $olen other;\\n\") || return;\n\n        my $nuc = 60;       # Number of nucleotides per line\n        my $whole_pat = 'a10' x 6; # Pattern for unpacking a whole line\n        my $out_pat   = 'A11' x 6; # Pattern for packing a line\n        my $length = length($str);\n\n        # Calculate the number of nucleotides which fit on whole lines\n        my $whole = int($length / $nuc) * $nuc;\n\n        # Print the whole lines\n        my( $i );\n        for ($i = 0; $i < $whole; $i += $nuc) {\n            my $blocks = pack $out_pat,\n                unpack $whole_pat,\n                    substr($str, $i, $nuc);\n            $self->_print(sprintf(\"     $blocks%9d\\n\", $i + $nuc)) || return;\n        }\n\n        # Print the last line\n        if (my $last = substr($str, $i)) {\n            my $last_len = length($last);\n            my $last_pat = 'a10' x int($last_len / 10) .'a'. $last_len % 10;\n            my $blocks = pack $out_pat,\n                unpack($last_pat, $last);\n            $self->_print(sprintf(\"     $blocks%9d\\n\", $length)) ||\n                return;         # Add the length to the end\n        }\n\n        $self->_print( \"//\\n\") || return;\n\n        $self->flush if $self->_flush_on_write && defined $self->_fh;\n    }\n    return 1;\n}\n\n=head2 _print_EMBL_FTHelper\n\n Title   : _print_EMBL_FTHelper\n Usage   :\n Function: Internal function\n Returns : 1 if writing suceeded, otherwise undef\n Args    :"},"detail":"($self,$fth)"},{"name":"$fth","containerName":null,"kind":13,"line":924},{"line":924,"kind":12,"name":"field","containerName":"main::"},{"name":"%tag","containerName":null,"kind":13,"line":924},{"containerName":null,"definition":"my","name":"$value","localvar":"my","kind":13,"line":927},{"containerName":null,"name":"$fth","line":927,"kind":13},{"kind":12,"line":927,"name":"field","containerName":"main::"},{"name":"%tag","containerName":null,"line":927,"kind":13},{"kind":13,"line":928,"name":"$value","containerName":null},{"kind":13,"line":929,"containerName":null,"name":"%value"},{"line":930,"kind":13,"containerName":null,"name":"$self"},{"containerName":"main::","name":"_write_line_EMBL_regex","kind":12,"line":930},{"line":936,"kind":13,"containerName":null,"name":"%FTQUAL_NO_QUOTE"},{"line":936,"kind":13,"containerName":null,"name":"%tag"},{"kind":13,"line":937,"containerName":null,"definition":"my","name":"$pat","localvar":"my"},{"name":"$value","containerName":null,"line":937,"kind":13},{"line":938,"kind":13,"containerName":null,"name":"$self"},{"containerName":"main::","name":"_write_line_EMBL_regex","kind":12,"line":938},{"kind":13,"line":940,"containerName":null,"name":"%pat"},{"line":942,"kind":13,"containerName":null,"name":"$self"},{"line":942,"kind":12,"name":"_write_line_EMBL_regex","containerName":"main::"},{"kind":12,"line":965,"children":[{"kind":13,"line":966,"containerName":"_read_EMBL_References","definition":"my","name":"$self","localvar":"my"},{"containerName":"_read_EMBL_References","name":"$buffer","line":966,"kind":13},{"containerName":"_read_EMBL_References","name":"@refs","definition":"my","localvar":"my","kind":13,"line":967},{"localvar":"my","definition":"my","name":"$b1","containerName":"_read_EMBL_References","line":974,"kind":13},{"containerName":"_read_EMBL_References","definition":"my","name":"$b2","localvar":"my","kind":13,"line":975},{"localvar":"my","containerName":"_read_EMBL_References","name":"$title","definition":"my","line":976,"kind":13},{"line":977,"kind":13,"localvar":"my","containerName":"_read_EMBL_References","name":"$loc","definition":"my"},{"localvar":"my","definition":"my","name":"$au","containerName":"_read_EMBL_References","line":978,"kind":13},{"localvar":"my","containerName":"_read_EMBL_References","definition":"my","name":"$med","line":979,"kind":13},{"localvar":"my","containerName":"_read_EMBL_References","definition":"my","name":"$pm","line":980,"kind":13},{"kind":13,"line":981,"name":"$com","definition":"my","containerName":"_read_EMBL_References","localvar":"my"},{"containerName":"_read_EMBL_References","name":"$self","kind":13,"line":983},{"name":"_readline","containerName":"_read_EMBL_References","kind":12,"line":983},{"line":985,"kind":13,"containerName":"_read_EMBL_References","name":"$b1"},{"kind":13,"line":985,"name":"$b2","containerName":"_read_EMBL_References"},{"kind":13,"line":986,"name":"$med","containerName":"_read_EMBL_References"},{"kind":13,"line":987,"containerName":"_read_EMBL_References","name":"$pm"},{"containerName":"_read_EMBL_References","name":"$au","line":989,"kind":13},{"containerName":"_read_EMBL_References","name":"$self","kind":13,"line":989},{"line":989,"kind":12,"containerName":"_read_EMBL_References","name":"_concatenate_lines"},{"kind":13,"line":989,"name":"$au","containerName":"_read_EMBL_References"},{"line":992,"kind":13,"containerName":"_read_EMBL_References","name":"$title"},{"name":"$self","containerName":"_read_EMBL_References","line":992,"kind":13},{"name":"_concatenate_lines","containerName":"_read_EMBL_References","kind":12,"line":992},{"kind":13,"line":992,"name":"$title","containerName":"_read_EMBL_References"},{"containerName":"_read_EMBL_References","name":"$loc","line":995,"kind":13},{"containerName":"_read_EMBL_References","name":"$self","kind":13,"line":995},{"name":"_concatenate_lines","containerName":"_read_EMBL_References","kind":12,"line":995},{"kind":13,"line":995,"containerName":"_read_EMBL_References","name":"$loc"},{"kind":13,"line":998,"name":"$com","containerName":"_read_EMBL_References"},{"line":998,"kind":13,"containerName":"_read_EMBL_References","name":"$self"},{"kind":12,"line":998,"containerName":"_read_EMBL_References","name":"_concatenate_lines"},{"containerName":"_read_EMBL_References","name":"$com","line":998,"kind":13},{"line":1002,"kind":13,"localvar":"my","containerName":"_read_EMBL_References","definition":"my","name":"$ref"},{"line":1002,"kind":12,"containerName":"_read_EMBL_References","name":"new"},{"name":"$au","containerName":"_read_EMBL_References","kind":13,"line":1003},{"containerName":"_read_EMBL_References","name":"$title","kind":13,"line":1004},{"containerName":"_read_EMBL_References","name":"$ref","kind":13,"line":1006},{"containerName":"_read_EMBL_References","name":"start","line":1006,"kind":12},{"line":1006,"kind":13,"containerName":"_read_EMBL_References","name":"$b1"},{"name":"$ref","containerName":"_read_EMBL_References","line":1007,"kind":13},{"name":"end","containerName":"_read_EMBL_References","kind":12,"line":1007},{"containerName":"_read_EMBL_References","name":"$b2","kind":13,"line":1007},{"name":"$ref","containerName":"_read_EMBL_References","line":1008,"kind":13},{"kind":12,"line":1008,"containerName":"_read_EMBL_References","name":"authors"},{"line":1008,"kind":13,"name":"$au","containerName":"_read_EMBL_References"},{"containerName":"_read_EMBL_References","name":"$ref","kind":13,"line":1009},{"line":1009,"kind":12,"containerName":"_read_EMBL_References","name":"title"},{"kind":13,"line":1009,"containerName":"_read_EMBL_References","name":"$title"},{"kind":13,"line":1010,"name":"$ref","containerName":"_read_EMBL_References"},{"line":1010,"kind":12,"name":"location","containerName":"_read_EMBL_References"},{"name":"$loc","containerName":"_read_EMBL_References","line":1010,"kind":13},{"containerName":"_read_EMBL_References","name":"$ref","line":1011,"kind":13},{"containerName":"_read_EMBL_References","name":"medline","kind":12,"line":1011},{"line":1011,"kind":13,"name":"$med","containerName":"_read_EMBL_References"},{"kind":13,"line":1012,"containerName":"_read_EMBL_References","name":"$ref"},{"line":1012,"kind":12,"name":"comment","containerName":"_read_EMBL_References"},{"kind":13,"line":1012,"containerName":"_read_EMBL_References","name":"$com"},{"name":"$ref","containerName":"_read_EMBL_References","kind":13,"line":1013},{"line":1013,"kind":12,"containerName":"_read_EMBL_References","name":"pubmed"},{"line":1013,"kind":13,"containerName":"_read_EMBL_References","name":"$pm"},{"containerName":"_read_EMBL_References","name":"@refs","kind":13,"line":1015},{"line":1015,"kind":13,"name":"$ref","containerName":"_read_EMBL_References"},{"line":1018,"kind":13,"name":"@refs","containerName":"_read_EMBL_References"}],"containerName":"main::","definition":"sub","detail":"($self,$buffer)","signature":{"label":"_read_EMBL_References($self,$buffer)","documentation":"1;\n# $Id: embl.pm 16123 2009-09-17 12:57:27Z cjfields $\n#\n# BioPerl module for Bio::SeqIO::EMBL\n#\n# Please direct questions and support issues to <bioperl-l@bioperl.org> \n#\n# Cared for by Ewan Birney <birney@ebi.ac.uk>\n#\n# Copyright Ewan Birney\n#\n# You may distribute this module under the same terms as perl itself\n\n# POD documentation - main docs before the code\n\n=head1 NAME\n\nBio::SeqIO::embl - EMBL sequence input/output stream\n\n=head1 SYNOPSIS\n\nIt is probably best not to use this object directly, but\nrather go through the SeqIO handler system. Go:\n\n    $stream = Bio::SeqIO->new(-file => $filename, -format => 'EMBL');\n\n    while ( (my $seq = $stream->next_seq()) ) {\n        # do something with $seq\n    }\n\n=head1 DESCRIPTION\n\nThis object can transform Bio::Seq objects to and from EMBL flat\nfile databases.\n\nThere is a lot of flexibility here about how to dump things which\nshould be documented more fully.\n\nThere should be a common object that this and Genbank share (probably\nwith Swissprot). Too much of the magic is identical.\n\n=head2 Optional functions\n\n=over 3\n\n* _show_dna()\n\n(output only) shows the dna or not\n\n* _post_sort()\n\n(output only) provides a sorting func which is applied to the FTHelpers\nbefore printing\n\n* _id_generation_func()\n\nThis is function which is called as\n\n   print \"ID   \", $func($annseq), \"\\n\";\n\nTo generate the ID line. If it is not there, it generates a sensible ID\nline using a number of tools.\n\nIf you want to output annotations in EMBL format they need to be\nstored in a Bio::Annotation::Collection object which is accessible\nthrough the Bio::SeqI interface method L<annotation()|annotation>.\n\nThe following are the names of the keys which are polled from a\nL<Bio::Annotation::Collection> object.\n\n reference  - Should contain Bio::Annotation::Reference objects\n comment    - Should contain Bio::Annotation::Comment objects\n dblink     - Should contain Bio::Annotation::DBLink objects\n\n\n=head1 FEEDBACK\n\n=head2 Mailing Lists\n\nUser feedback is an integral part of the evolution of this and other\nBioperl modules. Send your comments and suggestions preferably to one\nof the Bioperl mailing lists.  Your participation is much appreciated.\n\n  bioperl-l@bioperl.org                  - General discussion\n  http://bioperl.org/wiki/Mailing_lists  - About the mailing lists\n\n=head2 Support \n\nPlease direct usage questions or support issues to the mailing list:\n\nI<bioperl-l@bioperl.org>\n\nrather than to the module maintainer directly. Many experienced and \nreponsive experts will be able look at the problem and quickly \naddress it. Please include a thorough description of the problem \nwith code and data examples if at all possible.\n\n=head2 Reporting Bugs\n\nReport bugs to the Bioperl bug tracking system to help us keep track\nthe bugs and their resolution. Bug reports can be submitted via\nthe web:\n\n  http://bugzilla.open-bio.org/\n\n=head1 AUTHOR - Ewan Birney\n\nEmail birney@ebi.ac.uk\n\n=head1 APPENDIX\n\nThe rest of the documentation details each of the object\nmethods. Internal methods are usually preceded with a _\n\n\n\n# Let the code begin...\n\n\npackage Bio::SeqIO::embl;\nuse vars qw(%FTQUAL_NO_QUOTE);\nuse strict;\nuse Bio::SeqIO::FTHelper;\nuse Bio::SeqFeature::Generic;\nuse Bio::Species;\nuse Bio::Seq::SeqFactory;\nuse Bio::Annotation::Collection;\nuse Bio::Annotation::Comment;\nuse Bio::Annotation::Reference;\nuse Bio::Annotation::DBLink;\n\nuse base qw(Bio::SeqIO);\n\n%FTQUAL_NO_QUOTE=(\n                  'anticodon'=>1,\n                  'citation'=>1,\n                  'codon'=>1,\n                  'codon_start'=>1,\n                  'cons_splice'=>1,\n                  'direction'=>1,\n                  'evidence'=>1,\n                  'label'=>1,\n                  'mod_base'=> 1,\n                  'number'=> 1,\n                  'rpt_type'=> 1,\n                  'rpt_unit'=> 1,\n                  'transl_except'=> 1,\n                  'transl_table'=> 1,\n                  'usedin'=> 1,\n                 );\n\nsub _initialize {\n    my($self,@args) = @_;\n\n    $self->SUPER::_initialize(@args);\n    # hash for functions for decoding keys.\n    $self->{'_func_ftunit_hash'} = {};\n    # sets this to one by default. People can change it\n    $self->_show_dna(1);\n    if ( ! defined $self->sequence_factory ) {\n        $self->sequence_factory(Bio::Seq::SeqFactory->new\n                                (-verbose => $self->verbose(),\n                                 -type => 'Bio::Seq::RichSeq'));\n    }\n}\n\n=head2 next_seq\n\n Title   : next_seq\n Usage   : $seq = $stream->next_seq()\n Function: returns the next sequence in the stream\n Returns : Bio::Seq object\n Args    :\n\n\nsub next_seq {\n    my ($self,@args) = @_;\n    my ($pseq,$c,$line,$name,$desc,$acc,$seqc,$mol,$div,\n        $date, $comment, @date_arr);\n\n    my ($annotation, %params, @features) =\n        Bio::Annotation::Collection->new();\n\n    $line = $self->_readline;\n    # This needs to be before the first eof() test\n\n    if ( !defined $line ) {\n        return;                 # no throws - end of file\n    }\n\n    if ( $line =~ /^\\s+$/ ) {\n        while ( defined ($line = $self->_readline) ) {\n            $line =~/^\\S/ && last;\n        }\n        # return without error if the whole next sequence was just a single\n        # blank line and then eof\n        return unless $line;\n    }\n\n    # no ID as 1st non-blank line, need short circuit and exit routine\n    $self->throw(\"EMBL stream with no ID. Not embl in my book\")\n        unless $line =~ /^ID\\s+\\S+/;\n\n    # At this point we are sure that $line contains an ID header line\n    my $alphabet;\n    if ( $line =~ tr/;/;/ == 6) { # New style headers contain exactly six semicolons.\n\n        # New style header (EMBL Release >= 87, after June 2006)\n        my $topology;\n        my $sv;\n\n        # ID   DQ299383; SV 1; linear; mRNA; STD; MAM; 431 BP.\n        # This regexp comes from the new2old.pl conversion script, from EBI\n        if ($line =~ m/^ID   (\\w+);\\s+SV (\\d+); (\\w+); ([^;]+); (\\w{3}); (\\w{3}); (\\d+) BP./) {\n        ($name, $sv, $topology, $mol, $div) = ($1, $2, $3, $4, $6);\n        }\n        if (defined($sv)) {\n        $params{'-seq_version'} = $sv;\n        $params{'-version'} = $sv;\n        }\n\n        if ($topology eq \"circular\") {\n        $params{'-is_circular'} = 1;\n        }\n    \n    if (defined $mol ) {\n        if ($mol =~ /DNA/) {\n        $alphabet='dna';\n        } elsif ($mol =~ /RNA/) {\n        $alphabet='rna';\n        } elsif ($mol =~ /AA/) {\n        $alphabet='protein';\n        }\n    }\n    } else {\n    \n        # Old style header (EMBL Release < 87, before June 2006)\n        if ($line =~ /^ID\\s+(\\S+)[^;]*;\\s+(\\S+)[^;]*;\\s+(\\S+)[^;]*;/) {\n        ($name, $mol, $div) = ($1, $2, $3);\n        }\n    \n        if ($mol) {\n            if ( $mol =~ /circular/ ) {\n            $params{'-is_circular'} = 1;\n            $mol =~  s|circular ||;\n            }\n            if (defined $mol ) {\n            if ($mol =~ /DNA/) {\n                $alphabet='dna';\n            } elsif ($mol =~ /RNA/) {\n                $alphabet='rna';\n            } elsif ($mol =~ /AA/) {\n                $alphabet='protein';\n            }\n            }\n        }\n    }\n\n    unless( defined $name && length($name) ) {\n    $name = \"unknown_id\";\n    }\n\n    # $self->warn(\"not parsing upper annotation in EMBL file yet!\");\n    my $buffer = $line;\n    local $_;\n    BEFORE_FEATURE_TABLE :\n          until ( !defined $buffer ) {\n              $_ = $buffer;\n              # Exit at start of Feature table\n              if ( /^(F[HT]|SQ)/ ) {\n                  $self->_pushback($_) if( $1 eq 'SQ' || $1 eq 'FT');\n                  last;\n              }\n              # Description line(s)\n              if (/^DE\\s+(\\S.*\\S)/) {\n                  $desc .= $desc ? \" $1\" : $1;\n              }\n\n              #accession number\n              if ( /^AC\\s+(.*)?/ || /^PA\\s+(.*)?/) {\n                  my @accs = split(/[; ]+/, $1); # allow space in addition\n                  $params{'-accession_number'} = shift @accs\n                      unless defined $params{'-accession_number'};\n                  push @{$params{'-secondary_accessions'}}, @accs;\n              }\n\n              #version number\n              if ( /^SV\\s+\\S+\\.(\\d+);?/ ) {\n                  my $sv = $1;\n                  #$sv =~ s/\\;//;\n                  $params{'-seq_version'} = $sv;\n                  $params{'-version'} = $sv;\n              }\n\n              #date (NOTE: takes last date line)\n              if ( /^DT\\s+(.+)$/ ) {\n                  my $line = $1;\n                  my ($date, $version) = split(' ', $line, 2);\n                  $date =~ tr/,//d; # remove comma if new version\n                  if ($version =~ /\\(Rel\\. (\\d+), Created\\)/xms ) {\n                      my $release = Bio::Annotation::SimpleValue->new(\n                                                                      -tagname    => 'creation_release',\n                                                                      -value      => $1\n                                                                     );\n                      $annotation->add_Annotation($release);\n                  } elsif ($version =~ /\\(Rel\\. (\\d+), Last updated, Version (\\d+)\\)/xms ) {\n                      my $release = Bio::Annotation::SimpleValue->new(\n                                                                      -tagname    => 'update_release',\n                                                                      -value      => $1\n                                                                     );\n                      $annotation->add_Annotation($release);\n\n                      my $update = Bio::Annotation::SimpleValue->new(\n                                                                     -tagname    => 'update_version',\n                                                                     -value      => $2\n                                                                    );\n                      $annotation->add_Annotation($update);\n                  }\n                  push @{$params{'-dates'}}, $date;\n              }\n\n              #keywords\n              if ( /^KW   (.*)\\S*$/ ) {\n                  my @kw = split(/\\s*\\;\\s*/,$1);\n                  push @{$params{'-keywords'}}, @kw;\n              }\n\n              # Organism name and phylogenetic information\n              elsif (/^O[SC]/) {\n                  # pass the accession number so we can give an informative throw message if necessary\n                  my $species = $self->_read_EMBL_Species(\\$buffer, $params{'-accession_number'});\n                  $params{'-species'}= $species;\n              }\n\n              # NCBI TaxID Xref\n              elsif (/^OX/) {\n                  my @links = $self->_read_EMBL_TaxID_DBLink(\\$buffer);\n                  foreach my $dblink ( @links ) {\n                      $annotation->add_Annotation('dblink',$dblink);\n                  }\n              }\n\n              # References\n              elsif (/^R/) {\n                  my @refs = $self->_read_EMBL_References(\\$buffer);\n                  foreach my $ref ( @refs ) {\n                      $annotation->add_Annotation('reference',$ref);\n                  }\n              }\n\n              # DB Xrefs\n              elsif (/^DR/) {\n                  my @links = $self->_read_EMBL_DBLink(\\$buffer);\n                  foreach my $dblink ( @links ) {\n                      $annotation->add_Annotation('dblink',$dblink);\n                  }\n              }\n\n              # Comments\n              elsif (/^CC\\s+(.*)/) {\n                  $comment .= $1;\n                  $comment .= \" \";\n                  while (defined ($_ = $self->_readline) ) {\n                      if (/^CC\\s+(.*)/) {\n                          $comment .= $1;\n                          $comment .= \" \";\n                      } else {\n                          last;\n                      }\n                  }\n                  my $commobj = Bio::Annotation::Comment->new();\n                  $commobj->text($comment);\n                  $annotation->add_Annotation('comment',$commobj);\n                  $comment = \"\";\n              }\n\n              # Get next line.\n              $buffer = $self->_readline;\n          }\n\n    while ( defined ($_ = $self->_readline) ) {\n        /^FT\\s{3}\\w/ && last;\n        /^SQ / && last;\n        /^CO / && last;\n    }\n    $buffer = $_;\n\n    if (defined($buffer) && $buffer =~ /^FT /) {\n        until ( !defined ($buffer) ) {\n            my $ftunit = $self->_read_FTHelper_EMBL(\\$buffer);\n\n            # process ftunit\n            my $feat =\n                $ftunit->_generic_seqfeature($self->location_factory(), $name);\n\n            # add taxon_id from source if available\n            if ($params{'-species'} && ($feat->primary_tag eq 'source')\n                && $feat->has_tag('db_xref')\n                && (! $params{'-species'}->ncbi_taxid())) {\n                foreach my $tagval ($feat->get_tag_values('db_xref')) {\n                    if (index($tagval,\"taxon:\") == 0) {\n                        $params{'-species'}->ncbi_taxid(substr($tagval,6));\n                        last;\n                    }\n                }\n            }\n\n            # add feature to list of features\n            push(@features, $feat);\n\n            if ( $buffer !~ /^FT/ ) {\n                last;\n            }\n        }\n    }\n    # skip comments\n    while ( defined ($buffer) && $buffer =~ /^XX/ ) {\n        $buffer = $self->_readline();\n    }\n\n    if ( $buffer =~ /^CO/  ) {\n        until ( !defined ($buffer) ) {\n            my $ftunit = $self->_read_FTHelper_EMBL(\\$buffer);\n            # process ftunit\n            push(@features,\n                 $ftunit->_generic_seqfeature($self->location_factory(),\n                                              $name));\n\n            if ( $buffer !~ /^CO/ ) {\n                last;\n            }\n        }\n    }\n    if ( $buffer !~ /^SQ/  ) {\n        while ( defined ($_ = $self->_readline) ) {\n            /^SQ/ && last;\n        }\n    }\n    $seqc = \"\";\n    while ( defined ($_ = $self->_readline) ) {\n        m{^//} && last;\n        $_ = uc($_);\n        s/[^A-Za-z]//g;\n        $seqc .= $_;\n    }\n    my $seq = $self->sequence_factory->create\n        (-verbose => $self->verbose(),\n         -division => $div,\n         -seq => $seqc,\n         -desc => $desc,\n         -display_id => $name,\n         -annotation => $annotation,\n         -molecule => $mol,\n         -alphabet => $alphabet,\n         -features => \\@features,\n         %params);\n    return $seq;\n}\n\n\n\n=head2 _write_ID_line\n\n Title   : _write_ID_line\n Usage   : $self->_write_ID_line($seq);\n Function: Writes the EMBL Release 87 format ID line to the stream, unless\n         : there is a user-supplied ID line generation function in which\n         : case that is used instead.\n         : ( See Bio::SeqIO::embl::_id_generation_function(). )\n Returns : nothing\n Args    : Bio::Seq object\n\n\nsub _write_ID_line {\n\n    my ($self, $seq) = @_;\n\n    my $id_line;\n    # If there is a user-supplied ID generation function, use it.\n    if ( $self->_id_generation_func ) {\n        $id_line = \"ID   \" . &{$self->_id_generation_func}($seq) . \"\\nXX\\n\";\n    }\n    # Otherwise, generate a standard EMBL release 87 (June 2006) ID line.\n    else {\n\n        # The sequence name is supposed to be the primary accession number,\n        my $name = $seq->accession_number();\n        if (!$name) {\n            # but if it is not present, use the sequence ID.\n            $name = $seq->id();\n        }\n\n        $self->warn(\"No whitespace allowed in EMBL id [\". $name. \"]\") if $name =~ /\\s/;\n\n        # Use the sequence version, or default to 1.\n        my $version = $seq->version() || 1;\n\n        my $len = $seq->length();\n\n        # Taxonomic division.\n        my $div;\n        if ( $seq->can('division') && defined($seq->division) &&\n             $self->_is_valid_division($seq->division) ) {\n            $div = $seq->division();\n        } else {\n            $div ||= 'UNC';     # 'UNC' is the EMBL division code for 'unclassified'.\n        }\n\n        my $mol;\n        # If the molecule type is a valid EMBL type, use it.\n        if (  $seq->can('molecule')\n              && defined($seq->molecule)\n              && $self->_is_valid_molecule_type($seq->molecule)\n           ) {\n            $mol = $seq->molecule();\n        }\n        # Otherwise, choose unassigned DNA or RNA based on the alphabet.\n        elsif ($seq->can('primary_seq') && defined $seq->primary_seq->alphabet) {\n            my $alphabet =$seq->primary_seq->alphabet;\n            if ($alphabet eq 'dna') {\n                $mol ='unassigned DNA';\n            } elsif ($alphabet eq 'rna') {\n                $mol='unassigned RNA';\n            } elsif ($alphabet eq 'protein') {\n                $self->warn(\"Protein sequence found; EMBL is a nucleotide format.\");\n                $mol='AA';  # AA is not a valid EMBL molecule type.\n            }\n        }\n\n        my $topology = 'linear';\n        if ($seq->is_circular) {\n            $topology = 'circular';\n        }\n\n        $mol ||= '';            # 'unassigned'; ?\n        $id_line = \"ID   $name; SV $version; $topology; $mol; STD; $div; $len BP.\\nXX\\n\";\n        $self->_print($id_line);\n    }\n}\n\n=head2 _is_valid_division\n\n Title   : _is_valid_division\n Usage   : $self->_is_valid_division($div)\n Function: tests division code for validity\n Returns : true if $div is a valid EMBL release 87 taxonomic division.\n Args    : taxonomic division code string\n\n\nsub _is_valid_division {\n    my ($self, $division) = @_;\n\n    my %EMBL_divisions = (\n                          \"PHG\"    => 1, # Bacteriophage\n                          \"ENV\"    => 1, # Environmental Sample\n                          \"FUN\"    => 1, # Fungal\n                          \"HUM\"    => 1, # Human\n                          \"INV\"    => 1, # Invertebrate\n                          \"MAM\"    => 1, # Other Mammal\n                          \"VRT\"    => 1, # Other Vertebrate\n                          \"MUS\"    => 1, # Mus musculus\n                          \"PLN\"    => 1, # Plant\n                          \"PRO\"    => 1, # Prokaryote\n                          \"ROD\"    => 1, # Other Rodent\n                          \"SYN\"    => 1, # Synthetic\n                          \"UNC\"    => 1, # Unclassified\n                          \"VRL\"    => 1 # Viral\n                         );\n\n    return exists($EMBL_divisions{$division});\n}\n\n=head2 _is_valid_molecule_type\n\n Title   : _is_valid_molecule_type\n Usage   : $self->_is_valid_molecule_type($mol)\n Function: tests molecule type for validity\n Returns : true if $mol is a valid EMBL release 87 molecule type.\n Args    : molecule type string\n\n\nsub _is_valid_molecule_type {\n    my ($self, $moltype) = @_;\n\n    my %EMBL_molecule_types = (\n                               \"genomic DNA\"    => 1,\n                               \"genomic RNA\"    => 1,\n                               \"mRNA\"           => 1,\n                               \"tRNA\"           => 1,\n                               \"rRNA\"           => 1,\n                               \"snoRNA\"         => 1,\n                               \"snRNA\"          => 1,\n                               \"scRNA\"          => 1,\n                               \"pre-RNA\"        => 1,\n                               \"other RNA\"      => 1,\n                               \"other DNA\"      => 1,\n                               \"unassigned DNA\" => 1,\n                               \"unassigned RNA\" => 1\n                              );\n\n    return exists($EMBL_molecule_types{$moltype});\n}\n\n=head2 write_seq\n\n Title   : write_seq\n Usage   : $stream->write_seq($seq)\n Function: writes the $seq object (must be seq) to the stream\n Returns : 1 for success and undef for error\n Args    : array of 1 to n Bio::SeqI objects\n\n\n\nsub write_seq {\n    my ($self,@seqs) = @_;\n\n    foreach my $seq ( @seqs ) {\n        $self->throw(\"Attempting to write with no seq!\") unless defined $seq;\n        unless ( ref $seq && $seq->isa('Bio::SeqI' ) ) {\n            $self->warn(\"$seq is not a SeqI compliant sequence object!\")\n                if $self->verbose >= 0;\n            unless ( ref $seq && $seq->isa('Bio::PrimarySeqI' ) ) {\n                $self->throw(\"$seq is not a PrimarySeqI compliant sequence object!\");\n            }\n        }\n        my $str = $seq->seq || '';\n\n        # Write the ID line.\n        $self->_write_ID_line($seq);\n\n\n        # Write the accession line if present\n        my( $acc );\n        {\n            if ( my $func = $self->_ac_generation_func ) {\n                $acc = &{$func}($seq);\n            } elsif ( $seq->isa('Bio::Seq::RichSeqI') &&\n                      defined($seq->accession_number) ) {\n                $acc = $seq->accession_number;\n                $acc = join(\"; \", $acc, $seq->get_secondary_accessions);\n            } elsif ( $seq->can('accession_number') ) {\n                $acc = $seq->accession_number;\n            }\n\n            if (defined $acc) {\n                $self->_print(\"AC   $acc;\\n\",\n                              \"XX\\n\") || return;\n            }\n        }\n\n        # Date lines\n        my $switch=0;\n        if ( $seq->can('get_dates') ) {\n            my @dates =  $seq->get_dates();\n            my $ct = 1;\n            my $date_flag = 0;\n            my ($cr) = $seq->annotation->get_Annotations(\"creation_release\");\n            my ($ur) = $seq->annotation->get_Annotations(\"update_release\");\n            my ($uv) = $seq->annotation->get_Annotations(\"update_version\");\n\n            unless ($cr && $ur && $ur) {\n                $date_flag = 1;\n            }\n\n            foreach my $dt (@dates) {\n                if (!$date_flag) {\n                    $self->_write_line_EMBL_regex(\"DT   \",\"DT   \",\n                                                  $dt.\" (Rel. $cr, Created)\",\n                                                  '\\s+|$',80) if $ct == 1;\n                    $self->_write_line_EMBL_regex(\"DT   \",\"DT   \",\n                                                  $dt.\" (Rel. $ur, Last updated, Version $uv)\",\n                                                  '\\s+|$',80) if $ct == 2;\n                } else {        # other formats?\n                    $self->_write_line_EMBL_regex(\"DT   \",\"DT   \",\n                                                  $dt,'\\s+|$',80);\n                }\n                $switch =1;\n                $ct++;\n            }\n            if ($switch == 1) {\n                $self->_print(\"XX\\n\") || return;\n            }\n        }\n\n        # Description lines\n        $self->_write_line_EMBL_regex(\"DE   \",\"DE   \",$seq->desc(),'\\s+|$',80) || return; #'\n        $self->_print( \"XX\\n\") || return;\n\n        # if there, write the kw line\n        {\n            my( $kw );\n            if ( my $func = $self->_kw_generation_func ) {\n                $kw = &{$func}($seq);\n            } elsif ( $seq->can('keywords') ) {\n                $kw = $seq->keywords;\n            }\n            if (defined $kw) {\n                $self->_write_line_EMBL_regex(\"KW   \", \"KW   \", $kw, '\\s+|$', 80) || return; #'\n                $self->_print( \"XX\\n\") || return;\n            }\n        }\n\n        # Organism lines\n\n        if ($seq->can('species') && (my $spec = $seq->species)) {\n            my @class = $spec->classification();\n            shift @class;       # get rid of species name. Some embl files include\n                                # the species name in the OC lines, but this seems\n                                # more like an error than something we need to\n                                # emulate\n            my $OS = $spec->scientific_name;\n            if ($spec->common_name) {\n                $OS .= ' ('.$spec->common_name.')';\n            }\n            $self->_print(\"OS   $OS\\n\") || return;\n            my $OC = join('; ', reverse(@class)) .'.';\n            $self->_write_line_EMBL_regex(\"OC   \",\"OC   \",$OC,'; |$',80) || return;\n            if ($spec->organelle) {\n                $self->_write_line_EMBL_regex(\"OG   \",\"OG   \",$spec->organelle,'; |$',80) || return;\n            }\n            $self->_print(\"XX\\n\") || return;\n        }\n\n        # Reference lines\n        my $t = 1;\n        if ( $seq->can('annotation') && defined $seq->annotation ) {\n            foreach my $ref ( $seq->annotation->get_Annotations('reference') ) {\n                $self->_print( \"RN   [$t]\\n\") || return;\n\n                # Having no RP line is legal, but we need both\n                # start and end for a valid location.\n                if ($ref->comment) {\n                    $self->_write_line_EMBL_regex(\"RC   \", \"RC   \", $ref->comment, '\\s+|$', 80) || return; #'\n                }\n                my $start = $ref->start;\n                my $end   = $ref->end;\n                if ($start and $end) {\n                    $self->_print( \"RP   $start-$end\\n\") || return;\n                } elsif ($start or $end) {\n                    $self->throw(\"Both start and end are needed for a valid RP line.\".\n                                 \"  Got: start='$start' end='$end'\");\n                }\n\n                if (my $med = $ref->medline) {\n                    $self->_print( \"RX   MEDLINE; $med.\\n\") || return;\n                }\n                if (my $pm = $ref->pubmed) {\n                    $self->_print( \"RX   PUBMED; $pm.\\n\") || return;\n                }\n                my $authors = $ref->authors;\n                $authors =~ s/([\\w\\.]) (\\w)/$1#$2/g;  # add word wrap protection char '#'\n\n                $self->_write_line_EMBL_regex(\"RA   \", \"RA   \",\n                                              $authors . \";\",\n                                              '\\s+|$', 80) || return; #'\n\n                # If there is no title to the reference, it appears\n                # as a single semi-colon.  All titles must end in\n                # a semi-colon.\n                my $ref_title = $ref->title || '';\n                $ref_title =~ s/[\\s;]*$/;/;\n                $self->_write_line_EMBL_regex(\"RT   \", \"RT   \", $ref_title,    '\\s+|$', 80) || return; #'\n                $self->_write_line_EMBL_regex(\"RL   \", \"RL   \", $ref->location, '\\s+|$', 80) || return; #'\n                $self->_print(\"XX\\n\") || return;\n                $t++;\n            }\n\n            # DB Xref lines\n            if (my @db_xref = $seq->annotation->get_Annotations('dblink') ) {\n                for my $dr (@db_xref) {\n                    my $db_name = $dr->database;\n                    my $prim    = $dr->primary_id;\n\n                    my $opt     = $dr->optional_id || '';\n                    my $line = $opt ? \"$db_name; $prim; $opt.\" : \"$db_name; $prim.\";\n                    $self->_write_line_EMBL_regex(\"DR   \", \"DR   \", $line, '\\s+|$', 80) || return; #'\n                }\n                $self->_print(\"XX\\n\") || return;\n            }\n            \n            # Comment lines\n            foreach my $comment ( $seq->annotation->get_Annotations('comment') ) {\n                $self->_write_line_EMBL_regex(\"CC   \", \"CC   \", $comment->text, '\\s+|$', 80) || return; #'\n                $self->_print(\"XX\\n\") || return;\n            }\n        }\n        # \"\\\\s\\+\\|\\$\"\n\n        ## FEATURE TABLE\n\n        $self->_print(\"FH   Key             Location/Qualifiers\\n\") || return;\n        $self->_print(\"FH\\n\") || return;\n\n        my @feats = $seq->can('top_SeqFeatures') ? $seq->top_SeqFeatures : ();\n        if ($feats[0]) {\n            if ( defined $self->_post_sort ) {\n                # we need to read things into an array.\n                # Process. Sort them. Print 'em\n\n                my $post_sort_func = $self->_post_sort();\n                my @fth;\n\n                foreach my $sf ( @feats ) {\n                    push(@fth,Bio::SeqIO::FTHelper::from_SeqFeature($sf,$seq));\n                }\n\n                @fth = sort { &$post_sort_func($a,$b) } @fth;\n\n                foreach my $fth ( @fth ) {\n                    $self->_print_EMBL_FTHelper($fth) || return;\n                }\n            } else {\n                # not post sorted. And so we can print as we get them.\n                # lower memory load...\n\n                foreach my $sf ( @feats ) {\n                    my @fth = Bio::SeqIO::FTHelper::from_SeqFeature($sf,$seq);\n                    foreach my $fth ( @fth ) {\n                        if ( $fth->key eq 'CONTIG') {\n                            $self->_show_dna(0);\n                        }\n                        $self->_print_EMBL_FTHelper($fth) || return;\n                    }\n                }\n            }\n        }\n\n        if ( $self->_show_dna() == 0 ) {\n            $self->_print( \"//\\n\") || return;\n            return;\n        }\n        $self->_print( \"XX\\n\") || return;\n\n        # finished printing features.\n\n        $str =~ tr/A-Z/a-z/;\n\n        # Count each nucleotide\n        my $alen = $str =~ tr/a/a/;\n        my $clen = $str =~ tr/c/c/;\n        my $glen = $str =~ tr/g/g/;\n        my $tlen = $str =~ tr/t/t/;\n\n        my $len = $seq->length();\n        my $olen = $seq->length() - ($alen + $tlen + $clen + $glen);\n        if ( $olen < 0 ) {\n            $self->warn(\"Weird. More atgc than bases. Problem!\");\n        }\n\n        $self->_print(\"SQ   Sequence $len BP; $alen A; $clen C; $glen G; $tlen T; $olen other;\\n\") || return;\n\n        my $nuc = 60;       # Number of nucleotides per line\n        my $whole_pat = 'a10' x 6; # Pattern for unpacking a whole line\n        my $out_pat   = 'A11' x 6; # Pattern for packing a line\n        my $length = length($str);\n\n        # Calculate the number of nucleotides which fit on whole lines\n        my $whole = int($length / $nuc) * $nuc;\n\n        # Print the whole lines\n        my( $i );\n        for ($i = 0; $i < $whole; $i += $nuc) {\n            my $blocks = pack $out_pat,\n                unpack $whole_pat,\n                    substr($str, $i, $nuc);\n            $self->_print(sprintf(\"     $blocks%9d\\n\", $i + $nuc)) || return;\n        }\n\n        # Print the last line\n        if (my $last = substr($str, $i)) {\n            my $last_len = length($last);\n            my $last_pat = 'a10' x int($last_len / 10) .'a'. $last_len % 10;\n            my $blocks = pack $out_pat,\n                unpack($last_pat, $last);\n            $self->_print(sprintf(\"     $blocks%9d\\n\", $length)) ||\n                return;         # Add the length to the end\n        }\n\n        $self->_print( \"//\\n\") || return;\n\n        $self->flush if $self->_flush_on_write && defined $self->_fh;\n    }\n    return 1;\n}\n\n=head2 _print_EMBL_FTHelper\n\n Title   : _print_EMBL_FTHelper\n Usage   :\n Function: Internal function\n Returns : 1 if writing suceeded, otherwise undef\n Args    :\n\n\n\nsub _print_EMBL_FTHelper {\n    my ($self,$fth) = @_;\n\n    if ( ! ref $fth || ! $fth->isa('Bio::SeqIO::FTHelper') ) {\n        $fth->warn(\"$fth is not a FTHelper class. Attempting to print, but there could be tears!\");\n    }\n\n\n    #$self->_print( \"FH   Key             Location/Qualifiers\\n\");\n    #$self->_print( sprintf(\"FT   %-15s  %s\\n\",$fth->key,$fth->loc));\n    # let\n    if ( $fth->key eq 'CONTIG' ) {\n        $self->_print(\"XX\\n\") || return;\n        $self->_write_line_EMBL_regex(\"CO   \",\n                                      \"CO   \",$fth->loc,\n                                      '\\,|$',80) || return; #'\n        return 1;\n    }\n    $self->_write_line_EMBL_regex(sprintf(\"FT   %-15s \",$fth->key),\n                                  \"FT                   \",$fth->loc,\n                                  '\\,|$',80) || return; #'\n    foreach my $tag ( keys %{$fth->field} ) {\n        if ( ! defined $fth->field->{$tag} ) {\n            next;\n        }\n        foreach my $value ( @{$fth->field->{$tag}} ) {\n            $value =~ s/\\\"/\\\"\\\"/g;\n            if ($value eq \"_no_value\") {\n                $self->_write_line_EMBL_regex(\"FT                   \",\n                                              \"FT                   \",\n                                              \"/$tag\",'.|$',80) || return; #'\n            }\n            # there are almost 3x more quoted qualifier values and they\n            # are more common too so we take quoted ones first\n            elsif (!$FTQUAL_NO_QUOTE{$tag}) {\n                my $pat = $value =~ /\\s/ ? '\\s|\\-|$' : '.|\\-|$';\n                $self->_write_line_EMBL_regex(\"FT                   \",\n                                              \"FT                   \",\n                                              \"/$tag=\\\"$value\\\"\",$pat,80) || return;\n            } else {\n                $self->_write_line_EMBL_regex(\"FT                   \",\n                                              \"FT                   \",\n                                              \"/$tag=$value\",'.|$',80) || return; #'\n                                          }\n            }\n        }\n\n        return 1;\n    }\n\n#'\n=head2 _read_EMBL_References\n\n Title   : _read_EMBL_References\n Usage   :\n Function: Reads references from EMBL format. Internal function really\n Example :\n Returns :\n Args    :","parameters":[{"label":"$self"},{"label":"$buffer"}]},"name":"_read_EMBL_References","range":{"start":{"character":0,"line":965},"end":{"line":1019,"character":9999}}},{"line":971,"kind":12,"name":"buffer"},{"kind":12,"line":1002,"name":"Bio","containerName":"Annotation::Reference"},{"line":1016,"kind":12,"name":"buffer"},{"kind":12,"children":[{"name":"$self","definition":"my","containerName":"_read_EMBL_Species","localvar":"my","kind":13,"line":1034},{"name":"$buffer","containerName":"_read_EMBL_Species","kind":13,"line":1034},{"containerName":"_read_EMBL_Species","name":"$acc","line":1034,"kind":13},{"line":1035,"kind":13,"localvar":"my","containerName":"_read_EMBL_Species","name":"$org","definition":"my"},{"kind":13,"line":1038,"containerName":"_read_EMBL_Species","definition":"my","name":"$sub_species","localvar":"my"},{"line":1038,"kind":13,"containerName":"_read_EMBL_Species","name":"$species"},{"containerName":"_read_EMBL_Species","name":"$genus","kind":13,"line":1038},{"name":"$common","containerName":"_read_EMBL_Species","kind":13,"line":1038},{"kind":13,"line":1038,"containerName":"_read_EMBL_Species","name":"$sci_name"},{"containerName":"_read_EMBL_Species","name":"$class_lines","line":1038,"kind":13},{"containerName":"_read_EMBL_Species","name":"$self","line":1039,"kind":13},{"name":"_readline","containerName":"_read_EMBL_Species","kind":12,"line":1039},{"containerName":"_read_EMBL_Species","name":"$sci_name","kind":13,"line":1041},{"name":"$sci_name","containerName":"_read_EMBL_Species","kind":13,"line":1041},{"name":"$class_lines","containerName":"_read_EMBL_Species","kind":13,"line":1043},{"containerName":"_read_EMBL_Species","name":"$org","line":1045,"kind":13},{"kind":13,"line":1054,"name":"$self","containerName":"_read_EMBL_Species"},{"line":1054,"kind":12,"name":"_pushback","containerName":"_read_EMBL_Species"},{"name":"$sci_name","containerName":"_read_EMBL_Species","kind":13,"line":1056},{"name":"$sci_name","containerName":"_read_EMBL_Species","line":1057,"kind":13},{"kind":13,"line":1063,"containerName":"_read_EMBL_Species","definition":"my","name":"@class","localvar":"my"},{"containerName":"_read_EMBL_Species","name":"$class_lines","kind":13,"line":1063},{"kind":13,"line":1066,"name":"$possible_genus","definition":"my","containerName":"_read_EMBL_Species","localvar":"my"},{"line":1066,"kind":13,"containerName":"_read_EMBL_Species","name":"$class"},{"line":1067,"kind":13,"name":"$possible_genus","containerName":"_read_EMBL_Species"},{"containerName":"_read_EMBL_Species","name":"$class","kind":13,"line":1067},{"line":1068,"kind":13,"name":"$sci_name","containerName":"_read_EMBL_Species"},{"name":"$genus","containerName":"_read_EMBL_Species","line":1069,"kind":13},{"containerName":"_read_EMBL_Species","name":"$species","line":1070,"kind":13},{"name":"$sci_name","containerName":"_read_EMBL_Species","line":1070,"kind":13},{"containerName":"_read_EMBL_Species","name":"$species","kind":13,"line":1072},{"kind":13,"line":1072,"name":"$sci_name","containerName":"_read_EMBL_Species"},{"name":"$genus","containerName":"_read_EMBL_Species","line":1076,"kind":13},{"name":"$genus","containerName":"_read_EMBL_Species","kind":13,"line":1077},{"line":1082,"kind":13,"containerName":"_read_EMBL_Species","name":"$species"},{"line":1083,"kind":13,"containerName":"_read_EMBL_Species","name":"$species"},{"line":1083,"kind":13,"containerName":"_read_EMBL_Species","name":"$sub_species"},{"name":"$species","containerName":"_read_EMBL_Species","line":1083,"kind":13},{"kind":13,"line":1090,"name":"$class","containerName":"_read_EMBL_Species"},{"kind":13,"line":1091,"containerName":"_read_EMBL_Species","name":"$species"},{"kind":13,"line":1091,"name":"$common","containerName":"_read_EMBL_Species"},{"kind":13,"line":1091,"name":"$species","containerName":"_read_EMBL_Species"},{"containerName":"_read_EMBL_Species","name":"$sci_name","kind":13,"line":1092},{"containerName":"_read_EMBL_Species","name":"$common","line":1092,"kind":13},{"line":1096,"kind":13,"containerName":"_read_EMBL_Species","name":"$class"},{"containerName":"_read_EMBL_Species","name":"$sci_name","kind":13,"line":1096},{"name":"@class","containerName":"_read_EMBL_Species","kind":13,"line":1097},{"name":"$sci_name","containerName":"_read_EMBL_Species","line":1097,"kind":13},{"name":"@class","containerName":"_read_EMBL_Species","kind":13,"line":1099},{"line":1099,"kind":13,"name":"@class","containerName":"_read_EMBL_Species"},{"line":1104,"kind":13,"name":"$self","containerName":"_read_EMBL_Species"},{"containerName":"_read_EMBL_Species","name":"throw","kind":12,"line":1104},{"containerName":"_read_EMBL_Species","name":"$sci_name","line":1104,"kind":13},{"line":1105,"kind":13,"localvar":"my","containerName":"_read_EMBL_Species","name":"%names","definition":"my"},{"definition":"my","name":"$i","containerName":"_read_EMBL_Species","localvar":"my","kind":13,"line":1106},{"line":1107,"kind":13,"localvar":"my","name":"$name","definition":"my","containerName":"_read_EMBL_Species"},{"containerName":"_read_EMBL_Species","name":"$class","line":1107,"kind":13},{"name":"$i","containerName":"_read_EMBL_Species","line":1107,"kind":13},{"line":1108,"kind":13,"containerName":"_read_EMBL_Species","name":"$names"},{"name":"$name","containerName":"_read_EMBL_Species","kind":13,"line":1108},{"containerName":"_read_EMBL_Species","name":"$names","kind":13,"line":1109},{"line":1109,"kind":13,"name":"$name","containerName":"_read_EMBL_Species"},{"kind":13,"line":1109,"containerName":"_read_EMBL_Species","name":"$name"},{"kind":13,"line":1109,"containerName":"_read_EMBL_Species","name":"$class"},{"line":1109,"kind":13,"containerName":"_read_EMBL_Species","name":"$i"},{"line":1110,"kind":13,"name":"$self","containerName":"_read_EMBL_Species"},{"kind":12,"line":1110,"name":"throw","containerName":"_read_EMBL_Species"},{"line":1113,"kind":13,"localvar":"my","containerName":"_read_EMBL_Species","definition":"my","name":"$make"},{"line":1113,"kind":12,"containerName":"_read_EMBL_Species","name":"new"},{"kind":13,"line":1114,"containerName":"_read_EMBL_Species","name":"$make"},{"name":"scientific_name","containerName":"_read_EMBL_Species","kind":12,"line":1114},{"containerName":"_read_EMBL_Species","name":"$sci_name","line":1114,"kind":13},{"containerName":"_read_EMBL_Species","name":"$make","line":1115,"kind":13},{"containerName":"_read_EMBL_Species","name":"classification","kind":12,"line":1115},{"containerName":"_read_EMBL_Species","name":"@class","kind":13,"line":1115},{"containerName":"_read_EMBL_Species","name":"$class","line":1116,"kind":13},{"name":"$make","containerName":"_read_EMBL_Species","line":1117,"kind":13},{"line":1117,"kind":12,"name":"genus","containerName":"_read_EMBL_Species"},{"name":"$genus","containerName":"_read_EMBL_Species","line":1117,"kind":13},{"kind":13,"line":1117,"containerName":"_read_EMBL_Species","name":"$genus"},{"kind":13,"line":1118,"containerName":"_read_EMBL_Species","name":"$make"},{"containerName":"_read_EMBL_Species","name":"species","kind":12,"line":1118},{"name":"$species","containerName":"_read_EMBL_Species","kind":13,"line":1118},{"line":1118,"kind":13,"name":"$species","containerName":"_read_EMBL_Species"},{"name":"$make","containerName":"_read_EMBL_Species","line":1119,"kind":13},{"line":1119,"kind":12,"name":"sub_species","containerName":"_read_EMBL_Species"},{"containerName":"_read_EMBL_Species","name":"$sub_species","kind":13,"line":1119},{"name":"$sub_species","containerName":"_read_EMBL_Species","kind":13,"line":1119},{"line":1120,"kind":13,"containerName":"_read_EMBL_Species","name":"$make"},{"containerName":"_read_EMBL_Species","name":"common_name","line":1120,"kind":12},{"kind":13,"line":1120,"name":"$common","containerName":"_read_EMBL_Species"},{"name":"$common","containerName":"_read_EMBL_Species","line":1120,"kind":13},{"containerName":"_read_EMBL_Species","name":"$make","line":1122,"kind":13},{"containerName":"_read_EMBL_Species","name":"organelle","line":1122,"kind":12},{"name":"$org","containerName":"_read_EMBL_Species","line":1122,"kind":13},{"kind":13,"line":1122,"name":"$org","containerName":"_read_EMBL_Species"},{"name":"$make","containerName":"_read_EMBL_Species","kind":13,"line":1123}],"line":1033,"definition":"sub","containerName":"main::","signature":{"label":"_read_EMBL_Species($self,$buffer,$acc)","documentation":"1;\n# $Id: embl.pm 16123 2009-09-17 12:57:27Z cjfields $\n#\n# BioPerl module for Bio::SeqIO::EMBL\n#\n# Please direct questions and support issues to <bioperl-l@bioperl.org> \n#\n# Cared for by Ewan Birney <birney@ebi.ac.uk>\n#\n# Copyright Ewan Birney\n#\n# You may distribute this module under the same terms as perl itself\n\n# POD documentation - main docs before the code\n\n=head1 NAME\n\nBio::SeqIO::embl - EMBL sequence input/output stream\n\n=head1 SYNOPSIS\n\nIt is probably best not to use this object directly, but\nrather go through the SeqIO handler system. Go:\n\n    $stream = Bio::SeqIO->new(-file => $filename, -format => 'EMBL');\n\n    while ( (my $seq = $stream->next_seq()) ) {\n        # do something with $seq\n    }\n\n=head1 DESCRIPTION\n\nThis object can transform Bio::Seq objects to and from EMBL flat\nfile databases.\n\nThere is a lot of flexibility here about how to dump things which\nshould be documented more fully.\n\nThere should be a common object that this and Genbank share (probably\nwith Swissprot). Too much of the magic is identical.\n\n=head2 Optional functions\n\n=over 3\n\n* _show_dna()\n\n(output only) shows the dna or not\n\n* _post_sort()\n\n(output only) provides a sorting func which is applied to the FTHelpers\nbefore printing\n\n* _id_generation_func()\n\nThis is function which is called as\n\n   print \"ID   \", $func($annseq), \"\\n\";\n\nTo generate the ID line. If it is not there, it generates a sensible ID\nline using a number of tools.\n\nIf you want to output annotations in EMBL format they need to be\nstored in a Bio::Annotation::Collection object which is accessible\nthrough the Bio::SeqI interface method L<annotation()|annotation>.\n\nThe following are the names of the keys which are polled from a\nL<Bio::Annotation::Collection> object.\n\n reference  - Should contain Bio::Annotation::Reference objects\n comment    - Should contain Bio::Annotation::Comment objects\n dblink     - Should contain Bio::Annotation::DBLink objects\n\n\n=head1 FEEDBACK\n\n=head2 Mailing Lists\n\nUser feedback is an integral part of the evolution of this and other\nBioperl modules. Send your comments and suggestions preferably to one\nof the Bioperl mailing lists.  Your participation is much appreciated.\n\n  bioperl-l@bioperl.org                  - General discussion\n  http://bioperl.org/wiki/Mailing_lists  - About the mailing lists\n\n=head2 Support \n\nPlease direct usage questions or support issues to the mailing list:\n\nI<bioperl-l@bioperl.org>\n\nrather than to the module maintainer directly. Many experienced and \nreponsive experts will be able look at the problem and quickly \naddress it. Please include a thorough description of the problem \nwith code and data examples if at all possible.\n\n=head2 Reporting Bugs\n\nReport bugs to the Bioperl bug tracking system to help us keep track\nthe bugs and their resolution. Bug reports can be submitted via\nthe web:\n\n  http://bugzilla.open-bio.org/\n\n=head1 AUTHOR - Ewan Birney\n\nEmail birney@ebi.ac.uk\n\n=head1 APPENDIX\n\nThe rest of the documentation details each of the object\nmethods. Internal methods are usually preceded with a _\n\n\n\n# Let the code begin...\n\n\npackage Bio::SeqIO::embl;\nuse vars qw(%FTQUAL_NO_QUOTE);\nuse strict;\nuse Bio::SeqIO::FTHelper;\nuse Bio::SeqFeature::Generic;\nuse Bio::Species;\nuse Bio::Seq::SeqFactory;\nuse Bio::Annotation::Collection;\nuse Bio::Annotation::Comment;\nuse Bio::Annotation::Reference;\nuse Bio::Annotation::DBLink;\n\nuse base qw(Bio::SeqIO);\n\n%FTQUAL_NO_QUOTE=(\n                  'anticodon'=>1,\n                  'citation'=>1,\n                  'codon'=>1,\n                  'codon_start'=>1,\n                  'cons_splice'=>1,\n                  'direction'=>1,\n                  'evidence'=>1,\n                  'label'=>1,\n                  'mod_base'=> 1,\n                  'number'=> 1,\n                  'rpt_type'=> 1,\n                  'rpt_unit'=> 1,\n                  'transl_except'=> 1,\n                  'transl_table'=> 1,\n                  'usedin'=> 1,\n                 );\n\nsub _initialize {\n    my($self,@args) = @_;\n\n    $self->SUPER::_initialize(@args);\n    # hash for functions for decoding keys.\n    $self->{'_func_ftunit_hash'} = {};\n    # sets this to one by default. People can change it\n    $self->_show_dna(1);\n    if ( ! defined $self->sequence_factory ) {\n        $self->sequence_factory(Bio::Seq::SeqFactory->new\n                                (-verbose => $self->verbose(),\n                                 -type => 'Bio::Seq::RichSeq'));\n    }\n}\n\n=head2 next_seq\n\n Title   : next_seq\n Usage   : $seq = $stream->next_seq()\n Function: returns the next sequence in the stream\n Returns : Bio::Seq object\n Args    :\n\n\nsub next_seq {\n    my ($self,@args) = @_;\n    my ($pseq,$c,$line,$name,$desc,$acc,$seqc,$mol,$div,\n        $date, $comment, @date_arr);\n\n    my ($annotation, %params, @features) =\n        Bio::Annotation::Collection->new();\n\n    $line = $self->_readline;\n    # This needs to be before the first eof() test\n\n    if ( !defined $line ) {\n        return;                 # no throws - end of file\n    }\n\n    if ( $line =~ /^\\s+$/ ) {\n        while ( defined ($line = $self->_readline) ) {\n            $line =~/^\\S/ && last;\n        }\n        # return without error if the whole next sequence was just a single\n        # blank line and then eof\n        return unless $line;\n    }\n\n    # no ID as 1st non-blank line, need short circuit and exit routine\n    $self->throw(\"EMBL stream with no ID. Not embl in my book\")\n        unless $line =~ /^ID\\s+\\S+/;\n\n    # At this point we are sure that $line contains an ID header line\n    my $alphabet;\n    if ( $line =~ tr/;/;/ == 6) { # New style headers contain exactly six semicolons.\n\n        # New style header (EMBL Release >= 87, after June 2006)\n        my $topology;\n        my $sv;\n\n        # ID   DQ299383; SV 1; linear; mRNA; STD; MAM; 431 BP.\n        # This regexp comes from the new2old.pl conversion script, from EBI\n        if ($line =~ m/^ID   (\\w+);\\s+SV (\\d+); (\\w+); ([^;]+); (\\w{3}); (\\w{3}); (\\d+) BP./) {\n        ($name, $sv, $topology, $mol, $div) = ($1, $2, $3, $4, $6);\n        }\n        if (defined($sv)) {\n        $params{'-seq_version'} = $sv;\n        $params{'-version'} = $sv;\n        }\n\n        if ($topology eq \"circular\") {\n        $params{'-is_circular'} = 1;\n        }\n    \n    if (defined $mol ) {\n        if ($mol =~ /DNA/) {\n        $alphabet='dna';\n        } elsif ($mol =~ /RNA/) {\n        $alphabet='rna';\n        } elsif ($mol =~ /AA/) {\n        $alphabet='protein';\n        }\n    }\n    } else {\n    \n        # Old style header (EMBL Release < 87, before June 2006)\n        if ($line =~ /^ID\\s+(\\S+)[^;]*;\\s+(\\S+)[^;]*;\\s+(\\S+)[^;]*;/) {\n        ($name, $mol, $div) = ($1, $2, $3);\n        }\n    \n        if ($mol) {\n            if ( $mol =~ /circular/ ) {\n            $params{'-is_circular'} = 1;\n            $mol =~  s|circular ||;\n            }\n            if (defined $mol ) {\n            if ($mol =~ /DNA/) {\n                $alphabet='dna';\n            } elsif ($mol =~ /RNA/) {\n                $alphabet='rna';\n            } elsif ($mol =~ /AA/) {\n                $alphabet='protein';\n            }\n            }\n        }\n    }\n\n    unless( defined $name && length($name) ) {\n    $name = \"unknown_id\";\n    }\n\n    # $self->warn(\"not parsing upper annotation in EMBL file yet!\");\n    my $buffer = $line;\n    local $_;\n    BEFORE_FEATURE_TABLE :\n          until ( !defined $buffer ) {\n              $_ = $buffer;\n              # Exit at start of Feature table\n              if ( /^(F[HT]|SQ)/ ) {\n                  $self->_pushback($_) if( $1 eq 'SQ' || $1 eq 'FT');\n                  last;\n              }\n              # Description line(s)\n              if (/^DE\\s+(\\S.*\\S)/) {\n                  $desc .= $desc ? \" $1\" : $1;\n              }\n\n              #accession number\n              if ( /^AC\\s+(.*)?/ || /^PA\\s+(.*)?/) {\n                  my @accs = split(/[; ]+/, $1); # allow space in addition\n                  $params{'-accession_number'} = shift @accs\n                      unless defined $params{'-accession_number'};\n                  push @{$params{'-secondary_accessions'}}, @accs;\n              }\n\n              #version number\n              if ( /^SV\\s+\\S+\\.(\\d+);?/ ) {\n                  my $sv = $1;\n                  #$sv =~ s/\\;//;\n                  $params{'-seq_version'} = $sv;\n                  $params{'-version'} = $sv;\n              }\n\n              #date (NOTE: takes last date line)\n              if ( /^DT\\s+(.+)$/ ) {\n                  my $line = $1;\n                  my ($date, $version) = split(' ', $line, 2);\n                  $date =~ tr/,//d; # remove comma if new version\n                  if ($version =~ /\\(Rel\\. (\\d+), Created\\)/xms ) {\n                      my $release = Bio::Annotation::SimpleValue->new(\n                                                                      -tagname    => 'creation_release',\n                                                                      -value      => $1\n                                                                     );\n                      $annotation->add_Annotation($release);\n                  } elsif ($version =~ /\\(Rel\\. (\\d+), Last updated, Version (\\d+)\\)/xms ) {\n                      my $release = Bio::Annotation::SimpleValue->new(\n                                                                      -tagname    => 'update_release',\n                                                                      -value      => $1\n                                                                     );\n                      $annotation->add_Annotation($release);\n\n                      my $update = Bio::Annotation::SimpleValue->new(\n                                                                     -tagname    => 'update_version',\n                                                                     -value      => $2\n                                                                    );\n                      $annotation->add_Annotation($update);\n                  }\n                  push @{$params{'-dates'}}, $date;\n              }\n\n              #keywords\n              if ( /^KW   (.*)\\S*$/ ) {\n                  my @kw = split(/\\s*\\;\\s*/,$1);\n                  push @{$params{'-keywords'}}, @kw;\n              }\n\n              # Organism name and phylogenetic information\n              elsif (/^O[SC]/) {\n                  # pass the accession number so we can give an informative throw message if necessary\n                  my $species = $self->_read_EMBL_Species(\\$buffer, $params{'-accession_number'});\n                  $params{'-species'}= $species;\n              }\n\n              # NCBI TaxID Xref\n              elsif (/^OX/) {\n                  my @links = $self->_read_EMBL_TaxID_DBLink(\\$buffer);\n                  foreach my $dblink ( @links ) {\n                      $annotation->add_Annotation('dblink',$dblink);\n                  }\n              }\n\n              # References\n              elsif (/^R/) {\n                  my @refs = $self->_read_EMBL_References(\\$buffer);\n                  foreach my $ref ( @refs ) {\n                      $annotation->add_Annotation('reference',$ref);\n                  }\n              }\n\n              # DB Xrefs\n              elsif (/^DR/) {\n                  my @links = $self->_read_EMBL_DBLink(\\$buffer);\n                  foreach my $dblink ( @links ) {\n                      $annotation->add_Annotation('dblink',$dblink);\n                  }\n              }\n\n              # Comments\n              elsif (/^CC\\s+(.*)/) {\n                  $comment .= $1;\n                  $comment .= \" \";\n                  while (defined ($_ = $self->_readline) ) {\n                      if (/^CC\\s+(.*)/) {\n                          $comment .= $1;\n                          $comment .= \" \";\n                      } else {\n                          last;\n                      }\n                  }\n                  my $commobj = Bio::Annotation::Comment->new();\n                  $commobj->text($comment);\n                  $annotation->add_Annotation('comment',$commobj);\n                  $comment = \"\";\n              }\n\n              # Get next line.\n              $buffer = $self->_readline;\n          }\n\n    while ( defined ($_ = $self->_readline) ) {\n        /^FT\\s{3}\\w/ && last;\n        /^SQ / && last;\n        /^CO / && last;\n    }\n    $buffer = $_;\n\n    if (defined($buffer) && $buffer =~ /^FT /) {\n        until ( !defined ($buffer) ) {\n            my $ftunit = $self->_read_FTHelper_EMBL(\\$buffer);\n\n            # process ftunit\n            my $feat =\n                $ftunit->_generic_seqfeature($self->location_factory(), $name);\n\n            # add taxon_id from source if available\n            if ($params{'-species'} && ($feat->primary_tag eq 'source')\n                && $feat->has_tag('db_xref')\n                && (! $params{'-species'}->ncbi_taxid())) {\n                foreach my $tagval ($feat->get_tag_values('db_xref')) {\n                    if (index($tagval,\"taxon:\") == 0) {\n                        $params{'-species'}->ncbi_taxid(substr($tagval,6));\n                        last;\n                    }\n                }\n            }\n\n            # add feature to list of features\n            push(@features, $feat);\n\n            if ( $buffer !~ /^FT/ ) {\n                last;\n            }\n        }\n    }\n    # skip comments\n    while ( defined ($buffer) && $buffer =~ /^XX/ ) {\n        $buffer = $self->_readline();\n    }\n\n    if ( $buffer =~ /^CO/  ) {\n        until ( !defined ($buffer) ) {\n            my $ftunit = $self->_read_FTHelper_EMBL(\\$buffer);\n            # process ftunit\n            push(@features,\n                 $ftunit->_generic_seqfeature($self->location_factory(),\n                                              $name));\n\n            if ( $buffer !~ /^CO/ ) {\n                last;\n            }\n        }\n    }\n    if ( $buffer !~ /^SQ/  ) {\n        while ( defined ($_ = $self->_readline) ) {\n            /^SQ/ && last;\n        }\n    }\n    $seqc = \"\";\n    while ( defined ($_ = $self->_readline) ) {\n        m{^//} && last;\n        $_ = uc($_);\n        s/[^A-Za-z]//g;\n        $seqc .= $_;\n    }\n    my $seq = $self->sequence_factory->create\n        (-verbose => $self->verbose(),\n         -division => $div,\n         -seq => $seqc,\n         -desc => $desc,\n         -display_id => $name,\n         -annotation => $annotation,\n         -molecule => $mol,\n         -alphabet => $alphabet,\n         -features => \\@features,\n         %params);\n    return $seq;\n}\n\n\n\n=head2 _write_ID_line\n\n Title   : _write_ID_line\n Usage   : $self->_write_ID_line($seq);\n Function: Writes the EMBL Release 87 format ID line to the stream, unless\n         : there is a user-supplied ID line generation function in which\n         : case that is used instead.\n         : ( See Bio::SeqIO::embl::_id_generation_function(). )\n Returns : nothing\n Args    : Bio::Seq object\n\n\nsub _write_ID_line {\n\n    my ($self, $seq) = @_;\n\n    my $id_line;\n    # If there is a user-supplied ID generation function, use it.\n    if ( $self->_id_generation_func ) {\n        $id_line = \"ID   \" . &{$self->_id_generation_func}($seq) . \"\\nXX\\n\";\n    }\n    # Otherwise, generate a standard EMBL release 87 (June 2006) ID line.\n    else {\n\n        # The sequence name is supposed to be the primary accession number,\n        my $name = $seq->accession_number();\n        if (!$name) {\n            # but if it is not present, use the sequence ID.\n            $name = $seq->id();\n        }\n\n        $self->warn(\"No whitespace allowed in EMBL id [\". $name. \"]\") if $name =~ /\\s/;\n\n        # Use the sequence version, or default to 1.\n        my $version = $seq->version() || 1;\n\n        my $len = $seq->length();\n\n        # Taxonomic division.\n        my $div;\n        if ( $seq->can('division') && defined($seq->division) &&\n             $self->_is_valid_division($seq->division) ) {\n            $div = $seq->division();\n        } else {\n            $div ||= 'UNC';     # 'UNC' is the EMBL division code for 'unclassified'.\n        }\n\n        my $mol;\n        # If the molecule type is a valid EMBL type, use it.\n        if (  $seq->can('molecule')\n              && defined($seq->molecule)\n              && $self->_is_valid_molecule_type($seq->molecule)\n           ) {\n            $mol = $seq->molecule();\n        }\n        # Otherwise, choose unassigned DNA or RNA based on the alphabet.\n        elsif ($seq->can('primary_seq') && defined $seq->primary_seq->alphabet) {\n            my $alphabet =$seq->primary_seq->alphabet;\n            if ($alphabet eq 'dna') {\n                $mol ='unassigned DNA';\n            } elsif ($alphabet eq 'rna') {\n                $mol='unassigned RNA';\n            } elsif ($alphabet eq 'protein') {\n                $self->warn(\"Protein sequence found; EMBL is a nucleotide format.\");\n                $mol='AA';  # AA is not a valid EMBL molecule type.\n            }\n        }\n\n        my $topology = 'linear';\n        if ($seq->is_circular) {\n            $topology = 'circular';\n        }\n\n        $mol ||= '';            # 'unassigned'; ?\n        $id_line = \"ID   $name; SV $version; $topology; $mol; STD; $div; $len BP.\\nXX\\n\";\n        $self->_print($id_line);\n    }\n}\n\n=head2 _is_valid_division\n\n Title   : _is_valid_division\n Usage   : $self->_is_valid_division($div)\n Function: tests division code for validity\n Returns : true if $div is a valid EMBL release 87 taxonomic division.\n Args    : taxonomic division code string\n\n\nsub _is_valid_division {\n    my ($self, $division) = @_;\n\n    my %EMBL_divisions = (\n                          \"PHG\"    => 1, # Bacteriophage\n                          \"ENV\"    => 1, # Environmental Sample\n                          \"FUN\"    => 1, # Fungal\n                          \"HUM\"    => 1, # Human\n                          \"INV\"    => 1, # Invertebrate\n                          \"MAM\"    => 1, # Other Mammal\n                          \"VRT\"    => 1, # Other Vertebrate\n                          \"MUS\"    => 1, # Mus musculus\n                          \"PLN\"    => 1, # Plant\n                          \"PRO\"    => 1, # Prokaryote\n                          \"ROD\"    => 1, # Other Rodent\n                          \"SYN\"    => 1, # Synthetic\n                          \"UNC\"    => 1, # Unclassified\n                          \"VRL\"    => 1 # Viral\n                         );\n\n    return exists($EMBL_divisions{$division});\n}\n\n=head2 _is_valid_molecule_type\n\n Title   : _is_valid_molecule_type\n Usage   : $self->_is_valid_molecule_type($mol)\n Function: tests molecule type for validity\n Returns : true if $mol is a valid EMBL release 87 molecule type.\n Args    : molecule type string\n\n\nsub _is_valid_molecule_type {\n    my ($self, $moltype) = @_;\n\n    my %EMBL_molecule_types = (\n                               \"genomic DNA\"    => 1,\n                               \"genomic RNA\"    => 1,\n                               \"mRNA\"           => 1,\n                               \"tRNA\"           => 1,\n                               \"rRNA\"           => 1,\n                               \"snoRNA\"         => 1,\n                               \"snRNA\"          => 1,\n                               \"scRNA\"          => 1,\n                               \"pre-RNA\"        => 1,\n                               \"other RNA\"      => 1,\n                               \"other DNA\"      => 1,\n                               \"unassigned DNA\" => 1,\n                               \"unassigned RNA\" => 1\n                              );\n\n    return exists($EMBL_molecule_types{$moltype});\n}\n\n=head2 write_seq\n\n Title   : write_seq\n Usage   : $stream->write_seq($seq)\n Function: writes the $seq object (must be seq) to the stream\n Returns : 1 for success and undef for error\n Args    : array of 1 to n Bio::SeqI objects\n\n\n\nsub write_seq {\n    my ($self,@seqs) = @_;\n\n    foreach my $seq ( @seqs ) {\n        $self->throw(\"Attempting to write with no seq!\") unless defined $seq;\n        unless ( ref $seq && $seq->isa('Bio::SeqI' ) ) {\n            $self->warn(\"$seq is not a SeqI compliant sequence object!\")\n                if $self->verbose >= 0;\n            unless ( ref $seq && $seq->isa('Bio::PrimarySeqI' ) ) {\n                $self->throw(\"$seq is not a PrimarySeqI compliant sequence object!\");\n            }\n        }\n        my $str = $seq->seq || '';\n\n        # Write the ID line.\n        $self->_write_ID_line($seq);\n\n\n        # Write the accession line if present\n        my( $acc );\n        {\n            if ( my $func = $self->_ac_generation_func ) {\n                $acc = &{$func}($seq);\n            } elsif ( $seq->isa('Bio::Seq::RichSeqI') &&\n                      defined($seq->accession_number) ) {\n                $acc = $seq->accession_number;\n                $acc = join(\"; \", $acc, $seq->get_secondary_accessions);\n            } elsif ( $seq->can('accession_number') ) {\n                $acc = $seq->accession_number;\n            }\n\n            if (defined $acc) {\n                $self->_print(\"AC   $acc;\\n\",\n                              \"XX\\n\") || return;\n            }\n        }\n\n        # Date lines\n        my $switch=0;\n        if ( $seq->can('get_dates') ) {\n            my @dates =  $seq->get_dates();\n            my $ct = 1;\n            my $date_flag = 0;\n            my ($cr) = $seq->annotation->get_Annotations(\"creation_release\");\n            my ($ur) = $seq->annotation->get_Annotations(\"update_release\");\n            my ($uv) = $seq->annotation->get_Annotations(\"update_version\");\n\n            unless ($cr && $ur && $ur) {\n                $date_flag = 1;\n            }\n\n            foreach my $dt (@dates) {\n                if (!$date_flag) {\n                    $self->_write_line_EMBL_regex(\"DT   \",\"DT   \",\n                                                  $dt.\" (Rel. $cr, Created)\",\n                                                  '\\s+|$',80) if $ct == 1;\n                    $self->_write_line_EMBL_regex(\"DT   \",\"DT   \",\n                                                  $dt.\" (Rel. $ur, Last updated, Version $uv)\",\n                                                  '\\s+|$',80) if $ct == 2;\n                } else {        # other formats?\n                    $self->_write_line_EMBL_regex(\"DT   \",\"DT   \",\n                                                  $dt,'\\s+|$',80);\n                }\n                $switch =1;\n                $ct++;\n            }\n            if ($switch == 1) {\n                $self->_print(\"XX\\n\") || return;\n            }\n        }\n\n        # Description lines\n        $self->_write_line_EMBL_regex(\"DE   \",\"DE   \",$seq->desc(),'\\s+|$',80) || return; #'\n        $self->_print( \"XX\\n\") || return;\n\n        # if there, write the kw line\n        {\n            my( $kw );\n            if ( my $func = $self->_kw_generation_func ) {\n                $kw = &{$func}($seq);\n            } elsif ( $seq->can('keywords') ) {\n                $kw = $seq->keywords;\n            }\n            if (defined $kw) {\n                $self->_write_line_EMBL_regex(\"KW   \", \"KW   \", $kw, '\\s+|$', 80) || return; #'\n                $self->_print( \"XX\\n\") || return;\n            }\n        }\n\n        # Organism lines\n\n        if ($seq->can('species') && (my $spec = $seq->species)) {\n            my @class = $spec->classification();\n            shift @class;       # get rid of species name. Some embl files include\n                                # the species name in the OC lines, but this seems\n                                # more like an error than something we need to\n                                # emulate\n            my $OS = $spec->scientific_name;\n            if ($spec->common_name) {\n                $OS .= ' ('.$spec->common_name.')';\n            }\n            $self->_print(\"OS   $OS\\n\") || return;\n            my $OC = join('; ', reverse(@class)) .'.';\n            $self->_write_line_EMBL_regex(\"OC   \",\"OC   \",$OC,'; |$',80) || return;\n            if ($spec->organelle) {\n                $self->_write_line_EMBL_regex(\"OG   \",\"OG   \",$spec->organelle,'; |$',80) || return;\n            }\n            $self->_print(\"XX\\n\") || return;\n        }\n\n        # Reference lines\n        my $t = 1;\n        if ( $seq->can('annotation') && defined $seq->annotation ) {\n            foreach my $ref ( $seq->annotation->get_Annotations('reference') ) {\n                $self->_print( \"RN   [$t]\\n\") || return;\n\n                # Having no RP line is legal, but we need both\n                # start and end for a valid location.\n                if ($ref->comment) {\n                    $self->_write_line_EMBL_regex(\"RC   \", \"RC   \", $ref->comment, '\\s+|$', 80) || return; #'\n                }\n                my $start = $ref->start;\n                my $end   = $ref->end;\n                if ($start and $end) {\n                    $self->_print( \"RP   $start-$end\\n\") || return;\n                } elsif ($start or $end) {\n                    $self->throw(\"Both start and end are needed for a valid RP line.\".\n                                 \"  Got: start='$start' end='$end'\");\n                }\n\n                if (my $med = $ref->medline) {\n                    $self->_print( \"RX   MEDLINE; $med.\\n\") || return;\n                }\n                if (my $pm = $ref->pubmed) {\n                    $self->_print( \"RX   PUBMED; $pm.\\n\") || return;\n                }\n                my $authors = $ref->authors;\n                $authors =~ s/([\\w\\.]) (\\w)/$1#$2/g;  # add word wrap protection char '#'\n\n                $self->_write_line_EMBL_regex(\"RA   \", \"RA   \",\n                                              $authors . \";\",\n                                              '\\s+|$', 80) || return; #'\n\n                # If there is no title to the reference, it appears\n                # as a single semi-colon.  All titles must end in\n                # a semi-colon.\n                my $ref_title = $ref->title || '';\n                $ref_title =~ s/[\\s;]*$/;/;\n                $self->_write_line_EMBL_regex(\"RT   \", \"RT   \", $ref_title,    '\\s+|$', 80) || return; #'\n                $self->_write_line_EMBL_regex(\"RL   \", \"RL   \", $ref->location, '\\s+|$', 80) || return; #'\n                $self->_print(\"XX\\n\") || return;\n                $t++;\n            }\n\n            # DB Xref lines\n            if (my @db_xref = $seq->annotation->get_Annotations('dblink') ) {\n                for my $dr (@db_xref) {\n                    my $db_name = $dr->database;\n                    my $prim    = $dr->primary_id;\n\n                    my $opt     = $dr->optional_id || '';\n                    my $line = $opt ? \"$db_name; $prim; $opt.\" : \"$db_name; $prim.\";\n                    $self->_write_line_EMBL_regex(\"DR   \", \"DR   \", $line, '\\s+|$', 80) || return; #'\n                }\n                $self->_print(\"XX\\n\") || return;\n            }\n            \n            # Comment lines\n            foreach my $comment ( $seq->annotation->get_Annotations('comment') ) {\n                $self->_write_line_EMBL_regex(\"CC   \", \"CC   \", $comment->text, '\\s+|$', 80) || return; #'\n                $self->_print(\"XX\\n\") || return;\n            }\n        }\n        # \"\\\\s\\+\\|\\$\"\n\n        ## FEATURE TABLE\n\n        $self->_print(\"FH   Key             Location/Qualifiers\\n\") || return;\n        $self->_print(\"FH\\n\") || return;\n\n        my @feats = $seq->can('top_SeqFeatures') ? $seq->top_SeqFeatures : ();\n        if ($feats[0]) {\n            if ( defined $self->_post_sort ) {\n                # we need to read things into an array.\n                # Process. Sort them. Print 'em\n\n                my $post_sort_func = $self->_post_sort();\n                my @fth;\n\n                foreach my $sf ( @feats ) {\n                    push(@fth,Bio::SeqIO::FTHelper::from_SeqFeature($sf,$seq));\n                }\n\n                @fth = sort { &$post_sort_func($a,$b) } @fth;\n\n                foreach my $fth ( @fth ) {\n                    $self->_print_EMBL_FTHelper($fth) || return;\n                }\n            } else {\n                # not post sorted. And so we can print as we get them.\n                # lower memory load...\n\n                foreach my $sf ( @feats ) {\n                    my @fth = Bio::SeqIO::FTHelper::from_SeqFeature($sf,$seq);\n                    foreach my $fth ( @fth ) {\n                        if ( $fth->key eq 'CONTIG') {\n                            $self->_show_dna(0);\n                        }\n                        $self->_print_EMBL_FTHelper($fth) || return;\n                    }\n                }\n            }\n        }\n\n        if ( $self->_show_dna() == 0 ) {\n            $self->_print( \"//\\n\") || return;\n            return;\n        }\n        $self->_print( \"XX\\n\") || return;\n\n        # finished printing features.\n\n        $str =~ tr/A-Z/a-z/;\n\n        # Count each nucleotide\n        my $alen = $str =~ tr/a/a/;\n        my $clen = $str =~ tr/c/c/;\n        my $glen = $str =~ tr/g/g/;\n        my $tlen = $str =~ tr/t/t/;\n\n        my $len = $seq->length();\n        my $olen = $seq->length() - ($alen + $tlen + $clen + $glen);\n        if ( $olen < 0 ) {\n            $self->warn(\"Weird. More atgc than bases. Problem!\");\n        }\n\n        $self->_print(\"SQ   Sequence $len BP; $alen A; $clen C; $glen G; $tlen T; $olen other;\\n\") || return;\n\n        my $nuc = 60;       # Number of nucleotides per line\n        my $whole_pat = 'a10' x 6; # Pattern for unpacking a whole line\n        my $out_pat   = 'A11' x 6; # Pattern for packing a line\n        my $length = length($str);\n\n        # Calculate the number of nucleotides which fit on whole lines\n        my $whole = int($length / $nuc) * $nuc;\n\n        # Print the whole lines\n        my( $i );\n        for ($i = 0; $i < $whole; $i += $nuc) {\n            my $blocks = pack $out_pat,\n                unpack $whole_pat,\n                    substr($str, $i, $nuc);\n            $self->_print(sprintf(\"     $blocks%9d\\n\", $i + $nuc)) || return;\n        }\n\n        # Print the last line\n        if (my $last = substr($str, $i)) {\n            my $last_len = length($last);\n            my $last_pat = 'a10' x int($last_len / 10) .'a'. $last_len % 10;\n            my $blocks = pack $out_pat,\n                unpack($last_pat, $last);\n            $self->_print(sprintf(\"     $blocks%9d\\n\", $length)) ||\n                return;         # Add the length to the end\n        }\n\n        $self->_print( \"//\\n\") || return;\n\n        $self->flush if $self->_flush_on_write && defined $self->_fh;\n    }\n    return 1;\n}\n\n=head2 _print_EMBL_FTHelper\n\n Title   : _print_EMBL_FTHelper\n Usage   :\n Function: Internal function\n Returns : 1 if writing suceeded, otherwise undef\n Args    :\n\n\n\nsub _print_EMBL_FTHelper {\n    my ($self,$fth) = @_;\n\n    if ( ! ref $fth || ! $fth->isa('Bio::SeqIO::FTHelper') ) {\n        $fth->warn(\"$fth is not a FTHelper class. Attempting to print, but there could be tears!\");\n    }\n\n\n    #$self->_print( \"FH   Key             Location/Qualifiers\\n\");\n    #$self->_print( sprintf(\"FT   %-15s  %s\\n\",$fth->key,$fth->loc));\n    # let\n    if ( $fth->key eq 'CONTIG' ) {\n        $self->_print(\"XX\\n\") || return;\n        $self->_write_line_EMBL_regex(\"CO   \",\n                                      \"CO   \",$fth->loc,\n                                      '\\,|$',80) || return; #'\n        return 1;\n    }\n    $self->_write_line_EMBL_regex(sprintf(\"FT   %-15s \",$fth->key),\n                                  \"FT                   \",$fth->loc,\n                                  '\\,|$',80) || return; #'\n    foreach my $tag ( keys %{$fth->field} ) {\n        if ( ! defined $fth->field->{$tag} ) {\n            next;\n        }\n        foreach my $value ( @{$fth->field->{$tag}} ) {\n            $value =~ s/\\\"/\\\"\\\"/g;\n            if ($value eq \"_no_value\") {\n                $self->_write_line_EMBL_regex(\"FT                   \",\n                                              \"FT                   \",\n                                              \"/$tag\",'.|$',80) || return; #'\n            }\n            # there are almost 3x more quoted qualifier values and they\n            # are more common too so we take quoted ones first\n            elsif (!$FTQUAL_NO_QUOTE{$tag}) {\n                my $pat = $value =~ /\\s/ ? '\\s|\\-|$' : '.|\\-|$';\n                $self->_write_line_EMBL_regex(\"FT                   \",\n                                              \"FT                   \",\n                                              \"/$tag=\\\"$value\\\"\",$pat,80) || return;\n            } else {\n                $self->_write_line_EMBL_regex(\"FT                   \",\n                                              \"FT                   \",\n                                              \"/$tag=$value\",'.|$',80) || return; #'\n                                          }\n            }\n        }\n\n        return 1;\n    }\n\n#'\n=head2 _read_EMBL_References\n\n Title   : _read_EMBL_References\n Usage   :\n Function: Reads references from EMBL format. Internal function really\n Example :\n Returns :\n Args    :\n\n\n\nsub _read_EMBL_References {\n    my ($self,$buffer) = @_;\n    my (@refs);\n\n    # assume things are starting with RN\n\n    if ( $$buffer !~ /^RN/ ) {\n        warn(\"Not parsing line '$$buffer' which maybe important\");\n    }\n    my $b1;\n    my $b2;\n    my $title;\n    my $loc;\n    my $au;\n    my $med;\n    my $pm;\n    my $com;\n\n    while ( defined ($_ = $self->_readline) ) {\n        /^R/ || last;\n        /^RP   (\\d+)-(\\d+)/ && do {$b1=$1;$b2=$2;};\n        /^RX   MEDLINE;\\s+(\\d+)/ && do {$med=$1};\n        /^RX   PUBMED;\\s+(\\d+)/ && do {$pm=$1};\n        /^RA   (.*)/ && do {\n            $au = $self->_concatenate_lines($au,$1); next;\n        };\n        /^RT   (.*)/ && do {\n            $title = $self->_concatenate_lines($title,$1); next;\n        };\n        /^RL   (.*)/ && do {\n            $loc = $self->_concatenate_lines($loc,$1); next;\n        };\n        /^RC   (.*)/ && do {\n            $com = $self->_concatenate_lines($com,$1); next;\n        };\n    }\n\n    my $ref = Bio::Annotation::Reference->new();\n    $au =~ s/;\\s*$//g;\n    $title =~ s/;\\s*$//g;\n\n    $ref->start($b1);\n    $ref->end($b2);\n    $ref->authors($au);\n    $ref->title($title);\n    $ref->location($loc);\n    $ref->medline($med);\n    $ref->comment($com);\n    $ref->pubmed($pm);\n\n    push(@refs,$ref);\n    $$buffer = $_;\n\n    return @refs;\n}\n\n=head2 _read_EMBL_Species\n\n Title   : _read_EMBL_Species\n Usage   :\n Function: Reads the EMBL Organism species and classification\n           lines.\n Example :\n Returns : A Bio::Species object\n Args    : a reference to the current line buffer, accession number","parameters":[{"label":"$self"},{"label":"$buffer"},{"label":"$acc"}]},"detail":"($self,$buffer,$acc)","name":"_read_EMBL_Species","range":{"start":{"character":0,"line":1033},"end":{"line":1124,"character":9999}}},{"kind":12,"line":1037,"name":"buffer"},{"name":"class","line":1106,"kind":12},{"line":1113,"kind":12,"name":"Bio","containerName":"Species"},{"name":"_read_EMBL_DBLink","range":{"end":{"line":1158,"character":9999},"start":{"character":0,"line":1137}},"kind":12,"children":[{"localvar":"my","definition":"my","name":"$self","containerName":"_read_EMBL_DBLink","line":1138,"kind":13},{"line":1138,"kind":13,"name":"$buffer","containerName":"_read_EMBL_DBLink"},{"localvar":"my","containerName":"_read_EMBL_DBLink","name":"@db_link","definition":"my","line":1139,"kind":13},{"line":1142,"kind":13,"containerName":"_read_EMBL_DBLink","name":"$self"},{"containerName":"_read_EMBL_DBLink","name":"_readline","line":1142,"kind":12},{"kind":13,"line":1144,"containerName":"_read_EMBL_DBLink","definition":"my","name":"$databse","localvar":"my"},{"line":1144,"kind":13,"name":"$prim_id","containerName":"_read_EMBL_DBLink"},{"kind":13,"line":1144,"name":"$sec_id","containerName":"_read_EMBL_DBLink"},{"name":"$link","definition":"my","containerName":"_read_EMBL_DBLink","localvar":"my","kind":13,"line":1145},{"line":1145,"kind":12,"name":"new","containerName":"_read_EMBL_DBLink"},{"line":1145,"kind":13,"name":"$databse","containerName":"_read_EMBL_DBLink"},{"line":1146,"kind":13,"containerName":"_read_EMBL_DBLink","name":"$prim_id"},{"name":"$sec_id","containerName":"_read_EMBL_DBLink","kind":13,"line":1147},{"name":"@db_link","containerName":"_read_EMBL_DBLink","line":1149,"kind":13},{"name":"$link","containerName":"_read_EMBL_DBLink","line":1149,"kind":13},{"name":"@db_link","containerName":"_read_EMBL_DBLink","kind":13,"line":1157}],"line":1137,"definition":"sub","containerName":"main::","signature":{"label":"_read_EMBL_DBLink($self,$buffer)","documentation":"1;\n# $Id: embl.pm 16123 2009-09-17 12:57:27Z cjfields $\n#\n# BioPerl module for Bio::SeqIO::EMBL\n#\n# Please direct questions and support issues to <bioperl-l@bioperl.org> \n#\n# Cared for by Ewan Birney <birney@ebi.ac.uk>\n#\n# Copyright Ewan Birney\n#\n# You may distribute this module under the same terms as perl itself\n\n# POD documentation - main docs before the code\n\n=head1 NAME\n\nBio::SeqIO::embl - EMBL sequence input/output stream\n\n=head1 SYNOPSIS\n\nIt is probably best not to use this object directly, but\nrather go through the SeqIO handler system. Go:\n\n    $stream = Bio::SeqIO->new(-file => $filename, -format => 'EMBL');\n\n    while ( (my $seq = $stream->next_seq()) ) {\n        # do something with $seq\n    }\n\n=head1 DESCRIPTION\n\nThis object can transform Bio::Seq objects to and from EMBL flat\nfile databases.\n\nThere is a lot of flexibility here about how to dump things which\nshould be documented more fully.\n\nThere should be a common object that this and Genbank share (probably\nwith Swissprot). Too much of the magic is identical.\n\n=head2 Optional functions\n\n=over 3\n\n* _show_dna()\n\n(output only) shows the dna or not\n\n* _post_sort()\n\n(output only) provides a sorting func which is applied to the FTHelpers\nbefore printing\n\n* _id_generation_func()\n\nThis is function which is called as\n\n   print \"ID   \", $func($annseq), \"\\n\";\n\nTo generate the ID line. If it is not there, it generates a sensible ID\nline using a number of tools.\n\nIf you want to output annotations in EMBL format they need to be\nstored in a Bio::Annotation::Collection object which is accessible\nthrough the Bio::SeqI interface method L<annotation()|annotation>.\n\nThe following are the names of the keys which are polled from a\nL<Bio::Annotation::Collection> object.\n\n reference  - Should contain Bio::Annotation::Reference objects\n comment    - Should contain Bio::Annotation::Comment objects\n dblink     - Should contain Bio::Annotation::DBLink objects\n\n\n=head1 FEEDBACK\n\n=head2 Mailing Lists\n\nUser feedback is an integral part of the evolution of this and other\nBioperl modules. Send your comments and suggestions preferably to one\nof the Bioperl mailing lists.  Your participation is much appreciated.\n\n  bioperl-l@bioperl.org                  - General discussion\n  http://bioperl.org/wiki/Mailing_lists  - About the mailing lists\n\n=head2 Support \n\nPlease direct usage questions or support issues to the mailing list:\n\nI<bioperl-l@bioperl.org>\n\nrather than to the module maintainer directly. Many experienced and \nreponsive experts will be able look at the problem and quickly \naddress it. Please include a thorough description of the problem \nwith code and data examples if at all possible.\n\n=head2 Reporting Bugs\n\nReport bugs to the Bioperl bug tracking system to help us keep track\nthe bugs and their resolution. Bug reports can be submitted via\nthe web:\n\n  http://bugzilla.open-bio.org/\n\n=head1 AUTHOR - Ewan Birney\n\nEmail birney@ebi.ac.uk\n\n=head1 APPENDIX\n\nThe rest of the documentation details each of the object\nmethods. Internal methods are usually preceded with a _\n\n\n\n# Let the code begin...\n\n\npackage Bio::SeqIO::embl;\nuse vars qw(%FTQUAL_NO_QUOTE);\nuse strict;\nuse Bio::SeqIO::FTHelper;\nuse Bio::SeqFeature::Generic;\nuse Bio::Species;\nuse Bio::Seq::SeqFactory;\nuse Bio::Annotation::Collection;\nuse Bio::Annotation::Comment;\nuse Bio::Annotation::Reference;\nuse Bio::Annotation::DBLink;\n\nuse base qw(Bio::SeqIO);\n\n%FTQUAL_NO_QUOTE=(\n                  'anticodon'=>1,\n                  'citation'=>1,\n                  'codon'=>1,\n                  'codon_start'=>1,\n                  'cons_splice'=>1,\n                  'direction'=>1,\n                  'evidence'=>1,\n                  'label'=>1,\n                  'mod_base'=> 1,\n                  'number'=> 1,\n                  'rpt_type'=> 1,\n                  'rpt_unit'=> 1,\n                  'transl_except'=> 1,\n                  'transl_table'=> 1,\n                  'usedin'=> 1,\n                 );\n\nsub _initialize {\n    my($self,@args) = @_;\n\n    $self->SUPER::_initialize(@args);\n    # hash for functions for decoding keys.\n    $self->{'_func_ftunit_hash'} = {};\n    # sets this to one by default. People can change it\n    $self->_show_dna(1);\n    if ( ! defined $self->sequence_factory ) {\n        $self->sequence_factory(Bio::Seq::SeqFactory->new\n                                (-verbose => $self->verbose(),\n                                 -type => 'Bio::Seq::RichSeq'));\n    }\n}\n\n=head2 next_seq\n\n Title   : next_seq\n Usage   : $seq = $stream->next_seq()\n Function: returns the next sequence in the stream\n Returns : Bio::Seq object\n Args    :\n\n\nsub next_seq {\n    my ($self,@args) = @_;\n    my ($pseq,$c,$line,$name,$desc,$acc,$seqc,$mol,$div,\n        $date, $comment, @date_arr);\n\n    my ($annotation, %params, @features) =\n        Bio::Annotation::Collection->new();\n\n    $line = $self->_readline;\n    # This needs to be before the first eof() test\n\n    if ( !defined $line ) {\n        return;                 # no throws - end of file\n    }\n\n    if ( $line =~ /^\\s+$/ ) {\n        while ( defined ($line = $self->_readline) ) {\n            $line =~/^\\S/ && last;\n        }\n        # return without error if the whole next sequence was just a single\n        # blank line and then eof\n        return unless $line;\n    }\n\n    # no ID as 1st non-blank line, need short circuit and exit routine\n    $self->throw(\"EMBL stream with no ID. Not embl in my book\")\n        unless $line =~ /^ID\\s+\\S+/;\n\n    # At this point we are sure that $line contains an ID header line\n    my $alphabet;\n    if ( $line =~ tr/;/;/ == 6) { # New style headers contain exactly six semicolons.\n\n        # New style header (EMBL Release >= 87, after June 2006)\n        my $topology;\n        my $sv;\n\n        # ID   DQ299383; SV 1; linear; mRNA; STD; MAM; 431 BP.\n        # This regexp comes from the new2old.pl conversion script, from EBI\n        if ($line =~ m/^ID   (\\w+);\\s+SV (\\d+); (\\w+); ([^;]+); (\\w{3}); (\\w{3}); (\\d+) BP./) {\n        ($name, $sv, $topology, $mol, $div) = ($1, $2, $3, $4, $6);\n        }\n        if (defined($sv)) {\n        $params{'-seq_version'} = $sv;\n        $params{'-version'} = $sv;\n        }\n\n        if ($topology eq \"circular\") {\n        $params{'-is_circular'} = 1;\n        }\n    \n    if (defined $mol ) {\n        if ($mol =~ /DNA/) {\n        $alphabet='dna';\n        } elsif ($mol =~ /RNA/) {\n        $alphabet='rna';\n        } elsif ($mol =~ /AA/) {\n        $alphabet='protein';\n        }\n    }\n    } else {\n    \n        # Old style header (EMBL Release < 87, before June 2006)\n        if ($line =~ /^ID\\s+(\\S+)[^;]*;\\s+(\\S+)[^;]*;\\s+(\\S+)[^;]*;/) {\n        ($name, $mol, $div) = ($1, $2, $3);\n        }\n    \n        if ($mol) {\n            if ( $mol =~ /circular/ ) {\n            $params{'-is_circular'} = 1;\n            $mol =~  s|circular ||;\n            }\n            if (defined $mol ) {\n            if ($mol =~ /DNA/) {\n                $alphabet='dna';\n            } elsif ($mol =~ /RNA/) {\n                $alphabet='rna';\n            } elsif ($mol =~ /AA/) {\n                $alphabet='protein';\n            }\n            }\n        }\n    }\n\n    unless( defined $name && length($name) ) {\n    $name = \"unknown_id\";\n    }\n\n    # $self->warn(\"not parsing upper annotation in EMBL file yet!\");\n    my $buffer = $line;\n    local $_;\n    BEFORE_FEATURE_TABLE :\n          until ( !defined $buffer ) {\n              $_ = $buffer;\n              # Exit at start of Feature table\n              if ( /^(F[HT]|SQ)/ ) {\n                  $self->_pushback($_) if( $1 eq 'SQ' || $1 eq 'FT');\n                  last;\n              }\n              # Description line(s)\n              if (/^DE\\s+(\\S.*\\S)/) {\n                  $desc .= $desc ? \" $1\" : $1;\n              }\n\n              #accession number\n              if ( /^AC\\s+(.*)?/ || /^PA\\s+(.*)?/) {\n                  my @accs = split(/[; ]+/, $1); # allow space in addition\n                  $params{'-accession_number'} = shift @accs\n                      unless defined $params{'-accession_number'};\n                  push @{$params{'-secondary_accessions'}}, @accs;\n              }\n\n              #version number\n              if ( /^SV\\s+\\S+\\.(\\d+);?/ ) {\n                  my $sv = $1;\n                  #$sv =~ s/\\;//;\n                  $params{'-seq_version'} = $sv;\n                  $params{'-version'} = $sv;\n              }\n\n              #date (NOTE: takes last date line)\n              if ( /^DT\\s+(.+)$/ ) {\n                  my $line = $1;\n                  my ($date, $version) = split(' ', $line, 2);\n                  $date =~ tr/,//d; # remove comma if new version\n                  if ($version =~ /\\(Rel\\. (\\d+), Created\\)/xms ) {\n                      my $release = Bio::Annotation::SimpleValue->new(\n                                                                      -tagname    => 'creation_release',\n                                                                      -value      => $1\n                                                                     );\n                      $annotation->add_Annotation($release);\n                  } elsif ($version =~ /\\(Rel\\. (\\d+), Last updated, Version (\\d+)\\)/xms ) {\n                      my $release = Bio::Annotation::SimpleValue->new(\n                                                                      -tagname    => 'update_release',\n                                                                      -value      => $1\n                                                                     );\n                      $annotation->add_Annotation($release);\n\n                      my $update = Bio::Annotation::SimpleValue->new(\n                                                                     -tagname    => 'update_version',\n                                                                     -value      => $2\n                                                                    );\n                      $annotation->add_Annotation($update);\n                  }\n                  push @{$params{'-dates'}}, $date;\n              }\n\n              #keywords\n              if ( /^KW   (.*)\\S*$/ ) {\n                  my @kw = split(/\\s*\\;\\s*/,$1);\n                  push @{$params{'-keywords'}}, @kw;\n              }\n\n              # Organism name and phylogenetic information\n              elsif (/^O[SC]/) {\n                  # pass the accession number so we can give an informative throw message if necessary\n                  my $species = $self->_read_EMBL_Species(\\$buffer, $params{'-accession_number'});\n                  $params{'-species'}= $species;\n              }\n\n              # NCBI TaxID Xref\n              elsif (/^OX/) {\n                  my @links = $self->_read_EMBL_TaxID_DBLink(\\$buffer);\n                  foreach my $dblink ( @links ) {\n                      $annotation->add_Annotation('dblink',$dblink);\n                  }\n              }\n\n              # References\n              elsif (/^R/) {\n                  my @refs = $self->_read_EMBL_References(\\$buffer);\n                  foreach my $ref ( @refs ) {\n                      $annotation->add_Annotation('reference',$ref);\n                  }\n              }\n\n              # DB Xrefs\n              elsif (/^DR/) {\n                  my @links = $self->_read_EMBL_DBLink(\\$buffer);\n                  foreach my $dblink ( @links ) {\n                      $annotation->add_Annotation('dblink',$dblink);\n                  }\n              }\n\n              # Comments\n              elsif (/^CC\\s+(.*)/) {\n                  $comment .= $1;\n                  $comment .= \" \";\n                  while (defined ($_ = $self->_readline) ) {\n                      if (/^CC\\s+(.*)/) {\n                          $comment .= $1;\n                          $comment .= \" \";\n                      } else {\n                          last;\n                      }\n                  }\n                  my $commobj = Bio::Annotation::Comment->new();\n                  $commobj->text($comment);\n                  $annotation->add_Annotation('comment',$commobj);\n                  $comment = \"\";\n              }\n\n              # Get next line.\n              $buffer = $self->_readline;\n          }\n\n    while ( defined ($_ = $self->_readline) ) {\n        /^FT\\s{3}\\w/ && last;\n        /^SQ / && last;\n        /^CO / && last;\n    }\n    $buffer = $_;\n\n    if (defined($buffer) && $buffer =~ /^FT /) {\n        until ( !defined ($buffer) ) {\n            my $ftunit = $self->_read_FTHelper_EMBL(\\$buffer);\n\n            # process ftunit\n            my $feat =\n                $ftunit->_generic_seqfeature($self->location_factory(), $name);\n\n            # add taxon_id from source if available\n            if ($params{'-species'} && ($feat->primary_tag eq 'source')\n                && $feat->has_tag('db_xref')\n                && (! $params{'-species'}->ncbi_taxid())) {\n                foreach my $tagval ($feat->get_tag_values('db_xref')) {\n                    if (index($tagval,\"taxon:\") == 0) {\n                        $params{'-species'}->ncbi_taxid(substr($tagval,6));\n                        last;\n                    }\n                }\n            }\n\n            # add feature to list of features\n            push(@features, $feat);\n\n            if ( $buffer !~ /^FT/ ) {\n                last;\n            }\n        }\n    }\n    # skip comments\n    while ( defined ($buffer) && $buffer =~ /^XX/ ) {\n        $buffer = $self->_readline();\n    }\n\n    if ( $buffer =~ /^CO/  ) {\n        until ( !defined ($buffer) ) {\n            my $ftunit = $self->_read_FTHelper_EMBL(\\$buffer);\n            # process ftunit\n            push(@features,\n                 $ftunit->_generic_seqfeature($self->location_factory(),\n                                              $name));\n\n            if ( $buffer !~ /^CO/ ) {\n                last;\n            }\n        }\n    }\n    if ( $buffer !~ /^SQ/  ) {\n        while ( defined ($_ = $self->_readline) ) {\n            /^SQ/ && last;\n        }\n    }\n    $seqc = \"\";\n    while ( defined ($_ = $self->_readline) ) {\n        m{^//} && last;\n        $_ = uc($_);\n        s/[^A-Za-z]//g;\n        $seqc .= $_;\n    }\n    my $seq = $self->sequence_factory->create\n        (-verbose => $self->verbose(),\n         -division => $div,\n         -seq => $seqc,\n         -desc => $desc,\n         -display_id => $name,\n         -annotation => $annotation,\n         -molecule => $mol,\n         -alphabet => $alphabet,\n         -features => \\@features,\n         %params);\n    return $seq;\n}\n\n\n\n=head2 _write_ID_line\n\n Title   : _write_ID_line\n Usage   : $self->_write_ID_line($seq);\n Function: Writes the EMBL Release 87 format ID line to the stream, unless\n         : there is a user-supplied ID line generation function in which\n         : case that is used instead.\n         : ( See Bio::SeqIO::embl::_id_generation_function(). )\n Returns : nothing\n Args    : Bio::Seq object\n\n\nsub _write_ID_line {\n\n    my ($self, $seq) = @_;\n\n    my $id_line;\n    # If there is a user-supplied ID generation function, use it.\n    if ( $self->_id_generation_func ) {\n        $id_line = \"ID   \" . &{$self->_id_generation_func}($seq) . \"\\nXX\\n\";\n    }\n    # Otherwise, generate a standard EMBL release 87 (June 2006) ID line.\n    else {\n\n        # The sequence name is supposed to be the primary accession number,\n        my $name = $seq->accession_number();\n        if (!$name) {\n            # but if it is not present, use the sequence ID.\n            $name = $seq->id();\n        }\n\n        $self->warn(\"No whitespace allowed in EMBL id [\". $name. \"]\") if $name =~ /\\s/;\n\n        # Use the sequence version, or default to 1.\n        my $version = $seq->version() || 1;\n\n        my $len = $seq->length();\n\n        # Taxonomic division.\n        my $div;\n        if ( $seq->can('division') && defined($seq->division) &&\n             $self->_is_valid_division($seq->division) ) {\n            $div = $seq->division();\n        } else {\n            $div ||= 'UNC';     # 'UNC' is the EMBL division code for 'unclassified'.\n        }\n\n        my $mol;\n        # If the molecule type is a valid EMBL type, use it.\n        if (  $seq->can('molecule')\n              && defined($seq->molecule)\n              && $self->_is_valid_molecule_type($seq->molecule)\n           ) {\n            $mol = $seq->molecule();\n        }\n        # Otherwise, choose unassigned DNA or RNA based on the alphabet.\n        elsif ($seq->can('primary_seq') && defined $seq->primary_seq->alphabet) {\n            my $alphabet =$seq->primary_seq->alphabet;\n            if ($alphabet eq 'dna') {\n                $mol ='unassigned DNA';\n            } elsif ($alphabet eq 'rna') {\n                $mol='unassigned RNA';\n            } elsif ($alphabet eq 'protein') {\n                $self->warn(\"Protein sequence found; EMBL is a nucleotide format.\");\n                $mol='AA';  # AA is not a valid EMBL molecule type.\n            }\n        }\n\n        my $topology = 'linear';\n        if ($seq->is_circular) {\n            $topology = 'circular';\n        }\n\n        $mol ||= '';            # 'unassigned'; ?\n        $id_line = \"ID   $name; SV $version; $topology; $mol; STD; $div; $len BP.\\nXX\\n\";\n        $self->_print($id_line);\n    }\n}\n\n=head2 _is_valid_division\n\n Title   : _is_valid_division\n Usage   : $self->_is_valid_division($div)\n Function: tests division code for validity\n Returns : true if $div is a valid EMBL release 87 taxonomic division.\n Args    : taxonomic division code string\n\n\nsub _is_valid_division {\n    my ($self, $division) = @_;\n\n    my %EMBL_divisions = (\n                          \"PHG\"    => 1, # Bacteriophage\n                          \"ENV\"    => 1, # Environmental Sample\n                          \"FUN\"    => 1, # Fungal\n                          \"HUM\"    => 1, # Human\n                          \"INV\"    => 1, # Invertebrate\n                          \"MAM\"    => 1, # Other Mammal\n                          \"VRT\"    => 1, # Other Vertebrate\n                          \"MUS\"    => 1, # Mus musculus\n                          \"PLN\"    => 1, # Plant\n                          \"PRO\"    => 1, # Prokaryote\n                          \"ROD\"    => 1, # Other Rodent\n                          \"SYN\"    => 1, # Synthetic\n                          \"UNC\"    => 1, # Unclassified\n                          \"VRL\"    => 1 # Viral\n                         );\n\n    return exists($EMBL_divisions{$division});\n}\n\n=head2 _is_valid_molecule_type\n\n Title   : _is_valid_molecule_type\n Usage   : $self->_is_valid_molecule_type($mol)\n Function: tests molecule type for validity\n Returns : true if $mol is a valid EMBL release 87 molecule type.\n Args    : molecule type string\n\n\nsub _is_valid_molecule_type {\n    my ($self, $moltype) = @_;\n\n    my %EMBL_molecule_types = (\n                               \"genomic DNA\"    => 1,\n                               \"genomic RNA\"    => 1,\n                               \"mRNA\"           => 1,\n                               \"tRNA\"           => 1,\n                               \"rRNA\"           => 1,\n                               \"snoRNA\"         => 1,\n                               \"snRNA\"          => 1,\n                               \"scRNA\"          => 1,\n                               \"pre-RNA\"        => 1,\n                               \"other RNA\"      => 1,\n                               \"other DNA\"      => 1,\n                               \"unassigned DNA\" => 1,\n                               \"unassigned RNA\" => 1\n                              );\n\n    return exists($EMBL_molecule_types{$moltype});\n}\n\n=head2 write_seq\n\n Title   : write_seq\n Usage   : $stream->write_seq($seq)\n Function: writes the $seq object (must be seq) to the stream\n Returns : 1 for success and undef for error\n Args    : array of 1 to n Bio::SeqI objects\n\n\n\nsub write_seq {\n    my ($self,@seqs) = @_;\n\n    foreach my $seq ( @seqs ) {\n        $self->throw(\"Attempting to write with no seq!\") unless defined $seq;\n        unless ( ref $seq && $seq->isa('Bio::SeqI' ) ) {\n            $self->warn(\"$seq is not a SeqI compliant sequence object!\")\n                if $self->verbose >= 0;\n            unless ( ref $seq && $seq->isa('Bio::PrimarySeqI' ) ) {\n                $self->throw(\"$seq is not a PrimarySeqI compliant sequence object!\");\n            }\n        }\n        my $str = $seq->seq || '';\n\n        # Write the ID line.\n        $self->_write_ID_line($seq);\n\n\n        # Write the accession line if present\n        my( $acc );\n        {\n            if ( my $func = $self->_ac_generation_func ) {\n                $acc = &{$func}($seq);\n            } elsif ( $seq->isa('Bio::Seq::RichSeqI') &&\n                      defined($seq->accession_number) ) {\n                $acc = $seq->accession_number;\n                $acc = join(\"; \", $acc, $seq->get_secondary_accessions);\n            } elsif ( $seq->can('accession_number') ) {\n                $acc = $seq->accession_number;\n            }\n\n            if (defined $acc) {\n                $self->_print(\"AC   $acc;\\n\",\n                              \"XX\\n\") || return;\n            }\n        }\n\n        # Date lines\n        my $switch=0;\n        if ( $seq->can('get_dates') ) {\n            my @dates =  $seq->get_dates();\n            my $ct = 1;\n            my $date_flag = 0;\n            my ($cr) = $seq->annotation->get_Annotations(\"creation_release\");\n            my ($ur) = $seq->annotation->get_Annotations(\"update_release\");\n            my ($uv) = $seq->annotation->get_Annotations(\"update_version\");\n\n            unless ($cr && $ur && $ur) {\n                $date_flag = 1;\n            }\n\n            foreach my $dt (@dates) {\n                if (!$date_flag) {\n                    $self->_write_line_EMBL_regex(\"DT   \",\"DT   \",\n                                                  $dt.\" (Rel. $cr, Created)\",\n                                                  '\\s+|$',80) if $ct == 1;\n                    $self->_write_line_EMBL_regex(\"DT   \",\"DT   \",\n                                                  $dt.\" (Rel. $ur, Last updated, Version $uv)\",\n                                                  '\\s+|$',80) if $ct == 2;\n                } else {        # other formats?\n                    $self->_write_line_EMBL_regex(\"DT   \",\"DT   \",\n                                                  $dt,'\\s+|$',80);\n                }\n                $switch =1;\n                $ct++;\n            }\n            if ($switch == 1) {\n                $self->_print(\"XX\\n\") || return;\n            }\n        }\n\n        # Description lines\n        $self->_write_line_EMBL_regex(\"DE   \",\"DE   \",$seq->desc(),'\\s+|$',80) || return; #'\n        $self->_print( \"XX\\n\") || return;\n\n        # if there, write the kw line\n        {\n            my( $kw );\n            if ( my $func = $self->_kw_generation_func ) {\n                $kw = &{$func}($seq);\n            } elsif ( $seq->can('keywords') ) {\n                $kw = $seq->keywords;\n            }\n            if (defined $kw) {\n                $self->_write_line_EMBL_regex(\"KW   \", \"KW   \", $kw, '\\s+|$', 80) || return; #'\n                $self->_print( \"XX\\n\") || return;\n            }\n        }\n\n        # Organism lines\n\n        if ($seq->can('species') && (my $spec = $seq->species)) {\n            my @class = $spec->classification();\n            shift @class;       # get rid of species name. Some embl files include\n                                # the species name in the OC lines, but this seems\n                                # more like an error than something we need to\n                                # emulate\n            my $OS = $spec->scientific_name;\n            if ($spec->common_name) {\n                $OS .= ' ('.$spec->common_name.')';\n            }\n            $self->_print(\"OS   $OS\\n\") || return;\n            my $OC = join('; ', reverse(@class)) .'.';\n            $self->_write_line_EMBL_regex(\"OC   \",\"OC   \",$OC,'; |$',80) || return;\n            if ($spec->organelle) {\n                $self->_write_line_EMBL_regex(\"OG   \",\"OG   \",$spec->organelle,'; |$',80) || return;\n            }\n            $self->_print(\"XX\\n\") || return;\n        }\n\n        # Reference lines\n        my $t = 1;\n        if ( $seq->can('annotation') && defined $seq->annotation ) {\n            foreach my $ref ( $seq->annotation->get_Annotations('reference') ) {\n                $self->_print( \"RN   [$t]\\n\") || return;\n\n                # Having no RP line is legal, but we need both\n                # start and end for a valid location.\n                if ($ref->comment) {\n                    $self->_write_line_EMBL_regex(\"RC   \", \"RC   \", $ref->comment, '\\s+|$', 80) || return; #'\n                }\n                my $start = $ref->start;\n                my $end   = $ref->end;\n                if ($start and $end) {\n                    $self->_print( \"RP   $start-$end\\n\") || return;\n                } elsif ($start or $end) {\n                    $self->throw(\"Both start and end are needed for a valid RP line.\".\n                                 \"  Got: start='$start' end='$end'\");\n                }\n\n                if (my $med = $ref->medline) {\n                    $self->_print( \"RX   MEDLINE; $med.\\n\") || return;\n                }\n                if (my $pm = $ref->pubmed) {\n                    $self->_print( \"RX   PUBMED; $pm.\\n\") || return;\n                }\n                my $authors = $ref->authors;\n                $authors =~ s/([\\w\\.]) (\\w)/$1#$2/g;  # add word wrap protection char '#'\n\n                $self->_write_line_EMBL_regex(\"RA   \", \"RA   \",\n                                              $authors . \";\",\n                                              '\\s+|$', 80) || return; #'\n\n                # If there is no title to the reference, it appears\n                # as a single semi-colon.  All titles must end in\n                # a semi-colon.\n                my $ref_title = $ref->title || '';\n                $ref_title =~ s/[\\s;]*$/;/;\n                $self->_write_line_EMBL_regex(\"RT   \", \"RT   \", $ref_title,    '\\s+|$', 80) || return; #'\n                $self->_write_line_EMBL_regex(\"RL   \", \"RL   \", $ref->location, '\\s+|$', 80) || return; #'\n                $self->_print(\"XX\\n\") || return;\n                $t++;\n            }\n\n            # DB Xref lines\n            if (my @db_xref = $seq->annotation->get_Annotations('dblink') ) {\n                for my $dr (@db_xref) {\n                    my $db_name = $dr->database;\n                    my $prim    = $dr->primary_id;\n\n                    my $opt     = $dr->optional_id || '';\n                    my $line = $opt ? \"$db_name; $prim; $opt.\" : \"$db_name; $prim.\";\n                    $self->_write_line_EMBL_regex(\"DR   \", \"DR   \", $line, '\\s+|$', 80) || return; #'\n                }\n                $self->_print(\"XX\\n\") || return;\n            }\n            \n            # Comment lines\n            foreach my $comment ( $seq->annotation->get_Annotations('comment') ) {\n                $self->_write_line_EMBL_regex(\"CC   \", \"CC   \", $comment->text, '\\s+|$', 80) || return; #'\n                $self->_print(\"XX\\n\") || return;\n            }\n        }\n        # \"\\\\s\\+\\|\\$\"\n\n        ## FEATURE TABLE\n\n        $self->_print(\"FH   Key             Location/Qualifiers\\n\") || return;\n        $self->_print(\"FH\\n\") || return;\n\n        my @feats = $seq->can('top_SeqFeatures') ? $seq->top_SeqFeatures : ();\n        if ($feats[0]) {\n            if ( defined $self->_post_sort ) {\n                # we need to read things into an array.\n                # Process. Sort them. Print 'em\n\n                my $post_sort_func = $self->_post_sort();\n                my @fth;\n\n                foreach my $sf ( @feats ) {\n                    push(@fth,Bio::SeqIO::FTHelper::from_SeqFeature($sf,$seq));\n                }\n\n                @fth = sort { &$post_sort_func($a,$b) } @fth;\n\n                foreach my $fth ( @fth ) {\n                    $self->_print_EMBL_FTHelper($fth) || return;\n                }\n            } else {\n                # not post sorted. And so we can print as we get them.\n                # lower memory load...\n\n                foreach my $sf ( @feats ) {\n                    my @fth = Bio::SeqIO::FTHelper::from_SeqFeature($sf,$seq);\n                    foreach my $fth ( @fth ) {\n                        if ( $fth->key eq 'CONTIG') {\n                            $self->_show_dna(0);\n                        }\n                        $self->_print_EMBL_FTHelper($fth) || return;\n                    }\n                }\n            }\n        }\n\n        if ( $self->_show_dna() == 0 ) {\n            $self->_print( \"//\\n\") || return;\n            return;\n        }\n        $self->_print( \"XX\\n\") || return;\n\n        # finished printing features.\n\n        $str =~ tr/A-Z/a-z/;\n\n        # Count each nucleotide\n        my $alen = $str =~ tr/a/a/;\n        my $clen = $str =~ tr/c/c/;\n        my $glen = $str =~ tr/g/g/;\n        my $tlen = $str =~ tr/t/t/;\n\n        my $len = $seq->length();\n        my $olen = $seq->length() - ($alen + $tlen + $clen + $glen);\n        if ( $olen < 0 ) {\n            $self->warn(\"Weird. More atgc than bases. Problem!\");\n        }\n\n        $self->_print(\"SQ   Sequence $len BP; $alen A; $clen C; $glen G; $tlen T; $olen other;\\n\") || return;\n\n        my $nuc = 60;       # Number of nucleotides per line\n        my $whole_pat = 'a10' x 6; # Pattern for unpacking a whole line\n        my $out_pat   = 'A11' x 6; # Pattern for packing a line\n        my $length = length($str);\n\n        # Calculate the number of nucleotides which fit on whole lines\n        my $whole = int($length / $nuc) * $nuc;\n\n        # Print the whole lines\n        my( $i );\n        for ($i = 0; $i < $whole; $i += $nuc) {\n            my $blocks = pack $out_pat,\n                unpack $whole_pat,\n                    substr($str, $i, $nuc);\n            $self->_print(sprintf(\"     $blocks%9d\\n\", $i + $nuc)) || return;\n        }\n\n        # Print the last line\n        if (my $last = substr($str, $i)) {\n            my $last_len = length($last);\n            my $last_pat = 'a10' x int($last_len / 10) .'a'. $last_len % 10;\n            my $blocks = pack $out_pat,\n                unpack($last_pat, $last);\n            $self->_print(sprintf(\"     $blocks%9d\\n\", $length)) ||\n                return;         # Add the length to the end\n        }\n\n        $self->_print( \"//\\n\") || return;\n\n        $self->flush if $self->_flush_on_write && defined $self->_fh;\n    }\n    return 1;\n}\n\n=head2 _print_EMBL_FTHelper\n\n Title   : _print_EMBL_FTHelper\n Usage   :\n Function: Internal function\n Returns : 1 if writing suceeded, otherwise undef\n Args    :\n\n\n\nsub _print_EMBL_FTHelper {\n    my ($self,$fth) = @_;\n\n    if ( ! ref $fth || ! $fth->isa('Bio::SeqIO::FTHelper') ) {\n        $fth->warn(\"$fth is not a FTHelper class. Attempting to print, but there could be tears!\");\n    }\n\n\n    #$self->_print( \"FH   Key             Location/Qualifiers\\n\");\n    #$self->_print( sprintf(\"FT   %-15s  %s\\n\",$fth->key,$fth->loc));\n    # let\n    if ( $fth->key eq 'CONTIG' ) {\n        $self->_print(\"XX\\n\") || return;\n        $self->_write_line_EMBL_regex(\"CO   \",\n                                      \"CO   \",$fth->loc,\n                                      '\\,|$',80) || return; #'\n        return 1;\n    }\n    $self->_write_line_EMBL_regex(sprintf(\"FT   %-15s \",$fth->key),\n                                  \"FT                   \",$fth->loc,\n                                  '\\,|$',80) || return; #'\n    foreach my $tag ( keys %{$fth->field} ) {\n        if ( ! defined $fth->field->{$tag} ) {\n            next;\n        }\n        foreach my $value ( @{$fth->field->{$tag}} ) {\n            $value =~ s/\\\"/\\\"\\\"/g;\n            if ($value eq \"_no_value\") {\n                $self->_write_line_EMBL_regex(\"FT                   \",\n                                              \"FT                   \",\n                                              \"/$tag\",'.|$',80) || return; #'\n            }\n            # there are almost 3x more quoted qualifier values and they\n            # are more common too so we take quoted ones first\n            elsif (!$FTQUAL_NO_QUOTE{$tag}) {\n                my $pat = $value =~ /\\s/ ? '\\s|\\-|$' : '.|\\-|$';\n                $self->_write_line_EMBL_regex(\"FT                   \",\n                                              \"FT                   \",\n                                              \"/$tag=\\\"$value\\\"\",$pat,80) || return;\n            } else {\n                $self->_write_line_EMBL_regex(\"FT                   \",\n                                              \"FT                   \",\n                                              \"/$tag=$value\",'.|$',80) || return; #'\n                                          }\n            }\n        }\n\n        return 1;\n    }\n\n#'\n=head2 _read_EMBL_References\n\n Title   : _read_EMBL_References\n Usage   :\n Function: Reads references from EMBL format. Internal function really\n Example :\n Returns :\n Args    :\n\n\n\nsub _read_EMBL_References {\n    my ($self,$buffer) = @_;\n    my (@refs);\n\n    # assume things are starting with RN\n\n    if ( $$buffer !~ /^RN/ ) {\n        warn(\"Not parsing line '$$buffer' which maybe important\");\n    }\n    my $b1;\n    my $b2;\n    my $title;\n    my $loc;\n    my $au;\n    my $med;\n    my $pm;\n    my $com;\n\n    while ( defined ($_ = $self->_readline) ) {\n        /^R/ || last;\n        /^RP   (\\d+)-(\\d+)/ && do {$b1=$1;$b2=$2;};\n        /^RX   MEDLINE;\\s+(\\d+)/ && do {$med=$1};\n        /^RX   PUBMED;\\s+(\\d+)/ && do {$pm=$1};\n        /^RA   (.*)/ && do {\n            $au = $self->_concatenate_lines($au,$1); next;\n        };\n        /^RT   (.*)/ && do {\n            $title = $self->_concatenate_lines($title,$1); next;\n        };\n        /^RL   (.*)/ && do {\n            $loc = $self->_concatenate_lines($loc,$1); next;\n        };\n        /^RC   (.*)/ && do {\n            $com = $self->_concatenate_lines($com,$1); next;\n        };\n    }\n\n    my $ref = Bio::Annotation::Reference->new();\n    $au =~ s/;\\s*$//g;\n    $title =~ s/;\\s*$//g;\n\n    $ref->start($b1);\n    $ref->end($b2);\n    $ref->authors($au);\n    $ref->title($title);\n    $ref->location($loc);\n    $ref->medline($med);\n    $ref->comment($com);\n    $ref->pubmed($pm);\n\n    push(@refs,$ref);\n    $$buffer = $_;\n\n    return @refs;\n}\n\n=head2 _read_EMBL_Species\n\n Title   : _read_EMBL_Species\n Usage   :\n Function: Reads the EMBL Organism species and classification\n           lines.\n Example :\n Returns : A Bio::Species object\n Args    : a reference to the current line buffer, accession number\n\n\nsub _read_EMBL_Species {\n    my( $self, $buffer, $acc ) = @_;\n    my $org;\n\n    $_ = $$buffer;\n    my( $sub_species, $species, $genus, $common, $sci_name, $class_lines );\n    while (defined( $_ ||= $self->_readline )) {\n        if (/^OS\\s+(.+)/) {\n            $sci_name .= ($sci_name) ? ' '.$1 : $1;\n        } elsif (s/^OC\\s+(.+)$//) {\n            $class_lines .= $1;\n        } elsif (/^OG\\s+(.*)/) {\n            $org = $1;\n        } else {\n            last;\n        }\n\n        $_ = undef;             # Empty $_ to trigger read of next line\n    }\n\n#    $$buffer = $_;\n\t$self->_pushback($_);\n\t\n    $sci_name =~ s{\\.$}{};\n    $sci_name || return;\n\n    # Convert data in classification lines into classification array.\n    # only split on ';' or '.' so that classification that is 2 or more words\n    # will still get matched, use map() to remove trailing/leading/intervening\n    # spaces\n    my @class = map { s/^\\s+//; s/\\s+$//; s/\\s{2,}/ /g; $_; } split /(?<!subgen)[;\\.]+/, $class_lines;\n\n    # do we have a genus?\n    my $possible_genus = $class[-1];\n    $possible_genus .= \"|$class[-2]\" if $class[-2];\n    if ($sci_name =~ /^($possible_genus)/) {\n        $genus = $1;\n        ($species) = $sci_name =~ /^$genus\\s+(.+)/;\n    } else {\n        $species = $sci_name;\n    }\n\n    # Don't make a species object if it is \"Unknown\" or \"None\"\n    if ($genus) {\n        return if $genus =~ /^(Unknown|None)$/i;\n    }\n\n    # is this organism of rank species or is it lower?\n    # (doesn't catch everything, but at least the guess isn't dangerous)\n    if ($species =~ /subsp\\.|var\\./) {\n        ($species, $sub_species) = $species =~ /(.+)\\s+((?:subsp\\.|var\\.).+)/;\n    }\n\n    # sometimes things have common name in brackets, like\n    # Schizosaccharomyces pombe (fission yeast), so get rid of the common\n    # name bit. Probably dangerous if real scientific species name ends in\n    # bracketed bit.\n    unless ($class[-1] eq 'Viruses') {\n        ($species, $common) = $species =~ /^(.+)\\s+\\((.+)\\)$/;\n        $sci_name =~ s/\\s+\\(.+\\)$// if $common;\n    }\n\n    # Bio::Species array needs array in Species -> Kingdom direction\n    unless ($class[-1] eq $sci_name) {\n        push(@class, $sci_name);\n    }\n    @class = reverse @class;\n\n    # do minimal sanity checks before we hand off to Bio::Species which won't\n    # be able to give informative throw messages if it has to throw because\n    # of problems here\n    $self->throw(\"$acc seems to be missing its OS line: invalid.\") unless $sci_name;\n    my %names;\n    foreach my $i (0..$#class) {\n        my $name = $class[$i];\n        $names{$name}++;\n        if ($names{$name} > 1 && $name ne $class[$i - 1]) {\n            $self->throw(\"$acc seems to have an invalid species classification.\");\n        }\n    }\n    my $make = Bio::Species->new();\n    $make->scientific_name($sci_name);\n    $make->classification(@class);\n    unless ($class[-1] eq 'Viruses') {\n        $make->genus($genus) if $genus;\n        $make->species($species) if $species;\n        $make->sub_species($sub_species) if $sub_species;\n        $make->common_name($common) if $common;\n    }\n    $make->organelle($org) if $org;\n    return $make;\n}\n\n=head2 _read_EMBL_DBLink\n\n Title   : _read_EMBL_DBLink\n Usage   :\n Function: Reads the EMBL database cross reference (\"DR\") lines\n Example :\n Returns : A list of Bio::Annotation::DBLink objects\n Args    :","parameters":[{"label":"$self"},{"label":"$buffer"}]},"detail":"($self,$buffer)"},{"kind":12,"line":1141,"name":"buffer"},{"line":1145,"kind":12,"containerName":"Annotation::DBLink","name":"Bio"},{"name":"buffer","line":1156,"kind":12},{"kind":12,"children":[{"localvar":"my","containerName":"_read_EMBL_TaxID_DBLink","definition":"my","name":"$self","line":1172,"kind":13},{"name":"$buffer","containerName":"_read_EMBL_TaxID_DBLink","kind":13,"line":1172},{"name":"@db_link","definition":"my","containerName":"_read_EMBL_TaxID_DBLink","localvar":"my","kind":13,"line":1173},{"line":1176,"kind":13,"name":"$self","containerName":"_read_EMBL_TaxID_DBLink"},{"name":"_readline","containerName":"_read_EMBL_TaxID_DBLink","line":1176,"kind":12},{"line":1178,"kind":13,"localvar":"my","definition":"my","name":"$databse","containerName":"_read_EMBL_TaxID_DBLink"},{"kind":13,"line":1178,"name":"$prim_id","containerName":"_read_EMBL_TaxID_DBLink"},{"containerName":"_read_EMBL_TaxID_DBLink","definition":"my","name":"$link","localvar":"my","kind":13,"line":1179},{"name":"new","containerName":"_read_EMBL_TaxID_DBLink","kind":12,"line":1179},{"containerName":"_read_EMBL_TaxID_DBLink","name":"$databse","line":1179,"kind":13},{"containerName":"_read_EMBL_TaxID_DBLink","name":"$prim_id","kind":13,"line":1180},{"kind":13,"line":1181,"containerName":"_read_EMBL_TaxID_DBLink","name":"@db_link"},{"line":1181,"kind":13,"containerName":"_read_EMBL_TaxID_DBLink","name":"$link"},{"name":"@db_link","containerName":"_read_EMBL_TaxID_DBLink","line":1189,"kind":13}],"line":1171,"definition":"sub","containerName":"main::","signature":{"label":"_read_EMBL_TaxID_DBLink($self,$buffer)","documentation":"1;\n# $Id: embl.pm 16123 2009-09-17 12:57:27Z cjfields $\n#\n# BioPerl module for Bio::SeqIO::EMBL\n#\n# Please direct questions and support issues to <bioperl-l@bioperl.org> \n#\n# Cared for by Ewan Birney <birney@ebi.ac.uk>\n#\n# Copyright Ewan Birney\n#\n# You may distribute this module under the same terms as perl itself\n\n# POD documentation - main docs before the code\n\n=head1 NAME\n\nBio::SeqIO::embl - EMBL sequence input/output stream\n\n=head1 SYNOPSIS\n\nIt is probably best not to use this object directly, but\nrather go through the SeqIO handler system. Go:\n\n    $stream = Bio::SeqIO->new(-file => $filename, -format => 'EMBL');\n\n    while ( (my $seq = $stream->next_seq()) ) {\n        # do something with $seq\n    }\n\n=head1 DESCRIPTION\n\nThis object can transform Bio::Seq objects to and from EMBL flat\nfile databases.\n\nThere is a lot of flexibility here about how to dump things which\nshould be documented more fully.\n\nThere should be a common object that this and Genbank share (probably\nwith Swissprot). Too much of the magic is identical.\n\n=head2 Optional functions\n\n=over 3\n\n* _show_dna()\n\n(output only) shows the dna or not\n\n* _post_sort()\n\n(output only) provides a sorting func which is applied to the FTHelpers\nbefore printing\n\n* _id_generation_func()\n\nThis is function which is called as\n\n   print \"ID   \", $func($annseq), \"\\n\";\n\nTo generate the ID line. If it is not there, it generates a sensible ID\nline using a number of tools.\n\nIf you want to output annotations in EMBL format they need to be\nstored in a Bio::Annotation::Collection object which is accessible\nthrough the Bio::SeqI interface method L<annotation()|annotation>.\n\nThe following are the names of the keys which are polled from a\nL<Bio::Annotation::Collection> object.\n\n reference  - Should contain Bio::Annotation::Reference objects\n comment    - Should contain Bio::Annotation::Comment objects\n dblink     - Should contain Bio::Annotation::DBLink objects\n\n\n=head1 FEEDBACK\n\n=head2 Mailing Lists\n\nUser feedback is an integral part of the evolution of this and other\nBioperl modules. Send your comments and suggestions preferably to one\nof the Bioperl mailing lists.  Your participation is much appreciated.\n\n  bioperl-l@bioperl.org                  - General discussion\n  http://bioperl.org/wiki/Mailing_lists  - About the mailing lists\n\n=head2 Support \n\nPlease direct usage questions or support issues to the mailing list:\n\nI<bioperl-l@bioperl.org>\n\nrather than to the module maintainer directly. Many experienced and \nreponsive experts will be able look at the problem and quickly \naddress it. Please include a thorough description of the problem \nwith code and data examples if at all possible.\n\n=head2 Reporting Bugs\n\nReport bugs to the Bioperl bug tracking system to help us keep track\nthe bugs and their resolution. Bug reports can be submitted via\nthe web:\n\n  http://bugzilla.open-bio.org/\n\n=head1 AUTHOR - Ewan Birney\n\nEmail birney@ebi.ac.uk\n\n=head1 APPENDIX\n\nThe rest of the documentation details each of the object\nmethods. Internal methods are usually preceded with a _\n\n\n\n# Let the code begin...\n\n\npackage Bio::SeqIO::embl;\nuse vars qw(%FTQUAL_NO_QUOTE);\nuse strict;\nuse Bio::SeqIO::FTHelper;\nuse Bio::SeqFeature::Generic;\nuse Bio::Species;\nuse Bio::Seq::SeqFactory;\nuse Bio::Annotation::Collection;\nuse Bio::Annotation::Comment;\nuse Bio::Annotation::Reference;\nuse Bio::Annotation::DBLink;\n\nuse base qw(Bio::SeqIO);\n\n%FTQUAL_NO_QUOTE=(\n                  'anticodon'=>1,\n                  'citation'=>1,\n                  'codon'=>1,\n                  'codon_start'=>1,\n                  'cons_splice'=>1,\n                  'direction'=>1,\n                  'evidence'=>1,\n                  'label'=>1,\n                  'mod_base'=> 1,\n                  'number'=> 1,\n                  'rpt_type'=> 1,\n                  'rpt_unit'=> 1,\n                  'transl_except'=> 1,\n                  'transl_table'=> 1,\n                  'usedin'=> 1,\n                 );\n\nsub _initialize {\n    my($self,@args) = @_;\n\n    $self->SUPER::_initialize(@args);\n    # hash for functions for decoding keys.\n    $self->{'_func_ftunit_hash'} = {};\n    # sets this to one by default. People can change it\n    $self->_show_dna(1);\n    if ( ! defined $self->sequence_factory ) {\n        $self->sequence_factory(Bio::Seq::SeqFactory->new\n                                (-verbose => $self->verbose(),\n                                 -type => 'Bio::Seq::RichSeq'));\n    }\n}\n\n=head2 next_seq\n\n Title   : next_seq\n Usage   : $seq = $stream->next_seq()\n Function: returns the next sequence in the stream\n Returns : Bio::Seq object\n Args    :\n\n\nsub next_seq {\n    my ($self,@args) = @_;\n    my ($pseq,$c,$line,$name,$desc,$acc,$seqc,$mol,$div,\n        $date, $comment, @date_arr);\n\n    my ($annotation, %params, @features) =\n        Bio::Annotation::Collection->new();\n\n    $line = $self->_readline;\n    # This needs to be before the first eof() test\n\n    if ( !defined $line ) {\n        return;                 # no throws - end of file\n    }\n\n    if ( $line =~ /^\\s+$/ ) {\n        while ( defined ($line = $self->_readline) ) {\n            $line =~/^\\S/ && last;\n        }\n        # return without error if the whole next sequence was just a single\n        # blank line and then eof\n        return unless $line;\n    }\n\n    # no ID as 1st non-blank line, need short circuit and exit routine\n    $self->throw(\"EMBL stream with no ID. Not embl in my book\")\n        unless $line =~ /^ID\\s+\\S+/;\n\n    # At this point we are sure that $line contains an ID header line\n    my $alphabet;\n    if ( $line =~ tr/;/;/ == 6) { # New style headers contain exactly six semicolons.\n\n        # New style header (EMBL Release >= 87, after June 2006)\n        my $topology;\n        my $sv;\n\n        # ID   DQ299383; SV 1; linear; mRNA; STD; MAM; 431 BP.\n        # This regexp comes from the new2old.pl conversion script, from EBI\n        if ($line =~ m/^ID   (\\w+);\\s+SV (\\d+); (\\w+); ([^;]+); (\\w{3}); (\\w{3}); (\\d+) BP./) {\n        ($name, $sv, $topology, $mol, $div) = ($1, $2, $3, $4, $6);\n        }\n        if (defined($sv)) {\n        $params{'-seq_version'} = $sv;\n        $params{'-version'} = $sv;\n        }\n\n        if ($topology eq \"circular\") {\n        $params{'-is_circular'} = 1;\n        }\n    \n    if (defined $mol ) {\n        if ($mol =~ /DNA/) {\n        $alphabet='dna';\n        } elsif ($mol =~ /RNA/) {\n        $alphabet='rna';\n        } elsif ($mol =~ /AA/) {\n        $alphabet='protein';\n        }\n    }\n    } else {\n    \n        # Old style header (EMBL Release < 87, before June 2006)\n        if ($line =~ /^ID\\s+(\\S+)[^;]*;\\s+(\\S+)[^;]*;\\s+(\\S+)[^;]*;/) {\n        ($name, $mol, $div) = ($1, $2, $3);\n        }\n    \n        if ($mol) {\n            if ( $mol =~ /circular/ ) {\n            $params{'-is_circular'} = 1;\n            $mol =~  s|circular ||;\n            }\n            if (defined $mol ) {\n            if ($mol =~ /DNA/) {\n                $alphabet='dna';\n            } elsif ($mol =~ /RNA/) {\n                $alphabet='rna';\n            } elsif ($mol =~ /AA/) {\n                $alphabet='protein';\n            }\n            }\n        }\n    }\n\n    unless( defined $name && length($name) ) {\n    $name = \"unknown_id\";\n    }\n\n    # $self->warn(\"not parsing upper annotation in EMBL file yet!\");\n    my $buffer = $line;\n    local $_;\n    BEFORE_FEATURE_TABLE :\n          until ( !defined $buffer ) {\n              $_ = $buffer;\n              # Exit at start of Feature table\n              if ( /^(F[HT]|SQ)/ ) {\n                  $self->_pushback($_) if( $1 eq 'SQ' || $1 eq 'FT');\n                  last;\n              }\n              # Description line(s)\n              if (/^DE\\s+(\\S.*\\S)/) {\n                  $desc .= $desc ? \" $1\" : $1;\n              }\n\n              #accession number\n              if ( /^AC\\s+(.*)?/ || /^PA\\s+(.*)?/) {\n                  my @accs = split(/[; ]+/, $1); # allow space in addition\n                  $params{'-accession_number'} = shift @accs\n                      unless defined $params{'-accession_number'};\n                  push @{$params{'-secondary_accessions'}}, @accs;\n              }\n\n              #version number\n              if ( /^SV\\s+\\S+\\.(\\d+);?/ ) {\n                  my $sv = $1;\n                  #$sv =~ s/\\;//;\n                  $params{'-seq_version'} = $sv;\n                  $params{'-version'} = $sv;\n              }\n\n              #date (NOTE: takes last date line)\n              if ( /^DT\\s+(.+)$/ ) {\n                  my $line = $1;\n                  my ($date, $version) = split(' ', $line, 2);\n                  $date =~ tr/,//d; # remove comma if new version\n                  if ($version =~ /\\(Rel\\. (\\d+), Created\\)/xms ) {\n                      my $release = Bio::Annotation::SimpleValue->new(\n                                                                      -tagname    => 'creation_release',\n                                                                      -value      => $1\n                                                                     );\n                      $annotation->add_Annotation($release);\n                  } elsif ($version =~ /\\(Rel\\. (\\d+), Last updated, Version (\\d+)\\)/xms ) {\n                      my $release = Bio::Annotation::SimpleValue->new(\n                                                                      -tagname    => 'update_release',\n                                                                      -value      => $1\n                                                                     );\n                      $annotation->add_Annotation($release);\n\n                      my $update = Bio::Annotation::SimpleValue->new(\n                                                                     -tagname    => 'update_version',\n                                                                     -value      => $2\n                                                                    );\n                      $annotation->add_Annotation($update);\n                  }\n                  push @{$params{'-dates'}}, $date;\n              }\n\n              #keywords\n              if ( /^KW   (.*)\\S*$/ ) {\n                  my @kw = split(/\\s*\\;\\s*/,$1);\n                  push @{$params{'-keywords'}}, @kw;\n              }\n\n              # Organism name and phylogenetic information\n              elsif (/^O[SC]/) {\n                  # pass the accession number so we can give an informative throw message if necessary\n                  my $species = $self->_read_EMBL_Species(\\$buffer, $params{'-accession_number'});\n                  $params{'-species'}= $species;\n              }\n\n              # NCBI TaxID Xref\n              elsif (/^OX/) {\n                  my @links = $self->_read_EMBL_TaxID_DBLink(\\$buffer);\n                  foreach my $dblink ( @links ) {\n                      $annotation->add_Annotation('dblink',$dblink);\n                  }\n              }\n\n              # References\n              elsif (/^R/) {\n                  my @refs = $self->_read_EMBL_References(\\$buffer);\n                  foreach my $ref ( @refs ) {\n                      $annotation->add_Annotation('reference',$ref);\n                  }\n              }\n\n              # DB Xrefs\n              elsif (/^DR/) {\n                  my @links = $self->_read_EMBL_DBLink(\\$buffer);\n                  foreach my $dblink ( @links ) {\n                      $annotation->add_Annotation('dblink',$dblink);\n                  }\n              }\n\n              # Comments\n              elsif (/^CC\\s+(.*)/) {\n                  $comment .= $1;\n                  $comment .= \" \";\n                  while (defined ($_ = $self->_readline) ) {\n                      if (/^CC\\s+(.*)/) {\n                          $comment .= $1;\n                          $comment .= \" \";\n                      } else {\n                          last;\n                      }\n                  }\n                  my $commobj = Bio::Annotation::Comment->new();\n                  $commobj->text($comment);\n                  $annotation->add_Annotation('comment',$commobj);\n                  $comment = \"\";\n              }\n\n              # Get next line.\n              $buffer = $self->_readline;\n          }\n\n    while ( defined ($_ = $self->_readline) ) {\n        /^FT\\s{3}\\w/ && last;\n        /^SQ / && last;\n        /^CO / && last;\n    }\n    $buffer = $_;\n\n    if (defined($buffer) && $buffer =~ /^FT /) {\n        until ( !defined ($buffer) ) {\n            my $ftunit = $self->_read_FTHelper_EMBL(\\$buffer);\n\n            # process ftunit\n            my $feat =\n                $ftunit->_generic_seqfeature($self->location_factory(), $name);\n\n            # add taxon_id from source if available\n            if ($params{'-species'} && ($feat->primary_tag eq 'source')\n                && $feat->has_tag('db_xref')\n                && (! $params{'-species'}->ncbi_taxid())) {\n                foreach my $tagval ($feat->get_tag_values('db_xref')) {\n                    if (index($tagval,\"taxon:\") == 0) {\n                        $params{'-species'}->ncbi_taxid(substr($tagval,6));\n                        last;\n                    }\n                }\n            }\n\n            # add feature to list of features\n            push(@features, $feat);\n\n            if ( $buffer !~ /^FT/ ) {\n                last;\n            }\n        }\n    }\n    # skip comments\n    while ( defined ($buffer) && $buffer =~ /^XX/ ) {\n        $buffer = $self->_readline();\n    }\n\n    if ( $buffer =~ /^CO/  ) {\n        until ( !defined ($buffer) ) {\n            my $ftunit = $self->_read_FTHelper_EMBL(\\$buffer);\n            # process ftunit\n            push(@features,\n                 $ftunit->_generic_seqfeature($self->location_factory(),\n                                              $name));\n\n            if ( $buffer !~ /^CO/ ) {\n                last;\n            }\n        }\n    }\n    if ( $buffer !~ /^SQ/  ) {\n        while ( defined ($_ = $self->_readline) ) {\n            /^SQ/ && last;\n        }\n    }\n    $seqc = \"\";\n    while ( defined ($_ = $self->_readline) ) {\n        m{^//} && last;\n        $_ = uc($_);\n        s/[^A-Za-z]//g;\n        $seqc .= $_;\n    }\n    my $seq = $self->sequence_factory->create\n        (-verbose => $self->verbose(),\n         -division => $div,\n         -seq => $seqc,\n         -desc => $desc,\n         -display_id => $name,\n         -annotation => $annotation,\n         -molecule => $mol,\n         -alphabet => $alphabet,\n         -features => \\@features,\n         %params);\n    return $seq;\n}\n\n\n\n=head2 _write_ID_line\n\n Title   : _write_ID_line\n Usage   : $self->_write_ID_line($seq);\n Function: Writes the EMBL Release 87 format ID line to the stream, unless\n         : there is a user-supplied ID line generation function in which\n         : case that is used instead.\n         : ( See Bio::SeqIO::embl::_id_generation_function(). )\n Returns : nothing\n Args    : Bio::Seq object\n\n\nsub _write_ID_line {\n\n    my ($self, $seq) = @_;\n\n    my $id_line;\n    # If there is a user-supplied ID generation function, use it.\n    if ( $self->_id_generation_func ) {\n        $id_line = \"ID   \" . &{$self->_id_generation_func}($seq) . \"\\nXX\\n\";\n    }\n    # Otherwise, generate a standard EMBL release 87 (June 2006) ID line.\n    else {\n\n        # The sequence name is supposed to be the primary accession number,\n        my $name = $seq->accession_number();\n        if (!$name) {\n            # but if it is not present, use the sequence ID.\n            $name = $seq->id();\n        }\n\n        $self->warn(\"No whitespace allowed in EMBL id [\". $name. \"]\") if $name =~ /\\s/;\n\n        # Use the sequence version, or default to 1.\n        my $version = $seq->version() || 1;\n\n        my $len = $seq->length();\n\n        # Taxonomic division.\n        my $div;\n        if ( $seq->can('division') && defined($seq->division) &&\n             $self->_is_valid_division($seq->division) ) {\n            $div = $seq->division();\n        } else {\n            $div ||= 'UNC';     # 'UNC' is the EMBL division code for 'unclassified'.\n        }\n\n        my $mol;\n        # If the molecule type is a valid EMBL type, use it.\n        if (  $seq->can('molecule')\n              && defined($seq->molecule)\n              && $self->_is_valid_molecule_type($seq->molecule)\n           ) {\n            $mol = $seq->molecule();\n        }\n        # Otherwise, choose unassigned DNA or RNA based on the alphabet.\n        elsif ($seq->can('primary_seq') && defined $seq->primary_seq->alphabet) {\n            my $alphabet =$seq->primary_seq->alphabet;\n            if ($alphabet eq 'dna') {\n                $mol ='unassigned DNA';\n            } elsif ($alphabet eq 'rna') {\n                $mol='unassigned RNA';\n            } elsif ($alphabet eq 'protein') {\n                $self->warn(\"Protein sequence found; EMBL is a nucleotide format.\");\n                $mol='AA';  # AA is not a valid EMBL molecule type.\n            }\n        }\n\n        my $topology = 'linear';\n        if ($seq->is_circular) {\n            $topology = 'circular';\n        }\n\n        $mol ||= '';            # 'unassigned'; ?\n        $id_line = \"ID   $name; SV $version; $topology; $mol; STD; $div; $len BP.\\nXX\\n\";\n        $self->_print($id_line);\n    }\n}\n\n=head2 _is_valid_division\n\n Title   : _is_valid_division\n Usage   : $self->_is_valid_division($div)\n Function: tests division code for validity\n Returns : true if $div is a valid EMBL release 87 taxonomic division.\n Args    : taxonomic division code string\n\n\nsub _is_valid_division {\n    my ($self, $division) = @_;\n\n    my %EMBL_divisions = (\n                          \"PHG\"    => 1, # Bacteriophage\n                          \"ENV\"    => 1, # Environmental Sample\n                          \"FUN\"    => 1, # Fungal\n                          \"HUM\"    => 1, # Human\n                          \"INV\"    => 1, # Invertebrate\n                          \"MAM\"    => 1, # Other Mammal\n                          \"VRT\"    => 1, # Other Vertebrate\n                          \"MUS\"    => 1, # Mus musculus\n                          \"PLN\"    => 1, # Plant\n                          \"PRO\"    => 1, # Prokaryote\n                          \"ROD\"    => 1, # Other Rodent\n                          \"SYN\"    => 1, # Synthetic\n                          \"UNC\"    => 1, # Unclassified\n                          \"VRL\"    => 1 # Viral\n                         );\n\n    return exists($EMBL_divisions{$division});\n}\n\n=head2 _is_valid_molecule_type\n\n Title   : _is_valid_molecule_type\n Usage   : $self->_is_valid_molecule_type($mol)\n Function: tests molecule type for validity\n Returns : true if $mol is a valid EMBL release 87 molecule type.\n Args    : molecule type string\n\n\nsub _is_valid_molecule_type {\n    my ($self, $moltype) = @_;\n\n    my %EMBL_molecule_types = (\n                               \"genomic DNA\"    => 1,\n                               \"genomic RNA\"    => 1,\n                               \"mRNA\"           => 1,\n                               \"tRNA\"           => 1,\n                               \"rRNA\"           => 1,\n                               \"snoRNA\"         => 1,\n                               \"snRNA\"          => 1,\n                               \"scRNA\"          => 1,\n                               \"pre-RNA\"        => 1,\n                               \"other RNA\"      => 1,\n                               \"other DNA\"      => 1,\n                               \"unassigned DNA\" => 1,\n                               \"unassigned RNA\" => 1\n                              );\n\n    return exists($EMBL_molecule_types{$moltype});\n}\n\n=head2 write_seq\n\n Title   : write_seq\n Usage   : $stream->write_seq($seq)\n Function: writes the $seq object (must be seq) to the stream\n Returns : 1 for success and undef for error\n Args    : array of 1 to n Bio::SeqI objects\n\n\n\nsub write_seq {\n    my ($self,@seqs) = @_;\n\n    foreach my $seq ( @seqs ) {\n        $self->throw(\"Attempting to write with no seq!\") unless defined $seq;\n        unless ( ref $seq && $seq->isa('Bio::SeqI' ) ) {\n            $self->warn(\"$seq is not a SeqI compliant sequence object!\")\n                if $self->verbose >= 0;\n            unless ( ref $seq && $seq->isa('Bio::PrimarySeqI' ) ) {\n                $self->throw(\"$seq is not a PrimarySeqI compliant sequence object!\");\n            }\n        }\n        my $str = $seq->seq || '';\n\n        # Write the ID line.\n        $self->_write_ID_line($seq);\n\n\n        # Write the accession line if present\n        my( $acc );\n        {\n            if ( my $func = $self->_ac_generation_func ) {\n                $acc = &{$func}($seq);\n            } elsif ( $seq->isa('Bio::Seq::RichSeqI') &&\n                      defined($seq->accession_number) ) {\n                $acc = $seq->accession_number;\n                $acc = join(\"; \", $acc, $seq->get_secondary_accessions);\n            } elsif ( $seq->can('accession_number') ) {\n                $acc = $seq->accession_number;\n            }\n\n            if (defined $acc) {\n                $self->_print(\"AC   $acc;\\n\",\n                              \"XX\\n\") || return;\n            }\n        }\n\n        # Date lines\n        my $switch=0;\n        if ( $seq->can('get_dates') ) {\n            my @dates =  $seq->get_dates();\n            my $ct = 1;\n            my $date_flag = 0;\n            my ($cr) = $seq->annotation->get_Annotations(\"creation_release\");\n            my ($ur) = $seq->annotation->get_Annotations(\"update_release\");\n            my ($uv) = $seq->annotation->get_Annotations(\"update_version\");\n\n            unless ($cr && $ur && $ur) {\n                $date_flag = 1;\n            }\n\n            foreach my $dt (@dates) {\n                if (!$date_flag) {\n                    $self->_write_line_EMBL_regex(\"DT   \",\"DT   \",\n                                                  $dt.\" (Rel. $cr, Created)\",\n                                                  '\\s+|$',80) if $ct == 1;\n                    $self->_write_line_EMBL_regex(\"DT   \",\"DT   \",\n                                                  $dt.\" (Rel. $ur, Last updated, Version $uv)\",\n                                                  '\\s+|$',80) if $ct == 2;\n                } else {        # other formats?\n                    $self->_write_line_EMBL_regex(\"DT   \",\"DT   \",\n                                                  $dt,'\\s+|$',80);\n                }\n                $switch =1;\n                $ct++;\n            }\n            if ($switch == 1) {\n                $self->_print(\"XX\\n\") || return;\n            }\n        }\n\n        # Description lines\n        $self->_write_line_EMBL_regex(\"DE   \",\"DE   \",$seq->desc(),'\\s+|$',80) || return; #'\n        $self->_print( \"XX\\n\") || return;\n\n        # if there, write the kw line\n        {\n            my( $kw );\n            if ( my $func = $self->_kw_generation_func ) {\n                $kw = &{$func}($seq);\n            } elsif ( $seq->can('keywords') ) {\n                $kw = $seq->keywords;\n            }\n            if (defined $kw) {\n                $self->_write_line_EMBL_regex(\"KW   \", \"KW   \", $kw, '\\s+|$', 80) || return; #'\n                $self->_print( \"XX\\n\") || return;\n            }\n        }\n\n        # Organism lines\n\n        if ($seq->can('species') && (my $spec = $seq->species)) {\n            my @class = $spec->classification();\n            shift @class;       # get rid of species name. Some embl files include\n                                # the species name in the OC lines, but this seems\n                                # more like an error than something we need to\n                                # emulate\n            my $OS = $spec->scientific_name;\n            if ($spec->common_name) {\n                $OS .= ' ('.$spec->common_name.')';\n            }\n            $self->_print(\"OS   $OS\\n\") || return;\n            my $OC = join('; ', reverse(@class)) .'.';\n            $self->_write_line_EMBL_regex(\"OC   \",\"OC   \",$OC,'; |$',80) || return;\n            if ($spec->organelle) {\n                $self->_write_line_EMBL_regex(\"OG   \",\"OG   \",$spec->organelle,'; |$',80) || return;\n            }\n            $self->_print(\"XX\\n\") || return;\n        }\n\n        # Reference lines\n        my $t = 1;\n        if ( $seq->can('annotation') && defined $seq->annotation ) {\n            foreach my $ref ( $seq->annotation->get_Annotations('reference') ) {\n                $self->_print( \"RN   [$t]\\n\") || return;\n\n                # Having no RP line is legal, but we need both\n                # start and end for a valid location.\n                if ($ref->comment) {\n                    $self->_write_line_EMBL_regex(\"RC   \", \"RC   \", $ref->comment, '\\s+|$', 80) || return; #'\n                }\n                my $start = $ref->start;\n                my $end   = $ref->end;\n                if ($start and $end) {\n                    $self->_print( \"RP   $start-$end\\n\") || return;\n                } elsif ($start or $end) {\n                    $self->throw(\"Both start and end are needed for a valid RP line.\".\n                                 \"  Got: start='$start' end='$end'\");\n                }\n\n                if (my $med = $ref->medline) {\n                    $self->_print( \"RX   MEDLINE; $med.\\n\") || return;\n                }\n                if (my $pm = $ref->pubmed) {\n                    $self->_print( \"RX   PUBMED; $pm.\\n\") || return;\n                }\n                my $authors = $ref->authors;\n                $authors =~ s/([\\w\\.]) (\\w)/$1#$2/g;  # add word wrap protection char '#'\n\n                $self->_write_line_EMBL_regex(\"RA   \", \"RA   \",\n                                              $authors . \";\",\n                                              '\\s+|$', 80) || return; #'\n\n                # If there is no title to the reference, it appears\n                # as a single semi-colon.  All titles must end in\n                # a semi-colon.\n                my $ref_title = $ref->title || '';\n                $ref_title =~ s/[\\s;]*$/;/;\n                $self->_write_line_EMBL_regex(\"RT   \", \"RT   \", $ref_title,    '\\s+|$', 80) || return; #'\n                $self->_write_line_EMBL_regex(\"RL   \", \"RL   \", $ref->location, '\\s+|$', 80) || return; #'\n                $self->_print(\"XX\\n\") || return;\n                $t++;\n            }\n\n            # DB Xref lines\n            if (my @db_xref = $seq->annotation->get_Annotations('dblink') ) {\n                for my $dr (@db_xref) {\n                    my $db_name = $dr->database;\n                    my $prim    = $dr->primary_id;\n\n                    my $opt     = $dr->optional_id || '';\n                    my $line = $opt ? \"$db_name; $prim; $opt.\" : \"$db_name; $prim.\";\n                    $self->_write_line_EMBL_regex(\"DR   \", \"DR   \", $line, '\\s+|$', 80) || return; #'\n                }\n                $self->_print(\"XX\\n\") || return;\n            }\n            \n            # Comment lines\n            foreach my $comment ( $seq->annotation->get_Annotations('comment') ) {\n                $self->_write_line_EMBL_regex(\"CC   \", \"CC   \", $comment->text, '\\s+|$', 80) || return; #'\n                $self->_print(\"XX\\n\") || return;\n            }\n        }\n        # \"\\\\s\\+\\|\\$\"\n\n        ## FEATURE TABLE\n\n        $self->_print(\"FH   Key             Location/Qualifiers\\n\") || return;\n        $self->_print(\"FH\\n\") || return;\n\n        my @feats = $seq->can('top_SeqFeatures') ? $seq->top_SeqFeatures : ();\n        if ($feats[0]) {\n            if ( defined $self->_post_sort ) {\n                # we need to read things into an array.\n                # Process. Sort them. Print 'em\n\n                my $post_sort_func = $self->_post_sort();\n                my @fth;\n\n                foreach my $sf ( @feats ) {\n                    push(@fth,Bio::SeqIO::FTHelper::from_SeqFeature($sf,$seq));\n                }\n\n                @fth = sort { &$post_sort_func($a,$b) } @fth;\n\n                foreach my $fth ( @fth ) {\n                    $self->_print_EMBL_FTHelper($fth) || return;\n                }\n            } else {\n                # not post sorted. And so we can print as we get them.\n                # lower memory load...\n\n                foreach my $sf ( @feats ) {\n                    my @fth = Bio::SeqIO::FTHelper::from_SeqFeature($sf,$seq);\n                    foreach my $fth ( @fth ) {\n                        if ( $fth->key eq 'CONTIG') {\n                            $self->_show_dna(0);\n                        }\n                        $self->_print_EMBL_FTHelper($fth) || return;\n                    }\n                }\n            }\n        }\n\n        if ( $self->_show_dna() == 0 ) {\n            $self->_print( \"//\\n\") || return;\n            return;\n        }\n        $self->_print( \"XX\\n\") || return;\n\n        # finished printing features.\n\n        $str =~ tr/A-Z/a-z/;\n\n        # Count each nucleotide\n        my $alen = $str =~ tr/a/a/;\n        my $clen = $str =~ tr/c/c/;\n        my $glen = $str =~ tr/g/g/;\n        my $tlen = $str =~ tr/t/t/;\n\n        my $len = $seq->length();\n        my $olen = $seq->length() - ($alen + $tlen + $clen + $glen);\n        if ( $olen < 0 ) {\n            $self->warn(\"Weird. More atgc than bases. Problem!\");\n        }\n\n        $self->_print(\"SQ   Sequence $len BP; $alen A; $clen C; $glen G; $tlen T; $olen other;\\n\") || return;\n\n        my $nuc = 60;       # Number of nucleotides per line\n        my $whole_pat = 'a10' x 6; # Pattern for unpacking a whole line\n        my $out_pat   = 'A11' x 6; # Pattern for packing a line\n        my $length = length($str);\n\n        # Calculate the number of nucleotides which fit on whole lines\n        my $whole = int($length / $nuc) * $nuc;\n\n        # Print the whole lines\n        my( $i );\n        for ($i = 0; $i < $whole; $i += $nuc) {\n            my $blocks = pack $out_pat,\n                unpack $whole_pat,\n                    substr($str, $i, $nuc);\n            $self->_print(sprintf(\"     $blocks%9d\\n\", $i + $nuc)) || return;\n        }\n\n        # Print the last line\n        if (my $last = substr($str, $i)) {\n            my $last_len = length($last);\n            my $last_pat = 'a10' x int($last_len / 10) .'a'. $last_len % 10;\n            my $blocks = pack $out_pat,\n                unpack($last_pat, $last);\n            $self->_print(sprintf(\"     $blocks%9d\\n\", $length)) ||\n                return;         # Add the length to the end\n        }\n\n        $self->_print( \"//\\n\") || return;\n\n        $self->flush if $self->_flush_on_write && defined $self->_fh;\n    }\n    return 1;\n}\n\n=head2 _print_EMBL_FTHelper\n\n Title   : _print_EMBL_FTHelper\n Usage   :\n Function: Internal function\n Returns : 1 if writing suceeded, otherwise undef\n Args    :\n\n\n\nsub _print_EMBL_FTHelper {\n    my ($self,$fth) = @_;\n\n    if ( ! ref $fth || ! $fth->isa('Bio::SeqIO::FTHelper') ) {\n        $fth->warn(\"$fth is not a FTHelper class. Attempting to print, but there could be tears!\");\n    }\n\n\n    #$self->_print( \"FH   Key             Location/Qualifiers\\n\");\n    #$self->_print( sprintf(\"FT   %-15s  %s\\n\",$fth->key,$fth->loc));\n    # let\n    if ( $fth->key eq 'CONTIG' ) {\n        $self->_print(\"XX\\n\") || return;\n        $self->_write_line_EMBL_regex(\"CO   \",\n                                      \"CO   \",$fth->loc,\n                                      '\\,|$',80) || return; #'\n        return 1;\n    }\n    $self->_write_line_EMBL_regex(sprintf(\"FT   %-15s \",$fth->key),\n                                  \"FT                   \",$fth->loc,\n                                  '\\,|$',80) || return; #'\n    foreach my $tag ( keys %{$fth->field} ) {\n        if ( ! defined $fth->field->{$tag} ) {\n            next;\n        }\n        foreach my $value ( @{$fth->field->{$tag}} ) {\n            $value =~ s/\\\"/\\\"\\\"/g;\n            if ($value eq \"_no_value\") {\n                $self->_write_line_EMBL_regex(\"FT                   \",\n                                              \"FT                   \",\n                                              \"/$tag\",'.|$',80) || return; #'\n            }\n            # there are almost 3x more quoted qualifier values and they\n            # are more common too so we take quoted ones first\n            elsif (!$FTQUAL_NO_QUOTE{$tag}) {\n                my $pat = $value =~ /\\s/ ? '\\s|\\-|$' : '.|\\-|$';\n                $self->_write_line_EMBL_regex(\"FT                   \",\n                                              \"FT                   \",\n                                              \"/$tag=\\\"$value\\\"\",$pat,80) || return;\n            } else {\n                $self->_write_line_EMBL_regex(\"FT                   \",\n                                              \"FT                   \",\n                                              \"/$tag=$value\",'.|$',80) || return; #'\n                                          }\n            }\n        }\n\n        return 1;\n    }\n\n#'\n=head2 _read_EMBL_References\n\n Title   : _read_EMBL_References\n Usage   :\n Function: Reads references from EMBL format. Internal function really\n Example :\n Returns :\n Args    :\n\n\n\nsub _read_EMBL_References {\n    my ($self,$buffer) = @_;\n    my (@refs);\n\n    # assume things are starting with RN\n\n    if ( $$buffer !~ /^RN/ ) {\n        warn(\"Not parsing line '$$buffer' which maybe important\");\n    }\n    my $b1;\n    my $b2;\n    my $title;\n    my $loc;\n    my $au;\n    my $med;\n    my $pm;\n    my $com;\n\n    while ( defined ($_ = $self->_readline) ) {\n        /^R/ || last;\n        /^RP   (\\d+)-(\\d+)/ && do {$b1=$1;$b2=$2;};\n        /^RX   MEDLINE;\\s+(\\d+)/ && do {$med=$1};\n        /^RX   PUBMED;\\s+(\\d+)/ && do {$pm=$1};\n        /^RA   (.*)/ && do {\n            $au = $self->_concatenate_lines($au,$1); next;\n        };\n        /^RT   (.*)/ && do {\n            $title = $self->_concatenate_lines($title,$1); next;\n        };\n        /^RL   (.*)/ && do {\n            $loc = $self->_concatenate_lines($loc,$1); next;\n        };\n        /^RC   (.*)/ && do {\n            $com = $self->_concatenate_lines($com,$1); next;\n        };\n    }\n\n    my $ref = Bio::Annotation::Reference->new();\n    $au =~ s/;\\s*$//g;\n    $title =~ s/;\\s*$//g;\n\n    $ref->start($b1);\n    $ref->end($b2);\n    $ref->authors($au);\n    $ref->title($title);\n    $ref->location($loc);\n    $ref->medline($med);\n    $ref->comment($com);\n    $ref->pubmed($pm);\n\n    push(@refs,$ref);\n    $$buffer = $_;\n\n    return @refs;\n}\n\n=head2 _read_EMBL_Species\n\n Title   : _read_EMBL_Species\n Usage   :\n Function: Reads the EMBL Organism species and classification\n           lines.\n Example :\n Returns : A Bio::Species object\n Args    : a reference to the current line buffer, accession number\n\n\nsub _read_EMBL_Species {\n    my( $self, $buffer, $acc ) = @_;\n    my $org;\n\n    $_ = $$buffer;\n    my( $sub_species, $species, $genus, $common, $sci_name, $class_lines );\n    while (defined( $_ ||= $self->_readline )) {\n        if (/^OS\\s+(.+)/) {\n            $sci_name .= ($sci_name) ? ' '.$1 : $1;\n        } elsif (s/^OC\\s+(.+)$//) {\n            $class_lines .= $1;\n        } elsif (/^OG\\s+(.*)/) {\n            $org = $1;\n        } else {\n            last;\n        }\n\n        $_ = undef;             # Empty $_ to trigger read of next line\n    }\n\n#    $$buffer = $_;\n\t$self->_pushback($_);\n\t\n    $sci_name =~ s{\\.$}{};\n    $sci_name || return;\n\n    # Convert data in classification lines into classification array.\n    # only split on ';' or '.' so that classification that is 2 or more words\n    # will still get matched, use map() to remove trailing/leading/intervening\n    # spaces\n    my @class = map { s/^\\s+//; s/\\s+$//; s/\\s{2,}/ /g; $_; } split /(?<!subgen)[;\\.]+/, $class_lines;\n\n    # do we have a genus?\n    my $possible_genus = $class[-1];\n    $possible_genus .= \"|$class[-2]\" if $class[-2];\n    if ($sci_name =~ /^($possible_genus)/) {\n        $genus = $1;\n        ($species) = $sci_name =~ /^$genus\\s+(.+)/;\n    } else {\n        $species = $sci_name;\n    }\n\n    # Don't make a species object if it is \"Unknown\" or \"None\"\n    if ($genus) {\n        return if $genus =~ /^(Unknown|None)$/i;\n    }\n\n    # is this organism of rank species or is it lower?\n    # (doesn't catch everything, but at least the guess isn't dangerous)\n    if ($species =~ /subsp\\.|var\\./) {\n        ($species, $sub_species) = $species =~ /(.+)\\s+((?:subsp\\.|var\\.).+)/;\n    }\n\n    # sometimes things have common name in brackets, like\n    # Schizosaccharomyces pombe (fission yeast), so get rid of the common\n    # name bit. Probably dangerous if real scientific species name ends in\n    # bracketed bit.\n    unless ($class[-1] eq 'Viruses') {\n        ($species, $common) = $species =~ /^(.+)\\s+\\((.+)\\)$/;\n        $sci_name =~ s/\\s+\\(.+\\)$// if $common;\n    }\n\n    # Bio::Species array needs array in Species -> Kingdom direction\n    unless ($class[-1] eq $sci_name) {\n        push(@class, $sci_name);\n    }\n    @class = reverse @class;\n\n    # do minimal sanity checks before we hand off to Bio::Species which won't\n    # be able to give informative throw messages if it has to throw because\n    # of problems here\n    $self->throw(\"$acc seems to be missing its OS line: invalid.\") unless $sci_name;\n    my %names;\n    foreach my $i (0..$#class) {\n        my $name = $class[$i];\n        $names{$name}++;\n        if ($names{$name} > 1 && $name ne $class[$i - 1]) {\n            $self->throw(\"$acc seems to have an invalid species classification.\");\n        }\n    }\n    my $make = Bio::Species->new();\n    $make->scientific_name($sci_name);\n    $make->classification(@class);\n    unless ($class[-1] eq 'Viruses') {\n        $make->genus($genus) if $genus;\n        $make->species($species) if $species;\n        $make->sub_species($sub_species) if $sub_species;\n        $make->common_name($common) if $common;\n    }\n    $make->organelle($org) if $org;\n    return $make;\n}\n\n=head2 _read_EMBL_DBLink\n\n Title   : _read_EMBL_DBLink\n Usage   :\n Function: Reads the EMBL database cross reference (\"DR\") lines\n Example :\n Returns : A list of Bio::Annotation::DBLink objects\n Args    :\n\n\nsub _read_EMBL_DBLink {\n    my( $self,$buffer ) = @_;\n    my( @db_link );\n\n    $_ = $$buffer;\n    while (defined( $_ ||= $self->_readline )) {\n        if ( /^DR   ([^\\s;]+);\\s*([^\\s;]+);?\\s*([^\\s;]+)?\\.$/) {\n        my ($databse, $prim_id, $sec_id) = ($1,$2,$3);\n        my $link = Bio::Annotation::DBLink->new(-database    => $databse,\n                            -primary_id  => $prim_id,\n                            -optional_id => $sec_id);\n\n            push(@db_link, $link);\n    } else {\n            last;\n        }\n        $_ = undef;             # Empty $_ to trigger read of next line\n    }\n\n    $$buffer = $_;\n    return @db_link;\n}\n\n=head2 _read_EMBL_TaxID_DBLink\n\n Title   : _read_EMBL_TaxID_DBLink\n Usage   :\n Function: Reads the EMBL database cross reference to NCBI TaxID (\"OX\") lines\n Example :\n Returns : A list of Bio::Annotation::DBLink objects\n Args    :","parameters":[{"label":"$self"},{"label":"$buffer"}]},"detail":"($self,$buffer)","name":"_read_EMBL_TaxID_DBLink","range":{"start":{"line":1171,"character":0},"end":{"line":1190,"character":9999}}},{"kind":12,"line":1175,"name":"buffer"},{"containerName":"Annotation::DBLink","name":"Bio","kind":12,"line":1179},{"kind":12,"line":1188,"name":"buffer"},{"name":"_filehandle","range":{"start":{"line":1204,"character":0},"end":{"character":9999,"line":1211}},"containerName":"main::","definition":"sub","detail":"($obj,$value)","signature":{"label":"_filehandle($obj,$value)","parameters":[{"label":"$obj"},{"label":"$value"}],"documentation":"1;\n# $Id: embl.pm 16123 2009-09-17 12:57:27Z cjfields $\n#\n# BioPerl module for Bio::SeqIO::EMBL\n#\n# Please direct questions and support issues to <bioperl-l@bioperl.org> \n#\n# Cared for by Ewan Birney <birney@ebi.ac.uk>\n#\n# Copyright Ewan Birney\n#\n# You may distribute this module under the same terms as perl itself\n\n# POD documentation - main docs before the code\n\n=head1 NAME\n\nBio::SeqIO::embl - EMBL sequence input/output stream\n\n=head1 SYNOPSIS\n\nIt is probably best not to use this object directly, but\nrather go through the SeqIO handler system. Go:\n\n    $stream = Bio::SeqIO->new(-file => $filename, -format => 'EMBL');\n\n    while ( (my $seq = $stream->next_seq()) ) {\n        # do something with $seq\n    }\n\n=head1 DESCRIPTION\n\nThis object can transform Bio::Seq objects to and from EMBL flat\nfile databases.\n\nThere is a lot of flexibility here about how to dump things which\nshould be documented more fully.\n\nThere should be a common object that this and Genbank share (probably\nwith Swissprot). Too much of the magic is identical.\n\n=head2 Optional functions\n\n=over 3\n\n* _show_dna()\n\n(output only) shows the dna or not\n\n* _post_sort()\n\n(output only) provides a sorting func which is applied to the FTHelpers\nbefore printing\n\n* _id_generation_func()\n\nThis is function which is called as\n\n   print \"ID   \", $func($annseq), \"\\n\";\n\nTo generate the ID line. If it is not there, it generates a sensible ID\nline using a number of tools.\n\nIf you want to output annotations in EMBL format they need to be\nstored in a Bio::Annotation::Collection object which is accessible\nthrough the Bio::SeqI interface method L<annotation()|annotation>.\n\nThe following are the names of the keys which are polled from a\nL<Bio::Annotation::Collection> object.\n\n reference  - Should contain Bio::Annotation::Reference objects\n comment    - Should contain Bio::Annotation::Comment objects\n dblink     - Should contain Bio::Annotation::DBLink objects\n\n\n=head1 FEEDBACK\n\n=head2 Mailing Lists\n\nUser feedback is an integral part of the evolution of this and other\nBioperl modules. Send your comments and suggestions preferably to one\nof the Bioperl mailing lists.  Your participation is much appreciated.\n\n  bioperl-l@bioperl.org                  - General discussion\n  http://bioperl.org/wiki/Mailing_lists  - About the mailing lists\n\n=head2 Support \n\nPlease direct usage questions or support issues to the mailing list:\n\nI<bioperl-l@bioperl.org>\n\nrather than to the module maintainer directly. Many experienced and \nreponsive experts will be able look at the problem and quickly \naddress it. Please include a thorough description of the problem \nwith code and data examples if at all possible.\n\n=head2 Reporting Bugs\n\nReport bugs to the Bioperl bug tracking system to help us keep track\nthe bugs and their resolution. Bug reports can be submitted via\nthe web:\n\n  http://bugzilla.open-bio.org/\n\n=head1 AUTHOR - Ewan Birney\n\nEmail birney@ebi.ac.uk\n\n=head1 APPENDIX\n\nThe rest of the documentation details each of the object\nmethods. Internal methods are usually preceded with a _\n\n\n\n# Let the code begin...\n\n\npackage Bio::SeqIO::embl;\nuse vars qw(%FTQUAL_NO_QUOTE);\nuse strict;\nuse Bio::SeqIO::FTHelper;\nuse Bio::SeqFeature::Generic;\nuse Bio::Species;\nuse Bio::Seq::SeqFactory;\nuse Bio::Annotation::Collection;\nuse Bio::Annotation::Comment;\nuse Bio::Annotation::Reference;\nuse Bio::Annotation::DBLink;\n\nuse base qw(Bio::SeqIO);\n\n%FTQUAL_NO_QUOTE=(\n                  'anticodon'=>1,\n                  'citation'=>1,\n                  'codon'=>1,\n                  'codon_start'=>1,\n                  'cons_splice'=>1,\n                  'direction'=>1,\n                  'evidence'=>1,\n                  'label'=>1,\n                  'mod_base'=> 1,\n                  'number'=> 1,\n                  'rpt_type'=> 1,\n                  'rpt_unit'=> 1,\n                  'transl_except'=> 1,\n                  'transl_table'=> 1,\n                  'usedin'=> 1,\n                 );\n\nsub _initialize {\n    my($self,@args) = @_;\n\n    $self->SUPER::_initialize(@args);\n    # hash for functions for decoding keys.\n    $self->{'_func_ftunit_hash'} = {};\n    # sets this to one by default. People can change it\n    $self->_show_dna(1);\n    if ( ! defined $self->sequence_factory ) {\n        $self->sequence_factory(Bio::Seq::SeqFactory->new\n                                (-verbose => $self->verbose(),\n                                 -type => 'Bio::Seq::RichSeq'));\n    }\n}\n\n=head2 next_seq\n\n Title   : next_seq\n Usage   : $seq = $stream->next_seq()\n Function: returns the next sequence in the stream\n Returns : Bio::Seq object\n Args    :\n\n\nsub next_seq {\n    my ($self,@args) = @_;\n    my ($pseq,$c,$line,$name,$desc,$acc,$seqc,$mol,$div,\n        $date, $comment, @date_arr);\n\n    my ($annotation, %params, @features) =\n        Bio::Annotation::Collection->new();\n\n    $line = $self->_readline;\n    # This needs to be before the first eof() test\n\n    if ( !defined $line ) {\n        return;                 # no throws - end of file\n    }\n\n    if ( $line =~ /^\\s+$/ ) {\n        while ( defined ($line = $self->_readline) ) {\n            $line =~/^\\S/ && last;\n        }\n        # return without error if the whole next sequence was just a single\n        # blank line and then eof\n        return unless $line;\n    }\n\n    # no ID as 1st non-blank line, need short circuit and exit routine\n    $self->throw(\"EMBL stream with no ID. Not embl in my book\")\n        unless $line =~ /^ID\\s+\\S+/;\n\n    # At this point we are sure that $line contains an ID header line\n    my $alphabet;\n    if ( $line =~ tr/;/;/ == 6) { # New style headers contain exactly six semicolons.\n\n        # New style header (EMBL Release >= 87, after June 2006)\n        my $topology;\n        my $sv;\n\n        # ID   DQ299383; SV 1; linear; mRNA; STD; MAM; 431 BP.\n        # This regexp comes from the new2old.pl conversion script, from EBI\n        if ($line =~ m/^ID   (\\w+);\\s+SV (\\d+); (\\w+); ([^;]+); (\\w{3}); (\\w{3}); (\\d+) BP./) {\n        ($name, $sv, $topology, $mol, $div) = ($1, $2, $3, $4, $6);\n        }\n        if (defined($sv)) {\n        $params{'-seq_version'} = $sv;\n        $params{'-version'} = $sv;\n        }\n\n        if ($topology eq \"circular\") {\n        $params{'-is_circular'} = 1;\n        }\n    \n    if (defined $mol ) {\n        if ($mol =~ /DNA/) {\n        $alphabet='dna';\n        } elsif ($mol =~ /RNA/) {\n        $alphabet='rna';\n        } elsif ($mol =~ /AA/) {\n        $alphabet='protein';\n        }\n    }\n    } else {\n    \n        # Old style header (EMBL Release < 87, before June 2006)\n        if ($line =~ /^ID\\s+(\\S+)[^;]*;\\s+(\\S+)[^;]*;\\s+(\\S+)[^;]*;/) {\n        ($name, $mol, $div) = ($1, $2, $3);\n        }\n    \n        if ($mol) {\n            if ( $mol =~ /circular/ ) {\n            $params{'-is_circular'} = 1;\n            $mol =~  s|circular ||;\n            }\n            if (defined $mol ) {\n            if ($mol =~ /DNA/) {\n                $alphabet='dna';\n            } elsif ($mol =~ /RNA/) {\n                $alphabet='rna';\n            } elsif ($mol =~ /AA/) {\n                $alphabet='protein';\n            }\n            }\n        }\n    }\n\n    unless( defined $name && length($name) ) {\n    $name = \"unknown_id\";\n    }\n\n    # $self->warn(\"not parsing upper annotation in EMBL file yet!\");\n    my $buffer = $line;\n    local $_;\n    BEFORE_FEATURE_TABLE :\n          until ( !defined $buffer ) {\n              $_ = $buffer;\n              # Exit at start of Feature table\n              if ( /^(F[HT]|SQ)/ ) {\n                  $self->_pushback($_) if( $1 eq 'SQ' || $1 eq 'FT');\n                  last;\n              }\n              # Description line(s)\n              if (/^DE\\s+(\\S.*\\S)/) {\n                  $desc .= $desc ? \" $1\" : $1;\n              }\n\n              #accession number\n              if ( /^AC\\s+(.*)?/ || /^PA\\s+(.*)?/) {\n                  my @accs = split(/[; ]+/, $1); # allow space in addition\n                  $params{'-accession_number'} = shift @accs\n                      unless defined $params{'-accession_number'};\n                  push @{$params{'-secondary_accessions'}}, @accs;\n              }\n\n              #version number\n              if ( /^SV\\s+\\S+\\.(\\d+);?/ ) {\n                  my $sv = $1;\n                  #$sv =~ s/\\;//;\n                  $params{'-seq_version'} = $sv;\n                  $params{'-version'} = $sv;\n              }\n\n              #date (NOTE: takes last date line)\n              if ( /^DT\\s+(.+)$/ ) {\n                  my $line = $1;\n                  my ($date, $version) = split(' ', $line, 2);\n                  $date =~ tr/,//d; # remove comma if new version\n                  if ($version =~ /\\(Rel\\. (\\d+), Created\\)/xms ) {\n                      my $release = Bio::Annotation::SimpleValue->new(\n                                                                      -tagname    => 'creation_release',\n                                                                      -value      => $1\n                                                                     );\n                      $annotation->add_Annotation($release);\n                  } elsif ($version =~ /\\(Rel\\. (\\d+), Last updated, Version (\\d+)\\)/xms ) {\n                      my $release = Bio::Annotation::SimpleValue->new(\n                                                                      -tagname    => 'update_release',\n                                                                      -value      => $1\n                                                                     );\n                      $annotation->add_Annotation($release);\n\n                      my $update = Bio::Annotation::SimpleValue->new(\n                                                                     -tagname    => 'update_version',\n                                                                     -value      => $2\n                                                                    );\n                      $annotation->add_Annotation($update);\n                  }\n                  push @{$params{'-dates'}}, $date;\n              }\n\n              #keywords\n              if ( /^KW   (.*)\\S*$/ ) {\n                  my @kw = split(/\\s*\\;\\s*/,$1);\n                  push @{$params{'-keywords'}}, @kw;\n              }\n\n              # Organism name and phylogenetic information\n              elsif (/^O[SC]/) {\n                  # pass the accession number so we can give an informative throw message if necessary\n                  my $species = $self->_read_EMBL_Species(\\$buffer, $params{'-accession_number'});\n                  $params{'-species'}= $species;\n              }\n\n              # NCBI TaxID Xref\n              elsif (/^OX/) {\n                  my @links = $self->_read_EMBL_TaxID_DBLink(\\$buffer);\n                  foreach my $dblink ( @links ) {\n                      $annotation->add_Annotation('dblink',$dblink);\n                  }\n              }\n\n              # References\n              elsif (/^R/) {\n                  my @refs = $self->_read_EMBL_References(\\$buffer);\n                  foreach my $ref ( @refs ) {\n                      $annotation->add_Annotation('reference',$ref);\n                  }\n              }\n\n              # DB Xrefs\n              elsif (/^DR/) {\n                  my @links = $self->_read_EMBL_DBLink(\\$buffer);\n                  foreach my $dblink ( @links ) {\n                      $annotation->add_Annotation('dblink',$dblink);\n                  }\n              }\n\n              # Comments\n              elsif (/^CC\\s+(.*)/) {\n                  $comment .= $1;\n                  $comment .= \" \";\n                  while (defined ($_ = $self->_readline) ) {\n                      if (/^CC\\s+(.*)/) {\n                          $comment .= $1;\n                          $comment .= \" \";\n                      } else {\n                          last;\n                      }\n                  }\n                  my $commobj = Bio::Annotation::Comment->new();\n                  $commobj->text($comment);\n                  $annotation->add_Annotation('comment',$commobj);\n                  $comment = \"\";\n              }\n\n              # Get next line.\n              $buffer = $self->_readline;\n          }\n\n    while ( defined ($_ = $self->_readline) ) {\n        /^FT\\s{3}\\w/ && last;\n        /^SQ / && last;\n        /^CO / && last;\n    }\n    $buffer = $_;\n\n    if (defined($buffer) && $buffer =~ /^FT /) {\n        until ( !defined ($buffer) ) {\n            my $ftunit = $self->_read_FTHelper_EMBL(\\$buffer);\n\n            # process ftunit\n            my $feat =\n                $ftunit->_generic_seqfeature($self->location_factory(), $name);\n\n            # add taxon_id from source if available\n            if ($params{'-species'} && ($feat->primary_tag eq 'source')\n                && $feat->has_tag('db_xref')\n                && (! $params{'-species'}->ncbi_taxid())) {\n                foreach my $tagval ($feat->get_tag_values('db_xref')) {\n                    if (index($tagval,\"taxon:\") == 0) {\n                        $params{'-species'}->ncbi_taxid(substr($tagval,6));\n                        last;\n                    }\n                }\n            }\n\n            # add feature to list of features\n            push(@features, $feat);\n\n            if ( $buffer !~ /^FT/ ) {\n                last;\n            }\n        }\n    }\n    # skip comments\n    while ( defined ($buffer) && $buffer =~ /^XX/ ) {\n        $buffer = $self->_readline();\n    }\n\n    if ( $buffer =~ /^CO/  ) {\n        until ( !defined ($buffer) ) {\n            my $ftunit = $self->_read_FTHelper_EMBL(\\$buffer);\n            # process ftunit\n            push(@features,\n                 $ftunit->_generic_seqfeature($self->location_factory(),\n                                              $name));\n\n            if ( $buffer !~ /^CO/ ) {\n                last;\n            }\n        }\n    }\n    if ( $buffer !~ /^SQ/  ) {\n        while ( defined ($_ = $self->_readline) ) {\n            /^SQ/ && last;\n        }\n    }\n    $seqc = \"\";\n    while ( defined ($_ = $self->_readline) ) {\n        m{^//} && last;\n        $_ = uc($_);\n        s/[^A-Za-z]//g;\n        $seqc .= $_;\n    }\n    my $seq = $self->sequence_factory->create\n        (-verbose => $self->verbose(),\n         -division => $div,\n         -seq => $seqc,\n         -desc => $desc,\n         -display_id => $name,\n         -annotation => $annotation,\n         -molecule => $mol,\n         -alphabet => $alphabet,\n         -features => \\@features,\n         %params);\n    return $seq;\n}\n\n\n\n=head2 _write_ID_line\n\n Title   : _write_ID_line\n Usage   : $self->_write_ID_line($seq);\n Function: Writes the EMBL Release 87 format ID line to the stream, unless\n         : there is a user-supplied ID line generation function in which\n         : case that is used instead.\n         : ( See Bio::SeqIO::embl::_id_generation_function(). )\n Returns : nothing\n Args    : Bio::Seq object\n\n\nsub _write_ID_line {\n\n    my ($self, $seq) = @_;\n\n    my $id_line;\n    # If there is a user-supplied ID generation function, use it.\n    if ( $self->_id_generation_func ) {\n        $id_line = \"ID   \" . &{$self->_id_generation_func}($seq) . \"\\nXX\\n\";\n    }\n    # Otherwise, generate a standard EMBL release 87 (June 2006) ID line.\n    else {\n\n        # The sequence name is supposed to be the primary accession number,\n        my $name = $seq->accession_number();\n        if (!$name) {\n            # but if it is not present, use the sequence ID.\n            $name = $seq->id();\n        }\n\n        $self->warn(\"No whitespace allowed in EMBL id [\". $name. \"]\") if $name =~ /\\s/;\n\n        # Use the sequence version, or default to 1.\n        my $version = $seq->version() || 1;\n\n        my $len = $seq->length();\n\n        # Taxonomic division.\n        my $div;\n        if ( $seq->can('division') && defined($seq->division) &&\n             $self->_is_valid_division($seq->division) ) {\n            $div = $seq->division();\n        } else {\n            $div ||= 'UNC';     # 'UNC' is the EMBL division code for 'unclassified'.\n        }\n\n        my $mol;\n        # If the molecule type is a valid EMBL type, use it.\n        if (  $seq->can('molecule')\n              && defined($seq->molecule)\n              && $self->_is_valid_molecule_type($seq->molecule)\n           ) {\n            $mol = $seq->molecule();\n        }\n        # Otherwise, choose unassigned DNA or RNA based on the alphabet.\n        elsif ($seq->can('primary_seq') && defined $seq->primary_seq->alphabet) {\n            my $alphabet =$seq->primary_seq->alphabet;\n            if ($alphabet eq 'dna') {\n                $mol ='unassigned DNA';\n            } elsif ($alphabet eq 'rna') {\n                $mol='unassigned RNA';\n            } elsif ($alphabet eq 'protein') {\n                $self->warn(\"Protein sequence found; EMBL is a nucleotide format.\");\n                $mol='AA';  # AA is not a valid EMBL molecule type.\n            }\n        }\n\n        my $topology = 'linear';\n        if ($seq->is_circular) {\n            $topology = 'circular';\n        }\n\n        $mol ||= '';            # 'unassigned'; ?\n        $id_line = \"ID   $name; SV $version; $topology; $mol; STD; $div; $len BP.\\nXX\\n\";\n        $self->_print($id_line);\n    }\n}\n\n=head2 _is_valid_division\n\n Title   : _is_valid_division\n Usage   : $self->_is_valid_division($div)\n Function: tests division code for validity\n Returns : true if $div is a valid EMBL release 87 taxonomic division.\n Args    : taxonomic division code string\n\n\nsub _is_valid_division {\n    my ($self, $division) = @_;\n\n    my %EMBL_divisions = (\n                          \"PHG\"    => 1, # Bacteriophage\n                          \"ENV\"    => 1, # Environmental Sample\n                          \"FUN\"    => 1, # Fungal\n                          \"HUM\"    => 1, # Human\n                          \"INV\"    => 1, # Invertebrate\n                          \"MAM\"    => 1, # Other Mammal\n                          \"VRT\"    => 1, # Other Vertebrate\n                          \"MUS\"    => 1, # Mus musculus\n                          \"PLN\"    => 1, # Plant\n                          \"PRO\"    => 1, # Prokaryote\n                          \"ROD\"    => 1, # Other Rodent\n                          \"SYN\"    => 1, # Synthetic\n                          \"UNC\"    => 1, # Unclassified\n                          \"VRL\"    => 1 # Viral\n                         );\n\n    return exists($EMBL_divisions{$division});\n}\n\n=head2 _is_valid_molecule_type\n\n Title   : _is_valid_molecule_type\n Usage   : $self->_is_valid_molecule_type($mol)\n Function: tests molecule type for validity\n Returns : true if $mol is a valid EMBL release 87 molecule type.\n Args    : molecule type string\n\n\nsub _is_valid_molecule_type {\n    my ($self, $moltype) = @_;\n\n    my %EMBL_molecule_types = (\n                               \"genomic DNA\"    => 1,\n                               \"genomic RNA\"    => 1,\n                               \"mRNA\"           => 1,\n                               \"tRNA\"           => 1,\n                               \"rRNA\"           => 1,\n                               \"snoRNA\"         => 1,\n                               \"snRNA\"          => 1,\n                               \"scRNA\"          => 1,\n                               \"pre-RNA\"        => 1,\n                               \"other RNA\"      => 1,\n                               \"other DNA\"      => 1,\n                               \"unassigned DNA\" => 1,\n                               \"unassigned RNA\" => 1\n                              );\n\n    return exists($EMBL_molecule_types{$moltype});\n}\n\n=head2 write_seq\n\n Title   : write_seq\n Usage   : $stream->write_seq($seq)\n Function: writes the $seq object (must be seq) to the stream\n Returns : 1 for success and undef for error\n Args    : array of 1 to n Bio::SeqI objects\n\n\n\nsub write_seq {\n    my ($self,@seqs) = @_;\n\n    foreach my $seq ( @seqs ) {\n        $self->throw(\"Attempting to write with no seq!\") unless defined $seq;\n        unless ( ref $seq && $seq->isa('Bio::SeqI' ) ) {\n            $self->warn(\"$seq is not a SeqI compliant sequence object!\")\n                if $self->verbose >= 0;\n            unless ( ref $seq && $seq->isa('Bio::PrimarySeqI' ) ) {\n                $self->throw(\"$seq is not a PrimarySeqI compliant sequence object!\");\n            }\n        }\n        my $str = $seq->seq || '';\n\n        # Write the ID line.\n        $self->_write_ID_line($seq);\n\n\n        # Write the accession line if present\n        my( $acc );\n        {\n            if ( my $func = $self->_ac_generation_func ) {\n                $acc = &{$func}($seq);\n            } elsif ( $seq->isa('Bio::Seq::RichSeqI') &&\n                      defined($seq->accession_number) ) {\n                $acc = $seq->accession_number;\n                $acc = join(\"; \", $acc, $seq->get_secondary_accessions);\n            } elsif ( $seq->can('accession_number') ) {\n                $acc = $seq->accession_number;\n            }\n\n            if (defined $acc) {\n                $self->_print(\"AC   $acc;\\n\",\n                              \"XX\\n\") || return;\n            }\n        }\n\n        # Date lines\n        my $switch=0;\n        if ( $seq->can('get_dates') ) {\n            my @dates =  $seq->get_dates();\n            my $ct = 1;\n            my $date_flag = 0;\n            my ($cr) = $seq->annotation->get_Annotations(\"creation_release\");\n            my ($ur) = $seq->annotation->get_Annotations(\"update_release\");\n            my ($uv) = $seq->annotation->get_Annotations(\"update_version\");\n\n            unless ($cr && $ur && $ur) {\n                $date_flag = 1;\n            }\n\n            foreach my $dt (@dates) {\n                if (!$date_flag) {\n                    $self->_write_line_EMBL_regex(\"DT   \",\"DT   \",\n                                                  $dt.\" (Rel. $cr, Created)\",\n                                                  '\\s+|$',80) if $ct == 1;\n                    $self->_write_line_EMBL_regex(\"DT   \",\"DT   \",\n                                                  $dt.\" (Rel. $ur, Last updated, Version $uv)\",\n                                                  '\\s+|$',80) if $ct == 2;\n                } else {        # other formats?\n                    $self->_write_line_EMBL_regex(\"DT   \",\"DT   \",\n                                                  $dt,'\\s+|$',80);\n                }\n                $switch =1;\n                $ct++;\n            }\n            if ($switch == 1) {\n                $self->_print(\"XX\\n\") || return;\n            }\n        }\n\n        # Description lines\n        $self->_write_line_EMBL_regex(\"DE   \",\"DE   \",$seq->desc(),'\\s+|$',80) || return; #'\n        $self->_print( \"XX\\n\") || return;\n\n        # if there, write the kw line\n        {\n            my( $kw );\n            if ( my $func = $self->_kw_generation_func ) {\n                $kw = &{$func}($seq);\n            } elsif ( $seq->can('keywords') ) {\n                $kw = $seq->keywords;\n            }\n            if (defined $kw) {\n                $self->_write_line_EMBL_regex(\"KW   \", \"KW   \", $kw, '\\s+|$', 80) || return; #'\n                $self->_print( \"XX\\n\") || return;\n            }\n        }\n\n        # Organism lines\n\n        if ($seq->can('species') && (my $spec = $seq->species)) {\n            my @class = $spec->classification();\n            shift @class;       # get rid of species name. Some embl files include\n                                # the species name in the OC lines, but this seems\n                                # more like an error than something we need to\n                                # emulate\n            my $OS = $spec->scientific_name;\n            if ($spec->common_name) {\n                $OS .= ' ('.$spec->common_name.')';\n            }\n            $self->_print(\"OS   $OS\\n\") || return;\n            my $OC = join('; ', reverse(@class)) .'.';\n            $self->_write_line_EMBL_regex(\"OC   \",\"OC   \",$OC,'; |$',80) || return;\n            if ($spec->organelle) {\n                $self->_write_line_EMBL_regex(\"OG   \",\"OG   \",$spec->organelle,'; |$',80) || return;\n            }\n            $self->_print(\"XX\\n\") || return;\n        }\n\n        # Reference lines\n        my $t = 1;\n        if ( $seq->can('annotation') && defined $seq->annotation ) {\n            foreach my $ref ( $seq->annotation->get_Annotations('reference') ) {\n                $self->_print( \"RN   [$t]\\n\") || return;\n\n                # Having no RP line is legal, but we need both\n                # start and end for a valid location.\n                if ($ref->comment) {\n                    $self->_write_line_EMBL_regex(\"RC   \", \"RC   \", $ref->comment, '\\s+|$', 80) || return; #'\n                }\n                my $start = $ref->start;\n                my $end   = $ref->end;\n                if ($start and $end) {\n                    $self->_print( \"RP   $start-$end\\n\") || return;\n                } elsif ($start or $end) {\n                    $self->throw(\"Both start and end are needed for a valid RP line.\".\n                                 \"  Got: start='$start' end='$end'\");\n                }\n\n                if (my $med = $ref->medline) {\n                    $self->_print( \"RX   MEDLINE; $med.\\n\") || return;\n                }\n                if (my $pm = $ref->pubmed) {\n                    $self->_print( \"RX   PUBMED; $pm.\\n\") || return;\n                }\n                my $authors = $ref->authors;\n                $authors =~ s/([\\w\\.]) (\\w)/$1#$2/g;  # add word wrap protection char '#'\n\n                $self->_write_line_EMBL_regex(\"RA   \", \"RA   \",\n                                              $authors . \";\",\n                                              '\\s+|$', 80) || return; #'\n\n                # If there is no title to the reference, it appears\n                # as a single semi-colon.  All titles must end in\n                # a semi-colon.\n                my $ref_title = $ref->title || '';\n                $ref_title =~ s/[\\s;]*$/;/;\n                $self->_write_line_EMBL_regex(\"RT   \", \"RT   \", $ref_title,    '\\s+|$', 80) || return; #'\n                $self->_write_line_EMBL_regex(\"RL   \", \"RL   \", $ref->location, '\\s+|$', 80) || return; #'\n                $self->_print(\"XX\\n\") || return;\n                $t++;\n            }\n\n            # DB Xref lines\n            if (my @db_xref = $seq->annotation->get_Annotations('dblink') ) {\n                for my $dr (@db_xref) {\n                    my $db_name = $dr->database;\n                    my $prim    = $dr->primary_id;\n\n                    my $opt     = $dr->optional_id || '';\n                    my $line = $opt ? \"$db_name; $prim; $opt.\" : \"$db_name; $prim.\";\n                    $self->_write_line_EMBL_regex(\"DR   \", \"DR   \", $line, '\\s+|$', 80) || return; #'\n                }\n                $self->_print(\"XX\\n\") || return;\n            }\n            \n            # Comment lines\n            foreach my $comment ( $seq->annotation->get_Annotations('comment') ) {\n                $self->_write_line_EMBL_regex(\"CC   \", \"CC   \", $comment->text, '\\s+|$', 80) || return; #'\n                $self->_print(\"XX\\n\") || return;\n            }\n        }\n        # \"\\\\s\\+\\|\\$\"\n\n        ## FEATURE TABLE\n\n        $self->_print(\"FH   Key             Location/Qualifiers\\n\") || return;\n        $self->_print(\"FH\\n\") || return;\n\n        my @feats = $seq->can('top_SeqFeatures') ? $seq->top_SeqFeatures : ();\n        if ($feats[0]) {\n            if ( defined $self->_post_sort ) {\n                # we need to read things into an array.\n                # Process. Sort them. Print 'em\n\n                my $post_sort_func = $self->_post_sort();\n                my @fth;\n\n                foreach my $sf ( @feats ) {\n                    push(@fth,Bio::SeqIO::FTHelper::from_SeqFeature($sf,$seq));\n                }\n\n                @fth = sort { &$post_sort_func($a,$b) } @fth;\n\n                foreach my $fth ( @fth ) {\n                    $self->_print_EMBL_FTHelper($fth) || return;\n                }\n            } else {\n                # not post sorted. And so we can print as we get them.\n                # lower memory load...\n\n                foreach my $sf ( @feats ) {\n                    my @fth = Bio::SeqIO::FTHelper::from_SeqFeature($sf,$seq);\n                    foreach my $fth ( @fth ) {\n                        if ( $fth->key eq 'CONTIG') {\n                            $self->_show_dna(0);\n                        }\n                        $self->_print_EMBL_FTHelper($fth) || return;\n                    }\n                }\n            }\n        }\n\n        if ( $self->_show_dna() == 0 ) {\n            $self->_print( \"//\\n\") || return;\n            return;\n        }\n        $self->_print( \"XX\\n\") || return;\n\n        # finished printing features.\n\n        $str =~ tr/A-Z/a-z/;\n\n        # Count each nucleotide\n        my $alen = $str =~ tr/a/a/;\n        my $clen = $str =~ tr/c/c/;\n        my $glen = $str =~ tr/g/g/;\n        my $tlen = $str =~ tr/t/t/;\n\n        my $len = $seq->length();\n        my $olen = $seq->length() - ($alen + $tlen + $clen + $glen);\n        if ( $olen < 0 ) {\n            $self->warn(\"Weird. More atgc than bases. Problem!\");\n        }\n\n        $self->_print(\"SQ   Sequence $len BP; $alen A; $clen C; $glen G; $tlen T; $olen other;\\n\") || return;\n\n        my $nuc = 60;       # Number of nucleotides per line\n        my $whole_pat = 'a10' x 6; # Pattern for unpacking a whole line\n        my $out_pat   = 'A11' x 6; # Pattern for packing a line\n        my $length = length($str);\n\n        # Calculate the number of nucleotides which fit on whole lines\n        my $whole = int($length / $nuc) * $nuc;\n\n        # Print the whole lines\n        my( $i );\n        for ($i = 0; $i < $whole; $i += $nuc) {\n            my $blocks = pack $out_pat,\n                unpack $whole_pat,\n                    substr($str, $i, $nuc);\n            $self->_print(sprintf(\"     $blocks%9d\\n\", $i + $nuc)) || return;\n        }\n\n        # Print the last line\n        if (my $last = substr($str, $i)) {\n            my $last_len = length($last);\n            my $last_pat = 'a10' x int($last_len / 10) .'a'. $last_len % 10;\n            my $blocks = pack $out_pat,\n                unpack($last_pat, $last);\n            $self->_print(sprintf(\"     $blocks%9d\\n\", $length)) ||\n                return;         # Add the length to the end\n        }\n\n        $self->_print( \"//\\n\") || return;\n\n        $self->flush if $self->_flush_on_write && defined $self->_fh;\n    }\n    return 1;\n}\n\n=head2 _print_EMBL_FTHelper\n\n Title   : _print_EMBL_FTHelper\n Usage   :\n Function: Internal function\n Returns : 1 if writing suceeded, otherwise undef\n Args    :\n\n\n\nsub _print_EMBL_FTHelper {\n    my ($self,$fth) = @_;\n\n    if ( ! ref $fth || ! $fth->isa('Bio::SeqIO::FTHelper') ) {\n        $fth->warn(\"$fth is not a FTHelper class. Attempting to print, but there could be tears!\");\n    }\n\n\n    #$self->_print( \"FH   Key             Location/Qualifiers\\n\");\n    #$self->_print( sprintf(\"FT   %-15s  %s\\n\",$fth->key,$fth->loc));\n    # let\n    if ( $fth->key eq 'CONTIG' ) {\n        $self->_print(\"XX\\n\") || return;\n        $self->_write_line_EMBL_regex(\"CO   \",\n                                      \"CO   \",$fth->loc,\n                                      '\\,|$',80) || return; #'\n        return 1;\n    }\n    $self->_write_line_EMBL_regex(sprintf(\"FT   %-15s \",$fth->key),\n                                  \"FT                   \",$fth->loc,\n                                  '\\,|$',80) || return; #'\n    foreach my $tag ( keys %{$fth->field} ) {\n        if ( ! defined $fth->field->{$tag} ) {\n            next;\n        }\n        foreach my $value ( @{$fth->field->{$tag}} ) {\n            $value =~ s/\\\"/\\\"\\\"/g;\n            if ($value eq \"_no_value\") {\n                $self->_write_line_EMBL_regex(\"FT                   \",\n                                              \"FT                   \",\n                                              \"/$tag\",'.|$',80) || return; #'\n            }\n            # there are almost 3x more quoted qualifier values and they\n            # are more common too so we take quoted ones first\n            elsif (!$FTQUAL_NO_QUOTE{$tag}) {\n                my $pat = $value =~ /\\s/ ? '\\s|\\-|$' : '.|\\-|$';\n                $self->_write_line_EMBL_regex(\"FT                   \",\n                                              \"FT                   \",\n                                              \"/$tag=\\\"$value\\\"\",$pat,80) || return;\n            } else {\n                $self->_write_line_EMBL_regex(\"FT                   \",\n                                              \"FT                   \",\n                                              \"/$tag=$value\",'.|$',80) || return; #'\n                                          }\n            }\n        }\n\n        return 1;\n    }\n\n#'\n=head2 _read_EMBL_References\n\n Title   : _read_EMBL_References\n Usage   :\n Function: Reads references from EMBL format. Internal function really\n Example :\n Returns :\n Args    :\n\n\n\nsub _read_EMBL_References {\n    my ($self,$buffer) = @_;\n    my (@refs);\n\n    # assume things are starting with RN\n\n    if ( $$buffer !~ /^RN/ ) {\n        warn(\"Not parsing line '$$buffer' which maybe important\");\n    }\n    my $b1;\n    my $b2;\n    my $title;\n    my $loc;\n    my $au;\n    my $med;\n    my $pm;\n    my $com;\n\n    while ( defined ($_ = $self->_readline) ) {\n        /^R/ || last;\n        /^RP   (\\d+)-(\\d+)/ && do {$b1=$1;$b2=$2;};\n        /^RX   MEDLINE;\\s+(\\d+)/ && do {$med=$1};\n        /^RX   PUBMED;\\s+(\\d+)/ && do {$pm=$1};\n        /^RA   (.*)/ && do {\n            $au = $self->_concatenate_lines($au,$1); next;\n        };\n        /^RT   (.*)/ && do {\n            $title = $self->_concatenate_lines($title,$1); next;\n        };\n        /^RL   (.*)/ && do {\n            $loc = $self->_concatenate_lines($loc,$1); next;\n        };\n        /^RC   (.*)/ && do {\n            $com = $self->_concatenate_lines($com,$1); next;\n        };\n    }\n\n    my $ref = Bio::Annotation::Reference->new();\n    $au =~ s/;\\s*$//g;\n    $title =~ s/;\\s*$//g;\n\n    $ref->start($b1);\n    $ref->end($b2);\n    $ref->authors($au);\n    $ref->title($title);\n    $ref->location($loc);\n    $ref->medline($med);\n    $ref->comment($com);\n    $ref->pubmed($pm);\n\n    push(@refs,$ref);\n    $$buffer = $_;\n\n    return @refs;\n}\n\n=head2 _read_EMBL_Species\n\n Title   : _read_EMBL_Species\n Usage   :\n Function: Reads the EMBL Organism species and classification\n           lines.\n Example :\n Returns : A Bio::Species object\n Args    : a reference to the current line buffer, accession number\n\n\nsub _read_EMBL_Species {\n    my( $self, $buffer, $acc ) = @_;\n    my $org;\n\n    $_ = $$buffer;\n    my( $sub_species, $species, $genus, $common, $sci_name, $class_lines );\n    while (defined( $_ ||= $self->_readline )) {\n        if (/^OS\\s+(.+)/) {\n            $sci_name .= ($sci_name) ? ' '.$1 : $1;\n        } elsif (s/^OC\\s+(.+)$//) {\n            $class_lines .= $1;\n        } elsif (/^OG\\s+(.*)/) {\n            $org = $1;\n        } else {\n            last;\n        }\n\n        $_ = undef;             # Empty $_ to trigger read of next line\n    }\n\n#    $$buffer = $_;\n\t$self->_pushback($_);\n\t\n    $sci_name =~ s{\\.$}{};\n    $sci_name || return;\n\n    # Convert data in classification lines into classification array.\n    # only split on ';' or '.' so that classification that is 2 or more words\n    # will still get matched, use map() to remove trailing/leading/intervening\n    # spaces\n    my @class = map { s/^\\s+//; s/\\s+$//; s/\\s{2,}/ /g; $_; } split /(?<!subgen)[;\\.]+/, $class_lines;\n\n    # do we have a genus?\n    my $possible_genus = $class[-1];\n    $possible_genus .= \"|$class[-2]\" if $class[-2];\n    if ($sci_name =~ /^($possible_genus)/) {\n        $genus = $1;\n        ($species) = $sci_name =~ /^$genus\\s+(.+)/;\n    } else {\n        $species = $sci_name;\n    }\n\n    # Don't make a species object if it is \"Unknown\" or \"None\"\n    if ($genus) {\n        return if $genus =~ /^(Unknown|None)$/i;\n    }\n\n    # is this organism of rank species or is it lower?\n    # (doesn't catch everything, but at least the guess isn't dangerous)\n    if ($species =~ /subsp\\.|var\\./) {\n        ($species, $sub_species) = $species =~ /(.+)\\s+((?:subsp\\.|var\\.).+)/;\n    }\n\n    # sometimes things have common name in brackets, like\n    # Schizosaccharomyces pombe (fission yeast), so get rid of the common\n    # name bit. Probably dangerous if real scientific species name ends in\n    # bracketed bit.\n    unless ($class[-1] eq 'Viruses') {\n        ($species, $common) = $species =~ /^(.+)\\s+\\((.+)\\)$/;\n        $sci_name =~ s/\\s+\\(.+\\)$// if $common;\n    }\n\n    # Bio::Species array needs array in Species -> Kingdom direction\n    unless ($class[-1] eq $sci_name) {\n        push(@class, $sci_name);\n    }\n    @class = reverse @class;\n\n    # do minimal sanity checks before we hand off to Bio::Species which won't\n    # be able to give informative throw messages if it has to throw because\n    # of problems here\n    $self->throw(\"$acc seems to be missing its OS line: invalid.\") unless $sci_name;\n    my %names;\n    foreach my $i (0..$#class) {\n        my $name = $class[$i];\n        $names{$name}++;\n        if ($names{$name} > 1 && $name ne $class[$i - 1]) {\n            $self->throw(\"$acc seems to have an invalid species classification.\");\n        }\n    }\n    my $make = Bio::Species->new();\n    $make->scientific_name($sci_name);\n    $make->classification(@class);\n    unless ($class[-1] eq 'Viruses') {\n        $make->genus($genus) if $genus;\n        $make->species($species) if $species;\n        $make->sub_species($sub_species) if $sub_species;\n        $make->common_name($common) if $common;\n    }\n    $make->organelle($org) if $org;\n    return $make;\n}\n\n=head2 _read_EMBL_DBLink\n\n Title   : _read_EMBL_DBLink\n Usage   :\n Function: Reads the EMBL database cross reference (\"DR\") lines\n Example :\n Returns : A list of Bio::Annotation::DBLink objects\n Args    :\n\n\nsub _read_EMBL_DBLink {\n    my( $self,$buffer ) = @_;\n    my( @db_link );\n\n    $_ = $$buffer;\n    while (defined( $_ ||= $self->_readline )) {\n        if ( /^DR   ([^\\s;]+);\\s*([^\\s;]+);?\\s*([^\\s;]+)?\\.$/) {\n        my ($databse, $prim_id, $sec_id) = ($1,$2,$3);\n        my $link = Bio::Annotation::DBLink->new(-database    => $databse,\n                            -primary_id  => $prim_id,\n                            -optional_id => $sec_id);\n\n            push(@db_link, $link);\n    } else {\n            last;\n        }\n        $_ = undef;             # Empty $_ to trigger read of next line\n    }\n\n    $$buffer = $_;\n    return @db_link;\n}\n\n=head2 _read_EMBL_TaxID_DBLink\n\n Title   : _read_EMBL_TaxID_DBLink\n Usage   :\n Function: Reads the EMBL database cross reference to NCBI TaxID (\"OX\") lines\n Example :\n Returns : A list of Bio::Annotation::DBLink objects\n Args    :\n\n\nsub _read_EMBL_TaxID_DBLink {\n    my( $self,$buffer ) = @_;\n    my( @db_link );\n\n    $_ = $$buffer;\n    while (defined( $_ ||= $self->_readline )) {\n        if ( /^OX   (\\S+)=(\\d+);$/ ) {\n            my ($databse, $prim_id) = ($1,$2);\n            my $link = Bio::Annotation::DBLink->new(-database    => $databse,\n                                                    -primary_id  => $prim_id,);\n            push(@db_link, $link);\n        } else {\n            last;\n        }\n        $_ = undef;             # Empty $_ to trigger read of next line\n    }\n\n    $$buffer = $_;\n    return @db_link;\n}\n\n=head2 _filehandle\n\n Title   : _filehandle\n Usage   : $obj->_filehandle($newval)\n Function:\n Example :\n Returns : value of _filehandle\n Args    : newvalue (optional)"},"kind":12,"line":1204,"children":[{"localvar":"my","containerName":"_filehandle","definition":"my","name":"$obj","line":1205,"kind":13},{"name":"$value","containerName":"_filehandle","kind":13,"line":1205},{"containerName":"_filehandle","name":"$value","line":1206,"kind":13},{"line":1207,"kind":13,"containerName":"_filehandle","name":"$obj"},{"line":1207,"kind":13,"name":"$value","containerName":"_filehandle"},{"line":1209,"kind":13,"name":"$obj","containerName":"_filehandle"}]},{"name":"_read_FTHelper_EMBL","range":{"start":{"line":1225,"character":0},"end":{"line":1328,"character":9999}},"definition":"sub","containerName":"main::","signature":{"documentation":"1;\n# $Id: embl.pm 16123 2009-09-17 12:57:27Z cjfields $\n#\n# BioPerl module for Bio::SeqIO::EMBL\n#\n# Please direct questions and support issues to <bioperl-l@bioperl.org> \n#\n# Cared for by Ewan Birney <birney@ebi.ac.uk>\n#\n# Copyright Ewan Birney\n#\n# You may distribute this module under the same terms as perl itself\n\n# POD documentation - main docs before the code\n\n=head1 NAME\n\nBio::SeqIO::embl - EMBL sequence input/output stream\n\n=head1 SYNOPSIS\n\nIt is probably best not to use this object directly, but\nrather go through the SeqIO handler system. Go:\n\n    $stream = Bio::SeqIO->new(-file => $filename, -format => 'EMBL');\n\n    while ( (my $seq = $stream->next_seq()) ) {\n        # do something with $seq\n    }\n\n=head1 DESCRIPTION\n\nThis object can transform Bio::Seq objects to and from EMBL flat\nfile databases.\n\nThere is a lot of flexibility here about how to dump things which\nshould be documented more fully.\n\nThere should be a common object that this and Genbank share (probably\nwith Swissprot). Too much of the magic is identical.\n\n=head2 Optional functions\n\n=over 3\n\n* _show_dna()\n\n(output only) shows the dna or not\n\n* _post_sort()\n\n(output only) provides a sorting func which is applied to the FTHelpers\nbefore printing\n\n* _id_generation_func()\n\nThis is function which is called as\n\n   print \"ID   \", $func($annseq), \"\\n\";\n\nTo generate the ID line. If it is not there, it generates a sensible ID\nline using a number of tools.\n\nIf you want to output annotations in EMBL format they need to be\nstored in a Bio::Annotation::Collection object which is accessible\nthrough the Bio::SeqI interface method L<annotation()|annotation>.\n\nThe following are the names of the keys which are polled from a\nL<Bio::Annotation::Collection> object.\n\n reference  - Should contain Bio::Annotation::Reference objects\n comment    - Should contain Bio::Annotation::Comment objects\n dblink     - Should contain Bio::Annotation::DBLink objects\n\n\n=head1 FEEDBACK\n\n=head2 Mailing Lists\n\nUser feedback is an integral part of the evolution of this and other\nBioperl modules. Send your comments and suggestions preferably to one\nof the Bioperl mailing lists.  Your participation is much appreciated.\n\n  bioperl-l@bioperl.org                  - General discussion\n  http://bioperl.org/wiki/Mailing_lists  - About the mailing lists\n\n=head2 Support \n\nPlease direct usage questions or support issues to the mailing list:\n\nI<bioperl-l@bioperl.org>\n\nrather than to the module maintainer directly. Many experienced and \nreponsive experts will be able look at the problem and quickly \naddress it. Please include a thorough description of the problem \nwith code and data examples if at all possible.\n\n=head2 Reporting Bugs\n\nReport bugs to the Bioperl bug tracking system to help us keep track\nthe bugs and their resolution. Bug reports can be submitted via\nthe web:\n\n  http://bugzilla.open-bio.org/\n\n=head1 AUTHOR - Ewan Birney\n\nEmail birney@ebi.ac.uk\n\n=head1 APPENDIX\n\nThe rest of the documentation details each of the object\nmethods. Internal methods are usually preceded with a _\n\n\n\n# Let the code begin...\n\n\npackage Bio::SeqIO::embl;\nuse vars qw(%FTQUAL_NO_QUOTE);\nuse strict;\nuse Bio::SeqIO::FTHelper;\nuse Bio::SeqFeature::Generic;\nuse Bio::Species;\nuse Bio::Seq::SeqFactory;\nuse Bio::Annotation::Collection;\nuse Bio::Annotation::Comment;\nuse Bio::Annotation::Reference;\nuse Bio::Annotation::DBLink;\n\nuse base qw(Bio::SeqIO);\n\n%FTQUAL_NO_QUOTE=(\n                  'anticodon'=>1,\n                  'citation'=>1,\n                  'codon'=>1,\n                  'codon_start'=>1,\n                  'cons_splice'=>1,\n                  'direction'=>1,\n                  'evidence'=>1,\n                  'label'=>1,\n                  'mod_base'=> 1,\n                  'number'=> 1,\n                  'rpt_type'=> 1,\n                  'rpt_unit'=> 1,\n                  'transl_except'=> 1,\n                  'transl_table'=> 1,\n                  'usedin'=> 1,\n                 );\n\nsub _initialize {\n    my($self,@args) = @_;\n\n    $self->SUPER::_initialize(@args);\n    # hash for functions for decoding keys.\n    $self->{'_func_ftunit_hash'} = {};\n    # sets this to one by default. People can change it\n    $self->_show_dna(1);\n    if ( ! defined $self->sequence_factory ) {\n        $self->sequence_factory(Bio::Seq::SeqFactory->new\n                                (-verbose => $self->verbose(),\n                                 -type => 'Bio::Seq::RichSeq'));\n    }\n}\n\n=head2 next_seq\n\n Title   : next_seq\n Usage   : $seq = $stream->next_seq()\n Function: returns the next sequence in the stream\n Returns : Bio::Seq object\n Args    :\n\n\nsub next_seq {\n    my ($self,@args) = @_;\n    my ($pseq,$c,$line,$name,$desc,$acc,$seqc,$mol,$div,\n        $date, $comment, @date_arr);\n\n    my ($annotation, %params, @features) =\n        Bio::Annotation::Collection->new();\n\n    $line = $self->_readline;\n    # This needs to be before the first eof() test\n\n    if ( !defined $line ) {\n        return;                 # no throws - end of file\n    }\n\n    if ( $line =~ /^\\s+$/ ) {\n        while ( defined ($line = $self->_readline) ) {\n            $line =~/^\\S/ && last;\n        }\n        # return without error if the whole next sequence was just a single\n        # blank line and then eof\n        return unless $line;\n    }\n\n    # no ID as 1st non-blank line, need short circuit and exit routine\n    $self->throw(\"EMBL stream with no ID. Not embl in my book\")\n        unless $line =~ /^ID\\s+\\S+/;\n\n    # At this point we are sure that $line contains an ID header line\n    my $alphabet;\n    if ( $line =~ tr/;/;/ == 6) { # New style headers contain exactly six semicolons.\n\n        # New style header (EMBL Release >= 87, after June 2006)\n        my $topology;\n        my $sv;\n\n        # ID   DQ299383; SV 1; linear; mRNA; STD; MAM; 431 BP.\n        # This regexp comes from the new2old.pl conversion script, from EBI\n        if ($line =~ m/^ID   (\\w+);\\s+SV (\\d+); (\\w+); ([^;]+); (\\w{3}); (\\w{3}); (\\d+) BP./) {\n        ($name, $sv, $topology, $mol, $div) = ($1, $2, $3, $4, $6);\n        }\n        if (defined($sv)) {\n        $params{'-seq_version'} = $sv;\n        $params{'-version'} = $sv;\n        }\n\n        if ($topology eq \"circular\") {\n        $params{'-is_circular'} = 1;\n        }\n    \n    if (defined $mol ) {\n        if ($mol =~ /DNA/) {\n        $alphabet='dna';\n        } elsif ($mol =~ /RNA/) {\n        $alphabet='rna';\n        } elsif ($mol =~ /AA/) {\n        $alphabet='protein';\n        }\n    }\n    } else {\n    \n        # Old style header (EMBL Release < 87, before June 2006)\n        if ($line =~ /^ID\\s+(\\S+)[^;]*;\\s+(\\S+)[^;]*;\\s+(\\S+)[^;]*;/) {\n        ($name, $mol, $div) = ($1, $2, $3);\n        }\n    \n        if ($mol) {\n            if ( $mol =~ /circular/ ) {\n            $params{'-is_circular'} = 1;\n            $mol =~  s|circular ||;\n            }\n            if (defined $mol ) {\n            if ($mol =~ /DNA/) {\n                $alphabet='dna';\n            } elsif ($mol =~ /RNA/) {\n                $alphabet='rna';\n            } elsif ($mol =~ /AA/) {\n                $alphabet='protein';\n            }\n            }\n        }\n    }\n\n    unless( defined $name && length($name) ) {\n    $name = \"unknown_id\";\n    }\n\n    # $self->warn(\"not parsing upper annotation in EMBL file yet!\");\n    my $buffer = $line;\n    local $_;\n    BEFORE_FEATURE_TABLE :\n          until ( !defined $buffer ) {\n              $_ = $buffer;\n              # Exit at start of Feature table\n              if ( /^(F[HT]|SQ)/ ) {\n                  $self->_pushback($_) if( $1 eq 'SQ' || $1 eq 'FT');\n                  last;\n              }\n              # Description line(s)\n              if (/^DE\\s+(\\S.*\\S)/) {\n                  $desc .= $desc ? \" $1\" : $1;\n              }\n\n              #accession number\n              if ( /^AC\\s+(.*)?/ || /^PA\\s+(.*)?/) {\n                  my @accs = split(/[; ]+/, $1); # allow space in addition\n                  $params{'-accession_number'} = shift @accs\n                      unless defined $params{'-accession_number'};\n                  push @{$params{'-secondary_accessions'}}, @accs;\n              }\n\n              #version number\n              if ( /^SV\\s+\\S+\\.(\\d+);?/ ) {\n                  my $sv = $1;\n                  #$sv =~ s/\\;//;\n                  $params{'-seq_version'} = $sv;\n                  $params{'-version'} = $sv;\n              }\n\n              #date (NOTE: takes last date line)\n              if ( /^DT\\s+(.+)$/ ) {\n                  my $line = $1;\n                  my ($date, $version) = split(' ', $line, 2);\n                  $date =~ tr/,//d; # remove comma if new version\n                  if ($version =~ /\\(Rel\\. (\\d+), Created\\)/xms ) {\n                      my $release = Bio::Annotation::SimpleValue->new(\n                                                                      -tagname    => 'creation_release',\n                                                                      -value      => $1\n                                                                     );\n                      $annotation->add_Annotation($release);\n                  } elsif ($version =~ /\\(Rel\\. (\\d+), Last updated, Version (\\d+)\\)/xms ) {\n                      my $release = Bio::Annotation::SimpleValue->new(\n                                                                      -tagname    => 'update_release',\n                                                                      -value      => $1\n                                                                     );\n                      $annotation->add_Annotation($release);\n\n                      my $update = Bio::Annotation::SimpleValue->new(\n                                                                     -tagname    => 'update_version',\n                                                                     -value      => $2\n                                                                    );\n                      $annotation->add_Annotation($update);\n                  }\n                  push @{$params{'-dates'}}, $date;\n              }\n\n              #keywords\n              if ( /^KW   (.*)\\S*$/ ) {\n                  my @kw = split(/\\s*\\;\\s*/,$1);\n                  push @{$params{'-keywords'}}, @kw;\n              }\n\n              # Organism name and phylogenetic information\n              elsif (/^O[SC]/) {\n                  # pass the accession number so we can give an informative throw message if necessary\n                  my $species = $self->_read_EMBL_Species(\\$buffer, $params{'-accession_number'});\n                  $params{'-species'}= $species;\n              }\n\n              # NCBI TaxID Xref\n              elsif (/^OX/) {\n                  my @links = $self->_read_EMBL_TaxID_DBLink(\\$buffer);\n                  foreach my $dblink ( @links ) {\n                      $annotation->add_Annotation('dblink',$dblink);\n                  }\n              }\n\n              # References\n              elsif (/^R/) {\n                  my @refs = $self->_read_EMBL_References(\\$buffer);\n                  foreach my $ref ( @refs ) {\n                      $annotation->add_Annotation('reference',$ref);\n                  }\n              }\n\n              # DB Xrefs\n              elsif (/^DR/) {\n                  my @links = $self->_read_EMBL_DBLink(\\$buffer);\n                  foreach my $dblink ( @links ) {\n                      $annotation->add_Annotation('dblink',$dblink);\n                  }\n              }\n\n              # Comments\n              elsif (/^CC\\s+(.*)/) {\n                  $comment .= $1;\n                  $comment .= \" \";\n                  while (defined ($_ = $self->_readline) ) {\n                      if (/^CC\\s+(.*)/) {\n                          $comment .= $1;\n                          $comment .= \" \";\n                      } else {\n                          last;\n                      }\n                  }\n                  my $commobj = Bio::Annotation::Comment->new();\n                  $commobj->text($comment);\n                  $annotation->add_Annotation('comment',$commobj);\n                  $comment = \"\";\n              }\n\n              # Get next line.\n              $buffer = $self->_readline;\n          }\n\n    while ( defined ($_ = $self->_readline) ) {\n        /^FT\\s{3}\\w/ && last;\n        /^SQ / && last;\n        /^CO / && last;\n    }\n    $buffer = $_;\n\n    if (defined($buffer) && $buffer =~ /^FT /) {\n        until ( !defined ($buffer) ) {\n            my $ftunit = $self->_read_FTHelper_EMBL(\\$buffer);\n\n            # process ftunit\n            my $feat =\n                $ftunit->_generic_seqfeature($self->location_factory(), $name);\n\n            # add taxon_id from source if available\n            if ($params{'-species'} && ($feat->primary_tag eq 'source')\n                && $feat->has_tag('db_xref')\n                && (! $params{'-species'}->ncbi_taxid())) {\n                foreach my $tagval ($feat->get_tag_values('db_xref')) {\n                    if (index($tagval,\"taxon:\") == 0) {\n                        $params{'-species'}->ncbi_taxid(substr($tagval,6));\n                        last;\n                    }\n                }\n            }\n\n            # add feature to list of features\n            push(@features, $feat);\n\n            if ( $buffer !~ /^FT/ ) {\n                last;\n            }\n        }\n    }\n    # skip comments\n    while ( defined ($buffer) && $buffer =~ /^XX/ ) {\n        $buffer = $self->_readline();\n    }\n\n    if ( $buffer =~ /^CO/  ) {\n        until ( !defined ($buffer) ) {\n            my $ftunit = $self->_read_FTHelper_EMBL(\\$buffer);\n            # process ftunit\n            push(@features,\n                 $ftunit->_generic_seqfeature($self->location_factory(),\n                                              $name));\n\n            if ( $buffer !~ /^CO/ ) {\n                last;\n            }\n        }\n    }\n    if ( $buffer !~ /^SQ/  ) {\n        while ( defined ($_ = $self->_readline) ) {\n            /^SQ/ && last;\n        }\n    }\n    $seqc = \"\";\n    while ( defined ($_ = $self->_readline) ) {\n        m{^//} && last;\n        $_ = uc($_);\n        s/[^A-Za-z]//g;\n        $seqc .= $_;\n    }\n    my $seq = $self->sequence_factory->create\n        (-verbose => $self->verbose(),\n         -division => $div,\n         -seq => $seqc,\n         -desc => $desc,\n         -display_id => $name,\n         -annotation => $annotation,\n         -molecule => $mol,\n         -alphabet => $alphabet,\n         -features => \\@features,\n         %params);\n    return $seq;\n}\n\n\n\n=head2 _write_ID_line\n\n Title   : _write_ID_line\n Usage   : $self->_write_ID_line($seq);\n Function: Writes the EMBL Release 87 format ID line to the stream, unless\n         : there is a user-supplied ID line generation function in which\n         : case that is used instead.\n         : ( See Bio::SeqIO::embl::_id_generation_function(). )\n Returns : nothing\n Args    : Bio::Seq object\n\n\nsub _write_ID_line {\n\n    my ($self, $seq) = @_;\n\n    my $id_line;\n    # If there is a user-supplied ID generation function, use it.\n    if ( $self->_id_generation_func ) {\n        $id_line = \"ID   \" . &{$self->_id_generation_func}($seq) . \"\\nXX\\n\";\n    }\n    # Otherwise, generate a standard EMBL release 87 (June 2006) ID line.\n    else {\n\n        # The sequence name is supposed to be the primary accession number,\n        my $name = $seq->accession_number();\n        if (!$name) {\n            # but if it is not present, use the sequence ID.\n            $name = $seq->id();\n        }\n\n        $self->warn(\"No whitespace allowed in EMBL id [\". $name. \"]\") if $name =~ /\\s/;\n\n        # Use the sequence version, or default to 1.\n        my $version = $seq->version() || 1;\n\n        my $len = $seq->length();\n\n        # Taxonomic division.\n        my $div;\n        if ( $seq->can('division') && defined($seq->division) &&\n             $self->_is_valid_division($seq->division) ) {\n            $div = $seq->division();\n        } else {\n            $div ||= 'UNC';     # 'UNC' is the EMBL division code for 'unclassified'.\n        }\n\n        my $mol;\n        # If the molecule type is a valid EMBL type, use it.\n        if (  $seq->can('molecule')\n              && defined($seq->molecule)\n              && $self->_is_valid_molecule_type($seq->molecule)\n           ) {\n            $mol = $seq->molecule();\n        }\n        # Otherwise, choose unassigned DNA or RNA based on the alphabet.\n        elsif ($seq->can('primary_seq') && defined $seq->primary_seq->alphabet) {\n            my $alphabet =$seq->primary_seq->alphabet;\n            if ($alphabet eq 'dna') {\n                $mol ='unassigned DNA';\n            } elsif ($alphabet eq 'rna') {\n                $mol='unassigned RNA';\n            } elsif ($alphabet eq 'protein') {\n                $self->warn(\"Protein sequence found; EMBL is a nucleotide format.\");\n                $mol='AA';  # AA is not a valid EMBL molecule type.\n            }\n        }\n\n        my $topology = 'linear';\n        if ($seq->is_circular) {\n            $topology = 'circular';\n        }\n\n        $mol ||= '';            # 'unassigned'; ?\n        $id_line = \"ID   $name; SV $version; $topology; $mol; STD; $div; $len BP.\\nXX\\n\";\n        $self->_print($id_line);\n    }\n}\n\n=head2 _is_valid_division\n\n Title   : _is_valid_division\n Usage   : $self->_is_valid_division($div)\n Function: tests division code for validity\n Returns : true if $div is a valid EMBL release 87 taxonomic division.\n Args    : taxonomic division code string\n\n\nsub _is_valid_division {\n    my ($self, $division) = @_;\n\n    my %EMBL_divisions = (\n                          \"PHG\"    => 1, # Bacteriophage\n                          \"ENV\"    => 1, # Environmental Sample\n                          \"FUN\"    => 1, # Fungal\n                          \"HUM\"    => 1, # Human\n                          \"INV\"    => 1, # Invertebrate\n                          \"MAM\"    => 1, # Other Mammal\n                          \"VRT\"    => 1, # Other Vertebrate\n                          \"MUS\"    => 1, # Mus musculus\n                          \"PLN\"    => 1, # Plant\n                          \"PRO\"    => 1, # Prokaryote\n                          \"ROD\"    => 1, # Other Rodent\n                          \"SYN\"    => 1, # Synthetic\n                          \"UNC\"    => 1, # Unclassified\n                          \"VRL\"    => 1 # Viral\n                         );\n\n    return exists($EMBL_divisions{$division});\n}\n\n=head2 _is_valid_molecule_type\n\n Title   : _is_valid_molecule_type\n Usage   : $self->_is_valid_molecule_type($mol)\n Function: tests molecule type for validity\n Returns : true if $mol is a valid EMBL release 87 molecule type.\n Args    : molecule type string\n\n\nsub _is_valid_molecule_type {\n    my ($self, $moltype) = @_;\n\n    my %EMBL_molecule_types = (\n                               \"genomic DNA\"    => 1,\n                               \"genomic RNA\"    => 1,\n                               \"mRNA\"           => 1,\n                               \"tRNA\"           => 1,\n                               \"rRNA\"           => 1,\n                               \"snoRNA\"         => 1,\n                               \"snRNA\"          => 1,\n                               \"scRNA\"          => 1,\n                               \"pre-RNA\"        => 1,\n                               \"other RNA\"      => 1,\n                               \"other DNA\"      => 1,\n                               \"unassigned DNA\" => 1,\n                               \"unassigned RNA\" => 1\n                              );\n\n    return exists($EMBL_molecule_types{$moltype});\n}\n\n=head2 write_seq\n\n Title   : write_seq\n Usage   : $stream->write_seq($seq)\n Function: writes the $seq object (must be seq) to the stream\n Returns : 1 for success and undef for error\n Args    : array of 1 to n Bio::SeqI objects\n\n\n\nsub write_seq {\n    my ($self,@seqs) = @_;\n\n    foreach my $seq ( @seqs ) {\n        $self->throw(\"Attempting to write with no seq!\") unless defined $seq;\n        unless ( ref $seq && $seq->isa('Bio::SeqI' ) ) {\n            $self->warn(\"$seq is not a SeqI compliant sequence object!\")\n                if $self->verbose >= 0;\n            unless ( ref $seq && $seq->isa('Bio::PrimarySeqI' ) ) {\n                $self->throw(\"$seq is not a PrimarySeqI compliant sequence object!\");\n            }\n        }\n        my $str = $seq->seq || '';\n\n        # Write the ID line.\n        $self->_write_ID_line($seq);\n\n\n        # Write the accession line if present\n        my( $acc );\n        {\n            if ( my $func = $self->_ac_generation_func ) {\n                $acc = &{$func}($seq);\n            } elsif ( $seq->isa('Bio::Seq::RichSeqI') &&\n                      defined($seq->accession_number) ) {\n                $acc = $seq->accession_number;\n                $acc = join(\"; \", $acc, $seq->get_secondary_accessions);\n            } elsif ( $seq->can('accession_number') ) {\n                $acc = $seq->accession_number;\n            }\n\n            if (defined $acc) {\n                $self->_print(\"AC   $acc;\\n\",\n                              \"XX\\n\") || return;\n            }\n        }\n\n        # Date lines\n        my $switch=0;\n        if ( $seq->can('get_dates') ) {\n            my @dates =  $seq->get_dates();\n            my $ct = 1;\n            my $date_flag = 0;\n            my ($cr) = $seq->annotation->get_Annotations(\"creation_release\");\n            my ($ur) = $seq->annotation->get_Annotations(\"update_release\");\n            my ($uv) = $seq->annotation->get_Annotations(\"update_version\");\n\n            unless ($cr && $ur && $ur) {\n                $date_flag = 1;\n            }\n\n            foreach my $dt (@dates) {\n                if (!$date_flag) {\n                    $self->_write_line_EMBL_regex(\"DT   \",\"DT   \",\n                                                  $dt.\" (Rel. $cr, Created)\",\n                                                  '\\s+|$',80) if $ct == 1;\n                    $self->_write_line_EMBL_regex(\"DT   \",\"DT   \",\n                                                  $dt.\" (Rel. $ur, Last updated, Version $uv)\",\n                                                  '\\s+|$',80) if $ct == 2;\n                } else {        # other formats?\n                    $self->_write_line_EMBL_regex(\"DT   \",\"DT   \",\n                                                  $dt,'\\s+|$',80);\n                }\n                $switch =1;\n                $ct++;\n            }\n            if ($switch == 1) {\n                $self->_print(\"XX\\n\") || return;\n            }\n        }\n\n        # Description lines\n        $self->_write_line_EMBL_regex(\"DE   \",\"DE   \",$seq->desc(),'\\s+|$',80) || return; #'\n        $self->_print( \"XX\\n\") || return;\n\n        # if there, write the kw line\n        {\n            my( $kw );\n            if ( my $func = $self->_kw_generation_func ) {\n                $kw = &{$func}($seq);\n            } elsif ( $seq->can('keywords') ) {\n                $kw = $seq->keywords;\n            }\n            if (defined $kw) {\n                $self->_write_line_EMBL_regex(\"KW   \", \"KW   \", $kw, '\\s+|$', 80) || return; #'\n                $self->_print( \"XX\\n\") || return;\n            }\n        }\n\n        # Organism lines\n\n        if ($seq->can('species') && (my $spec = $seq->species)) {\n            my @class = $spec->classification();\n            shift @class;       # get rid of species name. Some embl files include\n                                # the species name in the OC lines, but this seems\n                                # more like an error than something we need to\n                                # emulate\n            my $OS = $spec->scientific_name;\n            if ($spec->common_name) {\n                $OS .= ' ('.$spec->common_name.')';\n            }\n            $self->_print(\"OS   $OS\\n\") || return;\n            my $OC = join('; ', reverse(@class)) .'.';\n            $self->_write_line_EMBL_regex(\"OC   \",\"OC   \",$OC,'; |$',80) || return;\n            if ($spec->organelle) {\n                $self->_write_line_EMBL_regex(\"OG   \",\"OG   \",$spec->organelle,'; |$',80) || return;\n            }\n            $self->_print(\"XX\\n\") || return;\n        }\n\n        # Reference lines\n        my $t = 1;\n        if ( $seq->can('annotation') && defined $seq->annotation ) {\n            foreach my $ref ( $seq->annotation->get_Annotations('reference') ) {\n                $self->_print( \"RN   [$t]\\n\") || return;\n\n                # Having no RP line is legal, but we need both\n                # start and end for a valid location.\n                if ($ref->comment) {\n                    $self->_write_line_EMBL_regex(\"RC   \", \"RC   \", $ref->comment, '\\s+|$', 80) || return; #'\n                }\n                my $start = $ref->start;\n                my $end   = $ref->end;\n                if ($start and $end) {\n                    $self->_print( \"RP   $start-$end\\n\") || return;\n                } elsif ($start or $end) {\n                    $self->throw(\"Both start and end are needed for a valid RP line.\".\n                                 \"  Got: start='$start' end='$end'\");\n                }\n\n                if (my $med = $ref->medline) {\n                    $self->_print( \"RX   MEDLINE; $med.\\n\") || return;\n                }\n                if (my $pm = $ref->pubmed) {\n                    $self->_print( \"RX   PUBMED; $pm.\\n\") || return;\n                }\n                my $authors = $ref->authors;\n                $authors =~ s/([\\w\\.]) (\\w)/$1#$2/g;  # add word wrap protection char '#'\n\n                $self->_write_line_EMBL_regex(\"RA   \", \"RA   \",\n                                              $authors . \";\",\n                                              '\\s+|$', 80) || return; #'\n\n                # If there is no title to the reference, it appears\n                # as a single semi-colon.  All titles must end in\n                # a semi-colon.\n                my $ref_title = $ref->title || '';\n                $ref_title =~ s/[\\s;]*$/;/;\n                $self->_write_line_EMBL_regex(\"RT   \", \"RT   \", $ref_title,    '\\s+|$', 80) || return; #'\n                $self->_write_line_EMBL_regex(\"RL   \", \"RL   \", $ref->location, '\\s+|$', 80) || return; #'\n                $self->_print(\"XX\\n\") || return;\n                $t++;\n            }\n\n            # DB Xref lines\n            if (my @db_xref = $seq->annotation->get_Annotations('dblink') ) {\n                for my $dr (@db_xref) {\n                    my $db_name = $dr->database;\n                    my $prim    = $dr->primary_id;\n\n                    my $opt     = $dr->optional_id || '';\n                    my $line = $opt ? \"$db_name; $prim; $opt.\" : \"$db_name; $prim.\";\n                    $self->_write_line_EMBL_regex(\"DR   \", \"DR   \", $line, '\\s+|$', 80) || return; #'\n                }\n                $self->_print(\"XX\\n\") || return;\n            }\n            \n            # Comment lines\n            foreach my $comment ( $seq->annotation->get_Annotations('comment') ) {\n                $self->_write_line_EMBL_regex(\"CC   \", \"CC   \", $comment->text, '\\s+|$', 80) || return; #'\n                $self->_print(\"XX\\n\") || return;\n            }\n        }\n        # \"\\\\s\\+\\|\\$\"\n\n        ## FEATURE TABLE\n\n        $self->_print(\"FH   Key             Location/Qualifiers\\n\") || return;\n        $self->_print(\"FH\\n\") || return;\n\n        my @feats = $seq->can('top_SeqFeatures') ? $seq->top_SeqFeatures : ();\n        if ($feats[0]) {\n            if ( defined $self->_post_sort ) {\n                # we need to read things into an array.\n                # Process. Sort them. Print 'em\n\n                my $post_sort_func = $self->_post_sort();\n                my @fth;\n\n                foreach my $sf ( @feats ) {\n                    push(@fth,Bio::SeqIO::FTHelper::from_SeqFeature($sf,$seq));\n                }\n\n                @fth = sort { &$post_sort_func($a,$b) } @fth;\n\n                foreach my $fth ( @fth ) {\n                    $self->_print_EMBL_FTHelper($fth) || return;\n                }\n            } else {\n                # not post sorted. And so we can print as we get them.\n                # lower memory load...\n\n                foreach my $sf ( @feats ) {\n                    my @fth = Bio::SeqIO::FTHelper::from_SeqFeature($sf,$seq);\n                    foreach my $fth ( @fth ) {\n                        if ( $fth->key eq 'CONTIG') {\n                            $self->_show_dna(0);\n                        }\n                        $self->_print_EMBL_FTHelper($fth) || return;\n                    }\n                }\n            }\n        }\n\n        if ( $self->_show_dna() == 0 ) {\n            $self->_print( \"//\\n\") || return;\n            return;\n        }\n        $self->_print( \"XX\\n\") || return;\n\n        # finished printing features.\n\n        $str =~ tr/A-Z/a-z/;\n\n        # Count each nucleotide\n        my $alen = $str =~ tr/a/a/;\n        my $clen = $str =~ tr/c/c/;\n        my $glen = $str =~ tr/g/g/;\n        my $tlen = $str =~ tr/t/t/;\n\n        my $len = $seq->length();\n        my $olen = $seq->length() - ($alen + $tlen + $clen + $glen);\n        if ( $olen < 0 ) {\n            $self->warn(\"Weird. More atgc than bases. Problem!\");\n        }\n\n        $self->_print(\"SQ   Sequence $len BP; $alen A; $clen C; $glen G; $tlen T; $olen other;\\n\") || return;\n\n        my $nuc = 60;       # Number of nucleotides per line\n        my $whole_pat = 'a10' x 6; # Pattern for unpacking a whole line\n        my $out_pat   = 'A11' x 6; # Pattern for packing a line\n        my $length = length($str);\n\n        # Calculate the number of nucleotides which fit on whole lines\n        my $whole = int($length / $nuc) * $nuc;\n\n        # Print the whole lines\n        my( $i );\n        for ($i = 0; $i < $whole; $i += $nuc) {\n            my $blocks = pack $out_pat,\n                unpack $whole_pat,\n                    substr($str, $i, $nuc);\n            $self->_print(sprintf(\"     $blocks%9d\\n\", $i + $nuc)) || return;\n        }\n\n        # Print the last line\n        if (my $last = substr($str, $i)) {\n            my $last_len = length($last);\n            my $last_pat = 'a10' x int($last_len / 10) .'a'. $last_len % 10;\n            my $blocks = pack $out_pat,\n                unpack($last_pat, $last);\n            $self->_print(sprintf(\"     $blocks%9d\\n\", $length)) ||\n                return;         # Add the length to the end\n        }\n\n        $self->_print( \"//\\n\") || return;\n\n        $self->flush if $self->_flush_on_write && defined $self->_fh;\n    }\n    return 1;\n}\n\n=head2 _print_EMBL_FTHelper\n\n Title   : _print_EMBL_FTHelper\n Usage   :\n Function: Internal function\n Returns : 1 if writing suceeded, otherwise undef\n Args    :\n\n\n\nsub _print_EMBL_FTHelper {\n    my ($self,$fth) = @_;\n\n    if ( ! ref $fth || ! $fth->isa('Bio::SeqIO::FTHelper') ) {\n        $fth->warn(\"$fth is not a FTHelper class. Attempting to print, but there could be tears!\");\n    }\n\n\n    #$self->_print( \"FH   Key             Location/Qualifiers\\n\");\n    #$self->_print( sprintf(\"FT   %-15s  %s\\n\",$fth->key,$fth->loc));\n    # let\n    if ( $fth->key eq 'CONTIG' ) {\n        $self->_print(\"XX\\n\") || return;\n        $self->_write_line_EMBL_regex(\"CO   \",\n                                      \"CO   \",$fth->loc,\n                                      '\\,|$',80) || return; #'\n        return 1;\n    }\n    $self->_write_line_EMBL_regex(sprintf(\"FT   %-15s \",$fth->key),\n                                  \"FT                   \",$fth->loc,\n                                  '\\,|$',80) || return; #'\n    foreach my $tag ( keys %{$fth->field} ) {\n        if ( ! defined $fth->field->{$tag} ) {\n            next;\n        }\n        foreach my $value ( @{$fth->field->{$tag}} ) {\n            $value =~ s/\\\"/\\\"\\\"/g;\n            if ($value eq \"_no_value\") {\n                $self->_write_line_EMBL_regex(\"FT                   \",\n                                              \"FT                   \",\n                                              \"/$tag\",'.|$',80) || return; #'\n            }\n            # there are almost 3x more quoted qualifier values and they\n            # are more common too so we take quoted ones first\n            elsif (!$FTQUAL_NO_QUOTE{$tag}) {\n                my $pat = $value =~ /\\s/ ? '\\s|\\-|$' : '.|\\-|$';\n                $self->_write_line_EMBL_regex(\"FT                   \",\n                                              \"FT                   \",\n                                              \"/$tag=\\\"$value\\\"\",$pat,80) || return;\n            } else {\n                $self->_write_line_EMBL_regex(\"FT                   \",\n                                              \"FT                   \",\n                                              \"/$tag=$value\",'.|$',80) || return; #'\n                                          }\n            }\n        }\n\n        return 1;\n    }\n\n#'\n=head2 _read_EMBL_References\n\n Title   : _read_EMBL_References\n Usage   :\n Function: Reads references from EMBL format. Internal function really\n Example :\n Returns :\n Args    :\n\n\n\nsub _read_EMBL_References {\n    my ($self,$buffer) = @_;\n    my (@refs);\n\n    # assume things are starting with RN\n\n    if ( $$buffer !~ /^RN/ ) {\n        warn(\"Not parsing line '$$buffer' which maybe important\");\n    }\n    my $b1;\n    my $b2;\n    my $title;\n    my $loc;\n    my $au;\n    my $med;\n    my $pm;\n    my $com;\n\n    while ( defined ($_ = $self->_readline) ) {\n        /^R/ || last;\n        /^RP   (\\d+)-(\\d+)/ && do {$b1=$1;$b2=$2;};\n        /^RX   MEDLINE;\\s+(\\d+)/ && do {$med=$1};\n        /^RX   PUBMED;\\s+(\\d+)/ && do {$pm=$1};\n        /^RA   (.*)/ && do {\n            $au = $self->_concatenate_lines($au,$1); next;\n        };\n        /^RT   (.*)/ && do {\n            $title = $self->_concatenate_lines($title,$1); next;\n        };\n        /^RL   (.*)/ && do {\n            $loc = $self->_concatenate_lines($loc,$1); next;\n        };\n        /^RC   (.*)/ && do {\n            $com = $self->_concatenate_lines($com,$1); next;\n        };\n    }\n\n    my $ref = Bio::Annotation::Reference->new();\n    $au =~ s/;\\s*$//g;\n    $title =~ s/;\\s*$//g;\n\n    $ref->start($b1);\n    $ref->end($b2);\n    $ref->authors($au);\n    $ref->title($title);\n    $ref->location($loc);\n    $ref->medline($med);\n    $ref->comment($com);\n    $ref->pubmed($pm);\n\n    push(@refs,$ref);\n    $$buffer = $_;\n\n    return @refs;\n}\n\n=head2 _read_EMBL_Species\n\n Title   : _read_EMBL_Species\n Usage   :\n Function: Reads the EMBL Organism species and classification\n           lines.\n Example :\n Returns : A Bio::Species object\n Args    : a reference to the current line buffer, accession number\n\n\nsub _read_EMBL_Species {\n    my( $self, $buffer, $acc ) = @_;\n    my $org;\n\n    $_ = $$buffer;\n    my( $sub_species, $species, $genus, $common, $sci_name, $class_lines );\n    while (defined( $_ ||= $self->_readline )) {\n        if (/^OS\\s+(.+)/) {\n            $sci_name .= ($sci_name) ? ' '.$1 : $1;\n        } elsif (s/^OC\\s+(.+)$//) {\n            $class_lines .= $1;\n        } elsif (/^OG\\s+(.*)/) {\n            $org = $1;\n        } else {\n            last;\n        }\n\n        $_ = undef;             # Empty $_ to trigger read of next line\n    }\n\n#    $$buffer = $_;\n\t$self->_pushback($_);\n\t\n    $sci_name =~ s{\\.$}{};\n    $sci_name || return;\n\n    # Convert data in classification lines into classification array.\n    # only split on ';' or '.' so that classification that is 2 or more words\n    # will still get matched, use map() to remove trailing/leading/intervening\n    # spaces\n    my @class = map { s/^\\s+//; s/\\s+$//; s/\\s{2,}/ /g; $_; } split /(?<!subgen)[;\\.]+/, $class_lines;\n\n    # do we have a genus?\n    my $possible_genus = $class[-1];\n    $possible_genus .= \"|$class[-2]\" if $class[-2];\n    if ($sci_name =~ /^($possible_genus)/) {\n        $genus = $1;\n        ($species) = $sci_name =~ /^$genus\\s+(.+)/;\n    } else {\n        $species = $sci_name;\n    }\n\n    # Don't make a species object if it is \"Unknown\" or \"None\"\n    if ($genus) {\n        return if $genus =~ /^(Unknown|None)$/i;\n    }\n\n    # is this organism of rank species or is it lower?\n    # (doesn't catch everything, but at least the guess isn't dangerous)\n    if ($species =~ /subsp\\.|var\\./) {\n        ($species, $sub_species) = $species =~ /(.+)\\s+((?:subsp\\.|var\\.).+)/;\n    }\n\n    # sometimes things have common name in brackets, like\n    # Schizosaccharomyces pombe (fission yeast), so get rid of the common\n    # name bit. Probably dangerous if real scientific species name ends in\n    # bracketed bit.\n    unless ($class[-1] eq 'Viruses') {\n        ($species, $common) = $species =~ /^(.+)\\s+\\((.+)\\)$/;\n        $sci_name =~ s/\\s+\\(.+\\)$// if $common;\n    }\n\n    # Bio::Species array needs array in Species -> Kingdom direction\n    unless ($class[-1] eq $sci_name) {\n        push(@class, $sci_name);\n    }\n    @class = reverse @class;\n\n    # do minimal sanity checks before we hand off to Bio::Species which won't\n    # be able to give informative throw messages if it has to throw because\n    # of problems here\n    $self->throw(\"$acc seems to be missing its OS line: invalid.\") unless $sci_name;\n    my %names;\n    foreach my $i (0..$#class) {\n        my $name = $class[$i];\n        $names{$name}++;\n        if ($names{$name} > 1 && $name ne $class[$i - 1]) {\n            $self->throw(\"$acc seems to have an invalid species classification.\");\n        }\n    }\n    my $make = Bio::Species->new();\n    $make->scientific_name($sci_name);\n    $make->classification(@class);\n    unless ($class[-1] eq 'Viruses') {\n        $make->genus($genus) if $genus;\n        $make->species($species) if $species;\n        $make->sub_species($sub_species) if $sub_species;\n        $make->common_name($common) if $common;\n    }\n    $make->organelle($org) if $org;\n    return $make;\n}\n\n=head2 _read_EMBL_DBLink\n\n Title   : _read_EMBL_DBLink\n Usage   :\n Function: Reads the EMBL database cross reference (\"DR\") lines\n Example :\n Returns : A list of Bio::Annotation::DBLink objects\n Args    :\n\n\nsub _read_EMBL_DBLink {\n    my( $self,$buffer ) = @_;\n    my( @db_link );\n\n    $_ = $$buffer;\n    while (defined( $_ ||= $self->_readline )) {\n        if ( /^DR   ([^\\s;]+);\\s*([^\\s;]+);?\\s*([^\\s;]+)?\\.$/) {\n        my ($databse, $prim_id, $sec_id) = ($1,$2,$3);\n        my $link = Bio::Annotation::DBLink->new(-database    => $databse,\n                            -primary_id  => $prim_id,\n                            -optional_id => $sec_id);\n\n            push(@db_link, $link);\n    } else {\n            last;\n        }\n        $_ = undef;             # Empty $_ to trigger read of next line\n    }\n\n    $$buffer = $_;\n    return @db_link;\n}\n\n=head2 _read_EMBL_TaxID_DBLink\n\n Title   : _read_EMBL_TaxID_DBLink\n Usage   :\n Function: Reads the EMBL database cross reference to NCBI TaxID (\"OX\") lines\n Example :\n Returns : A list of Bio::Annotation::DBLink objects\n Args    :\n\n\nsub _read_EMBL_TaxID_DBLink {\n    my( $self,$buffer ) = @_;\n    my( @db_link );\n\n    $_ = $$buffer;\n    while (defined( $_ ||= $self->_readline )) {\n        if ( /^OX   (\\S+)=(\\d+);$/ ) {\n            my ($databse, $prim_id) = ($1,$2);\n            my $link = Bio::Annotation::DBLink->new(-database    => $databse,\n                                                    -primary_id  => $prim_id,);\n            push(@db_link, $link);\n        } else {\n            last;\n        }\n        $_ = undef;             # Empty $_ to trigger read of next line\n    }\n\n    $$buffer = $_;\n    return @db_link;\n}\n\n=head2 _filehandle\n\n Title   : _filehandle\n Usage   : $obj->_filehandle($newval)\n Function:\n Example :\n Returns : value of _filehandle\n Args    : newvalue (optional)\n\n\n\nsub _filehandle{\n    my ($obj,$value) = @_;\n    if ( defined $value) {\n        $obj->{'_filehandle'} = $value;\n    }\n    return $obj->{'_filehandle'};\n\n}\n\n=head2 _read_FTHelper_EMBL\n\n Title   : _read_FTHelper_EMBL\n Usage   : _read_FTHelper_EMBL($buffer)\n Function: reads the next FT key line\n Example :\n Returns : Bio::SeqIO::FTHelper object\n Args    : filehandle and reference to a scalar","parameters":[{"label":"$self"},{"label":"$buffer"}],"label":"_read_FTHelper_EMBL($self,$buffer)"},"detail":"($self,$buffer)","kind":12,"children":[{"kind":13,"line":1226,"definition":"my","name":"$self","containerName":"_read_FTHelper_EMBL","localvar":"my"},{"line":1226,"kind":13,"name":"$buffer","containerName":"_read_FTHelper_EMBL"},{"localvar":"my","definition":"my","name":"$key","containerName":"_read_FTHelper_EMBL","line":1228,"kind":13},{"line":1229,"kind":13,"name":"$loc","containerName":"_read_FTHelper_EMBL"},{"name":"@qual","containerName":"_read_FTHelper_EMBL","kind":13,"line":1230},{"containerName":"_read_FTHelper_EMBL","name":"$key","kind":13,"line":1234},{"line":1235,"kind":13,"containerName":"_read_FTHelper_EMBL","name":"$loc"},{"name":"$self","containerName":"_read_FTHelper_EMBL","line":1237,"kind":13},{"name":"_readline","containerName":"_read_FTHelper_EMBL","line":1237,"kind":12},{"line":1243,"kind":13,"containerName":"_read_FTHelper_EMBL","name":"@qual"},{"line":1244,"kind":13,"name":"@qual","containerName":"_read_FTHelper_EMBL"},{"line":1248,"kind":13,"containerName":"_read_FTHelper_EMBL","name":"@qual"},{"containerName":"_read_FTHelper_EMBL","name":"$loc","line":1252,"kind":13},{"containerName":"_read_FTHelper_EMBL","name":"$key","line":1264,"kind":13},{"kind":13,"line":1265,"name":"$loc","containerName":"_read_FTHelper_EMBL"},{"containerName":"_read_FTHelper_EMBL","name":"$self","kind":13,"line":1267},{"name":"_readline","containerName":"_read_FTHelper_EMBL","kind":12,"line":1267},{"line":1269,"kind":13,"containerName":"_read_FTHelper_EMBL","name":"$loc"},{"localvar":"my","containerName":"_read_FTHelper_EMBL","name":"$out","definition":"my","line":1284,"kind":13},{"containerName":"_read_FTHelper_EMBL","name":"new","line":1284,"kind":12},{"line":1285,"kind":13,"containerName":"_read_FTHelper_EMBL","name":"$out"},{"name":"verbose","containerName":"_read_FTHelper_EMBL","line":1285,"kind":12},{"line":1285,"kind":13,"containerName":"_read_FTHelper_EMBL","name":"$self"},{"kind":12,"line":1285,"name":"verbose","containerName":"_read_FTHelper_EMBL"},{"name":"$out","containerName":"_read_FTHelper_EMBL","kind":13,"line":1286},{"kind":12,"line":1286,"name":"key","containerName":"_read_FTHelper_EMBL"},{"line":1286,"kind":13,"name":"$key","containerName":"_read_FTHelper_EMBL"},{"line":1287,"kind":13,"name":"$out","containerName":"_read_FTHelper_EMBL"},{"name":"loc","containerName":"_read_FTHelper_EMBL","line":1287,"kind":12},{"containerName":"_read_FTHelper_EMBL","name":"$loc","line":1287,"kind":13},{"kind":13,"line":1291,"definition":"my","name":"$i","containerName":"_read_FTHelper_EMBL","localvar":"my"},{"line":1291,"kind":13,"containerName":"_read_FTHelper_EMBL","name":"$i"},{"kind":13,"line":1291,"name":"@qual","containerName":"_read_FTHelper_EMBL"},{"name":"$i","containerName":"_read_FTHelper_EMBL","line":1291,"kind":13},{"line":1292,"kind":13,"name":"$qual","containerName":"_read_FTHelper_EMBL"},{"kind":13,"line":1292,"name":"$i","containerName":"_read_FTHelper_EMBL"},{"containerName":"_read_FTHelper_EMBL","definition":"my","name":"$qualifier","localvar":"my","kind":13,"line":1293},{"line":1293,"kind":13,"name":"$value","containerName":"_read_FTHelper_EMBL"},{"kind":13,"line":1294,"containerName":"_read_FTHelper_EMBL","name":"$self"},{"name":"throw","containerName":"_read_FTHelper_EMBL","kind":12,"line":1294},{"kind":13,"line":1295,"name":"@qual","containerName":"_read_FTHelper_EMBL"},{"line":1296,"kind":13,"containerName":"_read_FTHelper_EMBL","name":"$value"},{"containerName":"_read_FTHelper_EMBL","name":"$value","line":1298,"kind":13},{"line":1302,"kind":13,"name":"$value","containerName":"_read_FTHelper_EMBL"},{"kind":13,"line":1302,"containerName":"_read_FTHelper_EMBL","name":"$value"},{"line":1303,"kind":13,"containerName":"_read_FTHelper_EMBL","name":"$i"},{"localvar":"my","containerName":"_read_FTHelper_EMBL","name":"$next","definition":"my","line":1304,"kind":13},{"line":1304,"kind":13,"containerName":"_read_FTHelper_EMBL","name":"$qual"},{"name":"$i","containerName":"_read_FTHelper_EMBL","kind":13,"line":1304},{"line":1305,"kind":13,"containerName":"_read_FTHelper_EMBL","name":"$next"},{"kind":13,"line":1306,"containerName":"_read_FTHelper_EMBL","name":"$self"},{"name":"warn","containerName":"_read_FTHelper_EMBL","kind":12,"line":1306},{"kind":13,"line":1306,"containerName":"_read_FTHelper_EMBL","name":"@qual"},{"name":"$value","containerName":"_read_FTHelper_EMBL","kind":13,"line":1309},{"name":"$value","containerName":"_read_FTHelper_EMBL","line":1314,"kind":13},{"name":"$value","containerName":"_read_FTHelper_EMBL","line":1314,"kind":13},{"containerName":"_read_FTHelper_EMBL","name":"$next","kind":13,"line":1314},{"name":"$next","containerName":"_read_FTHelper_EMBL","line":1314,"kind":13},{"name":"$value","containerName":"_read_FTHelper_EMBL","line":1317,"kind":13},{"kind":13,"line":1319,"name":"$value","containerName":"_read_FTHelper_EMBL"},{"name":"$value","containerName":"_read_FTHelper_EMBL","kind":13,"line":1322},{"line":1326,"kind":13,"containerName":"_read_FTHelper_EMBL","name":"$out"},{"line":1326,"kind":12,"name":"field","containerName":"_read_FTHelper_EMBL"},{"name":"$qualifier","containerName":"_read_FTHelper_EMBL","kind":13,"line":1326},{"kind":13,"line":1327,"name":"$out","containerName":"_read_FTHelper_EMBL"},{"line":1327,"kind":12,"containerName":"_read_FTHelper_EMBL","name":"field"},{"kind":13,"line":1327,"containerName":"_read_FTHelper_EMBL","name":"$qualifier"},{"kind":13,"line":1327,"containerName":"_read_FTHelper_EMBL","name":"$value"}],"line":1225},{"name":"buffer","line":1233,"kind":12},{"name":"buffer","kind":12,"line":1263},{"kind":12,"line":1281,"name":"buffer"},{"kind":12,"line":1284,"containerName":"SeqIO::FTHelper","name":"Bio"},{"name":"QUAL","line":1291,"kind":12},{"name":"QUOTES","kind":12,"line":1301},{"name":"QUOTES","kind":12,"line":1310},{"containerName":null,"name":"$out","kind":13,"line":1330},{"range":{"end":{"line":1364,"character":9999},"start":{"character":0,"line":1345}},"name":"_write_line_EMBL","detail":"($self,$pre1,$pre2,$line,$length)","signature":{"label":"_write_line_EMBL($self,$pre1,$pre2,$line,$length)","documentation":"1;\n# $Id: embl.pm 16123 2009-09-17 12:57:27Z cjfields $\n#\n# BioPerl module for Bio::SeqIO::EMBL\n#\n# Please direct questions and support issues to <bioperl-l@bioperl.org> \n#\n# Cared for by Ewan Birney <birney@ebi.ac.uk>\n#\n# Copyright Ewan Birney\n#\n# You may distribute this module under the same terms as perl itself\n\n# POD documentation - main docs before the code\n\n=head1 NAME\n\nBio::SeqIO::embl - EMBL sequence input/output stream\n\n=head1 SYNOPSIS\n\nIt is probably best not to use this object directly, but\nrather go through the SeqIO handler system. Go:\n\n    $stream = Bio::SeqIO->new(-file => $filename, -format => 'EMBL');\n\n    while ( (my $seq = $stream->next_seq()) ) {\n        # do something with $seq\n    }\n\n=head1 DESCRIPTION\n\nThis object can transform Bio::Seq objects to and from EMBL flat\nfile databases.\n\nThere is a lot of flexibility here about how to dump things which\nshould be documented more fully.\n\nThere should be a common object that this and Genbank share (probably\nwith Swissprot). Too much of the magic is identical.\n\n=head2 Optional functions\n\n=over 3\n\n* _show_dna()\n\n(output only) shows the dna or not\n\n* _post_sort()\n\n(output only) provides a sorting func which is applied to the FTHelpers\nbefore printing\n\n* _id_generation_func()\n\nThis is function which is called as\n\n   print \"ID   \", $func($annseq), \"\\n\";\n\nTo generate the ID line. If it is not there, it generates a sensible ID\nline using a number of tools.\n\nIf you want to output annotations in EMBL format they need to be\nstored in a Bio::Annotation::Collection object which is accessible\nthrough the Bio::SeqI interface method L<annotation()|annotation>.\n\nThe following are the names of the keys which are polled from a\nL<Bio::Annotation::Collection> object.\n\n reference  - Should contain Bio::Annotation::Reference objects\n comment    - Should contain Bio::Annotation::Comment objects\n dblink     - Should contain Bio::Annotation::DBLink objects\n\n\n=head1 FEEDBACK\n\n=head2 Mailing Lists\n\nUser feedback is an integral part of the evolution of this and other\nBioperl modules. Send your comments and suggestions preferably to one\nof the Bioperl mailing lists.  Your participation is much appreciated.\n\n  bioperl-l@bioperl.org                  - General discussion\n  http://bioperl.org/wiki/Mailing_lists  - About the mailing lists\n\n=head2 Support \n\nPlease direct usage questions or support issues to the mailing list:\n\nI<bioperl-l@bioperl.org>\n\nrather than to the module maintainer directly. Many experienced and \nreponsive experts will be able look at the problem and quickly \naddress it. Please include a thorough description of the problem \nwith code and data examples if at all possible.\n\n=head2 Reporting Bugs\n\nReport bugs to the Bioperl bug tracking system to help us keep track\nthe bugs and their resolution. Bug reports can be submitted via\nthe web:\n\n  http://bugzilla.open-bio.org/\n\n=head1 AUTHOR - Ewan Birney\n\nEmail birney@ebi.ac.uk\n\n=head1 APPENDIX\n\nThe rest of the documentation details each of the object\nmethods. Internal methods are usually preceded with a _\n\n\n\n# Let the code begin...\n\n\npackage Bio::SeqIO::embl;\nuse vars qw(%FTQUAL_NO_QUOTE);\nuse strict;\nuse Bio::SeqIO::FTHelper;\nuse Bio::SeqFeature::Generic;\nuse Bio::Species;\nuse Bio::Seq::SeqFactory;\nuse Bio::Annotation::Collection;\nuse Bio::Annotation::Comment;\nuse Bio::Annotation::Reference;\nuse Bio::Annotation::DBLink;\n\nuse base qw(Bio::SeqIO);\n\n%FTQUAL_NO_QUOTE=(\n                  'anticodon'=>1,\n                  'citation'=>1,\n                  'codon'=>1,\n                  'codon_start'=>1,\n                  'cons_splice'=>1,\n                  'direction'=>1,\n                  'evidence'=>1,\n                  'label'=>1,\n                  'mod_base'=> 1,\n                  'number'=> 1,\n                  'rpt_type'=> 1,\n                  'rpt_unit'=> 1,\n                  'transl_except'=> 1,\n                  'transl_table'=> 1,\n                  'usedin'=> 1,\n                 );\n\nsub _initialize {\n    my($self,@args) = @_;\n\n    $self->SUPER::_initialize(@args);\n    # hash for functions for decoding keys.\n    $self->{'_func_ftunit_hash'} = {};\n    # sets this to one by default. People can change it\n    $self->_show_dna(1);\n    if ( ! defined $self->sequence_factory ) {\n        $self->sequence_factory(Bio::Seq::SeqFactory->new\n                                (-verbose => $self->verbose(),\n                                 -type => 'Bio::Seq::RichSeq'));\n    }\n}\n\n=head2 next_seq\n\n Title   : next_seq\n Usage   : $seq = $stream->next_seq()\n Function: returns the next sequence in the stream\n Returns : Bio::Seq object\n Args    :\n\n\nsub next_seq {\n    my ($self,@args) = @_;\n    my ($pseq,$c,$line,$name,$desc,$acc,$seqc,$mol,$div,\n        $date, $comment, @date_arr);\n\n    my ($annotation, %params, @features) =\n        Bio::Annotation::Collection->new();\n\n    $line = $self->_readline;\n    # This needs to be before the first eof() test\n\n    if ( !defined $line ) {\n        return;                 # no throws - end of file\n    }\n\n    if ( $line =~ /^\\s+$/ ) {\n        while ( defined ($line = $self->_readline) ) {\n            $line =~/^\\S/ && last;\n        }\n        # return without error if the whole next sequence was just a single\n        # blank line and then eof\n        return unless $line;\n    }\n\n    # no ID as 1st non-blank line, need short circuit and exit routine\n    $self->throw(\"EMBL stream with no ID. Not embl in my book\")\n        unless $line =~ /^ID\\s+\\S+/;\n\n    # At this point we are sure that $line contains an ID header line\n    my $alphabet;\n    if ( $line =~ tr/;/;/ == 6) { # New style headers contain exactly six semicolons.\n\n        # New style header (EMBL Release >= 87, after June 2006)\n        my $topology;\n        my $sv;\n\n        # ID   DQ299383; SV 1; linear; mRNA; STD; MAM; 431 BP.\n        # This regexp comes from the new2old.pl conversion script, from EBI\n        if ($line =~ m/^ID   (\\w+);\\s+SV (\\d+); (\\w+); ([^;]+); (\\w{3}); (\\w{3}); (\\d+) BP./) {\n        ($name, $sv, $topology, $mol, $div) = ($1, $2, $3, $4, $6);\n        }\n        if (defined($sv)) {\n        $params{'-seq_version'} = $sv;\n        $params{'-version'} = $sv;\n        }\n\n        if ($topology eq \"circular\") {\n        $params{'-is_circular'} = 1;\n        }\n    \n    if (defined $mol ) {\n        if ($mol =~ /DNA/) {\n        $alphabet='dna';\n        } elsif ($mol =~ /RNA/) {\n        $alphabet='rna';\n        } elsif ($mol =~ /AA/) {\n        $alphabet='protein';\n        }\n    }\n    } else {\n    \n        # Old style header (EMBL Release < 87, before June 2006)\n        if ($line =~ /^ID\\s+(\\S+)[^;]*;\\s+(\\S+)[^;]*;\\s+(\\S+)[^;]*;/) {\n        ($name, $mol, $div) = ($1, $2, $3);\n        }\n    \n        if ($mol) {\n            if ( $mol =~ /circular/ ) {\n            $params{'-is_circular'} = 1;\n            $mol =~  s|circular ||;\n            }\n            if (defined $mol ) {\n            if ($mol =~ /DNA/) {\n                $alphabet='dna';\n            } elsif ($mol =~ /RNA/) {\n                $alphabet='rna';\n            } elsif ($mol =~ /AA/) {\n                $alphabet='protein';\n            }\n            }\n        }\n    }\n\n    unless( defined $name && length($name) ) {\n    $name = \"unknown_id\";\n    }\n\n    # $self->warn(\"not parsing upper annotation in EMBL file yet!\");\n    my $buffer = $line;\n    local $_;\n    BEFORE_FEATURE_TABLE :\n          until ( !defined $buffer ) {\n              $_ = $buffer;\n              # Exit at start of Feature table\n              if ( /^(F[HT]|SQ)/ ) {\n                  $self->_pushback($_) if( $1 eq 'SQ' || $1 eq 'FT');\n                  last;\n              }\n              # Description line(s)\n              if (/^DE\\s+(\\S.*\\S)/) {\n                  $desc .= $desc ? \" $1\" : $1;\n              }\n\n              #accession number\n              if ( /^AC\\s+(.*)?/ || /^PA\\s+(.*)?/) {\n                  my @accs = split(/[; ]+/, $1); # allow space in addition\n                  $params{'-accession_number'} = shift @accs\n                      unless defined $params{'-accession_number'};\n                  push @{$params{'-secondary_accessions'}}, @accs;\n              }\n\n              #version number\n              if ( /^SV\\s+\\S+\\.(\\d+);?/ ) {\n                  my $sv = $1;\n                  #$sv =~ s/\\;//;\n                  $params{'-seq_version'} = $sv;\n                  $params{'-version'} = $sv;\n              }\n\n              #date (NOTE: takes last date line)\n              if ( /^DT\\s+(.+)$/ ) {\n                  my $line = $1;\n                  my ($date, $version) = split(' ', $line, 2);\n                  $date =~ tr/,//d; # remove comma if new version\n                  if ($version =~ /\\(Rel\\. (\\d+), Created\\)/xms ) {\n                      my $release = Bio::Annotation::SimpleValue->new(\n                                                                      -tagname    => 'creation_release',\n                                                                      -value      => $1\n                                                                     );\n                      $annotation->add_Annotation($release);\n                  } elsif ($version =~ /\\(Rel\\. (\\d+), Last updated, Version (\\d+)\\)/xms ) {\n                      my $release = Bio::Annotation::SimpleValue->new(\n                                                                      -tagname    => 'update_release',\n                                                                      -value      => $1\n                                                                     );\n                      $annotation->add_Annotation($release);\n\n                      my $update = Bio::Annotation::SimpleValue->new(\n                                                                     -tagname    => 'update_version',\n                                                                     -value      => $2\n                                                                    );\n                      $annotation->add_Annotation($update);\n                  }\n                  push @{$params{'-dates'}}, $date;\n              }\n\n              #keywords\n              if ( /^KW   (.*)\\S*$/ ) {\n                  my @kw = split(/\\s*\\;\\s*/,$1);\n                  push @{$params{'-keywords'}}, @kw;\n              }\n\n              # Organism name and phylogenetic information\n              elsif (/^O[SC]/) {\n                  # pass the accession number so we can give an informative throw message if necessary\n                  my $species = $self->_read_EMBL_Species(\\$buffer, $params{'-accession_number'});\n                  $params{'-species'}= $species;\n              }\n\n              # NCBI TaxID Xref\n              elsif (/^OX/) {\n                  my @links = $self->_read_EMBL_TaxID_DBLink(\\$buffer);\n                  foreach my $dblink ( @links ) {\n                      $annotation->add_Annotation('dblink',$dblink);\n                  }\n              }\n\n              # References\n              elsif (/^R/) {\n                  my @refs = $self->_read_EMBL_References(\\$buffer);\n                  foreach my $ref ( @refs ) {\n                      $annotation->add_Annotation('reference',$ref);\n                  }\n              }\n\n              # DB Xrefs\n              elsif (/^DR/) {\n                  my @links = $self->_read_EMBL_DBLink(\\$buffer);\n                  foreach my $dblink ( @links ) {\n                      $annotation->add_Annotation('dblink',$dblink);\n                  }\n              }\n\n              # Comments\n              elsif (/^CC\\s+(.*)/) {\n                  $comment .= $1;\n                  $comment .= \" \";\n                  while (defined ($_ = $self->_readline) ) {\n                      if (/^CC\\s+(.*)/) {\n                          $comment .= $1;\n                          $comment .= \" \";\n                      } else {\n                          last;\n                      }\n                  }\n                  my $commobj = Bio::Annotation::Comment->new();\n                  $commobj->text($comment);\n                  $annotation->add_Annotation('comment',$commobj);\n                  $comment = \"\";\n              }\n\n              # Get next line.\n              $buffer = $self->_readline;\n          }\n\n    while ( defined ($_ = $self->_readline) ) {\n        /^FT\\s{3}\\w/ && last;\n        /^SQ / && last;\n        /^CO / && last;\n    }\n    $buffer = $_;\n\n    if (defined($buffer) && $buffer =~ /^FT /) {\n        until ( !defined ($buffer) ) {\n            my $ftunit = $self->_read_FTHelper_EMBL(\\$buffer);\n\n            # process ftunit\n            my $feat =\n                $ftunit->_generic_seqfeature($self->location_factory(), $name);\n\n            # add taxon_id from source if available\n            if ($params{'-species'} && ($feat->primary_tag eq 'source')\n                && $feat->has_tag('db_xref')\n                && (! $params{'-species'}->ncbi_taxid())) {\n                foreach my $tagval ($feat->get_tag_values('db_xref')) {\n                    if (index($tagval,\"taxon:\") == 0) {\n                        $params{'-species'}->ncbi_taxid(substr($tagval,6));\n                        last;\n                    }\n                }\n            }\n\n            # add feature to list of features\n            push(@features, $feat);\n\n            if ( $buffer !~ /^FT/ ) {\n                last;\n            }\n        }\n    }\n    # skip comments\n    while ( defined ($buffer) && $buffer =~ /^XX/ ) {\n        $buffer = $self->_readline();\n    }\n\n    if ( $buffer =~ /^CO/  ) {\n        until ( !defined ($buffer) ) {\n            my $ftunit = $self->_read_FTHelper_EMBL(\\$buffer);\n            # process ftunit\n            push(@features,\n                 $ftunit->_generic_seqfeature($self->location_factory(),\n                                              $name));\n\n            if ( $buffer !~ /^CO/ ) {\n                last;\n            }\n        }\n    }\n    if ( $buffer !~ /^SQ/  ) {\n        while ( defined ($_ = $self->_readline) ) {\n            /^SQ/ && last;\n        }\n    }\n    $seqc = \"\";\n    while ( defined ($_ = $self->_readline) ) {\n        m{^//} && last;\n        $_ = uc($_);\n        s/[^A-Za-z]//g;\n        $seqc .= $_;\n    }\n    my $seq = $self->sequence_factory->create\n        (-verbose => $self->verbose(),\n         -division => $div,\n         -seq => $seqc,\n         -desc => $desc,\n         -display_id => $name,\n         -annotation => $annotation,\n         -molecule => $mol,\n         -alphabet => $alphabet,\n         -features => \\@features,\n         %params);\n    return $seq;\n}\n\n\n\n=head2 _write_ID_line\n\n Title   : _write_ID_line\n Usage   : $self->_write_ID_line($seq);\n Function: Writes the EMBL Release 87 format ID line to the stream, unless\n         : there is a user-supplied ID line generation function in which\n         : case that is used instead.\n         : ( See Bio::SeqIO::embl::_id_generation_function(). )\n Returns : nothing\n Args    : Bio::Seq object\n\n\nsub _write_ID_line {\n\n    my ($self, $seq) = @_;\n\n    my $id_line;\n    # If there is a user-supplied ID generation function, use it.\n    if ( $self->_id_generation_func ) {\n        $id_line = \"ID   \" . &{$self->_id_generation_func}($seq) . \"\\nXX\\n\";\n    }\n    # Otherwise, generate a standard EMBL release 87 (June 2006) ID line.\n    else {\n\n        # The sequence name is supposed to be the primary accession number,\n        my $name = $seq->accession_number();\n        if (!$name) {\n            # but if it is not present, use the sequence ID.\n            $name = $seq->id();\n        }\n\n        $self->warn(\"No whitespace allowed in EMBL id [\". $name. \"]\") if $name =~ /\\s/;\n\n        # Use the sequence version, or default to 1.\n        my $version = $seq->version() || 1;\n\n        my $len = $seq->length();\n\n        # Taxonomic division.\n        my $div;\n        if ( $seq->can('division') && defined($seq->division) &&\n             $self->_is_valid_division($seq->division) ) {\n            $div = $seq->division();\n        } else {\n            $div ||= 'UNC';     # 'UNC' is the EMBL division code for 'unclassified'.\n        }\n\n        my $mol;\n        # If the molecule type is a valid EMBL type, use it.\n        if (  $seq->can('molecule')\n              && defined($seq->molecule)\n              && $self->_is_valid_molecule_type($seq->molecule)\n           ) {\n            $mol = $seq->molecule();\n        }\n        # Otherwise, choose unassigned DNA or RNA based on the alphabet.\n        elsif ($seq->can('primary_seq') && defined $seq->primary_seq->alphabet) {\n            my $alphabet =$seq->primary_seq->alphabet;\n            if ($alphabet eq 'dna') {\n                $mol ='unassigned DNA';\n            } elsif ($alphabet eq 'rna') {\n                $mol='unassigned RNA';\n            } elsif ($alphabet eq 'protein') {\n                $self->warn(\"Protein sequence found; EMBL is a nucleotide format.\");\n                $mol='AA';  # AA is not a valid EMBL molecule type.\n            }\n        }\n\n        my $topology = 'linear';\n        if ($seq->is_circular) {\n            $topology = 'circular';\n        }\n\n        $mol ||= '';            # 'unassigned'; ?\n        $id_line = \"ID   $name; SV $version; $topology; $mol; STD; $div; $len BP.\\nXX\\n\";\n        $self->_print($id_line);\n    }\n}\n\n=head2 _is_valid_division\n\n Title   : _is_valid_division\n Usage   : $self->_is_valid_division($div)\n Function: tests division code for validity\n Returns : true if $div is a valid EMBL release 87 taxonomic division.\n Args    : taxonomic division code string\n\n\nsub _is_valid_division {\n    my ($self, $division) = @_;\n\n    my %EMBL_divisions = (\n                          \"PHG\"    => 1, # Bacteriophage\n                          \"ENV\"    => 1, # Environmental Sample\n                          \"FUN\"    => 1, # Fungal\n                          \"HUM\"    => 1, # Human\n                          \"INV\"    => 1, # Invertebrate\n                          \"MAM\"    => 1, # Other Mammal\n                          \"VRT\"    => 1, # Other Vertebrate\n                          \"MUS\"    => 1, # Mus musculus\n                          \"PLN\"    => 1, # Plant\n                          \"PRO\"    => 1, # Prokaryote\n                          \"ROD\"    => 1, # Other Rodent\n                          \"SYN\"    => 1, # Synthetic\n                          \"UNC\"    => 1, # Unclassified\n                          \"VRL\"    => 1 # Viral\n                         );\n\n    return exists($EMBL_divisions{$division});\n}\n\n=head2 _is_valid_molecule_type\n\n Title   : _is_valid_molecule_type\n Usage   : $self->_is_valid_molecule_type($mol)\n Function: tests molecule type for validity\n Returns : true if $mol is a valid EMBL release 87 molecule type.\n Args    : molecule type string\n\n\nsub _is_valid_molecule_type {\n    my ($self, $moltype) = @_;\n\n    my %EMBL_molecule_types = (\n                               \"genomic DNA\"    => 1,\n                               \"genomic RNA\"    => 1,\n                               \"mRNA\"           => 1,\n                               \"tRNA\"           => 1,\n                               \"rRNA\"           => 1,\n                               \"snoRNA\"         => 1,\n                               \"snRNA\"          => 1,\n                               \"scRNA\"          => 1,\n                               \"pre-RNA\"        => 1,\n                               \"other RNA\"      => 1,\n                               \"other DNA\"      => 1,\n                               \"unassigned DNA\" => 1,\n                               \"unassigned RNA\" => 1\n                              );\n\n    return exists($EMBL_molecule_types{$moltype});\n}\n\n=head2 write_seq\n\n Title   : write_seq\n Usage   : $stream->write_seq($seq)\n Function: writes the $seq object (must be seq) to the stream\n Returns : 1 for success and undef for error\n Args    : array of 1 to n Bio::SeqI objects\n\n\n\nsub write_seq {\n    my ($self,@seqs) = @_;\n\n    foreach my $seq ( @seqs ) {\n        $self->throw(\"Attempting to write with no seq!\") unless defined $seq;\n        unless ( ref $seq && $seq->isa('Bio::SeqI' ) ) {\n            $self->warn(\"$seq is not a SeqI compliant sequence object!\")\n                if $self->verbose >= 0;\n            unless ( ref $seq && $seq->isa('Bio::PrimarySeqI' ) ) {\n                $self->throw(\"$seq is not a PrimarySeqI compliant sequence object!\");\n            }\n        }\n        my $str = $seq->seq || '';\n\n        # Write the ID line.\n        $self->_write_ID_line($seq);\n\n\n        # Write the accession line if present\n        my( $acc );\n        {\n            if ( my $func = $self->_ac_generation_func ) {\n                $acc = &{$func}($seq);\n            } elsif ( $seq->isa('Bio::Seq::RichSeqI') &&\n                      defined($seq->accession_number) ) {\n                $acc = $seq->accession_number;\n                $acc = join(\"; \", $acc, $seq->get_secondary_accessions);\n            } elsif ( $seq->can('accession_number') ) {\n                $acc = $seq->accession_number;\n            }\n\n            if (defined $acc) {\n                $self->_print(\"AC   $acc;\\n\",\n                              \"XX\\n\") || return;\n            }\n        }\n\n        # Date lines\n        my $switch=0;\n        if ( $seq->can('get_dates') ) {\n            my @dates =  $seq->get_dates();\n            my $ct = 1;\n            my $date_flag = 0;\n            my ($cr) = $seq->annotation->get_Annotations(\"creation_release\");\n            my ($ur) = $seq->annotation->get_Annotations(\"update_release\");\n            my ($uv) = $seq->annotation->get_Annotations(\"update_version\");\n\n            unless ($cr && $ur && $ur) {\n                $date_flag = 1;\n            }\n\n            foreach my $dt (@dates) {\n                if (!$date_flag) {\n                    $self->_write_line_EMBL_regex(\"DT   \",\"DT   \",\n                                                  $dt.\" (Rel. $cr, Created)\",\n                                                  '\\s+|$',80) if $ct == 1;\n                    $self->_write_line_EMBL_regex(\"DT   \",\"DT   \",\n                                                  $dt.\" (Rel. $ur, Last updated, Version $uv)\",\n                                                  '\\s+|$',80) if $ct == 2;\n                } else {        # other formats?\n                    $self->_write_line_EMBL_regex(\"DT   \",\"DT   \",\n                                                  $dt,'\\s+|$',80);\n                }\n                $switch =1;\n                $ct++;\n            }\n            if ($switch == 1) {\n                $self->_print(\"XX\\n\") || return;\n            }\n        }\n\n        # Description lines\n        $self->_write_line_EMBL_regex(\"DE   \",\"DE   \",$seq->desc(),'\\s+|$',80) || return; #'\n        $self->_print( \"XX\\n\") || return;\n\n        # if there, write the kw line\n        {\n            my( $kw );\n            if ( my $func = $self->_kw_generation_func ) {\n                $kw = &{$func}($seq);\n            } elsif ( $seq->can('keywords') ) {\n                $kw = $seq->keywords;\n            }\n            if (defined $kw) {\n                $self->_write_line_EMBL_regex(\"KW   \", \"KW   \", $kw, '\\s+|$', 80) || return; #'\n                $self->_print( \"XX\\n\") || return;\n            }\n        }\n\n        # Organism lines\n\n        if ($seq->can('species') && (my $spec = $seq->species)) {\n            my @class = $spec->classification();\n            shift @class;       # get rid of species name. Some embl files include\n                                # the species name in the OC lines, but this seems\n                                # more like an error than something we need to\n                                # emulate\n            my $OS = $spec->scientific_name;\n            if ($spec->common_name) {\n                $OS .= ' ('.$spec->common_name.')';\n            }\n            $self->_print(\"OS   $OS\\n\") || return;\n            my $OC = join('; ', reverse(@class)) .'.';\n            $self->_write_line_EMBL_regex(\"OC   \",\"OC   \",$OC,'; |$',80) || return;\n            if ($spec->organelle) {\n                $self->_write_line_EMBL_regex(\"OG   \",\"OG   \",$spec->organelle,'; |$',80) || return;\n            }\n            $self->_print(\"XX\\n\") || return;\n        }\n\n        # Reference lines\n        my $t = 1;\n        if ( $seq->can('annotation') && defined $seq->annotation ) {\n            foreach my $ref ( $seq->annotation->get_Annotations('reference') ) {\n                $self->_print( \"RN   [$t]\\n\") || return;\n\n                # Having no RP line is legal, but we need both\n                # start and end for a valid location.\n                if ($ref->comment) {\n                    $self->_write_line_EMBL_regex(\"RC   \", \"RC   \", $ref->comment, '\\s+|$', 80) || return; #'\n                }\n                my $start = $ref->start;\n                my $end   = $ref->end;\n                if ($start and $end) {\n                    $self->_print( \"RP   $start-$end\\n\") || return;\n                } elsif ($start or $end) {\n                    $self->throw(\"Both start and end are needed for a valid RP line.\".\n                                 \"  Got: start='$start' end='$end'\");\n                }\n\n                if (my $med = $ref->medline) {\n                    $self->_print( \"RX   MEDLINE; $med.\\n\") || return;\n                }\n                if (my $pm = $ref->pubmed) {\n                    $self->_print( \"RX   PUBMED; $pm.\\n\") || return;\n                }\n                my $authors = $ref->authors;\n                $authors =~ s/([\\w\\.]) (\\w)/$1#$2/g;  # add word wrap protection char '#'\n\n                $self->_write_line_EMBL_regex(\"RA   \", \"RA   \",\n                                              $authors . \";\",\n                                              '\\s+|$', 80) || return; #'\n\n                # If there is no title to the reference, it appears\n                # as a single semi-colon.  All titles must end in\n                # a semi-colon.\n                my $ref_title = $ref->title || '';\n                $ref_title =~ s/[\\s;]*$/;/;\n                $self->_write_line_EMBL_regex(\"RT   \", \"RT   \", $ref_title,    '\\s+|$', 80) || return; #'\n                $self->_write_line_EMBL_regex(\"RL   \", \"RL   \", $ref->location, '\\s+|$', 80) || return; #'\n                $self->_print(\"XX\\n\") || return;\n                $t++;\n            }\n\n            # DB Xref lines\n            if (my @db_xref = $seq->annotation->get_Annotations('dblink') ) {\n                for my $dr (@db_xref) {\n                    my $db_name = $dr->database;\n                    my $prim    = $dr->primary_id;\n\n                    my $opt     = $dr->optional_id || '';\n                    my $line = $opt ? \"$db_name; $prim; $opt.\" : \"$db_name; $prim.\";\n                    $self->_write_line_EMBL_regex(\"DR   \", \"DR   \", $line, '\\s+|$', 80) || return; #'\n                }\n                $self->_print(\"XX\\n\") || return;\n            }\n            \n            # Comment lines\n            foreach my $comment ( $seq->annotation->get_Annotations('comment') ) {\n                $self->_write_line_EMBL_regex(\"CC   \", \"CC   \", $comment->text, '\\s+|$', 80) || return; #'\n                $self->_print(\"XX\\n\") || return;\n            }\n        }\n        # \"\\\\s\\+\\|\\$\"\n\n        ## FEATURE TABLE\n\n        $self->_print(\"FH   Key             Location/Qualifiers\\n\") || return;\n        $self->_print(\"FH\\n\") || return;\n\n        my @feats = $seq->can('top_SeqFeatures') ? $seq->top_SeqFeatures : ();\n        if ($feats[0]) {\n            if ( defined $self->_post_sort ) {\n                # we need to read things into an array.\n                # Process. Sort them. Print 'em\n\n                my $post_sort_func = $self->_post_sort();\n                my @fth;\n\n                foreach my $sf ( @feats ) {\n                    push(@fth,Bio::SeqIO::FTHelper::from_SeqFeature($sf,$seq));\n                }\n\n                @fth = sort { &$post_sort_func($a,$b) } @fth;\n\n                foreach my $fth ( @fth ) {\n                    $self->_print_EMBL_FTHelper($fth) || return;\n                }\n            } else {\n                # not post sorted. And so we can print as we get them.\n                # lower memory load...\n\n                foreach my $sf ( @feats ) {\n                    my @fth = Bio::SeqIO::FTHelper::from_SeqFeature($sf,$seq);\n                    foreach my $fth ( @fth ) {\n                        if ( $fth->key eq 'CONTIG') {\n                            $self->_show_dna(0);\n                        }\n                        $self->_print_EMBL_FTHelper($fth) || return;\n                    }\n                }\n            }\n        }\n\n        if ( $self->_show_dna() == 0 ) {\n            $self->_print( \"//\\n\") || return;\n            return;\n        }\n        $self->_print( \"XX\\n\") || return;\n\n        # finished printing features.\n\n        $str =~ tr/A-Z/a-z/;\n\n        # Count each nucleotide\n        my $alen = $str =~ tr/a/a/;\n        my $clen = $str =~ tr/c/c/;\n        my $glen = $str =~ tr/g/g/;\n        my $tlen = $str =~ tr/t/t/;\n\n        my $len = $seq->length();\n        my $olen = $seq->length() - ($alen + $tlen + $clen + $glen);\n        if ( $olen < 0 ) {\n            $self->warn(\"Weird. More atgc than bases. Problem!\");\n        }\n\n        $self->_print(\"SQ   Sequence $len BP; $alen A; $clen C; $glen G; $tlen T; $olen other;\\n\") || return;\n\n        my $nuc = 60;       # Number of nucleotides per line\n        my $whole_pat = 'a10' x 6; # Pattern for unpacking a whole line\n        my $out_pat   = 'A11' x 6; # Pattern for packing a line\n        my $length = length($str);\n\n        # Calculate the number of nucleotides which fit on whole lines\n        my $whole = int($length / $nuc) * $nuc;\n\n        # Print the whole lines\n        my( $i );\n        for ($i = 0; $i < $whole; $i += $nuc) {\n            my $blocks = pack $out_pat,\n                unpack $whole_pat,\n                    substr($str, $i, $nuc);\n            $self->_print(sprintf(\"     $blocks%9d\\n\", $i + $nuc)) || return;\n        }\n\n        # Print the last line\n        if (my $last = substr($str, $i)) {\n            my $last_len = length($last);\n            my $last_pat = 'a10' x int($last_len / 10) .'a'. $last_len % 10;\n            my $blocks = pack $out_pat,\n                unpack($last_pat, $last);\n            $self->_print(sprintf(\"     $blocks%9d\\n\", $length)) ||\n                return;         # Add the length to the end\n        }\n\n        $self->_print( \"//\\n\") || return;\n\n        $self->flush if $self->_flush_on_write && defined $self->_fh;\n    }\n    return 1;\n}\n\n=head2 _print_EMBL_FTHelper\n\n Title   : _print_EMBL_FTHelper\n Usage   :\n Function: Internal function\n Returns : 1 if writing suceeded, otherwise undef\n Args    :\n\n\n\nsub _print_EMBL_FTHelper {\n    my ($self,$fth) = @_;\n\n    if ( ! ref $fth || ! $fth->isa('Bio::SeqIO::FTHelper') ) {\n        $fth->warn(\"$fth is not a FTHelper class. Attempting to print, but there could be tears!\");\n    }\n\n\n    #$self->_print( \"FH   Key             Location/Qualifiers\\n\");\n    #$self->_print( sprintf(\"FT   %-15s  %s\\n\",$fth->key,$fth->loc));\n    # let\n    if ( $fth->key eq 'CONTIG' ) {\n        $self->_print(\"XX\\n\") || return;\n        $self->_write_line_EMBL_regex(\"CO   \",\n                                      \"CO   \",$fth->loc,\n                                      '\\,|$',80) || return; #'\n        return 1;\n    }\n    $self->_write_line_EMBL_regex(sprintf(\"FT   %-15s \",$fth->key),\n                                  \"FT                   \",$fth->loc,\n                                  '\\,|$',80) || return; #'\n    foreach my $tag ( keys %{$fth->field} ) {\n        if ( ! defined $fth->field->{$tag} ) {\n            next;\n        }\n        foreach my $value ( @{$fth->field->{$tag}} ) {\n            $value =~ s/\\\"/\\\"\\\"/g;\n            if ($value eq \"_no_value\") {\n                $self->_write_line_EMBL_regex(\"FT                   \",\n                                              \"FT                   \",\n                                              \"/$tag\",'.|$',80) || return; #'\n            }\n            # there are almost 3x more quoted qualifier values and they\n            # are more common too so we take quoted ones first\n            elsif (!$FTQUAL_NO_QUOTE{$tag}) {\n                my $pat = $value =~ /\\s/ ? '\\s|\\-|$' : '.|\\-|$';\n                $self->_write_line_EMBL_regex(\"FT                   \",\n                                              \"FT                   \",\n                                              \"/$tag=\\\"$value\\\"\",$pat,80) || return;\n            } else {\n                $self->_write_line_EMBL_regex(\"FT                   \",\n                                              \"FT                   \",\n                                              \"/$tag=$value\",'.|$',80) || return; #'\n                                          }\n            }\n        }\n\n        return 1;\n    }\n\n#'\n=head2 _read_EMBL_References\n\n Title   : _read_EMBL_References\n Usage   :\n Function: Reads references from EMBL format. Internal function really\n Example :\n Returns :\n Args    :\n\n\n\nsub _read_EMBL_References {\n    my ($self,$buffer) = @_;\n    my (@refs);\n\n    # assume things are starting with RN\n\n    if ( $$buffer !~ /^RN/ ) {\n        warn(\"Not parsing line '$$buffer' which maybe important\");\n    }\n    my $b1;\n    my $b2;\n    my $title;\n    my $loc;\n    my $au;\n    my $med;\n    my $pm;\n    my $com;\n\n    while ( defined ($_ = $self->_readline) ) {\n        /^R/ || last;\n        /^RP   (\\d+)-(\\d+)/ && do {$b1=$1;$b2=$2;};\n        /^RX   MEDLINE;\\s+(\\d+)/ && do {$med=$1};\n        /^RX   PUBMED;\\s+(\\d+)/ && do {$pm=$1};\n        /^RA   (.*)/ && do {\n            $au = $self->_concatenate_lines($au,$1); next;\n        };\n        /^RT   (.*)/ && do {\n            $title = $self->_concatenate_lines($title,$1); next;\n        };\n        /^RL   (.*)/ && do {\n            $loc = $self->_concatenate_lines($loc,$1); next;\n        };\n        /^RC   (.*)/ && do {\n            $com = $self->_concatenate_lines($com,$1); next;\n        };\n    }\n\n    my $ref = Bio::Annotation::Reference->new();\n    $au =~ s/;\\s*$//g;\n    $title =~ s/;\\s*$//g;\n\n    $ref->start($b1);\n    $ref->end($b2);\n    $ref->authors($au);\n    $ref->title($title);\n    $ref->location($loc);\n    $ref->medline($med);\n    $ref->comment($com);\n    $ref->pubmed($pm);\n\n    push(@refs,$ref);\n    $$buffer = $_;\n\n    return @refs;\n}\n\n=head2 _read_EMBL_Species\n\n Title   : _read_EMBL_Species\n Usage   :\n Function: Reads the EMBL Organism species and classification\n           lines.\n Example :\n Returns : A Bio::Species object\n Args    : a reference to the current line buffer, accession number\n\n\nsub _read_EMBL_Species {\n    my( $self, $buffer, $acc ) = @_;\n    my $org;\n\n    $_ = $$buffer;\n    my( $sub_species, $species, $genus, $common, $sci_name, $class_lines );\n    while (defined( $_ ||= $self->_readline )) {\n        if (/^OS\\s+(.+)/) {\n            $sci_name .= ($sci_name) ? ' '.$1 : $1;\n        } elsif (s/^OC\\s+(.+)$//) {\n            $class_lines .= $1;\n        } elsif (/^OG\\s+(.*)/) {\n            $org = $1;\n        } else {\n            last;\n        }\n\n        $_ = undef;             # Empty $_ to trigger read of next line\n    }\n\n#    $$buffer = $_;\n\t$self->_pushback($_);\n\t\n    $sci_name =~ s{\\.$}{};\n    $sci_name || return;\n\n    # Convert data in classification lines into classification array.\n    # only split on ';' or '.' so that classification that is 2 or more words\n    # will still get matched, use map() to remove trailing/leading/intervening\n    # spaces\n    my @class = map { s/^\\s+//; s/\\s+$//; s/\\s{2,}/ /g; $_; } split /(?<!subgen)[;\\.]+/, $class_lines;\n\n    # do we have a genus?\n    my $possible_genus = $class[-1];\n    $possible_genus .= \"|$class[-2]\" if $class[-2];\n    if ($sci_name =~ /^($possible_genus)/) {\n        $genus = $1;\n        ($species) = $sci_name =~ /^$genus\\s+(.+)/;\n    } else {\n        $species = $sci_name;\n    }\n\n    # Don't make a species object if it is \"Unknown\" or \"None\"\n    if ($genus) {\n        return if $genus =~ /^(Unknown|None)$/i;\n    }\n\n    # is this organism of rank species or is it lower?\n    # (doesn't catch everything, but at least the guess isn't dangerous)\n    if ($species =~ /subsp\\.|var\\./) {\n        ($species, $sub_species) = $species =~ /(.+)\\s+((?:subsp\\.|var\\.).+)/;\n    }\n\n    # sometimes things have common name in brackets, like\n    # Schizosaccharomyces pombe (fission yeast), so get rid of the common\n    # name bit. Probably dangerous if real scientific species name ends in\n    # bracketed bit.\n    unless ($class[-1] eq 'Viruses') {\n        ($species, $common) = $species =~ /^(.+)\\s+\\((.+)\\)$/;\n        $sci_name =~ s/\\s+\\(.+\\)$// if $common;\n    }\n\n    # Bio::Species array needs array in Species -> Kingdom direction\n    unless ($class[-1] eq $sci_name) {\n        push(@class, $sci_name);\n    }\n    @class = reverse @class;\n\n    # do minimal sanity checks before we hand off to Bio::Species which won't\n    # be able to give informative throw messages if it has to throw because\n    # of problems here\n    $self->throw(\"$acc seems to be missing its OS line: invalid.\") unless $sci_name;\n    my %names;\n    foreach my $i (0..$#class) {\n        my $name = $class[$i];\n        $names{$name}++;\n        if ($names{$name} > 1 && $name ne $class[$i - 1]) {\n            $self->throw(\"$acc seems to have an invalid species classification.\");\n        }\n    }\n    my $make = Bio::Species->new();\n    $make->scientific_name($sci_name);\n    $make->classification(@class);\n    unless ($class[-1] eq 'Viruses') {\n        $make->genus($genus) if $genus;\n        $make->species($species) if $species;\n        $make->sub_species($sub_species) if $sub_species;\n        $make->common_name($common) if $common;\n    }\n    $make->organelle($org) if $org;\n    return $make;\n}\n\n=head2 _read_EMBL_DBLink\n\n Title   : _read_EMBL_DBLink\n Usage   :\n Function: Reads the EMBL database cross reference (\"DR\") lines\n Example :\n Returns : A list of Bio::Annotation::DBLink objects\n Args    :\n\n\nsub _read_EMBL_DBLink {\n    my( $self,$buffer ) = @_;\n    my( @db_link );\n\n    $_ = $$buffer;\n    while (defined( $_ ||= $self->_readline )) {\n        if ( /^DR   ([^\\s;]+);\\s*([^\\s;]+);?\\s*([^\\s;]+)?\\.$/) {\n        my ($databse, $prim_id, $sec_id) = ($1,$2,$3);\n        my $link = Bio::Annotation::DBLink->new(-database    => $databse,\n                            -primary_id  => $prim_id,\n                            -optional_id => $sec_id);\n\n            push(@db_link, $link);\n    } else {\n            last;\n        }\n        $_ = undef;             # Empty $_ to trigger read of next line\n    }\n\n    $$buffer = $_;\n    return @db_link;\n}\n\n=head2 _read_EMBL_TaxID_DBLink\n\n Title   : _read_EMBL_TaxID_DBLink\n Usage   :\n Function: Reads the EMBL database cross reference to NCBI TaxID (\"OX\") lines\n Example :\n Returns : A list of Bio::Annotation::DBLink objects\n Args    :\n\n\nsub _read_EMBL_TaxID_DBLink {\n    my( $self,$buffer ) = @_;\n    my( @db_link );\n\n    $_ = $$buffer;\n    while (defined( $_ ||= $self->_readline )) {\n        if ( /^OX   (\\S+)=(\\d+);$/ ) {\n            my ($databse, $prim_id) = ($1,$2);\n            my $link = Bio::Annotation::DBLink->new(-database    => $databse,\n                                                    -primary_id  => $prim_id,);\n            push(@db_link, $link);\n        } else {\n            last;\n        }\n        $_ = undef;             # Empty $_ to trigger read of next line\n    }\n\n    $$buffer = $_;\n    return @db_link;\n}\n\n=head2 _filehandle\n\n Title   : _filehandle\n Usage   : $obj->_filehandle($newval)\n Function:\n Example :\n Returns : value of _filehandle\n Args    : newvalue (optional)\n\n\n\nsub _filehandle{\n    my ($obj,$value) = @_;\n    if ( defined $value) {\n        $obj->{'_filehandle'} = $value;\n    }\n    return $obj->{'_filehandle'};\n\n}\n\n=head2 _read_FTHelper_EMBL\n\n Title   : _read_FTHelper_EMBL\n Usage   : _read_FTHelper_EMBL($buffer)\n Function: reads the next FT key line\n Example :\n Returns : Bio::SeqIO::FTHelper object\n Args    : filehandle and reference to a scalar\n\n\n\nsub _read_FTHelper_EMBL {\n    my ($self,$buffer) = @_;\n\n    my ($key,                   # The key of the feature\n        $loc,                   # The location line from the feature\n        @qual,                  # An arrray of lines making up the qualifiers\n       );\n\n    if ($$buffer =~ /^FT\\s{3}(\\S+)\\s+(\\S+)/ ) {\n        $key = $1;\n        $loc = $2;\n        # Read all the lines up to the next feature\n        while ( defined($_ = $self->_readline) ) {\n            if (/^FT(\\s+)(.+?)\\s*$/) {\n                # Lines inside features are preceeded by 19 spaces\n                # A new feature is preceeded by 3 spaces\n                if (length($1) > 4) {\n                    # Add to qualifiers if we're in the qualifiers\n                    if (@qual) {\n                        push(@qual, $2);\n                    }\n                    # Start the qualifier list if it's the first qualifier\n                    elsif (substr($2, 0, 1) eq '/') {\n                        @qual = ($2);\n                    }\n                    # We're still in the location line, so append to location\n                    else {\n                        $loc .= $2;\n                    }\n                } else {\n                    # We've reached the start of the next feature\n                    last;\n                }\n            } else {\n                # We're at the end of the feature table\n                last;\n            }\n        }\n    } elsif ( $$buffer =~ /^CO\\s+(\\S+)/) {\n    $key = 'CONTIG';\n    $loc = $1;\n    # Read all the lines up to the next feature\n    while ( defined($_ = $self->_readline) ) {\n        if (/^CO\\s+(\\S+)\\s*$/) {\n        $loc .= $1;\n        } else {\n        # We've reached the start of the next feature\n        last;\n        }\n    }\n    } else {\n        # No feature key\n        return;\n    }\n\n    # Put the first line of the next feature into the buffer\n    $$buffer = $_;\n\n    # Make the new FTHelper object\n    my $out = Bio::SeqIO::FTHelper->new();\n    $out->verbose($self->verbose());\n    $out->key($key);\n    $out->loc($loc);\n\n    # Now parse and add any qualifiers.  (@qual is kept\n    # intact to provide informative error messages.)\n  QUAL: for (my $i = 0; $i < @qual; $i++) {\n        $_ = $qual[$i];\n        my( $qualifier, $value ) = m{^/([^=]+)(?:=(.+))?}\n            or $self->throw(\"Can't see new qualifier in: $_\\nfrom:\\n\"\n                            . join('', map \"$_\\n\", @qual));\n        if (defined $value) {\n            # Do we have a quoted value?\n            if (substr($value, 0, 1) eq '\"') {\n                # Keep adding to value until we find the trailing quote\n                # and the quotes are balanced\n              QUOTES:\n                while ($value !~ /\"$/ or $value =~ tr/\"/\"/ % 2) { #\"\n                    $i++;\n                    my $next = $qual[$i];\n                    if (!defined($next)) {\n                        $self->warn(\"Unbalanced quote in:\\n\".join(\"\\n\", @qual).\n                                    \"\\nAdding quote to close...\".\n                                    \"Check sequence quality!\");\n                        $value .= '\"';\n                        last QUOTES;\n                    }\n\n                    # Join to value with space if value or next line contains a space\n                    $value .= (grep /\\s/, ($value, $next)) ? \" $next\" : $next;\n                }\n                # Trim leading and trailing quotes\n                $value =~ s/^\"|\"$//g;\n                # Undouble internal quotes\n                $value =~ s/\"\"/\"/g; #\"\n            }\n        } else {\n            $value = '_no_value';\n        }\n\n        # Store the qualifier\n        $out->field->{$qualifier} ||= [];\n        push(@{$out->field->{$qualifier}},$value);\n    }\n\n    return $out;\n}\n\n=head2 _write_line_EMBL\n\n Title   : _write_line_EMBL\n Usage   :\n Function: internal function\n Example :\n Returns : 1 if writing suceeded, else undef\n Args    :","parameters":[{"label":"$self"},{"label":"$pre1"},{"label":"$pre2"},{"label":"$line"},{"label":"$length"}]},"containerName":"main::","definition":"sub","line":1345,"children":[{"localvar":"my","name":"$self","definition":"my","containerName":"_write_line_EMBL","line":1346,"kind":13},{"kind":13,"line":1346,"containerName":"_write_line_EMBL","name":"$pre1"},{"containerName":"_write_line_EMBL","name":"$pre2","line":1346,"kind":13},{"containerName":"_write_line_EMBL","name":"$line","line":1346,"kind":13},{"kind":13,"line":1346,"containerName":"_write_line_EMBL","name":"$length"},{"name":"$length","containerName":"_write_line_EMBL","line":1348,"kind":13},{"line":1348,"kind":13,"containerName":"_write_line_EMBL","name":"$self"},{"containerName":"_write_line_EMBL","name":"throw","line":1348,"kind":12},{"line":1349,"kind":13,"localvar":"my","definition":"my","name":"$subl","containerName":"_write_line_EMBL"},{"name":"$length","containerName":"_write_line_EMBL","line":1349,"kind":13},{"name":"$pre2","containerName":"_write_line_EMBL","line":1349,"kind":13},{"line":1350,"kind":13,"localvar":"my","containerName":"_write_line_EMBL","definition":"my","name":"$linel"},{"name":"$line","containerName":"_write_line_EMBL","line":1350,"kind":13},{"name":"$i","definition":"my","containerName":"_write_line_EMBL","localvar":"my","kind":13,"line":1351},{"line":1353,"kind":13,"localvar":"my","containerName":"_write_line_EMBL","definition":"my","name":"$sub"},{"name":"$line","containerName":"_write_line_EMBL","kind":13,"line":1353},{"line":1353,"kind":13,"name":"$length","containerName":"_write_line_EMBL"},{"name":"$pre1","containerName":"_write_line_EMBL","kind":13,"line":1353},{"name":"$self","containerName":"_write_line_EMBL","kind":13,"line":1355},{"kind":12,"line":1355,"containerName":"_write_line_EMBL","name":"_print"},{"kind":13,"line":1357,"name":"$i","containerName":"_write_line_EMBL"},{"name":"$length","containerName":"_write_line_EMBL","line":1357,"kind":13},{"kind":13,"line":1357,"containerName":"_write_line_EMBL","name":"$pre1"},{"containerName":"_write_line_EMBL","name":"$i","line":1357,"kind":13},{"line":1357,"kind":13,"name":"$linel","containerName":"_write_line_EMBL"},{"line":1358,"kind":13,"name":"$sub","containerName":"_write_line_EMBL"},{"name":"$line","containerName":"_write_line_EMBL","kind":13,"line":1358},{"name":"$i","containerName":"_write_line_EMBL","line":1358,"kind":13},{"name":"$subl","containerName":"_write_line_EMBL","line":1358,"kind":13},{"containerName":"_write_line_EMBL","name":"$self","line":1359,"kind":13},{"line":1359,"kind":12,"containerName":"_write_line_EMBL","name":"_print"},{"line":1360,"kind":13,"name":"$i","containerName":"_write_line_EMBL"},{"kind":13,"line":1360,"name":"$subl","containerName":"_write_line_EMBL"}],"kind":12},{"range":{"start":{"character":0,"line":1381},"end":{"character":9999,"line":1419}},"name":"_write_line_EMBL_regex","line":1381,"children":[{"name":"$self","definition":"my","containerName":"_write_line_EMBL_regex","localvar":"my","kind":13,"line":1382},{"name":"$pre1","containerName":"_write_line_EMBL_regex","kind":13,"line":1382},{"line":1382,"kind":13,"name":"$pre2","containerName":"_write_line_EMBL_regex"},{"kind":13,"line":1382,"name":"$line","containerName":"_write_line_EMBL_regex"},{"line":1382,"kind":13,"name":"$regex","containerName":"_write_line_EMBL_regex"},{"kind":13,"line":1382,"name":"$length","containerName":"_write_line_EMBL_regex"},{"name":"$length","containerName":"_write_line_EMBL_regex","line":1386,"kind":13},{"containerName":"_write_line_EMBL_regex","name":"$self","line":1386,"kind":13},{"name":"throw","containerName":"_write_line_EMBL_regex","kind":12,"line":1386},{"kind":13,"line":1388,"containerName":"_write_line_EMBL_regex","definition":"my","name":"$subl","localvar":"my"},{"line":1388,"kind":13,"containerName":"_write_line_EMBL_regex","name":"$length"},{"containerName":"_write_line_EMBL_regex","name":"$pre1","kind":13,"line":1388},{"line":1389,"kind":13,"localvar":"my","containerName":"_write_line_EMBL_regex","definition":"my","name":"@lines"},{"line":1391,"kind":13,"name":"$line","containerName":"_write_line_EMBL_regex"},{"localvar":"my","containerName":"_write_line_EMBL_regex","definition":"my","name":"$pat","line":1392,"kind":13},{"containerName":"_write_line_EMBL_regex","name":"$regex","line":1392,"kind":13},{"kind":13,"line":1392,"containerName":"_write_line_EMBL_regex","name":"$regex"},{"name":"$regex","containerName":"_write_line_EMBL_regex","kind":13,"line":1392},{"line":1393,"kind":13,"containerName":"_write_line_EMBL_regex","name":"$line"},{"kind":13,"line":1394,"containerName":"_write_line_EMBL_regex","name":"$l","definition":"my","localvar":"my"},{"kind":13,"line":1395,"containerName":"_write_line_EMBL_regex","name":"$l"},{"line":1396,"kind":13,"containerName":"_write_line_EMBL_regex","name":"$pre1"},{"containerName":"_write_line_EMBL_regex","name":"$newl","definition":"my","localvar":"my","kind":13,"line":1397},{"kind":13,"line":1398,"containerName":"_write_line_EMBL_regex","name":"$line"},{"containerName":"_write_line_EMBL_regex","name":"$line","kind":13,"line":1398},{"kind":13,"line":1398,"name":"$l","containerName":"_write_line_EMBL_regex"},{"kind":13,"line":1401,"name":"$l","containerName":"_write_line_EMBL_regex"},{"containerName":"_write_line_EMBL_regex","name":"@lines","kind":13,"line":1402},{"name":"$l","containerName":"_write_line_EMBL_regex","kind":13,"line":1402},{"kind":13,"line":1407,"name":"$self","containerName":"_write_line_EMBL_regex"},{"name":"warn","containerName":"_write_line_EMBL_regex","kind":12,"line":1407},{"kind":13,"line":1410,"name":"$line","containerName":"_write_line_EMBL_regex"},{"containerName":"_write_line_EMBL_regex","name":"$line","line":1410,"kind":13},{"name":"$subl","containerName":"_write_line_EMBL_regex","kind":13,"line":1410},{"line":1410,"kind":13,"containerName":"_write_line_EMBL_regex","name":"$line"},{"kind":13,"line":1410,"name":"$subl","containerName":"_write_line_EMBL_regex"},{"kind":13,"line":1412,"definition":"my","name":"$s","containerName":"_write_line_EMBL_regex","localvar":"my"},{"line":1412,"kind":13,"containerName":"_write_line_EMBL_regex","name":"@lines"},{"line":1413,"kind":13,"name":"$self","containerName":"_write_line_EMBL_regex"},{"line":1413,"kind":12,"name":"_print","containerName":"_write_line_EMBL_regex"},{"name":"$s","containerName":"_write_line_EMBL_regex","line":1413,"kind":13},{"localvar":"my","containerName":"_write_line_EMBL_regex","name":"$s","definition":"my","line":1414,"kind":13},{"name":"@lines","containerName":"_write_line_EMBL_regex","line":1414,"kind":13},{"containerName":"_write_line_EMBL_regex","name":"$self","kind":13,"line":1415},{"containerName":"_write_line_EMBL_regex","name":"_print","line":1415,"kind":12}],"kind":12,"detail":"($self,$pre1,$pre2,$line,$regex,$length)","signature":{"label":"_write_line_EMBL_regex($self,$pre1,$pre2,$line,$regex,$length)","documentation":"1;\n# $Id: embl.pm 16123 2009-09-17 12:57:27Z cjfields $\n#\n# BioPerl module for Bio::SeqIO::EMBL\n#\n# Please direct questions and support issues to <bioperl-l@bioperl.org> \n#\n# Cared for by Ewan Birney <birney@ebi.ac.uk>\n#\n# Copyright Ewan Birney\n#\n# You may distribute this module under the same terms as perl itself\n\n# POD documentation - main docs before the code\n\n=head1 NAME\n\nBio::SeqIO::embl - EMBL sequence input/output stream\n\n=head1 SYNOPSIS\n\nIt is probably best not to use this object directly, but\nrather go through the SeqIO handler system. Go:\n\n    $stream = Bio::SeqIO->new(-file => $filename, -format => 'EMBL');\n\n    while ( (my $seq = $stream->next_seq()) ) {\n        # do something with $seq\n    }\n\n=head1 DESCRIPTION\n\nThis object can transform Bio::Seq objects to and from EMBL flat\nfile databases.\n\nThere is a lot of flexibility here about how to dump things which\nshould be documented more fully.\n\nThere should be a common object that this and Genbank share (probably\nwith Swissprot). Too much of the magic is identical.\n\n=head2 Optional functions\n\n=over 3\n\n* _show_dna()\n\n(output only) shows the dna or not\n\n* _post_sort()\n\n(output only) provides a sorting func which is applied to the FTHelpers\nbefore printing\n\n* _id_generation_func()\n\nThis is function which is called as\n\n   print \"ID   \", $func($annseq), \"\\n\";\n\nTo generate the ID line. If it is not there, it generates a sensible ID\nline using a number of tools.\n\nIf you want to output annotations in EMBL format they need to be\nstored in a Bio::Annotation::Collection object which is accessible\nthrough the Bio::SeqI interface method L<annotation()|annotation>.\n\nThe following are the names of the keys which are polled from a\nL<Bio::Annotation::Collection> object.\n\n reference  - Should contain Bio::Annotation::Reference objects\n comment    - Should contain Bio::Annotation::Comment objects\n dblink     - Should contain Bio::Annotation::DBLink objects\n\n\n=head1 FEEDBACK\n\n=head2 Mailing Lists\n\nUser feedback is an integral part of the evolution of this and other\nBioperl modules. Send your comments and suggestions preferably to one\nof the Bioperl mailing lists.  Your participation is much appreciated.\n\n  bioperl-l@bioperl.org                  - General discussion\n  http://bioperl.org/wiki/Mailing_lists  - About the mailing lists\n\n=head2 Support \n\nPlease direct usage questions or support issues to the mailing list:\n\nI<bioperl-l@bioperl.org>\n\nrather than to the module maintainer directly. Many experienced and \nreponsive experts will be able look at the problem and quickly \naddress it. Please include a thorough description of the problem \nwith code and data examples if at all possible.\n\n=head2 Reporting Bugs\n\nReport bugs to the Bioperl bug tracking system to help us keep track\nthe bugs and their resolution. Bug reports can be submitted via\nthe web:\n\n  http://bugzilla.open-bio.org/\n\n=head1 AUTHOR - Ewan Birney\n\nEmail birney@ebi.ac.uk\n\n=head1 APPENDIX\n\nThe rest of the documentation details each of the object\nmethods. Internal methods are usually preceded with a _\n\n\n\n# Let the code begin...\n\n\npackage Bio::SeqIO::embl;\nuse vars qw(%FTQUAL_NO_QUOTE);\nuse strict;\nuse Bio::SeqIO::FTHelper;\nuse Bio::SeqFeature::Generic;\nuse Bio::Species;\nuse Bio::Seq::SeqFactory;\nuse Bio::Annotation::Collection;\nuse Bio::Annotation::Comment;\nuse Bio::Annotation::Reference;\nuse Bio::Annotation::DBLink;\n\nuse base qw(Bio::SeqIO);\n\n%FTQUAL_NO_QUOTE=(\n                  'anticodon'=>1,\n                  'citation'=>1,\n                  'codon'=>1,\n                  'codon_start'=>1,\n                  'cons_splice'=>1,\n                  'direction'=>1,\n                  'evidence'=>1,\n                  'label'=>1,\n                  'mod_base'=> 1,\n                  'number'=> 1,\n                  'rpt_type'=> 1,\n                  'rpt_unit'=> 1,\n                  'transl_except'=> 1,\n                  'transl_table'=> 1,\n                  'usedin'=> 1,\n                 );\n\nsub _initialize {\n    my($self,@args) = @_;\n\n    $self->SUPER::_initialize(@args);\n    # hash for functions for decoding keys.\n    $self->{'_func_ftunit_hash'} = {};\n    # sets this to one by default. People can change it\n    $self->_show_dna(1);\n    if ( ! defined $self->sequence_factory ) {\n        $self->sequence_factory(Bio::Seq::SeqFactory->new\n                                (-verbose => $self->verbose(),\n                                 -type => 'Bio::Seq::RichSeq'));\n    }\n}\n\n=head2 next_seq\n\n Title   : next_seq\n Usage   : $seq = $stream->next_seq()\n Function: returns the next sequence in the stream\n Returns : Bio::Seq object\n Args    :\n\n\nsub next_seq {\n    my ($self,@args) = @_;\n    my ($pseq,$c,$line,$name,$desc,$acc,$seqc,$mol,$div,\n        $date, $comment, @date_arr);\n\n    my ($annotation, %params, @features) =\n        Bio::Annotation::Collection->new();\n\n    $line = $self->_readline;\n    # This needs to be before the first eof() test\n\n    if ( !defined $line ) {\n        return;                 # no throws - end of file\n    }\n\n    if ( $line =~ /^\\s+$/ ) {\n        while ( defined ($line = $self->_readline) ) {\n            $line =~/^\\S/ && last;\n        }\n        # return without error if the whole next sequence was just a single\n        # blank line and then eof\n        return unless $line;\n    }\n\n    # no ID as 1st non-blank line, need short circuit and exit routine\n    $self->throw(\"EMBL stream with no ID. Not embl in my book\")\n        unless $line =~ /^ID\\s+\\S+/;\n\n    # At this point we are sure that $line contains an ID header line\n    my $alphabet;\n    if ( $line =~ tr/;/;/ == 6) { # New style headers contain exactly six semicolons.\n\n        # New style header (EMBL Release >= 87, after June 2006)\n        my $topology;\n        my $sv;\n\n        # ID   DQ299383; SV 1; linear; mRNA; STD; MAM; 431 BP.\n        # This regexp comes from the new2old.pl conversion script, from EBI\n        if ($line =~ m/^ID   (\\w+);\\s+SV (\\d+); (\\w+); ([^;]+); (\\w{3}); (\\w{3}); (\\d+) BP./) {\n        ($name, $sv, $topology, $mol, $div) = ($1, $2, $3, $4, $6);\n        }\n        if (defined($sv)) {\n        $params{'-seq_version'} = $sv;\n        $params{'-version'} = $sv;\n        }\n\n        if ($topology eq \"circular\") {\n        $params{'-is_circular'} = 1;\n        }\n    \n    if (defined $mol ) {\n        if ($mol =~ /DNA/) {\n        $alphabet='dna';\n        } elsif ($mol =~ /RNA/) {\n        $alphabet='rna';\n        } elsif ($mol =~ /AA/) {\n        $alphabet='protein';\n        }\n    }\n    } else {\n    \n        # Old style header (EMBL Release < 87, before June 2006)\n        if ($line =~ /^ID\\s+(\\S+)[^;]*;\\s+(\\S+)[^;]*;\\s+(\\S+)[^;]*;/) {\n        ($name, $mol, $div) = ($1, $2, $3);\n        }\n    \n        if ($mol) {\n            if ( $mol =~ /circular/ ) {\n            $params{'-is_circular'} = 1;\n            $mol =~  s|circular ||;\n            }\n            if (defined $mol ) {\n            if ($mol =~ /DNA/) {\n                $alphabet='dna';\n            } elsif ($mol =~ /RNA/) {\n                $alphabet='rna';\n            } elsif ($mol =~ /AA/) {\n                $alphabet='protein';\n            }\n            }\n        }\n    }\n\n    unless( defined $name && length($name) ) {\n    $name = \"unknown_id\";\n    }\n\n    # $self->warn(\"not parsing upper annotation in EMBL file yet!\");\n    my $buffer = $line;\n    local $_;\n    BEFORE_FEATURE_TABLE :\n          until ( !defined $buffer ) {\n              $_ = $buffer;\n              # Exit at start of Feature table\n              if ( /^(F[HT]|SQ)/ ) {\n                  $self->_pushback($_) if( $1 eq 'SQ' || $1 eq 'FT');\n                  last;\n              }\n              # Description line(s)\n              if (/^DE\\s+(\\S.*\\S)/) {\n                  $desc .= $desc ? \" $1\" : $1;\n              }\n\n              #accession number\n              if ( /^AC\\s+(.*)?/ || /^PA\\s+(.*)?/) {\n                  my @accs = split(/[; ]+/, $1); # allow space in addition\n                  $params{'-accession_number'} = shift @accs\n                      unless defined $params{'-accession_number'};\n                  push @{$params{'-secondary_accessions'}}, @accs;\n              }\n\n              #version number\n              if ( /^SV\\s+\\S+\\.(\\d+);?/ ) {\n                  my $sv = $1;\n                  #$sv =~ s/\\;//;\n                  $params{'-seq_version'} = $sv;\n                  $params{'-version'} = $sv;\n              }\n\n              #date (NOTE: takes last date line)\n              if ( /^DT\\s+(.+)$/ ) {\n                  my $line = $1;\n                  my ($date, $version) = split(' ', $line, 2);\n                  $date =~ tr/,//d; # remove comma if new version\n                  if ($version =~ /\\(Rel\\. (\\d+), Created\\)/xms ) {\n                      my $release = Bio::Annotation::SimpleValue->new(\n                                                                      -tagname    => 'creation_release',\n                                                                      -value      => $1\n                                                                     );\n                      $annotation->add_Annotation($release);\n                  } elsif ($version =~ /\\(Rel\\. (\\d+), Last updated, Version (\\d+)\\)/xms ) {\n                      my $release = Bio::Annotation::SimpleValue->new(\n                                                                      -tagname    => 'update_release',\n                                                                      -value      => $1\n                                                                     );\n                      $annotation->add_Annotation($release);\n\n                      my $update = Bio::Annotation::SimpleValue->new(\n                                                                     -tagname    => 'update_version',\n                                                                     -value      => $2\n                                                                    );\n                      $annotation->add_Annotation($update);\n                  }\n                  push @{$params{'-dates'}}, $date;\n              }\n\n              #keywords\n              if ( /^KW   (.*)\\S*$/ ) {\n                  my @kw = split(/\\s*\\;\\s*/,$1);\n                  push @{$params{'-keywords'}}, @kw;\n              }\n\n              # Organism name and phylogenetic information\n              elsif (/^O[SC]/) {\n                  # pass the accession number so we can give an informative throw message if necessary\n                  my $species = $self->_read_EMBL_Species(\\$buffer, $params{'-accession_number'});\n                  $params{'-species'}= $species;\n              }\n\n              # NCBI TaxID Xref\n              elsif (/^OX/) {\n                  my @links = $self->_read_EMBL_TaxID_DBLink(\\$buffer);\n                  foreach my $dblink ( @links ) {\n                      $annotation->add_Annotation('dblink',$dblink);\n                  }\n              }\n\n              # References\n              elsif (/^R/) {\n                  my @refs = $self->_read_EMBL_References(\\$buffer);\n                  foreach my $ref ( @refs ) {\n                      $annotation->add_Annotation('reference',$ref);\n                  }\n              }\n\n              # DB Xrefs\n              elsif (/^DR/) {\n                  my @links = $self->_read_EMBL_DBLink(\\$buffer);\n                  foreach my $dblink ( @links ) {\n                      $annotation->add_Annotation('dblink',$dblink);\n                  }\n              }\n\n              # Comments\n              elsif (/^CC\\s+(.*)/) {\n                  $comment .= $1;\n                  $comment .= \" \";\n                  while (defined ($_ = $self->_readline) ) {\n                      if (/^CC\\s+(.*)/) {\n                          $comment .= $1;\n                          $comment .= \" \";\n                      } else {\n                          last;\n                      }\n                  }\n                  my $commobj = Bio::Annotation::Comment->new();\n                  $commobj->text($comment);\n                  $annotation->add_Annotation('comment',$commobj);\n                  $comment = \"\";\n              }\n\n              # Get next line.\n              $buffer = $self->_readline;\n          }\n\n    while ( defined ($_ = $self->_readline) ) {\n        /^FT\\s{3}\\w/ && last;\n        /^SQ / && last;\n        /^CO / && last;\n    }\n    $buffer = $_;\n\n    if (defined($buffer) && $buffer =~ /^FT /) {\n        until ( !defined ($buffer) ) {\n            my $ftunit = $self->_read_FTHelper_EMBL(\\$buffer);\n\n            # process ftunit\n            my $feat =\n                $ftunit->_generic_seqfeature($self->location_factory(), $name);\n\n            # add taxon_id from source if available\n            if ($params{'-species'} && ($feat->primary_tag eq 'source')\n                && $feat->has_tag('db_xref')\n                && (! $params{'-species'}->ncbi_taxid())) {\n                foreach my $tagval ($feat->get_tag_values('db_xref')) {\n                    if (index($tagval,\"taxon:\") == 0) {\n                        $params{'-species'}->ncbi_taxid(substr($tagval,6));\n                        last;\n                    }\n                }\n            }\n\n            # add feature to list of features\n            push(@features, $feat);\n\n            if ( $buffer !~ /^FT/ ) {\n                last;\n            }\n        }\n    }\n    # skip comments\n    while ( defined ($buffer) && $buffer =~ /^XX/ ) {\n        $buffer = $self->_readline();\n    }\n\n    if ( $buffer =~ /^CO/  ) {\n        until ( !defined ($buffer) ) {\n            my $ftunit = $self->_read_FTHelper_EMBL(\\$buffer);\n            # process ftunit\n            push(@features,\n                 $ftunit->_generic_seqfeature($self->location_factory(),\n                                              $name));\n\n            if ( $buffer !~ /^CO/ ) {\n                last;\n            }\n        }\n    }\n    if ( $buffer !~ /^SQ/  ) {\n        while ( defined ($_ = $self->_readline) ) {\n            /^SQ/ && last;\n        }\n    }\n    $seqc = \"\";\n    while ( defined ($_ = $self->_readline) ) {\n        m{^//} && last;\n        $_ = uc($_);\n        s/[^A-Za-z]//g;\n        $seqc .= $_;\n    }\n    my $seq = $self->sequence_factory->create\n        (-verbose => $self->verbose(),\n         -division => $div,\n         -seq => $seqc,\n         -desc => $desc,\n         -display_id => $name,\n         -annotation => $annotation,\n         -molecule => $mol,\n         -alphabet => $alphabet,\n         -features => \\@features,\n         %params);\n    return $seq;\n}\n\n\n\n=head2 _write_ID_line\n\n Title   : _write_ID_line\n Usage   : $self->_write_ID_line($seq);\n Function: Writes the EMBL Release 87 format ID line to the stream, unless\n         : there is a user-supplied ID line generation function in which\n         : case that is used instead.\n         : ( See Bio::SeqIO::embl::_id_generation_function(). )\n Returns : nothing\n Args    : Bio::Seq object\n\n\nsub _write_ID_line {\n\n    my ($self, $seq) = @_;\n\n    my $id_line;\n    # If there is a user-supplied ID generation function, use it.\n    if ( $self->_id_generation_func ) {\n        $id_line = \"ID   \" . &{$self->_id_generation_func}($seq) . \"\\nXX\\n\";\n    }\n    # Otherwise, generate a standard EMBL release 87 (June 2006) ID line.\n    else {\n\n        # The sequence name is supposed to be the primary accession number,\n        my $name = $seq->accession_number();\n        if (!$name) {\n            # but if it is not present, use the sequence ID.\n            $name = $seq->id();\n        }\n\n        $self->warn(\"No whitespace allowed in EMBL id [\". $name. \"]\") if $name =~ /\\s/;\n\n        # Use the sequence version, or default to 1.\n        my $version = $seq->version() || 1;\n\n        my $len = $seq->length();\n\n        # Taxonomic division.\n        my $div;\n        if ( $seq->can('division') && defined($seq->division) &&\n             $self->_is_valid_division($seq->division) ) {\n            $div = $seq->division();\n        } else {\n            $div ||= 'UNC';     # 'UNC' is the EMBL division code for 'unclassified'.\n        }\n\n        my $mol;\n        # If the molecule type is a valid EMBL type, use it.\n        if (  $seq->can('molecule')\n              && defined($seq->molecule)\n              && $self->_is_valid_molecule_type($seq->molecule)\n           ) {\n            $mol = $seq->molecule();\n        }\n        # Otherwise, choose unassigned DNA or RNA based on the alphabet.\n        elsif ($seq->can('primary_seq') && defined $seq->primary_seq->alphabet) {\n            my $alphabet =$seq->primary_seq->alphabet;\n            if ($alphabet eq 'dna') {\n                $mol ='unassigned DNA';\n            } elsif ($alphabet eq 'rna') {\n                $mol='unassigned RNA';\n            } elsif ($alphabet eq 'protein') {\n                $self->warn(\"Protein sequence found; EMBL is a nucleotide format.\");\n                $mol='AA';  # AA is not a valid EMBL molecule type.\n            }\n        }\n\n        my $topology = 'linear';\n        if ($seq->is_circular) {\n            $topology = 'circular';\n        }\n\n        $mol ||= '';            # 'unassigned'; ?\n        $id_line = \"ID   $name; SV $version; $topology; $mol; STD; $div; $len BP.\\nXX\\n\";\n        $self->_print($id_line);\n    }\n}\n\n=head2 _is_valid_division\n\n Title   : _is_valid_division\n Usage   : $self->_is_valid_division($div)\n Function: tests division code for validity\n Returns : true if $div is a valid EMBL release 87 taxonomic division.\n Args    : taxonomic division code string\n\n\nsub _is_valid_division {\n    my ($self, $division) = @_;\n\n    my %EMBL_divisions = (\n                          \"PHG\"    => 1, # Bacteriophage\n                          \"ENV\"    => 1, # Environmental Sample\n                          \"FUN\"    => 1, # Fungal\n                          \"HUM\"    => 1, # Human\n                          \"INV\"    => 1, # Invertebrate\n                          \"MAM\"    => 1, # Other Mammal\n                          \"VRT\"    => 1, # Other Vertebrate\n                          \"MUS\"    => 1, # Mus musculus\n                          \"PLN\"    => 1, # Plant\n                          \"PRO\"    => 1, # Prokaryote\n                          \"ROD\"    => 1, # Other Rodent\n                          \"SYN\"    => 1, # Synthetic\n                          \"UNC\"    => 1, # Unclassified\n                          \"VRL\"    => 1 # Viral\n                         );\n\n    return exists($EMBL_divisions{$division});\n}\n\n=head2 _is_valid_molecule_type\n\n Title   : _is_valid_molecule_type\n Usage   : $self->_is_valid_molecule_type($mol)\n Function: tests molecule type for validity\n Returns : true if $mol is a valid EMBL release 87 molecule type.\n Args    : molecule type string\n\n\nsub _is_valid_molecule_type {\n    my ($self, $moltype) = @_;\n\n    my %EMBL_molecule_types = (\n                               \"genomic DNA\"    => 1,\n                               \"genomic RNA\"    => 1,\n                               \"mRNA\"           => 1,\n                               \"tRNA\"           => 1,\n                               \"rRNA\"           => 1,\n                               \"snoRNA\"         => 1,\n                               \"snRNA\"          => 1,\n                               \"scRNA\"          => 1,\n                               \"pre-RNA\"        => 1,\n                               \"other RNA\"      => 1,\n                               \"other DNA\"      => 1,\n                               \"unassigned DNA\" => 1,\n                               \"unassigned RNA\" => 1\n                              );\n\n    return exists($EMBL_molecule_types{$moltype});\n}\n\n=head2 write_seq\n\n Title   : write_seq\n Usage   : $stream->write_seq($seq)\n Function: writes the $seq object (must be seq) to the stream\n Returns : 1 for success and undef for error\n Args    : array of 1 to n Bio::SeqI objects\n\n\n\nsub write_seq {\n    my ($self,@seqs) = @_;\n\n    foreach my $seq ( @seqs ) {\n        $self->throw(\"Attempting to write with no seq!\") unless defined $seq;\n        unless ( ref $seq && $seq->isa('Bio::SeqI' ) ) {\n            $self->warn(\"$seq is not a SeqI compliant sequence object!\")\n                if $self->verbose >= 0;\n            unless ( ref $seq && $seq->isa('Bio::PrimarySeqI' ) ) {\n                $self->throw(\"$seq is not a PrimarySeqI compliant sequence object!\");\n            }\n        }\n        my $str = $seq->seq || '';\n\n        # Write the ID line.\n        $self->_write_ID_line($seq);\n\n\n        # Write the accession line if present\n        my( $acc );\n        {\n            if ( my $func = $self->_ac_generation_func ) {\n                $acc = &{$func}($seq);\n            } elsif ( $seq->isa('Bio::Seq::RichSeqI') &&\n                      defined($seq->accession_number) ) {\n                $acc = $seq->accession_number;\n                $acc = join(\"; \", $acc, $seq->get_secondary_accessions);\n            } elsif ( $seq->can('accession_number') ) {\n                $acc = $seq->accession_number;\n            }\n\n            if (defined $acc) {\n                $self->_print(\"AC   $acc;\\n\",\n                              \"XX\\n\") || return;\n            }\n        }\n\n        # Date lines\n        my $switch=0;\n        if ( $seq->can('get_dates') ) {\n            my @dates =  $seq->get_dates();\n            my $ct = 1;\n            my $date_flag = 0;\n            my ($cr) = $seq->annotation->get_Annotations(\"creation_release\");\n            my ($ur) = $seq->annotation->get_Annotations(\"update_release\");\n            my ($uv) = $seq->annotation->get_Annotations(\"update_version\");\n\n            unless ($cr && $ur && $ur) {\n                $date_flag = 1;\n            }\n\n            foreach my $dt (@dates) {\n                if (!$date_flag) {\n                    $self->_write_line_EMBL_regex(\"DT   \",\"DT   \",\n                                                  $dt.\" (Rel. $cr, Created)\",\n                                                  '\\s+|$',80) if $ct == 1;\n                    $self->_write_line_EMBL_regex(\"DT   \",\"DT   \",\n                                                  $dt.\" (Rel. $ur, Last updated, Version $uv)\",\n                                                  '\\s+|$',80) if $ct == 2;\n                } else {        # other formats?\n                    $self->_write_line_EMBL_regex(\"DT   \",\"DT   \",\n                                                  $dt,'\\s+|$',80);\n                }\n                $switch =1;\n                $ct++;\n            }\n            if ($switch == 1) {\n                $self->_print(\"XX\\n\") || return;\n            }\n        }\n\n        # Description lines\n        $self->_write_line_EMBL_regex(\"DE   \",\"DE   \",$seq->desc(),'\\s+|$',80) || return; #'\n        $self->_print( \"XX\\n\") || return;\n\n        # if there, write the kw line\n        {\n            my( $kw );\n            if ( my $func = $self->_kw_generation_func ) {\n                $kw = &{$func}($seq);\n            } elsif ( $seq->can('keywords') ) {\n                $kw = $seq->keywords;\n            }\n            if (defined $kw) {\n                $self->_write_line_EMBL_regex(\"KW   \", \"KW   \", $kw, '\\s+|$', 80) || return; #'\n                $self->_print( \"XX\\n\") || return;\n            }\n        }\n\n        # Organism lines\n\n        if ($seq->can('species') && (my $spec = $seq->species)) {\n            my @class = $spec->classification();\n            shift @class;       # get rid of species name. Some embl files include\n                                # the species name in the OC lines, but this seems\n                                # more like an error than something we need to\n                                # emulate\n            my $OS = $spec->scientific_name;\n            if ($spec->common_name) {\n                $OS .= ' ('.$spec->common_name.')';\n            }\n            $self->_print(\"OS   $OS\\n\") || return;\n            my $OC = join('; ', reverse(@class)) .'.';\n            $self->_write_line_EMBL_regex(\"OC   \",\"OC   \",$OC,'; |$',80) || return;\n            if ($spec->organelle) {\n                $self->_write_line_EMBL_regex(\"OG   \",\"OG   \",$spec->organelle,'; |$',80) || return;\n            }\n            $self->_print(\"XX\\n\") || return;\n        }\n\n        # Reference lines\n        my $t = 1;\n        if ( $seq->can('annotation') && defined $seq->annotation ) {\n            foreach my $ref ( $seq->annotation->get_Annotations('reference') ) {\n                $self->_print( \"RN   [$t]\\n\") || return;\n\n                # Having no RP line is legal, but we need both\n                # start and end for a valid location.\n                if ($ref->comment) {\n                    $self->_write_line_EMBL_regex(\"RC   \", \"RC   \", $ref->comment, '\\s+|$', 80) || return; #'\n                }\n                my $start = $ref->start;\n                my $end   = $ref->end;\n                if ($start and $end) {\n                    $self->_print( \"RP   $start-$end\\n\") || return;\n                } elsif ($start or $end) {\n                    $self->throw(\"Both start and end are needed for a valid RP line.\".\n                                 \"  Got: start='$start' end='$end'\");\n                }\n\n                if (my $med = $ref->medline) {\n                    $self->_print( \"RX   MEDLINE; $med.\\n\") || return;\n                }\n                if (my $pm = $ref->pubmed) {\n                    $self->_print( \"RX   PUBMED; $pm.\\n\") || return;\n                }\n                my $authors = $ref->authors;\n                $authors =~ s/([\\w\\.]) (\\w)/$1#$2/g;  # add word wrap protection char '#'\n\n                $self->_write_line_EMBL_regex(\"RA   \", \"RA   \",\n                                              $authors . \";\",\n                                              '\\s+|$', 80) || return; #'\n\n                # If there is no title to the reference, it appears\n                # as a single semi-colon.  All titles must end in\n                # a semi-colon.\n                my $ref_title = $ref->title || '';\n                $ref_title =~ s/[\\s;]*$/;/;\n                $self->_write_line_EMBL_regex(\"RT   \", \"RT   \", $ref_title,    '\\s+|$', 80) || return; #'\n                $self->_write_line_EMBL_regex(\"RL   \", \"RL   \", $ref->location, '\\s+|$', 80) || return; #'\n                $self->_print(\"XX\\n\") || return;\n                $t++;\n            }\n\n            # DB Xref lines\n            if (my @db_xref = $seq->annotation->get_Annotations('dblink') ) {\n                for my $dr (@db_xref) {\n                    my $db_name = $dr->database;\n                    my $prim    = $dr->primary_id;\n\n                    my $opt     = $dr->optional_id || '';\n                    my $line = $opt ? \"$db_name; $prim; $opt.\" : \"$db_name; $prim.\";\n                    $self->_write_line_EMBL_regex(\"DR   \", \"DR   \", $line, '\\s+|$', 80) || return; #'\n                }\n                $self->_print(\"XX\\n\") || return;\n            }\n            \n            # Comment lines\n            foreach my $comment ( $seq->annotation->get_Annotations('comment') ) {\n                $self->_write_line_EMBL_regex(\"CC   \", \"CC   \", $comment->text, '\\s+|$', 80) || return; #'\n                $self->_print(\"XX\\n\") || return;\n            }\n        }\n        # \"\\\\s\\+\\|\\$\"\n\n        ## FEATURE TABLE\n\n        $self->_print(\"FH   Key             Location/Qualifiers\\n\") || return;\n        $self->_print(\"FH\\n\") || return;\n\n        my @feats = $seq->can('top_SeqFeatures') ? $seq->top_SeqFeatures : ();\n        if ($feats[0]) {\n            if ( defined $self->_post_sort ) {\n                # we need to read things into an array.\n                # Process. Sort them. Print 'em\n\n                my $post_sort_func = $self->_post_sort();\n                my @fth;\n\n                foreach my $sf ( @feats ) {\n                    push(@fth,Bio::SeqIO::FTHelper::from_SeqFeature($sf,$seq));\n                }\n\n                @fth = sort { &$post_sort_func($a,$b) } @fth;\n\n                foreach my $fth ( @fth ) {\n                    $self->_print_EMBL_FTHelper($fth) || return;\n                }\n            } else {\n                # not post sorted. And so we can print as we get them.\n                # lower memory load...\n\n                foreach my $sf ( @feats ) {\n                    my @fth = Bio::SeqIO::FTHelper::from_SeqFeature($sf,$seq);\n                    foreach my $fth ( @fth ) {\n                        if ( $fth->key eq 'CONTIG') {\n                            $self->_show_dna(0);\n                        }\n                        $self->_print_EMBL_FTHelper($fth) || return;\n                    }\n                }\n            }\n        }\n\n        if ( $self->_show_dna() == 0 ) {\n            $self->_print( \"//\\n\") || return;\n            return;\n        }\n        $self->_print( \"XX\\n\") || return;\n\n        # finished printing features.\n\n        $str =~ tr/A-Z/a-z/;\n\n        # Count each nucleotide\n        my $alen = $str =~ tr/a/a/;\n        my $clen = $str =~ tr/c/c/;\n        my $glen = $str =~ tr/g/g/;\n        my $tlen = $str =~ tr/t/t/;\n\n        my $len = $seq->length();\n        my $olen = $seq->length() - ($alen + $tlen + $clen + $glen);\n        if ( $olen < 0 ) {\n            $self->warn(\"Weird. More atgc than bases. Problem!\");\n        }\n\n        $self->_print(\"SQ   Sequence $len BP; $alen A; $clen C; $glen G; $tlen T; $olen other;\\n\") || return;\n\n        my $nuc = 60;       # Number of nucleotides per line\n        my $whole_pat = 'a10' x 6; # Pattern for unpacking a whole line\n        my $out_pat   = 'A11' x 6; # Pattern for packing a line\n        my $length = length($str);\n\n        # Calculate the number of nucleotides which fit on whole lines\n        my $whole = int($length / $nuc) * $nuc;\n\n        # Print the whole lines\n        my( $i );\n        for ($i = 0; $i < $whole; $i += $nuc) {\n            my $blocks = pack $out_pat,\n                unpack $whole_pat,\n                    substr($str, $i, $nuc);\n            $self->_print(sprintf(\"     $blocks%9d\\n\", $i + $nuc)) || return;\n        }\n\n        # Print the last line\n        if (my $last = substr($str, $i)) {\n            my $last_len = length($last);\n            my $last_pat = 'a10' x int($last_len / 10) .'a'. $last_len % 10;\n            my $blocks = pack $out_pat,\n                unpack($last_pat, $last);\n            $self->_print(sprintf(\"     $blocks%9d\\n\", $length)) ||\n                return;         # Add the length to the end\n        }\n\n        $self->_print( \"//\\n\") || return;\n\n        $self->flush if $self->_flush_on_write && defined $self->_fh;\n    }\n    return 1;\n}\n\n=head2 _print_EMBL_FTHelper\n\n Title   : _print_EMBL_FTHelper\n Usage   :\n Function: Internal function\n Returns : 1 if writing suceeded, otherwise undef\n Args    :\n\n\n\nsub _print_EMBL_FTHelper {\n    my ($self,$fth) = @_;\n\n    if ( ! ref $fth || ! $fth->isa('Bio::SeqIO::FTHelper') ) {\n        $fth->warn(\"$fth is not a FTHelper class. Attempting to print, but there could be tears!\");\n    }\n\n\n    #$self->_print( \"FH   Key             Location/Qualifiers\\n\");\n    #$self->_print( sprintf(\"FT   %-15s  %s\\n\",$fth->key,$fth->loc));\n    # let\n    if ( $fth->key eq 'CONTIG' ) {\n        $self->_print(\"XX\\n\") || return;\n        $self->_write_line_EMBL_regex(\"CO   \",\n                                      \"CO   \",$fth->loc,\n                                      '\\,|$',80) || return; #'\n        return 1;\n    }\n    $self->_write_line_EMBL_regex(sprintf(\"FT   %-15s \",$fth->key),\n                                  \"FT                   \",$fth->loc,\n                                  '\\,|$',80) || return; #'\n    foreach my $tag ( keys %{$fth->field} ) {\n        if ( ! defined $fth->field->{$tag} ) {\n            next;\n        }\n        foreach my $value ( @{$fth->field->{$tag}} ) {\n            $value =~ s/\\\"/\\\"\\\"/g;\n            if ($value eq \"_no_value\") {\n                $self->_write_line_EMBL_regex(\"FT                   \",\n                                              \"FT                   \",\n                                              \"/$tag\",'.|$',80) || return; #'\n            }\n            # there are almost 3x more quoted qualifier values and they\n            # are more common too so we take quoted ones first\n            elsif (!$FTQUAL_NO_QUOTE{$tag}) {\n                my $pat = $value =~ /\\s/ ? '\\s|\\-|$' : '.|\\-|$';\n                $self->_write_line_EMBL_regex(\"FT                   \",\n                                              \"FT                   \",\n                                              \"/$tag=\\\"$value\\\"\",$pat,80) || return;\n            } else {\n                $self->_write_line_EMBL_regex(\"FT                   \",\n                                              \"FT                   \",\n                                              \"/$tag=$value\",'.|$',80) || return; #'\n                                          }\n            }\n        }\n\n        return 1;\n    }\n\n#'\n=head2 _read_EMBL_References\n\n Title   : _read_EMBL_References\n Usage   :\n Function: Reads references from EMBL format. Internal function really\n Example :\n Returns :\n Args    :\n\n\n\nsub _read_EMBL_References {\n    my ($self,$buffer) = @_;\n    my (@refs);\n\n    # assume things are starting with RN\n\n    if ( $$buffer !~ /^RN/ ) {\n        warn(\"Not parsing line '$$buffer' which maybe important\");\n    }\n    my $b1;\n    my $b2;\n    my $title;\n    my $loc;\n    my $au;\n    my $med;\n    my $pm;\n    my $com;\n\n    while ( defined ($_ = $self->_readline) ) {\n        /^R/ || last;\n        /^RP   (\\d+)-(\\d+)/ && do {$b1=$1;$b2=$2;};\n        /^RX   MEDLINE;\\s+(\\d+)/ && do {$med=$1};\n        /^RX   PUBMED;\\s+(\\d+)/ && do {$pm=$1};\n        /^RA   (.*)/ && do {\n            $au = $self->_concatenate_lines($au,$1); next;\n        };\n        /^RT   (.*)/ && do {\n            $title = $self->_concatenate_lines($title,$1); next;\n        };\n        /^RL   (.*)/ && do {\n            $loc = $self->_concatenate_lines($loc,$1); next;\n        };\n        /^RC   (.*)/ && do {\n            $com = $self->_concatenate_lines($com,$1); next;\n        };\n    }\n\n    my $ref = Bio::Annotation::Reference->new();\n    $au =~ s/;\\s*$//g;\n    $title =~ s/;\\s*$//g;\n\n    $ref->start($b1);\n    $ref->end($b2);\n    $ref->authors($au);\n    $ref->title($title);\n    $ref->location($loc);\n    $ref->medline($med);\n    $ref->comment($com);\n    $ref->pubmed($pm);\n\n    push(@refs,$ref);\n    $$buffer = $_;\n\n    return @refs;\n}\n\n=head2 _read_EMBL_Species\n\n Title   : _read_EMBL_Species\n Usage   :\n Function: Reads the EMBL Organism species and classification\n           lines.\n Example :\n Returns : A Bio::Species object\n Args    : a reference to the current line buffer, accession number\n\n\nsub _read_EMBL_Species {\n    my( $self, $buffer, $acc ) = @_;\n    my $org;\n\n    $_ = $$buffer;\n    my( $sub_species, $species, $genus, $common, $sci_name, $class_lines );\n    while (defined( $_ ||= $self->_readline )) {\n        if (/^OS\\s+(.+)/) {\n            $sci_name .= ($sci_name) ? ' '.$1 : $1;\n        } elsif (s/^OC\\s+(.+)$//) {\n            $class_lines .= $1;\n        } elsif (/^OG\\s+(.*)/) {\n            $org = $1;\n        } else {\n            last;\n        }\n\n        $_ = undef;             # Empty $_ to trigger read of next line\n    }\n\n#    $$buffer = $_;\n\t$self->_pushback($_);\n\t\n    $sci_name =~ s{\\.$}{};\n    $sci_name || return;\n\n    # Convert data in classification lines into classification array.\n    # only split on ';' or '.' so that classification that is 2 or more words\n    # will still get matched, use map() to remove trailing/leading/intervening\n    # spaces\n    my @class = map { s/^\\s+//; s/\\s+$//; s/\\s{2,}/ /g; $_; } split /(?<!subgen)[;\\.]+/, $class_lines;\n\n    # do we have a genus?\n    my $possible_genus = $class[-1];\n    $possible_genus .= \"|$class[-2]\" if $class[-2];\n    if ($sci_name =~ /^($possible_genus)/) {\n        $genus = $1;\n        ($species) = $sci_name =~ /^$genus\\s+(.+)/;\n    } else {\n        $species = $sci_name;\n    }\n\n    # Don't make a species object if it is \"Unknown\" or \"None\"\n    if ($genus) {\n        return if $genus =~ /^(Unknown|None)$/i;\n    }\n\n    # is this organism of rank species or is it lower?\n    # (doesn't catch everything, but at least the guess isn't dangerous)\n    if ($species =~ /subsp\\.|var\\./) {\n        ($species, $sub_species) = $species =~ /(.+)\\s+((?:subsp\\.|var\\.).+)/;\n    }\n\n    # sometimes things have common name in brackets, like\n    # Schizosaccharomyces pombe (fission yeast), so get rid of the common\n    # name bit. Probably dangerous if real scientific species name ends in\n    # bracketed bit.\n    unless ($class[-1] eq 'Viruses') {\n        ($species, $common) = $species =~ /^(.+)\\s+\\((.+)\\)$/;\n        $sci_name =~ s/\\s+\\(.+\\)$// if $common;\n    }\n\n    # Bio::Species array needs array in Species -> Kingdom direction\n    unless ($class[-1] eq $sci_name) {\n        push(@class, $sci_name);\n    }\n    @class = reverse @class;\n\n    # do minimal sanity checks before we hand off to Bio::Species which won't\n    # be able to give informative throw messages if it has to throw because\n    # of problems here\n    $self->throw(\"$acc seems to be missing its OS line: invalid.\") unless $sci_name;\n    my %names;\n    foreach my $i (0..$#class) {\n        my $name = $class[$i];\n        $names{$name}++;\n        if ($names{$name} > 1 && $name ne $class[$i - 1]) {\n            $self->throw(\"$acc seems to have an invalid species classification.\");\n        }\n    }\n    my $make = Bio::Species->new();\n    $make->scientific_name($sci_name);\n    $make->classification(@class);\n    unless ($class[-1] eq 'Viruses') {\n        $make->genus($genus) if $genus;\n        $make->species($species) if $species;\n        $make->sub_species($sub_species) if $sub_species;\n        $make->common_name($common) if $common;\n    }\n    $make->organelle($org) if $org;\n    return $make;\n}\n\n=head2 _read_EMBL_DBLink\n\n Title   : _read_EMBL_DBLink\n Usage   :\n Function: Reads the EMBL database cross reference (\"DR\") lines\n Example :\n Returns : A list of Bio::Annotation::DBLink objects\n Args    :\n\n\nsub _read_EMBL_DBLink {\n    my( $self,$buffer ) = @_;\n    my( @db_link );\n\n    $_ = $$buffer;\n    while (defined( $_ ||= $self->_readline )) {\n        if ( /^DR   ([^\\s;]+);\\s*([^\\s;]+);?\\s*([^\\s;]+)?\\.$/) {\n        my ($databse, $prim_id, $sec_id) = ($1,$2,$3);\n        my $link = Bio::Annotation::DBLink->new(-database    => $databse,\n                            -primary_id  => $prim_id,\n                            -optional_id => $sec_id);\n\n            push(@db_link, $link);\n    } else {\n            last;\n        }\n        $_ = undef;             # Empty $_ to trigger read of next line\n    }\n\n    $$buffer = $_;\n    return @db_link;\n}\n\n=head2 _read_EMBL_TaxID_DBLink\n\n Title   : _read_EMBL_TaxID_DBLink\n Usage   :\n Function: Reads the EMBL database cross reference to NCBI TaxID (\"OX\") lines\n Example :\n Returns : A list of Bio::Annotation::DBLink objects\n Args    :\n\n\nsub _read_EMBL_TaxID_DBLink {\n    my( $self,$buffer ) = @_;\n    my( @db_link );\n\n    $_ = $$buffer;\n    while (defined( $_ ||= $self->_readline )) {\n        if ( /^OX   (\\S+)=(\\d+);$/ ) {\n            my ($databse, $prim_id) = ($1,$2);\n            my $link = Bio::Annotation::DBLink->new(-database    => $databse,\n                                                    -primary_id  => $prim_id,);\n            push(@db_link, $link);\n        } else {\n            last;\n        }\n        $_ = undef;             # Empty $_ to trigger read of next line\n    }\n\n    $$buffer = $_;\n    return @db_link;\n}\n\n=head2 _filehandle\n\n Title   : _filehandle\n Usage   : $obj->_filehandle($newval)\n Function:\n Example :\n Returns : value of _filehandle\n Args    : newvalue (optional)\n\n\n\nsub _filehandle{\n    my ($obj,$value) = @_;\n    if ( defined $value) {\n        $obj->{'_filehandle'} = $value;\n    }\n    return $obj->{'_filehandle'};\n\n}\n\n=head2 _read_FTHelper_EMBL\n\n Title   : _read_FTHelper_EMBL\n Usage   : _read_FTHelper_EMBL($buffer)\n Function: reads the next FT key line\n Example :\n Returns : Bio::SeqIO::FTHelper object\n Args    : filehandle and reference to a scalar\n\n\n\nsub _read_FTHelper_EMBL {\n    my ($self,$buffer) = @_;\n\n    my ($key,                   # The key of the feature\n        $loc,                   # The location line from the feature\n        @qual,                  # An arrray of lines making up the qualifiers\n       );\n\n    if ($$buffer =~ /^FT\\s{3}(\\S+)\\s+(\\S+)/ ) {\n        $key = $1;\n        $loc = $2;\n        # Read all the lines up to the next feature\n        while ( defined($_ = $self->_readline) ) {\n            if (/^FT(\\s+)(.+?)\\s*$/) {\n                # Lines inside features are preceeded by 19 spaces\n                # A new feature is preceeded by 3 spaces\n                if (length($1) > 4) {\n                    # Add to qualifiers if we're in the qualifiers\n                    if (@qual) {\n                        push(@qual, $2);\n                    }\n                    # Start the qualifier list if it's the first qualifier\n                    elsif (substr($2, 0, 1) eq '/') {\n                        @qual = ($2);\n                    }\n                    # We're still in the location line, so append to location\n                    else {\n                        $loc .= $2;\n                    }\n                } else {\n                    # We've reached the start of the next feature\n                    last;\n                }\n            } else {\n                # We're at the end of the feature table\n                last;\n            }\n        }\n    } elsif ( $$buffer =~ /^CO\\s+(\\S+)/) {\n    $key = 'CONTIG';\n    $loc = $1;\n    # Read all the lines up to the next feature\n    while ( defined($_ = $self->_readline) ) {\n        if (/^CO\\s+(\\S+)\\s*$/) {\n        $loc .= $1;\n        } else {\n        # We've reached the start of the next feature\n        last;\n        }\n    }\n    } else {\n        # No feature key\n        return;\n    }\n\n    # Put the first line of the next feature into the buffer\n    $$buffer = $_;\n\n    # Make the new FTHelper object\n    my $out = Bio::SeqIO::FTHelper->new();\n    $out->verbose($self->verbose());\n    $out->key($key);\n    $out->loc($loc);\n\n    # Now parse and add any qualifiers.  (@qual is kept\n    # intact to provide informative error messages.)\n  QUAL: for (my $i = 0; $i < @qual; $i++) {\n        $_ = $qual[$i];\n        my( $qualifier, $value ) = m{^/([^=]+)(?:=(.+))?}\n            or $self->throw(\"Can't see new qualifier in: $_\\nfrom:\\n\"\n                            . join('', map \"$_\\n\", @qual));\n        if (defined $value) {\n            # Do we have a quoted value?\n            if (substr($value, 0, 1) eq '\"') {\n                # Keep adding to value until we find the trailing quote\n                # and the quotes are balanced\n              QUOTES:\n                while ($value !~ /\"$/ or $value =~ tr/\"/\"/ % 2) { #\"\n                    $i++;\n                    my $next = $qual[$i];\n                    if (!defined($next)) {\n                        $self->warn(\"Unbalanced quote in:\\n\".join(\"\\n\", @qual).\n                                    \"\\nAdding quote to close...\".\n                                    \"Check sequence quality!\");\n                        $value .= '\"';\n                        last QUOTES;\n                    }\n\n                    # Join to value with space if value or next line contains a space\n                    $value .= (grep /\\s/, ($value, $next)) ? \" $next\" : $next;\n                }\n                # Trim leading and trailing quotes\n                $value =~ s/^\"|\"$//g;\n                # Undouble internal quotes\n                $value =~ s/\"\"/\"/g; #\"\n            }\n        } else {\n            $value = '_no_value';\n        }\n\n        # Store the qualifier\n        $out->field->{$qualifier} ||= [];\n        push(@{$out->field->{$qualifier}},$value);\n    }\n\n    return $out;\n}\n\n=head2 _write_line_EMBL\n\n Title   : _write_line_EMBL\n Usage   :\n Function: internal function\n Example :\n Returns : 1 if writing suceeded, else undef\n Args    :\n\n\n\nsub _write_line_EMBL {\n    my ($self,$pre1,$pre2,$line,$length) = @_;\n\n    $length || $self->throw(\"Miscalled write_line_EMBL without length. Programming error!\");\n    my $subl = $length - length $pre2;\n    my $linel = length $line;\n    my $i;\n\n    my $sub = substr($line,0,$length - length $pre1);\n\n    $self->_print( \"$pre1$sub\\n\") || return;\n\n    for ($i= ($length - length $pre1);$i < $linel;) {\n        $sub = substr($line,$i,($subl));\n        $self->_print( \"$pre2$sub\\n\") || return;\n        $i += $subl;\n    }\n\n    return 1;\n}\n\n=head2 _write_line_EMBL_regex\n\n Title   : _write_line_EMBL_regex\n Usage   :\n Function: internal function for writing lines of specified\n           length, with different first and the next line\n           left hand headers and split at specific points in the\n           text\n Example :\n Returns : nothing\n Args    : file handle, first header, second header, text-line, regex for line breaks, total line length","parameters":[{"label":"$self"},{"label":"$pre1"},{"label":"$pre2"},{"label":"$line"},{"label":"$regex"},{"label":"$length"}]},"containerName":"main::","definition":"sub"},{"name":"CHUNK","kind":12,"line":1391},{"name":"CHUNK","line":1403,"kind":12},{"children":[{"line":1433,"kind":13,"localvar":"my","definition":"my","name":"$obj","containerName":"_post_sort"},{"line":1435,"kind":13,"localvar":"my","containerName":"_post_sort","name":"$value","definition":"my"},{"kind":13,"line":1436,"name":"$obj","containerName":"_post_sort"},{"line":1436,"kind":13,"name":"$value","containerName":"_post_sort"},{"name":"$obj","containerName":"_post_sort","line":1438,"kind":13}],"line":1432,"kind":12,"range":{"start":{"character":0,"line":1432},"end":{"character":9999,"line":1440}},"name":"_post_sort","definition":"sub","containerName":"main::"},{"range":{"end":{"line":1461,"character":9999},"start":{"line":1453,"character":0}},"containerName":"main::","definition":"sub","name":"_show_dna","line":1453,"children":[{"line":1454,"kind":13,"localvar":"my","containerName":"_show_dna","name":"$obj","definition":"my"},{"containerName":"_show_dna","name":"$value","definition":"my","localvar":"my","kind":13,"line":1456},{"line":1457,"kind":13,"name":"$obj","containerName":"_show_dna"},{"name":"$value","containerName":"_show_dna","line":1457,"kind":13},{"name":"$obj","containerName":"_show_dna","kind":13,"line":1459}],"kind":12},{"containerName":"main::","name":"_id_generation_func","definition":"sub","range":{"start":{"character":0,"line":1474},"end":{"line":1482,"character":9999}},"kind":12,"line":1474,"children":[{"kind":13,"line":1475,"definition":"my","name":"$obj","containerName":"_id_generation_func","localvar":"my"},{"kind":13,"line":1477,"name":"$value","definition":"my","containerName":"_id_generation_func","localvar":"my"},{"containerName":"_id_generation_func","name":"$obj","kind":13,"line":1478},{"kind":13,"line":1478,"containerName":"_id_generation_func","name":"$value"},{"kind":13,"line":1480,"name":"$obj","containerName":"_id_generation_func"}]},{"children":[{"definition":"my","name":"$obj","containerName":"_ac_generation_func","localvar":"my","kind":13,"line":1496},{"localvar":"my","containerName":"_ac_generation_func","name":"$value","definition":"my","line":1498,"kind":13},{"containerName":"_ac_generation_func","name":"$obj","line":1499,"kind":13},{"line":1499,"kind":13,"containerName":"_ac_generation_func","name":"$value"},{"name":"$obj","containerName":"_ac_generation_func","kind":13,"line":1501}],"line":1495,"kind":12,"range":{"start":{"line":1495,"character":0},"end":{"character":9999,"line":1503}},"name":"_ac_generation_func","definition":"sub","containerName":"main::"},{"range":{"start":{"line":1516,"character":0},"end":{"character":9999,"line":1524}},"name":"_sv_generation_func","definition":"sub","containerName":"main::","children":[{"localvar":"my","containerName":"_sv_generation_func","name":"$obj","definition":"my","line":1517,"kind":13},{"localvar":"my","definition":"my","name":"$value","containerName":"_sv_generation_func","line":1519,"kind":13},{"line":1520,"kind":13,"containerName":"_sv_generation_func","name":"$obj"},{"name":"$value","containerName":"_sv_generation_func","line":1520,"kind":13},{"kind":13,"line":1522,"name":"$obj","containerName":"_sv_generation_func"}],"line":1516,"kind":12},{"containerName":"main::","definition":"sub","name":"_kw_generation_func","range":{"end":{"character":9999,"line":1545},"start":{"line":1537,"character":0}},"kind":12,"line":1537,"children":[{"line":1538,"kind":13,"localvar":"my","containerName":"_kw_generation_func","name":"$obj","definition":"my"},{"containerName":"_kw_generation_func","definition":"my","name":"$value","localvar":"my","kind":13,"line":1540},{"name":"$obj","containerName":"_kw_generation_func","line":1541,"kind":13},{"line":1541,"kind":13,"name":"$value","containerName":"_kw_generation_func"},{"containerName":"_kw_generation_func","name":"$obj","kind":13,"line":1543}]}]}