{"version":5,"vars":[{"signature":{"parameters":[{"label":"$class"},{"label":"@args"}],"documentation":"__END__\n#\n# BioPerl module for Bio::Assembly::Tools::ContigSpectrum\n#\n# Copyright by Florent Angly\n#\n# You may distribute this module under the same terms as Perl itself\n#\n# POD documentation - main docs before the code\n\n=head1 NAME\n\nBio::Assembly::Tools::ContigSpectrum - create and manipulate contig spectra\n\n=head1 SYNOPSIS\n\n  # Simple contig spectrum creation\n  my $csp1 = Bio::Assembly::Tools::ContigSpectrum->new(\n    -id       => 'csp1',\n    -spectrum => { 1 => 10,\n                   2 => 2,\n                   3 => 1 } );\n\n  # ...or another way to create a simple contig spectrum\n  my $csp2 = Bio::Assembly::Tools::ContigSpectrum->new;\n  $csp2->id('csp2');\n  $csp2->spectrum({ 1 => 20, 2 => 1, 4 => 1 });\n\n  # Get some information\n  print \"This is contig spectrum \".$csp->id.\"\\n\";\n  print \"It contains \".$csp->nof_seq.\" sequences\\n\";\n  print \"The largest contig has \".$csp->max_size.\" sequences\\n\";\n  print \"The spectrum is: \".$csp->to_string($csp->spectrum).\"\\n\";\n\n  # Let's add the contig spectra\n  my $summed_csp = Bio::Assembly::Tools::ContigSpectrum->new;\n  $summed_csp->add($csp1);\n  $summed_csp->add($csp2);\n  print \"The summed contig spectrum is \".$summed_csp->to_string.\"\\n\";\n\n  # Make an average\n  my $avg_csp = Bio::Assembly::Tools::ContigSpectrum->new;\n  $avg_csp = $avg_csp->average([$csp1, $csp2]);\n  print \"The average contig spectrum is \".$avg_csp->to_string.\"\\n\";\n\n  # Get a contig spectrum from an assembly\n  my $from_assembly = Bio::Assembly::Tools::ContigSpectrum->new(\n    -assembly       => $assembly_object,\n    -eff_asm_params => 1);\n  print \"The contig spectrum from assembly is \".$from_assembly->to_string.\"\\n\";\n\n  # Report advanced information (possible because eff_asm_params = 1)\n  print \"Average sequence length: \".$from_assembly->avg_seq_length.\" bp\\n\";\n  print 
\"Minimum overlap length: \".$from_assembly->min_overlap.\" bp\\n\";\n  print \"Average overlap length: \".$from_assembly->avg_overlap.\" bp\\n\";\n  print \"Minimum overlap match: \".$from_assembly->min_identity.\" %\\n\";\n  print \"Average overlap match: \".$from_assembly->avg_identity.\" %\\n\";\n\n  # Assuming the assembly object contains sequences from several different\n  # metagenomes, we have a mixed contig spectrum from which a cross contig\n  # spectrum and dissolved contig spectra can be obtained\n  my $mixed_csp = $from_assembly;\n\n  # Calculate a dissolved contig spectrum\n  my $meta1_dissolved = Bio::Assembly::Tools::ContigSpectrum->new(\n    -dissolve => [$mixed_csp, 'metagenome1'] );\n  my $meta2_dissolved = Bio::Assembly::Tools::ContigSpectrum->new(\n    -dissolve => [$mixed_csp, 'metagenome2'] );\n  print \"The dissolved contig spectra are:\\n\".\n    $meta1_dissolved->to_string.\"\\n\".\n    $meta2_dissolved->to_string.\"\\n\";\n\n  # Determine a cross contig spectrum\n  my $cross_csp = Bio::Assembly::Tools::ContigSpectrum->new(\n    -cross => $mixed_csp );\n  print \"The cross contig spectrum is \".$cross_csp->to_string.\"\\n\";\n\n  # Score a contig spectrum (the more abundant the contigs and the larger their\n  # size, the larger the score)\n\n\n=head1 DESCRIPTION\n\nThe Bio::Assembly::Tools::ContigSpectrum Perl module enables to\nmanually create contig spectra, import them from assemblies,\nmanipulate them, transform between different types of contig spectra\nand output them.\n\nBio::Assembly::Tools::ContigSpectrum is a module to create, manipulate\nand output contig spectra, assembly-derived data used in metagenomics\n(community genomics) for diversity estimation.\n\n=head2 Background\n\nA contig spectrum is the count of the number of contigs of different\nsize in an assembly. For example, the contig spectrum [100 5 1 0 0\n...] 
means that there were 100 singlets (1-contigs), 5 contigs of 2\nsequences (2-contigs), 1 contig of 3 sequences (3-contig) and no\nlarger contigs.\n\nAn assembly can be produced from a mixture of sequences from different\nmetagenomes. The contig obtained from this assembly is a mixed contig\nspectrum. The contribution of each metagenome in this mixed contig\nspectrum can be obtained by determining a dissolved contig spectrum.\n\nFinally, based on a mixed contig spectrum, a cross contig spectrum can\nbe determined. In a cross contig spectrum, only contigs containing\nsequences from different metagenomes are kept; \"pure\" contigs are\nexcluded. Additionally, the total number of singletons (1-contigs)\nfrom each region that assembles with any fragments from other regions\nis the number of 1-contigs in the cross contig spectrum.\n\n=head2 Implementation\n\nThe simplest representation of a contig spectrum is as a hash\nrepresentation where the key is the contig size (number of sequences\nmaking up the contig) and the value the number of contigs of this\nsize.\n\nIn fact, it is useful to have more information associated with the\ncontig spectrum, hence the Bio::Assembly::Tools::ContigSpectrum module\nimplements an object containing a contig spectrum hash and additional\ninformation. 
The get/set methods to access them are:\n\n    id              contig spectrum ID\n    nof_seq         number of sequences\n    nof_rep         number of repetitions (assemblies) used\n    max_size        size of (number of sequences in) the largest contig\n    nof_overlaps    number of overlaps\n    min_overlap     minimum overlap length for building a contig\n    min_identity    minimum sequence identity over the overlap length\n    avg_overlap     average overlap length\n    avg_identity    average overlap identity\n    avg_seq_length  average sequence length\n    eff_asm_params  effective assembly parameters\n    spectrum        hash representation of a contig spectrum\n\n  Operations on the contig spectra:\n\n    to_string       create a string representation of the spectrum\n    spectrum        import a hash contig spectrum\n    assembly        determine a contig spectrum from an assembly\n    dissolve        calculate a dissolved contig spectrum (based on assembly)\n    cross           produce a cross contig spectrum (based on assembly)\n    add             add a contig spectrum to an existing one\n    average         make an average of several contig spectra\n\nWhen using operations that rely on knowing \"where\" (from what\nmetagenomes) a sequence came from (i.e. when creating a dissolved or\ncross contig spectrum), make sure that the sequences used for the\nassembly have a name header, e.g.  E<gt>metagenome1|seq1,\nE<gt>metagenome2|seq1, ...\n\n=head1 FEEDBACK\n\n=head2 Mailing Lists\n\nUser feedback is an integral part of the evolution of this and other\nBioperl modules. 
Send your comments and suggestions preferably to the\nBioperl mailing lists. Your participation is much appreciated.\n\n  bioperl-l@bioperl.org                  - General discussion\n  http://bioperl.org/wiki/Mailing_lists  - About the mailing lists\n\n\n=head2 Support \n\nPlease direct usage questions or support issues to the mailing list:\n\nI<bioperl-l@bioperl.org>\n\nrather than to the module maintainer directly. Many experienced and \nresponsive experts will be able to look at the problem and quickly \naddress it. Please include a thorough description of the problem \nwith code and data examples if at all possible.\n\n=head2 Reporting Bugs\n\nReport bugs to the BioPerl bug tracking system to help us keep track\nof the bugs and their resolution. Bug reports can be submitted via email\nor the web:\n\n  bioperl-bugs@bio.perl.org\n  http://bugzilla.bioperl.org/\n\n=head1 AUTHOR - Florent E Angly\n\nEmail florent_dot_angly_at_gmail_dot_com\n\n=head1 APPENDIX\n\nThe rest of the documentation details each of the object\nmethods. 
Internal methods are usually preceded with a \"_\".\n\n\npackage Bio::Assembly::Tools::ContigSpectrum;\n\nuse strict;\n\nuse Bio::Root::Root;\nuse Bio::Assembly::Scaffold;\nuse Bio::SimpleAlign;\nuse Bio::LocatableSeq;\nuse Bio::Align::PairwiseStatistics;\n\nuse base 'Bio::Root::Root';\n\n\n=head2 new\n\n  Title   : new\n  Usage   : my $csp = Bio::Assembly::Tools::ContigSpectrum->new();\n              or\n            my $csp = Bio::Assembly::Tools::ContigSpectrum->new(\n              -id => 'some_name',\n              -spectrum =>  { 1 => 90 , 2 => 3 , 4 => 1 },\n            );\n              or\n            my $csp = Bio::Assembly::Tools::ContigSpectrum->new(\n              -assembly =>  $assembly_obj\n            );\n  Function: create a new contig spectrum object\n  Returns : reference to a contig spectrum object\n  Args    : none","label":"new($class,@args)"},"line":227,"range":{"start":{"line":227,"character":0},"end":{"character":9999,"line":275}},"kind":12,"definition":"sub","detail":"($class,@args)","children":[{"containerName":"new","localvar":"my","kind":13,"name":"$class","line":228,"definition":"my"},{"containerName":"new","kind":13,"name":"@args","line":228},{"line":229,"name":"$self","containerName":"new","localvar":"my","kind":13,"definition":"my"},{"kind":13,"containerName":"new","name":"$class","line":229},{"line":229,"containerName":"new","kind":13,"name":"@args"},{"definition":"my","localvar":"my","containerName":"new","kind":13,"name":"$id","line":230},{"line":230,"name":"$nof_seq","kind":13,"containerName":"new"},{"containerName":"new","kind":13,"name":"$nof_rep","line":230},{"line":230,"kind":13,"containerName":"new","name":"$max_size"},{"line":230,"kind":13,"containerName":"new","name":"$nof_overlaps"},{"containerName":"new","kind":13,"name":"$min_overlap","line":230},{"kind":13,"containerName":"new","name":"$min_identity","line":231},{"name":"$avg_overlap","kind":13,"containerName":"new","line":231},{"name":"$avg_identity","containerName":"ne
w","kind":13,"line":231},{"name":"$avg_seq_len","containerName":"new","kind":13,"line":231},{"name":"$spectrum","kind":13,"containerName":"new","line":231},{"containerName":"new","kind":13,"name":"$assembly","line":232},{"line":232,"kind":13,"containerName":"new","name":"$eff_asm_params"},{"line":232,"containerName":"new","kind":13,"name":"$dissolve"},{"line":232,"name":"$cross","containerName":"new","kind":13},{"line":232,"name":"$self","containerName":"new","kind":13},{"line":232,"name":"_rearrange","kind":12,"containerName":"new"},{"kind":13,"containerName":"new","name":"@args","line":235},{"name":"$self","containerName":"new","kind":13,"line":238},{"name":"$self","kind":13,"containerName":"new","line":239},{"name":"$self","containerName":"new","kind":13,"line":240},{"line":241,"name":"$self","containerName":"new","kind":13},{"line":242,"name":"$self","kind":13,"containerName":"new"},{"line":243,"name":"$self","kind":13,"containerName":"new"},{"line":244,"kind":13,"containerName":"new","name":"$self"},{"line":245,"name":"$self","kind":13,"containerName":"new"},{"line":246,"kind":13,"containerName":"new","name":"$self"},{"name":"$self","containerName":"new","kind":13,"line":247},{"line":248,"kind":13,"containerName":"new","name":"$self"},{"line":249,"containerName":"new","kind":13,"name":"$self"},{"containerName":"new","kind":13,"name":"$self","line":250},{"line":253,"name":"$self","kind":13,"containerName":"new"},{"line":253,"kind":13,"containerName":"new","name":"$id"},{"name":"$id","kind":13,"containerName":"new","line":253},{"line":254,"name":"$self","containerName":"new","kind":13},{"name":"$nof_seq","containerName":"new","kind":13,"line":254},{"kind":13,"containerName":"new","name":"$nof_seq","line":254},{"name":"$self","containerName":"new","kind":13,"line":255},{"line":255,"containerName":"new","kind":13,"name":"$nof_rep"},{"kind":13,"containerName":"new","name":"$nof_rep","line":255},{"kind":13,"containerName":"new","name":"$self","line":256},{"line":256,
"name":"$max_size","kind":13,"containerName":"new"},{"kind":13,"containerName":"new","name":"$max_size","line":256},{"line":257,"containerName":"new","kind":13,"name":"$self"},{"line":257,"name":"$nof_overlaps","kind":13,"containerName":"new"},{"name":"$nof_overlaps","kind":13,"containerName":"new","line":257},{"line":258,"name":"$self","kind":13,"containerName":"new"},{"containerName":"new","kind":13,"name":"$min_overlap","line":258},{"kind":13,"containerName":"new","name":"$min_overlap","line":258},{"line":259,"name":"$self","kind":13,"containerName":"new"},{"name":"$avg_overlap","kind":13,"containerName":"new","line":259},{"name":"$avg_overlap","kind":13,"containerName":"new","line":259},{"kind":13,"containerName":"new","name":"$self","line":260},{"kind":13,"containerName":"new","name":"$min_identity","line":260},{"name":"$min_identity","kind":13,"containerName":"new","line":260},{"name":"$self","kind":13,"containerName":"new","line":261},{"kind":13,"containerName":"new","name":"$avg_identity","line":261},{"containerName":"new","kind":13,"name":"$avg_identity","line":261},{"line":262,"name":"$self","containerName":"new","kind":13},{"line":262,"kind":13,"containerName":"new","name":"$avg_seq_len"},{"line":262,"name":"$avg_seq_len","kind":13,"containerName":"new"},{"name":"$self","containerName":"new","kind":13,"line":263},{"line":263,"name":"$eff_asm_params","containerName":"new","kind":13},{"line":263,"name":"$eff_asm_params","containerName":"new","kind":13},{"containerName":"new","kind":13,"name":"$self","line":266},{"kind":12,"containerName":"new","name":"_import_spectrum","line":266},{"containerName":"new","kind":13,"name":"$spectrum","line":266},{"line":266,"name":"$spectrum","kind":13,"containerName":"new"},{"name":"$self","kind":13,"containerName":"new","line":267},{"line":267,"containerName":"new","kind":12,"name":"_import_assembly"},{"kind":13,"containerName":"new","name":"$assembly","line":267},{"name":"$assembly","kind":13,"containerName":"new","line":2
67},{"containerName":"new","kind":13,"name":"$dissolve","line":268},{"containerName":"new","localvar":"my","kind":13,"name":"$mixed_csp","line":269,"definition":"my"},{"line":269,"name":"$header","containerName":"new","kind":13},{"line":269,"kind":13,"containerName":"new","name":"$dissolve"},{"name":"$dissolve","containerName":"new","kind":13,"line":269},{"line":270,"name":"$self","containerName":"new","kind":13},{"line":270,"containerName":"new","kind":12,"name":"_import_dissolved_csp"},{"line":270,"name":"$mixed_csp","kind":13,"containerName":"new"},{"name":"$header","kind":13,"containerName":"new","line":270},{"containerName":"new","kind":13,"name":"$self","line":272},{"line":272,"name":"_import_cross_csp","kind":12,"containerName":"new"},{"name":"$cross","containerName":"new","kind":13,"line":272},{"containerName":"new","kind":13,"name":"$cross","line":272},{"line":274,"name":"$self","kind":13,"containerName":"new"}],"name":"new","containerName":"main::"},{"line":228,"containerName":"","kind":2,"name":"base"},{"line":229,"containerName":"new","kind":12,"name":"SUPER"},{"line":288,"range":{"end":{"line":295,"character":9999},"start":{"line":288,"character":0}},"kind":12,"signature":{"documentation":"__END__\n#\n# BioPerl module for Bio::Assembly::Tools::ContigSpectrum\n#\n# Copyright by Florent Angly\n#\n# You may distribute this module under the same terms as Perl itself\n#\n# POD documentation - main docs before the code\n\n=head1 NAME\n\nBio::Assembly::Tools::ContigSpectrum - create and manipulate contig spectra\n\n=head1 SYNOPSIS\n\n  # Simple contig spectrum creation\n  my $csp1 = Bio::Assembly::Tools::ContigSpectrum->new(\n    -id       => 'csp1',\n    -spectrum => { 1 => 10,\n                   2 => 2,\n                   3 => 1 } );\n\n  # ...or another way to create a simple contig spectrum\n  my $csp2 = Bio::Assembly::Tools::ContigSpectrum->new;\n  $csp2->id('csp2');\n  $csp2->spectrum({ 1 => 20, 2 => 1, 4 => 1 });\n\n  # Get some information\n  print 
\"This is contig spectrum \".$csp->id.\"\\n\";\n  print \"It contains \".$csp->nof_seq.\" sequences\\n\";\n  print \"The largest contig has \".$csp->max_size.\" sequences\\n\";\n  print \"The spectrum is: \".$csp->to_string($csp->spectrum).\"\\n\";\n\n  # Let's add the contig spectra\n  my $summed_csp = Bio::Assembly::Tools::ContigSpectrum->new;\n  $summed_csp->add($csp1);\n  $summed_csp->add($csp2);\n  print \"The summed contig spectrum is \".$summed_csp->to_string.\"\\n\";\n\n  # Make an average\n  my $avg_csp = Bio::Assembly::Tools::ContigSpectrum->new;\n  $avg_csp = $avg_csp->average([$csp1, $csp2]);\n  print \"The average contig spectrum is \".$avg_csp->to_string.\"\\n\";\n\n  # Get a contig spectrum from an assembly\n  my $from_assembly = Bio::Assembly::Tools::ContigSpectrum->new(\n    -assembly       => $assembly_object,\n    -eff_asm_params => 1);\n  print \"The contig spectrum from assembly is \".$from_assembly->to_string.\"\\n\";\n\n  # Report advanced information (possible because eff_asm_params = 1)\n  print \"Average sequence length: \".$from_assembly->avg_seq_length.\" bp\\n\";\n  print \"Minimum overlap length: \".$from_assembly->min_overlap.\" bp\\n\";\n  print \"Average overlap length: \".$from_assembly->avg_overlap.\" bp\\n\";\n  print \"Minimum overlap match: \".$from_assembly->min_identity.\" %\\n\";\n  print \"Average overlap match: \".$from_assembly->avg_identity.\" %\\n\";\n\n  # Assuming the assembly object contains sequences from several different\n  # metagenomes, we have a mixed contig spectrum from which a cross contig\n  # spectrum and dissolved contig spectra can be obtained\n  my $mixed_csp = $from_assembly;\n\n  # Calculate a dissolved contig spectrum\n  my $meta1_dissolved = Bio::Assembly::Tools::ContigSpectrum->new(\n    -dissolve => [$mixed_csp, 'metagenome1'] );\n  my $meta2_dissolved = Bio::Assembly::Tools::ContigSpectrum->new(\n    -dissolve => [$mixed_csp, 'metagenome2'] );\n  print \"The dissolved contig spectra are:\\n\".\n  
  $meta1_dissolved->to_string.\"\\n\".\n    $meta2_dissolved->to_string.\"\\n\";\n\n  # Determine a cross contig spectrum\n  my $cross_csp = Bio::Assembly::Tools::ContigSpectrum->new(\n    -cross => $mixed_csp );\n  print \"The cross contig spectrum is \".$cross_csp->to_string.\"\\n\";\n\n  # Score a contig spectrum (the more abundant the contigs and the larger their\n  # size, the larger the score)\n\n\n=head1 DESCRIPTION\n\nThe Bio::Assembly::Tools::ContigSpectrum Perl module enables to\nmanually create contig spectra, import them from assemblies,\nmanipulate them, transform between different types of contig spectra\nand output them.\n\nBio::Assembly::Tools::ContigSpectrum is a module to create, manipulate\nand output contig spectra, assembly-derived data used in metagenomics\n(community genomics) for diversity estimation.\n\n=head2 Background\n\nA contig spectrum is the count of the number of contigs of different\nsize in an assembly. For example, the contig spectrum [100 5 1 0 0\n...] means that there were 100 singlets (1-contigs), 5 contigs of 2\nsequences (2-contigs), 1 contig of 3 sequences (3-contig) and no\nlarger contigs.\n\nAn assembly can be produced from a mixture of sequences from different\nmetagenomes. The contig obtained from this assembly is a mixed contig\nspectrum. The contribution of each metagenome in this mixed contig\nspectrum can be obtained by determining a dissolved contig spectrum.\n\nFinally, based on a mixed contig spectrum, a cross contig spectrum can\nbe determined. In a cross contig spectrum, only contigs containing\nsequences from different metagenomes are kept; \"pure\" contigs are\nexcluded. 
Additionally, the total number of singletons (1-contigs)\nfrom each region that assembles with any fragments from other regions\nis the number of 1-contigs in the cross contig spectrum.\n\n=head2 Implementation\n\nThe simplest representation of a contig spectrum is as a hash\nrepresentation where the key is the contig size (number of sequences\nmaking up the contig) and the value the number of contigs of this\nsize.\n\nIn fact, it is useful to have more information associated with the\ncontig spectrum, hence the Bio::Assembly::Tools::ContigSpectrum module\nimplements an object containing a contig spectrum hash and additional\ninformation. The get/set methods to access them are:\n\n    id              contig spectrum ID\n    nof_seq         number of sequences\n    nof_rep         number of repetitions (assemblies) used\n    max_size        size of (number of sequences in) the largest contig\n    nof_overlaps    number of overlaps\n    min_overlap     minimum overlap length for building a contig\n    min_identity    minimum sequence identity over the overlap length\n    avg_overlap     average overlap length\n    avg_identity    average overlap identity\n    avg_seq_length  average sequence length\n    eff_asm_params  effective assembly parameters\n    spectrum        hash representation of a contig spectrum\n\n  Operations on the contig spectra:\n\n    to_string       create a string representation of the spectrum\n    spectrum        import a hash contig spectrum\n    assembly        determine a contig spectrum from an assembly\n    dissolve        calculate a dissolved contig spectrum (based on assembly)\n    cross           produce a cross contig spectrum (based on assembly)\n    add             add a contig spectrum to an existing one\n    average         make an average of several contig spectra\n\nWhen using operations that rely on knowing \"where\" (from what\nmetagenomes) a sequence came from (i.e. 
when creating a dissolved or\ncross contig spectrum), make sure that the sequences used for the\nassembly have a name header, e.g.  E<gt>metagenome1|seq1,\nE<gt>metagenome2|seq1, ...\n\n=head1 FEEDBACK\n\n=head2 Mailing Lists\n\nUser feedback is an integral part of the evolution of this and other\nBioperl modules. Send your comments and suggestions preferably to the\nBioperl mailing lists. Your participation is much appreciated.\n\n  bioperl-l@bioperl.org                  - General discussion\n  http://bioperl.org/wiki/Mailing_lists  - About the mailing lists\n\n\n=head2 Support \n\nPlease direct usage questions or support issues to the mailing list:\n\nI<bioperl-l@bioperl.org>\n\nrather than to the module maintainer directly. Many experienced and \nresponsive experts will be able to look at the problem and quickly \naddress it. Please include a thorough description of the problem \nwith code and data examples if at all possible.\n\n=head2 Reporting Bugs\n\nReport bugs to the BioPerl bug tracking system to help us keep track\nof the bugs and their resolution. Bug reports can be submitted via email\nor the web:\n\n  bioperl-bugs@bio.perl.org\n  http://bugzilla.bioperl.org/\n\n=head1 AUTHOR - Florent E Angly\n\nEmail florent_dot_angly_at_gmail_dot_com\n\n=head1 APPENDIX\n\nThe rest of the documentation details each of the object\nmethods. 
Internal methods are usually preceded with a \"_\".\n\n\npackage Bio::Assembly::Tools::ContigSpectrum;\n\nuse strict;\n\nuse Bio::Root::Root;\nuse Bio::Assembly::Scaffold;\nuse Bio::SimpleAlign;\nuse Bio::LocatableSeq;\nuse Bio::Align::PairwiseStatistics;\n\nuse base 'Bio::Root::Root';\n\n\n=head2 new\n\n  Title   : new\n  Usage   : my $csp = Bio::Assembly::Tools::ContigSpectrum->new();\n              or\n            my $csp = Bio::Assembly::Tools::ContigSpectrum->new(\n              -id => 'some_name',\n              -spectrum =>  { 1 => 90 , 2 => 3 , 4 => 1 },\n            );\n              or\n            my $csp = Bio::Assembly::Tools::ContigSpectrum->new(\n              -assembly =>  $assembly_obj\n            );\n  Function: create a new contig spectrum object\n  Returns : reference to a contig spectrum object\n  Args    : none\n\n\nsub new {\n  my ($class, @args) = @_;\n  my $self = $class->SUPER::new(@args);\n  my ( $id, $nof_seq, $nof_rep, $max_size, $nof_overlaps, $min_overlap,\n    $min_identity, $avg_overlap, $avg_identity, $avg_seq_len, $spectrum,\n    $assembly, $eff_asm_params, $dissolve, $cross) = $self->_rearrange( [qw(ID\n    NOF_SEQ NOF_REP MAX_SIZE NOF_OVERLAPS MIN_OVERLAP MIN_IDENTITY AVG_OVERLAP\n    AVG_IDENTITY AVG_SEQ_LEN SPECTRUM ASSEMBLY EFF_ASM_PARAMS DISSOLVE CROSS)],\n    @args );\n\n  # First set up some defauts\n  $self->{'_id'}             = 'NoName';\n  $self->{'_nof_seq'}        = 0;\n  $self->{'_nof_rep'}        = 0;\n  $self->{'_max_size'}       = 0;\n  $self->{'_nof_overlaps'}   = 0;\n  $self->{'_min_overlap'}    = undef;\n  $self->{'_min_identity'}   = undef;\n  $self->{'_avg_overlap'}    = 0;\n  $self->{'_avg_identity'}   = 0;\n  $self->{'_avg_seq_len'}    = 0;\n  $self->{'_eff_asm_params'} = 0;\n  $self->{'_spectrum'}       = {1 => 0};  # contig spectrum hash representation\n  $self->{'_assembly'}       = []; # list of assembly objects used\n\n  # Then, according to user desires, override defaults\n  $self->{'_id'}           
  = $id             if (defined $id);\n  $self->{'_nof_seq'}        = $nof_seq        if (defined $nof_seq);\n  $self->{'_nof_rep'}        = $nof_rep        if (defined $nof_rep);\n  $self->{'_max_size'}       = $max_size       if (defined $max_size);\n  $self->{'_nof_overlaps'}   = $nof_overlaps   if (defined $nof_overlaps);\n  $self->{'_min_overlap'}    = $min_overlap    if (defined $min_overlap);\n  $self->{'_avg_overlap'}    = $avg_overlap    if (defined $avg_overlap);\n  $self->{'_min_identity'}   = $min_identity   if (defined $min_identity);\n  $self->{'_avg_identity'}   = $avg_identity   if (defined $avg_identity);\n  $self->{'_avg_seq_len'}    = $avg_seq_len    if (defined $avg_seq_len);\n  $self->{'_eff_asm_params'} = $eff_asm_params if (defined $eff_asm_params);\n\n  # Finally get stuff that can be gotten in an automated way\n  $self->_import_spectrum($spectrum) if defined($spectrum);\n  $self->_import_assembly($assembly) if defined($assembly);\n  if (defined($dissolve)) {\n    my ($mixed_csp, $header) = (@$dissolve[0], @$dissolve[1]);\n    $self->_import_dissolved_csp($mixed_csp, $header);\n  }\n  $self->_import_cross_csp($cross)   if defined($cross);\n\n  return $self;\n}\n\n\n=head2 id\n\n  Title   : id\n  Usage   : $csp->id\n  Function: get/set contig spectrum id\n  Returns : string\n  Args    : string 
[optional]","parameters":[{"label":"$self"},{"label":"$id"}],"label":"id($self,$id)"},"children":[{"definition":"my","kind":13,"localvar":"my","containerName":"id","name":"$self","line":289},{"containerName":"id","kind":13,"name":"$id","line":289},{"line":290,"name":"$id","kind":13,"containerName":"id"},{"kind":13,"containerName":"id","name":"$self","line":291},{"name":"$id","containerName":"id","kind":13,"line":291},{"line":293,"name":"$id","containerName":"id","kind":13},{"line":293,"kind":13,"containerName":"id","name":"$self"},{"line":294,"name":"$id","containerName":"id","kind":13}],"name":"id","containerName":"main::","definition":"sub","detail":"($self,$id)"},{"line":308,"range":{"start":{"character":0,"line":308},"end":{"character":9999,"line":317}},"kind":12,"signature":{"label":"nof_seq($self,$nof_seq)","documentation":"__END__\n#\n# BioPerl module for Bio::Assembly::Tools::ContigSpectrum\n#\n# Copyright by Florent Angly\n#\n# You may distribute this module under the same terms as Perl itself\n#\n# POD documentation - main docs before the code\n\n=head1 NAME\n\nBio::Assembly::Tools::ContigSpectrum - create and manipulate contig spectra\n\n=head1 SYNOPSIS\n\n  # Simple contig spectrum creation\n  my $csp1 = Bio::Assembly::Tools::ContigSpectrum->new(\n    -id       => 'csp1',\n    -spectrum => { 1 => 10,\n                   2 => 2,\n                   3 => 1 } );\n\n  # ...or another way to create a simple contig spectrum\n  my $csp2 = Bio::Assembly::Tools::ContigSpectrum->new;\n  $csp2->id('csp2');\n  $csp2->spectrum({ 1 => 20, 2 => 1, 4 => 1 });\n\n  # Get some information\n  print \"This is contig spectrum \".$csp->id.\"\\n\";\n  print \"It contains \".$csp->nof_seq.\" sequences\\n\";\n  print \"The largest contig has \".$csp->max_size.\" sequences\\n\";\n  print \"The spectrum is: \".$csp->to_string($csp->spectrum).\"\\n\";\n\n  # Let's add the contig spectra\n  my $summed_csp = Bio::Assembly::Tools::ContigSpectrum->new;\n  $summed_csp->add($csp1);\n  
$summed_csp->add($csp2);\n  print \"The summed contig spectrum is \".$summed_csp->to_string.\"\\n\";\n\n  # Make an average\n  my $avg_csp = Bio::Assembly::Tools::ContigSpectrum->new;\n  $avg_csp = $avg_csp->average([$csp1, $csp2]);\n  print \"The average contig spectrum is \".$avg_csp->to_string.\"\\n\";\n\n  # Get a contig spectrum from an assembly\n  my $from_assembly = Bio::Assembly::Tools::ContigSpectrum->new(\n    -assembly       => $assembly_object,\n    -eff_asm_params => 1);\n  print \"The contig spectrum from assembly is \".$from_assembly->to_string.\"\\n\";\n\n  # Report advanced information (possible because eff_asm_params = 1)\n  print \"Average sequence length: \".$from_assembly->avg_seq_length.\" bp\\n\";\n  print \"Minimum overlap length: \".$from_assembly->min_overlap.\" bp\\n\";\n  print \"Average overlap length: \".$from_assembly->avg_overlap.\" bp\\n\";\n  print \"Minimum overlap match: \".$from_assembly->min_identity.\" %\\n\";\n  print \"Average overlap match: \".$from_assembly->avg_identity.\" %\\n\";\n\n  # Assuming the assembly object contains sequences from several different\n  # metagenomes, we have a mixed contig spectrum from which a cross contig\n  # spectrum and dissolved contig spectra can be obtained\n  my $mixed_csp = $from_assembly;\n\n  # Calculate a dissolved contig spectrum\n  my $meta1_dissolved = Bio::Assembly::Tools::ContigSpectrum->new(\n    -dissolve => [$mixed_csp, 'metagenome1'] );\n  my $meta2_dissolved = Bio::Assembly::Tools::ContigSpectrum->new(\n    -dissolve => [$mixed_csp, 'metagenome2'] );\n  print \"The dissolved contig spectra are:\\n\".\n    $meta1_dissolved->to_string.\"\\n\".\n    $meta2_dissolved->to_string.\"\\n\";\n\n  # Determine a cross contig spectrum\n  my $cross_csp = Bio::Assembly::Tools::ContigSpectrum->new(\n    -cross => $mixed_csp );\n  print \"The cross contig spectrum is \".$cross_csp->to_string.\"\\n\";\n\n  # Score a contig spectrum (the more abundant the contigs and the larger their\n  # 
size, the larger the score)\n\n\n=head1 DESCRIPTION\n\nThe Bio::Assembly::Tools::ContigSpectrum Perl module enables to\nmanually create contig spectra, import them from assemblies,\nmanipulate them, transform between different types of contig spectra\nand output them.\n\nBio::Assembly::Tools::ContigSpectrum is a module to create, manipulate\nand output contig spectra, assembly-derived data used in metagenomics\n(community genomics) for diversity estimation.\n\n=head2 Background\n\nA contig spectrum is the count of the number of contigs of different\nsize in an assembly. For example, the contig spectrum [100 5 1 0 0\n...] means that there were 100 singlets (1-contigs), 5 contigs of 2\nsequences (2-contigs), 1 contig of 3 sequences (3-contig) and no\nlarger contigs.\n\nAn assembly can be produced from a mixture of sequences from different\nmetagenomes. The contig obtained from this assembly is a mixed contig\nspectrum. The contribution of each metagenome in this mixed contig\nspectrum can be obtained by determining a dissolved contig spectrum.\n\nFinally, based on a mixed contig spectrum, a cross contig spectrum can\nbe determined. In a cross contig spectrum, only contigs containing\nsequences from different metagenomes are kept; \"pure\" contigs are\nexcluded. Additionally, the total number of singletons (1-contigs)\nfrom each region that assembles with any fragments from other regions\nis the number of 1-contigs in the cross contig spectrum.\n\n=head2 Implementation\n\nThe simplest representation of a contig spectrum is as a hash\nrepresentation where the key is the contig size (number of sequences\nmaking up the contig) and the value the number of contigs of this\nsize.\n\nIn fact, it is useful to have more information associated with the\ncontig spectrum, hence the Bio::Assembly::Tools::ContigSpectrum module\nimplements an object containing a contig spectrum hash and additional\ninformation. 
The get/set methods to access them are:\n\n    id              contig spectrum ID\n    nof_seq         number of sequences\n    nof_rep         number of repetitions (assemblies) used\n    max_size        size of (number of sequences in) the largest contig\n    nof_overlaps    number of overlaps\n    min_overlap     minimum overlap length for building a contig\n    min_identity    minimum sequence identity over the overlap length\n    avg_overlap     average overlap length\n    avg_identity    average overlap identity\n    avg_seq_length  average sequence length\n    eff_asm_params  effective assembly parameters\n    spectrum        hash representation of a contig spectrum\n\n  Operations on the contig spectra:\n\n    to_string       create a string representation of the spectrum\n    spectrum        import a hash contig spectrum\n    assembly        determine a contig spectrum from an assembly\n    dissolve        calculate a dissolved contig spectrum (based on assembly)\n    cross           produce a cross contig spectrum (based on assembly)\n    add             add a contig spectrum to an existing one\n    average         make an average of several contig spectra\n\nWhen using operations that rely on knowing \"where\" (from what\nmetagenomes) a sequence came from (i.e. when creating a dissolved or\ncross contig spectrum), make sure that the sequences used for the\nassembly have a name header, e.g.  E<gt>metagenome1|seq1,\nE<gt>metagenome2|seq1, ...\n\n=head1 FEEDBACK\n\n=head2 Mailing Lists\n\nUser feedback is an integral part of the evolution of this and other\nBioperl modules. 
Send your comments and suggestions preferably to the\nBioperl mailing lists. Your participation is much appreciated.\n\n  bioperl-l@bioperl.org                  - General discussion\n  http://bioperl.org/wiki/Mailing_lists  - About the mailing lists\n\n\n=head2 Support \n\nPlease direct usage questions or support issues to the mailing list:\n\nI<bioperl-l@bioperl.org>\n\nrather than to the module maintainer directly. Many experienced and \nresponsive experts will be able to look at the problem and quickly \naddress it. Please include a thorough description of the problem \nwith code and data examples if at all possible.\n\n=head2 Reporting Bugs\n\nReport bugs to the BioPerl bug tracking system to help us keep track\nof the bugs and their resolution. Bug reports can be submitted via email\nor the web:\n\n  bioperl-bugs@bio.perl.org\n  http://bugzilla.bioperl.org/\n\n=head1 AUTHOR - Florent E Angly\n\nEmail florent_dot_angly_at_gmail_dot_com\n\n=head1 APPENDIX\n\nThe rest of the documentation details each of the object\nmethods. 
Internal methods are usually preceded with a \"_\".\n\n\npackage Bio::Assembly::Tools::ContigSpectrum;\n\nuse strict;\n\nuse Bio::Root::Root;\nuse Bio::Assembly::Scaffold;\nuse Bio::SimpleAlign;\nuse Bio::LocatableSeq;\nuse Bio::Align::PairwiseStatistics;\n\nuse base 'Bio::Root::Root';\n\n\n=head2 new\n\n  Title   : new\n  Usage   : my $csp = Bio::Assembly::Tools::ContigSpectrum->new();\n              or\n            my $csp = Bio::Assembly::Tools::ContigSpectrum->new(\n              -id => 'some_name',\n              -spectrum =>  { 1 => 90 , 2 => 3 , 4 => 1 },\n            );\n              or\n            my $csp = Bio::Assembly::Tools::ContigSpectrum->new(\n              -assembly =>  $assembly_obj\n            );\n  Function: create a new contig spectrum object\n  Returns : reference to a contig spectrum object\n  Args    : none\n\n\nsub new {\n  my ($class, @args) = @_;\n  my $self = $class->SUPER::new(@args);\n  my ( $id, $nof_seq, $nof_rep, $max_size, $nof_overlaps, $min_overlap,\n    $min_identity, $avg_overlap, $avg_identity, $avg_seq_len, $spectrum,\n    $assembly, $eff_asm_params, $dissolve, $cross) = $self->_rearrange( [qw(ID\n    NOF_SEQ NOF_REP MAX_SIZE NOF_OVERLAPS MIN_OVERLAP MIN_IDENTITY AVG_OVERLAP\n    AVG_IDENTITY AVG_SEQ_LEN SPECTRUM ASSEMBLY EFF_ASM_PARAMS DISSOLVE CROSS)],\n    @args );\n\n  # First set up some defauts\n  $self->{'_id'}             = 'NoName';\n  $self->{'_nof_seq'}        = 0;\n  $self->{'_nof_rep'}        = 0;\n  $self->{'_max_size'}       = 0;\n  $self->{'_nof_overlaps'}   = 0;\n  $self->{'_min_overlap'}    = undef;\n  $self->{'_min_identity'}   = undef;\n  $self->{'_avg_overlap'}    = 0;\n  $self->{'_avg_identity'}   = 0;\n  $self->{'_avg_seq_len'}    = 0;\n  $self->{'_eff_asm_params'} = 0;\n  $self->{'_spectrum'}       = {1 => 0};  # contig spectrum hash representation\n  $self->{'_assembly'}       = []; # list of assembly objects used\n\n  # Then, according to user desires, override defaults\n  $self->{'_id'}           
  = $id             if (defined $id);\n  $self->{'_nof_seq'}        = $nof_seq        if (defined $nof_seq);\n  $self->{'_nof_rep'}        = $nof_rep        if (defined $nof_rep);\n  $self->{'_max_size'}       = $max_size       if (defined $max_size);\n  $self->{'_nof_overlaps'}   = $nof_overlaps   if (defined $nof_overlaps);\n  $self->{'_min_overlap'}    = $min_overlap    if (defined $min_overlap);\n  $self->{'_avg_overlap'}    = $avg_overlap    if (defined $avg_overlap);\n  $self->{'_min_identity'}   = $min_identity   if (defined $min_identity);\n  $self->{'_avg_identity'}   = $avg_identity   if (defined $avg_identity);\n  $self->{'_avg_seq_len'}    = $avg_seq_len    if (defined $avg_seq_len);\n  $self->{'_eff_asm_params'} = $eff_asm_params if (defined $eff_asm_params);\n\n  # Finally get stuff that can be gotten in an automated way\n  $self->_import_spectrum($spectrum) if defined($spectrum);\n  $self->_import_assembly($assembly) if defined($assembly);\n  if (defined($dissolve)) {\n    my ($mixed_csp, $header) = (@$dissolve[0], @$dissolve[1]);\n    $self->_import_dissolved_csp($mixed_csp, $header);\n  }\n  $self->_import_cross_csp($cross)   if defined($cross);\n\n  return $self;\n}\n\n\n=head2 id\n\n  Title   : id\n  Usage   : $csp->id\n  Function: get/set contig spectrum id\n  Returns : string\n  Args    : string [optional]\n\n\nsub id {\n  my ($self, $id) = @_;\n  if (defined $id) {\n    $self->{'_id'} = $id;\n  }\n  $id = $self->{'_id'};\n  return $id;\n}\n\n\n=head2 nof_seq\n\n  Title   : nof_seq\n  Usage   : $csp->nof_seq\n  Function: get/set the number of sequences making up the contig spectrum\n  Returns : integer\n  Args    : integer 
[optional]","parameters":[{"label":"$self"},{"label":"$nof_seq"}]},"children":[{"kind":13,"localvar":"my","containerName":"nof_seq","name":"$self","line":309,"definition":"my"},{"line":309,"name":"$nof_seq","kind":13,"containerName":"nof_seq"},{"containerName":"nof_seq","kind":13,"name":"$nof_seq","line":310},{"name":"$self","kind":13,"containerName":"nof_seq","line":311},{"name":"throw","kind":12,"containerName":"nof_seq","line":311},{"line":312,"name":"$nof_seq","kind":13,"containerName":"nof_seq"},{"name":"$self","kind":13,"containerName":"nof_seq","line":313},{"line":313,"kind":13,"containerName":"nof_seq","name":"$nof_seq"},{"kind":13,"containerName":"nof_seq","name":"$nof_seq","line":315},{"name":"$self","kind":13,"containerName":"nof_seq","line":315},{"kind":13,"containerName":"nof_seq","name":"$nof_seq","line":316}],"name":"nof_seq","containerName":"main::","definition":"sub","detail":"($self,$nof_seq)"},{"definition":"sub","detail":"($self,$nof_rep)","children":[{"definition":"my","line":332,"name":"$self","containerName":"nof_rep","localvar":"my","kind":13},{"kind":13,"containerName":"nof_rep","name":"$nof_rep","line":332},{"line":333,"kind":13,"containerName":"nof_rep","name":"$nof_rep"},{"name":"$self","containerName":"nof_rep","kind":13,"line":334},{"kind":12,"containerName":"nof_rep","name":"throw","line":334},{"line":335,"name":"$nof_rep","containerName":"nof_rep","kind":13},{"line":336,"kind":13,"containerName":"nof_rep","name":"$self"},{"name":"$nof_rep","kind":13,"containerName":"nof_rep","line":336},{"name":"$nof_rep","kind":13,"containerName":"nof_rep","line":338},{"name":"$self","containerName":"nof_rep","kind":13,"line":338},{"line":339,"name":"$nof_rep","containerName":"nof_rep","kind":13}],"containerName":"main::","name":"nof_rep","signature":{"label":"nof_rep($self,$nof_rep)","documentation":"__END__\n#\n# BioPerl module for Bio::Assembly::Tools::ContigSpectrum\n#\n# Copyright by Florent Angly\n#\n# You may distribute this module under the 
same terms as Perl itself\n#\n# POD documentation - main docs before the code\n\n=head1 NAME\n\nBio::Assembly::Tools::ContigSpectrum - create and manipulate contig spectra\n\n=head1 SYNOPSIS\n\n  # Simple contig spectrum creation\n  my $csp1 = Bio::Assembly::Tools::ContigSpectrum->new(\n    -id       => 'csp1',\n    -spectrum => { 1 => 10,\n                   2 => 2,\n                   3 => 1 } );\n\n  # ...or another way to create a simple contig spectrum\n  my $csp2 = Bio::Assembly::Tools::ContigSpectrum->new;\n  $csp2->id('csp2');\n  $csp2->spectrum({ 1 => 20, 2 => 1, 4 => 1 });\n\n  # Get some information\n  print \"This is contig spectrum \".$csp->id.\"\\n\";\n  print \"It contains \".$csp->nof_seq.\" sequences\\n\";\n  print \"The largest contig has \".$csp->max_size.\" sequences\\n\";\n  print \"The spectrum is: \".$csp->to_string($csp->spectrum).\"\\n\";\n\n  # Let's add the contig spectra\n  my $summed_csp = Bio::Assembly::Tools::ContigSpectrum->new;\n  $summed_csp->add($csp1);\n  $summed_csp->add($csp2);\n  print \"The summed contig spectrum is \".$summed_csp->to_string.\"\\n\";\n\n  # Make an average\n  my $avg_csp = Bio::Assembly::Tools::ContigSpectrum->new;\n  $avg_csp = $avg_csp->average([$csp1, $csp2]);\n  print \"The average contig spectrum is \".$avg_csp->to_string.\"\\n\";\n\n  # Get a contig spectrum from an assembly\n  my $from_assembly = Bio::Assembly::Tools::ContigSpectrum->new(\n    -assembly       => $assembly_object,\n    -eff_asm_params => 1);\n  print \"The contig spectrum from assembly is \".$from_assembly->to_string.\"\\n\";\n\n  # Report advanced information (possible because eff_asm_params = 1)\n  print \"Average sequence length: \".$from_assembly->avg_seq_length.\" bp\\n\";\n  print \"Minimum overlap length: \".$from_assembly->min_overlap.\" bp\\n\";\n  print \"Average overlap length: \".$from_assembly->avg_overlap.\" bp\\n\";\n  print \"Minimum overlap match: \".$from_assembly->min_identity.\" %\\n\";\n  print \"Average overlap 
match: \".$from_assembly->avg_identity.\" %\\n\";\n\n  # Assuming the assembly object contains sequences from several different\n  # metagenomes, we have a mixed contig spectrum from which a cross contig\n  # spectrum and dissolved contig spectra can be obtained\n  my $mixed_csp = $from_assembly;\n\n  # Calculate a dissolved contig spectrum\n  my $meta1_dissolved = Bio::Assembly::Tools::ContigSpectrum->new(\n    -dissolve => [$mixed_csp, 'metagenome1'] );\n  my $meta2_dissolved = Bio::Assembly::Tools::ContigSpectrum->new(\n    -dissolve => [$mixed_csp, 'metagenome2'] );\n  print \"The dissolved contig spectra are:\\n\".\n    $meta1_dissolved->to_string.\"\\n\".\n    $meta2_dissolved->to_string.\"\\n\";\n\n  # Determine a cross contig spectrum\n  my $cross_csp = Bio::Assembly::Tools::ContigSpectrum->new(\n    -cross => $mixed_csp );\n  print \"The cross contig spectrum is \".$cross_csp->to_string.\"\\n\";\n\n  # Score a contig spectrum (the more abundant the contigs and the larger their\n  # size, the larger the score)\n\n\n=head1 DESCRIPTION\n\nThe Bio::Assembly::Tools::ContigSpectrum Perl module enables to\nmanually create contig spectra, import them from assemblies,\nmanipulate them, transform between different types of contig spectra\nand output them.\n\nBio::Assembly::Tools::ContigSpectrum is a module to create, manipulate\nand output contig spectra, assembly-derived data used in metagenomics\n(community genomics) for diversity estimation.\n\n=head2 Background\n\nA contig spectrum is the count of the number of contigs of different\nsize in an assembly. For example, the contig spectrum [100 5 1 0 0\n...] means that there were 100 singlets (1-contigs), 5 contigs of 2\nsequences (2-contigs), 1 contig of 3 sequences (3-contig) and no\nlarger contigs.\n\nAn assembly can be produced from a mixture of sequences from different\nmetagenomes. The contig obtained from this assembly is a mixed contig\nspectrum. 
The contribution of each metagenome in this mixed contig\nspectrum can be obtained by determining a dissolved contig spectrum.\n\nFinally, based on a mixed contig spectrum, a cross contig spectrum can\nbe determined. In a cross contig spectrum, only contigs containing\nsequences from different metagenomes are kept; \"pure\" contigs are\nexcluded. Additionally, the total number of singletons (1-contigs)\nfrom each region that assembles with any fragments from other regions\nis the number of 1-contigs in the cross contig spectrum.\n\n=head2 Implementation\n\nThe simplest representation of a contig spectrum is as a hash\nrepresentation where the key is the contig size (number of sequences\nmaking up the contig) and the value the number of contigs of this\nsize.\n\nIn fact, it is useful to have more information associated with the\ncontig spectrum, hence the Bio::Assembly::Tools::ContigSpectrum module\nimplements an object containing a contig spectrum hash and additional\ninformation. The get/set methods to access them are:\n\n    id              contig spectrum ID\n    nof_seq         number of sequences\n    nof_rep         number of repetitions (assemblies) used\n    max_size        size of (number of sequences in) the largest contig\n    nof_overlaps    number of overlaps\n    min_overlap     minimum overlap length for building a contig\n    min_identity    minimum sequence identity over the overlap length\n    avg_overlap     average overlap length\n    avg_identity    average overlap identity\n    avg_seq_length  average sequence length\n    eff_asm_params  effective assembly parameters\n    spectrum        hash representation of a contig spectrum\n\n  Operations on the contig spectra:\n\n    to_string       create a string representation of the spectrum\n    spectrum        import a hash contig spectrum\n    assembly        determine a contig spectrum from an assembly\n    dissolve        calculate a dissolved contig spectrum (based on assembly)\n    cross       
    produce a cross contig spectrum (based on assembly)\n    add             add a contig spectrum to an existing one\n    average         make an average of several contig spectra\n\nWhen using operations that rely on knowing \"where\" (from what\nmetagenomes) a sequence came from (i.e. when creating a dissolved or\ncross contig spectrum), make sure that the sequences used for the\nassembly have a name header, e.g.  E<gt>metagenome1|seq1,\nE<gt>metagenome2|seq1, ...\n\n=head1 FEEDBACK\n\n=head2 Mailing Lists\n\nUser feedback is an integral part of the evolution of this and other\nBioperl modules. Send your comments and suggestions preferably to the\nBioperl mailing lists. Your participation is much appreciated.\n\n  bioperl-l@bioperl.org                  - General discussion\n  http://bioperl.org/wiki/Mailing_lists  - About the mailing lists\n\n\n=head2 Support \n\nPlease direct usage questions or support issues to the mailing list:\n\nI<bioperl-l@bioperl.org>\n\nrather than to the module maintainer directly. Many experienced and \nresponsive experts will be able to look at the problem and quickly \naddress it. Please include a thorough description of the problem \nwith code and data examples if at all possible.\n\n=head2 Reporting Bugs\n\nReport bugs to the BioPerl bug tracking system to help us keep track\nof the bugs and their resolution. Bug reports can be submitted via email\nor the web:\n\n  bioperl-bugs@bio.perl.org\n  http://bugzilla.bioperl.org/\n\n=head1 AUTHOR - Florent E Angly\n\nEmail florent_dot_angly_at_gmail_dot_com\n\n=head1 APPENDIX\n\nThe rest of the documentation details each of the object\nmethods. 
Internal methods are usually preceded with a \"_\".\n\n\npackage Bio::Assembly::Tools::ContigSpectrum;\n\nuse strict;\n\nuse Bio::Root::Root;\nuse Bio::Assembly::Scaffold;\nuse Bio::SimpleAlign;\nuse Bio::LocatableSeq;\nuse Bio::Align::PairwiseStatistics;\n\nuse base 'Bio::Root::Root';\n\n\n=head2 new\n\n  Title   : new\n  Usage   : my $csp = Bio::Assembly::Tools::ContigSpectrum->new();\n              or\n            my $csp = Bio::Assembly::Tools::ContigSpectrum->new(\n              -id => 'some_name',\n              -spectrum =>  { 1 => 90 , 2 => 3 , 4 => 1 },\n            );\n              or\n            my $csp = Bio::Assembly::Tools::ContigSpectrum->new(\n              -assembly =>  $assembly_obj\n            );\n  Function: create a new contig spectrum object\n  Returns : reference to a contig spectrum object\n  Args    : none\n\n\nsub new {\n  my ($class, @args) = @_;\n  my $self = $class->SUPER::new(@args);\n  my ( $id, $nof_seq, $nof_rep, $max_size, $nof_overlaps, $min_overlap,\n    $min_identity, $avg_overlap, $avg_identity, $avg_seq_len, $spectrum,\n    $assembly, $eff_asm_params, $dissolve, $cross) = $self->_rearrange( [qw(ID\n    NOF_SEQ NOF_REP MAX_SIZE NOF_OVERLAPS MIN_OVERLAP MIN_IDENTITY AVG_OVERLAP\n    AVG_IDENTITY AVG_SEQ_LEN SPECTRUM ASSEMBLY EFF_ASM_PARAMS DISSOLVE CROSS)],\n    @args );\n\n  # First set up some defauts\n  $self->{'_id'}             = 'NoName';\n  $self->{'_nof_seq'}        = 0;\n  $self->{'_nof_rep'}        = 0;\n  $self->{'_max_size'}       = 0;\n  $self->{'_nof_overlaps'}   = 0;\n  $self->{'_min_overlap'}    = undef;\n  $self->{'_min_identity'}   = undef;\n  $self->{'_avg_overlap'}    = 0;\n  $self->{'_avg_identity'}   = 0;\n  $self->{'_avg_seq_len'}    = 0;\n  $self->{'_eff_asm_params'} = 0;\n  $self->{'_spectrum'}       = {1 => 0};  # contig spectrum hash representation\n  $self->{'_assembly'}       = []; # list of assembly objects used\n\n  # Then, according to user desires, override defaults\n  $self->{'_id'}           
  = $id             if (defined $id);\n  $self->{'_nof_seq'}        = $nof_seq        if (defined $nof_seq);\n  $self->{'_nof_rep'}        = $nof_rep        if (defined $nof_rep);\n  $self->{'_max_size'}       = $max_size       if (defined $max_size);\n  $self->{'_nof_overlaps'}   = $nof_overlaps   if (defined $nof_overlaps);\n  $self->{'_min_overlap'}    = $min_overlap    if (defined $min_overlap);\n  $self->{'_avg_overlap'}    = $avg_overlap    if (defined $avg_overlap);\n  $self->{'_min_identity'}   = $min_identity   if (defined $min_identity);\n  $self->{'_avg_identity'}   = $avg_identity   if (defined $avg_identity);\n  $self->{'_avg_seq_len'}    = $avg_seq_len    if (defined $avg_seq_len);\n  $self->{'_eff_asm_params'} = $eff_asm_params if (defined $eff_asm_params);\n\n  # Finally get stuff that can be gotten in an automated way\n  $self->_import_spectrum($spectrum) if defined($spectrum);\n  $self->_import_assembly($assembly) if defined($assembly);\n  if (defined($dissolve)) {\n    my ($mixed_csp, $header) = (@$dissolve[0], @$dissolve[1]);\n    $self->_import_dissolved_csp($mixed_csp, $header);\n  }\n  $self->_import_cross_csp($cross)   if defined($cross);\n\n  return $self;\n}\n\n\n=head2 id\n\n  Title   : id\n  Usage   : $csp->id\n  Function: get/set contig spectrum id\n  Returns : string\n  Args    : string [optional]\n\n\nsub id {\n  my ($self, $id) = @_;\n  if (defined $id) {\n    $self->{'_id'} = $id;\n  }\n  $id = $self->{'_id'};\n  return $id;\n}\n\n\n=head2 nof_seq\n\n  Title   : nof_seq\n  Usage   : $csp->nof_seq\n  Function: get/set the number of sequences making up the contig spectrum\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub nof_seq {\n  my ($self, $nof_seq) = @_;\n  if (defined $nof_seq) {\n    $self->throw(\"The number of sequences must be strictly positive. 
Got \".\n      \"'$nof_seq'\") if $nof_seq < 1;\n    $self->{'_nof_seq'} = $nof_seq;\n  }\n  $nof_seq = $self->{'_nof_seq'};\n  return $nof_seq;\n}\n\n\n=head2 nof_rep\n\n  Title   : nof_rep\n  Usage   : $csp->nof_rep\n  Function: Get/Set the number of repetitions (assemblies) used to create the \n            contig spectrum\n  Returns : integer\n  Args    : integer [optional]","parameters":[{"label":"$self"},{"label":"$nof_rep"}]},"line":331,"range":{"end":{"line":340,"character":9999},"start":{"character":0,"line":331}},"kind":12},{"detail":"($self,$max_size)","definition":"sub","name":"max_size","containerName":"main::","children":[{"definition":"my","name":"$self","localvar":"my","kind":13,"containerName":"max_size","line":354},{"line":354,"name":"$max_size","kind":13,"containerName":"max_size"},{"line":355,"containerName":"max_size","kind":13,"name":"$max_size"},{"name":"$self","kind":13,"containerName":"max_size","line":356},{"kind":12,"containerName":"max_size","name":"throw","line":356},{"line":357,"containerName":"max_size","kind":13,"name":"$max_size"},{"line":358,"name":"$self","kind":13,"containerName":"max_size"},{"line":358,"containerName":"max_size","kind":13,"name":"$max_size"},{"name":"$max_size","containerName":"max_size","kind":13,"line":360},{"name":"$self","containerName":"max_size","kind":13,"line":360},{"kind":13,"containerName":"max_size","name":"$max_size","line":361}],"signature":{"documentation":"__END__\n#\n# BioPerl module for Bio::Assembly::Tools::ContigSpectrum\n#\n# Copyright by Florent Angly\n#\n# You may distribute this module under the same terms as Perl itself\n#\n# POD documentation - main docs before the code\n\n=head1 NAME\n\nBio::Assembly::Tools::ContigSpectrum - create and manipulate contig spectra\n\n=head1 SYNOPSIS\n\n  # Simple contig spectrum creation\n  my $csp1 = Bio::Assembly::Tools::ContigSpectrum->new(\n    -id       => 'csp1',\n    -spectrum => { 1 => 10,\n                   2 => 2,\n                   3 => 1 } 
);\n\n  # ...or another way to create a simple contig spectrum\n  my $csp2 = Bio::Assembly::Tools::ContigSpectrum->new;\n  $csp2->id('csp2');\n  $csp2->spectrum({ 1 => 20, 2 => 1, 4 => 1 });\n\n  # Get some information\n  print \"This is contig spectrum \".$csp->id.\"\\n\";\n  print \"It contains \".$csp->nof_seq.\" sequences\\n\";\n  print \"The largest contig has \".$csp->max_size.\" sequences\\n\";\n  print \"The spectrum is: \".$csp->to_string($csp->spectrum).\"\\n\";\n\n  # Let's add the contig spectra\n  my $summed_csp = Bio::Assembly::Tools::ContigSpectrum->new;\n  $summed_csp->add($csp1);\n  $summed_csp->add($csp2);\n  print \"The summed contig spectrum is \".$summed_csp->to_string.\"\\n\";\n\n  # Make an average\n  my $avg_csp = Bio::Assembly::Tools::ContigSpectrum->new;\n  $avg_csp = $avg_csp->average([$csp1, $csp2]);\n  print \"The average contig spectrum is \".$avg_csp->to_string.\"\\n\";\n\n  # Get a contig spectrum from an assembly\n  my $from_assembly = Bio::Assembly::Tools::ContigSpectrum->new(\n    -assembly       => $assembly_object,\n    -eff_asm_params => 1);\n  print \"The contig spectrum from assembly is \".$from_assembly->to_string.\"\\n\";\n\n  # Report advanced information (possible because eff_asm_params = 1)\n  print \"Average sequence length: \".$from_assembly->avg_seq_length.\" bp\\n\";\n  print \"Minimum overlap length: \".$from_assembly->min_overlap.\" bp\\n\";\n  print \"Average overlap length: \".$from_assembly->avg_overlap.\" bp\\n\";\n  print \"Minimum overlap match: \".$from_assembly->min_identity.\" %\\n\";\n  print \"Average overlap match: \".$from_assembly->avg_identity.\" %\\n\";\n\n  # Assuming the assembly object contains sequences from several different\n  # metagenomes, we have a mixed contig spectrum from which a cross contig\n  # spectrum and dissolved contig spectra can be obtained\n  my $mixed_csp = $from_assembly;\n\n  # Calculate a dissolved contig spectrum\n  my $meta1_dissolved = 
Bio::Assembly::Tools::ContigSpectrum->new(\n    -dissolve => [$mixed_csp, 'metagenome1'] );\n  my $meta2_dissolved = Bio::Assembly::Tools::ContigSpectrum->new(\n    -dissolve => [$mixed_csp, 'metagenome2'] );\n  print \"The dissolved contig spectra are:\\n\".\n    $meta1_dissolved->to_string.\"\\n\".\n    $meta2_dissolved->to_string.\"\\n\";\n\n  # Determine a cross contig spectrum\n  my $cross_csp = Bio::Assembly::Tools::ContigSpectrum->new(\n    -cross => $mixed_csp );\n  print \"The cross contig spectrum is \".$cross_csp->to_string.\"\\n\";\n\n  # Score a contig spectrum (the more abundant the contigs and the larger their\n  # size, the larger the score)\n\n\n=head1 DESCRIPTION\n\nThe Bio::Assembly::Tools::ContigSpectrum Perl module enables to\nmanually create contig spectra, import them from assemblies,\nmanipulate them, transform between different types of contig spectra\nand output them.\n\nBio::Assembly::Tools::ContigSpectrum is a module to create, manipulate\nand output contig spectra, assembly-derived data used in metagenomics\n(community genomics) for diversity estimation.\n\n=head2 Background\n\nA contig spectrum is the count of the number of contigs of different\nsize in an assembly. For example, the contig spectrum [100 5 1 0 0\n...] means that there were 100 singlets (1-contigs), 5 contigs of 2\nsequences (2-contigs), 1 contig of 3 sequences (3-contig) and no\nlarger contigs.\n\nAn assembly can be produced from a mixture of sequences from different\nmetagenomes. The contig obtained from this assembly is a mixed contig\nspectrum. The contribution of each metagenome in this mixed contig\nspectrum can be obtained by determining a dissolved contig spectrum.\n\nFinally, based on a mixed contig spectrum, a cross contig spectrum can\nbe determined. In a cross contig spectrum, only contigs containing\nsequences from different metagenomes are kept; \"pure\" contigs are\nexcluded. 
Additionally, the total number of singletons (1-contigs)\nfrom each region that assembles with any fragments from other regions\nis the number of 1-contigs in the cross contig spectrum.\n\n=head2 Implementation\n\nThe simplest representation of a contig spectrum is as a hash\nrepresentation where the key is the contig size (number of sequences\nmaking up the contig) and the value the number of contigs of this\nsize.\n\nIn fact, it is useful to have more information associated with the\ncontig spectrum, hence the Bio::Assembly::Tools::ContigSpectrum module\nimplements an object containing a contig spectrum hash and additional\ninformation. The get/set methods to access them are:\n\n    id              contig spectrum ID\n    nof_seq         number of sequences\n    nof_rep         number of repetitions (assemblies) used\n    max_size        size of (number of sequences in) the largest contig\n    nof_overlaps    number of overlaps\n    min_overlap     minimum overlap length for building a contig\n    min_identity    minimum sequence identity over the overlap length\n    avg_overlap     average overlap length\n    avg_identity    average overlap identity\n    avg_seq_length  average sequence length\n    eff_asm_params  effective assembly parameters\n    spectrum        hash representation of a contig spectrum\n\n  Operations on the contig spectra:\n\n    to_string       create a string representation of the spectrum\n    spectrum        import a hash contig spectrum\n    assembly        determine a contig spectrum from an assembly\n    dissolve        calculate a dissolved contig spectrum (based on assembly)\n    cross           produce a cross contig spectrum (based on assembly)\n    add             add a contig spectrum to an existing one\n    average         make an average of several contig spectra\n\nWhen using operations that rely on knowing \"where\" (from what\nmetagenomes) a sequence came from (i.e. 
when creating a dissolved or\ncross contig spectrum), make sure that the sequences used for the\nassembly have a name header, e.g.  E<gt>metagenome1|seq1,\nE<gt>metagenome2|seq1, ...\n\n=head1 FEEDBACK\n\n=head2 Mailing Lists\n\nUser feedback is an integral part of the evolution of this and other\nBioperl modules. Send your comments and suggestions preferably to the\nBioperl mailing lists. Your participation is much appreciated.\n\n  bioperl-l@bioperl.org                  - General discussion\n  http://bioperl.org/wiki/Mailing_lists  - About the mailing lists\n\n\n=head2 Support \n\nPlease direct usage questions or support issues to the mailing list:\n\nI<bioperl-l@bioperl.org>\n\nrather than to the module maintainer directly. Many experienced and \nresponsive experts will be able to look at the problem and quickly \naddress it. Please include a thorough description of the problem \nwith code and data examples if at all possible.\n\n=head2 Reporting Bugs\n\nReport bugs to the BioPerl bug tracking system to help us keep track\nof the bugs and their resolution. Bug reports can be submitted via email\nor the web:\n\n  bioperl-bugs@bio.perl.org\n  http://bugzilla.bioperl.org/\n\n=head1 AUTHOR - Florent E Angly\n\nEmail florent_dot_angly_at_gmail_dot_com\n\n=head1 APPENDIX\n\nThe rest of the documentation details each of the object\nmethods. 
Internal methods are usually preceded with a \"_\".\n\n\npackage Bio::Assembly::Tools::ContigSpectrum;\n\nuse strict;\n\nuse Bio::Root::Root;\nuse Bio::Assembly::Scaffold;\nuse Bio::SimpleAlign;\nuse Bio::LocatableSeq;\nuse Bio::Align::PairwiseStatistics;\n\nuse base 'Bio::Root::Root';\n\n\n=head2 new\n\n  Title   : new\n  Usage   : my $csp = Bio::Assembly::Tools::ContigSpectrum->new();\n              or\n            my $csp = Bio::Assembly::Tools::ContigSpectrum->new(\n              -id => 'some_name',\n              -spectrum =>  { 1 => 90 , 2 => 3 , 4 => 1 },\n            );\n              or\n            my $csp = Bio::Assembly::Tools::ContigSpectrum->new(\n              -assembly =>  $assembly_obj\n            );\n  Function: create a new contig spectrum object\n  Returns : reference to a contig spectrum object\n  Args    : none\n\n\nsub new {\n  my ($class, @args) = @_;\n  my $self = $class->SUPER::new(@args);\n  my ( $id, $nof_seq, $nof_rep, $max_size, $nof_overlaps, $min_overlap,\n    $min_identity, $avg_overlap, $avg_identity, $avg_seq_len, $spectrum,\n    $assembly, $eff_asm_params, $dissolve, $cross) = $self->_rearrange( [qw(ID\n    NOF_SEQ NOF_REP MAX_SIZE NOF_OVERLAPS MIN_OVERLAP MIN_IDENTITY AVG_OVERLAP\n    AVG_IDENTITY AVG_SEQ_LEN SPECTRUM ASSEMBLY EFF_ASM_PARAMS DISSOLVE CROSS)],\n    @args );\n\n  # First set up some defauts\n  $self->{'_id'}             = 'NoName';\n  $self->{'_nof_seq'}        = 0;\n  $self->{'_nof_rep'}        = 0;\n  $self->{'_max_size'}       = 0;\n  $self->{'_nof_overlaps'}   = 0;\n  $self->{'_min_overlap'}    = undef;\n  $self->{'_min_identity'}   = undef;\n  $self->{'_avg_overlap'}    = 0;\n  $self->{'_avg_identity'}   = 0;\n  $self->{'_avg_seq_len'}    = 0;\n  $self->{'_eff_asm_params'} = 0;\n  $self->{'_spectrum'}       = {1 => 0};  # contig spectrum hash representation\n  $self->{'_assembly'}       = []; # list of assembly objects used\n\n  # Then, according to user desires, override defaults\n  $self->{'_id'}           
  = $id             if (defined $id);\n  $self->{'_nof_seq'}        = $nof_seq        if (defined $nof_seq);\n  $self->{'_nof_rep'}        = $nof_rep        if (defined $nof_rep);\n  $self->{'_max_size'}       = $max_size       if (defined $max_size);\n  $self->{'_nof_overlaps'}   = $nof_overlaps   if (defined $nof_overlaps);\n  $self->{'_min_overlap'}    = $min_overlap    if (defined $min_overlap);\n  $self->{'_avg_overlap'}    = $avg_overlap    if (defined $avg_overlap);\n  $self->{'_min_identity'}   = $min_identity   if (defined $min_identity);\n  $self->{'_avg_identity'}   = $avg_identity   if (defined $avg_identity);\n  $self->{'_avg_seq_len'}    = $avg_seq_len    if (defined $avg_seq_len);\n  $self->{'_eff_asm_params'} = $eff_asm_params if (defined $eff_asm_params);\n\n  # Finally get stuff that can be gotten in an automated way\n  $self->_import_spectrum($spectrum) if defined($spectrum);\n  $self->_import_assembly($assembly) if defined($assembly);\n  if (defined($dissolve)) {\n    my ($mixed_csp, $header) = (@$dissolve[0], @$dissolve[1]);\n    $self->_import_dissolved_csp($mixed_csp, $header);\n  }\n  $self->_import_cross_csp($cross)   if defined($cross);\n\n  return $self;\n}\n\n\n=head2 id\n\n  Title   : id\n  Usage   : $csp->id\n  Function: get/set contig spectrum id\n  Returns : string\n  Args    : string [optional]\n\n\nsub id {\n  my ($self, $id) = @_;\n  if (defined $id) {\n    $self->{'_id'} = $id;\n  }\n  $id = $self->{'_id'};\n  return $id;\n}\n\n\n=head2 nof_seq\n\n  Title   : nof_seq\n  Usage   : $csp->nof_seq\n  Function: get/set the number of sequences making up the contig spectrum\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub nof_seq {\n  my ($self, $nof_seq) = @_;\n  if (defined $nof_seq) {\n    $self->throw(\"The number of sequences must be strictly positive. 
Got \".\n      \"'$nof_seq'\") if $nof_seq < 1;\n    $self->{'_nof_seq'} = $nof_seq;\n  }\n  $nof_seq = $self->{'_nof_seq'};\n  return $nof_seq;\n}\n\n\n=head2 nof_rep\n\n  Title   : nof_rep\n  Usage   : $csp->nof_rep\n  Function: Get/Set the number of repetitions (assemblies) used to create the \n            contig spectrum\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub nof_rep {\n  my ($self, $nof_rep) = @_;\n  if (defined $nof_rep) {\n    $self->throw(\"The number of repetitions must be strictly positive. Got \".\n      \"'$nof_rep'\") if $nof_rep < 1;\n    $self->{'_nof_rep'} = $nof_rep;\n  }\n  $nof_rep = $self->{'_nof_rep'};\n  return $nof_rep;\n}\n\n\n=head2 max_size\n\n  Title   : max_size\n  Usage   : $csp->max_size\n  Function: get/set the size of (number of sequences in) the largest contig\n  Returns : integer\n  Args    : integer [optional]","parameters":[{"label":"$self"},{"label":"$max_size"}],"label":"max_size($self,$max_size)"},"kind":12,"range":{"start":{"character":0,"line":353},"end":{"character":9999,"line":362}},"line":353},{"line":375,"range":{"end":{"line":384,"character":9999},"start":{"line":375,"character":0}},"kind":12,"signature":{"documentation":"__END__\n#\n# BioPerl module for Bio::Assembly::Tools::ContigSpectrum\n#\n# Copyright by Florent Angly\n#\n# You may distribute this module under the same terms as Perl itself\n#\n# POD documentation - main docs before the code\n\n=head1 NAME\n\nBio::Assembly::Tools::ContigSpectrum - create and manipulate contig spectra\n\n=head1 SYNOPSIS\n\n  # Simple contig spectrum creation\n  my $csp1 = Bio::Assembly::Tools::ContigSpectrum->new(\n    -id       => 'csp1',\n    -spectrum => { 1 => 10,\n                   2 => 2,\n                   3 => 1 } );\n\n  # ...or another way to create a simple contig spectrum\n  my $csp2 = Bio::Assembly::Tools::ContigSpectrum->new;\n  $csp2->id('csp2');\n  $csp2->spectrum({ 1 => 20, 2 => 1, 4 => 1 });\n\n  # Get some information\n  print \"This is 
contig spectrum \".$csp->id.\"\\n\";\n  print \"It contains \".$csp->nof_seq.\" sequences\\n\";\n  print \"The largest contig has \".$csp->max_size.\" sequences\\n\";\n  print \"The spectrum is: \".$csp->to_string($csp->spectrum).\"\\n\";\n\n  # Let's add the contig spectra\n  my $summed_csp = Bio::Assembly::Tools::ContigSpectrum->new;\n  $summed_csp->add($csp1);\n  $summed_csp->add($csp2);\n  print \"The summed contig spectrum is \".$summed_csp->to_string.\"\\n\";\n\n  # Make an average\n  my $avg_csp = Bio::Assembly::Tools::ContigSpectrum->new;\n  $avg_csp = $avg_csp->average([$csp1, $csp2]);\n  print \"The average contig spectrum is \".$avg_csp->to_string.\"\\n\";\n\n  # Get a contig spectrum from an assembly\n  my $from_assembly = Bio::Assembly::Tools::ContigSpectrum->new(\n    -assembly       => $assembly_object,\n    -eff_asm_params => 1);\n  print \"The contig spectrum from assembly is \".$from_assembly->to_string.\"\\n\";\n\n  # Report advanced information (possible because eff_asm_params = 1)\n  print \"Average sequence length: \".$from_assembly->avg_seq_length.\" bp\\n\";\n  print \"Minimum overlap length: \".$from_assembly->min_overlap.\" bp\\n\";\n  print \"Average overlap length: \".$from_assembly->avg_overlap.\" bp\\n\";\n  print \"Minimum overlap match: \".$from_assembly->min_identity.\" %\\n\";\n  print \"Average overlap match: \".$from_assembly->avg_identity.\" %\\n\";\n\n  # Assuming the assembly object contains sequences from several different\n  # metagenomes, we have a mixed contig spectrum from which a cross contig\n  # spectrum and dissolved contig spectra can be obtained\n  my $mixed_csp = $from_assembly;\n\n  # Calculate a dissolved contig spectrum\n  my $meta1_dissolved = Bio::Assembly::Tools::ContigSpectrum->new(\n    -dissolve => [$mixed_csp, 'metagenome1'] );\n  my $meta2_dissolved = Bio::Assembly::Tools::ContigSpectrum->new(\n    -dissolve => [$mixed_csp, 'metagenome2'] );\n  print \"The dissolved contig spectra are:\\n\".\n    
$meta1_dissolved->to_string.\"\\n\".\n    $meta2_dissolved->to_string.\"\\n\";\n\n  # Determine a cross contig spectrum\n  my $cross_csp = Bio::Assembly::Tools::ContigSpectrum->new(\n    -cross => $mixed_csp );\n  print \"The cross contig spectrum is \".$cross_csp->to_string.\"\\n\";\n\n  # Score a contig spectrum (the more abundant the contigs and the larger their\n  # size, the larger the score)\n\n\n=head1 DESCRIPTION\n\nThe Bio::Assembly::Tools::ContigSpectrum Perl module enables to\nmanually create contig spectra, import them from assemblies,\nmanipulate them, transform between different types of contig spectra\nand output them.\n\nBio::Assembly::Tools::ContigSpectrum is a module to create, manipulate\nand output contig spectra, assembly-derived data used in metagenomics\n(community genomics) for diversity estimation.\n\n=head2 Background\n\nA contig spectrum is the count of the number of contigs of different\nsize in an assembly. For example, the contig spectrum [100 5 1 0 0\n...] means that there were 100 singlets (1-contigs), 5 contigs of 2\nsequences (2-contigs), 1 contig of 3 sequences (3-contig) and no\nlarger contigs.\n\nAn assembly can be produced from a mixture of sequences from different\nmetagenomes. The contig obtained from this assembly is a mixed contig\nspectrum. The contribution of each metagenome in this mixed contig\nspectrum can be obtained by determining a dissolved contig spectrum.\n\nFinally, based on a mixed contig spectrum, a cross contig spectrum can\nbe determined. In a cross contig spectrum, only contigs containing\nsequences from different metagenomes are kept; \"pure\" contigs are\nexcluded. 
Additionally, the total number of singletons (1-contigs)\nfrom each region that assembles with any fragments from other regions\nis the number of 1-contigs in the cross contig spectrum.\n\n=head2 Implementation\n\nThe simplest representation of a contig spectrum is as a hash\nrepresentation where the key is the contig size (number of sequences\nmaking up the contig) and the value the number of contigs of this\nsize.\n\nIn fact, it is useful to have more information associated with the\ncontig spectrum, hence the Bio::Assembly::Tools::ContigSpectrum module\nimplements an object containing a contig spectrum hash and additional\ninformation. The get/set methods to access them are:\n\n    id              contig spectrum ID\n    nof_seq         number of sequences\n    nof_rep         number of repetitions (assemblies) used\n    max_size        size of (number of sequences in) the largest contig\n    nof_overlaps    number of overlaps\n    min_overlap     minimum overlap length for building a contig\n    min_identity    minimum sequence identity over the overlap length\n    avg_overlap     average overlap length\n    avg_identity    average overlap identity\n    avg_seq_length  average sequence length\n    eff_asm_params  effective assembly parameters\n    spectrum        hash representation of a contig spectrum\n\n  Operations on the contig spectra:\n\n    to_string       create a string representation of the spectrum\n    spectrum        import a hash contig spectrum\n    assembly        determine a contig spectrum from an assembly\n    dissolve        calculate a dissolved contig spectrum (based on assembly)\n    cross           produce a cross contig spectrum (based on assembly)\n    add             add a contig spectrum to an existing one\n    average         make an average of several contig spectra\n\nWhen using operations that rely on knowing \"where\" (from what\nmetagenomes) a sequence came from (i.e. 
when creating a dissolved or\ncross contig spectrum), make sure that the sequences used for the\nassembly have a name header, e.g.  E<gt>metagenome1|seq1,\nE<gt>metagenome2|seq1, ...\n\n=head1 FEEDBACK\n\n=head2 Mailing Lists\n\nUser feedback is an integral part of the evolution of this and other\nBioperl modules. Send your comments and suggestions preferably to the\nBioperl mailing lists  Your participation is much appreciated.\n\n  bioperl-l@bioperl.org                  - General discussion\n  http://bioperl.org/wiki/Mailing_lists  - About the mailing lists\n\n\n=head2 Support \n\nPlease direct usage questions or support issues to the mailing list:\n\nI<bioperl-l@bioperl.org>\n\nrather than to the module maintainer directly. Many experienced and \nreponsive experts will be able look at the problem and quickly \naddress it. Please include a thorough description of the problem \nwith code and data examples if at all possible.\n\n=head2 Reporting Bugs\n\nReport bugs to the BioPerl bug tracking system to help us keep track\nthe bugs and their resolution. Bug reports can be submitted via email\nor the web:\n\n  bioperl-bugs@bio.perl.org\n  http://bugzilla.bioperl.org/\n\n=head1 AUTHOR - Florent E Angly\n\nEmail florent_dot_angly_at_gmail_dot_com\n\n=head1 APPENDIX\n\nThe rest of the documentation details each of the object\nmethods. 
Internal methods are usually preceded with a \"_\".\n\n\npackage Bio::Assembly::Tools::ContigSpectrum;\n\nuse strict;\n\nuse Bio::Root::Root;\nuse Bio::Assembly::Scaffold;\nuse Bio::SimpleAlign;\nuse Bio::LocatableSeq;\nuse Bio::Align::PairwiseStatistics;\n\nuse base 'Bio::Root::Root';\n\n\n=head2 new\n\n  Title   : new\n  Usage   : my $csp = Bio::Assembly::Tools::ContigSpectrum->new();\n              or\n            my $csp = Bio::Assembly::Tools::ContigSpectrum->new(\n              -id => 'some_name',\n              -spectrum =>  { 1 => 90 , 2 => 3 , 4 => 1 },\n            );\n              or\n            my $csp = Bio::Assembly::Tools::ContigSpectrum->new(\n              -assembly =>  $assembly_obj\n            );\n  Function: create a new contig spectrum object\n  Returns : reference to a contig spectrum object\n  Args    : none\n\n\nsub new {\n  my ($class, @args) = @_;\n  my $self = $class->SUPER::new(@args);\n  my ( $id, $nof_seq, $nof_rep, $max_size, $nof_overlaps, $min_overlap,\n    $min_identity, $avg_overlap, $avg_identity, $avg_seq_len, $spectrum,\n    $assembly, $eff_asm_params, $dissolve, $cross) = $self->_rearrange( [qw(ID\n    NOF_SEQ NOF_REP MAX_SIZE NOF_OVERLAPS MIN_OVERLAP MIN_IDENTITY AVG_OVERLAP\n    AVG_IDENTITY AVG_SEQ_LEN SPECTRUM ASSEMBLY EFF_ASM_PARAMS DISSOLVE CROSS)],\n    @args );\n\n  # First set up some defauts\n  $self->{'_id'}             = 'NoName';\n  $self->{'_nof_seq'}        = 0;\n  $self->{'_nof_rep'}        = 0;\n  $self->{'_max_size'}       = 0;\n  $self->{'_nof_overlaps'}   = 0;\n  $self->{'_min_overlap'}    = undef;\n  $self->{'_min_identity'}   = undef;\n  $self->{'_avg_overlap'}    = 0;\n  $self->{'_avg_identity'}   = 0;\n  $self->{'_avg_seq_len'}    = 0;\n  $self->{'_eff_asm_params'} = 0;\n  $self->{'_spectrum'}       = {1 => 0};  # contig spectrum hash representation\n  $self->{'_assembly'}       = []; # list of assembly objects used\n\n  # Then, according to user desires, override defaults\n  $self->{'_id'}           
  = $id             if (defined $id);\n  $self->{'_nof_seq'}        = $nof_seq        if (defined $nof_seq);\n  $self->{'_nof_rep'}        = $nof_rep        if (defined $nof_rep);\n  $self->{'_max_size'}       = $max_size       if (defined $max_size);\n  $self->{'_nof_overlaps'}   = $nof_overlaps   if (defined $nof_overlaps);\n  $self->{'_min_overlap'}    = $min_overlap    if (defined $min_overlap);\n  $self->{'_avg_overlap'}    = $avg_overlap    if (defined $avg_overlap);\n  $self->{'_min_identity'}   = $min_identity   if (defined $min_identity);\n  $self->{'_avg_identity'}   = $avg_identity   if (defined $avg_identity);\n  $self->{'_avg_seq_len'}    = $avg_seq_len    if (defined $avg_seq_len);\n  $self->{'_eff_asm_params'} = $eff_asm_params if (defined $eff_asm_params);\n\n  # Finally get stuff that can be gotten in an automated way\n  $self->_import_spectrum($spectrum) if defined($spectrum);\n  $self->_import_assembly($assembly) if defined($assembly);\n  if (defined($dissolve)) {\n    my ($mixed_csp, $header) = (@$dissolve[0], @$dissolve[1]);\n    $self->_import_dissolved_csp($mixed_csp, $header);\n  }\n  $self->_import_cross_csp($cross)   if defined($cross);\n\n  return $self;\n}\n\n\n=head2 id\n\n  Title   : id\n  Usage   : $csp->id\n  Function: get/set contig spectrum id\n  Returns : string\n  Args    : string [optional]\n\n\nsub id {\n  my ($self, $id) = @_;\n  if (defined $id) {\n    $self->{'_id'} = $id;\n  }\n  $id = $self->{'_id'};\n  return $id;\n}\n\n\n=head2 nof_seq\n\n  Title   : nof_seq\n  Usage   : $csp->nof_seq\n  Function: get/set the number of sequences making up the contig spectrum\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub nof_seq {\n  my ($self, $nof_seq) = @_;\n  if (defined $nof_seq) {\n    $self->throw(\"The number of sequences must be strictly positive. 
Got \".\n      \"'$nof_seq'\") if $nof_seq < 1;\n    $self->{'_nof_seq'} = $nof_seq;\n  }\n  $nof_seq = $self->{'_nof_seq'};\n  return $nof_seq;\n}\n\n\n=head2 nof_rep\n\n  Title   : nof_rep\n  Usage   : $csp->nof_rep\n  Function: Get/Set the number of repetitions (assemblies) used to create the \n            contig spectrum\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub nof_rep {\n  my ($self, $nof_rep) = @_;\n  if (defined $nof_rep) {\n    $self->throw(\"The number of repetitions must be strictly positive. Got \".\n      \"'$nof_rep'\") if $nof_rep < 1;\n    $self->{'_nof_rep'} = $nof_rep;\n  }\n  $nof_rep = $self->{'_nof_rep'};\n  return $nof_rep;\n}\n\n\n=head2 max_size\n\n  Title   : max_size\n  Usage   : $csp->max_size\n  Function: get/set the size of (number of sequences in) the largest contig\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub max_size {\n  my ($self, $max_size) = @_;\n  if (defined $max_size) {\n    $self->throw(\"The contig maximum size must be strictly positive. 
Got \".\n      \"'$max_size'\") if $max_size < 1;\n    $self->{'_max_size'} = $max_size;\n  }\n  $max_size = $self->{'_max_size'};\n  return $max_size;\n}\n\n\n=head2 nof_overlaps\n\n  Title   : nof_overlaps\n  Usage   : $csp->nof_overlaps\n  Function: Get/Set the number of overlaps in the assembly.\n  Returns : integer\n  Args    : integer [optional]","parameters":[{"label":"$self"},{"label":"$nof_overlaps"}],"label":"nof_overlaps($self,$nof_overlaps)"},"children":[{"definition":"my","line":376,"localvar":"my","kind":13,"containerName":"nof_overlaps","name":"$self"},{"line":376,"name":"$nof_overlaps","kind":13,"containerName":"nof_overlaps"},{"line":377,"name":"$nof_overlaps","containerName":"nof_overlaps","kind":13},{"containerName":"nof_overlaps","kind":13,"name":"$self","line":378},{"line":378,"containerName":"nof_overlaps","kind":12,"name":"throw"},{"line":379,"kind":13,"containerName":"nof_overlaps","name":"$nof_overlaps"},{"line":380,"containerName":"nof_overlaps","kind":13,"name":"$self"},{"line":380,"containerName":"nof_overlaps","kind":13,"name":"$nof_overlaps"},{"line":382,"name":"$nof_overlaps","containerName":"nof_overlaps","kind":13},{"line":382,"name":"$self","kind":13,"containerName":"nof_overlaps"},{"containerName":"nof_overlaps","kind":13,"name":"$nof_overlaps","line":383}],"containerName":"main::","name":"nof_overlaps","definition":"sub","detail":"($self,$nof_overlaps)"},{"definition":"sub","detail":"($self,$min_overlap)","children":[{"definition":"my","line":398,"kind":13,"localvar":"my","containerName":"min_overlap","name":"$self"},{"line":398,"containerName":"min_overlap","kind":13,"name":"$min_overlap"},{"name":"$min_overlap","kind":13,"containerName":"min_overlap","line":399},{"containerName":"min_overlap","kind":13,"name":"$self","line":400},{"line":400,"containerName":"min_overlap","kind":12,"name":"throw"},{"containerName":"min_overlap","kind":13,"name":"$min_overlap","line":401},{"containerName":"min_overlap","kind":13,"name":"$self","lin
e":402},{"kind":13,"containerName":"min_overlap","name":"$min_overlap","line":402},{"name":"$min_overlap","containerName":"min_overlap","kind":13,"line":404},{"kind":13,"containerName":"min_overlap","name":"$self","line":404},{"line":405,"containerName":"min_overlap","kind":13,"name":"$min_overlap"}],"containerName":"main::","name":"min_overlap","signature":{"label":"min_overlap($self,$min_overlap)","documentation":"__END__\n#\n# BioPerl module for Bio::Assembly::Tools::ContigSpectrum\n#\n# Copyright by Florent Angly\n#\n# You may distribute this module under the same terms as Perl itself\n#\n# POD documentation - main docs before the code\n\n=head1 NAME\n\nBio::Assembly::Tools::ContigSpectrum - create and manipulate contig spectra\n\n=head1 SYNOPSIS\n\n  # Simple contig spectrum creation\n  my $csp1 = Bio::Assembly::Tools::ContigSpectrum->new(\n    -id       => 'csp1',\n    -spectrum => { 1 => 10,\n                   2 => 2,\n                   3 => 1 } );\n\n  # ...or another way to create a simple contig spectrum\n  my $csp2 = Bio::Assembly::Tools::ContigSpectrum->new;\n  $csp2->id('csp2');\n  $csp2->spectrum({ 1 => 20, 2 => 1, 4 => 1 });\n\n  # Get some information\n  print \"This is contig spectrum \".$csp->id.\"\\n\";\n  print \"It contains \".$csp->nof_seq.\" sequences\\n\";\n  print \"The largest contig has \".$csp->max_size.\" sequences\\n\";\n  print \"The spectrum is: \".$csp->to_string($csp->spectrum).\"\\n\";\n\n  # Let's add the contig spectra\n  my $summed_csp = Bio::Assembly::Tools::ContigSpectrum->new;\n  $summed_csp->add($csp1);\n  $summed_csp->add($csp2);\n  print \"The summed contig spectrum is \".$summed_csp->to_string.\"\\n\";\n\n  # Make an average\n  my $avg_csp = Bio::Assembly::Tools::ContigSpectrum->new;\n  $avg_csp = $avg_csp->average([$csp1, $csp2]);\n  print \"The average contig spectrum is \".$avg_csp->to_string.\"\\n\";\n\n  # Get a contig spectrum from an assembly\n  my $from_assembly = Bio::Assembly::Tools::ContigSpectrum->new(\n    
-assembly       => $assembly_object,\n    -eff_asm_params => 1);\n  print \"The contig spectrum from assembly is \".$from_assembly->to_string.\"\\n\";\n\n  # Report advanced information (possible because eff_asm_params = 1)\n  print \"Average sequence length: \".$from_assembly->avg_seq_length.\" bp\\n\";\n  print \"Minimum overlap length: \".$from_assembly->min_overlap.\" bp\\n\";\n  print \"Average overlap length: \".$from_assembly->avg_overlap.\" bp\\n\";\n  print \"Minimum overlap match: \".$from_assembly->min_identity.\" %\\n\";\n  print \"Average overlap match: \".$from_assembly->avg_identity.\" %\\n\";\n\n  # Assuming the assembly object contains sequences from several different\n  # metagenomes, we have a mixed contig spectrum from which a cross contig\n  # spectrum and dissolved contig spectra can be obtained\n  my $mixed_csp = $from_assembly;\n\n  # Calculate a dissolved contig spectrum\n  my $meta1_dissolved = Bio::Assembly::Tools::ContigSpectrum->new(\n    -dissolve => [$mixed_csp, 'metagenome1'] );\n  my $meta2_dissolved = Bio::Assembly::Tools::ContigSpectrum->new(\n    -dissolve => [$mixed_csp, 'metagenome2'] );\n  print \"The dissolved contig spectra are:\\n\".\n    $meta1_dissolved->to_string.\"\\n\".\n    $meta2_dissolved->to_string.\"\\n\";\n\n  # Determine a cross contig spectrum\n  my $cross_csp = Bio::Assembly::Tools::ContigSpectrum->new(\n    -cross => $mixed_csp );\n  print \"The cross contig spectrum is \".$cross_csp->to_string.\"\\n\";\n\n  # Score a contig spectrum (the more abundant the contigs and the larger their\n  # size, the larger the score)\n\n\n=head1 DESCRIPTION\n\nThe Bio::Assembly::Tools::ContigSpectrum Perl module enables to\nmanually create contig spectra, import them from assemblies,\nmanipulate them, transform between different types of contig spectra\nand output them.\n\nBio::Assembly::Tools::ContigSpectrum is a module to create, manipulate\nand output contig spectra, assembly-derived data used in metagenomics\n(community 
genomics) for diversity estimation.\n\n=head2 Background\n\nA contig spectrum is the count of the number of contigs of different\nsize in an assembly. For example, the contig spectrum [100 5 1 0 0\n...] means that there were 100 singlets (1-contigs), 5 contigs of 2\nsequences (2-contigs), 1 contig of 3 sequences (3-contig) and no\nlarger contigs.\n\nAn assembly can be produced from a mixture of sequences from different\nmetagenomes. The contig obtained from this assembly is a mixed contig\nspectrum. The contribution of each metagenome in this mixed contig\nspectrum can be obtained by determining a dissolved contig spectrum.\n\nFinally, based on a mixed contig spectrum, a cross contig spectrum can\nbe determined. In a cross contig spectrum, only contigs containing\nsequences from different metagenomes are kept; \"pure\" contigs are\nexcluded. Additionally, the total number of singletons (1-contigs)\nfrom each region that assembles with any fragments from other regions\nis the number of 1-contigs in the cross contig spectrum.\n\n=head2 Implementation\n\nThe simplest representation of a contig spectrum is as a hash\nrepresentation where the key is the contig size (number of sequences\nmaking up the contig) and the value the number of contigs of this\nsize.\n\nIn fact, it is useful to have more information associated with the\ncontig spectrum, hence the Bio::Assembly::Tools::ContigSpectrum module\nimplements an object containing a contig spectrum hash and additional\ninformation. 
The get/set methods to access them are:\n\n    id              contig spectrum ID\n    nof_seq         number of sequences\n    nof_rep         number of repetitions (assemblies) used\n    max_size        size of (number of sequences in) the largest contig\n    nof_overlaps    number of overlaps\n    min_overlap     minimum overlap length for building a contig\n    min_identity    minimum sequence identity over the overlap length\n    avg_overlap     average overlap length\n    avg_identity    average overlap identity\n    avg_seq_length  average sequence length\n    eff_asm_params  effective assembly parameters\n    spectrum        hash representation of a contig spectrum\n\n  Operations on the contig spectra:\n\n    to_string       create a string representation of the spectrum\n    spectrum        import a hash contig spectrum\n    assembly        determine a contig spectrum from an assembly\n    dissolve        calculate a dissolved contig spectrum (based on assembly)\n    cross           produce a cross contig spectrum (based on assembly)\n    add             add a contig spectrum to an existing one\n    average         make an average of several contig spectra\n\nWhen using operations that rely on knowing \"where\" (from what\nmetagenomes) a sequence came from (i.e. when creating a dissolved or\ncross contig spectrum), make sure that the sequences used for the\nassembly have a name header, e.g.  E<gt>metagenome1|seq1,\nE<gt>metagenome2|seq1, ...\n\n=head1 FEEDBACK\n\n=head2 Mailing Lists\n\nUser feedback is an integral part of the evolution of this and other\nBioperl modules. 
Send your comments and suggestions preferably to the\nBioperl mailing lists  Your participation is much appreciated.\n\n  bioperl-l@bioperl.org                  - General discussion\n  http://bioperl.org/wiki/Mailing_lists  - About the mailing lists\n\n\n=head2 Support \n\nPlease direct usage questions or support issues to the mailing list:\n\nI<bioperl-l@bioperl.org>\n\nrather than to the module maintainer directly. Many experienced and \nreponsive experts will be able look at the problem and quickly \naddress it. Please include a thorough description of the problem \nwith code and data examples if at all possible.\n\n=head2 Reporting Bugs\n\nReport bugs to the BioPerl bug tracking system to help us keep track\nthe bugs and their resolution. Bug reports can be submitted via email\nor the web:\n\n  bioperl-bugs@bio.perl.org\n  http://bugzilla.bioperl.org/\n\n=head1 AUTHOR - Florent E Angly\n\nEmail florent_dot_angly_at_gmail_dot_com\n\n=head1 APPENDIX\n\nThe rest of the documentation details each of the object\nmethods. 
Internal methods are usually preceded with a \"_\".\n\n\npackage Bio::Assembly::Tools::ContigSpectrum;\n\nuse strict;\n\nuse Bio::Root::Root;\nuse Bio::Assembly::Scaffold;\nuse Bio::SimpleAlign;\nuse Bio::LocatableSeq;\nuse Bio::Align::PairwiseStatistics;\n\nuse base 'Bio::Root::Root';\n\n\n=head2 new\n\n  Title   : new\n  Usage   : my $csp = Bio::Assembly::Tools::ContigSpectrum->new();\n              or\n            my $csp = Bio::Assembly::Tools::ContigSpectrum->new(\n              -id => 'some_name',\n              -spectrum =>  { 1 => 90 , 2 => 3 , 4 => 1 },\n            );\n              or\n            my $csp = Bio::Assembly::Tools::ContigSpectrum->new(\n              -assembly =>  $assembly_obj\n            );\n  Function: create a new contig spectrum object\n  Returns : reference to a contig spectrum object\n  Args    : none\n\n\nsub new {\n  my ($class, @args) = @_;\n  my $self = $class->SUPER::new(@args);\n  my ( $id, $nof_seq, $nof_rep, $max_size, $nof_overlaps, $min_overlap,\n    $min_identity, $avg_overlap, $avg_identity, $avg_seq_len, $spectrum,\n    $assembly, $eff_asm_params, $dissolve, $cross) = $self->_rearrange( [qw(ID\n    NOF_SEQ NOF_REP MAX_SIZE NOF_OVERLAPS MIN_OVERLAP MIN_IDENTITY AVG_OVERLAP\n    AVG_IDENTITY AVG_SEQ_LEN SPECTRUM ASSEMBLY EFF_ASM_PARAMS DISSOLVE CROSS)],\n    @args );\n\n  # First set up some defauts\n  $self->{'_id'}             = 'NoName';\n  $self->{'_nof_seq'}        = 0;\n  $self->{'_nof_rep'}        = 0;\n  $self->{'_max_size'}       = 0;\n  $self->{'_nof_overlaps'}   = 0;\n  $self->{'_min_overlap'}    = undef;\n  $self->{'_min_identity'}   = undef;\n  $self->{'_avg_overlap'}    = 0;\n  $self->{'_avg_identity'}   = 0;\n  $self->{'_avg_seq_len'}    = 0;\n  $self->{'_eff_asm_params'} = 0;\n  $self->{'_spectrum'}       = {1 => 0};  # contig spectrum hash representation\n  $self->{'_assembly'}       = []; # list of assembly objects used\n\n  # Then, according to user desires, override defaults\n  $self->{'_id'}           
  = $id             if (defined $id);\n  $self->{'_nof_seq'}        = $nof_seq        if (defined $nof_seq);\n  $self->{'_nof_rep'}        = $nof_rep        if (defined $nof_rep);\n  $self->{'_max_size'}       = $max_size       if (defined $max_size);\n  $self->{'_nof_overlaps'}   = $nof_overlaps   if (defined $nof_overlaps);\n  $self->{'_min_overlap'}    = $min_overlap    if (defined $min_overlap);\n  $self->{'_avg_overlap'}    = $avg_overlap    if (defined $avg_overlap);\n  $self->{'_min_identity'}   = $min_identity   if (defined $min_identity);\n  $self->{'_avg_identity'}   = $avg_identity   if (defined $avg_identity);\n  $self->{'_avg_seq_len'}    = $avg_seq_len    if (defined $avg_seq_len);\n  $self->{'_eff_asm_params'} = $eff_asm_params if (defined $eff_asm_params);\n\n  # Finally get stuff that can be gotten in an automated way\n  $self->_import_spectrum($spectrum) if defined($spectrum);\n  $self->_import_assembly($assembly) if defined($assembly);\n  if (defined($dissolve)) {\n    my ($mixed_csp, $header) = (@$dissolve[0], @$dissolve[1]);\n    $self->_import_dissolved_csp($mixed_csp, $header);\n  }\n  $self->_import_cross_csp($cross)   if defined($cross);\n\n  return $self;\n}\n\n\n=head2 id\n\n  Title   : id\n  Usage   : $csp->id\n  Function: get/set contig spectrum id\n  Returns : string\n  Args    : string [optional]\n\n\nsub id {\n  my ($self, $id) = @_;\n  if (defined $id) {\n    $self->{'_id'} = $id;\n  }\n  $id = $self->{'_id'};\n  return $id;\n}\n\n\n=head2 nof_seq\n\n  Title   : nof_seq\n  Usage   : $csp->nof_seq\n  Function: get/set the number of sequences making up the contig spectrum\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub nof_seq {\n  my ($self, $nof_seq) = @_;\n  if (defined $nof_seq) {\n    $self->throw(\"The number of sequences must be strictly positive. 
Got \".\n      \"'$nof_seq'\") if $nof_seq < 1;\n    $self->{'_nof_seq'} = $nof_seq;\n  }\n  $nof_seq = $self->{'_nof_seq'};\n  return $nof_seq;\n}\n\n\n=head2 nof_rep\n\n  Title   : nof_rep\n  Usage   : $csp->nof_rep\n  Function: Get/Set the number of repetitions (assemblies) used to create the \n            contig spectrum\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub nof_rep {\n  my ($self, $nof_rep) = @_;\n  if (defined $nof_rep) {\n    $self->throw(\"The number of repetitions must be strictly positive. Got \".\n      \"'$nof_rep'\") if $nof_rep < 1;\n    $self->{'_nof_rep'} = $nof_rep;\n  }\n  $nof_rep = $self->{'_nof_rep'};\n  return $nof_rep;\n}\n\n\n=head2 max_size\n\n  Title   : max_size\n  Usage   : $csp->max_size\n  Function: get/set the size of (number of sequences in) the largest contig\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub max_size {\n  my ($self, $max_size) = @_;\n  if (defined $max_size) {\n    $self->throw(\"The contig maximum size must be strictly positive. Got \".\n      \"'$max_size'\") if $max_size < 1;\n    $self->{'_max_size'} = $max_size;\n  }\n  $max_size = $self->{'_max_size'};\n  return $max_size;\n}\n\n\n=head2 nof_overlaps\n\n  Title   : nof_overlaps\n  Usage   : $csp->nof_overlaps\n  Function: Get/Set the number of overlaps in the assembly.\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub nof_overlaps {\n  my ($self, $nof_overlaps) = @_;\n  if (defined $nof_overlaps) {\n    $self->throw(\"The number of overlaps must be strictly positive. 
Got \".\n      \"'$nof_overlaps'\") if $nof_overlaps < 1;\n    $self->{'_nof_overlaps'} = $nof_overlaps;\n  }\n  $nof_overlaps = $self->{'_nof_overlaps'};\n  return $nof_overlaps;\n}\n\n\n=head2 min_overlap\n\n  Title   : min_overlap\n  Usage   : $csp->min_overlap\n  Function: get/set the assembly minimum overlap length\n  Returns : integer\n  Args    : integer [optional]","parameters":[{"label":"$self"},{"label":"$min_overlap"}]},"line":397,"range":{"start":{"line":397,"character":0},"end":{"line":406,"character":9999}},"kind":12},{"line":419,"range":{"end":{"character":9999,"line":428},"start":{"character":0,"line":419}},"kind":12,"signature":{"label":"avg_overlap($self,$avg_overlap)","parameters":[{"label":"$self"},{"label":"$avg_overlap"}],"documentation":"__END__\n#\n# BioPerl module for Bio::Assembly::Tools::ContigSpectrum\n#\n# Copyright by Florent Angly\n#\n# You may distribute this module under the same terms as Perl itself\n#\n# POD documentation - main docs before the code\n\n=head1 NAME\n\nBio::Assembly::Tools::ContigSpectrum - create and manipulate contig spectra\n\n=head1 SYNOPSIS\n\n  # Simple contig spectrum creation\n  my $csp1 = Bio::Assembly::Tools::ContigSpectrum->new(\n    -id       => 'csp1',\n    -spectrum => { 1 => 10,\n                   2 => 2,\n                   3 => 1 } );\n\n  # ...or another way to create a simple contig spectrum\n  my $csp2 = Bio::Assembly::Tools::ContigSpectrum->new;\n  $csp2->id('csp2');\n  $csp2->spectrum({ 1 => 20, 2 => 1, 4 => 1 });\n\n  # Get some information\n  print \"This is contig spectrum \".$csp->id.\"\\n\";\n  print \"It contains \".$csp->nof_seq.\" sequences\\n\";\n  print \"The largest contig has \".$csp->max_size.\" sequences\\n\";\n  print \"The spectrum is: \".$csp->to_string($csp->spectrum).\"\\n\";\n\n  # Let's add the contig spectra\n  my $summed_csp = Bio::Assembly::Tools::ContigSpectrum->new;\n  $summed_csp->add($csp1);\n  $summed_csp->add($csp2);\n  print \"The summed contig spectrum is 
\".$summed_csp->to_string.\"\\n\";\n\n  # Make an average\n  my $avg_csp = Bio::Assembly::Tools::ContigSpectrum->new;\n  $avg_csp = $avg_csp->average([$csp1, $csp2]);\n  print \"The average contig spectrum is \".$avg_csp->to_string.\"\\n\";\n\n  # Get a contig spectrum from an assembly\n  my $from_assembly = Bio::Assembly::Tools::ContigSpectrum->new(\n    -assembly       => $assembly_object,\n    -eff_asm_params => 1);\n  print \"The contig spectrum from assembly is \".$from_assembly->to_string.\"\\n\";\n\n  # Report advanced information (possible because eff_asm_params = 1)\n  print \"Average sequence length: \".$from_assembly->avg_seq_length.\" bp\\n\";\n  print \"Minimum overlap length: \".$from_assembly->min_overlap.\" bp\\n\";\n  print \"Average overlap length: \".$from_assembly->avg_overlap.\" bp\\n\";\n  print \"Minimum overlap match: \".$from_assembly->min_identity.\" %\\n\";\n  print \"Average overlap match: \".$from_assembly->avg_identity.\" %\\n\";\n\n  # Assuming the assembly object contains sequences from several different\n  # metagenomes, we have a mixed contig spectrum from which a cross contig\n  # spectrum and dissolved contig spectra can be obtained\n  my $mixed_csp = $from_assembly;\n\n  # Calculate a dissolved contig spectrum\n  my $meta1_dissolved = Bio::Assembly::Tools::ContigSpectrum->new(\n    -dissolve => [$mixed_csp, 'metagenome1'] );\n  my $meta2_dissolved = Bio::Assembly::Tools::ContigSpectrum->new(\n    -dissolve => [$mixed_csp, 'metagenome2'] );\n  print \"The dissolved contig spectra are:\\n\".\n    $meta1_dissolved->to_string.\"\\n\".\n    $meta2_dissolved->to_string.\"\\n\";\n\n  # Determine a cross contig spectrum\n  my $cross_csp = Bio::Assembly::Tools::ContigSpectrum->new(\n    -cross => $mixed_csp );\n  print \"The cross contig spectrum is \".$cross_csp->to_string.\"\\n\";\n\n  # Score a contig spectrum (the more abundant the contigs and the larger their\n  # size, the larger the score)\n\n\n=head1 DESCRIPTION\n\nThe 
Bio::Assembly::Tools::ContigSpectrum Perl module enables to\nmanually create contig spectra, import them from assemblies,\nmanipulate them, transform between different types of contig spectra\nand output them.\n\nBio::Assembly::Tools::ContigSpectrum is a module to create, manipulate\nand output contig spectra, assembly-derived data used in metagenomics\n(community genomics) for diversity estimation.\n\n=head2 Background\n\nA contig spectrum is the count of the number of contigs of different\nsize in an assembly. For example, the contig spectrum [100 5 1 0 0\n...] means that there were 100 singlets (1-contigs), 5 contigs of 2\nsequences (2-contigs), 1 contig of 3 sequences (3-contig) and no\nlarger contigs.\n\nAn assembly can be produced from a mixture of sequences from different\nmetagenomes. The contig obtained from this assembly is a mixed contig\nspectrum. The contribution of each metagenome in this mixed contig\nspectrum can be obtained by determining a dissolved contig spectrum.\n\nFinally, based on a mixed contig spectrum, a cross contig spectrum can\nbe determined. In a cross contig spectrum, only contigs containing\nsequences from different metagenomes are kept; \"pure\" contigs are\nexcluded. Additionally, the total number of singletons (1-contigs)\nfrom each region that assembles with any fragments from other regions\nis the number of 1-contigs in the cross contig spectrum.\n\n=head2 Implementation\n\nThe simplest representation of a contig spectrum is as a hash\nrepresentation where the key is the contig size (number of sequences\nmaking up the contig) and the value the number of contigs of this\nsize.\n\nIn fact, it is useful to have more information associated with the\ncontig spectrum, hence the Bio::Assembly::Tools::ContigSpectrum module\nimplements an object containing a contig spectrum hash and additional\ninformation. 
The get/set methods to access them are:\n\n    id              contig spectrum ID\n    nof_seq         number of sequences\n    nof_rep         number of repetitions (assemblies) used\n    max_size        size of (number of sequences in) the largest contig\n    nof_overlaps    number of overlaps\n    min_overlap     minimum overlap length for building a contig\n    min_identity    minimum sequence identity over the overlap length\n    avg_overlap     average overlap length\n    avg_identity    average overlap identity\n    avg_seq_length  average sequence length\n    eff_asm_params  effective assembly parameters\n    spectrum        hash representation of a contig spectrum\n\n  Operations on the contig spectra:\n\n    to_string       create a string representation of the spectrum\n    spectrum        import a hash contig spectrum\n    assembly        determine a contig spectrum from an assembly\n    dissolve        calculate a dissolved contig spectrum (based on assembly)\n    cross           produce a cross contig spectrum (based on assembly)\n    add             add a contig spectrum to an existing one\n    average         make an average of several contig spectra\n\nWhen using operations that rely on knowing \"where\" (from what\nmetagenomes) a sequence came from (i.e. when creating a dissolved or\ncross contig spectrum), make sure that the sequences used for the\nassembly have a name header, e.g.  E<gt>metagenome1|seq1,\nE<gt>metagenome2|seq1, ...\n\n=head1 FEEDBACK\n\n=head2 Mailing Lists\n\nUser feedback is an integral part of the evolution of this and other\nBioperl modules. 
Send your comments and suggestions preferably to the\nBioperl mailing lists. Your participation is much appreciated.\n\n  bioperl-l@bioperl.org                  - General discussion\n  http://bioperl.org/wiki/Mailing_lists  - About the mailing lists\n\n\n=head2 Support \n\nPlease direct usage questions or support issues to the mailing list:\n\nI<bioperl-l@bioperl.org>\n\nrather than to the module maintainer directly. Many experienced and \nresponsive experts will be able to look at the problem and quickly \naddress it. Please include a thorough description of the problem \nwith code and data examples if at all possible.\n\n=head2 Reporting Bugs\n\nReport bugs to the BioPerl bug tracking system to help us keep track of\nthe bugs and their resolution. Bug reports can be submitted via email\nor the web:\n\n  bioperl-bugs@bio.perl.org\n  http://bugzilla.bioperl.org/\n\n=head1 AUTHOR - Florent E Angly\n\nEmail florent_dot_angly_at_gmail_dot_com\n\n=head1 APPENDIX\n\nThe rest of the documentation details each of the object\nmethods. 
Internal methods are usually preceded with a \"_\".\n\n\npackage Bio::Assembly::Tools::ContigSpectrum;\n\nuse strict;\n\nuse Bio::Root::Root;\nuse Bio::Assembly::Scaffold;\nuse Bio::SimpleAlign;\nuse Bio::LocatableSeq;\nuse Bio::Align::PairwiseStatistics;\n\nuse base 'Bio::Root::Root';\n\n\n=head2 new\n\n  Title   : new\n  Usage   : my $csp = Bio::Assembly::Tools::ContigSpectrum->new();\n              or\n            my $csp = Bio::Assembly::Tools::ContigSpectrum->new(\n              -id => 'some_name',\n              -spectrum =>  { 1 => 90 , 2 => 3 , 4 => 1 },\n            );\n              or\n            my $csp = Bio::Assembly::Tools::ContigSpectrum->new(\n              -assembly =>  $assembly_obj\n            );\n  Function: create a new contig spectrum object\n  Returns : reference to a contig spectrum object\n  Args    : none\n\n\nsub new {\n  my ($class, @args) = @_;\n  my $self = $class->SUPER::new(@args);\n  my ( $id, $nof_seq, $nof_rep, $max_size, $nof_overlaps, $min_overlap,\n    $min_identity, $avg_overlap, $avg_identity, $avg_seq_len, $spectrum,\n    $assembly, $eff_asm_params, $dissolve, $cross) = $self->_rearrange( [qw(ID\n    NOF_SEQ NOF_REP MAX_SIZE NOF_OVERLAPS MIN_OVERLAP MIN_IDENTITY AVG_OVERLAP\n    AVG_IDENTITY AVG_SEQ_LEN SPECTRUM ASSEMBLY EFF_ASM_PARAMS DISSOLVE CROSS)],\n    @args );\n\n  # First set up some defauts\n  $self->{'_id'}             = 'NoName';\n  $self->{'_nof_seq'}        = 0;\n  $self->{'_nof_rep'}        = 0;\n  $self->{'_max_size'}       = 0;\n  $self->{'_nof_overlaps'}   = 0;\n  $self->{'_min_overlap'}    = undef;\n  $self->{'_min_identity'}   = undef;\n  $self->{'_avg_overlap'}    = 0;\n  $self->{'_avg_identity'}   = 0;\n  $self->{'_avg_seq_len'}    = 0;\n  $self->{'_eff_asm_params'} = 0;\n  $self->{'_spectrum'}       = {1 => 0};  # contig spectrum hash representation\n  $self->{'_assembly'}       = []; # list of assembly objects used\n\n  # Then, according to user desires, override defaults\n  $self->{'_id'}           
  = $id             if (defined $id);\n  $self->{'_nof_seq'}        = $nof_seq        if (defined $nof_seq);\n  $self->{'_nof_rep'}        = $nof_rep        if (defined $nof_rep);\n  $self->{'_max_size'}       = $max_size       if (defined $max_size);\n  $self->{'_nof_overlaps'}   = $nof_overlaps   if (defined $nof_overlaps);\n  $self->{'_min_overlap'}    = $min_overlap    if (defined $min_overlap);\n  $self->{'_avg_overlap'}    = $avg_overlap    if (defined $avg_overlap);\n  $self->{'_min_identity'}   = $min_identity   if (defined $min_identity);\n  $self->{'_avg_identity'}   = $avg_identity   if (defined $avg_identity);\n  $self->{'_avg_seq_len'}    = $avg_seq_len    if (defined $avg_seq_len);\n  $self->{'_eff_asm_params'} = $eff_asm_params if (defined $eff_asm_params);\n\n  # Finally get stuff that can be gotten in an automated way\n  $self->_import_spectrum($spectrum) if defined($spectrum);\n  $self->_import_assembly($assembly) if defined($assembly);\n  if (defined($dissolve)) {\n    my ($mixed_csp, $header) = (@$dissolve[0], @$dissolve[1]);\n    $self->_import_dissolved_csp($mixed_csp, $header);\n  }\n  $self->_import_cross_csp($cross)   if defined($cross);\n\n  return $self;\n}\n\n\n=head2 id\n\n  Title   : id\n  Usage   : $csp->id\n  Function: get/set contig spectrum id\n  Returns : string\n  Args    : string [optional]\n\n\nsub id {\n  my ($self, $id) = @_;\n  if (defined $id) {\n    $self->{'_id'} = $id;\n  }\n  $id = $self->{'_id'};\n  return $id;\n}\n\n\n=head2 nof_seq\n\n  Title   : nof_seq\n  Usage   : $csp->nof_seq\n  Function: get/set the number of sequences making up the contig spectrum\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub nof_seq {\n  my ($self, $nof_seq) = @_;\n  if (defined $nof_seq) {\n    $self->throw(\"The number of sequences must be strictly positive. 
Got \".\n      \"'$nof_seq'\") if $nof_seq < 1;\n    $self->{'_nof_seq'} = $nof_seq;\n  }\n  $nof_seq = $self->{'_nof_seq'};\n  return $nof_seq;\n}\n\n\n=head2 nof_rep\n\n  Title   : nof_rep\n  Usage   : $csp->nof_rep\n  Function: Get/Set the number of repetitions (assemblies) used to create the \n            contig spectrum\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub nof_rep {\n  my ($self, $nof_rep) = @_;\n  if (defined $nof_rep) {\n    $self->throw(\"The number of repetitions must be strictly positive. Got \".\n      \"'$nof_rep'\") if $nof_rep < 1;\n    $self->{'_nof_rep'} = $nof_rep;\n  }\n  $nof_rep = $self->{'_nof_rep'};\n  return $nof_rep;\n}\n\n\n=head2 max_size\n\n  Title   : max_size\n  Usage   : $csp->max_size\n  Function: get/set the size of (number of sequences in) the largest contig\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub max_size {\n  my ($self, $max_size) = @_;\n  if (defined $max_size) {\n    $self->throw(\"The contig maximum size must be strictly positive. Got \".\n      \"'$max_size'\") if $max_size < 1;\n    $self->{'_max_size'} = $max_size;\n  }\n  $max_size = $self->{'_max_size'};\n  return $max_size;\n}\n\n\n=head2 nof_overlaps\n\n  Title   : nof_overlaps\n  Usage   : $csp->nof_overlaps\n  Function: Get/Set the number of overlaps in the assembly.\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub nof_overlaps {\n  my ($self, $nof_overlaps) = @_;\n  if (defined $nof_overlaps) {\n    $self->throw(\"The number of overlaps must be strictly positive. 
Got \".\n      \"'$nof_overlaps'\") if $nof_overlaps < 1;\n    $self->{'_nof_overlaps'} = $nof_overlaps;\n  }\n  $nof_overlaps = $self->{'_nof_overlaps'};\n  return $nof_overlaps;\n}\n\n\n=head2 min_overlap\n\n  Title   : min_overlap\n  Usage   : $csp->min_overlap\n  Function: get/set the assembly minimum overlap length\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub min_overlap {\n  my ($self, $min_overlap) = @_;\n  if (defined $min_overlap) {\n    $self->throw(\"The minimum of overlap length must be strictly positive. Got\".\n      \" '$min_overlap'\") if $min_overlap < 1;\n    $self->{'_min_overlap'} = $min_overlap;\n  }\n  $min_overlap = $self->{'_min_overlap'};\n  return $min_overlap;\n}\n\n\n=head2 avg_overlap\n\n  Title   : avg_overlap\n  Usage   : $csp->avg_overlap\n  Function: get/set the assembly average overlap length\n  Returns : decimal\n  Args    : decimal [optional]"},"children":[{"definition":"my","name":"$self","localvar":"my","kind":13,"containerName":"avg_overlap","line":420},{"kind":13,"containerName":"avg_overlap","name":"$avg_overlap","line":420},{"line":421,"name":"$avg_overlap","containerName":"avg_overlap","kind":13},{"containerName":"avg_overlap","kind":13,"name":"$self","line":422},{"line":422,"kind":12,"containerName":"avg_overlap","name":"throw"},{"name":"$avg_overlap","containerName":"avg_overlap","kind":13,"line":423},{"line":424,"name":"$self","containerName":"avg_overlap","kind":13},{"line":424,"name":"$avg_overlap","containerName":"avg_overlap","kind":13},{"line":426,"name":"$avg_overlap","containerName":"avg_overlap","kind":13},{"line":426,"name":"$self","containerName":"avg_overlap","kind":13},{"line":427,"kind":13,"containerName":"avg_overlap","name":"$avg_overlap"}],"name":"avg_overlap","containerName":"main::","definition":"sub","detail":"($self,$avg_overlap)"},{"range":{"start":{"line":441,"character":0},"end":{"character":9999,"line":450}},"kind":12,"line":441,"signature":{"documentation":"__END__\n#\n# BioPerl 
module for Bio::Assembly::Tools::ContigSpectrum\n#\n# Copyright by Florent Angly\n#\n# You may distribute this module under the same terms as Perl itself\n#\n# POD documentation - main docs before the code\n\n=head1 NAME\n\nBio::Assembly::Tools::ContigSpectrum - create and manipulate contig spectra\n\n=head1 SYNOPSIS\n\n  # Simple contig spectrum creation\n  my $csp1 = Bio::Assembly::Tools::ContigSpectrum->new(\n    -id       => 'csp1',\n    -spectrum => { 1 => 10,\n                   2 => 2,\n                   3 => 1 } );\n\n  # ...or another way to create a simple contig spectrum\n  my $csp2 = Bio::Assembly::Tools::ContigSpectrum->new;\n  $csp2->id('csp2');\n  $csp2->spectrum({ 1 => 20, 2 => 1, 4 => 1 });\n\n  # Get some information\n  print \"This is contig spectrum \".$csp->id.\"\\n\";\n  print \"It contains \".$csp->nof_seq.\" sequences\\n\";\n  print \"The largest contig has \".$csp->max_size.\" sequences\\n\";\n  print \"The spectrum is: \".$csp->to_string($csp->spectrum).\"\\n\";\n\n  # Let's add the contig spectra\n  my $summed_csp = Bio::Assembly::Tools::ContigSpectrum->new;\n  $summed_csp->add($csp1);\n  $summed_csp->add($csp2);\n  print \"The summed contig spectrum is \".$summed_csp->to_string.\"\\n\";\n\n  # Make an average\n  my $avg_csp = Bio::Assembly::Tools::ContigSpectrum->new;\n  $avg_csp = $avg_csp->average([$csp1, $csp2]);\n  print \"The average contig spectrum is \".$avg_csp->to_string.\"\\n\";\n\n  # Get a contig spectrum from an assembly\n  my $from_assembly = Bio::Assembly::Tools::ContigSpectrum->new(\n    -assembly       => $assembly_object,\n    -eff_asm_params => 1);\n  print \"The contig spectrum from assembly is \".$from_assembly->to_string.\"\\n\";\n\n  # Report advanced information (possible because eff_asm_params = 1)\n  print \"Average sequence length: \".$from_assembly->avg_seq_length.\" bp\\n\";\n  print \"Minimum overlap length: \".$from_assembly->min_overlap.\" bp\\n\";\n  print \"Average overlap length: 
\".$from_assembly->avg_overlap.\" bp\\n\";\n  print \"Minimum overlap match: \".$from_assembly->min_identity.\" %\\n\";\n  print \"Average overlap match: \".$from_assembly->avg_identity.\" %\\n\";\n\n  # Assuming the assembly object contains sequences from several different\n  # metagenomes, we have a mixed contig spectrum from which a cross contig\n  # spectrum and dissolved contig spectra can be obtained\n  my $mixed_csp = $from_assembly;\n\n  # Calculate a dissolved contig spectrum\n  my $meta1_dissolved = Bio::Assembly::Tools::ContigSpectrum->new(\n    -dissolve => [$mixed_csp, 'metagenome1'] );\n  my $meta2_dissolved = Bio::Assembly::Tools::ContigSpectrum->new(\n    -dissolve => [$mixed_csp, 'metagenome2'] );\n  print \"The dissolved contig spectra are:\\n\".\n    $meta1_dissolved->to_string.\"\\n\".\n    $meta2_dissolved->to_string.\"\\n\";\n\n  # Determine a cross contig spectrum\n  my $cross_csp = Bio::Assembly::Tools::ContigSpectrum->new(\n    -cross => $mixed_csp );\n  print \"The cross contig spectrum is \".$cross_csp->to_string.\"\\n\";\n\n  # Score a contig spectrum (the more abundant the contigs and the larger their\n  # size, the larger the score)\n\n\n=head1 DESCRIPTION\n\nThe Bio::Assembly::Tools::ContigSpectrum Perl module enables to\nmanually create contig spectra, import them from assemblies,\nmanipulate them, transform between different types of contig spectra\nand output them.\n\nBio::Assembly::Tools::ContigSpectrum is a module to create, manipulate\nand output contig spectra, assembly-derived data used in metagenomics\n(community genomics) for diversity estimation.\n\n=head2 Background\n\nA contig spectrum is the count of the number of contigs of different\nsize in an assembly. For example, the contig spectrum [100 5 1 0 0\n...] 
means that there were 100 singlets (1-contigs), 5 contigs of 2\nsequences (2-contigs), 1 contig of 3 sequences (3-contig) and no\nlarger contigs.\n\nAn assembly can be produced from a mixture of sequences from different\nmetagenomes. The contig obtained from this assembly is a mixed contig\nspectrum. The contribution of each metagenome in this mixed contig\nspectrum can be obtained by determining a dissolved contig spectrum.\n\nFinally, based on a mixed contig spectrum, a cross contig spectrum can\nbe determined. In a cross contig spectrum, only contigs containing\nsequences from different metagenomes are kept; \"pure\" contigs are\nexcluded. Additionally, the total number of singletons (1-contigs)\nfrom each region that assembles with any fragments from other regions\nis the number of 1-contigs in the cross contig spectrum.\n\n=head2 Implementation\n\nThe simplest representation of a contig spectrum is as a hash\nrepresentation where the key is the contig size (number of sequences\nmaking up the contig) and the value the number of contigs of this\nsize.\n\nIn fact, it is useful to have more information associated with the\ncontig spectrum, hence the Bio::Assembly::Tools::ContigSpectrum module\nimplements an object containing a contig spectrum hash and additional\ninformation. 
The get/set methods to access them are:\n\n    id              contig spectrum ID\n    nof_seq         number of sequences\n    nof_rep         number of repetitions (assemblies) used\n    max_size        size of (number of sequences in) the largest contig\n    nof_overlaps    number of overlaps\n    min_overlap     minimum overlap length for building a contig\n    min_identity    minimum sequence identity over the overlap length\n    avg_overlap     average overlap length\n    avg_identity    average overlap identity\n    avg_seq_length  average sequence length\n    eff_asm_params  effective assembly parameters\n    spectrum        hash representation of a contig spectrum\n\n  Operations on the contig spectra:\n\n    to_string       create a string representation of the spectrum\n    spectrum        import a hash contig spectrum\n    assembly        determine a contig spectrum from an assembly\n    dissolve        calculate a dissolved contig spectrum (based on assembly)\n    cross           produce a cross contig spectrum (based on assembly)\n    add             add a contig spectrum to an existing one\n    average         make an average of several contig spectra\n\nWhen using operations that rely on knowing \"where\" (from what\nmetagenomes) a sequence came from (i.e. when creating a dissolved or\ncross contig spectrum), make sure that the sequences used for the\nassembly have a name header, e.g.  E<gt>metagenome1|seq1,\nE<gt>metagenome2|seq1, ...\n\n=head1 FEEDBACK\n\n=head2 Mailing Lists\n\nUser feedback is an integral part of the evolution of this and other\nBioperl modules. 
Send your comments and suggestions preferably to the\nBioperl mailing lists. Your participation is much appreciated.\n\n  bioperl-l@bioperl.org                  - General discussion\n  http://bioperl.org/wiki/Mailing_lists  - About the mailing lists\n\n\n=head2 Support \n\nPlease direct usage questions or support issues to the mailing list:\n\nI<bioperl-l@bioperl.org>\n\nrather than to the module maintainer directly. Many experienced and \nresponsive experts will be able to look at the problem and quickly \naddress it. Please include a thorough description of the problem \nwith code and data examples if at all possible.\n\n=head2 Reporting Bugs\n\nReport bugs to the BioPerl bug tracking system to help us keep track of\nthe bugs and their resolution. Bug reports can be submitted via email\nor the web:\n\n  bioperl-bugs@bio.perl.org\n  http://bugzilla.bioperl.org/\n\n=head1 AUTHOR - Florent E Angly\n\nEmail florent_dot_angly_at_gmail_dot_com\n\n=head1 APPENDIX\n\nThe rest of the documentation details each of the object\nmethods. 
Internal methods are usually preceded with a \"_\".\n\n\npackage Bio::Assembly::Tools::ContigSpectrum;\n\nuse strict;\n\nuse Bio::Root::Root;\nuse Bio::Assembly::Scaffold;\nuse Bio::SimpleAlign;\nuse Bio::LocatableSeq;\nuse Bio::Align::PairwiseStatistics;\n\nuse base 'Bio::Root::Root';\n\n\n=head2 new\n\n  Title   : new\n  Usage   : my $csp = Bio::Assembly::Tools::ContigSpectrum->new();\n              or\n            my $csp = Bio::Assembly::Tools::ContigSpectrum->new(\n              -id => 'some_name',\n              -spectrum =>  { 1 => 90 , 2 => 3 , 4 => 1 },\n            );\n              or\n            my $csp = Bio::Assembly::Tools::ContigSpectrum->new(\n              -assembly =>  $assembly_obj\n            );\n  Function: create a new contig spectrum object\n  Returns : reference to a contig spectrum object\n  Args    : none\n\n\nsub new {\n  my ($class, @args) = @_;\n  my $self = $class->SUPER::new(@args);\n  my ( $id, $nof_seq, $nof_rep, $max_size, $nof_overlaps, $min_overlap,\n    $min_identity, $avg_overlap, $avg_identity, $avg_seq_len, $spectrum,\n    $assembly, $eff_asm_params, $dissolve, $cross) = $self->_rearrange( [qw(ID\n    NOF_SEQ NOF_REP MAX_SIZE NOF_OVERLAPS MIN_OVERLAP MIN_IDENTITY AVG_OVERLAP\n    AVG_IDENTITY AVG_SEQ_LEN SPECTRUM ASSEMBLY EFF_ASM_PARAMS DISSOLVE CROSS)],\n    @args );\n\n  # First set up some defauts\n  $self->{'_id'}             = 'NoName';\n  $self->{'_nof_seq'}        = 0;\n  $self->{'_nof_rep'}        = 0;\n  $self->{'_max_size'}       = 0;\n  $self->{'_nof_overlaps'}   = 0;\n  $self->{'_min_overlap'}    = undef;\n  $self->{'_min_identity'}   = undef;\n  $self->{'_avg_overlap'}    = 0;\n  $self->{'_avg_identity'}   = 0;\n  $self->{'_avg_seq_len'}    = 0;\n  $self->{'_eff_asm_params'} = 0;\n  $self->{'_spectrum'}       = {1 => 0};  # contig spectrum hash representation\n  $self->{'_assembly'}       = []; # list of assembly objects used\n\n  # Then, according to user desires, override defaults\n  $self->{'_id'}           
  = $id             if (defined $id);\n  $self->{'_nof_seq'}        = $nof_seq        if (defined $nof_seq);\n  $self->{'_nof_rep'}        = $nof_rep        if (defined $nof_rep);\n  $self->{'_max_size'}       = $max_size       if (defined $max_size);\n  $self->{'_nof_overlaps'}   = $nof_overlaps   if (defined $nof_overlaps);\n  $self->{'_min_overlap'}    = $min_overlap    if (defined $min_overlap);\n  $self->{'_avg_overlap'}    = $avg_overlap    if (defined $avg_overlap);\n  $self->{'_min_identity'}   = $min_identity   if (defined $min_identity);\n  $self->{'_avg_identity'}   = $avg_identity   if (defined $avg_identity);\n  $self->{'_avg_seq_len'}    = $avg_seq_len    if (defined $avg_seq_len);\n  $self->{'_eff_asm_params'} = $eff_asm_params if (defined $eff_asm_params);\n\n  # Finally get stuff that can be gotten in an automated way\n  $self->_import_spectrum($spectrum) if defined($spectrum);\n  $self->_import_assembly($assembly) if defined($assembly);\n  if (defined($dissolve)) {\n    my ($mixed_csp, $header) = (@$dissolve[0], @$dissolve[1]);\n    $self->_import_dissolved_csp($mixed_csp, $header);\n  }\n  $self->_import_cross_csp($cross)   if defined($cross);\n\n  return $self;\n}\n\n\n=head2 id\n\n  Title   : id\n  Usage   : $csp->id\n  Function: get/set contig spectrum id\n  Returns : string\n  Args    : string [optional]\n\n\nsub id {\n  my ($self, $id) = @_;\n  if (defined $id) {\n    $self->{'_id'} = $id;\n  }\n  $id = $self->{'_id'};\n  return $id;\n}\n\n\n=head2 nof_seq\n\n  Title   : nof_seq\n  Usage   : $csp->nof_seq\n  Function: get/set the number of sequences making up the contig spectrum\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub nof_seq {\n  my ($self, $nof_seq) = @_;\n  if (defined $nof_seq) {\n    $self->throw(\"The number of sequences must be strictly positive. 
Got \".\n      \"'$nof_seq'\") if $nof_seq < 1;\n    $self->{'_nof_seq'} = $nof_seq;\n  }\n  $nof_seq = $self->{'_nof_seq'};\n  return $nof_seq;\n}\n\n\n=head2 nof_rep\n\n  Title   : nof_rep\n  Usage   : $csp->nof_rep\n  Function: Get/Set the number of repetitions (assemblies) used to create the \n            contig spectrum\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub nof_rep {\n  my ($self, $nof_rep) = @_;\n  if (defined $nof_rep) {\n    $self->throw(\"The number of repetitions must be strictly positive. Got \".\n      \"'$nof_rep'\") if $nof_rep < 1;\n    $self->{'_nof_rep'} = $nof_rep;\n  }\n  $nof_rep = $self->{'_nof_rep'};\n  return $nof_rep;\n}\n\n\n=head2 max_size\n\n  Title   : max_size\n  Usage   : $csp->max_size\n  Function: get/set the size of (number of sequences in) the largest contig\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub max_size {\n  my ($self, $max_size) = @_;\n  if (defined $max_size) {\n    $self->throw(\"The contig maximum size must be strictly positive. Got \".\n      \"'$max_size'\") if $max_size < 1;\n    $self->{'_max_size'} = $max_size;\n  }\n  $max_size = $self->{'_max_size'};\n  return $max_size;\n}\n\n\n=head2 nof_overlaps\n\n  Title   : nof_overlaps\n  Usage   : $csp->nof_overlaps\n  Function: Get/Set the number of overlaps in the assembly.\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub nof_overlaps {\n  my ($self, $nof_overlaps) = @_;\n  if (defined $nof_overlaps) {\n    $self->throw(\"The number of overlaps must be strictly positive. 
Got \".\n      \"'$nof_overlaps'\") if $nof_overlaps < 1;\n    $self->{'_nof_overlaps'} = $nof_overlaps;\n  }\n  $nof_overlaps = $self->{'_nof_overlaps'};\n  return $nof_overlaps;\n}\n\n\n=head2 min_overlap\n\n  Title   : min_overlap\n  Usage   : $csp->min_overlap\n  Function: get/set the assembly minimum overlap length\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub min_overlap {\n  my ($self, $min_overlap) = @_;\n  if (defined $min_overlap) {\n    $self->throw(\"The minimum of overlap length must be strictly positive. Got\".\n      \" '$min_overlap'\") if $min_overlap < 1;\n    $self->{'_min_overlap'} = $min_overlap;\n  }\n  $min_overlap = $self->{'_min_overlap'};\n  return $min_overlap;\n}\n\n\n=head2 avg_overlap\n\n  Title   : avg_overlap\n  Usage   : $csp->avg_overlap\n  Function: get/set the assembly average overlap length\n  Returns : decimal\n  Args    : decimal [optional]\n\n\nsub avg_overlap {\n  my ($self, $avg_overlap) = @_;\n  if (defined $avg_overlap) {\n    $self->throw(\"The average overlap length must be strictly positive. 
Got \".\n      \"'$avg_overlap'\") if $avg_overlap < 1;\n    $self->{'_avg_overlap'} = $avg_overlap;\n  }\n  $avg_overlap = $self->{'_avg_overlap'};\n  return $avg_overlap;\n}\n\n\n=head2 min_identity\n\n  Title   : min_identity\n  Usage   : $csp->min_identity\n  Function: get/set the assembly minimum overlap identity percent\n  Returns : 0 < decimal < 100\n  Args    : 0 < decimal < 100 [optional]","parameters":[{"label":"$self"},{"label":"$min_identity"}],"label":"min_identity($self,$min_identity)"},"containerName":"main::","name":"min_identity","children":[{"definition":"my","line":442,"containerName":"min_identity","localvar":"my","kind":13,"name":"$self"},{"line":442,"containerName":"min_identity","kind":13,"name":"$min_identity"},{"name":"$min_identity","containerName":"min_identity","kind":13,"line":443},{"containerName":"min_identity","kind":13,"name":"$self","line":444},{"name":"throw","containerName":"min_identity","kind":12,"line":444},{"name":"$min_identity","containerName":"min_identity","kind":13,"line":445},{"line":446,"name":"$self","kind":13,"containerName":"min_identity"},{"line":446,"name":"$min_identity","kind":13,"containerName":"min_identity"},{"name":"$min_identity","containerName":"min_identity","kind":13,"line":448},{"kind":13,"containerName":"min_identity","name":"$self","line":448},{"kind":13,"containerName":"min_identity","name":"$min_identity","line":449}],"detail":"($self,$min_identity)","definition":"sub"},{"detail":"($self,$avg_identity)","definition":"sub","name":"avg_identity","containerName":"main::","children":[{"line":464,"name":"$self","containerName":"avg_identity","localvar":"my","kind":13,"definition":"my"},{"line":464,"kind":13,"containerName":"avg_identity","name":"$avg_identity"},{"name":"$avg_identity","containerName":"avg_identity","kind":13,"line":465},{"line":466,"name":"$self","kind":13,"containerName":"avg_identity"},{"name":"throw","kind":12,"containerName":"avg_identity","line":466},{"name":"$avg_identity","kind":13
,"containerName":"avg_identity","line":467},{"line":468,"containerName":"avg_identity","kind":13,"name":"$self"},{"line":468,"containerName":"avg_identity","kind":13,"name":"$avg_identity"},{"line":470,"containerName":"avg_identity","kind":13,"name":"$avg_identity"},{"containerName":"avg_identity","kind":13,"name":"$self","line":470},{"line":471,"name":"$avg_identity","containerName":"avg_identity","kind":13}],"signature":{"label":"avg_identity($self,$avg_identity)","parameters":[{"label":"$self"},{"label":"$avg_identity"}],"documentation":"__END__\n#\n# BioPerl module for Bio::Assembly::Tools::ContigSpectrum\n#\n# Copyright by Florent Angly\n#\n# You may distribute this module under the same terms as Perl itself\n#\n# POD documentation - main docs before the code\n\n=head1 NAME\n\nBio::Assembly::Tools::ContigSpectrum - create and manipulate contig spectra\n\n=head1 SYNOPSIS\n\n  # Simple contig spectrum creation\n  my $csp1 = Bio::Assembly::Tools::ContigSpectrum->new(\n    -id       => 'csp1',\n    -spectrum => { 1 => 10,\n                   2 => 2,\n                   3 => 1 } );\n\n  # ...or another way to create a simple contig spectrum\n  my $csp2 = Bio::Assembly::Tools::ContigSpectrum->new;\n  $csp2->id('csp2');\n  $csp2->spectrum({ 1 => 20, 2 => 1, 4 => 1 });\n\n  # Get some information\n  print \"This is contig spectrum \".$csp->id.\"\\n\";\n  print \"It contains \".$csp->nof_seq.\" sequences\\n\";\n  print \"The largest contig has \".$csp->max_size.\" sequences\\n\";\n  print \"The spectrum is: \".$csp->to_string($csp->spectrum).\"\\n\";\n\n  # Let's add the contig spectra\n  my $summed_csp = Bio::Assembly::Tools::ContigSpectrum->new;\n  $summed_csp->add($csp1);\n  $summed_csp->add($csp2);\n  print \"The summed contig spectrum is \".$summed_csp->to_string.\"\\n\";\n\n  # Make an average\n  my $avg_csp = Bio::Assembly::Tools::ContigSpectrum->new;\n  $avg_csp = $avg_csp->average([$csp1, $csp2]);\n  print \"The average contig spectrum is 
\".$avg_csp->to_string.\"\\n\";\n\n  # Get a contig spectrum from an assembly\n  my $from_assembly = Bio::Assembly::Tools::ContigSpectrum->new(\n    -assembly       => $assembly_object,\n    -eff_asm_params => 1);\n  print \"The contig spectrum from assembly is \".$from_assembly->to_string.\"\\n\";\n\n  # Report advanced information (possible because eff_asm_params = 1)\n  print \"Average sequence length: \".$from_assembly->avg_seq_length.\" bp\\n\";\n  print \"Minimum overlap length: \".$from_assembly->min_overlap.\" bp\\n\";\n  print \"Average overlap length: \".$from_assembly->avg_overlap.\" bp\\n\";\n  print \"Minimum overlap match: \".$from_assembly->min_identity.\" %\\n\";\n  print \"Average overlap match: \".$from_assembly->avg_identity.\" %\\n\";\n\n  # Assuming the assembly object contains sequences from several different\n  # metagenomes, we have a mixed contig spectrum from which a cross contig\n  # spectrum and dissolved contig spectra can be obtained\n  my $mixed_csp = $from_assembly;\n\n  # Calculate a dissolved contig spectrum\n  my $meta1_dissolved = Bio::Assembly::Tools::ContigSpectrum->new(\n    -dissolve => [$mixed_csp, 'metagenome1'] );\n  my $meta2_dissolved = Bio::Assembly::Tools::ContigSpectrum->new(\n    -dissolve => [$mixed_csp, 'metagenome2'] );\n  print \"The dissolved contig spectra are:\\n\".\n    $meta1_dissolved->to_string.\"\\n\".\n    $meta2_dissolved->to_string.\"\\n\";\n\n  # Determine a cross contig spectrum\n  my $cross_csp = Bio::Assembly::Tools::ContigSpectrum->new(\n    -cross => $mixed_csp );\n  print \"The cross contig spectrum is \".$cross_csp->to_string.\"\\n\";\n\n  # Score a contig spectrum (the more abundant the contigs and the larger their\n  # size, the larger the score)\n\n\n=head1 DESCRIPTION\n\nThe Bio::Assembly::Tools::ContigSpectrum Perl module enables to\nmanually create contig spectra, import them from assemblies,\nmanipulate them, transform between different types of contig spectra\nand output 
them.\n\nBio::Assembly::Tools::ContigSpectrum is a module to create, manipulate\nand output contig spectra, assembly-derived data used in metagenomics\n(community genomics) for diversity estimation.\n\n=head2 Background\n\nA contig spectrum is the count of the number of contigs of different\nsize in an assembly. For example, the contig spectrum [100 5 1 0 0\n...] means that there were 100 singlets (1-contigs), 5 contigs of 2\nsequences (2-contigs), 1 contig of 3 sequences (3-contig) and no\nlarger contigs.\n\nAn assembly can be produced from a mixture of sequences from different\nmetagenomes. The contig obtained from this assembly is a mixed contig\nspectrum. The contribution of each metagenome in this mixed contig\nspectrum can be obtained by determining a dissolved contig spectrum.\n\nFinally, based on a mixed contig spectrum, a cross contig spectrum can\nbe determined. In a cross contig spectrum, only contigs containing\nsequences from different metagenomes are kept; \"pure\" contigs are\nexcluded. Additionally, the total number of singletons (1-contigs)\nfrom each region that assembles with any fragments from other regions\nis the number of 1-contigs in the cross contig spectrum.\n\n=head2 Implementation\n\nThe simplest representation of a contig spectrum is as a hash\nrepresentation where the key is the contig size (number of sequences\nmaking up the contig) and the value the number of contigs of this\nsize.\n\nIn fact, it is useful to have more information associated with the\ncontig spectrum, hence the Bio::Assembly::Tools::ContigSpectrum module\nimplements an object containing a contig spectrum hash and additional\ninformation. 
The get/set methods to access them are:\n\n    id              contig spectrum ID\n    nof_seq         number of sequences\n    nof_rep         number of repetitions (assemblies) used\n    max_size        size of (number of sequences in) the largest contig\n    nof_overlaps    number of overlaps\n    min_overlap     minimum overlap length for building a contig\n    min_identity    minimum sequence identity over the overlap length\n    avg_overlap     average overlap length\n    avg_identity    average overlap identity\n    avg_seq_length  average sequence length\n    eff_asm_params  effective assembly parameters\n    spectrum        hash representation of a contig spectrum\n\n  Operations on the contig spectra:\n\n    to_string       create a string representation of the spectrum\n    spectrum        import a hash contig spectrum\n    assembly        determine a contig spectrum from an assembly\n    dissolve        calculate a dissolved contig spectrum (based on assembly)\n    cross           produce a cross contig spectrum (based on assembly)\n    add             add a contig spectrum to an existing one\n    average         make an average of several contig spectra\n\nWhen using operations that rely on knowing \"where\" (from what\nmetagenomes) a sequence came from (i.e. when creating a dissolved or\ncross contig spectrum), make sure that the sequences used for the\nassembly have a name header, e.g.  E<gt>metagenome1|seq1,\nE<gt>metagenome2|seq1, ...\n\n=head1 FEEDBACK\n\n=head2 Mailing Lists\n\nUser feedback is an integral part of the evolution of this and other\nBioperl modules. 
Send your comments and suggestions preferably to the\nBioperl mailing lists. Your participation is much appreciated.\n\n  bioperl-l@bioperl.org                  - General discussion\n  http://bioperl.org/wiki/Mailing_lists  - About the mailing lists\n\n\n=head2 Support \n\nPlease direct usage questions or support issues to the mailing list:\n\nI<bioperl-l@bioperl.org>\n\nrather than to the module maintainer directly. Many experienced and \nresponsive experts will be able to look at the problem and quickly \naddress it. Please include a thorough description of the problem \nwith code and data examples if at all possible.\n\n=head2 Reporting Bugs\n\nReport bugs to the BioPerl bug tracking system to help us keep track\nof the bugs and their resolution. Bug reports can be submitted via email\nor the web:\n\n  bioperl-bugs@bio.perl.org\n  http://bugzilla.bioperl.org/\n\n=head1 AUTHOR - Florent E Angly\n\nEmail florent_dot_angly_at_gmail_dot_com\n\n=head1 APPENDIX\n\nThe rest of the documentation details each of the object\nmethods. 
Internal methods are usually preceded with a \"_\".\n\n\npackage Bio::Assembly::Tools::ContigSpectrum;\n\nuse strict;\n\nuse Bio::Root::Root;\nuse Bio::Assembly::Scaffold;\nuse Bio::SimpleAlign;\nuse Bio::LocatableSeq;\nuse Bio::Align::PairwiseStatistics;\n\nuse base 'Bio::Root::Root';\n\n\n=head2 new\n\n  Title   : new\n  Usage   : my $csp = Bio::Assembly::Tools::ContigSpectrum->new();\n              or\n            my $csp = Bio::Assembly::Tools::ContigSpectrum->new(\n              -id => 'some_name',\n              -spectrum =>  { 1 => 90 , 2 => 3 , 4 => 1 },\n            );\n              or\n            my $csp = Bio::Assembly::Tools::ContigSpectrum->new(\n              -assembly =>  $assembly_obj\n            );\n  Function: create a new contig spectrum object\n  Returns : reference to a contig spectrum object\n  Args    : none\n\n\nsub new {\n  my ($class, @args) = @_;\n  my $self = $class->SUPER::new(@args);\n  my ( $id, $nof_seq, $nof_rep, $max_size, $nof_overlaps, $min_overlap,\n    $min_identity, $avg_overlap, $avg_identity, $avg_seq_len, $spectrum,\n    $assembly, $eff_asm_params, $dissolve, $cross) = $self->_rearrange( [qw(ID\n    NOF_SEQ NOF_REP MAX_SIZE NOF_OVERLAPS MIN_OVERLAP MIN_IDENTITY AVG_OVERLAP\n    AVG_IDENTITY AVG_SEQ_LEN SPECTRUM ASSEMBLY EFF_ASM_PARAMS DISSOLVE CROSS)],\n    @args );\n\n  # First set up some defauts\n  $self->{'_id'}             = 'NoName';\n  $self->{'_nof_seq'}        = 0;\n  $self->{'_nof_rep'}        = 0;\n  $self->{'_max_size'}       = 0;\n  $self->{'_nof_overlaps'}   = 0;\n  $self->{'_min_overlap'}    = undef;\n  $self->{'_min_identity'}   = undef;\n  $self->{'_avg_overlap'}    = 0;\n  $self->{'_avg_identity'}   = 0;\n  $self->{'_avg_seq_len'}    = 0;\n  $self->{'_eff_asm_params'} = 0;\n  $self->{'_spectrum'}       = {1 => 0};  # contig spectrum hash representation\n  $self->{'_assembly'}       = []; # list of assembly objects used\n\n  # Then, according to user desires, override defaults\n  $self->{'_id'}           
  = $id             if (defined $id);\n  $self->{'_nof_seq'}        = $nof_seq        if (defined $nof_seq);\n  $self->{'_nof_rep'}        = $nof_rep        if (defined $nof_rep);\n  $self->{'_max_size'}       = $max_size       if (defined $max_size);\n  $self->{'_nof_overlaps'}   = $nof_overlaps   if (defined $nof_overlaps);\n  $self->{'_min_overlap'}    = $min_overlap    if (defined $min_overlap);\n  $self->{'_avg_overlap'}    = $avg_overlap    if (defined $avg_overlap);\n  $self->{'_min_identity'}   = $min_identity   if (defined $min_identity);\n  $self->{'_avg_identity'}   = $avg_identity   if (defined $avg_identity);\n  $self->{'_avg_seq_len'}    = $avg_seq_len    if (defined $avg_seq_len);\n  $self->{'_eff_asm_params'} = $eff_asm_params if (defined $eff_asm_params);\n\n  # Finally get stuff that can be gotten in an automated way\n  $self->_import_spectrum($spectrum) if defined($spectrum);\n  $self->_import_assembly($assembly) if defined($assembly);\n  if (defined($dissolve)) {\n    my ($mixed_csp, $header) = (@$dissolve[0], @$dissolve[1]);\n    $self->_import_dissolved_csp($mixed_csp, $header);\n  }\n  $self->_import_cross_csp($cross)   if defined($cross);\n\n  return $self;\n}\n\n\n=head2 id\n\n  Title   : id\n  Usage   : $csp->id\n  Function: get/set contig spectrum id\n  Returns : string\n  Args    : string [optional]\n\n\nsub id {\n  my ($self, $id) = @_;\n  if (defined $id) {\n    $self->{'_id'} = $id;\n  }\n  $id = $self->{'_id'};\n  return $id;\n}\n\n\n=head2 nof_seq\n\n  Title   : nof_seq\n  Usage   : $csp->nof_seq\n  Function: get/set the number of sequences making up the contig spectrum\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub nof_seq {\n  my ($self, $nof_seq) = @_;\n  if (defined $nof_seq) {\n    $self->throw(\"The number of sequences must be strictly positive. 
Got \".\n      \"'$nof_seq'\") if $nof_seq < 1;\n    $self->{'_nof_seq'} = $nof_seq;\n  }\n  $nof_seq = $self->{'_nof_seq'};\n  return $nof_seq;\n}\n\n\n=head2 nof_rep\n\n  Title   : nof_rep\n  Usage   : $csp->nof_rep\n  Function: Get/Set the number of repetitions (assemblies) used to create the \n            contig spectrum\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub nof_rep {\n  my ($self, $nof_rep) = @_;\n  if (defined $nof_rep) {\n    $self->throw(\"The number of repetitions must be strictly positive. Got \".\n      \"'$nof_rep'\") if $nof_rep < 1;\n    $self->{'_nof_rep'} = $nof_rep;\n  }\n  $nof_rep = $self->{'_nof_rep'};\n  return $nof_rep;\n}\n\n\n=head2 max_size\n\n  Title   : max_size\n  Usage   : $csp->max_size\n  Function: get/set the size of (number of sequences in) the largest contig\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub max_size {\n  my ($self, $max_size) = @_;\n  if (defined $max_size) {\n    $self->throw(\"The contig maximum size must be strictly positive. Got \".\n      \"'$max_size'\") if $max_size < 1;\n    $self->{'_max_size'} = $max_size;\n  }\n  $max_size = $self->{'_max_size'};\n  return $max_size;\n}\n\n\n=head2 nof_overlaps\n\n  Title   : nof_overlaps\n  Usage   : $csp->nof_overlaps\n  Function: Get/Set the number of overlaps in the assembly.\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub nof_overlaps {\n  my ($self, $nof_overlaps) = @_;\n  if (defined $nof_overlaps) {\n    $self->throw(\"The number of overlaps must be strictly positive. 
Got \".\n      \"'$nof_overlaps'\") if $nof_overlaps < 1;\n    $self->{'_nof_overlaps'} = $nof_overlaps;\n  }\n  $nof_overlaps = $self->{'_nof_overlaps'};\n  return $nof_overlaps;\n}\n\n\n=head2 min_overlap\n\n  Title   : min_overlap\n  Usage   : $csp->min_overlap\n  Function: get/set the assembly minimum overlap length\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub min_overlap {\n  my ($self, $min_overlap) = @_;\n  if (defined $min_overlap) {\n    $self->throw(\"The minimum of overlap length must be strictly positive. Got\".\n      \" '$min_overlap'\") if $min_overlap < 1;\n    $self->{'_min_overlap'} = $min_overlap;\n  }\n  $min_overlap = $self->{'_min_overlap'};\n  return $min_overlap;\n}\n\n\n=head2 avg_overlap\n\n  Title   : avg_overlap\n  Usage   : $csp->avg_overlap\n  Function: get/set the assembly average overlap length\n  Returns : decimal\n  Args    : decimal [optional]\n\n\nsub avg_overlap {\n  my ($self, $avg_overlap) = @_;\n  if (defined $avg_overlap) {\n    $self->throw(\"The average overlap length must be strictly positive. Got \".\n      \"'$avg_overlap'\") if $avg_overlap < 1;\n    $self->{'_avg_overlap'} = $avg_overlap;\n  }\n  $avg_overlap = $self->{'_avg_overlap'};\n  return $avg_overlap;\n}\n\n\n=head2 min_identity\n\n  Title   : min_identity\n  Usage   : $csp->min_identity\n  Function: get/set the assembly minimum overlap identity percent\n  Returns : 0 < decimal < 100\n  Args    : 0 < decimal < 100 [optional]\n\n\nsub min_identity {\n  my ($self, $min_identity) = @_;\n  if (defined $min_identity) {\n    $self->throw(\"The minimum overlap percent identity must be strictly \".\n      \"positive. 
Got '$min_identity'\") if $min_identity < 1;\n    $self->{'_min_identity'} = $min_identity;\n  }\n  $min_identity = $self->{'_min_identity'};\n  return $min_identity;\n}\n\n\n=head2 avg_identity\n\n  Title   : avg_identity\n  Usage   : $csp->avg_identity\n  Function: get/set the assembly average overlap identity percent\n  Returns : 0 < decimal < 100\n  Args    : 0 < decimal < 100 [optional]"},"kind":12,"range":{"end":{"character":9999,"line":472},"start":{"character":0,"line":463}},"line":463},{"signature":{"label":"avg_seq_len($self,$avg_seq_len)","documentation":"__END__\n#\n# BioPerl module for Bio::Assembly::Tools::ContigSpectrum\n#\n# Copyright by Florent Angly\n#\n# You may distribute this module under the same terms as Perl itself\n#\n# POD documentation - main docs before the code\n\n=head1 NAME\n\nBio::Assembly::Tools::ContigSpectrum - create and manipulate contig spectra\n\n=head1 SYNOPSIS\n\n  # Simple contig spectrum creation\n  my $csp1 = Bio::Assembly::Tools::ContigSpectrum->new(\n    -id       => 'csp1',\n    -spectrum => { 1 => 10,\n                   2 => 2,\n                   3 => 1 } );\n\n  # ...or another way to create a simple contig spectrum\n  my $csp2 = Bio::Assembly::Tools::ContigSpectrum->new;\n  $csp2->id('csp2');\n  $csp2->spectrum({ 1 => 20, 2 => 1, 4 => 1 });\n\n  # Get some information\n  print \"This is contig spectrum \".$csp->id.\"\\n\";\n  print \"It contains \".$csp->nof_seq.\" sequences\\n\";\n  print \"The largest contig has \".$csp->max_size.\" sequences\\n\";\n  print \"The spectrum is: \".$csp->to_string($csp->spectrum).\"\\n\";\n\n  # Let's add the contig spectra\n  my $summed_csp = Bio::Assembly::Tools::ContigSpectrum->new;\n  $summed_csp->add($csp1);\n  $summed_csp->add($csp2);\n  print \"The summed contig spectrum is \".$summed_csp->to_string.\"\\n\";\n\n  # Make an average\n  my $avg_csp = Bio::Assembly::Tools::ContigSpectrum->new;\n  $avg_csp = $avg_csp->average([$csp1, $csp2]);\n  print \"The average contig 
spectrum is \".$avg_csp->to_string.\"\\n\";\n\n  # Get a contig spectrum from an assembly\n  my $from_assembly = Bio::Assembly::Tools::ContigSpectrum->new(\n    -assembly       => $assembly_object,\n    -eff_asm_params => 1);\n  print \"The contig spectrum from assembly is \".$from_assembly->to_string.\"\\n\";\n\n  # Report advanced information (possible because eff_asm_params = 1)\n  print \"Average sequence length: \".$from_assembly->avg_seq_length.\" bp\\n\";\n  print \"Minimum overlap length: \".$from_assembly->min_overlap.\" bp\\n\";\n  print \"Average overlap length: \".$from_assembly->avg_overlap.\" bp\\n\";\n  print \"Minimum overlap match: \".$from_assembly->min_identity.\" %\\n\";\n  print \"Average overlap match: \".$from_assembly->avg_identity.\" %\\n\";\n\n  # Assuming the assembly object contains sequences from several different\n  # metagenomes, we have a mixed contig spectrum from which a cross contig\n  # spectrum and dissolved contig spectra can be obtained\n  my $mixed_csp = $from_assembly;\n\n  # Calculate a dissolved contig spectrum\n  my $meta1_dissolved = Bio::Assembly::Tools::ContigSpectrum->new(\n    -dissolve => [$mixed_csp, 'metagenome1'] );\n  my $meta2_dissolved = Bio::Assembly::Tools::ContigSpectrum->new(\n    -dissolve => [$mixed_csp, 'metagenome2'] );\n  print \"The dissolved contig spectra are:\\n\".\n    $meta1_dissolved->to_string.\"\\n\".\n    $meta2_dissolved->to_string.\"\\n\";\n\n  # Determine a cross contig spectrum\n  my $cross_csp = Bio::Assembly::Tools::ContigSpectrum->new(\n    -cross => $mixed_csp );\n  print \"The cross contig spectrum is \".$cross_csp->to_string.\"\\n\";\n\n  # Score a contig spectrum (the more abundant the contigs and the larger their\n  # size, the larger the score)\n\n\n=head1 DESCRIPTION\n\nThe Bio::Assembly::Tools::ContigSpectrum Perl module enables to\nmanually create contig spectra, import them from assemblies,\nmanipulate them, transform between different types of contig spectra\nand output 
them.\n\nBio::Assembly::Tools::ContigSpectrum is a module to create, manipulate\nand output contig spectra, assembly-derived data used in metagenomics\n(community genomics) for diversity estimation.\n\n=head2 Background\n\nA contig spectrum is the count of the number of contigs of different\nsize in an assembly. For example, the contig spectrum [100 5 1 0 0\n...] means that there were 100 singlets (1-contigs), 5 contigs of 2\nsequences (2-contigs), 1 contig of 3 sequences (3-contig) and no\nlarger contigs.\n\nAn assembly can be produced from a mixture of sequences from different\nmetagenomes. The contig obtained from this assembly is a mixed contig\nspectrum. The contribution of each metagenome in this mixed contig\nspectrum can be obtained by determining a dissolved contig spectrum.\n\nFinally, based on a mixed contig spectrum, a cross contig spectrum can\nbe determined. In a cross contig spectrum, only contigs containing\nsequences from different metagenomes are kept; \"pure\" contigs are\nexcluded. Additionally, the total number of singletons (1-contigs)\nfrom each region that assembles with any fragments from other regions\nis the number of 1-contigs in the cross contig spectrum.\n\n=head2 Implementation\n\nThe simplest representation of a contig spectrum is as a hash\nrepresentation where the key is the contig size (number of sequences\nmaking up the contig) and the value the number of contigs of this\nsize.\n\nIn fact, it is useful to have more information associated with the\ncontig spectrum, hence the Bio::Assembly::Tools::ContigSpectrum module\nimplements an object containing a contig spectrum hash and additional\ninformation. 
The get/set methods to access them are:\n\n    id              contig spectrum ID\n    nof_seq         number of sequences\n    nof_rep         number of repetitions (assemblies) used\n    max_size        size of (number of sequences in) the largest contig\n    nof_overlaps    number of overlaps\n    min_overlap     minimum overlap length for building a contig\n    min_identity    minimum sequence identity over the overlap length\n    avg_overlap     average overlap length\n    avg_identity    average overlap identity\n    avg_seq_length  average sequence length\n    eff_asm_params  effective assembly parameters\n    spectrum        hash representation of a contig spectrum\n\n  Operations on the contig spectra:\n\n    to_string       create a string representation of the spectrum\n    spectrum        import a hash contig spectrum\n    assembly        determine a contig spectrum from an assembly\n    dissolve        calculate a dissolved contig spectrum (based on assembly)\n    cross           produce a cross contig spectrum (based on assembly)\n    add             add a contig spectrum to an existing one\n    average         make an average of several contig spectra\n\nWhen using operations that rely on knowing \"where\" (from what\nmetagenomes) a sequence came from (i.e. when creating a dissolved or\ncross contig spectrum), make sure that the sequences used for the\nassembly have a name header, e.g.  E<gt>metagenome1|seq1,\nE<gt>metagenome2|seq1, ...\n\n=head1 FEEDBACK\n\n=head2 Mailing Lists\n\nUser feedback is an integral part of the evolution of this and other\nBioperl modules. 
Send your comments and suggestions preferably to the\nBioperl mailing lists. Your participation is much appreciated.\n\n  bioperl-l@bioperl.org                  - General discussion\n  http://bioperl.org/wiki/Mailing_lists  - About the mailing lists\n\n\n=head2 Support \n\nPlease direct usage questions or support issues to the mailing list:\n\nI<bioperl-l@bioperl.org>\n\nrather than to the module maintainer directly. Many experienced and \nresponsive experts will be able to look at the problem and quickly \naddress it. Please include a thorough description of the problem \nwith code and data examples if at all possible.\n\n=head2 Reporting Bugs\n\nReport bugs to the BioPerl bug tracking system to help us keep track\nof the bugs and their resolution. Bug reports can be submitted via email\nor the web:\n\n  bioperl-bugs@bio.perl.org\n  http://bugzilla.bioperl.org/\n\n=head1 AUTHOR - Florent E Angly\n\nEmail florent_dot_angly_at_gmail_dot_com\n\n=head1 APPENDIX\n\nThe rest of the documentation details each of the object\nmethods. 
Internal methods are usually preceded with a \"_\".\n\n\npackage Bio::Assembly::Tools::ContigSpectrum;\n\nuse strict;\n\nuse Bio::Root::Root;\nuse Bio::Assembly::Scaffold;\nuse Bio::SimpleAlign;\nuse Bio::LocatableSeq;\nuse Bio::Align::PairwiseStatistics;\n\nuse base 'Bio::Root::Root';\n\n\n=head2 new\n\n  Title   : new\n  Usage   : my $csp = Bio::Assembly::Tools::ContigSpectrum->new();\n              or\n            my $csp = Bio::Assembly::Tools::ContigSpectrum->new(\n              -id => 'some_name',\n              -spectrum =>  { 1 => 90 , 2 => 3 , 4 => 1 },\n            );\n              or\n            my $csp = Bio::Assembly::Tools::ContigSpectrum->new(\n              -assembly =>  $assembly_obj\n            );\n  Function: create a new contig spectrum object\n  Returns : reference to a contig spectrum object\n  Args    : none\n\n\nsub new {\n  my ($class, @args) = @_;\n  my $self = $class->SUPER::new(@args);\n  my ( $id, $nof_seq, $nof_rep, $max_size, $nof_overlaps, $min_overlap,\n    $min_identity, $avg_overlap, $avg_identity, $avg_seq_len, $spectrum,\n    $assembly, $eff_asm_params, $dissolve, $cross) = $self->_rearrange( [qw(ID\n    NOF_SEQ NOF_REP MAX_SIZE NOF_OVERLAPS MIN_OVERLAP MIN_IDENTITY AVG_OVERLAP\n    AVG_IDENTITY AVG_SEQ_LEN SPECTRUM ASSEMBLY EFF_ASM_PARAMS DISSOLVE CROSS)],\n    @args );\n\n  # First set up some defauts\n  $self->{'_id'}             = 'NoName';\n  $self->{'_nof_seq'}        = 0;\n  $self->{'_nof_rep'}        = 0;\n  $self->{'_max_size'}       = 0;\n  $self->{'_nof_overlaps'}   = 0;\n  $self->{'_min_overlap'}    = undef;\n  $self->{'_min_identity'}   = undef;\n  $self->{'_avg_overlap'}    = 0;\n  $self->{'_avg_identity'}   = 0;\n  $self->{'_avg_seq_len'}    = 0;\n  $self->{'_eff_asm_params'} = 0;\n  $self->{'_spectrum'}       = {1 => 0};  # contig spectrum hash representation\n  $self->{'_assembly'}       = []; # list of assembly objects used\n\n  # Then, according to user desires, override defaults\n  $self->{'_id'}           
  = $id             if (defined $id);\n  $self->{'_nof_seq'}        = $nof_seq        if (defined $nof_seq);\n  $self->{'_nof_rep'}        = $nof_rep        if (defined $nof_rep);\n  $self->{'_max_size'}       = $max_size       if (defined $max_size);\n  $self->{'_nof_overlaps'}   = $nof_overlaps   if (defined $nof_overlaps);\n  $self->{'_min_overlap'}    = $min_overlap    if (defined $min_overlap);\n  $self->{'_avg_overlap'}    = $avg_overlap    if (defined $avg_overlap);\n  $self->{'_min_identity'}   = $min_identity   if (defined $min_identity);\n  $self->{'_avg_identity'}   = $avg_identity   if (defined $avg_identity);\n  $self->{'_avg_seq_len'}    = $avg_seq_len    if (defined $avg_seq_len);\n  $self->{'_eff_asm_params'} = $eff_asm_params if (defined $eff_asm_params);\n\n  # Finally get stuff that can be gotten in an automated way\n  $self->_import_spectrum($spectrum) if defined($spectrum);\n  $self->_import_assembly($assembly) if defined($assembly);\n  if (defined($dissolve)) {\n    my ($mixed_csp, $header) = (@$dissolve[0], @$dissolve[1]);\n    $self->_import_dissolved_csp($mixed_csp, $header);\n  }\n  $self->_import_cross_csp($cross)   if defined($cross);\n\n  return $self;\n}\n\n\n=head2 id\n\n  Title   : id\n  Usage   : $csp->id\n  Function: get/set contig spectrum id\n  Returns : string\n  Args    : string [optional]\n\n\nsub id {\n  my ($self, $id) = @_;\n  if (defined $id) {\n    $self->{'_id'} = $id;\n  }\n  $id = $self->{'_id'};\n  return $id;\n}\n\n\n=head2 nof_seq\n\n  Title   : nof_seq\n  Usage   : $csp->nof_seq\n  Function: get/set the number of sequences making up the contig spectrum\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub nof_seq {\n  my ($self, $nof_seq) = @_;\n  if (defined $nof_seq) {\n    $self->throw(\"The number of sequences must be strictly positive. 
Got \".\n      \"'$nof_seq'\") if $nof_seq < 1;\n    $self->{'_nof_seq'} = $nof_seq;\n  }\n  $nof_seq = $self->{'_nof_seq'};\n  return $nof_seq;\n}\n\n\n=head2 nof_rep\n\n  Title   : nof_rep\n  Usage   : $csp->nof_rep\n  Function: Get/Set the number of repetitions (assemblies) used to create the \n            contig spectrum\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub nof_rep {\n  my ($self, $nof_rep) = @_;\n  if (defined $nof_rep) {\n    $self->throw(\"The number of repetitions must be strictly positive. Got \".\n      \"'$nof_rep'\") if $nof_rep < 1;\n    $self->{'_nof_rep'} = $nof_rep;\n  }\n  $nof_rep = $self->{'_nof_rep'};\n  return $nof_rep;\n}\n\n\n=head2 max_size\n\n  Title   : max_size\n  Usage   : $csp->max_size\n  Function: get/set the size of (number of sequences in) the largest contig\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub max_size {\n  my ($self, $max_size) = @_;\n  if (defined $max_size) {\n    $self->throw(\"The contig maximum size must be strictly positive. Got \".\n      \"'$max_size'\") if $max_size < 1;\n    $self->{'_max_size'} = $max_size;\n  }\n  $max_size = $self->{'_max_size'};\n  return $max_size;\n}\n\n\n=head2 nof_overlaps\n\n  Title   : nof_overlaps\n  Usage   : $csp->nof_overlaps\n  Function: Get/Set the number of overlaps in the assembly.\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub nof_overlaps {\n  my ($self, $nof_overlaps) = @_;\n  if (defined $nof_overlaps) {\n    $self->throw(\"The number of overlaps must be strictly positive. 
Got \".\n      \"'$nof_overlaps'\") if $nof_overlaps < 1;\n    $self->{'_nof_overlaps'} = $nof_overlaps;\n  }\n  $nof_overlaps = $self->{'_nof_overlaps'};\n  return $nof_overlaps;\n}\n\n\n=head2 min_overlap\n\n  Title   : min_overlap\n  Usage   : $csp->min_overlap\n  Function: get/set the assembly minimum overlap length\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub min_overlap {\n  my ($self, $min_overlap) = @_;\n  if (defined $min_overlap) {\n    $self->throw(\"The minimum of overlap length must be strictly positive. Got\".\n      \" '$min_overlap'\") if $min_overlap < 1;\n    $self->{'_min_overlap'} = $min_overlap;\n  }\n  $min_overlap = $self->{'_min_overlap'};\n  return $min_overlap;\n}\n\n\n=head2 avg_overlap\n\n  Title   : avg_overlap\n  Usage   : $csp->avg_overlap\n  Function: get/set the assembly average overlap length\n  Returns : decimal\n  Args    : decimal [optional]\n\n\nsub avg_overlap {\n  my ($self, $avg_overlap) = @_;\n  if (defined $avg_overlap) {\n    $self->throw(\"The average overlap length must be strictly positive. Got \".\n      \"'$avg_overlap'\") if $avg_overlap < 1;\n    $self->{'_avg_overlap'} = $avg_overlap;\n  }\n  $avg_overlap = $self->{'_avg_overlap'};\n  return $avg_overlap;\n}\n\n\n=head2 min_identity\n\n  Title   : min_identity\n  Usage   : $csp->min_identity\n  Function: get/set the assembly minimum overlap identity percent\n  Returns : 0 < decimal < 100\n  Args    : 0 < decimal < 100 [optional]\n\n\nsub min_identity {\n  my ($self, $min_identity) = @_;\n  if (defined $min_identity) {\n    $self->throw(\"The minimum overlap percent identity must be strictly \".\n      \"positive. 
Got '$min_identity'\") if $min_identity < 1;\n    $self->{'_min_identity'} = $min_identity;\n  }\n  $min_identity = $self->{'_min_identity'};\n  return $min_identity;\n}\n\n\n=head2 avg_identity\n\n  Title   : avg_identity\n  Usage   : $csp->avg_identity\n  Function: get/set the assembly average overlap identity percent\n  Returns : 0 < decimal < 100\n  Args    : 0 < decimal < 100 [optional]\n\n\nsub avg_identity {\n  my ($self, $avg_identity) = @_;\n  if (defined $avg_identity) {\n    $self->throw(\"The average overlap percent identity must be strictly \".\n      \"positive. Got '$avg_identity'\") if $avg_identity < 1;\n    $self->{'_avg_identity'} = $avg_identity;\n  }\n  $avg_identity = $self->{'_avg_identity'};\n  return $avg_identity;\n}\n\n\n=head2 avg_seq_len\n\n  Title   : avg_seq_len\n  Usage   : $csp->avg_seq_len\n  Function: get/set the assembly average sequence length\n  Returns : avg_seq_len\n  Args    : real [optional]","parameters":[{"label":"$self"},{"label":"$avg_seq_len"}]},"line":485,"kind":12,"range":{"start":{"character":0,"line":485},"end":{"line":494,"character":9999}},"definition":"sub","detail":"($self,$avg_seq_len)","children":[{"name":"$self","localvar":"my","containerName":"avg_seq_len","kind":13,"line":486,"definition":"my"},{"name":"$avg_seq_len","containerName":"avg_seq_len","kind":13,"line":486},{"kind":13,"containerName":"avg_seq_len","name":"$avg_seq_len","line":487},{"name":"$self","containerName":"avg_seq_len","kind":13,"line":488},{"line":488,"name":"throw","containerName":"avg_seq_len","kind":12},{"line":489,"containerName":"avg_seq_len","kind":13,"name":"$avg_seq_len"},{"name":"$self","containerName":"avg_seq_len","kind":13,"line":490},{"name":"$avg_seq_len","containerName":"avg_seq_len","kind":13,"line":490},{"line":492,"containerName":"avg_seq_len","kind":13,"name":"$avg_seq_len"},{"kind":13,"containerName":"avg_seq_len","name":"$self","line":492},{"name":"$avg_seq_len","containerName":"avg_seq_len","kind":13,"line":493}],"na
me":"avg_seq_len","containerName":"main::"},{"line":512,"range":{"start":{"character":0,"line":512},"end":{"line":521,"character":9999}},"kind":12,"signature":{"parameters":[{"label":"$self"},{"label":"$eff_asm_params"}],"documentation":"__END__\n#\n# BioPerl module for Bio::Assembly::Tools::ContigSpectrum\n#\n# Copyright by Florent Angly\n#\n# You may distribute this module under the same terms as Perl itself\n#\n# POD documentation - main docs before the code\n\n=head1 NAME\n\nBio::Assembly::Tools::ContigSpectrum - create and manipulate contig spectra\n\n=head1 SYNOPSIS\n\n  # Simple contig spectrum creation\n  my $csp1 = Bio::Assembly::Tools::ContigSpectrum->new(\n    -id       => 'csp1',\n    -spectrum => { 1 => 10,\n                   2 => 2,\n                   3 => 1 } );\n\n  # ...or another way to create a simple contig spectrum\n  my $csp2 = Bio::Assembly::Tools::ContigSpectrum->new;\n  $csp2->id('csp2');\n  $csp2->spectrum({ 1 => 20, 2 => 1, 4 => 1 });\n\n  # Get some information\n  print \"This is contig spectrum \".$csp->id.\"\\n\";\n  print \"It contains \".$csp->nof_seq.\" sequences\\n\";\n  print \"The largest contig has \".$csp->max_size.\" sequences\\n\";\n  print \"The spectrum is: \".$csp->to_string($csp->spectrum).\"\\n\";\n\n  # Let's add the contig spectra\n  my $summed_csp = Bio::Assembly::Tools::ContigSpectrum->new;\n  $summed_csp->add($csp1);\n  $summed_csp->add($csp2);\n  print \"The summed contig spectrum is \".$summed_csp->to_string.\"\\n\";\n\n  # Make an average\n  my $avg_csp = Bio::Assembly::Tools::ContigSpectrum->new;\n  $avg_csp = $avg_csp->average([$csp1, $csp2]);\n  print \"The average contig spectrum is \".$avg_csp->to_string.\"\\n\";\n\n  # Get a contig spectrum from an assembly\n  my $from_assembly = Bio::Assembly::Tools::ContigSpectrum->new(\n    -assembly       => $assembly_object,\n    -eff_asm_params => 1);\n  print \"The contig spectrum from assembly is \".$from_assembly->to_string.\"\\n\";\n\n  # Report advanced 
information (possible because eff_asm_params = 1)\n  print \"Average sequence length: \".$from_assembly->avg_seq_length.\" bp\\n\";\n  print \"Minimum overlap length: \".$from_assembly->min_overlap.\" bp\\n\";\n  print \"Average overlap length: \".$from_assembly->avg_overlap.\" bp\\n\";\n  print \"Minimum overlap match: \".$from_assembly->min_identity.\" %\\n\";\n  print \"Average overlap match: \".$from_assembly->avg_identity.\" %\\n\";\n\n  # Assuming the assembly object contains sequences from several different\n  # metagenomes, we have a mixed contig spectrum from which a cross contig\n  # spectrum and dissolved contig spectra can be obtained\n  my $mixed_csp = $from_assembly;\n\n  # Calculate a dissolved contig spectrum\n  my $meta1_dissolved = Bio::Assembly::Tools::ContigSpectrum->new(\n    -dissolve => [$mixed_csp, 'metagenome1'] );\n  my $meta2_dissolved = Bio::Assembly::Tools::ContigSpectrum->new(\n    -dissolve => [$mixed_csp, 'metagenome2'] );\n  print \"The dissolved contig spectra are:\\n\".\n    $meta1_dissolved->to_string.\"\\n\".\n    $meta2_dissolved->to_string.\"\\n\";\n\n  # Determine a cross contig spectrum\n  my $cross_csp = Bio::Assembly::Tools::ContigSpectrum->new(\n    -cross => $mixed_csp );\n  print \"The cross contig spectrum is \".$cross_csp->to_string.\"\\n\";\n\n  # Score a contig spectrum (the more abundant the contigs and the larger their\n  # size, the larger the score)\n\n\n=head1 DESCRIPTION\n\nThe Bio::Assembly::Tools::ContigSpectrum Perl module enables to\nmanually create contig spectra, import them from assemblies,\nmanipulate them, transform between different types of contig spectra\nand output them.\n\nBio::Assembly::Tools::ContigSpectrum is a module to create, manipulate\nand output contig spectra, assembly-derived data used in metagenomics\n(community genomics) for diversity estimation.\n\n=head2 Background\n\nA contig spectrum is the count of the number of contigs of different\nsize in an assembly. 
For example, the contig spectrum [100 5 1 0 0\n...] means that there were 100 singlets (1-contigs), 5 contigs of 2\nsequences (2-contigs), 1 contig of 3 sequences (3-contig) and no\nlarger contigs.\n\nAn assembly can be produced from a mixture of sequences from different\nmetagenomes. The contig obtained from this assembly is a mixed contig\nspectrum. The contribution of each metagenome in this mixed contig\nspectrum can be obtained by determining a dissolved contig spectrum.\n\nFinally, based on a mixed contig spectrum, a cross contig spectrum can\nbe determined. In a cross contig spectrum, only contigs containing\nsequences from different metagenomes are kept; \"pure\" contigs are\nexcluded. Additionally, the total number of singletons (1-contigs)\nfrom each region that assembles with any fragments from other regions\nis the number of 1-contigs in the cross contig spectrum.\n\n=head2 Implementation\n\nThe simplest representation of a contig spectrum is as a hash\nrepresentation where the key is the contig size (number of sequences\nmaking up the contig) and the value the number of contigs of this\nsize.\n\nIn fact, it is useful to have more information associated with the\ncontig spectrum, hence the Bio::Assembly::Tools::ContigSpectrum module\nimplements an object containing a contig spectrum hash and additional\ninformation. 
The get/set methods to access them are:\n\n    id              contig spectrum ID\n    nof_seq         number of sequences\n    nof_rep         number of repetitions (assemblies) used\n    max_size        size of (number of sequences in) the largest contig\n    nof_overlaps    number of overlaps\n    min_overlap     minimum overlap length for building a contig\n    min_identity    minimum sequence identity over the overlap length\n    avg_overlap     average overlap length\n    avg_identity    average overlap identity\n    avg_seq_len     average sequence length\n    eff_asm_params  effective assembly parameters\n    spectrum        hash representation of a contig spectrum\n\n  Operations on the contig spectra:\n\n    to_string       create a string representation of the spectrum\n    spectrum        import a hash contig spectrum\n    assembly        determine a contig spectrum from an assembly\n    dissolve        calculate a dissolved contig spectrum (based on assembly)\n    cross           produce a cross contig spectrum (based on assembly)\n    add             add a contig spectrum to an existing one\n    average         make an average of several contig spectra\n\nWhen using operations that rely on knowing \"where\" (from what\nmetagenomes) a sequence came from (i.e. when creating a dissolved or\ncross contig spectrum), make sure that the sequences used for the\nassembly have a name header, e.g.  E<gt>metagenome1|seq1,\nE<gt>metagenome2|seq1, ...\n\n=head1 FEEDBACK\n\n=head2 Mailing Lists\n\nUser feedback is an integral part of the evolution of this and other\nBioperl modules. 
Send your comments and suggestions preferably to the\nBioperl mailing lists. Your participation is much appreciated.\n\n  bioperl-l@bioperl.org                  - General discussion\n  http://bioperl.org/wiki/Mailing_lists  - About the mailing lists\n\n\n=head2 Support \n\nPlease direct usage questions or support issues to the mailing list:\n\nI<bioperl-l@bioperl.org>\n\nrather than to the module maintainer directly. Many experienced and \nresponsive experts will be able to look at the problem and quickly \naddress it. Please include a thorough description of the problem \nwith code and data examples if at all possible.\n\n=head2 Reporting Bugs\n\nReport bugs to the BioPerl bug tracking system to help us keep track\nof the bugs and their resolution. Bug reports can be submitted via email\nor the web:\n\n  bioperl-bugs@bio.perl.org\n  http://bugzilla.bioperl.org/\n\n=head1 AUTHOR - Florent E Angly\n\nEmail florent_dot_angly_at_gmail_dot_com\n\n=head1 APPENDIX\n\nThe rest of the documentation details each of the object\nmethods. 
Internal methods are usually preceded with a \"_\".\n\n\npackage Bio::Assembly::Tools::ContigSpectrum;\n\nuse strict;\n\nuse Bio::Root::Root;\nuse Bio::Assembly::Scaffold;\nuse Bio::SimpleAlign;\nuse Bio::LocatableSeq;\nuse Bio::Align::PairwiseStatistics;\n\nuse base 'Bio::Root::Root';\n\n\n=head2 new\n\n  Title   : new\n  Usage   : my $csp = Bio::Assembly::Tools::ContigSpectrum->new();\n              or\n            my $csp = Bio::Assembly::Tools::ContigSpectrum->new(\n              -id => 'some_name',\n              -spectrum =>  { 1 => 90 , 2 => 3 , 4 => 1 },\n            );\n              or\n            my $csp = Bio::Assembly::Tools::ContigSpectrum->new(\n              -assembly =>  $assembly_obj\n            );\n  Function: create a new contig spectrum object\n  Returns : reference to a contig spectrum object\n  Args    : none\n\n\nsub new {\n  my ($class, @args) = @_;\n  my $self = $class->SUPER::new(@args);\n  my ( $id, $nof_seq, $nof_rep, $max_size, $nof_overlaps, $min_overlap,\n    $min_identity, $avg_overlap, $avg_identity, $avg_seq_len, $spectrum,\n    $assembly, $eff_asm_params, $dissolve, $cross) = $self->_rearrange( [qw(ID\n    NOF_SEQ NOF_REP MAX_SIZE NOF_OVERLAPS MIN_OVERLAP MIN_IDENTITY AVG_OVERLAP\n    AVG_IDENTITY AVG_SEQ_LEN SPECTRUM ASSEMBLY EFF_ASM_PARAMS DISSOLVE CROSS)],\n    @args );\n\n  # First set up some defauts\n  $self->{'_id'}             = 'NoName';\n  $self->{'_nof_seq'}        = 0;\n  $self->{'_nof_rep'}        = 0;\n  $self->{'_max_size'}       = 0;\n  $self->{'_nof_overlaps'}   = 0;\n  $self->{'_min_overlap'}    = undef;\n  $self->{'_min_identity'}   = undef;\n  $self->{'_avg_overlap'}    = 0;\n  $self->{'_avg_identity'}   = 0;\n  $self->{'_avg_seq_len'}    = 0;\n  $self->{'_eff_asm_params'} = 0;\n  $self->{'_spectrum'}       = {1 => 0};  # contig spectrum hash representation\n  $self->{'_assembly'}       = []; # list of assembly objects used\n\n  # Then, according to user desires, override defaults\n  $self->{'_id'}           
  = $id             if (defined $id);\n  $self->{'_nof_seq'}        = $nof_seq        if (defined $nof_seq);\n  $self->{'_nof_rep'}        = $nof_rep        if (defined $nof_rep);\n  $self->{'_max_size'}       = $max_size       if (defined $max_size);\n  $self->{'_nof_overlaps'}   = $nof_overlaps   if (defined $nof_overlaps);\n  $self->{'_min_overlap'}    = $min_overlap    if (defined $min_overlap);\n  $self->{'_avg_overlap'}    = $avg_overlap    if (defined $avg_overlap);\n  $self->{'_min_identity'}   = $min_identity   if (defined $min_identity);\n  $self->{'_avg_identity'}   = $avg_identity   if (defined $avg_identity);\n  $self->{'_avg_seq_len'}    = $avg_seq_len    if (defined $avg_seq_len);\n  $self->{'_eff_asm_params'} = $eff_asm_params if (defined $eff_asm_params);\n\n  # Finally get stuff that can be gotten in an automated way\n  $self->_import_spectrum($spectrum) if defined($spectrum);\n  $self->_import_assembly($assembly) if defined($assembly);\n  if (defined($dissolve)) {\n    my ($mixed_csp, $header) = (@$dissolve[0], @$dissolve[1]);\n    $self->_import_dissolved_csp($mixed_csp, $header);\n  }\n  $self->_import_cross_csp($cross)   if defined($cross);\n\n  return $self;\n}\n\n\n=head2 id\n\n  Title   : id\n  Usage   : $csp->id\n  Function: get/set contig spectrum id\n  Returns : string\n  Args    : string [optional]\n\n\nsub id {\n  my ($self, $id) = @_;\n  if (defined $id) {\n    $self->{'_id'} = $id;\n  }\n  $id = $self->{'_id'};\n  return $id;\n}\n\n\n=head2 nof_seq\n\n  Title   : nof_seq\n  Usage   : $csp->nof_seq\n  Function: get/set the number of sequences making up the contig spectrum\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub nof_seq {\n  my ($self, $nof_seq) = @_;\n  if (defined $nof_seq) {\n    $self->throw(\"The number of sequences must be strictly positive. 
Got \".\n      \"'$nof_seq'\") if $nof_seq < 1;\n    $self->{'_nof_seq'} = $nof_seq;\n  }\n  $nof_seq = $self->{'_nof_seq'};\n  return $nof_seq;\n}\n\n\n=head2 nof_rep\n\n  Title   : nof_rep\n  Usage   : $csp->nof_rep\n  Function: Get/Set the number of repetitions (assemblies) used to create the \n            contig spectrum\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub nof_rep {\n  my ($self, $nof_rep) = @_;\n  if (defined $nof_rep) {\n    $self->throw(\"The number of repetitions must be strictly positive. Got \".\n      \"'$nof_rep'\") if $nof_rep < 1;\n    $self->{'_nof_rep'} = $nof_rep;\n  }\n  $nof_rep = $self->{'_nof_rep'};\n  return $nof_rep;\n}\n\n\n=head2 max_size\n\n  Title   : max_size\n  Usage   : $csp->max_size\n  Function: get/set the size of (number of sequences in) the largest contig\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub max_size {\n  my ($self, $max_size) = @_;\n  if (defined $max_size) {\n    $self->throw(\"The contig maximum size must be strictly positive. Got \".\n      \"'$max_size'\") if $max_size < 1;\n    $self->{'_max_size'} = $max_size;\n  }\n  $max_size = $self->{'_max_size'};\n  return $max_size;\n}\n\n\n=head2 nof_overlaps\n\n  Title   : nof_overlaps\n  Usage   : $csp->nof_overlaps\n  Function: Get/Set the number of overlaps in the assembly.\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub nof_overlaps {\n  my ($self, $nof_overlaps) = @_;\n  if (defined $nof_overlaps) {\n    $self->throw(\"The number of overlaps must be strictly positive. 
Got \".\n      \"'$nof_overlaps'\") if $nof_overlaps < 1;\n    $self->{'_nof_overlaps'} = $nof_overlaps;\n  }\n  $nof_overlaps = $self->{'_nof_overlaps'};\n  return $nof_overlaps;\n}\n\n\n=head2 min_overlap\n\n  Title   : min_overlap\n  Usage   : $csp->min_overlap\n  Function: get/set the assembly minimum overlap length\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub min_overlap {\n  my ($self, $min_overlap) = @_;\n  if (defined $min_overlap) {\n    $self->throw(\"The minimum of overlap length must be strictly positive. Got\".\n      \" '$min_overlap'\") if $min_overlap < 1;\n    $self->{'_min_overlap'} = $min_overlap;\n  }\n  $min_overlap = $self->{'_min_overlap'};\n  return $min_overlap;\n}\n\n\n=head2 avg_overlap\n\n  Title   : avg_overlap\n  Usage   : $csp->avg_overlap\n  Function: get/set the assembly average overlap length\n  Returns : decimal\n  Args    : decimal [optional]\n\n\nsub avg_overlap {\n  my ($self, $avg_overlap) = @_;\n  if (defined $avg_overlap) {\n    $self->throw(\"The average overlap length must be strictly positive. Got \".\n      \"'$avg_overlap'\") if $avg_overlap < 1;\n    $self->{'_avg_overlap'} = $avg_overlap;\n  }\n  $avg_overlap = $self->{'_avg_overlap'};\n  return $avg_overlap;\n}\n\n\n=head2 min_identity\n\n  Title   : min_identity\n  Usage   : $csp->min_identity\n  Function: get/set the assembly minimum overlap identity percent\n  Returns : 0 < decimal < 100\n  Args    : 0 < decimal < 100 [optional]\n\n\nsub min_identity {\n  my ($self, $min_identity) = @_;\n  if (defined $min_identity) {\n    $self->throw(\"The minimum overlap percent identity must be strictly \".\n      \"positive. 
Got '$min_identity'\") if $min_identity < 1;\n    $self->{'_min_identity'} = $min_identity;\n  }\n  $min_identity = $self->{'_min_identity'};\n  return $min_identity;\n}\n\n\n=head2 avg_identity\n\n  Title   : avg_identity\n  Usage   : $csp->avg_identity\n  Function: get/set the assembly average overlap identity percent\n  Returns : 0 < decimal < 100\n  Args    : 0 < decimal < 100 [optional]\n\n\nsub avg_identity {\n  my ($self, $avg_identity) = @_;\n  if (defined $avg_identity) {\n    $self->throw(\"The average overlap percent identity must be strictly \".\n      \"positive. Got '$avg_identity'\") if $avg_identity < 1;\n    $self->{'_avg_identity'} = $avg_identity;\n  }\n  $avg_identity = $self->{'_avg_identity'};\n  return $avg_identity;\n}\n\n\n=head2 avg_seq_len\n\n  Title   : avg_seq_len\n  Usage   : $csp->avg_seq_len\n  Function: get/set the assembly average sequence length\n  Returns : avg_seq_len\n  Args    : real [optional]\n\n\nsub avg_seq_len {\n  my ($self, $avg_seq_len) = @_;\n  if (defined $avg_seq_len) {\n    $self->throw(\"The average sequence length must be strictly positive. Got \".\n      \"'$avg_seq_len'\") if $avg_seq_len < 1;\n    $self->{'_avg_seq_len'} = $avg_seq_len;\n  }\n  $avg_seq_len = $self->{'_avg_seq_len'};\n  return $avg_seq_len;\n}\n\n\n=head2 eff_asm_params\n\n  Title   : eff_asm_params\n  Usage   : $csp->eff_asm_params(1)\n  Function: Get/set the effective assembly parameters option. It defines if the\n            effective assembly parameters should be determined when a contig\n            spectrum based or derived from an assembly is calulated. 
The\n            effective assembly parameters include avg_seq_length, nof_overlaps,\n            min_overlap, avg_overlap, min_identity and avg_identity.\n            1 = get them, 0 = don't.\n  Returns : integer\n  Args    : integer [optional]","label":"eff_asm_params($self,$eff_asm_params)"},"children":[{"definition":"my","localvar":"my","containerName":"eff_asm_params","kind":13,"name":"$self","line":513},{"line":513,"kind":13,"containerName":"eff_asm_params","name":"$eff_asm_params"},{"kind":13,"containerName":"eff_asm_params","name":"$eff_asm_params","line":514},{"name":"$self","containerName":"eff_asm_params","kind":13,"line":515},{"containerName":"eff_asm_params","kind":12,"name":"throw","line":515},{"line":516,"kind":13,"containerName":"eff_asm_params","name":"$eff_asm_params"},{"line":516,"name":"$eff_asm_params","containerName":"eff_asm_params","kind":13},{"line":517,"name":"$self","kind":13,"containerName":"eff_asm_params"},{"line":517,"name":"$eff_asm_params","containerName":"eff_asm_params","kind":13},{"name":"$eff_asm_params","kind":13,"containerName":"eff_asm_params","line":519},{"line":519,"kind":13,"containerName":"eff_asm_params","name":"$self"},{"name":"$eff_asm_params","kind":13,"containerName":"eff_asm_params","line":520}],"containerName":"main::","name":"eff_asm_params","definition":"sub","detail":"($self,$eff_asm_params)"},{"definition":"sub","detail":"($self,$spectrum)","children":[{"definition":"my","name":"$self","localvar":"my","containerName":"spectrum","kind":13,"line":540},{"line":540,"name":"$spectrum","kind":13,"containerName":"spectrum"},{"kind":13,"containerName":"spectrum","name":"$spectrum","line":541},{"name":"$self","containerName":"spectrum","kind":13,"line":542},{"kind":12,"containerName":"spectrum","name":"_import_spectrum","line":542},{"containerName":"spectrum","kind":13,"name":"$spectrum","line":542},{"name":"$spectrum","kind":13,"containerName":"spectrum","line":544},{"kind":13,"containerName":"spectrum","name":"$self","
line":544},{"line":545,"kind":13,"containerName":"spectrum","name":"$spectrum"}],"containerName":"main::","name":"spectrum","signature":{"label":"spectrum($self,$spectrum)","documentation":"__END__\n#\n# BioPerl module for Bio::Assembly::Tools::ContigSpectrum\n#\n# Copyright by Florent Angly\n#\n# You may distribute this module under the same terms as Perl itself\n#\n# POD documentation - main docs before the code\n\n=head1 NAME\n\nBio::Assembly::Tools::ContigSpectrum - create and manipulate contig spectra\n\n=head1 SYNOPSIS\n\n  # Simple contig spectrum creation\n  my $csp1 = Bio::Assembly::Tools::ContigSpectrum->new(\n    -id       => 'csp1',\n    -spectrum => { 1 => 10,\n                   2 => 2,\n                   3 => 1 } );\n\n  # ...or another way to create a simple contig spectrum\n  my $csp2 = Bio::Assembly::Tools::ContigSpectrum->new;\n  $csp2->id('csp2');\n  $csp2->spectrum({ 1 => 20, 2 => 1, 4 => 1 });\n\n  # Get some information\n  print \"This is contig spectrum \".$csp->id.\"\\n\";\n  print \"It contains \".$csp->nof_seq.\" sequences\\n\";\n  print \"The largest contig has \".$csp->max_size.\" sequences\\n\";\n  print \"The spectrum is: \".$csp->to_string($csp->spectrum).\"\\n\";\n\n  # Let's add the contig spectra\n  my $summed_csp = Bio::Assembly::Tools::ContigSpectrum->new;\n  $summed_csp->add($csp1);\n  $summed_csp->add($csp2);\n  print \"The summed contig spectrum is \".$summed_csp->to_string.\"\\n\";\n\n  # Make an average\n  my $avg_csp = Bio::Assembly::Tools::ContigSpectrum->new;\n  $avg_csp = $avg_csp->average([$csp1, $csp2]);\n  print \"The average contig spectrum is \".$avg_csp->to_string.\"\\n\";\n\n  # Get a contig spectrum from an assembly\n  my $from_assembly = Bio::Assembly::Tools::ContigSpectrum->new(\n    -assembly       => $assembly_object,\n    -eff_asm_params => 1);\n  print \"The contig spectrum from assembly is \".$from_assembly->to_string.\"\\n\";\n\n  # Report advanced information (possible because eff_asm_params = 1)\n  
print \"Average sequence length: \".$from_assembly->avg_seq_length.\" bp\\n\";\n  print \"Minimum overlap length: \".$from_assembly->min_overlap.\" bp\\n\";\n  print \"Average overlap length: \".$from_assembly->avg_overlap.\" bp\\n\";\n  print \"Minimum overlap match: \".$from_assembly->min_identity.\" %\\n\";\n  print \"Average overlap match: \".$from_assembly->avg_identity.\" %\\n\";\n\n  # Assuming the assembly object contains sequences from several different\n  # metagenomes, we have a mixed contig spectrum from which a cross contig\n  # spectrum and dissolved contig spectra can be obtained\n  my $mixed_csp = $from_assembly;\n\n  # Calculate a dissolved contig spectrum\n  my $meta1_dissolved = Bio::Assembly::Tools::ContigSpectrum->new(\n    -dissolve => [$mixed_csp, 'metagenome1'] );\n  my $meta2_dissolved = Bio::Assembly::Tools::ContigSpectrum->new(\n    -dissolve => [$mixed_csp, 'metagenome2'] );\n  print \"The dissolved contig spectra are:\\n\".\n    $meta1_dissolved->to_string.\"\\n\".\n    $meta2_dissolved->to_string.\"\\n\";\n\n  # Determine a cross contig spectrum\n  my $cross_csp = Bio::Assembly::Tools::ContigSpectrum->new(\n    -cross => $mixed_csp );\n  print \"The cross contig spectrum is \".$cross_csp->to_string.\"\\n\";\n\n  # Score a contig spectrum (the more abundant the contigs and the larger their\n  # size, the larger the score)\n\n\n=head1 DESCRIPTION\n\nThe Bio::Assembly::Tools::ContigSpectrum Perl module enables to\nmanually create contig spectra, import them from assemblies,\nmanipulate them, transform between different types of contig spectra\nand output them.\n\nBio::Assembly::Tools::ContigSpectrum is a module to create, manipulate\nand output contig spectra, assembly-derived data used in metagenomics\n(community genomics) for diversity estimation.\n\n=head2 Background\n\nA contig spectrum is the count of the number of contigs of different\nsize in an assembly. For example, the contig spectrum [100 5 1 0 0\n...] 
means that there were 100 singlets (1-contigs), 5 contigs of 2\nsequences (2-contigs), 1 contig of 3 sequences (3-contig) and no\nlarger contigs.\n\nAn assembly can be produced from a mixture of sequences from different\nmetagenomes. The contig obtained from this assembly is a mixed contig\nspectrum. The contribution of each metagenome in this mixed contig\nspectrum can be obtained by determining a dissolved contig spectrum.\n\nFinally, based on a mixed contig spectrum, a cross contig spectrum can\nbe determined. In a cross contig spectrum, only contigs containing\nsequences from different metagenomes are kept; \"pure\" contigs are\nexcluded. Additionally, the total number of singletons (1-contigs)\nfrom each region that assembles with any fragments from other regions\nis the number of 1-contigs in the cross contig spectrum.\n\n=head2 Implementation\n\nThe simplest representation of a contig spectrum is as a hash\nrepresentation where the key is the contig size (number of sequences\nmaking up the contig) and the value the number of contigs of this\nsize.\n\nIn fact, it is useful to have more information associated with the\ncontig spectrum, hence the Bio::Assembly::Tools::ContigSpectrum module\nimplements an object containing a contig spectrum hash and additional\ninformation. 
The get/set methods to access them are:\n\n    id              contig spectrum ID\n    nof_seq         number of sequences\n    nof_rep         number of repetitions (assemblies) used\n    max_size        size of (number of sequences in) the largest contig\n    nof_overlaps    number of overlaps\n    min_overlap     minimum overlap length for building a contig\n    min_identity    minimum sequence identity over the overlap length\n    avg_overlap     average overlap length\n    avg_identity    average overlap identity\n    avg_seq_len     average sequence length\n    eff_asm_params  effective assembly parameters\n    spectrum        hash representation of a contig spectrum\n\n  Operations on the contig spectra:\n\n    to_string       create a string representation of the spectrum\n    spectrum        import a hash contig spectrum\n    assembly        determine a contig spectrum from an assembly\n    dissolve        calculate a dissolved contig spectrum (based on assembly)\n    cross           produce a cross contig spectrum (based on assembly)\n    add             add a contig spectrum to an existing one\n    average         make an average of several contig spectra\n\nWhen using operations that rely on knowing \"where\" (from what\nmetagenomes) a sequence came from (i.e. when creating a dissolved or\ncross contig spectrum), make sure that the sequences used for the\nassembly have a name header, e.g.  E<gt>metagenome1|seq1,\nE<gt>metagenome2|seq1, ...\n\n=head1 FEEDBACK\n\n=head2 Mailing Lists\n\nUser feedback is an integral part of the evolution of this and other\nBioperl modules. 
Send your comments and suggestions preferably to the\nBioperl mailing lists. Your participation is much appreciated.\n\n  bioperl-l@bioperl.org                  - General discussion\n  http://bioperl.org/wiki/Mailing_lists  - About the mailing lists\n\n\n=head2 Support \n\nPlease direct usage questions or support issues to the mailing list:\n\nI<bioperl-l@bioperl.org>\n\nrather than to the module maintainer directly. Many experienced and \nresponsive experts will be able to look at the problem and quickly \naddress it. Please include a thorough description of the problem \nwith code and data examples if at all possible.\n\n=head2 Reporting Bugs\n\nReport bugs to the BioPerl bug tracking system to help us keep track\nof the bugs and their resolution. Bug reports can be submitted via email\nor the web:\n\n  bioperl-bugs@bio.perl.org\n  http://bugzilla.bioperl.org/\n\n=head1 AUTHOR - Florent E Angly\n\nEmail florent_dot_angly_at_gmail_dot_com\n\n=head1 APPENDIX\n\nThe rest of the documentation details each of the object\nmethods. 
Internal methods are usually preceded with a \"_\".\n\n\npackage Bio::Assembly::Tools::ContigSpectrum;\n\nuse strict;\n\nuse Bio::Root::Root;\nuse Bio::Assembly::Scaffold;\nuse Bio::SimpleAlign;\nuse Bio::LocatableSeq;\nuse Bio::Align::PairwiseStatistics;\n\nuse base 'Bio::Root::Root';\n\n\n=head2 new\n\n  Title   : new\n  Usage   : my $csp = Bio::Assembly::Tools::ContigSpectrum->new();\n              or\n            my $csp = Bio::Assembly::Tools::ContigSpectrum->new(\n              -id => 'some_name',\n              -spectrum =>  { 1 => 90 , 2 => 3 , 4 => 1 },\n            );\n              or\n            my $csp = Bio::Assembly::Tools::ContigSpectrum->new(\n              -assembly =>  $assembly_obj\n            );\n  Function: create a new contig spectrum object\n  Returns : reference to a contig spectrum object\n  Args    : none\n\n\nsub new {\n  my ($class, @args) = @_;\n  my $self = $class->SUPER::new(@args);\n  my ( $id, $nof_seq, $nof_rep, $max_size, $nof_overlaps, $min_overlap,\n    $min_identity, $avg_overlap, $avg_identity, $avg_seq_len, $spectrum,\n    $assembly, $eff_asm_params, $dissolve, $cross) = $self->_rearrange( [qw(ID\n    NOF_SEQ NOF_REP MAX_SIZE NOF_OVERLAPS MIN_OVERLAP MIN_IDENTITY AVG_OVERLAP\n    AVG_IDENTITY AVG_SEQ_LEN SPECTRUM ASSEMBLY EFF_ASM_PARAMS DISSOLVE CROSS)],\n    @args );\n\n  # First set up some defauts\n  $self->{'_id'}             = 'NoName';\n  $self->{'_nof_seq'}        = 0;\n  $self->{'_nof_rep'}        = 0;\n  $self->{'_max_size'}       = 0;\n  $self->{'_nof_overlaps'}   = 0;\n  $self->{'_min_overlap'}    = undef;\n  $self->{'_min_identity'}   = undef;\n  $self->{'_avg_overlap'}    = 0;\n  $self->{'_avg_identity'}   = 0;\n  $self->{'_avg_seq_len'}    = 0;\n  $self->{'_eff_asm_params'} = 0;\n  $self->{'_spectrum'}       = {1 => 0};  # contig spectrum hash representation\n  $self->{'_assembly'}       = []; # list of assembly objects used\n\n  # Then, according to user desires, override defaults\n  $self->{'_id'}           
  = $id             if (defined $id);\n  $self->{'_nof_seq'}        = $nof_seq        if (defined $nof_seq);\n  $self->{'_nof_rep'}        = $nof_rep        if (defined $nof_rep);\n  $self->{'_max_size'}       = $max_size       if (defined $max_size);\n  $self->{'_nof_overlaps'}   = $nof_overlaps   if (defined $nof_overlaps);\n  $self->{'_min_overlap'}    = $min_overlap    if (defined $min_overlap);\n  $self->{'_avg_overlap'}    = $avg_overlap    if (defined $avg_overlap);\n  $self->{'_min_identity'}   = $min_identity   if (defined $min_identity);\n  $self->{'_avg_identity'}   = $avg_identity   if (defined $avg_identity);\n  $self->{'_avg_seq_len'}    = $avg_seq_len    if (defined $avg_seq_len);\n  $self->{'_eff_asm_params'} = $eff_asm_params if (defined $eff_asm_params);\n\n  # Finally get stuff that can be gotten in an automated way\n  $self->_import_spectrum($spectrum) if defined($spectrum);\n  $self->_import_assembly($assembly) if defined($assembly);\n  if (defined($dissolve)) {\n    my ($mixed_csp, $header) = (@$dissolve[0], @$dissolve[1]);\n    $self->_import_dissolved_csp($mixed_csp, $header);\n  }\n  $self->_import_cross_csp($cross)   if defined($cross);\n\n  return $self;\n}\n\n\n=head2 id\n\n  Title   : id\n  Usage   : $csp->id\n  Function: get/set contig spectrum id\n  Returns : string\n  Args    : string [optional]\n\n\nsub id {\n  my ($self, $id) = @_;\n  if (defined $id) {\n    $self->{'_id'} = $id;\n  }\n  $id = $self->{'_id'};\n  return $id;\n}\n\n\n=head2 nof_seq\n\n  Title   : nof_seq\n  Usage   : $csp->nof_seq\n  Function: get/set the number of sequences making up the contig spectrum\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub nof_seq {\n  my ($self, $nof_seq) = @_;\n  if (defined $nof_seq) {\n    $self->throw(\"The number of sequences must be strictly positive. 
Got \".\n      \"'$nof_seq'\") if $nof_seq < 1;\n    $self->{'_nof_seq'} = $nof_seq;\n  }\n  $nof_seq = $self->{'_nof_seq'};\n  return $nof_seq;\n}\n\n\n=head2 nof_rep\n\n  Title   : nof_rep\n  Usage   : $csp->nof_rep\n  Function: Get/Set the number of repetitions (assemblies) used to create the \n            contig spectrum\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub nof_rep {\n  my ($self, $nof_rep) = @_;\n  if (defined $nof_rep) {\n    $self->throw(\"The number of repetitions must be strictly positive. Got \".\n      \"'$nof_rep'\") if $nof_rep < 1;\n    $self->{'_nof_rep'} = $nof_rep;\n  }\n  $nof_rep = $self->{'_nof_rep'};\n  return $nof_rep;\n}\n\n\n=head2 max_size\n\n  Title   : max_size\n  Usage   : $csp->max_size\n  Function: get/set the size of (number of sequences in) the largest contig\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub max_size {\n  my ($self, $max_size) = @_;\n  if (defined $max_size) {\n    $self->throw(\"The contig maximum size must be strictly positive. Got \".\n      \"'$max_size'\") if $max_size < 1;\n    $self->{'_max_size'} = $max_size;\n  }\n  $max_size = $self->{'_max_size'};\n  return $max_size;\n}\n\n\n=head2 nof_overlaps\n\n  Title   : nof_overlaps\n  Usage   : $csp->nof_overlaps\n  Function: Get/Set the number of overlaps in the assembly.\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub nof_overlaps {\n  my ($self, $nof_overlaps) = @_;\n  if (defined $nof_overlaps) {\n    $self->throw(\"The number of overlaps must be strictly positive. 
Got \".\n      \"'$nof_overlaps'\") if $nof_overlaps < 1;\n    $self->{'_nof_overlaps'} = $nof_overlaps;\n  }\n  $nof_overlaps = $self->{'_nof_overlaps'};\n  return $nof_overlaps;\n}\n\n\n=head2 min_overlap\n\n  Title   : min_overlap\n  Usage   : $csp->min_overlap\n  Function: get/set the assembly minimum overlap length\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub min_overlap {\n  my ($self, $min_overlap) = @_;\n  if (defined $min_overlap) {\n    $self->throw(\"The minimum of overlap length must be strictly positive. Got\".\n      \" '$min_overlap'\") if $min_overlap < 1;\n    $self->{'_min_overlap'} = $min_overlap;\n  }\n  $min_overlap = $self->{'_min_overlap'};\n  return $min_overlap;\n}\n\n\n=head2 avg_overlap\n\n  Title   : avg_overlap\n  Usage   : $csp->avg_overlap\n  Function: get/set the assembly average overlap length\n  Returns : decimal\n  Args    : decimal [optional]\n\n\nsub avg_overlap {\n  my ($self, $avg_overlap) = @_;\n  if (defined $avg_overlap) {\n    $self->throw(\"The average overlap length must be strictly positive. Got \".\n      \"'$avg_overlap'\") if $avg_overlap < 1;\n    $self->{'_avg_overlap'} = $avg_overlap;\n  }\n  $avg_overlap = $self->{'_avg_overlap'};\n  return $avg_overlap;\n}\n\n\n=head2 min_identity\n\n  Title   : min_identity\n  Usage   : $csp->min_identity\n  Function: get/set the assembly minimum overlap identity percent\n  Returns : 0 < decimal < 100\n  Args    : 0 < decimal < 100 [optional]\n\n\nsub min_identity {\n  my ($self, $min_identity) = @_;\n  if (defined $min_identity) {\n    $self->throw(\"The minimum overlap percent identity must be strictly \".\n      \"positive. 
Got '$min_identity'\") if $min_identity < 1;\n    $self->{'_min_identity'} = $min_identity;\n  }\n  $min_identity = $self->{'_min_identity'};\n  return $min_identity;\n}\n\n\n=head2 avg_identity\n\n  Title   : avg_identity\n  Usage   : $csp->avg_identity\n  Function: get/set the assembly average overlap identity percent\n  Returns : 0 < decimal < 100\n  Args    : 0 < decimal < 100 [optional]\n\n\nsub avg_identity {\n  my ($self, $avg_identity) = @_;\n  if (defined $avg_identity) {\n    $self->throw(\"The average overlap percent identity must be strictly \".\n      \"positive. Got '$avg_identity'\") if $avg_identity < 1;\n    $self->{'_avg_identity'} = $avg_identity;\n  }\n  $avg_identity = $self->{'_avg_identity'};\n  return $avg_identity;\n}\n\n\n=head2 avg_seq_len\n\n  Title   : avg_seq_len\n  Usage   : $csp->avg_seq_len\n  Function: get/set the assembly average sequence length\n  Returns : avg_seq_len\n  Args    : real [optional]\n\n\nsub avg_seq_len {\n  my ($self, $avg_seq_len) = @_;\n  if (defined $avg_seq_len) {\n    $self->throw(\"The average sequence length must be strictly positive. Got \".\n      \"'$avg_seq_len'\") if $avg_seq_len < 1;\n    $self->{'_avg_seq_len'} = $avg_seq_len;\n  }\n  $avg_seq_len = $self->{'_avg_seq_len'};\n  return $avg_seq_len;\n}\n\n\n=head2 eff_asm_params\n\n  Title   : eff_asm_params\n  Usage   : $csp->eff_asm_params(1)\n  Function: Get/set the effective assembly parameters option. It defines if the\n            effective assembly parameters should be determined when a contig\n            spectrum based or derived from an assembly is calulated. 
The\n            effective assembly parameters include avg_seq_length, nof_overlaps,\n            min_overlap, avg_overlap, min_identity and avg_identity.\n            1 = get them, 0 = don't.\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub eff_asm_params {\n  my ($self, $eff_asm_params) = @_;\n  if (defined $eff_asm_params) {\n    $self->throw(\"eff_asm_params can only take values 0 or 1. Input value was \".\n      \"'$eff_asm_params'\") unless $eff_asm_params == 0 || $eff_asm_params == 1;\n    $self->{'_eff_asm_params'} = $eff_asm_params;\n  }\n  $eff_asm_params = $self->{'_eff_asm_params'};\n  return $eff_asm_params;\n}\n\n\n=head2 spectrum\n\n  Title   : spectrum\n  Usage   : my $spectrum = $csp->spectrum({1=>10, 2=>2, 3=>1});\n  Function: Get the current contig spectrum represented as a hash / Update a\n            contig spectrum object based on a contig spectrum represented as a\n            hash\n            The hash representation of a contig spectrum is as following:\n              key   -> contig size (in number of sequences)\n              value -> number of contigs of this size\n  Returns : contig spectrum as a hash reference\n  Args    : contig spectrum as a hash reference [optional]","parameters":[{"label":"$self"},{"label":"$spectrum"}]},"line":539,"range":{"end":{"line":546,"character":9999},"start":{"line":539,"character":0}},"kind":12},{"signature":{"label":"assembly($self,$assembly)","parameters":[{"label":"$self"},{"label":"$assembly"}],"documentation":"__END__\n#\n# BioPerl module for Bio::Assembly::Tools::ContigSpectrum\n#\n# Copyright by Florent Angly\n#\n# You may distribute this module under the same terms as Perl itself\n#\n# POD documentation - main docs before the code\n\n=head1 NAME\n\nBio::Assembly::Tools::ContigSpectrum - create and manipulate contig spectra\n\n=head1 SYNOPSIS\n\n  # Simple contig spectrum creation\n  my $csp1 = Bio::Assembly::Tools::ContigSpectrum->new(\n    -id       => 'csp1',\n    -spectrum => { 1 
=> 10,\n                   2 => 2,\n                   3 => 1 } );\n\n  # ...or another way to create a simple contig spectrum\n  my $csp2 = Bio::Assembly::Tools::ContigSpectrum->new;\n  $csp2->id('csp2');\n  $csp2->spectrum({ 1 => 20, 2 => 1, 4 => 1 });\n\n  # Get some information\n  print \"This is contig spectrum \".$csp->id.\"\\n\";\n  print \"It contains \".$csp->nof_seq.\" sequences\\n\";\n  print \"The largest contig has \".$csp->max_size.\" sequences\\n\";\n  print \"The spectrum is: \".$csp->to_string($csp->spectrum).\"\\n\";\n\n  # Let's add the contig spectra\n  my $summed_csp = Bio::Assembly::Tools::ContigSpectrum->new;\n  $summed_csp->add($csp1);\n  $summed_csp->add($csp2);\n  print \"The summed contig spectrum is \".$summed_csp->to_string.\"\\n\";\n\n  # Make an average\n  my $avg_csp = Bio::Assembly::Tools::ContigSpectrum->new;\n  $avg_csp = $avg_csp->average([$csp1, $csp2]);\n  print \"The average contig spectrum is \".$avg_csp->to_string.\"\\n\";\n\n  # Get a contig spectrum from an assembly\n  my $from_assembly = Bio::Assembly::Tools::ContigSpectrum->new(\n    -assembly       => $assembly_object,\n    -eff_asm_params => 1);\n  print \"The contig spectrum from assembly is \".$from_assembly->to_string.\"\\n\";\n\n  # Report advanced information (possible because eff_asm_params = 1)\n  print \"Average sequence length: \".$from_assembly->avg_seq_length.\" bp\\n\";\n  print \"Minimum overlap length: \".$from_assembly->min_overlap.\" bp\\n\";\n  print \"Average overlap length: \".$from_assembly->avg_overlap.\" bp\\n\";\n  print \"Minimum overlap match: \".$from_assembly->min_identity.\" %\\n\";\n  print \"Average overlap match: \".$from_assembly->avg_identity.\" %\\n\";\n\n  # Assuming the assembly object contains sequences from several different\n  # metagenomes, we have a mixed contig spectrum from which a cross contig\n  # spectrum and dissolved contig spectra can be obtained\n  my $mixed_csp = $from_assembly;\n\n  # Calculate a dissolved contig 
spectrum\n  my $meta1_dissolved = Bio::Assembly::Tools::ContigSpectrum->new(\n    -dissolve => [$mixed_csp, 'metagenome1'] );\n  my $meta2_dissolved = Bio::Assembly::Tools::ContigSpectrum->new(\n    -dissolve => [$mixed_csp, 'metagenome2'] );\n  print \"The dissolved contig spectra are:\\n\".\n    $meta1_dissolved->to_string.\"\\n\".\n    $meta2_dissolved->to_string.\"\\n\";\n\n  # Determine a cross contig spectrum\n  my $cross_csp = Bio::Assembly::Tools::ContigSpectrum->new(\n    -cross => $mixed_csp );\n  print \"The cross contig spectrum is \".$cross_csp->to_string.\"\\n\";\n\n  # Score a contig spectrum (the more abundant the contigs and the larger their\n  # size, the larger the score)\n\n\n=head1 DESCRIPTION\n\nThe Bio::Assembly::Tools::ContigSpectrum Perl module enables to\nmanually create contig spectra, import them from assemblies,\nmanipulate them, transform between different types of contig spectra\nand output them.\n\nBio::Assembly::Tools::ContigSpectrum is a module to create, manipulate\nand output contig spectra, assembly-derived data used in metagenomics\n(community genomics) for diversity estimation.\n\n=head2 Background\n\nA contig spectrum is the count of the number of contigs of different\nsize in an assembly. For example, the contig spectrum [100 5 1 0 0\n...] means that there were 100 singlets (1-contigs), 5 contigs of 2\nsequences (2-contigs), 1 contig of 3 sequences (3-contig) and no\nlarger contigs.\n\nAn assembly can be produced from a mixture of sequences from different\nmetagenomes. The contig obtained from this assembly is a mixed contig\nspectrum. The contribution of each metagenome in this mixed contig\nspectrum can be obtained by determining a dissolved contig spectrum.\n\nFinally, based on a mixed contig spectrum, a cross contig spectrum can\nbe determined. In a cross contig spectrum, only contigs containing\nsequences from different metagenomes are kept; \"pure\" contigs are\nexcluded. 
Additionally, the total number of singletons (1-contigs)\nfrom each region that assembles with any fragments from other regions\nis the number of 1-contigs in the cross contig spectrum.\n\n=head2 Implementation\n\nThe simplest representation of a contig spectrum is as a hash\nrepresentation where the key is the contig size (number of sequences\nmaking up the contig) and the value the number of contigs of this\nsize.\n\nIn fact, it is useful to have more information associated with the\ncontig spectrum, hence the Bio::Assembly::Tools::ContigSpectrum module\nimplements an object containing a contig spectrum hash and additional\ninformation. The get/set methods to access them are:\n\n    id              contig spectrum ID\n    nof_seq         number of sequences\n    nof_rep         number of repetitions (assemblies) used\n    max_size        size of (number of sequences in) the largest contig\n    nof_overlaps    number of overlaps\n    min_overlap     minimum overlap length for building a contig\n    min_identity    minimum sequence identity over the overlap length\n    avg_overlap     average overlap length\n    avg_identity    average overlap identity\n    avg_seq_length  average sequence length\n    eff_asm_params  effective assembly parameters\n    spectrum        hash representation of a contig spectrum\n\n  Operations on the contig spectra:\n\n    to_string       create a string representation of the spectrum\n    spectrum        import a hash contig spectrum\n    assembly        determine a contig spectrum from an assembly\n    dissolve        calculate a dissolved contig spectrum (based on assembly)\n    cross           produce a cross contig spectrum (based on assembly)\n    add             add a contig spectrum to an existing one\n    average         make an average of several contig spectra\n\nWhen using operations that rely on knowing \"where\" (from what\nmetagenomes) a sequence came from (i.e. 
when creating a dissolved or\ncross contig spectrum), make sure that the sequences used for the\nassembly have a name header, e.g.  E<gt>metagenome1|seq1,\nE<gt>metagenome2|seq1, ...\n\n=head1 FEEDBACK\n\n=head2 Mailing Lists\n\nUser feedback is an integral part of the evolution of this and other\nBioperl modules. Send your comments and suggestions preferably to the\nBioperl mailing lists. Your participation is much appreciated.\n\n  bioperl-l@bioperl.org                  - General discussion\n  http://bioperl.org/wiki/Mailing_lists  - About the mailing lists\n\n\n=head2 Support \n\nPlease direct usage questions or support issues to the mailing list:\n\nI<bioperl-l@bioperl.org>\n\nrather than to the module maintainer directly. Many experienced and \nresponsive experts will be able to look at the problem and quickly \naddress it. Please include a thorough description of the problem \nwith code and data examples if at all possible.\n\n=head2 Reporting Bugs\n\nReport bugs to the BioPerl bug tracking system to help us keep track of\nthe bugs and their resolution. Bug reports can be submitted via email\nor the web:\n\n  bioperl-bugs@bio.perl.org\n  http://bugzilla.bioperl.org/\n\n=head1 AUTHOR - Florent E Angly\n\nEmail florent_dot_angly_at_gmail_dot_com\n\n=head1 APPENDIX\n\nThe rest of the documentation details each of the object\nmethods. 
Internal methods are usually preceded with a \"_\".\n\n\npackage Bio::Assembly::Tools::ContigSpectrum;\n\nuse strict;\n\nuse Bio::Root::Root;\nuse Bio::Assembly::Scaffold;\nuse Bio::SimpleAlign;\nuse Bio::LocatableSeq;\nuse Bio::Align::PairwiseStatistics;\n\nuse base 'Bio::Root::Root';\n\n\n=head2 new\n\n  Title   : new\n  Usage   : my $csp = Bio::Assembly::Tools::ContigSpectrum->new();\n              or\n            my $csp = Bio::Assembly::Tools::ContigSpectrum->new(\n              -id => 'some_name',\n              -spectrum =>  { 1 => 90 , 2 => 3 , 4 => 1 },\n            );\n              or\n            my $csp = Bio::Assembly::Tools::ContigSpectrum->new(\n              -assembly =>  $assembly_obj\n            );\n  Function: create a new contig spectrum object\n  Returns : reference to a contig spectrum object\n  Args    : none\n\n\nsub new {\n  my ($class, @args) = @_;\n  my $self = $class->SUPER::new(@args);\n  my ( $id, $nof_seq, $nof_rep, $max_size, $nof_overlaps, $min_overlap,\n    $min_identity, $avg_overlap, $avg_identity, $avg_seq_len, $spectrum,\n    $assembly, $eff_asm_params, $dissolve, $cross) = $self->_rearrange( [qw(ID\n    NOF_SEQ NOF_REP MAX_SIZE NOF_OVERLAPS MIN_OVERLAP MIN_IDENTITY AVG_OVERLAP\n    AVG_IDENTITY AVG_SEQ_LEN SPECTRUM ASSEMBLY EFF_ASM_PARAMS DISSOLVE CROSS)],\n    @args );\n\n  # First set up some defauts\n  $self->{'_id'}             = 'NoName';\n  $self->{'_nof_seq'}        = 0;\n  $self->{'_nof_rep'}        = 0;\n  $self->{'_max_size'}       = 0;\n  $self->{'_nof_overlaps'}   = 0;\n  $self->{'_min_overlap'}    = undef;\n  $self->{'_min_identity'}   = undef;\n  $self->{'_avg_overlap'}    = 0;\n  $self->{'_avg_identity'}   = 0;\n  $self->{'_avg_seq_len'}    = 0;\n  $self->{'_eff_asm_params'} = 0;\n  $self->{'_spectrum'}       = {1 => 0};  # contig spectrum hash representation\n  $self->{'_assembly'}       = []; # list of assembly objects used\n\n  # Then, according to user desires, override defaults\n  $self->{'_id'}           
  = $id             if (defined $id);\n  $self->{'_nof_seq'}        = $nof_seq        if (defined $nof_seq);\n  $self->{'_nof_rep'}        = $nof_rep        if (defined $nof_rep);\n  $self->{'_max_size'}       = $max_size       if (defined $max_size);\n  $self->{'_nof_overlaps'}   = $nof_overlaps   if (defined $nof_overlaps);\n  $self->{'_min_overlap'}    = $min_overlap    if (defined $min_overlap);\n  $self->{'_avg_overlap'}    = $avg_overlap    if (defined $avg_overlap);\n  $self->{'_min_identity'}   = $min_identity   if (defined $min_identity);\n  $self->{'_avg_identity'}   = $avg_identity   if (defined $avg_identity);\n  $self->{'_avg_seq_len'}    = $avg_seq_len    if (defined $avg_seq_len);\n  $self->{'_eff_asm_params'} = $eff_asm_params if (defined $eff_asm_params);\n\n  # Finally get stuff that can be gotten in an automated way\n  $self->_import_spectrum($spectrum) if defined($spectrum);\n  $self->_import_assembly($assembly) if defined($assembly);\n  if (defined($dissolve)) {\n    my ($mixed_csp, $header) = (@$dissolve[0], @$dissolve[1]);\n    $self->_import_dissolved_csp($mixed_csp, $header);\n  }\n  $self->_import_cross_csp($cross)   if defined($cross);\n\n  return $self;\n}\n\n\n=head2 id\n\n  Title   : id\n  Usage   : $csp->id\n  Function: get/set contig spectrum id\n  Returns : string\n  Args    : string [optional]\n\n\nsub id {\n  my ($self, $id) = @_;\n  if (defined $id) {\n    $self->{'_id'} = $id;\n  }\n  $id = $self->{'_id'};\n  return $id;\n}\n\n\n=head2 nof_seq\n\n  Title   : nof_seq\n  Usage   : $csp->nof_seq\n  Function: get/set the number of sequences making up the contig spectrum\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub nof_seq {\n  my ($self, $nof_seq) = @_;\n  if (defined $nof_seq) {\n    $self->throw(\"The number of sequences must be strictly positive. 
Got \".\n      \"'$nof_seq'\") if $nof_seq < 1;\n    $self->{'_nof_seq'} = $nof_seq;\n  }\n  $nof_seq = $self->{'_nof_seq'};\n  return $nof_seq;\n}\n\n\n=head2 nof_rep\n\n  Title   : nof_rep\n  Usage   : $csp->nof_rep\n  Function: Get/Set the number of repetitions (assemblies) used to create the \n            contig spectrum\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub nof_rep {\n  my ($self, $nof_rep) = @_;\n  if (defined $nof_rep) {\n    $self->throw(\"The number of repetitions must be strictly positive. Got \".\n      \"'$nof_rep'\") if $nof_rep < 1;\n    $self->{'_nof_rep'} = $nof_rep;\n  }\n  $nof_rep = $self->{'_nof_rep'};\n  return $nof_rep;\n}\n\n\n=head2 max_size\n\n  Title   : max_size\n  Usage   : $csp->max_size\n  Function: get/set the size of (number of sequences in) the largest contig\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub max_size {\n  my ($self, $max_size) = @_;\n  if (defined $max_size) {\n    $self->throw(\"The contig maximum size must be strictly positive. Got \".\n      \"'$max_size'\") if $max_size < 1;\n    $self->{'_max_size'} = $max_size;\n  }\n  $max_size = $self->{'_max_size'};\n  return $max_size;\n}\n\n\n=head2 nof_overlaps\n\n  Title   : nof_overlaps\n  Usage   : $csp->nof_overlaps\n  Function: Get/Set the number of overlaps in the assembly.\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub nof_overlaps {\n  my ($self, $nof_overlaps) = @_;\n  if (defined $nof_overlaps) {\n    $self->throw(\"The number of overlaps must be strictly positive. 
Got \".\n      \"'$nof_overlaps'\") if $nof_overlaps < 1;\n    $self->{'_nof_overlaps'} = $nof_overlaps;\n  }\n  $nof_overlaps = $self->{'_nof_overlaps'};\n  return $nof_overlaps;\n}\n\n\n=head2 min_overlap\n\n  Title   : min_overlap\n  Usage   : $csp->min_overlap\n  Function: get/set the assembly minimum overlap length\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub min_overlap {\n  my ($self, $min_overlap) = @_;\n  if (defined $min_overlap) {\n    $self->throw(\"The minimum of overlap length must be strictly positive. Got\".\n      \" '$min_overlap'\") if $min_overlap < 1;\n    $self->{'_min_overlap'} = $min_overlap;\n  }\n  $min_overlap = $self->{'_min_overlap'};\n  return $min_overlap;\n}\n\n\n=head2 avg_overlap\n\n  Title   : avg_overlap\n  Usage   : $csp->avg_overlap\n  Function: get/set the assembly average overlap length\n  Returns : decimal\n  Args    : decimal [optional]\n\n\nsub avg_overlap {\n  my ($self, $avg_overlap) = @_;\n  if (defined $avg_overlap) {\n    $self->throw(\"The average overlap length must be strictly positive. Got \".\n      \"'$avg_overlap'\") if $avg_overlap < 1;\n    $self->{'_avg_overlap'} = $avg_overlap;\n  }\n  $avg_overlap = $self->{'_avg_overlap'};\n  return $avg_overlap;\n}\n\n\n=head2 min_identity\n\n  Title   : min_identity\n  Usage   : $csp->min_identity\n  Function: get/set the assembly minimum overlap identity percent\n  Returns : 0 < decimal < 100\n  Args    : 0 < decimal < 100 [optional]\n\n\nsub min_identity {\n  my ($self, $min_identity) = @_;\n  if (defined $min_identity) {\n    $self->throw(\"The minimum overlap percent identity must be strictly \".\n      \"positive. 
Got '$min_identity'\") if $min_identity < 1;\n    $self->{'_min_identity'} = $min_identity;\n  }\n  $min_identity = $self->{'_min_identity'};\n  return $min_identity;\n}\n\n\n=head2 avg_identity\n\n  Title   : avg_identity\n  Usage   : $csp->avg_identity\n  Function: get/set the assembly average overlap identity percent\n  Returns : 0 < decimal < 100\n  Args    : 0 < decimal < 100 [optional]\n\n\nsub avg_identity {\n  my ($self, $avg_identity) = @_;\n  if (defined $avg_identity) {\n    $self->throw(\"The average overlap percent identity must be strictly \".\n      \"positive. Got '$avg_identity'\") if $avg_identity < 1;\n    $self->{'_avg_identity'} = $avg_identity;\n  }\n  $avg_identity = $self->{'_avg_identity'};\n  return $avg_identity;\n}\n\n\n=head2 avg_seq_len\n\n  Title   : avg_seq_len\n  Usage   : $csp->avg_seq_len\n  Function: get/set the assembly average sequence length\n  Returns : avg_seq_len\n  Args    : real [optional]\n\n\nsub avg_seq_len {\n  my ($self, $avg_seq_len) = @_;\n  if (defined $avg_seq_len) {\n    $self->throw(\"The average sequence length must be strictly positive. Got \".\n      \"'$avg_seq_len'\") if $avg_seq_len < 1;\n    $self->{'_avg_seq_len'} = $avg_seq_len;\n  }\n  $avg_seq_len = $self->{'_avg_seq_len'};\n  return $avg_seq_len;\n}\n\n\n=head2 eff_asm_params\n\n  Title   : eff_asm_params\n  Usage   : $csp->eff_asm_params(1)\n  Function: Get/set the effective assembly parameters option. It defines if the\n            effective assembly parameters should be determined when a contig\n            spectrum based or derived from an assembly is calulated. 
The\n            effective assembly parameters include avg_seq_length, nof_overlaps,\n            min_overlap, avg_overlap, min_identity and avg_identity.\n            1 = get them, 0 = don't.\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub eff_asm_params {\n  my ($self, $eff_asm_params) = @_;\n  if (defined $eff_asm_params) {\n    $self->throw(\"eff_asm_params can only take values 0 or 1. Input value was \".\n      \"'$eff_asm_params'\") unless $eff_asm_params == 0 || $eff_asm_params == 1;\n    $self->{'_eff_asm_params'} = $eff_asm_params;\n  }\n  $eff_asm_params = $self->{'_eff_asm_params'};\n  return $eff_asm_params;\n}\n\n\n=head2 spectrum\n\n  Title   : spectrum\n  Usage   : my $spectrum = $csp->spectrum({1=>10, 2=>2, 3=>1});\n  Function: Get the current contig spectrum represented as a hash / Update a\n            contig spectrum object based on a contig spectrum represented as a\n            hash\n            The hash representation of a contig spectrum is as following:\n              key   -> contig size (in number of sequences)\n              value -> number of contigs of this size\n  Returns : contig spectrum as a hash reference\n  Args    : contig spectrum as a hash reference [optional]\n\n\nsub spectrum {\n  my ($self, $spectrum) = @_;\n  if (defined $spectrum) {\n    $self->_import_spectrum($spectrum);\n  }\n  $spectrum = $self->{'_spectrum'};\n  return $spectrum;\n}\n\n\n=head2 assembly\n\n  Title   : assembly\n  Usage   : my @asm_list = $csp->assembly();\n  Function: Get a reference to the list of assembly object reference used to\n            make the contig spectrum object / Update the contig spectrum object\n            based on an assembly object.\n  Returns : array of Bio::Assembly::Scaffold\n  Args    : 
Bio::Assembly::Scaffold"},"kind":12,"range":{"end":{"line":566,"character":9999},"start":{"character":0,"line":561}},"line":561,"detail":"($self,$assembly)","definition":"sub","containerName":"main::","name":"assembly","children":[{"line":562,"kind":13,"localvar":"my","containerName":"assembly","name":"$self","definition":"my"},{"name":"$assembly","kind":13,"containerName":"assembly","line":562},{"name":"$assembly","containerName":"assembly","kind":13,"line":563},{"line":564,"name":"$self","kind":13,"containerName":"assembly"},{"line":564,"containerName":"assembly","kind":12,"name":"_import_assembly"},{"line":564,"containerName":"assembly","kind":13,"name":"$assembly"},{"localvar":"my","containerName":"assembly","kind":13,"name":"@asm_list","line":566,"definition":"my"},{"kind":13,"containerName":"assembly","name":"$self","line":566}]},{"line":566,"containerName":null,"kind":13,"name":"%self"},{"line":567,"name":"@asm_list","kind":13,"containerName":null},{"name":"drop_assembly","containerName":"main::","children":[{"name":"$self","localvar":"my","containerName":"drop_assembly","kind":13,"line":583,"definition":"my"},{"kind":13,"containerName":"drop_assembly","name":"$self","line":584}],"detail":"($self)","definition":"sub","kind":12,"range":{"start":{"line":582,"character":0},"end":{"line":586,"character":9999}},"line":582,"signature":{"documentation":"__END__\n#\n# BioPerl module for Bio::Assembly::Tools::ContigSpectrum\n#\n# Copyright by Florent Angly\n#\n# You may distribute this module under the same terms as Perl itself\n#\n# POD documentation - main docs before the code\n\n=head1 NAME\n\nBio::Assembly::Tools::ContigSpectrum - create and manipulate contig spectra\n\n=head1 SYNOPSIS\n\n  # Simple contig spectrum creation\n  my $csp1 = Bio::Assembly::Tools::ContigSpectrum->new(\n    -id       => 'csp1',\n    -spectrum => { 1 => 10,\n                   2 => 2,\n                   3 => 1 } );\n\n  # ...or another way to create a simple contig spectrum\n  my $csp2 
= Bio::Assembly::Tools::ContigSpectrum->new;\n  $csp2->id('csp2');\n  $csp2->spectrum({ 1 => 20, 2 => 1, 4 => 1 });\n\n  # Get some information\n  print \"This is contig spectrum \".$csp->id.\"\\n\";\n  print \"It contains \".$csp->nof_seq.\" sequences\\n\";\n  print \"The largest contig has \".$csp->max_size.\" sequences\\n\";\n  print \"The spectrum is: \".$csp->to_string($csp->spectrum).\"\\n\";\n\n  # Let's add the contig spectra\n  my $summed_csp = Bio::Assembly::Tools::ContigSpectrum->new;\n  $summed_csp->add($csp1);\n  $summed_csp->add($csp2);\n  print \"The summed contig spectrum is \".$summed_csp->to_string.\"\\n\";\n\n  # Make an average\n  my $avg_csp = Bio::Assembly::Tools::ContigSpectrum->new;\n  $avg_csp = $avg_csp->average([$csp1, $csp2]);\n  print \"The average contig spectrum is \".$avg_csp->to_string.\"\\n\";\n\n  # Get a contig spectrum from an assembly\n  my $from_assembly = Bio::Assembly::Tools::ContigSpectrum->new(\n    -assembly       => $assembly_object,\n    -eff_asm_params => 1);\n  print \"The contig spectrum from assembly is \".$from_assembly->to_string.\"\\n\";\n\n  # Report advanced information (possible because eff_asm_params = 1)\n  print \"Average sequence length: \".$from_assembly->avg_seq_length.\" bp\\n\";\n  print \"Minimum overlap length: \".$from_assembly->min_overlap.\" bp\\n\";\n  print \"Average overlap length: \".$from_assembly->avg_overlap.\" bp\\n\";\n  print \"Minimum overlap match: \".$from_assembly->min_identity.\" %\\n\";\n  print \"Average overlap match: \".$from_assembly->avg_identity.\" %\\n\";\n\n  # Assuming the assembly object contains sequences from several different\n  # metagenomes, we have a mixed contig spectrum from which a cross contig\n  # spectrum and dissolved contig spectra can be obtained\n  my $mixed_csp = $from_assembly;\n\n  # Calculate a dissolved contig spectrum\n  my $meta1_dissolved = Bio::Assembly::Tools::ContigSpectrum->new(\n    -dissolve => [$mixed_csp, 'metagenome1'] );\n  my 
$meta2_dissolved = Bio::Assembly::Tools::ContigSpectrum->new(\n    -dissolve => [$mixed_csp, 'metagenome2'] );\n  print \"The dissolved contig spectra are:\\n\".\n    $meta1_dissolved->to_string.\"\\n\".\n    $meta2_dissolved->to_string.\"\\n\";\n\n  # Determine a cross contig spectrum\n  my $cross_csp = Bio::Assembly::Tools::ContigSpectrum->new(\n    -cross => $mixed_csp );\n  print \"The cross contig spectrum is \".$cross_csp->to_string.\"\\n\";\n\n  # Score a contig spectrum (the more abundant the contigs and the larger their\n  # size, the larger the score)\n\n\n=head1 DESCRIPTION\n\nThe Bio::Assembly::Tools::ContigSpectrum Perl module enables to\nmanually create contig spectra, import them from assemblies,\nmanipulate them, transform between different types of contig spectra\nand output them.\n\nBio::Assembly::Tools::ContigSpectrum is a module to create, manipulate\nand output contig spectra, assembly-derived data used in metagenomics\n(community genomics) for diversity estimation.\n\n=head2 Background\n\nA contig spectrum is the count of the number of contigs of different\nsize in an assembly. For example, the contig spectrum [100 5 1 0 0\n...] means that there were 100 singlets (1-contigs), 5 contigs of 2\nsequences (2-contigs), 1 contig of 3 sequences (3-contig) and no\nlarger contigs.\n\nAn assembly can be produced from a mixture of sequences from different\nmetagenomes. The contig obtained from this assembly is a mixed contig\nspectrum. The contribution of each metagenome in this mixed contig\nspectrum can be obtained by determining a dissolved contig spectrum.\n\nFinally, based on a mixed contig spectrum, a cross contig spectrum can\nbe determined. In a cross contig spectrum, only contigs containing\nsequences from different metagenomes are kept; \"pure\" contigs are\nexcluded. 
Additionally, the total number of singletons (1-contigs)\nfrom each region that assembles with any fragments from other regions\nis the number of 1-contigs in the cross contig spectrum.\n\n=head2 Implementation\n\nThe simplest representation of a contig spectrum is as a hash\nrepresentation where the key is the contig size (number of sequences\nmaking up the contig) and the value the number of contigs of this\nsize.\n\nIn fact, it is useful to have more information associated with the\ncontig spectrum, hence the Bio::Assembly::Tools::ContigSpectrum module\nimplements an object containing a contig spectrum hash and additional\ninformation. The get/set methods to access them are:\n\n    id              contig spectrum ID\n    nof_seq         number of sequences\n    nof_rep         number of repetitions (assemblies) used\n    max_size        size of (number of sequences in) the largest contig\n    nof_overlaps    number of overlaps\n    min_overlap     minimum overlap length for building a contig\n    min_identity    minimum sequence identity over the overlap length\n    avg_overlap     average overlap length\n    avg_identity    average overlap identity\n    avg_seq_length  average sequence length\n    eff_asm_params  effective assembly parameters\n    spectrum        hash representation of a contig spectrum\n\n  Operations on the contig spectra:\n\n    to_string       create a string representation of the spectrum\n    spectrum        import a hash contig spectrum\n    assembly        determine a contig spectrum from an assembly\n    dissolve        calculate a dissolved contig spectrum (based on assembly)\n    cross           produce a cross contig spectrum (based on assembly)\n    add             add a contig spectrum to an existing one\n    average         make an average of several contig spectra\n\nWhen using operations that rely on knowing \"where\" (from what\nmetagenomes) a sequence came from (i.e. 
when creating a dissolved or\ncross contig spectrum), make sure that the sequences used for the\nassembly have a name header, e.g.  E<gt>metagenome1|seq1,\nE<gt>metagenome2|seq1, ...\n\n=head1 FEEDBACK\n\n=head2 Mailing Lists\n\nUser feedback is an integral part of the evolution of this and other\nBioperl modules. Send your comments and suggestions preferably to the\nBioperl mailing lists. Your participation is much appreciated.\n\n  bioperl-l@bioperl.org                  - General discussion\n  http://bioperl.org/wiki/Mailing_lists  - About the mailing lists\n\n\n=head2 Support \n\nPlease direct usage questions or support issues to the mailing list:\n\nI<bioperl-l@bioperl.org>\n\nrather than to the module maintainer directly. Many experienced and \nresponsive experts will be able to look at the problem and quickly \naddress it. Please include a thorough description of the problem \nwith code and data examples if at all possible.\n\n=head2 Reporting Bugs\n\nReport bugs to the BioPerl bug tracking system to help us keep track of\nthe bugs and their resolution. Bug reports can be submitted via email\nor the web:\n\n  bioperl-bugs@bio.perl.org\n  http://bugzilla.bioperl.org/\n\n=head1 AUTHOR - Florent E Angly\n\nEmail florent_dot_angly_at_gmail_dot_com\n\n=head1 APPENDIX\n\nThe rest of the documentation details each of the object\nmethods. 
Internal methods are usually preceded with a \"_\".\n\n\npackage Bio::Assembly::Tools::ContigSpectrum;\n\nuse strict;\n\nuse Bio::Root::Root;\nuse Bio::Assembly::Scaffold;\nuse Bio::SimpleAlign;\nuse Bio::LocatableSeq;\nuse Bio::Align::PairwiseStatistics;\n\nuse base 'Bio::Root::Root';\n\n\n=head2 new\n\n  Title   : new\n  Usage   : my $csp = Bio::Assembly::Tools::ContigSpectrum->new();\n              or\n            my $csp = Bio::Assembly::Tools::ContigSpectrum->new(\n              -id => 'some_name',\n              -spectrum =>  { 1 => 90 , 2 => 3 , 4 => 1 },\n            );\n              or\n            my $csp = Bio::Assembly::Tools::ContigSpectrum->new(\n              -assembly =>  $assembly_obj\n            );\n  Function: create a new contig spectrum object\n  Returns : reference to a contig spectrum object\n  Args    : none\n\n\nsub new {\n  my ($class, @args) = @_;\n  my $self = $class->SUPER::new(@args);\n  my ( $id, $nof_seq, $nof_rep, $max_size, $nof_overlaps, $min_overlap,\n    $min_identity, $avg_overlap, $avg_identity, $avg_seq_len, $spectrum,\n    $assembly, $eff_asm_params, $dissolve, $cross) = $self->_rearrange( [qw(ID\n    NOF_SEQ NOF_REP MAX_SIZE NOF_OVERLAPS MIN_OVERLAP MIN_IDENTITY AVG_OVERLAP\n    AVG_IDENTITY AVG_SEQ_LEN SPECTRUM ASSEMBLY EFF_ASM_PARAMS DISSOLVE CROSS)],\n    @args );\n\n  # First set up some defauts\n  $self->{'_id'}             = 'NoName';\n  $self->{'_nof_seq'}        = 0;\n  $self->{'_nof_rep'}        = 0;\n  $self->{'_max_size'}       = 0;\n  $self->{'_nof_overlaps'}   = 0;\n  $self->{'_min_overlap'}    = undef;\n  $self->{'_min_identity'}   = undef;\n  $self->{'_avg_overlap'}    = 0;\n  $self->{'_avg_identity'}   = 0;\n  $self->{'_avg_seq_len'}    = 0;\n  $self->{'_eff_asm_params'} = 0;\n  $self->{'_spectrum'}       = {1 => 0};  # contig spectrum hash representation\n  $self->{'_assembly'}       = []; # list of assembly objects used\n\n  # Then, according to user desires, override defaults\n  $self->{'_id'}           
  = $id             if (defined $id);\n  $self->{'_nof_seq'}        = $nof_seq        if (defined $nof_seq);\n  $self->{'_nof_rep'}        = $nof_rep        if (defined $nof_rep);\n  $self->{'_max_size'}       = $max_size       if (defined $max_size);\n  $self->{'_nof_overlaps'}   = $nof_overlaps   if (defined $nof_overlaps);\n  $self->{'_min_overlap'}    = $min_overlap    if (defined $min_overlap);\n  $self->{'_avg_overlap'}    = $avg_overlap    if (defined $avg_overlap);\n  $self->{'_min_identity'}   = $min_identity   if (defined $min_identity);\n  $self->{'_avg_identity'}   = $avg_identity   if (defined $avg_identity);\n  $self->{'_avg_seq_len'}    = $avg_seq_len    if (defined $avg_seq_len);\n  $self->{'_eff_asm_params'} = $eff_asm_params if (defined $eff_asm_params);\n\n  # Finally get stuff that can be gotten in an automated way\n  $self->_import_spectrum($spectrum) if defined($spectrum);\n  $self->_import_assembly($assembly) if defined($assembly);\n  if (defined($dissolve)) {\n    my ($mixed_csp, $header) = (@$dissolve[0], @$dissolve[1]);\n    $self->_import_dissolved_csp($mixed_csp, $header);\n  }\n  $self->_import_cross_csp($cross)   if defined($cross);\n\n  return $self;\n}\n\n\n=head2 id\n\n  Title   : id\n  Usage   : $csp->id\n  Function: get/set contig spectrum id\n  Returns : string\n  Args    : string [optional]\n\n\nsub id {\n  my ($self, $id) = @_;\n  if (defined $id) {\n    $self->{'_id'} = $id;\n  }\n  $id = $self->{'_id'};\n  return $id;\n}\n\n\n=head2 nof_seq\n\n  Title   : nof_seq\n  Usage   : $csp->nof_seq\n  Function: get/set the number of sequences making up the contig spectrum\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub nof_seq {\n  my ($self, $nof_seq) = @_;\n  if (defined $nof_seq) {\n    $self->throw(\"The number of sequences must be strictly positive. 
Got \".\n      \"'$nof_seq'\") if $nof_seq < 1;\n    $self->{'_nof_seq'} = $nof_seq;\n  }\n  $nof_seq = $self->{'_nof_seq'};\n  return $nof_seq;\n}\n\n\n=head2 nof_rep\n\n  Title   : nof_rep\n  Usage   : $csp->nof_rep\n  Function: Get/Set the number of repetitions (assemblies) used to create the \n            contig spectrum\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub nof_rep {\n  my ($self, $nof_rep) = @_;\n  if (defined $nof_rep) {\n    $self->throw(\"The number of repetitions must be strictly positive. Got \".\n      \"'$nof_rep'\") if $nof_rep < 1;\n    $self->{'_nof_rep'} = $nof_rep;\n  }\n  $nof_rep = $self->{'_nof_rep'};\n  return $nof_rep;\n}\n\n\n=head2 max_size\n\n  Title   : max_size\n  Usage   : $csp->max_size\n  Function: get/set the size of (number of sequences in) the largest contig\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub max_size {\n  my ($self, $max_size) = @_;\n  if (defined $max_size) {\n    $self->throw(\"The contig maximum size must be strictly positive. Got \".\n      \"'$max_size'\") if $max_size < 1;\n    $self->{'_max_size'} = $max_size;\n  }\n  $max_size = $self->{'_max_size'};\n  return $max_size;\n}\n\n\n=head2 nof_overlaps\n\n  Title   : nof_overlaps\n  Usage   : $csp->nof_overlaps\n  Function: Get/Set the number of overlaps in the assembly.\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub nof_overlaps {\n  my ($self, $nof_overlaps) = @_;\n  if (defined $nof_overlaps) {\n    $self->throw(\"The number of overlaps must be strictly positive. 
Got \".\n      \"'$nof_overlaps'\") if $nof_overlaps < 1;\n    $self->{'_nof_overlaps'} = $nof_overlaps;\n  }\n  $nof_overlaps = $self->{'_nof_overlaps'};\n  return $nof_overlaps;\n}\n\n\n=head2 min_overlap\n\n  Title   : min_overlap\n  Usage   : $csp->min_overlap\n  Function: get/set the assembly minimum overlap length\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub min_overlap {\n  my ($self, $min_overlap) = @_;\n  if (defined $min_overlap) {\n    $self->throw(\"The minimum of overlap length must be strictly positive. Got\".\n      \" '$min_overlap'\") if $min_overlap < 1;\n    $self->{'_min_overlap'} = $min_overlap;\n  }\n  $min_overlap = $self->{'_min_overlap'};\n  return $min_overlap;\n}\n\n\n=head2 avg_overlap\n\n  Title   : avg_overlap\n  Usage   : $csp->avg_overlap\n  Function: get/set the assembly average overlap length\n  Returns : decimal\n  Args    : decimal [optional]\n\n\nsub avg_overlap {\n  my ($self, $avg_overlap) = @_;\n  if (defined $avg_overlap) {\n    $self->throw(\"The average overlap length must be strictly positive. Got \".\n      \"'$avg_overlap'\") if $avg_overlap < 1;\n    $self->{'_avg_overlap'} = $avg_overlap;\n  }\n  $avg_overlap = $self->{'_avg_overlap'};\n  return $avg_overlap;\n}\n\n\n=head2 min_identity\n\n  Title   : min_identity\n  Usage   : $csp->min_identity\n  Function: get/set the assembly minimum overlap identity percent\n  Returns : 0 < decimal < 100\n  Args    : 0 < decimal < 100 [optional]\n\n\nsub min_identity {\n  my ($self, $min_identity) = @_;\n  if (defined $min_identity) {\n    $self->throw(\"The minimum overlap percent identity must be strictly \".\n      \"positive. 
Got '$min_identity'\") if $min_identity < 1;\n    $self->{'_min_identity'} = $min_identity;\n  }\n  $min_identity = $self->{'_min_identity'};\n  return $min_identity;\n}\n\n\n=head2 avg_identity\n\n  Title   : avg_identity\n  Usage   : $csp->avg_identity\n  Function: get/set the assembly average overlap identity percent\n  Returns : 0 < decimal < 100\n  Args    : 0 < decimal < 100 [optional]\n\n\nsub avg_identity {\n  my ($self, $avg_identity) = @_;\n  if (defined $avg_identity) {\n    $self->throw(\"The average overlap percent identity must be strictly \".\n      \"positive. Got '$avg_identity'\") if $avg_identity < 1;\n    $self->{'_avg_identity'} = $avg_identity;\n  }\n  $avg_identity = $self->{'_avg_identity'};\n  return $avg_identity;\n}\n\n\n=head2 avg_seq_len\n\n  Title   : avg_seq_len\n  Usage   : $csp->avg_seq_len\n  Function: get/set the assembly average sequence length\n  Returns : avg_seq_len\n  Args    : real [optional]\n\n\nsub avg_seq_len {\n  my ($self, $avg_seq_len) = @_;\n  if (defined $avg_seq_len) {\n    $self->throw(\"The average sequence length must be strictly positive. Got \".\n      \"'$avg_seq_len'\") if $avg_seq_len < 1;\n    $self->{'_avg_seq_len'} = $avg_seq_len;\n  }\n  $avg_seq_len = $self->{'_avg_seq_len'};\n  return $avg_seq_len;\n}\n\n\n=head2 eff_asm_params\n\n  Title   : eff_asm_params\n  Usage   : $csp->eff_asm_params(1)\n  Function: Get/set the effective assembly parameters option. It defines if the\n            effective assembly parameters should be determined when a contig\n            spectrum based or derived from an assembly is calulated. 
The\n            effective assembly parameters include avg_seq_length, nof_overlaps,\n            min_overlap, avg_overlap, min_identity and avg_identity.\n            1 = get them, 0 = don't.\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub eff_asm_params {\n  my ($self, $eff_asm_params) = @_;\n  if (defined $eff_asm_params) {\n    $self->throw(\"eff_asm_params can only take values 0 or 1. Input value was \".\n      \"'$eff_asm_params'\") unless $eff_asm_params == 0 || $eff_asm_params == 1;\n    $self->{'_eff_asm_params'} = $eff_asm_params;\n  }\n  $eff_asm_params = $self->{'_eff_asm_params'};\n  return $eff_asm_params;\n}\n\n\n=head2 spectrum\n\n  Title   : spectrum\n  Usage   : my $spectrum = $csp->spectrum({1=>10, 2=>2, 3=>1});\n  Function: Get the current contig spectrum represented as a hash / Update a\n            contig spectrum object based on a contig spectrum represented as a\n            hash\n            The hash representation of a contig spectrum is as following:\n              key   -> contig size (in number of sequences)\n              value -> number of contigs of this size\n  Returns : contig spectrum as a hash reference\n  Args    : contig spectrum as a hash reference [optional]\n\n\nsub spectrum {\n  my ($self, $spectrum) = @_;\n  if (defined $spectrum) {\n    $self->_import_spectrum($spectrum);\n  }\n  $spectrum = $self->{'_spectrum'};\n  return $spectrum;\n}\n\n\n=head2 assembly\n\n  Title   : assembly\n  Usage   : my @asm_list = $csp->assembly();\n  Function: Get a reference to the list of assembly object reference used to\n            make the contig spectrum object / Update the contig spectrum object\n            based on an assembly object.\n  Returns : array of Bio::Assembly::Scaffold\n  Args    : Bio::Assembly::Scaffold\n\n\nsub assembly {\n  my ($self, $assembly) = @_;\n  if (defined $assembly) {\n    $self->_import_assembly($assembly);\n  }\n  my @asm_list = @{$self->{'_assembly'}} if defined $self->{'_assembly'};\n  
return \\@asm_list;\n}\n\n=head2 drop_assembly\n\n  Title   : drop_assembly\n  Usage   : $csp->drop_assembly();\n  Function: Remove all assembly objects associated with a contig spectrum.\n            Assembly objects can be big. This method allows to free some memory\n            when assembly information is not needed anymore.\n  Returns : 1 for success, 0 for failure\n  Args    : none","parameters":[{"label":"$self"}],"label":"drop_assembly($self)"}},{"children":[{"kind":13,"localvar":"my","containerName":"dissolve","name":"$self","line":607,"definition":"my"},{"line":607,"name":"$mixed_csp","containerName":"dissolve","kind":13},{"line":607,"name":"$seq_header","containerName":"dissolve","kind":13},{"line":608,"kind":13,"containerName":"dissolve","name":"$self"},{"name":"_import_dissolved_csp","containerName":"dissolve","kind":12,"line":608},{"line":608,"containerName":"dissolve","kind":13,"name":"$mixed_csp"},{"containerName":"dissolve","kind":13,"name":"$seq_header","line":608}],"containerName":"main::","name":"dissolve","definition":"sub","detail":"($self,$mixed_csp,$seq_header)","line":606,"range":{"end":{"line":610,"character":9999},"start":{"character":0,"line":606}},"kind":12,"signature":{"parameters":[{"label":"$self"},{"label":"$mixed_csp"},{"label":"$seq_header"}],"documentation":"__END__\n#\n# BioPerl module for Bio::Assembly::Tools::ContigSpectrum\n#\n# Copyright by Florent Angly\n#\n# You may distribute this module under the same terms as Perl itself\n#\n# POD documentation - main docs before the code\n\n=head1 NAME\n\nBio::Assembly::Tools::ContigSpectrum - create and manipulate contig spectra\n\n=head1 SYNOPSIS\n\n  # Simple contig spectrum creation\n  my $csp1 = Bio::Assembly::Tools::ContigSpectrum->new(\n    -id       => 'csp1',\n    -spectrum => { 1 => 10,\n                   2 => 2,\n                   3 => 1 } );\n\n  # ...or another way to create a simple contig spectrum\n  my $csp2 = Bio::Assembly::Tools::ContigSpectrum->new;\n  
$csp2->id('csp2');\n  $csp2->spectrum({ 1 => 20, 2 => 1, 4 => 1 });\n\n  # Get some information\n  print \"This is contig spectrum \".$csp->id.\"\\n\";\n  print \"It contains \".$csp->nof_seq.\" sequences\\n\";\n  print \"The largest contig has \".$csp->max_size.\" sequences\\n\";\n  print \"The spectrum is: \".$csp->to_string($csp->spectrum).\"\\n\";\n\n  # Let's add the contig spectra\n  my $summed_csp = Bio::Assembly::Tools::ContigSpectrum->new;\n  $summed_csp->add($csp1);\n  $summed_csp->add($csp2);\n  print \"The summed contig spectrum is \".$summed_csp->to_string.\"\\n\";\n\n  # Make an average\n  my $avg_csp = Bio::Assembly::Tools::ContigSpectrum->new;\n  $avg_csp = $avg_csp->average([$csp1, $csp2]);\n  print \"The average contig spectrum is \".$avg_csp->to_string.\"\\n\";\n\n  # Get a contig spectrum from an assembly\n  my $from_assembly = Bio::Assembly::Tools::ContigSpectrum->new(\n    -assembly       => $assembly_object,\n    -eff_asm_params => 1);\n  print \"The contig spectrum from assembly is \".$from_assembly->to_string.\"\\n\";\n\n  # Report advanced information (possible because eff_asm_params = 1)\n  print \"Average sequence length: \".$from_assembly->avg_seq_length.\" bp\\n\";\n  print \"Minimum overlap length: \".$from_assembly->min_overlap.\" bp\\n\";\n  print \"Average overlap length: \".$from_assembly->avg_overlap.\" bp\\n\";\n  print \"Minimum overlap match: \".$from_assembly->min_identity.\" %\\n\";\n  print \"Average overlap match: \".$from_assembly->avg_identity.\" %\\n\";\n\n  # Assuming the assembly object contains sequences from several different\n  # metagenomes, we have a mixed contig spectrum from which a cross contig\n  # spectrum and dissolved contig spectra can be obtained\n  my $mixed_csp = $from_assembly;\n\n  # Calculate a dissolved contig spectrum\n  my $meta1_dissolved = Bio::Assembly::Tools::ContigSpectrum->new(\n    -dissolve => [$mixed_csp, 'metagenome1'] );\n  my $meta2_dissolved = 
Bio::Assembly::Tools::ContigSpectrum->new(\n    -dissolve => [$mixed_csp, 'metagenome2'] );\n  print \"The dissolved contig spectra are:\\n\".\n    $meta1_dissolved->to_string.\"\\n\".\n    $meta2_dissolved->to_string.\"\\n\";\n\n  # Determine a cross contig spectrum\n  my $cross_csp = Bio::Assembly::Tools::ContigSpectrum->new(\n    -cross => $mixed_csp );\n  print \"The cross contig spectrum is \".$cross_csp->to_string.\"\\n\";\n\n  # Score a contig spectrum (the more abundant the contigs and the larger their\n  # size, the larger the score)\n\n\n=head1 DESCRIPTION\n\nThe Bio::Assembly::Tools::ContigSpectrum Perl module enables to\nmanually create contig spectra, import them from assemblies,\nmanipulate them, transform between different types of contig spectra\nand output them.\n\nBio::Assembly::Tools::ContigSpectrum is a module to create, manipulate\nand output contig spectra, assembly-derived data used in metagenomics\n(community genomics) for diversity estimation.\n\n=head2 Background\n\nA contig spectrum is the count of the number of contigs of different\nsize in an assembly. For example, the contig spectrum [100 5 1 0 0\n...] means that there were 100 singlets (1-contigs), 5 contigs of 2\nsequences (2-contigs), 1 contig of 3 sequences (3-contig) and no\nlarger contigs.\n\nAn assembly can be produced from a mixture of sequences from different\nmetagenomes. The contig obtained from this assembly is a mixed contig\nspectrum. The contribution of each metagenome in this mixed contig\nspectrum can be obtained by determining a dissolved contig spectrum.\n\nFinally, based on a mixed contig spectrum, a cross contig spectrum can\nbe determined. In a cross contig spectrum, only contigs containing\nsequences from different metagenomes are kept; \"pure\" contigs are\nexcluded. 
Additionally, the total number of singletons (1-contigs)\nfrom each region that assembles with any fragments from other regions\nis the number of 1-contigs in the cross contig spectrum.\n\n=head2 Implementation\n\nThe simplest representation of a contig spectrum is a hash where the\nkey is the contig size (number of sequences making up the contig) and\nthe value is the number of contigs of this size.\n\nIn fact, it is useful to have more information associated with the\ncontig spectrum, hence the Bio::Assembly::Tools::ContigSpectrum module\nimplements an object containing a contig spectrum hash and additional\ninformation. The get/set methods to access them are:\n\n    id              contig spectrum ID\n    nof_seq         number of sequences\n    nof_rep         number of repetitions (assemblies) used\n    max_size        size of (number of sequences in) the largest contig\n    nof_overlaps    number of overlaps\n    min_overlap     minimum overlap length for building a contig\n    min_identity    minimum sequence identity over the overlap length\n    avg_overlap     average overlap length\n    avg_identity    average overlap identity\n    avg_seq_len     average sequence length\n    eff_asm_params  effective assembly parameters\n    spectrum        hash representation of a contig spectrum\n\n  Operations on the contig spectra:\n\n    to_string       create a string representation of the spectrum\n    spectrum        import a hash contig spectrum\n    assembly        determine a contig spectrum from an assembly\n    dissolve        calculate a dissolved contig spectrum (based on assembly)\n    cross           produce a cross contig spectrum (based on assembly)\n    add             add a contig spectrum to an existing one\n    average         make an average of several contig spectra\n\nWhen using operations that rely on knowing \"where\" (from what\nmetagenomes) a sequence came from (i.e. 
when creating a dissolved or\ncross contig spectrum), make sure that the sequences used for the\nassembly have a name header, e.g. E<gt>metagenome1|seq1,\nE<gt>metagenome2|seq1, ...\n\n=head1 FEEDBACK\n\n=head2 Mailing Lists\n\nUser feedback is an integral part of the evolution of this and other\nBioperl modules. Send your comments and suggestions preferably to the\nBioperl mailing lists. Your participation is much appreciated.\n\n  bioperl-l@bioperl.org                  - General discussion\n  http://bioperl.org/wiki/Mailing_lists  - About the mailing lists\n\n\n=head2 Support \n\nPlease direct usage questions or support issues to the mailing list:\n\nI<bioperl-l@bioperl.org>\n\nrather than to the module maintainer directly. Many experienced and \nresponsive experts will be able to look at the problem and quickly \naddress it. Please include a thorough description of the problem \nwith code and data examples if at all possible.\n\n=head2 Reporting Bugs\n\nReport bugs to the BioPerl bug tracking system to help us keep track\nof the bugs and their resolution. Bug reports can be submitted via email\nor the web:\n\n  bioperl-bugs@bio.perl.org\n  http://bugzilla.bioperl.org/\n\n=head1 AUTHOR - Florent E Angly\n\nEmail florent_dot_angly_at_gmail_dot_com\n\n=head1 APPENDIX\n\nThe rest of the documentation details each of the object\nmethods. 
Internal methods are usually preceded with a \"_\".\n\n\npackage Bio::Assembly::Tools::ContigSpectrum;\n\nuse strict;\n\nuse Bio::Root::Root;\nuse Bio::Assembly::Scaffold;\nuse Bio::SimpleAlign;\nuse Bio::LocatableSeq;\nuse Bio::Align::PairwiseStatistics;\n\nuse base 'Bio::Root::Root';\n\n\n=head2 new\n\n  Title   : new\n  Usage   : my $csp = Bio::Assembly::Tools::ContigSpectrum->new();\n              or\n            my $csp = Bio::Assembly::Tools::ContigSpectrum->new(\n              -id => 'some_name',\n              -spectrum =>  { 1 => 90 , 2 => 3 , 4 => 1 },\n            );\n              or\n            my $csp = Bio::Assembly::Tools::ContigSpectrum->new(\n              -assembly =>  $assembly_obj\n            );\n  Function: create a new contig spectrum object\n  Returns : reference to a contig spectrum object\n  Args    : none\n\n\nsub new {\n  my ($class, @args) = @_;\n  my $self = $class->SUPER::new(@args);\n  my ( $id, $nof_seq, $nof_rep, $max_size, $nof_overlaps, $min_overlap,\n    $min_identity, $avg_overlap, $avg_identity, $avg_seq_len, $spectrum,\n    $assembly, $eff_asm_params, $dissolve, $cross) = $self->_rearrange( [qw(ID\n    NOF_SEQ NOF_REP MAX_SIZE NOF_OVERLAPS MIN_OVERLAP MIN_IDENTITY AVG_OVERLAP\n    AVG_IDENTITY AVG_SEQ_LEN SPECTRUM ASSEMBLY EFF_ASM_PARAMS DISSOLVE CROSS)],\n    @args );\n\n  # First set up some defauts\n  $self->{'_id'}             = 'NoName';\n  $self->{'_nof_seq'}        = 0;\n  $self->{'_nof_rep'}        = 0;\n  $self->{'_max_size'}       = 0;\n  $self->{'_nof_overlaps'}   = 0;\n  $self->{'_min_overlap'}    = undef;\n  $self->{'_min_identity'}   = undef;\n  $self->{'_avg_overlap'}    = 0;\n  $self->{'_avg_identity'}   = 0;\n  $self->{'_avg_seq_len'}    = 0;\n  $self->{'_eff_asm_params'} = 0;\n  $self->{'_spectrum'}       = {1 => 0};  # contig spectrum hash representation\n  $self->{'_assembly'}       = []; # list of assembly objects used\n\n  # Then, according to user desires, override defaults\n  $self->{'_id'}           
  = $id             if (defined $id);\n  $self->{'_nof_seq'}        = $nof_seq        if (defined $nof_seq);\n  $self->{'_nof_rep'}        = $nof_rep        if (defined $nof_rep);\n  $self->{'_max_size'}       = $max_size       if (defined $max_size);\n  $self->{'_nof_overlaps'}   = $nof_overlaps   if (defined $nof_overlaps);\n  $self->{'_min_overlap'}    = $min_overlap    if (defined $min_overlap);\n  $self->{'_avg_overlap'}    = $avg_overlap    if (defined $avg_overlap);\n  $self->{'_min_identity'}   = $min_identity   if (defined $min_identity);\n  $self->{'_avg_identity'}   = $avg_identity   if (defined $avg_identity);\n  $self->{'_avg_seq_len'}    = $avg_seq_len    if (defined $avg_seq_len);\n  $self->{'_eff_asm_params'} = $eff_asm_params if (defined $eff_asm_params);\n\n  # Finally get stuff that can be gotten in an automated way\n  $self->_import_spectrum($spectrum) if defined($spectrum);\n  $self->_import_assembly($assembly) if defined($assembly);\n  if (defined($dissolve)) {\n    my ($mixed_csp, $header) = (@$dissolve[0], @$dissolve[1]);\n    $self->_import_dissolved_csp($mixed_csp, $header);\n  }\n  $self->_import_cross_csp($cross)   if defined($cross);\n\n  return $self;\n}\n\n\n=head2 id\n\n  Title   : id\n  Usage   : $csp->id\n  Function: get/set contig spectrum id\n  Returns : string\n  Args    : string [optional]\n\n\nsub id {\n  my ($self, $id) = @_;\n  if (defined $id) {\n    $self->{'_id'} = $id;\n  }\n  $id = $self->{'_id'};\n  return $id;\n}\n\n\n=head2 nof_seq\n\n  Title   : nof_seq\n  Usage   : $csp->nof_seq\n  Function: get/set the number of sequences making up the contig spectrum\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub nof_seq {\n  my ($self, $nof_seq) = @_;\n  if (defined $nof_seq) {\n    $self->throw(\"The number of sequences must be strictly positive. 
Got \".\n      \"'$nof_seq'\") if $nof_seq < 1;\n    $self->{'_nof_seq'} = $nof_seq;\n  }\n  $nof_seq = $self->{'_nof_seq'};\n  return $nof_seq;\n}\n\n\n=head2 nof_rep\n\n  Title   : nof_rep\n  Usage   : $csp->nof_rep\n  Function: Get/Set the number of repetitions (assemblies) used to create the \n            contig spectrum\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub nof_rep {\n  my ($self, $nof_rep) = @_;\n  if (defined $nof_rep) {\n    $self->throw(\"The number of repetitions must be strictly positive. Got \".\n      \"'$nof_rep'\") if $nof_rep < 1;\n    $self->{'_nof_rep'} = $nof_rep;\n  }\n  $nof_rep = $self->{'_nof_rep'};\n  return $nof_rep;\n}\n\n\n=head2 max_size\n\n  Title   : max_size\n  Usage   : $csp->max_size\n  Function: get/set the size of (number of sequences in) the largest contig\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub max_size {\n  my ($self, $max_size) = @_;\n  if (defined $max_size) {\n    $self->throw(\"The contig maximum size must be strictly positive. Got \".\n      \"'$max_size'\") if $max_size < 1;\n    $self->{'_max_size'} = $max_size;\n  }\n  $max_size = $self->{'_max_size'};\n  return $max_size;\n}\n\n\n=head2 nof_overlaps\n\n  Title   : nof_overlaps\n  Usage   : $csp->nof_overlaps\n  Function: Get/Set the number of overlaps in the assembly.\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub nof_overlaps {\n  my ($self, $nof_overlaps) = @_;\n  if (defined $nof_overlaps) {\n    $self->throw(\"The number of overlaps must be strictly positive. 
Got \".\n      \"'$nof_overlaps'\") if $nof_overlaps < 1;\n    $self->{'_nof_overlaps'} = $nof_overlaps;\n  }\n  $nof_overlaps = $self->{'_nof_overlaps'};\n  return $nof_overlaps;\n}\n\n\n=head2 min_overlap\n\n  Title   : min_overlap\n  Usage   : $csp->min_overlap\n  Function: get/set the assembly minimum overlap length\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub min_overlap {\n  my ($self, $min_overlap) = @_;\n  if (defined $min_overlap) {\n    $self->throw(\"The minimum of overlap length must be strictly positive. Got\".\n      \" '$min_overlap'\") if $min_overlap < 1;\n    $self->{'_min_overlap'} = $min_overlap;\n  }\n  $min_overlap = $self->{'_min_overlap'};\n  return $min_overlap;\n}\n\n\n=head2 avg_overlap\n\n  Title   : avg_overlap\n  Usage   : $csp->avg_overlap\n  Function: get/set the assembly average overlap length\n  Returns : decimal\n  Args    : decimal [optional]\n\n\nsub avg_overlap {\n  my ($self, $avg_overlap) = @_;\n  if (defined $avg_overlap) {\n    $self->throw(\"The average overlap length must be strictly positive. Got \".\n      \"'$avg_overlap'\") if $avg_overlap < 1;\n    $self->{'_avg_overlap'} = $avg_overlap;\n  }\n  $avg_overlap = $self->{'_avg_overlap'};\n  return $avg_overlap;\n}\n\n\n=head2 min_identity\n\n  Title   : min_identity\n  Usage   : $csp->min_identity\n  Function: get/set the assembly minimum overlap identity percent\n  Returns : 0 < decimal < 100\n  Args    : 0 < decimal < 100 [optional]\n\n\nsub min_identity {\n  my ($self, $min_identity) = @_;\n  if (defined $min_identity) {\n    $self->throw(\"The minimum overlap percent identity must be strictly \".\n      \"positive. 
Got '$min_identity'\") if $min_identity < 1;\n    $self->{'_min_identity'} = $min_identity;\n  }\n  $min_identity = $self->{'_min_identity'};\n  return $min_identity;\n}\n\n\n=head2 avg_identity\n\n  Title   : avg_identity\n  Usage   : $csp->avg_identity\n  Function: get/set the assembly average overlap identity percent\n  Returns : 0 < decimal < 100\n  Args    : 0 < decimal < 100 [optional]\n\n\nsub avg_identity {\n  my ($self, $avg_identity) = @_;\n  if (defined $avg_identity) {\n    $self->throw(\"The average overlap percent identity must be strictly \".\n      \"positive. Got '$avg_identity'\") if $avg_identity < 1;\n    $self->{'_avg_identity'} = $avg_identity;\n  }\n  $avg_identity = $self->{'_avg_identity'};\n  return $avg_identity;\n}\n\n\n=head2 avg_seq_len\n\n  Title   : avg_seq_len\n  Usage   : $csp->avg_seq_len\n  Function: get/set the assembly average sequence length\n  Returns : avg_seq_len\n  Args    : real [optional]\n\n\nsub avg_seq_len {\n  my ($self, $avg_seq_len) = @_;\n  if (defined $avg_seq_len) {\n    $self->throw(\"The average sequence length must be strictly positive. Got \".\n      \"'$avg_seq_len'\") if $avg_seq_len < 1;\n    $self->{'_avg_seq_len'} = $avg_seq_len;\n  }\n  $avg_seq_len = $self->{'_avg_seq_len'};\n  return $avg_seq_len;\n}\n\n\n=head2 eff_asm_params\n\n  Title   : eff_asm_params\n  Usage   : $csp->eff_asm_params(1)\n  Function: Get/set the effective assembly parameters option. It defines if the\n            effective assembly parameters should be determined when a contig\n            spectrum based or derived from an assembly is calulated. 
The\n            effective assembly parameters include avg_seq_length, nof_overlaps,\n            min_overlap, avg_overlap, min_identity and avg_identity.\n            1 = get them, 0 = don't.\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub eff_asm_params {\n  my ($self, $eff_asm_params) = @_;\n  if (defined $eff_asm_params) {\n    $self->throw(\"eff_asm_params can only take values 0 or 1. Input value was \".\n      \"'$eff_asm_params'\") unless $eff_asm_params == 0 || $eff_asm_params == 1;\n    $self->{'_eff_asm_params'} = $eff_asm_params;\n  }\n  $eff_asm_params = $self->{'_eff_asm_params'};\n  return $eff_asm_params;\n}\n\n\n=head2 spectrum\n\n  Title   : spectrum\n  Usage   : my $spectrum = $csp->spectrum({1=>10, 2=>2, 3=>1});\n  Function: Get the current contig spectrum represented as a hash / Update a\n            contig spectrum object based on a contig spectrum represented as a\n            hash\n            The hash representation of a contig spectrum is as following:\n              key   -> contig size (in number of sequences)\n              value -> number of contigs of this size\n  Returns : contig spectrum as a hash reference\n  Args    : contig spectrum as a hash reference [optional]\n\n\nsub spectrum {\n  my ($self, $spectrum) = @_;\n  if (defined $spectrum) {\n    $self->_import_spectrum($spectrum);\n  }\n  $spectrum = $self->{'_spectrum'};\n  return $spectrum;\n}\n\n\n=head2 assembly\n\n  Title   : assembly\n  Usage   : my @asm_list = $csp->assembly();\n  Function: Get a reference to the list of assembly object reference used to\n            make the contig spectrum object / Update the contig spectrum object\n            based on an assembly object.\n  Returns : array of Bio::Assembly::Scaffold\n  Args    : Bio::Assembly::Scaffold\n\n\nsub assembly {\n  my ($self, $assembly) = @_;\n  if (defined $assembly) {\n    $self->_import_assembly($assembly);\n  }\n  my @asm_list = @{$self->{'_assembly'}} if defined $self->{'_assembly'};\n  
return \\@asm_list;\n}\n\n=head2 drop_assembly\n\n  Title   : drop_assembly\n  Usage   : $csp->drop_assembly();\n  Function: Remove all assembly objects associated with a contig spectrum.\n            Assembly objects can be big. This method allows to free some memory\n            when assembly information is not needed anymore.\n  Returns : 1 for success, 0 for failure\n  Args    : none\n\n\nsub drop_assembly {\n  my ($self) = @_;\n  $self->{'_assembly'} = [];\n  return 1;\n}\n\n=head2 dissolve\n\n  Title   : dissolve\n  Usage   : $dissolved_csp->dissolve($mixed_csp, $seq_header);\n  Function: Dissolve a mixed contig spectrum for the set of sequences that\n            contain the specified header, i.e. determine the contribution of\n            these sequences to the mixed contig spectrum based on the assembly.\n            The mixed contig spectrum object must have been created based on one\n            (or several) assembly object(s). Additionally, min_overlap and\n            min_identity must have been set (either manually using min_overlap\n            or automatically by switching on the eff_asm_params option).\n  Returns : 1 for success, 0 for failure\n  Args    : Bio::Assembly::Tools::ContigSpectrum reference\n            sequence header string","label":"dissolve($self,$mixed_csp,$seq_header)"}},{"name":"cross","containerName":"main::","children":[{"definition":"my","line":624,"name":"$self","localvar":"my","kind":13,"containerName":"cross"},{"name":"$mixed_csp","containerName":"cross","kind":13,"line":624},{"line":625,"containerName":"cross","kind":13,"name":"$self"},{"line":625,"kind":12,"containerName":"cross","name":"_import_cross_csp"},{"line":625,"kind":13,"containerName":"cross","name":"$mixed_csp"}],"detail":"($self,$mixed_csp)","definition":"sub","range":{"start":{"line":623,"character":0},"end":{"character":9999,"line":627}},"kind":12,"line":623,"signature":{"parameters":[{"label":"$self"},{"label":"$mixed_csp"}],"documentation":"__END__\n#\n# 
BioPerl module for Bio::Assembly::Tools::ContigSpectrum\n#\n# Copyright by Florent Angly\n#\n# You may distribute this module under the same terms as Perl itself\n#\n# POD documentation - main docs before the code\n\n=head1 NAME\n\nBio::Assembly::Tools::ContigSpectrum - create and manipulate contig spectra\n\n=head1 SYNOPSIS\n\n  # Simple contig spectrum creation\n  my $csp1 = Bio::Assembly::Tools::ContigSpectrum->new(\n    -id       => 'csp1',\n    -spectrum => { 1 => 10,\n                   2 => 2,\n                   3 => 1 } );\n\n  # ...or another way to create a simple contig spectrum\n  my $csp2 = Bio::Assembly::Tools::ContigSpectrum->new;\n  $csp2->id('csp2');\n  $csp2->spectrum({ 1 => 20, 2 => 1, 4 => 1 });\n\n  # Get some information\n  print \"This is contig spectrum \".$csp->id.\"\\n\";\n  print \"It contains \".$csp->nof_seq.\" sequences\\n\";\n  print \"The largest contig has \".$csp->max_size.\" sequences\\n\";\n  print \"The spectrum is: \".$csp->to_string($csp->spectrum).\"\\n\";\n\n  # Let's add the contig spectra\n  my $summed_csp = Bio::Assembly::Tools::ContigSpectrum->new;\n  $summed_csp->add($csp1);\n  $summed_csp->add($csp2);\n  print \"The summed contig spectrum is \".$summed_csp->to_string.\"\\n\";\n\n  # Make an average\n  my $avg_csp = Bio::Assembly::Tools::ContigSpectrum->new;\n  $avg_csp = $avg_csp->average([$csp1, $csp2]);\n  print \"The average contig spectrum is \".$avg_csp->to_string.\"\\n\";\n\n  # Get a contig spectrum from an assembly\n  my $from_assembly = Bio::Assembly::Tools::ContigSpectrum->new(\n    -assembly       => $assembly_object,\n    -eff_asm_params => 1);\n  print \"The contig spectrum from assembly is \".$from_assembly->to_string.\"\\n\";\n\n  # Report advanced information (possible because eff_asm_params = 1)\n  print \"Average sequence length: \".$from_assembly->avg_seq_length.\" bp\\n\";\n  print \"Minimum overlap length: \".$from_assembly->min_overlap.\" bp\\n\";\n  print \"Average overlap length: 
\".$from_assembly->avg_overlap.\" bp\\n\";\n  print \"Minimum overlap match: \".$from_assembly->min_identity.\" %\\n\";\n  print \"Average overlap match: \".$from_assembly->avg_identity.\" %\\n\";\n\n  # Assuming the assembly object contains sequences from several different\n  # metagenomes, we have a mixed contig spectrum from which a cross contig\n  # spectrum and dissolved contig spectra can be obtained\n  my $mixed_csp = $from_assembly;\n\n  # Calculate a dissolved contig spectrum\n  my $meta1_dissolved = Bio::Assembly::Tools::ContigSpectrum->new(\n    -dissolve => [$mixed_csp, 'metagenome1'] );\n  my $meta2_dissolved = Bio::Assembly::Tools::ContigSpectrum->new(\n    -dissolve => [$mixed_csp, 'metagenome2'] );\n  print \"The dissolved contig spectra are:\\n\".\n    $meta1_dissolved->to_string.\"\\n\".\n    $meta2_dissolved->to_string.\"\\n\";\n\n  # Determine a cross contig spectrum\n  my $cross_csp = Bio::Assembly::Tools::ContigSpectrum->new(\n    -cross => $mixed_csp );\n  print \"The cross contig spectrum is \".$cross_csp->to_string.\"\\n\";\n\n  # Score a contig spectrum (the more abundant the contigs and the larger their\n  # size, the larger the score)\n\n\n=head1 DESCRIPTION\n\nThe Bio::Assembly::Tools::ContigSpectrum Perl module enables to\nmanually create contig spectra, import them from assemblies,\nmanipulate them, transform between different types of contig spectra\nand output them.\n\nBio::Assembly::Tools::ContigSpectrum is a module to create, manipulate\nand output contig spectra, assembly-derived data used in metagenomics\n(community genomics) for diversity estimation.\n\n=head2 Background\n\nA contig spectrum is the count of the number of contigs of different\nsize in an assembly. For example, the contig spectrum [100 5 1 0 0\n...] 
means that there were 100 singlets (1-contigs), 5 contigs of 2\nsequences (2-contigs), 1 contig of 3 sequences (3-contig) and no\nlarger contigs.\n\nAn assembly can be produced from a mixture of sequences from different\nmetagenomes. The contig spectrum obtained from this assembly is a mixed\ncontig spectrum. The contribution of each metagenome to this mixed contig\nspectrum can be obtained by determining a dissolved contig spectrum.\n\nFinally, based on a mixed contig spectrum, a cross contig spectrum can\nbe determined. In a cross contig spectrum, only contigs containing\nsequences from different metagenomes are kept; \"pure\" contigs are\nexcluded. Additionally, the total number of singletons (1-contigs)\nfrom each region that assembles with any fragments from other regions\nis the number of 1-contigs in the cross contig spectrum.\n\n=head2 Implementation\n\nThe simplest representation of a contig spectrum is a hash where the\nkey is the contig size (number of sequences making up the contig) and\nthe value is the number of contigs of this size.\n\nIn fact, it is useful to have more information associated with the\ncontig spectrum, hence the Bio::Assembly::Tools::ContigSpectrum module\nimplements an object containing a contig spectrum hash and additional\ninformation. 
The get/set methods to access them are:\n\n    id              contig spectrum ID\n    nof_seq         number of sequences\n    nof_rep         number of repetitions (assemblies) used\n    max_size        size of (number of sequences in) the largest contig\n    nof_overlaps    number of overlaps\n    min_overlap     minimum overlap length for building a contig\n    min_identity    minimum sequence identity over the overlap length\n    avg_overlap     average overlap length\n    avg_identity    average overlap identity\n    avg_seq_length  average sequence length\n    eff_asm_params  effective assembly parameters\n    spectrum        hash representation of a contig spectrum\n\n  Operations on the contig spectra:\n\n    to_string       create a string representation of the spectrum\n    spectrum        import a hash contig spectrum\n    assembly        determine a contig spectrum from an assembly\n    dissolve        calculate a dissolved contig spectrum (based on assembly)\n    cross           produce a cross contig spectrum (based on assembly)\n    add             add a contig spectrum to an existing one\n    average         make an average of several contig spectra\n\nWhen using operations that rely on knowing \"where\" (from what\nmetagenomes) a sequence came from (i.e. when creating a dissolved or\ncross contig spectrum), make sure that the sequences used for the\nassembly have a name header, e.g.  E<gt>metagenome1|seq1,\nE<gt>metagenome2|seq1, ...\n\n=head1 FEEDBACK\n\n=head2 Mailing Lists\n\nUser feedback is an integral part of the evolution of this and other\nBioperl modules. 
Send your comments and suggestions preferably to the\nBioperl mailing lists. Your participation is much appreciated.\n\n  bioperl-l@bioperl.org                  - General discussion\n  http://bioperl.org/wiki/Mailing_lists  - About the mailing lists\n\n\n=head2 Support \n\nPlease direct usage questions or support issues to the mailing list:\n\nI<bioperl-l@bioperl.org>\n\nrather than to the module maintainer directly. Many experienced and \nresponsive experts will be able to look at the problem and quickly \naddress it. Please include a thorough description of the problem \nwith code and data examples if at all possible.\n\n=head2 Reporting Bugs\n\nReport bugs to the BioPerl bug tracking system to help us keep track\nof the bugs and their resolution. Bug reports can be submitted via email\nor the web:\n\n  bioperl-bugs@bio.perl.org\n  http://bugzilla.bioperl.org/\n\n=head1 AUTHOR - Florent E Angly\n\nEmail florent_dot_angly_at_gmail_dot_com\n\n=head1 APPENDIX\n\nThe rest of the documentation details each of the object\nmethods. 
Internal methods are usually preceded with a \"_\".\n\n\npackage Bio::Assembly::Tools::ContigSpectrum;\n\nuse strict;\n\nuse Bio::Root::Root;\nuse Bio::Assembly::Scaffold;\nuse Bio::SimpleAlign;\nuse Bio::LocatableSeq;\nuse Bio::Align::PairwiseStatistics;\n\nuse base 'Bio::Root::Root';\n\n\n=head2 new\n\n  Title   : new\n  Usage   : my $csp = Bio::Assembly::Tools::ContigSpectrum->new();\n              or\n            my $csp = Bio::Assembly::Tools::ContigSpectrum->new(\n              -id => 'some_name',\n              -spectrum =>  { 1 => 90 , 2 => 3 , 4 => 1 },\n            );\n              or\n            my $csp = Bio::Assembly::Tools::ContigSpectrum->new(\n              -assembly =>  $assembly_obj\n            );\n  Function: create a new contig spectrum object\n  Returns : reference to a contig spectrum object\n  Args    : none\n\n\nsub new {\n  my ($class, @args) = @_;\n  my $self = $class->SUPER::new(@args);\n  my ( $id, $nof_seq, $nof_rep, $max_size, $nof_overlaps, $min_overlap,\n    $min_identity, $avg_overlap, $avg_identity, $avg_seq_len, $spectrum,\n    $assembly, $eff_asm_params, $dissolve, $cross) = $self->_rearrange( [qw(ID\n    NOF_SEQ NOF_REP MAX_SIZE NOF_OVERLAPS MIN_OVERLAP MIN_IDENTITY AVG_OVERLAP\n    AVG_IDENTITY AVG_SEQ_LEN SPECTRUM ASSEMBLY EFF_ASM_PARAMS DISSOLVE CROSS)],\n    @args );\n\n  # First set up some defauts\n  $self->{'_id'}             = 'NoName';\n  $self->{'_nof_seq'}        = 0;\n  $self->{'_nof_rep'}        = 0;\n  $self->{'_max_size'}       = 0;\n  $self->{'_nof_overlaps'}   = 0;\n  $self->{'_min_overlap'}    = undef;\n  $self->{'_min_identity'}   = undef;\n  $self->{'_avg_overlap'}    = 0;\n  $self->{'_avg_identity'}   = 0;\n  $self->{'_avg_seq_len'}    = 0;\n  $self->{'_eff_asm_params'} = 0;\n  $self->{'_spectrum'}       = {1 => 0};  # contig spectrum hash representation\n  $self->{'_assembly'}       = []; # list of assembly objects used\n\n  # Then, according to user desires, override defaults\n  $self->{'_id'}           
  = $id             if (defined $id);\n  $self->{'_nof_seq'}        = $nof_seq        if (defined $nof_seq);\n  $self->{'_nof_rep'}        = $nof_rep        if (defined $nof_rep);\n  $self->{'_max_size'}       = $max_size       if (defined $max_size);\n  $self->{'_nof_overlaps'}   = $nof_overlaps   if (defined $nof_overlaps);\n  $self->{'_min_overlap'}    = $min_overlap    if (defined $min_overlap);\n  $self->{'_avg_overlap'}    = $avg_overlap    if (defined $avg_overlap);\n  $self->{'_min_identity'}   = $min_identity   if (defined $min_identity);\n  $self->{'_avg_identity'}   = $avg_identity   if (defined $avg_identity);\n  $self->{'_avg_seq_len'}    = $avg_seq_len    if (defined $avg_seq_len);\n  $self->{'_eff_asm_params'} = $eff_asm_params if (defined $eff_asm_params);\n\n  # Finally get stuff that can be gotten in an automated way\n  $self->_import_spectrum($spectrum) if defined($spectrum);\n  $self->_import_assembly($assembly) if defined($assembly);\n  if (defined($dissolve)) {\n    my ($mixed_csp, $header) = (@$dissolve[0], @$dissolve[1]);\n    $self->_import_dissolved_csp($mixed_csp, $header);\n  }\n  $self->_import_cross_csp($cross)   if defined($cross);\n\n  return $self;\n}\n\n\n=head2 id\n\n  Title   : id\n  Usage   : $csp->id\n  Function: get/set contig spectrum id\n  Returns : string\n  Args    : string [optional]\n\n\nsub id {\n  my ($self, $id) = @_;\n  if (defined $id) {\n    $self->{'_id'} = $id;\n  }\n  $id = $self->{'_id'};\n  return $id;\n}\n\n\n=head2 nof_seq\n\n  Title   : nof_seq\n  Usage   : $csp->nof_seq\n  Function: get/set the number of sequences making up the contig spectrum\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub nof_seq {\n  my ($self, $nof_seq) = @_;\n  if (defined $nof_seq) {\n    $self->throw(\"The number of sequences must be strictly positive. 
Got \".\n      \"'$nof_seq'\") if $nof_seq < 1;\n    $self->{'_nof_seq'} = $nof_seq;\n  }\n  $nof_seq = $self->{'_nof_seq'};\n  return $nof_seq;\n}\n\n\n=head2 nof_rep\n\n  Title   : nof_rep\n  Usage   : $csp->nof_rep\n  Function: Get/Set the number of repetitions (assemblies) used to create the \n            contig spectrum\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub nof_rep {\n  my ($self, $nof_rep) = @_;\n  if (defined $nof_rep) {\n    $self->throw(\"The number of repetitions must be strictly positive. Got \".\n      \"'$nof_rep'\") if $nof_rep < 1;\n    $self->{'_nof_rep'} = $nof_rep;\n  }\n  $nof_rep = $self->{'_nof_rep'};\n  return $nof_rep;\n}\n\n\n=head2 max_size\n\n  Title   : max_size\n  Usage   : $csp->max_size\n  Function: get/set the size of (number of sequences in) the largest contig\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub max_size {\n  my ($self, $max_size) = @_;\n  if (defined $max_size) {\n    $self->throw(\"The contig maximum size must be strictly positive. Got \".\n      \"'$max_size'\") if $max_size < 1;\n    $self->{'_max_size'} = $max_size;\n  }\n  $max_size = $self->{'_max_size'};\n  return $max_size;\n}\n\n\n=head2 nof_overlaps\n\n  Title   : nof_overlaps\n  Usage   : $csp->nof_overlaps\n  Function: Get/Set the number of overlaps in the assembly.\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub nof_overlaps {\n  my ($self, $nof_overlaps) = @_;\n  if (defined $nof_overlaps) {\n    $self->throw(\"The number of overlaps must be strictly positive. 
Got \".\n      \"'$nof_overlaps'\") if $nof_overlaps < 1;\n    $self->{'_nof_overlaps'} = $nof_overlaps;\n  }\n  $nof_overlaps = $self->{'_nof_overlaps'};\n  return $nof_overlaps;\n}\n\n\n=head2 min_overlap\n\n  Title   : min_overlap\n  Usage   : $csp->min_overlap\n  Function: get/set the assembly minimum overlap length\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub min_overlap {\n  my ($self, $min_overlap) = @_;\n  if (defined $min_overlap) {\n    $self->throw(\"The minimum of overlap length must be strictly positive. Got\".\n      \" '$min_overlap'\") if $min_overlap < 1;\n    $self->{'_min_overlap'} = $min_overlap;\n  }\n  $min_overlap = $self->{'_min_overlap'};\n  return $min_overlap;\n}\n\n\n=head2 avg_overlap\n\n  Title   : avg_overlap\n  Usage   : $csp->avg_overlap\n  Function: get/set the assembly average overlap length\n  Returns : decimal\n  Args    : decimal [optional]\n\n\nsub avg_overlap {\n  my ($self, $avg_overlap) = @_;\n  if (defined $avg_overlap) {\n    $self->throw(\"The average overlap length must be strictly positive. Got \".\n      \"'$avg_overlap'\") if $avg_overlap < 1;\n    $self->{'_avg_overlap'} = $avg_overlap;\n  }\n  $avg_overlap = $self->{'_avg_overlap'};\n  return $avg_overlap;\n}\n\n\n=head2 min_identity\n\n  Title   : min_identity\n  Usage   : $csp->min_identity\n  Function: get/set the assembly minimum overlap identity percent\n  Returns : 0 < decimal < 100\n  Args    : 0 < decimal < 100 [optional]\n\n\nsub min_identity {\n  my ($self, $min_identity) = @_;\n  if (defined $min_identity) {\n    $self->throw(\"The minimum overlap percent identity must be strictly \".\n      \"positive. 
Got '$min_identity'\") if $min_identity < 1;\n    $self->{'_min_identity'} = $min_identity;\n  }\n  $min_identity = $self->{'_min_identity'};\n  return $min_identity;\n}\n\n\n=head2 avg_identity\n\n  Title   : avg_identity\n  Usage   : $csp->avg_identity\n  Function: get/set the assembly average overlap identity percent\n  Returns : 0 < decimal < 100\n  Args    : 0 < decimal < 100 [optional]\n\n\nsub avg_identity {\n  my ($self, $avg_identity) = @_;\n  if (defined $avg_identity) {\n    $self->throw(\"The average overlap percent identity must be strictly \".\n      \"positive. Got '$avg_identity'\") if $avg_identity < 1;\n    $self->{'_avg_identity'} = $avg_identity;\n  }\n  $avg_identity = $self->{'_avg_identity'};\n  return $avg_identity;\n}\n\n\n=head2 avg_seq_len\n\n  Title   : avg_seq_len\n  Usage   : $csp->avg_seq_len\n  Function: get/set the assembly average sequence length\n  Returns : avg_seq_len\n  Args    : real [optional]\n\n\nsub avg_seq_len {\n  my ($self, $avg_seq_len) = @_;\n  if (defined $avg_seq_len) {\n    $self->throw(\"The average sequence length must be strictly positive. Got \".\n      \"'$avg_seq_len'\") if $avg_seq_len < 1;\n    $self->{'_avg_seq_len'} = $avg_seq_len;\n  }\n  $avg_seq_len = $self->{'_avg_seq_len'};\n  return $avg_seq_len;\n}\n\n\n=head2 eff_asm_params\n\n  Title   : eff_asm_params\n  Usage   : $csp->eff_asm_params(1)\n  Function: Get/set the effective assembly parameters option. It defines if the\n            effective assembly parameters should be determined when a contig\n            spectrum based or derived from an assembly is calulated. 
The\n            effective assembly parameters include avg_seq_length, nof_overlaps,\n            min_overlap, avg_overlap, min_identity and avg_identity.\n            1 = get them, 0 = don't.\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub eff_asm_params {\n  my ($self, $eff_asm_params) = @_;\n  if (defined $eff_asm_params) {\n    $self->throw(\"eff_asm_params can only take values 0 or 1. Input value was \".\n      \"'$eff_asm_params'\") unless $eff_asm_params == 0 || $eff_asm_params == 1;\n    $self->{'_eff_asm_params'} = $eff_asm_params;\n  }\n  $eff_asm_params = $self->{'_eff_asm_params'};\n  return $eff_asm_params;\n}\n\n\n=head2 spectrum\n\n  Title   : spectrum\n  Usage   : my $spectrum = $csp->spectrum({1=>10, 2=>2, 3=>1});\n  Function: Get the current contig spectrum represented as a hash / Update a\n            contig spectrum object based on a contig spectrum represented as a\n            hash\n            The hash representation of a contig spectrum is as following:\n              key   -> contig size (in number of sequences)\n              value -> number of contigs of this size\n  Returns : contig spectrum as a hash reference\n  Args    : contig spectrum as a hash reference [optional]\n\n\nsub spectrum {\n  my ($self, $spectrum) = @_;\n  if (defined $spectrum) {\n    $self->_import_spectrum($spectrum);\n  }\n  $spectrum = $self->{'_spectrum'};\n  return $spectrum;\n}\n\n\n=head2 assembly\n\n  Title   : assembly\n  Usage   : my @asm_list = $csp->assembly();\n  Function: Get a reference to the list of assembly object reference used to\n            make the contig spectrum object / Update the contig spectrum object\n            based on an assembly object.\n  Returns : array of Bio::Assembly::Scaffold\n  Args    : Bio::Assembly::Scaffold\n\n\nsub assembly {\n  my ($self, $assembly) = @_;\n  if (defined $assembly) {\n    $self->_import_assembly($assembly);\n  }\n  my @asm_list = @{$self->{'_assembly'}} if defined $self->{'_assembly'};\n  
return \\@asm_list;\n}\n\n=head2 drop_assembly\n\n  Title   : drop_assembly\n  Usage   : $csp->drop_assembly();\n  Function: Remove all assembly objects associated with a contig spectrum.\n            Assembly objects can be big. This method allows to free some memory\n            when assembly information is not needed anymore.\n  Returns : 1 for success, 0 for failure\n  Args    : none\n\n\nsub drop_assembly {\n  my ($self) = @_;\n  $self->{'_assembly'} = [];\n  return 1;\n}\n\n=head2 dissolve\n\n  Title   : dissolve\n  Usage   : $dissolved_csp->dissolve($mixed_csp, $seq_header);\n  Function: Dissolve a mixed contig spectrum for the set of sequences that\n            contain the specified header, i.e. determine the contribution of\n            these sequences to the mixed contig spectrum based on the assembly.\n            The mixed contig spectrum object must have been created based on one\n            (or several) assembly object(s). Additionally, min_overlap and\n            min_identity must have been set (either manually using min_overlap\n            or automatically by switching on the eff_asm_params option).\n  Returns : 1 for success, 0 for failure\n  Args    : Bio::Assembly::Tools::ContigSpectrum reference\n            sequence header string\n\n\n\nsub dissolve {\n  my ($self, $mixed_csp, $seq_header) = @_;\n  $self->_import_dissolved_csp($mixed_csp, $seq_header);\n  return 1;\n}\n\n\n=head2 cross\n\n  Title   : cross\n  Usage   : $cross_csp->cross($mixed_csp);\n  Function: Calculate a cross contig_spectrum based on a mixed contig_spectrum.\n  Returns : 1 for success, 0 for failure\n  Args    : Bio::Assembly::Tools::ContigSpectrum reference","label":"cross($self,$mixed_csp)"}},{"kind":12,"range":{"start":{"line":642,"character":0},"end":{"character":9999,"line":665}},"line":642,"signature":{"parameters":[{"label":"$self"},{"label":"$element_separator"}],"documentation":"__END__\n#\n# BioPerl module for Bio::Assembly::Tools::ContigSpectrum\n#\n# 
Copyright by Florent Angly\n#\n# You may distribute this module under the same terms as Perl itself\n#\n# POD documentation - main docs before the code\n\n=head1 NAME\n\nBio::Assembly::Tools::ContigSpectrum - create and manipulate contig spectra\n\n=head1 SYNOPSIS\n\n  # Simple contig spectrum creation\n  my $csp1 = Bio::Assembly::Tools::ContigSpectrum->new(\n    -id       => 'csp1',\n    -spectrum => { 1 => 10,\n                   2 => 2,\n                   3 => 1 } );\n\n  # ...or another way to create a simple contig spectrum\n  my $csp2 = Bio::Assembly::Tools::ContigSpectrum->new;\n  $csp2->id('csp2');\n  $csp2->spectrum({ 1 => 20, 2 => 1, 4 => 1 });\n\n  # Get some information\n  print \"This is contig spectrum \".$csp->id.\"\\n\";\n  print \"It contains \".$csp->nof_seq.\" sequences\\n\";\n  print \"The largest contig has \".$csp->max_size.\" sequences\\n\";\n  print \"The spectrum is: \".$csp->to_string($csp->spectrum).\"\\n\";\n\n  # Let's add the contig spectra\n  my $summed_csp = Bio::Assembly::Tools::ContigSpectrum->new;\n  $summed_csp->add($csp1);\n  $summed_csp->add($csp2);\n  print \"The summed contig spectrum is \".$summed_csp->to_string.\"\\n\";\n\n  # Make an average\n  my $avg_csp = Bio::Assembly::Tools::ContigSpectrum->new;\n  $avg_csp = $avg_csp->average([$csp1, $csp2]);\n  print \"The average contig spectrum is \".$avg_csp->to_string.\"\\n\";\n\n  # Get a contig spectrum from an assembly\n  my $from_assembly = Bio::Assembly::Tools::ContigSpectrum->new(\n    -assembly       => $assembly_object,\n    -eff_asm_params => 1);\n  print \"The contig spectrum from assembly is \".$from_assembly->to_string.\"\\n\";\n\n  # Report advanced information (possible because eff_asm_params = 1)\n  print \"Average sequence length: \".$from_assembly->avg_seq_length.\" bp\\n\";\n  print \"Minimum overlap length: \".$from_assembly->min_overlap.\" bp\\n\";\n  print \"Average overlap length: \".$from_assembly->avg_overlap.\" bp\\n\";\n  print \"Minimum overlap match: 
\".$from_assembly->min_identity.\" %\\n\";\n  print \"Average overlap match: \".$from_assembly->avg_identity.\" %\\n\";\n\n  # Assuming the assembly object contains sequences from several different\n  # metagenomes, we have a mixed contig spectrum from which a cross contig\n  # spectrum and dissolved contig spectra can be obtained\n  my $mixed_csp = $from_assembly;\n\n  # Calculate a dissolved contig spectrum\n  my $meta1_dissolved = Bio::Assembly::Tools::ContigSpectrum->new(\n    -dissolve => [$mixed_csp, 'metagenome1'] );\n  my $meta2_dissolved = Bio::Assembly::Tools::ContigSpectrum->new(\n    -dissolve => [$mixed_csp, 'metagenome2'] );\n  print \"The dissolved contig spectra are:\\n\".\n    $meta1_dissolved->to_string.\"\\n\".\n    $meta2_dissolved->to_string.\"\\n\";\n\n  # Determine a cross contig spectrum\n  my $cross_csp = Bio::Assembly::Tools::ContigSpectrum->new(\n    -cross => $mixed_csp );\n  print \"The cross contig spectrum is \".$cross_csp->to_string.\"\\n\";\n\n  # Score a contig spectrum (the more abundant the contigs and the larger their\n  # size, the larger the score)\n\n\n=head1 DESCRIPTION\n\nThe Bio::Assembly::Tools::ContigSpectrum Perl module enables to\nmanually create contig spectra, import them from assemblies,\nmanipulate them, transform between different types of contig spectra\nand output them.\n\nBio::Assembly::Tools::ContigSpectrum is a module to create, manipulate\nand output contig spectra, assembly-derived data used in metagenomics\n(community genomics) for diversity estimation.\n\n=head2 Background\n\nA contig spectrum is the count of the number of contigs of different\nsize in an assembly. For example, the contig spectrum [100 5 1 0 0\n...] means that there were 100 singlets (1-contigs), 5 contigs of 2\nsequences (2-contigs), 1 contig of 3 sequences (3-contig) and no\nlarger contigs.\n\nAn assembly can be produced from a mixture of sequences from different\nmetagenomes. 
The contig obtained from this assembly is a mixed contig\nspectrum. The contribution of each metagenome in this mixed contig\nspectrum can be obtained by determining a dissolved contig spectrum.\n\nFinally, based on a mixed contig spectrum, a cross contig spectrum can\nbe determined. In a cross contig spectrum, only contigs containing\nsequences from different metagenomes are kept; \"pure\" contigs are\nexcluded. Additionally, the total number of singletons (1-contigs)\nfrom each region that assembles with any fragments from other regions\nis the number of 1-contigs in the cross contig spectrum.\n\n=head2 Implementation\n\nThe simplest representation of a contig spectrum is as a hash\nrepresentation where the key is the contig size (number of sequences\nmaking up the contig) and the value the number of contigs of this\nsize.\n\nIn fact, it is useful to have more information associated with the\ncontig spectrum, hence the Bio::Assembly::Tools::ContigSpectrum module\nimplements an object containing a contig spectrum hash and additional\ninformation. 
The get/set methods to access them are:\n\n    id              contig spectrum ID\n    nof_seq         number of sequences\n    nof_rep         number of repetitions (assemblies) used\n    max_size        size of (number of sequences in) the largest contig\n    nof_overlaps    number of overlaps\n    min_overlap     minimum overlap length for building a contig\n    min_identity    minimum sequence identity over the overlap length\n    avg_overlap     average overlap length\n    avg_identity    average overlap identity\n    avg_seq_length  average sequence length\n    eff_asm_params  effective assembly parameters\n    spectrum        hash representation of a contig spectrum\n\n  Operations on the contig spectra:\n\n    to_string       create a string representation of the spectrum\n    spectrum        import a hash contig spectrum\n    assembly        determine a contig spectrum from an assembly\n    dissolve        calculate a dissolved contig spectrum (based on assembly)\n    cross           produce a cross contig spectrum (based on assembly)\n    add             add a contig spectrum to an existing one\n    average         make an average of several contig spectra\n\nWhen using operations that rely on knowing \"where\" (from what\nmetagenomes) a sequence came from (i.e. when creating a dissolved or\ncross contig spectrum), make sure that the sequences used for the\nassembly have a name header, e.g.  E<gt>metagenome1|seq1,\nE<gt>metagenome2|seq1, ...\n\n=head1 FEEDBACK\n\n=head2 Mailing Lists\n\nUser feedback is an integral part of the evolution of this and other\nBioperl modules. 
Send your comments and suggestions preferably to the\nBioperl mailing lists. Your participation is much appreciated.\n\n  bioperl-l@bioperl.org                  - General discussion\n  http://bioperl.org/wiki/Mailing_lists  - About the mailing lists\n\n\n=head2 Support \n\nPlease direct usage questions or support issues to the mailing list:\n\nI<bioperl-l@bioperl.org>\n\nrather than to the module maintainer directly. Many experienced and \nresponsive experts will be able to look at the problem and quickly \naddress it. Please include a thorough description of the problem \nwith code and data examples if at all possible.\n\n=head2 Reporting Bugs\n\nReport bugs to the BioPerl bug tracking system to help us keep track\nof the bugs and their resolution. Bug reports can be submitted via email\nor the web:\n\n  bioperl-bugs@bio.perl.org\n  http://bugzilla.bioperl.org/\n\n=head1 AUTHOR - Florent E Angly\n\nEmail florent_dot_angly_at_gmail_dot_com\n\n=head1 APPENDIX\n\nThe rest of the documentation details each of the object\nmethods. 
Internal methods are usually preceded with a \"_\".\n\n\npackage Bio::Assembly::Tools::ContigSpectrum;\n\nuse strict;\n\nuse Bio::Root::Root;\nuse Bio::Assembly::Scaffold;\nuse Bio::SimpleAlign;\nuse Bio::LocatableSeq;\nuse Bio::Align::PairwiseStatistics;\n\nuse base 'Bio::Root::Root';\n\n\n=head2 new\n\n  Title   : new\n  Usage   : my $csp = Bio::Assembly::Tools::ContigSpectrum->new();\n              or\n            my $csp = Bio::Assembly::Tools::ContigSpectrum->new(\n              -id => 'some_name',\n              -spectrum =>  { 1 => 90 , 2 => 3 , 4 => 1 },\n            );\n              or\n            my $csp = Bio::Assembly::Tools::ContigSpectrum->new(\n              -assembly =>  $assembly_obj\n            );\n  Function: create a new contig spectrum object\n  Returns : reference to a contig spectrum object\n  Args    : none\n\n\nsub new {\n  my ($class, @args) = @_;\n  my $self = $class->SUPER::new(@args);\n  my ( $id, $nof_seq, $nof_rep, $max_size, $nof_overlaps, $min_overlap,\n    $min_identity, $avg_overlap, $avg_identity, $avg_seq_len, $spectrum,\n    $assembly, $eff_asm_params, $dissolve, $cross) = $self->_rearrange( [qw(ID\n    NOF_SEQ NOF_REP MAX_SIZE NOF_OVERLAPS MIN_OVERLAP MIN_IDENTITY AVG_OVERLAP\n    AVG_IDENTITY AVG_SEQ_LEN SPECTRUM ASSEMBLY EFF_ASM_PARAMS DISSOLVE CROSS)],\n    @args );\n\n  # First set up some defauts\n  $self->{'_id'}             = 'NoName';\n  $self->{'_nof_seq'}        = 0;\n  $self->{'_nof_rep'}        = 0;\n  $self->{'_max_size'}       = 0;\n  $self->{'_nof_overlaps'}   = 0;\n  $self->{'_min_overlap'}    = undef;\n  $self->{'_min_identity'}   = undef;\n  $self->{'_avg_overlap'}    = 0;\n  $self->{'_avg_identity'}   = 0;\n  $self->{'_avg_seq_len'}    = 0;\n  $self->{'_eff_asm_params'} = 0;\n  $self->{'_spectrum'}       = {1 => 0};  # contig spectrum hash representation\n  $self->{'_assembly'}       = []; # list of assembly objects used\n\n  # Then, according to user desires, override defaults\n  $self->{'_id'}           
  = $id             if (defined $id);\n  $self->{'_nof_seq'}        = $nof_seq        if (defined $nof_seq);\n  $self->{'_nof_rep'}        = $nof_rep        if (defined $nof_rep);\n  $self->{'_max_size'}       = $max_size       if (defined $max_size);\n  $self->{'_nof_overlaps'}   = $nof_overlaps   if (defined $nof_overlaps);\n  $self->{'_min_overlap'}    = $min_overlap    if (defined $min_overlap);\n  $self->{'_avg_overlap'}    = $avg_overlap    if (defined $avg_overlap);\n  $self->{'_min_identity'}   = $min_identity   if (defined $min_identity);\n  $self->{'_avg_identity'}   = $avg_identity   if (defined $avg_identity);\n  $self->{'_avg_seq_len'}    = $avg_seq_len    if (defined $avg_seq_len);\n  $self->{'_eff_asm_params'} = $eff_asm_params if (defined $eff_asm_params);\n\n  # Finally get stuff that can be gotten in an automated way\n  $self->_import_spectrum($spectrum) if defined($spectrum);\n  $self->_import_assembly($assembly) if defined($assembly);\n  if (defined($dissolve)) {\n    my ($mixed_csp, $header) = (@$dissolve[0], @$dissolve[1]);\n    $self->_import_dissolved_csp($mixed_csp, $header);\n  }\n  $self->_import_cross_csp($cross)   if defined($cross);\n\n  return $self;\n}\n\n\n=head2 id\n\n  Title   : id\n  Usage   : $csp->id\n  Function: get/set contig spectrum id\n  Returns : string\n  Args    : string [optional]\n\n\nsub id {\n  my ($self, $id) = @_;\n  if (defined $id) {\n    $self->{'_id'} = $id;\n  }\n  $id = $self->{'_id'};\n  return $id;\n}\n\n\n=head2 nof_seq\n\n  Title   : nof_seq\n  Usage   : $csp->nof_seq\n  Function: get/set the number of sequences making up the contig spectrum\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub nof_seq {\n  my ($self, $nof_seq) = @_;\n  if (defined $nof_seq) {\n    $self->throw(\"The number of sequences must be strictly positive. 
Got \".\n      \"'$nof_seq'\") if $nof_seq < 1;\n    $self->{'_nof_seq'} = $nof_seq;\n  }\n  $nof_seq = $self->{'_nof_seq'};\n  return $nof_seq;\n}\n\n\n=head2 nof_rep\n\n  Title   : nof_rep\n  Usage   : $csp->nof_rep\n  Function: Get/Set the number of repetitions (assemblies) used to create the \n            contig spectrum\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub nof_rep {\n  my ($self, $nof_rep) = @_;\n  if (defined $nof_rep) {\n    $self->throw(\"The number of repetitions must be strictly positive. Got \".\n      \"'$nof_rep'\") if $nof_rep < 1;\n    $self->{'_nof_rep'} = $nof_rep;\n  }\n  $nof_rep = $self->{'_nof_rep'};\n  return $nof_rep;\n}\n\n\n=head2 max_size\n\n  Title   : max_size\n  Usage   : $csp->max_size\n  Function: get/set the size of (number of sequences in) the largest contig\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub max_size {\n  my ($self, $max_size) = @_;\n  if (defined $max_size) {\n    $self->throw(\"The contig maximum size must be strictly positive. Got \".\n      \"'$max_size'\") if $max_size < 1;\n    $self->{'_max_size'} = $max_size;\n  }\n  $max_size = $self->{'_max_size'};\n  return $max_size;\n}\n\n\n=head2 nof_overlaps\n\n  Title   : nof_overlaps\n  Usage   : $csp->nof_overlaps\n  Function: Get/Set the number of overlaps in the assembly.\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub nof_overlaps {\n  my ($self, $nof_overlaps) = @_;\n  if (defined $nof_overlaps) {\n    $self->throw(\"The number of overlaps must be strictly positive. 
Got \".\n      \"'$nof_overlaps'\") if $nof_overlaps < 1;\n    $self->{'_nof_overlaps'} = $nof_overlaps;\n  }\n  $nof_overlaps = $self->{'_nof_overlaps'};\n  return $nof_overlaps;\n}\n\n\n=head2 min_overlap\n\n  Title   : min_overlap\n  Usage   : $csp->min_overlap\n  Function: get/set the assembly minimum overlap length\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub min_overlap {\n  my ($self, $min_overlap) = @_;\n  if (defined $min_overlap) {\n    $self->throw(\"The minimum of overlap length must be strictly positive. Got\".\n      \" '$min_overlap'\") if $min_overlap < 1;\n    $self->{'_min_overlap'} = $min_overlap;\n  }\n  $min_overlap = $self->{'_min_overlap'};\n  return $min_overlap;\n}\n\n\n=head2 avg_overlap\n\n  Title   : avg_overlap\n  Usage   : $csp->avg_overlap\n  Function: get/set the assembly average overlap length\n  Returns : decimal\n  Args    : decimal [optional]\n\n\nsub avg_overlap {\n  my ($self, $avg_overlap) = @_;\n  if (defined $avg_overlap) {\n    $self->throw(\"The average overlap length must be strictly positive. Got \".\n      \"'$avg_overlap'\") if $avg_overlap < 1;\n    $self->{'_avg_overlap'} = $avg_overlap;\n  }\n  $avg_overlap = $self->{'_avg_overlap'};\n  return $avg_overlap;\n}\n\n\n=head2 min_identity\n\n  Title   : min_identity\n  Usage   : $csp->min_identity\n  Function: get/set the assembly minimum overlap identity percent\n  Returns : 0 < decimal < 100\n  Args    : 0 < decimal < 100 [optional]\n\n\nsub min_identity {\n  my ($self, $min_identity) = @_;\n  if (defined $min_identity) {\n    $self->throw(\"The minimum overlap percent identity must be strictly \".\n      \"positive. 
Got '$min_identity'\") if $min_identity < 1;\n    $self->{'_min_identity'} = $min_identity;\n  }\n  $min_identity = $self->{'_min_identity'};\n  return $min_identity;\n}\n\n\n=head2 avg_identity\n\n  Title   : avg_identity\n  Usage   : $csp->avg_identity\n  Function: get/set the assembly average overlap identity percent\n  Returns : 0 < decimal < 100\n  Args    : 0 < decimal < 100 [optional]\n\n\nsub avg_identity {\n  my ($self, $avg_identity) = @_;\n  if (defined $avg_identity) {\n    $self->throw(\"The average overlap percent identity must be strictly \".\n      \"positive. Got '$avg_identity'\") if $avg_identity < 1;\n    $self->{'_avg_identity'} = $avg_identity;\n  }\n  $avg_identity = $self->{'_avg_identity'};\n  return $avg_identity;\n}\n\n\n=head2 avg_seq_len\n\n  Title   : avg_seq_len\n  Usage   : $csp->avg_seq_len\n  Function: get/set the assembly average sequence length\n  Returns : avg_seq_len\n  Args    : real [optional]\n\n\nsub avg_seq_len {\n  my ($self, $avg_seq_len) = @_;\n  if (defined $avg_seq_len) {\n    $self->throw(\"The average sequence length must be strictly positive. Got \".\n      \"'$avg_seq_len'\") if $avg_seq_len < 1;\n    $self->{'_avg_seq_len'} = $avg_seq_len;\n  }\n  $avg_seq_len = $self->{'_avg_seq_len'};\n  return $avg_seq_len;\n}\n\n\n=head2 eff_asm_params\n\n  Title   : eff_asm_params\n  Usage   : $csp->eff_asm_params(1)\n  Function: Get/set the effective assembly parameters option. It defines if the\n            effective assembly parameters should be determined when a contig\n            spectrum based or derived from an assembly is calulated. 
The\n            effective assembly parameters include avg_seq_length, nof_overlaps,\n            min_overlap, avg_overlap, min_identity and avg_identity.\n            1 = get them, 0 = don't.\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub eff_asm_params {\n  my ($self, $eff_asm_params) = @_;\n  if (defined $eff_asm_params) {\n    $self->throw(\"eff_asm_params can only take values 0 or 1. Input value was \".\n      \"'$eff_asm_params'\") unless $eff_asm_params == 0 || $eff_asm_params == 1;\n    $self->{'_eff_asm_params'} = $eff_asm_params;\n  }\n  $eff_asm_params = $self->{'_eff_asm_params'};\n  return $eff_asm_params;\n}\n\n\n=head2 spectrum\n\n  Title   : spectrum\n  Usage   : my $spectrum = $csp->spectrum({1=>10, 2=>2, 3=>1});\n  Function: Get the current contig spectrum represented as a hash / Update a\n            contig spectrum object based on a contig spectrum represented as a\n            hash\n            The hash representation of a contig spectrum is as following:\n              key   -> contig size (in number of sequences)\n              value -> number of contigs of this size\n  Returns : contig spectrum as a hash reference\n  Args    : contig spectrum as a hash reference [optional]\n\n\nsub spectrum {\n  my ($self, $spectrum) = @_;\n  if (defined $spectrum) {\n    $self->_import_spectrum($spectrum);\n  }\n  $spectrum = $self->{'_spectrum'};\n  return $spectrum;\n}\n\n\n=head2 assembly\n\n  Title   : assembly\n  Usage   : my @asm_list = $csp->assembly();\n  Function: Get a reference to the list of assembly object reference used to\n            make the contig spectrum object / Update the contig spectrum object\n            based on an assembly object.\n  Returns : array of Bio::Assembly::Scaffold\n  Args    : Bio::Assembly::Scaffold\n\n\nsub assembly {\n  my ($self, $assembly) = @_;\n  if (defined $assembly) {\n    $self->_import_assembly($assembly);\n  }\n  my @asm_list = @{$self->{'_assembly'}} if defined $self->{'_assembly'};\n  
return \\@asm_list;\n}\n\n=head2 drop_assembly\n\n  Title   : drop_assembly\n  Usage   : $csp->drop_assembly();\n  Function: Remove all assembly objects associated with a contig spectrum.\n            Assembly objects can be big. This method allows to free some memory\n            when assembly information is not needed anymore.\n  Returns : 1 for success, 0 for failure\n  Args    : none\n\n\nsub drop_assembly {\n  my ($self) = @_;\n  $self->{'_assembly'} = [];\n  return 1;\n}\n\n=head2 dissolve\n\n  Title   : dissolve\n  Usage   : $dissolved_csp->dissolve($mixed_csp, $seq_header);\n  Function: Dissolve a mixed contig spectrum for the set of sequences that\n            contain the specified header, i.e. determine the contribution of\n            these sequences to the mixed contig spectrum based on the assembly.\n            The mixed contig spectrum object must have been created based on one\n            (or several) assembly object(s). Additionally, min_overlap and\n            min_identity must have been set (either manually using min_overlap\n            or automatically by switching on the eff_asm_params option).\n  Returns : 1 for success, 0 for failure\n  Args    : Bio::Assembly::Tools::ContigSpectrum reference\n            sequence header string\n\n\n\nsub dissolve {\n  my ($self, $mixed_csp, $seq_header) = @_;\n  $self->_import_dissolved_csp($mixed_csp, $seq_header);\n  return 1;\n}\n\n\n=head2 cross\n\n  Title   : cross\n  Usage   : $cross_csp->cross($mixed_csp);\n  Function: Calculate a cross contig_spectrum based on a mixed contig_spectrum.\n  Returns : 1 for success, 0 for failure\n  Args    : Bio::Assembly::Tools::ContigSpectrum reference\n\n\nsub cross {\n  my ($self, $mixed_csp) = @_;\n  $self->_import_cross_csp($mixed_csp);\n  return 1;\n}\n\n=head2 to_string\n\n  Title   : to_string\n  Usage   : my $csp_string = $csp->to_string;\n  Function: Convert the contig spectrum into a string (easy to print!!).\n  Returns : string\n  Args    : element 
separator (integer) [optional]\n              1 -> space-separated\n              2 -> tab-separated\n              3 -> newline-separated","label":"to_string($self,$element_separator)"},"containerName":"main::","name":"to_string","children":[{"definition":"my","name":"$self","localvar":"my","kind":13,"containerName":"to_string","line":643},{"name":"$element_separator","containerName":"to_string","kind":13,"line":643},{"name":"$self","kind":13,"containerName":"to_string","line":644},{"line":645,"name":"$element_separator","kind":13,"containerName":"to_string"},{"containerName":"to_string","kind":13,"name":"$element_separator","line":646},{"kind":13,"containerName":"to_string","name":"$element_separator","line":647},{"line":648,"name":"$element_separator","containerName":"to_string","kind":13},{"kind":13,"containerName":"to_string","name":"$element_separator","line":649},{"containerName":"to_string","kind":13,"name":"$element_separator","line":650},{"kind":13,"containerName":"to_string","name":"$element_separator","line":651},{"line":653,"containerName":"to_string","kind":13,"name":"$self"},{"line":653,"name":"throw","kind":12,"containerName":"to_string"},{"line":655,"containerName":"to_string","localvar":"my","kind":13,"name":"$str","definition":"my"},{"kind":13,"localvar":"my","containerName":"to_string","name":"$q","line":656,"definition":"my"},{"line":656,"name":"$q","kind":13,"containerName":"to_string"},{"name":"$self","kind":13,"containerName":"to_string","line":656},{"line":656,"name":"$q","kind":13,"containerName":"to_string"},{"line":657,"kind":13,"localvar":"my","containerName":"to_string","name":"$val","definition":"my"},{"containerName":"to_string","kind":13,"name":"$self","line":658},{"line":658,"kind":13,"containerName":"to_string","name":"$q"},{"kind":13,"containerName":"to_string","name":"$val","line":659},{"containerName":"to_string","kind":13,"name":"$self","line":659},{"name":"$q","containerName":"to_string","kind":13,"line":659},{"kind":13,"conta
inerName":"to_string","name":"$str","line":661},{"line":661,"name":"$val","kind":13,"containerName":"to_string"},{"line":661,"containerName":"to_string","kind":13,"name":"$element_separator"},{"name":"$str","kind":13,"containerName":"to_string","line":663},{"line":664,"name":"$str","containerName":"to_string","kind":13}],"detail":"($self,$element_separator)","definition":"sub"},{"children":[{"definition":"my","localvar":"my","containerName":"add","kind":13,"name":"$self","line":680},{"name":"$csp","kind":13,"containerName":"add","line":680},{"kind":13,"containerName":"add","name":"$csp","line":682},{"line":682,"name":"$csp","kind":13,"containerName":"add"},{"line":682,"containerName":"add","kind":12,"name":"isa"},{"containerName":"add","kind":13,"name":"$self","line":683},{"line":683,"containerName":"add","kind":12,"name":"throw"},{"containerName":"add","kind":13,"name":"$csp","line":684},{"name":"$self","kind":13,"containerName":"add","line":687},{"line":689,"kind":13,"containerName":"add","name":"$csp"},{"containerName":"add","kind":13,"name":"$self","line":690},{"name":"warn","containerName":"add","kind":12,"line":690},{"line":691,"kind":13,"containerName":"add","name":"$self"},{"line":693,"containerName":"add","kind":13,"name":"$csp"},{"line":694,"name":"$csp","kind":13,"containerName":"add"},{"line":694,"containerName":"add","kind":13,"name":"$self"},{"name":"$self","containerName":"add","kind":13,"line":695},{"line":695,"kind":12,"containerName":"add","name":"warn"},{"name":"$self","containerName":"add","kind":13,"line":697},{"line":698,"name":"$csp","containerName":"add","kind":13},{"definition":"my","line":702,"containerName":"add","localvar":"my","kind":13,"name":"$tot_num_overlaps"},{"line":702,"kind":13,"containerName":"add","name":"$csp"},{"line":702,"kind":13,"containerName":"add","name":"$self"},{"line":703,"kind":13,"containerName":"add","name":"$self"},{"name":"$csp","containerName":"add","kind":13,"line":703},{"line":704,"kind":13,"containerName":"a
dd","name":"$csp"},{"name":"$self","containerName":"add","kind":13,"line":704},{"kind":13,"containerName":"add","name":"$csp","line":705},{"line":705,"kind":13,"containerName":"add","name":"$self"},{"containerName":"add","kind":13,"name":"$self","line":706},{"line":706,"kind":13,"containerName":"add","name":"$csp"},{"line":707,"name":"$csp","kind":13,"containerName":"add"},{"name":"$self","kind":13,"containerName":"add","line":707},{"line":708,"kind":13,"containerName":"add","name":"$csp"},{"kind":13,"containerName":"add","name":"$self","line":708},{"line":709,"name":"$tot_num_overlaps","kind":13,"containerName":"add"},{"line":710,"containerName":"add","kind":13,"name":"$self"},{"containerName":"add","kind":13,"name":"$csp","line":711},{"name":"$csp","kind":13,"containerName":"add","line":711},{"line":712,"name":"$self","kind":13,"containerName":"add"},{"line":712,"name":"$self","containerName":"add","kind":13},{"line":713,"kind":13,"containerName":"add","name":"$tot_num_overlaps"},{"line":714,"name":"$self","containerName":"add","kind":13},{"containerName":"add","kind":13,"name":"$csp","line":715},{"name":"$csp","containerName":"add","kind":13,"line":715},{"name":"$self","containerName":"add","kind":13,"line":716},{"name":"$self","containerName":"add","kind":13,"line":716},{"name":"$tot_num_overlaps","kind":13,"containerName":"add","line":717},{"line":719,"name":"$self","kind":13,"containerName":"add"},{"kind":13,"containerName":"add","name":"$tot_num_overlaps","line":719},{"definition":"my","localvar":"my","kind":13,"containerName":"add","name":"$tot_nof_seq","line":722},{"line":722,"containerName":"add","kind":13,"name":"$csp"},{"kind":13,"containerName":"add","name":"$self","line":722},{"line":723,"kind":13,"containerName":"add","name":"$tot_nof_seq"},{"line":724,"name":"$self","kind":13,"containerName":"add"},{"containerName":"add","kind":13,"name":"$csp","line":724},{"name":"$csp","containerName":"add","kind":13,"line":724},{"line":725,"containerName":"add","k
ind":13,"name":"$self"},{"name":"$self","containerName":"add","kind":13,"line":725},{"kind":13,"containerName":"add","name":"$tot_nof_seq","line":725},{"containerName":"add","kind":13,"name":"$self","line":728},{"line":728,"name":"_import_spectrum","containerName":"add","kind":12},{"line":728,"containerName":"add","kind":13,"name":"$csp"},{"line":730,"containerName":"add","kind":13,"name":"$self"},{"name":"$self","kind":13,"containerName":"add","line":731},{"line":731,"name":"$csp","kind":13,"containerName":"add"},{"line":733,"containerName":"add","kind":13,"name":"$self"}],"containerName":"main::","name":"add","definition":"sub","detail":"($self,$csp)","line":679,"range":{"start":{"line":679,"character":0},"end":{"line":733,"character":9999}},"kind":12,"signature":{"label":"add($self,$csp)","parameters":[{"label":"$self"},{"label":"$csp"}],"documentation":"__END__\n#\n# BioPerl module for Bio::Assembly::Tools::ContigSpectrum\n#\n# Copyright by Florent Angly\n#\n# You may distribute this module under the same terms as Perl itself\n#\n# POD documentation - main docs before the code\n\n=head1 NAME\n\nBio::Assembly::Tools::ContigSpectrum - create and manipulate contig spectra\n\n=head1 SYNOPSIS\n\n  # Simple contig spectrum creation\n  my $csp1 = Bio::Assembly::Tools::ContigSpectrum->new(\n    -id       => 'csp1',\n    -spectrum => { 1 => 10,\n                   2 => 2,\n                   3 => 1 } );\n\n  # ...or another way to create a simple contig spectrum\n  my $csp2 = Bio::Assembly::Tools::ContigSpectrum->new;\n  $csp2->id('csp2');\n  $csp2->spectrum({ 1 => 20, 2 => 1, 4 => 1 });\n\n  # Get some information\n  print \"This is contig spectrum \".$csp->id.\"\\n\";\n  print \"It contains \".$csp->nof_seq.\" sequences\\n\";\n  print \"The largest contig has \".$csp->max_size.\" sequences\\n\";\n  print \"The spectrum is: \".$csp->to_string($csp->spectrum).\"\\n\";\n\n  # Let's add the contig spectra\n  my $summed_csp = Bio::Assembly::Tools::ContigSpectrum->new;\n  
$summed_csp->add($csp1);\n  $summed_csp->add($csp2);\n  print \"The summed contig spectrum is \".$summed_csp->to_string.\"\\n\";\n\n  # Make an average\n  my $avg_csp = Bio::Assembly::Tools::ContigSpectrum->new;\n  $avg_csp = $avg_csp->average([$csp1, $csp2]);\n  print \"The average contig spectrum is \".$avg_csp->to_string.\"\\n\";\n\n  # Get a contig spectrum from an assembly\n  my $from_assembly = Bio::Assembly::Tools::ContigSpectrum->new(\n    -assembly       => $assembly_object,\n    -eff_asm_params => 1);\n  print \"The contig spectrum from assembly is \".$from_assembly->to_string.\"\\n\";\n\n  # Report advanced information (possible because eff_asm_params = 1)\n  print \"Average sequence length: \".$from_assembly->avg_seq_length.\" bp\\n\";\n  print \"Minimum overlap length: \".$from_assembly->min_overlap.\" bp\\n\";\n  print \"Average overlap length: \".$from_assembly->avg_overlap.\" bp\\n\";\n  print \"Minimum overlap match: \".$from_assembly->min_identity.\" %\\n\";\n  print \"Average overlap match: \".$from_assembly->avg_identity.\" %\\n\";\n\n  # Assuming the assembly object contains sequences from several different\n  # metagenomes, we have a mixed contig spectrum from which a cross contig\n  # spectrum and dissolved contig spectra can be obtained\n  my $mixed_csp = $from_assembly;\n\n  # Calculate a dissolved contig spectrum\n  my $meta1_dissolved = Bio::Assembly::Tools::ContigSpectrum->new(\n    -dissolve => [$mixed_csp, 'metagenome1'] );\n  my $meta2_dissolved = Bio::Assembly::Tools::ContigSpectrum->new(\n    -dissolve => [$mixed_csp, 'metagenome2'] );\n  print \"The dissolved contig spectra are:\\n\".\n    $meta1_dissolved->to_string.\"\\n\".\n    $meta2_dissolved->to_string.\"\\n\";\n\n  # Determine a cross contig spectrum\n  my $cross_csp = Bio::Assembly::Tools::ContigSpectrum->new(\n    -cross => $mixed_csp );\n  print \"The cross contig spectrum is \".$cross_csp->to_string.\"\\n\";\n\n  # Score a contig spectrum (the more abundant the contigs 
and the larger their\n  # size, the larger the score)\n\n\n=head1 DESCRIPTION\n\nThe Bio::Assembly::Tools::ContigSpectrum Perl module enables to\nmanually create contig spectra, import them from assemblies,\nmanipulate them, transform between different types of contig spectra\nand output them.\n\nBio::Assembly::Tools::ContigSpectrum is a module to create, manipulate\nand output contig spectra, assembly-derived data used in metagenomics\n(community genomics) for diversity estimation.\n\n=head2 Background\n\nA contig spectrum is the count of the number of contigs of different\nsize in an assembly. For example, the contig spectrum [100 5 1 0 0\n...] means that there were 100 singlets (1-contigs), 5 contigs of 2\nsequences (2-contigs), 1 contig of 3 sequences (3-contig) and no\nlarger contigs.\n\nAn assembly can be produced from a mixture of sequences from different\nmetagenomes. The contig obtained from this assembly is a mixed contig\nspectrum. The contribution of each metagenome in this mixed contig\nspectrum can be obtained by determining a dissolved contig spectrum.\n\nFinally, based on a mixed contig spectrum, a cross contig spectrum can\nbe determined. In a cross contig spectrum, only contigs containing\nsequences from different metagenomes are kept; \"pure\" contigs are\nexcluded. Additionally, the total number of singletons (1-contigs)\nfrom each region that assembles with any fragments from other regions\nis the number of 1-contigs in the cross contig spectrum.\n\n=head2 Implementation\n\nThe simplest representation of a contig spectrum is as a hash\nrepresentation where the key is the contig size (number of sequences\nmaking up the contig) and the value the number of contigs of this\nsize.\n\nIn fact, it is useful to have more information associated with the\ncontig spectrum, hence the Bio::Assembly::Tools::ContigSpectrum module\nimplements an object containing a contig spectrum hash and additional\ninformation. 
The get/set methods to access them are:\n\n    id              contig spectrum ID\n    nof_seq         number of sequences\n    nof_rep         number of repetitions (assemblies) used\n    max_size        size of (number of sequences in) the largest contig\n    nof_overlaps    number of overlaps\n    min_overlap     minimum overlap length for building a contig\n    min_identity    minimum sequence identity over the overlap length\n    avg_overlap     average overlap length\n    avg_identity    average overlap identity\n    avg_seq_length  average sequence length\n    eff_asm_params  effective assembly parameters\n    spectrum        hash representation of a contig spectrum\n\n  Operations on the contig spectra:\n\n    to_string       create a string representation of the spectrum\n    spectrum        import a hash contig spectrum\n    assembly        determine a contig spectrum from an assembly\n    dissolve        calculate a dissolved contig spectrum (based on assembly)\n    cross           produce a cross contig spectrum (based on assembly)\n    add             add a contig spectrum to an existing one\n    average         make an average of several contig spectra\n\nWhen using operations that rely on knowing \"where\" (from what\nmetagenomes) a sequence came from (i.e. when creating a dissolved or\ncross contig spectrum), make sure that the sequences used for the\nassembly have a name header, e.g.  E<gt>metagenome1|seq1,\nE<gt>metagenome2|seq1, ...\n\n=head1 FEEDBACK\n\n=head2 Mailing Lists\n\nUser feedback is an integral part of the evolution of this and other\nBioperl modules. 
Send your comments and suggestions preferably to the\nBioperl mailing lists  Your participation is much appreciated.\n\n  bioperl-l@bioperl.org                  - General discussion\n  http://bioperl.org/wiki/Mailing_lists  - About the mailing lists\n\n\n=head2 Support \n\nPlease direct usage questions or support issues to the mailing list:\n\nI<bioperl-l@bioperl.org>\n\nrather than to the module maintainer directly. Many experienced and \nreponsive experts will be able look at the problem and quickly \naddress it. Please include a thorough description of the problem \nwith code and data examples if at all possible.\n\n=head2 Reporting Bugs\n\nReport bugs to the BioPerl bug tracking system to help us keep track\nthe bugs and their resolution. Bug reports can be submitted via email\nor the web:\n\n  bioperl-bugs@bio.perl.org\n  http://bugzilla.bioperl.org/\n\n=head1 AUTHOR - Florent E Angly\n\nEmail florent_dot_angly_at_gmail_dot_com\n\n=head1 APPENDIX\n\nThe rest of the documentation details each of the object\nmethods. 
Internal methods are usually preceded with a \"_\".\n\n\npackage Bio::Assembly::Tools::ContigSpectrum;\n\nuse strict;\n\nuse Bio::Root::Root;\nuse Bio::Assembly::Scaffold;\nuse Bio::SimpleAlign;\nuse Bio::LocatableSeq;\nuse Bio::Align::PairwiseStatistics;\n\nuse base 'Bio::Root::Root';\n\n\n=head2 new\n\n  Title   : new\n  Usage   : my $csp = Bio::Assembly::Tools::ContigSpectrum->new();\n              or\n            my $csp = Bio::Assembly::Tools::ContigSpectrum->new(\n              -id => 'some_name',\n              -spectrum =>  { 1 => 90 , 2 => 3 , 4 => 1 },\n            );\n              or\n            my $csp = Bio::Assembly::Tools::ContigSpectrum->new(\n              -assembly =>  $assembly_obj\n            );\n  Function: create a new contig spectrum object\n  Returns : reference to a contig spectrum object\n  Args    : none\n\n\nsub new {\n  my ($class, @args) = @_;\n  my $self = $class->SUPER::new(@args);\n  my ( $id, $nof_seq, $nof_rep, $max_size, $nof_overlaps, $min_overlap,\n    $min_identity, $avg_overlap, $avg_identity, $avg_seq_len, $spectrum,\n    $assembly, $eff_asm_params, $dissolve, $cross) = $self->_rearrange( [qw(ID\n    NOF_SEQ NOF_REP MAX_SIZE NOF_OVERLAPS MIN_OVERLAP MIN_IDENTITY AVG_OVERLAP\n    AVG_IDENTITY AVG_SEQ_LEN SPECTRUM ASSEMBLY EFF_ASM_PARAMS DISSOLVE CROSS)],\n    @args );\n\n  # First set up some defauts\n  $self->{'_id'}             = 'NoName';\n  $self->{'_nof_seq'}        = 0;\n  $self->{'_nof_rep'}        = 0;\n  $self->{'_max_size'}       = 0;\n  $self->{'_nof_overlaps'}   = 0;\n  $self->{'_min_overlap'}    = undef;\n  $self->{'_min_identity'}   = undef;\n  $self->{'_avg_overlap'}    = 0;\n  $self->{'_avg_identity'}   = 0;\n  $self->{'_avg_seq_len'}    = 0;\n  $self->{'_eff_asm_params'} = 0;\n  $self->{'_spectrum'}       = {1 => 0};  # contig spectrum hash representation\n  $self->{'_assembly'}       = []; # list of assembly objects used\n\n  # Then, according to user desires, override defaults\n  $self->{'_id'}           
  = $id             if (defined $id);\n  $self->{'_nof_seq'}        = $nof_seq        if (defined $nof_seq);\n  $self->{'_nof_rep'}        = $nof_rep        if (defined $nof_rep);\n  $self->{'_max_size'}       = $max_size       if (defined $max_size);\n  $self->{'_nof_overlaps'}   = $nof_overlaps   if (defined $nof_overlaps);\n  $self->{'_min_overlap'}    = $min_overlap    if (defined $min_overlap);\n  $self->{'_avg_overlap'}    = $avg_overlap    if (defined $avg_overlap);\n  $self->{'_min_identity'}   = $min_identity   if (defined $min_identity);\n  $self->{'_avg_identity'}   = $avg_identity   if (defined $avg_identity);\n  $self->{'_avg_seq_len'}    = $avg_seq_len    if (defined $avg_seq_len);\n  $self->{'_eff_asm_params'} = $eff_asm_params if (defined $eff_asm_params);\n\n  # Finally get stuff that can be gotten in an automated way\n  $self->_import_spectrum($spectrum) if defined($spectrum);\n  $self->_import_assembly($assembly) if defined($assembly);\n  if (defined($dissolve)) {\n    my ($mixed_csp, $header) = (@$dissolve[0], @$dissolve[1]);\n    $self->_import_dissolved_csp($mixed_csp, $header);\n  }\n  $self->_import_cross_csp($cross)   if defined($cross);\n\n  return $self;\n}\n\n\n=head2 id\n\n  Title   : id\n  Usage   : $csp->id\n  Function: get/set contig spectrum id\n  Returns : string\n  Args    : string [optional]\n\n\nsub id {\n  my ($self, $id) = @_;\n  if (defined $id) {\n    $self->{'_id'} = $id;\n  }\n  $id = $self->{'_id'};\n  return $id;\n}\n\n\n=head2 nof_seq\n\n  Title   : nof_seq\n  Usage   : $csp->nof_seq\n  Function: get/set the number of sequences making up the contig spectrum\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub nof_seq {\n  my ($self, $nof_seq) = @_;\n  if (defined $nof_seq) {\n    $self->throw(\"The number of sequences must be strictly positive. 
Got \".\n      \"'$nof_seq'\") if $nof_seq < 1;\n    $self->{'_nof_seq'} = $nof_seq;\n  }\n  $nof_seq = $self->{'_nof_seq'};\n  return $nof_seq;\n}\n\n\n=head2 nof_rep\n\n  Title   : nof_rep\n  Usage   : $csp->nof_rep\n  Function: Get/Set the number of repetitions (assemblies) used to create the \n            contig spectrum\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub nof_rep {\n  my ($self, $nof_rep) = @_;\n  if (defined $nof_rep) {\n    $self->throw(\"The number of repetitions must be strictly positive. Got \".\n      \"'$nof_rep'\") if $nof_rep < 1;\n    $self->{'_nof_rep'} = $nof_rep;\n  }\n  $nof_rep = $self->{'_nof_rep'};\n  return $nof_rep;\n}\n\n\n=head2 max_size\n\n  Title   : max_size\n  Usage   : $csp->max_size\n  Function: get/set the size of (number of sequences in) the largest contig\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub max_size {\n  my ($self, $max_size) = @_;\n  if (defined $max_size) {\n    $self->throw(\"The contig maximum size must be strictly positive. Got \".\n      \"'$max_size'\") if $max_size < 1;\n    $self->{'_max_size'} = $max_size;\n  }\n  $max_size = $self->{'_max_size'};\n  return $max_size;\n}\n\n\n=head2 nof_overlaps\n\n  Title   : nof_overlaps\n  Usage   : $csp->nof_overlaps\n  Function: Get/Set the number of overlaps in the assembly.\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub nof_overlaps {\n  my ($self, $nof_overlaps) = @_;\n  if (defined $nof_overlaps) {\n    $self->throw(\"The number of overlaps must be strictly positive. 
Got \".\n      \"'$nof_overlaps'\") if $nof_overlaps < 1;\n    $self->{'_nof_overlaps'} = $nof_overlaps;\n  }\n  $nof_overlaps = $self->{'_nof_overlaps'};\n  return $nof_overlaps;\n}\n\n\n=head2 min_overlap\n\n  Title   : min_overlap\n  Usage   : $csp->min_overlap\n  Function: get/set the assembly minimum overlap length\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub min_overlap {\n  my ($self, $min_overlap) = @_;\n  if (defined $min_overlap) {\n    $self->throw(\"The minimum of overlap length must be strictly positive. Got\".\n      \" '$min_overlap'\") if $min_overlap < 1;\n    $self->{'_min_overlap'} = $min_overlap;\n  }\n  $min_overlap = $self->{'_min_overlap'};\n  return $min_overlap;\n}\n\n\n=head2 avg_overlap\n\n  Title   : avg_overlap\n  Usage   : $csp->avg_overlap\n  Function: get/set the assembly average overlap length\n  Returns : decimal\n  Args    : decimal [optional]\n\n\nsub avg_overlap {\n  my ($self, $avg_overlap) = @_;\n  if (defined $avg_overlap) {\n    $self->throw(\"The average overlap length must be strictly positive. Got \".\n      \"'$avg_overlap'\") if $avg_overlap < 1;\n    $self->{'_avg_overlap'} = $avg_overlap;\n  }\n  $avg_overlap = $self->{'_avg_overlap'};\n  return $avg_overlap;\n}\n\n\n=head2 min_identity\n\n  Title   : min_identity\n  Usage   : $csp->min_identity\n  Function: get/set the assembly minimum overlap identity percent\n  Returns : 0 < decimal < 100\n  Args    : 0 < decimal < 100 [optional]\n\n\nsub min_identity {\n  my ($self, $min_identity) = @_;\n  if (defined $min_identity) {\n    $self->throw(\"The minimum overlap percent identity must be strictly \".\n      \"positive. 
Got '$min_identity'\") if $min_identity < 1;\n    $self->{'_min_identity'} = $min_identity;\n  }\n  $min_identity = $self->{'_min_identity'};\n  return $min_identity;\n}\n\n\n=head2 avg_identity\n\n  Title   : avg_identity\n  Usage   : $csp->avg_identity\n  Function: get/set the assembly average overlap identity percent\n  Returns : 0 < decimal < 100\n  Args    : 0 < decimal < 100 [optional]\n\n\nsub avg_identity {\n  my ($self, $avg_identity) = @_;\n  if (defined $avg_identity) {\n    $self->throw(\"The average overlap percent identity must be strictly \".\n      \"positive. Got '$avg_identity'\") if $avg_identity < 1;\n    $self->{'_avg_identity'} = $avg_identity;\n  }\n  $avg_identity = $self->{'_avg_identity'};\n  return $avg_identity;\n}\n\n\n=head2 avg_seq_len\n\n  Title   : avg_seq_len\n  Usage   : $csp->avg_seq_len\n  Function: get/set the assembly average sequence length\n  Returns : avg_seq_len\n  Args    : real [optional]\n\n\nsub avg_seq_len {\n  my ($self, $avg_seq_len) = @_;\n  if (defined $avg_seq_len) {\n    $self->throw(\"The average sequence length must be strictly positive. Got \".\n      \"'$avg_seq_len'\") if $avg_seq_len < 1;\n    $self->{'_avg_seq_len'} = $avg_seq_len;\n  }\n  $avg_seq_len = $self->{'_avg_seq_len'};\n  return $avg_seq_len;\n}\n\n\n=head2 eff_asm_params\n\n  Title   : eff_asm_params\n  Usage   : $csp->eff_asm_params(1)\n  Function: Get/set the effective assembly parameters option. It defines if the\n            effective assembly parameters should be determined when a contig\n            spectrum based or derived from an assembly is calulated. 
The\n            effective assembly parameters include avg_seq_length, nof_overlaps,\n            min_overlap, avg_overlap, min_identity and avg_identity.\n            1 = get them, 0 = don't.\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub eff_asm_params {\n  my ($self, $eff_asm_params) = @_;\n  if (defined $eff_asm_params) {\n    $self->throw(\"eff_asm_params can only take values 0 or 1. Input value was \".\n      \"'$eff_asm_params'\") unless $eff_asm_params == 0 || $eff_asm_params == 1;\n    $self->{'_eff_asm_params'} = $eff_asm_params;\n  }\n  $eff_asm_params = $self->{'_eff_asm_params'};\n  return $eff_asm_params;\n}\n\n\n=head2 spectrum\n\n  Title   : spectrum\n  Usage   : my $spectrum = $csp->spectrum({1=>10, 2=>2, 3=>1});\n  Function: Get the current contig spectrum represented as a hash / Update a\n            contig spectrum object based on a contig spectrum represented as a\n            hash\n            The hash representation of a contig spectrum is as following:\n              key   -> contig size (in number of sequences)\n              value -> number of contigs of this size\n  Returns : contig spectrum as a hash reference\n  Args    : contig spectrum as a hash reference [optional]\n\n\nsub spectrum {\n  my ($self, $spectrum) = @_;\n  if (defined $spectrum) {\n    $self->_import_spectrum($spectrum);\n  }\n  $spectrum = $self->{'_spectrum'};\n  return $spectrum;\n}\n\n\n=head2 assembly\n\n  Title   : assembly\n  Usage   : my @asm_list = $csp->assembly();\n  Function: Get a reference to the list of assembly object reference used to\n            make the contig spectrum object / Update the contig spectrum object\n            based on an assembly object.\n  Returns : array of Bio::Assembly::Scaffold\n  Args    : Bio::Assembly::Scaffold\n\n\nsub assembly {\n  my ($self, $assembly) = @_;\n  if (defined $assembly) {\n    $self->_import_assembly($assembly);\n  }\n  my @asm_list = @{$self->{'_assembly'}} if defined $self->{'_assembly'};\n  
return \\@asm_list;\n}\n\n=head2 drop_assembly\n\n  Title   : drop_assembly\n  Usage   : $csp->drop_assembly();\n  Function: Remove all assembly objects associated with a contig spectrum.\n            Assembly objects can be big. This method allows to free some memory\n            when assembly information is not needed anymore.\n  Returns : 1 for success, 0 for failure\n  Args    : none\n\n\nsub drop_assembly {\n  my ($self) = @_;\n  $self->{'_assembly'} = [];\n  return 1;\n}\n\n=head2 dissolve\n\n  Title   : dissolve\n  Usage   : $dissolved_csp->dissolve($mixed_csp, $seq_header);\n  Function: Dissolve a mixed contig spectrum for the set of sequences that\n            contain the specified header, i.e. determine the contribution of\n            these sequences to the mixed contig spectrum based on the assembly.\n            The mixed contig spectrum object must have been created based on one\n            (or several) assembly object(s). Additionally, min_overlap and\n            min_identity must have been set (either manually using min_overlap\n            or automatically by switching on the eff_asm_params option).\n  Returns : 1 for success, 0 for failure\n  Args    : Bio::Assembly::Tools::ContigSpectrum reference\n            sequence header string\n\n\n\nsub dissolve {\n  my ($self, $mixed_csp, $seq_header) = @_;\n  $self->_import_dissolved_csp($mixed_csp, $seq_header);\n  return 1;\n}\n\n\n=head2 cross\n\n  Title   : cross\n  Usage   : $cross_csp->cross($mixed_csp);\n  Function: Calculate a cross contig_spectrum based on a mixed contig_spectrum.\n  Returns : 1 for success, 0 for failure\n  Args    : Bio::Assembly::Tools::ContigSpectrum reference\n\n\nsub cross {\n  my ($self, $mixed_csp) = @_;\n  $self->_import_cross_csp($mixed_csp);\n  return 1;\n}\n\n=head2 to_string\n\n  Title   : to_string\n  Usage   : my $csp_string = $csp->to_string;\n  Function: Convert the contig spectrum into a string (easy to print!!).\n  Returns : string\n  Args    : element 
separator (integer) [optional]\n              1 -> space-separated\n              2 -> tab-separated\n              3 -> newline-separated\n\n\nsub to_string {\n  my ($self, $element_separator) = @_;\n  return 0 if $self->{'_max_size'} == 0;\n  $element_separator ||= 1;\n  if ($element_separator == 1) {\n    $element_separator = ' ';\n  } elsif ($element_separator == 2) {\n    $element_separator = \"\\t\";\n  } elsif ($element_separator == 3) {\n    $element_separator = \"\\n\";\n  } else {\n    $self->throw(\"Unknown separator type '$element_separator'\\n\");\n  }\n  my $str = '';\n  for (my $q = 1 ; $q <= $self->{'_max_size'} ; $q++) {\n    my $val = 0;\n    if (exists $self->{'_spectrum'}{$q}) {\n      $val = $self->{'_spectrum'}{$q};\n    }\n    $str .= $val.$element_separator;\n  }\n  $str =~ s/\\s$//;\n  return $str;\n}\n\n\n=head2 add\n\n  Title   : add\n  Usage   : $csp->add($additional_csp);\n  Function: Add a contig spectrum to an existing one: sums the spectra, update\n            the number of sequences, number of repetitions, ...\n  Returns : 1 for success, 0 for failure\n  Args    : Bio::Assembly::Tools::ContigSpectrum 
object"}},{"containerName":null,"kind":13,"name":"%csp","line":733},{"containerName":null,"kind":13,"name":"%csp","line":734},{"children":[{"definition":"my","localvar":"my","kind":13,"containerName":"average","name":"$self","line":752},{"containerName":"average","kind":13,"name":"$list","line":752},{"containerName":"average","kind":13,"name":"$list","line":754},{"name":"$list","kind":13,"containerName":"average","line":754},{"containerName":"average","kind":13,"name":"$self","line":755},{"containerName":"average","kind":12,"name":"throw","line":755},{"line":755,"name":"$list","containerName":"average","kind":13},{"definition":"my","name":"$avg","kind":13,"localvar":"my","containerName":"average","line":758},{"kind":12,"containerName":"average","name":"new","line":758},{"name":"$avg","containerName":"average","kind":13,"line":759},{"definition":"my","name":"$tot_nof_rep","localvar":"my","kind":13,"containerName":"average","line":762},{"kind":13,"localvar":"my","containerName":"average","name":"$csp","line":763,"definition":"my"},{"containerName":"average","kind":13,"name":"$list","line":763},{"line":765,"kind":13,"containerName":"average","name":"$csp"},{"kind":12,"containerName":"average","name":"isa","line":765},{"kind":13,"containerName":"average","name":"$csp","line":766},{"line":766,"kind":12,"containerName":"average","name":"throw"},{"name":"$csp","kind":13,"containerName":"average","line":767},{"name":"$avg","kind":13,"containerName":"average","line":770},{"name":"add","containerName":"average","kind":12,"line":770},{"line":770,"containerName":"average","kind":13,"name":"$csp"},{"name":"$q","kind":13,"localvar":"my","containerName":"average","line":774,"definition":"my"},{"line":774,"name":"$q","containerName":"average","kind":13},{"line":774,"name":"$avg","kind":13,"containerName":"average"},{"containerName":"average","kind":13,"name":"$q","line":774},{"line":775,"containerName":"average","kind":13,"name":"$avg"},{"name":"$q","kind":13,"containerName":"avera
ge","line":775},{"line":775,"kind":13,"containerName":"average","name":"$avg"},{"containerName":"average","kind":13,"name":"$avg","line":776},{"line":776,"containerName":"average","kind":13,"name":"$q"},{"line":779,"containerName":"average","kind":13,"name":"$avg"},{"line":779,"kind":13,"containerName":"average","name":"$avg"},{"line":781,"kind":13,"containerName":"average","name":"$avg"},{"line":781,"name":"$avg","kind":13,"containerName":"average"},{"containerName":"average","kind":13,"name":"$avg","line":783}],"containerName":"main::","name":"average","definition":"sub","detail":"($self,$list)","line":751,"range":{"start":{"line":751,"character":0},"end":{"line":784,"character":9999}},"kind":12,"signature":{"label":"average($self,$list)","parameters":[{"label":"$self"},{"label":"$list"}],"documentation":"__END__\n#\n# BioPerl module for Bio::Assembly::Tools::ContigSpectrum\n#\n# Copyright by Florent Angly\n#\n# You may distribute this module under the same terms as Perl itself\n#\n# POD documentation - main docs before the code\n\n=head1 NAME\n\nBio::Assembly::Tools::ContigSpectrum - create and manipulate contig spectra\n\n=head1 SYNOPSIS\n\n  # Simple contig spectrum creation\n  my $csp1 = Bio::Assembly::Tools::ContigSpectrum->new(\n    -id       => 'csp1',\n    -spectrum => { 1 => 10,\n                   2 => 2,\n                   3 => 1 } );\n\n  # ...or another way to create a simple contig spectrum\n  my $csp2 = Bio::Assembly::Tools::ContigSpectrum->new;\n  $csp2->id('csp2');\n  $csp2->spectrum({ 1 => 20, 2 => 1, 4 => 1 });\n\n  # Get some information\n  print \"This is contig spectrum \".$csp->id.\"\\n\";\n  print \"It contains \".$csp->nof_seq.\" sequences\\n\";\n  print \"The largest contig has \".$csp->max_size.\" sequences\\n\";\n  print \"The spectrum is: \".$csp->to_string($csp->spectrum).\"\\n\";\n\n  # Let's add the contig spectra\n  my $summed_csp = Bio::Assembly::Tools::ContigSpectrum->new;\n  $summed_csp->add($csp1);\n  
$summed_csp->add($csp2);\n  print \"The summed contig spectrum is \".$summed_csp->to_string.\"\\n\";\n\n  # Make an average\n  my $avg_csp = Bio::Assembly::Tools::ContigSpectrum->new;\n  $avg_csp = $avg_csp->average([$csp1, $csp2]);\n  print \"The average contig spectrum is \".$avg_csp->to_string.\"\\n\";\n\n  # Get a contig spectrum from an assembly\n  my $from_assembly = Bio::Assembly::Tools::ContigSpectrum->new(\n    -assembly       => $assembly_object,\n    -eff_asm_params => 1);\n  print \"The contig spectrum from assembly is \".$from_assembly->to_string.\"\\n\";\n\n  # Report advanced information (possible because eff_asm_params = 1)\n  print \"Average sequence length: \".$from_assembly->avg_seq_length.\" bp\\n\";\n  print \"Minimum overlap length: \".$from_assembly->min_overlap.\" bp\\n\";\n  print \"Average overlap length: \".$from_assembly->avg_overlap.\" bp\\n\";\n  print \"Minimum overlap match: \".$from_assembly->min_identity.\" %\\n\";\n  print \"Average overlap match: \".$from_assembly->avg_identity.\" %\\n\";\n\n  # Assuming the assembly object contains sequences from several different\n  # metagenomes, we have a mixed contig spectrum from which a cross contig\n  # spectrum and dissolved contig spectra can be obtained\n  my $mixed_csp = $from_assembly;\n\n  # Calculate a dissolved contig spectrum\n  my $meta1_dissolved = Bio::Assembly::Tools::ContigSpectrum->new(\n    -dissolve => [$mixed_csp, 'metagenome1'] );\n  my $meta2_dissolved = Bio::Assembly::Tools::ContigSpectrum->new(\n    -dissolve => [$mixed_csp, 'metagenome2'] );\n  print \"The dissolved contig spectra are:\\n\".\n    $meta1_dissolved->to_string.\"\\n\".\n    $meta2_dissolved->to_string.\"\\n\";\n\n  # Determine a cross contig spectrum\n  my $cross_csp = Bio::Assembly::Tools::ContigSpectrum->new(\n    -cross => $mixed_csp );\n  print \"The cross contig spectrum is \".$cross_csp->to_string.\"\\n\";\n\n  # Score a contig spectrum (the more abundant the contigs and the larger their\n  # 
size, the larger the score)\n\n\n=head1 DESCRIPTION\n\nThe Bio::Assembly::Tools::ContigSpectrum Perl module enables to\nmanually create contig spectra, import them from assemblies,\nmanipulate them, transform between different types of contig spectra\nand output them.\n\nBio::Assembly::Tools::ContigSpectrum is a module to create, manipulate\nand output contig spectra, assembly-derived data used in metagenomics\n(community genomics) for diversity estimation.\n\n=head2 Background\n\nA contig spectrum is the count of the number of contigs of different\nsize in an assembly. For example, the contig spectrum [100 5 1 0 0\n...] means that there were 100 singlets (1-contigs), 5 contigs of 2\nsequences (2-contigs), 1 contig of 3 sequences (3-contig) and no\nlarger contigs.\n\nAn assembly can be produced from a mixture of sequences from different\nmetagenomes. The contig obtained from this assembly is a mixed contig\nspectrum. The contribution of each metagenome in this mixed contig\nspectrum can be obtained by determining a dissolved contig spectrum.\n\nFinally, based on a mixed contig spectrum, a cross contig spectrum can\nbe determined. In a cross contig spectrum, only contigs containing\nsequences from different metagenomes are kept; \"pure\" contigs are\nexcluded. Additionally, the total number of singletons (1-contigs)\nfrom each region that assembles with any fragments from other regions\nis the number of 1-contigs in the cross contig spectrum.\n\n=head2 Implementation\n\nThe simplest representation of a contig spectrum is as a hash\nrepresentation where the key is the contig size (number of sequences\nmaking up the contig) and the value the number of contigs of this\nsize.\n\nIn fact, it is useful to have more information associated with the\ncontig spectrum, hence the Bio::Assembly::Tools::ContigSpectrum module\nimplements an object containing a contig spectrum hash and additional\ninformation. 
The get/set methods to access them are:\n\n    id              contig spectrum ID\n    nof_seq         number of sequences\n    nof_rep         number of repetitions (assemblies) used\n    max_size        size of (number of sequences in) the largest contig\n    nof_overlaps    number of overlaps\n    min_overlap     minimum overlap length for building a contig\n    min_identity    minimum sequence identity over the overlap length\n    avg_overlap     average overlap length\n    avg_identity    average overlap identity\n    avg_seq_length  average sequence length\n    eff_asm_params  effective assembly parameters\n    spectrum        hash representation of a contig spectrum\n\n  Operations on the contig spectra:\n\n    to_string       create a string representation of the spectrum\n    spectrum        import a hash contig spectrum\n    assembly        determine a contig spectrum from an assembly\n    dissolve        calculate a dissolved contig spectrum (based on assembly)\n    cross           produce a cross contig spectrum (based on assembly)\n    add             add a contig spectrum to an existing one\n    average         make an average of several contig spectra\n\nWhen using operations that rely on knowing \"where\" (from what\nmetagenomes) a sequence came from (i.e. when creating a dissolved or\ncross contig spectrum), make sure that the sequences used for the\nassembly have a name header, e.g.  E<gt>metagenome1|seq1,\nE<gt>metagenome2|seq1, ...\n\n=head1 FEEDBACK\n\n=head2 Mailing Lists\n\nUser feedback is an integral part of the evolution of this and other\nBioperl modules. 
Send your comments and suggestions preferably to the\nBioperl mailing lists. Your participation is much appreciated.\n\n  bioperl-l@bioperl.org                  - General discussion\n  http://bioperl.org/wiki/Mailing_lists  - About the mailing lists\n\n\n=head2 Support \n\nPlease direct usage questions or support issues to the mailing list:\n\nI<bioperl-l@bioperl.org>\n\nrather than to the module maintainer directly. Many experienced and \nresponsive experts will be able to look at the problem and quickly \naddress it. Please include a thorough description of the problem \nwith code and data examples if at all possible.\n\n=head2 Reporting Bugs\n\nReport bugs to the BioPerl bug tracking system to help us keep track\nof the bugs and their resolution. Bug reports can be submitted via email\nor the web:\n\n  bioperl-bugs@bio.perl.org\n  http://bugzilla.bioperl.org/\n\n=head1 AUTHOR - Florent E Angly\n\nEmail florent_dot_angly_at_gmail_dot_com\n\n=head1 APPENDIX\n\nThe rest of the documentation details each of the object\nmethods. 
Internal methods are usually preceded with a \"_\".\n\n\npackage Bio::Assembly::Tools::ContigSpectrum;\n\nuse strict;\n\nuse Bio::Root::Root;\nuse Bio::Assembly::Scaffold;\nuse Bio::SimpleAlign;\nuse Bio::LocatableSeq;\nuse Bio::Align::PairwiseStatistics;\n\nuse base 'Bio::Root::Root';\n\n\n=head2 new\n\n  Title   : new\n  Usage   : my $csp = Bio::Assembly::Tools::ContigSpectrum->new();\n              or\n            my $csp = Bio::Assembly::Tools::ContigSpectrum->new(\n              -id => 'some_name',\n              -spectrum =>  { 1 => 90 , 2 => 3 , 4 => 1 },\n            );\n              or\n            my $csp = Bio::Assembly::Tools::ContigSpectrum->new(\n              -assembly =>  $assembly_obj\n            );\n  Function: create a new contig spectrum object\n  Returns : reference to a contig spectrum object\n  Args    : none\n\n\nsub new {\n  my ($class, @args) = @_;\n  my $self = $class->SUPER::new(@args);\n  my ( $id, $nof_seq, $nof_rep, $max_size, $nof_overlaps, $min_overlap,\n    $min_identity, $avg_overlap, $avg_identity, $avg_seq_len, $spectrum,\n    $assembly, $eff_asm_params, $dissolve, $cross) = $self->_rearrange( [qw(ID\n    NOF_SEQ NOF_REP MAX_SIZE NOF_OVERLAPS MIN_OVERLAP MIN_IDENTITY AVG_OVERLAP\n    AVG_IDENTITY AVG_SEQ_LEN SPECTRUM ASSEMBLY EFF_ASM_PARAMS DISSOLVE CROSS)],\n    @args );\n\n  # First set up some defauts\n  $self->{'_id'}             = 'NoName';\n  $self->{'_nof_seq'}        = 0;\n  $self->{'_nof_rep'}        = 0;\n  $self->{'_max_size'}       = 0;\n  $self->{'_nof_overlaps'}   = 0;\n  $self->{'_min_overlap'}    = undef;\n  $self->{'_min_identity'}   = undef;\n  $self->{'_avg_overlap'}    = 0;\n  $self->{'_avg_identity'}   = 0;\n  $self->{'_avg_seq_len'}    = 0;\n  $self->{'_eff_asm_params'} = 0;\n  $self->{'_spectrum'}       = {1 => 0};  # contig spectrum hash representation\n  $self->{'_assembly'}       = []; # list of assembly objects used\n\n  # Then, according to user desires, override defaults\n  $self->{'_id'}           
  = $id             if (defined $id);\n  $self->{'_nof_seq'}        = $nof_seq        if (defined $nof_seq);\n  $self->{'_nof_rep'}        = $nof_rep        if (defined $nof_rep);\n  $self->{'_max_size'}       = $max_size       if (defined $max_size);\n  $self->{'_nof_overlaps'}   = $nof_overlaps   if (defined $nof_overlaps);\n  $self->{'_min_overlap'}    = $min_overlap    if (defined $min_overlap);\n  $self->{'_avg_overlap'}    = $avg_overlap    if (defined $avg_overlap);\n  $self->{'_min_identity'}   = $min_identity   if (defined $min_identity);\n  $self->{'_avg_identity'}   = $avg_identity   if (defined $avg_identity);\n  $self->{'_avg_seq_len'}    = $avg_seq_len    if (defined $avg_seq_len);\n  $self->{'_eff_asm_params'} = $eff_asm_params if (defined $eff_asm_params);\n\n  # Finally get stuff that can be gotten in an automated way\n  $self->_import_spectrum($spectrum) if defined($spectrum);\n  $self->_import_assembly($assembly) if defined($assembly);\n  if (defined($dissolve)) {\n    my ($mixed_csp, $header) = (@$dissolve[0], @$dissolve[1]);\n    $self->_import_dissolved_csp($mixed_csp, $header);\n  }\n  $self->_import_cross_csp($cross)   if defined($cross);\n\n  return $self;\n}\n\n\n=head2 id\n\n  Title   : id\n  Usage   : $csp->id\n  Function: get/set contig spectrum id\n  Returns : string\n  Args    : string [optional]\n\n\nsub id {\n  my ($self, $id) = @_;\n  if (defined $id) {\n    $self->{'_id'} = $id;\n  }\n  $id = $self->{'_id'};\n  return $id;\n}\n\n\n=head2 nof_seq\n\n  Title   : nof_seq\n  Usage   : $csp->nof_seq\n  Function: get/set the number of sequences making up the contig spectrum\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub nof_seq {\n  my ($self, $nof_seq) = @_;\n  if (defined $nof_seq) {\n    $self->throw(\"The number of sequences must be strictly positive. 
Got \".\n      \"'$nof_seq'\") if $nof_seq < 1;\n    $self->{'_nof_seq'} = $nof_seq;\n  }\n  $nof_seq = $self->{'_nof_seq'};\n  return $nof_seq;\n}\n\n\n=head2 nof_rep\n\n  Title   : nof_rep\n  Usage   : $csp->nof_rep\n  Function: Get/Set the number of repetitions (assemblies) used to create the \n            contig spectrum\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub nof_rep {\n  my ($self, $nof_rep) = @_;\n  if (defined $nof_rep) {\n    $self->throw(\"The number of repetitions must be strictly positive. Got \".\n      \"'$nof_rep'\") if $nof_rep < 1;\n    $self->{'_nof_rep'} = $nof_rep;\n  }\n  $nof_rep = $self->{'_nof_rep'};\n  return $nof_rep;\n}\n\n\n=head2 max_size\n\n  Title   : max_size\n  Usage   : $csp->max_size\n  Function: get/set the size of (number of sequences in) the largest contig\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub max_size {\n  my ($self, $max_size) = @_;\n  if (defined $max_size) {\n    $self->throw(\"The contig maximum size must be strictly positive. Got \".\n      \"'$max_size'\") if $max_size < 1;\n    $self->{'_max_size'} = $max_size;\n  }\n  $max_size = $self->{'_max_size'};\n  return $max_size;\n}\n\n\n=head2 nof_overlaps\n\n  Title   : nof_overlaps\n  Usage   : $csp->nof_overlaps\n  Function: Get/Set the number of overlaps in the assembly.\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub nof_overlaps {\n  my ($self, $nof_overlaps) = @_;\n  if (defined $nof_overlaps) {\n    $self->throw(\"The number of overlaps must be strictly positive. 
Got \".\n      \"'$nof_overlaps'\") if $nof_overlaps < 1;\n    $self->{'_nof_overlaps'} = $nof_overlaps;\n  }\n  $nof_overlaps = $self->{'_nof_overlaps'};\n  return $nof_overlaps;\n}\n\n\n=head2 min_overlap\n\n  Title   : min_overlap\n  Usage   : $csp->min_overlap\n  Function: get/set the assembly minimum overlap length\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub min_overlap {\n  my ($self, $min_overlap) = @_;\n  if (defined $min_overlap) {\n    $self->throw(\"The minimum of overlap length must be strictly positive. Got\".\n      \" '$min_overlap'\") if $min_overlap < 1;\n    $self->{'_min_overlap'} = $min_overlap;\n  }\n  $min_overlap = $self->{'_min_overlap'};\n  return $min_overlap;\n}\n\n\n=head2 avg_overlap\n\n  Title   : avg_overlap\n  Usage   : $csp->avg_overlap\n  Function: get/set the assembly average overlap length\n  Returns : decimal\n  Args    : decimal [optional]\n\n\nsub avg_overlap {\n  my ($self, $avg_overlap) = @_;\n  if (defined $avg_overlap) {\n    $self->throw(\"The average overlap length must be strictly positive. Got \".\n      \"'$avg_overlap'\") if $avg_overlap < 1;\n    $self->{'_avg_overlap'} = $avg_overlap;\n  }\n  $avg_overlap = $self->{'_avg_overlap'};\n  return $avg_overlap;\n}\n\n\n=head2 min_identity\n\n  Title   : min_identity\n  Usage   : $csp->min_identity\n  Function: get/set the assembly minimum overlap identity percent\n  Returns : 0 < decimal < 100\n  Args    : 0 < decimal < 100 [optional]\n\n\nsub min_identity {\n  my ($self, $min_identity) = @_;\n  if (defined $min_identity) {\n    $self->throw(\"The minimum overlap percent identity must be strictly \".\n      \"positive. 
Got '$min_identity'\") if $min_identity < 1;\n    $self->{'_min_identity'} = $min_identity;\n  }\n  $min_identity = $self->{'_min_identity'};\n  return $min_identity;\n}\n\n\n=head2 avg_identity\n\n  Title   : avg_identity\n  Usage   : $csp->avg_identity\n  Function: get/set the assembly average overlap identity percent\n  Returns : 0 < decimal < 100\n  Args    : 0 < decimal < 100 [optional]\n\n\nsub avg_identity {\n  my ($self, $avg_identity) = @_;\n  if (defined $avg_identity) {\n    $self->throw(\"The average overlap percent identity must be strictly \".\n      \"positive. Got '$avg_identity'\") if $avg_identity < 1;\n    $self->{'_avg_identity'} = $avg_identity;\n  }\n  $avg_identity = $self->{'_avg_identity'};\n  return $avg_identity;\n}\n\n\n=head2 avg_seq_len\n\n  Title   : avg_seq_len\n  Usage   : $csp->avg_seq_len\n  Function: get/set the assembly average sequence length\n  Returns : avg_seq_len\n  Args    : real [optional]\n\n\nsub avg_seq_len {\n  my ($self, $avg_seq_len) = @_;\n  if (defined $avg_seq_len) {\n    $self->throw(\"The average sequence length must be strictly positive. Got \".\n      \"'$avg_seq_len'\") if $avg_seq_len < 1;\n    $self->{'_avg_seq_len'} = $avg_seq_len;\n  }\n  $avg_seq_len = $self->{'_avg_seq_len'};\n  return $avg_seq_len;\n}\n\n\n=head2 eff_asm_params\n\n  Title   : eff_asm_params\n  Usage   : $csp->eff_asm_params(1)\n  Function: Get/set the effective assembly parameters option. It defines if the\n            effective assembly parameters should be determined when a contig\n            spectrum based or derived from an assembly is calulated. 
The\n            effective assembly parameters include avg_seq_length, nof_overlaps,\n            min_overlap, avg_overlap, min_identity and avg_identity.\n            1 = get them, 0 = don't.\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub eff_asm_params {\n  my ($self, $eff_asm_params) = @_;\n  if (defined $eff_asm_params) {\n    $self->throw(\"eff_asm_params can only take values 0 or 1. Input value was \".\n      \"'$eff_asm_params'\") unless $eff_asm_params == 0 || $eff_asm_params == 1;\n    $self->{'_eff_asm_params'} = $eff_asm_params;\n  }\n  $eff_asm_params = $self->{'_eff_asm_params'};\n  return $eff_asm_params;\n}\n\n\n=head2 spectrum\n\n  Title   : spectrum\n  Usage   : my $spectrum = $csp->spectrum({1=>10, 2=>2, 3=>1});\n  Function: Get the current contig spectrum represented as a hash / Update a\n            contig spectrum object based on a contig spectrum represented as a\n            hash\n            The hash representation of a contig spectrum is as following:\n              key   -> contig size (in number of sequences)\n              value -> number of contigs of this size\n  Returns : contig spectrum as a hash reference\n  Args    : contig spectrum as a hash reference [optional]\n\n\nsub spectrum {\n  my ($self, $spectrum) = @_;\n  if (defined $spectrum) {\n    $self->_import_spectrum($spectrum);\n  }\n  $spectrum = $self->{'_spectrum'};\n  return $spectrum;\n}\n\n\n=head2 assembly\n\n  Title   : assembly\n  Usage   : my @asm_list = $csp->assembly();\n  Function: Get a reference to the list of assembly object reference used to\n            make the contig spectrum object / Update the contig spectrum object\n            based on an assembly object.\n  Returns : array of Bio::Assembly::Scaffold\n  Args    : Bio::Assembly::Scaffold\n\n\nsub assembly {\n  my ($self, $assembly) = @_;\n  if (defined $assembly) {\n    $self->_import_assembly($assembly);\n  }\n  my @asm_list = @{$self->{'_assembly'}} if defined $self->{'_assembly'};\n  
return \\@asm_list;\n}\n\n=head2 drop_assembly\n\n  Title   : drop_assembly\n  Usage   : $csp->drop_assembly();\n  Function: Remove all assembly objects associated with a contig spectrum.\n            Assembly objects can be big. This method allows to free some memory\n            when assembly information is not needed anymore.\n  Returns : 1 for success, 0 for failure\n  Args    : none\n\n\nsub drop_assembly {\n  my ($self) = @_;\n  $self->{'_assembly'} = [];\n  return 1;\n}\n\n=head2 dissolve\n\n  Title   : dissolve\n  Usage   : $dissolved_csp->dissolve($mixed_csp, $seq_header);\n  Function: Dissolve a mixed contig spectrum for the set of sequences that\n            contain the specified header, i.e. determine the contribution of\n            these sequences to the mixed contig spectrum based on the assembly.\n            The mixed contig spectrum object must have been created based on one\n            (or several) assembly object(s). Additionally, min_overlap and\n            min_identity must have been set (either manually using min_overlap\n            or automatically by switching on the eff_asm_params option).\n  Returns : 1 for success, 0 for failure\n  Args    : Bio::Assembly::Tools::ContigSpectrum reference\n            sequence header string\n\n\n\nsub dissolve {\n  my ($self, $mixed_csp, $seq_header) = @_;\n  $self->_import_dissolved_csp($mixed_csp, $seq_header);\n  return 1;\n}\n\n\n=head2 cross\n\n  Title   : cross\n  Usage   : $cross_csp->cross($mixed_csp);\n  Function: Calculate a cross contig_spectrum based on a mixed contig_spectrum.\n  Returns : 1 for success, 0 for failure\n  Args    : Bio::Assembly::Tools::ContigSpectrum reference\n\n\nsub cross {\n  my ($self, $mixed_csp) = @_;\n  $self->_import_cross_csp($mixed_csp);\n  return 1;\n}\n\n=head2 to_string\n\n  Title   : to_string\n  Usage   : my $csp_string = $csp->to_string;\n  Function: Convert the contig spectrum into a string (easy to print!!).\n  Returns : string\n  Args    : element 
separator (integer) [optional]\n              1 -> space-separated\n              2 -> tab-separated\n              3 -> newline-separated\n\n\nsub to_string {\n  my ($self, $element_separator) = @_;\n  return 0 if $self->{'_max_size'} == 0;\n  $element_separator ||= 1;\n  if ($element_separator == 1) {\n    $element_separator = ' ';\n  } elsif ($element_separator == 2) {\n    $element_separator = \"\\t\";\n  } elsif ($element_separator == 3) {\n    $element_separator = \"\\n\";\n  } else {\n    $self->throw(\"Unknown separator type '$element_separator'\\n\");\n  }\n  my $str = '';\n  for (my $q = 1 ; $q <= $self->{'_max_size'} ; $q++) {\n    my $val = 0;\n    if (exists $self->{'_spectrum'}{$q}) {\n      $val = $self->{'_spectrum'}{$q};\n    }\n    $str .= $val.$element_separator;\n  }\n  $str =~ s/\\s$//;\n  return $str;\n}\n\n\n=head2 add\n\n  Title   : add\n  Usage   : $csp->add($additional_csp);\n  Function: Add a contig spectrum to an existing one: sums the spectra, update\n            the number of sequences, number of repetitions, ...\n  Returns : 1 for success, 0 for failure\n  Args    : Bio::Assembly::Tools::ContigSpectrum object\n\n\nsub add {\n  my ($self, $csp) = @_;\n  # Sanity check\n  if( !ref $csp || ! $csp->isa('Bio::Assembly::Tools::ContigSpectrum') ) {\n        $self->throw(\"Unable to process non Bio::Assembly::Tools::ContigSpectrum \".\n        \"object [\".ref($csp).\"]\");\n  }\n  # Update overlap statistics\n  if ( $self->{'_eff_asm_params'} > 0 ) {\n    # Warnings\n    if ( $csp->{'_eff_asm_params'} == 0 ) {\n      $self->warn(\"The parent contig spectrum needs effective assembly \".\n        \"parameters (eff_asm_params = \".$self->{'_eff_asm_params'}.\") but the \".\n        \"child contig spectrum doesn't have them (eff_asm_params = \".\n        $csp->{'_eff_asm_params'}.\"). 
Skipping them...\");\n    } elsif ( $csp->{'_eff_asm_params'} != $self->{'_eff_asm_params'} ) {\n      $self->warn(\"The parent contig spectrum needs a different method for \".\n        \"detecting the effective assembly parameters (eff_asm_params = \".\n        $self->{'_eff_asm_params'}.\") than the one specified for the child \".\n        \"contig spectrum (eff_asm_params = \".$csp->{'_eff_asm_params'}.\"). \".\n        \"Ignoring the differences...\");\n    }\n    # Update existing stats\n    my $tot_num_overlaps = $csp->{'_nof_overlaps'} + $self->{'_nof_overlaps'};\n    $self->{'_min_overlap'} = $csp->{'_min_overlap'} if\n      defined $csp->{'_min_overlap'} && ( ! defined $self->{'_min_overlap'} ||\n      $csp->{'_min_overlap'} < $self->{'_min_overlap'} );\n    $self->{'_min_identity'} = $csp->{'_min_identity'} if\n      defined $csp->{'_min_identity'} && ( ! defined $self->{'_min_identity'} ||\n      $csp->{'_min_identity'} < $self->{'_min_identity'} );\n    if ($tot_num_overlaps != 0) {\n      $self->{'_avg_overlap'} =\n        ($csp->{'_avg_overlap'} * $csp->{'_nof_overlaps'}\n        + $self->{'_avg_overlap'} * $self->{'_nof_overlaps'})\n        / $tot_num_overlaps;\n      $self->{'_avg_identity'} =\n        ($csp->{'_avg_identity'} * $csp->{'_nof_overlaps'}\n        + $self->{'_avg_identity'} * $self->{'_nof_overlaps'})\n        / $tot_num_overlaps;\n    }\n    $self->{'_nof_overlaps'} = $tot_num_overlaps;\n  }\n  # Update sequence statistics\n  my $tot_nof_seq = $csp->{'_nof_seq'} + $self->{'_nof_seq'};\n  if (not $tot_nof_seq == 0) {\n    $self->{'_avg_seq_len'} = ($csp->{'_avg_seq_len'} * $csp->{'_nof_seq'} +\n      $self->{'_avg_seq_len'} * $self->{'_nof_seq'}) / $tot_nof_seq;\n  }\n  # Update spectrum (and nof_seq, max_size, and increment nof_rep by 1)\n  $self->_import_spectrum($csp->{'_spectrum'});\n  # Update nof_rep\n  $self->{'_nof_rep'}--;\n  $self->{'_nof_rep'} += $csp->{'_nof_rep'};\n  # Update list of assembly objects used\n  push 
@{$self->{'_assembly'}}, @{$csp->{'_assembly'}}\n    if defined $csp->{'_assembly'};\n  return 1;\n}\n\n\n=head2 average\n\n  Title   : average\n  Usage   : my $avg_csp = $csp->average([$csp1, $csp2, $csp3]);\n  Function: Average one contig spectrum or the sum of several contig spectra by\n            the number of repetitions\n  Returns : Bio::Assembly::Tools::ContigSpectrum\n  Args    : Bio::Assembly::Tools::ContigSpectrum array reference\n            eff_asm_params"}},{"name":"Bio","kind":12,"containerName":"Assembly::Tools::ContigSpectrum","line":758},{"children":[{"name":"$self","localvar":"my","containerName":"score","kind":13,"line":808,"definition":"my"},{"kind":13,"containerName":"score","name":"$nof_seqs","line":808},{"definition":"my","line":810,"name":"$score","kind":13,"localvar":"my","containerName":"score"},{"line":811,"name":"$n","localvar":"my","kind":13,"containerName":"score","definition":"my"},{"name":"$self","kind":13,"containerName":"score","line":811},{"line":811,"name":"nof_seq","kind":12,"containerName":"score"},{"kind":13,"containerName":"score","name":"$n","line":812},{"line":814,"name":"$q_max","localvar":"my","containerName":"score","kind":13,"definition":"my"},{"kind":13,"containerName":"score","name":"$self","line":814},{"line":814,"name":"max_size","kind":12,"containerName":"score"},{"containerName":"score","localvar":"my","kind":13,"name":"$spec","line":815,"definition":"my"},{"line":815,"name":"$self","containerName":"score","kind":13},{"line":815,"name":"spectrum","kind":12,"containerName":"score"},{"name":"$nof_seqs","kind":13,"containerName":"score","line":817},{"containerName":"score","kind":13,"name":"$spec","line":818},{"name":"$nof_seqs","containerName":"score","kind":13,"line":818},{"line":818,"name":"$n","kind":13,"containerName":"score"},{"containerName":"score","kind":13,"name":"$n","line":819},{"line":819,"name":"$nof_seqs","containerName":"score","kind":13},{"definition":"my","line":822,"containerName":"score","localvar
":"my","kind":13,"name":"$q"},{"kind":13,"containerName":"score","name":"$q_max","line":822},{"line":823,"kind":13,"containerName":"score","name":"$spec"},{"line":823,"name":"$q","kind":13,"containerName":"score"},{"name":"$c_q","containerName":"score","localvar":"my","kind":13,"line":824,"definition":"my"},{"kind":13,"containerName":"score","name":"$spec","line":824},{"containerName":"score","kind":13,"name":"$q","line":824},{"line":825,"name":"$score","kind":13,"containerName":"score"},{"containerName":"score","kind":13,"name":"$c_q","line":825},{"kind":13,"containerName":"score","name":"$q","line":825},{"line":828,"containerName":"score","kind":13,"name":"$score"},{"line":828,"containerName":"score","kind":13,"name":"$n"},{"line":831,"kind":13,"containerName":"score","name":"$score"},{"name":"$n","kind":13,"containerName":"score","line":831},{"name":"$n","containerName":"score","kind":13,"line":831},{"line":831,"containerName":"score","kind":13,"name":"$score"},{"name":"$n","containerName":"score","kind":13,"line":831},{"containerName":"score","kind":13,"name":"$score","line":832}],"containerName":"main::","name":"score","definition":"sub","detail":"($self,$nof_seqs)","line":807,"range":{"end":{"character":9999,"line":833},"start":{"character":0,"line":807}},"kind":12,"signature":{"parameters":[{"label":"$self"},{"label":"$nof_seqs"}],"documentation":"__END__\n#\n# BioPerl module for Bio::Assembly::Tools::ContigSpectrum\n#\n# Copyright by Florent Angly\n#\n# You may distribute this module under the same terms as Perl itself\n#\n# POD documentation - main docs before the code\n\n=head1 NAME\n\nBio::Assembly::Tools::ContigSpectrum - create and manipulate contig spectra\n\n=head1 SYNOPSIS\n\n  # Simple contig spectrum creation\n  my $csp1 = Bio::Assembly::Tools::ContigSpectrum->new(\n    -id       => 'csp1',\n    -spectrum => { 1 => 10,\n                   2 => 2,\n                   3 => 1 } );\n\n  # ...or another way to create a simple contig spectrum\n  my 
$csp2 = Bio::Assembly::Tools::ContigSpectrum->new;\n  $csp2->id('csp2');\n  $csp2->spectrum({ 1 => 20, 2 => 1, 4 => 1 });\n\n  # Get some information\n  print \"This is contig spectrum \".$csp->id.\"\\n\";\n  print \"It contains \".$csp->nof_seq.\" sequences\\n\";\n  print \"The largest contig has \".$csp->max_size.\" sequences\\n\";\n  print \"The spectrum is: \".$csp->to_string($csp->spectrum).\"\\n\";\n\n  # Let's add the contig spectra\n  my $summed_csp = Bio::Assembly::Tools::ContigSpectrum->new;\n  $summed_csp->add($csp1);\n  $summed_csp->add($csp2);\n  print \"The summed contig spectrum is \".$summed_csp->to_string.\"\\n\";\n\n  # Make an average\n  my $avg_csp = Bio::Assembly::Tools::ContigSpectrum->new;\n  $avg_csp = $avg_csp->average([$csp1, $csp2]);\n  print \"The average contig spectrum is \".$avg_csp->to_string.\"\\n\";\n\n  # Get a contig spectrum from an assembly\n  my $from_assembly = Bio::Assembly::Tools::ContigSpectrum->new(\n    -assembly       => $assembly_object,\n    -eff_asm_params => 1);\n  print \"The contig spectrum from assembly is \".$from_assembly->to_string.\"\\n\";\n\n  # Report advanced information (possible because eff_asm_params = 1)\n  print \"Average sequence length: \".$from_assembly->avg_seq_length.\" bp\\n\";\n  print \"Minimum overlap length: \".$from_assembly->min_overlap.\" bp\\n\";\n  print \"Average overlap length: \".$from_assembly->avg_overlap.\" bp\\n\";\n  print \"Minimum overlap match: \".$from_assembly->min_identity.\" %\\n\";\n  print \"Average overlap match: \".$from_assembly->avg_identity.\" %\\n\";\n\n  # Assuming the assembly object contains sequences from several different\n  # metagenomes, we have a mixed contig spectrum from which a cross contig\n  # spectrum and dissolved contig spectra can be obtained\n  my $mixed_csp = $from_assembly;\n\n  # Calculate a dissolved contig spectrum\n  my $meta1_dissolved = Bio::Assembly::Tools::ContigSpectrum->new(\n    -dissolve => [$mixed_csp, 'metagenome1'] );\n  my 
$meta2_dissolved = Bio::Assembly::Tools::ContigSpectrum->new(\n    -dissolve => [$mixed_csp, 'metagenome2'] );\n  print \"The dissolved contig spectra are:\\n\".\n    $meta1_dissolved->to_string.\"\\n\".\n    $meta2_dissolved->to_string.\"\\n\";\n\n  # Determine a cross contig spectrum\n  my $cross_csp = Bio::Assembly::Tools::ContigSpectrum->new(\n    -cross => $mixed_csp );\n  print \"The cross contig spectrum is \".$cross_csp->to_string.\"\\n\";\n\n  # Score a contig spectrum (the more abundant the contigs and the larger their\n  # size, the larger the score)\n\n\n=head1 DESCRIPTION\n\nThe Bio::Assembly::Tools::ContigSpectrum Perl module enables to\nmanually create contig spectra, import them from assemblies,\nmanipulate them, transform between different types of contig spectra\nand output them.\n\nBio::Assembly::Tools::ContigSpectrum is a module to create, manipulate\nand output contig spectra, assembly-derived data used in metagenomics\n(community genomics) for diversity estimation.\n\n=head2 Background\n\nA contig spectrum is the count of the number of contigs of different\nsize in an assembly. For example, the contig spectrum [100 5 1 0 0\n...] means that there were 100 singlets (1-contigs), 5 contigs of 2\nsequences (2-contigs), 1 contig of 3 sequences (3-contig) and no\nlarger contigs.\n\nAn assembly can be produced from a mixture of sequences from different\nmetagenomes. The contig obtained from this assembly is a mixed contig\nspectrum. The contribution of each metagenome in this mixed contig\nspectrum can be obtained by determining a dissolved contig spectrum.\n\nFinally, based on a mixed contig spectrum, a cross contig spectrum can\nbe determined. In a cross contig spectrum, only contigs containing\nsequences from different metagenomes are kept; \"pure\" contigs are\nexcluded. 
Additionally, the total number of singletons (1-contigs)\nfrom each region that assemble with any fragments from other regions\nis the number of 1-contigs in the cross contig spectrum.\n\n=head2 Implementation\n\nThe simplest representation of a contig spectrum is a hash\nin which the key is the contig size (number of sequences\nmaking up the contig) and the value is the number of contigs of this\nsize.\n\nIn fact, it is useful to have more information associated with the\ncontig spectrum, hence the Bio::Assembly::Tools::ContigSpectrum module\nimplements an object containing a contig spectrum hash and additional\ninformation. The get/set methods to access them are:\n\n    id              contig spectrum ID\n    nof_seq         number of sequences\n    nof_rep         number of repetitions (assemblies) used\n    max_size        size of (number of sequences in) the largest contig\n    nof_overlaps    number of overlaps\n    min_overlap     minimum overlap length for building a contig\n    min_identity    minimum sequence identity over the overlap length\n    avg_overlap     average overlap length\n    avg_identity    average overlap identity\n    avg_seq_len     average sequence length\n    eff_asm_params  effective assembly parameters\n    spectrum        hash representation of a contig spectrum\n\n  Operations on the contig spectra:\n\n    to_string       create a string representation of the spectrum\n    spectrum        import a hash contig spectrum\n    assembly        determine a contig spectrum from an assembly\n    dissolve        calculate a dissolved contig spectrum (based on assembly)\n    cross           produce a cross contig spectrum (based on assembly)\n    add             add a contig spectrum to an existing one\n    average         make an average of several contig spectra\n\nWhen using operations that rely on knowing \"where\" (from what\nmetagenomes) a sequence came from (i.e. 
when creating a dissolved or\ncross contig spectrum), make sure that the sequences used for the\nassembly have a name header, e.g.  E<gt>metagenome1|seq1,\nE<gt>metagenome2|seq1, ...\n\n=head1 FEEDBACK\n\n=head2 Mailing Lists\n\nUser feedback is an integral part of the evolution of this and other\nBioperl modules. Send your comments and suggestions preferably to the\nBioperl mailing lists. Your participation is much appreciated.\n\n  bioperl-l@bioperl.org                  - General discussion\n  http://bioperl.org/wiki/Mailing_lists  - About the mailing lists\n\n\n=head2 Support \n\nPlease direct usage questions or support issues to the mailing list:\n\nI<bioperl-l@bioperl.org>\n\nrather than to the module maintainer directly. Many experienced and \nresponsive experts will be able to look at the problem and quickly \naddress it. Please include a thorough description of the problem \nwith code and data examples if at all possible.\n\n=head2 Reporting Bugs\n\nReport bugs to the BioPerl bug tracking system to help us keep track\nof the bugs and their resolution. Bug reports can be submitted via email\nor the web:\n\n  bioperl-bugs@bio.perl.org\n  http://bugzilla.bioperl.org/\n\n=head1 AUTHOR - Florent E Angly\n\nEmail florent_dot_angly_at_gmail_dot_com\n\n=head1 APPENDIX\n\nThe rest of the documentation details each of the object\nmethods. 
Internal methods are usually preceded with a \"_\".\n\n\npackage Bio::Assembly::Tools::ContigSpectrum;\n\nuse strict;\n\nuse Bio::Root::Root;\nuse Bio::Assembly::Scaffold;\nuse Bio::SimpleAlign;\nuse Bio::LocatableSeq;\nuse Bio::Align::PairwiseStatistics;\n\nuse base 'Bio::Root::Root';\n\n\n=head2 new\n\n  Title   : new\n  Usage   : my $csp = Bio::Assembly::Tools::ContigSpectrum->new();\n              or\n            my $csp = Bio::Assembly::Tools::ContigSpectrum->new(\n              -id => 'some_name',\n              -spectrum =>  { 1 => 90 , 2 => 3 , 4 => 1 },\n            );\n              or\n            my $csp = Bio::Assembly::Tools::ContigSpectrum->new(\n              -assembly =>  $assembly_obj\n            );\n  Function: create a new contig spectrum object\n  Returns : reference to a contig spectrum object\n  Args    : none\n\n\nsub new {\n  my ($class, @args) = @_;\n  my $self = $class->SUPER::new(@args);\n  my ( $id, $nof_seq, $nof_rep, $max_size, $nof_overlaps, $min_overlap,\n    $min_identity, $avg_overlap, $avg_identity, $avg_seq_len, $spectrum,\n    $assembly, $eff_asm_params, $dissolve, $cross) = $self->_rearrange( [qw(ID\n    NOF_SEQ NOF_REP MAX_SIZE NOF_OVERLAPS MIN_OVERLAP MIN_IDENTITY AVG_OVERLAP\n    AVG_IDENTITY AVG_SEQ_LEN SPECTRUM ASSEMBLY EFF_ASM_PARAMS DISSOLVE CROSS)],\n    @args );\n\n  # First set up some defauts\n  $self->{'_id'}             = 'NoName';\n  $self->{'_nof_seq'}        = 0;\n  $self->{'_nof_rep'}        = 0;\n  $self->{'_max_size'}       = 0;\n  $self->{'_nof_overlaps'}   = 0;\n  $self->{'_min_overlap'}    = undef;\n  $self->{'_min_identity'}   = undef;\n  $self->{'_avg_overlap'}    = 0;\n  $self->{'_avg_identity'}   = 0;\n  $self->{'_avg_seq_len'}    = 0;\n  $self->{'_eff_asm_params'} = 0;\n  $self->{'_spectrum'}       = {1 => 0};  # contig spectrum hash representation\n  $self->{'_assembly'}       = []; # list of assembly objects used\n\n  # Then, according to user desires, override defaults\n  $self->{'_id'}           
  = $id             if (defined $id);\n  $self->{'_nof_seq'}        = $nof_seq        if (defined $nof_seq);\n  $self->{'_nof_rep'}        = $nof_rep        if (defined $nof_rep);\n  $self->{'_max_size'}       = $max_size       if (defined $max_size);\n  $self->{'_nof_overlaps'}   = $nof_overlaps   if (defined $nof_overlaps);\n  $self->{'_min_overlap'}    = $min_overlap    if (defined $min_overlap);\n  $self->{'_avg_overlap'}    = $avg_overlap    if (defined $avg_overlap);\n  $self->{'_min_identity'}   = $min_identity   if (defined $min_identity);\n  $self->{'_avg_identity'}   = $avg_identity   if (defined $avg_identity);\n  $self->{'_avg_seq_len'}    = $avg_seq_len    if (defined $avg_seq_len);\n  $self->{'_eff_asm_params'} = $eff_asm_params if (defined $eff_asm_params);\n\n  # Finally get stuff that can be gotten in an automated way\n  $self->_import_spectrum($spectrum) if defined($spectrum);\n  $self->_import_assembly($assembly) if defined($assembly);\n  if (defined($dissolve)) {\n    my ($mixed_csp, $header) = (@$dissolve[0], @$dissolve[1]);\n    $self->_import_dissolved_csp($mixed_csp, $header);\n  }\n  $self->_import_cross_csp($cross)   if defined($cross);\n\n  return $self;\n}\n\n\n=head2 id\n\n  Title   : id\n  Usage   : $csp->id\n  Function: get/set contig spectrum id\n  Returns : string\n  Args    : string [optional]\n\n\nsub id {\n  my ($self, $id) = @_;\n  if (defined $id) {\n    $self->{'_id'} = $id;\n  }\n  $id = $self->{'_id'};\n  return $id;\n}\n\n\n=head2 nof_seq\n\n  Title   : nof_seq\n  Usage   : $csp->nof_seq\n  Function: get/set the number of sequences making up the contig spectrum\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub nof_seq {\n  my ($self, $nof_seq) = @_;\n  if (defined $nof_seq) {\n    $self->throw(\"The number of sequences must be strictly positive. 
Got \".\n      \"'$nof_seq'\") if $nof_seq < 1;\n    $self->{'_nof_seq'} = $nof_seq;\n  }\n  $nof_seq = $self->{'_nof_seq'};\n  return $nof_seq;\n}\n\n\n=head2 nof_rep\n\n  Title   : nof_rep\n  Usage   : $csp->nof_rep\n  Function: Get/Set the number of repetitions (assemblies) used to create the \n            contig spectrum\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub nof_rep {\n  my ($self, $nof_rep) = @_;\n  if (defined $nof_rep) {\n    $self->throw(\"The number of repetitions must be strictly positive. Got \".\n      \"'$nof_rep'\") if $nof_rep < 1;\n    $self->{'_nof_rep'} = $nof_rep;\n  }\n  $nof_rep = $self->{'_nof_rep'};\n  return $nof_rep;\n}\n\n\n=head2 max_size\n\n  Title   : max_size\n  Usage   : $csp->max_size\n  Function: get/set the size of (number of sequences in) the largest contig\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub max_size {\n  my ($self, $max_size) = @_;\n  if (defined $max_size) {\n    $self->throw(\"The contig maximum size must be strictly positive. Got \".\n      \"'$max_size'\") if $max_size < 1;\n    $self->{'_max_size'} = $max_size;\n  }\n  $max_size = $self->{'_max_size'};\n  return $max_size;\n}\n\n\n=head2 nof_overlaps\n\n  Title   : nof_overlaps\n  Usage   : $csp->nof_overlaps\n  Function: Get/Set the number of overlaps in the assembly.\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub nof_overlaps {\n  my ($self, $nof_overlaps) = @_;\n  if (defined $nof_overlaps) {\n    $self->throw(\"The number of overlaps must be strictly positive. 
Got \".\n      \"'$nof_overlaps'\") if $nof_overlaps < 1;\n    $self->{'_nof_overlaps'} = $nof_overlaps;\n  }\n  $nof_overlaps = $self->{'_nof_overlaps'};\n  return $nof_overlaps;\n}\n\n\n=head2 min_overlap\n\n  Title   : min_overlap\n  Usage   : $csp->min_overlap\n  Function: get/set the assembly minimum overlap length\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub min_overlap {\n  my ($self, $min_overlap) = @_;\n  if (defined $min_overlap) {\n    $self->throw(\"The minimum of overlap length must be strictly positive. Got\".\n      \" '$min_overlap'\") if $min_overlap < 1;\n    $self->{'_min_overlap'} = $min_overlap;\n  }\n  $min_overlap = $self->{'_min_overlap'};\n  return $min_overlap;\n}\n\n\n=head2 avg_overlap\n\n  Title   : avg_overlap\n  Usage   : $csp->avg_overlap\n  Function: get/set the assembly average overlap length\n  Returns : decimal\n  Args    : decimal [optional]\n\n\nsub avg_overlap {\n  my ($self, $avg_overlap) = @_;\n  if (defined $avg_overlap) {\n    $self->throw(\"The average overlap length must be strictly positive. Got \".\n      \"'$avg_overlap'\") if $avg_overlap < 1;\n    $self->{'_avg_overlap'} = $avg_overlap;\n  }\n  $avg_overlap = $self->{'_avg_overlap'};\n  return $avg_overlap;\n}\n\n\n=head2 min_identity\n\n  Title   : min_identity\n  Usage   : $csp->min_identity\n  Function: get/set the assembly minimum overlap identity percent\n  Returns : 0 < decimal < 100\n  Args    : 0 < decimal < 100 [optional]\n\n\nsub min_identity {\n  my ($self, $min_identity) = @_;\n  if (defined $min_identity) {\n    $self->throw(\"The minimum overlap percent identity must be strictly \".\n      \"positive. 
Got '$min_identity'\") if $min_identity < 1;\n    $self->{'_min_identity'} = $min_identity;\n  }\n  $min_identity = $self->{'_min_identity'};\n  return $min_identity;\n}\n\n\n=head2 avg_identity\n\n  Title   : avg_identity\n  Usage   : $csp->avg_identity\n  Function: get/set the assembly average overlap identity percent\n  Returns : 0 < decimal < 100\n  Args    : 0 < decimal < 100 [optional]\n\n\nsub avg_identity {\n  my ($self, $avg_identity) = @_;\n  if (defined $avg_identity) {\n    $self->throw(\"The average overlap percent identity must be strictly \".\n      \"positive. Got '$avg_identity'\") if $avg_identity < 1;\n    $self->{'_avg_identity'} = $avg_identity;\n  }\n  $avg_identity = $self->{'_avg_identity'};\n  return $avg_identity;\n}\n\n\n=head2 avg_seq_len\n\n  Title   : avg_seq_len\n  Usage   : $csp->avg_seq_len\n  Function: get/set the assembly average sequence length\n  Returns : avg_seq_len\n  Args    : real [optional]\n\n\nsub avg_seq_len {\n  my ($self, $avg_seq_len) = @_;\n  if (defined $avg_seq_len) {\n    $self->throw(\"The average sequence length must be strictly positive. Got \".\n      \"'$avg_seq_len'\") if $avg_seq_len < 1;\n    $self->{'_avg_seq_len'} = $avg_seq_len;\n  }\n  $avg_seq_len = $self->{'_avg_seq_len'};\n  return $avg_seq_len;\n}\n\n\n=head2 eff_asm_params\n\n  Title   : eff_asm_params\n  Usage   : $csp->eff_asm_params(1)\n  Function: Get/set the effective assembly parameters option. It defines if the\n            effective assembly parameters should be determined when a contig\n            spectrum based or derived from an assembly is calulated. 
The\n            effective assembly parameters include avg_seq_length, nof_overlaps,\n            min_overlap, avg_overlap, min_identity and avg_identity.\n            1 = get them, 0 = don't.\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub eff_asm_params {\n  my ($self, $eff_asm_params) = @_;\n  if (defined $eff_asm_params) {\n    $self->throw(\"eff_asm_params can only take values 0 or 1. Input value was \".\n      \"'$eff_asm_params'\") unless $eff_asm_params == 0 || $eff_asm_params == 1;\n    $self->{'_eff_asm_params'} = $eff_asm_params;\n  }\n  $eff_asm_params = $self->{'_eff_asm_params'};\n  return $eff_asm_params;\n}\n\n\n=head2 spectrum\n\n  Title   : spectrum\n  Usage   : my $spectrum = $csp->spectrum({1=>10, 2=>2, 3=>1});\n  Function: Get the current contig spectrum represented as a hash / Update a\n            contig spectrum object based on a contig spectrum represented as a\n            hash\n            The hash representation of a contig spectrum is as following:\n              key   -> contig size (in number of sequences)\n              value -> number of contigs of this size\n  Returns : contig spectrum as a hash reference\n  Args    : contig spectrum as a hash reference [optional]\n\n\nsub spectrum {\n  my ($self, $spectrum) = @_;\n  if (defined $spectrum) {\n    $self->_import_spectrum($spectrum);\n  }\n  $spectrum = $self->{'_spectrum'};\n  return $spectrum;\n}\n\n\n=head2 assembly\n\n  Title   : assembly\n  Usage   : my @asm_list = $csp->assembly();\n  Function: Get a reference to the list of assembly object reference used to\n            make the contig spectrum object / Update the contig spectrum object\n            based on an assembly object.\n  Returns : array of Bio::Assembly::Scaffold\n  Args    : Bio::Assembly::Scaffold\n\n\nsub assembly {\n  my ($self, $assembly) = @_;\n  if (defined $assembly) {\n    $self->_import_assembly($assembly);\n  }\n  my @asm_list = @{$self->{'_assembly'}} if defined $self->{'_assembly'};\n  
return \\@asm_list;\n}\n\n=head2 drop_assembly\n\n  Title   : drop_assembly\n  Usage   : $csp->drop_assembly();\n  Function: Remove all assembly objects associated with a contig spectrum.\n            Assembly objects can be big. This method allows to free some memory\n            when assembly information is not needed anymore.\n  Returns : 1 for success, 0 for failure\n  Args    : none\n\n\nsub drop_assembly {\n  my ($self) = @_;\n  $self->{'_assembly'} = [];\n  return 1;\n}\n\n=head2 dissolve\n\n  Title   : dissolve\n  Usage   : $dissolved_csp->dissolve($mixed_csp, $seq_header);\n  Function: Dissolve a mixed contig spectrum for the set of sequences that\n            contain the specified header, i.e. determine the contribution of\n            these sequences to the mixed contig spectrum based on the assembly.\n            The mixed contig spectrum object must have been created based on one\n            (or several) assembly object(s). Additionally, min_overlap and\n            min_identity must have been set (either manually using min_overlap\n            or automatically by switching on the eff_asm_params option).\n  Returns : 1 for success, 0 for failure\n  Args    : Bio::Assembly::Tools::ContigSpectrum reference\n            sequence header string\n\n\n\nsub dissolve {\n  my ($self, $mixed_csp, $seq_header) = @_;\n  $self->_import_dissolved_csp($mixed_csp, $seq_header);\n  return 1;\n}\n\n\n=head2 cross\n\n  Title   : cross\n  Usage   : $cross_csp->cross($mixed_csp);\n  Function: Calculate a cross contig_spectrum based on a mixed contig_spectrum.\n  Returns : 1 for success, 0 for failure\n  Args    : Bio::Assembly::Tools::ContigSpectrum reference\n\n\nsub cross {\n  my ($self, $mixed_csp) = @_;\n  $self->_import_cross_csp($mixed_csp);\n  return 1;\n}\n\n=head2 to_string\n\n  Title   : to_string\n  Usage   : my $csp_string = $csp->to_string;\n  Function: Convert the contig spectrum into a string (easy to print!!).\n  Returns : string\n  Args    : element 
separator (integer) [optional]\n              1 -> space-separated\n              2 -> tab-separated\n              3 -> newline-separated\n\n\nsub to_string {\n  my ($self, $element_separator) = @_;\n  return 0 if $self->{'_max_size'} == 0;\n  $element_separator ||= 1;\n  if ($element_separator == 1) {\n    $element_separator = ' ';\n  } elsif ($element_separator == 2) {\n    $element_separator = \"\\t\";\n  } elsif ($element_separator == 3) {\n    $element_separator = \"\\n\";\n  } else {\n    $self->throw(\"Unknown separator type '$element_separator'\\n\");\n  }\n  my $str = '';\n  for (my $q = 1 ; $q <= $self->{'_max_size'} ; $q++) {\n    my $val = 0;\n    if (exists $self->{'_spectrum'}{$q}) {\n      $val = $self->{'_spectrum'}{$q};\n    }\n    $str .= $val.$element_separator;\n  }\n  $str =~ s/\\s$//;\n  return $str;\n}\n\n\n=head2 add\n\n  Title   : add\n  Usage   : $csp->add($additional_csp);\n  Function: Add a contig spectrum to an existing one: sums the spectra, update\n            the number of sequences, number of repetitions, ...\n  Returns : 1 for success, 0 for failure\n  Args    : Bio::Assembly::Tools::ContigSpectrum object\n\n\nsub add {\n  my ($self, $csp) = @_;\n  # Sanity check\n  if( !ref $csp || ! $csp->isa('Bio::Assembly::Tools::ContigSpectrum') ) {\n        $self->throw(\"Unable to process non Bio::Assembly::Tools::ContigSpectrum \".\n        \"object [\".ref($csp).\"]\");\n  }\n  # Update overlap statistics\n  if ( $self->{'_eff_asm_params'} > 0 ) {\n    # Warnings\n    if ( $csp->{'_eff_asm_params'} == 0 ) {\n      $self->warn(\"The parent contig spectrum needs effective assembly \".\n        \"parameters (eff_asm_params = \".$self->{'_eff_asm_params'}.\") but the \".\n        \"child contig spectrum doesn't have them (eff_asm_params = \".\n        $csp->{'_eff_asm_params'}.\"). 
Skipping them...\");\n    } elsif ( $csp->{'_eff_asm_params'} != $self->{'_eff_asm_params'} ) {\n      $self->warn(\"The parent contig spectrum needs a different method for \".\n        \"detecting the effective assembly parameters (eff_asm_params = \".\n        $self->{'_eff_asm_params'}.\") than the one specified for the child \".\n        \"contig spectrum (eff_asm_params = \".$csp->{'_eff_asm_params'}.\"). \".\n        \"Ignoring the differences...\");\n    }\n    # Update existing stats\n    my $tot_num_overlaps = $csp->{'_nof_overlaps'} + $self->{'_nof_overlaps'};\n    $self->{'_min_overlap'} = $csp->{'_min_overlap'} if\n      defined $csp->{'_min_overlap'} && ( ! defined $self->{'_min_overlap'} ||\n      $csp->{'_min_overlap'} < $self->{'_min_overlap'} );\n    $self->{'_min_identity'} = $csp->{'_min_identity'} if\n      defined $csp->{'_min_identity'} && ( ! defined $self->{'_min_identity'} ||\n      $csp->{'_min_identity'} < $self->{'_min_identity'} );\n    if ($tot_num_overlaps != 0) {\n      $self->{'_avg_overlap'} =\n        ($csp->{'_avg_overlap'} * $csp->{'_nof_overlaps'}\n        + $self->{'_avg_overlap'} * $self->{'_nof_overlaps'})\n        / $tot_num_overlaps;\n      $self->{'_avg_identity'} =\n        ($csp->{'_avg_identity'} * $csp->{'_nof_overlaps'}\n        + $self->{'_avg_identity'} * $self->{'_nof_overlaps'})\n        / $tot_num_overlaps;\n    }\n    $self->{'_nof_overlaps'} = $tot_num_overlaps;\n  }\n  # Update sequence statistics\n  my $tot_nof_seq = $csp->{'_nof_seq'} + $self->{'_nof_seq'};\n  if (not $tot_nof_seq == 0) {\n    $self->{'_avg_seq_len'} = ($csp->{'_avg_seq_len'} * $csp->{'_nof_seq'} +\n      $self->{'_avg_seq_len'} * $self->{'_nof_seq'}) / $tot_nof_seq;\n  }\n  # Update spectrum (and nof_seq, max_size, and increment nof_rep by 1)\n  $self->_import_spectrum($csp->{'_spectrum'});\n  # Update nof_rep\n  $self->{'_nof_rep'}--;\n  $self->{'_nof_rep'} += $csp->{'_nof_rep'};\n  # Update list of assembly objects used\n  push 
@{$self->{'_assembly'}}, @{$csp->{'_assembly'}}\n    if defined $csp->{'_assembly'};\n  return 1;\n}\n\n\n=head2 average\n\n  Title   : average\n  Usage   : my $avg_csp = $csp->average([$csp1, $csp2, $csp3]);\n  Function: Average one contig spectrum or the sum of several contig spectra by\n            the number of repetitions\n  Returns : Bio::Assembly::Tools::ContigSpectrum\n  Args    : Bio::Assembly::Tools::ContigSpectrum array reference\n            eff_asm_params\n\n\nsub average {\n  my ($self, $list) = @_;\n  # Sanity check\n  if ( ! ref $list || ref($list) ne 'ARRAY') {\n    $self->throw(\"Average takes an array reference but got [\".ref($list).\"]\");\n  }\n  # New average contig spectrum object\n  my $avg = Bio::Assembly::Tools::ContigSpectrum->new;\n  $avg->{'_eff_asm_params'} = 1;\n  \n  # Cycle through contig spectra\n  my $tot_nof_rep = 0;\n  for my $csp (@$list) {\n    # Sanity check\n    if (not $csp->isa('Bio::Assembly::Tools::ContigSpectrum')) {\n      $self->throw(\"Unable to process non Bio::Assembly::Tools::ContigSpectrum \".\n        \"object [\".ref($csp).\"]\");\n    }\n    # Import contig spectrum\n    $avg->add($csp);\n  }\n  \n  # Average sum of contig spectra by number of repetitions\n  for (my $q = 1 ; $q <= $avg->{'_max_size'} ; $q++) {\n    $avg->{'_spectrum'}{$q} /= $avg->{'_nof_rep'}\n      if (defined $avg->{'_spectrum'}{$q});\n  }\n  # Average number of sequences\n  $avg->{'_nof_seq'} /= $avg->{'_nof_rep'};\n  # Average number of overlaps\n  $avg->{'_nof_overlaps'} /= $avg->{'_nof_rep'};\n  \n  return $avg;\n}\n\n\n=head2 score\n\n  Title   : score\n  Usage   : my $score = $csp->score();\n  Function: Score a contig spectrum (or cross-contig spectrum) such that the\n             higher the number of contigs (or cross-contigs) and the larger their \n             size, the higher the score.\n             Let n   : total number of sequences\n                 c_q : number of contigs of size q\n                 q   : number of sequence 
in a contig\n             We define: score = n/(n-1) * (X - 1/n)\n                  where X = sum ( c_q * q^2 ) / n**2\n             The score ranges from 0 (singlets only) to 1 (a single large contig)\n             It is possible to specify a value for the number of sequences to\n              assume in the contig spectrum.\n  Returns : contig score\n  Args    : number of total sequences to assume [optional]","label":"score($self,$nof_seqs)"}},{"children":[{"line":854,"localvar":"my","containerName":"_naive_assembler","kind":13,"name":"$self","definition":"my"},{"kind":13,"containerName":"_naive_assembler","name":"$contig","line":854},{"line":854,"kind":13,"containerName":"_naive_assembler","name":"$seqlist"},{"line":854,"name":"$min_overlap","kind":13,"containerName":"_naive_assembler"},{"name":"$min_identity","containerName":"_naive_assembler","kind":13,"line":854},{"name":"$seqlist","kind":13,"containerName":"_naive_assembler","line":856},{"line":856,"kind":13,"containerName":"_naive_assembler","name":"$seqlist"},{"line":857,"containerName":"_naive_assembler","kind":13,"name":"$self"},{"line":857,"name":"throw","containerName":"_naive_assembler","kind":12},{"line":857,"name":"$seqlist","containerName":"_naive_assembler","kind":13},{"containerName":"_naive_assembler","localvar":"my","kind":13,"name":"$max","line":859,"definition":"my"},{"line":859,"kind":13,"containerName":"_naive_assembler","name":"$seqlist"},{"name":"$self","kind":13,"containerName":"_naive_assembler","line":860},{"line":860,"containerName":"_naive_assembler","kind":12,"name":"throw"},{"line":861,"name":"$max","containerName":"_naive_assembler","kind":13},{"name":"%spectrum","kind":13,"localvar":"my","containerName":"_naive_assembler","line":863,"definition":"my"},{"name":"%overlap_map","localvar":"my","kind":13,"containerName":"_naive_assembler","line":864,"definition":"my"},{"definition":"my","line":865,"name":"%has_overlap","kind":13,"localvar":"my","containerName":"_naive_assembler"},{"loca
lvar":"my","kind":13,"containerName":"_naive_assembler","name":"$i","line":867,"definition":"my"},{"line":867,"containerName":"_naive_assembler","kind":13,"name":"$i"},{"name":"$max","kind":13,"containerName":"_naive_assembler","line":867},{"kind":13,"containerName":"_naive_assembler","name":"$i","line":867},{"line":869,"name":"$qseqid","containerName":"_naive_assembler","localvar":"my","kind":13,"definition":"my"},{"name":"$i","containerName":"_naive_assembler","kind":13,"line":869},{"definition":"my","kind":13,"localvar":"my","containerName":"_naive_assembler","name":"$qseq","line":870},{"containerName":"_naive_assembler","kind":13,"name":"$contig","line":870},{"line":870,"name":"get_seq_by_name","kind":12,"containerName":"_naive_assembler"},{"line":870,"name":"$qseqid","kind":13,"containerName":"_naive_assembler"},{"definition":"my","line":871,"containerName":"_naive_assembler","localvar":"my","kind":13,"name":"$is_singlet"},{"definition":"my","localvar":"my","containerName":"_naive_assembler","kind":13,"name":"$j","line":872},{"line":872,"containerName":"_naive_assembler","kind":13,"name":"$i"},{"line":872,"containerName":"_naive_assembler","kind":13,"name":"$j"},{"line":872,"name":"$max","containerName":"_naive_assembler","kind":13},{"line":872,"containerName":"_naive_assembler","kind":13,"name":"$j"},{"line":874,"localvar":"my","kind":13,"containerName":"_naive_assembler","name":"$tseqid","definition":"my"},{"line":874,"containerName":"_naive_assembler","kind":13,"name":"$j"},{"definition":"my","line":875,"containerName":"_naive_assembler","localvar":"my","kind":13,"name":"$tseq"},{"name":"$contig","kind":13,"containerName":"_naive_assembler","line":875},{"name":"get_seq_by_name","containerName":"_naive_assembler","kind":12,"line":875},{"line":875,"name":"$tseqid","containerName":"_naive_assembler","kind":13},{"name":"$aln","localvar":"my","kind":13,"containerName":"_naive_assembler","line":877,"definition":"my"},{"containerName":"_naive_assembler","kind":13,"
name":"$overlap","line":877},{"name":"$identity","containerName":"_naive_assembler","kind":13,"line":877},{"name":"$self","containerName":"_naive_assembler","kind":13,"line":878},{"line":878,"containerName":"_naive_assembler","kind":12,"name":"_overlap_alignment"},{"line":878,"containerName":"_naive_assembler","kind":13,"name":"$contig"},{"containerName":"_naive_assembler","kind":13,"name":"$qseq","line":878},{"line":878,"name":"$tseq","containerName":"_naive_assembler","kind":13},{"line":878,"kind":13,"containerName":"_naive_assembler","name":"$min_overlap"},{"line":879,"name":"$min_identity","containerName":"_naive_assembler","kind":13},{"line":881,"containerName":"_naive_assembler","kind":13,"name":"$aln"},{"name":"$is_singlet","kind":13,"containerName":"_naive_assembler","line":883},{"containerName":"_naive_assembler","kind":13,"name":"$overlap_map","line":884},{"containerName":"_naive_assembler","kind":13,"name":"$qseqid","line":884},{"name":"$tseqid","containerName":"_naive_assembler","kind":13,"line":884},{"line":885,"name":"$has_overlap","kind":13,"containerName":"_naive_assembler"},{"name":"$tseqid","containerName":"_naive_assembler","kind":13,"line":885},{"kind":13,"containerName":"_naive_assembler","name":"$has_overlap","line":886},{"kind":13,"containerName":"_naive_assembler","name":"$qseqid","line":886},{"line":889,"containerName":"_naive_assembler","kind":13,"name":"$has_overlap"},{"name":"$qseqid","containerName":"_naive_assembler","kind":13,"line":889},{"name":"$is_singlet","containerName":"_naive_assembler","kind":13,"line":890},{"line":892,"name":"$is_singlet","containerName":"_naive_assembler","kind":13},{"line":893,"kind":13,"containerName":"_naive_assembler","name":"$spectrum"}],"name":"_naive_assembler","containerName":"main::","definition":"sub","detail":"($self,$contig,$seqlist,$min_overlap,$min_identity)","line":853,"kind":12,"range":{"start":{"line":853,"character":0},"end":{"line":895,"character":9999}},"signature":{"label":"_naive_assembl
er($self,$contig,$seqlist,$min_overlap,$min_identity)","parameters":[{"label":"$self"},{"label":"$contig"},{"label":"$seqlist"},{"label":"$min_overlap"},{"label":"$min_identity"}],"documentation":"__END__\n#\n# BioPerl module for Bio::Assembly::Tools::ContigSpectrum\n#\n# Copyright by Florent Angly\n#\n# You may distribute this module under the same terms as Perl itself\n#\n# POD documentation - main docs before the code\n\n=head1 NAME\n\nBio::Assembly::Tools::ContigSpectrum - create and manipulate contig spectra\n\n=head1 SYNOPSIS\n\n  # Simple contig spectrum creation\n  my $csp1 = Bio::Assembly::Tools::ContigSpectrum->new(\n    -id       => 'csp1',\n    -spectrum => { 1 => 10,\n                   2 => 2,\n                   3 => 1 } );\n\n  # ...or another way to create a simple contig spectrum\n  my $csp2 = Bio::Assembly::Tools::ContigSpectrum->new;\n  $csp2->id('csp2');\n  $csp2->spectrum({ 1 => 20, 2 => 1, 4 => 1 });\n\n  # Get some information\n  print \"This is contig spectrum \".$csp->id.\"\\n\";\n  print \"It contains \".$csp->nof_seq.\" sequences\\n\";\n  print \"The largest contig has \".$csp->max_size.\" sequences\\n\";\n  print \"The spectrum is: \".$csp->to_string($csp->spectrum).\"\\n\";\n\n  # Let's add the contig spectra\n  my $summed_csp = Bio::Assembly::Tools::ContigSpectrum->new;\n  $summed_csp->add($csp1);\n  $summed_csp->add($csp2);\n  print \"The summed contig spectrum is \".$summed_csp->to_string.\"\\n\";\n\n  # Make an average\n  my $avg_csp = Bio::Assembly::Tools::ContigSpectrum->new;\n  $avg_csp = $avg_csp->average([$csp1, $csp2]);\n  print \"The average contig spectrum is \".$avg_csp->to_string.\"\\n\";\n\n  # Get a contig spectrum from an assembly\n  my $from_assembly = Bio::Assembly::Tools::ContigSpectrum->new(\n    -assembly       => $assembly_object,\n    -eff_asm_params => 1);\n  print \"The contig spectrum from assembly is \".$from_assembly->to_string.\"\\n\";\n\n  # Report advanced information (possible because eff_asm_params = 
1)\n  print \"Average sequence length: \".$from_assembly->avg_seq_len.\" bp\\n\";\n  print \"Minimum overlap length: \".$from_assembly->min_overlap.\" bp\\n\";\n  print \"Average overlap length: \".$from_assembly->avg_overlap.\" bp\\n\";\n  print \"Minimum overlap match: \".$from_assembly->min_identity.\" %\\n\";\n  print \"Average overlap match: \".$from_assembly->avg_identity.\" %\\n\";\n\n  # Assuming the assembly object contains sequences from several different\n  # metagenomes, we have a mixed contig spectrum from which a cross contig\n  # spectrum and dissolved contig spectra can be obtained\n  my $mixed_csp = $from_assembly;\n\n  # Calculate a dissolved contig spectrum\n  my $meta1_dissolved = Bio::Assembly::Tools::ContigSpectrum->new(\n    -dissolve => [$mixed_csp, 'metagenome1'] );\n  my $meta2_dissolved = Bio::Assembly::Tools::ContigSpectrum->new(\n    -dissolve => [$mixed_csp, 'metagenome2'] );\n  print \"The dissolved contig spectra are:\\n\".\n    $meta1_dissolved->to_string.\"\\n\".\n    $meta2_dissolved->to_string.\"\\n\";\n\n  # Determine a cross contig spectrum\n  my $cross_csp = Bio::Assembly::Tools::ContigSpectrum->new(\n    -cross => $mixed_csp );\n  print \"The cross contig spectrum is \".$cross_csp->to_string.\"\\n\";\n\n  # Score a contig spectrum (the more abundant the contigs and the larger their\n  # size, the larger the score)\n\n\n=head1 DESCRIPTION\n\nThe Bio::Assembly::Tools::ContigSpectrum Perl module lets you\nmanually create contig spectra, import them from assemblies,\nmanipulate them, transform between different types of contig spectra\nand output them.\n\nBio::Assembly::Tools::ContigSpectrum is a module to create, manipulate\nand output contig spectra, assembly-derived data used in metagenomics\n(community genomics) for diversity estimation.\n\n=head2 Background\n\nA contig spectrum is the count of the number of contigs of different\nsize in an assembly. For example, the contig spectrum [100 5 1 0 0\n...] 
means that there were 100 singlets (1-contigs), 5 contigs of 2\nsequences (2-contigs), 1 contig of 3 sequences (3-contig) and no\nlarger contigs.\n\nAn assembly can be produced from a mixture of sequences from different\nmetagenomes. The contig spectrum obtained from this assembly is a mixed contig\nspectrum. The contribution of each metagenome in this mixed contig\nspectrum can be obtained by determining a dissolved contig spectrum.\n\nFinally, based on a mixed contig spectrum, a cross contig spectrum can\nbe determined. In a cross contig spectrum, only contigs containing\nsequences from different metagenomes are kept; \"pure\" contigs are\nexcluded. Additionally, the total number of singletons (1-contigs)\nfrom each region that assemble with any fragments from other regions\nis the number of 1-contigs in the cross contig spectrum.\n\n=head2 Implementation\n\nThe simplest representation of a contig spectrum is a hash\nin which the key is the contig size (number of sequences\nmaking up the contig) and the value is the number of contigs of this\nsize.\n\nIn fact, it is useful to have more information associated with the\ncontig spectrum, hence the Bio::Assembly::Tools::ContigSpectrum module\nimplements an object containing a contig spectrum hash and additional\ninformation. 
The get/set methods to access them are:\n\n    id              contig spectrum ID\n    nof_seq         number of sequences\n    nof_rep         number of repetitions (assemblies) used\n    max_size        size of (number of sequences in) the largest contig\n    nof_overlaps    number of overlaps\n    min_overlap     minimum overlap length for building a contig\n    min_identity    minimum sequence identity over the overlap length\n    avg_overlap     average overlap length\n    avg_identity    average overlap identity\n    avg_seq_length  average sequence length\n    eff_asm_params  effective assembly parameters\n    spectrum        hash representation of a contig spectrum\n\n  Operations on the contig spectra:\n\n    to_string       create a string representation of the spectrum\n    spectrum        import a hash contig spectrum\n    assembly        determine a contig spectrum from an assembly\n    dissolve        calculate a dissolved contig spectrum (based on assembly)\n    cross           produce a cross contig spectrum (based on assembly)\n    add             add a contig spectrum to an existing one\n    average         make an average of several contig spectra\n\nWhen using operations that rely on knowing \"where\" (from what\nmetagenomes) a sequence came from (i.e. when creating a dissolved or\ncross contig spectrum), make sure that the sequences used for the\nassembly have a name header, e.g.  E<gt>metagenome1|seq1,\nE<gt>metagenome2|seq1, ...\n\n=head1 FEEDBACK\n\n=head2 Mailing Lists\n\nUser feedback is an integral part of the evolution of this and other\nBioperl modules. 
Send your comments and suggestions preferably to the\nBioperl mailing lists. Your participation is much appreciated.\n\n  bioperl-l@bioperl.org                  - General discussion\n  http://bioperl.org/wiki/Mailing_lists  - About the mailing lists\n\n\n=head2 Support \n\nPlease direct usage questions or support issues to the mailing list:\n\nI<bioperl-l@bioperl.org>\n\nrather than to the module maintainer directly. Many experienced and \nresponsive experts will be able to look at the problem and quickly \naddress it. Please include a thorough description of the problem \nwith code and data examples if at all possible.\n\n=head2 Reporting Bugs\n\nReport bugs to the BioPerl bug tracking system to help us keep track\nof the bugs and their resolution. Bug reports can be submitted via email\nor the web:\n\n  bioperl-bugs@bio.perl.org\n  http://bugzilla.bioperl.org/\n\n=head1 AUTHOR - Florent E Angly\n\nEmail florent_dot_angly_at_gmail_dot_com\n\n=head1 APPENDIX\n\nThe rest of the documentation details each of the object\nmethods. 
Internal methods are usually preceded with a \"_\".\n\n\npackage Bio::Assembly::Tools::ContigSpectrum;\n\nuse strict;\n\nuse Bio::Root::Root;\nuse Bio::Assembly::Scaffold;\nuse Bio::SimpleAlign;\nuse Bio::LocatableSeq;\nuse Bio::Align::PairwiseStatistics;\n\nuse base 'Bio::Root::Root';\n\n\n=head2 new\n\n  Title   : new\n  Usage   : my $csp = Bio::Assembly::Tools::ContigSpectrum->new();\n              or\n            my $csp = Bio::Assembly::Tools::ContigSpectrum->new(\n              -id => 'some_name',\n              -spectrum =>  { 1 => 90 , 2 => 3 , 4 => 1 },\n            );\n              or\n            my $csp = Bio::Assembly::Tools::ContigSpectrum->new(\n              -assembly =>  $assembly_obj\n            );\n  Function: create a new contig spectrum object\n  Returns : reference to a contig spectrum object\n  Args    : none\n\n\nsub new {\n  my ($class, @args) = @_;\n  my $self = $class->SUPER::new(@args);\n  my ( $id, $nof_seq, $nof_rep, $max_size, $nof_overlaps, $min_overlap,\n    $min_identity, $avg_overlap, $avg_identity, $avg_seq_len, $spectrum,\n    $assembly, $eff_asm_params, $dissolve, $cross) = $self->_rearrange( [qw(ID\n    NOF_SEQ NOF_REP MAX_SIZE NOF_OVERLAPS MIN_OVERLAP MIN_IDENTITY AVG_OVERLAP\n    AVG_IDENTITY AVG_SEQ_LEN SPECTRUM ASSEMBLY EFF_ASM_PARAMS DISSOLVE CROSS)],\n    @args );\n\n  # First set up some defauts\n  $self->{'_id'}             = 'NoName';\n  $self->{'_nof_seq'}        = 0;\n  $self->{'_nof_rep'}        = 0;\n  $self->{'_max_size'}       = 0;\n  $self->{'_nof_overlaps'}   = 0;\n  $self->{'_min_overlap'}    = undef;\n  $self->{'_min_identity'}   = undef;\n  $self->{'_avg_overlap'}    = 0;\n  $self->{'_avg_identity'}   = 0;\n  $self->{'_avg_seq_len'}    = 0;\n  $self->{'_eff_asm_params'} = 0;\n  $self->{'_spectrum'}       = {1 => 0};  # contig spectrum hash representation\n  $self->{'_assembly'}       = []; # list of assembly objects used\n\n  # Then, according to user desires, override defaults\n  $self->{'_id'}           
  = $id             if (defined $id);\n  $self->{'_nof_seq'}        = $nof_seq        if (defined $nof_seq);\n  $self->{'_nof_rep'}        = $nof_rep        if (defined $nof_rep);\n  $self->{'_max_size'}       = $max_size       if (defined $max_size);\n  $self->{'_nof_overlaps'}   = $nof_overlaps   if (defined $nof_overlaps);\n  $self->{'_min_overlap'}    = $min_overlap    if (defined $min_overlap);\n  $self->{'_avg_overlap'}    = $avg_overlap    if (defined $avg_overlap);\n  $self->{'_min_identity'}   = $min_identity   if (defined $min_identity);\n  $self->{'_avg_identity'}   = $avg_identity   if (defined $avg_identity);\n  $self->{'_avg_seq_len'}    = $avg_seq_len    if (defined $avg_seq_len);\n  $self->{'_eff_asm_params'} = $eff_asm_params if (defined $eff_asm_params);\n\n  # Finally get stuff that can be gotten in an automated way\n  $self->_import_spectrum($spectrum) if defined($spectrum);\n  $self->_import_assembly($assembly) if defined($assembly);\n  if (defined($dissolve)) {\n    my ($mixed_csp, $header) = (@$dissolve[0], @$dissolve[1]);\n    $self->_import_dissolved_csp($mixed_csp, $header);\n  }\n  $self->_import_cross_csp($cross)   if defined($cross);\n\n  return $self;\n}\n\n\n=head2 id\n\n  Title   : id\n  Usage   : $csp->id\n  Function: get/set contig spectrum id\n  Returns : string\n  Args    : string [optional]\n\n\nsub id {\n  my ($self, $id) = @_;\n  if (defined $id) {\n    $self->{'_id'} = $id;\n  }\n  $id = $self->{'_id'};\n  return $id;\n}\n\n\n=head2 nof_seq\n\n  Title   : nof_seq\n  Usage   : $csp->nof_seq\n  Function: get/set the number of sequences making up the contig spectrum\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub nof_seq {\n  my ($self, $nof_seq) = @_;\n  if (defined $nof_seq) {\n    $self->throw(\"The number of sequences must be strictly positive. 
Got \".\n      \"'$nof_seq'\") if $nof_seq < 1;\n    $self->{'_nof_seq'} = $nof_seq;\n  }\n  $nof_seq = $self->{'_nof_seq'};\n  return $nof_seq;\n}\n\n\n=head2 nof_rep\n\n  Title   : nof_rep\n  Usage   : $csp->nof_rep\n  Function: Get/Set the number of repetitions (assemblies) used to create the \n            contig spectrum\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub nof_rep {\n  my ($self, $nof_rep) = @_;\n  if (defined $nof_rep) {\n    $self->throw(\"The number of repetitions must be strictly positive. Got \".\n      \"'$nof_rep'\") if $nof_rep < 1;\n    $self->{'_nof_rep'} = $nof_rep;\n  }\n  $nof_rep = $self->{'_nof_rep'};\n  return $nof_rep;\n}\n\n\n=head2 max_size\n\n  Title   : max_size\n  Usage   : $csp->max_size\n  Function: get/set the size of (number of sequences in) the largest contig\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub max_size {\n  my ($self, $max_size) = @_;\n  if (defined $max_size) {\n    $self->throw(\"The contig maximum size must be strictly positive. Got \".\n      \"'$max_size'\") if $max_size < 1;\n    $self->{'_max_size'} = $max_size;\n  }\n  $max_size = $self->{'_max_size'};\n  return $max_size;\n}\n\n\n=head2 nof_overlaps\n\n  Title   : nof_overlaps\n  Usage   : $csp->nof_overlaps\n  Function: Get/Set the number of overlaps in the assembly.\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub nof_overlaps {\n  my ($self, $nof_overlaps) = @_;\n  if (defined $nof_overlaps) {\n    $self->throw(\"The number of overlaps must be strictly positive. 
Got \".\n      \"'$nof_overlaps'\") if $nof_overlaps < 1;\n    $self->{'_nof_overlaps'} = $nof_overlaps;\n  }\n  $nof_overlaps = $self->{'_nof_overlaps'};\n  return $nof_overlaps;\n}\n\n\n=head2 min_overlap\n\n  Title   : min_overlap\n  Usage   : $csp->min_overlap\n  Function: get/set the assembly minimum overlap length\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub min_overlap {\n  my ($self, $min_overlap) = @_;\n  if (defined $min_overlap) {\n    $self->throw(\"The minimum of overlap length must be strictly positive. Got\".\n      \" '$min_overlap'\") if $min_overlap < 1;\n    $self->{'_min_overlap'} = $min_overlap;\n  }\n  $min_overlap = $self->{'_min_overlap'};\n  return $min_overlap;\n}\n\n\n=head2 avg_overlap\n\n  Title   : avg_overlap\n  Usage   : $csp->avg_overlap\n  Function: get/set the assembly average overlap length\n  Returns : decimal\n  Args    : decimal [optional]\n\n\nsub avg_overlap {\n  my ($self, $avg_overlap) = @_;\n  if (defined $avg_overlap) {\n    $self->throw(\"The average overlap length must be strictly positive. Got \".\n      \"'$avg_overlap'\") if $avg_overlap < 1;\n    $self->{'_avg_overlap'} = $avg_overlap;\n  }\n  $avg_overlap = $self->{'_avg_overlap'};\n  return $avg_overlap;\n}\n\n\n=head2 min_identity\n\n  Title   : min_identity\n  Usage   : $csp->min_identity\n  Function: get/set the assembly minimum overlap identity percent\n  Returns : 0 < decimal < 100\n  Args    : 0 < decimal < 100 [optional]\n\n\nsub min_identity {\n  my ($self, $min_identity) = @_;\n  if (defined $min_identity) {\n    $self->throw(\"The minimum overlap percent identity must be strictly \".\n      \"positive. 
Got '$min_identity'\") if $min_identity < 1;\n    $self->{'_min_identity'} = $min_identity;\n  }\n  $min_identity = $self->{'_min_identity'};\n  return $min_identity;\n}\n\n\n=head2 avg_identity\n\n  Title   : avg_identity\n  Usage   : $csp->avg_identity\n  Function: get/set the assembly average overlap identity percent\n  Returns : 0 < decimal < 100\n  Args    : 0 < decimal < 100 [optional]\n\n\nsub avg_identity {\n  my ($self, $avg_identity) = @_;\n  if (defined $avg_identity) {\n    $self->throw(\"The average overlap percent identity must be strictly \".\n      \"positive. Got '$avg_identity'\") if $avg_identity < 1;\n    $self->{'_avg_identity'} = $avg_identity;\n  }\n  $avg_identity = $self->{'_avg_identity'};\n  return $avg_identity;\n}\n\n\n=head2 avg_seq_len\n\n  Title   : avg_seq_len\n  Usage   : $csp->avg_seq_len\n  Function: get/set the assembly average sequence length\n  Returns : avg_seq_len\n  Args    : real [optional]\n\n\nsub avg_seq_len {\n  my ($self, $avg_seq_len) = @_;\n  if (defined $avg_seq_len) {\n    $self->throw(\"The average sequence length must be strictly positive. Got \".\n      \"'$avg_seq_len'\") if $avg_seq_len < 1;\n    $self->{'_avg_seq_len'} = $avg_seq_len;\n  }\n  $avg_seq_len = $self->{'_avg_seq_len'};\n  return $avg_seq_len;\n}\n\n\n=head2 eff_asm_params\n\n  Title   : eff_asm_params\n  Usage   : $csp->eff_asm_params(1)\n  Function: Get/set the effective assembly parameters option. It defines if the\n            effective assembly parameters should be determined when a contig\n            spectrum based or derived from an assembly is calulated. 
The\n            effective assembly parameters include avg_seq_length, nof_overlaps,\n            min_overlap, avg_overlap, min_identity and avg_identity.\n            1 = get them, 0 = don't.\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub eff_asm_params {\n  my ($self, $eff_asm_params) = @_;\n  if (defined $eff_asm_params) {\n    $self->throw(\"eff_asm_params can only take values 0 or 1. Input value was \".\n      \"'$eff_asm_params'\") unless $eff_asm_params == 0 || $eff_asm_params == 1;\n    $self->{'_eff_asm_params'} = $eff_asm_params;\n  }\n  $eff_asm_params = $self->{'_eff_asm_params'};\n  return $eff_asm_params;\n}\n\n\n=head2 spectrum\n\n  Title   : spectrum\n  Usage   : my $spectrum = $csp->spectrum({1=>10, 2=>2, 3=>1});\n  Function: Get the current contig spectrum represented as a hash / Update a\n            contig spectrum object based on a contig spectrum represented as a\n            hash\n            The hash representation of a contig spectrum is as following:\n              key   -> contig size (in number of sequences)\n              value -> number of contigs of this size\n  Returns : contig spectrum as a hash reference\n  Args    : contig spectrum as a hash reference [optional]\n\n\nsub spectrum {\n  my ($self, $spectrum) = @_;\n  if (defined $spectrum) {\n    $self->_import_spectrum($spectrum);\n  }\n  $spectrum = $self->{'_spectrum'};\n  return $spectrum;\n}\n\n\n=head2 assembly\n\n  Title   : assembly\n  Usage   : my @asm_list = $csp->assembly();\n  Function: Get a reference to the list of assembly object reference used to\n            make the contig spectrum object / Update the contig spectrum object\n            based on an assembly object.\n  Returns : array of Bio::Assembly::Scaffold\n  Args    : Bio::Assembly::Scaffold\n\n\nsub assembly {\n  my ($self, $assembly) = @_;\n  if (defined $assembly) {\n    $self->_import_assembly($assembly);\n  }\n  my @asm_list = @{$self->{'_assembly'}} if defined $self->{'_assembly'};\n  
return \\@asm_list;\n}\n\n=head2 drop_assembly\n\n  Title   : drop_assembly\n  Usage   : $csp->drop_assembly();\n  Function: Remove all assembly objects associated with a contig spectrum.\n            Assembly objects can be big. This method allows to free some memory\n            when assembly information is not needed anymore.\n  Returns : 1 for success, 0 for failure\n  Args    : none\n\n\nsub drop_assembly {\n  my ($self) = @_;\n  $self->{'_assembly'} = [];\n  return 1;\n}\n\n=head2 dissolve\n\n  Title   : dissolve\n  Usage   : $dissolved_csp->dissolve($mixed_csp, $seq_header);\n  Function: Dissolve a mixed contig spectrum for the set of sequences that\n            contain the specified header, i.e. determine the contribution of\n            these sequences to the mixed contig spectrum based on the assembly.\n            The mixed contig spectrum object must have been created based on one\n            (or several) assembly object(s). Additionally, min_overlap and\n            min_identity must have been set (either manually using min_overlap\n            or automatically by switching on the eff_asm_params option).\n  Returns : 1 for success, 0 for failure\n  Args    : Bio::Assembly::Tools::ContigSpectrum reference\n            sequence header string\n\n\n\nsub dissolve {\n  my ($self, $mixed_csp, $seq_header) = @_;\n  $self->_import_dissolved_csp($mixed_csp, $seq_header);\n  return 1;\n}\n\n\n=head2 cross\n\n  Title   : cross\n  Usage   : $cross_csp->cross($mixed_csp);\n  Function: Calculate a cross contig_spectrum based on a mixed contig_spectrum.\n  Returns : 1 for success, 0 for failure\n  Args    : Bio::Assembly::Tools::ContigSpectrum reference\n\n\nsub cross {\n  my ($self, $mixed_csp) = @_;\n  $self->_import_cross_csp($mixed_csp);\n  return 1;\n}\n\n=head2 to_string\n\n  Title   : to_string\n  Usage   : my $csp_string = $csp->to_string;\n  Function: Convert the contig spectrum into a string (easy to print!!).\n  Returns : string\n  Args    : element 
separator (integer) [optional]\n              1 -> space-separated\n              2 -> tab-separated\n              3 -> newline-separated\n\n\nsub to_string {\n  my ($self, $element_separator) = @_;\n  return 0 if $self->{'_max_size'} == 0;\n  $element_separator ||= 1;\n  if ($element_separator == 1) {\n    $element_separator = ' ';\n  } elsif ($element_separator == 2) {\n    $element_separator = \"\\t\";\n  } elsif ($element_separator == 3) {\n    $element_separator = \"\\n\";\n  } else {\n    $self->throw(\"Unknown separator type '$element_separator'\\n\");\n  }\n  my $str = '';\n  for (my $q = 1 ; $q <= $self->{'_max_size'} ; $q++) {\n    my $val = 0;\n    if (exists $self->{'_spectrum'}{$q}) {\n      $val = $self->{'_spectrum'}{$q};\n    }\n    $str .= $val.$element_separator;\n  }\n  $str =~ s/\\s$//;\n  return $str;\n}\n\n\n=head2 add\n\n  Title   : add\n  Usage   : $csp->add($additional_csp);\n  Function: Add a contig spectrum to an existing one: sums the spectra, update\n            the number of sequences, number of repetitions, ...\n  Returns : 1 for success, 0 for failure\n  Args    : Bio::Assembly::Tools::ContigSpectrum object\n\n\nsub add {\n  my ($self, $csp) = @_;\n  # Sanity check\n  if( !ref $csp || ! $csp->isa('Bio::Assembly::Tools::ContigSpectrum') ) {\n        $self->throw(\"Unable to process non Bio::Assembly::Tools::ContigSpectrum \".\n        \"object [\".ref($csp).\"]\");\n  }\n  # Update overlap statistics\n  if ( $self->{'_eff_asm_params'} > 0 ) {\n    # Warnings\n    if ( $csp->{'_eff_asm_params'} == 0 ) {\n      $self->warn(\"The parent contig spectrum needs effective assembly \".\n        \"parameters (eff_asm_params = \".$self->{'_eff_asm_params'}.\") but the \".\n        \"child contig spectrum doesn't have them (eff_asm_params = \".\n        $csp->{'_eff_asm_params'}.\"). 
Skipping them...\");\n    } elsif ( $csp->{'_eff_asm_params'} != $self->{'_eff_asm_params'} ) {\n      $self->warn(\"The parent contig spectrum needs a different method for \".\n        \"detecting the effective assembly parameters (eff_asm_params = \".\n        $self->{'_eff_asm_params'}.\") than the one specified for the child \".\n        \"contig spectrum (eff_asm_params = \".$csp->{'_eff_asm_params'}.\"). \".\n        \"Ignoring the differences...\");\n    }\n    # Update existing stats\n    my $tot_num_overlaps = $csp->{'_nof_overlaps'} + $self->{'_nof_overlaps'};\n    $self->{'_min_overlap'} = $csp->{'_min_overlap'} if\n      defined $csp->{'_min_overlap'} && ( ! defined $self->{'_min_overlap'} ||\n      $csp->{'_min_overlap'} < $self->{'_min_overlap'} );\n    $self->{'_min_identity'} = $csp->{'_min_identity'} if\n      defined $csp->{'_min_identity'} && ( ! defined $self->{'_min_identity'} ||\n      $csp->{'_min_identity'} < $self->{'_min_identity'} );\n    if ($tot_num_overlaps != 0) {\n      $self->{'_avg_overlap'} =\n        ($csp->{'_avg_overlap'} * $csp->{'_nof_overlaps'}\n        + $self->{'_avg_overlap'} * $self->{'_nof_overlaps'})\n        / $tot_num_overlaps;\n      $self->{'_avg_identity'} =\n        ($csp->{'_avg_identity'} * $csp->{'_nof_overlaps'}\n        + $self->{'_avg_identity'} * $self->{'_nof_overlaps'})\n        / $tot_num_overlaps;\n    }\n    $self->{'_nof_overlaps'} = $tot_num_overlaps;\n  }\n  # Update sequence statistics\n  my $tot_nof_seq = $csp->{'_nof_seq'} + $self->{'_nof_seq'};\n  if (not $tot_nof_seq == 0) {\n    $self->{'_avg_seq_len'} = ($csp->{'_avg_seq_len'} * $csp->{'_nof_seq'} +\n      $self->{'_avg_seq_len'} * $self->{'_nof_seq'}) / $tot_nof_seq;\n  }\n  # Update spectrum (and nof_seq, max_size, and increment nof_rep by 1)\n  $self->_import_spectrum($csp->{'_spectrum'});\n  # Update nof_rep\n  $self->{'_nof_rep'}--;\n  $self->{'_nof_rep'} += $csp->{'_nof_rep'};\n  # Update list of assembly objects used\n  push 
@{$self->{'_assembly'}}, @{$csp->{'_assembly'}}\n    if defined $csp->{'_assembly'};\n  return 1;\n}\n\n\n=head2 average\n\n  Title   : average\n  Usage   : my $avg_csp = $csp->average([$csp1, $csp2, $csp3]);\n  Function: Average one contig spectrum or the sum of several contig spectra by\n            the number of repetitions\n  Returns : Bio::Assembly::Tools::ContigSpectrum\n  Args    : Bio::Assembly::Tools::ContigSpectrum array reference\n            eff_asm_params\n\n\nsub average {\n  my ($self, $list) = @_;\n  # Sanity check\n  if ( ! ref $list || ! ref $list eq 'ARRAY') {\n    $self->throw(\"Average takes an array reference but got [\".ref($list).\"]\");\n  }\n  # New average contig spectrum object\n  my $avg = Bio::Assembly::Tools::ContigSpectrum->new;\n  $avg->{'_eff_asm_params'} = 1;\n  \n  # Cycle through contig spectra\n  my $tot_nof_rep = 0;\n  for my $csp (@$list) {\n    # Sanity check\n    if (not $csp->isa('Bio::Assembly::Tools::ContigSpectrum')) {\n      $csp->throw(\"Unable to process non Bio::Assembly::Tools::ContigSpectrum \".\n        \"object [\".ref($csp).\"]\");\n    }\n    # Import contig spectrum\n    $avg->add($csp);\n  }\n  \n  # Average sum of contig spectra by number of repetitions\n  for (my $q = 1 ; $q <= $avg->{'_max_size'} ; $q++) {\n    $avg->{'_spectrum'}{$q} /= $avg->{'_nof_rep'}\n      if (defined $avg->{'_spectrum'}{$q});\n  }\n  # Average number of sequences\n  $avg->{'_nof_seq'} /= $avg->{'_nof_rep'};\n  # Average number of overlaps\n  $avg->{'_nof_overlaps'} /= $avg->{'_nof_rep'};\n  \n  return $avg;\n}\n\n\n=head2 score\n\n  Title   : score\n  Usage   : my $score = $csp->score();\n  Function: Score a contig spectrum (or cross-contig spectrum) such that the\n             higher the number of contigs (or cross-contigs) and the larger their \n             size, the higher the score.\n             Let n   : total number of sequences\n                 c_q : number of contigs of size q\n                 q   : number of sequence 
in a contig\n             We define: score = n/(n-1) * (X - 1/n)\n                  where X = sum ( c_q * q^2 ) / n**2\n             The score ranges from 0 (singlets only) to 1 (a single large contig)\n             It is possible to specify a value for the number of sequences to\n              assume in the contig spectrum.\n  Returns : contig score\n  Args    : number of total sequences to assume [optional]\n\n\nsub score {\n  my ($self, $nof_seqs) = @_;\n  # Main\n  my $score = 0;\n  my $n = $self->nof_seq;\n  if ( $n > 0 ) {\n    # Contig spectrum info\n    my $q_max = $self->max_size;\n    my $spec  = $self->spectrum;\n    # Adjust number of 1-contigs\n    if ( $nof_seqs ) {\n      $spec->{'1'} += $nof_seqs - $n;\n      $n = $nof_seqs;\n    }\n    # Calculate X\n    for my $q ( 1 .. $q_max ) {\n      if ( $spec->{$q} ) {\n        my $c_q = $spec->{$q};\n        $score += $c_q * $q ** 2;\n      }\n    }\n    $score /= $n ** 2; \n  }\n  # Rescale X to obtain the score\n  $score = $n/($n-1) * ($score - 1/$n);\n  return $score;\n}\n\n\n=head2 _naive_assembler\n\n  Title   : _naive_assembler\n  Usage   : \n  Function: Determines the contig spectrum (hash representation) of a subset of\n            sequences from a mixed contig spectrum by \"reassembling\" the\n            specified sequences only based on their position in the contig. This\n            naive assembly only verifies that the minimum overlap length and\n            percentage identity are respected. 
There is no actual alignment done\n  Returns : contig spectrum hash reference\n  Args    : Bio::Assembly::Contig\n            sequence ID array reference\n            minimum overlap length (integer) [optional]\n            minimum percentage identity (integer) [optional]"}},{"name":"seqlist","kind":12,"line":869},{"line":874,"name":"seqlist","kind":12},{"definition":"my","line":897,"kind":13,"localvar":"my","containerName":null,"name":"$last_is_singlet"},{"line":898,"containerName":null,"kind":13,"name":"%has_overlap"},{"kind":12,"name":"seqlist","line":898},{"name":"%max","containerName":null,"kind":13,"line":898},{"name":"$last_is_singlet","kind":13,"containerName":null,"line":899},{"kind":13,"containerName":null,"name":"%last_is_singlet","line":901},{"line":902,"kind":13,"containerName":null,"name":"%spectrum"},{"line":905,"containerName":null,"localvar":"my","kind":13,"name":"$seqid","definition":"my"},{"line":905,"kind":13,"containerName":null,"name":"%seqlist"},{"name":"%overlap_map","kind":13,"containerName":null,"line":907},{"line":907,"name":"$seqid","containerName":null,"kind":13},{"line":908,"containerName":null,"localvar":"my","kind":13,"name":"@overlist","definition":"my"},{"kind":13,"containerName":null,"name":"%overlap_map","line":908},{"containerName":null,"kind":13,"name":"$seqid","line":908},{"line":909,"name":"$j","localvar":"my","kind":13,"containerName":null,"definition":"my"},{"name":"$j","kind":13,"containerName":null,"line":909},{"kind":13,"containerName":null,"name":"@overlist","line":909},{"name":"%j","containerName":null,"kind":13,"line":909},{"line":910,"kind":13,"localvar":"my","containerName":null,"name":"$otherseqid","definition":"my"},{"containerName":null,"kind":13,"name":"@overlist","line":910},{"line":910,"name":"$j","kind":13,"containerName":null},{"name":"%overlap_map","containerName":null,"kind":13,"line":911},{"line":911,"name":"%otherseqid","containerName":null,"kind":13},{"containerName":null,"kind":13,"name":"@overlist","li
ne":912},{"containerName":null,"kind":13,"name":"%overlap_map","line":912},{"line":912,"containerName":null,"kind":13,"name":"$otherseqid"},{"containerName":null,"kind":13,"name":"%overlap_map","line":913},{"line":913,"name":"$otherseqid","kind":13,"containerName":null},{"line":917,"containerName":null,"kind":13,"name":"@overlist"},{"line":917,"name":"@overlist","containerName":null,"kind":13},{"line":918,"containerName":null,"localvar":"my","kind":13,"name":"$j","definition":"my"},{"name":"$j","kind":13,"containerName":null,"line":918},{"line":918,"name":"@overlist","containerName":null,"kind":13},{"containerName":null,"kind":13,"name":"%j","line":918},{"name":"@overlist","kind":13,"containerName":null,"line":919},{"name":"$j","containerName":null,"kind":13,"line":919},{"line":919,"name":"@overlist","containerName":null,"kind":13},{"containerName":null,"kind":13,"name":"%j","line":919},{"containerName":null,"kind":13,"name":"@overlist","line":920},{"line":920,"containerName":null,"kind":13,"name":"$j"},{"line":921,"name":"$j","containerName":null,"kind":13},{"line":925,"name":"$qsize","localvar":"my","containerName":null,"kind":13,"definition":"my"},{"line":925,"name":"@overlist","kind":13,"containerName":null},{"name":"%spectrum","containerName":null,"kind":13,"line":926},{"name":"%qsize","containerName":null,"kind":13,"line":926},{"name":"%spectrum","kind":13,"containerName":null,"line":927},{"line":927,"name":"%qsize","containerName":null,"kind":13},{"containerName":null,"kind":13,"name":"%spectrum","line":929},{"name":"$qsize","containerName":null,"kind":13,"line":929},{"line":932,"name":"%spectrum","kind":13,"containerName":null},{"line":947,"range":{"start":{"line":947,"character":0},"end":{"character":9999,"line":986}},"kind":12,"signature":{"documentation":"__END__\n#\n# BioPerl module for Bio::Assembly::Tools::ContigSpectrum\n#\n# Copyright by Florent Angly\n#\n# You may distribute this module under the same terms as Perl itself\n#\n# POD documentation - 
main docs before the code\n\n=head1 NAME\n\nBio::Assembly::Tools::ContigSpectrum - create and manipulate contig spectra\n\n=head1 SYNOPSIS\n\n  # Simple contig spectrum creation\n  my $csp1 = Bio::Assembly::Tools::ContigSpectrum->new(\n    -id       => 'csp1',\n    -spectrum => { 1 => 10,\n                   2 => 2,\n                   3 => 1 } );\n\n  # ...or another way to create a simple contig spectrum\n  my $csp2 = Bio::Assembly::Tools::ContigSpectrum->new;\n  $csp2->id('csp2');\n  $csp2->spectrum({ 1 => 20, 2 => 1, 4 => 1 });\n\n  # Get some information\n  print \"This is contig spectrum \".$csp->id.\"\\n\";\n  print \"It contains \".$csp->nof_seq.\" sequences\\n\";\n  print \"The largest contig has \".$csp->max_size.\" sequences\\n\";\n  print \"The spectrum is: \".$csp->to_string($csp->spectrum).\"\\n\";\n\n  # Let's add the contig spectra\n  my $summed_csp = Bio::Assembly::Tools::ContigSpectrum->new;\n  $summed_csp->add($csp1);\n  $summed_csp->add($csp2);\n  print \"The summed contig spectrum is \".$summed_csp->to_string.\"\\n\";\n\n  # Make an average\n  my $avg_csp = Bio::Assembly::Tools::ContigSpectrum->new;\n  $avg_csp = $avg_csp->average([$csp1, $csp2]);\n  print \"The average contig spectrum is \".$avg_csp->to_string.\"\\n\";\n\n  # Get a contig spectrum from an assembly\n  my $from_assembly = Bio::Assembly::Tools::ContigSpectrum->new(\n    -assembly       => $assembly_object,\n    -eff_asm_params => 1);\n  print \"The contig spectrum from assembly is \".$from_assembly->to_string.\"\\n\";\n\n  # Report advanced information (possible because eff_asm_params = 1)\n  print \"Average sequence length: \".$from_assembly->avg_seq_length.\" bp\\n\";\n  print \"Minimum overlap length: \".$from_assembly->min_overlap.\" bp\\n\";\n  print \"Average overlap length: \".$from_assembly->avg_overlap.\" bp\\n\";\n  print \"Minimum overlap match: \".$from_assembly->min_identity.\" %\\n\";\n  print \"Average overlap match: \".$from_assembly->avg_identity.\" %\\n\";\n\n  
# Assuming the assembly object contains sequences from several different\n  # metagenomes, we have a mixed contig spectrum from which a cross contig\n  # spectrum and dissolved contig spectra can be obtained\n  my $mixed_csp = $from_assembly;\n\n  # Calculate a dissolved contig spectrum\n  my $meta1_dissolved = Bio::Assembly::Tools::ContigSpectrum->new(\n    -dissolve => [$mixed_csp, 'metagenome1'] );\n  my $meta2_dissolved = Bio::Assembly::Tools::ContigSpectrum->new(\n    -dissolve => [$mixed_csp, 'metagenome2'] );\n  print \"The dissolved contig spectra are:\\n\".\n    $meta1_dissolved->to_string.\"\\n\".\n    $meta2_dissolved->to_string.\"\\n\";\n\n  # Determine a cross contig spectrum\n  my $cross_csp = Bio::Assembly::Tools::ContigSpectrum->new(\n    -cross => $mixed_csp );\n  print \"The cross contig spectrum is \".$cross_csp->to_string.\"\\n\";\n\n  # Score a contig spectrum (the more abundant the contigs and the larger their\n  # size, the larger the score)\n\n\n=head1 DESCRIPTION\n\nThe Bio::Assembly::Tools::ContigSpectrum Perl module enables to\nmanually create contig spectra, import them from assemblies,\nmanipulate them, transform between different types of contig spectra\nand output them.\n\nBio::Assembly::Tools::ContigSpectrum is a module to create, manipulate\nand output contig spectra, assembly-derived data used in metagenomics\n(community genomics) for diversity estimation.\n\n=head2 Background\n\nA contig spectrum is the count of the number of contigs of different\nsize in an assembly. For example, the contig spectrum [100 5 1 0 0\n...] means that there were 100 singlets (1-contigs), 5 contigs of 2\nsequences (2-contigs), 1 contig of 3 sequences (3-contig) and no\nlarger contigs.\n\nAn assembly can be produced from a mixture of sequences from different\nmetagenomes. The contig obtained from this assembly is a mixed contig\nspectrum. 
The contribution of each metagenome in this mixed contig\nspectrum can be obtained by determining a dissolved contig spectrum.\n\nFinally, based on a mixed contig spectrum, a cross contig spectrum can\nbe determined. In a cross contig spectrum, only contigs containing\nsequences from different metagenomes are kept; \"pure\" contigs are\nexcluded. Additionally, the total number of singletons (1-contigs)\nfrom each region that assembles with any fragments from other regions\nis the number of 1-contigs in the cross contig spectrum.\n\n=head2 Implementation\n\nThe simplest representation of a contig spectrum is as a hash\nrepresentation where the key is the contig size (number of sequences\nmaking up the contig) and the value the number of contigs of this\nsize.\n\nIn fact, it is useful to have more information associated with the\ncontig spectrum, hence the Bio::Assembly::Tools::ContigSpectrum module\nimplements an object containing a contig spectrum hash and additional\ninformation. The get/set methods to access them are:\n\n    id              contig spectrum ID\n    nof_seq         number of sequences\n    nof_rep         number of repetitions (assemblies) used\n    max_size        size of (number of sequences in) the largest contig\n    nof_overlaps    number of overlaps\n    min_overlap     minimum overlap length for building a contig\n    min_identity    minimum sequence identity over the overlap length\n    avg_overlap     average overlap length\n    avg_identity    average overlap identity\n    avg_seq_length  average sequence length\n    eff_asm_params  effective assembly parameters\n    spectrum        hash representation of a contig spectrum\n\n  Operations on the contig spectra:\n\n    to_string       create a string representation of the spectrum\n    spectrum        import a hash contig spectrum\n    assembly        determine a contig spectrum from an assembly\n    dissolve        calculate a dissolved contig spectrum (based on assembly)\n    cross       
    produce a cross contig spectrum (based on assembly)\n    add             add a contig spectrum to an existing one\n    average         make an average of several contig spectra\n\nWhen using operations that rely on knowing \"where\" (from what\nmetagenomes) a sequence came from (i.e. when creating a dissolved or\ncross contig spectrum), make sure that the sequences used for the\nassembly have a name header, e.g.  E<gt>metagenome1|seq1,\nE<gt>metagenome2|seq1, ...\n\n=head1 FEEDBACK\n\n=head2 Mailing Lists\n\nUser feedback is an integral part of the evolution of this and other\nBioperl modules. Send your comments and suggestions preferably to the\nBioperl mailing lists  Your participation is much appreciated.\n\n  bioperl-l@bioperl.org                  - General discussion\n  http://bioperl.org/wiki/Mailing_lists  - About the mailing lists\n\n\n=head2 Support \n\nPlease direct usage questions or support issues to the mailing list:\n\nI<bioperl-l@bioperl.org>\n\nrather than to the module maintainer directly. Many experienced and \nreponsive experts will be able look at the problem and quickly \naddress it. Please include a thorough description of the problem \nwith code and data examples if at all possible.\n\n=head2 Reporting Bugs\n\nReport bugs to the BioPerl bug tracking system to help us keep track\nthe bugs and their resolution. Bug reports can be submitted via email\nor the web:\n\n  bioperl-bugs@bio.perl.org\n  http://bugzilla.bioperl.org/\n\n=head1 AUTHOR - Florent E Angly\n\nEmail florent_dot_angly_at_gmail_dot_com\n\n=head1 APPENDIX\n\nThe rest of the documentation details each of the object\nmethods. 
Internal methods are usually preceded with a \"_\".\n\n\npackage Bio::Assembly::Tools::ContigSpectrum;\n\nuse strict;\n\nuse Bio::Root::Root;\nuse Bio::Assembly::Scaffold;\nuse Bio::SimpleAlign;\nuse Bio::LocatableSeq;\nuse Bio::Align::PairwiseStatistics;\n\nuse base 'Bio::Root::Root';\n\n\n=head2 new\n\n  Title   : new\n  Usage   : my $csp = Bio::Assembly::Tools::ContigSpectrum->new();\n              or\n            my $csp = Bio::Assembly::Tools::ContigSpectrum->new(\n              -id => 'some_name',\n              -spectrum =>  { 1 => 90 , 2 => 3 , 4 => 1 },\n            );\n              or\n            my $csp = Bio::Assembly::Tools::ContigSpectrum->new(\n              -assembly =>  $assembly_obj\n            );\n  Function: create a new contig spectrum object\n  Returns : reference to a contig spectrum object\n  Args    : none\n\n\nsub new {\n  my ($class, @args) = @_;\n  my $self = $class->SUPER::new(@args);\n  my ( $id, $nof_seq, $nof_rep, $max_size, $nof_overlaps, $min_overlap,\n    $min_identity, $avg_overlap, $avg_identity, $avg_seq_len, $spectrum,\n    $assembly, $eff_asm_params, $dissolve, $cross) = $self->_rearrange( [qw(ID\n    NOF_SEQ NOF_REP MAX_SIZE NOF_OVERLAPS MIN_OVERLAP MIN_IDENTITY AVG_OVERLAP\n    AVG_IDENTITY AVG_SEQ_LEN SPECTRUM ASSEMBLY EFF_ASM_PARAMS DISSOLVE CROSS)],\n    @args );\n\n  # First set up some defauts\n  $self->{'_id'}             = 'NoName';\n  $self->{'_nof_seq'}        = 0;\n  $self->{'_nof_rep'}        = 0;\n  $self->{'_max_size'}       = 0;\n  $self->{'_nof_overlaps'}   = 0;\n  $self->{'_min_overlap'}    = undef;\n  $self->{'_min_identity'}   = undef;\n  $self->{'_avg_overlap'}    = 0;\n  $self->{'_avg_identity'}   = 0;\n  $self->{'_avg_seq_len'}    = 0;\n  $self->{'_eff_asm_params'} = 0;\n  $self->{'_spectrum'}       = {1 => 0};  # contig spectrum hash representation\n  $self->{'_assembly'}       = []; # list of assembly objects used\n\n  # Then, according to user desires, override defaults\n  $self->{'_id'}           
  = $id             if (defined $id);\n  $self->{'_nof_seq'}        = $nof_seq        if (defined $nof_seq);\n  $self->{'_nof_rep'}        = $nof_rep        if (defined $nof_rep);\n  $self->{'_max_size'}       = $max_size       if (defined $max_size);\n  $self->{'_nof_overlaps'}   = $nof_overlaps   if (defined $nof_overlaps);\n  $self->{'_min_overlap'}    = $min_overlap    if (defined $min_overlap);\n  $self->{'_avg_overlap'}    = $avg_overlap    if (defined $avg_overlap);\n  $self->{'_min_identity'}   = $min_identity   if (defined $min_identity);\n  $self->{'_avg_identity'}   = $avg_identity   if (defined $avg_identity);\n  $self->{'_avg_seq_len'}    = $avg_seq_len    if (defined $avg_seq_len);\n  $self->{'_eff_asm_params'} = $eff_asm_params if (defined $eff_asm_params);\n\n  # Finally get stuff that can be gotten in an automated way\n  $self->_import_spectrum($spectrum) if defined($spectrum);\n  $self->_import_assembly($assembly) if defined($assembly);\n  if (defined($dissolve)) {\n    my ($mixed_csp, $header) = (@$dissolve[0], @$dissolve[1]);\n    $self->_import_dissolved_csp($mixed_csp, $header);\n  }\n  $self->_import_cross_csp($cross)   if defined($cross);\n\n  return $self;\n}\n\n\n=head2 id\n\n  Title   : id\n  Usage   : $csp->id\n  Function: get/set contig spectrum id\n  Returns : string\n  Args    : string [optional]\n\n\nsub id {\n  my ($self, $id) = @_;\n  if (defined $id) {\n    $self->{'_id'} = $id;\n  }\n  $id = $self->{'_id'};\n  return $id;\n}\n\n\n=head2 nof_seq\n\n  Title   : nof_seq\n  Usage   : $csp->nof_seq\n  Function: get/set the number of sequences making up the contig spectrum\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub nof_seq {\n  my ($self, $nof_seq) = @_;\n  if (defined $nof_seq) {\n    $self->throw(\"The number of sequences must be strictly positive. 
Got \".\n      \"'$nof_seq'\") if $nof_seq < 1;\n    $self->{'_nof_seq'} = $nof_seq;\n  }\n  $nof_seq = $self->{'_nof_seq'};\n  return $nof_seq;\n}\n\n\n=head2 nof_rep\n\n  Title   : nof_rep\n  Usage   : $csp->nof_rep\n  Function: Get/Set the number of repetitions (assemblies) used to create the \n            contig spectrum\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub nof_rep {\n  my ($self, $nof_rep) = @_;\n  if (defined $nof_rep) {\n    $self->throw(\"The number of repetitions must be strictly positive. Got \".\n      \"'$nof_rep'\") if $nof_rep < 1;\n    $self->{'_nof_rep'} = $nof_rep;\n  }\n  $nof_rep = $self->{'_nof_rep'};\n  return $nof_rep;\n}\n\n\n=head2 max_size\n\n  Title   : max_size\n  Usage   : $csp->max_size\n  Function: get/set the size of (number of sequences in) the largest contig\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub max_size {\n  my ($self, $max_size) = @_;\n  if (defined $max_size) {\n    $self->throw(\"The contig maximum size must be strictly positive. Got \".\n      \"'$max_size'\") if $max_size < 1;\n    $self->{'_max_size'} = $max_size;\n  }\n  $max_size = $self->{'_max_size'};\n  return $max_size;\n}\n\n\n=head2 nof_overlaps\n\n  Title   : nof_overlaps\n  Usage   : $csp->nof_overlaps\n  Function: Get/Set the number of overlaps in the assembly.\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub nof_overlaps {\n  my ($self, $nof_overlaps) = @_;\n  if (defined $nof_overlaps) {\n    $self->throw(\"The number of overlaps must be strictly positive. 
Got \".\n      \"'$nof_overlaps'\") if $nof_overlaps < 1;\n    $self->{'_nof_overlaps'} = $nof_overlaps;\n  }\n  $nof_overlaps = $self->{'_nof_overlaps'};\n  return $nof_overlaps;\n}\n\n\n=head2 min_overlap\n\n  Title   : min_overlap\n  Usage   : $csp->min_overlap\n  Function: get/set the assembly minimum overlap length\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub min_overlap {\n  my ($self, $min_overlap) = @_;\n  if (defined $min_overlap) {\n    $self->throw(\"The minimum of overlap length must be strictly positive. Got\".\n      \" '$min_overlap'\") if $min_overlap < 1;\n    $self->{'_min_overlap'} = $min_overlap;\n  }\n  $min_overlap = $self->{'_min_overlap'};\n  return $min_overlap;\n}\n\n\n=head2 avg_overlap\n\n  Title   : avg_overlap\n  Usage   : $csp->avg_overlap\n  Function: get/set the assembly average overlap length\n  Returns : decimal\n  Args    : decimal [optional]\n\n\nsub avg_overlap {\n  my ($self, $avg_overlap) = @_;\n  if (defined $avg_overlap) {\n    $self->throw(\"The average overlap length must be strictly positive. Got \".\n      \"'$avg_overlap'\") if $avg_overlap < 1;\n    $self->{'_avg_overlap'} = $avg_overlap;\n  }\n  $avg_overlap = $self->{'_avg_overlap'};\n  return $avg_overlap;\n}\n\n\n=head2 min_identity\n\n  Title   : min_identity\n  Usage   : $csp->min_identity\n  Function: get/set the assembly minimum overlap identity percent\n  Returns : 0 < decimal < 100\n  Args    : 0 < decimal < 100 [optional]\n\n\nsub min_identity {\n  my ($self, $min_identity) = @_;\n  if (defined $min_identity) {\n    $self->throw(\"The minimum overlap percent identity must be strictly \".\n      \"positive. 
Got '$min_identity'\") if $min_identity < 1;\n    $self->{'_min_identity'} = $min_identity;\n  }\n  $min_identity = $self->{'_min_identity'};\n  return $min_identity;\n}\n\n\n=head2 avg_identity\n\n  Title   : avg_identity\n  Usage   : $csp->avg_identity\n  Function: get/set the assembly average overlap identity percent\n  Returns : 0 < decimal < 100\n  Args    : 0 < decimal < 100 [optional]\n\n\nsub avg_identity {\n  my ($self, $avg_identity) = @_;\n  if (defined $avg_identity) {\n    $self->throw(\"The average overlap percent identity must be strictly \".\n      \"positive. Got '$avg_identity'\") if $avg_identity < 1;\n    $self->{'_avg_identity'} = $avg_identity;\n  }\n  $avg_identity = $self->{'_avg_identity'};\n  return $avg_identity;\n}\n\n\n=head2 avg_seq_len\n\n  Title   : avg_seq_len\n  Usage   : $csp->avg_seq_len\n  Function: get/set the assembly average sequence length\n  Returns : avg_seq_len\n  Args    : real [optional]\n\n\nsub avg_seq_len {\n  my ($self, $avg_seq_len) = @_;\n  if (defined $avg_seq_len) {\n    $self->throw(\"The average sequence length must be strictly positive. Got \".\n      \"'$avg_seq_len'\") if $avg_seq_len < 1;\n    $self->{'_avg_seq_len'} = $avg_seq_len;\n  }\n  $avg_seq_len = $self->{'_avg_seq_len'};\n  return $avg_seq_len;\n}\n\n\n=head2 eff_asm_params\n\n  Title   : eff_asm_params\n  Usage   : $csp->eff_asm_params(1)\n  Function: Get/set the effective assembly parameters option. It defines if the\n            effective assembly parameters should be determined when a contig\n            spectrum based or derived from an assembly is calulated. 
The\n            effective assembly parameters include avg_seq_length, nof_overlaps,\n            min_overlap, avg_overlap, min_identity and avg_identity.\n            1 = get them, 0 = don't.\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub eff_asm_params {\n  my ($self, $eff_asm_params) = @_;\n  if (defined $eff_asm_params) {\n    $self->throw(\"eff_asm_params can only take values 0 or 1. Input value was \".\n      \"'$eff_asm_params'\") unless $eff_asm_params == 0 || $eff_asm_params == 1;\n    $self->{'_eff_asm_params'} = $eff_asm_params;\n  }\n  $eff_asm_params = $self->{'_eff_asm_params'};\n  return $eff_asm_params;\n}\n\n\n=head2 spectrum\n\n  Title   : spectrum\n  Usage   : my $spectrum = $csp->spectrum({1=>10, 2=>2, 3=>1});\n  Function: Get the current contig spectrum represented as a hash / Update a\n            contig spectrum object based on a contig spectrum represented as a\n            hash\n            The hash representation of a contig spectrum is as following:\n              key   -> contig size (in number of sequences)\n              value -> number of contigs of this size\n  Returns : contig spectrum as a hash reference\n  Args    : contig spectrum as a hash reference [optional]\n\n\nsub spectrum {\n  my ($self, $spectrum) = @_;\n  if (defined $spectrum) {\n    $self->_import_spectrum($spectrum);\n  }\n  $spectrum = $self->{'_spectrum'};\n  return $spectrum;\n}\n\n\n=head2 assembly\n\n  Title   : assembly\n  Usage   : my @asm_list = $csp->assembly();\n  Function: Get a reference to the list of assembly object reference used to\n            make the contig spectrum object / Update the contig spectrum object\n            based on an assembly object.\n  Returns : array of Bio::Assembly::Scaffold\n  Args    : Bio::Assembly::Scaffold\n\n\nsub assembly {\n  my ($self, $assembly) = @_;\n  if (defined $assembly) {\n    $self->_import_assembly($assembly);\n  }\n  my @asm_list = @{$self->{'_assembly'}} if defined $self->{'_assembly'};\n  
return \\@asm_list;\n}\n\n=head2 drop_assembly\n\n  Title   : drop_assembly\n  Usage   : $csp->drop_assembly();\n  Function: Remove all assembly objects associated with a contig spectrum.\n            Assembly objects can be big. This method allows to free some memory\n            when assembly information is not needed anymore.\n  Returns : 1 for success, 0 for failure\n  Args    : none\n\n\nsub drop_assembly {\n  my ($self) = @_;\n  $self->{'_assembly'} = [];\n  return 1;\n}\n\n=head2 dissolve\n\n  Title   : dissolve\n  Usage   : $dissolved_csp->dissolve($mixed_csp, $seq_header);\n  Function: Dissolve a mixed contig spectrum for the set of sequences that\n            contain the specified header, i.e. determine the contribution of\n            these sequences to the mixed contig spectrum based on the assembly.\n            The mixed contig spectrum object must have been created based on one\n            (or several) assembly object(s). Additionally, min_overlap and\n            min_identity must have been set (either manually using min_overlap\n            or automatically by switching on the eff_asm_params option).\n  Returns : 1 for success, 0 for failure\n  Args    : Bio::Assembly::Tools::ContigSpectrum reference\n            sequence header string\n\n\n\nsub dissolve {\n  my ($self, $mixed_csp, $seq_header) = @_;\n  $self->_import_dissolved_csp($mixed_csp, $seq_header);\n  return 1;\n}\n\n\n=head2 cross\n\n  Title   : cross\n  Usage   : $cross_csp->cross($mixed_csp);\n  Function: Calculate a cross contig_spectrum based on a mixed contig_spectrum.\n  Returns : 1 for success, 0 for failure\n  Args    : Bio::Assembly::Tools::ContigSpectrum reference\n\n\nsub cross {\n  my ($self, $mixed_csp) = @_;\n  $self->_import_cross_csp($mixed_csp);\n  return 1;\n}\n\n=head2 to_string\n\n  Title   : to_string\n  Usage   : my $csp_string = $csp->to_string;\n  Function: Convert the contig spectrum into a string (easy to print!!).\n  Returns : string\n  Args    : element 
separator (integer) [optional]\n              1 -> space-separated\n              2 -> tab-separated\n              3 -> newline-separated\n\n\nsub to_string {\n  my ($self, $element_separator) = @_;\n  return 0 if $self->{'_max_size'} == 0;\n  $element_separator ||= 1;\n  if ($element_separator == 1) {\n    $element_separator = ' ';\n  } elsif ($element_separator == 2) {\n    $element_separator = \"\\t\";\n  } elsif ($element_separator == 3) {\n    $element_separator = \"\\n\";\n  } else {\n    $self->throw(\"Unknown separator type '$element_separator'\\n\");\n  }\n  my $str = '';\n  for (my $q = 1 ; $q <= $self->{'_max_size'} ; $q++) {\n    my $val = 0;\n    if (exists $self->{'_spectrum'}{$q}) {\n      $val = $self->{'_spectrum'}{$q};\n    }\n    $str .= $val.$element_separator;\n  }\n  $str =~ s/\\s$//;\n  return $str;\n}\n\n\n=head2 add\n\n  Title   : add\n  Usage   : $csp->add($additional_csp);\n  Function: Add a contig spectrum to an existing one: sums the spectra, update\n            the number of sequences, number of repetitions, ...\n  Returns : 1 for success, 0 for failure\n  Args    : Bio::Assembly::Tools::ContigSpectrum object\n\n\nsub add {\n  my ($self, $csp) = @_;\n  # Sanity check\n  if( !ref $csp || ! $csp->isa('Bio::Assembly::Tools::ContigSpectrum') ) {\n        $self->throw(\"Unable to process non Bio::Assembly::Tools::ContigSpectrum \".\n        \"object [\".ref($csp).\"]\");\n  }\n  # Update overlap statistics\n  if ( $self->{'_eff_asm_params'} > 0 ) {\n    # Warnings\n    if ( $csp->{'_eff_asm_params'} == 0 ) {\n      $self->warn(\"The parent contig spectrum needs effective assembly \".\n        \"parameters (eff_asm_params = \".$self->{'_eff_asm_params'}.\") but the \".\n        \"child contig spectrum doesn't have them (eff_asm_params = \".\n        $csp->{'_eff_asm_params'}.\"). 
Skipping them...\");\n    } elsif ( $csp->{'_eff_asm_params'} != $self->{'_eff_asm_params'} ) {\n      $self->warn(\"The parent contig spectrum needs a different method for \".\n        \"detecting the effective assembly parameters (eff_asm_params = \".\n        $self->{'_eff_asm_params'}.\") than the one specified for the child \".\n        \"contig spectrum (eff_asm_params = \".$csp->{'_eff_asm_params'}.\"). \".\n        \"Ignoring the differences...\");\n    }\n    # Update existing stats\n    my $tot_num_overlaps = $csp->{'_nof_overlaps'} + $self->{'_nof_overlaps'};\n    $self->{'_min_overlap'} = $csp->{'_min_overlap'} if\n      defined $csp->{'_min_overlap'} && ( ! defined $self->{'_min_overlap'} ||\n      $csp->{'_min_overlap'} < $self->{'_min_overlap'} );\n    $self->{'_min_identity'} = $csp->{'_min_identity'} if\n      defined $csp->{'_min_identity'} && ( ! defined $self->{'_min_identity'} ||\n      $csp->{'_min_identity'} < $self->{'_min_identity'} );\n    if ($tot_num_overlaps != 0) {\n      $self->{'_avg_overlap'} =\n        ($csp->{'_avg_overlap'} * $csp->{'_nof_overlaps'}\n        + $self->{'_avg_overlap'} * $self->{'_nof_overlaps'})\n        / $tot_num_overlaps;\n      $self->{'_avg_identity'} =\n        ($csp->{'_avg_identity'} * $csp->{'_nof_overlaps'}\n        + $self->{'_avg_identity'} * $self->{'_nof_overlaps'})\n        / $tot_num_overlaps;\n    }\n    $self->{'_nof_overlaps'} = $tot_num_overlaps;\n  }\n  # Update sequence statistics\n  my $tot_nof_seq = $csp->{'_nof_seq'} + $self->{'_nof_seq'};\n  if (not $tot_nof_seq == 0) {\n    $self->{'_avg_seq_len'} = ($csp->{'_avg_seq_len'} * $csp->{'_nof_seq'} +\n      $self->{'_avg_seq_len'} * $self->{'_nof_seq'}) / $tot_nof_seq;\n  }\n  # Update spectrum (and nof_seq, max_size, and increment nof_rep by 1)\n  $self->_import_spectrum($csp->{'_spectrum'});\n  # Update nof_rep\n  $self->{'_nof_rep'}--;\n  $self->{'_nof_rep'} += $csp->{'_nof_rep'};\n  # Update list of assembly objects used\n  push 
@{$self->{'_assembly'}}, @{$csp->{'_assembly'}}\n    if defined $csp->{'_assembly'};\n  return 1;\n}\n\n\n=head2 average\n\n  Title   : average\n  Usage   : my $avg_csp = $csp->average([$csp1, $csp2, $csp3]);\n  Function: Average one contig spectrum or the sum of several contig spectra by\n            the number of repetitions\n  Returns : Bio::Assembly::Tools::ContigSpectrum\n  Args    : Bio::Assembly::Tools::ContigSpectrum array reference\n            eff_asm_params\n\n\nsub average {\n  my ($self, $list) = @_;\n  # Sanity check\n  if ( ! ref $list || ! ref $list eq 'ARRAY') {\n    $self->throw(\"Average takes an array reference but got [\".ref($list).\"]\");\n  }\n  # New average contig spectrum object\n  my $avg = Bio::Assembly::Tools::ContigSpectrum->new;\n  $avg->{'_eff_asm_params'} = 1;\n  \n  # Cycle through contig spectra\n  my $tot_nof_rep = 0;\n  for my $csp (@$list) {\n    # Sanity check\n    if (not $csp->isa('Bio::Assembly::Tools::ContigSpectrum')) {\n      $csp->throw(\"Unable to process non Bio::Assembly::Tools::ContigSpectrum \".\n        \"object [\".ref($csp).\"]\");\n    }\n    # Import contig spectrum\n    $avg->add($csp);\n  }\n  \n  # Average sum of contig spectra by number of repetitions\n  for (my $q = 1 ; $q <= $avg->{'_max_size'} ; $q++) {\n    $avg->{'_spectrum'}{$q} /= $avg->{'_nof_rep'}\n      if (defined $avg->{'_spectrum'}{$q});\n  }\n  # Average number of sequences\n  $avg->{'_nof_seq'} /= $avg->{'_nof_rep'};\n  # Average number of overlaps\n  $avg->{'_nof_overlaps'} /= $avg->{'_nof_rep'};\n  \n  return $avg;\n}\n\n\n=head2 score\n\n  Title   : score\n  Usage   : my $score = $csp->score();\n  Function: Score a contig spectrum (or cross-contig spectrum) such that the\n             higher the number of contigs (or cross-contigs) and the larger their \n             size, the higher the score.\n             Let n   : total number of sequences\n                 c_q : number of contigs of size q\n                 q   : number of sequence 
in a contig\n             We define: score = n/(n-1) * (X - 1/n)\n                  where X = sum ( c_q * q^2 ) / n**2\n             The score ranges from 0 (singlets only) to 1 (a single large contig)\n             It is possible to specify a value for the number of sequences to\n              assume in the contig spectrum.\n  Returns : contig score\n  Args    : number of total sequences to assume [optional]\n\n\nsub score {\n  my ($self, $nof_seqs) = @_;\n  # Main\n  my $score = 0;\n  my $n = $self->nof_seq;\n  if ( $n > 0 ) {\n    # Contig spectrum info\n    my $q_max = $self->max_size;\n    my $spec  = $self->spectrum;\n    # Adjust number of 1-contigs\n    if ( $nof_seqs ) {\n      $spec->{'1'} += $nof_seqs - $n;\n      $n = $nof_seqs;\n    }\n    # Calculate X\n    for my $q ( 1 .. $q_max ) {\n      if ( $spec->{$q} ) {\n        my $c_q = $spec->{$q};\n        $score += $c_q * $q ** 2;\n      }\n    }\n    $score /= $n ** 2; \n  }\n  # Rescale X to obtain the score\n  $score = $n/($n-1) * ($score - 1/$n);\n  return $score;\n}\n\n\n=head2 _naive_assembler\n\n  Title   : _naive_assembler\n  Usage   : \n  Function: Determines the contig spectrum (hash representation) of a subset of\n            sequences from a mixed contig spectrum by \"reassembling\" the\n            specified sequences only based on their position in the contig. This\n            naive assembly only verifies that the minimum overlap length and\n            percentage identity are respected. There is no actual alignment done\n  Returns : contig spectrum hash reference\n  Args    : Bio::Assembly::Contig\n            sequence ID array reference\n            minimum overlap length (integer) [optional]\n            minimum percentage identity (integer) [optional]\n\n\nsub _naive_assembler {\n  my ($self, $contig, $seqlist, $min_overlap, $min_identity) = @_;\n  # Sanity checks\n  if ( ! ref $seqlist || ! ref($seqlist) eq 'ARRAY') {\n    $self->throw('Expecting an array reference. 
Got ['.ref($seqlist).\"] \\n\");\n  }\n  my $max = scalar @$seqlist;\n  $self->throw(\"Expecting at least 2 sequences as input for _naive_assembler\")\n    if ($max < 2);\n  # Assembly\n  my %spectrum = (1 => 0);\n  my %overlap_map;\n  my %has_overlap;\n  # Map what sequences overlap with what sequences\n  for (my $i = 0 ; $i < $max-1 ; $i++) {\n    # query sequence\n    my $qseqid = $$seqlist[$i];\n    my $qseq   = $contig->get_seq_by_name($qseqid);\n    my $is_singlet = 1;\n    for (my $j = $i+1 ; $j < $max ; $j++) {\n      # target sequence\n      my $tseqid = $$seqlist[$j];\n      my $tseq = $contig->get_seq_by_name($tseqid);\n      # try to align sequences\n      my ($aln, $overlap, $identity)\n        = $self->_overlap_alignment($contig, $qseq, $tseq, $min_overlap,\n        $min_identity);\n      # if there is no valid overlap, go to next sequence\n      next if ! defined $aln;\n      # the overlap is valid\n      $is_singlet = 0;\n      push @{$overlap_map{$qseqid}}, $tseqid;\n      $has_overlap{$tseqid} = 1;\n      $has_overlap{$qseqid} = 1;\n    }\n    # check if sequence is in previously seen overlap\n    if (exists $has_overlap{$qseqid}) {\n      $is_singlet = 0;\n    }\n    if ($is_singlet == 1) {\n      $spectrum{1}++;\n    }\n  }\n  # take care of last sequence\n  my $last_is_singlet = 1;\n  if (exists $has_overlap{$$seqlist[$max-1]}) {\n    $last_is_singlet = 0;\n  }\n  if ($last_is_singlet == 1) {\n    $spectrum{1}++;\n  }\n  # Parse overlap map\n  for my $seqid (@$seqlist) {\n    # list of sequences that should go in the contig\n    next if not exists $overlap_map{$seqid};\n    my @overlist = @{$overlap_map{$seqid}};\n    for (my $j = 0 ; $j < scalar(@overlist) ; $j++) {\n      my $otherseqid = $overlist[$j];\n      if (exists $overlap_map{$otherseqid}) {\n        push @overlist, @{$overlap_map{$otherseqid}};\n        delete $overlap_map{$otherseqid};\n      }\n    }\n    # remove duplicates from list\n    @overlist = sort @overlist;\n    for (my 
$j = 0 ; $j < scalar(@overlist)-1 ; $j++) {\n      if ( $overlist[$j] eq $overlist[$j+1] ) {\n        splice @overlist, $j, 1;\n        $j--;\n      }\n    }\n    # update spectrum with size of contig\n    my $qsize = scalar(@overlist) + 1;\n    if (defined $spectrum{$qsize}) {\n      $spectrum{$qsize}++;\n    } else {\n      $spectrum{$qsize} = 1;\n    }\n  }\n  return \\%spectrum;\n}\n\n\n=head2 _new_from_assembly\n\n  Title   : _new_from_assembly\n  Usage   : \n  Function: Creates a new contig spectrum object based solely on the result of \n            an assembly\n  Returns : Bio::Assembly::Tools::ContigSpectrum\n  Args    : Bio::Assembly::Scaffold","parameters":[{"label":"$self"},{"label":"$assemblyobj"}],"label":"_new_from_assembly($self,$assemblyobj)"},"children":[{"definition":"my","localvar":"my","kind":13,"containerName":"_new_from_assembly","name":"$self","line":950},{"line":950,"containerName":"_new_from_assembly","kind":13,"name":"$assemblyobj"},{"definition":"my","name":"$csp","localvar":"my","kind":13,"containerName":"_new_from_assembly","line":951},{"line":951,"kind":12,"containerName":"_new_from_assembly","name":"new"},{"line":953,"kind":13,"containerName":"_new_from_assembly","name":"$csp"},{"line":953,"name":"$assemblyobj","kind":13,"containerName":"_new_from_assembly"},{"line":953,"name":"id","containerName":"_new_from_assembly","kind":12},{"name":"$csp","kind":13,"containerName":"_new_from_assembly","line":956},{"kind":13,"containerName":"_new_from_assembly","name":"$self","line":956},{"line":957,"name":"$csp","kind":13,"containerName":"_new_from_assembly"},{"definition":"my","name":"$nover","localvar":"my","containerName":"_new_from_assembly","kind":13,"line":958},{"kind":13,"containerName":"_new_from_assembly","name":"$minl","line":958},{"line":958,"kind":13,"containerName":"_new_from_assembly","name":"$avgl"},{"name":"$minid","containerName":"_new_from_assembly","kind":13,"line":958},{"line":958,"name":"$avgid","kind":13,"containerName":"_new
_from_assembly"},{"line":959,"name":"$csp","containerName":"_new_from_assembly","kind":13},{"kind":12,"containerName":"_new_from_assembly","name":"_get_overlap_stats","line":959},{"kind":13,"containerName":"_new_from_assembly","name":"$assemblyobj","line":959},{"line":960,"name":"$csp","containerName":"_new_from_assembly","kind":13},{"name":"$minl","kind":13,"containerName":"_new_from_assembly","line":960},{"name":"$csp","kind":13,"containerName":"_new_from_assembly","line":961},{"line":961,"containerName":"_new_from_assembly","kind":13,"name":"$minid"},{"line":962,"kind":13,"containerName":"_new_from_assembly","name":"$csp"},{"containerName":"_new_from_assembly","kind":13,"name":"$avgl","line":962},{"line":963,"containerName":"_new_from_assembly","kind":13,"name":"$csp"},{"name":"$avgid","containerName":"_new_from_assembly","kind":13,"line":963},{"line":964,"containerName":"_new_from_assembly","kind":13,"name":"$csp"},{"line":964,"name":"$nover","kind":13,"containerName":"_new_from_assembly"},{"localvar":"my","containerName":"_new_from_assembly","kind":13,"name":"$nseq","line":967,"definition":"my"},{"containerName":"_new_from_assembly","kind":13,"name":"$avgseql","line":967},{"containerName":"_new_from_assembly","kind":13,"name":"$self","line":967},{"name":"_get_seq_stats","containerName":"_new_from_assembly","kind":12,"line":967},{"name":"$assemblyobj","containerName":"_new_from_assembly","kind":13,"line":967},{"line":968,"name":"$csp","kind":13,"containerName":"_new_from_assembly"},{"name":"$avgseql","containerName":"_new_from_assembly","kind":13,"line":968},{"line":969,"name":"$csp","containerName":"_new_from_assembly","kind":13},{"line":969,"containerName":"_new_from_assembly","kind":13,"name":"$nseq"},{"definition":"my","containerName":"_new_from_assembly","localvar":"my","kind":13,"name":"$contigobj","line":971},{"line":971,"name":"$assemblyobj","kind":13,"containerName":"_new_from_assembly"},{"line":971,"name":"all_contigs","kind":12,"containerName":"_new_f
rom_assembly"},{"definition":"my","localvar":"my","kind":13,"containerName":"_new_from_assembly","name":"$size","line":972},{"name":"$contigobj","kind":13,"containerName":"_new_from_assembly","line":972},{"line":972,"name":"num_sequences","containerName":"_new_from_assembly","kind":12},{"name":"$csp","kind":13,"containerName":"_new_from_assembly","line":973},{"containerName":"_new_from_assembly","kind":13,"name":"$size","line":973},{"name":"$csp","containerName":"_new_from_assembly","kind":13,"line":974},{"line":974,"containerName":"_new_from_assembly","kind":13,"name":"$size"},{"name":"$csp","containerName":"_new_from_assembly","kind":13,"line":976},{"name":"$size","kind":13,"containerName":"_new_from_assembly","line":976},{"containerName":"_new_from_assembly","kind":13,"name":"$csp","line":978},{"line":978,"kind":13,"containerName":"_new_from_assembly","name":"$size"},{"containerName":"_new_from_assembly","kind":13,"name":"$size","line":978},{"containerName":"_new_from_assembly","kind":13,"name":"$csp","line":978},{"definition":"my","line":980,"name":"$nof_singlets","containerName":"_new_from_assembly","localvar":"my","kind":13},{"kind":13,"containerName":"_new_from_assembly","name":"$assemblyobj","line":980},{"line":980,"name":"get_nof_singlets","containerName":"_new_from_assembly","kind":12},{"kind":13,"containerName":"_new_from_assembly","name":"$nof_singlets","line":981},{"line":982,"kind":13,"containerName":"_new_from_assembly","name":"$csp"},{"name":"$nof_singlets","kind":13,"containerName":"_new_from_assembly","line":982},{"name":"$csp","kind":13,"containerName":"_new_from_assembly","line":983},{"containerName":"_new_from_assembly","kind":13,"name":"$nof_singlets","line":983},{"line":983,"containerName":"_new_from_assembly","kind":13,"name":"$csp"},{"name":"$csp","kind":13,"containerName":"_new_from_assembly","line":986}],"name":"_new_from_assembly","containerName":"main::","definition":"sub","detail":"($self,$assemblyobj)"},{"line":951,"name":"Bio","contai
nerName":"Assembly::Tools::ContigSpectrum","kind":12},{"kind":13,"containerName":null,"name":"$assemblyobj","line":986},{"name":"%csp","kind":13,"containerName":null,"line":988},{"name":"$csp","containerName":null,"kind":13,"line":989},{"name":"_new_dissolved_csp","containerName":"main::","children":[{"line":1006,"name":"$self","localvar":"my","kind":13,"containerName":"_new_dissolved_csp","definition":"my"},{"line":1006,"name":"$mixed_csp","containerName":"_new_dissolved_csp","kind":13},{"kind":13,"containerName":"_new_dissolved_csp","name":"$seq_header","line":1006},{"name":"$mixed_csp","kind":13,"containerName":"_new_dissolved_csp","line":1011},{"name":"$self","kind":13,"containerName":"_new_dissolved_csp","line":1012},{"name":"$mixed_csp","containerName":"_new_dissolved_csp","kind":13,"line":1013},{"line":1014,"name":"$self","kind":13,"containerName":"_new_dissolved_csp"},{"kind":12,"containerName":"_new_dissolved_csp","name":"throw","line":1014},{"line":1017,"name":"$self","containerName":"_new_dissolved_csp","kind":13},{"line":1018,"kind":13,"containerName":"_new_dissolved_csp","name":"$mixed_csp"},{"name":"$self","containerName":"_new_dissolved_csp","kind":13,"line":1019},{"line":1019,"name":"throw","containerName":"_new_dissolved_csp","kind":12},{"containerName":"_new_dissolved_csp","kind":13,"name":"$mixed_csp","line":1025},{"name":"$mixed_csp","containerName":"_new_dissolved_csp","kind":13,"line":1026}],"detail":"($self,$mixed_csp,$seq_header)","definition":"sub","kind":12,"range":{"end":{"character":9999,"line":1026},"start":{"line":1005,"character":0}},"line":1005,"signature":{"label":"_new_dissolved_csp($self,$mixed_csp,$seq_header)","parameters":[{"label":"$self"},{"label":"$mixed_csp"},{"label":"$seq_header"}],"documentation":"__END__\n#\n# BioPerl module for Bio::Assembly::Tools::ContigSpectrum\n#\n# Copyright by Florent Angly\n#\n# You may distribute this module under the same terms as Perl itself\n#\n# POD documentation - main docs before the 
code\n\n=head1 NAME\n\nBio::Assembly::Tools::ContigSpectrum - create and manipulate contig spectra\n\n=head1 SYNOPSIS\n\n  # Simple contig spectrum creation\n  my $csp1 = Bio::Assembly::Tools::ContigSpectrum->new(\n    -id       => 'csp1',\n    -spectrum => { 1 => 10,\n                   2 => 2,\n                   3 => 1 } );\n\n  # ...or another way to create a simple contig spectrum\n  my $csp2 = Bio::Assembly::Tools::ContigSpectrum->new;\n  $csp2->id('csp2');\n  $csp2->spectrum({ 1 => 20, 2 => 1, 4 => 1 });\n\n  # Get some information\n  print \"This is contig spectrum \".$csp->id.\"\\n\";\n  print \"It contains \".$csp->nof_seq.\" sequences\\n\";\n  print \"The largest contig has \".$csp->max_size.\" sequences\\n\";\n  print \"The spectrum is: \".$csp->to_string($csp->spectrum).\"\\n\";\n\n  # Let's add the contig spectra\n  my $summed_csp = Bio::Assembly::Tools::ContigSpectrum->new;\n  $summed_csp->add($csp1);\n  $summed_csp->add($csp2);\n  print \"The summed contig spectrum is \".$summed_csp->to_string.\"\\n\";\n\n  # Make an average\n  my $avg_csp = Bio::Assembly::Tools::ContigSpectrum->new;\n  $avg_csp = $avg_csp->average([$csp1, $csp2]);\n  print \"The average contig spectrum is \".$avg_csp->to_string.\"\\n\";\n\n  # Get a contig spectrum from an assembly\n  my $from_assembly = Bio::Assembly::Tools::ContigSpectrum->new(\n    -assembly       => $assembly_object,\n    -eff_asm_params => 1);\n  print \"The contig spectrum from assembly is \".$from_assembly->to_string.\"\\n\";\n\n  # Report advanced information (possible because eff_asm_params = 1)\n  print \"Average sequence length: \".$from_assembly->avg_seq_length.\" bp\\n\";\n  print \"Minimum overlap length: \".$from_assembly->min_overlap.\" bp\\n\";\n  print \"Average overlap length: \".$from_assembly->avg_overlap.\" bp\\n\";\n  print \"Minimum overlap match: \".$from_assembly->min_identity.\" %\\n\";\n  print \"Average overlap match: \".$from_assembly->avg_identity.\" %\\n\";\n\n  # Assuming the 
assembly object contains sequences from several different\n  # metagenomes, we have a mixed contig spectrum from which a cross contig\n  # spectrum and dissolved contig spectra can be obtained\n  my $mixed_csp = $from_assembly;\n\n  # Calculate a dissolved contig spectrum\n  my $meta1_dissolved = Bio::Assembly::Tools::ContigSpectrum->new(\n    -dissolve => [$mixed_csp, 'metagenome1'] );\n  my $meta2_dissolved = Bio::Assembly::Tools::ContigSpectrum->new(\n    -dissolve => [$mixed_csp, 'metagenome2'] );\n  print \"The dissolved contig spectra are:\\n\".\n    $meta1_dissolved->to_string.\"\\n\".\n    $meta2_dissolved->to_string.\"\\n\";\n\n  # Determine a cross contig spectrum\n  my $cross_csp = Bio::Assembly::Tools::ContigSpectrum->new(\n    -cross => $mixed_csp );\n  print \"The cross contig spectrum is \".$cross_csp->to_string.\"\\n\";\n\n  # Score a contig spectrum (the more abundant the contigs and the larger their\n  # size, the larger the score)\n\n\n=head1 DESCRIPTION\n\nThe Bio::Assembly::Tools::ContigSpectrum Perl module enables to\nmanually create contig spectra, import them from assemblies,\nmanipulate them, transform between different types of contig spectra\nand output them.\n\nBio::Assembly::Tools::ContigSpectrum is a module to create, manipulate\nand output contig spectra, assembly-derived data used in metagenomics\n(community genomics) for diversity estimation.\n\n=head2 Background\n\nA contig spectrum is the count of the number of contigs of different\nsize in an assembly. For example, the contig spectrum [100 5 1 0 0\n...] means that there were 100 singlets (1-contigs), 5 contigs of 2\nsequences (2-contigs), 1 contig of 3 sequences (3-contig) and no\nlarger contigs.\n\nAn assembly can be produced from a mixture of sequences from different\nmetagenomes. The contig obtained from this assembly is a mixed contig\nspectrum. 
The contribution of each metagenome in this mixed contig\nspectrum can be obtained by determining a dissolved contig spectrum.\n\nFinally, based on a mixed contig spectrum, a cross contig spectrum can\nbe determined. In a cross contig spectrum, only contigs containing\nsequences from different metagenomes are kept; \"pure\" contigs are\nexcluded. Additionally, the total number of singletons (1-contigs)\nfrom each region that assembles with any fragments from other regions\nis the number of 1-contigs in the cross contig spectrum.\n\n=head2 Implementation\n\nThe simplest representation of a contig spectrum is as a hash\nrepresentation where the key is the contig size (number of sequences\nmaking up the contig) and the value the number of contigs of this\nsize.\n\nIn fact, it is useful to have more information associated with the\ncontig spectrum, hence the Bio::Assembly::Tools::ContigSpectrum module\nimplements an object containing a contig spectrum hash and additional\ninformation. The get/set methods to access them are:\n\n    id              contig spectrum ID\n    nof_seq         number of sequences\n    nof_rep         number of repetitions (assemblies) used\n    max_size        size of (number of sequences in) the largest contig\n    nof_overlaps    number of overlaps\n    min_overlap     minimum overlap length for building a contig\n    min_identity    minimum sequence identity over the overlap length\n    avg_overlap     average overlap length\n    avg_identity    average overlap identity\n    avg_seq_length  average sequence length\n    eff_asm_params  effective assembly parameters\n    spectrum        hash representation of a contig spectrum\n\n  Operations on the contig spectra:\n\n    to_string       create a string representation of the spectrum\n    spectrum        import a hash contig spectrum\n    assembly        determine a contig spectrum from an assembly\n    dissolve        calculate a dissolved contig spectrum (based on assembly)\n    cross       
    produce a cross contig spectrum (based on assembly)\n    add             add a contig spectrum to an existing one\n    average         make an average of several contig spectra\n\nWhen using operations that rely on knowing \"where\" (from what\nmetagenomes) a sequence came from (i.e. when creating a dissolved or\ncross contig spectrum), make sure that the sequences used for the\nassembly have a name header, e.g.  E<gt>metagenome1|seq1,\nE<gt>metagenome2|seq1, ...\n\n=head1 FEEDBACK\n\n=head2 Mailing Lists\n\nUser feedback is an integral part of the evolution of this and other\nBioperl modules. Send your comments and suggestions preferably to the\nBioperl mailing lists  Your participation is much appreciated.\n\n  bioperl-l@bioperl.org                  - General discussion\n  http://bioperl.org/wiki/Mailing_lists  - About the mailing lists\n\n\n=head2 Support \n\nPlease direct usage questions or support issues to the mailing list:\n\nI<bioperl-l@bioperl.org>\n\nrather than to the module maintainer directly. Many experienced and \nreponsive experts will be able look at the problem and quickly \naddress it. Please include a thorough description of the problem \nwith code and data examples if at all possible.\n\n=head2 Reporting Bugs\n\nReport bugs to the BioPerl bug tracking system to help us keep track\nthe bugs and their resolution. Bug reports can be submitted via email\nor the web:\n\n  bioperl-bugs@bio.perl.org\n  http://bugzilla.bioperl.org/\n\n=head1 AUTHOR - Florent E Angly\n\nEmail florent_dot_angly_at_gmail_dot_com\n\n=head1 APPENDIX\n\nThe rest of the documentation details each of the object\nmethods. 
Internal methods are usually preceded with a \"_\".\n\n\npackage Bio::Assembly::Tools::ContigSpectrum;\n\nuse strict;\n\nuse Bio::Root::Root;\nuse Bio::Assembly::Scaffold;\nuse Bio::SimpleAlign;\nuse Bio::LocatableSeq;\nuse Bio::Align::PairwiseStatistics;\n\nuse base 'Bio::Root::Root';\n\n\n=head2 new\n\n  Title   : new\n  Usage   : my $csp = Bio::Assembly::Tools::ContigSpectrum->new();\n              or\n            my $csp = Bio::Assembly::Tools::ContigSpectrum->new(\n              -id => 'some_name',\n              -spectrum =>  { 1 => 90 , 2 => 3 , 4 => 1 },\n            );\n              or\n            my $csp = Bio::Assembly::Tools::ContigSpectrum->new(\n              -assembly =>  $assembly_obj\n            );\n  Function: create a new contig spectrum object\n  Returns : reference to a contig spectrum object\n  Args    : none\n\n\nsub new {\n  my ($class, @args) = @_;\n  my $self = $class->SUPER::new(@args);\n  my ( $id, $nof_seq, $nof_rep, $max_size, $nof_overlaps, $min_overlap,\n    $min_identity, $avg_overlap, $avg_identity, $avg_seq_len, $spectrum,\n    $assembly, $eff_asm_params, $dissolve, $cross) = $self->_rearrange( [qw(ID\n    NOF_SEQ NOF_REP MAX_SIZE NOF_OVERLAPS MIN_OVERLAP MIN_IDENTITY AVG_OVERLAP\n    AVG_IDENTITY AVG_SEQ_LEN SPECTRUM ASSEMBLY EFF_ASM_PARAMS DISSOLVE CROSS)],\n    @args );\n\n  # First set up some defauts\n  $self->{'_id'}             = 'NoName';\n  $self->{'_nof_seq'}        = 0;\n  $self->{'_nof_rep'}        = 0;\n  $self->{'_max_size'}       = 0;\n  $self->{'_nof_overlaps'}   = 0;\n  $self->{'_min_overlap'}    = undef;\n  $self->{'_min_identity'}   = undef;\n  $self->{'_avg_overlap'}    = 0;\n  $self->{'_avg_identity'}   = 0;\n  $self->{'_avg_seq_len'}    = 0;\n  $self->{'_eff_asm_params'} = 0;\n  $self->{'_spectrum'}       = {1 => 0};  # contig spectrum hash representation\n  $self->{'_assembly'}       = []; # list of assembly objects used\n\n  # Then, according to user desires, override defaults\n  $self->{'_id'}           
  = $id             if (defined $id);\n  $self->{'_nof_seq'}        = $nof_seq        if (defined $nof_seq);\n  $self->{'_nof_rep'}        = $nof_rep        if (defined $nof_rep);\n  $self->{'_max_size'}       = $max_size       if (defined $max_size);\n  $self->{'_nof_overlaps'}   = $nof_overlaps   if (defined $nof_overlaps);\n  $self->{'_min_overlap'}    = $min_overlap    if (defined $min_overlap);\n  $self->{'_avg_overlap'}    = $avg_overlap    if (defined $avg_overlap);\n  $self->{'_min_identity'}   = $min_identity   if (defined $min_identity);\n  $self->{'_avg_identity'}   = $avg_identity   if (defined $avg_identity);\n  $self->{'_avg_seq_len'}    = $avg_seq_len    if (defined $avg_seq_len);\n  $self->{'_eff_asm_params'} = $eff_asm_params if (defined $eff_asm_params);\n\n  # Finally get stuff that can be gotten in an automated way\n  $self->_import_spectrum($spectrum) if defined($spectrum);\n  $self->_import_assembly($assembly) if defined($assembly);\n  if (defined($dissolve)) {\n    my ($mixed_csp, $header) = (@$dissolve[0], @$dissolve[1]);\n    $self->_import_dissolved_csp($mixed_csp, $header);\n  }\n  $self->_import_cross_csp($cross)   if defined($cross);\n\n  return $self;\n}\n\n\n=head2 id\n\n  Title   : id\n  Usage   : $csp->id\n  Function: get/set contig spectrum id\n  Returns : string\n  Args    : string [optional]\n\n\nsub id {\n  my ($self, $id) = @_;\n  if (defined $id) {\n    $self->{'_id'} = $id;\n  }\n  $id = $self->{'_id'};\n  return $id;\n}\n\n\n=head2 nof_seq\n\n  Title   : nof_seq\n  Usage   : $csp->nof_seq\n  Function: get/set the number of sequences making up the contig spectrum\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub nof_seq {\n  my ($self, $nof_seq) = @_;\n  if (defined $nof_seq) {\n    $self->throw(\"The number of sequences must be strictly positive. 
Got \".\n      \"'$nof_seq'\") if $nof_seq < 1;\n    $self->{'_nof_seq'} = $nof_seq;\n  }\n  $nof_seq = $self->{'_nof_seq'};\n  return $nof_seq;\n}\n\n\n=head2 nof_rep\n\n  Title   : nof_rep\n  Usage   : $csp->nof_rep\n  Function: Get/Set the number of repetitions (assemblies) used to create the \n            contig spectrum\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub nof_rep {\n  my ($self, $nof_rep) = @_;\n  if (defined $nof_rep) {\n    $self->throw(\"The number of repetitions must be strictly positive. Got \".\n      \"'$nof_rep'\") if $nof_rep < 1;\n    $self->{'_nof_rep'} = $nof_rep;\n  }\n  $nof_rep = $self->{'_nof_rep'};\n  return $nof_rep;\n}\n\n\n=head2 max_size\n\n  Title   : max_size\n  Usage   : $csp->max_size\n  Function: get/set the size of (number of sequences in) the largest contig\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub max_size {\n  my ($self, $max_size) = @_;\n  if (defined $max_size) {\n    $self->throw(\"The contig maximum size must be strictly positive. Got \".\n      \"'$max_size'\") if $max_size < 1;\n    $self->{'_max_size'} = $max_size;\n  }\n  $max_size = $self->{'_max_size'};\n  return $max_size;\n}\n\n\n=head2 nof_overlaps\n\n  Title   : nof_overlaps\n  Usage   : $csp->nof_overlaps\n  Function: Get/Set the number of overlaps in the assembly.\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub nof_overlaps {\n  my ($self, $nof_overlaps) = @_;\n  if (defined $nof_overlaps) {\n    $self->throw(\"The number of overlaps must be strictly positive. 
Got \".\n      \"'$nof_overlaps'\") if $nof_overlaps < 1;\n    $self->{'_nof_overlaps'} = $nof_overlaps;\n  }\n  $nof_overlaps = $self->{'_nof_overlaps'};\n  return $nof_overlaps;\n}\n\n\n=head2 min_overlap\n\n  Title   : min_overlap\n  Usage   : $csp->min_overlap\n  Function: get/set the assembly minimum overlap length\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub min_overlap {\n  my ($self, $min_overlap) = @_;\n  if (defined $min_overlap) {\n    $self->throw(\"The minimum of overlap length must be strictly positive. Got\".\n      \" '$min_overlap'\") if $min_overlap < 1;\n    $self->{'_min_overlap'} = $min_overlap;\n  }\n  $min_overlap = $self->{'_min_overlap'};\n  return $min_overlap;\n}\n\n\n=head2 avg_overlap\n\n  Title   : avg_overlap\n  Usage   : $csp->avg_overlap\n  Function: get/set the assembly average overlap length\n  Returns : decimal\n  Args    : decimal [optional]\n\n\nsub avg_overlap {\n  my ($self, $avg_overlap) = @_;\n  if (defined $avg_overlap) {\n    $self->throw(\"The average overlap length must be strictly positive. Got \".\n      \"'$avg_overlap'\") if $avg_overlap < 1;\n    $self->{'_avg_overlap'} = $avg_overlap;\n  }\n  $avg_overlap = $self->{'_avg_overlap'};\n  return $avg_overlap;\n}\n\n\n=head2 min_identity\n\n  Title   : min_identity\n  Usage   : $csp->min_identity\n  Function: get/set the assembly minimum overlap identity percent\n  Returns : 0 < decimal < 100\n  Args    : 0 < decimal < 100 [optional]\n\n\nsub min_identity {\n  my ($self, $min_identity) = @_;\n  if (defined $min_identity) {\n    $self->throw(\"The minimum overlap percent identity must be strictly \".\n      \"positive. 
Got '$min_identity'\") if $min_identity < 1;\n    $self->{'_min_identity'} = $min_identity;\n  }\n  $min_identity = $self->{'_min_identity'};\n  return $min_identity;\n}\n\n\n=head2 avg_identity\n\n  Title   : avg_identity\n  Usage   : $csp->avg_identity\n  Function: get/set the assembly average overlap identity percent\n  Returns : 0 < decimal < 100\n  Args    : 0 < decimal < 100 [optional]\n\n\nsub avg_identity {\n  my ($self, $avg_identity) = @_;\n  if (defined $avg_identity) {\n    $self->throw(\"The average overlap percent identity must be strictly \".\n      \"positive. Got '$avg_identity'\") if $avg_identity < 1;\n    $self->{'_avg_identity'} = $avg_identity;\n  }\n  $avg_identity = $self->{'_avg_identity'};\n  return $avg_identity;\n}\n\n\n=head2 avg_seq_len\n\n  Title   : avg_seq_len\n  Usage   : $csp->avg_seq_len\n  Function: get/set the assembly average sequence length\n  Returns : avg_seq_len\n  Args    : real [optional]\n\n\nsub avg_seq_len {\n  my ($self, $avg_seq_len) = @_;\n  if (defined $avg_seq_len) {\n    $self->throw(\"The average sequence length must be strictly positive. Got \".\n      \"'$avg_seq_len'\") if $avg_seq_len < 1;\n    $self->{'_avg_seq_len'} = $avg_seq_len;\n  }\n  $avg_seq_len = $self->{'_avg_seq_len'};\n  return $avg_seq_len;\n}\n\n\n=head2 eff_asm_params\n\n  Title   : eff_asm_params\n  Usage   : $csp->eff_asm_params(1)\n  Function: Get/set the effective assembly parameters option. It defines if the\n            effective assembly parameters should be determined when a contig\n            spectrum based or derived from an assembly is calulated. 
The\n            effective assembly parameters include avg_seq_length, nof_overlaps,\n            min_overlap, avg_overlap, min_identity and avg_identity.\n            1 = get them, 0 = don't.\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub eff_asm_params {\n  my ($self, $eff_asm_params) = @_;\n  if (defined $eff_asm_params) {\n    $self->throw(\"eff_asm_params can only take values 0 or 1. Input value was \".\n      \"'$eff_asm_params'\") unless $eff_asm_params == 0 || $eff_asm_params == 1;\n    $self->{'_eff_asm_params'} = $eff_asm_params;\n  }\n  $eff_asm_params = $self->{'_eff_asm_params'};\n  return $eff_asm_params;\n}\n\n\n=head2 spectrum\n\n  Title   : spectrum\n  Usage   : my $spectrum = $csp->spectrum({1=>10, 2=>2, 3=>1});\n  Function: Get the current contig spectrum represented as a hash / Update a\n            contig spectrum object based on a contig spectrum represented as a\n            hash\n            The hash representation of a contig spectrum is as following:\n              key   -> contig size (in number of sequences)\n              value -> number of contigs of this size\n  Returns : contig spectrum as a hash reference\n  Args    : contig spectrum as a hash reference [optional]\n\n\nsub spectrum {\n  my ($self, $spectrum) = @_;\n  if (defined $spectrum) {\n    $self->_import_spectrum($spectrum);\n  }\n  $spectrum = $self->{'_spectrum'};\n  return $spectrum;\n}\n\n\n=head2 assembly\n\n  Title   : assembly\n  Usage   : my @asm_list = $csp->assembly();\n  Function: Get a reference to the list of assembly object reference used to\n            make the contig spectrum object / Update the contig spectrum object\n            based on an assembly object.\n  Returns : array of Bio::Assembly::Scaffold\n  Args    : Bio::Assembly::Scaffold\n\n\nsub assembly {\n  my ($self, $assembly) = @_;\n  if (defined $assembly) {\n    $self->_import_assembly($assembly);\n  }\n  my @asm_list = @{$self->{'_assembly'}} if defined $self->{'_assembly'};\n  
return \\@asm_list;\n}\n\n=head2 drop_assembly\n\n  Title   : drop_assembly\n  Usage   : $csp->drop_assembly();\n  Function: Remove all assembly objects associated with a contig spectrum.\n            Assembly objects can be big. This method allows to free some memory\n            when assembly information is not needed anymore.\n  Returns : 1 for success, 0 for failure\n  Args    : none\n\n\nsub drop_assembly {\n  my ($self) = @_;\n  $self->{'_assembly'} = [];\n  return 1;\n}\n\n=head2 dissolve\n\n  Title   : dissolve\n  Usage   : $dissolved_csp->dissolve($mixed_csp, $seq_header);\n  Function: Dissolve a mixed contig spectrum for the set of sequences that\n            contain the specified header, i.e. determine the contribution of\n            these sequences to the mixed contig spectrum based on the assembly.\n            The mixed contig spectrum object must have been created based on one\n            (or several) assembly object(s). Additionally, min_overlap and\n            min_identity must have been set (either manually using min_overlap\n            or automatically by switching on the eff_asm_params option).\n  Returns : 1 for success, 0 for failure\n  Args    : Bio::Assembly::Tools::ContigSpectrum reference\n            sequence header string\n\n\n\nsub dissolve {\n  my ($self, $mixed_csp, $seq_header) = @_;\n  $self->_import_dissolved_csp($mixed_csp, $seq_header);\n  return 1;\n}\n\n\n=head2 cross\n\n  Title   : cross\n  Usage   : $cross_csp->cross($mixed_csp);\n  Function: Calculate a cross contig_spectrum based on a mixed contig_spectrum.\n  Returns : 1 for success, 0 for failure\n  Args    : Bio::Assembly::Tools::ContigSpectrum reference\n\n\nsub cross {\n  my ($self, $mixed_csp) = @_;\n  $self->_import_cross_csp($mixed_csp);\n  return 1;\n}\n\n=head2 to_string\n\n  Title   : to_string\n  Usage   : my $csp_string = $csp->to_string;\n  Function: Convert the contig spectrum into a string (easy to print!!).\n  Returns : string\n  Args    : element 
separator (integer) [optional]\n              1 -> space-separated\n              2 -> tab-separated\n              3 -> newline-separated\n\n\nsub to_string {\n  my ($self, $element_separator) = @_;\n  return 0 if $self->{'_max_size'} == 0;\n  $element_separator ||= 1;\n  if ($element_separator == 1) {\n    $element_separator = ' ';\n  } elsif ($element_separator == 2) {\n    $element_separator = \"\\t\";\n  } elsif ($element_separator == 3) {\n    $element_separator = \"\\n\";\n  } else {\n    $self->throw(\"Unknown separator type '$element_separator'\\n\");\n  }\n  my $str = '';\n  for (my $q = 1 ; $q <= $self->{'_max_size'} ; $q++) {\n    my $val = 0;\n    if (exists $self->{'_spectrum'}{$q}) {\n      $val = $self->{'_spectrum'}{$q};\n    }\n    $str .= $val.$element_separator;\n  }\n  $str =~ s/\\s$//;\n  return $str;\n}\n\n\n=head2 add\n\n  Title   : add\n  Usage   : $csp->add($additional_csp);\n  Function: Add a contig spectrum to an existing one: sums the spectra, update\n            the number of sequences, number of repetitions, ...\n  Returns : 1 for success, 0 for failure\n  Args    : Bio::Assembly::Tools::ContigSpectrum object\n\n\nsub add {\n  my ($self, $csp) = @_;\n  # Sanity check\n  if( !ref $csp || ! $csp->isa('Bio::Assembly::Tools::ContigSpectrum') ) {\n        $self->throw(\"Unable to process non Bio::Assembly::Tools::ContigSpectrum \".\n        \"object [\".ref($csp).\"]\");\n  }\n  # Update overlap statistics\n  if ( $self->{'_eff_asm_params'} > 0 ) {\n    # Warnings\n    if ( $csp->{'_eff_asm_params'} == 0 ) {\n      $self->warn(\"The parent contig spectrum needs effective assembly \".\n        \"parameters (eff_asm_params = \".$self->{'_eff_asm_params'}.\") but the \".\n        \"child contig spectrum doesn't have them (eff_asm_params = \".\n        $csp->{'_eff_asm_params'}.\"). 
Skipping them...\");\n    } elsif ( $csp->{'_eff_asm_params'} != $self->{'_eff_asm_params'} ) {\n      $self->warn(\"The parent contig spectrum needs a different method for \".\n        \"detecting the effective assembly parameters (eff_asm_params = \".\n        $self->{'_eff_asm_params'}.\") than the one specified for the child \".\n        \"contig spectrum (eff_asm_params = \".$csp->{'_eff_asm_params'}.\"). \".\n        \"Ignoring the differences...\");\n    }\n    # Update existing stats\n    my $tot_num_overlaps = $csp->{'_nof_overlaps'} + $self->{'_nof_overlaps'};\n    $self->{'_min_overlap'} = $csp->{'_min_overlap'} if\n      defined $csp->{'_min_overlap'} && ( ! defined $self->{'_min_overlap'} ||\n      $csp->{'_min_overlap'} < $self->{'_min_overlap'} );\n    $self->{'_min_identity'} = $csp->{'_min_identity'} if\n      defined $csp->{'_min_identity'} && ( ! defined $self->{'_min_identity'} ||\n      $csp->{'_min_identity'} < $self->{'_min_identity'} );\n    if ($tot_num_overlaps != 0) {\n      $self->{'_avg_overlap'} =\n        ($csp->{'_avg_overlap'} * $csp->{'_nof_overlaps'}\n        + $self->{'_avg_overlap'} * $self->{'_nof_overlaps'})\n        / $tot_num_overlaps;\n      $self->{'_avg_identity'} =\n        ($csp->{'_avg_identity'} * $csp->{'_nof_overlaps'}\n        + $self->{'_avg_identity'} * $self->{'_nof_overlaps'})\n        / $tot_num_overlaps;\n    }\n    $self->{'_nof_overlaps'} = $tot_num_overlaps;\n  }\n  # Update sequence statistics\n  my $tot_nof_seq = $csp->{'_nof_seq'} + $self->{'_nof_seq'};\n  if (not $tot_nof_seq == 0) {\n    $self->{'_avg_seq_len'} = ($csp->{'_avg_seq_len'} * $csp->{'_nof_seq'} +\n      $self->{'_avg_seq_len'} * $self->{'_nof_seq'}) / $tot_nof_seq;\n  }\n  # Update spectrum (and nof_seq, max_size, and increment nof_rep by 1)\n  $self->_import_spectrum($csp->{'_spectrum'});\n  # Update nof_rep\n  $self->{'_nof_rep'}--;\n  $self->{'_nof_rep'} += $csp->{'_nof_rep'};\n  # Update list of assembly objects used\n  push 
@{$self->{'_assembly'}}, @{$csp->{'_assembly'}}\n    if defined $csp->{'_assembly'};\n  return 1;\n}\n\n\n=head2 average\n\n  Title   : average\n  Usage   : my $avg_csp = $csp->average([$csp1, $csp2, $csp3]);\n  Function: Average one contig spectrum or the sum of several contig spectra by\n            the number of repetitions\n  Returns : Bio::Assembly::Tools::ContigSpectrum\n  Args    : Bio::Assembly::Tools::ContigSpectrum array reference\n            eff_asm_params\n\n\nsub average {\n  my ($self, $list) = @_;\n  # Sanity check\n  if ( ! ref $list || ! ref $list eq 'ARRAY') {\n    $self->throw(\"Average takes an array reference but got [\".ref($list).\"]\");\n  }\n  # New average contig spectrum object\n  my $avg = Bio::Assembly::Tools::ContigSpectrum->new;\n  $avg->{'_eff_asm_params'} = 1;\n  \n  # Cycle through contig spectra\n  my $tot_nof_rep = 0;\n  for my $csp (@$list) {\n    # Sanity check\n    if (not $csp->isa('Bio::Assembly::Tools::ContigSpectrum')) {\n      $csp->throw(\"Unable to process non Bio::Assembly::Tools::ContigSpectrum \".\n        \"object [\".ref($csp).\"]\");\n    }\n    # Import contig spectrum\n    $avg->add($csp);\n  }\n  \n  # Average sum of contig spectra by number of repetitions\n  for (my $q = 1 ; $q <= $avg->{'_max_size'} ; $q++) {\n    $avg->{'_spectrum'}{$q} /= $avg->{'_nof_rep'}\n      if (defined $avg->{'_spectrum'}{$q});\n  }\n  # Average number of sequences\n  $avg->{'_nof_seq'} /= $avg->{'_nof_rep'};\n  # Average number of overlaps\n  $avg->{'_nof_overlaps'} /= $avg->{'_nof_rep'};\n  \n  return $avg;\n}\n\n\n=head2 score\n\n  Title   : score\n  Usage   : my $score = $csp->score();\n  Function: Score a contig spectrum (or cross-contig spectrum) such that the\n             higher the number of contigs (or cross-contigs) and the larger their \n             size, the higher the score.\n             Let n   : total number of sequences\n                 c_q : number of contigs of size q\n                 q   : number of sequence 
in a contig\n             We define: score = n/(n-1) * (X - 1/n)\n                  where X = sum ( c_q * q^2 ) / n**2\n             The score ranges from 0 (singlets only) to 1 (a single large contig)\n             It is possible to specify a value for the number of sequences to\n              assume in the contig spectrum.\n  Returns : contig score\n  Args    : number of total sequences to assume [optional]\n\n\nsub score {\n  my ($self, $nof_seqs) = @_;\n  # Main\n  my $score = 0;\n  my $n = $self->nof_seq;\n  if ( $n > 0 ) {\n    # Contig spectrum info\n    my $q_max = $self->max_size;\n    my $spec  = $self->spectrum;\n    # Adjust number of 1-contigs\n    if ( $nof_seqs ) {\n      $spec->{'1'} += $nof_seqs - $n;\n      $n = $nof_seqs;\n    }\n    # Calculate X\n    for my $q ( 1 .. $q_max ) {\n      if ( $spec->{$q} ) {\n        my $c_q = $spec->{$q};\n        $score += $c_q * $q ** 2;\n      }\n    }\n    $score /= $n ** 2; \n  }\n  # Rescale X to obtain the score\n  $score = $n/($n-1) * ($score - 1/$n);\n  return $score;\n}\n\n\n=head2 _naive_assembler\n\n  Title   : _naive_assembler\n  Usage   : \n  Function: Determines the contig spectrum (hash representation) of a subset of\n            sequences from a mixed contig spectrum by \"reassembling\" the\n            specified sequences only based on their position in the contig. This\n            naive assembly only verifies that the minimum overlap length and\n            percentage identity are respected. There is no actual alignment done\n  Returns : contig spectrum hash reference\n  Args    : Bio::Assembly::Contig\n            sequence ID array reference\n            minimum overlap length (integer) [optional]\n            minimum percentage identity (integer) [optional]\n\n\nsub _naive_assembler {\n  my ($self, $contig, $seqlist, $min_overlap, $min_identity) = @_;\n  # Sanity checks\n  if ( ! ref $seqlist || ! ref($seqlist) eq 'ARRAY') {\n    $self->throw('Expecting an array reference. 
Got ['.ref($seqlist).\"] \\n\");\n  }\n  my $max = scalar @$seqlist;\n  $self->throw(\"Expecting at least 2 sequences as input for _naive_assembler\")\n    if ($max < 2);\n  # Assembly\n  my %spectrum = (1 => 0);\n  my %overlap_map;\n  my %has_overlap;\n  # Map what sequences overlap with what sequences\n  for (my $i = 0 ; $i < $max-1 ; $i++) {\n    # query sequence\n    my $qseqid = $$seqlist[$i];\n    my $qseq   = $contig->get_seq_by_name($qseqid);\n    my $is_singlet = 1;\n    for (my $j = $i+1 ; $j < $max ; $j++) {\n      # target sequence\n      my $tseqid = $$seqlist[$j];\n      my $tseq = $contig->get_seq_by_name($tseqid);\n      # try to align sequences\n      my ($aln, $overlap, $identity)\n        = $self->_overlap_alignment($contig, $qseq, $tseq, $min_overlap,\n        $min_identity);\n      # if there is no valid overlap, go to next sequence\n      next if ! defined $aln;\n      # the overlap is valid\n      $is_singlet = 0;\n      push @{$overlap_map{$qseqid}}, $tseqid;\n      $has_overlap{$tseqid} = 1;\n      $has_overlap{$qseqid} = 1;\n    }\n    # check if sequence is in previously seen overlap\n    if (exists $has_overlap{$qseqid}) {\n      $is_singlet = 0;\n    }\n    if ($is_singlet == 1) {\n      $spectrum{1}++;\n    }\n  }\n  # take care of last sequence\n  my $last_is_singlet = 1;\n  if (exists $has_overlap{$$seqlist[$max-1]}) {\n    $last_is_singlet = 0;\n  }\n  if ($last_is_singlet == 1) {\n    $spectrum{1}++;\n  }\n  # Parse overlap map\n  for my $seqid (@$seqlist) {\n    # list of sequences that should go in the contig\n    next if not exists $overlap_map{$seqid};\n    my @overlist = @{$overlap_map{$seqid}};\n    for (my $j = 0 ; $j < scalar(@overlist) ; $j++) {\n      my $otherseqid = $overlist[$j];\n      if (exists $overlap_map{$otherseqid}) {\n        push @overlist, @{$overlap_map{$otherseqid}};\n        delete $overlap_map{$otherseqid};\n      }\n    }\n    # remove duplicates from list\n    @overlist = sort @overlist;\n    for (my 
$j = 0 ; $j < scalar(@overlist)-1 ; $j++) {\n      if ( $overlist[$j] eq $overlist[$j+1] ) {\n        splice @overlist, $j, 1;\n        $j--;\n      }\n    }\n    # update spectrum with size of contig\n    my $qsize = scalar(@overlist) + 1;\n    if (defined $spectrum{$qsize}) {\n      $spectrum{$qsize}++;\n    } else {\n      $spectrum{$qsize} = 1;\n    }\n  }\n  return \\%spectrum;\n}\n\n\n=head2 _new_from_assembly\n\n  Title   : _new_from_assembly\n  Usage   : \n  Function: Creates a new contig spectrum object based solely on the result of \n            an assembly\n  Returns : Bio::Assembly::Tools::ContigSpectrum\n  Args    : Bio::Assembly::Scaffold\n\n\nsub _new_from_assembly {\n  # Create new contig spectrum object based purely on what we can get from the\n  # assembly object\n  my ($self, $assemblyobj) = @_;\n  my $csp = Bio::Assembly::Tools::ContigSpectrum->new();\n  # 1: Set id\n  $csp->{'_id'} = $assemblyobj->id;\n  # 2: Set overlap statistics: nof_overlaps, min_overlap, avg_overlap,\n  #  min_identity and avg_identity\n  $csp->{'_eff_asm_params'} = $self->{'_eff_asm_params'};\n  if ($csp->{'_eff_asm_params'} > 0) {\n     my ($nover, $minl, $avgl, $minid, $avgid)\n       = $csp->_get_overlap_stats($assemblyobj);\n     $csp->{'_min_overlap'}  = $minl;\n     $csp->{'_min_identity'} = $minid;\n     $csp->{'_avg_overlap'}  = $avgl;\n     $csp->{'_avg_identity'} = $avgid;\n     $csp->{'_nof_overlaps'} = $nover;\n  }\n  # 3: Set sequence statistics: nof_seq and avg_seq_len\n  my ($nseq, $avgseql) = $self->_get_seq_stats($assemblyobj);\n  $csp->{'_avg_seq_len'} = $avgseql;\n  $csp->{'_nof_seq'}     = $nseq;\n  # 4: Set the spectrum: spectrum and max_size\n  for my $contigobj ($assemblyobj->all_contigs) {\n    my $size = $contigobj->num_sequences;\n    if (defined $csp->{'_spectrum'}{$size}) {\n      $csp->{'_spectrum'}{$size}++;\n    } else {\n      $csp->{'_spectrum'}{$size} = 1;\n    }\n    $csp->{'_max_size'} = $size if $size > $csp->{'_max_size'};\n  }\n  my 
$nof_singlets = $assemblyobj->get_nof_singlets();\n  if (defined $nof_singlets) {\n    $csp->{'_spectrum'}{1} += $nof_singlets;\n    $csp->{'_max_size'} = 1 if $nof_singlets >= 1 && $csp->{'_max_size'} < 1;\n  }\n  # 5: Set list of assembly objects used\n  push @{$csp->{'_assembly'}}, $assemblyobj;\n  # 6: Set number of repetitions\n  $csp->{'_nof_rep'} = 1;\n  return $csp;\n}\n\n\n\n=head2 _new_dissolved_csp\n\n  Title   : \n  Usage   : create a dissolved contig spectrum object\n  Function: \n  Returns : \n  Args    : "}},{"kind":13,"containerName":null,"name":"$self","line":1027},{"line":1027,"name":"throw","containerName":"main::","kind":12},{"name":"$dissolved","localvar":"my","containerName":null,"kind":13,"line":1032,"definition":"my"},{"name":"Bio","containerName":"Assembly::Tools::ContigSpectrum","kind":12,"line":1032},{"line":1032,"name":"new","containerName":"main::","kind":12},{"name":"%self","containerName":null,"kind":13,"line":1035},{"line":1036,"kind":13,"containerName":null,"name":"%dissolved"},{"containerName":null,"kind":13,"name":"%self","line":1036},{"name":"%dissolved","containerName":null,"kind":13,"line":1038},{"line":1038,"containerName":null,"kind":13,"name":"%mixed_csp"},{"containerName":null,"kind":13,"name":"%self","line":1040},{"kind":13,"containerName":null,"name":"%self","line":1040},{"kind":13,"containerName":null,"name":"%dissolved","line":1041},{"kind":13,"containerName":null,"name":"%dissolved","line":1041},{"kind":13,"containerName":null,"name":"%self","line":1042},{"name":"%self","kind":13,"containerName":null,"line":1042},{"containerName":null,"kind":13,"name":"%dissolved","line":1044},{"line":1044,"containerName":null,"kind":13,"name":"%dissolved"},{"line":1045,"containerName":null,"kind":13,"name":"%mixed_csp"},{"line":1045,"name":"%mixed_csp","kind":13,"containerName":null},{"line":1049,"kind":13,"localvar":"my","containerName":null,"name":"$assembly","definition":"my"},{"containerName":null,"kind":13,"name":"%mixed_csp","lin
e":1049},{"definition":"my","line":1051,"name":"%asm_spectrum","localvar":"my","kind":13,"containerName":null},{"line":1052,"kind":13,"localvar":"my","containerName":null,"name":"%good_seqs","definition":"my"},{"definition":"my","name":"$contig","localvar":"my","containerName":null,"kind":13,"line":1054},{"line":1054,"name":"$assembly","kind":13,"containerName":null},{"name":"all_contigs","kind":12,"containerName":"main::","line":1054},{"definition":"my","line":1056,"localvar":"my","containerName":null,"kind":13,"name":"@contig_seqs"},{"definition":"my","line":1057,"name":"$seq","localvar":"my","kind":13,"containerName":null},{"line":1057,"name":"$contig","containerName":null,"kind":13},{"line":1057,"kind":12,"containerName":"main::","name":"each_seq"},{"definition":"my","name":"$seq_id","localvar":"my","containerName":null,"kind":13,"line":1058},{"line":1058,"containerName":null,"kind":13,"name":"$seq"},{"name":"id","kind":12,"containerName":"main::","line":1058},{"line":1060,"name":"$seq_id","containerName":null,"kind":13},{"containerName":null,"kind":13,"name":"@contig_seqs","line":1062},{"name":"$seq_id","containerName":null,"kind":13,"line":1062},{"kind":13,"containerName":null,"name":"%good_seqs","line":1063},{"line":1063,"name":"$seq_id","kind":13,"containerName":null},{"name":"$size","localvar":"my","kind":13,"containerName":null,"line":1066,"definition":"my"},{"line":1066,"name":"@contig_seqs","kind":13,"containerName":null},{"line":1067,"kind":13,"containerName":null,"name":"%size"},{"line":1069,"name":"%size","kind":13,"containerName":null},{"containerName":null,"kind":13,"name":"%asm_spectrum","line":1070},{"line":1071,"kind":13,"containerName":null,"name":"%size"},{"name":"$contig_spectrum","localvar":"my","kind":13,"containerName":null,"line":1073,"definition":"my"},{"kind":13,"containerName":null,"name":"$dissolved","line":1073},{"line":1073,"name":"_naive_assembler","kind":12,"containerName":"main::"},{"name":"$contig","containerName":null,"kind":13,
"line":1074},{"line":1074,"name":"@contig_seqs","kind":13,"containerName":null},{"name":"%dissolved","containerName":null,"kind":13,"line":1074},{"line":1075,"containerName":null,"kind":13,"name":"%dissolved"},{"definition":"my","line":1077,"name":"$qsize","localvar":"my","containerName":null,"kind":13},{"name":"%contig_spectrum","kind":13,"containerName":null,"line":1077},{"name":"%asm_spectrum","containerName":null,"kind":13,"line":1078},{"containerName":null,"kind":13,"name":"$qsize","line":1078},{"line":1078,"kind":12,"name":"contig_spectrum"},{"containerName":null,"kind":13,"name":"%qsize","line":1078},{"name":"$self","containerName":null,"kind":13,"line":1081},{"containerName":"main::","kind":12,"name":"throw","line":1081},{"line":1085,"name":"$singlet","containerName":null,"localvar":"my","kind":13,"definition":"my"},{"line":1085,"containerName":null,"kind":13,"name":"$assembly"},{"line":1085,"name":"all_singlets","containerName":"main::","kind":12},{"definition":"my","line":1086,"localvar":"my","containerName":null,"kind":13,"name":"$seq_id"},{"kind":13,"containerName":null,"name":"$singlet","line":1086},{"name":"seqref","kind":12,"containerName":"main::","line":1086},{"name":"id","kind":12,"containerName":"main::","line":1086},{"kind":13,"containerName":null,"name":"$seq_id","line":1088},{"line":1090,"name":"%good_seqs","kind":13,"containerName":null},{"name":"$seq_id","containerName":null,"kind":13,"line":1090},{"line":1092,"name":"%asm_spectrum","containerName":null,"kind":13},{"line":1095,"name":"$dissolved","containerName":null,"kind":13},{"name":"_import_spectrum","kind":12,"containerName":"main::","line":1095},{"kind":13,"containerName":null,"name":"%asm_spectrum","line":1095},{"name":"%dissolved","kind":13,"containerName":null,"line":1097},{"line":1098,"name":"%dissolved","containerName":null,"kind":13},{"line":1098,"name":"%mixed_csp","kind":13,"containerName":null},{"line":1101,"name":"$nseq","containerName":null,"localvar":"my","kind":13,"definiti
on":"my"},{"line":1101,"name":"$avgseql","kind":13,"containerName":null},{"line":1101,"containerName":null,"kind":13,"name":"$dissolved"},{"name":"_get_seq_stats","containerName":"main::","kind":12,"line":1101},{"line":1101,"name":"$assembly","kind":13,"containerName":null},{"name":"%good_seqs","kind":13,"containerName":null,"line":1101},{"containerName":null,"kind":13,"name":"%dissolved","line":1102},{"line":1102,"name":"$avgseql","containerName":null,"kind":13},{"kind":13,"containerName":null,"name":"%dissolved","line":1103},{"line":1103,"name":"$nseq","containerName":null,"kind":13},{"containerName":null,"kind":13,"name":"%dissolved","line":1106},{"definition":"my","line":1107,"localvar":"my","kind":13,"containerName":null,"name":"$nover"},{"line":1107,"name":"$minl","containerName":null,"kind":13},{"containerName":null,"kind":13,"name":"$avgl","line":1107},{"kind":13,"containerName":null,"name":"$minid","line":1107},{"line":1107,"containerName":null,"kind":13,"name":"$avgid"},{"line":1108,"name":"$dissolved","containerName":null,"kind":13},{"name":"_get_overlap_stats","containerName":"main::","kind":12,"line":1108},{"line":1108,"name":"$assembly","kind":13,"containerName":null},{"kind":13,"containerName":null,"name":"%good_seqs","line":1108},{"name":"%dissolved","kind":13,"containerName":null,"line":1109},{"line":1109,"containerName":null,"kind":13,"name":"$minl"},{"name":"%dissolved","kind":13,"containerName":null,"line":1110},{"line":1110,"name":"$minid","kind":13,"containerName":null},{"name":"%dissolved","kind":13,"containerName":null,"line":1111},{"name":"$avgl","containerName":null,"kind":13,"line":1111},{"line":1112,"name":"%dissolved","kind":13,"containerName":null},{"name":"$avgid","kind":13,"containerName":null,"line":1112},{"name":"%dissolved","containerName":null,"kind":13,"line":1113},{"line":1113,"name":"$nover","kind":13,"containerName":null},{"line":1117,"kind":13,"containerName":null,"name":"$dissolved"},{"kind":12,"range":{"start":{"line":1132,
"character":0},"end":{"line":1137,"character":9999}},"line":1132,"signature":{"label":"_new_cross_csp($self,$mixed_csp)","parameters":[{"label":"$self"},{"label":"$mixed_csp"}],"documentation":"__END__\n#\n# BioPerl module for Bio::Assembly::Tools::ContigSpectrum\n#\n# Copyright by Florent Angly\n#\n# You may distribute this module under the same terms as Perl itself\n#\n# POD documentation - main docs before the code\n\n=head1 NAME\n\nBio::Assembly::Tools::ContigSpectrum - create and manipulate contig spectra\n\n=head1 SYNOPSIS\n\n  # Simple contig spectrum creation\n  my $csp1 = Bio::Assembly::Tools::ContigSpectrum->new(\n    -id       => 'csp1',\n    -spectrum => { 1 => 10,\n                   2 => 2,\n                   3 => 1 } );\n\n  # ...or another way to create a simple contig spectrum\n  my $csp2 = Bio::Assembly::Tools::ContigSpectrum->new;\n  $csp2->id('csp2');\n  $csp2->spectrum({ 1 => 20, 2 => 1, 4 => 1 });\n\n  # Get some information\n  print \"This is contig spectrum \".$csp->id.\"\\n\";\n  print \"It contains \".$csp->nof_seq.\" sequences\\n\";\n  print \"The largest contig has \".$csp->max_size.\" sequences\\n\";\n  print \"The spectrum is: \".$csp->to_string($csp->spectrum).\"\\n\";\n\n  # Let's add the contig spectra\n  my $summed_csp = Bio::Assembly::Tools::ContigSpectrum->new;\n  $summed_csp->add($csp1);\n  $summed_csp->add($csp2);\n  print \"The summed contig spectrum is \".$summed_csp->to_string.\"\\n\";\n\n  # Make an average\n  my $avg_csp = Bio::Assembly::Tools::ContigSpectrum->new;\n  $avg_csp = $avg_csp->average([$csp1, $csp2]);\n  print \"The average contig spectrum is \".$avg_csp->to_string.\"\\n\";\n\n  # Get a contig spectrum from an assembly\n  my $from_assembly = Bio::Assembly::Tools::ContigSpectrum->new(\n    -assembly       => $assembly_object,\n    -eff_asm_params => 1);\n  print \"The contig spectrum from assembly is \".$from_assembly->to_string.\"\\n\";\n\n  # Report advanced information (possible because eff_asm_params = 1)\n 
 print \"Average sequence length: \".$from_assembly->avg_seq_length.\" bp\\n\";\n  print \"Minimum overlap length: \".$from_assembly->min_overlap.\" bp\\n\";\n  print \"Average overlap length: \".$from_assembly->avg_overlap.\" bp\\n\";\n  print \"Minimum overlap match: \".$from_assembly->min_identity.\" %\\n\";\n  print \"Average overlap match: \".$from_assembly->avg_identity.\" %\\n\";\n\n  # Assuming the assembly object contains sequences from several different\n  # metagenomes, we have a mixed contig spectrum from which a cross contig\n  # spectrum and dissolved contig spectra can be obtained\n  my $mixed_csp = $from_assembly;\n\n  # Calculate a dissolved contig spectrum\n  my $meta1_dissolved = Bio::Assembly::Tools::ContigSpectrum->new(\n    -dissolve => [$mixed_csp, 'metagenome1'] );\n  my $meta2_dissolved = Bio::Assembly::Tools::ContigSpectrum->new(\n    -dissolve => [$mixed_csp, 'metagenome2'] );\n  print \"The dissolved contig spectra are:\\n\".\n    $meta1_dissolved->to_string.\"\\n\".\n    $meta2_dissolved->to_string.\"\\n\";\n\n  # Determine a cross contig spectrum\n  my $cross_csp = Bio::Assembly::Tools::ContigSpectrum->new(\n    -cross => $mixed_csp );\n  print \"The cross contig spectrum is \".$cross_csp->to_string.\"\\n\";\n\n  # Score a contig spectrum (the more abundant the contigs and the larger their\n  # size, the larger the score)\n\n\n=head1 DESCRIPTION\n\nThe Bio::Assembly::Tools::ContigSpectrum Perl module enables to\nmanually create contig spectra, import them from assemblies,\nmanipulate them, transform between different types of contig spectra\nand output them.\n\nBio::Assembly::Tools::ContigSpectrum is a module to create, manipulate\nand output contig spectra, assembly-derived data used in metagenomics\n(community genomics) for diversity estimation.\n\n=head2 Background\n\nA contig spectrum is the count of the number of contigs of different\nsize in an assembly. For example, the contig spectrum [100 5 1 0 0\n...] 
means that there were 100 singlets (1-contigs), 5 contigs of 2\nsequences (2-contigs), 1 contig of 3 sequences (3-contig) and no\nlarger contigs.\n\nAn assembly can be produced from a mixture of sequences from different\nmetagenomes. The contig obtained from this assembly is a mixed contig\nspectrum. The contribution of each metagenome in this mixed contig\nspectrum can be obtained by determining a dissolved contig spectrum.\n\nFinally, based on a mixed contig spectrum, a cross contig spectrum can\nbe determined. In a cross contig spectrum, only contigs containing\nsequences from different metagenomes are kept; \"pure\" contigs are\nexcluded. Additionally, the total number of singletons (1-contigs)\nfrom each region that assembles with any fragments from other regions\nis the number of 1-contigs in the cross contig spectrum.\n\n=head2 Implementation\n\nThe simplest representation of a contig spectrum is as a hash\nrepresentation where the key is the contig size (number of sequences\nmaking up the contig) and the value the number of contigs of this\nsize.\n\nIn fact, it is useful to have more information associated with the\ncontig spectrum, hence the Bio::Assembly::Tools::ContigSpectrum module\nimplements an object containing a contig spectrum hash and additional\ninformation. 
The get/set methods to access them are:\n\n    id              contig spectrum ID\n    nof_seq         number of sequences\n    nof_rep         number of repetitions (assemblies) used\n    max_size        size of (number of sequences in) the largest contig\n    nof_overlaps    number of overlaps\n    min_overlap     minimum overlap length for building a contig\n    min_identity    minimum sequence identity over the overlap length\n    avg_overlap     average overlap length\n    avg_identity    average overlap identity\n    avg_seq_length  average sequence length\n    eff_asm_params  effective assembly parameters\n    spectrum        hash representation of a contig spectrum\n\n  Operations on the contig spectra:\n\n    to_string       create a string representation of the spectrum\n    spectrum        import a hash contig spectrum\n    assembly        determine a contig spectrum from an assembly\n    dissolve        calculate a dissolved contig spectrum (based on assembly)\n    cross           produce a cross contig spectrum (based on assembly)\n    add             add a contig spectrum to an existing one\n    average         make an average of several contig spectra\n\nWhen using operations that rely on knowing \"where\" (from what\nmetagenomes) a sequence came from (i.e. when creating a dissolved or\ncross contig spectrum), make sure that the sequences used for the\nassembly have a name header, e.g.  E<gt>metagenome1|seq1,\nE<gt>metagenome2|seq1, ...\n\n=head1 FEEDBACK\n\n=head2 Mailing Lists\n\nUser feedback is an integral part of the evolution of this and other\nBioperl modules. 
Send your comments and suggestions preferably to the\nBioperl mailing lists  Your participation is much appreciated.\n\n  bioperl-l@bioperl.org                  - General discussion\n  http://bioperl.org/wiki/Mailing_lists  - About the mailing lists\n\n\n=head2 Support \n\nPlease direct usage questions or support issues to the mailing list:\n\nI<bioperl-l@bioperl.org>\n\nrather than to the module maintainer directly. Many experienced and \nreponsive experts will be able look at the problem and quickly \naddress it. Please include a thorough description of the problem \nwith code and data examples if at all possible.\n\n=head2 Reporting Bugs\n\nReport bugs to the BioPerl bug tracking system to help us keep track\nthe bugs and their resolution. Bug reports can be submitted via email\nor the web:\n\n  bioperl-bugs@bio.perl.org\n  http://bugzilla.bioperl.org/\n\n=head1 AUTHOR - Florent E Angly\n\nEmail florent_dot_angly_at_gmail_dot_com\n\n=head1 APPENDIX\n\nThe rest of the documentation details each of the object\nmethods. 
Internal methods are usually preceded with a \"_\".\n\n\npackage Bio::Assembly::Tools::ContigSpectrum;\n\nuse strict;\n\nuse Bio::Root::Root;\nuse Bio::Assembly::Scaffold;\nuse Bio::SimpleAlign;\nuse Bio::LocatableSeq;\nuse Bio::Align::PairwiseStatistics;\n\nuse base 'Bio::Root::Root';\n\n\n=head2 new\n\n  Title   : new\n  Usage   : my $csp = Bio::Assembly::Tools::ContigSpectrum->new();\n              or\n            my $csp = Bio::Assembly::Tools::ContigSpectrum->new(\n              -id => 'some_name',\n              -spectrum =>  { 1 => 90 , 2 => 3 , 4 => 1 },\n            );\n              or\n            my $csp = Bio::Assembly::Tools::ContigSpectrum->new(\n              -assembly =>  $assembly_obj\n            );\n  Function: create a new contig spectrum object\n  Returns : reference to a contig spectrum object\n  Args    : none\n\n\nsub new {\n  my ($class, @args) = @_;\n  my $self = $class->SUPER::new(@args);\n  my ( $id, $nof_seq, $nof_rep, $max_size, $nof_overlaps, $min_overlap,\n    $min_identity, $avg_overlap, $avg_identity, $avg_seq_len, $spectrum,\n    $assembly, $eff_asm_params, $dissolve, $cross) = $self->_rearrange( [qw(ID\n    NOF_SEQ NOF_REP MAX_SIZE NOF_OVERLAPS MIN_OVERLAP MIN_IDENTITY AVG_OVERLAP\n    AVG_IDENTITY AVG_SEQ_LEN SPECTRUM ASSEMBLY EFF_ASM_PARAMS DISSOLVE CROSS)],\n    @args );\n\n  # First set up some defauts\n  $self->{'_id'}             = 'NoName';\n  $self->{'_nof_seq'}        = 0;\n  $self->{'_nof_rep'}        = 0;\n  $self->{'_max_size'}       = 0;\n  $self->{'_nof_overlaps'}   = 0;\n  $self->{'_min_overlap'}    = undef;\n  $self->{'_min_identity'}   = undef;\n  $self->{'_avg_overlap'}    = 0;\n  $self->{'_avg_identity'}   = 0;\n  $self->{'_avg_seq_len'}    = 0;\n  $self->{'_eff_asm_params'} = 0;\n  $self->{'_spectrum'}       = {1 => 0};  # contig spectrum hash representation\n  $self->{'_assembly'}       = []; # list of assembly objects used\n\n  # Then, according to user desires, override defaults\n  $self->{'_id'}           
  = $id             if (defined $id);\n  $self->{'_nof_seq'}        = $nof_seq        if (defined $nof_seq);\n  $self->{'_nof_rep'}        = $nof_rep        if (defined $nof_rep);\n  $self->{'_max_size'}       = $max_size       if (defined $max_size);\n  $self->{'_nof_overlaps'}   = $nof_overlaps   if (defined $nof_overlaps);\n  $self->{'_min_overlap'}    = $min_overlap    if (defined $min_overlap);\n  $self->{'_avg_overlap'}    = $avg_overlap    if (defined $avg_overlap);\n  $self->{'_min_identity'}   = $min_identity   if (defined $min_identity);\n  $self->{'_avg_identity'}   = $avg_identity   if (defined $avg_identity);\n  $self->{'_avg_seq_len'}    = $avg_seq_len    if (defined $avg_seq_len);\n  $self->{'_eff_asm_params'} = $eff_asm_params if (defined $eff_asm_params);\n\n  # Finally get stuff that can be gotten in an automated way\n  $self->_import_spectrum($spectrum) if defined($spectrum);\n  $self->_import_assembly($assembly) if defined($assembly);\n  if (defined($dissolve)) {\n    my ($mixed_csp, $header) = (@$dissolve[0], @$dissolve[1]);\n    $self->_import_dissolved_csp($mixed_csp, $header);\n  }\n  $self->_import_cross_csp($cross)   if defined($cross);\n\n  return $self;\n}\n\n\n=head2 id\n\n  Title   : id\n  Usage   : $csp->id\n  Function: get/set contig spectrum id\n  Returns : string\n  Args    : string [optional]\n\n\nsub id {\n  my ($self, $id) = @_;\n  if (defined $id) {\n    $self->{'_id'} = $id;\n  }\n  $id = $self->{'_id'};\n  return $id;\n}\n\n\n=head2 nof_seq\n\n  Title   : nof_seq\n  Usage   : $csp->nof_seq\n  Function: get/set the number of sequences making up the contig spectrum\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub nof_seq {\n  my ($self, $nof_seq) = @_;\n  if (defined $nof_seq) {\n    $self->throw(\"The number of sequences must be strictly positive. 
Got \".\n      \"'$nof_seq'\") if $nof_seq < 1;\n    $self->{'_nof_seq'} = $nof_seq;\n  }\n  $nof_seq = $self->{'_nof_seq'};\n  return $nof_seq;\n}\n\n\n=head2 nof_rep\n\n  Title   : nof_rep\n  Usage   : $csp->nof_rep\n  Function: Get/Set the number of repetitions (assemblies) used to create the \n            contig spectrum\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub nof_rep {\n  my ($self, $nof_rep) = @_;\n  if (defined $nof_rep) {\n    $self->throw(\"The number of repetitions must be strictly positive. Got \".\n      \"'$nof_rep'\") if $nof_rep < 1;\n    $self->{'_nof_rep'} = $nof_rep;\n  }\n  $nof_rep = $self->{'_nof_rep'};\n  return $nof_rep;\n}\n\n\n=head2 max_size\n\n  Title   : max_size\n  Usage   : $csp->max_size\n  Function: get/set the size of (number of sequences in) the largest contig\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub max_size {\n  my ($self, $max_size) = @_;\n  if (defined $max_size) {\n    $self->throw(\"The contig maximum size must be strictly positive. Got \".\n      \"'$max_size'\") if $max_size < 1;\n    $self->{'_max_size'} = $max_size;\n  }\n  $max_size = $self->{'_max_size'};\n  return $max_size;\n}\n\n\n=head2 nof_overlaps\n\n  Title   : nof_overlaps\n  Usage   : $csp->nof_overlaps\n  Function: Get/Set the number of overlaps in the assembly.\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub nof_overlaps {\n  my ($self, $nof_overlaps) = @_;\n  if (defined $nof_overlaps) {\n    $self->throw(\"The number of overlaps must be strictly positive. 
Got \".\n      \"'$nof_overlaps'\") if $nof_overlaps < 1;\n    $self->{'_nof_overlaps'} = $nof_overlaps;\n  }\n  $nof_overlaps = $self->{'_nof_overlaps'};\n  return $nof_overlaps;\n}\n\n\n=head2 min_overlap\n\n  Title   : min_overlap\n  Usage   : $csp->min_overlap\n  Function: get/set the assembly minimum overlap length\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub min_overlap {\n  my ($self, $min_overlap) = @_;\n  if (defined $min_overlap) {\n    $self->throw(\"The minimum of overlap length must be strictly positive. Got\".\n      \" '$min_overlap'\") if $min_overlap < 1;\n    $self->{'_min_overlap'} = $min_overlap;\n  }\n  $min_overlap = $self->{'_min_overlap'};\n  return $min_overlap;\n}\n\n\n=head2 avg_overlap\n\n  Title   : avg_overlap\n  Usage   : $csp->avg_overlap\n  Function: get/set the assembly average overlap length\n  Returns : decimal\n  Args    : decimal [optional]\n\n\nsub avg_overlap {\n  my ($self, $avg_overlap) = @_;\n  if (defined $avg_overlap) {\n    $self->throw(\"The average overlap length must be strictly positive. Got \".\n      \"'$avg_overlap'\") if $avg_overlap < 1;\n    $self->{'_avg_overlap'} = $avg_overlap;\n  }\n  $avg_overlap = $self->{'_avg_overlap'};\n  return $avg_overlap;\n}\n\n\n=head2 min_identity\n\n  Title   : min_identity\n  Usage   : $csp->min_identity\n  Function: get/set the assembly minimum overlap identity percent\n  Returns : 0 < decimal < 100\n  Args    : 0 < decimal < 100 [optional]\n\n\nsub min_identity {\n  my ($self, $min_identity) = @_;\n  if (defined $min_identity) {\n    $self->throw(\"The minimum overlap percent identity must be strictly \".\n      \"positive. 
Got '$min_identity'\") if $min_identity < 1;\n    $self->{'_min_identity'} = $min_identity;\n  }\n  $min_identity = $self->{'_min_identity'};\n  return $min_identity;\n}\n\n\n=head2 avg_identity\n\n  Title   : avg_identity\n  Usage   : $csp->avg_identity\n  Function: get/set the assembly average overlap identity percent\n  Returns : 0 < decimal < 100\n  Args    : 0 < decimal < 100 [optional]\n\n\nsub avg_identity {\n  my ($self, $avg_identity) = @_;\n  if (defined $avg_identity) {\n    $self->throw(\"The average overlap percent identity must be strictly \".\n      \"positive. Got '$avg_identity'\") if $avg_identity < 1;\n    $self->{'_avg_identity'} = $avg_identity;\n  }\n  $avg_identity = $self->{'_avg_identity'};\n  return $avg_identity;\n}\n\n\n=head2 avg_seq_len\n\n  Title   : avg_seq_len\n  Usage   : $csp->avg_seq_len\n  Function: get/set the assembly average sequence length\n  Returns : avg_seq_len\n  Args    : real [optional]\n\n\nsub avg_seq_len {\n  my ($self, $avg_seq_len) = @_;\n  if (defined $avg_seq_len) {\n    $self->throw(\"The average sequence length must be strictly positive. Got \".\n      \"'$avg_seq_len'\") if $avg_seq_len < 1;\n    $self->{'_avg_seq_len'} = $avg_seq_len;\n  }\n  $avg_seq_len = $self->{'_avg_seq_len'};\n  return $avg_seq_len;\n}\n\n\n=head2 eff_asm_params\n\n  Title   : eff_asm_params\n  Usage   : $csp->eff_asm_params(1)\n  Function: Get/set the effective assembly parameters option. It defines if the\n            effective assembly parameters should be determined when a contig\n            spectrum based or derived from an assembly is calulated. 
The\n            effective assembly parameters include avg_seq_length, nof_overlaps,\n            min_overlap, avg_overlap, min_identity and avg_identity.\n            1 = get them, 0 = don't.\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub eff_asm_params {\n  my ($self, $eff_asm_params) = @_;\n  if (defined $eff_asm_params) {\n    $self->throw(\"eff_asm_params can only take values 0 or 1. Input value was \".\n      \"'$eff_asm_params'\") unless $eff_asm_params == 0 || $eff_asm_params == 1;\n    $self->{'_eff_asm_params'} = $eff_asm_params;\n  }\n  $eff_asm_params = $self->{'_eff_asm_params'};\n  return $eff_asm_params;\n}\n\n\n=head2 spectrum\n\n  Title   : spectrum\n  Usage   : my $spectrum = $csp->spectrum({1=>10, 2=>2, 3=>1});\n  Function: Get the current contig spectrum represented as a hash / Update a\n            contig spectrum object based on a contig spectrum represented as a\n            hash\n            The hash representation of a contig spectrum is as following:\n              key   -> contig size (in number of sequences)\n              value -> number of contigs of this size\n  Returns : contig spectrum as a hash reference\n  Args    : contig spectrum as a hash reference [optional]\n\n\nsub spectrum {\n  my ($self, $spectrum) = @_;\n  if (defined $spectrum) {\n    $self->_import_spectrum($spectrum);\n  }\n  $spectrum = $self->{'_spectrum'};\n  return $spectrum;\n}\n\n\n=head2 assembly\n\n  Title   : assembly\n  Usage   : my @asm_list = $csp->assembly();\n  Function: Get a reference to the list of assembly object reference used to\n            make the contig spectrum object / Update the contig spectrum object\n            based on an assembly object.\n  Returns : array of Bio::Assembly::Scaffold\n  Args    : Bio::Assembly::Scaffold\n\n\nsub assembly {\n  my ($self, $assembly) = @_;\n  if (defined $assembly) {\n    $self->_import_assembly($assembly);\n  }\n  my @asm_list = @{$self->{'_assembly'}} if defined $self->{'_assembly'};\n  
return \\@asm_list;\n}\n\n=head2 drop_assembly\n\n  Title   : drop_assembly\n  Usage   : $csp->drop_assembly();\n  Function: Remove all assembly objects associated with a contig spectrum.\n            Assembly objects can be big. This method allows to free some memory\n            when assembly information is not needed anymore.\n  Returns : 1 for success, 0 for failure\n  Args    : none\n\n\nsub drop_assembly {\n  my ($self) = @_;\n  $self->{'_assembly'} = [];\n  return 1;\n}\n\n=head2 dissolve\n\n  Title   : dissolve\n  Usage   : $dissolved_csp->dissolve($mixed_csp, $seq_header);\n  Function: Dissolve a mixed contig spectrum for the set of sequences that\n            contain the specified header, i.e. determine the contribution of\n            these sequences to the mixed contig spectrum based on the assembly.\n            The mixed contig spectrum object must have been created based on one\n            (or several) assembly object(s). Additionally, min_overlap and\n            min_identity must have been set (either manually using min_overlap\n            or automatically by switching on the eff_asm_params option).\n  Returns : 1 for success, 0 for failure\n  Args    : Bio::Assembly::Tools::ContigSpectrum reference\n            sequence header string\n\n\n\nsub dissolve {\n  my ($self, $mixed_csp, $seq_header) = @_;\n  $self->_import_dissolved_csp($mixed_csp, $seq_header);\n  return 1;\n}\n\n\n=head2 cross\n\n  Title   : cross\n  Usage   : $cross_csp->cross($mixed_csp);\n  Function: Calculate a cross contig_spectrum based on a mixed contig_spectrum.\n  Returns : 1 for success, 0 for failure\n  Args    : Bio::Assembly::Tools::ContigSpectrum reference\n\n\nsub cross {\n  my ($self, $mixed_csp) = @_;\n  $self->_import_cross_csp($mixed_csp);\n  return 1;\n}\n\n=head2 to_string\n\n  Title   : to_string\n  Usage   : my $csp_string = $csp->to_string;\n  Function: Convert the contig spectrum into a string (easy to print!!).\n  Returns : string\n  Args    : element 
separator (integer) [optional]\n              1 -> space-separated\n              2 -> tab-separated\n              3 -> newline-separated\n\n\nsub to_string {\n  my ($self, $element_separator) = @_;\n  return 0 if $self->{'_max_size'} == 0;\n  $element_separator ||= 1;\n  if ($element_separator == 1) {\n    $element_separator = ' ';\n  } elsif ($element_separator == 2) {\n    $element_separator = \"\\t\";\n  } elsif ($element_separator == 3) {\n    $element_separator = \"\\n\";\n  } else {\n    $self->throw(\"Unknown separator type '$element_separator'\\n\");\n  }\n  my $str = '';\n  for (my $q = 1 ; $q <= $self->{'_max_size'} ; $q++) {\n    my $val = 0;\n    if (exists $self->{'_spectrum'}{$q}) {\n      $val = $self->{'_spectrum'}{$q};\n    }\n    $str .= $val.$element_separator;\n  }\n  $str =~ s/\\s$//;\n  return $str;\n}\n\n\n=head2 add\n\n  Title   : add\n  Usage   : $csp->add($additional_csp);\n  Function: Add a contig spectrum to an existing one: sums the spectra, update\n            the number of sequences, number of repetitions, ...\n  Returns : 1 for success, 0 for failure\n  Args    : Bio::Assembly::Tools::ContigSpectrum object\n\n\nsub add {\n  my ($self, $csp) = @_;\n  # Sanity check\n  if( !ref $csp || ! $csp->isa('Bio::Assembly::Tools::ContigSpectrum') ) {\n        $self->throw(\"Unable to process non Bio::Assembly::Tools::ContigSpectrum \".\n        \"object [\".ref($csp).\"]\");\n  }\n  # Update overlap statistics\n  if ( $self->{'_eff_asm_params'} > 0 ) {\n    # Warnings\n    if ( $csp->{'_eff_asm_params'} == 0 ) {\n      $self->warn(\"The parent contig spectrum needs effective assembly \".\n        \"parameters (eff_asm_params = \".$self->{'_eff_asm_params'}.\") but the \".\n        \"child contig spectrum doesn't have them (eff_asm_params = \".\n        $csp->{'_eff_asm_params'}.\"). 
Skipping them...\");\n    } elsif ( $csp->{'_eff_asm_params'} != $self->{'_eff_asm_params'} ) {\n      $self->warn(\"The parent contig spectrum needs a different method for \".\n        \"detecting the effective assembly parameters (eff_asm_params = \".\n        $self->{'_eff_asm_params'}.\") than the one specified for the child \".\n        \"contig spectrum (eff_asm_params = \".$csp->{'_eff_asm_params'}.\"). \".\n        \"Ignoring the differences...\");\n    }\n    # Update existing stats\n    my $tot_num_overlaps = $csp->{'_nof_overlaps'} + $self->{'_nof_overlaps'};\n    $self->{'_min_overlap'} = $csp->{'_min_overlap'} if\n      defined $csp->{'_min_overlap'} && ( ! defined $self->{'_min_overlap'} ||\n      $csp->{'_min_overlap'} < $self->{'_min_overlap'} );\n    $self->{'_min_identity'} = $csp->{'_min_identity'} if\n      defined $csp->{'_min_identity'} && ( ! defined $self->{'_min_identity'} ||\n      $csp->{'_min_identity'} < $self->{'_min_identity'} );\n    if ($tot_num_overlaps != 0) {\n      $self->{'_avg_overlap'} =\n        ($csp->{'_avg_overlap'} * $csp->{'_nof_overlaps'}\n        + $self->{'_avg_overlap'} * $self->{'_nof_overlaps'})\n        / $tot_num_overlaps;\n      $self->{'_avg_identity'} =\n        ($csp->{'_avg_identity'} * $csp->{'_nof_overlaps'}\n        + $self->{'_avg_identity'} * $self->{'_nof_overlaps'})\n        / $tot_num_overlaps;\n    }\n    $self->{'_nof_overlaps'} = $tot_num_overlaps;\n  }\n  # Update sequence statistics\n  my $tot_nof_seq = $csp->{'_nof_seq'} + $self->{'_nof_seq'};\n  if (not $tot_nof_seq == 0) {\n    $self->{'_avg_seq_len'} = ($csp->{'_avg_seq_len'} * $csp->{'_nof_seq'} +\n      $self->{'_avg_seq_len'} * $self->{'_nof_seq'}) / $tot_nof_seq;\n  }\n  # Update spectrum (and nof_seq, max_size, and increment nof_rep by 1)\n  $self->_import_spectrum($csp->{'_spectrum'});\n  # Update nof_rep\n  $self->{'_nof_rep'}--;\n  $self->{'_nof_rep'} += $csp->{'_nof_rep'};\n  # Update list of assembly objects used\n  push 
@{$self->{'_assembly'}}, @{$csp->{'_assembly'}}\n    if defined $csp->{'_assembly'};\n  return 1;\n}\n\n\n=head2 average\n\n  Title   : average\n  Usage   : my $avg_csp = $csp->average([$csp1, $csp2, $csp3]);\n  Function: Average one contig spectrum or the sum of several contig spectra by\n            the number of repetitions\n  Returns : Bio::Assembly::Tools::ContigSpectrum\n  Args    : Bio::Assembly::Tools::ContigSpectrum array reference\n            eff_asm_params\n\n\nsub average {\n  my ($self, $list) = @_;\n  # Sanity check\n  if ( ! ref $list || ! ref $list eq 'ARRAY') {\n    $self->throw(\"Average takes an array reference but got [\".ref($list).\"]\");\n  }\n  # New average contig spectrum object\n  my $avg = Bio::Assembly::Tools::ContigSpectrum->new;\n  $avg->{'_eff_asm_params'} = 1;\n  \n  # Cycle through contig spectra\n  my $tot_nof_rep = 0;\n  for my $csp (@$list) {\n    # Sanity check\n    if (not $csp->isa('Bio::Assembly::Tools::ContigSpectrum')) {\n      $csp->throw(\"Unable to process non Bio::Assembly::Tools::ContigSpectrum \".\n        \"object [\".ref($csp).\"]\");\n    }\n    # Import contig spectrum\n    $avg->add($csp);\n  }\n  \n  # Average sum of contig spectra by number of repetitions\n  for (my $q = 1 ; $q <= $avg->{'_max_size'} ; $q++) {\n    $avg->{'_spectrum'}{$q} /= $avg->{'_nof_rep'}\n      if (defined $avg->{'_spectrum'}{$q});\n  }\n  # Average number of sequences\n  $avg->{'_nof_seq'} /= $avg->{'_nof_rep'};\n  # Average number of overlaps\n  $avg->{'_nof_overlaps'} /= $avg->{'_nof_rep'};\n  \n  return $avg;\n}\n\n\n=head2 score\n\n  Title   : score\n  Usage   : my $score = $csp->score();\n  Function: Score a contig spectrum (or cross-contig spectrum) such that the\n             higher the number of contigs (or cross-contigs) and the larger their \n             size, the higher the score.\n             Let n   : total number of sequences\n                 c_q : number of contigs of size q\n                 q   : number of sequence 
in a contig\n             We define: score = n/(n-1) * (X - 1/n)\n                  where X = sum ( c_q * q**2 ) / n**2\n             The score ranges from 0 (singlets only) to 1 (a single large contig)\n             It is possible to specify a value for the number of sequences to\n              assume in the contig spectrum.\n  Returns : contig score\n  Args    : number of total sequences to assume [optional]\n\n\nsub score {\n  my ($self, $nof_seqs) = @_;\n  # Main\n  my $score = 0;\n  my $n = $self->nof_seq;\n  if ( $n > 0 ) {\n    # Contig spectrum info\n    my $q_max = $self->max_size;\n    my $spec  = $self->spectrum;\n    # Adjust number of 1-contigs\n    if ( $nof_seqs ) {\n      $spec->{'1'} += $nof_seqs - $n;\n      $n = $nof_seqs;\n    }\n    # Calculate X\n    for my $q ( 1 .. $q_max ) {\n      if ( $spec->{$q} ) {\n        my $c_q = $spec->{$q};\n        $score += $c_q * $q ** 2;\n      }\n    }\n    $score /= $n ** 2; \n  }\n  # Rescale X to obtain the score\n  $score = $n/($n-1) * ($score - 1/$n);\n  return $score;\n}\n\n\n=head2 _naive_assembler\n\n  Title   : _naive_assembler\n  Usage   : \n  Function: Determines the contig spectrum (hash representation) of a subset of\n            sequences from a mixed contig spectrum by \"reassembling\" the\n            specified sequences only based on their position in the contig. This\n            naive assembly only verifies that the minimum overlap length and\n            percentage identity are respected. There is no actual alignment done\n  Returns : contig spectrum hash reference\n  Args    : Bio::Assembly::Contig\n            sequence ID array reference\n            minimum overlap length (integer) [optional]\n            minimum percentage identity (integer) [optional]\n\n\nsub _naive_assembler {\n  my ($self, $contig, $seqlist, $min_overlap, $min_identity) = @_;\n  # Sanity checks\n  if ( ! ref $seqlist || ref($seqlist) ne 'ARRAY') {\n    $self->throw('Expecting an array reference. 
Got ['.ref($seqlist).\"] \\n\");\n  }\n  my $max = scalar @$seqlist;\n  $self->throw(\"Expecting at least 2 sequences as input for _naive_assembler\")\n    if ($max < 2);\n  # Assembly\n  my %spectrum = (1 => 0);\n  my %overlap_map;\n  my %has_overlap;\n  # Map what sequences overlap with what sequences\n  for (my $i = 0 ; $i < $max-1 ; $i++) {\n    # query sequence\n    my $qseqid = $$seqlist[$i];\n    my $qseq   = $contig->get_seq_by_name($qseqid);\n    my $is_singlet = 1;\n    for (my $j = $i+1 ; $j < $max ; $j++) {\n      # target sequence\n      my $tseqid = $$seqlist[$j];\n      my $tseq = $contig->get_seq_by_name($tseqid);\n      # try to align sequences\n      my ($aln, $overlap, $identity)\n        = $self->_overlap_alignment($contig, $qseq, $tseq, $min_overlap,\n        $min_identity);\n      # if there is no valid overlap, go to next sequence\n      next if ! defined $aln;\n      # the overlap is valid\n      $is_singlet = 0;\n      push @{$overlap_map{$qseqid}}, $tseqid;\n      $has_overlap{$tseqid} = 1;\n      $has_overlap{$qseqid} = 1;\n    }\n    # check if sequence is in previously seen overlap\n    if (exists $has_overlap{$qseqid}) {\n      $is_singlet = 0;\n    }\n    if ($is_singlet == 1) {\n      $spectrum{1}++;\n    }\n  }\n  # take care of last sequence\n  my $last_is_singlet = 1;\n  if (exists $has_overlap{$$seqlist[$max-1]}) {\n    $last_is_singlet = 0;\n  }\n  if ($last_is_singlet == 1) {\n    $spectrum{1}++;\n  }\n  # Parse overlap map\n  for my $seqid (@$seqlist) {\n    # list of sequences that should go in the contig\n    next if not exists $overlap_map{$seqid};\n    my @overlist = @{$overlap_map{$seqid}};\n    for (my $j = 0 ; $j < scalar(@overlist) ; $j++) {\n      my $otherseqid = $overlist[$j];\n      if (exists $overlap_map{$otherseqid}) {\n        push @overlist, @{$overlap_map{$otherseqid}};\n        delete $overlap_map{$otherseqid};\n      }\n    }\n    # remove duplicates from list\n    @overlist = sort @overlist;\n    for (my 
$j = 0 ; $j < scalar(@overlist)-1 ; $j++) {\n      if ( $overlist[$j] eq $overlist[$j+1] ) {\n        splice @overlist, $j, 1;\n        $j--;\n      }\n    }\n    # update spectrum with size of contig\n    my $qsize = scalar(@overlist) + 1;\n    if (defined $spectrum{$qsize}) {\n      $spectrum{$qsize}++;\n    } else {\n      $spectrum{$qsize} = 1;\n    }\n  }\n  return \\%spectrum;\n}\n\n\n=head2 _new_from_assembly\n\n  Title   : _new_from_assembly\n  Usage   : \n  Function: Creates a new contig spectrum object based solely on the result of \n            an assembly\n  Returns : Bio::Assembly::Tools::ContigSpectrum\n  Args    : Bio::Assembly::Scaffold\n\n\nsub _new_from_assembly {\n  # Create new contig spectrum object based purely on what we can get from the\n  # assembly object\n  my ($self, $assemblyobj) = @_;\n  my $csp = Bio::Assembly::Tools::ContigSpectrum->new();\n  # 1: Set id\n  $csp->{'_id'} = $assemblyobj->id;\n  # 2: Set overlap statistics: nof_overlaps, min_overlap, avg_overlap,\n  #  min_identity and avg_identity\n  $csp->{'_eff_asm_params'} = $self->{'_eff_asm_params'};\n  if ($csp->{'_eff_asm_params'} > 0) {\n     my ($nover, $minl, $avgl, $minid, $avgid)\n       = $csp->_get_overlap_stats($assemblyobj);\n     $csp->{'_min_overlap'}  = $minl;\n     $csp->{'_min_identity'} = $minid;\n     $csp->{'_avg_overlap'}  = $avgl;\n     $csp->{'_avg_identity'} = $avgid;\n     $csp->{'_nof_overlaps'} = $nover;\n  }\n  # 3: Set sequence statistics: nof_seq and avg_seq_len\n  my ($nseq, $avgseql) = $self->_get_seq_stats($assemblyobj);\n  $csp->{'_avg_seq_len'} = $avgseql;\n  $csp->{'_nof_seq'}     = $nseq;\n  # 4: Set the spectrum: spectrum and max_size\n  for my $contigobj ($assemblyobj->all_contigs) {\n    my $size = $contigobj->num_sequences;\n    if (defined $csp->{'_spectrum'}{$size}) {\n      $csp->{'_spectrum'}{$size}++;\n    } else {\n      $csp->{'_spectrum'}{$size} = 1;\n    }\n    $csp->{'_max_size'} = $size if $size > $csp->{'_max_size'};\n  }\n  my 
$nof_singlets = $assemblyobj->get_nof_singlets();\n  if (defined $nof_singlets) {\n    $csp->{'_spectrum'}{1} += $nof_singlets;\n    $csp->{'_max_size'} = 1 if $nof_singlets >= 1 && $csp->{'_max_size'} < 1;\n  }\n  # 5: Set list of assembly objects used\n  push @{$csp->{'_assembly'}}, $assemblyobj;\n  # 6: Set number of repetitions\n  $csp->{'_nof_rep'} = 1;\n  return $csp;\n}\n\n\n\n=head2 _new_dissolved_csp\n\n  Title   : \n  Usage   : create a dissolved contig spectrum object\n  Function: \n  Returns : \n  Args    : \n\n\n\nsub _new_dissolved_csp {\n  my ($self, $mixed_csp, $seq_header) = @_;\n  # Sanity checks on the mixed contig spectrum\n\n  # min_overlap and min_identity must be specified if there are some overlaps\n  # in the mixed contig\n  unless ($mixed_csp->{'_nof_overlaps'} == 0) {\n    unless ( defined $self->{'_min_overlap'} || \n      defined $mixed_csp->{'_min_overlap'} ) {\n      $self->throw(\"min_overlap must be defined in the dissolved contig spectrum\".\n        \" or mixed contig spectrum to dissolve a contig\");\n    }\n    unless ( defined $self->{'_min_identity'} ||\n      defined $mixed_csp->{'_min_identity'} ) {\n      $self->throw(\"min_identity must be defined in the dissolved contig spectrum\".\n        \" or mixed contig spectrum\");\n    }\n  }\n  \n  # there must be at least one assembly in mixed contig spectrum\n  if (!defined $mixed_csp->{'_assembly'} ||\n      scalar @{$mixed_csp->{'_assembly'}} < 1) {\n    $self->throw(\"The mixed contig spectrum must be based on at least one\n    assembly\");\n  }\n\n  # New dissolved contig spectrum object\n  my $dissolved = Bio::Assembly::Tools::ContigSpectrum->new();\n  \n  # take parent attributes if existent or mixed attributes otherwise\n  if ($self->{'_eff_asm_params'}) {\n    $dissolved->{'_eff_asm_params'} = $self->{'_eff_asm_params'};\n  } else {\n    $dissolved->{'_eff_asm_params'} = $mixed_csp->{'_eff_asm_params'};\n  }\n  if ($self->{'_min_overlap'} && $self->{'_min_identity'}) 
{\n    ( $dissolved->{'_min_overlap'}, $dissolved->{'_min_identity'} ) = \n      ( $self->{'_min_overlap'}, $self->{'_min_identity'} );\n  } else {\n    ( $dissolved->{'_min_overlap'}, $dissolved->{'_min_identity'} ) = \n      ( $mixed_csp->{'_min_overlap'}, $mixed_csp->{'_min_identity'} );\n  }\n  \n  # Dissolve each assembly\n  for my $assembly (@{$mixed_csp->{'_assembly'}}) {\n    # Dissolve this assembly for the given sequences\n    my %asm_spectrum = (1 => 0);\n    my %good_seqs;\n    # For each contig\n    for my $contig ($assembly->all_contigs) {\n      # Get good sequences\n      my @contig_seqs;\n      for my $seq ($contig->each_seq) {\n        my $seq_id = $seq->id;\n        # get sequence origin\n        next unless $seq_id =~ m/^$seq_header\\|/;\n        # add it to hash\n        push @contig_seqs, $seq_id;\n        $good_seqs{$seq_id} = 1;\n      }\n      # Update spectrum\n      my $size = scalar @contig_seqs;\n      if ($size == 0) {\n        next;\n      } elsif ($size == 1) {\n        $asm_spectrum{1}++;\n      } elsif ($size > 1) {\n        # Reassemble good sequences\n        my $contig_spectrum = $dissolved->_naive_assembler(\n          $contig, \\@contig_seqs, $dissolved->{'_min_overlap'},\n          $dissolved->{'_min_identity'});\n        # update spectrum\n        for my $qsize (keys %$contig_spectrum) {\n          $asm_spectrum{$qsize} += $$contig_spectrum{$qsize};\n        }\n      } else {\n        $self->throw(\"The size is not valid... 
how could that happen?\");\n      }\n    }\n    # For each singlet\n    for my $singlet ($assembly->all_singlets) {\n      my $seq_id = $singlet->seqref->id;\n      # get sequence origin\n      next unless $seq_id =~ m/^$seq_header\\|/;\n      # add it to hash\n      $good_seqs{$seq_id} = 1;\n      # update spectrum\n      $asm_spectrum{1}++;\n    }\n    # Update spectrum\n    $dissolved->_import_spectrum(\\%asm_spectrum);\n    # Update nof_rep\n    $dissolved->{'_nof_rep'}--;\n    $dissolved->{'_nof_rep'} += $mixed_csp->{'_nof_rep'};\n\n    # Get sequence stats\n    my ($nseq, $avgseql) = $dissolved->_get_seq_stats($assembly, \\%good_seqs);\n    $dissolved->{'_avg_seq_len'} = $avgseql;\n    $dissolved->{'_nof_seq'}     = $nseq;\n  \n    # Get eff_asm_param for these sequences\n    if ($dissolved->{'_eff_asm_params'} > 0) {\n      my ($nover, $minl, $avgl, $minid, $avgid)\n        = $dissolved->_get_overlap_stats($assembly, \\%good_seqs);\n      $dissolved->{'_min_overlap'}  = $minl;\n      $dissolved->{'_min_identity'} = $minid;\n      $dissolved->{'_avg_overlap'}  = $avgl;\n      $dissolved->{'_avg_identity'} = $avgid;\n      $dissolved->{'_nof_overlaps'} = $nover;\n    }\n\n  }\n  return $dissolved;\n}\n\n\n=head2 _new_cross_csp\n\n  Title   : \n  Usage   : \n  Function: create a cross contig spectrum object\n  Returns : \n  Args    : 
"},"containerName":"main::","name":"_new_cross_csp","children":[{"line":1133,"name":"$self","localvar":"my","containerName":"_new_cross_csp","kind":13,"definition":"my"},{"containerName":"_new_cross_csp","kind":13,"name":"$mixed_csp","line":1133},{"name":"$mixed_csp","containerName":"_new_cross_csp","kind":13,"line":1136},{"line":1137,"containerName":"_new_cross_csp","kind":13,"name":"$mixed_csp"}],"detail":"($self,$mixed_csp)","definition":"sub"},{"name":"$self","containerName":null,"kind":13,"line":1138},{"line":1138,"name":"throw","containerName":"main::","kind":12},{"line":1143,"localvar":"my","kind":13,"containerName":null,"name":"$cross","definition":"my"},{"name":"Bio","containerName":"Assembly::Tools::ContigSpectrum","kind":12,"line":1143},{"line":1143,"containerName":"main::","kind":12,"name":"new"},{"containerName":null,"localvar":"my","kind":13,"name":"%spectrum","line":1144,"definition":"my"},{"name":"%self","kind":13,"containerName":null,"line":1147},{"kind":13,"containerName":null,"name":"%cross","line":1148},{"line":1148,"kind":13,"containerName":null,"name":"%self"},{"name":"%cross","kind":13,"containerName":null,"line":1150},{"line":1150,"containerName":null,"kind":13,"name":"%mixed_csp"},{"line":1152,"containerName":null,"kind":13,"name":"%self"},{"line":1152,"name":"%self","containerName":null,"kind":13},{"name":"%cross","kind":13,"containerName":null,"line":1153},{"name":"%cross","kind":13,"containerName":null,"line":1153},{"name":"%self","kind":13,"containerName":null,"line":1154},{"line":1154,"name":"%self","containerName":null,"kind":13},{"name":"%cross","kind":13,"containerName":null,"line":1156},{"line":1156,"name":"%cross","containerName":null,"kind":13},{"kind":13,"containerName":null,"name":"%mixed_csp","line":1157},{"kind":13,"containerName":null,"name":"%mixed_csp","line":1157},{"name":"$assembly","localvar":"my","containerName":null,"kind":13,"line":1161,"definition":"my"},{"line":1161,"kind":13,"containerName":null,"name":"%mixed_csp"
},{"line":1163,"name":"%good_seqs","kind":13,"localvar":"my","containerName":null,"definition":"my"},{"definition":"my","localvar":"my","kind":13,"containerName":null,"name":"$contig","line":1164},{"line":1164,"containerName":null,"kind":13,"name":"$assembly"},{"name":"all_contigs","kind":12,"containerName":"main::","line":1164},{"line":1166,"containerName":null,"localvar":"my","kind":13,"name":"@seq_origins","definition":"my"},{"definition":"my","name":"@seq_ids","localvar":"my","kind":13,"containerName":null,"line":1167},{"definition":"my","line":1168,"name":"$seq","containerName":null,"localvar":"my","kind":13},{"containerName":null,"kind":13,"name":"$contig","line":1168},{"line":1168,"containerName":"main::","kind":12,"name":"each_seq"},{"definition":"my","line":1170,"name":"$seq_id","containerName":null,"localvar":"my","kind":13},{"line":1170,"kind":13,"containerName":null,"name":"$seq"},{"containerName":"main::","kind":12,"name":"id","line":1170},{"name":"$seq_id","containerName":null,"kind":13,"line":1171},{"localvar":"my","containerName":null,"kind":13,"name":"$seq_header","line":1172,"definition":"my"},{"line":1173,"name":"$self","kind":13,"containerName":null},{"name":"warn","containerName":"main::","kind":12,"line":1173},{"name":"$seq_header","kind":13,"containerName":null,"line":1174},{"line":1175,"name":"$seq_header","kind":13,"containerName":null},{"name":"@seq_origins","kind":13,"containerName":null,"line":1176},{"line":1176,"containerName":null,"kind":13,"name":"$seq_header"},{"line":1177,"kind":13,"containerName":null,"name":"@seq_ids"},{"containerName":null,"kind":13,"name":"$seq_id","line":1177},{"definition":"my","name":"$qsize","kind":13,"localvar":"my","containerName":null,"line":1179},{"line":1179,"name":"@seq_ids","kind":13,"containerName":null},{"line":1180,"name":"@origins","localvar":"my","containerName":null,"kind":13,"definition":"my"},{"line":1180,"name":"$a","containerName":null,"kind":13},{"name":"$b","containerName":null,"kind":13,"l
ine":1180},{"line":1180,"containerName":null,"kind":13,"name":"@seq_origins"},{"definition":"my","line":1181,"localvar":"my","containerName":null,"kind":13,"name":"$size"},{"line":1181,"name":"@origins","containerName":null,"kind":13},{"name":"$i","localvar":"my","kind":13,"containerName":null,"line":1182,"definition":"my"},{"containerName":null,"kind":13,"name":"$i","line":1182},{"name":"$size","containerName":null,"kind":13,"line":1182},{"name":"%i","containerName":null,"kind":13,"line":1182},{"line":1183,"name":"@origins","kind":13,"containerName":null},{"line":1183,"name":"$i","kind":13,"containerName":null},{"line":1183,"name":"@origins","kind":13,"containerName":null},{"line":1183,"kind":13,"containerName":null,"name":"%i"},{"name":"@origins","kind":13,"containerName":null,"line":1184},{"line":1184,"name":"$i","containerName":null,"kind":13},{"containerName":null,"kind":13,"name":"$i","line":1185},{"line":1186,"name":"$size","containerName":null,"kind":13},{"line":1190,"name":"%size","kind":13,"containerName":null},{"name":"$seq_id","kind":13,"localvar":"my","containerName":null,"line":1192,"definition":"my"},{"line":1192,"name":"@seq_ids","containerName":null,"kind":13},{"kind":13,"containerName":null,"name":"%good_seqs","line":1193},{"line":1193,"containerName":null,"kind":13,"name":"$seq_id"},{"name":"%spectrum","kind":13,"containerName":null,"line":1196},{"kind":13,"containerName":null,"name":"%qsize","line":1196},{"name":"%spectrum","containerName":null,"kind":13,"line":1197},{"containerName":null,"kind":13,"name":"%qsize","line":1197},{"name":"%spectrum","kind":13,"containerName":null,"line":1199},{"line":1199,"name":"$qsize","kind":13,"containerName":null},{"line":1203,"kind":13,"containerName":null,"name":"%size"},{"definition":"my","localvar":"my","kind":13,"containerName":null,"name":"$origin","line":1204},{"name":"@origins","containerName":null,"kind":13,"line":1204},{"definition":"my","line":1206,"localvar":"my","kind":13,"containerName":null,"name
":"@ids"},{"kind":13,"localvar":"my","containerName":null,"name":"$i","line":1207,"definition":"my"},{"line":1207,"containerName":null,"kind":13,"name":"$i"},{"name":"$qsize","containerName":null,"kind":13,"line":1207},{"line":1207,"name":"%i","kind":13,"containerName":null},{"localvar":"my","containerName":null,"kind":13,"name":"$seq_origin","line":1208,"definition":"my"},{"line":1208,"containerName":null,"kind":13,"name":"@seq_origins"},{"line":1208,"name":"$i","kind":13,"containerName":null},{"definition":"my","localvar":"my","kind":13,"containerName":null,"name":"$seq_id","line":1209},{"name":"@seq_ids","containerName":null,"kind":13,"line":1209},{"containerName":null,"kind":13,"name":"$i","line":1209},{"containerName":null,"kind":13,"name":"@ids","line":1210},{"line":1210,"name":"$seq_id","kind":13,"containerName":null},{"line":1210,"containerName":null,"kind":13,"name":"$seq_origin"},{"line":1210,"kind":13,"containerName":null,"name":"$origin"},{"name":"@ids","kind":13,"containerName":null,"line":1212},{"kind":13,"containerName":null,"name":"%spectrum","line":1213},{"containerName":null,"kind":13,"name":"@ids","line":1214},{"line":1215,"name":"$contig_spectrum","localvar":"my","kind":13,"containerName":null,"definition":"my"},{"line":1215,"kind":13,"containerName":null,"name":"$cross"},{"line":1215,"kind":12,"containerName":"main::","name":"_naive_assembler"},{"kind":13,"containerName":null,"name":"$contig","line":1216},{"line":1216,"containerName":null,"kind":13,"name":"@ids"},{"line":1216,"name":"%cross","containerName":null,"kind":13},{"name":"%cross","containerName":null,"kind":13,"line":1217},{"line":1218,"containerName":null,"kind":13,"name":"%spectrum"},{"line":1218,"name":"contig_spectrum","kind":12},{"line":1220,"name":"$self","containerName":null,"kind":13},{"line":1220,"name":"throw","containerName":"main::","kind":12},{"definition":"my","line":1226,"localvar":"my","containerName":null,"kind":13,"name":"$nseq"},{"line":1226,"name":"$avgseql","kind":
13,"containerName":null},{"line":1226,"containerName":null,"kind":13,"name":"$cross"},{"containerName":"main::","kind":12,"name":"_get_seq_stats","line":1226},{"name":"$assembly","containerName":null,"kind":13,"line":1226},{"name":"%good_seqs","containerName":null,"kind":13,"line":1226},{"line":1227,"name":"%cross","kind":13,"containerName":null},{"name":"$avgseql","kind":13,"containerName":null,"line":1227},{"line":1228,"kind":13,"containerName":null,"name":"%cross"},{"line":1228,"kind":13,"containerName":null,"name":"$nseq"},{"kind":13,"containerName":null,"name":"%cross","line":1230},{"name":"$nover","localvar":"my","containerName":null,"kind":13,"line":1231,"definition":"my"},{"line":1231,"name":"$minl","containerName":null,"kind":13},{"line":1231,"kind":13,"containerName":null,"name":"$avgl"},{"name":"$minid","kind":13,"containerName":null,"line":1231},{"line":1231,"name":"$avgid","kind":13,"containerName":null},{"line":1232,"containerName":null,"kind":13,"name":"$cross"},{"line":1232,"name":"_get_overlap_stats","kind":12,"containerName":"main::"},{"line":1232,"containerName":null,"kind":13,"name":"$assembly"},{"kind":13,"containerName":null,"name":"%good_seqs","line":1232},{"line":1233,"kind":13,"containerName":null,"name":"%cross"},{"name":"$minl","containerName":null,"kind":13,"line":1233},{"containerName":null,"kind":13,"name":"%cross","line":1234},{"kind":13,"containerName":null,"name":"$minid","line":1234},{"name":"%cross","kind":13,"containerName":null,"line":1235},{"kind":13,"containerName":null,"name":"$avgl","line":1235},{"name":"%cross","kind":13,"containerName":null,"line":1236},{"name":"$avgid","containerName":null,"kind":13,"line":1236},{"line":1237,"containerName":null,"kind":13,"name":"%cross"},{"line":1237,"name":"$nover","kind":13,"containerName":null},{"line":1241,"containerName":null,"kind":13,"name":"$cross"},{"name":"_import_spectrum","kind":12,"containerName":"main::","line":1241},{"line":1241,"name":"%spectrum","kind":13,"containerName":
null},{"line":1243,"containerName":null,"kind":13,"name":"%cross"},{"name":"%cross","containerName":null,"kind":13,"line":1244},{"line":1244,"kind":13,"containerName":null,"name":"%mixed_csp"},{"containerName":null,"kind":13,"name":"$cross","line":1246},{"definition":"sub","detail":"($self,$assemblyobj)","children":[{"name":"$self","kind":13,"localvar":"my","containerName":"_import_assembly","line":1260,"definition":"my"},{"name":"$assemblyobj","containerName":"_import_assembly","kind":13,"line":1260},{"line":1262,"name":"$assemblyobj","containerName":"_import_assembly","kind":13},{"kind":13,"containerName":"_import_assembly","name":"$assemblyobj","line":1262},{"name":"isa","kind":12,"containerName":"_import_assembly","line":1262},{"name":"$self","kind":13,"containerName":"_import_assembly","line":1263},{"line":1263,"name":"throw","containerName":"_import_assembly","kind":12},{"name":"$assemblyobj","kind":13,"containerName":"_import_assembly","line":1264},{"line":1267,"containerName":"_import_assembly","localvar":"my","kind":13,"name":"$csp","definition":"my"},{"name":"$self","kind":13,"containerName":"_import_assembly","line":1267},{"kind":12,"containerName":"_import_assembly","name":"_new_from_assembly","line":1267},{"line":1267,"containerName":"_import_assembly","kind":13,"name":"$assemblyobj"},{"containerName":"_import_assembly","kind":13,"name":"$self","line":1269},{"line":1269,"name":"add","containerName":"_import_assembly","kind":12},{"line":1269,"containerName":"_import_assembly","kind":13,"name":"$csp"}],"containerName":"main::","name":"_import_assembly","signature":{"label":"_import_assembly($self,$assemblyobj)","documentation":"__END__\n#\n# BioPerl module for Bio::Assembly::Tools::ContigSpectrum\n#\n# Copyright by Florent Angly\n#\n# You may distribute this module under the same terms as Perl itself\n#\n# POD documentation - main docs before the code\n\n=head1 NAME\n\nBio::Assembly::Tools::ContigSpectrum - create and manipulate contig spectra\n\n=head1 
SYNOPSIS\n\n  # Simple contig spectrum creation\n  my $csp1 = Bio::Assembly::Tools::ContigSpectrum->new(\n    -id       => 'csp1',\n    -spectrum => { 1 => 10,\n                   2 => 2,\n                   3 => 1 } );\n\n  # ...or another way to create a simple contig spectrum\n  my $csp2 = Bio::Assembly::Tools::ContigSpectrum->new;\n  $csp2->id('csp2');\n  $csp2->spectrum({ 1 => 20, 2 => 1, 4 => 1 });\n\n  # Get some information\n  print \"This is contig spectrum \".$csp->id.\"\\n\";\n  print \"It contains \".$csp->nof_seq.\" sequences\\n\";\n  print \"The largest contig has \".$csp->max_size.\" sequences\\n\";\n  print \"The spectrum is: \".$csp->to_string($csp->spectrum).\"\\n\";\n\n  # Let's add the contig spectra\n  my $summed_csp = Bio::Assembly::Tools::ContigSpectrum->new;\n  $summed_csp->add($csp1);\n  $summed_csp->add($csp2);\n  print \"The summed contig spectrum is \".$summed_csp->to_string.\"\\n\";\n\n  # Make an average\n  my $avg_csp = Bio::Assembly::Tools::ContigSpectrum->new;\n  $avg_csp = $avg_csp->average([$csp1, $csp2]);\n  print \"The average contig spectrum is \".$avg_csp->to_string.\"\\n\";\n\n  # Get a contig spectrum from an assembly\n  my $from_assembly = Bio::Assembly::Tools::ContigSpectrum->new(\n    -assembly       => $assembly_object,\n    -eff_asm_params => 1);\n  print \"The contig spectrum from assembly is \".$from_assembly->to_string.\"\\n\";\n\n  # Report advanced information (possible because eff_asm_params = 1)\n  print \"Average sequence length: \".$from_assembly->avg_seq_length.\" bp\\n\";\n  print \"Minimum overlap length: \".$from_assembly->min_overlap.\" bp\\n\";\n  print \"Average overlap length: \".$from_assembly->avg_overlap.\" bp\\n\";\n  print \"Minimum overlap match: \".$from_assembly->min_identity.\" %\\n\";\n  print \"Average overlap match: \".$from_assembly->avg_identity.\" %\\n\";\n\n  # Assuming the assembly object contains sequences from several different\n  # metagenomes, we have a mixed contig spectrum from 
which a cross contig\n  # spectrum and dissolved contig spectra can be obtained\n  my $mixed_csp = $from_assembly;\n\n  # Calculate a dissolved contig spectrum\n  my $meta1_dissolved = Bio::Assembly::Tools::ContigSpectrum->new(\n    -dissolve => [$mixed_csp, 'metagenome1'] );\n  my $meta2_dissolved = Bio::Assembly::Tools::ContigSpectrum->new(\n    -dissolve => [$mixed_csp, 'metagenome2'] );\n  print \"The dissolved contig spectra are:\\n\".\n    $meta1_dissolved->to_string.\"\\n\".\n    $meta2_dissolved->to_string.\"\\n\";\n\n  # Determine a cross contig spectrum\n  my $cross_csp = Bio::Assembly::Tools::ContigSpectrum->new(\n    -cross => $mixed_csp );\n  print \"The cross contig spectrum is \".$cross_csp->to_string.\"\\n\";\n\n  # Score a contig spectrum (the more abundant the contigs and the larger their\n  # size, the larger the score)\n\n\n=head1 DESCRIPTION\n\nThe Bio::Assembly::Tools::ContigSpectrum Perl module enables to\nmanually create contig spectra, import them from assemblies,\nmanipulate them, transform between different types of contig spectra\nand output them.\n\nBio::Assembly::Tools::ContigSpectrum is a module to create, manipulate\nand output contig spectra, assembly-derived data used in metagenomics\n(community genomics) for diversity estimation.\n\n=head2 Background\n\nA contig spectrum is the count of the number of contigs of different\nsize in an assembly. For example, the contig spectrum [100 5 1 0 0\n...] means that there were 100 singlets (1-contigs), 5 contigs of 2\nsequences (2-contigs), 1 contig of 3 sequences (3-contig) and no\nlarger contigs.\n\nAn assembly can be produced from a mixture of sequences from different\nmetagenomes. The contig obtained from this assembly is a mixed contig\nspectrum. The contribution of each metagenome in this mixed contig\nspectrum can be obtained by determining a dissolved contig spectrum.\n\nFinally, based on a mixed contig spectrum, a cross contig spectrum can\nbe determined. 
In a cross contig spectrum, only contigs containing\nsequences from different metagenomes are kept; \"pure\" contigs are\nexcluded. Additionally, the total number of singletons (1-contigs)\nfrom each region that assembles with any fragments from other regions\nis the number of 1-contigs in the cross contig spectrum.\n\n=head2 Implementation\n\nThe simplest representation of a contig spectrum is as a hash\nrepresentation where the key is the contig size (number of sequences\nmaking up the contig) and the value the number of contigs of this\nsize.\n\nIn fact, it is useful to have more information associated with the\ncontig spectrum, hence the Bio::Assembly::Tools::ContigSpectrum module\nimplements an object containing a contig spectrum hash and additional\ninformation. The get/set methods to access them are:\n\n    id              contig spectrum ID\n    nof_seq         number of sequences\n    nof_rep         number of repetitions (assemblies) used\n    max_size        size of (number of sequences in) the largest contig\n    nof_overlaps    number of overlaps\n    min_overlap     minimum overlap length for building a contig\n    min_identity    minimum sequence identity over the overlap length\n    avg_overlap     average overlap length\n    avg_identity    average overlap identity\n    avg_seq_length  average sequence length\n    eff_asm_params  effective assembly parameters\n    spectrum        hash representation of a contig spectrum\n\n  Operations on the contig spectra:\n\n    to_string       create a string representation of the spectrum\n    spectrum        import a hash contig spectrum\n    assembly        determine a contig spectrum from an assembly\n    dissolve        calculate a dissolved contig spectrum (based on assembly)\n    cross           produce a cross contig spectrum (based on assembly)\n    add             add a contig spectrum to an existing one\n    average         make an average of several contig spectra\n\nWhen using operations that rely 
on knowing \"where\" (from what\nmetagenomes) a sequence came from (i.e. when creating a dissolved or\ncross contig spectrum), make sure that the sequences used for the\nassembly have a name header, e.g.  E<gt>metagenome1|seq1,\nE<gt>metagenome2|seq1, ...\n\n=head1 FEEDBACK\n\n=head2 Mailing Lists\n\nUser feedback is an integral part of the evolution of this and other\nBioperl modules. Send your comments and suggestions preferably to the\nBioperl mailing lists. Your participation is much appreciated.\n\n  bioperl-l@bioperl.org                  - General discussion\n  http://bioperl.org/wiki/Mailing_lists  - About the mailing lists\n\n\n=head2 Support \n\nPlease direct usage questions or support issues to the mailing list:\n\nI<bioperl-l@bioperl.org>\n\nrather than to the module maintainer directly. Many experienced and \nresponsive experts will be able to look at the problem and quickly \naddress it. Please include a thorough description of the problem \nwith code and data examples if at all possible.\n\n=head2 Reporting Bugs\n\nReport bugs to the BioPerl bug tracking system to help us keep track\nof the bugs and their resolution. Bug reports can be submitted via email\nor the web:\n\n  bioperl-bugs@bio.perl.org\n  http://bugzilla.bioperl.org/\n\n=head1 AUTHOR - Florent E Angly\n\nEmail florent_dot_angly_at_gmail_dot_com\n\n=head1 APPENDIX\n\nThe rest of the documentation details each of the object\nmethods. 
Internal methods are usually preceded with a \"_\".\n\n\npackage Bio::Assembly::Tools::ContigSpectrum;\n\nuse strict;\n\nuse Bio::Root::Root;\nuse Bio::Assembly::Scaffold;\nuse Bio::SimpleAlign;\nuse Bio::LocatableSeq;\nuse Bio::Align::PairwiseStatistics;\n\nuse base 'Bio::Root::Root';\n\n\n=head2 new\n\n  Title   : new\n  Usage   : my $csp = Bio::Assembly::Tools::ContigSpectrum->new();\n              or\n            my $csp = Bio::Assembly::Tools::ContigSpectrum->new(\n              -id => 'some_name',\n              -spectrum =>  { 1 => 90 , 2 => 3 , 4 => 1 },\n            );\n              or\n            my $csp = Bio::Assembly::Tools::ContigSpectrum->new(\n              -assembly =>  $assembly_obj\n            );\n  Function: create a new contig spectrum object\n  Returns : reference to a contig spectrum object\n  Args    : optional named parameters (e.g. -id, -spectrum, -assembly,\n            -eff_asm_params, -dissolve, -cross)\n\n\nsub new {\n  my ($class, @args) = @_;\n  my $self = $class->SUPER::new(@args);\n  my ( $id, $nof_seq, $nof_rep, $max_size, $nof_overlaps, $min_overlap,\n    $min_identity, $avg_overlap, $avg_identity, $avg_seq_len, $spectrum,\n    $assembly, $eff_asm_params, $dissolve, $cross) = $self->_rearrange( [qw(ID\n    NOF_SEQ NOF_REP MAX_SIZE NOF_OVERLAPS MIN_OVERLAP MIN_IDENTITY AVG_OVERLAP\n    AVG_IDENTITY AVG_SEQ_LEN SPECTRUM ASSEMBLY EFF_ASM_PARAMS DISSOLVE CROSS)],\n    @args );\n\n  # First set up some defaults\n  $self->{'_id'}             = 'NoName';\n  $self->{'_nof_seq'}        = 0;\n  $self->{'_nof_rep'}        = 0;\n  $self->{'_max_size'}       = 0;\n  $self->{'_nof_overlaps'}   = 0;\n  $self->{'_min_overlap'}    = undef;\n  $self->{'_min_identity'}   = undef;\n  $self->{'_avg_overlap'}    = 0;\n  $self->{'_avg_identity'}   = 0;\n  $self->{'_avg_seq_len'}    = 0;\n  $self->{'_eff_asm_params'} = 0;\n  $self->{'_spectrum'}       = {1 => 0};  # contig spectrum hash representation\n  $self->{'_assembly'}       = []; # list of assembly objects used\n\n  # Then, according to user desires, override defaults\n  $self->{'_id'}           
  = $id             if (defined $id);\n  $self->{'_nof_seq'}        = $nof_seq        if (defined $nof_seq);\n  $self->{'_nof_rep'}        = $nof_rep        if (defined $nof_rep);\n  $self->{'_max_size'}       = $max_size       if (defined $max_size);\n  $self->{'_nof_overlaps'}   = $nof_overlaps   if (defined $nof_overlaps);\n  $self->{'_min_overlap'}    = $min_overlap    if (defined $min_overlap);\n  $self->{'_avg_overlap'}    = $avg_overlap    if (defined $avg_overlap);\n  $self->{'_min_identity'}   = $min_identity   if (defined $min_identity);\n  $self->{'_avg_identity'}   = $avg_identity   if (defined $avg_identity);\n  $self->{'_avg_seq_len'}    = $avg_seq_len    if (defined $avg_seq_len);\n  $self->{'_eff_asm_params'} = $eff_asm_params if (defined $eff_asm_params);\n\n  # Finally get stuff that can be gotten in an automated way\n  $self->_import_spectrum($spectrum) if defined($spectrum);\n  $self->_import_assembly($assembly) if defined($assembly);\n  if (defined($dissolve)) {\n    my ($mixed_csp, $header) = (@$dissolve[0], @$dissolve[1]);\n    $self->_import_dissolved_csp($mixed_csp, $header);\n  }\n  $self->_import_cross_csp($cross)   if defined($cross);\n\n  return $self;\n}\n\n\n=head2 id\n\n  Title   : id\n  Usage   : $csp->id\n  Function: get/set contig spectrum id\n  Returns : string\n  Args    : string [optional]\n\n\nsub id {\n  my ($self, $id) = @_;\n  if (defined $id) {\n    $self->{'_id'} = $id;\n  }\n  $id = $self->{'_id'};\n  return $id;\n}\n\n\n=head2 nof_seq\n\n  Title   : nof_seq\n  Usage   : $csp->nof_seq\n  Function: get/set the number of sequences making up the contig spectrum\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub nof_seq {\n  my ($self, $nof_seq) = @_;\n  if (defined $nof_seq) {\n    $self->throw(\"The number of sequences must be strictly positive. 
Got \".\n      \"'$nof_seq'\") if $nof_seq < 1;\n    $self->{'_nof_seq'} = $nof_seq;\n  }\n  $nof_seq = $self->{'_nof_seq'};\n  return $nof_seq;\n}\n\n\n=head2 nof_rep\n\n  Title   : nof_rep\n  Usage   : $csp->nof_rep\n  Function: Get/Set the number of repetitions (assemblies) used to create the \n            contig spectrum\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub nof_rep {\n  my ($self, $nof_rep) = @_;\n  if (defined $nof_rep) {\n    $self->throw(\"The number of repetitions must be strictly positive. Got \".\n      \"'$nof_rep'\") if $nof_rep < 1;\n    $self->{'_nof_rep'} = $nof_rep;\n  }\n  $nof_rep = $self->{'_nof_rep'};\n  return $nof_rep;\n}\n\n\n=head2 max_size\n\n  Title   : max_size\n  Usage   : $csp->max_size\n  Function: get/set the size of (number of sequences in) the largest contig\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub max_size {\n  my ($self, $max_size) = @_;\n  if (defined $max_size) {\n    $self->throw(\"The contig maximum size must be strictly positive. Got \".\n      \"'$max_size'\") if $max_size < 1;\n    $self->{'_max_size'} = $max_size;\n  }\n  $max_size = $self->{'_max_size'};\n  return $max_size;\n}\n\n\n=head2 nof_overlaps\n\n  Title   : nof_overlaps\n  Usage   : $csp->nof_overlaps\n  Function: Get/Set the number of overlaps in the assembly.\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub nof_overlaps {\n  my ($self, $nof_overlaps) = @_;\n  if (defined $nof_overlaps) {\n    $self->throw(\"The number of overlaps must be strictly positive. 
Got \".\n      \"'$nof_overlaps'\") if $nof_overlaps < 1;\n    $self->{'_nof_overlaps'} = $nof_overlaps;\n  }\n  $nof_overlaps = $self->{'_nof_overlaps'};\n  return $nof_overlaps;\n}\n\n\n=head2 min_overlap\n\n  Title   : min_overlap\n  Usage   : $csp->min_overlap\n  Function: get/set the assembly minimum overlap length\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub min_overlap {\n  my ($self, $min_overlap) = @_;\n  if (defined $min_overlap) {\n    $self->throw(\"The minimum of overlap length must be strictly positive. Got\".\n      \" '$min_overlap'\") if $min_overlap < 1;\n    $self->{'_min_overlap'} = $min_overlap;\n  }\n  $min_overlap = $self->{'_min_overlap'};\n  return $min_overlap;\n}\n\n\n=head2 avg_overlap\n\n  Title   : avg_overlap\n  Usage   : $csp->avg_overlap\n  Function: get/set the assembly average overlap length\n  Returns : decimal\n  Args    : decimal [optional]\n\n\nsub avg_overlap {\n  my ($self, $avg_overlap) = @_;\n  if (defined $avg_overlap) {\n    $self->throw(\"The average overlap length must be strictly positive. Got \".\n      \"'$avg_overlap'\") if $avg_overlap < 1;\n    $self->{'_avg_overlap'} = $avg_overlap;\n  }\n  $avg_overlap = $self->{'_avg_overlap'};\n  return $avg_overlap;\n}\n\n\n=head2 min_identity\n\n  Title   : min_identity\n  Usage   : $csp->min_identity\n  Function: get/set the assembly minimum overlap identity percent\n  Returns : 0 < decimal < 100\n  Args    : 0 < decimal < 100 [optional]\n\n\nsub min_identity {\n  my ($self, $min_identity) = @_;\n  if (defined $min_identity) {\n    $self->throw(\"The minimum overlap percent identity must be strictly \".\n      \"positive. 
Got '$min_identity'\") if $min_identity < 1;\n    $self->{'_min_identity'} = $min_identity;\n  }\n  $min_identity = $self->{'_min_identity'};\n  return $min_identity;\n}\n\n\n=head2 avg_identity\n\n  Title   : avg_identity\n  Usage   : $csp->avg_identity\n  Function: get/set the assembly average overlap identity percent\n  Returns : 0 < decimal < 100\n  Args    : 0 < decimal < 100 [optional]\n\n\nsub avg_identity {\n  my ($self, $avg_identity) = @_;\n  if (defined $avg_identity) {\n    $self->throw(\"The average overlap percent identity must be strictly \".\n      \"positive. Got '$avg_identity'\") if $avg_identity < 1;\n    $self->{'_avg_identity'} = $avg_identity;\n  }\n  $avg_identity = $self->{'_avg_identity'};\n  return $avg_identity;\n}\n\n\n=head2 avg_seq_len\n\n  Title   : avg_seq_len\n  Usage   : $csp->avg_seq_len\n  Function: get/set the assembly average sequence length\n  Returns : real\n  Args    : real [optional]\n\n\nsub avg_seq_len {\n  my ($self, $avg_seq_len) = @_;\n  if (defined $avg_seq_len) {\n    $self->throw(\"The average sequence length must be strictly positive. Got \".\n      \"'$avg_seq_len'\") if $avg_seq_len < 1;\n    $self->{'_avg_seq_len'} = $avg_seq_len;\n  }\n  $avg_seq_len = $self->{'_avg_seq_len'};\n  return $avg_seq_len;\n}\n\n\n=head2 eff_asm_params\n\n  Title   : eff_asm_params\n  Usage   : $csp->eff_asm_params(1)\n  Function: Get/set the effective assembly parameters option. It defines whether\n            the effective assembly parameters should be determined when a contig\n            spectrum based on or derived from an assembly is calculated. 
The\n            effective assembly parameters include avg_seq_length, nof_overlaps,\n            min_overlap, avg_overlap, min_identity and avg_identity.\n            1 = get them, 0 = don't.\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub eff_asm_params {\n  my ($self, $eff_asm_params) = @_;\n  if (defined $eff_asm_params) {\n    $self->throw(\"eff_asm_params can only take values 0 or 1. Input value was \".\n      \"'$eff_asm_params'\") unless $eff_asm_params == 0 || $eff_asm_params == 1;\n    $self->{'_eff_asm_params'} = $eff_asm_params;\n  }\n  $eff_asm_params = $self->{'_eff_asm_params'};\n  return $eff_asm_params;\n}\n\n\n=head2 spectrum\n\n  Title   : spectrum\n  Usage   : my $spectrum = $csp->spectrum({1=>10, 2=>2, 3=>1});\n  Function: Get the current contig spectrum represented as a hash / Update a\n            contig spectrum object based on a contig spectrum represented as a\n            hash\n            The hash representation of a contig spectrum is as following:\n              key   -> contig size (in number of sequences)\n              value -> number of contigs of this size\n  Returns : contig spectrum as a hash reference\n  Args    : contig spectrum as a hash reference [optional]\n\n\nsub spectrum {\n  my ($self, $spectrum) = @_;\n  if (defined $spectrum) {\n    $self->_import_spectrum($spectrum);\n  }\n  $spectrum = $self->{'_spectrum'};\n  return $spectrum;\n}\n\n\n=head2 assembly\n\n  Title   : assembly\n  Usage   : my @asm_list = $csp->assembly();\n  Function: Get a reference to the list of assembly object reference used to\n            make the contig spectrum object / Update the contig spectrum object\n            based on an assembly object.\n  Returns : array of Bio::Assembly::Scaffold\n  Args    : Bio::Assembly::Scaffold\n\n\nsub assembly {\n  my ($self, $assembly) = @_;\n  if (defined $assembly) {\n    $self->_import_assembly($assembly);\n  }\n  my @asm_list = @{$self->{'_assembly'}} if defined $self->{'_assembly'};\n  
return \\@asm_list;\n}\n\n=head2 drop_assembly\n\n  Title   : drop_assembly\n  Usage   : $csp->drop_assembly();\n  Function: Remove all assembly objects associated with a contig spectrum.\n            Assembly objects can be big. This method frees memory\n            when assembly information is no longer needed.\n  Returns : 1 for success, 0 for failure\n  Args    : none\n\n\nsub drop_assembly {\n  my ($self) = @_;\n  $self->{'_assembly'} = [];\n  return 1;\n}\n\n=head2 dissolve\n\n  Title   : dissolve\n  Usage   : $dissolved_csp->dissolve($mixed_csp, $seq_header);\n  Function: Dissolve a mixed contig spectrum for the set of sequences that\n            contain the specified header, i.e. determine the contribution of\n            these sequences to the mixed contig spectrum based on the assembly.\n            The mixed contig spectrum object must have been created based on one\n            (or several) assembly object(s). Additionally, min_overlap and\n            min_identity must have been set (either manually using min_overlap\n            and min_identity, or automatically by switching on the\n            eff_asm_params option).\n  Returns : 1 for success, 0 for failure\n  Args    : Bio::Assembly::Tools::ContigSpectrum reference\n            sequence header string\n\n\n\nsub dissolve {\n  my ($self, $mixed_csp, $seq_header) = @_;\n  $self->_import_dissolved_csp($mixed_csp, $seq_header);\n  return 1;\n}\n\n\n=head2 cross\n\n  Title   : cross\n  Usage   : $cross_csp->cross($mixed_csp);\n  Function: Calculate a cross contig spectrum based on a mixed contig spectrum.\n  Returns : 1 for success, 0 for failure\n  Args    : Bio::Assembly::Tools::ContigSpectrum reference\n\n\nsub cross {\n  my ($self, $mixed_csp) = @_;\n  $self->_import_cross_csp($mixed_csp);\n  return 1;\n}\n\n=head2 to_string\n\n  Title   : to_string\n  Usage   : my $csp_string = $csp->to_string;\n  Function: Convert the contig spectrum into a string (easy to print!!).\n  Returns : string\n  Args    : element 
separator (integer) [optional]\n              1 -> space-separated\n              2 -> tab-separated\n              3 -> newline-separated\n\n\nsub to_string {\n  my ($self, $element_separator) = @_;\n  return 0 if $self->{'_max_size'} == 0;\n  $element_separator ||= 1;\n  if ($element_separator == 1) {\n    $element_separator = ' ';\n  } elsif ($element_separator == 2) {\n    $element_separator = \"\\t\";\n  } elsif ($element_separator == 3) {\n    $element_separator = \"\\n\";\n  } else {\n    $self->throw(\"Unknown separator type '$element_separator'\\n\");\n  }\n  my $str = '';\n  for (my $q = 1 ; $q <= $self->{'_max_size'} ; $q++) {\n    my $val = 0;\n    if (exists $self->{'_spectrum'}{$q}) {\n      $val = $self->{'_spectrum'}{$q};\n    }\n    $str .= $val.$element_separator;\n  }\n  $str =~ s/\\s$//;\n  return $str;\n}\n\n\n=head2 add\n\n  Title   : add\n  Usage   : $csp->add($additional_csp);\n  Function: Add a contig spectrum to an existing one: sums the spectra, update\n            the number of sequences, number of repetitions, ...\n  Returns : 1 for success, 0 for failure\n  Args    : Bio::Assembly::Tools::ContigSpectrum object\n\n\nsub add {\n  my ($self, $csp) = @_;\n  # Sanity check\n  if( !ref $csp || ! $csp->isa('Bio::Assembly::Tools::ContigSpectrum') ) {\n        $self->throw(\"Unable to process non Bio::Assembly::Tools::ContigSpectrum \".\n        \"object [\".ref($csp).\"]\");\n  }\n  # Update overlap statistics\n  if ( $self->{'_eff_asm_params'} > 0 ) {\n    # Warnings\n    if ( $csp->{'_eff_asm_params'} == 0 ) {\n      $self->warn(\"The parent contig spectrum needs effective assembly \".\n        \"parameters (eff_asm_params = \".$self->{'_eff_asm_params'}.\") but the \".\n        \"child contig spectrum doesn't have them (eff_asm_params = \".\n        $csp->{'_eff_asm_params'}.\"). 
Skipping them...\");\n    } elsif ( $csp->{'_eff_asm_params'} != $self->{'_eff_asm_params'} ) {\n      $self->warn(\"The parent contig spectrum needs a different method for \".\n        \"detecting the effective assembly parameters (eff_asm_params = \".\n        $self->{'_eff_asm_params'}.\") than the one specified for the child \".\n        \"contig spectrum (eff_asm_params = \".$csp->{'_eff_asm_params'}.\"). \".\n        \"Ignoring the differences...\");\n    }\n    # Update existing stats\n    my $tot_num_overlaps = $csp->{'_nof_overlaps'} + $self->{'_nof_overlaps'};\n    $self->{'_min_overlap'} = $csp->{'_min_overlap'} if\n      defined $csp->{'_min_overlap'} && ( ! defined $self->{'_min_overlap'} ||\n      $csp->{'_min_overlap'} < $self->{'_min_overlap'} );\n    $self->{'_min_identity'} = $csp->{'_min_identity'} if\n      defined $csp->{'_min_identity'} && ( ! defined $self->{'_min_identity'} ||\n      $csp->{'_min_identity'} < $self->{'_min_identity'} );\n    if ($tot_num_overlaps != 0) {\n      $self->{'_avg_overlap'} =\n        ($csp->{'_avg_overlap'} * $csp->{'_nof_overlaps'}\n        + $self->{'_avg_overlap'} * $self->{'_nof_overlaps'})\n        / $tot_num_overlaps;\n      $self->{'_avg_identity'} =\n        ($csp->{'_avg_identity'} * $csp->{'_nof_overlaps'}\n        + $self->{'_avg_identity'} * $self->{'_nof_overlaps'})\n        / $tot_num_overlaps;\n    }\n    $self->{'_nof_overlaps'} = $tot_num_overlaps;\n  }\n  # Update sequence statistics\n  my $tot_nof_seq = $csp->{'_nof_seq'} + $self->{'_nof_seq'};\n  if (not $tot_nof_seq == 0) {\n    $self->{'_avg_seq_len'} = ($csp->{'_avg_seq_len'} * $csp->{'_nof_seq'} +\n      $self->{'_avg_seq_len'} * $self->{'_nof_seq'}) / $tot_nof_seq;\n  }\n  # Update spectrum (and nof_seq, max_size, and increment nof_rep by 1)\n  $self->_import_spectrum($csp->{'_spectrum'});\n  # Update nof_rep\n  $self->{'_nof_rep'}--;\n  $self->{'_nof_rep'} += $csp->{'_nof_rep'};\n  # Update list of assembly objects used\n  push 
@{$self->{'_assembly'}}, @{$csp->{'_assembly'}}\n    if defined $csp->{'_assembly'};\n  return 1;\n}\n\n\n=head2 average\n\n  Title   : average\n  Usage   : my $avg_csp = $csp->average([$csp1, $csp2, $csp3]);\n  Function: Average one contig spectrum or the sum of several contig spectra by\n            the number of repetitions\n  Returns : Bio::Assembly::Tools::ContigSpectrum\n  Args    : Bio::Assembly::Tools::ContigSpectrum array reference\n            eff_asm_params\n\n\nsub average {\n  my ($self, $list) = @_;\n  # Sanity check\n  if ( ! ref $list || ! ref $list eq 'ARRAY') {\n    $self->throw(\"Average takes an array reference but got [\".ref($list).\"]\");\n  }\n  # New average contig spectrum object\n  my $avg = Bio::Assembly::Tools::ContigSpectrum->new;\n  $avg->{'_eff_asm_params'} = 1;\n  \n  # Cycle through contig spectra\n  my $tot_nof_rep = 0;\n  for my $csp (@$list) {\n    # Sanity check\n    if (not $csp->isa('Bio::Assembly::Tools::ContigSpectrum')) {\n      $csp->throw(\"Unable to process non Bio::Assembly::Tools::ContigSpectrum \".\n        \"object [\".ref($csp).\"]\");\n    }\n    # Import contig spectrum\n    $avg->add($csp);\n  }\n  \n  # Average sum of contig spectra by number of repetitions\n  for (my $q = 1 ; $q <= $avg->{'_max_size'} ; $q++) {\n    $avg->{'_spectrum'}{$q} /= $avg->{'_nof_rep'}\n      if (defined $avg->{'_spectrum'}{$q});\n  }\n  # Average number of sequences\n  $avg->{'_nof_seq'} /= $avg->{'_nof_rep'};\n  # Average number of overlaps\n  $avg->{'_nof_overlaps'} /= $avg->{'_nof_rep'};\n  \n  return $avg;\n}\n\n\n=head2 score\n\n  Title   : score\n  Usage   : my $score = $csp->score();\n  Function: Score a contig spectrum (or cross-contig spectrum) such that the\n             higher the number of contigs (or cross-contigs) and the larger their \n             size, the higher the score.\n             Let n   : total number of sequences\n                 c_q : number of contigs of size q\n                 q   : number of sequence 
in a contig\n             We define: score = n/(n-1) * (X - 1/n)\n                  where X = sum ( c_q * q^2 ) / n**2\n             The score ranges from 0 (singlets only) to 1 (a single large contig)\n             It is possible to specify a value for the number of sequences to\n              assume in the contig spectrum.\n  Returns : contig score\n  Args    : number of total sequences to assume [optional]\n\n\nsub score {\n  my ($self, $nof_seqs) = @_;\n  # Main\n  my $score = 0;\n  my $n = $self->nof_seq;\n  if ( $n > 0 ) {\n    # Contig spectrum info\n    my $q_max = $self->max_size;\n    my $spec  = $self->spectrum;\n    # Adjust number of 1-contigs\n    if ( $nof_seqs ) {\n      $spec->{'1'} += $nof_seqs - $n;\n      $n = $nof_seqs;\n    }\n    # Calculate X\n    for my $q ( 1 .. $q_max ) {\n      if ( $spec->{$q} ) {\n        my $c_q = $spec->{$q};\n        $score += $c_q * $q ** 2;\n      }\n    }\n    $score /= $n ** 2; \n  }\n  # Rescale X to obtain the score\n  $score = $n/($n-1) * ($score - 1/$n);\n  return $score;\n}\n\n\n=head2 _naive_assembler\n\n  Title   : _naive_assembler\n  Usage   : \n  Function: Determines the contig spectrum (hash representation) of a subset of\n            sequences from a mixed contig spectrum by \"reassembling\" the\n            specified sequences only based on their position in the contig. This\n            naive assembly only verifies that the minimum overlap length and\n            percentage identity are respected. There is no actual alignment done\n  Returns : contig spectrum hash reference\n  Args    : Bio::Assembly::Contig\n            sequence ID array reference\n            minimum overlap length (integer) [optional]\n            minimum percentage identity (integer) [optional]\n\n\nsub _naive_assembler {\n  my ($self, $contig, $seqlist, $min_overlap, $min_identity) = @_;\n  # Sanity checks\n  if ( ! ref $seqlist || ! ref($seqlist) eq 'ARRAY') {\n    $self->throw('Expecting an array reference. 
Got ['.ref($seqlist).\"] \\n\");\n  }\n  my $max = scalar @$seqlist;\n  $self->throw(\"Expecting at least 2 sequences as input for _naive_assembler\")\n    if ($max < 2);\n  # Assembly\n  my %spectrum = (1 => 0);\n  my %overlap_map;\n  my %has_overlap;\n  # Map what sequences overlap with what sequences\n  for (my $i = 0 ; $i < $max-1 ; $i++) {\n    # query sequence\n    my $qseqid = $$seqlist[$i];\n    my $qseq   = $contig->get_seq_by_name($qseqid);\n    my $is_singlet = 1;\n    for (my $j = $i+1 ; $j < $max ; $j++) {\n      # target sequence\n      my $tseqid = $$seqlist[$j];\n      my $tseq = $contig->get_seq_by_name($tseqid);\n      # try to align sequences\n      my ($aln, $overlap, $identity)\n        = $self->_overlap_alignment($contig, $qseq, $tseq, $min_overlap,\n        $min_identity);\n      # if there is no valid overlap, go to next sequence\n      next if ! defined $aln;\n      # the overlap is valid\n      $is_singlet = 0;\n      push @{$overlap_map{$qseqid}}, $tseqid;\n      $has_overlap{$tseqid} = 1;\n      $has_overlap{$qseqid} = 1;\n    }\n    # check if sequence is in previously seen overlap\n    if (exists $has_overlap{$qseqid}) {\n      $is_singlet = 0;\n    }\n    if ($is_singlet == 1) {\n      $spectrum{1}++;\n    }\n  }\n  # take care of last sequence\n  my $last_is_singlet = 1;\n  if (exists $has_overlap{$$seqlist[$max-1]}) {\n    $last_is_singlet = 0;\n  }\n  if ($last_is_singlet == 1) {\n    $spectrum{1}++;\n  }\n  # Parse overlap map\n  for my $seqid (@$seqlist) {\n    # list of sequences that should go in the contig\n    next if not exists $overlap_map{$seqid};\n    my @overlist = @{$overlap_map{$seqid}};\n    for (my $j = 0 ; $j < scalar(@overlist) ; $j++) {\n      my $otherseqid = $overlist[$j];\n      if (exists $overlap_map{$otherseqid}) {\n        push @overlist, @{$overlap_map{$otherseqid}};\n        delete $overlap_map{$otherseqid};\n      }\n    }\n    # remove duplicates from list\n    @overlist = sort @overlist;\n    for (my 
$j = 0 ; $j < scalar(@overlist)-1 ; $j++) {\n      if ( $overlist[$j] eq $overlist[$j+1] ) {\n        splice @overlist, $j, 1;\n        $j--;\n      }\n    }\n    # update spectrum with size of contig\n    my $qsize = scalar(@overlist) + 1;\n    if (defined $spectrum{$qsize}) {\n      $spectrum{$qsize}++;\n    } else {\n      $spectrum{$qsize} = 1;\n    }\n  }\n  return \\%spectrum;\n}\n\n\n=head2 _new_from_assembly\n\n  Title   : _new_from_assembly\n  Usage   : \n  Function: Creates a new contig spectrum object based solely on the result of \n            an assembly\n  Returns : Bio::Assembly::Tools::ContigSpectrum\n  Args    : Bio::Assembly::Scaffold\n\n\nsub _new_from_assembly {\n  # Create new contig spectrum object based purely on what we can get from the\n  # assembly object\n  my ($self, $assemblyobj) = @_;\n  my $csp = Bio::Assembly::Tools::ContigSpectrum->new();\n  # 1: Set id\n  $csp->{'_id'} = $assemblyobj->id;\n  # 2: Set overlap statistics: nof_overlaps, min_overlap, avg_overlap,\n  #  min_identity and avg_identity\n  $csp->{'_eff_asm_params'} = $self->{'_eff_asm_params'};\n  if ($csp->{'_eff_asm_params'} > 0) {\n     my ($nover, $minl, $avgl, $minid, $avgid)\n       = $csp->_get_overlap_stats($assemblyobj);\n     $csp->{'_min_overlap'}  = $minl;\n     $csp->{'_min_identity'} = $minid;\n     $csp->{'_avg_overlap'}  = $avgl;\n     $csp->{'_avg_identity'} = $avgid;\n     $csp->{'_nof_overlaps'} = $nover;\n  }\n  # 3: Set sequence statistics: nof_seq and avg_seq_len\n  my ($nseq, $avgseql) = $self->_get_seq_stats($assemblyobj);\n  $csp->{'_avg_seq_len'} = $avgseql;\n  $csp->{'_nof_seq'}     = $nseq;\n  # 4: Set the spectrum: spectrum and max_size\n  for my $contigobj ($assemblyobj->all_contigs) {\n    my $size = $contigobj->num_sequences;\n    if (defined $csp->{'_spectrum'}{$size}) {\n      $csp->{'_spectrum'}{$size}++;\n    } else {\n      $csp->{'_spectrum'}{$size} = 1;\n    }\n    $csp->{'_max_size'} = $size if $size > $csp->{'_max_size'};\n  }\n  my 
$nof_singlets = $assemblyobj->get_nof_singlets();\n  if (defined $nof_singlets) {\n    $csp->{'_spectrum'}{1} += $nof_singlets;\n    $csp->{'_max_size'} = 1 if $nof_singlets >= 1 && $csp->{'_max_size'} < 1;\n  }\n  # 5: Set list of assembly objects used\n  push @{$csp->{'_assembly'}}, $assemblyobj;\n  # 6: Set number of repetitions\n  $csp->{'_nof_rep'} = 1;\n  return $csp;\n}\n\n\n\n=head2 _new_dissolved_csp\n\n  Title   : \n  Usage   : create a dissolved contig spectrum object\n  Function: \n  Returns : \n  Args    : \n\n\n\nsub _new_dissolved_csp {\n  my ($self, $mixed_csp, $seq_header) = @_;\n  # Sanity checks on the mixed contig spectrum\n\n  # min_overlap and min_identity must be specified if there are some overlaps\n  # in the mixed contig\n  unless ($mixed_csp->{'_nof_overlaps'} == 0) {\n    unless ( defined $self->{'_min_overlap'} || \n      defined $mixed_csp->{'_min_overlap'} ) {\n      $self->throw(\"min_overlap must be defined in the dissolved contig spectrum\".\n        \" or mixed contig spectrum to dissolve a contig\");\n    }\n    unless ( defined $self->{'_min_identity'} ||\n      defined $mixed_csp->{'_min_identity'} ) {\n      $self->throw(\"min_identity must be defined in the dissolved contig spectrum\".\n        \" or mixed contig spectrum\");\n    }\n  }\n  \n  # there must be at least one assembly in mixed contig spectrum\n  if (!defined $mixed_csp->{'_assembly'} ||\n      scalar @{$mixed_csp->{'_assembly'}} < 1) {\n    $self->throw(\"The mixed contig spectrum must be based on at least one\n    assembly\");\n  }\n\n  # New dissolved contig spectrum object\n  my $dissolved = Bio::Assembly::Tools::ContigSpectrum->new();\n  \n  # take parent attributes if existent or mixed attributes otherwise\n  if ($self->{'_eff_asm_params'}) {\n    $dissolved->{'_eff_asm_params'} = $self->{'_eff_asm_params'};\n  } else {\n    $dissolved->{'_eff_asm_params'} = $mixed_csp->{'_eff_asm_params'};\n  }\n  if ($self->{'_min_overlap'} && $self->{'_min_identity'}) 
{\n    ( $dissolved->{'_min_overlap'}, $dissolved->{'_min_identity'} ) = \n      ( $self->{'_min_overlap'}, $self->{'_min_identity'} );\n  } else {\n    ( $dissolved->{'_min_overlap'}, $dissolved->{'_min_identity'} ) = \n      ( $mixed_csp->{'_min_overlap'}, $mixed_csp->{'_min_identity'} );\n  }\n  \n  # Dissolve each assembly\n  for my $assembly (@{$mixed_csp->{'_assembly'}}) {\n    # Dissolve this assembly for the given sequences\n    my %asm_spectrum = (1 => 0);\n    my %good_seqs;\n    # For each contig\n    for my $contig ($assembly->all_contigs) {\n      # Get good sequences\n      my @contig_seqs;\n      for my $seq ($contig->each_seq) {\n        my $seq_id = $seq->id;\n        # get sequence origin\n        next unless $seq_id =~ m/^$seq_header\\|/;\n        # add it to hash\n        push @contig_seqs, $seq_id;\n        $good_seqs{$seq_id} = 1;\n      }\n      # Update spectrum\n      my $size = scalar @contig_seqs;\n      if ($size == 0) {\n        next;\n      } elsif ($size == 1) {\n        $asm_spectrum{1}++;\n      } elsif ($size > 1) {\n        # Reassemble good sequences\n        my $contig_spectrum = $dissolved->_naive_assembler(\n          $contig, \\@contig_seqs, $dissolved->{'_min_overlap'},\n          $dissolved->{'_min_identity'});\n        # update spectrum\n        for my $qsize (keys %$contig_spectrum) {\n          $asm_spectrum{$qsize} += $$contig_spectrum{$qsize};\n        }\n      } else {\n        $self->throw(\"The size is not valid... 
how could that happen?\");\n      }\n    }\n    # For each singlet\n    for my $singlet ($assembly->all_singlets) {\n      my $seq_id = $singlet->seqref->id;\n      # get sequence origin\n      next unless $seq_id =~ m/^$seq_header\\|/;\n      # add it to hash\n      $good_seqs{$seq_id} = 1;\n      # update spectrum\n      $asm_spectrum{1}++;\n    }\n    # Update spectrum\n    $dissolved->_import_spectrum(\\%asm_spectrum);\n    # Update nof_rep\n    $dissolved->{'_nof_rep'}--;\n    $dissolved->{'_nof_rep'} += $mixed_csp->{'_nof_rep'};\n\n    # Get sequence stats\n    my ($nseq, $avgseql) = $dissolved->_get_seq_stats($assembly, \\%good_seqs);\n    $dissolved->{'_avg_seq_len'} = $avgseql;\n    $dissolved->{'_nof_seq'}     = $nseq;\n  \n    # Get eff_asm_param for these sequences\n    if ($dissolved->{'_eff_asm_params'} > 0) {\n      my ($nover, $minl, $avgl, $minid, $avgid)\n        = $dissolved->_get_overlap_stats($assembly, \\%good_seqs);\n      $dissolved->{'_min_overlap'}  = $minl;\n      $dissolved->{'_min_identity'} = $minid;\n      $dissolved->{'_avg_overlap'}  = $avgl;\n      $dissolved->{'_avg_identity'} = $avgid;\n      $dissolved->{'_nof_overlaps'} = $nover;\n    }\n\n  }\n  return $dissolved;\n}\n\n\n=head2 _new_cross_csp\n\n  Title   : \n  Usage   : \n  Function: create a cross contig spectrum object\n  Returns : \n  Args    : \n\n\n\nsub _new_cross_csp {\n  my ($self, $mixed_csp) = @_;\n  # Sanity check on the mixed contig spectrum\n  # There must be at least one assembly\n  if (!defined $mixed_csp->{'_assembly'} ||\n      scalar @{$mixed_csp->{'_assembly'}} < 1) {\n    $self->throw(\"The mixed contig spectrum must be based on at least one \".\n    \"assembly.\");\n  }\n  \n  # New dissolved contig spectrum object\n  my $cross = Bio::Assembly::Tools::ContigSpectrum->new();\n  my %spectrum = (1 => 0);\n  \n  # Take parent or mixed attributes\n  if ($self->{'_eff_asm_params'}) {\n    $cross->{'_eff_asm_params'} = $self->{'_eff_asm_params'};\n  } else {\n  
  $cross->{'_eff_asm_params'} = $mixed_csp->{'_eff_asm_params'};\n  }\n  if ($self->{'_min_overlap'} && $self->{'_min_identity'}) {\n    ( $cross->{'_min_overlap'}, $cross->{'_min_identity'} ) = \n      ( $self->{'_min_overlap'}, $self->{'_min_identity'} );\n  } else {\n    ( $cross->{'_min_overlap'}, $cross->{'_min_identity'} ) = \n      ( $mixed_csp->{'_min_overlap'}, $mixed_csp->{'_min_identity'} );\n  }\n  \n  # Get cross contig spectrum for each assembly\n  for my $assembly (@{$mixed_csp->{'_assembly'}}) {\n    # Go through contigs and skip the pure ones\n    my %good_seqs;\n    for my $contig ($assembly->all_contigs) {\n      # Get origins\n      my @seq_origins;\n      my @seq_ids;\n      for my $seq ($contig->each_seq) {\n        # current sequence origin\n        my $seq_id = $seq->id;\n        $seq_id =~ m/^(.+)\\|/;\n        my $seq_header = $1;\n        $self->warn(\"Sequence $seq_id does not seem to have a header. Skipping\".\n          \" it...\") if not defined $seq_header;\n        $seq_header ||= '';\n        push @seq_origins, $seq_header;\n        push @seq_ids, $seq_id;\n      }\n      my $qsize = scalar(@seq_ids);\n      my @origins = sort { $a cmp $b } @seq_origins;\n      my $size = scalar(@origins);\n      for (my $i = 1 ; $i < $size ; $i++) {\n        if ($origins[$i] eq $origins[$i-1]) {\n          splice @origins, $i, 1;\n          $i--;\n          $size--;\n        }\n      }\n      # Update cross-contig number in spectrum\n      if ($size > 1) { # cross-contig detected\n        # update good sequences\n        for my $seq_id (@seq_ids) {\n          $good_seqs{$seq_id} = 1;\n        }\n        # update number of cross q-contigs in spectrum\n        if (defined $spectrum{$qsize}) {\n          $spectrum{$qsize}++;\n        } else {\n          $spectrum{$qsize} = 1;\n        }\n      }\n      # Update number of cross 1-contigs\n      if ($size > 1) { # cross-contig detected\n        for my $origin (@origins) {\n          # sequences to 
use\n          my @ids;\n          for (my $i = 0 ; $i < $qsize ; $i++) {\n            my $seq_origin = $seq_origins[$i];\n            my $seq_id = $seq_ids[$i];\n            push @ids, $seq_id if $seq_origin eq $origin;\n          }\n          if (scalar @ids == 1) {\n            $spectrum{1}++;\n          } elsif (scalar @ids > 1) {\n            my $contig_spectrum = $cross->_naive_assembler(\n              $contig, \\@ids, $cross->{'_min_overlap'},\n              $cross->{'_min_identity'});\n            $spectrum{1} += $$contig_spectrum{1};\n          } else {\n            $self->throw(\"The size is <= 0. How could such a thing happen?\");\n          }\n        }\n      }\n    }\n    # Get sequence stats\n    my ($nseq, $avgseql) = $cross->_get_seq_stats($assembly, \\%good_seqs);\n    $cross->{'_avg_seq_len'} = $avgseql;\n    $cross->{'_nof_seq'}     = $nseq;\n    # Get eff_asm_param for these sequences\n    if ($cross->{'_eff_asm_params'} > 0) {\n      my ($nover, $minl, $avgl, $minid, $avgid)\n        = $cross->_get_overlap_stats($assembly, \\%good_seqs);\n      $cross->{'_min_overlap'}  = $minl;\n      $cross->{'_min_identity'} = $minid;\n      $cross->{'_avg_overlap'}  = $avgl;\n      $cross->{'_avg_identity'} = $avgid;\n      $cross->{'_nof_overlaps'} = $nover;\n    }\n  }\n  \n  $cross->_import_spectrum(\\%spectrum);\n  # Update nof_rep\n  $cross->{'_nof_rep'}--;\n  $cross->{'_nof_rep'} += $mixed_csp->{'_nof_rep'};\n  \n  return $cross;\n}\n\n=head2 _import_assembly\n\n  Title   : _import_assembly\n  Usage   : $csp->_import_assembly($assemblyobj);\n  Function: Update a contig spectrum object based on an assembly object\n  Returns : 1 for success, 0 for error\n  Args    : Bio::Assembly::Scaffold assembly 
object","parameters":[{"label":"$self"},{"label":"$assemblyobj"}]},"line":1259,"kind":12,"range":{"start":{"character":0,"line":1259},"end":{"line":1271,"character":9999}}},{"detail":"($self,$spectrum)","definition":"sub","name":"_import_spectrum","containerName":"main::","children":[{"definition":"my","line":1287,"localvar":"my","containerName":"_import_spectrum","kind":13,"name":"$self"},{"line":1287,"name":"$spectrum","kind":13,"containerName":"_import_spectrum"},{"kind":13,"containerName":"_import_spectrum","name":"$spectrum","line":1289},{"name":"$spectrum","kind":13,"containerName":"_import_spectrum","line":1289},{"line":1290,"kind":13,"containerName":"_import_spectrum","name":"$self"},{"name":"throw","kind":12,"containerName":"_import_spectrum","line":1290},{"name":"$spectrum","kind":13,"containerName":"_import_spectrum","line":1291},{"definition":"my","containerName":"_import_spectrum","localvar":"my","kind":13,"name":"$size","line":1295},{"line":1297,"kind":13,"containerName":"_import_spectrum","name":"$self"},{"name":"$size","kind":13,"containerName":"_import_spectrum","line":1297},{"line":1298,"name":"$self","containerName":"_import_spectrum","kind":13},{"line":1298,"name":"$size","kind":13,"containerName":"_import_spectrum"},{"line":1298,"kind":13,"containerName":"_import_spectrum","name":"$size"},{"line":1300,"containerName":"_import_spectrum","kind":13,"name":"$self"},{"name":"$size","kind":13,"containerName":"_import_spectrum","line":1300},{"containerName":"_import_spectrum","kind":13,"name":"$size","line":1300},{"line":1303,"kind":13,"containerName":"_import_spectrum","name":"$self"},{"name":"$size","kind":13,"containerName":"_import_spectrum","line":1303},{"containerName":"_import_spectrum","kind":13,"name":"$size","line":1303},{"name":"$self","kind":13,"containerName":"_import_spectrum","line":1305},{"line":1305,"name":"$size","containerName":"_import_spectrum","kind":13},{"line":1305,"name":"$size","kind":13,"containerName":"_import_spectrum"},{"l
ine":1305,"kind":13,"containerName":"_import_spectrum","name":"$self"},{"kind":13,"containerName":"_import_spectrum","name":"$self","line":1309},{"name":"$self","containerName":"_import_spectrum","kind":13,"line":1309}],"signature":{"parameters":[{"label":"$self"},{"label":"$spectrum"}],"documentation":"__END__\n#\n# BioPerl module for Bio::Assembly::Tools::ContigSpectrum\n#\n# Copyright by Florent Angly\n#\n# You may distribute this module under the same terms as Perl itself\n#\n# POD documentation - main docs before the code\n\n=head1 NAME\n\nBio::Assembly::Tools::ContigSpectrum - create and manipulate contig spectra\n\n=head1 SYNOPSIS\n\n  # Simple contig spectrum creation\n  my $csp1 = Bio::Assembly::Tools::ContigSpectrum->new(\n    -id       => 'csp1',\n    -spectrum => { 1 => 10,\n                   2 => 2,\n                   3 => 1 } );\n\n  # ...or another way to create a simple contig spectrum\n  my $csp2 = Bio::Assembly::Tools::ContigSpectrum->new;\n  $csp2->id('csp2');\n  $csp2->spectrum({ 1 => 20, 2 => 1, 4 => 1 });\n\n  # Get some information\n  print \"This is contig spectrum \".$csp->id.\"\\n\";\n  print \"It contains \".$csp->nof_seq.\" sequences\\n\";\n  print \"The largest contig has \".$csp->max_size.\" sequences\\n\";\n  print \"The spectrum is: \".$csp->to_string($csp->spectrum).\"\\n\";\n\n  # Let's add the contig spectra\n  my $summed_csp = Bio::Assembly::Tools::ContigSpectrum->new;\n  $summed_csp->add($csp1);\n  $summed_csp->add($csp2);\n  print \"The summed contig spectrum is \".$summed_csp->to_string.\"\\n\";\n\n  # Make an average\n  my $avg_csp = Bio::Assembly::Tools::ContigSpectrum->new;\n  $avg_csp = $avg_csp->average([$csp1, $csp2]);\n  print \"The average contig spectrum is \".$avg_csp->to_string.\"\\n\";\n\n  # Get a contig spectrum from an assembly\n  my $from_assembly = Bio::Assembly::Tools::ContigSpectrum->new(\n    -assembly       => $assembly_object,\n    -eff_asm_params => 1);\n  print \"The contig spectrum from assembly is 
\".$from_assembly->to_string.\"\\n\";\n\n  # Report advanced information (possible because eff_asm_params = 1)\n  print \"Average sequence length: \".$from_assembly->avg_seq_length.\" bp\\n\";\n  print \"Minimum overlap length: \".$from_assembly->min_overlap.\" bp\\n\";\n  print \"Average overlap length: \".$from_assembly->avg_overlap.\" bp\\n\";\n  print \"Minimum overlap match: \".$from_assembly->min_identity.\" %\\n\";\n  print \"Average overlap match: \".$from_assembly->avg_identity.\" %\\n\";\n\n  # Assuming the assembly object contains sequences from several different\n  # metagenomes, we have a mixed contig spectrum from which a cross contig\n  # spectrum and dissolved contig spectra can be obtained\n  my $mixed_csp = $from_assembly;\n\n  # Calculate a dissolved contig spectrum\n  my $meta1_dissolved = Bio::Assembly::Tools::ContigSpectrum->new(\n    -dissolve => [$mixed_csp, 'metagenome1'] );\n  my $meta2_dissolved = Bio::Assembly::Tools::ContigSpectrum->new(\n    -dissolve => [$mixed_csp, 'metagenome2'] );\n  print \"The dissolved contig spectra are:\\n\".\n    $meta1_dissolved->to_string.\"\\n\".\n    $meta2_dissolved->to_string.\"\\n\";\n\n  # Determine a cross contig spectrum\n  my $cross_csp = Bio::Assembly::Tools::ContigSpectrum->new(\n    -cross => $mixed_csp );\n  print \"The cross contig spectrum is \".$cross_csp->to_string.\"\\n\";\n\n  # Score a contig spectrum (the more abundant the contigs and the larger their\n  # size, the larger the score)\n\n\n=head1 DESCRIPTION\n\nThe Bio::Assembly::Tools::ContigSpectrum Perl module enables to\nmanually create contig spectra, import them from assemblies,\nmanipulate them, transform between different types of contig spectra\nand output them.\n\nBio::Assembly::Tools::ContigSpectrum is a module to create, manipulate\nand output contig spectra, assembly-derived data used in metagenomics\n(community genomics) for diversity estimation.\n\n=head2 Background\n\nA contig spectrum is the count of the number of 
contigs of different\nsize in an assembly. For example, the contig spectrum [100 5 1 0 0\n...] means that there were 100 singlets (1-contigs), 5 contigs of 2\nsequences (2-contigs), 1 contig of 3 sequences (3-contig) and no\nlarger contigs.\n\nAn assembly can be produced from a mixture of sequences from different\nmetagenomes. The contig obtained from this assembly is a mixed contig\nspectrum. The contribution of each metagenome in this mixed contig\nspectrum can be obtained by determining a dissolved contig spectrum.\n\nFinally, based on a mixed contig spectrum, a cross contig spectrum can\nbe determined. In a cross contig spectrum, only contigs containing\nsequences from different metagenomes are kept; \"pure\" contigs are\nexcluded. Additionally, the total number of singletons (1-contigs)\nfrom each region that assembles with any fragments from other regions\nis the number of 1-contigs in the cross contig spectrum.\n\n=head2 Implementation\n\nThe simplest representation of a contig spectrum is as a hash\nrepresentation where the key is the contig size (number of sequences\nmaking up the contig) and the value the number of contigs of this\nsize.\n\nIn fact, it is useful to have more information associated with the\ncontig spectrum, hence the Bio::Assembly::Tools::ContigSpectrum module\nimplements an object containing a contig spectrum hash and additional\ninformation. 
The get/set methods to access them are:\n\n    id              contig spectrum ID\n    nof_seq         number of sequences\n    nof_rep         number of repetitions (assemblies) used\n    max_size        size of (number of sequences in) the largest contig\n    nof_overlaps    number of overlaps\n    min_overlap     minimum overlap length for building a contig\n    min_identity    minimum sequence identity over the overlap length\n    avg_overlap     average overlap length\n    avg_identity    average overlap identity\n    avg_seq_length  average sequence length\n    eff_asm_params  effective assembly parameters\n    spectrum        hash representation of a contig spectrum\n\n  Operations on the contig spectra:\n\n    to_string       create a string representation of the spectrum\n    spectrum        import a hash contig spectrum\n    assembly        determine a contig spectrum from an assembly\n    dissolve        calculate a dissolved contig spectrum (based on assembly)\n    cross           produce a cross contig spectrum (based on assembly)\n    add             add a contig spectrum to an existing one\n    average         make an average of several contig spectra\n\nWhen using operations that rely on knowing \"where\" (from what\nmetagenomes) a sequence came from (i.e. when creating a dissolved or\ncross contig spectrum), make sure that the sequences used for the\nassembly have a name header, e.g.  E<gt>metagenome1|seq1,\nE<gt>metagenome2|seq1, ...\n\n=head1 FEEDBACK\n\n=head2 Mailing Lists\n\nUser feedback is an integral part of the evolution of this and other\nBioperl modules. 
Send your comments and suggestions preferably to the\nBioperl mailing lists  Your participation is much appreciated.\n\n  bioperl-l@bioperl.org                  - General discussion\n  http://bioperl.org/wiki/Mailing_lists  - About the mailing lists\n\n\n=head2 Support \n\nPlease direct usage questions or support issues to the mailing list:\n\nI<bioperl-l@bioperl.org>\n\nrather than to the module maintainer directly. Many experienced and \nreponsive experts will be able look at the problem and quickly \naddress it. Please include a thorough description of the problem \nwith code and data examples if at all possible.\n\n=head2 Reporting Bugs\n\nReport bugs to the BioPerl bug tracking system to help us keep track\nthe bugs and their resolution. Bug reports can be submitted via email\nor the web:\n\n  bioperl-bugs@bio.perl.org\n  http://bugzilla.bioperl.org/\n\n=head1 AUTHOR - Florent E Angly\n\nEmail florent_dot_angly_at_gmail_dot_com\n\n=head1 APPENDIX\n\nThe rest of the documentation details each of the object\nmethods. 
Internal methods are usually preceded with a \"_\".\n\n\npackage Bio::Assembly::Tools::ContigSpectrum;\n\nuse strict;\n\nuse Bio::Root::Root;\nuse Bio::Assembly::Scaffold;\nuse Bio::SimpleAlign;\nuse Bio::LocatableSeq;\nuse Bio::Align::PairwiseStatistics;\n\nuse base 'Bio::Root::Root';\n\n\n=head2 new\n\n  Title   : new\n  Usage   : my $csp = Bio::Assembly::Tools::ContigSpectrum->new();\n              or\n            my $csp = Bio::Assembly::Tools::ContigSpectrum->new(\n              -id => 'some_name',\n              -spectrum =>  { 1 => 90 , 2 => 3 , 4 => 1 },\n            );\n              or\n            my $csp = Bio::Assembly::Tools::ContigSpectrum->new(\n              -assembly =>  $assembly_obj\n            );\n  Function: create a new contig spectrum object\n  Returns : reference to a contig spectrum object\n  Args    : none\n\n\nsub new {\n  my ($class, @args) = @_;\n  my $self = $class->SUPER::new(@args);\n  my ( $id, $nof_seq, $nof_rep, $max_size, $nof_overlaps, $min_overlap,\n    $min_identity, $avg_overlap, $avg_identity, $avg_seq_len, $spectrum,\n    $assembly, $eff_asm_params, $dissolve, $cross) = $self->_rearrange( [qw(ID\n    NOF_SEQ NOF_REP MAX_SIZE NOF_OVERLAPS MIN_OVERLAP MIN_IDENTITY AVG_OVERLAP\n    AVG_IDENTITY AVG_SEQ_LEN SPECTRUM ASSEMBLY EFF_ASM_PARAMS DISSOLVE CROSS)],\n    @args );\n\n  # First set up some defauts\n  $self->{'_id'}             = 'NoName';\n  $self->{'_nof_seq'}        = 0;\n  $self->{'_nof_rep'}        = 0;\n  $self->{'_max_size'}       = 0;\n  $self->{'_nof_overlaps'}   = 0;\n  $self->{'_min_overlap'}    = undef;\n  $self->{'_min_identity'}   = undef;\n  $self->{'_avg_overlap'}    = 0;\n  $self->{'_avg_identity'}   = 0;\n  $self->{'_avg_seq_len'}    = 0;\n  $self->{'_eff_asm_params'} = 0;\n  $self->{'_spectrum'}       = {1 => 0};  # contig spectrum hash representation\n  $self->{'_assembly'}       = []; # list of assembly objects used\n\n  # Then, according to user desires, override defaults\n  $self->{'_id'}           
  = $id             if (defined $id);\n  $self->{'_nof_seq'}        = $nof_seq        if (defined $nof_seq);\n  $self->{'_nof_rep'}        = $nof_rep        if (defined $nof_rep);\n  $self->{'_max_size'}       = $max_size       if (defined $max_size);\n  $self->{'_nof_overlaps'}   = $nof_overlaps   if (defined $nof_overlaps);\n  $self->{'_min_overlap'}    = $min_overlap    if (defined $min_overlap);\n  $self->{'_avg_overlap'}    = $avg_overlap    if (defined $avg_overlap);\n  $self->{'_min_identity'}   = $min_identity   if (defined $min_identity);\n  $self->{'_avg_identity'}   = $avg_identity   if (defined $avg_identity);\n  $self->{'_avg_seq_len'}    = $avg_seq_len    if (defined $avg_seq_len);\n  $self->{'_eff_asm_params'} = $eff_asm_params if (defined $eff_asm_params);\n\n  # Finally get stuff that can be gotten in an automated way\n  $self->_import_spectrum($spectrum) if defined($spectrum);\n  $self->_import_assembly($assembly) if defined($assembly);\n  if (defined($dissolve)) {\n    my ($mixed_csp, $header) = (@$dissolve[0], @$dissolve[1]);\n    $self->_import_dissolved_csp($mixed_csp, $header);\n  }\n  $self->_import_cross_csp($cross)   if defined($cross);\n\n  return $self;\n}\n\n\n=head2 id\n\n  Title   : id\n  Usage   : $csp->id\n  Function: get/set contig spectrum id\n  Returns : string\n  Args    : string [optional]\n\n\nsub id {\n  my ($self, $id) = @_;\n  if (defined $id) {\n    $self->{'_id'} = $id;\n  }\n  $id = $self->{'_id'};\n  return $id;\n}\n\n\n=head2 nof_seq\n\n  Title   : nof_seq\n  Usage   : $csp->nof_seq\n  Function: get/set the number of sequences making up the contig spectrum\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub nof_seq {\n  my ($self, $nof_seq) = @_;\n  if (defined $nof_seq) {\n    $self->throw(\"The number of sequences must be strictly positive. 
Got \".\n      \"'$nof_seq'\") if $nof_seq < 1;\n    $self->{'_nof_seq'} = $nof_seq;\n  }\n  $nof_seq = $self->{'_nof_seq'};\n  return $nof_seq;\n}\n\n\n=head2 nof_rep\n\n  Title   : nof_rep\n  Usage   : $csp->nof_rep\n  Function: Get/Set the number of repetitions (assemblies) used to create the \n            contig spectrum\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub nof_rep {\n  my ($self, $nof_rep) = @_;\n  if (defined $nof_rep) {\n    $self->throw(\"The number of repetitions must be strictly positive. Got \".\n      \"'$nof_rep'\") if $nof_rep < 1;\n    $self->{'_nof_rep'} = $nof_rep;\n  }\n  $nof_rep = $self->{'_nof_rep'};\n  return $nof_rep;\n}\n\n\n=head2 max_size\n\n  Title   : max_size\n  Usage   : $csp->max_size\n  Function: get/set the size of (number of sequences in) the largest contig\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub max_size {\n  my ($self, $max_size) = @_;\n  if (defined $max_size) {\n    $self->throw(\"The contig maximum size must be strictly positive. Got \".\n      \"'$max_size'\") if $max_size < 1;\n    $self->{'_max_size'} = $max_size;\n  }\n  $max_size = $self->{'_max_size'};\n  return $max_size;\n}\n\n\n=head2 nof_overlaps\n\n  Title   : nof_overlaps\n  Usage   : $csp->nof_overlaps\n  Function: Get/Set the number of overlaps in the assembly.\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub nof_overlaps {\n  my ($self, $nof_overlaps) = @_;\n  if (defined $nof_overlaps) {\n    $self->throw(\"The number of overlaps must be strictly positive. 
Got \".\n      \"'$nof_overlaps'\") if $nof_overlaps < 1;\n    $self->{'_nof_overlaps'} = $nof_overlaps;\n  }\n  $nof_overlaps = $self->{'_nof_overlaps'};\n  return $nof_overlaps;\n}\n\n\n=head2 min_overlap\n\n  Title   : min_overlap\n  Usage   : $csp->min_overlap\n  Function: get/set the assembly minimum overlap length\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub min_overlap {\n  my ($self, $min_overlap) = @_;\n  if (defined $min_overlap) {\n    $self->throw(\"The minimum of overlap length must be strictly positive. Got\".\n      \" '$min_overlap'\") if $min_overlap < 1;\n    $self->{'_min_overlap'} = $min_overlap;\n  }\n  $min_overlap = $self->{'_min_overlap'};\n  return $min_overlap;\n}\n\n\n=head2 avg_overlap\n\n  Title   : avg_overlap\n  Usage   : $csp->avg_overlap\n  Function: get/set the assembly average overlap length\n  Returns : decimal\n  Args    : decimal [optional]\n\n\nsub avg_overlap {\n  my ($self, $avg_overlap) = @_;\n  if (defined $avg_overlap) {\n    $self->throw(\"The average overlap length must be strictly positive. Got \".\n      \"'$avg_overlap'\") if $avg_overlap < 1;\n    $self->{'_avg_overlap'} = $avg_overlap;\n  }\n  $avg_overlap = $self->{'_avg_overlap'};\n  return $avg_overlap;\n}\n\n\n=head2 min_identity\n\n  Title   : min_identity\n  Usage   : $csp->min_identity\n  Function: get/set the assembly minimum overlap identity percent\n  Returns : 0 < decimal < 100\n  Args    : 0 < decimal < 100 [optional]\n\n\nsub min_identity {\n  my ($self, $min_identity) = @_;\n  if (defined $min_identity) {\n    $self->throw(\"The minimum overlap percent identity must be strictly \".\n      \"positive. 
Got '$min_identity'\") if $min_identity < 1;\n    $self->{'_min_identity'} = $min_identity;\n  }\n  $min_identity = $self->{'_min_identity'};\n  return $min_identity;\n}\n\n\n=head2 avg_identity\n\n  Title   : avg_identity\n  Usage   : $csp->avg_identity\n  Function: get/set the assembly average overlap identity percent\n  Returns : 0 < decimal < 100\n  Args    : 0 < decimal < 100 [optional]\n\n\nsub avg_identity {\n  my ($self, $avg_identity) = @_;\n  if (defined $avg_identity) {\n    $self->throw(\"The average overlap percent identity must be strictly \".\n      \"positive. Got '$avg_identity'\") if $avg_identity < 1;\n    $self->{'_avg_identity'} = $avg_identity;\n  }\n  $avg_identity = $self->{'_avg_identity'};\n  return $avg_identity;\n}\n\n\n=head2 avg_seq_len\n\n  Title   : avg_seq_len\n  Usage   : $csp->avg_seq_len\n  Function: get/set the assembly average sequence length\n  Returns : avg_seq_len\n  Args    : real [optional]\n\n\nsub avg_seq_len {\n  my ($self, $avg_seq_len) = @_;\n  if (defined $avg_seq_len) {\n    $self->throw(\"The average sequence length must be strictly positive. Got \".\n      \"'$avg_seq_len'\") if $avg_seq_len < 1;\n    $self->{'_avg_seq_len'} = $avg_seq_len;\n  }\n  $avg_seq_len = $self->{'_avg_seq_len'};\n  return $avg_seq_len;\n}\n\n\n=head2 eff_asm_params\n\n  Title   : eff_asm_params\n  Usage   : $csp->eff_asm_params(1)\n  Function: Get/set the effective assembly parameters option. It defines if the\n            effective assembly parameters should be determined when a contig\n            spectrum based or derived from an assembly is calulated. 
The\n            effective assembly parameters include avg_seq_length, nof_overlaps,\n            min_overlap, avg_overlap, min_identity and avg_identity.\n            1 = get them, 0 = don't.\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub eff_asm_params {\n  my ($self, $eff_asm_params) = @_;\n  if (defined $eff_asm_params) {\n    $self->throw(\"eff_asm_params can only take values 0 or 1. Input value was \".\n      \"'$eff_asm_params'\") unless $eff_asm_params == 0 || $eff_asm_params == 1;\n    $self->{'_eff_asm_params'} = $eff_asm_params;\n  }\n  $eff_asm_params = $self->{'_eff_asm_params'};\n  return $eff_asm_params;\n}\n\n\n=head2 spectrum\n\n  Title   : spectrum\n  Usage   : my $spectrum = $csp->spectrum({1=>10, 2=>2, 3=>1});\n  Function: Get the current contig spectrum represented as a hash / Update a\n            contig spectrum object based on a contig spectrum represented as a\n            hash\n            The hash representation of a contig spectrum is as following:\n              key   -> contig size (in number of sequences)\n              value -> number of contigs of this size\n  Returns : contig spectrum as a hash reference\n  Args    : contig spectrum as a hash reference [optional]\n\n\nsub spectrum {\n  my ($self, $spectrum) = @_;\n  if (defined $spectrum) {\n    $self->_import_spectrum($spectrum);\n  }\n  $spectrum = $self->{'_spectrum'};\n  return $spectrum;\n}\n\n\n=head2 assembly\n\n  Title   : assembly\n  Usage   : my @asm_list = $csp->assembly();\n  Function: Get a reference to the list of assembly object reference used to\n            make the contig spectrum object / Update the contig spectrum object\n            based on an assembly object.\n  Returns : array of Bio::Assembly::Scaffold\n  Args    : Bio::Assembly::Scaffold\n\n\nsub assembly {\n  my ($self, $assembly) = @_;\n  if (defined $assembly) {\n    $self->_import_assembly($assembly);\n  }\n  my @asm_list = @{$self->{'_assembly'}} if defined $self->{'_assembly'};\n  
return \\@asm_list;\n}\n\n=head2 drop_assembly\n\n  Title   : drop_assembly\n  Usage   : $csp->drop_assembly();\n  Function: Remove all assembly objects associated with a contig spectrum.\n            Assembly objects can be big. This method allows to free some memory\n            when assembly information is not needed anymore.\n  Returns : 1 for success, 0 for failure\n  Args    : none\n\n\nsub drop_assembly {\n  my ($self) = @_;\n  $self->{'_assembly'} = [];\n  return 1;\n}\n\n=head2 dissolve\n\n  Title   : dissolve\n  Usage   : $dissolved_csp->dissolve($mixed_csp, $seq_header);\n  Function: Dissolve a mixed contig spectrum for the set of sequences that\n            contain the specified header, i.e. determine the contribution of\n            these sequences to the mixed contig spectrum based on the assembly.\n            The mixed contig spectrum object must have been created based on one\n            (or several) assembly object(s). Additionally, min_overlap and\n            min_identity must have been set (either manually using min_overlap\n            or automatically by switching on the eff_asm_params option).\n  Returns : 1 for success, 0 for failure\n  Args    : Bio::Assembly::Tools::ContigSpectrum reference\n            sequence header string\n\n\n\nsub dissolve {\n  my ($self, $mixed_csp, $seq_header) = @_;\n  $self->_import_dissolved_csp($mixed_csp, $seq_header);\n  return 1;\n}\n\n\n=head2 cross\n\n  Title   : cross\n  Usage   : $cross_csp->cross($mixed_csp);\n  Function: Calculate a cross contig_spectrum based on a mixed contig_spectrum.\n  Returns : 1 for success, 0 for failure\n  Args    : Bio::Assembly::Tools::ContigSpectrum reference\n\n\nsub cross {\n  my ($self, $mixed_csp) = @_;\n  $self->_import_cross_csp($mixed_csp);\n  return 1;\n}\n\n=head2 to_string\n\n  Title   : to_string\n  Usage   : my $csp_string = $csp->to_string;\n  Function: Convert the contig spectrum into a string (easy to print!!).\n  Returns : string\n  Args    : element 
separator (integer) [optional]\n              1 -> space-separated\n              2 -> tab-separated\n              3 -> newline-separated\n\n\nsub to_string {\n  my ($self, $element_separator) = @_;\n  return 0 if $self->{'_max_size'} == 0;\n  $element_separator ||= 1;\n  if ($element_separator == 1) {\n    $element_separator = ' ';\n  } elsif ($element_separator == 2) {\n    $element_separator = \"\\t\";\n  } elsif ($element_separator == 3) {\n    $element_separator = \"\\n\";\n  } else {\n    $self->throw(\"Unknown separator type '$element_separator'\\n\");\n  }\n  my $str = '';\n  for (my $q = 1 ; $q <= $self->{'_max_size'} ; $q++) {\n    my $val = 0;\n    if (exists $self->{'_spectrum'}{$q}) {\n      $val = $self->{'_spectrum'}{$q};\n    }\n    $str .= $val.$element_separator;\n  }\n  $str =~ s/\\s$//;\n  return $str;\n}\n\n\n=head2 add\n\n  Title   : add\n  Usage   : $csp->add($additional_csp);\n  Function: Add a contig spectrum to an existing one: sums the spectra, update\n            the number of sequences, number of repetitions, ...\n  Returns : 1 for success, 0 for failure\n  Args    : Bio::Assembly::Tools::ContigSpectrum object\n\n\nsub add {\n  my ($self, $csp) = @_;\n  # Sanity check\n  if( !ref $csp || ! $csp->isa('Bio::Assembly::Tools::ContigSpectrum') ) {\n        $self->throw(\"Unable to process non Bio::Assembly::Tools::ContigSpectrum \".\n        \"object [\".ref($csp).\"]\");\n  }\n  # Update overlap statistics\n  if ( $self->{'_eff_asm_params'} > 0 ) {\n    # Warnings\n    if ( $csp->{'_eff_asm_params'} == 0 ) {\n      $self->warn(\"The parent contig spectrum needs effective assembly \".\n        \"parameters (eff_asm_params = \".$self->{'_eff_asm_params'}.\") but the \".\n        \"child contig spectrum doesn't have them (eff_asm_params = \".\n        $csp->{'_eff_asm_params'}.\"). 
Skipping them...\");\n    } elsif ( $csp->{'_eff_asm_params'} != $self->{'_eff_asm_params'} ) {\n      $self->warn(\"The parent contig spectrum needs a different method for \".\n        \"detecting the effective assembly parameters (eff_asm_params = \".\n        $self->{'_eff_asm_params'}.\") than the one specified for the child \".\n        \"contig spectrum (eff_asm_params = \".$csp->{'_eff_asm_params'}.\"). \".\n        \"Ignoring the differences...\");\n    }\n    # Update existing stats\n    my $tot_num_overlaps = $csp->{'_nof_overlaps'} + $self->{'_nof_overlaps'};\n    $self->{'_min_overlap'} = $csp->{'_min_overlap'} if\n      defined $csp->{'_min_overlap'} && ( ! defined $self->{'_min_overlap'} ||\n      $csp->{'_min_overlap'} < $self->{'_min_overlap'} );\n    $self->{'_min_identity'} = $csp->{'_min_identity'} if\n      defined $csp->{'_min_identity'} && ( ! defined $self->{'_min_identity'} ||\n      $csp->{'_min_identity'} < $self->{'_min_identity'} );\n    if ($tot_num_overlaps != 0) {\n      $self->{'_avg_overlap'} =\n        ($csp->{'_avg_overlap'} * $csp->{'_nof_overlaps'}\n        + $self->{'_avg_overlap'} * $self->{'_nof_overlaps'})\n        / $tot_num_overlaps;\n      $self->{'_avg_identity'} =\n        ($csp->{'_avg_identity'} * $csp->{'_nof_overlaps'}\n        + $self->{'_avg_identity'} * $self->{'_nof_overlaps'})\n        / $tot_num_overlaps;\n    }\n    $self->{'_nof_overlaps'} = $tot_num_overlaps;\n  }\n  # Update sequence statistics\n  my $tot_nof_seq = $csp->{'_nof_seq'} + $self->{'_nof_seq'};\n  if (not $tot_nof_seq == 0) {\n    $self->{'_avg_seq_len'} = ($csp->{'_avg_seq_len'} * $csp->{'_nof_seq'} +\n      $self->{'_avg_seq_len'} * $self->{'_nof_seq'}) / $tot_nof_seq;\n  }\n  # Update spectrum (and nof_seq, max_size, and increment nof_rep by 1)\n  $self->_import_spectrum($csp->{'_spectrum'});\n  # Update nof_rep\n  $self->{'_nof_rep'}--;\n  $self->{'_nof_rep'} += $csp->{'_nof_rep'};\n  # Update list of assembly objects used\n  push 
@{$self->{'_assembly'}}, @{$csp->{'_assembly'}}\n    if defined $csp->{'_assembly'};\n  return 1;\n}\n\n\n=head2 average\n\n  Title   : average\n  Usage   : my $avg_csp = $csp->average([$csp1, $csp2, $csp3]);\n  Function: Average one contig spectrum or the sum of several contig spectra by\n            the number of repetitions\n  Returns : Bio::Assembly::Tools::ContigSpectrum\n  Args    : Bio::Assembly::Tools::ContigSpectrum array reference\n            eff_asm_params\n\n\nsub average {\n  my ($self, $list) = @_;\n  # Sanity check\n  if ( ! ref $list || ! ref $list eq 'ARRAY') {\n    $self->throw(\"Average takes an array reference but got [\".ref($list).\"]\");\n  }\n  # New average contig spectrum object\n  my $avg = Bio::Assembly::Tools::ContigSpectrum->new;\n  $avg->{'_eff_asm_params'} = 1;\n  \n  # Cycle through contig spectra\n  my $tot_nof_rep = 0;\n  for my $csp (@$list) {\n    # Sanity check\n    if (not $csp->isa('Bio::Assembly::Tools::ContigSpectrum')) {\n      $csp->throw(\"Unable to process non Bio::Assembly::Tools::ContigSpectrum \".\n        \"object [\".ref($csp).\"]\");\n    }\n    # Import contig spectrum\n    $avg->add($csp);\n  }\n  \n  # Average sum of contig spectra by number of repetitions\n  for (my $q = 1 ; $q <= $avg->{'_max_size'} ; $q++) {\n    $avg->{'_spectrum'}{$q} /= $avg->{'_nof_rep'}\n      if (defined $avg->{'_spectrum'}{$q});\n  }\n  # Average number of sequences\n  $avg->{'_nof_seq'} /= $avg->{'_nof_rep'};\n  # Average number of overlaps\n  $avg->{'_nof_overlaps'} /= $avg->{'_nof_rep'};\n  \n  return $avg;\n}\n\n\n=head2 score\n\n  Title   : score\n  Usage   : my $score = $csp->score();\n  Function: Score a contig spectrum (or cross-contig spectrum) such that the\n             higher the number of contigs (or cross-contigs) and the larger their \n             size, the higher the score.\n             Let n   : total number of sequences\n                 c_q : number of contigs of size q\n                 q   : number of sequence 
in a contig\n             We define: score = n/(n-1) * (X - 1/n)\n                  where X = sum ( c_q * q^2 ) / n**2\n             The score ranges from 0 (singlets only) to 1 (a single large contig)\n             It is possible to specify a value for the number of sequences to\n              assume in the contig spectrum.\n  Returns : contig score\n  Args    : number of total sequences to assume [optional]\n\n\nsub score {\n  my ($self, $nof_seqs) = @_;\n  # Main\n  my $score = 0;\n  my $n = $self->nof_seq;\n  if ( $n > 0 ) {\n    # Contig spectrum info\n    my $q_max = $self->max_size;\n    my $spec  = $self->spectrum;\n    # Adjust number of 1-contigs\n    if ( $nof_seqs ) {\n      $spec->{'1'} += $nof_seqs - $n;\n      $n = $nof_seqs;\n    }\n    # Calculate X\n    for my $q ( 1 .. $q_max ) {\n      if ( $spec->{$q} ) {\n        my $c_q = $spec->{$q};\n        $score += $c_q * $q ** 2;\n      }\n    }\n    $score /= $n ** 2; \n  }\n  # Rescale X to obtain the score\n  $score = $n/($n-1) * ($score - 1/$n);\n  return $score;\n}\n\n\n=head2 _naive_assembler\n\n  Title   : _naive_assembler\n  Usage   : \n  Function: Determines the contig spectrum (hash representation) of a subset of\n            sequences from a mixed contig spectrum by \"reassembling\" the\n            specified sequences only based on their position in the contig. This\n            naive assembly only verifies that the minimum overlap length and\n            percentage identity are respected. There is no actual alignment done\n  Returns : contig spectrum hash reference\n  Args    : Bio::Assembly::Contig\n            sequence ID array reference\n            minimum overlap length (integer) [optional]\n            minimum percentage identity (integer) [optional]\n\n\nsub _naive_assembler {\n  my ($self, $contig, $seqlist, $min_overlap, $min_identity) = @_;\n  # Sanity checks\n  if ( ! ref $seqlist || ! ref($seqlist) eq 'ARRAY') {\n    $self->throw('Expecting an array reference. 
Got ['.ref($seqlist).\"] \\n\");\n  }\n  my $max = scalar @$seqlist;\n  $self->throw(\"Expecting at least 2 sequences as input for _naive_assembler\")\n    if ($max < 2);\n  # Assembly\n  my %spectrum = (1 => 0);\n  my %overlap_map;\n  my %has_overlap;\n  # Map what sequences overlap with what sequences\n  for (my $i = 0 ; $i < $max-1 ; $i++) {\n    # query sequence\n    my $qseqid = $$seqlist[$i];\n    my $qseq   = $contig->get_seq_by_name($qseqid);\n    my $is_singlet = 1;\n    for (my $j = $i+1 ; $j < $max ; $j++) {\n      # target sequence\n      my $tseqid = $$seqlist[$j];\n      my $tseq = $contig->get_seq_by_name($tseqid);\n      # try to align sequences\n      my ($aln, $overlap, $identity)\n        = $self->_overlap_alignment($contig, $qseq, $tseq, $min_overlap,\n        $min_identity);\n      # if there is no valid overlap, go to next sequence\n      next if ! defined $aln;\n      # the overlap is valid\n      $is_singlet = 0;\n      push @{$overlap_map{$qseqid}}, $tseqid;\n      $has_overlap{$tseqid} = 1;\n      $has_overlap{$qseqid} = 1;\n    }\n    # check if sequence is in previously seen overlap\n    if (exists $has_overlap{$qseqid}) {\n      $is_singlet = 0;\n    }\n    if ($is_singlet == 1) {\n      $spectrum{1}++;\n    }\n  }\n  # take care of last sequence\n  my $last_is_singlet = 1;\n  if (exists $has_overlap{$$seqlist[$max-1]}) {\n    $last_is_singlet = 0;\n  }\n  if ($last_is_singlet == 1) {\n    $spectrum{1}++;\n  }\n  # Parse overlap map\n  for my $seqid (@$seqlist) {\n    # list of sequences that should go in the contig\n    next if not exists $overlap_map{$seqid};\n    my @overlist = @{$overlap_map{$seqid}};\n    for (my $j = 0 ; $j < scalar(@overlist) ; $j++) {\n      my $otherseqid = $overlist[$j];\n      if (exists $overlap_map{$otherseqid}) {\n        push @overlist, @{$overlap_map{$otherseqid}};\n        delete $overlap_map{$otherseqid};\n      }\n    }\n    # remove duplicates from list\n    @overlist = sort @overlist;\n    for (my 
$j = 0 ; $j < scalar(@overlist)-1 ; $j++) {\n      if ( $overlist[$j] eq $overlist[$j+1] ) {\n        splice @overlist, $j, 1;\n        $j--;\n      }\n    }\n    # update spectrum with size of contig\n    my $qsize = scalar(@overlist) + 1;\n    if (defined $spectrum{$qsize}) {\n      $spectrum{$qsize}++;\n    } else {\n      $spectrum{$qsize} = 1;\n    }\n  }\n  return \\%spectrum;\n}\n\n\n=head2 _new_from_assembly\n\n  Title   : _new_from_assembly\n  Usage   : \n  Function: Creates a new contig spectrum object based solely on the result of \n            an assembly\n  Returns : Bio::Assembly::Tools::ContigSpectrum\n  Args    : Bio::Assembly::Scaffold\n\n\nsub _new_from_assembly {\n  # Create new contig spectrum object based purely on what we can get from the\n  # assembly object\n  my ($self, $assemblyobj) = @_;\n  my $csp = Bio::Assembly::Tools::ContigSpectrum->new();\n  # 1: Set id\n  $csp->{'_id'} = $assemblyobj->id;\n  # 2: Set overlap statistics: nof_overlaps, min_overlap, avg_overlap,\n  #  min_identity and avg_identity\n  $csp->{'_eff_asm_params'} = $self->{'_eff_asm_params'};\n  if ($csp->{'_eff_asm_params'} > 0) {\n     my ($nover, $minl, $avgl, $minid, $avgid)\n       = $csp->_get_overlap_stats($assemblyobj);\n     $csp->{'_min_overlap'}  = $minl;\n     $csp->{'_min_identity'} = $minid;\n     $csp->{'_avg_overlap'}  = $avgl;\n     $csp->{'_avg_identity'} = $avgid;\n     $csp->{'_nof_overlaps'} = $nover;\n  }\n  # 3: Set sequence statistics: nof_seq and avg_seq_len\n  my ($nseq, $avgseql) = $self->_get_seq_stats($assemblyobj);\n  $csp->{'_avg_seq_len'} = $avgseql;\n  $csp->{'_nof_seq'}     = $nseq;\n  # 4: Set the spectrum: spectrum and max_size\n  for my $contigobj ($assemblyobj->all_contigs) {\n    my $size = $contigobj->num_sequences;\n    if (defined $csp->{'_spectrum'}{$size}) {\n      $csp->{'_spectrum'}{$size}++;\n    } else {\n      $csp->{'_spectrum'}{$size} = 1;\n    }\n    $csp->{'_max_size'} = $size if $size > $csp->{'_max_size'};\n  }\n  my 
$nof_singlets = $assemblyobj->get_nof_singlets();\n  if (defined $nof_singlets) {\n    $csp->{'_spectrum'}{1} += $nof_singlets;\n    $csp->{'_max_size'} = 1 if $nof_singlets >= 1 && $csp->{'_max_size'} < 1;\n  }\n  # 5: Set list of assembly objects used\n  push @{$csp->{'_assembly'}}, $assemblyobj;\n  # 6: Set number of repetitions\n  $csp->{'_nof_rep'} = 1;\n  return $csp;\n}\n\n\n\n=head2 _new_dissolved_csp\n\n  Title   : \n  Usage   : create a dissolved contig spectrum object\n  Function: \n  Returns : \n  Args    : \n\n\n\nsub _new_dissolved_csp {\n  my ($self, $mixed_csp, $seq_header) = @_;\n  # Sanity checks on the mixed contig spectrum\n\n  # min_overlap and min_identity must be specified if there are some overlaps\n  # in the mixed contig\n  unless ($mixed_csp->{'_nof_overlaps'} == 0) {\n    unless ( defined $self->{'_min_overlap'} || \n      defined $mixed_csp->{'_min_overlap'} ) {\n      $self->throw(\"min_overlap must be defined in the dissolved contig spectrum\".\n        \" or mixed contig spectrum to dissolve a contig\");\n    }\n    unless ( defined $self->{'_min_identity'} ||\n      defined $mixed_csp->{'_min_identity'} ) {\n      $self->throw(\"min_identity must be defined in the dissolved contig spectrum\".\n        \" or mixed contig spectrum\");\n    }\n  }\n  \n  # there must be at least one assembly in mixed contig spectrum\n  if (!defined $mixed_csp->{'_assembly'} ||\n      scalar @{$mixed_csp->{'_assembly'}} < 1) {\n    $self->throw(\"The mixed contig spectrum must be based on at least one\n    assembly\");\n  }\n\n  # New dissolved contig spectrum object\n  my $dissolved = Bio::Assembly::Tools::ContigSpectrum->new();\n  \n  # take parent attributes if existent or mixed attributes otherwise\n  if ($self->{'_eff_asm_params'}) {\n    $dissolved->{'_eff_asm_params'} = $self->{'_eff_asm_params'};\n  } else {\n    $dissolved->{'_eff_asm_params'} = $mixed_csp->{'_eff_asm_params'};\n  }\n  if ($self->{'_min_overlap'} && $self->{'_min_identity'}) 
{\n    ( $dissolved->{'_min_overlap'}, $dissolved->{'_min_identity'} ) = \n      ( $self->{'_min_overlap'}, $self->{'_min_identity'} );\n  } else {\n    ( $dissolved->{'_min_overlap'}, $dissolved->{'_min_identity'} ) = \n      ( $mixed_csp->{'_min_overlap'}, $mixed_csp->{'_min_identity'} );\n  }\n  \n  # Dissolve each assembly\n  for my $assembly (@{$mixed_csp->{'_assembly'}}) {\n    # Dissolve this assembly for the given sequences\n    my %asm_spectrum = (1 => 0);\n    my %good_seqs;\n    # For each contig\n    for my $contig ($assembly->all_contigs) {\n      # Get good sequences\n      my @contig_seqs;\n      for my $seq ($contig->each_seq) {\n        my $seq_id = $seq->id;\n        # get sequence origin\n        next unless $seq_id =~ m/^$seq_header\\|/;\n        # add it to hash\n        push @contig_seqs, $seq_id;\n        $good_seqs{$seq_id} = 1;\n      }\n      # Update spectrum\n      my $size = scalar @contig_seqs;\n      if ($size == 0) {\n        next;\n      } elsif ($size == 1) {\n        $asm_spectrum{1}++;\n      } elsif ($size > 1) {\n        # Reassemble good sequences\n        my $contig_spectrum = $dissolved->_naive_assembler(\n          $contig, \\@contig_seqs, $dissolved->{'_min_overlap'},\n          $dissolved->{'_min_identity'});\n        # update spectrum\n        for my $qsize (keys %$contig_spectrum) {\n          $asm_spectrum{$qsize} += $$contig_spectrum{$qsize};\n        }\n      } else {\n        $self->throw(\"The size is not valid... 
how could that happen?\");\n      }\n    }\n    # For each singlet\n    for my $singlet ($assembly->all_singlets) {\n      my $seq_id = $singlet->seqref->id;\n      # get sequence origin\n      next unless $seq_id =~ m/^$seq_header\\|/;\n      # add it to hash\n      $good_seqs{$seq_id} = 1;\n      # update spectrum\n      $asm_spectrum{1}++;\n    }\n    # Update spectrum\n    $dissolved->_import_spectrum(\\%asm_spectrum);\n    # Update nof_rep\n    $dissolved->{'_nof_rep'}--;\n    $dissolved->{'_nof_rep'} += $mixed_csp->{'_nof_rep'};\n\n    # Get sequence stats\n    my ($nseq, $avgseql) = $dissolved->_get_seq_stats($assembly, \\%good_seqs);\n    $dissolved->{'_avg_seq_len'} = $avgseql;\n    $dissolved->{'_nof_seq'}     = $nseq;\n  \n    # Get eff_asm_param for these sequences\n    if ($dissolved->{'_eff_asm_params'} > 0) {\n      my ($nover, $minl, $avgl, $minid, $avgid)\n        = $dissolved->_get_overlap_stats($assembly, \\%good_seqs);\n      $dissolved->{'_min_overlap'}  = $minl;\n      $dissolved->{'_min_identity'} = $minid;\n      $dissolved->{'_avg_overlap'}  = $avgl;\n      $dissolved->{'_avg_identity'} = $avgid;\n      $dissolved->{'_nof_overlaps'} = $nover;\n    }\n\n  }\n  return $dissolved;\n}\n\n\n=head2 _new_cross_csp\n\n  Title   : \n  Usage   : \n  Function: create a cross contig spectrum object\n  Returns : \n  Args    : \n\n\n\nsub _new_cross_csp {\n  my ($self, $mixed_csp) = @_;\n  # Sanity check on the mixed contig spectrum\n  # There must be at least one assembly\n  if (!defined $mixed_csp->{'_assembly'} ||\n      scalar @{$mixed_csp->{'_assembly'}} < 1) {\n    $self->throw(\"The mixed contig spectrum must be based on at least one \".\n    \"assembly.\");\n  }\n  \n  # New dissolved contig spectrum object\n  my $cross = Bio::Assembly::Tools::ContigSpectrum->new();\n  my %spectrum = (1 => 0);\n  \n  # Take parent or mixed attributes\n  if ($self->{'_eff_asm_params'}) {\n    $cross->{'_eff_asm_params'} = $self->{'_eff_asm_params'};\n  } else {\n  
  $cross->{'_eff_asm_params'} = $mixed_csp->{'_eff_asm_params'};\n  }\n  if ($self->{'_min_overlap'} && $self->{'_min_identity'}) {\n    ( $cross->{'_min_overlap'}, $cross->{'_min_identity'} ) = \n      ( $self->{'_min_overlap'}, $self->{'_min_identity'} );\n  } else {\n    ( $cross->{'_min_overlap'}, $cross->{'_min_identity'} ) = \n      ( $mixed_csp->{'_min_overlap'}, $mixed_csp->{'_min_identity'} );\n  }\n  \n  # Get cross contig spectrum for each assembly\n  for my $assembly (@{$mixed_csp->{'_assembly'}}) {\n    # Go through contigs and skip the pure ones\n    my %good_seqs;\n    for my $contig ($assembly->all_contigs) {\n      # Get origins\n      my @seq_origins;\n      my @seq_ids;\n      for my $seq ($contig->each_seq) {\n        # current sequence origin\n        my $seq_id = $seq->id;\n        $seq_id =~ m/^(.+)\\|/;\n        my $seq_header = $1;\n        $self->warn(\"Sequence $seq_id does not seem to have a header. Skipping\".\n          \" it...\") if not defined $seq_header;\n        $seq_header ||= '';\n        push @seq_origins, $seq_header;\n        push @seq_ids, $seq_id;\n      }\n      my $qsize = scalar(@seq_ids);\n      my @origins = sort { $a cmp $b } @seq_origins;\n      my $size = scalar(@origins);\n      for (my $i = 1 ; $i < $size ; $i++) {\n        if ($origins[$i] eq $origins[$i-1]) {\n          splice @origins, $i, 1;\n          $i--;\n          $size--;\n        }\n      }\n      # Update cross-contig number in spectrum\n      if ($size > 1) { # cross-contig detected\n        # update good sequences\n        for my $seq_id (@seq_ids) {\n          $good_seqs{$seq_id} = 1;\n        }\n        # update number of cross q-contigs in spectrum\n        if (defined $spectrum{$qsize}) {\n          $spectrum{$qsize}++;\n        } else {\n          $spectrum{$qsize} = 1;\n        }\n      }\n      # Update number of cross 1-contigs\n      if ($size > 1) { # cross-contig detected\n        for my $origin (@origins) {\n          # sequences to 
use\n          my @ids;\n          for (my $i = 0 ; $i < $qsize ; $i++) {\n            my $seq_origin = $seq_origins[$i];\n            my $seq_id = $seq_ids[$i];\n            push @ids, $seq_id if $seq_origin eq $origin;\n          }\n          if (scalar @ids == 1) {\n            $spectrum{1}++;\n          } elsif (scalar @ids > 1) {\n            my $contig_spectrum = $cross->_naive_assembler(\n              $contig, \\@ids, $cross->{'_min_overlap'},\n              $cross->{'_min_identity'});\n            $spectrum{1} += $$contig_spectrum{1};\n          } else {\n            $self->throw(\"The size is <= 0. How could such a thing happen?\");\n          }\n        }\n      }\n    }\n    # Get sequence stats\n    my ($nseq, $avgseql) = $cross->_get_seq_stats($assembly, \\%good_seqs);\n    $cross->{'_avg_seq_len'} = $avgseql;\n    $cross->{'_nof_seq'}     = $nseq;\n    # Get eff_asm_param for these sequences\n    if ($cross->{'_eff_asm_params'} > 0) {\n      my ($nover, $minl, $avgl, $minid, $avgid)\n        = $cross->_get_overlap_stats($assembly, \\%good_seqs);\n      $cross->{'_min_overlap'}  = $minl;\n      $cross->{'_min_identity'} = $minid;\n      $cross->{'_avg_overlap'}  = $avgl;\n      $cross->{'_avg_identity'} = $avgid;\n      $cross->{'_nof_overlaps'} = $nover;\n    }\n  }\n  \n  $cross->_import_spectrum(\\%spectrum);\n  # Update nof_rep\n  $cross->{'_nof_rep'}--;\n  $cross->{'_nof_rep'} += $mixed_csp->{'_nof_rep'};\n  \n  return $cross;\n}\n\n=head2 _import_assembly\n\n  Title   : _import_assembly\n  Usage   : $csp->_import_assembly($assemblyobj);\n  Function: Update a contig spectrum object based on an assembly object\n  Returns : 1 for success, 0 for error\n  Args    : Bio::Assembly::Scaffold assembly object\n\n\nsub _import_assembly {\n  my ($self, $assemblyobj) = @_;\n  # Sanity check\n  if( !ref $assemblyobj || ! 
$assemblyobj->isa('Bio::Assembly::ScaffoldI') ) {\n        $self->throw(\"Unable to process non Bio::Assembly::ScaffoldI assembly \".\n        \"object [\".ref($assemblyobj).\"]\");\n  }\n  # Create new object from assembly\n  my $csp = $self->_new_from_assembly($assemblyobj);\n  # Update current contig spectrum object with new one\n  $self->add($csp);\n  return 1;\n}\n\n\n=head2 _import_spectrum\n\n  Title   : _import_spectrum\n  Usage   : $csp->_import_spectrum({ 1 => 90 , 2 => 3 , 4 => 1 })\n  Function: update a contig spectrum object based on a contig spectrum\n            represented as a hash (key: contig size, value: number of contigs of\n            this size)\n  Returns : 1 for success, 0 for error\n  Args    : contig spectrum as a hash reference","label":"_import_spectrum($self,$spectrum)"},"kind":12,"range":{"start":{"line":1286,"character":0},"end":{"character":9999,"line":1309}},"line":1286},{"line":1310,"kind":13,"containerName":null,"name":"%self"},{"name":"%self","kind":13,"containerName":null,"line":1310},{"line":1313,"name":"%self","kind":13,"containerName":null},{"signature":{"label":"_import_dissolved_csp($self,$mixed_csp,$seq_header)","parameters":[{"label":"$self"},{"label":"$mixed_csp"},{"label":"$seq_header"}],"documentation":"__END__\n#\n# BioPerl module for Bio::Assembly::Tools::ContigSpectrum\n#\n# Copyright by Florent Angly\n#\n# You may distribute this module under the same terms as Perl itself\n#\n# POD documentation - main docs before the code\n\n=head1 NAME\n\nBio::Assembly::Tools::ContigSpectrum - create and manipulate contig spectra\n\n=head1 SYNOPSIS\n\n  # Simple contig spectrum creation\n  my $csp1 = Bio::Assembly::Tools::ContigSpectrum->new(\n    -id       => 'csp1',\n    -spectrum => { 1 => 10,\n                   2 => 2,\n                   3 => 1 } );\n\n  # ...or another way to create a simple contig spectrum\n  my $csp2 = Bio::Assembly::Tools::ContigSpectrum->new;\n  $csp2->id('csp2');\n  $csp2->spectrum({ 1 => 20, 2 => 1, 
4 => 1 });\n\n  # Get some information\n  print \"This is contig spectrum \".$csp->id.\"\\n\";\n  print \"It contains \".$csp->nof_seq.\" sequences\\n\";\n  print \"The largest contig has \".$csp->max_size.\" sequences\\n\";\n  print \"The spectrum is: \".$csp->to_string($csp->spectrum).\"\\n\";\n\n  # Let's add the contig spectra\n  my $summed_csp = Bio::Assembly::Tools::ContigSpectrum->new;\n  $summed_csp->add($csp1);\n  $summed_csp->add($csp2);\n  print \"The summed contig spectrum is \".$summed_csp->to_string.\"\\n\";\n\n  # Make an average\n  my $avg_csp = Bio::Assembly::Tools::ContigSpectrum->new;\n  $avg_csp = $avg_csp->average([$csp1, $csp2]);\n  print \"The average contig spectrum is \".$avg_csp->to_string.\"\\n\";\n\n  # Get a contig spectrum from an assembly\n  my $from_assembly = Bio::Assembly::Tools::ContigSpectrum->new(\n    -assembly       => $assembly_object,\n    -eff_asm_params => 1);\n  print \"The contig spectrum from assembly is \".$from_assembly->to_string.\"\\n\";\n\n  # Report advanced information (possible because eff_asm_params = 1)\n  print \"Average sequence length: \".$from_assembly->avg_seq_length.\" bp\\n\";\n  print \"Minimum overlap length: \".$from_assembly->min_overlap.\" bp\\n\";\n  print \"Average overlap length: \".$from_assembly->avg_overlap.\" bp\\n\";\n  print \"Minimum overlap match: \".$from_assembly->min_identity.\" %\\n\";\n  print \"Average overlap match: \".$from_assembly->avg_identity.\" %\\n\";\n\n  # Assuming the assembly object contains sequences from several different\n  # metagenomes, we have a mixed contig spectrum from which a cross contig\n  # spectrum and dissolved contig spectra can be obtained\n  my $mixed_csp = $from_assembly;\n\n  # Calculate a dissolved contig spectrum\n  my $meta1_dissolved = Bio::Assembly::Tools::ContigSpectrum->new(\n    -dissolve => [$mixed_csp, 'metagenome1'] );\n  my $meta2_dissolved = Bio::Assembly::Tools::ContigSpectrum->new(\n    -dissolve => [$mixed_csp, 'metagenome2'] );\n  
print \"The dissolved contig spectra are:\\n\".\n    $meta1_dissolved->to_string.\"\\n\".\n    $meta2_dissolved->to_string.\"\\n\";\n\n  # Determine a cross contig spectrum\n  my $cross_csp = Bio::Assembly::Tools::ContigSpectrum->new(\n    -cross => $mixed_csp );\n  print \"The cross contig spectrum is \".$cross_csp->to_string.\"\\n\";\n\n  # Score a contig spectrum (the more abundant the contigs and the larger their\n  # size, the larger the score)\n\n\n=head1 DESCRIPTION\n\nThe Bio::Assembly::Tools::ContigSpectrum Perl module lets you\nmanually create contig spectra, import them from assemblies,\nmanipulate them, transform between different types of contig spectra\nand output them.\n\nBio::Assembly::Tools::ContigSpectrum is a module to create, manipulate\nand output contig spectra, assembly-derived data used in metagenomics\n(community genomics) for diversity estimation.\n\n=head2 Background\n\nA contig spectrum is the count of the number of contigs of different\nsize in an assembly. For example, the contig spectrum [100 5 1 0 0\n...] means that there were 100 singlets (1-contigs), 5 contigs of 2\nsequences (2-contigs), 1 contig of 3 sequences (3-contig) and no\nlarger contigs.\n\nAn assembly can be produced from a mixture of sequences from different\nmetagenomes. The contig spectrum obtained from this assembly is a mixed contig\nspectrum. The contribution of each metagenome in this mixed contig\nspectrum can be obtained by determining a dissolved contig spectrum.\n\nFinally, based on a mixed contig spectrum, a cross contig spectrum can\nbe determined. In a cross contig spectrum, only contigs containing\nsequences from different metagenomes are kept; \"pure\" contigs are\nexcluded. 
Additionally, the total number of singletons (1-contigs)\nfrom each region that assembles with any fragments from other regions\nis the number of 1-contigs in the cross contig spectrum.\n\n=head2 Implementation\n\nThe simplest representation of a contig spectrum is as a hash\nrepresentation where the key is the contig size (number of sequences\nmaking up the contig) and the value the number of contigs of this\nsize.\n\nIn fact, it is useful to have more information associated with the\ncontig spectrum, hence the Bio::Assembly::Tools::ContigSpectrum module\nimplements an object containing a contig spectrum hash and additional\ninformation. The get/set methods to access them are:\n\n    id              contig spectrum ID\n    nof_seq         number of sequences\n    nof_rep         number of repetitions (assemblies) used\n    max_size        size of (number of sequences in) the largest contig\n    nof_overlaps    number of overlaps\n    min_overlap     minimum overlap length for building a contig\n    min_identity    minimum sequence identity over the overlap length\n    avg_overlap     average overlap length\n    avg_identity    average overlap identity\n    avg_seq_length  average sequence length\n    eff_asm_params  effective assembly parameters\n    spectrum        hash representation of a contig spectrum\n\n  Operations on the contig spectra:\n\n    to_string       create a string representation of the spectrum\n    spectrum        import a hash contig spectrum\n    assembly        determine a contig spectrum from an assembly\n    dissolve        calculate a dissolved contig spectrum (based on assembly)\n    cross           produce a cross contig spectrum (based on assembly)\n    add             add a contig spectrum to an existing one\n    average         make an average of several contig spectra\n\nWhen using operations that rely on knowing \"where\" (from what\nmetagenomes) a sequence came from (i.e. 
when creating a dissolved or\ncross contig spectrum), make sure that the sequences used for the\nassembly have a name header, e.g.  E<gt>metagenome1|seq1,\nE<gt>metagenome2|seq1, ...\n\n=head1 FEEDBACK\n\n=head2 Mailing Lists\n\nUser feedback is an integral part of the evolution of this and other\nBioperl modules. Send your comments and suggestions preferably to the\nBioperl mailing lists. Your participation is much appreciated.\n\n  bioperl-l@bioperl.org                  - General discussion\n  http://bioperl.org/wiki/Mailing_lists  - About the mailing lists\n\n\n=head2 Support \n\nPlease direct usage questions or support issues to the mailing list:\n\nI<bioperl-l@bioperl.org>\n\nrather than to the module maintainer directly. Many experienced and \nresponsive experts will be able to look at the problem and quickly \naddress it. Please include a thorough description of the problem \nwith code and data examples if at all possible.\n\n=head2 Reporting Bugs\n\nReport bugs to the BioPerl bug tracking system to help us keep track\nof the bugs and their resolution. Bug reports can be submitted via email\nor the web:\n\n  bioperl-bugs@bio.perl.org\n  http://bugzilla.bioperl.org/\n\n=head1 AUTHOR - Florent E Angly\n\nEmail florent_dot_angly_at_gmail_dot_com\n\n=head1 APPENDIX\n\nThe rest of the documentation details each of the object\nmethods. 
Internal methods are usually preceded with a \"_\".\n\n\npackage Bio::Assembly::Tools::ContigSpectrum;\n\nuse strict;\n\nuse Bio::Root::Root;\nuse Bio::Assembly::Scaffold;\nuse Bio::SimpleAlign;\nuse Bio::LocatableSeq;\nuse Bio::Align::PairwiseStatistics;\n\nuse base 'Bio::Root::Root';\n\n\n=head2 new\n\n  Title   : new\n  Usage   : my $csp = Bio::Assembly::Tools::ContigSpectrum->new();\n              or\n            my $csp = Bio::Assembly::Tools::ContigSpectrum->new(\n              -id => 'some_name',\n              -spectrum =>  { 1 => 90 , 2 => 3 , 4 => 1 },\n            );\n              or\n            my $csp = Bio::Assembly::Tools::ContigSpectrum->new(\n              -assembly =>  $assembly_obj\n            );\n  Function: create a new contig spectrum object\n  Returns : reference to a contig spectrum object\n  Args    : none\n\n\nsub new {\n  my ($class, @args) = @_;\n  my $self = $class->SUPER::new(@args);\n  my ( $id, $nof_seq, $nof_rep, $max_size, $nof_overlaps, $min_overlap,\n    $min_identity, $avg_overlap, $avg_identity, $avg_seq_len, $spectrum,\n    $assembly, $eff_asm_params, $dissolve, $cross) = $self->_rearrange( [qw(ID\n    NOF_SEQ NOF_REP MAX_SIZE NOF_OVERLAPS MIN_OVERLAP MIN_IDENTITY AVG_OVERLAP\n    AVG_IDENTITY AVG_SEQ_LEN SPECTRUM ASSEMBLY EFF_ASM_PARAMS DISSOLVE CROSS)],\n    @args );\n\n  # First set up some defauts\n  $self->{'_id'}             = 'NoName';\n  $self->{'_nof_seq'}        = 0;\n  $self->{'_nof_rep'}        = 0;\n  $self->{'_max_size'}       = 0;\n  $self->{'_nof_overlaps'}   = 0;\n  $self->{'_min_overlap'}    = undef;\n  $self->{'_min_identity'}   = undef;\n  $self->{'_avg_overlap'}    = 0;\n  $self->{'_avg_identity'}   = 0;\n  $self->{'_avg_seq_len'}    = 0;\n  $self->{'_eff_asm_params'} = 0;\n  $self->{'_spectrum'}       = {1 => 0};  # contig spectrum hash representation\n  $self->{'_assembly'}       = []; # list of assembly objects used\n\n  # Then, according to user desires, override defaults\n  $self->{'_id'}           
  = $id             if (defined $id);\n  $self->{'_nof_seq'}        = $nof_seq        if (defined $nof_seq);\n  $self->{'_nof_rep'}        = $nof_rep        if (defined $nof_rep);\n  $self->{'_max_size'}       = $max_size       if (defined $max_size);\n  $self->{'_nof_overlaps'}   = $nof_overlaps   if (defined $nof_overlaps);\n  $self->{'_min_overlap'}    = $min_overlap    if (defined $min_overlap);\n  $self->{'_avg_overlap'}    = $avg_overlap    if (defined $avg_overlap);\n  $self->{'_min_identity'}   = $min_identity   if (defined $min_identity);\n  $self->{'_avg_identity'}   = $avg_identity   if (defined $avg_identity);\n  $self->{'_avg_seq_len'}    = $avg_seq_len    if (defined $avg_seq_len);\n  $self->{'_eff_asm_params'} = $eff_asm_params if (defined $eff_asm_params);\n\n  # Finally get stuff that can be gotten in an automated way\n  $self->_import_spectrum($spectrum) if defined($spectrum);\n  $self->_import_assembly($assembly) if defined($assembly);\n  if (defined($dissolve)) {\n    my ($mixed_csp, $header) = (@$dissolve[0], @$dissolve[1]);\n    $self->_import_dissolved_csp($mixed_csp, $header);\n  }\n  $self->_import_cross_csp($cross)   if defined($cross);\n\n  return $self;\n}\n\n\n=head2 id\n\n  Title   : id\n  Usage   : $csp->id\n  Function: get/set contig spectrum id\n  Returns : string\n  Args    : string [optional]\n\n\nsub id {\n  my ($self, $id) = @_;\n  if (defined $id) {\n    $self->{'_id'} = $id;\n  }\n  $id = $self->{'_id'};\n  return $id;\n}\n\n\n=head2 nof_seq\n\n  Title   : nof_seq\n  Usage   : $csp->nof_seq\n  Function: get/set the number of sequences making up the contig spectrum\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub nof_seq {\n  my ($self, $nof_seq) = @_;\n  if (defined $nof_seq) {\n    $self->throw(\"The number of sequences must be strictly positive. 
Got \".\n      \"'$nof_seq'\") if $nof_seq < 1;\n    $self->{'_nof_seq'} = $nof_seq;\n  }\n  $nof_seq = $self->{'_nof_seq'};\n  return $nof_seq;\n}\n\n\n=head2 nof_rep\n\n  Title   : nof_rep\n  Usage   : $csp->nof_rep\n  Function: Get/Set the number of repetitions (assemblies) used to create the \n            contig spectrum\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub nof_rep {\n  my ($self, $nof_rep) = @_;\n  if (defined $nof_rep) {\n    $self->throw(\"The number of repetitions must be strictly positive. Got \".\n      \"'$nof_rep'\") if $nof_rep < 1;\n    $self->{'_nof_rep'} = $nof_rep;\n  }\n  $nof_rep = $self->{'_nof_rep'};\n  return $nof_rep;\n}\n\n\n=head2 max_size\n\n  Title   : max_size\n  Usage   : $csp->max_size\n  Function: get/set the size of (number of sequences in) the largest contig\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub max_size {\n  my ($self, $max_size) = @_;\n  if (defined $max_size) {\n    $self->throw(\"The contig maximum size must be strictly positive. Got \".\n      \"'$max_size'\") if $max_size < 1;\n    $self->{'_max_size'} = $max_size;\n  }\n  $max_size = $self->{'_max_size'};\n  return $max_size;\n}\n\n\n=head2 nof_overlaps\n\n  Title   : nof_overlaps\n  Usage   : $csp->nof_overlaps\n  Function: Get/Set the number of overlaps in the assembly.\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub nof_overlaps {\n  my ($self, $nof_overlaps) = @_;\n  if (defined $nof_overlaps) {\n    $self->throw(\"The number of overlaps must be strictly positive. 
Got \".\n      \"'$nof_overlaps'\") if $nof_overlaps < 1;\n    $self->{'_nof_overlaps'} = $nof_overlaps;\n  }\n  $nof_overlaps = $self->{'_nof_overlaps'};\n  return $nof_overlaps;\n}\n\n\n=head2 min_overlap\n\n  Title   : min_overlap\n  Usage   : $csp->min_overlap\n  Function: get/set the assembly minimum overlap length\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub min_overlap {\n  my ($self, $min_overlap) = @_;\n  if (defined $min_overlap) {\n    $self->throw(\"The minimum of overlap length must be strictly positive. Got\".\n      \" '$min_overlap'\") if $min_overlap < 1;\n    $self->{'_min_overlap'} = $min_overlap;\n  }\n  $min_overlap = $self->{'_min_overlap'};\n  return $min_overlap;\n}\n\n\n=head2 avg_overlap\n\n  Title   : avg_overlap\n  Usage   : $csp->avg_overlap\n  Function: get/set the assembly average overlap length\n  Returns : decimal\n  Args    : decimal [optional]\n\n\nsub avg_overlap {\n  my ($self, $avg_overlap) = @_;\n  if (defined $avg_overlap) {\n    $self->throw(\"The average overlap length must be strictly positive. Got \".\n      \"'$avg_overlap'\") if $avg_overlap < 1;\n    $self->{'_avg_overlap'} = $avg_overlap;\n  }\n  $avg_overlap = $self->{'_avg_overlap'};\n  return $avg_overlap;\n}\n\n\n=head2 min_identity\n\n  Title   : min_identity\n  Usage   : $csp->min_identity\n  Function: get/set the assembly minimum overlap identity percent\n  Returns : 0 < decimal < 100\n  Args    : 0 < decimal < 100 [optional]\n\n\nsub min_identity {\n  my ($self, $min_identity) = @_;\n  if (defined $min_identity) {\n    $self->throw(\"The minimum overlap percent identity must be strictly \".\n      \"positive. 
Got '$min_identity'\") if $min_identity < 1;\n    $self->{'_min_identity'} = $min_identity;\n  }\n  $min_identity = $self->{'_min_identity'};\n  return $min_identity;\n}\n\n\n=head2 avg_identity\n\n  Title   : avg_identity\n  Usage   : $csp->avg_identity\n  Function: get/set the assembly average overlap identity percent\n  Returns : 0 < decimal < 100\n  Args    : 0 < decimal < 100 [optional]\n\n\nsub avg_identity {\n  my ($self, $avg_identity) = @_;\n  if (defined $avg_identity) {\n    $self->throw(\"The average overlap percent identity must be strictly \".\n      \"positive. Got '$avg_identity'\") if $avg_identity < 1;\n    $self->{'_avg_identity'} = $avg_identity;\n  }\n  $avg_identity = $self->{'_avg_identity'};\n  return $avg_identity;\n}\n\n\n=head2 avg_seq_len\n\n  Title   : avg_seq_len\n  Usage   : $csp->avg_seq_len\n  Function: get/set the assembly average sequence length\n  Returns : avg_seq_len\n  Args    : real [optional]\n\n\nsub avg_seq_len {\n  my ($self, $avg_seq_len) = @_;\n  if (defined $avg_seq_len) {\n    $self->throw(\"The average sequence length must be strictly positive. Got \".\n      \"'$avg_seq_len'\") if $avg_seq_len < 1;\n    $self->{'_avg_seq_len'} = $avg_seq_len;\n  }\n  $avg_seq_len = $self->{'_avg_seq_len'};\n  return $avg_seq_len;\n}\n\n\n=head2 eff_asm_params\n\n  Title   : eff_asm_params\n  Usage   : $csp->eff_asm_params(1)\n  Function: Get/set the effective assembly parameters option. It defines if the\n            effective assembly parameters should be determined when a contig\n            spectrum based or derived from an assembly is calulated. 
The\n            effective assembly parameters include avg_seq_length, nof_overlaps,\n            min_overlap, avg_overlap, min_identity and avg_identity.\n            1 = get them, 0 = don't.\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub eff_asm_params {\n  my ($self, $eff_asm_params) = @_;\n  if (defined $eff_asm_params) {\n    $self->throw(\"eff_asm_params can only take values 0 or 1. Input value was \".\n      \"'$eff_asm_params'\") unless $eff_asm_params == 0 || $eff_asm_params == 1;\n    $self->{'_eff_asm_params'} = $eff_asm_params;\n  }\n  $eff_asm_params = $self->{'_eff_asm_params'};\n  return $eff_asm_params;\n}\n\n\n=head2 spectrum\n\n  Title   : spectrum\n  Usage   : my $spectrum = $csp->spectrum({1=>10, 2=>2, 3=>1});\n  Function: Get the current contig spectrum represented as a hash / Update a\n            contig spectrum object based on a contig spectrum represented as a\n            hash\n            The hash representation of a contig spectrum is as following:\n              key   -> contig size (in number of sequences)\n              value -> number of contigs of this size\n  Returns : contig spectrum as a hash reference\n  Args    : contig spectrum as a hash reference [optional]\n\n\nsub spectrum {\n  my ($self, $spectrum) = @_;\n  if (defined $spectrum) {\n    $self->_import_spectrum($spectrum);\n  }\n  $spectrum = $self->{'_spectrum'};\n  return $spectrum;\n}\n\n\n=head2 assembly\n\n  Title   : assembly\n  Usage   : my @asm_list = $csp->assembly();\n  Function: Get a reference to the list of assembly object reference used to\n            make the contig spectrum object / Update the contig spectrum object\n            based on an assembly object.\n  Returns : array of Bio::Assembly::Scaffold\n  Args    : Bio::Assembly::Scaffold\n\n\nsub assembly {\n  my ($self, $assembly) = @_;\n  if (defined $assembly) {\n    $self->_import_assembly($assembly);\n  }\n  my @asm_list = @{$self->{'_assembly'}} if defined $self->{'_assembly'};\n  
return \\@asm_list;\n}\n\n=head2 drop_assembly\n\n  Title   : drop_assembly\n  Usage   : $csp->drop_assembly();\n  Function: Remove all assembly objects associated with a contig spectrum.\n            Assembly objects can be big. This method allows to free some memory\n            when assembly information is not needed anymore.\n  Returns : 1 for success, 0 for failure\n  Args    : none\n\n\nsub drop_assembly {\n  my ($self) = @_;\n  $self->{'_assembly'} = [];\n  return 1;\n}\n\n=head2 dissolve\n\n  Title   : dissolve\n  Usage   : $dissolved_csp->dissolve($mixed_csp, $seq_header);\n  Function: Dissolve a mixed contig spectrum for the set of sequences that\n            contain the specified header, i.e. determine the contribution of\n            these sequences to the mixed contig spectrum based on the assembly.\n            The mixed contig spectrum object must have been created based on one\n            (or several) assembly object(s). Additionally, min_overlap and\n            min_identity must have been set (either manually using min_overlap\n            or automatically by switching on the eff_asm_params option).\n  Returns : 1 for success, 0 for failure\n  Args    : Bio::Assembly::Tools::ContigSpectrum reference\n            sequence header string\n\n\n\nsub dissolve {\n  my ($self, $mixed_csp, $seq_header) = @_;\n  $self->_import_dissolved_csp($mixed_csp, $seq_header);\n  return 1;\n}\n\n\n=head2 cross\n\n  Title   : cross\n  Usage   : $cross_csp->cross($mixed_csp);\n  Function: Calculate a cross contig_spectrum based on a mixed contig_spectrum.\n  Returns : 1 for success, 0 for failure\n  Args    : Bio::Assembly::Tools::ContigSpectrum reference\n\n\nsub cross {\n  my ($self, $mixed_csp) = @_;\n  $self->_import_cross_csp($mixed_csp);\n  return 1;\n}\n\n=head2 to_string\n\n  Title   : to_string\n  Usage   : my $csp_string = $csp->to_string;\n  Function: Convert the contig spectrum into a string (easy to print!!).\n  Returns : string\n  Args    : element 
separator (integer) [optional]\n              1 -> space-separated\n              2 -> tab-separated\n              3 -> newline-separated\n\n\nsub to_string {\n  my ($self, $element_separator) = @_;\n  return 0 if $self->{'_max_size'} == 0;\n  $element_separator ||= 1;\n  if ($element_separator == 1) {\n    $element_separator = ' ';\n  } elsif ($element_separator == 2) {\n    $element_separator = \"\\t\";\n  } elsif ($element_separator == 3) {\n    $element_separator = \"\\n\";\n  } else {\n    $self->throw(\"Unknown separator type '$element_separator'\\n\");\n  }\n  my $str = '';\n  for (my $q = 1 ; $q <= $self->{'_max_size'} ; $q++) {\n    my $val = 0;\n    if (exists $self->{'_spectrum'}{$q}) {\n      $val = $self->{'_spectrum'}{$q};\n    }\n    $str .= $val.$element_separator;\n  }\n  $str =~ s/\\s$//;\n  return $str;\n}\n\n\n=head2 add\n\n  Title   : add\n  Usage   : $csp->add($additional_csp);\n  Function: Add a contig spectrum to an existing one: sums the spectra, update\n            the number of sequences, number of repetitions, ...\n  Returns : 1 for success, 0 for failure\n  Args    : Bio::Assembly::Tools::ContigSpectrum object\n\n\nsub add {\n  my ($self, $csp) = @_;\n  # Sanity check\n  if( !ref $csp || ! $csp->isa('Bio::Assembly::Tools::ContigSpectrum') ) {\n        $self->throw(\"Unable to process non Bio::Assembly::Tools::ContigSpectrum \".\n        \"object [\".ref($csp).\"]\");\n  }\n  # Update overlap statistics\n  if ( $self->{'_eff_asm_params'} > 0 ) {\n    # Warnings\n    if ( $csp->{'_eff_asm_params'} == 0 ) {\n      $self->warn(\"The parent contig spectrum needs effective assembly \".\n        \"parameters (eff_asm_params = \".$self->{'_eff_asm_params'}.\") but the \".\n        \"child contig spectrum doesn't have them (eff_asm_params = \".\n        $csp->{'_eff_asm_params'}.\"). 
Skipping them...\");\n    } elsif ( $csp->{'_eff_asm_params'} != $self->{'_eff_asm_params'} ) {\n      $self->warn(\"The parent contig spectrum needs a different method for \".\n        \"detecting the effective assembly parameters (eff_asm_params = \".\n        $self->{'_eff_asm_params'}.\") than the one specified for the child \".\n        \"contig spectrum (eff_asm_params = \".$csp->{'_eff_asm_params'}.\"). \".\n        \"Ignoring the differences...\");\n    }\n    # Update existing stats\n    my $tot_num_overlaps = $csp->{'_nof_overlaps'} + $self->{'_nof_overlaps'};\n    $self->{'_min_overlap'} = $csp->{'_min_overlap'} if\n      defined $csp->{'_min_overlap'} && ( ! defined $self->{'_min_overlap'} ||\n      $csp->{'_min_overlap'} < $self->{'_min_overlap'} );\n    $self->{'_min_identity'} = $csp->{'_min_identity'} if\n      defined $csp->{'_min_identity'} && ( ! defined $self->{'_min_identity'} ||\n      $csp->{'_min_identity'} < $self->{'_min_identity'} );\n    if ($tot_num_overlaps != 0) {\n      $self->{'_avg_overlap'} =\n        ($csp->{'_avg_overlap'} * $csp->{'_nof_overlaps'}\n        + $self->{'_avg_overlap'} * $self->{'_nof_overlaps'})\n        / $tot_num_overlaps;\n      $self->{'_avg_identity'} =\n        ($csp->{'_avg_identity'} * $csp->{'_nof_overlaps'}\n        + $self->{'_avg_identity'} * $self->{'_nof_overlaps'})\n        / $tot_num_overlaps;\n    }\n    $self->{'_nof_overlaps'} = $tot_num_overlaps;\n  }\n  # Update sequence statistics\n  my $tot_nof_seq = $csp->{'_nof_seq'} + $self->{'_nof_seq'};\n  if (not $tot_nof_seq == 0) {\n    $self->{'_avg_seq_len'} = ($csp->{'_avg_seq_len'} * $csp->{'_nof_seq'} +\n      $self->{'_avg_seq_len'} * $self->{'_nof_seq'}) / $tot_nof_seq;\n  }\n  # Update spectrum (and nof_seq, max_size, and increment nof_rep by 1)\n  $self->_import_spectrum($csp->{'_spectrum'});\n  # Update nof_rep\n  $self->{'_nof_rep'}--;\n  $self->{'_nof_rep'} += $csp->{'_nof_rep'};\n  # Update list of assembly objects used\n  push 
@{$self->{'_assembly'}}, @{$csp->{'_assembly'}}\n    if defined $csp->{'_assembly'};\n  return 1;\n}\n\n\n=head2 average\n\n  Title   : average\n  Usage   : my $avg_csp = $csp->average([$csp1, $csp2, $csp3]);\n  Function: Average one contig spectrum or the sum of several contig spectra by\n            the number of repetitions\n  Returns : Bio::Assembly::Tools::ContigSpectrum\n  Args    : Bio::Assembly::Tools::ContigSpectrum array reference\n            eff_asm_params\n\n\nsub average {\n  my ($self, $list) = @_;\n  # Sanity check\n  if ( ! ref $list || ! ref $list eq 'ARRAY') {\n    $self->throw(\"Average takes an array reference but got [\".ref($list).\"]\");\n  }\n  # New average contig spectrum object\n  my $avg = Bio::Assembly::Tools::ContigSpectrum->new;\n  $avg->{'_eff_asm_params'} = 1;\n  \n  # Cycle through contig spectra\n  my $tot_nof_rep = 0;\n  for my $csp (@$list) {\n    # Sanity check\n    if (not $csp->isa('Bio::Assembly::Tools::ContigSpectrum')) {\n      $csp->throw(\"Unable to process non Bio::Assembly::Tools::ContigSpectrum \".\n        \"object [\".ref($csp).\"]\");\n    }\n    # Import contig spectrum\n    $avg->add($csp);\n  }\n  \n  # Average sum of contig spectra by number of repetitions\n  for (my $q = 1 ; $q <= $avg->{'_max_size'} ; $q++) {\n    $avg->{'_spectrum'}{$q} /= $avg->{'_nof_rep'}\n      if (defined $avg->{'_spectrum'}{$q});\n  }\n  # Average number of sequences\n  $avg->{'_nof_seq'} /= $avg->{'_nof_rep'};\n  # Average number of overlaps\n  $avg->{'_nof_overlaps'} /= $avg->{'_nof_rep'};\n  \n  return $avg;\n}\n\n\n=head2 score\n\n  Title   : score\n  Usage   : my $score = $csp->score();\n  Function: Score a contig spectrum (or cross-contig spectrum) such that the\n             higher the number of contigs (or cross-contigs) and the larger their \n             size, the higher the score.\n             Let n   : total number of sequences\n                 c_q : number of contigs of size q\n                 q   : number of sequence 
in a contig\n             We define: score = n/(n-1) * (X - 1/n)\n                  where X = sum ( c_q * q^2 ) / n**2\n             The score ranges from 0 (singlets only) to 1 (a single large contig)\n             It is possible to specify a value for the number of sequences to\n              assume in the contig spectrum.\n  Returns : contig score\n  Args    : number of total sequences to assume [optional]\n\n\nsub score {\n  my ($self, $nof_seqs) = @_;\n  # Main\n  my $score = 0;\n  my $n = $self->nof_seq;\n  if ( $n > 0 ) {\n    # Contig spectrum info\n    my $q_max = $self->max_size;\n    my $spec  = $self->spectrum;\n    # Adjust number of 1-contigs\n    if ( $nof_seqs ) {\n      $spec->{'1'} += $nof_seqs - $n;\n      $n = $nof_seqs;\n    }\n    # Calculate X\n    for my $q ( 1 .. $q_max ) {\n      if ( $spec->{$q} ) {\n        my $c_q = $spec->{$q};\n        $score += $c_q * $q ** 2;\n      }\n    }\n    $score /= $n ** 2; \n  }\n  # Rescale X to obtain the score\n  $score = $n/($n-1) * ($score - 1/$n);\n  return $score;\n}\n\n\n=head2 _naive_assembler\n\n  Title   : _naive_assembler\n  Usage   : \n  Function: Determines the contig spectrum (hash representation) of a subset of\n            sequences from a mixed contig spectrum by \"reassembling\" the\n            specified sequences only based on their position in the contig. This\n            naive assembly only verifies that the minimum overlap length and\n            percentage identity are respected. There is no actual alignment done\n  Returns : contig spectrum hash reference\n  Args    : Bio::Assembly::Contig\n            sequence ID array reference\n            minimum overlap length (integer) [optional]\n            minimum percentage identity (integer) [optional]\n\n\nsub _naive_assembler {\n  my ($self, $contig, $seqlist, $min_overlap, $min_identity) = @_;\n  # Sanity checks\n  if ( ! ref $seqlist || ! ref($seqlist) eq 'ARRAY') {\n    $self->throw('Expecting an array reference. 
Got ['.ref($seqlist).\"] \\n\");\n  }\n  my $max = scalar @$seqlist;\n  $self->throw(\"Expecting at least 2 sequences as input for _naive_assembler\")\n    if ($max < 2);\n  # Assembly\n  my %spectrum = (1 => 0);\n  my %overlap_map;\n  my %has_overlap;\n  # Map what sequences overlap with what sequences\n  for (my $i = 0 ; $i < $max-1 ; $i++) {\n    # query sequence\n    my $qseqid = $$seqlist[$i];\n    my $qseq   = $contig->get_seq_by_name($qseqid);\n    my $is_singlet = 1;\n    for (my $j = $i+1 ; $j < $max ; $j++) {\n      # target sequence\n      my $tseqid = $$seqlist[$j];\n      my $tseq = $contig->get_seq_by_name($tseqid);\n      # try to align sequences\n      my ($aln, $overlap, $identity)\n        = $self->_overlap_alignment($contig, $qseq, $tseq, $min_overlap,\n        $min_identity);\n      # if there is no valid overlap, go to next sequence\n      next if ! defined $aln;\n      # the overlap is valid\n      $is_singlet = 0;\n      push @{$overlap_map{$qseqid}}, $tseqid;\n      $has_overlap{$tseqid} = 1;\n      $has_overlap{$qseqid} = 1;\n    }\n    # check if sequence is in previously seen overlap\n    if (exists $has_overlap{$qseqid}) {\n      $is_singlet = 0;\n    }\n    if ($is_singlet == 1) {\n      $spectrum{1}++;\n    }\n  }\n  # take care of last sequence\n  my $last_is_singlet = 1;\n  if (exists $has_overlap{$$seqlist[$max-1]}) {\n    $last_is_singlet = 0;\n  }\n  if ($last_is_singlet == 1) {\n    $spectrum{1}++;\n  }\n  # Parse overlap map\n  for my $seqid (@$seqlist) {\n    # list of sequences that should go in the contig\n    next if not exists $overlap_map{$seqid};\n    my @overlist = @{$overlap_map{$seqid}};\n    for (my $j = 0 ; $j < scalar(@overlist) ; $j++) {\n      my $otherseqid = $overlist[$j];\n      if (exists $overlap_map{$otherseqid}) {\n        push @overlist, @{$overlap_map{$otherseqid}};\n        delete $overlap_map{$otherseqid};\n      }\n    }\n    # remove duplicates from list\n    @overlist = sort @overlist;\n    for (my 
$j = 0 ; $j < scalar(@overlist)-1 ; $j++) {\n      if ( $overlist[$j] eq $overlist[$j+1] ) {\n        splice @overlist, $j, 1;\n        $j--;\n      }\n    }\n    # update spectrum with size of contig\n    my $qsize = scalar(@overlist) + 1;\n    if (defined $spectrum{$qsize}) {\n      $spectrum{$qsize}++;\n    } else {\n      $spectrum{$qsize} = 1;\n    }\n  }\n  return \\%spectrum;\n}\n\n\n=head2 _new_from_assembly\n\n  Title   : _new_from_assembly\n  Usage   : \n  Function: Creates a new contig spectrum object based solely on the result of \n            an assembly\n  Returns : Bio::Assembly::Tools::ContigSpectrum\n  Args    : Bio::Assembly::Scaffold\n\n\nsub _new_from_assembly {\n  # Create new contig spectrum object based purely on what we can get from the\n  # assembly object\n  my ($self, $assemblyobj) = @_;\n  my $csp = Bio::Assembly::Tools::ContigSpectrum->new();\n  # 1: Set id\n  $csp->{'_id'} = $assemblyobj->id;\n  # 2: Set overlap statistics: nof_overlaps, min_overlap, avg_overlap,\n  #  min_identity and avg_identity\n  $csp->{'_eff_asm_params'} = $self->{'_eff_asm_params'};\n  if ($csp->{'_eff_asm_params'} > 0) {\n     my ($nover, $minl, $avgl, $minid, $avgid)\n       = $csp->_get_overlap_stats($assemblyobj);\n     $csp->{'_min_overlap'}  = $minl;\n     $csp->{'_min_identity'} = $minid;\n     $csp->{'_avg_overlap'}  = $avgl;\n     $csp->{'_avg_identity'} = $avgid;\n     $csp->{'_nof_overlaps'} = $nover;\n  }\n  # 3: Set sequence statistics: nof_seq and avg_seq_len\n  my ($nseq, $avgseql) = $self->_get_seq_stats($assemblyobj);\n  $csp->{'_avg_seq_len'} = $avgseql;\n  $csp->{'_nof_seq'}     = $nseq;\n  # 4: Set the spectrum: spectrum and max_size\n  for my $contigobj ($assemblyobj->all_contigs) {\n    my $size = $contigobj->num_sequences;\n    if (defined $csp->{'_spectrum'}{$size}) {\n      $csp->{'_spectrum'}{$size}++;\n    } else {\n      $csp->{'_spectrum'}{$size} = 1;\n    }\n    $csp->{'_max_size'} = $size if $size > $csp->{'_max_size'};\n  }\n  my 
$nof_singlets = $assemblyobj->get_nof_singlets();\n  if (defined $nof_singlets) {\n    $csp->{'_spectrum'}{1} += $nof_singlets;\n    $csp->{'_max_size'} = 1 if $nof_singlets >= 1 && $csp->{'_max_size'} < 1;\n  }\n  # 5: Set list of assembly objects used\n  push @{$csp->{'_assembly'}}, $assemblyobj;\n  # 6: Set number of repetitions\n  $csp->{'_nof_rep'} = 1;\n  return $csp;\n}\n\n\n\n=head2 _new_dissolved_csp\n\n  Title   : \n  Usage   : create a dissolved contig spectrum object\n  Function: \n  Returns : \n  Args    : \n\n\n\nsub _new_dissolved_csp {\n  my ($self, $mixed_csp, $seq_header) = @_;\n  # Sanity checks on the mixed contig spectrum\n\n  # min_overlap and min_identity must be specified if there are some overlaps\n  # in the mixed contig\n  unless ($mixed_csp->{'_nof_overlaps'} == 0) {\n    unless ( defined $self->{'_min_overlap'} || \n      defined $mixed_csp->{'_min_overlap'} ) {\n      $self->throw(\"min_overlap must be defined in the dissolved contig spectrum\".\n        \" or mixed contig spectrum to dissolve a contig\");\n    }\n    unless ( defined $self->{'_min_identity'} ||\n      defined $mixed_csp->{'_min_identity'} ) {\n      $self->throw(\"min_identity must be defined in the dissolved contig spectrum\".\n        \" or mixed contig spectrum\");\n    }\n  }\n  \n  # there must be at least one assembly in mixed contig spectrum\n  if (!defined $mixed_csp->{'_assembly'} ||\n      scalar @{$mixed_csp->{'_assembly'}} < 1) {\n    $self->throw(\"The mixed contig spectrum must be based on at least one\n    assembly\");\n  }\n\n  # New dissolved contig spectrum object\n  my $dissolved = Bio::Assembly::Tools::ContigSpectrum->new();\n  \n  # take parent attributes if existent or mixed attributes otherwise\n  if ($self->{'_eff_asm_params'}) {\n    $dissolved->{'_eff_asm_params'} = $self->{'_eff_asm_params'};\n  } else {\n    $dissolved->{'_eff_asm_params'} = $mixed_csp->{'_eff_asm_params'};\n  }\n  if ($self->{'_min_overlap'} && $self->{'_min_identity'}) 
{\n    ( $dissolved->{'_min_overlap'}, $dissolved->{'_min_identity'} ) = \n      ( $self->{'_min_overlap'}, $self->{'_min_identity'} );\n  } else {\n    ( $dissolved->{'_min_overlap'}, $dissolved->{'_min_identity'} ) = \n      ( $mixed_csp->{'_min_overlap'}, $mixed_csp->{'_min_identity'} );\n  }\n  \n  # Dissolve each assembly\n  for my $assembly (@{$mixed_csp->{'_assembly'}}) {\n    # Dissolve this assembly for the given sequences\n    my %asm_spectrum = (1 => 0);\n    my %good_seqs;\n    # For each contig\n    for my $contig ($assembly->all_contigs) {\n      # Get good sequences\n      my @contig_seqs;\n      for my $seq ($contig->each_seq) {\n        my $seq_id = $seq->id;\n        # get sequence origin\n        next unless $seq_id =~ m/^$seq_header\\|/;\n        # add it to hash\n        push @contig_seqs, $seq_id;\n        $good_seqs{$seq_id} = 1;\n      }\n      # Update spectrum\n      my $size = scalar @contig_seqs;\n      if ($size == 0) {\n        next;\n      } elsif ($size == 1) {\n        $asm_spectrum{1}++;\n      } elsif ($size > 1) {\n        # Reassemble good sequences\n        my $contig_spectrum = $dissolved->_naive_assembler(\n          $contig, \\@contig_seqs, $dissolved->{'_min_overlap'},\n          $dissolved->{'_min_identity'});\n        # update spectrum\n        for my $qsize (keys %$contig_spectrum) {\n          $asm_spectrum{$qsize} += $$contig_spectrum{$qsize};\n        }\n      } else {\n        $self->throw(\"The size is not valid... 
how could that happen?\");\n      }\n    }\n    # For each singlet\n    for my $singlet ($assembly->all_singlets) {\n      my $seq_id = $singlet->seqref->id;\n      # get sequence origin\n      next unless $seq_id =~ m/^$seq_header\\|/;\n      # add it to hash\n      $good_seqs{$seq_id} = 1;\n      # update spectrum\n      $asm_spectrum{1}++;\n    }\n    # Update spectrum\n    $dissolved->_import_spectrum(\\%asm_spectrum);\n    # Update nof_rep\n    $dissolved->{'_nof_rep'}--;\n    $dissolved->{'_nof_rep'} += $mixed_csp->{'_nof_rep'};\n\n    # Get sequence stats\n    my ($nseq, $avgseql) = $dissolved->_get_seq_stats($assembly, \\%good_seqs);\n    $dissolved->{'_avg_seq_len'} = $avgseql;\n    $dissolved->{'_nof_seq'}     = $nseq;\n  \n    # Get eff_asm_param for these sequences\n    if ($dissolved->{'_eff_asm_params'} > 0) {\n      my ($nover, $minl, $avgl, $minid, $avgid)\n        = $dissolved->_get_overlap_stats($assembly, \\%good_seqs);\n      $dissolved->{'_min_overlap'}  = $minl;\n      $dissolved->{'_min_identity'} = $minid;\n      $dissolved->{'_avg_overlap'}  = $avgl;\n      $dissolved->{'_avg_identity'} = $avgid;\n      $dissolved->{'_nof_overlaps'} = $nover;\n    }\n\n  }\n  return $dissolved;\n}\n\n\n=head2 _new_cross_csp\n\n  Title   : \n  Usage   : \n  Function: create a cross contig spectrum object\n  Returns : \n  Args    : \n\n\n\nsub _new_cross_csp {\n  my ($self, $mixed_csp) = @_;\n  # Sanity check on the mixed contig spectrum\n  # There must be at least one assembly\n  if (!defined $mixed_csp->{'_assembly'} ||\n      scalar @{$mixed_csp->{'_assembly'}} < 1) {\n    $self->throw(\"The mixed contig spectrum must be based on at least one \".\n    \"assembly.\");\n  }\n  \n  # New dissolved contig spectrum object\n  my $cross = Bio::Assembly::Tools::ContigSpectrum->new();\n  my %spectrum = (1 => 0);\n  \n  # Take parent or mixed attributes\n  if ($self->{'_eff_asm_params'}) {\n    $cross->{'_eff_asm_params'} = $self->{'_eff_asm_params'};\n  } else {\n  
  $cross->{'_eff_asm_params'} = $mixed_csp->{'_eff_asm_params'};\n  }\n  if ($self->{'_min_overlap'} && $self->{'_min_identity'}) {\n    ( $cross->{'_min_overlap'}, $cross->{'_min_identity'} ) = \n      ( $self->{'_min_overlap'}, $self->{'_min_identity'} );\n  } else {\n    ( $cross->{'_min_overlap'}, $cross->{'_min_identity'} ) = \n      ( $mixed_csp->{'_min_overlap'}, $mixed_csp->{'_min_identity'} );\n  }\n  \n  # Get cross contig spectrum for each assembly\n  for my $assembly (@{$mixed_csp->{'_assembly'}}) {\n    # Go through contigs and skip the pure ones\n    my %good_seqs;\n    for my $contig ($assembly->all_contigs) {\n      # Get origins\n      my @seq_origins;\n      my @seq_ids;\n      for my $seq ($contig->each_seq) {\n        # current sequence origin\n        my $seq_id = $seq->id;\n        $seq_id =~ m/^(.+)\\|/;\n        my $seq_header = $1;\n        $self->warn(\"Sequence $seq_id does not seem to have a header. Skipping\".\n          \" it...\") if not defined $seq_header;\n        $seq_header ||= '';\n        push @seq_origins, $seq_header;\n        push @seq_ids, $seq_id;\n      }\n      my $qsize = scalar(@seq_ids);\n      my @origins = sort { $a cmp $b } @seq_origins;\n      my $size = scalar(@origins);\n      for (my $i = 1 ; $i < $size ; $i++) {\n        if ($origins[$i] eq $origins[$i-1]) {\n          splice @origins, $i, 1;\n          $i--;\n          $size--;\n        }\n      }\n      # Update cross-contig number in spectrum\n      if ($size > 1) { # cross-contig detected\n        # update good sequences\n        for my $seq_id (@seq_ids) {\n          $good_seqs{$seq_id} = 1;\n        }\n        # update number of cross q-contigs in spectrum\n        if (defined $spectrum{$qsize}) {\n          $spectrum{$qsize}++;\n        } else {\n          $spectrum{$qsize} = 1;\n        }\n      }\n      # Update number of cross 1-contigs\n      if ($size > 1) { # cross-contig detected\n        for my $origin (@origins) {\n          # sequences to 
use\n          my @ids;\n          for (my $i = 0 ; $i < $qsize ; $i++) {\n            my $seq_origin = $seq_origins[$i];\n            my $seq_id = $seq_ids[$i];\n            push @ids, $seq_id if $seq_origin eq $origin;\n          }\n          if (scalar @ids == 1) {\n            $spectrum{1}++;\n          } elsif (scalar @ids > 1) {\n            my $contig_spectrum = $cross->_naive_assembler(\n              $contig, \\@ids, $cross->{'_min_overlap'},\n              $cross->{'_min_identity'});\n            $spectrum{1} += $$contig_spectrum{1};\n          } else {\n            $self->throw(\"The size is <= 0. How could such a thing happen?\");\n          }\n        }\n      }\n    }\n    # Get sequence stats\n    my ($nseq, $avgseql) = $cross->_get_seq_stats($assembly, \\%good_seqs);\n    $cross->{'_avg_seq_len'} = $avgseql;\n    $cross->{'_nof_seq'}     = $nseq;\n    # Get eff_asm_param for these sequences\n    if ($cross->{'_eff_asm_params'} > 0) {\n      my ($nover, $minl, $avgl, $minid, $avgid)\n        = $cross->_get_overlap_stats($assembly, \\%good_seqs);\n      $cross->{'_min_overlap'}  = $minl;\n      $cross->{'_min_identity'} = $minid;\n      $cross->{'_avg_overlap'}  = $avgl;\n      $cross->{'_avg_identity'} = $avgid;\n      $cross->{'_nof_overlaps'} = $nover;\n    }\n  }\n  \n  $cross->_import_spectrum(\\%spectrum);\n  # Update nof_rep\n  $cross->{'_nof_rep'}--;\n  $cross->{'_nof_rep'} += $mixed_csp->{'_nof_rep'};\n  \n  return $cross;\n}\n\n=head2 _import_assembly\n\n  Title   : _import_assembly\n  Usage   : $csp->_import_assembly($assemblyobj);\n  Function: Update a contig spectrum object based on an assembly object\n  Returns : 1 for success, 0 for error\n  Args    : Bio::Assembly::Scaffold assembly object\n\n\nsub _import_assembly {\n  my ($self, $assemblyobj) = @_;\n  # Sanity check\n  if( !ref $assemblyobj || ! 
$assemblyobj->isa('Bio::Assembly::ScaffoldI') ) {\n        $self->throw(\"Unable to process non Bio::Assembly::ScaffoldI assembly \".\n        \"object [\".ref($assemblyobj).\"]\");\n  }\n  # Create new object from assembly\n  my $csp = $self->_new_from_assembly($assemblyobj);\n  # Update current contig spectrum object with new one\n  $self->add($csp);\n  return 1;\n}\n\n\n=head2 _import_spectrum\n\n  Title   : _import_spectrum\n  Usage   : $csp->_import_spectrum({ 1 => 90 , 2 => 3 , 4 => 1 })\n  Function: update a contig spectrum object based on a contig spectrum\n            represented as a hash (key: contig size, value: number of contigs of\n            this size)\n  Returns : 1 for success, 0 for error\n  Args    : contig spectrum as a hash reference\n\n\nsub _import_spectrum {\n  my ($self, $spectrum) = @_;\n  # Sanity check\n  if( ! ref $spectrum || ! ref $spectrum eq 'HASH') {\n    $self->throw(\"Spectrum should be a hash reference, but it is [\".\n      ref($spectrum).\"]\");\n  }\n  \n  # Update the spectrum (+ nof_rep, max_size and nof_seq)\n  for my $size (keys %$spectrum) {\n    # Get the number of contigs of different size\n    if (defined $self->{'_spectrum'}{$size}) {\n      $self->{'_spectrum'}{$size} += $$spectrum{$size};\n    } else {\n      $self->{'_spectrum'}{$size} = $$spectrum{$size};\n    }\n    # Update nof_seq\n    $self->{'_nof_seq'} += $size * $$spectrum{$size};\n    # Update max_size\n    $self->{'_max_size'} = $size if $size > $self->{'_max_size'};\n  }\n  \n  # If the contig spectrum has only zero 1-contigs, max_size is zero\n  $self->{'_max_size'} = 0 if scalar keys %{$self->{'_spectrum'}} == 1 &&\n    defined $self->{'_spectrum'}{'1'} && $self->{'_spectrum'}{'1'} == 0;\n  \n  # Update nof_rep\n  $self->{'_nof_rep'}++;\n  return 1;\n}\n\n=head2 _import_dissolved_csp\n\n  Title   : _import_dissolved_csp\n  Usage   : $csp->_import_dissolved_csp($mixed_csp, $seq_header);\n  Function: Update a contig spectrum object by dissolving a 
mixed contig\n            spectrum based on the header of the sequences\n  Returns : 1 for success, 0 for error\n  Args    : Bio::Assembly::Tools::ContigSpectrum\n            sequence header string"},"line":1329,"range":{"start":{"character":0,"line":1329},"end":{"line":1341,"character":9999}},"kind":12,"definition":"sub","detail":"($self,$mixed_csp,$seq_header)","children":[{"definition":"my","containerName":"_import_dissolved_csp","localvar":"my","kind":13,"name":"$self","line":1330},{"line":1330,"containerName":"_import_dissolved_csp","kind":13,"name":"$mixed_csp"},{"kind":13,"containerName":"_import_dissolved_csp","name":"$seq_header","line":1330},{"name":"$mixed_csp","containerName":"_import_dissolved_csp","kind":13,"line":1332},{"line":1332,"name":"$seq_header","kind":13,"containerName":"_import_dissolved_csp"},{"name":"$self","containerName":"_import_dissolved_csp","kind":13,"line":1333},{"containerName":"_import_dissolved_csp","kind":12,"name":"throw","line":1333},{"line":1337,"name":"$dissolved_csp","kind":13,"localvar":"my","containerName":"_import_dissolved_csp","definition":"my"},{"line":1337,"name":"$self","kind":13,"containerName":"_import_dissolved_csp"},{"line":1337,"kind":12,"containerName":"_import_dissolved_csp","name":"_new_dissolved_csp"},{"line":1337,"kind":13,"containerName":"_import_dissolved_csp","name":"$mixed_csp"},{"line":1337,"name":"$seq_header","containerName":"_import_dissolved_csp","kind":13},{"kind":13,"containerName":"_import_dissolved_csp","name":"$self","line":1339},{"line":1339,"containerName":"_import_dissolved_csp","kind":12,"name":"add"},{"kind":13,"containerName":"_import_dissolved_csp","name":"$dissolved_csp","line":1339}],"containerName":"main::","name":"_import_dissolved_csp"},{"signature":{"label":"_import_cross_csp($self,$mixed_csp)","documentation":"__END__\n#\n# BioPerl module for Bio::Assembly::Tools::ContigSpectrum\n#\n# Copyright by Florent Angly\n#\n# You may distribute this module under the same terms as Perl 
itself\n#\n# POD documentation - main docs before the code\n\n=head1 NAME\n\nBio::Assembly::Tools::ContigSpectrum - create and manipulate contig spectra\n\n=head1 SYNOPSIS\n\n  # Simple contig spectrum creation\n  my $csp1 = Bio::Assembly::Tools::ContigSpectrum->new(\n    -id       => 'csp1',\n    -spectrum => { 1 => 10,\n                   2 => 2,\n                   3 => 1 } );\n\n  # ...or another way to create a simple contig spectrum\n  my $csp2 = Bio::Assembly::Tools::ContigSpectrum->new;\n  $csp2->id('csp2');\n  $csp2->spectrum({ 1 => 20, 2 => 1, 4 => 1 });\n\n  # Get some information\n  print \"This is contig spectrum \".$csp->id.\"\\n\";\n  print \"It contains \".$csp->nof_seq.\" sequences\\n\";\n  print \"The largest contig has \".$csp->max_size.\" sequences\\n\";\n  print \"The spectrum is: \".$csp->to_string($csp->spectrum).\"\\n\";\n\n  # Let's add the contig spectra\n  my $summed_csp = Bio::Assembly::Tools::ContigSpectrum->new;\n  $summed_csp->add($csp1);\n  $summed_csp->add($csp2);\n  print \"The summed contig spectrum is \".$summed_csp->to_string.\"\\n\";\n\n  # Make an average\n  my $avg_csp = Bio::Assembly::Tools::ContigSpectrum->new;\n  $avg_csp = $avg_csp->average([$csp1, $csp2]);\n  print \"The average contig spectrum is \".$avg_csp->to_string.\"\\n\";\n\n  # Get a contig spectrum from an assembly\n  my $from_assembly = Bio::Assembly::Tools::ContigSpectrum->new(\n    -assembly       => $assembly_object,\n    -eff_asm_params => 1);\n  print \"The contig spectrum from assembly is \".$from_assembly->to_string.\"\\n\";\n\n  # Report advanced information (possible because eff_asm_params = 1)\n  print \"Average sequence length: \".$from_assembly->avg_seq_length.\" bp\\n\";\n  print \"Minimum overlap length: \".$from_assembly->min_overlap.\" bp\\n\";\n  print \"Average overlap length: \".$from_assembly->avg_overlap.\" bp\\n\";\n  print \"Minimum overlap match: \".$from_assembly->min_identity.\" %\\n\";\n  print \"Average overlap match: 
\".$from_assembly->avg_identity.\" %\\n\";\n\n  # Assuming the assembly object contains sequences from several different\n  # metagenomes, we have a mixed contig spectrum from which a cross contig\n  # spectrum and dissolved contig spectra can be obtained\n  my $mixed_csp = $from_assembly;\n\n  # Calculate a dissolved contig spectrum\n  my $meta1_dissolved = Bio::Assembly::Tools::ContigSpectrum->new(\n    -dissolve => [$mixed_csp, 'metagenome1'] );\n  my $meta2_dissolved = Bio::Assembly::Tools::ContigSpectrum->new(\n    -dissolve => [$mixed_csp, 'metagenome2'] );\n  print \"The dissolved contig spectra are:\\n\".\n    $meta1_dissolved->to_string.\"\\n\".\n    $meta2_dissolved->to_string.\"\\n\";\n\n  # Determine a cross contig spectrum\n  my $cross_csp = Bio::Assembly::Tools::ContigSpectrum->new(\n    -cross => $mixed_csp );\n  print \"The cross contig spectrum is \".$cross_csp->to_string.\"\\n\";\n\n  # Score a contig spectrum (the more abundant the contigs and the larger their\n  # size, the larger the score)\n\n\n=head1 DESCRIPTION\n\nThe Bio::Assembly::Tools::ContigSpectrum Perl module enables to\nmanually create contig spectra, import them from assemblies,\nmanipulate them, transform between different types of contig spectra\nand output them.\n\nBio::Assembly::Tools::ContigSpectrum is a module to create, manipulate\nand output contig spectra, assembly-derived data used in metagenomics\n(community genomics) for diversity estimation.\n\n=head2 Background\n\nA contig spectrum is the count of the number of contigs of different\nsize in an assembly. For example, the contig spectrum [100 5 1 0 0\n...] means that there were 100 singlets (1-contigs), 5 contigs of 2\nsequences (2-contigs), 1 contig of 3 sequences (3-contig) and no\nlarger contigs.\n\nAn assembly can be produced from a mixture of sequences from different\nmetagenomes. The contig obtained from this assembly is a mixed contig\nspectrum. 
The contribution of each metagenome in this mixed contig\nspectrum can be obtained by determining a dissolved contig spectrum.\n\nFinally, based on a mixed contig spectrum, a cross contig spectrum can\nbe determined. In a cross contig spectrum, only contigs containing\nsequences from different metagenomes are kept; \"pure\" contigs are\nexcluded. Additionally, the total number of singletons (1-contigs)\nfrom each region that assembles with any fragments from other regions\nis the number of 1-contigs in the cross contig spectrum.\n\n=head2 Implementation\n\nThe simplest representation of a contig spectrum is as a hash\nrepresentation where the key is the contig size (number of sequences\nmaking up the contig) and the value the number of contigs of this\nsize.\n\nIn fact, it is useful to have more information associated with the\ncontig spectrum, hence the Bio::Assembly::Tools::ContigSpectrum module\nimplements an object containing a contig spectrum hash and additional\ninformation. The get/set methods to access them are:\n\n    id              contig spectrum ID\n    nof_seq         number of sequences\n    nof_rep         number of repetitions (assemblies) used\n    max_size        size of (number of sequences in) the largest contig\n    nof_overlaps    number of overlaps\n    min_overlap     minimum overlap length for building a contig\n    min_identity    minimum sequence identity over the overlap length\n    avg_overlap     average overlap length\n    avg_identity    average overlap identity\n    avg_seq_length  average sequence length\n    eff_asm_params  effective assembly parameters\n    spectrum        hash representation of a contig spectrum\n\n  Operations on the contig spectra:\n\n    to_string       create a string representation of the spectrum\n    spectrum        import a hash contig spectrum\n    assembly        determine a contig spectrum from an assembly\n    dissolve        calculate a dissolved contig spectrum (based on assembly)\n    cross       
    produce a cross contig spectrum (based on assembly)\n    add             add a contig spectrum to an existing one\n    average         make an average of several contig spectra\n\nWhen using operations that rely on knowing \"where\" (from what\nmetagenomes) a sequence came from (i.e. when creating a dissolved or\ncross contig spectrum), make sure that the sequences used for the\nassembly have a name header, e.g.  E<gt>metagenome1|seq1,\nE<gt>metagenome2|seq1, ...\n\n=head1 FEEDBACK\n\n=head2 Mailing Lists\n\nUser feedback is an integral part of the evolution of this and other\nBioperl modules. Send your comments and suggestions preferably to the\nBioperl mailing lists. Your participation is much appreciated.\n\n  bioperl-l@bioperl.org                  - General discussion\n  http://bioperl.org/wiki/Mailing_lists  - About the mailing lists\n\n\n=head2 Support \n\nPlease direct usage questions or support issues to the mailing list:\n\nI<bioperl-l@bioperl.org>\n\nrather than to the module maintainer directly. Many experienced and \nresponsive experts will be able to look at the problem and quickly \naddress it. Please include a thorough description of the problem \nwith code and data examples if at all possible.\n\n=head2 Reporting Bugs\n\nReport bugs to the BioPerl bug tracking system to help us keep track\nof the bugs and their resolution. Bug reports can be submitted via email\nor the web:\n\n  bioperl-bugs@bio.perl.org\n  http://bugzilla.bioperl.org/\n\n=head1 AUTHOR - Florent E Angly\n\nEmail florent_dot_angly_at_gmail_dot_com\n\n=head1 APPENDIX\n\nThe rest of the documentation details each of the object\nmethods. 
Internal methods are usually preceded with a \"_\".\n\n\npackage Bio::Assembly::Tools::ContigSpectrum;\n\nuse strict;\n\nuse Bio::Root::Root;\nuse Bio::Assembly::Scaffold;\nuse Bio::SimpleAlign;\nuse Bio::LocatableSeq;\nuse Bio::Align::PairwiseStatistics;\n\nuse base 'Bio::Root::Root';\n\n\n=head2 new\n\n  Title   : new\n  Usage   : my $csp = Bio::Assembly::Tools::ContigSpectrum->new();\n              or\n            my $csp = Bio::Assembly::Tools::ContigSpectrum->new(\n              -id => 'some_name',\n              -spectrum =>  { 1 => 90 , 2 => 3 , 4 => 1 },\n            );\n              or\n            my $csp = Bio::Assembly::Tools::ContigSpectrum->new(\n              -assembly =>  $assembly_obj\n            );\n  Function: create a new contig spectrum object\n  Returns : reference to a contig spectrum object\n  Args    : none\n\n\nsub new {\n  my ($class, @args) = @_;\n  my $self = $class->SUPER::new(@args);\n  my ( $id, $nof_seq, $nof_rep, $max_size, $nof_overlaps, $min_overlap,\n    $min_identity, $avg_overlap, $avg_identity, $avg_seq_len, $spectrum,\n    $assembly, $eff_asm_params, $dissolve, $cross) = $self->_rearrange( [qw(ID\n    NOF_SEQ NOF_REP MAX_SIZE NOF_OVERLAPS MIN_OVERLAP MIN_IDENTITY AVG_OVERLAP\n    AVG_IDENTITY AVG_SEQ_LEN SPECTRUM ASSEMBLY EFF_ASM_PARAMS DISSOLVE CROSS)],\n    @args );\n\n  # First set up some defauts\n  $self->{'_id'}             = 'NoName';\n  $self->{'_nof_seq'}        = 0;\n  $self->{'_nof_rep'}        = 0;\n  $self->{'_max_size'}       = 0;\n  $self->{'_nof_overlaps'}   = 0;\n  $self->{'_min_overlap'}    = undef;\n  $self->{'_min_identity'}   = undef;\n  $self->{'_avg_overlap'}    = 0;\n  $self->{'_avg_identity'}   = 0;\n  $self->{'_avg_seq_len'}    = 0;\n  $self->{'_eff_asm_params'} = 0;\n  $self->{'_spectrum'}       = {1 => 0};  # contig spectrum hash representation\n  $self->{'_assembly'}       = []; # list of assembly objects used\n\n  # Then, according to user desires, override defaults\n  $self->{'_id'}           
  = $id             if (defined $id);\n  $self->{'_nof_seq'}        = $nof_seq        if (defined $nof_seq);\n  $self->{'_nof_rep'}        = $nof_rep        if (defined $nof_rep);\n  $self->{'_max_size'}       = $max_size       if (defined $max_size);\n  $self->{'_nof_overlaps'}   = $nof_overlaps   if (defined $nof_overlaps);\n  $self->{'_min_overlap'}    = $min_overlap    if (defined $min_overlap);\n  $self->{'_avg_overlap'}    = $avg_overlap    if (defined $avg_overlap);\n  $self->{'_min_identity'}   = $min_identity   if (defined $min_identity);\n  $self->{'_avg_identity'}   = $avg_identity   if (defined $avg_identity);\n  $self->{'_avg_seq_len'}    = $avg_seq_len    if (defined $avg_seq_len);\n  $self->{'_eff_asm_params'} = $eff_asm_params if (defined $eff_asm_params);\n\n  # Finally get stuff that can be gotten in an automated way\n  $self->_import_spectrum($spectrum) if defined($spectrum);\n  $self->_import_assembly($assembly) if defined($assembly);\n  if (defined($dissolve)) {\n    my ($mixed_csp, $header) = (@$dissolve[0], @$dissolve[1]);\n    $self->_import_dissolved_csp($mixed_csp, $header);\n  }\n  $self->_import_cross_csp($cross)   if defined($cross);\n\n  return $self;\n}\n\n\n=head2 id\n\n  Title   : id\n  Usage   : $csp->id\n  Function: get/set contig spectrum id\n  Returns : string\n  Args    : string [optional]\n\n\nsub id {\n  my ($self, $id) = @_;\n  if (defined $id) {\n    $self->{'_id'} = $id;\n  }\n  $id = $self->{'_id'};\n  return $id;\n}\n\n\n=head2 nof_seq\n\n  Title   : nof_seq\n  Usage   : $csp->nof_seq\n  Function: get/set the number of sequences making up the contig spectrum\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub nof_seq {\n  my ($self, $nof_seq) = @_;\n  if (defined $nof_seq) {\n    $self->throw(\"The number of sequences must be strictly positive. 
Got \".\n      \"'$nof_seq'\") if $nof_seq < 1;\n    $self->{'_nof_seq'} = $nof_seq;\n  }\n  $nof_seq = $self->{'_nof_seq'};\n  return $nof_seq;\n}\n\n\n=head2 nof_rep\n\n  Title   : nof_rep\n  Usage   : $csp->nof_rep\n  Function: Get/Set the number of repetitions (assemblies) used to create the \n            contig spectrum\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub nof_rep {\n  my ($self, $nof_rep) = @_;\n  if (defined $nof_rep) {\n    $self->throw(\"The number of repetitions must be strictly positive. Got \".\n      \"'$nof_rep'\") if $nof_rep < 1;\n    $self->{'_nof_rep'} = $nof_rep;\n  }\n  $nof_rep = $self->{'_nof_rep'};\n  return $nof_rep;\n}\n\n\n=head2 max_size\n\n  Title   : max_size\n  Usage   : $csp->max_size\n  Function: get/set the size of (number of sequences in) the largest contig\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub max_size {\n  my ($self, $max_size) = @_;\n  if (defined $max_size) {\n    $self->throw(\"The contig maximum size must be strictly positive. Got \".\n      \"'$max_size'\") if $max_size < 1;\n    $self->{'_max_size'} = $max_size;\n  }\n  $max_size = $self->{'_max_size'};\n  return $max_size;\n}\n\n\n=head2 nof_overlaps\n\n  Title   : nof_overlaps\n  Usage   : $csp->nof_overlaps\n  Function: Get/Set the number of overlaps in the assembly.\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub nof_overlaps {\n  my ($self, $nof_overlaps) = @_;\n  if (defined $nof_overlaps) {\n    $self->throw(\"The number of overlaps must be strictly positive. 
Got \".\n      \"'$nof_overlaps'\") if $nof_overlaps < 1;\n    $self->{'_nof_overlaps'} = $nof_overlaps;\n  }\n  $nof_overlaps = $self->{'_nof_overlaps'};\n  return $nof_overlaps;\n}\n\n\n=head2 min_overlap\n\n  Title   : min_overlap\n  Usage   : $csp->min_overlap\n  Function: get/set the assembly minimum overlap length\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub min_overlap {\n  my ($self, $min_overlap) = @_;\n  if (defined $min_overlap) {\n    $self->throw(\"The minimum of overlap length must be strictly positive. Got\".\n      \" '$min_overlap'\") if $min_overlap < 1;\n    $self->{'_min_overlap'} = $min_overlap;\n  }\n  $min_overlap = $self->{'_min_overlap'};\n  return $min_overlap;\n}\n\n\n=head2 avg_overlap\n\n  Title   : avg_overlap\n  Usage   : $csp->avg_overlap\n  Function: get/set the assembly average overlap length\n  Returns : decimal\n  Args    : decimal [optional]\n\n\nsub avg_overlap {\n  my ($self, $avg_overlap) = @_;\n  if (defined $avg_overlap) {\n    $self->throw(\"The average overlap length must be strictly positive. Got \".\n      \"'$avg_overlap'\") if $avg_overlap < 1;\n    $self->{'_avg_overlap'} = $avg_overlap;\n  }\n  $avg_overlap = $self->{'_avg_overlap'};\n  return $avg_overlap;\n}\n\n\n=head2 min_identity\n\n  Title   : min_identity\n  Usage   : $csp->min_identity\n  Function: get/set the assembly minimum overlap identity percent\n  Returns : 0 < decimal < 100\n  Args    : 0 < decimal < 100 [optional]\n\n\nsub min_identity {\n  my ($self, $min_identity) = @_;\n  if (defined $min_identity) {\n    $self->throw(\"The minimum overlap percent identity must be strictly \".\n      \"positive. 
Got '$min_identity'\") if $min_identity < 1;\n    $self->{'_min_identity'} = $min_identity;\n  }\n  $min_identity = $self->{'_min_identity'};\n  return $min_identity;\n}\n\n\n=head2 avg_identity\n\n  Title   : avg_identity\n  Usage   : $csp->avg_identity\n  Function: get/set the assembly average overlap identity percent\n  Returns : 0 < decimal < 100\n  Args    : 0 < decimal < 100 [optional]\n\n\nsub avg_identity {\n  my ($self, $avg_identity) = @_;\n  if (defined $avg_identity) {\n    $self->throw(\"The average overlap percent identity must be strictly \".\n      \"positive. Got '$avg_identity'\") if $avg_identity < 1;\n    $self->{'_avg_identity'} = $avg_identity;\n  }\n  $avg_identity = $self->{'_avg_identity'};\n  return $avg_identity;\n}\n\n\n=head2 avg_seq_len\n\n  Title   : avg_seq_len\n  Usage   : $csp->avg_seq_len\n  Function: get/set the assembly average sequence length\n  Returns : avg_seq_len\n  Args    : real [optional]\n\n\nsub avg_seq_len {\n  my ($self, $avg_seq_len) = @_;\n  if (defined $avg_seq_len) {\n    $self->throw(\"The average sequence length must be strictly positive. Got \".\n      \"'$avg_seq_len'\") if $avg_seq_len < 1;\n    $self->{'_avg_seq_len'} = $avg_seq_len;\n  }\n  $avg_seq_len = $self->{'_avg_seq_len'};\n  return $avg_seq_len;\n}\n\n\n=head2 eff_asm_params\n\n  Title   : eff_asm_params\n  Usage   : $csp->eff_asm_params(1)\n  Function: Get/set the effective assembly parameters option. It defines if the\n            effective assembly parameters should be determined when a contig\n            spectrum based or derived from an assembly is calulated. 
The\n            effective assembly parameters include avg_seq_length, nof_overlaps,\n            min_overlap, avg_overlap, min_identity and avg_identity.\n            1 = get them, 0 = don't.\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub eff_asm_params {\n  my ($self, $eff_asm_params) = @_;\n  if (defined $eff_asm_params) {\n    $self->throw(\"eff_asm_params can only take values 0 or 1. Input value was \".\n      \"'$eff_asm_params'\") unless $eff_asm_params == 0 || $eff_asm_params == 1;\n    $self->{'_eff_asm_params'} = $eff_asm_params;\n  }\n  $eff_asm_params = $self->{'_eff_asm_params'};\n  return $eff_asm_params;\n}\n\n\n=head2 spectrum\n\n  Title   : spectrum\n  Usage   : my $spectrum = $csp->spectrum({1=>10, 2=>2, 3=>1});\n  Function: Get the current contig spectrum represented as a hash / Update a\n            contig spectrum object based on a contig spectrum represented as a\n            hash\n            The hash representation of a contig spectrum is as following:\n              key   -> contig size (in number of sequences)\n              value -> number of contigs of this size\n  Returns : contig spectrum as a hash reference\n  Args    : contig spectrum as a hash reference [optional]\n\n\nsub spectrum {\n  my ($self, $spectrum) = @_;\n  if (defined $spectrum) {\n    $self->_import_spectrum($spectrum);\n  }\n  $spectrum = $self->{'_spectrum'};\n  return $spectrum;\n}\n\n\n=head2 assembly\n\n  Title   : assembly\n  Usage   : my @asm_list = $csp->assembly();\n  Function: Get a reference to the list of assembly object reference used to\n            make the contig spectrum object / Update the contig spectrum object\n            based on an assembly object.\n  Returns : array of Bio::Assembly::Scaffold\n  Args    : Bio::Assembly::Scaffold\n\n\nsub assembly {\n  my ($self, $assembly) = @_;\n  if (defined $assembly) {\n    $self->_import_assembly($assembly);\n  }\n  my @asm_list = @{$self->{'_assembly'}} if defined $self->{'_assembly'};\n  
return \\@asm_list;\n}\n\n=head2 drop_assembly\n\n  Title   : drop_assembly\n  Usage   : $csp->drop_assembly();\n  Function: Remove all assembly objects associated with a contig spectrum.\n            Assembly objects can be big. This method allows to free some memory\n            when assembly information is not needed anymore.\n  Returns : 1 for success, 0 for failure\n  Args    : none\n\n\nsub drop_assembly {\n  my ($self) = @_;\n  $self->{'_assembly'} = [];\n  return 1;\n}\n\n=head2 dissolve\n\n  Title   : dissolve\n  Usage   : $dissolved_csp->dissolve($mixed_csp, $seq_header);\n  Function: Dissolve a mixed contig spectrum for the set of sequences that\n            contain the specified header, i.e. determine the contribution of\n            these sequences to the mixed contig spectrum based on the assembly.\n            The mixed contig spectrum object must have been created based on one\n            (or several) assembly object(s). Additionally, min_overlap and\n            min_identity must have been set (either manually using min_overlap\n            or automatically by switching on the eff_asm_params option).\n  Returns : 1 for success, 0 for failure\n  Args    : Bio::Assembly::Tools::ContigSpectrum reference\n            sequence header string\n\n\n\nsub dissolve {\n  my ($self, $mixed_csp, $seq_header) = @_;\n  $self->_import_dissolved_csp($mixed_csp, $seq_header);\n  return 1;\n}\n\n\n=head2 cross\n\n  Title   : cross\n  Usage   : $cross_csp->cross($mixed_csp);\n  Function: Calculate a cross contig_spectrum based on a mixed contig_spectrum.\n  Returns : 1 for success, 0 for failure\n  Args    : Bio::Assembly::Tools::ContigSpectrum reference\n\n\nsub cross {\n  my ($self, $mixed_csp) = @_;\n  $self->_import_cross_csp($mixed_csp);\n  return 1;\n}\n\n=head2 to_string\n\n  Title   : to_string\n  Usage   : my $csp_string = $csp->to_string;\n  Function: Convert the contig spectrum into a string (easy to print!!).\n  Returns : string\n  Args    : element 
separator (integer) [optional]\n              1 -> space-separated\n              2 -> tab-separated\n              3 -> newline-separated\n\n\nsub to_string {\n  my ($self, $element_separator) = @_;\n  return 0 if $self->{'_max_size'} == 0;\n  $element_separator ||= 1;\n  if ($element_separator == 1) {\n    $element_separator = ' ';\n  } elsif ($element_separator == 2) {\n    $element_separator = \"\\t\";\n  } elsif ($element_separator == 3) {\n    $element_separator = \"\\n\";\n  } else {\n    $self->throw(\"Unknown separator type '$element_separator'\\n\");\n  }\n  my $str = '';\n  for (my $q = 1 ; $q <= $self->{'_max_size'} ; $q++) {\n    my $val = 0;\n    if (exists $self->{'_spectrum'}{$q}) {\n      $val = $self->{'_spectrum'}{$q};\n    }\n    $str .= $val.$element_separator;\n  }\n  $str =~ s/\\s$//;\n  return $str;\n}\n\n\n=head2 add\n\n  Title   : add\n  Usage   : $csp->add($additional_csp);\n  Function: Add a contig spectrum to an existing one: sums the spectra, update\n            the number of sequences, number of repetitions, ...\n  Returns : 1 for success, 0 for failure\n  Args    : Bio::Assembly::Tools::ContigSpectrum object\n\n\nsub add {\n  my ($self, $csp) = @_;\n  # Sanity check\n  if( !ref $csp || ! $csp->isa('Bio::Assembly::Tools::ContigSpectrum') ) {\n        $self->throw(\"Unable to process non Bio::Assembly::Tools::ContigSpectrum \".\n        \"object [\".ref($csp).\"]\");\n  }\n  # Update overlap statistics\n  if ( $self->{'_eff_asm_params'} > 0 ) {\n    # Warnings\n    if ( $csp->{'_eff_asm_params'} == 0 ) {\n      $self->warn(\"The parent contig spectrum needs effective assembly \".\n        \"parameters (eff_asm_params = \".$self->{'_eff_asm_params'}.\") but the \".\n        \"child contig spectrum doesn't have them (eff_asm_params = \".\n        $csp->{'_eff_asm_params'}.\"). 
Skipping them...\");\n    } elsif ( $csp->{'_eff_asm_params'} != $self->{'_eff_asm_params'} ) {\n      $self->warn(\"The parent contig spectrum needs a different method for \".\n        \"detecting the effective assembly parameters (eff_asm_params = \".\n        $self->{'_eff_asm_params'}.\") than the one specified for the child \".\n        \"contig spectrum (eff_asm_params = \".$csp->{'_eff_asm_params'}.\"). \".\n        \"Ignoring the differences...\");\n    }\n    # Update existing stats\n    my $tot_num_overlaps = $csp->{'_nof_overlaps'} + $self->{'_nof_overlaps'};\n    $self->{'_min_overlap'} = $csp->{'_min_overlap'} if\n      defined $csp->{'_min_overlap'} && ( ! defined $self->{'_min_overlap'} ||\n      $csp->{'_min_overlap'} < $self->{'_min_overlap'} );\n    $self->{'_min_identity'} = $csp->{'_min_identity'} if\n      defined $csp->{'_min_identity'} && ( ! defined $self->{'_min_identity'} ||\n      $csp->{'_min_identity'} < $self->{'_min_identity'} );\n    if ($tot_num_overlaps != 0) {\n      $self->{'_avg_overlap'} =\n        ($csp->{'_avg_overlap'} * $csp->{'_nof_overlaps'}\n        + $self->{'_avg_overlap'} * $self->{'_nof_overlaps'})\n        / $tot_num_overlaps;\n      $self->{'_avg_identity'} =\n        ($csp->{'_avg_identity'} * $csp->{'_nof_overlaps'}\n        + $self->{'_avg_identity'} * $self->{'_nof_overlaps'})\n        / $tot_num_overlaps;\n    }\n    $self->{'_nof_overlaps'} = $tot_num_overlaps;\n  }\n  # Update sequence statistics\n  my $tot_nof_seq = $csp->{'_nof_seq'} + $self->{'_nof_seq'};\n  if (not $tot_nof_seq == 0) {\n    $self->{'_avg_seq_len'} = ($csp->{'_avg_seq_len'} * $csp->{'_nof_seq'} +\n      $self->{'_avg_seq_len'} * $self->{'_nof_seq'}) / $tot_nof_seq;\n  }\n  # Update spectrum (and nof_seq, max_size, and increment nof_rep by 1)\n  $self->_import_spectrum($csp->{'_spectrum'});\n  # Update nof_rep\n  $self->{'_nof_rep'}--;\n  $self->{'_nof_rep'} += $csp->{'_nof_rep'};\n  # Update list of assembly objects used\n  push 
@{$self->{'_assembly'}}, @{$csp->{'_assembly'}}\n    if defined $csp->{'_assembly'};\n  return 1;\n}\n\n\n=head2 average\n\n  Title   : average\n  Usage   : my $avg_csp = $csp->average([$csp1, $csp2, $csp3]);\n  Function: Average one contig spectrum or the sum of several contig spectra by\n            the number of repetitions\n  Returns : Bio::Assembly::Tools::ContigSpectrum\n  Args    : Bio::Assembly::Tools::ContigSpectrum array reference\n            eff_asm_params\n\n\nsub average {\n  my ($self, $list) = @_;\n  # Sanity check\n  if ( ! ref $list || ! ref $list eq 'ARRAY') {\n    $self->throw(\"Average takes an array reference but got [\".ref($list).\"]\");\n  }\n  # New average contig spectrum object\n  my $avg = Bio::Assembly::Tools::ContigSpectrum->new;\n  $avg->{'_eff_asm_params'} = 1;\n  \n  # Cycle through contig spectra\n  my $tot_nof_rep = 0;\n  for my $csp (@$list) {\n    # Sanity check\n    if (not $csp->isa('Bio::Assembly::Tools::ContigSpectrum')) {\n      $csp->throw(\"Unable to process non Bio::Assembly::Tools::ContigSpectrum \".\n        \"object [\".ref($csp).\"]\");\n    }\n    # Import contig spectrum\n    $avg->add($csp);\n  }\n  \n  # Average sum of contig spectra by number of repetitions\n  for (my $q = 1 ; $q <= $avg->{'_max_size'} ; $q++) {\n    $avg->{'_spectrum'}{$q} /= $avg->{'_nof_rep'}\n      if (defined $avg->{'_spectrum'}{$q});\n  }\n  # Average number of sequences\n  $avg->{'_nof_seq'} /= $avg->{'_nof_rep'};\n  # Average number of overlaps\n  $avg->{'_nof_overlaps'} /= $avg->{'_nof_rep'};\n  \n  return $avg;\n}\n\n\n=head2 score\n\n  Title   : score\n  Usage   : my $score = $csp->score();\n  Function: Score a contig spectrum (or cross-contig spectrum) such that the\n             higher the number of contigs (or cross-contigs) and the larger their \n             size, the higher the score.\n             Let n   : total number of sequences\n                 c_q : number of contigs of size q\n                 q   : number of sequence 
in a contig\n             We define: score = n/(n-1) * (X - 1/n)\n                  where X = sum ( c_q * q^2 ) / n**2\n             The score ranges from 0 (singlets only) to 1 (a single large contig)\n             It is possible to specify a value for the number of sequences to\n              assume in the contig spectrum.\n  Returns : contig score\n  Args    : number of total sequences to assume [optional]\n\n\nsub score {\n  my ($self, $nof_seqs) = @_;\n  # Main\n  my $score = 0;\n  my $n = $self->nof_seq;\n  if ( $n > 0 ) {\n    # Contig spectrum info\n    my $q_max = $self->max_size;\n    my $spec  = $self->spectrum;\n    # Adjust number of 1-contigs\n    if ( $nof_seqs ) {\n      $spec->{'1'} += $nof_seqs - $n;\n      $n = $nof_seqs;\n    }\n    # Calculate X\n    for my $q ( 1 .. $q_max ) {\n      if ( $spec->{$q} ) {\n        my $c_q = $spec->{$q};\n        $score += $c_q * $q ** 2;\n      }\n    }\n    $score /= $n ** 2; \n  }\n  # Rescale X to obtain the score\n  $score = $n/($n-1) * ($score - 1/$n);\n  return $score;\n}\n\n\n=head2 _naive_assembler\n\n  Title   : _naive_assembler\n  Usage   : \n  Function: Determines the contig spectrum (hash representation) of a subset of\n            sequences from a mixed contig spectrum by \"reassembling\" the\n            specified sequences only based on their position in the contig. This\n            naive assembly only verifies that the minimum overlap length and\n            percentage identity are respected. There is no actual alignment done\n  Returns : contig spectrum hash reference\n  Args    : Bio::Assembly::Contig\n            sequence ID array reference\n            minimum overlap length (integer) [optional]\n            minimum percentage identity (integer) [optional]\n\n\nsub _naive_assembler {\n  my ($self, $contig, $seqlist, $min_overlap, $min_identity) = @_;\n  # Sanity checks\n  if ( ! ref $seqlist || ! ref($seqlist) eq 'ARRAY') {\n    $self->throw('Expecting an array reference. 
Got ['.ref($seqlist).\"] \\n\");\n  }\n  my $max = scalar @$seqlist;\n  $self->throw(\"Expecting at least 2 sequences as input for _naive_assembler\")\n    if ($max < 2);\n  # Assembly\n  my %spectrum = (1 => 0);\n  my %overlap_map;\n  my %has_overlap;\n  # Map what sequences overlap with what sequences\n  for (my $i = 0 ; $i < $max-1 ; $i++) {\n    # query sequence\n    my $qseqid = $$seqlist[$i];\n    my $qseq   = $contig->get_seq_by_name($qseqid);\n    my $is_singlet = 1;\n    for (my $j = $i+1 ; $j < $max ; $j++) {\n      # target sequence\n      my $tseqid = $$seqlist[$j];\n      my $tseq = $contig->get_seq_by_name($tseqid);\n      # try to align sequences\n      my ($aln, $overlap, $identity)\n        = $self->_overlap_alignment($contig, $qseq, $tseq, $min_overlap,\n        $min_identity);\n      # if there is no valid overlap, go to next sequence\n      next if ! defined $aln;\n      # the overlap is valid\n      $is_singlet = 0;\n      push @{$overlap_map{$qseqid}}, $tseqid;\n      $has_overlap{$tseqid} = 1;\n      $has_overlap{$qseqid} = 1;\n    }\n    # check if sequence is in previously seen overlap\n    if (exists $has_overlap{$qseqid}) {\n      $is_singlet = 0;\n    }\n    if ($is_singlet == 1) {\n      $spectrum{1}++;\n    }\n  }\n  # take care of last sequence\n  my $last_is_singlet = 1;\n  if (exists $has_overlap{$$seqlist[$max-1]}) {\n    $last_is_singlet = 0;\n  }\n  if ($last_is_singlet == 1) {\n    $spectrum{1}++;\n  }\n  # Parse overlap map\n  for my $seqid (@$seqlist) {\n    # list of sequences that should go in the contig\n    next if not exists $overlap_map{$seqid};\n    my @overlist = @{$overlap_map{$seqid}};\n    for (my $j = 0 ; $j < scalar(@overlist) ; $j++) {\n      my $otherseqid = $overlist[$j];\n      if (exists $overlap_map{$otherseqid}) {\n        push @overlist, @{$overlap_map{$otherseqid}};\n        delete $overlap_map{$otherseqid};\n      }\n    }\n    # remove duplicates from list\n    @overlist = sort @overlist;\n    for (my 
$j = 0 ; $j < scalar(@overlist)-1 ; $j++) {\n      if ( $overlist[$j] eq $overlist[$j+1] ) {\n        splice @overlist, $j, 1;\n        $j--;\n      }\n    }\n    # update spectrum with size of contig\n    my $qsize = scalar(@overlist) + 1;\n    if (defined $spectrum{$qsize}) {\n      $spectrum{$qsize}++;\n    } else {\n      $spectrum{$qsize} = 1;\n    }\n  }\n  return \\%spectrum;\n}\n\n\n=head2 _new_from_assembly\n\n  Title   : _new_from_assembly\n  Usage   : \n  Function: Creates a new contig spectrum object based solely on the result of \n            an assembly\n  Returns : Bio::Assembly::Tools::ContigSpectrum\n  Args    : Bio::Assembly::Scaffold\n\n\nsub _new_from_assembly {\n  # Create new contig spectrum object based purely on what we can get from the\n  # assembly object\n  my ($self, $assemblyobj) = @_;\n  my $csp = Bio::Assembly::Tools::ContigSpectrum->new();\n  # 1: Set id\n  $csp->{'_id'} = $assemblyobj->id;\n  # 2: Set overlap statistics: nof_overlaps, min_overlap, avg_overlap,\n  #  min_identity and avg_identity\n  $csp->{'_eff_asm_params'} = $self->{'_eff_asm_params'};\n  if ($csp->{'_eff_asm_params'} > 0) {\n     my ($nover, $minl, $avgl, $minid, $avgid)\n       = $csp->_get_overlap_stats($assemblyobj);\n     $csp->{'_min_overlap'}  = $minl;\n     $csp->{'_min_identity'} = $minid;\n     $csp->{'_avg_overlap'}  = $avgl;\n     $csp->{'_avg_identity'} = $avgid;\n     $csp->{'_nof_overlaps'} = $nover;\n  }\n  # 3: Set sequence statistics: nof_seq and avg_seq_len\n  my ($nseq, $avgseql) = $self->_get_seq_stats($assemblyobj);\n  $csp->{'_avg_seq_len'} = $avgseql;\n  $csp->{'_nof_seq'}     = $nseq;\n  # 4: Set the spectrum: spectrum and max_size\n  for my $contigobj ($assemblyobj->all_contigs) {\n    my $size = $contigobj->num_sequences;\n    if (defined $csp->{'_spectrum'}{$size}) {\n      $csp->{'_spectrum'}{$size}++;\n    } else {\n      $csp->{'_spectrum'}{$size} = 1;\n    }\n    $csp->{'_max_size'} = $size if $size > $csp->{'_max_size'};\n  }\n  my 
$nof_singlets = $assemblyobj->get_nof_singlets();\n  if (defined $nof_singlets) {\n    $csp->{'_spectrum'}{1} += $nof_singlets;\n    $csp->{'_max_size'} = 1 if $nof_singlets >= 1 && $csp->{'_max_size'} < 1;\n  }\n  # 5: Set list of assembly objects used\n  push @{$csp->{'_assembly'}}, $assemblyobj;\n  # 6: Set number of repetitions\n  $csp->{'_nof_rep'} = 1;\n  return $csp;\n}\n\n\n\n=head2 _new_dissolved_csp\n\n  Title   : \n  Usage   : create a dissolved contig spectrum object\n  Function: \n  Returns : \n  Args    : \n\n\n\nsub _new_dissolved_csp {\n  my ($self, $mixed_csp, $seq_header) = @_;\n  # Sanity checks on the mixed contig spectrum\n\n  # min_overlap and min_identity must be specified if there are some overlaps\n  # in the mixed contig\n  unless ($mixed_csp->{'_nof_overlaps'} == 0) {\n    unless ( defined $self->{'_min_overlap'} || \n      defined $mixed_csp->{'_min_overlap'} ) {\n      $self->throw(\"min_overlap must be defined in the dissolved contig spectrum\".\n        \" or mixed contig spectrum to dissolve a contig\");\n    }\n    unless ( defined $self->{'_min_identity'} ||\n      defined $mixed_csp->{'_min_identity'} ) {\n      $self->throw(\"min_identity must be defined in the dissolved contig spectrum\".\n        \" or mixed contig spectrum\");\n    }\n  }\n  \n  # there must be at least one assembly in mixed contig spectrum\n  if (!defined $mixed_csp->{'_assembly'} ||\n      scalar @{$mixed_csp->{'_assembly'}} < 1) {\n    $self->throw(\"The mixed contig spectrum must be based on at least one\n    assembly\");\n  }\n\n  # New dissolved contig spectrum object\n  my $dissolved = Bio::Assembly::Tools::ContigSpectrum->new();\n  \n  # take parent attributes if existent or mixed attributes otherwise\n  if ($self->{'_eff_asm_params'}) {\n    $dissolved->{'_eff_asm_params'} = $self->{'_eff_asm_params'};\n  } else {\n    $dissolved->{'_eff_asm_params'} = $mixed_csp->{'_eff_asm_params'};\n  }\n  if ($self->{'_min_overlap'} && $self->{'_min_identity'}) 
{\n    ( $dissolved->{'_min_overlap'}, $dissolved->{'_min_identity'} ) = \n      ( $self->{'_min_overlap'}, $self->{'_min_identity'} );\n  } else {\n    ( $dissolved->{'_min_overlap'}, $dissolved->{'_min_identity'} ) = \n      ( $mixed_csp->{'_min_overlap'}, $mixed_csp->{'_min_identity'} );\n  }\n  \n  # Dissolve each assembly\n  for my $assembly (@{$mixed_csp->{'_assembly'}}) {\n    # Dissolve this assembly for the given sequences\n    my %asm_spectrum = (1 => 0);\n    my %good_seqs;\n    # For each contig\n    for my $contig ($assembly->all_contigs) {\n      # Get good sequences\n      my @contig_seqs;\n      for my $seq ($contig->each_seq) {\n        my $seq_id = $seq->id;\n        # get sequence origin\n        next unless $seq_id =~ m/^$seq_header\\|/;\n        # add it to hash\n        push @contig_seqs, $seq_id;\n        $good_seqs{$seq_id} = 1;\n      }\n      # Update spectrum\n      my $size = scalar @contig_seqs;\n      if ($size == 0) {\n        next;\n      } elsif ($size == 1) {\n        $asm_spectrum{1}++;\n      } elsif ($size > 1) {\n        # Reassemble good sequences\n        my $contig_spectrum = $dissolved->_naive_assembler(\n          $contig, \\@contig_seqs, $dissolved->{'_min_overlap'},\n          $dissolved->{'_min_identity'});\n        # update spectrum\n        for my $qsize (keys %$contig_spectrum) {\n          $asm_spectrum{$qsize} += $$contig_spectrum{$qsize};\n        }\n      } else {\n        $self->throw(\"The size is not valid... 
how could that happen?\");\n      }\n    }\n    # For each singlet\n    for my $singlet ($assembly->all_singlets) {\n      my $seq_id = $singlet->seqref->id;\n      # get sequence origin\n      next unless $seq_id =~ m/^$seq_header\\|/;\n      # add it to hash\n      $good_seqs{$seq_id} = 1;\n      # update spectrum\n      $asm_spectrum{1}++;\n    }\n    # Update spectrum\n    $dissolved->_import_spectrum(\\%asm_spectrum);\n    # Update nof_rep\n    $dissolved->{'_nof_rep'}--;\n    $dissolved->{'_nof_rep'} += $mixed_csp->{'_nof_rep'};\n\n    # Get sequence stats\n    my ($nseq, $avgseql) = $dissolved->_get_seq_stats($assembly, \\%good_seqs);\n    $dissolved->{'_avg_seq_len'} = $avgseql;\n    $dissolved->{'_nof_seq'}     = $nseq;\n  \n    # Get eff_asm_param for these sequences\n    if ($dissolved->{'_eff_asm_params'} > 0) {\n      my ($nover, $minl, $avgl, $minid, $avgid)\n        = $dissolved->_get_overlap_stats($assembly, \\%good_seqs);\n      $dissolved->{'_min_overlap'}  = $minl;\n      $dissolved->{'_min_identity'} = $minid;\n      $dissolved->{'_avg_overlap'}  = $avgl;\n      $dissolved->{'_avg_identity'} = $avgid;\n      $dissolved->{'_nof_overlaps'} = $nover;\n    }\n\n  }\n  return $dissolved;\n}\n\n\n=head2 _new_cross_csp\n\n  Title   : \n  Usage   : \n  Function: create a cross contig spectrum object\n  Returns : \n  Args    : \n\n\n\nsub _new_cross_csp {\n  my ($self, $mixed_csp) = @_;\n  # Sanity check on the mixed contig spectrum\n  # There must be at least one assembly\n  if (!defined $mixed_csp->{'_assembly'} ||\n      scalar @{$mixed_csp->{'_assembly'}} < 1) {\n    $self->throw(\"The mixed contig spectrum must be based on at least one \".\n    \"assembly.\");\n  }\n  \n  # New dissolved contig spectrum object\n  my $cross = Bio::Assembly::Tools::ContigSpectrum->new();\n  my %spectrum = (1 => 0);\n  \n  # Take parent or mixed attributes\n  if ($self->{'_eff_asm_params'}) {\n    $cross->{'_eff_asm_params'} = $self->{'_eff_asm_params'};\n  } else {\n  
  $cross->{'_eff_asm_params'} = $mixed_csp->{'_eff_asm_params'};\n  }\n  if ($self->{'_min_overlap'} && $self->{'_min_identity'}) {\n    ( $cross->{'_min_overlap'}, $cross->{'_min_identity'} ) = \n      ( $self->{'_min_overlap'}, $self->{'_min_identity'} );\n  } else {\n    ( $cross->{'_min_overlap'}, $cross->{'_min_identity'} ) = \n      ( $mixed_csp->{'_min_overlap'}, $mixed_csp->{'_min_identity'} );\n  }\n  \n  # Get cross contig spectrum for each assembly\n  for my $assembly (@{$mixed_csp->{'_assembly'}}) {\n    # Go through contigs and skip the pure ones\n    my %good_seqs;\n    for my $contig ($assembly->all_contigs) {\n      # Get origins\n      my @seq_origins;\n      my @seq_ids;\n      for my $seq ($contig->each_seq) {\n        # current sequence origin\n        my $seq_id = $seq->id;\n        $seq_id =~ m/^(.+)\\|/;\n        my $seq_header = $1;\n        $self->warn(\"Sequence $seq_id does not seem to have a header. Skipping\".\n          \" it...\") if not defined $seq_header;\n        $seq_header ||= '';\n        push @seq_origins, $seq_header;\n        push @seq_ids, $seq_id;\n      }\n      my $qsize = scalar(@seq_ids);\n      my @origins = sort { $a cmp $b } @seq_origins;\n      my $size = scalar(@origins);\n      for (my $i = 1 ; $i < $size ; $i++) {\n        if ($origins[$i] eq $origins[$i-1]) {\n          splice @origins, $i, 1;\n          $i--;\n          $size--;\n        }\n      }\n      # Update cross-contig number in spectrum\n      if ($size > 1) { # cross-contig detected\n        # update good sequences\n        for my $seq_id (@seq_ids) {\n          $good_seqs{$seq_id} = 1;\n        }\n        # update number of cross q-contigs in spectrum\n        if (defined $spectrum{$qsize}) {\n          $spectrum{$qsize}++;\n        } else {\n          $spectrum{$qsize} = 1;\n        }\n      }\n      # Update number of cross 1-contigs\n      if ($size > 1) { # cross-contig detected\n        for my $origin (@origins) {\n          # sequences to 
use\n          my @ids;\n          for (my $i = 0 ; $i < $qsize ; $i++) {\n            my $seq_origin = $seq_origins[$i];\n            my $seq_id = $seq_ids[$i];\n            push @ids, $seq_id if $seq_origin eq $origin;\n          }\n          if (scalar @ids == 1) {\n            $spectrum{1}++;\n          } elsif (scalar @ids > 1) {\n            my $contig_spectrum = $cross->_naive_assembler(\n              $contig, \\@ids, $cross->{'_min_overlap'},\n              $cross->{'_min_identity'});\n            $spectrum{1} += $$contig_spectrum{1};\n          } else {\n            $self->throw(\"The size is <= 0. How could such a thing happen?\");\n          }\n        }\n      }\n    }\n    # Get sequence stats\n    my ($nseq, $avgseql) = $cross->_get_seq_stats($assembly, \\%good_seqs);\n    $cross->{'_avg_seq_len'} = $avgseql;\n    $cross->{'_nof_seq'}     = $nseq;\n    # Get eff_asm_param for these sequences\n    if ($cross->{'_eff_asm_params'} > 0) {\n      my ($nover, $minl, $avgl, $minid, $avgid)\n        = $cross->_get_overlap_stats($assembly, \\%good_seqs);\n      $cross->{'_min_overlap'}  = $minl;\n      $cross->{'_min_identity'} = $minid;\n      $cross->{'_avg_overlap'}  = $avgl;\n      $cross->{'_avg_identity'} = $avgid;\n      $cross->{'_nof_overlaps'} = $nover;\n    }\n  }\n  \n  $cross->_import_spectrum(\\%spectrum);\n  # Update nof_rep\n  $cross->{'_nof_rep'}--;\n  $cross->{'_nof_rep'} += $mixed_csp->{'_nof_rep'};\n  \n  return $cross;\n}\n\n=head2 _import_assembly\n\n  Title   : _import_assembly\n  Usage   : $csp->_import_assembly($assemblyobj);\n  Function: Update a contig spectrum object based on an assembly object\n  Returns : 1 for success, 0 for error\n  Args    : Bio::Assembly::Scaffold assembly object\n\n\nsub _import_assembly {\n  my ($self, $assemblyobj) = @_;\n  # Sanity check\n  if( !ref $assemblyobj || ! 
$assemblyobj->isa('Bio::Assembly::ScaffoldI') ) {\n        $self->throw(\"Unable to process non Bio::Assembly::ScaffoldI assembly \".\n        \"object [\".ref($assemblyobj).\"]\");\n  }\n  # Create new object from assembly\n  my $csp = $self->_new_from_assembly($assemblyobj);\n  # Update current contig spectrum object with new one\n  $self->add($csp);\n  return 1;\n}\n\n\n=head2 _import_spectrum\n\n  Title   : _import_spectrum\n  Usage   : $csp->_import_spectrum({ 1 => 90 , 2 => 3 , 4 => 1 })\n  Function: update a contig spectrum object based on a contig spectrum\n            represented as a hash (key: contig size, value: number of contigs of\n            this size)\n  Returns : 1 for success, 0 for error\n  Args    : contig spectrum as a hash reference\n\n\nsub _import_spectrum {\n  my ($self, $spectrum) = @_;\n  # Sanity check\n  if( ! ref $spectrum || ! ref $spectrum eq 'HASH') {\n    $self->throw(\"Spectrum should be a hash reference, but it is [\".\n      ref($spectrum).\"]\");\n  }\n  \n  # Update the spectrum (+ nof_rep, max_size and nof_seq)\n  for my $size (keys %$spectrum) {\n    # Get the number of contigs of different size\n    if (defined $self->{'_spectrum'}{$size}) {\n      $self->{'_spectrum'}{$size} += $$spectrum{$size};\n    } else {\n      $self->{'_spectrum'}{$size} = $$spectrum{$size};\n    }\n    # Update nof_seq\n    $self->{'_nof_seq'} += $size * $$spectrum{$size};\n    # Update max_size\n    $self->{'_max_size'} = $size if $size > $self->{'_max_size'};\n  }\n  \n  # If the contig spectrum has only zero 1-contigs, max_size is zero\n  $self->{'_max_size'} = 0 if scalar keys %{$self->{'_spectrum'}} == 1 &&\n    defined $self->{'_spectrum'}{'1'} && $self->{'_spectrum'}{'1'} == 0;\n  \n  # Update nof_rep\n  $self->{'_nof_rep'}++;\n  return 1;\n}\n\n=head2 _import_dissolved_csp\n\n  Title   : _import_dissolved_csp\n  Usage   : $csp->_import_dissolved_csp($mixed_csp, $seq_header);\n  Function: Update a contig spectrum object by dissolving a 
mixed contig\n            spectrum based on the header of the sequences\n  Returns : 1 for success, 0 for error\n  Args    : Bio::Assembly::Tools::ContigSpectrum\n            sequence header string\n\n\nsub _import_dissolved_csp {\n  my ($self, $mixed_csp, $seq_header) = @_;\n  # Sanity check\n  if (not defined $mixed_csp || not defined $seq_header) {\n    $self->throw(\"Expecting a contig spectrum reference and sequence header as\".\n    \" arguments\");\n  }\n  # Create new object from assembly\n  my $dissolved_csp = $self->_new_dissolved_csp($mixed_csp, $seq_header);\n  # Update current contig spectrum object with new one\n  $self->add($dissolved_csp);\n  return 1;\n}\n\n\n=head2 _import_cross_csp\n\n  Title   : _import_cross_csp\n  Usage   : $csp->_import_cross_csp($mixed_csp);\n  Function: Update a contig spectrum object by calculating the cross contig\n            spectrum based on a mixed contig spectrum\n  Returns : 1 for success, 0 for error\n  Args    : Bio::Assembly::Tools::ContigSpectrum","parameters":[{"label":"$self"},{"label":"$mixed_csp"}]},"range":{"end":{"line":1369,"character":9999},"start":{"line":1355,"character":0}},"kind":12,"line":1355,"detail":"($self,$mixed_csp)","definition":"sub","name":"_import_cross_csp","containerName":"main::","children":[{"definition":"my","line":1356,"localvar":"my","containerName":"_import_cross_csp","kind":13,"name":"$self"},{"name":"$mixed_csp","containerName":"_import_cross_csp","kind":13,"line":1356},{"line":1358,"name":"$mixed_csp","kind":13,"containerName":"_import_cross_csp"},{"name":"$self","containerName":"_import_cross_csp","kind":13,"line":1359},{"line":1359,"name":"throw","kind":12,"containerName":"_import_cross_csp"},{"containerName":"_import_cross_csp","localvar":"my","kind":13,"name":"$cross_csp","line":1363,"definition":"my"},{"line":1363,"name":"$self","containerName":"_import_cross_csp","kind":13},{"containerName":"_import_cross_csp","kind":12,"name":"_new_cross_csp","line":1363},{"line":1363,"nam
e":"$mixed_csp","kind":13,"containerName":"_import_cross_csp"},{"name":"$self","kind":13,"containerName":"_import_cross_csp","line":1366},{"line":1366,"name":"add","containerName":"_import_cross_csp","kind":12},{"name":"$cross_csp","containerName":"_import_cross_csp","kind":13,"line":1366}]},{"signature":{"parameters":[{"label":"$self"},{"label":"$assemblyobj"},{"label":"$seq_hash"}],"documentation":"__END__\n#\n# BioPerl module for Bio::Assembly::Tools::ContigSpectrum\n#\n# Copyright by Florent Angly\n#\n# You may distribute this module under the same terms as Perl itself\n#\n# POD documentation - main docs before the code\n\n=head1 NAME\n\nBio::Assembly::Tools::ContigSpectrum - create and manipulate contig spectra\n\n=head1 SYNOPSIS\n\n  # Simple contig spectrum creation\n  my $csp1 = Bio::Assembly::Tools::ContigSpectrum->new(\n    -id       => 'csp1',\n    -spectrum => { 1 => 10,\n                   2 => 2,\n                   3 => 1 } );\n\n  # ...or another way to create a simple contig spectrum\n  my $csp2 = Bio::Assembly::Tools::ContigSpectrum->new;\n  $csp2->id('csp2');\n  $csp2->spectrum({ 1 => 20, 2 => 1, 4 => 1 });\n\n  # Get some information\n  print \"This is contig spectrum \".$csp->id.\"\\n\";\n  print \"It contains \".$csp->nof_seq.\" sequences\\n\";\n  print \"The largest contig has \".$csp->max_size.\" sequences\\n\";\n  print \"The spectrum is: \".$csp->to_string($csp->spectrum).\"\\n\";\n\n  # Let's add the contig spectra\n  my $summed_csp = Bio::Assembly::Tools::ContigSpectrum->new;\n  $summed_csp->add($csp1);\n  $summed_csp->add($csp2);\n  print \"The summed contig spectrum is \".$summed_csp->to_string.\"\\n\";\n\n  # Make an average\n  my $avg_csp = Bio::Assembly::Tools::ContigSpectrum->new;\n  $avg_csp = $avg_csp->average([$csp1, $csp2]);\n  print \"The average contig spectrum is \".$avg_csp->to_string.\"\\n\";\n\n  # Get a contig spectrum from an assembly\n  my $from_assembly = Bio::Assembly::Tools::ContigSpectrum->new(\n    -assembly       
=> $assembly_object,\n    -eff_asm_params => 1);\n  print \"The contig spectrum from assembly is \".$from_assembly->to_string.\"\\n\";\n\n  # Report advanced information (possible because eff_asm_params = 1)\n  print \"Average sequence length: \".$from_assembly->avg_seq_length.\" bp\\n\";\n  print \"Minimum overlap length: \".$from_assembly->min_overlap.\" bp\\n\";\n  print \"Average overlap length: \".$from_assembly->avg_overlap.\" bp\\n\";\n  print \"Minimum overlap match: \".$from_assembly->min_identity.\" %\\n\";\n  print \"Average overlap match: \".$from_assembly->avg_identity.\" %\\n\";\n\n  # Assuming the assembly object contains sequences from several different\n  # metagenomes, we have a mixed contig spectrum from which a cross contig\n  # spectrum and dissolved contig spectra can be obtained\n  my $mixed_csp = $from_assembly;\n\n  # Calculate a dissolved contig spectrum\n  my $meta1_dissolved = Bio::Assembly::Tools::ContigSpectrum->new(\n    -dissolve => [$mixed_csp, 'metagenome1'] );\n  my $meta2_dissolved = Bio::Assembly::Tools::ContigSpectrum->new(\n    -dissolve => [$mixed_csp, 'metagenome2'] );\n  print \"The dissolved contig spectra are:\\n\".\n    $meta1_dissolved->to_string.\"\\n\".\n    $meta2_dissolved->to_string.\"\\n\";\n\n  # Determine a cross contig spectrum\n  my $cross_csp = Bio::Assembly::Tools::ContigSpectrum->new(\n    -cross => $mixed_csp );\n  print \"The cross contig spectrum is \".$cross_csp->to_string.\"\\n\";\n\n  # Score a contig spectrum (the more abundant the contigs and the larger their\n  # size, the larger the score)\n\n\n=head1 DESCRIPTION\n\nThe Bio::Assembly::Tools::ContigSpectrum Perl module enables to\nmanually create contig spectra, import them from assemblies,\nmanipulate them, transform between different types of contig spectra\nand output them.\n\nBio::Assembly::Tools::ContigSpectrum is a module to create, manipulate\nand output contig spectra, assembly-derived data used in metagenomics\n(community genomics) for 
diversity estimation.\n\n=head2 Background\n\nA contig spectrum is the count of the number of contigs of different\nsize in an assembly. For example, the contig spectrum [100 5 1 0 0\n...] means that there were 100 singlets (1-contigs), 5 contigs of 2\nsequences (2-contigs), 1 contig of 3 sequences (3-contig) and no\nlarger contigs.\n\nAn assembly can be produced from a mixture of sequences from different\nmetagenomes. The contig obtained from this assembly is a mixed contig\nspectrum. The contribution of each metagenome in this mixed contig\nspectrum can be obtained by determining a dissolved contig spectrum.\n\nFinally, based on a mixed contig spectrum, a cross contig spectrum can\nbe determined. In a cross contig spectrum, only contigs containing\nsequences from different metagenomes are kept; \"pure\" contigs are\nexcluded. Additionally, the total number of singletons (1-contigs)\nfrom each region that assembles with any fragments from other regions\nis the number of 1-contigs in the cross contig spectrum.\n\n=head2 Implementation\n\nThe simplest representation of a contig spectrum is as a hash\nrepresentation where the key is the contig size (number of sequences\nmaking up the contig) and the value the number of contigs of this\nsize.\n\nIn fact, it is useful to have more information associated with the\ncontig spectrum, hence the Bio::Assembly::Tools::ContigSpectrum module\nimplements an object containing a contig spectrum hash and additional\ninformation. 
The get/set methods to access them are:\n\n    id              contig spectrum ID\n    nof_seq         number of sequences\n    nof_rep         number of repetitions (assemblies) used\n    max_size        size of (number of sequences in) the largest contig\n    nof_overlaps    number of overlaps\n    min_overlap     minimum overlap length for building a contig\n    min_identity    minimum sequence identity over the overlap length\n    avg_overlap     average overlap length\n    avg_identity    average overlap identity\n    avg_seq_length  average sequence length\n    eff_asm_params  effective assembly parameters\n    spectrum        hash representation of a contig spectrum\n\n  Operations on the contig spectra:\n\n    to_string       create a string representation of the spectrum\n    spectrum        import a hash contig spectrum\n    assembly        determine a contig spectrum from an assembly\n    dissolve        calculate a dissolved contig spectrum (based on assembly)\n    cross           produce a cross contig spectrum (based on assembly)\n    add             add a contig spectrum to an existing one\n    average         make an average of several contig spectra\n\nWhen using operations that rely on knowing \"where\" (from what\nmetagenomes) a sequence came from (i.e. when creating a dissolved or\ncross contig spectrum), make sure that the sequences used for the\nassembly have a name header, e.g.  E<gt>metagenome1|seq1,\nE<gt>metagenome2|seq1, ...\n\n=head1 FEEDBACK\n\n=head2 Mailing Lists\n\nUser feedback is an integral part of the evolution of this and other\nBioperl modules. 
Send your comments and suggestions preferably to the\nBioperl mailing lists  Your participation is much appreciated.\n\n  bioperl-l@bioperl.org                  - General discussion\n  http://bioperl.org/wiki/Mailing_lists  - About the mailing lists\n\n\n=head2 Support \n\nPlease direct usage questions or support issues to the mailing list:\n\nI<bioperl-l@bioperl.org>\n\nrather than to the module maintainer directly. Many experienced and \nreponsive experts will be able look at the problem and quickly \naddress it. Please include a thorough description of the problem \nwith code and data examples if at all possible.\n\n=head2 Reporting Bugs\n\nReport bugs to the BioPerl bug tracking system to help us keep track\nthe bugs and their resolution. Bug reports can be submitted via email\nor the web:\n\n  bioperl-bugs@bio.perl.org\n  http://bugzilla.bioperl.org/\n\n=head1 AUTHOR - Florent E Angly\n\nEmail florent_dot_angly_at_gmail_dot_com\n\n=head1 APPENDIX\n\nThe rest of the documentation details each of the object\nmethods. 
Internal methods are usually preceded with a \"_\".\n\n\npackage Bio::Assembly::Tools::ContigSpectrum;\n\nuse strict;\n\nuse Bio::Root::Root;\nuse Bio::Assembly::Scaffold;\nuse Bio::SimpleAlign;\nuse Bio::LocatableSeq;\nuse Bio::Align::PairwiseStatistics;\n\nuse base 'Bio::Root::Root';\n\n\n=head2 new\n\n  Title   : new\n  Usage   : my $csp = Bio::Assembly::Tools::ContigSpectrum->new();\n              or\n            my $csp = Bio::Assembly::Tools::ContigSpectrum->new(\n              -id => 'some_name',\n              -spectrum =>  { 1 => 90 , 2 => 3 , 4 => 1 },\n            );\n              or\n            my $csp = Bio::Assembly::Tools::ContigSpectrum->new(\n              -assembly =>  $assembly_obj\n            );\n  Function: create a new contig spectrum object\n  Returns : reference to a contig spectrum object\n  Args    : none\n\n\nsub new {\n  my ($class, @args) = @_;\n  my $self = $class->SUPER::new(@args);\n  my ( $id, $nof_seq, $nof_rep, $max_size, $nof_overlaps, $min_overlap,\n    $min_identity, $avg_overlap, $avg_identity, $avg_seq_len, $spectrum,\n    $assembly, $eff_asm_params, $dissolve, $cross) = $self->_rearrange( [qw(ID\n    NOF_SEQ NOF_REP MAX_SIZE NOF_OVERLAPS MIN_OVERLAP MIN_IDENTITY AVG_OVERLAP\n    AVG_IDENTITY AVG_SEQ_LEN SPECTRUM ASSEMBLY EFF_ASM_PARAMS DISSOLVE CROSS)],\n    @args );\n\n  # First set up some defauts\n  $self->{'_id'}             = 'NoName';\n  $self->{'_nof_seq'}        = 0;\n  $self->{'_nof_rep'}        = 0;\n  $self->{'_max_size'}       = 0;\n  $self->{'_nof_overlaps'}   = 0;\n  $self->{'_min_overlap'}    = undef;\n  $self->{'_min_identity'}   = undef;\n  $self->{'_avg_overlap'}    = 0;\n  $self->{'_avg_identity'}   = 0;\n  $self->{'_avg_seq_len'}    = 0;\n  $self->{'_eff_asm_params'} = 0;\n  $self->{'_spectrum'}       = {1 => 0};  # contig spectrum hash representation\n  $self->{'_assembly'}       = []; # list of assembly objects used\n\n  # Then, according to user desires, override defaults\n  $self->{'_id'}           
  = $id             if (defined $id);\n  $self->{'_nof_seq'}        = $nof_seq        if (defined $nof_seq);\n  $self->{'_nof_rep'}        = $nof_rep        if (defined $nof_rep);\n  $self->{'_max_size'}       = $max_size       if (defined $max_size);\n  $self->{'_nof_overlaps'}   = $nof_overlaps   if (defined $nof_overlaps);\n  $self->{'_min_overlap'}    = $min_overlap    if (defined $min_overlap);\n  $self->{'_avg_overlap'}    = $avg_overlap    if (defined $avg_overlap);\n  $self->{'_min_identity'}   = $min_identity   if (defined $min_identity);\n  $self->{'_avg_identity'}   = $avg_identity   if (defined $avg_identity);\n  $self->{'_avg_seq_len'}    = $avg_seq_len    if (defined $avg_seq_len);\n  $self->{'_eff_asm_params'} = $eff_asm_params if (defined $eff_asm_params);\n\n  # Finally get stuff that can be gotten in an automated way\n  $self->_import_spectrum($spectrum) if defined($spectrum);\n  $self->_import_assembly($assembly) if defined($assembly);\n  if (defined($dissolve)) {\n    my ($mixed_csp, $header) = (@$dissolve[0], @$dissolve[1]);\n    $self->_import_dissolved_csp($mixed_csp, $header);\n  }\n  $self->_import_cross_csp($cross)   if defined($cross);\n\n  return $self;\n}\n\n\n=head2 id\n\n  Title   : id\n  Usage   : $csp->id\n  Function: get/set contig spectrum id\n  Returns : string\n  Args    : string [optional]\n\n\nsub id {\n  my ($self, $id) = @_;\n  if (defined $id) {\n    $self->{'_id'} = $id;\n  }\n  $id = $self->{'_id'};\n  return $id;\n}\n\n\n=head2 nof_seq\n\n  Title   : nof_seq\n  Usage   : $csp->nof_seq\n  Function: get/set the number of sequences making up the contig spectrum\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub nof_seq {\n  my ($self, $nof_seq) = @_;\n  if (defined $nof_seq) {\n    $self->throw(\"The number of sequences must be strictly positive. 
Got \".\n      \"'$nof_seq'\") if $nof_seq < 1;\n    $self->{'_nof_seq'} = $nof_seq;\n  }\n  $nof_seq = $self->{'_nof_seq'};\n  return $nof_seq;\n}\n\n\n=head2 nof_rep\n\n  Title   : nof_rep\n  Usage   : $csp->nof_rep\n  Function: Get/Set the number of repetitions (assemblies) used to create the \n            contig spectrum\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub nof_rep {\n  my ($self, $nof_rep) = @_;\n  if (defined $nof_rep) {\n    $self->throw(\"The number of repetitions must be strictly positive. Got \".\n      \"'$nof_rep'\") if $nof_rep < 1;\n    $self->{'_nof_rep'} = $nof_rep;\n  }\n  $nof_rep = $self->{'_nof_rep'};\n  return $nof_rep;\n}\n\n\n=head2 max_size\n\n  Title   : max_size\n  Usage   : $csp->max_size\n  Function: get/set the size of (number of sequences in) the largest contig\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub max_size {\n  my ($self, $max_size) = @_;\n  if (defined $max_size) {\n    $self->throw(\"The contig maximum size must be strictly positive. Got \".\n      \"'$max_size'\") if $max_size < 1;\n    $self->{'_max_size'} = $max_size;\n  }\n  $max_size = $self->{'_max_size'};\n  return $max_size;\n}\n\n\n=head2 nof_overlaps\n\n  Title   : nof_overlaps\n  Usage   : $csp->nof_overlaps\n  Function: Get/Set the number of overlaps in the assembly.\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub nof_overlaps {\n  my ($self, $nof_overlaps) = @_;\n  if (defined $nof_overlaps) {\n    $self->throw(\"The number of overlaps must be strictly positive. 
Got \".\n      \"'$nof_overlaps'\") if $nof_overlaps < 1;\n    $self->{'_nof_overlaps'} = $nof_overlaps;\n  }\n  $nof_overlaps = $self->{'_nof_overlaps'};\n  return $nof_overlaps;\n}\n\n\n=head2 min_overlap\n\n  Title   : min_overlap\n  Usage   : $csp->min_overlap\n  Function: get/set the assembly minimum overlap length\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub min_overlap {\n  my ($self, $min_overlap) = @_;\n  if (defined $min_overlap) {\n    $self->throw(\"The minimum of overlap length must be strictly positive. Got\".\n      \" '$min_overlap'\") if $min_overlap < 1;\n    $self->{'_min_overlap'} = $min_overlap;\n  }\n  $min_overlap = $self->{'_min_overlap'};\n  return $min_overlap;\n}\n\n\n=head2 avg_overlap\n\n  Title   : avg_overlap\n  Usage   : $csp->avg_overlap\n  Function: get/set the assembly average overlap length\n  Returns : decimal\n  Args    : decimal [optional]\n\n\nsub avg_overlap {\n  my ($self, $avg_overlap) = @_;\n  if (defined $avg_overlap) {\n    $self->throw(\"The average overlap length must be strictly positive. Got \".\n      \"'$avg_overlap'\") if $avg_overlap < 1;\n    $self->{'_avg_overlap'} = $avg_overlap;\n  }\n  $avg_overlap = $self->{'_avg_overlap'};\n  return $avg_overlap;\n}\n\n\n=head2 min_identity\n\n  Title   : min_identity\n  Usage   : $csp->min_identity\n  Function: get/set the assembly minimum overlap identity percent\n  Returns : 0 < decimal < 100\n  Args    : 0 < decimal < 100 [optional]\n\n\nsub min_identity {\n  my ($self, $min_identity) = @_;\n  if (defined $min_identity) {\n    $self->throw(\"The minimum overlap percent identity must be strictly \".\n      \"positive. 
Got '$min_identity'\") if $min_identity < 1;\n    $self->{'_min_identity'} = $min_identity;\n  }\n  $min_identity = $self->{'_min_identity'};\n  return $min_identity;\n}\n\n\n=head2 avg_identity\n\n  Title   : avg_identity\n  Usage   : $csp->avg_identity\n  Function: get/set the assembly average overlap identity percent\n  Returns : 0 < decimal < 100\n  Args    : 0 < decimal < 100 [optional]\n\n\nsub avg_identity {\n  my ($self, $avg_identity) = @_;\n  if (defined $avg_identity) {\n    $self->throw(\"The average overlap percent identity must be strictly \".\n      \"positive. Got '$avg_identity'\") if $avg_identity < 1;\n    $self->{'_avg_identity'} = $avg_identity;\n  }\n  $avg_identity = $self->{'_avg_identity'};\n  return $avg_identity;\n}\n\n\n=head2 avg_seq_len\n\n  Title   : avg_seq_len\n  Usage   : $csp->avg_seq_len\n  Function: get/set the assembly average sequence length\n  Returns : avg_seq_len\n  Args    : real [optional]\n\n\nsub avg_seq_len {\n  my ($self, $avg_seq_len) = @_;\n  if (defined $avg_seq_len) {\n    $self->throw(\"The average sequence length must be strictly positive. Got \".\n      \"'$avg_seq_len'\") if $avg_seq_len < 1;\n    $self->{'_avg_seq_len'} = $avg_seq_len;\n  }\n  $avg_seq_len = $self->{'_avg_seq_len'};\n  return $avg_seq_len;\n}\n\n\n=head2 eff_asm_params\n\n  Title   : eff_asm_params\n  Usage   : $csp->eff_asm_params(1)\n  Function: Get/set the effective assembly parameters option. It defines if the\n            effective assembly parameters should be determined when a contig\n            spectrum based or derived from an assembly is calulated. 
The\n            effective assembly parameters include avg_seq_length, nof_overlaps,\n            min_overlap, avg_overlap, min_identity and avg_identity.\n            1 = get them, 0 = don't.\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub eff_asm_params {\n  my ($self, $eff_asm_params) = @_;\n  if (defined $eff_asm_params) {\n    $self->throw(\"eff_asm_params can only take values 0 or 1. Input value was \".\n      \"'$eff_asm_params'\") unless $eff_asm_params == 0 || $eff_asm_params == 1;\n    $self->{'_eff_asm_params'} = $eff_asm_params;\n  }\n  $eff_asm_params = $self->{'_eff_asm_params'};\n  return $eff_asm_params;\n}\n\n\n=head2 spectrum\n\n  Title   : spectrum\n  Usage   : my $spectrum = $csp->spectrum({1=>10, 2=>2, 3=>1});\n  Function: Get the current contig spectrum represented as a hash / Update a\n            contig spectrum object based on a contig spectrum represented as a\n            hash\n            The hash representation of a contig spectrum is as following:\n              key   -> contig size (in number of sequences)\n              value -> number of contigs of this size\n  Returns : contig spectrum as a hash reference\n  Args    : contig spectrum as a hash reference [optional]\n\n\nsub spectrum {\n  my ($self, $spectrum) = @_;\n  if (defined $spectrum) {\n    $self->_import_spectrum($spectrum);\n  }\n  $spectrum = $self->{'_spectrum'};\n  return $spectrum;\n}\n\n\n=head2 assembly\n\n  Title   : assembly\n  Usage   : my @asm_list = $csp->assembly();\n  Function: Get a reference to the list of assembly object reference used to\n            make the contig spectrum object / Update the contig spectrum object\n            based on an assembly object.\n  Returns : array of Bio::Assembly::Scaffold\n  Args    : Bio::Assembly::Scaffold\n\n\nsub assembly {\n  my ($self, $assembly) = @_;\n  if (defined $assembly) {\n    $self->_import_assembly($assembly);\n  }\n  my @asm_list = @{$self->{'_assembly'}} if defined $self->{'_assembly'};\n  
return \\@asm_list;\n}\n\n=head2 drop_assembly\n\n  Title   : drop_assembly\n  Usage   : $csp->drop_assembly();\n  Function: Remove all assembly objects associated with a contig spectrum.\n            Assembly objects can be big. This method allows to free some memory\n            when assembly information is not needed anymore.\n  Returns : 1 for success, 0 for failure\n  Args    : none\n\n\nsub drop_assembly {\n  my ($self) = @_;\n  $self->{'_assembly'} = [];\n  return 1;\n}\n\n=head2 dissolve\n\n  Title   : dissolve\n  Usage   : $dissolved_csp->dissolve($mixed_csp, $seq_header);\n  Function: Dissolve a mixed contig spectrum for the set of sequences that\n            contain the specified header, i.e. determine the contribution of\n            these sequences to the mixed contig spectrum based on the assembly.\n            The mixed contig spectrum object must have been created based on one\n            (or several) assembly object(s). Additionally, min_overlap and\n            min_identity must have been set (either manually using min_overlap\n            or automatically by switching on the eff_asm_params option).\n  Returns : 1 for success, 0 for failure\n  Args    : Bio::Assembly::Tools::ContigSpectrum reference\n            sequence header string\n\n\n\nsub dissolve {\n  my ($self, $mixed_csp, $seq_header) = @_;\n  $self->_import_dissolved_csp($mixed_csp, $seq_header);\n  return 1;\n}\n\n\n=head2 cross\n\n  Title   : cross\n  Usage   : $cross_csp->cross($mixed_csp);\n  Function: Calculate a cross contig_spectrum based on a mixed contig_spectrum.\n  Returns : 1 for success, 0 for failure\n  Args    : Bio::Assembly::Tools::ContigSpectrum reference\n\n\nsub cross {\n  my ($self, $mixed_csp) = @_;\n  $self->_import_cross_csp($mixed_csp);\n  return 1;\n}\n\n=head2 to_string\n\n  Title   : to_string\n  Usage   : my $csp_string = $csp->to_string;\n  Function: Convert the contig spectrum into a string (easy to print!!).\n  Returns : string\n  Args    : element 
separator (integer) [optional]\n              1 -> space-separated\n              2 -> tab-separated\n              3 -> newline-separated\n\n\nsub to_string {\n  my ($self, $element_separator) = @_;\n  return 0 if $self->{'_max_size'} == 0;\n  $element_separator ||= 1;\n  if ($element_separator == 1) {\n    $element_separator = ' ';\n  } elsif ($element_separator == 2) {\n    $element_separator = \"\\t\";\n  } elsif ($element_separator == 3) {\n    $element_separator = \"\\n\";\n  } else {\n    $self->throw(\"Unknown separator type '$element_separator'\\n\");\n  }\n  my $str = '';\n  for (my $q = 1 ; $q <= $self->{'_max_size'} ; $q++) {\n    my $val = 0;\n    if (exists $self->{'_spectrum'}{$q}) {\n      $val = $self->{'_spectrum'}{$q};\n    }\n    $str .= $val.$element_separator;\n  }\n  $str =~ s/\\s$//;\n  return $str;\n}\n\n\n=head2 add\n\n  Title   : add\n  Usage   : $csp->add($additional_csp);\n  Function: Add a contig spectrum to an existing one: sums the spectra, update\n            the number of sequences, number of repetitions, ...\n  Returns : 1 for success, 0 for failure\n  Args    : Bio::Assembly::Tools::ContigSpectrum object\n\n\nsub add {\n  my ($self, $csp) = @_;\n  # Sanity check\n  if( !ref $csp || ! $csp->isa('Bio::Assembly::Tools::ContigSpectrum') ) {\n        $self->throw(\"Unable to process non Bio::Assembly::Tools::ContigSpectrum \".\n        \"object [\".ref($csp).\"]\");\n  }\n  # Update overlap statistics\n  if ( $self->{'_eff_asm_params'} > 0 ) {\n    # Warnings\n    if ( $csp->{'_eff_asm_params'} == 0 ) {\n      $self->warn(\"The parent contig spectrum needs effective assembly \".\n        \"parameters (eff_asm_params = \".$self->{'_eff_asm_params'}.\") but the \".\n        \"child contig spectrum doesn't have them (eff_asm_params = \".\n        $csp->{'_eff_asm_params'}.\"). 
Skipping them...\");\n    } elsif ( $csp->{'_eff_asm_params'} != $self->{'_eff_asm_params'} ) {\n      $self->warn(\"The parent contig spectrum needs a different method for \".\n        \"detecting the effective assembly parameters (eff_asm_params = \".\n        $self->{'_eff_asm_params'}.\") than the one specified for the child \".\n        \"contig spectrum (eff_asm_params = \".$csp->{'_eff_asm_params'}.\"). \".\n        \"Ignoring the differences...\");\n    }\n    # Update existing stats\n    my $tot_num_overlaps = $csp->{'_nof_overlaps'} + $self->{'_nof_overlaps'};\n    $self->{'_min_overlap'} = $csp->{'_min_overlap'} if\n      defined $csp->{'_min_overlap'} && ( ! defined $self->{'_min_overlap'} ||\n      $csp->{'_min_overlap'} < $self->{'_min_overlap'} );\n    $self->{'_min_identity'} = $csp->{'_min_identity'} if\n      defined $csp->{'_min_identity'} && ( ! defined $self->{'_min_identity'} ||\n      $csp->{'_min_identity'} < $self->{'_min_identity'} );\n    if ($tot_num_overlaps != 0) {\n      $self->{'_avg_overlap'} =\n        ($csp->{'_avg_overlap'} * $csp->{'_nof_overlaps'}\n        + $self->{'_avg_overlap'} * $self->{'_nof_overlaps'})\n        / $tot_num_overlaps;\n      $self->{'_avg_identity'} =\n        ($csp->{'_avg_identity'} * $csp->{'_nof_overlaps'}\n        + $self->{'_avg_identity'} * $self->{'_nof_overlaps'})\n        / $tot_num_overlaps;\n    }\n    $self->{'_nof_overlaps'} = $tot_num_overlaps;\n  }\n  # Update sequence statistics\n  my $tot_nof_seq = $csp->{'_nof_seq'} + $self->{'_nof_seq'};\n  if (not $tot_nof_seq == 0) {\n    $self->{'_avg_seq_len'} = ($csp->{'_avg_seq_len'} * $csp->{'_nof_seq'} +\n      $self->{'_avg_seq_len'} * $self->{'_nof_seq'}) / $tot_nof_seq;\n  }\n  # Update spectrum (and nof_seq, max_size, and increment nof_rep by 1)\n  $self->_import_spectrum($csp->{'_spectrum'});\n  # Update nof_rep\n  $self->{'_nof_rep'}--;\n  $self->{'_nof_rep'} += $csp->{'_nof_rep'};\n  # Update list of assembly objects used\n  push 
@{$self->{'_assembly'}}, @{$csp->{'_assembly'}}\n    if defined $csp->{'_assembly'};\n  return 1;\n}\n\n\n=head2 average\n\n  Title   : average\n  Usage   : my $avg_csp = $csp->average([$csp1, $csp2, $csp3]);\n  Function: Average one contig spectrum or the sum of several contig spectra by\n            the number of repetitions\n  Returns : Bio::Assembly::Tools::ContigSpectrum\n  Args    : Bio::Assembly::Tools::ContigSpectrum array reference\n            eff_asm_params\n\n\nsub average {\n  my ($self, $list) = @_;\n  # Sanity check\n  if ( ! ref $list || ! ref $list eq 'ARRAY') {\n    $self->throw(\"Average takes an array reference but got [\".ref($list).\"]\");\n  }\n  # New average contig spectrum object\n  my $avg = Bio::Assembly::Tools::ContigSpectrum->new;\n  $avg->{'_eff_asm_params'} = 1;\n  \n  # Cycle through contig spectra\n  my $tot_nof_rep = 0;\n  for my $csp (@$list) {\n    # Sanity check\n    if (not $csp->isa('Bio::Assembly::Tools::ContigSpectrum')) {\n      $csp->throw(\"Unable to process non Bio::Assembly::Tools::ContigSpectrum \".\n        \"object [\".ref($csp).\"]\");\n    }\n    # Import contig spectrum\n    $avg->add($csp);\n  }\n  \n  # Average sum of contig spectra by number of repetitions\n  for (my $q = 1 ; $q <= $avg->{'_max_size'} ; $q++) {\n    $avg->{'_spectrum'}{$q} /= $avg->{'_nof_rep'}\n      if (defined $avg->{'_spectrum'}{$q});\n  }\n  # Average number of sequences\n  $avg->{'_nof_seq'} /= $avg->{'_nof_rep'};\n  # Average number of overlaps\n  $avg->{'_nof_overlaps'} /= $avg->{'_nof_rep'};\n  \n  return $avg;\n}\n\n\n=head2 score\n\n  Title   : score\n  Usage   : my $score = $csp->score();\n  Function: Score a contig spectrum (or cross-contig spectrum) such that the\n             higher the number of contigs (or cross-contigs) and the larger their \n             size, the higher the score.\n             Let n   : total number of sequences\n                 c_q : number of contigs of size q\n                 q   : number of sequence 
in a contig\n             We define: score = n/(n-1) * (X - 1/n)\n                  where X = sum ( c_q * q^2 ) / n**2\n             The score ranges from 0 (singlets only) to 1 (a single large contig)\n             It is possible to specify a value for the number of sequences to\n              assume in the contig spectrum.\n  Returns : contig score\n  Args    : number of total sequences to assume [optional]\n\n\nsub score {\n  my ($self, $nof_seqs) = @_;\n  # Main\n  my $score = 0;\n  my $n = $self->nof_seq;\n  if ( $n > 0 ) {\n    # Contig spectrum info\n    my $q_max = $self->max_size;\n    my $spec  = $self->spectrum;\n    # Adjust number of 1-contigs\n    if ( $nof_seqs ) {\n      $spec->{'1'} += $nof_seqs - $n;\n      $n = $nof_seqs;\n    }\n    # Calculate X\n    for my $q ( 1 .. $q_max ) {\n      if ( $spec->{$q} ) {\n        my $c_q = $spec->{$q};\n        $score += $c_q * $q ** 2;\n      }\n    }\n    $score /= $n ** 2; \n  }\n  # Rescale X to obtain the score\n  $score = $n/($n-1) * ($score - 1/$n);\n  return $score;\n}\n\n\n=head2 _naive_assembler\n\n  Title   : _naive_assembler\n  Usage   : \n  Function: Determines the contig spectrum (hash representation) of a subset of\n            sequences from a mixed contig spectrum by \"reassembling\" the\n            specified sequences only based on their position in the contig. This\n            naive assembly only verifies that the minimum overlap length and\n            percentage identity are respected. There is no actual alignment done\n  Returns : contig spectrum hash reference\n  Args    : Bio::Assembly::Contig\n            sequence ID array reference\n            minimum overlap length (integer) [optional]\n            minimum percentage identity (integer) [optional]\n\n\nsub _naive_assembler {\n  my ($self, $contig, $seqlist, $min_overlap, $min_identity) = @_;\n  # Sanity checks\n  if ( ! ref $seqlist || ! ref($seqlist) eq 'ARRAY') {\n    $self->throw('Expecting an array reference. 
Got ['.ref($seqlist).\"] \\n\");\n  }\n  my $max = scalar @$seqlist;\n  $self->throw(\"Expecting at least 2 sequences as input for _naive_assembler\")\n    if ($max < 2);\n  # Assembly\n  my %spectrum = (1 => 0);\n  my %overlap_map;\n  my %has_overlap;\n  # Map what sequences overlap with what sequences\n  for (my $i = 0 ; $i < $max-1 ; $i++) {\n    # query sequence\n    my $qseqid = $$seqlist[$i];\n    my $qseq   = $contig->get_seq_by_name($qseqid);\n    my $is_singlet = 1;\n    for (my $j = $i+1 ; $j < $max ; $j++) {\n      # target sequence\n      my $tseqid = $$seqlist[$j];\n      my $tseq = $contig->get_seq_by_name($tseqid);\n      # try to align sequences\n      my ($aln, $overlap, $identity)\n        = $self->_overlap_alignment($contig, $qseq, $tseq, $min_overlap,\n        $min_identity);\n      # if there is no valid overlap, go to next sequence\n      next if ! defined $aln;\n      # the overlap is valid\n      $is_singlet = 0;\n      push @{$overlap_map{$qseqid}}, $tseqid;\n      $has_overlap{$tseqid} = 1;\n      $has_overlap{$qseqid} = 1;\n    }\n    # check if sequence is in previously seen overlap\n    if (exists $has_overlap{$qseqid}) {\n      $is_singlet = 0;\n    }\n    if ($is_singlet == 1) {\n      $spectrum{1}++;\n    }\n  }\n  # take care of last sequence\n  my $last_is_singlet = 1;\n  if (exists $has_overlap{$$seqlist[$max-1]}) {\n    $last_is_singlet = 0;\n  }\n  if ($last_is_singlet == 1) {\n    $spectrum{1}++;\n  }\n  # Parse overlap map\n  for my $seqid (@$seqlist) {\n    # list of sequences that should go in the contig\n    next if not exists $overlap_map{$seqid};\n    my @overlist = @{$overlap_map{$seqid}};\n    for (my $j = 0 ; $j < scalar(@overlist) ; $j++) {\n      my $otherseqid = $overlist[$j];\n      if (exists $overlap_map{$otherseqid}) {\n        push @overlist, @{$overlap_map{$otherseqid}};\n        delete $overlap_map{$otherseqid};\n      }\n    }\n    # remove duplicates from list\n    @overlist = sort @overlist;\n    for (my 
$j = 0 ; $j < scalar(@overlist)-1 ; $j++) {\n      if ( $overlist[$j] eq $overlist[$j+1] ) {\n        splice @overlist, $j, 1;\n        $j--;\n      }\n    }\n    # update spectrum with size of contig\n    my $qsize = scalar(@overlist) + 1;\n    if (defined $spectrum{$qsize}) {\n      $spectrum{$qsize}++;\n    } else {\n      $spectrum{$qsize} = 1;\n    }\n  }\n  return \\%spectrum;\n}\n\n\n=head2 _new_from_assembly\n\n  Title   : _new_from_assembly\n  Usage   : \n  Function: Creates a new contig spectrum object based solely on the result of \n            an assembly\n  Returns : Bio::Assembly::Tools::ContigSpectrum\n  Args    : Bio::Assembly::Scaffold\n\n\nsub _new_from_assembly {\n  # Create new contig spectrum object based purely on what we can get from the\n  # assembly object\n  my ($self, $assemblyobj) = @_;\n  my $csp = Bio::Assembly::Tools::ContigSpectrum->new();\n  # 1: Set id\n  $csp->{'_id'} = $assemblyobj->id;\n  # 2: Set overlap statistics: nof_overlaps, min_overlap, avg_overlap,\n  #  min_identity and avg_identity\n  $csp->{'_eff_asm_params'} = $self->{'_eff_asm_params'};\n  if ($csp->{'_eff_asm_params'} > 0) {\n     my ($nover, $minl, $avgl, $minid, $avgid)\n       = $csp->_get_overlap_stats($assemblyobj);\n     $csp->{'_min_overlap'}  = $minl;\n     $csp->{'_min_identity'} = $minid;\n     $csp->{'_avg_overlap'}  = $avgl;\n     $csp->{'_avg_identity'} = $avgid;\n     $csp->{'_nof_overlaps'} = $nover;\n  }\n  # 3: Set sequence statistics: nof_seq and avg_seq_len\n  my ($nseq, $avgseql) = $self->_get_seq_stats($assemblyobj);\n  $csp->{'_avg_seq_len'} = $avgseql;\n  $csp->{'_nof_seq'}     = $nseq;\n  # 4: Set the spectrum: spectrum and max_size\n  for my $contigobj ($assemblyobj->all_contigs) {\n    my $size = $contigobj->num_sequences;\n    if (defined $csp->{'_spectrum'}{$size}) {\n      $csp->{'_spectrum'}{$size}++;\n    } else {\n      $csp->{'_spectrum'}{$size} = 1;\n    }\n    $csp->{'_max_size'} = $size if $size > $csp->{'_max_size'};\n  }\n  my 
$nof_singlets = $assemblyobj->get_nof_singlets();\n  if (defined $nof_singlets) {\n    $csp->{'_spectrum'}{1} += $nof_singlets;\n    $csp->{'_max_size'} = 1 if $nof_singlets >= 1 && $csp->{'_max_size'} < 1;\n  }\n  # 5: Set list of assembly objects used\n  push @{$csp->{'_assembly'}}, $assemblyobj;\n  # 6: Set number of repetitions\n  $csp->{'_nof_rep'} = 1;\n  return $csp;\n}\n\n\n\n=head2 _new_dissolved_csp\n\n  Title   : \n  Usage   : create a dissolved contig spectrum object\n  Function: \n  Returns : \n  Args    : \n\n\n\nsub _new_dissolved_csp {\n  my ($self, $mixed_csp, $seq_header) = @_;\n  # Sanity checks on the mixed contig spectrum\n\n  # min_overlap and min_identity must be specified if there are some overlaps\n  # in the mixed contig\n  unless ($mixed_csp->{'_nof_overlaps'} == 0) {\n    unless ( defined $self->{'_min_overlap'} || \n      defined $mixed_csp->{'_min_overlap'} ) {\n      $self->throw(\"min_overlap must be defined in the dissolved contig spectrum\".\n        \" or mixed contig spectrum to dissolve a contig\");\n    }\n    unless ( defined $self->{'_min_identity'} ||\n      defined $mixed_csp->{'_min_identity'} ) {\n      $self->throw(\"min_identity must be defined in the dissolved contig spectrum\".\n        \" or mixed contig spectrum\");\n    }\n  }\n  \n  # there must be at least one assembly in mixed contig spectrum\n  if (!defined $mixed_csp->{'_assembly'} ||\n      scalar @{$mixed_csp->{'_assembly'}} < 1) {\n    $self->throw(\"The mixed contig spectrum must be based on at least one\n    assembly\");\n  }\n\n  # New dissolved contig spectrum object\n  my $dissolved = Bio::Assembly::Tools::ContigSpectrum->new();\n  \n  # take parent attributes if existent or mixed attributes otherwise\n  if ($self->{'_eff_asm_params'}) {\n    $dissolved->{'_eff_asm_params'} = $self->{'_eff_asm_params'};\n  } else {\n    $dissolved->{'_eff_asm_params'} = $mixed_csp->{'_eff_asm_params'};\n  }\n  if ($self->{'_min_overlap'} && $self->{'_min_identity'}) 
{\n    ( $dissolved->{'_min_overlap'}, $dissolved->{'_min_identity'} ) = \n      ( $self->{'_min_overlap'}, $self->{'_min_identity'} );\n  } else {\n    ( $dissolved->{'_min_overlap'}, $dissolved->{'_min_identity'} ) = \n      ( $mixed_csp->{'_min_overlap'}, $mixed_csp->{'_min_identity'} );\n  }\n  \n  # Dissolve each assembly\n  for my $assembly (@{$mixed_csp->{'_assembly'}}) {\n    # Dissolve this assembly for the given sequences\n    my %asm_spectrum = (1 => 0);\n    my %good_seqs;\n    # For each contig\n    for my $contig ($assembly->all_contigs) {\n      # Get good sequences\n      my @contig_seqs;\n      for my $seq ($contig->each_seq) {\n        my $seq_id = $seq->id;\n        # get sequence origin\n        next unless $seq_id =~ m/^$seq_header\\|/;\n        # add it to hash\n        push @contig_seqs, $seq_id;\n        $good_seqs{$seq_id} = 1;\n      }\n      # Update spectrum\n      my $size = scalar @contig_seqs;\n      if ($size == 0) {\n        next;\n      } elsif ($size == 1) {\n        $asm_spectrum{1}++;\n      } elsif ($size > 1) {\n        # Reassemble good sequences\n        my $contig_spectrum = $dissolved->_naive_assembler(\n          $contig, \\@contig_seqs, $dissolved->{'_min_overlap'},\n          $dissolved->{'_min_identity'});\n        # update spectrum\n        for my $qsize (keys %$contig_spectrum) {\n          $asm_spectrum{$qsize} += $$contig_spectrum{$qsize};\n        }\n      } else {\n        $self->throw(\"The size is not valid... 
how could that happen?\");\n      }\n    }\n    # For each singlet\n    for my $singlet ($assembly->all_singlets) {\n      my $seq_id = $singlet->seqref->id;\n      # get sequence origin\n      next unless $seq_id =~ m/^$seq_header\\|/;\n      # add it to hash\n      $good_seqs{$seq_id} = 1;\n      # update spectrum\n      $asm_spectrum{1}++;\n    }\n    # Update spectrum\n    $dissolved->_import_spectrum(\\%asm_spectrum);\n    # Update nof_rep\n    $dissolved->{'_nof_rep'}--;\n    $dissolved->{'_nof_rep'} += $mixed_csp->{'_nof_rep'};\n\n    # Get sequence stats\n    my ($nseq, $avgseql) = $dissolved->_get_seq_stats($assembly, \\%good_seqs);\n    $dissolved->{'_avg_seq_len'} = $avgseql;\n    $dissolved->{'_nof_seq'}     = $nseq;\n  \n    # Get eff_asm_param for these sequences\n    if ($dissolved->{'_eff_asm_params'} > 0) {\n      my ($nover, $minl, $avgl, $minid, $avgid)\n        = $dissolved->_get_overlap_stats($assembly, \\%good_seqs);\n      $dissolved->{'_min_overlap'}  = $minl;\n      $dissolved->{'_min_identity'} = $minid;\n      $dissolved->{'_avg_overlap'}  = $avgl;\n      $dissolved->{'_avg_identity'} = $avgid;\n      $dissolved->{'_nof_overlaps'} = $nover;\n    }\n\n  }\n  return $dissolved;\n}\n\n\n=head2 _new_cross_csp\n\n  Title   : \n  Usage   : \n  Function: create a cross contig spectrum object\n  Returns : \n  Args    : \n\n\n\nsub _new_cross_csp {\n  my ($self, $mixed_csp) = @_;\n  # Sanity check on the mixed contig spectrum\n  # There must be at least one assembly\n  if (!defined $mixed_csp->{'_assembly'} ||\n      scalar @{$mixed_csp->{'_assembly'}} < 1) {\n    $self->throw(\"The mixed contig spectrum must be based on at least one \".\n    \"assembly.\");\n  }\n  \n  # New dissolved contig spectrum object\n  my $cross = Bio::Assembly::Tools::ContigSpectrum->new();\n  my %spectrum = (1 => 0);\n  \n  # Take parent or mixed attributes\n  if ($self->{'_eff_asm_params'}) {\n    $cross->{'_eff_asm_params'} = $self->{'_eff_asm_params'};\n  } else {\n  
  $cross->{'_eff_asm_params'} = $mixed_csp->{'_eff_asm_params'};\n  }\n  if ($self->{'_min_overlap'} && $self->{'_min_identity'}) {\n    ( $cross->{'_min_overlap'}, $cross->{'_min_identity'} ) = \n      ( $self->{'_min_overlap'}, $self->{'_min_identity'} );\n  } else {\n    ( $cross->{'_min_overlap'}, $cross->{'_min_identity'} ) = \n      ( $mixed_csp->{'_min_overlap'}, $mixed_csp->{'_min_identity'} );\n  }\n  \n  # Get cross contig spectrum for each assembly\n  for my $assembly (@{$mixed_csp->{'_assembly'}}) {\n    # Go through contigs and skip the pure ones\n    my %good_seqs;\n    for my $contig ($assembly->all_contigs) {\n      # Get origins\n      my @seq_origins;\n      my @seq_ids;\n      for my $seq ($contig->each_seq) {\n        # current sequence origin\n        my $seq_id = $seq->id;\n        $seq_id =~ m/^(.+)\\|/;\n        my $seq_header = $1;\n        $self->warn(\"Sequence $seq_id does not seem to have a header. Skipping\".\n          \" it...\") if not defined $seq_header;\n        $seq_header ||= '';\n        push @seq_origins, $seq_header;\n        push @seq_ids, $seq_id;\n      }\n      my $qsize = scalar(@seq_ids);\n      my @origins = sort { $a cmp $b } @seq_origins;\n      my $size = scalar(@origins);\n      for (my $i = 1 ; $i < $size ; $i++) {\n        if ($origins[$i] eq $origins[$i-1]) {\n          splice @origins, $i, 1;\n          $i--;\n          $size--;\n        }\n      }\n      # Update cross-contig number in spectrum\n      if ($size > 1) { # cross-contig detected\n        # update good sequences\n        for my $seq_id (@seq_ids) {\n          $good_seqs{$seq_id} = 1;\n        }\n        # update number of cross q-contigs in spectrum\n        if (defined $spectrum{$qsize}) {\n          $spectrum{$qsize}++;\n        } else {\n          $spectrum{$qsize} = 1;\n        }\n      }\n      # Update number of cross 1-contigs\n      if ($size > 1) { # cross-contig detected\n        for my $origin (@origins) {\n          # sequences to 
use\n          my @ids;\n          for (my $i = 0 ; $i < $qsize ; $i++) {\n            my $seq_origin = $seq_origins[$i];\n            my $seq_id = $seq_ids[$i];\n            push @ids, $seq_id if $seq_origin eq $origin;\n          }\n          if (scalar @ids == 1) {\n            $spectrum{1}++;\n          } elsif (scalar @ids > 1) {\n            my $contig_spectrum = $cross->_naive_assembler(\n              $contig, \\@ids, $cross->{'_min_overlap'},\n              $cross->{'_min_identity'});\n            $spectrum{1} += $$contig_spectrum{1};\n          } else {\n            $self->throw(\"The size is <= 0. How could such a thing happen?\");\n          }\n        }\n      }\n    }\n    # Get sequence stats\n    my ($nseq, $avgseql) = $cross->_get_seq_stats($assembly, \\%good_seqs);\n    $cross->{'_avg_seq_len'} = $avgseql;\n    $cross->{'_nof_seq'}     = $nseq;\n    # Get eff_asm_param for these sequences\n    if ($cross->{'_eff_asm_params'} > 0) {\n      my ($nover, $minl, $avgl, $minid, $avgid)\n        = $cross->_get_overlap_stats($assembly, \\%good_seqs);\n      $cross->{'_min_overlap'}  = $minl;\n      $cross->{'_min_identity'} = $minid;\n      $cross->{'_avg_overlap'}  = $avgl;\n      $cross->{'_avg_identity'} = $avgid;\n      $cross->{'_nof_overlaps'} = $nover;\n    }\n  }\n  \n  $cross->_import_spectrum(\\%spectrum);\n  # Update nof_rep\n  $cross->{'_nof_rep'}--;\n  $cross->{'_nof_rep'} += $mixed_csp->{'_nof_rep'};\n  \n  return $cross;\n}\n\n=head2 _import_assembly\n\n  Title   : _import_assembly\n  Usage   : $csp->_import_assembly($assemblyobj);\n  Function: Update a contig spectrum object based on an assembly object\n  Returns : 1 for success, 0 for error\n  Args    : Bio::Assembly::Scaffold assembly object\n\n\nsub _import_assembly {\n  my ($self, $assemblyobj) = @_;\n  # Sanity check\n  if( !ref $assemblyobj || ! 
$assemblyobj->isa('Bio::Assembly::ScaffoldI') ) {\n        $self->throw(\"Unable to process non Bio::Assembly::ScaffoldI assembly \".\n        \"object [\".ref($assemblyobj).\"]\");\n  }\n  # Create new object from assembly\n  my $csp = $self->_new_from_assembly($assemblyobj);\n  # Update current contig spectrum object with new one\n  $self->add($csp);\n  return 1;\n}\n\n\n=head2 _import_spectrum\n\n  Title   : _import_spectrum\n  Usage   : $csp->_import_spectrum({ 1 => 90 , 2 => 3 , 4 => 1 })\n  Function: update a contig spectrum object based on a contig spectrum\n            represented as a hash (key: contig size, value: number of contigs of\n            this size)\n  Returns : 1 for success, 0 for error\n  Args    : contig spectrum as a hash reference\n\n\nsub _import_spectrum {\n  my ($self, $spectrum) = @_;\n  # Sanity check\n  if( ! ref $spectrum || ! ref $spectrum eq 'HASH') {\n    $self->throw(\"Spectrum should be a hash reference, but it is [\".\n      ref($spectrum).\"]\");\n  }\n  \n  # Update the spectrum (+ nof_rep, max_size and nof_seq)\n  for my $size (keys %$spectrum) {\n    # Get the number of contigs of different size\n    if (defined $self->{'_spectrum'}{$size}) {\n      $self->{'_spectrum'}{$size} += $$spectrum{$size};\n    } else {\n      $self->{'_spectrum'}{$size} = $$spectrum{$size};\n    }\n    # Update nof_seq\n    $self->{'_nof_seq'} += $size * $$spectrum{$size};\n    # Update max_size\n    $self->{'_max_size'} = $size if $size > $self->{'_max_size'};\n  }\n  \n  # If the contig spectrum has only zero 1-contigs, max_size is zero\n  $self->{'_max_size'} = 0 if scalar keys %{$self->{'_spectrum'}} == 1 &&\n    defined $self->{'_spectrum'}{'1'} && $self->{'_spectrum'}{'1'} == 0;\n  \n  # Update nof_rep\n  $self->{'_nof_rep'}++;\n  return 1;\n}\n\n=head2 _import_dissolved_csp\n\n  Title   : _import_dissolved_csp\n  Usage   : $csp->_import_dissolved_csp($mixed_csp, $seq_header);\n  Function: Update a contig spectrum object by dissolving a 
mixed contig\n            spectrum based on the header of the sequences\n  Returns : 1 for success, 0 for error\n  Args    : Bio::Assembly::Tools::ContigSpectrum\n            sequence header string\n\n\nsub _import_dissolved_csp {\n  my ($self, $mixed_csp, $seq_header) = @_;\n  # Sanity check\n  if (not defined $mixed_csp || not defined $seq_header) {\n    $self->throw(\"Expecting a contig spectrum reference and sequence header as\".\n    \" arguments\");\n  }\n  # Create new object from assembly\n  my $dissolved_csp = $self->_new_dissolved_csp($mixed_csp, $seq_header);\n  # Update current contig spectrum object with new one\n  $self->add($dissolved_csp);\n  return 1;\n}\n\n\n=head2 _import_cross_csp\n\n  Title   : _import_cross_csp\n  Usage   : $csp->_import_cross_csp($mixed_csp);\n  Function: Update a contig spectrum object by calculating the cross contig\n            spectrum based on a mixed contig spectrum\n  Returns : 1 for success, 0 for error\n  Args    : Bio::Assembly::Tools::ContigSpectrum\n\n\nsub _import_cross_csp {\n  my ($self, $mixed_csp) = @_;\n  # Sanity check\n  if (not defined $mixed_csp) {\n    $self->throw(\"Expecting a contig spectrum reference as argument\");\n  }\n\n  # Create new object from assembly\n  my $cross_csp = $self->_new_cross_csp($mixed_csp);\n\n  # Update current contig spectrum object with new one\n  $self->add($cross_csp);\n\n  return 1;\n}\n\n\n=head2 _get_seq_stats\n\n  Title   : _get_seq_stats\n  Usage   : my $seqlength = $csp->_get_seq_stats($assemblyobj);\n  Function: Get sequence statistics from an assembly:\n              number of sequences, average sequence length\n  Returns : number of sequences (integer)\n            average sequence length (decimal)\n  Args    : assembly object reference\n            hash reference with the IDs of the sequences to consider 
[optional]","label":"_get_seq_stats($self,$assemblyobj,$seq_hash)"},"line":1385,"range":{"start":{"line":1385,"character":0},"end":{"line":1416,"character":9999}},"kind":12,"definition":"sub","detail":"($self,$assemblyobj,$seq_hash)","children":[{"line":1386,"localvar":"my","containerName":"_get_seq_stats","kind":13,"name":"$self","definition":"my"},{"line":1386,"name":"$assemblyobj","containerName":"_get_seq_stats","kind":13},{"containerName":"_get_seq_stats","kind":13,"name":"$seq_hash","line":1386},{"line":1389,"name":"$self","kind":13,"containerName":"_get_seq_stats"},{"name":"throw","kind":12,"containerName":"_get_seq_stats","line":1389},{"line":1390,"name":"$assemblyobj","kind":13,"containerName":"_get_seq_stats"},{"line":1390,"containerName":"_get_seq_stats","kind":13,"name":"$assemblyobj"},{"line":1390,"kind":12,"containerName":"_get_seq_stats","name":"isa"},{"name":"$self","kind":13,"containerName":"_get_seq_stats","line":1391},{"name":"throw","containerName":"_get_seq_stats","kind":12,"line":1391},{"line":1391,"name":"$seq_hash","kind":13,"containerName":"_get_seq_stats"},{"line":1392,"name":"$seq_hash","kind":13,"containerName":"_get_seq_stats"},{"line":1392,"kind":13,"containerName":"_get_seq_stats","name":"$seq_hash"},{"definition":"my","line":1394,"kind":13,"localvar":"my","containerName":"_get_seq_stats","name":"$avg_seq_len"},{"definition":"my","line":1395,"name":"$nof_seq","localvar":"my","kind":13,"containerName":"_get_seq_stats"},{"definition":"my","line":1396,"name":"$contigobj","kind":13,"localvar":"my","containerName":"_get_seq_stats"},{"name":"$assemblyobj","kind":13,"containerName":"_get_seq_stats","line":1396},{"line":1396,"containerName":"_get_seq_stats","kind":12,"name":"all_contigs"},{"name":"$seqobj","kind":13,"localvar":"my","containerName":"_get_seq_stats","line":1397,"definition":"my"},{"kind":13,"containerName":"_get_seq_stats","name":"$contigobj","line":1397},{"name":"each_seq","kind":12,"containerName":"_get_seq_stats","line":1397}
,{"containerName":"_get_seq_stats","localvar":"my","kind":13,"name":"$seq_id","line":1398,"definition":"my"},{"line":1398,"name":"$seqobj","containerName":"_get_seq_stats","kind":13},{"name":"id","kind":12,"containerName":"_get_seq_stats","line":1398},{"line":1399,"name":"$seq_hash","containerName":"_get_seq_stats","kind":13},{"line":1399,"name":"$seq_id","kind":13,"containerName":"_get_seq_stats"},{"line":1400,"name":"$nof_seq","kind":13,"containerName":"_get_seq_stats"},{"definition":"my","line":1401,"name":"$seq_string","kind":13,"localvar":"my","containerName":"_get_seq_stats"},{"name":"$seqobj","containerName":"_get_seq_stats","kind":13,"line":1401},{"kind":12,"containerName":"_get_seq_stats","name":"seq","line":1401},{"kind":13,"containerName":"_get_seq_stats","name":"$seq_string","line":1402},{"name":"$avg_seq_len","containerName":"_get_seq_stats","kind":13,"line":1403},{"line":1403,"name":"$seq_string","kind":13,"containerName":"_get_seq_stats"},{"localvar":"my","containerName":"_get_seq_stats","kind":13,"name":"$singletobj","line":1406,"definition":"my"},{"name":"$assemblyobj","kind":13,"containerName":"_get_seq_stats","line":1406},{"kind":12,"containerName":"_get_seq_stats","name":"all_singlets","line":1406},{"line":1407,"name":"$seq_id","localvar":"my","kind":13,"containerName":"_get_seq_stats","definition":"my"},{"name":"$singletobj","kind":13,"containerName":"_get_seq_stats","line":1407},{"line":1407,"containerName":"_get_seq_stats","kind":12,"name":"seqref"},{"line":1407,"name":"id","kind":12,"containerName":"_get_seq_stats"},{"containerName":"_get_seq_stats","kind":13,"name":"$seq_hash","line":1408},{"name":"$seq_id","containerName":"_get_seq_stats","kind":13,"line":1408},{"line":1409,"kind":13,"containerName":"_get_seq_stats","name":"$nof_seq"},{"definition":"my","line":1410,"kind":13,"localvar":"my","containerName":"_get_seq_stats","name":"$seq_string"},{"line":1410,"name":"$singletobj","containerName":"_get_seq_stats","kind":13},{"kind":12,"contain
erName":"_get_seq_stats","name":"seqref","line":1410},{"line":1410,"kind":12,"containerName":"_get_seq_stats","name":"seq"},{"line":1411,"name":"$seq_string","containerName":"_get_seq_stats","kind":13},{"line":1412,"kind":13,"containerName":"_get_seq_stats","name":"$avg_seq_len"},{"line":1412,"containerName":"_get_seq_stats","kind":13,"name":"$seq_string"},{"name":"$avg_seq_len","containerName":"_get_seq_stats","kind":13,"line":1414},{"containerName":"_get_seq_stats","kind":13,"name":"$nof_seq","line":1414},{"name":"$nof_seq","containerName":"_get_seq_stats","kind":13,"line":1414},{"line":1415,"name":"$nof_seq","containerName":"_get_seq_stats","kind":13},{"line":1415,"kind":13,"containerName":"_get_seq_stats","name":"$avg_seq_len"}],"name":"_get_seq_stats","containerName":"main::"},{"line":1399,"kind":12,"name":"seq_hash"},{"line":1408,"name":"seq_hash","kind":12},{"line":1435,"range":{"start":{"line":1435,"character":0},"end":{"line":1517,"character":9999}},"kind":12,"signature":{"parameters":[{"label":"$self"},{"label":"$assembly_obj"},{"label":"$seq_hash"}],"documentation":"__END__\n#\n# BioPerl module for Bio::Assembly::Tools::ContigSpectrum\n#\n# Copyright by Florent Angly\n#\n# You may distribute this module under the same terms as Perl itself\n#\n# POD documentation - main docs before the code\n\n=head1 NAME\n\nBio::Assembly::Tools::ContigSpectrum - create and manipulate contig spectra\n\n=head1 SYNOPSIS\n\n  # Simple contig spectrum creation\n  my $csp1 = Bio::Assembly::Tools::ContigSpectrum->new(\n    -id       => 'csp1',\n    -spectrum => { 1 => 10,\n                   2 => 2,\n                   3 => 1 } );\n\n  # ...or another way to create a simple contig spectrum\n  my $csp2 = Bio::Assembly::Tools::ContigSpectrum->new;\n  $csp2->id('csp2');\n  $csp2->spectrum({ 1 => 20, 2 => 1, 4 => 1 });\n\n  # Get some information\n  print \"This is contig spectrum \".$csp->id.\"\\n\";\n  print \"It contains \".$csp->nof_seq.\" sequences\\n\";\n  print \"The largest 
contig has \".$csp->max_size.\" sequences\\n\";\n  print \"The spectrum is: \".$csp->to_string($csp->spectrum).\"\\n\";\n\n  # Let's add the contig spectra\n  my $summed_csp = Bio::Assembly::Tools::ContigSpectrum->new;\n  $summed_csp->add($csp1);\n  $summed_csp->add($csp2);\n  print \"The summed contig spectrum is \".$summed_csp->to_string.\"\\n\";\n\n  # Make an average\n  my $avg_csp = Bio::Assembly::Tools::ContigSpectrum->new;\n  $avg_csp = $avg_csp->average([$csp1, $csp2]);\n  print \"The average contig spectrum is \".$avg_csp->to_string.\"\\n\";\n\n  # Get a contig spectrum from an assembly\n  my $from_assembly = Bio::Assembly::Tools::ContigSpectrum->new(\n    -assembly       => $assembly_object,\n    -eff_asm_params => 1);\n  print \"The contig spectrum from assembly is \".$from_assembly->to_string.\"\\n\";\n\n  # Report advanced information (possible because eff_asm_params = 1)\n  print \"Average sequence length: \".$from_assembly->avg_seq_length.\" bp\\n\";\n  print \"Minimum overlap length: \".$from_assembly->min_overlap.\" bp\\n\";\n  print \"Average overlap length: \".$from_assembly->avg_overlap.\" bp\\n\";\n  print \"Minimum overlap match: \".$from_assembly->min_identity.\" %\\n\";\n  print \"Average overlap match: \".$from_assembly->avg_identity.\" %\\n\";\n\n  # Assuming the assembly object contains sequences from several different\n  # metagenomes, we have a mixed contig spectrum from which a cross contig\n  # spectrum and dissolved contig spectra can be obtained\n  my $mixed_csp = $from_assembly;\n\n  # Calculate a dissolved contig spectrum\n  my $meta1_dissolved = Bio::Assembly::Tools::ContigSpectrum->new(\n    -dissolve => [$mixed_csp, 'metagenome1'] );\n  my $meta2_dissolved = Bio::Assembly::Tools::ContigSpectrum->new(\n    -dissolve => [$mixed_csp, 'metagenome2'] );\n  print \"The dissolved contig spectra are:\\n\".\n    $meta1_dissolved->to_string.\"\\n\".\n    $meta2_dissolved->to_string.\"\\n\";\n\n  # Determine a cross contig spectrum\n  my 
$cross_csp = Bio::Assembly::Tools::ContigSpectrum->new(\n    -cross => $mixed_csp );\n  print \"The cross contig spectrum is \".$cross_csp->to_string.\"\\n\";\n\n  # Score a contig spectrum (the more abundant the contigs and the larger their\n  # size, the larger the score)\n\n\n=head1 DESCRIPTION\n\nThe Bio::Assembly::Tools::ContigSpectrum Perl module enables to\nmanually create contig spectra, import them from assemblies,\nmanipulate them, transform between different types of contig spectra\nand output them.\n\nBio::Assembly::Tools::ContigSpectrum is a module to create, manipulate\nand output contig spectra, assembly-derived data used in metagenomics\n(community genomics) for diversity estimation.\n\n=head2 Background\n\nA contig spectrum is the count of the number of contigs of different\nsize in an assembly. For example, the contig spectrum [100 5 1 0 0\n...] means that there were 100 singlets (1-contigs), 5 contigs of 2\nsequences (2-contigs), 1 contig of 3 sequences (3-contig) and no\nlarger contigs.\n\nAn assembly can be produced from a mixture of sequences from different\nmetagenomes. The contig obtained from this assembly is a mixed contig\nspectrum. The contribution of each metagenome in this mixed contig\nspectrum can be obtained by determining a dissolved contig spectrum.\n\nFinally, based on a mixed contig spectrum, a cross contig spectrum can\nbe determined. In a cross contig spectrum, only contigs containing\nsequences from different metagenomes are kept; \"pure\" contigs are\nexcluded. 
Additionally, the total number of singletons (1-contigs)\nfrom each region that assembles with any fragments from other regions\nis the number of 1-contigs in the cross contig spectrum.\n\n=head2 Implementation\n\nThe simplest representation of a contig spectrum is as a hash\nrepresentation where the key is the contig size (number of sequences\nmaking up the contig) and the value the number of contigs of this\nsize.\n\nIn fact, it is useful to have more information associated with the\ncontig spectrum, hence the Bio::Assembly::Tools::ContigSpectrum module\nimplements an object containing a contig spectrum hash and additional\ninformation. The get/set methods to access them are:\n\n    id              contig spectrum ID\n    nof_seq         number of sequences\n    nof_rep         number of repetitions (assemblies) used\n    max_size        size of (number of sequences in) the largest contig\n    nof_overlaps    number of overlaps\n    min_overlap     minimum overlap length for building a contig\n    min_identity    minimum sequence identity over the overlap length\n    avg_overlap     average overlap length\n    avg_identity    average overlap identity\n    avg_seq_length  average sequence length\n    eff_asm_params  effective assembly parameters\n    spectrum        hash representation of a contig spectrum\n\n  Operations on the contig spectra:\n\n    to_string       create a string representation of the spectrum\n    spectrum        import a hash contig spectrum\n    assembly        determine a contig spectrum from an assembly\n    dissolve        calculate a dissolved contig spectrum (based on assembly)\n    cross           produce a cross contig spectrum (based on assembly)\n    add             add a contig spectrum to an existing one\n    average         make an average of several contig spectra\n\nWhen using operations that rely on knowing \"where\" (from what\nmetagenomes) a sequence came from (i.e. 
when creating a dissolved or\ncross contig spectrum), make sure that the sequences used for the\nassembly have a name header, e.g.  E<gt>metagenome1|seq1,\nE<gt>metagenome2|seq1, ...\n\n=head1 FEEDBACK\n\n=head2 Mailing Lists\n\nUser feedback is an integral part of the evolution of this and other\nBioperl modules. Send your comments and suggestions preferably to the\nBioperl mailing lists. Your participation is much appreciated.\n\n  bioperl-l@bioperl.org                  - General discussion\n  http://bioperl.org/wiki/Mailing_lists  - About the mailing lists\n\n\n=head2 Support \n\nPlease direct usage questions or support issues to the mailing list:\n\nI<bioperl-l@bioperl.org>\n\nrather than to the module maintainer directly. Many experienced and \nresponsive experts will be able to look at the problem and quickly \naddress it. Please include a thorough description of the problem \nwith code and data examples if at all possible.\n\n=head2 Reporting Bugs\n\nReport bugs to the BioPerl bug tracking system to help us keep track\nof the bugs and their resolution. Bug reports can be submitted via email\nor the web:\n\n  bioperl-bugs@bio.perl.org\n  http://bugzilla.bioperl.org/\n\n=head1 AUTHOR - Florent E Angly\n\nEmail florent_dot_angly_at_gmail_dot_com\n\n=head1 APPENDIX\n\nThe rest of the documentation details each of the object\nmethods. 
Internal methods are usually preceded with a \"_\".\n\n\npackage Bio::Assembly::Tools::ContigSpectrum;\n\nuse strict;\n\nuse Bio::Root::Root;\nuse Bio::Assembly::Scaffold;\nuse Bio::SimpleAlign;\nuse Bio::LocatableSeq;\nuse Bio::Align::PairwiseStatistics;\n\nuse base 'Bio::Root::Root';\n\n\n=head2 new\n\n  Title   : new\n  Usage   : my $csp = Bio::Assembly::Tools::ContigSpectrum->new();\n              or\n            my $csp = Bio::Assembly::Tools::ContigSpectrum->new(\n              -id => 'some_name',\n              -spectrum =>  { 1 => 90 , 2 => 3 , 4 => 1 },\n            );\n              or\n            my $csp = Bio::Assembly::Tools::ContigSpectrum->new(\n              -assembly =>  $assembly_obj\n            );\n  Function: create a new contig spectrum object\n  Returns : reference to a contig spectrum object\n  Args    : none\n\n\nsub new {\n  my ($class, @args) = @_;\n  my $self = $class->SUPER::new(@args);\n  my ( $id, $nof_seq, $nof_rep, $max_size, $nof_overlaps, $min_overlap,\n    $min_identity, $avg_overlap, $avg_identity, $avg_seq_len, $spectrum,\n    $assembly, $eff_asm_params, $dissolve, $cross) = $self->_rearrange( [qw(ID\n    NOF_SEQ NOF_REP MAX_SIZE NOF_OVERLAPS MIN_OVERLAP MIN_IDENTITY AVG_OVERLAP\n    AVG_IDENTITY AVG_SEQ_LEN SPECTRUM ASSEMBLY EFF_ASM_PARAMS DISSOLVE CROSS)],\n    @args );\n\n  # First set up some defauts\n  $self->{'_id'}             = 'NoName';\n  $self->{'_nof_seq'}        = 0;\n  $self->{'_nof_rep'}        = 0;\n  $self->{'_max_size'}       = 0;\n  $self->{'_nof_overlaps'}   = 0;\n  $self->{'_min_overlap'}    = undef;\n  $self->{'_min_identity'}   = undef;\n  $self->{'_avg_overlap'}    = 0;\n  $self->{'_avg_identity'}   = 0;\n  $self->{'_avg_seq_len'}    = 0;\n  $self->{'_eff_asm_params'} = 0;\n  $self->{'_spectrum'}       = {1 => 0};  # contig spectrum hash representation\n  $self->{'_assembly'}       = []; # list of assembly objects used\n\n  # Then, according to user desires, override defaults\n  $self->{'_id'}           
  = $id             if (defined $id);\n  $self->{'_nof_seq'}        = $nof_seq        if (defined $nof_seq);\n  $self->{'_nof_rep'}        = $nof_rep        if (defined $nof_rep);\n  $self->{'_max_size'}       = $max_size       if (defined $max_size);\n  $self->{'_nof_overlaps'}   = $nof_overlaps   if (defined $nof_overlaps);\n  $self->{'_min_overlap'}    = $min_overlap    if (defined $min_overlap);\n  $self->{'_avg_overlap'}    = $avg_overlap    if (defined $avg_overlap);\n  $self->{'_min_identity'}   = $min_identity   if (defined $min_identity);\n  $self->{'_avg_identity'}   = $avg_identity   if (defined $avg_identity);\n  $self->{'_avg_seq_len'}    = $avg_seq_len    if (defined $avg_seq_len);\n  $self->{'_eff_asm_params'} = $eff_asm_params if (defined $eff_asm_params);\n\n  # Finally get stuff that can be gotten in an automated way\n  $self->_import_spectrum($spectrum) if defined($spectrum);\n  $self->_import_assembly($assembly) if defined($assembly);\n  if (defined($dissolve)) {\n    my ($mixed_csp, $header) = (@$dissolve[0], @$dissolve[1]);\n    $self->_import_dissolved_csp($mixed_csp, $header);\n  }\n  $self->_import_cross_csp($cross)   if defined($cross);\n\n  return $self;\n}\n\n\n=head2 id\n\n  Title   : id\n  Usage   : $csp->id\n  Function: get/set contig spectrum id\n  Returns : string\n  Args    : string [optional]\n\n\nsub id {\n  my ($self, $id) = @_;\n  if (defined $id) {\n    $self->{'_id'} = $id;\n  }\n  $id = $self->{'_id'};\n  return $id;\n}\n\n\n=head2 nof_seq\n\n  Title   : nof_seq\n  Usage   : $csp->nof_seq\n  Function: get/set the number of sequences making up the contig spectrum\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub nof_seq {\n  my ($self, $nof_seq) = @_;\n  if (defined $nof_seq) {\n    $self->throw(\"The number of sequences must be strictly positive. 
Got \".\n      \"'$nof_seq'\") if $nof_seq < 1;\n    $self->{'_nof_seq'} = $nof_seq;\n  }\n  $nof_seq = $self->{'_nof_seq'};\n  return $nof_seq;\n}\n\n\n=head2 nof_rep\n\n  Title   : nof_rep\n  Usage   : $csp->nof_rep\n  Function: Get/Set the number of repetitions (assemblies) used to create the \n            contig spectrum\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub nof_rep {\n  my ($self, $nof_rep) = @_;\n  if (defined $nof_rep) {\n    $self->throw(\"The number of repetitions must be strictly positive. Got \".\n      \"'$nof_rep'\") if $nof_rep < 1;\n    $self->{'_nof_rep'} = $nof_rep;\n  }\n  $nof_rep = $self->{'_nof_rep'};\n  return $nof_rep;\n}\n\n\n=head2 max_size\n\n  Title   : max_size\n  Usage   : $csp->max_size\n  Function: get/set the size of (number of sequences in) the largest contig\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub max_size {\n  my ($self, $max_size) = @_;\n  if (defined $max_size) {\n    $self->throw(\"The contig maximum size must be strictly positive. Got \".\n      \"'$max_size'\") if $max_size < 1;\n    $self->{'_max_size'} = $max_size;\n  }\n  $max_size = $self->{'_max_size'};\n  return $max_size;\n}\n\n\n=head2 nof_overlaps\n\n  Title   : nof_overlaps\n  Usage   : $csp->nof_overlaps\n  Function: Get/Set the number of overlaps in the assembly.\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub nof_overlaps {\n  my ($self, $nof_overlaps) = @_;\n  if (defined $nof_overlaps) {\n    $self->throw(\"The number of overlaps must be strictly positive. 
Got \".\n      \"'$nof_overlaps'\") if $nof_overlaps < 1;\n    $self->{'_nof_overlaps'} = $nof_overlaps;\n  }\n  $nof_overlaps = $self->{'_nof_overlaps'};\n  return $nof_overlaps;\n}\n\n\n=head2 min_overlap\n\n  Title   : min_overlap\n  Usage   : $csp->min_overlap\n  Function: get/set the assembly minimum overlap length\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub min_overlap {\n  my ($self, $min_overlap) = @_;\n  if (defined $min_overlap) {\n    $self->throw(\"The minimum of overlap length must be strictly positive. Got\".\n      \" '$min_overlap'\") if $min_overlap < 1;\n    $self->{'_min_overlap'} = $min_overlap;\n  }\n  $min_overlap = $self->{'_min_overlap'};\n  return $min_overlap;\n}\n\n\n=head2 avg_overlap\n\n  Title   : avg_overlap\n  Usage   : $csp->avg_overlap\n  Function: get/set the assembly average overlap length\n  Returns : decimal\n  Args    : decimal [optional]\n\n\nsub avg_overlap {\n  my ($self, $avg_overlap) = @_;\n  if (defined $avg_overlap) {\n    $self->throw(\"The average overlap length must be strictly positive. Got \".\n      \"'$avg_overlap'\") if $avg_overlap < 1;\n    $self->{'_avg_overlap'} = $avg_overlap;\n  }\n  $avg_overlap = $self->{'_avg_overlap'};\n  return $avg_overlap;\n}\n\n\n=head2 min_identity\n\n  Title   : min_identity\n  Usage   : $csp->min_identity\n  Function: get/set the assembly minimum overlap identity percent\n  Returns : 0 < decimal < 100\n  Args    : 0 < decimal < 100 [optional]\n\n\nsub min_identity {\n  my ($self, $min_identity) = @_;\n  if (defined $min_identity) {\n    $self->throw(\"The minimum overlap percent identity must be strictly \".\n      \"positive. 
Got '$min_identity'\") if $min_identity < 1;\n    $self->{'_min_identity'} = $min_identity;\n  }\n  $min_identity = $self->{'_min_identity'};\n  return $min_identity;\n}\n\n\n=head2 avg_identity\n\n  Title   : avg_identity\n  Usage   : $csp->avg_identity\n  Function: get/set the assembly average overlap identity percent\n  Returns : 0 < decimal < 100\n  Args    : 0 < decimal < 100 [optional]\n\n\nsub avg_identity {\n  my ($self, $avg_identity) = @_;\n  if (defined $avg_identity) {\n    $self->throw(\"The average overlap percent identity must be strictly \".\n      \"positive. Got '$avg_identity'\") if $avg_identity < 1;\n    $self->{'_avg_identity'} = $avg_identity;\n  }\n  $avg_identity = $self->{'_avg_identity'};\n  return $avg_identity;\n}\n\n\n=head2 avg_seq_len\n\n  Title   : avg_seq_len\n  Usage   : $csp->avg_seq_len\n  Function: get/set the assembly average sequence length\n  Returns : avg_seq_len\n  Args    : real [optional]\n\n\nsub avg_seq_len {\n  my ($self, $avg_seq_len) = @_;\n  if (defined $avg_seq_len) {\n    $self->throw(\"The average sequence length must be strictly positive. Got \".\n      \"'$avg_seq_len'\") if $avg_seq_len < 1;\n    $self->{'_avg_seq_len'} = $avg_seq_len;\n  }\n  $avg_seq_len = $self->{'_avg_seq_len'};\n  return $avg_seq_len;\n}\n\n\n=head2 eff_asm_params\n\n  Title   : eff_asm_params\n  Usage   : $csp->eff_asm_params(1)\n  Function: Get/set the effective assembly parameters option. It defines if the\n            effective assembly parameters should be determined when a contig\n            spectrum based or derived from an assembly is calulated. 
The\n            effective assembly parameters include avg_seq_length, nof_overlaps,\n            min_overlap, avg_overlap, min_identity and avg_identity.\n            1 = get them, 0 = don't.\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub eff_asm_params {\n  my ($self, $eff_asm_params) = @_;\n  if (defined $eff_asm_params) {\n    $self->throw(\"eff_asm_params can only take values 0 or 1. Input value was \".\n      \"'$eff_asm_params'\") unless $eff_asm_params == 0 || $eff_asm_params == 1;\n    $self->{'_eff_asm_params'} = $eff_asm_params;\n  }\n  $eff_asm_params = $self->{'_eff_asm_params'};\n  return $eff_asm_params;\n}\n\n\n=head2 spectrum\n\n  Title   : spectrum\n  Usage   : my $spectrum = $csp->spectrum({1=>10, 2=>2, 3=>1});\n  Function: Get the current contig spectrum represented as a hash / Update a\n            contig spectrum object based on a contig spectrum represented as a\n            hash\n            The hash representation of a contig spectrum is as following:\n              key   -> contig size (in number of sequences)\n              value -> number of contigs of this size\n  Returns : contig spectrum as a hash reference\n  Args    : contig spectrum as a hash reference [optional]\n\n\nsub spectrum {\n  my ($self, $spectrum) = @_;\n  if (defined $spectrum) {\n    $self->_import_spectrum($spectrum);\n  }\n  $spectrum = $self->{'_spectrum'};\n  return $spectrum;\n}\n\n\n=head2 assembly\n\n  Title   : assembly\n  Usage   : my @asm_list = $csp->assembly();\n  Function: Get a reference to the list of assembly object reference used to\n            make the contig spectrum object / Update the contig spectrum object\n            based on an assembly object.\n  Returns : array of Bio::Assembly::Scaffold\n  Args    : Bio::Assembly::Scaffold\n\n\nsub assembly {\n  my ($self, $assembly) = @_;\n  if (defined $assembly) {\n    $self->_import_assembly($assembly);\n  }\n  my @asm_list = @{$self->{'_assembly'}} if defined $self->{'_assembly'};\n  
return \\@asm_list;\n}\n\n=head2 drop_assembly\n\n  Title   : drop_assembly\n  Usage   : $csp->drop_assembly();\n  Function: Remove all assembly objects associated with a contig spectrum.\n            Assembly objects can be big. This method allows to free some memory\n            when assembly information is not needed anymore.\n  Returns : 1 for success, 0 for failure\n  Args    : none\n\n\nsub drop_assembly {\n  my ($self) = @_;\n  $self->{'_assembly'} = [];\n  return 1;\n}\n\n=head2 dissolve\n\n  Title   : dissolve\n  Usage   : $dissolved_csp->dissolve($mixed_csp, $seq_header);\n  Function: Dissolve a mixed contig spectrum for the set of sequences that\n            contain the specified header, i.e. determine the contribution of\n            these sequences to the mixed contig spectrum based on the assembly.\n            The mixed contig spectrum object must have been created based on one\n            (or several) assembly object(s). Additionally, min_overlap and\n            min_identity must have been set (either manually using min_overlap\n            or automatically by switching on the eff_asm_params option).\n  Returns : 1 for success, 0 for failure\n  Args    : Bio::Assembly::Tools::ContigSpectrum reference\n            sequence header string\n\n\n\nsub dissolve {\n  my ($self, $mixed_csp, $seq_header) = @_;\n  $self->_import_dissolved_csp($mixed_csp, $seq_header);\n  return 1;\n}\n\n\n=head2 cross\n\n  Title   : cross\n  Usage   : $cross_csp->cross($mixed_csp);\n  Function: Calculate a cross contig_spectrum based on a mixed contig_spectrum.\n  Returns : 1 for success, 0 for failure\n  Args    : Bio::Assembly::Tools::ContigSpectrum reference\n\n\nsub cross {\n  my ($self, $mixed_csp) = @_;\n  $self->_import_cross_csp($mixed_csp);\n  return 1;\n}\n\n=head2 to_string\n\n  Title   : to_string\n  Usage   : my $csp_string = $csp->to_string;\n  Function: Convert the contig spectrum into a string (easy to print!!).\n  Returns : string\n  Args    : element 
separator (integer) [optional]\n              1 -> space-separated\n              2 -> tab-separated\n              3 -> newline-separated\n\n\nsub to_string {\n  my ($self, $element_separator) = @_;\n  return 0 if $self->{'_max_size'} == 0;\n  $element_separator ||= 1;\n  if ($element_separator == 1) {\n    $element_separator = ' ';\n  } elsif ($element_separator == 2) {\n    $element_separator = \"\\t\";\n  } elsif ($element_separator == 3) {\n    $element_separator = \"\\n\";\n  } else {\n    $self->throw(\"Unknown separator type '$element_separator'\\n\");\n  }\n  my $str = '';\n  for (my $q = 1 ; $q <= $self->{'_max_size'} ; $q++) {\n    my $val = 0;\n    if (exists $self->{'_spectrum'}{$q}) {\n      $val = $self->{'_spectrum'}{$q};\n    }\n    $str .= $val.$element_separator;\n  }\n  $str =~ s/\\s$//;\n  return $str;\n}\n\n\n=head2 add\n\n  Title   : add\n  Usage   : $csp->add($additional_csp);\n  Function: Add a contig spectrum to an existing one: sums the spectra, update\n            the number of sequences, number of repetitions, ...\n  Returns : 1 for success, 0 for failure\n  Args    : Bio::Assembly::Tools::ContigSpectrum object\n\n\nsub add {\n  my ($self, $csp) = @_;\n  # Sanity check\n  if( !ref $csp || ! $csp->isa('Bio::Assembly::Tools::ContigSpectrum') ) {\n        $self->throw(\"Unable to process non Bio::Assembly::Tools::ContigSpectrum \".\n        \"object [\".ref($csp).\"]\");\n  }\n  # Update overlap statistics\n  if ( $self->{'_eff_asm_params'} > 0 ) {\n    # Warnings\n    if ( $csp->{'_eff_asm_params'} == 0 ) {\n      $self->warn(\"The parent contig spectrum needs effective assembly \".\n        \"parameters (eff_asm_params = \".$self->{'_eff_asm_params'}.\") but the \".\n        \"child contig spectrum doesn't have them (eff_asm_params = \".\n        $csp->{'_eff_asm_params'}.\"). 
Skipping them...\");\n    } elsif ( $csp->{'_eff_asm_params'} != $self->{'_eff_asm_params'} ) {\n      $self->warn(\"The parent contig spectrum needs a different method for \".\n        \"detecting the effective assembly parameters (eff_asm_params = \".\n        $self->{'_eff_asm_params'}.\") than the one specified for the child \".\n        \"contig spectrum (eff_asm_params = \".$csp->{'_eff_asm_params'}.\"). \".\n        \"Ignoring the differences...\");\n    }\n    # Update existing stats\n    my $tot_num_overlaps = $csp->{'_nof_overlaps'} + $self->{'_nof_overlaps'};\n    $self->{'_min_overlap'} = $csp->{'_min_overlap'} if\n      defined $csp->{'_min_overlap'} && ( ! defined $self->{'_min_overlap'} ||\n      $csp->{'_min_overlap'} < $self->{'_min_overlap'} );\n    $self->{'_min_identity'} = $csp->{'_min_identity'} if\n      defined $csp->{'_min_identity'} && ( ! defined $self->{'_min_identity'} ||\n      $csp->{'_min_identity'} < $self->{'_min_identity'} );\n    if ($tot_num_overlaps != 0) {\n      $self->{'_avg_overlap'} =\n        ($csp->{'_avg_overlap'} * $csp->{'_nof_overlaps'}\n        + $self->{'_avg_overlap'} * $self->{'_nof_overlaps'})\n        / $tot_num_overlaps;\n      $self->{'_avg_identity'} =\n        ($csp->{'_avg_identity'} * $csp->{'_nof_overlaps'}\n        + $self->{'_avg_identity'} * $self->{'_nof_overlaps'})\n        / $tot_num_overlaps;\n    }\n    $self->{'_nof_overlaps'} = $tot_num_overlaps;\n  }\n  # Update sequence statistics\n  my $tot_nof_seq = $csp->{'_nof_seq'} + $self->{'_nof_seq'};\n  if (not $tot_nof_seq == 0) {\n    $self->{'_avg_seq_len'} = ($csp->{'_avg_seq_len'} * $csp->{'_nof_seq'} +\n      $self->{'_avg_seq_len'} * $self->{'_nof_seq'}) / $tot_nof_seq;\n  }\n  # Update spectrum (and nof_seq, max_size, and increment nof_rep by 1)\n  $self->_import_spectrum($csp->{'_spectrum'});\n  # Update nof_rep\n  $self->{'_nof_rep'}--;\n  $self->{'_nof_rep'} += $csp->{'_nof_rep'};\n  # Update list of assembly objects used\n  push 
@{$self->{'_assembly'}}, @{$csp->{'_assembly'}}\n    if defined $csp->{'_assembly'};\n  return 1;\n}\n\n\n=head2 average\n\n  Title   : average\n  Usage   : my $avg_csp = $csp->average([$csp1, $csp2, $csp3]);\n  Function: Average one contig spectrum or the sum of several contig spectra by\n            the number of repetitions\n  Returns : Bio::Assembly::Tools::ContigSpectrum\n  Args    : Bio::Assembly::Tools::ContigSpectrum array reference\n            eff_asm_params\n\n\nsub average {\n  my ($self, $list) = @_;\n  # Sanity check\n  if ( ! ref $list || ! ref $list eq 'ARRAY') {\n    $self->throw(\"Average takes an array reference but got [\".ref($list).\"]\");\n  }\n  # New average contig spectrum object\n  my $avg = Bio::Assembly::Tools::ContigSpectrum->new;\n  $avg->{'_eff_asm_params'} = 1;\n  \n  # Cycle through contig spectra\n  my $tot_nof_rep = 0;\n  for my $csp (@$list) {\n    # Sanity check\n    if (not $csp->isa('Bio::Assembly::Tools::ContigSpectrum')) {\n      $csp->throw(\"Unable to process non Bio::Assembly::Tools::ContigSpectrum \".\n        \"object [\".ref($csp).\"]\");\n    }\n    # Import contig spectrum\n    $avg->add($csp);\n  }\n  \n  # Average sum of contig spectra by number of repetitions\n  for (my $q = 1 ; $q <= $avg->{'_max_size'} ; $q++) {\n    $avg->{'_spectrum'}{$q} /= $avg->{'_nof_rep'}\n      if (defined $avg->{'_spectrum'}{$q});\n  }\n  # Average number of sequences\n  $avg->{'_nof_seq'} /= $avg->{'_nof_rep'};\n  # Average number of overlaps\n  $avg->{'_nof_overlaps'} /= $avg->{'_nof_rep'};\n  \n  return $avg;\n}\n\n\n=head2 score\n\n  Title   : score\n  Usage   : my $score = $csp->score();\n  Function: Score a contig spectrum (or cross-contig spectrum) such that the\n             higher the number of contigs (or cross-contigs) and the larger their \n             size, the higher the score.\n             Let n   : total number of sequences\n                 c_q : number of contigs of size q\n                 q   : number of sequence 
in a contig\n             We define: score = n/(n-1) * (X - 1/n)\n                  where X = sum ( c_q * q^2 ) / n**2\n             The score ranges from 0 (singlets only) to 1 (a single large contig)\n             It is possible to specify a value for the number of sequences to\n              assume in the contig spectrum.\n  Returns : contig score\n  Args    : number of total sequences to assume [optional]\n\n\nsub score {\n  my ($self, $nof_seqs) = @_;\n  # Main\n  my $score = 0;\n  my $n = $self->nof_seq;\n  if ( $n > 0 ) {\n    # Contig spectrum info\n    my $q_max = $self->max_size;\n    my $spec  = $self->spectrum;\n    # Adjust number of 1-contigs\n    if ( $nof_seqs ) {\n      $spec->{'1'} += $nof_seqs - $n;\n      $n = $nof_seqs;\n    }\n    # Calculate X\n    for my $q ( 1 .. $q_max ) {\n      if ( $spec->{$q} ) {\n        my $c_q = $spec->{$q};\n        $score += $c_q * $q ** 2;\n      }\n    }\n    $score /= $n ** 2; \n  }\n  # Rescale X to obtain the score\n  $score = $n/($n-1) * ($score - 1/$n);\n  return $score;\n}\n\n\n=head2 _naive_assembler\n\n  Title   : _naive_assembler\n  Usage   : \n  Function: Determines the contig spectrum (hash representation) of a subset of\n            sequences from a mixed contig spectrum by \"reassembling\" the\n            specified sequences only based on their position in the contig. This\n            naive assembly only verifies that the minimum overlap length and\n            percentage identity are respected. There is no actual alignment done\n  Returns : contig spectrum hash reference\n  Args    : Bio::Assembly::Contig\n            sequence ID array reference\n            minimum overlap length (integer) [optional]\n            minimum percentage identity (integer) [optional]\n\n\nsub _naive_assembler {\n  my ($self, $contig, $seqlist, $min_overlap, $min_identity) = @_;\n  # Sanity checks\n  if ( ! ref $seqlist || ! ref($seqlist) eq 'ARRAY') {\n    $self->throw('Expecting an array reference. 
Got ['.ref($seqlist).\"] \\n\");\n  }\n  my $max = scalar @$seqlist;\n  $self->throw(\"Expecting at least 2 sequences as input for _naive_assembler\")\n    if ($max < 2);\n  # Assembly\n  my %spectrum = (1 => 0);\n  my %overlap_map;\n  my %has_overlap;\n  # Map what sequences overlap with what sequences\n  for (my $i = 0 ; $i < $max-1 ; $i++) {\n    # query sequence\n    my $qseqid = $$seqlist[$i];\n    my $qseq   = $contig->get_seq_by_name($qseqid);\n    my $is_singlet = 1;\n    for (my $j = $i+1 ; $j < $max ; $j++) {\n      # target sequence\n      my $tseqid = $$seqlist[$j];\n      my $tseq = $contig->get_seq_by_name($tseqid);\n      # try to align sequences\n      my ($aln, $overlap, $identity)\n        = $self->_overlap_alignment($contig, $qseq, $tseq, $min_overlap,\n        $min_identity);\n      # if there is no valid overlap, go to next sequence\n      next if ! defined $aln;\n      # the overlap is valid\n      $is_singlet = 0;\n      push @{$overlap_map{$qseqid}}, $tseqid;\n      $has_overlap{$tseqid} = 1;\n      $has_overlap{$qseqid} = 1;\n    }\n    # check if sequence is in previously seen overlap\n    if (exists $has_overlap{$qseqid}) {\n      $is_singlet = 0;\n    }\n    if ($is_singlet == 1) {\n      $spectrum{1}++;\n    }\n  }\n  # take care of last sequence\n  my $last_is_singlet = 1;\n  if (exists $has_overlap{$$seqlist[$max-1]}) {\n    $last_is_singlet = 0;\n  }\n  if ($last_is_singlet == 1) {\n    $spectrum{1}++;\n  }\n  # Parse overlap map\n  for my $seqid (@$seqlist) {\n    # list of sequences that should go in the contig\n    next if not exists $overlap_map{$seqid};\n    my @overlist = @{$overlap_map{$seqid}};\n    for (my $j = 0 ; $j < scalar(@overlist) ; $j++) {\n      my $otherseqid = $overlist[$j];\n      if (exists $overlap_map{$otherseqid}) {\n        push @overlist, @{$overlap_map{$otherseqid}};\n        delete $overlap_map{$otherseqid};\n      }\n    }\n    # remove duplicates from list\n    @overlist = sort @overlist;\n    for (my 
$j = 0 ; $j < scalar(@overlist)-1 ; $j++) {\n      if ( $overlist[$j] eq $overlist[$j+1] ) {\n        splice @overlist, $j, 1;\n        $j--;\n      }\n    }\n    # update spectrum with size of contig\n    my $qsize = scalar(@overlist) + 1;\n    if (defined $spectrum{$qsize}) {\n      $spectrum{$qsize}++;\n    } else {\n      $spectrum{$qsize} = 1;\n    }\n  }\n  return \\%spectrum;\n}\n\n\n=head2 _new_from_assembly\n\n  Title   : _new_from_assembly\n  Usage   : \n  Function: Creates a new contig spectrum object based solely on the result of \n            an assembly\n  Returns : Bio::Assembly::Tools::ContigSpectrum\n  Args    : Bio::Assembly::Scaffold\n\n\nsub _new_from_assembly {\n  # Create new contig spectrum object based purely on what we can get from the\n  # assembly object\n  my ($self, $assemblyobj) = @_;\n  my $csp = Bio::Assembly::Tools::ContigSpectrum->new();\n  # 1: Set id\n  $csp->{'_id'} = $assemblyobj->id;\n  # 2: Set overlap statistics: nof_overlaps, min_overlap, avg_overlap,\n  #  min_identity and avg_identity\n  $csp->{'_eff_asm_params'} = $self->{'_eff_asm_params'};\n  if ($csp->{'_eff_asm_params'} > 0) {\n     my ($nover, $minl, $avgl, $minid, $avgid)\n       = $csp->_get_overlap_stats($assemblyobj);\n     $csp->{'_min_overlap'}  = $minl;\n     $csp->{'_min_identity'} = $minid;\n     $csp->{'_avg_overlap'}  = $avgl;\n     $csp->{'_avg_identity'} = $avgid;\n     $csp->{'_nof_overlaps'} = $nover;\n  }\n  # 3: Set sequence statistics: nof_seq and avg_seq_len\n  my ($nseq, $avgseql) = $self->_get_seq_stats($assemblyobj);\n  $csp->{'_avg_seq_len'} = $avgseql;\n  $csp->{'_nof_seq'}     = $nseq;\n  # 4: Set the spectrum: spectrum and max_size\n  for my $contigobj ($assemblyobj->all_contigs) {\n    my $size = $contigobj->num_sequences;\n    if (defined $csp->{'_spectrum'}{$size}) {\n      $csp->{'_spectrum'}{$size}++;\n    } else {\n      $csp->{'_spectrum'}{$size} = 1;\n    }\n    $csp->{'_max_size'} = $size if $size > $csp->{'_max_size'};\n  }\n  my 
$nof_singlets = $assemblyobj->get_nof_singlets();\n  if (defined $nof_singlets) {\n    $csp->{'_spectrum'}{1} += $nof_singlets;\n    $csp->{'_max_size'} = 1 if $nof_singlets >= 1 && $csp->{'_max_size'} < 1;\n  }\n  # 5: Set list of assembly objects used\n  push @{$csp->{'_assembly'}}, $assemblyobj;\n  # 6: Set number of repetitions\n  $csp->{'_nof_rep'} = 1;\n  return $csp;\n}\n\n\n\n=head2 _new_dissolved_csp\n\n  Title   : \n  Usage   : create a dissolved contig spectrum object\n  Function: \n  Returns : \n  Args    : \n\n\n\nsub _new_dissolved_csp {\n  my ($self, $mixed_csp, $seq_header) = @_;\n  # Sanity checks on the mixed contig spectrum\n\n  # min_overlap and min_identity must be specified if there are some overlaps\n  # in the mixed contig\n  unless ($mixed_csp->{'_nof_overlaps'} == 0) {\n    unless ( defined $self->{'_min_overlap'} || \n      defined $mixed_csp->{'_min_overlap'} ) {\n      $self->throw(\"min_overlap must be defined in the dissolved contig spectrum\".\n        \" or mixed contig spectrum to dissolve a contig\");\n    }\n    unless ( defined $self->{'_min_identity'} ||\n      defined $mixed_csp->{'_min_identity'} ) {\n      $self->throw(\"min_identity must be defined in the dissolved contig spectrum\".\n        \" or mixed contig spectrum\");\n    }\n  }\n  \n  # there must be at least one assembly in mixed contig spectrum\n  if (!defined $mixed_csp->{'_assembly'} ||\n      scalar @{$mixed_csp->{'_assembly'}} < 1) {\n    $self->throw(\"The mixed contig spectrum must be based on at least one\n    assembly\");\n  }\n\n  # New dissolved contig spectrum object\n  my $dissolved = Bio::Assembly::Tools::ContigSpectrum->new();\n  \n  # take parent attributes if existent or mixed attributes otherwise\n  if ($self->{'_eff_asm_params'}) {\n    $dissolved->{'_eff_asm_params'} = $self->{'_eff_asm_params'};\n  } else {\n    $dissolved->{'_eff_asm_params'} = $mixed_csp->{'_eff_asm_params'};\n  }\n  if ($self->{'_min_overlap'} && $self->{'_min_identity'}) 
{\n    ( $dissolved->{'_min_overlap'}, $dissolved->{'_min_identity'} ) = \n      ( $self->{'_min_overlap'}, $self->{'_min_identity'} );\n  } else {\n    ( $dissolved->{'_min_overlap'}, $dissolved->{'_min_identity'} ) = \n      ( $mixed_csp->{'_min_overlap'}, $mixed_csp->{'_min_identity'} );\n  }\n  \n  # Dissolve each assembly\n  for my $assembly (@{$mixed_csp->{'_assembly'}}) {\n    # Dissolve this assembly for the given sequences\n    my %asm_spectrum = (1 => 0);\n    my %good_seqs;\n    # For each contig\n    for my $contig ($assembly->all_contigs) {\n      # Get good sequences\n      my @contig_seqs;\n      for my $seq ($contig->each_seq) {\n        my $seq_id = $seq->id;\n        # get sequence origin\n        next unless $seq_id =~ m/^$seq_header\\|/;\n        # add it to hash\n        push @contig_seqs, $seq_id;\n        $good_seqs{$seq_id} = 1;\n      }\n      # Update spectrum\n      my $size = scalar @contig_seqs;\n      if ($size == 0) {\n        next;\n      } elsif ($size == 1) {\n        $asm_spectrum{1}++;\n      } elsif ($size > 1) {\n        # Reassemble good sequences\n        my $contig_spectrum = $dissolved->_naive_assembler(\n          $contig, \\@contig_seqs, $dissolved->{'_min_overlap'},\n          $dissolved->{'_min_identity'});\n        # update spectrum\n        for my $qsize (keys %$contig_spectrum) {\n          $asm_spectrum{$qsize} += $$contig_spectrum{$qsize};\n        }\n      } else {\n        $self->throw(\"The size is not valid... 
how could that happen?\");\n      }\n    }\n    # For each singlet\n    for my $singlet ($assembly->all_singlets) {\n      my $seq_id = $singlet->seqref->id;\n      # get sequence origin\n      next unless $seq_id =~ m/^$seq_header\\|/;\n      # add it to hash\n      $good_seqs{$seq_id} = 1;\n      # update spectrum\n      $asm_spectrum{1}++;\n    }\n    # Update spectrum\n    $dissolved->_import_spectrum(\\%asm_spectrum);\n    # Update nof_rep\n    $dissolved->{'_nof_rep'}--;\n    $dissolved->{'_nof_rep'} += $mixed_csp->{'_nof_rep'};\n\n    # Get sequence stats\n    my ($nseq, $avgseql) = $dissolved->_get_seq_stats($assembly, \\%good_seqs);\n    $dissolved->{'_avg_seq_len'} = $avgseql;\n    $dissolved->{'_nof_seq'}     = $nseq;\n  \n    # Get eff_asm_param for these sequences\n    if ($dissolved->{'_eff_asm_params'} > 0) {\n      my ($nover, $minl, $avgl, $minid, $avgid)\n        = $dissolved->_get_overlap_stats($assembly, \\%good_seqs);\n      $dissolved->{'_min_overlap'}  = $minl;\n      $dissolved->{'_min_identity'} = $minid;\n      $dissolved->{'_avg_overlap'}  = $avgl;\n      $dissolved->{'_avg_identity'} = $avgid;\n      $dissolved->{'_nof_overlaps'} = $nover;\n    }\n\n  }\n  return $dissolved;\n}\n\n\n=head2 _new_cross_csp\n\n  Title   : \n  Usage   : \n  Function: create a cross contig spectrum object\n  Returns : \n  Args    : \n\n\n\nsub _new_cross_csp {\n  my ($self, $mixed_csp) = @_;\n  # Sanity check on the mixed contig spectrum\n  # There must be at least one assembly\n  if (!defined $mixed_csp->{'_assembly'} ||\n      scalar @{$mixed_csp->{'_assembly'}} < 1) {\n    $self->throw(\"The mixed contig spectrum must be based on at least one \".\n    \"assembly.\");\n  }\n  \n  # New dissolved contig spectrum object\n  my $cross = Bio::Assembly::Tools::ContigSpectrum->new();\n  my %spectrum = (1 => 0);\n  \n  # Take parent or mixed attributes\n  if ($self->{'_eff_asm_params'}) {\n    $cross->{'_eff_asm_params'} = $self->{'_eff_asm_params'};\n  } else {\n  
  $cross->{'_eff_asm_params'} = $mixed_csp->{'_eff_asm_params'};\n  }\n  if ($self->{'_min_overlap'} && $self->{'_min_identity'}) {\n    ( $cross->{'_min_overlap'}, $cross->{'_min_identity'} ) = \n      ( $self->{'_min_overlap'}, $self->{'_min_identity'} );\n  } else {\n    ( $cross->{'_min_overlap'}, $cross->{'_min_identity'} ) = \n      ( $mixed_csp->{'_min_overlap'}, $mixed_csp->{'_min_identity'} );\n  }\n  \n  # Get cross contig spectrum for each assembly\n  for my $assembly (@{$mixed_csp->{'_assembly'}}) {\n    # Go through contigs and skip the pure ones\n    my %good_seqs;\n    for my $contig ($assembly->all_contigs) {\n      # Get origins\n      my @seq_origins;\n      my @seq_ids;\n      for my $seq ($contig->each_seq) {\n        # current sequence origin\n        my $seq_id = $seq->id;\n        $seq_id =~ m/^(.+)\\|/;\n        my $seq_header = $1;\n        $self->warn(\"Sequence $seq_id does not seem to have a header. Skipping\".\n          \" it...\") if not defined $seq_header;\n        $seq_header ||= '';\n        push @seq_origins, $seq_header;\n        push @seq_ids, $seq_id;\n      }\n      my $qsize = scalar(@seq_ids);\n      my @origins = sort { $a cmp $b } @seq_origins;\n      my $size = scalar(@origins);\n      for (my $i = 1 ; $i < $size ; $i++) {\n        if ($origins[$i] eq $origins[$i-1]) {\n          splice @origins, $i, 1;\n          $i--;\n          $size--;\n        }\n      }\n      # Update cross-contig number in spectrum\n      if ($size > 1) { # cross-contig detected\n        # update good sequences\n        for my $seq_id (@seq_ids) {\n          $good_seqs{$seq_id} = 1;\n        }\n        # update number of cross q-contigs in spectrum\n        if (defined $spectrum{$qsize}) {\n          $spectrum{$qsize}++;\n        } else {\n          $spectrum{$qsize} = 1;\n        }\n      }\n      # Update number of cross 1-contigs\n      if ($size > 1) { # cross-contig detected\n        for my $origin (@origins) {\n          # sequences to 
use\n          my @ids;\n          for (my $i = 0 ; $i < $qsize ; $i++) {\n            my $seq_origin = $seq_origins[$i];\n            my $seq_id = $seq_ids[$i];\n            push @ids, $seq_id if $seq_origin eq $origin;\n          }\n          if (scalar @ids == 1) {\n            $spectrum{1}++;\n          } elsif (scalar @ids > 1) {\n            my $contig_spectrum = $cross->_naive_assembler(\n              $contig, \\@ids, $cross->{'_min_overlap'},\n              $cross->{'_min_identity'});\n            $spectrum{1} += $$contig_spectrum{1};\n          } else {\n            $self->throw(\"The size is <= 0. How could such a thing happen?\");\n          }\n        }\n      }\n    }\n    # Get sequence stats\n    my ($nseq, $avgseql) = $cross->_get_seq_stats($assembly, \\%good_seqs);\n    $cross->{'_avg_seq_len'} = $avgseql;\n    $cross->{'_nof_seq'}     = $nseq;\n    # Get eff_asm_param for these sequences\n    if ($cross->{'_eff_asm_params'} > 0) {\n      my ($nover, $minl, $avgl, $minid, $avgid)\n        = $cross->_get_overlap_stats($assembly, \\%good_seqs);\n      $cross->{'_min_overlap'}  = $minl;\n      $cross->{'_min_identity'} = $minid;\n      $cross->{'_avg_overlap'}  = $avgl;\n      $cross->{'_avg_identity'} = $avgid;\n      $cross->{'_nof_overlaps'} = $nover;\n    }\n  }\n  \n  $cross->_import_spectrum(\\%spectrum);\n  # Update nof_rep\n  $cross->{'_nof_rep'}--;\n  $cross->{'_nof_rep'} += $mixed_csp->{'_nof_rep'};\n  \n  return $cross;\n}\n\n=head2 _import_assembly\n\n  Title   : _import_assembly\n  Usage   : $csp->_import_assembly($assemblyobj);\n  Function: Update a contig spectrum object based on an assembly object\n  Returns : 1 for success, 0 for error\n  Args    : Bio::Assembly::Scaffold assembly object\n\n\nsub _import_assembly {\n  my ($self, $assemblyobj) = @_;\n  # Sanity check\n  if( !ref $assemblyobj || ! 
$assemblyobj->isa('Bio::Assembly::ScaffoldI') ) {\n        $self->throw(\"Unable to process non Bio::Assembly::ScaffoldI assembly \".\n        \"object [\".ref($assemblyobj).\"]\");\n  }\n  # Create new object from assembly\n  my $csp = $self->_new_from_assembly($assemblyobj);\n  # Update current contig spectrum object with new one\n  $self->add($csp);\n  return 1;\n}\n\n\n=head2 _import_spectrum\n\n  Title   : _import_spectrum\n  Usage   : $csp->_import_spectrum({ 1 => 90 , 2 => 3 , 4 => 1 })\n  Function: update a contig spectrum object based on a contig spectrum\n            represented as a hash (key: contig size, value: number of contigs of\n            this size)\n  Returns : 1 for success, 0 for error\n  Args    : contig spectrum as a hash reference\n\n\nsub _import_spectrum {\n  my ($self, $spectrum) = @_;\n  # Sanity check\n  if( ! ref $spectrum || ! ref $spectrum eq 'HASH') {\n    $self->throw(\"Spectrum should be a hash reference, but it is [\".\n      ref($spectrum).\"]\");\n  }\n  \n  # Update the spectrum (+ nof_rep, max_size and nof_seq)\n  for my $size (keys %$spectrum) {\n    # Get the number of contigs of different size\n    if (defined $self->{'_spectrum'}{$size}) {\n      $self->{'_spectrum'}{$size} += $$spectrum{$size};\n    } else {\n      $self->{'_spectrum'}{$size} = $$spectrum{$size};\n    }\n    # Update nof_seq\n    $self->{'_nof_seq'} += $size * $$spectrum{$size};\n    # Update max_size\n    $self->{'_max_size'} = $size if $size > $self->{'_max_size'};\n  }\n  \n  # If the contig spectrum has only zero 1-contigs, max_size is zero\n  $self->{'_max_size'} = 0 if scalar keys %{$self->{'_spectrum'}} == 1 &&\n    defined $self->{'_spectrum'}{'1'} && $self->{'_spectrum'}{'1'} == 0;\n  \n  # Update nof_rep\n  $self->{'_nof_rep'}++;\n  return 1;\n}\n\n=head2 _import_dissolved_csp\n\n  Title   : _import_dissolved_csp\n  Usage   : $csp->_import_dissolved_csp($mixed_csp, $seq_header);\n  Function: Update a contig spectrum object by dissolving a 
mixed contig\n            spectrum based on the header of the sequences\n  Returns : 1 for success, 0 for error\n  Args    : Bio::Assembly::Tools::ContigSpectrum\n            sequence header string\n\n\nsub _import_dissolved_csp {\n  my ($self, $mixed_csp, $seq_header) = @_;\n  # Sanity check\n  if (not defined $mixed_csp || not defined $seq_header) {\n    $self->throw(\"Expecting a contig spectrum reference and sequence header as\".\n    \" arguments\");\n  }\n  # Create new object from assembly\n  my $dissolved_csp = $self->_new_dissolved_csp($mixed_csp, $seq_header);\n  # Update current contig spectrum object with new one\n  $self->add($dissolved_csp);\n  return 1;\n}\n\n\n=head2 _import_cross_csp\n\n  Title   : _import_cross_csp\n  Usage   : $csp->_import_cross_csp($mixed_csp);\n  Function: Update a contig spectrum object by calculating the cross contig\n            spectrum based on a mixed contig spectrum\n  Returns : 1 for success, 0 for error\n  Args    : Bio::Assembly::Tools::ContigSpectrum\n\n\nsub _import_cross_csp {\n  my ($self, $mixed_csp) = @_;\n  # Sanity check\n  if (not defined $mixed_csp) {\n    $self->throw(\"Expecting a contig spectrum reference as argument\");\n  }\n\n  # Create new object from assembly\n  my $cross_csp = $self->_new_cross_csp($mixed_csp);\n\n  # Update current contig spectrum object with new one\n  $self->add($cross_csp);\n\n  return 1;\n}\n\n\n=head2 _get_seq_stats\n\n  Title   : _get_seq_stats\n  Usage   : my $seqlength = $csp->_get_seq_stats($assemblyobj);\n  Function: Get sequence statistics from an assembly:\n              number of sequences, average sequence length\n  Returns : number of sequences (integer)\n            average sequence length (decimal)\n  Args    : assembly object reference\n            hash reference with the IDs of the sequences to consider [optional]\n\n\nsub _get_seq_stats {\n  my ($self, $assemblyobj, $seq_hash) = @_;\n\n  # sanity check\n  $self->throw(\"Must provide a Bio::Assembly::Scaffold 
object\")\n    if (!defined $assemblyobj || !$assemblyobj->isa(\"Bio::Assembly::ScaffoldI\"));\n  $self->throw(\"Expecting a hash reference. Got [\".ref($seq_hash).\"]\")\n    if (defined $seq_hash && ! ref($seq_hash) eq 'HASH');\n\n  my $avg_seq_len = 0;\n  my $nof_seq = 0;\n  for my $contigobj ($assemblyobj->all_contigs) {\n    for my $seqobj ($contigobj->each_seq) {\n      my $seq_id = $seqobj->id;\n      next if defined $seq_hash && !defined $$seq_hash{$seq_id};\n      $nof_seq++;\n      my $seq_string = $seqobj->seq;\n      $seq_string =~ s/-//g;\n      $avg_seq_len += length($seq_string);\n    }\n  }\n  for my $singletobj ($assemblyobj->all_singlets) {\n    my $seq_id = $singletobj->seqref->id;\n    next if defined $seq_hash && !defined $$seq_hash{$seq_id};\n    $nof_seq++;\n    my $seq_string = $singletobj->seqref->seq;\n    $seq_string =~ s/-//g;\n    $avg_seq_len += length($seq_string);\n  }\n  $avg_seq_len /= $nof_seq unless $nof_seq == 0;\n  return $nof_seq, $avg_seq_len;\n}\n\n\n=head2 _get_overlap_stats\n\n  Title   : _get_overlap_stats\n  Usage   : my ($minlength, $min_identity, $avglength, $avgidentity)\n              = $csp->_get_overlap_stats($assemblyobj);\n  Function: Get statistics about pairwise overlaps in contigs of an assembly\n  Returns : number of overlaps\n            minimum overlap length\n            average overlap length\n            minimum identity percent\n            average identity percent\n  Args    : assembly object reference\n            hash reference with the IDs of the sequences to consider 
[optional]","label":"_get_overlap_stats($self,$assembly_obj,$seq_hash)"},"children":[{"definition":"my","line":1436,"name":"$self","containerName":"_get_overlap_stats","localvar":"my","kind":13},{"kind":13,"containerName":"_get_overlap_stats","name":"$assembly_obj","line":1436},{"line":1436,"name":"$seq_hash","kind":13,"containerName":"_get_overlap_stats"},{"kind":13,"containerName":"_get_overlap_stats","name":"$self","line":1439},{"name":"throw","kind":12,"containerName":"_get_overlap_stats","line":1439},{"name":"$assembly_obj","kind":13,"containerName":"_get_overlap_stats","line":1440},{"line":1440,"name":"$assembly_obj","kind":13,"containerName":"_get_overlap_stats"},{"name":"isa","kind":12,"containerName":"_get_overlap_stats","line":1440},{"line":1441,"kind":13,"containerName":"_get_overlap_stats","name":"$self"},{"containerName":"_get_overlap_stats","kind":12,"name":"throw","line":1441},{"line":1441,"name":"$seq_hash","kind":13,"containerName":"_get_overlap_stats"},{"line":1442,"kind":13,"containerName":"_get_overlap_stats","name":"$seq_hash"},{"line":1442,"name":"$seq_hash","kind":13,"containerName":"_get_overlap_stats"},{"definition":"my","line":1444,"name":"$matchdef","containerName":"_get_overlap_stats","localvar":"my","kind":13},{"line":1444,"name":"$self","kind":13,"containerName":"_get_overlap_stats"},{"definition":"my","kind":13,"localvar":"my","containerName":"_get_overlap_stats","name":"$min_length","line":1445},{"line":1445,"containerName":"_get_overlap_stats","kind":13,"name":"$avg_length"},{"line":1445,"kind":13,"containerName":"_get_overlap_stats","name":"$min_identity"},{"containerName":"_get_overlap_stats","kind":13,"name":"$avg_identity","line":1445},{"line":1445,"kind":13,"containerName":"_get_overlap_stats","name":"$nof_overlaps"},{"line":1449,"name":"$contig_obj","containerName":"_get_overlap_stats","localvar":"my","kind":13,"definition":"my"},{"line":1449,"kind":13,"containerName":"_get_overlap_stats","name":"$assembly_obj"},{"line":1449,"k
ind":12,"containerName":"_get_overlap_stats","name":"all_contigs"},{"definition":"my","localvar":"my","kind":13,"containerName":"_get_overlap_stats","name":"$nof_seq","line":1450},{"line":1453,"kind":13,"localvar":"my","containerName":"_get_overlap_stats","name":"@all_seq_objs","definition":"my"},{"name":"$contig_obj","containerName":"_get_overlap_stats","kind":13,"line":1453},{"name":"each_seq","kind":12,"containerName":"_get_overlap_stats","line":1453},{"definition":"my","localvar":"my","kind":13,"containerName":"_get_overlap_stats","name":"$i","line":1455},{"name":"$i","containerName":"_get_overlap_stats","kind":13,"line":1455},{"line":1455,"containerName":"_get_overlap_stats","kind":13,"name":"@all_seq_objs"},{"line":1455,"containerName":"_get_overlap_stats","kind":13,"name":"$i"},{"name":"$seq_obj","localvar":"my","containerName":"_get_overlap_stats","kind":13,"line":1456,"definition":"my"},{"line":1456,"kind":13,"containerName":"_get_overlap_stats","name":"$all_seq_objs"},{"line":1456,"kind":13,"containerName":"_get_overlap_stats","name":"$i"},{"definition":"my","name":"$seq_id","kind":13,"localvar":"my","containerName":"_get_overlap_stats","line":1457},{"line":1457,"containerName":"_get_overlap_stats","kind":13,"name":"$seq_obj"},{"name":"id","kind":12,"containerName":"_get_overlap_stats","line":1457},{"line":1460,"name":"$seq_hash","kind":13,"containerName":"_get_overlap_stats"},{"name":"$seq_id","containerName":"_get_overlap_stats","kind":13,"line":1460},{"kind":13,"containerName":"_get_overlap_stats","name":"$nof_seq","line":1461},{"line":1464,"kind":13,"containerName":"_get_overlap_stats","name":"$nof_seq"},{"name":"$stats","kind":13,"localvar":"my","containerName":"_get_overlap_stats","line":1467,"definition":"my"},{"line":1467,"kind":12,"containerName":"_get_overlap_stats","name":"new"},{"definition":"my","line":1468,"localvar":"my","containerName":"_get_overlap_stats","kind":13,"name":"$target_obj"},{"definition":"my","line":1469,"localvar":"my","kind"
:13,"containerName":"_get_overlap_stats","name":"$target_id"},{"line":1470,"name":"$best_score","localvar":"my","containerName":"_get_overlap_stats","kind":13,"definition":"my"},{"kind":13,"localvar":"my","containerName":"_get_overlap_stats","name":"$best_length","line":1471,"definition":"my"},{"name":"$best_identity","localvar":"my","containerName":"_get_overlap_stats","kind":13,"line":1472,"definition":"my"},{"definition":"my","line":1474,"name":"$j","localvar":"my","containerName":"_get_overlap_stats","kind":13},{"line":1474,"name":"$i","kind":13,"containerName":"_get_overlap_stats"},{"line":1474,"containerName":"_get_overlap_stats","kind":13,"name":"$j"},{"name":"$j","kind":13,"containerName":"_get_overlap_stats","line":1474},{"definition":"my","kind":13,"localvar":"my","containerName":"_get_overlap_stats","name":"$tmp_target_obj","line":1475},{"line":1475,"kind":13,"containerName":"_get_overlap_stats","name":"$all_seq_objs"},{"line":1475,"kind":13,"containerName":"_get_overlap_stats","name":"$j"},{"definition":"my","line":1476,"localvar":"my","kind":13,"containerName":"_get_overlap_stats","name":"$tmp_target_id"},{"line":1476,"kind":13,"containerName":"_get_overlap_stats","name":"$tmp_target_obj"},{"name":"id","kind":12,"containerName":"_get_overlap_stats","line":1476},{"line":1479,"name":"$seq_hash","kind":13,"containerName":"_get_overlap_stats"},{"line":1479,"containerName":"_get_overlap_stats","kind":13,"name":"$tmp_target_id"},{"definition":"my","line":1482,"name":"$aln_obj","kind":13,"localvar":"my","containerName":"_get_overlap_stats"},{"line":1482,"kind":13,"containerName":"_get_overlap_stats","name":"$tmp_length"},{"line":1482,"kind":13,"containerName":"_get_overlap_stats","name":"$tmp_identity"},{"line":1483,"containerName":"_get_overlap_stats","kind":13,"name":"$self"},{"line":1483,"kind":12,"containerName":"_get_overlap_stats","name":"_overlap_alignment"},{"line":1483,"name":"$contig_obj","containerName":"_get_overlap_stats","kind":13},{"kind":13,"co
ntainerName":"_get_overlap_stats","name":"$seq_obj","line":1483},{"name":"$tmp_target_obj","containerName":"_get_overlap_stats","kind":13,"line":1483},{"kind":13,"containerName":"_get_overlap_stats","name":"$aln_obj","line":1484},{"definition":"my","line":1485,"name":"$tmp_score","kind":13,"localvar":"my","containerName":"_get_overlap_stats"},{"line":1485,"name":"$stats","kind":13,"containerName":"_get_overlap_stats"},{"line":1485,"containerName":"_get_overlap_stats","kind":12,"name":"score_nuc"},{"line":1485,"kind":13,"containerName":"_get_overlap_stats","name":"$aln_obj"},{"kind":13,"containerName":"_get_overlap_stats","name":"$best_score","line":1488},{"name":"$best_score","containerName":"_get_overlap_stats","kind":13,"line":1488},{"line":1488,"kind":13,"containerName":"_get_overlap_stats","name":"$tmp_score"},{"line":1489,"kind":13,"containerName":"_get_overlap_stats","name":"$best_score"},{"name":"$tmp_score","containerName":"_get_overlap_stats","kind":13,"line":1489},{"line":1490,"containerName":"_get_overlap_stats","kind":13,"name":"$best_length"},{"containerName":"_get_overlap_stats","kind":13,"name":"$tmp_length","line":1490},{"line":1491,"containerName":"_get_overlap_stats","kind":13,"name":"$best_identity"},{"containerName":"_get_overlap_stats","kind":13,"name":"$tmp_identity","line":1491},{"line":1492,"containerName":"_get_overlap_stats","kind":13,"name":"$target_obj"},{"line":1492,"kind":13,"containerName":"_get_overlap_stats","name":"$tmp_target_obj"},{"line":1493,"kind":13,"containerName":"_get_overlap_stats","name":"$target_id"},{"kind":13,"containerName":"_get_overlap_stats","name":"$tmp_target_id","line":1493},{"kind":13,"containerName":"_get_overlap_stats","name":"$best_score","line":1498},{"containerName":"_get_overlap_stats","kind":13,"name":"$avg_length","line":1499},{"kind":13,"containerName":"_get_overlap_stats","name":"$best_length","line":1499},{"kind":13,"containerName":"_get_overlap_stats","name":"$avg_identity","line":1500},{"line":1500
,"name":"$best_identity","containerName":"_get_overlap_stats","kind":13},{"kind":13,"containerName":"_get_overlap_stats","name":"$min_length","line":1501},{"line":1501,"kind":13,"containerName":"_get_overlap_stats","name":"$best_length"},{"kind":13,"containerName":"_get_overlap_stats","name":"$min_length","line":1501},{"name":"$best_length","kind":13,"containerName":"_get_overlap_stats","line":1502},{"line":1502,"name":"$min_length","containerName":"_get_overlap_stats","kind":13},{"kind":13,"containerName":"_get_overlap_stats","name":"$min_identity","line":1503},{"containerName":"_get_overlap_stats","kind":13,"name":"$best_identity","line":1503},{"name":"$min_identity","containerName":"_get_overlap_stats","kind":13,"line":1503},{"kind":13,"containerName":"_get_overlap_stats","name":"$best_identity","line":1504},{"line":1504,"containerName":"_get_overlap_stats","kind":13,"name":"$min_identity"},{"containerName":"_get_overlap_stats","kind":13,"name":"$nof_overlaps","line":1505},{"name":"$nof_overlaps","containerName":"_get_overlap_stats","kind":13,"line":1511},{"name":"$avg_length","kind":13,"containerName":"_get_overlap_stats","line":1512},{"line":1512,"name":"$nof_overlaps","kind":13,"containerName":"_get_overlap_stats"},{"line":1513,"name":"$avg_identity","containerName":"_get_overlap_stats","kind":13},{"containerName":"_get_overlap_stats","kind":13,"name":"$nof_overlaps","line":1513},{"name":"$nof_overlaps","containerName":"_get_overlap_stats","kind":13,"line":1516},{"name":"$min_length","containerName":"_get_overlap_stats","kind":13,"line":1516},{"line":1516,"name":"$avg_length","kind":13,"containerName":"_get_overlap_stats"},{"line":1516,"containerName":"_get_overlap_stats","kind":13,"name":"$min_identity"},{"name":"$avg_identity","kind":13,"containerName":"_get_overlap_stats","line":1516}],"containerName":"main::","name":"_get_overlap_stats","definition":"sub","detail":"($self,$assembly_obj,$seq_hash)"},{"line":1460,"kind":12,"name":"seq_hash"},{"containerName"
:"Align::PairwiseStatistics","kind":12,"name":"Bio","line":1467},{"kind":12,"name":"seq_hash","line":1479},{"detail":"($self,$contig,$qseq,$tseq,$min_overlap,$min_identity)","definition":"sub","containerName":"main::","name":"_overlap_alignment","children":[{"kind":13,"localvar":"my","containerName":"_overlap_alignment","name":"$self","line":1540,"definition":"my"},{"line":1540,"name":"$contig","kind":13,"containerName":"_overlap_alignment"},{"line":1540,"name":"$qseq","containerName":"_overlap_alignment","kind":13},{"line":1540,"name":"$tseq","kind":13,"containerName":"_overlap_alignment"},{"line":1540,"kind":13,"containerName":"_overlap_alignment","name":"$min_overlap"},{"kind":13,"containerName":"_overlap_alignment","name":"$min_identity","line":1540},{"localvar":"my","kind":13,"containerName":"_overlap_alignment","name":"$qpos","line":1542,"definition":"my"},{"line":1542,"name":"$contig","kind":13,"containerName":"_overlap_alignment"},{"containerName":"_overlap_alignment","kind":12,"name":"get_seq_coord","line":1542},{"kind":13,"containerName":"_overlap_alignment","name":"$qseq","line":1542},{"line":1543,"name":"$qstart","containerName":"_overlap_alignment","localvar":"my","kind":13,"definition":"my"},{"line":1543,"kind":13,"containerName":"_overlap_alignment","name":"$qpos"},{"name":"start","kind":12,"containerName":"_overlap_alignment","line":1543},{"localvar":"my","kind":13,"containerName":"_overlap_alignment","name":"$qend","line":1544,"definition":"my"},{"kind":13,"containerName":"_overlap_alignment","name":"$qpos","line":1544},{"kind":12,"containerName":"_overlap_alignment","name":"end","line":1544},{"definition":"my","line":1546,"name":"$tpos","containerName":"_overlap_alignment","localvar":"my","kind":13},{"kind":13,"containerName":"_overlap_alignment","name":"$contig","line":1546},{"containerName":"_overlap_alignment","kind":12,"name":"get_seq_coord","line":1546},{"line":1546,"name":"$tseq","containerName":"_overlap_alignment","kind":13},{"localvar":"my
","containerName":"_overlap_alignment","kind":13,"name":"$tstart","line":1547,"definition":"my"},{"line":1547,"containerName":"_overlap_alignment","kind":13,"name":"$tpos"},{"name":"start","containerName":"_overlap_alignment","kind":12,"line":1547},{"definition":"my","kind":13,"localvar":"my","containerName":"_overlap_alignment","name":"$tend","line":1548},{"name":"$tpos","kind":13,"containerName":"_overlap_alignment","line":1548},{"line":1548,"kind":12,"containerName":"_overlap_alignment","name":"end"},{"containerName":"_overlap_alignment","kind":13,"name":"$qstart","line":1550},{"kind":13,"containerName":"_overlap_alignment","name":"$tend","line":1550},{"line":1550,"name":"$qend","kind":13,"containerName":"_overlap_alignment"},{"kind":13,"containerName":"_overlap_alignment","name":"$tstart","line":1550},{"definition":"my","name":"$left","containerName":"_overlap_alignment","localvar":"my","kind":13,"line":1552},{"line":1552,"kind":13,"containerName":"_overlap_alignment","name":"$qstart"},{"line":1553,"kind":13,"containerName":"_overlap_alignment","name":"$left"},{"line":1553,"name":"$tstart","kind":13,"containerName":"_overlap_alignment"},{"name":"$qstart","kind":13,"containerName":"_overlap_alignment","line":1553},{"line":1553,"containerName":"_overlap_alignment","kind":13,"name":"$tstart"},{"line":1554,"name":"$right","localvar":"my","containerName":"_overlap_alignment","kind":13,"definition":"my"},{"line":1554,"containerName":"_overlap_alignment","kind":13,"name":"$qend"},{"name":"$right","kind":13,"containerName":"_overlap_alignment","line":1555},{"name":"$tend","kind":13,"containerName":"_overlap_alignment","line":1555},{"line":1555,"kind":13,"containerName":"_overlap_alignment","name":"$qend"},{"containerName":"_overlap_alignment","kind":13,"name":"$tend","line":1555},{"line":1556,"localvar":"my","containerName":"_overlap_alignment","kind":13,"name":"$overlap","definition":"my"},{"name":"$right","containerName":"_overlap_alignment","kind":13,"line":1556},{"l
ine":1556,"name":"$left","kind":13,"containerName":"_overlap_alignment"},{"containerName":"_overlap_alignment","kind":13,"name":"$min_overlap","line":1557},{"name":"$overlap","kind":13,"containerName":"_overlap_alignment","line":1557},{"line":1557,"name":"$min_overlap","kind":13,"containerName":"_overlap_alignment"},{"line":1559,"localvar":"my","kind":13,"containerName":"_overlap_alignment","name":"$qleft","definition":"my"},{"name":"$contig","containerName":"_overlap_alignment","kind":13,"line":1559},{"line":1559,"name":"change_coord","kind":12,"containerName":"_overlap_alignment"},{"line":1559,"containerName":"_overlap_alignment","kind":13,"name":"$qseq"},{"name":"id","kind":12,"containerName":"_overlap_alignment","line":1559},{"containerName":"_overlap_alignment","kind":13,"name":"$left","line":1560},{"line":1561,"kind":13,"localvar":"my","containerName":"_overlap_alignment","name":"$qright","definition":"my"},{"line":1561,"containerName":"_overlap_alignment","kind":13,"name":"$qleft"},{"name":"$overlap","containerName":"_overlap_alignment","kind":13,"line":1561},{"name":"$qstring","kind":13,"localvar":"my","containerName":"_overlap_alignment","line":1562,"definition":"my"},{"line":1562,"name":"$qseq","kind":13,"containerName":"_overlap_alignment"},{"containerName":"_overlap_alignment","kind":12,"name":"seq","line":1562},{"line":1563,"name":"$qstring","kind":13,"containerName":"_overlap_alignment"},{"containerName":"_overlap_alignment","kind":13,"name":"$qstring","line":1563},{"kind":13,"containerName":"_overlap_alignment","name":"$qleft","line":1563},{"line":1563,"containerName":"_overlap_alignment","kind":13,"name":"$overlap"},{"line":1564,"name":"$tleft","kind":13,"localvar":"my","containerName":"_overlap_alignment","definition":"my"},{"name":"$contig","kind":13,"containerName":"_overlap_alignment","line":1564},{"kind":12,"containerName":"_overlap_alignment","name":"change_coord","line":1564},{"line":1564,"name":"$tseq","containerName":"_overlap_alignment","ki
nd":13},{"line":1564,"name":"id","kind":12,"containerName":"_overlap_alignment"},{"name":"$left","kind":13,"containerName":"_overlap_alignment","line":1565},{"definition":"my","line":1566,"containerName":"_overlap_alignment","localvar":"my","kind":13,"name":"$tright"},{"kind":13,"containerName":"_overlap_alignment","name":"$tleft","line":1566},{"line":1566,"name":"$overlap","containerName":"_overlap_alignment","kind":13},{"line":1567,"name":"$tstring","localvar":"my","containerName":"_overlap_alignment","kind":13,"definition":"my"},{"line":1567,"name":"$tseq","containerName":"_overlap_alignment","kind":13},{"line":1567,"name":"seq","containerName":"_overlap_alignment","kind":12},{"containerName":"_overlap_alignment","kind":13,"name":"$tstring","line":1568},{"kind":13,"containerName":"_overlap_alignment","name":"$tstring","line":1568},{"line":1568,"name":"$tleft","kind":13,"containerName":"_overlap_alignment"},{"name":"$overlap","kind":13,"containerName":"_overlap_alignment","line":1568},{"definition":"my","name":"$pos","localvar":"my","kind":13,"containerName":"_overlap_alignment","line":1570},{"name":"$pos","kind":13,"containerName":"_overlap_alignment","line":1570},{"line":1570,"name":"$overlap","kind":13,"containerName":"_overlap_alignment"},{"line":1570,"name":"$pos","kind":13,"containerName":"_overlap_alignment"},{"name":"$qnt","containerName":"_overlap_alignment","localvar":"my","kind":13,"line":1571,"definition":"my"},{"line":1571,"kind":13,"containerName":"_overlap_alignment","name":"$qstring"},{"name":"$pos","kind":13,"containerName":"_overlap_alignment","line":1571},{"definition":"my","kind":13,"localvar":"my","containerName":"_overlap_alignment","name":"$tnt","line":1572},{"line":1572,"name":"$tstring","containerName":"_overlap_alignment","kind":13},{"line":1572,"containerName":"_overlap_alignment","kind":13,"name":"$pos"},{"line":1573,"name":"$qnt","containerName":"_overlap_alignment","kind":13},{"line":1573,"containerName":"_overlap_alignment","kind":13
,"name":"$tnt"},{"line":1574,"name":"$qstring","kind":13,"containerName":"_overlap_alignment"},{"line":1574,"kind":13,"containerName":"_overlap_alignment","name":"$pos"},{"name":"$tstring","kind":13,"containerName":"_overlap_alignment","line":1575},{"line":1575,"containerName":"_overlap_alignment","kind":13,"name":"$pos"},{"containerName":"_overlap_alignment","kind":13,"name":"$pos","line":1576},{"name":"$overlap","containerName":"_overlap_alignment","kind":13,"line":1577},{"line":1580,"containerName":"_overlap_alignment","kind":13,"name":"$min_overlap"},{"line":1580,"name":"$overlap","kind":13,"containerName":"_overlap_alignment"},{"line":1580,"name":"$min_overlap","containerName":"_overlap_alignment","kind":13},{"definition":"my","name":"$aln","kind":13,"localvar":"my","containerName":"_overlap_alignment","line":1582},{"name":"new","containerName":"_overlap_alignment","kind":12,"line":1582},{"definition":"my","containerName":"_overlap_alignment","localvar":"my","kind":13,"name":"$qalseq","line":1583},{"kind":12,"containerName":"_overlap_alignment","name":"new","line":1583},{"name":"$qstring","containerName":"_overlap_alignment","kind":13,"line":1585},{"kind":13,"containerName":"_overlap_alignment","name":"$aln","line":1589},{"line":1589,"name":"add_seq","kind":12,"containerName":"_overlap_alignment"},{"line":1589,"name":"$qalseq","containerName":"_overlap_alignment","kind":13},{"line":1590,"containerName":"_overlap_alignment","localvar":"my","kind":13,"name":"$talseq","definition":"my"},{"line":1590,"name":"new","kind":12,"containerName":"_overlap_alignment"},{"containerName":"_overlap_alignment","kind":13,"name":"$tstring","line":1592},{"line":1596,"kind":13,"containerName":"_overlap_alignment","name":"$aln"},{"line":1596,"containerName":"_overlap_alignment","kind":12,"name":"add_seq"},{"line":1596,"name":"$talseq","kind":13,"containerName":"_overlap_alignment"},{"name":"$identity","containerName":"_overlap_alignment","localvar":"my","kind":13,"line":1598,"defini
tion":"my"},{"line":1598,"kind":13,"containerName":"_overlap_alignment","name":"$aln"},{"kind":12,"containerName":"_overlap_alignment","name":"overall_percentage_identity","line":1598},{"kind":13,"containerName":"_overlap_alignment","name":"$min_identity","line":1599},{"line":1599,"name":"$identity","containerName":"_overlap_alignment","kind":13},{"name":"$min_identity","kind":13,"containerName":"_overlap_alignment","line":1599},{"line":1601,"kind":13,"containerName":"_overlap_alignment","name":"$aln"},{"containerName":"_overlap_alignment","kind":13,"name":"$overlap","line":1601},{"kind":13,"containerName":"_overlap_alignment","name":"$identity","line":1601}],"signature":{"documentation":"__END__\n#\n# BioPerl module for Bio::Assembly::Tools::ContigSpectrum\n#\n# Copyright by Florent Angly\n#\n# You may distribute this module under the same terms as Perl itself\n#\n# POD documentation - main docs before the code\n\n=head1 NAME\n\nBio::Assembly::Tools::ContigSpectrum - create and manipulate contig spectra\n\n=head1 SYNOPSIS\n\n  # Simple contig spectrum creation\n  my $csp1 = Bio::Assembly::Tools::ContigSpectrum->new(\n    -id       => 'csp1',\n    -spectrum => { 1 => 10,\n                   2 => 2,\n                   3 => 1 } );\n\n  # ...or another way to create a simple contig spectrum\n  my $csp2 = Bio::Assembly::Tools::ContigSpectrum->new;\n  $csp2->id('csp2');\n  $csp2->spectrum({ 1 => 20, 2 => 1, 4 => 1 });\n\n  # Get some information\n  print \"This is contig spectrum \".$csp->id.\"\\n\";\n  print \"It contains \".$csp->nof_seq.\" sequences\\n\";\n  print \"The largest contig has \".$csp->max_size.\" sequences\\n\";\n  print \"The spectrum is: \".$csp->to_string($csp->spectrum).\"\\n\";\n\n  # Let's add the contig spectra\n  my $summed_csp = Bio::Assembly::Tools::ContigSpectrum->new;\n  $summed_csp->add($csp1);\n  $summed_csp->add($csp2);\n  print \"The summed contig spectrum is \".$summed_csp->to_string.\"\\n\";\n\n  # Make an average\n  my $avg_csp = 
Bio::Assembly::Tools::ContigSpectrum->new;\n  $avg_csp = $avg_csp->average([$csp1, $csp2]);\n  print \"The average contig spectrum is \".$avg_csp->to_string.\"\\n\";\n\n  # Get a contig spectrum from an assembly\n  my $from_assembly = Bio::Assembly::Tools::ContigSpectrum->new(\n    -assembly       => $assembly_object,\n    -eff_asm_params => 1);\n  print \"The contig spectrum from assembly is \".$from_assembly->to_string.\"\\n\";\n\n  # Report advanced information (possible because eff_asm_params = 1)\n  print \"Average sequence length: \".$from_assembly->avg_seq_length.\" bp\\n\";\n  print \"Minimum overlap length: \".$from_assembly->min_overlap.\" bp\\n\";\n  print \"Average overlap length: \".$from_assembly->avg_overlap.\" bp\\n\";\n  print \"Minimum overlap match: \".$from_assembly->min_identity.\" %\\n\";\n  print \"Average overlap match: \".$from_assembly->avg_identity.\" %\\n\";\n\n  # Assuming the assembly object contains sequences from several different\n  # metagenomes, we have a mixed contig spectrum from which a cross contig\n  # spectrum and dissolved contig spectra can be obtained\n  my $mixed_csp = $from_assembly;\n\n  # Calculate a dissolved contig spectrum\n  my $meta1_dissolved = Bio::Assembly::Tools::ContigSpectrum->new(\n    -dissolve => [$mixed_csp, 'metagenome1'] );\n  my $meta2_dissolved = Bio::Assembly::Tools::ContigSpectrum->new(\n    -dissolve => [$mixed_csp, 'metagenome2'] );\n  print \"The dissolved contig spectra are:\\n\".\n    $meta1_dissolved->to_string.\"\\n\".\n    $meta2_dissolved->to_string.\"\\n\";\n\n  # Determine a cross contig spectrum\n  my $cross_csp = Bio::Assembly::Tools::ContigSpectrum->new(\n    -cross => $mixed_csp );\n  print \"The cross contig spectrum is \".$cross_csp->to_string.\"\\n\";\n\n  # Score a contig spectrum (the more abundant the contigs and the larger their\n  # size, the larger the score)\n\n\n=head1 DESCRIPTION\n\nThe Bio::Assembly::Tools::ContigSpectrum Perl module enables to\nmanually create contig 
spectra, import them from assemblies,\nmanipulate them, transform between different types of contig spectra\nand output them.\n\nBio::Assembly::Tools::ContigSpectrum is a module to create, manipulate\nand output contig spectra, assembly-derived data used in metagenomics\n(community genomics) for diversity estimation.\n\n=head2 Background\n\nA contig spectrum is the count of the number of contigs of different\nsize in an assembly. For example, the contig spectrum [100 5 1 0 0\n...] means that there were 100 singlets (1-contigs), 5 contigs of 2\nsequences (2-contigs), 1 contig of 3 sequences (3-contig) and no\nlarger contigs.\n\nAn assembly can be produced from a mixture of sequences from different\nmetagenomes. The contig spectrum obtained from this assembly is a mixed contig\nspectrum. The contribution of each metagenome in this mixed contig\nspectrum can be obtained by determining a dissolved contig spectrum.\n\nFinally, based on a mixed contig spectrum, a cross contig spectrum can\nbe determined. In a cross contig spectrum, only contigs containing\nsequences from different metagenomes are kept; \"pure\" contigs are\nexcluded. Additionally, the total number of singletons (1-contigs)\nfrom each region that assembles with any fragments from other regions\nis the number of 1-contigs in the cross contig spectrum.\n\n=head2 Implementation\n\nThe simplest representation of a contig spectrum is a hash\nin which the key is the contig size (number of sequences\nmaking up the contig) and the value is the number of contigs of this\nsize.\n\nIn fact, it is useful to have more information associated with the\ncontig spectrum, hence the Bio::Assembly::Tools::ContigSpectrum module\nimplements an object containing a contig spectrum hash and additional\ninformation. 
The get/set methods to access them are:\n\n    id              contig spectrum ID\n    nof_seq         number of sequences\n    nof_rep         number of repetitions (assemblies) used\n    max_size        size of (number of sequences in) the largest contig\n    nof_overlaps    number of overlaps\n    min_overlap     minimum overlap length for building a contig\n    min_identity    minimum sequence identity over the overlap length\n    avg_overlap     average overlap length\n    avg_identity    average overlap identity\n    avg_seq_length  average sequence length\n    eff_asm_params  effective assembly parameters\n    spectrum        hash representation of a contig spectrum\n\n  Operations on the contig spectra:\n\n    to_string       create a string representation of the spectrum\n    spectrum        import a hash contig spectrum\n    assembly        determine a contig spectrum from an assembly\n    dissolve        calculate a dissolved contig spectrum (based on assembly)\n    cross           produce a cross contig spectrum (based on assembly)\n    add             add a contig spectrum to an existing one\n    average         make an average of several contig spectra\n\nWhen using operations that rely on knowing \"where\" (from what\nmetagenomes) a sequence came from (i.e. when creating a dissolved or\ncross contig spectrum), make sure that the sequences used for the\nassembly have a name header, e.g.  E<gt>metagenome1|seq1,\nE<gt>metagenome2|seq1, ...\n\n=head1 FEEDBACK\n\n=head2 Mailing Lists\n\nUser feedback is an integral part of the evolution of this and other\nBioperl modules. 
Send your comments and suggestions preferably to the\nBioperl mailing lists. Your participation is much appreciated.\n\n  bioperl-l@bioperl.org                  - General discussion\n  http://bioperl.org/wiki/Mailing_lists  - About the mailing lists\n\n\n=head2 Support \n\nPlease direct usage questions or support issues to the mailing list:\n\nI<bioperl-l@bioperl.org>\n\nrather than to the module maintainer directly. Many experienced and \nresponsive experts will be able to look at the problem and quickly \naddress it. Please include a thorough description of the problem \nwith code and data examples if at all possible.\n\n=head2 Reporting Bugs\n\nReport bugs to the BioPerl bug tracking system to help us keep track\nof the bugs and their resolution. Bug reports can be submitted via email\nor the web:\n\n  bioperl-bugs@bio.perl.org\n  http://bugzilla.bioperl.org/\n\n=head1 AUTHOR - Florent E Angly\n\nEmail florent_dot_angly_at_gmail_dot_com\n\n=head1 APPENDIX\n\nThe rest of the documentation details each of the object\nmethods. 
Internal methods are usually preceded with a \"_\".\n\n\npackage Bio::Assembly::Tools::ContigSpectrum;\n\nuse strict;\n\nuse Bio::Root::Root;\nuse Bio::Assembly::Scaffold;\nuse Bio::SimpleAlign;\nuse Bio::LocatableSeq;\nuse Bio::Align::PairwiseStatistics;\n\nuse base 'Bio::Root::Root';\n\n\n=head2 new\n\n  Title   : new\n  Usage   : my $csp = Bio::Assembly::Tools::ContigSpectrum->new();\n              or\n            my $csp = Bio::Assembly::Tools::ContigSpectrum->new(\n              -id => 'some_name',\n              -spectrum =>  { 1 => 90 , 2 => 3 , 4 => 1 },\n            );\n              or\n            my $csp = Bio::Assembly::Tools::ContigSpectrum->new(\n              -assembly =>  $assembly_obj\n            );\n  Function: create a new contig spectrum object\n  Returns : reference to a contig spectrum object\n  Args    : none\n\n\nsub new {\n  my ($class, @args) = @_;\n  my $self = $class->SUPER::new(@args);\n  my ( $id, $nof_seq, $nof_rep, $max_size, $nof_overlaps, $min_overlap,\n    $min_identity, $avg_overlap, $avg_identity, $avg_seq_len, $spectrum,\n    $assembly, $eff_asm_params, $dissolve, $cross) = $self->_rearrange( [qw(ID\n    NOF_SEQ NOF_REP MAX_SIZE NOF_OVERLAPS MIN_OVERLAP MIN_IDENTITY AVG_OVERLAP\n    AVG_IDENTITY AVG_SEQ_LEN SPECTRUM ASSEMBLY EFF_ASM_PARAMS DISSOLVE CROSS)],\n    @args );\n\n  # First set up some defauts\n  $self->{'_id'}             = 'NoName';\n  $self->{'_nof_seq'}        = 0;\n  $self->{'_nof_rep'}        = 0;\n  $self->{'_max_size'}       = 0;\n  $self->{'_nof_overlaps'}   = 0;\n  $self->{'_min_overlap'}    = undef;\n  $self->{'_min_identity'}   = undef;\n  $self->{'_avg_overlap'}    = 0;\n  $self->{'_avg_identity'}   = 0;\n  $self->{'_avg_seq_len'}    = 0;\n  $self->{'_eff_asm_params'} = 0;\n  $self->{'_spectrum'}       = {1 => 0};  # contig spectrum hash representation\n  $self->{'_assembly'}       = []; # list of assembly objects used\n\n  # Then, according to user desires, override defaults\n  $self->{'_id'}           
  = $id             if (defined $id);\n  $self->{'_nof_seq'}        = $nof_seq        if (defined $nof_seq);\n  $self->{'_nof_rep'}        = $nof_rep        if (defined $nof_rep);\n  $self->{'_max_size'}       = $max_size       if (defined $max_size);\n  $self->{'_nof_overlaps'}   = $nof_overlaps   if (defined $nof_overlaps);\n  $self->{'_min_overlap'}    = $min_overlap    if (defined $min_overlap);\n  $self->{'_avg_overlap'}    = $avg_overlap    if (defined $avg_overlap);\n  $self->{'_min_identity'}   = $min_identity   if (defined $min_identity);\n  $self->{'_avg_identity'}   = $avg_identity   if (defined $avg_identity);\n  $self->{'_avg_seq_len'}    = $avg_seq_len    if (defined $avg_seq_len);\n  $self->{'_eff_asm_params'} = $eff_asm_params if (defined $eff_asm_params);\n\n  # Finally get stuff that can be gotten in an automated way\n  $self->_import_spectrum($spectrum) if defined($spectrum);\n  $self->_import_assembly($assembly) if defined($assembly);\n  if (defined($dissolve)) {\n    my ($mixed_csp, $header) = (@$dissolve[0], @$dissolve[1]);\n    $self->_import_dissolved_csp($mixed_csp, $header);\n  }\n  $self->_import_cross_csp($cross)   if defined($cross);\n\n  return $self;\n}\n\n\n=head2 id\n\n  Title   : id\n  Usage   : $csp->id\n  Function: get/set contig spectrum id\n  Returns : string\n  Args    : string [optional]\n\n\nsub id {\n  my ($self, $id) = @_;\n  if (defined $id) {\n    $self->{'_id'} = $id;\n  }\n  $id = $self->{'_id'};\n  return $id;\n}\n\n\n=head2 nof_seq\n\n  Title   : nof_seq\n  Usage   : $csp->nof_seq\n  Function: get/set the number of sequences making up the contig spectrum\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub nof_seq {\n  my ($self, $nof_seq) = @_;\n  if (defined $nof_seq) {\n    $self->throw(\"The number of sequences must be strictly positive. 
Got \".\n      \"'$nof_seq'\") if $nof_seq < 1;\n    $self->{'_nof_seq'} = $nof_seq;\n  }\n  $nof_seq = $self->{'_nof_seq'};\n  return $nof_seq;\n}\n\n\n=head2 nof_rep\n\n  Title   : nof_rep\n  Usage   : $csp->nof_rep\n  Function: Get/Set the number of repetitions (assemblies) used to create the \n            contig spectrum\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub nof_rep {\n  my ($self, $nof_rep) = @_;\n  if (defined $nof_rep) {\n    $self->throw(\"The number of repetitions must be strictly positive. Got \".\n      \"'$nof_rep'\") if $nof_rep < 1;\n    $self->{'_nof_rep'} = $nof_rep;\n  }\n  $nof_rep = $self->{'_nof_rep'};\n  return $nof_rep;\n}\n\n\n=head2 max_size\n\n  Title   : max_size\n  Usage   : $csp->max_size\n  Function: get/set the size of (number of sequences in) the largest contig\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub max_size {\n  my ($self, $max_size) = @_;\n  if (defined $max_size) {\n    $self->throw(\"The contig maximum size must be strictly positive. Got \".\n      \"'$max_size'\") if $max_size < 1;\n    $self->{'_max_size'} = $max_size;\n  }\n  $max_size = $self->{'_max_size'};\n  return $max_size;\n}\n\n\n=head2 nof_overlaps\n\n  Title   : nof_overlaps\n  Usage   : $csp->nof_overlaps\n  Function: Get/Set the number of overlaps in the assembly.\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub nof_overlaps {\n  my ($self, $nof_overlaps) = @_;\n  if (defined $nof_overlaps) {\n    $self->throw(\"The number of overlaps must be strictly positive. 
Got \".\n      \"'$nof_overlaps'\") if $nof_overlaps < 1;\n    $self->{'_nof_overlaps'} = $nof_overlaps;\n  }\n  $nof_overlaps = $self->{'_nof_overlaps'};\n  return $nof_overlaps;\n}\n\n\n=head2 min_overlap\n\n  Title   : min_overlap\n  Usage   : $csp->min_overlap\n  Function: get/set the assembly minimum overlap length\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub min_overlap {\n  my ($self, $min_overlap) = @_;\n  if (defined $min_overlap) {\n    $self->throw(\"The minimum of overlap length must be strictly positive. Got\".\n      \" '$min_overlap'\") if $min_overlap < 1;\n    $self->{'_min_overlap'} = $min_overlap;\n  }\n  $min_overlap = $self->{'_min_overlap'};\n  return $min_overlap;\n}\n\n\n=head2 avg_overlap\n\n  Title   : avg_overlap\n  Usage   : $csp->avg_overlap\n  Function: get/set the assembly average overlap length\n  Returns : decimal\n  Args    : decimal [optional]\n\n\nsub avg_overlap {\n  my ($self, $avg_overlap) = @_;\n  if (defined $avg_overlap) {\n    $self->throw(\"The average overlap length must be strictly positive. Got \".\n      \"'$avg_overlap'\") if $avg_overlap < 1;\n    $self->{'_avg_overlap'} = $avg_overlap;\n  }\n  $avg_overlap = $self->{'_avg_overlap'};\n  return $avg_overlap;\n}\n\n\n=head2 min_identity\n\n  Title   : min_identity\n  Usage   : $csp->min_identity\n  Function: get/set the assembly minimum overlap identity percent\n  Returns : 0 < decimal < 100\n  Args    : 0 < decimal < 100 [optional]\n\n\nsub min_identity {\n  my ($self, $min_identity) = @_;\n  if (defined $min_identity) {\n    $self->throw(\"The minimum overlap percent identity must be strictly \".\n      \"positive. 
Got '$min_identity'\") if $min_identity < 1;\n    $self->{'_min_identity'} = $min_identity;\n  }\n  $min_identity = $self->{'_min_identity'};\n  return $min_identity;\n}\n\n\n=head2 avg_identity\n\n  Title   : avg_identity\n  Usage   : $csp->avg_identity\n  Function: get/set the assembly average overlap identity percent\n  Returns : 0 < decimal < 100\n  Args    : 0 < decimal < 100 [optional]\n\n\nsub avg_identity {\n  my ($self, $avg_identity) = @_;\n  if (defined $avg_identity) {\n    $self->throw(\"The average overlap percent identity must be strictly \".\n      \"positive. Got '$avg_identity'\") if $avg_identity < 1;\n    $self->{'_avg_identity'} = $avg_identity;\n  }\n  $avg_identity = $self->{'_avg_identity'};\n  return $avg_identity;\n}\n\n\n=head2 avg_seq_len\n\n  Title   : avg_seq_len\n  Usage   : $csp->avg_seq_len\n  Function: get/set the assembly average sequence length\n  Returns : decimal\n  Args    : decimal [optional]\n\n\nsub avg_seq_len {\n  my ($self, $avg_seq_len) = @_;\n  if (defined $avg_seq_len) {\n    $self->throw(\"The average sequence length must be strictly positive. Got \".\n      \"'$avg_seq_len'\") if $avg_seq_len < 1;\n    $self->{'_avg_seq_len'} = $avg_seq_len;\n  }\n  $avg_seq_len = $self->{'_avg_seq_len'};\n  return $avg_seq_len;\n}\n\n\n=head2 eff_asm_params\n\n  Title   : eff_asm_params\n  Usage   : $csp->eff_asm_params(1)\n  Function: Get/set the effective assembly parameters option. It defines whether\n            effective assembly parameters should be determined when a contig\n            spectrum based on or derived from an assembly is calculated. 
The\n            effective assembly parameters include avg_seq_len, nof_overlaps,\n            min_overlap, avg_overlap, min_identity and avg_identity.\n            1 = get them, 0 = don't.\n  Returns : integer\n  Args    : integer [optional]\n\n\nsub eff_asm_params {\n  my ($self, $eff_asm_params) = @_;\n  if (defined $eff_asm_params) {\n    $self->throw(\"eff_asm_params can only take values 0 or 1. Input value was \".\n      \"'$eff_asm_params'\") unless $eff_asm_params == 0 || $eff_asm_params == 1;\n    $self->{'_eff_asm_params'} = $eff_asm_params;\n  }\n  $eff_asm_params = $self->{'_eff_asm_params'};\n  return $eff_asm_params;\n}\n\n\n=head2 spectrum\n\n  Title   : spectrum\n  Usage   : my $spectrum = $csp->spectrum({1=>10, 2=>2, 3=>1});\n  Function: Get the current contig spectrum represented as a hash / Update a\n            contig spectrum object based on a contig spectrum represented as a\n            hash\n            The hash representation of a contig spectrum is as follows:\n              key   -> contig size (in number of sequences)\n              value -> number of contigs of this size\n  Returns : contig spectrum as a hash reference\n  Args    : contig spectrum as a hash reference [optional]\n\n\nsub spectrum {\n  my ($self, $spectrum) = @_;\n  if (defined $spectrum) {\n    $self->_import_spectrum($spectrum);\n  }\n  $spectrum = $self->{'_spectrum'};\n  return $spectrum;\n}\n\n\n=head2 assembly\n\n  Title   : assembly\n  Usage   : my $asm_list = $csp->assembly();\n  Function: Get a reference to the list of assembly objects used to\n            make the contig spectrum object / Update the contig spectrum object\n            based on an assembly object.\n  Returns : array reference of Bio::Assembly::Scaffold objects\n  Args    : Bio::Assembly::Scaffold [optional]\n\n\nsub assembly {\n  my ($self, $assembly) = @_;\n  if (defined $assembly) {\n    $self->_import_assembly($assembly);\n  }\n  # Avoid 'my ... if ...', whose behavior is undefined in Perl\n  my @asm_list = defined $self->{'_assembly'} ? @{$self->{'_assembly'}} : ();\n  
return \\@asm_list;\n}\n\n=head2 drop_assembly\n\n  Title   : drop_assembly\n  Usage   : $csp->drop_assembly();\n  Function: Remove all assembly objects associated with a contig spectrum.\n            Assembly objects can be big. This method allows to free some memory\n            when assembly information is not needed anymore.\n  Returns : 1 for success, 0 for failure\n  Args    : none\n\n\nsub drop_assembly {\n  my ($self) = @_;\n  $self->{'_assembly'} = [];\n  return 1;\n}\n\n=head2 dissolve\n\n  Title   : dissolve\n  Usage   : $dissolved_csp->dissolve($mixed_csp, $seq_header);\n  Function: Dissolve a mixed contig spectrum for the set of sequences that\n            contain the specified header, i.e. determine the contribution of\n            these sequences to the mixed contig spectrum based on the assembly.\n            The mixed contig spectrum object must have been created based on one\n            (or several) assembly object(s). Additionally, min_overlap and\n            min_identity must have been set (either manually using min_overlap\n            or automatically by switching on the eff_asm_params option).\n  Returns : 1 for success, 0 for failure\n  Args    : Bio::Assembly::Tools::ContigSpectrum reference\n            sequence header string\n\n\n\nsub dissolve {\n  my ($self, $mixed_csp, $seq_header) = @_;\n  $self->_import_dissolved_csp($mixed_csp, $seq_header);\n  return 1;\n}\n\n\n=head2 cross\n\n  Title   : cross\n  Usage   : $cross_csp->cross($mixed_csp);\n  Function: Calculate a cross contig_spectrum based on a mixed contig_spectrum.\n  Returns : 1 for success, 0 for failure\n  Args    : Bio::Assembly::Tools::ContigSpectrum reference\n\n\nsub cross {\n  my ($self, $mixed_csp) = @_;\n  $self->_import_cross_csp($mixed_csp);\n  return 1;\n}\n\n=head2 to_string\n\n  Title   : to_string\n  Usage   : my $csp_string = $csp->to_string;\n  Function: Convert the contig spectrum into a string (easy to print!!).\n  Returns : string\n  Args    : element 
separator (integer) [optional]\n              1 -> space-separated\n              2 -> tab-separated\n              3 -> newline-separated\n\n\nsub to_string {\n  my ($self, $element_separator) = @_;\n  return 0 if $self->{'_max_size'} == 0;\n  $element_separator ||= 1;\n  if ($element_separator == 1) {\n    $element_separator = ' ';\n  } elsif ($element_separator == 2) {\n    $element_separator = \"\\t\";\n  } elsif ($element_separator == 3) {\n    $element_separator = \"\\n\";\n  } else {\n    $self->throw(\"Unknown separator type '$element_separator'\\n\");\n  }\n  my $str = '';\n  for (my $q = 1 ; $q <= $self->{'_max_size'} ; $q++) {\n    my $val = 0;\n    if (exists $self->{'_spectrum'}{$q}) {\n      $val = $self->{'_spectrum'}{$q};\n    }\n    $str .= $val.$element_separator;\n  }\n  $str =~ s/\\s$//;\n  return $str;\n}\n\n\n=head2 add\n\n  Title   : add\n  Usage   : $csp->add($additional_csp);\n  Function: Add a contig spectrum to an existing one: sums the spectra, update\n            the number of sequences, number of repetitions, ...\n  Returns : 1 for success, 0 for failure\n  Args    : Bio::Assembly::Tools::ContigSpectrum object\n\n\nsub add {\n  my ($self, $csp) = @_;\n  # Sanity check\n  if( !ref $csp || ! $csp->isa('Bio::Assembly::Tools::ContigSpectrum') ) {\n        $self->throw(\"Unable to process non Bio::Assembly::Tools::ContigSpectrum \".\n        \"object [\".ref($csp).\"]\");\n  }\n  # Update overlap statistics\n  if ( $self->{'_eff_asm_params'} > 0 ) {\n    # Warnings\n    if ( $csp->{'_eff_asm_params'} == 0 ) {\n      $self->warn(\"The parent contig spectrum needs effective assembly \".\n        \"parameters (eff_asm_params = \".$self->{'_eff_asm_params'}.\") but the \".\n        \"child contig spectrum doesn't have them (eff_asm_params = \".\n        $csp->{'_eff_asm_params'}.\"). 
Skipping them...\");\n    } elsif ( $csp->{'_eff_asm_params'} != $self->{'_eff_asm_params'} ) {\n      $self->warn(\"The parent contig spectrum needs a different method for \".\n        \"detecting the effective assembly parameters (eff_asm_params = \".\n        $self->{'_eff_asm_params'}.\") than the one specified for the child \".\n        \"contig spectrum (eff_asm_params = \".$csp->{'_eff_asm_params'}.\"). \".\n        \"Ignoring the differences...\");\n    }\n    # Update existing stats\n    my $tot_num_overlaps = $csp->{'_nof_overlaps'} + $self->{'_nof_overlaps'};\n    $self->{'_min_overlap'} = $csp->{'_min_overlap'} if\n      defined $csp->{'_min_overlap'} && ( ! defined $self->{'_min_overlap'} ||\n      $csp->{'_min_overlap'} < $self->{'_min_overlap'} );\n    $self->{'_min_identity'} = $csp->{'_min_identity'} if\n      defined $csp->{'_min_identity'} && ( ! defined $self->{'_min_identity'} ||\n      $csp->{'_min_identity'} < $self->{'_min_identity'} );\n    if ($tot_num_overlaps != 0) {\n      $self->{'_avg_overlap'} =\n        ($csp->{'_avg_overlap'} * $csp->{'_nof_overlaps'}\n        + $self->{'_avg_overlap'} * $self->{'_nof_overlaps'})\n        / $tot_num_overlaps;\n      $self->{'_avg_identity'} =\n        ($csp->{'_avg_identity'} * $csp->{'_nof_overlaps'}\n        + $self->{'_avg_identity'} * $self->{'_nof_overlaps'})\n        / $tot_num_overlaps;\n    }\n    $self->{'_nof_overlaps'} = $tot_num_overlaps;\n  }\n  # Update sequence statistics\n  my $tot_nof_seq = $csp->{'_nof_seq'} + $self->{'_nof_seq'};\n  if (not $tot_nof_seq == 0) {\n    $self->{'_avg_seq_len'} = ($csp->{'_avg_seq_len'} * $csp->{'_nof_seq'} +\n      $self->{'_avg_seq_len'} * $self->{'_nof_seq'}) / $tot_nof_seq;\n  }\n  # Update spectrum (and nof_seq, max_size, and increment nof_rep by 1)\n  $self->_import_spectrum($csp->{'_spectrum'});\n  # Update nof_rep\n  $self->{'_nof_rep'}--;\n  $self->{'_nof_rep'} += $csp->{'_nof_rep'};\n  # Update list of assembly objects used\n  push 
@{$self->{'_assembly'}}, @{$csp->{'_assembly'}}\n    if defined $csp->{'_assembly'};\n  return 1;\n}\n\n\n=head2 average\n\n  Title   : average\n  Usage   : my $avg_csp = $csp->average([$csp1, $csp2, $csp3]);\n  Function: Average one contig spectrum or the sum of several contig spectra by\n            the number of repetitions\n  Returns : Bio::Assembly::Tools::ContigSpectrum\n  Args    : Bio::Assembly::Tools::ContigSpectrum array reference\n            eff_asm_params\n\n\nsub average {\n  my ($self, $list) = @_;\n  # Sanity check: '! ref $list eq ...' would negate before comparing\n  if ( ref($list) ne 'ARRAY' ) {\n    $self->throw(\"Average takes an array reference but got [\".ref($list).\"]\");\n  }\n  # New average contig spectrum object\n  my $avg = Bio::Assembly::Tools::ContigSpectrum->new;\n  $avg->{'_eff_asm_params'} = 1;\n  \n  # Cycle through contig spectra\n  my $tot_nof_rep = 0;\n  for my $csp (@$list) {\n    # Sanity check (throw from $self: $csp may not be an object)\n    if ( !ref $csp || !$csp->isa('Bio::Assembly::Tools::ContigSpectrum') ) {\n      $self->throw(\"Unable to process non Bio::Assembly::Tools::ContigSpectrum \".\n        \"object [\".ref($csp).\"]\");\n    }\n    # Import contig spectrum\n    $avg->add($csp);\n  }\n  \n  # Average sum of contig spectra by number of repetitions\n  for (my $q = 1 ; $q <= $avg->{'_max_size'} ; $q++) {\n    $avg->{'_spectrum'}{$q} /= $avg->{'_nof_rep'}\n      if (defined $avg->{'_spectrum'}{$q});\n  }\n  # Average number of sequences\n  $avg->{'_nof_seq'} /= $avg->{'_nof_rep'};\n  # Average number of overlaps\n  $avg->{'_nof_overlaps'} /= $avg->{'_nof_rep'};\n  \n  return $avg;\n}\n\n\n=head2 score\n\n  Title   : score\n  Usage   : my $score = $csp->score();\n  Function: Score a contig spectrum (or cross-contig spectrum) such that the\n             higher the number of contigs (or cross-contigs) and the larger their \n             size, the higher the score.\n             Let n   : total number of sequences\n                 c_q : number of contigs of size q\n                 q   : number of sequences 
in a contig\n             We define: score = n/(n-1) * (X - 1/n)\n                  where X = sum ( c_q * q^2 ) / n**2\n             The score ranges from 0 (singlets only) to 1 (a single large contig)\n             It is possible to specify a value for the number of sequences to\n              assume in the contig spectrum.\n  Returns : contig score\n  Args    : number of total sequences to assume [optional]\n\n\nsub score {\n  my ($self, $nof_seqs) = @_;\n  # Main\n  my $score = 0;\n  my $n = $self->nof_seq;\n  if ( $n > 0 ) {\n    # Contig spectrum info\n    my $q_max = $self->max_size;\n    my $spec  = $self->spectrum;\n    # Adjust number of 1-contigs\n    if ( $nof_seqs ) {\n      $spec->{'1'} += $nof_seqs - $n;\n      $n = $nof_seqs;\n    }\n    # Calculate X\n    for my $q ( 1 .. $q_max ) {\n      if ( $spec->{$q} ) {\n        my $c_q = $spec->{$q};\n        $score += $c_q * $q ** 2;\n      }\n    }\n    $score /= $n ** 2; \n  }\n  # Rescale X to obtain the score\n  $score = $n/($n-1) * ($score - 1/$n);\n  return $score;\n}\n\n\n=head2 _naive_assembler\n\n  Title   : _naive_assembler\n  Usage   : \n  Function: Determines the contig spectrum (hash representation) of a subset of\n            sequences from a mixed contig spectrum by \"reassembling\" the\n            specified sequences only based on their position in the contig. This\n            naive assembly only verifies that the minimum overlap length and\n            percentage identity are respected. There is no actual alignment done\n  Returns : contig spectrum hash reference\n  Args    : Bio::Assembly::Contig\n            sequence ID array reference\n            minimum overlap length (integer) [optional]\n            minimum percentage identity (integer) [optional]\n\n\nsub _naive_assembler {\n  my ($self, $contig, $seqlist, $min_overlap, $min_identity) = @_;\n  # Sanity checks\n  if ( ! ref $seqlist || ! ref($seqlist) eq 'ARRAY') {\n    $self->throw('Expecting an array reference. 
Got ['.ref($seqlist).\"] \\n\");\n  }\n  my $max = scalar @$seqlist;\n  $self->throw(\"Expecting at least 2 sequences as input for _naive_assembler\")\n    if ($max < 2);\n  # Assembly\n  my %spectrum = (1 => 0);\n  my %overlap_map;\n  my %has_overlap;\n  # Map what sequences overlap with what sequences\n  for (my $i = 0 ; $i < $max-1 ; $i++) {\n    # query sequence\n    my $qseqid = $$seqlist[$i];\n    my $qseq   = $contig->get_seq_by_name($qseqid);\n    my $is_singlet = 1;\n    for (my $j = $i+1 ; $j < $max ; $j++) {\n      # target sequence\n      my $tseqid = $$seqlist[$j];\n      my $tseq = $contig->get_seq_by_name($tseqid);\n      # try to align sequences\n      my ($aln, $overlap, $identity)\n        = $self->_overlap_alignment($contig, $qseq, $tseq, $min_overlap,\n        $min_identity);\n      # if there is no valid overlap, go to next sequence\n      next if ! defined $aln;\n      # the overlap is valid\n      $is_singlet = 0;\n      push @{$overlap_map{$qseqid}}, $tseqid;\n      $has_overlap{$tseqid} = 1;\n      $has_overlap{$qseqid} = 1;\n    }\n    # check if sequence is in previously seen overlap\n    if (exists $has_overlap{$qseqid}) {\n      $is_singlet = 0;\n    }\n    if ($is_singlet == 1) {\n      $spectrum{1}++;\n    }\n  }\n  # take care of last sequence\n  my $last_is_singlet = 1;\n  if (exists $has_overlap{$$seqlist[$max-1]}) {\n    $last_is_singlet = 0;\n  }\n  if ($last_is_singlet == 1) {\n    $spectrum{1}++;\n  }\n  # Parse overlap map\n  for my $seqid (@$seqlist) {\n    # list of sequences that should go in the contig\n    next if not exists $overlap_map{$seqid};\n    my @overlist = @{$overlap_map{$seqid}};\n    for (my $j = 0 ; $j < scalar(@overlist) ; $j++) {\n      my $otherseqid = $overlist[$j];\n      if (exists $overlap_map{$otherseqid}) {\n        push @overlist, @{$overlap_map{$otherseqid}};\n        delete $overlap_map{$otherseqid};\n      }\n    }\n    # remove duplicates from list\n    @overlist = sort @overlist;\n    for (my 
$j = 0 ; $j < scalar(@overlist)-1 ; $j++) {\n      if ( $overlist[$j] eq $overlist[$j+1] ) {\n        splice @overlist, $j, 1;\n        $j--;\n      }\n    }\n    # update spectrum with size of contig\n    my $qsize = scalar(@overlist) + 1;\n    if (defined $spectrum{$qsize}) {\n      $spectrum{$qsize}++;\n    } else {\n      $spectrum{$qsize} = 1;\n    }\n  }\n  return \\%spectrum;\n}\n\n\n=head2 _new_from_assembly\n\n  Title   : _new_from_assembly\n  Usage   : \n  Function: Creates a new contig spectrum object based solely on the result of \n            an assembly\n  Returns : Bio::Assembly::Tools::ContigSpectrum\n  Args    : Bio::Assembly::Scaffold\n\n\nsub _new_from_assembly {\n  # Create new contig spectrum object based purely on what we can get from the\n  # assembly object\n  my ($self, $assemblyobj) = @_;\n  my $csp = Bio::Assembly::Tools::ContigSpectrum->new();\n  # 1: Set id\n  $csp->{'_id'} = $assemblyobj->id;\n  # 2: Set overlap statistics: nof_overlaps, min_overlap, avg_overlap,\n  #  min_identity and avg_identity\n  $csp->{'_eff_asm_params'} = $self->{'_eff_asm_params'};\n  if ($csp->{'_eff_asm_params'} > 0) {\n     my ($nover, $minl, $avgl, $minid, $avgid)\n       = $csp->_get_overlap_stats($assemblyobj);\n     $csp->{'_min_overlap'}  = $minl;\n     $csp->{'_min_identity'} = $minid;\n     $csp->{'_avg_overlap'}  = $avgl;\n     $csp->{'_avg_identity'} = $avgid;\n     $csp->{'_nof_overlaps'} = $nover;\n  }\n  # 3: Set sequence statistics: nof_seq and avg_seq_len\n  my ($nseq, $avgseql) = $self->_get_seq_stats($assemblyobj);\n  $csp->{'_avg_seq_len'} = $avgseql;\n  $csp->{'_nof_seq'}     = $nseq;\n  # 4: Set the spectrum: spectrum and max_size\n  for my $contigobj ($assemblyobj->all_contigs) {\n    my $size = $contigobj->num_sequences;\n    if (defined $csp->{'_spectrum'}{$size}) {\n      $csp->{'_spectrum'}{$size}++;\n    } else {\n      $csp->{'_spectrum'}{$size} = 1;\n    }\n    $csp->{'_max_size'} = $size if $size > $csp->{'_max_size'};\n  }\n  my 
$nof_singlets = $assemblyobj->get_nof_singlets();\n  if (defined $nof_singlets) {\n    $csp->{'_spectrum'}{1} += $nof_singlets;\n    $csp->{'_max_size'} = 1 if $nof_singlets >= 1 && $csp->{'_max_size'} < 1;\n  }\n  # 5: Set list of assembly objects used\n  push @{$csp->{'_assembly'}}, $assemblyobj;\n  # 6: Set number of repetitions\n  $csp->{'_nof_rep'} = 1;\n  return $csp;\n}\n\n\n\n=head2 _new_dissolved_csp\n\n  Title   : \n  Usage   : create a dissolved contig spectrum object\n  Function: \n  Returns : \n  Args    : \n\n\n\nsub _new_dissolved_csp {\n  my ($self, $mixed_csp, $seq_header) = @_;\n  # Sanity checks on the mixed contig spectrum\n\n  # min_overlap and min_identity must be specified if there are some overlaps\n  # in the mixed contig\n  unless ($mixed_csp->{'_nof_overlaps'} == 0) {\n    unless ( defined $self->{'_min_overlap'} || \n      defined $mixed_csp->{'_min_overlap'} ) {\n      $self->throw(\"min_overlap must be defined in the dissolved contig spectrum\".\n        \" or mixed contig spectrum to dissolve a contig\");\n    }\n    unless ( defined $self->{'_min_identity'} ||\n      defined $mixed_csp->{'_min_identity'} ) {\n      $self->throw(\"min_identity must be defined in the dissolved contig spectrum\".\n        \" or mixed contig spectrum\");\n    }\n  }\n  \n  # there must be at least one assembly in mixed contig spectrum\n  if (!defined $mixed_csp->{'_assembly'} ||\n      scalar @{$mixed_csp->{'_assembly'}} < 1) {\n    $self->throw(\"The mixed contig spectrum must be based on at least one\n    assembly\");\n  }\n\n  # New dissolved contig spectrum object\n  my $dissolved = Bio::Assembly::Tools::ContigSpectrum->new();\n  \n  # take parent attributes if existent or mixed attributes otherwise\n  if ($self->{'_eff_asm_params'}) {\n    $dissolved->{'_eff_asm_params'} = $self->{'_eff_asm_params'};\n  } else {\n    $dissolved->{'_eff_asm_params'} = $mixed_csp->{'_eff_asm_params'};\n  }\n  if ($self->{'_min_overlap'} && $self->{'_min_identity'}) 
{\n    ( $dissolved->{'_min_overlap'}, $dissolved->{'_min_identity'} ) = \n      ( $self->{'_min_overlap'}, $self->{'_min_identity'} );\n  } else {\n    ( $dissolved->{'_min_overlap'}, $dissolved->{'_min_identity'} ) = \n      ( $mixed_csp->{'_min_overlap'}, $mixed_csp->{'_min_identity'} );\n  }\n  \n  # Dissolve each assembly\n  for my $assembly (@{$mixed_csp->{'_assembly'}}) {\n    # Dissolve this assembly for the given sequences\n    my %asm_spectrum = (1 => 0);\n    my %good_seqs;\n    # For each contig\n    for my $contig ($assembly->all_contigs) {\n      # Get good sequences\n      my @contig_seqs;\n      for my $seq ($contig->each_seq) {\n        my $seq_id = $seq->id;\n        # get sequence origin\n        next unless $seq_id =~ m/^$seq_header\\|/;\n        # add it to hash\n        push @contig_seqs, $seq_id;\n        $good_seqs{$seq_id} = 1;\n      }\n      # Update spectrum\n      my $size = scalar @contig_seqs;\n      if ($size == 0) {\n        next;\n      } elsif ($size == 1) {\n        $asm_spectrum{1}++;\n      } elsif ($size > 1) {\n        # Reassemble good sequences\n        my $contig_spectrum = $dissolved->_naive_assembler(\n          $contig, \\@contig_seqs, $dissolved->{'_min_overlap'},\n          $dissolved->{'_min_identity'});\n        # update spectrum\n        for my $qsize (keys %$contig_spectrum) {\n          $asm_spectrum{$qsize} += $$contig_spectrum{$qsize};\n        }\n      } else {\n        $self->throw(\"The size is not valid... 
how could that happen?\");\n      }\n    }\n    # For each singlet\n    for my $singlet ($assembly->all_singlets) {\n      my $seq_id = $singlet->seqref->id;\n      # get sequence origin\n      next unless $seq_id =~ m/^$seq_header\\|/;\n      # add it to hash\n      $good_seqs{$seq_id} = 1;\n      # update spectrum\n      $asm_spectrum{1}++;\n    }\n    # Update spectrum\n    $dissolved->_import_spectrum(\\%asm_spectrum);\n    # Update nof_rep\n    $dissolved->{'_nof_rep'}--;\n    $dissolved->{'_nof_rep'} += $mixed_csp->{'_nof_rep'};\n\n    # Get sequence stats\n    my ($nseq, $avgseql) = $dissolved->_get_seq_stats($assembly, \\%good_seqs);\n    $dissolved->{'_avg_seq_len'} = $avgseql;\n    $dissolved->{'_nof_seq'}     = $nseq;\n  \n    # Get eff_asm_param for these sequences\n    if ($dissolved->{'_eff_asm_params'} > 0) {\n      my ($nover, $minl, $avgl, $minid, $avgid)\n        = $dissolved->_get_overlap_stats($assembly, \\%good_seqs);\n      $dissolved->{'_min_overlap'}  = $minl;\n      $dissolved->{'_min_identity'} = $minid;\n      $dissolved->{'_avg_overlap'}  = $avgl;\n      $dissolved->{'_avg_identity'} = $avgid;\n      $dissolved->{'_nof_overlaps'} = $nover;\n    }\n\n  }\n  return $dissolved;\n}\n\n\n=head2 _new_cross_csp\n\n  Title   : \n  Usage   : \n  Function: create a cross contig spectrum object\n  Returns : \n  Args    : \n\n\n\nsub _new_cross_csp {\n  my ($self, $mixed_csp) = @_;\n  # Sanity check on the mixed contig spectrum\n  # There must be at least one assembly\n  if (!defined $mixed_csp->{'_assembly'} ||\n      scalar @{$mixed_csp->{'_assembly'}} < 1) {\n    $self->throw(\"The mixed contig spectrum must be based on at least one \".\n    \"assembly.\");\n  }\n  \n  # New dissolved contig spectrum object\n  my $cross = Bio::Assembly::Tools::ContigSpectrum->new();\n  my %spectrum = (1 => 0);\n  \n  # Take parent or mixed attributes\n  if ($self->{'_eff_asm_params'}) {\n    $cross->{'_eff_asm_params'} = $self->{'_eff_asm_params'};\n  } else {\n  
  $cross->{'_eff_asm_params'} = $mixed_csp->{'_eff_asm_params'};\n  }\n  if ($self->{'_min_overlap'} && $self->{'_min_identity'}) {\n    ( $cross->{'_min_overlap'}, $cross->{'_min_identity'} ) = \n      ( $self->{'_min_overlap'}, $self->{'_min_identity'} );\n  } else {\n    ( $cross->{'_min_overlap'}, $cross->{'_min_identity'} ) = \n      ( $mixed_csp->{'_min_overlap'}, $mixed_csp->{'_min_identity'} );\n  }\n  \n  # Get cross contig spectrum for each assembly\n  for my $assembly (@{$mixed_csp->{'_assembly'}}) {\n    # Go through contigs and skip the pure ones\n    my %good_seqs;\n    for my $contig ($assembly->all_contigs) {\n      # Get origins\n      my @seq_origins;\n      my @seq_ids;\n      for my $seq ($contig->each_seq) {\n        # current sequence origin\n        my $seq_id = $seq->id;\n        $seq_id =~ m/^(.+)\\|/;\n        my $seq_header = $1;\n        $self->warn(\"Sequence $seq_id does not seem to have a header. Skipping\".\n          \" it...\") if not defined $seq_header;\n        $seq_header ||= '';\n        push @seq_origins, $seq_header;\n        push @seq_ids, $seq_id;\n      }\n      my $qsize = scalar(@seq_ids);\n      my @origins = sort { $a cmp $b } @seq_origins;\n      my $size = scalar(@origins);\n      for (my $i = 1 ; $i < $size ; $i++) {\n        if ($origins[$i] eq $origins[$i-1]) {\n          splice @origins, $i, 1;\n          $i--;\n          $size--;\n        }\n      }\n      # Update cross-contig number in spectrum\n      if ($size > 1) { # cross-contig detected\n        # update good sequences\n        for my $seq_id (@seq_ids) {\n          $good_seqs{$seq_id} = 1;\n        }\n        # update number of cross q-contigs in spectrum\n        if (defined $spectrum{$qsize}) {\n          $spectrum{$qsize}++;\n        } else {\n          $spectrum{$qsize} = 1;\n        }\n      }\n      # Update number of cross 1-contigs\n      if ($size > 1) { # cross-contig detected\n        for my $origin (@origins) {\n          # sequences to 
use\n          my @ids;\n          for (my $i = 0 ; $i < $qsize ; $i++) {\n            my $seq_origin = $seq_origins[$i];\n            my $seq_id = $seq_ids[$i];\n            push @ids, $seq_id if $seq_origin eq $origin;\n          }\n          if (scalar @ids == 1) {\n            $spectrum{1}++;\n          } elsif (scalar @ids > 1) {\n            my $contig_spectrum = $cross->_naive_assembler(\n              $contig, \\@ids, $cross->{'_min_overlap'},\n              $cross->{'_min_identity'});\n            $spectrum{1} += $$contig_spectrum{1};\n          } else {\n            $self->throw(\"The size is <= 0. How could such a thing happen?\");\n          }\n        }\n      }\n    }\n    # Get sequence stats\n    my ($nseq, $avgseql) = $cross->_get_seq_stats($assembly, \\%good_seqs);\n    $cross->{'_avg_seq_len'} = $avgseql;\n    $cross->{'_nof_seq'}     = $nseq;\n    # Get eff_asm_param for these sequences\n    if ($cross->{'_eff_asm_params'} > 0) {\n      my ($nover, $minl, $avgl, $minid, $avgid)\n        = $cross->_get_overlap_stats($assembly, \\%good_seqs);\n      $cross->{'_min_overlap'}  = $minl;\n      $cross->{'_min_identity'} = $minid;\n      $cross->{'_avg_overlap'}  = $avgl;\n      $cross->{'_avg_identity'} = $avgid;\n      $cross->{'_nof_overlaps'} = $nover;\n    }\n  }\n  \n  $cross->_import_spectrum(\\%spectrum);\n  # Update nof_rep\n  $cross->{'_nof_rep'}--;\n  $cross->{'_nof_rep'} += $mixed_csp->{'_nof_rep'};\n  \n  return $cross;\n}\n\n=head2 _import_assembly\n\n  Title   : _import_assembly\n  Usage   : $csp->_import_assembly($assemblyobj);\n  Function: Update a contig spectrum object based on an assembly object\n  Returns : 1 for success, 0 for error\n  Args    : Bio::Assembly::Scaffold assembly object\n\n\nsub _import_assembly {\n  my ($self, $assemblyobj) = @_;\n  # Sanity check\n  if( !ref $assemblyobj || ! 
$assemblyobj->isa('Bio::Assembly::ScaffoldI') ) {\n        $self->throw(\"Unable to process non Bio::Assembly::ScaffoldI assembly \".\n        \"object [\".ref($assemblyobj).\"]\");\n  }\n  # Create new object from assembly\n  my $csp = $self->_new_from_assembly($assemblyobj);\n  # Update current contig spectrum object with new one\n  $self->add($csp);\n  return 1;\n}\n\n\n=head2 _import_spectrum\n\n  Title   : _import_spectrum\n  Usage   : $csp->_import_spectrum({ 1 => 90 , 2 => 3 , 4 => 1 })\n  Function: update a contig spectrum object based on a contig spectrum\n            represented as a hash (key: contig size, value: number of contigs of\n            this size)\n  Returns : 1 for success, 0 for error\n  Args    : contig spectrum as a hash reference\n\n\nsub _import_spectrum {\n  my ($self, $spectrum) = @_;\n  # Sanity check\n  if( ! ref $spectrum || ! ref $spectrum eq 'HASH') {\n    $self->throw(\"Spectrum should be a hash reference, but it is [\".\n      ref($spectrum).\"]\");\n  }\n  \n  # Update the spectrum (+ nof_rep, max_size and nof_seq)\n  for my $size (keys %$spectrum) {\n    # Get the number of contigs of different size\n    if (defined $self->{'_spectrum'}{$size}) {\n      $self->{'_spectrum'}{$size} += $$spectrum{$size};\n    } else {\n      $self->{'_spectrum'}{$size} = $$spectrum{$size};\n    }\n    # Update nof_seq\n    $self->{'_nof_seq'} += $size * $$spectrum{$size};\n    # Update max_size\n    $self->{'_max_size'} = $size if $size > $self->{'_max_size'};\n  }\n  \n  # If the contig spectrum has only zero 1-contigs, max_size is zero\n  $self->{'_max_size'} = 0 if scalar keys %{$self->{'_spectrum'}} == 1 &&\n    defined $self->{'_spectrum'}{'1'} && $self->{'_spectrum'}{'1'} == 0;\n  \n  # Update nof_rep\n  $self->{'_nof_rep'}++;\n  return 1;\n}\n\n=head2 _import_dissolved_csp\n\n  Title   : _import_dissolved_csp\n  Usage   : $csp->_import_dissolved_csp($mixed_csp, $seq_header);\n  Function: Update a contig spectrum object by dissolving a 
mixed contig\n            spectrum based on the header of the sequences\n  Returns : 1 for success, 0 for error\n  Args    : Bio::Assembly::Tools::ContigSpectrum\n            sequence header string\n\n\nsub _import_dissolved_csp {\n  my ($self, $mixed_csp, $seq_header) = @_;\n  # Sanity check: use '!' here, since low-precedence 'not' would negate the\n  # whole '||' expression\n  if (!defined $mixed_csp || !defined $seq_header) {\n    $self->throw(\"Expecting a contig spectrum reference and sequence header as\".\n    \" arguments\");\n  }\n  # Create new object from assembly\n  my $dissolved_csp = $self->_new_dissolved_csp($mixed_csp, $seq_header);\n  # Update current contig spectrum object with new one\n  $self->add($dissolved_csp);\n  return 1;\n}\n\n\n=head2 _import_cross_csp\n\n  Title   : _import_cross_csp\n  Usage   : $csp->_import_cross_csp($mixed_csp);\n  Function: Update a contig spectrum object by calculating the cross contig\n            spectrum based on a mixed contig spectrum\n  Returns : 1 for success, 0 for error\n  Args    : Bio::Assembly::Tools::ContigSpectrum\n\n\nsub _import_cross_csp {\n  my ($self, $mixed_csp) = @_;\n  # Sanity check\n  if (not defined $mixed_csp) {\n    $self->throw(\"Expecting a contig spectrum reference as argument\");\n  }\n\n  # Create new object from assembly\n  my $cross_csp = $self->_new_cross_csp($mixed_csp);\n\n  # Update current contig spectrum object with new one\n  $self->add($cross_csp);\n\n  return 1;\n}\n\n\n=head2 _get_seq_stats\n\n  Title   : _get_seq_stats\n  Usage   : my ($nof_seq, $avg_seq_len) = $csp->_get_seq_stats($assemblyobj);\n  Function: Get sequence statistics from an assembly:\n              number of sequences, average sequence length\n  Returns : number of sequences (integer)\n            average sequence length (decimal)\n  Args    : assembly object reference\n            hash reference with the IDs of the sequences to consider [optional]\n\n\nsub _get_seq_stats {\n  my ($self, $assemblyobj, $seq_hash) = @_;\n\n  # sanity check\n  $self->throw(\"Must provide a Bio::Assembly::ScaffoldI 
object\")\n    if (!defined $assemblyobj || !$assemblyobj->isa(\"Bio::Assembly::ScaffoldI\"));\n  $self->throw(\"Expecting a hash reference. Got [\".ref($seq_hash).\"]\")\n    if (defined $seq_hash && ! ref($seq_hash) eq 'HASH');\n\n  my $avg_seq_len = 0;\n  my $nof_seq = 0;\n  for my $contigobj ($assemblyobj->all_contigs) {\n    for my $seqobj ($contigobj->each_seq) {\n      my $seq_id = $seqobj->id;\n      next if defined $seq_hash && !defined $$seq_hash{$seq_id};\n      $nof_seq++;\n      my $seq_string = $seqobj->seq;\n      $seq_string =~ s/-//g;\n      $avg_seq_len += length($seq_string);\n    }\n  }\n  for my $singletobj ($assemblyobj->all_singlets) {\n    my $seq_id = $singletobj->seqref->id;\n    next if defined $seq_hash && !defined $$seq_hash{$seq_id};\n    $nof_seq++;\n    my $seq_string = $singletobj->seqref->seq;\n    $seq_string =~ s/-//g;\n    $avg_seq_len += length($seq_string);\n  }\n  $avg_seq_len /= $nof_seq unless $nof_seq == 0;\n  return $nof_seq, $avg_seq_len;\n}\n\n\n=head2 _get_overlap_stats\n\n  Title   : _get_overlap_stats\n  Usage   : my ($minlength, $min_identity, $avglength, $avgidentity)\n              = $csp->_get_overlap_stats($assemblyobj);\n  Function: Get statistics about pairwise overlaps in contigs of an assembly\n  Returns : number of overlaps\n            minimum overlap length\n            average overlap length\n            minimum identity percent\n            average identity percent\n  Args    : assembly object reference\n            hash reference with the IDs of the sequences to consider [optional]\n\n\nsub _get_overlap_stats {\n  my ($self, $assembly_obj, $seq_hash) = @_;\n\n  # sanity check\n  $self->throw(\"Must provide a Bio::Assembly::ScaffoldI object\")\n    if (!defined $assembly_obj || !$assembly_obj->isa(\"Bio::Assembly::ScaffoldI\"));\n  $self->throw(\"Expecting a hash reference. Got [\".ref($seq_hash).\"]\")\n    if (defined $seq_hash && ! 
ref($seq_hash) eq 'HASH');\n  \n  my $matchdef = $self->{'_eff_asm_params'};\n  my ($min_length, $avg_length, $min_identity, $avg_identity, $nof_overlaps)\n    = (undef, 0, undef, 0, 0);\n  \n  # Look at all the contigs (and I really mean no singlets!)\n  for my $contig_obj ($assembly_obj->all_contigs) {\n    my $nof_seq = 0;\n\n    # Look at best overlap possible with previous sequences in contig\n    my @all_seq_objs = $contig_obj->each_seq;\n    # sequences should be ordered by starting position\n    for (my $i = 0 ; $i < scalar(@all_seq_objs) ; $i++) {\n      my $seq_obj    = $all_seq_objs[$i];\n      my $seq_id    = $seq_obj->id;\n      \n      # skip this sequence if not in list of wanted sequences\n      next if defined $seq_hash && !defined $$seq_hash{$seq_id};\n      $nof_seq++;\n      \n      # skip the first sequence (no other sequence to compare against)\n      next if $nof_seq <= 1;\n      \n      # what is the best previous sequence to align to?\n      my $stats = Bio::Align::PairwiseStatistics->new;\n      my $target_obj;\n      my $target_id;\n      my $best_score;\n      my $best_length;\n      my $best_identity;\n      \n      for (my $j = $i-1 ; $j >= 0 ; $j--) {\n        my $tmp_target_obj = $all_seq_objs[$j];\n        my $tmp_target_id = $tmp_target_obj->id;\n        \n        # skip this sequence if not in list of wanted sequences\n        next if defined $seq_hash && !defined $$seq_hash{$tmp_target_id};\n        \n        # find overlap with that sequence\n        my ($aln_obj, $tmp_length, $tmp_identity)\n          = $self->_overlap_alignment($contig_obj, $seq_obj, $tmp_target_obj);\n        next if ! 
defined $aln_obj; # there was no sequence overlap\n        my $tmp_score = $stats->score_nuc($aln_obj);\n        \n        # update score and best sequence for overlap\n        if (!defined $best_score || $best_score < $tmp_score) {\n          $best_score    = $tmp_score;\n          $best_length   = $tmp_length;\n          $best_identity = $tmp_identity;\n          $target_obj    = $tmp_target_obj;\n          $target_id     = $tmp_target_id;\n        }\n      }\n      \n      # Update our overlap statistics\n      if (defined $best_score) {\n        $avg_length += $best_length;\n        $avg_identity += $best_identity;\n        $min_length = $best_length if ! defined $min_length ||\n          $best_length < $min_length;\n        $min_identity = $best_identity if ! defined $min_identity ||\n          $best_identity < $min_identity;\n        $nof_overlaps++;\n      }\n    }\n  }\n  \n  # averaging\n  unless ($nof_overlaps == 0) {\n    $avg_length /= $nof_overlaps;\n    $avg_identity /= $nof_overlaps;\n  }\n  \n  return $nof_overlaps, $min_length, $avg_length, $min_identity, $avg_identity;\n}\n\n\n=head2 _overlap_alignment\n\n  Title   : _overlap_alignment\n  Usage   : \n  Function: Produce an alignment of the overlapping section of two sequences of\n            a contig. Minimum overlap length and percentage identity can be\n            specified. Return undef if the sequences do not overlap or do not\n            meet the minimum overlap criteria. 
\n  Return  : Bio::SimpleAlign object reference\n            alignment overlap length\n            alignment overlap identity\n  Args    : Bio::Assembly::Contig object reference\n            Bio::LocatableSeq contig sequence 1\n            Bio::LocatableSeq contig sequence 2\n            minimum overlap length [optional]\n            minimum overlap percentage identity [optional]","parameters":[{"label":"$self"},{"label":"$contig"},{"label":"$qseq"},{"label":"$tseq"},{"label":"$min_overlap"},{"label":"$min_identity"}],"label":"_overlap_alignment($self,$contig,$qseq,$tseq,$min_overlap,$min_identity)"},"range":{"end":{"character":9999,"line":1602},"start":{"line":1539,"character":0}},"kind":12,"line":1539},{"kind":12,"containerName":"SimpleAlign","name":"Bio","line":1582},{"line":1583,"name":"Bio","containerName":"LocatableSeq","kind":12},{"kind":12,"containerName":"LocatableSeq","name":"Bio","line":1590}]}