De novo Sequencing
De novo sequencing means determining the genome sequence of an organism for which no reference sequence is available in the databases, so the sequence data from an unknown genome must be assembled without a template. The bioinformatics tools currently available for de novo assembly use, for example, the overlaps between reads to build the smallest possible number of contigs, each as long as possible. Long reads make this easier, and so do paired-end reads and mate pair reads. Paired-end libraries allow both ends of short fragments (< 1 kb) to be sequenced, whereas mate pair libraries yield paired reads from the two ends of long fragments (several kb).
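To make the overlap principle concrete, here is a minimal greedy overlap-merge sketch in Python. It is a toy, not how production assemblers work: it assumes error-free reads from a single strand and no read contained in another, and it simply merges the pair of reads with the longest suffix-prefix overlap until no overlap of at least `min_len` bases remains. The names `overlap` and `greedy_assemble` are hypothetical.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` matching a prefix of `b`.

    Returns 0 if no such overlap is at least `min_len` bases long.
    """
    start = 0
    while True:
        # Look for the first min_len bases of b somewhere in a.
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        # The earliest hit whose remainder of a is a prefix of b is the longest overlap.
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads: list[str], min_len: int) -> list[str]:
    """Repeatedly merge the pair of reads with the longest overlap."""
    reads = list(reads)
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            return reads  # no overlaps left: each remaining string is a contig
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)


if __name__ == "__main__":
    reads = ["ATTAGACCTG", "CCTGCCGGAA", "AGACCTGCCG", "GCCGGAATAC"]
    print(greedy_assemble(reads, min_len=3))  # ['ATTAGACCTGCCGGAATAC']
```

When a repeat is longer than the reads, a greedy merge like this can join reads from different copies of the repeat, which is one reason long reads and paired information help.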
In theory, the long reads (750 bp) generated with Roche 454 technology could provide a sufficient level of coverage, but the high cost of producing long reads limits their use, especially for large eukaryotic genomes. Paired-end short reads alone, obtained mainly with Illumina technology, are not enough for a de novo project because they are too short to span long repeated regions. Mate pair sequencing, on the other hand, is very useful for de novo assembly: it reduces the regions of the genome with zero coverage and allows contigs to be linked together into scaffolds.
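The coverage trade-off can be put into numbers with the classic Lander-Waterman approximation: for N reads of length L on a genome of size G, mean coverage is C = N × L / G, and if reads land uniformly at random, the expected fraction of bases with zero coverage is about e^(-C). The sketch below assumes that idealised model (real libraries have biases), and the 5 Mb genome in the example is hypothetical. It also computes physical (fragment) coverage, where the whole span of each read pair counts; this is how kb-scale mate pair fragments can bridge regions that the reads themselves leave uncovered.

```python
import math


def mean_coverage(n_reads: int, read_len: int, genome_size: int) -> float:
    """Mean sequence coverage C = N * L / G."""
    return n_reads * read_len / genome_size


def fraction_uncovered(coverage: float) -> float:
    """Expected fraction of bases with zero coverage, e^-C (Poisson model)."""
    return math.exp(-coverage)


def reads_needed(target_coverage: float, read_len: int, genome_size: int) -> int:
    """Number of reads required to reach a target mean coverage."""
    return math.ceil(target_coverage * genome_size / read_len)


def physical_coverage(n_pairs: int, fragment_len: int, genome_size: int) -> float:
    """Coverage counting the whole fragment spanned by each read pair."""
    return n_pairs * fragment_len / genome_size


if __name__ == "__main__":
    genome = 5_000_000  # hypothetical 5 Mb microbial genome
    n = reads_needed(20, 750, genome)  # 750 bp reads, 20x target
    print(n, fraction_uncovered(mean_coverage(n, 750, genome)))
    # 100,000 mate pairs from 5 kb fragments give 100x physical coverage
    print(physical_coverage(100_000, 5_000, genome))
```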
The protocols for preparing paired-end libraries and mate pair libraries differ:
Paired-end libraries for Illumina sequencing are prepared by fragmenting the genomic DNA mechanically (Covaris, Bioruptor) or enzymatically (tagmentase, Nextera technology) to sizes of less than 0.8 kb.
Adapters are then added to allow both ends of the DNA fragment to be sequenced.
Figure 1: Construction of paired-end libraries using Nextera technology
Nextera technology uses tagmentase, a modified transposase, to fragment the DNA and add the Illumina sequencing adapters in a single step (Figure 1). An additional size-selection step by agarose gel electrophoresis can produce a library with a precisely defined insert size, which may be needed to assemble certain genomic regions.
Mate pair library preparation is designed to allow paired-end sequencing of both ends of a fragment that is initially several kilobases long. Figure 2 shows the preparation of a mate pair library for Illumina sequencing. Since early 2013, Nextera has been the technology of choice for preparing mate pair libraries. Tagmentase is used to fragment the DNA into pieces of 2-15 kb, and a circularisation adapter is attached. The fragments are size-selected on an agarose gel and then circularised, which joins the two ends of each fragment together. The circularised molecules, which carry a biotin label at the junction, are then sheared mechanically into fragments of 200-700 bp. The fragments containing the junction, and therefore both original ends, are recovered with streptavidin magnetic beads, which bind the biotin. The selected fragments then go through standard library construction.
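The logic of this protocol, in which two ends several kb apart end up on one short fragment, can be illustrated with a toy string model in Python. This is only a sketch under strong simplifying assumptions: DNA is a plain string, the biotin-labelled junction is represented by a hypothetical `#` character, shearing is modelled as one random cut followed by equal-length pieces, and adapter sequences, strand orientation and real size distributions are ignored. All function names are hypothetical.

```python
import random

# Hypothetical one-character stand-in for the biotin-labelled junction.
JUNCTION = "#"


def circularise(fragment: str) -> str:
    """Join the two ends of a size-selected fragment through the junction.

    The circle is stored as a linear string; its last character (the junction)
    is followed, going around the circle, by the fragment's first base.
    """
    return fragment + JUNCTION


def shear(circle: str, piece_len: int, rng: random.Random) -> list[str]:
    """Open the circle at a random point and cut it into pieces of piece_len."""
    cut = rng.randrange(len(circle))
    opened = circle[cut:] + circle[:cut]
    return [opened[i:i + piece_len] for i in range(0, len(opened), piece_len)]


def pull_down(pieces: list[str]) -> list[str]:
    """Mimic streptavidin bead selection: keep only pieces carrying the junction."""
    return [p for p in pieces if JUNCTION in p]


if __name__ == "__main__":
    rng = random.Random(1)
    fragment = "".join(rng.choice("ACGT") for _ in range(3000))  # a 3 kb fragment
    for piece in pull_down(shear(circularise(fragment), 500, rng)):
        end_part, start_part = piece.split(JUNCTION)
        # Bases before the junction come from the fragment's end, bases after it from its start.
        assert fragment.endswith(end_part) and fragment.startswith(start_part)
        print(len(end_part), len(start_part))
```

In this model, each recovered piece carries the end of the original fragment on one side of the junction and its start on the other, so sequencing the short piece from both ends typically gives a read pair whose members lay a full fragment length apart in the genome.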
Figure 2: Preparation of mate pair libraries using Nextera technology
The platforms at the CEA Institute of Genomics/Genoscope and the Pasteur Institute (Paris), as well as those in Toulouse and Montpellier, offer their expertise and know-how in de novo sequencing to the scientific community for selected projects submitted via the France Génomique web portal. For large genomes, it is generally preferable to combine paired-end and mate pair libraries. For microbial genomes, an optical map can be built in addition to sequencing to improve the quality of the genome assembly.
Assembly: Set of sequences that best approximates the genome sequence
Contig: Gap-free sequence built by assembling the overlapping short reads generated by the sequencer
Scaffold: Sequence containing gaps, made up of several ordered contigs
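The contig and scaffold definitions can be illustrated with a short Python sketch that joins ordered contigs into a scaffold, writing each gap as a run of `N` characters. The gap size is estimated from mate pairs whose two reads fall in adjacent contigs, assuming a known fragment size: the part of the fragment lying in contig A, plus the gap, plus the part lying in contig B should add up to that size. The function names, the median-based estimate and the minimum of one `N` for zero or negative estimates are illustrative assumptions, not the behaviour of any particular scaffolder.

```python
from statistics import median


def estimate_gap(fragment_size: int, spans: list[tuple[int, int]]) -> int:
    """Estimate the gap between two adjacent contigs from linking mate pairs.

    Each span is (bases of the fragment inside contig A, bases inside contig B);
    fragment_size = in_a + gap + in_b, so gap = fragment_size - in_a - in_b.
    The median limits the influence of pairs with unusual fragment lengths.
    """
    return round(median(fragment_size - in_a - in_b for in_a, in_b in spans))


def build_scaffold(contigs: list[str], gaps: list[int], min_gap: int = 1) -> str:
    """Join ordered, gap-free contigs into one scaffold with N-runs for the gaps."""
    if len(gaps) != len(contigs) - 1:
        raise ValueError("need exactly one gap between each pair of adjacent contigs")
    parts = [contigs[0]]
    for gap, contig in zip(gaps, contigs[1:]):
        # A zero or negative estimate suggests the contigs touch or overlap;
        # this sketch simply keeps a gap of min_gap N's.
        parts.append("N" * max(gap, min_gap))
        parts.append(contig)
    return "".join(parts)


if __name__ == "__main__":
    gap = estimate_gap(3000, [(1200, 1700), (1100, 1820), (1500, 1410)])
    print(gap)  # 90
    print(build_scaffold(["ACGT", "GGCC", "TTAA"], [3, 0]))  # ACGTNNNGGCCNTTAA
```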