A network of specialised and complementary platforms

The FRANCE GENOMIQUE infrastructure brings together the majority of sequencing and/or bioinformatics platforms in France:

  • the national platforms of Genoscope and CNRGH in Evry, which provide sequencing, genotyping and bioinformatics capacity for very large-scale projects,
  • sequencing platforms, each with its own specific expertise and technologies as well as ad hoc bioinformatics tools,
  • the platforms associated with FRANCE GENOMIQUE,
  • the CEA's TGCC, where data storage and processing space is allocated to FRANCE GENOMIQUE partners.
  • FRANCE GENOMIQUE has also recognised a number of platforms that are members of the IFB (Institut Français de Bioinformatique) as associated bioinformatics platforms, working in collaboration with the FRANCE GENOMIQUE sequencing platforms and the beneficiaries of these platforms.
France Génomique platforms
  • National sequencing platforms
  • Local sequencing platforms
  • Bioinformatics platforms
  • Associated platforms
  • TGCC: Très Grand Centre de Calcul

Very high throughput and 3rd generation sequencing instruments

FRANCE GENOMIQUE is equipped with the most efficient and innovative sequencing technologies. The fleet is evolving rapidly, with very high throughput instruments (NovaSeq 6000) and so-called 3rd generation sequencing instruments (single-molecule technology).

FRANCE GENOMIQUE is thus able to respond to all requests for sequencing projects.

A high-performance computing infrastructure

The CEA “Très Grand Centre de Calcul” (TGCC) is an infrastructure dedicated to high-performance computing, capable of hosting petaflop-scale supercomputers and designed on the basis of a data-driven architecture. Within the TGCC, the CCRT has an extension dedicated to the users of the FRANCE GENOMIQUE project.

This e-infrastructure for data storage and processing, implemented by the CEA/DIF teams, gives FRANCE GENOMIQUE users access to several petabytes of medium-term storage (on the scale of scientific projects lasting several years), connected to several thousand scalar computing cores by a high-performance interconnect. As the volume of data to be stored and processed is growing exponentially, the infrastructure is also designed to be scalable, with the aim of meeting the future challenges of genomics.

Equipment and capacity

The setup dedicated to FRANCE GENOMIQUE comprises:

  • 180 dual-processor nodes (Intel Sandy Bridge E5-2680, 2.7 GHz, 8 cores per processor) with 128 GB of memory per node, i.e. a total of 2,880 cores (Bull),
  • 2 Bullx S6410 very large memory systems with 2 TB of memory,
  • 9 hybrid blades equipped with NVIDIA Kepler GPUs.
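The total core count quoted in the list follows directly from the node specification. A minimal Python sketch of that arithmetic, using only the figures given above (the variable names are illustrative, and the per-core memory is a derived value rather than a published specification):

```python
# Figures taken from the equipment list above.
NODES = 180                # dual-processor compute nodes
PROCESSORS_PER_NODE = 2
CORES_PER_PROCESSOR = 8    # Intel Sandy Bridge E5-2680
MEMORY_PER_NODE_GB = 128

cores_per_node = PROCESSORS_PER_NODE * CORES_PER_PROCESSOR
total_cores = NODES * cores_per_node
total_memory_gb = NODES * MEMORY_PER_NODE_GB
memory_per_core_gb = MEMORY_PER_NODE_GB / cores_per_node

print(total_cores)         # 2880
print(total_memory_gb)     # 23040
print(memory_per_core_gb)  # 8.0
```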

Data hosting is achieved through the following storage configuration:

  • Medium-term storage with a global file system of 5 PB, including 2 PB on disk (hierarchical storage system Lustre + IBM HPSS),
  • Archive system for preliminary data.

Expertise and support

The CEA/DIF teams have developed internationally recognized expertise both in the management of very large volumes of data (contributions to open-source developments, management of EOFS, etc.) and in the design and management of very large data centres. Assistance and support teams are on hand to help users make the most of the centre's resources.

A dedicated application support team is provided by the national platforms (CEA) on behalf of FRANCE GENOMIQUE.

Main achievements

To characterize a set of 83 protein families of unknown function, comprising some 60,000 sequences, Genoscope researchers conducted a modelling campaign on the CCRT Titane supercomputer. This phase, which would otherwise have required 280,000 hours of computation, was completed in only 70 hours on 4,000 processors. From the results, the researchers created a catalogue of specific structural signatures for each of the families studied. This catalogue will provide biochemists with valuable information for discovering new enzymatic activities.
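The two durations quoted for the modelling campaign are consistent with an even division of the work across processors. A minimal sketch, assuming that the 280,000 hours refer to single-processor time and that scaling is perfect (both are assumptions, and the function name is illustrative):

```python
def ideal_wall_clock_hours(cpu_hours: float, processors: int) -> float:
    """Wall-clock time if the work divides evenly across processors."""
    return cpu_hours / processors

print(ideal_wall_clock_hours(280_000, 4_000))  # 70.0
```

In practice, scheduling overhead and load imbalance usually make the real figure somewhat higher than this ideal.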

Genoscope has been using the TGCC/CCRT computing resources for several years, particularly through the DARI calls for projects. In this context, the TARA OCEANS project has benefited from more than 3.5 million hours of computation to study the diversity of marine organisms. To this end, various sequence analysis tools have been ported to this infrastructure: BLAST, BLAT, InterProScan and CDDsearch. Specific codes have been designed and deployed to adapt these tools to the technical constraints of operating CCRT machines (massive data parallelization, execution control, error recovery, short unit jobs).
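The constraints listed above (data parallelization, short unit jobs, error recovery) can be illustrated with a generic pattern: split the input sequences into small batches and retry any batch that fails. The sketch below is not the code deployed at the CCRT. The function names are hypothetical, and the per-batch job is a placeholder where a real pipeline would launch a tool such as BLAST.

```python
from typing import Callable, Iterable, Iterator, List, Tuple, TypeVar

Record = Tuple[str, str]  # (header, sequence)
T = TypeVar("T")

def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

def batches(records: Iterable[Record], size: int) -> Iterator[List[Record]]:
    """Group records into batches of at most `size`: one batch per unit job."""
    batch: List[Record] = []
    for record in records:
        batch.append(record)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch

def run_with_retry(job: Callable[[List[Record]], T],
                   batch: List[Record],
                   max_attempts: int = 3) -> T:
    """Run one unit job, retrying on failure up to `max_attempts` times."""
    for attempt in range(1, max_attempts + 1):
        try:
            return job(batch)
        except Exception:
            if attempt == max_attempts:
                raise

# Placeholder job: report sequence lengths instead of running an aligner.
fasta = [">seq1", "ACGT", "ACGT", ">seq2", "GGCC", ">seq3", "TTAA"]
unit_jobs = list(batches(read_fasta(fasta), size=2))
results = [run_with_retry(lambda b: [len(seq) for _, seq in b], batch)
           for batch in unit_jobs]
print(results)  # [[8, 4], [4]]
```

Keeping each batch small means a failed job costs little to rerun, which is one common reason for preferring short unit jobs on shared machines.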

For further information

Website: www-hpc.cea.fr, www-ccrt.cea.fr

Platform manager: Pierre Leca

CEA DAM Île-de-France
Bruyères-le-Châtel
91297 Arpajon Cedex

Contact: e-infrastructure@france-genomique.org

Illumina offers high and very high throughput sequencing.

After clonal amplification of short DNA fragments, sequencing by synthesis (SBS) begins: each base emits a unique fluorescence signal when added to the strand being synthesized. The detection of the signal at each incorporation determines the DNA sequence.
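The principle described above, one distinguishable signal per incorporated base, can be sketched as a toy base caller. This is a deliberately simplified illustration that assumes one intensity value per base per cycle. Real instruments also handle effects such as channel cross-talk and phasing and assign quality scores, none of which is modelled here.

```python
BASES = "ACGT"

def call_bases(cycles):
    """Call one base per cycle by picking the strongest signal.

    `cycles` is a sequence of (A, C, G, T) intensity tuples,
    one tuple per sequencing cycle (hypothetical input format).
    """
    read = []
    for signals in cycles:
        strongest = max(range(len(BASES)), key=lambda i: signals[i])
        read.append(BASES[strongest])
    return "".join(read)

cycles = [
    (0.9, 0.1, 0.0, 0.1),  # strongest channel: A
    (0.1, 0.0, 0.8, 0.2),  # strongest channel: G
    (0.0, 0.1, 0.1, 0.7),  # strongest channel: T
    (0.2, 0.9, 0.1, 0.0),  # strongest channel: C
]
print(call_bases(cycles))  # AGTC
```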

MiniSeq

Run time: 20 hours
Maximum output: 7.5 Gb
Maximum reads per run: 44-55 million
Maximum read length: 2 × 150 bp
Platforms equipped: MGX, Biomics

MiSeq

Run time: 4-55 hours
Maximum output: 13.2-15 Gb
Maximum reads per run: 40-50 million
Maximum read length: 2 × 300 bp
Platforms equipped: CNRGH, ICGex, Biomics, GeT-PlaGe, Genoscope, LIGAN, GenoA, EcogenO, PGTB, Genom’IC, ProfileXpert, iGenSeq

NextSeq 500

Run time: 29 hours
Maximum output: 100-120 Gb
Maximum reads per run: up to 800 million
Maximum read length: 2 × 150 bp
Platforms equipped: CNRGH, GenomiqueENS, Biomics, TGML, PSI2BC, LIGAN, GENOMAX, POPS, Genom’IC, ProfileXpert

HiSeq 2500

Run time: <1 to 3.5 days
Maximum output: 250-300 Gb
Maximum reads per run: 4 billion
Maximum read length: 2 × 125 bp
Platform equipped: ProfileXpert

NextSeq 2000

Run time: 11-48 hours
Maximum output: 350 Gb
Maximum reads per run: 1.2 billion
Maximum read length: 2 × 150 bp
Platforms equipped: GenomiqueENS, Biomics, GenomEast, UCAGenomiX, TGML, PSI2BC, GENOMAX, iGenSeq, PGTB, Genom’IC

iSeq 100

Run time: 9.5-19 hours
Maximum output: 1.2 Gb
Maximum reads per run: 4 million
Maximum read length: 2 × 150 bp
Platforms equipped: Biomics, GenomEast, LIGAN, PGTB

NovaSeq 6000

Run time: 13-44 hours
Maximum output: 4800-6000 Gb
Maximum reads per run: 20 billion
Maximum read length: 2 × 250 bp
Platforms equipped: CNRGH, ICGex, MGX, GeT-PlaGe, Genoscope, LIGAN, GenoA, Go@L, iGenSeq

NovaSeq X series

Run time: 13-48 hours
Maximum output: 16 Tb
Maximum reads per run: 52 billion
Maximum read length: 2 × 150 bp
Platform equipped: LIGAN

MGI’s DNA sequencing instruments use a core technology called DNBSEQ™.

DNBs (DNA nanoballs) are pumped by the fluidics system and loaded onto a patterned array chip.

A sequencing primer is then added and hybridized to the adaptor region of the DNB. The sequencing reaction starts by pumping in sequencing reagents containing fluorescently labeled dNTP probes and DNA polymerase. Images are taken after the fluorescently labeled probes on the DNB are excited with lasers. The images are then converted into a digital signal, which is used to determine the DNA sequence of the sample.

DNBSEQ-G400

Run time: 37 hours
Maximum output: 1440 Gb per run
Maximum reads per run: 1500-1800 million
Maximum read length: PE300
Platforms equipped: CNRGH, Biomics, GenomEast, Genoscope, EcogenO, ProfileXpert

DNBSEQ-T7

Run time: 30 hours
Maximum output: 6 Tb per run
Maximum reads per run: 5000 million
Maximum read length: PE150
Platform equipped

Through its Sequel systems, Pacific Biosciences offers long-read sequencing of single molecules in real time, without amplification. The sequencing technique is called SMRT, for Single Molecule, Real-Time sequencing.

Sequel

SMRT Cell specifications:

Run time: 1 day
Maximum output: 20 Gb
Average read length: 10 kb
Number of HiFi reads: up to 500,000
Platform equipped: Institut Pasteur

Sequel II

SMRT Cell specifications:

Run time: 30 hours
Maximum output: 24 Gb
Average read length: 15 kb
Number of HiFi reads: 4 M with 99.9% accuracy on average
Platforms equipped: GeT-PlaGe, Gentyane, ICGex

Revio

SMRT Cell specifications:

Run time: 24 hours
Maximum output: 360 Gb
Average read length: 15-20 kb
Number of HiFi reads: 25 M with 99.95% accuracy on average
Platform equipped: Gentyane

Oxford Nanopore Technologies offers a technology for real-time DNA and RNA sequencing without synthesis or amplification, in which sequencing is carried out through a nanopore subjected to an electric field.

The ionic current differs according to the base (A, T, G or C) that blocks the nanopore. The sequence is identified by measuring how the ionic current passing through the nanopore changes over time.

MinION

Maximum output: 10-30 Gb per flow cell
Maximum read length: from a few kb to hundreds of kb
Platforms equipped: CNRGH, GenomiqueENS, Biomics, MGX, UCAGenomiX, Genoscope, PSI2BC, EcogenO, Go@L, PGTB, POPS, Genom’IC, ProfileXpert

GridION

Maximum output: 30 Gb per flow cell (150 Gb for 5 flow cells)
Maximum read length: from a few kb to hundreds of kb
Platforms equipped: Biomics, GeT-PlaGe, Genoscope, PSI2BC, PGTB

PromethION

Maximum output: 158 Gb per flow cell (7.6 Tb for 48 flow cells)
Maximum read length: from a few kb to hundreds of kb
Platforms equipped: CNRGH, GenomiqueENS, UCAGenomiX, GeT-PlaGe, Genoscope, ProfileXpert
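The multi-flow-cell figures quoted for GridION and PromethION are simply the per-flow-cell output multiplied by the number of flow cells run in parallel. A short sketch of that check, using only the numbers listed above (the function name is illustrative):

```python
def total_output_gb(per_flow_cell_gb: float, flow_cells: int) -> float:
    """Aggregate output when every flow cell reaches its maximum yield."""
    return per_flow_cell_gb * flow_cells

print(total_output_gb(30, 5))    # 150 (GridION, 5 flow cells)
print(total_output_gb(158, 48))  # 7584, i.e. about 7.6 Tb (PromethION, 48 flow cells)
```

These are upper bounds: actual yield per flow cell depends on the sample and library quality.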

10x Genomics has developed an instrument, Chromium, that partially overcomes the limitations of short-read sequencing by synthesis (SBS).

This system uses an emulsion PCR method. The emulsion encapsulates, in a droplet containing all the necessary reagents, either a few high-molecular-weight DNA molecules (for “synthetic long-read sequencing”) or a single cell (for “single-cell sequencing”).

This method allows short reads (Illumina) to be assembled by means of a unique barcoding system, which facilitates phasing analysis and the characterization of chromosome structures. It also makes it possible to study the transcriptome of several thousand individual cells in parallel by RNA-seq.

After preparing the libraries, the sequencing is performed on Illumina machines.
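The barcoding idea described above can be sketched as follows: reads that share a barcode came from the same droplet, so grouping by barcode recovers which short reads belong to the same long molecule or the same cell. This is a toy illustration. The read layout (barcode at the start of the read) and the 16-base barcode length are assumptions made for the example, and the function name is hypothetical.

```python
from collections import defaultdict

BARCODE_LENGTH = 16  # assumed barcode length, for illustration only

def group_by_barcode(reads, barcode_length=BARCODE_LENGTH):
    """Group read inserts by the barcode found at the start of each read."""
    groups = defaultdict(list)
    for read in reads:
        barcode, insert = read[:barcode_length], read[barcode_length:]
        groups[barcode].append(insert)
    return dict(groups)

reads = [
    "AAAACCCCGGGGTTTT" + "ACGTAC",
    "TTTTGGGGCCCCAAAA" + "GGATCC",
    "AAAACCCCGGGGTTTT" + "TTGACA",
]
groups = group_by_barcode(reads)
print(groups["AAAACCCCGGGGTTTT"])  # ['ACGTAC', 'TTGACA']
print(len(groups))                 # 2
```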

10x Genomics Chromium

Chromium Controller

Platforms equipped: CNRGH, ICGex, GenomiqueENS, GenomEast, UCAGenomiX, MGX, TGML, PSI2BC, LIGAN, GenoA, GENOMAX, Go@L, iGenSeq, POPS, Genom’IC, ProfileXpert

Chromium iX

Platforms equipped: ICGex, GenomEast

Chromium Connect

Platform equipped: GENOMAX

Visium CytAssist

Platform equipped: GenomEast
