EMERGEN-BioInfo: The digital platform for the French SARS-CoV-2 genomic surveillance and research program

authors

  • Denecker Thomas
  • Messak Imane
  • Mohamed Anliat
  • Antoinat Chiara
  • Le Bars Arthur
  • Tonazzolli Arianna
  • Demaille Benjamin
  • Sand Olivier
  • Gerbes François
  • Rosnet Thomas
  • Bouri Laurent
  • Seiler Julien
  • Charrière Nicole
  • Antoniewski Christophe
  • Bozorgan Anne
  • Castro Alvarez Javier
  • Sudour Jeanne
  • Le Strat Yann
  • Coignard Bruno
  • Amzert Abdelkader
  • Gharbi Nebras
  • Lethimonier Franck
  • Chiapello Hélène
  • Naouar Naira
  • Médigue Claudine
  • Le Corguillé Gildas
  • Salgado David
  • van Helden Jacques

keywords

  • SARS-CoV-2
  • COVID-19
  • Genomic surveillance
  • Health data
  • EMERGEN-Bioinfo

document type

ART

abstract

We present EMERGEN-Bioinfo, the digital platform to collect, process, manage and disseminate viral sequences and non-sensitive metadata, developed in the context of EMERGEN, the French plan for COVID-19 genomic surveillance and research. The bioinformatics platform relies on several components to manage all the steps from raw sequence collection to deposition in international repositories. These include: (1) dedicated storage spaces for each of the 60 teams of the consortium; (2) a data lake gathering all sequences (raw, mapped, consensus genomes, aligned genomic and peptidic sequences); (3) system-level workflows to handle the data flow through all the components of the platform; (4) a COVID-19-specific domain of the national Galaxy server (covid19.usegalaxy.fr); (5) EMERGEN-DB, a database to store and manage non-sensitive metadata and genomic consensus sequences; (6) data brokering services to facilitate metadata management and curation, submission to international repositories (GISAID and ENA) and follow-up of their acceptance status. The EMERGEN-Bioinfo platform is complemented by a high-security digital platform (EMERGEN-HDS) certified for Health Data Storage, which will enable researchers to pair EMERGEN data with patient data from different sources (national COVID-19 and healthcare databases). All the software resources developed for this project will be accessible under an open license and reusable for other national projects (e.g. the ABRomics multi-omics platform for surveillance and research on antimicrobial resistance) or for international cooperation (e.g. sharing with partners of the European bioinformatics infrastructure ELIXIR).
