BGE Research

Data Infrastructure

BGE insists on FAIR – findable, accessible, interoperable and reusable – data. We work closely with the international Barcode of Life Datasystem (BOLD) to ensure access to all our outputs.

For DNA barcodes, sequences are published, annotated and curated in databases for reference and reuse by the scientific community. To this end, we are working with colleagues in Canada to develop the next incarnation of BOLD, strengthening its long-term sustainability and resilience, enhancing its functionality, and integrating it further into the landscape of barcoding resources, notably UNITE and the European Nucleotide Archive (ENA). This will make it easier for data to flow both within the BGE project and beyond.

For whole genome sequences, robust computational workflows are first required to assemble shorter sequences into whole genomes. We are developing computational infrastructure to generate high-quality genome assemblies from a range of sequencing data types. This includes support for manual curation to chromosome level.

The exome is then annotated automatically based on data from transcriptome sequencing using Ensembl pipelines. Third-party annotations generated by the community are also being incorporated. All raw data and final annotated reference genomes for each species are made publicly available via Ensembl Rapid Release and the European Nucleotide Archive. These data services will provide an integrated platform for the collection, aggregation, and sharing of metadata.

“In the first year of the project, we are surveying the needs of the barcoding community towards a roadmap for future infrastructure development, and preparing to deploy BOLD in more hosting environments. In addition, we are mapping the flows of barcoding and metabarcoding data through the project from sequencing to storage and downstream analysis.”

Dr Rutger Vos, Naturalis Biodiversity Center, the Netherlands

The resulting atlas of reference genome sequences – made available through the ERGA Data Portal – will ensure FAIR open access to the genome data generated by the project, supporting scientific, industrial, and regulatory communities across Europe.

“During the first months of the project, a major focus has been developing a tracking tool that allows the project consortia to follow progress and see which actions are required next. The genome and barcoding data teams are working together to align processes and reach a mutual understanding, sharing best practices across BioScan and ERGA activities.”

Dr Katharina Heil, ELIXIR Europe