BGE Research

Data Infrastructure

BGE insists on FAIR – findable, accessible, interoperable and reusable – data. We work closely with the international Barcode of Life Datasystem (BOLD), the European Nucleotide Archive (ENA), and UNITE to ensure access to all our outputs.

For DNA barcodes, sequences are published, annotated and curated in databases for reference and reuse by the scientific community. To this end, we are working with colleagues in Canada to develop the next incarnation of BOLD, strengthening its long-term sustainability and resilience, enhancing its functionality, and integrating it further into the landscape of barcoding resources, notably ENA and UNITE. This will make it easier for data to flow both within the BGE project and beyond.

“The BGE project has surveyed the needs of the barcoding community and established a roadmap for future infrastructure development. This roadmap is coming to fruition through a redesigned BOLD data portal that is deployed in more hosting environments. In addition, we are mapping the flows of barcoding and metabarcoding data through the project from sequencing to storage and downstream analysis.”

Dr Rutger Vos, Naturalis Biodiversity Center

For whole genome sequences, robust computational workflows have been set up to assemble shorter sequences into whole genomes. The supporting computational infrastructure generates high-quality genome assemblies from a range of sequencing data types and includes support for manual curation to chromosome level.

Genomes are then annotated automatically using Ensembl pipelines, based on data from transcriptome sequencing. Third-party annotations generated by the community have also been incorporated. All raw data and final annotated reference genomes for each species are made publicly available via Ensembl Rapid Release and the European Nucleotide Archive. These data services provide an integrated platform for the collection, aggregation, and sharing of metadata.

The resulting atlas of reference genome sequences – made available through the ERGA Data Portal – ensures FAIR open access to the genome data generated by the project, supporting scientific, industrial, and regulatory communities across Europe.

“During the first months of the project, a major focus was developing a tracking tool to allow the project consortia to gain an understanding of progress and next required actions. The genome and barcoding data teams are working together to align processes and get a mutual understanding, sharing best practices across iBOL Europe and ERGA activities.”

“As part of the Data Pillar we have been working jointly, discovering synergies, shared common challenges and identifying potential mutual solutions. This was and remains a good learning experience for members from both the ERGA as well as iBOL focused Work Packages. We are currently working on a publication to share learnings and insights with an even wider group.”

Dr Katharina Heil, ELIXIR Europe