Building shared capacity at the European level

Basics

The BGE project is a foundational joint step aimed at building shared capacity at the European level to accelerate the production of biodiversity genomic data ….

Biodiversity in crisis

Global wildlife populations have declined by an average of 69% since 1970, and an estimated 25% of species are threatened with extinction, including almost 40% of conifers, over a third of flowering plants, amphibians, sharks and rays, and more than a fifth of mammals.

This biodiversity crisis not only impacts wild species, but threatens human lives and livelihoods. For instance, the extinction of a third of the world’s tree species – as currently threatened – has recently been predicted to negatively affect billions of people through loss of livelihoods and benefits. In Europe, pollinators have been identified as a crucial group under threat, the loss of which could have severe impacts on food security and livelihoods.

To combat the global biodiversity crisis we need to first understand the diversity of life on Earth: how many species exist, where they are found, how they function and interact, and how they are responding to the multiple environmental pressures they face.

Despite centuries of research, an estimated 80% of the world’s multicellular species still await scientific discovery and description. Even when species have been described, telling them apart is often difficult, and knowledge of their biology, distributions, variability, inter-dependencies, and conservation status remains patchy and incomplete.

“Camila Mazzoni, BGE deputy director (Genome Sequencing stream) on the application of genomics for biodiversity protection and restoration”

Genomic research for biodiversity

Advances in genomic science provide an important means to address the challenges of describing, identifying and tracking species and their relationships. BGE uses two types of genomic data: whole genome sequences and DNA barcodes.

Genome sequencing and DNA barcoding use many of the same methods and technologies. However, whole-genome sequencing usually relies on samples collected fresh from the wild, ideally frozen in liquid nitrogen until their DNA can be extracted for sequencing. Barcoding – because it sequences much shorter lengths of DNA – can sometimes be achieved with specimens that have been stored in museums or herbaria for many years. This has two advantages: the specimens are already identified, and no costly, time-consuming field collection is needed.

For more information on the rationale and applications of genomic science, see the PNAS special issue ‘The Earth BioGenome Project: The Launch of a Moonshot for Biology’.

Full-Genome Sequencing

DNA is made up of long chains of just four small molecules known as bases, represented by the letters A, C, G and T. Genome sequencing simply determines the unique order of these bases in the DNA of an organism.
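As a small illustration of what a sequence looks like once it has been read, the sketch below treats a DNA sequence as a plain string of the four letters and summarises its composition. The sequence `toy_read` is made up for the example, and the function names are illustrative rather than part of any BGE tool.

```python
from collections import Counter


def base_composition(sequence: str) -> dict[str, int]:
    """Count how often each of the four bases occurs in a DNA sequence."""
    counts = Counter(sequence.upper())
    return {base: counts.get(base, 0) for base in "ACGT"}


def gc_content(sequence: str) -> float:
    """Return the fraction of A/C/G/T bases that are G or C."""
    composition = base_composition(sequence)
    total = sum(composition.values())
    if total == 0:
        return 0.0
    return (composition["G"] + composition["C"]) / total


# Made-up example sequence (an assumption, not real data).
toy_read = "ATGCGTACGTTAGC"
composition = base_composition(toy_read)
gc_fraction = gc_content(toy_read)
```

Real genomes run to millions or billions of bases, but the underlying data is the same kind of ordered string.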

The Earth BioGenome Project aims to crack this code for all species on Earth, delivering fundamental knowledge of how biological systems function and how species respond and adapt to environmental change. The European Reference Genome Atlas (ERGA) is the European arm of the Earth BioGenome Project, representing institutions across Europe that carry out full-genome sequencing. ERGA and many of its members are partners in BGE.

DNA Barcoding

Barcodes are specific, short sequences of DNA from within the genome that can be used to tell organisms apart. These help us to identify individual plants, animals or fungi, and define new species. Barcoding can also be used to document whole communities of organisms such as in soil, air or water samples, and monitor changes in these communities over time. BGE will deliver comprehensive ‘libraries’ of barcodes, against which new samples can be compared. This will speed up species discovery and provide the foundations of a future global bio-surveillance system for biodiversity.
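The idea of comparing a new sample against a reference library can be sketched in a few lines. This is a deliberately simplified illustration: it assumes the sequences are already aligned and of equal length, the species names and sequences are invented, and the 0.9 identity threshold is an arbitrary value chosen for the toy data. Real barcode identification relies on dedicated alignment and search tools.

```python
def percent_identity(a: str, b: str) -> float:
    """Share of matching positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


def best_match(query: str, library: dict[str, str], threshold: float = 0.9):
    """Return (name, identity) for the closest reference barcode.

    The name is None when even the closest reference falls below the threshold,
    which would flag the sample for closer inspection.
    """
    name, reference = max(
        library.items(), key=lambda item: percent_identity(query, item[1])
    )
    score = percent_identity(query, reference)
    return (name if score >= threshold else None, score)


# Hypothetical reference library with invented sequences.
reference_library = {
    "Species A": "ACGTACGTACGTACGTACGT",
    "Species B": "ACGTTCGAACGTTCGAACGT",
}

# A query that differs from "Species A" at a single position.
match_name, match_score = best_match("ACGTACGTACGTACGTACGA", reference_library)
```

A sample with no sufficiently close match in the library is a candidate for further study, which is why the completeness of the reference library matters.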

Examples of barcoding uses include the identification of invasive non-native pest species, monitoring ecosystem health through stream invertebrate communities, combating wildlife crime, and characterising vertebrate populations.

Digital Sequence Information

It is very important that scientific knowledge is accessible to all, and that the benefits of new biological knowledge – scientific and commercial – are shared fairly among all involved, including the nations and communities within which any specimens originated. This is called ‘access and benefit sharing’. Unlike physical biological resources (such as animal or plant specimens in zoos, museums, herbaria and seed banks), Digital Sequence Information (DSI) – which includes full-genome sequences and DNA barcodes – has not previously been included in international legal frameworks on access and benefit sharing, such as the Nagoya Protocol.

At the fifteenth Conference of the Parties to the UN Convention on Biological Diversity (COP15) in 2022, nations reached an agreement to create a system for benefit sharing from the use of DSI – including a global fund for the distribution of financial benefits. This mechanism is to be finalised at COP16, in 2024.

Read the full COP15 agreement on DSI here (pdf).

Biodiversity genomics workflow

Both DNA barcoding and genome sequencing start with extracting DNA from a specimen. First the sample must be physically broken up to release molecules from the cells. This can be done by hand using a tiny pestle (usually for fresh samples) or in a machine (usually for dried specimens). Sand can be added to help grind the material, liquid nitrogen can be used to freeze it and make it more brittle, and chemical ‘buffers’ can help stop the DNA being broken during the process.

The DNA must then be precipitated out of the solution – which can be achieved by spinning at high speed in a centrifuge with an alcohol such as isopropanol – and washed. For some groups of organisms, such as plants, additional treatment may be needed to remove other molecules that might stick to the DNA. Commercial DNA extraction kits can also be used to simplify this process.

Electrophoresis – which separates the molecules in a solution by their size and electrical charge – is then used to check the quality and quantity of the DNA obtained.

Next-generation sequencing (NGS) methods – which can read the sequence of many thousands of DNA pieces at once – have greatly sped up the sequencing of whole genomes. Several methods are available, including those created by Illumina, Oxford Nanopore and Pacific Biosciences.

Most NGS methods first require the creation of a ‘library’ of small fragments of DNA from the extracted sample, which together represent the whole genome of the organism. Each of the fragments in the library is treated (e.g., by the addition of small ‘adaptor’ sequences to its ends) to enable it to be ‘read’ by the sequencing machine.
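The library concept can be pictured with a toy model. The sketch below cuts a sequence into fixed-length pieces and attaches a short adaptor to each end. The adaptor sequences `ADAPTOR_5` and `ADAPTOR_3` are placeholders invented for this example, and the even, non-overlapping cuts are a simplifying assumption: in the laboratory, fragmentation is not this regular and the adaptors are platform-specific.

```python
# Placeholder adaptor sequences (hypothetical, not those of any real platform).
ADAPTOR_5 = "AAAA"
ADAPTOR_3 = "TTTT"


def fragment(genome: str, fragment_length: int) -> list[str]:
    """Cut a sequence into consecutive pieces of at most fragment_length bases."""
    if fragment_length <= 0:
        raise ValueError("fragment_length must be positive")
    return [
        genome[i:i + fragment_length]
        for i in range(0, len(genome), fragment_length)
    ]


def build_library(genome: str, fragment_length: int) -> list[str]:
    """Attach an adaptor to both ends of every fragment."""
    return [ADAPTOR_5 + piece + ADAPTOR_3 for piece in fragment(genome, fragment_length)]


# Made-up ten-base "genome" cut into pieces of up to four bases.
toy_library = build_library("ACGTACGTAC", 4)
```

Taken together, the fragments still cover the whole input sequence, which is the property the library needs to have.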

For barcoding, only small, specific regions of the genome need to be sequenced. The region sequenced for barcoding will depend on the organism: for animals usually a region called COX1; for plants usually a combination of multiple regions.

Fresh barcode samples are sequenced using a process called ‘amplicon sequencing’, in which we amplify the specific barcode regions (amplicons) and then use NGS to sequence them. For the more fragmented DNA typically obtained from museum and herbarium samples, more specialised methods are needed. We are constantly improving our techniques to extract high-quality DNA, even from centuries-old specimens.
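The targeting step in amplicon sequencing can be illustrated in software: given a template sequence and a pair of primers, find the region they bracket. This is only a sketch of the selection logic, not of the laboratory amplification itself. The template and primer sequences are invented, and the function requires exact primer matches, which real primer binding does not.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence: str) -> str:
    """Return the sequence of the opposite DNA strand, read in its own direction."""
    return sequence.translate(COMPLEMENT)[::-1]


def extract_amplicon(template: str, forward_primer: str, reverse_primer: str):
    """Return the region bracketed by the two primers (primers included), or None.

    The reverse primer binds the opposite strand, so on the template it appears
    as its reverse complement, downstream of the forward primer.
    """
    start = template.find(forward_primer)
    if start == -1:
        return None
    reverse_site = reverse_complement(reverse_primer)
    end = template.find(reverse_site, start + len(forward_primer))
    if end == -1:
        return None
    return template[start:end + len(reverse_site)]


# Invented template: flanking sequence, primer sites and a short target region.
toy_template = "TTTT" + "ACGGA" + "CCCCCC" + "TTGCA" + "GGGG"
toy_amplicon = extract_amplicon(toy_template, "ACGGA", "TGCAA")
```

If either primer site is missing, for example because the DNA has broken within the target region, no amplicon is returned. This mirrors the difficulty of applying the approach to fragmented DNA from old specimens.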

The infographic below shows the entire workflow of genomic research, from field sampling to application.

DNA Barcoding and Full Genome Sequencing Workflow