Methods

Acoustic datasets will be used to predict the spatio-temporal distribution of migrating bats in Europe.

Why acoustics?
Material needed
How to handle multiple recorders and settings?
Species ID
Working with very large datasets
Data Management Plan

Why acoustics?

Recordings of bat echolocation calls are easy to collect

Material has become relatively cheap
Setting up a passive recorder in the field is quick and easy
Data processing can be automated
The process is not intrusive (no need to capture or disturb bats)

Acoustic recordings provide quality data to model spatial distribution

Can provide true absence data for most species
The number of bat passes per night is a proxy for population density, which is more informative than presence/absence data

Material needed

For this project, acoustic recordings from anywhere in geographical Europe are welcome. It will be important to obtain recordings from areas with no activity of the target species, as well as from areas with medium and high activity of these species.

Data

Full-night and full-spectrum recordings
WAV or RAW files, or acoustic parameters provided by the TADARIDA software (.ta files)

Metadata required

Detector type and settings
Coordinates
Date
Type of study (ground-level, wind turbine, wind mast, building…)
Microphone height

Data filtering (recordings excluded from the analysis)

Poor-quality microphones
Roost surveys
Partial night recordings

How to handle multiple recorders and settings?

This project offers an opportunity to review the recorders and settings most commonly used in Europe.

This review will help identify ways to work on a common dataset, for example:
- Field tests to determine which settings and recorders give similar results
- Counting bat passes over a larger time interval (e.g. positive minutes of activity; Miller 2001, Haquart 2012), which removes much of the variability due to the equipment used
- Including setting and recorder as inputs in the models to take this variability into account
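As an illustration of the positive-minute approach, the sketch below counts, for each species and night, the number of one-minute intervals containing at least one detection. It is a minimal example, not the project's actual pipeline: the function names, the input format (pairs of species and timestamp) and the convention of assigning detections made before noon to the previous evening's night are all assumptions.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def night_of(timestamp):
    """Return the date of the evening on which the recording night started.

    Assumed convention: shifting by 12 hours maps detections made after
    midnight onto the previous calendar date.
    """
    return (timestamp - timedelta(hours=12)).date()


def positive_minutes(detections):
    """Count minutes with at least one detection per (species, night).

    `detections` is an iterable of (species, datetime) pairs (hypothetical format).
    """
    minutes = defaultdict(set)
    for species, timestamp in detections:
        # Truncate to the minute so that several passes in one minute count once.
        minute = timestamp.replace(second=0, microsecond=0)
        minutes[(species, night_of(timestamp))].add(minute)
    return {key: len(value) for key, value in minutes.items()}


# Made-up detections for illustration only.
detections = [
    ("Nyctalus noctula", datetime(2020, 9, 1, 22, 15, 3)),
    ("Nyctalus noctula", datetime(2020, 9, 1, 22, 15, 41)),  # same minute as above
    ("Nyctalus noctula", datetime(2020, 9, 2, 1, 2, 10)),    # after midnight, same night
    ("Eptesicus serotinus", datetime(2020, 9, 1, 23, 0, 0)),
]
counts = positive_minutes(detections)
```

Because two recorders that split the same bat pass into a different number of files still register the same positive minute, this measure is less sensitive to trigger settings than a raw count of files.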

Species ID

Human identification of bat echolocation calls can be of very good quality, but it also introduces observer bias (i.e. two observers might not agree on the identity of the same sound sequence). The quantity of data and the number of observers in this project make it impossible to control observer bias. This is why we chose to use automatic ID.

Although not perfect, automatic ID makes it possible to assess the error rate, which can be used to filter data (e.g. all sound sequences with an error probability greater than 0.1 will not be used) or to weight data (e.g. sound sequences with a higher error probability will be assigned a lower weight). Moreover, automatic ID allows the whole dataset to be re-analysed in a reproducible way in the future, using this process:

Extract sound parameters with TADARIDA (Bas et al., 2017, open source) and make ID predictions
The classifier proposes an ID along with an error probability
Account for uncertainty in automatic ID by assessing the robustness of results for different confidence indexes (see Barré et al. 2019)
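The two uses of the error probability described above, filtering and weighting, can be sketched as follows. This is an illustrative example only: the record layout (`species`, `error_prob`) is hypothetical and does not reflect the TADARIDA output format, and the weight `1 - error_prob` is just one simple choice among many that give lower weights to less certain identifications.

```python
def filter_by_error(sequences, max_error=0.1):
    """Keep only sequences whose error probability does not exceed `max_error`."""
    return [seq for seq in sequences if seq["error_prob"] <= max_error]


def confidence_weight(sequence):
    """Assumed weighting scheme: the higher the error probability, the lower the weight."""
    return 1.0 - sequence["error_prob"]


# Made-up classifier output for illustration only.
sequences = [
    {"species": "Nyctalus noctula", "error_prob": 0.02},
    {"species": "Nyctalus noctula", "error_prob": 0.10},
    {"species": "Eptesicus serotinus", "error_prob": 0.35},
]

kept = filter_by_error(sequences, max_error=0.1)      # option 1: discard uncertain IDs
weights = [confidence_weight(seq) for seq in sequences]  # option 2: keep all, weight them
```

Filtering is simpler but discards data. Weighting keeps every sequence, provided the downstream model can accept observation weights.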

Modelling

Species distribution will be predicted with random forest models. Connectivity between hotspots of activity will be inferred using randomized shortest paths.

Working with very large datasets

In this project, a very large dataset of acoustic recordings will be analysed. This context makes it possible to work with some noise in the data:

Variability introduced into the data by the measurement or analysis method is called noise.

e.g. Error in species ID

Noise is not always a problem: by the law of large numbers, random noise that is unrelated to the studied variables tends to average out as the quantity of data collected increases.
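A small simulation can make this concrete. The sketch below assumes a fixed true activity level and adds independent Gaussian noise to each night, then compares how far the estimated mean falls from the truth with few and with many nights. All numbers (true rate, noise level, sample sizes) are arbitrary assumptions chosen for illustration.

```python
import random
import statistics


def mean_estimation_error(true_rate, noise_sd, n_nights, n_repeats, rng):
    """Average absolute error of the estimated mean over repeated simulated surveys."""
    errors = []
    for _ in range(n_repeats):
        # Each night's observation is the true rate plus independent random noise.
        observed = [true_rate + rng.gauss(0.0, noise_sd) for _ in range(n_nights)]
        errors.append(abs(statistics.fmean(observed) - true_rate))
    return statistics.fmean(errors)


rng = random.Random(42)  # fixed seed so the sketch is reproducible
error_small = mean_estimation_error(20.0, 5.0, n_nights=10, n_repeats=200, rng=rng)
error_large = mean_estimation_error(20.0, 5.0, n_nights=1000, n_repeats=200, rng=rng)

print(f"mean error with 10 nights:   {error_small:.3f}")
print(f"mean error with 1000 nights: {error_large:.3f}")
```

The error shrinks roughly with the square root of the number of nights, but only because the simulated noise is independent of everything else. The same does not hold for noise that is correlated with the variables under study.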

Bias appears in results when the noise is correlated with the studied variables.

e.g. If the Common Noctule is more often mistaken for the Serotine bat in forests than in meadows, a study of the influence of habitat on activity could give biased results.

However, if the correlation is small enough, this bias is negligible. If the correlation can be measured and is not too strong, the bias can be corrected through modelling.
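The habitat example can be simulated to show both the bias and a simple correction. In the sketch below, true activity is identical in the two habitats, but passes are misidentified more often in forests. The misidentification rates are invented for illustration, and dividing by the retention rate is only the simplest possible correction. It assumes the rates are known and ignores passes of other species wrongly labelled as the target species.

```python
import random


def observed_passes(true_passes, misid_rate, rng):
    """Number of passes still labelled as the true species after random misidentification."""
    return sum(1 for _ in range(true_passes) if rng.random() >= misid_rate)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
true_passes = 10_000    # identical true activity in both habitats (assumption)
misid = {"forest": 0.30, "meadow": 0.05}  # assumed habitat-dependent error rates

# Naive counts: the forest appears to have less activity, although it does not.
observed = {habitat: observed_passes(true_passes, rate, rng)
            for habitat, rate in misid.items()}

# If the misidentification rates have been measured, the counts can be rescaled.
corrected = {habitat: observed[habitat] / (1.0 - misid[habitat])
             for habitat in misid}

for habitat in misid:
    print(habitat, observed[habitat], round(corrected[habitat]))
```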

With large datasets, it is therefore important to repeat the analysis with different error-probability thresholds in the automatic ID, to assess the robustness of results (Barré et al. 2019).
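A robustness check of this kind can be sketched as a loop over thresholds. The example below recomputes a deliberately simple result (the number of retained sequences per habitat) at three thresholds and checks whether the qualitative conclusion stays the same. The data, the thresholds and the "more activity in forest than in meadow" conclusion are all invented for illustration. A real analysis would refit the full model at each threshold.

```python
def activity_by_habitat(sequences, max_error):
    """Count retained sequences per habitat for a given error-probability threshold."""
    counts = {}
    for habitat, error_prob in sequences:
        if error_prob <= max_error:
            counts[habitat] = counts.get(habitat, 0) + 1
    return counts


# Made-up (habitat, error probability) pairs for one species.
sequences = [
    ("forest", 0.02), ("forest", 0.04), ("forest", 0.08), ("forest", 0.30),
    ("meadow", 0.01), ("meadow", 0.45),
]

thresholds = (0.5, 0.1, 0.05)  # from permissive to strict (assumed values)
results = {t: activity_by_habitat(sequences, t) for t in thresholds}

# The conclusion is considered robust here if it holds at every threshold.
robust = all(r.get("forest", 0) > r.get("meadow", 0) for r in results.values())
```

If a result changes direction or disappears when the threshold is tightened, it may be driven by identification errors rather than by the bats themselves.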

Data Management Plan

Collecting the data will require effort from partners and from the coordination team. These data are complex (multiple metadata to take into account), large (especially the sound files) and numerous. This is why a Data Management Plan was written, to ensure that data management follows the most up-to-date practices and international recommendations. It also guarantees that the data are handled in a predictable and secure way.