Section 1 Validating the Residence Patch Method with Calibration Data

Here we show how the residence patch method (Barraquand and Benhamou 2008; Bijleveld et al. 2016; Oudman et al. 2018) accurately estimates the duration of known stops in a track collected as part of a calibration exercise in the Wadden Sea. These data can be accessed from the data folder at this link: https://doi.org/10.5281/zenodo.4287462. They are reported more fully in Beardsworth et al. (2021).

1.1 Outline of Cleaning Steps

We begin by preparing the libraries we need, and installing atlastools from GitHub. After installing atlastools, we visualise the data to check for location errors, and find a single outlier position approximately 15 km away from the study area (Fig. 1.1, 1.2). This outlier is removed by filtering on the X coordinate using the function atl_filter_bounds; positions with an X coordinate < 645,000 in the UTM 31N coordinate reference system were removed (n = 1; remaining positions = 50,815; Fig. 1.2). We then calculate the incoming and outgoing speed, as well as the turning angle at each position, using the functions atl_get_speed and atl_turning_angle respectively, as a precursor to targeting large-scale location errors in the form of point outliers. We use the function atl_filter_covariates to remove positions with incoming or outgoing speeds \(\geq\) 15 m/s, the fastest boat speed during the experiment (n = 13,491, 26.5%; remaining positions = 37,324, 73.5%; Fig. 1.3; main text Fig. 7.b). We then target small-scale location errors by applying a median smoother with a moving window size \(K\) = 5 using the function atl_median_smooth (Fig. 1.4; main text Fig. 7.c); smoothing does not reduce the number of positions. Finally, we thin the data to a 30 second interval, leaving 1,803 positions (4.8% of the smoothed track).

1.2 Install atlastools from GitHub

atlastools is available from GitHub and is archived on Zenodo (Gupte 2020). It can be installed using remotes or devtools. Here we use the remotes function install_github.
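
A minimal sketch of the installation follows; the repository path pratikunterwegs/atlastools is our assumption here, so verify it against the package's Zenodo record before use.

# install the remotes package if it is not already available
install.packages("remotes")

# install atlastools from GitHub;
# the repository path is assumed, verify before use
remotes::install_github("pratikunterwegs/atlastools")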

A Note on :=

The atlastools package is built on data.table for speed and efficiency (Dowle and Srinivasan 2020). A key feature of data.table is modification in place, where data are changed without making a copy. The syntax will be familiar to many users from the base R equivalent, data_frame$column_name <- values.

The data.table way of writing this assignment would be data_frame[, column_name := values]. We use this syntax throughout, as it provides many useful shortcuts, such as multiple assignment:

data_frame[, c("col_a", "col_b") := list(values_a, values_b)]

Users can adopt this special syntax, and will find it convenient with practice, but there are no cases where the data.table syntax is required; the data can simply be treated as a regular data.frame. However, users are advised to convert their data.frame to a data.table using the function data.table::setDT().
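
A brief sketch of the conversion and of the := assignment; the object tracking_data and its columns are hypothetical.

library(data.table)

# a hypothetical data.frame of tracking covariates
tracking_data <- data.frame(speed = c(1.5, 2.0, 0.5))

# convert a data.frame to a data.table in place
setDT(tracking_data)

# base R style assignment, which also works on a data.table
tracking_data$speed_kmph <- tracking_data$speed * 3.6

# the equivalent data.table assignment, modifying in place
tracking_data[, speed_kmph := speed * 3.6]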

1.4 Access data and preliminary visualisation

First we access the data from a local file using the data.table package (Dowle and Srinivasan 2020).

In all, we aim to keep three versions of the data: (1) data_raw, the entirely unprocessed data, (2) data, the working version, and (3) data_unproc, data that has been partially processed, but which is one step behind data. This allows us to better illustrate the pre-processing steps, and prevents us from irreversibly modifying our data — at best, we would have to re-run many pre-processing steps, and at worst, we might overwrite the original data on disk.
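
A minimal sketch of this step, assuming a hypothetical file name for the calibration data downloaded from the repository linked above.

library(data.table)
library(atlastools)

# read the raw data; the file name is hypothetical
data_raw <- fread("data/watlas_data.csv")

# a working copy, and a copy that will lag one step behind
data <- copy(data_raw)
data_unproc <- copy(data_raw)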

We look at the first few rows, using head(). We then visualise the raw data.

Here we show how data can be easily visualised using the popular plotting package ggplot2. Note that we plot both the points (geom_point) and the inferred path between them (geom_path), and specify a geospatial coordinate system in metres, suitable for the Dutch Wadden Sea (UTM 31N; EPSG code 32631; coord_sf). We save the output to file for future reference.
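
A sketch of such a plot, assuming coordinate columns named x and y, and a hypothetical output path:

library(ggplot2)

# plot points and the inferred path, in the UTM 31N system
fig_raw <- ggplot(data, aes(x, y)) +
  geom_path(colour = "grey70") +
  geom_point(size = 0.3) +
  coord_sf(crs = 32631)

# save to file for future reference; the path is hypothetical
ggsave("figures/fig_raw_data.png", fig_raw)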

Since plot code can become very lengthy and complicated, we omit showing further plot code in versions of this document rendered as PDF or HTML; it can however be seen in the online .Rmd version.

The raw data from a calibration exercise conducted around the island of Griend in the Dutch Wadden Sea. A handheld WATLAS tag was used to examine how ATLAS data compared to GPS tracks, and we use the WATLAS data here to demonstrate the basics of the pre-processing pipeline, as well as validate the residence patch method. It is immediately clear from the figure that the track shows location errors, both in the form of point outliers as well as small-scale errors around the true location.

1.5 Filter by bounding box

We first save a copy of the data, so that we can plot the unprocessed data with the cleaned data plotted over it for comparison. Here, data_unproc, data, and data_raw are still the same, since no pre-processing steps have been applied yet.

We then filter by a bounding box in order to remove the point outlier to the far south east of the main track. We use the atl_filter_bounds function with the x_range argument, to which we pass the limit in the UTM 31N coordinate reference system. This limit is used to exclude all points with an X coordinate < 645,000.
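
A sketch of this call, assuming coordinate columns named x and y and default argument behaviour; the upper limit of the range is taken from the data themselves:

# retain positions with an X coordinate of at least 645,000;
# column names x and y are assumed
data <- atl_filter_bounds(
  data = data,
  x = "x", y = "y",
  x_range = c(645000, max(data$x))
)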

We then plot the result of filtering, with the excluded point in black, and the points that are retained in green. After this stage, data is filtered and ‘ahead’ of data_raw and data_unproc, which are still the same. This pattern will repeat throughout this material.

Removal of a point outlier using the function atl_filter_bounds. The point outlier (black point) is removed based on its X coordinate value, with the data filtered to exclude positions with an X coordinate < 645,000 in the UTM 31N coordinate system. Positions that are retained are shown in green.

1.6 Filter trajectories

1.6.1 Handle time

Time in ATLAS tracks is represented by 64-bit integers (type long) that specify time in milliseconds, starting from the beginning of 1970 (the UNIX epoch). This representation of time is called POSIX time and is usually specified in seconds, not milliseconds.

Since about 1.6 billion seconds have passed since the beginning of 1970, current POSIX times in milliseconds cannot be represented by R’s built-in 32-bit integers. A naive conversion results in truncation of out-of-range numbers leading to huge errors (dates many thousands of years in the future).

R does not natively support 64-bit integers. One option is to use the bit64 package, which adds 64-bit integer support to R.

A simpler solution is to convert the times to R’s built in double data type (also called numeric), which uses a 64-bit floating point representation. This representation can represent integers with up to 16 digits without error; we only need 13 digits to represent the number of milliseconds since 1970, so the conversion is error free. We can also perform the conversion and then divide by 1000 so that times are represented in seconds, not milliseconds; this simplifies speed estimation.

If second resolution is accurate enough (it is for our purposes), the solution that we use is to divide the times by 1000, reducing the resolution from milliseconds to seconds, and then to convert the timestamps to R integers. In the spirit of not destroying data, we create a second, lower-case column called time to store the result.
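
A sketch of the conversion, assuming the original millisecond timestamps are held in an upper-case column named TIME:

# store times in seconds as a new, lower-case column;
# TIME is assumed to hold milliseconds since the UNIX epoch
data[, time := as.integer(as.numeric(TIME) / 1000)]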

1.6.4 Filter on speed

Here we use a speed threshold of 15 m/s, the fastest known boat speed. We then plot the data with the extreme speeds shown in grey, and the positions retained shown in green.

Here, data_unproc moves ‘ahead’ of data_raw, and holds the data filtered by a bounding box — data is also moving ahead, and will be filtered on speed.
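
A sketch of these steps; we assume atl_get_speed and atl_filter_covariates use the column names x, y, and time by default:

# data_unproc now holds the bounding-box filtered data
data_unproc <- copy(data)

# incoming and outgoing speeds in m/s, since time is in seconds
data[, `:=`(
  speed_in = atl_get_speed(data, type = "in"),
  speed_out = atl_get_speed(data, type = "out")
)]

# remove positions with speeds at or above 15 m/s
data <- atl_filter_covariates(
  data = data,
  filters = c("speed_in < 15", "speed_out < 15")
)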

Improving data quality by filtering out positions that would require unrealistic movement. We removed positions with speeds \(\geq\) 15 m/s, which is the fastest possible speed in this calibration data, part of which was collected in a moving boat around Griend. Grey positions are removed, while green positions are retained. Rectangles indicate areas expanded for visualisation in following figures.

1.7 Smoothing the trajectory

We then apply a median smooth over a moving window (\(K\) = 5). This function modifies in place, and does not need to be assigned to a new variable. We create a copy of the data before applying the smooth so that we can compare the data before and after smoothing.
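
A sketch of this step; the argument name moving_window is our assumption, so check the function documentation:

# keep the speed-filtered data one step behind
data_unproc <- copy(data)

# median smooth over a moving window of 5 positions;
# the function modifies data in place, so no assignment is needed
atl_median_smooth(
  data = data,
  x = "x", y = "y", time = "time",
  moving_window = 5
)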

Reducing small-scale location error using a median smooth with a moving window \(K\) = 5. Median smoothed positions are shown in green, while raw, unfiltered data is shown in grey. Median smoothing successfully recovers the likely path of the track without a loss of data. The area shown is the upper rectangle from Fig. 1.3.

1.8 Thinning the data

Next, we thin the data by aggregation over a 30 second interval, to demonstrate thinning after median smoothing. Following this, we plot the median smoothed and the thinned data together.
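
A sketch of this step, assuming the atlastools thinning function atl_thin_data with its interval and method arguments:

# thin by aggregation over 30 second intervals
data_thin <- atl_thin_data(
  data = data,
  interval = 30,
  method = "aggregate"
)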

Thinning by aggregation over a 30 second interval (down from 1 second) preserves track structure while reducing the data volume for computation. Here, thinned positions are shown as purple squares, with the size of the square indicating the number of positions within the 30 second bin used to obtain the average position. Green points show the median smoothed data from Fig. 1.4, while the raw data are shown in grey. The area shown is the upper rectangle in Fig. 1.3.

1.9 Residence patches

1.9.1 Get waypoint centroids

We subset the annotated calibration data to select the waypoints and the positions around them, which represent the locations of known stops. Since each stop was intended to be 5 minutes long, there are multiple positions in each known stop.

From these data, we get the centroid of each known stop, and determine the time difference between the first and last point that is within 50 metres of the centroid, and within 10 minutes of the waypoint positions’ median time.

Essentially, this means that the maximum duration of a stop can be 20 minutes, and stops above this duration are not expected.

# get centroid
data_res_summary <- data_res[, list(
  nfixes_real = .N,
  x_median = median(x),
  y_median = median(y),
  t_median = median(time)
),
by = "tID"
]

# now get times 10 mins before and after
data_res_summary[, c("t_min", "t_max") := list(
  t_median - (10 * 60),
  t_median + (10 * 60)
)]

# manually get the duration of the stops
wp_data <- mapply(function(l, u, mx, my) {

  # first select all data whose timestamp is between
  # the upper and lower bounds of the stop (l = lower, u = upper)
  tmp_data <- data_unproc[inrange(time, l, u), ]

  # calculate the distance between the positions selected above
  # and the median X and Y coordinates of the stop (centroid)
  tmp_data[, distance := sqrt((mx - x)^2 + (my - y)^2)]

  # keep positions that are within 50m of the centroid
  tmp_data <- tmp_data[distance <= 50, ]

  # get the duration of the stop as the difference between
  # the minimum and maximum times of the positions retained above
  return(diff(range(tmp_data$time)))
}, data_res_summary$t_min, data_res_summary$t_max,
data_res_summary$x_median, data_res_summary$y_median,

# this specifies that a vector, rather than a list, is returned
SIMPLIFY = TRUE
)

# get waypoint summary --- rounding median coordinates to the nearest 100m
patch_summary_real <- data_res_summary[, list(
  nfixes_real = nfixes_real,
  x_median = round(median(x_median), digits = -2),
  y_median = round(median(y_median), digits = -2)
),
by = "tID"
]

# add real duration
patch_summary_real[, duration_real := wp_data]

# write to file
fwrite(patch_summary_real, "data/data_real_watlas_stops.csv")

1.9.2 Prepare data

First, we filter data where we know the animal (or in this case, the human-carried tag) spent some time at or near a position, as this is the first step to identify residence patches. One way of doing this is by filtering out positions with speeds above which the tag (ideally on an animal) is likely to be in transit. Rather than filtering on instantaneous speed estimates, filtering on a median smoothed speed estimate is more reliable.

1.9.3 Exclude transit points

Here, we aim to remove locations where the tag is clearly moving, by filtering on smoothed speed, using a one-way median smooth with \(K\) = 5. The speeds between points must be recalculated here because the speed metrics now associated with the data refer to the raw data before median smoothing.
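
A sketch of these steps, working on the thinned data from above; using data.table's frollapply for the one-way (trailing) median is our choice here, not necessarily the package's internal approach:

# recalculate speeds on the thinned, smoothed track
data_thin[, `:=`(
  speed_in = atl_get_speed(data_thin, type = "in"),
  speed_out = atl_get_speed(data_thin, type = "out")
)]

# one-way (trailing) median smooth of incoming speed over K = 5;
# frollapply aligns the window to the right by default
data_thin[, speed_smooth := frollapply(speed_in, n = 5, FUN = median)]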

1.9.4 Run residence patch method

We subset data with a smoothed speed < 2 m/s in order to construct residence patches. From this subset, we construct residence patches using the parameters: buffer_radius = 5 metres, lim_spat_indep = 50 metres, lim_time_indep = 5 minutes, and min_fixes = 3.
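
A sketch of these two steps; the parameters are those given above, and we assume lim_time_indep is specified in minutes:

# keep positions with a smoothed speed below 2 m/s
data_stationary <- atl_filter_covariates(
  data = data_thin,
  filters = c("speed_smooth < 2")
)

# construct residence patches using the parameters from the text
patches <- atl_res_patch(
  data = data_stationary,
  buffer_radius = 5,
  lim_spat_indep = 50,
  lim_time_indep = 5,
  min_fixes = 3
)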

A note on summary statistics

Users specifying a summary_variable should make sure that the variable for which they want a summary statistic is present in the data. For instance, users requesting the mean speed by passing summary_variable = “speed” and summary_function = “mean” to atl_res_patch should make sure that their data include a column called speed.

1.9.5 Get spatial and summary objects

Having classified slow-moving or stationary behavioural bouts into residence patches, many animal ecologists would most probably wish to know something about the environment at or around these patches — more accurately, around the point locations classified into patches.

How exactly this is done depends on the relative spatial scales of the residence patches and the resolution of the environmental data layer. For instance, a residence patch some 40m – 50m wide or long may be overlaid on an environmental raster layer with a resolution of 250m. In this case, sampling the layer at the centroid of the patch is as good as sampling at all the patch’s points – the mean is unlikely to differ (except at raster pixel boundaries).

On the other hand, a raster with a 10m resolution (e.g. Sentinel 1 and 2 data) may be worthwhile to sample at all the locations comprising a residence patch, so as to calculate the mean and variance of environmental conditions.

Furthermore, many (if not all) animals integrate cues from quite a distance (10m – 100m) when making decisions on when to settle in an area, and when to leave. Thus it can also be useful to sample environmental layers not at point locations, but to extract the mean and variance from an area, or a buffer, around the animal’s point locations.

We have provided a convenient function to get either (1) the points (classified into patches), (2) a summary output of the residence patches (i.e., the median coordinates and their attributes), or finally (3) a spatial buffer around the points from (1). This function, atl_patch_summary, implements these options using the which_data argument, where the options are (1) “points”, (2) “summary”, or (3) “spatial”.

Here, we choose option (3), using a spatial buffer of 20 m. The distance of the buffer is passed to the argument buffer_radius.
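
A sketch of this call; the name of the first argument, patch_data, is our assumption:

# a spatial (sf) buffer of 20 m around the patch points
patch_sf_data <- atl_patch_summary(
  patch_data = patches,
  which_data = "spatial",
  buffer_radius = 20
)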

At this stage, users have successfully pre-processed their data from raw positions to residence patches. Residence patches are essentially sf objects and can be visualised using the sf method for plot; for instance plot(patch_sf_data). Further sections reproduce the analyses in the main manuscript.


1.9.6 Prepare to plot data

We read in the island’s shapefile to plot it as a background for the residence patch figure.

Classifying thinned data into residence patches yields robust estimates of the duration of known stops. The island of Griend (53.25\(^{\circ}\)N, 5.25\(^{\circ}\)E) is shown in beige. Residence patches (green polygons; function parameters in text) correspond well to the locations of known stops (purple triangles). However, the algorithm identified all areas with prolonged residence, including those which were not intended stops (n = 12; green polygons without triangles). The field station on Griend (red triangle) was not intended to be a stop, but the tags were stored here before the trial, and the method correctly picked up this prolonged stationary data as a residence patch. The algorithm failed to find two stops of 6 and 15 seconds duration, since these were lost in the data thinning step (purple triangle without green polygon shows one of these). The area shown is the lower rectangle in Fig. 1.3.

1.10 Compare patch metrics

We filter these data to exclude one exceedingly long outlier of about an hour (WP080).

We add data from the known patches, matching by X and Y median.

Seven patches are identified where there are no waypoints, while two waypoints are not identified as patches. These waypoints consisted of 6 and 15 positions (WP098 and WP092, respectively), and were lost when the data were aggregated to 30 second intervals.

1.11 Main text Figure 7

See main text for Figure 7. Plotting code is not shown in the PDF and HTML versions; see the .Rmd file.

References

Barraquand, Frédéric, and Simon Benhamou. 2008. “Animal Movements in Heterogeneous Landscapes: Identifying Profitable Places and Homogeneous Movement Bouts.” Ecology 89 (12): 3336–48. https://doi.org/10.1890/08-0162.1.

Beardsworth, Christine E., Evy Gobbens, Frank van Maarseveen, Bas Denissen, Anne Dekinga, Ran Nathan, Sivan Toledo, and Allert I. Bijleveld. 2021. “Validating a High-Throughput Tracking System: ATLAS as a Regional-Scale Alternative to GPS.” bioRxiv, February. Cold Spring Harbor Laboratory, 2021.02.09.430514. https://doi.org/10.1101/2021.02.09.430514.

Bijleveld, Allert Imre, Robert B MacCurdy, Ying-Chi Chan, Emma Penning, Richard M. Gabrielson, John Cluderay, Erik L. Spaulding, et al. 2016. “Understanding Spatial Distributions: Negative Density-Dependence in Prey Causes Predators to Trade-Off Prey Quantity with Quality.” Proceedings of the Royal Society B: Biological Sciences 283 (1828): 20151557. https://doi.org/10.1098/rspb.2015.1557.

Dowle, Matt, and Arun Srinivasan. 2020. data.table: Extension of ‘data.frame’. Manual.

Gupte, Pratik Rajan. 2020. “Atlastools: Pre-Processing Tools for High Frequency Tracking Data.” Zenodo. https://doi.org/10.5281/ZENODO.4033154.

Oudman, Thomas, Theunis Piersma, Mohamed V. Ahmedou Salem, Marieke E. Feis, Anne Dekinga, Sander Holthuijsen, Job ten Horn, Jan A. van Gils, and Allert I. Bijleveld. 2018. “Resource Landscapes Explain Contrasting Patterns of Aggregation and Site Fidelity by Red Knots at Two Wintering Sites.” Movement Ecology 6 (1): 24–24. https://doi.org/10.1186/s40462-018-0142-4.