Thin tracking data by resampling or aggregation. — atl_thin

Uniformly reduce data volumes by either subsampling or aggregation (chosen with the method argument) over a fixed interval, in seconds, set with the interval argument. Both options make two important assumptions: (1) that timestamps are stored in a column named time, and (2) that all columns except the identity columns can be averaged in R.

The subsample option returns a thinned dataset with all columns from the input data, whereas the aggregate option drops the column COVXY, since it cannot be propagated to the averaged position. The two options also handle the time column differently: subsample returns the actual timestamp (in UNIX time) of each retained sample, while aggregate returns the mean timestamp (also in UNIX time) of the positions in each interval. In both cases an extra column, time_agg, is added, whose values are spaced uniformly at the user-defined thinning interval.

The aggregate option only recognises position errors stored in columns named VARX and VARY, and the standard deviation around each position stored in a column named SD. Unless all three columns are present, the function assumes there is no measure of error and drops whichever of them exist. When there is no measure of error, the function simply returns the averaged position and covariates for each time interval.

The names of grouping variables (such as animal identity) may be passed as a character vector to the id_columns argument.
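The following is a minimal base-R sketch of the two thinning options described above, not the atlastools implementation. The helper thin_sketch() and the example data are hypothetical. The sketch assumes that time holds numeric UNIX seconds and that time_agg is obtained by flooring each timestamp to the start of its interval, a binning rule this page does not state. Because the way atl_thin_data propagates VARX, VARY and SD is not described here, the aggregate branch simply drops all error columns rather than guess.

```r
# Hypothetical sketch of interval thinning in base R; not the atlastools code.
# Assumes `time` is numeric UNIX seconds and that all non-identity columns
# are numeric (so they can be averaged).
thin_sketch <- function(data, interval = 60, id_columns = NULL,
                        method = c("subsample", "aggregate")) {
  method <- match.arg(method)
  # Assumed binning rule: floor each timestamp to the start of its interval
  data$time_agg <- floor(as.numeric(data$time) / interval) * interval
  group_cols <- c(id_columns, "time_agg")
  # Sort by identity and time so the first row of each group is its earliest fix
  data <- data[do.call(order, unname(as.list(data[c(id_columns, "time")]))), ,
               drop = FALSE]

  if (method == "subsample") {
    # Keep all columns and the actual timestamp of the first fix per group
    out <- data[!duplicated(data[group_cols]), , drop = FALSE]
  } else {
    # Drop COVXY (documented) and, as a simplification, VARX, VARY and SD too
    value_cols <- setdiff(names(data),
                          c(group_cols, "COVXY", "VARX", "VARY", "SD"))
    # Mean of each remaining column, including time, per (identity, interval)
    out <- stats::aggregate(data[value_cols], by = data[group_cols], FUN = mean)
  }
  out <- out[do.call(order, unname(as.list(out[group_cols]))), , drop = FALSE]
  rownames(out) <- NULL
  out
}

# Hypothetical example: two animals with irregular fixes
tracks <- data.frame(
  animal_id = c("a", "a", "a", "a", "b", "b"),
  time = c(1000, 1010, 1059, 1065, 1000, 1130),
  x = c(0, 10, 20, 30, 5, 15),
  y = c(0, 0, 10, 10, 1, 3),
  COVXY = rep(1, 6)
)

thinned_sub <- thin_sketch(tracks, interval = 60,
                           id_columns = "animal_id", method = "subsample")
thinned_agg <- thin_sketch(tracks, interval = 60,
                           id_columns = "animal_id", method = "aggregate")
print(thinned_sub)
print(thinned_agg)
```

In this sketch, animal "b" has no fixes between 1020 and 1080, so its time_agg values skip that interval; whether atl_thin_data fills or skips empty intervals is not stated on this page.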

atl_thin_data(
  data,
  interval = 60,
  time = "time",
  id_columns = NULL,
  method = c("subsample", "aggregate")
)

Arguments

data	Cleaned data to thin. Must have a numeric column named time.
interval	The thinning interval, in seconds, over which data are subsampled or aggregated.
time	The name of the timestamp column, ideally one of integer type.
id_columns	Names of the grouping columns (such as animal identity), as a character vector.
method	Whether the data should be thinned by subsampling or aggregation. If subsampling (`method = "subsample"`), the first position in each time interval (per group, if id_columns is given) is taken. If aggregation (`method = "aggregate"`), the mean of the positions in each time interval is taken.
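The default shown for method in the usage above, c("subsample", "aggregate"), is the usual R idiom for a choice argument resolved with base R's match.arg(), in which the first element acts as the default. If atl_thin_data follows that idiom (an assumption; its internals are not shown here), calling it without method would subsample. The hypothetical resolve_method() below only illustrates how match.arg() resolves such an argument.

```r
# Hypothetical stand-in with the same `method` default as atl_thin_data;
# it only shows how base R's match.arg() resolves the choice
resolve_method <- function(method = c("subsample", "aggregate")) {
  match.arg(method)
}

resolve_method()             # no value given: the first choice, "subsample"
resolve_method("aggregate")  # an explicit choice is returned unchanged
resolve_method("agg")        # unambiguous partial matches are also accepted
```

Under the same assumption, a value matching neither choice would stop with an error instead of silently falling back to the default.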

Value

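A data frame with one row per time interval containing data (and per group, if id_columns is given): the first position in each interval for `method = "subsample"`, or the mean position for `method = "aggregate"`.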

Examples

if (FALSE) {
thinned_data <- atl_thin_data(data,
  interval = 60,
  id_columns = c("animal_id"),
  method = "aggregate"
)
}