Uniformly reduce data volumes with either aggregation or resampling
(specified by the method argument) over an interval specified in seconds
using the interval argument.
Both options make two important assumptions:
(1) that timestamps are named time, and
(2) all columns except the identity columns can be averaged in R.
While the subsample option returns a thinned dataset with all columns
from the input data, the aggregate option drops the column COVXY, since
this cannot be propagated to the averaged position.
The two options handle the column time differently: subsample returns the
actual timestamp (in UNIX time) of each sample, while aggregate returns the
mean timestamp (also in UNIX time).
In both cases, an extra column, time_agg, is added. Its successive values
differ by a uniform step corresponding to the user-defined thinning
interval.
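The difference between the two timestamp columns can be illustrated with a minimal base R sketch. This is not the atlastools implementation: the binning rule (flooring each timestamp to the start of its interval) and the choice of the first row per interval for subsampling are assumptions made for illustration, and the toy columns x and y are hypothetical.

```r
# Toy tracking data: one position every 20 s, timestamps in UNIX seconds.
data <- data.frame(
  time = 1600000020 + seq(0, 160, by = 20),
  x = c(1, 2, 3, 4, 5, 6, 7, 8, 9),
  y = c(9, 8, 7, 6, 5, 4, 3, 2, 1)
)
interval <- 60

# Assumed binning rule: floor each timestamp to the start of its interval.
data$time_agg <- (data$time %/% interval) * interval

# Subsample-style thinning (assumption: keep the first row per interval),
# so `time` remains an actually observed timestamp.
subsampled <- data[!duplicated(data$time_agg), ]

# Aggregate-style thinning: average every column within each interval,
# so `time` becomes the mean timestamp of the interval.
aggregated <- aggregate(. ~ time_agg, data = data, FUN = mean)
```

In both results time_agg advances in steps of exactly 60 seconds, while time holds either an observed timestamp (subsampled) or the mean of the timestamps in the interval (aggregated).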
The aggregate option only recognises errors named VARX and VARY, and
standard deviation around each position named SD.
If these columns are not all present together, the function assumes there
is no measure of error and drops whichever of them are present.
If there is actually no measure of error, the function simply returns the
averaged position and covariates in each time interval.
The names of grouping variables (such as animal identity) may be passed as
a character vector to the id_columns argument.
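Grouping matters because positions from different individuals should not be averaged together. The following base R sketch shows the idea of thinning within groups; it is an illustration under assumptions, not the package's code. The column animal_id, the toy values, and the floor-based binning rule are all hypothetical.

```r
# Two animals tracked over the same period (toy values).
data <- data.frame(
  animal_id = rep(c("a", "b"), each = 4),
  time = rep(c(0, 30, 60, 90), times = 2),
  x = c(0, 2, 4, 6, 10, 12, 14, 16)
)
interval <- 60

# Assumed binning rule: floor each timestamp to the start of its interval.
data$time_agg <- (data$time %/% interval) * interval

# Average within each combination of animal and interval, so that
# positions of animal "a" never mix with those of animal "b".
aggregated <- aggregate(
  cbind(time, x) ~ animal_id + time_agg,
  data = data,
  FUN = mean
)
```

The result has one row per animal per interval (four rows here), which is the kind of grouping a call with id_columns = c("animal_id") is meant to request.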
atl_thin_data(
  data,
  interval = 60,
  time = "time",
  id_columns = NULL,
  method = c("subsample", "aggregate")
)
Argument | Description |
---|---|
data | Cleaned data to aggregate. Must have a numeric column named time. |
interval | The interval in seconds over which to aggregate. |
time | The timestamp column name, ideally referring to a column with an integer type. |
id_columns | Column names for grouping columns. |
method | Should the data be thinned by subsampling or aggregation.
If resampling ( |
A dataframe thinned over the interval, either by subsampling or by taking the mean.
if (FALSE) {
  thinned_data <- atl_thin_data(
    data,
    interval = 60,
    id_columns = c("animal_id"),
    method = "aggregate"
  )
}