Create a split frame with train and test indices for one or more time series.
Usage
make_split(
  main_frame,
  context,
  type,
  value,
  n_ahead,
  n_skip = 0,
  n_lag = 0,
  mode = "slide",
  exceed = TRUE
)
Arguments
- main_frame
A tibble containing the time series data.
- context
A named list with the identifiers for series_id, value_id, and index_id.
- type
Character value. The type of initial split. Possible values are "first", "last", and "prob".
- value
Numeric value specifying the initial split.
- n_ahead
Integer. The forecast horizon, i.e. the number of observations in each test window.
- n_skip
Integer. The number of observations to skip between split origins. The default is 0.
- n_lag
Integer. The number of lagged observations to include before the test window. This is useful if lagged predictors are required when constructing test features. The default is 0.
- mode
Character value. Either "slide" for a fixed-window approach or "stretch" for an expanding-window approach.
- exceed
Logical value. If TRUE, out-of-sample splits exceeding the original sample size are created.
Value
A tibble containing the split plan. The output has one row per time
series and split, with list-columns train and test containing
integer row positions.
Details
make_split() creates rolling-origin train-test splits for time series
cross-validation. The output is used by functions such as
slice_train() and slice_test() to extract the corresponding
training and testing samples from main_frame.
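As a rough illustration of what "integer row positions" means, the following base R sketch subsets a toy data frame by train and test positions. The toy data and the plan list are made up for illustration; this is not the implementation of slice_train() or slice_test().

```r
# Toy series with 10 observations (hypothetical data).
toy <- data.frame(index = 1:10, value = (1:10)^2)

# Hypothetical row positions for a single split:
# rows 1 to 6 for training, rows 7 and 8 for testing.
plan <- list(train = 1:6, test = 7:8)

# Subset the rows by position.
train_rows <- toy[plan$train, ]
test_rows <- toy[plan$test, ]

nrow(train_rows)
nrow(test_rows)
```

Because the split plan stores positions rather than copies of the data, it stays small even when many splits are created.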
The function supports two training-window modes:
- mode = "slide" creates a fixed-window approach. The training window has constant length and moves forward over time.
- mode = "stretch" creates an expanding-window approach. The training window starts at the first observation and grows over time.
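The difference between the two modes can be sketched in base R. The helper train_window() below is hypothetical and not part of the package; it assumes the training window of a split ends at position end and that the initial window has n_init observations.

```r
# Hypothetical helper: training positions for a split whose training
# window ends at `end`, given an initial window of `n_init` observations.
train_window <- function(end, n_init, mode = c("slide", "stretch")) {
  mode <- match.arg(mode)
  # "slide" keeps the window length fixed; "stretch" always starts at 1.
  start <- if (mode == "slide") end - n_init + 1L else 1L
  seq.int(start, end)
}

slide_idx <- train_window(end = 138L, n_init = 120L, mode = "slide")
stretch_idx <- train_window(end = 138L, n_init = 120L, mode = "stretch")

length(slide_idx)    # fixed length of 120
length(stretch_idx)  # grown to 138
```

Under these assumptions, the sliding window drops the oldest observations as it moves forward, while the stretching window keeps them.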
The initial training window is controlled by type and value:
- type = "first" uses the first value observations as the initial training window.
- type = "last" keeps the last value observations for testing and derives the initial training window from the remaining sample.
- type = "prob" uses floor(value * n_total) observations as the initial training window.
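A minimal sketch of how the initial training size might follow from type and value. The helper initial_train_size() is hypothetical, and the rule used for type = "last" (n_total - value) is an assumption based on the description above, not taken from the package source.

```r
# Hypothetical helper: size of the initial training window for a
# series with `n_total` observations.
initial_train_size <- function(n_total, type, value) {
  switch(
    type,
    first = value,                  # first `value` observations
    last  = n_total - value,        # assumed: all but the last `value`
    prob  = floor(value * n_total), # share of the sample, rounded down
    stop("`type` must be one of \"first\", \"last\", or \"prob\".")
  )
}

size_first <- initial_train_size(n_total = 150, type = "first", value = 120)
size_last <- initial_train_size(n_total = 150, type = "last", value = 30)
size_prob <- initial_train_size(n_total = 150, type = "prob", value = 0.8)
```

For a series of 150 observations, all three calls describe the same initial training window of 120 observations.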
The argument n_skip controls how far the rolling origin moves between
consecutive splits: the origin advances by n_skip + 1 observations. For
non-overlapping test windows, use n_skip = n_ahead - 1.
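To see why n_skip = n_ahead - 1 gives non-overlapping test windows, the origins can be sketched in base R. The series length n_total = 160 is a made-up value, and the sketch assumes that the origin advances by n_skip + 1 observations and that splits must fit within the sample (as with exceed = FALSE).

```r
n_total <- 160L  # hypothetical series length
n_init <- 120L   # initial training window
n_ahead <- 18L   # forecast horizon
n_skip <- 17L    # n_ahead - 1

# Assumed rule: each origin is n_skip + 1 observations after the previous
# one, and the last test window must still fit within the sample.
origins <- seq.int(n_init, n_total - n_ahead, by = n_skip + 1L)

# Test positions start right after each origin.
test_windows <- lapply(origins, function(o) seq.int(o + 1L, o + n_ahead))

origins
lapply(test_windows, range)
```

Here the origins are 120 and 138, so the test windows cover positions 121 to 138 and 139 to 156 without sharing any observation.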
See also
Other time series cross-validation:
make_future(),
make_tsibble(),
slice_test(),
slice_train(),
split_index()
Examples
library(dplyr)
context <- list(
  series_id = "series",
  value_id = "value",
  index_id = "index"
)
main_frame <- M4_monthly_data |>
  filter(series == "M23100")
# Fixed-window split plan
fixed_split <- make_split(
  main_frame = main_frame,
  context = context,
  type = "first",
  value = 120,
  n_ahead = 18,
  n_skip = 17,
  n_lag = 0,
  mode = "slide",
  exceed = FALSE
)
fixed_split
#> # A tibble: 2 × 4
#>   series split train       test
#>   <chr>  <int> <list>      <list>
#> 1 M23100     1 <int [120]> <int [18]>
#> 2 M23100     2 <int [120]> <int [18]>
# Expanding-window split plan
expanding_split <- make_split(
  main_frame = main_frame,
  context = context,
  type = "first",
  value = 120,
  n_ahead = 18,
  n_skip = 17,
  n_lag = 0,
  mode = "stretch",
  exceed = FALSE
)
expanding_split
#> # A tibble: 2 × 4
#>   series split train       test
#>   <chr>  <int> <list>      <list>
#> 1 M23100     1 <int [120]> <int [18]>
#> 2 M23100     2 <int [138]> <int [18]>
