spatialsample:
A tidy approach to spatial cross-validation
About Me
Mike Mahoney
PhD candidate at SUNY-ESF
2022 summer intern with Posit (spatialsample, rsample)
These slides: mm218.dev/boston_useR_2023
Cross-validation:
rsample and friends
library(tidymodels)
rsample::vfold_cv(spatialsample::boston_canopy) |> head()
#> # A tibble: 6 × 2
#>   splits           id
#>   <list>           <chr>
#> 1 <split [613/69]> Fold01
#> 2 <split [613/69]> Fold02
#> 3 <split [614/68]> Fold03
#> 4 <split [614/68]> Fold04
#> 5 <split [614/68]> Fold05
#> 6 <split [614/68]> Fold06
workflow() |>
  add_model(linear_reg()) |>
  add_formula(canopy_area_2019 ~ land_area * mean_temp) |>
  fit_resamples(vfold_cv(spatialsample::boston_canopy)) |>
  collect_metrics()
#> # A tibble: 2 × 6
#>   .metric .estimator       mean     n    std_err .config
#>   <chr>   <chr>           <dbl> <int>      <dbl> <chr>
#> 1 rmse    standard   377089.       10 20426.     Preprocessor1_Model1
#> 2 rsq     standard        0.353    10     0.0178 Preprocessor1_Model1
What does “new data” mean?
ggplot(spatialsample::boston_canopy, aes(fill = canopy_area_2019)) +
  geom_sf() +
  scale_fill_distiller(name = "Canopy area (2019)", palette = "YlGn", direction = 1)
Spatial clustering
library(spatialsample)
set.seed(1234)
spatial_clustering_cv(boston_canopy, v = 5)
library(purrr)
walk(spatial_clustering_cv(boston_canopy, v = 5)$splits, function(x) print(autoplot(x)))
Spatial blocking
spatial_block_cv(boston_canopy, v = 5, n = c(10, 10))
Spatial leave-one-disc-out (LODO)
folds <- spatial_buffer_vfold_cv(boston_canopy, v = Inf, radius = 1500, buffer = 1500)
walk(folds$splits, function(x) print(autoplot(x)))
Other features:
Works with projected & geographic CRS
Handles mismatched CRS
Aware of CRS units; arguments accept explicit units
Handles all geometry types*
Integrates with the rest of tidymodels
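As a rough sketch of that last point, the workflow from the rsample slide can be refit with spatially clustered folds in place of vfold_cv(). It only reuses functions already shown in these slides. The object names cluster_folds and spatial_metrics are made up for illustration, and no output is shown because the metrics depend on the folds drawn.

```r
library(tidymodels)
library(spatialsample)

set.seed(1234)

# Spatially clustered folds take the place of vfold_cv()
# (cluster_folds is an illustrative name, not from the original slides)
cluster_folds <- spatial_clustering_cv(boston_canopy, v = 5)

# Same model and formula as the earlier workflow; only the resamples change
spatial_metrics <- workflow() |>
  add_model(linear_reg()) |>
  add_formula(canopy_area_2019 ~ land_area * mean_temp) |>
  fit_resamples(cluster_folds) |>
  collect_metrics()

spatial_metrics
```

Comparing these metrics with the vfold_cv() results is one way to see how much the choice of "new data" changes the performance estimate.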