spatialsample:

A tidy approach to spatial cross-validation

About Me

  • Mike Mahoney

  • PhD candidate at SUNY-ESF

  • 2022 summer intern with Posit (spatialsample, rsample)

  • These slides: mm218.dev/boston_useR_2023

Cross-validation:

rsample and friends

library(tidymodels)
rsample::vfold_cv(spatialsample::boston_canopy) |> head()
#> # A tibble: 6 × 2
#>   splits           id    
#>   <list>           <chr> 
#> 1 <split [613/69]> Fold01
#> 2 <split [613/69]> Fold02
#> 3 <split [614/68]> Fold03
#> 4 <split [614/68]> Fold04
#> 5 <split [614/68]> Fold05
#> 6 <split [614/68]> Fold06
workflow() |> 
  add_model(linear_reg()) |> 
  add_formula(canopy_area_2019 ~ land_area * mean_temp) |> 
  fit_resamples(vfold_cv(spatialsample::boston_canopy)) |> 
  collect_metrics()
#> # A tibble: 2 × 6
#>   .metric .estimator       mean     n    std_err .config             
#>   <chr>   <chr>           <dbl> <int>      <dbl> <chr>               
#> 1 rmse    standard   377089.       10 20426.     Preprocessor1_Model1
#> 2 rsq     standard        0.353    10     0.0178 Preprocessor1_Model1

What does “new data” mean?

ggplot(spatialsample::boston_canopy, aes(fill = canopy_area_2019)) + geom_sf() + 
  scale_fill_distiller(name = "Canopy area (2019)", palette = "YlGn", direction = 1)

Are these folds really unrelated?

rsample::vfold_cv(spatialsample::boston_canopy, v = 5)
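One way to check whether random folds are unrelated is to map which fold each grid cell is held out in. The sketch below is an illustration rather than part of the original slides. The names `canopy`, `fold`, `held_out`, and `fold_map` are made up for this example, and it assumes `rsample::complement()` returns the row indices of each assessment set.

```r
library(tidymodels)     # attaches rsample and ggplot2
library(spatialsample)  # provides the boston_canopy data

set.seed(1234)
canopy <- boston_canopy              # work on a copy (hypothetical name)
folds <- vfold_cv(canopy, v = 5)

# Record, for every grid cell, the fold in which it is held out.
canopy$fold <- NA_character_
for (i in seq_len(nrow(folds))) {
  held_out <- complement(folds$splits[[i]])  # row indices of the assessment set
  canopy$fold[held_out] <- folds$id[i]
}

# Map the fold labels to see how randomly assigned folds are laid out in space.
fold_map <- ggplot(canopy, aes(fill = fold)) + geom_sf()
fold_map
```

If neighbouring cells usually land in different folds, each assessment set sits right next to the data the model was trained on.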

Spatial clustering

library(spatialsample)
library(purrr)
set.seed(1234)
# Create the folds once so the printed and plotted folds are the same
folds <- spatial_clustering_cv(boston_canopy, v = 5)
folds
walk(folds$splits, function(x) print(autoplot(x)))

Spatial blocking

spatial_block_cv(boston_canopy, v = 5, n = c(10, 10))

Spatial leave-one-disc-out (LODO)

folds <- spatial_buffer_vfold_cv(boston_canopy, v = Inf, radius = 1500, buffer = 1500)
walk(folds$splits, function(x) print(autoplot(x)))


Preprint: https://doi.org/10.48550/arXiv.2303.07334

Other features:


Works with projected & geographic CRS

Handles mismatched CRS

Aware of CRS units, arguments accept explicit units

Handles all geometry types*

Integrates with the rest of tidymodels
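Two of the features above can be sketched together: passing a distance with explicit units, and handing spatial folds to the rest of tidymodels. This is a minimal illustration, not code from the original slides. It reuses the workflow shown earlier in the deck, and the names `buffered_folds` and `metrics` are hypothetical. It assumes `buffer` accepts a `units` object, which is then converted to the units of the data's CRS.

```r
library(tidymodels)
library(spatialsample)

set.seed(1234)
# Clustered folds with an exclusion buffer given in explicit units (feet).
buffered_folds <- spatial_clustering_cv(
  boston_canopy,
  v = 5,
  buffer = units::set_units(1500, "ft")
)

# Spatial folds drop into fit_resamples() like any other rsample object.
metrics <- workflow() |>
  add_model(linear_reg()) |>
  add_formula(canopy_area_2019 ~ land_area * mean_temp) |>
  fit_resamples(buffered_folds) |>
  collect_metrics()
metrics
```

The only change from the random-fold workflow is the resampling object passed to `fit_resamples()`.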

Thank you!


Find me online:

mm218.dev

@mikemahoney218

@MikeMahoney218@fosstodon.org


Slides available at mm218.dev/boston_useR_2023

More spatialsample: https://spatialsample.tidymodels.org/