This document presents an example of how the xarray Python library can be used to handle spatio-temporal raster data in R.

Installation

To use xarray in R, you need the reticulate package installed. You can install it with the following command:

install.packages("reticulate")

After installing the reticulate package, we must create a Python virtual environment in which to install the xarray library.

In this example, we use conda to manage Python virtual environments, but any other environment management tool can also be used (e.g., mamba, virtualenv, poetry). To create the environment, we can run the following commands:

# defining the name of the environment
conda_env_name <- "r-xarray"

# creating conda environment
reticulate::conda_create(envname = conda_env_name)
## + /opt/conda/bin/conda 'create' '--yes' '--name' 'r-xarray' 'python=3.9' '--quiet' '-c' 'conda-forge'
## [1] "/home/sits/.conda/envs/r-xarray/bin/python"

With the environment created, let’s install the dependencies into it:

reticulate::py_install(
  c("xarray", "dask", "distributed", "rioxarray", "stackstac"),
  envname = conda_env_name,
  pip = TRUE
)

Importing Python libraries

Now we have an environment ready to go, so let’s activate it:

reticulate::use_condaenv(condaenv = conda_env_name)
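
To confirm that the environment is active and the packages we installed can be found, we can run a quick optional check (this only verifies that the modules are importable, nothing else):

# optional sanity check: the modules installed above should be importable
reticulate::py_module_available("xarray")
reticulate::py_module_available("stackstac")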

To use the xarray library, we can import it using the following reticulate command:

xr <- reticulate::import("xarray")

For this example, we also need to import some dask packages:

da <- reticulate::import("dask.array")
dd <- reticulate::import("dask.distributed")

By default, xarray does not handle raster-specific operations (e.g., reading, writing, managing spatial reference systems). For this, we need the rioxarray library, which makes a rasterio-based rio accessor available on xarray objects, enabling us to handle raster data. Let’s import it:

rioxarray <- reticulate::import("rioxarray")

In this example, we will also use data from a STAC catalog. To keep the example focused on xarray and avoid extra data manipulation, we will use the stackstac library, which turns STAC items into an xarray DataArray:

stackstac <- reticulate::import("stackstac")

Now, we are ready to go!

Configuring Dask

dask was created to help us scale applications across multiple cores or machines. To use it, we first need to configure the connection between our application and those resources by creating an instance of the dask.distributed.Client class:

client <- dd$Client()
client
## <Client: 'tcp://127.0.0.1:42677' processes=4 threads=16, memory=31.20 GiB>

By default, when we create an instance of the dask.distributed.Client class, dask starts a local cluster that uses all the cores available on the machine.
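
If you need more control over these resources, you can (optionally) build the local cluster yourself before creating the client. A minimal sketch, assuming the dask.distributed module imported above as dd (the worker and thread counts are arbitrary examples):

# hypothetical explicit configuration: 2 worker processes with 2 threads each
cluster <- dd$LocalCluster(n_workers = 2L, threads_per_worker = 2L)
client <- dd$Client(cluster)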

Getting data using rstac

In this example, we will get data from the STAC catalog managed by Element 84. For this, we will use the rstac package. Let’s load it:

library(rstac)

Now, let’s define the endpoint object of the STAC catalog:

s_obj <- stac("https://earth-search.aws.element84.com/v1")

Using the endpoint object we created above, let’s search for Sentinel-2 L2A data covering the Ibitinga reservoir (São Paulo, Brazil):

it_obj <- s_obj |>
  stac_search(
    collections = "sentinel-2-l2a",
    bbox = c(-48.931732, -22.055096, -48.850708, -21.982528), # Ibitinga/SP
    limit = 3
  ) |>
  get_request()
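
The search above is limited only by the bounding box. If you also want to restrict the temporal range, stac_search accepts a datetime argument; a sketch with a hypothetical interval covering February 2024:

# hypothetical variation of the search above, restricted to February 2024
it_obj_feb <- s_obj |>
  stac_search(
    collections = "sentinel-2-l2a",
    bbox = c(-48.931732, -22.055096, -48.850708, -21.982528),
    datetime = "2024-02-01T00:00:00Z/2024-02-29T23:59:59Z",
    limit = 3
  ) |>
  get_request()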

Loading data directly from the remote catalog can be very time-consuming, so in this example we will download the data locally. For this, we need to create an output directory:

# defining output directory
output_dir <- fs::path("data/example-04/sentinel-2-l2a")

# creating output directory
fs::dir_create(output_dir, recurse = TRUE)

Now, we can download the data:

it_obj |>
  assets_download(asset_names = c("red", "nir"), output_dir = output_dir)
## ###Items
## - matched feature(s): 580
## - features (3 item(s) / 577 not fetched):
##   - S2B_22KGA_20240223_0_L2A
##   - S2A_22KGA_20240218_0_L2A
##   - S2B_22KGA_20240213_0_L2A
## - assets: nir, red
## - item's fields: 
## assets, bbox, collection, geometry, id, links, properties, stac_extensions, stac_version, type

As a final step, we need references to the files we downloaded. rstac helps us here by saving an items.json file in the STAC format, so we can load and manage the data easily:

metadata <- jsonlite::read_json(output_dir / "items.json")

# adapting the asset paths in the metadata from /path/to/data to {output_dir}/path/to/data
metadata$features <- lapply(metadata$features, function(x) {
  x$assets <- lapply(x$assets, function(y) {
    y$href <- as.character(fs::path(output_dir, y$href))
    
    y
  })
  
  x
})
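
To check that the adjustment worked, we can (optionally) inspect one of the rewritten paths, assuming the first item has a red asset, as requested in assets_download():

# should now point inside {output_dir}
metadata$features[[1]]$assets$red$href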

Creating a spatio-temporal array

Based on the STAC items we downloaded, we can create a spatio-temporal array. For this, we will use the stackstac library:

spatiotemporal_array <- stackstac$stack(metadata$features)
spatiotemporal_array
## <xarray.DataArray 'stackstac-5f62925312549d03a38973854ea7d911' (time: 3,
##                                                                 band: 2,
##                                                                 y: 10980,
##                                                                 x: 10980)> Size: 6GB
## dask.array<fetch_raster_window, shape=(3, 2, 10980, 10980), dtype=float64, chunksize=(1, 1, 1024, 1024), chunktype=numpy.ndarray>
## Coordinates: (12/54)
##   * time                                     (time) datetime64[ns] 24B 2024-0...
##     id                                       (time) <U24 288B 'S2B_22KGA_2024...
##   * band                                     (band) <U3 24B 'nir' 'red'
##   * x                                        (x) float64 88kB 7e+05 ... 8.098...
##   * y                                        (y) float64 88kB 7.6e+06 ... 7.4...
##     mgrs:latitude_band                       <U1 4B 'K'
##     ...                                       ...
##     title                                    (band) <U20 160B 'NIR 1 (band 8)...
##     gsd                                      int64 8B 10
##     common_name                              (band) <U3 24B 'nir' 'red'
##     center_wavelength                        (band) float64 16B 0.842 0.665
##     full_width_half_max                      (band) float64 16B 0.145 0.038
##     epsg                                     int64 8B 32722
## Attributes:
##     spec:        RasterSpec(epsg=32722, bounds=(699960.0, 7490200.0, 809760.0...
##     crs:         epsg:32722
##     transform:   | 10.00, 0.00, 699960.00|\n| 0.00,-10.00, 7600000.00|\n| 0.0...
##     resolution:  10.0
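
Since rioxarray has been imported, the rio accessor is already attached to this DataArray. As an optional check, we can ask it for the coordinate reference system, which should reflect the epsg/crs information recorded by stackstac:

# inspecting the CRS exposed by the rio accessor
spatiotemporal_array$rio$crs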

Temporal NDVI mean

Using the spatio-temporal array, we can calculate the temporal mean of the NDVI. To do this, let’s first extract the NIR and red bands:

nir <- spatiotemporal_array$sel(band = "nir")
red <- spatiotemporal_array$sel(band = "red")

Now, we can calculate the NDVI:

ndvi <- (nir - red) / (nir + red)
ndvi
## <xarray.DataArray 'stackstac-5f62925312549d03a38973854ea7d911' (time: 3,
##                                                                 y: 10980,
##                                                                 x: 10980)> Size: 3GB
## dask.array<truediv, shape=(3, 10980, 10980), dtype=float64, chunksize=(1, 1024, 1024), chunktype=numpy.ndarray>
## Coordinates: (12/49)
##   * time                                     (time) datetime64[ns] 24B 2024-0...
##     id                                       (time) <U24 288B 'S2B_22KGA_2024...
##   * x                                        (x) float64 88kB 7e+05 ... 8.098...
##   * y                                        (y) float64 88kB 7.6e+06 ... 7.4...
##     mgrs:latitude_band                       <U1 4B 'K'
##     s2:product_uri                           (time) <U65 780B 'S2B_MSIL2A_202...
##     ...                                       ...
##     view:sun_azimuth                         (time) float64 24B 77.97 ... 71.46
##     proj:transform                           object 8B {0, 7600000, 10, -10, ...
##     proj:shape                               object 8B {10980}
##     raster:bands                             object 8B {'nodata': 0, 'data_ty...
##     gsd                                      int64 8B 10
##     epsg                                     int64 8B 32722

Next, we calculate the temporal mean of the NDVI data we created:

ndvi_mean <- ndvi$mean(dim = c("time"))
ndvi_mean
## <xarray.DataArray 'stackstac-5f62925312549d03a38973854ea7d911' (y: 10980,
##                                                                 x: 10980)> Size: 964MB
## dask.array<mean_agg-aggregate, shape=(10980, 10980), dtype=float64, chunksize=(1024, 1024), chunktype=numpy.ndarray>
## Coordinates: (12/23)
##   * x                                        (x) float64 88kB 7e+05 ... 8.098...
##   * y                                        (y) float64 88kB 7.6e+06 ... 7.4...
##     mgrs:latitude_band                       <U1 4B 'K'
##     s2:datatake_type                         <U8 32B 'INS-NOBS'
##     earthsearch:boa_offset_applied           bool 1B True
##     s2:product_type                          <U7 28B 'S2MSI2A'
##     ...                                       ...
##     processing:software                      object 8B {'sentinel2-to-stac': ...
##     proj:transform                           object 8B {0, 7600000, 10, -10, ...
##     proj:shape                               object 8B {10980}
##     raster:bands                             object 8B {'nodata': 0, 'data_ty...
##     gsd                                      int64 8B 10
##     epsg                                     int64 8B 32722

When backed by dask, xarray operations are lazily evaluated: the code above has only built a task graph, and nothing has been computed yet. To execute it, we ask dask to run the computation using the compute method.

ndvi_mean_data <- ndvi_mean$compute()
ndvi_mean_data
## <xarray.DataArray 'stackstac-5f62925312549d03a38973854ea7d911' (y: 10980,
##                                                                 x: 10980)> Size: 964MB
## array([[1.06741027, 1.05594742, 1.08179379, ..., 1.42908627, 1.41649439,
##         1.40925625],
##        [0.98031766, 1.02521273, 1.06912391, ..., 1.42091841, 1.40591751,
##         1.4122806 ],
##        [0.9362474 , 0.9941019 , 1.04582839, ..., 1.40666432, 1.41219386,
##         1.42676016],
##        ...,
##        [0.25959347, 0.2616745 , 0.26388728, ..., 1.61944882, 1.79426859,
##         2.08429357],
##        [0.27365805, 0.27209891, 0.26758767, ..., 1.84578764, 2.18360486,
##         2.58891568],
##        [0.282057  , 0.27502554, 0.267502  , ..., 1.964097  , 2.25718843,
##         2.09648986]])
## Coordinates: (12/23)
##   * x                                        (x) float64 88kB 7e+05 ... 8.098...
##   * y                                        (y) float64 88kB 7.6e+06 ... 7.4...
##     mgrs:latitude_band                       <U1 4B 'K'
##     s2:datatake_type                         <U8 32B 'INS-NOBS'
##     earthsearch:boa_offset_applied           bool 1B True
##     s2:product_type                          <U7 28B 'S2MSI2A'
##     ...                                       ...
##     processing:software                      object 8B {'sentinel2-to-stac': ...
##     proj:transform                           object 8B {0, 7600000, 10, -10, ...
##     proj:shape                               object 8B {10980}
##     raster:bands                             object 8B {'nodata': 0, 'data_ty...
##     gsd                                      int64 8B 10
##     epsg                                     int64 8B 32722

To finish, we can save this result:

# copying the spatial metadata (e.g., CRS and transform) from the source array attributes
ndvi_mean_data$rio$set_attrs(nir$attrs, inplace = TRUE)
## <xarray.DataArray 'stackstac-5f62925312549d03a38973854ea7d911' (y: 10980,
##                                                                 x: 10980)> Size: 964MB
## array([[1.06741027, 1.05594742, 1.08179379, ..., 1.42908627, 1.41649439,
##         1.40925625],
##        [0.98031766, 1.02521273, 1.06912391, ..., 1.42091841, 1.40591751,
##         1.4122806 ],
##        [0.9362474 , 0.9941019 , 1.04582839, ..., 1.40666432, 1.41219386,
##         1.42676016],
##        ...,
##        [0.25959347, 0.2616745 , 0.26388728, ..., 1.61944882, 1.79426859,
##         2.08429357],
##        [0.27365805, 0.27209891, 0.26758767, ..., 1.84578764, 2.18360486,
##         2.58891568],
##        [0.282057  , 0.27502554, 0.267502  , ..., 1.964097  , 2.25718843,
##         2.09648986]])
## Coordinates: (12/23)
##   * x                                        (x) float64 88kB 7e+05 ... 8.098...
##   * y                                        (y) float64 88kB 7.6e+06 ... 7.4...
##     mgrs:latitude_band                       <U1 4B 'K'
##     s2:datatake_type                         <U8 32B 'INS-NOBS'
##     earthsearch:boa_offset_applied           bool 1B True
##     s2:product_type                          <U7 28B 'S2MSI2A'
##     ...                                       ...
##     processing:software                      object 8B {'sentinel2-to-stac': ...
##     proj:transform                           object 8B {0, 7600000, 10, -10, ...
##     proj:shape                               object 8B {10980}
##     raster:bands                             object 8B {'nodata': 0, 'data_ty...
##     gsd                                      int64 8B 10
##     epsg                                     int64 8B 32722
## Attributes:
##     spec:        RasterSpec(epsg=32722, bounds=(699960.0, 7490200.0, 809760.0...
##     crs:         epsg:32722
##     transform:   | 10.00, 0.00, 699960.00|\n| 0.00,-10.00, 7600000.00|\n| 0.0...
##     resolution:  10.0
# writing the result to a GeoTIFF file
ndvi_mean_data$rio$to_raster(output_dir / "ndvi.tif")
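
If you want to verify the result, the file can (optionally) be opened again with rioxarray, reusing the same output_dir defined above:

# reading the written raster back to inspect it
ndvi_check <- rioxarray$open_rasterio(output_dir / "ndvi.tif")
ndvi_check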