This document presents an example of how the xarray can be used to handle spatio-temporal raster in R.

Installation

To use the xarray in R, it is required to have the reticulate package installed. To do so, you can use the following command:

install.packages("reticulate")

After installing the reticulate package, we must create a Python virtual environment to install the xarray library. For this, we can use the following command:

In this example, we use conda to manage Python virtual environments, but any other package management tool can also be used (e.g., mamba, virtualenv, poetry).

# defining the name of the environment
conda_env_name <- "r-xarray"

# creating conda environment
reticulate::conda_create(envname = conda_env_name)
## + /opt/conda/bin/conda 'create' '--yes' '--name' 'r-xarray' 'python=3.9' '--quiet' '-c' 'conda-forge'
## [1] "/home/sits/.conda/envs/r-xarray/bin/python"

With the environment created, let’s install the dependencies on it:

reticulate::py_install(
  c("xarray", "dask", "distributed", "rioxarray", "stackstac"),
  envname = conda_env_name,
  pip =  TRUE
)

Importing Python libraries

Now, we have an environment ready to go. So, let’s activate it:

reticulate::use_condaenv(condaenv = conda_env_name)

To use the xarray library, we can import it using the following reticulate command:

xr <- reticulate::import("xarray")

For this example, we also need to import some dask packages:

da <- reticulate::import("dask.array")
dd <- reticulate::import("dask.distributed")

By default, xarray does not manage raster data (e.g., read, write, spatial reference system). For this, we need to use the rioxarray library, which makes a rasterio accessor available in the xarray classes, enabling us to handle raster data. Let’s import it:

rioxarray <- reticulate::import("rioxarray")

Also, in this example, we will use data from an STAC catalog. To keep this example focused on the xarray and avoid many data manipulations, we will also use the stackstac library, which turns STAC items into an xarray:

stackstac <- reticulate::import("stackstac")

Now, we are ready to go!

Configuring Dask

As dask was created to help us scale applications using multiple cores or machines, to use it, we need first to configure the connection between our application and those resources. To do that, we must create an instance of the class dask.distributed.Client:

client <- dd$Client()
client
## <Client: 'tcp://127.0.0.1:34623' processes=4 threads=16, memory=31.20 GiB>

By default, when we create an instance of the dask.distributed.Client class, dask configures a local server to use all cores available in the machine.

Getting data using rstac

In this example, we will get data from the STAC catalog managed by Element 84. For this, we will use the rstac. Let’s import it:

library(rstac)

Now, let’s define the endpoint object of the STAC catalog:

s_obj <- stac("https://earth-search.aws.element84.com/v1")

Using the endpoint we created above, let’s search for Sentinel 2 L2A data from the Ibitinga reservoir (Brazil / SP):

it_obj <- s_obj |>
  stac_search(
    collections = "sentinel-2-l2a",
    bbox = c(-48.931732, -22.055096, -48.850708, -21.982528), # Ibitinga/SP
    limit = 3
  ) |>
  get_request()

Creating spatio-temporal array

Based on the STAC Items we have selected from the STAC Catalog, we can create a spatio-temporal array. For this, we will use the stackstac library:

spatiotemporal_array <- stackstac$stack(it_obj$features)
spatiotemporal_array
## <xarray.DataArray 'stackstac-dbb0e730c11b926fe5ac5a87ce8c790a' (time: 3,
##                                                                 band: 32,
##                                                                 y: 10980,
##                                                                 x: 10980)> Size: 93GB
## dask.array<fetch_raster_window, shape=(3, 32, 10980, 10980), dtype=float64, chunksize=(1, 1, 1024, 1024), chunktype=numpy.ndarray>
## Coordinates: (12/52)
##   * time                                     (time) datetime64[ns] 24B 2024-0...
##     id                                       (time) <U24 288B 'S2B_22KGA_2024...
##   * band                                     (band) <U12 2kB 'aot' ... 'wvp-jp2'
##   * x                                        (x) float64 88kB 7e+05 ... 8.098...
##   * y                                        (y) float64 88kB 7.6e+06 ... 7.4...
##     mgrs:latitude_band                       <U1 4B 'K'
##     ...                                       ...
##     title                                    (band) <U31 4kB 'Aerosol optical...
##     gsd                                      (band) object 256B None 10 ... None
##     common_name                              (band) object 256B None ... None
##     center_wavelength                        (band) object 256B None ... None
##     full_width_half_max                      (band) object 256B None ... None
##     epsg                                     int64 8B 32722
## Attributes:
##     spec:        RasterSpec(epsg=32722, bounds=(699960.0, 7490200.0, 809760.0...
##     crs:         epsg:32722
##     transform:   | 10.00, 0.00, 699960.00|\n| 0.00,-10.00, 7600000.00|\n| 0.0...
##     resolution:  10.0

Temporal NDVI mean

Using the spatio-temporal array, we can calculate a temporal mean of NDVI data. For this, first, let’s get the NIR and RED data:

nir <- spatiotemporal_array$sel(band="nir")
red <- spatiotemporal_array$sel(band="red")

Now, we can calculate the NDVI:

ndvi <- (nir - red) / (nir + red)
ndvi
## <xarray.DataArray 'stackstac-dbb0e730c11b926fe5ac5a87ce8c790a' (time: 3,
##                                                                 y: 10980,
##                                                                 x: 10980)> Size: 3GB
## dask.array<truediv, shape=(3, 10980, 10980), dtype=float64, chunksize=(1, 1024, 1024), chunktype=numpy.ndarray>
## Coordinates: (12/47)
##   * time                                     (time) datetime64[ns] 24B 2024-0...
##     id                                       (time) <U24 288B 'S2B_22KGA_2024...
##   * x                                        (x) float64 88kB 7e+05 ... 8.098...
##   * y                                        (y) float64 88kB 7.6e+06 ... 7.4...
##     mgrs:latitude_band                       <U1 4B 'K'
##     earthsearch:boa_offset_applied           bool 1B True
##     ...                                       ...
##     processing:software                      object 8B {'sentinel2-to-stac': ...
##     proj:epsg                                int64 8B 32722
##     grid:code                                <U10 40B 'MGRS-22KGA'
##     raster:bands                             object 8B [{'nodata': 0, 'data_t...
##     gsd                                      object 8B 10
##     epsg                                     int64 8B 32722

To finish, we can calculate the temporal mean of the NDVI data we created:

ndvi_mean <- ndvi$mean(dim = c("time"))
ndvi_mean
## <xarray.DataArray 'stackstac-dbb0e730c11b926fe5ac5a87ce8c790a' (y: 10980,
##                                                                 x: 10980)> Size: 964MB
## dask.array<mean_agg-aggregate, shape=(10980, 10980), dtype=float64, chunksize=(1024, 1024), chunktype=numpy.ndarray>
## Coordinates: (12/21)
##   * x                                        (x) float64 88kB 7e+05 ... 8.098...
##   * y                                        (y) float64 88kB 7.6e+06 ... 7.4...
##     mgrs:latitude_band                       <U1 4B 'K'
##     earthsearch:boa_offset_applied           bool 1B True
##     mgrs:grid_square                         <U2 8B 'GA'
##     s2:processing_baseline                   <U5 20B '05.10'
##     ...                                       ...
##     processing:software                      object 8B {'sentinel2-to-stac': ...
##     proj:epsg                                int64 8B 32722
##     grid:code                                <U10 40B 'MGRS-22KGA'
##     raster:bands                             object 8B [{'nodata': 0, 'data_t...
##     gsd                                      object 8B 10
##     epsg                                     int64 8B 32722

With xarray, we are always using lazy evaluation functions. So, the code above still needs to be executed. For this, we can ask dask to execute it using the compute method.

ndvi_mean_data <- ndvi_mean$compute()
ndvi_mean_data
## <xarray.DataArray 'stackstac-dbb0e730c11b926fe5ac5a87ce8c790a' (y: 10980,
##                                                                 x: 10980)> Size: 964MB
## array([[1.06741027, 1.05594742, 1.08179379, ..., 1.42908627, 1.41649439,
##         1.40925625],
##        [0.98031766, 1.02521273, 1.06912391, ..., 1.42091841, 1.40591751,
##         1.4122806 ],
##        [0.9362474 , 0.9941019 , 1.04582839, ..., 1.40666432, 1.41219386,
##         1.42676016],
##        ...,
##        [0.25959347, 0.2616745 , 0.26388728, ..., 1.61944882, 1.79426859,
##         2.08429357],
##        [0.27365805, 0.27209891, 0.26758767, ..., 1.84578764, 2.18360486,
##         2.58891568],
##        [0.282057  , 0.27502554, 0.267502  , ..., 1.964097  , 2.25718843,
##         2.09648986]])
## Coordinates: (12/21)
##   * x                                        (x) float64 88kB 7e+05 ... 8.098...
##   * y                                        (y) float64 88kB 7.6e+06 ... 7.4...
##     mgrs:latitude_band                       <U1 4B 'K'
##     earthsearch:boa_offset_applied           bool 1B True
##     mgrs:grid_square                         <U2 8B 'GA'
##     s2:processing_baseline                   <U5 20B '05.10'
##     ...                                       ...
##     processing:software                      object 8B {'sentinel2-to-stac': ...
##     proj:epsg                                int64 8B 32722
##     grid:code                                <U10 40B 'MGRS-22KGA'
##     raster:bands                             object 8B [{'nodata': 0, 'data_t...
##     gsd                                      object 8B 10
##     epsg                                     int64 8B 32722

To finish, we can save this result:

# defining output directory
output_dir <- fs::path("data/example-03")

# creating output directory
fs::dir_create(output_dir, recurse = TRUE)

# defining file metadata
ndvi_mean_data$rio$set_attrs(nir$attrs, inplace = TRUE)
## <xarray.DataArray 'stackstac-dbb0e730c11b926fe5ac5a87ce8c790a' (y: 10980,
##                                                                 x: 10980)> Size: 964MB
## array([[1.06741027, 1.05594742, 1.08179379, ..., 1.42908627, 1.41649439,
##         1.40925625],
##        [0.98031766, 1.02521273, 1.06912391, ..., 1.42091841, 1.40591751,
##         1.4122806 ],
##        [0.9362474 , 0.9941019 , 1.04582839, ..., 1.40666432, 1.41219386,
##         1.42676016],
##        ...,
##        [0.25959347, 0.2616745 , 0.26388728, ..., 1.61944882, 1.79426859,
##         2.08429357],
##        [0.27365805, 0.27209891, 0.26758767, ..., 1.84578764, 2.18360486,
##         2.58891568],
##        [0.282057  , 0.27502554, 0.267502  , ..., 1.964097  , 2.25718843,
##         2.09648986]])
## Coordinates: (12/21)
##   * x                                        (x) float64 88kB 7e+05 ... 8.098...
##   * y                                        (y) float64 88kB 7.6e+06 ... 7.4...
##     mgrs:latitude_band                       <U1 4B 'K'
##     earthsearch:boa_offset_applied           bool 1B True
##     mgrs:grid_square                         <U2 8B 'GA'
##     s2:processing_baseline                   <U5 20B '05.10'
##     ...                                       ...
##     processing:software                      object 8B {'sentinel2-to-stac': ...
##     proj:epsg                                int64 8B 32722
##     grid:code                                <U10 40B 'MGRS-22KGA'
##     raster:bands                             object 8B [{'nodata': 0, 'data_t...
##     gsd                                      object 8B 10
##     epsg                                     int64 8B 32722
## Attributes:
##     spec:        RasterSpec(epsg=32722, bounds=(699960.0, 7490200.0, 809760.0...
##     crs:         epsg:32722
##     transform:   | 10.00, 0.00, 699960.00|\n| 0.00,-10.00, 7600000.00|\n| 0.0...
##     resolution:  10.0
# writing raster
ndvi_mean_data$rio$to_raster(output_dir / "ndvi.tif")