This document presents an example of how the xarray can be used to handle spatio-temporal raster in R.
To use the xarray in R, it is required to have the reticulate package installed. To do so, you can use the following command:
install.packages("reticulate")
After installing the reticulate package, we must create a Python virtual environment to install the xarray library. For this, we can use the following command:
In this example, we use conda to manage Python virtual environments, but any other package management tool can also be used (e.g., mamba, virtualenv, poetry).
# defining the name of the environment
conda_env_name <- "r-xarray"
# creating conda environment
reticulate::conda_create(envname = conda_env_name)
## + /opt/conda/bin/conda 'create' '--yes' '--name' 'r-xarray' 'python=3.9' '--quiet' '-c' 'conda-forge'
## [1] "/home/sits/.conda/envs/r-xarray/bin/python"
With the environment created, let’s install the dependencies on it:
reticulate::py_install(
c("xarray", "dask", "distributed", "rioxarray", "stackstac"),
envname = conda_env_name,
pip = TRUE
)
Now, we have an environment ready to go. So, let’s activate it:
reticulate::use_condaenv(condaenv = conda_env_name)
To use the xarray library, we can import it using the following reticulate command:
xr <- reticulate::import("xarray")
For this example, we also need to import some dask packages:
da <- reticulate::import("dask.array")
dd <- reticulate::import("dask.distributed")
By default, xarray does not
manage raster data (e.g., read
, write
,
spatial reference system
). For this, we need to use the rioxarray
library, which makes a rasterio
accessor
available in the xarray classes,
enabling us to handle raster data. Let’s import it:
rioxarray <- reticulate::import("rioxarray")
Also, in this example, we will use data from an STAC catalog. To keep this example focused on the xarray and avoid many data manipulations, we will also use the stackstac library, which turns STAC items into an xarray:
stackstac <- reticulate::import("stackstac")
Now, we are ready to go!
As dask was created to help us
scale applications using multiple cores or machines, to use it, we need
first to configure the connection between our application and those
resources. To do that, we must create an instance of the class
dask.distributed.Client
:
client <- dd$Client()
client
## <Client: 'tcp://127.0.0.1:34623' processes=4 threads=16, memory=31.20 GiB>
By default, when we create an instance of the
dask.distributed.Client
class, dask
configures a local server to use all cores available in the machine.
rstac
In this example, we will get data from the STAC catalog managed by Element 84. For this, we will use the rstac. Let’s import it:
library(rstac)
Now, let’s define the endpoint
object of the STAC
catalog:
s_obj <- stac("https://earth-search.aws.element84.com/v1")
Using the endpoint
we created above, let’s search for
Sentinel 2 L2A data from the Ibitinga reservoir (Brazil / SP):
it_obj <- s_obj |>
stac_search(
collections = "sentinel-2-l2a",
bbox = c(-48.931732, -22.055096, -48.850708, -21.982528), # Ibitinga/SP
limit = 3
) |>
get_request()
Based on the STAC Items we have selected from the STAC Catalog, we can create a spatio-temporal array. For this, we will use the stackstac library:
spatiotemporal_array <- stackstac$stack(it_obj$features)
spatiotemporal_array
## <xarray.DataArray 'stackstac-dbb0e730c11b926fe5ac5a87ce8c790a' (time: 3,
## band: 32,
## y: 10980,
## x: 10980)> Size: 93GB
## dask.array<fetch_raster_window, shape=(3, 32, 10980, 10980), dtype=float64, chunksize=(1, 1, 1024, 1024), chunktype=numpy.ndarray>
## Coordinates: (12/52)
## * time (time) datetime64[ns] 24B 2024-0...
## id (time) <U24 288B 'S2B_22KGA_2024...
## * band (band) <U12 2kB 'aot' ... 'wvp-jp2'
## * x (x) float64 88kB 7e+05 ... 8.098...
## * y (y) float64 88kB 7.6e+06 ... 7.4...
## mgrs:latitude_band <U1 4B 'K'
## ... ...
## title (band) <U31 4kB 'Aerosol optical...
## gsd (band) object 256B None 10 ... None
## common_name (band) object 256B None ... None
## center_wavelength (band) object 256B None ... None
## full_width_half_max (band) object 256B None ... None
## epsg int64 8B 32722
## Attributes:
## spec: RasterSpec(epsg=32722, bounds=(699960.0, 7490200.0, 809760.0...
## crs: epsg:32722
## transform: | 10.00, 0.00, 699960.00|\n| 0.00,-10.00, 7600000.00|\n| 0.0...
## resolution: 10.0
Using the spatio-temporal array, we can calculate a temporal mean of
NDVI data. For this, first, let’s get the NIR
and
RED
data:
nir <- spatiotemporal_array$sel(band="nir")
red <- spatiotemporal_array$sel(band="red")
Now, we can calculate the NDVI:
ndvi <- (nir - red) / (nir + red)
ndvi
## <xarray.DataArray 'stackstac-dbb0e730c11b926fe5ac5a87ce8c790a' (time: 3,
## y: 10980,
## x: 10980)> Size: 3GB
## dask.array<truediv, shape=(3, 10980, 10980), dtype=float64, chunksize=(1, 1024, 1024), chunktype=numpy.ndarray>
## Coordinates: (12/47)
## * time (time) datetime64[ns] 24B 2024-0...
## id (time) <U24 288B 'S2B_22KGA_2024...
## * x (x) float64 88kB 7e+05 ... 8.098...
## * y (y) float64 88kB 7.6e+06 ... 7.4...
## mgrs:latitude_band <U1 4B 'K'
## earthsearch:boa_offset_applied bool 1B True
## ... ...
## processing:software object 8B {'sentinel2-to-stac': ...
## proj:epsg int64 8B 32722
## grid:code <U10 40B 'MGRS-22KGA'
## raster:bands object 8B [{'nodata': 0, 'data_t...
## gsd object 8B 10
## epsg int64 8B 32722
To finish, we can calculate the temporal mean of the NDVI data we created:
ndvi_mean <- ndvi$mean(dim = c("time"))
ndvi_mean
## <xarray.DataArray 'stackstac-dbb0e730c11b926fe5ac5a87ce8c790a' (y: 10980,
## x: 10980)> Size: 964MB
## dask.array<mean_agg-aggregate, shape=(10980, 10980), dtype=float64, chunksize=(1024, 1024), chunktype=numpy.ndarray>
## Coordinates: (12/21)
## * x (x) float64 88kB 7e+05 ... 8.098...
## * y (y) float64 88kB 7.6e+06 ... 7.4...
## mgrs:latitude_band <U1 4B 'K'
## earthsearch:boa_offset_applied bool 1B True
## mgrs:grid_square <U2 8B 'GA'
## s2:processing_baseline <U5 20B '05.10'
## ... ...
## processing:software object 8B {'sentinel2-to-stac': ...
## proj:epsg int64 8B 32722
## grid:code <U10 40B 'MGRS-22KGA'
## raster:bands object 8B [{'nodata': 0, 'data_t...
## gsd object 8B 10
## epsg int64 8B 32722
With xarray, we are
always using lazy evaluation functions. So, the code above still needs
to be executed. For this, we can ask dask to execute it using the
compute
method.
ndvi_mean_data <- ndvi_mean$compute()
ndvi_mean_data
## <xarray.DataArray 'stackstac-dbb0e730c11b926fe5ac5a87ce8c790a' (y: 10980,
## x: 10980)> Size: 964MB
## array([[1.06741027, 1.05594742, 1.08179379, ..., 1.42908627, 1.41649439,
## 1.40925625],
## [0.98031766, 1.02521273, 1.06912391, ..., 1.42091841, 1.40591751,
## 1.4122806 ],
## [0.9362474 , 0.9941019 , 1.04582839, ..., 1.40666432, 1.41219386,
## 1.42676016],
## ...,
## [0.25959347, 0.2616745 , 0.26388728, ..., 1.61944882, 1.79426859,
## 2.08429357],
## [0.27365805, 0.27209891, 0.26758767, ..., 1.84578764, 2.18360486,
## 2.58891568],
## [0.282057 , 0.27502554, 0.267502 , ..., 1.964097 , 2.25718843,
## 2.09648986]])
## Coordinates: (12/21)
## * x (x) float64 88kB 7e+05 ... 8.098...
## * y (y) float64 88kB 7.6e+06 ... 7.4...
## mgrs:latitude_band <U1 4B 'K'
## earthsearch:boa_offset_applied bool 1B True
## mgrs:grid_square <U2 8B 'GA'
## s2:processing_baseline <U5 20B '05.10'
## ... ...
## processing:software object 8B {'sentinel2-to-stac': ...
## proj:epsg int64 8B 32722
## grid:code <U10 40B 'MGRS-22KGA'
## raster:bands object 8B [{'nodata': 0, 'data_t...
## gsd object 8B 10
## epsg int64 8B 32722
To finish, we can save this result:
# defining output directory
output_dir <- fs::path("data/example-03")
# creating output directory
fs::dir_create(output_dir, recurse = TRUE)
# defining file metadata
ndvi_mean_data$rio$set_attrs(nir$attrs, inplace = TRUE)
## <xarray.DataArray 'stackstac-dbb0e730c11b926fe5ac5a87ce8c790a' (y: 10980,
## x: 10980)> Size: 964MB
## array([[1.06741027, 1.05594742, 1.08179379, ..., 1.42908627, 1.41649439,
## 1.40925625],
## [0.98031766, 1.02521273, 1.06912391, ..., 1.42091841, 1.40591751,
## 1.4122806 ],
## [0.9362474 , 0.9941019 , 1.04582839, ..., 1.40666432, 1.41219386,
## 1.42676016],
## ...,
## [0.25959347, 0.2616745 , 0.26388728, ..., 1.61944882, 1.79426859,
## 2.08429357],
## [0.27365805, 0.27209891, 0.26758767, ..., 1.84578764, 2.18360486,
## 2.58891568],
## [0.282057 , 0.27502554, 0.267502 , ..., 1.964097 , 2.25718843,
## 2.09648986]])
## Coordinates: (12/21)
## * x (x) float64 88kB 7e+05 ... 8.098...
## * y (y) float64 88kB 7.6e+06 ... 7.4...
## mgrs:latitude_band <U1 4B 'K'
## earthsearch:boa_offset_applied bool 1B True
## mgrs:grid_square <U2 8B 'GA'
## s2:processing_baseline <U5 20B '05.10'
## ... ...
## processing:software object 8B {'sentinel2-to-stac': ...
## proj:epsg int64 8B 32722
## grid:code <U10 40B 'MGRS-22KGA'
## raster:bands object 8B [{'nodata': 0, 'data_t...
## gsd object 8B 10
## epsg int64 8B 32722
## Attributes:
## spec: RasterSpec(epsg=32722, bounds=(699960.0, 7490200.0, 809760.0...
## crs: epsg:32722
## transform: | 10.00, 0.00, 699960.00|\n| 0.00,-10.00, 7600000.00|\n| 0.0...
## resolution: 10.0
# writing raster
ndvi_mean_data$rio$to_raster(output_dir / "ndvi.tif")