This document presents an example of how the xarray can be used to handle spatio-temporal raster in R.
To use the xarray in R, it is required to have the reticulate package installed. To do so, you can use the following command:
install.packages("reticulate")
After installing the reticulate package, we must create a Python virtual environment to install the xarray library. For this, we can use the following command:
In this example, we use conda to manage Python virtual environments, but any other package management tool can also be used (e.g., mamba, virtualenv, poetry).
# defining the name of the environment
conda_env_name <- "r-xarray"
# creating conda environment
reticulate::conda_create(envname = conda_env_name)
## + /opt/conda/bin/conda 'create' '--yes' '--name' 'r-xarray' 'python=3.9' '--quiet' '-c' 'conda-forge'
## [1] "/home/sits/.conda/envs/r-xarray/bin/python"
With the environment created, let’s install the dependencies on it:
reticulate::py_install(
c("xarray", "dask", "distributed", "rioxarray", "stackstac"),
envname = conda_env_name,
pip = TRUE
)
Now, we have an environment ready to go. So, let’s activate it:
reticulate::use_condaenv(condaenv = conda_env_name)
To use the xarray library, we can import it using the following reticulate command:
xr <- reticulate::import("xarray")
For this example, we also need to import some dask packages:
da <- reticulate::import("dask.array")
dd <- reticulate::import("dask.distributed")
By default, xarray does not
manage raster data (e.g., read
, write
,
spatial reference system
). For this, we need to use the rioxarray
library, which makes a rasterio
accessor
available in the xarray classes,
enabling us to handle raster data. Let’s import it:
rioxarray <- reticulate::import("rioxarray")
Also, in this example, we will use data from an STAC catalog. To keep this example focused on the xarray and avoid many data manipulations, we will also use the stackstac library, which turns STAC items into an xarray:
stackstac <- reticulate::import("stackstac")
Now, we are ready to go!
As dask was created to help us
scale applications using multiple cores or machines, to use it, we need
first to configure the connection between our application and those
resources. To do that, we must create an instance of the class
dask.distributed.Client
:
client <- dd$Client()
client
## <Client: 'tcp://127.0.0.1:42677' processes=4 threads=16, memory=31.20 GiB>
By default, when we create an instance of the
dask.distributed.Client
class, dask
configures a local server to use all cores available in the machine.
rstac
In this example, we will get data from the STAC catalog managed by Element 84. For this, we will use the rstac. Let’s import it:
library(rstac)
Now, let’s define the endpoint
object of the STAC
catalog:
s_obj <- stac("https://earth-search.aws.element84.com/v1")
Using the endpoint
we created above, let’s search for
Sentinel 2 L2A data from the Ibitinga reservoir (Brazil / SP):
it_obj <- s_obj |>
stac_search(
collections = "sentinel-2-l2a",
bbox = c(-48.931732, -22.055096, -48.850708, -21.982528), # Ibitinga/SP
limit = 3
) |>
get_request()
Loading data can be very time-consuming. So, in this example, we will download the data locally. For this, we need to create an output directory:
# defining output directory
output_dir <- fs::path("data/example-04/sentinel-2-l2a")
# creating output directory
fs::dir_create(output_dir, recurse = TRUE)
Now, we can download the data:
it_obj |>
assets_download(asset_names = c("red", "nir"), output_dir = output_dir)
##
|
| | 0%
|
|======================= | 33%
|
|=============================================== | 67%
|
|======================================================================| 100%
## ###Items
## - matched feature(s): 580
## - features (3 item(s) / 577 not fetched):
## - S2B_22KGA_20240223_0_L2A
## - S2A_22KGA_20240218_0_L2A
## - S2B_22KGA_20240213_0_L2A
## - assets: nir, red
## - item's fields:
## assets, bbox, collection, geometry, id, links, properties, stac_extensions, stac_version, type
As the final step, we need to have the reference of the files we
downloaded. rstac
helps us by saving an
items.json
file in an STAC Catalog schema so we can load
and manage the data easily:
metadata <- jsonlite::read_json(output_dir / "items.json")
# Adapting the paths of the metadata from /path/to/data to {output_dir}/path/to/data
metadata$features <- lapply(metadata$features, function(x) {
x$assets <- lapply(x$assets, function(y) {
y$href <- as.character(fs::path_join(output_dir / y$href))
y
})
x
})
Based on the STAC Items we have downloaded from the STAC Catalog, we can create a spatio-temporal array. For this, we will use the stackstac library:
spatiotemporal_array <- stackstac$stack(metadata$features)
spatiotemporal_array
## <xarray.DataArray 'stackstac-5f62925312549d03a38973854ea7d911' (time: 3,
## band: 2,
## y: 10980,
## x: 10980)> Size: 6GB
## dask.array<fetch_raster_window, shape=(3, 2, 10980, 10980), dtype=float64, chunksize=(1, 1, 1024, 1024), chunktype=numpy.ndarray>
## Coordinates: (12/54)
## * time (time) datetime64[ns] 24B 2024-0...
## id (time) <U24 288B 'S2B_22KGA_2024...
## * band (band) <U3 24B 'nir' 'red'
## * x (x) float64 88kB 7e+05 ... 8.098...
## * y (y) float64 88kB 7.6e+06 ... 7.4...
## mgrs:latitude_band <U1 4B 'K'
## ... ...
## title (band) <U20 160B 'NIR 1 (band 8)...
## gsd int64 8B 10
## common_name (band) <U3 24B 'nir' 'red'
## center_wavelength (band) float64 16B 0.842 0.665
## full_width_half_max (band) float64 16B 0.145 0.038
## epsg int64 8B 32722
## Attributes:
## spec: RasterSpec(epsg=32722, bounds=(699960.0, 7490200.0, 809760.0...
## crs: epsg:32722
## transform: | 10.00, 0.00, 699960.00|\n| 0.00,-10.00, 7600000.00|\n| 0.0...
## resolution: 10.0
Using the spatio-temporal array, we can calculate a temporal mean of
NDVI data. For this, first, let’s get the NIR
and
RED
data:
nir <- spatiotemporal_array$sel(band="nir")
red <- spatiotemporal_array$sel(band="red")
Now, we can calculate the NDVI:
ndvi <- (nir - red) / (nir + red)
ndvi
## <xarray.DataArray 'stackstac-5f62925312549d03a38973854ea7d911' (time: 3,
## y: 10980,
## x: 10980)> Size: 3GB
## dask.array<truediv, shape=(3, 10980, 10980), dtype=float64, chunksize=(1, 1024, 1024), chunktype=numpy.ndarray>
## Coordinates: (12/49)
## * time (time) datetime64[ns] 24B 2024-0...
## id (time) <U24 288B 'S2B_22KGA_2024...
## * x (x) float64 88kB 7e+05 ... 8.098...
## * y (y) float64 88kB 7.6e+06 ... 7.4...
## mgrs:latitude_band <U1 4B 'K'
## s2:product_uri (time) <U65 780B 'S2B_MSIL2A_202...
## ... ...
## view:sun_azimuth (time) float64 24B 77.97 ... 71.46
## proj:transform object 8B {0, 7600000, 10, -10, ...
## proj:shape object 8B {10980}
## raster:bands object 8B {'nodata': 0, 'data_ty...
## gsd int64 8B 10
## epsg int64 8B 32722
To finish, we can calculate the temporal mean of the NDVI data we created:
ndvi_mean <- ndvi$mean(dim = c("time"))
ndvi_mean
## <xarray.DataArray 'stackstac-5f62925312549d03a38973854ea7d911' (y: 10980,
## x: 10980)> Size: 964MB
## dask.array<mean_agg-aggregate, shape=(10980, 10980), dtype=float64, chunksize=(1024, 1024), chunktype=numpy.ndarray>
## Coordinates: (12/23)
## * x (x) float64 88kB 7e+05 ... 8.098...
## * y (y) float64 88kB 7.6e+06 ... 7.4...
## mgrs:latitude_band <U1 4B 'K'
## s2:datatake_type <U8 32B 'INS-NOBS'
## earthsearch:boa_offset_applied bool 1B True
## s2:product_type <U7 28B 'S2MSI2A'
## ... ...
## processing:software object 8B {'sentinel2-to-stac': ...
## proj:transform object 8B {0, 7600000, 10, -10, ...
## proj:shape object 8B {10980}
## raster:bands object 8B {'nodata': 0, 'data_ty...
## gsd int64 8B 10
## epsg int64 8B 32722
With xarray, we are
always using lazy evaluation functions. So, the code above still needs
to be executed. For this, we can ask dask to execute it using the
compute
method.
ndvi_mean_data <- ndvi_mean$compute()
ndvi_mean_data
## <xarray.DataArray 'stackstac-5f62925312549d03a38973854ea7d911' (y: 10980,
## x: 10980)> Size: 964MB
## array([[1.06741027, 1.05594742, 1.08179379, ..., 1.42908627, 1.41649439,
## 1.40925625],
## [0.98031766, 1.02521273, 1.06912391, ..., 1.42091841, 1.40591751,
## 1.4122806 ],
## [0.9362474 , 0.9941019 , 1.04582839, ..., 1.40666432, 1.41219386,
## 1.42676016],
## ...,
## [0.25959347, 0.2616745 , 0.26388728, ..., 1.61944882, 1.79426859,
## 2.08429357],
## [0.27365805, 0.27209891, 0.26758767, ..., 1.84578764, 2.18360486,
## 2.58891568],
## [0.282057 , 0.27502554, 0.267502 , ..., 1.964097 , 2.25718843,
## 2.09648986]])
## Coordinates: (12/23)
## * x (x) float64 88kB 7e+05 ... 8.098...
## * y (y) float64 88kB 7.6e+06 ... 7.4...
## mgrs:latitude_band <U1 4B 'K'
## s2:datatake_type <U8 32B 'INS-NOBS'
## earthsearch:boa_offset_applied bool 1B True
## s2:product_type <U7 28B 'S2MSI2A'
## ... ...
## processing:software object 8B {'sentinel2-to-stac': ...
## proj:transform object 8B {0, 7600000, 10, -10, ...
## proj:shape object 8B {10980}
## raster:bands object 8B {'nodata': 0, 'data_ty...
## gsd int64 8B 10
## epsg int64 8B 32722
To finish, we can save this result:
# defining file metadata
ndvi_mean_data$rio$set_attrs(nir$attrs, inplace = TRUE)
## <xarray.DataArray 'stackstac-5f62925312549d03a38973854ea7d911' (y: 10980,
## x: 10980)> Size: 964MB
## array([[1.06741027, 1.05594742, 1.08179379, ..., 1.42908627, 1.41649439,
## 1.40925625],
## [0.98031766, 1.02521273, 1.06912391, ..., 1.42091841, 1.40591751,
## 1.4122806 ],
## [0.9362474 , 0.9941019 , 1.04582839, ..., 1.40666432, 1.41219386,
## 1.42676016],
## ...,
## [0.25959347, 0.2616745 , 0.26388728, ..., 1.61944882, 1.79426859,
## 2.08429357],
## [0.27365805, 0.27209891, 0.26758767, ..., 1.84578764, 2.18360486,
## 2.58891568],
## [0.282057 , 0.27502554, 0.267502 , ..., 1.964097 , 2.25718843,
## 2.09648986]])
## Coordinates: (12/23)
## * x (x) float64 88kB 7e+05 ... 8.098...
## * y (y) float64 88kB 7.6e+06 ... 7.4...
## mgrs:latitude_band <U1 4B 'K'
## s2:datatake_type <U8 32B 'INS-NOBS'
## earthsearch:boa_offset_applied bool 1B True
## s2:product_type <U7 28B 'S2MSI2A'
## ... ...
## processing:software object 8B {'sentinel2-to-stac': ...
## proj:transform object 8B {0, 7600000, 10, -10, ...
## proj:shape object 8B {10980}
## raster:bands object 8B {'nodata': 0, 'data_ty...
## gsd int64 8B 10
## epsg int64 8B 32722
## Attributes:
## spec: RasterSpec(epsg=32722, bounds=(699960.0, 7490200.0, 809760.0...
## crs: epsg:32722
## transform: | 10.00, 0.00, 699960.00|\n| 0.00,-10.00, 7600000.00|\n| 0.0...
## resolution: 10.0
# writing raster
ndvi_mean_data$rio$to_raster(output_dir / "ndvi.tif")