--- title: "Raster API Tutorial" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Raster API Tutorial} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` **gdalraster** provides bindings to the Raster API of the Geospatial Data Abstraction Library ([GDAL](https://gdal.org/)). Using the API natively enables fast and scalable raster I/O from R. This vignette is an R port of the [GDAL Raster API tutorial](https://gdal.org/tutorials/raster_api_tut.html) for C++, C and Python, (c) 1998-2024 [Frank Warmerdam](https://github.com/warmerdam), [Even Rouault](https://github.com/rouault), and [others](https://github.com/OSGeo/gdal/graphs/contributors) ([MIT license](https://gdal.org/license.html)). ## Opening a raster dataset Before opening a GDAL-supported data store it is necessary to register drivers. There is a driver for each supported raster format. **gdalraster** automatically registers drivers when the package is loaded. A raster dataset is opened by creating a new instance of `GDALRaster-class` passing the `filename` and the access desired (`read_only = TRUE` is the default if not specified, or `read_only = FALSE` to open with update access): ```{r} library(gdalraster) tcc_file <- system.file("extdata/storml_tcc.tif", package="gdalraster") ds <- new(GDALRaster, tcc_file, read_only = TRUE) ``` An error is returned if the dataset cannot be opened (and creation of the `GDALRaster` object fails). Also, note that `filename` may not actually be the name of a physical file (though it usually is). Its interpretation is driver dependent, and it might be a URL, a database connection string, a file name with additional parameters, etc. `GDALRaster` is a C++ class exposed directly to R (via `RCPP_EXPOSED_CLASS`) that encapsulates a GDAL dataset object and its associated raster band objects. Methods of the class are accessed in R using the `$` operator: ```{r} ds str(ds) ``` ## Getting dataset information As described in the [GDAL Raster Data Model](https://gdal.org/user/raster_data_model.html), a GDAL dataset contains a list of raster bands, all pertaining to the same area and having the same resolution. It also has metadata, a coordinate system, a georeferencing transform, size of raster and various other information. In the particular but common case of a "north up" raster without any rotation or shearing, the georeferencing transform (see [Geotransform Tutorial](https://gdal.org/tutorials/geotransforms_tut.html)) takes the following form *with 1-based indexing in R*: ```{r} gt <- ds$getGeoTransform() gt[1] # x-coordinate of upper-left corner of the upper-left pixel gt[2] # pixel width (w-e resolution) gt[3] # 0 for north-up gt[4] # y-coordinate of upper-left corner of the upper-left pixel gt[5] # 0 for north-up gt[6] # pixel height (n-s resolution, negative value) ``` In the general case, this is an affine transform. Class `GDALRaster` includes convenience methods for the case of a north-up raster: ```{r} ds$bbox() # xmin, ymin, xmax, ymax ds$res() # pixel width, pixel height as positive values ``` The following code retrieves some additional information about the dataset: ```{r} # GDAL format driver ds$getDriverShortName() ds$getDriverLongName() # raster size in pixels, number of bands ds$getRasterXSize() ds$getRasterYSize() ds$getRasterCount() ds$dim() # coordinate reference system as WKT string ds$getProjectionRef() # origin and pixel size from the geotransform print(paste("Origin:", gt[1], gt[4])) print(paste("Pixel size:", gt[2], gt[6])) ``` ## Fetching a raster band At this time access to raster data via GDAL is done one band at a time. Also, metadata, block sizes, nodata values and various other information are available on a per-band basis. Class `GDALRaster` provides methods to access raster band objects of the dataset (numbered 1 through `ds$getRasterCount()`) by specifying a `band` number as the first argument: ```{r} # block size ds$getBlockSize(band = 1) # data type ds$getDataTypeName(band = 1) # nodata value ds$getNoDataValue(band = 1) # min, max, mean, sd of pixel values in the band ds$getStatistics(band = 1, approx_ok = FALSE, force = TRUE) # does this band have overviews? (aka "pyramids") ds$getOverviewCount(band = 1) # does this band have a color table? col_tbl <- ds$getColorTable(band = 1) if (!is.null(col_tbl)) head(col_tbl) ``` ## Reading raster data `GDALRaster$read()` is a wrapper for the `GDALRasterBand::RasterIO()` method in the underlying API. This method will automatically take care of data type conversion, up/down sampling and windowing. The following code will read the first row of data into a similarly sized vector. `GDALRaster$read()` will return data as R `integer` type if possible for the raster data type (Byte, Int8, Int16, UInt16, Int32), otherwise the returned vector will be of type `double` (UInt32, Float32, Float64) or `complex` (CInt16, CInt32, CFloat32, CFloat64). The returned data are organized in left to right, top to bottom pixel order. `NA` will be returned in place of the nodata value if the raster dataset has a nodata value defined for the band: ```{r} # read the first row of pixel values ncols <- ds$getRasterXSize() rowdata <- ds$read(band = 1, xoff = 0, yoff = 0, xsize = ncols, ysize = 1, out_xsize = ncols, out_ysize = 1) length(rowdata) typeof(rowdata) head(rowdata) ``` Writing data with `GDALRaster$write()` is similar to `$read()` with an additional argument specifying a vector of pixel data to write (arranged in left to right, top to bottom pixel order). The `xoff`, `yoff`, `xsize`, `ysize` arguments describe the window of raster data on disk to read (or write). It doesn't have to fall on tile boundaries, though access may be more efficient in some cases if it does. Note that GDAL uses memory caching algorithms during raster I/O to improve performance. The operation of the caching mechanism and configuration of cache memory size might be considered when scaling I/O to large datasets (see [GDAL Block Cache](https://usdaforestservice.github.io/gdalraster/articles/gdal-block-cache.html)). The values for `out_xsize` and `out_ysize` describe the size of the output buffer (an R vector of length `out_xsize * out_ysize` that data will be read into). When reading data at full resolution this would be the same as the window size (`xsize`, `ysize`). However, to load a reduced resolution overview, `out_xsize`, `out_ysize` could be set to smaller than the window on disk. The `$read()` method will perform automatic resampling as necessary if the specified output size (`out_xsize * out_ysize`) is different than the size of the region being read (`xsize * ysize`). In this case, overviews (a.k.a. "pyramids") will be utilized to do the I/O more efficiently if overviews are available at suitable resolution. The stand-alone function `plot_raster()` uses base R `graphics` to display raster data read from an open dataset (with options to display a subwindow, to read a reduced resolution overview, or read from multiple bands for RGB data): ```{r fig.width=6, fig.height=4, dev="png"} #| fig.alt: > #| A plot of pixel-level tree canopy cover (%) for an area of interest #| called Storm Lake, which is used for several example datasets in package #| gdalraster. The plot uses a color ramp from light to dark green #| indicating low to high tree canopy cover. Pixels with zero tree canopy #| cover are white. plot_raster(ds, legend = TRUE, main = "Storm Lake Tree Canopy Cover (%)") ``` ## Closing the dataset Calling `GDALRaster$close()` will result in proper cleanup, and flushing of any pending writes. Forgetting to close a dataset opened in update mode in a popular format like GTiff will likely result in being unable to open it afterwards. ```{r} # close the dataset for proper cleanup ds$close() ``` ## Techniques for creating datasets New raster datasets in GDAL-supported formats may be created if the format driver supports creation. There are two general techniques for creating datasets in the GDAL API: `GDALDriver::CreateCopy()` and `GDALDriver::Create()`. Using the CreateCopy method in R involves calling the stand-alone function `createCopy()`, passing in a source raster file name that should be copied. Using the Create method in R involves calling the stand-alone function `create()`, and then explicitly writing all the metadata and raster data with separate calls. All format drivers that support creating new datasets support `createCopy()`, but only a few support `create()`. The function `gdal_formats()` lists all currently configured raster formats along with the following read/write flags: * `ro` - read only * `rw` - read/write, supports `createCopy()` * `rw+` - read/write/update, supports `create()` The table of GDAL [raster format drivers](https://gdal.org/drivers/raster/index.html) can also be consulted to determine if a particular driver supports Create or CreateCopy methods. Note that a number of drivers are read-only and do not support either creation method. ## Using createCopy() `createCopy()` is simple to use as most information is collected from the source dataset. It includes an argument for passing a list of format specific creation options. It can be used to copy a raster to a different format, and/or change options such as the block size and arrangement, compression, various metadata, etc. The following code copies a multi-band raster in FARSITE v.4 LCP format (basically a raw format without support for compression or nodata values) to LZW-compressed GeoTiff: ```{r} lcp_file <- system.file("extdata/storm_lake.lcp", package="gdalraster") tif_file <- file.path(tempdir(), "storml_lndscp.tif") ds <- createCopy(format = "GTiff", dst_filename = tif_file, src_filename = lcp_file, options = "COMPRESS=LZW", return_obj = TRUE) # band = 0 for dataset-level metadata: ds$getMetadata(band = 0, domain = "IMAGE_STRUCTURE") # set nodata value for all bands for (band in 1:ds$getRasterCount()) ds$setNoDataValue(band, -9999) # band 2 of an LCP file is slope degrees ds$getStatistics(band = 2, approx_ok = FALSE, force = TRUE) ds$close() vsi_stat(lcp_file, "size") vsi_stat(tif_file, "size") ``` Note that the `createCopy()` method with `return_obj = TRUE` returns a writable dataset, and that it must be closed properly to complete writing and flushing the dataset to disk. `createCopy()` also has an optional `strict` argument that defaults to `FALSE` indicating that the call should proceed without a fatal error even if the destination dataset cannot be created to exactly match the input dataset. This might be because the output format does not support the pixel datatype of the input dataset, or because the destination cannot support writing georeferencing for instance. Information about format specific creation options can be obtained with the function `getCreationOptions()`. By default, this function lists all available creation options for a format. Output can also be filtered to specific options: ```{r} getCreationOptions("GTiff", "COMPRESS") getCreationOptions("GTiff", "SPARSE_OK") ``` ## Using create() `create()` can be used to create a new raster dataset manually. This function can also take a list of creation options as described above for `createCopy()`, but the raster size, number of bands and band type must be provided explicitly: ```{r} new_file <- file.path(tempdir(), "newdata.tif") ds <- create(format = "GTiff", dst_filename = new_file, xsize = 143, ysize = 107, nbands = 1, dataType = "Int16", return_obj = TRUE) ``` Once the dataset is successfully created, all appropriate metadata and raster data must be written to the file. What this includes will vary according to usage, but a simple case with a projection, geotransform and raster data is covered here: ```{r} # EPSG:26912 - NAD83 / UTM zone 12N ds$setProjection(epsg_to_wkt(26912)) gt <- c(323476.1, 30, 0, 5105082.0, 0, -30) ds$setGeoTransform(gt) ds$setNoDataValue(band = 1, -9999) ds$fillRaster(band = 1, -9999, 0) # ... # close the dataset when done ds$close() ``` ## See also **gdalraster** provides two additional functions for creating raster datasets: * `rasterFromRaster()` creates a new raster with spatial reference, extent and resolution taken from a template raster, without copying data. It optionally changes the format, number of bands, data type and nodata value, sets driver-specific dataset creation options, and initializes to a value. * `rasterToVRT()` creates a virtual raster dataset (VRT) derived from a source raster with options for virtual subsetting, virtually resampling the source data at a different pixel resolution, or applying a virtual kernel filter. Wrapper functions for several GDAL utilities, including `translate()` and `warp()`, are also available. See the [package overview](https://usdaforestservice.github.io/gdalraster/reference/gdalraster-package.html) for a full summary of functionality provided by the GDAL API bindings. ## Data sources The example datasets are National Land Cover Database (NLCD) Tree Canopy Cover (TCC v2021.4) from the USDA Forest Service (), and a multi-band FARSITE landscape file describing terrain, vegetation and wildland fuels from the LANDFIRE Program (LF 2020 version, ).