--- title: "`FIESTA` Manual - Population Data" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{`FIESTA` Manual - Population Data} %\VignetteEngine{knitr::rmarkdown} \usepackage[utf8]{inputenc} --- ```{r setup, include = F} library(knitr) knitr::opts_chunk$set(message = F, warning = F) ``` ```{r, include=FALSE} # Sets up output folding hooks = knitr::knit_hooks$get() hook_foldable = function(type) { force(type) function(x, options) { res = hooks[[type]](x, options) if (isFALSE(options[[paste0("fold.", type)]])) return(res) paste0( "

", type, "

\n\n", res, "\n\n

" ) } } knitr::knit_hooks$set( output = hook_foldable("output"), plot = hook_foldable("plot") ) ``` ```{r, echo=-1} data.table::setDTthreads(2) ``` ## Overview `FIESTA`'s Estimation Modules combine multiple functions from `FIESTA` or other packages to generate estimates across a user-defined population(s) using different estimation strategies. Each module has an associated `mod*pop` function for compiling the population data and calculations, including adjustments for nonresponse and standardizing auxiliary data. The output form the `mod*pop` functions are input directly into the `mod*estimation` modules. All Population functions require similar data inputs, including a set of inventory response data and summarized auxiliary data for post-stratification or other model-assisted and model-based (i.e., small area) estimation strategies. This vignette describes the required input data sets, parameter inputs, and outputs from the `mod*pop` functions. Refer to the FIESTA_module_estimates vignette for more information on other parameter inputs to the `mod*` Estimation Modules and the following vignettes for running specific examples: * [FIESTA_tutorial_GB](FIESTA_tutorial_GB.html) * [FIESTA_tutorial_MA](FIESTA_tutorial_MA.html) * [FIESTA_tutorial_SA](FIESTA_tutorial_SA.html) ## Module Parameters {#input} The parameters for `FIESTA` modules are organized by different categories based on population data and resulting estimates. [Population data](#popdat): 1. [Population type](#ptyp) 2. [Population data tables and unique identifiers](#dtab) 3. [Estimation area info](#edata) 4. [Population filters](#popfilters) 5. [Estimation unit info](#estunit) 6. [Other](#other) 7. [Post-stratification information (strata_opts)](#strata_opts) 8. [Model-Assisted/Small Area information](#model_opts) 9. [Input data objects](#dataobjects) ### Population Data {#popdat} #### Population type {#ptyp} The population types (i.e., Eval_Type) currently available for `FIESTA` estimation. The population type defines the set of sampled plots and data used for estimation. For example, if you are only interested in area estimates (popType='CURR'), you do not need the tree data. Other population types will be available in the future, including GRM (Growth, mortality, removals), P2VEG (understory vegetation), CHNG (Change), and DWM (down woody material). These population types may have different sets of plots based on what was sampled. * **popType** - Population type ('ALL', 'CURR', 'VOL'). #### Population data tables and unique identifiers {#dtab} The required data tables include forest inventory data from the FIA national database (Burrill et al. 2018). Data table inputs can be the name of a comma-delimited file (\*.csv), a layer within a database, (e.g., SQLite), or an R data frame or data table object already loaded into R. The `pltassgn` table can also be a point shapefile (\*.shp), a spatial layer within a database, or an `sf` R object with one point per plot. The unique identifier for a plot must be provided in the corresponding parameter for each input table, match default variable names. See required variables section for a list of variables necessary to include for estimation. All modules require at least one table. * **popTables** - A named list of data tables used for estimates (cond, plt, tree, seed, vsubpspp, vsubpstr, subplot, subp_cond). See below for more details about tables. + **cond** - Condition-level data, with 1 record per condition, including nonsampled conditions. May include estimation unit and strata assignment if plt=NULL and pltassgn=NULL - *required for all estimates*. + **tree** - Tree-level data, with 1 record per tree - *required for modGBtree() or modGBratio() tree estimates*. + **plt** - Plot-level data, with 1 record per plot, including nonsampled conditions. May include nonsampled plots if PLOT_STATUS_CD variable is in dataset. May include estimation unit and strata assignment if pltassgn=NULL - *optional for all estimates*. + **seed** - Seedling data, with 1 record per seedling count - *required for modGBtree() or modGBratio() seedling estimates*. * **popTableIDs** - A named list of variable names defining unique plot identifiers in each table listed in popTables. See below for more details about tables. + **cond** - Unique identifier for plot in cond (default="PLT_CN"). + **plt** - Unique identifier for plot in plt (default="CN"). + **tree** - Unique identifier for plot in tree (default="PLT_CN"). + **seed** - Unique identifier for plot in seed (default="PLT_CN"). + **vsubpspp** - Unique identifier for plot in vsubpspp (default="PLT_CN"). + **vsubpstr** - Unique identifier for plot in vsubpstr (default="PLT_CN"). + **subplot** - Unique identifier for plot in subplot (default="PLT_CN"). + **subp_cond** - Unique identifier for plot in subp_cond (default="PLT_CN"). * **pltassgn** - Plot-level data, with 1 record per plot and plot assignment of estimation unit and strata, if applying stratification. If nonsampled plots are included, PLOT_STATUS_CD variable must be in table. These plots are excluded from the analysis. - *optional for all estimates*. * **pltassgnid**- Unique identifier for plot in pltassgn (default="PLT_CN"). * **pjoinid** - Join variable in plot (or cond) to match pltassgnid. Does not need to be unique. * **dsn** - Data source name of database where data table layers reside. #### Estimation area info {#edata} Define information for area estimation. * **areawt** - Variable to use for calculating area estimates (e.g., CONDPROP_UNADJ). This may be different for other population types. * **adj** - Adjustment for nonresponse ('none', 'samp', 'plot'). Note: adj='samp', expands area across strata and estimation unit(s), based on the summed proportions of sampled conditions divided by the total plots in the strata/estimation unit; adj='plot', expands area across plot based on 1 divided by the summed proportions of sampled conditions in plot. #### Population filters {#popfilters} Population filters subset the plot data set before population calculations are generated. * **evalid** - If multiple evaluations are in dataset, select evalid for estimation. * **invyrs** - If want to subset inventory years in dataset for estimation. * **intensity** - If want to specify intensification number of code to use for estimation. * **ACI** - Logical. If TRUE, includes All Condition Inventory (ACI) conditions and associated tree data in estimates. If FALSE, a filter, is applied to remove nonsampled nonforest conditions (see cond.nonsamp.filter). #### Estimation unit information {#estunit} An estimation unit is a population, or area of interest, with known area and number of plots. As an example, for RMRS FIA, an estimation unit is generally an individual county. An estimation unit may be a sub-population of a larger population (e.g., Counties within a State). For post-stratified estimation, sub-populations are mutually exclusive and independent within a population, therefore estimated totals and variances are additive. Each plot is assigned to only one estimation unit based on plot center and can be stored in either `pltassgn` or `cond`. For model-based, small area estimators, an estimation unit is a sub-population, referred to as a model domain unit, where each domain unit is a component in a model. * **unitvar/dunitvar** - Name of the estimation/domain unit variable in cond or pltassgn with assignment for each plot (e.g., 'ESTN_UNIT'). * **unitvar2** - Name of a second estimation unit variable in cond or pltassgn with assignment for each plot (e.g., 'STATECD'). * **unitarea/dunitarea** - Total acres by estimation/domain unit. If only 1 estimation unit, include a number representing total acreage for the area of interest. If more than one estimation unit, provide a data frame/data table of total acres by estimation unit, with variables defined by unitvar and areavar. * **areavar** - Name of acre variable in unitarea (Default = "ACRES"). * **areaunits** - Units of areavar in unitarea ('acres', 'hectares'). * **minplotnum.unit** - Minimum number of plots for estimation unit (Default=10). * **unit.action/dunit.action** - What to do if number of plots in an estimation/domain unit is less than minplotnum.unit ('keep', 'remove' 'combine'). If unit.action='combine', combines estimation/domain unit to the following estimation/domain unit in unitzonal/dunitzonal. Note: If there are less than minplotnum.unit plots in an estimation/domain unit: if unit.action/dunit.action='keep', NA is returned for the estimation/domain unit; if unit.action/dunit.action='remove', the estimation/domain unit is removed from the returned output; if unit.action/dunit.action='combine', an automated procedure occurs to group estimation/domain units with less than minplotnum.unit plots with the next estimation/domain unit in the stratalut or unitzonal table. If it is the last estimation/domain unit in the table, it is grouped with the estimation/domain unit preceding in the table. A recommended number of plots for post-stratified estimation is provided as defaults (Westfall and others, 2011). #### Other {#other} * **strata** - TRUE, use post-stratification for reducing variance in estimates (see strata_opts for strata parameters). For use in GB or MA modules. * **savedata** - TRUE, save data to outfolder (See savedata_opts for savedata parameters). #### Post-stratification information (strata_opts) {#strata_opts} Post-stratification is used to reduce variance in population estimates by partitioning the population into homogenous classes (strata), such as forest and nonforest. For stratified sampling methods, the strata sizes (weights) must be either known or estimated. Remotely-sensed data is often used to generate strata weights with proporation of pixels by strata. If stratification is desired (strata=TRUE), the required data include: stratum assignment for the center location of each plot, stored in either pltassgn or cond; and a look-up table with the area, pixel count, or proportion of the total area (strwt) of each strata value by estimation unit, making sure the name of the strata (and estimation unit) variable and values match the plot assignment name(s) and value(s). If strata (and estimation unit) variables are included in cond, all conditions in a plot must have the same strata (and estimation unit) value. In FIESTA, the plot assignments, strata proportions, and area are provided by the user and may be obtained through FIESTA or other means, given the proper format. These parameters are set by supplying a list to the `strata_opts` parameter. The possible parameters that can be set within the `strata_opts` parameter can be seen by running `help(strata_options)` * **stratalut** - Look-up table with pixel counts, area, or proportions (strwt) by strata (and estimation unit). * **strvar** - Name of strata variable in stratalut and pltassgn or cond table with strata assignment for each plot. * **getwt** - If TRUE, calculates strata weights from getwtvar in stratalut. * **getwtvar** - If getwt=TRUE, name of variable in stratalut to calculate weights Default="P1POINTCNT". * **strwtvar** - If getwt=FALSE, name of variable in stratalut with calculated weights (Default = 'strwt'). * **stratcombine** - TRUE, and strata=TRUE, an automated procedure occurs to combine strata within estimation units if less than minplotnum.strat (See note below for more details). * **minplotnum.strat** - Integer. Minimum number of plots for a stratum within an estimation unit (Default=2). Note: If there are less than minplotnum.strat plots (default=2 plots) in any strata/estimation unit combination: if stratcombine=FALSE, an error occurs with a message to collapse classes; if stratcombine=TRUE, an automated procedure occurs to collapse all strata less than minplotnum.strat. The function collapses classes based on the order of strata in stratatlut. If a strata within in estimation unit is less than minplotnum.strat, it is grouped with the next strata class in stratalut. #### Model-Assisted/Small Area information {#model_opts} Other Model-Assisted and Small Area estimation strategies require unit/dunit-level information, including auxiliary data summaries and predictor names. The following parameters are used to provide this information in the MA and SA `FIESTA` modules. * **unitzonal** - Table with zonal statistics (e.g., mean, area, proportions) by estimation/domain unit. * **npixelvar** - Name of variable in unitzonal referencing total pixels by estimation unit (MA module only). * **prednames** - Name(s) of predictors in unitzonal to use in models. If NULL, all variables in table are used. * **predfac** - Name(s) of predictors in unitzonal that are categorical (or factor) variables. #### Input data objects {#dataobjects} Data object parameters allow a user to use other functions from FIESTA to input parameters directly. * **\*data** - Output data list components from `FIESTA::an*data` functions. * **pltdat** - Output data list components from `FIESTA::pltdat` function. * **GBstratdat** - Output data list components from `FIESTA::spGetStrata` function (GB module only). * **auxdat** - Output data list components from `FIESTA::spGetAuxiliary` function (MA and SA modules only). ## Output values from `FIESTA` module population functions (`mod*pop`) {#populationoutput} * **condx** - Data frame of condition data within population, including plot assignments, condition proportion adjustment factor (cadjfac), and adjusted condition proportions (CONDPROP_ADJ). * **pltcondx** - Data frame of plot/condition data within population, used for estimation. * **treex** - Data frame of tree data within population, used for estimation, including trees per acre adjustment factor (tadjfac), and adjusted trees per acre (TPA_ADJ). * **cuniqueid** - Unique identifier of plot in condx and pltcondx. * **tuniqueid** - Unique identifier of plot in treex. * **condid** - Unique identifier of condition in condx and pltcondx. * **ACI.filter** - Filter used for excluding ACI plots if ACI=FALSE. * **unitvar** - Name of estimation unit variable in unitarea and condx. * **unitarea** - Data frame with area by estimation unit. * **areavar** - Name of area variable in unitarea. * **stratalut** - Data frame of stratification information by estimation unit. See below for variable descriptions. * **strvar** - Name of strata variable in stratalut and condx * **expcondtab** - Data frame of condition-level area expansion factors. * **plotsampcnt** - Number of plots by plot status. * **condsampcnt** - Number of conditions by condition status. * **states** - States of FIA plot data used in estimation (for title reference). * **invyrs** - Inventory years of FIA plot data used in estimation (for title reference). ```{r, results = 'asis', echo=FALSE} #stratdat.lut <- read.csv("C:/_tsf/_GitHub/meta/stratdat_variables.csv", stringsAsFactors=FALSE) #source("C:/_tsf/_GitHub/meta/undataframe.R") #stratdat.lut <- undataframe(stratdat.lut) stratdat.lut <- data.frame( Variable = c("ESTN_UNIT", "STRATUMCD", "P1POINTCNT", "P2POINTCNT", "n.strata", "n.total", "strwt", "CONDPROP_UNADJ_SUM", "cadjfac", "ACRES", "expfac", "EXPNS"), Description = c("Estimation unit", "Strata value", "Number of pixels by strata and estimation unit", "Number of P2 plots in population data", "Number of sampled plots in strata", "Number of sampled plots for estimation unit", "Proportion of pixels in strata (strata weight)", "Summed condition proportion in strata", "Adjustment factor for nonsampled plots in strata (CONDPROP_UNADJ_SUM/n.strata)", "Total acres for estimation unit", "Expansion factor, in acres, area in strata divided by number of sampled plots", "Expanded area, in acres, expfac multiplied by strwt"), stringsAsFactors = FALSE) kable(stratdat.lut, format = "pandoc", # default caption = "Description of variables in stratdat.", col.names = names(stratdat.lut), row.names = FALSE, align = c("l"), # align = c("c", "c", "c", "r") # padding = 2 # inner spacing ) ``` ## Required variables in input data tables {#required} The following variables by data table are required for successful `FIESTA` output. ```{r, results = 'asis', echo=FALSE} #required.lut <- read.table("C:/_tsf/_GitHub/meta/required_variables.txt", header=TRUE, sep="\t") required.lut <- data.frame( Table = c("tree", "", "cond", "", "", "", "", "", "", "", "", "plot", "", ""), Variable = c("PLT_CN", "TPA_UNADJ", "PLT_CN", "CONDPROP_UNADJ", "COND_STATUS_CD", "NF_COND_STATUS_CD", "SITECLCD", "RESERVCD", "SUBPROP_UNADJ", "MICRPROP_UNADJ", "MACRPROP_UNADJ", "CN", "STATECD", "INVYR"), Description = c("popTableIDs - Unique identifier for each plot, for joining tables (e.g. PLT_CN)", "Number of trees per acre each sample tree represents (e.g. DESIGNCD=1: TPA_UNADJ=6.018046 for trees on subplot; 74.965282 for trees on microplot)", "popTableIDs - Unique identifier for each plot, for joining tables (e.g., PLT_CN)", "Unadjusted proportion of condition on each plot. Optional if only 1 condition (record) per plot", "Status of each forested condition on plot (i.e. accessible forest, nonforest, water, etc.)", "Only if ACI=TRUE. Status of each nonforest condition plot (i.e. accessible nonforest, nonsampled nonforest)", "Only if landarea=TIMBERLAND. Measure of site productivity", "If landarea=TIMBERLAND. Reserved status", "Unadjusted proportion of subplot conditions on each plot. Optional if only 1 condition (record) per plot", "If microplot tree attributes. Unadjusted proportion of microplot conditions on each plot. Optional if only 1 condition (record) per plot", "If macroplot tree attributes. Unadjusted proportion of macroplot conditions on each plot. Optional if only 1 condition (record) per plot", "popTableIDs - Unique identifier for each plot, for joining tables (e.g. CN)", "Identifies state each plot is located in. Optional if only 1 state", "Identifies inventory year of each plot. Optional. Assumes estimation time span is less than inventory cycle"), stringsAsFactors = FALSE) kable(required.lut, format = "pandoc", # default # caption = "Title of the table", col.names = names(required.lut), row.names = FALSE, align = c("l"), # align = c("c", "c", "c", "r") # padding = 2 # inner spacing ) ``` ## References Burrill, E.A., Wilson, A.M., Turner, J.A., Pugh, S.A., Menlove, J., Christiansen, G., Conkling, B.L., Winnie, D., 2018. Forest Inventory and Analysis Database [WWW Document]. St Paul MN US Dep. Agric. For. Serv. North. Res. Stn. URL https://apps.fs.usda.gov/fia/datamart/datamart.html (accessed 3.6.21).