Climate-fire statistical models

Statistical models relating annual burn fraction to scaled seasonal VPD and/or scaled precedent precipitation were developed for each ecoregion in the model domain. Two versions of the statistical model were developed: the static model, which does not account for fire-fuel feedback, and the dynamic model, which does (Abatzoglou et al., 2021).

Static model

In the static model, a function g of the annual burn fraction is a linear combination of seasonal VPD and precedent precipitation:

Equation 1:

$$g(B(t)) = α_{s} + β_{s,v}\, v(t) + β_{s,p}\, p(t) + ε$$

where $g$ is the link function, $B$ is the annual burn fraction, $t$ is time, $α_{s}$, $β_{s,v}$ and $β_{s,p}$ are unknown parameters, $ε$ is a random error term, and $v(t)$ and $p(t)$ are the seasonal VPD and precedent precipitation, respectively.

The link function g is a transformation that links a linear combination of the explanatory variables (v and p) to the response variable (B). Here the logit function is used: it maps a fraction between zero and one onto the whole real line, so inverting it converts any value of the linear combination into a burn fraction between zero and one.
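As a minimal sketch of how a logit link turns Equation 1 into a burn fraction, the Python below defines the logit, its inverse, and a static-model prediction with the error term omitted. The function names and the coefficient values are hypothetical and chosen only for illustration; they are not fitted values from this study.

```python
import math


def logit(x: float) -> float:
    """Map a fraction in (0, 1) onto the real line."""
    return math.log(x / (1.0 - x))


def inv_logit(eta: float) -> float:
    """Map any real number back to a fraction in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))


def static_burn_fraction(v: float, p: float,
                         alpha: float, beta_v: float, beta_p: float) -> float:
    """Expected annual burn fraction from Equation 1 (error term omitted)."""
    return inv_logit(alpha + beta_v * v + beta_p * p)


# Hypothetical coefficients, for illustration only (not fitted values).
ALPHA, BETA_V, BETA_P = -6.0, 1.2, 0.4

# Scaled VPD of -1 (mild season) versus +2 (very dry season), average precipitation.
b_mild = static_burn_fraction(-1.0, 0.0, ALPHA, BETA_V, BETA_P)
b_dry = static_burn_fraction(2.0, 0.0, ALPHA, BETA_V, BETA_P)
```

Whatever values the linear combination takes, the predicted burn fraction stays strictly between zero and one, which is the reason for working on the logit scale.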

Dynamic model

The static model accounts for climate influences alone. The dynamic model also accounts for the fraction of the ecoregion, called the lost fraction, that is temporarily unavailable to burn because of wildfire activity in preceding years. This effect is called temporary fire-fuel feedback. In the dynamic model, the function g of the annual burn fraction, expressed relative to the fraction of the ecoregion still available to burn, is a linear combination of scaled seasonal VPD and scaled precedent precipitation:

Equation 2:

$$g\left(\frac{B(t)}{1 - L(t)}\right) = α_{d} + β_{d,v}\, v(t) + β_{d,p}\, p(t) + ε$$

where $α_{d}$, $β_{d,v}$ and $β_{d,p}$ are unknown parameters for the dynamic model, $L$ is the lost fraction, $ε$ is a random error term, $g$ is the logit link function, $B$ is the annual burn fraction, and $v(t)$ and $p(t)$ are the seasonal VPD and precedent precipitation, respectively.
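Rearranging Equation 2 for B(t) shows how the lost fraction scales the climate-driven response: the inverse logit gives the burn fraction of the available area, which is then multiplied by 1 - L(t). The sketch below illustrates this with the error term omitted; the function names and coefficient values are hypothetical.

```python
import math


def inv_logit(eta: float) -> float:
    """Inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-eta))


def dynamic_burn_fraction(v: float, p: float, lost: float,
                          alpha: float, beta_v: float, beta_p: float) -> float:
    """Expected burn fraction from Equation 2 (error term omitted).

    `lost` is the lost fraction L(t); it must be below 1 so that some
    area remains available to burn.
    """
    if not 0.0 <= lost < 1.0:
        raise ValueError("lost fraction must lie in [0, 1)")
    available = 1.0 - lost
    return available * inv_logit(alpha + beta_v * v + beta_p * p)


# Hypothetical coefficients, for illustration only (not fitted values).
ALPHA, BETA_V, BETA_P = -6.0, 1.2, 0.0

no_feedback = dynamic_burn_fraction(1.0, 0.0, 0.0, ALPHA, BETA_V, BETA_P)
with_feedback = dynamic_burn_fraction(1.0, 0.0, 0.2, ALPHA, BETA_V, BETA_P)
```

With a lost fraction of zero the dynamic form reduces to the static one; a lost fraction of 0.2 lowers the predicted burn fraction by 20% for the same climate.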

In the dynamic model, wildfires that occurred in the previous τ years reduce the area available to burn in the modeled year. This feedback weights recent wildfires more heavily than older ones: the most recent σ years receive a constant weight, and older years, up to τ years back, receive progressively smaller weights through a sinusoidal function:

Equation 3:

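$$L(t) = γ \left\{ \sum_{i=-σ}^{-1} B(t+i) + \sum_{i=-τ}^{-σ-1} \frac{B(t+i)}{2} \left[ 1 + \cos \frac{π (i + σ)}{τ - σ} \right] \right\}$$

where γ is the fuel-limitation strength and i indexes years relative to the modeled year t. The bracketed weight decreases smoothly from close to one for wildfires just over σ years old to zero for wildfires τ years old.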
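The sketch below computes a lost fraction from a burn history, following the verbal description of the feedback: full weight for the most recent σ years, then a half-cosine taper that reaches zero for wildfires τ years old. The function names and the constant 1% burn history are hypothetical, and the taper form is an assumption made to match that description; the default values of σ, τ and γ are those reported in this section.

```python
import math
from typing import Mapping


def feedback_weight(lag: int, sigma: int, tau: int) -> float:
    """Weight applied to a wildfire `lag` years before the modeled year."""
    if lag < 1 or lag > tau:
        return 0.0
    if lag <= sigma:
        return 1.0  # constant weight for the most recent sigma years
    # Assumed half-cosine taper: near 1 just beyond sigma, 0 at tau.
    return 0.5 * (1.0 + math.cos(math.pi * (lag - sigma) / (tau - sigma)))


def lost_fraction(burn_by_year: Mapping[int, float], year: int,
                  sigma: int = 5, tau: int = 15, gamma: float = 1.5) -> float:
    """Lost fraction L(year) from the burn fractions of the previous tau years."""
    total = 0.0
    for lag in range(1, tau + 1):
        total += feedback_weight(lag, sigma, tau) * burn_by_year[year - lag]
    return gamma * total


# Hypothetical history: 1% of the ecoregion burns every year from 1984 to 1998.
history = {y: 0.01 for y in range(1984, 1999)}
lost_1999 = lost_fraction(history, 1999)
```

For this constant history the weights sum to 9.5 (5 from the recent years plus 4.5 from the taper), so the lost fraction for 1999 is 1.5 × 0.01 × 9.5 = 0.1425.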

Abatzoglou et al. (2021) present acceptable value ranges for τ, σ and γ of 15 to 30 years, 5 years, and 0.5 to 1.5, respectively. Verisk explored the sensitivity of burned area projections to τ and γ, including a mapping of τ to mean seasonal VPD. Higher values of γ always lead to lower projections, whereas higher values of τ generally lead to lower projections. The size of the historical burn dataset is a key limitation: higher values of τ leave less data on which to build the model, potentially limiting the influence of high burn area years in the historical record for some ecoregions. Verisk scientists selected τ to align with accepted literature values while also preserving the amount of historical data used to build the model.

To create the burned area projections, τ was set to 15 years, σ to 5 years and γ to 1.5. A physical argument for a γ value higher than 1 is that areas burned in previous years act as barriers that prevent wildfires from spreading into areas with more fuel.

The dynamic model is used for 31 ecoregions¹ and the static model for the remaining 18. The dynamic-modeled ecoregions were selected by computing, for each ecoregion, the Pearson correlation coefficient between the logarithm of the annual burned area and the seasonal VPD from 1984 to 2020 as an estimate of the strength of the response, with ecoregion 3’s coefficient used as the threshold. These ecoregions tend to have lower mean seasonal VPD and higher forest fractions.
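The selection step can be sketched as follows: compute the Pearson correlation between log burned area and seasonal VPD for each ecoregion, then keep the ecoregions whose coefficient is at least that of a reference ecoregion. The function names and the example coefficients are hypothetical. The direction of the comparison (greater than or equal to the reference) and the requirement that burned area is strictly positive are assumptions of this sketch.

```python
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vpd_response(burned_area: Sequence[float], vpd: Sequence[float]) -> float:
    """Correlation between log annual burned area and seasonal VPD.

    Assumes every burned-area value is strictly positive.
    """
    return pearson([math.log(a) for a in burned_area], vpd)


def select_dynamic(responses: Dict[int, float], reference_id: int) -> List[int]:
    """Ecoregions whose response is at least the reference ecoregion's (assumed rule)."""
    threshold = responses[reference_id]
    return sorted(k for k, r in responses.items() if r >= threshold)


# Hypothetical correlation coefficients keyed by ecoregion number.
example = {3: 0.5, 7: 0.2, 21: 0.8}
chosen = select_dynamic(example, reference_id=3)
```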

With τ equal to 15 years, the years 1984 to 1998 were used to compute the lost fraction for 1999 onwards. The unknown parameters for the static and dynamic models were estimated through a regression on the 1999 to 2020 historical data. For all ecoregions except ecoregion 40, a generalized linear model (GLM) with Gamma error distribution and logit² link function was fitted using the R base library (v4.1.2; R Core Team, 2021). This GLM is suited to modeling multiplicative processes where the predictor variables vary linearly and the response variable varies over several orders of magnitude. For ecoregion 40, a Bayesian regression model with Beta error distribution and logit link function was fitted using the brms library (Bürkner, 2017). The Beta family is preferred here because it reduces the weight given to exceptionally high burned area years, which cannot be explained well by VPD or precipitation, and weights the fit towards the majority of years, resulting in a response that can be partly explained by VPD.
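To make the regression step concrete, the sketch below fits a single-predictor GLM with Gamma variance and a logit link by iteratively reweighted least squares (IRLS) in plain Python. It is an illustration under stated assumptions, not the R or brms code used for this study: the function names, the synthetic noise-free data and the coefficients are hypothetical, and the response (B for the static model, or B/(1-L) for the dynamic model) is assumed to lie strictly between zero and one.

```python
import math


def logit(m):
    return math.log(m / (1.0 - m))


def inv_logit(eta):
    return 1.0 / (1.0 + math.exp(-eta))


def fit_gamma_logit(x, y, max_iter=50, tol=1e-10):
    """IRLS fit of logit(E[y]) = a + b * x with Gamma variance V(mu) = mu**2.

    Assumes every y lies strictly between 0 and 1. Returns (a, b).
    """
    eta = [logit(yi) for yi in y]  # start from the observed fractions
    a, b = 0.0, 0.0
    for _ in range(max_iter):
        mu = [inv_logit(e) for e in eta]
        # For the logit link, d(mu)/d(eta) = mu * (1 - mu), so the IRLS
        # weight (dmu/deta)**2 / V(mu) simplifies to (1 - mu)**2.
        w = [(1.0 - m) ** 2 for m in mu]
        # Working response: eta + (y - mu) / (dmu/deta).
        z = [e + (yi - m) / (m * (1.0 - m)) for e, yi, m in zip(eta, y, mu)]
        # Weighted least squares of z on x via the 2x2 normal equations.
        sw = sum(w)
        swx = sum(wi * xi for wi, xi in zip(w, x))
        swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
        swz = sum(wi * zi for wi, zi in zip(w, z))
        swxz = sum(wi * xi * zi for wi, xi, zi in zip(w, x, z))
        det = sw * swxx - swx * swx
        a_new = (swxx * swz - swx * swxz) / det
        b_new = (sw * swxz - swx * swz) / det
        done = abs(a_new - a) + abs(b_new - b) < tol
        a, b = a_new, b_new
        eta = [a + b * xi for xi in x]
        if done:
            break
    return a, b


# Synthetic, noise-free data generated from hypothetical coefficients.
TRUE_A, TRUE_B = -5.0, 1.0
scaled_vpd = [-1.5, -0.5, 0.0, 0.7, 1.4, 2.0]
burn_fraction = [inv_logit(TRUE_A + TRUE_B * v) for v in scaled_vpd]
a_hat, b_hat = fit_gamma_logit(scaled_vpd, burn_fraction)
```

Because the synthetic data contain no noise, the fit recovers the generating coefficients; with real observations the estimates would carry the uncertainty discussed in this section.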

The choice of whether to include VPD, precipitation or both as explanatory variables was made using the static model for all 49 ecoregions and the 1984 to 2020 historical record. Ecoregions could only be candidates for including both variables if burned area was positively correlated with both VPD and precipitation, which is consistent with the expectations for direct and facilitative climate influences. The most parsimonious model for each ecoregion was identified by preferring the combination that gave the lowest Akaike Information Criterion (AIC) and the better fit. In some ecoregions, even though the inclusion of precipitation gave a slightly lower AIC, VPD alone was preferred because of the added uncertainty in future precipitation projections (e.g., ecoregion 78). In summary, 34 ecoregions were modeled using only VPD³, 6 were modeled using only precipitation⁴, and 9 were modeled using both⁵.
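The automatic part of this selection can be sketched as a lowest-AIC choice among the candidate predictor sets, with the two-variable model eligible only when both correlations are positive. The function names, dictionary keys and AIC values below are hypothetical. The sketch deliberately leaves out the judgment-based step described above, in which VPD alone was sometimes preferred despite a slightly higher AIC.

```python
from typing import Dict


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike Information Criterion: 2k - 2 ln(L)."""
    return 2.0 * n_params - 2.0 * log_likelihood


def choose_predictors(aic_by_model: Dict[str, float],
                      vpd_corr: float, precip_corr: float) -> str:
    """Return 'vpd', 'precip' or 'both', whichever eligible model has the lowest AIC.

    The 'both' model is considered only when burned area is positively
    correlated with VPD and with precipitation.
    """
    candidates = {
        "vpd": aic_by_model["vpd"],
        "precip": aic_by_model["precip"],
    }
    if vpd_corr > 0.0 and precip_corr > 0.0:
        candidates["both"] = aic_by_model["both"]
    return min(candidates, key=candidates.get)


# Hypothetical AIC values for one ecoregion.
example_aic = {"vpd": 10.0, "precip": 12.0, "both": 8.0}
eligible_choice = choose_predictors(example_aic, vpd_corr=0.5, precip_corr=0.3)
restricted_choice = choose_predictors(example_aic, vpd_corr=0.5, precip_corr=-0.1)
```

In the first call both correlations are positive, so the two-variable model is eligible and wins on AIC; in the second, the negative precipitation correlation removes it from consideration and VPD alone is chosen.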

Figure 1 shows examples of GLMs fitted to historical data for three ecoregions (21, 12 and 13). The GLM for Ecoregion 21 (Southern Rockies) shows an exponential response to VPD only. The GLM for Ecoregion 12 (Snake River Plain) was built using precipitation and shows a positive response, but with notable uncertainty. The GLM for Ecoregion 13 (Central Basin and Range) shows the partial dependence of the response variable on VPD and precipitation, illustrating a stronger response to precipitation than to VPD. The plots also illustrate the notable uncertainty in the statistical models.

Figure 2 illustrates how well the statistical models reproduce the historical record from 1999 to 2020. For the static model, out-of-sample predictions are also shown for 1984 to 1998. The models for Ecoregions 12 and 13 seem to capture the mean and peaks well, but they overestimate low burn years and thus underestimate the overall variance. Static model predictions are also included for Ecoregion 21; they show little difference between the static and dynamic model results in the present climate.

Figure 1. Generalized linear models fitted to historical VPD and precipitation observations for ecoregions 21, 12, and 13.
Solid lines are the fitted GLMs and circles are historical observations. Panel a shows ecoregion 21 scaled VPD, panel b shows ecoregion 12 scaled precipitation, and the bottom panels show ecoregion 13 scaled VPD and scaled precipitation.

Figure 2. Historical reproductions using static and dynamic models of annual burn fraction and observations (1984-2020) for three ecoregions.
Static model results are in blue, dynamic model results are in red, and observations are in black. Static and dynamic model results for ecoregion 21 are shown in panel a. Static model results for ecoregion 12 are shown in panel b. Static model results for ecoregion 13 are shown in panel c.

¹ Ecoregions used in the dynamic model: 1, 3, 4, 5, 6, 8, 9, 10, 11, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 32, 33, 34, 35, 41, 43, 77, 78, 79, 80.

² In the dynamic model, use of the logit link function assumes that B/(1-L) varies from 0 to 1.

³ Ecoregions that used only VPD: 1, 2, 3, 4, 5, 8, 9, 10, 11, 15, 16, 17, 19, 20, 21, 22, 23, 25, 28, 29, 30, 32, 33, 34, 35, 36, 37, 40, 41, 42, 77, 78, 79, 85.

⁴ Ecoregions that used only precipitation: 7, 12, 14, 26, 31, 81.

⁵ Ecoregions that used both VPD and precipitation: 6, 13, 18, 24, 27, 38, 39, 43, 80.