Machine learning to predict final fire size at the time of ignition

Shane R. Coffield; Casey A. Graff; Yang Chen; Padhraic Smyth; Efi Foufoula-Georgiou; James T. Randerson

doi:10.1071/WF19023

RESEARCH ARTICLE (Open Access)



Shane R. Coffield ^A ^D, Casey A. Graff ^B, Yang Chen ^A, Padhraic Smyth ^B, Efi Foufoula-Georgiou ^C ^A and James T. Randerson ^A


^A Department of Earth System Science, Croul Hall, University of California, Irvine, CA 92697, USA.

^B Department of Computer Science, Donald Bren Hall, University of California, Irvine, CA 92697, USA.

^C Department of Civil and Environmental Engineering, Engineering Hall 5400, University of California, Irvine, CA 92697, USA.

^D Corresponding author. Email: scoffiel@uci.edu

International Journal of Wildland Fire 28(11) 861-873 https://doi.org/10.1071/WF19023
Submitted: 16 February 2019 Accepted: 15 August 2019 Published: 17 September 2019

Journal Compilation © IAWF 2019 Open Access CC BY-NC-ND

Abstract

Fires in boreal forests of Alaska are changing, threatening human health and ecosystems. Given expected increases in fire activity with climate warming, insight into the controls on fire size from the time of ignition is necessary. Such insight may be increasingly useful for fire management, especially in cases where many ignitions occur in a short time period. Here we investigated the controls and predictability of final fire size at the time of ignition. Using decision trees, we show that ignitions can be classified as leading to small, medium or large fires with 50.4 ± 5.2% accuracy. This was accomplished using two variables: vapour pressure deficit and the fraction of spruce cover near the ignition point. The model predicted that 40% of ignitions would lead to large fires, and those ultimately accounted for 75% of the total burned area. Other machine learning classification algorithms, including random forests and multi-layer perceptrons, were tested but did not outperform the simpler decision tree model. Applying the model to areas with intensive human management resulted in overprediction of large fires, as expected. This type of simple classification system could offer insight into optimal resource allocation, helping to maintain a historical fire regime and protect Alaskan ecosystems.

Additional keywords: boreal forests, decision trees, fire management, random forests, vapour pressure deficit.

Introduction

Globally, fire prediction has received increasing attention because of the health and climate impacts of fires and the fact that fire regimes have been changing. First, in terms of public health, fire aerosols contribute to over 300 000 premature deaths each year (Johnston et al. 2012). They are also associated with increased hospitalisations due to respiratory and cardiovascular illness (Johnston et al. 2007; Delfino et al. 2009; Liu et al. 2017; Cascio 2018). Second, in terms of climate, fires are responsible for both positive and negative feedbacks with the climate system. Fires contribute significantly to the global carbon cycle, emitting 2.2 Pg of carbon annually (van der Werf et al. 2017). Deposition of black carbon aerosols increases the absorbed solar energy, melting snow and ice at high latitudes (Flanner et al. 2007; Mouteva et al. 2015; Hao et al. 2016; Sand et al. 2016). As a competing feedback, direct changes to the local landscape may increase reflected radiation, resulting in surface cooling on timescales of years to decades (Randerson et al. 2006; Rogers et al. 2013; Liu et al. 2019). Third, fire regimes have been changing around the globe because of human management and climate change. On average, global fire activity has been declining, largely driven by land use in grassland, savanna, and tropical ecosystems (Andela et al. 2017). However, areas such as the northern boreal forests and Western USA have seen increased fire activity due to climate change and human-caused ignitions, with climate change threatening to exacerbate this trend in the future (Westerling et al. 2006; Liu et al. 2012; Liu and Wimberly 2016; Veraverbeke et al. 2017).

In the Alaskan boreal forests in particular, the impact of a changing climate has been pronounced. The region has experienced warmer summers, longer growing seasons and an increase in lightning. Because Alaska’s burn area has historically been lightning-limited, the increase in lightning has resulted in recent years having some of the most frequent ignitions and most burned area on record (Kasischke and Turetsky 2006; Kasischke et al. 2010; Veraverbeke et al. 2017). Kasischke et al. (2010) reported that for the first decade of the 21st century, the boreal region of Alaska had an average annual burned area of 7670 km², the largest in a 150-year record. With an area of 516 000 km² for the boreal interior region, this corresponds to a fire return interval of ~70 years – at least 30 years less than estimates of variability for the Holocene (Lynch et al. 2002). Increasing lightning and fire trends are expected to continue with future climate warming (Flannigan et al. 2005; Krawchuk et al. 2009; Romps et al. 2014; French et al. 2015; Young et al. 2017), with one study predicting a doubling of burned area by 2050 relative to 1991–2000 (Balshi et al. 2009). Such a changing fire regime threatens both the native peoples and ecosystems that are maladapted to modern fire frequencies. The huge fires and their impacts in recent years may warrant a rethinking of fire management; lands that have previously been limited-suppression zones could now require increased suppression effort to maintain contemporary burning levels and mitigate impacts to humans and vulnerable ecosystems.

Previous work has illuminated the environmental controls on fires and fire size in boreal forests. The controls are typically a combination of topography, vegetation, meteorology and human activity (Kasischke et al. 2002; Flannigan et al. 2005; DeWilde and Chapin 2006; Parisien et al. 2011a; Parisien et al. 2014; Sedano and Randerson 2014; Rogers et al. 2015). Topography has been shown to be relevant both in terms of slope and aspect. Steep slopes can help with rapid upward spread of fires. Aspect is relevant as it relates to tree species and the thickness of the surface duff layer; black spruce, for example, is more likely to dominate north-facing slopes. This species is more flammable than other conifers and has been shown to influence fire intensity and size (Kasischke et al. 2002; Rogers et al. 2015). The structure of the vegetation as fuel can also control the spatial structure of burn probability, with large areas of contiguous conifer forest more likely to burn (Parisien et al. 2011b). In terms of meteorology, the Canadian Forest Service has developed the Canadian Forest Fire Weather Index (FWI) System to rate fire danger, using weather parameters to represent moisture content in various fuel layers. The weather parameters include 1200 hours local standard time (LST) temperature, relative humidity, 24-h precipitation and 10-m wind speed (Van Wagner 1987). Although the FWI has been used as a predictor of fire size and emissions (Di Giuseppe et al. 2018), simpler variables such as vapour pressure deficit (VPD) and temperature can explain regional variability in fire activity, including fire size (Wiggins et al. 2016). VPD appears to be important in setting both ignitions and spread in boreal forests, with VPD anomalies explaining 45% of the variance in annual burned area (Sedano and Randerson 2014). 
This is likely because of the importance of VPD in determining the moisture content in dead vegetation (fuels) on short timescales, especially in fine fuels like standing dead grass and live mosses (Miller 2019). Extreme temperature has been found to be a major control on boreal fire size at many different spatial scales, whereas relationships between burned area and other variables, including wind, fuel type, fuel moisture, topography and road density, often vary considerably with spatial and temporal scale (Parisien et al. 2011a; Parisien et al. 2014). Road density is important because it regulates access to wildlands, shaping patterns of both ignition and suppression. Fires near human-populated areas are more likely to be suppressed and less likely to become large (DeWilde and Chapin 2006). The presence of flammable fine fuels near roads may also allow lightning strikes to cause more fires in those areas (Arienti et al. 2009).

Numerous types of fire prediction models exist, including both dynamical physical-based spread models and statistical models. Two examples of dynamical spread models that are commonly used by Alaskan fire management agencies are FARSITE (Finney 1998) and the Fire Spread Probability Simulator (FSPro) (Finney et al. 2011). FSPro is a geospatial probabilistic model for predicting fire growth over many days. FARSITE is a deterministic modelling system used on shorter timescales (1–5 days) with a single weather scenario. In terms of rapid prediction of fire growth from ignition with minimal training, a few tools exist, such as REDapp from the Canadian Interagency Forest Fire Centre (http://redapp.org/, accessed 20 August 2019) and the Fire Behaviour Prediction (FBP) Calculator (Forestry Canada Fire Danger Group 1992). Even these are quite complex in comparison to the models we present, relying on information about fuel composition and mechanistic equations for fire spread.

Several studies have investigated statistical models for fire spread and size, primarily based on meteorological indices (Preisler et al. 2009; Faivre et al. 2014, 2016; Butler et al. 2017; Di Giuseppe et al. 2018). One study used machine learning techniques, including random forests, to predict burned area in Portugal with instantaneous weather conditions at ignition (Cortez and Morais 2007). The models relied on ground-station data and were most accurate for predicting the area of small fires. Less research has focussed specifically on the conditional probability of a large fire given information available at the time of ignition. One study used logistic regression with a fire potential index to predict the probability of fires exceeding a specified threshold in the contiguous USA (Preisler et al. 2009). This work examined the fraction of fires that would become large, but did not attempt to identify which specific ignition events were most likely to become large. Also, classification techniques have rarely been evaluated in the context of fire prediction. One example is a study in Brazil that used machine learning classification to predict the risk of ignitions in different areas, but similarly did not attempt to identify which ignitions were most likely to become large (de Souza et al. 2015).

In this study, we present and evaluate a new framework for fire prediction: using machine learning classification to identify specific ignitions that are most likely to become large fires. This is accomplished with two simple driver variables, extracted near the time and place of each ignition point. The final model is a decision tree that can efficiently classify ignition events. This approach may be especially promising for predicting fires and their impacts in the boreal forests of Alaska, where many ignitions occur and suppression resources are limited. In preparing for a future with more and larger fires, this type of simple prediction system may prove useful for fire and ecosystem management.

Methods

Data

We chose the state of Alaska as our study area. The interior portion of Alaska is primarily a mixture of boreal forests and taiga which experience substantial burning (Wein and Maclean 1983; Kasischke et al. 2002). For example, in the large fire year of 2015, ~20 800 km² of land burned. We chose a 17-year study period of 2001–2017, based on the availability of satellite and ground-based fire data as described below (Fig. 1). For each year, we considered the fire season of 1 May through 31 August, which contains fires accounting for 99.5% of the annual burned area according to data obtained from the Alaska Large Fire Database (ALFD, http://fire.ak.blm.gov/incinfo/aklgfire.php, accessed 5 October 2018).

**Fig. 1.** Study area of mainland Alaska, USA. In panel (a), Moderate Resolution Imaging Spectroradiometer (MODIS) active fire detections for 14 August 2005 are overlaid on a satellite optical image taken the same day (NASA EOSDIS). In panel (b), all fire perimeters from the Alaska Large Fire Database (ALFD) for 2001–2017 are overlaid on a background landscape map from QGIS Open Layers.

Fires

We obtained historical fire perimeter data from the ALFD available through the Bureau of Land Management’s Alaska Interagency Coordination Center. The ALFD fire-history data compile information from satellite and ground-based records, reporting fire points, perimeters, start dates and management options back to 1939. For our time period, this gave a set of 1771 fires. The management options are determined by the Alaska Interagency Fire Management Plan (https://agdc.usgs.gov/data/projects/fhm/index.html, accessed 5 October 2018). They include ‘limited’, ‘modified’, ‘full’ and ‘critical’, in order of increasing priority for suppression resources (Fig. 2). Fires occurring in a modified, full, or critical zone threaten high-value cultural or historical sites, high-value natural resource areas, human property, or human life. Here, we selected only fires occurring in the ‘limited’ fire-management zone, which receives very minimal suppression, for two reasons. First, this set of fires had final fire perimeters that were more likely controlled by natural landscape and climate processes, and less by human intervention, making the modelling problem more tractable. Second, there is likely more flexibility in managing fires in this zone, making it an important potential target for efforts to maintain historical fire regimes as a part of broader climate adaptation efforts. Considering fires only in this zone narrowed our dataset of fires from 1771 to 1224 fires.

**Fig. 2.** Prevalence of fires in different fire management zones. Panel (a) shows the fire management zones established by the Alaska Fire Service. Panel (b) shows the number of fires in the ALFD database that occurred in each zone during May–August of 2001–2017. In total, 1224 out of 1771 fires (69%) occurred in the limited management zone, where fires are more likely to be controlled by the natural environment and not suppression efforts. Out of the 1224 fires in the limited management zone, 1168 passed through an additional filter using satellite observations to corroborate the start date. This latter set was used in our model analysis.

We used active fire data from the Moderate Resolution Imaging Spectroradiometer (MODIS) to further filter the ALFD fire perimeter dataset. The MODIS Collection 6 Monthly Fire Location Product (MCD14ML) was obtained from the Department of Geographical Sciences at the University of Maryland (Giglio et al. 2016). Comparison of the ALFD and MODIS fire data revealed some spatial and temporal disagreement. In some cases, large fires in the ALFD had no corresponding fire detections from MODIS, and in other cases, the timing of fire events disagreed by multiple weeks. Since the start dates for some fires may be uncertain given the way multiple data sources are compiled in the ALFD, we compared start days with MODIS active fire detections to screen out potential outliers. We removed fires that were large (>4 km²) but had no associated MODIS detection within 10 km and 5 days, applying a reasonably wide temporal window for agreement as sometimes cloud or smoke cover can obscure fires for a few days. We did not filter out any fires in June 2001 when there was a gap in MODIS data. Our filtering further narrowed our dataset of fires from 1224 to 1168 fires.
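The screening rule described above can be sketched as follows. This is a minimal illustration, not the authors' code: the record layout (`lat`, `lon`, `start`, `date`, `area_km2`) and the function names are hypothetical, and it assumes distance is measured as a great-circle distance from the ALFD starting point, which the text does not specify.

```python
import math
from datetime import date

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def is_corroborated(fire, detections, max_km=10.0, max_days=5):
    """True if any active-fire detection lies within max_km and max_days of the fire start."""
    for det in detections:
        if abs((det["date"] - fire["start"]).days) > max_days:
            continue
        if haversine_km(fire["lat"], fire["lon"], det["lat"], det["lon"]) <= max_km:
            return True
    return False

def keep_fire(fire, detections, large_km2=4.0):
    """Keep small fires unconditionally; keep large fires only if corroborated."""
    if fire["start"].year == 2001 and fire["start"].month == 6:
        return True  # MODIS data gap: no screening applied
    if fire["area_km2"] <= large_km2:
        return True
    return is_corroborated(fire, detections)
```

Only large fires are screened, so a small fire with no detection is still retained, consistent with the rule stated in the text.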

Meteorology

We accessed daily meteorological data for 2-m air temperature, relative humidity, precipitation, 10-m wind speed and surface air pressure from the European Centre for Medium-Range Weather Forecasts (ECMWF) ERA5 reanalysis (Copernicus Climate Change Service 2017). The data are available at a 0.25° resolution. We used temperature and relative humidity to derive VPD. This deficit is the difference between the saturation vapour pressure and the actual vapour pressure; we calculated saturation vapour pressure using the Tetens equation (Tetens 1930). We also created a temperature anomaly variable by subtracting the mean temperature for each day over 2001–2017 from the observed temperature.
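As an illustration of the VPD derivation, the sketch below uses a commonly quoted form of the Tetens equation (saturation vapour pressure in kPa, temperature in °C). The coefficients 0.6108, 17.27 and 237.3 are an assumption here, since the text does not list the constants used, and the function names are hypothetical.

```python
import math

def saturation_vapour_pressure_kpa(temp_c):
    """Saturation vapour pressure (kPa) from air temperature (deg C), Tetens form.

    Coefficients are the commonly quoted values, assumed for illustration.
    """
    return 0.6108 * math.exp(17.27 * temp_c / (temp_c + 237.3))

def vpd_kpa(temp_c, rh_percent):
    """Vapour pressure deficit (kPa): saturation minus actual vapour pressure."""
    e_s = saturation_vapour_pressure_kpa(temp_c)
    e_a = e_s * rh_percent / 100.0  # actual vapour pressure from relative humidity
    return e_s - e_a
```

For example, `vpd_kpa(20.0, 50.0)` gives roughly 1.17 kPa, and the deficit falls to zero as relative humidity approaches 100%.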

As a preliminary validation of the ERA5 meteorology products, we plotted temperature, relative humidity, precipitation and VPD at Fairbanks through time for comparison against ground-truth weather data from the Western Regional Climate Center (https://raws.dri.edu, accessed 7 December 2018) (Fig. 3). The ERA5 global reanalysis appears to capture the local variability measured by the Fairbanks station. We also included a time series of the number of total fire detections in the interior region of Alaska (Fig. 3e). Total fire activity shows a strong correspondence to VPD in particular, despite the difference of spatial scales, given that Fairbanks is centrally located and the ERA5 data are spatially correlated across interior Alaska.

**Fig. 3.** Time series of reanalysis weather data, ground station weather data, and fire activity for an example year, 2013. Panels (a–d) show the daily weather data from the European Centre for Medium-Range Weather Forecasts (ECMWF) ERA5 reanalysis for the grid cell at Fairbanks along with *in situ* measurements from Fairbanks Airport station (from the Western Regional Climate Center). Despite the difference in spatial scale, total Moderate Resolution Imaging Spectroradiometer (MODIS) fire detections over interior Alaska (e) show a correspondence to weather, especially vapour pressure deficit (VPD).

Vegetation

We included vegetation data from the LANDFIRE Existing Vegetation Type product, which is a Landsat-based classification available at a 30-m resolution for 2001, 2008, 2010, 2012 and 2014 (Rollins 2009). We created two vegetation classes, grouping together several abundant tree species known to influence fire behaviour: one class for any black or white spruce (evergreen) forest cover, and one class for any birch or aspen (deciduous) forest cover. For each fire, we considered these vegetation data at that location using the closest previous year that the data were available. We calculated the fraction of spruce forest cover and the fraction of birch–aspen forest cover for several different radii around each ALFD fire starting point.
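The cover-fraction calculation can be sketched as below, assuming the vegetation raster has already been read into a 2-D list of class codes on a square grid. The class codes, the function name and the choice to count cells by their centre distance are hypothetical; the text does not describe the actual implementation.

```python
import math

def cover_fraction(grid, row, col, cell_size_m, radius_m, target_classes):
    """Fraction of in-bounds cells within radius_m of (row, col) whose class is in target_classes.

    A cell counts as inside the circle when its centre lies within radius_m
    of the centre of the starting cell.
    """
    reach = int(radius_m // cell_size_m)  # maximum offset in cells to examine
    inside = hits = 0
    for r in range(max(0, row - reach), min(len(grid), row + reach + 1)):
        for c in range(max(0, col - reach), min(len(grid[0]), col + reach + 1)):
            dist_m = math.hypot(r - row, c - col) * cell_size_m
            if dist_m <= radius_m:
                inside += 1
                if grid[r][c] in target_classes:
                    hits += 1
    return hits / inside if inside else 0.0
```

Calling the function with `cell_size_m=30` and several values of `radius_m` would give the fractions for different radii around one starting point.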

Topography

Lastly, we included topographical data from the US Geological Survey’s GTOPO30 global digital elevation model (DEM), available at a 30-arc second (~1-km) resolution (Gesch et al. 1999). Similar to the vegetation data, for each fire, we considered slope, elevation and aspect averaged for several different radii around each ALFD starting point.

Model development and selection

We first developed and tested decision tree classifiers predicting final size class using data at the time and place of ignition. In contrast to many machine learning models, such as random forests or neural networks, decision trees are readily interpretable. Their interpretability and simplicity make them more transparent for applications in decision-support systems. They also allow us to draw more scientific insight into which variables, and in which combinations, are major controllers of final fire size.

We divided the population of 1168 fires from the ALFD into terciles and labelled them based on final burned area: ‘small’ corresponds to fires that burned less than 1.2 km², ‘medium’ to fires between 1.2 and 19.8 km², and ‘large’ to fires greater than 19.8 km². It should also be noted that we briefly investigated using four or five fire size groups instead of three groups. We present only the three-size-group approach, given our fairly limited sample size with 10-fold cross-validation. Choosing three groups also makes the classification accuracy higher, which may be more useful for communicating with managers or the public.
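A minimal sketch of the tercile labelling follows. The default thresholds of 1.2 and 19.8 km² are the values reported above; the function names, the use of `statistics.quantiles` and the handling of values exactly on a boundary (assigned to 'medium') are assumptions for illustration.

```python
from statistics import quantiles

def tercile_thresholds(areas_km2):
    """Return the two cut points that split the burned areas into three equal-sized groups."""
    lower, upper = quantiles(areas_km2, n=3)
    return lower, upper

def size_class(area_km2, lower=1.2, upper=19.8):
    """Label a fire by final burned area; defaults are the thresholds reported in the text."""
    if area_km2 < lower:
        return "small"
    if area_km2 > upper:
        return "large"
    return "medium"
```

Because the thresholds are terciles of the sample itself, a classifier that guesses at random is expected to be correct about one-third of the time, which is the baseline the reported accuracies are compared against.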

In all cases, we used 10-fold cross-validation to develop and validate trees using the scikit-learn package in Python (Pedregosa et al. 2011). The scikit-learn decision tree classifier uses an optimised version of the Classification and Regression Trees (CART) algorithm, which relies on a standard Gini function to optimise leaf-node purity on the training set, and does not support pruning. More details on the algorithm are provided at https://scikit-learn.org/stable/modules/tree.html (accessed 20 August 2019). In cross-validation, we selected models based on highest average accuracy on the test sets. The accuracy is defined as the number of correct classifications relative to the total number of classifications.

Because scikit-learn CART does not support pruning, for our analysis, we needed to specify the maximum size of the tree. In total, there were three dimensions to analyse in finding the optimal model: the tree shape, the timespan around ignitions to average weather data, and which variables to include.

As a starting point, we built decision tree classifiers based only on VPD averaged over a 5-day period from the day of ignition (t = 0) to 5 days in the future (t = 5). This window represents the idealised data that would be available in a standard weather forecast. We adjusted the size of the trees, allowing for up to 20 leaf nodes, and chose the tree shape with the highest accuracy in validation.
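The tree-size search could look like the sketch below, which uses scikit-learn's `DecisionTreeClassifier` with the `max_leaf_nodes` cap and the Gini criterion. The helper name `sweep_leaf_nodes`, the shuffled `KFold` split and the fixed random seed are assumptions; the text does not say how folds were drawn.

```python
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

def sweep_leaf_nodes(X, y, leaf_options=range(2, 21), n_folds=10, seed=0):
    """Cross-validate trees of increasing size; return the best size and all scores.

    results maps each leaf-node cap to (mean accuracy, standard deviation) across folds.
    """
    cv = KFold(n_splits=n_folds, shuffle=True, random_state=seed)
    results = {}
    for n_leaves in leaf_options:
        tree = DecisionTreeClassifier(
            criterion="gini", max_leaf_nodes=n_leaves, random_state=seed
        )
        scores = cross_val_score(tree, X, y, cv=cv, scoring="accuracy")
        results[n_leaves] = (float(scores.mean()), float(scores.std()))
    best = max(results, key=lambda k: results[k][0])
    return best, results
```

Here `X` would be an array with one row per fire (for the baseline, a single column of window-averaged VPD) and `y` the size-class labels. The mean and standard deviation across folds correspond to the "mean ± s.d." form in which accuracies are reported.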

Next, we found the optimal timespan (around ignitions) over which to average weather data. We held the tree shape constant and varied the timespans of weather data, starting 10 days before ignition and ending 7 days after. Once the optimal timespan was selected, we analysed the information content in different input variables. We allowed the tree shape to change, and we report the highest accuracy of validation achieved (with error bars) using different combinations of weather variables.
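The timespan search amounts to averaging each fire's daily weather series over every candidate window and re-running the cross-validation. A sketch of the windowing step is given below; the function names are hypothetical, and including single-day windows among the candidates is an assumption.

```python
from statistics import mean

def window_mean(daily_values, ignition_index, start_offset, end_offset):
    """Mean of daily_values from ignition+start_offset to ignition+end_offset, inclusive."""
    lo = ignition_index + start_offset
    hi = ignition_index + end_offset
    if lo < 0 or hi >= len(daily_values) or lo > hi:
        raise ValueError("window falls outside the available series")
    return mean(daily_values[lo:hi + 1])

def candidate_windows(earliest=-10, latest=7):
    """All (start, end) day-offset pairs with earliest <= start <= end <= latest."""
    return [(s, e) for s in range(earliest, latest + 1) for e in range(s, latest + 1)]
```

Each `(start, end)` pair yields one input feature per fire, and the pair with the highest mean validation accuracy would be retained.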

In addition to the weather variables, we explored vegetation, topography and day-of-year (DOY) as model inputs. For the vegetation, we considered a spruce fraction and a birch–aspen fraction, averaged for a 4-km radius around each ignition point. We chose a 4-km radius because 4 km gave the largest correlation in a preliminary linear regression analysis between vegetation and burned area.

We tested four other machine learning classification algorithms in comparison to decision trees, all available through the same scikit-learn package in Python: random forests, k-nearest neighbours, gradient boosting and a multi-layer perceptron (MLP). For each, we manually searched over a range of relevant parameters and report model accuracy for the optimal parameter values.

Model analysis

We chose a ‘best model’ based on highest validation accuracy and computed other statistics, including recall and precision, for large fires in particular. We developed and present a metric for the improvement in ‘weighted error’ over a null (random) classification model. This metric captures more information about misclassification. We defined accurate classification as error = 0, misclassification by 1 size class as error = 1, and misclassification by 2 size classes as error = 2. A random classification would have an average weighted error of (1/3) (0) + (1/3) (1) + (1/3) (2) = 1.
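The metrics in this paragraph can be written compactly as below, assuming size classes are coded 0 (small), 1 (medium) and 2 (large). The function names are hypothetical, and the null error of 1 is taken directly from the definition above.

```python
def weighted_error(observed, predicted):
    """Mean number of size classes by which predictions miss (0, 1 or 2 per fire)."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)

def improvement_over_null(error, null_error=1.0):
    """Fractional reduction in weighted error relative to the null model's error."""
    return (null_error - error) / null_error

def precision_recall(observed, predicted, positive=2):
    """Precision and recall for one class (by default the 'large' class, coded 2)."""
    tp = sum(1 for o, p in zip(observed, predicted) if o == positive and p == positive)
    fp = sum(1 for o, p in zip(observed, predicted) if o != positive and p == positive)
    fn = sum(1 for o, p in zip(observed, predicted) if o == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall
```

Unlike accuracy, the weighted error penalises calling a large fire small twice as heavily as calling it medium.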

As another method of assessing model performance, we considered the cumulative burned area fraction accounted for when fires are ranked according to model prediction. Each fire in each test set was assigned a predicted probability of being in each size class. This allowed us to rank the fires in each test group by their predicted probability of being large. We show the mean and range of cumulative burned area fraction, derived from the 10 folds of data used in the cross-validation. We compare this modelled ranking to 10 simulated random rankings as well as the observed ranking based on observed fire size.
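The ranking assessment can be sketched as follows for a single test fold. The function names are hypothetical; `scores` stands for the predicted probability of each fire being large. Because a decision tree assigns the same probability to every fire in a leaf, ties are common, and this sketch simply keeps tied fires in their input order.

```python
def cumulative_area_fraction(areas, scores):
    """Cumulative fraction of total burned area when fires are ranked by descending score."""
    order = sorted(range(len(areas)), key=lambda i: scores[i], reverse=True)
    total = sum(areas)
    running = 0.0
    curve = []
    for i in order:
        running += areas[i]
        curve.append(running / total)
    return curve

def fraction_of_fires_to_reach(curve, target=0.5):
    """Smallest fraction of ranked fires whose cumulative area fraction reaches target."""
    for rank, value in enumerate(curve, start=1):
        if value >= target:
            return rank / len(curve)
    return 1.0
```

Passing the observed areas themselves as `scores` gives the best achievable ranking, and passing shuffled scores gives a random baseline, matching the three rankings compared in the text.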

To assess whether the model could capture interannual variability in fire dynamics, we tested whether the best model was able to reproduce year-to-year differences in the fraction of large fires. In this case, we redeveloped models using each year as a hold-one-out fold for cross-validation (instead of 10 equal-sized groups) and calculated the correlation between the observed and predicted fraction of large fires each year.
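One way to set up the year-by-year test is with scikit-learn's `LeaveOneGroupOut`, using the ignition year as the group label. The sketch below is illustrative only: the helper name, the default `max_leaf_nodes` value, the coding of 'large' as 2 and the use of a Pearson correlation are assumptions.

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.tree import DecisionTreeClassifier

def yearly_large_fraction(X, y, years, large_label=2, max_leaf_nodes=4, seed=0):
    """Hold out each year in turn; compare observed and predicted fractions of large fires.

    Returns two dicts keyed by year and the correlation between them across years.
    """
    observed, predicted = {}, {}
    for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups=years):
        tree = DecisionTreeClassifier(max_leaf_nodes=max_leaf_nodes, random_state=seed)
        tree.fit(X[train_idx], y[train_idx])
        year = int(years[test_idx][0])  # every fire in this fold shares one year
        observed[year] = float(np.mean(y[test_idx] == large_label))
        predicted[year] = float(np.mean(tree.predict(X[test_idx]) == large_label))
    ordered = sorted(observed)
    r = np.corrcoef([observed[k] for k in ordered], [predicted[k] for k in ordered])[0, 1]
    return observed, predicted, float(r)
```

With 17 years of data this produces 17 folds of unequal size, so each year's prediction comes from a model that never saw that year's fires.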

We also quantified the information content in the spatial v. temporal variability of the weather data. In one scenario, we used the climatological mean weather data for every grid cell as the input, regardless of when each ignition occurred. In a second scenario, we used the spatially averaged weather data for each day as the input, regardless of where on the landscape each ignition occurred. We report and compare the classification accuracies of these scenarios.
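The two scenarios can be illustrated with a toy weather array, assuming `weather[d][c]` holds the value for day `d` and grid cell `c`. The layout and function names are hypothetical.

```python
from statistics import mean

def climatological_input(weather, cell):
    """Scenario 1: time-mean value for one grid cell (retains spatial variability only)."""
    return mean(day_values[cell] for day_values in weather)

def spatial_mean_input(weather, day):
    """Scenario 2: domain-mean value on one day (retains temporal variability only)."""
    return mean(weather[day])
```

In the first scenario two ignitions in the same cell always receive the same input regardless of date; in the second, two ignitions on the same day receive the same input regardless of location.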

To explore the footprint of human fire management, we applied our best model, developed on fires in the ‘limited’ management zone, to fires occurring in other management zones where fires are more actively suppressed. By comparing fire sizes and quantifying the model’s overprediction of large fires in the other zones, we inferred how burned area was being modified by current fire management practices.

Results

For our first set of models, we considered VPD averaged for each fire from the date of ignition through 5 days in the future. Allowing for trees with up to 20 leaf nodes, our ‘baseline’ best classification accuracy was 46.1 ± 6.7% using trees with 3 leaf nodes. This represents the mean and standard deviation of accuracy across the 10 folds.

Next, specifying three-node trees, we averaged VPD data over different timespans. We found the optimal time window to be 1–5 days after the ignition, with an average accuracy of 49.2 ± 4.7% (Fig. 4). Going forward, we considered weather data over only this timespan for each fire.

**Fig. 4.** Classification accuracy with varying time window of weather data. Each cell shows the mean validation accuracy across the 10-fold cross-validation, using weather data averaged over different timespans. The timespans start up to 10 days before ignition (−10) and extend up through 7 days after ignition (+7). In all cases, classification models used only vapour pressure deficit (VPD) with 3 leaf nodes. From this analysis, the optimal time window for classification is from 1 to 5 days after ignition.

Our analysis of weather variables is presented in Table 1. We found that VPD was the best predictor of final fire size at the time of ignition. Models including other weather variables did not outperform the VPD-only model. In addition to accuracy, we report P-values in Table 1, each representing a t-test comparing models with different variables against a random classification. All models except three (wind, surface pressure and temperature anomaly) significantly outperformed a random classification at a P = 0.05 level. It should also be noted that no models with combinations of variables significantly outperformed the models with only VPD or only relative humidity (RH).

**Table 1. Information in different weather variables**
Decision trees are developed and validated including different combinations of variables. The mean and standard deviation of validation accuracy across the 10 folds are reported. Asterisks (*) indicate significantly higher accuracy than a random classification, and **bold** type indicates the highest-accuracy model. Tree shapes vary with up to 5 leaf nodes. RH, relative humidity; T, 2-m air surface temperature; Pr, total daily precipitation; VPD, vapour pressure deficit; W, wind speed; SP, surface pressure; T_anom, temperature anomaly from climatology

Our analysis of other variables (day-of-year, vegetation and topography) is presented in Table 2. We tested all possible combinations of variables and report a select summary. Among the other variables, only two were statistically significant: day-of-year and spruce fraction. For the day-of-year variable, fires ignited in late June and early July were most likely to become large. However, including day-of-year did not improve the VPD model. For the spruce-fraction variable, fires with a low fraction of spruce forest around the ignition point were less likely to develop into the largest size class. This agrees with previous research highlighting the importance of black spruce trees in regulating fire intensity and severity in North America (Rogers et al. 2015). Including spruce fraction did improve the VPD model, although not significantly, with an accuracy of 50.4 ± 5.2%. For the remainder of this paper, we refer to this VPD plus spruce fraction model as our ‘best model’.

**Table 2. Information in other variables (vegetation type, day of year, and topography)**
We tested all possible combinations of all variables and present a selected summary below. Asterisks (*) indicate significantly higher accuracy than a random classification, and **bold** type indicates the highest-accuracy model. ‘Spruce fraction’ is the proportion of black or white spruce cover in a 4-km radius around ignition; ‘Birch–aspen fraction’ is the proportion of birch or aspen cover in a 4-km radius around ignition. VPD, vapour pressure deficit

None of the more complex machine learning classifiers outperformed the simpler decision tree model (Table 3). For each classifier, we present the highest validation accuracy achieved, along with descriptions of the optimal parameters. Any parameters not specified were left at their default values.

**Table 3. Comparison of machine learning classification methods**
In addition to decision trees, we tested several machine learning classification methods. In each case, we manually searched over different combinations of relevant parameters and report the most accurate parameter settings for each model below. Performance of decision trees was effectively indistinguishable from that of random forests, and so we focus on the simpler model in this paper. VPD, vapour pressure deficit

For our best decision tree model, we present a representative tree (Fig. 5) and summary statistics (Table 4). In the tree, ignitions occurring during a period of low VPD were classified as small fires, and ignitions occurring during a period of moderate VPD were classified as medium fires. For ignitions occurring during a period of high VPD, most were classified as large fires. A subset of the high-VPD ignitions had a very low spruce fraction and were classified as small fires. Fig. 6 is a visualisation of the variation across the 10 folds. Our best model yields a weighted error of 0.637 ± 0.059, or an improvement (reduction) of 36.3 ± 5.9% over a random classification.

**Fig. 5.** Example of a classification tree using vapour pressure deficit (VPD) averaged for 5 days after ignitions and the fraction of spruce cover in a 4-km radius. This representative tree results from training on the entire dataset of 1168 fires. Thresholds for the splits are selected by the training algorithm, optimising leaf node purity. Colour coding with yellow, orange and red respectively indicates classification as small, medium or large fires. The numbers in brackets indicate the observed number of fires falling in each size class.

**Table 4. Statistics for best model**
Models used vapour pressure deficit (VPD) and the fraction of spruce cover, with VPD averaged for the time interval of 1–5 days after the ignition event and spruce fraction averaged for a 4-km radius. We present the mean statistics across the 10-fold cross-validation. Recall is defined as the number of true positives divided by the sum of true positives and false negatives TP ÷ (TP + FN). It represents the proportion of observed large fires that were accurately identified by the model. Precision is defined as the number of true positives divided by the sum of true and false positives TP ÷ (TP + FP). It represents the proportion of fires the model predicted would be large that were observed large

**Fig. 6.** Performance of decision trees. Fires are separated into 10 columns representing the 10-fold cross-validation, with 116 samples in each fold. Fires are sorted vertically by the observed size from largest at top to smallest at bottom, and coloured based on model classification (red for large, orange for medium, and yellow for small).

The model was most accurate for the large fire class, with a recall of 65.2 ± 8.4% and a precision of 52.5 ± 11.8%. The model predicted that 40% of ignitions would become large fires. In reality, those 40% of ignitions became fires that accounted for 75% of the total burned area. In Fig. 7, we rank fires based on their model-predicted probability of being large. This shows, for example, that half of the total burned area could be accounted for by the top 29% of fires identified by the model.
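As a minimal sketch of how recall and precision for the large fire class follow from the definitions given with Table 4, the function below counts true positives, false negatives and false positives from paired lists of observed and predicted classes. The function name and the toy labels are illustrative assumptions, not the code or data used in the study.

```python
def recall_precision(observed, predicted, positive="large"):
    """Return (recall, precision) for one class.

    recall    = TP / (TP + FN)
    precision = TP / (TP + FP)
    """
    pairs = list(zip(observed, predicted))
    tp = sum(o == positive and p == positive for o, p in pairs)
    fn = sum(o == positive and p != positive for o, p in pairs)
    fp = sum(o != positive and p == positive for o, p in pairs)
    recall = tp / (tp + fn) if (tp + fn) else float("nan")
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    return recall, precision


# Toy labels for illustration only.
observed = ["large", "large", "small", "medium", "large"]
predicted = ["large", "small", "large", "medium", "large"]
recall, precision = recall_precision(observed, predicted)
```

In this toy case two of the three observed large fires are identified, and two of the three fires predicted to be large are observed to be large, so both statistics equal 2/3.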

**Fig. 7.** Cumulative burned area comparing observed, modelled, and random rankings of fires. Each line is a ranking of 116 fires on the x-axis. The errors about each line represent the variation from the 10-fold cross-validation. The ‘modelled’ line uses the vapour pressure deficit and spruce fraction model, ranking based on the predicted probability of each fire being large, as determined by the decision tree for that fold. The ‘random’ ranking is a numerical simulation in which all the fires are shuffled. In all three cases, the y-axis is the cumulative area that would be accounted for by each ranking system.
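The ranking behind Fig. 7 can be sketched as follows: fires are sorted by a score (here, a predicted probability of being large), and burned area is accumulated down the ranking. The function names and the toy numbers are illustrative assumptions; the sketch is not the authors' implementation.

```python
def cumulative_area_fraction(areas, scores):
    """Cumulative fraction of total burned area, with fires ranked by descending score."""
    total = sum(areas)
    if total <= 0:
        raise ValueError("total burned area must be positive")
    order = sorted(range(len(areas)), key=lambda i: scores[i], reverse=True)
    running = 0.0
    curve = []
    for i in order:
        running += areas[i]
        curve.append(running / total)
    return curve


def fraction_of_fires_needed(areas, scores, target=0.5):
    """Smallest fraction of top-ranked fires whose area reaches `target` of the total."""
    curve = cumulative_area_fraction(areas, scores)
    for k, frac in enumerate(curve, start=1):
        if frac >= target:
            return k / len(curve)
    return 1.0


# Toy data for illustration only: burned areas (km²) and predicted probabilities.
areas = [100.0, 1.0, 50.0, 5.0]
prob_large = [0.9, 0.1, 0.6, 0.3]
modelled_curve = cumulative_area_fraction(areas, prob_large)
observed_curve = cumulative_area_fraction(areas, areas)  # ranking by true size
```

Passing the areas themselves as scores gives the 'observed' upper bound, and shuffled scores would give the 'random' baseline described in the caption.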

Fig. 8 shows two more model assessments, investigating the role of (a) the number of fires in the dataset and (b) the number of leaf nodes in the decision trees. The number of fires in the dataset did not appear to be limiting model performance, as maximum accuracy approached 50% for as few as 200 fires. Also, overfitting did not appear to be limiting model performance, given that we selected our model based on optimal accuracy in the test group. A perfectly fit tree for the training dataset required 480 leaf nodes, but best performance for the test group was achieved with 11 or fewer nodes.

**Fig. 8.** Learning curve and overfitting analysis. For (a), we randomly selected subsets of our fire dataset for model development and validation, and these subsets are ordered by size on the x-axis. The y-axis reflects the mean and standard deviation of model accuracy across 100 simulations. In each simulation, models were developed and validated using vapour pressure deficit and spruce fraction as inputs. The upper limit of accuracy with these parameters appears to be ~50%. The shape of the curve indicates that the accuracy of our model is not strongly constrained by data availability. For (b), we allowed the number of leaf nodes to increase until each was pure. We chose the 4-leaf-node model as our ‘best model’.

On interannual timescales, the VPD plus spruce fraction decision tree model was able to capture year-to-year variations in the fraction of large fires (Fig. 9). The model correctly predicted that the fraction of large fires increases during large fire years (Fig. 9c), as indicated by a significant correlation between predictions and observations during 2001–2017 (r² = 0.50, P = 0.001).

**Fig. 9.** Model performance by year. We reran our best model using each year as a hold-one-out fold for cross-validation (instead of 10 equal-sized groups). Panel (a) shows model accuracy when tested on each year. Panel (b) shows the predicted (left) v. observed (right) fires falling into each size class each year (yellow for small, orange for medium, and red for large). Panel (c) shows the predicted (green) v. observed (black) fraction of large fires each year. The model generally captures the interannual variability of fires, predicting a larger proportion of large fires in 2004, 2005 and 2009, but underpredicting large fires in 2015.
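The hold-one-year-out scheme used for Fig. 9 can be sketched as a generator that treats each ignition year as its own test fold. The function name and the toy year list are assumptions made for illustration.

```python
from collections import defaultdict


def leave_one_year_out(years):
    """Yield (held_out_year, train_indices, test_indices) for each distinct year."""
    by_year = defaultdict(list)
    for idx, year in enumerate(years):
        by_year[year].append(idx)
    for held_out in sorted(by_year):
        test_idx = by_year[held_out]
        train_idx = [i for year, idxs in by_year.items()
                     if year != held_out for i in idxs]
        yield held_out, train_idx, test_idx


# Toy ignition years for illustration only.
ignition_years = [2001, 2002, 2001, 2003]
splits = list(leave_one_year_out(ignition_years))
```

Unlike 10 equal-sized folds, these folds differ in size from year to year, so per-year accuracy in a scheme like this rests on different numbers of fires.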

We quantified the information in the spatial v. temporal variability of weather with the best model (Table 5). We found that these two components were comparable, with the ‘space-only’ model achieving an accuracy of 40% and the ‘time-only’ model achieving an accuracy of 41%. However, the two models varied significantly in which fire size classes were accurately captured; the ‘space-only’ model had higher recall for large fires, while the ‘time-only’ model had higher recall for small fires.

**Table 5. Information in spatial v. temporal variability of weather**

To quantify human impacts on Alaska’s fire regime, we considered fires in the other management zones that have a higher suppression priority. Specifically, we considered the combination of fires in the ‘modified’, ‘full’ and ‘critical’ management options. More fires in the high suppression zone were small (43%), and fewer became large (25%) (Fig. 10). Although there were 8% more ignitions per unit area in the high suppression zones, there was also 28% less annual burned area per unit area (Table 6). The increased fire frequency was likely explained by the higher density of roads, which allowed more ignitions by both humans and lightning, according to previous research (DeWilde and Chapin 2006; Arienti et al. 2009). Using Table 6, we estimated that the total human footprint on the fire regime in interior Alaska was to increase the frequency of fires by 3.4% but to decrease annual burned area by 7.5% during 2001–2017. The higher frequency of fires was more than offset by the increased suppression effort.

**Fig. 10.** Fire sizes by management zone. The terciles of fires in the ‘limited’ management zone were used to define small (<1.2 km²), medium (1.2–19.8 km²) and large (>19.8 km²). Fires in other management zones are less likely to become large, indicating the impact of suppression effort and human fragmentation of the landscape.

**Table 6. Summary of burned area and fire density across more managed zones**
Fires in the critical, full or modified management options of interior Alaska are more frequent but burn less area annually, per unit area. If the entire interior region followed the fire density and burn area density of the limited management zone, we estimate there would be (1.19 × 10⁻⁴ fires year⁻¹ km⁻²) (633 581 km²) = 75.4 fires annually and (9.61 × 10⁻³ km² year⁻¹ km⁻²) (633 581 km²) = 6089 km² burned area annually. By comparing against the observed values of 78.0 fires year⁻¹ and 5631 km² year⁻¹, we infer that the human footprint is to increase the total number of fires only slightly, by 3.4%, but to decrease the total annual burned area by 7.5%
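The arithmetic in the Table 6 caption can be reproduced with a few lines. All numbers below are taken from the caption; the variable and function names are illustrative.

```python
INTERIOR_AREA_KM2 = 633_581              # interior Alaska study region
LIMITED_FIRE_DENSITY = 1.19e-4           # fires per year per km² (limited zone)
LIMITED_BURN_DENSITY = 9.61e-3           # km² burned per year per km² (limited zone)
OBSERVED_FIRES_PER_YEAR = 78.0
OBSERVED_BURNED_KM2_PER_YEAR = 5631.0


def percent_change(observed, expected):
    """Percentage difference of observed relative to expected."""
    return 100.0 * (observed - expected) / expected


# Expected values if the whole region behaved like the limited management zone.
expected_fires = LIMITED_FIRE_DENSITY * INTERIOR_AREA_KM2
expected_burned = LIMITED_BURN_DENSITY * INTERIOR_AREA_KM2

fire_change = percent_change(OBSERVED_FIRES_PER_YEAR, expected_fires)
burn_change = percent_change(OBSERVED_BURNED_KM2_PER_YEAR, expected_burned)
```

Without intermediate rounding, the change in fire frequency evaluates to about +3.45%, which is consistent with the reported 3.4% when the expected count is first rounded to 75.4 fires per year. The change in burned area is about −7.5%.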

When applied to the other management zones (critical, full and modified), our model (using VPD and spruce fraction) overpredicted large fires. Accuracy decreased from 50.4 to 43.0%. Precision for large fires decreased from 52.5 to 34.0%; however, recall for large fires stayed approximately the same, decreasing only slightly from 65.2 to 64.3% (Table 7). This drop in precision but not in recall aligned with intuition and supported the robustness of our model; the model did not predict large fires as precisely in these zones, as many of the fires that would have naturally become large were actively suppressed. However, the model still identified with the same success rate the fires that did become large, based on VPD and spruce fraction.

**Table 7. Statistics for best model applied to other management zones**
Models used vapour pressure deficit (VPD) and spruce fraction, with VPD averaged for the time interval of 1–5 days after the ignition event and spruce fraction averaged for a 4-km radius. This sample of 507 fires included management zones ‘critical’, ‘full’ and ‘modified’

Moreover, we found that the overprediction of large fires in the more managed zones was disproportionate; for this set of ignitions, the model predicted 48.2% would become large (Table 7) rather than 40.2% (Table 4). Ignitions in the more managed zones were more often human-caused and occurred during periods of higher VPD, on average, than did those in the limited management zone (0.70 v. 0.66 kPa respectively). Using the mean fire size for each size class from the limited management zone, we found that our model predicted an average fire size of 1.8 times that which was observed for fires in the more managed zones. Because the observed average fire size was therefore about 1/1.8 ≈ 56% of the predicted value, this suggests that suppression efforts decreased burned area in more managed zones by ~44%.

Discussion

We present and evaluate a novel approach for fire prediction: decision tree classification with weather and vegetation cover data to predict final fire size at the time of ignition. We found that VPD alone, over the period of a standard weather forecast, could be used to classify ignitions into three groups with ~49% accuracy. VPD combined with one vegetation parameter, spruce fraction, improved accuracy to just over 50%. Further research could scale up the complexity of the vegetation and topography variables to better capture the fuel structure and barriers to fire spread in the area around ignition.

Our findings suggest that weather, specifically VPD, early in a fire’s life can determine if a fire will be extinguished early or will be able to grow large. Further investigation is needed to compare the duration of fires in the small, medium and large classes in relation to the 5-day window used here. It may be that very dry conditions in the first few days allow the fires to grow large enough to persist through wet intervals, so that they can grow again during hot and dry intervals, as suggested by Sedano and Randerson (2014).

Our results are particularly promising for early identification of large fires. Accuracy was highest for the large fire class, with a recall of 65% and precision of 53%. The framework presented in Fig. 7 allows for a cost–benefit analysis of fire suppression. In theory, if it were possible to suppress fires at the instant of ignition, it may be possible to save 50% of the burned area by targeting only the top 29% of ignitions identified by our model. This type of information could offer substantial benefits for human health and preservation of vulnerable ecosystems as further climate warming increases burned area (Westerling et al. 2006; Liu et al. 2012; Liu and Wimberly 2016; Veraverbeke et al. 2017).

It is likely that weather forecasts would be a key limiting factor for model accuracy, as forecasts tend to degrade rapidly after a few days into the future. We did not investigate the degradation of model accuracy when using archived weather forecasts in place of reanalysis, primarily due to the cost of these ECMWF datasets. We speculate that the primary factor limiting accuracy to 50% is the incomplete characterisation of biology, fuels and barriers with our vegetation cover variables, which do not mechanistically account for fire spread. Information was also lost in our temporal averaging of weather and the inability of coarser-scale reanalysis products to capture very localised variations in precipitation. The number of fires in the dataset did not appear to be limiting the accuracy, based on a learning curve analysis (Fig. 8a).

With our approach focusing on information available at the time of ignition, we found that decision trees, a simple and readily interpretable method, performed similarly to other machine learning classifiers (namely, random forests, k-nearest neighbours, gradient boosting and multi-layer perceptrons). Incorrect application of any of these methods may yield overfitting, and so we provided an analysis of the training v. testing accuracy for our selected decision tree model (Fig. 8b). Although perfect training accuracy requires nearly 500 leaf nodes for a dataset of 1168 fires, testing accuracy is optimised for 11 or fewer leaf nodes. We did not include an analysis of more complex or deep learning methods (e.g. recurrent neural networks), given our fairly small dataset and lack of indication that more complex models would outperform simpler models. However, future research in fire-size prediction should investigate more methodologies, especially at larger scales with more data and more complex input variables.

In our comparison of fire sizes and model results for different management zones, we also inferred the footprint of human suppression effort on burned area. As expected, our model overpredicts large fires in zones that are more actively managed. However, the model still had similar recall for the fires that did become large. Our model also allowed us to estimate the impacts of fire suppression, taking into account that human ignitions in these areas tended to occur during periods with hotter and drier weather.

Our models differed in structure and purpose from other fire size prediction methods and were not intended to compete with more complex models used for fire management. Rather, we view our analysis as a useful framework for investigating the major controls on fires using information available at the time of ignition. The insight gained may be useful in other regions beyond boreal forests of Alaska, where the early information could help inform management strategies in vulnerable ecosystems responding to strong trends in climate.

Conflicts of interest

The authors declare that they have no conflicts of interest.

Acknowledgements

This work is based upon support received from the National Science Foundation (NSF) Graduate Research Fellowship Program under grant number DGE-1839285 (for S. R. Coffield), by NSF under grant number 1633631 (for C. A. Graff, J. T. Randerson, P. Smyth) as part of the University of California, Irvine (UCI) NSF Research Traineeship (NRT) Machine Learning and Physical Sciences (MAPS) Program, by NASA under award NNX15AQ06A as part of the California State University-Los Angeles (CSULA)/UCI Data Intensive Research and Education Center (DIRECT)-STEM project (for P. Smyth), by NSF under award CNS-1730158 (for P. Smyth), by the Department of Energy Office of Science’s Reducing Uncertainty in Biogeochemical Interactions through Synthesis and Computation (RUBISCO) Science Focus Area and NASA’s Soil Moisture Active Passive (SMAP), Interdisciplinary Research in Earth Science (IDS) and Carbon Monitoring System (CMS) programs (for J. T. Randerson, Y. Chen), by the NSF under grant number DMS-1839336 (for E. Foufoula-Georgiou, J. T. Randerson, P. Smyth) as part of the Transdisciplinary Research in Principles of Data Science (TRIPODS) program, and by NASA under grant number NNX16AO56G (for E. Foufoula-Georgiou) as part of the Global Precipitation Measurement (GPM) program.

References

Andela N, Morton DC, Giglio L, Chen Y, van der Werf GR, Kasibhatla PS, DeFries RS, Collatz GJ, Hantson S, Kloster S, Bachelet D, Forrest M, Lasslop G, Li F, Mangeon S, Melton JR, Yue C, Randerson JT (2017) A human-driven decline in global burned area. Science 356, 1356–1362.

Arienti MC, Cumming SG, Krawchuk MA, Boutin S (2009) Road network density correlated with increased lightning fire incidence in the Canadian western boreal forest. International Journal of Wildland Fire 18, 970–982.

Balshi MS, McGuire AD, Duffy P, Flannigan MD, Walsh J, Melillo J (2009) Assessing the response of area burned to changing climate in western boreal North America using a Multivariate Adaptive Regression Splines (MARS) approach. Global Change Biology 15, 578–600.

Butler Z, Chen Y, Randerson JT, Smyth P (2017) Fire event prediction for improved regional smoke forecasting. In ‘Proceedings of the 7th International Workshop on Climate Informatics: CI 2017’, 20–22 September 2017, Boulder, CO, USA. (Eds V Lyubchich, N Oza, A Rhines, E Szekely) NCAR Technical Note NCAR/TN-536+PROC. doi:10.5065/D6222SH7. (National Center for Atmospheric Research: Boulder, CO, USA)

Cascio WE (2018) Wildland fire smoke and human health. The Science of the Total Environment 624, 586–595.

Copernicus Climate Change Service (C3S) (2017) ERA5: Fifth generation of ECMWF atmospheric reanalyses of the global climate. Copernicus Climate Change Service Climate Data Store (CDS). Available at https://cds.climate.copernicus.eu/cdsapp#!/home [Verified 20 August 2019]

Cortez P, Morais A (2007) A data mining approach to predict forest fires using meteorological data. In ‘Proceedings of the 13th Portuguese Conference on Artificial Intelligence’, 3–7 December 2007, Guimarães, Portugal. (Eds J Neves, M Santos, J Machado) pp. 512–523. (Springer-Verlag: Berlin) Available at http://www3.dsi.uminho.pt/pcortez/fires.pdf [Verified 20 August 2019]

de Souza FT, Koerner TC, Chlad R (2015) A data-based model for predicting wildfires in Chapada das Mesas National Park in the State of Maranhão. Environmental Earth Sciences 74, 3603–3611.

Delfino RJ, Brummel S, Wu J, Stern H, Ostro B, Lipsett M, Winer A, Street DH, Zhang L, Tjoa T, Gillen DL (2009) The relationship of respiratory and cardiovascular hospital admissions to the southern California wildfires of 2003. Occupational and Environmental Medicine 66, 189–197.

DeWilde L, Chapin FS (2006) Human impacts on the fire regime of interior Alaska: interactions among fuels, ignition sources, and fire suppression. Ecosystems 9, 1342–1353.

Di Giuseppe F, Rémy S, Pappenberger F, Wetterhall F (2018) Using the Fire Weather Index (FWI) to improve the estimation of fire emissions from fire radiative power (FRP) observations. Atmospheric Chemistry and Physics 18, 5359–5370.

Faivre NR, Jin Y, Goulden ML, Randerson JT (2014) Controls on the spatial pattern of wildfire ignitions in Southern California. International Journal of Wildland Fire 23, 799–811.

Faivre NR, Jin Y, Goulden ML, Randerson JT (2016) Spatial patterns and controls on burned area for two contrasting fire regimes in Southern California. Ecosphere 7, e01210

Finney MA (1998) FARSITE: Fire Area Simulator-Model development and evaluation. USDA Forest Service, Rocky Mountain Research Station, Research Paper RMRS-RP-4. (Ogden, UT, USA)

Finney MA, Grenfell IC, McHugh CW, Seli RC, Trethewey D, Stratton RD, Brittain S (2011) A method for ensemble wildland fire simulation. Environmental Modeling and Assessment 16, 153–167.

Flanner MG, Zender CS, Randerson JT, Rasch PJ (2007) Present-day climate forcing and response from black carbon in snow. Journal of Geophysical Research – D. Atmospheres 112, D11202

Flannigan MD, Logan KA, Amiro BD, Skinner WR, Stocks BJ (2005) Future area burned in Canada. Climatic Change 72, 1–16.

Forestry Canada Fire Danger Group (1992) Development and structure of the Canadian forest fire behavior prediction system. Forestry Canada, Science and Sustainable Development Directorate, Information Report ST-X-3. (Ottawa, ON, Canada)

French NHF, Jenkins LK, Loboda TV, Flannigan M, Jandt R, Bourgeau-Chavez LL, Whitley M (2015) Fire in arctic tundra of Alaska: past fire activity, future fire potential, and significance for land management and ecology. International Journal of Wildland Fire 24, 1045–1061.

Gesch DB, Verdin KL, Greenlee SK (1999) New land surface digital elevation model covers the Earth. Eos 80, 69–71.

Giglio L, Schroeder W, Justice CO (2016) The collection 6 MODIS active fire detection algorithm and fire products. Remote Sensing of Environment 178, 31–41.

Hao WM, Petkov A, Nordgren BL, Corley RE, Silverstein RP, Urbanski SP, Evangeliou N, Balkanski Y, Kinder BL (2016) Daily black carbon emissions from fires in northern Eurasia for 2002–2015. Geoscientific Model Development 9, 4461–4474.

Johnston FH, Bailie RS, Pilotto LS, Hanigan IC (2007) Ambient biomass smoke and cardio-respiratory hospital admissions in Darwin, Australia. BMC Public Health 7, 240

Johnston FH, Henderson SB, Chen Y, Randerson JT, Marlier M, DeFries RS, Kinney P, Bowman DMJS, Brauer M (2012) Estimated global mortality attributable to smoke from landscape fires. Environmental Health Perspectives 120, 695–701.

Kasischke ES, Turetsky MR (2006) Recent changes in the fire regime across the North American boreal region – spatial and temporal patterns of burning across Canada and Alaska. Geophysical Research Letters 33, L09703

Kasischke ES, Williams D, Barry D (2002) Analysis of the patterns of large fires in the boreal forest region of Alaska. International Journal of Wildland Fire 11, 131–144.

Kasischke ES, Verbyla DL, Rupp TS, McGuire AD, Murphy KA, Jandt R, Barnes JL, Hoy EE, Duffy PA, Calef M, Turetsky MR (2010) Alaska’s changing fire regime – implications for the vulnerability of its boreal forests. Canadian Journal of Forest Research 40, 1313–1324.

Krawchuk MA, Cumming SG, Flannigan MD (2009) Predicted changes in fire weather suggest increases in lightning fire initiation and future area burned in the mixedwood boreal forest. Climatic Change 92, 83–97.

Liu Z, Wimberly MC (2016) Direct and indirect effects of climate change on projected future fire regimes in the western United States. The Science of the Total Environment 542, 65–75.

Liu Z, Yang J, Chang Y, Weisberg PJ, He HS (2012) Spatial patterns and drivers of fire occurrence and its future trend under climate change in a boreal forest of Northeast China. Global Change Biology 18, 2041–2056.

Liu JC, Wilson A, Mickley LJ, Dominici F, Ebisu K, Wang Y, Sulprizio MP, Peng RD, Yue X, Son JY, Anderson GB, Bell ML (2017) Wildfire-specific fine particulate matter and risk of hospital admissions in urban and rural counties. Epidemiology 28, 77–85.

Liu Z, Ballantyne AP, Cooper LA (2019) Biophysical feedback of global forest fires on surface temperature. Nature Communications 10, 214

Lynch JA, Clark JS, Bigelow NH, Edwards ME, Finney BP (2002) Geographic and temporal variations in fire history in boreal ecosystems of Alaska. Journal of Geophysical Research 107, FFR8-1–FFR8-17.

Miller E (2019) Moisture sorption models for fuel beds of standing dead grass in Alaska. Fire 2, 2

Mouteva GO, Czimczik CI, Fahrni SM, Wiggins EB, Rogers BM, Veraverbeke S, Xu X, Santos GM, Henderson J, Miller CE, Randerson JT (2015) Black carbon aerosol dynamics and isotopic composition in Alaska linked with boreal fire emissions and depth of burn in organic soils. Global Biogeochemical Cycles 29, 1977–2000.

Parisien MA, Parks SA, Krawchuk MA, Flannigan MD, Bowman LM, Moritz MA (2011a) Scale-dependent controls on the area burned in the boreal forest of Canada, 1980–2005. Ecological Applications 21, 789–805.

Parisien MA, Parks SA, Miller C, Krawchuk MA, Heathcott M, Moritz MA (2011b) Contributions of ignitions, fuels, and weather to the spatial patterns of burn probability of a boreal landscape. Ecosystems 14, 1141–1155.

Parisien MA, Parks SA, Krawchuk MA, Little JM, Flannigan MD, Gowman LM, Moritz MA (2014) An analysis of controls on fire activity in boreal Canada: comparing models built with different temporal resolutions. Ecological Applications 24, 1341–1356.

Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Brucher M, Perrot M, Duchesnay E (2011) Scikit-learn: machine learning in Python. Journal of Machine Learning Research 12, 2825–2830.

Preisler HK, Burgan RE, Eidenshink JC, Klaver JM, Klaver RW (2009) Forecasting distributions of large federal-lands fires utilizing satellite and gridded weather information. International Journal of Wildland Fire 18, 508–516.

Randerson JT, Liu H, Flanner MG, Chambers SD, Jin Y, Hess PG, Pfister G, Mack MC, Treseder KK, Welp LR, Chapin FS, Harden JW, Goulden ML, Lyons E, Neff JC, Schuur EAG, Zender CS (2006) The impact of boreal forest fire on climate warming. Science 314, 1130–1132.

Rogers BM, Randerson JT, Bonan GB (2013) High-latitude cooling associated with landscape changes from North American boreal forest fires. Biogeosciences 10, 699–718.

Rogers BM, Soja AJ, Goulden ML, Randerson JT (2015) Influence of tree species on continental differences in boreal fires and climate feedbacks. Nature Geoscience 8, 228–234.

Rollins MG (2009) LANDFIRE: a nationally consistent vegetation, wildland fire, and fuel assessment. International Journal of Wildland Fire 18, 235–249.

Romps DM, Seeley JT, Vollaro D, Molinari J (2014) Projected increase in lightning strikes in the United States due to global warming. Science 346, 851–854.

Sand M, Berntsen TK, Von Salzen K, Flanner MG, Langner J, Victor DG (2016) Response of Arctic temperature to changes in emissions of short-lived climate forcers. Nature Climate Change 6, 286–289.

Sedano F, Randerson JT (2014) Multi-scale influence of vapor pressure deficit on fire ignition and spread in boreal forest ecosystems. Biogeosciences 11, 3739–3755.

Tetens O (1930) Über einige meteorologische Begriffe. Zeitschrift für Geophysik 6, 297–309.

van der Werf GR, Randerson JT, Giglio L, Van Leeuwen TT, Chen Y, Rogers BM, Mu M, Van Marle MJE, Morton DC, Collatz GJ, Yokelson RJ, Kasibhatla PS (2017) Global fire emissions estimates during 1997–2016. Earth System Science Data 9, 697–720.

Van Wagner CE (1987) Development and structure of the Canadian Forest Fire Weather Index System. Canadian Forestry Service, Petawawa National Forest Institute, Forestry Technical Report 35. (Chalk River, ON, Canada)

Veraverbeke S, Rogers BM, Goulden ML, Jandt RR, Miller CE, Wiggins EB, Randerson JT (2017) Lightning as a major driver of recent large fire years in North American boreal forests. Nature Climate Change 7, 529–534.

Wein RW, MacLean DA (1983) An overview of fire in northern ecosystems. In ‘The Role of Fire in Northern Circumpolar Ecosystems’. (Eds RW Wein, DA MacLean) pp. 1–18. (Wiley: New York, NY, USA)

Westerling AL, Hidalgo HG, Cayan DR, Swetnam TW (2006) Warming and earlier spring increase Western US forest wildfire activity. Science 313, 940–943.

Wiggins EB, Veraverbeke S, Henderson JM, Karion A, Miller JB, Lindaas J, Commane R, Sweeney C, Luus KA, Tosca MG, Dinardo SJ, Wofsy S, Miller CE, Randerson JT (2016) The influence of daily meteorology on boreal fire emissions and regional trace gas variability. Journal of Geophysical Research – Biogeosciences 121, 2793–2810.

Young AM, Higuera PE, Duffy PA, Hu FS (2017) Climatic thresholds shape northern high-latitude fire regimes and imply vulnerability to future climate change. Ecography 40, 606–617.