High Spatiotemporal Resolution PM2.5 Concentration Estimation with Machine Learning Algorithm: A Case Study for Wildfire in California

Cui, Qian; Zhang, Feng; Fu, Shaoyun; Wei, Xiaoli; Ma, Yue; Wu, Kun

doi:10.3390/rs14071635

¹ College of Atmospheric Sciences, Nanjing University of Information Science & Technology, Nanjing 210044, China

² Department of Atmospheric and Oceanic Sciences, Institute of Atmospheric Sciences, Fudan University, Shanghai 200433, China

³ Shanghai Qi Zhi Institute, Shanghai 200232, China

⁴ Jiading District Meteorological Bureau, Shanghai Meteorological Service, Shanghai 201800, China

* Author to whom correspondence should be addressed.

Remote Sens. 2022, 14(7), 1635; https://0-doi-org.brum.beds.ac.uk/10.3390/rs14071635

Submission received: 15 February 2022 / Revised: 19 March 2022 / Accepted: 23 March 2022 / Published: 29 March 2022

(This article belongs to the Special Issue Optical and Laser Remote Sensing of Atmospheric Composition)


Abstract:

As an aggregate of suspended particulate matter in the air, atmospheric aerosols can affect the regional climate. With satellite remote sensing now able to retrieve AOD (aerosol optical depth) on a global or regional scale, accurate estimation of PM2.5 concentration has become an important task for quantifying the spatiotemporal distribution of AOD and PM2.5. However, due to the limitations of satellite platforms, sensors, and inversion algorithms, the spatiotemporal resolution of current major AOD products is still relatively low. Meanwhile, because of cloud contamination, AOD products often have serious data gaps, which also limits the spatiotemporal coverage of predicted PM2.5 concentration. Therefore, effectively improving the spatiotemporal resolution and coverage of PM2.5 concentration at the requisite accuracy remains a grand challenge. In this study, the fused high spatiotemporal resolution AOD data from our previous study were used to estimate the ground PM2.5 concentration with a machine learning algorithm, the deep belief network (DBN). The PM2.5 data exhibited spatiotemporal autocorrelation in the geostatistical sense and followed a Gaussian kernel distribution. Hence, an autocorrelation model modified by a Gaussian kernel function was integrated with the DBN algorithm, named Geoi-DBN, to estimate PM2.5 concentration. The cross-validation results showed that the Geoi-DBN (R² = 0.86, RMSE = 6.84 µg m⁻³) performed better than the original DBN (R² = 0.67, RMSE = 10.46 µg m⁻³). The final high-quality PM2.5 concentration data can be applied to urban air quality monitoring and to PM2.5 exposure risk assessment for events such as wildfires.

Keywords:

aerosol optical depth; PM2.5; data fusion; air quality; wildfire

1. Introduction

Atmospheric PM2.5 has many adverse impacts on climate change and human health. With an aerodynamic diameter less than or equal to 2.5 μm, it can directly reach the lungs and cause many related chronic diseases. Studies have found that long-term exposure to aerosol pollution can cause diabetes and cardiovascular and cerebrovascular diseases [1]. PM2.5 pollution has become a general concern in many parts of the world. Although many countries have built PM2.5 ground monitoring networks that provide real-time data, ground-based observations often cannot effectively characterize the distribution of PM2.5 pollution over a large area because monitoring sites are sparse and unevenly distributed. In addition, highly dynamic sources such as smoke and dust emit a large amount of particulate matter in a short time, with a spatially heterogeneous distribution. Consequently, accurate and seamless mapping of PM2.5 concentration at high spatiotemporal resolution, using satellite remote sensing data with the help of a machine learning approach, is critically important.

Heterogeneous PM2.5 is caused by the emission of PM2.5, spatial distribution of buildings in ground surface, changes of the intensity of human activities in regional environments, and many other factors [2]. In the early days of remote sensing, researchers usually used a simple linear model to estimate the ground PM2.5 concentrations from satellite AOD (aerosol optical depth) data, without fully considering the covariant mechanism between satellite AOD and ground PM2.5 data [3,4]. Further research found that the statistical relationship between PM2.5 concentration and AOD was greatly affected by factors such as aerosol type, vertical distribution characteristics, and the moisture absorption effect of particles as well as spatiotemporal anisotropy between PM2.5 and AOD [5,6,7,8,9]. Hence, the relative humidity and boundary layer height have been introduced into the PM2.5 estimation model to improve the accuracy. The relationship between AOD and PM2.5 is complicated and it is obvious that the uncertainty of the estimated PM2.5 concentration is large when using one single model [10]. Furthermore, with the deepening understanding of the physical and chemical characteristics and formation mechanism of PM2.5, the number of factors and amount of data involved in PM2.5 estimation have increased rapidly. Models developed from general linear regression to complex multiple regression such as the mixed-effects model, geographically weighted regression model, land use regression model, and other empirical statistical models, etc. [11,12,13,14,15]. Generally, the relationship between AOD and PM2.5 concentration is non-linear [16]. All traditional empirical models are unable to optimally express this nonlinear relationship. Therefore, machine learning methodology was brought into PM2.5 concentration estimation, which has a strong learning ability and can effectively establish the complex nonlinear relationship between PM2.5 concentration and its influencing factors. 
Studies have found that machine learning estimates PM2.5 more accurately than traditional linear and semi-empirical statistical models [17].

AOD is positively correlated with PM2.5 [18,19]. Aerosol optical depth (AOD) products observed from multi-source satellite instruments such as MODIS (Moderate-resolution Imaging Spectroradiometer), VIIRS (Visible Infrared Imaging Radiometer Suite), and GOES (Geostationary Operational Environmental Satellite) provide the chance to obtain fine resolution PM2.5 maps. However, the AOD product of a single instrument usually has limited application on a large scale due to low coverage and the amount of missing data caused by cloud contamination and bright surfaces. Furthermore, systematic sensor biases and different inversion algorithms make it hard to provide a consistent AOD dataset from different satellites. Most AOD products have a low resolution, which makes it difficult to meet the research requirements of small-scale areas such as individual urban areas [20,21]. For example, MODIS flown on the Terra and Aqua satellites can provide AOD products in three spatial resolutions: 10 km, 3 km, and 1 km. The VIIRS, as the successor sensor to MODIS, has a 750 m AOD product. In temporal resolution, most AOD data observed by polar-orbiting satellites are collected daily. In contrast, geostationary satellites such as GOES-16 (4 km/15 min) and Himawari-8 (5 km/10 min) can provide minute-scale AOD data at lower spatial resolution [22]. Compared to the observed AOD data, AOD simulated by atmospheric chemistry models, such as the MERRA-2 AOD data, offers global coverage but has low accuracy and coarse resolution because of the complex model structure and the many processes considered in the simulation [23,24,25]. Overall, no single source can provide AOD data with high precision, high resolution, and high coverage, and PM2.5 concentrations estimated from a single source inherit the same defects.

Fusing multi-source heterogeneous AOD data is an effective way to address the limitations of a single source of AOD data and generate higher quality AOD data [26]. Previous studies of AOD data fusion algorithms have mainly been confined to particular areas and particular instruments. Yang and Hu (2018) applied a spatiotemporal kriging approach to fill in the gaps in the MODIS AOD product [27]. Spatial statistical approaches can improve data coverage, but their results are unsatisfactory when the available data are sparsely distributed. In light of this, combining aerosol products from multiple sensors has been proposed [28]. Tang et al. (2016) used the BME method to fill in missing AOD by fusing MODIS and SeaWiFS (Sea-viewing Wide Field-of-view Sensor) (13.5 km) products [26]. Sogacheva et al. (2020) fused 15 different AOD products and obtained a consistent monthly AOD dataset by weighting each product according to its verification against AERONET (AErosol RObotic NETwork) [29]. However, all of these studies failed to capture non-randomness in missing AOD values. Xiao et al. (2017) further combined satellite AOD with model simulations by using a multiple imputation method [30]. Even so, it remains difficult to obtain accurate aerosol estimates because of the uncertainties and coarse spatial resolution introduced by the models. Therefore, our recent studies have addressed these limitations by proposing a data fusion algorithm, MQQA-BME, which is an integrative algorithm that synergizes the advantages of MQQA (Modified Quantile-Quantile Adjustment) and BME (Bayesian Maximum Entropy). In the MQQA-BME algorithm, MQQA is a complementary tool for adjusting the systematic biases in the data sources and BME is used for data downscaling and prediction [31,32]. The integrated MQQA-BME algorithm has been proven to perform well for dynamic multisource data fusion such as TOA reflectance and AOD data.

Based on the fused AOD data, it is possible to seamlessly map the ground PM2.5 concentration with fine spatiotemporal resolution, full coverage, and precision. Spatiotemporal information is essential in the estimation of ground PM2.5 concentration. Li et al. (2016) introduced spatiotemporal information into the DBN (deep belief network) model and called it the Geoi-DBN model, which achieved satisfactory results in ground PM2.5 estimation [33]. However, it describes the spatiotemporal information with correlation weights based on the traditional Moran index, which is a linear statistical relationship and cannot accurately depict the real distribution of PM2.5. Later, several researchers further improved this algorithm based on a random forest model [34] and an extreme random tree model to map the ground PM2.5 concentration [35]. However, the majority of these studies defined a specific window as the spatiotemporal impact domain, a choice that is prone to subjective factors. In this context, the deep belief network algorithm is used here to mine the complex nonlinear relationship between PM2.5 concentration and AOD data. Meteorological factors at high resolution are also considered when building the estimation model since they are known to influence the PM2.5 concentration. Additionally, drawing on the concept of spatiotemporal autocorrelation in geostatistics, the Gaussian kernel model was brought into the deep belief network, which improves the accuracy of PM2.5 concentration estimates by considering prior and neighborhood information. The scale of the window is automatically defined by the spatiotemporal covariance model, which reflects the autocorrelation of data in the spatiotemporal domain.

The science questions to be answered in this paper include: (1) Can AOD fused from multisource data better estimate PM2.5 concentration at high spatiotemporal resolution, especially in a highly dynamic forest fire event? and (2) Can the Geoi-DBN algorithm significantly improve the modeling accuracy by incorporating the Gaussian kernel and spatiotemporal covariance models? Given the results of our previous studies, the MQQA-BME algorithm can produce better quality AOD data/products. The objectives of this paper are thus to: (1) evaluate how the gap-filling and resolution-improving capabilities of the fused AOD data benefit the estimation of ground PM2.5 concentration, and (2) demonstrate that the Gaussian kernel model, integrated with the DBN algorithm and aided by meteorological factors, improves the estimation accuracy of ground PM2.5 concentration.

2. Study Area and Datasets

The study area comprises central and southern California, as shown in Figure 1. The hourly average observed ground PM2.5 concentration data from September to November 2018 (fall) were collected from the U.S. EPA (Environmental Protection Agency) website (https://www.epa.gov/outdoor-air-quality-data (accessed on 6 January 2021)). Twenty-three stations were set up for air pollution monitoring in our study area (Figure 1). The fused AOD data obtained from our previous study [31,32] had a 1 km spatial resolution and a temporal resolution of a half hour. The fused AOD data were derived from the MERRA-2 (Modern-Era Retrospective analysis for Research and Applications, Version 2) reanalysis AOD data, the geostationary satellite GOES-16 (Geostationary Operational Environmental Satellites 16) AOD product, and polar-orbiting satellite Terra MODIS AOD data (MCD19A2/AOD, inversion by MAIAC (Multi-Angle Implementation of Atmospheric Correction) algorithm). We obtained meteorological variables from the ECMWF (European Centre for Medium-Range Weather Forecasts) (https://www.ecmwf.int/en/research/climate-reanalysis (accessed on 6 January 2021)), including wind speed at 10 m above ground (WDSP, m/s), air temperature at 2 m above ground (TMP, K), relative humidity (RH, %), surface pressure (PRESS, Pa), total precipitation (TP, mm), and planetary boundary layer height (PBLH, m). These meteorological data have a spatial resolution of 0.25° × 0.25° and a temporal resolution of one hour. In this study, we used the 00:00 UTC, 06:00 UTC, 12:00 UTC, and 18:00 UTC data as examples, and the meteorological data were downscaled to the AOD spatial resolution (1 km) by using nearest neighbor interpolation.
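The nearest neighbor step that brings the 0.25° meteorological fields onto the 1 km AOD grid can be illustrated with a minimal sketch. The grid coordinates, the relative humidity values, and the function names below are hypothetical and chosen only for illustration; they are not the authors' processing code, and a 0.01° spacing is used as a rough stand-in for 1 km.

```python
def nearest_index(coords, target):
    """Return the index of the coarse-grid coordinate closest to target."""
    return min(range(len(coords)), key=lambda k: abs(coords[k] - target))


def resample_nearest(coarse, coarse_lats, coarse_lons, fine_lats, fine_lons):
    """Fill a fine grid by copying the value of the nearest coarse cell."""
    rows = [nearest_index(coarse_lats, lat) for lat in fine_lats]
    cols = [nearest_index(coarse_lons, lon) for lon in fine_lons]
    return [[coarse[r][c] for c in cols] for r in rows]


# Hypothetical 2 x 2 block of 0.25-degree relative humidity values (%).
coarse_lats = [36.00, 36.25]
coarse_lons = [-120.00, -119.75]
coarse_rh = [[40.0, 45.0],
             [50.0, 55.0]]

# Fine grid at 0.01-degree spacing (assumed stand-in for ~1 km).
fine_lats = [36.00 + 0.01 * k for k in range(26)]
fine_lons = [-120.00 + 0.01 * k for k in range(26)]

fine_rh = resample_nearest(coarse_rh, coarse_lats, coarse_lons,
                           fine_lats, fine_lons)
print(len(fine_rh), len(fine_rh[0]))    # shape of the fine grid
print(fine_rh[0][0], fine_rh[25][25])   # corner values copied from coarse cells
```

Nearest neighbor resampling adds no new information: every fine cell simply inherits the value of its closest coarse cell, so the result has blocky 0.25° structure at 1 km spacing.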

3. Methods

3.1. Integration of the MQQA-BME Algorithm

The MQQA algorithm aims to address the systematic bias among multiple data sources by using quantile–quantile mapping theory to increase their comparability. Since the theoretical root of the MQQA algorithm is matching based on probability distributions, its corrections for more random deviations are not significant. At the same time, MQQA can only perform error correction on existing pixel values and cannot reconstruct missing data. BME, a nonlinear fusion algorithm, can combine physical prior knowledge with probability theory for prediction [36] and performs better in downscaling highly dynamic parameters [37,38]. In addition, by considering the autocorrelation of spatiotemporal neighborhood data, it can interpolate to fill in missing data and thus significantly improve data coverage. However, the BME algorithm for downscaling fusion of simultaneous observations from multiple satellites does not consider the contribution of long time series of historical information or the problem of systematic bias between sensors. The MQQA-BME algorithm combines the advantages of these two algorithms and overcomes the defects of each by introducing the error model constructed by the MQQA algorithm from historical data into the Bayesian maximum entropy algorithm [31]. So far, the integrated MQQA-BME algorithm has been successfully applied to multi-source heterogeneous AOD data fusion [32], providing a data basis for seamless mapping of ground-level PM2.5 concentration. The MQQA-BME algorithm is described in detail in our previous study [31].

The BME theory can integrate physical knowledge with probability law as a nonlinear estimator for prediction that has been widely used in atmospheric studies [37]. Generally, the physical knowledge can be divided into two categories: general knowledge and specific knowledge. The general knowledge expresses the global characteristics such as consistent pattern, which is described by the mean trend value as well as the spatial and temporal dependence indicated by its covariance. The datasets include a hard dataset and a soft dataset. The hard data are deemed accurate with good quality. The soft data are usually data with uncertainty and data missing. In BME theory, a Gaussian process error model is derived from the discrepancies between hard data and soft data to help explore the error propagation in between the two adjacent hard data. Obviously, the Gaussian process error model is not a physical model for bias correction during prediction. For this reason, we improved the Gaussian process error model in this analysis by introducing the MQQA into the BME for systematic bias correction and removing the systematic bias from different data sources to reduce the error propagation. By integrating MQQA and BME with the Shannon information in a format of PDF (probability density function) [38], the MQQA-BME approach can obtain high posterior information about the spatiotemporal structure under estimation to formalize the specific knowledge. Finally, the estimated mean value and variance data are obtained at an estimated point via a mathematical optimization process to achieve data fusion.
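The quantile–quantile idea that MQQA builds on can be sketched with plain empirical quantile mapping. This is only the textbook form of the technique, not the modified adjustment of [31]; the sample values and function names are hypothetical. A value from the biased source is converted to its empirical cumulative probability in that source's history and then mapped to the value with the same probability in the reference history.

```python
from bisect import bisect_right


def empirical_cdf(sorted_sample, x):
    """Fraction of the sample that is <= x."""
    return bisect_right(sorted_sample, x) / len(sorted_sample)


def quantile(sorted_sample, p):
    """Linearly interpolated quantile of a sorted sample for 0 <= p <= 1."""
    pos = p * (len(sorted_sample) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_sample) - 1)
    return sorted_sample[lo] + (pos - lo) * (sorted_sample[hi] - sorted_sample[lo])


def quantile_map(x, biased_history, reference_history):
    """Map x from the biased distribution onto the reference distribution."""
    biased = sorted(biased_history)
    reference = sorted(reference_history)
    return quantile(reference, empirical_cdf(biased, x))


# Hypothetical AOD histories: the biased source reads about half the reference.
reference_aod = [0.10, 0.20, 0.30, 0.40, 0.50]
biased_aod = [0.05, 0.10, 0.15, 0.20, 0.25]

for value in (0.05, 0.15, 0.25):
    print(value, "->", round(quantile_map(value, biased_aod, reference_aod), 3))
```

Because the mapping only matches distributions, it corrects systematic offsets but leaves random deviations largely untouched, and it cannot produce a value where the source has no observation, which is why the text pairs it with BME.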

3.2. Deep Belief Network Algorithm (DBN)

DBN is a generative deep neural network model [39]. It is composed of multiple hidden layers built from RBMs (restricted Boltzmann machines), each of which has two layers (a single visible layer and a hidden layer) of feature-detecting units [39]. There are connections between layers, but not between units within a layer, and hidden layer units are trained to capture the higher-order correlations of the data represented in the visible layer. In other words, several RBMs are connected in series to form a DBN, where the hidden layer of the previous RBM is the visible layer of the next RBM, and the output of the previous RBM is the input of the next RBM [40]. The architecture of the DBN network is shown in Figure 2.
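The stacking described above, where the hidden layer of one RBM serves as the visible layer of the next, can be sketched as a forward pass through two layers. This is an illustrative sketch only: the weights are random and untrained, the class and variable names are hypothetical, and the sizes (11 inputs, 20 hidden units per layer) are borrowed from the configuration described in this section.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class RBMLayer:
    """One RBM: a visible layer fully connected to a hidden layer.

    There are no connections between units of the same layer, so each hidden
    unit depends only on the visible units.
    """

    def __init__(self, n_visible, n_hidden, rng):
        # Small random weights stand in for trained parameters (assumption).
        self.weights = [[rng.gauss(0.0, 0.1) for _ in range(n_hidden)]
                        for _ in range(n_visible)]
        self.hidden_bias = [0.0] * n_hidden

    def hidden_probabilities(self, visible):
        """P(h_j = 1 | v) for every hidden unit j."""
        return [
            sigmoid(self.hidden_bias[j]
                    + sum(v * self.weights[i][j] for i, v in enumerate(visible)))
            for j in range(len(self.hidden_bias))
        ]


rng = random.Random(0)
layers = [RBMLayer(11, 20, rng), RBMLayer(20, 20, rng)]

# Hypothetical normalized input vector with 11 predictors.
inputs = [rng.random() for _ in range(11)]

activation = inputs
for layer in layers:
    # The hidden output of one RBM becomes the visible input of the next.
    activation = layer.hidden_probabilities(activation)

print(len(activation))
```

In a full DBN each RBM would first be pre-trained layer by layer and a regression output layer would be added on top; neither step is shown here.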

The number of RBM layers and the number of neurons in each layer are key parameters of the DBN model. Generally, the more RBM layers there are, the better the ability to simulate complex nonlinear relationships; however, too many layers can lead to overfitting [33,41]. In addition, a previous study showed that the number of central neurons, determined by the number of input variables (n) and output variables (u), ranges from 2√n + u to 2n + 1 [42]. In the present study, given the complex nonlinear atmospheric environment, the DBN was set to two RBM layers with 20 neurons per layer, chosen according to the number of inputs and outputs and to improve computational speed. The parameters that take part in the DBN model to estimate PM2.5 are as follows (this is the nonlinear DBN estimation model):

\mathrm{PM}_{2.5} = f(\mathrm{MERRA2\_GOES16\_MAIAC\ Fused\ AOD}, \mathrm{Lat}, \mathrm{Lon}, \mathrm{DOY}, \mathrm{U10}, \mathrm{V10}, \mathrm{TMP}, \mathrm{PBLH}, \mathrm{PRESS}, \mathrm{RH}, \mathrm{TP})

(1)

where Lat is the latitude; Lon is the longitude; DOY is the Day of Year; U10 is the east–west component of the wind vector; and V10 is the north–south component of the wind vector. TMP is the air temperature at 2 m above ground; RH is the relative humidity; PRESS is the surface pressure; TP is the total precipitation; and PBLH is the planetary boundary layer height.
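As a quick check of the neuron-count rule cited from [42], the range 2√n + u to 2n + 1 can be evaluated for the 11 input variables of Equation (1) and a single output (PM2.5). The function name is hypothetical; the arithmetic follows the rule as stated in the text.

```python
import math


def neuron_range(n_inputs, n_outputs):
    """Lower and upper bounds on the number of central neurons."""
    lower = 2 * math.sqrt(n_inputs) + n_outputs
    upper = 2 * n_inputs + 1
    return lower, upper


# Equation (1) has 11 predictors and one output.
lower, upper = neuron_range(11, 1)
print(round(lower, 2), upper)
print(lower <= 20 <= upper)
```

Under this rule, the 20 neurons per layer chosen in this study fall inside the suggested range, toward its upper end.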

3.3. Geoi-Deep Belief Network (Geoi-DBN)

Li et al. (2017) found that the accuracy of PM2.5 estimation can be improved by using the DBN model with consideration of the spatiotemporal information of neighborhoods [33]. Wei et al. (2020) further compared and analyzed various deep learning algorithms and found that Geoi-DBN performed better than others, with an R² of 0.88 [31]. When estimating PM2.5 data of a single site, PM2.5 data of the site had good autocorrelation with the surrounding data in a certain range since all of the data belonged to the same emission source. Moreover, there was an obvious time dependence between the data of one day and the data of adjacent time [43]. By considering the spatiotemporal autocorrelation at model initiation, some researchers have used the Moran I index with a spatial weight matrix to explore the spatiotemporal autocorrelation [33]. However, the actual spatial distribution of ground PM2.5 concentration cannot be well expressed by simply introducing Moran I into the estimation model [33]. Therefore, Gaussian kernel function was introduced into this study to describe the real spatiotemporal distribution of PM2.5 [34], as shown in Equations (2)–(5):

\mathrm{S\_PM}_{2.5} = \frac{\sum_{i=1}^{n} WS_{i} \, \mathrm{PM}_{2.5,i}}{\sum_{i=1}^{n} WS_{i}}, \quad i = 1, 2, \dots, n

(2)

\mathrm{T\_PM}_{2.5} = \frac{\sum_{j=1}^{m} WT_{j} \, \mathrm{PM}_{2.5,j}}{\sum_{j=1}^{m} WT_{j}}, \quad j = 1, 2, \dots, m

(3)

WS_{i} = \exp\left( \frac{-d^{2}}{2 \, \mathrm{var}(\mathrm{PM}_{2.5})_{\mathrm{window\_range}}} \right)

(4)

WT_{j} = \exp\left( \frac{-t^{2}}{2 \, \mathrm{var}(\mathrm{PM}_{2.5})_{\mathrm{window\_range}}} \right)

(5)

where S_PM2.5 and T_PM2.5 are the autocorrelation of PM2.5 concentrations in the spatial and temporal neighborhoods, respectively; d is the Euclidean distance between a PM2.5 observation and the point to be estimated within the autocorrelation window; t is the corresponding time lag within the temporal window; var(PM2.5)_{window_range} is the variance of the data within the autocorrelation window; WS_i is the spatial weight of the i-th PM2.5 observation within the autocorrelation window; and WT_j is the temporal weight of the j-th PM2.5 observation within the temporal window.
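Equations (2)–(5) amount to a Gaussian-kernel weighted mean of the neighboring observations. The sketch below follows the equations as written, with two stated assumptions: the window variance is computed in its population form (divided by the number of samples), and a zero variance is handled by returning the common value. All station values, distances, lags, and function names are hypothetical.

```python
import math


def variance(values):
    """Population variance of the observations inside the window (assumed form)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def gaussian_weight(lag, var_window):
    """Equations (4) and (5): exp(-lag^2 / (2 * var))."""
    return math.exp(-lag ** 2 / (2.0 * var_window))


def kernel_weighted_mean(values, lags):
    """Equations (2) and (3): weighted mean of neighbouring PM2.5 values."""
    var_window = variance(values)
    if var_window == 0.0:
        # All neighbours are identical, so any weighting returns that value.
        return values[0]
    weights = [gaussian_weight(lag, var_window) for lag in lags]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


# Hypothetical neighbours in space: PM2.5 (ug/m3) and distance (pixels).
s_pm25 = kernel_weighted_mean([12.0, 18.0, 30.0], [0.5, 1.0, 1.5])

# Hypothetical neighbours in time: PM2.5 (ug/m3) and lag (days).
t_pm25 = kernel_weighted_mean([15.0, 22.0, 25.0], [1.0, 2.0, 3.0])

print(round(s_pm25, 3), round(t_pm25, 3))
```

The two resulting values are the extra predictors S_PM2.5 and T_PM2.5 that distinguish the Geoi-DBN input vector from that of the original DBN.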

Additionally, the window size that defines how far the neighborhood data influence the estimate is very important. Because data at close range are highly correlated, a small window size will lead to over-fitting of the PM2.5 estimation. In contrast, a large window may introduce non-homologous data into the calculation, which would reduce the accuracy of PM2.5 estimation. A popular approach is to set the window size to a manually chosen constant and to determine the optimum value through parameter sensitivity experiments. Such repeated trials are tedious, and it is difficult to avoid subjective influence. To solve this problem, a spatiotemporal covariance model was brought into this study to determine the window size automatically and to better characterize the spatial distribution of PM2.5 concentrations. As mentioned previously, the spatiotemporal covariance model is the core of the spatiotemporal correlation model and should portray the spatiotemporal correlations based on the interdependence between the temporal and spatial parameters in the autocorrelation window domain [44].

Therefore, the long-term series of PM2.5 observations in the study area was analyzed as a function of longitude, latitude, and time (Figure 3). A least-squares fit showed that the covariance relationships in the spatiotemporal domain followed an exponential distribution: the autocorrelation coefficient between PM2.5 data decreased exponentially with increasing distance. It is noteworthy that the pattern in the temporal domain was more complicated, with some fluctuation after a period of time. This may have been caused by the highly dynamic and heterogeneous characteristics of PM2.5 concentration in the spatiotemporal domain [45]. Therefore, the temporal distribution needs further study in the future. Here, we still used an exponential distribution to describe the large-scale trend in time. In Figure 3, once the spatial distance reached three pixels and the temporal lag exceeded three days, the covariance value tended toward 0, meaning that the correlation disappeared. In this study, a 1.5 × 1.5 × 3 autocorrelation window size was used. Finally, the optimized Geoi-DBN model is described in Equation (6):

\mathrm{PM}_{2.5} = f(\mathrm{MERRA2\_GOES16\_MAIAC\ Fused\ AOD}, \mathrm{Lat}, \mathrm{Lon}, \mathrm{DOY}, \mathrm{U10}, \mathrm{V10}, \mathrm{TMP}, \mathrm{PBLH}, \mathrm{PRESS}, \mathrm{RH}, \mathrm{TP}, \mathrm{S\_PM}_{2.5}, \mathrm{T\_PM}_{2.5})

(6)
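The least-squares fit of an exponential covariance model, used above to choose the autocorrelation window, can be sketched as follows. This is a minimal illustration under several assumptions: the model form C(h) = c0·exp(−h/a), the log-linear fitting approach, the 5% threshold used to define where the correlation has effectively vanished, and the synthetic covariance values are all hypothetical and not taken from the study.

```python
import math


def fit_exponential_covariance(lags, covariances):
    """Least-squares fit of C(h) = c0 * exp(-h / a) on log-transformed data.

    Requires strictly positive covariance values.
    """
    logs = [math.log(c) for c in covariances]
    n = len(lags)
    mean_h = sum(lags) / n
    mean_y = sum(logs) / n
    slope = (sum((h - mean_h) * (y - mean_y) for h, y in zip(lags, logs))
             / sum((h - mean_h) ** 2 for h in lags))
    intercept = mean_y - slope * mean_h
    sill = math.exp(intercept)      # c0, covariance at zero lag
    length_scale = -1.0 / slope     # a, e-folding distance
    return sill, length_scale


def practical_range(length_scale, fraction=0.05):
    """Lag at which the covariance drops to the given fraction of c0."""
    return -length_scale * math.log(fraction)


# Synthetic empirical covariances at lags of 0 to 4 pixels (hypothetical).
lags = [0.0, 1.0, 2.0, 3.0, 4.0]
covariances = [50.0 * math.exp(-h / 1.2) for h in lags]

sill, length_scale = fit_exponential_covariance(lags, covariances)
window = practical_range(length_scale)
print(round(sill, 2), round(length_scale, 2), round(window, 2))
```

Deriving the window from the fitted decay length replaces the manually tuned constant discussed earlier; with real data the fit would be applied separately to the spatial and temporal lags.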

The model performance was evaluated with a set of statistical indexes, defined in Equations (7)–(9).

The root-mean-square error (RMSE) is:

\mathrm{RMSE} = \sqrt{\frac{1}{n - 1} \sum_{i=1}^{n} (y_{i} - \hat{y}_{i})^{2}}

(7)

The coefficient of determination (R²) is:

R^{2}(\hat{y}_{i}, y_{i}) = 1 - \frac{\sum_{i=1}^{n} (f_{i} - \hat{y}_{i})^{2}}{\sum_{i=1}^{n} (y_{i} - \hat{y}_{i})^{2}}

(8)

The mean absolute difference (MAD) is:

\mathrm{MAD} = \frac{1}{n} \sum_{i=1}^{n} | \hat{y}_{i} - y_{i} |

(9)

where y_i and ŷ_i are the predicted and ground truth data, respectively; n is the total number of pairwise samples; and f_i is the fitted (or modeled, or predicted) value f_1, …, f_n of y_i.
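The three indexes can be computed in a few lines. In the sketch below, RMSE uses the n − 1 divisor of Equation (7) and MAD follows Equation (9); R² is computed in its standard form, one minus the residual sum of squares over the total sum of squares of the observations, which is an assumption about the definition intended by Equation (8). The sample values and function names are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error with the n - 1 divisor of Equation (7)."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / (n - 1))


def mad(observed, predicted):
    """Mean absolute difference, Equation (9)."""
    return sum(abs(p - o) for o, p in zip(observed, predicted)) / len(observed)


def r_squared(observed, predicted):
    """Standard coefficient of determination (assumed form): 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical observed and predicted PM2.5 values (ug/m3).
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]

print(round(rmse(observed, predicted), 3))
print(round(mad(observed, predicted), 3))
print(round(r_squared(observed, predicted), 3))
```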

4. Results

4.1. Results of Multi-Source Heterogeneous AOD Fusion

Table 1 compares AERONET AOD data with MERRA-2 AOD, GOES-16 AOD, MAIAC AOD, MERRA-2_GOES-16 fused AOD, and MERRA-2_GOES-16_MAIAC fused AOD in our study area in the fall of 2018. The results show that R² ranged from 0.34 to 0.53, with the fused R² of 0.48 being acceptable given the gap filling. Among the multisource AOD products, the polar-orbiting satellite product (MAIAC) had the highest accuracy, which is consistent with previous studies [46]. MERRA-2 data in the low-value region correlated well with ground data, but there was an underestimation, possibly due to missing emission source inventories in the aerosol model [47]. In comparison, the AOD data generated by MERRA-2 had a higher accuracy than the GOES-16 data and the advantage of full image coverage, with a sample size of 674. Nevertheless, GOES-16 is still recommended as one of the fusion sources because of its high temporal resolution of 30 min. The intermediate fusion data of MERRA-2 and GOES-16, namely MERRA-2_GOES-16 AOD, and the final fusion data of MERRA-2, GOES-16, and MAIAC AOD, called MERRA-2_GOES-16_MAIAC AOD, were also verified. Figure 4 shows the distribution of R², MAD, and RMSE between the MERRA-2_GOES-16_MAIAC AOD and AERONET observed AOD. Errors were larger at AERONET sites close to the California coast, where retrievals are easily influenced by the marine environment [48]. Overall, the fusion accuracy of MERRA-2_GOES-16_MAIAC AOD data was slightly lower than that of the MAIAC AOD data, but the spatial coverage and temporal resolution were greatly improved. Moreover, the final fused MERRA-2_GOES-16_MAIAC AOD data were in good agreement with the ground-based AOD data.

4.2. Potential Effects of Variables on PM2.5

AOD has a good spatial and temporal correlation with PM2.5, especially when pollutants are concentrated in the lower atmosphere. AOD is highly positively related to PM2.5, and the correlation coefficient can even exceed 0.75 in some regions [49]. However, a large number of studies have found significant regional differences in aerosol chemical composition and meteorological fields, which may weaken this relationship in some regions or even make the correlation negative, limiting the application of the estimation model.

It is difficult to observe the properties of aerosols because of the complex atmospheric conditions in a given area [50,51]. To achieve accurate statistical analysis of the relationship between AOD and PM2.5, several additional parameters must be introduced into the estimation model. Theoretically, PM2.5 concentrations are closely related to near-surface meteorological factors such as wind speed, wind direction, and relative humidity [52,53,54]. For instance, an increase in relative humidity promotes the hygroscopic growth of pollutants and dust-haze [55]. It was found that when the relative humidity reached 98–99%, the optical characterization of AOD changed by 25%, whereas when the relative humidity reached 50–80%, it only changed by 5% [56]. Concurrently, the turbulent processes in the atmospheric boundary layer play an important role in the diffusion and dilution of pollutants. When the turbulence activity decreases, the height of the mixing layer decreases and the atmospheric stability increases. Pollutants and water vapor are concentrated in the boundary layer, resulting in a large amount of aerosol accumulation [57]. Hoff revised the formula for estimating PM2.5 from AOD by considering the impacts of relative humidity and atmospheric boundary layer height as follows [6]:

\tau = \mathrm{PM}_{2.5} \, H \, f(\mathrm{RH}) \, \frac{3 Q_{\mathrm{ext,dry}}}{4 \rho \, r_{\mathrm{eff}}}

(10)

where H is the uniformly mixed atmospheric planetary boundary layer height; τ is the aerosol optical thickness; f(RH) is the ratio of ambient extinction coefficient to dry extinction coefficient; ρ is the aerosol mass density (g m⁻³); Q_ext,dry is the extinction coefficient of Mie scattering under dry conditions; and r_eff is the area-weighted average particle radius.
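Equation (10) can be inverted to give a first-order PM2.5 estimate from AOD, which shows how boundary layer height and humidity enter the relationship. The sketch below is a worked example under stated assumptions: every numerical value (AOD, H, f(RH), Q_ext,dry, ρ, r_eff) is hypothetical and chosen only to give round numbers, and the function names are illustrative.

```python
def aod_from_pm25(pm25, pbl_height, f_rh, q_ext_dry, density, r_eff):
    """Equation (10): tau = PM2.5 * H * f(RH) * 3 Q / (4 rho r_eff)."""
    return pm25 * pbl_height * f_rh * 3.0 * q_ext_dry / (4.0 * density * r_eff)


def pm25_from_aod(tau, pbl_height, f_rh, q_ext_dry, density, r_eff):
    """Equation (10) solved for PM2.5 (same units as density)."""
    return tau * 4.0 * density * r_eff / (3.0 * q_ext_dry * pbl_height * f_rh)


# Hypothetical inputs in SI-style units.
tau = 0.3            # aerosol optical depth (dimensionless)
pbl_height = 1000.0  # H, m
f_rh = 1.5           # hygroscopic growth factor (dimensionless)
q_ext_dry = 2.0      # dry extinction term (dimensionless)
density = 1.5e6      # rho, g m^-3 (1.5 g cm^-3)
r_eff = 0.5e-6       # effective radius, m

pm25_g = pm25_from_aod(tau, pbl_height, f_rh, q_ext_dry, density, r_eff)
pm25_ug = pm25_g * 1.0e6  # convert g m^-3 to ug m^-3
print(round(pm25_ug, 2))
```

For a fixed AOD, doubling either H or f(RH) halves the implied surface PM2.5, which is why the boundary layer height and relative humidity are included among the model inputs.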

It should be pointed out that quantitative analysis of the statistical relationships between the input parameters of machine learning models and PM2.5 concentrations in the study area is necessary to screen out effective variables [58]. Since machine learning models are “black boxes”, it is difficult to explain the estimation results with physicochemical mechanisms or mathematical formulas. Currently, it is not possible to improve machine learning models directly on the basis of the relevant physicochemical mechanisms. To overcome this limitation, statistical analysis of the input parameters before estimation can help deepen the understanding of the physicochemical relationships between PM2.5 and other parameters.

Figure 5 and Figure 6 show the diurnal mean distributions of EPA observed PM2.5 concentration and AERONET AOD data in the fall of 2018 in our study area. As seen from the figures, there were two peaks at around 15:00–18:00 UTC and 00:00–03:00 UTC before the wildfire outbreak, consistent with the diurnal pattern revealed in previous studies [59]. The AOD data and PM2.5 data changed synchronously. Both showed high values during the wildfire outbreaks, which emit large amounts of pollutants through biomass combustion. Therefore, in this study, AOD was applied as the major factor to estimate ground-level PM2.5 concentration. Note that during the wildfire period there were two peaks in the AOD data, while there was only one peak for PM2.5, later in the day. This is because PM2.5 is not only related to AOD, but is also influenced by meteorological factors [45]. As shown in Figure 5, PM2.5 data are more constant during the wildfire period than AOD. A deeper reason is that AOD had data gaps around the fire points, and some smoke plumes are easily misclassified as cloud. The influence of smoke plumes and heterogeneous surfaces leads to bias in the satellite AOD retrieval. The physicochemical processes connecting uplifted smoke plumes to surface concentrations need further study in the future.

To further investigate the quantitative relationship between AOD and ground PM2.5 concentration, Figure 7a,b show the annual mean spatial distributions of AOD at AERONET stations and PM2.5 at ground EPA stations, respectively. The results indicate that the two share a spatial distribution, with high values in the northwestern region and lower values in the southern region, which is consistent with the transport of wildfire smoke according to the HYSPLIT (Hybrid Single Particle Lagrangian Integrated Trajectory) online model (https://www.ready.noaa.gov/HYSPLIT.php (accessed on 10 March 2021)). The HYSPLIT forward trajectories in Figure 8 show that the pollutants transported to the center of California from 11 to 25 November 2018 came mainly from the north. PM2.5 concentration usually increases in the area surrounding burning biomass.

Figure 7c shows the AOD distribution as observed from satellite, which also follows the spatial distribution seen at the ground stations. Figure 7d further verifies that there is a high correlation between PM2.5 and AOD in this study area (R² = 0.42). Overall, it is feasible to use satellite AOD observations and machine learning algorithms to estimate the ground-level PM2.5 concentration in this study area.
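
For reference, an R² of the kind quoted for the PM2.5 versus AOD scatter plot can be computed as the squared Pearson correlation, which equals the coefficient of determination of a simple linear fit with intercept. The sketch below assumes that definition (the text does not state which one was used), and the function name and sample values are hypothetical.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    return (sxy * sxy) / (sxx * syy)


# Hypothetical matched pairs of AOD (unitless) and PM2.5 (µg m⁻³).
aod = [1.0, 2.0, 3.0]
pm25 = [1.0, 3.0, 2.0]
r2 = r_squared(aod, pm25)
```

Because this measure is symmetric in its two arguments, it describes co-variation only; it does not by itself say how well AOD alone would predict PM2.5 at an unmonitored location.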

4.3. High-Resolution PM2.5 Concentration Estimation Based on AOD Fusion Products

Figure 9 compares the accuracy of the DBN and Geoi-DBN algorithms in estimating ground-level PM2.5 concentrations during the California wildfire period in the fall of 2018. In 10-fold cross-validation between measured and predicted PM2.5 concentrations, the original DBN model had an R² of 0.67 and an RMSE of 10.46 µg m⁻³, whereas the Geoi-DBN model had an R² of 0.86 and an RMSE of 6.84 µg m⁻³. The accuracy of the Geoi-DBN algorithm was therefore clearly better than that of the original DBN algorithm. This indicates that the Geoi-DBN algorithm estimates ground PM2.5 concentration more accurately by introducing a priori information from the PM2.5 observations at surrounding stations: through the S_PM2.5 and T_PM2.5 variables, the estimated PM2.5 concentrations are more strongly correlated with the PM2.5 concentrations in their spatiotemporal neighborhood. Figure 9b,d show the estimated spatial distributions of PM2.5 concentration. Both algorithms estimated the PM2.5 distribution well, since both belong to the DBN framework. PM2.5 concentrations were markedly higher in the north and lower in the south, and the spatial distributions estimated by the two algorithms were consistent with the ground station observations, ranging from 0 µg m⁻³ to 80 µg m⁻³. However, the original DBN algorithm overestimated to some extent in the high-value area, and its fitted slope was lower than that of Geoi-DBN because it also produced a number of underestimated values. This phenomenon needs further research.
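
To make the role of the neighboring-station information more concrete, the following is a minimal sketch of a Gaussian-kernel-weighted average of nearby PM2.5 observations, in the spirit of the S_PM2.5 variable. It is an illustration under stated assumptions, not the authors' formulation: the kernel form exp(−d²/(2h²)), the bandwidth value, the use of great-circle distance, and all function names and sample values are assumptions.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km on a sphere of radius 6371 km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def gaussian_weighted_pm25(target, neighbours, bandwidth_km):
    """Gaussian-kernel-weighted mean of neighbouring PM2.5 observations.

    target: (lat, lon) of the location to be estimated.
    neighbours: list of (lat, lon, pm25) for surrounding stations.
    bandwidth_km: kernel bandwidth h (assumed; not taken from the paper).
    """
    if bandwidth_km <= 0:
        raise ValueError("bandwidth_km must be positive")
    num = 0.0
    den = 0.0
    for lat, lon, pm25 in neighbours:
        d = haversine_km(target[0], target[1], lat, lon)
        w = math.exp(-(d * d) / (2.0 * bandwidth_km ** 2))
        num += w * pm25
        den += w
    if den == 0.0:
        raise ValueError("no neighbour within effective kernel range")
    return num / den


# Hypothetical stations: one about 1 km away, one about 220 km away.
s_pm25 = gaussian_weighted_pm25(
    (36.0, -120.0),
    [(36.01, -120.0, 80.0), (38.0, -120.0, 10.0)],
    bandwidth_km=50.0,
)
```

With a 50 km bandwidth the distant station receives a negligible weight, so the result stays close to the nearby reading. A temporal analogue (in the spirit of T_PM2.5) could weight earlier observations at the same station by time lag in the same way.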

Furthermore, when catastrophic air pollution events such as wildfires occur, PM2.5 concentration observations often need to be of high resolution in order to provide timely and accurate services for disaster prevention and mitigation. To further validate the performance of the Geoi-DBN algorithm, Figure 10 and Figure 11 show, for the two peak periods of hourly PM2.5 concentration seen in Figure 5 and Figure 6, the observed PM2.5 concentrations at the ground-based sites and the corresponding Geoi-DBN estimates on 10 November 2018 (the outbreak of the wildfire), 16 November 2018 (the development of the wildfire), and 25 November 2018 (the end of the wildfire), together with the gridded spatial distribution maps of the estimated PM2.5 concentration. During the wildfire, values continued to increase in the high-value region, while the extent of the high-value region decreased. Ten sites in the north reached 100 µg m⁻³, indicating a serious PM2.5 exposure risk. Afterwards, as the disaster was brought under control, both the concentration and the affected area of PM2.5 dropped significantly, with concentrations falling below 20 µg m⁻³. This further demonstrates that the high spatial and temporal resolution PM2.5 concentration (1 km and hourly) estimated by the Geoi-DBN algorithm from fused AOD data can represent the variation of ground PM2.5 during a wildfire well.

4.4. Uncertainty Analysis

The main goal of this paper was to predict the spatial distribution of PM2.5 at places with no ground monitors while leveraging the high resolution and full coverage of the AOD data. Few studies have examined the relationship between ground PM2.5 and AOD at finer spatial and temporal resolution, and the multi-source fused AOD used in the estimation still carries uncertainty. Given these research gaps (downscaling, fusion, and prediction uncertainty), Figure 12 shows the long time series of estimated PM2.5 concentration (Figure 12a) and fused AOD (Figure 12b), together with histograms of their deviations from the ground-observed data. The values shown are means over the AERONET sites (PM2.5 stations) and over the matched fused AOD (estimated PM2.5). The fused AOD and the estimated PM2.5 concentrations fluctuated similarly, with lower values during the non-fire period and higher values during the fire period. The bias of the fused AOD product (from −0.2 to 0.4) and the bias of the estimated PM2.5 (from −20 µg m⁻³ to 40 µg m⁻³) did not differ distinctly in distribution. The mean absolute errors were 0.063 (AOD) and 6.74 µg m⁻³ (PM2.5), respectively. It should be noted that on some days during the wildfire outbreak the error increased rather than decreased, which may be caused by the absence of available observational data in the spatiotemporal neighborhood.
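
The error statistics in this kind of uncertainty analysis can be sketched as follows. The sign convention (estimate minus observation), the function name, and the sample values are assumptions made for illustration; the sketch returns the per-sample bias that would feed a histogram, its range, and the mean absolute error and RMSE.

```python
import math


def error_summary(estimated, observed):
    """Bias (estimated - observed), its range, MAE and RMSE."""
    if len(estimated) != len(observed) or not estimated:
        raise ValueError("inputs must be non-empty and of equal length")
    bias = [e - o for e, o in zip(estimated, observed)]
    n = len(bias)
    return {
        "bias": bias,  # per-sample deviations, e.g. for a histogram
        "min_bias": min(bias),
        "max_bias": max(bias),
        "mae": sum(abs(b) for b in bias) / n,
        "rmse": math.sqrt(sum(b * b for b in bias) / n),
    }


# Hypothetical daily mean PM2.5 values (µg m⁻³); not data from the study.
est = [12.0, 18.0, 35.0, 50.0]
obs = [10.0, 20.0, 30.0, 55.0]
summary = error_summary(est, obs)
```

The same function applies unchanged to fused AOD against AERONET AOD, since it makes no assumption about units.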

5. Conclusions

This paper has developed an improved machine learning model, the Geoi-DBN, which accounts for spatiotemporal autocorrelation through a Gaussian kernel function, to estimate ground PM2.5 concentrations over central and southern California during the wildfire period in the fall of 2018. In the estimation model, high spatial and temporal resolution AOD generated from fused AOD products, together with meteorological reanalysis data, was used as the main input to improve the accuracy of the estimated ground PM2.5 concentration. The AOD fusion was based on multi-scale and multi-source heterogeneous AOD data such as MERRA-2, GOES-16, MAIAC, and AERONET AOD products, combined through the MQQA-BME algorithm. The seamless AOD data generated by this fusion could realistically reflect the distribution characteristics of AOD: compared with the AERONET AOD data, the cross-validated R² reached 0.69 and the RMSE was 0.072. The Geoi-DBN model could provide PM2.5 concentration data with high spatial and temporal resolution, meeting the timeliness and accuracy requirements for monitoring data when catastrophic air pollution events such as fires occur. The Geoi-DBN algorithm performed clearly better than the original DBN algorithm, with a cross-validated R² of 0.86 and an RMSE of 6.84 µg m⁻³.

Although the Geoi-DBN model performed better than the original DBN model in ground PM2.5 estimation, there are several ways to improve the proposed model that deserve further investigation. First, more input variables and longer archives can be used to further improve the model. Second, other sophisticated approaches such as deep learning could be used to refine the estimation accuracy for ground PM2.5 concentrations. Finally, the resolution of AOD and PM2.5 concentration can be improved in the near future as advanced sensors come into operation.

Author Contributions

X.W. designed the study. Q.C. collected and processed the data, analyzed the results, and wrote the original draft. F.Z. provided constructive comments on the paper. S.F., Y.M. and K.W. revised the paper. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China, grant number 42075125 and Shanghai Pujiang Program, grant number 20PJ1401800. The APC was sponsored by Shanghai Qi Zhi Institute.

Data Availability Statement

The data presented in this study are available on request from the corresponding author and first co-author.

Acknowledgments

The authors would like to express their gratitude to the data providers worldwide who supplied the data and information used in this study.

Conflicts of Interest

The authors declare no conflict of interest.

References

Kloog, I.; Koutrakis, P.; Coull, B.A.; Lee, H.J.; Schwartz, J. Assessing Temporally and Spatially Resolved PM2.5 Exposures for Epidemiological Studies Using Satellite Aerosol Optical Depth Measurements. Atmos. Environ. 2011, 45, 6267–6275.
Li, Z.; Roy, D.P.; Zhang, H.K.; Vermote, E.F.; Huang, H. Evaluation of Landsat-8 and Sentinel-2A Aerosol Optical Depth Retrievals across Chinese Cities and Implications for Medium Spatial Resolution Urban Aerosol Monitoring. Remote Sens. 2019, 11, 122.
Engel-Cox, J.A.; Holloman, C.H.; Coutant, B.W.; Hoff, R.M. Qualitative and Quantitative Evaluation of MODIS Satellite Sensor Data for Regional and Urban Scale Air Quality. Atmos. Environ. 2004, 38, 2495–2509.
Wang, J.; Christopher, S.A. Intercomparison between Satellite-Derived Aerosol Optical Thickness and PM2.5 Mass: Implications for Air Quality Studies. Geophys. Res. Lett. 2003, 30, 30.
Guo, J.P.; Zhang, X.Y.; Che, H.Z.; Gong, S.L.; An, X.; Cao, C.X.; Guang, J.; Zhang, H.; Wang, Y.Q.; Zhang, X.C.; et al. Correlation between PM Concentrations and Aerosol Optical Depth in Eastern China. Atmos. Environ. 2009, 43, 5876–5886.
Hoff, R.M.; Christopher, S.A. Remote Sensing of Particulate Pollution from Space: Have We Reached the Promised Land? J. Air Waste Manag. Assoc. 2009, 59, 645–675.
Liu, H.; Pinker, R.T.; Holben, B.N. A Global View of Aerosols from Merged Transport Models, Satellite, and Ground Observations. J. Geophys. Res. D Atmos. 2005, 110, 1–16.
Ma, Z.; Hu, X.; Sayer, A.M.; Levy, R.; Zhang, Q.; Xue, Y.; Tong, S.; Bi, J.; Huang, L.; Liu, Y. Satellite-Based Spatiotemporal Trends in PM2.5 Concentrations: China, 2004–2013. Environ. Health Perspect. 2016, 124, 184–192.
Tian, J.; Chen, D. A Semi-Empirical Model for Predicting Hourly Ground-Level Fine Particulate Matter (PM2.5) Concentration in Southern Ontario from Satellite Remote Sensing and Ground-Based Meteorological Measurements. Remote Sens. Environ. 2010, 114, 221–229.
Gupta, P.; Christopher, S.A. Particulate Matter Air Quality Assessment Using Integrated Surface, Satellite, and Meteorological Products: 2. A Neural Network Approach. J. Geophys. Res. Atmos. 2009, 114, 114.
Bai, Y.; Wu, L.; Qin, K.; Zhang, Y.; Shen, Y.; Zhou, Y. A Geographically and Temporally Weighted Regression Model for Ground-Level PM2.5 Estimation from Satellite-Derived 500 m Resolution AOD. Remote Sens. 2016, 8, 262.
Ma, Z.; Liu, Y.; Zhao, Q.; Liu, M.; Zhou, Y.; Bi, J. Satellite-Derived High Resolution PM2.5 Concentrations in Yangtze River Delta Region of China Using Improved Linear Mixed Effects Model. Atmos. Environ. 2016, 133, 156–164.
Li, M.; Ouyang, T.; Roberts, A.P.; Heslop, D.; Zhu, Z.; Zhao, X.; Tian, C.; Peng, S.; Zhong, H.; Peng, X.; et al. Influence of Sea Level Change and Centennial East Asian Monsoon Variations on Northern South China Sea Sediments Over the Past 36 Kyr. Geochem. Geophys. Geosystems 2018, 19, 1674–1689.
You, W.; Zang, Z.; Zhang, L.; Li, Y.; Pan, X.; Wang, W. National-Scale Estimates of Ground-Level PM2.5 Concentration in China Using Geographically Weighted Regression Based on 3 Km Resolution MODIS AOD. Remote Sens. 2016, 8, 184.
You, W.; Zang, Z.; Zhang, L.; Li, Y.; Wang, W. Estimating National-Scale Ground-Level PM2.5 Concentration in China Using Geographically Weighted Regression Based on MODIS and MISR AOD. Environ. Sci. Pollut. Res. 2016, 23, 8327–8338.
Liu, X.; Qu, X. Characteristics of Air Pollutant Concentration Change and Its Relationship with Meteorological Conditions in Wuhan 2017. In Proceedings of the 2018 International Conference on Energy, Power, Electrical and Environmental Engineering (EPEEE 2018), Wuhan, China, 27–28 September 2018; pp. 63–68.
Zou, B.; Wang, M.; Wan, N.; Wilson, J.G.; Fang, X.; Tang, Y. Spatial Modeling of PM2.5 Concentrations with a Multifactoral Radial Basis Function Neural Network. Environ. Sci. Pollut. Res. 2015, 22, 10395–10404.
Liu, Y.; Park, R.J.; Jacob, D.J.; Li, Q.; Kilaru, V.; Sarnat, J.A. Mapping Annual Mean Ground-Level PM2.5 Concentrations Using Multiangle Imaging Spectroradiometer Aerosol Optical Thickness over the Contiguous United States. J. Geophys. Res. D Atmos. 2004, 109, 1–10.
Wang, Z.; Chen, L.; Tao, J.; Zhang, Y.; Su, L. Satellite-Based Estimation of Regional Particulate Matter (PM) in Beijing Using Vertical-and-RH Correcting Method. Remote Sens. Environ. 2010, 114, 50–63.
Zhang, Y.; Li, Z. Remote Sensing of Atmospheric Fine Particulate Matter (PM2.5) Mass Concentration near the Ground from Satellite Observation. Remote Sens. Environ. 2015, 160, 252–262.
Fang, X.; Zou, B.; Liu, X.; Sternberg, T.; Zhai, L. Satellite-Based Ground PM2.5 Estimation Using Timely Structure Adaptive Modeling. Remote Sens. Environ. 2016, 186, 152–163.
Lim, H.; Choi, M.; Kim, J.; Kasai, Y.; Chan, P.W. AHI/Himawari-8 Yonsei Aerosol Retrieval (YAER): Algorithm, Validation and Merged Products. Remote Sens. 2018, 10, 699.
Russell, A.; Dennis, R. NARSTO Critical Review of Photochemical Models and Modeling. Atmos. Environ. 2000, 34, 2283–2324.
Meng, Z.; Donald, D.; John, H. Seinfeld. Size-Resolved and Chemically Resolved Model of Atmospheric Aerosol Dynamics. Environ. Eng. 1998, 103, 3419–3435.
Matter, P.; Pai, P.; Grosjean, D. Modeling Atmospheric Particulate Matter. Environ. Sci. Technol./News 1999, 33, 80A–86A.
Tang, Q.; Bo, Y.; Zhu, Y. Spatiotemporal Fusion of Multiple-Satellite Aerosol Optical Depth (AOD) Products Using Bayesian Maximum Entropy Method. J. Geophys. Res. 2016, 121, 4034–4048.
Yang, J.; Hu, M. Filling the Missing Data Gaps of Daily MODIS AOD Using Spatiotemporal Interpolation. Sci. Total Environ. 2018, 633, 677–683.
Gupta, P.; Patadia, F.; Christopher, S.A. Multisensor Data Product Fusion for Aerosol Research. IEEE Trans. Geosci. Remote Sens. 2008, 46, 1407–1415.
Sogacheva, L.; Popp, T.; Sayer, A.M.; Dubovik, O.; Garay, M.J.; Heckel, A.; Christina Hsu, N.; Jethva, H.; Kahn, R.A.; Kolmonen, P.; et al. Merging Regional and Global Aerosol Optical Depth Records from Major Available Satellite Products. Atmos. Chem. Phys. 2020, 20, 2031–2056.
Xiao, Q.; Wang, Y.; Chang, H.H.; Meng, X.; Geng, G.; Lyapustin, A.; Liu, Y. Full-Coverage High-Resolution Daily PM2.5 Estimation Using MAIAC AOD in the Yangtze River Delta of China. Remote Sens. Environ. 2017, 199, 437–446.
Wei, X.; Chang, N.B.; Bai, K. A Comparative Assessment of Multisensor Data Merging and Fusion Algorithms for High-Resolution Surface Reflectance Data. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 13, 4044–4059.
Wei, X.; Bai, K.; Chang, N.B.; Gao, W. Multi-Source Hierarchical Data Fusion for High-Resolution AOD Mapping in a Forest Fire Event. Int. J. Appl. Earth Obs. Geoinf. 2021, 102, 102366.
Li, T.; Shen, H.; Yuan, Q.; Zhang, X.; Zhang, L. Estimating Ground-Level PM2.5 by Fusing Satellite and Station Observations: A Geo-Intelligent Deep Learning Approach. Geophys. Res. Lett. 2017, 44, 11985–11993.
Bai, K.; Li, K.; Chang, N.B.; Gao, W. Advancing the Prediction Accuracy of Satellite-Based PM2.5 Concentration Mapping: A Perspective of Data Mining through in Situ PM2.5 Measurements. Environ. Pollut. 2019, 254, 113047.
Wei, J.; Huang, W.; Li, Z.; Xue, W.; Peng, Y.; Sun, L.; Cribb, M. Estimating 1-Km-Resolution PM2.5 Concentrations across China Using the Space-Time Random Forest Approach. Remote Sens. Environ. 2019, 231, 111221.
Hristopulos, D.T.; Christakos, G. Practical Calculation of Non-Gaussian Multivariate Moments in Spatiotemporal Bayesian Maximum Entropy Analysis. Math. Geol. 2001, 33, 543–568.
Bayat, B.; Zahraie, B.; Taghavi, F.; Nasseri, M. Evaluation of Spatial and Spatiotemporal Estimation Methods in Simulation of Precipitation Variability Patterns. Theor. Appl. Climatol. 2013, 113, 429–444.
Xu, Y.; Serre, M.L.; Reyes, J.; Vizuete, W. Bayesian Maximum Entropy Integration of Ozone Observations and Model Predictions: A National Application. Environ. Sci. Technol. 2016, 50, 4393–4400.
Hinton, G.E.; Osindero, S.; Teh, Y.W. A Fast Learning Algorithm for Deep Belief Nets. Neural Comput. 2006, 18, 1527–1554.
Zhang, J.; Li, S. A Deep Learning Scheme for Mental Workload Classification Based on Restricted Boltzmann Machines. Cogn. Technol. Work 2017, 19, 607–631.
Le Roux, N.; Bengio, Y. Representational Power of Restricted Boltzmann Machines and Deep Belief Networks. Neural Comput. 2008, 20, 1631–1649.
Fletcher, D.; Goss, E. Forecasting with Neural Networks. An Application Using Bankruptcy Data. Inf. Manag. 1993, 24, 159–167.
Yuan, Q.; Zhang, L.; Shen, H. Hyperspectral Image Denoising Employing a Spectral-Spatial Adaptive Total Variation Model. IEEE Trans. Geosci. Remote Sens. 2012, 50, 3660–3677.
Chen, Y. An Analytical Process of Spatial Autocorrelation Functions Based on Moran’s Index. PLoS ONE 2021, 16, e0249589.
Junghenn Noyes, K.T.; Kahn, R.A.; Limbacher, J.A.; Li, Z. Canadian and Alaskan Wildfire Smoke Particle Properties, Their Evolution, and Controlling Factors, from Satellite Observations. Atmos. Chem. Phys. Discuss. 2021, 1–34.
Jinnagara Puttaswamy, S.; Nguyen, H.M.; Braverman, A.; Hu, X.; Liu, Y. Statistical Data Fusion of Multi-Sensor AOD over the Continental United States. Geocarto Int. 2014, 29, 48–64.
Gueymard, C.A.; Yang, D. Worldwide Validation of CAMS and MERRA-2 Reanalysis Aerosol Optical Depth Products Using 15 Years of AERONET Observations. Atmos. Environ. 2020, 225, 117216.
Levy, R.C.; Remer, L.A.; Martins, J.V.; Kaufman, Y.J.; Plana-Fattori, A.; Redemann, J.; Wenny, B. Evaluation of the MODIS Aerosol Retrievals over Ocean and Land during CLAMS. J. Atmos. Sci. 2005, 62, 974–992.
Guo, Y.; Feng, N.; Christopher, S.A.; Kang, P.; Zhan, F.B.; Hong, S. Satellite Remote Sensing of Fine Particulate Matter (PM2.5) Air Quality over Beijing Using MODIS. Int. J. Remote Sens. 2014, 35, 6522–6544.
Ramachandran, S.; Cherian, R. Regional and Seasonal Variations in Aerosol Optical Characteristics and Their Frequency Distributions over India during 2001–2005. J. Geophys. Res. Atmos. 2008, 113, 1–16.
Koelemeijer, R.B.A.; Homan, C.D.; Matthijsen, J. Comparison of Spatial and Temporal Variations of Aerosol Optical Thickness and Particulate Matter over Europe. Atmos. Environ. 2006, 40, 5304–5315.
Han, Y.; Wu, Y.; Wang, T.; Zhuang, B.; Li, S.; Zhao, K. Impacts of Elevated-Aerosol-Layer and Aerosol Type on the Correlation of AOD and Particulate Matter with Ground-Based and Satellite Measurements in Nanjing, Southeast China. Sci. Total Environ. 2015, 532, 195–207.
Just, A.C.; Wright, R.O.; Schwartz, J.; Coull, B.A.; Baccarelli, A.A.; Tellez-Rojo, M.M.; Moody, E.; Wang, Y.; Lyapustin, A.; Kloog, I. Using High-Resolution Satellite Aerosol Optical Depth to Estimate Daily PM2.5 Geographical Distribution in Mexico City. Environ. Sci. Technol. 2015, 49, 8576–8584.
Paciorek, C.J.; Liu, Y. Limitations of Remotely Sensed Aerosol as a Spatial Proxy for Fine Particulate Matter. Environ. Health Perspect. 2009, 117, 904–909.
Tang, G.; Zhang, J.; Zhu, X.; Song, T.; Münkel, C.; Hu, B.; Schäfer, K.; Liu, Z.; Zhang, J.; Wang, L.; et al. Mixing Layer Height and Its Implications for Air Pollution over Beijing, China. Atmos. Chem. Phys. 2016, 16, 2459–2475.
Gupta, P.; Christopher, S.A.; Wang, J.; Gehrig, R.; Lee, Y.; Kumar, N. Satellite Remote Sensing of Particulate Matter and Air Quality Assessment over Global Cities. Atmos. Environ. 2006, 40, 5880–5892.
Zhang, X.Y.; Wang, Y.Q.; Niu, T.; Zhang, X.C.; Gong, S.L.; Zhang, Y.M.; Sun, J.Y. Atmospheric Aerosol Compositions in China: Spatial/Temporal Variability, Chemical Signature, Regional Haze Distribution and Comparisons with Global Aerosols. Atmos. Chem. Phys. 2012, 12, 779–799.
Van Donkelaar, A.; Martin, R.V.; Brauer, M.; Boys, B.L. Use of Satellite Observations for Long-Term Exposure Assessment of Global Concentrations of Fine Particulate Matter. Environ. Health Perspect. 2015, 123, 135–143.
Guo, J.; Xia, F.; Zhang, Y.; Liu, H.; Li, J.; Lou, M.; He, J.; Yan, Y.; Wang, F.; Min, M.; et al. Impact of Diurnal Variability and Meteorological Factors on the PM2.5–AOD Relationship: Implications for PM2.5 Remote Sensing. Environ. Pollut. 2017, 221, 94–104.

Figure 1. The study area of central and southern California as well as the locations of the PM2.5 stations.

Figure 2. Structure of the DBN model.

Figure 3. (a,b) Spatial and temporal covariance models for the PM2.5 data.

Figure 4. Spatial distribution of R², MAD, and RMSE of the final fused AOD against AERONET observed AOD data.

Figure 5. Daily and hourly distribution of the observed ground PM2.5 concentration (a) before the wildfire outbreak; (b) during the wildfire outbreak (two peak periods at 15:00–18:00 UTC and 00:00–05:00 UTC).

Figure 6. Daily and hourly distribution of observed AOD data of AERONET sites (a) before wildfire outbreak; (b) during the wildfire outbreak.

Figure 7. The comparison of AOD and PM2.5 during the wildfire period. (a) Distribution of daily mean PM2.5 concentration at EPA sites; (b) Daily mean AOD data of AERONET sites; (c) Spatial distribution map of AOD data by MAIAC AOD data; (d) Scatter plot of ground PM2.5 concentration vs. fused AOD data.

Figure 8. Fifteen days forward trajectories starting on 11 November 2018.

Figure 9. Comparison between the DBN model and Geoi-DBN model in estimating PM2.5 concentration during the wildfire period in 2018: (a) Scatter plot of cross-validation of the DBN model; (b) Spatial distribution of the estimated PM2.5 concentration by the DBN model; (c) Scatter plot of cross-validation of the Geoi-DBN model; (d) Spatial distribution of the estimated PM2.5 concentration by the Geoi-DBN model.

Figure 10. Distribution of PM2.5 concentration at 18:00 UTC peak. (a–c) Ground observed PM2.5 concentration; (d–f) Estimated PM2.5 concentration at the corresponding EPA sites based on Geoi-DBN model; (g–i) Spatial distribution map of estimated PM2.5 concentration based on Geoi-DBN.

Figure 11. Distribution of PM2.5 concentration at 00:00 UTC peak. (a–c) Ground observed PM2.5 concentration; (d–f) Estimated PM2.5 concentration at the corresponding EPA sites based on the Geoi-DBN model; (g–i) Spatial distribution map of estimated PM2.5 concentration based on Geoi-DBN.

Figure 12. Variation curves of (a) PM2.5 concentration and (b) AOD and their bias relative to the ground observed data (DOY: Day of Year).

Table 1. The accuracy result of various AOD data.

AOD	R²	RMSE	MAD	N
MERRA-2	0.41	0.10	0.05	488
GOES 16	0.34	0.11	0.07	430
MERRA-2_GOES 16	0.30	0.14	0.10	674
MAIAC AOD	0.53	0.07	0.04	392
Final Fused AOD	0.48	0.08	0.05	674

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
