Next Article in Journal
Mobility, Out-of-Home Activity Participation and Needs Fulfilment in Later Life
Next Article in Special Issue
Spectral Feature Selection Optimization for Water Quality Estimation
Previous Article in Journal
Quality of Life: The Interplay between Human Behaviour, Technology and the Environment
Previous Article in Special Issue
Impact of Urbanization on Ecosystem Health: A Case Study in Zhuhai, China
 
 
Font Type:
Arial Georgia Verdana
Font Size:
Aa Aa Aa
Line Spacing:
Column Width:
Background:
Article

Global and Geographically and Temporally Weighted Regression Models for Modeling PM2.5 in Heilongjiang, China from 2015 to 2018

1
School of Forestry, Northeast Forestry University, Harbin 150040, China
2
Department of Forest and Natural Resources Management, State University of New York College of Environmental Science and Forestry, One Forestry Drive, Syracuse, New York, NY 13210, USA
3
Key Laboratory of Sustainable Forest Ecosystem Management-Ministry of Education, Northeast Forestry University, Harbin 150040, China
*
Author to whom correspondence should be addressed.
Int. J. Environ. Res. Public Health 2019, 16(24), 5107; https://0-doi-org.brum.beds.ac.uk/10.3390/ijerph16245107
Submission received: 20 November 2019 / Revised: 11 December 2019 / Accepted: 11 December 2019 / Published: 14 December 2019
(This article belongs to the Special Issue Spatio-Temporal Environmental Monitoring and Social Sensing)

Abstract

:
Objective: This study investigated the relationships between PM2.5 and 5 criteria air pollutants (SO2, NO2, PM10, CO, and O3) in Heilongjiang, China, from 2015 to 2018 using global and geographically and temporally weighted regression models. Methods: Ordinary least squares regression (OLS), linear mixed models (LMM), geographically weighted regression (GWR), temporally weighted regression (TWR), and geographically and temporally weighted regression (GTWR) were applied to model the relationships between PM2.5 and 5 air pollutants. Results: The LMM and all GWR-based models (i.e., GWR, TWR, and GTWR) showed great advantages over OLS in terms of higher model R2 and more desirable model residuals, especially TWR and GTWR. The GWR, LMM, TWR, and GTWR improved the model explanation power by 3%, 5%, 12%, and 12%, respectively, from the R2 (0.85) of OLS. TWR yielded slightly better model performance than GTWR and reduced the root mean squared errors (RMSE) and mean absolute error (MAE) of the model residuals by 67% compared with OLS; while GWR only reduced RMSE and MAE by 15% against OLS. LMM performed slightly better than GWR by accounting for both temporal autocorrelation between observations over time and spatial heterogeneity across the 13 cities under study, which provided an alternative for modeling PM2.5. Conclusions: The traditional OLS and GWR are inadequate for describing the non-stationarity of PM2.5. The temporal dependence was more important and significant than spatial heterogeneity in our data. Our study provided evidence of spatial–temporal heterogeneity and possible solutions for modeling the relationships between PM2.5 and 5 criteria air pollutants for Heilongjiang province, China.

1. Introduction

Air pollutants can be emitted from anthropogenic and natural sources and may be either emitted directly (primary pollutants) or formed in the atmosphere (as secondary pollutants) [1]. They may be transported or formed over long distances and have influences on human health, ecosystems, the built environment, and climate in large areas. It has been confirmed by extensive epidemiological studies that air pollution is closely associated with increased risks of mortality or morbidity for cardiovascular and respiratory diseases [2,3,4,5]. It was reported that air pollution ranked the 7th killer to the global public health and contributed to 3.2 million and 4.2 million premature deaths Worldwide in 2010 and 2016, respectively [6,7]. The European Environment Agency estimates that about 30% of Europe’s urban population is still exposed to air pollution concentrations exceeding the European Union air quality standards [1,8]. With the rapid urbanization and industrialization, China has experienced a serious challenge of preventing and controlling air pollution as well. About 1.2 million premature deaths in 2010 and 25 million disability-adjusted life-years annually in China resulted from ambient or outdoor air pollutions [9]. Ambient particulate matter pollution has become the 4th killer to the health of Chinese people in 2010, after dietary risk, high blood pressure, and smoking [6]. It is of great importance to prevent and control air pollution for constructing ecological civilization in China.
In 2012, the Chinese Ministry of Environmental Protection (MEP) released the revised Ambient Air Quality Standards and defined the 6 criteria air pollutants including the particular matter with aerodynamic diameter of 10 μm or less (PM10), the particular matter with aerodynamic diameter of 2.5 μm or less (PM2.5), ozone (O3), nitrogen dioxide (NO2), and sulfur dioxide (SO2), and carbon monoxide (CO) [10]. To mitigate the conflicts between air pollution and public health, the Chinese State Council issued the Air Pollution Prevention and Control Action Plan (APPCAP) in September 2013 and proposed a goal of reducing PM2.5 by 10% in major cities by the end of 2017 compared to 2012 levels [11]. The Plan was a milestone of air pollution prevention and control efforts in China and considered the most stringent air pollution control policy in China to date [12]. Although the improvements to air quality in most cities in recent years, decreasing PM2.5 in the Northern Plain of China is still a tough challenge and requires strong transregional collaboration [13]. Most studies focused on the levels of air pollution in the Beijing-Tianjin-Hebei region, the Yangtze River Delta, and the Pearl River Delta, which are the most developed regions in China in terms of a high level of per capita GDP and high population density (e.g., [14,15,16,17,18]). There is no adequate attention paid to the air pollution problems in northern China compared to above-developed regions, especially in the Heilongjiang Province.
To clarify the pollution level, characteristics, and main sources of particulate matter, the Heilongjiang government began to investigate the sources of atmospheric particulate matter within the province in 2013 [19]. It found that the main pollutants in the cities of Heilongjiang Province were inhalable particulate matter (PM10) and fine particulate matter (PM2.5). In particular, the increase in coal-fired emissions during the heating season (late October, November, December, January, February, and March) significantly increased the concentration of PM2.5. The local sources of PM10 and PM2.5 in Heilongjiang Province are mainly from coal-fired, followed by motor vehicle exhaust and dust. The contributions of different sources have certain seasonal variations [19]. Although the sources of air pollutants were explored, the spatial and temporal variation characteristics of different air pollutants in Heilongjiang Province were barely investigated.
Space and time are significant determinants of PM2.5. Similar to Tobler’s (1970) first law of geography that is “everything is related to everything else, but near things are more related than distant things” [20], many geographic processes (such as air pollution) followed similar rules in the temporal domain. Over the last two decades, researchers developed various statistical models to deal with spatial and/or temporal heterogeneities. Geographically weighted regression (GWR) was designed to extend the traditional global model fitted by ordinary least squares (OLS) and could effectively deal with spatial heterogeneity and autocorrelation in model errors [21,22,23]. GWR belongs to local modeling techniques and fits a regression model at each geographic location based on the neighbors within a specific bandwidth and distance-dependent weight function. In recent years, researchers have extended GWR to a temporal dimension for spatiotemporal modeling, which is called geographically and temporally weighted regression (GTWR) [24,25]. GTWR deals with both spatial and temporal nonstationarities simultaneously by constructing a weight matrix based on spatiotemporal distance. GTWR greatly expands the boundary of local modeling techniques, and has been applied in various fields, such as environment, economics, sociology, and so on [24,26,27,28,29,30]. As a special case of GTWR, GWR ignores the temporal non-stationarity, while TWR (temporally weighted regression) ignores the spatial non-stationarity thus that GTWR integrates GWR and TWR into one uniform framework.
Besides GTWR, linear mixed models (LMM) can be applied to deal with both temporal autocorrelations and spatial heterogeneity problems. LMM can account for the sources of heterogeneity and dependence in the data by using the random-effects and model temporal autocorrelations by using appropriate covariance matrices for the model residuals, thereby aiding in statistical inference [31]. LMM can improve model fitting and performance if a random variation is focused, particularly in the studies of ecological heterogeneity or the heritability of discrete characters [32]. However, LMM is not widely applied for air pollution data but particularly suitable to this study because it can deal with air pollutants as the fixed-effects and cities (spatial) as the random-effects within longitudinal observations (temporal). Therefore, LMM is an appropriate alternative to model PM2.5 in this study.
Given the need for a better understanding of spatial–temporal heterogeneity between the criteria air pollutants in Heilongjiang Province, spatial–temporal statistical techniques present an important tool to quantitatively describe the amount of air pollution at particular locations at specific times. The objective of this study was to model PM2.5 with the 5 criteria air pollutants (SO2, NO2, PM10, CO, and O3) in Heilongjiang Province, China, from 2015 to 2018 using global and geographically and temporally weighted regression models. Specifically, this study implemented 5 models, including ordinary least square regression (OLS) that was applied as the benchmark for model comparisons, linear mixed model (LMM) that considered different cities (or regions) as the random-effects and repeated measurements over time, traditional geographically weighted regression (GWR), temporally weighted regression (TWR), and geographically and temporally weighted regression (GTWR). The goodness-of-fit, prediction accuracy, uncertainty accuracy of models, and model residuals were evaluated and compared based on corrected Akaike’s information criterion (AICc), adjusted coefficient of determination ( R a 2 ), and root mean squared errors (RMSE), mean absolute error (MAE), normal (Z) scores, and Moran’s I of the model residuals. Understanding the spatial and temporal heterogeneity of various air pollutants can assist in the PM2.5 prediction and air pollutants control management in the future.

2. Materials and Methods

2.1. Study Area and Data

Heilongjiang Province is located in the Northeast of China between 43°26′ N and 53°33′ N Latitude, and 121°11′ E and 135°05′ E Longitude. It has 13 prefecture-level administrative regions, including 12 prefecture-level cities (i.e., Harbin, Qiqiha’er, Mudanjiang, Jiamusi, Daqing, Jixi, Shuangyashan, Yichun, Qitaihe, Hegang, Suihua, and Heihe) and 1 region (Da Xing’an Mountain) with a total area of 473,000 km2 (Figure 1). Generally, the terrain of Heilongjiang is high in the northwest, north, and southeast, low in the northeast and southwest, and is mainly composed of mountains, terraces, plains, and water. The elevations of mountains, terraces and plains are between 300 and 1000 m (accounting for 58% of the total area), between 200 and 350 m (14%), and between 50 and 200 m (28%), respectively. Heilongjiang Province belongs to the continental monsoon climate of cold temperate and temperate zone with a short frost-free period and has large regional differences in climate. The main characteristics are low-temperature and drought in spring, warm and rainy in summer, dry and early-frost in autumn, and long cold in winter. The precipitation is abundantly affected by the southeast monsoon in summer and insufficient controlled by the dry and cold northwest wind in winter, which presents obvious monsoon characteristics.
In this study, 6 criteria air pollutants, including SO2, NO2, PM10, CO, O3, and PM2.5, were obtained from the Environmental Monitoring Station of Heilongjiang Province and had already aggregated into the 12 prefecture-level cities and 1 region based on the 56 environmental monitoring sites in Heilongjiang Province. All the pollutants were recorded by μg/m3, except CO (mg/m3). The daily records of the criteria air pollutants were aggregated into weekly measurements of the 13 cities (region) of Heilongjiang Province from 2015 to 2018 (i.e., 210 weeks per city or region, and 2730 records in total). The descriptive statistics of the 6 air pollutants are summarized in Table 1.
Figure 2 presents the matrix of Pearson correlation coefficients (ρ) of the 6 air pollutants. It showed that PM2.5 was highly correlated (ρ >0.5) with PM10, NO2, CO, and SO2; whereas O3 was weakly (−0.25 < ρ < 0) to moderately (−0.5 < ρ < −0.25) correlated with the group of PM10, NO2 and PM2.5, and the group of SO2 and CO, respectively. Figure 3 presents the variability of the 6 criteria air pollutants cross locations (12 prefecture-level cities and 1 region) and periods (210 weeks from 2015 to 2018). There were obvious seasonal trends for all the air pollutants in most cities (region), especially in Harbin, Qiqiha’er, and Suihua. On the other hand, Heihe, Yichun, and Da Xing’an Mountain region showed relatively stable trends for all pollutants, except O3 compared to other cities. It indicated both spatial and temporal variability visually existing in Heilongjiang Province.

2.2. Methods

2.2.1. OLS and LMM

Traditional ordinary least square (OLS) regression was used as the benchmark for model comparisons as follows:
Y i = β 0 + β 1 X 1 + β 2 X 2 + β 3 X 3 + β 4 X 4 + β 5 X 5 + ε i
where Yi is the response variable (i.e., PM2.5 in this study), where i = 1, 2, …, n; β0 ~ β5 are the regression coefficients to be estimated from the data; the predictors X1 ~ X5 represent SO2, NO2, PM10, CO, and O3, respectively; and εi is the random error term following the normal distribution with zero mean and constant variance, i.e., N(0, σ2I), with I denoting an n × n identity matrix.
The model coefficient vector βT = [β0, β1, …, β5] of OLS is estimated by
β ^ = ( X T X ) 1 X T Y
where the superscript T denotes the transpose of a matrix, X and Y are the vectors of predictors and response variables, respectively. The OLS regression represents a universal or constant relationship between predictors and response variables across the entire study area.
The traditional OLS model was not only a global model but also a fixed-effects model for the predictors (SO2, NO2, PM10, CO, and O3), which were the only levels or factors under consideration for the statistical inference. In contrast, the random-effects were the factors that were randomly selected from an infinite population of the possible levels or factors and could vary if the experiment was implemented for another time thus that the statistical inference was direct towards the entire population of factor levels, not just those “random” levels that were incorporated into the experiment. In this study, the 13 cities (region) were considered as the random-effects because they were one subset of the cities in Heilongjiang Province. Thus, the linear mixed model was used to incorporate both fixed-effects (SO2, NO2, PM10, CO, and O3) and random-effects (13 cities), as well as account for time series observations across the 4 year periods. LMM was an extension of the linear models, and can be expressed as:
Y = X β + Z γ + ε
where Y is an n × 1 column vector of the response variable, X is an n × p matrix of the (p − 1) predictors with the first column of 1 for estimating the intercept coefficient, n is the number of sample observations, p is the number of fixed-effects parameters, β is an p × 1 vector of unknown fixed-effects parameters, Z is a known n × q design matrix for the q random-effects, γ is an q × 1 vector of unknown random-effects parameters, and ε is an n × 1 vector of the random model errors. LMM follows several assumptions:
{ E ( γ ) = 0   and   Var ( γ ) = G   E ( ε ) = 0   and   Var ( ε ) = R Cov ( γ , ε ) = 0 ,       γ ,   ε ~ N o r m a l Var ( Y ) = V = Z G Z + R
where E(∙), Var(∙), and Cov(∙) denote expectation, variance, and covariance, respectively; “Normal” represents “following a normal distribution”. The variance of Y can be estimated by random-effects design matrix Z, and covariance matrices G and R. The estimates of the fixed-effects and random-effects parameters can be expressed by Equations (5) and (6), respectively,
β ^ = ( X V ^ 1 X ) 1 X V ^ 1 Y
γ ^ = G ^ Z V ^ 1 ( Y X β ^ )
where G ^ , R ^ , and V ^ are the reasonable estimates of G, R, and V, respectively, from the data.

2.2.2. GWR Model and Parameter Estimation

GWR model extends the traditional ordinary least square regression from a global to local framework [24], and can be expressed as follows:
Y i = β 0 ( u i , v i ) + k = 1 p 1 β k ( u i , v i ) X i k + ε i               i = 1 ,   2 , , n
where Yi is the response variable, (ui, vi) denotes the coordinates of the location i in space, β0(ui,vi) and βk(ui,vi) represent the intercept and a set of (p − 1) slope parameters at the location i, respectively. Xik represents a set of (p − 1) predictors (k = 1, 2, …, p−1) at the ith location, p is the total number of parameters to be estimated, εi is the error term of location i. Comparing to the “fixed” coefficient estimates of the global OLS model (Equation (1)), GWR captures the spatial heterogeneity using varied parameter estimates over space.
GWR follows Tobler’s first law of geography [20], and is calibrated using a locally weighted least squares approach, and the estimation of parameters is obtained by the following equation:
β ^ ( u i , v i ) = ( X T W ( u i , v i ) X ) 1 X T W ( u i , v i ) Y
where W(ui,vi) is an n × n weight matrix whose off-diagonal elements are zero and diagonal elements denote the geographical weighting of the neighboring observations for the focal observation i as follows:
W ( u i , v i )   =   ( w i 1 0 0 0 w i 2 0 0 0 w in )
It is critical to select an appropriate weight matrix for estimating the parameters of GWR. The spatial weights can be estimated by a spatial kernel function, also called a distance-decay function. According to whether the bandwidth is varied, the 2 basic types of spatial kernels are fixed and adaptive kernels, which use fixed bandwidth and a fixed number of nearest neighbors within an adaptive bandwidth, respectively [33]. The commonly used spatial kernel functions include exponential kernel function, Gaussian kernel function, and bi-square kernel function [21]. In this study, the bi-square function was selected because it had the best (smallest) AICc for fitting the GWR model to the data. The adaptive kernel of bi-square function is defined as follows [34]:
w i j = { [ 1 ( d i j h i ) 2 ] 2 , i f   d i j < h i 0   , o t h e r w i s e
where dij is the distance between locations i and j; hi is the bandwidth used to estimate parameters at location i. The optimal bandwidth is usually selected based on a goodness-of-fit criterion such as cross-validation or Akaike Information Criterion (AIC) [25]. The corrected AIC (AICc) approach was applied for the optimal bandwidth selection in this study.

2.2.3. GTWR and TWR Models

GTWR is an extension of GWR with temporal variations and incorporates both spatial and temporal heterogeneity in the data. The spatiotemporal nonstationary in the parameter estimates is captured by constructing the weight matrix based on spatiotemporal distances [24,25]. The GTWR model can be expressed as follows:
Y i = β 0 ( u i , v i , t i ) + k = 1 p 1 β k ( u i , v i , t i ) X i k + ε i               i = 1 ,   2 , , n
The parameter β k ( u i , v i , t i ) should be estimated for every predictor k and every space-time location i. The estimation of β k ( u i , v i , t i ) is very similar to that in GWR (Equation (8)), and can be expressed as follows:
β ^ ( u i , v i , t i ) = ( X T W ( u i , v i , t i ) X ) 1 X T W ( u i , v i , t i ) Y
Since GTWR considers both space and time, a spatial–temporal weight matrix is constructed based on a spatio–temporal distance. In this study, the Euclidean spatio–temporal distance was applied and determined as follows:
d i j S T = λ [ ( u i u j ) 2 + ( v i v j ) 2 ] + μ ( t i t j ) 2
where λ and μ are the scale factors in space and time metric system, respectively, which are used to balance the different effects that measure the spatial and temporal distance; ti and tj are the observed times at the locations i and j. In this study, an adaptive Gaussian distance–decay function was applied to construct a weight matrix. The weight matrix was still a diagonal matrix in GTWR and the diagonal elements are determined as follows [24]:
w i j = exp { ( d i j S T ) 2 h S T 2 } = exp { ( d i j S ) 2 h S 2 } × exp { ( d i j T ) 2 h T 2 }
where h S T 2 is a parameter of spatio–temporal bandwidth, and h S 2 = h S T 2 / λ and h T 2 = h S T 2 / μ are spatial and temporal bandwidth, respectively. d i j S = [ ( u i u j ) 2 + ( v i v j ) 2 ] is the spatial distance, and d i j T = ( t i t j ) 2 is the temporal distance.
If no spatial variation exists in the observed data, the parameter λ would be zero (i.e., λ = 0) in Equation (13), that leads to the temporally weighted regression. TWR only considers the temporal variation based on the temporal distance. On the contrary, the parameter μ would be zero (i.e., μ = 0) in Equation (13) and GTWR would reduce to the traditional GWR without considering the temporal variation. Thus, both GWR and TWR are special cases of GTWR. All GWR-based local models (GWR, TWR, and GTWR) were performed using R package Gwmodel [35] under R version 3.5.1 [36].

2.2.4. Model Assessment

The model goodness-of-fit was evaluated by the corrected Akaike’s information criterion (AICc) [21], adjusted coefficient of determination ( R a 2 ), root mean squared errors (RMSE), mean absolute error (MAE), and Z score of the model residuals. AICc is one of the most commonly used goodness-of-fit criteria for model comparisons and defined as Equation (15). The smaller the AICc is, the better the model performs.
AICc = 2 n log e ( σ ^ ) + n log e ( 2 π ) + n { n + tr ( S ) n 2 tr ( S ) }
where n is the sample size, σ ^ is the estimated standard deviation of the error term, and tr(S) denotes the trace of hat matrix S (i.e., Y ^ = S Y ). The hat matrix is a function of bandwidth and defined as follows in the GWR model:
S = X ( X T W ( u i , v i ) X ) 1 X T W ( u i , v i )
The coefficient of determination (R2) is another basic and widely used good-of-fit (Equation (17)). It represents the percentage of the total variation in the observed Y that is explained by the model. However, R2 tends to exaggerate the explained percentage since it never decreases by adding more predictor variables. The adjusted coefficient of determination ( R a 2 ) overcomes this drawback by dividing RSS and SST by their associated degrees of freedom (see Equation (18)). As multiple predictors were included, R a 2 was adopted.
R 2 = 1 R S S S S T
R a 2 = 1 R S S / ( n p ) S S T / ( n 1 ) = 1 ( n 1 ) ( 1 R 2 ) ( n p )
where RSS is the residual sum of squares, and SST is the sum of squares of total variation of the response variable, n is the sample size, p is the number of coefficients (including all of predictor coefficients and intercept).
The R2 statistics of LMM is more complicated than the fixed-effects models and attracted many scientists’ attention [37,38]. In this study, the conditional R2 ( R c 2 ) was used, representing the variance explained by the entire model (including both fixed and random effects) [37], which can be expressed as follows:
R c 2 = σ f 2 + σ r 2 σ f 2 + σ r 2 + σ ε 2
where σ f 2 , σ r 2 , and σ ε 2 represent the variance of the fixed-effects, random-effects, and model residuals, respectively.
In addition to AICc and R a 2 , the root mean squared errors (RMSE), mean absolute error (MAE), and Z score of the model residuals are calculated to evaluate model performance by Equations (20−22), respectively:
R M S E = i = 1 n ( y i y ^ i ) 2 n
M A E = i = 1 n | y i y ^ i | n
Z = y i y ^ i s t d ( y ^ i )
where y i and y ^ i are the observed and predicted values of the response variable, and std denotes standard deviation.
The Moran’s I is commonly used to investigate the spatial dependencies in the model residuals from each regression model [23]. Moran’s I was positive when the model residuals tended to be similar, negative when they tended to be dissimilar, and approximately 0 when they arranged randomly and independently over space [39] and can be expressed as follows:
I = n W · i j w i j ( e i e ¯ ) ( e j e ¯ ) i ( e i e ¯ ) 2
where wij is the diagonal elements of the spatial weight matrix, W is the sum of all wij, ei denotes the residual at location i and e ¯ denotes the mean of residuals. The expected value of Moran’s I under the null hypothesis of no spatial autocorrelation (i.e., randomization) is E ( I ) = 1 n 1 . In this study, the Moran’s I of the model residuals were calculated and plotted against the number of nearest neighbors (i.e., bandwidth) to present the relationship between spatial autocorrelation of the residuals and bandwidth. The p-value (two-tailed) calculated for the null hypothesis test of randomization was also plotted against the number of nearest neighbors.

3. Results

3.1. OLS and LMM

The OLS regression was used to model the relationships between PM2.5 and 5 criteria air pollutants as the benchmark for model comparisons. OLS was global and fixed-effects in nature and represented the average relationship between the response variable and predictors. Table 2 lists the parameter estimates of the OLS model (Equation (1)). All model coefficients were statistically significant and positive for predicting PM2.5, except O3. The OLS model indicated that PM2.5 increased as the 4 air pollutants (SO2, NO2, PM10, and CO) increased, while PM2.5 increased as O3 decreased. This was evident by the negative correlation between PM2.5 and O3 (Figure 2), which was consistent with a previous study [40]. Table 2 also indicated that PM10 was the most influential factor on PM2.5 because it had the largest standardized estimate (0.716). PM2.5 would increase 0.52 μg/m3 when PM10 increased 1 μg/m3 while keeping other predictors as constants. The OLS model fitted the data well according to the R a 2 , i.e., 85% of the total variation of PM2.5 can be explained by the OLS model.
The linear mixed model was applied for the 5 criteria air pollutants (SO2, NO2, PM10, CO, and O3) as the fixed-effects and the 13 cities (or region) as the random-effects. The G matrix (Equation (4)) of the random-effects was estimated by the covariance structure of variance components (VC), while the R matrix (Equation (4)) of the model residuals was estimated by the covariance structure of first-order autoregressive (AR(1)). The covariance structures of VC and AR(1) can be expressed by Equations (24) and (25),
Var ( γ ) = G = [ σ 1 2 0 0 0 σ 2 2 0 0 0 σ 6 2 ]
Var ( ε ) = R = σ 2 [ 1 r r 2 r 3 r 4 r 5 r 1 r r 2 r 3 r 4 r 2 r 1 r r 2 r 3 r 3 r 2 r 1 r r 2 r 4 r 3 r 2 r 1 r r 5 r 4 r 3 r 2 r 1 ]
where σ 1 2 ~ σ 6 2 represent the variance of β0 (intercept), β1 (SO2), β2 (NO2), β3 (PM10), β4 (CO), and β5 (O3), respectively; σ 2 represents the variance of the model residuals, and r represents the first-order temporal autocorrelation. The parameter estimates of the G and R matrices were listed in Table 3. It seems that the temporal autocorrelations between the weekly observations were relatively high (r = 0.552) and significant.
Table 4 listed the parameter estimates of the fixed-effect (overall) of the LMM model. It indicated that all the coefficient estimates followed the same signs of the OLS estimates. The standard errors of the LMM coefficient estimates were much larger than those of the OLS estimates. This was because LMM treated the air pollutants as the fixed-effects but adjusted each coefficient estimate for each of the 13 cities (the random-effects), while OLS fitted an average model using the air pollutants, thus that underestimated the standard errors of the model coefficients. The LMM model explained 89.8% of the total variation of PM2.5 in the data. The AICc (18,756.7) of the LMM model was much smaller than that (20,109.26) of the OLS model, indicating a much better model fitting by LMM over OLS.
The 13 cities (or region) were treated as the random-effects in the LMM model thus that the model coefficients can be derived for each of the 13 cities (or region), which were listed in the Supplementary Materials (Table S1) due to the page limit. Figure 4 represents the relationships between predicted PM2.5 and the most significant predictor (PM10) as an example (while keeping other predictors as constants of means). The solid black line was the overall or fixed-effects model, while the colored dotted lines were the 13 models for the 13 cities (region). All the regression lines had slightly different slopes compared with the slope of the overall model. Among the 13 cities (region), the slope coefficient β3 (PM10) of Jiamusi (Figure 4a) and Shuangyashan (Figure 4b) were significantly greater (i.e., steeper slope) than that of the overall model, indicating that increasing the PM10 level in Jiamusi and Shuangyashan would cause a much higher PM2.5 than the other cities, especially for the higher concentrations of PM10. In contrast, Daqing (Figure 4a), Heihe (Figure 4a), and Suihua (Figure 4b) had significantly smaller β3 (i.e., shallower slope) than that of the overall model, indicating that increasing the PM10 level in Daqing, Heihe, and Suihua would cause a lower increase in PM2.5 than the other cities.

3.2. Local Models (GWR, TWR, and GTWR)

The local models, including GWR, TWR, and GTWR, were used to explore spatial and/or temporal heterogeneity in PM2.5 and 5 criteria air pollutants across the study area. The adaptive bi-square kernel function was used as the spatial weighting kernel function in this study. The appropriate bandwidth (number of neighbors) was selected based on the smallest AICc. The parameters and model fitting information of GWR, TWR, and GTWR are summarized in Table 5. The signs and magnitudes of the all median coefficients of the three GWR-based models (GWR, TWR, and GTWR) were compatible with those of the OLS model. However, the TWR’s model coefficients were more similar to those of GTWR, while GWR’s model coefficients were slightly different from those of TWR and GTWR.

3.3. Model Assessment

Table 6 showed the model goodness-of-fit (i.e., R a 2 and AICc), prediction accuracy (i.e., RMSE and MAE), and prediction uncertainty (i.e., the mean and standard deviation (Std) of the Z score of model residuals). The three GWR-based models fitted the data better than the OLS model ( R a 2 = 0.846), with the order of TWR ( R a 2 = 0.968), GTWR ( R a 2 = 0.968), and GWR ( R a 2 = 0.884). Both GTWR and TWR also performed better than the LMM model ( R c 2 = 0.898), while GWR was slightly worse than LMM (Table 4). In the terms of AICc, TWR (17,944.31) fitted the data best, followed by GTWR (18,016.58), LMM (18,756.70), GWR (19,403.51), and OLS (20,109.26). The results indicated that the temporal autocorrelations in the 210-week time series data played a more important role than the spatial heterogeneity across the 13 cities (region). It was a bit surprising that LMM performed better than GWR in terms of goodness-of-fit. However, the prediction accuracy measures of GWR were slightly better than that of LMM in terms of RMSE (GWR 8.207 vs. LMM 8.482) and MAE (GWR 5.621 v.s. LMM 5.794). It was clear that TWR and GTWR were significantly superior to OLS, LMM and GWR according to all comparison criteria, i.e., they had higher R a 2 , smaller AICc, smaller RMSE, and MAE. TWR and GTWR reduced RMSE and MAE of the model residuals by 67% from the OLS model, while GWR only reduced RMSE and MAE by 15%.
To investigate the spatial dependencies in the model residuals from each model, the Moran’s I of the model residuals of OLS, GWR, TWR, and GTWR were calculated and compared. Figure 5 presents the Moran’s I calculated using different bandwidths (i.e., number of neighbors) for all 5 models (Figure 5a). The positive Moran’s I indicated that the model residuals were clustering in similar model residuals (either positive or negative), which may cause a serious violation of the independence assumption of the model residuals and lead to insufficient estimation of the model coefficients [39]. Therefore, the OLS model either under-predicted or over-predicted PM2.5 across the study area. The Moran’s I of the model residuals for all 5 models approached zero (i.e., randomness) with the increase of bandwidth (number of neighbors) and tended to be stable when the number of neighbors reached about 650.
However, the Moran’s I of the LMM and GWR-based models were much smaller than those of OLS and approached zero in the opposite direction (negative) to that of OLS (positive), which indicated that the residuals of LMM and GWR-based models were dissimilar to OLS when the bandwidth was small (Figure 5a). Figure 5b focused on the Moran’s I of LMM and GWR-based models only to see the differences between the 4 models. The LMM and GWR-based models generated little under-predictions or over-predictions for the patches of local areas and produced a significant reduction of spatial autocorrelation in the model residuals to various degrees by explicitly applying the local information among neighboring locations and/or time. The differences of Moran’s coefficients among LMM, GWR, TWR, and GTWR were more obvious with small bandwidths than large bandwidths. The LMM and GWR-based models had almost the same and little spatial autocorrelation when the number of neighbors was greater than 850. TWR performed over GTWR, and then both over LMM and GWR when the number of neighbors was less than 250. The performances of TWR and GTWR were almost the same when the bandwidth was larger than 250. It was consistent with the results of Table 6 that used the optimized bandwidth.
Figure 6 presents the p-value (two-tailed) for the randomization null hypotheses test of the model residuals of OLS, LMM, GWR, TWR, and GTWR calculated using different bandwidths (i.e., number of neighbors). It indicated that the residuals of OLS were not random at any bandwidths since the p-value was much less than 0.05 (reject randomization null hypotheses) across all bandwidths. The residuals of LMM and all GWR-based models presented randomized character across all bandwidth except the residuals of LMM and GWR when the number of neighbors was smaller than 250. The magnitude of randomness gradually increased with the increase of the number of neighbors and followed by order of GWR, LMM, TWR, and GTWR. The TWR and GTWR had a similar trend. In general, the p-value of the randomization null hypotheses test followed the same trend of Moran’s coefficients since they all reacted to the spatial autocorrelation of the model residuals.

4. Discussion

Heilongjiang Province is the northernmost province in China, with the characters of high latitude, low temperature, four distinct seasons, and long and chilly winter. The cold winter leads to a very long heating season (from late October to March in the next year) in Heilongjiang. The industrial production, urban heating, emissions from unorganized pollution sources (such as straw burning) around cities, and motor vehicle exhaust emission mainly contributed to the air pollution of Heilongjiang [19]. Thus, effective measures should be implemented, such as the prohibition of open biomass burning, improvement of coal energy efficiency, and full use of clean fuels (nuclear, wind, and solar energy) for municipal heating [41].
In recent years, the Heilongjiang government has attached great importance to the fight of “blue sky defense” and made obvious progress in the prevention and control of air pollution. Fortunately, the annual concentration of PM2.5 from 2015 to 2018 presented a decreasing trend in most cities, except Jixi (Figure 7). The air pollutants are divided into 6 grades (I: Excellent; II: Good; III: Light pollution; IV: Moderate pollution; V: Heavy pollution; VI: Serious pollution) in terms of the Technical Regulation on Ambient Air Quality Index of China currently being tried [42,43]. Only Harbin (1 out of 13) exceeded the national annual PM2.5 standards (≤35 μg/m3) in all 4 years, but with a declining tendency within grade II. On the contrary, Heihe, Yichun, and Da Xing’an Mountain region were in excellent condition of air in all 4 years, which was consistent with the stable seasonal trend of the 6 air pollutants in Figure 3. The annual PM2.5 concentration of the other cities oscillated around the grade I limit (35 μg/m3). Some cities exceeded the grade I limit in 2015 and 2016 and then fell within the limit in the following 2 years (like Daqing, Hegang, and Mudanjiang), which indicated progress in the prevention and control of air pollution. However, Jixi, one of the most vital coal origins of the Heilongjiang Province, faced a more serious air pollution problem in 2017 than other years due to the poor air quality in heating seasons resulting from long and icy winters. This situation was alleviated in 2018. The provincial city Harbin had an even bigger air pollution problem than other cities because of the massive urban heating of around 9.5 million population, industry, and increased motor vehicle exhaust emissions every year. The phenomenon of spatial and temporal non-stationarity of PM2.5 shown in Figure 7 is consistent with that in Figure 3.
Recently, researchers have investigated the relationship between PM2.5 and various factors, such as aerosol optical depth (AOD) and normalized difference vegetation index (NDVI) derived from satellite imagery, meteorological factors (like temperature, wind speed, relatively humidity), transportation emission factors, density of industrial plants, land-use, gross domestic product (GDP), and digital elevation model (DEM) and so on [44,45,46,47,48,49,50]. These studies have led to an increasingly comprehensive understanding of PM2.5 and impact factors. Although PM2.5 correlates with many factors, it has the most direct correlation with the criteria air pollutants, especially PM10. It is meaningful to evaluate the spatial–temporal relationships between PM2.5 and criteria air pollutants since it can provide useful information on the PM2.5 concentration without direct PM2.5 monitoring, especially before 2015, when the PM2.5 monitoring network was sparse in Heilongjiang.
Meanwhile, PM2.5 modeling techniques vary according to the different predictors applied and objectives. In general, there are four major categories for modeling PM2.5: (1) Time series analysis and related statistical analysis (e.g., [16,51,52,53,54,55]); (2) GTWR and its derivative models (e.g., [30,56,57,58]); (3) machine learning method using plenty of predictors (e.g., [48,49]); (4) comprehensive approach by integrating several above methods (e.g., [46]). Each method has pros and cons. It is vital to choose an appropriate method instead of a complex approach to solve the problems according to a specific dataset. And, it is meaningful to compare different methods using the same dataset and accuracy indices.
In this study, the spatial and/or temporal heterogeneity of PM2.5 concentrations of Heilongjiang Province were investigated using LMM, GWR, TWR, and GTWR models based on the criteria air pollutants from 2015 to 2018. The major problem of global models applied to environmental processes is that they assume the processes to be constant across space or time. The spatial or temporal effects (spatial/temporal autocorrelation and heterogeneity) may violate the assumptions of independent observation and/or invariant variance, which biases the estimates of standard errors and results in imprecise coefficient estimates [59,60]. Although LMM is a global method, it could deal with spatial and temporal dependence and heterogeneity using appropriate variances of random-effects and random errors (i.e., G and R matrices), thus obtaining a better model performance than OLS (5% improvement of adjusted R2). However, the local model GWR did not perform better than LMM because it only handled spatial heterogeneity by “borrowing” the data from surrounding locations and ignored temporal information [25]. TWR and GTWR showed advantages over OLS by incorporating temporal heterogeneity (12% improvement of adjusted R2). The obvious seasonal variation (Figure 3) and decreasing tendency (Figure 7) lead to the high efficiency of incorporating temporal information in local models. However, the GTWR was not significantly superior to TWR in terms of the adjusted R2, AICc, RMSE, and MAE of the model residuals, which might result from the inadequate geographical locations (only 13 cities or regions). It was also the reason why the improvement of the GWR (3.8% of adjusted R2) was so limited. It is more efficient to incorporate temporal information than spatial information in this case. The criteria air quality data from all 56 air monitoring stations across the entire Heilongjiang Province could be applied to model the spatial and temporal heterogeneity of PM2.5. However, it is a challenge to process a massive dataset with both spatial and temporal information due to the limited calculation capability of the computer. Sometimes, it is a prerequisite to balance between the calculation efficiency and data richness. We sacrificed some spatial information by using the citywide average concentrations since they were the averages of concentrations at all monitoring sites in each city, and also the daily concentrations of air pollutants reported to the public by the government [61]. We also sacrificed some temporal information by aggregating daily data into weekly data for balancing those two. Nevertheless, the TWR model based on 210 weekly data was sufficient to describe the relationship between PM2.5 and the other 5 standard air pollutants in this study. Thus, it is scientific to apply the same policy for prevention and control of air pollution throughout the entire Heilongjiang Province with special attention paid to temporal changes.
Heilongjiang Province has begun to systematically and repeatedly measure the ambient air quality data (6 air pollutants) since 2015. How to reduce or eliminate the harm of PM2.5 to public health has become one of the most challenging problems for air pollution prevention in Heilongjiang Province. PM2.5 is inextricably related to other standard pollutants. Although some researchers have been investigated the spatial–temporal heterogeneity in the PM10–PM2.5 relationship using GWR-based models (e.g., [62]), the relationship between PM2.5 and criteria air pollutants (e.g., PM10, SO2, NO2, CO, O3) is still unclear and the spatial–temporal heterogeneity in the relationship is still to be studied.

5. Conclusions

In this study, we investigated the relationships between PM2.5 and 5 criteria air pollutants (i.e., PM10, SO2, NO2, CO, O3) for 13 cities (or region) of Heilongjiang Province during 2015~2018 using global and graphically and temporally weighted regression (i.e., OLS, LMM, GWR, TWR, and GTWR). The daily data were integrated into weekly data due to excessive computation. The model performance and spatial autocorrelation of model residuals were compared. The results showed that all parameter estimates tended to be positive for predicting PM2.5, except O3. The LMM and all GWR-based models (i.e., GWR, TWR, and GTWR) showed great advantages over OLS in terms of model fitting, such as producing higher R2 and more desirable model residuals, especially TWR and GTWR that incorporated temporal variation. The GWR, LMM, and TWR and GTWR improved the explanation of variance in PM2.5 by 3%, 5%, and 12%, respectively, from the R2 (0.85) of OLS. TWR yielded slightly better model performance, prediction accuracy and uncertainty accuracy than GTWR (smaller AICc, RMSE, MAE and standard deviation of Z score of model residuals), and also reduced RMSE and MAE of model residuals by 67% comparing to OLS model; while GWR only reduced RMSE and MAE by 14%~15% comparing to OLS. The traditional OLS and GWR were inadequate for describing the nonstationary of PM2.5. The LMM slightly performed better than GWR since it considered different locations as a random effect and meanwhile handled the repeated measurements using the R matrix, which provided an alternative solution besides the GWR-based models. Although GWR that incorporates the spatial non-stationarity of PM2.5 improved the model performance of OLS, it is still far from TWR that incorporates temporal nonstationary of PM2.5. GTWR did not bring any improvements to TWR by adding spatial information because of the limited number of locations. The temporal heterogeneity is more obvious than spatial heterogeneity in this case. Thus, the incorporation of temporal information is inevitable and adequate for modeling the relationship between PM2.5 and the other air pollutants in this study. This work provides evidence of spatial–temporal heterogeneity in the relationship between PM2.5 and the other standard air pollutants, and also provides possible solutions for modeling PM2.5 with the other air pollutants for Heilongjiang province. In addition, the localized model coefficients and predictions of the GWR models can provide spatio-temporal “hot spots” of PM2.5 pollution, which should be useful for assisting the governmental agencies to pin-point the seriousness of air pollution or local emission in order to make better management decisions.

Supplementary Materials

The following are available online at https://0-www-mdpi-com.brum.beds.ac.uk/1660-4601/16/24/5107/s1, Table S1: Parameter estimates of random effect and final estimates of LMM.

Author Contributions

Conceptualization, Z.Z.; Data curation, Q.W.; Formal analysis, Q.W., L.Z. and Z.Z.; Funding acquisition, W.D. and Z.Z.; Project administration, W.D.; Supervision, Z.Z.; Writing—original draft, Q.W.; Writing—review & editing, L.Z. and Z.Z.

Funding

This work was supported by “The Natural Science Foundation of Heilongjiang Province”, C2018001, and also supported by “The Fundamental Research Funds for the Central Universities”, 2572019CP15.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. European Environment Agency. Air Quality in Europe—2017 Report EEA Report No 13/2017; European Environment Agency: Copenhagen, Denmark, 2017; pp. 7–15. [Google Scholar]
  2. Hüls, A.; Schikowski, T. Ambient particulate matter and COPD in China: A challenge for respiratory health research. Thorax 2017, 72, 771–772. [Google Scholar] [CrossRef] [PubMed]
  3. Wu, R.; Song, X.; Bai, Y.; Chen, J.; Zhao, Q.; Liu, S.; Xu, H.; Wang, T.; Feng, B.; Zhang, Y.; et al. Are current Chinese national ambient air quality standards on 24-hour averages for particulate matter sufficient to protect public health? J. Environ. Sci. 2018, 71, 67–75. [Google Scholar] [CrossRef] [PubMed]
  4. Chang, H.H.; Pan, A.; Lary, D.J.; Waller, L.A.; Zhang, L.; Brackin, B.T.; Finley, R.W.; Faruque, F.S. Time-series analysis of satellite-derived fine particulate matter pollution and asthma morbidity in Jackson, MS. Environ. Monit. Assess. 2019, 191, 280. [Google Scholar] [CrossRef] [PubMed]
  5. Wang, H.; Tian, C.; Wang, W.; Luo, X. Temporal cross-correlations between ambient air pollutants and seasonality of tuberculosis: A time-series analysis. Int. J. Environ. Res. Public Health 2019, 16, 1585. [Google Scholar] [CrossRef] [Green Version]
  6. Wong, E. Air pollution linked to 1.2 million premature deaths in China. The New York Times, 1 April 2013. [Google Scholar]
  7. World Health Organization Air Quality. 2 May 2018. Available online: http://www.who.int/mediacentre/factsheets/fs313/en/ (accessed on 19 November 2019).
  8. Dia, D.; Tchepel, O. Spatial and temporal dynamics in air pollution exposure assessment. Int. J. Environ. Res. Public Health 2018, 15, 558. [Google Scholar]
  9. Zhou, M.; Wang, H.; Zhu, J.; Chen, W.; Wang, L.; Liu, S.; Li, Y.; Wang, L.; Liu, Y.; Yin, P.; et al. Cause-specific mortality for 240 causes in China during 1990–2013: A systematic subnational analysis for the global burden of disease study 2013. Lancet 2016, 387, 251–272. [Google Scholar] [CrossRef]
  10. Ministry of Environmental Protection. Ambient Air Quality Standards (GB3095-2012); Ministry of Environmental Protection (MEP): Beijing, China, 2012; pp. 1–6.
  11. The State Council of China. Air Pollution Prevention and Control Action Plan. 10 September 2013. Available online: http://www.gov.cn/zhengce/content/2013-09/13/content_4561.htm (accessed on 19 November 2019).
  12. Huang, J.; Pan, X.; Guo, X.; Li, G. Health impact of China’s Air Pollution Prevention and Control Action Plan: An analysis of national air quality monitoring and mortality data. Lancet Planet. Health 2018, 2, e313–e323. [Google Scholar] [CrossRef] [Green Version]
  13. Chang, W.; Zhan, J.; Zhang, Y.; Li, Z.; Xing, J.; Li, J. Emission-driven changes in anthropogenic aerosol concentrations in China during 1970–2010 and its implications for PM2.5 control policy. Atmos. Res. 2018, 212, 106–119. [Google Scholar] [CrossRef]
  14. Liu, J.; Li, W.; Wu, J.; Liu, Y. Visualizing the intercity correlation of PM2.5 time series in the Beijing-Tianjin-Hebei region using ground-based air quality monitoring data. PLoS ONE 2018, 13, e0192614. [Google Scholar] [CrossRef]
  15. Wang, J.; Qu, W.; Li, C.; Zhao, C.; Zhong, X. Spatial distribution of wintertime air pollution in major cities over eastern China: Relationship with the evolution of trough, ridge and synoptic system over East Asia. Atmos. Res. 2018, 212, 186–201. [Google Scholar] [CrossRef]
  16. Yang, Y.; Christakos, G.; Yang, X.; He, J. Spatiotemporal characterization and mapping of PM2.5 concentrations in southern Jiangsu Province, China. Environ. Pollut. 2018, 234, 794–803. [Google Scholar] [CrossRef] [PubMed]
  17. Xu, F.; Xiang, N.; Higano, Y. How to reach haze control targets by air pollutants emission reduction in the Beijing-Tianjin-Hebei region of China? PLoS ONE 2017, 12, e0173612. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  18. Zhang, H.; Wang, Y.; Hu, J.; Ying, Q.; Hu, X. Relationships between meteorological parameters and criteria air pollutants in three megacities in China. Environ. Res. 2015, 140, 242–254. [Google Scholar] [CrossRef] [PubMed]
  19. Wu, Y. Source Apportionment of Atmospheric Particle Matter of Cities in Heilongjiang Province, 1st ed.; China Environment Publishing House: Beijing, China, 2017; pp. 238–258. [Google Scholar]
  20. Tobler, W. A computer movie simulating urban growth in the Detroit region. Econ. Geogr. 1970, 46 (Suppl. 1), 234–240. [Google Scholar] [CrossRef]
  21. Fotheringham, A.S.; Brunsdon, C.; Charlton, M. Geographically Weighted Regression: The Analysis of Spatially Varying Relationships; John Wiley & Sons Ltd.: Chichester, UK, 2002; pp. 27–64. [Google Scholar]
  22. Zhang, L.; Gove, J.H.; Heath, L.S. Spatial residual analysis of six modeling techniques. Ecol. Model. 2005, 186, 154–177. [Google Scholar] [CrossRef]
  23. Zhang, L.; Ma, Z.; Guo, L. An evaluation of spatial autocorrelation and heterogeneity in the residuals of six regression models. Forest Sci. 2009, 55, 533–548. [Google Scholar]
  24. Huang, B.; Wu, B.; Barry, M. Geographically and temporally weighted regression for modeling spatio-temporal variation in house prices. Int. J. Geogr Inf. Sci. 2010, 24, 383–401. [Google Scholar] [CrossRef]
  25. Fotheringham, A.S.; Crespo, R.; Yao, J. Geographical and temporal weighted regression (GTWR). Geogr. Anal. 2015, 47, 431–452. [Google Scholar] [CrossRef] [Green Version]
  26. Ma, X.; Zhang, J.; Ding, C.; Wang, Y. A geographically and temporally weighted regression model to explore the spatiotemporal influence of built environment on transit ridership. Comput. Environ. Urban Syst. 2018, 70, 113–124. [Google Scholar] [CrossRef]
  27. Wu, J.; Wei, Y.; Li, Q.; Yuan, F. Economic transition and changing location of manufacturing industry in China: A study of the Yangtze River Delta. Sustainability 2018, 10, 2624. [Google Scholar] [CrossRef] [Green Version]
  28. Zhang, X.; Huang, B.; Zhu, S. Spatiotemporal influence of urban environment on taxi ridership using geographically and temporally weighted regression. ISPRS Int. J. Geo Inf. 2019, 8, 23. [Google Scholar] [CrossRef] [Green Version]
  29. Peng, Y.; Li, W.; Luo, X.; Li, H. A geographically and temporally weighted regression model for spatial downscaling of MODIS land surface temperatures over urban heterogeneous regions. IEEE Trans. Geosci. Remote. 2019, 7, 5012–5027. [Google Scholar] [CrossRef]
  30. Chu, H.; Bilal, M. PM2.5 mapping using integrated geographically temporally weighted regression (GTWR) and random sample consensus (RANSAC) models. Environ. Sci. Pollut. Res. 2019, 26, 1902–1910. [Google Scholar] [CrossRef] [PubMed]
  31. Jaeger, B.C.; Edwards, L.J.; Das, K.; Sen, P.K. An R2 statistic for fixed effects in the generalized linear mixed model. J. Appl. Stat. 2017, 44, 1086–1105. [Google Scholar] [CrossRef]
  32. Bolker, B.M.; Brooks, M.E.; Clark, C.J.; Geange, S.W.; Poulsen, J.R.; Stevens, M.H.H.; White, J.S.S. Generalized linear mixed models: A practical guide for ecology and evolution. Trends Ecol. Evolut. 2009, 24, 127–135. [Google Scholar] [CrossRef] [PubMed]
  33. Paez, A.; Uchida, T.; Miyamoto, K. A general framework for estimation and inference of geographically weighted regression models: Location-specific kernel bandwidths and a test for local heterogeneity. Environ. Plan. 2002, A34, 733–754. [Google Scholar] [CrossRef]
  34. Wu, B.; Li, R.; Huang, B. A geographically and temporally weighted autoregressive model with application to housing prices. Int. J. Geo Inf. Sci. 2014, 28, 1186–1204. [Google Scholar] [CrossRef]
  35. Gollini, I.; Lu, B.; Charlton, M.; Brunsdon, C.; Harris, P. GWmodel: An R package for exploring spatial heterogeneity using geographically weighted models. J. Stat. Softw. 2015, 63, 1–50. [Google Scholar] [CrossRef] [Green Version]
  36. R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2008; Available online: https://www.R-project.org/ (accessed on 13 December 2019).
  37. Nakagawa, S.; Schielzeth, H. A general and simple method for obtaining R2 from generalized linear mixed-effects models. Methods Ecol. Evol. 2013, 4, 133–142. [Google Scholar] [CrossRef]
  38. Edwards, L.J.; Muller, K.E.; Wolfinger, R.D.; Qaqish, B.F.; Schabenberger, O. An R2 statistic for fixed effects in the linear mixed model. Stat. Med. 2008, 27, 6137–6157. [Google Scholar] [CrossRef] [Green Version]
  39. Zhang, L.; Gove, J.H. Spatial Assessment of Model Errors from Four Regression Techniques. For. Sci. 2005, 51, 334–346. [Google Scholar]
  40. Guo, H.; Wang, Y.; Zhang, H. Characterization of criteria air pollutants in Beijing during 2014–2015. Environ. Res. 2017, 154, 334–344. [Google Scholar] [CrossRef] [PubMed]
  41. Zhou, T.; Sun, J.; Yu, H. Temporal and spatial patterns of China’s main air pollutants: Years 2014 and 2015. Atmosphere 2017, 8, 137. [Google Scholar] [CrossRef] [Green Version]
  42. Ministry of Environmental Protection. Technical Regulation on Ambient Air Quality Index (on Trial) (HJ633-2012); Ministry of Environmental Protection (MEP): Beijing, China, 2012; pp. 1–6.
  43. Zhan, D.; Kwan, M.; Zhang, W.; Wang, S.; Yu, J. Spatiotemporal variations and driving factors of air pollution in China. Int. J. Environ. Res. Public Health 2017, 14, 1538. [Google Scholar] [CrossRef] [Green Version]
  44. Zheng, C.; Zhao, C.; Zhu, Y.; Wang, Y.; Shi, X.; Wu, X.; Chen, T.; Wu, F.; Qui, Y. Analysis of influential factors for the relationship between PM2.5 and AOD in Beijing. Atmos. Chem. Phys. 2017, 17, 13473–13489. [Google Scholar] [CrossRef] [Green Version]
  45. Liu, S.; Hua, S.; Wang, K.; Qiu, P.; Liu, H.; Wu, B.; Shao, P.; Liu, X.; Wu, Y.; Xue, Y.; et al. Spatial-temporal variation characteristics of air pollution in Henan of China: Localized emission inventory, WRF/Chem simulations and potential source contribution analysis. Sci. Total Environ. 2018, 624, 396–406. [Google Scholar] [CrossRef]
  46. Li, L.; Zhang, J.; Qiu, W.; Wang, J.; Fang, Y. An ensemble spatiotemporal model for predicting PM2.5 concentrations. Int. J. Environ. Res. Public Health 2017, 14, 549. [Google Scholar] [CrossRef] [Green Version]
  47. Chelani, A.B. Estimating PM2.5 concentration from satellite derived aerosol optical depth and meteorological variables using a combination model. Atmos. Pollut. Res. 2019, 10, 847–857. [Google Scholar] [CrossRef]
  48. Bai, Y.; Zeng, B.; Li, C.; Zhang, J. An ensemble long short-term memory neural network for hourly PM2.5 concentration forecasting. Chemosphere 2019, 222, 286–294. [Google Scholar] [CrossRef]
  49. Xue, T.; Zheng, Y.; Tong, D.; Zheng, B.; Li, X.; Zhu, T.; Zhang, Q. Spatiotemporal continuous estimates of PM2.5 concentrations in China, 2000–2016: A machine learning method with inputs from satellites, chemical transport model, and ground observations. Environ. Int. 2019, 123, 345–357. [Google Scholar] [CrossRef]
  50. Yang, Q.; Yuan, Q.; Yue, L.; Li, T.; Shen, H.; Zhang, L. The relationships between PM2.5 and aerosol optical depth (AOD) in mainland China: About and behind the spatio-temporal variations. Environ. Pollut. 2019, 248, 526–535. [Google Scholar] [CrossRef] [PubMed]
  51. Evagelopoulos, V.; Zoras, S.; Triantafyllou, A.G.; Albanis, T.A. PM10-PM2.5 time series and fractal analysis. Glob. Nest J. 2006, 8, 234–240. [Google Scholar]
  52. Lin, L.; Zhang, A.; Chen, W.; Lin, M. Estimates of daily PM2.5 exposure in Beijing using spatio-temporal Kriging model. Sustainability 2018, 10, 2772. [Google Scholar] [CrossRef] [Green Version]
  53. Du, J.; Qiao, F.; Yu, L. Temporal characteristics and forecasting of PM2.5 concentration based on historical data in Houston, USA. Resour. Conserv. Recy. 2019, 147, 145–156. [Google Scholar] [CrossRef]
  54. Gallero, F.J.G.; Vallejo, M.G.; Umbría, A.; Baena, J.G. Multivariate statistical analysis of meteorological and air pollution data in the ‘Campo de Gibraltar’ region, Spain. Environ. Monit. Assess. 2006, 119, 405–423. [Google Scholar] [CrossRef]
  55. Gocheva-Ilieva, S.G.; Ivanov, A.V.; Voynikova, D.S.; Boyadzhiev, D.T. Time series analysis and forecasting for air pollution in small urban area: An SARIMA and factor analysis approach. Stoch. Environ. Res. Risk Assess. 2014, 28, 1045–1060. [Google Scholar] [CrossRef]
  56. Bai, Y.; Wu, L.; Qin, K.; Zhang, Y.; Shen, Y.; Zhou, Y. A geographically and temporally weighted regression model for ground-Level PM2.5 estimation from satellite-derived 500 m resolution AOD. Remote. Sens. 2016, 8, 262. [Google Scholar] [CrossRef] [Green Version]
  57. Guo, Y.; Tang, Q.; Gong, D.; Zhang, Z. Estimating ground-level PM2.5 concentrations in Beijing using a satellite-based geographically and temporally weighted regression model. Remote Sens. Environ. 2017, 198, 140–149. [Google Scholar] [CrossRef]
  58. Lin, G.; Fu, J.; Jiang, D.; Hu, W.; Dong, D.; Huang, Y.; Zhao, M. Spatio-temporal variation of PM2.5 concentrations and their relationship with geographic and socioeconomic factors in China. Int. J. Environ. Res. Public Health 2014, 11, 173–186. [Google Scholar] [CrossRef] [Green Version]
  59. Lu, J.; Zhang, L. Geographically local linear mixed models for tree height diameter relationship. Forest Sci. 2012, 58, 75–84. [Google Scholar] [CrossRef]
  60. Zhen, Z.; Li, F.; Liu, Z.; Liu, C.; Zhao, Y.; Ma, Z.; Zhang, L. Geographically local modeling of occurrence, count, and volume of downwood in Northeast China. Appl. Geogr. 2013, 37, 114–126. [Google Scholar] [CrossRef]
  61. Wang, Y.; Ying, Q.; Hu, J.; Zhang, H. Spatial and temporal variations of six criteria air pollutants in 31 provincial capital cities in China during 2013–2014. Environ. Int. 2014, 73, 413–422. [Google Scholar] [CrossRef] [PubMed]
  62. Chu, H.; Huang, B.; Lin, C. Modeling the spatio-temporal heterogeneity in the PM10-PM2.5 relationship. Atmos. Environ. 2015, 102, 176–182. [Google Scholar] [CrossRef]
Figure 1. The geographical location of the study area: Heilongjiang Province, P.R. China (including the Da Xing’an Mountain). Note: Jiagedaqi is a special residential area where the environmental monitoring site of Da Xing’an Mountain region is located in.
Figure 1. The geographical location of the study area: Heilongjiang Province, P.R. China (including the Da Xing’an Mountain). Note: Jiagedaqi is a special residential area where the environmental monitoring site of Da Xing’an Mountain region is located in.
Ijerph 16 05107 g001
Figure 2. Pearson correlation coefficients (ρ) between the 6 air pollutants under study. The diagonal plots are the frequency distributions of SO2, NO2, PM10, CO, O3, and PM2.5.
Figure 2. Pearson correlation coefficients (ρ) between the 6 air pollutants under study. The diagonal plots are the frequency distributions of SO2, NO2, PM10, CO, O3, and PM2.5.
Ijerph 16 05107 g002
Figure 3. Variability of the 6 criteria air pollutants in the 13 cities (region) of Heilongjiang Province from 2015 to 2018 (210 weeks in total) (a) SO2, (b) NO2, (c) PM10, (d) CO, (e) O3, and (f) PM2.5.
Figure 3. Variability of the 6 criteria air pollutants in the 13 cities (region) of Heilongjiang Province from 2015 to 2018 (210 weeks in total) (a) SO2, (b) NO2, (c) PM10, (d) CO, (e) O3, and (f) PM2.5.
Ijerph 16 05107 g003
Figure 4. The relationship between particular matter (PM)10 and PM2.5 predicted by (a) fixed-effect (overall) vs. models of the 6 cities (b) fixed-effect (overall) vs. models of the other seven cities (region).
Figure 4. The relationship between particular matter (PM)10 and PM2.5 predicted by (a) fixed-effect (overall) vs. models of the 6 cities (b) fixed-effect (overall) vs. models of the other seven cities (region).
Ijerph 16 05107 g004
Figure 5. Moran’s I of the model residuals calculated using the different number of neighbors (a) OLS, LMM, GWR, TWR, and GTWR, and (b) only LMM, GWR, TWR, and GTWR.
Figure 5. Moran’s I of the model residuals calculated using the different number of neighbors (a) OLS, LMM, GWR, TWR, and GTWR, and (b) only LMM, GWR, TWR, and GTWR.
Ijerph 16 05107 g005
Figure 6. The p-values (two-tailed) for the randomization null hypotheses test of the model residuals of OLS, LMM, GWR, TWR, and GTWR calculated using a different number of neighbors.
Figure 6. The p-values (two-tailed) for the randomization null hypotheses test of the model residuals of OLS, LMM, GWR, TWR, and GTWR calculated using a different number of neighbors.
Ijerph 16 05107 g006
Figure 7. The annual concentration of PM2.5 of the 13 cities/regions of Heilongjiang Province from 2015 Table 2018. For visualization, (a) presents 7 cities, and (b) presents the other 6 cities. The grade I (35 µg/m3) and II (75 µg/m3) limits are shown. Note: For PM2.5 (μg/m3), grade I: ≤35, II: 35–75, III: 75–115, IV: 115–150, V: 150–250, VI: >250 (MEP, 2012).
Figure 7. The annual concentration of PM2.5 of the 13 cities/regions of Heilongjiang Province from 2015 Table 2018. For visualization, (a) presents 7 cities, and (b) presents the other 6 cities. The grade I (35 µg/m3) and II (75 µg/m3) limits are shown. Note: For PM2.5 (μg/m3), grade I: ≤35, II: 35–75, III: 75–115, IV: 115–150, V: 150–250, VI: >250 (MEP, 2012).
Ijerph 16 05107 g007
Table 1. Descriptive statistics of the variables used in this study (n = 2730).
Table 1. Descriptive statistics of the variables used in this study (n = 2730).
VariablesMin.Q1MeanMedianQ3Max.Std
SO2 (μg/m3)2.007.5716.4812.0021.29178.0014.17
NO2 (μg/m3)2.8615.4322.9520.2928.29104.4311.12
PM10 (μg/m3)10.7136.0058.6750.2171.57363.8633.69
CO (mg/m3)0.100.500.720.640.873.030.34
O3 (μg/m3)15.5754.0072.1668.6487.00172.2923.81
PM2.5 (μg/m3)3.5717.8634.0926.4342.71235.5724.50
Table 2. Parameter estimates of the ordinary least squares regression (OLS) model.
Table 2. Parameter estimates of the ordinary least squares regression (OLS) model.
ParameterEstimateStd. Errort Testp-ValueStandardized Estimate
Intercept−4.3980.850−5.1730.0000.000
SO2 (μg/m3)0.0810.0174.6790.0000.047
NO2 (μg/m3)0.3800.02515.368<2 × 10−160.172
PM10 (μg/m3)0.5200.00769.733<2 × 10−160.716
CO (mg/m3)5.6400.7157.8840.0000.079
O3 (μg/m3)−0.0860.008−10.221<2 × 10−16−0.083
Model fitting information
R a 2 0.846
AICc20,109.26
Table 3. Estimates of the variance of linear mixed models (LMM).
Table 3. Estimates of the variance of linear mixed models (LMM).
Parameter σ 1 2 σ 2 2 σ 3 2 σ 4 2 σ 5 2 σ 6 2 r σ 2
Estimate78.0920.0220.0820.009157.0100.0010.55275.110
Table 4. Parameter estimates of overall (fixed-effect) of LMM.
Table 4. Parameter estimates of overall (fixed-effect) of LMM.
ParameterEstimateStd. Errort Testp-Value
Intercept−9.812.64−3.710.00
SO2 (μg/m3)0.070.051.270.23
NO2 (μg/m3)0.420.094.780.00
PM10 (μg/m3)0.470.0317.27<0.0001
CO (mg/m3)11.863.623.280.01
O3 (μg/m3)−0.050.01−3.540.00
Model fitting information
R c 2 0.898
AICc18,756.70
Note: R c 2 is the conditional R2 of LMM.
Table 5. Parameter estimates of geographically weighted regression (GWR), temporally weighted regression (TWR), and geographically and temporally weighted regression (GTWR).
Table 5. Parameter estimates of geographically weighted regression (GWR), temporally weighted regression (TWR), and geographically and temporally weighted regression (GTWR).
ModelsParameterMinQ1MedianQ3MaxModel fitting Information
GWR
(Num of Neighbors = 262; Adaptive)
Intercept−26.901−10.534−7.609−3.8365.662 R a 2 : 0.884
AICc: 19,403.51
SO2−0.270−0.0210.0460.2590.516
NO2−0.2820.0840.4000.6761.061
PM100.3250.4370.5360.5860.628
CO−4.8471.8343.86711.79634.484
O3−0.171−0.102−0.075−0.0260.052
TWR
(Num of Neighbors = 20; Adaptive)
Intercept−99.573−14.089−5.1412.91891.439 R a 2 : 0.968
AICc: 17,944.31
SO2−2.985−0.1990.0990.5354.196
NO2−2.667−0.2340.1360.5512.637
PM10−0.2590.2830.4710.7041.140
CO−41.876−3.1726.12519.781137.483
O3−1.250−0.116−0.0170.0650.918
GTWR
(Num of Neighbors = 20; Adaptive)
Intercept−99.573−14.412−5.3392.79091.439 R a 2 : 0.968
AICc: 18016.58
SO2−2.985−0.1970.1120.5504.196
NO2−2.667−0.2320.1500.5822.637
PM10−0.2590.2790.4560.7001.140
CO−53.703−3.1286.11119.953137.483
O3−1.250−0.118−0.0170.0650.918
Note: The unit of SO2, NO2, and PM10 and O3 is μg/m3, the unit of CO is mg/m3.
Table 6. Comparison of OLS, LMM, and GWR-based models (GWR, TWR, and GTWR).
Table 6. Comparison of OLS, LMM, and GWR-based models (GWR, TWR, and GTWR).
Num of NeighborModel TypeAICc R a 2 RMSE of ResidualsMAE of ResidualsMean of Z ScoreStd of Z Score
——OLS20,109.250.8469.5986.58700.426
——LMM18,757.50.898 §8.4825.794−0.0010.375
262GWR19,403.490.8848.2075.62100.356
20TWR17944.390.9683.132.17500.129
20GTWR18016.580.9683.162.18700.131
Note: “§” represents the conditional R2 ( R c 2 ).

Share and Cite

MDPI and ACS Style

Wei, Q.; Zhang, L.; Duan, W.; Zhen, Z. Global and Geographically and Temporally Weighted Regression Models for Modeling PM2.5 in Heilongjiang, China from 2015 to 2018. Int. J. Environ. Res. Public Health 2019, 16, 5107. https://0-doi-org.brum.beds.ac.uk/10.3390/ijerph16245107

AMA Style

Wei Q, Zhang L, Duan W, Zhen Z. Global and Geographically and Temporally Weighted Regression Models for Modeling PM2.5 in Heilongjiang, China from 2015 to 2018. International Journal of Environmental Research and Public Health. 2019; 16(24):5107. https://0-doi-org.brum.beds.ac.uk/10.3390/ijerph16245107

Chicago/Turabian Style

Wei, Qingbin, Lianjun Zhang, Wenbiao Duan, and Zhen Zhen. 2019. "Global and Geographically and Temporally Weighted Regression Models for Modeling PM2.5 in Heilongjiang, China from 2015 to 2018" International Journal of Environmental Research and Public Health 16, no. 24: 5107. https://0-doi-org.brum.beds.ac.uk/10.3390/ijerph16245107

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Metrics

Back to TopTop