Abstract

Ambient ozone (O3) is an important phytotoxic pollutant, and detailed knowledge of its spatial distribution is becoming increasingly important. The aim of the paper is to compare different spatial interpolation techniques and to recommend the best approach for producing a reliable map for O3 with respect to its phytotoxic potential. For evaluation we used real-time ambient O3 concentrations measured by UV absorbance from 24 Czech rural sites in the 2007 and 2008 vegetation seasons. We considered eleven approaches for spatial interpolation used for the development of maps for mean vegetation season O3 concentrations and the AOT40F exposure index for forests. The uncertainty of maps was assessed by cross-validation analysis. The root mean square error (RMSE) of the map was used as a criterion. Our results indicate that the optimal interpolation approach is linear regression of O3 data and altitude with subsequent interpolation of its residuals by ordinary kriging. The relative uncertainty of the map of O3 mean for the vegetation season is less than 10%, using the optimal method as for both explored years, and this is a very acceptable value. In the case of AOT40F, however, the relative uncertainty of the map is notably worse, reaching nearly 20% in both examined years.

1. Introduction

Ambient ozone (O3) is a widely studied air pollutant [1]. Due to its unsaturated chemical structure, it is highly reactive and contributes to the oxidative power of atmosphere, essential for scavenging many pollutants from the atmosphere [2]. It has important negative impacts on both human health and the environment as acknowledged in numerous studies [3]. Moreover, due to its absorption-radiation abilities, O3 is an important greenhouse gas [4, 5], and there are significant interactions between O3 and climate change [6].

Current ambient O3 levels have increased by approximately two times as compared to those measured over a century ago [7]. Although the magnitude and origin of the hemispheric O3 trends are still not completely understood [8], there are indications that background O3 levels over the midlatitudes of the Northern Hemisphere have continued to rise over the past three decades within the range of approximately 0.5–2% per year [9]. A significant contribution to O3 levels both in Europe and North America originates in East Asia as a result of its dynamic development regarding population growth and increased fossil fuel consumption [10].

For the above reasons, the detailed knowledge of spatial distribution of ambient O3 levels is becoming increasingly important. To develop a reliable, accurate, and continuous air pollutant surface predicting the values at locations without measurements is an essential task which we frequently encounter in environmental and health-related studies. This benefit of O3 mapping stands out when viewed alongside the increasingly limited financial resources available for costly ambient air quality monitoring networks.

There are a wide range of techniques available for spatial interpolation, the advantages and limitations of which are widely discussed in the scientific literature [11]. In principle these techniques are classified as deterministic (the nearest neighbor and polynomial regression) or stochastic (geostatistical approaches as kriging and cokriging). The difference between these two is that the geostatistical methods use the spatial correlation structure and allow a prediction variability estimate to assess, under certain conditions, prediction accuracy. In between the deterministic and stochastic methods, there are a wide range of radial basis functions or splines.

The quality of maps of air pollution depends mainly on the quality of the input data measured at the stations, on the number of measuring sites, and also on their spatial distribution [12, 13]. Air pollution measurements, particularly those from online permanent monitoring used in routine monitoring networks, are very costly and so the number of sites is generally very limited. The number of required sites depends obviously on the type of air pollutant and on the representativeness of the measuring site. The representativeness is closely related to proximity of emission sources and topography; more stations are needed in complex terrain in contrast to flat [14]. When developing a surface for pollutants with high spatial variability (due to the importance of their local emission sources), for example, PM, benzo(a)pyrene, or toxic metals, more sites are needed. In contrast, pollutants like O3, with a more regional character, depending mostly on regional phenomena such as meteorology and long-range or regional air pollution transport need a less-dense monitoring network.

Maps of ambient O3, in context of its impacts on environment in rural areas, produced by different approaches were published for different regions: United Kingdom [15], Sierra Nevada in California [1618], and the Carpathians in Europe [19]. Across the EU, mapping of background O3 at a fine spatial scale (1×1 km) was carried out [20], as well as mapping of exposure index AOT40 at 2×2 km grid resolution [21, 22]. Across the Czech Republic, the method of [15] was applied for ozone deposition mapping [23].

In the Czech Republic (CR), ambient O3 levels are elevated [24, 25], limit values over vast regions are frequently exceeded, and phytotoxic potential is high [2628]. The aim of this paper is to compare the different spatial interpolation techniques and to recommend the optimal approach for producing a reliable map of O3 with respect to its phytotoxic potential.

2. Methods

2.1. Ambient Ozone Data

We used real-time O3 levels measured at sites in the nationwide air quality monitoring network by UV absorbance, a reference method as declared by the EC [29]. The ozone analyzers used were the Thermo Environmental Instruments (TEI), M49. The samplers were placed ca 2 m above ground. Standard procedures for quality control and quality assurance [29] were applied. We considered O3 seasonal means (April–September) and the exposure index AOT40 for forests—AOT40F [30], calculated according to [31]. The input data were 1 h mean concentrations. Data capture required for calculations of seasonal means was 75%. The AOT40 index as a cumulative variable is very sensitive to the quality of measured data [32, 33] and obviously also to missing values. More stringent requirements are, thus, needed for calculation of AOT40 as compared to the seasonal mean. In cases when we had less than 90% of hourly O3 concentrations for the period of April–September for calculating the AOT40F, we corrected it according to [34] so as to prevent underestimation of the O3 exposure.

With respect to the aim of this study to assess O3 exposure for forests, urban sites were not considered. From a total of 55 sites monitoring real-time O3 concentrations across the CR, we accounted only for rural sites as specified by EoI [35]. The rural sites according to EoI are the sites with no important emission sources nearby and which are assumed to be affected only by long-range or regional air pollution transport. Thus, the representativeness of such sites is considerable, mostly in tens to hundreds of kms. The selection of sites resulted in 24 rural sites distributed unevenly across the CR (Figure 1), with more sites at border mountain areas as compared to the interior of the country. Considering that the area of the CR is 79 000 km2, the sampling density was approximately one monitor per 3 292 km2.

2.2. Spatial Interpolation

Maps for mean vegetation season O3 concentrations and for exposure index AOT40F for forests were prepared. For spatial interpolation, we used 24 rural sites run by the CHMI. In addition to the Czech sites, we also used data from selected measuring sites in Germany and Poland to improve the interpolation near border areas (Figure 1). Measuring sites in Slovakia are too distant so they cannot be used. Data from Austrian sites situated near the border were not available. The maps were prepared using ArcGIS Geostatistical Analyst [36] on a grid of 1×1 km resolution. A 25 m resolution DEM was used.

For spatial interpolation, eleven methods were used (Table 1). These methods are adequately referenced in scientific literature, so we comment on them only very briefly. In principle, we use three different interpolation methods using measured data only and two basic methods which combine measured and supplementary data, with four subvariants.

2.2.1. Interpolation Methods Using Measured Data Only

(A) Inverse Distance Weighted Method
The inverse distance weighted method (IDW) is likely to be the most frequently used deterministic method [37]. For interpolation we used 𝑍𝑠0=𝑛𝑖=1𝑍𝑠𝑖/𝑑𝑘0𝑖𝑛𝑖=11/𝑑𝑘0𝑖,(1) where 𝑍(𝑠0) is the interpolated value of concentration in the point 𝑠0, 𝑍(𝑠𝑖) is the measured value of concentration in the ith point, i=1,,𝑛, 𝑑0𝑖 is the distance between the interpolated point and the ith point with measurement, and n is the number of sites used for interpolation.

(B) Radial Basis Functions Method
Radial basis functions (RBF) interpolate the measured value while minimizing the total curvature of the surface. The interpolation is described by 𝑍𝑠0=𝑛𝑖=1𝑤𝑖𝑑Φ0𝑖+𝑤𝑛+1,(2) where Φ(x) is a specific RBF function, 𝑑0𝑖 is the distance between the interpolated point and the ith point with measurement, 𝑤1,,𝑤𝑛+1 are the weighting parameters, and n is the number of surrounding sites used for interpolation.
Although calculation of the RBF and estimation of its parameters is rather complicated, the computation is simple and fast. The parameters 𝑤1,,𝑤𝑛+1 are obtained from the system of equations given by 𝑛𝑗=1𝑤𝑗Φ𝑑𝑖𝑗+𝑤𝑛+1𝑠=𝑍𝑖,𝑖=1,,𝑛,𝑛𝑗=1𝑤𝑗=𝑤𝑛+1.(3) A more detailed description is given by [36]. Coyle et al., 2002, [15] applied this interpolation technique within his approach for ambient O3 mapping for Great Britain.

(C) Ordinary Kriging
Ordinary kriging is a geostatistical interpolation method. It considers the statistical model: 𝑍(𝑠)=𝜇+𝜀(𝑠),𝑠𝐷,(4) where μ represents the constant mean structure of the concentration field, 𝜀(𝑠) is a smooth variation plus measurement error (both zero-mean), and 𝐷 is the examined area.
The interpolation is performed according to the equation 𝑍𝑠0=𝑛𝑖=1𝜆𝑖𝑍𝑠𝑖,𝑛𝑖=1𝜆𝑖=1,(5) where 𝑍(𝑠0) is the interpolated value of concentration in the point 𝑠0, 𝑍(𝑠𝑖) is the measured value of concentration in the ith point, 𝑖=1,,𝑛, 𝑛 is the number of surrounding sites used for interpolation, and 𝜆1,,𝜆𝑛 are the weights assumed based on a semivariogram.
The weights λi are derived from a semivariogram γ() in order to minimize the mean square error. The explicit calculation is achieved by the system of equations given by 𝑛𝑗=1𝜆𝑗𝛾𝑠𝑖𝑠𝑗𝑠+𝛾0𝑠𝑖𝑚=0,𝑖=1,,𝑛,𝑛𝑖=1𝜆𝑖=1.(6) Kriging is a commonly used standard method. For construction of an O3 surface, it was used, for example, by [3840].

2.2.2. Interpolation Methods Using Both Measured and Auxiliary Data

The methods described in Section 2.2.1 were used only for the interpolation of the measured O3 concentrations. Apart from these methods, there are others using well-correlated physical relationships between concentrations and other characteristics, for which more complex spatial information is known. The simplest approaches are the linear regression models without spatial interpolation; more complicated are different combinations of linear regression and spatial interpolation.

(A) Linear Regression Model without Spatial Interpolation
The basic linear regression model equation considered is 𝑍(𝑠)=𝑐+𝑎1𝑋1(𝑠)+𝑎2𝑋2(𝑠)+,(7) where 𝑋𝑖(𝑠) are different supplementary parameters at the point 𝑠, for 𝑖=1,2,,𝑐,and𝑎1,𝑎2,, are the parameters of the linear regression model.
In our case, altitude is used as the auxiliary parameter.

(B) Linear Regression Model Followed by the Spatial Interpolation of Residuals
The interpolation is estimated according to 𝑍𝑠0=𝑐+𝑎1𝑋1𝑠0+𝑎2𝑋2𝑠0𝑠++𝜂0,(8) where 𝑍(𝑠0) is the estimated value of the O3 concentration at the point 𝑠0,𝑋1(𝑠0),𝑋2(𝑠0),,𝑋𝑛(𝑠0) are the n number of individual auxiliary variables at the point 𝑠0,𝑐,𝑎1,𝑎2,,𝑎𝑛 are the 𝑛 selected parameters of the linear regression model calculated at the points of measurement, and 𝜂(𝑠) is the spatial interpolation of the residuals of the linear regression model at the points of measurement.

The output of a dispersion model, altitude, meteorological variables (temperature, relative humidity, global radiation) may be among the auxiliary characteristics. For preparing an O3 surface, this method was used, for example, by Loibl et al., 1994 [41], who used relative altitude as the auxiliary characteristics, Horálek et al., 2008 [42], who used model EMEP, altitude, and global radiation as the auxiliary characteristics, and Abraham and Comrie, 2004 [43].

In this study we used the altitude as the sole auxiliary characteristic. The major reason was that preliminary analysis of our data showed the best association between O3 concentrations and altitude. Inclusion of meteorological variables did not bring any further benefit to our model. The assumption of linear distribution of O3 with altitude was tested prior to the regression analysis.

Different interpolation methods, as described in Section 2.2.1, can be used for interpolation of residuals.

(C) Interpolation of Mean Afternoon Concentration Minus Regression of Mean Afternoon Increment with Altitude
Ozone concentrations show diurnal variation. Next to this, mean afternoon increment (i.e., the difference between the mean afternoon concentration and the mean whole-day concentration) is strongly related to altitude; see [15]. Coyle introduced the mapping method in which this regression relation is combined with the spatial interpolation of the afternoon concentration; that is, 𝑍𝑠0𝑠=𝜌0𝑅Δ𝑠0,(9) where 𝜌(𝑠0) is the spatial interpolation of the mean afternoon concentrations at the point 𝑠0 and 𝑅Δ(𝑠0) is the regression relation of the increment Δ based on altitude at the point 𝑠0.

While Coyle uses minimum curvature function (i.e., one of the RBF function) for the interpolation of afternoon values, we use ordinary kriging, as it shows generally better results; see bellow.

(D) Interpolation of Mean Afternoon Concentration Minus Regression of Mean Afternoon Increment with Altitude Followed by the Spatial Interpolation of Residuals
The variant of the method C is the addition of the interpolation of its residuals to the results of this method. As for the method B, different interpolation methods as introduced under Section 2.2.1 are used.

2.3. Uncertainty of Maps

We used cross-validation analysis for the assessment of uncertainty of the map. Cross-validation compares a value measured at a monitoring site with an estimated value based on interpolation of values measured at other sites. The root mean square error (RMSE) of the map, calculated according to (10), was used as the uncertainty criterion. RMSE should be as small as possible:RMSE=1𝑁𝑁𝑖=1𝑍𝑠𝑖𝑍𝑠𝑖2,(10) where RMSE is the root mean square error of the whole map, 𝑍(𝑠𝑖) is the measured concentration at the 𝑖th site,𝑖=1,,𝑁,𝑍(𝑠𝑖) is the concentration at the 𝑖th site estimated from concentrations measured at other sites, 𝑖=1,,𝑁, and 𝑁 is the number of measuring sites.

3. Results

The RMSE values comparing the different interpolation techniques both for O3 seasonal means and AOT40F are presented in Tables 2 and 3. Considering the average from the vegetation seasons of 2007 and 2008, our results indicate that the optimal interpolation approaches are LR+res_OK and ALR+res_OK. In the case of O3 seasonal means, the rankings for both methods were exactly the same, while for AOT40F LR+res_OK slightly outperformed ALR+res_OK. All three interpolation techniques alone—IDW, RBF, and OK—gave much worse results in comparison to linear regression of measured data and altitude with subsequent interpolation of its residuals. This held both for O3 seasonal means and AOT40F but was more pronounced for the O3 seasonal means.

The relative uncertainty of the map of mean O3 for the vegetation season was 8% for LR+res_OK and ALR+res_OK methods for both explored years. This is a thoroughly acceptable value. Even though the IDW method ranking indicated that it was the worst interpolation approach, the relative uncertainty of the map of mean O3 for the vegetation season was 13%. In the case of AOT40F, however, the relative uncertainty of the map was notably worse. For LR+res_OK, ranking as the best approach, its relative uncertainty values were 18% in 2007 and 19% in 2008, while for IDW, assessed as the worst approach, its values were 21% for 2007 and 20% for 2008.

Figures 2 and 3 show the spatial interpolation of mean O3 concentrations in the 2008 vegetation season prepared by interpolation techniques LR+res_OK and ALR+res_OK which ranked best in the comparison (Table 2). The two approaches resulted in maps which are very similar and exhibited only minor differences. The same applies for Figures 4 and 5 which show the spatial variability of AOT40F values in the 2008 vegetation season prepared by interpolation techniques LR+res_OK and ALR+res_OK which ranked best in the comparison (Table 3). The relationships between the results of these two interpolation approaches are presented by scatter plots (Figure 6). Notably better results were obtained for O3 seasonal means.

4. Discussion

To produce a reliable air pollutant map to predict values in regions without measurements is an essential yet challenging task. Generally, dense monitoring networks are expensive but give a precise picture of spatial variability of a given phenomenon. Sparse sampling and monitoring networks, however, although less expensive, may miss significant spatial features of the studied phenomenon [12]. To make a real monitoring network denser, it is possible, for example, to add virtual measuring sites to improve the quality of interpolation [44]. Currently, however, no rigorous methodology for the determination of the number of monitoring sites sufficient/adequate to develop a reliable air pollutant surface is available [45]. Familiarity with the terrain and various phenomena that could affect the air pollutant concentrations and distributions are among the most important issues [12].

For mapping purposes, a number of techniques are available. There are substantial qualitative differences between the maps derived using different interpolation techniques as shown, for instance, by [46] for the maps of NO2. The assessment of performance of the different techniques is extremely important. The maps derived by different interpolation techniques may be compared and evaluated by using the objective criteria, such as cross-validation [42]—when we omit one site in interpolation process and predict its values based on the rest of the sites and then compare the predicted and measured values.

Presented maps are applicable merely for rural areas. The obvious limitation of the maps is the low number of measuring sites which are unevenly distributed across the country. The spatial distribution of sites has strong historic connotations. Originally the measuring sites were located preferentially to more polluted regions, and they still remain in their original setting to observe the long-time trends.

Our results show that using the auxiliary data, in our case the dependence of O3 concentrations on altitude in particular, significantly improves the overall quality of the resulting map (see Tables 2 and 3). Meteorology was not factored into the linear regression models as the preliminary analysis of our data showed that including meteorological variables did not bring any further benefit to our model. The likely reason is fairly low number of O3 measuring sites. In near future we intend to use the Eulerian photochemical dispersion model CAMx [47] as auxiliary characteristics. The preliminary results seem to be promising. Next to this, it can be stated that the methods using ordinary kriging in its spatial interpolation part show the best results.

If we compare the two methods which ranked as the best, the LR+res_OK (i.e., linear regression of measured values with altitude followed by the interpolation of its residuals using ordinary kriging) approach slightly outperforms ALR+res_OK (i.e., the Coyle’s approach [15] followed by the interpolation of its residuals using ordinary kriging) or gives comparable results. ALR+res_OK, however, is much more complicated and demanding for computation and, thus, less practical for application.

We can reasonably assume that notably worse results of mapping of AOT40F as compared to mean O3 concentration for a vegetation season (see Figure 6) are due to more random/incidental factors affecting AOT40. Moreover, exposure index AOT40 is less robust characteristic as compared to mean O3 concentrations [33].

Presented maps show the high-resolution O3 spatial patterns which can be used for assessment of O3 effects on vegetation. Exposure maps in particular are useful for indication of areas with high O3 phytotoxic potential for forests and were already used across the Czech forests earlier [26]. Spatial patterns of O3 seasonal means are useful for estimation of O3 deposition as published for the Czech coniferous and deciduous forests by [23] and for estimation of stomatal flux.

5. Conclusions

We developed reasonable continuous surfaces for ambient O3 vegetation season mean concentrations and AOT40F using eleven interpolation approaches. The comparison based on RMSE indicates that linear regression between measured O3 data and altitude with subsequent interpolation of its residuals outperforms the interpolation techniques IDW, radial basis functions, and ordinary kriging significantly. This holds for both O3 seasonal means and AOT40F, and, in the case of O3 seasonal means, this feature is more pronounced as compared to AOT40F. Considering all different aspects, including the results of cross-validation analysis and the demandingness of computation, linear regression of O3 data and altitude with subsequent interpolation of its residuals by ordinary kriging can be recommended as the optimal approach out of the eleven spatial interpolation techniques examined. Notably better results in mapping were obtained for mean seasonal O3 concentrations in comparison to exposure index AOT40F.

Acknowledgments

This study was funded by the Ministry of Environment of the Czech Republic MŽP (SP/1b7/189/07). The data on ambient ozone were provided by the Czech Hydrometeorological Institute. We appreciate the assistance of Linton Corbet who revised the English and commented on the final version of the paper.