1. Introduction
According to statistics, forests account for 75% of the total primary production of the Earth’s biosphere and 80% of the Earth’s plant biomass [
1], with multiple economic, ecological and social benefits, and are an essential ecological support system for human life [
2]. However, forests are highly vulnerable to natural disasters, such as forest fires, pests, weeds and rodents, frost damage, and wind damage, among which forest fires cause the most severe damage [
3]. A forest fire is a sudden, destructive, and difficult-to-control natural disaster that can cause significant damage to humans and ecosystems [
4,
5,
6,
7].
Because smoke develops in the early phases of forest fires, identifying forest fire smoke quickly is crucial for effective forest fire control [
8]. Furthermore, following a fire, smoke spreads swiftly over broad distances, and smoke particles are suspended in the air as condensation nuclei and transported by the wind, affecting air quality, climate change, forest health, water quality, and other factors [
9]. Also, according to a comparison of smoke detected by satellites and visibility recorded by meteorological observers, smoke aerosols alter near-ground visibility, which has a significant impact on forest fire prevention and control [
10,
11]. Diffuse smoke consists of aerosols with relatively small particle sizes which, when inhaled over time, can readily cause health problems such as respiratory infections, asthma, lung cancer, and cardiovascular ailments [
12]. Consequently, the detection of small areas of early forest fire smoke, large areas of smoke from spreading fires, smoke that is intentionally ignited in an area to control the spread of forest fires, or smoke that drifts to other non-burning areas is necessary for a variety of practical applications, such as fire point detection [
13], climate change modeling [
14] and environmental monitoring [
15]. To completely comprehend the effects of forest fire smoke, there is an urgent need for accurate, near-real-time detection of forest fire smoke.
Among different forest-fire monitoring methods, a remote sensing satellite can provide large-area, global observations with rich spatial, temporal, and spectral information [
16]. Most remote sensing monitoring of forest fires utilizes the on-board sensors of low to medium spatial resolution satellites (>250 m) [
17], such as the Moderate Resolution Imaging Spectroradiometer (MODIS) [
18] and the Visible Infrared Imaging Radiometer Suite (VIIRS) [
19]. These satellite data have a high temporal resolution (e.g., the revisit intervals of VIIRS and MODIS for the same location are 24 and 12 h, respectively) but a poor spatial resolution, so small-area smoke from early forest fires tends to be missed. Conversely, high spatial resolution satellites, such as Landsat-8 with a resolution of 30 m [
16] and Sentinel-2 (S2) with resolutions of 10 m, 20 m and 60 m [
20], are limited by their operational orbits and cannot revisit the same location frequently (revisit times of 16 and 25 days for Landsat-8 and Sentinel-2, respectively), so forest fires cannot be detected in time. The lack of remote sensing data that combine high spatial resolution with high temporal resolution severely limits many remote sensing surveillance tasks.
To address these problems, super-resolution reconstruction (SRR) methods, which convert low-resolution (LR) images into high-resolution (HR) images, have been proposed. Super-resolution (SR) of remote sensing images can be divided into two types: single-image super-resolution (SISR) reconstruction and multi-image super-resolution (MISR) reconstruction. SISR can effectively enhance the spatial resolution of an image without additional information and is therefore more common in practical applications [
21]. SISR algorithms can be classified into interpolation-based, reconstruction-based, and learning-based methods. Interpolation-based reconstruction uses the pixel values of the source image to estimate the pixel values of the high-resolution image. This approach is computationally simple and fast and works well on smooth images, but it struggles to reconstruct high-frequency information and is readily influenced by high-frequency noise [
22]. The reconstruction-based strategy uses a priori information to match output to input. This technique is better at enhancing edge information and suppressing ghosting during reconstruction. However, many reconstruction-based approaches perform poorly with increasing scale [
23]. Learning-based methods require a large amount of low-resolution data paired with corresponding high-resolution data; by learning the complex relationship between the two, they gain the ability to reconstruct unseen low-resolution data. As a more flexible and effective approach, they are widely applied [
24]. In this paper, a deep learning-based SISR method is used to perform super-resolution reconstruction of each band of the VIIRS images, in order to obtain remote sensing images with both high spatial resolution and high temporal resolution.
Since Dong et al. proposed SRCNN in 2014 [
25], a growing body of research has used convolutional neural networks (CNNs) for SR tasks because of their robust nonlinear fitting and learning capabilities. SR studies have also built on multiple network designs (ResNet [
26], DenseNet [
27], etc.) to learn the mapping from LR to HR images, such as VDSR [
28], EDSR [
29], and DBPN [
30]. However, high spatial resolution images that correspond to VIIRS observations are difficult to obtain because of the high cost of the sensors involved. Furthermore, remote sensing images differ to some extent from natural images taken by an ordinary camera. Compared with natural images, targets in remote sensing images cover fewer pixels and have more complex backgrounds. Remote sensing images also contain many multispectral or hyperspectral bands, carry a large amount of information, and have a large field of view [
31]. Current work has addressed some of these issues, such as the recovery of texture details, input of oversized images, and suppression of perceptual losses [
32]. However, there are still many problems in extracting smoke regions in super-resolution reconstructed VIIRS images:
(1) The effect of super-resolution reconstruction is usually evaluated quantitatively using metrics of reconstruction accuracy and perceptual quality. However, it is not known how commonly used reconstruction accuracy metrics, such as PSNR and SSIM, and the perceptual quality of the reconstructed image affect smoke detection accuracy. In other words, the sensitivity of smoke detection accuracy to image reconstruction quality metrics is unknown.
(2) Because smoke from early forest fires occupies only a small portion of the image, it is more readily discernible in images captured by Landsat-8 than in those captured by VIIRS. To increase model performance, we trained a super-resolution network and a smoke segmentation network with Landsat-8 data. However, when we applied the trained model to images captured by a different sensor, VIIRS, detection accuracy dropped drastically, because CNN-based models are sensitive to the distribution and properties of the training and test images. This is known as the domain adaptation problem [
33]. In remote sensing, domain shift is often caused by factors such as illumination conditions, imaging time, imaging sensors, and geographic location. These factors change the spectral characteristics of objects and lead to large intra-class variability [
33]. For example, images obtained from different satellite sensors may have different colors. In addition, two satellites with functionally similar bands (Landsat-8’s SWIR2: 2110–2290 nm; VIIRS’s M11: 2225–2275 nm) cover different wavelength ranges because of differences in their imaging sensors.
(3) As mentioned in the most recent SSDA smoke detection algorithm based on VIIRS images [
34], forest fire smoke is extremely sensitive to certain bands and remote sensing parameters of VIIRS. Parameters such as the M11 band, BT (brightness temperature), and AOD (aerosol optical depth) can help to distinguish smoke from other landscapes. However, existing image super-resolution methods do not reconstruct these sensitive bands with high quality and thus cannot use this information effectively in smoke segmentation. Moreover, the sensitivity of deep learning-based smoke segmentation methods to these parameters is unclear.
To solve the above problems, this research proposes a method for forest fire smoke detection based on super-resolution reconstructed VIIRS images. The main contributions are as follows:
(1) A Landsat-8 multispectral smoke dataset was built independently, covering forest-fire-prone regions worldwide. The seasonal, environmental, and temporal diversity of fire occurrence was taken into account so that the data meet the requirements of the task.
(2) Using Landsat-8 satellite images and a CNN architecture, a network for super-resolution reconstruction of VIIRS images is constructed. To improve the accuracy and perceptual quality of the reconstructed images, the network combines a residual network with an artifact removal module and a channel attention mechanism, yielding reconstructed VIIRS images with both high temporal and high spatial resolution.
(3) Unlike prior evaluations of reconstruction quality, which focused solely on the reconstructed image, super-resolution performance in this paper is assessed and optimized in terms of smoke detection accuracy. In addition, the detection sensitivity to individual bands and to different combinations of remote sensing indices is analyzed to better combine the deep learning method with the characteristics of remotely sensed smoke.
2. Data
2.1. Landsat-8 Multispectral Data
Landsat-8 OLI data are available in scenes of about 185 km × 180 km, indexed by path (parallel to the ground track) and row (parallel to latitude) coordinates in the Worldwide Reference System (WRS-2). With a 16-day repeat cycle, a 10 a.m. equatorial overpass time, and a 15° field of view, Landsat-8 can cover the whole world. The radiometric accuracy of each OLI band is 3%. The products have a geolocation accuracy of 12 m (90% circular error) and can therefore be compared across acquisition dates. The parameters of the bands used in this paper are shown in
Table 1.
This study used satellite images of smoke covering various ground cover types, fire-prone areas, and years (
Table 2). Landsat-8 and VIIRS data were downloaded for the same locations and capture dates in the study areas, which included (i) conifers in high-latitude boreal forests; (ii) mixed coniferous forests within the subtropical evergreen sclerophyll forests of western America; (iii) dry sclerophyll woodlands and open forests in eastern Australia; (iv) tropical rainforests in the Amazon; and (v) subtropical forests in Southeast Asia. In these five regions, 20 sets of Landsat-8 and VIIRS images were collected in total.
Due to climate considerations, forest fires are most prevalent in the spring, autumn, and winter. This study investigated four fire seasons from 2016 to 2020 (
Figure 1).
Both the concentration and the proportion of smoke pixels in a single image vary with the stage of the forest fire. At the beginning of a fire, pixels containing sparsely scattered smoke occupy a small percentage of the image; towards the middle of the fire, pixels containing densely spread smoke occupy nearly the entire image. To build a more accurate recognition model, the fraction of smoke pixels in the images must therefore vary when studying smoke detection in early forest fires.
Figure 2 depicts the ratio of smoke pixels to total pixels in the cropped images used to form the dataset.
2.2. VIIRS Multispectral Data
The Visible Infrared Imaging Radiometer Suite (VIIRS) is a scanning imaging radiometer that captures radiometric images of the Earth’s surface, atmosphere, ice, and oceans in the visible and infrared spectra. It upgrades and extends earlier Earth observation instruments such as the Moderate Resolution Imaging Spectroradiometer (MODIS) and the Advanced Very High Resolution Radiometer (AVHRR). The VIIRS sensor data record (SDR) contains 22 channels spanning visible and infrared wavelengths between 0.41 and 12.01 microns: five imaging-resolution bands (I-bands), sixteen moderate-resolution bands (M-bands), and one panchromatic Day/Night band (DNB). In this study, we used one high-resolution band (I1) and four moderate-resolution bands (M3, M4, M11, M15). The parameters of the bands used in this paper are shown in
Table 3.
To ensure the consistency of the study area, both VIIRS and Landsat-8 images were captured at the same time and location (
Figure 3). In addition, using VIIRS and Landsat-8 true color images, we manually annotated the smoke plume to establish smoke benchmark data and validate the smoke detection results. The total number of manually labeled Landsat-8 smoke pixels is 106,750,000 and the total number of VIIRS smoke pixels is 675,732.
2.3. Datasets for Smoke Detection
Smoke detection results were evaluated using the NOAA S-NPP Data Exploitation (NDE) version of the VIIRS level 2 aerosol products, including the aerosol optical depth (AOD) and aerosol detection (ADP) products [
19]. The VIIRS AOD product provides the aerosol optical depth at 550 nm, defined as the vertically integrated total column extinction at 0.55 µm. This product is derived using the Second Simulation of the Satellite Signal in the Solar Spectrum (6S) radiative transfer model. During the day, the VIIRS ADP product classifies VIIRS pixels as clear, smoke, or dust. For operational smoke detection in the ADP product, two algorithms have been developed, one based on deep blue and one on IR-visible.
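For reference, the column-integrated definition mentioned above is commonly written as follows. The symbols (τ for optical depth, β_ext for the aerosol extinction coefficient, z for altitude, z_TOA for the top of the atmosphere) are notation chosen here for illustration and are not taken from the product documentation.

```latex
\tau(\lambda) = \int_{0}^{z_{\mathrm{TOA}}} \beta_{\mathrm{ext}}(\lambda, z)\,\mathrm{d}z,
\qquad \lambda = 550\ \mathrm{nm}
```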
The scattering-based smoke detection algorithm (SSDA) was also used to evaluate the smoke detection results of this study. SSDA relies mainly on visible light, in particular the blue and green bands of VIIRS. It is founded on the theory of Mie scattering, which occurs when the diameter of an atmospheric particle is comparable to the wavelength of the scattered light. Because smoke particle diameters correspond closely to the blue and green band wavelengths, smoke frequently causes Mie scattering in the VIIRS blue and green bands.
2.4. Datasets for Training and Testing
To develop the new algorithm, smoke samples from fire-prone areas were chosen for training and validation, and the algorithm was tested globally. Images from 20 different areas were downloaded in total. To ensure consistency across study areas, each study area was cropped to 150 km × 150 km. Landsat-8 (30 m) images were saved at 5000 × 5000 pixels, and VIIRS (375 m) images were saved at 400 × 400 pixels.
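The image sizes quoted above follow directly from the study-area extent and the pixel size of each sensor. The short sketch below reproduces that arithmetic; the function and variable names are illustrative and do not come from the authors' code.

```python
def pixels_per_side(extent_km: float, pixel_size_m: float) -> int:
    """Number of pixels along one side of a square study area."""
    return round(extent_km * 1000 / pixel_size_m)


landsat_side = pixels_per_side(150, 30)   # Landsat-8 at 30 m
viirs_side = pixels_per_side(150, 375)    # VIIRS at 375 m
scale_ratio = 375 / 30                    # linear resolution gap between the two sensors

print(landsat_side, viirs_side, scale_ratio)
```

The ratio of 12.5 between the two pixel sizes gives a sense of the spatial detail that separates the two data sources over the same 150 km × 150 km area.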
2.5. Datasets for Super-Resolution Network
Seven Landsat-8 image bands in total were used to train the super-resolution reconstruction model: R, G, B, NIR, SWIR1, SWIR2, and TIRS1. The Landsat-8 images were cropped to 100 × 100 pixels using the sliding window method. Balancing dataset type, location, and image acquisition time, a total of 100,000 images of 100 × 100 pixels from the 20 study areas were selected to build the dataset for image super-resolution reconstruction, of which 90% were used for training and 10% for testing.
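The sliding-window cropping and the 90%/10% split described above can be sketched as follows. This is a minimal illustration using nested lists as images: the stride is assumed equal to the patch size (no overlap) because the text does not state it, and all function names are hypothetical.

```python
from typing import Iterator, List, Tuple


def sliding_windows(height: int, width: int, tile: int, stride: int) -> Iterator[Tuple[int, int]]:
    """Yield the (row, col) top-left corner of every full tile in the image."""
    for row in range(0, height - tile + 1, stride):
        for col in range(0, width - tile + 1, stride):
            yield row, col


def crop(image: List[List[int]], row: int, col: int, tile: int) -> List[List[int]]:
    """Extract one tile x tile patch whose top-left corner is (row, col)."""
    return [line[col:col + tile] for line in image[row:row + tile]]


# Toy 4 x 4 "image" cut into 2 x 2 patches.
toy = [[r * 4 + c for c in range(4)] for r in range(4)]
patches = [crop(toy, r, c, 2) for r, c in sliding_windows(4, 4, tile=2, stride=2)]

# Assumption: stride equals the patch size; the stride is not given in the text.
tiles_per_scene = sum(1 for _ in sliding_windows(5000, 5000, tile=100, stride=100))

# 90% / 10% split of the 100,000 selected patches.
train_count = 100_000 * 90 // 100
test_count = 100_000 - train_count

print(len(patches), tiles_per_scene, train_count, test_count)
```

Under the non-overlapping assumption, one 5000 × 5000 scene yields 2500 patches; an overlapping stride would yield more.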
2.6. Datasets for Smoke Segmentation Network
We used Landsat-8 images in five bands to train the smoke segmentation model: R, G, B, SWIR2, and TIRS1. Images from the various bands were cropped to 250 × 250 pixels using a random cropping method. The crops were screened so that 60% of the images contained smoke, while the remaining images contained other backgrounds (clouds, water bodies, vegetation, bare land, and cities), resulting in a total of 6030 sets of images. To reduce overfitting, the existing images were also augmented: images of 250 × 250 pixels were randomly cropped to 240 × 240 pixels and mirrored vertically and horizontally. Finally, a total of 54,270 images of 240 × 240 pixels were selected to build the dataset for smoke segmentation, of which 70% were used for training, 15% for validation, and 15% for testing.
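The augmentation steps described above (random cropping from 250 × 250 to 240 × 240 pixels, followed by mirroring) can be sketched as below. This is an illustration under assumptions rather than the authors' pipeline: images are plain nested lists, the helper names are hypothetical, and the text does not say how many crops or flips were generated per image. The reported totals do imply a nine-fold increase (54,270 = 9 × 6030), but how that factor is composed is not stated.

```python
import random
from typing import List

Image = List[List[int]]


def random_crop(image: Image, size: int, rng: random.Random) -> Image:
    """Cut a size x size window at a random position inside the image."""
    top = rng.randint(0, len(image) - size)
    left = rng.randint(0, len(image[0]) - size)
    return [row[left:left + size] for row in image[top:top + size]]


def flip_horizontal(image: Image) -> Image:
    """Mirror the image left to right."""
    return [row[::-1] for row in image]


def flip_vertical(image: Image) -> Image:
    """Mirror the image top to bottom."""
    return [row[:] for row in image[::-1]]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
source = [[r * 250 + c for c in range(250)] for r in range(250)]

cropped = random_crop(source, 240, rng)
variants = [cropped, flip_horizontal(cropped), flip_vertical(cropped)]

# Ratio between the augmented and the screened dataset sizes given in the text.
augmentation_factor = 54_270 // 6030
```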
6. Conclusions
This paper proposes a smoke detection method for low spatial resolution satellite remote sensing images that integrates an image super-resolution reconstruction network with a smoke segmentation network. The aim is to exploit data with high temporal resolution (24 h) but low spatial resolution (375 m) for accurate and timely detection of early forest fire smoke, approaching the results obtained with high spatial resolution satellite images. First, a multispectral remote sensing smoke dataset covering various years, seasons, geographies, and land cover types was created. Second, a super-resolution network for satellite remote sensing images was proposed, which combines the super-resolution reconstruction network with a channel attention mechanism and an artifact removal module, and it was validated experimentally against previous approaches. Third, experiments were carried out to determine the effect of the super-resolution reconstruction module on smoke segmentation results, as well as the sensitivity to various band combinations and remote sensing indices of multispectral data. The results show that smoke segmentation on the reconstructed satellite image (93.75 m) is remarkably similar to that on the high-resolution satellite image (30 m), with a Jaccard index of 0.742. The domain shift that arises during smoke segmentation of super-resolution reconstructed remote sensing images is also resolved through deep domain adaptation. Finally, the M11 band of VIIRS images is a sensitive band for smoke segmentation with deep learning algorithms and can significantly aid smoke pixel segmentation in VIIRS images. For the smoke segmentation task, the reconstruction accuracy of super-resolved remote sensing images is more crucial than their perceptual quality, and segmentation is also more sensitive to it.
However, this work has limitations. First, the distortion of remote sensing signals introduced by super-resolution might cause a chain of errors in subsequent data processing. Second, the reason for the favorable effect of sensitive bands on smoke detection needs further research, which will also aid the discovery of new remote sensing parameters suitable for smoke detection. Third, further studies are needed on how to combine remote sensing characteristics with deep learning algorithms to segment smoke more precisely in remote sensing images and to use the resulting smoke areas for fire spot identification and for smoke pollution assessment and analysis.
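For readers who want to reproduce the comparison metric, the Jaccard index (intersection over union) between two binary smoke masks can be computed as in the following minimal sketch. The function name and the toy masks are illustrative assumptions; they are not the authors' code or data.

```python
from typing import Sequence


def jaccard_index(pred: Sequence[Sequence[int]], ref: Sequence[Sequence[int]]) -> float:
    """Intersection over union of two binary masks of equal shape (1 = smoke)."""
    intersection = 0
    union = 0
    for pred_row, ref_row in zip(pred, ref):
        for p, r in zip(pred_row, ref_row):
            if p and r:
                intersection += 1
            if p or r:
                union += 1
    # Two empty masks agree perfectly, so return 1.0 instead of dividing by zero.
    return intersection / union if union else 1.0


# Hypothetical masks: prediction on a reconstructed image vs. a reference mask.
predicted = [[1, 1, 0],
             [0, 1, 0]]
reference = [[1, 0, 0],
             [0, 1, 1]]

score = jaccard_index(predicted, reference)
print(score)
```

A value of 1.0 means the two masks coincide exactly, and 0.0 means they share no smoke pixels.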