Application of DINCAE to Reconstruct the Gaps in Chlorophyll-a Satellite Observations in the South China Sea and West Philippine Sea

Han, Zhaohui; He, Yijun; Liu, Guoqiang; Perrie, William

doi:10.3390/rs12030480

Open AccessArticle

Application of DINCAE to Reconstruct the Gaps in Chlorophyll-a Satellite Observations in the South China Sea and West Philippine Sea

¹

School of Marine Sciences, Nanjing University of Information Science and Technology, Nanjing 210044, China

²

Laboratory for Regional Oceanography and Numerical Modeling, Qingdao National Laboratory for Marine Science and Technology, Qingdao 266237, China

³

Fisheries and Oceans Canada, Bedford Institute of Oceanography, Dartmouth, NS B2Y 4A2, Canada

^*

Author to whom correspondence should be addressed.

Remote Sens. 2020, 12(3), 480; https://0-doi-org.brum.beds.ac.uk/10.3390/rs12030480

Submission received: 27 November 2019 / Revised: 22 January 2020 / Accepted: 31 January 2020 / Published: 3 February 2020

(This article belongs to the Special Issue AI-based Remote Sensing Oceanography)

Download

Browse Figures

Versions Notes

Abstract

:

The Data Interpolating Empirical Orthogonal Functions (DINEOF) method has demonstrated usability and accuracy for filling spatial gaps in remote sensing datasets. In this study, we conducted the reconstruction of the chlorophyll-a concentration (Chl-a) data using a convolutional neural networks model called Data-Interpolating Convolutional Auto-Encoder (DINCAE), and we compared its performance with that of DINEOF. Furthermore, the cloud-free sea surface temperature (SST) was used as a phytoplankton dynamics predictor for the Chl-a reconstruction. Finally, four reconstruction schemes were implemented: DINCAE (Chl-a only), DINCAE (Chl-a and SST), DINEOF (Chl-a only), and DINEOF (Chl-a and SST), denoted rec1, rec2, rec3, and rec4 respectively. To quantitatively evaluate the accuracy of these reconstruction schemes, both the cross-validation and in situ data were used. The study domain was chosen to be the Northern South China Sea (SCS) and West Philippine Sea (WPS), bounded by 115–125°E and 16–24°N to test the model performance for the reconstruction of Chl-a under different Chl-a controlling mechanisms. The in situ validation showed that rec1 performs best among the four reconstruction schemes, and that adding SST into the Chl-a reconstruction cannot improve the reconstruction results. However, for cross validation, adding SST can slightly improve spatial distributions of the root mean square error (RMSE) between the reconstructed data and the original data, especially over the SCS continental shelf. Furthermore, the potential of DINCAE prediction is confirmed in this paper; thus, the trained DINCAE model can be re-applied to reconstruct other missing data, and more importantly, it can also be re-trained using the reconstructed data, thereby further improving reconstruction results. Another consideration is efficiency; with similar reconstruction conditions, DINCAE is 5–10 times faster than DINEOF.

Keywords:

DINEOF; DINCAE; reconstruction; chlorophyll-a; neural networks

Graphical Abstract

1. Introduction

Satellite-based monitoring of oceanic biogeochemical variables, such as chlorophyll-a concentration (Chl-a), often suffers from clouds, sun glint, thick aerosols, and other problems. The probability is only 25–30% to obtain cloud-free conditions over global oceans for near-daily coverage of MODIS measurements [1], and only ≈5% of pixels are high-quality [2]. This limitation strongly reduces the potential applications of ocean color observations. Therefore, missing data are a crucial problem when working with Chl-a satellite data (Chl-a_sat). Currently, many different methods have been developed to solve this problem by filling the missing values, ranging from simple linear interpolations and bicubic splines to the use of complicated statistical methods, such as optimal interpolation (OI) and Kriging. In the past, OI methods have been widely used to merge the daily multi-sensor data, especially sea surface temperature (SST) [3,4,5].

Because OI needs complex priori information [6], the application of OI in Chl-a reconstruction is limited. Saulquin et al. [7] presented daily cloudless Chl-a fields with a resolution of 1.1 × 1.1 km² over the Ireland–Biscay–Iberia Regional Ocean Observing System area, based on an improved Kriging scheme. However, at present, one of the most widely used methods in Chl-a reconstruction is DINEOF (Data Interpolating Empirical Original Functions), presented by Beckers and Rixen [8] and Alvera-Azcárate et al. [9]. DINEOF is a self-consistent and parameter free piece of technology for the reconstruction of gaps in data, and provides results that are statistically similar to OI reconstruction methods, but 30 times faster [9,10].

The growth of phytoplankton is mainly controlled by the concentration of nutrients. Some upper ocean physical processes, such as upwelling, cold eddies, or turbulent advection, can bring limited quantities of nutrients into the euphotic layer, thereby promoting phytoplankton reproduction [11,12,13]. These physical processes are often accompanied by colder SST, lower sea surface height (SSH), or higher wind speeds than what is experienced in surrounding areas. Therefore, SST, SSH, and wind speed have strong correlations with Chl-a [14,15], and can be used as auxiliary variables in the reconstruction of Chl-a. The successful application of multivariate DINEOF in multiple sea areas [16,17,18,19] has proven the effectiveness of using such auxiliary variables to improve Chl-a reconstruction.

All the reconstruction methods mentioned above have shown significant performances in their respective study areas. However, those methods that are based on linear interpolation techniques can only handle linear relationships in temporal and spatial domains, and therefore lose some non-linear features. Machine learning techniques are a potential method to retain more small-scale structures during the reconstruction process, because of their ability to handle non-linear relationships and complex interactions, which often occur in ecological data [20,21,22]. Recently, Barth et al. [23] proposed a reconstruction method based on convolutional neural networks (CNN), called DINCAE (Data-Interpolating Convolutional Auto-Encoder). The CNN methodology used in DINCAE can effectively reduce the dimensionality of the input data, which is therefore similar to the Empirical Orthogonal Functions (EOFs) projection in the DINEOF approach [24]. However, when we consider work with large and complete datasets (i.e., no gaps, or almost no gaps) as input data, unlike typical neural networks, DINCAE can be used to extract nonlinear relationships from input data that have a large number of missing data. In their pioneering article, Barth et al. applied the DINCAE method to SST reconstruction of the Provencal Basin and obtained better results than those of the DINEOF reconstruction (cross-validation). Meanwhile, other machine learning methods have also been proposed to preserve more nonlinear, stochastic features in the reconstructed fields, such as Self Organizing Maps [25,26], a wavelet neural Network [27], and other neural network methods [28,29].

The purpose of this paper is to apply DINCAE to the reconstruction of Chl-a for the first time, and therefore to investigate the influence of extended input data (adding SST) on the reconstruction accuracy of DINCAE, and to compare the performance of DINEOF and DINCAE in the reconstruction of Chl-a. This work is organized as follows: The data used in this study is described in Section 2. Section 3 introduces the DINEOF and DINCAE methods and the parameter settings that are applied in this study. Section 4 compares the reconstruction results of DINEOF and DINCAE. The advantages and limitations of DINCAE for Chl-a reconstruction are discussed in Section 5. Finally, the conclusions are presented in Section 6.

2. Materials

2.1. Study Area

The Northern South China Sea (SCS) and West Philippine Sea (WPS), bounded by 115–125°E and 16–24°N, were selected to test the model performance for the reconstruction of Chl-a (Figure 1b). SCS is the largest oligotrophic marginal sea in the tropical and subtropical Pacific Ocean and is connected to the WPS only through Luzon Strait. There are different mechanisms controlling the Chl-a distributions in SCS and WPS, and this causes the mean chlorophyll concentration of SCS to be twice the concentration of the adjacent oligotrophic open ocean, including the upstream areas of the Kuroshio and WPS, as shown in Figure 2a; these statements are consistent with previous studies [14,30,31]. In SCS, the Chl-a distribution has distinct seasonality caused by monsoon winds [32,33], whereas in WPS, Chl-a has indistinct seasonality and variability is considered to be relatively small, with an essentially constant nutrient supply from the deep ocean throughout the year [34]. Therefore, we can test the performances of various models for the reconstruction of Chl-a under different Chl-a production mechanisms.

2.2. Chl-a

The satellite obtained sea surface chlorophyll-a (Chl-a_sat) data used in this study are a daily merged product obtained from the GlobColour project (http://globcolour.info). The data presented are a weighted average of multiple sensors (MODIS and VIIRSN (Visible Infrared Imaging Radiometer Suite)) with a resolution of ≈4 km for a period from 1 January 2015 to 31 December 2018. These data have been developed, validated, and distributed by ACRI-ST, France. For the domain of study, the size of the images is 240 × 192 pixels. Due to cloud cover, 82% of the sea pixels are missing in this dataset. The temporal variability of total missing data presents an irregular distribution with a minimum daily spatial average of about 30–50% and a maximum daily spatial average of more than 90% (Figure 1a). Spatially, as shown in Figure 1b, regions with the highest percentages of missing data are the coastal waters of Mainland China and ocean waters to the east of Taiwan, with an average data loss higher than 85%. Regions with the lowest percentage in missing data are the coastal waters off the Western Philippine Islands, with an average data loss between 50 and 70%. In the remaining regions, the average percentage of missing data varies from 70 to 85%. In order to obtain reliable results, images with cloud cover larger than 98% were not used for the analysis, which is the same threshold as that of Alvera-Azcárate et al. [35], so the final temporal size was 1268 images (out of an initial 1461 images).

2.3. SST

This study is different from other multivariate DINEOF Chl-a reconstruction methodologies, which use gapped SST satellite images, such as Alvera-Azcárate et al. [14] and Damien Sirjacobs et al. [15]. We used cloud-free daily MW-IR SST data spanning from 1 January 2015 to 31 December 2018, which can provide more effective information for Chl-a reconstruction. This dataset was generated by Remote Sensing Systems (RSS), using multiple sources of satellite data and optimal interpolation (OI) (www.remss.com/measurements/sea-surface-temperature/oisst-description/). The horizontal resolution was ≈9 km for SST, which was interpolated to the Chl-a grid (4 km).

2.4. Cross-Validation Data

There are two types of cross-validation datasets in this paper. One was used to optimize the reconstruction algorithm by minimizing the expected error between the cross-validation points and their reconstructed values, iteratively, denoted “test data” in Figure 3. The other dataset was considered to be the “missing data” in the whole reconstruction process, thereby considered to be independent in the evaluation of the performance of the reconstruction method, denoted “CV-data” in Figure 3. The CV-data were extracted following the methodology of Barth et al. [23]. First, the cloud masks from the first 50 images of the Chl-a time series were extracted. Second, the extracted cloud masks were “put” on the other 50 images, which were randomly selected from the rest of the Chl-a time series. In this way, a total of 286,461 measurements (0.55% of the total sea pixels and 2.57% of the total measurements) were withheld. The test data of the DINEOF methodology were extracted followed Alvera-Azcárate et al. [35], whereby the 20 cleanest images of the “Input data” in Figure 3 were covered using the clouds extracted from the rest of the images. Thus, a total of 434,834 measurements (0.83% of the total sea pixels and 4.01% of the valid pixels in the input data) were withheld. The test data for the DINCAE method are introduced in Section 3.2.

Figure 4 shows the temporal and spatial distributions of the CV-data. In terms of temporal distribution, the CV-data are randomly distributed among different days of the year. The ratio of the number of CV-data in each image to ocean pixels (41,098 points) is 0–60%. In terms of spatial distribution, CV-data are more densely distributed over the sea area near the Philippines and less densely distributed over the South China Sea continental shelf and the waters off the east coast of Taiwan, but generally cover the research area.

2.5. In Situ Dataset

In situ surface Chl-a data (called Chl-a_situ) from 6 to 17 August 2015 were collected during the Kuroshio experiment in the summer of 2015. The total number of surface samples was 21 from 15 in situ stations, the positions of which are shown by the blue circles in Figure 1b. For every in situ sample, the Chl-a satellite image with the shortest time interval (to the collection time of the in situ data sample) was selected, in order to search for the nearest grid cell relative to the location of the field measurement. In order to make the in situ validation more realistic, only cloudy satellite Chl-a matchups (20 samples) were used in the following calculations.

3. Methods

In this study, a total of four reconstruction schemes were implemented by combining the two reconstruction techniques and two input datasets, as shown in Table 1. The same CV-data were chosen to compare the performance of these four reconstruction schemes. The test data used to verify the reconstruction effort during the iteration were randomly generated internally by the reconstruction algorithm, shown in Figure 3 and Figure 5.

Additionally, Chl-a_sat data were transformed to base-e logarithm (ln) values, following the approach used by Hilborn and Costa [36]. The following discussion refers to ln(mg m⁻³) unless otherwise stated. The root mean square error (RMSE) and bias used in this paper are defined as follows:

RMSE = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} {[\ln ({y^{'}}_{i}) - \ln (y_{i})]}^{2}},

(1)

Bias = \frac{1}{n} \sum_{i = 1}^{n} [\ln ({y^{'}}_{i}) - \ln (y_{i})],

(2)

where n is the number of samples,

y_{i}

is the satellite derived value, and

{y^{'}}_{i}

is the reconstructed value.

3.1. DINEOF

Before making the calculations, the spatial and temporal means were removed from the Chl-a_sat and SST_sat data separately, and results were normalized; missing data were set to zero. The missing data were then calculated by an iterative procedure. For a detailed description of DINEOF, please refer to Alvera-Azcárate et al. [9] and Alvera-Azcárate et al. [16]. A filter of the temporal covariance matrix was applied to improve the temporal coherency of the results, following Alvera-Azcárate et al. [35]. This filter effectively keeps the coherency in time to ensure a smooth transition between subsequent images. The filter length was 1.1 days. In this study, DINEOF was implemented using the 3.0 Linux binary available through the University of Liège GeoHydrodynamics and Environment Research group (GHER) [37].

The optimal number of EOFs retained for the reconstruction was determined by cross validation using the “test data” in Figure 3, and the convergence threshold was set to 0.001. The expected maximum for the number of modes was 72. DINEOF has two options to reconstruct missing data; fully reconstructed images and filled images. The former were activated in this study. Therefore, all ocean color data were reconstructed at every ocean pixel (including non-missing pixels).

3.2. DINCAE

The DINCAE method was presented by Barth et al. [23]. This method can be trained using incomplete satellite observations, in order to reconstruct missing observations and to provide an error estimate for the reconstruction. The open source DINCAE toolbox based on Python is available at https://github.com/gher-ulg/dincae. The specific methodology used for the DINCAE reconstruction is shown in Figure 5. It can be divided into three phases: data preprocessing, the training phase, and the reconstruction phase.

At the data preprocessing stage, the respective time averages of Chl-a_sat and SST_sat are subtracted in order to obtain the Chl-a and SST anomalies. Thereafter, Chl-a and SST anomalies are scaled by the inverse of the error variance which is effectively a constant (zero for the data that are missing). The precise value of this constant is not important because this constant will be multiplied by a weight matrix which will be optimized during the network training [23]. In this study, the parameters of the input dataset in rec2 (rec1) are given in Table 2, and the total input dataset size was 13 (10) × 240 × 192 × 1124.

During the training phase, the input data (see Figure 5) are used to train the DINCAE neural network iteratively. A benefit of the neural network training is that GPU (graphical processing unit) parallel computing can be applied, and the training speed is effectively accelerated. However, due to limits in GPU memory, the input dataset was randomly shuffled (over the time dimension) and then cut into minibatches that included 50 images, as an array of size 13 × 240 × 192 × 50. The complete time series was split into 26 minibatches with 50 images each and one last minibatch with only 18 images (representing a total of 1268, as mentioned before). An optimization cycle using all 26 minibatches is called an epoch. In order to prevent the minimization algorithm from being trapped in a local minimum, for each minibatch, the neural network must mark some valid data values as “missing” in order to test the training effect (test data), and the training itself uses the remaining valid data (training data) randomly. Nevertheless, the test data is eliminated in the reconstruction phase. Thus, the networks of DINCAE are trained using these training data, and the structure of DINCAE can be divided into five parts: an input layer, encoding layers, fully connected layers, decoding layers, and an output layer:

The input layer is used to receive the training datasets (training phase) and test datasets (reconstruction phase).
Encoding layers are equivalent to the singular value decomposition (SVD) in DINEOF, which can effectively reduce the dimensionality of the input data in order to simplify the complexity of the required computing used in the neural networks. The work of the convolution layer is to extract the features of the input data, where the number and scale of these features depend on the depth of the convolution layer, which is also called the “convolution kernel size”. The work of the pooling layer is to compress the features extracted by the convolution layers, and the compression degree is determined by the type, size, and strides of the pooling layer. In this study, the size of convolution kernel was (3 × 3), and the maximum pooling was chosen, where pool size = (2,2) and strides = (2,2).
Fully connected layers are equivalent to the hidden layer in the traditional feedforward neural network. Their function is to combine the extracted features nonlinearly. There were two full connection layers in this study, which were N/5 (rounded to the nearest integer) and N, where N is the number of neurons in the last pooling layer of the encoder.
Decoding layers are composed of convolution layers and interpolation layers (nearest neighbor interpolation) to upsample the results. The interpolation layers are skip connected to the output of the pooling layers in order to capture small-scale structures that are lost in the encoding layers and fully connected layers [38].
The output layer gives the results; that is, an array $T_{i j k}$ with a size of 240 × 192 × 2, including two parameters:
- $T_{i j 1}$ : Chl-a scaled by the inverse of the expected error variance;
- $T_{i j 2}$ : logarithm of the inverse of the expected error variance.

There are five encoding layers and five decoding layers in this work and their filter size is the same as that used by Barth et al. [23], hereafter referred to as the Barth scheme.

Similar to a DINEOF reconstruction (Figure 3), the CV-data in a DINCAE reconstruction are also not used during the reconstruction phase, as shown in Figure 5. In this study, the reconstruction was performed every 10 epochs to obtain the effect of reconstruction, by cross-validation. The rebuilt Chl-a anomaly

{\hat{c h l}}_{i j}

and the corresponding error variance

{\hat{σ}}_{i j}

are computed as:

{\hat{σ}}_{i j}^{2} = \frac{1}{\max (\exp (\min (T_{i j 1}, γ)), δ)},

(3)

{\hat{c h l}}_{i j} = T_{i j 2} {\hat{σ}}_{i j}^{2},

(4)

where

γ = 10

and

δ = 10^{- 3} \ln {(mg \cdot L^{- 1})}^{- 2}

, the same as those used by Barth et al. [23]. The two constants are introduced to avoid a division by a value close to zero.

Overfitting is a common problem in neural network training, as it leads to degraded results. In order to avoid overfitting, there are four strategies carried out during DINCAE training:

(1): A drop-out layer is introduced between the fully connected layers, and is randomly inactivated at a given rate (hereafter referred to as the dropout_rate, to be used later) of neurons in the fully connected layers.
(2): Gaussian-distributed noise is added to the input data with a zero mean and a given standard deviation (hereafter referred to as jitter_std, to be used later).
(3): A regularization technique is added to the Adam optimizer [39]. The basic idea of regularization is to add a penalty term to the loss function to punish the large weight and reduce the complexity of the model and prevent overfitting. In this study, the regularization parameter is set by $ϵ = 10^{- 8} + L 2 \cdot β_{L 2}$ , where L2 is the regularization term, $β_{L 2}$ is the weight for L2, and the value is given as L2_beta in the following discussion. Other standard parameters for the Adam optimizer are the learning rate $α$ = 0.001 and the exponential decay rates for the first moment estimates $β_{1}$ = 0.9 and the second moment estimates, $β_{2}$ = 0.999.
(4): The activation function in the convolution layers uses Leaky-RELU [40] instead of the rectified linear unit (RELU) to prevent gradients from falling too fast:

$f (x) = {\begin{array}{l} α x, (x < 0) \\ x, (x > = 0) \end{array}, α = 0.2 .$

(5)

Note that during the reconstruction phase, drop-out and Gaussian-distributed noise are disabled.

4. Results

For DINEOF reconstruction, the optimal numbers of EOFs kept for reconstruction are 23 and 26 for rec3 and rec4 respectively, minimizing the expected errors to 0.35 and 0.36 ln(mg m⁻³), respectively, as shown in Figure 6. These optimal EOFs can explain 98.74% and 98.59% of the initial variance for rec3 and rec4, respectively, which are calculated as the variance of points not covered by clouds.

DINCAE is different from DINEOF. Whereas the DINCAE method is a new technique, DINEOF has proven its generality by many studies [10,17,18,19]. Although Barth et al. [23] obtained good results in their SST reconstruction using their EOF scheme with dropout_rate = 0.3, jitter_std = 0.05, and L2_beta = 0, its robustness should be further tested in many future studies. In this study, we took the Barth scheme as a reference scheme to find the optimal combination of dropout_rate, jitter_std, and L2_beta for rec2, proceeding in a step-by-step methodology.

First, an optimal jitter_std was chosen from (0.05, 0.10, 0.15). The specific method is that we let dropout_rate = 0.3 and jitter_std = 0, the same as in the Barth scheme, but we changed jitter_std only. The RMSE between the CV-data and the corresponding reconstruction data was computed every 10 epochs. The results show that the RMSE is smallest when jitter = 0.05 (Figure 7a). We update jitter in the reference scheme, and here we kept the jitter to 0.05, and then, in a similar way, we then chose the optimal dropout_rate and L2_beta from (0.3, 0.9, 1.2) and (0, 0.01, 0.001, 0.0001), respectively. Finally, we obtained an optimal combination; that is, jitter_std = 0.05, dropout_rate = 0.9, and L2_beta = 0.001. Applying the same parameterization selection process to rec1, we obtained the same optimal combination as rec2. The results for rec1 and rec2, which use the optimal combination, are shown in Figure 7d. Both RMSE solid lines have an initial sharp decrease, and after 200 epochs they mostly stabilize, but still present some fluctuations. The dotted blue and orange lines represent the average reconstructions between epochs 200 and 1000 for rec1 and rec2 respectively. The results show that the performance of rec2 is slightly better than that of rec1.

In this study, the computing environment was an Intel Core E5-2643 V4, GeForce GTX1080Ti, with the tensorflow v1.2 network library [41]. The calculation times for the four reconstruction schemes from rec1 to rec4 were 5, 5.5, 24.2 and 61.9 h, respectively. Due to the GPU acceleration in computation, the calculation time of DINCAE (1000 epochs) was an order of magnitude smaller than that of DINEOF. As shown in Figure 5, the test data in each minibatch were randomly masked as clouds. This random selection ensures the fairness of the neural network training, but also leads to some fluctuations in the gradients, and therefore, in the weights of DINCAE. Averaging the output over different epochs can effectively eliminate these random fluctuations, which was shown by Barth et al. [23]. In this study, a total of 80 outputs of the neural network between epoch 200 and epoch 1000 (saved every 10th epoch) were averaged, and this averaged reconstruction is slightly better than any intermediate result (dotted lines in Figure 7d). In the analysis, we took those averaged fields to represent the DINCAE reconstructions.

4.1. Reconstruction Statistics Using CV-Data

The scatterplot between the CV-data (Chl-a_sat) and the corresponding Chl-a_rec model results are shown in Figure 8. Comparisons are conducted at all CV-data (286,461). The scatter distribution of rec1 and rec2 results exhibits the highest pixel density along the “ideal” dotted lines, especially in the range of 0.1–1.0 mg m⁻³, while the scatters for rec3 and rec4 results are more evenly distributed on both sides of the “ideal” dotted line. It is evident that the Chl-a_rec results are notably biased low at higher concentrations (>10 mg m⁻³) for rec1 and rec2, whereas all four reconstruction schemes are biased high at lower concentrations (<0.02 mg m⁻³). The statistical results for CV-data correlated with the corresponding Chl-a_rec results, for all reconstruction implementations, show that the RMSEs between the reconstructed data and original data are 0.28, 0.27, 0.30, and 0.33 ln(mg m⁻³) for rec1, rec2, rec3, and rec4, respectively. The corresponding linear correlation coefficients (R) are 0.95, 0.95, 0.94, and 0.93 respectively, and the corresponding mean biases are −0.008, 0.003, 0.006, and 0.007 ln(mg m⁻³), respectively. Obviously, rec2 performs best overall, producing the lowest RMSE (0.27), the highest linear correlation coefficient (R) (0.95), and the bias nearest to zero (0.003).

Spatially, the RMSE values calculated between CV-data and the corresponding reconstruction values, for each pixel time series, are shown in Figure 9. These four reconstruction schemes have similar RMSE spatial patterns, with larger RMSE values over the SCS continental shelf and coastal waters, compared to smaller RMSE values over the open ocean. It is notable that adding SST to the DINCAE reconstruction can reduce the RMSE over the shelf area (Figure 9a,b). By contrast, the RMSE over the shelf area increases when SST is added to the DINEOF reconstruction (Figure 9c,d).

As mentioned in Section 2.1, the chlorophyll concentration controlling mechanisms are very different for SCS and WPS. For simplicity, we partitioned the study area into SCS and WPS by the 120-degree longitude line, in order to assess the model performance for the Chl-a reconstructions under different Chl-a production mechanisms. Table 3 shows the numbers and the mean of cross-validation points between these areas, SCS and WPS, and provides the statistical results for the four reconstruction schemes in these two sub-areas. The cross-validation points in the SCS and WPS are not much different, specifically 145,476 and 140,985 points respectively, whereas the average chlorophyll concentration for SCS is three times that of WPS (0.34 mg m⁻³ vs. 0.11 mg m⁻³). The comparison shows that, in general for all four reconstruction schemes, higher productivity is achieved in SCS than in WPS. Similar to the global reconstruction results shown in Figure 8, DINCAE is superior to DINEOF in these two sub-areas, producing lower RMSE and higher R. Specifically, DINCAE reconstructions are slightly improved by adding SST to the input data in SCS, as shown by comparing rec1 and rec2, producing lower RMSE values (from 0.26 to 0.25), higher R values (from 0.95 to 0.96), and biases that are nearest zero (from −0.027 to −0.019). However, the improvement from adding SST in DINCAE reconstructions disappears in WPS. For all reconstruction implementations (Table 3), the intercepts are less than 0 and slopes are less than 1, meaning that the reconstructed Chl-a values are underestimated at high concentrations, and overestimated at low concentrations.

4.2. Spatial Distribution of Reconstruction for a Specified Day

To further verify the ability of spatial reconstruction for these four reconstruction cases, we selected 17 May 2018, which has the most points in 50 CV-data figures to make comparisons between the spatial distributions of the satellite-derived and the modeled Chl-a. The corresponding SST distribution which was used as the auxiliary variable in rec2 and rec4 is also shown in Figure 10g. The satellite image, with more than 90% gaps over the whole study domain, was used in the reconstruction by DINCAE and DINEOF methodologies. These reconstructed fields were generated from the artificial cloud dataset Chl-a_sat+cloud and have a reasonably accurate spatial distribution, with patterns that are also indirectly exhibited by those of Chl-a_rec (see Figure 10a,c–f). As such, all four reconstruction schemes can reconstruct the high Chl-a variations (bright yellow color in Figure 10a–f) exhibited in the Taiwan Bank Upwelling and Dongshan Upwelling [12], where very strong upwelling brings cold deep water to the surface, leading to the relatively low SST shown in Figure 10g. Mesoscale features such as filaments associated with the Kuroshio near the Luzon Strait are evident, as shown in Chl-a and SST distributions (Figure 10a,g), and were also revealed in the four reconstruction methodologies (Figure 10c–f).

In order to clearly compare the spatial differences between the four modeled Chl-a and the remotely sensed Chl-a, we introduce the spatial distribution of relative error (Re_err) defined as (Chl-a_rec-Chl-a_sat)/Chl-a_sat, shown in Figure 11. Obviously, the Re_err values for rec1, rec2, and rec4 are much larger than those of rec3 near the Han River discharge area (22°N, 115°E). A possible reason is that the Chl-a distribution is greatly affected by runoff and human activities near the river plume, leading to low correlation coefficients (−0.15, 0.15) between Chl-a and SST, as shown in Figure 2b. DINCAE may not capture the characteristics of the Chl-a distribution in this area, or there may be overfitting. Furthermore, adding auxiliary data such as SST into the reconstruction models may not improve the results, and may even make them worse, given the near-zero correlation coefficients between Chl-a and SST in this area. For the remaining area, the performance of DINCAE is better than DINEOF, especially in the basin areas. Furthermore, all four methods perform well in terms of scatterplots and statistical results, as shown in Figure 12. Moreover, rec2 has the best overall results, producing the lowest RMSE (0.20), with intercept nearest to zero (−0.14), highest R (0.96), and slope closest to 1.0 (0.94).

4.3. Validation with In Situ Observations

A total of 20 samples were used to assess the performance of the Chl-a reconstruction algorithms. As shown in Figure 13, the Chl-a_rec of the four reconstruction schemes generally agree well with in situ Chl-a. Among them, the fitted line for rec1 (the blue dotted line in Figure 13) is closest to the ideal line (the grey solid line in Figure 13), with R, slope, and intercept values of 0.925, 0.81, and −0.16 ln(mg m⁻³), respectively (shown in Table 4). For these four reconstruction schemes, the slopes of the linear fit between Chl-a_situ and Chl-a_rec are all less than 1, with negative intercepts and high correlation coefficients (R > 0.91), indicating the trend that reconstructed Chl-a values appear to be overestimated for lower values (<0.4 mg m⁻³) and underestimated for higher values (0.4–1.6 mg m⁻³); this pattern is similar to that shown by the cross validation mentioned above (Figure 8 and Figure 12). One possible reason for this result is the smoothness of the reconstruction method, which makes the reconstructed values in the range of extreme values shift to the normal values. It is not expected that in the comparisons of the two pairs of univariate and multivariable reconstruction schemes, viz., in comparing rec1 vs. rec2 and rec3 vs. rec4, both univariate schemes perform better than the corresponding multivariable schemes, with a lower RMSE (0.38 vs. 0.42 and 0.38 vs. 0.44), a higher R (0.925 vs. 0.917 and 0.921 vs. 0.917), a slope nearer to 1 (0.81 vs. 0.66 and 0.75 vs. 0.63), and an intercept closer to 0 (−1.6 vs. −0.46 and −0.31 vs. −0.47).

5. Discussion

In this paper, both CV-data and in situ data were used to assess the performances of four reconstruction schemes. Among them, the CV-data were marked as clouds and never used in the whole reconstruction process, so that those data could be considered independent. However, since the CV-data and input data were from the same dataset (see Figure 3), compared with other truly independent data, the distribution of the CV-data was obviously more similar to the input data. Therefore, the cross-validation results were expected to be unrealistically good when using these CV-data as the cross validation dataset. However, as mentioned above, the shortage of CV-data can be ignored when using the same CV-data to compare the performances of the four reconstruction schemes. For the in situ validation, when matching the field—measured samples with the Chl-a image sequence, using a large time window (12 h) may lead to a slight bias in the field evaluation, but this was not expected to qualitatively alter our conclusions.

The methodology of the DINEOF cross-validation dataset selection (test data in Figure 3) is also very important. Usually, randomly-selected cross-validation points will lead to quite a high number of retained EOFs and an unrealistically low cross-validation RMSE. However, using point clusters instead of random points as cross-validation data can reduce the number of retained EOFs and also improve the reconstruction results by removing high EOF modes which mostly contain noise.

The trained DINCAE, discussed in the previous sections, has been saved and is denoted rec1-2015 (and rec2-2015). The latter can be reloaded to reconstruct additional datasets that originate from the same source as trained data, but at different times, such as the dataset obtained for January-to-October 2019; this process is called DINCAE-prediction. In order to capture the capability of DINCAE-prediction, we extracted CV-data-2019 from a dataset from 2019 and used the remaining dataset, denoted input-2019, as input data for reconstruction, using the DINCAE method. The CV-data-2019 selection is the same as the test data selection described in Figure 5, but now covers the 10 cleanest images of input-2019, using the clouds extracted randomly from the rest of the images. Thus, a total of 214,121 measurements were withheld.

We used the models denoted rec1-2015/rec2-2015, trained with 900 epochs, using the dataset of the time period, 2015–2018, to reconstruct the missing Chl-a values of input-2019. The scatterplots between the CV-data-2019 and the corresponding predicted data using rec1-2015 and rec2-2015 are shown in Figure 14a,b. The predicted data agree well with CV-data-2019, with RMSE, R, slope, and intercept values of 0.31/0.32, 0.9/0.9, 0.76/0.77, and −0.59/−0.46 for rec1-2015/rec2-2015, respectively. Furthermore, the models of rec1-2015 and rec2-2015 can be retrained and updated using a new dataset. Here, we used input-2019 to retrain rec1-2015 and rec2-2015 iteratively for 1000 epochs, using an average of 200–1000 epochs as the reconstruction values. For consistency, the latter models are referred to as rec1-2019 and rec2-2019. The scatterplots between the CV-data-2019 and the corresponding reconstructed data using rec1-2019 and rec2-2019 are shown in Figure 14c,d. Retraining DINCAE further improves the performance of the DINCAE reconstruction methodology, with RMSE, R, slope, and intercept values of 0.26/0.26, 0.93/0.93, 0.81/0.80, and −0.44/−0.44 for rec1-2019/rec2-2019, respectively.

Both validations used CV-data and in situ data, proving that DINCAE (rec1 and rec2) performs better than DINEOF (rec3 and rec4) in Chl-a reconstructions. See Figure 8 and Figure 13 for details. The same results were obtained by Barth et al. [23] in SST reconstructions. Therefore, it is reasonable to suggest that DINCAE is a valid generic reconstruction method like DINEOF. An unexpected result is that the performances of the two multivariate reconstructions (rec2 and rec4) are worse than those of the two univariate reconstructions (rec1 and rec3) with respect to in situ validation (see Figure 13 and Table 4). Compared with the cross-validation dataset, the number of in situ samples in this paper was too small (20 samples), and the measured time is too short, too concentrated (6 to 17 August 2015). Although the in situ validation is more convincing, the cross-validation in this study can better reflect the overall reconstruction effect of the missing dataset. In this study, we did not filter the outliers, which may slightly influence the performance of reconstruction methodologies. But this is not the key focus of this study.

6. Conclusions

In this study, we successfully applied DINCAE to reconstruct a series of Chl-a satellite products over the Northern South China Sea (SCS) and the West Philippine Sea (WPS). To our knowledge, this is the first time that the DINCAE technique has been used to reconstruct gaps in the Chl-a_sat data. We also compared the reconstruction performances of DINCAE and DINEOF using Chl-a-only data, and Chl-a and SST data, as input data based on the same cross-validation data. In this methodology, a total of four reconstruction schemes are implemented. Both cross-validation and in situ validation show the high performances of DINEOF and DINCAE methods in Chl-a reconstruction. Furthermore, DINCAE is superior to DINEOF. A somewhat surprising result is that, in contrast to the cross-validation results, the performances of the multivariate reconstruction methods in in situ validation are not as good as those of the univariate reconstruction methods. The prediction capability of DINCAE was also tested and results were found to agree well with the CV-data-2019; see Figure 14a,b. Retraining the prediction model further improves the reconstruction results, as shown in Figure 14c,d.

It is notable that for similar results, DINCAE is 5–10 times faster than DINEOF. Both DINCAE and DINEOF have better performances in SCS than in WPS, where there are different chlorophyll growth and control mechanisms. Considering that the average SCS chlorophyll concentration is three times that of WPS, given the same sources of noise, the signal-to-noise ratio for Chl-a_sat in SCS will always be higher than in WPS. Thus, the performances of these reconstruction methodologies are accordingly different.

A common problem in neural network training is overfitting of the provided observations, leading to degraded results when using the cross validation. Some technical solutions can prevent overfitting, but their parameters need to be modified to suit different datasets. Through a series of sensitivity tests, we found that L2_beta is a key parameter to prevent overfitting in DINCAE. Other parameters (dropout_rate and jitter_std) may only have small effects. Similar to Barth et al. in SST reconstruction, the epoch average improved the results, albeit not as clearly Barth et al. [23]. However, epoch-averaging is a necessary method, because there are limitations in the cross-validation data, because it can only represent a small part of the whole dataset. Therefore, an accurate determination of which epoch can provide the actual minimum error for the reconstruction result is impossible; however, the epoch-averaging method can provide a way to resolve this issue.

In the data analysis methods that can be used to reconstruct missing data, the methodologies in this paper can be used to calculate reduction dimension projections, such as the singular value decomposition in DINEOF and the auto-encoder in DINCAE. This approach will inevitably lose some small-scale information. Therefore, data smoothing occurs. However, DINCAE can capture nonlinear and mutated relationships, thereby resolving part of the problem. Thus, this may be the reason that DINCAE is better than DINEOF, in terms of RMSE and the linear correlation coefficients between the CV-data (in situ data) and the corresponding molded data. In the future, a multivariate DINCAE methodology can be improved by adding more auxiliary data to DINCAE, such as wind fields and currents, and the resulting smoothness may be further reduced.

Author Contributions

Y.H. contributed to the idea of this study; Z.H. preprocessed data, analyzed data, and wrote the paper; Y.H., G.L., and W.P. helped with manuscript’s preparation and revision. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Natural Science Foundation under grant 416220104003, the National Program on Global Change and Air-Sea Interaction under grant GASI-IPOVAI-04, the National Key Research and Development Program under grant 2016YFC1401002, and the Research and Innovation Project for College Graduates of Jiangsu Province under grant KYLX15_0872. From Canada, funding support was provided by the Government Research Initiative Program (GRIP) of the Canada Space Agency.

Acknowledgments

We thank the GHER group at the University of Liège for providing the DINCAE and DINEOF toolboxes, Alexander Barth for useful advice, and GlobColour for providing the satellite Chl-a data used. We are also grateful to those who provided access to the remote sensing system with which the daily MW-IR SST data was recorded. Furthermore, the in situ datasets were provided by Wenxi Cao and his research team from South China Sea Institute of Oceanography, Chinese Academy of Sciences.

Conflicts of Interest

The authors declare that there is no conflict of interest.

References

King, M.D.; Platnick, S.; Menzel, W.P.; Ackerman, S.A.; Hubanks, P.A. Spatial and temporal distribution of clouds observed by MODIS onboard the Terra and Aqua satellites. IEEE Trans. Geosci. Remote Sens. 2013, 51, 3826–3852. [Google Scholar] [CrossRef]
Feng, L.; Hu, C. Cloud adjacency effects on top-of-atmosphere radiance and ocean color data products: A statistical assessment. Remote Sens. Environ. 2016, 174, 301–313. [Google Scholar] [CrossRef]
Reynolds, R.W.; Smith, T.M.; Liu, C.; Chelton, D.B.; Casey, K.S.; Schlax, M.G. Daily High-Resolution-Blended Analyses for Sea Surface Temperature. J. Clim. 2007, 20, 5473–5496. [Google Scholar] [CrossRef]
Group for High Resolution Sea Surface Temperature (GHRSST). Available online: https://www.ghrsst.org/ghrsst-data-services/products/ (accessed on 13 November 2019).
Hosoda, K.; Sakaida, F. Global Daily High-Resolution Satellite-Based Foundation Sea Surface Temperature Dataset: Development and Validation against Two Definitions of Foundation SST. Remote Sens. 2016, 8, 962. [Google Scholar] [CrossRef] [Green Version]
Bennett, A.F. Inverse Modeling of the Ocean and Atmosphere, 1st ed.; Cambridge University Press: Cambridge, UK, 2002. [Google Scholar]
Saulquin, B.; Gohin, F.; Garrello, R. Regional Objective Analysis for Merging High-Resolution MERIS, MODIS/Aqua, and SeaWiFS Chlorophyll-a Data From 1998 to 2008 on the European Atlantic Shelf. IEEE Trans. Geosci. Remote Sens. 2011, 49, 143–154. [Google Scholar] [CrossRef] [Green Version]
Beckers, J.M.; Rixen, M. EOF Calculations and Data Filling from Incomplete Oceanographic Datasets. J. Atmos. Ocean. Technol. 2003, 20, 1839–1856. [Google Scholar] [CrossRef]
Alvera-Azcárate, A.; Barth, A.; Rixen, M.; Beckers, J.M. Reconstruction of incomplete oceanographic data sets using empirical orthogonal functions: Application to the Adriatic Sea surface temperature. Ocean Model. 2005, 9, 325–346. [Google Scholar] [CrossRef] [Green Version]
Miles, T.N.; He, R.; Li, M. Characterizing the South Atlantic Bight seasonal variability and cold-water event in 2003 using a daily cloud-free SST and chlorophyll analysis. Geophys. Res. Lett. 2009, 36. [Google Scholar] [CrossRef] [Green Version]
Abraham, E.R. The generation of plankton patchiness by turbulent stirring. Nature 1998, 391, 577–580. [Google Scholar] [CrossRef]
Tang, D.; Kester, D.R.; Ni, I.H.; Kawamura, H.; Hong, H. Upwelling in the Taiwan Strait during the summer monsoon detected by satellite and shipboard measurements. Remote Sens. Environ. 2002, 83, 457–471. [Google Scholar] [CrossRef]
Lin, J.; Cao, W.; Wang, G.; Hu, S. Satellite-observed variability of phytoplankton size classes associated with a cold eddy in the South China Sea. Mar. Pollut. Bull. 2014, 83, 190–197. [Google Scholar] [CrossRef] [PubMed]
Shiozaki, T.; Chen, Y.-L.L. Different mechanisms controlling interannual phytoplankton variation in the South China Sea and the western North Pacific subtropical gyre: A satellite study. Adv. Space Res. 2013, 52, 668–676. [Google Scholar] [CrossRef]
Volpe, G.; Nardelli, B.B.; Cipollini, P.; Santoleri, R.; Robinson, I.S. Seasonal to interannual phytoplankton response to physical processes in the Mediterranean Sea from satellite observations. Remote Sens. Environ. 2012, 117, 223–235. [Google Scholar] [CrossRef]
Alvera-Azcárate, A.; Barth, A.; Beckers, J.M.; Weisberg, R.H. Multivariate reconstruction of missing data in sea surface temperature, chlorophyll, and wind satellite fields. J. Geophys. Res. Oceans 2007, 112. [Google Scholar] [CrossRef] [Green Version]
Sirjacobs, D.; Alvera-Azcárate, A.; Barth, A.; Lacroix, G.; Park, Y.; Nechad, B.; Ruddick, K.; Beckers, J.-M. Cloud filling of ocean colour and sea surface temperature remote sensing products over the Southern North Sea by the Data Interpolating Empirical Orthogonal Functions methodology. J. Sea Res. 2011, 65, 114–130. [Google Scholar] [CrossRef]
Yao, Z.; He, R. Cloud-free sea surface temperature and colour reconstruction for the Gulf of Mexico: 2003–2009. Remote Sens. Lett. 2012, 3, 697–706. [Google Scholar] [CrossRef]
Li, Y.; He, R. Spatial and temporal variability of SST and ocean color in the Gulf of Maine based on cloud-free SST and chlorophyll reconstructions in 2003–2012. Remote Sens. Environ. 2014, 144, 98–108. [Google Scholar] [CrossRef]
Guisan, A.; Zimmermann, N.E. Predictive habitat distribution models in ecology. Ecol. Model. 2000, 135, 147–186. [Google Scholar] [CrossRef]
Elith, J.; Graham, C.H.; Anderson, R.P.; Dudík, M.; Ferrier, S.; Guisan, A.; Hijmans, R.J.; Huettmann, F.; Leathwick, J.R.; Lehmann, A. Novel methods improve prediction of species’ distributions from occurrence data. Ecography 2006, 29, 129–151. [Google Scholar] [CrossRef] [Green Version]
Smoliński, S.; Radtke, K. Spatial prediction of demersal fish diversity in the Baltic Sea: Comparison of machine learning and regression-based techniques. ICES J. Mar. Sci. 2016, 74. [Google Scholar] [CrossRef] [Green Version]
Barth, A.; Alvera-Azcárate, A.; Licer, M.; Beckers, J.M. DINCAE 1.0: A convolutional neural network with error estimates to reconstruct sea surface temperature satellite observations. Geosci. Model Dev. Discuss. 2019, 2019. [Google Scholar] [CrossRef]
Chicco, D.; Sadowski, P.; Baldi, P. Deep autoencoder neural networks for gene ontology annotation predictions. In Proceedings of the 5th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics (BCB ’14), Newport Beach, CA, USA, 20–23 September 2014; pp. 533–540. [Google Scholar]
Jouini, M.; Lévy, M.; Crépon, M.; Thiria, S. Reconstruction of satellite chlorophyll images under heavy cloud coverage using a neural classification method. Remote Sens. Environ. 2013, 131, 232–246. [Google Scholar] [CrossRef]
Chapman, C.; Charantonis, A.A. Reconstruction of subsurface velocities from satellite observations using iterative self-organizing maps. IEEE Geosci. Remote Sens. Lett. 2017, 14, 617–620. [Google Scholar] [CrossRef]
Patil, K.; Deo, M.C. Prediction of daily sea surface temperature using efficient neural networks. Ocean Dyn. 2017, 67, 357–368. [Google Scholar] [CrossRef]
Krasnopolsky, V.; Nadiga, S.; Mehra, A.; Bayler, E.; Behringer, D. Neural networks technique for filling gaps in satellite measurements: Application to ocean color observations. Comput. Intell. Neurosci. 2016, 2016, 6156513. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Jo, Y.; Kim, D.; Kim, H. Chlorophyll Concentration Derived from Microwave Remote Sensing Measurements USING Artificial Neural Network Algorithm. J. Mar. Sci 2018, 26, 102–110. [Google Scholar] [CrossRef]
Chen, Y.L.L.; Chen, H.-Y.; Tuo, S.-H.; Ohki, K. Seasonal dynamics of new production from Trichodesmium N2 fixation and nitrate uptake in the upstream Kuroshio and South China Sea basin. Limnol. Oceanogr. 2008, 53, 1705–1721. [Google Scholar] [CrossRef] [Green Version]
Liu, K.; Atkinson, L.; Quinones, R.; Talaue-McManus, L. Carbon and Nutrient Fluxes in Continental Margins: A Global Synthesis; Springer Science & Business Media: Berlin, Germany, 2010. [Google Scholar] [CrossRef]
Liu, K.K.; Chao, S.Y.; Shaw, P.T.; Gong, G.C.; Chen, C.C.; Tang, T.Y. Monsoon-forced chlorophyll distribution and primary production in the South China Sea: Observations and a numerical study. Deep Sea Res. Part I Oceanogr. Res. Pap. 2002, 49, 1387–1412. [Google Scholar] [CrossRef]
Liu, K.-K.; Tseng, C.-M.; Yeh, T.-Y.; Wang, L.-W. Elevated phytoplankton biomass in marginal seas in the low latitude ocean: A case study of the South China Sea. In Advances in Geosciences: Volume 18: Ocean Science (OS); World Scientific: Singapore, 2010; pp. 1–17. [Google Scholar] [CrossRef]
McClain, C.R.; Signorini, S.R.; Christian, J.R. Subtropical gyre variability observed by ocean-color satellites. Deep Sea Res. Part II Top. Stud. Oceanogr. 2004, 51, 281–301. [Google Scholar] [CrossRef] [Green Version]
Alvera-Azcárate, A.; Barth, A.; Sirjacobs, D.; Beckers, J.M. Enhancing temporal correlations in EOF expansions for the reconstruction of missing data using DINEOF. Ocean Sci. 2009, 5, 475–485. [Google Scholar] [CrossRef] [Green Version]
Hilborn, A.; Costa, M. Applications of DINEOF to satellite-derived chlorophyll-a from a productive coastal region. Remote Sens. 2018, 10, 1449. [Google Scholar] [CrossRef] [Green Version]
DINEOF—GHER. Available online: http://modb.oce.ulg.ac.be/mediawiki/index.php/DINEOF (accessed on 13 November 2019).
Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015, Cham, Switzerland, 5–9 October 2015; pp. 234–241. [Google Scholar]
Kingma, D.; Ba, J. Adam: A Method for Stochastic Optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar]
Maas, A.L.; Hannun, A.Y.; Ng, A.Y. Rectifier nonlinearities improve neural network acoustic models. In Proceedings of the 30th International Conference on Machine Learning, Atlanta, GA, USA, 16–21 June 2013. [Google Scholar]
Abadi, M.; Agarwal, A.; Barham, P.; Brevdo, E.; Zheng, X. TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems. arXiv 2015, arXiv:1603.04467. [Google Scholar]

Figure 1. (a) Temporal variation of cloud cover percentage in the domain (in black) and with a low-pass filter of 30 days (red line); (b) mean percentage of cloud coverage for the complete set of images. SCS stands for South China Sea, and WPS stands for West Philippine Sea. The blue circles represent sampling stations.

Figure 2. (a) Averaged Chl-a for 2015 to 2018 using valid pixels; (b) the correlation coefficient between Chl-a and SST over 2015–2018. The red circles represent sampling stations, and the dotted lines are the bathymetric contours of −500 m and −1000 m.

Figure 3. Validation data selection process for DINEOF, where the percentage represents the rate of cross-validation points and 52,112,264 of the total number of sea pixels.

Figure 4. (a) Percentages for the temporal distribution of cross-validation data. (b) Spatial distribution density of the cross-validation data, where pixels occurring less than twice are marked as white.

Figure 5. Schematic flowchart for DINCAE reconstruction, where the percentage represents the rate of cross-validation points and 52,112,264 of the total number of sea pixels.

Figure 6. Cross-validation root mean square error (RMSE) (solid lines) and cumulative explained variance (solid circles) for the DINEOF reconstructions. The minimum error is marked with a triangle on each line.

Figure 7. Cross-validation errors as a function of iteration for the DINCAE experiments. (a) The optimal jitter_std in rec2 chosen from (0.05, 0.10, 0.15) based on dropout_rate = 0.3 and L2_beta = 0; (b) the optimal dropout_rate in rec2 chosen from (0.3, 0.9, 1.2) based on jitter_std = 0.05 and L2_beta = 0; (c) the optimal L2_beta in rec2 chosen from (0, 0.01, 0.001, 0.0001) based on jitter_std = 0.05 and dropout_rate = 0.9; (d) comparison of the expected errors between rec1 and rec2 using the parameters that jitter_std = 0.05, dropout_rate = 0.9, and L2_beta = 0.001, and the dotted lines are the RMSE values of the CV-data and the averaged outputs of DINCAE between epoch 200 and epoch 1000.

Figure 8. Scatterplots of the CV-data (Chl-a_sat) versus the corresponding modeled data (Chl-a_rec) for the four reconstruction schemes. Color scale indicates the number of points in each 0.01 * 0.01 log10 (mg m⁻³) bin. Green dotted lines represent the best fit line and the grey dotted lines represent the 1:1 line.

Figure 9. Spatial distribution of RMSE between cross-validation data and corresponding modeled data for the four reconstruction schemes: (a) rec1, (b) rec2, (c) rec3 and (d) rec4.

Figure 10. Daily reconstruction of 17 May, 2018: (a) original Chl-a_sat; (b) adding artificial cloud to original images Chl-a_sat_+cloud, (c) Chl-a_rec1, (d) Chl-a_rec2, (e) Chl-a_rec3, (f) Chl-a_rec4, and (g) SST_sat.

Figure 11. Spatial distribution of relative error, (Chl-a_rec-Chl-a_sat)/Chl-a_sat for 17 May 2018.

Figure 12. Scatterplots of the CV-data (Chl-a_sat) versus the corresponding modeled data (Chl-a_rec) for the four reconstruction schemes on 17 May, 2018. Color scale indicates the number of points in each 0.01 * 0.01 log10 (mg m⁻³) bin. Green dotted lines show the best fit line and grey dotted lines represent the 1:1 line.

Figure 13. Comparison of the reconstructed Chl-a_rec data from the four reconstruction schemes, with in situ Chl-a_situ data. The dotted lines represent the fitted lines for the corresponding scatter points using the same colors, for the four schemes. The grey solid line represents the ideal line.

Figure 14. Scatterplots of the cross-validation data (Chl-a_sat) versus the corresponding modeled data (Chl-a_rec), where the Chl-a_sat ranged from 2019.1 to 2019.10. (a) Chl-a_rec reconstructed by the rec1-2015 model; (b) Chl-a_rec reconstructed by the rec2-2015 model; (c) Chl-a_rec reconstructed by the rec1-2019 model; (d) Chl-a_rec reconstructed by the rec2-2019 model. The color scale indicates the number of points in each 0.01 * 0.01 log10 (mg m⁻³) bin. Green dotted lines show the best fit line and grey dotted lines represent the 1:1 line.

Table 1. Four reconstruction schemes.

Reconstruction	Method	Input Satellite Data
rec1	DINCAE	ln(Chl-a)
rec 2	DINCAE	ln(Chl-a) and SST
rec 3	DINEOF	ln(Chl-a)
rec 4	DINEOF	ln(Chl-a) and SST

Table 2. The total list of input parameters.

Index	Var Name
1	ln(Chl-a) anomalies scaled by the inverse of the error variance (zero if the data are missing)
2	Inverse of the error variance (zero if the data are missing)
3–4	Scaled ln(Chl-a) anomalies and inverse of error variance of the previous day
5–6	Scaled ln(Chl-a) anomalies and inverse of error variance of the following day
7	Longitude (scaled linearly between −1 and 1)
8	Latitude (scaled linearly between −1 and 1)
9	Cosine of the day of the year divided by 365.25
10	Sine of the day of the year divided by 365.25
11	SST anomalies scaled by the inverse of the error variance (zero if the data are missing) *
12	Scaled SST anomalies of the previous day *
13	Scaled SST anomalies of the following day *

* used in rec2.

Table 3. Relationship between Chl-a_rec values and corresponding Chl-a_sat values in SCS and WPS.

Area	N, Mean, mg m⁻³	Method	RMSE, ln(mg m⁻³)	Bias, ln(mg m⁻³)	R	Slope	Intercept
SCS	145,476 0.34	rec1	0.26	−0.027	0.95	0.89	−0.22
		rec2	0.25	−0.019	0.96	0.90	−0.19
		rec3	0.28	−0.014	0.94	0.88	−0.21
		rec4	0.32	−0.005	0.92	0.92	−0.12
WPS	140,985 0.11	rec1	0.29	0.011	0.91	0.77	−0.57
		rec2	0.28	0.025	0.91	0.78	−0.53
		rec3	0.31	0.027	0.89	0.82	−0.43
		rec4	0.33	0.009	0.88	0.81	−0.47

Table 4. Relationship between Chl-a_rec values and corresponding Chl-a_situ values.

In Situ Data	Method	RMSE, ln(mg m⁻³)	Bias, ln(mg m⁻³)	R	Slope	Intercept
N = 20 mean = −1.61 ln(mg m⁻³)	rec1	0.38	−0.14	0.925	0.81	−0.16
	rec2	0.42	−0.09	0.917	0.66	−0.46
	rec3	0.38	−0.09	0.921	0.75	−0.31
	rec4	0.44	−0.11	0.917	0.63	−0.47

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Han, Z.; He, Y.; Liu, G.; Perrie, W. Application of DINCAE to Reconstruct the Gaps in Chlorophyll-a Satellite Observations in the South China Sea and West Philippine Sea. Remote Sens. 2020, 12, 480. https://0-doi-org.brum.beds.ac.uk/10.3390/rs12030480

AMA Style

Han Z, He Y, Liu G, Perrie W. Application of DINCAE to Reconstruct the Gaps in Chlorophyll-a Satellite Observations in the South China Sea and West Philippine Sea. Remote Sensing. 2020; 12(3):480. https://0-doi-org.brum.beds.ac.uk/10.3390/rs12030480

Chicago/Turabian Style

Han, Zhaohui, Yijun He, Guoqiang Liu, and William Perrie. 2020. "Application of DINCAE to Reconstruct the Gaps in Chlorophyll-a Satellite Observations in the South China Sea and West Philippine Sea" Remote Sensing 12, no. 3: 480. https://0-doi-org.brum.beds.ac.uk/10.3390/rs12030480

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Application of DINCAE to Reconstruct the Gaps in Chlorophyll-a Satellite Observations in the South China Sea and West Philippine Sea

Abstract

1. Introduction

2. Materials

2.1. Study Area

2.2. Chl-a

2.3. SST

2.4. Cross-Validation Data

2.5. In Situ Dataset

3. Methods

3.1. DINEOF

3.2. DINCAE

4. Results

4.1. Reconstruction Statistics Using CV-Data

4.2. Spatial Distribution of Reconstruction for a Specified Day

4.3. Validation with In Situ Observations

5. Discussion

6. Conclusions

Author Contributions

Funding

Acknowledgments

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI