
Complex-Valued Convolutional Autoencoder and Spatial Pixel-Squares Refinement for Polarimetric SAR Image Classification

Key Laboratory of Intelligent Perception and Image Understanding of Ministry of Education, School of Artificial Intelligence, Xidian University, Xi’an 710071, China
* Author to whom correspondence should be addressed.
Submission received: 26 January 2019 / Revised: 23 February 2019 / Accepted: 25 February 2019 / Published: 4 March 2019

Abstract

Recently, deep learning models such as the autoencoder, the deep belief network, and the convolutional autoencoder (CAE) have been widely applied to the polarimetric synthetic aperture radar (PolSAR) image classification task. These algorithms, however, only consider the amplitude information of the pixels in PolSAR images and therefore fail to obtain adequate discriminative features. In this work, a complex-valued convolutional autoencoder network (CV-CAE) is proposed. CV-CAE extends the encoding and decoding of the CAE to the complex domain so that the phase information can be exploited. Benefiting from the advantages of the CAE, CV-CAE extracts features from a small amount of training data. To further boost the performance, we propose a novel post-processing method called spatial pixel-squares refinement (SPF) for the preliminary classification map. Specifically, majority voting and a difference-value criterion are used to determine whether a pixel-square (PixS) needs to be refined. Exploiting the blocky structure of the land cover in PolSAR images, SPF refines all pixels of a PixS simultaneously, and is therefore more efficient than current methods that work at the pixel level. The proposed algorithm is evaluated on three typical PolSAR datasets, and better or comparable accuracy is obtained compared with other state-of-the-art methods.

1. Introduction

Polarimetric synthetic aperture radar (PolSAR) image classification has been extensively used in topographic mapping, natural disaster monitoring, quantitative statistics on vegetation coverage, and urban and rural planning [1,2,3]. In recent years, deep learning models have been utilized to classify optical images and have achieved superior accuracy [4]. Nevertheless, the imaging mechanism of PolSAR images is different from that of optical images [5], and these models achieve weak performance when applied to PolSAR images directly [6].
Before deep learning models were applied to image classification, many traditional algorithms had been proposed. They focus on two parts: developing feature extractors and designing classifiers. The first part aims to design filters associated with corresponding features. For example, a wavelet transform filter is exploited to extract local features [7]. Markov random fields and Fisher discriminant analysis are employed to learn spatial features between adjacent pixels [8,9]. In addition, Gabor wavelet filtering is used to extract texture and edge information in different directions [10], and a 3D-Gabor filter is employed to generate multiple cubes for active learning [11]. The second part designs a classifier over the obtained features to achieve the classification task, including the hierarchical classifier [12], the wavelet transform classifier [13], and the complex Wishart classifier [14]. Others, such as the k-nearest neighbor classifier used in [15], improve the classification accuracy significantly. Assisted by SVM and random forest classifiers [16,17], Uhlmann et al. annotated PolSAR images according to color features and hand-designed features of PolSAR images [18]. These algorithms have shown good performance, but they still require hand-crafted feature extractors and classifiers selected by experience, which not only takes much time in model design but also yields poor generalization performance.
With the significant breakthroughs of the convolutional neural network (CNN), it has performed well in optical image classification tasks [19], and deep learning methods have been introduced into PolSAR image classification. For example, Zhang et al. exploited a stacked sparse autoencoder to extract spatial sparse features, reducing the effect of speckle noise at the pixel level [20]. Geng et al. applied deep recurrent encoding neural networks (DRENNs) to extract contextual information from SAR images [21]. In [22], a stacked autoencoder is first utilized to extract PolSAR features from a synthetic target database; then a classifier constructed from a multi-layer perceptron network is used to label the urban area. Nonetheless, these models require adequate training data to achieve high classification accuracy, and attaining sufficient training samples is difficult because of the rarity and confidentiality of remote sensing images. Consequently, Shang et al. added an information encoder to a CNN to increase sample utilization [23]. Gao et al. obtained a joint feature map using a CNN and multiple feature learning to increase the discriminative power of the features [24]. There are also many unsupervised feature extraction methods, such as the sparse autoencoder (SAE) [25], the convolutional autoencoder (CAE) [26], a multilayer autoencoder with a Euclidean distance restriction [27], discriminant analysis with graph learning (DAGL) [28], multilayer autoencoders with self-paced learning (SPL) [29], the Wishart autoencoder (WAE) and Wishart convolutional autoencoder (WCAE) [30], and the Wishart deep belief network (W-DBN) [31]. Specifically, the prior information of the Wishart distribution of PolSAR data is used in WAE and WCAE, which increases the accuracy rate by over 2%. W-DBN is composed of Wishart-Bernoulli restricted Boltzmann machines (WBRBMs) and achieves better classification performance based on unsupervised pre-training and fine-tuning. However, these algorithms use only the real values of the coherency or covariance matrix of PolSAR pixels. To solve this problem, Zhang et al. introduced phase information into the CNN and proposed a complex-valued CNN (CV-CNN) [6], which achieved comparable accuracy and verified the significance of phase information in PolSAR images. However, CV-CNN needs massive annotated training data.
To alleviate the problem that the CAE cannot extract the features of PolSAR images adequately from small amounts of training data, a complex-valued convolutional autoencoder network (CV-CAE) is proposed in this paper. Firstly, CV-CAE extracts features from unannotated complex-valued input patches; then a complex-valued fully connected network (CFC) is trained and CV-CAE is fine-tuned with annotated training data. Experiments on three typical datasets show that the classification accuracy can be further improved. Nowadays, many post-processing methods have been introduced into PolSAR image classification. Among them, Liu et al. proposed the Cleaning algorithm, in which Bayesian theory and local spatial information are employed to rectify the class of each pixel [31]. In [32], a refined spatial-anchor graph is proposed to reassign the border pixels using majority voting and distance measurement. These methods increase the classification accuracy by refining the pixels one by one. Therefore, considering the efficiency of post-processing, SPF is proposed in this paper; it operates on the blocky land cover structure of the preliminary classification map. SPF uses majority voting and a difference-value criterion to determine whether the refinement condition is met, and then refines the class of all pixels within the PixS. Therefore, compared with pixel-level refinement, SPF achieves higher refinement efficiency. The proposed algorithm is evaluated on three PolSAR datasets and achieves better accuracy than the compared algorithms.
The rest of this paper is structured as follows. Section 2 describes the framework of the proposed CV-CAE and SPF in detail. Data preprocessing and experimental analysis are presented in Section 3. The conclusion is given in Section 4.

2. Classification Based on CV-CAE Network

In our work, both the phase and the amplitude information of PolSAR images are considered, and the CV-CAE network is proposed by extending the unsupervised CAE model to the complex domain. To improve on the efficiency of pixel-level refinement, a post-processing method, SPF, is adopted. The architecture and the training process of CV-CAE, along with the implementation of SPF, are outlined in the following.

2.1. The Framework of the Proposed Algorithm

The framework of CV-CAE, depicted in Figure 1, consists of feature extraction and classification, which are marked with the red and blue dotted boxes respectively. A detailed explanation follows. First, the network in the red box extracts features. Then the classification network, formed by the trained encoding part of CV-CAE followed by the CFC, performs the classification task. Here $C_{11}^i$ is the first channel value of each pixel in the $i$th input patch, $\hat{C}_{11}^i$ is the decoded value of $C_{11}^i$, $c_n$ is the $n$th value of the classification result, and $n$ indexes the terrain types.

2.1.1. CV-CAE

CV-CAE consists of four complex-valued parts: input, output, encoding, and decoding. The configuration of CV-CAE is given in Table 1. The encoding includes convolution and mean pooling, corresponding to the second and third layers. The next two layers, upsampling and deconvolution, are the components of decoding. The sigmoid activation function is used in CV-CAE.
In Table 1, the structure and parameters of the convolutional layer and the mean pooling layer are denoted by "Conv. number of feature mappings (kernel size)/activation function" and "Mean-Po. stride (pooling size)". The structure and parameters of the next two layers (upsampling and deconvolution) are denoted analogously. The classification network is formed by a fully connected layer. The output size is the number of output feature mappings, and N is the number of terrain types.
Spatial features play a pivotal role in the classification of PolSAR images. Therefore, the input of CV-CAE is a complex-valued patch cropped from the original PolSAR image. As shown in Figure 1, $C_{11}^i$, $C_{12}^i$, $C_{13}^i$, $C_{22}^i$, $C_{23}^i$, and $C_{33}^i$ are the complex-valued pixel values of the six channels in the $i$th input patch. Considering the terrain types of PolSAR images [33,34], a size of 12 × 12 is selected for the input patch. On the one hand, this size is large enough to contain the spatial features needed for classification. On the other hand, a smaller input size increases the computational efficiency and reduces the risk of over-fitting [23].
In the encoding part, the complex-valued convolution extracts discriminative features for the classification task from the complex-valued input patch. These features differ from those of a real-valued convolution because they include both spatial and polarimetric information. All parameters of the complex-valued convolutional operation are complex-valued. Specifically, the $i$th complex input patch is $X_{ic}^{l} \in \mathbb{C}^{W_1 \times H_1 \times C}$, where $l$ is the layer index and $c$ is the channel index ($c = 1, 2, \cdots, C$). The output corresponding to the $i$th input is $y_{ik}^{l} \in \mathbb{C}^{W_Y \times H_Y \times K}$, where $k$ indexes the feature mappings. The complex-valued convolution is defined as
$$y_{ik}^{l} = \sum_{c=1}^{C} X_{ic}^{l} * W_{ik}^{l} + b_{k}^{l} = \sum_{c=1}^{C} \left[ \mathrm{real}\big(X_{ic}^{l}\big) * \mathrm{real}\big(W_{ik}^{l}\big) - \mathrm{imag}\big(X_{ic}^{l}\big) * \mathrm{imag}\big(W_{ik}^{l}\big) \right] + j \sum_{c=1}^{C} \left[ \mathrm{real}\big(X_{ic}^{l}\big) * \mathrm{imag}\big(W_{ik}^{l}\big) + \mathrm{imag}\big(X_{ic}^{l}\big) * \mathrm{real}\big(W_{ik}^{l}\big) \right] + b_{k}^{l} \tag{1}$$
where $\mathrm{real}(\cdot)$ and $\mathrm{imag}(\cdot)$ are the real and imaginary parts of a complex value, and the character $*$ represents the convolutional operation. $W_{ik}^{l}$ is the convolutional kernel of size $W_2 \times H_2 \times C \times K$. Generally, kernels of size 3 × 3 or 5 × 5 are recommended because they are more effective for feature extraction than other sizes [35,36]. $b_{k}^{l}$ is the bias. The parameters of CV-CAE to be trained in the $l$th convolutional layer are $W_{ik}^{l}$ and $b_{k}^{l}$; in the complex domain their number is twice that in the real domain, i.e., $2 \times (W_2 \times H_2 \times C \times K + K)$. For a convolutional operation with stride $S$ and zero-padding $P$, the size of the resulting feature mappings is calculated by
$$W_Y = \frac{W_1 - W_2 + 2P}{S} + 1, \qquad H_Y = \frac{H_1 - H_2 + 2P}{S} + 1 \tag{2}$$
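To make the split into real and imaginary parts concrete, the following is a minimal NumPy sketch of the complex-valued convolution of Equation (1) for a single patch, assuming "valid" convolution (no padding, stride 1) as used in the encoding part. The function name complex_conv2d and the use of scipy.signal.correlate2d (the usual CNN-style correlation) are our own illustrative choices, not from the paper.

```python
import numpy as np
from scipy.signal import correlate2d

def complex_conv2d(X, W, b):
    """X: (W1, H1, C) complex patch; W: (W2, H2, C, K) complex kernels; b: (K,) complex biases."""
    W1, H1, C = X.shape
    W2, H2, _, K = W.shape
    WY, HY = W1 - W2 + 1, H1 - H2 + 1            # Eq. (2) with P = 0, S = 1
    y = np.zeros((WY, HY, K), dtype=complex)
    for k in range(K):
        for c in range(C):
            Xr, Xi = X[:, :, c].real, X[:, :, c].imag
            Wr, Wi = W[:, :, c, k].real, W[:, :, c, k].imag
            # Eq. (1): (Xr*Wr - Xi*Wi) + j (Xr*Wi + Xi*Wr), with * as 2-D correlation
            y[:, :, k] += (correlate2d(Xr, Wr, mode="valid")
                           - correlate2d(Xi, Wi, mode="valid")
                           + 1j * (correlate2d(Xr, Wi, mode="valid")
                                   + correlate2d(Xi, Wr, mode="valid")))
        y[:, :, k] += b[k]
    return y

# With a 12 x 12 x 6 patch and 12 kernels of size 5 x 5, the output is 8 x 8 x 12.
rng = np.random.default_rng(0)
X = rng.standard_normal((12, 12, 6)) + 1j * rng.standard_normal((12, 12, 6))
W = rng.standard_normal((5, 5, 6, 12)) + 1j * rng.standard_normal((5, 5, 6, 12))
b = np.zeros(12, dtype=complex)
print(complex_conv2d(X, W, b).shape)             # (8, 8, 12)
```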
Equation (1) performs only a linear transformation of the input data. To obtain better generalization and robustness of CV-CAE, nonlinear operations must be adopted. In neural networks, sigmoid and ReLU are the two most commonly recommended activation functions [37]; they perform well for nonlinear transformation and accelerate training. In CV-CAE, the complex-valued nonlinear operation is defined as
$$Y_{ik}^{l} = \sigma\big(\mathrm{real}(y_{ik}^{l})\big) + j\,\sigma\big(\mathrm{imag}(y_{ik}^{l})\big) \tag{3}$$
where $\sigma(z) = \frac{1}{1+e^{-z}}$ denotes the sigmoid activation function, and $Y_{ik}^{l}$, of the same size as $y_{ik}^{l}$, is the result of the complex-valued nonlinear transformation.
Pooling reduces the dimension of its input features based on similarity, without changing the number of channels. Through pooling, the pivotal features are preserved and redundant information is reduced, so the computation and convergence of the network are more efficient. In neural networks, the most widely used pooling operations are max pooling and mean pooling. The pooling size and stride are the dominant parameters; appropriate values not only eliminate redundant information but also retain the discriminative features. Based on previous experience, a pooling size of 2 × 2 or 3 × 3 and a stride of 2 are commonly recommended.
In the encoding part, convolution without padding, with a kernel size of 5 and a stride of 1, is employed, and the number of convolutional kernels is 12. In the complex domain, max pooling cannot be adopted directly, so mean pooling with a pooling size of 2 and a stride of 2 is used in CV-CAE. According to Equation (2), with a complex-valued input patch of size 12 × 12 × 6, the sizes of the feature mappings after convolution and mean pooling are 8 × 8 × 12 and 4 × 4 × 12 respectively.
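As a small illustration of these two encoding steps, the following sketch applies the split activation of Equation (3) and a complex mean pooling that averages the real and imaginary parts independently; the function names are illustrative and not from the paper.

```python
import numpy as np

def complex_sigmoid(y):
    sig = lambda z: 1.0 / (1.0 + np.exp(-z))
    return sig(y.real) + 1j * sig(y.imag)                  # Eq. (3)

def complex_mean_pool(Y, size=2, stride=2):
    """Y: (W, H, K) complex feature maps -> mean-pooled complex maps."""
    W, H, K = Y.shape
    out = np.zeros((W // stride, H // stride, K), dtype=complex)
    for i in range(0, W - size + 1, stride):
        for j in range(0, H - size + 1, stride):
            out[i // stride, j // stride, :] = Y[i:i+size, j:j+size, :].mean(axis=(0, 1))
    return out

Y = np.random.randn(8, 8, 12) + 1j * np.random.randn(8, 8, 12)
print(complex_mean_pool(complex_sigmoid(Y)).shape)         # (4, 4, 12), as in Table 1
```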
The decoding part consists of uppooling and deconvolution; it is the inverse of the encoding and aims to reconstruct the encoder input. In uppooling, the feature mappings of the encoding are expanded by exploiting the location information retained during pooling. Different pooling operations lead to different expansion schemes; for inverse mean pooling, a value in the feature map is copied to all positions within the pooling window. Deconvolution, also called transposed convolution, is the inverse process of convolution. In deconvolution, the sparse representation generated by uppooling is reconstructed to the same resolution as the input patch of the encoding. The deconvolution result $\tilde{Y}_{ic}^{l}$ is calculated by
$$\tilde{Y}_{ic}^{l} = \sigma\big(\mathrm{real}(\tilde{y}_{ic}^{l})\big) + j\,\sigma\big(\mathrm{imag}(\tilde{y}_{ic}^{l})\big) \tag{4}$$
$$\tilde{y}_{ic}^{l} = \sum_{k=1}^{K} Y_{ik}^{l} * \tilde{W}_{ic}^{l} + \tilde{b}_{c}^{l} = \sum_{k=1}^{K} \left[ \mathrm{real}\big(Y_{ik}^{l}\big) * \mathrm{real}\big(\tilde{W}_{ic}^{l}\big) - \mathrm{imag}\big(Y_{ik}^{l}\big) * \mathrm{imag}\big(\tilde{W}_{ic}^{l}\big) \right] + j \sum_{k=1}^{K} \left[ \mathrm{real}\big(Y_{ik}^{l}\big) * \mathrm{imag}\big(\tilde{W}_{ic}^{l}\big) + \mathrm{imag}\big(Y_{ik}^{l}\big) * \mathrm{real}\big(\tilde{W}_{ic}^{l}\big) \right] + \tilde{b}_{c}^{l} \tag{5}$$
The parameters to be trained in the deconvolution are $\tilde{W}_{ic}^{l}$ and $\tilde{b}_{c}^{l}$, where $\tilde{W}_{ic}^{l}$ is the deconvolution kernel of size $W_2 \times H_2 \times K \times C$. The number of parameters is $2 \times (W_2 \times H_2 \times K \times C + C)$, where $C$ is the number of biases $\tilde{b}_{c}^{l}$. In the decoding part, the input of the uppooling is the feature mappings of the encoding, whose size is 4 × 4 × 12; the output of the uppooling has size 8 × 8 × 12. A kernel size of 5 × 5 and 6 output feature mappings are employed in the deconvolution, so the output size of the decoding is 12 × 12 × 6.
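A minimal sketch of the uppooling step is given below, assuming inverse mean pooling: each complex value is copied to every position inside its 2 × 2 pooling window, so a 4 × 4 × 12 map becomes 8 × 8 × 12 (a subsequent deconvolution with 5 × 5 kernels in "full" mode would then restore the 12 × 12 spatial size). The function name is illustrative.

```python
import numpy as np

def complex_uppool(Y, size=2):
    """Copy every complex value over its pooling window (inverse mean pooling)."""
    return np.repeat(np.repeat(Y, size, axis=0), size, axis=1)

Y = np.random.randn(4, 4, 12) + 1j * np.random.randn(4, 4, 12)
print(complex_uppool(Y).shape)   # (8, 8, 12)
```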

2.1.2. Classification Network

The classification network consists of the trained encoding part of CV-CAE and the CFC. The encoding part has been elaborated in Section 2.1.1 and is not repeated here. The input of the CFC is a vector obtained by reshaping the encoding result $Y_{ik}^{l}$; the number of input neurons equals the number of elements in this vector. The result of the CFC is $O_{in}^{l}$, where $n$ is the index of a neuron in the complex-valued output layer ($n = 1, 2, \cdots, N$) and $N$ is the number of terrain types in the PolSAR image. Therefore, $O_{in}^{l}$ can be described as
$$O_{in}^{l} = \sigma\big(\mathrm{real}(o_{in}^{l})\big) + j\,\sigma\big(\mathrm{imag}(o_{in}^{l})\big) \tag{6}$$
$$o_{in}^{l} = \sum_{k=1}^{K} Y_{ik}^{l} \cdot W_{in}^{l} + b_{n}^{l} \tag{7}$$
where the character $\cdot$ represents the dot product operation. The parameters to be trained are the weights $W_{in}^{l}$ and the biases $b_{n}^{l}$. In the CFC, the number of input elements is 4 × 4 × 12 = 192, and the number of biases $b_{n}^{l}$ equals the number of output neurons $N$, which varies between datasets.
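The complex-valued fully connected layer of Equations (6) and (7) can be sketched as follows, assuming the 4 × 4 × 12 encoding result is flattened to a 192-dimensional complex vector; the function name complex_fc is illustrative.

```python
import numpy as np

def complex_fc(Y, W, b):
    """Y: (192,) complex vector; W: (192, N) complex weights; b: (N,) complex biases."""
    o = Y @ W + b                                        # Eq. (7)
    sig = lambda z: 1.0 / (1.0 + np.exp(-z))
    return sig(o.real) + 1j * sig(o.imag)                # Eq. (6)

N = 14                                                   # e.g., Flevoland with 14 classes
rng = np.random.default_rng(1)
Y = (rng.standard_normal((4, 4, 12)) + 1j * rng.standard_normal((4, 4, 12))).reshape(-1)
W = rng.standard_normal((192, N)) + 1j * rng.standard_normal((192, N))
b = np.zeros(N, dtype=complex)
print(complex_fc(Y, W, b).shape)                         # (14,)
```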

2.2. Network Training

In CV-CAE, training proceeds in two stages. First, unannotated data are utilized to train CV-CAE, and the trained encoding part is then used to extract features. Second, the annotated dataset is applied to train the CFC and fine-tune the encoding part. The detailed procedure is as follows.

2.2.1. CV-CAE Training

Training CV-CAE means minimizing the loss function $J(\theta)$, which aims to reconstruct the input of CV-CAE by optimizing the parameters $\theta$. Here $\theta$ includes the convolutional kernels $W_{ik}^{l}$, $\tilde{W}_{ic}^{l}$ and the biases $b_{k}^{l}$, $\tilde{b}_{c}^{l}$. In CV-CAE, the reconstruction error $J(\theta)$ between the input $X_{ic}^{l}$ and the output $\tilde{Y}_{ic}^{l}$ can be calculated by
$$J(\theta) = \frac{1}{2N} \sum_{i=1}^{N} \left[ \left( \mathrm{real}\big(\tilde{Y}_{ic}^{l}\big) - \mathrm{real}\big(X_{ic}^{l}\big) \right)^{2} + \left( \mathrm{imag}\big(\tilde{Y}_{ic}^{l}\big) - \mathrm{imag}\big(X_{ic}^{l}\big) \right)^{2} \right] \tag{8}$$
where $l = 1, 2, \cdots, L$ and $c = 1, 2, \cdots, C$ denote the network layers and channel numbers respectively. The $\tilde{W}_{ic}^{l}$ and $\tilde{b}_{c}^{l}$ in the decoding can be updated iteratively using the following equations.
$$\tilde{W}_{ic}^{l} = \tilde{W}_{ic}^{l} - \eta \frac{\partial J(\theta)}{\partial \tilde{W}_{ic}^{l}} \tag{9}$$
$$\tilde{b}_{c}^{l} = \tilde{b}_{c}^{l} - \eta \frac{\partial J(\theta)}{\partial \tilde{b}_{c}^{l}} \tag{10}$$
As can be seen from Equation (8), $J(\theta)$ is a function of the parameters $\theta$. To solve Equations (9) and (10), the partial derivatives $\frac{\partial J(\theta)}{\partial \tilde{W}_{ic}^{l}}$ and $\frac{\partial J(\theta)}{\partial \tilde{b}_{c}^{l}}$ are needed. By imitating the real-valued solution process and extending the chain rule to the complex domain, the result can be written as
$$\frac{\partial J(\theta)}{\partial \tilde{W}_{ic}^{l}} = \frac{\partial J(\theta)}{\partial\, \mathrm{real}(\tilde{W}_{ic}^{l})} + j \frac{\partial J(\theta)}{\partial\, \mathrm{imag}(\tilde{W}_{ic}^{l})} = \frac{\partial J(\theta)}{\partial\, \mathrm{real}(\tilde{Y}_{ic}^{l})} \frac{\partial\, \mathrm{real}(\tilde{Y}_{ic}^{l})}{\partial\, \mathrm{real}(\tilde{W}_{ic}^{l})} + \frac{\partial J(\theta)}{\partial\, \mathrm{imag}(\tilde{Y}_{ic}^{l})} \frac{\partial\, \mathrm{imag}(\tilde{Y}_{ic}^{l})}{\partial\, \mathrm{real}(\tilde{W}_{ic}^{l})} + j \left[ \frac{\partial J(\theta)}{\partial\, \mathrm{real}(\tilde{Y}_{ic}^{l})} \frac{\partial\, \mathrm{real}(\tilde{Y}_{ic}^{l})}{\partial\, \mathrm{imag}(\tilde{W}_{ic}^{l})} + \frac{\partial J(\theta)}{\partial\, \mathrm{imag}(\tilde{Y}_{ic}^{l})} \frac{\partial\, \mathrm{imag}(\tilde{Y}_{ic}^{l})}{\partial\, \mathrm{imag}(\tilde{W}_{ic}^{l})} \right] \tag{11}$$
With Equations (6)–(8), the second and third terms on the right-hand side of Equation (11) are zero, so only two terms remain in Equation (11). The derivative with respect to the bias $\tilde{b}_{c}^{l}$ can be calculated using the same method:
$$\frac{\partial J(\theta)}{\partial \tilde{b}_{c}^{l}} = \frac{\partial J(\theta)}{\partial\, \mathrm{real}(\tilde{b}_{c}^{l})} + j \frac{\partial J(\theta)}{\partial\, \mathrm{imag}(\tilde{b}_{c}^{l})} = \frac{\partial J(\theta)}{\partial\, \mathrm{real}(\tilde{Y}_{ic}^{l})} \frac{\partial\, \mathrm{real}(\tilde{Y}_{ic}^{l})}{\partial\, \mathrm{real}(\tilde{b}_{c}^{l})} + j \frac{\partial J(\theta)}{\partial\, \mathrm{imag}(\tilde{Y}_{ic}^{l})} \frac{\partial\, \mathrm{imag}(\tilde{Y}_{ic}^{l})}{\partial\, \mathrm{imag}(\tilde{b}_{c}^{l})} \tag{12}$$
The same update method is used for the encoding part. After training with the unannotated dataset, the discriminative features obtained by the encoding part of CV-CAE are used as input to the CFC.
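The sketch below illustrates the reconstruction loss of Equation (8) and the idea behind Equations (9)–(12): the real and imaginary parts of a parameter are optimized as two independent real parameters. The analytic split-complex derivatives are not reproduced here; as a purely illustrative stand-in they are replaced by finite-difference gradients, and the toy model (a single complex scale factor) is our own example, not the CV-CAE itself.

```python
import numpy as np

def reconstruction_loss(Y_hat, X):
    """Eq. (8): mean squared error over real and imaginary parts."""
    return 0.5 * np.mean((Y_hat.real - X.real) ** 2 + (Y_hat.imag - X.imag) ** 2)

# Toy model: reconstruct X with a single complex scale factor w.
rng = np.random.default_rng(0)
X = rng.standard_normal((12, 12, 6)) + 1j * rng.standard_normal((12, 12, 6))
w, eta, eps = 0.2 + 0.1j, 0.1, 1e-6
for _ in range(200):
    base = reconstruction_loss(w * X, X)
    g_re = (reconstruction_loss((w + eps) * X, X) - base) / eps        # dJ / d real(w)
    g_im = (reconstruction_loss((w + 1j * eps) * X, X) - base) / eps   # dJ / d imag(w)
    w -= eta * (g_re + 1j * g_im)                                      # update in the style of Eqs. (9)-(10)
print(np.round(w, 3))   # close to 1+0j, i.e., the input is reconstructed
```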

2.2.2. Classification Network Training

The annotated dataset is used to train the CFC in this stage. In the real-valued convolutional autoencoder (RV-CAE), softmax is used as the output layer to obtain the probability of each category. However, complex-valued input data cannot yield such probabilistic values for every class. Therefore, the output layer is a complex-valued fully connected layer with $N$ neurons. The mean square error (MSE) between the output of the CFC and a one-hot vector is used as the loss function. In the complex domain, the ON value of the one-hot vector is set to $1 + j$ and the other entries are 0; the length of the vector equals the number of classes in the dataset. Therefore, the loss function of the CFC is defined as
$$E = \frac{1}{2N} \sum_{i=1}^{N} \left[ \left( \mathrm{real}(T_{i}) - \mathrm{real}(O_{in}^{l}) \right)^{2} + \left( \mathrm{imag}(T_{i}) - \mathrm{imag}(O_{in}^{l}) \right)^{2} \right] \tag{13}$$
where $O_{in}^{l}$ is the output of the CFC and $T_{i}$ is the target corresponding to the $i$th input. The update rules for the complex-valued weights $W_{in}^{l}$ and biases $b_{n}^{l}$ are similar to those used in the encoding part of CV-CAE.
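A small sketch of the complex one-hot target (ON value $1 + j$) and an MSE loss in the spirit of Equation (13), computed here over the N outputs of a single sample, is given below; the function names are illustrative.

```python
import numpy as np

def complex_one_hot(label, N):
    """Complex one-hot target: the ON entry is 1 + j, all others are 0."""
    t = np.zeros(N, dtype=complex)
    t[label] = 1 + 1j
    return t

def cfc_loss(O, T):
    """MSE over the real and imaginary parts of the CFC output."""
    return 0.5 * np.mean((T.real - O.real) ** 2 + (T.imag - O.imag) ** 2)

T = complex_one_hot(2, 14)                     # a sample of class 3 in a 14-class problem
O = np.full(14, 0.1 + 0.1j)
O[2] = 0.9 + 0.9j                              # nearly correct output
print(cfc_loss(O, T))                          # small loss value
```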

2.3. Spatial Pixel-Squares Refinement

The goal of PolSAR image classification is to assign each pixel to one class, but some pixels may be misclassified, which lowers the classification accuracy. To reduce this effect, this paper proposes a post-processing method called SPF based on the blocky structure of PolSAR images. The whole algorithm is summarized in Algorithm 1. For a preliminary classification map of size $w \times h$, the number of times the PixS moves in the horizontal and vertical directions is $\lfloor w / s \rfloor$ and $\lfloor h / s \rfloor$, where $s$ is the stride of the PixS movement and $\lfloor \cdot \rfloor$ indicates rounding down. $pixNum_{n}$ represents the number of pixels of the $n$th class in the PixS ($1 \le n \le r \times r$).
Algorithm 1: Spatial Pixel-Squares Refinement
Input: Preliminary classification result of size $w \times h$, PixS size $r$, stride $s$, threshold $\tau_0$.
while not all PixS have been refined do
1: Find the class with the largest number of pixels $pixNum_{\max}$ in the PixS.
2: If $r \times r / 2 < pixNum_{\max} < r \times r$
3:    Sort all classes in the PixS by their number of pixels: $pixNum_{\max}, pixNum_{2ed\_\max}, \cdots$.
4:    If $pixNum_{\max} - pixNum_{2ed\_\max} > \tau_0$
5:      Refine all pixels in the PixS to the class with the largest number of pixels.
6:    end if
7: end if
end while
Output: Refined result.
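The following is a Python sketch of Algorithm 1, assuming, for simplicity, non-overlapping 3 × 3 pixel-squares (stride equal to the PixS size); incomplete squares at the image border are skipped. Function and variable names are illustrative, not from the paper, and the two conditions in steps 2 and 4 are explained in the text that follows.

```python
import numpy as np
from collections import Counter

def spf(label_map, r=3, s=3, tau0=3):
    """Refine a (w, h) integer label map with spatial pixel-squares refinement."""
    refined = label_map.copy()
    w, h = label_map.shape
    for i in range(0, (w // s) * s, s):
        for j in range(0, (h // s) * s, s):
            pixs = refined[i:i + r, j:j + r]
            if pixs.shape != (r, r):
                continue                                  # skip incomplete squares at the border
            counts = Counter(pixs.ravel()).most_common()
            num_max = counts[0][1]
            if not (r * r / 2 < num_max < r * r):         # majority-voting condition, Eq. (14)
                continue
            num_2nd = counts[1][1]
            if num_max - num_2nd > tau0:                  # difference-value condition, Eq. (15)
                refined[i:i + r, j:j + r] = counts[0][0]
    return refined

# Usage: a 3 x 3 PixS with six pixels of class 3 is refined to all 3s.
pixs = np.array([[3, 3, 1],
                 [3, 3, 2],
                 [3, 3, 1]])
print(spf(pixs))
```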
In SPF, the most critical step is to determine the refinement condition. Majority voting and a difference-value criterion are used as the judgement rules. Specifically, majority voting is applied to find the class with the largest number of pixels $pixNum_{\max}$ in the PixS, where the PixS size is $r$ ($r$ represents the number of pixels in each row or column, $r \ge s$). Then $pixNum_{\max}$ is compared with $r \times r / 2$ to decide whether to continue processing the PixS or to move to the next PixS, i.e.,
$$r \times r / 2 < pixNum_{\max} < r \times r \tag{14}$$
where $r \times r / 2$ is chosen so that at most one class can satisfy the refinement condition. A PixS satisfying Equation (14) is called an unstable window. In an unstable window, the numbers of pixels belonging to each class, $pixNum_{1}, pixNum_{2}, \cdots, pixNum_{n}$, are computed and sorted in descending order; the resulting queue can be written as $pixNum_{\max}, pixNum_{2ed\_\max}, \cdots$. In our work, SPF refines all classes in an unstable window that satisfies the next refinement condition into a single class. To reduce computational complexity, the difference-value criterion considers only the difference between the first two classes, which is compared with the threshold $\tau_0$; the second refinement condition is
$$pixNum_{\max} - pixNum_{2ed\_\max} > \tau_{0} \tag{15}$$
If both Equations (14) and (15) hold, all pixels in the PixS are changed to the category with the largest number of pixels.
The diagram of SPF is shown in Figure 2; the left side shows an unprocessed PixS, and the refined result is displayed on the right.
In SPF, the size of the PixS is one of the most crucial factors affecting the refinement result. Experiments show that a larger PixS incorrectly refines pixels of other classes at the edges of land cover, while a smaller PixS reduces the efficiency of refinement. The optimal result is obtained by setting the size of the PixS to 3 × 3 and the threshold $\tau_0$ to 3.
There are three different cases of PixS to be refined in Figure 3. Each digit in the PixS represents a pixel class.
When $r = 3$, we have $r \times r / 2 = 4.5$. Let $num(y_c = m)$ denote the number of pixels of class $m$, with $m \in \{1, 2, 3\}$. In Figure 3a, $num(y_c = 3) = 6 > 4.5$, $num(y_c = 1) = 2$, and $num(y_c = 2) = 1$, so $pixNum_{\max} - pixNum_{2ed\_\max} = 6 - 2 = 4 > \tau_0$. Hence, the classes of all pixels in the PixS are changed to 3. In Figure 3b, $num(y_c = 3) = 5 > 4.5$, $num(y_c = 1) = 3$, and $num(y_c = 2) = 1$, but $pixNum_{\max} - pixNum_{2ed\_\max} = 5 - 3 = 2 < \tau_0$, so this PixS remains unchanged. In Figure 3c, $num(y_c = 3) = 3 < 4.5$, so it also remains unchanged.
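As a quick check, the snippet below evaluates the two refinement conditions of Equations (14) and (15) on the three cases of Figure 3 with $r = 3$ and $\tau_0 = 3$; for case (c) the remaining six pixels are assumed to split evenly between the other two classes, which the text above does not specify.

```python
r, tau0 = 3, 3
cases = {
    "(a)": {3: 6, 1: 2, 2: 1},   # expected: refine
    "(b)": {3: 5, 1: 3, 2: 1},   # expected: keep (5 - 3 = 2 < tau0)
    "(c)": {3: 3, 1: 3, 2: 3},   # expected: keep (3 < r*r/2), assumed even split
}
for name, counts in cases.items():
    ordered = sorted(counts.values(), reverse=True)
    cond14 = r * r / 2 < ordered[0] < r * r          # Eq. (14)
    cond15 = ordered[0] - ordered[1] > tau0          # Eq. (15)
    print(name, "refine" if cond14 and cond15 else "keep")
```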

3. Experimental Results and Discussion

3.1. PolSAR Datasets

3.1.1. PolSAR Data Preprocessing

The scattering characteristics of pixels in PolSAR images are represented by a scattering matrix S [38]. It is defined as
$$S = \begin{bmatrix} S_{HH} & S_{HV} \\ S_{VH} & S_{VV} \end{bmatrix} \tag{16}$$
Generally, the covariance matrix or the coherency matrix is used as the basic unit of a PolSAR image [39]. In CV-CAE, the covariance matrix is adopted. The covariance matrix contains all the polarimetric information of the target obtained by the radar measurement and is deduced from the scattering matrix; its effectiveness has been verified in [40]. According to the reciprocity theorem, $S_{HV} = S_{VH}$, so the scattering vector is $x = \left[ S_{HH}, \sqrt{2} S_{HV}, S_{VV} \right]^{T}$. The covariance matrix is then obtained from the outer product of $x$ as follows
$$C = x x^{H} = \begin{bmatrix} |S_{HH}|^{2} & \sqrt{2}\, S_{HH} S_{HV}^{*} & S_{HH} S_{VV}^{*} \\ \sqrt{2}\, S_{HV} S_{HH}^{*} & 2 |S_{HV}|^{2} & \sqrt{2}\, S_{HV} S_{VV}^{*} \\ S_{VV} S_{HH}^{*} & \sqrt{2}\, S_{VV} S_{HV}^{*} & |S_{VV}|^{2} \end{bmatrix} \tag{17}$$
where the superscripts $*$, $T$, and $H$ denote conjugation, transposition, and conjugate transposition respectively. To suppress the speckle noise of PolSAR images, multi-look processing is applied to the covariance matrix
$$C = \frac{1}{L} \sum_{i=1}^{L} x_{i} x_{i}^{H} = \begin{bmatrix} C_{11} & C_{12} & C_{13} \\ C_{21} & C_{22} & C_{23} \\ C_{31} & C_{32} & C_{33} \end{bmatrix} \tag{18}$$
where $L$ is the number of looks and $x_{i}$ is the scattering vector of the $i$th look. From the scattering properties of PolSAR images, the elements on the principal diagonal of the covariance matrix $C$ are real-valued; the remaining elements are complex-valued and form conjugate pairs at symmetric positions about the main diagonal, i.e., $C_{12}$ and $C_{21}$, $C_{13}$ and $C_{31}$, and $C_{23}$ and $C_{32}$ are conjugate. To reduce redundancy while preserving the integrity of the input information, the upper triangular elements $C_{11}, C_{12}, C_{13}, C_{22}, C_{23}, C_{33}$ of $C$ are employed as the input of CV-CAE.
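The sketch below illustrates this preprocessing for a single multi-looked pixel, assuming the single-look scattering components are available as arrays: build the scattering vectors, average the outer products over the L looks as in Equation (18), and keep the six upper-triangular channels. Variable and function names are illustrative.

```python
import numpy as np

def covariance_channels(S_hh, S_hv, S_vv):
    """S_hh, S_hv, S_vv: (L,) complex samples of one multi-looked pixel."""
    x = np.stack([S_hh, np.sqrt(2) * S_hv, S_vv], axis=1)          # (L, 3) scattering vectors
    C = np.mean(x[:, :, None] * np.conj(x[:, None, :]), axis=0)    # Eq. (18): (3, 3) covariance
    iu = np.triu_indices(3)
    return C[iu]                                                    # C11, C12, C13, C22, C23, C33

L = 4
rng = np.random.default_rng(1)
s = lambda: rng.standard_normal(L) + 1j * rng.standard_normal(L)
print(covariance_channels(s(), s(), s()).shape)                     # (6,): the six input channels
```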
In computer vision, data normalization can effectively avoid vanishing and exploding gradients and improve the convergence efficiency of the proposed network [25]. Therefore, the real values (diagonal elements) and complex values (off-diagonal elements) of the input data need to be preprocessed. Taking the first channel $C_{11}$ as an example of the real values,
$$\tilde{C}_{11} = \frac{C_{11} - \mu_{C_{11}}}{\delta_{C_{11}}^{2}} \tag{19}$$
where $\tilde{C}_{11}$ is the normalized result of $C_{11}$, and $\mu_{C_{11}}$ and $\delta_{C_{11}}^{2}$ are the mean and variance of $C_{11}$, defined as
$$\mu_{C_{11}} = \frac{1}{n} \sum_{i=1}^{n} C_{11}^{i} \tag{20}$$
$$\delta_{C_{11}}^{2} = \frac{1}{n} \sum_{i=1}^{n} \left( C_{11}^{i} - \mu_{C_{11}} \right)^{2} \tag{21}$$
Taking the second channel $C_{12}$ as an example of the complex values,
$$\tilde{C}_{12} = \frac{C_{12} - \mu_{C_{12}}}{\delta_{C_{12}}^{2}} \tag{22}$$
where the mean $\mu_{C_{12}}$ and variance $\delta_{C_{12}}^{2}$ of $C_{12}$ are calculated by
$$\mu_{C_{12}} = \frac{1}{n} \sum_{i=1}^{n} C_{12}^{i} \tag{23}$$
$$\delta_{C_{12}}^{2} = \frac{1}{n} \sum_{i=1}^{n} \left( C_{12}^{i} - \mu_{C_{12}} \right) \overline{\left( C_{12}^{i} - \mu_{C_{12}} \right)} \tag{24}$$
The other real-valued channels ($C_{22}$ and $C_{33}$) and complex-valued channels ($C_{13}$ and $C_{23}$) of the input data are treated in the same way.
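A minimal sketch of this channel-wise normalization, Equations (19)–(24), is given below; it assumes each channel is flattened to a one-dimensional array over all pixels, and the same routine handles both the real diagonal channels and the complex off-diagonal channels. The function name is illustrative.

```python
import numpy as np

def normalize_channel(C):
    """C: 1-D array (real or complex) holding one covariance channel over all pixels."""
    mu = C.mean()                                                    # Eqs. (20), (23)
    var = np.mean((C - mu) * np.conj(C - mu)).real                   # Eqs. (21), (24)
    return (C - mu) / var                                            # Eqs. (19), (22)

rng = np.random.default_rng(2)
C11 = rng.standard_normal(1000) ** 2                                 # real diagonal channel
C12 = rng.standard_normal(1000) + 1j * rng.standard_normal(1000)     # complex off-diagonal channel
print(normalize_channel(C11).dtype, normalize_channel(C12).dtype)
```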

3.1.2. PolSAR Datasets for Experiment

In this paper, three PolSAR images are used to verify the performance of the proposed algorithm. These datasets were acquired with the Airborne SAR (AIRSAR) platform. Two of them show agricultural areas over Flevoland in the Netherlands and are available online at https://earth.esa.int/web/guest/missions/esa-operational-eo-missions/envisat, and the third is AIRSAR data over San Francisco [30]. After preprocessing, each dataset is divided into a training set and a test set: 5% of the samples are used for training and the rest for testing. The spatial size of the test samples is 12 × 12 with 6 channels, the same as that of the training samples. Detailed analysis is given in the following experiments.

3.2. Comparative Algorithms

To objectively evaluate the effectiveness of the proposed method, our algorithm is compared against four state-of-the-art algorithms: RV-CAE, WAE, WCAE, and the fixed-feature-size CNN (FFS-CNN) [41]. To ensure a fair comparison, firstly, the input information content of RV-CAE should be equivalent to that of CV-CAE, so the input channels of RV-CAE are designed as $C_{11}$, $C_{22}$, $C_{33}$, $\mathrm{real}(C_{12})$, $\mathrm{imag}(C_{12})$, $\mathrm{real}(C_{13})$, $\mathrm{imag}(C_{13})$, $\mathrm{real}(C_{23})$, $\mathrm{imag}(C_{23})$. Secondly, the number of parameters in CV-CAE and RV-CAE must be the same; therefore, in the experiments, the structure and the number of parameters of RV-CAE are configured according to Table 2. In this table, "Parameters" indicates the number of parameters in each layer, $S_{R_w} \times S_{R_h}$ and $S_{C_w} \times S_{C_h}$ represent the sizes of the mean-pooling feature mappings of RV-CAE and CV-CAE respectively, and $N$ is the number of terrain types.
The structure of RV-CAE is the same as that of CV-CAE, but the input size of RV-CAE is 12 × 12 with 9 channels. Since the number of parameters in the complex domain is double that in the real domain, the number of feature mappings of RV-CAE is set to 16 so that its parameter count matches that of CV-CAE: the number of kernel parameters is $5 \times 5 \times 9 \times 16$, which equals $5 \times 5 \times 6 \times 12 \times 2$ in CV-CAE. In both CV-CAE and RV-CAE, the number of parameters of the fully connected layer is the product of the numbers of input and output neurons. In CV-CAE, the number of input neurons is $S_{C_w} \times S_{C_h}$, while in RV-CAE it is $S_{R_w} \times S_{R_h}$; these are the reshaped feature mappings of the mean pooling layer.
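As a quick arithmetic check of this parameter matching, the snippet below compares the kernel-parameter counts of one real-valued convolutional layer of RV-CAE and one complex-valued layer of CV-CAE, counting each complex weight as two real parameters.

```python
rv_params = 5 * 5 * 9 * 16        # RV-CAE convolution kernels (real weights)
cv_params = 5 * 5 * 6 * 12 * 2    # CV-CAE convolution kernels (real + imaginary parts)
print(rv_params, cv_params)        # 3600 3600
```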

3.3. Results and Analysis of Experiments

3.3.1. Experiment on Flevoland Datasets of 14 Classes

The first experiment is carried out on the dataset over Flevoland, which is a subset of an L-band, full PolSAR image acquired by the AIRSAR platform in 1991. It is widely used as benchmark data for PolSAR image classification research. The Pauli RGB image and the corresponding ground truth are shown in Figure 4a,b; the image size is 1020 × 1024 pixels. There are 14 identified classes in total: Potatoes, Fruit, Oats, Beet, Barley, Onions, Wheats, Beans, Peas, Maize, Flax, Rapeseed, Grass, and Lucerne. Each color in the ground-truth map indicates one class; the corresponding legends are listed in Figure 4c.
The structure of the network is shown in Figure 1, and the hyperparameters were selected as follows. First, an unsupervised training process is employed to train CV-CAE with a learning rate of 0.001. Then the annotated training data are used to train the CFC and fine-tune the encoding part of CV-CAE. In the supervised training process, the learning rate η is 0.48 and the batch size is 100. In the CFC, the number of neurons of the input layer is 192 and that of the output layer is 14.
For convenience, the combination of the proposed CV-CAE and SPF is abbreviated as CV-CAE+SPF. The classification results of the compared algorithms and the proposed algorithms are shown in Figure 5, with notable differences highlighted by black rectangles. Comparing Figure 5a,c, the number of misclassified pixels of CV-CAE is clearly smaller than that of RV-CAE, and the classification map of CV-CAE is smoother within classes than that of RV-CAE. As shown in the lower left of Figure 5a,c, CV-CAE produces more distinguishable edges. Comparing Figure 5a,b, the number of misclassified pixels is further reduced by CV-CAE+SPF, which achieves the best classification result of the three algorithms.
The classification accuracy of each class, the OA, and the Kappa coefficient are listed in Table 3, with the best results shown in bold. From Table 3, CV-CAE+SPF obtains better accuracy than CV-CAE and RV-CAE: the OA of RV-CAE, CV-CAE, and CV-CAE+SPF is 98.34%, 98.7%, and 98.82% respectively, and the Kappa coefficients of CV-CAE and CV-CAE+SPF are also improved, which indicates the effectiveness of our methods. Specifically, the accuracy of Oats reaches 100% with the proposed methods, and the accuracy of Beans is 92.7% for CV-CAE while only 82.9% for RV-CAE. These results illustrate that phase information is a crucial feature in PolSAR image classification tasks.
In addition, the classification accuracy of CV-CAE+SPF is further improved compared with CV-CAE in Table 3, which indicates the success of SPF. Furthermore, another experiment was carried out to evaluate the efficiency of the proposed SPF. The results show that the proposed SPF takes 4.39 s while improving the accuracy by 0.12%, whereas the compared algorithm (pixel-by-pixel refinement based on majority voting) takes 70.95 s while increasing the accuracy by only 0.04%. However, the proposed algorithm achieves a lower accuracy on Onions. From the confusion matrix of CV-CAE+SPF shown in Table 4 (each row indicates the true class and each column the predicted class; 1 to 14 represent Potatoes, Fruit, Oats, Beet, Barley, Onions, Wheats, Beans, Peas, Maize, Flax, Rapeseed, Grass, and Lucerne), it can be seen that Beet, Wheats, Beans, and Maize account for a large share of the misclassifications of Onions. Considering the ground truth in Figure 4b, the annotated area of Onions is smaller than that of other classes such as Potatoes, Barley, and Wheats. Consequently, many of its input patches are smaller than 12 × 12 and require zero padding, so the discriminative features cannot be extracted adequately.

3.3.2. Experiment on Flevoland Datasets of 15 Classes

In this experiment, the dataset was acquired by the AIRSAR platform in 1989. The Pauli RGB image and the corresponding ground truth are shown in Figure 6a,b; the image size is 750 × 1024 pixels. According to the ground truth, there are 15 classes: Stem beans, Peas, Forest, Lucerne, Wheat, Beet, Potatoes, Bare soil, Grass, Rapeseed, Barley, Wheat2, Wheat3, Water, and Buildings. The network structure, the ratio of training data, and the other hyperparameters are the same as in the Flevoland 14-class experiment, except that the learning rate η is 0.43 and the number of output neurons is 15.
Figure 7 shows the classification results of WAE, WCAE, RV-CAE, FFS-CNN, CV-CAE, and CV-CAE+SPF in panels (a) to (f). It can be seen from Figure 7a,c that most pixels of Rapeseed are misclassified as Water and Wheat2, and in Figure 7b many pixels of Rapeseed are misclassified as Grass and Wheat2. Comparing the proposed methods with the compared methods, Figure 7e,f show fewer misclassified pixels than the compared algorithms, so CV-CAE and CV-CAE+SPF give the best performance. In addition, the intra-class smoothness and inter-class distinctness of the proposed algorithms are better than those of the compared algorithms.
The classification accuracy of the proposed and compared algorithms is listed in Table 5. CV-CAE and CV-CAE+SPF achieve better OA than the compared algorithms, followed by WCAE, WAE, and RV-CAE. In this experiment, WAE does not perform well in recognizing Beet, Potatoes, and Grass, with accuracies below 85%, while CV-CAE+SPF achieves 93.09%, 89.24%, and 87.02% respectively. Moreover, RV-CAE cannot clearly distinguish Potatoes, Grass, and Buildings, discriminating Potatoes and Grass with accuracies of only 77.56% and 73.14%. The proposed CV-CAE improves the accuracy of these two classes by about 10 points compared with RV-CAE and also achieves 100% accuracy on Bare soil. Therefore, phase information promotes the improvement of classification accuracy. To illustrate the effect of SPF, CV-CAE and CV-CAE+SPF are compared: the OA is increased by 1 point with CV-CAE+SPF, i.e., 94.31% compared with 93.31%. The result of FFS-CNN is higher than that of CV-CAE but lower than that of CV-CAE+SPF. Furthermore, FFS-CNN is based on LeNet-5 and contains three convolutional layers with a kernel size of 3 × 3 and 100 feature mappings, so its number of parameters is much larger than that of the algorithm proposed in this paper.
To evaluate the generalization of the proposed SPF, it is also used to process the preliminary classification results of the compared methods. The OA of WAE, WCAE, RV-CAE, and FFS-CNN is improved by 0.78%, 1.03%, 1.29%, and 0.86% respectively.

3.3.3. Experiment on San Francisco Datasets of 5 Classes

The San Francisco dataset, acquired by the AIRSAR platform, is adopted in this experiment. The Pauli RGB image and the corresponding ground truth are shown in Figure 8a,b. The five colors in the ground-truth map represent five terrain types: Water, Vegetation, Low-Density urban, High-Density urban, and Developed; the legends are listed in Figure 8c. From Figure 8b, most of the annotated areas are irregular, so the complexity of this experiment is higher than that of the previous two.
In this experiment, the learning rate η is 0.6 and the number of output neurons is 5; the network structure and the other hyperparameters are the same as in the two experiments above.
Table 6 lists the classification results of each algorithm. For WAE, the classification accuracy of Vegetation and Low-Density urban is 58.85% and 78.12%, while CV-CAE achieves a significant improvement in both. RV-CAE cannot clearly distinguish High-Density urban, with an accuracy of 80.76%. WCAE performs better than WAE and RV-CAE on Vegetation, Low-Density urban, and High-Density urban, although its accuracy on the Developed category is slightly lower than that of the other two algorithms. According to the results summarized in Table 6, the OA of CV-CAE+SPF is about 1.5 points higher than that of WCAE. However, the recognition rates of CV-CAE+SPF on Vegetation and High-Density urban are lower than those of the other classes. From the confusion matrix of CV-CAE+SPF shown in Table 7 (each row indicates the true class and each column the predicted class; 1 to 5 represent Water, Vegetation, Low-Density urban, High-Density urban, and Developed), a large proportion of these two classes is misclassified as Low-Density urban, indicating that their features are similar to those of Low-Density urban, which can also be verified in Figure 8b. Nevertheless, by exploiting the phase information and SPF, CV-CAE+SPF gives the best performance, with an OA of 97.03% and a Kappa coefficient of 0.96.

4. Conclusions

The CAE has demonstrated significant success in computer vision. To take advantage of the phase information of PolSAR images, the RV-CAE is extended to the complex domain and CV-CAE is proposed. CV-CAE is designed to extract more discriminative features from the amplitude and phase information of a small amount of unannotated training data. To fit the classification task, a small number of annotated training samples are needed to adjust the classification network, whose convolutional operation is initialized by the trained CV-CAE. We have tested the performance of the proposed CV-CAE on three PolSAR datasets and compared it against several similar models, including WAE, WCAE, and RV-CAE; CV-CAE achieves better performance than the compared algorithms. In addition, a post-processing method named SPF is proposed to further improve the performance. Benefiting from the blocky structure of land cover in PolSAR images, the proposed SPF refines the classes of all pixels in a spatial square at the same time, which alleviates the time consumption of pixel-level refinement. Compared with CV-CAE, CV-CAE+SPF further improves the classification accuracy. Future work will investigate replacing the two-stage network with an end-to-end network to reduce the complexity and improve the efficiency, as well as faster and more efficient post-processing methods to achieve better results.

Author Contributions

Methodology, G.W.; Data processing & experimental results analysis, G.W. and R.S.; Supervision and suggestions, R.S. and L.J.; Writing, review and editing, R.S. and M.A.O.

Funding

This work was partially supported by the National Natural Science Foundation of China under Grants 61773304, 61836009, 61772399 and U1701267, the Fund for Foreign Scholars in University Research and Teaching Programs (the 111 Project) under Grants No. B07048, the Major Research Plan of the National Natural Science Foundation of China under Grants 91438201 and 91438103, and the Program for Cheung Kong Scholars and Innovative Research Team in University under Grant IRT1170.

Acknowledgments

The authors would like to show their gratitude to the editors and the anonymous reviewers for their insightful comments.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
PolSAR: Polarimetric synthetic aperture radar
CV-CAE: Complex-valued convolutional autoencoder
SPF: Spatial pixel-squares refinement
PixS: Pixel-squares
CNN: Convolutional neural network
SAE: Sparse autoencoder
WAE: Wishart autoencoder
WCAE: Wishart convolutional autoencoder
CFC: Complex-valued fully connected (network)
RV-CAE: Real-valued convolutional autoencoder
MSE: Mean square error
OA: Overall accuracy

References

  1. van Zyl, J.J.; Burnette, C.F. Bayesian classification of polarimetric SAR images using adaptive a priori probabilities. Int. J. Remote Sens. 1992, 13, 835–840. [Google Scholar]
  2. Shang, R.; Yuan, Y.; Jiao, L.; Hou, B.; Esfahani, A.M.; Stolkin, R. A Fast Algorithm for SAR Image Segmentation Based on Key Pixels. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2017, 10, 5657–5673. [Google Scholar] [CrossRef]
  3. Wang, Y.; He, C.; Liu, X.; Liao, M. PolSAR Land Cover Classification Based on Roll-Invariant and Selected Hidden Polarimetric Features in the Rotation Domain. Remote Sens. 2017, 9, 660. [Google Scholar] [CrossRef]
  4. Szegedy, C.; Ioffe, S.; Vanhoucke, V.; Alemi, A.A. Inception-v4, inception-resnet and the impact of residual connections on learning. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (AAAI-17), San Francisco, CA, USA, 4–9 February 2017; p. 12. [Google Scholar]
  5. Yamaguchi, Y.; Moriyama, T.; Ishido, M.; Yamada, H. Four-component scattering model for polarimetric SAR image decomposition. IEEE Trans. Geosci. Remote Sens. 2005, 43, 1699–1706. [Google Scholar] [CrossRef]
  6. Zhang, Z.; Wang, H.; Xu, F.; Jin, Y. Complex-valued convolutional neural network and its application in polarimetric SAR image classification. IEEE Trans. Geosci. Remote Sens. 2017, 55, 7177–7188. [Google Scholar] [CrossRef]
  7. Akbarizadeh, G. A New Statistical-Based Kurtosis Wavelet Energy Feature for Texture Recognition of SAR Images. IEEE Trans. Geosci. Remote Sens. 2012, 50, 4358–4368. [Google Scholar] [CrossRef]
  8. Ghosh, A.; Subudhi, B.N.; Bruzzone, L. Integration of Gibbs Markov Random Field and Hopfield-Type Neural Networks for Unsupervised Change Detection in Remotely Sensed Multitemporal Images. IEEE Trans. Image Process. 2013, 22, 3087–3096. [Google Scholar] [CrossRef] [PubMed]
  9. Bombrun, L.; Beaulieu, J.M. Fisher Distribution for Texture Modeling of Polarimetric SAR Data. IEEE Geosci. Remote Sens. Lett. 2008, 5, 512–516. [Google Scholar] [CrossRef] [Green Version]
  10. Lee, T.S. Image Representation Using 2D Gabor Wavelets. IEEE Trans. Pattern Anal. Mach. Intell. 1996, 18, 959–971. [Google Scholar]
  11. Hu, J.; He, Z.; Li, J.; He, L.; Wang, Y. 3D-Gabor Inspired Multiview Active Learning for Spectral-Spatial Hyperspectral Image Classification. Remote Sens. 2018, 10, 1070. [Google Scholar] [CrossRef]
  12. Freeman, A.; Villasenor, J.; Klein, J.D.; Hoogeboom, P.; Groot, J. On the use of multi-frequency and polarimetric radar backscatter features for classification of agricultural crops. Int. J. Remote Sens. 1994, 15, 1799–1812. [Google Scholar] [CrossRef]
  13. Du, L.; Lee, J.; Hoppel, K.; Mango, S.A. Segmentation of SAR images using the wavelet transform. Int. J. Imaging Syst. Technol. 1992, 4, 319–326. [Google Scholar] [CrossRef]
  14. Lee, J.S.; Grunes, M.R.; Ainsworth, T.L.; Du, L.J.; Schuler, D.L.; Cloude, S.R. Unsupervised classification using polarimetric decomposition and the complex Wishart classifier. IEEE Trans. Geosci. Remote Sens. 1999, 37, 2249–2258. [Google Scholar]
  15. Hou, B.; Kou, H.; Jiao, L. Classification of polarimetric SAR images using multilayer autoencoders and superpixels. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2016, 9, 3072–3081. [Google Scholar] [CrossRef]
  16. Zhao, Q.; Principe, J.C. Support vector machines for SAR automatic target recognition. IEEE Trans. Aerosp. Electron. Syst. 2001, 37, 643–654. [Google Scholar] [CrossRef]
  17. Loosvelt, L.; Peters, J.; Skriver, H.; Baets, B.; Verhoest, N. Impact of reducing polarimetric SAR input on the uncertainty of crop classifications based on the random forests algorithm. IEEE Trans. Geosci. Remote Sens. 2012, 50, 4185–4200. [Google Scholar] [CrossRef]
  18. Uhlmann, S.; Kiranyaz, S. Integrating color features in polarimetric SAR image classification. IEEE Trans. Geosci. Remote Sens. 2014, 52, 2197–2216. [Google Scholar] [CrossRef]
  19. LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444. [Google Scholar] [CrossRef] [PubMed]
  20. Zhang, L.; Ma, W.; Zhang, D. Stacked Sparse Autoencoder in PolSAR Data Classification Using Local Spatial Information. IEEE Geosci. Remote Sens. Lett. 2016, 13, 1359–1363. [Google Scholar] [CrossRef]
  21. Geng, J.; Wang, H.; Fan, J.; Ma, X. SAR Image Classification via Deep Recurrent Encoding Neural Networks. IEEE Trans. Geosci. Remote Sens. 2018, 56, 2255–2269. [Google Scholar] [CrossRef]
  22. De, S.; Bruzzone, L.; Bhattacharya, A.; Bovolo, F.; Chaudhuri, S. A Novel Technique Based on Deep Learning and a Synthetic Target Database for Classification of Urban Areas in PolSAR Data. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2018, 11, 154–170. [Google Scholar] [CrossRef]
  23. Shang, R.; Wang, J.; Jiao, L.; Stolkin, R.; Hou, B.; Li, Y. SAR Targets Classification Based on Deep Memory Convolution Neural Networks and Transfer Parameters. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2018, 11, 2834–2846. [Google Scholar] [CrossRef]
  24. Gao, Q.; Lim, S.; Jia, X. Hyperspectral Image Classification Using Convolutional Neural Networks and Multiple Feature Learning. Remote Sens. 2018, 10, 299. [Google Scholar] [CrossRef]
  25. Hosseini, A.E.; Zurada, J.M.; Nasraoui, O. Deep learning of part-based representation of data using sparse autoencoders with nonnegativity constraints. IEEE Trans. Neural Netw. Learn. Syst. 2016, 27, 2486–2498. [Google Scholar] [CrossRef] [PubMed]
  26. Masci, J.; Meier, U.; Ciresan, D.; Schmidhuber, J. Stacked convolutional auto-encoders for hierarchical feature extraction. In Proceedings of the 21st International Conference on Artificial Neural Networks, Espoo, Finland, 14–17 June 2011; pp. 52–59. [Google Scholar]
  27. Deng, S.; Du, L.; Li, C.; Ding, J.; Liu, H. SAR automatic target recognition based on euclidean distance restricted autoencoder. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2017, 10, 3323–3333. [Google Scholar] [CrossRef]
  28. Chen, M.; Wang, Q.; Li, X. Discriminant Analysis with Graph Learning for Hyperspectral Image Classification. Remote Sens. 2018, 10, 836. [Google Scholar] [CrossRef]
  29. Chen, W.; Gou, S.; Wang, X.; Li, X.; Jiao, L. Classification of PolSAR Images Using Multilayer Autoencoders and a Self-Paced Learning Approach. Remote Sens. 2018, 10, 110. [Google Scholar] [CrossRef]
  30. Xie, W.; Jiao, L.; Hou, B.; Ma, W.; Zhao, J.; Zhang, S.; Liu, F. POLSAR image classification via Wishart-AE model or Wishart-CAE model. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2017, 10, 3604–3615. [Google Scholar] [CrossRef]
  31. Liu, F.; Jiao, L.; Hou, B.; Yang, S. POL-SAR Image Classification Based on Wishart DBN and Local Spatial Information. IEEE Trans. Geosci. Remote Sens. 2016, 54, 3292–3308. [Google Scholar] [CrossRef]
  32. Liu, H.; Yang, S.; Gou, S.; Chen, P.; Wang, Y.; Jiao, L. Fast Classification for Large Polarimetric SAR Data Based on Refined Spatial-Anchor Graph. IEEE Trans. Geosci. Remote Sens. 2017, 14, 1589–1593. [Google Scholar] [CrossRef]
  33. Ulaby, F.T.; Charles, E. Radar Polarimetry for Geoscience Applications; Artech House, Inc.: Norwood, MA, USA, 1990; 376p. [Google Scholar]
  34. Marques, P.A.; Dias, J.M. Moving Targets Processing in SAR Spatial Domain. IEEE Trans. Geosci. Remote Sens. 2007, 43, 864–874. [Google Scholar] [CrossRef]
  35. Boureau, Y.L.; Bach, F.; LeCun, Y.; Ponce, J. Learning mid-level features for recognition. In Proceedings of the Twenty-Third IEEE Conference on Computer Vision and Pattern Recognition, San Francisco, CA, USA, 13–18 June 2010; pp. 2559–2566. [Google Scholar]
  36. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar]
  37. Xavier, G.; Bordes, A.; Bengio, Y. Deep sparse rectifier neural networks. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, Fort Lauderdale, FL, USA, 11–14 April 2011; pp. 315–323. [Google Scholar]
  38. Li, Y.; Chen, Y.; Liu, G.; Jiao, L. A Novel Deep Fully Convolutional Network for PolSAR Image Classification. Remote Sens. 2018, 10, 1984. [Google Scholar] [CrossRef]
  39. Biondi, F. Multi-chromatic analysis polarimetric interferometric synthetic aperture radar (MCAPolInSAR) for urban classification. Int. J. Remote Sens. 2018, 1–30. [Google Scholar] [CrossRef]
  40. Chen, S.; Wang, X.; Sato, M. PolInSAR Complex Coherence Estimation Based on Covariance Matrix Similarity Test. IEEE Trans. Geosci. Remote Sens. 2012, 50, 4699–4710. [Google Scholar] [CrossRef]
  41. Wang, L.; Xu, X.; Dong, H.; Gui, R.; Pu, F. Multi-Pixel Simultaneous Classification of PolSAR Image Using Convolutional Neural Networks. Sensors 2018, 18, 769. [Google Scholar] [CrossRef] [PubMed]
Figure 1. CV-CAE architecture. Red and blue boxes are the structure of the CV-CAE and classification network respectively.
Figure 2. The refinement process of SPF. The shaded part represents $pixNum_{\max}$; the blank part represents other classes.
Figure 3. PixS to be refined in three different cases.
Figure 4. Flevoland datasets of 14 classes. (a) Pauli RGB. (b) Ground-truth map. (c) Legends.
Figure 5. The classification results and the results overlaid with the ground truth for our algorithms and RV-CAE. (a,d) are the results of CV-CAE. (b,e) are the results of CV-CAE+SPF. (c,f) are the results of RV-CAE.
Figure 6. Flevoland datasets of 15 classes. (a) Pauli RGB. (b) Ground-truth map. (c) Legends.
Figure 7. The classification results of our algorithms and the compared algorithms. (a) WAE. (b) WCAE. (c) RV-CAE. (d) FFS-CNN. (e) CV-CAE. (f) CV-CAE+SPF.
Figure 8. San Francisco dataset of 5 classes. (a) Pauli RGB. (b) Corresponding ground truth. (c) Legends.
Table 1. The Framework and Parameters Configuration of CV-CAE.

Layer No. | Architecture | Output Size (Pixels)
1 | Input layer | 12 × 12 × 6
2 | Conv.12 (5 × 5 × 6)/sigmoid | 8 × 8 × 12
3 | Mean-Po.2 (2 × 2) | 4 × 4 × 12
4 | Upsampl.2 (2 × 2) | 8 × 8 × 12
5 | Deconv.6 (5 × 5 × 12)/sigmoid | 12 × 12 × 6
6 | Fully connected | 1 × N
Table 2. The Structure and Parameters Number of RV-CAE and CV-CAE.

Layer No. | RV-CAE Architecture | RV-CAE Parameters | CV-CAE Architecture | CV-CAE Parameters
1 | Input layer | - | Input layer | -
2 | Conv.16 (5 × 5 × 9)/sigmoid | 3600 | Conv.12 (5 × 5 × 6)/sigmoid | 1800 × 2
3 | Mean-Po.2 (2 × 2) | - | Mean-Po.2 (2 × 2) | -
4 | Upsampl. (2 × 2) | - | Upsampl. (2 × 2) | -
5 | Deconv.9 (5 × 5 × 16)/sigmoid | 3600 | Deconv.6 (5 × 5 × 12)/sigmoid | 1800 × 2
6 | Fully connected | $S_{R_w} \times S_{R_h} \times N$ | Fully connected | $S_{C_w} \times S_{C_h} \times N \times 2$
Table 3. The OA and Kappa Coefficient of Our Algorithms and the Compared Algorithms.

Class | WAE | WCAE | RV-CAE | CV-CAE | CV-CAE+SPF
Potatoes | 89.83 | 99.78 | 99.69 | 99.79 | 99.8
Fruit | 97.62 | 88.2 | 94.76 | 97.09 | 98.07
Oats | 98.92 | 98.28 | 98.78 | 100 | 100
Beet | 89.66 | 91.72 | 90.03 | 92.51 | 92.77
Barley | 97.27 | 95.96 | 99.51 | 99.79 | 99.78
Onions | 81.48 | 85.69 | 97.42 | 91.88 | 90.7
Wheats | 89.47 | 94.91 | 99.76 | 99.86 | 99.87
Beans | 87.52 | 91.04 | 82.9 | 92.7 | 95.56
Peas | 89.95 | 91.49 | 99.91 | 99.54 | 99.77
Maize | 94.19 | 99.05 | 95.5 | 98.6 | 98.84
Flax | 94.49 | 89.12 | 94.02 | 95.54 | 96.56
Rapeseed | 89.62 | 94.73 | 99.9 | 99.93 | 99.94
Grass | 84.59 | 97.23 | 97.38 | 96.12 | 96.88
Lucerne | 96.34 | 97.46 | 99.73 | 98.61 | 98.2
OA | 96.53 | 97.49 | 98.34 | 98.7 | 98.82
Kappa | 0.96 | 0.97 | 0.98 | 0.984 | 0.986
Table 4. The Confusion Matrix of CV-CAE+SPF.

% | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14
1 | 99.8 | 0 | 0 | 0 | 0 | 0.03 | 0.01 | 0 | 0 | 0 | 0 | 0.13 | 0 | 0.02
2 | 0.32 | 98.07 | 0 | 0.26 | 0.21 | 0.03 | 0.03 | 0 | 1.03 | 0 | 0.05 | 0 | 0 | 0
3 | 0 | 0 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
4 | 0 | 0 | 0 | 92.77 | 0.01 | 6.83 | 0.03 | 0.04 | 0 | 0 | 0 | 0.01 | 0 | 0
5 | 0 | 0 | 0.02 | 0 | 99.78 | 0 | 0.19 | 0 | 0 | 0 | 0 | 0 | 0.02 | 0
6 | 0.38 | 0 | 0 | 1.6 | 0.38 | 90.7 | 1.17 | 1.46 | 0.14 | 1.55 | 0 | 0 | 0 | 0.05
7 | 0 | 0 | 0.09 | 0 | 0 | 0.03 | 99.87 | 0 | 0 | 0 | 0 | 0 | 0.09 | 0.01
8 | 0 | 0 | 0 | 0 | 0 | 4.25 | 0 | 95.56 | 0 | 0 | 0 | 0 | 0 | 0
9 | 0 | 0 | 0 | 0 | 0 | 0.23 | 0 | 0 | 99.77 | 0 | 0 | 0 | 0 | 0
10 | 0 | 0 | 0 | 0.23 | 0 | 0.85 | 0 | 0 | 0 | 98.84 | 0 | 0 | 0.08 | 0
11 | 0 | 0 | 0 | 0 | 0.05 | 0.42 | 0 | 1.53 | 0 | 0 | 96.56 | 0 | 1.44 | 0
12 | 0 | 0.03 | 0 | 0 | 0 | 0 | 0.02 | 0 | 0 | 0 | 0 | 99.94 | 0 | 0
13 | 0.64 | 0 | 0 | 0 | 0 | 1.05 | 1.43 | 0 | 0 | 0 | 0 | 0 | 96.88 | 0
14 | 0 | 0 | 0 | 0 | 0 | 1.42 | 0.37 | 0 | 0 | 0 | 0 | 0 | 0 | 98.2
Table 5. The OA and Kappa Coefficient of Our Algorithms and the Compared Algorithms.

Class | WAE | WCAE | RV-CAE | FFS-CNN | CV-CAE | CV-CAE+SPF
Stem beans | 88.02 | 93.09 | 88.2 | 93 | 92.25 | 93.56
Peas | 91.49 | 92.36 | 87.95 | 93.21 | 92.26 | 93.52
Forest | 97.89 | 98.74 | 97.12 | 98.97 | 98.74 | 99.21
Lucerne | 88.5 | 89.22 | 90.69 | 91.98 | 91.18 | 92.24
Wheat | 91.48 | 94.51 | 94.72 | 95.41 | 94.89 | 95.38
Beet | 84.7 | 91.01 | 80.25 | 91.85 | 90.9 | 93.09
Potatoes | 81.94 | 87.21 | 77.56 | 88.63 | 86.93 | 89.24
Bare soil | 97.92 | 99.42 | 100 | 99.09 | 99.12 | 99.35
Grass | 69.82 | 82.17 | 73.14 | 85.91 | 84.42 | 87.02
Rapeseed | 92.66 | 91.03 | 91.91 | 93.54 | 92.96 | 93.24
Barley | 96.89 | 93.83 | 94.34 | 94.34 | 93.45 | 94.88
Wheat2 | 87.84 | 90.91 | 88.98 | 91.09 | 90.42 | 91.65
Wheat3 | 94.85 | 96.6 | 95.29 | 97.08 | 96.76 | 97.13
Water | 98.67 | 96.66 | 99.05 | 97.76 | 97.56 | 97.72
Buildings | 86.55 | 87.09 | 82.14 | 90.55 | 90.55 | 90.13
OA | 90.74 | 92.94 | 90.39 | 94 | 93.31 | 94.31
Kappa | 0.9 | 0.92 | 0.89 | 0.935 | 0.93 | 0.94
Table 6. The OA and Kappa Coefficient of Our Algorithms and the Compared Algorithms.

Class | WAE | WCAE | RV-CAE | CV-CAE | CV-CAE+SPF
Water | 99.91 | 98.24 | 95.51 | 99.46 | 99.5
Vegetation | 58.85 | 91.34 | 87.87 | 93.71 | 93.77
Low-Density urban | 78.12 | 96.88 | 90.56 | 97.58 | 97.65
High-Density urban | 81.43 | 91.29 | 80.76 | 93.06 | 93.26
Developed | 94.08 | 93.63 | 94.15 | 95.54 | 95.88
OA | 87.87 | 95.44 | 90.84 | 96.94 | 97.03
Kappa | 0.81 | 0.93 | 0.86 | 0.95 | 0.96
Table 7. The Confusion Matrix of CV-CAE+SPF.

% | 1 | 2 | 3 | 4 | 5
1 | 99.5 | 0.44 | 0.01 | 0.04 | 0
2 | 0.39 | 93.77 | 2.77 | 1.47 | 1.59
3 | 0 | 0.36 | 97.65 | 1.99 | 0
4 | 0 | 0.1 | 6.04 | 93.26 | 0.45
5 | 0 | 3.28 | 0.28 | 0.56 | 95.88
