Article

Enhancing the Localization Accuracy of UAV Images under GNSS Denial Conditions

Han Gao, Ying Yu, Xiao Huang, Liang Song, Li Li, Lei Li and Lei Zhang

1 School of Geospatial Information, Information Engineering University, Zhengzhou 450001, China
2 31016 Troops, Beijing 100088, China
3 61175 Troops, Nanjing 210049, China
* Author to whom correspondence should be addressed.
Submission received: 3 November 2023 / Revised: 4 December 2023 / Accepted: 5 December 2023 / Published: 11 December 2023
(This article belongs to the Section Navigation and Positioning)

Abstract

Unmanned aerial vehicles (UAVs) are widely used in many industries. The use of UAV images for surveying requires that the images contain high-precision localization information. However, the accuracy of UAV localization can be compromised in complex GNSS environments. To address this challenge, this study proposed a scheme to improve the localization accuracy of UAV image sequences by combining traditional and deep learning methods. Initially, individual UAV images with high similarity were selected using an image retrieval and localization method based on cosine similarity. Then, exploiting the relationships among UAV sequence images, short strip sequence images were selected to facilitate approximate location retrieval. Subsequently, a deep learning image registration network combining SuperPoint and SuperGlue was employed for high-precision feature point extraction and matching, and the RANSAC algorithm was applied to eliminate mismatched points. In this way, the localization accuracy of UAV images was improved. Experimental results demonstrate that the mean errors of this approach were all within 2 pixels. Specifically, when using a satellite reference image with a resolution of 0.30 m/pixel, the mean error of the UAV ground localization method was reduced to 0.356 m.

1. Introduction

Unmanned aerial vehicles demonstrate adaptability to complex terrains and operate at high speeds, enabling their successful application across diverse fields such as agriculture [1], firefighting [2,3,4], express transportation [5], field search and rescue [6,7], location tracking [8], and military operations [9]. These drones rely on high-precision localization information obtained from the Global Navigation Satellite System (GNSS). However, in challenging environments such as canyons, forests, deserts, water bodies, urban settings, and areas affected by human interference [10,11,12], airborne GNSS receivers are prone to signal attenuation, interference, and multipath effects, which lead to unreliable UAV localization [13,14,15]. Consequently, accurate localization data become unattainable in these harsh conditions, and the resulting UAV images lack precise localization information. When used for surveying, UAV images should contain high-precision localization information. Therefore, under GNSS denial, onboard localization alone proves inadequate for practical purposes, and external data are required to improve the localization accuracy of UAV images.
Schleiss [16] proposed a method for converting aerial images into street map-like representations using the conditional generative adversarial network (cGAN) [17] to achieve the visual self-localization of UAVs. However, this method exhibited a median localization error of 22.7 m, with the median error for the entire dataset being approximately 40 m, indicating a relatively large positioning error. Yol et al. [18] proposed a similarity function based on mutual information [19], which stitched a series of geographic images into a map as the localization reference. While effective in textured urban environments, this method demonstrated poor robustness in areas with weaker textures. Shan et al. [20] used the histogram of oriented gradient (HOG) [21] features for the registration of UAV frames with satellite images. Subsequently, particle filtering algorithms were utilized to localize UAVs. This method was not robust and required clear texture features, such as buildings and roads. Filho et al. [22] proposed a method based on feature recognition to identify landmark buildings and then determine UAV localization information. This method required pre-configured building images in the task area to achieve feature extraction and UAV image localization, which was not conducive to rapid localization of UAV images. Goforth et al. [23] trained a deep learning model based on the convolutional neural network (CNN) using existing satellite images. They passed UAV images and geographic reference satellite images through a CNN consisting of VGG16 neural network layers [24] and matched them, and an average localization error of less than 8 m was achieved. The method required pre-training of the model using similar images in the task area. Thus, the robustness was not high for other scenes. Saranya et al. [25,26] used satellite images with geographic information as the reference to determine the location of the UAV with the SURF algorithm as the key point detector and the RANSAC error optimization algorithm. This method had a certain robustness for rotating and scaling images but was not validated for cross-scale images.
The aforementioned methods offer some robustness to rotation, scaling, and other transformations of the reference images. However, for cross-scale images, their robustness is poor, their localization errors are large, and none of them covers the entire processing chain. In response to these shortcomings, namely poor cross-scale image adaptability, low localization accuracy, and the lack of a full-process solution, this study proposed a method for improving the localization accuracy of UAV images with the assistance of satellite images. First, UAV images without localization information were used to simulate images captured under GNSS denial conditions. Commercial satellite images were utilized as reference images, and the strip relationship between UAV sequence images was used to assist in approximate location retrieval. Subsequently, feature extraction, registration, and error elimination were performed against the satellite images with localization information to obtain the correspondence between images, thereby achieving high-precision localization of UAV images. This comprehensive and automated approach can significantly enhance localization accuracy, particularly in GNSS-denied environments. Experimental results validate the feasibility and effectiveness of the proposed method.

2. Principles and Methods

This study aimed to improve the localization accuracy of UAV images under GNSS denial conditions. To achieve this objective, the study followed a specific technical approach: first, it employed an image retrieval and localization method based on cosine similarity to identify single UAV images with high similarity. Subsequently, leveraging the inherent relationships within UAV sequence images, short strip sequence images were selected to aid in approximate location retrieval, resulting in the estimation of UAV image positions within high-precision satellite images. Next, a deep learning image registration network combining SuperPoint and SuperGlue was utilized for precise feature point extraction and matching, and the RANSAC algorithm was applied to eliminate mismatched points. Through these steps, the localization accuracy of UAV images was significantly improved. Finally, the retained feature points were used to calculate the homography matrix between the UAV and satellite images, enabling precise localization of the UAV image.
The key technical flowchart of this study is shown in Figure 1. Figure 1a shows the input image and pre-processing stage. The input images consisted of original satellite images and sequence UAV images without localization information. Subsequently, the sequence UAV and satellite images underwent pre-processing, which included re-encoding, slicing, and scale normalization. Figure 1b shows the process of approximate location retrieval for UAV sequence images. The cosine similarity between the UAV image and satellite image was computed, enabling the approximate location retrieval of a single UAV image on the satellite image. Simultaneously, leveraging the strip relationship between the sequence UAV images facilitated the rapid extraction of neighboring UAV images, achieving quick approximate location retrieval and localization of UAV short strip sequence images. Figure 1c shows the accurate matching and accuracy improvement of UAV images. This study used the SuperPoint algorithm and SuperGlue neural network. Feature extraction and matching of UAV images without localization information were performed based on satellite images with localization information. Moreover, the RANSAC algorithm was used to optimize the matching results and eliminate mismatched points, leading to the derivation of the homography matrix. Consequently, the corresponding relationship between UAV images and satellite images was established. High-precision geographic coordinates could be quickly determined based on the pixel coordinates of UAV images, ultimately resulting in UAV images with high-precision localization information.

2.1. Approximate Location Retrieval of UAV Sequence Images

2.1.1. Image Pre-Processing

To address the differences in resolution and size between satellite images and UAV images, as well as the limitations imposed by deep learning networks and computer hardware on input image sizes, this study employed a geocoding method to process cross-scale remote sensing images from various sources.
Initially, the satellite images and high-resolution UAV images were geocoded based on latitude and longitude. By comparing the extent of ground objects covered by the UAV images and the satellite images, the scale difference between the two image sources was estimated through proportion conversion and prior knowledge. According to this estimated ratio, the encoded images were rescaled and then sliced according to the coding rules. This approach effectively reduced the volume of data to be processed, thereby enhancing processing efficiency. Subsequently, the sliced images were systematically renamed and arranged according to the geocoding, ensuring orderly data processing and establishing a unified coordinate system; this arrangement facilitated the subsequent extraction and matching of feature points. On this basis, the UAV images were used as candidate images for sequence matching with the query images, and feature points were extracted. This procedure copes effectively with the differences in resolution and size between satellite and UAV images and adapts to the limitations that the deep learning network and computer hardware impose on input image size. Through these steps, UAV images with high-precision localization information can be obtained, and the pixel coordinates of UAV images can be used to quickly derive high-precision geographic coordinates. The method not only improves the localization accuracy of UAV images but also handles cross-scale remote sensing images from different sources effectively, demonstrating practicality and scalability.
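To make the pre-processing step concrete, the following sketch illustrates one possible implementation of the scale normalization and slicing described above, using OpenCV. The tile size, overlap ratio, scale ratio, and the synthetic input image are illustrative assumptions rather than the exact parameters used in the paper.

```python
import cv2
import numpy as np

def normalize_and_slice(image, scale_ratio, tile_size=(1280, 960), overlap=0.2):
    """Rescale an image by an estimated UAV/satellite scale ratio and
    slice it into overlapping tiles keyed by their grid position.

    scale_ratio : estimated ground-sampling-distance ratio obtained from
                  prior knowledge (an assumed input).
    """
    h, w = image.shape[:2]
    # Bring the image to a common scale before slicing.
    resized = cv2.resize(image, (int(w * scale_ratio), int(h * scale_ratio)))

    tw, th = tile_size
    step_x, step_y = int(tw * (1 - overlap)), int(th * (1 - overlap))
    tiles = {}
    for row, y in enumerate(range(0, max(resized.shape[0] - th, 1), step_y)):
        for col, x in enumerate(range(0, max(resized.shape[1] - tw, 1), step_x)):
            # The key acts as a simple geocode so tiles stay in a known order.
            tiles[f"r{row:03d}_c{col:03d}"] = resized[y:y + th, x:x + tw]
    return tiles

# Synthetic stand-in for a real satellite mosaic; a real run would load the image from disk.
satellite = np.random.randint(0, 255, (4000, 4000, 3), np.uint8)
tiles = normalize_and_slice(satellite, scale_ratio=1.0)
```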

2.1.2. UAV Sequence Image Extraction

A UAV captures a series of images along a specific strip during its flight, and these images exhibit strong correlations. Among these, the positional relationship between adjacent images can be determined through the relative orientation of continuous image pairs. Hence, this study utilized the strip relationships among UAV images to approximate their locations.
In this study, we present an efficient method for approximate UAV location computation based on short strips. The method leverages the strip relationships to merge UAV images, enabling more accurate estimation of image locations and enhancing computational efficiency. Initially, the UAV sequence images were registered with the reference satellite images, and the cosine similarity between each pair of images was calculated. Subsequently, the UAV image with the highest cosine similarity was chosen as the center image of the strip, and the two images before and after the center image were selected, forming a short strip of five coarsely overlapping images (refer to Figure 2). Finally, the short strip was treated as a whole, and its approximate location was calculated using the satellite images.
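The short strip selection itself reduces to choosing the most similar image and its neighbors along the flight line. A minimal sketch is given below; the similarity scores are hypothetical, and the half-width of two images corresponds to the five-image strip described above.

```python
def select_short_strip(similarities, half_width=2):
    """Pick the strip centre (highest cosine similarity to the satellite
    reference) and its neighbours along the flight line.

    similarities : per-image similarity scores, in acquisition order.
    Returns the indices of the short strip (fewer images at sequence ends).
    """
    centre = max(range(len(similarities)), key=similarities.__getitem__)
    start = max(centre - half_width, 0)
    end = min(centre + half_width, len(similarities) - 1)
    return list(range(start, end + 1))

# Example with hypothetical scores for a 10-image sequence.
scores = [0.71, 0.74, 0.78, 0.83, 0.92, 0.88, 0.80, 0.76, 0.73, 0.70]
print(select_short_strip(scores))   # -> [2, 3, 4, 5, 6]
```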

2.1.3. Calculation of Approximate Location

GNSS denial typically occurs over relatively small areas; therefore, the coverage of the satellite images selected for the experiments in this study was correspondingly small. In this context, UAV images were compared with satellite images carrying localization information, and similarities between them were identified through feature extraction and registration. In this process, a cosine similarity-based algorithm was used to roughly retrieve the location of the UAV sequence images on the satellite images. Cosine similarity measures the degree of similarity between two vectors in a vector space by calculating the cosine of the angle between them. If the angle between the two vectors is close to 0°, their cosine similarity is high, indicating a high similarity between the two images; conversely, if the angle is close to 90°, their cosine similarity is low, indicating a low similarity between the two images. The relative position of the UAV image on the satellite image can be retrieved quickly by this method, which avoids the limitations that deep learning networks and computer hardware performance place on the size of the input image. Moreover, the method is efficient and fast because the cosine similarity computation is simple, and its lightweight deployment eases the burden on hardware, rendering it highly practical and scalable.
Notably, this study conducted comparative experiments on several traditional image retrieval methods, including average hashing, interpolation hashing, perceptual hashing, wavelet hashing, and cosine similarity detection. It was experimentally verified that cosine similarity detection performed well for similarity detection between cross-scale remote sensing images and UAV images from different sources. Therefore, during image pre-processing, this study used the cosine similarity detection method (refer to Figure 3). The principle is as follows:
The cosine of the angle between two vectors was utilized to measure the similarity of two images as follows:
$$\cos\theta = \frac{\mathbf{A}\cdot\mathbf{B}}{\lVert\mathbf{A}\rVert\,\lVert\mathbf{B}\rVert} = \frac{\sum_{i=1}^{n} x_i \, y_i}{\sqrt{\sum_{i=1}^{n} x_i^2}\,\sqrt{\sum_{i=1}^{n} y_i^2}} \tag{1}$$
In the above equation, A and B are feature vectors, and θ is the angle between them in the vector space. The cosine similarity between the reference satellite images and the UAV sequence images was calculated as a measure of the similarity between them.
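As an illustration of Equation (1), the following sketch computes the cosine similarity between two image tiles. Treating the flattened grayscale tile, resized to a common size, as the feature vector is a simplifying assumption for illustration; the paper does not specify the exact feature vector used.

```python
import numpy as np
import cv2

def cosine_similarity(img_a, img_b, size=(256, 256)):
    """Equation (1): cosine of the angle between two image feature vectors.
    Here the feature vector is the flattened grayscale tile resized to a
    common size (a simplifying assumption)."""
    a = cv2.resize(cv2.cvtColor(img_a, cv2.COLOR_BGR2GRAY), size).astype(np.float64).ravel()
    b = cv2.resize(cv2.cvtColor(img_b, cv2.COLOR_BGR2GRAY), size).astype(np.float64).ravel()
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Synthetic tiles standing in for a UAV image and a satellite tile.
tile_a = np.random.randint(0, 255, (960, 1280, 3), np.uint8)
tile_b = np.random.randint(0, 255, (960, 1280, 3), np.uint8)
print(cosine_similarity(tile_a, tile_b))
```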
Figure 3. Schematic for calculating the approximate location of a UAV.

2.2. Accurate Matching and Accuracy Improvement of UAV Images

2.2.1. Feature Point Extraction Based on SuperPoint Network

SuperPoint is a fully convolutional neural network framework [27] that uses a lightweight architecture to detect key points in images. The network consists of a shared encoder and two decoders, the keypoint decoder and the descriptor decoder, as shown in Figure 4. When processing a full-size image, the algorithm first passes the input through the shared feature encoder, which reduces the spatial dimensions of the image and produces the image features.
Shared feature encoder. This part is a lightweight, VGG-style fully convolutional network that encodes the image. It consists of convolutional layers, max-pooling layers, and nonlinear activation layers. There are eight 3 × 3 convolutional layers with 64, 64, 64, 64, 128, 128, 128, and 128 kernels, respectively, and a stride of 1. Each convolutional layer is followed by a ReLU activation, and three max-pooling layers reduce the spatial size of the image to one eighth of the input.
The feature decoder. This part is divided into the keypoint decoder and the descriptor decoder. The keypoint decoder detects interest points in the downscaled feature map and is composed of a point-wise convolutional layer, a Softmax activation, and a tensor reshaping function. It determines whether a location is a feature point by computing a probability value for each pixel. Each cell of the encoder output corresponds to an 8 × 8 region of the original image and has 64 channels; an additional "no keypoint" (dustbin) channel brings the total to 65 channels. After the Softmax, the dustbin channel is removed, so the tensor of size W/8 × H/8 × 65 becomes W/8 × H/8 × 64, which is then reshaped from W/8 × H/8 to W × H with a single channel. The extracted feature points can then be read from the output heatmap.
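The channel-to-pixel decoding described above can be illustrated with a short NumPy sketch. It assumes the keypoint decoder has already produced a (65, H/8, W/8) score tensor; the confidence threshold is an assumed value, and the random input merely stands in for a real decoder output.

```python
import numpy as np

def decode_keypoint_heatmap(scores, conf_thresh=0.015):
    """Turn a SuperPoint-style (65, H/8, W/8) score tensor into a full-resolution
    keypoint heatmap: softmax over the 65 channels, drop the 'no keypoint'
    dustbin channel, and unfold the remaining 64 channels into 8 x 8 pixel cells."""
    c, hc, wc = scores.shape                      # c == 65
    probs = np.exp(scores - scores.max(axis=0))
    probs /= probs.sum(axis=0, keepdims=True)     # softmax over channels
    probs = probs[:-1]                            # remove dustbin -> (64, H/8, W/8)
    heat = probs.reshape(8, 8, hc, wc).transpose(2, 0, 3, 1).reshape(hc * 8, wc * 8)
    ys, xs = np.where(heat > conf_thresh)         # candidate keypoints (NMS omitted)
    return heat, np.stack([xs, ys], axis=1)

# Random scores standing in for a real decoder output on a 1280 x 960 image.
heatmap, keypoints = decode_keypoint_heatmap(np.random.randn(65, 120, 160))
```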
The descriptor decoder. This part consists of a convolutional layer, an interpolation function, and L2 normalization. Its role is to generate interest point descriptors for the downscaled feature map. It first learns a semi-dense descriptor and then obtains unit-length descriptors through bi-cubic interpolation and L2 normalization. The loss function is then constructed from the corresponding feature points and their descriptors. The final loss is the sum of the losses from the feature point detector and the descriptor, as shown in Equation (2):
$$\mathcal{L}(X, X', D, D'; Y, Y', S) = \mathcal{L}_p(X, Y) + \mathcal{L}_p(X', Y') + \lambda \mathcal{L}_d(D, D', S) \tag{2}$$
Here, Lp is the loss associated with the feature point detector, and Ld is the loss associated with the descriptor. X and X′ are the keypoint outputs for an image and its warped copy, Y and Y′ are the corresponding keypoint labels, D and D′ are the descriptors, S is the point correspondence between the two images, and λ is a weighting parameter. Lp(X, Y) and Lp(X′, Y′) are the feature point losses for the image before and after warping, respectively, and Ld(D, D′, S) is the loss of the corresponding feature point descriptors.

2.2.2. Feature Point Matching Based on SuperGlue Network

In this study, the SuperGlue network was used to accomplish feature matching between UAV sequence images and satellite images. The SuperGlue network comprises two main modules: the attention graph neural network (GNN) and optimal matching. In image matching, learning feature matching involves finding a partial assignment between two sets of local features. This assignment problem is transformed into an optimal transportation problem through the solution of a linear assignment problem. The cost function for this optimization is predicted using a GNN. Inspired by the Transformer, it takes advantage of attention within and between images to utilize the spatial relationships of key points and their visual appearance. Consequently, the allocation structure of prediction is strengthened, allowing for effective handling of occluded and duplicate key points. The processing flowchart for the entire algorithm is generally divided into two frameworks: the attention GNN and optimal matching layer (Figure 5). The attention GNN is further segmented into the key point encoder and image attention mechanism. The key point encoder generates feature vectors, while the image attention mechanism produces the feature-matching vector. The optimal matching layer is responsible for obtaining the optimal matching score matrix and outputting the matching results.
The two modules are briefly described below. Suppose that the numbers of feature points in images A and B are M and N, respectively. The feature point positions and description vectors in image A are denoted as {p_i^A, i = 1, 2, …, M} and {d_i^A, i = 1, 2, …, M}, and the feature point positions and description vectors in image B are denoted as {p_i^B, i = 1, 2, …, N} and {d_i^B, i = 1, 2, …, N}.
In the attention graph neural network, the position of each feature point is first lifted to a high-dimensional vector by a keypoint encoder composed of a multilayer perceptron (MLP). This high-dimensional vector is added to the feature descriptor vector to obtain the initial representation of each feature point, that is:
$$^{(0)}x_i = d_i + \mathrm{MLP}(p_i)$$
Then, a multiplex graph is constructed. Its vertices are all the feature points in the two images, and its edges are of two types: intra-image edges and cross-image edges. Intra-image edges connect feature points within a single image, while cross-image edges connect feature points across the two images. After the graph is constructed, the features of all vertices are aggregated and updated using a message-passing mechanism. The update is given by:
$$^{(l+1)}x_i^A = {}^{(l)}x_i^A + \mathrm{MLP}\!\left(\left[\,^{(l)}x_i^A \,\|\, m_{\mathcal{E}\rightarrow i}\,\right]\right)$$
where [ · ∥ · ] denotes concatenation, and m_{E→i} is the message passed to node i from all other nodes in the graph through the attention mechanism. After updating by information aggregation, the description vector of each feature point is obtained as:
$$f_i^A = W \cdot {}^{(L)}x_i^A + b, \quad \forall i \in A$$
Thus, each feature vector aggregates the spatial and descriptive information of all feature points in its image and all feature points in the other image.
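The following simplified NumPy sketch illustrates the keypoint encoding, attention-based message aggregation, and final projection described above. It uses a single self-attention step with random-weight stand-ins for the trained SuperGlue layers, so it shows only the data flow, not the trained network, and all weight and dimension values are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 256                                   # descriptor dimension (assumed)

def mlp(x, w1, w2):
    return np.maximum(x @ w1, 0) @ w2     # two-layer MLP with ReLU

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

# Random-weight stand-ins for the trained keypoint encoder and update MLP.
w_enc1, w_enc2 = rng.normal(size=(2, D)), rng.normal(size=(D, D))
w_upd1, w_upd2 = rng.normal(size=(2 * D, D)), rng.normal(size=(D, D))
W, b = rng.normal(size=(D, D)), rng.normal(size=D)

def initial_representation(desc, kpts):
    """x_i^(0) = d_i + MLP(p_i): positional encoding added to descriptors."""
    return desc + mlp(kpts, w_enc1, w_enc2)

def attention_update(x):
    """Aggregate messages m via dot-product self-attention, concatenate with
    the current state, update through an MLP, then apply the final projection."""
    attn = softmax(x @ x.T / np.sqrt(D), axis=1)      # attention weights
    m = attn @ x                                      # aggregated messages
    x = x + mlp(np.concatenate([x, m], axis=1), w_upd1, w_upd2)
    return x @ W + b                                  # projected feature f_i

# 100 keypoints with 2-D positions and D-dimensional descriptors.
kpts, desc = rng.normal(size=(100, 2)), rng.normal(size=(100, D))
f = attention_update(initial_representation(desc, kpts))
```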
According to the updated features, the optimal matching layer calculates an M × N similarity matrix S, in which each element (i, j) represents the similarity between feature f_i^A in image A and feature f_j^B in image B, namely:
$$S_{i,j} = \left\langle f_i^A, f_j^B \right\rangle, \quad (i, j) \in A \times B$$
Owing to occlusion or differing fields of view, a feature point in one image may have no matching feature point in the other image. To handle this, the matrix S is expanded into an (M + 1) × (N + 1) augmented matrix S̄, in which the new row and new column describe the case where feature points remain unmatched, namely:
$$\bar{S}_{i,N+1} = \bar{S}_{M+1,j} = \bar{S}_{M+1,N+1} = z \in \mathbb{R}$$
The feature-point-matching problem is then transformed into an optimal transportation problem, which can be solved with the Sinkhorn algorithm. Since the Sinkhorn algorithm is differentiable, it can be implemented as a network layer.
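A log-domain Sinkhorn iteration with a dustbin row and column can be sketched as follows. The dustbin score z is treated as a fixed constant here, whereas in SuperGlue it is a learned parameter, and the random score matrix merely stands in for the real similarity matrix.

```python
import numpy as np

def logsumexp(a, axis):
    m = a.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=axis, keepdims=True))).squeeze(axis)

def sinkhorn_with_dustbin(S, z=1.0, iters=50):
    """Solve the optimal-transport matching on an augmented score matrix.
    S is the M x N similarity matrix; a dustbin row/column filled with the
    score z absorbs unmatched keypoints. Returns an (M+1) x (N+1) soft
    assignment matrix."""
    M, N = S.shape
    S_bar = np.full((M + 1, N + 1), z, dtype=np.float64)
    S_bar[:M, :N] = S

    log_mu = np.log(np.concatenate([np.ones(M), [N]]) / (M + N))   # row marginals
    log_nu = np.log(np.concatenate([np.ones(N), [M]]) / (M + N))   # column marginals
    u, v = np.zeros(M + 1), np.zeros(N + 1)
    for _ in range(iters):                       # Sinkhorn iterations in log space
        u = log_mu - logsumexp(S_bar + v[None, :], axis=1)
        v = log_nu - logsumexp(S_bar + u[:, None], axis=0)
    return np.exp(S_bar + u[:, None] + v[None, :])

# Greedy read-out: best column per row of the assignment matrix (dustbins dropped).
P = sinkhorn_with_dustbin(np.random.rand(50, 60))
matches = P[:-1, :-1].argmax(axis=1)
```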
The affine transformation matrix was constructed from the correspondence between matched feature points of the two images. This matrix is a general linear transformation that can represent rotation, scaling, and translation of an image. The affine matrix can be expressed as follows:
$$\begin{bmatrix} x_2 \\ y_2 \\ 1 \end{bmatrix} = \begin{bmatrix} h_{11} & h_{12} & t_x \\ h_{21} & h_{22} & t_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ y_1 \\ 1 \end{bmatrix}$$
In this equation, h11, h12, h21, and h22 are the inter-image scaling and rotation variables, and tx and ty are the translation variables. The affine transformation matrix can be computed from the coordinates of matched feature points (x1, y1) and (x2, y2) in the two images.
Using the affine transformation matrix and mapping relationship between feature points of UAV and satellite images, the geographic coordinates of the feature points in the satellite image can be mapped to the pixel coordinates of the UAV image. This allows for the geographic coordinates of the corresponding feature points in the UAV image to be determined, thereby enhancing the accuracy of UAV ground localization.
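The following sketch shows how matched points could be used to estimate the affine transform and map a UAV pixel to ground coordinates. It assumes a GDAL-style six-parameter geotransform for the satellite reference and uses synthetic correspondences; the function and parameter names are illustrative, not taken from the paper.

```python
import numpy as np
import cv2

def uav_pixel_to_geo(uav_pts, sat_pts, query_px, geotransform):
    """Map a UAV pixel coordinate to geographic coordinates.

    uav_pts, sat_pts : matched feature points (N x 2) in UAV and satellite
                       pixel coordinates.
    geotransform     : GDAL-style tuple (gt0..gt5) giving the satellite
                       image georeferencing (an assumed input).
    """
    # Affine transform UAV pixels -> satellite pixels, RANSAC-filtered.
    A, _ = cv2.estimateAffine2D(np.float32(uav_pts), np.float32(sat_pts),
                                method=cv2.RANSAC, ransacReprojThreshold=3.0)
    u, v = query_px
    col, row = A @ np.array([u, v, 1.0])          # satellite pixel position

    gt0, gt1, gt2, gt3, gt4, gt5 = geotransform
    x = gt0 + col * gt1 + row * gt2               # ground X (e.g. longitude)
    y = gt3 + col * gt4 + row * gt5               # ground Y (e.g. latitude)
    return x, y

# Hypothetical matched points and geotransform for illustration.
uav = np.array([[100, 200], [400, 220], [380, 600], [120, 580]], np.float32)
sat = uav + np.float32([910, 1820])               # synthetic correspondences
gt = (112.95, 1e-5, 0.0, 34.51, 0.0, -1e-5)       # assumed geotransform values
print(uav_pixel_to_geo(uav, sat, (256, 256), gt))
```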

2.3. Error Elimination and Optimization

During the feature matching of images from different sources, the algorithm combining SuperPoint and SuperGlue can effectively achieve image feature matching. However, differences in perspective, scale, and affine deformation exist between UAV and satellite images. Consequently, there is often a phenomenon of individual mismatched feature points in feature matching results, which affects the subsequent use of feature points as control points to correct UAV images. This study utilized the RANSAC algorithm to eliminate mismatched feature points.
The RANSAC algorithm was employed for feature point purification to improve the estimation accuracy of the homography matrix between images, thereby improving the accuracy of image registration. It can accurately estimate model parameters even in the presence of a large number of outlier feature-matching point pairs. The inter-image homography, established from the corresponding feature point relationship between the two images, was used as the estimation model. A homography matrix encodes the rotation, translation, and plane parameters relating the two views [28]. The equation for the homography matrix is as follows:
$$s\begin{bmatrix} u_2 \\ v_2 \\ 1 \end{bmatrix} = \begin{bmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{bmatrix} \begin{bmatrix} u_1 \\ v_1 \\ 1 \end{bmatrix}$$
where u1, v1 and u2, v2 represent the pixel coordinates of corresponding feature points in the two images, and s is a non-zero scale factor. By expanding and simplifying the above matrix equation, the following expressions are obtained:
$$u_2 = \frac{h_{11} u_1 + h_{12} v_1 + h_{13}}{h_{31} u_1 + h_{32} v_1 + h_{33}}, \qquad v_2 = \frac{h_{21} u_1 + h_{22} v_1 + h_{23}}{h_{31} u_1 + h_{32} v_1 + h_{33}}$$
The feature points extracted by the SuperPoint algorithm were substituted into the model. The RANSAC algorithm randomly selected a minimal sample of point pairs from the matched feature points and computed an initial homography model. The remaining feature point pairs were then evaluated against this model, and the procedure was iterated to verify the reliability of the feature points. Specifically, the reprojection error of every matched pair was compared with a pre-set threshold, dividing the feature points into inliers and outliers: points with errors below the threshold were regarded as inliers. While searching for the optimal homography matrix, the inliers satisfying the geometric constraint were retained as correctly matched feature points, whereas the outliers that did not meet the condition were rejected as mismatched points. This approach effectively extracted reliable feature points and eliminated mismatched ones, enhancing the accuracy and reliability of feature point matching.
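In practice, this RANSAC purification step can be performed with OpenCV's homography estimator, which returns both the model and an inlier mask. The sketch below is a minimal example; the 3-pixel reprojection threshold and the synthetic correspondences are assumptions for illustration.

```python
import numpy as np
import cv2

def ransac_filter(uav_pts, sat_pts, reproj_thresh=3.0):
    """Estimate the inter-image homography with RANSAC and keep only the
    inlier (correctly matched) point pairs.

    uav_pts, sat_pts : N x 2 arrays of matched pixel coordinates.
    reproj_thresh    : reprojection error threshold in pixels (assumed value).
    """
    uav_pts, sat_pts = np.asarray(uav_pts), np.asarray(sat_pts)
    H, mask = cv2.findHomography(np.float32(uav_pts), np.float32(sat_pts),
                                 cv2.RANSAC, reproj_thresh)
    inliers = mask.ravel().astype(bool)
    return H, uav_pts[inliers], sat_pts[inliers]

# Synthetic correspondences standing in for SuperGlue matches.
uav = np.array([[10, 10], [200, 15], [190, 240], [12, 230], [100, 120]], np.float32)
sat = uav * 0.25 + np.float32([500, 800])
H, uav_in, sat_in = ransac_filter(uav, sat)
```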

3. Experimental Results and Analysis

3.1. Dataset and Experimental Environment

3.1.1. Satellite Image Dataset

The satellite images employed in this paper were procured from commercial satellites. Figure 6 depicts the satellite images of the experimental area, with a magnified view of a local region on the right. The images span a width of 12.79 km and a height of 12.51 km, covering an area of 159.68 square kilometers within the longitude and latitude range of 112.95–113.09° E and 34.40–34.51° N. The ground resolution is 0.60 m/pixel, and the image dimensions are 2480 × 2823 pixels. The high ground resolution allowed for detailed information retrieval. In the actual experiments, these images were processed according to the specific model and efficiency requirements: to streamline data processing and enhance experimental efficiency, the input images were standardized to 1280 × 960 pixels through a normalization process.
In order to accurately simulate real-world conditions in the actual production environment, we refrained from adjusting or manipulating any image attributes, such as image quality, color scheme, brightness, white balance, contrast, and shadow, during the pre-processing of satellite images. This approach ensures the full preservation of inherent characteristics in the images. Preserving these original and crucial features is of paramount importance for subsequent tasks involving image analysis and recognition.

3.1.2. Aerial Image Dataset

The UAV images used in this study were mainly taken by a DJI M300 UAV, a UAV product of DJI Innovation Technology Co., LTD., Shenzhen, China. The parameters of the equipped camera are listed in Table 1.
These images were acquired in select regions of Dengfeng and Zhengzhou, China, encompassing diverse geographical features such as cities, settlements, lakes, and vegetation. These images not only exhibited excellent representativeness but also served as a foundation for comparing and analyzing different regions. The Dengfeng area in China contributed a total of 607 UAV images, captured on 22 June 2022, with each pixel corresponding to a ground resolution of 9.91 cm (Figure 7).
Additionally, the Zhengzhou area contributed 512 UAV images, captured on 26 September 2021, with each pixel corresponding to a ground resolution of 2.94 cm (Figure 8).
During the pre-processing of the UAV images, to ensure the smooth running of the experiment, we manually removed only a very small number of images, namely those with large areas of cloud cover or building shadows, without adjusting any attributes of the remaining images. The original information of the images was therefore preserved.

3.1.3. Experimental Environment

The experiment was conducted using a Lenovo Y9000P laptop with an NVIDIA GeForce RTX 3060 6 GB GPU, Intel Core i7-11900H @ 2.50 GHz CPU, and 32 GB DDR4 3200 MHz RAM. The specific configuration parameters are listed in Table 2.

3.2. Image Approximate Location Estimation

Before estimating the approximate location of the images, this study conducted experiments to determine whether the images needed denoising. The cosine similarity method was employed to measure the similarity between the images before and after denoising, as shown in Figure 9. The blue line represents the similarity between the images before denoising, and the gray line represents the similarity after denoising. Analysis of the curves in the figure shows that the denoising process improved the overall calculation results: it reduced the calculation error and facilitated a more accurate determination of the similarity threshold, so the matching results could be screened more precisely. Regarding the ranking of results, the order of the image retrieval estimates remained unchanged before and after denoising. Hence, the image retrieval method proposed in this study is not sensitive to noise; even with noise in the original image, the proposed method can correctly rank the retrieval results.
The experimental comparisons demonstrated that utilizing cosine similarity to assess image similarity offers higher accuracy and greater robustness when handling large-format remote sensing images from diverse sources. Compared with the deep learning methods, this method did not require pre-training and supervision and had higher efficiency and faster matching speed.
In Figure 10, the left images represent the reference satellite images. The 10 images displayed in Figure 10a depict the sequence UAV images intended for matching. The images enclosed in red boxes represent the normally estimated images, and the UAV sequence image strip is indicated by the blue line in Figure 10b.
Upon comparing the images within the red and blue boxes, it was observed that the short strip approximate location estimation method, based on the internal relationship between UAV sequence images, effectively identifies unmatched images. Consequently, it supplements the image matching results based on cosine similarity. Moreover, extracting the sequence UAV image strip enhances the accuracy of the approximate location estimation.

3.3. Feature Extraction and Matching

In practical applications, the resolution difference between UAV and satellite images is often large, posing challenges to image matching. For better cross-scale image matching, this study adopted commercial satellite image data for the experiments (see Figure 11). Both images were sized at 1280 × 960 pixels. The left image had a resolution of 0.60 m/pixel, while the right image had a resolution of 2.39 m/pixel.
The experiment compared a variety of feature-matching methods with rotation and scale invariance. These included traditional algorithms such as SIFT, SURF, and ORB for feature extraction combined with the FLANN fast nearest neighbor algorithm, the widely used deep learning CMM-Net model, and the proposed combined SuperPoint and SuperGlue network. All network models used in this paper retain their original default weights and hyperparameters and were not specifically trained for the setting of this experiment, which makes the comparison more robust and reliable.
The results are shown in Figure 11. The comparative experiments revealed that all of these methods can perform cross-scale image matching. As shown in Table 3, traditional methods such as SIFT, SURF, and ORB extracted more feature points than the deep learning CMM-Net and the proposed algorithm. However, the quality of their feature matching was poorer: the number of correct matches was insufficient, and incorrect matches were present. The primary reason is that the shallow feature descriptors of the traditional algorithms, based on neighborhood gradients, cannot adapt to the nonlinear distortion between images from different sources and cannot meet the requirements of deep feature matching. This indicates that traditional matching algorithms cannot reliably achieve feature matching between UAV and satellite images.
Moreover, both the deep learning-based CMM-Net and the proposed algorithm extracted sufficient and uniformly distributed feature points in the two images, enabling efficient and accurate cross-scale image matching. CMM-Net could extract more feature points than the proposed algorithm. However, in terms of the number of successfully matched feature points, the proposed algorithm could extract more abstract feature points and construct deep-level descriptors under different lighting conditions and extreme perspective differences at varying scales. Even with a resolution difference of approximately 4 times, key points with strong robustness could still be obtained. Therefore, the proposed algorithm, combining SuperPoint and SuperGlue for feature matching of UAV images and satellite images, can provide higher accuracy in matching feature point information.

3.4. Number of Control Points before and after RANSAC Rejection

In the aforementioned experiment, the UAV block candidate images and satellite block query images were used as experimental data. By comparing and analyzing the feature matching results, the feature extraction results shown in Figure 12 were obtained. Numerous feature points were observed at building corners and vegetation coverage areas. Notably, the texture complexity was higher in vegetation areas. Images from diverse sources often displayed inconsistent texture features in vegetation areas due to temporal variations, posing challenges in achieving high-precision matching of feature areas at the pixel level.
To eliminate the feature-matching error points with lower accuracy, this study used the RANSAC algorithm to process the matching results. The results of removing mismatched points are shown in Figure 12. This figure shows that the feature matching accuracy in the vegetation area was low, leading to the elimination of a significant number of mismatched points. In contrast, control points were concentrated in areas with clearly defined texture information, such as building corners. By utilizing this algorithm to eliminate coarse matching feature points, high-precision feature points with robustness were extracted.
Through experimental comparisons, the matched control points were optimized for error using the RANSAC algorithm, which eliminated most of the mismatched points without evident features and retained the matched points with evident features. Consequently, the RANSAC algorithm exhibited robustness in processing this set of experimental data, ensuring accurate feature matching.

3.5. UAV Image Localization Accuracy Results

To validate the feasibility and accuracy of the proposed method, 12 feature points within the experimental area were chosen as checkpoints and used as the basis for accuracy evaluation. The coordinates of these checkpoints were compared with the localization coordinates obtained by the traditional methods and by the UAV images whose accuracy had been enhanced with the proposed method. The corresponding results are listed in Table 4.
Comparative analysis revealed that using high-resolution satellite images can improve the ground localization accuracy of UAVs, and the resolution of the satellite reference images used in this study is sufficient to ensure the quality and accuracy of the results. Furthermore, compared with the traditional methods, the experimental results show that the proposed method achieves better accuracy and stability for the improved UAV sequence image positioning: the positioning error of the proposed method is within 0.356 m, and the overall mean error is within 2 pixels.
In summary, the proposed method exhibited superior accuracy and stability, significantly enhancing the ground localization precision of UAVs and yielding the expected experimental results.
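For reference, the sketch below shows one way the per-axis mean errors and the overall mean error reported in Table 4 could be computed from checkpoint coordinates. Defining the overall mean error as the average planimetric distance is an assumption; the paper does not state the exact formula, and the input arrays here are hypothetical.

```python
import numpy as np

def localization_errors(estimated_xy, reference_xy):
    """Mean absolute errors in X and Y and the mean planimetric error over
    the checkpoints (inputs: N x 2 ground coordinates in metres)."""
    diff = np.asarray(estimated_xy) - np.asarray(reference_xy)
    mean_x = np.abs(diff[:, 0]).mean()
    mean_y = np.abs(diff[:, 1]).mean()
    mean_plan = np.linalg.norm(diff, axis=1).mean()
    return mean_x, mean_y, mean_plan

# Tiny hypothetical example with two checkpoints.
est = np.array([[0.2, -0.1], [0.5, 0.3]])
ref = np.zeros((2, 2))
print(localization_errors(est, ref))
```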

4. Conclusions

This study aimed to improve the localization accuracy of UAV images under GNSS denial conditions. By relying on satellite reference images, the localization accuracy of UAV images was improved, and the proposed method was compared with traditional algorithms. The important conclusions of this study are as follows:
(1) This study proposed a method for approximate location retrieval of UAV sequence images based on the strip relationship and cosine similarity of UAV sequence images. Through this method, it was possible to achieve approximate location retrieval of UAV sequence images on satellite reference images. This method can quickly retrieve UAV and satellite images across scales and exhibits good robustness in different experimental areas.
(2) This study employed the SuperPoint feature extraction method based on deep learning and the feature-point-matching network, SuperGlue, which enables high-precision feature matching between UAV and satellite images. The RANSAC algorithm was used to reject mismatched feature points and select the best feature points to obtain robust high-precision control points. Experimental verification demonstrated that the proposed method significantly enhances the ground localization accuracy of UAV images. The overall scheme exhibited a mean error of less than 2 pixels, outperforming traditional feature-matching algorithms such as SIFT, SURF, and ORB.
(3) When using satellite reference images with a resolution of 0.30 m/pixel, the mean error of the UAV ground localization method was 0.356 m, further affirming the effectiveness and feasibility of the proposed method.
In summary, this study introduced a novel method for UAV ground localization, addressing the challenge of UAV image localization under GNSS denial conditions. The method yielded excellent experimental results, demonstrating high accuracy and stability, aligning perfectly with the study’s objectives.

Author Contributions

All authors contributed to the manuscript and discussed the results. Y.Y. put forward the idea of this paper. H.G. and X.H. processed and analyzed the sentinel data and contributed to the manuscript of the paper. L.S. made criticism and revised the manuscript. L.L. (Li Li) and L.L. (Lei Li) collated the experimental data, and L.Z. analyzed the deformations of the manuscript and criticized it. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (Grant No. 42071340); Program of Song Shan Laboratory (included in the management of Major Science and Technology Program of Henan Province, Grant No. 221100211000-01).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are contained within the article.

Acknowledgments

Thanks to the anonymous reviewers for their thorough review and their instructive comments and suggestions.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Candiago, S.; Remondino, F.; De Giglio, M.; Dubbini, M.; Gattelli, M. Evaluating multispectral images and vegetation indices for precision farming applications from UAV images. Remote Sens. 2015, 7, 4026–4047. [Google Scholar] [CrossRef]
  2. Akhloufi, M.A.; Castro, N.A.; Couturier, A. UAVs for wildland fires. In Proceedings of the Autonomous Systems: Sensors, Vehicles, Security, and the Internet of Everything, Orlando, FL, USA, 16–18 April 2018; Volume 10643. [Google Scholar] [CrossRef]
  3. Akhloufi, M.A.; Castro, N.A.; Couturier, A. Unmanned aerial systems for wildland and forest fires: Sensing, perception, cooperation and assistance. Drones 2021, 5, 15. [Google Scholar] [CrossRef]
  4. Mokrova, M.I. Studying the effect of difficult fire conditions on the quality of observation and safety of UAV flight. Izv. YuFU. Tekhnicheskie Nauk. 2021, 1, 112–124. [Google Scholar] [CrossRef]
  5. Jordan, S.; Moore, J.; Hovet, S.; Box, J.; Perry, J.; Kirsche, K.; Lewis, D.; Tse, Z.T.H. State-of-the-art technologies for UAV inspections. IET Radar Sonar Navig. 2018, 12, 151–164. [Google Scholar] [CrossRef]
  6. Scherer, J.; Yahyanejad, S.; Hayat, S.; Yanmaz, E.; Andre, T.; Khan, A.; Vukadinovic, V.; Bettstetter, C.; Hellwagner, H.; Rinner, B. An autonomous multi-UAV system for search and rescue. In Proceedings of the First Workshop on Micro Aerial Vehicle Networks, Systems, and Applications for Civilian Use, DroNet'15, ACM, New York, NY, USA, 18 May 2015; pp. 33–38. [Google Scholar] [CrossRef]
  7. Mittal, M.; Mohan, R.; Burgard, W.; Valada, A. Vision-based autonomous UAV navigation and landing for urban search and rescue. In Proceedings of the International Symposium on Robotics Research (ISRR), Hanoi, Vietnam, 6–10 October 2019. [Google Scholar] [CrossRef]
  8. Zoev, I.V.; Markov, N.G.; Ryzhova, S.E. Intelligent system of computer vision of unmanned aerial vehicles for monitoring technological facilities of oil and gas companies. Izv. Tomsk. Politekh. Universiteta. Inzhiniring Georesursov 2019, 330, 34–49. [Google Scholar]
  9. De Melo, C.F.E.; Silva, T.D.E.; Boeira, F.; Stocchero, J.M.; Vinel, A.; Asplund, M.; De Freitas, E.P. UAVouch: A secure identity and location validation scheme for UAV-networks. IEEE Access 2021, 9, 82930–82946. [Google Scholar] [CrossRef]
  10. Peshekhonov, V.G. High-precision navigation independently of global navigation satellite systems data. Gyroscopy Navig. 2022, 13, 1–6. [Google Scholar] [CrossRef]
  11. Sabatini, R.; Moore, T.; Hill, C.; Ramasamy, S. Avionics-based GNSS integrity augmentation performance in a jamming environment. In Proceedings of the AIAC16: 16th Australian International Aerospace Congress, Engineers Australia, Melbourne, Australia, 23–24 February 2015; pp. 469–479. [Google Scholar]
  12. Groves, P.D.; Jiang, Z.; Rudi, M.; Strode, P. A portfolio approach to NLOS and multipath mitigation in dense urban areas. In Proceedings of the 26th International Technical Meeting of The Satellite Division of the Institute of Navigation, The Institute of Navigation, Nashville, TN, USA, 16–20 September 2013. [Google Scholar]
  13. Conte, G.; Doherty, P. An integrated UAV navigation system based on aerial image matching. In Proceedings of the 2008 IEEE Aerospace Conference, Big Sky, MT, USA, 1–8 March 2008; pp. 1–10. [Google Scholar]
  14. Viswanathan, A.; Pires, B.R.; Huber, D. Vision-based robot localization across seasons and in remote locations. In Proceedings of the International Conference on Robotics and Automation, Stockholm, Sweden, 16–21 May 2016; pp. 4815–4821. [Google Scholar]
  15. Schmidt, G.T. GPS based navigation systems in difficult environments. Gyroscopy Navig. 2019, 10, 41–53. [Google Scholar] [CrossRef]
  16. Schleiss, M. Translating aerial images into street-map representations for visual self-localization of UAVs. ISPRS Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2019, 42, 575–580. [Google Scholar] [CrossRef]
  17. Mirza, M.; Osindero, S. Conditional generative adversarial nets. arXiv 2014, arXiv:1411.1784. [Google Scholar]
  18. Yol, A.; Delabarre, B.; Dame, A.; Dartois, J.-E.; Marchand, E. Vision-based absolute localization for unmanned aerial vehicles. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, Chicago, IL, USA, 14–18 September 2014; pp. 3429–3434. [Google Scholar] [CrossRef]
  19. Dame, A.; Marchand, E. Second-order optimization of mutual information for real-time image registration. IEEE Trans. Image Process. 2012, 21, 4190–4203. [Google Scholar] [CrossRef] [PubMed]
  20. Shan, M.; Wang, F.; Lin, F.; Gao, Z.; Tang, Y.Z.; Chen, B.M. Google map aided visual navigation for UAVs in GPS-denied environment. In Proceedings of the IEEE International Conference on Robotics and Biomimetics (ROBIO), Zhuhai, China, 6–9 December 2015; pp. 114–119. [Google Scholar] [CrossRef]
  21. Dalal, N.; Triggs, B. Histograms of oriented gradients for human detection. In Proceedings of the IEEE Computer Soc. Conference Computer Vision and Pattern Recognition (CVPR), San Diego, CA, USA, 20–26 June 2005; pp. 886–893. [Google Scholar] [CrossRef]
  22. Silva Filho, P.; Shiguemori, E.H.; Saotome, O. UAV Visual Autolocalization Based on Automatic Landmark Recognition. ISPRS Ann. Photogramm. Remote Sens. Spat. Inf. Sci. 2017, IV–2/W3, 89–94. [Google Scholar] [CrossRef]
  23. Goforth, H.; Lucey, S. GPS-denied UAV localization using pre-existing satellite imagery. In Proceedings of the International Conference on Robotics and Automation (ICRA), IEEE, Montreal, QC, Canada, 20–24 May 2019; pp. 2974–2980. [Google Scholar] [CrossRef]
  24. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition, Conference ICLR. arXiv 2015, arXiv:1409.1556. [Google Scholar]
  25. Saranya, K.C.; Naidu, V.P.S.; Singhal, V.; Tanuja, B.M. Application of vision-based techniques for UAV position estimation. In Proceedings of the International Conference on Research Advances in Integrated Navigation Systems (RAINS), Bangalore, India, 6–7 May 2016; pp. 1–5. [Google Scholar] [CrossRef]
  26. Wang, X.; Kealy, A.; Li, W.; Jelfs, B.; Gilliam, C.; May, S.L.; Moran, B. Toward autonomous UAV localization via aerial image registration. Electronics 2021, 10, 435. [Google Scholar] [CrossRef]
  27. DeTone, D.; Malisiewicz, T.; Rabinovich, A. SuperPoint: Self-Supervised Interest Point Detection and Description. arXiv 2018, arXiv:1712.07629. [Google Scholar]
  28. Zhao, X.; Li, H.; Wang, P.; Jing, L. An Image Registration Method for Multisource High-Resolution Remote Sensing Images for Earthquake Disaster Assessment. Sensors 2020, 20, 2286. [Google Scholar] [CrossRef]
Figure 1. Overall design process.
Figure 2. Schematic of sequence image strip extraction.
Figure 4. SuperPoint neural network model. Here W is the width, H is the height, X is the input image tensor, and D is the descriptor.
Figure 5. SuperGlue image-matching network.
Figure 6. Image data of the experimental area satellite.
Figure 7. Image data of Dengfeng region in China (partial).
Figure 8. Image data of a certain region in Zhengzhou, China (partial).
Figure 9. Cosine similarity matching results before and after image denoising.
Figure 10. Matching results of approximate image locations.
Figure 11. Matching results between UAV images and commercial satellite images.
Figure 12. Comparison of the results of the RANSAC rejection of control point matching errors.
Table 1. Parameters of the camera mounted on the DJI M300.

Parameter Type            Parameter Information
Size                      198 × 166 × 129 mm
Weight                    about 800 g
Supported models          Matrice 300 RTK
Absolute accuracy         plane accuracy: 3 cm; elevation accuracy: 5 cm
Minimum photo interval    0.7 s
Shutter speed             mechanical shutter: 1/2000*–1 s; electronic shutter: 1/8000–1 s*
Aperture                  range: f/2.8–f/16; not greater than f/5.6
ISO scope                 photo: 100–25,600; video: 100–25,600
Table 2. Experimental environment.

Category                  Configuration
Model                     Lenovo Y9000P
Graphics card             NVIDIA GeForce RTX 3060 6 GB
CPU                       Intel Core i7-11900H @ 2.50 GHz
Memory                    32 GB DDR4 3200 MHz
Operating system          Ubuntu 22.04
Language environment      Python 3.6 and Python 3.9
Table 3. Matching results of feature points between UAV images and commercial satellite images.

Method                    Image          Number of Feature Points    Number of Successful Matching Points
SIFT                      left image     14,824                      9
                          right image    9,290
SURF                      left image     15,162                      14
                          right image    8,338
ORB                       left image     10,000                      176
                          right image    9,968
CMM-Net                   left image     5,787                       75
                          right image    4,388
SuperPoint + SuperGlue    left image     3,558                       143
                          right image    2,265
Table 4. Comparison of UAV ground localization accuracy errors (m).

UAV Positioning Improvement Method    Mean Error in X Direction    Mean Error in Y Direction    Mean Error
SIFT                                  0.344                        0.472                        0.560
SURF                                  0.265                        0.356                        0.483
ORB                                   0.413                        0.504                        0.786
Proposed method                       0.190                        0.286                        0.356
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
