Article

Detection of Unauthorized Unmanned Aerial Vehicles Using YOLOv5 and Transfer Learning

1 Department of Information Technology, University of Tabuk, Tabuk 71491, Saudi Arabia
2 Sensor Networks and Cellular Systems Research Center, University of Tabuk, Tabuk 71491, Saudi Arabia
3 Dahaa Research Group, Department of Computer Science, Shaqra University, Shaqra 11961, Saudi Arabia
4 Department of Cybersecurity, International Information Technology University, Almaty 050000, Kazakhstan
5 Computer Science Department, Faculty of Computer and Information Sciences, Ain Shams University, Cairo 11566, Egypt
6 Department of Computer Science, College of Computers and Information Technology, Taif University, Taif 21944, Saudi Arabia
* Authors to whom correspondence should be addressed.
Submission received: 26 June 2022 / Revised: 19 August 2022 / Accepted: 22 August 2022 / Published: 26 August 2022
(This article belongs to the Section Electrical and Autonomous Vehicles)

Abstract

Drones/unmanned aerial vehicles (UAVs) have recently grown in popularity due to their inexpensive cost and widespread commercial use. The increased use of drones raises the possibility that they may be employed in illicit activities such as drug smuggling and terrorism. Thus, drone monitoring and automated detection are critical for protecting restricted areas or special zones from illicit drone operations. One of the most challenging difficulties in drone detection in surveillance videos is the visual similarity of drones to varied backdrops. This paper introduces an automated image-based drone-detection system that uses an enhanced deep-learning-based object-detection algorithm known as you only look once (YOLOv5) to defend restricted territories or special zones from unauthorized drone incursions. Transfer learning from a pretrained model is employed to improve performance, because our dataset contains an insufficient number of samples. Furthermore, the model can recognize the detected object in the images and mark the object’s bounding box by joining the results across the regions. The experiments show outstanding results for the loss value, drone location detection, precision, and recall.

1. Introduction

Drones, also known as unmanned aerial vehicles, are aircraft that carry no human pilot on board. Drones may operate with varying degrees of autonomy, either autonomously using onboard computers or under the remote control of a human operator [1,2]. Drones were first used for activities that were too dangerous for people to complete, primarily for military purposes [3,4]. Drones are unquestionably more adaptable and easier to control remotely than crewed aircraft. Since they can be controlled via satellite links and feature very sophisticated cameras, they provide the operators who pilot them with extensive remote situational awareness.
Drones are known for their low cost, flexibility, and light weight. Thus, they have rapidly spread and are now utilized in various application domains such as industry, urban management, sports, agriculture, peacekeeping, surveillance, and transportation [5,6,7]. Even though drones are employed to replace humans in risky missions, they are also utilized for nefarious purposes. For example, drones can scout targets and collect data from private zones or track moving objects such as humans and vehicles for spying purposes, even remotely. They may also carry weapons for attacking targets or convey explosives to terrorize public places unpredictably. Thus, the surveillance and automated detection of drones are crucial for safeguarding restricted regions or special zones from illegal drone interventions. The existing UAV-detection methods are mainly divided into the following categories: audio-signal-based, radar-based, radio-frequency-based, and image- and video-based detection methods. In our paper, we used image-based detection methods since these are less sensitive to noisy environments [8].

Paper Organization

The remainder of the paper is organized as follows. Section 2 reviews the related work. Section 3 summarizes our research contributions. Section 4 introduces the UAV-detection model. Section 5 discusses the experimental setup and results. Section 6 discusses future work. Section 7 concludes the paper.

2. Related Work

Several methods have been proposed in the past for UAV detection. The previous approaches most closely related to ours are those that detect and differentiate drones from other objects in images captured from videos recorded by static cameras. For example, Saqib et al. [9] proposed a flying-object-detection method using Faster R-CNN and VGG-16. Faster R-CNN is a well-known deep convolutional neural network used for object detection [10], and VGG-16 [11] is a convolutional neural network with 16 layers. The proposed method achieved an acceptable mean average precision (mAP) of 66%. The authors suggested annotating birds as a standalone class to enable the trained model to differentiate birds from drones and to reduce false positives.
Aker et al. [12] utilized a you only look once (YOLOv2) [13] framework to differentiate drones from birds after detecting the object in the recorded video. The authors utilized precision–recall (PR) curves to evaluate their method; both precision and recall reached 90%. Nalamati et al. [14] investigated various approaches to detect small objects utilizing Faster R-CNN and the single-shot detector (SSD) [15] and to differentiate drones from birds using Inception v2 [16] and ResNet-101 [17]. The authors explored the validity of these architectures using a dataset of 8771 frames captured from 11 videos. Two sets of experiments were conducted: one with the drone near the camera, and one with the drone far away from the camera. In the first set of experiments, the results for all the architectures were high. In the second set, the combination of Faster R-CNN and ResNet-101 performed better in terms of recall and precision than the other architectures. However, the authors omitted the detection time as a performance metric in their experiments and planned to study it in future work.
De la Iglesia et al. [18] utilized a well-known object-detection method (i.e., RetinaNet [19]) together with the feature pyramid network (FPN) [20] to predict whether the object in the image was a drone or a bird. The FPN was adopted to detect objects on two different levels: the lower pyramidal levels aimed to detect small objects, while the upper pyramidal levels concentrated on large objects. For recognition, the authors utilized ResNet-50-D [21] and validated this architecture using a dataset that contains images of drones and birds. Magoulianitis et al. [22] used a deep CNN with skip connection and network in network (DCSCN) [23] to improve performance by super-resolving the images as a pre-processing step. Subsequently, the authors utilized Faster R-CNN to detect the objects. The proposed method achieved acceptable recall results.
Other studies [24,25] divided drone detection into two phases: easy-to-detect objects (i.e., objects that are far away from the camera or objects in low-contrast images) are found in the first phase, and a classifier with high precision is introduced in the second phase to decrease the number of false positives. Schumann et al. [25] utilized both median-background subtraction and a deep-neural-network-based region proposal network (RPN) to develop a robust flying-object (i.e., drones, birds, and clutter) detector. The authors evaluated their method using a customized dataset of 10,286 images. The authors also proposed using VGG-conv5 as a classifier and took first place in the 2017 drone-vs.-bird challenge. Craye and Ardjoune [24] used two different architectures: the semantic segmentation network U-net [26] for detection and ResNet-v2 for recognition. The proposed method won first place in the 2019 drone-vs.-bird competition, achieving 71% recall and 76% precision.
Seidaliyeva et al. [27] utilized a background-subtraction technique to detect moving objects and a well-known CNN architecture (i.e., MobileNetV2) to classify the detected objects into three classes: bird, drone, and background. The proposed approach achieved promising results of 70.1% precision, 78.8% recall, and a 74.2% F-score. However, the approach's high performance relies on the presence of a moving background.
Khan et al. [28] introduced an end-to-end air-defense system capable of autonomously detecting and targeting drones. The proposed system consists of three phases. In the first phase, moving objects are detected by a radar that emits microwaves. In the second phase, once a flying object is detected, a camera is triggered and YOLOv3 is used to recognize what type of object has been detected. In the final phase, a laser gun is utilized to lock onto the threat if the certainty of the recognition exceeds 75%. The proposed detection and recognition model achieved a promising average loss of 0.184961.
The current image-based detection models encounter several difficulties. For instance, many of the existing models suffer from slow detection times, and some of them cannot locate small moving objects in the images.

3. Research Contribution

To overcome the challenges indicated in the previous section, a drone-detection approach that utilizes an advanced deep-learning-based object-detection and recognition method (i.e., a convolutional neural network (CNN)-based technique) is proposed. Object-detection techniques have become more accurate due to the renaissance of deep learning, which has been enabled by powerful computing resources. Figure 1 shows a typical deep-learning-based drone-detection framework using images. Typical deep-learning algorithms usually consist of several layers: the input layer, hidden layers, and the output layer. Moreover, this method aims to automatically detect objects from videos recorded with diverse contrasts (including low contrast). YOLOv5 provides better accuracy and a faster detection rate. We combined YOLOv5 with transfer learning, which helps to reuse the knowledge gained from one task for other tasks. Furthermore, the integration of both efficiently detects the objects in the images and marks the bounding box of the object. The proposed method comprises several phases: locating the regions of interest in the image using the selective-search mechanism, extracting the features using a CNN and pretrained weights, recognizing the objects using a supervised-learning algorithm, and marking the object’s bounding box by integrating the results across the regions. To make it more realistic, we used a dataset that consists of images of drones with different backgrounds such as buildings, water, trees, humans, and other background objects.
The contributions of this research can be summarized as follows:
  • An automated image-based drone-detection system utilizing an advanced deep-learning-based object-detection method known as you only look once (YOLOv5) is introduced for protecting restricted regions or special zones from unlawful drone interventions.
  • We used the powerful YOLOv5 deep-learning-based object-detection approach to find drones in images taken from varied distances. To the best of our knowledge, this is the first effort to use YOLOv5 for the task of drone detection from images.
  • YOLOv5 efficiently improves the capability of detecting unauthorized UAVs in images.
  • Transfer learning is integrated with YOLOv5 to train the model because our dataset lacked enough samples. This integration improved the accuracy.
  • By joining the results across the regions, the model can recognize the detected object in the images and mark its bounding box.

4. Materials and UAV-Detection Model

4.1. Background of YOLO Algorithms

Most current high-performance object-detection frameworks are based on the R-CNN and YOLO algorithm series. R-CNN-based object-detection frameworks have achieved high accuracy in various fields; however, they suffer from slow detection times and, in some situations, cannot detect objects in real time. To solve the speed problem, the YOLO algorithm series treats the image-detection challenge as a regression problem with a simple cascade model [29,30]. The YOLO model is an advanced real-time object-detection model that has gained significant attention in the research community and has achieved state-of-the-art performance in various areas. YOLO stands for “you only look once”. It can process streaming video with less than 25 ms of latency.
YOLO divides the 2D image into a grid during the training process. Each grid cell is responsible for detecting the objects within its borders. In target-detection problems, YOLO considers the entire image in the training phase to gain global information. Thus, the entire image is the input to the YOLO network, which returns the bounding-box location and the category to which the bounding box belongs. The whole image’s characteristics are utilized, and the prediction is applied to each bounding box separately. Each bounding box comprises five predictions (the box coordinates x, y, w, and h, and a confidence score), expressed relative to the grid cell.
The YOLO algorithm was proposed by [31] in 2015. The algorithm was then upgraded through four new versions, namely YOLOv2 [13], YOLOv3 [32], YOLOv4 [33], and, most recently, YOLOv5 [34,35]. YOLOv2 utilizes k-means clustering to cluster the bounding boxes of the training data, and an a priori (anchor) box is employed to improve the intersection over union (IOU) between the ground truth and the prediction box. In the cluster analysis, the distance indicator is represented by the IOU value between the current box and the cluster-center box. Compared to the first version of YOLO, this version achieves better recall and accuracy. In YOLOv3, the recognition step is improved through the utilization of both the residual network (ResNet) [17] and Darknet-53. Additionally, feature pyramid networks (FPNs) [20] are utilized to perform multiscale prediction. These additions improved the performance in terms of accuracy and speed and decreased the false background-detection rate. In YOLOv4, the head part was adopted from YOLOv3, CSPDarknet-53 replaced the backbone network, and the receptive field was expanded by employing spatial pyramid pooling (SPP) [36] along with the path-aggregation network (PANet) [37] in the neck part. These modifications improved the feature-extraction ability of YOLOv4 in comparison with the previous algorithms.

4.2. The Classification Framework

In this work, we used the most advanced version of the YOLO algorithm series, YOLOv5. The YOLOv5 model has high performance and a fast detection speed, and can meet the requirements of real-time applications. It can rapidly perform all the steps required to detect an object using a single neural network. In UAV detection, both detection speed and accuracy are imperative; YOLOv5 achieves not only outstanding detection performance but also real-time speed. Moreover, it can easily be trained to detect different objects. YOLOv5 is developed in PyTorch, an open-source Python-based deep-learning framework.
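As a concrete illustration of this PyTorch basis, the pretrained YOLOv5x model can be loaded and run on a single image through the PyTorch Hub entry point of the public ultralytics/yolov5 repository. The snippet below is a minimal sketch (the image file name is a placeholder), not the exact code used in this work.

```python
import torch

# Load the YOLOv5x variant with COCO-pretrained weights from PyTorch Hub
# (uses the public ultralytics/yolov5 repository, not code released with this paper).
model = torch.hub.load("ultralytics/yolov5", "yolov5x", pretrained=True)

# Run inference on a single image; the file name is a placeholder.
results = model("example_drone_image.jpg")
results.print()          # summary of detections (classes, confidences, counts)
boxes = results.xyxy[0]  # tensor of [x1, y1, x2, y2, confidence, class] rows
```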
In the YOLOv5 architecture, there are three main components as follows:
  • The backbone is made up of a core block called the cross-stage partial network (CSPNet) [34,35]. CSPNet fixes various gradient-related difficulties and reduces the algorithm’s parameter count and the number of floating-point operations per second (FLOPS). As a result, it improves the inference speed and accuracy while reducing the architecture’s size. Furthermore, the backbone has multiple convolutional layers, four CSP bottlenecks with three convolutions, and one spatial pyramid pooling-fast (SPPF) module. The backbone’s primary goal is to extract feature maps of different sizes from the input picture using many rounds of convolution and pooling. As a result, the backbone layers in YOLOv5 work as a feature extractor [38].
  • The neck, also known as the path aggregation network (PAN), is used for feature fusion. It saves and sends features from deep layers to the detecting head. As a result, it extracts feature information and creates the output feature maps of three different sizes [39].
  • The head or output portion performs object detection. There are multiple convolutional layers in the head section, four CSP bottlenecks with three convolutions, and upsampling and concatenate layers. The head section predicts visual characteristics, draws bounding boxes around the target object, and determines the class.
The confidence score of each predicted bounding box is calculated as follows:

$$\sigma_{r}^{g} = O_{r,g} \cdot \mathrm{IoU}_{r,g}$$

where $r$ is a bounding box of grid cell $g$, and $\sigma_{r}^{g}$ is the confidence score of bounding box $r$. $O_{r,g}$ indicates whether a target lies within cell $g$: if the target is in cell $g$, its value is 1; otherwise, it is 0. $\mathrm{IoU}_{r,g}$ is the intersection over union (IoU), a well-known evaluation metric in image detection, computed between the predicted bounding box and the ground truth; its value depends on how accurately the bounding box is located around the target [40].
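For reference, the IoU entering this confidence score can be computed directly from the corner coordinates of the predicted and ground-truth boxes. The helper below is an illustrative sketch, not the authors' implementation.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    # Coordinates of the intersection rectangle.
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])

    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Example: a prediction that overlaps the ground truth by half its width.
print(iou((0, 0, 100, 100), (50, 0, 150, 100)))  # ~0.333
```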
YOLOv5 comes in several versions, namely YOLOv5s, YOLOv5m, YOLOv5l, and YOLOv5x. In this work, we used YOLOv5x, the largest model among them; it has 476 layers, which create approximately 87 million parameters.
The feature map of a convolution layer is calculated as

$$R = L * W + b$$

where $*$ is the convolution operator. The convolution operation is defined as

$$L * W = \sum_{n,m} W(n,m)\,\chi\,L(x - n,\, y - m)$$

where $L$ is the input volume that is convolved with the trained weights $W$ of the layer, and $b$ is the bias term. The activation function is then applied to the output feature map $R$, which summarizes the detected features in the input. $n$ and $m$ denote the weight-kernel indexes, and $\chi$ denotes the element-wise multiplication between the weights and the input.
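As an illustration of the feature-map computation $R = L * W + b$ followed by an activation, the short PyTorch sketch below applies one 3 × 3 kernel to an example input volume. The tensor sizes are arbitrary, and the SiLU activation is shown only as an example of the activation step (it is the activation used in recent YOLOv5 releases).

```python
import torch
import torch.nn.functional as F

# Input volume L: a batch of 1 image with 3 channels and 64x64 pixels (arbitrary sizes).
L = torch.randn(1, 3, 64, 64)
# Trained weights W: one 3x3 kernel over the 3 input channels, plus a bias term b.
W = torch.randn(1, 3, 3, 3)
b = torch.randn(1)

# R = L * W + b (convolution plus bias), followed by an activation function.
R = F.conv2d(L, W, bias=b, padding=1)
out = F.silu(R)   # example activation step
print(out.shape)  # torch.Size([1, 1, 64, 64])
```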
Figure 2 shows the architecture of YoloV5. It shows the backbone, PANet, and the output parts.
There are several layers of feature maps generated in the backbone network. During the training process, the data-augmentation step is used to cover a broad spectrum of semantic variations. It scales the images by a 0.5 fraction and uses an image translation of a 0.1 fraction.
The loss function ($L_F$) of YOLOv5 is the sum of the bounding-box regression loss, the confidence loss, and the classification loss. It is calculated as follows:

$$L_F = l_{bx} + l_s + l_j$$

where $l_{bx}$ is the regression function for the bounding box, $l_j$ is the confidence loss, and $l_s$ is the classification loss [40].
The bounding-box regression loss $l_{bx}$ is calculated as follows:

$$l_{bx} = \lambda_{cd} \sum_{i=0}^{s^2} \sum_{m=0}^{b} I_{i,m}^{j} \left(2 - w_i \times h_i\right) \left[ \left(x_i - x_i^m\right)^2 + \left(y_i - y_i^m\right)^2 + \left(w_i - w_i^m\right)^2 + \left(h_i - h_i^m\right)^2 \right]$$

where $h_i$ and $w_i$ are the height and the width of the target, respectively, and $x_i$ and $y_i$ are the ground-truth coordinates of the target; $x_i^m$, $y_i^m$, $w_i^m$, and $h_i^m$ are the corresponding predictions of the $m$-th bounding box in cell $i$. $I_{i,m}^{j}$ is the indicator function of whether cell $i$ contains an object, and $\lambda_{cd}$ is the weighting coefficient of the bounding-box regression loss.
The classification loss $l_s$ is calculated as follows:

$$l_s = \lambda_{s} \sum_{i=0}^{s^2} \sum_{m=0}^{b} I_{i,m}^{j} \sum_{c \in cl} V_i(c) \log\left(\hat{V}_i(c)\right)$$
The confidence loss $l_j$ is calculated as follows:

$$l_j = \lambda_{noj} \sum_{i=0}^{s^2} \sum_{m=0}^{b} I_{i,m}^{noj} \left(c_i - \hat{c}_i\right)^2 + \lambda_{j} \sum_{i=0}^{s^2} \sum_{m=0}^{b} I_{i,m}^{j} \left(c_i - \hat{c}_i\right)^2$$

where $\lambda_{noj}$ is the no-object confidence loss coefficient, $\lambda_{j}$ is the object confidence loss coefficient, $\lambda_{s}$ in the previous equation is the classification loss coefficient, $c_i$ is the ground-truth confidence score, $\hat{c}_i$ is the predicted confidence score, and $cl$ denotes the set of classes.
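To make the interplay of the three terms concrete, the following sketch combines a bounding-box term, an objectness term, and a classification term into a single weighted loss. It is an illustrative simplification (a plain squared-error box term and binary cross-entropy for the other two terms, whereas the actual YOLOv5 implementation uses an IoU-based box term), with placeholder gain values rather than the trained configuration.

```python
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()

def total_loss(pred_boxes, true_boxes, pred_obj, true_obj, pred_cls, true_cls,
               box_gain=0.05, obj_gain=1.0, cls_gain=0.5):
    """Illustrative composite loss L_F = l_bx + l_j + l_s with weighting gains.

    The gains play the role of the lambda coefficients above; the default
    values here are placeholders, not the trained configuration.
    """
    # Bounding-box regression term: a plain squared error over (x, y, w, h).
    l_bx = ((pred_boxes - true_boxes) ** 2).sum(dim=-1).mean()

    # Objectness (confidence) term and classification term, both as BCE.
    l_j = bce(pred_obj, true_obj)
    l_s = bce(pred_cls, true_cls)

    return box_gain * l_bx + obj_gain * l_j + cls_gain * l_s

# Example call with random tensors standing in for one batch of predictions.
n = 8
loss = total_loss(torch.randn(n, 4), torch.randn(n, 4),
                  torch.randn(n), torch.rand(n),
                  torch.randn(n, 1), torch.rand(n, 1))
print(loss.item())
```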

5. Experimental Setup and Results

In this section, we present and discuss the experiments conducted to prove the effectiveness of the proposed approach. The section starts by describing the dataset employed in this research, then discusses the conducted experiments and compares the results achieved using the proposed approach with those achieved using other competing approaches.

5.1. Dataset

The machine-learning model should be trained on a set of labeled and annotated images to detect and recognize drone objects in an image. We employed the freely available drone dataset from Kaggle [41] in this research. This dataset consists of 1359 images of drones captured from an Earth-to-drone view. All the images in this dataset were labeled and annotated to fit the training of the adopted YOLOv5 framework.
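For reference, the adopted YOLOv5 framework reads the dataset layout and class list from a small YAML configuration file. A hypothetical configuration for this dataset could be written as sketched below; the folder layout and file name are placeholders, not the actual paths used in this work.

```python
from pathlib import Path

# Hypothetical dataset configuration for the Kaggle drone images; the folder
# layout and file name are placeholders, not the authors' actual paths.
data_yaml = """\
train: datasets/drones/images/train
val: datasets/drones/images/val
test: datasets/drones/images/test

nc: 1             # number of classes (all drone categories treated as one class, Section 5.1)
names: ["drone"]
"""

Path("drone_data.yaml").write_text(data_yaml)
```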
Figure 3 presents a set of sample drone images in the dataset. As shown in the figure, the dataset contains multiple categories of drones, and the images were captured at different distances from the drone, which represents an advantage for this dataset. UAVs are classified into three categories: fixed-wing UAVs, rotary-wing UAVs, and hybrid UAVs (i.e., combining the characteristics of both fixed-wing and rotary-wing UAVs).
As shown in Figure 4, the rotary-wing UAVs can be classified based on the number of rotors into four sub-categories: quadcopters, octocopters, tricopters, and hexacopters. In this research, we considered all the categories of drones as a single class.
As shown in Figure 5, the dataset presents a challenge in accurately detecting drones. The variation in the sizes of the drone objects represents a significant challenge for machine-learning models for accurately detecting and classifying small drone objects against an ambient background. Therefore, a set of region proposals of various sizes was generated to search for drone objects in the input images. Figure 5 depicts the distribution of the sizes and locations of drone objects in the images of the dataset.

5.2. Evaluation Metric

To evaluate the performance of the trained YOLOv5 model in detecting drones, we adopted two metrics: recall and precision. The curves of these metrics are built by varying the detection threshold. Therefore, the model’s effectiveness can be measured based on these vital evaluation metrics. Each metric is described in more detail in the following.
Precision: This metric measures the percentage of relevant detection results. It can be determined using the following equation:

$$\mathrm{Precision} = \frac{TP}{TP + FP}$$

where TP and FP denote the number of correctly detected objects (true positives) and the number of false detections (false positives), respectively. Thus, precision calculates how many of the positive class predictions actually belong to the positive class; it measures how many of the bounding-box predictions are true and correct.
Recall: This metric measures the percentage of the relevant objects that are correctly detected. It can be determined using the following equation:

$$\mathrm{Recall} = \frac{TP}{TP + FN}$$

where FN refers to the number of missed objects (false negatives).
For the experiments conducted in this research, if the detected bounding box overlapped by more than 50% with the predefined bounding box, this detected bounding box was considered a true-positive bounding box; otherwise, it was considered a false-positive bounding box.
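This matching rule can be written as a short counting routine. The sketch below is illustrative only; it reuses the iou helper sketched in Section 4.2 and matches each ground-truth box to at most one detection.

```python
def precision_recall(detections, ground_truths, iou_threshold=0.5):
    """Precision and recall for one image.

    detections and ground_truths are lists of (x1, y1, x2, y2) boxes;
    each ground-truth box may be matched by at most one detection.
    """
    matched = set()
    tp = 0
    for det in detections:
        best_iou, best_idx = 0.0, None
        for idx, gt in enumerate(ground_truths):
            if idx in matched:
                continue
            overlap = iou(det, gt)  # iou helper sketched in Section 4.2
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx
        if best_iou > iou_threshold:  # "more than 50%" overlap counts as a true positive
            tp += 1
            matched.add(best_idx)

    fp = len(detections) - tp     # detections without sufficient overlap
    fn = len(ground_truths) - tp  # ground-truth drones that were missed
    precision = tp / (tp + fp) if detections else 0.0
    recall = tp / (tp + fn) if ground_truths else 0.0
    return precision, recall
```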

5.3. Model Tuning

The experiment was conducted using a local machine (8 GB NVidia RTX2070 graphics processing unit (GPU), 16 GB of main memory, a 1.9 GHz CPU, and an SSD drive) to train YOLOv5. To run the YOLOv5 process on this GPU, cuDNN 7.6.5 was employed. We fine-tuned and configured the YOLOv5 architecture for the Kaggle drone dataset. This research used transfer learning to make the YOLOv5 framework compatible with this dataset: we used already existing pretrained weights, which were trained on the COCO large dataset, and fine-tuned the last three convolutional layers of YOLOv5 to match the number of classes in the dataset. The original YOLOv5x pretrained model was trained on 80 classes; thus, we changed the number of classes to two, namely “drone” and “background”. To adequately address the data scarcity, YOLOv5 introduces various data-augmentation techniques. We used data augmentation before applying YOLOv5 and set the parameters, such as rotation, translation, scaling, and others, to enable YOLOv5 to generate various images from a single image, enriching the given dataset and allowing for better training. YOLOv5x also uses data-augmentation steps during the training process to cover a broad spectrum of semantic variations; it scales the images by a 0.5 fraction and uses an image translation of a 0.1 fraction. In addition, the batch size was set to 4 to increase the trained model’s robustness and better match the GPU resources. The number of epochs was set to 100, after which it was noticed that the trained model became steady and stable. Other hyperparameters were incorporated in the conducted experiments, such as decay = 0.00036, initial learning rate = 0.0032, final learning rate = 0.12, and momentum = 0.843; these parameters were kept at their default values. Finally, we trained and tested YOLOv5 on our local machine using the drone dataset. We trained YOLOv5 for 100 iterations, saved the trained weights every 25 iterations, and later constructed a number-of-iterations-versus-mAP curve at four points corresponding to the weights saved at 25, 50, 75, and 100 iterations. A flowchart of the overall experiment is shown in Figure 6.
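Assuming the publicly available ultralytics/yolov5 training script was used, the configuration described above would roughly correspond to an invocation like the one sketched below; the image size, data file, and hyperparameter file names are placeholders, and the listed values simply restate the parameters reported in this section.

```python
# Illustrative fine-tuning invocation for the public ultralytics/yolov5 repository
# (a sketch, not the authors' exact command); drone_data.yaml and hyp.drone.yaml
# are placeholder file names, and the image size is an assumption.
#
#   python train.py --img 640 --batch 4 --epochs 100 \
#       --data drone_data.yaml --weights yolov5x.pt --hyp hyp.drone.yaml
#
# The hyperparameter file would carry the values reported above:
hyperparameters = {
    "lr0": 0.0032,            # initial learning rate
    "lrf": 0.12,              # final learning rate (fraction of lr0)
    "momentum": 0.843,
    "weight_decay": 0.00036,  # decay
    "scale": 0.5,             # image scaling augmentation fraction
    "translate": 0.1,         # image translation augmentation fraction
}
```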

5.4. Results

To evaluate the performance of the proposed fine-tuned Yolov5 in detecting drone objects, we utilized the pretrained model weights for transfer learning. These pretrained weights were trained on the COCO large dataset. The training process was then established based on the fine-tuned parameters described in the previous section.
The images in this dataset were split into three sets: the training, validation, and test sets. The training set was composed of 60% of the overall dataset images. In comparison, the validation set consisted of 20% of the images in the dataset, and finally, the testing set was composed of the other 20% of the dataset images.
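A simple way to realize this 60/20/20 split is to shuffle the image list with a fixed seed and slice it, as sketched below; the image folder path is a placeholder, and the printed counts correspond to the 1359-image dataset.

```python
import random
from pathlib import Path

random.seed(0)  # fixed seed for a reproducible split
images = sorted(Path("datasets/drones/images/all").glob("*.jpg"))  # placeholder path
random.shuffle(images)

n = len(images)
n_train, n_val = int(0.6 * n), int(0.2 * n)
splits = {
    "train": images[:n_train],
    "val": images[n_train:n_train + n_val],
    "test": images[n_train + n_val:],
}
for name, files in splits.items():
    print(name, len(files))  # roughly 815 / 271 / 273 for the 1359-image dataset
```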
Figure 7 shows the performance of the fine-tuned YOLOv5 model throughout the training process. In this figure, the images in the top row refer to the model’s performance on the training set, whereas the images in the bottom row refer to the model’s performance on the validation set. From these images, it can be noted that the loss in detecting drone objects in the training set reached a value of less than 0.01 after 100 epochs. However, the loss in detecting drones in the validation set reached its minimum value around epoch 50; it then increased slightly as epoch 100 was approached. Because the validation loss kept increasing after the 100th epoch, and to achieve better generalization of the trained model, we stopped the training process at the 100th epoch. Early stopping was also adopted for the conducted experiments, so the training process halts when no significant change in the performance of the training model occurs for ten consecutive epochs. In Figure 7, there are some oscillations in the signals; these types of oscillations are common during training and are caused by divergent weights.
To further analyze the performance of the model-training process, Figure 8 shows a mapping of the model precision and recall of the drone detection during the training process. In the figure, it can be seen that the model could achieve an mAP of 96.8%, which represents the area under the curve. This value refers to the capability of the trained model to accurately detect drone objects with high precision and recall values. To emphasize the generalization of the trained model, the testing set was used to evaluate this model and various criteria were evaluated and compared with those for the other competing models as presented in the next section.
Figure 9 presents a set of images containing different types of drone objects along with the detection results using the trained model. As shown in the figure, the proposed approach could accurately detect the locations of drone objects of different types and sizes, which shows the effectiveness of the proposed approach for realizing this task.

5.5. Complexity and Parameter Uncertainty

The advances in YOLOv5 resulted in quicker and more accurate models on GPUs but introduced complexities for CPU deployments. The settings, such as rotation, scaling, translation, and others, were configured to allow YOLOv5 to produce several images from a single image in order to enrich the supplied dataset and allow for better training. YOLOv5 employs data-augmentation phases throughout the training process to cover a wide range of semantic variances. Furthermore, the parameter uncertainty includes measurement errors, the use of surrogate data, sampling errors, and variability that depends on the dataset. Because of parameter uncertainty, exposures may be overestimated or underestimated. Inaccuracies in sampling may come from gathering too few observations or from nonrepresentative sampling. Generally, investigations of residential exposures involve very few measures and are usually confined to a small number of situations. The pretrained weights in our case were trained on the COCO large dataset, and the training procedure was then built based on the stated fine-tuned parameters.

5.6. Comparison with Other Models

To demonstrate the superiority of the proposed approach for the task of drone detection, several experiments were conducted to measure the performance of different approaches, such as YOLOv3 and YOLOv4, in comparison with the proposed model. Firstly, YOLOv3 was trained on the same images of the training and validation sets. Figure 10 shows the progress of the loss and mAP values while training YOLOv3 on the training set. As shown in the figure, the training process takes approximately 6000 iterations to achieve good loss and mAP values. The minimum achieved average loss value was 0.0597, and the best achieved mAP was 64.6% after 6000 iterations. These values represent the mean over all training epochs.
On the other hand, YOLOv4 was trained on the same dataset, and the progress of the evaluation criteria is depicted in Figure 11. As shown in the figure, the progress achieved by YOLOv4 for the mAP was much better than the corresponding progress achieved by YOLOv3. The best mAP value achieved by YOLOv4 was 89.9% for the training set over all training epochs. In addition, the best average loss value achieved by YOLOv4 was 0.5360, which is higher than the corresponding value achieved by YOLOv3. However, the effects of these values become apparent when we check the generalization of these models.
After training the three models (YOLOv3, YOLOv4, and the proposed model), they were used to check the generalization property by measuring the evaluation criteria (precision, recall, and mAP) on the training and testing sets. Figure 12 depicts the evaluation results achieved for the three models on the training set. As shown in the figure, the results achieved by YOLOv4 for the three evaluation criteria are better than those achieved by the other models. However, this does not reflect the generalization of the model; the generalization can only be checked by evaluating the trained model on the testing set, which the model has not seen before.
As the testing set was used to check the generalization of the trained models, Figure 13 presents the achieved results for the evaluation criteria. In this figure, we can note that, although YOLOv4 achieved the best results on the training set, it did not generalize well to the testing set. In contrast, the proposed model achieved the best generalization and the highest values of the evaluation metrics on the testing set. In addition, the precision achieved by YOLOv3 was better than the precision achieved by YOLOv4 on the testing set; however, the recall value of YOLOv3 was much lower than that of YOLOv4. The proposed model achieved the best-balanced values for the three evaluation criteria. These results emphasize the superiority of the proposed model when compared with the other competing models. The proposed approach was also compared with MaskRCNN, and the results are shown in Figure 14. As shown in the figure, the proposed fine-tuned model achieved higher precision, recall, and mAP values.
Figure 15 depicts a heatmap of the precision values achieved using the three models on the testing set. This figure shows that the precision of the drone detection achieved by yolov3 and yolov4 was 92% and 91%, respectively. However, the precision achieved by the proposed model on the same testing set was 94.7%. These values reflect the efficiency of the proposed model and its promising performance in the drone detection task.
Moreover, Table 1 compares the proposed approach and other competing methods in the literature for the same drone-detection task. As shown in the table, the best results were achieved by two approaches: the proposed approach and the CNN approach. However, the proposed approach was tested on a larger dataset than the one used to evaluate the CNN approach. In addition, the dataset employed in this research contains small objects, which makes accurately detecting drone objects challenging. Overall, this paper highlights the application of YOLOv5, which, to the best of our knowledge, is the first research to address this application for the drone-detection task, and it achieved promising results.

5.7. Discussion of Results

It is evident from the experiments that YOLOv5 has a number of benefits over other detection techniques. MaskRCNN is slower than YOLOv5; YOLOv5 is hence more suited for real-time applications. Additionally, there are fewer parameters in YOLOv5 than in MaskRCNN, so YOLOv5’s memory occupancy is better than MaskRCNN’s. YOLOv5 also has the benefit of being able to recognize distant or tiny items, whereas MaskRCNN has the drawback of needing a lot of data to be trained effectively. Moreover, YOLOv5 improves the ability to detect unauthorized UAVs; compared to other models, it offers more accuracy and a faster detection rate. The transfer-learning method makes it easier to reuse knowledge gained from one task for another when combined with YOLOv5, and the two work together to recognize objects accurately and indicate their bounding boxes. The generalization property was verified for YOLOv5, whereas the competing algorithms generalized less well. Using the training and testing sets, the proposed model was assessed in terms of the evaluation criteria precision, recall, and mAP.

6. Future Work

In the future, we will apply our method to a larger dataset to further demonstrate the effectiveness of the YOLOv5 algorithm. Moreover, we will include more challenging dataset scenarios and conditions. For example, we will include several types of birds, noise, airplanes, and drones, as well as ambiguous images of drones and other objects.

7. Conclusions

In this study, we proposed fine-tuning the recently emerged YOLOv5 deep-learning model for detecting drone objects in images captured at various distances. The proposed approach was evaluated using a freely available dataset that consists of different types of drones of different sizes. The fine-tuning was performed by adjusting the model’s parameters to best fit the task. In addition, we employed data augmentation to provide the training phase of the model with sufficient data for it to be adequately trained. The results show that the trained network can detect drones of different types with high accuracy. Moreover, to emphasize the superiority of the proposed approach, three other models, namely YOLOv3, YOLOv4, and MaskRCNN, were trained and tested on the same dataset, and their results were compared to those achieved by the proposed approach. The comparison results confirmed the effectiveness and superiority of the proposed method for the task of drone detection.

Author Contributions

A.R. and A.A.A., conceptualization, writing, methodology, and results; B.A. and M.A., writing, conceptualization, draft preparation, editing, idea proposal and visualization; A.S., N.A.-Q., T.A. and N.A., writing and reviewing; A.A. (Abdulrahman Alenezi) and A.A. (Aziz Alotaibi), draft preparation, editing, and reviewing. All authors have read and agreed to the published version of the manuscript.

Funding

This work was partially supported by the Sensor Networks and Cellular System (SNCS) Research Center under Grant 1442-002.

Acknowledgments

Taif University Researchers Supporting Project number (TURSP-2020/302), Taif University, Taif, Saudi Arabia. The authors gratefully acknowledge the support of the SNCS Research Center at the University of Tabuk, Saudi Arabia. In addition, the authors would like to thank the Deanship of Scientific Research at Shaqra University for supporting this work.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Sandino, J.; Vanegas, F.; Maire, F.; Caccetta, P.; Sanderson, C.; Gonzalez, F. UAV framework for autonomous onboard navigation and people/object detection in cluttered indoor environments. Remote Sens. 2020, 12, 3386. [Google Scholar] [CrossRef]
  2. Azar, A.T.; Koubaa, A.; Ali Mohamed, N.; Ibrahim, H.A.; Ibrahim, Z.F.; Kazim, M.; Ammar, A.; Benjdira, B.; Khamis, A.M.; Hameed, I.A.; et al. Drone Deep Reinforcement Learning: A Review. Electronics 2021, 10, 999. [Google Scholar] [CrossRef]
  3. Udeanu, G.; Dobrescu, A.; Oltean, M. Unmanned aerial vehicle in military operations. In Proceedings of the 18th International Conference “Scientific Research and Education in the Air Force—AFASES”, Brasov, Romania, 26–28 May 2016; pp. 199–205. [Google Scholar]
  4. Pedrozo, S. Swiss military drones and the border space: A critical study of the surveillance exercised by border guards. Geogr. Helv. 2017, 72, 97–107. [Google Scholar] [CrossRef]
  5. Restas, A. Drone applications for supporting disaster management. World J. Eng. Technol. 2015, 3, 316. [Google Scholar] [CrossRef]
  6. Lee, S.; Choi, Y. Reviews of unmanned aerial vehicle (drone) technology trends and its applications in the mining industry. Geosyst. Eng. 2016, 19, 197–204. [Google Scholar] [CrossRef]
  7. Gallacher, D. Drone applications for environmental management in urban spaces: A review. Int. J. Sustain. Land Use Urban Plan. 2016, 3. [Google Scholar] [CrossRef]
  8. Liu, B.; Luo, H. An Improved Yolov5 for Multi-Rotor UAV Detection. Electronics 2022, 11, 2330. [Google Scholar] [CrossRef]
  9. Saqib, M.; Khan, S.D.; Sharma, N.; Blumenstein, M. A study on detecting drones using deep convolutional neural networks. In Proceedings of the 2017 14th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), Lecce, Italy, 29 August–1 September 2017; pp. 1–5. [Google Scholar]
  10. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 39, 1137–1149. [Google Scholar] [CrossRef]
  11. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar]
  12. Aker, C.; Kalkan, S. Using deep networks for drone detection. In Proceedings of the 2017 14th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), Lecce, Italy, 29 August–1 September 2017; pp. 1–6. [Google Scholar]
  13. Redmon, J.; Farhadi, A. YOLO9000: Better, faster, stronger. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 7263–7271. [Google Scholar]
  14. Nalamati, M.; Kapoor, A.; Saqib, M.; Sharma, N.; Blumenstein, M. Drone detection in long-range surveillance videos. In Proceedings of the 2019 16th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), Taipei, Taiwan, 18–21 September 2019; pp. 1–6. [Google Scholar]
  15. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. Ssd: Single shot multibox detector. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; Springer: Cham, Switzerland, 2016; pp. 21–37. [Google Scholar]
  16. Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2818–2826. [Google Scholar]
  17. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
  18. de la Iglesia, D.; Mendez, M.; Dosil, R.; Gonzalez, I. Drone detection CNN for close-and long-range surveillance in mobile applications. In Proceedings of the AVSS 2019, Taipei, Taiwan, 18–21 September 2019. [Google Scholar]
  19. Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision 2017, Venice, Italy, 22–29 October 2017; pp. 2980–2988. [Google Scholar]
  20. Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE International Conference on Computer Vision 2017, Venice, Italy, 22–29 October 2017; pp. 2117–2125. [Google Scholar]
  21. He, T.; Zhang, Z.; Zhang, H.; Zhang, Z.; Xie, J.; Li, M. Bag of tricks for image classification with convolutional neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 558–567. [Google Scholar]
  22. Magoulianitis, V.; Ataloglou, D.; Dimou, A.; Zarpalas, D.; Daras, P. Does deep super-resolution enhance uav detection? In Proceedings of the 2019 16th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), Taipei, Taiwan, 18–21 September 2019; pp. 1–6. [Google Scholar]
  23. Yamanaka, J.; Kuwashima, S.; Kurita, T. Fast and accurate image super resolution by deep CNN with skip connection and network in network. In Proceedings of the International Conference on Neural Information Processing, Guangzhou, China, 14–18 November 2017; Springer: Cham, Switzerland, 2017; pp. 217–225. [Google Scholar]
  24. Craye, C.; Ardjoune, S. Spatio-temporal semantic segmentation for drone detection. In Proceedings of the 2019 16th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), Taipei, Taiwan, 18–21 September 2019; pp. 1–5. [Google Scholar]
  25. Schumann, A.; Sommer, L.; Klatte, J.; Schuchert, T.; Beyerer, J. Deep cross-domain flying object classification for robust UAV detection. In Proceedings of the 2017 14th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), Lecce, Italy, 29 August–1 September 2017; pp. 1–6. [Google Scholar]
  26. Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Munich, Germany, 5–9 October 2015; Springer: Cham, Switzerland, 2015; pp. 234–241. [Google Scholar]
  27. Seidaliyeva, U.; Akhmetov, D.; Ilipbayeva, L.; Matson, E.T. Real-Time and accurate drone detection in a video with a static background. Sensors 2020, 20, 3856. [Google Scholar] [CrossRef]
  28. Khan, F.R.; Muhabullah, M.; Islam, R.; Monirujjaman Khan, M.; Masud, M.; Aljahdali, S.; Kaur, A.; Singh, P. A Cost-Efficient Autonomous Air Defense System for National Security. Secur. Commun. Netw. 2021, 2021, 9984453. [Google Scholar] [CrossRef]
  29. Yao, J.; Qi, J.; Zhang, J.; Shao, H.; Yang, J.; Li, X. A Real-time detection algorithm for kiwifruit defects based on YOLOv5. Electronics 2021, 10, 1711. [Google Scholar] [CrossRef]
  30. Wu, W.; Liu, H.; Li, L.; Long, Y.; Wang, X.; Wang, Z.; Li, J.; Chang, Y. Application of local fully Convolutional Neural Network combined with YOLO v5 algorithm in small target detection of remote sensing image. PLoS ONE 2021, 16, e0259283. [Google Scholar] [CrossRef] [PubMed]
  31. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar]
  32. Redmon, J.; Farhadi, A. Yolov3: An incremental improvement. arXiv 2018, arXiv:1804.02767. [Google Scholar]
  33. Bochkovskiy, A.; Wang, C.-Y.; Liao, H.-Y.M. Yolov4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934. [Google Scholar]
  34. Qi, D.; Tan, W.; Yao, Q.; Liu, J. YOLO5Face: Why Reinventing a Face Detector. arXiv 2021, arXiv:2105.12931. [Google Scholar]
  35. Github. Yolov5. Available online: https://github.com/ultralytics/yolov5 (accessed on 9 December 2021).
  36. Luvizon, D.; Tabia, H.; Picard, D. SSP-Net: Scalable Sequential Pyramid Networks for Real-Time 3D Human Pose Regression. arXiv 2020, arXiv:2009.01998. [Google Scholar]
  37. Liu, S.; Qi, L.; Qin, H.; Shi, J.; Jia, J. Path aggregation network for instance segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 8759–8768. [Google Scholar]
  38. Xing, L.; Fan, X.; Dong, Y.; Xiong, Z.; Xing, L.; Yang, Y.; Bai, H.; Zhou, C. Multi-UAV cooperative system for search and rescue based on YOLOv5. Int. J. Disaster Risk Reduct. 2022, 76, 102972. [Google Scholar] [CrossRef]
  39. Wang, Q.; Cheng, M.; Huang, S.; Cai, Z.; Zhang, J.; Yuan, H. A deep learning approach incorporating YOLO v5 and attention mechanisms for field real-time detection of the invasive weed Solanum rostratum Dunal seedlings. Comput. Electron. Agric. 2022, 199, 107194. [Google Scholar] [CrossRef]
  40. Xu, Q.; Zhu, Z.; Ge, H.; Zhang, Z.; Zang, X. Effective Face Detector Based on YOLOv5 and Superresolution Reconstruction. Comput. Math. Methods Med. 2021, 2021, 7748350. [Google Scholar] [CrossRef]
  41. Mehdi Ozel. Available online: https://www.kaggle.com/dasmehdixtr/drone-dataset-uav (accessed on 25 December 2021).
  42. Behera, D.; Bazil, A. Drone Detection and Classification using Deep Learning. In Proceedings of the International Conference on Intelligent Computing and Control Systems (ICICCS), Madurai, India, 13–15 May 2020; pp. 1012–1016. [Google Scholar]
  43. Tan, L.; Lv, X.; Lian, X.; Wang, G. YOLOv4 Drone: UAV image target detection based on an improved YOLOv4 algorithm. Comput. Electr. Eng. 2021, 93, 107261. [Google Scholar] [CrossRef]
  44. Wu, Q.; Feng, D.; Cao, C.; Zeng, X.; Feng, Z.; Wu, J.; Huang, Z. Improved Mask R-CNN for Aircraft Detection in Remote Sensing Images. Sensors 2021, 21, 2618. [Google Scholar] [CrossRef] [PubMed]
  45. Mahdavi, F.S.; Rajabi, R. Drone Detection Using Convolutional Neural Networks. In Proceedings of the 2020 6th Iranian Conference on Signal Processing and Intelligent Systems (ICSPIS), Mashhad, Iran, 23–24 December 2020; pp. 1–5. [Google Scholar]
Figure 1. Typical drone detection based on deep learning.
Figure 2. YOLOv5 architecture.
Figure 3. Sample images from the drone dataset.
Figure 4. Categories of rotary-wing UAVs, with different propeller guards.
Figure 5. Distribution of the sizes and locations of drone objects in images of the dataset.
Figure 6. Flowchart of the conducted experiment.
Figure 7. The progress in the performance of the proposed fine-tuned model during the training process.
Figure 8. Precision–recall curve for detecting drone objects in the training set using the proposed fine-tuned model.
Figure 9. Drone detection using the proposed model supplied with images in the testing set.
Figure 10. The progress of the loss value while training the approach proposed by Behera and Bazil [42].
Figure 11. The progress of the loss value while training the method proposed by Tan et al. [43].
Figure 12. Precision, recall, and mAP results achieved by the proposed model for the training set compared to Behera and Bazil [42] and Tan et al. [43].
Figure 13. Precision, recall, and mAP results for the testing set using the proposed model compared to Behera and Bazil [42] and Tan et al. [43].
Figure 14. Precision, recall, and mAP results of the YOLOv5 and MaskRCNN models.
Figure 15. The precision of drone detection using the proposed model and other competing models, namely Yolov3 [32], Yolov4 [33], and MaskRCNN [44].
Table 1. Comparison between the precision achieved by the proposed approach and other competing approaches.

Reference | Methods  | Dataset Size | Results (Precision)
[45]      | CNN      | 712          | 95.0%
[45]      | SVM      | 712          | 88.0%
[45]      | KNN      | 712          | 80.0%
[12]      | Yolov2   | 215          | 90.0%
[9]       | VGG16    | 2727         | 66.0%
[44]      | MaskRCNN | 1359         | 93.6%
Proposed  | Yolov5   | 1359         | 94.7%
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
