Article

Mallard Detection Using Microphone Arrays Combined with Delay-and-Sum Beamforming for Smart and Remote Rice–Duck Farming

1 Faculty of Software and Information Science, Iwate Prefectural University, Takizawa City 020-0693, Japan
2 Faculty of Systems Science and Technology, Akita Prefectural University, Yurihonjo City 015-0055, Japan
3 Faculty of Bioresource Sciences, Akita Prefectural University, Akita City 010-0195, Japan
4 Institute of Engineering Innovation, Graduate School of Engineering, The University of Tokyo, Tokyo 113-8656, Japan
* Author to whom correspondence should be addressed.
Submission received: 25 November 2021 / Revised: 19 December 2021 / Accepted: 22 December 2021 / Published: 23 December 2021
(This article belongs to the Collection Agriculture 4.0: From Precision Agriculture to Smart Farming)

Abstract: This paper presents an estimation method for a sound source of pre-recorded mallard calls from acoustic information using two microphone arrays combined with delay-and-sum beamforming. Rice farming using mallards saves labor because the mallards work instead of farmers. Nevertheless, the number of mallards declines when they are preyed upon by natural enemies such as crows, kites, and weasels. We consider that efficient management can be achieved by locating and identifying the positions of mallards and their natural enemies using acoustic information that can be sensed widely across a paddy field. For this study, we developed a prototype system that comprises two sets of microphone arrays. In all, we used 64 microphones installed on our originally designed and assembled sensor mounts. We obtained three acoustic datasets in an outdoor environment for our benchmark evaluation. The experimentally obtained results demonstrated that the proposed system provides adequate accuracy for application to rice–duck farming.

1. Introduction

Rice–duck farming, a traditional organic farming method, uses hybrid ducks released into a paddy field for weed and pest control [1]. Farmers in northern Japan use mallards because of their utility value as a livestock product [2]. Moreover, mallard farming saves labor because the mallards work instead of farmers. This makes it a particularly attractive approach for farmers, especially in regional societies facing severe difficulties posed by population decline and rapid aging [3]. As mallard farming has attracted attention, it has become popular not only in Japan, but also in many other Asian countries.
One difficulty posed by mallard farming is that a sord of mallards tends to gather in a specific area of the paddy field. There, the birds trample down the rice, which produces stepping ponds in which it is difficult to grow rice. Another shortcoming is that weed control effects are not obtained in areas outside the range of mallard activities. To address these issues, accurate position estimation of mallards in a paddy field is necessary. However, as depicted in the right photograph of Figure 1, detecting mallards among grown rice plants is difficult because the targets are not visible. Moreover, the number of mallards decreases because of predation by natural enemies such as crows, kites, and weasels. To protect mallards from their natural enemies, specifying and managing mallard positions in real time are crucially important tasks.
To mitigate or resolve these difficulties, we consider that efficient management can be achieved by locating and identifying mallards and their natural enemies using acoustic information, which can be sensed widely across a paddy field. Furthermore, effective countermeasures become possible when it is clear exactly when, where, and what kind of natural enemy is approaching. This study was conducted to develop a position estimation system for mallards in a paddy field. Developing stable production technology is strongly demanded because rice produced by mallard farming trades at a high price on the market. For this system, we expect technological development and its transfer to remote farming [2], which is our conceptual model for actualizing smart farming.
This paper presents a direction and position estimation method for a sound source of pre-recorded mallard calls using acoustic information from arrayed microphones combined with delay-and-sum (DAS) beamforming. Using acoustic information, the approach can detect mallards that are occluded by stalks or grass. Based on the results, we infer that an efficient management system can be actualized by locating and identifying mallards and their natural enemies using acoustic information that can be sensed widely across a paddy field. We developed a prototype system with 64 microphones in total, arranged as two arrays installed on our originally designed and assembled mounts. Playing back mallard calls recorded in advance with a microphone, we conducted a simulated experiment in an actual outdoor environment to evaluate the estimation accuracy of the method.
This paper is structured as follows. Section 2 briefly reviews state-of-the-art acoustic and multimodal methods of automatic bird detection from the vision and audio modalities. Subsequently, Section 3 and Section 4 present the proposed localization method based on DAS beamforming and the originally developed microphone array system. The experimental results obtained using our original acoustic benchmark datasets recorded with the two microphone arrays are presented in Section 5. Finally, Section 6 presents the conclusions and highlights future work.

2. Related Studies

Detecting birds that fly across the sky in three-dimensional space is a challenging task for researchers and developers. Over a half-century ago, bird detection using radar [4] was studied to prevent bird strikes by aircraft at airports. By virtue of advances in software technology and improvements in the performance and affordability of sensors and computers, bird detection methods have diversified, especially in terms of offering improved accuracy and cost-effective approaches. Recent bird-detection methods can be categorized into two major modalities based on survey articles: visual [5,6,7,8] and audio [9,10,11]. Numerous outstanding methods [12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49] have been proposed from both modalities.
Benchmark datasets and challenges for the verification and comparison of detection performance have been expanded in both the vision [50,51,52,53] and acoustic [54,55,56,57,58,59,60,61,62,63,64,65] modalities. During the first half of the 2010s, probability models and conventional machine-learning (ML) models combined with part-based features obtained using local descriptors were widely used. These methods comprise k-means [66], morphological filtering (MF) [67], principal component analysis (PCA), Gaussian mixture models (GMMs), expectation–maximization (EM) algorithms [68], boosting [69], the scale-invariant feature transform (SIFT) [70], random forests (RFs) [71], bag-of-features (BoF) [72], histograms of oriented gradients (HOGs) [73], support vector machines (SVMs) [74], local binary patterns (LBPs) [75], background subtraction (BS) [76], and multi-instance, multi-label (MIML) learning [77]. These methods require preprocessing of the input signals to enhance the features. Moreover, the algorithm selection and parameter optimization conducted in advance for preprocessing have often relied on the subjectivity and experience of the developers. If the data characteristics differ even slightly, performance and accuracy drop drastically. Therefore, parameter calibration is necessary and requires a great deal of work. In the latter half of the 2010s, deep-learning (DL) algorithms became predominant, especially after the implementation of the error back-propagation learning algorithm [78] in convolutional neural networks (CNNs) [79].
Table 1 and Table 2 respectively present representative studies of vision-based and sound-based bird detection methods reported during the last decade. As presented in the fourth columns, the representative networks and backbones used in bird detection methods are the following: regions with CNN (RCNN) [80], VGGNet [81], Inception [82], ResNet [83], XNOR-Net [84], densely connected convolutional neural networks (DC-CNNs) [85], fast RCNN [86], faster RCNN [87], you only look once (YOLO) [88], and weakly supervised data augmentation network (WS-DAN) [89]. End-to-end DL models require no pre-processing for feature extraction of the input signals [90]. Moreover, one-dimensional acoustic signals can be input to the DL model as two-dimensional images [91].
Recently, vision-based detection methods have targeted not only birds, but also drones [92]. In 2019, the Drone-vs-Bird Detection Challenge (DBDC) was held at the 16th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS) [93]. The goal of this challenge was the detection of drones that appear at several image coordinates in videos without detecting birds that are present in the same frames. The developed algorithms must alert on detected drones only, without alerting on birds. Moreover, it is necessary to provide estimates of drone locations in image coordinates. As a representative challenge for acoustic bird detection, the Bird Audio Detection challenge [65] was conducted at the Detection and Classification of Acoustic Scenes and Events (DCASE) in 2018. Among the 33 submissions evaluated for this challenge, Lasseck [44] was awarded the champion title. For this detection task, the organizer provided 10 h of sound clips, including negative samples with human imitations of bird sounds. The algorithms produced a binary output of whether or not the sounds were birds, based on randomly extracted 10 s sound clips. Our study differs in that it estimates the coordinates of the sound source from sound clips of a similar duration rather than producing a binary detection. To improve the accuracy for wide applications, the algorithms developed for the challenge incorporated domain adaptation, transfer learning [94], and generative adversarial networks (GANs) [95]. By contrast, our study is specialized for application to mallard farming in remote farming [2].
In comparison to a large number of previous studies mentioned above, the novelty and contributions of this study are as follows:
  • The proposed method can detect the position of a sound source of pre-recorded mallard calls output from a speaker using two parameters obtained from our originally developed microphone arrays;
  • Compared with existing sound-based methods, our study results provide a detailed evaluation that comprises 57 positions in total through three evaluation experiments;
  • To the best of our knowledge, this is the first study to demonstrate and evaluate mallard detection based on DAS beamforming in the wild.
Based on the underlying method used in drone detection [96] and our originally developed single-sensor platform [97], this paper presents a novel sensing system and reports results obtained from outdoor experiments.

3. Proposed Method

3.1. Position Estimation Principle

Figure 2 depicts the arrangement of the two microphone arrays for position estimation. The direction perpendicular to each array was set as the 0° direction of the beam. Let (x, y) be the coordinates of an estimation target position relative to the origin O. The two microphone arrays, designated as M1 and M2, are installed with a horizontal separation p_x and a vertical separation p_y.
Let θ_1 and θ_2 denote the angles between the straight line to (x, y) and the lines perpendicular to p_x and p_y, respectively, and let L_1 and L_2 represent the straight-line distances between the respective microphone arrays and (x, y). Using trigonometric relations, p_x and p_y are given as presented below [98].
$$p_x = L_1 \cos\theta_1 + L_2 \sin\theta_2 ,$$
$$L_1 = \frac{p_x - L_2 \sin\theta_2}{\cos\theta_1} ,$$
$$\begin{aligned}
p_y &= L_1 \sin\theta_1 + L_2 \cos\theta_2 \\
    &= \frac{(p_x - L_2 \sin\theta_2)\,\sin\theta_1}{\cos\theta_1} + \frac{L_2 \cos\theta_1 \cos\theta_2}{\cos\theta_1} \\
    &= \frac{L_2 \cos\theta_1 \cos\theta_2}{\cos\theta_1} - \frac{L_2 \sin\theta_1 \sin\theta_2}{\cos\theta_1} + \frac{p_x \sin\theta_1}{\cos\theta_1} .
\end{aligned}$$
The following expanded equation is obtained by solving for L_2:
$$\begin{aligned}
L_2\,\frac{\cos\theta_1 \cos\theta_2 - \sin\theta_1 \sin\theta_2}{\cos\theta_1} &= p_y - \frac{p_x \sin\theta_1}{\cos\theta_1} \\
L_2\,\frac{\cos(\theta_1 + \theta_2)}{\cos\theta_1} &= \frac{p_y \cos\theta_1 - p_x \sin\theta_1}{\cos\theta_1} \\
L_2 &= \frac{p_y \cos\theta_1 - p_x \sin\theta_1}{\cos(\theta_1 + \theta_2)} .
\end{aligned}$$
One can obtain (x, y) as follows:
$$\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} p_x - L_2 \sin\theta_2 \\ L_2 \cos\theta_2 \end{pmatrix} .$$
Alternatively, using L_2, one can obtain p_y as follows:
$$L_2 = \frac{p_x - L_1 \cos\theta_1}{\sin\theta_2} ,$$
$$\begin{aligned}
p_y &= L_1 \sin\theta_1 + \frac{(p_x - L_1 \cos\theta_1)\,\cos\theta_2}{\sin\theta_2} \\
    &= \frac{L_1 \sin\theta_1 \sin\theta_2}{\sin\theta_2} + \frac{p_x \cos\theta_2}{\sin\theta_2} - \frac{L_1 \cos\theta_1 \cos\theta_2}{\sin\theta_2} .
\end{aligned}$$
The following expanded equation is obtained by solving for L_1:
$$\begin{aligned}
L_1\,\frac{\sin\theta_1 \sin\theta_2 - \cos\theta_1 \cos\theta_2}{\sin\theta_2} &= p_y - \frac{p_x \cos\theta_2}{\sin\theta_2} \\
-L_1\,\frac{\cos(\theta_1 + \theta_2)}{\sin\theta_2} &= \frac{p_y \sin\theta_2 - p_x \cos\theta_2}{\sin\theta_2} \\
L_1 &= \frac{p_x \cos\theta_2 - p_y \sin\theta_2}{\cos(\theta_1 + \theta_2)} .
\end{aligned}$$
Using L_1 and θ_1, one can obtain (x, y) as follows:
$$\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} L_1 \cos\theta_1 \\ p_y - L_1 \sin\theta_1 \end{pmatrix} .$$
Angles θ_1 and θ_2 are calculated using the DAS beamforming method, as shown in Figure 3.
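As a worked illustration of the triangulation above, the following minimal Python sketch computes (x, y) from the two estimated angles. It assumes the geometry reconstructed here (M1 at (0, p_y) with its 0° beam along the x-axis and M2 at (p_x, 0) with its 0° beam along the y-axis, matching the arrangement of Experiments B and C); the function name and the numerical example are ours, not taken from the paper.

```python
import numpy as np

def localize_from_angles(theta1_deg, theta2_deg, px, py):
    """Triangulate the source position (x, y) from the two beamforming angles.

    Assumes M1 at (0, py) measuring theta1 from the x-direction and
    M2 at (px, 0) measuring theta2 from the y-direction.
    """
    t1, t2 = np.deg2rad(theta1_deg), np.deg2rad(theta2_deg)
    denom = np.cos(t1 + t2)
    if abs(denom) < 1e-9:
        # Target lies on the M1-M2 diagonal: the two bearing lines do not
        # intersect at a unique point (theta1 + theta2 = 90 deg).
        raise ValueError("position on the diagonal cannot be determined")
    L2 = (py * np.cos(t1) - px * np.sin(t1)) / denom   # distance from M2 to the source
    return px - L2 * np.sin(t2), L2 * np.cos(t2)       # (x, y)

# Example: px = py = 30 m and a source at (10, 10) gives theta1 = theta2 = 63.43 deg.
# localize_from_angles(63.43, 63.43, 30.0, 30.0)  # -> approximately (10.0, 10.0)
```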

3.2. DAS Beamforming Algorithm

Beamforming is a versatile technology used for directional signal enhancement with sensor arrays [99]. Letting y(t) be the beamformer output signal at time t and letting M be the number of microphones, then for z_m(t) and w_m(t), which respectively denote the measured signal and the filter of the m-th sensor, y(t) is calculated as
$$y(t) = \sum_{m=1}^{M} w_m(t) \otimes z_m(t),$$
where the symbol ⊗ represents convolution.
Based on our previous study [96], we considered DAS beamforming in the temporal domain. Assuming that a single plane wave exists and letting s_m(t) be the set of acoustic signals, the delay τ_m expressed in the formula below occurs for the incident wave observed at the m-th sensor:
$$z_m(t) = s_m(t - \tau_m), \quad m = 1, \dots, M,$$
where M represents the total number of microphones.
The delay τ_m of the incident wave is offset by an advance of +τ_m in the filter. Signals from the direction θ are enhanced because the phases of the signals s_m(t) are aligned across all channels. The temporally compensated filter w_m(t) is defined as
$$w_m(t) = \frac{1}{M}\,\delta(t + \tau_m),$$
where δ is Dirac's delta function.
Letting θ be an angle obtained using beamforming, then, for the comparison of acoustic directions, the relative mean power level G(θ) of y(t) is defined as
$$G(\theta) = \frac{1}{T} \sum_{t=0}^{T} y^{2}(t),$$
where T represents the interval time length.
We changed θ from −90° to 90° in 1° intervals. Let P_1(θ) and P_2(θ) respectively denote G(θ) obtained from M1 and M2. Using P_1(θ) and P_2(θ), one can obtain θ_1 and θ_2 as
$$\theta_n = \underset{-90^\circ \le \theta \le 90^\circ}{\arg\max}\; P_n(\theta), \qquad n \in \{1, 2\}.$$
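To make the DAS computation concrete, the sketch below scans the steering angle from −90° to 90° in 1° steps for a single uniform linear array and returns the relative mean power G(θ), whose argmax gives the estimated direction. It is a minimal sketch assuming plane-wave propagation and integer-sample delays; the function name, example parameters, and use of NumPy are our choices, not the authors' implementation.

```python
import numpy as np

def das_power(signals, fs, d, c=343.0, angles_deg=np.arange(-90, 91)):
    """Relative mean power G(theta) of a delay-and-sum beamformer.

    signals: (M, T) array of simultaneously recorded microphone signals
    fs: sampling rate [Hz]; d: microphone spacing [m]; c: speed of sound [m/s]
    """
    M, T = signals.shape
    mic_pos = (np.arange(M) - (M - 1) / 2) * d        # positions along the array [m]
    G = np.zeros(len(angles_deg))
    for i, theta in enumerate(np.deg2rad(angles_deg)):
        tau = mic_pos * np.sin(theta) / c             # plane-wave delay per channel [s]
        shifts = np.round(tau * fs).astype(int)       # integer-sample approximation
        y = np.zeros(T)
        for m in range(M):
            # Compensate each channel's steering delay (circular shift for brevity)
            # so that signals arriving from angle theta add coherently.
            y += np.roll(signals[m], -shifts[m])
        y /= M
        G[i] = np.mean(y ** 2)                        # relative mean power level
    return angles_deg, G

# angles, G = das_power(z, fs=16000, d=0.030)
# theta_hat = angles[int(np.argmax(G))]               # estimated source direction
```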

4. Measurement System

4.1. Mount Design

For this study, we designed an original sensor mount on which the microphones and electric devices were installed: an amplifier, an analog–digital converter, a battery, an inverter, and a laptop computer. Figure 4 depicts the design and the assembled mount. We used 20 mm square aluminum pipes for the main frame. The terminals were joined with L-shaped connectors, bolts, and nuts. Acrylic boards were used for the plate located at a height of 575 mm, on which the electrical devices were set. The microphones were installed on a 1200 mm long bracket arm as a straight linear array with a 30 mm gap between adjacent microphones.
Regarding the installation of the microphones, spatial foldback (aliasing) occurs in high-frequency signals if the gap between microphones is wide. In general [98], the spacing d between microphones and the upper frequency f at which no foldback occurs are related as
$$d \le \frac{c}{2f},$$
where the constant c is the speed of sound. We confirmed from our preliminary experimental results that the power of mallard calls is concentrated in the band below 5 kHz. Therefore, we installed the microphones at 30 mm intervals because the upper limit is approximately 5.6 kHz for d = 0.030 m. In addition, if d is narrow, the directivity is reduced at low frequencies.
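As a quick numerical check of the spacing condition above, a couple of lines of Python reproduce the quoted upper limit; the speed of sound used here is an assumed round value, so the result lands near the approximately 5.6 kHz figure rather than exactly on it.

```python
c = 340.0                   # assumed speed of sound [m/s]
d = 0.030                   # microphone spacing [m]
f_upper = c / (2 * d)       # highest frequency without spatial foldback
print(f"{f_upper:.0f} Hz")  # 5667 Hz, i.e., roughly 5.6-5.7 kHz
```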
The height from the ground to the microphones was 1150 mm. We used cable ties to tighten the microphones to the bracket arm. We also labeled the respective microphones with numbers that made it easy to check the connector cables to the converter terminals.

4.2. Microphone Array

Table 3 presents the components and their quantities for the respective microphone arrays. We introduced different devices for M1 and M2 because of various circumstances, such as the introduction period, research budget preparation, and coordination with our related research projects. The microphones used for M1 and M2 were, respectively, the DBX RTA-M (Harman International Inds., Inc., Stamford, CT, USA) and the ECM-8000 (Behringer, Music Tribe, Makati, Metro Manila, Philippines). Table 4 presents detailed specifications of both microphones. The common specification parameters are an omnidirectional polar pattern and a 20–20,000 Hz frequency range. For the amplifier and analog-to-digital (AD) converter, we used separate devices for M1 and an integrated device for M2. Both converters include a built-in anti-aliasing filter. Under the arrayed microphone interval of d = 0.030 m, grating lobes may appear at and above the upper limit of approximately 5.6 kHz. However, we assumed that this effect is relatively small because the power of mallard calls is concentrated in the band below 5 kHz. For the outdoor data acquisition experiments, we used portable lithium-ion batteries of two types. Moreover, to supply the measurement devices with 100 V electrical power, the commercial power supply voltage in Japan, we used an inverter to convert direct current into alternating current.
Figure 5 portrays photographs of all the components in Table 3 installed on the mount depicted in Figure 4. In all, the 64 microphones were connected to their respective amplifiers with 64 cables in parallel.

5. Evaluation Experiment

5.1. Experiment Setup

Acoustic data acquisition experiments were conducted at the Honjo Campus of Akita Prefectural University (Yurihonjo City, Akita Prefecture, Japan). This campus, located at 39°23′37″ north latitude and 140°4′23″ east longitude, has an area of 204,379 m². The left panel of Figure 6 portrays an aerial photograph of the campus. The surroundings comprise an expressway to the east and a public road to the south. The experiment was conducted at the athletic field on the west side of the campus, as depicted in the right panel of Figure 6.
We used a pre-recorded sound of a mallard as the sound source. We played this sound on a Bluetooth wireless speaker that had been loaded on a small unmanned ground vehicle (UGV) to move to the respective measurement positions. In our earlier study, we developed this UGV as a prototype for mallard navigation, particularly for remote farming [2]. We moved it remotely via a wireless network. Our evaluation target was the detection of stationary sound sources. While the UGV was stopped at an arbitrary position, we played a mallard call sound file for 10 s. The sound pressure level produced from the speaker was 75 ± 10 dB.
We conducted experiments separately on 19, 26, and 28 August 2020. Table 5 shows the meteorological conditions on the respective experimental days. All days were clear and sunny under a migratory anticyclone. Experiments were conducted during 14:00 to 15:00 Japan Standard Time (JST), which is 9 h ahead of the Coordinated Universal Time (UTC). The temperature was approximately 30 °C. The humidity was in the typical mean range for Japan. The wind speed was less than 5 m/s.
We conducted three evaluation experiments, designated as Experiments A–C. Experiment A comprised a sound source orientation estimation experiment using a single microphone array. The objective of this experiment was to verify the angular resolutions. Subsequently, Experiments B and C provided position estimation experimental results obtained using two microphone arrays. The difference between the two experimental setups was the field size and the orientations of the respective microphone arrays.

5.2. Experiment A

Figure 7 depicts the setup of the sound source positions and the microphone array for Experiment A. We used M1 alone to evaluate the angular resolution as a preliminary experiment. The sound source was placed at 26 positions: P1–P26. We divided these positions into two groups, P1–P13 and P14–P26, located, respectively, on the circumferences of half circles with radii of 10 m and 20 m. The angle interval between the respective positions, as seen from the origin, was 15°. Here, P1–P13 lay on the straight lines from the origin to P14–P26.
Let θ_g denote the ground-truth (GT) angle of θ. Figure 8 depicts the angle estimation results for 16 positions at θ_g = ±15°, ±30°, ±60°, and ±90°. The experimental results demonstrated a unimodal output distribution of G(θ). Moreover, G(θ) at positions on the 10 m radius was found to be greater than at those on the 20 m radius. The purple-filled circles denote the vertices of the respective output waves. We let θ_m represent the angle that maximizes G(θ) in −90° ≤ θ ≤ 90°, calculated from Formula (21).
Table 6 presents θ_m at the respective positions. The error E between θ_g and θ_m is calculated as
$$E = \sqrt{(\theta_g - \theta_m)^2} .$$
The experimentally obtained results demonstrated that E = 0 for −15° ≤ θ_m ≤ 15°. As an overall tendency, the angle magnitude and E showed a positive correlation, apart from the negative angles at the 10 m radius.
Figure 9 presents the two-dimensional distributions of (x_m, y_m) for comparison with (x_g, y_g), which represent the GT coordinates of the respective positions. The distance between M1 and the respective sound source positions cannot be obtained with this single-array experimental setup. Therefore, provisional positions were calculated by substituting L_1 = 10 m or 20 m.
Figure 10 presents scatter plots of θ_g and θ_m. The distribution results for L_1 = 10 m and L_1 = 20 m are presented, respectively, in the left and right panels. On the positive side, θ_m exhibited smaller values than θ_g as the angle increased; by contrast, on the negative side, it exhibited greater values. In other words, the absolute value of θ_m tended to fall below the absolute value of θ_g as the angle increased.

5.3. Experiment B

Experiment B was performed to evaluate the position estimation using the two microphone arrays, M1 and M2. Figure 11 depicts the setup of the sound source positions and the microphone arrays for Experiment B. The experiment field was a square area of 30 m in length and width. The coordinates of M1 and M2 were, respectively, (0, 30) in the upper left corner and (30, 0) in the lower right corner. The microphone frontal orientations of M1 and M2 were, respectively, horizontal and vertical. For this arrangement, positions on the diagonal between M1 and M2 were undetermined because the intersection of L_1 and L_2 is indefinite when θ_1 = θ_2. Therefore, the estimation target comprised 12 positions at 10 m intervals along the vertical and horizontal axes, excluding coordinates on the diagonal.
Figure 12 depicts the angle estimation results for 8 of 12 positions. The respective output waves exhibited a distinct peak, especially for shallow angles.
Table 7 presents the experiment results: θ_g1, θ_g2, θ_m1, θ_m2, E_1, and E_2. The mean error values of M1 and M2 were, respectively, 1.25° and 0.92°. The highest error of M1 was 7° at P7. This error increased the mean M1 error.
Table 8 presents the position estimation results. Coordinates (x_m, y_m) were calculated from θ_m1 and θ_m2 based on the proposed method (21). The error values E_x and E_y were calculated from the following:
$$\begin{pmatrix} E_x \\ E_y \end{pmatrix} = \begin{pmatrix} \sqrt{(x_g - x_m)^2} \\ \sqrt{(y_g - y_m)^2} \end{pmatrix} .$$
The mean error values of E_x and E_y were, respectively, 0.43 m and 0.42 m. In detail, P11 and P12 had no error, whereas the error at P1 was the highest. Figure 13 presents the distributions of the GT coordinates (x_g, y_g) and estimated coordinates (x_m, y_m). The six positions on the upper right show small error values; this tendency is attributed to the measurement angles of M1 and M2 being within 45°. The six positions on the bottom left show large error values; this tendency is attributed to the measurement angles of M1 and M2 being greater than 45°.
Figure 14 presents scatter plots of the GT coordinates and estimated coordinates. The distribution results for x and y are presented, respectively, in the left and right panels. Compared with Experiment A, the microphone array arrangement for Experiment B used a 90° span, which is half of the effective measurement range. However, the estimated angle error increased as the angle approached 90°. The error distribution trend demonstrated that the estimated position coordinates were larger than the GT position coordinates.

5.4. Experiment C

Figure 15 shows the experimental setup used for Experiment C. Position estimation was conducted in a rectangular area of 30 m in width and 20 m in length. The two microphone arrays, M1 and M2, were installed, respectively, at coordinates (0, 20) and (30, 0). The frontal orientations of M1 and M2 were set along their diagonal to use the small-error range around 0° effectively. The position detection interval was 5 m horizontally and vertically. For Experiment C, the only position that could not be calculated on the diagonal was (15, 10). This field was rectangular, unlike the square field used for Experiment B. The estimation target for Experiment C comprised 32 positions: P1–P32.
Figure 16 depicts the angle estimation results obtained for each group with similar vertical positions to those shown in Figure 15. The respective results demonstrated that distinct peaks shown as filled purple circles from the output waves were obtained at the source direction angles.
Table 9 presents the estimated angles and error values. The mean error values of M1 and M2 were, respectively, 0.28° and 0.53°. The error values were smaller than 2° because an effective angular range smaller than ±45° was used.
Table 10 presents the position estimation results. The mean error values of E_x and E_y were, respectively, 0.46 m and 0.38 m. The error value at P11 was the highest. Moreover, the error values at P6, P21, and P22 were high. These positions were located at shallow angles. For this setup of the microphone frontal directions, slight angular errors induced greater coordinate errors.
Figure 17 portrays the distributions of the GT positions and estimated positions. The distributions indicate that the error values were significant at positions near the diagonal between M1 and M2.
Figure 18 portrays the scatter plots between the GT coordinates and estimated coordinates for the respective axes. The distribution variation shown for the results of Experiment C was higher than that of the scatter plots for Experiment B. By contrast, the distribution variation was low in the lower left and upper right regions, which were more distant from both microphone arrays.

5.5. Discussion

In Japanese rice cultivation, paddy rice seedlings are planted with approximately 0.3 m separation. Therefore, mallards can be assumed to move on a grid with a 0.3 m pitch. Table 11 presents the simulated evaluation results of the estimation accuracy for mallard detection within grids up to the third neighborhood in each coordinate for Experiments B and C. The experimentally obtained results revealed that the proposed method can detect mallards with 67.7% accuracy within the same grid cell. Moreover, accuracies of 78.9%, 83.3%, and 88.3% were obtained, respectively, in the first, second, and third neighborhood regions. We consider that the allowable errors can be set widely for various mallard group sizes. Although this study was conducted to assess the application of the method in a rice paddy field, the accuracy can be expected to vary greatly depending on other applications.
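The neighborhood-level accuracy reported in Table 11 can, in principle, be reproduced with a simple grid comparison such as the sketch below. The paper does not state the exact counting rule, so treating the n-th neighborhood as a Chebyshev distance of n cells on the 0.3 m planting grid is our assumption, and the function name is ours.

```python
import numpy as np

def neighborhood_accuracy(gt_xy, est_xy, cell=0.3, max_order=3):
    """Fraction of estimates within the n-th grid neighborhood of the ground truth.

    gt_xy, est_xy: (N, 2) arrays of ground-truth and estimated coordinates [m].
    cell: grid pitch (0.3 m rice-planting separation). An estimate falls in the
    n-th neighborhood when both |dx| and |dy| are within n grid cells.
    """
    diff = np.abs(np.asarray(est_xy) - np.asarray(gt_xy))
    order = np.floor(np.max(diff, axis=1) / cell)   # 0 = same cell as the truth
    return {n: float(np.mean(order <= n)) for n in range(max_order + 1)}

# accuracy = neighborhood_accuracy(gt_positions, estimated_positions)
# accuracy[0] -> same cell, accuracy[1] -> first neighborhood, ...
```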

6. Conclusions

This study was undertaken to detect mallards based on acoustic information. We developed a prototype system comprising two sets of microphone arrays. In all, we used 64 microphones installed on our originally designed and assembled sensor mounts. For the benchmark evaluation, we obtained three acoustic datasets in an outdoor environment. In the first experiment, the angular resolution was evaluated using a single microphone array. An error accumulation tendency was demonstrated as the angle increased from the front side of the microphones. In the second experiment, the sound sources were estimated at 12 positions on a 10 m grid in a square area. The accumulated position errors were affected by the frontal orientation settings of the two microphone arrays. In the third experiment, sound sources were placed at 32 positions on a 5 m grid in a rectangular area. To minimize blind positions, the two microphone arrays were installed with their front sides facing diagonally. Although the error increased in the diagonal direction because of the limited angular resolution at shallow angles, the positional errors were reduced compared with the second experiment. These experimentally obtained results revealed that the proposed system demonstrated adequate accuracy for application to rice–duck farming.
As future work aimed at practical use, we expect to improve our microphone array system in terms of miniaturization and waterproofing to facilitate its deployment for remote farming. For the problem of missing sound sources on the diagonal line between the two microphone arrays, we must consider superposition approaches using probability maps. We would also like to distinguish mallard calls from noise such as the calls of larks, herons, and frogs in paddy fields. Moreover, for the protection of the ducks, we expect to combine acoustic and visual information to detect and recognize the natural enemies of mallards. For this task, we plan to develop a robot that imitates natural enemies such as crows, kites, and weasels. We also expect to improve the resolution accuracy for multiple mallard sound sources. Furthermore, we would like to discriminate mallards from crows and other birds and predict the time-series mobility paths of mallards using a DL-based approach combined with state-of-the-art backbones. Finally, we would like to actualize the efficient protection of mallards by developing a system that notifies farmers of predator intrusion based on escape behavior patterns, because physical protection of mallards, such as electric fences and nylon lines around paddy fields, requires significant effort from farmers.

Author Contributions

Conceptualization, H.M.; methodology, K.W. and M.N.; software, S.N.; validation, K.W. and M.N.; formal analysis, S.Y.; investigation, S.Y.; resources, K.W. and M.N.; data curation, S.N.; writing—original draft preparation, H.M.; writing—review and editing, H.M.; visualization, H.W.; supervision, H.M.; project administration, K.S.; funding acquisition, H.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Japan Society for the Promotion of Science (JSPS) KAKENHI Grant Numbers 17K00384 and 21H02321.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The datasets obtained in this study are available upon request from the corresponding author.

Acknowledgments

We would like to express our appreciation to Masumi Hashimoto and Tsubasa Ebe, who are graduates of Akita Prefectural University, for their great cooperation with experiments.

Conflicts of Interest

The authors declare that they have no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

The following abbreviations and acronyms are used for this report.
AD	analog-to-digital
AVSS	advanced video and signal-based surveillance
BoF	bag-of-features
BS	background subtraction
CNN	convolutional neural network
DAS	delay-and-sum
DBDC	Drone-vs-Bird Detection Challenge
DCASE	detection and classification of acoustic scenes and events
DC-CNN	densely connected convolutional neural network
DL	deep learning
EM	expectation–maximization
GAN	generative adversarial network
GMM	Gaussian mixture model
HOG	histogram of oriented gradients
JST	Japan Standard Time
LBP	local binary pattern
MF	morphological filtering
MIML	multi-instance, multi-label
ML	machine learning
PCA	principal component analysis
RCNN	regions with convolutional neural network
RF	random forest
SIFT	scale-invariant feature transform
SVM	support vector machine
UGV	unmanned ground vehicle
UTC	Coordinated Universal Time
WS-DAN	weakly supervised data augmentation network
YOLO	you only look once

References

  1. Hossain, S.; Sugimoto, H.; Ahmed, G.; Islam, M. Effect of Integrated Rice-Duck Farming on Rice Yield, Farm Productivity, and Rice-Provisioning Ability of Farmers. Asian J. Agric. Dev. 2005, 2, 79–86. [Google Scholar]
  2. Madokoro, H.; Yamamoto, S.; Nishimura, Y.; Nix, S.; Woo, H.; Sato, K. Prototype Development of Small Mobile Robots for Mallard Navigation in Paddy Fields: Toward Realizing Remote Farming. Robotics 2021, 10, 63. [Google Scholar] [CrossRef]
  3. Reiher, C.; Yamaguchi, T. Food, agriculture and risk in contemporary Japan. Contemp. Jpn. 2017, 29, 2–13. [Google Scholar] [CrossRef]
  4. Lack, D.; Varley, G. Detection of Birds by Radar. Nature 1945, 156, 446. [Google Scholar] [CrossRef]
  5. Chabot, D.; Francis, C.M. Computer-automated bird detection and counts in high-resolution aerial images: A review. J. Field Ornithol. 2016, 87, 343–359. [Google Scholar] [CrossRef]
  6. Goel, S.; Bhusal, S.; Taylor, M.E.; Karkee, M. Detection and Localization of Birds for Bird Deterrence Using UAS. In Proceedings of the 2017 ASABE Annual International Meeting, Spokane, WA, USA, 16–19 July 2017. [Google Scholar]
  7. Siahaan, Y.; Wardijono, B.A.; Mukhlis, Y. Design of Birds Detector and Repellent Using Frequency Based Arduino Uno with Android System. In Proceedings of the 2017 2nd International Conference on Information Technology, Information Systems and Electrical Engineering (ICITISEE), Yogyakarta, Indonesia, 1–3 November 2017; pp. 239–243. [Google Scholar]
  8. Aishwarya, K.; Kathryn, J.C.; Lakshmi, R.B. A Survey on Bird Activity Monitoring and Collision Avoidance Techniques in Windmill Turbines. In Proceedings of the 2016 IEEE Technological Innovations in ICT for Agriculture and Rural Development, Chennai, India, 15–16 July 2016; pp. 188–193. [Google Scholar]
  9. Bas, Y.; Bas, D.; Julien, J.F. Tadarida: A Toolbox for Animal Detection on Acoustic Recordings. J. Open Res. Softw. 2017, 5, 6. [Google Scholar] [CrossRef] [Green Version]
  10. Dong, X.; Jia, J. Advances in Automatic Bird Species Recognition from Environmental Audio. J. Phys. Conf. Ser. 2020, 1544, 012110. [Google Scholar] [CrossRef]
  11. Kahl, S.; Clapp, M.; Hopping, W.; Goëau, H.; Glotin, H.; Planqué, R.; Vellinga, W.P.; Joly, A. Overview of BirdCLEF 2020: Bird Sound Recognition in Complex Acoustic Environments. In Proceedings of the 11th International Conference of the Cross-Language Evaluation Forum for European Languages, Thessaloniki, Greece, 20–25 September 2020. [Google Scholar]
  12. Qing, C.; Dickinson, P.; Lawson, S.; Freeman, R. Automatic nesting seabird detection based on boosted HOG-LBP descriptors. In Proceedings of the 18th IEEE International Conference on Image, Brussels, Belgium, 11–14 September 2011; pp. 3577–3580. [Google Scholar]
  13. Descamps, S.; Béchet, A.; Descombes, X.; Arnaud, A.; Zerubia, J. An Automatic Counter for Aerial Images of Aggregations of Large Birds. Bird Study 2011, 58, 302–308. [Google Scholar] [CrossRef]
  14. Farrell, R.; Oza, O.; Zhang, N.; Morariu, V.I.; Darrell, T.; Davis, L.S. Birdlets: Subordinate categorization using volumetric primitives and pose-normalized appearance. In Proceedings of the International Conference on Computer Vision, Barcelona, Spain, 6–13 November 2011; pp. 161–168. [Google Scholar]
  15. Mihreteab, K.; Iwahashi, M.; Yamamoto, M. Crow birds detection using HOG and CS-LBP. In Proceedings of the International Symposium on Intelligent Signal Processing and Communications Systems, New Taipei City, Taiwan, 4–7 November 2012. [Google Scholar]
  16. Liu, J.; Belhumeur, P.N. Bird Part Localization Using Exemplar-Based Models with Enforced Pose and Subcategory Consistency. In Proceedings of the IEEE International Conference on Computer Vision, Sydney, Australia, 3–6 December 2013; pp. 2520–2527. [Google Scholar]
  17. Xu, Q.; Shi, X. A simplified bird skeleton based flying bird detection. In Proceedings of the 11th World Congress on Intelligent Control and Automation, Shenyang, China, 27–30 June 2014; pp. 1075–1078. [Google Scholar]
  18. Yoshihashi, R.; Kawakami, R.; Iida, M.; Naemura, T. Evaluation of Bird Detection Using Time-Lapse Images around a Wind Farm. In Proceedings of the European Wind Energy Association Conference, Paris, France, 17–20 November 2015. [Google Scholar]
  19. T’Jampens, R.; Hernandez, F.; Vandecasteele, F.; Verstockt, S. Automatic detection, tracking and counting of birds in marine video content. In Proceedings of the Sixth International Conference on Image Processing Theory, Tools and Applications, Oulu, Finland, 12–15 December 2016; pp. 1–6. [Google Scholar]
  20. Takeki, A.; Trinh, T.T.; Yoshihashi, R.; Kawakami, R.; Iida, M.; Naemura, T. Combining Deep Features for Object Detection at Various Scales: Finding Small Birds in Landscape Images. IPSJ Trans. Comput. Vis. Appl. 2016, 8, 5. [Google Scholar] [CrossRef] [Green Version]
  21. Takeki, A.; Trinh, T.T.; Yoshihashi, R.; Kawakami, R.; Iida, M.; Naemura, T. Detection of Small Birds in Large Images by Combining a Deep Detector with Semantic Segmentation. In Proceedings of the 2016 IEEE International Conference on Image Processing, Phoenix, AR, USA, 25–28 September 2016; pp. 3977–3981. [Google Scholar]
  22. Yoshihashi, R.; Kawakami, R.; Iida, M.; Naemura, T. Bird Detection and Species Classification with Time-Lapse Images around a Wind Farm: Dataset Construction and Evaluation. Wind Energy 2017, 20, 1983–1995. [Google Scholar] [CrossRef]
  23. Tian, S.; Cao, X.; Zhang, B.; Ding, Y. Learning the State Space Based on Flying Pattern for Bird Detection. In Proceedings of the 2017 Integrated Communications, Navigation and Surveillance Conference, Herndon, VA, USA, 18–20 April 2017; pp. 5B3-1–5B3-9. [Google Scholar]
  24. Wu, T.; Luo, X.; Xu, Q. A new skeleton based flying bird detection method for low-altitude air traffic management. Chin. J. Aeronaut. 2018, 31, 2149–2164. [Google Scholar] [CrossRef]
  25. Lee, S.; Lee, M.; Jeon, H.; Smith, A. Bird Detection in Agriculture Environment using Image Processing and Neural Network. In Proceedings of the 6th International Conference on Control, Decision and Information Technologies, Paris, France, 23–26 April 2019; pp. 1658–1663. [Google Scholar]
  26. Vishnuvardhan, R.; Deenadayalan, G.; Vijaya Gopala Rao, M.V.; Jadhav, S.P.; Balachandran, A. Automatic Detection of Flying Bird Species Using Computer Vision Techniques. In Proceedings of the International Conference on Physics and Photonics Processes in Nano Sciences, Eluru, India, 20–22 June 2019. [Google Scholar]
  27. Hong, S.J.; Han, Y.; Kim, S.Y.; Lee, A.Y.; Kim, G. Application of Deep-Learning Methods to Bird Detection Using Unmanned Aerial Vehicle Imagery. Sensors 2019, 19, 1651. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  28. Boudaoud, L.B.; Maussang, F.; Garello, R.; Chevallier, A. Marine Bird Detection Based on Deep Learning using High-Resolution Aerial Images. In Proceedings of the OCEANS 2019—Marseille, Marseille, France, 17–20 June 2019; pp. 1–7. [Google Scholar]
  29. Jo, J.; Park, J.; Han, J.; Lee, M.; Smith, A.H. Dynamic Bird Detection Using Image Processing and Neural Network. In Proceedings of the 7th International Conference on Robot Intelligence Technology and Applications, Daejeon, Korea, 1–3 November 2019; pp. 210–214. [Google Scholar]
  30. Fan, J.; Liu, X.; Wang, X.; Wang, D.; Han, M. Multi-Background Island Bird Detection Based on Faster R-CNN. Cybern. Syst. 2020, 52, 26–35. [Google Scholar] [CrossRef]
  31. Akcay, H.G.; Kabasakal, B.; Aksu, D.; Demir, N.; Öz, M.; Erdoǧan, A. Automated Bird Counting with Deep Learning for Regional Bird Distribution Mapping. Animals 2020, 10, 1207. [Google Scholar] [CrossRef]
  32. Mao, X.; Chow, J.K.; Tan, P.S.; Liu, K.; Wu, J.; Su, Z.; Cheong, Y.H.; Ooi, G.L.; Pang, C.C.; Wang, Y. Domain Randomization-Enhanced Deep Learning Models for Bird Detection. Sci. Rep. 2021, 11, 639. [Google Scholar] [CrossRef]
  33. Marcoň, P.; Janoušek, J.; Pokorný, J.; Novotný, J.; Hutová, E.V.; Širůčková, A.; Čáp, M.; Lázničková, J.; Kadlec, R.; Raichl, P.; et al. A System Using Artificial Intelligence to Detect and Scare Bird Flocks in the Protection of Ripening Fruit. Sensors 2021, 21, 4244. [Google Scholar] [CrossRef] [PubMed]
  34. Jančovič, P.; Köküer, M. Automatic Detection and Recognition of Tonal Bird Sounds in Noisy Environments. EURASIP J. Adv. Signal Process. 2011, 2011, 982936. [Google Scholar] [CrossRef] [Green Version]
  35. Briggs, F.; Lakshminarayanan, B.; Neal, L.; Fern, X.Z.; Raich, R.; Hadley, S.J.K.; Hadley, A.S.; Betts, M.G. Acoustic Classification of Multiple Simultaneous Bird Species: A Multi-Instance Multi-Label Approach. J. Acoust. Soc. Am. 2012, 131, 4640–4650. [Google Scholar] [CrossRef] [Green Version]
  36. Stowell, D.; Plumbley, M.D. Automatic large-scale classification of bird sounds is strongly improved by unsupervised feature learning. PeerJ Life Environ. 2014, 2, E488. [Google Scholar] [CrossRef] [Green Version]
  37. Papadopoulos, T.; Roberts, S.; Willis, K. Detecting bird sound in unknown acoustic background using crowdsourced training data. arXiv 2015, arXiv:1505.06443v1. [Google Scholar]
  38. de Oliveira, A.G.; Ventura, T.M.; Ganchev, T.D.; de Figueiredo, J.M.; Jahn, O.; Marques, M.I.; Schuchmann, K.-L. Bird acoustic activity detection based on morphological filtering of the spectrogram. Appl. Acoust. 2015, 98, 34–42. [Google Scholar] [CrossRef]
  39. Adavanne, S.; Drossos, K.; Cakir, E.; Virtanen, T. Stacked Convolutional and Recurrent Neural Networks for Bird Audio Detection. In Proceedings of the 25th European Signal Processing Conference, Kos, Greece, 28 August–2 September 2017; pp. 1729–1733. [Google Scholar]
  40. Pellegrini, T. Densely connected CNNs for bird audio detection. In Proceedings of the 25th European Signal Processing Conference, Kos, Greece, 28 August–2 September 2017; pp. 1734–1738. [Google Scholar]
  41. Cakir, E.; Adavanne, S.; Parascandolo, G.; Drossos, K.; Virtanen, T. Convolutional Recurrent Neural Networks for Bird Audio Detection. In Proceedings of the 25th European Signal Processing Conference, Kos, Greece, 28 August–2 September 2017; pp. 1744–1748. [Google Scholar]
  42. Kong, Q.; Xu, Y.; Plumbley, M.D. Joint detection and classification convolutional neural network on weakly labelled bird audio detection. In Proceedings of the 25th European Signal Processing Conference, Kos, Greece, 28 August–2 September 2017; pp. 1749–1753. [Google Scholar]
  43. Grill, T.; Schlüter, J. Two Convolutional Neural Networks for Bird Detection in Audio Signals. In Proceedings of the 25th European Signal Processing Conference, Kos, Greece, 28 August–2 September 2017; pp. 1764–1768. [Google Scholar]
  44. Lasseck, M. Acoustic Bird Detection with Deep Convolutional Neural Networks. In Proceedings of the IEEE AASP Challenges on Detection and Classification of Acoustic Scenes and Events, Online, 30 March–31 July 2018. [Google Scholar]
  45. Liang, W.K.; Zabidi, M.M.A. Bird Acoustic Event Detection with Binarized Neural Networks. Preprint 2020. [Google Scholar] [CrossRef]
  46. Solomes, A.M.; Stowell, D. Efficient Bird Sound Detection on the Bela Embedded System. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal, Barcelona, Spain, 4–8 May 2020; pp. 746–750. [Google Scholar]
  47. Hong, T.Y.; Zabidi, M.M.A. Bird Sound Detection with Convolutional Neural Networks using Raw Waveforms and Spectrograms. In Proceedings of the International Symposium on Applied Science and Engineering, Erzurum, Turkey, 7–9 April 2021. [Google Scholar]
  48. Kahl, S.; Wood, C.M.; Eibl, M.; Klinck, H. BirdNET: A deep learning solution for avian diversity monitoring. Ecol. Inform. 2021, 61, 101236. [Google Scholar] [CrossRef]
  49. Zhong, M.; Taylor, R.; Bates, N.; Christey, D.; Basnet, B.; Flippin, J.; Palkovitz, S.; Dodhia, R.; Ferres, J.L. Acoustic detection of regionally rare bird species through deep convolutional neural networks. Ecol. Inform. 2021, 64, 101333. [Google Scholar] [CrossRef]
  50. Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Li, F.F. Imagenet: A Large-Scale Hierarchical Image Database. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 248–255. [Google Scholar]
  51. Anil, R. CSE 252: Bird’s Eye View: Detecting and Recognizing Birds Using the BIRDS 200 Dataset. 2011. Available online: https://cseweb.ucsd.edu//classes/sp11/cse252c/projects/2011/ranil_final.pdf (accessed on 21 December 2021).
  52. Wah, C.; Branson, S.; Welinder, P.; Perona, P.; Belongie, S. The Caltech–UCSD Birds-200-2011 Dataset. Computation & Neural Systems Technical Report, CNS-TR-2011-001. 2011. Available online: http://www.vision.caltech.edu/visipedia/papers/CUB_200_2011.pdf (accessed on 21 December 2021).
  53. Yoshihashi, R.; Kawakami, R.; Iida, M.; Naemura, T. Construction of a bird image dataset for ecological investigations. In Proceedings of the IEEE International Conference on Image Processing, Quebec City, QC, Canada, 27–30 September 2015; pp. 4248–4252. [Google Scholar]
  54. Buxton, R.T.; Jones, I.L. Measuring nocturnal seabird activity and status using acoustic recording devices: Applications for island restoration. J. Field Ornithol. 2012, 83, 47–60. [Google Scholar] [CrossRef]
  55. Glotin, H.; LeCun, Y.; Artières, T.; Mallat, S.; Tchernichovski, O.; Halkias, X. Neural Information Processing Scaled for Bioacoustics, from Neurons to Big Data; Neural Information Processing Systems Foundation: San Diego, CA, USA, 2013. [Google Scholar]
  56. Stowell, D.; Plumbley, M.D. An open dataset for research on audio field recording archives: Freefield1010. arXiv 2013, arXiv:1309.5275v2. [Google Scholar]
  57. Goëau, H.; Glotin, H.; Vellinga, W.-P.; Rauber, A. LifeCLEF bird identification task 2014. In Proceedings of the 5th International Conference and Labs of the Evaluation Forum, Sheffield, UK, 15–18 September 2014. [Google Scholar]
  58. Salamon, J.; Jacoby, C.; Bello, J.P. A dataset and taxonomy for urban sound research. In Proceedings of the 22nd ACM International Conference on Multimedia, Orlando, FL, USA, 3–7 November 2014; pp. 1041–1044. [Google Scholar]
  59. Vellinga, W.P.; Planqué, R. The Xeno-Canto collection and its relation to sound recognition and classification. In Proceedings of the 6th International Conference and Labs of the Evaluation Forum, Toulouse, France, 8–11 September 2015. [Google Scholar]
  60. Stowell, D.; Giannoulis, D.; Benetos, E.; Lagrange, M.; Plumbley, M.D. Detection and Classification of Audio Scenes and Events. IEEE Trans. Multimed. 2015, 17, 1733–1746. [Google Scholar] [CrossRef]
  61. Salamon, J.; Bello, J.P.; Farnsworth, A.; Robbins, M.; Keen, S.; Klinck, H.; Kelling, S. Towards the Automatic Classification of Avian Flight Calls for Bioacoustic Monitoring. PLoS ONE 2016, 11, e0166866. [Google Scholar] [CrossRef]
  62. Stowell, D.; Wood, M.; Stylianou, Y.; Glotin, H. Bird Detection in Audio: A Survey and a Challenge. In Proceedings of the IEEE 26th International Workshop on Machine Learning for Signal Processing, Salerno, Italy, 13–16 September 2016; pp. 1–6. [Google Scholar]
  63. Darras, K.; Pütz, P.; Fahrurrozi; Rembold, K.; Tscharntke, T. Measuring sound detection spaces for acoustic animal sampling and monitoring. Biol. Conserv. 2016, 201, 29–37. [Google Scholar] [CrossRef]
  64. Hervás, M.; Alsina-Pagés, R.M.; Alias, F.; Salvador, M. An FPGA-Based WASN for Remote Real-Time Monitoring of Endangered Species: A Case Study on the Birdsong Recognition of Botaurus stellaris. Sensors 2017, 17, 1331. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  65. Stowell, D.; Stylianou, Y.; Wood, M.; Pamula, H.; Glotin, H. Automatic acoustic detection of birds through deep learning: The first Bird Audio Detection challenge. Methods Ecol. Evol. 2018, 10, 368–380. [Google Scholar] [CrossRef] [Green Version]
  66. MacQueen, J.B. Some Methods for classification and Analysis of Multivariate Observations. In Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability; University of California Press: Berkeley, CA, USA, 1967; pp. 281–297. [Google Scholar]
  67. Serra, J.; Vincent, L. An overview of morphological filtering. Circuits Syst. Signal Process. 1992, 11, 47–108. [Google Scholar] [CrossRef] [Green Version]
  68. Moon, T.K. The expectation-maximization algorithm. IEEE Signal Process. Mag. 1996, 13, 47–60. [Google Scholar] [CrossRef]
  69. Freund, Y.; Schapire, R.E. A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting. J. Comput. Syst. Sci. 1997, 55, 119–139. [Google Scholar] [CrossRef] [Green Version]
  70. Lowe, D.G. Object recognition from local scale-invariant features. In Proceedings of the International Conference on Computer Vision, Corfu, Greece, 20–25 September 1999. [Google Scholar]
  71. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef] [Green Version]
  72. Csurka, G.; Dance, C.R.; Fan, L.; Willamowski, J.; Bray, C. Visual categorization with bags of keypoints. In Proceedings of the 8th European Conference on Computer Vision, Prague, Czech Republic, 11–14 May 2004; pp. 1–22. [Google Scholar]
  73. Dalal, N.; Triggs, B. Histograms of oriented gradients for human detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, San Diego, CA, USA, 20–25 June 2005; pp. 886–893. [Google Scholar]
  74. Boser, B.; Guyon, I.; Vapnik, V. A training algorithm for optimal margin classifiers. In Proceedings of the Fifth Annual Workshop on Computational Learning Theory, Pittsburgh, PA, USA, 27–29 July 1992; pp. 144–152. [Google Scholar]
  75. Heikkila, M.; Schmid, C. Description of interest regions with local binary patterns. Pattern Recognit. 2009, 42, 425–436. [Google Scholar] [CrossRef] [Green Version]
  76. Cristani, M.; Farenzena, M.; Bloisi, D.; Murino, V. Background Subtraction for Automated Multisensor Surveillance: A Comprehensive Review. EURASIP J. Adv. Signal Process. 2010, 1, 24. [Google Scholar] [CrossRef] [Green Version]
  77. Zhou, Z.H.; Zhang, M.L.; Huang, S.J.; Li, Y.F. Multi-Instance Multi-Label Learning. Artif. Intell. 2012, 176, 2291–2320. [Google Scholar] [CrossRef] [Green Version]
  78. Rumelhart, D.E.; Hinton, G.E.; Williams, R.J. Learning representations by back-propagating errors. Nature 1986, 323, 533–536. [Google Scholar] [CrossRef]
  79. Lecun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324. [Google Scholar] [CrossRef] [Green Version]
  80. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 24–27 June 2014; pp. 580–587. [Google Scholar]
  81. Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv 2015, arXiv:1409.1556v6. [Google Scholar]
  82. Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going Deeper with Convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 1–9. [Google Scholar]
  83. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
  84. Rastegari, M.; Ordonez, V.; Redmon, J.; Farhadi, A. XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks. In Proceedings of the 14th European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; pp. 525–542. [Google Scholar]
  85. Huang, G.; Liu, Z.; van der Maaten, L.; Weinberger, K.Q. Densely Connected Convolutional Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 22–25 July 2017; pp. 4700–4708. [Google Scholar]
  86. Wang, X.; Shrivastava, A.; Gupta, A. A-Fast-RCNN: Hard Positive Generation via Adversary for Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 22–25 July 2017; pp. 2606–2615. [Google Scholar]
  87. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  88. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar]
  89. Hu, T.; Qi, H.; Huang, Q.; Lu, Y. See Better before Looking Closer: Weakly Supervised Data Augmentation Network for Fine-Grained Visual Classification. arXiv 2019, arXiv:1901.09891. [Google Scholar]
  90. Zinemanas, P.; Cancela, P.; Rocamora, M. End-to-end Convolutional Neural Networks for Sound Event Detection in Urban Environments. In Proceedings of the 24th Conference of Open Innovations Association, Moscow, Russia, 8–12 April 2019; pp. 533–539. [Google Scholar]
  91. Purwins, H.; Li, B.; Virtanen, T.; Schlüter, J.; Chang, S.; Sainath, T. Deep Learning for Audio Signal Processing. IEEE J. Sel. Top. Signal Process. 2019, 13, 206–219. [Google Scholar] [CrossRef] [Green Version]
  92. Seidailyeva, U.; Akhmetov, D.; Ilipbayeva, L.; Matson, E.T. Real-Time and Accurate Drone Detection in a Video with a Static Background. Sensors 2020, 20, 3856. [Google Scholar] [CrossRef]
  93. Coluccia, A.; Ghenescu, M.; Piatrik, T.; De Cubber, G.; Schumann, A.; Sommer, L.; Klatte, J.; Schuchert, T.; Beyerer, J.; Farhadi, M.; et al. Drone-vs-Bird Detection Challenge at IEEE AVSS2019. In Proceedings of the 16th IEEE International Conference on Advanced Video and Signal Based Surveillance, Taipei, Taiwan, 18–21 September 2019; pp. 1–7. [Google Scholar]
  94. Pan, S.J.; Yang, Q. A Survey on Transfer Learning. IEEE Trans. Knowl. Data Eng. 2010, 22, 1345–1359. [Google Scholar] [CrossRef]
  95. Pan, Z.; Yu, W.; Yi, X.; Khan, A.; Yuan, F.; Zheng, Y. Recent Progress on Generative Adversarial Networks (GANs): A Survey. IEEE Access 2019, 7, 36322–36333. [Google Scholar] [CrossRef]
  96. Madokoro, H.; Yamamoto, S.; Watanabe, K.; Nishiguchi, M.; Nix, S.; Woo, H.; Sato, K. Prototype Development of Cross-Shaped Microphone Array System for Drone Localization Based on Delay-and-Sum Beamforming in GNSS-Denied Areas. Drones 2021, 5, 123. [Google Scholar] [CrossRef]
  97. Hashimoto, M.; Madokoro, H.; Watanabe, K.; Nishiguchi, M.; Yamamoto, S.; Woo, H.; Sato, K. Mallard Detection using Microphone Array and Delay-and-Sum Beamforming. In Proceedings of the 19th International Conference on Control, Automation and Systems, Jeju, Korea, 15–18 October 2019; pp. 1566–1571. [Google Scholar]
  98. Van Trees, H.L. Optimum Array Processing; Wiley: New York, NY, USA, 2002. [Google Scholar]
  99. Veen, B.; Buckley, K. Beamforming: A versatile approach to spatial filtering. IEEE ASSP Mag. 1988, 5, 4–24. [Google Scholar] [CrossRef]
Figure 1. Mallards for rice–duck farming (left). After some growth of rice plants (right), mallards are hidden by the plants. Detecting mallards visually would be extremely difficult.
Figure 2. Arrangement of two microphone arrays M1 and M2 for position estimation.
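Figure 2 conveys the key geometric idea: a single array yields only a bearing angle toward the sound source, whereas two arrays at known positions yield a point estimate at the intersection of the two bearing lines. The following is a minimal triangulation sketch under assumptions chosen purely for illustration (angles measured from each array's +y reference axis, positive toward +x); it is not the paper's implementation, and the actual geometry is the one defined in Figures 2, 11, and 15.

```python
import math

def intersect_bearings(p1, theta1_deg, p2, theta2_deg):
    """Intersect two bearing lines to estimate a source position.

    p1, p2      : (x, y) positions of microphone arrays M1 and M2 [m]
    theta*_deg  : bearing angles measured from the +y axis, positive toward +x
                  (an illustrative convention, not necessarily the paper's)
    Returns (x, y) of the point where the two bearing rays intersect.
    """
    d1 = (math.sin(math.radians(theta1_deg)), math.cos(math.radians(theta1_deg)))
    d2 = (math.sin(math.radians(theta2_deg)), math.cos(math.radians(theta2_deg)))
    # Solve p1 + t1*d1 = p2 + t2*d2 for t1 (Cramer's rule on columns [d1, -d2]).
    det = d1[0] * (-d2[1]) - d1[1] * (-d2[0])
    if abs(det) < 1e-9:
        raise ValueError("Bearing lines are (nearly) parallel; no unique intersection.")
    rx, ry = p2[0] - p1[0], p2[1] - p1[1]
    t1 = (rx * (-d2[1]) - ry * (-d2[0])) / det
    return (p1[0] + t1 * d1[0], p1[1] + t1 * d1[1])

# Hypothetical check: arrays at (0, 0) and (30, 0), source at (15, 20)
# intersect_bearings((0, 0), 36.87, (30, 0), -36.87)  ->  approximately (15.0, 20.0)
```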
Figure 3. Delay-and-sum (DAS) beamforming in the time domain [96].
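Figure 3 depicts the time-domain delay-and-sum operation: for each candidate steering angle, every channel is shifted by the propagation delay a plane wave from that angle would produce across the array, the shifted channels are summed, and the angle that maximizes the output power is taken as the direction of arrival. The sketch below is only a schematic far-field, integer-sample illustration of that scan, not the processing pipeline of [96] or of this paper.

```python
import numpy as np

def das_doa(signals, fs, mic_positions, c=343.0, angles_deg=np.arange(-90, 91)):
    """Schematic time-domain delay-and-sum DOA scan (far-field plane-wave model).

    signals       : (n_mics, n_samples) array of synchronously sampled channels
    fs            : sampling rate [Hz]
    mic_positions : (n_mics, 2) microphone coordinates [m]
    c             : assumed speed of sound [m/s]
    Returns the candidate angle (deg) whose beamformed output has maximum power.
    """
    powers = []
    for ang in np.deg2rad(angles_deg):
        # Unit vector pointing from the array toward the candidate source direction
        u = np.array([np.sin(ang), np.cos(ang)])
        # A plane wave from direction u reaches microphones with a larger projection
        # onto u first; lagging channels are advanced by the extra travel time.
        proj = mic_positions @ u
        lags = (proj.max() - proj) / c
        shifts = np.round(lags * fs).astype(int)
        n = signals.shape[1] - int(shifts.max())
        aligned_sum = sum(ch[s:s + n] for ch, s in zip(signals, shifts))
        powers.append(float(np.mean(aligned_sum ** 2)))
    return float(angles_deg[int(np.argmax(powers))])
```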
Figure 4. Originally designed and assembled sensor mount.
Figure 5. Photographs showing the appearance of M1 (left) and M2 (right) after the setup.
Figure 6. Aerial photograph showing the experimental environment (left) and UGV used to move the sound source positions (right).
Figure 7. Setup of sound source positions and the microphone array M1 for Experiment A.
Figure 8. Angle estimation results at 16 positions (θ_g = ±15°, ±30°, ±60°, and ±90°).
Figure 9. Comparative positional distributions between (x_g, y_g) and (x_m, y_m) found from Experiment A.
Figure 10. Scatter plots of θ_g and θ_m for P1–P13 (left) and P14–P26 (right).
Figure 11. Setup of sound source positions and microphone arrays for Experiment B.
Figure 12. Angle estimation results obtained at eight positions (P1, P2, P4, P5, P8, P9, P11, and P12).
Figure 13. Positional distributions for the GT positions and estimated positions for Experiment B.
Figure 14. Scatter plots showing the GT coordinates and estimated coordinates for Experiment B.
Figure 15. Setup of sound source positions and microphone arrays for Experiment C.
Figure 16. Angle estimation results obtained for P1–P32 (y = 0, 5, 10, 15, and 20).
Figure 17. Positional distributions between the GT positions and estimated positions for Experiment C.
Figure 18. Scatter plots showing the GT coordinates and estimated coordinates for Experiment C.
Table 1. Representative studies of vision-based bird detection methods reported during the last decade.
Year | Authors | Method | Dataset
2011 | Qing et al. [12] | Boosted HOG-LBP + SVM | original
2011 | Descamps et al. [13] | Image energy | original
2011 | Farrell et al. [14] | HOG + SVM + PNAD | original
2012 | Mihreteab et al. [15] | HOG + CS-LBP + linear SVM | original
2013 | Liu et al. [16] | HOG + linear SVM | [52]
2014 | Xu et al. [17] | linear SVM classifier | original
2015 | Yoshihashi et al. [18] | BS + CNN | [18]
2016 | T’Jampens et al. [19] | SBGS + SIFT + BoW + SVM | original
2016 | Takeki et al. [20] | CNN (ResNet) + FCNs + DeepLab | original
2016 | Takeki et al. [21] | CNN + FCN + SP | original
2017 | Yoshihashi et al. [22] | CNN (ResNet) | [53]
2017 | Tian et al. [23] | Faster RCNN | original
2018 | Wu et al. [24] | Skeleton-based MPSC | [50,52]
2019 | Lee et al. [25] | CNN | original
2019 | Vishnuvardhan et al. [26] | Faster RCNN | original
2019 | Hong et al. [27] | Faster RCNN | original
2019 | Boudaoud et al. [28] | CNN | original
2019 | Jo et al. [29] | CNN (Inception-v3) | original
2020 | Fan et al. [30] | RCNN | original
2020 | Akcay et al. [31] | CNN + RPN + Fast-RCNN | original
2021 | Mao et al. [32] | Faster RCNN (ResNet-50) | original
2021 | Marcoň et al. [33] | RNN + CNN | original
Table 2. Representative studies of sound-based bird detection methods reported during the last decade.
YearAuthorsMethodDataset
2011Jančovič et al. [34]GMMoriginal
2012Briggs et al. [35]MIMLoriginal
2014Stowell et al. [36]PCA + k-means + RF[55]
2015Papadopoulos et al. [37]GMM[60]
2015Oliveira et al. [38]DFT + MForiginal
2017Adavanne et al. [39]CBRNN[62]
2017Pellegrini et al. [40]DenseNet[62]
2017Cakir et al. [41]CRNN[62]
2017Kong et al. [42]JDC-CNN (VGG)[56,62]
2017Grill et al. [43]CNN[56]
2018Lassecket al. [44]CNN[65]
2020Liang et al. [45]BNN (XNOR-Net)[58,61]
2020Solomes et al. [46]CNN[65]
2021Hong et al. [47]2D-CNN[58,59]
2021Kahl et al. [48]CNN (ResNet-157)original
2021Zhong et al. [49]CNN (ResNet-50) + GANoriginal
Table 3. System components and quantities for the respective microphone arrays: M1 and M2.
Item | M1 | Amount | M2 | Amount
Microphone | DBX RTA-M | ×32 | Behringer ECM8000 | ×32
Amplifier | MP32 | ×1 | ADA8200 | ×4
AD converter | Orion32 | ×1 | (included in amplifier) | —
Battery | Jackery 240 | ×1 | Anker 200 | ×1
Table 4. Detailed specifications of microphones of two types.
Parameter | DBX RTA-M | Behringer ECM8000
Polar pattern | Omnidirectional | Omnidirectional
Frequency range | 20–20,000 Hz | 20–20,000 Hz
Impedance | 259 Ω ± 30% (@1 kHz) | 600 Ω
Sensitivity | −63 ± 3 dB | −60 dB
Mic head diameter | 10 mm | (no data)
Length | 145 mm | 193 mm
Weight | (no data) | 120 g
Table 5. Meteorological conditions on the respective experiment days.
Parameter | Experiment A | Experiment B | Experiment C
Date | 19 August 2020 | 26 August 2020 | 28 August 2020
Time (JST) | 14:00–15:00 | 14:00–15:00 | 14:00–15:00
Weather | Sunny | Sunny | Sunny
Air pressure (hPa) | 1008.1 | 1008.8 | 1007.2
Temperature (°C) | 28.6 | 31.7 | 33.4
Humidity (%) | 65 | 53 | 58
Wind speed (m/s) | 4.3 | 3.8 | 4.9
Wind direction | WNW | WNW | W
Table 6. Angle estimation results and error values found from Experiment A (°).
Position | θ_g | θ_m | E | Position | θ_g | θ_m | E
P1 | −90 | −90 | 0 | P14 | −90 | −83 | 7
P2 | −75 | −72 | 3 | P15 | −75 | −68 | 7
P3 | −60 | −54 | 6 | P16 | −60 | −56 | 4
P4 | −45 | −44 | 1 | P17 | −45 | −42 | 3
P5 | −30 | −29 | 1 | P18 | −30 | −29 | 1
P6 | −15 | −15 | 0 | P19 | −15 | −15 | 0
P7 | 0 | 0 | 0 | P20 | 0 | 0 | 0
P8 | 15 | 15 | 0 | P21 | 15 | 15 | 0
P9 | 30 | 29 | 1 | P22 | 30 | 29 | 1
P10 | 45 | 43 | 2 | P23 | 45 | 43 | 2
P11 | 60 | 54 | 6 | P24 | 60 | 54 | 6
P12 | 75 | 67 | 8 | P25 | 75 | 63 | 12
P13 | 90 | 74 | 16 | P26 | 90 | 72 | 18
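The error column E in Tables 6, 7, and 9 appears to be the absolute difference between the ground-truth and estimated angles; as a worked instance (our reading of the table, not a definition quoted from the text), for P13 in Table 6:

$$E = \lvert \theta_g - \theta_m \rvert, \qquad E_{\mathrm{P13}} = \lvert 90^{\circ} - 74^{\circ} \rvert = 16^{\circ}.$$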
Table 7. Angle estimation results and error values found from Experiment B (°).
Position | θ_g1 | θ_g2 | θ_m1 | θ_m2 | E_1 | E_2
P1 | −90 | 90 | −86 | 87 | 4 | 3
P2 | −72 | 90 | −71 | 87 | 1 | 3
P3 | −56 | 90 | −56 | 87 | 0 | 3
P4 | −90 | 72 | −87 | 72 | 3 | 0
P5 | −63 | 63 | −63 | 62 | 0 | 1
P6 | −34 | 0 | −34 | 0 | 0 | 0
P7 | −90 | 56 | −83 | 55 | 7 | 1
P8 | −27 | 27 | −27 | 27 | 0 | 0
P9 | −18 | 0 | −18 | 0 | 0 | 0
P10 | 0 | 34 | 0 | 34 | 0 | 0
P11 | 0 | 18 | 0 | 18 | 0 | 0
P12 | 0 | 0 | 0 | 0 | 0 | 0
Table 8. Position estimation results obtained from Experiment B (m).
Position | x_g | y_g | x_m | y_m | E_x | E_y
P1 | 0 | 0 | 1.47 | 2.00 | 1.47 | 2.00
P2 | 0 | 10 | 1.05 | 9.97 | 1.05 | 0.03
P3 | 0 | 20 | 0.53 | 19.88 | 0.53 | 0.12
P4 | 10 | 0 | 9.40 | 1.08 | 0.60 | 1.08
P5 | 10 | 10 | 10.73 | 9.82 | 0.73 | 0.18
P6 | 10 | 30 | 9.76 | 30.00 | 0.24 | 0.00
P7 | 20 | 0 | 20.16 | 1.21 | 0.16 | 1.21
P8 | 20 | 20 | 19.87 | 19.87 | 0.13 | 0.13
P9 | 20 | 30 | 20.25 | 30.00 | 0.25 | 0.00
P10 | 30 | 10 | 30.00 | 9.76 | 0.00 | 0.24
P11 | 30 | 20 | 30.00 | 20.25 | 0.00 | 0.00
P12 | 30 | 30 | 30.00 | 30.00 | 0.00 | 0.00
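Similarly, the E_x and E_y columns of Tables 8 and 10 appear to be per-axis absolute errors; again as our reading rather than a quoted definition, for P1 of Table 8:

$$E_x = \lvert x_g - x_m \rvert = \lvert 0 - 1.47 \rvert = 1.47\ \text{m}, \qquad E_y = \lvert y_g - y_m \rvert = \lvert 0 - 2.00 \rvert = 2.00\ \text{m}.$$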
Table 9. Angle estimation results and errors obtained from Experiment C (°).
Position | θ_g1 | θ_g2 | θ_m1 | θ_m2 | E_1 | E_2
P1 | 45 | −45 | 44 | −45 | 1 | 0
P2 | 31 | −45 | 31 | −45 | 0 | 0
P3 | 18 | −45 | 18 | −44 | 0 | 1
P4 | 8 | −45 | 7 | −43 | 1 | 2
P5 | 0 | −45 | 0 | −44 | 0 | 1
P6 | −6 | −45 | −5 | −44 | 1 | 1
P7 | 45 | −36 | 44 | −34 | 1 | 2
P8 | 27 | −34 | 27 | −33 | 0 | 1
P9 | 11 | −31 | 11 | −31 | 0 | 0
P10 | 0 | −27 | 0 | −27 | 0 | 0
P11 | −8 | −18 | −7 | −17 | 1 | 1
P12 | −14 | 0 | −14 | 0 | 0 | 0
P13 | −18 | 45 | −18 | 45 | 0 | 0
P14 | 45 | −27 | 45 | −25 | 0 | 2
P15 | 18 | −23 | 18 | −23 | 0 | 0
P16 | 0 | −18 | 0 | −18 | 0 | 0
P17 | −18 | 0 | −18 | 0 | 0 | 0
P18 | −23 | 18 | −23 | 18 | 0 | 0
P19 | −27 | 45 | −27 | 45 | 0 | 0
P20 | 45 | −18 | 44 | −18 | 1 | 0
P21 | 0 | −14 | 0 | −13 | 0 | 1
P22 | −18 | −8 | −18 | −7 | 0 | 1
P23 | −27 | 0 | −27 | 0 | 0 | 0
P24 | −31 | 11 | −31 | 10 | 0 | 1
P25 | −34 | 27 | −34 | 27 | 0 | 0
P26 | −36 | 45 | −35 | 44 | 0 | 1
P27 | −45 | −6 | −45 | −6 | 0 | 0
P28 | −45 | 0 | −44 | 0 | 1 | 0
P29 | −45 | 8 | −44 | 8 | 1 | 0
P30 | −45 | 18 | −45 | 17 | 0 | 1
P31 | −45 | 31 | −44 | 30 | 1 | 1
P32 | −45 | 45 | −45 | 45 | 0 | 0
Table 10. Position estimation results obtained from Experiment C (m).
Position | x_g | y_g | x_m | y_m | E_x | E_y
P1 | 0 | 0 | 0.35 | 0.00 | 0.35 | 0.00
P2 | 5 | 0 | 4.99 | 0.00 | 0.01 | 0.00
P3 | 10 | 0 | 10.01 | 0.35 | 0.01 | 0.35
P4 | 15 | 0 | 15.22 | 0.52 | 0.22 | 0.52
P5 | 20 | 0 | 19.82 | 0.18 | 0.18 | 0.18
P6 | 25 | 0 | 23.70 | 0.11 | 1.30 | 0.11
P7 | 0 | 5 | 0.25 | 5.78 | 0.25 | 0.78
P8 | 5 | 5 | 4.75 | 5.37 | 0.25 | 0.37
P9 | 10 | 5 | 10.15 | 4.95 | 0.15 | 0.05
P10 | 15 | 5 | 15.19 | 4.81 | 0.19 | 0.19
P11 | 20 | 5 | 16.22 | 7.33 | 3.78 | 2.33
P12 | 25 | 5 | 25.05 | 4.95 | 0.05 | 0.05
P13 | 30 | 5 | 30.00 | 4.71 | 0.00 | 0.29
P14 | 0 | 10 | 0.00 | 10.92 | 0.00 | 0.92
P15 | 5 | 10 | 5.06 | 10.08 | 0.06 | 0.08
P16 | 10 | 10 | 9.61 | 10.39 | 0.39 | 0.39
P17 | 20 | 10 | 20.39 | 9.61 | 0.39 | 0.39
P18 | 25 | 10 | 24.94 | 9.92 | 0.06 | 0.08
P19 | 30 | 10 | 30.00 | 10.25 | 0.00 | 0.25
P20 | 0 | 15 | 0.08 | 15.24 | 0.08 | 0.24
P21 | 5 | 15 | 3.34 | 16.66 | 1.66 | 1.66
P22 | 10 | 15 | 12.65 | 13.55 | 2.65 | 1.45
P23 | 15 | 15 | 14.81 | 15.19 | 0.19 | 0.19
P24 | 20 | 15 | 19.38 | 15.17 | 0.62 | 0.17
P25 | 25 | 15 | 25.09 | 15.12 | 0.09 | 0.12
P26 | 30 | 15 | 29.74 | 14.76 | 0.26 | 0.24
P27 | 5 | 20 | 5.30 | 20.00 | 0.30 | 0.00
P28 | 10 | 20 | 10.18 | 19.82 | 0.18 | 0.18
P29 | 15 | 20 | 15.13 | 19.74 | 0.13 | 0.26
P30 | 20 | 20 | 19.37 | 20.00 | 0.63 | 0.00
P31 | 25 | 20 | 24.76 | 19.57 | 0.24 | 0.43
P32 | 30 | 20 | 30.00 | 20.00 | 0.00 | 0.00
Table 11. Estimation accuracies within grids up to the third neighborhood.
Experiment | Axis | 0.3 m | 0.6 m | 0.9 m | 1.2 m
B | x | 58.3% | 75.0% | 83.3% | 91.7%
B | y | 75.0% | 75.0% | 75.0% | 83.3%
C | x | 71.9% | 81.3% | 87.5% | 87.5%
C | y | 65.6% | 84.4% | 87.5% | 90.6%
Mean | — | 67.7% | 78.9% | 83.3% | 88.3%
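Table 11 reads as cumulative accuracies: for each experiment and axis, the percentage of test positions whose per-axis error (the E_x or E_y column of Tables 8 and 10) falls within the given distance threshold. A minimal sketch of that tabulation, assuming exactly this definition, reproduces the B/x row from the Table 8 errors:

```python
def cumulative_accuracy(errors, thresholds=(0.3, 0.6, 0.9, 1.2)):
    """Percentage of positions whose absolute per-axis error is within each threshold [m]."""
    return {d: 100.0 * sum(e <= d for e in errors) / len(errors) for d in thresholds}

# x-axis errors of Experiment B (E_x column of Table 8)
ex_b = [1.47, 1.05, 0.53, 0.60, 0.73, 0.24, 0.16, 0.13, 0.25, 0.00, 0.00, 0.00]
print(cumulative_accuracy(ex_b))  # approximately {0.3: 58.3, 0.6: 75.0, 0.9: 83.3, 1.2: 91.7}
```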
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
