Identification of blurred terahertz images by improved cross-layer convolutional neural network

Abstract

Terahertz imaging technology has gradually been adopted in space communication, radar detection, aerospace, and biomedical fields. Nevertheless, terahertz images still suffer from limitations such as monotone color, fuzzy texture features, poor resolution, and scarce data, which seriously hinder the application and popularization of terahertz imaging in many fields. The traditional convolutional neural network (CNN) is an effective method for image recognition, but its performance on highly blurred terahertz images is limited because terahertz images differ greatly from conventional optical images. This paper presents a proven method for raising the recognition rate of blurred terahertz images by using an improved Cross-Layer CNN model trained on a terahertz image dataset of varying definition. Compared with using a clear-image dataset alone, the accuracy of blurred image recognition is improved from about 32% to 90% with the mixed-definition dataset. Meanwhile, the recognition accuracy on highly blurred images is improved by approximately 5% compared with a traditional CNN, giving the network stronger recognition ability. It is demonstrated that various types of blurred terahertz imaging data can be effectively identified by constructing a dataset of different definitions combined with the Cross-Layer CNN. This provides a new method to improve the recognition accuracy of terahertz imaging and its robustness in real scenarios.

© 2023 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

Terahertz waves, also known as T-rays, have been extensively studied for several decades. They do not damage the DNA of human cells and can therefore serve as a safer detection wave source [1]. T-rays can penetrate a body surface and detect internal characteristics in many situations without causing structural damage. They are frequently utilized in various types of applications owing to the ultra-wide spectrum of their wavelengths. Terahertz waves are absorbed by materials at varying rates; for example, water molecules absorb terahertz waves strongly, whereas metal surfaces reflect them significantly [2]. For specific protein structures, such as those found in cancer cells, terahertz spectra exhibit distinctive absorption curves [3]. This approach is therefore widely employed in fields such as the identification of agricultural goods [4–6], detection of cancer and other pathological tissue [7–9], and hidden object detection [10–13].

Owing to the significant growth in the computing capacity of Graphics Processing Unit (GPU) devices over the last decade, GPU-based deep learning methods have iterated and evolved rapidly. Many mature network structures have been developed for the computer vision domain, from AlexNet [14] to subsequent deep learning models such as the VGG network [15], GoogLeNet [16], and MobileNet [17]. They are able to classify and identify a wide range of objects in complex settings. As a result, this outstanding image processing approach has been combined with many disciplines and tasks, and it offers a new method for a wide range of domains, including terahertz imaging. The combination of terahertz imaging and deep learning has been applied in many fields, including human safety inspection [18], biomedical imaging [19,20], quality screening of agricultural products [21], and industrial manufacturing defect inspection [22]. At present, the most serious issue with terahertz imaging is that the imaging results are ambiguous, making content recognition and detection challenging. The intrinsically long wavelength and the dependence of the imaging process on imaging distance are significant barriers to improving the resolution and recognition rate of terahertz imaging. Some scholars therefore try to increase the hardware performance of imaging equipment, while others apply a range of terahertz image super-resolution reconstruction approaches to increase imaging quality [23,24], but these still have limits in dealing with highly blurred images.

This paper provides a new method to improve the imaging result: a direct end-to-end learning method from highly blurred images to target recognition using a Cross-Layer CNN and a Multi-Power-Attenuation dataset. Specifically, the Cross-Layer CNN structure is designed according to the characteristics of blurred terahertz images: the number of high-level and low-level convolution layers is increased, and a connection channel is built between them so that low-level features can be reused. The network can thus capture both the edge features and the semantic features of the image. Secondly, a Multi-Power-Attenuation terahertz dataset is used as training data for the CNN. It is collected by a power-attenuation imaging system, in which attenuators are placed between the imaging plane and the light source and their number is gradually increased to acquire imaging data under various imaging power conditions. Finally, the data of different powers used in training can be flexibly adjusted on demand, so the network can be trained to be sensitive to highly blurred images or to perform well at any imaging power. Figure 1 shows the effect of the network on blurred images.

Fig. 1. The Cross-Layer CNN trained on the Multi-Power-Attenuation dataset can find the extremely blurred scissors that are invisible to the naked eye.

Overall, the contributions of this paper are as follows:

  • 1) A Cross-Layer CNN model is constructed, which increases the model's ability to recognize blurred images by setting a suitable convolution layer structure and establishing a learnable connection channel between the low and high levels to reuse the image's edge information.
  • 2) To recognize highly blurred images, the Multi-Power-Attenuation dataset is constructed. By collecting data at different imaging powers, images of the same sample share the same characteristics across powers, even when blurred. This dataset construction method enables the CNN to achieve higher precision on both clear and blurred images.

2. Multi-power-attenuation dataset

In this section, the previous work on the THz image dataset is reviewed, focusing on the acquisition of the original THz images and the construction of the Multi-Power-Attenuation dataset. According to the spectral characteristics of the data at different powers, the data of each layer of the Multi-Power-Attenuation dataset are analyzed.

2.1 Terahertz image acquisition

In this work, the Multi-Power-Attenuation dataset was obtained using the active THz imaging system shown in Fig. 2. The system consists of a solid-state laser, a reflecting mirror, an electrically driven mobile platform, and a THz linear camera. The output frequency of the solid-state laser is around 340 GHz. To reduce the impact of mechanical vibration on the imaging process, the moving speed of the electrically driven mobile platform is set to 0.15 m/s. The linear THz camera's capture rate is up to 5000 lines per second, giving an image acquisition rate of 5000 fps (5 kHz), and the TeraFAST-256 device is capable of a scan speed of up to 15 m/s. The pixel size is 1.5 × 3 mm, and the imaging area is 384 × 3 mm.

Fig. 2. THz-scanning system collecting the original image data.

In this work, in order to simulate the effect of terahertz imaging under different power conditions, a multi-layer attenuation device is placed between the carrier platform and the mirror. As shown in Fig. 3, the device has four layers, and the spacing between layers is adjustable. The attenuating material can be manually arranged on any layer, and the overall thickness can be adjusted by increasing or decreasing the number of attenuating sheets. The attenuating materials mainly include polyethylene plastic board, cotton fabric, thin wood board, and multi-layer cardboard.

Figure 4 shows the laser power curve for different numbers of attenuation layers.

Fig. 3. Multi-Power-Attenuation data acquisition principle and examples.

Fig. 4. Laser power curve (the colors in Fig. 4(b) identify the corresponding five attenuation-layer states).

A laser power meter was placed at the position of the camera to measure the laser power, and the decline of the laser power with increasing number of attenuation layers is shown in Fig. 4(a).

$$L_i = 10 \times \lg\left(\frac{P_0}{P_i}\right)$$

Li describes the loss of laser power under the various attenuation layers in dB, P0 is the laser power for Layer0, and Pi is the laser power for Layer-i. Without attenuation, the laser power is 3.16 mW, corresponding to Fig. 3(a). When the first attenuation layer is added, the power decreases to 1.17 mW, a power loss of 63.0%, and L1 is 4.31 dB, corresponding to the situation in Fig. 3(b). After adding the second attenuation layer, the power decreases to 0.36 mW, a loss of 88.6%, and L2 is 9.43 dB, corresponding to the situation in Fig. 3(c). After adding the third attenuation layer, the power is reduced to 0.15 mW, a loss of 95.3%, and L3 is 13.23 dB, corresponding to the situation in Fig. 3(d). After adding the fourth attenuation layer, the power decreases to 57 µW, a loss of 98.2%, and L4 is 17.44 dB, corresponding to the situation in Fig. 3(e).
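
As a quick check of these figures, the short sketch below reproduces the attenuation values of Eq. (1) and the relative power loss from the measured powers quoted above.

```python
import math

# Measured laser power (mW) for Layer0 through Layer4, as quoted above.
powers_mw = [3.16, 1.17, 0.36, 0.15, 0.057]

p0 = powers_mw[0]
for i, p in enumerate(powers_mw):
    loss_db = 10 * math.log10(p0 / p)      # Eq. (1): L_i = 10 * lg(P0 / Pi)
    loss_pct = 100 * (1 - p / p0)          # relative power loss in percent
    print(f"Layer{i}: P = {p:.3f} mW, loss = {loss_pct:.1f}%, L = {loss_db:.2f} dB")
```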

The effect of terahertz imaging differs obviously under different attenuation conditions. Figure 3(a) is the imaging result without any attenuation. Because the terahertz wave penetrates the platform almost unaffected and is received by the camera, the sample is discernible in the original image and its edge is obvious, as shown in the grayscale intensity image of Fig. 3(a); there is an obvious difference between the gray intensity of the main part of the sample and that of the background. From Fig. 3(b) to Fig. 3(e), the number of attenuation layers gradually increases from one to four, the image quality gradually deteriorates, the gray intensity values no longer differ obviously, and the variance of the gray values becomes smaller.

The Fourier transform is applied to the original images of each attenuation layer, and the spectra of images from the same layer are averaged to obtain the average spectrum of each layer. As shown in Fig. 5, in the frequency curve the blue vertical line is located at a frequency of 100 and the red vertical line at a frequency of 300; the region left of the blue line is regarded as the low-frequency component, the region between the blue and red lines as the middle-frequency component, and the region right of the red line as the high-frequency component. From the frequency curves of Fig. 5(a) to Fig. 5(e), as the number of attenuation layers increases, the peak value of the high-frequency component gradually falls until, after attenuate_layer3, the high-frequency component no longer has a peak and shows a downward trend. At the same time, the decrease in overall intensity reflects the decrease in the overall gray-intensity difference, and the relative proportion of the low-frequency component increases.
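
A minimal sketch of this spectral analysis is given below. It assumes the images of one attenuation layer are available as 2-D grayscale NumPy arrays, averages their centered Fourier magnitude spectra, and collapses the result into a 1-D curve by ring averaging; this is one way to obtain frequency curves of the kind shown in Fig. 5, not necessarily the authors' exact procedure.

```python
import numpy as np

def average_spectrum(images):
    """Average the centered 2-D magnitude spectra of all images of one attenuation layer."""
    spectra = []
    for img in images:                        # each img: 2-D grayscale numpy array
        f = np.fft.fftshift(np.fft.fft2(img))
        spectra.append(np.log1p(np.abs(f)))   # log magnitude for readability
    return np.mean(spectra, axis=0)

def radial_profile(spectrum):
    """Collapse a 2-D spectrum into a 1-D frequency curve by averaging over rings."""
    h, w = spectrum.shape
    cy, cx = h // 2, w // 2
    y, x = np.indices((h, w))
    r = np.hypot(y - cy, x - cx).astype(int)
    sums = np.bincount(r.ravel(), weights=spectrum.ravel())
    counts = np.bincount(r.ravel())
    return sums / np.maximum(counts, 1)       # mean spectral intensity vs. radial frequency
```

Bands below radial frequency 100 and above 300 of such a curve would then correspond to the low- and high-frequency components marked by the blue and red lines in Fig. 5.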

Fig. 5. Frequency-domain characteristics of different attenuation layers.

The above analysis of the frequency-domain images shows that, as the number of attenuation layers increases, the high-frequency component of the image decreases gradually and the overall gray-intensity value also decreases. The relative increase of the low-frequency component means that noise gradually becomes dominant in the image, and the outline of the object gradually blurs until it is difficult to identify.

Data collection is carried out manually; during acquisition and scanning, the original data are the image frames in which the sample appears completely in the center of the image. Between the collection of two images, the sample is flipped and rotated slightly while a clear image is maintained. We captured 784 images as the original dataset. The size of the original image data collected by the camera is 512 × 512 pixels, and all images are saved in .png format with four image channels (R, G, B, A).

As shown in Fig. 6, the original dataset consists of horizontally scanned images, and these data are highly uniform. To give the model better recognition ability for multi-angle and incomplete images, the original data are enhanced by random rotation, random horizontal flipping, random vertical flipping, and random cropping. Because the original data contain highly blurred images, the contrast, brightness, and grayscale values of the images are kept unchanged to avoid distorting the hidden features through image processing. Each original image passes through the enhancement module to generate two enhanced images, and the data are split according to the number of attenuation layers at the time of collection. The composition of the enhanced hierarchical dataset is shown in Tables 1 and 2, and all experiments in this paper use the enhanced, hierarchical dataset.
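
A sketch of such an enhancement module is shown here, assuming a PyTorch/torchvision pipeline; only geometric transforms (rotation, flips, random crop) are applied so that contrast, brightness, and grayscale values stay untouched. The rotation range, crop scale, and function names are illustrative, not the authors' exact settings.

```python
import torchvision.transforms as T

# Geometric-only augmentation: no color/brightness/contrast jitter, so hidden
# features of highly blurred images are not distorted.
augment = T.Compose([
    T.RandomRotation(degrees=30),                      # illustrative rotation range
    T.RandomHorizontalFlip(p=0.5),
    T.RandomVerticalFlip(p=0.5),
    T.RandomResizedCrop(size=512, scale=(0.7, 1.0)),   # random crop of the 512 x 512 frame
])

# Each original image passes through the module twice to yield two enhanced copies.
def enhance(pil_image, copies=2):
    return [augment(pil_image) for _ in range(copies)]
```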

Table 1. Expanded ordinary objects data

3. Cross-layer CNN model

In recent years, CNNs' outstanding feature extraction ability in the field of optical vision has set the stage for various kinds of feature engineering. ResNet's residual block allows deep networks to overcome the vanishing gradient problem [25], while Inception and DenseNet tried network structures with more extensive connections, which inspired the work in this paper [26,27]. This paper uses ResNet as the basic network structure and improves it according to the characteristics of the Multi-Power-Attenuation dataset. At the same time, a structure similar to the short connection in DenseNet is used to construct the Cross-Layer link, and this connection achieves feature reuse at the Layer level.

3.1 Network architecture

ResNet's architecture proved suitable for the Multi-Power-Attenuation dataset in our tests, as will be described in Section 4.2, so the four-layer structure of ResNet was chosen as the basic feature extraction framework in this work. Each layer consists of several bottleneck blocks in series. As shown in Fig. 7, Convolution-Layer1 to Convolution-Layer4 consist of 6, 8, 46, and 3 bottleneck blocks, respectively. In ResNet, Convolution-Layer1 and Convolution-Layer2 usually consist of 3 and 4 bottleneck blocks [25]; they are used to extract the edge and texture information of the image. Unlike the classification of visible-light datasets, the images in the Multi-Power-Attenuation dataset are usually simple but their edges are not clear enough, which is more obvious in highly blurred images, so these low-dimensional features have a greater impact on the ability of the whole model. The capacity of Convolution-Layer1 and Convolution-Layer2 is therefore doubled. The third layer contains 46 bottleneck blocks because it is mainly used to extract high-dimensional semantic features from the images; 46 blocks ensure sufficient feature extraction ability while avoiding over-fitting. A Cross-Layer connection consisting of two bottleneck blocks with downsampling ability is placed between the input of Convolution-Layer1 and the input of Convolution-Layer4; this architecture increases the feature extraction ability of the network.
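
To make the layer layout concrete, a minimal PyTorch sketch of the backbone is given below. It reuses torchvision's ResNet Bottleneck block and only illustrates the stage sizes (6, 8, 46, 3) and the two-bottleneck Cross-Layer branch between the input of Convolution-Layer1 and the input of Convolution-Layer4. The stem, channel widths, strides, 4-channel (RGBA) input, and 13 sample classes are assumptions taken from standard ResNet practice and Fig. 6, not the authors' exact configuration.

```python
import torch
import torch.nn as nn
from torchvision.models.resnet import Bottleneck

def make_stage(inplanes, planes, blocks, stride=1):
    """A stack of Bottleneck blocks; the first block may change width/stride."""
    downsample = None
    if stride != 1 or inplanes != planes * Bottleneck.expansion:
        downsample = nn.Sequential(
            nn.Conv2d(inplanes, planes * Bottleneck.expansion, 1, stride, bias=False),
            nn.BatchNorm2d(planes * Bottleneck.expansion))
    layers = [Bottleneck(inplanes, planes, stride, downsample)]
    layers += [Bottleneck(planes * Bottleneck.expansion, planes) for _ in range(blocks - 1)]
    return nn.Sequential(*layers)

class CrossLayerCNN(nn.Module):
    """Sketch: four stages of (6, 8, 46, 3) bottlenecks plus a 2-bottleneck Cross-Layer branch."""
    def __init__(self, num_classes=13):
        super().__init__()
        self.stem = nn.Sequential(                      # standard ResNet stem (assumed)
            nn.Conv2d(4, 64, 7, stride=2, padding=3, bias=False),  # 4-channel RGBA input
            nn.BatchNorm2d(64), nn.ReLU(inplace=True),
            nn.MaxPool2d(3, stride=2, padding=1))
        self.layer1 = make_stage(64,   64,  blocks=6)               # edge/texture features
        self.layer2 = make_stage(256,  128, blocks=8,  stride=2)
        self.layer3 = make_stage(512,  256, blocks=46, stride=2)    # semantic features
        self.layer4 = make_stage(1024, 512, blocks=3,  stride=2)
        # Cross-Layer link: two downsampling bottlenecks from the Layer1 input to the Layer4 input.
        self.cross = nn.Sequential(
            make_stage(64,  128, blocks=1, stride=2),
            make_stage(512, 256, blocks=1, stride=2))
        self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(2048, num_classes))

    def forward(self, x):
        x = self.stem(x)
        skip = self.cross(x)                  # reused low-level (edge) features
        x = self.layer3(self.layer2(self.layer1(x)))
        x = self.layer4(x + skip)             # fuse before Convolution-Layer4, cf. Eq. (4)
        return self.head(x)

if __name__ == "__main__":
    out = CrossLayerCNN()(torch.randn(1, 4, 128, 128))
    print(out.shape)                          # torch.Size([1, 13])
```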

Fig. 6. Original image data of terahertz scanning samples. (a)-(g) knife1-knife6, (h) cylinder, (i) key, (j) necklace, (k) scissors, (l) screwdriver, (m) wrench.

Fig. 7. The architecture of the Cross-Layer network (top) and the structure of the bottleneck blocks (bottom).

Each bottleneck block takes input features ${P_{in}}$ and produces output features ${P_{out}}$. The feature maps have size C × H × W, denoting the channels, height, and width of the feature maps. ${f_{Relu}}$ denotes the ReLU activation and ${C_i}$ denotes the convolution of the i-th convolution block:

$${{P_{out}} = {f_{Relu}}({{C_3}({{C_2}({{C_1}({{P_{in}}} )} )} )+ {P_{in}}} )}$$

Each bottleneck block contains three convolution blocks named $Con{v_1}$ to $Con{v_3}$. The convolution kernel size of $Con{v_1}$ is 1 × 1, and its output channel count is significantly smaller than its input channel count, so it is mainly used to reduce the dimension of the feature map. The convolution kernel size of $Con{v_2}$ is 3 × 3; it is the main feature extraction or downsampling unit, and its output channel count equals its input channel count. The convolution kernel size of $Con{v_3}$ is 1 × 1, and it restores the number of channels of the feature map to the initial level. The whole bottleneck block is equivalent to a single 3 × 3 convolution block, but the dimensionality reduction of $Con{v_1}$ makes it more efficient than the latter. Figure 8 shows the computation speed gap between using a bottleneck block and a single convolution block in the Cross-Layer link.
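
The efficiency argument can be made concrete by counting weights. Assuming the internal width of the bottleneck is one quarter of the input width C (the standard ResNet choice; the paper only says it is "significantly less"), a plain 3 × 3 convolution costs 9C² parameters while the 1 × 1 → 3 × 3 → 1 × 1 bottleneck costs about 17C²/16 ≈ 1.06C², roughly 8-9 times fewer. The sketch below checks this with PyTorch; the width is illustrative.

```python
import torch.nn as nn

def count_params(module):
    return sum(p.numel() for p in module.parameters())

C = 1024                                    # illustrative channel width
plain = nn.Conv2d(C, C, kernel_size=3, padding=1, bias=False)

bottleneck = nn.Sequential(                 # Conv1 (reduce) -> Conv2 (3x3) -> Conv3 (restore)
    nn.Conv2d(C, C // 4, kernel_size=1, bias=False),
    nn.Conv2d(C // 4, C // 4, kernel_size=3, padding=1, bias=False),
    nn.Conv2d(C // 4, C, kernel_size=1, bias=False),
)

print(f"plain 3x3 conv : {count_params(plain):,} parameters")       # ~9.4M for C = 1024
print(f"bottleneck     : {count_params(bottleneck):,} parameters")  # ~1.1M for C = 1024
```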

Fig. 8. Number of computation steps completed in 15 minutes using the bottleneck architecture versus a normal convolution block at the Cross-Layer link.

3.2 Cross-layer link

In a CNN, some edge features are inevitably lost when features are extracted by downsampling, and the lost features cannot be recovered by upsampling; they can, however, be recovered by splicing features together.

ResNet first proposed the concept of the short connection [25], but its short connection acts only within the same bottleneck structure, and its function is to avoid the vanishing gradient. DenseNet extends this method across bottleneck structures [26], realizing the reuse of low-dimensional features, but the connections are too dense and the feature maps between a small number of bottlenecks do not change much; the redundant connections greatly increase the amount of computation of DenseNet and greatly prolong the computing cycle. Therefore, only one short connection is used in the Cross-Layer CNN, and the Cross-Layer link is set between Convolution-Layer1 and Convolution-Layer4. As shown in Fig. 9, the feature map of Layer1 mainly focuses on the edge features of the image, while the feature map of Layer4 begins to attend to the complex semantic features of the object itself, so edge feature reuse is realized by establishing a connection between the two. Because this short connection spans 54 bottleneck blocks and the features have been compressed twice, this structure can effectively combine the low-level features with the high-level semantic features without consuming too much computing time. Figure 9(d) shows the area of concern of Convolution-Layer4 generated by the network without the Cross-Layer link. Figure 9(b) shows a more accurate area of concern than Fig. 9(d), which indicates that the Cross-Layer link can effectively reuse the edge features.

$${{P_{Li}} = {F_{Li}}({{P_{L({i - 1} )}}} )\; \textrm{(}0 \le i < 3\textrm{)}}$$
$${{P_{L3}} = {F_{L3}}({{F_{L2}}({{P_{L1}}} )} )+ {P_{L1}}}$$

Fig. 9. The original data (a) and the feature map concern areas of Convolution-Layer4 (b), Convolution-Layer1 (c), and Convolution-Layer4 without the Cross-Layer link (d).

${\textrm{P}_{\textrm{Li}}}$ represents the output feature of the i-th Convolution-Layer, and ${F_{Li}}$ represents the process of multilayer convolution of the i-th Convolution-Layer.

4. Experiment results and discussion

4.1 Implementation details

In this work, the experiments contain two parts. The first part aims to quantify the improvement of the Cross-Layer CNN over other models: all layers of the Multi-Power-Attenuation dataset are used to train the models, and the highly blurred layer or a portion of all layers is used to evaluate the performance of the Cross-Layer CNN and the other models. In the second part, the Multi-Power-Attenuation dataset is split into five sub-datasets according to the attenuation layers; the highly blurred layer is selected as the classification target, and the other layers used in training are gradually added. The part with higher imaging power in the Multi-Power-Attenuation dataset is regarded as a traditional dataset, and the disparity between the model trained on the entire Multi-Power-Attenuation dataset and the model trained on the traditional dataset demonstrates the effect of the Multi-Power-Attenuation dataset. In these experiments, an NVIDIA GeForce RTX 3090 GPU with 24 GB of memory is used for computation. In order to evaluate the representation ability of the models consistently, the number of epochs for all training runs is set to 100, the learning rate is 1E-2, the loss function is cross-entropy loss, and the optimizer is SGD (stochastic gradient descent) with momentum 0.8, dampening 0, and weight decay 0.
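
The stated hyperparameters translate directly into a standard PyTorch training setup; the sketch below uses them verbatim (100 epochs, learning rate 1E-2, SGD with momentum 0.8, dampening 0, weight decay 0, cross-entropy loss). The model, dataset, and batch size are placeholders, not the authors' code.

```python
import torch
import torch.nn as nn
import torchvision
from torch.utils.data import DataLoader, TensorDataset

device = "cuda" if torch.cuda.is_available() else "cpu"

# Stand-in classifier; replace with the Cross-Layer CNN of Section 3.
model = torchvision.models.resnet50(num_classes=13).to(device)

# Stand-in for the enhanced Multi-Power-Attenuation training split (random tensors here).
train_dataset = TensorDataset(torch.randn(32, 3, 128, 128), torch.randint(0, 13, (32,)))
train_loader = DataLoader(train_dataset, batch_size=16, shuffle=True)   # batch size is illustrative

criterion = nn.CrossEntropyLoss()                          # cross-entropy loss, as stated
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2,   # learning rate 1E-2
                            momentum=0.8, dampening=0, weight_decay=0)

for epoch in range(100):                                   # 100 epochs, as stated
    model.train()
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```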

In this section, the Cross-Layer CNN and different types of deep learning models with different depths are trained on the Multi-Power-Attenuation dataset as a whole, and their learning results are compared and analyzed horizontally.

As shown in Fig. 10, the data of each layer from Layer0 to Layer3 are randomly divided into a training set and a test set at a ratio of 8:2, and Layer4 is randomly divided at 1:1. The training and test sets of each layer are combined to form the overall training and test sets; the training set contains 1198 images and the test set contains 370 images. Because the training data contain some extreme data in Layer4 that are too fuzzy, evaluation quantities such as the loss and accuracy oscillate to a certain extent during training. To visually weaken the influence of this phenomenon on model evaluation, all curves are smoothed with the smoothing function provided by TensorBoard [28].

$${\left\{ {\begin{array}{c} {f(0 )= \; y[0 ]}\\ {f(t )= f({t - 1} )\ast w + ({1 - w} )\ast y[t ]\; (t > 0)} \end{array}} \right.}$$

The smoothing function is shown in Formula (5): y denotes the ordinate value of the original data curve, t is the abscissa value of the data curve and the smoothed curve, and $f(t )$ is the smoothed ordinate value at abscissa t. For the training process t represents a batch, and for the test process t represents an epoch. The parameter w is the smoothing weight, with a maximum value of 1 and a minimum value of 0; a larger w makes the current value more strongly affected by the previous value. In this paper w is 0.6.
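
Formula (5) is an exponential moving average; a direct Python transcription is shown below, with an illustrative input curve.

```python
def smooth(y, w=0.6):
    """Exponential moving average of Formula (5): f(0) = y[0], f(t) = w*f(t-1) + (1-w)*y[t]."""
    out = [y[0]]
    for v in y[1:]:
        out.append(out[-1] * w + (1 - w) * v)
    return out

# Example: smoothing a noisy accuracy curve with the paper's weight w = 0.6.
print(smooth([0.30, 0.55, 0.40, 0.70, 0.65], w=0.6))
```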

4.2 Evaluation metrics

In this work, the detection metrics introduced in [29] are adopted; the performance indicators are weighted precision, weighted recall, and weighted F1-score. When the classifier prediction ends, the confusion matrix can be drawn. The classification results are divided into the following categories: True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN).

In the multi-class problem, the confusion matrix takes the form shown in Table 3, and the TP, FP, and FN of each label are calculated separately.

$${{W_i} = \frac{{{N_i}}}{{\mathop \sum \nolimits_{j = 0}^m {N_j}}}}$$
$${{N_i} = F{N_i} + TP{_i}}$$

Table 2. Expanded knife data

Table 3. Confusion Matrix of label3

To prevent the number of samples per label from affecting the representativeness of the indicators, weights are applied to all evaluation indicators. As shown in Formulas (6) and (7), Wi is the weight of label i, Ni is the number of samples whose true label is i, and m is the total number of labels.

Precision evaluates how much of the data predicted to be positive is actually positive. Recall evaluates how much of all positive data is successfully predicted as positive. The F1-score is the combination of precision and recall. As shown in Formula (8), ${\textrm{P}_w}$, ${\textrm{R}_w}$, and ${\textrm{F}_w}$ denote the weighted precision, weighted recall, and weighted F1-score. The weighted precision is accumulated by multiplying the precision of each label by its weight; the weighted recall and weighted F1-score follow the same principle in Formulas (9) and (10).

$${\textrm{P}_w} = \mathop \sum \limits_{i = 0}^m {P_i}\cdot {W_i}$$
$${{\textrm{R}_w} = \mathop \sum \limits_{i = 0}^m {R_i}\cdot {W_i}}$$
$${{\textrm{F}_w} = \frac{{\mathop \sum \nolimits_{i = 0}^m {F_i}}}{m}}$$
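
A small sketch of these metrics, implementing Formulas (6)-(10) exactly as written (note that Formula (10) averages the per-label F1 scores rather than weighting them), is given below; y_true and y_pred are integer label arrays. Similar quantities can also be obtained from scikit-learn's precision_recall_fscore_support, although its weighted F1 differs from Formula (10).

```python
import numpy as np

def weighted_metrics(y_true, y_pred, num_labels):
    """Weighted precision/recall (Eqs. 6-9) and mean F1 (Eq. 10) from integer labels."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    P_w, R_w, f1_sum = 0.0, 0.0, 0.0
    for i in range(num_labels):
        tp = np.sum((y_pred == i) & (y_true == i))
        fp = np.sum((y_pred == i) & (y_true != i))
        fn = np.sum((y_pred != i) & (y_true == i))
        n_i = tp + fn                                    # Eq. (7): true count of label i
        w_i = n_i / len(y_true)                          # Eq. (6): label weight
        p_i = tp / (tp + fp) if tp + fp else 0.0
        r_i = tp / (tp + fn) if tp + fn else 0.0
        f_i = 2 * p_i * r_i / (p_i + r_i) if p_i + r_i else 0.0
        P_w += p_i * w_i                                 # Eq. (8)
        R_w += r_i * w_i                                 # Eq. (9)
        f1_sum += f_i
    return P_w, R_w, f1_sum / num_labels                 # Eq. (10)

print(weighted_metrics([0, 0, 1, 2, 2, 2], [0, 1, 1, 2, 2, 0], num_labels=3))
```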

4.3 Comparison with state-of-the-art methods

Three depth variants each of ResNet and DenseNet are compared with the Cross-Layer CNN. Figure 10 shows the accuracy and loss during training; Table 4 is evaluated on 50% of the data in Layer4, and the test data used for Table 5 are 20% of Layer0-Layer4. The accuracy used in Fig. 10(d) is unweighted, which is a common choice to speed up training, while the precision in Tables 4 and 5 is evaluated according to Formula (8) after all networks are trained, so there is a certain difference between the accuracy curve in Fig. 10(d) and the precision in Table 4. The curves in Fig. 10(d) show the changing trend of the accuracy of the seven models, and the performance analysis is mainly based on the precision in Table 4.

Fig. 10. Different models (tested on Layer4): loss curve (a) and accuracy curve (b) of training, loss curve (c) and accuracy curve (d) of testing, and segmentation of the training and test data (e).

Table 4. Model evaluation results on the layer4 test dataset

Figure 10(a) and Fig. 10(b) show the loss and accuracy of the models on the training set during training. Because of their shallow depth, ResNet-34 and ResNet-50 reach convergence earliest, before the 20th epoch, with a precision limit of about 0.7. The four deeper models, ResNet-101, DenseNet-121, DenseNet-169, and DenseNet-201, converge at about 60 epochs with a precision limit of about 0.80, the best being 0.8123 for ResNet-101. The precision of these six models shows no upward trend after convergence and oscillates around its highest value. However, the accuracy curve of the Cross-Layer CNN still has not converged after 80 epochs and exceeds the highest value of the other models at 90 epochs (see the red box in Fig. 10(d)). The precision value of 0.8607 in Table 4 thus exceeds the limit that the other models can reach, an additional improvement of 4.84%. The structure of the Cross-Layer CNN achieves better results than the other models in dealing with highly blurred images and breaks through their recognition limits, showing that the Cross-Layer CNN structure is very effective on highly blurred terahertz datasets such as the Multi-Power-Attenuation dataset. Figure 11 shows the confusion matrices of all seven models; the darker the color of a cell, the more predictions it contains. The larger the values on the diagonal of the matrix, the higher the prediction accuracy of the model, and the values off the diagonal indicate wrong predictions. As shown in Fig. 11, all models obtain good results in recognizing knives (upper-left corner), but the recognition of ordinary samples is poorer, mainly because the ordinary samples are smaller and more intricate. The Cross-Layer CNN achieves higher accuracy on these samples, which may be the reason for its better performance. Table 5 uses data from all layers as the test set to evaluate the generalization ability of the networks. When dealing with images of various degrees of blur, all models achieve a precision of more than 0.97, and the Cross-Layer CNN reaches 0.9912, showing a trend similar to Table 4. Such high precision shows that all models achieve good results on data other than highly blurred images. The addition of Layer0-3 data and the reduction of Layer4 data make the gaps between models in Table 5 smaller than in Table 4. The Cross-Layer CNN achieves a very high recognition rate over the whole range of imaging powers, which shows that it generalizes well to terahertz images under various powers.

Fig. 11. Confusion matrices of ResNet-34 (a), ResNet-50 (b), ResNet-101 (c), DenseNet-121 (d), DenseNet-169 (e), DenseNet-201 (f), and Cross-Layer CNN (g). All results and analysis of the models only represent the effect under these experimental conditions and cannot represent the representation ability of each model under other experimental conditions (such as more training epochs).

Table 5. Model evaluation results on the all layer test dataset

The area of an image that a network attends to intuitively shows the basis on which the model judges the picture, which is very helpful for evaluating model performance. The image of the scissors is used to evaluate the models' areas of concern: because the scissors consist of a handle formed by two rings and a vertical blade, they have more complex geometric characteristics than other samples such as the knives. Although the key sample also has this property, the key is so small that the loss of detailed latent features is more serious under high blur. Figure 12 shows the areas of concern of the seven models when classifying the scissors pattern at Layer4. In each group of images, the left is the original image, the middle is the heat map of the area of concern, and the right is the superposition of the two; the lighter a region of the heat map, the more attention the model pays to it. The Grad-CAM algorithm is used to map the feature map of the last convolution layer before the fully connected layer back onto the original image [30].
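
A minimal hook-based Grad-CAM sketch is given below for reference; it follows the standard Grad-CAM recipe (gradient-weighted feature maps of a chosen convolution layer) rather than the authors' exact implementation, and the stand-in model, input size, and target layer are illustrative.

```python
import torch
import torch.nn.functional as F
import torchvision

def grad_cam(model, image, target_layer, class_idx=None):
    """Minimal Grad-CAM: heat map of where `target_layer` supports the (predicted) class."""
    feats, grads = {}, {}
    h1 = target_layer.register_forward_hook(lambda m, i, o: feats.update(a=o))
    h2 = target_layer.register_full_backward_hook(lambda m, gi, go: grads.update(a=go[0]))
    model.eval()
    logits = model(image.unsqueeze(0))                      # image: (C, H, W) tensor
    cls = logits.argmax(1) if class_idx is None else torch.tensor([class_idx])
    model.zero_grad()
    logits[0, cls].sum().backward()
    h1.remove(); h2.remove()
    weights = grads["a"].mean(dim=(2, 3), keepdim=True)     # global-average-pooled gradients
    cam = F.relu((weights * feats["a"]).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=image.shape[1:], mode="bilinear", align_corners=False)
    return (cam / cam.max().clamp(min=1e-8)).squeeze()      # normalized heat map (H, W)

# Usage sketch with a stand-in model; replace with the trained Cross-Layer CNN and a real image.
model = torchvision.models.resnet50(num_classes=13)
heatmap = grad_cam(model, torch.randn(3, 224, 224), target_layer=model.layer4[-1])
```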

Fig. 12. Concern areas of the Layer4 scissors for 7 models: (a)-(c) DenseNet (121, 169, 201), (d)-(f) ResNet (34, 50, 101), (g) Cross-Layer CNN.

According to the results of Tables 4 and 5, the blurred-image recognition ability of the above models cannot easily be judged, since the gaps in classification accuracy between many networks are small. However, as shown in Fig. 12, the areas of concern of these models for the same image differ greatly, which shows that although they produce the same classification results, the basis of classification is very different. The results of ResNet in Fig. 12(d)-(f) are reasonable for image classification: the two highlighted regions in the picture represent the model's attention to the hidden scissors handle and scissors tip, respectively. The results of DenseNet in Fig. 12(a)-(c) all attend to the wrong features, mistaking noise in the image for hidden features. It can also be seen from these images that, as the network of the model deepens, its area of concern gradually shrinks and its attention within that area gradually increases; the model's preference shifts toward specific features, which is consistent with deeper convolutional neural networks focusing on more complex features. All DenseNet structures attend to the wrong features, and ResNet performs well, but the Cross-Layer CNN outperforms the other ResNet networks in accuracy in Tables 4 and 5 and its areas of concern are more focused, so the Cross-Layer CNN is the better network model.

Figure 13 shows the areas of the Layer0-3 scissors images that the Cross-Layer CNN attends to (for Layer4, see Fig. 12(g)). For the clearest scissors image (Layer0), the model focuses on the handle and blade of the scissors. As the number of attenuation layers increases, the Cross-Layer CNN trained with the Multi-Power-Attenuation dataset still accurately finds the positions of the handle and blade within a large amount of noise and excavates the hidden features of the image. This shows that, when facing a highly blurred image, the network can transfer the discrimination basis of the main object learned from clear images to blurred images and find the location of the target for recognition and classification.

Fig. 13. Concern areas of the Layer0-3 (a)-(d) scissors by the Cross-Layer CNN.

4.4 Comparison of multi-power-attenuation dataset and traditional dataset

In this section, every experiment uses the same network structure but different training data. The Multi-Power-Attenuation dataset is split into five parts according to the imaging power, as in Section 4.3, but the training data of each model are composed hierarchically. The training set of the first model consists only of Layer0 data and is named Traindata-0; the dataset of the second model contains the data of Layer0 and Layer1 and is named Traindata-01; the other models follow the same principle. Traindata-012 contains all the high-imaging-power images in the Multi-Power-Attenuation dataset, whose maximum power loss is 9.43 dB, so it is regarded as the traditional dataset. Traindata-012 contains 940 images; to avoid the influence of training set size on the results, images are randomly deleted from Traindata-0123 and Traindata-01234 until each contains 940 images. To assess the effect of the different training sets on highly blurred images, the Layer4 data are expanded, half are used to construct the test set, and the other half are added to Traindata-01234.
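
The hierarchical splits can be assembled in a few lines of Python; the sketch below assumes each layer's enhanced images are available as lists of file names and trims the larger cumulative sets to 940 images, as described above. The placeholder contents, counts, and names are illustrative only.

```python
import random

random.seed(0)

# Placeholder contents: in practice these are the enhanced image paths of each attenuation layer.
layer_images = {i: [f"layer{i}_img{j}.png" for j in range(300)] for i in range(5)}

traindata = {}
pool = []
for i in range(5):
    pool = pool + layer_images[i]
    name = "Traindata-" + "".join(str(k) for k in range(i + 1))    # e.g. "Traindata-012"
    # Traindata-0123 and Traindata-01234 are randomly trimmed to the size of Traindata-012 (940).
    traindata[name] = random.sample(pool, 940) if len(pool) > 940 else list(pool)
```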

As shown in Fig. 14(a) and Fig. 14(b), the convergence speed of the model slows down slightly as the dataset grows, but it still converges quickly, and the trained models show a clear trend in the classification results. As shown in Table 6, the precision of the model trained only on Layer0 is only 0.2341, and the increases after adding Layer1 and Layer2 are only 0.63% and 7.56%, which shows that the traditional dataset performs poorly on highly blurred images. When the Layer3 data are added, however, the precision increases by 33.15%. Traindata-01234 achieves the best precision of 0.8988 by using the whole Multi-Power-Attenuation dataset, an increase of 58.28% over the traditional dataset (Traindata-012). As shown in Fig. 15, it is almost impossible to classify correctly using Traindata-0, and even Traindata-0123 performs very poorly in identifying ordinary objects, but using the whole Multi-Power-Attenuation dataset brings a marked improvement in all kinds of identification. This indicates that constructing a dataset in the Multi-Power-Attenuation manner yields better performance on highly blurred images than the traditional approach. The random deletion of data from Traindata-01234 may reduce the proportion of highly blurred images, which is why the precision of Traindata-01234 is higher than the precision of the Cross-Layer CNN in Table 4.

Fig. 14. Hierarchical models (tested on Layer4): loss curve (a) and accuracy curve (b) of training, loss curve (c) and accuracy curve (d) of testing.

Fig. 15. Confusion matrices of Traindata-0 (a), Traindata-01 (b), Traindata-012 (c), Traindata-0123 (d), and Traindata-01234 (e).

Table 6. Hierarchical model evaluation results on the Layer4 dataset

Figure 16 shows the evolution of the model's area of interest after adding each layer of data. The area of concern generated by Traindata-0 contains only noise. After adding the Layer1 data, the model begins to pay attention to the area containing the sample, but it is still affected by noise. After adding the Layer2 data, the model begins to attend to the sample area correctly and the attention paid to noise is greatly reduced. After adding the Layer3 data, the model can distinguish the rough outline of the sample and the influence of noise is further reduced. With the addition of the Layer4 data, the model refines its features to focus on the handle and tip of the sample and is almost no longer disturbed by noise. This demonstrates the better performance of the Multi-Power-Attenuation dataset from another perspective.

Fig. 16. Concern areas of the hierarchical models for Layer4: Traindata-0 (a), Traindata-01 (b), Traindata-012 (c), Traindata-0123 (d), Traindata-01234 (e).

5. Conclusion

In summary, a method for identifying highly blurred terahertz images using a Cross-Layer CNN with a Multi-Power-Attenuation dataset has been demonstrated, and the experimental results show that it is superior to the traditional method. The Cross-Layer CNN is designed to deal with low-imaging-power data by reusing low-layer features, which allows it to break through the classification limit of traditional models on blurred datasets; the accuracy of blurred-image classification reaches 89.88%. The Multi-Power-Attenuation dataset is composed of data with different imaging powers; compared with traditional high-power datasets, it improves the classification accuracy of blurred images by 58%. The Multi-Power-Attenuation dataset extends the power-loss range over which terahertz images can be recognized. This method can compensate for the unclear results of terahertz imaging to a certain extent, which helps lower the application threshold of terahertz imaging and broaden its application range in the future. It provides a new direction and idea for terahertz imaging algorithms and imaging systems.

Funding

Institute of Energy, Hefei Comprehensive National Science Center (No. 21KZS205); National Natural Science Foundation of China (No.12127809).

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. C. Jansen, S. Wietzke, O. Peters, M. Scheller, N. Vieweg, M. Salhi, and M. Koch, “Terahertz imaging: applications and perspectives,” Appl. Opt. 49(19), E48–E57 (2010). [CrossRef]  

2. H. Chen, W. Ma, Z. Huang, Y. Zhang, Y. Huang, and Y. Chen, “Graphene-based materials toward microwave and terahertz absorbing stealth technologies,” Adv. Opt. Mater. 7(8), 1801318 (2019). [CrossRef]  

3. C. Yu, S. Fan, Y. Sun, and E. Pickwell-MacPherson, “The potential of terahertz imaging for cancer diagnosis: A review of investigations to date,” Quantitative imaging in medicine and surgery 2(1), 33 (2012). [CrossRef]  

4. Y. Jiang, H. Ge, F. Lian, Y. Zhang, and S. Xia, “Early detection of germinated wheat grains using terahertz image and chemometrics,” Sci. Rep. 6(1), 21299 (2016). [CrossRef]  

5. Y. Jiang, H. Ge, and Y. Zhang, “Quantitative analysis of wheat maltose by combined terahertz spectroscopy and imaging based on Boosting ensemble learning,” Food Chem. 307, 125533 (2020). [CrossRef]  

6. Y. Shen, Y. Yin, B. Li, C. Zhao, and G. Li, “Detection of impurities in wheat using terahertz spectral imaging and convolutional neural networks,” Comput Electron Agric. 181, 105931 (2021). [CrossRef]  

7. S. J. Oh, Y. M. Huh, J. S. Suh, J. Choi, S. Haam, and J. H. Son, “Cancer diagnosis by terahertz molecular imaging technique,” J. Infrared, Millimeter, Terahertz Waves 33(1), 74–81 (2012). [CrossRef]  

8. M. El-Shenawee, N. Vohra, T. Bowman, and K. Bailey, “Cancer detection in excised breast tumors using terahertz imaging and spectroscopy,” Biomedical Spectroscopy and Imaging 8(1-2), 1–9 (2019). [CrossRef]  

9. T. Chavez, T. Bowman, J. Wu, K. Bailey, and M. El-Shenawee, “Assessment of terahertz imaging for excised breast cancer tumors with image morphing,” J. Infrared, Millimeter, Terahertz Waves 39(12), 1283–1302 (2018). [CrossRef]  

10. J. F. Federici, B. Schulkin, F. Huang, D. Gary, R. Barat, F. Oliveira, and D. Zimdars, “THz imaging and sensing for security applications—explosives, weapons and drugs,” Semicond. Sci. Technol. 20(7), S266–S280 (2005). [CrossRef]  

11. A. Y. Pawar, D. D. Sonawane, K. B. Erande, and D. V. Derle, “Terahertz technology and its applications,” Drug Invention Today 5(2), 157–163 (2013). [CrossRef]  

12. M. C. Kemp, “Detecting hidden objects: Security imaging using millimetre-waves and terahertz,” in Conference on Advanced Video and Signal Based Surveillance (pp. 7–9) (IEEE, 2007).

13. W. R. Tribe, D. A. Newnham, P. F. Taday, and M. C. Kemp, “Hidden object detection: security applications of terahertz technology,” in Terahertz and Gigahertz Electronics and Photonics III (Vol. 5354, pp. 168–176) (SPIE, 2004).

14. A. Krizhevsky, “One weird trick for parallelizing convolutional neural networks,” arXiv, arXiv:1404.5997 (2014). [CrossRef]  

15. K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv, arXiv:1409.1556 (2014). [CrossRef]  

16. C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, and A. Rabinovich, “Going deeper with convolutions,” in Conference on Computer Vision and Pattern Recognition (pp. 1–9) (IEEE, 2015).

17. M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L. C. Chen, “Mobilenetv2: Inverted residuals and linear bottlenecks,” in Conference on Computer Vision and Pattern Recognition (pp. 4510–4520) (IEEE, 2018).

18. D. Liang, F. Xue, and L. Li, “Active terahertz imaging dataset for concealed object detection,” arXiv, arXiv:2105.03677 (2021). [CrossRef]  

19. A. I. Knyazkova, A. V. Borisov, L. V. Spirina, and Y. V. Kistenev, “Paraffin-embedded prostate cancer tissue grading using terahertz spectroscopy and machine learning,” J. Infrared, Millimeter, Terahertz Waves 41(9), 1089–1104 (2020). [CrossRef]  

20. H. Liu, N. Vohra, K. Bailey, M. El-Shenawee, and A. H. Nelson, “Deep learning classification of breast cancer tissue from terahertz imaging through wavelet synchro-squeezed transformation and transfer learning,” J. Infrared, Millimeter, Terahertz Waves 43(1-2), 48–70 (2022). [CrossRef]  

21. L. Afsah-Hejri, E. Akbari, A. Toudeshki, T. Homayouni, A. Alizadeh, and R. Ehsani, “Terahertz spectroscopy and imaging: A review on agricultural applications,” Comput Electron Agric. 177, 105628 (2020). [CrossRef]  

22. Q. Mao, Y. Zhu, C. Lv, Y. Lu, X. Yan, S. Yan, and J. Liu, “Convolutional neural network model based on terahertz imaging for integrated circuit defect detections,” Opt. Express 28(4), 5000–5012 (2020). [CrossRef]  

23. Y. Li, W. Hu, X. Zhang, Z. Xu, J. Ni, and L. P. Ligthart, “Adaptive terahertz image super-resolution with adjustable convolutional neural network,” Opt. Express 28(15), 22200–22217 (2020). [CrossRef]  

24. Z. Long, T. Wang, C. You, Z. Yang, K. Wang, and J. Liu, “Terahertz image super-resolution based on a deep convolutional neural network,” Appl. Opt. 58(10), 2731–2735 (2019). [CrossRef]  

25. K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Conference on Computer Vision and Pattern Recognition (pp. 770–778) (IEEE, 2016).

26. G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger, “Densely connected convolutional networks,” in Conference on Computer Vision and Pattern Recognition (pp. 4700–4708) (IEEE, 2017).

27. C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, “Rethinking the inception architecture for computer vision,” in Conference on Computer Vision and Pattern Recognition (pp. 2818–2826) (IEEE, 2016).

28. TensorBoard, https://tensorflow.google.cn/tensorboard

29. S. B. Kotsiantis, I. Zaharakis, and P. Pintelas, “Supervised machine learning: A review of classification techniques,” Emerging Artificial Intelligence Applications in Computer Engineering 160(1), 3–24 (2007).

30. R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra, “Grad-cam: Visual explanations from deep networks via gradient-based localization,” in International Conference on Computer Vision (pp. 618–626) (IEEE, 2017).
