Transformer assisted dual U-net for seismic fault detection

Wang, Zhiwei; You, Jiachun; Liu, Wei; Wang, Xingjian

doi:10.3389/feart.2023.1047626

ORIGINAL RESEARCH article

Front. Earth Sci., 24 March 2023
Sec. Environmental Informatics and Remote Sensing
Volume 11 - 2023 | https://doi.org/10.3389/feart.2023.1047626

Zhiwei Wang¹

Jiachun You²*

Wei Liu²

Xingjian Wang³

¹School of Petroleum Engineering, China University of Petroleum (East China), Qingdao, Shandong, China
²College of Geophysics, Chengdu University of Technology, Chengdu, Sichuan, China
³State Key Laboratory of Oil and Gas Reservoir Geology and Exploitation, Chengdu University of Technology, Chengdu, Sichuan, China

Automatic fault identification in seismic data is essential for oil and gas resource exploration. Traditional manual interpretation cannot keep pace with the volume of seismic data that must be processed. With the development of artificial intelligence technology, deep learning techniques based on pattern recognition have become a popular research area for seismic fault identification. Despite the progress made with U-shaped neural networks (Unet), they still fall short of the stringent requirements of fault prediction in complex structures. We propose a novel approach that combines a standard Unet with a transformer Unet to create a parallel dual Unet model, called Dual Unet with Transformer. To improve the accuracy of fault prediction, we compare six loss functions (Binary Cross Entropy loss, Dice coefficient loss, Tversky loss, Local Tversky loss, Multi-scale Structural Similarity loss and Intersection over Union loss) on synthetic data using three evaluation metrics (Dice coefficient, Sensitivity and Specificity), and find that the binary cross entropy loss function is the most robust one. A comparison of different Unet models on synthetic data demonstrates the superior performance of our Dual Unet model and supports its practical application value. To further validate the practical feasibility of the proposed method, we apply it to real seismic data with a complex fault system and find that, without transfer learning, it predicts the fault system more accurately than established methods such as the classical Unet and the classical coherence cube algorithm. This supports the potential for wide-scale application of our proposed model.

1 Introduction

Seismic fault detection is a crucial step in oil and gas reservoir exploration because faults often serve as pathways for hydrocarbon migration. Furthermore, faults have geological significance as they indicate changes in stress and provide valuable information for drilling. Fault identification technology has developed continuously alongside seismic exploration technology. Traditionally, a discontinuity or edge in a seismic image has been regarded as the sign of a fault. Many fault detection methods were therefore proposed to enhance those discontinuities using seismic attributes such as semblance, coherence and curvature (Marfurt et al., 1998; Marfurt et al., 1999; Roberts, 2001). To pursue better performance, improved approaches were proposed, including ant tracking and attribute fusion methods (Pedersen et al., 2002; Di et al., 2019; Yuan et al., 2020; Acuña-Uribe et al., 2021; Yuan et al., 2022), but the results still rely heavily on the experience of interpreters and the quality of the seismic attributes used. Moreover, noise in seismic images can reduce the accuracy of fault detection. Therefore, it is imperative to develop an automatic fault identification method.

With the rapid development of deep learning, especially deep convolutional neural networks (CNN), more and more attention has been paid to its use in processing and interpreting seismic data, for tasks such as velocity inversion, seismic salt interpretation and noise suppression (Shi et al., 2019; Wu and McMechan, 2019; You et al., 2020). The powerful capability of deep CNNs to establish non-linear relationships between inputs and targets has made automatic fault identification based on CNN models a popular area of application. Seismic fault detection is essentially a classification task, with labels of “fault” and “non-fault.” Over the years, researchers have developed a variety of architectures to tackle this task. In the early stages, support vector machine (SVM) and multi-layer perceptron (MLP) methods were applied (Di et al., 2017; 2018). In recent years, more end-to-end deep CNN models for fault detection have been developed (Xiong et al., 2018). The fault detection task is regarded as semantic segmentation of images, and the standard Unet architecture, consisting of an encoder and a decoder, has been introduced (Li et al., 2019; Wu et al., 2019). Because of the strong performance of Unet models, many variants have been successfully applied to seismic fault detection, such as a nested residual Unet, Unet 3plus and a wavelet transform based CNN (Yang et al., 2020; Gao et al., 2022; Shen et al., 2022).

The main feature of a CNN model is that it applies shared filters of limited size, so its receptive field is limited. Because of that, it is difficult for CNN-based methods to learn explicit global and long-range semantic information. When the fault system is complex, the positive (fault) and negative (non-fault) labels in seismic images are highly unbalanced, and a CNN model may produce unsatisfactory results that do not fully meet the strict requirements of seismic fault detection. Inspired by the significant success of the transformer with attention mechanism in the field of Natural Language Processing (NLP), a vision transformer (ViT) module with an attention mechanism was introduced (Dosovitskiy et al., 2021). However, transformers were originally designed to process one-dimensional sequences and focus on building global relationships between inputs and targets. This results in a lack of localization information, which is precisely the strength of a CNN model. Integrating the strengths of both models is becoming a new trend, leading to the development of combined CNN and transformer architectures, such as the Transformer-based Unet (TransUnet) and the Shifted Windows Transformer-based Unet (Swin TransUnet) (Cao et al., 2021; Chen et al., 2021). Although these two hybrid models have been successfully applied to medical image segmentation, there are few reports on their use in seismic fault prediction. Using the combined CNN-Transformer model to develop a new end-to-end hybrid structure for seismic fault prediction is both promising and significant.

In this manuscript, we begin by presenting our newly developed hybrid CNN-Transformer architecture. Next, we investigate the loss functions used in image segmentation and compare their performance. Afterwards, we compare several well-established CNN architectures in detail using synthetic data and evaluate their metrics. Lastly, we apply the developed CNN models to seismic fault prediction on real data and summarize our work.

2 Methodology

2.1 Architecture of Unet model

For the task of semantic segmentation, variants of the standard Unet model that incorporate transformers are gaining increasing attention. Two well-developed examples are the TransUnet and the Swin TransUnet. The TransUnet integrates multiple transformer blocks into the bottom layer of the standard Unet, while the Swin TransUnet replaces the convolutional blocks in the encoder and decoder with transformer blocks. TransUnet thus combines convolution blocks with transformers and produces more fused features, whereas Swin TransUnet is a purely transformer-based U-shaped architecture. Further research is needed to determine which architecture produces better results in seismic fault prediction. The architectures of these Unet models are shown in Figure 1.

FIGURE 1. Architectures of (A) classical Unet, (B) TransUnet, (C) SwinUnet.

To extend the applications of CNN models, transformers with attention mechanisms have been embedded within CNNs and serve as a powerful tool in computer vision. In the synthetic data examples presented later, the results predicted by the traditional Unet model are more continuous but lack detail, whereas the transformer-assisted Unet models lack continuity in seismic fault prediction. Because they use shared convolution kernels, conventional convolutional neural network models such as Unet are well suited to learning local features of input images but have limited ability to capture global features. Transformer models learn global features well, but their description of local image features is not ideal. To take advantage of the strengths of both models, we propose a new hybrid architecture called the Dual Unet with Transformer, as illustrated in Figure 2.

FIGURE 2. Architecture of dual Unet with transformer.
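The local-versus-global contrast that motivates the dual architecture can be illustrated with a one-dimensional toy example. The sketch below is not the authors' network: the smoothing kernel, the random projection weights and the function names are all hypothetical. It perturbs the last sample of a trace and checks how far that change propagates through a small convolution and through single-head self-attention.

```python
import numpy as np

def conv1d_same(x, kernel):
    # Each output sample depends only on len(kernel) neighbouring inputs.
    return np.convolve(x, kernel, mode="same")

def self_attention(x, d=4, seed=0):
    # Single-head scaled dot-product self-attention over a 1D sequence.
    # The projection weights are random placeholders (an assumption).
    rng = np.random.default_rng(seed)
    tokens = x[:, None]                                  # shape (n, 1)
    w_q, w_k, w_v = (rng.normal(size=(1, d)) for _ in range(3))
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(d)                        # shape (n, n)
    scores = scores - scores.max(axis=1, keepdims=True)  # stable softmax
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ v                                   # every row mixes all tokens

trace = np.linspace(0.1, 1.0, 32)
perturbed = trace.copy()
perturbed[-1] += 5.0                                     # change only the last sample

kernel = np.array([0.25, 0.5, 0.25])
conv_change = np.abs(conv1d_same(trace, kernel)
                     - conv1d_same(perturbed, kernel))[:16].max()
attn_change = np.abs(self_attention(trace)
                     - self_attention(perturbed))[:16].max()
print(conv_change, attn_change)
```

In the first half of the trace the convolution output is unaffected by the distant perturbation, while the attention output changes, which is the behaviour the text attributes to the two kinds of block.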

2.2 Loss function

In deep learning, the loss function plays a crucial role. Training minimizes the loss function, which drives the model to converge and reduces the prediction error of the CNN model, so the choice of loss function has a significant impact on the model. Once the parameters of the deep neural network architecture have been determined, a deeper comparative study is needed on how to select the loss function so that the network converges to an optimal solution. In seismic fault detection, the positive (fault) and negative (non-fault) labels are extremely unbalanced. The selection of the loss function is therefore crucial for prediction accuracy, because some loss functions handle label imbalance better than others. In this section, we introduce the loss functions that we compare. As seismic fault detection is a binary classification task, all of them are loss functions for binary segmentation.

Binary Cross Entropy loss: Binary cross entropy is a classic and widely used loss function in binary classification; for image segmentation, it is applied to the binary label predicted at each pixel. It is defined as

\begin{array}{c} \mathrm{loss}^{BCE} = - [y \log \tilde{y} + (1 - y) \log (1 - \tilde{y})] \end{array} (1)

Where $y$ and $\tilde{y}$ are the ground truth and predicted labels, respectively.
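As a minimal sketch, the standard pixel-wise binary cross entropy can be written in NumPy as follows. Averaging over pixels and clipping the predictions with a small `eps` are assumptions made here for numerical safety; the function name and the toy arrays are illustrative only.

```python
import numpy as np

def bce_loss(y_true, y_pred, eps=1e-7):
    """Mean pixel-wise binary cross entropy between labels and probabilities."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip so that log(0) never occurs (eps is an assumed safeguard).
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1.0 - eps)
    per_pixel = -(y_true * np.log(y_pred)
                  + (1.0 - y_true) * np.log(1.0 - y_pred))
    return float(per_pixel.mean())

# Toy 2 x 4 "fault label" patch and two candidate predictions.
labels = np.array([[0, 0, 1, 0], [0, 1, 0, 0]])
good = np.array([[0.1, 0.1, 0.9, 0.1], [0.1, 0.9, 0.1, 0.1]])
bad = 1.0 - good

print(bce_loss(labels, good), bce_loss(labels, bad))
```

The confident correct prediction gives a small loss and the inverted prediction a large one, because every pixel contributes through either the fault term or the non-fault term.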

Dice coefficient loss: Dice coefficient is a widely used measurement in computer vision, which is applied to calculate the similarity between two images. It has also been suggested for use as a loss function (Milletari et al., 2016). The dice coefficient loss between labels and outputs can be written as

\begin{array}{c} \mathrm{loss}^{DC} = 1 - \frac{2 y \tilde{y}}{y + \tilde{y}} \end{array} (2)
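A soft version of the Dice coefficient loss, in which the products and sums are taken over all pixels of a patch, might look like the sketch below. The summation over pixels and the `smooth` term that avoids division by zero are assumptions that are common in practice but not stated in the text.

```python
import numpy as np

def dice_loss(y_true, y_pred, smooth=1e-6):
    """Soft Dice loss: 1 - 2*sum(y*p) / (sum(y) + sum(p))."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    intersection = np.sum(y_true * y_pred)
    denominator = np.sum(y_true) + np.sum(y_pred)
    # smooth keeps the ratio defined when both masks are empty (assumption).
    return float(1.0 - (2.0 * intersection + smooth) / (denominator + smooth))

mask = np.array([1.0, 1.0, 0.0, 0.0])
print(dice_loss(mask, mask))                            # perfect overlap
print(dice_loss(mask, np.array([1.0, 0.0, 0.0, 0.0])))  # one of two fault pixels found
print(dice_loss(mask, np.array([0.0, 0.0, 1.0, 1.0])))  # no overlap
```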

Tversky loss: The dice coefficient loss function gives equal weight to precision and recall. However, it is difficult to train a network on highly imbalanced data with the dice coefficient loss, and in such data predicting small-scale seismic faults is crucial. To improve performance, the Tversky loss function (Salehi et al., 2017), based on the Tversky index, is defined as

\begin{array}{c} \mathrm{loss}^{T} = 1 - \frac{y \tilde{y}}{y \tilde{y} + α (1 - y) \tilde{y} + β y (1 - \tilde{y})} \end{array} (3)

Where $α$ and $β$ are coefficients that weight false positives and false negatives, respectively. Note that when $α$ = $β$ = 0.5, the Tversky loss function reduces to the dice coefficient loss function.
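The sketch below implements a soft Tversky loss summed over pixels and checks numerically that α = β = 0.5 reproduces the Dice loss. The `eps` term, the random test patch and the example weights 0.3 and 0.7 are assumptions chosen only for illustration.

```python
import numpy as np

def tversky_loss(y_true, y_pred, alpha=0.5, beta=0.5, eps=1e-7):
    """Soft Tversky loss; alpha weights false positives, beta false negatives."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    tp = np.sum(y_true * y_pred)
    fp = np.sum((1.0 - y_true) * y_pred)
    fn = np.sum(y_true * (1.0 - y_pred))
    return float(1.0 - (tp + eps) / (tp + alpha * fp + beta * fn + eps))

rng = np.random.default_rng(0)
y = (rng.random((8, 8)) > 0.8).astype(float)   # sparse synthetic "fault" mask
p = rng.random((8, 8))                         # arbitrary soft prediction

# With alpha = beta = 0.5 the Tversky loss matches the Dice loss.
dice_reference = 1.0 - 2.0 * np.sum(y * p) / (np.sum(y) + np.sum(p))
print(tversky_loss(y, p, 0.5, 0.5), dice_reference)

# A prediction that misses one of two fault pixels (false negative only).
truth = np.array([1.0, 1.0, 0.0, 0.0])
missed = np.array([1.0, 0.0, 0.0, 0.0])
print(tversky_loss(truth, missed, 0.5, 0.5), tversky_loss(truth, missed, 0.3, 0.7))
```

Raising β above α increases the penalty for missed fault pixels, which is why the Tversky index is attractive when the fault class is rare.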

Local Tversky loss: To balance precision and recall in small regions of interest and to make the loss more sensitive to them, a loss based on the Tversky index with an additional parameter $γ$ has been proposed (Abraham and Khan, 2019, who term it the focal Tversky loss). It is defined as

\begin{array}{c} \mathrm{loss}^{\mathrm{local\_T}} = {(\mathrm{loss}^{T})}^{\frac{1}{γ}} \end{array} (4)

Where $γ$ is in the range of 1–3.
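A minimal sketch of this exponentiated Tversky loss is given below. The default values of `alpha`, `beta` and `gamma` are hypothetical and are not taken from the text, which only states that γ lies between 1 and 3.

```python
import numpy as np

def tversky_loss(y_true, y_pred, alpha, beta, eps=1e-7):
    # Soft Tversky loss summed over all pixels of the patch.
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    tp = np.sum(y_true * y_pred)
    fp = np.sum((1.0 - y_true) * y_pred)
    fn = np.sum(y_true * (1.0 - y_pred))
    return float(1.0 - (tp + eps) / (tp + alpha * fp + beta * fn + eps))

def local_tversky_loss(y_true, y_pred, alpha=0.3, beta=0.7, gamma=1.5):
    """Tversky loss raised to the power 1/gamma (hypothetical default weights)."""
    return tversky_loss(y_true, y_pred, alpha, beta) ** (1.0 / gamma)

truth = np.array([1.0, 1.0, 0.0, 0.0])
pred = np.array([0.8, 0.3, 0.2, 0.1])

base = tversky_loss(truth, pred, 0.3, 0.7)
print(base, local_tversky_loss(truth, pred, gamma=1.0),
      local_tversky_loss(truth, pred, gamma=3.0))
```

For γ = 1 the expression equals the plain Tversky loss; for γ > 1 a loss value between 0 and 1 is raised, so examples that are already fairly well segmented still contribute a noticeable loss.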

Multi-scale Structural Similarity (MS-SSIM) loss: The structural similarity index (SSIM) measures image quality by comparing a processed image with a reference image. However, the SSIM index is a single-scale assessment. To assess image quality more flexibly, a multi-scale structural similarity (MS-SSIM) index was proposed (Wang et al., 2003). It combines the evaluations at $M$ different scales, and the corresponding loss is

\begin{array}{c} \mathrm{loss}^{ms\text{-}ssim} = 1 - {[l_{M} (y, \tilde{y})]}^{α_{M}} \prod_{j = 1}^{M} {[c_{j} (y, \tilde{y})]}^{β_{j}} {[s_{j} (y, \tilde{y})]}^{γ_{j}} \end{array} (5)

Where the subscript denotes the scale at which a term is evaluated, $l (y, \tilde{y}) = \frac{2 μ_{y} μ_{\tilde{y}} + C_{1}}{μ_{y}^{2} + μ_{\tilde{y}}^{2} + C_{1}}$ , $c (y, \tilde{y}) = \frac{2 σ_{y} σ_{\tilde{y}} + C_{2}}{σ_{y}^{2} + σ_{\tilde{y}}^{2} + C_{2}}$ and $s (y, \tilde{y}) = \frac{σ_{y \tilde{y}} + C_{3}}{σ_{y} σ_{\tilde{y}} + C_{3}}$ , with $C_{1} = {(K_{1} L)}^{2}$ , $C_{2} = {(K_{2} L)}^{2}$ and $C_{3} = \frac{C_{2}}{2}$ . Here $μ$ and $σ$ denote the mean and standard deviation of an image and $σ_{y \tilde{y}}$ is the covariance between the two images. In general, $L$ = 255 and $K_{1} ≪ 1$ , $K_{2} ≪ 1$ .
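The following is a deliberately simplified sketch of the multi-scale idea, not a reference MS-SSIM implementation. It assumes global image statistics instead of sliding Gaussian windows, 2 × 2 average pooling between scales, equal exponents 1/M at every scale, a dynamic range L = 1 for probability maps, and the commonly used constants K1 = 0.01 and K2 = 0.03. All function names are hypothetical.

```python
import numpy as np

def ssim_terms(a, b, L=1.0, K1=0.01, K2=0.03):
    # Luminance, contrast and structure terms from global statistics.
    C1, C2 = (K1 * L) ** 2, (K2 * L) ** 2
    C3 = C2 / 2.0
    mu_a, mu_b = a.mean(), b.mean()
    sd_a, sd_b = a.std(), b.std()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    lum = (2 * mu_a * mu_b + C1) / (mu_a ** 2 + mu_b ** 2 + C1)
    con = (2 * sd_a * sd_b + C2) / (sd_a ** 2 + sd_b ** 2 + C2)
    struct = (cov + C3) / (sd_a * sd_b + C3)
    return lum, con, struct

def downsample(img):
    # 2 x 2 average pooling (odd trailing rows or columns are dropped).
    h, w = img.shape
    img = img[: h - h % 2, : w - w % 2]
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def ms_ssim_loss(y_true, y_pred, n_scales=3):
    a = np.asarray(y_true, dtype=float)
    b = np.asarray(y_pred, dtype=float)
    weight = 1.0 / n_scales                # equal exponents (assumption)
    score = 1.0
    for j in range(n_scales):
        lum, con, struct = ssim_terms(a, b)
        # Negative structure terms are clipped so fractional powers stay real.
        score *= max(con, 0.0) ** weight * max(struct, 0.0) ** weight
        if j == n_scales - 1:
            score *= lum ** weight         # luminance only at the coarsest scale
        else:
            a, b = downsample(a), downsample(b)
    return float(1.0 - score)

label = np.eye(16)                                   # toy diagonal "fault"
noise = np.random.default_rng(0).random((16, 16))    # unrelated prediction
print(ms_ssim_loss(label, label), ms_ssim_loss(label, noise))
```

Identical images give a loss of zero, and an unrelated prediction gives a strictly larger loss, which is the behaviour a similarity-based loss should show.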

Intersection over Union (IoU) loss: The IoU index (Rahman and Wang, 2016) is a standard measure of similarity between the predicted and ground-truth images in segmentation. The corresponding loss function is also commonly used in object detection and is written as

\begin{array}{c} \mathrm{loss}^{IoU} = 1 - \frac{|y \cap \tilde{y}|}{|y \cup \tilde{y}|} \end{array} (6)
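For probability maps, the set intersection and union are usually replaced by sums of products, as in the sketch below. This soft relaxation and the `smooth` term are assumptions made here; the function name is illustrative.

```python
import numpy as np

def iou_loss(y_true, y_pred, smooth=1e-6):
    """Soft IoU loss: 1 - |intersection| / |union| computed from sums."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    intersection = np.sum(y_true * y_pred)
    union = np.sum(y_true) + np.sum(y_pred) - intersection
    # smooth keeps the ratio defined for two empty masks (assumption).
    return float(1.0 - (intersection + smooth) / (union + smooth))

truth = np.array([1.0, 1.0, 0.0, 0.0])
pred = np.array([1.0, 0.0, 1.0, 0.0])   # one hit, one miss, one false alarm
print(iou_loss(truth, truth), iou_loss(truth, pred))
```

In the second case the intersection holds one pixel and the union three, so the loss is close to 2/3.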

3 Numerical experiments

3.1 Performance of loss functions on synthetic data

Loss function selection has received little attention in seismic fault detection. In this example, we compare the performance of different loss functions using synthetic data, which lays a foundation for the subsequent work. We use synthetic 2D seismic images with faults and their corresponding fault labels as training samples to train a standard Unet (Wu et al., 2019). The synthetic 2D seismic images and their corresponding labels are shown in Figure 3. To train the neural network, we employ a total of 5,120 samples, of which 80% are used as training samples and the remaining 20% as validation data, while an additional 256 samples are used to test the accuracy of the neural networks.

FIGURE 3. Synthetic 2D seismic images and their corresponding labels: (A,C) are the seismic patches, (B,D) are their corresponding fault labels.

To quantitatively evaluate the performance of different loss functions, we use three evaluation indexes, namely the dice coefficient, sensitivity and specificity, to assess the prediction results of the CNN models. The dice coefficient accounts for overlapping pixels between the predicted and ground-truth images, while sensitivity and specificity are the true positive rate and the true negative rate, respectively. These metrics are calculated using the following equations

\begin{array}{l} DC = \frac{2 TP}{2 TP + FP + FN}, \\ \mathrm{sensitivity} = \frac{TP}{TP + FN}, \\ \mathrm{specificity} = \frac{TN}{TN + FP} \end{array} (7)

Where TP, FP, FN and TN represent the numbers of true positives, false positives, false negatives and true negatives, respectively.
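The three metrics can be computed from a binarized prediction as sketched below. The probability threshold of 0.5 is an assumption, since the text does not state how predictions are binarized, and the function name and toy arrays are illustrative.

```python
import numpy as np

def fault_metrics(y_true, y_prob, threshold=0.5):
    """Dice coefficient, sensitivity and specificity from pixel counts."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_prob) >= threshold   # assumed binarization rule
    tp = int(np.sum(y_true & y_pred))
    fp = int(np.sum(~y_true & y_pred))
    fn = int(np.sum(y_true & ~y_pred))
    tn = int(np.sum(~y_true & ~y_pred))
    # Assumes at least one fault and one non-fault pixel in y_true.
    return {
        "dice": 2 * tp / (2 * tp + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

labels = np.array([1, 1, 1, 0, 0, 0, 0, 0])
probs = np.array([0.9, 0.8, 0.3, 0.7, 0.1, 0.2, 0.1, 0.4])
scores = fault_metrics(labels, probs)   # TP = 2, FN = 1, FP = 1, TN = 4
print(scores)
```

Because non-fault pixels dominate seismic sections, specificity tends to stay high even for mediocre predictions, so the dice coefficient and sensitivity are the more discriminating of the three.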

During training, we set the number of epochs to 30 and use the same optimization settings for every loss function, namely the Adam algorithm with a learning rate of 0.0001. The loss and accuracy curves obtained with the different loss functions are shown in Figure 4, and the three metrics for each loss function are listed in Table 1. The loss and accuracy curves show that the binary cross entropy loss function achieves the lowest error and the highest accuracy. In the prediction results, the binary cross entropy loss gives the best dice coefficient, 0.9101, and the IoU loss is very close with a dice coefficient of 0.9050. For sensitivity, the binary cross entropy loss also surpasses the other loss functions, with the IoU loss following closely. Except for the MS-SSIM loss function, the specificity of the loss functions is almost equal. These metrics are consistent with the accuracy curves in Figure 4B.

FIGURE 4. Loss (A) and accuracy (B) curves using different loss functions. The red, blue, black, green, yellow and cyan lines represent the dice, Tversky, local Tversky, MS-SSIM, IoU and binary cross entropy loss functions, respectively.

TABLE 1. Evaluation metrics by using different loss functions.

From this test, we conclude that it is very difficult for a single loss function to achieve the best score on every index. Although the positive and negative labels in the seismic fault data are imbalanced, the binary cross entropy function outperforms the other loss functions in most metrics and is probably the most robust one. According to Jadon (2020), other loss functions may work better for highly imbalanced data sets. Our further work is nevertheless based on the binary cross entropy loss function, since it performs best in our tests.

3.2 Fault prediction on synthetic data using different architectures

After determining the performance of the different loss functions, we compare the performance of different Unet architectures, including the standard Unet, TransUnet, Swin TransUnet and the dual Unet with transformer. For training, we use the same training data sets as in the loss function comparison. Figure 5 shows the accuracy on the validation data sets for the different Unet models. The accuracy curves show that the proposed dual Unet with transformer model is superior to the other Unet models. For a fair comparison, we select some fault images predicted from the test data set by the different Unet models, which are shown in Figure 6. Comparing the results, we notice that the faults predicted by our proposed model are more accurate than those of the other Unet models. In this experiment, the faults predicted by the traditional Unet model have greater discreteness and less continuity, and the TransUnet seems to produce more artifacts. Our proposed dual Unet model combines the characteristics and properties of the traditional Unet model and the TransUnet model. This example illustrates the superiority of our proposed method and provides a foundation for its application to practical data.

FIGURE 5. Accuracy curves on the validation data sets for different Unet models. The black, blue and green lines indicate the accuracy of swinUnet, standard Unet and transUnet, while the red dashed line is the accuracy of our proposed dual Unet with transformer.

FIGURE 6. Comparison of predicted faults by using different Unet models in the test data set: (A) standard Unet; (B) TransUnet; (C) Swin TransUnet; (D) our proposed dual Unet with Transformer; (E) ground truth label.

It is interesting to note that the purely Swin transformer U-shaped model produces an imperfect prediction. The results predicted by the Swin TransUnet model differ obviously from those of the other models. The emphasis on global feature extraction makes it difficult for the linear mapping in the Swin transformer blocks to account for the local continuity of seismic events, and the accuracy curve of Swin TransUnet on the validation data sets supports this view. At present, we doubt whether Swin TransUnet can outperform other methods in the seismic fault detection task, as it has been reported to do in medical image segmentation (Cao et al., 2021). Fortunately, TransUnet maintains good accuracy compared with the standard Unet. For this reason, we choose to merge TransUnet with the standard Unet to build a merged Unet architecture, and the predicted accuracy and fault images support this choice.

4 Application of real data

For fault prediction on real seismic data, we select a shallow sea area in the southwest of Bohai Bay where faults are relatively well developed. In terms of regional structure, the study area is located in the east of the low uplift in the Chengbei area, at the junction of the Bohai Depression and the Jiyang Depression. To the south is the Zhendong Depression, to the north is the Bohai Depression, and to the east and west are the Chengbei low uplift and the Bonan low uplift, respectively. The study area is rich in hidden mountains, which have experienced the evolution stages of ancient platform development, Triassic platform disintegration, Yanshan rapid deformation, ancient Quaternary faulting, and recent Quaternary depression. The internal structure of the hidden mountain belt is quite complex, with a large number of folds and fault structures, as shown in Figure 7. Therefore, characterizing and describing the faults in this study area is of great significance for understanding the evolution of the hidden mountains and predicting oil and gas resources.

FIGURE 7. Geological background of the research area.

After training on synthetic data is complete, we apply our pretrained Unet models to seismic fault prediction on the real data. The real seismic section is shown in Figure 8, where some faults that are easy to recognize directly have been marked by red lines. We apply the pretrained Unet models to predict the seismic faults directly, without transfer learning, which is a tough challenge. The predicted fault probabilities overlaid on the seismic section are shown in Figure 9. For some large-scale faults such as fault F1, the three Unet models generate similar results. For fault F3, the TransUnet model predicts it only intermittently or hardly at all. Perhaps because it inherits the ability of the standard Unet model, our proposed dual Unet with transformer produces clearer fault lines than the standard Unet and TransUnet models, especially for fault F4 at 1.2–1.4 s. To further compare the fault prediction performance, we enlarge the red dashed box (F4) in Figure 9 and display it in Figure 10; our proposed model clearly yields a better quality of fault prediction than the other two networks. Because the Swin TransUnet model did not obtain ideal results in the synthetic example, we do not include it in the practical application. To compare neural network methods with traditional fault identification methods in practice, we also process the real data with the classical coherence cube algorithm (Bahorich and Farmer, 1995). As shown in Figures 9, 10, the neural network methods provide a clearer and more continuous characterization of the faults than the conventional method. This also demonstrates the necessity and value of in-depth research on neural network methods.

FIGURE 8. Original seismic section with manually interpreted faults.

FIGURE 9. Predicted faults by using different methods: (A) classical coherence cube algorithm; (B) conventional Unet; (C) TransUnet; (D) our proposed dual Unet with transformer.

FIGURE 10. Enlarged images of the red dashed box in Figure 8: (A) classical coherence cube algorithm; (B) conventional Unet; (C) TransUnet; (D) our proposed dual Unet with transformer.

It is worth noting that the seismic fault prediction of actual seismic data using our proposed model is not performed using transfer learning. The predicted results supply hard evidence to prove that our proposed model has a better generalization than the standard Unet and TransUnet models, and it is of great significance for seismic fault prediction of practical data.

5 Discussion

One aim of this work is to emphasize the significance of the loss function in deep learning. Loss functions are crucial in determining the performance of a model. However, for complex objectives like segmentation, it is not feasible to choose a single, universal loss function. The optimal loss function depends mostly on the properties of the training dataset, such as distribution, skewness, and boundaries, and none of the existing loss functions is superior in all use cases. Specifically, the binary cross entropy function performs well in our cases, but we do not think this conclusion applies to all deep learning problems. It may perform well for the fault detection task, while other loss functions may be more effective for different deep learning tasks. In our opinion, each deep learning task needs to be analyzed in detail. For example, determining which loss function gives the best performance for noise suppression in seismic data would require dedicated numerical experiments and comparative studies. For medical image semantic segmentation tasks, Jadon (2020) has made a detailed comparison of the performance of different loss functions and reached a conclusion that differs from ours for the fault detection task.

For 3D seismic data, the large data volume makes it difficult to complete interpretation tasks efficiently and quickly by hand. Fully automatic or semi-automatic computer interpretation solutions have therefore received increasing attention and research. In theory, 3D seismic data can be regarded as a stack of multiple 2D sections. For seismic data interpretation, we believe that successful application to 2D seismic data is the basis for application to 3D seismic data. Therefore, the feasibility and effectiveness of the proposed fault recognition method were first verified on 2D seismic data in this article. The development of 3D Transformer-based fault recognition technology is also one of our future research directions. In 3D medical image semantic segmentation, some scholars have already developed 3D Transformer models, which can provide references for our future research on 3D Transformer-based fault detection (Hatamizadeh et al., 2022; Liang et al., 2022). However, a 3D Transformer model for fault recognition will need to be designed according to the characteristics of seismic data.

6 Conclusion

To address the problem of seismic fault identification, and after analyzing the shortcomings of the convolution block and the transformer block, we integrate a standard Unet model with a TransUnet model and develop a dual Unet with transformer. To determine which loss function makes CNN models converge quickly and perform best, we carried out a numerical example comparing six loss functions and find that the binary cross entropy loss function has superior performance in the task of seismic fault detection. In addition, synthetic data are employed to compare the performance of different architectures. Through seismic fault prediction and qualitative comparison on the synthetic case, the predicted results demonstrate that our proposed dual Unet with transformer obtains a more accurate and convergent fault prediction than the standard Unet, TransUnet and Swin TransUnet models. In the application to real data, our proposed model depicts the fault system more clearly and generates a higher quality fault prediction image than the standard Unet, TransUnet and the classical coherence cube algorithm, which supports its practical application value.

Data availability statement

The raw data supporting the conclusion of this article will be made available by the authors, without undue reservation.

Author contributions

JY: conception and design of the study, manuscript editing; ZW: manuscript writing and revising, data processing; WL: manuscript reviewing and editing; XW: provided the real seismic data and suggestions.

Funding

This research is supported by National Natural Science Foundation of China (Grant Nos 42050104, 42030812 and 42004103).

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The reviewer SY declared a shared affiliation with the author ZW to the handling editor at time of review.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

Abraham, N., and Khan, N. M. (2019). “A novel focal Tversky loss function with improved attention U-net for lesion segmentation,” in 2019 IEEE 16th international symposium on biomedical imaging (ISBI 2019). 683–687.
