Article

Ensemble Transfer Learning for Fetal Head Analysis: From Segmentation to Gestational Age and Weight Prediction

1 College of Science and Engineering, Hamad Bin Khalifa University, Doha P.O. Box 34110, Qatar
2 Sidra Medical and Research Center, Sidra Medicine, Doha P.O. Box 26999, Qatar
* Authors to whom correspondence should be addressed.
† These authors contributed equally to this work.
Submission received: 12 August 2022 / Revised: 25 August 2022 / Accepted: 26 August 2022 / Published: 15 September 2022
(This article belongs to the Special Issue Artificial Intelligence in Clinical Medical Imaging Analysis)

Abstract

Ultrasound is one of the most commonly used imaging methodologies in obstetrics to monitor the growth of a fetus during the gestation period. Specifically, ultrasound images are routinely utilized to gather fetal information, including body measurements, anatomical structure, fetal movements, and pregnancy complications. Recent developments in artificial intelligence and computer vision provide new methods for the automated analysis of medical images in many domains, including ultrasound images. We present a full end-to-end framework for segmenting, measuring, and estimating fetal gestational age and weight based on two-dimensional ultrasound images of the fetal head. Our segmentation framework is based on the following components: (i) eight segmentation architectures (UNet, UNet Plus, Attention UNet, UNet 3+, TransUNet, FPN, LinkNet, and Deeplabv3) fine-tuned using the lightweight network EfficientNetB0, and (ii) a weighted voting method for building an optimized ensemble transfer learning model (ETLM). On top of that, the ETLM was used to segment the fetal head and to perform accurate analytic measurements of the circumference and seven other values of the fetal head, which we incorporated into a multiple regression model for predicting the week of gestational age and the estimated fetal weight (EFW). We finally validated the regression model by comparing our results with those of an expert physician and a longitudinal reference. We evaluated the performance of our framework on the public domain dataset HC18: we obtained 98.53% mean intersection over union (mIoU) as the segmentation accuracy, outperforming state-of-the-art methods; for measurement accuracy, we obtained a 1.87 mm mean absolute difference (MAD). Finally, we obtained a 0.03% mean square error (MSE) in predicting the week of gestational age and a 0.05% MSE in predicting the EFW.


1. Introduction

Ultrasonic imaging, also known as ultrasound, is frequently utilized in clinical assessment since it does not involve ionizing radiation and is less expensive than computed tomography (CT) and magnetic resonance imaging (MRI) [1]. Women usually have one to three ultrasounds during pregnancy; if the woman is pregnant with twins or is at high risk, ultrasounds may be required more frequently [2]. Ultrasound may be utilized in various prenatal diagnostic situations, including: confirming the pregnancy and the position of the fetus, calculating the gestational age of the fetus, verifying the number of fetuses, examining fetal development, examining the placenta and the amount of amniotic fluid, identifying congenital disabilities, looking into complications, and other prenatal tests [3]. When ultrasound is routinely used in early pregnancy, it results in earlier detection of problems and improved management of pregnancy complications, which is preferable to relying on clinical indicators such as bleeding in early pregnancy [4]. Halle et al. [5] reported that 1111 women received prenatal treatment at primary care health centers in their health cohort: 95% of women reported having at least one fetal ultrasound scan prior to the 19th-week scan, 64% reported having two or more scans during this period, and 78% decided to participate in the week 11–14 screening for fetal abnormalities. Therefore, ultrasound is the preferred option for prenatal care compared to other imaging modalities, because it allows for the recognition and measurement of anatomical structures that can be used as guidelines for physician assessment of fetal health status [3].
Many clinical ultrasonography diagnostics necessitate the use of anatomical structure measurements that are clear and reliable. These measurements are used to estimate fetal gestational age and weight, which is essential for monitoring growth patterns during pregnancy [6]. Abdominal circumference (AC), femur length (FL), crown–rump length (CRL), occipitofrontal diameter (OFD), biparietal diameter (BPD), and head circumference (HC) are some of the biological characteristics that may be measured during a prenatal checkup [7]. In the 13th to 25th week of pregnancy, obstetricians and gynecologists may calculate the fetus’s gestational age and weight, evaluate the fetus’s growth, and decide if aberrant head development is suspected, by measuring the fetus’s HC [8]. When measuring HC in clinical practice, the procedure is performed manually by either overlaying an ellipse on the fetal skull or by recognizing landmarks that delimit the central head axis. Despite this practice, the manual delineation raises concerns about measurement repeatability and time consumption, since ultrasound imaging is prone to various errors, including motion blurring, missing borders, acoustic shadows, speckle noise, and a low signal-to-noise ratio [9]. As a result, interpreting ultrasound images becomes extremely difficult, necessitating the use of skilled operators. Figure 1 shows ultrasound image samples that are noisy and indistinct, with an incomplete head contour; additionally, the fetal skull is not evident enough to be detected in the first trimester, as indicated in the samples obtained from the public dataset [10].
Traditional approaches for fetal biometric segmentation and measurement have been under investigation for the past decade. The development of these approaches has increased workflow efficiency by lowering the number of steps required for routine fetal measurements and reducing examination time [6]. The randomized Hough transform [11], semi-supervised patch-based graphs [12], multilevel thresholding circular shortest paths [13], boundary fragment models [14], Haar-like features [7], active contouring [15], morphological operators [16], the difference of Gaussians [17], and deformable models [18] have all been used in previous HC measurement studies.
With the advancement of deep learning technology in recent years, integrating medical images and artificial intelligence has emerged as a popular study area in medicine [19]. Convolutional neural networks (CNNs) have rapidly gained popularity as a powerful tool for many image processing applications, including classification, object identification, segmentation, and registration, among others [20]. As a result, the field of medical image segmentation is exploding with new applications. A few representative designs of CNNs are fully convolutional networks (FCNs) [21], UNet [22], and three-dimensional VNet [23].

1.1. Contributions

Numerous challenges remain for the prior traditional and deep learning methods, including segmenting regions with missing edges, the absence of textural contrast, the specification of a region of interest (ROI), and background detection. These difficulties can be overcome using ensemble learning. Nowadays, CNNs are evolving towards lightweight architectures that can be integrated into edge computing frameworks [24], but the previously mentioned techniques require long training times, large numbers of network parameters, high image resolution, and costly resources to run a heavy model. However, these issues may be mitigated by fine-tuning a pre-trained lightweight network. Finally, earlier studies did not explore the feasibility of utilizing machine learning and segmented image measurements to determine fetal gestational age (GA), estimated fetal weight (EFW), and abnormality signs. In this regard, this work proposes a complete pipeline for automatically segmenting and measuring the fetal head in two-dimensional (2D) ultrasound images, followed by a prediction of the fetal gestational age and weight. Below is a summary of the technical contributions:
  • We fine-tuned eight segmentation networks using a pre-trained lightweight network (EfficientNetB0) and employed weighted voting ensemble learning on the trained segmentation networks to obtain the optimal segmentation result.
  • We extensively evaluated the ensemble transfer learning model (ETLM) by performing three-level evaluations: fetal head segmentation evaluation, predicted mask and post-processing quality assessment, and head measurement evaluation.
  • We generated a new fetal head measurement dataset and manually labeled it by adding fetal gestational age and weight.
  • We trained multiple regression models to predict fetal GA and EFW to address the limitations of the current formulas (Equations (21) and (22)).
  • We evaluated the regression model results against an expert obstetrician and a longitudinal reference using Pearson’s correlation coefficient (Pearson’s r).

1.2. Organization

The paper is organized as follows: Section 2 discusses relevant research on fetal head segmentation, HC measurement, and fetal GA and EFW calculation. Section 3 describes the dataset and our methodology pipeline in depth. Section 4 contains details about the experiments and evaluation methods. Section 5 presents the results, a discussion, and a comparison with state-of-the-art works. Section 6 highlights the strengths and limitations of the research. Finally, Section 7 covers the conclusion and future work.

2. Related Work

Our work deals with fetal head segmentation using traditional approaches and deep learning, HC measurement, and the calculation of GA and EFW. It is impossible to provide here an extensive overview of the literature related to these topics; we refer readers to the surveys and reviews [4,25,26,27]. In the following, we discuss the methods that are most closely related to our work.

2.1. Fetal Head Segmentation

2.1.1. Traditional Approaches

Many works have used a variety of machine learning algorithms for fetal head segmentation. One example is the probabilistic boosting tree (PBT), which has been utilized for AC measurement [28]. A randomized Hough transform approach developed by Lu et al. [29] has been used to recognize incomplete ellipses in images with severe noise; however, their method may fail to detect the fetal head in low-contrast ultrasound images. Zhang et al. [30] developed multi-scale and multi-directional filter banks to extract anatomical structures and texture characteristics from fetal ultrasound images. Li et al. [31] used prior knowledge of the fetal head circumference to obtain the region of interest with a random forest and detected the fetal head edge with phase symmetry; they found that their method performed poorly at fitting the fetal skull in ultrasound images with partially missing features taken in late pregnancy. A more complex approach [10] retrieved the HC by using Haar-like features to train a random forest classifier to detect the fetal skull, and employed the Hough transform, dynamic programming, and elliptical fitting. Even though these approaches produced promising findings, they were only tested on small datasets from specific pregnancy trimesters, and fetal ultrasound images at different stages of pregnancy vary in their inherent characteristics. Therefore, the efficiency and accuracy of current traditional methods for automatic fetal head segmentation and HC biometry need to be improved, because with their current limitations they are not adequate for accurate and reliable diagnosis by physicians.

2.1.2. Deep Learning

Deep learning techniques began to grow in popularity because of advancements in technology, achieving significantly better performance in image processing tasks. In particular, CNNs have emerged as a top choice for medical image classification, segmentation, and object detection [4]. UNet [22] is a network often used for biomedical image segmentation because of the symmetric structure observed in the images, allowing for the efficient use of skip connection layers and reduced computing complexity. First, a feature map is extracted from an image via the encoders in the UNet architecture; then, the decoders concatenate their corresponding encoded feature maps to extract even more spatial information from the image. Several modified U-shaped networks [32,33,34] have been used to segment fetal ultrasound images and have achieved notable results. The segmented images obtained can be utilized to detect the elliptic fetal skull and calculate the fetal HC. Sobhaninia et al. [32] proposed a multi-task deep network structure based on the LinkNet topology, segmenting fetal ultrasound images using LinkNet [35] capabilities. Their experimental results revealed that multi-task learning yields better segmentation outcomes than a single-task network. Qiao and Zulkernine [36] presented an extended UNet model [22] with dilated convolution layers and Squeeze-and-Excitation (SE) blocks to enhance segmentation of the fetal skull border and skull in 2D ultrasound images. They used dilated convolutions, extracting features from a more extensive spatial range to detect edges without increasing model complexity, and to measure the fetal HC.
Desai et al. [37] proposed the DUNet architecture based on UNet. The image and its scattering coefficients (SC) are the inputs to DUNet, each with its own encoder; the encoders’ outputs are combined and sent to a single decoder, eliminating data augmentation and reducing the training time. Aji et al. [38] utilized UNet with pixel-wise classification to increase ROI image classification performance, dividing each pixel into four classes: maternal tissue with horizontal direction patterns, upper head borders with concave arc patterns, lower head boundaries with convex arc patterns, and the rest. The LinkNet network [35] served as inspiration for the multi-scale and low-complexity structure of the network proposed by Sobhaninia et al. [39], who lowered the number of convolutional layers in mini-LinkNet: the LinkNet network includes four encoder blocks, whereas the mini-LinkNet network has just three, which appears to be more efficient while still retaining image characteristics. These researchers demonstrated that employing a light network for fetal head segmentation can achieve the intended result. Brahma et al. [40] proposed accurate binary DSCNNs for medical image segmentation, in which the networks’ encoder and decoder structures are binarized using parameter-free skip connections; asymmetric encoder–decoder DSCNNs, feature pyramid networks with asymmetric decoders, and spatial pyramid pooling with atrous convolutions were evaluated on fetal head images. A deeply supervised attention-gated (DAG) VNet method was introduced by Zeng et al. [41] for automated two-dimensional ultrasound image segmentation of the fetal head: attention gates (AGs) and deep supervision were added to the original VNet architecture, and multi-scale loss functions for deep supervision were also introduced. The DAG VNet technique increased segmentation accuracy while increasing convergence speed by including the attention mechanism and deep supervision strategy. Xu et al. [42] proposed a vector self-attention layer (VSAL) and a context aggregation loss (CAL) in a CNN. Geometric priors and multi-scale calibration were developed for long-range spatial reasoning; unlike nonlocal neural networks, VSAL can concurrently attend to spatial and channel information and considers multi-scale information by applying geometric priors and multi-scale calibration. CAL, introduced as an additional benefit to VSAL, analyzes global contextual information and intra- and inter-class dependencies. Using VSAL as the backbone to replace the convolutional layers, the proposed approach outperforms various mainstream methods on prenatal images and shows adaptability to various segmentation networks. Skeika et al. [43] presented an innovative approach for automatically segmenting the fetal head in 2D ultrasound images. The suggested approach, called VNet-c, builds on the original VNet [23] with several modifications, including pre-processing, batch normalization, dropout, data augmentation, a new loss function, and network depth adjustments; the method’s performance was evaluated quantitatively using negative and positive rates. Fetal head and abdomen segmentation in ultrasound images was performed by Wu et al. [44] using a cascaded FCN in combination with context information. Sinclair et al. [45] used a VGG-16 FCN to segment the fetal head in ultrasound images taken during the second trimester. Object detection has also been applied to fetal ultrasound images, using fast region-based convolutional neural networks (R-CNN) and FCNs: Al Bander et al. [46] developed a method to identify the fetal head boundary using a combination of fast R-CNN and FCN that included target localization and segmentation.
None of the works mentioned above considered resource constraints and training time. To the best of our knowledge, this is the first attempt to employ ensemble transfer learning for fetal head segmentation and to develop a lightweight model requiring fewer resources and less training time without sacrificing model accuracy.

2.2. Fetal Head Measurement

Various methods have been proposed to derive accurate geometric measurements, such as the head circumference and radii, from segmentation masks. In general, most methods adopt an elliptical model to represent the fetal head shape. Zhang et al. [47] proposed a method that estimates the HC from ultrasound images without segmentation: their technique uses a regression CNN, for which they tested four networks of varying complexity and three regression losses; it is the first direct measurement of fetal head circumference without segmentation. Fiorentino et al. [48] proposed a region-proposal CNN for head localization and centering, followed by a regression CNN for precise HC delineation; the regression CNN is trained on distance fields, which smooth the HC line and make the task of directly regressing it easier. Skeika et al. [43] used their own algorithm to calculate the HC from the predicted mask. Zeng et al. [41] used fitted ellipses to calculate HC biometric measurements based on the following formula:
$$HC = 2\pi \times SemiAxis_b + 4 \times (SemiAxis_a - SemiAxis_b)$$
where $SemiAxis_a$ and $SemiAxis_b$ are the major and minor semi-axes of the ellipse. Qiao and Zulkernine [36] and Li et al. [49] used the direct least squares fitting of ellipses to measure the HC. A generic conic can be expressed as a second-order polynomial such as the following:
$$F(\mathbf{a}, \mathbf{x}) = \mathbf{a} \cdot \mathbf{x} = a x^2 + b x y + c y^2 + d x + e y + f = 0$$
where $\mathbf{a} = (a, b, c, d, e, f)^T$ and $\mathbf{x} = (x^2, xy, y^2, x, y, 1)^T$. Aji et al. [38] used an ellipse fitting method comparable to the ElliFit method, in which the median value of the largest area’s edge points is sought. Following these operations, five ellipse parameters are acquired and used for the elliptical representation. Once the two semi-axes have been calculated, they are multiplied by the pixel size of the input image. After obtaining the parameters, the HC can be approximated by computing the ellipse perimeter using the following formula:
$$HC = 0.5 \times \pi \times (a + b)$$
where a and b are the major and minor axes of the ellipse.
In this work, we propose a geometry fitting framework for computing fetal head measurements, composed of the following processing steps: smoothing, parameterization, resampling, a non-linear least squares minimization process for fitting an explicit model, and the computation of accurate geometric distances between points. The model is parameterized in such a way that the Jacobian and the geometric parameters of the best-fit ellipse can be computed in closed form.

2.3. GA and EFW Calculation

In general, the first day of the last menstrual period (LMP) is used to calculate the gestational age (GA). However, in around 40% of pregnancies, the LMP is unknown or unreliable [50]. Ultrasound provides more reliable information on GA and is widely acknowledged as the preferred approach, determining GA more accurately than physical examination in most pregnancies. During the first trimester, the mean gestational sac diameter and crown–rump length (CRL) are used to determine the GA. Measurements of the fetal head, torso, and extremities are most frequently used in the second and third trimesters: a combination of the BPD, HC, abdominal circumference (AC), and femur length (FL) are the typically measured parameters used to calculate the GA [51]. Many other variables have been examined and linked to GA, but few increase the accuracy of GA estimation [52].
In fetal medicine, the ultrasound estimation of fetal weight (EFW) is essential for prenatal care. The EFW helps the physician determine whether fetuses are the proper size for their gestational age (GA), small (SGA), or large (LGA) [53]. The EFW is calculated from the HC, BPD, FL, and AC measurements. The formulas of Hadlock et al. [54] were the most accurate, with the lowest Euclidean distance and an absolute mean error below 10%. Hadlock et al. [54] (Equation (4)) used HC, AC, and FL measurements with or without the BPD, and found a robust connection between birth weight and EFW based on HC, AC, and FL measurements [55].
$$\log_{10}(EFW) = 1.326 - 0.00326 \times AC \times FL + 0.0107 \times HC + 0.0438 \times AC + 0.158 \times FL$$
where $AC$, $FL$, and $HC$ are the measurements mentioned in the previous paragraph. To the best of our knowledge, this is the first study to employ a machine learning regression model to predict fetal GA and EFW based on the fetal head alone, without the need for other measures such as AC and FL.
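For concreteness, Hadlock’s formula transcribes directly into code. The following is a minimal sketch, assuming measurements in centimeters and a weight in grams as in the original reference [54]; verify the unit conventions before any reuse.

```python
import math

def hadlock_efw(hc_cm, ac_cm, fl_cm):
    """Estimated fetal weight (grams) from Hadlock's formula (Equation (4))."""
    log10_efw = (1.326 - 0.00326 * ac_cm * fl_cm
                 + 0.0107 * hc_cm + 0.0438 * ac_cm + 0.158 * fl_cm)
    return 10.0 ** log10_efw
```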

3. Materials and Methods

3.1. Methodology

Figure 2 illustrates the workflow of the full end-to-end pipeline proposed to achieve the main contribution of this paper. The pipeline components are organized into three main blocks, which can be subdivided as follows:
  • Automatic segmentation: takes an ultrasound image as input and outputs a binary mask representing the fetal head.
    (a) Eight segmentation models are fine-tuned independently using the pretrained CNN EfficientNetB0 as the feature extractor.
    (b) The segmentation predictions of these models are integrated through the ETLM.
  • Measurements extraction: from the automatically computed and smoothed binary mask, we fit an analytic explicit ellipse model that we use for computing the geometric measurements of interest, such as the semi-axes and head orientation.
    (a) Image post-processing and smoothing.
    (b) Fetal head measurement.
  • GA and EFW prediction: from measurements and manual annotations, we fit a regression model that is able to predict GA and EFW, which we validate clinically.
    (a) Generate and label a new GA and EFW dataset.
    (b) Train multiple regression models on the new dataset.
    (c) Perform clinical and longitudinal study validation.
In the following, we first describe the dataset used in this study and then detail the various components of the framework.

3.2. Dataset

The dataset used to evaluate the suggested approach is available on the Grand Challenge HC18 website (https://hc18.grand-challenge.org/, accessed on 21 May 2022). Table 1 shows the distribution of the dataset across the trimesters of pregnancy. The dataset consists of ultrasound images: a training set of 999 images with a CSV file containing the HC and pixel size of each image, and a test set of 335 images with a CSV file containing only the pixel size of each image. These images were taken from 551 women throughout their first, second, and third trimesters of pregnancy. The images were acquired at the Radboud University Medical Center’s Department of Obstetrics in Nijmegen, the Netherlands, using the Voluson E8 and the Voluson 730 (General Electric, Boston) [10]. All data were collected and anonymized by qualified sonographers following the Declaration of Helsinki, and the local ethics commission (CMO Arnhem-Nijmegen) authorized the data collection and usage for research purposes. Each image was 800 × 540 pixels in size, with pixel sizes varying from 0.052 to 0.326 mm due to sonographer adjustments to accommodate varying fetus sizes. The sonographer manually annotated each image by drawing an ellipse corresponding to the skull portion. The unique issues in the images are depicted in Figure 1: the difficulties include the head appearing at variable locations in the image, incomplete ellipses, and the fetal head’s dimensions fluctuating over the gestational trimesters.
We augmented the dataset to increase the network’s resilience, prevent overfitting of the training data, and improve the network’s generalization ability. Nine images were generated for each image and mask in the training set using the augmentation library in [56]. The final augmented training set includes: (1) Center Crop, (2) Random Rotate, (3) Grid Distortion, (4) Horizontal Flip, (5) Vertical Flip, (6) Random Brightness, (7) Sharpen, (8) Affine Transformation, (9) Fancy PCA, and (10) the original image. The total training set thus became 9990 images and 9990 masks.
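The nine transformations above can be realized with a standard augmentation library. The following is a minimal sketch assuming the Albumentations library (which [56] is consistent with); the exact parameters below are illustrative, not the paper’s.

```python
import albumentations as A

# One transform per augmentation listed above; masks receive the same
# spatial transforms as their images when passed through A.Compose.
# Images are assumed loaded as 3-channel arrays (FancyPCA expects RGB).
transforms = [
    A.CenterCrop(height=480, width=720, p=1.0),
    A.Rotate(limit=30, p=1.0),
    A.GridDistortion(p=1.0),
    A.HorizontalFlip(p=1.0),
    A.VerticalFlip(p=1.0),
    A.RandomBrightnessContrast(contrast_limit=0.0, p=1.0),  # brightness only
    A.Sharpen(p=1.0),
    A.Affine(scale=(0.9, 1.1), translate_percent=0.05, p=1.0),
    A.FancyPCA(p=1.0),
]

def nine_fold_augment(image, mask):
    """Yield nine augmented (image, mask) pairs for one training pair."""
    for t in transforms:
        out = A.Compose([t])(image=image, mask=mask)
        yield out["image"], out["mask"]
```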

3.3. Ensemble Transfer Learning Model (ETLM)

3.3.1. Transfer Learning

Transfer learning is the capacity of a system to recognize and apply information learned in one domain to another. Transfer learning has three levels: first, full-adaptation uses a pre-trained network’s weights and updates all of them during training; second, partial-adaptation starts from a pre-trained network but freezes the first few layers’ weights and updates only the final layers during training; third, zero-adaptation uses a pre-trained model to set the weights of the whole network without updating any layers [57].
This work took weights from a lightweight network (EfficientNet) and then fine-tuned them on prenatal ultrasound images. Because the dataset consists of medical images, the full-adaptation approach was used. To ensure that the best model was selected for low cost and efficiency, the lightweight EfficientNet [58] versions from B0 to B3 were evaluated, and EfficientNetB0 was selected based on the obtained results. EfficientNetB0 was used as the backbone (encoder) for the different segmentation networks; therefore, the last block, which includes the dense layer, was removed, as seen in Figure 3.
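As an illustration, full-adaptation fine-tuning with EfficientNetB0 as the encoder can be set up as follows. This is a hedged sketch assuming the open-source segmentation_models Keras package; the paper does not name its implementation, so treat every call and parameter here as illustrative.

```python
import segmentation_models as sm

# Full adaptation: initialize from ImageNet weights and keep every
# layer trainable (encoder_freeze=False), as described above.
model = sm.Unet(
    backbone_name="efficientnetb0",
    encoder_weights="imagenet",
    encoder_freeze=False,
    classes=2,                  # background / fetal head
    activation="softmax",
    input_shape=(128, 128, 3),
)
model.compile(
    optimizer="rmsprop",
    loss=sm.losses.categorical_focal_jaccard_loss,
    metrics=[sm.metrics.iou_score],
)
```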

3.3.2. Ensemble Learning

Many artificial intelligence applications have significantly benefited from the use of ensemble learning, a machine-learning approach that uses numerous base learners to construct an ensemble learner for improved generalization of the learning system. A voting ensemble (sometimes known as a “majority voting ensemble”) is a type of ensemble machine learning model that incorporates predictions from several other models to arrive at a final prediction [59]. When applied effectively, it can help models perform better than any of the individual models. Voting ensembles combine the results of numerous models to arrive at a final result. For example, the predictions for each label are added together, and the label with the most votes is predicted. Almost the same results were obtained across all segmentation models in our study. Therefore, using a voting ensemble is practical when two or more models perform well on a predictive modeling task.
The models must agree on most of their predictions for the ensemble to work. Hence, each model’s contribution is proportional to its capacity or competence in a weighted average or weighted sum ensemble. A weighted average prediction begins by assigning each ensemble member a fixed weight coefficient [60], which may be represented as a floating-point number in the range of 0 to 1. Consider a case of three segmentation models with fixed weights of 0.2/0.3/0.4, where larger weights indicate a better-performing model. The ideal average weight can be determined using classification accuracy or negative error, depending on the competence of each model. In this work, we used the Intersection over Union (IoU) to determine the optimal average weight for each of our eight segmentation models. The following equation is the basis of weighted voting ensemble learning:
$$\hat{y} = \arg\max_j \sum_{i=1}^{n} W_i P_{i,j}$$
where $P_{i,j}$ is the predicted class membership probability of the $i$-th classifier for class label $j$, and $W_i$ is the optimal weighting parameter.
The weighted voting method was applied to eight segmentation models to find the optimal average weights for the final prediction. The segmentation models include UNet [22], UNetPlus [61], AttUNet [62], UNet 3+ [63], TransUNet [64], Feature Pyramid Network (FPN) [65], LinkNet [35], and DeepLabv3 [66]. All models were trained with the same parameters. Further, a hyperparameter tuning method [67] was applied to select a set of optimal hyperparameters, including the optimizer, learning rate, loss function, and trainable parameters, for the eight models, as seen in Table 2.
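To make the voting step concrete, the following is a minimal NumPy sketch of weighted soft voting over per-pixel softmax outputs, with a simple grid search over normalized weights scored by IoU; the paper does not specify its weight-search procedure, so this search strategy is an assumption.

```python
import itertools
import numpy as np

def weighted_vote(probs, weights):
    """probs: list of (H, W, C) softmax maps, one per model."""
    combined = np.tensordot(weights, np.stack(probs), axes=1)
    return combined.argmax(axis=-1)            # (H, W) label map

def iou(pred, gt, cls=1):
    p, g = pred == cls, gt == cls
    return (p & g).sum() / max((p | g).sum(), 1)

def search_weights(probs, gt, step=0.1):
    """Grid search over weights summing to 1, maximizing foreground IoU."""
    best_score, best_w = -1.0, None
    grid = np.round(np.arange(0.0, 1.0 + step, step), 10)
    for w in itertools.product(grid, repeat=len(probs)):
        if abs(sum(w) - 1.0) > 1e-9:
            continue
        score = iou(weighted_vote(probs, np.array(w)), gt)
        if score > best_score:
            best_score, best_w = score, w
    return best_w, best_score
```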

3.3.3. Image Pre-Processing

As seen in Table 2, three image preprocessing steps were applied to eliminate undesirable distortions and to highlight certain image features. The three steps can be summarized as follows:
  • Normalization: the ultrasound image intensity range is 0 to 255. Therefore, we applied a normalization technique for shifting and rescaling values to fit in the range between 0 and 1. The normalization formula is as follows:
    $$Z = \frac{X - X_{min}}{X_{max} - X_{min}}$$
    where $Z$ is the normalized value, $X$ the original value, $X_{min}$ the minimum value, and $X_{max}$ the maximum value in the image.
  • Resizing: the original image and mask size is 800 × 540 pixels; the images and masks were resized into two different sizes, 64 × 64 and 128 × 128, and the two input sizes were compared to evaluate the lightweight models and to use low-cost resources. In addition, while the original mask intensity took only the two values 0 and 255, after resizing, the mask intensities ranged anywhere between 0 and 255. Therefore, the resized masks had to be thresholded back to the original intensities, where 0 represents black pixels and 255 represents white pixels. Finally, Softmax [68] was used as the output function; therefore, we encoded the mask values to 0 for black and 1 for white pixels.
  • One-hot encoding: one-hot encoding is not often used with numerical values (images). In this study, because the output function is Softmax and the loss function is the categorical focal Jaccard loss, one-hot encoding is recommended. The class representing white pixels is (0, 1), and the class representing black pixels is (1, 0). A sketch of these three steps follows the list.
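As noted above, the three preprocessing steps can be sketched as follows; the OpenCV/NumPy calls and the 127 threshold are assumed realizations of the text, not the paper’s exact code.

```python
import cv2
import numpy as np

def preprocess(image, mask, size=128):
    # 1. Min-max normalization of the ultrasound image to [0, 1].
    img = image.astype(np.float32)
    img = (img - img.min()) / (img.max() - img.min())
    img = cv2.resize(img, (size, size))

    # 2. Resize the mask, then restore a strict binary intensity,
    #    since interpolation introduces values between 0 and 255.
    m = cv2.resize(mask, (size, size))
    m = (m > 127).astype(np.uint8)           # 0 = black, 1 = white

    # 3. One-hot encode: black -> (1, 0), white -> (0, 1), matching
    #    the Softmax output and the categorical focal Jaccard loss.
    onehot = np.eye(2, dtype=np.float32)[m]  # (size, size, 2)
    return img, onehot
```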

3.3.4. Hybrid Loss Function and Optimizer

As part of the ensemble transfer learning process, selecting the appropriate loss function increased segmentation accuracy at subsequent inference time. Various loss functions have been used for medical image segmentation [69]; this work used hyperparameter tuning to select the best loss function based on the IoU score. The optimal loss function was the categorical focal Jaccard loss (CFJL), a combination of the categorical focal loss (CFL) [70] and the Jaccard loss (JL) [71], as defined below:
$$CFL(GT, PR) = -GT \cdot \alpha \cdot (1 - PR)^{\gamma} \cdot \log(PR)$$
$$JL(A, B) = 1 - \frac{|A \cap B|}{|A \cup B|}$$
$$CFJL = CFL + JL$$
Among the different optimizers, Adam and RMSProp [72] achieved accurate segmentation; the loss values of the Adam and RMSProp optimizers were lower than those of the others. However, RMSProp with a scheduled learning rate and step decay, which drops the learning rate (LR) by a factor every few epochs, outperformed Adam. The step decay learning rate is defined as below:
$$LR = InitialLR \times drop^{\left\lfloor epoch / epochDrop \right\rfloor}$$
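The schedule can be implemented as a standard Keras callback. The following is a minimal sketch; the initial rate, drop factor, and epoch interval are illustrative values, not the paper’s tuned hyperparameters.

```python
import math
from tensorflow.keras.callbacks import LearningRateScheduler

def make_step_decay(initial_lr=1e-3, drop=0.5, epochs_drop=10):
    def schedule(epoch, lr):
        # The current lr is ignored: the decayed rate is recomputed
        # from the epoch index, matching the formula above.
        return initial_lr * math.pow(drop, math.floor(epoch / epochs_drop))
    return schedule

lr_callback = LearningRateScheduler(make_step_decay())
# model.fit(x_train, y_train, epochs=100, callbacks=[lr_callback])
```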

3.4. Measurements Extraction

3.4.1. Post-Processing

Multi-smoothing and edge detection techniques were applied as post-processing to correct defective segmented masks and improve the segmentation results; the aim was to smooth and sharpen the elliptical contour. Among the various smoothing techniques, we employed a median filter combined with morphological image processing, where the median filter is a non-linear digital filter that suppresses impulsive (non-stationary random process) interference by eliminating all suspicious readings; the filter computes the median output value from a set of input data (see Equation (11)) [73].
Morphological image processing is a technique that deals with the shape, or morphology, of image features. Morphological operations are well suited to the processing of binary images, since they rely solely on the relative ordering of pixel values rather than their numerical values. Greyscale images can also be subjected to morphological techniques when the light transfer functions are unknown and the absolute pixel values are of little or no importance. In our scenario, a pixel belongs to the neighborhood if its Euclidean distance from the origin is less than the ideal value of 25 [74]. This combination of median filter and morphological processing provided the best result. Figure 2 illustrates the predicted mask before and after smoothing.
$$\hat{f}(x, y) = \underset{(s,t) \in S_{xy}}{\mathrm{median}} \{ g(s,t) \}$$
where $g(s,t)$ are the (noisy) pixel values within the sliding filter window $S_{xy}$; the median filtering method sorts the pixels in the window, and the output pixel value $\hat{f}(x, y)$ is the median of the sorted sequence [75].
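The combination described above can be sketched with OpenCV as follows; the median kernel size is an assumption (only the radius of 25 comes from the text), and the choice of a morphological closing is our illustrative reading of the described operation.

```python
import cv2

def smooth_mask(mask, radius=25, median_ksize=5):
    """Median filter followed by a morphological pass on a binary mask."""
    m = cv2.medianBlur(mask, median_ksize)   # suppress impulsive noise
    # Disk-shaped neighborhood: pixels within Euclidean distance `radius`.
    kernel = cv2.getStructuringElement(
        cv2.MORPH_ELLIPSE, (2 * radius + 1, 2 * radius + 1))
    return cv2.morphologyEx(m, cv2.MORPH_CLOSE, kernel)
```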

3.4.2. HC Measurements

After the post-processing stage, the predicted mask is ready for measurement, which is obtained by fitting an ellipse model to the extracted contour. Fitting an ellipse model to scattered measurements is still considered a challenging problem by the computer vision and computational geometry communities [76]. In our case, we started from the assumption that the contours extracted from the generated masks are closed and smooth. To enforce this assumption, we used the preprocessing method described in [77], consisting of smoothing, parametrization, and resampling, so that the input for the fitting procedure is a uniform angular parametrization of a given contour, composed of a list of points $\mathbf{x}_i = (u(\theta_i), v(\theta_i))^T$ and a 1-to-1 mapping between angles $\theta_i$ and samples $\mathbf{x}_i$ in pixel units. Then, we used a non-linear least squares minimization process for fitting an explicit model $\mathbf{x} = \mathbf{x}(\theta)$ based on the angular parametrization:
$$\mathbf{x}(\theta) = \mathbf{c} + A \mathbf{r}(\theta),$$
where $\mathbf{c} = (c_u, c_v)^T$ is the barycenter of the ellipse, $\mathbf{r}(\theta) = (\cos\theta, \sin\theta)^T$ is the angular unit vector, and $A = \begin{pmatrix} a_{uu} & a_{uv} \\ a_{vu} & a_{vv} \end{pmatrix}$ is a 2 × 2 matrix mapping the unit circle to the ellipse. The proposed explicit model has various advantages. First, it depends on six parameters $\Theta = \{ c_u, c_v, a_{uu}, a_{uv}, a_{vu}, a_{vv} \}$, all having the same dimensions (pixels), which makes it easy to define meaningful geometric bounds for the minimization process; second, the cost function can be computed with respect to the real geometric distance between points; finally, the Jacobian of the cost function and the geometric parameters of the best-fit ellipse can be computed in closed form. As the cost function, we considered the square geometric distance, weighted with the curvature computed at each point and regularized with a Tikhonov term to avoid the Jacobian matrix becoming singular during the minimization process:
$$C(\Theta) = \sum_i w_i \, \| \mathbf{x}_\Theta(\theta_i) - \mathbf{x}_i \|^2 + \tau \| \Theta \|^2,$$
where $w_i = \kappa_i = \frac{1}{R_i} = \frac{1}{\| \mathbf{x}_i - \mathbf{c} \|}$ is an estimate of the curvature of the ellipse at the point $\mathbf{x}_i$ and $\tau$ is a small regularization constant (in our experiments, $\tau = 10^{-8}$). Hence, the fitting problem can be stated as finding the set of parameters $\Theta$ minimizing the cost function:
$$\Theta_{opt} = \arg\min_\Theta C(\Theta),$$
which can be solved using standard methods, such as Levenberg–Marquardt (LM) [78] or Trust Regions (RTS) [79]. In our experiments, we tried both methods as implemented in the Python scipy module, without noticeable differences in fitting accuracy. As initial values for the minimization process, we used the parameters extracted from the bounding box of the contour.
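A condensed sketch of this fitting step with scipy’s least_squares is given below; it keeps the explicit model, the Tikhonov term, and the bounding-box initialization described above, but drops the curvature weighting for brevity, so it is an illustration rather than the exact implementation.

```python
import numpy as np
from scipy.optimize import least_squares

def fit_ellipse(points):
    """points: (N, 2) contour samples under a uniform angular parametrization."""
    thetas = np.linspace(0.0, 2.0 * np.pi, len(points), endpoint=False)
    r = np.stack([np.cos(thetas), np.sin(thetas)], axis=1)   # (N, 2)

    def residuals(p, tau=1e-8):
        c, A = p[:2], p[2:].reshape(2, 2)
        model = c + r @ A.T              # x(theta) = c + A r(theta)
        geom = (model - points).ravel()
        return np.concatenate([geom, np.sqrt(tau) * p])  # Tikhonov term

    # Initial guess from the bounding box of the contour.
    c0 = points.mean(axis=0)
    half = (points.max(axis=0) - points.min(axis=0)) / 2.0
    p0 = np.concatenate([c0, [half[0], 0.0, 0.0, half[1]]])
    sol = least_squares(residuals, p0, method="lm")
    return sol.x[:2], sol.x[2:].reshape(2, 2)   # center c, matrix A
```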
Once the parametric representation of the ellipse is recovered, the geometric measurements can be computed in closed form. Specifically, the semi-axis lengths and vectors can be computed by finding the extrema of the square distance between the ellipse and its center. According to the parametric model:
$$\theta_{ext} = \arg\min_\theta \| \mathbf{x}(\theta) - \mathbf{c} \|^2 = \arg\min_\theta \| A \mathbf{r}(\theta) \|^2,$$
leading to the equation $A \dot{\mathbf{r}} \cdot A \mathbf{r} = 0$, with solution
$$\theta_{ext} = \frac{1}{2} \arctan \frac{(a_{uv}^2 + a_{vv}^2) - (a_{uu}^2 + a_{vu}^2)}{2 (a_{uu} a_{uv} + a_{vu} a_{vv})} + k \frac{\pi}{2},$$
from which the semi-axis vectors can be directly computed. As seen in Figure 4, the measurements of interest include:
  • center x: the distance, in millimeters, from the image’s first pixel along the x-axis to the ellipse’s center pixel.
  • center y: the distance, in millimeters, from the image’s first pixel along the y-axis to the ellipse’s center pixel.
  • semi-axis a: once the ellipse’s center is determined, the maximum radius, given by the distance between the ellipse’s center and its farthest contour point.
  • semi-axis b: once the ellipse’s center is determined, the minimum radius, given by the distance between the ellipse’s center and its nearest contour point.
  • angle: the radian value of the angle formed between the center-y direction and semi-axis b.
  • area: the area, in square millimeters, of the region representing the fetal head.
From the previous values, the equivalent diameter, biparietal diameter (BPD), occipitofrontal diameter (OFD), and HC were calculated based on the following formulas:
$$Equivalent\ diameter = semiaxis_a + semiaxis_b$$
$$BPD = semiaxis_b \times 2$$
$$OFD = semiaxis_a \times 2$$
$$HC = \pi \times (BPD + OFD) / 2$$
To verify that the formula we adopted from [6] for calculating the HC is more accurate than the one formerly used in [41], the mean difference between each formula and the HC ground truth, available for the whole training set, was calculated. Table 3 shows that our HC measurement is the closest to the HC ground truth.
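Putting the closed-form measurements together, a small helper can derive these quantities from the fitted semi-axes (Equations (17)–(20)); the pixel-to-millimeter conversion is included as an assumed step, since the dataset provides a per-image pixel size.

```python
import math

def head_measurements(semi_a_px, semi_b_px, pixel_size_mm):
    """Derived measurements from the fitted semi-axes (Equations (17)-(20))."""
    a = semi_a_px * pixel_size_mm   # major semi-axis in mm
    b = semi_b_px * pixel_size_mm   # minor semi-axis in mm
    bpd = 2.0 * b                   # biparietal diameter
    ofd = 2.0 * a                   # occipitofrontal diameter
    hc = math.pi * (bpd + ofd) / 2.0
    return {"equivalent_diameter": a + b, "BPD": bpd, "OFD": ofd, "HC": hc}
```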

3.5. GA and EFW Prediction

After completing the segmentation and fetal head measurements in the previous section, eight values (features) that represent the fetal head were obtained. These values are needed to generate a new dataset for fetal GA and EFW prediction.

3.5.1. Fetal Gestational Age Dataset

In the domain of fetal size and dating, Altman and Chitty [80] proposed a formula for calculating the gestational age based on the HC; later, Loughna et al. [6] showed that this formula is only accurate when the fetal age is between 13 and 25 weeks. Therefore, this study used the formula recommended by Altman and Chitty [80] to label the new dataset manually, including only GAs from 13 to 25 weeks. Finally, the new dataset was used to train multiple regression models and predict GA from 10 to 40 weeks, overcoming the limitation of the original formula:
$$\log_e(GA) = 0.010611 \times HC - 0.000030321 \times HC^2 + 0.43498 \times 10^{-7} \times HC^3 + 1.848$$
$$GA = \exp(\log_e(GA))$$
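For reference, Equation (21) transcribes directly into code as below; we assume HC in millimeters and GA in weeks, the usual conventions for this chart, so verify the units before reuse.

```python
import math

def gestational_age_weeks(hc_mm):
    """GA from HC via Altman and Chitty's formula (Equation (21))."""
    log_ga = (0.010611 * hc_mm
              - 0.000030321 * hc_mm ** 2
              + 0.43498e-7 * hc_mm ** 3
              + 1.848)
    return math.exp(log_ga)
```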
Table 4 shows that we generated the new dataset from both the training and testing images. The new dataset was split into three partitions for training (13–25 weeks), validation (10–40 weeks), and testing ($GA < 13$ or $GA > 25$ weeks). The purpose of the validation set was to select the optimum regression model; the test set was used to compare the efficiency of the selected model with the results obtained by an expert doctor. The mean square error (MSE) was used to evaluate the different regression models, and Pearson’s r [81] was used to measure the statistical association between the results predicted by the regression models and the physician’s results on the test dataset.

3.5.2. Fetal Weight Dataset

The estimated fetal weight (EFW) is conventionally calculated with Hadlock’s formula [54], which requires prior knowledge of the HC, BPD, AC, and FL. In addition, Salomon et al. [53] proposed a polynomial formula providing a new reference chart for EFW calculation that only requires knowledge of the GA. This formula (see Equation (22)) estimates fetal weight in grams from the fetal GA between 20 and 36 weeks; it was used to label the new dataset manually, including only fetal weights for GAs between 20 and 36 weeks. The new dataset was then used to train multiple regression models and predict the EFW from 10 to 40 weeks, overcoming the limitations of the original formula:
$$EFW = -26256.56 + 4222.827 \times GA - 251.9597 \times GA^2 + 6.623713 \times GA^3 - 0.0628939 \times GA^4$$
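As a sanity check on the reconstructed signs, the polynomial can be evaluated in code; at GA = 20 weeks it yields roughly 343 g, consistent with published 50th-percentile charts. GA is in weeks and EFW in grams, per [53].

```python
def efw_grams(ga_weeks):
    """EFW from GA via Salomon et al.'s polynomial (Equation (22))."""
    return (-26256.56
            + 4222.827 * ga_weeks
            - 251.9597 * ga_weeks ** 2
            + 6.623713 * ga_weeks ** 3
            - 0.0628939 * ga_weeks ** 4)

print(round(efw_grams(20.0)))   # ~343 g at 20 weeks
```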
Table 5 shows that we generated the new dataset from both the training and testing images. The new dataset was split into three partitions for training (20–36 weeks), validation (10–40 weeks), and testing ($GA < 20$ or $GA > 36$ weeks). The purpose of the validation set was to select the optimum regression model for fetal weight prediction; the test set was used to compare the efficiency of the selected model with the results obtained from the longitudinal reference [82]. The mean square error (MSE) was used to evaluate the different regression models, and Pearson’s r [81] was used to measure the statistical association between the results predicted by the regression models and the longitudinal reference [82].

4. Experiments

In this section, the experimental setup is described, and the three levels of evaluation for the segmentation models and the two levels of evaluation for the GA and EFW predictions are explained.

4.1. Training

This study’s experiments were performed on a graphics workstation with an Intel(R) Core(TM) i9-9900K CPU @ 3.60 GHz, an NVIDIA GeForce RTX 2080 Ti with 11 GB of memory, and 64 GB of RAM. The popular TensorFlow 2.6.0 and Keras 2.4.0 were chosen as the deep learning framework. All segmentation models were trained using the same hyperparameter settings, as seen in Table 2; each model was trained for 100 epochs, and the training time was recorded. The input size for model training was 64 × 64 in the first experiment and 128 × 128 in the second.

4.2. Segmentation Models Evaluation

Three levels of evaluation were conducted to quantitatively analyze and evaluate the segmentation model’s performance, as seen in Table 6.

4.2.1. Level 1: Segmentation Evaluation

Eight indices (Equations (23)–(30)) were used to evaluate segmentation model performance. These indices included area under the curve (AUC), accuracy (ACC), mean intersection over union (mIoU), precision (Pre), recall, dice similarity coefficient (DSC), mean squared error (MSE), and mean pixel accuracy (mPA), as defined below:
$$TPR = \frac{TP}{TP + FN}, \qquad FPR = \frac{FP}{FP + TN}$$
where TP, FP, TN, and FN denote true positives, false positives, true negatives, and false negatives, respectively.
$$AUC = \int_0^1 TPR \; d(FPR)$$
$$Accuracy = \frac{TP + TN}{TP + TN + FP + FN}$$
$$mIoU(U, V) = \frac{1}{C} \sum_{i=1}^{C} \frac{|U \cap V|}{|U \cup V|} = \frac{1}{C} \sum_{i=1}^{C} \frac{TP}{TP + FP + FN}$$
$$Pre = \frac{TP}{TP + FP}$$
$$Recall = \frac{TP}{TP + FN}$$
$$DSC = \frac{2 \cdot Pre \cdot Recall}{Pre + Recall} = \frac{2\,TP}{2\,TP + FP + FN}$$
$$MSE = \frac{1}{C} \sum_{i=1}^{C} (G_i - P_i)^2$$
$$mPA = \frac{\sum_{i=1}^{C} TP_i}{\sum_{i=1}^{C} (TP_i + FP_i)}$$
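For clarity, the confusion-matrix-based indices above can be computed from a pair of binary masks as in the following NumPy sketch (illustrative, for the two-class case):

```python
import numpy as np

def segmentation_metrics(pred, gt):
    """Confusion-matrix indices for a predicted vs. ground-truth binary mask."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.sum(pred & gt)
    fp = np.sum(pred & ~gt)
    fn = np.sum(~pred & gt)
    tn = np.sum(~pred & ~gt)
    return {
        "ACC": (tp + tn) / (tp + tn + fp + fn),
        "IoU": tp / (tp + fp + fn),
        "Pre": tp / (tp + fp),
        "Recall": tp / (tp + fn),
        "DSC": 2 * tp / (2 * tp + fp + fn),
    }
```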

4.2.2. Level 2: Post-Processing Evaluation

This study compared predicted masks using different models with ground truth masks to evaluate the predicted mask in terms of quality assessment and smoothing (post-processing). For this purpose, five indices [83] (Equations (31)–(35)) were used, including mean Hausdorff distance (mHD), mean surface distance (MSD), relative volume difference (RVD), mean structural similarity index (MSSIM), and peak signal-to-noise ratio (PSNR):
$$mHD(P, G) = \frac{1}{2} \left( \frac{1}{|P|} \sum_{p \in P} \max_{g \in G} d(p, g) + \frac{1}{|G|} \sum_{g \in G} \max_{p \in P} d(p, g) \right)$$
$$MSD = \frac{1}{2} \left( \hat{d}(S_p, S_g) + \hat{d}(S_g, S_p) \right)$$
$$RVD(P, G) = \frac{|G| - |P|}{|P|}$$
$$MSSIM(G, P) = \frac{1}{M} \sum_{j=1}^{M} SSIM(g_j, p_j)$$
$$PSNR(P, G) = 10 \log_{10} \frac{255^2}{MSE(P, G)}$$

4.2.3. Level 3: Measurement Evaluation

To verify the set of values obtained through the measurement algorithm, three indices (Equations (28), (31) and (36)) were used to evaluate the test dataset: the mHD, the DSC, and the mean absolute difference (MAD), defined below:
$$MAD = \frac{\sum |HC_p - HC_g|}{n}$$

4.3. Evaluation of GA and EFW Prediction

Regression models were used for the fetal GA and EFW predictions. The MSE (Equation (29)) was used to evaluate and select the best regression model. Pearson’s r [81] (Equation (37)) was used to evaluate the predicted values (GA and EFW) by calculating the statistical association between our model, the medical doctor, and the longitudinal reference.
$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2} \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}$$
where $r$ is the correlation coefficient, $x_i$ are the values of the x-variable in a sample, $\bar{x}$ is the mean of the x-values, $y_i$ are the values of the y-variable in a sample, and $\bar{y}$ is the mean of the y-values.

5. Results and Discussion

The first part of this section presents the results obtained for the different models’ segmentation efficiency, mask quality assessment (post-processing), and measurement performance, together with a comparison with the previous state of the art. The second part presents the results obtained for the fetal GA and EFW regression models’ efficiency and their clinical validation.

5.1. Segmentation Performance

Figure 5 shows that all models obtained a validation score above 0.98 IoU during training. The FPN reached 0.9861 IoU, slightly better than the other models: a 0.0036 IoU improvement over the lowest performing model, LinkNet, at 0.9825 IoU. UNet 3+ obtained the second-best value but took a long time to train, as seen in Table 7; therefore, UNet 3+ was excluded from the weighted voting algorithm. LinkNet, DeepLabv3, TransUNet, and UNet Plus obtained low scores, between 0.982 and 0.983; therefore, they were also excluded from the weighted voting algorithm. The FPN, UNet, and AttUNet models obtained the highest IoU scores with low training times; these models were used to perform weighted voting and to select the optimum weights for our ETLM. Table 7 reports the eight indices used to evaluate each model’s segmentation performance [84].
The overall results show that transfer learning using EfficientNetB0 achieved promising performance despite a low input size and short training time; this study therefore demonstrates that transfer learning can produce a lightweight model, which has been a challenge for medical image segmentation tasks. With an input size of 128 × 128 and no augmentation, results vary from one model to another; the mIoU and MSE indices show that the FPN and AttUNet achieved the best results with average training times. Further, with an input size of 64 × 64 and augmentation, the ETLM outperformed all other models in terms of ACC, mIoU, Pre, Recall, DSC, AUC, and mPA; with an input size of 128 × 128 and augmentation, the ETLM outperformed all other models in terms of ACC, mIoU, Pre, Recall, AUC, MSE, and mPA. Finally, as seen in Table 7, all indices reported during validation showed that ensemble learning adds slight improvements to the segmentation models and their predicted image masks. However, these predicted masks had to be post-processed for edge smoothing and required quality assessment tests, as discussed in the following subsection.

5.2. Measurements Performance

5.2.1. Post-Processing Evaluation

Figure 6 presents samples of original images, predicted masks after post-processing, ground truths, and ellipse-fitted masks; however, it is challenging to identify differences and similarities by visually inspecting the predicted masks and ground truths. Therefore, this study performed the mask quality assessment shown in Table 8 to verify that the promising results obtained during the level one evaluation are realistic and reliable.
Table 8 compares two distinct groups of predicted masks: the first group was predicted using the segmentation networks trained with a 64 × 64 input size, and the other using the networks trained with a 128 × 128 input size. In both cases, the results indicate that the ETLM output is the most similar to the ground truth masks, with the minimum mHD, MSD, and RVD and the maximum MSSIM and PSNR obtained using masks predicted by the ETLM with post-processing. Some results vary slightly, as in the case of the 128 × 128 FPN, which obtained the minimum mHD, while the ETLM performed best on the other indices. The RVD is always negative, as seen in Table 8, which means that in all cases the predicted mask (fetal head contour) was bigger than the ground truth; however, the ETLM minimized this difference to 0.0016, achieving the best similarity with the ground truth. Overall, the level two evaluation showed that the masks predicted by this study’s ETLM are remarkably close to the ground truth, with a difference of 0.011 as reported by the MSSIM (see Figure 6).

5.2.2. Fetal Head Measurement Evaluation

Fetal head measurements were evaluated on the testing dataset, which consists of 335 images. The ground truth for this dataset is not publicly available; therefore, the measurement evaluation results were obtained by submitting the measurement values to the dataset website (https://hc18.grand-challenge.org/, accessed on 11 August 2022), which returns the mHD, MAD, and DSC, as shown in Table 9.

5.3. Comparative Analysis

Table 10 provides a comprehensive comparison between our ETLM and the published results reported in the literature. First, the ETLM outperformed the state-of-the-art models in the segmentation task in terms of ACC, mIoU, Pre, and mPA. Second, the results of this study are better than [32,36,39,43,47] in terms of MAD, and better than [32,39,42] in terms of mHD. However, the results are inferior to those found in [41,49], because the models used in those studies were heavy and trained for more than 30 h at high input resolution, making them very expensive in terms of required resources and time. Finally, a model weight comparison showed that the lightweight ETLM used in this study is superior, because promising results were achieved with a very low resolution (128 × 128) and a short training time (2 h). This study shows that ensemble and transfer learning overcome medical image segmentation challenges such as low image intensity, the need for expensive resources, long training times, and heavy model deployment.

5.4. GA and EFW Prediction Performance

For fetal GA and EFW prediction, we trained 17 regression models on each dataset independently. Because the datasets contain large numerical values, a log transformation was applied to both datasets before training, making the highly skewed distributions less skewed. The performance of each model was evaluated using the MSE, and the results are reported in Table 11. This task aimed to address the limitations of the two formulas (see Equations (21) and (22)) used to estimate the GA and EFW: the regression models were used to predict the GA when the GA of the fetus was outside the 13–25 week range, and the EFW when the GA of the fetus was outside the 20–36 week range. In both cases, the ground truth was non-existent, because both formulas are limited and a GA or EFW could not be calculated in the mentioned periods; therefore, the following steps were taken:
  • Validation of the predicted GA: 50 random sample images taken from the testing set ($GA < 13$ or $GA > 25$) were given to a senior attending physician with 21 years of experience in maternal-fetal medicine, who estimated the GA. We used Pearson’s r to measure the strength of the linear association between the physician’s predictions and the model’s predictions on the same sample set. Because we have no prior knowledge of the dataset in terms of ethnicity or location, factors on which the GA may depend, we predicted the GA at the 50th percentile and considered the median.
  • Validation of the predicted EFW: in the case of the EFW, the senior physician could not estimate the EFW from fetal head images alone, as this requires additional factors such as the FL, AC, and CRL. Therefore, a growth chart taken from a longitudinal reference was used for the estimated fetal weight, regardless of fetal sex [82]. Pearson’s r was then used to measure the strength of the linear association between the longitudinal reference and the model’s predictions on the same sample set falling outside the range of 20–36 weeks. This study predicted the EFW at the 50th percentile and considered the median, for the reason mentioned above.
Table 11 shows that most regression models achieved promising results on the GA and EFW datasets based on the MSE. On the GA validation dataset, polynomial regression and the deep NN achieved the lowest MSE values of 0.0003 and 0.00072, respectively. However, to ensure the reliability of each model, all models were used to predict the 50th percentile of GA, and the predicted GA was then compared with the physician’s estimations using Pearson’s r. After this comparison, Table 11 shows that the deep NN and polynomial regression outperformed all other regression models for predicting the GA, with Pearson’s r values of 0.9978 and 0.9958, respectively.
For the fetal EFW, LinearSVR, XGBRFRegressor, and linear regression achieved the lowest MSE on the EFW validation dataset, as reported in Table 11. Nonetheless, all models were used to predict the 50th percentile of EFW on the test dataset to ensure the reliability of each model’s predictions, which were then compared with the longitudinal reference table, as seen in Appendix A Table A1. As a result, Pearson’s r showed that LinearSVR outperformed all the models and predicted the EFW at the 50th percentile with the highest association with the longitudinal reference (r = 0.9989). In contrast, XGBRFRegressor showed a low MSE during validation but a low association with the longitudinal reference.
Overall, most regression models could predict the GA and EFW at the 50th percentile, as seen from Pearson’s results in Table 11. It can be concluded that the regression models in this study address the limitations of the formulas currently used to calculate the GA and EFW in specific periods: without such limitations, these models require only fetal head measurements to calculate the GA and EFW from the 10th to the 40th week. This study is the first work to utilize machine learning to predict the GA and EFW based on fetal head images. Samples of the model predictions for GA and EFW are provided in Supplementary Files S1 and S2, respectively.

6. Strength and Limitations

Including ultrasound machines in various medical settings is advisable; however, this is not always feasible, due to the cost of purchasing multiple devices or portability concerns. Mobile health companies such as Clarius (Clarius Mobile Health Corp., Vancouver, BC, Canada) [85] have developed pocket-sized handheld ultrasound scanners that represent a promising tool for regional anesthesia procedures and vascular access [86]. Furthermore, these portable devices are still being examined for extensive imaging, such as prenatal scans, which requires a lightweight AI system that maintains high accuracy with low resource consumption. Therefore, in this work, we deployed lightweight architectures that can be used on portable devices without client-server communication. These architectures enable fast training on low-end machines and fast inference without a complex client-server architecture that would pose data privacy and security issues; the main limitation is the reduced image resolution, which can affect measurement accuracy. In addition to fetal head segmentation, a regression model was employed to predict the GA and EFW at the 50th percentile in all trimesters based on fetal head features, which current methods cannot do. Furthermore, the framework in this study can be extended to build a fully automatic client-server AI system providing a detailed report for any fetal head ultrasound image.
Despite the study’s strengths, the framework still has some constraints that will need to be overcome in the future. First, downsampling the original images reduced the measurement accuracy: for example, moving from 128 × 128 to 64 × 64 inputs lowered the PSNR by 3.1 dB and increased the mHD by 0.17 mm, as seen in Table 8. Second, fetal GA and EFW may vary slightly from one population group to another based on ethnicity and sex; this study did not have that information, so the 50th percentile (the median) was predicted. Moreover, clinical application has to be decided by medical personnel, since the differences between the actual image and the one generated by the proposed model could be substantial in the medical field.

7. Conclusions and Future Work

This work proposed a new pipeline that utilized transfer learning and ensemble learning to build an ensemble model called ETLM. Eight segmentation networks were evaluated, and an ensemble was built with a weighted voting method for fetal head segmentation. The segmented masks were used to accurately measure HC, BPD, OFD, and other values in ultrasound images. Masks segmented by each model went through a quality assessment test to ensure the efficiency of ETLM and were compared against the independent models. Our experimental results show that the proposed pipeline achieved performance comparable to state-of-the-art models in both segmentation and measurement. Furthermore, features extracted from the segmented fetal head images were used to build new datasets for GA and EFW prediction, and the regression models showed that fetal head measurements alone are sufficient to predict GA and EFW. The results of this study were validated with the assistance of an expert physician and a longitudinal reference. This study is the first work that provides a complete approach from image segmentation to GA and EFW prediction. A sketch of the weighted voting step is given below.
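To make the weighted voting step concrete, the following is a minimal sketch (not the released implementation) that fuses per-model foreground probability maps with the UNet/Attention UNet/FPN weights of 0.3/0.3/0.4 reported in Table 10; the random input maps stand in for real model outputs:

```python
# A minimal sketch of weighted soft voting over per-pixel probability maps,
# assuming the Table 10 weights (UNet 0.3, Attention UNet 0.3, FPN 0.4).
# The random "probability maps" below stand in for real model predictions.
import numpy as np

def ensemble_predict(prob_maps, weights, threshold=0.5):
    """Fuse per-model foreground probability maps (H, W) into one binary mask."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalize, in case weights don't sum to 1
    fused = np.tensordot(weights, np.stack(prob_maps), axes=1)  # weighted average
    return (fused >= threshold).astype(np.uint8)

rng = np.random.default_rng(1)
maps = [rng.uniform(0, 1, (4, 4)) for _ in range(3)]  # toy 4x4 maps
mask = ensemble_predict(maps, weights=[0.3, 0.3, 0.4])
print(mask)
```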
Future work will include full adoption of transfer learning based on a model pretrained on ultrasound images, regardless of the image domain. Further, a traditional machine learning classifier will be used to select the best features for reducing intensity variation and noise in ultrasound images. Finally, we will segment and measure the cavum septum pellucidum and the lateral ventricle and compare our results with those of the ultrasound machine.

Supplementary Materials

The following supporting information can be downloaded at: https://0-www-mdpi-com.brum.beds.ac.uk/article/10.3390/diagnostics12092229/s1, File S1: Sample of Models Prediction for GA against Doctor calculation. File S2: Sample of Models Prediction for EFW against Longitudinal reference in 50th Percentile.

Author Contributions

M.A. (Mahmood Alzubaidi) and M.A. (Marco Agus): Conceptualization, Formal Analysis, Methodology, Writing. U.S.: Review and Editing. M.M.: Validation. K.A. and M.H.: Supervision, Review, Editing. All authors have read and agreed to the published version of the manuscript.

Funding

Open Access funding provided by the Qatar National Library.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data and code are available upon request.

Acknowledgments

The HC18 Challenge offered fetal ultrasound images gathered from the Radboud University Medical Center’s Department of Obstetrics database in Nijmegen, the Netherlands, in conformity with the local ethical council (CMO Arnhem-Nijmegen). All data were anonymized per the Declaration of Helsinki’s principles.

Conflicts of Interest

The authors report no conflicts of financial or personal interest that may have affected the results of this study.

Appendix A. Longitudinal Reference

Table A1. Growth chart for estimated fetal weight (g) by percentile, regardless of fetal sex.

| Gestational Age (Weeks) | 2.5th | 5th | 10th | 25th | 50th | 75th | 90th | 95th | 97.5th |
|---|---|---|---|---|---|---|---|---|---|
| 14 | 70 | 73 | 78 | 83 | 90 | 98 | 104 | 109 | 113 |
| 15 | 89 | 93 | 99 | 106 | 114 | 124 | 132 | 138 | 144 |
| 16 | 113 | 117 | 124 | 133 | 144 | 155 | 166 | 174 | 181 |
| 17 | 141 | 146 | 155 | 166 | 179 | 193 | 207 | 217 | 225 |
| 18 | 174 | 181 | 192 | 206 | 222 | 239 | 255 | 268 | 278 |
| 19 | 214 | 223 | 235 | 252 | 272 | 292 | 313 | 328 | 340 |
| 20 | 260 | 271 | 286 | 307 | 330 | 355 | 380 | 399 | 413 |
| 21 | 314 | 327 | 345 | 370 | 398 | 428 | 458 | 481 | 497 |
| 22 | 375 | 392 | 412 | 443 | 476 | 512 | 548 | 575 | 595 |
| 23 | 445 | 465 | 489 | 525 | 565 | 608 | 650 | 682 | 705 |
| 24 | 523 | 548 | 576 | 618 | 665 | 715 | 765 | 803 | 830 |
| 25 | 611 | 641 | 673 | 723 | 778 | 836 | 894 | 938 | 970 |
| 26 | 707 | 743 | 780 | 838 | 902 | 971 | 1038 | 1087 | 1125 |
| 27 | 813 | 855 | 898 | 964 | 1039 | 1118 | 1196 | 1251 | 1295 |
| 28 | 929 | 977 | 1026 | 1102 | 1189 | 1279 | 1368 | 1429 | 1481 |
| 29 | 1053 | 1108 | 1165 | 1251 | 1350 | 1453 | 1554 | 1622 | 1682 |
| 30 | 1185 | 1247 | 1313 | 1410 | 1523 | 1640 | 1753 | 1828 | 1897 |
| 31 | 1326 | 1394 | 1470 | 1579 | 1707 | 1838 | 1964 | 2046 | 2126 |
| 32 | 1473 | 1548 | 1635 | 1757 | 1901 | 2047 | 2187 | 2276 | 2367 |
| 33 | 1626 | 1708 | 1807 | 1942 | 2103 | 2266 | 2419 | 2516 | 2619 |
| 34 | 1785 | 1872 | 1985 | 2134 | 2312 | 2492 | 2659 | 2764 | 2880 |
| 35 | 1948 | 2038 | 2167 | 2330 | 2527 | 2723 | 2904 | 3018 | 3148 |
| 36 | 2113 | 2205 | 2352 | 2531 | 2745 | 2959 | 3153 | 3277 | 3422 |
| 37 | 2280 | 2372 | 2537 | 2733 | 2966 | 3195 | 3403 | 3538 | 3697 |
| 38 | 2446 | 2536 | 2723 | 2935 | 3186 | 3432 | 3652 | 3799 | 3973 |
| 39 | 2612 | 2696 | 2905 | 3135 | 3403 | 3664 | 3897 | 4058 | 4247 |
| 40 | 2775 | 2849 | 3084 | 3333 | 3617 | 3892 | 4135 | 4312 | 4515 |
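Where intermediate gestational ages are needed, the chart values can be linearly interpolated. The sketch below is our illustration (it transcribes only the week 14–20 medians from Table A1) and shows one way to query the 50th percentile:

```python
# Sketch: query the Table A1 50th-percentile EFW at a non-integer gestational
# age via linear interpolation. Only the week 14-20 medians are transcribed.
import numpy as np

ga_weeks   = np.array([14, 15, 16, 17, 18, 19, 20])
efw_median = np.array([90, 114, 144, 179, 222, 272, 330])  # grams, 50th percentile

def median_efw(ga):
    """Linearly interpolated 50th-percentile EFW (g) at gestational age `ga` (weeks)."""
    return float(np.interp(ga, ga_weeks, efw_median))

print(median_efw(16.5))  # ~161.5 g, between the week-16 and week-17 medians
```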

References

  1. Mayer, D.P.; Shipilov, V. Ultrasonography and magnetic resonance imaging of uterine fibroids. Obstet. Gynecol. Clin. N. Am. 1995, 22, 667–725. [Google Scholar] [CrossRef]
  2. Griffin, R.M. Fetal Biometry. WebMD 2020. Available online: https://www.webmd.com/baby/fetal-biometry (accessed on 11 August 2022).
  3. Whitworth, M.B.L.; Mullan, C. Ultrasound for fetal assessment in early pregnancy. Cochrane Database Syst. Rev. 2015, 2015, CD007058. [Google Scholar] [CrossRef] [PubMed]
  4. Alzubaidi, M.; Agus, M.; Alyafei, K.; Althelaya, K.A.; Shah, U.; Abd-Alrazaq, A.; Anbar, M.; Makhlouf, M.; Househ, M. Toward deep observation: A systematic survey on artificial intelligence techniques to monitor fetus via ultrasound images. iScience 2022, 25, 104713. [Google Scholar] [CrossRef] [PubMed]
  5. Halle, K.F.; Fjose, M.; Kristjansdottir, H.; Bjornsdottir, A.; Getz, L.; Tomasdottir, M.O.; Sigurdsson, J.A. Use of pregnancy ultrasound before the 19th week scan: An analytical study based on the Icelandic Childbirth and Health Cohort. BMC Pregnancy Childbirth 2018, 18, 512. [Google Scholar] [CrossRef]
  6. Loughna, P.; Chitty, L.; Evans, T.; Chudleigh, T. Fetal Size and Dating: Charts Recommended for Clinical Obstetric Practice. Ultrasound 2009, 17, 160–166. [Google Scholar] [CrossRef]
  7. Jatmiko, W.; Habibie, I.; Ma’sum, M.A.; Rahmatullah, R.; Satwika, I.P. Automated Telehealth System for Fetal Growth Detection and Approximation of Ultrasound Images. Int. J. Smart Sens. Intell. Syst. 2015, 8, 697–719. [Google Scholar] [CrossRef]
  8. Schmidt, U.; Temerinac, D.; Bildstein, K.; Tuschy, B.; Mayer, J.; Sütterlin, M.; Siemer, J.; Kehl, S. Finding the most accurate method to measure head circumference for fetal weight estimation. Eur. J. Obstet. Gynecol. Reprod. Biol. 2014, 178, 153–156. [Google Scholar] [CrossRef]
  9. Noble, J.A. Ultrasound image segmentation and tissue characterization. Proc. Inst. Mech. Eng. Part H J. Eng. Med. 2010, 224, 307–316. [Google Scholar] [CrossRef]
  10. Van den Heuvel, T.L.A.; de Bruijn, D.; de Korte, C.L.; Ginneken, B. Automated measurement of fetal head circumference using 2D ultrasound images. PLoS ONE 2018, 13, e0200412. [Google Scholar] [CrossRef]
  11. Espinoza, J.; Good, S.; Russell, E.; Lee, W. Does the Use of Automated Fetal Biometry Improve Clinical Work Flow Efficiency? J. Ultrasound Med. 2013, 32, 847–850. [Google Scholar] [CrossRef]
  12. Ciurte, A.; Bresson, X.; Cuadra, M.B. A semi-supervised patch-based approach for segmentation of fetal ultrasound imaging. In Proceedings of the Challenge US: Biometric Measurements from Fetal Ultrasound Images, ISBI 2012, Barcelona, Spain, 2–5 May 2012; pp. 5–7. [Google Scholar]
  13. Ponomarev, G.V.; Gelfand, M.S.; Kazanov, M.D. A multilevel thresholding combined with edge detection and shape-based recognition for segmentation of fetal ultrasound images. In Proceedings of the Challenge US: Biometric Measurements from Fetal Ultrasound Images, ISBI 2012, Barcelona, Spain, 2–5 May 2012; pp. 17–19. [Google Scholar]
  14. Stebbing, R.V.; McManigle, J.E. A boundary fragment model for head segmentation in fetal ultrasound. In Proceedings of the Challenge US: Biometric Measurements from Fetal Ultrasound Images, ISBI, Barcelona, Spain, 2–5 May 2012; pp. 9–11. [Google Scholar]
  15. Perez-Gonzalez, J.L.; Muñoz, J.C.B.; Porras, M.C.R.; Arámbula-Cosío, F.; Medina-Bañuelos, V. Automatic Fetal Head Measurements from Ultrasound Images Using Optimal Ellipse Detection and Texture Maps. In Proceedings of the VI Latin American Congress on Biomedical Engineering CLAIB 2014, Paraná, Argentina, 29–31 October 2014; Braidot, A., Hadad, A., Eds.; Springer International Publishing: Cham, Switzerland, 2015; pp. 329–332. [Google Scholar] [CrossRef]
  16. Shrimali, V.; Anand, R.S.; Kumar, V. Improved segmentation of ultrasound images for fetal biometry, using morphological operators. In Proceedings of the 2009 Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Minneapolis, MN, USA, 3–6 September 2009; pp. 459–462. [Google Scholar] [CrossRef]
  17. Rueda, S.; Fathima, S.; Knight, C.L.; Yaqub, M.; Papageorghiou, A.T.; Rahmatullah, B.; Foi, A.; Maggioni, M.; Pepe, A.; Tohka, J.; et al. Evaluation and Comparison of Current Fetal Ultrasound Image Segmentation Methods for Biometric Measurements: A Grand Challenge. IEEE Trans. Med. Imaging 2014, 33, 797–813. [Google Scholar] [CrossRef]
  18. Jardim, S.M.; Figueiredo, M.A. Segmentation of fetal ultrasound images. Ultrasound Med. Biol. 2005, 31, 243–250. [Google Scholar] [CrossRef]
  19. Ahmad, M.; Qadri, S.F.; Ashraf, M.U.; Subhi, K.; Khan, S.; Zareen, S.S.; Qadri, S. Efficient Liver Segmentation from Computed Tomography Images Using Deep Learning. Comput. Intell. Neurosci. 2022, 2022, 2665283. [Google Scholar] [CrossRef]
  20. Litjens, G.; Kooi, T.; Bejnordi, B.E.; Setio, A.A.A.; Ciompi, F.; Ghafoorian, M.; van der Laak, J.A.; van Ginneken, B.; Sánchez, C.I. A survey on deep learning in medical image analysis. Med. Image Anal. 2017, 42, 60–88. [Google Scholar] [CrossRef]
  21. Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 3431–3440. [Google Scholar] [CrossRef]
  22. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015, Strasbourg, France, 27 September–1 October 2015; Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F., Eds.; Springer International Publishing: Cham, Switzerland, 2015; pp. 234–241. [Google Scholar] [CrossRef]
  23. Milletari, F.; Navab, N.; Ahmadi, S.A. V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation. In Proceedings of the 2016 Fourth International Conference on 3D Vision (3DV), Stanford, CA, USA, 25–28 October 2016; pp. 565–571. [Google Scholar] [CrossRef]
  24. Khan, A.; Sohail, A.; Zahoora, U.; Qureshi, A.S. A survey of the recent architectures of deep convolutional neural networks. Artif. Intell. Rev. 2020, 53, 5455–5516. [Google Scholar] [CrossRef] [Green Version]
  25. Torres, H.R.; Morais, P.; Oliveira, B.; Birdir, C.; Rüdiger, M.; Fonseca, J.C.; Vilaça, J.L. A review of image processing methods for fetal head and brain analysis in ultrasound images. Comput. Methods Programs Biomed. 2022, 215, 106629. [Google Scholar] [CrossRef]
  26. Mayer, C.; Joseph, K.S. Fetal growth: A review of terms, concepts and issues relevant to obstetrics. Ultrasound Obstet. Gynecol. 2013, 41, 136–145. [Google Scholar] [CrossRef]
  27. Dudley, N.J. A systematic review of the ultrasound estimation of fetal weight. Ultrasound Obstet. Gynecol. 2005, 25, 80–89. [Google Scholar] [CrossRef]
  28. Carneiro, G.; Georgescu, B.; Good, S.; Comaniciu, D. Detection and Measurement of Fetal Anatomies from Ultrasound Images using a Constrained Probabilistic Boosting Tree. IEEE Trans. Med. Imaging 2008, 27, 1342–1355. [Google Scholar] [CrossRef]
  29. Lu, W.; Tan, J.; Floyd, R. Automated fetal head detection and measurement in ultrasound images by iterative randomized hough transform. Ultrasound Med. Biol. 2005, 31, 929–936. [Google Scholar] [CrossRef]
  30. Zhang, L.; Ye, X.; Lambrou, T.; Duan, W.; Allinson, N.; Dudley, N.J. A supervised texton based approach for automatic segmentation and measurement of the fetal head and femur in 2D ultrasound images. Phys. Med. Biol. 2016, 61, 1095–1115. [Google Scholar] [CrossRef] [PubMed]
  31. Li, J.; Wang, Y.; Lei, B.; Cheng, J.Z.; Qin, J.; Wang, T.; Li, S.; Ni, D. Automatic Fetal Head Circumference Measurement in Ultrasound Using Random Forest and Fast Ellipse Fitting. IEEE J. Biomed. Health Inform. 2018, 22, 215–223. [Google Scholar] [CrossRef] [PubMed]
  32. Sobhaninia, Z.; Rafiei, S.; Emami, A.; Karimi, N.; Najarian, K.; Samavi, S.; Reza Soroushmehr, S.M. Fetal Ultrasound Image Segmentation for Measuring Biometric Parameters Using Multi-Task Deep Learning. In Proceedings of the 2019 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Berlin, Germany, 23–27 July 2019; pp. 6545–6548. [Google Scholar] [CrossRef]
  33. Cerrolaza, J.J.; Sinclair, M.; Li, Y.; Gomez, A.; Ferrante, E.; Matthew, J.; Gupta, C.; Knight, C.L.; Rueckert, D. Deep learning with ultrasound physics for fetal skull segmentation. In Proceedings of the 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018), Washington, DC, USA, 4–7 April 2018; pp. 564–567. [Google Scholar] [CrossRef]
  34. Budd, S.; Sinclair, M.; Khanal, B.; Matthew, J.; Lloyd, D.; Gomez, A.; Toussaint, N.; Robinson, E.C.; Kainz, B. Confident Head Circumference Measurement from Ultrasound with Real-Time Feedback for Sonographers. In Proceedings of the Medical Image Computing and Computer Assisted Intervention—MICCAI 2019, Shenzhen, China, 13–17 October 2019; Shen, D., Liu, T., Peters, T.M., Staib, L.H., Essert, C., Zhou, S., Yap, P.T., Khan, A., Eds.; Springer International Publishing: Cham, Switzerland, 2019; pp. 683–691. [Google Scholar] [CrossRef] [Green Version]
  35. Chaurasia, A.; Culurciello, E. LinkNet: Exploiting encoder representations for efficient semantic segmentation. In Proceedings of the 2017 IEEE Visual Communications and Image Processing (VCIP), St. Petersburg, FL, USA, 10–13 December 2017; pp. 1–4. [Google Scholar] [CrossRef]
  36. Qiao, D.; Zulkernine, F. Dilated Squeeze-and-Excitation U-Net for Fetal Ultrasound Image Segmentation. In Proceedings of the 2020 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB), Virtual, 27–29 October 2020; pp. 1–7. [Google Scholar] [CrossRef]
  37. Desai, A.; Chauhan, R.; Sivaswamy, J. Image Segmentation Using Hybrid Representations. In Proceedings of the 2020 IEEE 17th International Symposium on Biomedical Imaging (ISBI), Iowa City, IA, USA, 3–7 April 2020; pp. 1–4. [Google Scholar] [CrossRef]
  38. Aji, C.P.; Fatoni, M.H.; Sardjono, T.A. Automatic Measurement of Fetal Head Circumference from 2-Dimensional Ultrasound. In Proceedings of the 2019 International Conference on Computer Engineering, Network, and Intelligent Multimedia (CENIM), Surabaya, Indonesia, 19–20 November 2019; pp. 1–5. [Google Scholar] [CrossRef]
  39. Sobhaninia, Z.; Emami, A.; Karimi, N.; Samavi, S. Localization of Fetal Head in Ultrasound Images by Multiscale View and Deep Neural Networks. In Proceedings of the 2020 25th International Computer Conference, Computer Society of Iran (CSICC), Tehran, Iran, 1–2 January 2020; pp. 1–5. [Google Scholar] [CrossRef]
  40. Brahma, K.; Kumar, V.; Samir, A.E.; Chandrakasan, A.P.; Eldar, Y.C. Efficient Binary Cnn For Medical Image Segmentation. In Proceedings of the 2021 IEEE 18th International Symposium on Biomedical Imaging (ISBI), Nice, France, 13–16 April 2021; pp. 817–821. [Google Scholar] [CrossRef]
  41. Zeng, Y.; Tsui, P.H.; Wu, W.; Zhou, Z.; Wu, S. Fetal Ultrasound Image Segmentation for Automatic Head Circumference Biometry Using Deeply Supervised Attention-Gated V-Net. J. Digit. Imaging 2021, 34, 134–148. [Google Scholar] [CrossRef] [PubMed]
  42. Xu, L.; Gao, S.; Shi, L.; Wei, B.; Liu, X.; Zhang, J.; He, Y. Exploiting Vector Attention and Context Prior for Ultrasound Image Segmentation. Neurocomputing 2021, 454, 461–473. [Google Scholar] [CrossRef]
  43. Skeika, E.L.; Luz, M.R.D.; Fernandes, B.J.T.; Siqueira, H.V.; De Andrade, M.L.S.C. Convolutional Neural Network to Detect and Measure Fetal Skull Circumference in Ultrasound Imaging. IEEE Access 2020, 8, 191519–191529. [Google Scholar] [CrossRef]
  44. Wu, L.; Xin, Y.; Li, S.; Wang, T.; Heng, P.A.; Ni, D. Cascaded Fully Convolutional Networks for automatic prenatal ultrasound image segmentation. In Proceedings of the 2017 IEEE 14th International Symposium on Biomedical Imaging (ISBI 2017), Melbourne, Australia, 18–21 April 2017; pp. 663–666. [Google Scholar] [CrossRef]
  45. Sinclair, M.; Baumgartner, C.F.; Matthew, J.; Bai, W.; Martinez, J.C.; Li, Y.; Smith, S.; Knight, C.L.; Kainz, B.; Hajnal, J.; et al. Human-level Performance On Automatic Head Biometrics in Fetal Ultrasound Using Fully Convolutional Neural Networks. In Proceedings of the 2018 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Honolulu, HI, USA, 17–21 July 2018; pp. 714–717. [Google Scholar] [CrossRef]
  46. Al-Bander, B.; Alzahrani, T.; Alzahrani, S.; Williams, B.M.; Zheng, Y. Improving fetal head contour detection by object localisation with deep learning. In Medical Image Understanding and Analysis; Zheng, Y., Williams, B.M., Chen, K., Eds.; Springer International Publishing: Cham, Switzerland, 2020; pp. 142–150. [Google Scholar] [CrossRef]
  47. Zhang, J.; Petitjean, C.; Lopez, P.; Ainouz, S. Direct estimation of fetal head circumference from ultrasound images based on regression CNN. In Proceedings of the Third Conference on Medical Imaging with Deep Learning, Montreal, QC, Canada, 6–8 July 2020; Arbel, T., Ben Ayed, I., de Bruijne, M., Descoteaux, M., Lombaert, H., Pal, C., Eds.; PMLR: Baltimore, MA, USA, 2020; Volume 121, pp. 914–922. [Google Scholar]
  48. Fiorentino, M.C.; Moccia, S.; Capparuccini, M.; Giamberini, S.; Frontoni, E. A regression framework to head-circumference delineation from US fetal images. Comput. Methods Programs Biomed. 2021, 198, 105771. [Google Scholar] [CrossRef]
  49. Li, P.; Zhao, H.; Liu, P.; Cao, F. Automated measurement network for accurate segmentation and parameter modification in fetal head ultrasound images. Med. Biol. Eng. Comput. 2020, 58, 2879–2892. [Google Scholar] [CrossRef]
  50. Verburg, B.O.; Steegers, E.A.P.; De Ridder, M.; Snijders, R.J.M.; Smith, E.; Hofman, A.; Moll, H.A.; Jaddoe, V.W.V.; Witteman, J.C.M. New charts for ultrasound dating of pregnancy and assessment of fetal growth: Longitudinal data from a population-based cohort study. Ultrasound Obstet. Gynecol. 2008, 31, 388–396. [Google Scholar] [CrossRef]
  51. Mu, J.; Slevin, J.C.; Qu, D.; McCormick, S.; Adamson, S.L. In vivo quantification of embryonic and placental growth during gestation in mice using micro-ultrasound. Reprod. Biol. Endocrinol. 2008, 6, 34. [Google Scholar] [CrossRef]
  52. Butt, K.; Lim, K. Determination of Gestational Age by Ultrasound: In Response. J. Obstet. Gynaecol. Can. 2016, 38, 432. [Google Scholar] [CrossRef]
  53. Salomon, L.J.; Bernard, J.P.; Ville, Y. Estimation of fetal weight: Reference range at 20–36 weeks’ gestation and comparison with actual birth-weight reference range. Ultrasound Obstet. Gynecol. 2007, 29, 550–555. [Google Scholar] [CrossRef]
  54. Hadlock, F.P.; Harrist, R.; Sharman, R.S.; Deter, R.L.; Park, S.K. Estimation of fetal weight with the use of head, body, and femur measurements—A prospective study. Am. J. Obstet. Gynecol. 1985, 151, 333–337. [Google Scholar] [CrossRef]
  55. Hammami, A.; Mazer Zumaeta, A.; Syngelaki, A.; Akolekar, R.; Nicolaides, K.H. Ultrasonographic estimation of fetal weight: Development of new model and assessment of performance of previous models. Ultrasound Obstet. Gynecol. 2018, 52, 35–43. [Google Scholar] [CrossRef] [Green Version]
  56. Buslaev, A.; Iglovikov, V.I.; Khvedchenya, E.; Parinov, A.; Druzhinin, M.; Kalinin, A.A. Albumentations: Fast and Flexible Image Augmentations. Information 2020, 11, 125. [Google Scholar] [CrossRef]
  57. Hesamian, M.H.; Jia, W.; He, X.; Kennedy, P. Deep Learning Techniques for Medical Image Segmentation: Achievements and Challenges. J. Digit. Imaging 2019, 32, 582–596. [Google Scholar] [CrossRef]
  58. Tan, M.; Le, Q. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. In Proceedings of the 36th International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; Chaudhuri, K., Salakhutdinov, R., Eds.; PMLR: Baltimore, MA, USA, 2019; Volume 97, pp. 6105–6114. [Google Scholar]
  59. Yang, Y.; Lv, H. Discussion of Ensemble Learning under the Era of Deep Learning. arXiv 2021, arXiv:2101.08387. [Google Scholar]
  60. Polikar, R. Ensemble learning. In Ensemble Machine Learning: Methods and Applications; Zhang, C., Ma, Y., Eds.; Springer US: Boston, MA, USA, 2012; pp. 1–34. [Google Scholar] [CrossRef]
  61. Zhou, Z.; Rahman Siddiquee, M.M.; Tajbakhsh, N.; Liang, J. UNet++: A Nested U-Net Architecture for Medical Image Segmentation. In Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support; Stoyanov, D., Taylor, Z., Carneiro, G., Syeda-Mahmood, T., Martel, A., Maier-Hein, L., Tavares, J.M.R., Bradley, A., Papa, J.P., Belagiannis, V., et al., Eds.; Springer International Publishing: Cham, Switzerland, 2018; pp. 3–11. [Google Scholar] [CrossRef]
  62. Oktay, O.; Schlemper, J.; Le Folgoc, L.; Lee, M.; Heinrich, M.; Misawa, K.; Mori, K.; McDonagh, S.; Y Hammerla, N.; Kainz, B.; et al. Attention U-Net: Learning Where to Look for the Pancreas. arXiv 2018, arXiv:1804.03999. [Google Scholar]
  63. Huang, H.; Lin, L.; Tong, R.; Hu, H.; Zhang, Q.; Iwamoto, Y.; Han, X.; Chen, Y.W.; Wu, J. UNet 3+: A Full-Scale Connected UNet for Medical Image Segmentation. In Proceedings of the ICASSP 2020—2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 4–8 May 2020; pp. 1055–1059. [Google Scholar] [CrossRef]
  64. Chen, J.; Lu, Y.; Yu, Q.; Luo, X.; Adeli, E.; Wang, Y.; Lu, L.; Yuille, A.L.; Zhou, Y. TransUNet: Transformers Make Strong Encoders for Medical Image Segmentation. arXiv 2021, arXiv:2102.04306. [Google Scholar] [CrossRef]
  65. Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature Pyramid Networks for Object Detection. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 936–944. [Google Scholar] [CrossRef]
  66. Chen, L.C.; Papandreou, G.; Schroff, F.; Adam, H. Rethinking Atrous Convolution for Semantic Image Segmentation. arXiv 2017, arXiv:1706.05587. [Google Scholar] [CrossRef]
  67. O’Malley, T.; Bursztein, E.; Long, J.; Chollet, F.; Jin, H.; Invernizzi, L.; et al. KerasTuner. 2019. Available online: https://github.com/keras-team/keras-tuner (accessed on 1 April 2022).
  68. Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016. [Google Scholar]
  69. Asgari Taghanaki, S.; Abhishek, K.; Cohen, J.P.; Cohen-Adad, J.; Hamarneh, G. Deep semantic segmentation of natural and medical images: A review. Artif. Intell. Rev. 2021, 54, 137–178. [Google Scholar] [CrossRef]
  70. Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal Loss for Dense Object Detection. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 42, 318–327. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  71. Bertels, J.; Eelbode, T.; Berman, M.; Vandermeulen, D.; Maes, F.; Bisschops, R.; Blaschko, M.B. Optimizing the Dice Score and Jaccard Index for Medical Image Segmentation: Theory and Practice. In Proceedings of the Medical Image Computing and Computer Assisted Intervention—MICCAI 2019, Shenzhen, China, 13–17 October 2019; Shen, D., Liu, T., Peters, T.M., Staib, L.H., Essert, C., Zhou, S., Yap, P.T., Khan, A., Eds.; Springer International Publishing: Cham, Switzerland, 2019; pp. 92–100. [Google Scholar] [CrossRef]
  72. Zou, F.; Shen, L.; Jie, Z.; Zhang, W.; Liu, W. A Sufficient Condition for Convergences of Adam and RMSProp. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 11119–11127. [Google Scholar] [CrossRef]
  73. Ning, C.; Liu, S.; Qu, M. Research on removing noise in medical image based on median filter method. In Proceedings of the 2009 IEEE International Symposium on IT in Medicine & Education, Albuquerque, NM, USA, 2–5 August 2009; Volume 1, pp. 384–388. [Google Scholar] [CrossRef]
  74. Van der Walt, S.; Schönberger, J.L.; Nunez-Iglesias, J.; Boulogne, F.; Warner, J.D.; Yager, N.; Gouillart, E.; Yu, T. Scikit-image: Image processing in Python. PeerJ 2014, 2, e453. [Google Scholar] [CrossRef] [PubMed]
  75. Song, Y.; Liu, J. An improved adaptive weighted median filter algorithm. J. Phys. Conf. Ser. 2019, 1187, 042107. [Google Scholar] [CrossRef]
  76. Hu, C.; Wang, G.; Ho, K.C.; Liang, J. Robust Ellipse Fitting with Laplacian Kernel Based Maximum Correntropy Criterion. IEEE Trans. Image Process. 2021, 30, 3127–3141. [Google Scholar] [CrossRef]
  77. Al-Thelaya, K.; Agus, M.; Gilal, N.; Yang, Y.; Pintore, G.; Gobbetti, E.; Calí, C.; Magistretti, P.; Mifsud, W.; Schneider, J. InShaDe: Invariant Shape Descriptors for visual 2D and 3D cellular and nuclear shape analysis and classification. Comput. Graph. 2021, 98, 105–125. [Google Scholar] [CrossRef]
  78. Gavin, H.P. The Levenberg-Marquardt Algorithm for Nonlinear Least Squares Curve-Fitting Problems; Duke University: Durham, NC, USA, 2019; pp. 1–19. [Google Scholar]
  79. Voglis, C.; Lagaris, I. A rectangular trust region dogleg approach for unconstrained and bound constrained nonlinear optimization. In Proceedings of the WSEAS International Conference on Applied Mathematics, Corfu Island, Greece, 17–19 August 2004; Volume 7. [Google Scholar]
  80. Altman, D.G.; Chitty, L.S. New charts for ultrasound dating of pregnancy. Ultrasound Obstet. Gynecol. 1997, 10, 174–191. [Google Scholar] [CrossRef]
  81. Sedgwick, P. Pearson’s correlation coefficient. BMJ 2012, 345, e4483. [Google Scholar] [CrossRef]
  82. Kiserud, T.; Piaggio, G.; Carroli, G.; Widmer, M.; Carvalho, J.; Neerup Jensen, L.; Giordano, D.; Cecatti, J.G.; Abdel Aleem, H.; Talegawkar, S.A.; et al. The World Health Organization Fetal Growth Charts: A Multinational Longitudinal Study of Ultrasound Biometric Measurements and Estimated Fetal Weight. PLoS Med. 2017, 14, e1002220. [Google Scholar] [CrossRef] [Green Version]
  83. Samajdar, T.; Quraishi, M.I. Analysis and Evaluation of Image Quality Metrics. In Information Systems Design and Intelligent Applications; Mandal, J.K., Satapathy, S.C., Kumar Sanyal, M., Sarkar, P.P., Mukhopadhyay, A., Eds.; Springer: New Delhi, India, 2015; pp. 369–378. [Google Scholar] [CrossRef]
  84. Qadri, S.F.; Shen, L.; Ahmad, M.; Qadri, S.; Zareen, S.S.; Khan, S. OP-convNet: A Patch Classification-Based Framework for CT Vertebrae Segmentation. IEEE Access 2021, 9, 158227–158240. [Google Scholar] [CrossRef]
  85. Lundin, A. Clarius Mobile Health Makes Leadership Changes to Accelerate Growth. AXIS Imaging News 2022. [Google Scholar]
  86. Strumia, A.; Costa, F.; Pascarella, G.; Del Buono, R.; Agrò, F.E. U smart: Ultrasound in your pocket. J. Clin. Monit. Comput. 2021, 35, 427–429. [Google Scholar] [CrossRef]
Figure 1. Typical prenatal ultrasound images from each trimester. (A–C) First trimester; green arrows indicate a blurred fetal head and artifacts. (D–F) Second trimester; blue arrows indicate a poor signal-to-noise ratio and reflections from the fetal membranes and the amniotic fluid interface. (G–I) Third trimester; yellow arrows indicate speckle noise and standard sutures or ultrasonography artifacts.
Figure 2. Workflow of the pipeline followed in this paper: Block (1) for fetal head segmentation; Block (2) for smoothing and measuring; Block (3) for fetal GA and weight prediction.
Figure 3. The architecture of EfficientNetB0.
Figure 4. Illustration of the fetal head measurement.
Figure 5. Segmentation network performance based on the IoU validation score during training with 128 × 128 input size.
Figure 6. Qualitative comparison of the segmentation performance of the networks on a fetal head ultrasound image. The predicted mask, ground truth, and original image boundaries are shown; the masks predicted by the different networks and by the proposed ETLM are shown in the first row.
Table 1. Distribution of dataset during the various trimesters of pregnancy.

| Trimester of Pregnancy | Training Sets | Testing Sets |
|---|---|---|
| First trimester | 165 | 55 |
| Second trimester | 693 | 233 |
| Third trimester | 141 | 47 |
| Total | 999 | 335 |
Table 2. The selected segmentation models and their details. All eight models share the same training configuration: EfficientNetB0 backbone; Softmax output function; inputs normalized to [0, 1]; one-hot encoded masks (0 = black pixel, 1 = white pixel); RMSprop optimizer with a step-decay learning-rate scheduler; categorical focal Jaccard loss; batch size 32; 100 epochs; input sizes 64 × 64 and 128 × 128.

| Model Name | Trainable Params |
|---|---|
| UNet | 2,776,114 |
| UNet_plus | 2,389,042 |
| Att_UNet | 2,614,725 |
| UNet 3+ | 3,183,330 |
| TransUNet | 2,218,322 |
| FPN | 4,911,614 |
| LinkNet | 6,049,342 |
| DeepLabv3 | 4,027,810 |
Table 3. Comparing two HC measurement formulas with the HC ground truth using the mean difference.

| Formula | Mean HC of the GT | Mean HC by Each Formula | Mean Difference |
|---|---|---|---|
| Our formula | 174.3831 mm | 174.2411 mm | −0.14203 mm |
| Other formula | 174.3831 mm | 178.3705 mm | 3.9874 mm |
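The paper’s own HC formula is not reproduced in this extract. As a point of reference only, a widely used way to compute HC is to approximate the perimeter of the fitted ellipse with Ramanujan’s formula from the BPD and OFD diameters; the sketch below is our illustration (function name and toy diameters are ours):

```python
# Sketch (not the paper's formula): HC as the Ramanujan approximation of the
# perimeter of an ellipse whose axes are the OFD and BPD. Toy inputs.
import math

def hc_ramanujan(bpd_mm, ofd_mm):
    """Approximate head circumference (mm) from BPD and OFD diameters (mm)."""
    a, b = ofd_mm / 2.0, bpd_mm / 2.0  # semi-major and semi-minor axes
    return math.pi * (3 * (a + b) - math.sqrt((3 * a + b) * (a + 3 * b)))

print(round(hc_ramanujan(bpd_mm=48.0, ofd_mm=62.0), 2))  # ~173.49 mm
```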
Table 4. Fetal gestational age dataset.

| Dataset | GA Validation (10–40 weeks) | GA Training (13–25 weeks) | GA Testing (GA < 13 or GA > 25) |
|---|---|---|---|
| Training | 999 | 692 | 307 |
| Testing | 335 | 232 | 103 |
| Total | 1334 | 924 | 410 |
Table 5. Estimated fetal weight dataset.

| Dataset | EFW Validation (10–40 weeks) | EFW Training (20–36 weeks) | EFW Testing (GA < 20 or GA > 36) |
|---|---|---|---|
| Training | 999 | 551 | 448 |
| Testing | 335 | 175 | 160 |
| Total | 1334 | 726 | 608 |
Table 6. Evaluation levels for the segmentation model.

| | Level 1: Segmentation Evaluation | Level 2: Post-Processing Evaluation | Level 3: Measurement Evaluation |
|---|---|---|---|
| Data split | Training 80% / Validation 20% | Validation 100% | Validation 100% |
| Augmented images | 7992 training / 1998 validation | — | — |
| Original images | — | Training set (999) | Testing set (335) |
Table 7. Level one: performance evaluation and comparison of segmentation results for all models with various input sizes, and with and without augmentation.

| Input Size | Network | Augmentation | ACC | mIoU | Pre | Recall | DSC | AUC | MSE | mPA | Time (h:mm:ss) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 128 × 128 | UNet | No | 0.9855 | 0.9667 | 0.9780 | 0.9740 | 0.9854 | 0.9820 | 0.014 | 0.9711 | 0:11:02 |
| 128 × 128 | UNet_plus | No | 0.9852 | 0.9662 | 0.9710 | 0.9815 | 0.9851 | 0.9842 | 0.014 | 0.9665 | 0:10:43 |
| 128 × 128 | Att_UNet | No | 0.9862 | 0.9680 | 0.9769 | 0.9787 | 0.9862 | 0.9841 | 0.013 | 0.9721 | 0:11:35 |
| 128 × 128 | UNet 3+ | No | 0.9856 | 0.9671 | 0.9770 | 0.9766 | 0.9856 | 0.9830 | 0.014 | 0.9693 | 0:25:20 |
| 128 × 128 | TransUNet | No | 0.9852 | 0.9662 | 0.9783 | 0.9740 | 0.9852 | 0.9821 | 0.014 | 0.9756 | 0:12:08 |
| 128 × 128 | FPN | No | 0.9866 | 0.9693 | 0.9790 | 0.9778 | 0.9860 | 0.9840 | 0.013 | 0.9730 | 0:13:29 |
| 128 × 128 | LinkNet | No | 0.9857 | 0.9673 | 0.9770 | 0.9760 | 0.9856 | 0.9830 | 0.014 | 0.9692 | 0:12:14 |
| 128 × 128 | Deeplabv3 | No | 0.9852 | 0.9660 | 0.9791 | 0.9727 | 0.9845 | 0.9817 | 0.014 | 0.9763 | 0:11:04 |
| 64 × 64 | UNet | Yes | 0.9917 | 0.9810 | 0.9870 | 0.9859 | 0.9916 | 0.9900 | 0.008 | 0.9870 | 0:39:00 |
| 64 × 64 | UNet_plus | Yes | 0.9898 | 0.9767 | 0.9843 | 0.9833 | 0.9896 | 0.9880 | 0.010 | 0.9815 | 0:38:18 |
| 64 × 64 | Att_UNet | Yes | 0.9919 | 0.9815 | 0.9881 | 0.9863 | 0.9919 | 0.9900 | 0.008 | 0.9875 | 0:40:38 |
| 64 × 64 | UNet 3+ | Yes | 0.9920 | 0.9816 | 0.9883 | 0.9862 | 0.9919 | 0.9904 | 0.007 | 0.9892 | 1:16:44 |
| 64 × 64 | TransUNet | Yes | 0.9913 | 0.9802 | 0.9873 | 0.9851 | 0.9912 | 0.9896 | 0.008 | 0.9873 | 0:44:44 |
| 64 × 64 | FPN | Yes | 0.9926 | 0.9831 | 0.9887 | 0.9878 | 0.9925 | 0.9913 | 0.007 | 0.9886 | 0:48:51 |
| 64 × 64 | LinkNet | Yes | 0.9912 | 0.9800 | 0.9868 | 0.9854 | 0.9911 | 0.9896 | 0.008 | 0.9860 | 0:46:13 |
| 64 × 64 | Deeplabv3 | Yes | 0.9908 | 0.9790 | 0.9869 | 0.9838 | 0.9903 | 0.9889 | 0.009 | 0.9842 | 1:07:17 |
| 64 × 64 | ETLM | Yes | 0.9928 | 0.9841 | 0.9892 | 0.9881 | 0.9934 | 0.9918 | 0.008 | 0.9904 | NA |
| 128 × 128 | UNet | Yes | 0.9928 | 0.9820 | 0.9888 | 0.9886 | 0.9928 | 0.9917 | 0.007 | 0.9898 | 0:37:15 |
| 128 × 128 | UNet_plus | Yes | 0.9923 | 0.9807 | 0.9879 | 0.9879 | 0.9922 | 0.9911 | 0.007 | 0.9877 | 0:35:10 |
| 128 × 128 | Att_UNet | Yes | 0.9928 | 0.9819 | 0.9887 | 0.9885 | 0.9927 | 0.9916 | 0.007 | 0.9891 | 0:38:59 |
| 128 × 128 | UNet 3+ | Yes | 0.9933 | 0.9832 | 0.9900 | 0.9890 | 0.9933 | 0.9921 | 0.006 | 0.9908 | 1:40:12 |
| 128 × 128 | TransUNet | Yes | 0.9928 | 0.9819 | 0.9890 | 0.9884 | 0.9927 | 0.9916 | 0.007 | 0.9892 | 0:38:29 |
| 128 × 128 | FPN | Yes | 0.9939 | 0.9846 | 0.9908 | 0.9899 | 0.9938 | 0.9928 | 0.006 | 0.9905 | 0:42:47 |
| 128 × 128 | LinkNet | Yes | 0.9927 | 0.9817 | 0.9892 | 0.9879 | 0.9926 | 0.9914 | 0.007 | 0.9886 | 0:36:30 |
| 128 × 128 | Deeplabv3 | Yes | 0.9926 | 0.9828 | 0.9886 | 0.9878 | 0.9923 | 0.9913 | 0.007 | 0.9884 | 0:43:11 |
| 128 × 128 | ETLM | Yes | 0.9942 | 0.9853 | 0.9913 | 0.9903 | 0.9908 | 0.99316 | 0.005 | 0.9914 | NA |
Table 8. Level two: predicted mask (post-processed) quality assessment for models with various input sizes.

| Input Size | Network | Original Training Images | mHD (mm) | MSD (mm) | RVD | MSSIM | PSNR |
|---|---|---|---|---|---|---|---|
| 64 × 64 | ETLM | Yes | 0.927634 | 0.0034989 | −0.00387 | 0.98108–0.98255 | 25.142206 |
| 64 × 64 | FPN | Yes | 1.186636 | 0.0049680 | −0.01237 | 0.97322–0.97544 | 23.47897 |
| 64 × 64 | UNet | Yes | 1.118771 | 0.0048532 | −0.01213 | 0.97352–0.9757270 | 23.5358 |
| 64 × 64 | Att_UNet | Yes | 1.512662 | 0.0049149 | −0.01222 | 0.973301–0.9755263 | 23.505971 |
| 64 × 64 | Trans_UNet | Yes | 1.118771 | 0.0049047 | −0.01208 | 0.97344304–0.97563850 | 23.50993 |
| 128 × 128 | ETLM | Yes | 0.753095 | 0.0018117 | 0.001639 | 0.989922–0.990706 | 28.247806 |
| 128 × 128 | FPN | Yes | 0.625412 | 0.0020034 | −0.00264 | 0.9888480–0.9896689 | 27.536022 |
| 128 × 128 | UNet | Yes | 1.250824 | 0.0020566 | −0.00196 | 0.98856–0.989421 | 27.484995 |
| 128 × 128 | Att_UNet | Yes | 0.988862 | 0.0020950 | −0.00177 | 0.988375–0.989247 | 27.41142 |
| 128 × 128 | Trans_UNet | Yes | 0.753095 | 0.0020579 | −0.00243 | 0.988523–0.98937365 | 27.43699 |
Table 9. Level three: measurement evaluation based on the testing dataset.

| Input Size | Network | Original Testing Images | mHD (mm) | MAD (mm) | DSC |
|---|---|---|---|---|---|
| 128 × 128 | ETLM | Yes | 1.6715 | 1.8735 | 0.9716 |
Table 10. Comprehensive comparison with state-of-the-art models (segmentation, measurement, and model weight).

| Network | ACC | mIoU | Pre | mPA | DSC | MAD (mm) | mHD (mm) | Input Size | Batch Size | GPU RAM | Epochs | Training Time |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ETLM [UNet : Att_UNet : FPN] = [0.3 : 0.3 : 0.4] | 0.9942 | 0.9853 | 0.9913 | 0.9914 | 0.9716 | 1.87 | 1.67 | 128 × 128 | 32 | 11 GB | 300 | 2:01 h |
| VNet-c [43] | 0.9888 | 0.9594 | 0.9767 | NA | 0.9791 | 1.89 | NA | 512 × 512 | 4 | 6 GB | 300 | 53:35 h |
| VSAL [42] | NA | NA | NA | 0.990 | 0.9710 | NA | 3.234 | 256 × 256 | 4 | 24 GB | 100 | 17:30 h |
| SAPNet [49] | NA | 0.9646 | NA | 0.9802 | 0.9790 | 1.81 | 1.22 | 480 × 320 | 10 | 11 GB | 700 | NA |
| Regression CNN [47] | NA | NA | NA | NA | 0.9776 | 1.90 | 1.32 | 800 × 800 | 16 | NA | 1500 | NA |
| DAG V-Net [41] | NA | NA | NA | NA | 0.9793 | 1.77 | 1.27 | 768 × 512 | 2 | 11 GB | 20 | 30 h |
| MTLN [32] | NA | NA | NA | NA | 0.9684 | 2.12 | 1.72 | 800 × 540 | NA | 11 GB | 200 | 15 h |
| UNet [36] | NA | NA | NA | NA | 0.9731 | 2.69 | NA | 216 × 320 | 4 | 32 GB | 100 | NA |
| DSCNN [40] | NA | NA | NA | NA | 0.9689 | NA | NA | NA | NA | NA | NA | NA |
| MS-LinkNet [39] | NA | NA | NA | NA | 0.9375 | 2.27 | 3.70 | NA | 10 | 11 GB | 150 | 18 h |
Table 11. Result and validation of multiple regression models for GA and EFW prediction. GA was predicted at the 50th percentile for 13 < GA < 25 weeks; EFW was predicted at the 50th percentile for 20 < GA < 36 weeks.

| Regression Model | GA MSE | GA Pearson’s r | EFW MSE | EFW Pearson’s r |
|---|---|---|---|---|
| Polynomial Regression | 0.00033 | 0.9958 | 9.08723 | 0.9422 |
| Linear Regression | 0.00205 | 0.9899 | 0.00035 | 0.9988 |
| Random Forest Regressor | 0.00842 | 0.9511 | 6.54380 | 0.9844 |
| XGBRFRegressor | 0.02268 | 0.9505 | 0.00018 | 0.9847 |
| Neural network | 0.01392 | 0.9805 | 0.00256 | 0.9946 |
| KNeighbors Regressor | 0.00921 | 0.9582 | 0.00214 | 0.9841 |
| SGDRegressor | 0.00219 | 0.9901 | 0.00146 | 0.9968 |
| AdaBoostRegressor | 0.01086 | 0.9505 | 0.00100 | 0.9843 |
| BaggingRegressor | 0.01081 | 0.9832 | 0.00281 | 0.9964 |
| StackingRegressor | 0.00824 | 0.9506 | 6.93890 | 0.9843 |
| LinearSVR | 0.00199 | 0.9901 | 0.00054 | 0.9989 |
| LGBMRegressor | 0.01011 | 0.9514 | 7.72867 | 0.9843 |
| Lasso | 0.08300 | NA | 0.17339 | 0.8507 |
| VotingRegressor | 0.00248 | 0.9909 | 0.00031 | 0.8507 |
| BayesianRidge | 0.00206 | 0.9899 | 0.00035 | 0.9988 |
| Deep NN | 0.00072 | 0.9978 | 0.00068 | NA |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
