Article

An Earthquake Fatalities Assessment Method Based on Feature Importance with Deep Learning and Random Forest Models

Key Laboratory of Earthquake Engineering and Engineering Vibration, Institute of Engineering Mechanics, China Earthquake Administration, Harbin 150080, China
*
Author to whom correspondence should be addressed.
Sustainability 2019, 11(10), 2727; https://0-doi-org.brum.beds.ac.uk/10.3390/su11102727
Submission received: 27 March 2019 / Revised: 24 April 2019 / Accepted: 8 May 2019 / Published: 14 May 2019

Abstract

This study aims to analyze and compare the importance of the features affecting earthquake fatalities in mainland China and to establish a deep learning model that assesses potential fatalities based on the selected factors. The random forest (RF), classification and regression tree (CART), and AdaBoost models were used to assess the importance of nine features, and the analysis showed that the RF model outperformed the other two. Furthermore, we compared the contributions of 43 different structure types to casualties based on the RF model. Finally, we proposed a model for estimating earthquake fatalities based on seismic data from 1992 to 2017 in mainland China. The results indicate that the deep learning model produced in this study performs well in predicting seismic fatalities. The method could help reduce casualties during emergencies and inform future building construction.

1. Introduction

Earthquakes pose a major threat to the Chinese population (Table 1). A rapid and accurate estimate of the number of casualties in an earthquake can reduce the impact and losses of the disaster [1], and the human and material resources of emergency management can be allocated by predicting the death toll [2]. We use the surface-wave magnitude (Ms) in this study. According to the current emergency response regulations of the relevant Chinese departments, emergency personnel and materials are allocated in the following tiers (a code sketch of this lookup follows the list):
(1) When the magnitude is less than 6 and the predicted number of deaths is 0–10, the government will need 10–50 emergency personnel and 200–300 tents.
(2) When the magnitude is greater than or equal to 6 and less than 6.5, and the predicted number of deaths is 0–10, 50–100 emergency personnel and 1000–3000 tents are needed.
(3) When the magnitude is greater than or equal to 6.5 and less than 7, and the predicted death toll is 0–10, 200–500 emergency personnel and 3000–5000 tents will be needed; if the predicted number is more than 10, 500–1000 emergency personnel and 5000–10,000 tents will be needed.
(4) When the magnitude is 7 or more and the predicted death toll is less than 10, 500–1000 emergency personnel and 5000–10,000 tents will be required; when the death toll is between 10 and 100, 1000–5000 emergency personnel and 10,000–20,000 tents will be required; when the death toll is 100–1000, 5000–10,000 emergency personnel and more than 20,000 tents will be needed.
(5) When the number of deaths is greater than 1000, the necessary emergency personnel and material distribution must be drawn up according to the specific economic and political conditions of the local area.
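Read as a lookup table, these tiers could be encoded as in the following sketch; this is a simplified, hypothetical rendering of the regulation for illustration only, returning a (personnel range, tent range) pair:

```python
def emergency_resources(magnitude, predicted_deaths):
    # Simplified, illustrative encoding of the response tiers described above.
    if magnitude < 6 and predicted_deaths <= 10:
        return (10, 50), (200, 300)
    if 6 <= magnitude < 6.5 and predicted_deaths <= 10:
        return (50, 100), (1000, 3000)
    if 6.5 <= magnitude < 7:
        if predicted_deaths <= 10:
            return (200, 500), (3000, 5000)
        return (500, 1000), (5000, 10000)
    if magnitude >= 7:
        if predicted_deaths < 10:
            return (500, 1000), (5000, 10000)
        if predicted_deaths <= 100:
            return (1000, 5000), (10000, 20000)
        if predicted_deaths <= 1000:
            return (5000, 10000), (20000, None)  # more than 20,000 tents
    return None  # deaths > 1000: allocated case-by-case per local conditions

print(emergency_resources(6.8, 30))  # -> ((500, 1000), (5000, 10000))
```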
However, many factors may affect fatalities, and not every factor has a decisive impact on earthquake casualties. Therefore, a suitable method is needed to evaluate the importance of each factor.
Linear models are the most commonly used methods for assessing feature correlation [3]. The study in [4] derived the relationship between human losses and factors such as population density and the intensity and magnitude of the earthquakes using linear models. Nevertheless, because of the uncertainty and fuzziness in the factor data [5], integrated ensemble models were proposed and applied to feature importance assessment [6,7,8,9] to improve the accuracy and generalization ability of traditional linear models [10]. In recent studies, ensemble algorithms have been proven to outperform linear models in both prediction ability and generalization capacity [3]. However, so far, no research has evaluated the importance of influencing factors and of different structure types on earthquake casualties using machine learning methods. Previous studies of earthquake casualties selected influencing factors [2,11] and structure types [12] directly from experience, or derived them from statistical methods [13].
Different methods have been developed to estimate casualties in earthquakes. Most studies used empirical analysis methods [12,14,15] and some software systems to assess casualties, for instance, geographic information systems (GIS) [11,16], the U.S. Geological Survey's Prompt Assessment of Global Earthquakes for Response (PAGER) system [17], and the Disaster Management Tool (DMT) software [18]. In reference [18], the authors present a casualty estimation model that is part of the DMT software. The model is based on the evaluation of laser-scanning data collected by airborne sensors; it can also be used to detect collapsed buildings, to assess their damage type, and to compute the number of trapped victims for each collapsed building. The PAGER system, which draws on the EERI World Housing Encyclopedia (WHE) project (including non-engineered buildings) [19], can estimate the fatalities of large earthquakes within two hours [17]. However, these systems cannot assess losses within a few minutes. Empirical methods usually establish linear models evaluated by fitting one or more functions [20]. These models have many disadvantages: the workload tends to be large and the amount of data small; outliers are usually deleted rather than fitted together with the rest of the data; and they carry strong subjectivity. These shortcomings can be compensated by neural networks in the field of machine learning [21].
With the rise of machine learning algorithms, studies estimating fatalities with the back propagation neural network (BPNN) method have begun to emerge [21]. Because earthquakes differ in intensity, population density, and structure types, it is extremely difficult to define a deterministic relationship for evaluating the fatalities caused by an earthquake. Hence, deep learning, with its ability to capture such complex relationships, could be an outstanding method for evaluating fatalities. However, the BPNN method is far from a perfect network and has many shortcomings: (1) the convergence speed is slow, taking hundreds of iterations or more to converge [22]; (2) it cannot guarantee convergence to a global minimum [23,24]; (3) the numbers of hidden layers and neurons are not theoretically guided but determined empirically, so the network tends to be large [22], and this redundancy invisibly increases the network's learning time [25]; and (4) the learning and memory of the network are unstable. Deep learning optimization algorithms can remedy these shortcomings of the BPNN method.
Therefore, we assessed the importance of the factors based on three machine learning methods and selected the random forest algorithm as the optimal classifier. At the same time, we evaluated the contribution degree of 43 different structure types based on the random forest algorithm. Finally, the deep learning assessment model was established with the factors of population density, magnitude, focal depth, epicentral intensity, and time. Figure 1 shows the flowchart of the entire assessment process.

2. Features

2.1. Data

An important question in studies of expected human losses is whether a conditioning variable is actually useful and needed for assessment and prediction. Many factors affect earthquake casualties, such as the intensity of the earthquake, the vulnerability of houses, and the economic development of the affected area. However, some factors lack data for every earthquake. We chose the following ten features: date, time, magnitude, epicentral intensity, abnormal intensity, focal depth, secondary disasters, population density, economic situation, and damage ratio of different structure types.
  • The regularity of people's working and resting hours means that differences in earthquake occurrence time have a great impact on fatalities [4,13]. Time was divided into two parts in this study: daytime (7:00–21:00) and sleeping time (21:00 to 7:00 the next day).
  • Magnitude is a parameter measuring the energy released by seismic waves; generally, the greater the magnitude, the greater the disaster caused. Magnitude is expressed here by the surface-wave magnitude Ms commonly used in China.
  • Treating the seismic source as a point, the vertical distance from that point to the ground surface is called the focal depth. Often, the smaller the focal depth, the closer the source is to the epicenter and the greater the damage caused.
  • In general, the higher the intensity, the greater the casualties [4]. This study used the epicenter intensity listed in the Earthquake Disasters and Losses Assessment Report in Chinese Mainland.
  • Population density has a great impact on the number of earthquake deaths [4]. There are obvious differences between densely and sparsely populated areas. For example, the population density in Tibet is low and some places are uninhabited, so casualties in an earthquake there are bound to be small. Conversely, high population densities contribute to an increase in the number of deaths [2]. The location of the epicenter can be obtained from the China Earthquake Networks Center after an earthquake, and the population density can be obtained from data published by the Statistical Bureau.
  • Generally, seismic intensity decreases with increasing distance from the epicenter. However, high-intensity points sometimes appear in low-intensity areas (or vice versa) for reasons such as geological structure, topography, and the superposition of deep seismic reflection waves; this is called an intensity anomaly. The abnormal intensity in this paper refers to the occurrence of high-intensity points in low-intensity areas, which often aggravates disaster losses. Abnormal intensity was expressed in two cases: yes and no.
  • The economic situation has a great impact on disaster losses [4]. Usually, the better the economic situation, the lighter the disaster at the same earthquake intensity, and the higher the population density in areas where social wealth is concentrated. Following the descriptions in the Earthquake Disasters and Losses Assessment Report in Chinese Mainland, the economic situation was divided into seven categories, ordered from worst to best: (1) national poverty region; (2) special poverty area, deep poverty area, remote and poor mountainous area, remote and poor area, border poverty area, provincial poverty region, remote area; (3) minority poverty area, general poverty area; (4) financial deficit area; (5) economically backward area, minority area; (6) general area; (7) western medium-developed area.
  • Secondary disasters cause additional damage to the disaster area, and their impact cannot be underestimated: they mostly manifest as mountain collapse, landslide, and debris flow, with very few resulting from fires. Because of their small number of occurrences, they could only be divided into two cases in this study: yes and no.
  • Most fatalities are caused by building damage [2], and this factor is vital to the number of deaths [13]. Therefore, this paper chooses the damage ratios of houses as a feature. The damage ratios cover collapse, heavy damage, moderate damage, and slight damage.
  • The date of an earthquake can sometimes aggravate the damage; for instance, rain and snow hamper rescue efforts. Dates were processed quarterly: a year is divided into four quarters in this study. A sketch of these categorical encodings follows this list.
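As a concrete illustration of these encodings, the sketch below maps one earthquake record to the categorical features described above; the field names are hypothetical.

```python
def encode_features(record):
    # Encode one earthquake record's categorical features as described above.
    daytime = 1 if 7 <= record["hour"] < 21 else 0        # daytime vs. sleeping time
    quarter = (record["month"] - 1) // 3 + 1              # date processed by quarter (1-4)
    abnormal = 1 if record["abnormal_intensity"] else 0   # yes/no
    secondary = 1 if record["secondary_disaster"] else 0  # yes/no
    economy = record["economic_category"]                 # ordinal 1 (worst) to 7 (best)
    return [daytime, quarter, abnormal, secondary, economy]

print(encode_features({"hour": 3, "month": 7, "abnormal_intensity": False,
                       "secondary_disaster": True, "economic_category": 4}))
# -> [0, 3, 0, 1, 4]
```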

2.2. Importance Assessments of 9 Features

The selected supervised classifiers are random forests (RF), adaptive boosting (AdaBoost), and classification and regression tree (CART). The CART algorithm, a decision-tree model and non-parametric data-mining method, has many advantages, including ease of handling numerical and categorical data and multiple-output situations [25]. The CART algorithm serves as a component learner that uses the Gini index as its splitting criterion, while ensemble learning combines multiple weak classifiers using different methods. The most common ensemble methods are bootstrap aggregating (bagging) and boosting, where bagging is a parallel algorithm and boosting is a sequential algorithm. The RF algorithm, an extension of the bagging method, exploits random binary trees to discriminate and classify data [7]. The AdaBoost approach, a boosting algorithm, constructs a strong classifier from weak classifiers and updates the sample weights based on the learning error, as shown in Figure 2. Each sample in the training data is first given an equal weight. A weak classifier is trained on the data and its error rate ε is calculated. The classifier is then trained again on the reweighted data: the weights of samples misclassified by the first classifier are increased, while the weights of correctly classified samples are reduced, before the second classifier is trained. AdaBoost calculates ε for each weak classifier and assigns a weight α to each classifier. In Figure 2, the first row is the data set, where the different widths of the histograms represent different weights on the samples; after passing through a classifier, the data set in the third row is reweighted according to α [26]. The final output is obtained by summing the weighted results. The error rate ε (Equation (1)) and the classifier weight α (Equation (2)) are calculated as follows:
ε = N1/N2        (1)
α = (1/2) ln((1 − ε)/ε)        (2)
where N1 and N2 are the numbers of incorrectly classified samples and of all classified samples, respectively.
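As a minimal illustration of Equations (1) and (2), the sketch below computes one boosting round's error rate, classifier weight, and sample reweighting; the toy labels and predictions are hypothetical values, not data from this study.

```python
import numpy as np

# Hypothetical ground-truth labels and one weak classifier's predictions.
y_true = np.array([1, 1, -1, -1, 1, -1])
y_pred = np.array([1, -1, -1, -1, 1, 1])

n_wrong = np.sum(y_pred != y_true)   # N1: incorrectly classified samples
n_total = y_true.size                # N2: all classified samples
epsilon = n_wrong / n_total                      # Equation (1)
alpha = 0.5 * np.log((1 - epsilon) / epsilon)    # Equation (2)

# Reweight samples for the next round: boost the misclassified ones.
w = np.full(n_total, 1.0 / n_total)
w *= np.where(y_pred != y_true, np.exp(alpha), np.exp(-alpha))
w /= w.sum()
print(epsilon, alpha, w)
```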
We worked with the CART, RF, and AdaBoost models implemented in Jupyter notebooks under Anaconda Navigator. Data generally needs to be standardized in machine learning, so we used the StandardScaler preprocessing method of the sklearn library to process the magnitude, focal depth, epicentral intensity, and population density. Equation (3) presents the process:
x' = (x − μ)/σ        (3)
where x is the feature value, μ is the mean of the data, and σ is the standard deviation of the data. The parameter selection for each model is shown in Table 2. Table 3 presents the result of verifying the models with the cross_val_score function of the sklearn library, and the importance of the nine features in casualty assessment is shown in Figure 3.
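The following sketch shows this workflow with scikit-learn, using the parameters of Table 2; it is written against the scikit-learn API of the paper's time, and the data files and column indices are hypothetical assumptions.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.model_selection import cross_val_score

# Hypothetical files: X holds the nine features, y the casualty classes.
X, y = np.load("features.npy"), np.load("labels.npy")

# Standardize magnitude, focal depth, epicentral intensity, and population
# density per Equation (3); the column indices here are assumptions.
cont = [2, 3, 5, 7]
X[:, cont] = StandardScaler().fit_transform(X[:, cont])

models = {
    "RF": RandomForestClassifier(n_estimators=82, n_jobs=-1, max_features=None),
    "CART": DecisionTreeClassifier(criterion="gini", max_depth=10, max_features=None),
    "AdaBoost": AdaBoostClassifier(DecisionTreeClassifier(), algorithm="SAMME.R",
                                   learning_rate=0.5, n_estimators=379),
}
for name, model in models.items():
    print(name, cross_val_score(model, X, y).mean())  # mean accuracy, as in Table 3
    model.fit(X, y)
    print(name, model.feature_importances_)           # relative importances, Figure 3
```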

2.3. Importance Assessments of Structure Types

The importance of different structure types was assessed separately because of the complexity of the structure types and their great contribution to the death toll [27]. For the accuracy and comprehensiveness of the assessment, this study listed 43 common and special structure types from the Earthquake Disasters and Losses Assessment Report in Chinese Mainland from 1992 to 2017: reinforced concrete frame structure, masonry structure, brick-wood structure, civil structure, national brick-wood structure, brick-concrete structure, bucket-piercing frame structure, brick-concrete structure (building of two or more floors), shed, brick-concrete structure (building of only one floor), brick-column civil structure, wing-room, national civil structure, simple house with dry fortified earth wall, brick-column adobe structure, dry brick building, brick structure, timber structure, timber stack structure, timber framework, brick-masonry structure, frame structure, adobe structure, wood-column adobe structure, reinforced frame structure, mixed house, brick adobe structure, general houses (owned by citizens), stone-wood structure, stone structure, simple house, old Tibetan house, stone-grass structure, stone-concrete structure, alunite house, earth rock house, industrial plant, steel frame structure, soil tamper structure, cave dwelling, wooden frame house, flag stone house, and general house.
Structural damage was divided into five grades: collapse, heavy damage, moderate damage, slight damage, and basically undamaged. We chose the collapse ratio, the heavy-damage ratio, and population density as input parameters. First, deaths are almost always caused by the collapse and heavy damage of buildings. Second, population density is closely related to seismic casualties and to the number of buildings. We selected only the random forest algorithm to assess the importance of structure types because the mean accuracy of the RF model was higher than that of the other algorithms, as seen in Table 3. Table 4 presents the procedure of RF in feature assessment, and the importance of the structure types can be seen in Figure 4. A code sketch of this assessment is given below.
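A sketch of this assessment under the Table 4 settings follows; the input arrays are hypothetical, and note that recent scikit-learn versions spell the regression criterion 'squared_error' instead of 'mse'.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical inputs: per-earthquake collapse and heavy-damage ratios for each
# of the 43 structure types, plus population density; target: death toll.
X = np.load("structure_damage_ratios.npy")  # shape (n_events, 2 * 43 + 1)
y = np.load("deaths.npy")

rf = RandomForestRegressor(n_estimators=500, n_jobs=-1, max_features=None,
                           criterion="squared_error")  # 'mse' in older sklearn
rf.fit(X, y)
print(rf.feature_importances_)  # contribution of each structure type, as in Figure 4
```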

3. Model

We chose the population density, magnitude, focal depth, epicentral intensity, and time as the input parameters. The criteria and reasons for selecting five input parameters among ten features were as follows:
(1)
Features were screened in descending order of importance, taking into account their subjectivity, their frequency of occurrence, and how soon they can be obtained after an earthquake.
(2)
The results of the importance assessment show the relative contributions of the nine factors to the casualty assessment, rather than an absolute value for each factor alone. Time ranked seventh rather than higher in the importance assessment because we divided it into only two parts. In the deep learning model, we did not classify the time. We therefore chose time because it is available almost at the moment an earthquake occurs and involves no subjectivity in the deep learning model.
(3)
The division of economic status is highly subjective, and the data in the test set reflect the economic situation of the year in which the earthquake occurred. With annual inflation and the depreciation of the currency, the situation of that year may not be applicable to the future.
(4)
Although the date was more important than the time, it was also highly subjective, being divided into only four seasons.
(5)
The secondary disasters and abnormal intensity were divided only into yes and no. The combined number of these two phenomena was small, and not every earthquake was accompanied by them.
(6)
The structure types were too complex: the structures destroyed differ from earthquake to earthquake, and each structure's damage is divided into five grades.

3.1. Data

We collected 289 destructive earthquakes that occurred in mainland China from 1992 to 2017 from the Earthquake Disasters and Losses Assessment Report in Chinese Mainland (Table A1). The Excel data were pre-processed with the openpyxl module, yielding a data set of 228 earthquake cases without missing values. Of these, we selected 180 cases as the training set and 38 as the test set; the remaining ten were used as the validation set. Time was recorded in minutes, with hours converted to minutes and a day counted as 1440 min. A time of x:y is processed as (60x + y)/1440; for example, 04:32 becomes 0.19. The other parameters needed no special processing.
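A sketch of this pre-processing with openpyxl is shown below; the workbook name and column order are assumptions.

```python
from openpyxl import load_workbook

def minutes_fraction(hh, mm):
    # Encode a clock time as (60*hh + mm)/1440, the fraction of a day elapsed.
    return (60 * hh + mm) / 1440.0

wb = load_workbook("earthquake_cases.xlsx")  # hypothetical file
rows = []
for date, time_str, mag, inten, depth, density, deaths in wb.active.iter_rows(
        min_row=2, values_only=True):        # skip the header row
    if None in (time_str, mag, inten, depth, density, deaths):
        continue                             # drop cases with missing values
    hh, mm = map(int, str(time_str).split(":"))
    rows.append([minutes_fraction(hh, mm), mag, inten, depth, density, deaths])

print(round(minutes_fraction(4, 32), 2))  # 04:32 -> 0.19, as in the text
```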

3.2. Deep Learning Model

3.2.1. Hyperparameters

A deep learning model is a multilayer stack of simple modules, many of which compute non-linear input-output mappings. With 3 to 18 hidden layers, a deep learning model can implement extremely complex functions of the original variables that are sensitive to minute details [23]. Given the small data set in this study, we chose a four-layer back propagation network. The number of neurons in the hidden layers was optimized to obtain accurate output [21]. To select the numbers of neurons in the two hidden layers, training began with ten and three neurons, respectively, and was repeated with more neurons. The numbers of neurons in the hidden layers were finally set to 40 and 5, respectively, to limit the complexity and running time of the model.
Different methods exist to avoid overfitting during training and to obtain models that generalize well, for instance, early stopping, regularization, dropout, and data set expansion. Early stopping and data set expansion are not applicable to this model. The dropout method randomly deletes half of the hidden-layer nodes and is therefore also not applicable. Finally, the regularization method was chosen. Regularization in deep learning includes the L1 and L2 penalties; we chose the most commonly used, L2 regularization, which adds an additional regularization term to the loss function, as presented in Equation (4):
c = c0 + (λ/(2n)) Σ w²        (4)
where c and c0 are the new and old loss functions, respectively, λ is the L2 regularization rate, n is the number of samples, and w is a weight. Through a large number of tests, the optimal parameters in Table 5 were obtained. Figure 5 shows the model: an input layer of five neurons, two hidden layers of forty and five neurons, respectively, and one output layer.
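A minimal sketch of this network and its L2-regularized loss (Equation (4)) in the TensorFlow 1.x style of the time is shown below; the layer sizes follow Figure 5 and the regularization rate follows Table 5, while the initializer scale is an assumption.

```python
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

REG_RATE = 0.00005                            # lambda in Equation (4), Table 5

x = tf.placeholder(tf.float32, [None, 5])     # five input features
t = tf.placeholder(tf.float32, [None, 1])     # true death toll

w1 = tf.Variable(tf.truncated_normal([5, 40], stddev=0.1))
b1 = tf.Variable(tf.zeros([40]))
h1 = tf.nn.relu(tf.matmul(x, w1) + b1)        # first hidden layer: relu

w2 = tf.Variable(tf.truncated_normal([40, 5], stddev=0.1))
b2 = tf.Variable(tf.zeros([5]))
h2 = tf.matmul(h1, w2) + b2                   # second hidden layer: linear

w3 = tf.Variable(tf.truncated_normal([5, 1], stddev=0.1))
b3 = tf.Variable(tf.zeros([1]))
y = tf.matmul(h2, w3) + b3                    # output layer: linear

mse = tf.reduce_mean(tf.square(y - t))                           # Equation (5)
l2 = tf.nn.l2_loss(w1) + tf.nn.l2_loss(w2) + tf.nn.l2_loss(w3)   # sum(w^2)/2
loss = mse + REG_RATE * l2                                       # Equation (4)
```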

3.2.2. Optimization Algorithm

The optimization algorithms used in the model include adaptive moment estimation (Adam) [28], mini-batch gradient descent [29], and the moving average model [30]. We used the Adam algorithm to update the learning rate independently for each parameter by calculating the first and second moment estimates of the gradients. Mini-batching reduces randomness, because the data in a pool jointly determine the direction of the gradient, reducing the possibility of deviation during descent [31]. The model ran on a well-equipped workstation and the data volume was under 2000 cases. The size of the training pool is usually chosen as a power of 2, and we chose 16 samples per training pool through experiments. The moving average model prevents sudden jumps when updating parameters [32].
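Continuing the network sketch from Section 3.2.1, a training step combining Adam, mini-batches of 16, learning-rate decay, and the moving average (rates from Table 5) might look as follows; the decay interval and the arrays X_train and T_train are assumptions.

```python
# Continuation of the network sketch above; X_train and T_train are hypothetical
# NumPy arrays holding the standardized training features and death tolls.
global_step = tf.Variable(0, trainable=False)
lr = tf.train.exponential_decay(0.8, global_step,   # learning rate base 0.8
                                decay_steps=1000,   # assumed decay interval
                                decay_rate=0.99)    # learning rate decay 0.99
train_op = tf.train.AdamOptimizer(lr).minimize(loss, global_step=global_step)

ema = tf.train.ExponentialMovingAverage(0.99, global_step)  # moving average decay
train_step = tf.group(train_op, ema.apply(tf.trainable_variables()))

BATCH = 16
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(500000):                      # training steps, Table 5
        i = (step * BATCH) % len(X_train)
        sess.run(train_step, feed_dict={x: X_train[i:i + BATCH],
                                        t: T_train[i:i + BATCH]})
```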

3.2.3. Process of the Model

The structure definition and forward propagation used the TensorFlow framework [30]. The deep learning network's parameters required only the definition of the weights w and biases b, and the normal distribution was chosen as the weight-generating function. The relu activation function [23] was used in the first hidden layer; the second hidden layer and the output layer are essentially linear regressions. We selected the Adam algorithm for back propagation, with an optimized learning rate. The mean square error loss function was selected because the data set was small (Equation (5)):
c = (1/n) Σ (y − t)²        (5)
where c is the loss function, n is the number of samples, y is the predicted value, and t is the true value.

4. Results

We chose ten seismic cases with different parameters as the test data, of which the eighth, ninth, and tenth cases were the Ludian earthquake of 2014, the Yushu earthquake of 2010, and the Wenchuan earthquake of 2008. The selection of the test sets is shown in Table 6. We compared the model with the intensity-based mortality estimation method of the Assessment of Earthquake Disaster Situation in Emergency Period [33] (the China-National Standard), presented in Equations (6) and (7). The Standard is an important basis for the government's decision-making on earthquake relief. After an earthquake, the primary tasks of emergency work are to investigate and assess the disaster in a short period of time and to provide the government with timely and necessary information on the situation. Relevant departments therefore formulated the Standard as a comprehensive summary of the research methods and field practice of earthquake disaster assessment in China, with a focus on rapid and dynamic disaster assessment.
ND = Σ_{j=6..Imax} Aj ρ Rj        (6)
ln Rj = −44.365 + 7.516 Ij − 0.329 Ij²        (7)
where ND is the number of deaths, Imax is the intensity of the meizoseismal area, Aj is the distribution area of intensity j, ρ is the population density, Rj is the death rate corresponding to intensity j, and Ij is the intensity value j.
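For illustration, Equations (6) and (7) can be evaluated directly; in the sketch below the intensity-area map and population density are hypothetical inputs, not values from the paper.

```python
import math

def standard_deaths(areas_by_intensity, pop_density):
    # China-National Standard estimate, Equations (6) and (7).
    # areas_by_intensity: {intensity j: area Aj in km^2} for j >= 6;
    # pop_density: people per km^2 (rho).
    total = 0.0
    for j, area in areas_by_intensity.items():
        ln_rj = -44.365 + 7.516 * j - 0.329 * j ** 2   # Equation (7)
        total += area * pop_density * math.exp(ln_rj)  # Equation (6)
    return total

# Hypothetical intensity map: 500 km^2 of VI, 120 km^2 of VII, 30 km^2 of VIII.
print(standard_deaths({6: 500, 7: 120, 8: 30}, pop_density=100))
```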
Table 7 and Figure 6 show the comparison of the deep learning model, the China-National Standard, and the true values. Table 8 presents the accuracy of the two methods in cases 1 to 7 and cases 8 to 10 separately. The results demonstrate the following:
(1)
The Adam algorithm and the moving average model in the deep learning framework accelerate convergence and improve the accuracy of the model's predictions.
(2)
The deep learning model achieved higher accuracy and considered more factors than the Standard.
(3)
The prediction accuracy for the last three earthquake cases (8–10) is generally lower.
(4)
The China-National Standard model uses fewer parameters. It was built on the experience of many experts and has been tested for many years. In cases 1–7, its predicted values and the true values were of the same order of magnitude, but its performance in cases 8–10 was far worse than the deep learning model: the predictions were far from the true values and of little reference value for actual earthquakes.

5. Discussion

Three issues were considered in this study: the importance of nine features to human losses in mainland China, the importance of 43 structure types to fatalities, and human loss assessment. The first issue was addressed by adopting three traditional machine learning methods to investigate the influence of different features on seismic fatalities; the results can provide a basis for further seismic assessment. The second issue was addressed with the RF algorithm to assess the importance of the 43 structure types. The third issue was addressed by establishing a deep learning model to predict seismic deaths. A detailed investigation of these issues is presented below.

5.1. Importance of Different Features

We propose an automatic classifier based on RF, CART, and AdaBoost, with several attributes to describe the importance of the seismic features. Random forest is the most stable of the three methods; Figure 7 exhibits the results of three runs of each method. RF also provides probability estimates on the classification that are useful for accepting or rejecting a new classification [8]. We therefore chose the RF algorithm as the main method. The analysis fully demonstrated the importance of population density, magnitude, focal depth, and epicentral intensity: the summed importance of these four features is 74.68%, far more than the other factors.
Time is vital to seismic casualties; for example, the Tangshan earthquake occurred at 3:42 a.m. on July 28, 1976, when people were asleep indoors, and the timing aggravated the disaster. Losses are relatively mitigated in the daytime, when most people are awake and can escape quickly. Time ranks seventh in this study because it was divided into only two classes, daytime and sleeping time; the more classes an attribute discriminates, the more important the attribute appears [8]. In the same way, the importance of abnormal intensity and secondary disasters is only 2.63% and 1.18%, respectively, because of their few classes. Time was modeled in levels rather than as an actual time, which interferes with the results, and the secondary disasters and abnormal intensity were treated as binary because they occur infrequently. In summary, the features selected by the RF experiments are consistent with those used by other scholars to study earthquake casualties (population density [2], magnitude [2], focal depth [34], epicentral intensity [16], and time [13]).

5.2. Importance of Different Structure Types

We assessed all 43 structure types in mainland China, far more than the number of types in previous studies [35]. Although the structure types of the WHE project suit most regions of the world, the project lacks some characteristically Chinese structures, such as the national civil structure, the old Tibetan house, and the national brick-wood structure. Hence, the per-earthquake data from the Earthquake Disasters and Losses Assessment Report in Chinese Mainland are more suitable for this study than the WHE project data. The HAZUS (Hazards U.S. Multi-Hazard) system has only twelve structure types and estimates casualties for collapsed buildings but not for heavily damaged buildings [18]. In reference [18], casualties are estimated at the level of single buildings, not for an entire zone, in an adapted HAZUS system. Population density varies greatly across China, so neither the HAZUS system nor the adapted system is suitable for casualty estimation in China.
The importance of the different structural types was first obtained with the RF method. During data collection, structural forms were not merged, and some may partially overlap, so that no structure type would be omitted, for instance, brick-column civil structure, brick-concrete structure (building of two or more floors), and brick-concrete structure (building of only one floor). However, even when the overlapping structures were combined, their importance remained close to zero.
Figure 4 shows that the contribution of reinforced concrete structures to casualties is the largest, followed by the civil structure, stone-concrete structure, brick-concrete structure, brick-wood structure, brick structure, and frame structure. The other structures are not shown in Figure 4 and can be neglected. We conclude that a structure with a large contribution is not necessarily poor in seismic behavior; rather, the result indicates that once such a building is destroyed, the resulting harm is greater.

5.3. Human Losses Assessment Model

We proposed a rapid prognostic human-loss assessment model based on the feature engineering above. The results indicate that the predictions for large earthquakes of magnitude 6.5 or more are lower than the others. For example, the actual death toll of the Wenchuan earthquake was 69,227, but the estimate of the deep learning model is 37,406, that of the PAGER model is 50,000, and that of another empirical model [20] is 30,000. Thus, by the traditional accuracy calculation (Equation (8)), the accuracy of casualty assessment for a large destructive earthquake is not high. The reason may be that more factors affect large earthquakes than small ones [2], and their uncertainty is greater [34]. For example, the mountainous area of Ludian County covers as much as 87.9% of the county, and the high incidence of secondary disasters (e.g., debris flow and landslide) caused a large number of casualties. The accuracy of general prediction models is reduced when geological conditions are neglected [12].
accuracy = (1 − |true value − estimated value|/true value) × 100%        (8)
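As a quick check, applying Equation (8) to the Wenchuan figures quoted above (deep learning estimate 37,406 against the true toll of 69,227) gives an accuracy of about 54%:

```python
def accuracy(true_value, estimate):
    # Equation (8): relative accuracy as a percentage.
    return (1 - abs(true_value - estimate) / true_value) * 100

print(accuracy(69227, 37406))  # ~54.0 for the Wenchuan case
```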
Geological conditions differ across China. The areas with the most earthquakes are Sichuan, Yunnan, and Xinjiang Provinces. From the earthquake cases and the results of the deep learning model, we draw the following conclusions: (1) basin regions with soft rock and soil can aggravate earthquake disasters in Yunnan and Sichuan Provinces, as in the Ludian earthquake of 2014 and the Wenchuan earthquake of 2008; (2) in an earthquake, casualties are more severe in the fault-zone areas of Xinjiang and Yunnan Provinces, as in the Jinghe earthquake of 2017, the Hutubi earthquake of 2016, and the Ninger earthquake of 2007; and (3) Qinghai and Sichuan Provinces are adjacent and geologically similar, and areas with crumbly strata have higher seismic vulnerability, as in the Yushu earthquake of 2010. Above all, the error of the deep learning model comes mostly from geological factors.
Besides geological conditions, there are other important reasons for the difference in accuracy between cases 1–7 and cases 8–10. Earthquake casualties come from both direct and secondary disasters, and the latter are harder to assess. However, there is currently no established method to separate the human losses caused by direct disasters from those caused by secondary ones after an earthquake. Therefore, although the secondary disaster feature was evaluated in the importance assessment, it was not selected as an input variable. For destructive earthquakes such as cases 8–10, the types of secondary disasters are more abundant, so their impact is greater. Moreover, the traffic network is crucial for post-earthquake rescue. Small earthquakes with magnitudes below 6.5, such as cases 1–7, usually cause limited damage to the traffic network and rarely block traffic completely. In a large earthquake with a magnitude above 6.5, such as cases 8–10, the traffic network is usually interrupted, which seriously hinders rescue and greatly increases the number of deaths. That this study did not consider traffic network damage is another main reason for the lower accuracy in cases 8–10.
The purpose of this study is to assess human losses rapidly, so the input features must be obtainable within a short time after an earthquake. In reference [21], the training set came only from the Bam earthquake of 2003, so the results were not representative. Compared with empirical methods [36], our data set is larger [4], covering almost all cases from 1992 to 2017 without missing values. The accuracy of the results is higher, the data set is larger, and more factors are considered than in the China-National Standard [33]. This shows that, unlike statistical methods, the deep learning technique can estimate casualties without any assumptions, processing the data directly through its internal functions.

6. Conclusions and Future Works

6.1. Conclusions

We proposed a method to assess the importance of the nine factors affecting earthquake casualties and ranked each feature's importance with the random forest algorithm. The 43 structure types were evaluated by the same method, and the contribution of different structural forms to the death toll was obtained, providing a basis for future construction choices. Based on the above importance evaluation, we reached the following conclusions:
(1)
The feature-importance results fully demonstrate the contributions of population density, magnitude, focal depth, and epicentral intensity to the death toll. The importance of time is lower than that of date and economic status because time was divided into only two parts.
(2)
The random forest algorithm performs better than the AdaBoost and CART algorithms in both stability and accuracy.
(3)
Reinforced concrete structure, national civil structure, and civil structure contribute the most to the death toll. The contributions of stone-concrete structure, brick-wood structure, brick structure, and frame structure are small; those of the other structures are even smaller and cannot be displayed in the figure.
A deep learning model for estimating human losses was then established on the results of the random forest algorithm. We selected five important features and compared the results with the China-National Standard, since the Standard is designed for rapid assessment. The results demonstrate that the accuracy is higher than that of the other methods and the running time is suitable for emergency rescue work. Therefore, this method can be used to evaluate fatalities in future earthquakes in mainland China, serving the China Earthquake Administration and the Chinese government.

6.2. Extensions of the Work

Further extensive studies are needed, and some recommendations for future research follow. First, this research is based on the evaluation of factor importance; future studies can extend it by adding more factors. Second, this paper estimates the importance of different structures to the death toll with the random forest algorithm; future studies could apply sparse learning to preprocess the data so the classifier obtains better results. Third, the deep learning model assesses human losses with several optimization algorithms; future studies can add hidden layers and continue to optimize the algorithms. Death prediction will be of great interest in future work.

Author Contributions

Conceptualization, H.J. and J.L. (Junqi Lin); methodology, H.J.; software, H.J.; validation, all three authors; formal analysis, H.J.; investigation, H.J.; resources, J.L. (Junqi Lin); data curation, J.L. (Junqi Lin); writing—original draft preparation, H.J.; writing—review and editing, H.J.; visualization, H.J.; supervision, J.L. (Junqi Lin) and J.L. (Jinlong Liu); project administration, J.L. (Junqi Lin) and J.L. (Jinlong Liu); funding acquisition, J.L. (Junqi Lin) and J.L. (Jinlong Liu).

Funding

This research was funded by NATIONAL KEY R&D PROGRAM OF CHINA, grant number 2018YFC1504503.

Acknowledgments

We would like to thank Chen Zhao for guidance and help with applying machine learning algorithms.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Table A1. The 289 earthquake cases (including those with missing values) from 1992 to 2017 in mainland China used for the deep learning model.
No. | Date | Time | Magnitude (Ms) | Epicentral Intensity | Focal Depth (km) | Population Density (people per square kilometer) | Deaths
11992/4/2311:32 PM6.9764.984
21992/12/187:21 PM5.46311.111
31993/1/274:32 AM6.381460.680
41993/2/13:33 AM5.361035.630
51993/3/2010:52 PM6.6811.762
61993/5/247:57 AM5621.280
71993/5/303:26 PM4.9610122.380
81993/7/175:46 PM5.661611.061
91993/8/1410:30 PM5.678105.600
101993/9/54:22AM5.164032.330
111993/12/14:37 AM672894.262
121994/1/118:51 AM6.771016.450
131994/9/162:20 PM7.3720700.004
141994/9/1911:28 PM5.26197.350
151995/2/188:14 AM5.16101.510
161995/4/2512:13 AM5.672062.420
171995/7/125:46 AM7.381055.5011
181995/7/226:44 AM5.881080.3112
191995/9/2011:14 AM5.26120
201995/9/2612:39 PM5.362711.620
211995/10/66:26 AM4.9610945.950
221995/10/246:46 AM6.5912.581.9459
231996/2/37:14 PM791057.44309
241996/2/287:21 PM5.4715412.941
251996/3/1911:00 PM6.98113.3824
261996/5/311:32 AM6.4824183.1926
271996/6/18:49 PM5.461091.970
281996/7/23:05 PM5.2610125.552
291996/9/253:24 AM5.771542.561
301996/12/214:39 PM5.575.162
311997/1/219:47 AM6.48180.5912
321997/1/2510:38 AM5.16103.030
331997/1/305:59 PM5.571036.820
341997/3/12:04 PM67131.981
351997/4/111:34 PM6.681758.788
361997/5/312:51 PM5.2676.280
371997/8/134:13 PM5.377.9545.090
381997/9/2611:19 AM4.264.8314.600
391997/10/238:28 PM5.3610587.570
401997/11/310:29 AM5.67140.93
411998/1/1011:50 AM6.281082.1149
421998/3/199:51 PM67116.750
431998/4/1410:47 AM4.7610.70.000
441998/5/295:11 AM6.271119.090
451998/6/252:39 PM5.2511926.691
461998/7/117:04 PM5615.517.030
471998/7/209:05 AM6.17143.59
481998/7/2812:51 PM5.561125.280
491998/8/275:03 PM6.681142.093
501998/9/1811:53 AM4.8615361.840
511998/10/28:49 PM5.371383.460
521998/11/197:38 PM6.281040.226
531998/12/13:37 PM5.1710322.070
541999/3/119:18 PM5.6711116.080
551999/3/156:42 PM5.661116.930
561999/4/152:29 PM4.76888.601
571999/5/1711:29 AM5.2620471.600
581999/6/1711:02 PM5.351125.280
591999/8/176:47 PM57125413.560
601999/9/148:54 PM56991.670
611999/9/277:49 PM5.162014.140
621999/11/19:24 PM5.679271.870
631999/11/2512:40 AM5.2610314.931
641999/11/264:51 AM561315.070
651999/11/2912:10 PM5.671527.03
661999/11/304:24 PM5626651.861
672000/1/157:37 AM6.5630123.047
682000/1/274:55 AM5.5710115.120
692000/4/155:32 PM5.37131.190
702000/4/2911:54 AM4.761610.331
712000/6/66:59 PM5.9815247.060
722000/8/219:25 PM5.16887.612
732000/9/128:27 AM6.68134.550
742000/10/68:05 PM5.8618147.330
752001/2/238:09 AM64.703
762001/3/124:57 PM5610103.060
772001/3/247:23 AM553.630
782001/4/1211:13 AM5.986196.152
792001/5/245:10 AM5.87539.071
802001/5/245:10 AM5.87599.640
812001/6/82:03 AM5.365208.091
822001/6/2310:48 AM4.9615.44736.800
832001/7/107:51 AM5.3613125.500
842001/7/115:41 AM5.36108.320
852001/7/152:36 AM5.168337.430
862001/9/412:05 PM568102.910
872001/10/271:35 PM6715138.411
882001/11/145:26 PM8.1100
892002/8/87:42 PM5.377.630
902002/9/512:18 PM46185.700
912002/10/2011:46 PM561519.150
922002/12/149:27 PM5.97157.352
932002/12/258:57 PM5.76205.280
942003/1/47:07 PM5.4617124.750
952003/2/141:34 AM5.46250
962003/2/2410:03 AM6.8925.230.67268
972003/4/178:48 AM6.68143.840
982003/4/246:37 AM4.561256.530
992003/5/411:44 PM5.8726.749.291
1002003/6/1710:46 PM4.855172.200
1012003/7/109:54 AM4.8610241.690
1022003/7/2111:16 PM6.286101.9116
1032003/8/166:58 PM5.981536.464
1042003/8/185:03 PM5.7793.082
1052003/8/2110:16 AM56102287.920
1062003/9/27:16 AM5.96101.770
1072003/9/277:33 PM7.911155.480
1082003/10/166:28 PM6.18584.303
1092003/10/258:41 PM6.181858.8710
1102003/11/1310:35 AM5.2812165.951
1112003/11/152:49 AM5.1710415.914
1122003/11/251:40 PM4.962027.870
1132003/11/269:38 PM578411.220
1142003/12/19:38 AM6.181820.6810
1152004/3/79:29 PM5.66152.430
1162004/3/249:53 AM5.97302.911
1172004/5/117:27 AM5.9202.340
1182004/6/175:25 AM4.765322.131
1192004/7/127:08 AM6.7330.450
1202004/8/106:26 PM5.6810353.504
1212004/8/246:05 PM5.8251.110
1222004/9/78:15 PM5733105.801
1232004/9/172:31 AM4.9612295.740
1242004/10/196:11 AM566899.160
1252004/12/263:30 PM56772.441
1262005/1/56:05 AM4.7656.320
1272005/1/2612:30 AM56669.300
1282005/2/157:38 AM6.273238.810
1292005/4/84:04 AM6.561021.140
1302005/6/24:06 AM5.966.370
1312005/7/2511:43 PM5.161588.851
1322005/8/510:14 PM5.3621178.520
1332005/8/1312:58 PM5.3615143.400
1342005/8/265:08 AM5.262.270
1352005/10/277:18 PM4.4616152.501
1362005/11/268:49 AM5.7710638.1013
1372006/1/129:05 AM561683.140
1382006/3/273:20 AM4.36140.000
1392006/3/318:23 PM561554.310
1402006/6/2112:52 AM561593.701
1412006/7/411:56 AM5.1520437.740
1422006/7/195:53 PM5.67154.900
1432006/7/229:10 AM5.169169.8522
1442006/8/251:51 PM5.177194.472
1452006/11/2310:04 AM5.15170
1462007/3/1310:22 AM4.9662175.440
1472007/6/35:34 AM6.485103.633
1482007/6/234:17 PM5.861694.320
1492007/7/206:06 PM5.772527.430
1502008/2/15:06 AM4.666496.130
1512008/3/216:33 AM7.373314.390
1522008/3/218:36 PM5611157.980
1532008/3/2411:24 PM4.16101000.000
1542008/3/304:32 PM563326.150
1552008/4/209:14 PM5.162262.460
1562008/4/215:42 AM4.263326.150
1572008/5/122:28 PM8914238.3469227
1582008/6/102:05 PM5.26144.200
1592008/8/218:24 PM5.98778.785
1602008/8/304:30 PM6.1810131.5241
1612008/8/308:46 PM5.36252.850
1622008/10/511:52 PM6.88273.620
1632008/10/64:30 PM6.68810.1710
1642008/11/109:22 AM6.37109.250
1652008/11/224:01 PM4.168200.000
1662008/12/264:20 AM4.965399.360
1672009/1/259:47 AM56714.410
1682009/2/206:02 PM5.26615.220
1692009/4/1912:08 PM5.5678.450
1702009/4/225:26 PM5676.240
1712009/7/97:19 PM6810115.441
1722009/8/89:26 PM46111885.352
1732009/8/289:52 AM6.4782.360
1742009/11/25:07 AM5610117.180
1752010/1/175:37 PM3.476
1762010/1/315:36 AM57101661.601
1772010/2/229:32 PM4.261051.520
1782010/2/2512:56 PM5.1616105.260
1792010/4/49:46 PM4.55838.200
1802010/4/147:49 AM7.19148.692698
1812010/6/58:58 PM4.6555.761
1822010/6/102:38 PM5.1680.930
1832010/8/298:53 AM4.80
1842010/10/244:58 PM4.768714.980
1852011/1/19:56 AM5.110
1862011/1/87:34 AM5.6560
1872011/1/129:19 AM510
1882011/1/1912:07 PM4.8691749.160
1892011/2/14:16 PM5.37
1902011/2/153:18 PM5.110
1912011/3/1012:58 PM5.881084.5525
1922011/3/204:00 PM5.230
1932011/3/249:55 PM7.262058.670
1942011/4/105:02 PM5.37710.960
1952011/4/169:11 AM6130
1962011/4/304:35 PM560
1972011/5/1011:41 PM6.1560
1982011/5/229:34 AM5.210
1992011/6/89:53 AM5.3650.020
2002011/6/206:16 PM5.2610139.080
2012011/6/263:48 PM5.210235.260
2022011/7/253:05 AM5.26106.020
2032011/8/23:40 AM5.110
2042011/8/97:50 PM5.211134.750
2052011/8/116:06 PM5.87819.710
2062011/9/1511:27 PM5.5661.750
2072011/9/188:40 PM6.87203.517
2082011/10/169:44 PM56411.290
2092011/10/3011:23 AM5.7223
2102011/11/15:58 AM5.420
2112011/11/18:21 AM672845.000
2122011/12/18:48 PM5.2610104.130
2132012/1/82:20 PM562724.520
2142012/3/96:50 AM6301.180
2152012/5/36:19 PM5.4787.780
2162012/6/155:51 AM5.46209.380
2172012/6/243:59 PM5.771152.394
2182012/6/305:07 AM6.681013.600
2192012/7/208:11 PM4.966640.631
2202012/8/126:47 PM6.27302.620
2212012/9/711:19 AM5.7814226.8381
2222012/11/261:33 PM5.56893.750
2232012/12/710:08 PM5.1697.290
2242013/1/188:42 PM5.471511.900
2252013/1/2312:18 PM5.167625.00
2262013/1/242:01 AM4.25
2272013/1/2912:38 AM6.172016.340
2282013/3/31:41 PM5.57968.040
2292013/3/1111:01 AM5.26842.240
2302013/3/291:01 PM5.66135.230
2312013/4/179:45 AM57968.870
2322013/4/208:02 AM7913116.88196
2332013/4/225:11 PM5.376106.992
2342013/7/227:45 AM6.6820139.9795
2352013/8/125:23 AM6.181013.04
2362013/8/284:44 AM5.18915.753
2372013/8/318:04 AM5.98103.573
2382013/11/236:04 AM5.579
2392013/12/14:34 PM5.36929.010
2402013/12/161:04 PM5.175147.910
2412014/2/125:19 PM7.39123.390
2422014/4/56:40 AM5.3613274.050
2432014/5/309:20 AM6.181287.12
2442014/8/34:30 PM6.5912216.67617
2452014/8/176:07 AM567325.71
2462014/10/19:23 AM561530.230
2472014/10/79:49 PM6.68548.251
2482014/10/251:20 PM4.2651611.890
2492014/11/224:55 PM6.381816.725
2502014/12/64:20 PM5.981048.251
2512015/1/102:50 PM55102.690
2522015/1/141:21 PM5614164.580
2532015/2/222:42 PM5614171.790
2542015/3/16:24 PM5.571170.810
2552015/3/146:14 AM4.36107625.502
2562015/3/309:47 AM5.577119.030
2572015/4/157:08 AM4.569144.311
2582015/4/153:39 PM5.87104.560
2592015/4/252:11 PM8.19204.3327
2602015/5/2212:05 AM4.6570
2612015/7/39:07 AM6.581044.833
2622015/10/307:26 PM5.161062.390
2632016/1/145:18 AM5.3658.220
2642016/1/211:13 AM6.48100.640
2652016/2/119:10 PM5683.810
2662016/3/1211:14 AM4.4650
2672016/5/119:15 AM5.576.90
2682016/5/1812:48 AM56157.490
2692016/5/225:08 PM4.666162.500
2702016/7/315:18 PM5.471021.980
2712016/8/1111:49 AM4.451013.150
2722016/9/231:23 AM5.161635.770
2732016/10/173:14 PM6.27921.831
2742016/11/2510:24 PM6.78100.651
2752016/12/81:15 PM6.2863.100
2762016/12/144:14 PM550
2772016/12/206:04 PM5.8790.730
2782016/12/278:17 AM4.86100
2792017/1/282:46 AM4.96110
2802017/2/87:11 PM4.97100
2812017/3/277:55 AM5.161258.290
2822017/5/41:40 PM4.96100
2832017/5/115:58 AM5.57816.648
2842017/6/167:48 PM4.37520.770
2852017/8/189:19 PM792028.7930
2862017/8/97:27 AM6.6811169.660
2872017/9/302:14 PM5.46133.620
2882017/11/186:34 AM6.98100
2892017/11/235:43 PM56100

References

  1. Erdik, M.; Şeşetyan, K.; Demircioǧlu, M.B.; Zülfikar, C.; Hancilar, U.; Tüzün, C.; Harmandar, E. Rapid Earthquake Loss Assessment After Damaging Earthquakes. Soil Dyn. Earthq. Eng. 2011, 31, 247–266.
  2. Samardjieva, E.; Badal, J. Estimation of the Expected Number of Casualties Caused by Strong Earthquakes. Bull. Seismol. Soc. Am. 2002, 92, 2310–2322.
  3. Chen, W.; Sun, Z.; Han, J. Landslide Susceptibility Modeling Using Integrated Ensemble Weights of Evidence with Logistic Regression and Random Forest Models. Appl. Sci. 2019, 9, 171.
  4. Chen, Q.F.; Mi, H.; Huang, J. A simplified approach to earthquake risk in mainland China. Pure Appl. Geophys. 2005, 162, 1255–1269.
  5. Chen, W.; Shirzadi, A.; Shahabi, H.; Ahmad, B.B.; Zhang, S.; Hong, H.; Zhang, N. A novel hybrid artificial intelligence approach based on the rotation forest ensemble and naïve Bayes tree classifiers for a landslide susceptibility assessment in Langao County, China. Geomat. Nat. Hazards Risk 2017, 8, 1955–1977.
  6. Hu, H.Y.; Lee, Y.C.; Yen, T.M.; Tsai, C.H. Using BPNN and DEMATEL to modify importance-performance analysis model—A study of the computer industry. Expert Syst. Appl. 2009, 36, 9969–9979.
  7. Park, S.; Kim, J. Landslide Susceptibility Mapping Based on Random Forest and Boosted Regression Tree Models, and a Comparison of Their Performance. Appl. Sci. 2019, 9, 942.
  8. Provost, F.; Hibert, C.; Malet, J.P. Automatic classification of endogenous landslide seismicity using the Random Forest supervised classifier. Geophys. Res. Lett. 2017, 44, 113–120.
  9. Sung, A.H.; Mukkamala, S. Identifying Important Features for Intrusion Detection Using Support Vector Machines and Neural Networks. In Proceedings of the 2003 Symposium on Applications and the Internet, Orlando, FL, USA, 28 February 2003; pp. 3–10.
  10. Altmann, A.; Toloşi, L.; Sander, O.; Lengauer, T. Permutation importance: A corrected feature importance measure. Bioinformatics 2010, 26, 1340–1347.
  11. Karimzadeh, S.; Miyajima, M.; Hassanzadeh, R.; Amiraslanzadeh, R.; Kamel, B. A GIS-based seismic hazard, building vulnerability and human loss assessment for the earthquake scenario in Tabriz. Soil Dyn. Earthq. Eng. 2014, 66, 263–280.
  12. Wilson, B.; Paradise, T. Assessing the impact of Syrian refugees on earthquake fatality estimations in southeast Turkey. Nat. Hazards Earth Syst. Sci. 2018, 18, 257–269.
  13. Maqsood, S.T.; Schwarz, J. Estimation of Human Casualties from Earthquakes in Pakistan—An Engineering Approach. Seismol. Res. Lett. 2011, 82, 32–41.
  14. Jaiswal, K.; Wald, D. An empirical model for global earthquake fatality estimation. Earthq. Spectra 2010, 26, 1017–1037.
  15. So, E.; Spence, R. Estimating shaking-induced casualties and building damage for global earthquake events: A proposed modelling approach. Bull. Earthq. Eng. 2013, 11, 347–363.
  16. Hashemi, M.; Alesheikh, A.A. A GIS-based earthquake damage assessment and settlement methodology. Soil Dyn. Earthq. Eng. 2011, 31, 1607–1617.
  17. Earle, P.S.; Wald, D.J.; Allen, T.I.; Jaiswal, K.S.; Porter, K.A.; Hearne, M.G. Rapid Exposure and Loss Estimates for the May 12, 2008 Mw 7.9 Wenchuan Earthquake Provided by the U.S. Geological Survey's PAGER System. In Proceedings of the 14th World Conference on Earthquake Engineering, Beijing, China, 12–17 October 2008.
  18. Schweier, C. Geometry Based Estimation of Trapped Victims After Earthquakes. Int. Symp. Strong Vrancea Earthq. Risk Mitig. 2007, 9, 4–6.
  19. Goretti, A.; Bramerini, F.; Di Pasquale, G.; Dolce, M.; Lagomarsino, S.; Parodi, S.; Iervolino, I.; Verderame, G.M.; Bernardini, A.; Penna, A.; et al. The Italian Contribution to the USGS PAGER Project. In Proceedings of the 14th World Conference on Earthquake Engineering, Beijing, China, 12–17 October 2008.
  20. Jaiswal, K.; Wald, D.J.; Hearne, M. Estimating Casualties for Large Earthquakes Worldwide Using an Empirical Approach; Open-File Rep. 2009-1136; U.S. Geological Survey: Reston, VA, USA, 2009; p. 78.
  21. Aghamohammadi, H.; Mesgari, M.S.; Mansourian, A.; Molaei, D. Seismic human loss estimation for an earthquake disaster using neural network. Int. J. Environ. Sci. Technol. 2013, 10, 931–939.
  22. Wang, F.; Niu, L. An Improved BP Neural Network in Internet of Things Data Classification Application Research. In Proceedings of the 2016 IEEE Information Technology, Networking, Electronic and Automation Control Conference (ITNEC 2016), Chongqing, China, 20–22 May 2016; pp. 805–808.
  23. LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444.
  24. Xu, L.; Cui, Y.; Song, Y.; Ma, X. The Application of an Improved BPNN Model to Coal Pyrolysis. Energy Sources Part A Recovery Util. Environ. Eff. 2015, 37, 1805–1812.
  25. Chand, J.; Singh Chauhan, A.; Kumar Shrivastava, A. Review on Classification of Web Log Data using CART Algorithm. Int. J. Comput. Appl. 2013, 80, 41–43.
  26. Tian, H.X.; Mao, Z.Z. An Ensemble ELM Based on Modified AdaBoost.RT Algorithm for Predicting the Temperature of Molten Steel in Ladle Furnace. IEEE Trans. Autom. Sci. Eng. 2009, 7, 73–80.
  27. Nadim, F.; Andresen, A.; Bolourchi, M.J.; Mokhtari, M.; Tvedt, E.; Moghtaderi-Zadeh, M.; Lindholm, C.; Remseth, S. The Bam Earthquake of 26 December 2003. Bull. Earthq. Eng. 2005, 2, 119–153.
  28. Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. In Proceedings of the 3rd International Conference on Learning Representations, San Diego, CA, USA, 7–9 May 2015; pp. 1–15.
  29. Gimpel, K.; Das, D.; Smith, N.A. Distributed Asynchronous Online Learning for Natural Language Processing. In Proceedings of the 14th Conference on Computational Natural Language Learning (CoNLL 2010), Uppsala, Sweden, 15–16 July 2010; pp. 213–222.
  30. Abadi, M.; Barham, P.; Chen, J.; Chen, Z.; Davis, A.; Dean, J.; Devin, M.; Ghemawat, S.; Irving, G.; Isard, M.; et al. TensorFlow: A system for large-scale machine learning. In Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI '16), Savannah, GA, USA, 2–4 November 2016.
  31. Danner, G.; Jelasity, M. Fully Distributed Privacy Preserving Mini-Batch Gradient Descent Learning. In Proceedings of the 15th IFIP International Conference on Distributed Applications and Interoperable Systems, Grenoble, France, 2–4 June 2015; pp. 30–44.
  32. Said, E.S.; Dickey, D.A. Testing for unit roots in autoregressive-moving average models of unknown order. Biometrika 1984, 71, 599–607.
  33. China Earthquake Administration. Assessment of Earthquake Disaster Situation in Emergency Period; GB/T 30352-2013; China Standard Press: Beijing, China, 2014.
  34. Wyss, M. Human losses expected in Himalayan earthquakes. Nat. Hazards 2005, 34, 305–314.
  35. Nichols, J.M.; Beavers, J.E. Development and Calibration of an Earthquake Fatality Function. Earthq. Spectra 2003, 19, 605–633.
  36. Wyss, M.; Zuñiga, F.R. Estimated casualties in a possible great earthquake along the Pacific Coast of Mexico. Bull. Seismol. Soc. Am. 2016, 106, 1867–1874.
Figure 1. Flowchart for assessing human losses with machine learning methods. CART: classification and regression tree.
Figure 2. Simple example showing that AdaBoost can construct a strong classifier from a set of weak classifiers.
Figure 3. Comparison of the relative importance of the input variables under the RF, CART, and AdaBoost algorithms. For clarity and readability, the bar chart was drawn in SigmaPlot.
Figure 4. Importance of structure types. (Structure types that contribute little to casualties are not shown; the importance of general houses and of the structure types below it is close to zero.)
Figure 5. Scheme of the deep learning network.
Figure 6. The numbers of deaths predicted by the deep learning model and the national standard on the test data, together with the errors of the two models relative to the actual death toll. (The results of earthquake cases 8–10 are not shown because their large differences would obscure the other cases.)
Figure 7. Results of three tests of the Random Forest (a), AdaBoost (b), and CART (c) algorithms. The horizontal axis gives the features' initials: PD (population density), M (magnitude), FD (focal depth), EI (epicentral intensity), ES (economic status), D (date), T (time), AI (abnormal intensity), and SD (secondary disaster).
Table 1. Earthquake disaster losses in mainland China from 1992 to 2017, including the numbers of deaths and injured people and the economic costs. The unit of the economic loss is the Chinese Renminbi Yuan (CNY).
Year | Deaths (people) | Injured (people) | Economic Costs (CNY)
1992 | 5 | 480 | 1.60 × 10^8
1993 | 9 | 381 | 2.84 × 10^8
1994 | 4 | 1378 | 3.29 × 10^8
1995 | 85 | 15024 | 1.16 × 10^9
1996 | 365 | 17956 | 4.60 × 10^9
1997 | 21 | 150 | 1.25 × 10^9
1998 | 59 | 13631 | 1.84 × 10^9
1999 | 3 | 137 | 4.74 × 10^8
2000 | 10 | 2977 | 1.47 × 10^9
2001 | 9 | 741 | 1.48 × 10^9
2002 | 2 | 360 | 1.48 × 10^8
2003 | 319 | 7147 | 4.66 × 10^9
2004 | 8 | 688 | 5.78 × 10^8
2005 | 15 | 867 | 2.63 × 10^9
2006 | 25 | 204 | 8.00 × 10^8
2007 | 3 | 419 | 2.02 × 10^9
2008 | 69283 | 377010 | 8.59 × 10^11
2009 | 3 | 404 | 2.74 × 10^9
2010 | 2705 | 11088 | 2.36 × 10^10
2011 | 32 | 506 | 6.01 × 10^9
2012 | 86 | 1331 | 8.29 × 10^9
2013 | 294 | 15671 | 9.95 × 10^10
2014 | 624 | 3688 | 3.56 × 10^10
2015 | 33 | 1217 | 1.80 × 10^10
2016 | 2 | 103 | 6.68 × 10^9
2017 | 37 | 638 | 1.48 × 10^10
Table 2. Parameters of random forest (RF), CART, and AdaBoost. (Unset parameters were left at their sklearn defaults. AdaBoost has two classification algorithms, SAMME and SAMME.R, of which SAMME.R, based on class probabilities, is better.)
Random Forest: number of estimators = 82; number of jobs = −1; max features = none
CART: criterion = gini; max depth = 10; max features = none; number of estimators = 500; criterion = gini
AdaBoost: base estimator = decision tree classifier; algorithm = SAMME.R; learning rate = 0.5; number of estimators = 379
Table 3. Results of testing the models on an unseen validation set of the 9 features.
Algorithm | Random Forest | CART | AdaBoost
Mean accuracy | 0.820 | 0.745 | 0.766
Table 4. Steps of the random forest algorithm.
Step by Step Procedure of Random Forests Algorithm
Inputs: a, b, c, d
a: Damage ratio of collapse of different structures
b: Damage ratio of heavy damage of different structures
c: Population density
d: Death
Parameters:
number of estimators = 500, criterion = gini, numbers of jobs = -1, max features = None, criterion = MSE, max depth = None, max leaf nodes = None, min impurity decrease = 0.0, min impurity split = None, min samples leaf = 1, min samples split = 2, min weight fraction leaf = 0.0, presort = False, random state = None, splitter = best
Process:
step1: Bootstrap sampling is used to extract sub-training sets from the training set.
step2: Generate the feature subsets by randomly selecting features before node splits.
step3: Establish decision trees
step4: Obtain the results for the sample to be tested
step5: Vote on the individual results to obtain the final output.
Output: Importance of structure types
Table 5. Hyperparameters.
Batch Size | Learning Rate Base | Learning Rate Decay | Regularization Rate | Training Steps | Moving Average Decay
16 | 0.8 | 0.99 | 0.00005 | 500000 | 0.99
Table 6. Test sets.
Number | Time | Magnitude (Ms) | Epicentral Intensity (degree) | Focal Depth (km) | Population Density (people/km²)
1 | 0.33 | 5.1 | 6 | 12 | 58.28519
2 | 0.84 | 5.0 | 7 | 33 | 105.7979
3 | 0.25 | 5.5 | 7 | 8 | 16.63824
4 | 0.38 | 5.1 | 6 | 9 | 169.8517
5 | 0.96 | 6.9 | 8 | 11 | 3.384553
6 | 0.69 | 6.1 | 8 | 10 | 131.524
7 | 0.47 | 5.7 | 8 | 14 | 226.8326
8 | 0.69 | 6.5 | 9 | 12 | 216.6675
9 | 0.33 | 7.1 | 9 | 14 | 8.686792
10 | 0.60 | 8.0 | 11 | 14 | 238.3409
Table 7. Comparison results.
Number | Deep Learning | China-National Standard | True Value
1 | 0 | 0 | 0
2 | 0 | 0 | 1
3 | 7 | 0 | 8
4 | 26 | 0 | 22
5 | 24 | 0 | 24
6 | 36 | 10 | 41
7 | 80 | 18 | 81
8 | 245 | 173 | 617
9 | 262 | 57 | 2698
10 | 37406 | 191 | 69227
Table 8. The accuracy of the two models.
Method | Cases 1–7 | Cases 8–10 | Cases 1–10
China-National Standard | 52.27% | 25.32% | 44.19%
Deep learning model | 93.61% | 45.85% | 79.28%
