Article

Bayesian Predictive Analysis of Natural Disaster Losses

Department of Mathematics, Towson University, Towson, MD 21252, USA
* Author to whom correspondence should be addressed.
Submission received: 20 November 2020 / Revised: 16 December 2020 / Accepted: 18 December 2020 / Published: 2 January 2021

Abstract

Different types of natural events hit the United States every year. Data on natural hazards in the US from 1900 to 2016 show an increasing trend in annual natural disaster losses after 1980. Climate change is recognized as one of the factors driving this trend, and predictive analysis of natural losses becomes important for loss prediction and risk prevention as the trend continues. In this paper, we convert natural disaster losses to year 2016 dollars using the yearly average Consumer Price Index (CPI) and conduct several tests to verify that the CPI adjusted losses from individual natural disasters are independent and identically distributed. Based on these test results, we use various model selection criteria to find the best model for natural loss severity among three composite distributions, namely Exponential-Pareto, Inverse Gamma-Pareto, and Lognormal-Pareto. These composite distributions model, in a piecewise fashion, small losses with high frequency and large losses with low frequency. Remarkably, we make the first attempt to derive an analytical Bayesian estimate of the Lognormal-Pareto distribution based on the selected priors, and show that the Lognormal-Pareto distribution outperforms the other two composite distributions in modeling natural disaster losses. Important risk measures for natural disasters are thereafter derived and discussed.

1. Introduction

Different types of natural events hit the United States (US) every year. The east coast of the US suffers hurricanes, the middle of the country sees tornadoes, the west coast endures earthquakes, and the south bears a variety of hazards such as hurricanes, wind, drought, and floods. Data on the occurrence and damage of natural events in the US from 1900 to the present, taken from the Emergency Events Database (EM-DAT), the International Disaster Database, show that annual natural disaster losses have been increasing since 1980. Climate change is recognized as one of the contributors to these increasing losses. Injury, homelessness, displacement, and economic losses from natural events can have a significant impact on populations and societies. Predictive analysis of future natural events is imperative: it provides important information for prevention and remedy plans that reduce the human impact and economic losses of natural disasters.
Various studies have attempted to model the frequencies of and damages due to natural events. Levi and Partrat (1991) analyzed hurricane losses between 1954 and 1986 in the US and found that the amounts of losses were independent and identically distributed (i.i.d.) and independent of the frequencies of hurricanes. These assumptions are confirmed in our research, based on the EM-DAT natural disaster data from 1900 to 2016 after accounting for price inflation. The amounts of losses in different years are converted into 2016 dollars using the annual average Consumer Price Index (CPI). We find that both the numbers of natural events and the CPI adjusted total damages have been increasing over time; however, there is no visible trend in the individual CPI adjusted losses, which is consistent with the conclusion of Levi and Partrat (1991).
To discover the distributional features of individual losses from natural disasters, we use the most recent 37 years of CPI adjusted natural losses to investigate appropriate parametric distribution models for the amount of loss caused by an individual natural event, i.e., the natural loss severity. Levi and Partrat (1991) proposed to use the lognormal distribution for natural event losses based on the hurricane data between 1954 and 1986 in the US. However, this distribution cannot describe a typical feature of natural disaster losses: many small losses and the occasional occurrence of very large losses. Some researchers have argued for composite distributions to capture this recognized feature.
A composite distribution has a threshold in the support of the loss random variable, with a Pareto distribution typically used to model large losses beyond the threshold and a popular parametric continuous distribution used for small losses below the threshold. For example, Bakar et al. (2015) developed several Weibull distribution-based composite models for heavy-tailed insurance loss data. In our paper, we compare the performance of three composite distributions, namely Exponential-Pareto (Exp-Pareto), Inverse Gamma-Pareto (IG-Pareto), and Lognormal-Pareto (LN-Pareto), and the corresponding three non-composite parametric models, based on the negative loglikelihood, the Akaike information criterion, and the Bayesian information criterion. All three model selection criteria show that the composite distributions fit the data better than the non-composite models.
We thereafter apply the three composite models to the data and select the best fitting composite model for the loss severity of natural events. For consistent model comparison, we make the first attempt to derive analytical Bayesian estimators for the LN-Pareto composite distribution, because Aminzadeh and Deng (2018, 2019) have already derived analytical Bayesian estimators for the Exp-Pareto and IG-Pareto models. In addition, the Bayesian method enables us to perform predictive analysis of future natural disaster losses using the best fitting composite model.
Cooray and Cheng (2015) developed Bayesian estimators for the LN-Pareto composite model based on a Markov Chain Monte Carlo (MCMC) simulation algorithm. In our paper, we carefully select prior distributions for the LN-Pareto composite model and derive closed-form Bayesian estimators for its unknown parameters. The mean squared errors (MSE) of the Bayesian and maximum likelihood (ML) estimation methods show that the Bayesian method outperforms ML estimation for these three composite models.
The comparison results show that the LN-Pareto is the best of the three composite models for the loss severity of natural disasters. Various risk measures of natural event losses are thereafter presented based on the LN-Pareto distribution. The same risk measures based on the other two composite distributions are also provided for comparison.
The remainder of this paper is organized as follows. Section 2 describes the natural disaster data and tests the assumption that individual loss amounts are independent and identically distributed. Section 3 introduces the three composite distributions and compares them with the corresponding non-composite models. Section 4 derives Bayesian estimators for the LN-Pareto composite distribution and compares the performance of the Bayesian and ML estimation methods. Section 5 presents risk measures of future natural disaster losses based on the LN-Pareto model and the other two composite distributions. Concluding remarks are given in Section 6.

2. Natural Losses in the US

2.1. The Data

The EM-DAT contains worldwide data on the occurrence and impact of natural events from 1900 to the present day. The database is compiled from various sources, including United Nations agencies, non-governmental organizations, insurance companies, research institutes, and press agencies. There were a total of 258 natural events in the US from 1900 to 2016, of which 156 occurred between 1980 and 2016, accounting for 60% of the total events during this most recent third of the whole time period. This observation is consistent with the fact that Earth's climate is changing faster than at any point in recorded history as a result of human activities.
We also look into the trend of the natural loss amounts from 1900 to 2016. To eliminate the effect of price inflation, we convert the amounts of losses in each year into year 2016 dollars. Let $CPI_t$ be the annual average CPI in year $t$ and $y_t$ be the CPI adjusted amount of losses in that year; then
$$ y_t = (\text{loss amount in year } t) \times \frac{CPI_{2016}}{CPI_t}, \qquad t = 1900, 1901, \ldots, 2016. $$
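As a simple illustration, here is a minimal Python sketch of this conversion; `losses_by_year` and `cpi` are hypothetical inputs (nominal losses and annual average CPI keyed by year), not data structures from the paper.

```python
def adjust_to_2016_dollars(losses_by_year, cpi):
    """Convert nominal yearly loss amounts to 2016 dollars: y_t = loss_t * CPI_2016 / CPI_t."""
    base = cpi[2016]
    return {t: loss * base / cpi[t] for t, loss in losses_by_year.items()}

# Example with made-up numbers (losses in million US$, CPI values illustrative only):
losses_by_year = {1980: 1019.45, 1981: 1056.14}
cpi = {1980: 82.4, 1981: 90.9, 2016: 240.0}
adjusted = adjust_to_2016_dollars(losses_by_year, cpi)
```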
After adjusting for price inflation, the losses during 1980 to 2016 account for 85.3% of the total losses. Figure 1 shows the yearly number of natural events and the CPI adjusted (in 2016 dollars) total damage costs from 1900 to 2016 in the US. We can see that both the number of natural events and the CPI adjusted damage costs in each year have been increasing over these years, especially after 1980.
To demonstrate the change in both the number and the damage costs of natural events caused by human activities, we break the data down into the periods before and after 1980. Figure 2 displays the percentage of occurrences and the percentage of CPI adjusted damage costs, relative to the respective totals, by time period (before and after 1980) and by type of natural disaster. The exact numbers for this figure are listed in Table A1 and Table A2 in Appendix A.
We can see that storms and floods account for most natural events. The damage costs from natural events after 1980 account for the majority of the total costs, of which the damage caused by storms after 1980 accounts for more than half. In addition, these storm losses were caused by fewer storms after 1980 than before 1980, indicating a higher average storm loss after 1980. However, this is not the case for other types of natural events.
Although there is an obvious increasing trend in both the numbers and the total damage costs of natural events after 1980, it is not trivial to find a pattern in the economic losses of individual natural events. Based on the aforementioned features of the natural loss data, we use the natural losses in the most recent 37 years, from 1980 to 2016. There are a total of 462 natural events in these 37 years, and only seven events caused economic losses above $2 \times 10^7$ (in 2016 dollars). In Figure 3, we depict the scatter plot of the losses below $2 \times 10^7$, and we can see that there is no identifiable trend in individual natural disaster losses.
We are interested in how to appropriately model individual natural disaster losses. Motivated by this research question and the above observations, in this paper we aim to investigate the features of the severity of natural events, find appropriate loss severity models for natural disasters, and provide predictive modeling of loss severity for risk management and insurance purposes.

2.2. The i.i.d. Assumption for Loss Severity

Let the random variable $N$ denote the number of occurrences per year and $X_i$, for $i = 1, 2, \ldots, N$, be the severity random variable of the $i$th natural event occurring in a year. The vector $X = (X_1, X_2, \ldots, X_N)^T$ denotes the random vector of yearly loss severities. We use the natural disaster data between 1980 and 2016, so the random variable $N$ has a sample size of 37. Table A3 in Appendix B lists the CPI adjusted individual natural damage losses in these 37 years. It can easily be seen that $N$ takes values $0, 2, 6, 8, \ldots, 28$ over the 37 years. One value of $N$ is zero because there was no record of natural events in 1988. In all other years, there were at least two natural events. Therefore, there are 36 non-zero realizations $(n_k, x_k)$, $k = 1, 2, \ldots, 36$, of $(N, X)$. The $i$th column of the matrix $(x_1, x_2, \ldots, x_{36})^T$ contains the realizations of $X_i$, i.e., the $i$th occurrence of natural disasters in each of the 36 years.
Let $m_i$ be the sample size for the $i$th severity random variable $X_i$. From Table A3 we see that $X_1$, the first severity random variable, has 36 non-zero values, so $m_1 = 36$. The realization of $X_1$ is $(x_1^1, x_1^2, \ldots, x_1^{36})^T$, which takes the values $(1019.45, 1056.14, \ldots, 550.00)$ million in 2016 dollars. $X_{28}$, the 28th severity random variable, has $m_{28} = 1$ and takes only one non-zero value, 1822.71 million in 2016 dollars, because there is only one year with 28 losses. The total number of observations is $\sum_{k=1}^{28} m_k = 462$.
We want to test the assumption that $X_1, X_2, \ldots, X_{28}$ are independent and identically distributed random variables. Since $X_{21}, X_{22}, \ldots, X_{28}$ have small sample sizes, we group them together as one random variable $X_{21}$. Applying two nonparametric methods, the Kendall tau test and the Spearman test (see Gibbons and Chakraborti 2003), we test the independence assumption for $X_1, X_2, \ldots, X_{21}$ as follows.
For $X_i, X_j$, $i \neq j$, where $i, j = 1, 2, \ldots, 21$, the null and alternative hypotheses are
$$ H_0: \rho_{X_i, X_j} = 0, \qquad H_a: \rho_{X_i, X_j} \neq 0. $$
Let $m = \min(m_i, m_j)$. The Kendall tau statistic is
$$ T = \binom{m}{2}^{-1} \sum_{1 \le k < l \le m} A_{kl}, $$
where
$$ A_{kl} = \begin{cases} 1 & (x_i^k - x_i^l)(x_j^k - x_j^l) > 0 \\ 0 & (x_i^k - x_i^l)(x_j^k - x_j^l) = 0 \\ -1 & (x_i^k - x_i^l)(x_j^k - x_j^l) < 0. \end{cases} $$
Let $R_k$ and $S_k$ be the ranks of $x_i^k$ and $x_j^k$ among the $m$ observations, respectively, for $k = 1, 2, \ldots, m$. The Spearman rho test statistic is
$$ R = \frac{\sum_{k=1}^{m} (R_k - \bar{R})(S_k - \bar{S})}{\sqrt{\sum_{k=1}^{m} (R_k - \bar{R})^2 \sum_{k=1}^{m} (S_k - \bar{S})^2}}, $$
where $\bar{R}$ and $\bar{S}$ are the averages of the $R_k$ and the $S_k$, respectively.
For $X_1, X_2, \ldots, X_{21}$, there are a total of $\binom{21}{2} = 210$ tests. We obtain 210 values of the Kendall tau and Spearman test statistics with the corresponding p-values. Table 1 lists the pairs of severity random variables for which the null hypothesis is rejected at the 1% significance level. All other pairs have non-significant test results, and we fail to reject the null hypotheses at the 1% significance level. Based on these test results, it is reasonable to assume that $X_1, X_2, \ldots$ are independent.
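For readers who want to reproduce these pairwise tests, a short Python sketch using SciPy is given below; it assumes a hypothetical list `X` of 21 year-aligned NumPy arrays holding the observed losses of $X_1, \ldots, X_{21}$, and is not the paper's Mathematica code.

```python
import numpy as np
from itertools import combinations
from scipy.stats import kendalltau, spearmanr

def pairwise_independence_tests(X, alpha=0.01):
    """Run Kendall tau and Spearman rho tests for every pair (X_i, X_j)."""
    rejected = []
    for i, j in combinations(range(len(X)), 2):
        m = min(len(X[i]), len(X[j]))        # m = min(m_i, m_j); arrays assumed year-aligned
        tau, p_tau = kendalltau(X[i][:m], X[j][:m])
        rho, p_rho = spearmanr(X[i][:m], X[j][:m])
        if p_tau < alpha or p_rho < alpha:   # pairs with H0 rejected at level alpha
            rejected.append((i + 1, j + 1, tau, p_tau, rho, p_rho))
    return rejected
```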
Next, the Kruskal-Wallis test is used to verify the identical distribution assumption; the details of this test are given by Gibbons and Chakraborti (2003). Let $F_i$ be the cumulative distribution function of $X_i$, $i = 1, 2, \ldots, 21$. In our Kruskal-Wallis test, the null and alternative hypotheses are
$$ H_0: F_1 = F_2 = \cdots = F_{21}, \qquad H_a: F_i \neq F_j \ \text{ for some } i \neq j. $$
There are a total of 462 observations. Sort all 462 observations in increasing order; if several observations share an identical value, each is assigned the average of the ranks they would occupy. Define $T_i$ to be the sum of the ranks of all the observations of $X_i$, where $i = 1, 2, \ldots, 21$. We have $T_1 = 8817$, $T_2 = 8444$, $\ldots$, and $T_{21} = 6656.5$. It is clear that $\sum_{i=1}^{21} T_i = 462(462+1)/2 = 106{,}953$. Under the null hypothesis, $E(T_i) = m_i (462+1)/2$. Therefore, $E(T_1) = 8334$, $E(T_2) = 8334$, $\ldots$, and $E(T_{21}) = 6019$. The Kruskal-Wallis statistic is
$$ KW = \frac{12}{462(462+1)} \sum_{i=1}^{21} \frac{1}{m_i} \left( T_i - E(T_i) \right)^2 = 19.4274. $$
Asymptotically, $KW$ has a chi-square distribution with 20 degrees of freedom, giving a p-value of 0.494. We fail to reject $H_0$ at the 1% significance level. Therefore, it is reasonable to assume that $X_1, X_2, \ldots$ are identically distributed.
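The identical-distribution test can be reproduced with SciPy's built-in Kruskal-Wallis routine; the sketch below again assumes the hypothetical list `X` of 21 samples used above.

```python
from scipy.stats import kruskal

# H0: all 21 severity samples come from the same distribution.
stat, p_value = kruskal(*X)
# The paper reports KW = 19.4274 with p-value 0.494, so H0 is not rejected at the 1% level.
```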

2.3. Non-Parametric Distribution of Loss Severity

Based on the test results, we can reasonably assume that the CPI adjusted natural disaster damage random variables are independent and identically distributed. Let $X$ denote the CPI adjusted damage random variable with unknown distribution $F(x)$. The 462 individual damage losses are realizations of the loss severity random variable $X$. Before exploring an appropriate parametric distribution $F(x)$ for the natural disaster loss severity, we first look at the non-parametric distribution of $X$, also called a data dependent distribution.
Let $y_1 < y_2 < \cdots < y_k$ be the $k$ unique loss values, and let $s_i$ be the number of times the value $y_i$ appears in the sample. Let $r_j = \sum_{i=j}^{k} s_i$ be the number of observations greater than or equal to $y_j$. The Nelson-Aalen estimator of the cumulative hazard rate function is
$$ \hat{H}(x) = \begin{cases} 0 & x < y_1 \\[1ex] \displaystyle\sum_{i=1}^{j-1} \frac{s_i}{r_i} & y_{j-1} \le x < y_j, \quad j = 2, \ldots, k \\[2ex] \displaystyle\sum_{i=1}^{k} \frac{s_i}{r_i} & x \ge y_k. \end{cases} $$
Therefore, $\hat{S}(x) = \exp(-\hat{H}(x))$ and $\hat{F}(x) = 1 - \hat{S}(x) = 1 - \exp(-\hat{H}(x))$. A Kolmogorov-Smirnov (K-S) confidence band for the unknown distribution $F(x)$ can also be constructed from the Nelson-Aalen estimate $\hat{F}(x)$. Define the K-S statistic by $D_n = \sup_x |\hat{F}(x) - F(x)|$, where $n$ is the sample size. To form a $100(1-\alpha)\%$ confidence band, we select a number $d$ such that $P(D_n > d) = \alpha$. Then the lower band is $F_L(x) = \max(\hat{F}(x) - d, 0)$ and the upper band is $F_U(x) = \min(\hat{F}(x) + d, 1)$. We set $\alpha = 0.05$, so the true unknown loss distribution lies between $F_L(x)$ and $F_U(x)$ with 95% confidence.
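A compact NumPy sketch of the Nelson-Aalen estimate and the resulting band is shown below; `losses` is assumed to be the array of the 462 CPI adjusted losses, and `d` is the K-S critical value for the chosen $\alpha$ (for large $n$ and $\alpha = 0.05$, $d \approx 1.358/\sqrt{n}$ is a common approximation). This is our own illustration, not the paper's code.

```python
import numpy as np

def nelson_aalen_band(losses, d):
    """Nelson-Aalen estimate of F at the unique loss values, with a +/- d K-S band."""
    y, s = np.unique(losses, return_counts=True)   # unique values y_j and counts s_j
    r = s[::-1].cumsum()[::-1]                     # r_j = number of observations >= y_j
    H = np.cumsum(s / r)                           # cumulative hazard at each y_j
    F_hat = 1.0 - np.exp(-H)
    return y, F_hat, np.maximum(F_hat - d, 0.0), np.minimum(F_hat + d, 1.0)
```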
The plots of $\hat{F}(x)$, $F_L(x)$, and $F_U(x)$, together with the histogram of the CPI adjusted severities of the natural events, are given in Figure 4. We can see that the histogram of the CPI adjusted natural event losses is both skewed and fat-tailed, showing the typical feature of many small losses and a few very large losses.

3. Composite Models

We have confirmed that the natural event severity data have the typical feature of high frequency of small losses and low frequency of large losses, while traditional distributions, such as the normal, exponential, inverse gamma, and lognormal distributions, are not able to describe this feature. Some researchers have addressed this feature of insurance data by using composite models for insurance losses. The probability density function of a composite model is built from two probability density functions, $f_1(x)$ and $f_2(x)$. The general composite model has the probability density function
$$ f_X(x) = \begin{cases} c f_1(x) & 0 < x \le \theta \\ c f_2(x) & \theta \le x < \infty, \end{cases} $$
where $c$ is a normalizing constant and $\theta$ is the threshold parameter separating the supports of the two distributions. To make the composite density smooth, it is usually assumed that the pdf $f_X(x)$ is continuous and differentiable at $\theta$, that is, $f_1(\theta) = f_2(\theta)$ and $f_1'(\theta) = f_2'(\theta)$.
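To make the construction concrete, the sketch below evaluates the Exp-Pareto composite density from Table 2, in which the normalizing constant and the smoothness conditions at $\theta$ are already absorbed into the numerical coefficients; it is an illustration rather than code from the paper.

```python
import numpy as np

def exp_pareto_pdf(x, theta):
    """Exp-Pareto composite density (Table 2): exponential piece below theta, Pareto tail above."""
    x = np.asarray(x, dtype=float)
    below = 0.775 / theta * np.exp(-1.35 * x / theta)   # 0 < x <= theta
    above = 0.2 * theta**0.35 * x**(-1.35)              # x >= theta
    return np.where(x <= theta, below, above)
```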

3.1. Three Composite Distributions

Cooray and Ananda (2005) introduced a two-parameter continuous and differentiable composite LN-Pareto model, which uses a two-parameter lognormal density up to an unknown threshold value and a two-parameter Pareto density for the rest of the support. The resulting density is similar in shape to a lognormal density, with tail behavior quite similar to a Pareto density. They applied the proposed composite model to fire insurance data to show the value of the composite LN-Pareto distribution in describing insurance claim data.
Motivated by Cooray and Ananda (2005), Teodorescu and Vernic (2006) introduced a composite Exp-Pareto distribution, which uses an exponential density up to an unknown threshold value and a two-parameter Pareto density beyond it. The model reduces to a one-parameter distribution after the continuity and differentiability conditions are imposed. Aminzadeh and Deng (2019) proposed an IG-Pareto composite model; under the same smoothness and continuity assumptions, it also reduces to a one-parameter model. Let $\Phi(\cdot)$ be the cumulative distribution function (cdf) of the standard normal distribution. Table 2 summarizes the density functions of the Exp-Pareto, IG-Pareto, and LN-Pareto composite models, and Figure 5 plots the three distribution functions for various parameter values.
We can see that the Nelson-Aalen estimate of the unknown distribution of the natural disaster loss severity is quite similar in shape to the distributions of the three composite models. This indicates that composite models might be able to describe the features of natural disaster damage losses. In the next section, we compare the performance of these three composite distributions in modeling the natural disaster loss severity using standard model comparison and selection criteria.

3.2. Model Selection for Loss Severity

The maximum likelihood estimators for the unknown parameters of the Exp-Pareto, IG-Pareto, and LN-Pareto composite models have been derived by Teodorescu and Vernic (2006), Aminzadeh and Deng (2019), and Cooray and Ananda (2005), respectively. The parameter $\theta$ is the unknown threshold dividing the supports of the two component distributions, and the likelihood function changes with the value of $\theta$; therefore, a grid search method has to be used to find the ML estimates. The grid search algorithm can be briefly summarized as follows.
1. Sort the sample of natural disaster damage losses in increasing order, i.e., $x_1 < x_2 < \cdots < x_n$, where $n$ is the sample size. Let $n^*$ be the size of the partial sample consisting of the first $n^*$ losses $x_1, x_2, \ldots, x_{n^*}$. Start from $n^* = 1$.
2. Compute the maximum likelihood estimates $\hat{\theta}$ and $\hat{\beta}$ as in Table 3 for the given $n^*$. If $x_{n^*} \le \hat{\theta} \le x_{n^*+1}$, we have found $n^*$; otherwise, increase $n^*$ by 1.
3. Repeat Step 2 for $n^* = 2, 3, \ldots$, until $x_{n^*} \le \hat{\theta} \le x_{n^*+1}$. The ML estimates of the parameters are obtained from this $n^*$.
In our research, the Mathematica software is used to code the algorithm. Table 3 lists the ML estimators of the unknown parameters for the three composite distributions.
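For illustration (the paper implemented the search in Mathematica), here is a Python sketch of this grid search for the Exp-Pareto case, using the closed-form $\hat{\theta}$ from Table 3.

```python
import numpy as np

def exp_pareto_ml(losses):
    """Grid search over n* for the Exp-Pareto ML estimate of theta."""
    x = np.sort(np.asarray(losses, dtype=float))
    n = len(x)
    for n_star in range(1, n):
        theta_hat = 1.35 * x[:n_star].sum() / (1.35 * n_star - 0.35 * n)
        if x[n_star - 1] <= theta_hat <= x[n_star]:   # stop when x_{n*} <= theta_hat <= x_{n*+1}
            return theta_hat, n_star
    return None, None   # no admissible n* found
```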
Based on the ML estimates, we use the negative log-likelihood (NLL) value, Akaike's Information Criterion (AIC), and the Bayesian Information Criterion (BIC) goodness-of-fit measures to compare the appropriateness of these three composite models for modeling the natural disaster severity.
The NLL can be used to compare models with the same number of parameters. It is the negative logarithm of the maximized likelihood function and is defined as
$$ NLL = -\log L(x_1, x_2, \ldots, x_n \mid \underline{\theta}), $$
where $\underline{\theta}$ is the vector of unknown parameters. The smaller the NLL value, the larger the value of the likelihood function and the better the fitted model.
The AIC was defined by Akaike (1973) as
$$ AIC = -2 \log L(x_1, x_2, \ldots, x_n \mid \underline{\theta}) + 2q, $$
where $q$ is the number of unknown parameters. The smaller the AIC value, the better the fitted model. The first term $-2 \log L(x_1, x_2, \ldots, x_n \mid \underline{\theta})$ decreases as the number of unknown parameters increases, and this decrease is offset by the penalty $2q$, reflecting a trade-off between goodness of fit and the number of parameters.
The BIC was developed by Schwarz (1978) and is defined as
$$ BIC = -2 \log L(x_1, x_2, \ldots, x_n \mid \underline{\theta}) + q \log(n), $$
where $n$ is the sample size and $q$ is the number of parameters. The BIC penalizes a large number of parameters more heavily as the sample size grows. The smaller the BIC, the better the fitted model. Interested readers are referred to Burnham and Anderson (2002) for more details about these model selection criteria.
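For reference, a small helper that computes the three criteria from the pointwise log-densities of a fitted model is sketched below; `logpdf_values` is a hypothetical array of $\log f(x_i \mid \hat{\underline{\theta}})$ values and `q` the number of estimated parameters.

```python
import numpy as np

def model_selection_criteria(logpdf_values, q):
    """Return (NLL, AIC, BIC) for a fitted model with q parameters."""
    loglik = float(np.sum(logpdf_values))
    n = len(logpdf_values)
    return -loglik, -2.0 * loglik + 2 * q, -2.0 * loglik + q * np.log(n)
```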
Based on the sample of 462 natural disaster damage losses in the US from 1980 to 2016, the values of the NLL, AIC, and BIC and the maximum likelihood estimates of the three composite models are summarized in Table 4. For comparison, we also fit three non-composite parametric models, namely the Exponential, Lognormal, and Inverse Gamma distributions, to the natural disaster data.
We can see that the three composite models fit the natural disaster losses better than the three corresponding non-composite models. This supports the claim that a composite model can describe the distributional features of insurance data and natural disaster losses. Among the three composite models, the LN-Pareto fits the data better than the other two in terms of the NLL, AIC, and BIC values. Therefore, the LN-Pareto composite distribution is the best model with which to conduct Bayesian predictive analysis of natural disaster losses in the US.

4. The Bayesian Estimate

4.1. Bayesian Estimator of LN-Pareto

There is no analytical Bayesian estimator of the LN-Pareto composite distribution in the current literature. Cooray and Cheng (2015) obtained Bayesian estimates for the LN-Pareto model using the MCMC method. In this paper, we use conjugate priors for the two parameters of the LN-Pareto distribution and make the first attempt to derive closed-form Bayesian estimators without MCMC simulation.
Recall the LN-Pareto probability density function given by Cooray and Ananda (2005), as listed in Table 2,
$$ f_X(x \mid \theta, \beta) = \begin{cases} \dfrac{\beta \theta^{\beta}}{(1 + \Phi(k))\, x^{\beta+1}}\, e^{-0.5 \left(\frac{\beta}{k}\right)^2 \ln^2(x/\theta)} & 0 < x \le \theta \\[2ex] \dfrac{\beta \theta^{\beta}}{(1 + \Phi(k))\, x^{\beta+1}} & \theta \le x < \infty, \end{cases} \tag{1} $$
where $k = 0.372238898$ and $\Phi(\cdot)$ is the cdf of the standard normal distribution.
We use Gamma($a_1$, $b_1$) as the prior distribution of $\beta$ and LN($c_1$, $(k/\beta)^2$) as the prior distribution of $\theta \mid \beta$, where $a_1$, $b_1$, and $c_1$ are hyper-parameters. The prior density functions are
$$ \rho(\beta) = \frac{\beta^{a_1 - 1} e^{-\beta/b_1}}{\Gamma(a_1)\, b_1^{a_1}}, \quad a_1 > 0,\; b_1 > 0, \qquad \rho(\theta \mid \beta) \propto \frac{\beta}{k\,\theta}\, e^{-\frac{1}{2}\left(\frac{\beta}{k}\right)^2 (\ln(\theta) - c_1)^2}, \quad c_1 > 0. $$
Without loss of generality, assume that $x_1 < x_2 < \cdots < x_n$ is an ordered random sample from the LN-Pareto distribution, and let $n^*$ be the size of the partial sample of the first $n^*$ losses $x_1, x_2, \ldots, x_{n^*}$ such that $x_{n^*} \le \theta \le x_{n^*+1}$. The likelihood function can be written as
$$ L(\underline{x} \mid \theta, \beta) \propto \frac{\beta^n\, \theta^{n\beta}}{\left(\prod_{i=1}^{n} x_i\right)^{\beta+1}}\; e^{-0.5 \left(\frac{\beta}{k}\right)^2 \sum_{i=1}^{n^*} (\ln(x_i) - \ln(\theta))^2}. $$
To find the posterior distributions $\pi(\beta \mid \underline{x})$ and $\pi(\theta \mid \beta, \underline{x})$, we need the joint pdf of $(\underline{x}, \theta, \beta)$, which is obtained as
$$ f(\underline{x}, \theta, \beta) = L(\underline{x} \mid \theta, \beta)\, \rho(\theta \mid \beta)\, \rho(\beta). \tag{2} $$
The joint distribution function in Equation (2) can be reduced to
$$ f(\underline{x}, \theta, \beta) \propto \beta^{n + a_1}\, e^{-\beta \frac{(1 + b_1 P)}{b_1}}\, e^{-0.5 \left(\frac{\beta}{k}\right)^2 A_2}\; \theta^{n\beta - 1}\, e^{-0.5 \left(\frac{\beta}{k}\right)^2 (n^*+1)(\ln(\theta) - A_1)^2}, \tag{3} $$
where $P = \sum_{i=1}^{n} \ln x_i$, $A_1 = \dfrac{\sum_{i=1}^{n^*} \ln(x_i) + c_1}{n^* + 1}$, and $A_2 = \sum_{i=1}^{n^*} \ln^2(x_i) + c_1^2 - (n^*+1) A_1^2$.
From Equation (3), the posterior probability density functions can be obtained as
$$ \pi(\beta \mid \underline{x}) \propto \beta^{n + a_1 - 1}\, e^{-\beta \frac{(1 + b_1 P)}{b_1}}\, e^{-0.5 \left(\frac{\beta}{k}\right)^2 A_2}, \tag{4} $$
and
$$ \pi(\theta \mid \beta, \underline{x}) \propto \beta\, \theta^{n\beta - 1}\, e^{-0.5 \frac{(\ln(\theta) - A_1)^2}{\left(\frac{k}{\beta \sqrt{n^*+1}}\right)^2}}. \tag{5} $$
Note that the right-hand side of Equation (5) is the kernel of a lognormal distribution with parameters $A_1$ and $\frac{k}{\beta\sqrt{n^*+1}}$.
Our next step is to find $E[\theta \mid \beta, \underline{x}]$, the expectation of the posterior distribution in Equation (5), i.e., the conditional Bayes estimate of $\theta$ under the squared error loss function. We first need to find the normalizing constant $C_1$ for the probability density function in Equation (5). Let $\xi = \frac{k}{\beta\sqrt{n^*+1}}$; then
$$ \begin{aligned} \int_0^{\infty} C_1\, \beta\, \theta^{n\beta - 1}\, e^{-0.5 \frac{(\ln(\theta) - A_1)^2}{\xi^2}}\, d\theta &= C_1 \int_0^{\infty} \frac{k}{\xi\sqrt{n^*+1}}\, \theta^{n\beta - 1}\, e^{-0.5 \frac{(\ln(\theta) - A_1)^2}{\xi^2}}\, d\theta \\ &= C_1\, \frac{k\sqrt{2\pi}}{\sqrt{n^*+1}} \int_0^{\infty} \theta^{n\beta}\, \frac{1}{\theta\, \xi \sqrt{2\pi}}\, e^{-0.5 \frac{(\ln(\theta) - A_1)^2}{\xi^2}}\, d\theta \\ &= C_1\, \frac{k\sqrt{2\pi}}{\sqrt{n^*+1}}\, E\!\left[\theta^{n\beta}\right] = C_1\, \frac{k\sqrt{2\pi}}{\sqrt{n^*+1}}\, M_{\ln(\theta)}(n\beta) = 1, \end{aligned} \tag{6} $$
where $M_Y(t)$ denotes the moment generating function of a random variable $Y$, and $\ln(\theta) \sim \text{Normal}(A_1, \xi^2)$ under the lognormal density appearing in the integrand. As a result of Equation (6), we get
$$ 1 = C_1\, \frac{k\sqrt{2\pi}}{\sqrt{n^*+1}}\, e^{A_1 (n\beta) + 0.5\, \xi^2 (n\beta)^2}. $$
Therefore,
$$ C_1 = \frac{\sqrt{n^*+1}}{\sqrt{2\pi}\, k}\, e^{-\left( A_1 (n\beta) + 0.5\, \xi^2 (n\beta)^2 \right)}, $$
and the conditional Bayes estimate of $\theta$ is
$$ \hat{\theta}_{Bayes} \mid \beta = E[\theta \mid \beta, \underline{x}] = e^{A_1 + 0.5\, \xi^2 (2 n\beta + 1)}. \tag{7} $$
The Bayes estimate of $\beta$ can be derived from the posterior probability density function of $\beta$ in Equation (4). Let $C_2$ denote the normalizing constant for this density, and let $B_1 = n + a_1$ and $B_2 = \frac{b_1}{1 + b_1 P}$. We have
$$ \int_0^{\infty} C_2\, e^{-0.5 \left(\frac{\beta}{k}\right)^2 A_2}\, \beta^{B_1 - 1} e^{-\beta/B_2}\, d\beta = C_2\, \Gamma(B_1) B_2^{B_1} \int_0^{\infty} e^{-0.5 \left(\frac{\beta}{k}\right)^2 A_2}\, \frac{\beta^{B_1 - 1} e^{-\beta/B_2}}{\Gamma(B_1) B_2^{B_1}}\, d\beta = 1. $$
Please note that $\frac{\beta^{B_1 - 1} e^{-\beta/B_2}}{\Gamma(B_1) B_2^{B_1}}$ is the probability density function of Gamma($B_1$, $B_2$). As a result,
$$ C_2 = \left( \Gamma(B_1) B_2^{B_1}\, E\!\left[ e^{-0.5 (\beta/k)^2 A_2} \right] \right)^{-1}, $$
and the Bayes estimate of $\beta$ is
$$ \hat{\beta}_{Bayes} = E[\beta \mid \underline{x}] = B_1 B_2\, \frac{E_1}{E_2}, \tag{8} $$
where $E_1$ and $E_2$ are the expected values of $e^{-0.5 (\beta/k)^2 A_2}$ when $\beta$ follows the Gamma($B_1 + 1$, $B_2$) and Gamma($B_1$, $B_2$) distributions, respectively. Numerical integration in Mathematica is used to compute both $E_1$ and $E_2$. Similar to the ML estimation method, the following grid search is used in Bayesian estimation (a code sketch follows the steps below):
  • Sort the sample of size $n$ in increasing order, i.e., $x_1 < x_2 < \cdots < x_n$, and let $n^*$ be the size of the partial sample of the first $n^*$ losses $x_1, x_2, \ldots, x_{n^*}$. Start from $n^* = 1$.
  • Compute the Bayes estimate of $\beta$ via Equation (8) for the given $n^*$.
  • Compute the conditional Bayes estimate of $\theta$ via Equation (7), given $\hat{\beta}_{Bayes}$ from the previous step. If $x_{n^*} \le \hat{\theta}_{Bayes} \mid \beta \le x_{n^*+1}$, we have found $n^*$; otherwise, increase $n^*$ by 1.
  • Repeat the previous two steps for $n^* = 2, 3, \ldots$ until $x_{n^*} \le \hat{\theta}_{Bayes} \mid \beta \le x_{n^*+1}$, which gives the correct $n^*$.
The values of $\hat{\beta}_{Bayes}$ and $\hat{\theta}_{Bayes} \mid \beta$ found with this correct $n^*$ are the actual Bayes estimates of the parameters $\beta$ and $\theta$. We therefore obtain analytical Bayesian estimates of the LN-Pareto composite model without simulation.
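As an illustration of Equations (7) and (8) (the paper carried out the numerical integration in Mathematica), a Python sketch for one candidate value of $n^*$ is given below; `x` is the ordered sample and `a1`, `b1`, `c1` are user-chosen hyper-parameters, so the function would be called inside the grid search described above.

```python
import numpy as np
from scipy.stats import gamma
from scipy.integrate import quad

K = 0.372238898   # the constant k in Equation (1)

def ln_pareto_bayes(x, n_star, a1, b1, c1):
    """Bayes estimates of beta (Equation (8)) and theta (Equation (7)) for a given n_star."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    lx = np.log(x[:n_star])
    P = np.log(x).sum()
    A1 = (lx.sum() + c1) / (n_star + 1)
    A2 = (lx ** 2).sum() + c1 ** 2 - (n_star + 1) * A1 ** 2
    B1, B2 = n + a1, b1 / (1.0 + b1 * P)

    def weight(b):                      # the factor exp(-0.5 (b/k)^2 A2) inside E1 and E2
        return np.exp(-0.5 * (b / K) ** 2 * A2)

    E1 = quad(lambda b: weight(b) * gamma.pdf(b, a=B1 + 1, scale=B2), 0, np.inf)[0]
    E2 = quad(lambda b: weight(b) * gamma.pdf(b, a=B1, scale=B2), 0, np.inf)[0]
    beta_hat = B1 * B2 * E1 / E2                                  # Equation (8)
    xi2 = (K / (beta_hat * np.sqrt(n_star + 1))) ** 2
    theta_hat = np.exp(A1 + 0.5 * xi2 * (2 * n * beta_hat + 1))   # Equation (7)
    return beta_hat, theta_hat
```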

4.2. Validation by Simulation

We employ simulation studies to validate the accuracy of the proposed Bayesian estimation method for the LN-Pareto distribution and to compare it with the ML estimation method. In each simulation, we generate a sample of $n$ observations from the LN-Pareto composite density in Equation (1) for selected values of $\theta$ and $\beta$, and obtain the ML and Bayesian estimates of the two parameters from the simulated sample. We repeat the simulation $N = 100$ times. The average values of the estimates (denoted by $\bar{\hat{\theta}}$ and $\bar{\hat{\beta}}$) and the mean squared errors (MSE, denoted by $\epsilon$) are computed. The MSEs show the differences between the true and estimated parameter values and indicate the performance of each estimation method.
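One way to generate such samples follows from integrating Equation (1) over its two pieces: an observation falls below $\theta$ with probability $\Phi(k)/(1+\Phi(k))$, in which case $\ln X$ is a normal variable truncated at $\ln\theta$, and otherwise it comes from the Pareto tail. The Python sketch below implements this mixture representation; it is our own illustration of a sampler, not the paper's code.

```python
import numpy as np
from scipy.stats import norm, truncnorm

K = 0.372238898

def simulate_ln_pareto(n, theta, beta, rng=None):
    """Draw n observations from the LN-Pareto composite density in Equation (1)."""
    rng = np.random.default_rng() if rng is None else rng
    p_below = norm.cdf(K) / (1.0 + norm.cdf(K))          # probability mass below theta
    below = rng.random(n) < p_below
    x = np.empty(n)
    # Lognormal piece: ln X ~ Normal(ln(theta) - K^2/beta, (K/beta)^2), truncated at ln(theta).
    mu, sigma = np.log(theta) - K ** 2 / beta, K / beta
    x[below] = np.exp(truncnorm.rvs(-np.inf, K, loc=mu, scale=sigma,
                                    size=int(below.sum()), random_state=rng))
    # Pareto tail: X = theta * U^(-1/beta) for U ~ Uniform(0, 1).
    x[~below] = theta * rng.random(int((~below).sum())) ** (-1.0 / beta)
    return x
```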
Bayesian estimation of $\theta$ and $\beta$ needs appropriate values of the hyper-parameters $a_1$, $b_1$, and $c_1$. Recall that the prior distribution for $\beta$ is a Gamma distribution with hyper-parameters $a_1$ and $b_1$. Therefore, when specifying $a_1$ and $b_1$, we make the product $a_1 b_1$, which is the expectation of the Gamma prior, equal to the preset value of $\beta$. In addition, since the variance of the Gamma prior distribution is proportional to $b_1^2$, we choose $b_1$ very small and then solve for $a_1$ from the equation $a_1 b_1 = \beta$. For example, when the selected true value of $\beta$ is 0.5, we choose $a_1 = 100$ and $b_1 = 0.005$, so that $b_1$ is small and $a_1 b_1 = 0.5$.
Similarly, the conditional prior of $\theta \mid \beta$ is assumed to follow a lognormal distribution LN($c_1$, $(k/\beta)^2$); therefore, we choose $c_1$ as the solution of $\exp(c_1 + 0.5 (k/\beta)^2) = \theta$, based on the expectation of the lognormal distribution. Table 5 lists the Bayesian and ML estimates obtained from data simulated from the LN-Pareto composite distribution with different parameter values.
From Table 5, we can see that the informative Bayesian estimates outperform the ML estimates for both $\beta < 1$ and $\beta > 1$, since the Bayesian estimates have smaller MSEs than the ML estimates in all simulation scenarios. Note that as the sample size $n$ increases, both the ML and Bayesian methods provide more accurate estimates of $\theta$, in terms of smaller MSEs. However, in Equation (8) the Bayesian estimate of $\beta$ depends on $n$ through $B_1 = n + a_1$; as $n$ increases, the MSE of the Bayesian estimate of $\beta$ increases slightly but remains smaller than the MSE of the ML estimates. These simulation results indicate that Bayesian estimates are consistently better than the ML estimates when reasonable hyper-parameters are chosen.

4.3. Bayesian Estimates of Three Composite Models

Aminzadeh and Deng (2018, 2019) have derived closed-form Bayesian estimators of the unknown parameter of the Exp-Pareto and IG-Pareto distributions. The Bayesian estimate of $\theta$ in the Exp-Pareto distribution is
$$ \hat{\theta}_{Bayes} = \frac{b_2 + 1.35 \sum_{i=1}^{n^*} x_i}{a_2 - 0.35\, n + 1.35\, n^* - 1}, $$
where $a_2$ and $b_2$ are hyper-parameters, $n$ is the sample size (462 for our data), and $n^*$ is the size of the partial sample, as before.
The Bayesian estimate of $\theta$ in the IG-Pareto distribution is
$$ \hat{\theta}_{Bayes} = \frac{a_3 \left( n a + n^* k + b_3 \right)}{a_3\, k \sum_{i=1}^{n^*} \frac{1}{x_i} + 1}, $$
where $a_3$ and $b_3$ are hyper-parameters, and $k = 0.144351$ and $a = 0.163847$ are the constants of the IG-Pareto distribution specified in Table 2.
Based on the Bayesian estimators given by Aminzadeh and Deng (2018, 2019) and the Bayesian estimators for the LN-Pareto distribution derived in this research, we have closed-form Bayesian estimators for all three composite models. Based on the CPI adjusted natural disaster losses from 1980 to 2016 in the US, we obtain the analytical Bayesian estimates of the three composite models for natural loss severity shown in Table 6.
Bayesian estimation also shows that the LN-Pareto is the best model among the three composite distributions, with the smallest NLL, AIC, and BIC values. Comparing the NLL, AIC, and BIC values in Table 4 and Table 6, we can see that the Bayesian estimation method yields smaller values of all three criteria when fitting the three composite models to the natural losses data. Kass and Raftery (1995) noted that a difference of 10 or more is strong evidence in favor of the model with the smaller BIC value. Although the advantage of the Bayesian estimates over the ML estimates is marginal, especially when fitting the Exp-Pareto and LN-Pareto models, Bayesian estimation overall performs better than the ML estimation method. The advantage of the Bayesian method becomes significant when the sample size is small.

5. Risk Measures

In this section, we investigate risk measures of the loss severity of natural events based on the LN-Pareto composite model. Two important risk measures, Value at Risk (VaR) and Tail Value at Risk (TVaR), are used in our research. For comparison, we also display the VaR and TVaR of loss severity based on the other two composite models.

Value at Risk and Tail Value at Risk

VaR is a point risk measure that describes the minimum loss at a desired level of confidence. Given a confidence level $p$ and a cumulative distribution function $F(x)$ of a loss random variable $X$, VaR is defined by
$$ \Pr(X \le VaR_p(X)) = p, \quad \text{i.e.,} \quad VaR_p(X) = F^{-1}(p). $$
For example, if the VaR of the natural disaster loss severity is $100 million at a 95% confidence level, there is only a 5% chance that the damage from a natural event will exceed $100 million. This risk measure can be used by an insurance company to assess reinsurance needs and manage risk, so that losses can be covered without putting the company at risk.
TVaR was developed as an alternative to VaR. It describes the average loss beyond VaR at a given confidence level $p$. Mathematically,
$$ TVaR_p(X) = E[X \mid X > VaR_p(X)] = \frac{\int_{VaR_p(X)}^{\infty} x f(x)\, dx}{1 - p}. $$
In the three composite distributions, the Pareto distribution is used to model large losses with small frequencies. However, the expectation of the Pareto distribution does not exist if the shape parameter is smaller than 1, which is the case for all the fitted composite distributions in our research. Therefore, we define the Limited Tail Value at Risk (LTVaR) as
$$ LTVaR_p(X) = E[(X \wedge b) \mid X > VaR_p(X)] = \frac{\int_{VaR_p(X)}^{b} x f(x)\, dx + \int_{b}^{\infty} b f(x)\, dx}{1 - p}, $$
where $b$ is the maximum liability for a loss. From its definition, LTVaR is the average of the losses that are greater than VaR but capped by the loss limit $b$, and it is therefore greater than the limited expectation. LTVaR is a very useful measure and can easily be implemented in insurance because the cap matches the maximum benefit of an insurance policy.
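Because the fitted composite distributions have closed-form densities, $VaR_p$ and $LTVaR_p$ can be computed numerically. The sketch below is generic and assumes `pdf` and `cdf` are the fitted density and distribution functions (for instance, built from the Bayesian estimates above) and that `upper` is large enough to bracket the VaR; it is an illustration, not the paper's code.

```python
from scipy.optimize import brentq
from scipy.integrate import quad

def var_ltvar(pdf, cdf, p=0.85, b=1e5, upper=1e12):
    """VaR_p by root-finding on the cdf, and LTVaR_p with losses capped at b (b > VaR_p)."""
    var_p = brentq(lambda x: cdf(x) - p, 1e-12, upper)        # F(VaR_p) = p
    capped_tail = quad(lambda x: x * pdf(x), var_p, b)[0]     # integral of x f(x) from VaR_p to b
    above_cap = b * (1.0 - cdf(b))                            # contribution of losses capped at b
    return var_p, (capped_tail + above_cap) / (1.0 - p)
```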
Table 7 displays $VaR_p$ and $LTVaR_p$ at the 85% confidence level, derived from the three Bayesian estimated and ML estimated composite distributions; $b$ is chosen to be $10^5$ ('0000000 US$) for $LTVaR_{0.85}$. Table 7 shows that the theoretical $VaR_{0.85}(X)$ values based on the three composite models differ when a different estimation method is used.
For the IG-Pareto model, the Bayesian estimation method is significantly better than ML estimation, and the two estimation methods result in dramatically different theoretical $VaR_{0.85}(X)$ and $LTVaR_{0.85}(X)$ values. For the other two composite models, although the Bayesian estimation method has only marginally lower BIC values than the ML estimation method, the theoretical $VaR_{0.85}(X)$ and $LTVaR_{0.85}(X)$ values are still significantly different. Therefore, we need to be cautious in the choice of estimation method when calibrating a composite model.
Second, the large values of the theoretical $VaR_{0.85}(X)$ and $LTVaR_{0.85}(X)$ indicate that the composite distributions do address the fat-tail problem observed in practice, so that the average loss in the worst cases will not be underestimated by a composite model. Moreover, both the ML and Bayesian estimation methods confirm that the LN-Pareto fits the US natural disaster data better than the other two composite models, and our simulation study has verified that the Bayesian estimation method for the LN-Pareto distribution is more accurate than ML estimation; therefore, we put more weight on the results from the Bayesian estimated LN-Pareto model, highlighted in bold in Table 7.

6. Conclusions

In this paper, we propose using composite distributions to model natural disaster losses. A composite model captures, in a piecewise fashion, the typical feature of insurance losses, that is, high frequency of small losses and low frequency of large losses. We use the US natural disaster data from 1980 to 2016, taking into account the change in natural disaster occurrence after 1980 due to climate change.
There are a total of 462 natural disasters during 1980 to 2016. After converting the amounts of losses into year 2016 dollars, we test the assumption that the natural disaster severity random variables in different years are independent and identically distributed. Our tests support the i.i.d. assumption, so we are able to use all the natural losses as realizations of a single natural disaster severity random variable.
Based on the sample of 462 natural losses, we compare the performance of three composite distributions, namely the Exp-Pareto, IG-Pareto, and LN-Pareto distributions, in modeling the natural losses in the US. Using the ML estimation method, we find that the composite distributions fit the natural disaster losses better than the corresponding non-composite distributions according to the NLL, AIC, and BIC measures. In addition, we find that the LN-Pareto model is the best among the three composite distributions.
We make the first attempt to derive analytical Bayesian estimates for the LN-Pareto model. Simulation studies are conducted to assess the performance of the derived Bayesian estimates. The MSE values from the simulations show that the analytical Bayesian estimation performs better than the ML estimation method for the LN-Pareto distribution over a range of parameter values. The simulation study also reveals that the Bayesian method is superior to ML in particular when the sample size is not very large.
In this research, the MCMC method is not needed because we derived closed-form Bayesian estimates for the LN-Pareto model. Based on the analytical Bayesian estimates of the three composite models, it is confirmed that the LN-Pareto composite model is the best fit to the natural disaster losses. Bayesian estimation is also shown to perform better than ML estimation according to the NLL, AIC, and BIC values.
Several risk measures for natural losses based on these three composite models are thereafter presented and compared. The differences in the derived risk measures from different composite distributions and different estimation methods reveal the importance of choosing an appropriate composite model for modeling natural losses and the difficulty in estimating the model.
Our research provides alternative information for insurance and risk management of natural disasters. We acknowledge the sparseness of natural disaster data: there are only 462 individual natural losses in the past 37 years, and we rely on the CPI data to convert the losses so that they are consistent and free of the effect of price inflation over the years. In future research, we will continue to investigate the features of composite models and explore Bayesian model selection in predictive analysis of natural losses.

Author Contributions

Conceptualization, M.D., M.A. and M.J.; Methodology, M.D. and M.A.; Software, M.D. and M.A.; Validation, M.A.; Formal analysis, M.D. and M.A.; Investigations, M.D. and M.A.; Resources, M.D.; Data curation, M.D.; writing—original draft preparation, M.J.; writing—review and editing, M.D. and M.A.; visualization, M.D. and M.J. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Number and Loss Amounts of Natural Events from 1900 to 2016

Table A1. The number of occurrences of natural events from 1900 to 2016, broken down by time period and type of event.

Type | 1900 to 1980 | 1980 to 2016 | 1900 to 2016 | % of All Events
Drought | 1 (9.09%) | 10 (90.91%) | 11 | 4.26%
Earthquake | 14 (43.75%) | 18 (56.25%) | 32 | 12.40%
Epidemic | 0 (0.00%) | 4 (100.00%) | 4 | 1.55%
Extreme temperature | 9 (29.03%) | 22 (70.97%) | 31 | 12.02%
Flood | 17 (32.69%) | 35 (67.31%) | 52 | 20.16%
Landslide | 2 (40.00%) | 3 (60.00%) | 5 | 1.94%
Storm | 54 (58.70%) | 38 (41.30%) | 92 | 35.66%
Volcanic activity | 0 (0.00%) | 1 (100.00%) | 1 | 0.39%
Wildfire | 5 (16.67%) | 25 (83.33%) | 30 | 11.63%
Total | 102 (39.53%) | 156 (60.47%) | 258 | 100.00%
Table A2. The CPI adjusted total damage ('0000000 US$) from 1900 to 2016, broken down by time period and type of event.

Type | 1900 to 1979 | 1980 to 2016 | 1900 to 2016 | % of All Damages
Drought | 0 (0.00%) | 4371.623471 (100.00%) | 4371.623471 | 3.62%
Earthquake | 1521.333617 (19.08%) | 6454.101764 (80.92%) | 7975.435381 | 6.60%
Epidemic | 0 (0.00%) | 0 (0.00%) | 0 | 0.00%
Extreme temperature | 1633.100124 (43.24%) | 2143.756251 (56.76%) | 3776.856375 | 3.13%
Flood | 4260.932883 (30.94%) | 9511.711943 (69.06%) | 13,772.64483 | 11.40%
Landslide | 0 (0.00%) | 2.027634158 (100.00%) | 2.027634158 | 0.00%
Storm | 10,143.84049 (11.57%) | 77,499.01392 (88.43%) | 87,642.85441 | 72.54%
Volcanic activity | 0 (0.00%) | 250.4927427 (100.00%) | 250.4927427 | 0.21%
Wildfire | 253.0904446 (8.34%) | 2782.634167 (91.66%) | 3035.724612 | 2.51%
Total | 17,812.29756 (14.74%) | 103,015.3619 (85.26%) | 120,827.6595 | 100.00%

Appendix B. Damage Losses from Natural Events from 1980 to 2016

Table A3. Individual damage losses from natural events ('000000 US$) from 1980 to 2016. Each row lists the year, the number of losses in that year, and the individual loss amounts (1st loss, 2nd loss, 3rd loss, and so on).
198061019.4587.3858.252504.935825.412504.93
198121056.141217.20
19828572.042487.1284.3199.48248.71104.46497.421243.56
198397229.132409.7174.7015.061265.10722.91240.97313.2636.15
1984101683.0580.852309.9869.3046.2039.2780.85265.6569.301385.99
1985102453.602007.493345.8226.99758.391784.44446.1122.31516.590.89
198651.583832.2387.5965.7054.75
19877450.018.45242.9633.80122.5421.1310.56
19880
1989413,548.7810,839.03735.51967.77
1990623.3273.45183.63918.1664.2782.63
1991959.032643.2552.864405.4152.861762.171497.841762.17590.33
19928128.30171.07145.418553.355132.01153.96171.0745,332.75
19937315.58207.62166.098304.7419,931.381660.9512.46
19947161.953.24404.8748,584.411133.64809.743.40
199512196.863307.184724.551330.751102.393149.704724.55157.4815.7515.75
3149.704724.55
199661070.785200.9213.002294.52764.8430.59
199717269.173.742.99366.3774.77373.84224.31299.07747.69149.54
224.31299.0789.72747.69747.692243.067476.85
1998252061.412945.611472.44406.39690.576294.66220.87147.2488.356.63
73.620.8892.031472.442.941774.21544.80663.3392.03295.22
2208.65736.22397.5692.03295.22
199918648.28144.063976.11288.841440.62216.09132.54100.8410,084.33288.84
144.061440.6210.0890.04288.120.431584.68432.19
200016627.20292.692090.65139.3839.7211.291393.77231.3769.69125.44
305.2413.9427.88487.821533.15696.88
200112338.8031.172.448131.2417.6227.1013.5540.665.429.49
4.072710.41
200216533.65267.496.6717.345.342935.0526.68267.491334.1126.68
400.23933.882668.23601.028.814402.57
2003136521.9332.615217.54138.264395.78260.88521.7522.1765.224565.35
4.432739.21260.88
200414381.175.721397.61889.3976.230.2220,328.8179.4113,976.0622,869.91
10,164.402.671.27635.28
200511307.23245.7836.87430.126.55430.122740.4819,662.6317,573.48301.08
122.89
2006201428.61714.311904.82308.3414.29535.73101.191190.518.3319.05
119.0518.4539.1229.76113.10357.1529.76178.58107.15428.58
20071532.41578.77810.28150.482893.85364.631157.54578.77162.06405.14
2315.08347.26347.26694.52347.26
200816501.632.23780.321783.58113.701337.69200.6533,442.221449.162229.48
668.841114.741226.2111,147.41122.62401.31
200912185.711901.83111.87268.492796.80559.361230.59671.232237.441118.72
950.911678.08
201072586.572971.8013.762201.33110.071651.00550.33
2011142133.97195.2611,736.8614,937.822027.287789.011066.993200.96800.243734.45
4908.14213.402133.978535.90
201222182.941620.301881.64219.52181.89627.212090.7152,267.702.0952.27
4181.42209.07522.685226.774704.09104.543554.201463.501986.17731.75
219.5220,907.08
2013261648.421133.29309.083193.82309.082163.5522.05515.13927.242.06
25.7625.762.06334.84180.30309.081957.5010.301339.342.06
206.05103.032266.58103.03103.031133.29
2014192.032027.63101.383953.89273.7366.911622.11709.67172.35101.38
253.4591.24212.90253.451622.11760.362534.542230.4020.28
201528172.14506.311417.66961.981012.62162.021417.662734.06658.20101.26
81.012.03708.83101.26961.98151.891417.662.031721.45101.26
273.41141.77911.35607.57405.05151.893037.851822.71
201625550.00125.003900.002000.002400.001000.001100.00300.001000.00150.00
50.0010,000.00100.00600.00550.0010,000.001200.00275.0020.001200.00
100.002300.001600.001200.002300.00

References

  1. Akaike, Hirotogu. 1973. Information theory and an extension of the maximum likelihood principle. In Proceedings of the 2nd International Symposium on Information Theory, Tsahkadsor, Armenia, USSR, 2–8 September 1971. Edited by B. N. Petrov and F. Csaki. Budapest: Akademiai Kiado, pp. 267–81.
  2. Aminzadeh, Mostafa S., and Min Deng. 2018. Bayesian Predictive Modeling for Exponential-Pareto Composite Distribution. Variance 12: 59–68.
  3. Aminzadeh, Mostafa S., and Min Deng. 2019. Bayesian Predictive Modeling for Inverse Gamma-Pareto Composite Distribution. Communications in Statistics-Theory and Methods 48: 1938–54.
  4. Bakar, S. A. Abu, Nor A. Hamzah, Mastoureh Maghsoudi, and Saralees Nadarajah. 2015. Modeling loss data using composite models. Insurance: Mathematics and Economics 61: 146–54.
  5. Burnham, Kenneth P., and David R. Anderson. 2002. Model Selection and Multi-Model Inference: A Practical Information-Theoretic Approach, 2nd ed. New York: Springer.
  6. Cooray, Kahadawala, and Chin-I Cheng. 2015. Bayesian estimators of the lognormal-Pareto composite distribution. Scandinavian Actuarial Journal 6: 500–15.
  7. Cooray, Kahadawala, and Malwane M. A. Ananda. 2005. Modeling actuarial data with a composite lognormal-Pareto model. Scandinavian Actuarial Journal 5: 321–34.
  8. Gibbons, Jean Dickinson, and Subhabrata Chakraborti. 2003. Nonparametric Statistical Inference, 4th ed., Revised and Expanded. Statistics Textbooks and Monographs, vol. 168. Boca Raton: CRC Press.
  9. Kass, Robert E., and Adrian E. Raftery. 1995. Bayes Factors. Journal of the American Statistical Association 90: 773–95.
  10. Levi, Charles, and Christian Partrat. 1991. Statistical Analysis of Natural Events in the United States. ASTIN Bulletin: The Journal of the IAA 21: 253–76.
  11. Schwarz, Gideon. 1978. Estimating the dimension of a model. The Annals of Statistics 6: 461–64.
  12. Teodorescu, Sandra, and Raluca Vernic. 2006. A composite Exponential-Pareto distribution. The Annals of the Ovidius University of Constanta, Mathematics Series 1: 99–108.
Note: The CPI data were downloaded from the Bureau of Labor Statistics, https://data.bls.gov/pdq/SurveyOutputServlet.
Figure 1. The number and damage costs of natural events from 1900 to 2016 in the US. (a) The number of natural events; (b) Total damage in 2016 dollars ('0000000 US$).
Figure 2. The percentage of the number of occurrence and damage costs of natural disasters to the total occurrence and damage losses by different types of natural events before and after 1980 in the US. (a) Percent of the number of occurrence; (b) Percent of the damage costs.
Figure 3. The scatter plot of individual natural disaster losses after 1980 in the US.
Figure 4. The histogram and the empirical distribution with 95% confidence bands of the CPI adjusted natural losses from 1980 to 2016 in the US. (a) The histogram of the CPI adjusted natural losses; (b) The empirical distribution with 95% confidence bands of the CPI adjusted natural losses.
Figure 5. The cumulative distribution functions of three composite models with various parameter values.
Table 1. Pairs of severity losses with significant Kendall and Spearman test results.

Pairs | Kendall Tau Statistic (p-Value) | Spearman Statistic (p-Value)
$X_1, X_5$ | 0.373 (0.002) | 0.546 (0.001)
$X_7, X_{21}$ | 0.393 (0.0051) | 0.566 (0.003)
$X_9, X_{16}$ | 0.575 (0.009) | 0.769 (0.0034)
Table 2. The density functions of the three composite models.

Exp-Pareto: $f_1(x) = \lambda e^{-\lambda x}$, $x > 0$, $\lambda > 0$; $f_2(x) = \dfrac{\alpha \theta^{\alpha}}{x^{\alpha+1}}$, $x \ge \theta > 0$, $\alpha > 0$. The composite pdf is
$$ f(x) = \begin{cases} \dfrac{0.775}{\theta}\, e^{-1.35\, x/\theta} & 0 < x \le \theta \\[1ex] \dfrac{0.2\, \theta^{0.35}}{x^{1.35}} & \theta \le x < \infty. \end{cases} $$

IG-Pareto: $f_1(x) = \dfrac{\beta^{\alpha} x^{-\alpha-1} e^{-\beta/x}}{\Gamma(\alpha)}$, $x > 0$, $\alpha > 0$, $\beta > 0$; $f_2(x) = \dfrac{a \theta^{a}}{x^{a+1}}$, $x \ge \theta$, $a > 0$, $\theta > 0$. The composite pdf is
$$ f(x) = \begin{cases} \dfrac{c\, (k\theta)^{\alpha}\, x^{-\alpha-1}\, e^{-k\theta/x}}{\Gamma(\alpha)} & 0 < x \le \theta \\[1ex] \dfrac{c\, (\alpha - k)\, \theta^{\alpha - k}}{x^{\alpha - k + 1}} & \theta \le x < \infty, \end{cases} $$
where $\alpha = 0.308289$, $k = 0.144351$, $c = 0.711384$.

LN-Pareto: $f_1(x) = \dfrac{1}{x \sigma \sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{\ln x - \mu}{\sigma}\right)^2}$, $x > 0$, $\sigma > 0$; $f_2(x) = \dfrac{\alpha \theta^{\alpha}}{x^{\alpha+1}}$, $x \ge \theta$, $\alpha > 0$, $\theta > 0$. The composite pdf is
$$ f(x) = \begin{cases} \dfrac{\beta \theta^{\beta}\, e^{-0.5 (\beta/k)^2 \ln^2(x/\theta)}}{(1 + \Phi(k))\, x^{\beta+1}} & 0 < x \le \theta \\[1ex] \dfrac{\beta \theta^{\beta}}{(1 + \Phi(k))\, x^{\beta+1}} & \theta \le x < \infty, \end{cases} $$
where $k = 0.372238898$.
Table 3. The ML estimators of the unknown parameters in the three composite models.

Exp-Pareto: $\hat{\theta} = \dfrac{1.35\, n^* \bar{x}_{n^*}}{1.35\, n^* - 0.35\, n}$, where $\bar{x}_{n^*} = \frac{1}{n^*} \sum_{i=1}^{n^*} x_i$.

IG-Pareto: $\hat{\theta} = \dfrac{n^* \alpha + (\alpha - k)(n - n^*)}{k S}$, where $S = \sum_{i=1}^{n^*} x_i^{-1}$, $\alpha = 0.308289$, $k = 0.144351$.

LN-Pareto: If $n^* = 1$,
$$ \hat{\theta} = x_1 \left( \prod_{i=1}^{n} x_i / x_1 \right)^{k^2}, \qquad \hat{\beta} = n \left( \sum_{i=1}^{n} \ln(x_i / x_1) \right)^{-1}, \quad \text{where } k = 0.372238898; $$
otherwise,
$$ \hat{\theta} = \exp\!\left( \frac{n k^2}{n^* \hat{\beta}} \right) \left( \prod_{i=1}^{n^*} x_i \right)^{\frac{1}{n^*}}, \qquad \hat{\beta} = \frac{k^2 B + \sqrt{k^4 B^2 + 4\, n^* n\, k^2 A}}{2A}, $$
where $A = n^* \sum_{i=1}^{n^*} (\ln x_i)^2 - \left( \sum_{i=1}^{n^*} \ln x_i \right)^2$ and $B = n \sum_{i=1}^{n^*} \ln x_i - n^* \sum_{i=1}^{n} \ln x_i$.
Table 4. The ML estimates of the three composite models and three non-composite models for the severity of natural events in the US.

Model | ML Estimates of Parameters | NLL | AIC | BIC
Exp-Pareto | $\hat{\theta} = 25.561$, $n^* = 189$ | 2698.02 | 5398.05 | 5402.19
IG-Pareto | $\hat{\theta} = 2.86262$, $n^* = 64$ | 2719.35 | 5440.7 | 5444.83
LN-Pareto | $\hat{\theta} = 20.94751406$, $\hat{\beta} = 0.22044516$, $n^* = 174$ | 2327.74 | 4659.49 | 4667.76
Exponential, $X \sim$ Exp($\lambda$) | $\hat{\lambda} = 5.32 \times 10^{-3}$ | 2881.23 | 5764.47 | 5768.61
Inverse Gamma, $X \sim$ IG($\alpha$, $\beta$) | $\hat{\alpha} = 0.254669$, $\hat{\beta} = 0.527914$ | 2813.29 | 5630.58 | 5638.85
Lognormal, $X \sim$ LN($\mu$, $\sigma$) | $\hat{\mu} = 3.50972$, $\hat{\sigma} = 2.1659785$ | 3058.66 | 6121.31 | 6129.58
Table 5. Comparison of the ML and Bayesian estimates. The columns are $n$, $\bar{\hat{\theta}}_{ML}$, $\epsilon_{\theta}^{ML}$, $\bar{\hat{\beta}}_{ML}$, $\epsilon_{\beta}^{ML}$, $\bar{\hat{\theta}}_{Bayes}$, $\epsilon_{\theta}^{Bayes}$, $\bar{\hat{\beta}}_{Bayes}$, $\epsilon_{\beta}^{Bayes}$ in each block.

$\theta = 5$, $\beta = 0.5$ ($a_1 = 100$, $b_1 = 0.005$, $c_1 = 1.33231$)
20 | 7.8604 | 4.0640 | 0.4747 | 0.1611 | 5.6933 | 2.0959 | 0.4641 | 0.0402
50 | 7.8942 | 3.8006 | 0.4435 | 0.1121 | 5.7941 | 1.2265 | 0.4338 | 0.0688
100 | 7.6421 | 3.6415 | 0.4333 | 0.1056 | 5.8745 | 1.1874 | 0.4058 | 0.0960

$\theta = 20$, $\beta = 0.5$ ($a_1 = 560$, $b_1 = 0.001$, $c_1 = 2.71861$)
20 | 21.8966 | 12.2540 | 0.5878 | 0.2626 | 20.5062 | 5.5390 | 0.5341 | 0.0345
50 | 19.9977 | 4.9775 | 0.5738 | 0.2431 | 20.5262 | 3.3892 | 0.5025 | 0.0065
100 | 19.5730 | 3.6691 | 0.5467 | 0.1830 | 21.2182 | 3.0585 | 0.4629 | 0.0379

$\theta = 5$, $\beta = 1.5$ ($a_1 = 3000$, $b_1 = 0.0005$, $c_1 = 1.57865$)
20 | 5.342 | 1.2443 | 1.510 | 0.3811 | 5.076 | 0.3877 | 1.480 | 0.02047
50 | 5.041 | 0.5606 | 1.528 | 0.2181 | 5.038 | 0.2526 | 1.451 | 0.0492
100 | 5.035 | 0.2344 | 1.516 | 0.1389 | 5.111 | 0.2196 | 1.406 | 0.0941

$\theta = 20$, $\beta = 1.5$ ($a_1 = 3000$, $b_1 = 0.0005$, $c_1 = 2.96494$)
20 | 20.698 | 3.3162 | 1.560 | 0.4157 | 20.556 | 1.8798 | 1.460 | 0.04030
50 | 20.763 | 3.0164 | 1.468 | 0.2479 | 20.460 | 1.0797 | 1.404 | 0.09601
100 | 20.218 | 2.6237 | 1.466 | 0.2839 | 20.637 | 0.9563 | 1.324 | 0.1762
Table 6. Bayesian estimates of the three composite models for the severity of natural events in the US.

Model | Prior Distributions | Bayesian Estimates | NLL | AIC | BIC
Exp-Pareto | $\theta \sim$ Inverse-Gamma(10, 5) | $\hat{\theta} = 23.1451$, $n^* = 183$ | 2697.57 | 5397.13 | 5401.27
IG-Pareto | $\theta \sim$ Gamma(50, 1) | $\hat{\theta} = 4.3818$, $n^* = 77$ | 2699.17 | 5400.34 | 5404.48
LN-Pareto | $\theta \sim$ LN(1.61352, 2.857), $\beta \sim$ Gamma(20500, $1.1 \times 10^{-6}$) | $\hat{\theta} = 19.2316$, $\hat{\beta} = 0.220173$, $n^* = 168$ | 2327.63 | 4659.27 | 4667.54
Table 7. Risk measures (in '0000000 US$) of natural events' severity from the three composite models.

Estimation | Model | $VaR_{0.85}(X)$ | $LTVaR_{0.85}(X)$
Bayesian Estimation | Exp-Pareto | 1073 | 30,723
Bayesian Estimation | IG-Pareto | 58,212 | 98,041
Bayesian Estimation | LN-Pareto | 11,070 | 75,860
ML Estimation | Exp-Pareto | 1185 | 31,770
ML Estimation | IG-Pareto | 38,029 | 94,619
ML Estimation | LN-Pareto | 11,963 | 76,946