Article

Diagnostic Analytics for an Autoregressive Model under the Skew-Normal Distribution

1 School of Statistics and Information, Shanghai University of International Business and Economics, Shanghai 201620, China
2 School of Mathematics, Shanghai University of Finance and Economics, Shanghai 200433, China
3 School of Industrial Engineering, Pontificia Universidad Católica de Valparaíso, Valparaíso 2362807, Chile
4 Faculty of Science and Technology, University of Canberra, Bruce, ACT 2617, Australia
5 School of Engineering in Statistics, Universidad Católica del Maule, Talca 3466706, Chile
* Author to whom correspondence should be addressed.
Submission received: 19 March 2020 / Revised: 20 April 2020 / Accepted: 22 April 2020 / Published: 2 May 2020
(This article belongs to the Special Issue Statistical Simulation and Computation)

Abstract: Autoregressive models have played an important role in time series. In this paper, an autoregressive model based on the skew-normal distribution is considered. The estimation of its parameters is carried out by using the expectation–maximization algorithm, whereas the diagnostic analytics are conducted by means of the local influence method. Normal curvatures for the model under four perturbation schemes are established. Simulation studies are conducted to evaluate the performance of the proposed procedure. In addition, an empirical example involving weekly financial return data is analyzed using the procedure with the proposed diagnostic analytics, which improves the model fit.

1. Introduction

For time series data analysis, autoregressive (AR) modeling is an essential technique and has been applied in many areas including biology, chemistry, earth sciences, economics, education, engineering, finance, health, medicine, and physics; see, for example, [1,2] for recent accounts of time series modeling and applications. Issues related to estimation and testing for AR models are extensive and well-established; see [3,4] for related work on statistical diagnostics, which are of equal importance. Local influence diagnostics is the study of how relevant minor perturbations impact the fit of the model and the results of statistical inference. This has become a useful statistical methodology after [5] introduced the idea of local influence to aid in the identification of potentially influential observations. Diagnostic analytics is used in a number of regression and time series models. Among others, Refs. [6,7,8,9,10,11,12,13,14,15] investigated the local influence of linear or nonlinear regression models under non-normal distributional assumptions. In a framework of time series data, Refs. [16,17,18] considered influence diagnostics for AR and vector AR models under normal or elliptical distributions.
A standard assumption for time series models is that their errors are mutually independent and follow normal or symmetric distributions, as studied in [16,17,18]. However, it is known that certain financial and other datasets feature errors with skewed distributions. In order to deal with such data, the skew-normal (SN) distribution and its scale-mixtures have provided an appealing alternative and can therefore be adopted. Their properties, extensions, and applications are becoming increasingly popular; see [19,20,21,22,23,24,25,26]. In addition, Refs. [27,28,29] studied SN linear, linear mixed, and nonlinear models. Ref. [8] analyzed diagnostics in the nonlinear model with scale mixtures of SN and AR errors. For particular financial applications, Ref. [30] looked at asset pricing issues with return data following SN models. To our knowledge, no study on influence diagnostics in AR time series models under SN distributions has been reported. Therefore, the objective of the present paper is to formulate an AR model under the SN distribution (SN AR model) and to derive diagnostic analytics with applications to financial data. We use the matrix differential calculus pioneered by [31] to establish the mathematical results used in our data analysis. We implement the maximum likelihood (ML) method with the expectation–maximization (EM) algorithm to estimate the SN AR model parameters, whereas the local influence method with four perturbation schemes is used for the diagnostic analytics. The EM algorithm has now become a popular iterative technique for the ML estimation method with incomplete data; see [32,33]. The paper proceeds as follows. In Section 2, the SN AR model is introduced and the ML estimates of the model parameters are derived. Section 3 presents the local influence method and establishes normal curvatures under four perturbation schemes.
In Section 4, a simulation study and an empirical example involving an AR model are presented, while the effectiveness of the proposed diagnostics is illustrated and discussed. Our concluding remarks are addressed in Section 5. The derivations of the normal curvatures are presented in the Appendix A.

2. An SN AR Model and its Parameter Estimation

2.1. The SN AR Model

A random variable $Y$ is said to follow an SN distribution with location parameter $\mu$, scale parameter $\sigma^2$, and skewness parameter $\lambda$, which is denoted by $Y \sim \mathrm{SN}(\mu, \sigma^2, \lambda)$, if its probability density function is given by
\[
f(y) = \frac{2}{\sigma}\,\phi\!\left(\frac{y-\mu}{\sigma}\right)\Phi\!\left(\lambda\,\frac{y-\mu}{\sigma}\right), \quad y \in \mathbb{R},\ \mu \in \mathbb{R},\ \sigma > 0,\ \lambda \in \mathbb{R},
\tag{1}
\]
where $\phi$ and $\Phi$ are the standard normal probability density and cumulative distribution functions, respectively. We see that, if $\lambda = 0$, then the probability density function of $Y$ defined in (1) reduces to the normal probability density function. If $Y \sim \mathrm{SN}(\mu, \sigma^2, \lambda)$, then $\mathrm{E}(Y) = \mu + \sigma\delta\sqrt{2/\pi}$ and $\mathrm{Var}(Y) = \sigma^2 - (2/\pi)\sigma^2\delta^2$, where $\delta = \lambda/\sqrt{1+\lambda^2}$.
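As a quick numerical sanity check (ours, not from the paper), the density in (1) can be coded with the standard library and the moment formulas verified by simple Riemann-sum integration; the parameter values below are arbitrary illustrations:

```python
import math

def sn_pdf(y, mu=0.0, sigma=1.0, lam=0.0):
    """Skew-normal density (1): f(y) = (2/sigma) * phi(z) * Phi(lambda * z), z = (y - mu)/sigma."""
    z = (y - mu) / sigma
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    Phi = 0.5 * (1.0 + math.erf(lam * z / math.sqrt(2.0)))
    return (2.0 / sigma) * phi * Phi

# Check E(Y) and Var(Y) against the closed forms by numerical integration.
mu, sigma, lam = 1.0, 2.0, 3.0
delta = lam / math.sqrt(1.0 + lam * lam)
h = 0.001
grid = [mu - 10 * sigma + i * h for i in range(int(20 * sigma / h))]
m0 = sum(sn_pdf(y, mu, sigma, lam) * h for y in grid)       # total mass, ~1
m1 = sum(y * sn_pdf(y, mu, sigma, lam) * h for y in grid)   # ~E(Y)
m2 = sum(y * y * sn_pdf(y, mu, sigma, lam) * h for y in grid)
mean_formula = mu + sigma * delta * math.sqrt(2.0 / math.pi)
var_formula = sigma**2 - (2.0 / math.pi) * sigma**2 * delta**2
print(m0, m1 - mean_formula, (m2 - m1 * m1) - var_formula)
```

The integration errors are tiny because the density vanishes at both ends of the grid.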
Let $Y_t$ be generated by a stationary AR($p$) process given by
\[
Y_t = \beta_1 y_{t-1} + \cdots + \beta_p y_{t-p} + u_t = x_t^\top \beta + u_t, \quad t = 1, \ldots, T,
\tag{2}
\]
where $Y_t$ is a time series, with $Y_1, \ldots, Y_p$ being the $p$ initial values for $y_{t-j}$, such that $x_t = (y_{t-1}, \ldots, y_{t-p})^\top$ is the $p \times 1$ vector of SN AR($p$) covariates; $\beta_j$ is a regression coefficient, for $j = 1, \ldots, p$, such that $\beta = (\beta_1, \ldots, \beta_p)^\top$ is a $p \times 1$ regression coefficient vector; and $u_t$ is the scalar error term, which follows an SN distribution, that is, $u_t \sim \mathrm{SN}(0, \sigma^2, \lambda)$, where $\sigma^2$ is the scale parameter and $\lambda$ is the skewness parameter. Thus, $\theta = (\beta^\top, \sigma^2, \lambda)^\top$ is the $(p+2) \times 1$ vector of SN AR($p$) model parameters. Note that we do not include an intercept in the SN model above. Although the underlying normal distribution is assumed to have mean zero, the expected value of $u_t$ is not zero; this non-zero expected value serves as an approximate intercept for the SN model.

2.2. ML Estimation

Finding the ML estimate of the parameter vector $\theta$ by direct maximization of the log-likelihood can potentially be a difficult task, so we implement the EM algorithm for this estimation. Let $y_c = (y_o, y_m)$ denote the complete data, with $y_o$ being the observed data and $y_m$ the missing data. We use the notation $Y_c$, $Y_o$, $Y_m$ for the random vectors associated with $y_c$, $y_o$, and $y_m$, respectively. Starting from an initial estimate $\theta^{(0)}$, we obtain the sequence $\theta^{(0)}, \theta^{(1)}, \ldots$ by running the two steps of the EM algorithm iteratively as defined below.
In order to implement the EM algorithm, consider the stochastic representation given by
\[
Y = \mu + \sigma\delta H + \sigma\sqrt{1-\delta^2}\, H_1,
\tag{3}
\]
where $H = |H_0| \sim \mathrm{HN}(0, 1)$, with HN being used to indicate the half-normal distribution. In addition, both $H_0$ and $H_1$ are independent random variables which follow a standard normal distribution. Note that, from (3), we have $Y \mid H = h \sim \mathrm{N}\!\left(\mu + h\lambda\sigma/\sqrt{1+\lambda^2},\ \sigma^2/(1+\lambda^2)\right)$ and $H \sim \mathrm{HN}(0, 1)$. Hence, by considering $y_o = (y_{p+1}, \ldots, y_T)$, $y_m = (h_{p+1}, \ldots, h_T)$, and $y_c = (y_o, y_m)$ as the observed, missing, and complete data sets, respectively, we have the complete-data log-likelihood function for $\theta = (\beta^\top, \sigma^2, \lambda)^\top$ given by
\[
\ell_c(\theta; y_c) = \sum_{t=p+1}^{T}\left[-\frac{1}{2}\log(\sigma^2) + \frac{1}{2}\log(1+\lambda^2) - \frac{1+\lambda^2}{2\sigma^2}\left(y_t - x_t^\top\beta - \frac{\lambda\sigma}{\sqrt{1+\lambda^2}}\, h_t\right)^2\right].
\tag{4}
\]
Therefore, for the E-step of the EM algorithm, given the current estimate $\hat\theta^{(k)}$ and based on (4), we can calculate the function $Q(\theta)\big|_{\theta=\hat\theta^{(k)}} = \mathrm{E}[\ell_c(\theta; Y_c) \mid Y_o = y_o]\big|_{\theta=\hat\theta^{(k)}}$ as
\[
Q(\theta) = -\frac{T-p}{2}\log(\sigma^2) + \frac{T-p}{2}\log(1+\lambda^2) - \frac{1+\lambda^2}{2}\sum_{t=p+1}^{T}\left(\frac{y_t - x_t^\top\beta}{\sigma} - \frac{\lambda}{\sqrt{1+\lambda^2}}\,\hat c_t\right)^2 - \frac{\lambda^2}{2}\sum_{t=p+1}^{T}\left(\widehat{c_t^2} - \hat c_t^{\,2}\right),
\tag{5}
\]
where $\hat c_t = \mathrm{E}(H_t \mid Y_o = y_o)\big|_{\theta=\hat\theta^{(k)}} = \tau_1 + \tau_2\,\phi(\tau_1/\tau_2)/\Phi(\tau_1/\tau_2)$, $\widehat{c_t^2} = \mathrm{E}(H_t^2 \mid Y_o = y_o)\big|_{\theta=\hat\theta^{(k)}} = \tau_1^2 + \tau_2^2 + \tau_1\tau_2\,\phi(\tau_1/\tau_2)/\Phi(\tau_1/\tau_2)$, with $\tau_1 = (\hat\lambda^{(k)}/\hat\sigma^{(k)})\,(1+(\hat\lambda^{(k)})^2)^{-1/2}(y_t - x_t^\top\hat\beta^{(k)})$ and $\tau_2 = (1+(\hat\lambda^{(k)})^2)^{-1/2}$.
Note that the equations in (4) and (5) are obtained from the probability density function given in (1) and after calculating the expected value of (4). In addition, $\widehat{c_t^2}$ is different from $\hat c_t^{\,2}$. For the M-step, we update $\hat\theta^{(k)}$ by the Newton–Raphson iteration $\dot Q(\hat\theta^{(k+1)}) = \dot Q(\hat\theta^{(k)}) + \ddot Q(\hat\theta^{(k)})(\hat\theta^{(k+1)} - \hat\theta^{(k)}) + o(\|\hat\theta^{(k+1)} - \hat\theta^{(k)}\|)$, where $\dot Q$ denotes the gradient vector and $\ddot Q$ denotes the Hessian matrix. As is known, a suitable initial value $\hat\theta^{(0)}$ is important and often difficult to find in numerical computation. We take $\hat\theta^{(0)} = (\hat\beta^{(0)\top}, \hat\sigma^{2(0)}, \hat\lambda^{(0)})^\top$, with $\hat\beta^{(0)}$ being the ordinary least squares (OLS) estimate, from which $\hat\sigma^{2(0)}$ and $\hat\lambda^{(0)}$ can then be computed. The iterations are repeated until $\|\hat\theta^{(k+1)} - \hat\theta^{(k)}\| < 10^{-5}$, and the resulting value is taken as the ML estimate $\hat\theta = (\hat\beta^\top, \hat\sigma^2, \hat\lambda)^\top$.
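The E-step quantities $\hat c_t$ and $\widehat{c_t^2}$ defined below (5) can be sketched in a few lines of Python; this is our own minimal illustration (the helper names are not from the paper), written for an SN AR residual series:

```python
import math

def phi(x):  # standard normal density
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):  # standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def e_step(y, x_beta, sigma, lam):
    """E-step quantities c_hat_t = E(H_t | y) and c2_hat_t = E(H_t^2 | y),
    using the truncated-normal moment formulas stated below (5)."""
    c_hat, c2_hat = [], []
    tau2 = 1.0 / math.sqrt(1.0 + lam * lam)
    for yt, xb in zip(y, x_beta):
        u = yt - xb                                 # residual y_t - x_t' beta
        tau1 = lam / (sigma * math.sqrt(1.0 + lam * lam)) * u
        r = phi(tau1 / tau2) / Phi(tau1 / tau2)     # inverse Mills ratio
        c_hat.append(tau1 + tau2 * r)
        c2_hat.append(tau1 ** 2 + tau2 ** 2 + tau1 * tau2 * r)
    return c_hat, c2_hat
```

For $\lambda = 0$ these reduce to the half-normal moments $\mathrm{E}(H) = \sqrt{2/\pi}$ and $\mathrm{E}(H^2) = 1$, a convenient correctness check.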

2.3. The Hessian Matrix

Next, we compute the Hessian matrix $\ddot Q(\theta)$ evaluated at the ML estimate $\hat\theta$ using
\[
\ell_c(\theta; y_c) = \sum_{t=p+1}^{T}\left[-\frac{1}{2}\log(\sigma^2) + \frac{1}{2}\log(1+\lambda^2) - \frac{1+\lambda^2}{2\sigma^2}\left(u_t - \frac{\lambda\sigma}{\sqrt{1+\lambda^2}}\, h_t\right)^2\right].
\tag{6}
\]
Then, based on (6), the $Q$ function is given by
\[
Q(\theta)\big|_{\theta=\hat\theta} = \mathrm{E}[\ell_c(\theta; Y_c) \mid Y_o = y_o]\big|_{\theta=\hat\theta} = -\frac{T-p}{2}\log(\sigma^2) + \frac{T-p}{2}\log(1+\lambda^2) - \frac{1+\lambda^2}{2}\sum_{t=p+1}^{T}\left(\frac{u_t}{\sigma} - \frac{\lambda}{\sqrt{1+\lambda^2}}\,\hat c_t\right)^2 - \frac{\lambda^2}{2}\sum_{t=p+1}^{T}\left(\widehat{c_t^2} - \hat c_t^{\,2}\right).
\]
For the SN AR($p$) model, we have the $(p+2) \times (p+2)$ Hessian matrix $\ddot Q(\hat\theta)$ given by
\[
\ddot Q(\hat\theta) =
\begin{pmatrix}
\dfrac{\partial^2 Q(\theta)}{\partial\beta\,\partial\beta^\top} & \dfrac{\partial^2 Q(\theta)}{\partial\beta\,\partial\sigma^2} & \dfrac{\partial^2 Q(\theta)}{\partial\beta\,\partial\lambda} \\[8pt]
\dfrac{\partial^2 Q(\theta)}{\partial\sigma^2\,\partial\beta^\top} & \dfrac{\partial^2 Q(\theta)}{\partial(\sigma^2)^2} & \dfrac{\partial^2 Q(\theta)}{\partial\sigma^2\,\partial\lambda} \\[8pt]
\dfrac{\partial^2 Q(\theta)}{\partial\lambda\,\partial\beta^\top} & \dfrac{\partial^2 Q(\theta)}{\partial\lambda\,\partial\sigma^2} & \dfrac{\partial^2 Q(\theta)}{\partial\lambda^2}
\end{pmatrix}\Bigg|_{\theta=\hat\theta},
\tag{7}
\]
where $\hat\theta = (\hat\beta^\top, \hat\sigma^2, \hat\lambda)^\top$ is the ML estimate of $\theta$. The expression for $\partial^2 Q(\theta)/\partial\beta\,\partial\beta^\top$ and the other submatrices are presented in Appendix A.1.

3. Influence Analysis for the SN AR Model

3.1. Local Influence

Let $\ell(\theta)$ denote the log-likelihood function for the model given in (2), which is the postulated model, where $\theta$ is a $(p+2) \times 1$ vector of unknown parameters with ML estimate $\hat\theta$. Let $\omega = (\omega_1, \ldots, \omega_q)^\top$ denote a $q \times 1$ vector of perturbations confined to some open subset of $\mathbb{R}^q$, and let $\omega_0$ denote the $q \times 1$ non-perturbation vector, with $q$ being a suitable number of dimensions and $\omega_0 = (0, \ldots, 0)^\top$, or $\omega_0 = (1, \ldots, 1)^\top$, or a third choice, depending on the context. Then, $\ell(\theta)$ and $\ell(\theta \mid \omega)$ represent the log-likelihood functions of the postulated and perturbed models, respectively. Note that $\ell(\theta) = \ell(\theta \mid \omega_0)$.
We suppose that $\ell(\theta \mid \omega)$ is twice continuously differentiable in a neighborhood of $(\hat\theta, \omega_0)$. We are interested in comparing $\hat\theta$ and $\hat\theta_\omega$ using the local influence method, whose idea is to investigate the degree to which the inference is affected by minor changes in the corresponding perturbations. Ref. [5] used the likelihood displacement (LD) to assess the influence of the perturbation $\omega$, defined as $\mathrm{LD}(\omega) = 2(\ell(\hat\theta) - \ell(\hat\theta_\omega))$. Here, large values of $\mathrm{LD}(\omega)$ indicate that $\hat\theta$ and $\hat\theta_\omega$ differ considerably in relation to the contours of the non-perturbed log-likelihood function $\ell(\theta)$. The idea is based on studying the local behavior of $\mathrm{LD}(\omega)$ through the normal curvature $C_l(\theta)$ in a unit-length direction vector $l$, where $\|l\| = 1$. According to [5], the normal curvature used to examine the local influence of the perturbation vector at $\omega = \omega_0$ is $C_l(\theta) = 2\,|l^\top \ddot F l| = 2\,|l^\top(\Delta^\top \ddot\ell^{-1}\Delta)l|$, with
\[
\ddot F = \frac{\partial^2 \ell(\theta \mid \omega)}{\partial\omega\,\partial\omega^\top}, \quad
\Delta = \frac{\partial^2 \ell(\theta \mid \omega)}{\partial\theta\,\partial\omega^\top}, \quad
\ddot\ell = \frac{\partial^2 \ell(\theta)}{\partial\theta\,\partial\theta^\top},
\]
where $l$ is a $q \times 1$ vector of unit length, $\ddot\ell$ is the $(p+2) \times (p+2)$ observed information matrix for the postulated model, $\Delta$ is the $(p+2) \times q$ perturbation matrix for the perturbed model, and $\ddot\ell$ and $\Delta$ are evaluated at $\theta = \hat\theta$ and $\omega = \omega_0$. The suggestion is to carry out the local influence diagnostic analytics by finding the maximum curvature $C_{\max} = \max_{\|l\|=1} C_l$, where $C_{\max}$ corresponds to the largest absolute eigenvalue $\lambda_{\max}$, with associated eigenvector $l_{\max}$, of the matrix $\ddot F = \Delta^\top \ddot\ell^{-1}\Delta$. If the absolute value of the $i$th element of $l_{\max}$ is the largest, then the $i$th observation in the data may be the most influential. To examine the magnitude of influence, it is useful to have a benchmark value for $C_{\max}$ and for the elements of $l_{\max}$; see [10,18,34].
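The computation of $C_{\max}$ and $l_{\max}$ described above can be sketched generically as follows; this is our own hedged implementation of Cook's eigen-decomposition recipe, with hypothetical inputs rather than code from the paper:

```python
import numpy as np

def local_influence(ell_dd, Delta):
    """Largest normal curvature C_max = 2|lambda_max| and its direction l_max,
    from F_dd = Delta' ell_dd^{-1} Delta (Cook's local influence).
    ell_dd must be an invertible (p+2) x (p+2) Hessian; Delta is (p+2) x q."""
    F = Delta.T @ np.linalg.solve(ell_dd, Delta)    # q x q curvature matrix
    eigval, eigvec = np.linalg.eigh((F + F.T) / 2)  # symmetrize for numerical stability
    i = np.argmax(np.abs(eigval))                   # largest absolute eigenvalue
    C_max = 2.0 * abs(eigval[i])
    l_max = eigvec[:, i]                            # unit-length eigenvector
    return C_max, l_max
```

Observations corresponding to the largest entries of `np.abs(l_max)` are then flagged as potentially influential.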

3.2. Local Influence for the SN AR Model

Next, we conduct local influence diagnostic analytics for the SN AR($p$) model. Due to the complexity of the SN distribution, we obtain the ML estimates based on the EM algorithm. As suggested by [29,34], the $Q$ function and a $Q$-displacement function obtained similarly to the LD can be used to replace the log-likelihood function and the likelihood displacement in the method of [5], in order to assess the influence of the perturbation. Thus, the normal curvature is changed to
\[
C_l(\theta) = 2\,|l^\top \ddot F l| = 2\,|l^\top(\Delta^\top \ddot Q^{-1}\Delta)l|,
\]
with
\[
\ddot F = \frac{\partial^2 Q(\theta \mid \omega)}{\partial\omega\,\partial\omega^\top}, \quad
\Delta = \frac{\partial^2 Q(\theta \mid \omega)}{\partial\theta\,\partial\omega^\top}, \quad
\ddot Q = \frac{\partial^2 Q(\theta)}{\partial\theta\,\partial\theta^\top},
\tag{8}
\]
where $l$ is a $q \times 1$ vector of unit length, and $\ddot F$, $\ddot Q$, and $\Delta$ are $q \times q$, $(p+2) \times (p+2)$, and $(p+2) \times q$ matrices, respectively. In addition, $\ddot Q$ and $\Delta$ need to be evaluated at $\theta = \hat\theta$ and $\omega = \omega_0$.
The method examines the total local influence $C_t = C_{l_t}(\theta)$, where $l_t$ is a $q \times 1$ unit-length vector with one at the $t$th position and zeros elsewhere. We denote $S = \Delta^\top \ddot Q^{-1}\Delta$. Since $C_l(\theta)$ is not invariant under a uniform change of scale, Ref. [34] proposed the conformal normal curvature $B_l(\theta) = C_l(\theta)/(2\,\mathrm{tr}(S))$. An interesting property of the conformal normal curvature is that, for any unit-length direction $l$, we have $0 \le B_l(\theta) \le 1$, which allows comparison of curvatures among different models.
In order to determine whether the $t$th observation is influential, Ref. [34] proposed to classify the $t$th observation as possibly influential if $M(0)_t = B_{l_t}$ is greater than the benchmark $1/q + c\,\mathrm{SM}(0)$, where $\mathrm{SM}(0)$ is the sample standard error of the $M(0)_k$, for $k = 1, \ldots, q$, and $c$ is a certain constant. Depending on the specific application, $c$ may be taken to be a suitably selected positive value.
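The benchmark rule above can be sketched as follows. This is our own illustrative code, not from the paper; in particular, the function name and the use of absolute values of $\mathrm{tr}(S)$ and $S_{tt}$ (which keeps $M(0)_t$ in $[0, 1]$ when $S$ is negative definite) are our assumptions:

```python
import numpy as np

def influential_cases(Q_dd, Delta, c=3.0):
    """Conformal normal curvature M(0)_t = B_{l_t} = C_t / (2 tr(S)) for each case t,
    with C_t = 2|S_tt|, and the benchmark 1/q + c * SM(0).
    Returns the indices flagged as potentially influential."""
    S = Delta.T @ np.linalg.solve(Q_dd, Delta)      # S = Delta' Q_dd^{-1} Delta
    q = S.shape[0]
    M0 = np.abs(np.diag(S)) / np.abs(np.trace(S))   # conformal curvature per case
    bench = 1.0 / q + c * np.std(M0, ddof=1)        # SM(0): sample standard error
    return np.where(M0 > bench)[0], M0, bench
```

The constant `c` (e.g., 3 as used in the real-data analysis of Section 4.3) controls how conservative the flagging is.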
The forms given in (8) are used to obtain our normal curvature results under the four perturbation schemes, namely the case-weight, data, variance parameter, and skewness parameter schemes. The matrices $\ddot Q$ and $\Delta$ need to be established for each scheme.
We employ the matrix differential calculus proposed by [31] to establish these algebraic results in the following sections. Their derivations are presented in Appendix A.

3.3. Perturbation Matrices

3.3.1. Perturbation of Case-Weights

Assume that a minor perturbation is made on the SN AR($p$) model, with $y_t = x_t^\top\beta + u_t$ being replaced by $\omega_t y_t = \omega_t x_t^\top\beta + \omega_t u_t$, where $\omega_t$ is the weight. Let $\omega = (\omega_{p+1}, \ldots, \omega_T)^\top$ denote the $(T-p) \times 1$ perturbation vector and $\omega_0 = (1, \ldots, 1)^\top$ denote the $(T-p) \times 1$ non-perturbation vector. Then, the complete-data log-likelihood function of the perturbed model is given by
\[
\ell_c(\theta, \omega; y_c) = \sum_{t=p+1}^{T}\left[-\frac{1}{2}\log(\sigma^2) + \frac{1}{2}\log(1+\lambda^2) - \frac{1+\lambda^2}{2\sigma^2}\left(\omega_t u_t - \frac{\lambda\sigma}{\sqrt{1+\lambda^2}}\, h_t\right)^2\right].
\tag{9}
\]
Thus, the perturbed $Q$ function obtained from (9) is expressed as
\[
Q(\theta, \omega)\big|_{\theta=\hat\theta} = \mathrm{E}[\ell_c(\theta, \omega; Y_c) \mid Y_o = y_o]\big|_{\theta=\hat\theta} = -\frac{T-p}{2}\log(\sigma^2) + \frac{T-p}{2}\log(1+\lambda^2) - \frac{1+\lambda^2}{2}\sum_{t=p+1}^{T}\left(\frac{\omega_t u_t}{\sigma} - \frac{\lambda}{\sqrt{1+\lambda^2}}\,\hat c_t\right)^2 - \frac{\lambda^2}{2}\sum_{t=p+1}^{T}\left(\widehat{c_t^2} - \hat c_t^{\,2}\right).
\tag{10}
\]
For the SN AR($p$) model under the perturbation of case-weights, the $(p+2) \times (T-p)$ perturbation matrix $\Delta$ must be evaluated at $\theta = \hat\theta$ and $\omega = \omega_0$ after taking derivatives, obtaining
\[
\Delta = \frac{\partial^2 Q(\theta, \omega)}{\partial\theta\,\partial\omega^\top}\Bigg|_{\theta=\hat\theta,\,\omega=\omega_0} =
\begin{pmatrix}
\dfrac{1+\hat\lambda^2}{\hat\sigma^2}\,\hat u_t\, x_t \\[8pt]
\dfrac{1+\hat\lambda^2}{2\hat\sigma^3}\left(\dfrac{2\hat u_t^2}{\hat\sigma} - \dfrac{\hat\lambda\,\hat c_t}{\sqrt{1+\hat\lambda^2}}\,\hat u_t\right) \\[8pt]
-\dfrac{2\hat\lambda}{\hat\sigma^2}\,\hat u_t^2 + \dfrac{2\hat\lambda^2+1}{\sqrt{1+\hat\lambda^2}}\,\dfrac{\hat c_t}{\hat\sigma}\,\hat u_t
\end{pmatrix}_{t=p+1,\ldots,T},
\tag{11}
\]
where
\[
\hat u_t = y_t - x_t^\top\hat\beta, \quad
\hat c_t = \mathrm{E}(H_t \mid Y_o, \hat\theta) = \tau_1 + \frac{\phi(\tau_1/\tau_2)}{\Phi(\tau_1/\tau_2)}\,\tau_2, \quad
\widehat{c_t^2} = \mathrm{E}(H_t^2 \mid Y_o, \hat\theta) = \tau_1^2 + \tau_2^2 + \frac{\phi(\tau_1/\tau_2)}{\Phi(\tau_1/\tau_2)}\,\tau_1\tau_2,
\]
with $\tau_1 = \dfrac{\hat\lambda}{\hat\sigma\sqrt{1+\hat\lambda^2}}\,\hat u_t$ and $\tau_2 = \dfrac{1}{\sqrt{1+\hat\lambda^2}}$.
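Assembling the case-weight $\Delta$ numerically can be sketched as below; this is our own transcription of the column formulas in (11) (function and variable names are ours), built column by column over $t = p+1, \ldots, T$:

```python
import math
import numpy as np

def case_weight_delta(y, p, beta, sigma, lam):
    """Perturbation matrix Delta, of size (p+2) x (T-p), for the case-weight
    scheme, assembled from (11); uses the E-step quantities tau_1, tau_2, c_hat_t."""
    T = len(y)
    s = math.sqrt(1.0 + lam * lam)
    cols = []
    for t in range(p, T):                                   # 0-indexed t = p, ..., T-1
        x = np.array([y[t - 1 - j] for j in range(p)])      # (y_{t-1}, ..., y_{t-p})
        u = y[t] - x @ beta                                 # residual u_hat_t
        tau1, tau2 = lam / (sigma * s) * u, 1.0 / s
        r = (math.exp(-0.5 * (tau1 / tau2) ** 2) / math.sqrt(2.0 * math.pi)) / \
            (0.5 * (1.0 + math.erf(tau1 / (tau2 * math.sqrt(2.0)))))
        c_hat = tau1 + tau2 * r                             # E(H_t | y)
        top = (1 + lam**2) / sigma**2 * u * x               # beta block
        mid = (1 + lam**2) / (2 * sigma**3) * (2 * u**2 / sigma - lam * c_hat * u / s)
        bot = -2 * lam / sigma**2 * u**2 + (2 * lam**2 + 1) / s * c_hat * u / sigma
        cols.append(np.concatenate([top, [mid, bot]]))
    return np.array(cols).T
```

The resulting matrix can be passed directly to a curvature routine that forms $S = \Delta^\top \ddot Q^{-1} \Delta$.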

3.3.2. Perturbation of Data

Assume that a perturbation replaces $y_t$ by $y_t + \omega_t$. Let $\omega = (\omega_{p+1}, \ldots, \omega_T)^\top$ denote the $(T-p) \times 1$ perturbation vector and $\omega_0 = (0, \ldots, 0)^\top$ the $(T-p) \times 1$ non-perturbation vector. The perturbed AR($p$) model can be written as $y_t + \omega_t = \beta_1(y_{t-1} + \omega_{t-1}) + \cdots + \beta_p(y_{t-p} + \omega_{t-p}) + u_t + \mu(\omega_t)$, with perturbed error term $u_t + \mu(\omega_t)$, where $u_t = y_t - x_t^\top\beta$ and $\mu(\omega_t) = \omega_t - \beta_1\omega_{t-1} - \cdots - \beta_p\omega_{t-p}$. Then, the complete-data log-likelihood function of the perturbed model is given by
\[
\ell_c(\theta, \omega; y_c) = \sum_{t=p+1}^{T}\left[-\frac{1}{2}\log(\sigma^2) + \frac{1}{2}\log(1+\lambda^2) - \frac{1+\lambda^2}{2\sigma^2}\left(u_t + \mu(\omega_t) - \frac{\lambda\sigma}{\sqrt{1+\lambda^2}}\, h_t\right)^2\right].
\tag{12}
\]
Thus, the perturbed $Q$ function is expressed as
\[
Q(\theta, \omega)\big|_{\theta=\hat\theta} = -\frac{T-p}{2}\log(\sigma^2) + \frac{T-p}{2}\log(1+\lambda^2) - \frac{1+\lambda^2}{2}\sum_{t=p+1}^{T}\left(\frac{u_t + \mu(\omega_t)}{\sigma} - \frac{\lambda}{\sqrt{1+\lambda^2}}\,\hat c_t\right)^2 - \frac{\lambda^2}{2}\sum_{t=p+1}^{T}\left(\widehat{c_t^2} - \hat c_t^{\,2}\right).
\]
For the SN AR($p$) model under the perturbation of data, we have
\[
\Delta = \frac{\partial^2 Q(\theta, \omega)}{\partial\theta\,\partial\omega^\top}\Bigg|_{\theta=\hat\theta,\,\omega=\omega_0} =
\begin{pmatrix}
\dfrac{1+\hat\lambda^2}{\hat\sigma^2}\, x_t \\[8pt]
\dfrac{1+\hat\lambda^2}{2\hat\sigma^3}\left(\dfrac{2\hat u_t}{\hat\sigma} - \dfrac{\hat\lambda}{\sqrt{1+\hat\lambda^2}}\,\hat c_t\right) \\[8pt]
-\dfrac{2\hat\lambda}{\hat\sigma^2}\,\hat u_t + \dfrac{2\hat\lambda^2+1}{\sqrt{1+\hat\lambda^2}}\,\dfrac{\hat c_t}{\hat\sigma}
\end{pmatrix}_{t=p+1,\ldots,T},
\tag{13}
\]
where $\hat u_t = y_t - x_t^\top\hat\beta$, and $\hat c_t$ is as given in the case-weight perturbation.

3.3.3. Perturbation of Scale

Assume that the variance parameter $\sigma^2$ in the model is replaced by $\omega_t^{-1}\sigma^2$, meaning that $u_t \sim \mathrm{SN}(0, \omega_t^{-1}\sigma^2, \lambda)$. Let $\omega = (\omega_{p+1}, \ldots, \omega_T)^\top$ denote the $(T-p) \times 1$ perturbation vector and $\omega_0 = (1, \ldots, 1)^\top$ denote the $(T-p) \times 1$ non-perturbation vector. Then, the complete-data log-likelihood function of the perturbed model is given by
\[
\ell_c(\theta, \omega; y_c) = \sum_{t=p+1}^{T}\left[-\frac{1}{2}\log(\sigma^2) + \frac{1}{2}\log(\omega_t) + \frac{1}{2}\log(1+\lambda^2) - \frac{\omega_t(1+\lambda^2)}{2\sigma^2}\left(u_t - \frac{\lambda\sigma}{\sqrt{\omega_t(1+\lambda^2)}}\, h_t\right)^2\right].
\]
Thus, the perturbed $Q$ function is expressed as
\[
Q(\theta, \omega)\big|_{\theta=\hat\theta} = -\frac{T-p}{2}\log(\sigma^2) + \sum_{t=p+1}^{T}\frac{1}{2}\log(\omega_t) + \frac{T-p}{2}\log(1+\lambda^2) - \frac{1+\lambda^2}{2}\sum_{t=p+1}^{T}\left(\frac{\sqrt{\omega_t}\,u_t}{\sigma} - \frac{\lambda}{\sqrt{1+\lambda^2}}\,\hat c_t\right)^2 - \frac{\lambda^2}{2}\sum_{t=p+1}^{T}\left(\widehat{c_t^2} - \hat c_t^{\,2}\right).
\]
For the SN AR($p$) model with the perturbation of the variance parameter, we have
\[
\Delta = \frac{\partial^2 Q(\theta, \omega)}{\partial\theta\,\partial\omega^\top}\Bigg|_{\theta=\hat\theta,\,\omega=\omega_0} =
\begin{pmatrix}
\left(\dfrac{(1+\hat\lambda^2)\,\hat u_t}{\hat\sigma^2} - \dfrac{\hat\lambda\sqrt{1+\hat\lambda^2}}{2\hat\sigma}\,\hat c_t\right) x_t \\[8pt]
\dfrac{(1+\hat\lambda^2)\,\hat u_t^2}{2\hat\sigma^4} - \dfrac{\hat\lambda\sqrt{1+\hat\lambda^2}}{4\hat\sigma^3}\,\hat c_t\,\hat u_t \\[8pt]
-\dfrac{\hat\lambda}{\hat\sigma^2}\,\hat u_t^2 + \dfrac{2\hat\lambda^2+1}{2\sqrt{1+\hat\lambda^2}}\,\dfrac{\hat c_t}{\hat\sigma}\,\hat u_t
\end{pmatrix}_{t=p+1,\ldots,T},
\]
where $\hat u_t = y_t - x_t^\top\hat\beta$, and $\hat c_t$ is as given in the case-weight perturbation.

3.3.4. Perturbation of Skewness

Considering the particular skewed feature of the distribution, we may investigate the effect on the model fit of a minor change in the skewness parameter $\lambda$. In our perturbed model, we perturb the skewness parameter so that $\lambda^2$ is replaced by $\omega_t\lambda^2$. Let $\omega = (\omega_{p+1}, \ldots, \omega_T)^\top$ denote the $(T-p) \times 1$ perturbation vector and $\omega_0 = (1, \ldots, 1)^\top$ denote the $(T-p) \times 1$ non-perturbation vector. Then, the complete-data log-likelihood function of the perturbed model is given by
\[
\ell_c(\theta, \omega; y_c) = \sum_{t=p+1}^{T}\left[-\frac{1}{2}\log(\sigma^2) + \frac{1}{2}\log(1+\omega_t\lambda^2) - \frac{1+\omega_t\lambda^2}{2\sigma^2}\left(u_t - \frac{\sqrt{\omega_t}\,\lambda\sigma}{\sqrt{1+\omega_t\lambda^2}}\, h_t\right)^2\right].
\]
Thus, the perturbed $Q$ function is expressed as
\[
Q(\theta, \omega)\big|_{\theta=\hat\theta} = -\frac{T-p}{2}\log(\sigma^2) + \sum_{t=p+1}^{T}\frac{1}{2}\log(1+\omega_t\lambda^2) - \sum_{t=p+1}^{T}\frac{1+\omega_t\lambda^2}{2}\left(\frac{u_t}{\sigma} - \frac{\sqrt{\omega_t}\,\lambda}{\sqrt{1+\omega_t\lambda^2}}\,\hat c_t\right)^2 - \sum_{t=p+1}^{T}\frac{\omega_t\lambda^2}{2}\left(\widehat{c_t^2} - \hat c_t^{\,2}\right).
\]
For the SN AR($p$) model with the perturbation of the skewness parameter, we have
\[
\Delta = \frac{\partial^2 Q(\theta, \omega)}{\partial\theta\,\partial\omega^\top}\Bigg|_{\theta=\hat\theta,\,\omega=\omega_0} =
\begin{pmatrix}
\left(\dfrac{\hat u_t}{\hat\sigma^2}\,\hat\lambda^2 - \dfrac{\hat\lambda(2\hat\lambda^2+1)}{2\sqrt{1+\hat\lambda^2}}\,\dfrac{\hat c_t}{\hat\sigma}\right) x_t \\[8pt]
\dfrac{\hat\lambda^2\,\hat u_t^2}{2\hat\sigma^4} - \dfrac{\hat\lambda}{2\hat\sigma^3}\,\dfrac{1+2\hat\lambda^2}{2\sqrt{1+\hat\lambda^2}}\,\hat c_t\,\hat u_t \\[8pt]
\dfrac{\hat\lambda}{(\hat\lambda^2+1)^2} - \dfrac{\hat\lambda}{\hat\sigma^2}\,\hat u_t^2 - \hat\lambda\,\widehat{c_t^2} + \dfrac{(6\hat\lambda^2+1)(\hat\lambda^2+1) - \hat\lambda^2(2\hat\lambda^2+1)}{2(\hat\lambda^2+1)^{3/2}}\,\dfrac{\hat c_t}{\hat\sigma}\,\hat u_t
\end{pmatrix}_{t=p+1,\ldots,T},
\]
where $\hat u_t = y_t - x_t^\top\hat\beta$, and $\hat c_t$ and $\widehat{c_t^2}$ are as given in the case-weight perturbation.

4. Numerical Results

4.1. Simulation Study I: Effectiveness of the Diagnostics

For our simulation, we consider an AR(1) model as specified in Section 2, given by
\[
y_t = \beta y_{t-1} + u_t, \quad u_t \sim \mathrm{SN}(0, \sigma^2, \lambda),
\]
where we choose $\beta = 0.12$, $\sigma^2 = 0.003$, and $\lambda = 0.1$. We generate $T = 400$ observations. Now, we compare the performance of the ML estimates in the presence of five perturbed or shifted observations among three different scenarios with $\lambda = 0.1, 0.2, 0.3$.
We perturb the value $y_t$ as $\tilde y_t = y_t + \beta y_{t-1} d$, where $t = 200, 201, 202, 203, 204$ and $d = 5, 10, \ldots, 50$, in order to guarantee the presence of outliers. We calculate the ML estimate of $\beta$ under the perturbed and non-perturbed data by fitting both data sets under the AR(1) model with $\lambda = 0.1, 0.2, 0.3$, respectively. Then, we compute the relative changes of the estimates as $|(\mathrm{Est}^{(i)} - \mathrm{Est})/\mathrm{Est}|$, where $\mathrm{Est}^{(i)}$ is the estimate of $\beta$ under the perturbed data and $\mathrm{Est}$ is the estimate of $\beta$ under the non-perturbed data. The above computation in MATLAB (version 9.3, MathWorks, USA) runs five iterations to converge in less than 60 s on a PC. Our experience indicates that the computation runs up to 10 iterations to converge in less than 120 s even for $T = 1000$. Figure 1 shows the effectiveness of the influence diagnostics.
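The steps of this simulation can be sketched in a few lines; this is our own simplified replication (not the paper's MATLAB code), which uses the stochastic representation (3) to draw SN errors and, for brevity, the OLS estimate of $\beta$ in place of the full ML fit:

```python
import math
import random

random.seed(7)

def rsn(sigma2, lam):
    """Draw u ~ SN(0, sigma2, lam) via the stochastic representation (3)."""
    delta = lam / math.sqrt(1.0 + lam * lam)
    h = abs(random.gauss(0.0, 1.0))                # half-normal H = |H0|
    sigma = math.sqrt(sigma2)
    return sigma * delta * h + sigma * math.sqrt(1.0 - delta * delta) * random.gauss(0.0, 1.0)

beta, sigma2, lam, T = 0.12, 0.003, 0.1, 400
y = [0.0]
for _ in range(T):
    y.append(beta * y[-1] + rsn(sigma2, lam))

def ols_ar1(series):
    """OLS estimate of the AR(1) coefficient."""
    num = sum(series[t] * series[t - 1] for t in range(1, len(series)))
    den = sum(series[t - 1] ** 2 for t in range(1, len(series)))
    return num / den

est = ols_ar1(y)
yp = y[:]                                          # contaminate five cases with d = 5
d = 5
for t in (200, 201, 202, 203, 204):
    yp[t] = y[t] + beta * y[t - 1] * d
est_p = ols_ar1(yp)
rel_change = abs((est_p - est) / est)              # |(Est^(i) - Est)/Est|
print(est, est_p, rel_change)
```

Repeating this over $d = 5, 10, \ldots, 50$ traces out the relative-change curves shown in Figure 1.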

4.2. Simulation Study II: Comparing SN and Normal Distributions

We perform a numerical simulation to examine the effectiveness of our method under SN distributions. The results under the SN and normal distributions are compared as follows:
  • Step 1. We use the simulated data ($\lambda = 0.1$) with $y_t$ perturbed as $\tilde y_t = y_t + \beta y_{t-1} d$, where $d = 5$ and $t = 200, 201, 202, 203, 204$. We fit an AR(1) model under normality to the data, and obtain the fitted AR(1) model given by $Y_t = 0.1549\, y_{t-1} + u_t$, where $u_t \sim \mathrm{N}(0, 0.0151)$.
  • Step 2. We conduct local influence diagnostic analytics under the normal distribution using the diagnostic results given by [18].
  • Step 3. We compare the local influence results under the normal distribution from Step 2 with those under the SN distribution. In Figure 2, twenty-four influential observations are detected under the SN distribution. These results are summarized in Table 1.
As presented, twenty-four influential observations are detected by the local influence diagnostic analytics under the SN distribution, which is more than the twenty influential values detected by the local influence analytics under the normal distribution. These results are informative and as expected. We see that the additional influential values (that is, cases #62, #153, #201, and #301) are less than zero.
As we understand, when $\lambda$ for the SN distribution is larger than zero, it is easier to find possible influential values less than zero due to the difference in patterns between the SN distribution and the normal distribution. This indicates that the diagnostic results under the SN distribution established in Section 3 work well.

4.3. Real-World Data Analysis

We conduct an empirical example with financial data based on our results presented in Section 2 and Section 3. We choose the weekly returns of Chevron shares (hereinafter referred to as the CVX weekly financial return data), which were collected from 12 January 2007 to 1 August 2014, to construct the AR models. Figure 3 shows the log transformation of this data set (a total of 395 observations).
We first determine the order of the AR model, using the following steps:
  • Step 1. We assume the data are subject to an AR($p$) model, for $p = 1, 2, \ldots$, given by
\[
\begin{aligned}
Y_t &= \beta_1 y_{t-1} + u_t,\\
&\;\;\vdots\\
Y_t &= \beta_1 y_{t-1} + \beta_2 y_{t-2} + \cdots + \beta_p y_{t-p} + u_t.
\end{aligned}
\tag{17}
\]
  • Step 2. For the $i$th equation given in (17), we let $\hat\beta_j^{(i)}$ be the OLS estimate of $\beta_j$, where the superscript $(i)$ denotes the estimates for the AR($i$) model. Then, the residual is defined as $\hat u_t^{(i)} = y_t - \hat\beta_1^{(i)} y_{t-1} - \hat\beta_2^{(i)} y_{t-2} - \cdots - \hat\beta_i^{(i)} y_{t-i}$. The estimated variance for the AR($i$) model is expressed as $\hat\sigma_i^2 = (1/(T - 2i - 1))\sum_{t=i+1}^{T}(\hat u_t^{(i)})^2$.
  • Step 3. We use the $i$th and $(i-1)$th equations given in (17) to test $H_0: \beta_i = 0$ versus $H_1: \beta_i \neq 0$, that is, we test the AR($i$) model versus the AR($i-1$) model. The test statistic is defined as
\[
M(i) = -(T - i - 2.5)\log\left(\frac{\hat\sigma_i^2}{\hat\sigma_{i-1}^2}\right).
\tag{18}
\]
For our model, $M(i)$ asymptotically follows a chi-square distribution with one degree of freedom, that is, $M(i) \sim \chi^2(1)$.
We calculate $M(i)$ by (18), for $i = 1, \ldots, 7$, and present the results in Table 2. As the 99th percentile of the chi-square distribution with one degree of freedom is $\chi^2_{0.99}(1) = 6.635$, based on the values of $M(i)$ reported in Table 2, we select the order of the AR($p$) model to be $p = 3$.
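The order-selection steps above can be sketched as follows. This is our own illustration (the `order_select` rule and the use of the sample variance as the AR(0) baseline are our assumptions, not specified by the paper):

```python
import numpy as np

def ar_ols_sigma2(y, p):
    """OLS fit of an AR(p) and the residual variance estimate
    sigma_p^2 = (1/(T - 2p - 1)) * sum of squared residuals, as in Step 2."""
    T = len(y)
    X = np.column_stack([y[p - j - 1: T - j - 1] for j in range(p)])  # lags 1..p
    yy = y[p:]
    beta = np.linalg.lstsq(X, yy, rcond=None)[0]
    resid = yy - X @ beta
    return float(resid @ resid) / (T - 2 * p - 1)

def order_select(y, p_max=7, crit=6.635):          # chi2_{0.99}(1) = 6.635
    """Pick the AR order as the largest i with
    M(i) = -(T - i - 2.5) * log(s_i^2 / s_{i-1}^2) exceeding crit, as in (18)."""
    T = len(y)
    s2 = [float(np.var(y))] + [ar_ols_sigma2(y, p) for p in range(1, p_max + 1)]
    M = [-(T - i - 2.5) * np.log(s2[i] / s2[i - 1]) for i in range(1, p_max + 1)]
    sig = [i for i in range(1, p_max + 1) if M[i - 1] > crit]
    return (max(sig) if sig else 0), M
```

Applied to the CVX log-return series, a routine of this kind reproduces the kind of $M(i)$ sequence reported in Table 2.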
From the EM algorithm detailed in Section 2, we can find the estimate of the parameter $\theta$ given by
\[
(\hat\beta_1, \hat\beta_2, \hat\beta_3, \hat\sigma^2, \hat\lambda) = (0.0153,\ -0.0333,\ -0.2198,\ 0.0013,\ 0.0661).
\]
Since the absolute values of $\hat\beta_1$, $\hat\beta_2$, and $\hat\beta_3$ are all less than 1, we can accept that the CVX time series is stationary for the AR(3) model. Thus, we obtain a predictive model in the fitted SN AR(3) structure given by
\[
\hat y_t = 0.0153\, y_{t-1} - 0.0333\, y_{t-2} - 0.2198\, y_{t-3},
\]
with $\hat\mu = 0$, $\hat\sigma^2 = 0.0013$, and $\hat\lambda = 0.0661$.
By applying the method in Section 3, we can conduct influence analysis for the SN AR(3) model. After calculating the observed Hessian matrix and the Δ matrices for the four perturbation schemes of case-weights, data, variance parameter, and skewness parameter, we obtain the diagnostics matrices S 1 , S 2 , S 3 , and S 4 , respectively. In this case, we select the benchmark as 1 / 392 + 3 SM ( 0 ) , with the values of 0.1144, 0.0178, 0.0347, and 0.0791 established in the simulation study for the four perturbation schemes, respectively.
In Figure 4, the straight line represents the benchmark value which determines whether an observation is potentially influential. The observation is justified to be potentially influential when its diagnostic value exceeds the benchmark.
Firstly, we find only case #92 to be potentially influential. The other potentially influential observations may be masked by case #92. Similar to the step-wise diagnostic procedure proposed by [6], a second step of identification of influential observations is conducted. In this second step, we replace the value of case #92 by the average of cases #91 and #93 to get a new time series.
We re-fit an AR model in the same manner as done previously in the first step. For the new time series, the parameters in the SN AR model are again estimated by applying the EM algorithm, with the order also selected to be three. Thus, we present the new SN AR(3) model as
\[
\hat y_t = 0.0092\, y_{t-1} - 0.0374\, y_{t-2} - 0.1328\, y_{t-3},
\]
with $\hat\mu = 0$, $\hat\sigma^2 = 0.0011$, and $\hat\lambda = 0.1015$. Since the absolute values of $\hat\beta_1$, $\hat\beta_2$, and $\hat\beta_3$ are all less than one, we can accept the CVX time series to be stationary for the AR(3) model. For the new AR(3) model, we conduct influence analysis. We select the benchmark as $1/392 + 3\,\mathrm{SM}(0)$ again, with the four values of 0.0377, 0.0100, 0.0162, and 0.0203 for the corresponding perturbation schemes, respectively. Thus, twenty-two influential observations are detected in Figure 5 and summarized in Table 3, which tallies the observations identified as potentially influential for CVX; an "*" in Table 3 indicates that the observation has been identified via the assigned perturbation scheme. It is worth noting that the points shown in Table 3 correspond to a number of material historical events, many of which relate to the 2008 global financial crisis. For example, on 7 September 2008, the US Treasury Department announced the takeover of Fannie Mae and Freddie Mac. On 3 October 2008, the Bush administration signed a financial rescue plan totaling up to 700 billion dollars. This shows the effectiveness of our procedure in identifying potentially influential observations to improve modeling outcomes.

5. Conclusions

In this paper, we have studied influence diagnostics in the AR($p$) model under the skew-normal distribution. We have obtained the normal curvatures for the model under four perturbation schemes, including the newly proposed perturbation of skewness. We have conducted a Monte Carlo simulation study to obtain approximate benchmark values for determining possible influential points, and used them to analyze the weekly log-returns of Chevron. The findings outlined in this paper suggest that our local influence approach for the AR($p$) model effectively identifies potentially influential observations to improve the fit of the model.
Note that we have made a new application of the matrix differential calculus developed by [31] in mathematical optimization and data analysis. In particular, this paper focuses on the detection of local (rather than global) influential observations. The extension of the current study to identify global influential observations will be considered in the future.

Author Contributions

Data curation, Y.L.; formal analysis, G.M. and S.L.; investigation, G.M., V.L. and A.T.; methodology, V.L., S.L. and A.T.; writing–original draft, V.L.; writing–review and editing, S.L. and A.T. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported partially by grant “Fondecyt 1200525” from the National Commission for Scientific and Technological Research of the Chilean government.

Acknowledgments

The authors would also like to thank the Editor and Reviewers for their constructive comments, which led to improvements in the presentation of the manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Based on the matrix differential calculus presented by [31], we establish the derivatives involved in our calculations.

Appendix A.1. The Hessian Matrix

For the function $Q(\theta)\big|_{\theta=\hat\theta} = \mathrm{E}[\ell_c(\theta; Y_c) \mid Y_o = y_o]\big|_{\theta=\hat\theta}$, we have
\[
\begin{aligned}
Q(\theta) &= -\frac{T-p}{2}\log(\sigma^2) + \frac{T-p}{2}\log(1+\lambda^2) - \frac{1+\lambda^2}{2}\sum_{t=p+1}^{T}\left[\left(\frac{u_t}{\sigma}\right)^2 + \frac{\lambda^2}{1+\lambda^2}\,\widehat{c_t^2} - 2\,\frac{u_t}{\sigma}\,\frac{\lambda}{\sqrt{1+\lambda^2}}\,\hat c_t\right]\\
&= -\frac{T-p}{2}\log(\sigma^2) + \frac{T-p}{2}\log(1+\lambda^2) - \frac{1+\lambda^2}{2}\sum_{t=p+1}^{T}\left(\frac{u_t}{\sigma} - \frac{\lambda}{\sqrt{1+\lambda^2}}\,\hat c_t\right)^2 - \frac{\lambda^2}{2}\sum_{t=p+1}^{T}\left(\widehat{c_t^2} - \hat c_t^{\,2}\right),
\end{aligned}
\]
where
\[
\hat c_t = \mathrm{E}(H_t \mid Y_o, \hat\theta) = \tau_1 + \frac{\phi(\tau_1/\tau_2)}{\Phi(\tau_1/\tau_2)}\,\tau_2, \quad
\widehat{c_t^2} = \mathrm{E}(H_t^2 \mid Y_o, \hat\theta) = \tau_1^2 + \tau_2^2 + \frac{\phi(\tau_1/\tau_2)}{\Phi(\tau_1/\tau_2)}\,\tau_1\tau_2, \quad
\tau_1 = \frac{\hat\lambda}{\hat\sigma\sqrt{1+\hat\lambda^2}}\,\hat u_t, \quad
\tau_2 = \frac{1}{\sqrt{1+\hat\lambda^2}}.
\]
The first-order derivatives related to $\ddot Q(\hat\theta)$ in (7) are given by
\[
\begin{aligned}
\frac{\partial Q(\theta)}{\partial\beta} &= \sum_{t=p+1}^{T}\frac{1+\lambda^2}{\sigma}\left(\frac{u_t}{\sigma} - \frac{\lambda}{\sqrt{1+\lambda^2}}\,\hat c_t\right) x_t,\\
\frac{\partial Q(\theta)}{\partial\sigma^2} &= -\frac{T-p}{2\sigma^2} + \sum_{t=p+1}^{T}\frac{1+\lambda^2}{2\sigma^3}\left(\frac{u_t}{\sigma} - \frac{\lambda}{\sqrt{1+\lambda^2}}\,\hat c_t\right) u_t,\\
\frac{\partial Q(\theta)}{\partial\lambda} &= \sum_{t=p+1}^{T}\left[\frac{\lambda}{1+\lambda^2} - \left(\frac{u_t}{\sigma}\right)^2\lambda - \lambda\,\widehat{c_t^2} + \frac{2\lambda^2+1}{\sqrt{1+\lambda^2}}\,\frac{u_t}{\sigma}\,\hat c_t\right].
\end{aligned}
\]
The second-order derivatives related to $\ddot Q(\hat\theta)$ in (7) are given by
\[
\begin{aligned}
\frac{\partial^2 Q(\theta)}{\partial\beta\,\partial\beta^\top} &= -\sum_{t=p+1}^{T}\frac{1+\lambda^2}{\sigma^2}\, x_t x_t^\top,\\
\frac{\partial^2 Q(\theta)}{\partial\beta\,\partial\sigma^2} &= \sum_{t=p+1}^{T}\frac{1+\lambda^2}{2\sigma^3}\left(\frac{\lambda}{\sqrt{1+\lambda^2}}\,\hat c_t - \frac{2u_t}{\sigma}\right) x_t,\\
\frac{\partial^2 Q(\theta)}{\partial\beta\,\partial\lambda} &= \sum_{t=p+1}^{T}\left(\frac{2\lambda}{\sigma^2}\,u_t - \frac{2\lambda^2+1}{\sigma\sqrt{1+\lambda^2}}\,\hat c_t\right) x_t,\\
\frac{\partial^2 Q(\theta)}{\partial(\sigma^2)^2} &= \frac{T-p}{2\sigma^4} - \sum_{t=p+1}^{T}\frac{1+\lambda^2}{4\sigma^5}\left(\frac{4u_t}{\sigma} - \frac{3\lambda}{\sqrt{1+\lambda^2}}\,\hat c_t\right) u_t,\\
\frac{\partial^2 Q(\theta)}{\partial\sigma^2\,\partial\lambda} &= \sum_{t=p+1}^{T}\left(\frac{\lambda}{\sigma^4}\,u_t^2 - \frac{2\lambda^2+1}{\sqrt{1+\lambda^2}}\,\frac{u_t}{2\sigma^3}\,\hat c_t\right),\\
\frac{\partial^2 Q(\theta)}{\partial\lambda^2} &= (T-p)\,\frac{1-\lambda^2}{(1+\lambda^2)^2} + \sum_{t=p+1}^{T}\left[-\frac{u_t^2}{\sigma^2} - \widehat{c_t^2} + \frac{2\lambda^3+3\lambda}{(1+\lambda^2)^{3/2}}\,\frac{u_t}{\sigma}\,\hat c_t\right].
\end{aligned}
\]

Appendix A.2. Perturbation Matrix for Case-Weights

For the function $Q(\theta,\omega)\big|_{\theta=\hat{\theta}} = \mathrm{E}[\ell_c(\theta,\omega; Y_c) \mid Y_o = y_o]\big|_{\theta=\hat{\theta}}$ given in (10), we have
$$
Q(\theta,\omega) = -\frac{T-p}{2}\log(\sigma^2) + \frac{T-p}{2}\log(1+\lambda^2) - \frac{1+\lambda^2}{2}\sum_{t=p+1}^{T}\left[\left(\frac{\omega_t u_t}{\sigma}\right)^{2} + \frac{\lambda^2}{1+\lambda^2}\,\widehat{c_t^2} - 2\,\frac{\omega_t u_t}{\sigma}\,\frac{\lambda}{\sqrt{1+\lambda^2}}\,\hat{c}_t\right]
$$
$$
\hphantom{Q(\theta,\omega)} = -\frac{T-p}{2}\log(\sigma^2) + \frac{T-p}{2}\log(1+\lambda^2) - \frac{1+\lambda^2}{2}\sum_{t=p+1}^{T}\left(\frac{\omega_t u_t}{\sigma} - \frac{\lambda}{\sqrt{1+\lambda^2}}\,\hat{c}_t\right)^{2} - \frac{\lambda^2}{2}\sum_{t=p+1}^{T}\left(\widehat{c_t^2} - \hat{c}_t^{\,2}\right).
$$
The first-order derivatives related to Δ in (11) are given by
$$
\frac{\partial Q(\theta,\omega)}{\partial \beta} = \sum_{t=p+1}^{T}\frac{1+\lambda^2}{\sigma}\left(\frac{\omega_t u_t}{\sigma} - \frac{\lambda}{\sqrt{1+\lambda^2}}\,\hat{c}_t\right)\omega_t x_t, \qquad
\frac{\partial Q(\theta,\omega)}{\partial \sigma^2} = -\frac{T-p}{2\sigma^2} + \sum_{t=p+1}^{T}\frac{1+\lambda^2}{2\sigma^3}\left(\frac{\omega_t u_t}{\sigma} - \frac{\lambda}{\sqrt{1+\lambda^2}}\,\hat{c}_t\right)\omega_t u_t,
$$
$$
\frac{\partial Q(\theta,\omega)}{\partial \lambda} = \sum_{t=p+1}^{T}\left[\frac{\lambda}{1+\lambda^2} - \left(\frac{\omega_t u_t}{\sigma}\right)^{2}\lambda - \lambda\,\widehat{c_t^2} + \frac{2\lambda^2+1}{\sqrt{1+\lambda^2}}\,\frac{\omega_t u_t}{\sigma}\,\hat{c}_t\right].
$$
The second-order derivatives related to Δ in (11) are given by
$$
\frac{\partial^2 Q(\theta,\omega)}{\partial \beta\,\partial \omega_t} = \frac{1+\lambda^2}{\sigma}\left(\frac{2\omega_t u_t}{\sigma} - \frac{\lambda}{\sqrt{1+\lambda^2}}\,\hat{c}_t\right)x_t, \qquad
\frac{\partial^2 Q(\theta,\omega)}{\partial \sigma^2\,\partial \omega_t} = \frac{1+\lambda^2}{2\sigma^3}\left(\frac{2\omega_t u_t}{\sigma} - \frac{\lambda}{\sqrt{1+\lambda^2}}\,\hat{c}_t\right)u_t,
$$
$$
\frac{\partial^2 Q(\theta,\omega)}{\partial \lambda\,\partial \omega_t} = -\frac{2\lambda}{\sigma^2}\,u_t^2\,\omega_t + \frac{2\lambda^2+1}{\sqrt{1+\lambda^2}}\,\frac{u_t}{\sigma}\,\hat{c}_t.
$$
Noting that $\omega^{0} = (1, \ldots, 1)^{\top}$, we obtain (11).
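Once the Hessian $\ddot{Q}(\hat{\theta})$ and a perturbation matrix $\Delta$ are available, the local-influence diagnostic is obtained from the eigen-decomposition of the resulting curvature matrix; the same computation serves all four perturbation schemes. A generic sketch (function names are ours):

```python
import numpy as np

def influence_direction(Q_hess, Delta):
    """Direction of maximum normal curvature for the perturbed Q-function.

    Q_hess : Hessian of Q at theta_hat (negative definite), shape (k, k).
    Delta  : perturbation matrix, shape (k, m), one column per omega_t.
    Returns |d_max|; large components flag potentially influential cases.
    """
    B = Delta.T @ np.linalg.solve(-Q_hess, Delta)  # curvature matrix
    B = (B + B.T) / 2.0                            # symmetrize for eigh
    vals, vecs = np.linalg.eigh(B)                 # eigenvalues ascending
    return np.abs(vecs[:, -1])                     # eigenvector of largest eigenvalue
```

Benchmark rules such as the conformal normal curvature of [34] can then be applied to the same matrix `B` to decide which components are large enough to flag.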

Appendix A.3. Perturbation Matrix for Data

For the function $Q(\theta,\omega)\big|_{\theta=\hat{\theta}} = \mathrm{E}[\ell_c(\theta,\omega; Y_c) \mid Y_o = y_o]\big|_{\theta=\hat{\theta}}$ given in (12), we have
$$
Q(\theta,\omega) = -\frac{T-p}{2}\log(\sigma^2) + \frac{T-p}{2}\log(1+\lambda^2) - \frac{1+\lambda^2}{2}\sum_{t=p+1}^{T}\left[\left(\frac{u_t+\mu\omega_t}{\sigma}\right)^{2} + \frac{\lambda^2}{1+\lambda^2}\,\widehat{c_t^2} - 2\,\frac{u_t+\mu\omega_t}{\sigma}\,\frac{\lambda}{\sqrt{1+\lambda^2}}\,\hat{c}_t\right]
$$
$$
\hphantom{Q(\theta,\omega)} = -\frac{T-p}{2}\log(\sigma^2) + \frac{T-p}{2}\log(1+\lambda^2) - \frac{1+\lambda^2}{2}\sum_{t=p+1}^{T}\left(\frac{u_t+\mu\omega_t}{\sigma} - \frac{\lambda}{\sqrt{1+\lambda^2}}\,\hat{c}_t\right)^{2} - \frac{\lambda^2}{2}\sum_{t=p+1}^{T}\left(\widehat{c_t^2} - \hat{c}_t^{\,2}\right).
$$
The first-order derivatives related to Δ in (13) are given by
$$
\frac{\partial Q(\theta,\omega)}{\partial \beta} = \sum_{t=p+1}^{T}\frac{1+\lambda^2}{\sigma}\left(\frac{u_t+\mu\omega_t}{\sigma} - \frac{\lambda}{\sqrt{1+\lambda^2}}\,\hat{c}_t\right)\left(x_t + \mu\,(\omega_{t-1}, \ldots, \omega_{t-p})^{\top}\right),
$$
$$
\frac{\partial Q(\theta,\omega)}{\partial \sigma^2} = -\frac{T-p}{2\sigma^2} + \sum_{t=p+1}^{T}\frac{1+\lambda^2}{2\sigma^3}\left(\frac{u_t+\mu\omega_t}{\sigma} - \frac{\lambda}{\sqrt{1+\lambda^2}}\,\hat{c}_t\right)(u_t+\mu\omega_t),
$$
$$
\frac{\partial Q(\theta,\omega)}{\partial \lambda} = \sum_{t=p+1}^{T}\left[\frac{\lambda}{1+\lambda^2} - \left(\frac{u_t+\mu\omega_t}{\sigma}\right)^{2}\lambda - \lambda\,\widehat{c_t^2} + \frac{2\lambda^2+1}{\sqrt{1+\lambda^2}}\,\frac{u_t+\mu\omega_t}{\sigma}\,\hat{c}_t\right].
$$
The second-order derivatives related to Δ in (13) are given by
$$
\frac{\partial^2 Q(\theta,\omega)}{\partial \beta\,\partial \omega_t} = \frac{\mu(1+\lambda^2)}{\sigma^2}\left(x_t + \mu\,(\omega_{t-1}, \ldots, \omega_{t-p})^{\top}\right), \qquad
\frac{\partial^2 Q(\theta,\omega)}{\partial \sigma^2\,\partial \omega_t} = \frac{\mu(1+\lambda^2)}{2\sigma^3}\left(\frac{2(u_t+\mu\omega_t)}{\sigma} - \frac{\lambda}{\sqrt{1+\lambda^2}}\,\hat{c}_t\right),
$$
$$
\frac{\partial^2 Q(\theta,\omega)}{\partial \lambda\,\partial \omega_t} = \mu\left(-\frac{2\lambda}{\sigma^2}\,(u_t+\mu\omega_t) + \frac{2\lambda^2+1}{\sqrt{1+\lambda^2}}\,\frac{\hat{c}_t}{\sigma}\right).
$$
Noting that $\omega^{0} = (0, \ldots, 0)^{\top}$, we obtain (13).

Appendix A.4. Perturbation Matrix for Scale

For the function $Q(\theta,\omega)\big|_{\theta=\hat{\theta}} = \mathrm{E}[\ell_c(\theta,\omega; Y_c) \mid Y_o = y_o]\big|_{\theta=\hat{\theta}}$ given in (14), we have
$$
Q(\theta,\omega) = -\frac{T-p}{2}\log(\sigma^2) + \sum_{t=p+1}^{T}\frac{1}{2}\log(\omega_t) + \frac{T-p}{2}\log(1+\lambda^2) - \frac{1+\lambda^2}{2}\sum_{t=p+1}^{T}\left[\frac{\omega_t u_t^2}{\sigma^2} + \frac{\lambda^2}{1+\lambda^2}\,\widehat{c_t^2} - 2\,\frac{\sqrt{\omega_t}\,u_t}{\sigma}\,\frac{\lambda}{\sqrt{1+\lambda^2}}\,\hat{c}_t\right]
$$
$$
\hphantom{Q(\theta,\omega)} = -\frac{T-p}{2}\log(\sigma^2) + \sum_{t=p+1}^{T}\frac{1}{2}\log(\omega_t) + \frac{T-p}{2}\log(1+\lambda^2) - \frac{1+\lambda^2}{2}\sum_{t=p+1}^{T}\left(\frac{\sqrt{\omega_t}\,u_t}{\sigma} - \frac{\lambda}{\sqrt{1+\lambda^2}}\,\hat{c}_t\right)^{2} - \frac{\lambda^2}{2}\sum_{t=p+1}^{T}\left(\widehat{c_t^2} - \hat{c}_t^{\,2}\right).
$$
The first-order derivatives related to Δ in (15) are given by
$$
\frac{\partial Q(\theta,\omega)}{\partial \beta} = \sum_{t=p+1}^{T}\left(\frac{(1+\lambda^2)\,\omega_t u_t}{\sigma^2} - \frac{\lambda\sqrt{\omega_t(1+\lambda^2)}}{\sigma}\,\hat{c}_t\right)x_t, \qquad
\frac{\partial Q(\theta,\omega)}{\partial \sigma^2} = \sum_{t=p+1}^{T}\left(-\frac{1}{2\sigma^2} + \frac{(1+\lambda^2)\,\omega_t u_t^2}{2\sigma^4} - \frac{\lambda\sqrt{\omega_t(1+\lambda^2)}}{2\sigma^3}\,\hat{c}_t u_t\right),
$$
$$
\frac{\partial Q(\theta,\omega)}{\partial \lambda} = \sum_{t=p+1}^{T}\left[\frac{\lambda}{1+\lambda^2} - \frac{\omega_t u_t^2}{\sigma^2}\,\lambda - \lambda\,\widehat{c_t^2} + \frac{2\lambda^2+1}{\sqrt{1+\lambda^2}}\,\frac{\sqrt{\omega_t}\,u_t}{\sigma}\,\hat{c}_t\right].
$$
The second-order derivatives related to Δ in (15) are given by
$$
\frac{\partial^2 Q(\theta,\omega)}{\partial \beta\,\partial \omega_t} = \left(\frac{1+\lambda^2}{\sigma^2}\,u_t - \frac{\lambda\sqrt{1+\lambda^2}}{\sigma}\,\frac{1}{2\sqrt{\omega_t}}\,\hat{c}_t\right)x_t, \qquad
\frac{\partial^2 Q(\theta,\omega)}{\partial \sigma^2\,\partial \omega_t} = \frac{(1+\lambda^2)\,u_t^2}{2\sigma^4} - \frac{\lambda\sqrt{1+\lambda^2}}{4\sigma^3\sqrt{\omega_t}}\,\hat{c}_t u_t,
$$
$$
\frac{\partial^2 Q(\theta,\omega)}{\partial \lambda\,\partial \omega_t} = -\frac{u_t^2\,\lambda}{\sigma^2} + \frac{2\lambda^2+1}{\sqrt{1+\lambda^2}}\,\frac{u_t}{2\sqrt{\omega_t}}\,\frac{\hat{c}_t}{\sigma}.
$$
Noting that $\omega^{0} = (1, \ldots, 1)^{\top}$, we obtain (15).

Appendix A.5. Perturbation Matrix for Skewness

For the function $Q(\theta,\omega)\big|_{\theta=\hat{\theta}} = \mathrm{E}[\ell_c(\theta,\omega; Y_c) \mid Y_o = y_o]\big|_{\theta=\hat{\theta}}$ given in (16), we have
$$
Q(\theta,\omega) = -\frac{T-p}{2}\log(\sigma^2) + \sum_{t=p+1}^{T}\frac{1}{2}\log(1+\omega_t\lambda^2) - \sum_{t=p+1}^{T}\frac{1+\omega_t\lambda^2}{2}\left[\left(\frac{u_t}{\sigma}\right)^{2} + \frac{\omega_t\lambda^2}{1+\omega_t\lambda^2}\,\widehat{c_t^2} - 2\,\frac{u_t}{\sigma}\,\frac{\sqrt{\omega_t}\,\lambda}{\sqrt{1+\omega_t\lambda^2}}\,\hat{c}_t\right]
$$
$$
\hphantom{Q(\theta,\omega)} = -\frac{T-p}{2}\log(\sigma^2) + \sum_{t=p+1}^{T}\frac{1}{2}\log(1+\omega_t\lambda^2) - \sum_{t=p+1}^{T}\frac{1+\omega_t\lambda^2}{2}\left(\frac{u_t}{\sigma} - \frac{\sqrt{\omega_t}\,\lambda}{\sqrt{1+\omega_t\lambda^2}}\,\hat{c}_t\right)^{2} - \sum_{t=p+1}^{T}\frac{\omega_t\lambda^2}{2}\left(\widehat{c_t^2} - \hat{c}_t^{\,2}\right).
$$
The first-order derivatives related to Δ in (17) are given by
$$
\frac{\partial Q(\theta,\omega)}{\partial \beta} = \sum_{t=p+1}^{T}\frac{1+\omega_t\lambda^2}{\sigma}\left(\frac{u_t}{\sigma} - \frac{\lambda\sqrt{\omega_t}}{\sqrt{1+\omega_t\lambda^2}}\,\hat{c}_t\right)x_t, \qquad
\frac{\partial Q(\theta,\omega)}{\partial \sigma^2} = \sum_{t=p+1}^{T}\left[-\frac{1}{2\sigma^2} + \frac{1+\omega_t\lambda^2}{2\sigma^3}\left(\frac{u_t}{\sigma} - \frac{\lambda\sqrt{\omega_t}}{\sqrt{1+\omega_t\lambda^2}}\,\hat{c}_t\right)u_t\right],
$$
$$
\frac{\partial Q(\theta,\omega)}{\partial \lambda} = \sum_{t=p+1}^{T}\left[\frac{\omega_t\lambda}{1+\omega_t\lambda^2} - \frac{u_t^2}{\sigma^2}\,\omega_t\lambda - \omega_t\lambda\,\widehat{c_t^2} + \frac{\sqrt{\omega_t}\,(2\omega_t\lambda^2+1)}{\sqrt{1+\omega_t\lambda^2}}\,\frac{u_t}{\sigma}\,\hat{c}_t\right].
$$
The second-order derivatives related to Δ in (17) are given by
$$
\frac{\partial^2 Q(\theta,\omega)}{\partial \beta\,\partial \omega_t} = \left(\frac{\lambda^2}{\sigma^2}\,u_t - \frac{\lambda\,(1+2\lambda^2\omega_t)}{2\sqrt{\omega_t(1+\lambda^2\omega_t)}}\,\frac{\hat{c}_t}{\sigma}\right)x_t, \qquad
\frac{\partial^2 Q(\theta,\omega)}{\partial \sigma^2\,\partial \omega_t} = \frac{\lambda^2 u_t^2}{2\sigma^4} - \frac{\lambda}{4\sigma^3}\,\frac{1+2\lambda^2\omega_t}{\sqrt{\omega_t(1+\lambda^2\omega_t)}}\,\hat{c}_t u_t,
$$
$$
\frac{\partial^2 Q(\theta,\omega)}{\partial \lambda\,\partial \omega_t} = \frac{\lambda}{(1+\lambda^2\omega_t)^2} - \frac{u_t^2\,\lambda}{\sigma^2} - \lambda\,\widehat{c_t^2} + \frac{(6\lambda^2\omega_t+1)(\lambda^2\omega_t+1) - \lambda^2\omega_t(2\lambda^2\omega_t+1)}{2\sqrt{\omega_t}\,(1+\lambda^2\omega_t)^{3/2}}\,\frac{u_t}{\sigma}\,\hat{c}_t.
$$
Noting that $\omega^{0} = (1, \ldots, 1)^{\top}$, we obtain (17).
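All four schemes rest on the same latent half-normal construction of the skew-normal errors, which also gives one direct way to generate data from the model when reproducing simulation studies of this kind. A sketch (function name and defaults are ours, not from the paper):

```python
import numpy as np

def simulate_sn_ar(T, phi, sigma, lam, seed=None, burn=200):
    """Simulate an AR(p) series with skew-normal innovations via
    eps_t = sigma * (delta * H_t + sqrt(1 - delta^2) * Z_t),
    delta = lam / sqrt(1 + lam^2), H_t half-normal, Z_t standard normal."""
    rng = np.random.default_rng(seed)
    phi = np.atleast_1d(np.asarray(phi, dtype=float))
    p = len(phi)
    delta = lam / np.sqrt(1.0 + lam**2)
    n = T + burn
    h = np.abs(rng.standard_normal(n))   # latent half-normal H_t
    z = rng.standard_normal(n)
    eps = sigma * (delta * h + np.sqrt(1.0 - delta**2) * z)
    y = np.zeros(n)
    for t in range(p, n):
        y[t] = phi @ y[t - p:t][::-1] + eps[t]  # phi_1 y_{t-1} + ... + phi_p y_{t-p}
    return y[burn:]
```

Note that these innovations have a nonzero mean, $\sigma\delta\sqrt{2/\pi}$, so a simulated series with positive $\lambda$ drifts above zero and is right-skewed, which is a quick visual check that the latent construction is implemented correctly.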

References

1. Hyndman, R.J.; Athanasopoulos, G. Forecasting: Principles and Practice; OTexts: Melbourne, Australia, 2018.
2. Liu, T.; Liu, S.; Shi, L. Time Series Analysis Using SAS Enterprise Guide; Springer: Singapore, 2020.
3. Li, W.K. Diagnostic Checks in Time Series; Chapman and Hall/CRC: Boca Raton, FL, USA, 2004.
4. Liu, S.; Welsh, A.H. Regression diagnostics. In International Encyclopedia of Statistical Science; Lovric, M., Ed.; Springer: Berlin/Heidelberg, Germany, 2011; pp. 1206–1208.
5. Cook, R.D. Assessment of local influence. J. R. Stat. Soc. B 1986, 48, 133–169.
6. Shi, L.; Huang, M. Stepwise local influence analysis. Comput. Stat. Data Anal. 2011, 55, 973–982.
7. Lu, J.; Shi, L.; Chen, F. Outlier detection in time series models using local influence method. Comm. Stat. Theor. Meth. 2012, 41, 2202–2220.
8. Cao, C.Z.; Lin, J.G.; Shi, J.Q. Diagnostics on nonlinear model with scale mixtures of skew-normal and first-order autoregressive errors. Statistics 2014, 48, 1033–1047.
9. Leiva, V.; Liu, S.; Shi, L.; Cysneiros, F.J.A. Diagnostics in elliptical regression models with stochastic restrictions applied to econometrics. J. Appl. Stat. 2016, 43, 627–642.
10. Liu, S.; Leiva, V.; Ma, T.; Welsh, A.H. Influence diagnostic analysis in the possibly heteroskedastic linear model with exact restrictions. Stat. Meth. Appl. 2016, 25, 227–249.
11. Liu, S.; Ma, T.; SenGupta, A.; Shimizu, K.; Wang, M.Z. Influence diagnostics in possibly asymmetric circular-linear multivariate regression models. Sankhya B 2017, 79, 76–93.
12. Garcia-Papani, F.; Leiva, V.; Ruggeri, F.; Uribe-Opazo, M.A. Kriging with external drift in a Birnbaum-Saunders geostatistical model. Stoch. Environ. Res. Risk Assess. 2019, 32, 1517–1530.
13. Leao, J.; Leiva, V.; Saulo, H.; Tomazella, V. Incorporation of frailties into a cure rate regression model and its diagnostics and application to melanoma data. Stat. Med. 2018, 37, 4421–4440.
14. Tapia, A.; Leiva, V.; Diaz, M.P.; Giampaoli, V. Influence diagnostics in mixed effects logistic regression models. TEST 2019, 28, 920–942.
15. Tapia, H.; Giampaoli, V.; Diaz, M.P.; Leiva, V. Sensitivity analysis of longitudinal count responses: A local influence approach and application to medical data. J. Appl. Stat. 2019, 46, 1021–1042.
16. Liu, S. On diagnostics in conditionally heteroskedastic time series models under elliptical distributions. J. Appl. Prob. 2004, 41, 393–405.
17. Zevallos, M.; Santos, B.; Hotta, L.K. A note on influence diagnostics in AR(1) time series models. J. Stat. Plan. Inference 2012, 142, 2999–3007.
18. Liu, Y.; Ji, G.; Liu, S. Influence diagnostics in a vector autoregressive model. J. Stat. Comput. Simul. 2015, 85, 2632–2655.
19. Azzalini, A. A class of distributions which includes the normal ones. Scand. J. Stat. 1985, 12, 171–178.
20. Eling, M. Fitting insurance claims to skewed distributions: Are the skew-normal and skew-student good models? Insur. Math. Econ. 2012, 51, 239–248.
21. Azzalini, A. The Skew-Normal and Related Families; Cambridge University Press: Cambridge, UK, 2014.
22. Arnold, B.C.; Gomez, H.W.; Varela, H.; Vidal, I. Univariate and bivariate models related to the generalized epsilon-skew-Cauchy distribution. Symmetry 2019, 11, 794.
23. Cysneiros, F.J.A.; Leiva, V.; Liu, S.; Marchant, C.; Scalco, P. A Cobb-Douglas type model with stochastic restrictions: Formulation, local influence diagnostics and data analytics in economics. Qual. Quant. 2019, 53, 1693–1719.
24. Huerta, M.; Leiva, V.; Liu, S.; Rodriguez, M.; Villegas, D. On a partial least squares regression model for asymmetric data with a chemical application in mining. Chem. Intell. Lab. Syst. 2019, 190, 55–68.
25. Ventura, M.; Saulo, H.; Leiva, V.; Monsueto, S. Log-symmetric regression models: Information criteria, application to movie business and industry data with economic implications. Appl. Stoch. Models Bus. Ind. 2019, 35, 963–977.
26. Maleki, M.; Arellano-Valle, R.V. Maximum a-posteriori estimation of autoregressive processes based on finite mixtures of scale-mixtures of skew-normal distributions. J. Stat. Comput. Simul. 2017, 87, 1061–1083.
27. Bolfarine, H.; Montenegro, L.C.; Lachos, V.H. Influence diagnostics for skew-normal linear mixed models. Indian J. Stat. 2007, 69, 648–670.
28. Xie, F.C.; Lin, J.G.; Wei, B.C. Diagnostics for skew-normal nonlinear regression models with AR(1) errors. Comput. Stat. Data Anal. 2009, 53, 4403–4416.
29. Garay, A.M.; Lachos, V.H.; Labra, F.V.; Ortega, E.M.M. Statistical diagnostics for nonlinear regression models based on scale mixtures of skew-normal distributions. J. Stat. Comput. Simul. 2014, 84, 1761–1778.
30. Carmichael, B.; Coën, A. Asset pricing with skewed-normal return. Finance Res. Lett. 2013, 10, 50–57.
31. Magnus, J.R.; Neudecker, H. Matrix Differential Calculus with Applications in Statistics and Econometrics; Wiley: Chichester, UK, 2019.
32. Dempster, A.P.; Laird, N.M.; Rubin, D.B. Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. B 1977, 39, 1–38.
33. McLachlan, G.; Krishnan, T. The EM Algorithm and Extensions; Wiley: New York, NY, USA, 1997.
34. Poon, W.Y.; Poon, Y.S. Conformal normal curvature and assessment of local influence. J. R. Stat. Soc. B 1999, 61, 51–61.
Figure 1. Relative change of estimate of β against d.
Figure 2. Diagnostics for the perturbations of case-weights, data, variance, and skewness.
Figure 3. CVX weekly log-returns.
Figure 4. Diagnostics for the perturbations of case-weights, data, variance, and skewness in the SN AR(3) model [step 1].
Figure 5. Diagnostics for the perturbations of case-weights, data, variance, and skewness in the SN AR(3) model [step 2].
Table 1. Local influence results for normal and SN models.
ID | Index under the Normal Model | Index under the SN Model | Observed Value
1 | 33 | 33 | −0.334
2 | 34 | 34 | 0.030
3 | - | 62 | −0.271
4 | 77 | 77 | −0.260
5 | 111 | 111 | −0.297
6 | 112 | 112 | −0.225
7 | 140 | 140 | 0.319
8 | 141 | 141 | 0.089
9 | - | 153 | −0.175
10 | - | 201 | −0.269
11 | 202 | 202 | −0.326
12 | 203 | 203 | −0.090
13 | 205 | 205 | 0.328
14 | 206 | 206 | −0.124
15 | 214 | 214 | −0.406
16 | 215 | 215 | −0.159
17 | 293 | 293 | 0.296
18 | 294 | 294 | −0.012
19 | - | 301 | −0.278
20 | 306 | 306 | 0.043
21 | 345 | 345 | −0.293
22 | 346 | 346 | −0.092
23 | 362 | 362 | −0.307
24 | 363 | 363 | −0.048
Table 2. Test statistic M(i), for i = 1, …, 7, using CVX data.
Order | 1 | 2 | 3 | 4 | 5 | 6 | 7
M(i) | −0.3826 | −1.0011 | 6.9551 | 2.9495 | −1.6675 | −1.9619 | 8.9958
Table 3. Summary of the curvature-based diagnostic analytics.
ID | Time | CVX Return | Flagged under Case-Weights, Data, Variance, Skewness
54 | 2008-01-18 | −8.286% | *
79 | 2008-07-11 | −6.687% | *
87 | 2008-09-05 | −7.329% | *
91 | 2008-10-03 | −9.109% | *
92 | 2008-10-10 | −31.674% | * * * *
95 | 2008-10-31 | 15.4665% | * *
96 | 2008-11-07 | −1.54% | *
97 | 2008-11-14 | −1.067% | *
98 | 2008-11-21 | −3.06% | *
99 | 2008-11-28 | 11.4104% | *
101 | 2008-12-12 | 5.9723% | *
102 | 2008-12-19 | −10.888% | * * * *
103 | 2008-12-26 | −0.708% | *
104 | 2009-01-02 | 8.4069% | *
105 | 2009-01-09 | −4.956% | * *
110 | 2009-02-13 | −7.152% | *
113 | 2009-03-06 | −4.102% | *
181 | 2010-06-25 | −7.505% | *
246 | 2011-09-23 | −10.154% | * *
254 | 2011-11-18 | −8.955% | *
256 | 2011-12-02 | 9.6993% | *
257 | 2011-12-09 | 2.4863% | *

Liu, Y.; Mao, G.; Leiva, V.; Liu, S.; Tapia, A. Diagnostic Analytics for an Autoregressive Model under the Skew-Normal Distribution. Mathematics 2020, 8, 693. https://doi.org/10.3390/math8050693
