Article

Robust Estimation for the Single Index Model Using Pseudodistances

1 Department of Applied Mathematics, Bucharest Academy of Economic Studies, 010374 Bucharest, Romania
2 “Gh. Mihoc - C. Iacob” Institute of Mathematical Statistics and Applied Mathematics, Romanian Academy, 010071 Bucharest, Romania
* Author to whom correspondence should be addressed.
Submission received: 31 March 2018 / Revised: 11 May 2018 / Accepted: 14 May 2018 / Published: 17 May 2018

Abstract: For portfolios with a large number of assets, the single index model allows the large number of covariances between individual asset returns to be expressed through a significantly smaller number of parameters. This avoids the need for very large samples to estimate the mean and the covariance matrix of the asset returns, which in practice would be unrealistic given the dynamics of market conditions. The traditional way to estimate the regression parameters in the single index model is the maximum likelihood method. Although the maximum likelihood estimators have desirable theoretical properties when the model is exactly satisfied, they may give completely erroneous results when outliers are present in the data set. In this paper, we define minimum pseudodistance estimators for the parameters of the single index model and use them to construct new robust optimal portfolios. We prove theoretical properties of the estimators, such as consistency, asymptotic normality, equivariance and robustness, and illustrate the benefits of the new portfolio optimization method for real financial data.

1. Introduction

The problem of portfolio optimization in the mean-variance approach depends on a large number of parameters that need to be estimated on the basis of relatively small samples. Due to the dynamics of market conditions, only a short period of market history can be used for estimation of the model’s parameters. In order to reduce the number of parameters that need to be estimated, the single index model proposed by Sharpe (see [1,2]) can be used. The traditional estimators for parameters of the single index model are based on the maximum likelihood method. These estimators have optimal properties for normally distributed variables, but they may give completely erroneous results in the presence of outlying observations. Since the presence of outliers in financial asset returns is a frequently occurring phenomenon, robust estimates for the parameters of the single index model are necessary in order to provide robust and optimal portfolios.
Our contribution to robust portfolio optimization through the single index model is based on using minimum pseudodistance estimators.
The interest in statistical methods based on information measures, and particularly on divergences, has grown substantially in recent years. It is a known fact that, for a wide variety of models, statistical methods based on divergence measures have some optimal properties in relation to efficiency, but especially in relation to robustness, representing viable alternatives to the classical methods. We refer to the monographs of Pardo [3] and Basu et al. [4] for an excellent presentation of such methods, of their importance and of their applications.
The minimum pseudodistance methods for estimation belong to the same category as the minimum divergence methods. The minimum divergence estimators are defined by minimizing some appropriate divergence between the assumed theoretical model and the true model corresponding to the data. Depending on the choice of the divergence, minimum divergence estimators can afford considerable robustness with a minimal loss of efficiency. The classical minimum divergence methods require nonparametric density estimation, which implies some difficulties such as the choice of the bandwidth. In order to avoid nonparametric density estimation in minimum divergence estimation methods, some proposals have been made in [5,6,7] and robustness properties of such estimators have been studied in [8,9].
The pseudodistances that we use in the present paper were originally introduced in [6], where they are called “type-0” divergences and where the corresponding minimum divergence estimators have been studied. They are also obtained (using a cross entropy argument) and extensively studied in [10], where they are called $\gamma$-divergences, and they are introduced in [11] in the context of decomposable pseudodistances. By its very definition, a pseudodistance satisfies two properties, namely nonnegativity and the fact that the pseudodistance between two probability measures equals zero if and only if the two measures are equal. Divergences are moreover characterized by the information processing property, i.e., by complete invariance with respect to statistically sufficient transformations of the observation space (see [11], p. 617). In general, a pseudodistance may not satisfy this property. We adopted the term pseudodistance for this reason, although the other terms above are also encountered in the literature. The minimum pseudodistance estimators for general parametric models have been presented in [12] and consist of minimizing an empirical version of a pseudodistance between the assumed theoretical model and the true model underlying the data. These estimators have the advantage of not requiring any prior smoothing and reconcile robustness with high efficiency, two goals that usually require distinct techniques.
In this paper, we define minimum pseudodistance estimators for the parameters of the single index model and use them to construct new robust optimal portfolios. We study properties of the estimators, such as consistency, asymptotic normality, robustness and equivariance, and illustrate the benefits of the proposed portfolio optimization method through examples with real financial data.
We mention that we define minimum pseudodistance estimators, and prove the corresponding theoretical properties, for the parameters of the simple linear regression model (12), associated with the single index model. However, in a very similar way, we can define minimum pseudodistance estimators and obtain the same theoretical results for the more general linear regression model $Y_j = X_j^T\beta + e_j$, $j = 1, \ldots, n$, where the errors $e_j$ are i.i.d. normal variables with mean zero and variance $\sigma^2$, $X_j = (X_{j1}, \ldots, X_{jp})^T$ is the vector of independent variables corresponding to the $j$-th observation and $\beta = (\beta_1, \ldots, \beta_p)^T$ represents the regression coefficients.
The rest of the paper is organized as follows. In Section 2, we present the problem of robust estimation for some portfolio optimization models. In Section 3, we present the proposed approach. We define minimum pseudodistance estimators for regression parameters corresponding to the single index model and obtain corresponding estimating equations. Some asymptotic properties and equivariance properties of these estimators are studied. The robustness issue for estimators is considered through the influence function analysis. Using minimum pseudodistance estimators, new optimal portfolios are defined. Section 4 presents numerical results illustrating the performance of the proposed methodology. Finally, the proofs of the theorems are provided in the Appendix A.

2. The Single Index Model

Portfolio selection represents the problem of allocating a given capital over a number of available assets in order to maximize the return of the investment while minimizing the risk. We consider a portfolio formed by a collection of $N$ assets. The returns of the assets are given by the random vector $X := (X_1, \ldots, X_N)^T$. Usually, it is supposed that $X$ follows a multivariate normal distribution $\mathcal{N}_N(\mu, \Sigma)$, with $\mu$ being the vector containing the mean returns of the assets and $\Sigma = (\sigma_{ij})$ the covariance matrix of the asset returns. Let $w := (w_1, \ldots, w_N)^T$ be the vector of weights associated with the portfolio, where $w_i$ is the proportion of capital invested in asset $i$. Then, the total return of the portfolio is given by the random variable
$$w^T X = w_1 X_1 + \cdots + w_N X_N.$$
The mean and the variance of the portfolio return are given by
$$R(w) := w^T \mu,$$
$$S(w) := w^T \Sigma w.$$
A classical approach for portfolio selection is the mean-variance optimization introduced by Markowitz [13]. For a given investor’s risk aversion $\lambda > 0$, the mean-variance optimization gives the optimal portfolio $w$ as the solution of the problem
$$\arg\max_{w}\left\{ R(w) - \frac{\lambda}{2} S(w) \right\},$$
with the constraint $w^T e_N = 1$, $e_N$ being the $N$-dimensional vector of ones. The solution of the optimization problem (4) is explicit, the optimal portfolio weights for a given value of $\lambda$ being
$$w = \frac{1}{\lambda}\,\Sigma^{-1}(\mu - \eta e_N),$$
where
$$\eta = \frac{e_N^T \Sigma^{-1}\mu - \lambda}{e_N^T \Sigma^{-1} e_N}.$$
This is the case when short selling is allowed. When short selling is not allowed, there is a supplementary constraint in the optimization problem, namely that all the weights $w_i$ are nonnegative.
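As a minimal illustration of the closed-form solution above (short selling allowed), the following NumPy sketch computes the optimal weights from already-estimated inputs; the names markowitz_weights and lam are ours and not from the paper.

```python
import numpy as np

def markowitz_weights(mu, Sigma, lam):
    """Closed-form mean-variance weights w = (1/lam) * Sigma^{-1} (mu - eta * e_N),
    with the budget constraint w^T e_N = 1 and short selling allowed."""
    mu, Sigma = np.asarray(mu, dtype=float), np.asarray(Sigma, dtype=float)
    N = len(mu)
    e = np.ones(N)
    Sigma_inv = np.linalg.inv(Sigma)
    # eta = (e^T Sigma^{-1} mu - lam) / (e^T Sigma^{-1} e)
    eta = (e @ Sigma_inv @ mu - lam) / (e @ Sigma_inv @ e)
    return (1.0 / lam) * Sigma_inv @ (mu - eta * e)
```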
Another classical approach for portfolio selection is to minimize the portfolio risk defined by the portfolio variance, under given constraints. This means determining the optimal portfolio w as a solution of the optimization problem
$$\arg\min_{w} S(w),$$
subject to $R(w) = w^T\mu \geq \mu_0$, for a given value $\mu_0$ of the portfolio return.
However, the mean-variance analysis has been criticized for being sensitive to estimation errors of the mean and the covariance of the asset returns. For both optimization problems above, estimates of the input parameters $\mu$ and $\Sigma$ are necessary. The quality, and hence the usefulness, of the results of the portfolio optimization problem critically depends on the quality of the statistical estimates of these input parameters. The mean vector and the covariance matrix of the returns are in practice estimated by the maximum likelihood estimators under the multivariate normal assumption. When the model is exactly satisfied, the maximum likelihood estimators have optimal properties, being the most efficient. On the other hand, in the presence of outlying observations, these estimators may give completely erroneous results and, consequently, the weights of the corresponding optimal portfolio may be completely misleading. It is a known fact that outliers frequently occur in asset returns, where an outlier is defined to be an unusually large value well separated from the bulk of the returns. Therefore, robust alternatives to the classical approaches need to be carefully analyzed.
For an overview of robust methods for portfolio optimization using robust estimators of the mean and covariance matrix in Markowitz’s model, we refer to [14]. We also cite the methods proposed by Vaz-de Melo and Camara [15], Perret-Gentil and Victoria-Feser [16], Welsch and Zhou [17], DeMiguel and Nogales [18], and Toma and Leoni-Aubin [19].
On the other hand, in portfolio analysis, one is sometimes faced with two conflicting demands. Good quality statistical estimates require a large sample size. When estimating the covariance matrix, the sample size must be larger than the number of different elements of the matrix. For example, for a portfolio involving 100 securities, this would mean observations from 5050 trading days, which is about 20 years. From a practical point of view, considering such large samples is not adequate for the considered problem. Since the market conditions change rapidly, very old observations would lead to irrelevant estimates for the current or future market conditions. In addition, in some situations, the number of assets could even be much larger than the sample size of exploitable historical data. Therefore, estimating the covariance matrix of asset returns is challenging due to the high dimensionality and also to the heavy-tailedness of asset return data. It is a known fact that extreme events are typical in financial asset prices, leading to heavy-tailed asset returns. One way to treat these problems is to use the single index model.
The single index model (see [1]) allows us to express the large number of covariances between the returns of the individual assets through a significantly smaller number of parameters. This is possible under the hypothesis that the correlation between two assets is strictly given by their dependence on a common market index. The return of each asset i is expressed under the form
$$X_i = \alpha_i + \beta_i X_M + e_i,$$
where $X_M$ is the random variable representing the return of the market index, the $e_i$ are zero mean random variables representing error terms and $\alpha_i$, $\beta_i$ are new parameters to be estimated. It is supposed that the $e_i$’s are independent and also that the $e_i$’s are independent of $X_M$. Thus, $E(e_i) = 0$, $E(e_i e_j) = 0$ and $E(e_i X_M) = 0$ for all $i$ and all $j \neq i$.
The intercept $\alpha_i$ in Equation (8) represents the asset’s expected return when the market index return is zero. The slope coefficient $\beta_i$ represents the asset’s sensitivity to the index, namely the impact of a unit change in the return of the index. The error $e_i$ is the return variation that cannot be explained by the index.
The following notations are also used:
$$\sigma_i^2 := \mathrm{Var}(e_i), \qquad \mu_M := E(X_M), \qquad \sigma_M^2 := \mathrm{Var}(X_M).$$
Using Equation (8), the components of the parameters $\mu$ and $\Sigma$ from the models (4) and (7) are given by
$$\mu_i = \alpha_i + \beta_i \mu_M,$$
$$\sigma_{ii} = \beta_i^2 \sigma_M^2 + \sigma_i^2,$$
$$\sigma_{ij} = \beta_i \beta_j \sigma_M^2.$$
Both variances and covariances are determined by the assets’ betas and sigmas and by the standard deviation of the market index. Thus, the $N(N+1)/2$ different elements of the covariance matrix $\Sigma$ can be expressed by the $2N+1$ parameters $\beta_i$, $\sigma_i$, $\sigma_M$. This is a significant reduction of the number of parameters that need to be estimated.
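As a short sketch of how Equations (9)–(11) are used in practice, the helper below (an illustrative name, not from the paper) assembles estimates of μ and Σ from already-estimated single index parameters.

```python
import numpy as np

def single_index_moments(alpha, beta, sigma, mu_M, sigma_M):
    """Build mu and Sigma from single index parameters:
    mu_i = alpha_i + beta_i * mu_M, sigma_ii = beta_i^2 sigma_M^2 + sigma_i^2,
    sigma_ij = beta_i beta_j sigma_M^2 for i != j."""
    alpha, beta, sigma = (np.asarray(a, dtype=float) for a in (alpha, beta, sigma))
    mu = alpha + beta * mu_M
    Sigma = np.outer(beta, beta) * sigma_M**2        # systematic part beta_i beta_j sigma_M^2
    Sigma[np.diag_indices_from(Sigma)] += sigma**2   # add idiosyncratic variances on the diagonal
    return mu, Sigma
```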
The traditional estimators for parameters of the single index model are based on the maximum likelihood method. These estimators have optimal properties for normally distributed variables, but they may give completely erroneous results in the presence of outlying observations. Therefore, robust estimates for the parameters of the single index model are necessary in order to provide robust and optimal portfolios.

3. Robust Estimators for the Single Index Model and Robust Portfolios

3.1. Definitions of the Estimators

Consider the linear regression model
$$X = \alpha + \beta X_M + e.$$
Suppose we have i.i.d. two-dimensional random vectors $Z_j = (X_{Mj}, X_j)$, $j = 1, \ldots, n$, such that $X_j = \alpha + \beta X_{Mj} + e_j$. The random variables $e_j$, $j = 1, \ldots, n$, are i.i.d. $N(0, \sigma)$ and independent of the $X_{Mj}$, $j = 1, \ldots, n$.
The classical estimators of the unknown parameters $\alpha$, $\beta$, $\sigma$ of the linear regression model are the maximum likelihood estimators (MLE). The MLE perform well if the model hypotheses are satisfied exactly, but may otherwise perform poorly. It is well known that the MLE are not robust, since a small fraction of outliers, even a single outlier, may have an important effect, inducing significant errors in the estimates. Therefore, robust alternatives to the MLE should be considered in order to obtain robust estimates for the single index model, leading then to robust portfolio weights.
In order to robustly estimate the unknown parameters α , β , σ , suppressing the outsized effects of outliers, we use the approach based on pseudodistance minimization.
For two probability measures $P$, $Q$ admitting densities $p$, respectively $q$, with respect to the Lebesgue measure, we consider the following family of pseudodistances (also called $\gamma$-divergences in some articles) of order $\gamma > 0$,
$$R_\gamma(P, Q) := \frac{1}{\gamma+1}\ln\int p^\gamma\, dP + \frac{1}{\gamma(\gamma+1)}\ln\int q^\gamma\, dQ - \frac{1}{\gamma}\ln\int p^\gamma\, dQ,$$
satisfying the limit relation
$$R_\gamma(P, Q) \to R_0(P, Q) := \int \ln\frac{q}{p}\, dQ \quad \text{as } \gamma \downarrow 0.$$
Note that $R_0(P, Q)$ is the well-known modified Kullback–Leibler divergence. Minimum pseudodistance estimators for parametric models, using the family (13), have been studied in [6,10,11]. We also mention that the pseudodistances (13) have been used for defining optimal robust M-estimators with Hampel’s infinitesimal approach in [20].
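To make the definition concrete, the following minimal sketch evaluates R_γ(P, Q) for two univariate densities by numerical integration on a grid; the function name and the choice of normal densities are purely illustrative and are not part of the paper.

```python
import numpy as np
from scipy.stats import norm

def pseudodistance(p, q, grid, gamma):
    """Numerical R_gamma(P, Q) for two univariate densities p, q on a regular grid,
    following the definition of the gamma-pseudodistance for gamma > 0."""
    dx = grid[1] - grid[0]
    pg, qg = p(grid), q(grid)
    term1 = np.log(np.sum(pg**(gamma + 1)) * dx) / (gamma + 1)            # int p^gamma dP
    term2 = np.log(np.sum(qg**(gamma + 1)) * dx) / (gamma * (gamma + 1))  # int q^gamma dQ
    term3 = np.log(np.sum(pg**gamma * qg) * dx) / gamma                   # int p^gamma dQ
    return term1 + term2 - term3

grid = np.linspace(-10, 10, 4001)
print(pseudodistance(norm(0, 1).pdf, norm(0.5, 1).pdf, grid, gamma=0.5))  # positive
print(pseudodistance(norm(0, 1).pdf, norm(0, 1).pdf, grid, gamma=0.5))    # approximately zero
```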
For the linear regression model, we consider the joint distribution of the entire data, the explanatory variable $X_M$ being random together with the response variable $X$, and write a pseudodistance between a theoretical model and the data. Let $P_\theta$, with $\theta := (\alpha, \beta, \sigma)$, be the probability measure associated with the theoretical model given by the random vector $(X_M, X)$, where $X = \alpha + \beta X_M + e$ with $e \sim N(0, \sigma)$, $e$ independent of $X_M$, and let $Q$ be the probability measure associated with the data. Denote by $p_\theta$, respectively $q$, the corresponding densities. For $\gamma > 0$, the pseudodistance between $P_\theta$ and $Q$ is defined by
$$R_\gamma(P_\theta, Q) := \frac{1}{\gamma+1}\ln\int p_\theta^\gamma(x_M, x)\, dP_\theta(x_M, x) + \frac{1}{\gamma(\gamma+1)}\ln\int q^\gamma(x_M, x)\, dQ(x_M, x) - \frac{1}{\gamma}\ln\int p_\theta^\gamma(x_M, x)\, dQ(x_M, x).$$
Using the change of variables $(x_M, x) \mapsto (u, v) := (x_M, x - \alpha - \beta x_M)$ and taking into account that $f(u, v) := p_\theta(u, v + \alpha + \beta u)$ is the density of $(X_M, e)$, since $X_M$ and $e$ are independent, we can write
$$\int p_\theta^\gamma(x_M, x)\, dP_\theta(x_M, x) = \int p_M^{\gamma+1}(u)\, du \cdot \int \phi_\sigma^{\gamma+1}(v)\, dv,$$
$$\int p_\theta^\gamma(x_M, x)\, dQ(x_M, x) = \int p_M^\gamma(x_M)\, \phi_\sigma^\gamma(x - \alpha - \beta x_M)\, dQ(x_M, x),$$
where $p_M$ is the density of $X_M$ and $\phi_\sigma$ is the density of the random variable $e \sim N(0, \sigma)$. Then,
$$R_\gamma(P_\theta, Q) = \frac{1}{\gamma+1}\ln\int p_M^{\gamma+1}(u)\, du + \frac{1}{\gamma+1}\ln\int \phi_\sigma^{\gamma+1}(v)\, dv + \frac{1}{\gamma(\gamma+1)}\ln\int q^\gamma(x_M, x)\, dQ(x_M, x) - \frac{1}{\gamma}\ln\int p_M^\gamma(x_M)\, \phi_\sigma^\gamma(x - \alpha - \beta x_M)\, dQ(x_M, x).$$
Notice that the first and the third terms in the pseudodistance $R_\gamma(P_\theta, Q)$ do not depend on $\theta$ and hence are not included in the minimization process. The parameter $\theta_0 := (\alpha_0, \beta_0, \sigma_0)$ of interest is then given by
$$(\alpha_0, \beta_0, \sigma_0) := \arg\min_{\alpha, \beta, \sigma} R_\gamma(P_\theta, Q) = \arg\min_{\alpha, \beta, \sigma}\left\{\frac{1}{\gamma+1}\ln\int \phi_\sigma^{\gamma+1}(v)\, dv - \frac{1}{\gamma}\ln\int p_M^\gamma(x_M)\, \phi_\sigma^\gamma(x - \alpha - \beta x_M)\, dQ(x_M, x)\right\}.$$
Suppose now that an i.i.d. sample $Z_1, \ldots, Z_n$ is available from the true model. For a given $\gamma > 0$, we define a minimum pseudodistance estimator of $\theta_0 = (\alpha_0, \beta_0, \sigma_0)$ by minimizing an empirical version of the objective function in Equation (17). This empirical version is obtained by replacing $p_M(x_M)$ with the empirical density function $\hat p_M(x_M) = \frac{1}{n}\sum_{i=1}^n \delta(x_M - X_{Mi})$, where $\delta(\cdot)$ is the Dirac delta function, and $Q$ with the empirical measure $P_n$ corresponding to the sample. More precisely, we define $\hat\theta := (\hat\alpha, \hat\beta, \hat\sigma)$ by
$$(\hat\alpha, \hat\beta, \hat\sigma) := \arg\min_{\alpha, \beta, \sigma}\left\{\frac{1}{\gamma+1}\ln\int \phi_\sigma^{\gamma+1}(v)\, dv - \frac{1}{\gamma}\ln\int \hat p_M^{\,\gamma}(x_M)\, \phi_\sigma^\gamma(x - \alpha - \beta x_M)\, dP_n(x_M, x)\right\} = \arg\min_{\alpha, \beta, \sigma}\left\{\frac{1}{\gamma+1}\ln\int \phi_\sigma^{\gamma+1}(v)\, dv - \frac{1}{\gamma}\ln\left(\frac{1}{n^{\gamma+1}}\sum_{j=1}^n \phi_\sigma^\gamma(X_j - \alpha - \beta X_{Mj})\right)\right\},$$
or equivalently
$$(\hat\alpha, \hat\beta, \hat\sigma) = \arg\max_{\alpha, \beta, \sigma}\ \frac{\sum_{j=1}^n \phi_\sigma^\gamma(X_j - \alpha - \beta X_{Mj})}{\left[\int \phi_\sigma^{\gamma+1}(v)\, dv\right]^{\gamma/(\gamma+1)}} = \arg\max_{\alpha, \beta, \sigma}\ \sum_{j=1}^n \sigma^{-\gamma/(\gamma+1)}\exp\left(-\frac{\gamma}{2}\left(\frac{X_j - \alpha - \beta X_{Mj}}{\sigma}\right)^2\right).$$
Differentiating with respect to $\alpha$, $\beta$, $\sigma$, the estimators $\hat\alpha$, $\hat\beta$, $\hat\sigma$ are solutions of the system
$$\sum_{j=1}^n \exp\left(-\frac{\gamma}{2}\left(\frac{X_j - \alpha - \beta X_{Mj}}{\sigma}\right)^2\right)\frac{X_j - \alpha - \beta X_{Mj}}{\sigma} = 0,$$
$$\sum_{j=1}^n \exp\left(-\frac{\gamma}{2}\left(\frac{X_j - \alpha - \beta X_{Mj}}{\sigma}\right)^2\right)\frac{X_j - \alpha - \beta X_{Mj}}{\sigma}\, X_{Mj} = 0,$$
$$\sum_{j=1}^n \exp\left(-\frac{\gamma}{2}\left(\frac{X_j - \alpha - \beta X_{Mj}}{\sigma}\right)^2\right)\left[\left(\frac{X_j - \alpha - \beta X_{Mj}}{\sigma}\right)^2 - \frac{1}{\gamma+1}\right] = 0.$$
Note that, for $\gamma = 0$, the solution of this system is nothing but the maximum likelihood estimator of $(\alpha, \beta, \sigma)$. Therefore, the estimating Equations (19)–(21) are generalizations of the maximum likelihood score equations. The tuning parameter $\gamma$ associated with the pseudodistance controls the trade-off between robustness and efficiency of the minimum pseudodistance estimators.
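As an illustration of how the estimates could be computed in practice, the sketch below maximizes the objective function above with a general-purpose optimizer instead of solving the estimating equations directly; the function name mp_regression, the use of SciPy and the least squares starting point are our assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.optimize import minimize

def mp_regression(x_m, x, gamma):
    """Minimum pseudodistance fit of x = alpha + beta * x_m + e, e ~ N(0, sigma),
    by maximizing sum_j sigma^(-gamma/(gamma+1)) * exp(-gamma/2 * r_j^2),
    with r_j = (x_j - alpha - beta * x_mj) / sigma."""
    x_m, x = np.asarray(x_m, dtype=float), np.asarray(x, dtype=float)

    def neg_objective(params):
        a, b, log_s = params
        s = np.exp(log_s)          # parametrize sigma through its log to keep it positive
        r = (x - a - b * x_m) / s
        return -np.sum(s**(-gamma / (gamma + 1)) * np.exp(-0.5 * gamma * r**2))

    # start from the ordinary least squares fit (gamma = 0 corresponds to the MLE)
    b0, a0 = np.polyfit(x_m, x, 1)
    s0 = np.std(x - a0 - b0 * x_m)
    res = minimize(neg_objective, (a0, b0, np.log(s0)), method="Nelder-Mead")
    a, b, log_s = res.x
    return a, b, np.exp(log_s)
```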
We can also write that $\hat\theta = (\hat\alpha, \hat\beta, \hat\sigma)$ is a solution of
$$\sum_{j=1}^n \Psi(Z_j, \hat\theta) = 0 \quad\text{or}\quad \int \Psi(z, \hat\theta)\, dP_n(z) = 0,$$
where
$$\Psi(z, \theta) = \left(\phi\!\left(\frac{x - \alpha - \beta x_M}{\sigma}\right),\ \phi\!\left(\frac{x - \alpha - \beta x_M}{\sigma}\right) x_M,\ \chi\!\left(\frac{x - \alpha - \beta x_M}{\sigma}\right)\right)^T,$$
with $z = (x_M, x)$, $\theta = (\alpha, \beta, \sigma)$, $\phi(t) = \exp\left(-\frac{\gamma}{2}t^2\right) t$ and $\chi(t) = \exp\left(-\frac{\gamma}{2}t^2\right)\left[t^2 - \frac{1}{\gamma+1}\right]$.
When the measure $Q$ corresponding to the data pertains to the theoretical model, hence $Q = P_{\theta_0}$, it holds that
$$\int \Psi(z, \theta_0)\, dP_{\theta_0}(z) = 0.$$
Thus, we can consider $\hat\theta = (\hat\alpha, \hat\beta, \hat\sigma)$ as a Z-estimator of $\theta_0 = (\alpha_0, \beta_0, \sigma_0)$, which allows asymptotic results from the general theory of Z-estimators (see [21]) to be adapted to the present context.
Remark 1.
In the case when the density $p_M$ is known, a new class of estimators of $(\alpha_0, \beta_0, \sigma_0)$ can be obtained by replacing $Q$ with the empirical measure $P_n$ in Equation (17). These estimators can also be written in the form of Z-estimators, using the same reasoning as above. The results of Theorems 1–4 below could be adapted for these new estimators and, moreover, all the influence functions of these estimators would be redescending and bounded. However, in practice, the density of the index return is not known. Therefore, we work with the class of minimum pseudodistance estimators as defined above.

3.2. Asymptotic Properties

In order to prove the consistency of the estimators, we use their definition (22) as Z-estimators.

3.2.1. Consistency

Theorem 1.
Assume that, for any $\varepsilon > 0$, the following separability condition for the solution holds:
$$\inf_{\theta \in M}\left\|\int \Psi(z, \theta)\, dP_{\theta_0}(z)\right\| > 0 = \left\|\int \Psi(z, \theta_0)\, dP_{\theta_0}(z)\right\|,$$
where $M := \{\theta : \|\theta - \theta_0\| \geq \varepsilon\}$. Then, $\hat\theta = (\hat\alpha, \hat\beta, \hat\sigma)$ converges in probability to $\theta_0 = (\alpha_0, \beta_0, \sigma_0)$.

3.2.2. Asymptotic Normality

Assume that $Z_1, \ldots, Z_n$ are i.i.d. two-dimensional random vectors having the common probability distribution $P_{\theta_0}$. For $\gamma > 0$ fixed, let $\hat\theta = (\hat\alpha, \hat\beta, \hat\sigma)$ be a sequence of estimators of the unknown parameter $\theta_0 = (\alpha_0, \beta_0, \sigma_0)$, solution of
$$\sum_{j=1}^n \Psi(Z_j, \hat\theta) = 0,$$
where
$$\Psi(z, \theta) = \left(\sigma^2\phi\!\left(\frac{x - \alpha - \beta x_M}{\sigma}\right),\ \sigma^2\phi\!\left(\frac{x - \alpha - \beta x_M}{\sigma}\right) x_M,\ \sigma^2\chi\!\left(\frac{x - \alpha - \beta x_M}{\sigma}\right)\right)^T,$$
with $z = (x_M, x)$, $\theta = (\alpha, \beta, \sigma)$, $\phi(t) = \exp\left(-\frac{\gamma}{2}t^2\right) t$ and $\chi(t) = \exp\left(-\frac{\gamma}{2}t^2\right)\left[t^2 - \frac{1}{\gamma+1}\right]$. Note that the estimators $\hat\theta = (\hat\alpha, \hat\beta, \hat\sigma)$ defined by Equations (19)–(21), or equivalently by (22), are also solutions of the system (26). Using the function (27) to define the estimators allows the asymptotic normality to be obtained by imposing only the consistency of the estimators, without the supplementary assumptions that are usually imposed in the case of Z-estimators.
Theorem 2.
Assume that $\hat\theta \to \theta_0$ in probability. Then,
$$\sqrt{n}\,(\hat\theta - \theta_0) \to \mathcal{N}_3\!\left(0,\ B^{-1} A (B^{-1})^T\right)$$
in distribution, where $A = E\left(\Psi(Z, \theta_0)\,\Psi(Z, \theta_0)^T\right)$ and $B = E\left(\dot\Psi(Z, \theta_0)\right)$, with $\Psi$ defined by (27) and $\dot\Psi$ being the matrix with elements $\dot\Psi_{ik} = \partial\Psi_i/\partial\theta_k$.
After some calculations, we obtain the asymptotic covariance matrix of $\hat\theta$ in the form
$$\frac{\sigma_0^2\,(\gamma+1)^3}{(2\gamma+1)^{3/2}}\begin{pmatrix} \dfrac{\mu_M^2 + \sigma_M^2}{\sigma_M^2} & -\dfrac{\mu_M}{\sigma_M^2} & 0 \\[4pt] -\dfrac{\mu_M}{\sigma_M^2} & \dfrac{1}{\sigma_M^2} & 0 \\[4pt] 0 & 0 & \dfrac{3\gamma^2 + 4\gamma + 2}{4(2\gamma+1)} \end{pmatrix}.$$
It follows that $\hat\beta$ and $\hat\sigma$ are asymptotically independent; in addition, $\hat\alpha$ and $\hat\sigma$ are asymptotically independent.
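As an illustration, the asymptotic covariance matrix above can be turned into approximate standard errors by plugging in estimates of σ₀, μ_M and σ_M and dividing by the sample size; the sketch below, with purely illustrative input values, is not part of the paper.

```python
import numpy as np

def asymptotic_cov(sigma0, mu_M, sigma_M, gamma, n):
    """Approximate covariance of (alpha_hat, beta_hat, sigma_hat): the asymptotic
    covariance matrix displayed above, divided by the sample size n."""
    factor = sigma0**2 * (gamma + 1)**3 / (2 * gamma + 1)**1.5
    V = factor * np.array([
        [(mu_M**2 + sigma_M**2) / sigma_M**2, -mu_M / sigma_M**2, 0.0],
        [-mu_M / sigma_M**2,                   1.0 / sigma_M**2,  0.0],
        [0.0, 0.0, (3 * gamma**2 + 4 * gamma + 2) / (4 * (2 * gamma + 1))],
    ])
    return V / n

# illustrative values only
se_alpha, se_beta, se_sigma = np.sqrt(np.diag(asymptotic_cov(0.5, 0.0, 1.0, 0.5, 200)))
```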

3.3. Influence Functions

In order to describe the stability properties of the estimators, we use the following well-known concepts from the theory of robust statistics. A map $T$, defined on a set of probability measures and taking values in the parameter space, is a statistical functional corresponding to an estimator $\hat\theta$ of the parameter $\theta$ if $\hat\theta = T(P_n)$, $P_n$ being the empirical measure pertaining to the sample. The influence function of $T$ at $P_\theta$ is defined by
$$\mathrm{IF}(z; T, P_\theta) := \left.\frac{\partial T(\tilde P_{\varepsilon z})}{\partial \varepsilon}\right|_{\varepsilon = 0},$$
where $\tilde P_{\varepsilon z} := (1 - \varepsilon) P_\theta + \varepsilon\, \delta_z$, $\delta_z$ being the Dirac measure putting all its mass at $z$. As a consequence, the influence function describes the linearized asymptotic bias of a statistic under a single point contamination of the model $P_\theta$. An unbounded influence function implies an unbounded asymptotic bias of a statistic under single point contamination of the model. Therefore, a natural robustness requirement on a statistical functional is the boundedness of its influence function.
For $\gamma > 0$ fixed and a given probability measure $P$, the statistical functionals $\alpha(P)$, $\beta(P)$ and $\sigma(P)$, corresponding to the minimum pseudodistance estimators $\hat\alpha$, $\hat\beta$ and $\hat\sigma$, are defined as the solution of the system
$$\int \Psi(z, T(P))\, dP(z) = 0,$$
with $\Psi$ defined by (23) and $T(P) := (\alpha(P), \beta(P), \sigma(P))$, whenever this solution exists.
When $P = P_\theta$ corresponds to the considered theoretical model, the solution of the system (29) is $T(P_\theta) = \theta = (\alpha, \beta, \sigma)$.
Theorem 3.
The influence functions corresponding to the estimators $\hat\alpha$, $\hat\beta$ and $\hat\sigma$ are respectively given by
$$\mathrm{IF}(x_{M0}, x_0; \alpha, P_\theta) = \sigma(\gamma+1)^{3/2}\,\phi\!\left(\frac{x_0 - \alpha - \beta x_{M0}}{\sigma}\right)\left[1 - \frac{\left(x_{M0} - E(X_M)\right) E(X_M)}{\mathrm{Var}(X_M)}\right],$$
$$\mathrm{IF}(x_{M0}, x_0; \beta, P_\theta) = \sigma(\gamma+1)^{3/2}\,\phi\!\left(\frac{x_0 - \alpha - \beta x_{M0}}{\sigma}\right)\frac{x_{M0} - E(X_M)}{\mathrm{Var}(X_M)},$$
$$\mathrm{IF}(x_{M0}, x_0; \sigma, P_\theta) = \frac{\sigma(\gamma+1)^{5/2}}{2}\,\chi\!\left(\frac{x_0 - \alpha - \beta x_{M0}}{\sigma}\right).$$
Since $\chi$ is redescending, $\hat\sigma$ has a bounded influence function and hence is a redescending B-robust estimator. On the other hand, $\mathrm{IF}(x_{M0}, x_0; \alpha, P_\theta)$ and $\mathrm{IF}(x_{M0}, x_0; \beta, P_\theta)$ tend to infinity only when $x_{M0}$ tends to infinity while the standardized residual satisfies $\left|\frac{x_0 - \alpha - \beta x_{M0}}{\sigma}\right| \leq k$ for some $k$. Hence, these influence functions are bounded with respect to vertical outliers and with respect to leverage points (outlying values of the independent variable) that are accompanied by large residuals. This means that large outliers with respect to $x_M$, or with respect to $x$, that do not follow the regression line will have a reduced influence on the estimates. However, the influence functions are clearly unbounded for $\gamma = 0$, which corresponds to the non-robust maximum likelihood estimators.
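The sketch below simply evaluates the influence functions (30)–(32) at a contamination point, which makes it easy to check numerically that a point with a large standardized residual has a vanishing influence when γ > 0; all numerical inputs are illustrative.

```python
import numpy as np

def influence_functions(x_m0, x_0, alpha, beta, sigma, mu_M, var_M, gamma):
    """Evaluate the influence functions of alpha_hat, beta_hat and sigma_hat at the
    contamination point (x_m0, x_0), using the redescending functions phi and chi."""
    r = (x_0 - alpha - beta * x_m0) / sigma
    phi = np.exp(-0.5 * gamma * r**2) * r
    chi = np.exp(-0.5 * gamma * r**2) * (r**2 - 1.0 / (gamma + 1))
    if_alpha = sigma * (gamma + 1)**1.5 * phi * (1 - (x_m0 - mu_M) * mu_M / var_M)
    if_beta = sigma * (gamma + 1)**1.5 * phi * (x_m0 - mu_M) / var_M
    if_sigma = sigma * (gamma + 1)**2.5 / 2 * chi
    return if_alpha, if_beta, if_sigma

# a point far from the regression line (large residual) has almost no influence for gamma > 0
print(influence_functions(x_m0=2.0, x_0=50.0, alpha=0.0, beta=1.0, sigma=1.0,
                          mu_M=0.0, var_M=1.0, gamma=0.5))
```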

3.4. Equivariance of the Regression Coefficients’ Estimators

If an estimator is equivariant, it means that it transforms “properly” in some sense. Rousseeuw and Leroy [22] (p. 116) discuss three important equivariance properties for a regression estimator: regression equivariance, scale equivariance and affine equivariance. These are desirable properties, since they allow one to know how the estimates change under different types of transformations of the data. Regression equivariance means that any additional linear dependence is reflected in the regression vector accordingly. Regression equivariance is routinely used when studying regression estimators; it allows one to assume, without loss of generality, any value for the parameter $(\alpha, \beta)$ when proving asymptotic properties or describing Monte Carlo studies. An estimator being scale equivariant means that the fit it produces is independent of the choice of measurement unit for the response variable. Affine equivariance is useful because it means that changing to a different coordinate system for the explanatory variable will not affect the estimate. It is known that the maximum likelihood estimator of the regression coefficients satisfies all three properties. We show that the minimum pseudodistance estimators of the regression coefficients satisfy all three equivariance properties, for all $\gamma > 0$.
Theorem 4.
For all $\gamma > 0$, the minimum pseudodistance estimators $(\hat\alpha, \hat\beta)^T$ of the regression coefficients $(\alpha, \beta)^T$ are regression equivariant, scale equivariant and affine equivariant.
On the other hand, the objective function in the definition of the estimators depends on data only through the summation
$$\sum_{j=1}^n \sigma^{-\gamma/(\gamma+1)}\exp\left(-\frac{\gamma}{2}\left(\frac{X_j - \alpha - \beta X_{Mj}}{\sigma}\right)^2\right),$$
which is permutation invariant. Thus, the corresponding estimators of the regression coefficients and of the error standard deviation are permutation invariant; therefore, the ordering of the data does not affect the estimators.
The minimum pseudodistance estimators are also equivariant with respect to reparametrizations. If $\theta = (\alpha, \beta, \sigma)$ and the model is reparametrized to $\Upsilon = \Upsilon(\theta)$ with a one-to-one transformation, then the minimum pseudodistance estimator of $\Upsilon$ is simply $\hat\Upsilon = \Upsilon(\hat\theta)$, in terms of the minimum pseudodistance estimator $\hat\theta$ of $\theta$, for the same $\gamma$.

3.5. Robust Portfolios Using Minimum Pseudodistance Estimators

The robust estimation of the parameters $\alpha_i$, $\beta_i$, $\sigma_i$ from the single index model given by Equation (8), using minimum pseudodistance estimators, together with the robust estimation of $\mu_M$ and $\sigma_M$, leads to robust estimates of $\mu$ and $\Sigma$ on the basis of relations (9)–(11). Since we do not model the explanatory variable $X_M$ in a specific way, we estimate $\mu_M$ and the standard deviation $\sigma_M$ using the median and the median absolute deviation as robust estimators, respectively. Then, the portfolio weights, obtained as solutions of the optimization problems (4) or (7) with robustly estimated input parameters, will also be robust. This methodology leads to new optimal robust portfolios. In the next section, on the basis of real financial data, we illustrate this new methodology and compare it with the traditional method based on maximum likelihood estimators.

4. Applications

4.1. Comparisons of the Minimum Pseudodistance Estimators with Other Robust Estimators for the Linear Regression Model

In order to illustrate the performance of the minimum pseudodistance estimators for the simple linear regression model, we compare them with the least median of squares (LMS) estimator (see [22,23]), with S-estimators (SE) (see [24]) and with the minimum density power divergence (MDPD) estimators (see [25]), all of which are known to behave well from the robustness point of view.
We considered a data set that comes from astronomy, namely the data from the Hertzsprung–Russell diagram of the star cluster CYG OB1, containing 47 stars in the direction of Cygnus. For these data, the independent variable is the logarithm of the effective temperature at the surface of the star and the dependent variable is the logarithm of its light intensity. The data are given in Rousseeuw and Leroy [22] (p. 27), who underlined that there are two groups of points: the majority, following a steep band, and four stars clearly forming a separate group from the rest of the data. These four stars are known as giants in astronomy. Thus, these outliers are not recording errors, but represent leverage points coming from a different group.
The estimates of the regression coefficients and of the error standard deviation obtained with the minimum pseudodistance estimators for several values of $\gamma$ are given in Table 1 and some of the fitted models are plotted in Figure 1. For comparison, in Table 1 we also give the estimates obtained with S-estimators based on the Tukey biweight function, taken from [24], as well as the estimates obtained with minimum density power divergence methods for several values of the tuning parameter and the estimates obtained with the least median of squares method, all taken from [25]. The MLE estimates, given on the first line of Table 1, are significantly affected by the four leverage points. On the other hand, like the robust least median of squares estimator, the robust S-estimators and some of the minimum density power divergence estimators, the minimum pseudodistance estimators with $\gamma \geq 0.32$ can successfully ignore the outliers. In addition, the minimum pseudodistance estimators with $\gamma \geq 0.5$ give robust fits that are closer to the fits generated by the least median of squares estimates or by the S-estimates than the fits generated by the minimum density power divergence estimates.

4.2. Robust Portfolios Using Minimum Pseudodistance Estimators

In order to illustrate the performance of the proposed robust portfolio optimization method, we considered real data sets for the Russell 2000 index and for 50 stocks from its components. The stocks are listed in Appendix B. We selected daily return data for the Russell 2000 index and for all these stocks from 2 January 2013 to 30 June 2016. The data were retrieved from Yahoo Finance.
The data were divided by quarter, giving in total 14 quarters for the index and for each stock. For each quarter, on the basis of the data corresponding to the index, we estimated $\mu_M$ and the standard deviation $\sigma_M$ using the median (MED) and the median absolute deviation (MAD) as robust estimators, the latter defined by
$$\mathrm{MAD} := \frac{1}{0.6745}\cdot \mathrm{MED}\left(\left|X_i - \mathrm{MED}(X_i)\right|\right).$$
We also estimated $\mu_M$ and $\sigma_M$ classically, using the sample mean and the sample standard deviation. Then, for each quarter and each of the 50 stocks, we estimated $\alpha$, $\beta$ and $\sigma$ from the regression model using the robust minimum pseudodistance estimators and, respectively, the classical MLE. Then, on the basis of relations (9), (10) and (11), we estimated $\mu$ and $\Sigma$, first using the robust estimates and then the classical estimates, all being previously computed.
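A two-line sketch of these robust estimates of μ_M and σ_M (the factor 1/0.6745 makes the MAD consistent for the standard deviation under normality); the function name is illustrative.

```python
import numpy as np

def robust_index_moments(index_returns):
    """Median and MAD estimates of mu_M and sigma_M for the index returns."""
    r = np.asarray(index_returns, dtype=float)
    med = np.median(r)
    mad = np.median(np.abs(r - med)) / 0.6745
    return med, mad
```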
Once the input parameters for the portfolio optimization procedure were estimated, for each quarter we determined efficient frontiers, for both the robust estimates and the classical estimates. In both cases, the efficient frontier is determined as follows. Firstly, the range of returns is determined as the interval between the return of the portfolio of global minimum risk (variance) and the maximum value of the return of a feasible portfolio, where the feasible region is
$$X = \left\{ w \in \mathbb{R}^N : w^T e_N = 1,\ w_k \geq 0,\ k \in \{1, \ldots, 50\} \right\}$$
and $N = 50$. We trace each efficient frontier at 100 points; therefore, the range of returns is divided, in each case, into ninety-nine sub-intervals with
$$\mu_1 < \mu_2 < \cdots < \mu_{100},$$
where $\mu_1$ is the return of the portfolio of global minimum variance and $\mu_{100}$ is the maximum return over the feasible region $X$. We determined $\mu_1$ and $\mu_{100}$ using the robust estimates of $\mu$ and $\Sigma$ (for the robust frontier) and then using the classical estimates (for the classical frontier). In each case, 100 optimization problems are solved:
$$\arg\min_{w \in \mathbb{R}^N} S(w) \quad \text{subject to} \quad w_k \geq 0,\ k \in \{1, \ldots, 50\}, \qquad w^T e_N = 1, \qquad R(w) \geq \mu_i,$$
where $i \in \{1, \ldots, 100\}$.
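The sketch below shows one way to carry out this step numerically, solving the 100 long-only minimum variance problems with SciPy's SLSQP solver; the names (efficient_frontier, mu, Sigma) are generic, and this is only an illustration of the constrained problems above, not the authors' MATLAB implementation.

```python
import numpy as np
from scipy.optimize import minimize

def efficient_frontier(mu, Sigma, n_points=100):
    """Long-only efficient frontier: for each target return, minimize w^T Sigma w
    subject to w >= 0, sum(w) = 1 and w^T mu >= target."""
    mu, Sigma = np.asarray(mu, dtype=float), np.asarray(Sigma, dtype=float)
    N = len(mu)
    w0 = np.ones(N) / N
    bounds = [(0.0, 1.0)] * N

    def min_variance(target=None):
        cons = [{"type": "eq", "fun": lambda w: np.sum(w) - 1.0}]
        if target is not None:
            cons.append({"type": "ineq", "fun": lambda w: w @ mu - target})
        res = minimize(lambda w: w @ Sigma @ w, w0, method="SLSQP",
                       bounds=bounds, constraints=cons)
        return res.x

    w_gmv = min_variance()                                  # global minimum variance portfolio
    targets = np.linspace(w_gmv @ mu, mu.max(), n_points)   # mu_1, ..., mu_100
    frontier = []
    for t in targets:
        w = min_variance(t)
        frontier.append((np.sqrt(w @ Sigma @ w), w @ mu))   # (risk, return) point on the frontier
    return np.array(frontier)
```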
In Figure 2, for eight quarters (the first four and the last four), we present the efficient frontiers corresponding to the optimal minimum variance portfolios based on the robust minimum pseudodistance estimates with $\gamma = 0.5$ and, respectively, on the classical estimates. On the horizontal axis we consider the portfolio risk (given by the portfolio standard deviation) and on the vertical axis we represent the portfolio return. We notice that, in comparison with the classical method based on MLE, the proposed robust method provides optimal portfolios that have higher returns for the same level of risk (standard deviation). Indeed, for each quarter, the robust frontier is situated above the classical one, the standard deviations of the robust portfolios being smaller than those of the classical portfolios for the same return. We obtained similar results for the other quarters and for other choices of the tuning parameter $\gamma$ of the minimum pseudodistance estimators.
We also illustrate the empirical performance of the proposed optimal portfolios through an out-of-sample analysis, using the Sharpe ratio as the out-of-sample measure. For this analysis, we apply a “rolling-horizon” procedure as presented in [18]. First, we choose a window over which to perform the estimation. We denote the length of the estimation window by $\tau < T$, where $T$ is the size of the entire data set. Then, using the data in the first estimation window, we compute the weights for the considered portfolios. We repeat this procedure for the next window, by including the data for the next day and dropping the data for the earliest day. We continue doing this until the end of the data set is reached. At the end of this process, we have generated $T - \tau$ portfolio weight vectors for each strategy, namely the vectors $w_t^k$ for $t \in \{\tau, \ldots, T-1\}$, with $k$ denoting the strategy. For a strategy $k$, $w_t^k$ has components $w_{j,t}^k$, where $w_{j,t}^k$ denotes the portfolio weight in asset $j$ chosen at time $t$.
The out-of-sample return at time $t+1$, corresponding to strategy $k$, is defined as $(w_t^k)^T X_{t+1}$, with $X_{t+1} := (X_{1,t+1}, \ldots, X_{N,t+1})^T$ representing the data at time $t+1$. For each strategy $k$, using these out-of-sample returns, the out-of-sample mean and the out-of-sample variance are defined by
$$\hat\mu^k = \frac{1}{T-\tau}\sum_{t=\tau}^{T-1}(w_t^k)^T X_{t+1} \quad\text{and}\quad (\hat\sigma^k)^2 = \frac{1}{T-\tau-1}\sum_{t=\tau}^{T-1}\left((w_t^k)^T X_{t+1} - \hat\mu^k\right)^2$$
and the out-of-sample Sharpe ratio is defined by
$$\widehat{SR}^k = \frac{\hat\mu^k}{\hat\sigma^k}.$$
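A compact sketch of the rolling-horizon computation described above; weight_fn is an assumed interface (not from the paper) that maps an estimation window of returns to the portfolio weights of a given strategy.

```python
import numpy as np

def out_of_sample_sharpe(returns, weight_fn, tau):
    """Rolling-horizon out-of-sample Sharpe ratio. `returns` is a (T, N) array of
    asset returns, `tau` is the length of the estimation window."""
    T = returns.shape[0]
    oos = []
    for t in range(tau, T):
        w = weight_fn(returns[t - tau:t])   # weights estimated on the last tau days
        oos.append(w @ returns[t])          # realized out-of-sample return on day t
    oos = np.asarray(oos)
    return oos.mean() / oos.std(ddof=1)     # out-of-sample mean over standard deviation
```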
In this example, we considered the data set corresponding to the quarters 13 and 14. The size of the entire data set was T = 126 and the length of the estimation window was τ = 63 points. For the data from the first window, classical and robust efficient frontiers were traced, following all the steps that we explained in the first part of this subsection. More precisely, we considered the classical efficient frontier corresponding to the optimal minimum variance portfolios based on MLE and three robust frontiers, corresponding to the optimal minimum variance portfolios using robust minimum pseudodistance estimations with γ = 1 , γ = 1.2 and γ = 1.5 , respectively. Then, on each frontier, we chose the optimal portfolio associated with the maximal value of the ratio between the portfolio return and portfolio standard deviation. These four optimal portfolios represent the strategies that we compared in the out-of-sample analysis. For each of these portfolios, we computed the out-of-sample returns for the next time (next day). Then, we repeated all these procedures for the next window, and so on until the end of the data set has been reached. In the spirit of [18] Section 5, using (35) and (36), we computed out-of-sample means, out-of-sample variances and out-of-sample Sharpe ratios for each strategy. The out-of-sample means and out-of-sample variances were annualized, and we also considered a benchmark rate of 1.5 %. In this way, we obtained the following values for the out-of-sample Sharpe ratio: S R ^ = 0.22 for the optimal portfolio based on MLE, S R ^ = 0.74 for the optimal portfolio based on minimum pseudodistance estimations with γ = 1 , S R ^ = 0.71 for the optimal portfolio based on minimum pseudodistance estimations with γ = 1.2 and S R ^ = 0.29 for the optimal portfolio based on minimum pseudodistance estimations with γ = 1.5 . In Figure 3, we illustrate efficient frontiers for the windows 7 and 8, as well as the optimal portfolios chosen on each frontier.
This example shows that the optimal minimum variance portfolios based on robust minimum pseudodistance estimations in the single index model may attain higher Sharpe ratios than the traditional optimal minimum variance portfolios given by the single index model using MLE.
The obtained numerical results show that, for the single index model, the presented robust technique for portfolio optimization yields better results than the classical method based on MLE, in the sense that it leads to larger returns for the same value of risk in the case when outliers or atypical observations are present in the data set. The considered data sets contain such outliers. This is often the case for the considered problem, since outliers frequently occur in asset returns data. However, when there are no outliers in the data set, the classical method based on MLE is more efficient than the robust ones and therefore may lead to better results.

5. Conclusions

When outliers or atypical observations are present in the data set, the new portfolio optimization method based on robust minimum pseudodistance estimates yields better results than the classical single index method based on MLE estimates, in the sense that it leads to larger returns for smaller risks. In the literature, there exist various methods for robust estimation in regression models. In the present paper, we proposed a method based on the minimum pseudodistance approach, which requires solving a simple optimization problem. In addition, from a theoretical point of view, these estimators have attractive properties, such as being redescending robust, consistent, equivariant and asymptotically normally distributed. The comparison with other known robust estimators of the regression parameters, such as the least median of squares estimators, the S-estimators or the minimum density power divergence estimators, shows that the minimum pseudodistance estimators represent an attractive alternative that may be considered in other applications too.

Author Contributions

A.T. designed the methodology, obtained the theoretical results and wrote the paper. A.T. and C.F. conceived the application part. C.F. implemented the methods in MATLAB and obtained the numerical results. Both authors have read and approved the final manuscript.

Acknowledgments

This work was supported by a grant of the Romanian National Authority for Scientific Research, CNCS-UEFISCDI, project number PN-II-RU-TE-2012-3-0007.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Proof of the Results

Proof of Theorem 1.
Since the functions $\phi$ and $\chi$ are redescending bounded functions, for a compact neighborhood $N_{\theta_0}$ of $\theta_0$ it holds that
$$\int \sup_{\theta \in N_{\theta_0}} \left\|\Psi(z, \theta)\right\| dP_{\theta_0}(z) < \infty.$$
Since $\theta \mapsto \Psi(z, \theta)$ is continuous, by the uniform law of large numbers, (A1) implies
$$\sup_{\theta \in N_{\theta_0}}\left\|\int \Psi(z, \theta)\, dP_n(z) - \int \Psi(z, \theta)\, dP_{\theta_0}(z)\right\| \to 0$$
in probability.
Then, (A2), together with assumption (25), ensures the convergence in probability of $\hat\theta$ to $\theta_0$. The arguments are the same as those from van der Vaart [21], Theorem 5.9, p. 46. ☐
Proof of Theorem 2.
First, note that $\Psi$ defined by (27) is twice differentiable with respect to $\theta$, with bounded derivatives. The matrix $\dot\Psi(z, \theta)$ has the form
$$\dot\Psi(z, \theta) = \begin{pmatrix} -\sigma\phi'(r) & -\sigma\phi'(r)\, x_M & 2\sigma\phi(r) - \sigma\phi'(r)\, r \\ -\sigma\phi'(r)\, x_M & -\sigma\phi'(r)\, x_M^2 & 2\sigma\phi(r)\, x_M - \sigma\phi'(r)\, r\, x_M \\ -\sigma\chi'(r) & -\sigma\chi'(r)\, x_M & 2\sigma\chi(r) - \sigma\chi'(r)\, r \end{pmatrix}, \qquad r := \frac{x - \alpha - \beta x_M}{\sigma},$$
with $\phi'(t) = \left[1 - \gamma t^2\right]\exp\left(-\frac{\gamma}{2}t^2\right)$ and $\chi'(t) = \left[\frac{3\gamma+2}{\gamma+1}\, t - \gamma t^3\right]\exp\left(-\frac{\gamma}{2}t^2\right)$. Since $\phi(t)$, $\chi(t)$, $\phi'(t)$, $\chi'(t)$ are redescending bounded functions, for $\theta = \theta_0$ it holds that
$$\left|\dot\Psi_{ik}(z, \theta_0)\right| \leq K(z) \quad\text{with}\quad E(K(Z)) < \infty.$$
In addition, a simple calculation shows that each component $\frac{\partial^2 \Psi_i}{\partial\theta_k\,\partial\theta_l}$ is a bounded function, since it can be expressed through the functions $\phi(t)$, $\chi(t)$, $\phi'(t)$, $\chi'(t)$, $\phi''(t)$, $\chi''(t)$, which are redescending bounded functions. Moreover, the bounds that can be established for each component $\frac{\partial^2 \Psi_i}{\partial\theta_k\,\partial\theta_l}$ do not depend on the parameter $\theta$.
For each $i$, call $\ddot\Psi_i$ the matrix with elements $\frac{\partial^2 \Psi_i}{\partial\theta_k\,\partial\theta_l}$ and $C_n(z, \theta)$ the matrix with its $i$-th row equal to $(\hat\theta - \theta_0)^T \ddot\Psi_i(z, \theta)$. Using a Taylor expansion, we get
$$0 = \sum_{j=1}^n \Psi(Z_j, \hat\theta) = \sum_{j=1}^n \left\{ \Psi(Z_j, \theta_0) + \dot\Psi(Z_j, \theta_0)(\hat\theta - \theta_0) + \frac{1}{2}\, C_n(Z_j, \theta_j)(\hat\theta - \theta_0) \right\}.$$
Therefore,
$$0 = A_n + (B_n + \bar C_n)(\hat\theta - \theta_0)$$
with
$$A_n = \frac{1}{n}\sum_{j=1}^n \Psi(Z_j, \theta_0), \qquad B_n = \frac{1}{n}\sum_{j=1}^n \dot\Psi(Z_j, \theta_0), \qquad \bar C_n = \frac{1}{2n}\sum_{j=1}^n C_n(Z_j, \theta_j),$$
i.e., $\bar C_n$ is the matrix with its $i$-th row equal to $(\hat\theta - \theta_0)^T \bar{\ddot\Psi}_i$, where
$$\bar{\ddot\Psi}_i = \frac{1}{2n}\sum_{j=1}^n \ddot\Psi_i(Z_j, \theta_j),$$
which is bounded by a constant that does not depend on $\theta$, according to the arguments mentioned above. Since $\hat\theta - \theta_0 \to 0$ in probability, this implies that $\bar C_n \to 0$ in probability.
We have
$$\sqrt{n}\,(\hat\theta - \theta_0) = -(B_n + \bar C_n)^{-1}\sqrt{n}\, A_n.$$
Note that, for $j = 1, \ldots, n$, the vectors $\Psi(Z_j, \theta_0)$ are i.i.d. with mean zero and covariance matrix $A$, and the matrices $\dot\Psi(Z_j, \theta_0)$ are i.i.d. with mean $B$. Hence, when $n \to \infty$, using (A3), the law of large numbers implies that $B_n \to B$ in probability, and hence $B_n + \bar C_n \to B$ in probability, where $B$ is nonsingular. Then, the multivariate central limit theorem implies that $\sqrt{n}\, A_n \to \mathcal{N}_3(0, A)$ in distribution.
Then,
$$\sqrt{n}\,(\hat\theta - \theta_0) \to \mathcal{N}_3\!\left(0,\ B^{-1} A (B^{-1})^T\right)$$
in distribution, according to the multivariate Slutsky lemma. ☐
Proof of Theorem 3.
The system (29) can be written as
$$\int \phi\!\left(\frac{x - \alpha(P) - \beta(P)\, x_M}{\sigma(P)}\right) dP(x_M, x) = 0, \qquad \int \phi\!\left(\frac{x - \alpha(P) - \beta(P)\, x_M}{\sigma(P)}\right) x_M\, dP(x_M, x) = 0, \qquad \int \chi\!\left(\frac{x - \alpha(P) - \beta(P)\, x_M}{\sigma(P)}\right) dP(x_M, x) = 0.$$
We consider the contaminated model $\tilde P_{\varepsilon, x_{M0}, x_0} := (1 - \varepsilon) P_\theta + \varepsilon\, \delta_{(x_{M0}, x_0)}$, which we simply denote here by $\tilde P_\varepsilon$, where $\delta_{(x_{M0}, x_0)}$ is the Dirac measure putting all its mass at the point $(x_{M0}, x_0)$. Then, it holds that
$$(1-\varepsilon)\int \phi\!\left(\frac{x - \alpha(\tilde P_\varepsilon) - \beta(\tilde P_\varepsilon)\, x_M}{\sigma(\tilde P_\varepsilon)}\right) dP_\theta(x_M, x) + \varepsilon\, \phi\!\left(\frac{x_0 - \alpha(\tilde P_\varepsilon) - \beta(\tilde P_\varepsilon)\, x_{M0}}{\sigma(\tilde P_\varepsilon)}\right) = 0,$$
$$(1-\varepsilon)\int \phi\!\left(\frac{x - \alpha(\tilde P_\varepsilon) - \beta(\tilde P_\varepsilon)\, x_M}{\sigma(\tilde P_\varepsilon)}\right) x_M\, dP_\theta(x_M, x) + \varepsilon\, \phi\!\left(\frac{x_0 - \alpha(\tilde P_\varepsilon) - \beta(\tilde P_\varepsilon)\, x_{M0}}{\sigma(\tilde P_\varepsilon)}\right) x_{M0} = 0,$$
$$(1-\varepsilon)\int \chi\!\left(\frac{x - \alpha(\tilde P_\varepsilon) - \beta(\tilde P_\varepsilon)\, x_M}{\sigma(\tilde P_\varepsilon)}\right) dP_\theta(x_M, x) + \varepsilon\, \chi\!\left(\frac{x_0 - \alpha(\tilde P_\varepsilon) - \beta(\tilde P_\varepsilon)\, x_{M0}}{\sigma(\tilde P_\varepsilon)}\right) = 0.$$
Differentiating the first equation with respect to $\varepsilon$ and evaluating the derivative at $\varepsilon = 0$, we obtain
$$\int \phi'\!\left(\frac{x - \alpha - \beta x_M}{\sigma}\right)\left[-\frac{1}{\sigma}\Big(\mathrm{IF}(x_{M0}, x_0; \alpha, P_\theta) + x_M\, \mathrm{IF}(x_{M0}, x_0; \beta, P_\theta)\Big) - \frac{x - \alpha - \beta x_M}{\sigma^2}\, \mathrm{IF}(x_{M0}, x_0; \sigma, P_\theta)\right] dP_\theta(x_M, x) + \phi\!\left(\frac{x_0 - \alpha - \beta x_{M0}}{\sigma}\right) = 0.$$
After some calculations, we obtain the relation
$$-\frac{1}{\sigma(\gamma+1)^{3/2}}\, \mathrm{IF}(x_{M0}, x_0; \alpha, P_\theta) - \frac{E(X_M)}{\sigma(\gamma+1)^{3/2}}\, \mathrm{IF}(x_{M0}, x_0; \beta, P_\theta) + \phi\!\left(\frac{x_0 - \alpha - \beta x_{M0}}{\sigma}\right) = 0.$$
Similarly, differentiating Equations (A11) and (A12) with respect to $\varepsilon$ and evaluating the derivatives at $\varepsilon = 0$, we get
$$-\frac{E(X_M)}{\sigma(\gamma+1)^{3/2}}\, \mathrm{IF}(x_{M0}, x_0; \alpha, P_\theta) - \frac{E(X_M^2)}{\sigma(\gamma+1)^{3/2}}\, \mathrm{IF}(x_{M0}, x_0; \beta, P_\theta) + \phi\!\left(\frac{x_0 - \alpha - \beta x_{M0}}{\sigma}\right) x_{M0} = 0$$
and
$$-\frac{2}{\sigma(\gamma+1)^{5/2}}\, \mathrm{IF}(x_{M0}, x_0; \sigma, P_\theta) + \chi\!\left(\frac{x_0 - \alpha - \beta x_{M0}}{\sigma}\right) = 0.$$
Solving the system formed by Equations (A13)–(A15), we find the expressions of the influence functions. ☐
Proof of Theorem 4.
In the following, we simply denote by $X_{Mj}$ the vector $(1, X_{Mj})^T$. Then,
$$(\hat\alpha, \hat\beta)^T\left(\{(X_{Mj}, X_j) : j = 1, \ldots, n\}\right) = \arg_{(\alpha,\beta)^T}\max_{(\alpha, \beta, \sigma)}\ \sum_{j=1}^n \sigma^{-\gamma/(\gamma+1)}\exp\left(-\frac{\gamma}{2}\left(\frac{X_j - X_{Mj}^T(\alpha, \beta)^T}{\sigma}\right)^2\right).$$
For any two-dimensional column vector v, we have
$$\begin{aligned}
(\hat\alpha, \hat\beta)^T\left(\{(X_{Mj}, X_j + X_{Mj}^T v) : j = 1, \ldots, n\}\right) &= \arg_{(\alpha,\beta)^T}\max_{(\alpha, \beta, \sigma)}\ \sum_{j=1}^n \sigma^{-\gamma/(\gamma+1)}\exp\left(-\frac{\gamma}{2}\left(\frac{X_j + X_{Mj}^T v - X_{Mj}^T(\alpha, \beta)^T}{\sigma}\right)^2\right) \\
&= \arg_{(\alpha,\beta)^T}\max_{(\alpha, \beta, \sigma)}\ \sum_{j=1}^n \sigma^{-\gamma/(\gamma+1)}\exp\left(-\frac{\gamma}{2}\left(\frac{X_j - X_{Mj}^T\left((\alpha, \beta)^T - v\right)}{\sigma}\right)^2\right) \\
&= \arg_{((\alpha,\beta)^T - v)}\max_{((\alpha, \beta)^T - v,\ \sigma)}\ \sum_{j=1}^n \sigma^{-\gamma/(\gamma+1)}\exp\left(-\frac{\gamma}{2}\left(\frac{X_j - X_{Mj}^T\left((\alpha, \beta)^T - v\right)}{\sigma}\right)^2\right) + v \\
&= (\hat\alpha, \hat\beta)^T\left(\{(X_{Mj}, X_j) : j = 1, \ldots, n\}\right) + v,
\end{aligned}$$
which shows that $(\hat\alpha, \hat\beta)^T$ is regression equivariant.
For any constant $c \neq 0$, we have
$$\begin{aligned}
(\hat\alpha, \hat\beta)^T\left(\{(X_{Mj}, c X_j) : j = 1, \ldots, n\}\right) &= \arg_{(\alpha,\beta)^T}\max_{(\alpha, \beta, \sigma)}\ \sum_{j=1}^n \sigma^{-\gamma/(\gamma+1)}\exp\left(-\frac{\gamma}{2}\left(\frac{c X_j - X_{Mj}^T(\alpha, \beta)^T}{\sigma}\right)^2\right) \\
&= \arg_{(\alpha,\beta)^T}\max_{(\alpha, \beta, \sigma)}\ \sum_{j=1}^n c^{-\gamma/(\gamma+1)}\,(\sigma/c)^{-\gamma/(\gamma+1)}\exp\left(-\frac{\gamma}{2}\left(\frac{X_j - X_{Mj}^T\left((\alpha, \beta)^T/c\right)}{\sigma/c}\right)^2\right) \\
&= c\cdot \arg_{(\alpha/c,\beta/c)^T}\max_{(\alpha/c, \beta/c, \sigma/c)}\ \sum_{j=1}^n c^{-\gamma/(\gamma+1)}\,(\sigma/c)^{-\gamma/(\gamma+1)}\exp\left(-\frac{\gamma}{2}\left(\frac{X_j - X_{Mj}^T\left((\alpha, \beta)^T/c\right)}{\sigma/c}\right)^2\right) \\
&= c\cdot (\hat\alpha, \hat\beta)^T\left(\{(X_{Mj}, X_j) : j = 1, \ldots, n\}\right).
\end{aligned}$$
This implies that the estimator $(\hat\alpha, \hat\beta)^T = (\hat\alpha, \hat\beta)^T\left(\{(X_{Mj}, X_j) : j = 1, \ldots, n\}\right)$ is scale equivariant.
Now, for any nonsingular two-dimensional square matrix $A$, we get
$$\begin{aligned}
(\hat\alpha, \hat\beta)^T\left(\{(A^T X_{Mj}, X_j) : j = 1, \ldots, n\}\right) &= \arg_{(\alpha,\beta)^T}\max_{(\alpha, \beta, \sigma)}\ \sum_{j=1}^n \sigma^{-\gamma/(\gamma+1)}\exp\left(-\frac{\gamma}{2}\left(\frac{X_j - X_{Mj}^T A (\alpha, \beta)^T}{\sigma}\right)^2\right) \\
&= A^{-1} \arg_{A(\alpha,\beta)^T}\max_{(A(\alpha, \beta)^T,\ \sigma)}\ \sum_{j=1}^n \sigma^{-\gamma/(\gamma+1)}\exp\left(-\frac{\gamma}{2}\left(\frac{X_j - X_{Mj}^T\left(A(\alpha, \beta)^T\right)}{\sigma}\right)^2\right) \\
&= A^{-1}\cdot (\hat\alpha, \hat\beta)^T\left(\{(X_{Mj}, X_j) : j = 1, \ldots, n\}\right),
\end{aligned}$$
which shows the affine equivariance of the estimator $(\hat\alpha, \hat\beta)^T = (\hat\alpha, \hat\beta)^T\left(\{(X_{Mj}, X_j) : j = 1, \ldots, n\}\right)$. ☐

Appendix B. The 50 Stocks and Their Abbreviations

  • Asbury Automotive Group, Inc. (ABG)
  • Arctic Cat Inc. (ACAT)
  • American Eagle Outfitters, Inc. (AEO)
  • AK Steel Holding Corporation (AKS)
  • Albany Molecular Research, Inc. (AMRI)
  • The Andersons, Inc. (ANDE)
  • ARMOUR Residential REIT, Inc. (ARR)
  • BJ’s Restaurants, Inc. (BJRI)
  • Brooks Automation, Inc. (BRKS)
  • Caleres, Inc. (CAL)
  • Cincinnati Bell Inc. (CBB)
  • Calgon Carbon Corporation (CCC)
  • Coeur Mining, Inc. (CDE)
  • Cohen & Steers, Inc. (CNS)
  • Cray Inc. (CRAY)
  • Cirrus Logic, Inc. (CRUS)
  • Covenant Transportation Group, Inc. (CVTI)
  • EarthLink Holdings Corp. (ELNK)
  • Gray Television, Inc. (GTN)
  • Triple-S Management Corporation (GTS)
  • Getty Realty Corp. (GTY)
  • Hecla Mining Company (HL)
  • Harmonic Inc. (HLIT)
  • Ligand Pharmaceuticals Incorporated (LGND)
  • Louisiana-Pacific Corporation (LPX)
  • Lattice Semiconductor Corporation (LSCC)
  • ManTech International Corporation (MANT)
  • MiMedx Group, Inc. (MDXG)
  • Medifast, Inc. (MED)
  • Mentor Graphics Corporation (MENT)
  • Mistras Group, Inc. (MG)
  • Mesa Laboratories, Inc. (MLAB)
  • Meritor, Inc. (MTOR)
  • Monster Worldwide, Inc. (MWW)
  • Nektar Therapeutics (NKTR)
  • Osiris Therapeutics, Inc. (OSIR)
  • PennyMac Mortgage Investment Trust (PMT)
  • Paratek Pharmaceuticals, Inc. (PRTK)
  • Repligen Corporation (RGEN)
  • Rigel Pharmaceuticals, Inc. (RIGL)
  • Schnitzer Steel Industries, Inc. (SCHN)
  • comScore, Inc. (SCOR)
  • Safeguard Scientifics, Inc. (SFE)
  • Silicon Graphics International (SGI)
  • Sagent Pharmaceuticals, Inc. (SGNT)
  • Semtech Corporation (SMTC)
  • Sapiens International Corporation N.V. (SPNS)
  • Sarepta Therapeutics, Inc. (SRPT)
  • Take-Two Interactive Software, Inc. (TTWO)
  • Park Sterling Corporation (PSTB)

References

  1. Sharpe, W.F. A simplified model for portfolio analysis. Manag. Sci. 1963, 9, 277–293.
  2. Alexander, G.J.; Sharpe, W.F.; Bailey, J.V. Fundamentals of Investments; Prentice-Hall: Upper Saddle River, NJ, USA, 2000.
  3. Pardo, L. Statistical Inference Based on Divergence Measures; Chapman & Hall: Boca Raton, FL, USA, 2006.
  4. Basu, A.; Shioya, H.; Park, C. Statistical Inference: The Minimum Distance Approach; CRC Press: Boca Raton, FL, USA, 2011.
  5. Basu, A.; Harris, I.R.; Hjort, N.L.; Jones, M.C. Robust and efficient estimation by minimizing a density power divergence. Biometrika 1998, 85, 549–559.
  6. Jones, M.C.; Hjort, N.L.; Harris, I.R.; Basu, A. A comparison of related density-based minimum divergence estimators. Biometrika 2001, 88, 865–873.
  7. Broniatowski, M.; Keziou, A. Parametric estimation and tests through divergences and the duality technique. J. Multivar. Anal. 2009, 100, 16–36.
  8. Toma, A.; Leoni-Aubin, S. Robust tests based on dual divergence estimators and saddlepoint approximations. J. Multivar. Anal. 2010, 101, 1143–1155.
  9. Toma, A.; Broniatowski, M. Dual divergence estimators and tests: Robustness results. J. Multivar. Anal. 2011, 102, 20–36.
  10. Fujisawa, H.; Eguchi, S. Robust parameter estimation with a small bias against heavy contamination. J. Multivar. Anal. 2008, 99, 2053–2081.
  11. Broniatowski, M.; Vajda, I. Several applications of divergence criteria in continuous families. Kybernetika 2012, 48, 600–636.
  12. Broniatowski, M.; Toma, A.; Vajda, I. Decomposable pseudodistances and applications in statistical estimation. J. Stat. Plan. Inference 2012, 142, 2574–2585.
  13. Markowitz, H.M. Portfolio selection. J. Finance 1952, 7, 77–91.
  14. Fabozzi, F.J.; Huang, D.; Zhou, G. Robust portfolios: Contributions from operations research and finance. Ann. Oper. Res. 2010, 176, 191–220.
  15. Vaz-de Melo, B.; Camara, R.P. Robust multivariate modeling in finance. Int. J. Manag. Finance 2005, 4, 12–23.
  16. Perret-Gentil, C.; Victoria-Feser, M.P. Robust Mean-Variance Portfolio Selection. FAME Research Paper No. 140, 2005. Available online: papers.ssrn.com/sol3/papers.cfm?abstract_id=721509 (accessed on 28 February 2018).
  17. Welsch, R.E.; Zhou, X. Application of robust statistics to asset allocation models. Revstat Stat. J. 2007, 5, 97–114.
  18. DeMiguel, V.; Nogales, F.J. Portfolio selection with robust estimation. Oper. Res. 2009, 57, 560–577.
  19. Toma, A.; Leoni-Aubin, S. Robust portfolio optimization using pseudodistances. PLoS ONE 2015, 10, 1–26.
  20. Toma, A.; Leoni-Aubin, S. Optimal robust M-estimators using Renyi pseudodistances. J. Multivar. Anal. 2013, 115, 359–373.
  21. Van der Vaart, A. Asymptotic Statistics; Cambridge University Press: New York, NY, USA, 1998.
  22. Rousseeuw, P.J.; Leroy, A.M. Robust Regression and Outlier Detection; John Wiley & Sons: Hoboken, NJ, USA, 2005.
  23. Andersen, R. Modern Methods for Robust Regression; SAGE Publications: Los Angeles, CA, USA, 2008.
  24. Rousseeuw, P.J.; Yohai, V. Robust regression by means of S-estimators. In Robust and Nonlinear Time Series Analysis; Franke, J., Härdle, W., Martin, D., Eds.; Springer: New York, NY, USA, 1984; pp. 256–272. ISBN 978-0-387-96102-6.
  25. Ghosh, A.; Basu, A. Robust estimation for independent non-homogeneous observations using density power divergence with applications to linear regression. Electron. J. Stat. 2013, 7, 2420–2456.
Figure 1. Plots of the Hertzsprung–Russell data and fitted regression lines using MLE, minimum density power divergence (MDPD) methods for several values of γ , minimum pseudodistance (MP) methods for several values of γ , S-estimators (SE) and the least median of squares (LMS) method.
Figure 2. Efficient frontiers, classical (MLE) vs. robust corresponding to γ = 0.5 (RE), for eight quarters (the first four quarters and the last four quarters).
Figure 3. Efficient frontiers, classical (MLE) vs. robust corresponding to γ = 1 (RE), and optimal portfolios chosen on frontiers, for the windows 7 (left) and 8 (right).
Table 1. The parameter estimates for the linear regression model for the Hertzsprung–Russell data using several minimum pseudodistance (MP) methods, several minimum density power divergence (MDPD) methods, the least median of squares (LMS) method, S-estimators and the MLE method. γ denotes the tuning parameter.

MLE estimates
       α        β       σ
     6.79    −0.41    0.55

MP estimates
   γ        α        β       σ
  0.01     6.79    −0.41    0.55
  0.1      6.81    −0.41    0.56
  0.25     6.86    −0.42    0.58
  0.3      6.88    −0.42    0.59
  0.31     6.89    −0.43    0.59
  0.32    −6.81     2.66    0.39
  0.35    −7.16     2.74    0.38
  0.4     −7.62     2.85    0.38
  0.5     −8.17     2.97    0.37
  0.75    −8.65     3.08    0.38
  1       −8.84     3.12    0.39
  1.2     −8.94     3.15    0.40
  1.5     −9.08     3.18    0.41
  2       −9.31     3.23    0.43

MDPD estimates
   γ        α        β       σ
  0.1      6.78    −0.41    0.60
  0.25    −5.16     2.30    0.42
  0.5     −7.22     2.76    0.40
  0.8     −7.89     2.91    0.40
  1       −8.03     2.95    0.41

S-estimates
       α        β       σ
    −9.59     3.28      —

LMS estimates
       α        β       σ
   −12.30     3.90      —
