Modelling Volatile Time Series with V-Transforms and Copulas

McNeil, Alexander J.

doi:10.3390/risks9010014



by

Alexander J. McNeil

The York Management School, University of York, Freboys Lane, York YO10 5GD, UK

Risks 2021, 9(1), 14; https://0-doi-org.brum.beds.ac.uk/10.3390/risks9010014

Submission received: 29 November 2020 / Revised: 21 December 2020 / Accepted: 29 December 2020 / Published: 5 January 2021

(This article belongs to the Special Issue Risks: Feature Papers 2020)


Abstract


An approach to the modelling of volatile time series using a class of uniformity-preserving transforms for uniform random variables is proposed. V-transforms describe the relationship between quantiles of the stationary distribution of the time series and quantiles of the distribution of a predictable volatility proxy variable. They can be represented as copulas and permit the formulation and estimation of models that combine arbitrary marginal distributions with copula processes for the dynamics of the volatility proxy. The idea is illustrated using a Gaussian ARMA copula process and the resulting model is shown to replicate many of the stylized facts of financial return series and to facilitate the calculation of marginal and conditional characteristics of the model including quantile measures of risk. Estimation is carried out by adapting the exact maximum likelihood approach to the estimation of ARMA processes, and the model is shown to be competitive with standard GARCH in an empirical application to Bitcoin return data.

Keywords:

time series; volatility; probability-integral transform; ARMA model; copula

1. Introduction

In this paper, we show that a class of uniformity-preserving transformations for uniform random variables can facilitate the application of copula modelling to time series exhibiting the serial dependence characteristics that are typical of volatile financial return data. Our main aims are twofold: to establish the fundamental properties of v-transforms and show that they are a natural fit to the volatility modelling problem; to develop a class of processes using the implied copula process of a Gaussian ARMA model that can serve as an archetype for copula models using v-transforms. Although the existing literature on volatility modelling in econometrics is vast, the models we propose have some attractive features. In particular, as copula-based models, they allow the separation of marginal and serial dependence behaviour in the construction and estimation of models.

A distinction is commonly made between genuine stochastic volatility models, as investigated by Taylor (1994) and Andersen (1994), and GARCH-type models as developed in a long series of papers by Engle (1982), Bollerslev (1986), Ding et al. (1993), Glosten et al. (1993) and Bollerslev et al. (1994), among others. In the former an unobservable process describes the volatility at any time point while in the latter volatility is modelled as a function of observable information describing the past behaviour of the process; see also the review articles by Shephard (1996) and Andersen and Benzoni (2009). The generalized autoregressive score (GAS) models of Creal et al. (2013) generalize the observation-driven approach of GARCH models by using the score function of the conditional density to model time variation in key parameters of the time series model. The models of this paper have more in common with the observation-driven approach of GARCH and GAS but have some important differences.

In GARCH-type models, the marginal distribution of a stationary process is inextricably linked to the dynamics of the process as well as the conditional or innovation distribution; in most cases, it has no simple closed form. For example, the standard GARCH mechanism serves to create power-law behaviour in the marginal distribution, even when the innovations come from a lighter-tailed distribution such as Gaussian (Mikosch and Stărică 2000). While such models work well for many return series, they may not be sufficiently flexible to describe all possible combinations of marginal and serial dependence behaviour encountered in applications. In the empirical example of this paper, which relates to log-returns on the Bitcoin price series, the data appear to favour a marginal distribution with sub-exponential tails that are lighter than power tails and this cannot be well captured by standard GARCH models. Moreover, in contrast to much of the GARCH literature, the models we propose make no assumptions about the existence of second-order moments and could also be applied to very heavy-tailed situations where variance-based methods fail.

Let

X_{1}, \dots, X_{n}

be a time series of financial returns sampled at (say) daily frequency and assume that these are modelled by a strictly stationary stochastic process

(X_{t})

with marginal distribution function (cdf)

F_{X}

. To match the stylized facts of financial return data described, for example, by Campbell et al. (1997) and Cont (2001), it is generally agreed that

(X_{t})

should have limited serial correlation, but the squared or absolute processes

(X_{t}^{2})

and

(| X_{t} |)

should have significant and persistent positive serial correlation to describe the effects of volatility clustering.

In this paper, we refer to transformed series like

(| X_{t} |)

, in which volatility is revealed through serial correlation, as volatility proxy series. More generally, a volatility proxy series

(T (X_{t}))

is obtained by applying a transformation

T : R \to R

which (i) depends on a change point

μ_{T}

that may be zero, (ii) is increasing in

X_{t} - μ_{T}

for

X_{t} \geq μ_{T}

and (iii) is increasing in

μ_{T} - X_{t}

for

X_{t} \leq μ_{T}

.

Our approach in this paper is to model the probability-integral transform (PIT) series

(V_{t})

of a volatility proxy series. This is defined by

V_{t} = F_{T (X)} (T (X_{t}))

for all t, where

F_{T (X)}

denotes the cdf of

T (X_{t})

. If

(U_{t})

is the PIT series of the original process

(X_{t})

, defined by

U_{t} = F_{X} (X_{t})

for all t, then a v-transform is a function describing the relationship between the terms of

(V_{t})

and the terms of

(U_{t})

. Equivalently, a v-transform describes the relationship between quantiles of the distribution of

X_{t}

and the distribution of the volatility proxy

T (X_{t})

. Alternatively, it characterizes the dependence structure or copula of the pair of variables

(X_{t}, T (X_{t}))

. In this paper, we show how to derive flexible, parametric families of v-transforms for practical modelling purposes.

To gain insight into the typical form of a v-transform, let

x_{1}, \dots, x_{n}

represent the realized data values and let

u_{1}, \dots, u_{n}

and

v_{1}, \dots, v_{n}

be the samples obtained by applying the transformations

v_{t} = F_{n}^{(| X |)} (| x_{t} |)

and

u_{t} = F_{n}^{(X)} (x_{t})

, where

F_{n}^{(X)} (x) = \frac{1}{n + 1} \sum_{t = 1}^{n} I_{\{x_{t} \leq x\}}

and

F_{n}^{(| X |)} (x) = \frac{1}{n + 1} \sum_{t = 1}^{n} I_{\{| x_{t} | \leq x\}}

denote scaled versions of the empirical distribution functions of the

x_{t}

and

| x_{t} |

samples, respectively. The graph of

(u_{t}, v_{t})

gives an empirical estimate of the v-transform for the random variables

(X_{t}, | X_{t} |)

. In the left-hand plot of Figure 1 we show the relationship for a sample of

n = 1043

daily log-returns of the Bitcoin price series for the years 2016–2019. Note how the empirical v-transform takes the form of a slightly asymmetric ‘V’.
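The empirical v-transform described above can be sketched in a few lines. This is a minimal illustration, not the code used for Figure 1: the helper names `scaled_ecdf` and `empirical_v_transform` are hypothetical, and simulated symmetric Laplace draws stand in for the Bitcoin log-returns, which are not reproduced here.

```python
import bisect
import random


def scaled_ecdf(sample):
    """Return the scaled empirical cdf x -> #{x_t <= x} / (n + 1)."""
    ordered = sorted(sample)
    n = len(ordered)

    def cdf(x):
        # bisect_right counts the sample points that are <= x
        return bisect.bisect_right(ordered, x) / (n + 1)

    return cdf


def empirical_v_transform(x):
    """Return the samples (u_t) and (v_t) for the proxy T(x) = |x|."""
    abs_x = [abs(xt) for xt in x]
    cdf_x = scaled_ecdf(x)
    cdf_abs = scaled_ecdf(abs_x)
    u = [cdf_x(xt) for xt in x]
    v = [cdf_abs(at) for at in abs_x]
    return u, v


# Assumption: symmetric Laplace draws as a stand-in for return data.
rng = random.Random(1)
x = [rng.choice((-1.0, 1.0)) * rng.expovariate(1.0) for _ in range(1000)]
u, v = empirical_v_transform(x)

# For symmetric data the points (u_t, v_t) should lie close to |2u - 1|.
max_gap = max(abs(vt - abs(2.0 * ut - 1.0)) for ut, vt in zip(u, v))
```

Plotting `v` against `u` would give the empirical 'V' shape; for real return data the two arms need not be symmetric.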

The right-hand plot of Figure 1 shows the sample autocorrelation function (acf) of the data given by

z_{t} = Φ^{- 1} (v_{t})

where

Φ

is the standard normal cdf. This reveals a persistent pattern of positive serial correlation which can be modelled by the implied ARMA copula. This pattern is not evident in the acf of the raw

x_{t}

data in the centre plot.

To construct a volatility model for

(X_{t})

using v-transforms, we need to specify a process for

(V_{t})

. In principle, any model for a series of serially dependent uniform variables can be applied to

(V_{t})

. In this paper, we illustrate concepts using the Gaussian copula model implied by the standard ARMA dependence structure. This model is particularly tractable and allows us to derive model properties and fit models to data relatively easily.

There is a large literature on copula models for time series; see, for example, the review papers by Patton (2012) and Fan and Patton (2014). While the main focus of this literature has been on cross-sectional dependencies between series, there is a growing literature on models of serial dependence. First-order Markov copula models have been investigated by Chen and Fan (2006), Chen et al. (2009) and Domma et al. (2009) while higher-order Markov copula models using D-vines are applied by Smith et al. (2010). These models are based on the pair-copula approach developed in Joe (1996), Bedford and Cooke (2001, 2002) and Aas et al. (2009). However, the standard bivariate copulas that enter these models are not generally effective at describing the typical serial dependencies created by stochastic volatility, as observed by Loaiza-Maya et al. (2018).

The paper is structured as follows. In Section 2, we provide motivation for the paper by constructing a symmetric model using the simplest example of a v-transform. The general theory of v-transforms is developed in Section 3 and is used to construct the class of VT-ARMA processes and analyse their properties in Section 4. Section 5 treats estimation and statistical inference for VT-ARMA processes and provides an example of their application to the Bitcoin return data; Section 6 presents the conclusions. Proofs may be found in Appendix A.

2. A Motivating Model

Given a probability space

(Ω, F, P)

, we construct a symmetric, strictly stationary process

{(X_{t})}_{t \in N \setminus \{0\}}

such that, under the even transformation

T (x) = | x |

, the serial dependence in the volatility proxy series

(T (X_{t}))

is of ARMA type. We assume that the marginal cdf

F_{X}

of

(X_{t})

is absolutely continuous and the density

f_{X}

satisfies

f_{X} (x) = f_{X} (- x)

for all

x > 0

. Since

F_{X}

and

F_{| X |}

are both continuous, the properties of the probability-integral (PIT) transform imply that the series

(U_{t})

and

(V_{t})

given by

U_{t} = F_{X} (X_{t})

and

V_{t} = F_{| X |} (| X_{t} |)

both have standard uniform marginal distributions. Henceforth, we refer to

(V_{t})

as the volatility PIT process and

(U_{t})

as the series PIT process.

Any other volatility proxy series that can be obtained by a continuous and strictly increasing transformation of the terms of

(| X_{t} |)

, such as

(X_{t}^{2})

, yields exactly the same volatility PIT process. For example, if

{\tilde{V}}_{t} = F_{X^{2}} (X_{t}^{2})

, then it follows from the fact that

F_{X^{2}} (x) = F_{| X |} (+ \sqrt{x})

for

x \geq 0

that

{\tilde{V}}_{t} = F_{X^{2}} (X_{t}^{2}) = F_{| X |} (| X_{t} |) = V_{t}

. In this sense, we can think of classes of equivalent volatility proxies, such as

(| X_{t} |)

,

(X_{t}^{2})

,

(exp | X_{t} |)

and

(ln (1 + | X_{t} |))

. In fact,

(V_{t})

is itself an equivalent volatility proxy to

(| X_{t} |)

since

F_{| X |}

is a continuous and strictly increasing transformation.
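A small numerical check can make the equivalence of volatility proxies concrete. The sketch below is illustrative only: it uses simulated Gaussian data and a hypothetical helper `empirical_pit` based on the scaled empirical cdf, and it confirms that the four proxies named above yield the same empirical PIT values because a strictly increasing transformation leaves ranks unchanged.

```python
import bisect
import math
import random


def empirical_pit(values):
    """Scaled empirical PIT: rank of each value divided by (n + 1)."""
    ordered = sorted(values)
    n = len(ordered)
    return [bisect.bisect_right(ordered, val) / (n + 1) for val in values]


# Assumption: simulated data, used only to illustrate rank invariance.
rng = random.Random(7)
x = [rng.gauss(0.0, 1.0) for _ in range(200)]

proxies = {
    "abs": [abs(xt) for xt in x],
    "square": [xt * xt for xt in x],
    "exp_abs": [math.exp(abs(xt)) for xt in x],
    "log1p_abs": [math.log1p(abs(xt)) for xt in x],
}
pits = {name: empirical_pit(vals) for name, vals in proxies.items()}

# Every proxy is a strictly increasing function of |x_t|, so ranks agree.
all_equal = all(pits[name] == pits["abs"] for name in pits)
```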

The symmetry of

f_{X}

implies that

F_{| X |} (x) = 2 F_{X} (x) - 1 = 1 - 2 F_{X} (- x)

for

x \geq 0

. Hence, we find that

V_{t} = F_{| X |} (| X_{t} |) = \{\begin{matrix} F_{| X |} (- X_{t}) = 1 - 2 F_{X} (X_{t}) = 1 - 2 U_{t}, & \text{if } X_{t} < 0 \\ F_{| X |} (X_{t}) = 2 F_{X} (X_{t}) - 1 = 2 U_{t} - 1, & \text{if } X_{t} \geq 0 \end{matrix}

which implies that the relationship between the volatility PIT process

(V_{t})

and the series PIT process

(U_{t})

is given by

V_{t} = V (U_{t}) = | 2 U_{t} - 1 |

(1)

where

V (u) = | 2 u - 1 |

is a perfectly symmetric v-shaped function that maps values of

U_{t}

close to 0 or 1 to values of

V_{t}

close to 1, and values close to

0.5

to values close to 0.

V

is the canonical example of a v-transform. It is related to the so-called tent-map transformation

T (u) = 2 min (u, 1 - u)

by

V (u) = 1 - T (u)

.

Given

(V_{t})

, let the process

(Z_{t})

be defined by setting

Z_{t} = Φ^{- 1} (V_{t})

so that we have the following chain of transformations:

\begin{matrix} X_{t} \overset{F_{X}}{\to} & U_{t} \overset{V}{\to} & V_{t} \overset{Φ^{- 1}}{\to} & Z_{t} . \end{matrix}

(2)

We refer to

(Z_{t})

as a normalized volatility proxy series. Our aim is to construct a process

(X_{t})

such that, under the chain of transformations in (2), we obtain a Gaussian ARMA process

(Z_{t})

with mean zero and variance one. We do this by working back through the chain.

The transformation

V

is not an injection and, for any

V_{t} > 0

, there are two possible inverse values,

\frac{1}{2} (1 - V_{t})

and

\frac{1}{2} (1 + V_{t})

. However, by randomly choosing between these values, we can ‘stochastically invert’

V

to construct a random variable

U_{t}

such that

V (U_{t}) = V_{t}

. This is summarized in Lemma 1, which is a special case of a more general result in Proposition 4.

Lemma 1.

Let V be a standard uniform variable. If

V = 0

, set

U = \frac{1}{2}

. Otherwise, let

U = \frac{1}{2} (1 - V)

with probability 0.5 and

U = \frac{1}{2} (1 + V)

with probability 0.5. Then, U is uniformly distributed and

V (U) = V

.

This simple result suggests Algorithm 1 for constructing a process

(X_{t})

with symmetric marginal density

f_{X}

such that the corresponding normalized volatility proxy process

(Z_{t})

under the absolute value transformation (or continuous and strictly increasing functions thereof) is an ARMA process. We describe the resulting model as a VT-ARMA process.

Algorithm 1:

Generate $(Z_{t})$ as a causal and invertible Gaussian ARMA process of order $(p, q)$ with mean zero and variance one.
Form the volatility PIT process $(V_{t})$ where $V_{t} = Φ (Z_{t})$ for all t.
Generate a process of iid Bernoulli variables $(Y_{t})$ such that $P (Y_{t} = 1) = 0.5$ .
Form the PIT process $(U_{t})$ using the transformation $U_{t} = 0.5 {(1 - V_{t})}^{I_{{Y_{t} = 0}}} {(1 + V_{t})}^{I_{{Y_{t} = 1}}}$ .
Form the process $(X_{t})$ by setting $X_{t} = F_{X}^{- 1} (U_{t})$ .
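The five steps of Algorithm 1 can be sketched as follows. This is a minimal illustration under stated assumptions rather than the implementation behind Figure 2: the function names are hypothetical, the ARMA(1,1) path is started at zero and a burn-in period is discarded, the innovation scale is set from the standard ARMA(1,1) variance formula so that the stationary variance is one, and a standard Laplace distribution is assumed for the symmetric marginal.

```python
import math
import random
from statistics import NormalDist


def simulate_arma11(n, alpha, beta, rng, burn_in=500):
    """Gaussian ARMA(1,1) path Z_t = alpha*Z_{t-1} + e_t + beta*e_{t-1},
    with the innovation scale chosen so that Var(Z_t) = 1."""
    # Stationary variance: s2 * (1 + 2*alpha*beta + beta**2) / (1 - alpha**2).
    scale = math.sqrt((1.0 - alpha ** 2) / (1.0 + 2.0 * alpha * beta + beta ** 2))
    z, e_prev, path = 0.0, 0.0, []
    for t in range(n + burn_in):
        e = rng.gauss(0.0, scale)
        z = alpha * z + e + beta * e_prev
        e_prev = e
        if t >= burn_in:
            path.append(z)
    return path


def laplace_quantile(u):
    """Quantile function of the Laplace density 0.5 * exp(-|x|)."""
    return math.log(2.0 * u) if u <= 0.5 else -math.log(2.0 * (1.0 - u))


def simulate_symmetric_vt_arma(n, alpha, beta, seed=0):
    rng = random.Random(seed)
    phi = NormalDist().cdf
    z = simulate_arma11(n, alpha, beta, rng)            # step 1
    v = [phi(zt) for zt in z]                           # step 2
    y = [rng.random() < 0.5 for _ in range(n)]          # step 3
    u = [0.5 * (1.0 + vt) if yt else 0.5 * (1.0 - vt)   # step 4
         for vt, yt in zip(v, y)]
    x = [laplace_quantile(ut) for ut in u]              # step 5
    return z, v, u, x


z, v, u, x = simulate_symmetric_vt_arma(2000, alpha=0.95, beta=-0.85, seed=42)
```

By construction, `abs(2 * u[t] - 1)` recovers `v[t]`, so the volatility PIT series of the simulated path inherits the ARMA dependence while the sign of each `x[t]` is decided by an independent coin flip.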

It is important to state that the use of the Gaussian process

(Z_{t})

as the fundamental building block of the VT-ARMA process in Algorithm 1 has no effect on the marginal distribution of

(X_{t})

, which is

F_{X}

as specified in the final step of the algorithm. The process

(Z_{t})

is exploited only for its serial dependence structure, which is described by a family of finite-dimensional Gaussian copulas; this dependence structure is applied to the volatility proxy process.

Figure 2 shows a symmetric VT-ARMA(1,1) process with ARMA parameters

α_{1} = 0.95

and

β_{1} = - 0.85

; such a model often works well for financial return data. Some intuition for this observation can be gained from the fact that the popular GARCH(1,1) model is known to have the structure of an ARMA(1,1) model for the squared data process; see, for example, McNeil et al. (2015) (Section 4.2) for more details.

3. V-Transforms

To generalize the class of v-transforms, we admit two forms of asymmetry in the construction described in Section 2: we allow the density

f_{X}

to be skewed; we introduce an asymmetric volatility proxy.

Definition 1

(Volatility proxy transformation and profile). Let

T_{1}

and

T_{2}

be strictly increasing, continuous and differentiable functions on

R^{+} = [0, \infty)

such that

T_{1} (0) = T_{2} (0)

. Let

μ_{T} \in R

. Any transformation

T : R \to R

of the form

T (x) = \{\begin{matrix} T_{1} (μ_{T} - x) & x \leq μ_{T} \\ T_{2} (x - μ_{T}) & x > μ_{T} \end{matrix}

(3)

is a volatility proxy transformation. The parameter

μ_{T}

is the change point of T and the associated function

g_{T} : R^{+} \to R^{+}

,

g_{T} (x) = T_{2}^{- 1} \circ T_{1} (x)

is the profile function of T.

By introducing

μ_{T}

, we allow for the possibility that the natural change point may not be identical to zero. By introducing different functions

T_{1}

and

T_{2}

for returns on either side of the change point, we allow the possibility that one or other may contribute more to the volatility proxy. This has a similar economic motivation to the leverage effects in GARCH models (Ding et al. 1993); falls in equity prices increase a firm’s leverage and increase the volatility of the share price.

Clearly, the profile function of a volatility proxy transformation is a strictly increasing, continuous and differentiable function on

R^{+}

such that

g_{T} (0) = 0

. In conjunction with

μ_{T}

, the profile contains all the information about T that is relevant for constructing v-transforms. In the case of a volatility proxy transformation that is symmetric about

μ_{T}

, the profile satisfies

g_{T} (x) = x

.

The following result shows how v-transforms

V = V (U)

can be obtained by considering different continuous distributions

F_{X}

and different volatility proxy transformations T of type (3).

Proposition 1.

Let X be a random variable with absolutely continuous and strictly increasing cdf

F_{X}

on

R

and let T be a volatility proxy transformation. Let

U = F_{X} (X)

and

V = F_{T (X)} (T (X))

. Then,

V = V (U)

where

V (u) = \{\begin{matrix} F_{X} (μ_{T} + g_{T} (μ_{T} - F_{X}^{- 1} (u))) - u, & u \leq F_{X} (μ_{T}) \\ u - F_{X} (μ_{T} - g_{T}^{- 1} (F_{X}^{- 1} (u) - μ_{T})), & u > F_{X} (μ_{T}) . \end{matrix}

(4)

The result implies that any two volatility proxy transformations T and

\tilde{T}

which have the same change point

μ_{T}

and profile function

g_{T}

belong to an equivalence class with respect to the resulting v-transform. This generalizes the idea that

T (x) = | x |

and

T (x) = x^{2}

give the same v-transform in the symmetric case of Section 2. Note also that the volatility proxy transformations

T^{(V)}

and

T^{(Z)}

defined by

\begin{matrix} T^{(V)} (x) & = & F_{T (X)} (T (x)) = V (F_{X} (x)) \\ T^{(Z)} (x) & = & Φ^{- 1} (T^{(V)} (x)) = Φ^{- 1} (V (F_{X} (x))) \end{matrix}

(5)

are in the same equivalence class as T since they share the same change point and profile function.

Definition 2

(v-transform and fulcrum). Any transformation

V

that can be obtained from Equation (4) by choosing an absolutely continuous and strictly increasing cdf

F_{X}

on

R

and a volatility proxy transformation T is a v-transform. The value

δ = F_{X} (μ_{T})

is the fulcrum of the v-transform.

3.1. A Flexible Parametric Family

In this section, we derive a family of v-transforms using construction (4) by taking a tractable asymmetric model for

F_{X}

using the construction proposed by Fernández and Steel (1998) and by setting

μ_{T} = 0

and

g_{T} (x) = k x^{ξ}

for

k > 0

and

ξ > 0

. This profile function contains the identity profile

g_{T} (x) = x

(corresponding to the symmetric volatility proxy transformation) as a special case, but allows cases where negative or positive returns contribute more to the volatility proxy. The choices we make may at first sight seem rather arbitrary, but the resulting family can in fact assume many of the shapes that are permissible for v-transforms, as we will argue.

Let

f_{0}

be a density that is symmetric about the origin and let

γ > 0

be a scalar parameter. Fernández and Steel suggested the model

f_{X} (x; γ) = \{\begin{matrix} \frac{2 γ}{1 + γ^{2}} f_{0} (γ x) & x \leq 0 \\ \frac{2 γ}{1 + γ^{2}} f_{0} (\frac{x}{γ}) & x > 0 . \end{matrix}

(6)

This model is often used to obtain skewed normal and skewed Student distributions for use as innovation distributions in econometric models. A model with

γ > 1

is skewed to the right while a model with

γ < 1

is skewed to the left, as might be expected for asset returns. We consider the particular case of a Laplace or double exponential distribution

f_{0} (x) = 0.5 exp (- | x |)

which leads to particularly tractable expressions.

Proposition 2.

Let

F_{X} (x; γ)

be the cdf of the density (6) when

f_{0} (x) = 0.5 exp (- | x |)

. Set

μ_{T} = 0

and let

g_{T} (x) = k x^{ξ}

for

k, ξ > 0

. The v-transform (4) is given by

V_{δ, κ, ξ} (u) = \{\begin{matrix} 1 - u - (1 - δ) exp (- κ {(- ln (\frac{u}{δ}))}^{ξ}) & u \leq δ, \\ u - δ exp (- κ^{- 1 / ξ} {(- ln (\frac{1 - u}{1 - δ}))}^{1 / ξ}) & u > δ, \end{matrix}

(7)

where

δ = F_{X} (0) = {(1 + γ^{2})}^{- 1} \in (0, 1)

and

κ = k / γ^{ξ + 1} > 0

.
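As a numerical sanity check on Proposition 2, the sketch below evaluates the general construction (4) directly for the skewed Laplace distribution with μ_T = 0 and g_T(x) = k x^ξ, and compares it with the closed form (7). The function names and the parameter values γ = 0.9, k = 1.2, ξ = 0.8 are assumptions chosen for illustration; the cdf and quantile function are obtained by integrating (6) with the Laplace choice of f_0.

```python
import math


def skew_laplace_cdf(x, gamma):
    """Cdf of the density (6) when f_0 is the Laplace density."""
    delta = 1.0 / (1.0 + gamma ** 2)
    if x <= 0.0:
        return delta * math.exp(gamma * x)
    return 1.0 - (1.0 - delta) * math.exp(-x / gamma)


def skew_laplace_quantile(u, gamma):
    """Inverse of skew_laplace_cdf for 0 < u < 1."""
    delta = 1.0 / (1.0 + gamma ** 2)
    if u <= delta:
        return -math.log(delta / u) / gamma
    return gamma * math.log((1.0 - delta) / (1.0 - u))


def v_by_construction(u, gamma, k, xi):
    """Equation (4) with mu_T = 0 and profile g_T(x) = k * x**xi."""
    delta = skew_laplace_cdf(0.0, gamma)
    x = skew_laplace_quantile(u, gamma)
    if u <= delta:
        # x <= 0 on this arm, so abs(x) equals mu_T - x
        return skew_laplace_cdf(k * abs(x) ** xi, gamma) - u
    return u - skew_laplace_cdf(-((x / k) ** (1.0 / xi)), gamma)


def v_closed_form(u, delta, kappa, xi):
    """Equation (7)."""
    if u <= delta:
        return 1.0 - u - (1.0 - delta) * math.exp(
            -kappa * math.log(delta / u) ** xi)
    return u - delta * math.exp(
        -kappa ** (-1.0 / xi) * math.log((1.0 - delta) / (1.0 - u)) ** (1.0 / xi))


# Hypothetical parameter values for the comparison.
gamma, k, xi = 0.9, 1.2, 0.8
delta = 1.0 / (1.0 + gamma ** 2)
kappa = k / gamma ** (xi + 1.0)
grid = [i / 100.0 for i in range(1, 100)]
max_diff = max(abs(v_by_construction(u, gamma, k, xi)
                   - v_closed_form(u, delta, kappa, xi)) for u in grid)
```

Agreement up to rounding error on the grid is consistent with the parameter mapping δ = (1 + γ²)^{-1} and κ = k / γ^{ξ+1}.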

It is remarkable that (7) is a uniformity-preserving transformation. If we set

ξ = 1

and

κ = 1

, we get

V_{δ} (u) = \{\begin{matrix} (δ - u) / δ & u \leq δ, \\ (u - δ) / (1 - δ) & u > δ \end{matrix}

(8)

which obviously includes the symmetric model

V_{0.5} (u) = | 2 u - 1 |

. The v-transform

V_{δ} (u)

in (8) is a very convenient special case, and we refer to it as the linear v-transform.
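For completeness, the reduction of (7) to the linear v-transform (8) when ξ = 1 and κ = 1 can be written out arm by arm; the following is a direct substitution using only the formulas above.

```latex
% Left arm, u \le \delta, with \xi = \kappa = 1:
V_{\delta,1,1}(u)
  = 1 - u - (1-\delta)\exp\!\Big(\ln\frac{u}{\delta}\Big)
  = 1 - u - (1-\delta)\frac{u}{\delta}
  = 1 - \frac{u}{\delta}
  = \frac{\delta - u}{\delta}.

% Right arm, u > \delta, with \xi = \kappa = 1:
V_{\delta,1,1}(u)
  = u - \delta\exp\!\Big(\ln\frac{1-u}{1-\delta}\Big)
  = u - \delta\,\frac{1-u}{1-\delta}
  = \frac{u(1-\delta) - \delta(1-u)}{1-\delta}
  = \frac{u - \delta}{1-\delta}.
```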

In Figure 3, we show the v-transform

V_{δ, κ, ξ}

when

δ = 0.55

,

κ = 1.4

and

ξ = 0.65

. We will use this particular v-transform to illustrate further properties of v-transforms and find a characterization.

3.2. Characterizing v-Transforms

It is easily verified that any v-transform obtained from (4) consists of two arms or branches, described by continuous and strictly monotonic functions; the left arm is decreasing and the right arm increasing. See Figure 3 for an illustration. At the fulcrum

δ

, we have

V (δ) = 0

. Every point

u \in [0, 1] \setminus \{δ\}

has a dual point

u^{*}

on the opposite side of the fulcrum such that

V (u^{*}) = V (u)

. Dual points can be interpreted as the quantile probability levels of the distribution of X that give rise to the same level of volatility.

We collect these properties together in the following lemma and add one further important property that we refer to as the square property of a v-transform; this property places constraints on the shape that v-transforms can take and is illustrated in Figure 3.

Lemma 2.

A v-transform is a mapping

V : [0, 1] \to [0, 1]

with the following properties:

1. $V (0) = V (1) = 1$ ;
2. There exists a point δ known as the fulcrum such that $0 < δ < 1$ and $V (δ) = 0$ ;
3. $V$ is continuous;
4. $V$ is strictly decreasing on $[0, δ]$ and strictly increasing on $[δ, 1]$ ;
5. Every point $u \in [0, 1] \setminus \{δ\}$ has a dual point $u^{*}$ on the opposite side of the fulcrum satisfying $V (u) = V (u^{*})$ and $| u^{*} - u | = V (u)$ (square property).

It is instructive to see why the square property must hold. Consider Figure 3 and fix a point

u \in [0, 1] \setminus \{δ\}

with

V (u) = v

. Let

U \sim U (0, 1)

and let

V = V (U)

. The events

\{V \leq v\}

and

\{min (u, u^{*}) \leq U \leq max (u, u^{*})\}

are the same and hence the uniformity of V under a v-transform implies that

v = P (V \leq v) = P (min (u, u^{*}) \leq U \leq max (u, u^{*})) = | u^{*} - u | .

(9)
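The square property gives a direct recipe for the dual point: move a horizontal distance V(u) across the fulcrum. The sketch below checks this numerically for the parametric family (7) with the values δ = 0.55, κ = 1.4 and ξ = 0.65 used for Figure 3; the function names are hypothetical.

```python
import math


def v_transform(u, delta, kappa, xi):
    """Parametric v-transform (7), with V(0) = V(1) = 1 handled explicitly."""
    if u <= 0.0 or u >= 1.0:
        return 1.0
    if u <= delta:
        return 1.0 - u - (1.0 - delta) * math.exp(
            -kappa * math.log(delta / u) ** xi)
    return u - delta * math.exp(
        -kappa ** (-1.0 / xi) * math.log((1.0 - delta) / (1.0 - u)) ** (1.0 / xi))


def dual_point(u, delta, kappa, xi):
    """Dual of u: the point at distance V(u) on the other side of delta."""
    v = v_transform(u, delta, kappa, xi)
    return u + v if u < delta else u - v


delta, kappa, xi = 0.55, 1.4, 0.65
grid = [i / 100.0 for i in range(1, 100) if i != 55]   # skip the fulcrum
duals = [dual_point(u, delta, kappa, xi) for u in grid]

# V should take the same value at u and at its dual point.
max_err = max(abs(v_transform(us, delta, kappa, xi)
                  - v_transform(u, delta, kappa, xi))
              for u, us in zip(grid, duals))
# Each dual point should sit on the opposite side of the fulcrum.
opposite = all((u < delta) != (us < delta) for u, us in zip(grid, duals))
```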

The properties in Lemma 2 could be taken as the basis of an alternative definition of a v-transform. In view of (9), it is clear that any mapping

V

that has these properties is a uniformity-preserving transformation. We can characterize the mappings

V

that have these properties as follows.

Theorem 1.

A mapping

V : [0, 1] \to [0, 1]

has the properties listed in Lemma 2 if and only if it takes the form

V (u) = \{\begin{matrix} (1 - u) - (1 - δ) Ψ (\frac{u}{δ}) & u \leq δ, \\ u - δ Ψ^{- 1} (\frac{1 - u}{1 - δ}) & u > δ, \end{matrix}

(10)

where Ψ is a continuous and strictly increasing distribution function on

[0, 1]

.

Our arguments so far show that every v-transform must have the form (10). It remains to verify that every uniformity-preserving transformation of the form (10) can be obtained from construction (4), and this is the purpose of the final result of this section. This allows us to view Definition 2, Lemma 2, and the characterization (10) as three equivalent approaches to the definition of v-transforms.

Proposition 3.

Let

V

be a uniformity-preserving transformation of the form (10) and

F_{X}

a continuous distribution function. Then,

V

can be obtained from construction (4) using any volatility proxy transformation with change point

μ_{T} = F_{X}^{- 1} (δ)

and profile

g_{T} (x) = F_{X}^{- 1} (F_{X} (μ_{T} - x) + V (F_{X} (μ_{T} - x))) - μ_{T}, x \geq 0 .

(11)
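Equation (11) can be checked against Proposition 2: starting from the v-transform (7) and the skewed Laplace cdf, the implied profile should recover g_T(x) = k x^ξ with k = κ γ^{ξ+1}. The sketch below does this numerically; the function names and the values γ = 0.9, k = 1.2, ξ = 0.8 are assumptions for illustration.

```python
import math


def skew_laplace_cdf(x, gamma):
    """Cdf of the density (6) when f_0 is the Laplace density."""
    delta = 1.0 / (1.0 + gamma ** 2)
    if x <= 0.0:
        return delta * math.exp(gamma * x)
    return 1.0 - (1.0 - delta) * math.exp(-x / gamma)


def skew_laplace_quantile(u, gamma):
    """Inverse of skew_laplace_cdf for 0 < u < 1."""
    delta = 1.0 / (1.0 + gamma ** 2)
    if u <= delta:
        return -math.log(delta / u) / gamma
    return gamma * math.log((1.0 - delta) / (1.0 - u))


def v_transform(u, delta, kappa, xi):
    """Parametric v-transform (7) for 0 < u < 1."""
    if u <= delta:
        return 1.0 - u - (1.0 - delta) * math.exp(
            -kappa * math.log(delta / u) ** xi)
    return u - delta * math.exp(
        -kappa ** (-1.0 / xi) * math.log((1.0 - delta) / (1.0 - u)) ** (1.0 / xi))


def implied_profile(x, gamma, kappa, xi):
    """Equation (11) with change point mu_T = F_X^{-1}(delta) = 0."""
    delta = 1.0 / (1.0 + gamma ** 2)
    u = skew_laplace_cdf(-x, gamma)                  # F_X(mu_T - x)
    return skew_laplace_quantile(u + v_transform(u, delta, kappa, xi), gamma)


# Hypothetical parameter values.
gamma, k, xi = 0.9, 1.2, 0.8
kappa = k / gamma ** (xi + 1.0)
xs = [0.1 * i for i in range(31)]
max_err = max(abs(implied_profile(x, gamma, kappa, xi) - k * x ** xi)
              for x in xs)
```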

Henceforth, we can view (10) as the general equation of a v-transform. Distribution functions

Ψ

on

[0, 1]

can be thought of as generators of v-transforms. Comparing (10) with (7), we see that our parametric family

V_{δ, κ, ξ}

is generated by

Ψ (x) = exp (- κ {(- ln x)}^{ξ})

. This is a 2-parameter distribution whose density can assume many different shapes on the unit interval including increasing, decreasing, unimodal, and bathtub-shaped forms. In this respect, it is quite similar to the beta distribution which would yield an alternative family of v-transforms. The uniform distribution function

Ψ (x) = x

gives the family of linear v-transforms

V_{δ}

.
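The generator view in (10) translates directly into code: a v-transform is determined by the fulcrum δ together with a distribution function Ψ on [0, 1] and its inverse. The sketch below is illustrative, with hypothetical function names; it builds the generator Ψ(x) = exp(−κ(−ln x)^ξ) obtained by matching (10) to (7), and also uses the uniform generator Ψ(x) = x, which gives the linear v-transform.

```python
import math


def make_generator(kappa, xi):
    """Return Psi(x) = exp(-kappa * (-ln x)**xi) and its inverse on [0, 1]."""
    def psi(x):
        return 0.0 if x == 0.0 else math.exp(-kappa * math.log(1.0 / x) ** xi)

    def psi_inv(y):
        return 0.0 if y == 0.0 else math.exp(
            -(math.log(1.0 / y) / kappa) ** (1.0 / xi))

    return psi, psi_inv


def v_from_generator(u, delta, psi, psi_inv):
    """General v-transform (10) built from a generator and its inverse."""
    if u <= delta:
        return (1.0 - u) - (1.0 - delta) * psi(u / delta)
    return u - delta * psi_inv((1.0 - u) / (1.0 - delta))


def identity(t):
    """Uniform generator Psi(x) = x, which is its own inverse."""
    return t


delta, kappa, xi = 0.55, 1.4, 0.65
psi, psi_inv = make_generator(kappa, xi)

v_param = v_from_generator(0.2, delta, psi, psi_inv)          # family (7)
v_linear = v_from_generator(0.2, delta, identity, identity)   # family (8)
```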

In applications, we construct models starting from the building blocks of a tractable v-transform

V

such as (7) and a distribution

F_{X}

; from these, we can always infer an implied profile function

g_{T}

using (11). The alternative approach of starting from

g_{T}

and

F_{X}

and constructing

V

via (4) is also possible but can lead to v-transforms that are cumbersome and computationally expensive to evaluate if

F_{X}

and its inverse do not have simple closed forms.

3.3. V-Transforms and Copulas

If two uniform random variables are linked by the v-transform

V = V (U)

, then the joint distribution function of

(U, V)

is a special kind of copula. In this section, we derive the form of the copula, which facilitates the construction of stochastic processes using v-transforms.

To state the main result, we use the notation

V^{- 1}

and

V^{'}

for the inverse function and the gradient function of a v-transform

V

. Although there is no unique inverse

V^{- 1} (v)

(except when

v = 0

), the fact that the two branches of a v-transform mutually determine each other allows us to define

V^{- 1} (v)

to be the inverse of the left branch of the v-transform given by

V^{- 1} : [0, 1] \to [0, δ], V^{- 1} (v) = inf \{u : V (u) = v\}

. The gradient

V^{'} (u)

is defined for all points

u \in [0, 1] \setminus \{δ\}

, and we adopt the convention that

V^{'} (δ)

is the left derivative as

u \to δ

.

Theorem 2.

Let V and U be random variables related by the v-transform

V = V (U)

.

1. The joint distribution function of $(U, V)$ is given by the copula

$C (u, v) = P (U \leq u, V \leq v) = \{\begin{matrix} 0 & u < V^{- 1} (v) \\ u - V^{- 1} (v) & V^{- 1} (v) \leq u < V^{- 1} (v) + v \\ v & u \geq V^{- 1} (v) + v . \end{matrix}$

(12)
2. Conditional on $V = v$ , the distribution of U is given by

$U = \{\begin{matrix} V^{- 1} (v) & \text{with probability } Δ (v) \text{ if } v \neq 0 \\ V^{- 1} (v) + v & \text{with probability } 1 - Δ (v) \text{ if } v \neq 0 \\ δ & \text{if } v = 0 \end{matrix}$

(13)

where

$Δ (v) = - \frac{1}{V^{'} (V^{- 1} (v))} .$

(14)
3. $E (Δ (V)) = δ$ .
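The quantities in Theorem 2 can be evaluated numerically for the parametric family (7). The sketch below is an illustration under assumptions, not the paper's code: the left-branch inverse is found by bisection, the gradient of the left arm is obtained by differentiating (7) by hand, and E(Δ(V)) is approximated by a midpoint rule over v. All function names are hypothetical; the parameter values are those used for Figure 3.

```python
import math


def v_transform(u, delta, kappa, xi):
    """Parametric v-transform (7), with V(0) = V(1) = 1 handled explicitly."""
    if u <= 0.0 or u >= 1.0:
        return 1.0
    if u <= delta:
        return 1.0 - u - (1.0 - delta) * math.exp(
            -kappa * math.log(delta / u) ** xi)
    return u - delta * math.exp(
        -kappa ** (-1.0 / xi) * math.log((1.0 - delta) / (1.0 - u)) ** (1.0 / xi))


def left_inverse(v, delta, kappa, xi, iterations=100):
    """Inverse of the (decreasing) left arm on [0, delta], by bisection."""
    lo, hi = 0.0, delta
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if v_transform(mid, delta, kappa, xi) > v:
            lo = mid          # V(mid) is too large, so the root lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)


def left_slope(u, delta, kappa, xi):
    """Derivative of the left arm of (7) at 0 < u <= delta."""
    # The floor avoids 0.0 ** (negative power) when u is numerically at delta.
    log_ratio = max(math.log(delta / u), 1e-300)
    return -1.0 - ((1.0 - delta) * kappa * xi * log_ratio ** (xi - 1.0)
                   * math.exp(-kappa * log_ratio ** xi) / u)


def down_probability(v, delta, kappa, xi):
    """Conditional down probability (14)."""
    u = left_inverse(v, delta, kappa, xi)
    return -1.0 / left_slope(u, delta, kappa, xi)


delta, kappa, xi = 0.55, 1.4, 0.65
n_nodes = 1000
nodes = [(i + 0.5) / n_nodes for i in range(n_nodes)]
down_values = [down_probability(v, delta, kappa, xi) for v in nodes]
expected_down = sum(down_values) / n_nodes    # midpoint-rule estimate of E(Delta(V))
```

Under these assumptions `expected_down` should be close to δ, in line with part 3 of the theorem, and setting ξ = κ = 1 should return the constant value δ for every v.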

Remark 1.

In the case of the symmetric v-transform

V (u) = | 1 - 2 u |

, the copula in (12) takes the form

C (u, v) = max (min (u + \frac{v}{2} - \frac{1}{2}, v), 0)

. We note that this copula is related to a special case of the tent map copula family

C_{θ}^{T}

in Rémillard (2013) by

C (u, v) = u - C_{1}^{T} (u, 1 - v)

.

For the linear v-transform family, the conditional probability

Δ (v)

in (14) satisfies

Δ (v) = δ

. This implies that the value of V contains no information about whether U is likely to be below or above the fulcrum; the probability is always the same regardless of V. In general, this is not the case and the value of V does contain information about whether U is large or small.

Part (2) of Theorem 2 is the key to stochastically inverting a v-transform in the general case. Based on this result, we define the concept of stochastic inversion of a v-transform. We refer to the function

Δ

as the conditional down probability of

V

.

Definition 3

(Stochastic inversion function of a v-transform). Let

V

be a v-transform with conditional down probability Δ. The two-place function

V^{- 1} : [0, 1] \times [0, 1] \to [0, 1]

defined by

V^{- 1} (v, w) = \{\begin{matrix} V^{- 1} (v) & \text{if } w \leq Δ (v) \\ v + V^{- 1} (v) & \text{if } w > Δ (v) . \end{matrix}

(15)

is the stochastic inversion function of

V

.

The following proposition, which generalizes Lemma 1, allows us to construct general asymmetric processes that generalize the process of Algorithm 1.

Proposition 4.

Let V and W be iid

U (0, 1)

variables and let

V

be a v-transform with stochastic inversion function

V^{- 1}

. If

U = V^{- 1} (V, W)

, then

V (U) = V

and

U \sim U (0, 1)

.
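Proposition 4 lends itself to a simple Monte Carlo check. The sketch below implements the stochastic inversion function (15) for the parametric family (7) and verifies on simulated pairs (V, W) that V(U) reproduces V and that U looks uniform. It is illustrative only: the function names are hypothetical, the left inverse is computed by bisection, and the gradient of the left arm is obtained by differentiating (7) by hand.

```python
import math
import random


def v_transform(u, delta, kappa, xi):
    """Parametric v-transform (7), with V(0) = V(1) = 1 handled explicitly."""
    if u <= 0.0 or u >= 1.0:
        return 1.0
    if u <= delta:
        return 1.0 - u - (1.0 - delta) * math.exp(
            -kappa * math.log(delta / u) ** xi)
    return u - delta * math.exp(
        -kappa ** (-1.0 / xi) * math.log((1.0 - delta) / (1.0 - u)) ** (1.0 / xi))


def left_inverse(v, delta, kappa, xi, iterations=100):
    """Inverse of the (decreasing) left arm on [0, delta], by bisection."""
    lo, hi = 0.0, delta
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if v_transform(mid, delta, kappa, xi) > v:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def left_slope(u, delta, kappa, xi):
    """Derivative of the left arm of (7) at 0 < u <= delta."""
    # The floor avoids 0.0 ** (negative power) when u is numerically at delta.
    log_ratio = max(math.log(delta / u), 1e-300)
    return -1.0 - ((1.0 - delta) * kappa * xi * log_ratio ** (xi - 1.0)
                   * math.exp(-kappa * log_ratio ** xi) / u)


def stochastic_inverse(v, w, delta, kappa, xi):
    """Stochastic inversion function (15)."""
    if v == 0.0:
        return delta                      # only the fulcrum maps to zero
    u_left = left_inverse(v, delta, kappa, xi)
    down = -1.0 / left_slope(u_left, delta, kappa, xi)   # Delta(v) from (14)
    return u_left if w <= down else min(u_left + v, 1.0)


delta, kappa, xi = 0.55, 1.4, 0.65
rng = random.Random(3)
pairs = [(rng.random(), rng.random()) for _ in range(5000)]
u_sample = [stochastic_inverse(v, w, delta, kappa, xi) for v, w in pairs]

max_err = max(abs(v_transform(u, delta, kappa, xi) - v)
              for u, (v, _) in zip(u_sample, pairs))
mean_u = sum(u_sample) / len(u_sample)
share_below = sum(u <= delta for u in u_sample) / len(u_sample)
```

With uniform U one would expect `mean_u` near 0.5 and `share_below` near δ, up to Monte Carlo error.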

In Section 4, we apply v-transforms and their stochastic inverses to the terms of time series models. To understand the effect this has on the serial dependencies between random variables, we need to consider multivariate componentwise v-transforms of random vectors with uniform marginal distributions and these can also be represented in terms of copulas. We now give a result which forms the basis for the analysis of serial dependence properties. The first part of the result shows the relationship between copula densities under componentwise v-transforms. The second part shows the relationship under the componentwise stochastic inversion of a v-transform; in this case, we assume that the stochastic inversion of each term takes place independently given

V

so that all serial dependence comes from

V

.

Theorem 3.

Let

V

be a v-transform and let

U = {(U_{1}, \dots, U_{d})}^{'}

and

V = {(V_{1}, \dots, V_{d})}^{'}

be vectors of uniform random variables with copula densities

c_{U}

and

c_{V}

, respectively.

1. If $V = {(V (U_{1}), \dots, V (U_{d}))}^{'}$ , then

$c_{V} (v_{1}, \dots, v_{d}) = \sum_{j_{1} = 1}^{2} \dots \sum_{j_{d} = 1}^{2} c_{U} (u_{1 j_{1}}, \dots, u_{d j_{d}}) \prod_{i = 1}^{d} {Δ (v_{i})}^{I_{\{j_{i} = 1\}}} {(1 - Δ (v_{i}))}^{I_{\{j_{i} = 2\}}}$

(16)

where $u_{i 1} = V^{- 1} (v_{i})$ and $u_{i 2} = V^{- 1} (v_{i}) + v_{i}$ for all $i \in \{1, \dots, d\}$ .
2. If $U = {(V^{- 1} (V_{1}, W_{1}), \dots, V^{- 1} (V_{d}, W_{d}))}^{'}$ where $W_{1}, \dots, W_{d}$ are iid uniform random variables that are also independent of $V_{1}, \dots, V_{d}$ , then

$c_{U} (u_{1}, \dots, u_{d}) = c_{V} (V (u_{1}), \dots, V (u_{d})) .$

(17)

4. VT-ARMA Copula Models

In this section, we study some properties of the class of time series models obtained by the following algorithm, which generalizes Algorithm 1. The models obtained are described as VT-ARMA processes since they are stationary time series constructed using the fundamental building blocks of a v-transform

V

and an ARMA process.

We can add any marginal behaviour in the final step, and this allows for an infinitely rich choice. We can, for instance, even impose an infinite-variance or an infinite-mean distribution, such as the Cauchy distribution, and still obtain a strictly stationary process for

(X_{t})

. We make the following definitions.

Definition 4

(VT-ARMA and VT-ARMA copula process). Any stochastic process

(X_{t})

that can be generated using Algorithm 2 by choosing an underlying ARMA process with mean zero and variance one, a v-transform

V

, and a continuous distribution function

F_{X}

is a VT-ARMA process. The process

(U_{t})

obtained at the penultimate step of the algorithm is a VT-ARMA copula process.

Algorithm 2:

Generate $(Z_{t})$ as a causal and invertible Gaussian ARMA process of order $(p, q)$ with mean zero and variance one.
Form the volatility PIT process $(V_{t})$ where $V_{t} = Φ (Z_{t})$ for all t.
Generate iid $U (0, 1)$ random variables $(W_{t})$ .
Form the series PIT process $(U_{t})$ by taking the stochastic inverses $U_{t} = V^{- 1} (V_{t}, W_{t})$ .
Form the process $(X_{t})$ by setting $X_{t} = F_{X}^{- 1} (U_{t})$ for some continuous cdf $F_{X}$ .
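The five steps of Algorithm 2 can be sketched in a few lines of Python. This is a minimal illustration, not the tscopula implementation. It assumes the linear v-transform $V_{δ}$ (the form obtained from the Appendix A.3 representation with $Ψ (x) = x$), for which the left pre-image of $v$ is $δ (1 - v)$ and is selected with constant probability $δ$. It also assumes an ARMA(1,1) process started from zero with a burn-in period, and a standard double exponential margin. All function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

PHI = NormalDist()  # standard normal; PHI.cdf plays the role of the cdf Phi


def v_linear(u, delta):
    """Linear v-transform with fulcrum delta (assumed form of V_delta)."""
    return (delta - u) / delta if u <= delta else (u - delta) / (1.0 - delta)


def stochastic_inverse_linear(v, w, delta):
    """Return the left pre-image of v with probability delta, else the right one."""
    left = delta * (1.0 - v)  # V^{-1}(v): the pre-image below the fulcrum
    return left if w <= delta else left + v


def laplace_quantile(u):
    """Quantile function of the standard double exponential distribution."""
    return math.log(2.0 * u) if u <= 0.5 else -math.log(2.0 * (1.0 - u))


def simulate_vt_arma11(n, alpha, beta, delta, seed=1, burn_in=500):
    """Return a list of (z_t, v_t, u_t, x_t) tuples following the five steps."""
    rng = random.Random(seed)
    # Innovation s.d. chosen so that the ARMA(1,1) process has variance one.
    sigma_eps = math.sqrt((1.0 - alpha**2) / (1.0 + 2.0 * alpha * beta + beta**2))
    z, eps_prev, path = 0.0, 0.0, []
    for t in range(n + burn_in):
        eps = rng.gauss(0.0, sigma_eps)
        z = alpha * z + eps + beta * eps_prev        # step 1: Gaussian ARMA(1,1)
        eps_prev = eps
        if t < burn_in:                              # discard the start-up phase
            continue
        v = PHI.cdf(z)                               # step 2: volatility PIT
        w = rng.random()                             # step 3: iid uniform
        u = stochastic_inverse_linear(v, w, delta)   # step 4: series PIT
        x = laplace_quantile(u)                      # step 5: add the margin
        path.append((z, v, u, x))
    return path
```

Applying `v_linear` to the simulated `u` values recovers the `v` values, which is the defining property of the stochastic inverse.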

Figure 4 gives an example of a simulated process using Algorithm 2 and the v-transform

V_{δ, κ, ξ}

in (7) with

κ = 0.9

and

ξ = 1.1

. The marginal distribution is a heavy-tailed skewed Student distribution of type (6) with degrees-of-freedom

ν = 3

and skewness

γ = 0.8

, which gives rise to more large negative returns than large positive returns. The underlying time series model is an ARMA(1,1) model with AR parameter

α = 0.95

and MA parameter

β = - 0.85

. See the caption of the figure for full details of parameters.

In the remainder of this section, we concentrate on the properties of VT-ARMA copula processes

(U_{t})

from which related properties of VT-ARMA processes

(X_{t})

may be easily inferred.

4.1. Stationary Distribution

The VT-ARMA copula process

(U_{t})

of Definition 4 is a strictly stationary process since the joint distribution of

(U_{t_{1}}, \dots, U_{t_{k}})

for any set of indices

t_{1} < \dots < t_{k}

is invariant under time shifts. This property follows easily from the strict stationarity of the underlying ARMA process

(Z_{t})

according to the following result, which uses Theorem 3.

Proposition 5.

Let

(U_{t})

follow a VT-ARMA copula process with v-transform

V

and an underlying ARMA(p,q) structure with autocorrelation function

ρ (k)

. The random vector

(U_{t_{1}}, \dots, U_{t_{k}})

for

k \in \mathbb{N}

has joint density

c_{P (t_{1}, \dots, t_{k})}^{Ga} (V (u_{1}), \dots, V (u_{k}))

, where

c_{P (t_{1}, \dots, t_{k})}^{Ga}

denotes the density of the Gaussian copula

C_{P (t_{1}, \dots, t_{k})}^{Ga}

and

P (t_{1}, \dots, t_{k})

is a correlation matrix with

(i, j)

element given by

ρ (| t_{j} - t_{i} |)

.

An expression for the joint density facilitates the calculation of a number of dependence measures for the bivariate marginal distribution of

(U_{t}, U_{t + k})

. In the bivariate case, the correlation matrix of the underlying Gaussian copula

C_{P (t, t + k)}^{Ga}

contains a single off-diagonal value

ρ (k)

and we simply write

C_{ρ (k)}^{Ga}

. The Pearson correlation of

(U_{t}, U_{t + k})

is given by

\begin{matrix} ρ (U_{t}, U_{t + k}) & = 12 \int_{0}^{1} \int_{0}^{1} u_{1} u_{2} c_{ρ (k)}^{Ga} (V (u_{1}), V (u_{2})) d u_{1} d u_{2} - 3 . \end{matrix}

(18)

This value is also the value of the Spearman rank correlation

ρ_{S} (X_{t}, X_{t + k})

for a VT-ARMA process

(X_{t})

with copula process

(U_{t})

(since Spearman’s rank correlation of a pair of continuous random variables is the Pearson correlation of their copula). The calculation of (18) typically requires numerical integration. However, in the special case of the linear v-transform

V_{δ}

in (8), we can get a simpler expression as shown in the following result.

Proposition 6.

Let

(U_{t})

be a VT-ARMA copula process satisfying the assumptions of Proposition 5 with linear v-transform

V_{δ}

. Let

(Z_{t})

denote the underlying Gaussian ARMA process. Then,

\begin{matrix} ρ (U_{t}, U_{t + k}) & = & {(2 δ - 1)}^{2} ρ_{S} (Z_{t}, Z_{t + k}) = \frac{6 {(2 δ - 1)}^{2} arcsin (\frac{ρ (k)}{2})}{π} . \end{matrix}

(19)

For the symmetric v-transform

V_{0.5}

, Equation (19) obviously yields a correlation of zero so that, in this case, the VT-ARMA copula process

(U_{t})

is a white noise with an autocorrelation function that is zero, except at lag zero. However, even a very asymmetric model with

δ = 0.4

or

δ = 0.6

gives

ρ (U_{t}, U_{t + k}) = 0.04 ρ_{S} (Z_{t}, Z_{t + k})

so that serial correlations tend to be very weak.
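To give a feel for the size of these correlations, the following sketch evaluates (19) for an ARMA(1,1) proxy process. The autocorrelation function used is the standard ARMA(1,1) result $ρ (1) = (1 + α β) (α + β) / (1 + 2 α β + β^{2})$ and $ρ (k) = α^{k - 1} ρ (1)$. The function names and the parameter values are illustrative assumptions.

```python
import math


def arma11_acf(k, alpha, beta):
    """Autocorrelation rho(k), k >= 1, of a causal ARMA(1,1) process."""
    rho1 = (1.0 + alpha * beta) * (alpha + beta) / (1.0 + 2.0 * alpha * beta + beta**2)
    return rho1 * alpha ** (k - 1)


def pearson_corr_linear_vt(rho_k, delta):
    """Equation (19): serial correlation of (U_t) under the linear v-transform."""
    return 6.0 * (2.0 * delta - 1.0) ** 2 * math.asin(rho_k / 2.0) / math.pi


# Serial correlations of (U_t) at a few lags for alpha = 0.95, beta = -0.85.
for k in (1, 5, 20):
    rho_k = arma11_acf(k, 0.95, -0.85)
    print(k, pearson_corr_linear_vt(rho_k, 0.4))
```

The factor ${(2 δ - 1)}^{2}$ shrinks every lag by the same amount, so the shape of the autocorrelation function is inherited from the proxy process while its level is pushed towards zero.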

When we add a marginal distribution, the resulting process

(X_{t})

has a different auto-correlation function to

(U_{t})

, but the same rank autocorrelation function. The symmetric model of Section 2 is a white noise process. General asymmetric processes

(X_{t})

are not perfect white noise processes but have only very weak serial correlation.

4.2. Conditional Distribution

To derive the conditional distribution of a VT-ARMA copula process, we use the vector notation

U_{t} = {(U_{1}, \dots, U_{t})}^{'}

and

Z_{t} = {(Z_{1}, \dots, Z_{t})}^{'}

to denote the history of processes up to time point t and

u_{t}

and

z_{t}

for realizations. These vectors are related by the componentwise transformation

Z_{t} = Φ^{- 1} (V (U_{t}))

. We assume that all processes have a time index set given by

t \in \{1, 2, \dots\}

.

Proposition 7.

For

t > 1

, the conditional density

f_{U_{t} ∣ U_{t - 1}} (u ∣ u_{t - 1})

is given by

f_{U_{t} ∣ U_{t - 1}} (u ∣ u_{t - 1}) = \frac{ϕ (\frac{Φ^{- 1} (V (u)) - μ_{t}}{σ_{ϵ}})}{σ_{ϵ} ϕ (Φ^{- 1} (V (u)))}

(20)

where

μ_{t} = E (Z_{t} ∣ Z_{t - 1} = Φ^{- 1} (V (u_{t - 1})))

and

σ_{ϵ}

is the standard deviation of the innovation process for the ARMA model followed by

(Z_{t})

.

When

(Z_{t})

is iid white noise, we have

μ_{t} = 0

and

σ_{ϵ} = 1

, so that (20) reduces to the uniform density

f_{U_{t} ∣ U_{t - 1}} (u ∣ u_{t - 1}) = 1

as expected. In the case of the first-order Markov AR(1) model

Z_{t} = α_{1} Z_{t - 1} + ϵ_{t}

, the conditional mean of

Z_{t}

is

μ_{t} = α_{1} Φ^{- 1} (V (u_{t - 1}))

and

σ_{ϵ}^{2} = 1 - α_{1}^{2}

. The conditional density (20) can be easily shown to simplify to

f_{U_{t} ∣ U_{t - 1}} (u ∣ u_{t - 1}) = c_{α_{1}}^{Ga} (V (u), V (u_{t - 1}))

where

c_{α_{1}}^{Ga} (V (u_{1}), V (u_{2}))

denotes the copula density derived in Proposition 5. In this special case, the VT-ARMA model falls within the class of first-order Markov copula models considered by Chen and Fan (2006), although the copula is new.
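The AR(1) case is simple enough to evaluate directly. The sketch below codes the conditional density (20) with $μ_{t} = α_{1} Φ^{- 1} (V (u_{t - 1}))$ and $σ_{ϵ}^{2} = 1 - α_{1}^{2}$, assuming the linear v-transform $V_{δ}$, and checks numerically that it integrates to one. The ratio of normal densities is written in exponential form to avoid dividing by very small numbers. Names and parameter values are hypothetical.

```python
import math
from statistics import NormalDist

N01 = NormalDist()  # standard normal distribution


def v_linear(u, delta):
    """Linear v-transform with fulcrum delta (assumed form of V_delta)."""
    return (delta - u) / delta if u <= delta else (u - delta) / (1.0 - delta)


def cond_density_ar1(u, u_prev, alpha, delta):
    """Equation (20) for an AR(1) proxy process with unit variance."""
    sigma = math.sqrt(1.0 - alpha**2)
    z = N01.inv_cdf(v_linear(u, delta))
    mu = alpha * N01.inv_cdf(v_linear(u_prev, delta))
    # phi((z - mu) / sigma) / (sigma * phi(z)), simplified algebraically.
    return math.exp(-0.5 * ((z - mu) / sigma) ** 2 + 0.5 * z * z) / sigma


def integrate_density(u_prev, alpha, delta, n_grid=20000):
    """Midpoint rule on (0, 1); the midpoints avoid u = 0, u = delta and u = 1."""
    h = 1.0 / n_grid
    return h * sum(cond_density_ar1((i + 0.5) * h, u_prev, alpha, delta)
                   for i in range(n_grid))
```

With `alpha = 0` the function returns one for every `u`, which is the uniform density of the iid case.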

If we add a marginal distribution

F_{X}

to the VT-ARMA copula model to obtain a model for

(X_{t})

and use similar notational conventions as above, the resulting VT-ARMA model has conditional density

f_{X_{t} ∣ X_{t - 1}} (x ∣ x_{t - 1}) = f_{X} (x) f_{U_{t} ∣ U_{t - 1}} (F_{X} (x) ∣ F_{X} (x_{t - 1}))

(21)

with

f_{U_{t} ∣ U_{t - 1}}

as in (20). An interesting property of the VT-ARMA process is that the conditional density (21) can have a pronounced bimodality for values of

μ_{t}

in excess of zero, that is, in high-volatility situations where the conditional mean of

Z_{t}

is higher than the marginal mean value of zero; in low volatility situations, the conditional density appears more concentrated around zero. This phenomenon is illustrated in Figure 4. The bimodality in high volatility situations makes sense: in such cases, it is likely that the next return will be large in absolute value and relatively less likely that it will be close to zero.

The conditional distribution function of

(X_{t})

is

F_{X_{t} ∣ X_{t - 1}} (x ∣ x_{t - 1}) = F_{U_{t} ∣ U_{t - 1}} (F_{X} (x) ∣ F_{X} (x_{t - 1}))

and hence the

ψ

-quantile

x_{ψ, t}

of

F_{X_{t} ∣ X_{t - 1}}

can be obtained by solving

ψ = F_{U_{t} ∣ U_{t - 1}} (F_{X} (x_{ψ, t}) ∣ F_{X} (x_{t - 1})) .

(22)

For

ψ < 0.5

, the negative of this value is often referred to as the conditional

(1 - ψ)

-VaR (value-at-risk) at time t in financial applications.
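Equation (22) can be solved by bisection once the conditional distribution function is available. The sketch below treats only the linear v-transform, for which the left branch is chosen with constant probability $δ$. Under that assumption, the event representation used in the proof of Theorem 3 gives $F_{U_{t} ∣ U_{t - 1}} (u) = δ P (V_{t} > V (u))$ for $u \leq δ$ and $1 - (1 - δ) P (V_{t} > V (u))$ for $u > δ$, with the probabilities computed from $Z_{t} \sim N (μ_{t}, σ_{ϵ}^{2})$. This closed form is our own working rather than a formula quoted from the text, and the double exponential margin and all names are illustrative.

```python
import math
from statistics import NormalDist

N01 = NormalDist()
TINY = 1e-12  # keeps the normal quantile finite at V(u) = 0 or 1


def v_linear(u, delta):
    """Linear v-transform with fulcrum delta (assumed form of V_delta)."""
    return (delta - u) / delta if u <= delta else (u - delta) / (1.0 - delta)


def cond_cdf_linear(u, mu, sigma, delta):
    """P(U_t <= u | past) when Z_t | past is N(mu, sigma^2); linear case only."""
    v = min(max(v_linear(u, delta), TINY), 1.0 - TINY)
    tail = 1.0 - N01.cdf((N01.inv_cdf(v) - mu) / sigma)  # P(V_t > V(u) | past)
    return delta * tail if u <= delta else 1.0 - (1.0 - delta) * tail


def cond_quantile(psi, mu, sigma, delta, tol=1e-10):
    """Solve psi = F(u) for u by bisection; F is increasing in u."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cond_cdf_linear(mid, mu, sigma, delta) < psi:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def laplace_quantile(u):
    """Quantile function of the standard double exponential distribution."""
    return math.log(2.0 * u) if u <= 0.5 else -math.log(2.0 * (1.0 - u))


def conditional_var(level, mu, sigma, delta):
    """Conditional VaR at the given level: minus the (1 - level)-quantile of X_t."""
    return -laplace_quantile(cond_quantile(1.0 - level, mu, sigma, delta))
```

In the iid case (`mu = 0`, `sigma = 1`) the conditional quantile of $U_{t}$ equals $ψ$ itself, and a positive conditional mean of the proxy process widens the VaR estimate, as one would expect in a high-volatility period.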

5. Statistical Inference

In the copula approach to dependence modelling, the copula is the object of central interest and marginal distributions are often of secondary importance. A number of different approaches to estimation are found in the literature. As before, let

x_{1}, \dots, x_{n}

represent realizations of variables

X_{1}, \dots, X_{n}

from the time series process

(X_{t})

.

The semi-parametric approach developed by Genest et al. (1995) is very widely used in copula inference and has been applied by Chen and Fan (2006) to first-order Markov copula models in the time series context. In this approach, the marginal distribution

F_{X}

is first estimated non-parametrically using the scaled empirical distribution function

F_{n}^{(X)}

(see definition in Section 1) and the data are transformed onto the

(0, 1)

scale. This has the effect of creating pseudo-copula data

u_{t} = rank (x_{t}) / (n + 1)

where

rank (x_{t})

denotes the rank of

x_{t}

within the sample. The copula is fitted to the pseudo-copula data by maximum likelihood (ML).
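The rank transformation is easy to state in code. The following sketch computes $u_{t} = rank (x_{t}) / (n + 1)$ for a list of observations, assuming there are no ties (as is the case almost surely for continuous data). The function name is hypothetical.

```python
def pseudo_copula_data(x):
    """Return u_t = rank(x_t) / (n + 1), with ranks running from 1 to n."""
    n = len(x)
    order = sorted(range(n), key=lambda t: x[t])  # indices from smallest to largest
    u = [0.0] * n
    for rank, t in enumerate(order, start=1):
        u[t] = rank / (n + 1)
    return u
```

Dividing by $n + 1$ rather than $n$ keeps every value strictly inside $(0, 1)$, which matters because the copula density is later evaluated at normal quantiles of transformed values.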

As an alternative, the inference-functions-for-margins (IFM) approach of Joe (2015) could be applied. This is also a two-step method although in this case a parametric model

{\hat{F}}_{X}

is estimated under an iid assumption in the first step and the copula is fitted to the data

u_{t} = {\hat{F}}_{X} (x_{t})

in the second step.

The approach we adopt for our empirical example is to first use the semi-parametric approach to determine a reasonable copula process, then to estimate marginal parameters under an iid assumption, and finally to estimate all parameters jointly using the parameter estimates from the previous steps as starting values.

We concentrate on the mechanics of deriving maximum likelihood estimates (MLEs). The problem of establishing the asymptotic properties of the MLEs in our setting is a difficult one. It is similar to, but appears to be more technically challenging than, the problem of showing consistency and efficiency of MLEs for a Box-Cox-transformed Gaussian ARMA process, as discussed in Terasaka and Hosoya (2007). We are also working with a componentwise transformed ARMA process, although, in our case, the transformation

(X_{t}) \to (Z_{t})

is via the nonlinear, non-increasing volatility proxy transformation

T^{(Z)} (x)

in (5), which is not differentiable at the change point

μ_{T}

. We have, however, run extensive simulations which suggest good behaviour of the MLEs in large samples.

5.1. Maximum Likelihood Estimation of the VT-ARMA Copula Process

We first consider the estimation of the VT-ARMA copula process for a sample of data

u_{1}, \dots, u_{n}

. Let

θ^{(V)}

and

θ^{(A)}

denote the parameters of the v-transform and ARMA model, respectively. It follows from Theorem 3 (part 2) and Proposition 5 that the log-likelihood for the sample

u_{1}, \dots, u_{n}

is simply the log density of the Gaussian copula under componentwise inverse v-transformation. This is given by

\begin{matrix} L (θ^{(V)}, θ^{(A)} ∣ u_{1}, \dots, u_{n}) & = L^{*} (θ^{(A)} ∣ Φ^{- 1} (V_{θ^{(V)}} (u_{1})), \dots, Φ^{- 1} (V_{θ^{(V)}} (u_{n}))) \\ - \sum_{t = 1}^{n} ln ϕ (Φ^{- 1} (V_{θ^{(V)}} (u_{t}))) \end{matrix}

(23)

where the first term

L^{*}

is the log-likelihood for an ARMA model with a standard N(0,1) marginal distribution. Both terms in the log-likelihood (23) are relatively straightforward to evaluate.

The evaluation of the ARMA likelihood

L^{*} (θ^{(A)} ∣ z_{1}, \dots, z_{n})

for parameters

θ^{(A)}

and data

z_{1}, \dots, z_{n}

can be accomplished using the Kalman filter. However, it is important to note that the assumption that the data

z_{1}, \dots, z_{n}

are standard normal requires a bespoke implementation of the Kalman filter, since standard software typically treats the error variance

σ_{ϵ}^{2}

as a free parameter in the ARMA model. In our case, we need to constrain

σ_{ϵ}^{2}

to be a function of the ARMA parameters so that

var (Z_{t}) = 1

. For example, in the case of an ARMA(1,1) model with AR parameter

α_{1}

and MA parameter

β_{1}

, this means that

σ_{ϵ}^{2} = σ_{ϵ}^{2} (α_{1}, β_{1}) = (1 - α_{1}^{2}) / (1 + 2 α_{1} β_{1} + β_{1}^{2})

. The constraint on

σ_{ϵ}^{2}

must be incorporated into the state-space representation of the ARMA model.
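The ARMA(1,1) constraint can be checked numerically. The sketch below computes the constrained innovation variance and verifies it against the MA($\infty$) representation, in which $var (Z_{t}) = σ_{ϵ}^{2} \sum_{j} ψ_{j}^{2}$ with $ψ_{0} = 1$ and $ψ_{j} = α_{1}^{j - 1} (α_{1} + β_{1})$ for $j \geq 1$ (a standard ARMA(1,1) result). Function names and the truncation point are illustrative choices.

```python
def innovation_variance_arma11(alpha, beta):
    """Innovation variance that forces var(Z_t) = 1 in an ARMA(1,1) model."""
    return (1.0 - alpha**2) / (1.0 + 2.0 * alpha * beta + beta**2)


def arma11_variance(alpha, beta, sigma2, n_terms=2000):
    """var(Z_t) from the truncated MA(infinity) weights of the ARMA(1,1) model."""
    total = 1.0            # psi_0^2
    psi = alpha + beta     # psi_1
    for _ in range(n_terms):
        total += psi * psi
        psi *= alpha       # psi_{j+1} = alpha * psi_j
    return sigma2 * total
```

For example, `arma11_variance(0.95, -0.85, innovation_variance_arma11(0.95, -0.85))` returns a value equal to one up to truncation and rounding error.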

Model validation tests for the VT-ARMA copula can be based on residuals

r_{t} = z_{t} - {\hat{μ}}_{t}, z_{t} = Φ^{- 1} (V_{{\hat{θ}}^{(V)}} (u_{t}))

(24)

where

z_{t}

denotes the implied realization of the normalized volatility proxy variable and where an estimate

{\hat{μ}}_{t}

of the conditional mean

μ_{t} = E (Z_{t} ∣ Z_{t - 1} = z_{t - 1})

may be obtained as an output of the Kalman filter. The residuals should behave like an iid sample from a normal distribution.

Using the estimated model, it is also possible to implement a likelihood-ratio (LR) test for the presence of stochastic volatility in the data. Under the null hypothesis that

θ^{(A)} = 0

, the log-likelihood (23) is identically equal to zero. Thus, the size of the maximized log-likelihood

L ({\hat{θ}}^{(V)}, {\hat{θ}}^{(A)} ∣ u_{1}, \dots, u_{n})

provides a measure of the evidence for the presence of stochastic volatility.
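For an AR(1) proxy process the log-likelihood (23) can be written down without a Kalman filter, because the exact Gaussian likelihood factorizes into the stationary density of $z_{1}$ and one-step conditional densities. The sketch below uses this special case, together with the linear v-transform, to illustrate that (23) vanishes when the ARMA parameter is zero. It is an illustration under these assumptions, not the estimation code used for the results in the paper, and the names are hypothetical.

```python
import math
from statistics import NormalDist

N01 = NormalDist()


def v_linear(u, delta):
    """Linear v-transform with fulcrum delta (assumed form of V_delta)."""
    return (delta - u) / delta if u <= delta else (u - delta) / (1.0 - delta)


def log_phi(z):
    """Log of the standard normal density."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)


def vt_ar1_loglik(u, alpha, delta):
    """Log-likelihood (23) for an AR(1) proxy process with var(Z_t) = 1."""
    z = [N01.inv_cdf(v_linear(ut, delta)) for ut in u]
    sigma = math.sqrt(1.0 - alpha**2)
    # First term L*: stationary density of z_1, then one-step predictions.
    l_star = log_phi(z[0])
    for t in range(1, len(z)):
        l_star += log_phi((z[t] - alpha * z[t - 1]) / sigma) - math.log(sigma)
    # Second term: subtract the log standard normal densities of the z_t.
    return l_star - sum(log_phi(zt) for zt in z)
```

Since the null model has log-likelihood zero, the value returned for fitted parameters can be read directly as the log-likelihood gain over the hypothesis of no stochastic volatility.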

5.2. Adding a Marginal Model

If

F_{X}

and

f_{X}

denote the cdf and density of the marginal model and the parameters are denoted

θ^{(M)}

, then the full log-likelihood for the data

x_{1}, \dots, x_{n}

is simply

\begin{matrix} L^{full} (θ ∣ x_{1}, \dots, x_{n}) & = \sum_{t = 1}^{n} ln f_{X} (x_{t}; θ^{(M)}) \\ + L (θ^{(V)}, θ^{(A)} ∣ F_{X} (x_{1}; θ^{(M)}), \dots, F_{X} (x_{n}; θ^{(M)})) \end{matrix}

(25)

where the first term is the log-likelihood for a sample of iid data from the marginal distribution

F_{X}

and the second term is (23).

When a marginal model is added, we can recover the implied form of the volatility proxy transformation using Proposition 3. If

\hat{δ}

is the estimated fulcrum parameter of the v-transform, then the estimated change point is

{\hat{μ}}_{T} = F_{X}^{- 1} (\hat{δ}; {\hat{θ}}^{(M)})

and the implied profile function is

\begin{matrix} {\hat{g}}_{T} (x) & = & {\hat{F}}_{X}^{- 1} ({\hat{F}}_{X} ({\hat{μ}}_{T} - x) + V_{{\hat{θ}}^{(V)}} ({\hat{F}}_{X} ({\hat{μ}}_{T} - x))) - {\hat{μ}}_{T} . \end{matrix}

(26)

Note that it is possible to force the change point to be zero in a joint estimation of marginal model and copula by imposing the constraint

F_{X} (0; θ^{(M)}) = δ

on the fulcrum and marginal parameters during the optimization. However, in our experience, superior fits are obtained when these parameters are unconstrained.
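The recovery of the change point and the profile function can be illustrated as follows. The sketch follows the construction in Appendix A.4, where the dual point is $u^{*} (x) = u (x) + V (u (x))$ with $u (x) = F_{X} (μ_{T} - x)$ and $g_{T} (x) = F_{X}^{- 1} (u^{*} (x)) - μ_{T}$. A double exponential margin with location and scale parameters and the linear v-transform are assumed purely for illustration, and all names are hypothetical.

```python
import math


def laplace_cdf(x, loc=0.0, scale=1.0):
    """Distribution function of the double exponential distribution."""
    y = (x - loc) / scale
    return 0.5 * math.exp(y) if y <= 0 else 1.0 - 0.5 * math.exp(-y)


def laplace_quantile(u, loc=0.0, scale=1.0):
    """Quantile function of the double exponential distribution."""
    y = math.log(2.0 * u) if u <= 0.5 else -math.log(2.0 * (1.0 - u))
    return loc + scale * y


def v_linear(u, delta):
    """Linear v-transform with fulcrum delta (assumed form of V_delta)."""
    return (delta - u) / delta if u <= delta else (u - delta) / (1.0 - delta)


def implied_profile(x, delta, loc=0.0, scale=1.0):
    """Profile function g_T(x) for x >= 0 implied by the margin and v-transform."""
    mu_t = laplace_quantile(delta, loc, scale)   # change point F_X^{-1}(delta)
    u = laplace_cdf(mu_t - x, loc, scale)        # quantile level of mu_T - x
    u_star = u + v_linear(u, delta)              # dual point above the fulcrum
    return laplace_quantile(u_star, loc, scale) - mu_t
```

With a symmetric margin and $δ = 0.5$ the profile function is the identity, so up and down moves of equal size map to the same value of the volatility proxy. Moving $δ$ away from 0.5 bends the profile away from the line $y = x$.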

5.3. Example

We analyse

n = 1043

daily log-returns for the Bitcoin price series for the period 2016–2019; values are multiplied by 100. We first apply the semi-parametric approach of Genest et al. (1995) using the log-likelihood (23) which yields the results in Table 1. Different models are referred to by VT(n)-ARMA(p, q), where

(p, q)

refers to the ARMA model and n indexes the v-transform: 1 is the linear v-transform

V_{δ}

in (8); 3 is the three-parameter transform

V_{δ, κ, ξ}

in (7); 2 is the two-parameter v-transform given by

V_{δ, κ} : = V_{δ, κ, 1}

. In unreported analyses, we also tried the three-parameter family based on the beta distribution, but this had negligible effect on the results.

The column marked L gives the value of the maximized log-likelihood. All values are large and positive, showing strong evidence of stochastic volatility in all cases. The model VT(1)-ARMA(1,0) is a first-order Markov model with linear v-transform. The fit of this model is noticeably poorer than the others, suggesting that Markov models are insufficient to capture the persistence of stochastic volatility in the data. The column marked SW contains the p-value for a Shapiro–Wilk test of normality applied to the residuals from the VT-ARMA copula model; the result is non-significant in all cases.

According to the AIC values, the VT(2)-ARMA(1,1) is the best model. We experimented with higher-order ARMA processes, but this did not lead to further significant improvements. Figure 5 provides a visual assessment of the fit of this model. The pictures in the panels show the QQplot of the residuals against normal, acf plots of the residuals and absolute residuals and the estimated conditional mean process

({\hat{μ}}_{t})

, which can be taken as an indicator of high and low volatility periods. The residuals and absolute residuals show very little evidence of serial correlation and the QQplot is relatively linear, suggesting that the ARMA filter has been successful in explaining much of the serial dependence structure of the normalized volatility proxy process.

We now add various marginal distributions to the VT(2)-ARMA(1,1) copula model and estimate all parameters of the model jointly. We have experimented with a number of location-scale families including Student-t, Laplace (double exponential), and a double-Weibull family which generalizes the Laplace distribution and is constructed by taking back-to-back Weibull distributions. Estimation results are presented for these three distributions in Table 2. All three marginal distributions are symmetric around their location parameters

μ

, and no improvement is obtained by adding skewness using the construction of Fernández and Steel (1998) described in Section 3.1; in fact, the Bitcoin returns in this time period show a remarkable degree of symmetry. In the table, the shape and scale parameters of the distributions are denoted

η

and

σ

, respectively; in the case of Student, an infinite-variance distribution with degree-of-freedom parameter

η = 1.94

is fitted, but this model is inferior to the models with Laplace and double-Weibull margins; the latter is the favoured model on the basis of AIC values.

Figure 6 shows some aspects of the joint fit for the fully parametric VT(2)-ARMA(1,1) model with double-Weibull margin. A QQplot of the data against the fitted marginal distribution confirms that the double-Weibull is a good marginal model for these data. Although this distribution is sub-exponential (heavier-tailed than exponential), its tails do not follow a power law and it is in the maximum domain of attraction of the Gumbel distribution (see, for example, McNeil et al. 2015, Chapter 5).

Using (26), the implied volatility proxy profile function

{\hat{g}}_{T}

can be constructed and is found to lie just below the line

y = x

as shown in the upper-right panel. The change point is estimated to be

{\hat{μ}}_{T} = 0.06

. We can also estimate an implied volatility proxy transformation in the equivalence class defined by

{\hat{g}}_{T}

and

{\hat{μ}}_{T}

. We estimate the transformation

T = T^{(Z)}

in (5) by taking

\hat{T} (x) = Φ^{- 1} (V_{{\hat{θ}}^{(V)}} (F_{X} (x; {\hat{θ}}^{(M)})))

. In the lower-left panel of Figure 6, we show the empirical v-transform formed from the data

(x_{t}, \hat{T} (x_{t}))

together with the fitted parametric v-transform

V_{{\hat{θ}}^{(V)}}

. We recall from Section 1 that the empirical v-transform is the plot

(u_{t}, v_{t})

where

u_{t} = F_{n}^{(X)} (x_{t})

and

v_{t} = F_{n}^{(\hat{T} (X))} (\hat{T} (x_{t}))

. The empirical v-transform and the fitted parametric v-transform show a good degree of correspondence. The lower-right panel of Figure 6 shows the volatility proxy transformation

\hat{T} (x)

as a function of x superimposed on the points

(x_{t}, Φ^{- 1} (v_{t}))

. Using the curve, we can compare the effects of, for example, a log-return (× 100) of −10 and a log-return of 10. For the fitted model, these are 1.55 and 1.66 showing that the up movement is associated with slightly higher volatility.

As a comparison to the VT-ARMA model, we fit standard GARCH(1,1) models using Student-t and generalized error distributions for the innovations; these are standard choices available in the popular rugarch package in R. The generalized error distribution (GED) contains normal and Laplace as special cases as well as a model that has a similar tail behaviour to Weibull; note, however, that, by the theory of Mikosch and Stărică (2000), the tails of the marginal distribution of the GARCH decay according to a power law in both cases. The results in Table 3 show that the VT(2)-ARMA(1,1) models with Laplace and double-Weibull marginal distributions outperform both GARCH models in terms of AIC values.

Figure 7 shows the in-sample 95% conditional value-at-risk (VaR) estimate based on the VT(2)-ARMA(1,1) model which has been calculated using (22). For comparison, a dashed line shows the corresponding estimate for the GARCH(1,1) model with GED innovations.

Finally, we carry out an out-of-sample comparison of conditional VaR estimates using the same two models. In this analysis, the models are estimated daily throughout the 2016–2019 period using a 1000-day moving data window and one-step-ahead VaR forecasts are calculated. The VT-ARMA model gives 47 exceptions of the 95% VaR and 11 exceptions of the 99% VaR, compared with expected numbers of 52 and 10 for a 1043 day sample, while the GARCH model leads to 57 and 12 exceptions; both models pass binomial tests for these exception counts. In a follow-up paper (Bladt and McNeil 2020), we conduct more extensive out-of-sample backtests for models using v-transforms and copula processes and show that they rival and often outperform forecast models from the extended GARCH family.
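One simple form of binomial test for these exception counts is sketched below. The text does not specify which variant was used, so the exact two-sided test coded here (summing the probabilities of all counts that are no more likely than the observed one) is an assumption, as are the function names. The sample size of 1043 days and the exception counts are those quoted above.

```python
import math


def binom_log_pmf(k, n, p):
    """Log of the Binomial(n, p) probability mass at k, computed via log-gamma."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1.0 - p))


def binom_two_sided_pvalue(k, n, p):
    """Sum the probabilities of all counts that are no more likely than k."""
    ref = binom_log_pmf(k, n, p)
    total = 0.0
    for j in range(n + 1):
        log_pj = binom_log_pmf(j, n, p)
        if log_pj <= ref + 1e-9:  # small slack guards against rounding ties
            total += math.exp(log_pj)
    return min(1.0, total)


n_days = 1043
for label, count, level in [("VT-ARMA", 47, 0.95), ("VT-ARMA", 11, 0.99),
                            ("GARCH", 57, 0.95), ("GARCH", 12, 0.99)]:
    p_value = binom_two_sided_pvalue(count, n_days, 1.0 - level)
    print(label, level, count, round(p_value, 3))
```

A p-value above the chosen significance level means that the observed number of exceptions is compatible with the nominal VaR level.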

6. Conclusions

This paper has proposed a new approach to volatile financial time series in which v-transforms are used to describe the relationship between quantiles of the return distribution and quantiles of the distribution of a predictable volatility proxy variable. We have characterized v-transforms mathematically and shown that the stochastic inverse of a v-transform may be used to construct stationary models for return series where arbitrary marginal distributions may be coupled with dynamic copula models for the serial dependence in the volatility proxy.

The construction was illustrated using the serial dependence model implied by a Gaussian ARMA process. The resulting class of VT-ARMA processes is able to capture the important features of financial return series including near-zero serial correlation (white noise behaviour) and volatility clustering. Moreover, the models are relatively straightforward to estimate building on the classical maximum-likelihood estimation of an ARMA model using the Kalman filter. This can be accomplished in the stepwise manner that is typical in copula modelling or through joint modelling of the marginal and copula process. The resulting models yield insights into the way that volatility responds to returns of different magnitude and sign and can give estimates of unconditional and conditional quantiles (VaR) for practical risk measurement purposes.

There are many possible uses for VT-ARMA copula processes. Because we have complete control over the marginal distribution, they are very natural candidates for the innovation distribution in other time series models. For example, they could be applied to the innovations of an ARMA model to obtain ARMA models with VT-ARMA errors; this might be particularly appropriate for longer interval returns, such as weekly or monthly returns, where some serial dependence is likely to be present in the raw return data.

Clearly, we could use other copula processes for the volatility PIT process

(V_{t})

. The VT-ARMA copula process has some limitations: the radial symmetry of the underlying Gaussian copula means that the serial dependence between large values of the volatility proxy must mirror the serial dependence between small values; moreover, this copula does not admit tail dependence in either tail and it seems plausible that very large values of the volatility proxy might have a tendency to occur in succession.

To extend the class of models based on v-transforms, we can look for models for the volatility PIT process

(V_{t})

with higher dimensional marginal distributions given by asymmetric copulas with upper tail dependence. First-order Markov copula models as developed in Chen and Fan (2006) can give asymmetry and tail dependence, but they cannot model the dependencies at longer lags that we find in empirical data. D-vine copula models can model higher-order Markov dependencies and Bladt and McNeil (2020) show that this is a promising alternative specification for the volatility PIT process.

Funding

This research received no external funding.

Data Availability Statement

The analyses were carried out using R 4.0.2 (R Core Team, 2020) and the tscopula package (Alexander J. McNeil and Martin Bladt, 2020) available at https://github.com/ajmcneil/tscopula. The full reproducible code and the data are available at https://github.com/ajmcneil/vtarma.

Acknowledgments

The author is grateful for valuable input from a number of researchers including Hansjoerg Albrecher, Martin Bladt, Valérie Chavez-Demoulin, Alexandra Dias, Christian Genest, Michael Gordy, Yen Hsiao Lok, Johanna Nešlehová, Andrew Patton, and Ruodu Wang. Particular thanks are due to Martin Bladt for providing the Bitcoin data and advice on the data analysis. The paper was completed while the author was a guest at the Forschungsinstitut für Mathematik (FIM) at ETH Zurich.

Conflicts of Interest

The author declares no conflict of interest.

Appendix A. Proofs

Appendix A.1. Proof of Proposition 1

We observe that, for

x \geq 0

,

F_{T (X)} (x) = P (μ_{T} - T_{1}^{- 1} (x) \leq X_{t} \leq μ_{T} + T_{2}^{- 1} (x)) = F_{X} (μ_{T} + T_{2}^{- 1} (x)) - F_{X} (μ_{T} - T_{1}^{- 1} (x)) .

\{X_{t} \leq μ_{T}\} \Leftrightarrow \{U \leq F_{X} (μ_{T})\}

and in this case

\begin{matrix} V = F_{T (X)} (T (X_{t})) = F_{T (X)} (T_{1} (μ_{T} - X_{t})) & = F_{X} (μ_{T} + T_{2}^{- 1} (T_{1} (μ_{T} - X_{t}))) - F_{X} (X_{t}) \\ = F_{X} (μ_{T} + g_{T} (μ_{T} - F_{X}^{- 1} (U))) - U . \end{matrix}

\{X_{t} > μ_{T}\} \Leftrightarrow \{U > F_{X} (μ_{T})\}

and in this case

\begin{matrix} V = F_{T (X)} (T (X_{t})) = F_{T (X)} (T_{2} (X_{t} - μ_{T})) & = F_{X} (X_{t}) - F_{X} (μ_{T} - T_{1}^{- 1} (T_{2} (X_{t} - μ_{T}))) \\ = U - F_{X} (μ_{T} - g_{T}^{- 1} (F_{X}^{- 1} (U) - μ_{T})) . \end{matrix}

Appendix A.2. Proof of Proposition 2

The cumulative distribution function

F_{0} (x)

of the double exponential distribution is equal to

0.5 e^{x}

for

x \leq 0

and

1 - 0.5 e^{- x}

if

x > 0

. It is straightforward to verify that

F_{X} (x; γ) = \{\begin{matrix} δ e^{γ x} & x \leq 0 \\ 1 - (1 - δ) e^{- \frac{x}{γ}} & x > 0 \end{matrix} and F_{X}^{- 1} (u; γ) = \{\begin{matrix} \frac{1}{γ} ln (\frac{u}{δ}) & u \leq δ \\ - γ ln (\frac{1 - u}{1 - δ}) & u > δ . \end{matrix}

When

g_{T} (x) = k x^{ξ}

, we obtain for

u \leq δ

that

\begin{matrix} V_{δ, κ, ξ} (u) = F_{X} (\frac{k}{γ^{ξ}} {(ln (\frac{δ}{u}))}^{ξ}; γ) - u & = 1 - u - (1 - δ) exp (- \frac{k}{γ^{ξ + 1}} {(- ln (\frac{u}{δ}))}^{ξ}) . \end{matrix}

For

u > δ

, we make a similar calculation.

Appendix A.3. Proof of Theorem 1

It is easy to check that Equation (10) fulfills the list of properties in Lemma 2. We concentrate on showing that a function that has these properties must be of the form (10). It helps to consider the picture of a v-transform in Figure 3. Consider the lines

v = 1 - u

and

v = δ - u

for

u \in [0, δ]

. The areas above the former and below the latter are shaded gray.

The left branch of the v-transform must start at

(0, 1)

, end at

(δ, 0)

, and lie strictly between these lines in

(0, δ)

. Suppose, on the contrary, that

v = V (u) \leq δ - u

for

u \in (0, δ)

. This would imply that the dual point

u^{*}

given by

u^{*} = u + v

satisfies

u^{*} \leq δ

which contradicts the requirement that

u^{*}

must be on the opposite side of the fulcrum. Similarly, if

v = V (u) \geq 1 - u

for

u \in (0, δ)

, then

u^{*} \geq 1

and this is also not possible; if

u^{*} = 1

, then

u = 0

, which is a contradiction.

Thus, the curve that links

(0, 1)

and

(δ, 0)

must take the form

V (u) = (δ - u) Ψ (\frac{u}{δ}) + (1 - u) (1 - Ψ (\frac{u}{δ})) = (1 - u) - (1 - δ) Ψ (\frac{u}{δ})

where

Ψ (0) = 0

,

Ψ (1) = 1

and

0 < Ψ (x) < 1

for

x \in (0, 1)

. Clearly,

Ψ

must be continuous to satisfy the conditions of the v-transform. It must also be strictly increasing. If it were not, then the derivative would satisfy

V^{'} (u) \geq - 1

, which is not possible: if at any point

u \in (0, δ)

, we have

V^{'} (u) = - 1

, then the opposite branch of the v-transform would have to jump vertically at the dual point

u^{*}

, contradicting continuity; if

V^{'} (u) > - 1

, then

V

would have to be a decreasing function at

u^{*}

, which is also a contradiction.

Thus,

Ψ

fulfills the conditions of a continuous, strictly increasing distribution function on

[0, 1]

, and we have established the necessary form for the left branch equation. To find the value of the right branch equation at

u > δ

, we invoke the square property. Since

V (u) = V (u^{*}) = V (u - V (u))

, we need to solve the equation

x = V (u - x)

for

x \in [0, 1]

using the formula for the left branch equation of

V

. Thus, we solve

x = 1 - u + x - (1 - δ) Ψ (\frac{u - x}{δ})

for x, and this yields the right branch equation as asserted.
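The argument above is constructive and translates directly into code. The sketch below builds a v-transform from a generator $Ψ$ and a fulcrum $δ$: the left branch uses the closed form $(1 - u) - (1 - δ) Ψ (u / δ)$, and the right branch is found by solving $x = V (u - x)$ by bisection on $[u - δ, u]$, where the left-branch formula applies. The exponential generator $Ψ (x) = exp (- κ {(- ln x)}^{ξ})$ is read off from the left branch derived in the proof of Proposition 2 with $κ = k / γ^{ξ + 1}$. Function names and parameter values are illustrative.

```python
import math


def make_v_transform(psi, delta, tol=1e-12):
    """Return V built from a generator psi (a cdf on [0, 1]) and fulcrum delta."""

    def left(u):
        # Left branch, valid for 0 <= u <= delta.
        return (1.0 - u) - (1.0 - delta) * psi(u / delta)

    def v(u):
        if u <= delta:
            return left(u)
        # Right branch: solve x = left(u - x); the dual point u - x is in [0, delta].
        lo, hi = u - delta, u
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            if left(u - mid) - mid < 0.0:
                lo = mid
            else:
                hi = mid
        return 0.5 * (lo + hi)

    return v


def psi_exponential(x, kappa, xi):
    """Generator exp(-kappa * (-ln x)^xi), with the argument clamped to [0, 1]."""
    if x <= 0.0:
        return 0.0
    return math.exp(-kappa * max(0.0, -math.log(min(x, 1.0))) ** xi)
```

With the identity generator the numerical right branch agrees with the linear form $(u - δ) / (1 - δ)$, and for any generator the square property $V (u) = V (u - V (u))$ holds up to the bisection tolerance.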

Appendix A.4. Proof of Proposition 3

Let

g_{T} (x)

be as given in (11) and let

u (x) = F_{X} (μ_{T} - x)

. For

x \in R^{+}

,

u (x)

is a continuous, strictly decreasing function of x starting at

u (0) = δ

and decreasing to 0. Since

Ψ

is a cumulative distribution function, it follows that

u^{*} (x) = u (x) + V (u (x)) = 1 - (1 - δ) Ψ (\frac{u (x)}{δ})

is a continuous, strictly increasing function starting at

u^{*} (0) = δ

and increasing to 1. Hence,

g_{T} (x) = F_{X}^{- 1} (u^{*} (x)) - μ_{T}

is continuous and strictly increasing on

R^{+}

with

g_{T} (0) = 0

as required of the profile function of a volatility proxy transformation. It remains to check that, if we insert (11) in (4), we recover

V (u)

, which is straightforward.

Appendix A.5. Proof of Theorem 2

For any $0 \leq v \leq 1$ , the event $\{U \leq u, V \leq v\}$ has zero probability for $u < V^{- 1} (v)$ . For $u \geq V^{- 1} (v)$ , we have

$\{U \leq u, V \leq v\} = \{V^{- 1} (v) \leq U \leq min (u, V^{- 1} (v) + v)\}$

and hence $P (U \leq u, V \leq v) = min (u, V^{- 1} (v) + v) - V^{- 1} (v)$ and (12) follows.
We can write $P (U \leq u, V \leq v) = C (u, v)$ , where C is the copula given by (12). It follows from the basic properties of a copula that

$P (U \leq u ∣ V = v) = \frac{d}{d v} C (u, v) = \{\begin{matrix} 0 & u < V^{- 1} (v) \\ - \frac{d}{d v} V^{- 1} (v) & V^{- 1} (v) \leq u < V^{- 1} (v) + v \\ 1 & u \geq V^{- 1} (v) + v \end{matrix}$

This is the distribution function of a two-point distribution on $V^{- 1} (v)$ and $V^{- 1} (v) + v$ , and it must be the case that $Δ (v) = - \frac{d}{d v} V^{- 1} (v)$ . Equation (14) follows by differentiating the inverse.
Finally, $E (Δ (V)) = δ$ is easily verified by making the substitution $x = V^{- 1} (v)$ in the integral $E (Δ (V)) = - \int_{0}^{1} \frac{1}{V^{'} (V^{- 1} (v))} d v$ .

Appendix A.6. Proof of Proposition 4

It is obviously true that

V (V^{- 1} (v, W)) = v

for any W. Hence,

V (U) = V (V^{- 1} (V, W)) = V

. The uniformity of U follows from the fact that

P (V^{- 1} (V, W) = V^{- 1} (v) ∣ V = v) = P (W \leq Δ (v) ∣ V = v) = P (W \leq Δ (v)) = Δ (v) .

Hence, the pair of random variables

(U, V)

has the conditional distribution (13) and is distributed according to the copula C in (12).

Appendix A.7. Proof of Theorem 3

Since the event $\{V_{i} \leq v_{i}\}$ is equal to the event $\{V^{- 1} (v_{i}) \leq U_{i} \leq V^{- 1} (v_{i}) + v_{i}\}$ , we first compute the probability of a box $[a_{1}, b_{1}] \times \dots \times [a_{d}, b_{d}]$ where $a_{i} = V^{- 1} (v_{i}) \leq V^{- 1} (v_{i}) + v_{i} = b_{i}$ . The standard formula for such probabilities implies that the copulas $C_{V}$ and $C_{U}$ are related by

$C_{V} (v_{1}, \dots, v_{d}) = \sum_{j_{1} = 1}^{2} \dots \sum_{j_{d} = 1}^{2} {(- 1)}^{j_{1} + \dots + j_{d}} C_{U} (u_{1 j_{1}}, \dots, u_{d j_{d}});$

see, for example, McNeil et al. (2015), p. 221. Thus, the copula densities are related by

$c_{V} (v_{1}, \dots, v_{d}) = \sum_{j_{1} = 1}^{2} \dots \sum_{j_{d} = 1}^{2} c_{U} (u_{1 j_{1}}, \dots, u_{d j_{d}}) \prod_{i = 1}^{d} \frac{d}{d v_{i}} {(- 1)}^{j_{i}} u_{i j_{i}},$

and the result follows if we use (14) to calculate that

$\frac{d}{d v_{i}} {(- 1)}^{j} u_{i j} = \begin{cases} \frac{d}{d v_{i}} (- V^{- 1} (v_{i})) = Δ (v_{i}) & \text{if } j = 1, \\ \frac{d}{d v_{i}} (v_{i} + V^{- 1} (v_{i})) = 1 - Δ (v_{i}) & \text{if } j = 2 . \end{cases}$

For the point $(u_{1}, \dots, u_{d}) \in {[0, 1]}^{d}$, we consider the set of events $A_{i} (u_{i})$ defined by

$A_{i} (u_{i}) = \begin{cases} \{U_{i} \leq u_{i}\} & \text{if } u_{i} \leq δ, \\ \{U_{i} > u_{i}\} & \text{if } u_{i} > δ . \end{cases}$

The probability $P (A_{1} (u_{1}), \dots, A_{d} (u_{d}))$ is the probability of an orthant defined by the point $(u_{1}, \dots, u_{d})$ and the copula density at this point is given by

$c_{U} (u_{1}, \dots, u_{d}) = {(- 1)}^{\sum_{i = 1}^{d} I_{\{u_{i} > δ\}}} \frac{d^{d}}{d u_{1} \dots d u_{d}} P \left(\bigcap_{i = 1}^{d} A_{i} (u_{i})\right) .$

The event $A_{i} (u_{i})$ can be written

$A_{i} (u_{i}) = \begin{cases} \{V_{i} \geq V (u_{i}), W_{i} \leq Δ (V_{i})\} & \text{if } u_{i} \leq δ, \\ \{V_{i} > V (u_{i}), W_{i} > Δ (V_{i})\} & \text{if } u_{i} > δ, \end{cases}$

and hence we can use Theorem 2 to write

$P \left(\bigcap_{i = 1}^{d} A_{i} (u_{i})\right) = \int_{V (u_{1})}^{1} \dots \int_{V (u_{d})}^{1} c_{V} (v_{1}, \dots, v_{d}) \prod_{i = 1}^{d} Δ {(v_{i})}^{I_{\{u_{i} \leq δ\}}} {(1 - Δ (v_{i}))}^{I_{\{u_{i} > δ\}}} d v_{1} \dots d v_{d} .$

The derivative is given by

$\frac{d^{d}}{d u_{1} \dots d u_{d}} P \left(\bigcap_{i = 1}^{d} A_{i} (u_{i})\right) = {(- 1)}^{d} c_{V} (V (u_{1}), \dots, V (u_{d})) \prod_{i = 1}^{d} p {(u_{i})}^{I_{\{u_{i} \leq δ\}}} {(1 - p (u_{i}))}^{I_{\{u_{i} > δ\}}} V^{'} (u_{i})$

where $p (u_{i}) = Δ (V (u_{i}))$ and hence we obtain

$c_{U} (u_{1}, \dots, u_{d}) = c_{V} (V (u_{1}), \dots, V (u_{d})) \prod_{i = 1}^{d} {(- p (u_{i}))}^{I_{\{u_{i} \leq δ\}}} {(1 - p (u_{i}))}^{I_{\{u_{i} > δ\}}} V^{'} (u_{i}) .$

It remains to verify that each of the terms in the product is identically equal to 1. For $u_{i} \leq δ$, this follows easily from (14) since $- p (u_{i}) = - Δ (V (u_{i})) = 1 / V^{'} (u_{i})$. For $u_{i} > δ$, we need an expression for the derivative of the right branch equation. Since $V (u_{i}) = V (u_{i}^{*})$, where $u_{i}^{*} = u_{i} - V (u_{i})$ is the dual of $u_{i}$, we obtain

$V^{'} (u_{i}) = V^{'} (u_{i} - V (u_{i})) (1 - V^{'} (u_{i})) = V^{'} (u_{i}^{*}) (1 - V^{'} (u_{i})) ⟹ V^{'} (u_{i}) = \frac{V^{'} (u_{i}^{*})}{1 + V^{'} (u_{i}^{*})}$

implying that

$1 - p (u_{i}) = 1 - Δ (V (u_{i})) = 1 - Δ (V (u_{i}^{*})) = 1 + \frac{1}{V^{'} (u_{i}^{*})} = \frac{1 + V^{'} (u_{i}^{*})}{V^{'} (u_{i}^{*})} = \frac{1}{V^{'} (u_{i})} .$
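The final step can be illustrated numerically. The following is a small sketch, assuming the linear v-transform used in Appendix A.9, for which $p (u) = Δ (V (u)) = δ$ for every $u$. The derivative is approximated by a central finite difference, and the helper names and the value $δ = 0.3$ are hypothetical choices made for the illustration.

```python
DELTA = 0.3  # illustrative fulcrum


def v_transform(u, delta=DELTA):
    """Linear v-transform with fulcrum delta."""
    return 1.0 - u / delta if u <= delta else (u - delta) / (1.0 - delta)


def derivative(f, x, h=1e-6):
    """Central finite-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


def product_term(u, delta=DELTA):
    """One factor of the product in the expression for c_U, with p(u) = delta."""
    p = delta
    slope = derivative(v_transform, u)
    return -p * slope if u <= delta else (1.0 - p) * slope


# Evaluate the factor on both branches, away from the kink at the fulcrum.
terms = [product_term(u) for u in (0.05, 0.2, 0.5, 0.9)]

# The dual point u* = u - V(u) of a point on the right branch has the same image under V.
u = 0.8
u_star = u - v_transform(u)
print(terms, v_transform(u), v_transform(u_star))
```

Each factor should equal 1 up to rounding error, so that the copula density of the $U_{i}$ reduces to $c_{V} (V (u_{1}), \dots, V (u_{d}))$.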

Appendix A.8. Proof of Proposition 5

Let $V_{t} = V (U_{t})$ and $Z_{t} = Φ^{- 1} (V_{t})$ as usual. The process $(Z_{t})$ is an ARMA process with acf $ρ (k)$ and hence $(Z_{t_{1}}, \dots, Z_{t_{k}})$ are jointly standard normally distributed with correlation matrix $P (t_{1}, \dots, t_{k})$. This implies that the joint distribution function of $(V_{t_{1}}, \dots, V_{t_{k}})$ is the Gaussian copula with density $c_{P (t_{1}, \dots, t_{k})}^{Ga}$ and hence by Part 2 of Theorem 3 the joint distribution function of $(U_{t_{1}}, \dots, U_{t_{k}})$ is the copula with density $c_{P (t_{1}, \dots, t_{k})}^{Ga} (V (u_{1}), \dots, V (u_{k}))$.
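The chain $Z_{t} \rightarrow V_{t} \rightarrow U_{t}$ described in this proof can be simulated directly. The sketch below is an illustration under stated assumptions: it uses the ARMA(1,1) parameters $α = 0.95$ and $β = - 0.85$ quoted in the caption of Figure 2, the parametrization $Z_{t} = α Z_{t - 1} + ϵ_{t} + β ϵ_{t - 1}$, a linear v-transform with an arbitrary fulcrum $δ = 0.5$, and an innovation standard deviation chosen so that $Z_{t}$ has unit variance. The function and variable names are hypothetical.

```python
import math
import random
from statistics import NormalDist

ALPHA, BETA = 0.95, -0.85  # ARMA(1,1) parameters quoted in the caption of Figure 2
DELTA = 0.5                # illustrative fulcrum of a linear v-transform

# Innovation standard deviation that gives the ARMA(1,1) process unit variance.
SIGMA_EPS = math.sqrt((1 - ALPHA**2) / (1 + 2 * ALPHA * BETA + BETA**2))


def v_transform(u, delta=DELTA):
    """Linear v-transform with fulcrum delta."""
    return 1.0 - u / delta if u <= delta else (u - delta) / (1.0 - delta)


def simulate(n, seed=0, burn_in=500):
    """Simulate (Z_t, V_t, U_t) with V_t = Phi(Z_t) and U_t a stochastic inverse of V_t."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    z_prev, eps_prev = 0.0, 0.0
    zs, vs, us = [], [], []
    for t in range(n + burn_in):
        eps = rng.gauss(0.0, SIGMA_EPS)
        z = ALPHA * z_prev + eps + BETA * eps_prev
        z_prev, eps_prev = z, eps
        if t < burn_in:
            continue  # discard the start-up phase
        v = std_normal.cdf(z)
        left = DELTA * (1.0 - v)  # left-branch preimage of v
        # Delta(v) = DELTA for the linear v-transform; W_t is an independent uniform draw.
        u = left if rng.random() <= DELTA else left + v
        zs.append(z)
        vs.append(v)
        us.append(u)
    return zs, vs, us


zs, vs, us = simulate(2000)
print(round(SIGMA_EPS, 4), min(us), max(us))
```

Applying a quantile function to the simulated $U_{t}$ values would then give a series with the chosen marginal distribution.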

Appendix A.9. Proof of Proposition 6

We split the integral in (18) into four parts. First, observe that, by making the substitutions $v_{1} = V (u_{1}) = 1 - u_{1} / δ$ and $v_{2} = V (u_{2}) = 1 - u_{2} / δ$ on $[0, δ] \times [0, δ]$, we get

$\begin{aligned} \int_{0}^{δ} \int_{0}^{δ} u_{1} u_{2} c_{ρ (k)}^{Ga} (V (u_{1}), V (u_{2})) d u_{1} d u_{2} &= δ^{4} \int_{0}^{1} \int_{0}^{1} (1 - v_{1}) (1 - v_{2}) c_{ρ (k)}^{Ga} (v_{1}, v_{2}) d v_{1} d v_{2} \\ &= δ^{4} E ((1 - V_{t}) (1 - V_{t + k})) \\ &= δ^{4} (1 - E (V_{t}) - E (V_{t + k}) + E (V_{t} V_{t + k})) = δ^{4} E (V_{t} V_{t + k}) \end{aligned}$

where $(V_{t}, V_{t + k})$ has joint distribution given by the Gaussian copula $C_{ρ (k)}^{Ga}$. Similarly, by making the substitutions $v_{1} = V (u_{1}) = 1 - u_{1} / δ$ and $v_{2} = V (u_{2}) = (u_{2} - δ) / (1 - δ)$ on $[0, δ] \times [δ, 1]$, we get

$\begin{aligned} \int_{0}^{δ} \int_{δ}^{1} u_{1} u_{2} c_{ρ (k)}^{Ga} (V (u_{1}), V (u_{2})) d u_{1} d u_{2} &= \int_{0}^{1} \int_{0}^{1} δ^{2} (1 - δ) (1 - v_{1}) (δ + (1 - δ) v_{2}) c_{ρ (k)}^{Ga} (v_{1}, v_{2}) d v_{1} d v_{2} \\ &= δ^{3} (1 - δ) E (1 - V_{t}) + δ^{2} {(1 - δ)}^{2} E ((1 - V_{t}) V_{t + k}) \\ &= \frac{δ^{2} (1 - δ)}{2} - δ^{2} {(1 - δ)}^{2} E (V_{t} V_{t + k}) \end{aligned}$

and the same value is obtained on the quadrant $[δ, 1] \times [0, δ]$. Finally, making the substitutions $v_{1} = V (u_{1}) = (u_{1} - δ) / (1 - δ)$ and $v_{2} = V (u_{2}) = (u_{2} - δ) / (1 - δ)$ on $[δ, 1] \times [δ, 1]$, we get

$\begin{aligned} \int_{δ}^{1} \int_{δ}^{1} u_{1} u_{2} c_{ρ (k)}^{Ga} (V (u_{1}), V (u_{2})) d u_{1} d u_{2} &= \int_{0}^{1} \int_{0}^{1} {(1 - δ)}^{2} (δ + (1 - δ) v_{1}) (δ + (1 - δ) v_{2}) c_{ρ (k)}^{Ga} (v_{1}, v_{2}) d v_{1} d v_{2} \\ &= \int_{0}^{1} \int_{0}^{1} {(1 - δ)}^{2} (δ^{2} + δ (1 - δ) v_{1} + δ (1 - δ) v_{2} + {(1 - δ)}^{2} v_{1} v_{2}) c_{ρ (k)}^{Ga} (v_{1}, v_{2}) d v_{1} d v_{2} \\ &= δ^{2} {(1 - δ)}^{2} + δ {(1 - δ)}^{3} E (V_{t}) + δ {(1 - δ)}^{3} E (V_{t + k}) + {(1 - δ)}^{4} E (V_{t} V_{t + k}) \\ &= δ {(1 - δ)}^{2} + {(1 - δ)}^{4} E (V_{t} V_{t + k}) . \end{aligned}$

Collecting all of these terms together yields

$\int_{0}^{1} \int_{0}^{1} u_{1} u_{2} c_{ρ (k)}^{Ga} (V (u_{1}), V (u_{2})) d u_{1} d u_{2} = δ (1 - δ) + {(2 δ - 1)}^{2} E (V_{t} V_{t + k})$

and, since $ρ_{S} (Z_{t}, Z_{t + k}) = 12 E (V_{t} V_{t + k}) - 3$, it follows that

$\begin{aligned} ρ (U_{t}, U_{t + k}) = 12 E (U_{t} U_{t + k}) - 3 &= 12 \int_{0}^{1} \int_{0}^{1} u_{1} u_{2} c_{ρ (k)}^{Ga} (V (u_{1}), V (u_{2})) d u_{1} d u_{2} - 3 \\ &= 12 δ (1 - δ) + 12 {(2 δ - 1)}^{2} E (V_{t} V_{t + k}) - 3 \\ &= 12 δ (1 - δ) + {(2 δ - 1)}^{2} (ρ_{S} (Z_{t}, Z_{t + k}) + 3) - 3 \\ &= {(2 δ - 1)}^{2} ρ_{S} (Z_{t}, Z_{t + k}) . \end{aligned}$

The value of Spearman’s rho $ρ_{S} (Z_{t}, Z_{t + k})$ for the bivariate Gaussian distribution is well known; see, for example, McNeil et al. (2015).
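The identity $ρ (U_{t}, U_{t + k}) = {(2 δ - 1)}^{2} ρ_{S} (Z_{t}, Z_{t + k})$ can be checked by Monte Carlo. The sketch below is a rough check rather than part of the proof: it assumes the linear v-transform of this appendix, the standard closed form $ρ_{S} = (6 / π) \arcsin (ρ / 2)$ for a bivariate Gaussian pair with correlation $ρ$, and arbitrary illustrative values $δ = 0.3$ and $ρ = 0.8$. The helper `to_u` is a hypothetical name.

```python
import math
import random
from statistics import NormalDist

DELTA, RHO = 0.3, 0.8  # illustrative fulcrum and Gaussian correlation
N = 200_000

rng = random.Random(42)
std_normal = NormalDist()


def to_u(z):
    """Map z to v = Phi(z), then stochastically invert the linear v-transform."""
    v = std_normal.cdf(z)
    left = DELTA * (1.0 - v)
    # Delta(v) = DELTA for the linear v-transform; an independent uniform picks the branch.
    return left if rng.random() <= DELTA else left + v


pairs = []
for _ in range(N):
    z1 = rng.gauss(0.0, 1.0)
    z2 = RHO * z1 + math.sqrt(1.0 - RHO**2) * rng.gauss(0.0, 1.0)
    pairs.append((to_u(z1), to_u(z2)))

# For uniform margins, 12 E((U_1 - 1/2)(U_2 - 1/2)) equals 12 E(U_1 U_2) - 3.
sample_corr = 12.0 * sum((a - 0.5) * (b - 0.5) for a, b in pairs) / N

spearman_z = (6.0 / math.pi) * math.asin(RHO / 2.0)
theory = (2.0 * DELTA - 1.0) ** 2 * spearman_z
print(sample_corr, theory)
```

The factor ${(2 δ - 1)}^{2}$ vanishes at $δ = 0.5$, which is consistent with a symmetric v-transform producing serially uncorrelated $U_{t}$ even when $(Z_{t})$ is strongly autocorrelated.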

Appendix A.10. Proof of Proposition 7

The conditional density satisfies

$f_{U_{t} \mid U_{t - 1}} (u \mid u_{t - 1}) = \frac{c_{U_{t}} (u_{1}, \dots, u_{t - 1}, u)}{c_{U_{t - 1}} (u_{1}, \dots, u_{t - 1})} = \frac{c_{P (1, \dots, t)}^{Ga} (V (u_{1}), \dots, V (u_{t - 1}), V (u))}{c_{P (1, \dots, t - 1)}^{Ga} (V (u_{1}), \dots, V (u_{t - 1}))} .$

The Gaussian copula density is given in general by

$c_{P}^{Ga} (v_{1}, \dots, v_{d}) = \frac{f_{Z} (Φ^{- 1} (v_{1}), \dots, Φ^{- 1} (v_{d}))}{\prod_{i = 1}^{d} ϕ (Φ^{- 1} (v_{i}))}$

where $Z$ is a multivariate Gaussian vector with standard normal margins and correlation matrix $P$. Hence, it follows that we can write

$\begin{aligned} f_{U_{t} \mid U_{t - 1}} (u \mid u_{t - 1}) &= \frac{f_{Z_{t}} (Φ^{- 1} (V (u_{1})), \dots, Φ^{- 1} (V (u_{t - 1})), Φ^{- 1} (V (u)))}{f_{Z_{t - 1}} (Φ^{- 1} (V (u_{1})), \dots, Φ^{- 1} (V (u_{t - 1}))) ϕ (Φ^{- 1} (V (u)))} \\ &= \frac{f_{Z_{t} \mid Z_{t - 1}} (Φ^{- 1} (V (u)) \mid Φ^{- 1} (V (u_{t - 1})))}{ϕ (Φ^{- 1} (V (u)))} \end{aligned}$

where $f_{Z_{t} \mid Z_{t - 1}}$ is the conditional density of the ARMA process, from which (20) follows easily.
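The last ratio is straightforward to evaluate once the conditional law of the ARMA process is known. The sketch below assumes that $Z_{t}$ given the past is Gaussian with mean $μ_{t}$ and standard deviation $σ_{ϵ}$, which is the usual situation for a causal and invertible Gaussian ARMA process, and again uses a linear v-transform. The values $μ_{t} = 0.5$ and $σ_{ϵ} = 0.95$ echo the caption of Figure 4, the fulcrum $δ = 0.45$ is arbitrary, and the function names are hypothetical. A midpoint rule is used to check that the result integrates to one.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def v_transform(u, delta):
    """Linear v-transform with fulcrum delta."""
    return 1.0 - u / delta if u <= delta else (u - delta) / (1.0 - delta)


def conditional_density(u, mu_t, sigma_eps, delta):
    """Density of U_t at u given the past, assuming Z_t | past ~ N(mu_t, sigma_eps^2)."""
    z = STD_NORMAL.inv_cdf(v_transform(u, delta))
    return NormalDist(mu_t, sigma_eps).pdf(z) / STD_NORMAL.pdf(z)


# Midpoint-rule check that the conditional density integrates to one over (0, 1).
n = 100_000
total = sum(conditional_density((i + 0.5) / n, 0.5, 0.95, 0.45) for i in range(n)) / n
print(total)
```

With $μ_{t} = 0$ and $σ_{ϵ} = 1$ the ratio is identically 1, so the conditional density reduces to the uniform density, as it should when the past carries no information.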

References

Aas, Kjersti, Claudia Czado, Arnoldo Frigessi, and Henrik Bakken. 2009. Pair-copula constructions of multiple dependence. Insurance: Mathematics and Economics 44: 182–98.
Andersen, Torben G. 1994. Stochastic autoregressive volatility: A framework for volatility modeling. Mathematical Finance 4: 75–102.
Andersen, Torben G., and Luca Benzoni. 2009. Stochastic Volatility. In Complex Systems in Finance and Econometrics. Edited by Robert A. Meyers. New York: Springer.
Bedford, Tim, and Roger M. Cooke. 2001. Probability density decomposition for conditionally independent random variables modeled by vines. Annals of Mathematics and Artificial Intelligence 32: 245–68.
Bedford, Tim, and Roger M. Cooke. 2002. Vines—A new graphical model for dependent random variables. Annals of Statistics 30: 1031–68.
Bladt, Martin, and Alexander J. McNeil. 2020. Time series copula models using d-vines and v-transforms: An alternative to GARCH modelling. arXiv arXiv:2006.11088.
Bollerslev, Tim. 1986. Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics 31: 307–27.
Bollerslev, Tim, Robert F. Engle, and Daniel B. Nelson. 1994. ARCH models. In Handbook of Econometrics. Edited by Robert F. Engle and Daniel L. McFadden. Amsterdam: North-Holland, vol. 4, pp. 2959–3038.
Campbell, John Y., Andrew W. Lo, and A. Craig MacKinlay. 1997. The Econometrics of Financial Markets. Princeton: Princeton University Press.
Chen, Xiaohong, and Yanqin Fan. 2006. Estimation of copula-based semiparametric time series models. Journal of Econometrics 130: 307–35.
Chen, Xiaohong, Wei Biao Wu, and Yanping Yi. 2009. Efficient estimation of copula-based semiparametric Markov models. Annals of Statistics 37: 4214–53.
Cont, Rama. 2001. Empirical properties of asset returns: Stylized facts and statistical issues. Quantitative Finance 1: 223–36.
Creal, Drew, Siem Jan Koopman, and André Lucas. 2013. Generalized autoregressive score models with applications. Journal of Applied Econometrics 28: 777–95.
Ding, Zhuanxin, Clive W. Granger, and Robert F. Engle. 1993. A long memory property of stock market returns and a new model. Journal of Empirical Finance 1: 83–106.
Domma, Filippo, Sabrina Giordano, and Pier Francesco Perri. 2009. Statistical modeling of temporal dependence in financial data via a copula function. Communications in Statistics: Simulation and Computation 38: 703–28.
Engle, Robert F. 1982. Autoregressive conditional heteroskedasticity with estimates of the variance of United Kingdom inflation. Econometrica 50: 987–1008.
Fan, Yanqin, and Andrew J. Patton. 2014. Copulas in econometrics. Annual Review of Economics 6: 179–200.
Fernández, Carmen, and Mark F. J. Steel. 1998. On Bayesian modeling of fat tails and skewness. Journal of the American Statistical Association 93: 359–71.
Genest, Christian, Kilani Ghoudi, and Louis-Paul Rivest. 1995. A semi-parametric estimation procedure of dependence parameters in multivariate families of distributions. Biometrika 82: 543–52.
Glosten, Lawrence R., Ravi Jagannathan, and David E. Runkle. 1993. On the relation between the expected value and the volatility of the nominal excess return on stocks. The Journal of Finance 48: 1779–801.
Joe, Harry. 1996. Families of m-variate distributions with given margins and m(m − 1)/2 bivariate dependence parameters. In Distributions with Fixed Marginals and Related Topics. Edited by Ludger Rüschendorf, Berthold Schweizer and Michael D. Taylor. Lecture Notes–Monograph Series; Hayward: Institute of Mathematical Statistics, vol. 28, pp. 120–41.
Joe, Harry. 2015. Dependence Modeling with Copulas. Boca Raton: CRC Press.
Loaiza-Maya, Rubén, Michael S. Smith, and Worapree Maneesoonthorn. 2018. Time series copulas for heteroskedastic data. Journal of Applied Econometrics 33: 332–54.
McNeil, Alexander J., Rüdiger Frey, and Paul Embrechts. 2015. Quantitative Risk Management: Concepts, Techniques and Tools, 2nd ed. Princeton: Princeton University Press.
Mikosch, Thomas, and Catalin Stărică. 2000. Limit theory for the sample autocorrelations and extremes of a GARCH(1,1) process. The Annals of Statistics 28: 1427–51.
Patton, Andrew J. 2012. A review of copula models for economic time series. Journal of Multivariate Analysis 110: 4–18.
Rémillard, Bruno. 2013. Statistical Methods for Financial Engineering. London: Chapman & Hall.
Shephard, Neil. 1996. Statistical aspects of ARCH and stochastic volatility. In Time Series Models in Econometrics, Finance and Other Fields. Edited by David R. Cox, David V. Hinkley and Ole E. Barndorff-Nielsen. London: Chapman & Hall, pp. 1–55.
Smith, Michael S., Aleksey Min, Carlos Almeida, and Claudia Czado. 2010. Modeling Longitudinal Data Using a Pair-Copula Decomposition of Serial Dependence. Journal of the American Statistical Association 105: 1467–79.
Taylor, Stephen J. 1994. Modeling stochastic volatility: A review and comparative study. Mathematical Finance 4: 183–204.
Terasaka, Takahiro, and Yuzo Hosoya. 2007. A modified Box-Cox transformation in the multivariate ARMA model. Journal of the Japan Statistical Society 37: 1–28.

Figure 1. Scatterplot of $v_{t}$ against $u_{t}$ (left), sample acf of raw data $x_{t}$ (centre) and sample acf of $z_{t} = Φ^{- 1} (v_{t})$ (right). The transformed data are defined by $v_{t} = F_{n}^{(| X |)} (| x_{t} |)$ and $u_{t} = F_{n}^{(X)} (x_{t})$ where $F_{n}^{(X)}$ and $F_{n}^{(| X |)}$ denote versions of the empirical distribution function of the $x_{t}$ and $| x_{t} |$ values, respectively. The sample size is $n = 1043$ and the data are daily log-returns of the Bitcoin price for the years 2016–2019.
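The transformations in the caption of Figure 1 can be sketched in a few lines. The example below is only an illustration: it assumes the scaled-rank version $\text{rank} / (n + 1)$ of the empirical distribution function (the caption does not say which version is used), assumes there are no ties, and uses a made-up toy sample rather than the Bitcoin data. The function name is hypothetical.

```python
from statistics import NormalDist


def empirical_cdf_values(data):
    """Scaled ranks rank/(n + 1), one common version of the empirical distribution function."""
    n = len(data)
    order = sorted(range(n), key=lambda i: data[i])
    values = [0.0] * n
    for rank, i in enumerate(order, start=1):
        values[i] = rank / (n + 1)
    return values


x = [0.5, -2.0, 1.0, -0.1]                       # made-up returns, not the Bitcoin data
u = empirical_cdf_values(x)                      # u_t = F_n^(X)(x_t)
v = empirical_cdf_values([abs(xt) for xt in x])  # v_t = F_n^(|X|)(|x_t|)
z = [NormalDist().inv_cdf(vt) for vt in v]       # z_t = Phi^{-1}(v_t)
print(u, v, z)
```

Dividing by $n + 1$ rather than $n$ keeps every value strictly inside $(0, 1)$, so that $Φ^{- 1}$ stays finite.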

Figure 2. Realizations of length $n = 500$ of $(X_{t})$ and $(Z_{t})$ for a VT-ARMA(1,1) process with a marginal Student t distribution with $ν = 3$ degrees of freedom and ARMA parameters $α = 0.95$ and $β = - 0.85$. ACF plots for $(X_{t})$ and $(| X_{t} |)$ are also shown.

Figure 3. An asymmetric v-transform from the family defined in (7). For any v-transform, if $v = V (u)$ and $u^{*}$ is the dual of $u$, then the points $(u, 0)$, $(u, v)$, $(u^{*}, 0)$ and $(u^{*}, v)$ form the vertices of a square. For the given fulcrum $δ$, a v-transform can never enter the gray shaded area of the plot.

Figure 4. Top left: realization of length $n = 500$ of $(X_{t})$ for a process with a marginal skewed Student distribution (parameters: $ν = 3$, $γ = 0.8$, $μ = 0.3$, $σ = 1$), a v-transform of the form (7) (parameters: $δ = 0.50$, $κ = 0.9$, $ξ = 1.1$) and an underlying ARMA process ($α = 0.95$, $β = - 0.85$, $σ_{ϵ} = 0.95$). Top right: the underlying ARMA process $(Z_{t})$ in gray with the conditional mean $(μ_{t})$ superimposed in black; horizontal lines at $μ_{t} = 0.5$ (a high value) and $μ_{t} = - 0.5$ (a low value). The corresponding conditional densities are shown in the bottom figures with the marginal density as a dashed line.

Figure 5. Plots for a VT(2)-ARMA(1,1) model fitted to the Bitcoin return data: QQplot of the residuals against normal (upper left); acf of the residuals (upper right); acf of the absolute residuals (lower left); estimated conditional mean process $(μ_{t})$ (lower right).

Figure 6. Plots for a VT(2)-ARMA(1,1) model combined with a double Weibull marginal distribution fitted to the Bitcoin return data: QQplot of the data against fitted double Weibull model (upper left); estimated volatility proxy profile function $g_{T}$ (upper right); estimated v-transform (lower left); implied relationship between data and volatility proxy variable (lower right).

Figure 7. Plot of estimated 95% value-at-risk (VaR) for Bitcoin return data superimposed on log returns. Solid line shows VaR estimated using the VT(2)-ARMA(1,1) model combined with a double-Weibull marginal distribution; the dashed line shows VaR estimated using a GARCH(1,1) model with GED innovation distribution.

Table 1. Analysis of daily Bitcoin return data 2016–2019. Parameter estimates, standard errors (below estimates) and information about the fit: SW denotes the Shapiro–Wilk p-value; L is the maximized value of the log-likelihood and AIC is the Akaike information criterion.

Model	$α_{1}$	$β_{1}$	$δ$	$κ$	$ξ$	SW	L	AIC
VT(1)-ARMA(1,0)	0.283		0.460			0.515	37.59	−71.17
	0.026		0.001
VT(1)-ARMA(1,1)	0.962	−0.840	0.416			0.197	92.91	−179.81
	0.012	0.028	0.004
VT(2)-ARMA(1,1)	0.965	−0.847	0.463	0.920		0.385	94.73	−181.45
	0.011	0.026	0.001	0.131
VT(3)-ARMA(1,1)	0.962	−0.839	0.463	0.881	0.995	0.407	94.82	−179.64
	0.012	0.028	0.001	0.123	0.154

Table 2. VT(2)-ARMA(1,1) model with three different margins: Student-t, Laplace, double Weibull. Parameter estimates, standard errors (alongside estimates) and information about the fit: SW denotes the Shapiro–Wilk p-value; L is the maximized value of the log-likelihood and AIC is the Akaike information criterion.

	Student		Laplace		dWeibull
$α_{1}$	0.954	0.012	0.953	0.012	0.965	0.021
$β_{1}$	−0.842	0.026	−0.847	0.025	−0.847	0.035
$δ$	0.478	0.001	0.480	0.002	0.463	0.000
$κ$	0.790	0.118	0.811	0.129	0.939	0.138
$η$	1.941	0.005			0.844	0.022
$μ$	0.319	0.002	0.315	0.002	0.192	0.001
$σ$	2.427	0.003	3.194	0.004	2.803	0.214
SW	0.585		0.551		0.376
L	−2801.696		−2791.999		−2779.950
AIC	5617.392		5595.999		5573.899

Table 3. Comparison of three VT(2)-ARMA(1,1) models with different marginal distributions with two GARCH(1,1) models with different innovation distributions.

	Parameters	AIC
VT-ARMA (Student)	7	5617.39
VT-ARMA (Laplace)	6	5596.00
VT-ARMA (dWeibull)	7	5573.90
GARCH (Student)	5	5629.02
GARCH (GED)	5	5611.53
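The AIC values reported for the VT-ARMA models can be reproduced from the maximized log-likelihoods in Table 2 and the parameter counts in Table 3. The short sketch below assumes the standard definition $AIC = 2 k - 2 L$, where $k$ is the number of estimated parameters; the dictionary layout and names are illustrative only.

```python
# Maximized log-likelihoods L and reported AIC from Table 2; parameter counts from Table 3.
fits = {
    "Student": {"loglik": -2801.696, "n_params": 7, "aic_reported": 5617.392},
    "Laplace": {"loglik": -2791.999, "n_params": 6, "aic_reported": 5595.999},
    "dWeibull": {"loglik": -2779.950, "n_params": 7, "aic_reported": 5573.899},
}


def aic(loglik, n_params):
    """Akaike information criterion: 2k - 2L."""
    return 2.0 * n_params - 2.0 * loglik


computed = {name: aic(f["loglik"], f["n_params"]) for name, f in fits.items()}
best = min(computed, key=computed.get)  # a smaller AIC indicates the preferred model

for name, value in computed.items():
    print(f"{name}: computed {value:.3f}, reported {fits[name]['aic_reported']:.3f}")
print("preferred margin:", best)
```

The computed values agree with the reported ones up to rounding of the log-likelihoods.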

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
