Article

Non-Asymptotic Confidence Sets for Circular Means †

Institut für Mathematik, Technische Universität Ilmenau, 98684 Ilmenau, Germany
* Author to whom correspondence should be addressed.
† This paper is an extended version of our paper published in Proceedings of the 2nd International Conference on Geometric Science of Information, Palaiseau, France, 28–30 October 2015; Nielsen, F., Barbaresco, F., Eds.; Lecture Notes in Computer Science, Volume 9389; Springer International Publishing: Cham, Switzerland, 2015; pp. 635–642.
These authors contributed equally to this work.
Submission received: 15 July 2016 / Revised: 10 October 2016 / Accepted: 13 October 2016 / Published: 20 October 2016
(This article belongs to the Special Issue Differential Geometrical Theory of Statistics)

Abstract

The mean of data on the unit circle is defined as the minimizer of the average squared Euclidean distance to the data. Based on Hoeffding’s mass concentration inequalities, non-asymptotic confidence sets for circular means are constructed which are universal in the sense that they require no distributional assumptions. These are then compared with asymptotic confidence sets in simulations and for a real data set.

1. Introduction

In applications, data assuming values on the circle, i.e., circular data, arise frequently, examples being measurements of wind directions, or time of the day that patients are admitted to a hospital unit. We refer to the literature, e.g., [1,2,3,4,5], for an overview of statistical methods for circular data, in particular the ones described in this section.
Here, we will concern ourselves with the arguably simplest statistic, the mean. However, given that a circle does not carry a vector space structure, i.e., there is neither a natural addition of points on the circle nor can one divide them by a natural number, what should the meaning of “mean” be?
In order to simplify the exposition, we specifically consider the unit circle in the complex plane, $S^1 = \{z \in \mathbb{C} : |z| = 1\}$, and we assume the data can be modelled as independent random variables $Z_1, \dots, Z_n$ which are identically distributed as the random variable $Z$ taking values in $S^1$. In the literature, however, the circle is often taken to lie in the real plane $\mathbb{R}^2$, i.e., while we denote the point on the circle corresponding to an angle $\theta \in (-\pi, \pi]$ by $\exp(i\theta) = \cos(\theta) + i\sin(\theta) \in \mathbb{C}$, one may take it to be $(\cos\theta, \sin\theta) \in \mathbb{R}^2$.
Of course, $\mathbb{C}$ is a real vector space, so the Euclidean sample mean $\bar Z_n = \frac{1}{n}\sum_{k=1}^n Z_k \in \mathbb{C}$ is well-defined. However, unless all $Z_k$ take identical values, it will (by the strict convexity of the closed unit disc) lie inside the circle, i.e., its modulus $|\bar Z_n|$ will be less than 1. Though $\bar Z_n$ cannot be taken as a mean on the circle, if $\bar Z_n \neq 0$, one might say that it specifies a direction; this leads to the idea of calling $\bar Z_n / |\bar Z_n|$ the circular sample mean of the data.
Observing that the Euclidean sample mean is the minimiser of the sum of squared distances, this can be put in the more general framework of Fréchet means [6]: define the set of circular sample means to be
$$\hat\mu_n = \operatorname*{argmin}_{\zeta \in S^1} \sum_{k=1}^n |Z_k - \zeta|^2, \tag{1}$$
and analogously define the set of circular population means of the random variable $Z$ to be
$$\mu = \operatorname*{argmin}_{\zeta \in S^1} \mathrm{E}\,|Z - \zeta|^2. \tag{2}$$
Then, as usual, the circular sample means are the circular population means with respect to the empirical distribution of $Z_1, \dots, Z_n$.
The circular population mean can be related to the Euclidean population mean $\mathrm{E}Z$ by noting that $\mathrm{E}|Z - \zeta|^2 = \mathrm{E}|Z - \mathrm{E}Z|^2 + |\mathrm{E}Z - \zeta|^2$ (in statistics, this is called the bias-variance decomposition), so that
$$\mu = \operatorname*{argmin}_{\zeta \in S^1} |\mathrm{E}Z - \zeta|^2 \tag{3}$$
is the set of points on the circle closest to $\mathrm{E}Z$. It follows that $\mu$ is unique if and only if $\mathrm{E}Z \neq 0$, in which case it is given by $\mu = \mathrm{E}Z / |\mathrm{E}Z|$, the orthogonal projection of $\mathrm{E}Z$ onto the circle; otherwise, i.e., if $\mathrm{E}Z = 0$, the set of circular population means is all of $S^1$. We consider the information of whether the circular population mean is not unique, e.g., but not exclusively because $Z$ is uniformly distributed over the circle, to be relevant; it thus should be inferred from the data as well. Analogously, $\hat\mu_n$ is either all of $S^1$ or uniquely given by $\bar Z_n / |\bar Z_n|$ according to whether $\bar Z_n$ is 0 or not. Note that $\bar Z_n \neq 0$ a.s. if $Z$ is continuously distributed on the circle, even if $\mathrm{E}Z = 0$. $\bar Z_n$ is what is known as the vector resultant, while $\bar Z_n / |\bar Z_n|$ is sometimes referred to as the mean direction.
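The definitions above can be illustrated by a minimal sketch in Python (the authors report using R; the function name `circular_sample_mean` and the numerical tolerance are assumptions made for this illustration only). It returns the vector resultant together with the mean direction, or `None` for the latter when the resultant is numerically zero, in which case every point of the circle is a circular sample mean.

```python
import cmath
import math

def circular_sample_mean(angles_deg, tol=1e-12):
    """Return (z_bar, mean_direction) for angles given in degrees.

    z_bar is the Euclidean sample mean of the points exp(i*theta_k);
    mean_direction is z_bar / |z_bar|, or None when z_bar is numerically
    zero (the tolerance `tol` is an assumption of this sketch).
    """
    z = [cmath.exp(1j * math.radians(a)) for a in angles_deg]
    z_bar = sum(z) / len(z)
    if abs(z_bar) < tol:
        return z_bar, None
    return z_bar, z_bar / abs(z_bar)

# Two points at +10 and -10 degrees: the resultant lies inside the circle,
# and its projection onto the circle is the point 1 (angle 0).
z_bar, mu_hat = circular_sample_mean([10.0, -10.0])
```

For three equally spaced angles (0, 120, 240 degrees) the resultant vanishes up to rounding and the function reports a non-unique mean.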
The expected squared distances minimised in Equation (2) are given by the metric inherited from the ambient space $\mathbb{C}$; therefore, $\mu$ is also called the set of extrinsic population means. If we measured distances intrinsically along the circle, i.e., using arc-length instead of chordal distance, we would obtain what is called the set of intrinsic population means. We will not consider the latter in the following, see e.g., [7] for a comparison and [8,9] for generalizations of these concepts.
Our aim is to construct confidence sets for the circular population mean $\mu$ that form a superset of $\mu$ with a certain (so-called) coverage probability that is required to be not less than some pre-specified confidence level $1 - \alpha$ for $\alpha \in (0, 1)$.
The classical approach is to construct an asymptotic confidence interval where the coverage probability converges to $1 - \alpha$ when $n$ tends to infinity. This can be done as follows: since $Z$ is a bounded random variable, $\sqrt{n}\,(\bar Z_n - \mathrm{E}Z)$ converges to a bivariate normal distribution when identifying $\mathbb{C}$ with $\mathbb{R}^2$. Now, assume $\mathrm{E}Z \neq 0$ so $\mu$ is unique. Then, the orthogonal projection is differentiable in a neighbourhood of $\mathrm{E}Z$, so the δ-method (see e.g., [1] (p. 111) or [4] (Lemma 3.1)) can be applied and one easily obtains
$$\sqrt{n}\,\operatorname{Arg}(\mu^{-1}\hat\mu_n) \xrightarrow{\;D\;} N\!\left(0, \frac{\mathrm{E}\,(\operatorname{Im}(\mu^{-1}Z))^2}{|\mathrm{E}Z|^2}\right), \tag{4}$$
where $\operatorname{Arg} : \mathbb{C} \setminus \{0\} \to (-\pi, \pi] \subset \mathbb{R}$ denotes the argument of a complex number (it is defined arbitrarily at $0 \in \mathbb{C}$), while multiplying with $\mu^{-1}$ rotates such that $\mu = \mathrm{E}Z / |\mathrm{E}Z|$ is mapped to the angle $0 \in (-\pi, \pi]$, see e.g., [4] (Proposition 3.1) or [7] (Theorem 5). Estimating the asymptotic variance and applying Slutsky’s lemma, one arrives at the asymptotic confidence set $C_A = \{\zeta \in S^1 : |\operatorname{Arg}(\zeta^{-1}\hat\mu_n)| < \delta_A\}$ provided $\hat\mu_n$ is unique, where the angle determining the interval is given by
$$\delta_A = \frac{q_{1-\frac{\alpha}{2}}}{n\,|\bar Z_n|} \sqrt{\sum_{k=1}^n \big(\operatorname{Im}(\hat\mu_n^{-1} Z_k)\big)^2}, \tag{5}$$
with $q_{1-\frac{\alpha}{2}}$ denoting the $(1 - \frac{\alpha}{2})$-quantile of the standard normal distribution $N(0, 1)$.
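As an illustration of Equation (5), the following Python sketch computes the opening angle $\delta_A$ for a sample given as complex numbers of modulus 1. The function name `asymptotic_half_width` is hypothetical; the sketch assumes $\bar Z_n \neq 0$ and uses the standard library's `statistics.NormalDist` for the normal quantile.

```python
import cmath
import math
from statistics import NormalDist

def asymptotic_half_width(z, alpha):
    """Opening angle delta_A of Equation (5), in radians.

    z: list of complex numbers on the unit circle; assumes the Euclidean
    sample mean is non-zero, so that the circular sample mean is unique.
    """
    n = len(z)
    z_bar = sum(z) / n
    r = abs(z_bar)
    mu_hat = z_bar / r                      # circular sample mean
    # mu_hat has modulus 1, so dividing by it rotates mu_hat onto 1
    perp = sum((zk / mu_hat).imag ** 2 for zk in z)
    q = NormalDist().inv_cdf(1 - alpha / 2)
    return q * math.sqrt(perp) / (n * r)

# 200 points each at +10 and -10 degrees
z = [cmath.exp(1j * math.radians(10))] * 200 + [cmath.exp(-1j * math.radians(10))] * 200
delta_a = asymptotic_half_width(z, 0.05)
```

For this balanced two-point sample the formula reduces to $q_{0.975}\tan(10°)/\sqrt{400}$, slightly below one degree.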
There are two major drawbacks to the use of asymptotic confidence intervals: firstly, by definition, they do not guarantee a coverage probability of at least $1 - \alpha$ for finite $n$, so the coverage probability for a fixed distribution and sample size may be much smaller. Indeed, Simulation 2 in Section 4 demonstrates that, even for $n = 100$, the coverage probability may be as low as 64% when constructing the asymptotic confidence set for $1 - \alpha = 90\%$. Secondly, they assume that $\mathrm{E}Z \neq 0$, so they are not applicable to all distributions on the circle. Since in practice it is unknown whether this assumption holds, one would have to test the hypothesis $\mathrm{E}Z = 0$, possibly again by an asymptotic test, and construct the confidence set conditioned on this hypothesis having been rejected, setting $C_A = S^1$ otherwise. However, this sequential procedure would require some adaptation taking the pre-test into account (cf. e.g., [10]; we come back to this point in Section 5), and it is not commonly implemented in practice.
We therefore aim to construct non-asymptotic confidence sets for $\mu$, guaranteeing coverage with at least the desired probability for any sample size $n$, which in addition are universal in the sense that they do not make any distributional assumptions about the circular data besides them being independent and identically distributed. It has been shown in [7] that this is possible; however, the confidence sets that were constructed there were far too large to be of use in practice. Nonetheless, we start by varying that construction in Section 2 but using Hoeffding’s inequality instead of Chebyshev’s as in [7]. Considerable improvements are possible if one takes the variance $\mathrm{E}\,(\operatorname{Im}(\mu^{-1}Z))^2$ “perpendicular to $\mathrm{E}Z$” into account; this is achieved by a second construction in Section 3. Of course, the latter confidence sets will still be conservative but Proposition 2(iv) shows that they are (for $1 - \alpha = 95\%$) only a factor $\frac{3}{2}$ longer than the asymptotic ones when the sample size $n$ is large. We further illustrate and compare those confidence sets in simulations and for an application to real data in Section 4, discussing the results obtained in Section 5.

2. Construction Using Hoeffding’s Inequality

We will construct a confidence set as the acceptance region of a series of tests. This idea has been used before for the construction of confidence sets for the circular population mean [7] (Section 6); however, we will modify that construction by replacing Chebyshev’s inequality (which is too conservative here) by three applications of Hoeffding’s inequality [11] (Theorem 1): if $U_1, \dots, U_n$ are independent random variables taking values in the bounded interval $[a, b]$ with $-\infty < a < b < \infty$, then $\bar U_n = \frac{1}{n}\sum_{k=1}^n U_k$ with $\mathrm{E}\,\bar U_n = \nu$ fulfills
$$P\big(\bar U_n - \nu \ge t\big) \le \left[\left(\frac{\nu - a}{\nu - a + t}\right)^{\nu - a + t} \left(\frac{b - \nu}{b - \nu - t}\right)^{b - \nu - t}\right]^{\frac{n}{b - a}} \tag{6}$$
for any $t \in (0, b - \nu)$. The bound on the right-hand side, denoted $\beta(t)$, is continuous and strictly decreasing in $t$ (as expected; see Appendix A) with $\beta(0) = 1$ and $\lim_{t \to b - \nu} \beta(t) = \left(\frac{\nu - a}{b - a}\right)^n$, whence a unique solution $t = t(\gamma, \nu, a, b)$ to the equation $\beta(t) = \gamma$ exists for any $\gamma \in \left(\left(\frac{\nu - a}{b - a}\right)^n, 1\right)$. Equivalently, $t(\gamma, \nu, a, b)$ is strictly decreasing in $\gamma$. Furthermore, $\nu + t(\gamma, \nu, a, b)$ is strictly increasing in $\nu$ (see Appendix A again), which is also to be expected. While there is no closed form expression for $t(\gamma, \nu, a, b)$, it can without difficulty be determined numerically.
Note that the estimate
$$\beta(t) \le \exp\!\big(-2 n t^2 / (b - a)^2\big) \tag{7}$$
is often used and called Hoeffding’s inequality [11]. While this would allow us to solve explicitly for $t$, we prefer to work with $\beta$ as it is sharper, especially for $\nu$ close to $b$ as well as for large $t$. Nonetheless, it shows that the tail bound $\beta(t)$ tends to zero as fast as if using the central limit theorem, which is why it is widely applied for bounded variables, see e.g., [12].
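Since $t(\gamma, \nu, a, b)$ has no closed form, a simple way to determine it numerically is bisection, using that $\beta$ is strictly decreasing. The following Python sketch is one possible way to do this (bisection and the function names `hoeffding_bound` and `solve_t` are choices made for this illustration, not taken from the article); it assumes $a < \nu < b$. It also compares the result with the explicit value obtained from Equation (7).

```python
import math

def hoeffding_bound(t, nu, a, b, n):
    """Right-hand side beta(t) of Equation (6); assumes a < nu < b and 0 <= t <= b - nu."""
    p, q = nu - a, b - nu
    first = (p + t) * math.log(p / (p + t))
    # x * ln(q / x) tends to 0 as x -> 0+, which covers the endpoint t = b - nu
    second = (q - t) * math.log(q / (q - t)) if q - t > 0 else 0.0
    return math.exp(n / (b - a) * (first + second))

def solve_t(gamma, nu, a, b, n, iters=100):
    """Bisection for t(gamma, nu, a, b), the solution of beta(t) = gamma."""
    if not ((nu - a) / (b - a)) ** n < gamma < 1:
        raise ValueError("no solution: gamma must lie in (((nu-a)/(b-a))**n, 1)")
    lo, hi = 0.0, b - nu
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if hoeffding_bound(mid, nu, a, b, n) > gamma:
            lo = mid          # bound still too large: move right
        else:
            hi = mid
    return 0.5 * (lo + hi)

n, alpha = 100, 0.05
s0 = solve_t(alpha / 4, 0.0, -1.0, 1.0, n)
# Explicit solution of exp(-2 n t^2 / (b - a)^2) = alpha / 4 with b - a = 2
s0_explicit = math.sqrt(2 * math.log(4 / alpha) / n)
```

Because $\beta$ is sharper than the estimate in Equation (7), the numerically determined value never exceeds the explicit one.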
Now, for any $\zeta \in S^1$, we will test the hypothesis that $\zeta$ is a circular population mean. This hypothesis is equivalent to saying that there is some $\lambda \in [0, 1]$ such that $\mathrm{E}Z = \lambda\zeta$. Multiplication by $\zeta^{-1}$ then rotates $\mathrm{E}Z$ onto the non-negative real axis: $\mathrm{E}\,\zeta^{-1}Z = \lambda \ge 0$.
Now, fix $\zeta$ and consider $X_k = \operatorname{Re}(\zeta^{-1}Z_k)$, $Y_k = \operatorname{Im}(\zeta^{-1}Z_k)$ for $k = 1, \dots, n$, which may be viewed as the projections of $Z_1, \dots, Z_n$ onto the line in the direction of $\zeta$ and onto the line perpendicular to it. Both are sequences of independent random variables taking values in $[-1, 1]$ with $\mathrm{E}X_k = \lambda$ and $\mathrm{E}Y_k = 0$ under the hypothesis. They thus fulfill the conditions for Hoeffding’s inequality with $a = -1$, $b = 1$ and $\nu = \lambda$ or 0, respectively.
We will first consider the case of non-uniqueness of the circular mean, i.e., $\mu = S^1$, or equivalently $\lambda = 0$. Then, the critical value $s_0 = t(\frac{\alpha}{4}, 0, -1, 1)$ is well-defined for any $\frac{\alpha}{4} > 2^{-n}$, and we get $P(\bar X_n \ge s_0) \le \frac{\alpha}{4}$, and also, by considering $-X_1, \dots, -X_n$, that $P(-\bar X_n \ge s_0) \le \frac{\alpha}{4}$. Analogously, $P(|\bar Y_n| \ge s_0) \le 2 \cdot \frac{\alpha}{4} = \frac{\alpha}{2}$. We conclude that
$$P\big(|\bar Z_n| \ge \sqrt{2}\, s_0\big) = P\big(|\bar X_n|^2 + |\bar Y_n|^2 \ge 2 s_0^2\big) \le P\big(|\bar X_n|^2 \ge s_0^2\big) + P\big(|\bar Y_n|^2 \ge s_0^2\big) \le \alpha.$$
Rejecting the hypothesis $\mu = S^1$, i.e., $\mathrm{E}Z = 0$, if $|\bar Z_n| \ge \sqrt{2}\, s_0$ thus leads to a test whose probability of false rejection is at most $\alpha$ (see Figure 1). Of course, one may work with $|\bar X_n|^2 \ge s_0^2$ and $|\bar Y_n|^2 \ge s_0^2$ as criteria for rejection; however, we prefer working with $|\bar Z_n| \ge \sqrt{2}\, s_0$ since it is independent of the chosen $\zeta$.
In the case of uniqueness of the circular mean, i.e., for the hypothesis $\lambda > 0$, we use the monotonicity of $\nu + t(\gamma, \nu, a, b)$ in $\nu$ and obtain
$$P\big(\bar X_n \le -s_0\big) = P\big(-\bar X_n \ge t(\tfrac{\alpha}{4}, 0, -1, 1)\big) \le P\big(-\bar X_n \ge -\lambda + t(\tfrac{\alpha}{4}, -\lambda, -1, 1)\big) \le \frac{\alpha}{4}$$
as well. For the direction perpendicular to the direction of $\zeta$ (see Figure 2), however, we may now work with $\frac{3}{8}\alpha$, so for $s_p = t(\frac{3}{8}\alpha, 0, -1, 1)$, which is well-defined whenever $s_0$ is since $\frac{3}{8}\alpha > \frac{\alpha}{4} > 2^{-n}$, we obtain
$$P\big(\bar Y_n \ge s_p\big) + P\big(-\bar Y_n \ge s_p\big) \le 2 \cdot \frac{3}{8}\alpha.$$
Rejecting if $\bar X_n \le -s_0$ or $|\bar Y_n| \ge s_p$, then, will happen with probability at most $\frac{\alpha}{4} + 2 \cdot \frac{3}{8}\alpha = \alpha$ under the hypothesis $\mu = \zeta$. In case that we already rejected the hypothesis $\mu = S^1$, i.e., if $|\bar Z_n| \ge \sqrt{2}\, s_0$, $\zeta$ will not be rejected if and only if $\bar X_n > s_0 > 0$ and $|\bar Y_n| < s_p < s_0$, which is then equivalent to $|\operatorname{Arg}(\zeta^{-1}\bar Z_n)| = \arcsin(|\bar Y_n| / |\bar Z_n|) < \arcsin(s_p / |\bar Z_n|) = \delta_H$ (see Figure 3).
Define $C_H$ as all $\zeta$ which we could not reject, i.e.,
$$C_H = \begin{cases} S^1, & \text{if } \alpha \le 2^{-n+2} \text{ or } |\bar Z_n| \le \sqrt{2}\, s_0, \\ \{\zeta \in S^1 : |\operatorname{Arg}(\zeta^{-1}\hat\mu_n)| < \delta_H\} & \text{otherwise}. \end{cases} \tag{8}$$
Then, we obtain the following result:
Proposition 1.
Let $Z_1, \dots, Z_n$ be random variables taking values on the unit circle $S^1$, $\alpha \in (0, 1)$, and let $C_H$ be defined as in Equation (8).
(i) $C_H$ is a $(1 - \alpha)$-confidence set for the circular population mean set. In particular, if $\mathrm{E}Z = 0$, i.e., the circular population mean set equals $S^1$, then $|\bar Z_n| > \sqrt{2}\, s_0$ with probability at most $\alpha$, so indeed $C_H = S^1$ with probability at least $1 - \alpha$.
(ii) $s_0$ and $s_p$ are of order $n^{-\frac{1}{2}}$.
(iii) If $\mathrm{E}Z \neq 0$, then $\delta_H \to 0$ in probability at the rate $n^{-\frac{1}{2}}$, and the probability of obtaining the trivial confidence set, i.e., $P(C_H = S^1) = P(|\bar Z_n| \le \sqrt{2}\, s_0)$, goes to 0 exponentially fast.
Proof.
(i) holds by construction.
For (ii), recall Equation (7), from which we obtain the estimates $\frac{\alpha}{4} \le \exp(-n s_0^2 / 2)$ resp. $\frac{3}{8}\alpha \le \exp(-n s_p^2 / 2)$, implying that $s_0$ and $s_p$ are of order $n^{-\frac{1}{2}}$; the same holds stochastically for $\delta_H$ since $\bar Z_n \to \mathrm{E}Z$ a.s. Regarding the second statement of (iii), if $\mu$ is unique, consider $\zeta = -\mu$; then, $\tau = \mathrm{E}\,\bar X_n < 0$ and $\sqrt{2}\, s_0$ is eventually less than $-\frac{\tau}{2}$ and also $\alpha > 2^{-n+2}$ eventually. Hence, the probability of obtaining the trivial confidence set $C_H = S^1$ is eventually bounded by $P(\zeta \in C_H) \le P(\bar X_n > -s_0) \le P(\bar X_n > \frac{\tau}{2}) = P(\bar X_n - \mathrm{E}\,\bar X_n > -\frac{\tau}{2}) \le \exp(-n\tau^2 / 8)$, and thus will go to zero exponentially fast as $n$ tends to infinity. ☐
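The construction of $C_H$ in Equation (8) can be sketched in Python as follows. This is an illustrative reading of the definitions above, not the authors' code: the names `t_centered` and `hoeffding_confidence_angle` are hypothetical, the critical values are found by bisection, and the function returns `None` when the confidence set is the whole circle and the opening angle $\delta_H$ (in radians) otherwise.

```python
import cmath
import math

def t_centered(gamma, n, iters=100):
    """t(gamma, 0, -1, 1): solves Equation (6) with nu = 0, a = -1, b = 1 by bisection.

    Works on the log scale; requires 2**(-n) < gamma < 1.
    """
    def log_beta(t):
        tail = (1 - t) * math.log(1 - t) if t < 1 else 0.0
        return -0.5 * n * ((1 + t) * math.log(1 + t) + tail)
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if log_beta(mid) > math.log(gamma):
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def hoeffding_confidence_angle(z, alpha):
    """Return delta_H in radians, or None if C_H is all of the circle."""
    n = len(z)
    r = abs(sum(z) / n)                       # |Z_bar_n|
    if alpha <= 2.0 ** (2 - n):               # s_0 would not be defined
        return None
    s0 = t_centered(alpha / 4, n)
    if r <= math.sqrt(2) * s0:                # cannot reject E Z = 0
        return None
    s_p = t_centered(3 * alpha / 8, n)
    return math.asin(s_p / r)

pt = cmath.exp(1j * math.radians(10))
z_conc = [pt] * 200 + [pt.conjugate()] * 200             # mass at +-10 degrees
z_unif = [cmath.exp(2j * math.pi * k / 3) for k in range(300)]
delta_h = hoeffding_confidence_angle(z_conc, 0.05)
trivial = hoeffding_confidence_angle(z_unif, 0.05)
```

For the concentrated sample the set is an arc around the circular sample mean; for the sample spread evenly over three equidistant points the resultant is essentially zero and the whole circle is returned.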

3. Estimating the Variance

From the central limit theorem for $\hat\mu_n$ in case of unique $\mu$, cf. Equation (4), we see that the asymptotic variance of $\hat\mu_n$ gets small if $|\mathrm{E}Z|$ is close to 1 (then $\mathrm{E}Z$ is close to the boundary $S^1$ of the unit disc, which is possible only if the distribution is very concentrated) or if the variance $\mathrm{E}\,(\operatorname{Im}(\mu^{-1}Z))^2$ in the direction perpendicular to $\mu$ is small (if the distribution were concentrated on $\pm\mu$, this variance would be zero and $\hat\mu_n$ would equal $\mu$ with large probability). While $\delta_H$ ($|\bar Z_n|$ being the denominator of its sine) takes the former into account, the latter has not been exploited yet. To do so, we need to estimate $\mathrm{E}\,(\operatorname{Im}(\mu^{-1}Z))^2$.
Consider $V_n = \frac{1}{n}\sum_{k=1}^n Y_k^2$, which, under the hypothesis that the corresponding $\zeta$ is the unique circular population mean, has expectation $\sigma^2 = \operatorname{Var}(Y_k) = \mathrm{E}\,(\operatorname{Im}(\zeta^{-1}Z))^2$. Now, $1 - V_n = \frac{1}{n}\sum_{k=1}^n (1 - Y_k^2)$ is the mean of $n$ independent random variables taking values in $[0, 1]$ and having expectation $1 - \sigma^2$. By another application of Equation (6), we obtain $P(\sigma^2 \ge V_n + t) = P(1 - V_n \ge 1 - \sigma^2 + t) \le \frac{\alpha}{4}$ for $t = t(\frac{\alpha}{4}, 1 - \sigma^2, 0, 1)$, the latter existing if $\frac{\alpha}{4} > (1 - \sigma^2)^n$. Since $1 - \sigma^2 + t(\frac{\alpha}{4}, 1 - \sigma^2, 0, 1)$ increases with $1 - \sigma^2$, there is a minimal $\sigma^2$ for which $1 - V_n \ge 1 - \sigma^2 + t(\frac{\alpha}{4}, 1 - \sigma^2, 0, 1)$ holds and becomes an equality; we denote it by $\widehat{\sigma^2} = V_n + t(\frac{\alpha}{4}, 1 - \widehat{\sigma^2}, 0, 1)$. Inserting into Equation (6), it by construction fulfills
$$\frac{\alpha}{4} = \left[\left(\frac{1 - \widehat{\sigma^2}}{1 - V_n}\right)^{1 - V_n} \left(\frac{\widehat{\sigma^2}}{V_n}\right)^{V_n}\right]^n. \tag{9}$$
It is easy to see that the right-hand side depends continuously on and is strictly decreasing in $\widehat{\sigma^2} \in [V_n, 1]$ (see Appendix A), thereby traversing the interval $[0, 1]$ so that one can again solve the equation numerically. We then may, with an error probability of at most $\frac{\alpha}{4}$, use $\widehat{\sigma^2}$ as an upper bound for $\sigma^2$. Note that $\widehat{\sigma^2} > V_n$ exists if $\frac{\alpha}{4} > (1 - \widehat{\sigma^2})^n$. The latter is fulfilled for any $V_n < 1$ since Equation (9) is equivalent to
$$\frac{\alpha}{4} = (1 - \widehat{\sigma^2})^n \Bigg[\underbrace{\frac{1}{1 - V_n}}_{>1}\; \underbrace{\left(\frac{1 - \widehat{\sigma^2}}{1 - V_n}\right)^{-V_n}}_{>1}\; \underbrace{\left(\frac{\widehat{\sigma^2}}{V_n}\right)^{V_n}}_{>1}\Bigg]^n.$$
For $V_n = 1$, let $\widehat{\sigma^2} = 1$ be the trivial bound.
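Equation (9) can be solved for $\widehat{\sigma^2}$ numerically, for example by bisection on $[V_n, 1]$, since its right-hand side is strictly decreasing there. The Python sketch below illustrates this under the stated assumptions; the names `xi` and `sigma2_upper` are hypothetical, and the case $V_n = 0$ is handled through the limit $(s/V_n)^{V_n} \to 1$.

```python
import math

def xi(s, v, n):
    """Right-hand side of Equation (9), with s as candidate for the variance bound."""
    if s >= 1:
        return 0.0
    log_term = (1 - v) * math.log((1 - s) / (1 - v))
    if v > 0:                      # (s / v)**v -> 1 as v -> 0
        log_term += v * math.log(s / v)
    return math.exp(n * log_term)

def sigma2_upper(v, alpha, n, iters=100):
    """Upper confidence bound for sigma^2 given V_n = v, solving xi = alpha / 4."""
    if v >= 1:
        return 1.0                 # trivial bound
    lo, hi = v, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if xi(mid, v, n) > alpha / 4:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

bound_zero = sigma2_upper(0.0, 0.05, 100)   # no observed perpendicular variation
bound_small = sigma2_upper(0.03, 0.05, 400)
```

For $V_n = 0$ the equation reduces to $(1 - \widehat{\sigma^2})^n = \frac{\alpha}{4}$, which gives a closed form against which the bisection can be checked.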
With such an upper bound on its variance, we now can get a better estimate for $P(\bar Y_n > t)$. Indeed, one may use another inequality by Hoeffding [11] (Theorem 3): the mean $\bar W_n = \frac{1}{n}\sum_{k=1}^n W_k$ of a sequence $W_1, \dots, W_n$ of independent random variables taking values in $(-\infty, 1]$, each having zero expectation as well as variance $\rho^2$, fulfills
$$P\big(\bar W_n \ge w\big) \le \left[\left(1 + \frac{w}{\rho^2}\right)^{-\rho^2 - w} (1 - w)^{w - 1}\right]^{\frac{n}{1 + \rho^2}} \tag{10}$$
$$\le \exp\!\Big(-n w \Big[\Big(1 + \frac{\rho^2}{w}\Big) \ln\Big(1 + \frac{w}{\rho^2}\Big) - 1\Big]\Big) \tag{11}$$
for any $w \in (0, 1)$. Again, an elementary calculation (analogous to Lemma A1) shows that the right-hand side of Equation (10) is strictly decreasing in $w$, continuously ranging between 1 and $\left(\frac{\rho^2}{1 + \rho^2}\right)^n$ as $w$ varies in $(0, 1)$, so that there exists a unique $w = w(\gamma, \rho^2)$ for which the right-hand side equals $\gamma$, provided $\gamma \in \left(\left(\frac{\rho^2}{1 + \rho^2}\right)^n, 1\right)$. Moreover, the right-hand side increases with $\rho^2$ (as expected), so that $w(\gamma, \rho^2)$ is increasing in $\rho^2$, too (cf. Appendix A).
Therefore, under the hypothesis that the corresponding $\zeta$ is the unique circular population mean, $P\big(|\bar Y_n| \ge w(\frac{\alpha}{4}, \sigma^2)\big) \le 2 \cdot \frac{\alpha}{4} = \frac{\alpha}{2}$. Now, since $P\big(w(\frac{\alpha}{4}, \sigma^2) \ge w(\frac{\alpha}{4}, \widehat{\sigma^2})\big) = P(\sigma^2 \ge \widehat{\sigma^2}) \le \frac{\alpha}{4}$, setting $s_V = w(\frac{\alpha}{4}, \widehat{\sigma^2})$ we get $P\big(|\bar Y_n| \ge s_V\big) \le \frac{3}{4}\alpha$. Note that $\frac{\rho^2}{1 + \rho^2}$ increases with $\rho^2$, so in case $s_0$ exists, $\widehat{\sigma^2} \le 1$ implies $\frac{\alpha}{4} > 2^{-n} \ge \left(\frac{\widehat{\sigma^2}}{1 + \widehat{\sigma^2}}\right)^n$, i.e., the existence of $s_V$.
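The critical value $w(\gamma, \rho^2)$ can again be obtained by bisection, since the right-hand side of Equation (10) is strictly decreasing in $w$. The sketch below (with hypothetical names `bound10` and `solve_w`) assumes $\rho^2 > 0$ and illustrates the monotonicity in $\rho^2$: a smaller variance bound yields a smaller critical value.

```python
import math

def bound10(w, rho2, n):
    """Right-hand side of Equation (10); assumes rho2 > 0 and 0 <= w <= 1."""
    # (1 - w) * ln(1 - w) tends to 0 as w -> 1
    tail = (1 - w) * math.log(1 - w) if w < 1 else 0.0
    return math.exp(-n / (1 + rho2) * ((rho2 + w) * math.log(1 + w / rho2) + tail))

def solve_w(gamma, rho2, n, iters=100):
    """Bisection for w(gamma, rho2), the solution of bound10(w) = gamma."""
    if not (rho2 / (1 + rho2)) ** n < gamma < 1:
        raise ValueError("no solution: gamma must lie in ((rho2/(1+rho2))**n, 1)")
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if bound10(mid, rho2, n) > gamma:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

n, gamma = 400, 0.05 / 4
w_trivial = solve_w(gamma, 1.0, n)    # rho^2 = 1: no variance information
w_small = solve_w(gamma, 0.06, n)     # small variance perpendicular to zeta
```

With $\rho^2 = 1$ the bound coincides with Equation (6) for $a = -1$, $b = 1$, $\nu = 0$, so `w_trivial` plays the role of $s_0$ for this sample size.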
Following the construction for $C_H$ from Section 2, we can again obtain a confidence set for $\mu$ with coverage probability at least $1 - \alpha$ as shown in our previous article [13]. In practice, however, this confidence set is hard to calculate since $\widehat{\sigma^2} = \widehat{\sigma^2}(\zeta)$ has to be calculated for every $\zeta \in S^1$. Though these confidence sets can be approximated by using a grid as in [13], we suggest using a simultaneous upper bound for the variance of $\operatorname{Im}(\zeta^{-1}Z_k)$.
We obtain a (conservative) connected, symmetric confidence set $C_V \subseteq C_H$ by testing $\zeta \in C_H$ with $\widehat{\sigma^2_{\max}} = \sup_{\zeta \in C_H} \widehat{\sigma^2}$ as a common upper bound for the variance perpendicular to any $\zeta \in C_H$. Note that $\widehat{\sigma^2_{\max}}$ can be obtained as the solution of Equation (9) with
$$\tilde V_n = \sup_{\zeta \in C_H} \frac{1}{n}\sum_{k=1}^n \big(\operatorname{Im}(\zeta^{-1}Z_k)\big)^2 .$$
Furthermore, we can shorten $C_V$ by iteratively redefining $\tilde V_n = \sup_{\zeta \in C_V} \frac{1}{n}\sum_{k=1}^n \big(\operatorname{Im}(\zeta^{-1}Z_k)\big)^2$ and recalculating $C_V$ (see Algorithm 1). The resulting opening angle will be denoted by $\delta_V = \arcsin\frac{s_V}{|\bar Z_n|}$.
Algorithm 1: Algorithm for computation of $C_V$.
[Algorithm 1 appears as an image in the original article.]
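The iteration described in the text can be sketched in Python as follows. This is a reconstruction from the prose description only, not a transcription of Algorithm 1: all function names are hypothetical, the supremum over the current arc is approximated on a grid (which may slightly underestimate it), and the stopping rule is an assumption. The sketch uses the identity $\frac{1}{n}\sum_k (\operatorname{Im}(\zeta^{-1}Z_k))^2 = \frac{1}{2}\big(1 - \operatorname{Re}(\zeta^{-2}\,\frac{1}{n}\sum_k Z_k^2)\big)$ for $|\zeta| = 1$.

```python
import cmath
import math

def _root(f, lo, hi, target, iters=100):
    """Bisection for f(x) = target, with f strictly decreasing on (lo, hi)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f(mid) > target else (lo, mid)
    return 0.5 * (lo + hi)

def _xlnx(x):
    return x * math.log(x) if x > 0 else 0.0

def t_centered(gamma, n):
    # t(gamma, 0, -1, 1): logarithm of Equation (6) with nu = 0, a = -1, b = 1
    return _root(lambda t: -0.5 * n * (_xlnx(1 + t) + _xlnx(1 - t)), 0.0, 1.0, math.log(gamma))

def sigma2_upper(v, gamma, n):
    # Solution of Equation (9) in [v, 1]; trivial bound 1 if v = 1
    if v >= 1:
        return 1.0
    def log_xi(s):
        if s >= 1:
            return -math.inf
        head = v * math.log(s / v) if v > 0 else 0.0
        return n * ((1 - v) * math.log((1 - s) / (1 - v)) + head)
    return _root(log_xi, v, 1.0, math.log(gamma))

def w_solve(gamma, rho2, n):
    # Solution of "right-hand side of Equation (10) = gamma" in (0, 1)
    def log_b(w):
        return -n / (1 + rho2) * ((rho2 + w) * math.log(1 + w / rho2) + _xlnx(1 - w))
    return _root(log_b, 0.0, 1.0, math.log(gamma))

def cv_angle(z, alpha, grid=721, max_iter=100, tol=1e-12):
    """Return delta_V in radians, or None if the confidence set is the whole circle."""
    n = len(z)
    z_bar = sum(z) / n
    r = abs(z_bar)
    if alpha <= 2.0 ** (2 - n) or r <= math.sqrt(2) * t_centered(alpha / 4, n):
        return None
    phi_hat = cmath.phase(z_bar)
    m2 = sum(zk * zk for zk in z) / n                     # mean of Z_k^2
    delta = math.asin(t_centered(3 * alpha / 8, n) / r)   # start from C_H
    for _ in range(max_iter):
        phis = (phi_hat - delta + 2 * delta * j / (grid - 1) for j in range(grid))
        v = max(0.5 * (1 - (m2 * cmath.exp(-2j * p)).real) for p in phis)
        v = min(max(v, 0.0), 1.0)                         # guard against rounding
        s_v = w_solve(alpha / 4, sigma2_upper(v, alpha / 4, n), n)
        new_delta = min(delta, math.asin(min(1.0, s_v / r)))  # keep C_V inside C_H
        if delta - new_delta < tol:
            break
        delta = new_delta
    return delta

pt = cmath.exp(1j * math.radians(10))
delta_v = cv_angle([pt] * 200 + [pt.conjugate()] * 200, 0.05)
```

Taking the minimum with the previous angle keeps each iterate inside the preceding set, so the sequence of angles is non-increasing and the loop stops once it no longer shrinks.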
Proposition 2.
Let $Z_1, \dots, Z_n$ be random variables taking values on the unit circle $S^1$, and let $\alpha \in (0, 1)$.
(i) The set $C_V$ resulting from Algorithm 1 is a $(1 - \alpha)$-confidence set for the circular population mean set. In particular, if $\mathrm{E}Z = 0$, i.e., the circular population mean set equals $S^1$, then $|\bar Z_n| > \sqrt{2}\, s_0$ with probability at most $\alpha$, so indeed $C_V = S^1$ with probability of at least $1 - \alpha$.
(ii) $s_V$ is of order $n^{-\frac{1}{2}}$.
(iii) If $\mathrm{E}Z \neq 0$, i.e., if the circular population mean is unique, then $\delta_V \to 0$ in probability at the rate $n^{-\frac{1}{2}}$, and the probability of obtaining a trivial confidence set, i.e., $P(C_V = S^1) = P(|\bar Z_n| \le \sqrt{2}\, s_0)$, goes to 0 exponentially fast.
(iv) If $\mathrm{E}Z \neq 0$, then
$$\limsup_{n \to \infty} \frac{\delta_V}{\delta_A} \le \frac{\sqrt{-2 \ln\frac{\alpha}{4}}}{q_{1-\frac{\alpha}{2}}} \quad \text{a.s.}$$
with $q_{1-\frac{\alpha}{2}}$ denoting the $(1 - \frac{\alpha}{2})$-quantile of the standard normal distribution $N(0, 1)$.
Proof.
Again, (i) follows by construction, while (iii) is shown as in Proposition 1.
For (ii), note that $s_V \le s_0$ since the bound in Equation (10) for $\rho^2 = 1$ agrees with the bound in Equation (6) for $a = -1$, $b = 1$ and $\nu = 0$, thus $s_V$ and $\delta_V$ are at least of the order $n^{-\frac{1}{2}}$.
For (iv), we will use the estimate in Equation (11). Recall that $\ln(1 + x) = x - \frac{x^2}{2} + o(x^2)$; therefore, for large $n$ and hence small $s_V$ a.s.
$$\frac{\alpha}{4} \le \exp\!\left(-n s_V \left[\left(1 + \frac{\widehat{\sigma^2_{\max}}}{s_V}\right)\left(\frac{s_V}{\widehat{\sigma^2_{\max}}} - \frac{s_V^2}{2\,(\widehat{\sigma^2_{\max}})^2} + o(s_V^2)\right) - 1\right]\right) = \exp\!\left(-n \left[\frac{s_V^2}{2\,\widehat{\sigma^2_{\max}}} + o(s_V^2)\right]\right),$$
thus $s_V \le \sqrt{-2\,\widehat{\sigma^2_{\max}} \ln(\frac{\alpha}{4}) / n} + o(n^{-\frac{1}{2}})$. Additionally, $\arcsin x = x + o(x)$ for $x$ close to 0, which gives $\delta_V = s_V / |\bar Z_n| + o(s_V) \le \sqrt{-2\,\widehat{\sigma^2_{\max}} \ln\frac{\alpha}{4}} \,/\, (\sqrt{n}\,|\bar Z_n|) + o(n^{-\frac{1}{2}})$ a.s.
Furthermore, $\widehat{\sigma^2_{\max}} \to \sigma^2$ a.s. for $n \to \infty$, and we obtain
$$\limsup_{n \to \infty} \frac{\delta_V}{\delta_A} \le \frac{\sqrt{-2 \ln\frac{\alpha}{4}}}{q_{1-\frac{\alpha}{2}}} \quad \text{a.s.}$$
since
$$\delta_A = \frac{q_{1-\frac{\alpha}{2}}}{\sqrt{n}\,|\bar Z_n|} \sqrt{\underbrace{\frac{1}{n}\sum_{k=1}^n \big(\operatorname{Im}(\hat\mu_n^{-1}Z_k)\big)^2}_{\to\, \sigma^2}}$$
(see Equation (5)). ☐
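The limiting ratio in Proposition 2(iv) is easy to evaluate numerically. The short Python sketch below (the function name `limiting_ratio` is hypothetical) computes it with the standard library's normal quantile and shows that, for $\alpha = 5\%$, it is close to the factor $\frac{3}{2}$ mentioned in the introduction.

```python
import math
from statistics import NormalDist

def limiting_ratio(alpha):
    """Upper bound sqrt(-2 ln(alpha / 4)) / q_{1 - alpha/2} from Proposition 2(iv)."""
    q = NormalDist().inv_cdf(1 - alpha / 2)
    return math.sqrt(-2 * math.log(alpha / 4)) / q

ratio_95 = limiting_ratio(0.05)   # approximately 1.51
```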

4. Simulation and Application to Real Data

We will compare the asymptotic confidence set $C_A$, the confidence set $C_H$ constructed directly using Hoeffding’s inequality in Section 2, and the confidence set $C_V$ resulting from Algorithm 1 by reporting their corresponding opening angles $\delta_A$, $\delta_H$, and $\delta_V$ in degrees (°) as well as their coverage frequencies in simulations.
All computations have been performed using our own code based on the software package R (version 2.15.3) [14].

4.1. Simulation 1: Two Points of Equal Mass at ±10°

First, we consider a rather favourable situation: $n = 400$ independent draws from the distribution with $P(Z = \exp(10\pi i / 180)) = P(Z = \exp(-10\pi i / 180)) = \frac{1}{2}$. Then, we have $|\mathrm{E}Z| = \mathrm{E}Z = \cos(10\pi / 180) \approx 0.985$, implying that the data are highly concentrated, $\mu = 1$ is unique, and the variance of $Z$ in the direction of $\mu$ is 0; there is only variation perpendicular to $\mu$, i.e., in the direction of the imaginary axis (see Figure 4).
Table 1 shows the results based on 10,000 repetitions for a nominal coverage probability of $1 - \alpha = 95\%$: the average $\delta_H$ is about 3.5 times larger than $\delta_V$, which is about twice as large as $\delta_A$. As expected, the asymptotics are rather precise in this situation: $C_A$ did cover the true mean in about 95% of the cases, which implies that the other confidence sets are quite conservative; indeed $C_H$ and $C_V$ covered the true mean in all repetitions. One may also note that the angles varied only a little between repetitions.

4.2. Simulation 2: Three Points Placed Asymmetrically

Secondly, we consider a situation which has been designed to show that even a considerably large sample size ($n = 100$) does not guarantee approximate coverage for the asymptotic confidence set $C_A$: the distribution of $Z$ is concentrated on three points, $\xi_j = \exp(\theta_j \pi i / 180)$, $j = 1, 2, 3$, with weights $\omega_j = P(Z = \xi_j)$ chosen such that $\mathrm{E}Z = |\mathrm{E}Z| = 0.9$ (implying a small variance and $\mu = 1$), $\omega_1 = 1\%$ and $\operatorname{Arg}\xi_1 > 0$, while $\operatorname{Arg}\xi_2, \operatorname{Arg}\xi_3 < 0$. In numbers, $\theta_1 \approx 25.8$, $\theta_2 \approx -0.3$, and $\theta_3 \approx -179.7$ (in °) while $\omega_2 \approx 94\%$ and $\omega_3 \approx 5\%$ (see Figure 5).
The results based on 10,000 repetitions are shown in Table 2 where a nominal coverage probability of $1 - \alpha = 90\%$ was prescribed. Clearly, $C_A$ with its coverage probability of less than 64% performs quite poorly while the others are conservative; $\delta_V \approx 5°$ still appears small enough to be useful in practice, though.

4.3. Real Data: Movements of Ants

Fisher [3] (Example 4.4) describes a data set of the directions 100 ants took in response to an illuminated target placed at 180° for which it may be of interest to know whether the ants indeed (on average) move towards that target (see [15] for the original publication). The data set is available as Ants_radians within the R package CircNNTSR [16].
The circular sample mean for this data set is about −176.9°; for a nominal coverage probability of $1 - \alpha = 95\%$, one gets $\delta_H \approx 27.3°$, $\delta_V \approx 20.5°$, and $\delta_A \approx 9.6°$ so that all confidence sets contain ±180° (see Figure 6). The data set’s concentration is not very high, however, so the circular population mean could, according to $C_V$, also be −156.4° or 162.6°.

5. Discussion

We have derived two confidence sets, $C_H$ and $C_V$, for the set of circular population means. Both guarantee coverage for any finite sample size without making any assumptions on the distribution of the data (besides that they are independent and identically distributed) at the cost of potentially being quite conservative; they are non-asymptotic and universal in this sense. Judging from the simulations and the real data set, $C_V$ (which estimates the variance perpendicular to the mean direction) appears to be preferable over $C_H$, as expected, and small enough to be useful in practice.
While the asymptotic confidence set’s opening angle is less than half (asymptotically about $2/3$ for $\alpha = 5\%$) of the one for $C_V$ in our simulations and application, it has the drawback that even for a sample size of $n = 100$, it may fail to give a coverage probability close to the nominal one; in addition, one has to assume that the circular population mean is unique. Of course, one could also devise an asymptotically justified test for the latter but this would entail a correction for multiple testing (for example working with $\frac{\alpha}{2}$ each time), which would also render the asymptotic confidence set conservative.
Further improvements would require sharper “universal” mass concentration inequalities taking the first or the first two moments into account; however, this is beyond the scope of this article.

Acknowledgments

T. Hotz wishes to thank Stephan Huckemann from the Georgia Augusta University of Göttingen for fruitful discussions concerning the first construction of confidence regions described in Section 2. We acknowledge support for the Article Processing Charge by the German Research Foundation and the Open Access Publication Fund of the Technische Universität Ilmenau. F. Kelma acknowledges support by the Klaus Tschira Stiftung, gemeinnützige Gesellschaft, Projekt 03.126.2016.

Author Contributions

All authors contributed to the theoretical and numerical results as well as to the writing of the article. All authors have read and approved the final manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Proofs of Monotonicity

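Lemma A1.
$\beta(t) = \left[\left(\frac{\nu - a}{\nu - a + t}\right)^{\nu - a + t} \left(\frac{b - \nu}{b - \nu - t}\right)^{b - \nu - t}\right]^{\frac{n}{b - a}}$ is strictly decreasing in $t$.
Proof.
We show the equivalent statement that $\tilde\beta(t) = \ln\left[\left(\frac{\nu - a}{\nu - a + t}\right)^{\nu - a + t} \left(\frac{b - \nu}{b - \nu - t}\right)^{b - \nu - t}\right]$ is strictly decreasing in $t$:
$$\begin{aligned} \frac{d}{dt}\tilde\beta(t) &= \frac{d}{dt}\Big\{\big[\ln(\nu - a) - \ln(\nu - a + t)\big](\nu - a + t) + \big[\ln(b - \nu) - \ln(b - \nu - t)\big](b - \nu - t)\Big\} \\ &= \ln(\nu - a) - \ln(\nu - a + t) - \frac{1}{\nu - a + t}(\nu - a + t) - \ln(b - \nu) + \ln(b - \nu - t) + \frac{1}{b - \nu - t}(b - \nu - t) \\ &= \ln\Big(\underbrace{\frac{b - \nu - t}{b - \nu}}_{<1} \cdot \underbrace{\frac{\nu - a}{\nu - a + t}}_{<1}\Big) < 0 . \end{aligned}$$
Hence, $\tilde\beta(t)$ and thus $\beta(t)$ are strictly decreasing in $t$. ☐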
Lemma A2.
Let $t = t(\gamma, \nu, a, b)$ be the solution to the equation $\beta(t) = \gamma$. Then, $\nu + t$ is strictly increasing in $\nu$.
Proof.
$t$ is the solution of the equation
$$(\nu - a + t) \ln\frac{\nu - a}{\nu - a + t} + (b - \nu - t) \ln\frac{b - \nu}{b - \nu - t} = \frac{b - a}{n} \ln\gamma . \tag{A1}$$
The derivatives of the left-hand side of Equation (A1) w.r.t. $\nu$ and $t$ exist and are continuous. Furthermore, the derivative w.r.t. $t$ does not vanish for any $t \in (0, b - \nu)$, cf. the proof of Lemma A1, whence the derivative $t' = \frac{dt}{d\nu}$ exists by the implicit function theorem. When differentiating Equation (A1) with respect to $\nu$, one obtains
$$(1 + t') \ln\frac{\nu - a}{\nu - a + t} + (\nu - a + t)\left[\frac{1}{\nu - a} - \frac{1 + t'}{\nu - a + t}\right] - (1 + t') \ln\frac{b - \nu}{b - \nu - t} + (b - \nu - t)\left[-\frac{1}{b - \nu} + \frac{1 + t'}{b - \nu - t}\right] = 0,$$
or equivalently
$$(1 + t')\Big[\underbrace{\ln\frac{\nu - a}{\nu - a + t}}_{<0} - \underbrace{\ln\frac{b - \nu}{b - \nu - t}}_{>0}\Big] = \frac{t\,(a - b)}{(\nu - a)(b - \nu)} < 0,$$
whence $1 + t' = \frac{d}{d\nu}(\nu + t) > 0$ finishes the proof. ☐
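Lemma A3.
The function
$$\xi(\widehat{\sigma^2}) = \left[\left(\frac{1 - \widehat{\sigma^2}}{1 - V_n}\right)^{1 - V_n} \left(\frac{\widehat{\sigma^2}}{V_n}\right)^{V_n}\right]^n$$
is strictly decreasing in $\widehat{\sigma^2} \in [V_n, 1]$.
Proof.
We show the equivalent statement that $n^{-1} \ln\xi(\widehat{\sigma^2})$ is strictly decreasing in $\widehat{\sigma^2}$:
$$\frac{d}{d\widehat{\sigma^2}}\, n^{-1} \ln\xi(\widehat{\sigma^2}) = \frac{d}{d\widehat{\sigma^2}}\Big\{(1 - V_n)\big(\ln(1 - \widehat{\sigma^2}) - \ln(1 - V_n)\big) + V_n\big(\ln(\widehat{\sigma^2}) - \ln(V_n)\big)\Big\} = -\underbrace{\frac{1 - V_n}{1 - \widehat{\sigma^2}}}_{>1} + \underbrace{\frac{V_n}{\widehat{\sigma^2}}}_{<1} < 0 .$$
☐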
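Lemma A4.
Let $w = w(\gamma, \rho^2)$ be the solution of the equation
$$\left[\left(1 + \frac{w}{\rho^2}\right)^{-\rho^2 - w} (1 - w)^{w - 1}\right]^{\frac{n}{1 + \rho^2}} = \gamma .$$
Then, $w$ is increasing in $\rho^2$.
Proof.
$w$ is the solution of the equation
$$\frac{\rho^2 + w}{1 + \rho^2} \ln\Big(1 + \frac{w}{\rho^2}\Big) + \frac{1 - w}{1 + \rho^2} \ln(1 - w) = -\frac{\ln\gamma}{n} . \tag{A2}$$
The derivatives of the left-hand side of Equation (A2) w.r.t. $\rho^2$ and $w$ exist and are continuous. Furthermore, the derivative w.r.t. $w$ does not vanish for any $w \in (0, 1)$: this derivative is
$$\frac{1}{1 + \rho^2}\left[\ln\Big(1 + \frac{w}{\rho^2}\Big) + \frac{\frac{\rho^2 + w}{\rho^2}}{1 + \frac{w}{\rho^2}} - \ln(1 - w) - 1\right] = \frac{1}{1 + \rho^2}\left[\ln\Big(1 + \frac{w}{\rho^2}\Big) - \ln(1 - w)\right],$$
vanishing if and only if $1 + \frac{w}{\rho^2} = 1 - w$, i.e., if and only if $w\,(1 + \frac{1}{\rho^2}) = 0$, which does not happen for $w, \rho^2 > 0$. Now, the derivative $w' = \frac{dw}{d\rho^2}$ exists by the implicit function theorem. When differentiating Equation (A2) with respect to $\rho^2$, one obtains
$$\frac{(1 + w')(1 + \rho^2) - (\rho^2 + w)}{(1 + \rho^2)^2} \ln\Big(1 + \frac{w}{\rho^2}\Big) + \underbrace{\frac{\rho^2 + w}{1 + \rho^2} \cdot \frac{\frac{w'}{\rho^2} - \frac{w}{\rho^4}}{1 + \frac{w}{\rho^2}}}_{=\,\frac{w'\rho^2 - w}{\rho^2 (1 + \rho^2)}} - \frac{w'(1 + \rho^2) + (1 - w)}{(1 + \rho^2)^2} \ln(1 - w) - \frac{w'}{1 + \rho^2} = 0,$$
or equivalently
$$w' \underbrace{\Big[\ln\Big(1 + \frac{w}{\rho^2}\Big) - \ln(1 - w)\Big]}_{>0} = \frac{w}{\rho^2} - \frac{1 - w}{1 + \rho^2} \ln\frac{\rho^2 + w}{\rho^2 (1 - w)} .$$
Hence, $w' \ge 0$ if and only if $\frac{w}{\rho^2} \ge \frac{1 - w}{1 + \rho^2} \ln\Big(\frac{\rho^2 + w}{\rho^2 (1 - w)}\Big)$, which holds since $\ln\Big(\frac{\rho^2 + w}{\rho^2 (1 - w)}\Big) = \ln\Big(1 + \frac{w (1 + \rho^2)}{\rho^2 (1 - w)}\Big) \le \frac{w}{\rho^2} \cdot \frac{1 + \rho^2}{1 - w}$, finishing the proof. ☐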

References

  1. Mardia, K.V. Directional Statistics; Academic Press: London, UK, 1972.
  2. Watson, G.S. Statistics on Spheres; Wiley: New York, NY, USA, 1983.
  3. Fisher, N.I. Statistical Analysis of Circular Data; Cambridge University Press: Cambridge, UK, 1993.
  4. Jammalamadaka, S.R.; SenGupta, A. Topics in Circular Statistics; Series on Multivariate Analysis; World Scientific: Singapore, 2001.
  5. Mardia, K.V.; Jupp, P.E. Directional Statistics; Wiley: New York, NY, USA, 2000.
  6. Fréchet, M. Les éléments aléatoires de nature quelconque dans un espace distancié. Annales de l’Institut Henri Poincaré 1948, 10, 215–310. (In French)
  7. Hotz, T. Extrinsic vs. Intrinsic Means on the Circle. In Proceedings of the 1st Conference on Geometric Science of Information, Paris, France, 28–30 October 2013; Lecture Notes in Computer Science, Volume 8085; Springer-Verlag: Heidelberg, Germany, 2013; pp. 433–440.
  8. Afsari, B. Riemannian Lp center of mass: Existence, uniqueness, and convexity. Proc. Am. Math. Soc. 2011, 139, 655–673.
  9. Arnaudon, M.; Miclo, L. A stochastic algorithm finding p-means on the circle. Bernoulli 2016, 22, 2237–2300.
  10. Leeb, H.; Pötscher, B.M. Model selection and inference: Facts and fiction. Econ. Theory 2005, 21, 21–59.
  11. Hoeffding, W. Probability Inequalities for Sums of Bounded Random Variables. J. Am. Stat. Assoc. 1963, 58, 13–30.
  12. Boucheron, S.; Lugosi, G.; Massart, P. Concentration Inequalities: A Nonasymptotic Theory of Independence; Oxford University Press: Oxford, UK, 2013.
  13. Hotz, T.; Kelma, F.; Wieditz, J. Universal, Non-asymptotic Confidence Sets for Circular Means. In Proceedings of the 2nd International Conference on Geometric Science of Information, Palaiseau, France, 28–30 October 2015; Nielsen, F., Barbaresco, F., Eds.; Lecture Notes in Computer Science, Volume 9389; Springer International Publishing: Cham, Switzerland, 2015; pp. 635–642.
  14. R Core Team. R: A Language and Environment for Statistical Computing, version 2.15.3; R Foundation for Statistical Computing: Vienna, Austria, 2013.
  15. Jander, R. Die optische Richtungsorientierung der Roten Waldameise (Formica Rufa L.). Zeitschrift für Vergleichende Physiologie 1957, 40, 162–238. (In German)
  16. Fernandez-Duran, J.J.; Gregorio-Dominguez, M.M. CircNNTSR: An R Package for the Statistical Analysis of Circular Data Using Nonnegative Trigonometric Sums (NNTS) Models, version 2.1. 2013.
Figure 1. The construction for the test of the hypothesis $\mu = S^1$, or equivalently $\mathbb{E}Z = 0$.
Figure 2. The construction for the test of the hypothesis $\mathbb{E}Z = \lambda\zeta$ with $\lambda > 0$.
Figure 3. The critical $\bar{Z}_n$ regarding the rejection of $\zeta$. $\delta_H$ bounds the angle between $\hat{\mu}_n$ and any accepted $\zeta$.
Figure 4. Two points of equal mass at ±10° and their Euclidean mean.
Figure 5. Three points placed asymmetrically with different masses and their Euclidean mean.
Figure 6. Ant data placed at increasing radii to visually resolve ties; in addition, the circular mean direction as well as the confidence sets $C_H$, $C_V$, and $C_A$ are shown.
Table 1. Results for simulation 1 (two points of equal mass at ±10°) based on 10,000 repetitions with $n = 400$ observations each: average observed $\delta_H$, $\delta_V$, and $\delta_A$ (with corresponding standard deviation), as well as frequency (with corresponding standard error) with which $\mu = 1$ was covered by $C_H$, $C_V$, and $C_A$, respectively; the nominal coverage probability was $1 - \alpha = 95\%$.

| Confidence Set | Mean δ (±s.d.) | Coverage Frequency (±s.e.) |
|---|---|---|
| $C_H$ | 8.2 (±0.0005) | 100.0% (±0.0%) |
| $C_V$ | 2.4 (±0.0025) | 100.0% (±0.0%) |
| $C_A$ | 1.0 (±0.0019) | 94.8% (±0.2%) |
Table 2. Results for simulation 2 (three points placed asymmetrically) based on 10,000 repetitions with $n = 100$ observations each: average observed $\delta_H$, $\delta_V$, and $\delta_A$ (with corresponding standard deviation), as well as frequency (with corresponding standard error) with which $\mu = 1$ was covered by $C_H$, $C_V$, and $C_A$, respectively; the nominal coverage probability was $1 - \alpha = 90\%$.

| Confidence Set | Mean δ (±s.d.) | Coverage Frequency (±s.e.) |
|---|---|---|
| $C_H$ | 16.5 (±0.85) | 100.0% (±0.0%) |
| $C_V$ | 5.0 (±0.38) | 100.0% (±0.0%) |
| $C_A$ | 0.4 (±0.28) | 62.8% (±0.5%) |
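
As a plausibility check on the standard errors reported in Tables 1 and 2, the following minimal sketch recomputes them from the observed coverage frequencies. It assumes the standard error is the usual binomial one, $\sqrt{\hat{p}(1-\hat{p})/N}$ with $N = 10{,}000$ repetitions; this formula is not stated in the tables themselves, and the function name `coverage_standard_error` is hypothetical.

```python
import math


def coverage_standard_error(p_hat: float, repetitions: int) -> float:
    """Binomial standard error of an observed coverage frequency.

    Assumption: each repetition covers the true mean independently,
    so the frequency p_hat has standard error sqrt(p_hat*(1-p_hat)/N).
    """
    return math.sqrt(p_hat * (1.0 - p_hat) / repetitions)


# Observed coverage frequencies of C_A taken from Tables 1 and 2.
for label, p_hat in [("Table 1, C_A", 0.948), ("Table 2, C_A", 0.628)]:
    se = coverage_standard_error(p_hat, 10_000)
    print(f"{label}: {100 * p_hat:.1f}% (+/- {100 * se:.1f}%)")
```

Under this assumption the recomputed values round to 0.2% and 0.5%, matching the tabulated standard errors; a coverage frequency of 100.0% gives a standard error of exactly zero.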

Share and Cite

MDPI and ACS Style

Hotz, T.; Kelma, F.; Wieditz, J. Non-Asymptotic Confidence Sets for Circular Means. Entropy 2016, 18, 375. https://0-doi-org.brum.beds.ac.uk/10.3390/e18100375