Article

Geometric Characteristics of the Wasserstein Metric on SPD(n) and Its Applications on Data Processing

1 School of Mathematics and Statistics, Beijing Institute of Technology, Beijing 100081, China
2 Department of Computing, Imperial College London, London SW7 2AZ, UK
3 Department of Mathematics, Imperial College London, London SW7 2AZ, UK
* Author to whom correspondence should be addressed.
Submission received: 5 August 2021 / Revised: 9 September 2021 / Accepted: 10 September 2021 / Published: 14 September 2021

Abstract: The Wasserstein distance, especially between symmetric positive-definite matrices, has broad and deep influences on the development of artificial intelligence (AI) and other branches of computer science. In this paper, by introducing the Wasserstein metric on SPD(n), we obtain computationally feasible expressions for some geometric quantities, including geodesics, exponential maps, the Riemannian connection, Jacobi fields and curvatures, particularly the scalar curvature. Furthermore, we discuss the behavior of geodesics and prove that the manifold is globally geodesically convex. Finally, we design algorithms for point cloud denoising and for edge detection in noisy images based on the Wasserstein curvature on SPD(n). The experimental results show the efficiency and robustness of our curvature-based methods.

1. Introduction

Symmetric positive-definite matrices are widely used in many fields of information science, such as stability analysis in signal processing, linear stationary systems, optimal control strategies and imaging analysis [1,2,3]; their importance can hardly be overstated [4,5]. Instead of considering a single matrix, contemporary scientists tend to study the global structure of the set of all n × n symmetric positive-definite matrices, denoted SPD(n). SPD(n) can be endowed with various structures. The most traditional one is the Euclidean metric, induced as a submanifold metric from the Euclidean inner product on the space of matrices. X. Pennec, P. Fillard et al. [6] defined the affine-invariant Riemannian metric. V. Arsigny, P. Fillard et al. [7] exhibited a Lie group [8] structure on SPD(n), which admits a bi-invariant metric called the Log-Euclidean metric.
Recently, by constructing a principal bundle, Y. Li, M. Wong et al. [9] and S. Zhang et al. [10] defined a new Riemannian metric on SPD(n) whose geodesic distance coincides with the Wasserstein-2 distance [11,12]; it is therefore called the Wasserstein metric. The distance itself, rather than the underlying Riemannian metric, has been widely used in artificial intelligence [13]. On the geometric side, encouragingly, T. Asuka [14] and E. Massart, P.-A. Absil [15] gave a series of theoretical expressions for geometric quantities. However, these expressions are too general or too complicated to be applied directly. In this paper, we derive more computationally feasible expressions in a concrete case. Moreover, we give the Jacobi fields and the scalar curvature.
With the blooming of data science, point cloud processing, especially denoising, plays an increasingly important role in data-driven research and engineering. There is an immense literature on point cloud denoising, and widely used algorithms are packaged as built-in functions of software such as PCL [16]. These methods share a common drawback when high-density noise is added to point clouds. Utilizing the geometric structure of SPD(n), we design a novel algorithm to overcome this drawback. Compared to traditional methods, our algorithm is more accurate and less dependent on manually chosen parameters.
In addition, we apply our theory to image edge detection, a classical problem in image processing, and design a new detection algorithm. Different from traditional gradient-based filters, such as Sobel, Prewitt and Laplacian, we exhibit a connection between the Wasserstein sectional curvature and edges. Experiments show the feasibility of our algorithm.
The paper is organized as follows. In Section 2, we introduce some basic knowledge of the Riemannian manifold (SPD(n), g_W) and consider its symmetry. In Section 3, we describe the Wasserstein geometry of SPD(n), including the geodesics, exponential map, connection and curvature. In particular, we prove geodesic convexity and the nonexistence of cut loci and conjugate pairs. In Section 4, we design an adaptive algorithm to denoise point clouds. In Section 5, we develop a curvature-based method to detect image edges. Proofs and detailed numerical results are presented in the Appendices.

2. Preliminary

2.1. Notation

In this paper, we adopt conventional notations from algebra and geometry. Riemannian manifolds are denoted as pairs (manifold, metric). For example, our main object of interest is (SPD(n), g_W), meaning SPD(n) endowed with the Wasserstein metric. R^n is the n-dimensional Euclidean space. M(n) denotes the set of n × n matrices, Sym(n) the set of n × n symmetric matrices, and O(n) the set of n × n orthogonal matrices. T_A M is, as usual, the tangent space of M at a point A.
Λ always represents a diagonal n × n matrix. For an n × n matrix A, λ(A) or λ_i(A) means an eigenvalue or the i-th eigenvalue of A, respectively. A matrix A with entries A_ij will always be written as [A_ij]. The identity matrix is denoted by I. In this paper, we conventionally denote points on manifolds by A, B and vector fields by X, Y.
The Sylvester equation is one of the most classical matrix equations. The following special case of the Sylvester equation plays a key role in understanding the geometry of (SPD(n), g_W):
A K + K A = X,  A ∈ SPD(n),  X ∈ Sym(n).
We denote the solution in K of (1) by Γ_A[X]. Then, the matrix Γ_A[X] ∈ M(n) satisfies
A Γ_A[X] + Γ_A[X] A = X.
From geometric considerations, the existence and uniqueness of the solution are guaranteed in the cases involved in this paper. Some properties of Γ_A[X] can be found in Appendix A. More details about the Sylvester equation are presented in [17].
We recall an algorithm for solving this kind of Sylvester equation, which gives an explicit expression of the solution depending only on an eigenvalue decomposition. More details can be found in [10].
Algorithm 1 will be used frequently in what follows, and it helps us to comprehend the geometry of SPD(n). Note that this algorithm can also be used for general X ∈ M(n). In particular, when X is symmetric (skew-symmetric), Γ_A[X] is also symmetric (skew-symmetric). Moreover, the algorithm simplifies when A is diagonal.
Algorithm 1 Solution of Sylvester Equation.
Input: A ∈ SPD(n), X ∈ Sym(n)
Output: Γ_A[X]
1:
Eigenvalue decomposition, A = Q Λ Q^T, where Q ∈ O(n) and Λ := diag(λ_1, …, λ_n) contains the eigenvalues of A;
2:
C_X := [c_ij] = Q^T X Q;
3:
E_X = [e_ij], e_ij = c_ij / (λ_i + λ_j);
4:
Γ_A[X] = Q E_X Q^T.
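For readers who wish to experiment numerically, the following Python sketch (ours, not part of the original paper; it assumes NumPy and uses our own function name gamma) implements Algorithm 1 directly from the eigenvalue decomposition.

import numpy as np

def gamma(A, X):
    # Solve the Sylvester equation A K + K A = X for K = Gamma_A[X] (Algorithm 1).
    lam, Q = np.linalg.eigh(A)               # A = Q diag(lam) Q^T for symmetric positive-definite A
    C = Q.T @ X @ Q                          # C_X = Q^T X Q
    E = C / (lam[:, None] + lam[None, :])    # e_ij = c_ij / (lam_i + lam_j)
    return Q @ E @ Q.T                       # Gamma_A[X] = Q E_X Q^T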

2.2. Wasserstein Metric

In this part, we introduce the Wasserstein metric on S P D ( n ) .
Definition 1.
For any A ∈ SPD(n), X, Y ∈ T_A SPD(n), define
g_W|_A(X, Y) = tr(Γ_A[X] A Γ_A[Y]) = (1/2) tr(X Γ_A[Y]).
In the second equality, we use the facts that Γ_A[X], Γ_A[Y], A are all symmetric and that A Γ_A[X] + Γ_A[X] A = X. One can check that g_W is a symmetric, non-degenerate bilinear tensor field on SPD(n) [18]. We call g_W the Wasserstein metric.
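As an illustration only, Definition 1 can be evaluated numerically with the gamma routine sketched after Algorithm 1:

def wasserstein_metric(A, X, Y):
    # g_W|_A(X, Y) = tr(Gamma_A[X] A Gamma_A[Y]) = (1/2) tr(X Gamma_A[Y])
    return 0.5 * np.trace(X @ gamma(A, Y))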
Denote by g_E(X̃, Ỹ) := tr(X̃^T Ỹ), for Ã ∈ GL(n) and X̃, Ỹ ∈ T_Ã GL(n), the Euclidean metric on GL(n). Then, we have the following conclusions.
Proposition 1.
The projection
σ : (GL(n), g_E) → (SPD(n), g_W),  Ã ↦ Ã^T Ã
is a Riemannian submersion [19], which means that d σ is surjective and
g E ( X ˜ , Y ˜ ) = g W ( d σ ( X ˜ ) , d σ ( Y ˜ ) ) = g W ( X , Y ) .
Remark 1.
The general linear group with the Euclidean metric, (GL(n), g_E), together with the projection σ, is a trivial principal bundle over SPD(n) with the orthogonal group O(n) as the structure group. This bundle structure establishes two facts [10]: SPD(n) ≅ GL(n)/O(n), and g_E remains invariant under the group action of O(n).
Before giving more conclusions, we review some concepts. For any Ã ∈ GL(n), we say that a tangent vector Ṽ ∈ T_Ã GL(n) is vertical if dσ(Ṽ) = 0, and that W̃ ∈ T_Ã GL(n) is horizontal if g_E|_Ã(Ṽ, W̃) = 0 for all vertical vectors Ṽ. In addition, if dσ(X̃) = X ∈ T_A SPD(n), we call X̃ a lift of X, where A = σ(Ã). Using the notation Γ_A[X], we can find the horizontal lift of X ∈ T_A SPD(n).
Proposition 2.
For any Ã ∈ (GL(n), g_E), A = σ(Ã) and any X ∈ T_A SPD(n), there is a unique horizontal lift X̃ of X in T_Ã GL(n), namely
X̃ = Ã Γ_A[X].
Using Proposition 2, we can obtain the representations of horizontal and vertical vectors.
Proposition 3.
For any Ã ∈ (GL(n), g_E), T_Ã GL(n) has the orthogonal decomposition
T_Ã GL(n) = H(Ã) ⊕ V(Ã),
where H(Ã) consists of all horizontal vectors, V(Ã) consists of all vertical vectors, and
H(Ã) = { Ã K | K^T = K },  V(Ã) = { S Ã | S^T = −S }.
Proofs of Propositions 2 and 3 can be found in [10].

2.3. Symmetry

Now we study the symmetry of (SPD(n), g_W). Consider the orthogonal group action Ψ : O(n) × SPD(n) → SPD(n) defined by
Ψ_O(A) = O A O^T,  O ∈ O(n), A ∈ SPD(n).
One can check that Ψ is a group action of O(n) and that dΨ_O is an isometry for every O ∈ O(n), which means that O(n) is isomorphic to a subgroup of the isometry group ISO(SPD(n), g_W). Precisely, we have
{Ψ_O}_{O ∈ O(n)} ⊂ ISO(SPD(n), g_W).
According to (10), when we study local geometric characteristics, we only need to consider sorted diagonal matrices as representative elements under the orthogonal action, rather than all general points of SPD(n). Therefore, some pointwise quantities, such as the scalar curvature and the bounds of sectional curvatures, depend only on the eigenvalues.
At the end of this part, we give the symmetry degree of ( S P D ( n ) , g W ) , which is defined by the dimension of I S O ( S P D ( n ) , g W ) .
Proposition 4.
(SPD(n), g_W) has its symmetry degree controlled by
(1/2)(n − 1)n ≤ dim(ISO(SPD(n), g_W)) ≤ (1/8)(n − 1)n(n + 1)(n + 2) + 1.
Proof. 
The well-known interval theorem [20] on isometry groups shows that, for any m-dimensional Riemannian manifold with m ≠ 4, there exists no isometry group whose dimension lies strictly between m(m − 1)/2 + 1 and m(m + 1)/2.
On the one hand, for an m-dimensional Riemannian manifold, the dimension of the isometry group achieves its maximum if and only if the manifold has constant sectional curvature. However, we will show later that (SPD(n), g_W) does not have constant sectional curvature, which means its symmetry degree is less than the highest possible one.
On the other hand, Equation (10) shows that the dimension of the Wasserstein isometry group is at least the dimension of O(n). Therefore, by dim(SPD(n)) = (n² + n)/2 and dim(O(n)) = (n² − n)/2, we obtain the desired result.    □

3. Wasserstein Geometry of S P D ( n )

3.1. Geodesic

In this part, we give the expressions of geodesics on (SPD(n), g_W), including the geodesic joining two points and the geodesic with initial values. Then, we show that the whole Riemannian manifold (SPD(n), g_W) is geodesically convex, which means that we can always find a minimal geodesic joining any two points. To some extent, geodesic convexity makes up for the incompleteness of (SPD(n), g_W).
To prove the geodesic convexity of ( S P D ( n ) , g W ) , we need the following theorem.
Theorem 1.
For any A_1, A_2 ∈ SPD(n), let Ã_1 = A_1^{1/2} be the fixed lift of A_1. Then there exists a lift of A_2,
Ã_2 = A_1^{−1/2} (A_1 A_2)^{1/2},
such that the line segment γ̃(t) = t Ã_2 + (1 − t) Ã_1, t ∈ [0, 1], is horizontal and non-degenerate.
Proof of Theorem 1 can be found in Appendix B. This theorem brings some geometrical and physical facts.
Corollary 1.
(Geodesic convexity) (SPD(n), g_W) is a geodesically convex Riemannian manifold: between any two points A_1, A_2 ∈ SPD(n), there exists a minimal Wasserstein geodesic
γ(t) = (1 − t)² A_1 + t(1 − t) [ (A_1 A_2)^{1/2} + (A_2 A_1)^{1/2} ] + t² A_2,
which lies strictly inside SPD(n).
A similar expression for the geodesic can also be found in [14,15].
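A direct numerical evaluation of the minimal geodesic of Corollary 1 can be sketched as follows (an illustration assuming NumPy and SciPy's matrix square root, not the authors' implementation):

import numpy as np
from scipy.linalg import sqrtm

def wasserstein_geodesic(A1, A2, t):
    # Minimal Wasserstein geodesic between A1, A2 in SPD(n) (Corollary 1).
    S12 = np.real(sqrtm(A1 @ A2))            # (A1 A2)^{1/2}; real part discards numerical round-off
    S21 = np.real(sqrtm(A2 @ A1))            # (A2 A1)^{1/2}
    return (1 - t) ** 2 * A1 + t * (1 - t) * (S12 + S21) + t ** 2 * A2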
Theorem 2.
For any two points of (SPD(n), g_W), there exists a unique minimal geodesic joining them. From a geometric viewpoint, there is no cut locus along any geodesic.
Proof. 
We have proved the existence of a minimal geodesic joining any two points in Corollary 1. Assume that there exist two minimal geodesics joining A, B ∈ SPD(n). Fix Ã = A^{1/2} as the horizontal lift of A and lift these two geodesics horizontally; we then obtain two horizontal lifts of B. Denote these two lifts by B̃_1 = B^{1/2} Q_1 and B̃_2 = B^{1/2} Q_2 with Q_1, Q_2 ∈ O(n). Then, Q_1 and Q_2 are both solutions of the optimization problem
argmin_{Q ∈ O(n)} d_F(A^{1/2}, B^{1/2} Q).
By the compactness of O(n) a minimizer exists, and since A^{1/2} and B^{1/2} are nonsingular, this orthogonal Procrustes problem has a unique solution. Thus, the minimal geodesic is unique.    □
Remark 2.
Due to the nonexistence of cut locus, there exists no conjugate pair on ( S P D ( n ) , g W ) .

3.2. Exponential Map

Following Lemma A1, we can directly write down the Wasserstein logarithm on SPD(n): for any A_1 ∈ SPD(n), log_{A_1} : SPD(n) → T_{A_1} SPD(n) is given by
log_{A_1} A_2 = dσ|_{A_1^{1/2}} ( γ̃̇(0) ) = (A_1 A_2)^{1/2} + (A_2 A_1)^{1/2} − 2 A_1.
By solving the inverse problem of the above equation, we obtain the expression of the Wasserstein exponential.
Theorem 3.
In a small open ball B(0, ε), ε > 0, in T_A SPD(n) ≅ R^{n(n+1)/2}, the Wasserstein exponential at A, exp_A : B(0, ε) → SPD(n), is explicitly expressed by
exp_A X = A + X + Γ_A[X] A Γ_A[X].
Proof. 
By choosing normal coordinates [21] at A, there always exists a neighborhood where exp_A is well-defined. From (15), assuming exp_A X is well-defined, it satisfies
(A exp_A X)^{1/2} + (exp_A X · A)^{1/2} = X + 2A.
This equation can be converted to a Sylvester equation, and we can express its solution as
(A exp_A X)^{1/2} + (exp_A X · A)^{1/2} = X + 2A
⟹ A ( A^{−1} (A exp_A X)^{1/2} ) + ( A^{−1} (A exp_A X)^{1/2} ) A = X + 2A
⟹ A Γ_A[X + 2A] = (A exp_A X)^{1/2}.
Therefore, we have
exp_A X = Γ_A[X + 2A] A Γ_A[X + 2A] = A + X + Γ_A[X] A Γ_A[X],
which finishes this proof.    □
Remark 3.
We call the first two terms on the right-hand side, A + X, the Euclidean exponential, and the last term, Γ_A[X] A Γ_A[X], the Wasserstein correction for this curved manifold.
Corollary 2.
The geodesic with initial conditions γ(0), γ̇(0) on (SPD(n), g_W) has the explicit expression
γ(t) = γ(0) + t γ̇(0) + t² Γ_{γ(0)}[γ̇(0)] γ(0) Γ_{γ(0)}[γ̇(0)],  t ∈ (−ε, ε).
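For completeness, here is a sketch of the exponential map (16) and of the initial-value geodesic of Corollary 2, reusing the gamma function from the sketch after Algorithm 1 (illustrative only):

def wasserstein_exp(A, X):
    # exp_A(X) = A + X + Gamma_A[X] A Gamma_A[X] (Theorem 3)
    G = gamma(A, X)
    return A + X + G @ A @ G

def geodesic_ivp(A, X, t):
    # gamma(t) = exp_A(t X); valid while I + t Gamma_A[X] remains nonsingular (see Theorem 5)
    return wasserstein_exp(A, t * X)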
Using the exponential map, one can directly construct Jacobi fields from a geodesic variation.
Theorem 4.
Along a geodesic γ(t) with γ(0) = A ∈ SPD(n), γ̇(0) = X ∈ T_A SPD(n), there exists a unique normal Jacobi field J(t) with initial conditions J(0) = 0, ∇_{γ̇(0)} J(0) = Y ∈ T_A SPD(n), where g_W|_A(X, Y) = 0. It is given by
J(t) = t Y + t² ( Γ_A[X] A Γ_A[Y] + Γ_A[Y] A Γ_A[X] ).
As in [18], J(t) is constructed by
J(t) := ∂/∂s |_{s=0} exp_A( t(X + sY) );
substituting (16) into (22), Theorem 4 follows by direct computation.
The next natural question is how far a geodesic can be extended, which is equivalent to asking for the largest domain of the exponential map. We still focus on diagonal matrices.
Theorem 5.
For any A ∈ SPD(n) and X ∈ T_A SPD(n), exp_A(tX) : [0, ε) → SPD(n) is well-defined if and only if ε ≤ ε_max, where
ε_max = −1/λ_min if λ_min < 0,  and ε_max = +∞ if λ_min ≥ 0,
and λ_min is the minimal eigenvalue of Γ_A[X].
Proof. 
Evidently, ε_max = min { s > 0 | det( exp_A(sX) ) = 0 }. By (19), we have
det( exp_A(sX) ) = det(A) det²( I + s Γ_A[X] ) = 0  ⟺  s = −1/λ(Γ_A[X]),
where λ(Γ_A[X]) denotes an eigenvalue of Γ_A[X]. Thus, ε_max = min { −1/λ(Γ_A[X]) > 0 }.    □
Corollary 3.
The Wasserstein metric g W on S P D ( n ) is incomplete.
Corollary 3 follows directly from the Hopf–Rinow theorem [22]. Theorem 5 and the next theorem help us to comprehend the size of (SPD(n), g_W) from the viewpoint of each point.
Figure 1 shows geodesics starting from different origins on S P D ( 2 ) . From this group of pictures, we can observe the outline of the manifold and some behaviors of geodesics.
Using ε max , we can obtain the injectivity radius r ( A ) , A S P D ( n ) . Geometrically speaking, r ( A ) is the maximal radius of the ball in which exp A is well-defined.
Theorem 6.
The Wasserstein injectivity radius r : SPD(n) → (0, +∞) is given by
r(A) = √( λ_min(A) ),
and the function r ( A ) is continuous.
Proof of Theorem 6 can be found in Appendix C. Due to geodesic convexity, this radius is in fact the Wasserstein distance from a point of SPD(n) to the 'boundary' of the manifold. It also measures, through λ_min, how close a symmetric positive-definite matrix is to being degenerate.
Figure 2 shows three maximal geodesic balls with different centers on SPD(2). From the viewpoint of R³, the three balls have different sizes in the sense of Euclidean distance, but on (SPD(2), g_W), all of them have the same radius.

3.3. Connection

In this section, we study the Riemannian connection of (SPD(n), g_W), called the Wasserstein connection. The flatness of (GL(n), g_E) and the structure of the Riemannian submersion bring considerable convenience to our work. During computations, we denote both the tensor action of g_W on SPD(n) and that of g_E on GL(n) by ⟨·,·⟩. We denote the Euclidean connection by D and the Wasserstein connection by ∇.
The main idea to express the Wasserstein connection is to compute the horizontal term of the Euclidean covariant derivative of lifted vector fields. We shall prove:
Lemma 1.
The Euclidean connection is a lift of the Wasserstein connection: for any smooth vector fields X and Y on SPD(n) with horizontal lifts X̃ and Ỹ, the following equation holds:
dσ|_Ã( D_X̃ Ỹ ) = ∇_X Y,  ∀ Ã ∈ GL(n).
Proof of Lemma 1 can be found in Appendix D. This lemma holds for general Riemannian submersions. The reason we reprove it for our case is that we will need some intermediate results of the proof later. Using Lemma 1, we obtain a direct corollary, which is one of the essential results of this paper.
Corollary 4.
The Wasserstein connection has an explicit expression:
∇_X Y = dY(X) − Γ_A[X] A Γ_A[Y] − Γ_A[Y] A Γ_A[X],
where dY(X) is the Euclidean directional derivative of Y along X.
Proof. 
From Lemma 1 and (A14) (in Appendix D), we have
∇_X Y = dσ|_Ã( D_X̃ Ỹ ) = Ã^T D_X̃ Ỹ + ( D_X̃ Ỹ )^T Ã
= dY(X) − ( X Γ_A[Y] + Γ_A[Y] X ) + A Γ_A[X] Γ_A[Y] + Γ_A[Y] Γ_A[X] A
= dY(X) − Γ_A[X] A Γ_A[Y] − Γ_A[Y] A Γ_A[X].
The linearity, Leibniz rule and symmetry of the Wasserstein connection are easily checked from this expression.    □
The vertical component of the lifted covariant derivative of Y along X is a vector field on GL(n) whose value at Ã is defined by
T_Ã(X, Y) := D_X̃ Ỹ − (∇_X Y)~.
We say T A ˜ is an A -tensor. The whole vector field is denoted as T ( X , Y ) . By definition and previous results, we can obtain the expression of T A ˜ ( X , Y ) .
Proposition 5.
T_Ã(·,·) is an antisymmetric bilinear map T_A SPD(n) × T_A SPD(n) → T_Ã GL(n), and it satisfies
T_Ã(X, Y) = Ã Γ_A[ Π_A[X, Y] ] A.
Proof. 
Using (A12) (in Appendix D), (6) and (27), we have
T_Ã(X, Y) = Ã( Γ_A[dY(X)] − Γ_A[X Γ_A[Y] + Γ_A[Y] X] + Γ_A[X] Γ_A[Y] ) − Ã Γ_A[ dY(X) − Γ_A[X] A Γ_A[Y] − Γ_A[Y] A Γ_A[X] ]
= Ã( − Γ_A[A Γ_A[X] Γ_A[Y]] − Γ_A[Γ_A[Y] Γ_A[X] A] + Γ_A[X] Γ_A[Y] )
= Ã( Γ_A[X] Γ_A[Y] − A Γ_A[Γ_A[X] Γ_A[Y]] − Γ_A[Γ_A[Y] Γ_A[X]] A )
= Ã Γ_A[ Γ_A[X] Γ_A[Y] − Γ_A[Y] Γ_A[X] ] A = Ã Γ_A[ Π_A[X, Y] ] A,
where (31) shows that T_Ã(X, Y) depends only on Ã and the vectors in T_A SPD(n). The multilinearity and the identity T_Ã(X, Y) = −T_Ã(Y, X) are easily checked.    □
Recalling (A14) in Appendix D, we also find that
[ X ˜ , Y ˜ ] = [ X , Y ] ˜ + 2 T ( X , Y ) .
In the following parts, we will show that the tensor T(X, Y) plays a significant role in computing the curvature.

3.4. Curvature

In this part, we study the curvature of (SPD(n), g_W). Although there exist some relevant results giving abstract expressions in general settings, we obtain simpler expressions and derive the scalar curvature via a special basis.

3.4.1. Riemannian Curvature Tensor

First, we derive the Riemannian curvature of (SPD(n), g_W). We denote the Euclidean curvature on the total space of the bundle (which vanishes identically) by R̃, and the Wasserstein (Riemannian) curvature on (SPD(n), g_W) by R.
Theorem 7.
For any A ∈ SPD(n) and smooth vector fields X, Y on SPD(n), the Wasserstein curvature tensor R(X, Y, X, Y) := ⟨R_{XY}X, Y⟩|_A at A has the explicit expression
R(X, Y, X, Y) = 3 tr( Γ_A[X] A Γ_A[ Γ_A[X] Γ_A[Y] − Γ_A[Y] Γ_A[X] ] A Γ_A[Y] ).
Proof of Theorem 7 can be found in Appendix E. The expression
R(X, Y, X, Y) = ‖T(X, Y)‖²
has been derived before by other research groups in a similar way. However, here we calculate the curvature tensor in another way and find a more explicit expression, which is easier to use than expanding ‖T(X, Y)‖² directly. In addition, from ‖T(X, Y)‖² ≥ 0 and (A23) (in Appendix E), we obtain the following corollary.
Corollary 5.
(SPD(n), g_W) has non-negative curvatures; namely,
R(X, Y, X, Y) ≥ 0.
By solving the Sylvester equation with Algorithm 1, we can simplify this expression. The sectional curvature K of the section span{X(A), Y(A)} is
K|_A(X, Y) = R(X, Y, X, Y) / ( ⟨X, X⟩⟨Y, Y⟩ − ⟨X, Y⟩² )
= 12 tr( E_X Λ Γ_Λ[ E_X E_Y − E_Y E_X ] Λ E_Y ) / ( tr(E_X C_X) tr(E_Y C_Y) − tr²(E_X C_Y) ),
where we use the same notations as in Algorithm 1. In particular, in the diagonal case, we observe that the sectional curvature obeys the inverse-ratio law
K|_{kΛ}(X, Y) = (1/k) K|_Λ(X, Y),  k ∈ R ∖ {0}.
These results agree with our visual impression of (SPD(n), g_W) in Figure 1, where the manifold becomes flatter as k increases.
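The curvature tensor of Theorem 7 and the sectional curvature above can be evaluated numerically with the gamma and wasserstein_metric sketches given earlier (again our own illustrative code, not the authors'):

def curvature_tensor(A, X, Y):
    # R(X, Y, X, Y) = 3 tr(Gamma_A[X] A Gamma_A[Pi_A[X, Y]] A Gamma_A[Y]) (Theorem 7)
    GX, GY = gamma(A, X), gamma(A, Y)
    Pi = GX @ GY - GY @ GX                   # Pi_A[X, Y]
    return 3 * np.trace(GX @ A @ gamma(A, Pi) @ A @ GY)

def sectional_curvature(A, X, Y):
    # K(X, Y) = R(X, Y, X, Y) / (<X, X><Y, Y> - <X, Y>^2)
    gXX = wasserstein_metric(A, X, X)
    gYY = wasserstein_metric(A, Y, Y)
    gXY = wasserstein_metric(A, X, Y)
    return curvature_tensor(A, X, Y) / (gXX * gYY - gXY ** 2)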

3.4.2. Sectional Curvature

Now, we derive more explicit expressions for the sectional curvature and the scalar curvature. As usual, we only need to consider the diagonal case. Before that, we introduce a basis of Sym(n), which is the tangent space of SPD(n). Define {S^{p,q}} by
S^{p,q} = [S^{p,q}_{ij}],  S^{p,q}_{ij} = δ_ip δ_jq + δ_iq δ_jp,
where the superscripts p, q mark the nonzero entries of S^{p,q} and δ is the Kronecker delta. Clearly, {S^{p,q} | 1 ≤ p ≤ q ≤ n} forms a basis of Sym(n). For simplicity, we sometimes write S^{p,q}, S^{r,t} as S_1, S_2, respectively. In this way, we can express the curvature in this basis.
By direct calculation, we have
( S 1 S 2 ) i j = k = 1 n S i k p , q S k j r , t = δ i p δ q t δ j r + δ i p δ q r δ j t + δ i q δ p t δ j r + δ i q δ p r δ j t .
By Algorithm 1, we know that E_S = [ S_ij / (λ_i + λ_j) ] = Γ_Λ[S] for S ∈ T_Λ SPD(n) (with Q = I in the decomposition of Λ). Note that the entries of Λ, S_1, S_2 are all non-negative; therefore, we have
E_{S_1} E_{S_2} ≠ 0  ⟺  S_1 S_2 ≠ 0  ⟺  {p, q} ∩ {r, t} ≠ ∅.
According to the anti-symmetry of the curvature tensor, a non-vanishing curvature also requires {p, q} ≠ {r, t}. Moreover, by definition, S^{p,q} = S^{q,p} and S^{r,t} = S^{t,r}. Without loss of generality, we therefore only need to consider the particular case
p = r,  q ≠ t.
Theorem 8.
For any diagonal matrix Λ = diag(λ_1, …, λ_n) ∈ SPD(n) with λ_1 ≥ λ_2 ≥ ⋯ ≥ λ_n, the Wasserstein sectional curvature satisfies
K|_Λ(S_1, S_2) = 3 (1 + δ_pq)(1 + δ_pt) λ_q λ_t / [ (λ_p + λ_q)(λ_p + λ_t)(λ_q + λ_t) ],
where S_1 = S^{p,q}, S_2 = S^{r,t}, p = r, q ≠ t.
Proof of Theorem 8 can be found in Appendix F. From the above expression for sectional curvatures, we see that the sectional curvature is controlled by the second smallest eigenvalue, which implies that the curvature will seldom blow up even on an almost degenerate domain. Only when the matrices degenerate in two or more directions do the curvatures become very large. This phenomenon ensures that the curvature information remains meaningful in most applications. Some examples of this phenomenon will be seen later.

3.4.3. Scalar Curvature

In the last part of this section, we calculate the scalar curvature directly.
Theorem 9.
For any A ∈ SPD(n), its scalar curvature ρ(A) is
ρ(A) = 3 tr( U Λ (U + U^T) + (U + U^T) Λ U + (U + U^T) Λ U Λ (U + U^T) ),
where the diagonal matrix Λ = diag(λ_1, …, λ_n) is orthogonally similar to A, and U is the strictly upper triangular matrix with entries U_ij = 1/(λ_i + λ_j) for i < j.
Proof of Theorem 9 can be found in Appendix G. Figure 3 presents some examples of scalar curvatures on (SPD(2), g_W), which illustrate the argument at the end of Section 3.4.2.
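Theorem 9 translates into a few lines of code. The sketch below (ours, assuming NumPy and reusing np from the earlier snippets) evaluates ρ(A) from the eigenvalues of A; the AWCD sketch in Section 4 reuses it.

def scalar_curvature(A):
    # Wasserstein scalar curvature rho(A) as in Theorem 9.
    lam = np.linalg.eigvalsh(A)                          # eigenvalues of A
    n = lam.size
    U = np.zeros((n, n))
    iu = np.triu_indices(n, k=1)
    U[iu] = 1.0 / (lam[iu[0]] + lam[iu[1]])              # strictly upper-triangular 1/(lam_i + lam_j)
    L, S = np.diag(lam), U + U.T
    return 3.0 * np.trace(U @ L @ S + S @ L @ U + S @ L @ U @ L @ S)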

4. Point Cloud Denoising

Denoising, or outlier removal, is a fundamental step of point cloud preprocessing, since real-world data are often polluted by noise. There is an immense literature on point cloud denoising, and widely used algorithms are packaged as built-in functions of software. For example, PCL [16] is a popular platform for point cloud processing that collects four denoising schemes. However, these methods fail to give satisfactory performance when point clouds are polluted by high-density noise. To solve this problem, we consider both the statistical and the geometrical structure of the data and design a new algorithm.
The idea is that, by embedding the original point cloud from Euclidean space into SPD(n), the Wasserstein scalar curvature gives essential information separating noise from true data. Therefore, our new algorithm mainly contains two steps: first, we obtain the desired embedding by fitting a Gaussian distribution locally at each point; then, we identify noise by inspecting the histogram of the Wasserstein scalar curvature. Due to the flatness of the region occupied by noise, it is reasonable to classify points with small curvature as noise. The threshold is set to be the first local minimum of the histogram. We call this new scheme adaptive Wasserstein curvature denoising (AWCD).
In the following, we introduce two traditional denoising methods called radius outlier removal (ROR) and statistical outlier removal (SOR). Then, we explain details about AWCD. Additionally, we carry out experiments using different datasets, with a comparison to two classical methods. From the experimental results, AWCD presents better performance regardless of the data size and the density of noise. We also give a time complexity analysis for each denoising algorithm. The results show that AWCD is as efficient as other classical methods. Thus, it is applicable in many practical tasks.

4.1. Radius Outlier Removal

In Radius Outlier Removal, abbreviated ROR (see Algorithm 2), points are clustered into two categories according to their local density: points with low density tend to be recognized as noise, whereas points with high density are recognized as true data. ROR requires two parameters: a preset radius d for the local neighborhoods and a minimum number α of points in each neighborhood.
Algorithm 2 Radius Outlier Removal.
Input: initial point cloud D 0 , parameters d, α
Output: cleaned point cloud D 1
1:
search the d-radius neighborhood N_i of each point P_i, where
N_i = { N_ij ∈ D_0 | ‖N_ij − P_i‖ ≤ d };
2:
if the number of neighbors |N_i| ≥ α, then put P_i into D_1;
3:
return D 1 .
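A brute-force Python sketch of Algorithm 2 (ours; a KD-tree or the PCL implementation would be used in practice):

import numpy as np

def radius_outlier_removal(points, d, alpha):
    # Keep points that have at least alpha neighbors within radius d (Algorithm 2).
    diff = points[:, None, :] - points[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)                 # pairwise distances, O(n^2) memory
    counts = (dist <= d).sum(axis=1) - 1                 # exclude the point itself
    return points[counts >= alpha]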
As an illustration, we add uniform noise to the Stanford Bunny with 10,000 points (see Figure 4).
Then, we apply ROR to denoise the polluted point cloud. The result is shown in Figure 5. From a visual inspection, ROR preserves almost all true points but fails to remove a small portion of the noise spread over the whole domain.
In fact, a series of repeated experiments shows that ROR is sensitive to the choice of manual parameters. A small radius makes ROR ineffective, while a large radius wrongly recognizes true points as noise. One disadvantage of ROR is that there exists no universal method to determine the best parameters. Furthermore, since ROR must find all neighbors within a fixed radius, whose number is not known in advance, the time complexity can reach O(n²), where n is the number of points. Thus, in practice, it is often difficult to make a trade-off between the efficiency and the effectiveness of ROR.

4.2. Statistical Outlier Removal

Compared to ROR, Statistical Outlier Removal (SOR) considers more detailed local structure than density alone. SOR, shown in Algorithm 3, is one of the most popular methods for preprocessing point clouds due to its efficiency when dealing with low-density noise. However, SOR performs worse than ROR when the noise has high density. The main idea of SOR comes from the one-sigma law of classical statistics [23]. An outlier is believed to be far from the center of its k-nearest neighborhood; conversely, a true point should lie in a confidence area of its neighborhood. Let Φ be a d-variate Gaussian distribution with expectation μ and covariance Σ, and let P be a fixed point in R^d. Then, Φ induces a Gaussian distribution on the line { μ + t v_P | t ∈ R }, where v_P = (P − μ)/‖P − μ‖. Write the eigendecomposition of Σ as Σ = E diag(σ_1², …, σ_d²) E^T, where E = [e_1, …, e_d] is an orthogonal matrix. If we write
v_P = Σ_{i=1}^{d} λ_i e_i,
the projected Gaussian distribution in direction v_P has zero mean and variance Σ_{i=1}^{d} λ_i² σ_i². According to the one-sigma law, we say P is in the confidence area of Φ if
‖P − μ‖² ≤ Σ_{i=1}^{d} λ_i² σ_i²,
which is equivalent to
(P − μ)^T Σ (P − μ) ≥ ‖P − μ‖⁴.
This inequality is a generalization of one-sigma law in high dimensions.
Algorithm 3 Statistical Outlier Removal.
Input: initial point cloud D 0 , parameter k
Output: cleaned point cloud D 1
1:
search kNN N i for each point P i ;
2:
compute local mean and local covariance
μ_i = (1/k) Σ_{j=1}^{k} N_ij,  Σ_i = (1/(k − 1)) Σ_{j=1}^{k} (N_ij − μ_i)^T (N_ij − μ_i);
3:
if (P_i − μ_i)^T Σ_i (P_i − μ_i) ≥ ‖P_i − μ_i‖⁴, then put P_i into D_1;
4:
return D 1 .
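A minimal sketch of Algorithm 3 (ours, with brute-force kNN for clarity; the test is the generalized one-sigma criterion (46)):

def statistical_outlier_removal(points, k):
    # Keep points lying in the generalized one-sigma region of their k nearest neighbors.
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    keep = np.zeros(len(points), dtype=bool)
    for i, p in enumerate(points):
        nb = points[np.argsort(dist[i])[1:k + 1]]        # k nearest neighbors, excluding p itself
        mu, cov = nb.mean(axis=0), np.cov(nb, rowvar=False)
        v = p - mu
        keep[i] = v @ cov @ v >= np.dot(v, v) ** 2       # (P - mu)^T Sigma (P - mu) >= |P - mu|^4
    return points[keep]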
Thus, SOR consists of three steps: first we search the k-nearest neighbors (kNN) for every point. Then, we compute the empirical mean and covariance under the assumption of Gaussian distribution for each neighborhood. Finally, true points are identified using (46). SOR requires a single parameter k for kNN. Again, as an illustration, we use the data in Figure 4. After SOR, the result is shown in Figure 6.
We use KD-tree in kNN search. Thus, the time complexity is known as O ( k n log n ) where k is the number of neighbors and n is the number of points. The remaining steps are finished in O ( n ) time. Therefore, the total time complexity is O ( k n log n ) .

4.3. Adaptive Wasserstein Curvature Denoising

Note that the key step of SOR is to compute the local covariance, which is a positive-definite matrix. Motivated by this, we extract the covariance matrix at each point, which amounts to embedding the original point cloud into SPD(n). Intuitively, since true data present a particular pattern, their covariance matrices should have large Wasserstein curvature; conversely, the covariance matrices of noise form a flat region. Hence, AWCD is based on the principal hypothesis that the Wasserstein curvature of true data is larger than that of noise.
Under this hypothesis, what we need to do is to set a threshold that picks out points with small Wasserstein curvature. To do so, we gather all the information in a histogram counting the number of points with each curvature value. By the continuity of the curvature function, true data and noise form two different 'hills'. Figure 7 shows an example of such a histogram.
The phase change happens at the borderline between the two hills, i.e., we seek the first local minimum of the histogram. In Figure 7, this critical value is annotated as the 'marked curvature'. In this way, we do not need to set the threshold manually and instead achieve an adaptive selection process. Algorithm 4 shows the procedure of this adaptive denoising via Wasserstein curvature.
Algorithm 4 Adaptive Wasserstein Curvature Denoising.
Input: initial point cloud D 0 , parameter k
Output: cleaned point cloud D 1
1:
search kNN neighbors N i for each point P i ;
2:
compute local mean and local covariance as before;
3:
compute Wasserstein curvature ρ ( Σ ) as (43);
4:
construct curvature histogram and determine the marked curvature ρ 0 ;
5:
if ρ(Σ_i) ≥ ρ_0, then put P_i into D_1;
6:
return D 1 .
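A sketch of Algorithm 4, reusing the scalar_curvature function from Section 3.4.3 (ours; the histogram-valley search below, the bin count, and the small ridge are our own simplified choices for the adaptive threshold):

def awcd(points, k, bins=50):
    # Adaptive Wasserstein curvature denoising (Algorithm 4).
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    rho = np.empty(len(points))
    for i in range(len(points)):
        nb = points[np.argsort(dist[i])[1:k + 1]]
        cov = np.cov(nb, rowvar=False) + 1e-12 * np.eye(points.shape[1])  # ridge keeps cov in SPD(n)
        rho[i] = scalar_curvature(cov)
    hist, edges = np.histogram(rho, bins=bins)
    # marked curvature: first local minimum of the histogram (the valley between the two hills)
    j = next((j for j in range(1, bins - 1)
              if hist[j] <= hist[j - 1] and hist[j] <= hist[j + 1]), bins // 2)
    rho0 = edges[j + 1]
    return points[rho >= rho0]                           # keep high-curvature (true-data) points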
We use the same example as in Figure 4. The performance of AWCD is shown in Figure 8.
In this example, AWCD removes almost all noise far from the Stanford Bunny and retains almost all true data. The only problem is that a small portion of noise lying close to the surface of the model causes false positives, and some true data located on flat parts are wrongly removed.
Since the main step of AWCD is also a kNN search, its time complexity is the same as that of SOR, namely O(k n log n); therefore, AWCD is applicable in practice. It is remarkable that AWCD is effective for data with dense noise and robust to its single parameter k.

4.4. Experiments

We use ROR, SOR and AWCD to denoise polluted data sets with noise of different densities. The point clouds are from the Stanford 3D scanning repository, including Stanford Bunny, Duke Dragon, Armadillo, Lucy and Happy Buddha. For each data set, we add noise and record its signal-noise ratio (SNR). To show the influence of the data size, we downsample the original data sets to different scales.
We adopt three criteria to measure the performance of the algorithms: the true positive rate (TPR), the false positive rate (FPR) and the signal-noise ratio growth (SNRG). TPR describes how accurately the true points of the unpolluted data set are preserved; FPR is the fraction of noise that survives the denoising; SNRG measures the improvement of the SNR after processing. For a polluted point cloud D_0 = D ∪ N, where D is the set of true data points and N is the set of noise, let D_1 be the cleaned point cloud returned by a denoising algorithm. These measurements are computed as
TPR = |D_1 ∩ D| / |D|,  FPR = |D_1 ∩ N| / |N|,  SNRG = ( |D_1 ∩ D| / |D_1 ∩ N| ) · ( |N| / |D| ) − 1,
where |·| denotes the cardinality of a finite set. Intuitively, a higher TPR and SNRG and a lower FPR mean better performance. The experimental results are shown in Table A1 in Appendix I. In each experiment, we highlight the lowest FPR, the highest TPR and any SNRG over 99%. Table A1 shows the superiority of AWCD over ROR and SOR. In general, AWCD removes almost all noise while preserving the true data, except for Armadillo.
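For reproducibility, the three criteria can be computed as below (our sketch; the index sets identify which input points are true data, noise, and kept by the algorithm):

def denoising_metrics(kept_idx, true_idx, noise_idx):
    # TPR, FPR and SNRG of a denoising result, following the formulas above.
    kept, true, noise = set(kept_idx), set(true_idx), set(noise_idx)
    tpr = len(kept & true) / len(true)                   # fraction of true data preserved
    fpr = len(kept & noise) / len(noise)                 # fraction of noise surviving the filter
    snrg = (len(kept & true) / len(kept & noise)) * (len(noise) / len(true)) - 1 \
        if kept & noise else float("inf")                # all noise removed: SNR grows without bound
    return tpr, fpr, snrg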

5. Edge Detection

In this part, we apply the Wasserstein curvature to detect the edges of noisy images. This application follows the idea that edge regions contain more local information, while relatively flat regions locally resemble white noise; hence, the Wasserstein curvature has a natural advantage in depicting local information. This leads to the Wasserstein sectional curvature edge detection (WSCED) of Algorithm 5.
Algorithm 5 Wasserstein sectional curvature edge detection.
Input: initial grayscale F 0 with pixels of n × m , parameter k
Output: edge figure F e
1:
search kNN N i j for each point P i j ;
2:
compute every local covariance Σ_ij to obtain the covariance image CI, which is an (n − 2k) × (m − 2k) array whose entries are the matrices Σ_ij;
3:
determine the section σ_ij := X_ij ∧ Y_ij at every point Σ_ij of CI by computing the tangent vectors X_ij = Σ_{(i+1)j} − Σ_{(i−1)j}, Y_ij = Σ_{i(j+1)} − Σ_{i(j−1)};
4:
compute the Wasserstein sectional curvature K_W|_{Σ_ij}(σ_ij) with (36) to obtain the curvature image F_e, which is an (n − 2k) × (m − 2k) real matrix;
5:
return F e .
Similar to what we did in the last section, the first step of WSCED is to compute the local mean and covariance after a kNN search, which can be regarded as a two-dimensional embedding of the image into SPD(n). Every pixel coordinate (i, j) determines a local covariance matrix Σ_ij. In the second step, we compute the sectional curvature at every Σ_ij. The chosen section X_ij ∧ Y_ij is determined by two difference vectors along the x-axis and the y-axis,
X_ij := Σ_{(i+1)j} − Σ_{(i−1)j},  Y_ij := Σ_{i(j+1)} − Σ_{i(j−1)}.
According to (36), we obtain the curvature KI_ij = K_W|_{Σ_ij}(X_ij, Y_ij) at Σ_ij and hence a curvature image KI. Finally, with some appropriate image transformation, we can detect edges on KI.
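One possible reading of Algorithm 5 is sketched below (ours; we take the neighborhood of a pixel to be a (2k+1)×(2k+1) window and represent each pixel by the triple (row, column, intensity), so the local covariances live in SPD(3); sectional_curvature is the sketch from Section 3.4.1):

def wsced(gray, k=1):
    # Wasserstein sectional-curvature edge detection (Algorithm 5), sketched for a grayscale image.
    n, m = gray.shape
    cov = np.empty((n, m, 3, 3))
    for i in range(k, n - k):
        for j in range(k, m - k):
            ii, jj = np.mgrid[i - k:i + k + 1, j - k:j + k + 1]
            patch = gray[i - k:i + k + 1, j - k:j + k + 1]
            samples = np.stack([ii.ravel(), jj.ravel(), patch.ravel()], axis=1)
            cov[i, j] = np.cov(samples, rowvar=False) + 1e-8 * np.eye(3)  # ridge keeps it in SPD(3)
    edge = np.zeros((n, m))
    for i in range(k + 1, n - k - 1):
        for j in range(k + 1, m - k - 1):
            X = cov[i + 1, j] - cov[i - 1, j]            # difference of covariances along the rows
            Y = cov[i, j + 1] - cov[i, j - 1]            # difference of covariances along the columns
            edge[i, j] = sectional_curvature(cov[i, j], X, Y)
    return edge

In nearly uniform regions X and Y can be almost parallel, so in practice one would clip or regularize the denominator of the sectional curvature before thresholding the resulting edge image.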
In simulations, we compare WSCED to traditional edge-detection filters, including Sobel, Prewitt and Laplacian [24]. We detect edges in images corrupted by high-density noise. From Figure 9, we find that WSCED achieves results comparable to the Sobel and Prewitt filters, which suggests a connection between the Wasserstein curvature and edges. This result also shows the robustness of WSCED to noise. We present more experimental results in Figure A1 in Appendix H.

6. Conclusions and Future Work

In this paper, we studied the geometric characteristics of (SPD(n), g_W), including geodesics, the connection, Jacobi fields and curvatures. Compared with existing results, our expressions are simpler in form and better suited to computation. Based on these results, we designed novel algorithms for point cloud denoising and image edge detection. Numerical experiments showed that these geometry-based methods are valid in applications. From both a theoretical and a practical perspective, we gained a more comprehensive understanding of the Wasserstein geometry on SPD(n), which shows that the Wasserstein metric has both deep application potential and mathematical elegance.
In future work, on the one hand, we aim to study Wasserstein geometry on other matrix manifolds, such as the Stiefel manifold [25], the Grassmann manifold [26] and some complex matrix manifolds [27]. On the other hand, we would like to generalize geometry-based methods to solve more problems in image and signal processing [28] and data science.

Author Contributions

Conceptualization, Y.L.; methodology, Y.L.; software, Y.L., Y.C.; validation, S.Z.; investigation, H.S.; resources, H.S.; writing—original draft preparation, Y.L.; writing—review and editing, S.Z., Y.C.; visualization, Y.C.; supervision, H.S.; funding acquisition, H.S. All authors have read and agreed to the published version of the manuscript.

Funding

This subject is supported partially by National Key Research and Development Plan of China (No. 2020YFC2006201).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The point cloud data used in this paper were obtained from the Large Geometric Models Archive of Georgia Tech at https://0-search-crossref-org.brum.beds.ac.uk/funding (accessed on 5 April 2021). The images were obtained from open resources on Google.

Acknowledgments

The authors would like to express their sincere thanks to Chao Qian for his valuable suggestions.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Properties of Γ_A[X]

Here, we list some basic properties, which will be used frequently in the discussions of this paper.
Proposition A1.
For any A, B ∈ SPD(n), X, Y ∈ M(n), Q ∈ O(n), k ∈ R, the following equations hold:
1. Γ_A[X + kY] = Γ_A[X] + k Γ_A[Y];
2. Γ_{kA}[X] = (1/k) Γ_A[X];
3. Γ_{A+B}[X] = Γ_A[X] − Γ_A[ B Γ_{A+B}[X] + Γ_{A+B}[X] B ], provided A + B ∈ SPD(n);
4. Γ_A[A X] = A Γ_A[X], Γ_A[X A] = Γ_A[X] A;
5. Γ_{A^{−1}}[X] = Γ_A[A X A] = A Γ_A[X] A;
6. Γ_{Q A Q^{−1}}[Q X Q^{−1}] = Q Γ_A[X] Q^{−1}.
Proposition A1 underlies the geometry of (SPD(n), g_W) and will be invoked repeatedly in the calculations below. We omit the proofs since these properties are easily checked.
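These identities are also easy to check numerically; the following snippet (illustrative only, reusing the gamma sketch from Algorithm 1) verifies properties 2, 4 and 5 on random inputs.

import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
A = B @ B.T + 4 * np.eye(4)                              # a random SPD matrix
X = rng.standard_normal((4, 4)); X = X + X.T             # a random symmetric matrix

assert np.allclose(gamma(2 * A, X), 0.5 * gamma(A, X))                      # property 2
assert np.allclose(gamma(A, A @ X), A @ gamma(A, X))                        # property 4
assert np.allclose(gamma(np.linalg.inv(A), X), A @ gamma(A, X) @ A)         # property 5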

Appendix B. Proof of Theorem 1

To prove Theorem 1, we shall give two lemmas first.
Lemma A1.
For any Ã_1, Ã_2 ∈ GL(n), let l(t) = t Ã_2 + (1 − t) Ã_1, t ∈ [0, 1]. If l̇(0) is horizontal at T_{Ã_1} GL(n), then l̇(t) is horizontal at T_{l(t)} GL(n) wherever det(l(t)) > 0.
Proof. 
l̇(0) = Ã_2 − Ã_1 is horizontal at T_{Ã_1} GL(n) ⟺ Ã_1^{−1}(Ã_2 − Ã_1) ∈ Sym(n) ⟺ Ã_1^{−1} Ã_2 ∈ Sym(n) ⟺ Ã_2^{−1} Ã_1 ∈ Sym(n) ⟺ Ã_2^{−1}(Ã_2 − Ã_1) ∈ Sym(n) ⟺ l̇(1) is horizontal at T_{Ã_2} GL(n).
For any s ∈ (0, 1) with det(l(s)) ≠ 0, we rescale the line segment by l_s(t) := l(st); the lemma then follows from the horizontality of l̇_s(1). □
Lemma A1 ensures that a line segment is horizontal if its initial vector is horizontal, which brings quite convenience for the following proofs of certain results.
Lemma A2.
For any A_1, A_2 ∈ SPD(n), there exists P = A_1^{−1/2} (A_1 A_2)^{1/2} A_2^{−1/2} ∈ O(n) such that γ̃(t) = t P A_2^{1/2} + (1 − t) A_1^{1/2} satisfies: wherever det(γ̃(t)) > 0, γ̃̇(t) is horizontal at T_{γ̃(t)} GL(n).
Proof. 
First, we have
P^T P = A_2^{−1/2} (A_2 A_1)^{1/2} A_1^{−1} (A_1 A_2)^{1/2} A_2^{−1/2} = A_2^{−1/2} A_1^{−1} (A_1 A_2)^{1/2} (A_1 A_2)^{1/2} A_2^{−1/2} = A_2^{−1/2} A_1^{−1} (A_1 A_2) A_2^{−1/2} = I.
Then, we only need to check that A_1^{−1/2} γ̃̇(0) ∈ Sym(n). In fact, we have
A_1^{−1/2} γ̃̇(0) = A_1^{−1} (A_1 A_2)^{1/2} − I = A_1^{−1/2} ( A_1^{−1/2} (A_1 A_2)^{1/2} A_1^{1/2} ) A_1^{−1/2} − I = A_1^{−1/2} ( A_1^{1/2} A_2 A_1^{1/2} )^{1/2} A_1^{−1/2} − I ∈ Sym(n),
which proves the lemma. □
Remark A1.
Finding the orthogonal matrix P is equivalent to solving a Riccati equation. One can see [29] for details.
To prove Theorem 1, the last step is to establish the non-degeneracy.
Proof .
(Theorem 1) Due to the symmetry from (10), it suffices to prove the diagonal case A_1 = Λ. We shall prove det(γ̃(t)) > 0 for t ∈ (0, 1). We have
det[γ̃(t)] = det[ t Λ^{−1/2} (Λ A_2)^{1/2} + (1 − t) Λ^{1/2} ] = (1 − t)^n det[ I + (t/(1 − t)) Λ^{−1/2} (Λ A_2)^{1/2} Λ^{−1/2} ] det( Λ^{1/2} ).
Since Λ^{−1/2} (Λ A_2)^{1/2} Λ^{−1/2} = (Λ^{1/2} A_2 Λ^{1/2})^{1/2} Λ^{−1} is a product of two symmetric positive-definite matrices, its eigenvalues are all positive. Thus, det[γ̃(t)] > 0 holds for all t ∈ (0, 1). Combining with Lemmas A1 and A2, the proof is complete. □

Appendix C. Proof of Theorem 6

Proof. 
Again, by (10), we have r(A) = r(Λ), where A = Q Λ Q^T. Thus, we focus on r(Λ). Following the definition and the discussion around (23),
r(Λ) = inf { ε_max(V) | V ∈ T_Λ SPD(n), ‖V‖_{g_W} = 1 } = inf { ‖V‖_{g_W} | V ∈ T_Λ SPD(n), det(I + Γ_Λ[V]) = 0 },
in which ‖·‖_{g_W} is the norm induced by g_W. Then, we have
2 r²(Λ) = inf { tr(V Γ_Λ[V]) | Γ_Λ[V] has an eigenvalue η_k = −1 }.
Denote by {λ_i > 0} the eigenvalues of Λ and by η_j the eigenvalues of Γ_Λ[V]. Let x_j be eigenvectors corresponding to η_j such that {x_j} forms an orthonormal basis of R^n. By direct calculation, we find
tr(V Γ_Λ[V]) = Σ_{i=1}^{n} x_i^T V Γ_Λ[V] x_i = Σ_{i=1}^{n} η_i x_i^T V x_i = Σ_{i=1}^{n} η_i x_i^T ( Λ Γ_Λ[V] + Γ_Λ[V] Λ ) x_i = Σ_{i=1}^{n} ( η_i (Γ_Λ[V] x_i)^T Λ x_i + η_i x_i^T Λ Γ_Λ[V] x_i ) = 2 Σ_{i=1}^{n} η_i² x_i^T Λ x_i ≥ 2 x_k^T Λ x_k,
where we used the facts that Λ is positive-definite and η_k² = 1. The equality tr(V Γ_Λ[V]) = 2 x_k^T Λ x_k holds if and only if η_j = 0 for all j ≠ k. In that case, by Algorithm 1, we have
V_ij / (λ_i + λ_j) = ( Γ_Λ[V] )_ij = −δ_ik δ_jk,
so we obtain
V_ij = −2 δ_ik δ_jk λ_k,  (x_k)_i = δ_ki.
Thus,
tr(V Γ_Λ[V]) = 2 x_k^T Λ x_k = 2 λ_k.
In particular, the minimum of tr(V Γ_Λ[V]) equals 2 λ_min, attained when λ_k = λ_min and V_ij = −2 δ_ik δ_jk λ_min. The corresponding tangent vector V is called the speed-degenerate direction. The function λ_min(A) is continuous, and hence r(A) is also continuous. □

Appendix D. Proof of Lemma 1

Before proving Lemma 1, we shall prove a key lemma, which points the relation between the Lie-brackets on the total space and base space.
Lemma A3.
The horizontal lift of vector fields commutes with Lie-brackets,
dσ|_Ã( [X̃, Ỹ] ) = [X, Y],  ∀ Ã ∈ GL(n).
Proof .
(Lemma A3) On the flat manifold (GL(n), g_E), the connection coincides with the usual directional derivative of Euclidean space. Therefore, for vector fields X, Y on SPD(n), we have
D_X̃ Ỹ = lim_{t→0} (1/t) ( Ỹ|_{Ã + X̃ t} − Ỹ|_Ã )
= lim_{t→0} (1/t) ( (Ã + X̃ t) Γ_{A + X t}[ Y|_{A + X t} ] − Ã Γ_A[ Y|_A ] )
= lim_{t→0} (Ã/t) ( Γ_{A + X t}[ Y|_{A + X t} ] − Γ_A[ Y|_A ] ) + X̃ Γ_A[Y]
= lim_{t→0} (Ã/t) ( Γ_{A + X t}[ Y|_{A + X t} ] − Γ_A[ Y|_A ] ) + Ã Γ_A[X] Γ_A[Y]
= Ã ( Γ_A[ dY(X) ] − Γ_A[ X Γ_A[Y] + Γ_A[Y] X ] + Γ_A[X] Γ_A[Y] ).
Substituting Π_A[X, Y] := Γ_A[X] Γ_A[Y] − Γ_A[Y] Γ_A[X] into (2), we obtain
Γ_A[ X Γ_A[Y] + Γ_A[Y] X − Y Γ_A[X] − Γ_A[X] Y ] = Π_A[X, Y] − 2 Γ_A[ Π_A[X, Y] ] A.
Then, we compute the Lie bracket
[X̃, Ỹ] := D_X̃ Ỹ − D_Ỹ X̃ = Ã Γ_A[ dY(X) − dX(Y) ] + Ã ( Γ_A[X] Γ_A[Y] − Γ_A[Y] Γ_A[X] ) − Ã Γ_A[ X Γ_A[Y] + Γ_A[Y] X − Y Γ_A[X] − Γ_A[X] Y ] = [X, Y]~ + 2 Ã Γ_A[ Π_A[X, Y] ] A,
where the last equality in (A14) comes from (A13).
Finally, we show that the second term in (A14) is vertical. In fact, we have
2 Ã Γ_A[ Π_A[X, Y] ] A = ( 2 Ã Γ_A[ Π_A[X, Y] ] Ã^T ) Ã,
where 2 Ã Γ_A[ Π_A[X, Y] ] Ã^T is anti-symmetric. This completes the proof of Lemma A3. □
By Lemma A3, the proof for Lemma 1 is clarified.
Proof .
(Lemma 1) For any smooth vector field Z on SPD(n) with horizontal lift Z̃, we have
⟨ (∇_X Y)~, Z̃ ⟩ = ⟨ ∇_X Y, Z ⟩
= (1/2)( X⟨Y, Z⟩ + Y⟨Z, X⟩ − Z⟨X, Y⟩ + ⟨Z, [X, Y]⟩ + ⟨Y, [Z, X]⟩ − ⟨X, [Y, Z]⟩ )
= (1/2)( X̃⟨Ỹ, Z̃⟩ + Ỹ⟨Z̃, X̃⟩ − Z̃⟨X̃, Ỹ⟩ + ⟨Z̃, [X, Y]~⟩ + ⟨Ỹ, [Z, X]~⟩ − ⟨X̃, [Y, Z]~⟩ )
= (1/2)( X̃⟨Ỹ, Z̃⟩ + Ỹ⟨Z̃, X̃⟩ − Z̃⟨X̃, Ỹ⟩ + ⟨Z̃, [X̃, Ỹ]⟩ + ⟨Ỹ, [Z̃, X̃]⟩ − ⟨X̃, [Ỹ, Z̃]⟩ )
= ⟨ D_X̃ Ỹ, Z̃ ⟩.
By the arbitrariness of Z, Lemma 1 is proved. □
Remark A2.
The proof of Lemma 1 implies that dσ|_Ã(D_X̃ Ỹ) is independent of the choice of Ã within a fixed fiber.

Appendix E. Proof of Theorem 7

Proof. 
We calculate R̃|_Ã(X̃, Ỹ, X̃, Ỹ) as follows:
R̃|_Ã(X̃, Ỹ, X̃, Ỹ) = ⟨ R̃_{X̃ Ỹ} X̃, Ỹ ⟩ = ⟨ D_{[X̃, Ỹ]} X̃, Ỹ ⟩ − ⟨ D_X̃ D_Ỹ X̃, Ỹ ⟩ + ⟨ D_Ỹ D_X̃ X̃, Ỹ ⟩
= ⟨ D_{[X, Y]~} X̃, Ỹ ⟩ + ⟨ D_{2T(X, Y)} X̃, Ỹ ⟩ − ⟨ D_X̃ (∇_Y X)~, Ỹ ⟩ − ⟨ D_X̃ T(Y, X), Ỹ ⟩ + ⟨ D_Ỹ (∇_X X)~, Ỹ ⟩ + ⟨ D_Ỹ T(X, X), Ỹ ⟩
= ⟨ ∇_{[X, Y]} X, Y ⟩ + 2 ⟨ D_{T(X, Y)} X̃, Ỹ ⟩ − ⟨ ∇_X ∇_Y X, Y ⟩ − ⟨ T(X, Y), T(X, Y) ⟩ + ⟨ ∇_Y ∇_X X, Y ⟩
= R|_A(X, Y, X, Y) + 2 ⟨ D_{T(X, Y)} X̃, Ỹ ⟩ − ⟨ T(X, Y), T(X, Y) ⟩,
where we used (29), (32), T_Ã(X, Y) ∈ V(Ã), T_Ã(X, Y) = −T_Ã(Y, X) and the following equation:
⟨ D_X̃ T(Y, X), Ỹ ⟩ = X̃ ⟨ T(Y, X), Ỹ ⟩ − ⟨ T(Y, X), D_X̃ Ỹ ⟩ = ⟨ T(X, Y), T(X, Y) + (∇_X Y)~ ⟩ = ⟨ T(X, Y), T(X, Y) ⟩.
Since T(X, Y) is vertical and X̃, Ỹ are horizontal lifts, the bracket [T(X, Y), X̃] is vertical, and hence
⟨ [T(X, Y), X̃], Ỹ ⟩ = 0.
Then, using D_{T(X, Y)} X̃ − D_X̃ T(X, Y) = [T(X, Y), X̃], we obtain
⟨ D_{T(X, Y)} X̃, Ỹ ⟩ = ⟨ D_X̃ T(X, Y), Ỹ ⟩ = −⟨ T(X, Y), T(X, Y) ⟩.
By the definition of the directional derivative and σ(Ã + t T(X, Y)) = A + o(t), we have
D_{T(X, Y)} X̃ = lim_{t→0} (1/t) ( (Ã + T(X, Y) t) Γ_A[X] − Ã Γ_A[X] ) = T(X, Y) Γ_A[X].
Combining (A20) and (A21), we obtain
−⟨ T(X, Y), T(X, Y) ⟩ = ⟨ T(X, Y) Γ_A[X], Ỹ ⟩,
and
R̃(X̃, Ỹ, X̃, Ỹ) = R(X, Y, X, Y) + 3 ⟨ T(X, Y) Γ_A[X], Ã Γ_A[Y] ⟩ = R(X, Y, X, Y) − 3 ⟨ T(X, Y), T(X, Y) ⟩.
Since R̃ ≡ 0, we obtain the explicit expression of the Wasserstein curvature
R(X, Y, X, Y) = −3 ⟨ T(X, Y) Γ_A[X], Ã Γ_A[Y] ⟩ = 3 tr( Γ_A[X] A Γ_A[ Π_A[X, Y] ] A Γ_A[Y] ).

Appendix F. Proof of Theorem 8

Proof. 
Using p = r, q ≠ t, we have
(S_1 S_2)_{ij} = (1 + δ_pq)(1 + δ_pt) δ_iq δ_jt = 2 δ_iq δ_jt if q = p or t = p, and δ_iq δ_jt if q ≠ p and t ≠ p,
and
(E_{S_1} E_{S_2})_{ij} = [ (1 + δ_pq)(1 + δ_pt) / ( (λ_p + λ_q)(λ_p + λ_t) ) ] δ_iq δ_jt.
Then, we obtain
( Γ_Λ[ [E_{S_1}, E_{S_2}] ] )_{ij} = [E_{S_1}, E_{S_2}]_{ij} / (λ_i + λ_j) = (1 + δ_pq)(1 + δ_pt)( δ_iq δ_jt − δ_jq δ_it ) / ( (λ_p + λ_q)(λ_p + λ_t)(λ_q + λ_t) ),
and
Σ_{k=1}^{n} ( Γ_Λ[ [E_{S_1}, E_{S_2}] ] )_{ik} ( Λ E_{S_2} E_{S_1} Λ )_{kj} = [ (1 + δ_pq)² (1 + δ_pt)² λ_q λ_t / ( (λ_p + λ_q)² (λ_p + λ_t)² (λ_q + λ_t) ) ] δ_iq δ_jq.
Thus, we obtain the curvature
R|_Λ(S_1, S_2, S_1, S_2) = 3 tr( E_{S_1} Λ Γ_Λ[ [E_{S_1}, E_{S_2}] ] Λ E_{S_2} ) = 3 (1 + δ_pq)² (1 + δ_pt)² λ_q λ_t / ( (λ_p + λ_q)² (λ_p + λ_t)² (λ_q + λ_t) ).
On the other hand, by (A26), we have S_1 ⊥ S_2, since
⟨S_1, S_2⟩ = tr( E_{S_2} Λ E_{S_1} ) = tr( Λ E_{S_1} E_{S_2} ) = 0,
and
⟨S_1, S_1⟩ = (1/2) tr( S_1 E_{S_1} ) = (1 + δ_pq) / (λ_p + λ_q),  ⟨S_2, S_2⟩ = (1/2) tr( S_2 E_{S_2} ) = (1 + δ_pt) / (λ_p + λ_t).
Then, the area of the sectional parallelogram is given by
⟨S_1, S_1⟩⟨S_2, S_2⟩ − ⟨S_1, S_2⟩² = (1 + δ_pq)(1 + δ_pt) / ( (λ_p + λ_q)(λ_p + λ_t) ).
From (A29) and (A32), the proof is done consequently. □

Appendix G. Proof of Theorem 9

Before proving this theorem, we need some preparation. For any A ∈ SPD(n) with eigenvalues λ_i, i = 1, …, n, construct the diagonal matrix Λ = diag(λ_1, …, λ_n) and the strictly upper triangular matrix U with U_ij = 1/(λ_i + λ_j) for i < j. In the case p = r, q < t, by Theorem 8, we have
K|_Λ(S^{p,q}, S^{p,t}) = 3 (1 + δ_pq)(1 + δ_pt) λ_q λ_t / ( (λ_p + λ_q)(λ_p + λ_t)(λ_q + λ_t) )
= 3 U_qt ( (U + U^T)_pq Λ_qq + δ_pq ) ( (U + U^T)_pt Λ_tt + δ_pt )
= 3 U_qt Λ_tt (U + U^T)_tp δ_pq + 3 (U + U^T)_pq Λ_qq U_qt δ_pt + 3 (U + U^T)_pt Λ_tt U_qt Λ_qq (U + U^T)_qp,
where we used
(1 − δ_pq) / (λ_p + λ_q) = (U + U^T)_pq,
and
(1 + δ_pq) λ_q / (λ_p + λ_q) = (1 − δ_pq) λ_q / (λ_p + λ_q) + 2 δ_pq λ_q / (λ_p + λ_q) = (U + U^T)_pq λ_q + δ_pq.
Then, we can obtain the scalar curvature by summing (A33).
Proof. 
Noting that U_qt = 0 for q ≥ t, we have
ρ(Λ) = Σ_{p, q<t} K|_Λ(S^{p,q}, S^{p,t})
= Σ_{p, q<t} [ 3 U_qt Λ_tt (U + U^T)_tp δ_pq + 3 (U + U^T)_pq Λ_qq U_qt δ_pt ] + Σ_{p, q<t} 3 (U + U^T)_pt Λ_tt U_qt Λ_qq (U + U^T)_qp
= Σ_{p, q, t} [ 3 U_qt Λ_tt (U + U^T)_tp δ_pq + 3 (U + U^T)_pq Λ_qq U_qt δ_pt ] + Σ_{p, q, t} 3 (U + U^T)_pt Λ_tt U_qt Λ_qq (U + U^T)_qp
= 3 tr( U Λ (U + U^T) + (U + U^T) Λ U + (U + U^T) Λ U Λ (U + U^T) ).

Appendix H. Effects of WSCED

Figure A1. WSCED on different polluted images. The first row shows the original RGB images. The second row shows images with uniform noise. The third row shows edge images after WSCED.

Appendix I

Table A1. Comparison of the three denoising algorithms.

Stanford Bunny (10,000), original SNR 10:
  ROR: TPR 100.00%, FPR 18.10%, SNRG 452.5%;  SOR: TPR 87.78%, FPR 49.00%, SNRG 79.1%;  AWCD: TPR 99.28%, FPR 4.80%, SNRG 1968.3%
Stanford Bunny (10,000), original SNR 0.1:
  ROR: TPR 100.00%, FPR 100.00%, SNRG 0.0%;  SOR: TPR 84.44%, FPR 71.61%, SNRG 17.9%;  AWCD: TPR 99.93%, FPR 90.53%, SNRG 10.4%
Duke Dragon (100,000), original SNR 100:
  ROR: TPR 100.00%, FPR 9.60%, SNRG 941.7%;  SOR: TPR 82.29%, FPR 51.90%, SNRG 58.6%;  AWCD: TPR 100.00%, FPR 2.90%, SNRG 3348.3%
Duke Dragon (100,000), original SNR 1:
  ROR: TPR 100.00%, FPR 100.00%, SNRG 0.0%;  SOR: TPR 82.12%, FPR 70.16%, SNRG 17.1%;  AWCD: TPR 99.88%, FPR 2.40%, SNRG 4054.8%
Armadillo (50,000), original SNR 10:
  ROR: TPR 62.26%, FPR 0.54%, SNRG 114.3;  SOR: TPR 88.06%, FPR 63.46%, SNRG 38.8%;  AWCD: TPR 34.84%, FPR 0.01%, SNRG 347.38
Armadillo (50,000), original SNR 1:
  ROR: TPR 62.66%, FPR 0.41%, SNRG 153.34;  SOR: TPR 87.99%, FPR 69.62%, SNRG 26.4%;  AWCD: TPR 34.84%, FPR 0.09%, SNRG 404.14
Lucy (50,000), original SNR 100:
  ROR: TPR 100.00%, FPR 6.00%, SNRG 1566.6%;  SOR: TPR 80.71%, FPR 51.80%, SNRG 55.8%;  AWCD: TPR 99.99%, FPR 0.80%, SNRG 123.99
Lucy (50,000), original SNR 10:
  ROR: TPR 100.00%, FPR 35.50%, SNRG 181.7%;  SOR: TPR 80.65%, FPR 66.44%, SNRG 21.4%;  AWCD: TPR 99.99%, FPR 2.86%, SNRG 3396.1%
Happy Buddha (50,000), original SNR 100:
  ROR: TPR 100.00%, FPR 6.80%, SNRG 1370.6%;  SOR: TPR 77.82%, FPR 56.60%, SNRG 37.5%;  AWCD: TPR 99.92%, FPR 2.40%, SNRG 4063.3%
Happy Buddha (50,000), original SNR 10:
  ROR: TPR 100.00%, FPR 13.86%, SNRG 621.5%;  SOR: TPR 78.24%, FPR 64.76%, SNRG 20.8%;  AWCD: TPR 99.93%, FPR 1.88%, SNRG 5215.6%

References

  1. Hosseini, R.; Sra, S. Matrix Manifold Optimization for Gaussian Mixtures. In Advances in Neural Information Processing Systems; MIT Press: Cambridge, MA, USA, 2015.
  2. Manton, J.H. A Primer on Stochastic Differential Geometry for Signal Processing. IEEE J. Sel. Top. Signal Process. 2013, 4, 681–699.
  3. Rubner, Y.; Tomasi, C.; Guibas, L.J. A Metric for Distributions with Applications to Image Databases. In Proceedings of the International Conference on Computer Vision, Bombay, India, 7 January 1998.
  4. Bhatia, R. Matrix Analysis; Graduate Texts in Mathematics; Springer: New York, NY, USA, 1997.
  5. Bhatia, R. Positive Definite Matrices; Princeton University Press: Princeton, NJ, USA, 2009.
  6. Pennec, X.; Fillard, P.; Ayache, N. A Riemannian Framework for Tensor Computing. Int. J. Comput. Vis. 2006, 1, 41–66.
  7. Arsigny, V.; Fillard, P.; Pennec, X.; Ayache, N. Log-Euclidean Metrics for Fast and Simple Calculus on Diffusion Tensors. Magn. Reson. Med. 2010, 2, 411–421.
  8. Hall, B.C. Lie Groups, Lie Algebras, and Representations; Springer: Berlin, Germany, 2015.
  9. Li, Y.; Wong, K. Riemannian Distances for Signal Classification by Power Spectral Density. IEEE J. Sel. Top. Signal Process. 2013, 7, 655–669.
  10. Zhang, S.; Cao, Y.; Li, W.; Yan, F.; Luo, Y.; Sun, H. A New Riemannian Structure in SPD(n). In Proceedings of the IEEE International Conference on Signal, Information and Data Processing (ICSIDP), Chongqing, China, 11–13 December 2019.
  11. Gelbrich, M. On a Formula for the L2 Wasserstein Metric between Measures on Euclidean and Hilbert Spaces. Math. Nachrichten 2015, 1, 185–203.
  12. Givens, C.R.; Shortt, R.M. A Class of Wasserstein Metrics for Probability Distributions. Mich. Math. J. 1984, 2, 231–240.
  13. Martin, A.; Soumith, C.; Leon, B. Wasserstein GAN. arXiv 2017, arXiv:1701.07875.
  14. Asuka, T. On Wasserstein Geometry of Gaussian Measures. In Probability Approach to Geometry; Mathematical Society of Japan: Tokyo, Japan, 2010; pp. 463–472.
  15. Massart, E.; Absil, P.-A. Quotient Geometry with Simple Geodesics for the Manifold of Fixed-Rank Positive-Semidefinite Matrices. SIAM J. Matrix Anal. Appl. 2020, 1, 171–198.
  16. Rusu, R.B.; Cousins, S. 3D is Here: Point Cloud Library (PCL). In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Shanghai, China, 9–13 May 2011.
  17. Ward, A.J.B. A General Analysis of Sylvester's Matrix Equation. Int. J. Math. Educ. Sci. Technol. 1991, 22, 615–620.
  18. Do Carmo, M.P. Riemannian Geometry; Springer: Boston, MA, USA, 1992.
  19. Lee, J.M. Introduction to Smooth Manifolds; Springer: Berlin, Germany, 2003.
  20. Kobayashi, S. Transformation Groups in Differential Geometry; Classics in Mathematics; Springer: Berlin, Germany, 1972.
  21. Lee, J.M. Riemannian Manifolds: An Introduction to Curvature; Springer: New York, NY, USA, 1997.
  22. Boothby, W.M. An Introduction to Differentiable Manifolds and Riemannian Geometry, 2nd ed.; Academic Press: Cambridge, MA, USA, 1986.
  23. Masry, E. Multivariate Probability Density Estimation for Associated Processes: Strong Consistency and Rates. Stat. Probab. Lett. 2002, 2, 205–219.
  24. Nadernejad, E.; Sharifzadeh, S.; Hassanpour, H. Edge Detection Techniques: Evaluations and Comparisons. Appl. Math. Sci. 2008, 2, 1507–1520.
  25. Nishimori, Y.; Akaho, S. Learning Algorithms Utilizing Quasi-geodesic Flows on the Stiefel Manifold. Neurocomputing 2005, 67, 106–135.
  26. Fiori, S. A Theory for Learning by Weight Flow on Stiefel-Grassman Manifold. Neural Comput. 2001, 13, 1625–1647.
  27. Hua, X.; Ono, Y.; Peng, L.; Cheng, Y.; Wang, H. Target Detection Within Nonhomogeneous Clutter Via Total Bregman Divergence-Based Matrix Information Geometry Detectors. IEEE Trans. Signal Process. 2021, 69, 4326–4340.
  28. Hua, X.; Peng, L. MIG Median Detectors with Manifold Filter. Signal Process. 2021, 188, 108176.
  29. Malagò, L.; Montrucchio, L.; Pistone, G. Wasserstein Riemannian Geometry of Gaussian Densities. In Information Geometry; Springer: Berlin/Heidelberg, Germany, 2018; pp. 137–179.
Figure 1. Geodesics of g_W on SPD(2).
Figure 2. Three geodesic balls with the same radius.
Figure 3. Scalar curvatures on SPD(2).
Figure 4. Stanford Bunny with uniform noise.
Figure 5. Cleaned point cloud by ROR.
Figure 6. Cleaned point cloud by SOR.
Figure 7. Histogram of the Wasserstein scalar curvature; x-axis: scalar curvature, y-axis: number of points.
Figure 8. Cleaned point cloud by AWCD.
Figure 9. An example showing different edge detection algorithms on the flower image.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
