Article

An Introduction to Development of Centralized and Distributed Stochastic Approximation Algorithm with Expanding Truncations

1 Key Laboratory of Systems and Control, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
2 School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
Submission received: 2 May 2021 / Revised: 26 May 2021 / Accepted: 31 May 2021 / Published: 31 May 2021
(This article belongs to the Special Issue State-of-the-Art Algorithms and Their Applications in China)

Abstract
The stochastic approximation algorithm (SAA), starting from the pioneering work by Robbins and Monro in the 1950s, has been successfully applied in systems and control, statistics, machine learning, and so forth. In this paper, we review the development of SAA in China, specifically the stochastic approximation algorithm with expanding truncations (SAAWET) developed by Han-Fu Chen and his colleagues during the past 35 years. We first review the historical development of the centralized algorithm, including the probabilistic method (PM) and the ordinary differential equation (ODE) method for SAA and the trajectory-subsequence method for SAAWET. Then, we give an application example of SAAWET to recursive principal component analysis. We also introduce the recent progress on SAAWET in a networked and distributed setting, named the distributed SAAWET (DSAAWET).

1. Introduction

SAA was first introduced by Robbins and Monro in the 1950s [1] and is also called the RM algorithm in the literature. Since the 1950s, SAA has found successful applications in systems and control, statistics, machine learning, stochastic arithmetic and the CESTAC method for numerical calculation [2,3], the energy internet [4], and so forth, and there are plenty of studies on various theoretical properties of SAA, including sample path convergence [1,5,6,7], weak convergence [8], convergence for time-varying root-searching functions [9,10,11,12], robustness of SAA [13], stable and unstable limit points [14], convergence rate and asymptotic normality [7,15,16,17,18,19,20], and so forth. In this paper, due to space limitations, we do not give an overview of all aspects of SAA; rather, we focus on the sample path analysis of this kind of algorithm, and in particular introduce the centralized and distributed SAAWET developed by Han-Fu Chen and his colleagues during the past 35 years.
At the early stage of SAA, to guarantee the convergence of the algorithm, restrictive conditions were usually required on the noise $\{\varepsilon_k\}_{k\ge 0}$ and on the function $f(\cdot)$, for example, $\{\varepsilon_k\}_{k\ge 0}$ being a martingale difference sequence and $f(\cdot)$ being bounded by a linear function. Then, by tools from probability theory, such as the martingale theory, the almost sure convergence of SAA can be established, and such an analysis method is called the probabilistic method (PM) [1,20]. However, in many situations, the noise $\{\varepsilon_k\}_{k\ge 0}$ may not be a martingale difference sequence, and it may depend on the past estimates generated by SAA itself, which we call state-dependent noise; in such cases, PM fails. Towards relaxing the restrictive conditions of PM, in the 1970s a so-called ordinary differential equation (ODE) method was introduced [7,21], which transforms the convergence analysis of SAA into the stability analysis of the equilibrium point of an associated ODE and forms another representative direction for the theoretical analysis of SAA. The ODE method, however, a priori assumes that the estimate sequence generated by the algorithm is bounded, which is not easy to verify in practice. For many theoretical problems, due to the complexity of the system noise and the lack of a priori information on the estimate sequence, neither PM nor the ODE method can be applied.
In order to essentially relax the assumptions required by PM and the ODE method, in the past 35 years an expanding truncation technique was introduced into SAA by Han-Fu Chen and his colleagues, and a so-called stochastic approximation algorithm with expanding truncations (SAAWET) has been developed [22,23]. The key idea of the expanding truncation technique is that if the magnitude of the estimate exceeds the truncation bound, then the estimate is reset to the initial value but with a decreased stepsize and, at the same time, an enlarged truncation bound is used for the next recursion, while if the magnitude of the estimate is smaller than the truncation bound, then SAAWET evolves as the classical SAA at the next recursion and the truncation bound remains unchanged. The expanding truncation technique introduces a way of adaptively choosing the stepsizes of the algorithm and essentially relaxes the conditions on the noises and on the root-searching function. Furthermore, to guarantee the convergence of SAAWET, one only needs to verify the noise condition along the index sequence $\{n_k\}_{k\ge 0}$ of any convergent subsequence $\{x_{n_k}\}_{k\ge 0}$ rather than along the whole sequence $\{x_k\}_{k\ge 0}$, which greatly facilitates the application of the algorithm. The convergence analysis method of SAAWET is accordingly named the trajectory-subsequence (TS) method. In the past 35 years, SAAWET has been successfully applied to recursive identification of multivariate ARMAX systems [24], recursive identification of nonlinear systems [25,26], adaptive regulation of nonlinear systems [27], consensus control of multiagent systems [28], convergence of a distributed randomized PageRank algorithm [29], and so forth. In this paper, we first review the probabilistic method (PM) and the ordinary differential equation (ODE) method for SAA and the trajectory-subsequence method for SAAWET. Then, we give an application example of SAAWET to recursive principal component analysis.
All the above algorithms and methods apply to the centralized situation. In recent years, distributed algorithms have been extensively investigated, for example, for the consensus problem [30,31], distributed estimation [32,33], and so on. Distributed algorithms work in a network setting, where there is no central agent collecting and processing the data of the whole network; instead, the agents cooperatively accomplish a global objective by using their local observations and the information obtained from communication with their adjacent neighbours [34,35,36,37,38]. Recently, many distributed problems have been solved by SAA-based algorithms, such as distributed estimation [32,39] and distributed convex optimization [34,35]. As a result, distributed algorithms for SAA have naturally drawn much attention from researchers. Distributed SAA (DSAA) was proposed in [40,41,42,43] and elsewhere, in which agents cooperatively find roots of a function that is a sum of local functions associated with agents connected in a network. Note that almost all the above DSAA algorithms require restrictive conditions to guarantee convergence; for example, [41] requires the local functions to be globally Lipschitz continuous and the observation noise to be a martingale difference sequence. However, these assumptions may not hold for many problems, including distributed principal component analysis, distributed gradient-free optimization, and so forth. Aiming at solving the distributed root-seeking problem under weaker conditions compared with those used in [41], a distributed version of SAAWET (DSAAWET) was recently developed [44]. In this paper, we introduce DSAAWET and its convergence properties and also give some application examples of DSAAWET.
The rest of the paper is arranged as follows. In Section 2, the centralized SAA and PM, as well as the ODE method for its convergence analysis, are reviewed. Then, SAAWET is introduced, with the conditions for its convergence analysis and its general convergence theorem given. SAAWET is also applied to solving the recursive principal component analysis problem. In Section 3, DSAAWET and its convergence properties are introduced, and some application examples are given as well. In Section 4, concluding remarks are given. The notations used in this paper are listed in Table 1.

2. Centralized Stochastic Approximation Algorithm

2.1. Probabilistic Martingale Method

We first give the detailed problem formulation of SAA.
Assume that $f(\cdot): \mathbb{R}^l \to \mathbb{R}^l$ is an unknown function with root $x^0$, that is, $f(x^0) = 0$. At time $k$, assume that $x_k$ is an estimate of $x^0$ and the measurement of $f(\cdot)$ at $x_k$ is $y_{k+1}$, that is,
$$y_{k+1} = f(x_k) + \varepsilon_{k+1}, \quad k \ge 0, \qquad (1)$$
where $\varepsilon_{k+1}$ is the observation noise.
With an arbitrary initial value $x_0$, in [1] a recursive algorithm for searching the root $x^0$ of $f(\cdot)$ is introduced:
$$x_{k+1} = x_k + a_k y_{k+1}, \quad k \ge 0, \qquad (2)$$
where $\{a_k\}_{k\ge 0}$ is the stepsize sequence, which satisfies
$$a_k > 0, \quad a_k \xrightarrow[k\to\infty]{} 0, \quad \sum_{k=0}^{\infty} a_k = \infty. \qquad (3)$$
Assumption (3) requires that the stepsize sequence tend to zero while the sum of the stepsizes diverges. This indicates that the noise effect can be asymptotically attenuated while the algorithm retains the ability to search over the whole feasible domain.
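To make the recursion (2) with stepsize condition (3) concrete, here is a minimal sketch; the scalar function $f(x) = 10 - x$, the noise level, and the stepsize $a_k = 1/(k+1)$ are our own toy choices, not from the paper.

```python
import random

def robbins_monro(observe, x0=0.0, n_iter=20000):
    """Classical RM recursion (2): x_{k+1} = x_k + a_k * y_{k+1},
    with stepsize a_k = 1/(k+1), which satisfies condition (3)."""
    x = x0
    for k in range(n_iter):
        a_k = 1.0 / (k + 1)
        y = observe(x)          # noisy measurement of f at the current estimate
        x = x + a_k * y
    return x

# Toy example: f(x) = 10 - x has root 10, observed under zero-mean noise.
random.seed(0)
f = lambda x: 10.0 - x
x_hat = robbins_monro(lambda x: f(x) + random.gauss(0.0, 0.1))
```

For this linear $f$ and $a_k = 1/(k+1)$, the recursion reduces exactly to the running average of the noisy targets $10 + \varepsilon_k$, so the estimate settles at the root as the noise averages out.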
Example 1.
Assume that at time $k$, the estimate $x_k$ is very close to the root $x^0$. Then there exists an $\varepsilon > 0$ small enough such that
$$\|x_k - x^0 + a_k f(x_k)\| < \varepsilon. \qquad (4)$$
Then,
$$\|x_{k+1} - x^0\| = \|x_k - x^0 + a_k f(x_k) + a_k \varepsilon_{k+1}\| \ge a_k \|\varepsilon_{k+1}\| - \varepsilon. \qquad (5)$$
From (5) it can be seen that if the stepsize sequence $\{a_k\}_{k\ge 0}$ has a positive lower bound, then for unbounded noises, such as Gaussian variables, the estimation error will not converge to zero. Thus, for the stepsize sequence $\{a_k\}_{k\ge 0}$, $a_k \xrightarrow[k\to\infty]{} 0$ is a necessary condition for convergence of SAA.
Example 2.
Consider the case $\varepsilon_{k+1} \equiv 0$. If $\sum_{k=0}^{\infty} a_k < \infty$ and $f(\cdot)$ is bounded, then
$$\sum_{k=0}^{\infty} \|x_{k+1} - x_k\| \le \sum_{k=0}^{\infty} a_k \|f(x_k)\| < \infty. \qquad (6)$$
If the initial value $x_0$ satisfies $\|x_0 - x^0\| - \sum_{k=0}^{\infty} a_k \|f(x_k)\| > 0$, then from (6) it follows that
$$\|x_{k+1} - x^0\| = \|x_{k+1} - x_0 + x_0 - x^0\| \ge \|x_0 - x^0\| - \sum_{k=0}^{\infty} \|x_{k+1} - x_k\| > 0, \qquad (7)$$
and $\{x_k\}_{k\ge 0}$ will not converge to $x^0$. Thus, $\sum_{k=0}^{\infty} a_k = \infty$ is another necessary condition for convergence of SAA.
We make the following assumptions.
A2.1
The stepsize sequence $\{a_k\}_{k\ge 0}$ satisfies
$$a_k > 0, \quad \sum_{k=0}^{\infty} a_k = \infty, \quad \sum_{k=0}^{\infty} a_k^2 < \infty. \qquad (8)$$
A2.2
There exists a twice-differentiable function $v(\cdot): \mathbb{R}^l \to \mathbb{R}$ such that:
(i)
The Hessian matrix of $v(\cdot)$ is bounded.
(ii)
$v(x^0) = 0$, $v(x) > 0$ for $x \ne x^0$, and $v(x) \to \infty$ as $\|x\| \to \infty$.
(iii)
For any $\varepsilon > 0$, there exists a $\beta_\varepsilon > 0$ such that
$$\sup_{\|x - x^0\| > \varepsilon} \nabla v(x)^T f(x) = -\beta_\varepsilon < 0. \qquad (9)$$
A2.3
$\{\varepsilon_k, \mathcal{F}_k\}_{k\ge 0}$ is a martingale difference sequence (m.d.s.), that is, $E[\varepsilon_{k+1} \mid \mathcal{F}_k] = 0$ for $k \ge 0$, and, further, $\sup_{k\ge 0} E\|\varepsilon_k\|^2 < \infty$.
A2.4
There exists a constant $c > 0$ such that
$$\|f(x)\|^2 + E[\|\varepsilon_{k+1}\|^2 \mid \mathcal{F}_k] < c\,(1 + v(x)), \quad k \ge 0. \qquad (10)$$
Theorem 1.
([20,45,46]) Assume that A2.1–A2.4 hold. Then, with an arbitrary initial value $x_0$, the estimates generated by (2) converge to the root $x^0$ of $f(\cdot)$ almost surely, that is,
$$x_k \xrightarrow[k\to\infty]{} x^0 \quad \text{a.s.} \qquad (11)$$
Proof. 
Here, we give a sketch of the proof.
By using the function $v(\cdot)$ given in A2.2, define $v_{k+1}$:
$$v_{k+1} \triangleq (1 + v(x_{k+1})) \prod_{i=k+1}^{\infty} (1 + a_i^2). \qquad (12)$$
By (9), we can obtain
$$0 \le E(v_{k+1} \mid \mathcal{F}_k) \le v_k + a_k \nabla v(x_k)^T f(x_k) \prod_{i=k+1}^{\infty} (1 + a_i^2) \le v_k + a_k \nabla v(x_k)^T f(x_k) \le v_k. \qquad (13)$$
By the properties of a nonnegative supermartingale, $\{v_k\}_{k\ge 0}$ converges almost surely. Noting (8), we further obtain $\prod_{i=k+1}^{\infty} (1 + a_i^2) \xrightarrow[k\to\infty]{} 1$, and thus $\{v(x_k)\}_{k\ge 0}$ converges almost surely.
With an arbitrary $\varepsilon > 0$, define
$$G_\varepsilon \triangleq \{x : \|x - x^0\| > \varepsilon\} \qquad (14)$$
and a sequence of stopping times $\{\sigma_\varepsilon^i\}_{i\ge 0}$:
$$\sigma_\varepsilon^i = \min\{t : t > \sigma_\varepsilon^{i-1},\ x_t \in G_\varepsilon^c\}. \qquad (15)$$
By (13) and (14) and noting (9), we have
$$E[v_{k+1} \mid \mathcal{F}_k] \le v_k - \beta_\varepsilon a_k I_{[\sigma_\varepsilon^i > k]}. \qquad (16)$$
From (16) and by the properties of the supermartingale, we have
$$\sum_{k=1}^{\infty} a_k I_{[\sigma_\varepsilon^i > k]} < \infty \quad \text{a.s.} \qquad (17)$$
By (17) and (8), we can prove that $\sigma_\varepsilon^i$ is finite almost surely, and hence there exists a subsequence $\{x_{k_i}\}_{i\ge 1}$ satisfying $\|x_{k_i} - x^0\| \le \varepsilon$. Since $\varepsilon > 0$ can be arbitrarily small, there exists a subsequence of $\{x_{k_i}\}_{i\ge 1}$ converging to $x^0$. By the convergence of $\{v(x_k)\}_{k\ge 0}$, we can finally prove (11). □
The above proof is based on the convergence of the nonnegative supermartingale, and the analysis method is accordingly named the probabilistic method (PM).
Remark 1.
If A2.2 is changed to
$$\inf_{\|x - x^0\| > \varepsilon} \nabla v(x)^T f(x) = \beta_\varepsilon > 0, \qquad (18)$$
then we can prove the almost sure convergence of the following SAA:
$$x_{k+1} = x_k - a_k y_{k+1}, \quad k \ge 0. \qquad (19)$$
Remark 2.
From A2.3 we see that the observation noise is required to be an m.d.s. In many problems, however, the observation noise may contain complicated state-dependent uncertainties, for example, $\varepsilon_{k+1} = \varepsilon_{k+1}(x_k)$. On the other hand, if we choose $v(x) = \|x - x^0\|^2$, then from A2.4 we have
$$\|f(x)\|^2 \le c\,(1 + \|x - x^0\|)^2, \qquad (20)$$
which indicates that $f(\cdot)$ grows at most linearly. These are the restrictions of PM.

2.2. Ordinary Differential Equation Method

We first introduce two technical lemmas.
Lemma 1.
(Arzelà–Ascoli, [47]) Let $\{f_\lambda(t), t \in [0, +\infty), \lambda \in \Lambda\}$ be a function class satisfying the conditions of uniform boundedness and equi-continuity. Then there exist a continuous function $f(t)$ and a subsequence $\{f_{\lambda_k}(t)\}_{k\ge 1}$ such that $\{f_{\lambda_k}(t)\}_{k\ge 1}$ converges to $f(t)$ uniformly on any fixed bounded interval of $[0, +\infty)$.
Remark 3.
For the function class $\{f_\lambda(t), t \in [0, +\infty), \lambda \in \Lambda\}$, equi-continuity means that for any $\varepsilon > 0$ there exists a $\delta > 0$ such that for any $|t - s| < \delta$,
$$\|f_\lambda(t) - f_\lambda(s)\| < \varepsilon, \quad \forall\, \lambda \in \Lambda. \qquad (21)$$
The following result is well-known for the Lyapunov stability of ODE.
Lemma 2.
Consider the ODE
$$\dot{x}(t) = f(x(t)), \quad t \ge 0, \qquad (22)$$
with an equilibrium point $x^0$, that is, $f(x^0) = 0$. If there exists a continuously differentiable function $v(x)$ such that
(i)
$v(x^0) = 0$;
(ii)
$v(x) > 0$ for $x \ne x^0$, and $v(x) \to \infty$ as $\|x\| \to \infty$;
(iii)
$\nabla v(x)^T f(x) < 0$ for $x \ne x^0$;
then $x^0$ is the globally asymptotically stable equilibrium of (22).
Before introducing the conditions and results for the ODE method, we first introduce a crucial notation used throughout this paper. For any $T > 0$, define
$$m(n, T) \triangleq \max\Big\{m : \sum_{i=n}^{m} a_i \le T\Big\}. \qquad (23)$$
From the definition in (23) we see that $m(n, T)$ is the maximal number of steps starting from $n$ for which the sum of stepsizes does not exceed $T$. Noting that $a_k \to 0$ as $k \to \infty$, we have $m(n, T) \to \infty$ as $n \to \infty$.
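Since $m(n, T)$ appears in several convergence conditions below, a small helper (our own sketch, not from the paper) may make definition (23) concrete for a given finite stepsize sequence:

```python
def m_of(n, T, a):
    """Largest index m >= n such that a[n] + ... + a[m] <= T,
    i.e. m(n, T) from definition (23) for a stepsize list a."""
    total = 0.0
    m = n - 1                      # empty sum: no step taken yet
    i = n
    while i < len(a) and total + a[i] <= T:
        total += a[i]
        m = i
        i += 1
    return m

# Stepsizes a_k = 1/(k+1): a_0 = 1 alone already reaches T = 1, so m(0, 1) = 0,
# while from n = 1 we can afford a_1 + a_2 = 1/2 + 1/3 <= 1 but not a_3 as well.
a = [1.0 / (k + 1) for k in range(100)]
```

Since $a_k \to 0$, the window $\{n, \ldots, m(n,T)\}$ contains more and more steps as $n$ grows, which is exactly the property used in the noise conditions A3.3 and A4.3.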
We make the following assumptions.
A3.1
$a_k > 0$, $a_k \xrightarrow[k\to\infty]{} 0$, $\sum_{k=0}^{\infty} a_k = \infty$.
A3.2
There exists a twice-differentiable function $v(\cdot): \mathbb{R}^l \to \mathbb{R}$ such that
(i)
$v(x^0) = 0$;
(ii)
$v(x) > 0$ for $x \ne x^0$, and $v(x) \to \infty$ as $\|x\| \to \infty$;
(iii)
$\nabla v(x)^T f(x) < 0$ for $x \ne x^0$.
A3.3
The noise $\{\varepsilon_k\}_{k\ge 0}$ satisfies
$$\lim_{T \to 0} \limsup_{n \to \infty} \frac{1}{T} \Big\| \sum_{i=n}^{m(n,T)} a_i \varepsilon_{i+1} \Big\| = 0 \quad \text{and} \quad a_k \varepsilon_{k+1} \xrightarrow[k\to\infty]{} 0. \qquad (24)$$
A3.4
The function $f(\cdot)$ is continuous.
Remark 4.
If the stepsizes further satisfy $\sum_{k=0}^{\infty} a_k^2 < \infty$ and the noise $\{\varepsilon_k, \mathcal{F}_k\}_{k\ge 0}$ is an m.d.s. with $\sup_{k\ge 0} E[\|\varepsilon_{k+1}\|^2 \mid \mathcal{F}_k] < \infty$ a.s., then it can be directly verified that $\sum_{k=0}^{\infty} a_k \varepsilon_{k+1}$ converges almost surely, and hence A3.3 holds. If the noise is not stochastic but satisfies $\varepsilon_k \xrightarrow[k\to\infty]{} 0$, then A3.3 also holds. In this regard, (24) is more general than A2.3.
We give the following result for the ODE method.
Theorem 2.
([6,7,21]) Consider a sample path $\omega$ on which A3.3 takes place. If A3.1, A3.2, and A3.4 hold and $\{x_k(\omega)\}_{k\ge 0}$ is bounded, then
$$x_k(\omega) \xrightarrow[k\to\infty]{} x^0. \qquad (25)$$
Proof. 
We sketch the proof.
Define $t_k \triangleq \sum_{i=0}^{k} a_i$ with $t_0 = 0$. By linearly interpolating the estimates generated by (2), we obtain a sequence of continuous functions
$$x^0(t) \triangleq \frac{t_k - t}{a_k}\, x_{k-1} + \frac{t - t_{k-1}}{a_k}\, x_k, \quad t \in [t_{k-1}, t_k], \qquad (26)$$
$$x^k(t) \triangleq x^0(t + t_k), \quad k \ge 1, \qquad (27)$$
which can be proved to be uniformly bounded (by the boundedness of $\{x_k\}_{k\ge 0}$) and equi-continuous (by A3.3).
Then, by Lemma 1, we can prove that there exists a subsequence $\{x^{n_k}(t)\}_{k\ge 0}$ converging to a continuous function $x(t)$ satisfying the following ODE:
$$\dot{x}(t) = f(x(t)). \qquad (28)$$
By A3.2 and Lemma 2, we can further prove that the equilibrium point $x^0$ is globally stable and that the estimates $\{x_k\}_{k\ge 0}$ converge to $x^0$ on the sample path $\omega$. □
The essential idea of the above proof is to transform the convergence analysis of a recursive algorithm into the stability analysis of an associated ODE, and thus the analysis method is called the ODE method. From assumption A3.3 we can see that the ODE method has wider application potential than PM. However, the boundedness assumption on the estimate sequence is still restrictive. Aiming at removing this condition and further relaxing the technical assumptions on the noise and on the root-seeking function, in the next section we introduce the SAA with expanding truncations (SAAWET) and its general convergence theorem.

2.3. SAAWET and Trajectory-Subsequence Method

We first give an example.
Example 3. ([5]) Consider root-seeking for the function $f(x) = -(x - 10)^3$ and assume that the observation noise $\varepsilon_k \equiv 0$.
(i)
If we choose the initial value $x_0 = 0$ and the stepsize $a_k = \frac{1}{k+1}$, it can be directly verified that the estimates generated by (2) diverge:
$$x_0 = 0, \quad x_1 = 1000, \quad x_2 = -485{,}148{,}500, \quad x_3 \approx 3.8 \times 10^{25}, \ldots$$
(ii)
If we choose the initial value $x_0 = 0$ and the stepsize $a_k = \frac{1}{k + 10^3}$, it can be directly verified that the estimate sequence generated by (2) converges, that is, $x_k \xrightarrow[k\to\infty]{} 10$.
For Example 3, neither the growth rate condition on $f(\cdot)$ used by PM nor the boundedness condition on the estimate sequence used by the ODE method holds. On the other hand, if the stepsizes are chosen in an adaptive manner, the estimates generated by SAA may still converge. The core idea of SAAWET consists in adaptively choosing the stepsizes. Let us describe it.
Denote by $J$ the root set of the unknown function $f(\cdot): \mathbb{R}^l \to \mathbb{R}^l$, that is, $f(x^0) = 0$ for all $x^0 \in J$. Choose a positive sequence $\{M_k\}_{k\ge 0}$ increasingly diverging to infinity, $M_k \xrightarrow[k\to\infty]{} \infty$. With an arbitrary initial value $x_0 \in \mathbb{R}^l$, the estimate sequence $\{x_k\}_{k\ge 1}$ is generated by the following SAAWET:
$$x_{k+1} = (x_k + a_k y_{k+1})\, I_{[\|x_k + a_k y_{k+1}\| \le M_{\sigma_k}]} + x^*\, I_{[\|x_k + a_k y_{k+1}\| > M_{\sigma_k}]}, \qquad (29)$$
$$\sigma_k = \sigma_{k-1} + I_{[\|x_{k-1} + a_{k-1} y_k\| > M_{\sigma_{k-1}}]}, \quad \sigma_0 = 0, \qquad (30)$$
where
$$y_{k+1} = f(x_k) + \varepsilon_{k+1}, \qquad (31)$$
$y_{k+1}$ is the observation of $f(\cdot)$ at $x_k$, $\varepsilon_{k+1}$ is the observation noise, $a_k$ is the stepsize, and $x_k$ is the estimate at time $k$ for the root set of $f(\cdot)$.
From (29) and (30), we see that if the magnitude of $x_k + a_k y_{k+1}$ is within the truncation bound, then the algorithm evolves as the classical SAA (2), while if the magnitude of $x_k + a_k y_{k+1}$ exceeds the truncation bound, then $x_{k+1}$ is pulled back to $x^*$, the truncation bound is enlarged, and the next recursion restarts from $x^*$ with a smaller stepsize.
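The recursion (29) and (30) can be sketched as follows; the bound sequence $M_j = 2^{j+1}$, the reset point $x^* = 0$, and the reuse of the noise-free function from Example 3 are our own illustrative choices, not from the paper.

```python
def saawet(observe, a, M, x_star=0.0, x0=0.0, n_iter=20000):
    """SAAWET (29)-(30) for a scalar state: if |x_k + a_k y_{k+1}| exceeds
    the current truncation bound M(sigma), reset the estimate to x_star
    and expand the bound; otherwise evolve as the classical SAA (2)."""
    x, sigma = x0, 0
    for k in range(n_iter):
        y = observe(x)                 # here noise-free: y_{k+1} = f(x_k)
        cand = x + a(k) * y
        if abs(cand) > M(sigma):
            x, sigma = x_star, sigma + 1   # truncation: reset and enlarge bound
        else:
            x = cand
    return x

# Example 3 revisited: with a_k = 1/(k+1) the plain SAA diverges, while
# SAAWET with expanding bounds M_j = 2^(j+1) keeps the iterates under control.
x_hat = saawet(lambda x: -(x - 10.0) ** 3,
               a=lambda k: 1.0 / (k + 1),
               M=lambda j: 2.0 ** (j + 1))
```

In this run the truncations stop after finitely many resets, after which the recursion behaves like the classical SAA near the root, matching the behavior stated in Theorem 3 below.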
We first list conditions for theoretical analysis.
A4.1
$a_k > 0$, $a_k \xrightarrow[k\to\infty]{} 0$, $\sum_{k=1}^{\infty} a_k = \infty$;
A4.2
There exists a continuously differentiable function $v(\cdot): \mathbb{R}^l \to \mathbb{R}$ such that
(i)
$\sup_{\delta \le d(x, J) \le \Delta} \nabla v(x)^T f(x) < 0$ for any $0 < \delta < \Delta$;
(ii)
the set $v(J) \triangleq \{v(x), x \in J\}$ is nowhere dense;
(iii)
for $x^* \in \mathbb{R}^l$ in (29), there is a $c_0 > 0$ such that $\|x^*\| < c_0$ and $v(x^*) < \inf_{\|x\| = c_0} v(x)$;
A4.3
On the sample path $\omega$, for any convergent subsequence $\{x_{n_k}\}_{k\ge 1}$ generated by (29) and (30), it holds that
$$\lim_{T \to 0} \limsup_{k \to \infty} \frac{1}{T} \Big\| \sum_{i=n_k}^{m(n_k, T_k)} a_i \varepsilon_{i+1} \Big\| = 0, \quad \forall\, T_k \in (0, T], \qquad (32)$$
with $m(n_k, T_k) \triangleq \max\{m : \sum_{i=n_k}^{m} a_i \le T_k\}$;
A4.4
The function $f(\cdot)$ is measurable and locally bounded.
Remark 5.
A set is nowhere dense if the interior of its closure is empty; clearly, a singleton is nowhere dense. Note that the noise condition A4.3 only requires verifying (32) on a fixed sample path and along the indices of any convergent subsequence, which, compared with (24), is much easier to verify in practice. In this regard, the analysis method for SAAWET is called the trajectory-subsequence (TS) method.
Theorem 3.
([5]) For SAAWET (29) and (30), if A4.1, A4.2, and A4.4 hold, then on the sample path $\omega$ where A4.3 takes place, it holds that
$$d(x_k, J) \xrightarrow[k\to\infty]{} 0, \qquad (33)$$
where $d(x, J) \triangleq \inf_{y \in J} \|x - y\|$. Denote by $\bar{J}$ the closure of $J$ and by $J^*$ the set of limit points of $\{x_k\}$. It holds that
$$d(x_k, J^*) \xrightarrow[k\to\infty]{} 0, \qquad (34)$$
and $J^*$ is a closed connected subset of $\bar{J}$.
It is direct to verify that if the root set of $f(x) = 0$ is a singleton $\{x^0\}$, then under the conditions of Theorem 3 we have $d(x_k, x^0) \xrightarrow[k\to\infty]{} 0$ and $J^* = J = \{x^0\}$.
Proof. 
We outline the proof.
Denote by $\{x_{n_k}\}_{k\ge 1}$ a convergent subsequence generated by (29) and (30).
Step 1.
By A4.3, prove that for all $k$ large enough and $t > 0$ small enough, there is no truncation for $x_l$, $l \in \{n_k, \ldots, m(n_k, t)\}$, that is,
$$x_{l+1} = x_l + a_l y_{l+1}, \quad l \in \{n_k, \ldots, m(n_k, t)\}, \qquad (35)$$
and
$$\|x_{l+1} - x_l\| \le c\, t, \quad l \in \{n_k, \ldots, m(n_k, t)\}. \qquad (36)$$
Step 2.
By the stability condition A4.2 together with (35) and (36), prove that the number of truncations of SAAWET is finite, that is, algorithms (29) and (30) evolve as the classical SAA after a finite number of steps.
Step 3.
Establish the convergence of $\{x_k\}_{k\ge 0}$. □
Remark 6.
If it is known a priori that the estimates $\{x_k\}_{k\ge 0}$ generated by (29) and (30) lie in a subspace $S$ of $\mathbb{R}^l$, then for the convergence of $\{x_k\}_{k\ge 0}$ we only need to verify the stability condition on $J \cap S$, and in such a case assumption A4.2 is formulated as follows:
A4.2’ 
There exists a continuously differentiable function $v(\cdot): \mathbb{R}^l \to \mathbb{R}$ such that
(i)
$\sup_{\delta \le d(x, J \cap S) \le \Delta,\ x \in S} \nabla v(x)^T f(x) < 0$ for any $0 < \delta < \Delta$;
(ii)
the set $v(J \cap S) \triangleq \{v(x), x \in J \cap S\}$ is nowhere dense;
(iii)
for $x^* \in S$ in (29), there exists a constant $c_0 > 0$ such that $\|x^*\| < c_0$ and $v(x^*) < \inf_{\|x\| = c_0} v(x)$.
Example 4.
Consider the root-seeking problem in Example 3. Assume that $y_{k+1} = f(x_k) + \varepsilon_{k+1}$ with $\varepsilon_{k+1} - 0.9\,\varepsilon_k = w_{k+1} + 0.5\,w_k$ and with $\{w_k\}_{k\ge 1}$ a sequence of i.i.d. Gaussian variables, $w_k \sim \mathcal{N}(0, 0.1^2)$. In (29) and (30), choose $M_k = 2k + 1$ and $x^* = 0.5$. A direct calculation gives
$$x_{10} = 0.5, \quad x_{50} = 7.76, \quad x_{100} = 9.26, \quad x_{200} = 9.46, \quad x_{300} = 9.52, \quad x_{400} = 9.61.$$
From the above sections, we can see that SAAWET requires neither that the observation noise be purely stochastic nor that the estimate sequence be bounded. In fact, it can be shown that the conditions for convergence of SAAWET are, in a certain sense, the weakest possible [5]. In the next section, we apply SAAWET to derive a recursive algorithm for principal component analysis.

2.4. Recursive Principal Component Analysis Algorithm

Before introducing the recursive algorithm, we first make the following assumption.
A5.1
$\{A, A_k, k \ge 1\}$ are $n \times n$ symmetric matrices. $A$ is unknown, and $\{A_k\}_{k\ge 1}$ is an observation sequence of $A$ satisfying $A_k \xrightarrow[k\to\infty]{} A$.
The principal component analysis considered in this paper aims at estimating the eigenvalues and eigenvectors of $A$ based on the observation sequence $\{A_k\}_{k\ge 1}$. Since $A$ is unknown, performing an SVD or QR decomposition on each $A_k$, $k \ge 1$, would be rather time-consuming. In the following, we introduce a SAAWET-based recursive algorithm for solving the problem.
The eigenvectors of the matrix $A$ are recursively estimated as follows.
Choose the stepsize $a_k = \frac{1}{k}$ and an arbitrary initial value $u_0(1) \in \mathbb{R}^n$ with unit norm, and define
$$\tilde{u}_{k+1}(1) = u_k(1) - a_k A_k u_k(1).$$
If $\tilde{u}_{k+1}(1) \ne 0$, then
$$u_{k+1}(1) = \tilde{u}_{k+1}(1) / \|\tilde{u}_{k+1}(1)\|.$$
Otherwise, $\tilde{u}_{k+1}(1) = 0$ and $u_{k+1}(1)$ is reset to another vector with unit norm.
Assume that $u_k(i)$, $i = 1, \ldots, j$, have been well defined. Next, we inductively define the estimation algorithm for $u_k(j+1)$.
Define the $n \times j$ matrix
$$V_k(j) \triangleq \big[\, u_k(1),\ P_k(1) u_k(2),\ \ldots,\ P_k(j-1) u_k(j) \,\big],$$
where $P_k(i) \triangleq I - V_k(i) V_k(i)^+$, $i = 1, \ldots, j-1$, and $V_k(i)^+$ is the pseudo-inverse of $V_k(i)$.
With an arbitrary initial vector $u_0(j+1)$ of unit norm, define
$$\tilde{u}_{k+1}(j+1) = P_k(j) u_k(j+1) - a_k P_k(j) A_k P_k(j)\, u_k(j+1).$$
If $\tilde{u}_{k+1}(j+1) \ne 0$, then
$$u_{k+1}(j+1) = \tilde{u}_{k+1}(j+1) / \|\tilde{u}_{k+1}(j+1)\|.$$
Otherwise, $\tilde{u}_{k+1}(j+1) = 0$ and $u_{k+1}(j+1)$ is reset to another vector, still denoted by $u_{k+1}(j+1)$, satisfying $\|u_{k+1}(j+1)\| = 1$ and $V_k(j)^T u_{k+1}(j+1) = 0$.
$\{u_k(1), \ldots, u_k(n)\}$ serve as the estimates for the eigenvectors of the matrix $A$.
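The first-eigenvector recursion above can be sketched as follows; the toy observation sequence $A_k = A + E/k$ (which satisfies $A_k \to A$ as in A5.1), the specific matrices, and the iteration count are our own assumptions, and the zero-vector reset branch is not exercised in this run.

```python
import numpy as np

def first_eigvec(A_seq, n, n_iter):
    """Recursion for u_k(1): normalize u_k - a_k A_k u_k with a_k = 1/k.
    A_seq(k) returns the k-th symmetric observation matrix A_k."""
    rng = np.random.default_rng(0)
    u = rng.normal(size=n)
    u /= np.linalg.norm(u)
    for k in range(1, n_iter + 1):
        a_k = 1.0 / k
        u_tilde = u - a_k * (A_seq(k) @ u)
        nrm = np.linalg.norm(u_tilde)
        if nrm > 0:                      # reset branch of the algorithm omitted
            u = u_tilde / nrm
    return u

# Toy data (our choice): A_k = A + E/k, so A_k -> A with A symmetric.
A = np.diag([1.0, 3.0, 5.0])
E = np.full((3, 3), 0.2)
u_hat = first_eigvec(lambda k: A + E / k, n=3, n_iter=20000)
lam_hat = u_hat @ A @ u_hat              # Rayleigh quotient u^T A u
```

Consistent with Theorem 4 below, the unit-norm iterate settles at an eigenvector of $A$, and the Rayleigh quotient $u_k^T A u_k$ approaches the corresponding eigenvalue.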
Based on $\{u_k(1), \ldots, u_k(n)\}$, we introduce recursive algorithms for estimating the eigenvalues of the matrix $A$.
Choose a positive sequence $\{M_k\}_{k\ge 0}$ increasingly diverging to infinity, $M_{k+1} > M_k > 0$. With arbitrary initial values $\lambda_0(j)$, the estimates $\{\lambda_k(j)\}_{k\ge 1}$, $j = 1, \ldots, n$, are generated by the following algorithms:
$$\lambda_{k+1}(j) = \Big(\lambda_k(j) - a_k\big(\lambda_k(j) - u_k(j)^T A_k u_k(j)\big)\Big)\, I_{\big[\,\big|\lambda_k(j) - a_k(\lambda_k(j) - u_k(j)^T A_k u_k(j))\big| \le M_{\sigma_k(j)}\,\big]},$$
$$\sigma_k(j) = \sum_{i=1}^{k-1} I_{\big[\,\big|\lambda_i(j) - a_i(\lambda_i(j) - u_i(j)^T A_i u_i(j))\big| > M_{\sigma_i(j)}\,\big]}, \quad \sigma_0(j) = 0,$$
where $\lambda_k(j)$ is the estimate at time $k$ for the eigenvalue corresponding to the eigenvector estimated by $u_k(j)$.
Denote by $S$ the unit sphere in $\mathbb{R}^n$ and by $J$ the set of all unit eigenvectors of the matrix $A$. Denote the set of all eigenvalues of $A$ by $V(J)$, that is, $V(J) \triangleq \{\lambda(1), \ldots, \lambda(n)\}$.
For algorithms (39)–(42), the following result takes place.
Theorem 4.
Assume that A5.1 holds. Then:
(i)
There exists a closed subset $J_j^*$ of $J$ such that
$$d\big(u_k(j), J_j^*\big) \xrightarrow[k\to\infty]{} 0.$$
(ii)
There exists a $\lambda(j) \in V(J)$ such that
$$d\big(u_k(j)^T A\, u_k(j), \lambda(j)\big) \xrightarrow[k\to\infty]{} 0,$$
and for any $v(j) \in J_j^*$,
$$A\, v(j) = \lambda(j)\, v(j).$$
(iii)
$\lambda_k(j)$ converges to the eigenvalue $\lambda(j)$ of the matrix $A$, and the limit set of $\{\lambda_k(j), j = 1, \ldots, n\}$ coincides with $V(J)$.
Proof. 
Noting that $\{u_k(j)\}_{k\ge 0}$, $j = 1, \ldots, n$, all have unit norm, algorithms (39)–(42) are SAAWET. The proof is obtained by applying Theorem 3 and Remark 6. Due to space limitations, the detailed proof is omitted. □

3. Distributed Stochastic Approximation Algorithm

In this section, we introduce the distributed version of SAAWET (DSAAWET). The key difference from existing distributed SAAs (see, for example, [40]) lies in the expanding truncation mechanism, which adaptively defines the truncation bounds. The theoretical results on DSAAWET guarantee that the estimates generated for all agents converge almost surely to a consensus set contained in the root set of the sum function.
We first introduce the problem formulation.
Consider the case where all agents in a network cooperatively search the root of the sum function given by
$$f(\cdot) = \frac{1}{N} \sum_{i=1}^{N} f_i(\cdot),$$
where $f_i(\cdot): \mathbb{R}^l \to \mathbb{R}^l$ is the local objective function which can only be observed by agent $i$. Denote the root set of $f(\cdot)$ by $J \triangleq \{x \in \mathbb{R}^l : f(x) = 0\}$.
For any $i \in \mathcal{V}$, denote by $x_{i,k} \in \mathbb{R}^l$ the estimate given by agent $i$ at time $k$ for the root of $f(\cdot)$. At time $k+1$, agent $i$ only has its local noisy observation
$$O_{i,k+1} = f_i(x_{i,k}) + \varepsilon_{i,k+1}, \qquad (45)$$
where $\varepsilon_{i,k+1}$ is the observation noise. Hence, to estimate the root of $f(\cdot)$, all agents need to exchange information with their adjacent neighbors via the topology of the network.
The topology of the network at time $k$ is described by a digraph $\mathcal{G}(k) = \{\mathcal{V}, \mathcal{E}(k)\}$, where $\mathcal{V} = \{1, \ldots, N\}$ is the index set of all agents and $\mathcal{E}(k) \subset \mathcal{V} \times \mathcal{V}$ is the edge set, with $(j, i) \in \mathcal{E}(k)$ representing the information flow from agent $j$ to agent $i$ at time $k$. Denote the adjacency matrix of the network at time $k$ by $W(k) = [\omega_{ij}(k)]_{i,j=1}^{N}$, where $\omega_{ij}(k) > 0$ if and only if $(j, i) \in \mathcal{E}(k)$, and $\omega_{ij}(k) = 0$ otherwise. Denote by $N_i(k) = \{j \in \mathcal{V} : (j, i) \in \mathcal{E}(k)\}$ the set of neighbors of agent $i$ at time $k$.
We apply the idea of expanding truncations to distributed estimation. Denote by $x_{i,k}$ the estimate of agent $i$ at time $k$ and by $\sigma_{i,k}$ the number of truncations of agent $i$ up to time $k$. DSAAWET is given by (46)–(49) with arbitrary initial values $x_{i,0}$, where $O_{i,k+1}$ is defined by (45), $\{\gamma_k\}$ is the stepsize sequence, $x^* \in \mathbb{R}^l$ is a vector known to all agents, and $\{M_k\}_{k\ge 0}$ is a positive sequence increasingly diverging to infinity with $M_0 \ge \|x^*\|$.
$$\sigma_{i,0} = 0, \quad \hat{\sigma}_{i,k} \triangleq \max_{j \in N_i(k)} \sigma_{j,k}, \qquad (46)$$
$$x'_{i,k+1} = \sum_{j \in N_i(k)} \omega_{ij}(k)\big( x_{j,k}\, I_{[\sigma_{j,k} = \hat{\sigma}_{i,k}]} + x^*\, I_{[\sigma_{j,k} < \hat{\sigma}_{i,k}]} \big) + \gamma_k O_{i,k+1}\, I_{[\sigma_{i,k} = \hat{\sigma}_{i,k}]} + x^*\, I_{[\sigma_{i,k} < \hat{\sigma}_{i,k}]}, \qquad (47)$$
$$x_{i,k+1} = x^*\, I_{[\|x'_{i,k+1}\| > M_{\hat{\sigma}_{i,k}}]} + x'_{i,k+1}\, I_{[\|x'_{i,k+1}\| \le M_{\hat{\sigma}_{i,k}}]}, \qquad (48)$$
$$\sigma_{i,k+1} = \hat{\sigma}_{i,k} + I_{[\|x'_{i,k+1}\| > M_{\hat{\sigma}_{i,k}}]}. \qquad (49)$$
Denote by
$$\sigma_k \triangleq \max_{i \in \mathcal{V}} \sigma_{i,k}$$
the largest truncation number among all agents at time $k$. If $N = 1$, then, denoting $\hat{\sigma}_{i,k} = \sigma_{i,k} \equiv \sigma_k$, $x_{i,k} \equiv x_k$, $x'_{i,k} \equiv x'_k$, and $O_{i,k} \equiv O_k$, (46)–(49) become
$$x'_{k+1} = x_k + \gamma_k O_{k+1}, \quad x_{k+1} = x^*\, I_{[\|x'_{k+1}\| > M_{\sigma_k}]} + x'_{k+1}\, I_{[\|x'_{k+1}\| \le M_{\sigma_k}]}, \quad \sigma_{k+1} = \sigma_k + I_{[\|x'_{k+1}\| > M_{\sigma_k}]},$$
which is exactly SAAWET in the centralized setting.
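A toy sketch of (46)–(49) for scalar states may help fix ideas; the complete-graph doubly stochastic weights, the bound sequence $M_j = 2^{j+1}$, the reset point $x^* = 0$, and the Gaussian observation noise are all our own choices. Each agent $i$ observes only $f_i(x) = c_i - x$, so the root of the averaged function is $\mathrm{mean}(c)$.

```python
import random

def dsaawet(local_fns, W, x_star=0.0, n_iter=20000, noise=0.1, seed=1):
    """DSAAWET (46)-(49) for scalar states: weighted consensus over
    neighbors, a local noisy observation if the agent's truncation count
    is up to date, and expanding truncation bounds M_j = 2^(j+1)."""
    rng = random.Random(seed)
    N = len(local_fns)
    x = [0.0] * N              # estimates x_{i,k}
    sigma = [0] * N            # truncation counts sigma_{i,k}
    M = lambda j: 2.0 ** (j + 1)
    for k in range(n_iter):
        gamma = 1.0 / (k + 1)
        # (46): largest truncation count among neighbors (W[i][j] > 0)
        sig_hat = [max(sigma[j] for j in range(N) if W[i][j] > 0)
                   for i in range(N)]
        x_new, s_new = [0.0] * N, [0] * N
        for i in range(N):
            # (47): lagging neighbors contribute x_star instead of x_j
            cand = sum(W[i][j] * (x[j] if sigma[j] == sig_hat[i] else x_star)
                       for j in range(N) if W[i][j] > 0)
            if sigma[i] == sig_hat[i]:
                cand += gamma * (local_fns[i](x[i]) + rng.gauss(0.0, noise))
            else:
                cand += x_star
            # (48)-(49): expanding truncation
            if abs(cand) > M(sig_hat[i]):
                x_new[i], s_new[i] = x_star, sig_hat[i] + 1
            else:
                x_new[i], s_new[i] = cand, sig_hat[i]
        x, sigma = x_new, s_new
    return x

# Three agents with f_i(x) = c_i - x over a complete graph; the root of the
# averaged function f = (1/N) sum_i f_i is mean(c) = 5.
c = [2.0, 5.0, 8.0]
W = [[0.50, 0.25, 0.25],
     [0.25, 0.50, 0.25],
     [0.25, 0.25, 0.50]]       # symmetric, hence doubly stochastic (A5.4(a))
x_final = dsaawet([lambda x, ci=ci: ci - x for ci in c], W)
```

In this run all three estimates settle near 5, the root of the sum function, even though no single agent can locate it from its own observations alone.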
We first introduce assumptions to be used.
A5.1
$\gamma_k > 0$, $\gamma_k \xrightarrow[k\to\infty]{} 0$, and $\sum_{k=1}^{\infty} \gamma_k = \infty$.
A5.2
There exists a continuously differentiable function $v(\cdot): \mathbb{R}^l \to \mathbb{R}$ such that
(a) $\sup_{\delta \le d(x, J) \le \Delta} f^T(x)\, v_x(x) < 0$ for any $\Delta > \delta > 0$, where $v_x(\cdot)$ denotes the gradient of $v(\cdot)$ and $d(x, J) = \inf_y \{\|x - y\| : y \in J\}$;
(b) $v(J) \triangleq \{v(x) : x \in J\}$ is nowhere dense;
(c) $\|x^*\| < c_0$ and $v(x^*) < \inf_{\|x\| = c_0} v(x)$ for some positive constant $c_0$, where $x^*$ is used in (47) and (48).
A5.3
The local functions $f_i(\cdot)$, $i \in \mathcal{V}$, are continuous.
A5.4
(a) $\{W(k)\}_{k\ge 0}$ are doubly stochastic matrices (a matrix is doubly stochastic if all its entries are nonnegative and the entries of each row and of each column sum to 1);
(b) there exists a constant $0 < \eta < 1$ such that
$$\omega_{ij}(k) \ge \eta \quad \forall\, j \in N_i(k),\ \forall\, i \in \mathcal{V},\ \forall\, k \ge 0;$$
(c) the digraph $\mathcal{G}_{\infty} = \{\mathcal{V}, \mathcal{E}_{\infty}\}$ is strongly connected, where $\mathcal{E}_{\infty}$ is the set of edges $(j, i)$ such that $j$ is a neighbor of $i$ communicating with $i$ infinitely often, that is,
$$\mathcal{E}_{\infty} = \{(j, i) : (j, i) \in \mathcal{E}(k) \text{ for infinitely many indices } k\};$$
(d) there exists a positive integer $B$ such that for every $(j, i) \in \mathcal{E}_{\infty}$, agent $j$ sends information to its neighbor $i$ at least once every $B$ consecutive time slots, that is,
$$(j, i) \in \mathcal{E}(k) \cup \mathcal{E}(k+1) \cup \cdots \cup \mathcal{E}(k + B - 1)$$
for all $(j, i) \in \mathcal{E}_{\infty}$ and any $k \ge 0$.
Remark 7.
Assumptions A5.1–A5.3 are similar to those conditions used in the centralized SAAWET. Since the DSAAWET is in a distributed setting, assumption A5.4 is a commonly used condition describing the topology of the network [34].
Next, we introduce the condition on the observation noises for each agent.
A5.5
For the sample path $\omega$ under consideration and for each agent $i$:
(a) $\gamma_k \varepsilon_{i,k+1} \xrightarrow[k\to\infty]{} 0$;
(b) $\lim_{T \to 0} \limsup_{k \to \infty} \frac{1}{T} \big\| \sum_{m=n_k}^{m(n_k, t_k)} \gamma_m \varepsilon_{i,m+1}\, I_{[\|x_{i,m}\| \le K]} \big\| = 0$ for any $t_k \in (0, T]$ and any constant $K > 0$, along indices $\{n_k\}$ whenever $\{x_{i,n_k}\}$ converges.
We now give the main results of DSAAWET.
Define the vectors $X_k \triangleq \mathrm{col}\{x_{1,k}, \ldots, x_{N,k}\}$, $\varepsilon_k \triangleq \mathrm{col}\{\varepsilon_{1,k}, \ldots, \varepsilon_{N,k}\}$, and $F(X_k) \triangleq \mathrm{col}\{f_1(x_{1,k}), \ldots, f_N(x_{N,k})\}$. Denote by $X_{\perp,k} \triangleq D_{\perp} X_k$ the disagreement vector of $X_k$, where $D_{\perp} \triangleq \big(I_N - \frac{\mathbf{1}\mathbf{1}^T}{N}\big) \otimes I_l$, and by $\bar{x}_k = \frac{1}{N} \sum_{i=1}^{N} x_{i,k}$ the average of the estimates of all agents at time $k$.
Theorem 5.
([44]) Let $\{x_{i,k}\}$ be produced by (46)–(49) with arbitrary initial values $x_{i,0}$. Assume that A5.1–A5.4 hold. Then, on the sample path $\omega$ where A5.5 holds, the following assertions take place:
(i)
$\{x_{i,k}\}$ is bounded, and there exists a positive integer $k_0$, possibly depending on $\omega$, such that
$$x_{i,k+1} = \sum_{j \in N_i(k)} \omega_{ij}(k)\, x_{j,k} + \gamma_k O_{i,k+1}, \quad \forall\, k \ge k_0,$$
or, in compact form,
$$X_{k+1} = (W(k) \otimes I_l) X_k + \gamma_k (F(X_k) + \varepsilon_{k+1}), \quad \forall\, k \ge k_0;$$
(ii)
$X_{\perp,k} \xrightarrow[k\to\infty]{} 0$ and $d(\bar{x}_k, J) \xrightarrow[k\to\infty]{} 0$;
(iii)
there exists a connected subset $J^* \subset J$ such that
$$d(\bar{x}_k, J^*) \xrightarrow[k\to\infty]{} 0.$$
Proof. 
The proof proceeds by first proving that the number of truncations is finite, then establishing consensus of the estimates, and finally their convergence. The detailed proof can be found in [44]. □
DSAAWET provides a promising tool for solving distributed problems over networks. In fact, DSAAWET has been successfully applied to the following problems:
  • Distributed identification of linear systems [48],
  • Distributed blind identification of communication channels [49],
  • Output consensus of networked Hammerstein and Wiener systems [38].

4. Concluding Remarks

In this paper, we briefly reviewed the historical development of SAA and, in particular, introduced SAAWET and its distributed version (DSAAWET) developed by Han-Fu Chen and his colleagues during the past 35 years. SAAWET and DSAAWET establish general convergence results for root-seeking problems with noisy observations. Since many problems, such as identification, estimation, adaptive control, and optimization, can be transformed into root-seeking problems, SAAWET and DSAAWET can hopefully provide powerful tools for solving them.

Funding

This work was supported by the National Key Research and Development Program of China under Grant 2018YFA0703800 and the National Natural Science Foundation of China under Grant 61822312.

Data Availability Statement

Not applicable.

Conflicts of Interest

The author declares no conflict of interest.

References

  1. Robbins, H.; Monro, S. A stochastic approximation method. Ann. Math. Stat. 1951, 22, 400–407. [Google Scholar] [CrossRef]
  2. Abadie, J.; Dekhli, F. A variant of the CESTAC method and its application to constrained optimization. Math. Comput. Simul. 1988, 30, 519–529. [Google Scholar] [CrossRef]
  3. Dickson, J.A.; McLeod, R.D.; Card, H.C. Stochastic arithmetic implementations of neural networks with in situ learning. IEEE Int. Conf. Neural Netw. 1993, 711–716. [Google Scholar] [CrossRef]
  4. Hua, H.; Wei, Z.; Qin, Y.; Wang, T.; Li, L.; Cao, J. A review of distributed control and optimization in energy internet: From traditional methods to artificial intelligence-based methods. IET Cyber-Phys. Syst. Theory Appl. 2021, 1–17. [Google Scholar] [CrossRef]
  5. Chen, H.-F. Stochastic Approximation and Its Applications; Kluwer: Dordrecht, The Netherlands, 2002. [Google Scholar]
  6. Kushner, H.J.; Clark, D.S. Stochastic Approximation for Constrained and Unconstrained Systems; Springer: New York, NY, USA, 1978. [Google Scholar]
  7. Kushner, H.J.; Yin, G. Stochastic Approximation Algorithms and Applications; Springer: New York, NY, USA, 1997. [Google Scholar]
  8. Kushner, H.J. Approximation and Weak Convergence Methods for Random Processes with Applications to Stochastic Systems Theory; MIT Press: Cambridge, MA, USA, 1984. [Google Scholar]
  9. Benaim, M. A dynamical systems approach to stochastic approximation. SIAM J. Control Optim. 1996, 34, 437–472. [Google Scholar] [CrossRef]
  10. Chen, H.-F.; Uosaki, K. Convergence analysis of dynamic stochastic approximation. Syst. Control. Lett. 1998, 35, 309–315. [Google Scholar] [CrossRef]
  11. Dupac, V. A dynamic stochastic approximation method. Ann. Math. Stat. 1965, 36, 1695–1702. [Google Scholar] [CrossRef]
  12. Uosaki, K. Some generalizations of dynamic stochastic approximation processes. Ann. Stat. 1974, 2, 1042–1048. [Google Scholar] [CrossRef]
  13. Chen, H.-F.; Guo, L.; Gao, A.J. Convergence and robustness of the Robbins-Monro algorithm truncated at randomly varying bounds. Stoch. Process. Their Appl. 1988, 27, 217–231. [Google Scholar] [CrossRef] [Green Version]
  14. Fang, H.-T.; Chen, H.-F. Stability and instability of limit points of stochastic approximation algorithms. IEEE Trans. Autom. Control. 2000, 45, 413–420. [Google Scholar] [CrossRef]
  15. Chen, H.-F.; Zhu, Y.M. Stochastic Approximation; Scientific and Technological Publisher: Shanghai, China, 1996. (In Chinese) [Google Scholar]
  16. Chen, H.-F. Convergence rate of stochastic approximation algorithms in the degenerate case. SIAM J. Control Optim. 1998, 36, 100–114. [Google Scholar] [CrossRef]
  17. Fabian, V. On asymptotically efficient recursive estimation. Ann. Stat. 1978, 6, 854–856. [Google Scholar] [CrossRef]
  18. Fang, H.-T.; Chen, H.-F. Sharp convergence rates of stochastic approximation for degenerate roots. Sci. China 1998, 41, 383–392. [Google Scholar] [CrossRef]
  19. Ljung, L.; Pflug, G.; Walk, H. Stochastic Approximation and Optimization of Random Processes; Birkhäuser: Basel, Switzerland, 1992. [Google Scholar]
  20. Nevelson, M.B.; Khasminskii, R.Z. Stochastic approximation and recursive estimation. In Translation of Math. Monographs; American Mathematical Society: Providence, RI, USA, 1976; Volume 47. [Google Scholar]
  21. Ljung, L. On positive real transfer functions and the convergence of some recursive schemes. IEEE Trans. Autom. Control. 1977, 22, 539–551. [Google Scholar] [CrossRef]
  22. Chen, H.-F.; Zhu, Y.M. Stochastic approximation procedures with randomly varying truncations. Sci. Sin. 1986, 29, 914–926. [Google Scholar]
  23. Chen, H.-F. Stochastic approximation and its new applications. In Proceedings of the 1994 Hong Kong International Workshop on New Directions of Control and Manufacturing, Hong Kong, China, 7–9 November 1994; pp. 2–12. [Google Scholar]
  24. Chen, H.-F. New approach to recursive identification for ARMA systems. IEEE Trans. Autom. Control. 2010, 55, 868–879. [Google Scholar] [CrossRef]
  25. Zhao, W.X.; Chen, H.-F. Identification of Wiener, Hammerstein, and NARX systems as Markov chains with improved estimates for their nonlinearities. Syst. Control. Lett. 2012, 61, 1175–1186. [Google Scholar] [CrossRef]
  26. Zhao, W.X.; Chen, H.-F. Markov chain approach to identifying Wiener systems. Sci. China Inf. Sci. 2012, 55, 1201–1217. [Google Scholar] [CrossRef]
  27. Chen, H.-F. Adaptive Regulator for Hammerstein and Wiener systems with noisy observations. IEEE Trans. Autom. Control. 2007, 4, 703–709. [Google Scholar] [CrossRef]
  28. Fang, H.-T.; Chen, H.-F.; Wen, L. On control of strong consensus for networked agents with noisy observations. J. Syst. Sci. Complex. 2012, 25, 1–12. [Google Scholar] [CrossRef]
  29. Zhao, W.X.; Chen, H.-F.; Fang, H.-T. Convergence of distributed randomized PageRank algorithms. IEEE Trans. Autom. Control. 2013, 58, 3255–3259. [Google Scholar] [CrossRef]
  30. Ren, W.; Beard, R.W. Consensus seeking in multiagent systems under dynamically changing interaction topologies. IEEE Trans. Autom. Control 2005, 50, 655–661. [Google Scholar] [CrossRef]
  31. Olfati-Saber, R.; Murray, R.M. Consensus problems in networks of agents with switching topology and time-delays. IEEE Trans. Autom. Control 2004, 49, 1520–1533. [Google Scholar] [CrossRef] [Green Version]
  32. Zhang, Q.; Zhang, J.F. Distributed parameter estimation over unreliable networks with Markovian switching topologies. IEEE Trans. Autom. Control 2012, 57, 2545–2560. [Google Scholar] [CrossRef]
  33. Lopes, C.G.; Sayed, A.H. Diffusion least-mean squares over adaptive networks: Formulation and performance analysis. IEEE Trans. Signal Process. 2008, 56, 3122–3136. [Google Scholar] [CrossRef]
  34. Nedic, A.; Ozdaglar, A. Distributed subgradient methods for multi-agent optimization. IEEE Trans. Autom. Control. 2009, 54, 48–61. [Google Scholar] [CrossRef]
  35. Nedic, A.; Ozdaglar, A.; Parrilo, P.A. Constrained consensus and optimization in multi-agent networks. IEEE Trans. Autom. Control. 2010, 55, 922–938. [Google Scholar] [CrossRef]
  36. Simonetto, A.; Leus, G. Distributed asynchronous time-varying constrained optimization. In Proceedings of the 48th Asilomar Conference on Signals, Systems, and Computers, Pacific Grove, CA, USA, 2–5 November 2014; pp. 2142–2146. [Google Scholar]
  37. Yi, P.; Hong, Y.; Liu, F. Initialization-free distributed algorithms for optimal resource allocation with feasibility constraints and application to economic dispatch of power systems. Automatica 2016, 74, 259–269. [Google Scholar] [CrossRef] [Green Version]
  38. Feng, W.; Chen, H.-F. Output Consensus of Networked Hammerstein and Wiener Systems. SIAM J. Control Optim. 2019, 57, 1230–1254. [Google Scholar] [CrossRef]
  39. Lei, J.; Chen, H.-F. Distributed randomized PageRank algorithm based on stochastic approximation. IEEE Trans. Autom. Control. 2015, 60, 1641–1646. [Google Scholar] [CrossRef]
  40. Bianchi, P.; Fort, G.; Hachem, W. Performance of a distributed stochastic approximation algorithm. IEEE Trans. Inf. Theory 2013, 59, 7405–7418. [Google Scholar] [CrossRef] [Green Version]
  41. Morral, G.; Bianchi, P.; Fort, G.; Jakubowicz, J. Distributed stochastic approximation: The price of non-double stochasticity. Proc. Asilomar 2012, 1473–1477. [Google Scholar]
  42. Borkar, V.S. Asynchronous stochastic approximation. SIAM J. Control Optim. 1998, 36, 840–851. [Google Scholar] [CrossRef] [Green Version]
  43. Lei, J.; Fang, H.T.; Chen, H.-F. Asymptotic properties of primal-dual algorithm for distributed stochastic optimization over random networks with imperfect communications. SIAM J. Control Optim. 2018, 56, 2159–2188. [Google Scholar] [CrossRef]
  44. Lei, J.; Chen, H.-F. Distributed stochastic approximation algorithm with expanding truncations. IEEE Trans. Autom. Control. 2020, 65, 664–679. [Google Scholar] [CrossRef]
  45. Blum, J.R. Multidimensional stochastic approximation methods. Ann. Math. Stat. 1954, 25, 737–744. [Google Scholar] [CrossRef]
  46. Gladyshev, E.G. On stochastic approximation. Theory Probab. Appl. 1965, 10, 275–278. (In Russian) [Google Scholar] [CrossRef]
  47. Dunford, N.; Schwartz, J.T. Linear Operators, Part I: General Theory; Wiley-Interscience: New York, NY, USA, 1966. [Google Scholar]
  48. Lei, J.; Chen, H.-F. Distributed estimation for parameter in heterogeneous linear time-varying models with observations at network sensors. Commun. Inf. Syst. 2015, 15, 423–451. [Google Scholar] [CrossRef]
  49. Liu, R.; Chen, H.-F. Distributed and Recursive Blind Channel Identification to Sensor Networks. Control Theory Technol. 2017, 15, 274–287. [Google Scholar] [CrossRef]
Table 1. Notations.
‖v‖, ‖A‖: L₂ norm of vector v, matrix A
‖v‖₁, ‖A‖₁: L₁ norm of vector v, matrix A
I_m: m × m identity matrix
1: vector or matrix with all entries equal to 1
0: vector or matrix with all entries equal to 0
X^T: transpose of matrix X
col{x_1, …, x_m}: col{x_1, …, x_m} ≜ [x_1^T, …, x_m^T]^T, the stack of the vectors or matrices x_1, …, x_m
I_A(x): indicator function; I_A(x) = 1 if x ∈ A and I_A(x) = 0 otherwise
E[·]: expectation operator
m(k, T): m(k, T) ≜ max{ m : Σ_{i=k}^{m} a_i ≤ T }
σ_{i,k}: truncation number of agent i at time k
σ̂_{i,k}: σ̂_{i,k} ≜ max_{j ∈ N_i(k)} σ_{j,k}, where N_i(k) is the set of neighbors of agent i at time k
σ_k: largest truncation number of all agents at time k, that is, σ_k = max_{i ∈ V} σ_{i,k} = max_{i ∈ V} σ̂_{i,k}
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Zhao, W. An Introduction to Development of Centralized and Distributed Stochastic Approximation Algorithm with Expanding Truncations. Algorithms 2021, 14, 174. https://0-doi-org.brum.beds.ac.uk/10.3390/a14060174
