1. Introduction
With the development of information technology, the number of participating devices and data transmission rates have increased substantially in recent years. Solving problems in a wide range of signal processing applications is equivalent to solving a linear least squares (LS) problem [1]. These applications include adaptive antenna arrays [2], multi-user detection [3], multiple-input multiple-output (MIMO) detection [4], echo cancellation [5], equalization [6], and system identification [1,7,8,9]. If the channel information is known, the zero-forcing (ZF) and minimum mean-square error (MMSE) algorithms are popular choices in these applications. They are simple to implement but require a matrix inversion.
The matrix inversion requires $\mathcal{O}(U^3)$ arithmetic operations, where $U$ is the system size [10]. When the system size is large, the complexity of the matrix inversion is prohibitively high. As a result, several techniques have been proposed to solve systems of equations without inverting the matrix. Direct methods, for example Gaussian elimination, Cholesky factorization, and QR decomposition, also attain a complexity of $\mathcal{O}(U^3)$ [10,11]. Therefore, direct methods are difficult to implement in real-time signal processing and hardware applications, and using them to solve the linear equations of large sparse systems may be prohibitively expensive and infeasible in practice. Iterative methods, for example the conjugate gradient method and the steepest descent method, can achieve fast convergence; each iteration requires $\mathcal{O}(U^2)$ operations. Other iterative methods, for example Southwell's relaxation [12], Jacobi [10], and Gauss-Seidel [10], are coordinate descent approaches. Each iteration requires only $\mathcal{O}(U)$ operations, but convergence is slower. The number of iterations performed determines the computational complexity of these techniques. Iterative methods can solve problems with sparse structures more efficiently than direct methods [13]. Moreover, iterative methods can exploit a good initial guess of the expected result, which reduces the number of iterations required to obtain a solution [14]. However, some iterative methods involve multiplications or divisions, which are difficult to implement in hardware.
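As an illustration of the coordinate-wise iterative methods mentioned above, here is a minimal NumPy sketch of Gauss-Seidel (the example matrix and iteration count are ours). Note that each coordinate update costs $\mathcal{O}(U)$ operations but includes a division by the diagonal element, exactly the kind of operation that is awkward in hardware.

```python
import numpy as np

def gauss_seidel(A, b, n_iters=100):
    """Solve A x = b iteratively for a diagonally dominant matrix A.

    Each sweep updates one coordinate at a time using the latest
    values of the others; one coordinate update costs O(U) operations
    and uses a division by the diagonal entry A[i, i].
    """
    U = len(b)
    x = np.zeros(U)
    for _ in range(n_iters):
        for i in range(U):
            # Row product excluding the diagonal term.
            s = A[i, :] @ x - A[i, i] * x[i]
            x[i] = (b[i] - s) / A[i, i]
    return x

# Example: a small diagonally dominant system.
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 5.0, 2.0],
              [0.0, 2.0, 6.0]])
b = np.array([1.0, 2.0, 3.0])
x = gauss_seidel(A, b)
```

The DCD algorithm described next keeps the coordinate-descent structure while replacing the division by comparisons and bit-shifts.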
Prior research has investigated dichotomous coordinate descent (DCD) algorithms, which use a reduced number of operations to solve systems of equations. DCD-based algorithms are well known in many applications because the DCD algorithm avoids multiplications and divisions, which are expensive to implement. In [15,16,17], DCD iterations reduce the numerical complexity of affine projection algorithms for active noise control. In [18], DCD iterations solve the recursive least squares (RLS) matrix inversion problem using bit-shift operations. The DCD algorithm [14,19] is a non-stationary iterative method based on stationary CD techniques. Only $\mathcal{O}(U)$ additions are required in each iteration [11]. The DCD algorithm, therefore, is a good choice for real-time hardware implementations. Two DCD variants, the cyclic DCD and the leading DCD algorithm, were presented in [20]. In scenarios with sparse true solutions, such as multi-path channel estimation and detection in a multi-user system with a few unknown active users, the leading DCD algorithm converges faster than the cyclic DCD algorithm. The cyclic DCD algorithm is well suited to solving systems of equations that require a large number of iterations because it ultimately converges faster and reaches a lower error level than the leading DCD algorithm. However, the leading DCD algorithm has a faster convergence rate than the cyclic DCD algorithm in the initial iterations if the solution is not very sparse.
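The cyclic variant can be sketched as follows; this is a floating-point NumPy illustration of the classical cyclic DCD scheme (the parameter names `H`, `Mb`, `Nu` and the test system are ours, not taken from the paper). The step size runs through power-of-two values, so in fixed-point hardware every update reduces to comparisons, additions, and bit-shifts.

```python
import numpy as np

def cyclic_dcd(R, b, H=4.0, Mb=16, Nu=1000):
    """Cyclic DCD sketch for R x = b with R positive definite.

    H is the assumed amplitude range of the solution, Mb the number
    of bits (step-size halvings), Nu the maximum number of updates.
    """
    U = len(b)
    x = np.zeros(U)
    r = b.astype(float).copy()     # residual r = b - R x (x starts at 0)
    alpha = H / 2.0
    updates = 0
    for _ in range(Mb):            # one step-size level per bit
        again = True
        while again:               # repeat passes until no update succeeds
            again = False
            for n in range(U):
                # "Successful iteration" test: a comparison only, no division.
                if abs(r[n]) > (alpha / 2.0) * R[n, n]:
                    s = 1.0 if r[n] > 0 else -1.0
                    x[n] += s * alpha           # solution update: one addition
                    r -= s * alpha * R[:, n]    # residual update: U additions
                    updates += 1
                    again = True
                    if updates >= Nu:
                        return x
        alpha /= 2.0               # halving = bit-shift in fixed point
    return x

# Small positive definite test system with a known solution.
A = np.array([[1., 2., 0., 1.],
              [0., 1., 1., 0.],
              [2., 0., 1., 1.],
              [1., 1., 0., 2.]])
R = A.T @ A + np.eye(4)
x_true = np.array([1.0, -0.5, 0.25, 0.0])
b = R @ x_true
x_hat = cyclic_dcd(R, b)
```

The leading variant differs only in choosing, at each step, the coordinate with the largest residual magnitude instead of cycling through all of them.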
In this paper, we consider a combination of leading and cyclic DCD iterations and propose a leading-cyclic DCD algorithm for obtaining a sparse solution in systems requiring a large number of updates. The idea is that the leading DCD algorithm uses a small number of iterations to obtain the initial input for the cyclic DCD algorithm. The number of iterations used in the leading DCD algorithm is thoroughly investigated under different system matrix conditions and solution sparsities. For the proposed algorithm, we determine a range for the number of leading DCD iterations that produces an optimal number of updates. The results show that the proposed leading-cyclic DCD algorithm improves the convergence speed of the cyclic DCD algorithm and lowers its steady-state level.
Notation: Boldface uppercase and boldface lowercase letters represent matrices and vectors (e.g., $\mathbf{A}$ and $\mathbf{a}$). Elements of the matrix $\mathbf{A}$ and the vector $\mathbf{a}$ are denoted as $A_{i,j}$ and $a_i$, respectively. The $n$-th column of $\mathbf{A}$ is denoted as $\mathbf{a}_n$. $(\cdot)^{T}$ denotes the matrix transpose, and $(\cdot)^{H}$ denotes the Hermitian transpose.
3. Proposed Leading-Cyclic DCD Algorithm
From the results above, we can see that the cyclic DCD algorithm is more suitable for solving a non-sparse system with a large condition number. However, when the percentage of non-zero elements of the solution is sufficiently small, the leading DCD algorithm provides faster convergence in the initial updates. To further improve the convergence when the solution is not highly sparse, we combine the leading and cyclic DCD algorithms and propose the leading-cyclic DCD algorithm shown in Table 3.
The proposed algorithm involves two steps. In the first step, an initial solution estimate is obtained using the leading DCD algorithm with a considerably smaller number of iterations than in the standard version of the leading DCD algorithm. In the second step, an improved soft output is generated by applying the cyclic DCD algorithm with a large number of iterations. The updated solution and residual vectors and the step size obtained from the leading DCD algorithm are the initial values for the cyclic DCD algorithm. Two factors determine the worst-case computational complexity of the leading-cyclic DCD algorithm. The cyclic DCD algorithm starts at the bit reached by the leading DCD step when updating an element of the solution. The upper bound on the computational complexity of the cyclic DCD algorithm is calculated as follows. Executing the initial bits does not include a successful iteration and thus requires only the comparison additions. Among the $U$ iterations of a pass, if there is only one successful iteration, then processing that pass requires $U$ additions for the comparisons plus the additions for updating the residual vector and the solution elements. Summing over all successful iterations gives the upper bound on the number of additions of the cyclic DCD algorithm. The worst case for the complexity of the leading DCD algorithm occurs when the algorithm uses all of its updates. The computational complexity of the leading-cyclic DCD algorithm is therefore upper-bounded by the sum of these two contributions in real-valued additions. In a practical situation, each pass contains several successful updates, and the average complexity of the leading-cyclic DCD algorithm is then close to that of the cyclic DCD algorithm.
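The two-step structure can be sketched as follows; this is our own minimal NumPy illustration of the leading-then-cyclic idea (the parameter names `Nl`, `Nu`, `H`, `Mb` and the test system are illustrative, not the paper's notation or settings).

```python
import numpy as np

def leading_cyclic_dcd(R, b, H=4.0, Mb=16, Nl=8, Nu=1000):
    """Sketch of a two-step leading-cyclic DCD scheme.

    Step 1: at most Nl leading DCD updates (always the coordinate with
    the largest residual magnitude) produce a rough estimate.
    Step 2: cyclic DCD continues from the inherited x, r and step size.
    """
    U = len(b)
    x = np.zeros(U)
    r = b.astype(float).copy()
    alpha, m = H / 2.0, 1

    # --- Step 1: leading DCD for at most Nl successful updates ---
    updates = 0
    while updates < Nl and m <= Mb:
        n = int(np.argmax(np.abs(r)))          # leading coordinate
        if abs(r[n]) > (alpha / 2.0) * R[n, n]:
            s = 1.0 if r[n] > 0 else -1.0
            x[n] += s * alpha
            r -= s * alpha * R[:, n]
            updates += 1
        else:
            alpha /= 2.0                       # no update possible: next bit
            m += 1

    # --- Step 2: cyclic DCD inherits x, r and alpha ---
    while updates < Nu and m <= Mb:
        updated = False
        for n in range(U):
            if abs(r[n]) > (alpha / 2.0) * R[n, n]:
                s = 1.0 if r[n] > 0 else -1.0
                x[n] += s * alpha
                r -= s * alpha * R[:, n]
                updates += 1
                updated = True
        if not updated:
            alpha /= 2.0
            m += 1
    return x

# Small positive definite system with a sparse solution.
A = np.array([[1., 2., 0., 1.],
              [0., 1., 1., 0.],
              [2., 0., 1., 1.],
              [1., 1., 0., 2.]])
R = A.T @ A + np.eye(4)
x_true = np.array([1.0, 0.0, 0.0, -0.5])
b = R @ x_true
x_hat = leading_cyclic_dcd(R, b)
```

Because step 2 reuses the residual and step size left by step 1, no work is discarded at the handover, which is why the average complexity stays close to that of the cyclic DCD algorithm alone.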
The selection of the number of leading DCD iterations varies with the system matrix condition and the solution sparsity. The leading-cyclic algorithm is evaluated using different values of this number over a range of candidates; among them, the value that allows the proposed algorithm to achieve the fastest convergence to the steady-state level is chosen as the optimal one. Figure 5 shows the optimal number of leading DCD iterations in the leading-cyclic DCD algorithm for matrices with different solution sparsities.
The misalignments versus the number of successful iterations for large system matrices with condition numbers (≥100) and sparse solutions are shown in Figure 6, Figure 7, Figure 8, Figure 9, Figure 10 and Figure 11. By comparing the results of Figure 6 and Figure 7, we can see that the lower the condition number of the system matrix, the faster the convergence speed and the lower the steady-state level of these DCD algorithms. From these figures, we can also see that when the solution is highly sparse, the leading DCD algorithm has a significantly faster convergence speed and a lower steady-state misalignment than the cyclic DCD algorithm, and the leading-cyclic DCD algorithm has approximately the same steady-state level as the leading DCD algorithm. As the percentage of non-zero elements increases, the cyclic DCD algorithm gradually outperforms the leading DCD algorithm. The leading-cyclic DCD algorithm with the number of leading iterations given in Figure 5 has an increased convergence speed and a lower steady-state misalignment compared to the leading DCD algorithm. When the solution becomes nearly non-sparse, the leading-cyclic DCD algorithm exhibits only an insignificant increase in convergence speed compared to the cyclic DCD algorithm.
Figure 8, Figure 9, Figure 10 and Figure 11 show the misalignments of the DCD algorithms with condition numbers of the system matrix in the interval [300, 400]. We can see that as the percentage of non-zero elements of the solution increases, the convergence speed of the DCD algorithms decreases.
According to the numerical results above, the leading-cyclic DCD algorithm is an effective approach for solving a sparse system with a large number of updates. For example, in code division multiple access (CDMA) with a spreading factor Q, approximately half of the users are active in a U-user system. The columns of the system matrix are the spreading sequences, and the matrix to be inverted is the correlation matrix of the spreading sequences.
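A sketch of how such a CDMA test system can be generated (the dimensions and the seed are illustrative choices of ours, not the paper's simulation settings): random binary spreading sequences form the columns of $\mathbf{S}$, their normalized Gram matrix is the correlation matrix, and roughly half of the user symbols are non-zero.

```python
import numpy as np

rng = np.random.default_rng(0)
U, Q = 8, 16                               # users and spreading factor (illustrative)
S = rng.choice([-1.0, 1.0], size=(Q, U))   # columns are +/-1 spreading sequences
R = (S.T @ S) / Q                          # correlation matrix, unit diagonal

# Roughly half of the users are active, as in the scenario above.
x_true = np.zeros(U)
active = rng.choice(U, size=U // 2, replace=False)
x_true[active] = rng.choice([-1.0, 1.0], size=U // 2)
b = R @ x_true                             # right-hand side of the sparse system
```

The resulting system R x = b is exactly the sparse-solution setting in which the leading-cyclic DCD algorithm is advantageous.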
Figure 12 shows the misalignments of the RLS algorithm and the DCD-RLS algorithm (with the given forgetting factor, filter length, and number of DCD updates). When the system noise energy increases by a factor of 10 between iteration 1000 and iteration 1100, the DCD-RLS maintains a low error rate. The RLS based on the proposed algorithm provides a lower error level and better tracking performance than the standard RLS algorithm.