A Combined Method for MEMS Gyroscope Error Compensation Using a Long Short-Term Memory Network and Kalman Filter in Random Vibration Environments

Zhu, Chenhao; Cai, Sheng; Yang, Yifan; Xu, Wei; Shen, Honghai; Chu, Hairong

doi:10.3390/s21041181

¹ Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China

² University of Chinese Academy of Sciences, Beijing 100049, China

³ Key Laboratory of Airborne Optical Imaging and Measurement, Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China

* Author to whom correspondence should be addressed.

Sensors 2021, 21(4), 1181; https://0-doi-org.brum.beds.ac.uk/10.3390/s21041181

Submission received: 2 January 2021 / Revised: 1 February 2021 / Accepted: 4 February 2021 / Published: 8 February 2021

(This article belongs to the Collection Modeling, Testing and Reliability Issues in MEMS Engineering)

Abstract

In applications such as carrier attitude control and mobile device navigation, a micro-electro-mechanical-system (MEMS) gyroscope is inevitably exposed to random vibration, which significantly degrades its performance. To address this degradation, this paper proposes a combined method of a long short-term memory (LSTM) network and a Kalman filter (KF) for error compensation, in which the Kalman filter parameters are iteratively optimized using the Kalman smoother and the expectation-maximization (EM) algorithm. To verify the effectiveness of the proposed method, we performed a linear random vibration test to acquire MEMS gyroscope data. We then analyze the effects of input data step size and network topology on gyroscope error compensation performance. Furthermore, the autoregressive moving average-Kalman filter (ARMA-KF) model, which is commonly used in gyroscope error compensation, was also combined with the LSTM network as a comparison method. The results show that, for the x-axis data, the proposed combined method reduces the standard deviation (STD) by 51.58% and 31.92% compared to the bidirectional LSTM (BiLSTM) network and the EM-KF method, respectively. For the z-axis data, the proposed combined method reduces the standard deviation by 29.19% and 12.75% compared to the BiLSTM network and the EM-KF method, respectively. Furthermore, for the x-axis and z-axis data, the proposed combined method reduces the standard deviation by 46.54% and 22.30%, respectively, compared to the BiLSTM-ARMA-KF method, and the output is smoother, which supports the effectiveness of the proposed method.

Keywords: MEMS gyroscope; random vibration environments; long short-term memory network; Kalman filter; expectation-maximization algorithm

1. Introduction

Fiber optic gyroscopes and laser gyroscopes have excellent performance, but they are too large and expensive for portable devices [1,2]. Micro-electro-mechanical-system (MEMS) gyroscopes have, in recent years, been used in low-cost inertial navigation systems (INS) due to their small size and low cost. However, the MEMS gyroscope has a significant error due to the manufacturing technology and structural composition [3,4]. The error of the MEMS gyroscope can be divided into deterministic error and random error. The deterministic error mainly refers to perturbation errors such as zero offsets and the scale factor, which can be corrected by a calibration test [5,6]. Random error refers to the random drift caused by uncertain factors, usually determined by the device’s accuracy level [7], with no precise repeatability. Therefore, it is difficult to accurately compensate for random error, which hinders the further improvement of MEMS gyroscope performance.

In MEMS gyroscope error compensation research, the MEMS gyroscope data are generally treated as time-series data. Scholars have proposed methods such as the autoregressive moving average (ARMA) model, the Allan variance (AV), the wavelet threshold (WT), the support vector machine (SVM), and the artificial neural network (ANN), and all of them have achieved excellent results [7,8,9,10,11,12,13,14]. Recently, various variants of the recurrent neural network (RNN), which has strong processing power for time-series data, have been shown to be superior to traditional methods in the research of error compensation in MEMS gyroscopes [15,16,17,18].

However, most of the research mentioned above acquired data by placing the MEMS gyroscope in a static environment. In practical applications, the MEMS gyroscope is inevitably affected by random vibration [19]. In random vibration environments, the MEMS gyroscope is disturbed by both internal device noise and external vibration noise [20], which dramatically affects its performance. The degradation of performance in vibrating environments is a fatal problem for MEMS gyroscopes [21,22], so it is essential to research error compensation methods for random vibration environments.

Most current research on improving the performance of the MEMS gyroscope in random vibration environments fixes the gyroscope on a vibration isolation platform [23,24,25]. However, this kind of method is not universal [26]. Research based on time-series models is limited: the windowed measurement error covariance (WMEC) method has been applied to compensate for the effects of vibration environments [27], singular spectrum analysis (SSA) was proposed to remove the low-frequency vibration noise perturbations of MEMS accelerometers [28], and a third-order autoregressive (AR) model was used to establish the Kalman filter that compensates for the MEMS gyroscope's attitude angle error caused by random vibration [29].

Considering the dramatic perturbation of the MEMS gyroscope in random vibration environments, in this paper, a combined method of a long short-term memory (LSTM) network and Kalman filter is proposed for error compensation, with the Kalman smoother and expectation-maximization (EM) algorithm to dynamically adjust the predicted values of the LSTM network to improve the performance in error compensation. The main contributions of this paper are as follows:

(1): The combination of LSTM network and Kalman filter is applied to MEMS gyroscope error compensation in random vibration environments;
(2): The proper input data step and the network topology are explored, and the error compensation performance of the bidirectional LSTM (BiLSTM) network and other recurrent neural network (RNN) variants are compared;
(3): In designing the Kalman filter, the EM algorithm is used to estimate the parameters. It is compared with the ARMA model, a parameter estimation method commonly used in research of the MEMS gyroscope error compensation problem.

The remainder of this paper is organized as follows: (1) Section 2 introduces the methods, including the BiLSTM network, Kalman filter, ARMA-KF model, and EM-KF model, and illustrates the proposed method; (2) Section 3 presents the experiments, results, and comparisons; and (3) the remaining sections are the conclusion, appendix, and references.

2. Method

2.1. Multi-Layer BiLSTM Network and Kalman Filter

The long short-term memory network is a variant of the recurrent neural network used to solve the gradient vanishing or gradient explosion problem of RNNs [30,31]. A detailed description of LSTM units can be found in references [15,16,17,18].

The basic LSTM network only considers the historical and current inputs and ignores future inputs [32]. The BiLSTM network therefore also processes the sequence in reverse and superimposes the forward and reverse information flows, making full use of the inputs before and after the current time to improve the error compensation performance. In addition, the previous hidden layer's output is used as the input of the following layer to explore deeper features of the time-series data, thus enhancing the model's nonlinear fitting ability. The multi-layer BiLSTM network information flow is shown in Figure 1.

The gate activations and candidate cell state of the Layer $n$ BiLSTM network at time $t$ can be presented as:

$$\begin{bmatrix} i_{t}^{(n)} \\ f_{t}^{(n)} \\ o_{t}^{(n)} \\ \tilde{c}_{t}^{(n)} \end{bmatrix} = \begin{bmatrix} \sigma \\ \sigma \\ \sigma \\ \tanh \end{bmatrix} \left( \begin{bmatrix} W_{i,x}^{(n)} & W_{i,h}^{(n)} \\ W_{f,x}^{(n)} & W_{f,h}^{(n)} \\ W_{o,x}^{(n)} & W_{o,h}^{(n)} \\ W_{\tilde{c},x}^{(n)} & W_{\tilde{c},h}^{(n)} \end{bmatrix} \begin{bmatrix} h_{t}^{(n-1)} \\ h_{t-1}^{(n)} \end{bmatrix} \right) \tag{1}$$

where $h_{t}^{(n-1)}$ is the hidden state of Layer $n-1$ at time $t$ and $h_{t-1}^{(n)}$ is the hidden state of Layer $n$ at the previous time step. Each hidden state is composed of forward and reverse superposition:

$$h_{t}^{(n)} = \vec{h}_{t}^{(n)} \oplus \overleftarrow{h}_{t}^{(n)} \tag{2}$$
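As a rough illustration of Equations (1) and (2), the following NumPy sketch runs a two-layer bidirectional LSTM forward pass on a window of 20 samples. Several details are assumptions rather than the authors' implementation: the bias vector `b`, the standard cell update $c_t = f_t c_{t-1} + i_t \tilde{c}_t$ and $h_t = o_t \tanh(c_t)$ (not written out in the text), reading $\oplus$ as concatenation, the random weights, and the hidden size of 8. All function names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_pass(inputs, W, b, hidden):
    """Run one LSTM direction over inputs (T x D); return hidden states (T x hidden)."""
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    out = []
    for x_t in inputs:
        # Stacked pre-activations for i, f, o and the candidate state, as in Eq. (1).
        z = W @ np.concatenate([x_t, h]) + b
        i = sigmoid(z[:hidden])
        f = sigmoid(z[hidden:2 * hidden])
        o = sigmoid(z[2 * hidden:3 * hidden])
        c_tilde = np.tanh(z[3 * hidden:])
        c = f * c + i * c_tilde        # standard LSTM cell update (assumed)
        h = o * np.tanh(c)             # standard LSTM hidden state (assumed)
        out.append(h)
    return np.array(out)

def bilstm_layer(inputs, params_fwd, params_bwd, hidden):
    """Eq. (2): combine forward and time-reversed passes (here by concatenation)."""
    fwd = lstm_pass(inputs, *params_fwd, hidden)
    bwd = lstm_pass(inputs[::-1], *params_bwd, hidden)[::-1]
    return np.concatenate([fwd, bwd], axis=1)

def init_params(rng, in_dim, hidden):
    """Small random weights, for illustration only."""
    W = rng.normal(scale=0.1, size=(4 * hidden, in_dim + hidden))
    b = np.zeros(4 * hidden)
    return W, b

rng = np.random.default_rng(0)
T, D, HID = 20, 1, 8                   # window length, input size, hidden units
x = rng.normal(size=(T, D))
layer1 = bilstm_layer(x, init_params(rng, D, HID), init_params(rng, D, HID), HID)
# The second layer takes the first layer's output as its input sequence.
layer2 = bilstm_layer(layer1, init_params(rng, 2 * HID, HID),
                      init_params(rng, 2 * HID, HID), HID)
print(layer2.shape)
```

Stacking works because each layer returns one vector per time step, so the output of Layer $n-1$ can be fed to Layer $n$ unchanged.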

The Kalman filter is an optimal state estimation method that can be applied to dynamic systems with random disturbances. It estimates the system state from discrete measurements that contain noise [33,34]. Suppose the state–space model is built as:

$$\hat{x}_{k} = \Phi \hat{x}_{k-1} + \Gamma \omega_{k-1} \tag{3}$$

$$y_{k} = H x_{k} + v_{k} \tag{4}$$

where $\Phi$ is the system state transition matrix, $\Gamma$ is the system noise-driven matrix, $H$ is the measurement matrix, $\hat{x}_{k}$ is the system state vector, $y_{k}$ is the measurement vector, $\omega_{k}$ is the system noise vector, and $v_{k}$ is the measurement noise vector.

The system noise and the measurement noise are assumed to be normally distributed in the Kalman filter, such that [35]:

$$\omega_{k} \sim N(0, Q) \tag{5}$$

$$v_{k} \sim N(0, R) \tag{6}$$

where $Q$ is the covariance matrix of the system noise and $R$ is the covariance matrix of the measurement noise.

The Kalman filter consists of two stages. In the prediction stage, the current system state vector is predicted based on the system state vector at the previous time, such that:

$$\hat{x}_{k/k-1} = \Phi \hat{x}_{k-1} \tag{7}$$

$$P_{k/k-1} = \Phi P_{k-1} \Phi^{T} + \Gamma Q \Gamma^{T} \tag{8}$$

where $\hat{x}_{k/k-1}$ is the predicted value of the system state vector and $P_{k/k-1}$ is the predicted covariance matrix of the system state vector.

In the update stage of the Kalman filter, the current system state vector is updated by using the measurement vector, such that:

$$K_{k} = P_{k/k-1} H^{T} \left(H P_{k/k-1} H^{T} + R\right)^{-1} \tag{9}$$

$$\hat{x}_{k} = \hat{x}_{k/k-1} + K_{k} \left(y_{k} - H \hat{x}_{k/k-1}\right) \tag{10}$$

$$P_{k} = (I - K_{k} H) P_{k/k-1} \tag{11}$$

where $K_{k}$ is the Kalman filter gain matrix and $P_{k}$ is the updated covariance matrix of the system state vector.
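The prediction and update stages in Equations (7) to (11) map directly onto two small functions. The NumPy sketch below is illustrative only; the function names and the one-dimensional example values ($\Phi = \Gamma = H = Q = R = 1$, initial state 0, initial covariance 1, measurement 1) are assumptions chosen to keep the arithmetic easy to follow.

```python
import numpy as np

def kf_predict(x, P, Phi, Gamma, Q):
    """Prediction stage, Eqs. (7) and (8)."""
    x_pred = Phi @ x
    P_pred = Phi @ P @ Phi.T + Gamma @ Q @ Gamma.T
    return x_pred, P_pred

def kf_update(x_pred, P_pred, y, H, R):
    """Update stage, Eqs. (9) to (11)."""
    S = H @ P_pred @ H.T + R                      # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)           # Eq. (9)
    x = x_pred + K @ (y - H @ x_pred)             # Eq. (10)
    P = (np.eye(len(x_pred)) - K @ H) @ P_pred    # Eq. (11)
    return x, P, K

# One filter step on a scalar model written as 1x1 matrices.
Phi = Gamma = H = np.eye(1)
Q = R = np.eye(1)
x0, P0 = np.zeros(1), np.eye(1)
x_pred, P_pred = kf_predict(x0, P0, Phi, Gamma, Q)   # P_pred = 2
x1, P1, K1 = kf_update(x_pred, P_pred, np.array([1.0]), H, R)
print(x1, P1, K1)   # gain, state and covariance are all 2/3 for these values
```

With equal process and measurement noise, the predicted covariance of 2 against a measurement variance of 1 gives a gain of 2/3, so the estimate moves two thirds of the way toward the measurement.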

2.2. Kalman Filter Design with ARMA Model

The autoregressive moving average model is the most widespread model used in time-series analysis, and it is derived and developed on the basis of the linear regression model. The ARMA model can be described as [36]:

$$x_{t} = \sum_{i=1}^{p} \varphi_{i} x_{t-i} - \sum_{j=1}^{q} \theta_{j} \varepsilon_{t-j} + \varepsilon_{t}, \quad \varepsilon_{t} \sim N(0, \delta_{\varepsilon}^{2}) \tag{12}$$

where $p$ and $q$ are the orders of the ARMA model; $\varphi_{i}$ and $\theta_{j}$ are coefficients that satisfy the stationarity and invertibility conditions, respectively [37]; and $\varepsilon_{t}$ is white noise, which is an uncorrelated random variable with mean zero and constant variance. The model expresses that the measured value of the stochastic process $\{x_{t}\}$ at time $t$ is correlated with the previous $p$ measurements and the previous $q$ white noise terms.
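The recursion in Equation (12) can be made concrete with a short pure-Python sketch. The function name, the zero initial conditions, and the example coefficients are assumptions for illustration; note that the moving-average terms enter with a minus sign, as in Equation (12).

```python
import random

def simulate_arma(phi, theta, eps):
    """Generate x_t from Eq. (12) for a given noise sequence eps.

    phi   -- autoregressive coefficients [phi_1, ..., phi_p]
    theta -- moving-average coefficients [theta_1, ..., theta_q]
    Values before the start of the sequence are treated as zero.
    """
    x = []
    for t in range(len(eps)):
        ar = sum(phi[i] * x[t - 1 - i]
                 for i in range(len(phi)) if t - 1 - i >= 0)
        ma = sum(theta[j] * eps[t - 1 - j]
                 for j in range(len(theta)) if t - 1 - j >= 0)
        x.append(ar - ma + eps[t])
    return x

# Response of an ARMA(1,1) model to a unit impulse in the noise.
impulse = simulate_arma([0.5], [0.2], [1.0, 0.0, 0.0])
print(impulse)   # approximately [1.0, 0.3, 0.15]

# The same recursion driven by Gaussian white noise.
random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in range(200)]
series = simulate_arma([0.5], [0.2], noise)
```

The impulse response shows the two contributions separately: the second value is $0.5 \cdot 1 - 0.2 \cdot 1 = 0.3$, and from the third value onward only the autoregressive part remains.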

The steps to design a Kalman filter using the ARMA model are as follows: (1) test the stationarity and normality of the measurement data, (2) determine the model type according to the autocorrelation function and partial autocorrelation function, (3) determine the order and parameters of the model according to the Akaike information criterion (AIC) [38], and (4) perform adaptive testing of the designed model.

2.3. Kalman Filter Design with EM Algorithm

The expectation-maximization algorithm is an iterative method proposed by Shumway and Stoffer to compute maximum likelihood estimates based on incomplete data [39]. It is convergent and can identify parameters and states in the model [40]. Andrieu and Doucet introduced the EM algorithm for parameter estimation in linear state–space models [41]. The linear Gaussian state–space model used for the EM algorithm can be expressed as follows [42]:

$$x_{k} = \Phi x_{k-1} + \omega_{k-1} \tag{13}$$

$$y_{k} = H x_{k} + \upsilon_{k} \tag{14}$$

The conditional probability densities of the state equation and the measurement equation are obtained from Equations (13) and (14), respectively:

$$p(x_{k} \mid x_{k-1}) = \exp\left\{-\frac{1}{2} [x_{k} - \Phi x_{k-1}]^{T} Q^{-1} [x_{k} - \Phi x_{k-1}]\right\} (2\pi)^{-\frac{n}{2}} |Q|^{-\frac{1}{2}} \tag{15}$$

$$p(y_{k} \mid x_{k}) = \exp\left\{-\frac{1}{2} [y_{k} - H x_{k}]^{T} R^{-1} [y_{k} - H x_{k}]\right\} (2\pi)^{-\frac{m}{2}} |R|^{-\frac{1}{2}} \tag{16}$$

The likelihood of the system state data and the evolution of the states are assumed to be Gaussian, and the joint likelihood is defined by the following equation [43]:

$$p(Y, X \mid \Theta) = p(x_{1}) \prod_{k=1}^{N} p(y_{k} \mid x_{k}) \prod_{k=2}^{N} p(x_{k} \mid x_{k-1}) \tag{17}$$

where $Y$ is the measurement data, $X$ is the unknown system state data, and $\Theta$ is the parameter set of the linear Gaussian state–space model. $\Theta$ can be represented as follows:

$$\Theta = \{\Phi, H, Q, R\} \tag{18}$$

Taking the logarithm of the likelihood gives:

$$\ln p(Y, X \mid \Theta) = -\frac{1}{2} \sum_{k=2}^{N} \left\{\ln |Q| + [x_{k} - \Phi x_{k-1}]^{T} Q^{-1} [x_{k} - \Phi x_{k-1}]\right\} - \frac{1}{2} \sum_{k=1}^{N} \left\{\ln |R| + [y_{k} - H x_{k}]^{T} R^{-1} [y_{k} - H x_{k}]\right\} \tag{19}$$

Based on the maximum likelihood method, the linear Gaussian state–space model can be identified through an EM algorithm [42]. The algorithm alternates between two steps: the E-step (expectation) and the M-step (maximization) [44]. In general, the likelihood density function based on the measurement data, denoted by $p(\Theta \mid Y)$, is called the posterior distribution of the measurement. The EM algorithm aims to compute the maximum likelihood estimate of $p(\Theta \mid Y)$. $\Theta_{i}$ denotes the parameter estimate at the beginning of the $i$th iteration.

In the E-step, the expectation of $\ln p(Y, X \mid \Theta)$ with respect to the conditional distribution of $X$ is calculated such that:

$$\Omega(\Theta \mid \Theta_{i}, Y) \triangleq E_{X}\left\{\ln p(Y, X \mid \Theta) \mid \Theta_{i}, Y\right\} = \int \left[\ln p(Y, X \mid \Theta)\right] p(X \mid \Theta_{i}, Y)\, dX \tag{20}$$

In the M-step, $\Omega(\Theta \mid \Theta_{i}, Y)$ is maximized to find $\Theta_{i+1}$ such that:

$$\Theta_{i+1} = \arg\max_{\Theta} \left[\Omega(\Theta \mid \Theta_{i}, Y)\right] \tag{21}$$

The E-step and the M-step are iterated until

$$\left\| L(\Theta_{i+1}) - L(\Theta_{i}) \right\| < \tau \tag{22}$$

where $L(\Theta)$ is the log-likelihood function and $\tau$ is the predefined threshold. Satisfying Equation (22) means that the convergence criterion has been met. The specific process of designing a Kalman filter using the EM algorithm is given as follows.

The value of $\Omega(\Theta \mid \Theta_{i}, Y)$ is determined by the following [45]:

$$E_{X}(x_{k} \mid Y) = \hat{x}_{k|N} \tag{23}$$

$$E_{X}(x_{k} x_{k-1}^{T} \mid Y) = P_{k,k-1|N} + \hat{x}_{k|N} \hat{x}_{k-1|N}^{T} \tag{24}$$

$$E_{X}(x_{k} x_{k}^{T} \mid Y) = P_{k|N} + \hat{x}_{k|N} \hat{x}_{k|N}^{T} \tag{25}$$

where $\hat{x}_{k|N}$ is the smoothed value of the system state vector and $P_{k|N}$ is the smoothed covariance matrix of the system state vector.

$P_{k,k-1|N}$ is initialized by:

$$P_{k,k-1|k} = (I - K_{k} H) \Phi P_{k-1} \tag{26}$$

$$P_{k,k-1|N} = P_{k,k-1|k} + \left[P_{k|N} - P_{k}\right] P_{k|k}^{-1} P_{k,k-1|k} \tag{27}$$

$\hat{x}_{k|N}$ and $P_{k|N}$ can be obtained by smoothing the outputs of the Kalman filter using backward-pass methods such as the Rauch–Tung–Striebel (RTS) smoother [46]. This method is summarized in the following equations:

$$J_{k-1} = P_{k-1} \Phi^{T} P_{k|k-1}^{-1} \tag{28}$$

$$\hat{x}_{k-1|N} = \hat{x}_{k-1} + J_{k-1} \left(\hat{x}_{k|N} - \hat{x}_{k/k-1}\right) \tag{29}$$

$$P_{k-1|N} = P_{k-1} + J_{k-1} \left(P_{k|N} - P_{k|k-1}\right) J_{k-1}^{T} \tag{30}$$
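A scalar version of the backward pass can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name and the hand-made two-step inputs are assumptions, and the covariance line uses the standard RTS form in which the correction term $J_{k-1}(P_{k|N} - P_{k|k-1})J_{k-1}^{T}$ is added to the filtered covariance, so the smoothed variance never exceeds the filtered one.

```python
def rts_smooth(x_filt, p_filt, x_pred, p_pred, phi):
    """Scalar Rauch-Tung-Striebel backward pass.

    x_filt[k], p_filt[k] -- filtered mean and variance at step k
    x_pred[k], p_pred[k] -- one-step prediction of step k from step k-1
                            (entry 0 is never read)
    """
    n = len(x_filt)
    x_s = list(x_filt)     # the last smoothed value equals the filtered value
    p_s = list(p_filt)
    for k in range(n - 1, 0, -1):
        j = p_filt[k - 1] * phi / p_pred[k]                         # smoother gain
        x_s[k - 1] = x_filt[k - 1] + j * (x_s[k] - x_pred[k])       # smoothed mean
        p_s[k - 1] = p_filt[k - 1] + j * (p_s[k] - p_pred[k]) * j   # smoothed variance
    return x_s, p_s

# Two-step example with phi = 1: the filter predicted 0.0 with variance 2.0
# for step 1 and then moved to 1.0 with variance 2/3 after the measurement.
x_s, p_s = rts_smooth(x_filt=[0.0, 1.0], p_filt=[1.0, 2.0 / 3.0],
                      x_pred=[0.0, 0.0], p_pred=[0.0, 2.0], phi=1.0)
print(x_s, p_s)   # step 0 is pulled toward the later evidence: mean 0.5, variance 2/3
```

Here the smoother gain is $1 \cdot 1 / 2 = 0.5$, so the earlier estimate moves halfway toward the difference between the smoothed and predicted values at the next step.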

Then, the model parameters are re-estimated by maximizing $\Omega(\Theta \mid \Theta_{i}, Y)$ over $\Theta$, taking the partial derivatives of $\Omega(\Theta \mid \Theta_{i}, Y)$ and setting them to zero. Solving these equations yields the updated parameters (in the $i$th iteration) as follows:

$$\frac{\partial L(\Theta)}{\partial \Phi} = -\sum_{k=2}^{N} Q^{-1} \left(P_{k,k-1|N} + \hat{x}_{k|N} \hat{x}_{k-1|N}^{T}\right) + \sum_{k=2}^{N} Q^{-1} \Phi \left(P_{k-1|N} + \hat{x}_{k-1|N} \hat{x}_{k-1|N}^{T}\right) = 0 \tag{31}$$

$$\Phi_{i+1} = \left(\sum_{k=2}^{N} P_{k,k-1|N} + \hat{x}_{k|N} \hat{x}_{k-1|N}^{T}\right) \left(\sum_{k=2}^{N} P_{k-1|N} + \hat{x}_{k-1|N} \hat{x}_{k-1|N}^{T}\right)^{-1} \tag{32}$$

$$\frac{\partial L(\Theta)}{\partial H} = -\sum_{k=1}^{N} R^{-1} y_{k} \hat{x}_{k|N}^{T} + \sum_{k=1}^{N} R^{-1} H \left(P_{k|N} + \hat{x}_{k|N} \hat{x}_{k|N}^{T}\right) = 0 \tag{33}$$

$$H_{i+1} = \left(\sum_{k=1}^{N} y_{k} \hat{x}_{k|N}^{T}\right) \left[\sum_{k=1}^{N} \left(P_{k|N} + \hat{x}_{k|N} \hat{x}_{k|N}^{T}\right)\right]^{-1} \tag{34}$$

$$\frac{\partial L(\Theta)}{\partial Q^{-1}} = \frac{N}{2} Q - \frac{1}{2} \sum_{k=1}^{N} \left(P_{k|N} + \hat{x}_{k|N} \hat{x}_{k|N}^{T}\right) + \frac{1}{2} \Phi \sum_{k=2}^{N} \left(P_{k,k-1|N} + \hat{x}_{k|N} \hat{x}_{k-1|N}^{T}\right) = 0 \tag{35}$$

$$Q_{i+1} = \frac{1}{N} \left(\sum_{k=1}^{N} \left(P_{k|N} + \hat{x}_{k|N} \hat{x}_{k|N}^{T}\right) - \Phi_{i+1} \sum_{k=2}^{N} \left(P_{k,k-1|N} + \hat{x}_{k|N} \hat{x}_{k-1|N}^{T}\right)\right) \tag{36}$$

$$\frac{\partial L(\Theta)}{\partial R^{-1}} = \frac{N+1}{2} R - \sum_{k=1}^{N} \left(\frac{1}{2} y_{k} y_{k}^{T} - H \hat{x}_{k|N} y_{k}^{T} + \frac{1}{2} H \left(P_{k|N} + \hat{x}_{k|N} \hat{x}_{k|N}^{T}\right) H^{T}\right) = 0 \tag{37}$$

$$R_{i+1} = \frac{1}{N+1} \left(\sum_{k=1}^{N} y_{k} y_{k}^{T}\right) - H_{i+1} \left(\frac{1}{N+1} \sum_{k=1}^{N} y_{k} \hat{x}_{k|N}^{T}\right)^{T} \tag{38}$$

In this paper, based on the EM algorithm, the proposed LSTM and Kalman filter combination method is illustrated in Figure 2.
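The interplay of filter, smoother, and parameter update can be sketched for a scalar model. The code below is a simplified illustration under stated assumptions, not the authors' implementation: $H$ is held fixed at 1 instead of being re-estimated, the sums run over transitions from a prior state $x_0$ and are normalized by $N$, the lag-one covariance is computed with the identity $P_{k,k-1|N} = J_{k-1} P_{k|N}$ rather than Equations (26) and (27), and the synthetic data (an AR(1) state with coefficient 0.9 observed in noise) are invented for the demonstration. The initial values $\Phi = Q = R = 1$, $x = 0$, $P = 1$ and the threshold $\tau = 0.1$ follow the settings reported in Section 3.3.2. Function names are hypothetical.

```python
import math
import random

def filter_and_smooth(y, phi, q, r, m0, p0):
    """Scalar Kalman filter (H = 1) followed by an RTS backward pass."""
    n = len(y)
    xf, pf = [m0], [p0]            # index 0 holds the prior state
    xp, pp = [None], [None]        # one-step predictions, index 0 unused
    loglik = 0.0
    for yk in y:
        x_pred = phi * xf[-1]
        p_pred = phi * pf[-1] * phi + q
        s = p_pred + r             # innovation variance
        gain = p_pred / s
        innov = yk - x_pred
        loglik -= 0.5 * (math.log(2.0 * math.pi * s) + innov * innov / s)
        xp.append(x_pred)
        pp.append(p_pred)
        xf.append(x_pred + gain * innov)
        pf.append((1.0 - gain) * p_pred)
    xs, ps, cross = list(xf), list(pf), [0.0] * (n + 1)
    for k in range(n, 0, -1):
        j = pf[k - 1] * phi / pp[k]
        xs[k - 1] = xf[k - 1] + j * (xs[k] - xp[k])
        ps[k - 1] = pf[k - 1] + j * j * (ps[k] - pp[k])
        cross[k] = j * ps[k]       # smoothed covariance of x_k and x_{k-1}
    return xs, ps, cross, loglik

def em_kalman(y, phi=1.0, q=1.0, r=1.0, m0=0.0, p0=1.0, tau=0.1, max_iter=100):
    """Alternate E-step (smoothing) and M-step (closed-form updates)."""
    n = len(y)
    logliks = []
    for _ in range(max_iter):
        xs, ps, cross, loglik = filter_and_smooth(y, phi, q, r, m0, p0)
        logliks.append(loglik)
        if len(logliks) > 1 and abs(logliks[-1] - logliks[-2]) < tau:
            break                  # convergence test in the spirit of Eq. (22)
        a = sum(ps[k - 1] + xs[k - 1] ** 2 for k in range(1, n + 1))
        b = sum(cross[k] + xs[k] * xs[k - 1] for k in range(1, n + 1))
        c = sum(ps[k] + xs[k] ** 2 for k in range(1, n + 1))
        phi = b / a
        q = (c - phi * b) / n
        r = sum((y[k - 1] - xs[k]) ** 2 + ps[k] for k in range(1, n + 1)) / n
    return phi, q, r, logliks

# Synthetic measurements, invented for this demonstration.
random.seed(0)
state, y = 0.0, []
for _ in range(500):
    state = 0.9 * state + random.gauss(0.0, 0.1)
    y.append(state + random.gauss(0.0, 0.5))

phi_hat, q_hat, r_hat, logliks = em_kalman(y)
print(phi_hat, q_hat, r_hat, len(logliks))
```

A useful sanity check on any EM implementation of this kind is that the log-likelihood never decreases from one iteration to the next, which corresponds to the behavior plotted in Figure 8.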

3. Experiments and Results

In this section, the designed experiments and the analysis of the results are presented to verify the effectiveness of the proposed method.

3.1. Data Acquisition

The MSI320H MEMS inertial measurement unit (IMU) was employed for the experiments. It consists of a three-axis MEMS gyroscope and a three-axis MEMS accelerometer. A photograph of the MSI320H and its gyroscope specifications are shown in Figure 3a and Table 1, respectively. The MSI320H was fixed on the vibration table, which is shown in Figure 3b. The data acquisition procedure of the MSI320H is shown in Figure 3c. Data from the MSI320H were sent to the xPC via the RS422 communication interface at a baud rate of 921,600 bps. The xPC decoded the gyroscope data and sent them to the host computer via the network cable. The MSI320H was powered on and preheated at room temperature for 20 min. Then, linear vibration experiments were performed. The vibration direction of the vibration table was the y-axis of the gyroscope, and the power spectral density (PSD) of the linear random vibration loads is shown in Figure 3d.

As illustrated in Figure 3d, the acceleration of the applied vibration loads can be expressed as follows:

$$a_{v}(t) = \xi_{v} \sin(\omega_{v} t), \quad \omega_{v} \in [20 \cdot 2\pi, 2000 \cdot 2\pi] \tag{39}$$

where $\xi_{v} = 6$ g. The sample rate was set to 200 Hz, and approximately 60,000 data points were acquired in the random vibration environment. The variation of the gyroscope x-axis signal affected by random vibration is shown in Figure 4; the error of the gyroscope increased significantly in the random vibration environment.

3.2. Comparison of BiLSTM and Other RNN Variants

The outliers of the data acquired in random vibration environments were eliminated using the Pauta criterion [47]. Because the linear random vibration test was applied along the y-axis of the gyroscope, the x-axis and z-axis were in a random vibration environment. To assess model generality, we took the last 80% of the processed x-axis data as the training set and used the first 20% of the x-axis and z-axis data as the testing sets.
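The preprocessing described above can be sketched as two small helpers. The outlier criterion cited in the text is commonly stated as the 3σ rule, which is what the first function applies; the single-pass form, the use of the population standard deviation, and both function names are assumptions made for this illustration.

```python
import statistics

def three_sigma_filter(samples, k=3.0):
    """Drop samples farther than k standard deviations from the mean (one pass)."""
    mean = statistics.mean(samples)
    std = statistics.pstdev(samples)
    return [s for s in samples if abs(s - mean) <= k * std]

def split_first_fraction(samples, test_fraction=0.2):
    """Return (train, test): the first 20% is held out, the last 80% is used for training."""
    cut = int(len(samples) * test_fraction)
    return samples[cut:], samples[:cut]

# One gross outlier among twenty identical readings is removed.
cleaned = three_sigma_filter([0.0] * 20 + [100.0])
print(len(cleaned))

train, test = split_first_fraction(list(range(10)))
print(train, test)
```

Holding out the first part of the record rather than a random subset keeps the test samples contiguous in time, which matters for sequence models.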

If the hidden layer structure of the designed network is too simple, it cannot easily characterize the time-series model of the gyroscope data. Conversely, an overly complex structure reduces the learning speed of the network and makes it more likely to fall into local minima during the learning process. With these considerations in mind, the proposed BiLSTM network is shown in Figure 5. The dense layer transforms the high-dimensional stacked sequence of the hidden layer into an output sequence with the same shape as the input sequence. Moreover, given the large number of network parameters, dropout was applied in the dense layer to prevent overfitting [48].

The adaptive moment estimation (Adam) optimization algorithm was used to update the network parameters [49], with its default parameters. The activation function of the dense layer was the rectified linear unit (ReLU). The specifications used for network training are listed in Table 2. Moreover, the root mean square error (RMSE) was used as the loss function. The network training was performed with TensorFlow 2.0.0 and Keras 2.3.1 on the Ubuntu 16.04 LTS x86_64 operating system. The heterogeneous computing platform was equipped with an Intel Xeon E5-1620 CPU and GeForce RTX 2080 Ti GPUs.

To verify the performance of BiLSTM for gyroscope error compensation in random vibration environments, proper values for the input data step size, the number of hidden units, and the number of hidden layers were first explored using the x-axis testing set. Subsequently, training was performed using the identified values. The BiLSTM network results were compared with those of the LSTM network, gated recurrent unit (GRU) network, and bidirectional GRU (BiGRU) network using the x-axis and z-axis testing sets, respectively.
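One plausible reading of the "input data step size" is the length of the window of consecutive samples fed to the network at once. The sketch below shows how such windows could be cut from a series; the function name, the stride parameter, and the overlapping-window layout are assumptions, since the text does not specify how the training sequences were assembled.

```python
def make_windows(series, step=20, stride=1):
    """Cut a series into windows of `step` consecutive samples."""
    return [series[i:i + step]
            for i in range(0, len(series) - step + 1, stride)]

windows = make_windows(list(range(25)), step=20)
print(len(windows), windows[0][0], windows[-1][-1])
```

Longer windows give the network more context per sample but increase the cost of each training step, which is the trade-off examined in the tables that follow.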

As shown in Table 3, Table 4 and Table 5, the training time per epoch grows as the input data step size and the number of hidden layers increase, so a trade-off between the results and the computational cost was needed. According to the results, the best performance was obtained with an input data step of 20, 128 hidden units, and 10 hidden layers. Although this is not necessarily the optimal parameter set for the network, it is a suitable choice given the available computational resources.

The results are shown in Figure 6 and Figure 7 and Table 6 and Table 7. Figure 6 shows the training loss within 50 epochs, and convergence was achieved for all networks. Table 6 and Table 7 show that the standard deviations of the BiLSTM network results for the x-axis and z-axis were reduced by 46.81% and 43.63%, respectively, compared to the raw data, proving that the BiLSTM network is feasible for the application in the research of MEMS gyroscope error compensation. Furthermore, compared with the results of the LSTM network, the BiGRU network, and the GRU network, the standard deviation values of BiLSTM results in the x-axis were reduced by 14.06%, 11.66%, and 17.33%, respectively, and the standard deviation values of BiLSTM results in the z-axis were reduced by 12.71%, 10.04%, and 14.04%, respectively. This indicates that the error compensation performance of the BiLSTM network is better than these three networks.
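The percentage reductions quoted above compare the standard deviation of a compensated signal with that of a reference signal. A minimal sketch of that calculation follows; the function name and the use of the population standard deviation are assumptions, and the sample values are made up so that the result is easy to check by hand.

```python
import statistics

def std_reduction_percent(reference, compensated):
    """Percentage by which the standard deviation drops relative to the reference."""
    ref_std = statistics.pstdev(reference)
    new_std = statistics.pstdev(compensated)
    return 100.0 * (1.0 - new_std / ref_std)

# Halving the spread of a zero-mean signal gives a 50% reduction.
reduction = std_reduction_percent([-2.0, 2.0, -2.0, 2.0], [-1.0, 1.0, -1.0, 1.0])
print(reduction)
```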

3.3. Comparison of LSTM-EM-KF and LSTM-ARMA-KF

In this section, the raw data and BiLSTM network results of the x-axis and z-axis are used as the measurement, respectively. The Kalman filter parameters are estimated by the ARMA model and EM algorithm, and the filter results are compared.

3.3.1. Estimating Kalman Filter Parameters Using the ARMA Model

When modeling time-series data using the ARMA model, the time-series data must meet stationarity and normality requirements. Therefore, a polynomial fitting method was used to eliminate the trend term before modeling. In this paper, the stationarity was tested using the run test, and the normality was tested by calculating the skewness, $\xi$, and the kurtosis, $\upsilon$.
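Skewness and kurtosis can be computed from the central moments of the data. The sketch below uses the plain moment-based definitions, for which a normal distribution has skewness 0 and kurtosis 3; the text does not state which estimator was used, so the formulas and the function name are assumptions.

```python
def skewness_kurtosis(samples):
    """Moment-based skewness m3 / m2^1.5 and kurtosis m4 / m2^2."""
    n = len(samples)
    mean = sum(samples) / n
    m2 = sum((s - mean) ** 2 for s in samples) / n
    m3 = sum((s - mean) ** 3 for s in samples) / n
    m4 = sum((s - mean) ** 4 for s in samples) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2

# A symmetric three-point sample has zero skewness and kurtosis 1.5.
skew, kurt = skewness_kurtosis([-1.0, 0.0, 1.0])
print(skew, kurt)
```

Values of skewness near 0 and kurtosis near 3 are consistent with normality; large departures suggest that the ARMA assumptions need a closer look.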

According to the test, after eliminating the trend term, the raw data and BiLSTM network results of the x-axis and z-axis met the stationarity and normality requirements. The test process is shown in Appendix A. Moreover, as illustrated in Figure A1, the autocorrelation function and partial autocorrelation function diagrams exhibit trailing properties, and all models can be identified as an ARMA$(p, q)$ model.

The next step is to determine the order of the model. If the order is increased, the identified model will be more realistic, but the computational difficulty will also increase. Therefore, the maximum order was set to 3, which means the maximum value of $p$ and $q$ was set to 3. Furthermore, the Akaike information criterion (AIC) was used for determining the model order. The order determination process and the Durbin–Watson test results are shown in Appendix B.

For the x-axis raw data, the model is identified as ARMA(3,3):

$$x_{n} = -0.3126 x_{n-1} + 0.8168 x_{n-2} + 0.1520 x_{n-3} + \varepsilon_{n} + 0.6885 \varepsilon_{n-1} - 0.3241 \varepsilon_{n-2} - 0.0366 \varepsilon_{n-3} \tag{40}$$

For the x-axis BiLSTM network results, the model is identified as ARMA(3,3):

$$x_{n} = 0.8505 x_{n-1} + 0.8891 x_{n-2} - 0.7745 x_{n-3} + \varepsilon_{n} + 0.1354 \varepsilon_{n-1} - 0.8476 \varepsilon_{n-2} - 0.0815 \varepsilon_{n-3} \tag{41}$$

For the z-axis raw data, the model is identified as ARMA(1,3):

$$x_{n} = 0.8593 x_{n-1} + \varepsilon_{n} - 0.51915 \varepsilon_{n-1} + 0.0328 \varepsilon_{n-2} - 0.0236 \varepsilon_{n-3} \tag{42}$$

For the z-axis BiLSTM network results, the model is identified as ARMA(3,3):

$$x_{n} = 2.0000 x_{n-1} - 1.2268 x_{n-2} + 0.2031 x_{n-3} + \varepsilon_{n} - 1.0446 \varepsilon_{n-1} - 0.0809 \varepsilon_{n-2} - 0.1086 \varepsilon_{n-3} \tag{43}$$

where $x_{n}$ is the output of the ARMA model and $\varepsilon_{n}$ is the driving white noise (with mean 0 and variance $\hat{\delta}_{\varepsilon}^{2}$). The Kalman filter parameters are presented in Table 8. The value of $R$ is the variance of the measurement. The initial values of the Kalman filter are set as follows: $x_{1} = [0; 0; 0; 0]$, and $P_{1}$ is the fourth-order identity matrix.

3.3.2. Estimating Kalman Filter Parameters Using the EM Algorithm

When using the EM algorithm to estimate the Kalman filter parameters, only the iteration convergence conditions and initial parameters need to be set. The M-step convergence constant $\tau$ in Equation (22) was set to 0.1. The Kalman filter's initial values were set to $x_{1} = 0$ and $P_{1} = 1$, and the Kalman filter's initial parameters were set to $\Phi_{1} = 1$, $H_{1} = 1$, $Q_{1} = 1$, and $R_{1} = 1$. The change of the log-likelihood function during the iteration of the EM algorithm is shown in Figure 8. The parameter estimation results are presented in Table 9.

3.3.3. Kalman Filtering Results

The results are illustrated in Table 10 and Table 11. For the x-axis data, the standard deviation of the BiLSTM-EM-KF results was reduced by 51.58% and 31.92% compared to the BiLSTM network and EM-KF, respectively. For the z-axis data, the standard deviation of the BiLSTM-EM-KF results was reduced by 29.19% and 12.75% compared to the BiLSTM network and EM-KF, respectively. Therefore, the combined method proposed in this paper can be demonstrated to improve the gyroscope error compensation performance of the BiLSTM network and EM algorithm. Moreover, compared with BiLSTM-ARMA-KF results, the standard deviation of the BiLSTM-EM-KF was reduced by 46.54% and 22.30% in x-axis and z-axis, respectively. It indicates the proposed method’s superior performance to that of BiLSTM-ARMA-KF. Furthermore, according to Figure 9, the curves of BiLSTM-EM-KF results are smoother, which proves that the proposed combined method is effective.

4. Conclusions

In this paper, a combined method of an LSTM network and Kalman filter is proposed for MEMS gyroscope error compensation in random vibration environments. Through the results, the following conclusions were obtained:

(1): After exploring proper input data step size and network topology, the network was trained, and the test results showed that the BiLSTM network outperformed the LSTM network, the GRU network, and the BiGRU network in gyroscope error compensation;
(2): Combining the BiLSTM network with the EM-KF method improves on the gyroscope error compensation performance of either method alone;
(3): The classical gyroscope error compensation method, the ARMA-KF method, requires tedious data testing and model checking. In contrast, the EM-KF method only needs the initial parameters and the convergence value to be set, which makes it much easier to apply. Moreover, the ARMA-KF method parameters cannot be updated during the filtering process, which means that satisfactory results cannot be obtained if the parameters are not defined correctly beforehand. From the filtering results, compared with BiLSTM-ARMA-KF, the standard deviation of the BiLSTM-EM-KF results was 46.54% and 22.30% lower in the x-axis and z-axis, respectively, and the output curve was smoother, which supports the effectiveness of the method proposed in this paper.

Future work should include conducting dynamic field experiments to obtain MEMS gyroscope outputs, as well as combining neural networks with more state-of-the-art Kalman filter methods for MEMS gyroscope error compensation and using fiber optic gyroscopes or laser gyroscopes as benchmarks for comparison.

Author Contributions

Conceptualization, C.Z. and H.C.; methodology, C.Z., S.C. and W.X.; software, C.Z. and Y.Y.; validation, S.C. and H.S.; formal analysis, C.Z. and S.C.; investigation, H.C. and W.X.; resources, S.C. and W.X.; data curation, C.Z.; writing—original draft preparation, C.Z.; writing—review and editing, C.Z.; visualization, C.Z.; supervision, S.C.; project administration, H.S.; funding acquisition, W.X. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Natural Science Foundation of Jilin Province, China, grant number 20200201170JC.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Stationarity and Normality Tests

For stationarity, the most common test method is the run test. The data were divided equally into 20 groups $\{X_{1}, X_{2}, \dots, X_{20}\}$, and the mean square value of each group was $\sigma_{i}$, where $i \in [1, 20]$. The mean of $\{\sigma_{i}\}$ was $\sigma_{mean}$, and the difference between each $\sigma_{i}$ and $\sigma_{mean}$ formed $\{\sigma_{ai}\}$:

$$\sigma_{ai} = \sigma_{i} - \sigma_{mean} \tag{A1}$$

The total number of positive values in $\{\sigma_{ai}\}$ was recorded as $n_{1}$, and the total number of negative values in $\{\sigma_{ai}\}$ was recorded as $n_{2}$. The run $r$ was the number of sign alternations along the ordered sequence plus 1. According to the run test table, whether $r$ was within the confidence interval was determined at the significance level $\alpha = 0.05$.
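The counting step of the run test is simple enough to check by hand or with a few lines of code. The sketch below computes $n_1$, $n_2$, and $r$ from the group values and reproduces the figures reported in Table A1 for the x-axis raw data; the function name is hypothetical, and the confidence interval itself is taken from the run test table rather than computed here.

```python
def run_test_counts(group_values):
    """Return (n1, n2, r): counts above and below the mean, and the number of runs."""
    mean = sum(group_values) / len(group_values)
    signs = [v > mean for v in group_values if v != mean]
    n1 = sum(signs)                      # values above the mean
    n2 = len(signs) - n1                 # values below the mean
    runs = 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    return n1, n2, runs

# Mean square values of the 20 groups from Table A1 (x-axis raw data).
sigma = [0.0615, 0.0583, 0.0516, 0.0621, 0.0664, 0.0572, 0.0640, 0.0545,
         0.0571, 0.0628, 0.0601, 0.0602, 0.0565, 0.0743, 0.0585, 0.0655,
         0.0532, 0.0693, 0.0588, 0.0557]
n1, n2, r = run_test_counts(sigma)
print(n1, n2, r)            # 8 12 14, matching Table A1
print(6 <= r <= 16)         # r lies inside the tabulated confidence interval
```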

After eliminating the trend term, the raw data and BiLSTM network results of the x-axis and z-axis met the stationarity requirements. The test process is presented in Table A1, Table A2, Table A3 and Table A4.

Table A1. Run test process of x-axis raw data.

$σ_{m e a n} = 0.0604$ , $n_{1} = 8$ , $n_{2} = 12$ , $r = 14$ , Significance Level $α = 0.05$ , Confidence Interval $[6, 16]$ .
	$X_{1}$	$X_{2}$	$X_{3}$	$X_{4}$	$X_{5}$	$X_{6}$	$X_{7}$	$X_{8}$	$X_{9}$	$X_{10}$
$σ_{i}$	0.0615	0.0583	0.0516	0.0621	0.0664	0.0572	0.0640	0.0545	0.0571	0.0628
$σ_{a i}$	0.0011	−0.0021	−0.0088	0.0017	0.0060	−0.0032	0.0036	−0.0059	−0.0033	0.0024
r	1	2		3		4	5	6		7
	$X_{11}$	$X_{12}$	$X_{13}$	$X_{14}$	$X_{15}$	$X_{16}$	$X_{17}$	$X_{18}$	$X_{19}$	$X_{20}$
$σ_{i}$	0.0601	0.0602	0.0565	0.0743	0.0585	0.0655	0.0532	0.0693	0.0588	0.0557
$σ_{a i}$	−0.0003	−0.0002	−0.0039	0.0139	−0.0019	0.0051	−0.0072	0.0089	−0.0016	−0.0047
$r$	8			9	10	11	12	13	14

Table A2. Run test process of x-axis BiLSTM network results.

$σ_{mean} = 0.0175$, $n_{1} = 11$, $n_{2} = 9$, $r = 14$, significance level $α = 0.05$, confidence interval $[6, 16]$.
	$X_{1}$	$X_{2}$	$X_{3}$	$X_{4}$	$X_{5}$	$X_{6}$	$X_{7}$	$X_{8}$	$X_{9}$	$X_{10}$
$σ_{i}$	0.0189	0.0177	0.0144	0.0192	0.0198	0.0163	0.0191	0.0154	0.0159	0.0192
$σ_{a i}$	0.0014	0.0002	−0.0031	0.0017	0.0023	−0.0012	0.0016	−0.0022	−0.0016	0.0017
r	1		2	3		4	5	6		7
	$X_{11}$	$X_{12}$	$X_{13}$	$X_{14}$	$X_{15}$	$X_{16}$	$X_{17}$	$X_{18}$	$X_{19}$	$X_{20}$
$σ_{i}$	0.0176	0.0178	0.0159	0.0224	0.0149	0.0191	0.0144	0.0205	0.0166	0.0150
$σ_{a i}$	0.0001	0.0003	−0.0016	0.0049	−0.0026	0.0016	−0.0031	0.0030	−0.0009	−0.0025
$r$			8	9	10	11	12	13	14

Table A3. Run test process of z-axis raw data.

$σ_{mean} = 0.0596$, $n_{1} = 10$, $n_{2} = 10$, $r = 10$, significance level $α = 0.05$, confidence interval $[6, 16]$.
	$X_{1}$	$X_{2}$	$X_{3}$	$X_{4}$	$X_{5}$	$X_{6}$	$X_{7}$	$X_{8}$	$X_{9}$	$X_{10}$
$σ_{i}$	0.0673	0.0565	0.0639	0.0694	0.0611	0.0570	0.0489	0.0533	0.0556	0.0539
$σ_{a i}$	0.0077	−0.0031	0.0043	0.0098	0.0015	−0.0026	−0.0107	−0.0063	−0.0040	−0.0057
r	1	2	3			4
	$X_{11}$	$X_{12}$	$X_{13}$	$X_{14}$	$X_{15}$	$X_{16}$	$X_{17}$	$X_{18}$	$X_{19}$	$X_{20}$
$σ_{i}$	0.0577	0.0671	0.0642	0.0645	0.0585	0.0619	0.0516	0.0622	0.0620	0.0560
$σ_{a i}$	−0.0019	0.0075	0.0046	0.0049	−0.0011	0.0023	−0.0080	0.0026	0.0024	−0.0036
$r$		5			6	7	8	9		10

Table A4. Run test process of z-axis BiLSTM network results.

$σ_{mean} = 0.0183$, $n_{1} = 9$, $n_{2} = 11$, $r = 10$, significance level $α = 0.05$, confidence interval $[6, 16]$.
	$X_{1}$	$X_{2}$	$X_{3}$	$X_{4}$	$X_{5}$	$X_{6}$	$X_{7}$	$X_{8}$	$X_{9}$	$X_{10}$
$σ_{i}$	0.0216	0.0159	0.0200	0.0227	0.0180	0.0168	0.0139	0.0160	0.0158	0.0158
$σ_{a i}$	0.0033	−0.0024	0.0017	0.0044	−0.0003	−0.0015	−0.0044	−0.0023	−0.0025	−0.0025
r	1	2	3		4
	$X_{11}$	$X_{12}$	$X_{13}$	$X_{14}$	$X_{15}$	$X_{16}$	$X_{17}$	$X_{18}$	$X_{19}$	$X_{20}$
$σ_{i}$	0.0173	0.0217	0.0213	0.0208	0.0175	0.0197	0.0154	0.0199	0.0191	0.0162
$σ_{a i}$	−0.0010	0.0034	0.0030	0.0025	−0.0008	0.0014	−0.0029	0.0016	0.0008	−0.0021
$r$		5			6	7	8	9		10

For normality, the most common test method is to calculate the skewness, $ξ$, and the kurtosis, $υ$. When the calculated $ξ$ is close to 0 and $υ$ is close to 3, the data are considered to satisfy normality. $ξ$ and $υ$ are estimated as follows:

ξ = E \left[ {\left( \frac{x_{n} - e_{x}}{σ_{x}} \right)}^{3} \right] = \frac{E [{(x_{n} - e_{x})}^{3}]}{{\left( E [{(x_{n} - e_{x})}^{2}] \right)}^{\frac{3}{2}}} = \frac{\frac{1}{N} \sum_{n = 1}^{N} {(x_{n} - e_{x})}^{3}}{{\left[ \frac{1}{N} \sum_{n = 1}^{N} {(x_{n} - e_{x})}^{2} \right]}^{\frac{3}{2}}}

(A2)

υ = E \left[ {\left( \frac{x_{n} - e_{x}}{σ_{x}} \right)}^{4} \right] = \frac{E [{(x_{n} - e_{x})}^{4}]}{{\left( E [{(x_{n} - e_{x})}^{2}] \right)}^{2}} = \frac{\frac{1}{N} \sum_{n = 1}^{N} {(x_{n} - e_{x})}^{4}}{{\left[ \frac{1}{N} \sum_{n = 1}^{N} {(x_{n} - e_{x})}^{2} \right]}^{2}}

(A3)

where $e_{x}$ and $σ_{x}$ are the mean and standard deviation of the data, respectively, and $N$ is the number of samples.
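The sample forms of Equations (A2) and (A3) can be sketched in a few lines of Python. The function name is hypothetical; the sketch uses the population (1/N) moments exactly as written in the equations and returns the non-excess kurtosis, so a normal distribution corresponds to a value near 3.

```python
def skewness_kurtosis(x):
    """Return (skewness, kurtosis) of x using the 1/N sample moments of Eqs. (A2), (A3)."""
    n = len(x)
    mean = sum(x) / n                            # e_x
    m2 = sum((v - mean) ** 2 for v in x) / n     # second central moment (sigma_x squared)
    m3 = sum((v - mean) ** 3 for v in x) / n     # third central moment
    m4 = sum((v - mean) ** 4 for v in x) / n     # fourth central moment
    return m3 / m2 ** 1.5, m4 / m2 ** 2


if __name__ == "__main__":
    # A small symmetric sample: the skewness is 0 by symmetry.
    print(skewness_kurtosis([1.0, 2.0, 3.0, 4.0, 5.0]))
```

Constant input would make the second moment zero and cause a division by zero, so real data should be checked for that case first.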

After eliminating the trend term, the raw data and BiLSTM network results of the x-axis and z-axis met the normality requirements. The results are presented in Table A5.

Table A5. The normality test results.

	x-axis Raw Data	x-axis BiLSTM Network Results	z-axis Raw Data	z-axis BiLSTM Network Results
Skewness ξ	0.0177	−0.1077	−0.0631	−0.0170
Kurtosis υ	2.8329	2.7283	2.7294	2.8156

Appendix B. Determine the ARMA Model Order

Figure A1. Autocorrelation function and partial autocorrelation function diagram. (a,b) x-axis raw data, (c,d) x-axis BiLSTM network results, (e,f) z-axis raw data, and (g,h) z-axis BiLSTM network results.

The Akaike information criterion can be expressed as follows:

AIC (p, q) = N \ln (δ^{2} (p, q)) + 2 (p + q + 1)

(A4)

where $p$ and $q$ are the autoregressive and moving average orders, $δ^{2}$ is the variance of the driving white noise at each order, and $N$ is the total number of data samples. The fitted model is optimal when the AIC reaches its minimum value. With the maximum values of $p$ and $q$ set to 3, the AIC values at each order are shown in Tables A6–A9.

Table A6. AIC values of x-axis raw data at each order.

	p = 0	p = 1	p = 2	p = 3
q = 0	$-$	−4350.7006	−5373.4026	−5483.9134
q = 1	−2378.8661	−5480.4267	−5503.6485	−5503.2634
q = 2	−3784.2642	−5501.9738	−5502.2446	−5501.9462
q = 3	−4524.5464	−5504.1306	−5502.9798	−5514.9860

Table A7. AIC values of x-axis BiLSTM network results at each order.

	p = 0	p = 1	p = 2	p = 3
q = 0	$-$	−33245.8540	−33390.6679	−33390.1658
q = 1	−24407.6844	−33392.3127	−33390.3534	−33387.4822
q = 2	−28837.8009	−33390.3543	−33391.2919	−33390.0297
q = 3	−30742.3669	−33388.3649	−33390.0548	−33456.4704

Table A8. AIC values of z-axis raw data at each order.

	p = 0	p = 1	p = 2	p = 3
q = 0	$-$	−3384.5655	−4241.9726	−4367.2696
q = 1	−1992.4063	−4410.0138	−4414.9553	−4417.2529
q = 2	−3107.6730	−4414.3979	−4414.5702	−4415.3240
q = 3	−3612.4223	−4417.6140	−4415.7666	−4414.6351

Table A9. AIC values of z-axis BiLSTM network results at each order.

	p = 0	p = 1	p = 2	p = 3
q = 0	$-$	−31059.1193	−31186.5670	−31212.5235
q = 1	−23478.3934	−31198.8844	−31202.3767	−31212.2053
q = 2	−27430.5011	−31206.1886	−31221.9048	−31228.0632
q = 3	−29112.3646	−31212.9557	−31211.5881	−31262.0166
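The order selection rule of Equation (A4) can be illustrated with a short Python sketch. The function names and the dictionary layout are hypothetical choices made for this example; the numerical values are copied from Table A8, and the ARMA fitting that produces $δ^{2}$ is not shown.

```python
import math


def aic(n_samples, noise_variance, p, q):
    """Eq. (A4): AIC(p, q) = N ln(delta^2(p, q)) + 2 (p + q + 1)."""
    return n_samples * math.log(noise_variance) + 2 * (p + q + 1)


def select_order(aic_by_order):
    """Return the (p, q) pair with the smallest AIC value."""
    return min(aic_by_order, key=aic_by_order.get)


# AIC values of the z-axis raw data, copied from Table A8; keys are (p, q).
TABLE_A8 = {
    (1, 0): -3384.5655, (2, 0): -4241.9726, (3, 0): -4367.2696,
    (0, 1): -1992.4063, (1, 1): -4410.0138, (2, 1): -4414.9553, (3, 1): -4417.2529,
    (0, 2): -3107.6730, (1, 2): -4414.3979, (2, 2): -4414.5702, (3, 2): -4415.3240,
    (0, 3): -3612.4223, (1, 3): -4417.6140, (2, 3): -4415.7666, (3, 3): -4414.6351,
}

if __name__ == "__main__":
    print(select_order(TABLE_A8))
```

For the Table A8 values the minimum lies at $p = 1$, $q = 3$. Several neighboring orders differ by less than one AIC unit, so the choice is sensitive to small changes in the fitted noise variance.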

The first-order autocorrelation of the identified model residuals $\{ω_{t}\}$ was tested by the Durbin–Watson method. Assume that the first-order correlation of $\{ω_{t}\}$ can be described as:

ω_{t} = ρ ω_{t - 1} + v_{t}

(A5)

When $ρ = 0$, there is no first-order autocorrelation in $\{ω_{t}\}$. The Durbin–Watson test value $d$ is defined as:

d = \frac{\sum_{n = 2}^{N} {(ω_{n} - ω_{n - 1})}^{2}}{\sum_{n = 1}^{N} ω_{n}^{2}} \approx 2 (1 - ρ)

(A6)

The Durbin–Watson test values of the identified models are shown in Table A10. All values are close to 2, which corresponds to $ρ \approx 0$ in Equation (A6), indicating that the estimated models satisfy the requirements.

Table A10. D-W test results.

	x-axis Raw Data	x-axis BiLSTM Network Results	z-axis Raw Data	z-axis BiLSTM Network Results
Durbin–Watson test value	1.9997	2.0064	1.9998	1.9903
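A minimal Python sketch of the statistic in Equation (A6) is given here for reference. The function name is hypothetical, and the residual sequence is assumed to be available as a plain list of floats.

```python
def durbin_watson(residuals):
    """Eq. (A6): sum of squared successive differences over the sum of squares."""
    numerator = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    denominator = sum(w * w for w in residuals)
    return numerator / denominator


if __name__ == "__main__":
    # Strictly alternating residuals are negatively correlated, so d exceeds 2.
    print(durbin_watson([1.0, -1.0, 1.0, -1.0]))
    # Identical residuals are perfectly positively correlated, so d is 0.
    print(durbin_watson([1.0, 1.0, 1.0, 1.0]))
```

Values of $d$ near 2 correspond to $ρ \approx 0$, values near 0 to strong positive correlation, and values near 4 to strong negative correlation.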

References

Brown, A.K. GPS/INS uses low-cost MEMS IMU. IEEE Aerosp. Electron. Syst. Mag. 2005, 20, 3–10.
Noureldin, A.; Karamat, T.B.; Eberts, M.D.; El-Shafie, A. Performance enhancement of MEMS-based INS/GPS integration for low-cost navigation applications. IEEE Trans. Veh. Technol. 2008, 58, 1077–1096.
Chia, J.; Low, K.; Goh, S.; Xing, Y. A low complexity Kalman filter for improving MEMS based gyroscope performance. In Proceedings of the 2016 IEEE Aerospace Conference, Big Sky, MT, USA, 5–12 March 2016; pp. 1–7.
Mohammadi, Z.; Salarieh, H. Investigating the effects of quadrature error in parametrically and harmonically excited MEMS rate gyroscopes. Measurement 2016, 87, 152–175.
Zhang, H.; Wu, Y.; Wu, W.; Wu, M.; Hu, X. Improved multi-position calibration for inertial measurement units. Meas. Sci. Technol. 2009, 21, 015107.
Fong, W.; Ong, S.; Nee, A. Methods for in-field user calibration of an inertial measurement unit without external equipment. Meas. Sci. Technol. 2008, 19, 085202.
Huang, L. Auto regressive moving average (ARMA) modeling method for Gyro random noise using a robust Kalman filter. Sensors 2015, 15, 25277–25286.
Narasimhappa, M.; Nayak, J.; Terra, M.H.; Sabat, S.L. ARMA model based adaptive unscented fading Kalman filter for reducing drift of fiber optic gyroscope. Sens. Actuators 2016, 251, 42–51.
Seong, S.M.; Lee, J.G.; Park, C.G. Equivalent ARMA model representation for RLG random errors. IEEE Trans. Aerosp. Electron. Syst. 2000, 36, 286–290.
Quinchia, A.G.; Ferrer, C.; Falco, G.; Falletti, E.; Dovis, F. Analysis and modelling of MEMS inertial measurement unit. In Proceedings of the 2012 International Conference on Localization and GNSS, Starnberg, Germany, 25–27 June 2012; pp. 1–7.
Kang, C.H.; Kim, S.Y.; Park, C.G. Improvement of a low cost MEMS inertial-GPS integrated system using wavelet denoising techniques. Int. J. Aeronaut. Space Sci. 2011, 12, 371–378.
Zhang, Y.S.; Yang, T. Modeling and compensation of MEMS gyroscope output data based on support vector machine. Measurement 2012, 45, 922–926.
Bhatt, D.; Aggarwal, P.; Bhattacharya, P.; Devabhaktuni, V. An enhanced MEMS error modeling approach based on nu-support vector regression. Sensors 2012, 12, 9448–9466.
El-Rabbany, A.; El-Diasty, M. An efficient neural network model for de-noising of MEMS-based inertial data. J. Navig. 2004, 57, 407.
Jiang, C.; Chen, S.; Chen, Y.; Zhang, B.; Feng, Z.; Zhou, H.; Bo, Y. A MEMS IMU de-noising method using long short term memory recurrent neural networks (LSTM-RNN). Sensors 2018, 18, 3470.
Jiang, C.; Chen, S.; Chen, Y.; Bo, Y.; Han, L.; Guo, J.; Feng, Z.; Zhou, H. Performance analysis of a deep simple recurrent unit recurrent neural network (SRU-RNN) in MEMS gyroscope de-noising. Sensors 2018, 18, 4471.
Jiang, C.; Chen, Y.; Chen, S.; Bo, Y.; Li, W.; Tian, W.; Guo, J. A mixed deep recurrent neural network for MEMS gyroscope noise suppressing. Electronics 2019, 8, 181.
Zhu, Z.; Bo, Y.; Jiang, C. A MEMS Gyroscope Noise Suppressing Method Using Neural Architecture Search Neural Network. Math. Probl. Eng. 2019, 2019, 5491243.
Barbour, N.M. Inertial Navigation Sensors; Charles Stark Draper Lab Inc.: Cambridge, MA, USA, 2010.
Jafari, M.; Najafabadi, T.A.; Moshiri, B.; Tabatabaei, S.S.; Sahebjameyan, M. PEM stochastic modeling for MEMS inertial sensors in conventional and redundant IMUs. IEEE Sens. J. 2014, 14, 2019–2027.
Cho, J.Y. High-Performance Micromachined Vibratory Rate and Rate-Integrating Gyroscopes. Ph.D. Thesis, University of Michigan, Ann Arbor, MI, USA, 2012.
Park, B.S.; Han, K.; Lee, S.; Yu, M. Analysis of compensation for a g-sensitivity scale-factor error for a MEMS vibratory gyroscope. J. Micromech. Microeng. 2015, 25, 115006.
Dean, R.; Flowers, G.; Hodel, S.; MacAllister, K.; Horvath, R. Vibration Isolation of MEMS Sensors for Aerospace Applications. In Proceedings of the SPIE Proceedings Series; SPIE: Bellingham, WA, USA, 2002; pp. 166–170.
Reid, J.R.; Bright, V.M.; Kosinski, J.A. A micromachined vibration isolation system for reducing the vibration sensitivity of surface transverse wave resonators. IEEE Trans. Ultrason. Ferroelectr. Freq. Control 1998, 45, 528–534.
Reid, J.R.; Bright, V.M.; Stewart, J.T.; Kosinski, J.A. Reducing the normal acceleration sensitivity of surface transverse wave resonators using micromachined isolation systems. In Proceedings of the 1996 IEEE International Frequency Control Symposium, Honolulu, HI, USA, 5–7 June 1996; pp. 464–472.
Dean, R.; Flowers, G.; Sanders, N.; Horvath, R.; Johnson, W.; Kranz, M.; Whitley, M. Experimental validation and testing of components for active damping control for micromachined mechanical vibration isolation filters using electrostatic actuation. In Smart Structures and Materials 2006: Smart Electronics, MEMS, BioMEMS, and Nanotechnology; International Society for Optics and Photonics: Bellingham, WA, USA, 2006; p. 61721C.
Kim, J.M.; Mok, S.H.; Leeghim, H.; Lee, C.Y. Vibration-Robust Attitude and Heading Reference System Using Windowed Measurement Error Covariance. Int. J. Aeronaut. Space Sci. 2017, 18, 555–564.
Wu, Z.; Yao, M.; Ma, H.; Jia, W. De-noising MEMS inertial sensors for low-cost vehicular attitude estimation based on singular spectrum analysis and independent component analysis. Electron. Lett. 2013, 49, 892–893.
Hao, X.Y.; Li, M.; Han, X.F.; Jia, H.G. Analysis on the influence of random vibration on MEMS gyro precision and error compensation. In Applied Mechanics and Materials; Trans Tech Publications: Stafa, Switzerland, 2012; pp. 4164–4168.
Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780.
Bengio, Y.; Simard, P.; Frasconi, P. Learning long-term dependencies with gradient descent is difficult. IEEE Trans. Neural Netw. 1994, 5, 157–166.
Graves, A.; Schmidhuber, J. Framewise phoneme classification with bidirectional LSTM networks. In Proceedings of the 2005 IEEE International Joint Conference on Neural Networks, Montreal, QC, Canada, 31 July–4 August 2005; pp. 2047–2052.
Kalman, R.E. A new approach to linear filtering and prediction problems. J. Basic Eng. 1960, 82, 35–45.
Kustiawan, I.; Chi, K.H. Handoff decision using a Kalman filter and fuzzy logic in heterogeneous wireless networks. IEEE Commun. Lett. 2015, 19, 2258–2261.
Hosseinyalamdary, S. Deep Kalman filter: Simultaneous multi-sensor integration and modelling; A GNSS/IMU case study. Sensors 2018, 18, 1316.
Zhang, Y.; Peng, C.; Mou, D.; Li, M.; Quan, W. An adaptive filtering approach based on the dynamic variance model for reducing MEMS gyroscope random error. Sensors 2018, 18, 3943.
Yan, Y.; Guo, P.; Liu, L. A novel hybridization of artificial neural networks and ARIMA models for forecasting resource consumption in an IIS web server. In Proceedings of the 2014 IEEE International Symposium on Software Reliability Engineering Workshops, Naples, Italy, 3–6 November 2014; pp. 437–442.
Lőrincz, I.; Tajmar, M. Identification of error sources in high precision weight measurements of gyroscopes. Measurement 2015, 73, 453–461.
Shumway, R.H.; Stoffer, D.S. An approach to time series smoothing and forecasting using the EM algorithm. J. Time Ser. Anal. 1982, 3, 253–264.
Wu, C.J. On the convergence properties of the EM algorithm. Ann. Stat. 1983, 11, 95–103.
Andrieu, C.; Doucet, A. Online expectation-maximization type algorithms for parameter estimation in general state space models. In Proceedings of the 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, Hong Kong, China, 6–10 April 2003; p. VI-69.
Wen, Q.; Ge, Z.; Song, Z. Data-based linear Gaussian state-space model for dynamic process monitoring. AIChE J. 2012, 58, 3763–3776.
Mirikitani, D.; Nikolaev, N. Nonlinear maximum likelihood estimation of electricity spot prices using recurrent neural networks. Neural Comput. Appl. 2011, 20, 79–89.
Moore, T.J.; Sadler, B.M.; Kozick, R.J. Maximum-likelihood estimation, the Cramér–Rao bound, and the method of scoring with parameter constraints. IEEE Trans. Signal. Process. 2008, 56, 895–908.
Hesar, H.D.; Mohebbi, M. An Adaptive Kalman Filter Bank for ECG Denoising. IEEE J. Biomed. Health Inform. 2020, 25, 13–21.
Hartikainen, J.; Solin, A.; Särkkä, S. Optimal Filtering with Kalman Filters and Smoothers – A Manual for MATLAB Toolbox EKF/UKF; Aalto University School of Science: Espoo, Finland, 2011.
Rousseeuw, P.J.; Leroy, A.M. Robust Regression and Outlier Detection; John Wiley & Sons: Hoboken, NJ, USA, 2005; Volume 589.
Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R. Dropout: A simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 2014, 15, 1929–1958.
Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980.

Figure 1. Information flow of multi-layer bidirectional long short-term memory (BiLSTM) network.

Figure 2. An illustration of this paper’s proposed method.

Figure 3. Experimental system. (a) MSI3200H inertial measurement unit, (b) vibration table, (c) data acquisition procedure, and (d) power spectral density of linear random vibration loads.

Figure 4. The variation of gyroscope x-axis signal.

Figure 5. Multi-layer BiLSTM network.

Figure 6. Training loss of BiLSTM, LSTM, BiGRU, and GRU.

Figure 7. Comparison of error compensation performance of BiLSTM, LSTM, BiGRU, and GRU. (a) Part of the x-axis data zoomed-in and (b) part of the z-axis data zoomed-in.

Figure 8. Log-likelihood value change of the iteration epoch.

Figure 9. Comparison of error compensation performance of BiLSTM-EM-KF and BiLSTM-ARMA-KF. (a) Part of the x-axis data zoomed-in and (b) part of the z-axis data zoomed-in.

Table 1. Specifications of MSI3200H gyroscope.

GYRO	Input range	±1800°/s
	Bias instability (Allan variance)	36°/h
	Angular random walk (Allan variance)	0.4°/√h
	Bandwidth (−3 dB)	≥220 Hz
GENERAL	Sample rate	100–1000 Hz
	Weight	≤25 g
	Supply voltage	5.0 ± 0.5 V
	RS422 transmission bit rate	921600 bps
	Mechanical shock, any direction	≥20,000 g

Table 2. Specifications used for network training.

The output dimension of dense layer	1
Activation function of dense layer	ReLU
Dropout rate	0.5
Batch size	256
Training epoch	50
Learning rate	0.001

Table 3. Performance with varying values of the input data step.

Number of Hidden Layers	Number of Hidden Units	Input Data Step	STD (°/s)	Time/Epoch
10	64	5	0.1551	23 s
10	64	10	0.1481	38 s
10	64	15	0.1483	60 s
10	64	20	0.1346	82 s
10	64	25	0.1368	98 s
10	64	30	0.1501	115 s

Table 4. Performance with varying values of the number of hidden units.

Number of Hidden Layers	Number of Hidden Units	Input Data Step	STD (°/s)	Time/Epoch
10	8	20	0.1504	82 s
10	16	20	0.1459	82 s
10	32	20	0.1559	81 s
10	64	20	0.1346	82 s
10	128	20	0.1326	81 s
10	256	20	0.1468	88 s

Table 5. Performance with varying values of the number of hidden layers.

Number of Hidden Layers	Number of Hidden Units	Input Data Step	STD (°/s)	Time/Epoch
1	128	20	0.1493	10 s
2	128	20	0.1513	19 s
3	128	20	0.1597	27 s
4	128	20	0.1557	34 s
5	128	20	0.1658	42 s
6	128	20	0.1505	50 s
7	128	20	0.1470	58 s
8	128	20	0.1459	66 s
9	128	20	0.1542	75 s
10	128	20	0.1326	81 s
11	128	20	0.1405	90 s
12	128	20	0.1393	98 s

Table 6. Comparison of raw data, BiLSTM, LSTM, BiGRU, and GRU standard deviation values for x-axis data.

x-axis	STD (°/s)	Percentage of Raw STD
Raw data	0.2493	$-$
BiLSTM	0.1326	53.19%
LSTM	0.1543	61.89%
BiGRU	0.1501	60.21%
GRU	0.1604	64.34%

Table 7. Comparison of raw data, BiLSTM, LSTM, BiGRU, and GRU standard deviation values for z-axis data.

z-axis	STD (°/s)	Percentage of Raw STD
Raw data	0.2400	$-$
BiLSTM	0.1353	56.38%
LSTM	0.1550	64.58%
BiGRU	0.1504	62.67%
GRU	0.1574	65.58%

Table 8. Kalman filter parameters of all measurements.

	$Φ$	$Γ$	$H$	$Q$	$R$
x-axis raw data	$[\begin{matrix} - 0.3126 & 0.8168 & 0.1520 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{matrix}]$	$[\begin{matrix} 1 & 0.6885 & - 0.3241 & - 0.0366 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{matrix}]$	$[\begin{matrix} 1 & 0 & 0 & 0 \end{matrix}]$	$[\begin{matrix} 0.0370 & 0 & 0 & 0 \\ 0 & 0.0370 & 0 & 0 \\ 0 & 0 & 0.0370 & 0 \\ 0 & 0 & 0 & 0.0370 \end{matrix}]$	$0.0604$
x-axis BiLSTM	$[\begin{matrix} 0.8505 & 0.8891 & - 0.7745 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{matrix}]$	$[\begin{matrix} 1 & 0.1354 & - 0.8476 & - 0.0815 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{matrix}]$	$[\begin{matrix} 1 & 0 & 0 & 0 \end{matrix}]$	$[\begin{matrix} 0.0036 & 0 & 0 & 0 \\ 0 & 0.0036 & 0 & 0 \\ 0 & 0 & 0.0036 & 0 \\ 0 & 0 & 0 & 0.0036 \end{matrix}]$	$0.0175$
z-axis raw data	$[\begin{matrix} 0.8593 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{matrix}]$	$[\begin{matrix} 1 & - 0.5191 & 0.0328 & 0.0236 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{matrix}]$	$[\begin{matrix} 1 & 0 & 0 & 0 \end{matrix}]$	$[\begin{matrix} 0.0405 & 0 & 0 & 0 \\ 0 & 0.0405 & 0 & 0 \\ 0 & 0 & 0.0405 & 0 \\ 0 & 0 & 0 & 0.0405 \end{matrix}]$	$0.0596$
z-axis BiLSTM	$[\begin{matrix} 2.0000 & - 1.2268 & 0.2031 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{matrix}]$	$[\begin{matrix} 1 & - 1.0446 & 0.0809 & 0.1086 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{matrix}]$	$[\begin{matrix} 1 & 0 & 0 & 0 \end{matrix}]$	$[\begin{matrix} 0.0043 & 0 & 0 & 0 \\ 0 & 0.0043 & 0 & 0 \\ 0 & 0 & 0.0043 & 0 \\ 0 & 0 & 0 & 0.0043 \end{matrix}]$	$0.0183$

Table 9. The parameter estimation results.

	$Φ$	$H$	$Q$	$R$
x-axis raw data	0.9723	0.1350	0.0016	0.1181
x-axis BiLSTM	0.9738	0.3422	0.0016	0.2852
z-axis raw data	0.9471	0.1327	0.0065	0.1155
z-axis BiLSTM	0.9530	0.2983	0.0044	0.2430
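To show how scalar parameters such as those in Table 9 would enter a filter, the following Python sketch implements a one-dimensional Kalman filter for the model $x_{k} = Φ x_{k-1} + w_{k}$, $z_{k} = H x_{k} + v_{k}$. It is only an illustration under stated assumptions: $Q$ and $R$ are treated as noise variances, the initial state `x0` and covariance `p0` are hypothetical defaults, and the EM iteration that estimates the parameters is not reproduced.

```python
def kalman_filter_1d(measurements, phi, h, q, r, x0=0.0, p0=1.0):
    """Scalar Kalman filter; returns the filtered state estimate at each step."""
    x, p = x0, p0                      # x0 and p0 are hypothetical initial values
    estimates = []
    for z in measurements:
        # Prediction step: propagate the state and its variance.
        x_pred = phi * x
        p_pred = phi * p * phi + q
        # Update step: blend the prediction with the new measurement.
        gain = p_pred * h / (h * p_pred * h + r)
        x = x_pred + gain * (z - h * x_pred)
        p = (1.0 - gain * h) * p_pred
        estimates.append(x)
    return estimates


if __name__ == "__main__":
    # Constant measurements with a static state model: the estimate approaches 2.
    est = kalman_filter_1d([2.0] * 50, phi=1.0, h=1.0, q=0.0, r=1.0, p0=1e6)
    print(est[-1])
```

With the first row of Table 9, a call would look like `kalman_filter_1d(z, phi=0.9723, h=0.1350, q=0.0016, r=0.1181)`, where `z` is a list of gyroscope samples; multiplying each returned state by `h` maps it back to the measurement domain.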

Table 10. Comparison of raw data, BiLSTM, ARMA-KF, EM-KF, BiLSTM-ARMA-KF, and BiLSTM-EM-KF standard deviation values for x-axis data.

x-axis	STD (°/s)	Percentage of Raw STD
Raw data	0.2493	$-$
BiLSTM	0.1326	53.19%
ARMA-KF	0.1618	64.90%
EM-KF	0.0943	37.83%
BiLSTM-ARMA-KF	0.1201	48.17%
BiLSTM-EM-KF	0.0642	25.75%

Table 11. Comparison of raw data, BiLSTM, ARMA-KF, EM-KF, BiLSTM-ARMA-KF, and BiLSTM-EM-KF standard deviation values for z-axis data.

z-axis	STD (°/s)	Percentage of Raw STD
Raw data	0.2400	$-$
BiLSTM	0.1353	56.38%
ARMA-KF	0.1853	77.21%
EM-KF	0.1098	45.75%
BiLSTM-ARMA-KF	0.1233	51.38%
BiLSTM-EM-KF	0.0958	39.92%
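The percentage column in Tables 10 and 11 is consistent with the ratio of each STD to the raw-data STD, and the reductions quoted in the abstract follow from the same table entries. The short Python sketch below reproduces both calculations; the function names are hypothetical.

```python
def std_ratio_percent(std, raw_std):
    """STD expressed as a percentage of the raw-data STD."""
    return 100.0 * std / raw_std


def std_reduction_percent(baseline_std, proposed_std):
    """Relative STD reduction of the proposed method against a baseline method."""
    return 100.0 * (baseline_std - proposed_std) / baseline_std


if __name__ == "__main__":
    # x-axis values from Table 10: raw 0.2493, BiLSTM 0.1326, BiLSTM-EM-KF 0.0642.
    print(round(std_ratio_percent(0.0642, 0.2493), 2))
    print(round(std_reduction_percent(0.1326, 0.0642), 2))
```

For the x-axis, the BiLSTM-EM-KF STD is 25.75% of the raw STD, and the reduction relative to the BiLSTM network is 51.58%, matching the figure given in the abstract.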

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

