Proceeding Paper

Outliers Impact on Parameter Estimation of Gaussian and Non-Gaussian State Space Models: A Simulation Study †

by Fernanda Catarina Pereira 1,*, Arminda Manuela Gonçalves 2,‡ and Marco Costa 3,‡
1 Centre of Mathematics, University of Minho, 4710-057 Braga, Portugal
2 Department of Mathematics and Centre of Mathematics, University of Minho, 4710-057 Braga, Portugal
3 Centre for Research and Development in Mathematics and Applications, Águeda School of Technology and Management, University of Aveiro, 3810-193 Aveiro, Portugal
* Author to whom correspondence should be addressed.
† Presented at the 8th International Conference on Time Series and Forecasting, Gran Canaria, Spain, 27–30 June 2022.
‡ These authors contributed equally to this work.
Published: 22 June 2022
(This article belongs to the Proceedings of The 8th International Conference on Time Series and Forecasting)

Abstract

State space models are powerful and flexible tools for handling systems that vary significantly over time, because their formulation allows the models’ parameters to vary over time. Assuming a known distribution of errors, in particular the Gaussian distribution, parameter estimation is usually performed by maximum likelihood. However, in time series data, it is common to have discrepant values that can impact statistical data analysis. This paper presents a simulation study with several scenarios to find out in which situations outliers can affect the maximum likelihood estimators. The results obtained were evaluated in terms of the difference between the maximum likelihood estimate and the true value of the parameter and the rate of valid estimates. It was found that, for both Gaussian and exponential errors, outliers had more impact in two situations: when the sample size is small and the autoregressive parameter is close to 1, and when the sample size is large and the autoregressive parameter is close to 0.25.

1. Introduction

There are several books in the literature that describe state space models in detail [1,2,3,4,5]. A major advantage of these models is the possibility of explicitly integrating the unobservable components of a time series by relating them to each other stochastically.
State space models have in their structure a latent process, the state, which is not observed. The Kalman filter is typically used to estimate it. It is a recursive algorithm that, at each time, computes the optimal estimator of the state, in the sense of minimum mean squared error, when the model is fully specified. It also produces one-step-ahead predictions, updating and improving the predictions of the state vector in real time as new observations become available. The Kalman filter originated in control engineering in the 1960s, with a paper by Kalman [6] describing a recursive solution to the linear filtering problem for discrete time. Today, this algorithm is applied in various areas of study.
Usually, the unknown parameters of the model are estimated by maximum likelihood under the assumption of normally distributed errors; however, this assumption cannot always be guaranteed. Non-parametric estimation methods can contribute substantially by supplying initial values for the iterative methods used to optimize the likelihood function, which often fail to converge because of a poor initial choice of the parameters. For example, ref. [7] proposes distribution-free estimators based on the generalized method of moments, which do not depend on the distribution of the errors.
Nevertheless, even if the assumption of normality of errors is not verified, the Kalman filter still returns optimal predictions within the class of all linear estimators. However, the optimal properties of Kalman filter predictors can only be ensured when all state space models’ parameters are known. When the unknown parameter vector is replaced by its estimate, the mean squared error of the estimators is underestimated.
The analysis and modeling of dynamic systems through state space models has been quite useful given its flexibility. In its formulation, the state process is assumed to be a Markov process, so that optimal predictions of the states and, consequently, of the observations can be obtained from the optimal estimator of the current state alone.
Despite these advantages, any prediction model is dependent on the quality of the data. Particularly, in many cases, meteorological time series are subject to higher uncertainties, and Kalman filter solutions can be biased [8].
In particular, outliers are an important issue in time series modeling. Time series observations are typically dependent on each other, and the presence of outliers can impact parameter estimates, forecasting and also inference results [9]. To deal with incomplete data and outliers in the observed data, ref. [10] developed a modified robust Kalman filter. Ref. [11] showed that even simple linear Gaussian state space models can have problems estimating the unknown parameters, which can consequently affect the state predictions, especially when the measurement error is much larger than the stochasticity of the process. Ref. [12] proposed a non-parametric estimation method based on statistical data depth functions to obtain estimates of the mean and the covariance matrix of asset returns that are more robust in the presence of outliers and do not require parametric assumptions.
This work arose from the project “TO CHAIR—The Optimal Challenges in Irrigation”, in which short-term forecast models, with the state space representation, were developed to model the time series of maximum air temperature. For this project, we analyzed data provided by the University of Trás-os-Montes and Alto Douro, corresponding to the maximum air temperature observed in a farm, located in the district of Bragança, between 20 February and 11 October 2019, and data from the website weatherstack.com, corresponding to the forecasts with a time horizon of 1 to 6 days of the same meteorological variable for the same location. The main goal focused on improving the accuracy of the forecasts for the farm. However, there were some modeling problems, particularly regarding the convergence of the numerical method, which arose in the presence of outliers.
Therefore, to evaluate and compare the quality of the estimates of the unknown parameters of the linear invariant state space model in the presence of outliers, this paper presents four simulation studies: the first is based on the linear Gaussian state space model; the second is based on the linear Gaussian state space model with contaminated observations; the third is based on the linear non-Gaussian state space model with exponential errors; and the last one is based on the linear non-Gaussian state space model with exponential errors and contaminated observations. For each of the four studies, several scenarios were tested, in which 2000 samples of size $n$ ($n = 50, 200, 500$) with valid estimates were simulated. The results obtained were evaluated in terms of the difference between the maximum likelihood estimate and the true value of the parameter and the rate of valid estimates.

2. Simulation Design

In general, the linear univariate state space model is given as follows:
$$Y_t = \beta_t W_t + e_t, \quad \text{observation equation} \tag{1}$$
$$\beta_t = \mu + \phi(\beta_{t-1} - \mu) + \varepsilon_t, \quad \text{state equation} \tag{2}$$
where $t = 1, \ldots, n$ is the discrete time and
  • $Y_t$ is the observed data;
  • $W_t$ is a factor, assumed to be known, that relates the observation $Y_t$ to the state $\beta_t$ at time $t$;
  • $\{\beta_t\}_{t=1,\ldots,n} \sim AR(1)$, $-1 < \phi < 1$, $E(\beta_t) = \mu$, and $\mathrm{var}(\beta_t) = \frac{\sigma_\varepsilon^2}{1-\phi^2}$;
  • $E(e_t) = 0$, $E(e_t e_s) = 0$, $\forall t \neq s$, and $\mathrm{var}(e_t) = \sigma_e^2$;
  • $E(\varepsilon_t) = 0$, $E(\varepsilon_t \varepsilon_s) = 0$, $\forall t \neq s$, and $\mathrm{var}(\varepsilon_t) = \sigma_\varepsilon^2$;
  • $E(e_t \varepsilon_s) = 0$, $\forall t, s$.
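The stationary variance quoted for the AR(1) state follows in one step from the state equation. The short derivation below is a sketch that assumes the process is stationary (so the variance does not depend on $t$) and that $\varepsilon_t$ is uncorrelated with $\beta_{t-1}$:

```latex
\begin{aligned}
\mathrm{var}(\beta_t) &= \phi^2\,\mathrm{var}(\beta_{t-1}) + \sigma_\varepsilon^2
  && \text{(state equation, uncorrelated terms)} \\
\mathrm{var}(\beta_t)\,(1-\phi^2) &= \sigma_\varepsilon^2
  && \text{(stationarity: } \mathrm{var}(\beta_t)=\mathrm{var}(\beta_{t-1})\text{)} \\
\mathrm{var}(\beta_t) &= \frac{\sigma_\varepsilon^2}{1-\phi^2},
  && -1<\phi<1 .
\end{aligned}
```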
This paper aims to investigate under what conditions the presence of outliers affects the estimation of parameters and states in the state space model. Thus, we simulate time series of size $n$ ($n = 50, 200, 500$) using the model defined by Equations (1) and (2). For simplicity’s sake, we consider for all simulation studies $W_t = 1$, $\forall t$, and $\mu = 0$, that is
$$Y_t = \beta_t + e_t, \tag{3}$$
$$\beta_t = \phi \beta_{t-1} + \varepsilon_t, \quad t = 1, \ldots, n. \tag{4}$$
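As an illustration of the simplified model in Equations (3) and (4), the following minimal sketch simulates one series with Gaussian errors using only the Python standard library. The function name `simulate_ar1_plus_noise` and the choice to start the simulated state at zero are assumptions of this sketch, not details taken from the study.

```python
import math
import random


def simulate_ar1_plus_noise(n, phi, var_eps, var_e, seed=None):
    """Simulate Y_t = beta_t + e_t with beta_t = phi * beta_{t-1} + eps_t.

    Both error sequences are drawn as zero-mean Gaussians with the given
    variances. Returns the latent states and the observations.
    """
    rng = random.Random(seed)
    sd_eps = math.sqrt(var_eps)
    sd_e = math.sqrt(var_e)
    beta = 0.0  # assumption of this sketch: the simulated state starts at zero
    states, observations = [], []
    for _ in range(n):
        beta = phi * beta + rng.gauss(0.0, sd_eps)          # state equation (4)
        states.append(beta)
        observations.append(beta + rng.gauss(0.0, sd_e))    # observation equation (3)
    return states, observations


if __name__ == "__main__":
    states, y = simulate_ar1_plus_noise(n=200, phi=0.75, var_eps=1.0, var_e=0.1, seed=42)
    print(len(y), y[:3])
```

Passing a fixed `seed` makes a replicate reproducible, which is convenient when the same series has to be compared with and without contamination.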
To create the contamination scenario, we study real time series concerning maximum air temperature. We used data from two different sources: the first corresponds to daily records of maximum air temperature between 20 February and 11 October 2019 (234 observations) through a portable weather station installed on a farm located in the Bragança district in northeastern Portugal; the second database corresponds to forecasts from the weatherstack.com website. These forecasts have a time horizon of up to 6 days; this means that, for a certain time $t$, we have forecasts given at times $t-6, t-5, \ldots, t-1$.
So, first we took the difference between the recorded/observed maximum temperature and the website’s forecasts, say, $\Lambda_{t,(h)}$, where $t$ is the time, in days, and $h$ is the time horizon of the forecasts, $h = 1, \ldots, 6$ days. Next, we calculated the percentage of outliers of $\Lambda_{t,(h)}$, which was on average 5%. The outliers of $\Lambda_{t,(h)}$ were then removed and replaced by linear interpolation, giving $\Lambda^{*}_{t,(h)}$, in order to remove the contamination present in the data, and the mean was subtracted, $\Lambda^{*}_{t,(h)} - \mathrm{mean}(\Lambda^{*}_{t,(h)})$, so that the series had zero mean. Then, for each time horizon $h$ ($h = 1, \ldots, 6$), the model with a state space representation presented by Equations (3) and (4) was fitted to the data $\Lambda^{*}_{t,(h)} - \mathrm{mean}(\Lambda^{*}_{t,(h)})$.
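The cleaning steps just described (flag outliers, replace them by linear interpolation, subtract the mean) can be sketched as follows. The text does not say which rule was used to flag outliers, so the boxplot rule (values beyond 1.5 interquartile ranges from the quartiles) is purely an assumption of this sketch, as is the function name `clean_and_center`.

```python
import statistics


def clean_and_center(series, whisker=1.5):
    """Flag outliers, replace them by linear interpolation, then centre the series.

    Assumption of this sketch: outliers are flagged with the boxplot rule.
    Returns the centred cleaned series and the list of outlier flags.
    """
    q1, _, q3 = statistics.quantiles(series, n=4)
    iqr = q3 - q1
    low, high = q1 - whisker * iqr, q3 + whisker * iqr
    is_outlier = [not (low <= x <= high) for x in series]

    good = [i for i, flag in enumerate(is_outlier) if not flag]
    cleaned = list(series)
    for i, flag in enumerate(is_outlier):
        if not flag:
            continue
        left = max((g for g in good if g < i), default=None)
        right = min((g for g in good if g > i), default=None)
        if left is None:            # outlier at the start: copy the next good value
            cleaned[i] = series[right]
        elif right is None:         # outlier at the end: copy the last good value
            cleaned[i] = series[left]
        else:                       # interior outlier: interpolate linearly in time
            weight = (i - left) / (right - left)
            cleaned[i] = series[left] + weight * (series[right] - series[left])

    mean = statistics.fmean(cleaned)
    return [x - mean for x in cleaned], is_outlier


if __name__ == "__main__":
    differences = [1.0, 2.0, 3.0, 100.0, 5.0, 6.0, 7.0, 8.0, 9.0]
    centred, flags = clean_and_center(differences)
    print(flags.index(True), centred)
```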
In order to establish a relationship between the estimates of parameters $\phi$, $\sigma_\varepsilon^2$ and $\sigma_e^2$, that were obtained from the “non-contaminated” data, and the magnitude of the outliers of $\Lambda_{t,(h)}$, the linear regression model was fitted, whose relationship is given by
$$k = 1.8874 + 3.5161 \sqrt{\frac{\sigma_\varepsilon^2}{1-\phi^2} + \sigma_e^2} \tag{5}$$
where $k = |\text{outliers of } \Lambda_{t,(h)} - \text{mean of } \Lambda_{t,(h)} \text{ without outliers}|$ is the magnitude of the outliers, and $\frac{\sigma_\varepsilon^2}{1-\phi^2} + \sigma_e^2$ is the total variance of $Y_t$. In total, $\Lambda_{t,(h)}$ ($h = 1, \ldots, 6$) shows 59 outliers.
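To make the role of the regression in Equation (5) concrete, the sketch below evaluates the outlier magnitude for a given parameter combination. It reads Equation (5) as the intercept plus the slope times the square root of the total variance of $Y_t$; that reading, and the function names, are assumptions of this sketch.

```python
import math


def total_variance(phi, var_eps, var_e):
    """Total variance of Y_t: stationary state variance plus observation variance."""
    return var_eps / (1.0 - phi ** 2) + var_e


def outlier_magnitude(phi, var_eps, var_e, intercept=1.8874, slope=3.5161):
    """Outlier magnitude k from the fitted regression.

    Assumption of this sketch: the slope multiplies the square root of the
    total variance of Y_t.
    """
    return intercept + slope * math.sqrt(total_variance(phi, var_eps, var_e))


if __name__ == "__main__":
    for phi in (0.25, 0.75):
        k = outlier_magnitude(phi, var_eps=5.0, var_e=2.0)
        print(f"phi={phi}: total variance={total_variance(phi, 5.0, 2.0):.4f}, k={k:.4f}")
```

Under this reading, $k$ grows with both variances and with $|\phi|$, so the injected outliers scale with the overall spread of the simulated series.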
In this work, four simulation scenarios were tested:
  • The first is based on the linear Gaussian state space model given by
    $$Y_t = \beta_t + e_t, \quad e_t \sim N(0, \sigma_e^2)$$
    $$\beta_t = \phi \beta_{t-1} + \varepsilon_t, \quad \varepsilon_t \sim N(0, \sigma_\varepsilon^2), \quad t = 1, \ldots, n$$
  • The second is based on the linear Gaussian state space model with contaminated observations.
    To contaminate the model, the deterministic factor $k$, given in (5), is added in this way
    $$Y_t = \beta_t + e_t + I_t k, \quad e_t \sim N(0, \sigma_e^2)$$
    $$\beta_t = \phi \beta_{t-1} + \varepsilon_t, \quad \varepsilon_t \sim N(0, \sigma_\varepsilon^2),$$
    where $I_t \sim B(1, 0.05)$.
  • The third is based on the linear non-Gaussian state space model with exponential errors defined by
    $$Y_t = \beta_t + e_t, \quad e_t \sim \mathrm{Exp}(\lambda_e) - \frac{1}{\lambda_e}$$
    $$\beta_t = \phi \beta_{t-1} + \varepsilon_t, \quad \varepsilon_t \sim \mathrm{Exp}(\lambda_\varepsilon) - \frac{1}{\lambda_\varepsilon}, \quad t = 1, \ldots, n$$
  • The last one is based on the linear non-Gaussian state space model with exponential errors and contaminated observations. Similar to scenario 2, we have
    $$Y_t = \beta_t + e_t + I_t k, \quad e_t \sim \mathrm{Exp}(\lambda_e) - \frac{1}{\lambda_e}$$
    $$\beta_t = \phi \beta_{t-1} + \varepsilon_t, \quad \varepsilon_t \sim \mathrm{Exp}(\lambda_\varepsilon) - \frac{1}{\lambda_\varepsilon}, \quad t = 1, \ldots, n$$
    where $I_t \sim B(1, 0.05)$, and $k$ is given in (5).
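The four scenarios differ only in the error distribution and in whether the term $I_t k$ is added, so they can be sketched with a single generator. In the sketch below, the exponential rate is chosen as $\lambda = 1/\sqrt{\text{variance}}$ so that the centred exponential error has the requested variance; that mapping, the zero starting state and all names are assumptions of this sketch.

```python
import math
import random


def centered_exponential(rng, variance):
    """Draw Exp(rate) - 1/rate, which has mean zero and variance 1/rate**2."""
    rate = 1.0 / math.sqrt(variance)  # assumption: rate chosen to match the variance
    return rng.expovariate(rate) - 1.0 / rate


def simulate_scenario(n, phi, var_eps, var_e, errors="gaussian",
                      contaminated=False, k=0.0, p_outlier=0.05, seed=None):
    """Simulate one series from any of the four scenarios.

    errors: "gaussian" (scenarios 1-2) or "exponential" (scenarios 3-4).
    contaminated: if True, add k to Y_t whenever I_t ~ B(1, p_outlier) equals 1.
    Returns the observations and the list of contamination indicators.
    """
    rng = random.Random(seed)
    if errors == "gaussian":
        def draw(variance):
            return rng.gauss(0.0, math.sqrt(variance))
    elif errors == "exponential":
        def draw(variance):
            return centered_exponential(rng, variance)
    else:
        raise ValueError("errors must be 'gaussian' or 'exponential'")

    beta = 0.0  # assumption of this sketch: the simulated state starts at zero
    observations, indicators = [], []
    for _ in range(n):
        beta = phi * beta + draw(var_eps)
        u = rng.random()  # always drawn, so clean and contaminated runs share a stream
        hit = contaminated and u < p_outlier
        observations.append(beta + draw(var_e) + (k if hit else 0.0))
        indicators.append(hit)
    return observations, indicators


if __name__ == "__main__":
    clean, _ = simulate_scenario(200, 0.75, 1.0, 0.1, seed=7)
    dirty, flags = simulate_scenario(200, 0.75, 1.0, 0.1, contaminated=True, k=7.3, seed=7)
    print(sum(flags), "contaminated observations out of", len(dirty))
```

Because the uniform draw for $I_t$ is made in every run, a clean and a contaminated series generated with the same seed differ only at the contaminated time points, and there by exactly $k$.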
For each of the four scenarios, sample sizes of $n = 50, 200, 500$ were simulated. In this study, two values were simulated for $\phi$ (0.25 and 0.75), combined with six pairs of values for $(\sigma_\varepsilon^2, \sigma_e^2)$: (0.10, 0.05), (1.00, 0.10), (5.00, 2.00), (0.10, 1.00), (2.00, 5.00) and (0.05, 0.10). For each parameter combination, 2000 replicates with valid estimates were considered, i.e., estimates within the parameter space: $-1 < \phi < 1$, $\sigma_\varepsilon > 0$, and $\sigma_e > 0$. In all simulations, we take the initial state $\beta_0 = 0$ in the Kalman filter.
To evaluate the quality of the parameter estimates, we considered the Root Mean Square Error (RMSE),
$$\mathrm{RMSE}(\Theta) = \sqrt{\frac{1}{2000} \sum_{i=1}^{2000} \left( \Theta_i - \hat{\Theta}_i \right)^2},$$
the Mean Absolute Error (MAE),
$$\mathrm{MAE}(\Theta) = \frac{1}{2000} \sum_{i=1}^{2000} \left| \Theta_i - \hat{\Theta}_i \right|,$$
the Mean Absolute Percentage Error (MAPE),
$$\mathrm{MAPE}(\Theta) = \frac{1}{2000} \sum_{i=1}^{2000} \left| \frac{\Theta_i - \hat{\Theta}_i}{\Theta_i} \right| \times 100,$$
where $\Theta = (\phi, \sigma_\varepsilon^2, \sigma_e^2)$, and the convergence rate. The convergence rate provides information about the percentage of valid estimates among all simulations (simulations with valid and non-valid estimates). The convergence rate is given by the number of valid simulated estimates (in this case, 2000) divided by the number of total simulations.
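The evaluation measures above translate directly into code. The sketch below applies them to one parameter at a time, with the true value held fixed across replicates; the helper names, including the validity check `is_valid`, are assumptions of this sketch.

```python
import math


def rmse(true_value, estimates):
    """Root mean square error of the estimates around the true value."""
    return math.sqrt(sum((true_value - est) ** 2 for est in estimates) / len(estimates))


def mae(true_value, estimates):
    """Mean absolute error of the estimates around the true value."""
    return sum(abs(true_value - est) for est in estimates) / len(estimates)


def mape(true_value, estimates):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((true_value - est) / true_value) for est in estimates) / len(estimates)


def is_valid(phi_hat, sd_eps_hat, sd_e_hat):
    """An estimate is valid when it lies inside the parameter space."""
    return -1.0 < phi_hat < 1.0 and sd_eps_hat > 0.0 and sd_e_hat > 0.0


def convergence_rate(n_valid, n_total):
    """Percentage of valid estimates among all simulations."""
    return 100.0 * n_valid / n_total


if __name__ == "__main__":
    phi_estimates = [0.70, 0.78, 0.81, 0.66]
    print(rmse(0.75, phi_estimates), mae(0.75, phi_estimates), mape(0.75, phi_estimates))
    # 2000 valid estimates out of 2679 simulations, as in one row of Table 1
    print(round(convergence_rate(2000, 2679)))
```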
To estimate the unknown parameters $\Theta = (\phi, \sigma_\varepsilon^2, \sigma_e^2)$ of the state space model (3) and (4) in each simulation, the maximum likelihood method was used by assuming the normality of the disturbances for all four scenarios. Log-likelihood maximization was performed by the Newton–Raphson numerical method. In this study, the R package “astsa” was used [3,13,14].
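For readers who want to see what is being maximized, the sketch below evaluates the Gaussian log-likelihood of model (3) and (4) through the Kalman filter prediction errors. It is written in Python rather than R and does not reproduce the astsa routines. The initial state variance (the stationary variance) is an assumption of this sketch, and the coarse grid search is only a stand-in for the Newton–Raphson optimization used in the study.

```python
import math


def gaussian_loglik(y, phi, var_eps, var_e, beta0=0.0, p0=None):
    """Gaussian log-likelihood of Y_t = beta_t + e_t, beta_t = phi*beta_{t-1} + eps_t,
    accumulated from the Kalman filter one-step-ahead prediction errors."""
    if p0 is None:
        # Assumption of this sketch: start from the stationary state variance.
        p0 = var_eps / (1.0 - phi ** 2)
    beta, p = beta0, p0
    loglik = 0.0
    for obs in y:
        beta_pred = phi * beta                 # predicted state
        p_pred = phi ** 2 * p + var_eps        # variance of the state prediction
        innovation = obs - beta_pred           # one-step-ahead prediction error
        f = p_pred + var_e                     # variance of the prediction error
        loglik -= 0.5 * (math.log(2.0 * math.pi) + math.log(f) + innovation ** 2 / f)
        gain = p_pred / f                      # Kalman gain
        beta = beta_pred + gain * innovation   # filtered state
        p = (1.0 - gain) * p_pred              # filtered state variance
    return loglik


def grid_search_mle(y, phi_grid, var_eps_grid, var_e_grid):
    """Crude stand-in for a numerical optimizer: best grid point by log-likelihood."""
    best_params, best_loglik = None, -math.inf
    for phi in phi_grid:
        for var_eps in var_eps_grid:
            for var_e in var_e_grid:
                value = gaussian_loglik(y, phi, var_eps, var_e)
                if value > best_loglik:
                    best_params, best_loglik = (phi, var_eps, var_e), value
    return best_params, best_loglik


if __name__ == "__main__":
    y = [0.5, -0.2, 0.1, 0.4, -0.3, 0.8, 0.6, -0.1]
    print(grid_search_mle(y, [0.25, 0.5, 0.75], [0.1, 0.5, 1.0], [0.05, 0.1, 0.5]))
```

A grid search cannot fail to converge the way an iterative optimizer can, so it says nothing about the convergence rates reported in the tables; it only shows which objective the estimates maximize.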

3. Results

In this section, the simulation results are presented. Table 1, Table 2 and Table 3 present the results of the simulations in terms of the RMSE, MAE, MAPE (%) and the convergence rate (%) for sample sizes n = 50, n = 200 and n = 500, respectively, considering both non-contaminated (NC) and contaminated Gaussian errors. Table 4, Table 5 and Table 6 show the simulation results considering contaminated and non-contaminated exponential errors.
As expected, contamination had an impact on the performance of the maximum likelihood estimators.
First, the convergence rate tends to decrease for small sample sizes, even when the errors are not contaminated. For example, in the case of non-contaminated Gaussian errors, the lowest convergence rate was 72% for $n = 500$, against 57% for $n = 50$. For contaminated Gaussian and exponential errors, the convergence rate decreased compared to non-contaminated errors.
Overall, an improvement in the rate of valid estimates (convergence rate) is noticeable when $\phi = 0.75$ compared to $\phi = 0.25$ in the case of non-contaminated Gaussian and exponential errors. In the case of contaminated Gaussian and exponential errors, this behavior only occurred when $n = 500$.
When the errors are not contaminated, the RMSE, MAE and MAPE tend to decrease with increasing sample size. However, this does not hold when the errors are contaminated. In fact, it was found that for both Gaussian and exponential errors, outliers had more impact in two situations: when $\phi = 0.75$ and $n = 50$ (Table 1 and Table 4); and when $\phi = 0.25$ and $n = 500$ (Table 3 and Table 6). This impact is reflected in the RMSE, MAE and MAPE, which produced very high values.
Furthermore, there are many cases where, for example, the RMSE of the estimators under contaminated errors is at least 3 times higher than the RMSE under non-contaminated errors. For example, in the case of the Gaussian errors with $n = 500$, $\phi = 0.25$, $\sigma_\varepsilon^2 = 0.10$ and $\sigma_e^2 = 0.05$, the RMSE of $\phi$, $\sigma_\varepsilon^2$ and $\sigma_e^2$ under contaminated Gaussian errors was about 3, 6 and 11 times higher, respectively, compared to the non-contaminated Gaussian errors (Table 3).
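The ratios quoted for this example can be checked with a few lines of arithmetic. The RMSE values below are copied from the two Table 3 rows for $\phi = 0.25$, $\sigma_\varepsilon^2 = 0.10$ and $\sigma_e^2 = 0.05$; the dictionary keys are just labels chosen for this sketch.

```python
# RMSE values from Table 3 (Gaussian errors, n = 500, phi = 0.25,
# state variance 0.10, observation variance 0.05).
rmse_non_contaminated = {"phi": 0.1670, "var_eps": 0.0487, "var_e": 0.0451}
rmse_contaminated = {"phi": 0.5099, "var_eps": 0.2843, "var_e": 0.4946}

# Ratio of contaminated to non-contaminated RMSE for each parameter.
ratios = {
    name: rmse_contaminated[name] / rmse_non_contaminated[name]
    for name in rmse_non_contaminated
}

if __name__ == "__main__":
    for name, ratio in ratios.items():
        print(f"{name}: contaminated RMSE is {ratio:.2f} times the non-contaminated RMSE")
```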
On the other hand, comparing both the Gaussian and exponential error cases, we find that there are no significant differences in the convergence rate, as well as in the efficiency of the autoregressive ϕ estimator. However, the RMSE, MAE and MAPE of the variance estimators, σ ε 2 and σ e 2 , are in general higher in the case of exponential errors.

4. Discussion

In this work, outliers were found to impact the performance of the maximum likelihood estimators. In particular, it was found through the simulation study that outliers have a very significant impact in two cases: when the sample size is small and the autoregressive parameter is close to 1, and when the sample size is large and the autoregressive parameter is close to 0.25. This impact was reflected in the RMSE, MAE and MAPE values which, in many cases, were higher compared to the case of non-contaminated errors.
Moreover, we notice that the rate of valid estimates (convergence rate) is higher for large sample sizes, and this is more evident for non-contaminated Gaussian and exponential errors. On the other hand, it is also important to have large sample sizes to avoid problems related to parameter estimation [11]. In general, the convergence rate is lower when Gaussian and exponential errors are contaminated.
Therefore, our next step is to develop methods to detect outliers in time series and/or to establish other estimation methods that are more robust, in the sense that they do not assume a distribution of the data and are less sensitive to outliers.
In this work, the outliers were generated from a regression model that established a linear relationship between the magnitude of the outliers and the total variance of the model with the state space representation of maximum air temperature real data. The rate of outliers from the real data was 5%; thus, this was the percentage used in this work.
In the literature, we did not find a unanimous approach for generating outliers. For example, ref. [15] contaminated the error of the zero-mean Gaussian equation of state by replacing the standard deviation of the observation error with a 10-times-higher standard deviation with a probability of 10% (symmetric outliers). They also considered the case of asymmetric outliers, where the zero mean of the observation error was replaced with a value 10 times higher than the standard deviation with a probability of 10%. Ref. [16] followed the same line as [15], but in this case they call symmetric outliers “zero-mean” and asymmetric outliers “non-zero”, considering the probability of contamination to be 5%. Ref. [9] contaminated both the observation and state equation errors, considering the magnitude of the outliers equal to 2.5 times the standard deviation from the diagonal elements of the observation and state covariance matrices, respectively.
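For comparison with the additive scheme used in this work, the symmetric and asymmetric schemes summarized in the paragraph above can be sketched as a mixture for the observation error. The sketch follows only the verbal description given here, not the cited papers themselves, and the function name and defaults are assumptions.

```python
import random


def contaminated_observation_error(rng, sd_e, kind="symmetric", p=0.10, factor=10.0):
    """Draw one observation error under a mixture contamination scheme.

    With probability 1 - p the error is N(0, sd_e^2).
    With probability p it is replaced by:
      - "symmetric":  N(0, (factor * sd_e)^2), i.e. an inflated standard deviation;
      - "asymmetric": N(factor * sd_e, sd_e^2), i.e. a shifted mean.
    """
    if rng.random() >= p:
        return rng.gauss(0.0, sd_e)
    if kind == "symmetric":
        return rng.gauss(0.0, factor * sd_e)
    if kind == "asymmetric":
        return rng.gauss(factor * sd_e, sd_e)
    raise ValueError("kind must be 'symmetric' or 'asymmetric'")


if __name__ == "__main__":
    rng = random.Random(123)
    errors = [contaminated_observation_error(rng, 1.0, kind="asymmetric") for _ in range(10)]
    print(errors)
```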

Author Contributions

F.C.P., A.M.G. and M.C. contributed to this work. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by FEDER/COMPETE/NORTE 2020/POCI/FCT funds through grants UID/EEA/-00147/20 13/UID/IEEA/00147/006933-SYSTEC project and To CHAIR - POCI-01-0145-FEDER-028247. A. Manuela Gonçalves was partially financed by Portuguese Funds through FCT (Fundação para a Ciência e a Tecnologia) within the Projects UIDB/00013/2020 and UIDP/00013/2020 of CMAT-UM. Marco Costa was partially supported by The Center for Research and Development in Mathematics and Applications (CIDMA) through the Portuguese Foundation for Science and Technology (FCT-Fundação para a Ciência e a Tecnologia), references UIDB/04106/2020 and UIDP/04106/2020. F. Catarina Pereira was financed by national funds through FCT (Fundação para a Ciência e a Tecnologia) through the individual PhD research grant UI/BD/150967/2021 of CMAT-UM.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Hamilton, J.D. Time Series Analysis; Princeton University Press: Princeton, NJ, USA, 1994.
  2. Harvey, A.C. Forecasting, Structural Time Series Models and the Kalman Filter; Cambridge University Press: Cambridge, UK, 2009.
  3. Shumway, R.H.; Stoffer, D.S. Time Series Analysis and its Applications: With R Examples; Springer: New York, NY, USA, 2017.
  4. Petris, G.; Petrone, S.; Campagnoli, P. Dynamic Linear Models with R; Springer: Berlin/Heidelberg, Germany, 2009.
  5. Durbin, J.; Koopman, S. Time Series Analysis by State Space Methods; Oxford University Press: Oxford, UK, 2001.
  6. Kalman, R. A New Approach to Linear Filtering and Prediction Problems. ASME J. Basic Eng. 1960, 82, 35–45.
  7. Costa, M.; Alpuim, T. Parameter estimation of state space models for univariate observations. J. Stat. Plan. Inference 2010, 140, 1889–1902.
  8. Costa, M.; Monteiro, M. Bias-correction of kalman filter estimators associated to a linear state space model with estimated parameters. J. Stat. Plan. Inference 2016, 176, 22–32.
  9. You, D.; Hunter, M.; Chen, M.; Chow, S.M. A diagnostic procedure for detecting outliers in linear state-space models. Multivar. Behav. Res. 2020, 55, 231–255.
  10. Cipra, T.; Romera, R. Kalman filter with outliers and missing observations. Test 1997, 6, 379–395.
  11. Auger-Méthé, M.; Field, C.; Albertsen, C.M.; Derocher, A.E.; Lewis, M.A.; Jonsen, I.D.; Flemming, J.M. State-space models’ dirty little secrets: Even simple linear Gaussian models can have estimation problems. Sci. Rep. 2016, 6, 26677.
  12. Pandolfo, G.; Iorio, C.; Siciliano, R.; D’Ambrosio, A. Robust mean-variance portfolio through the weighted Lp depth function. In Annals of Operations Research; Springer: Berlin/Heidelberg, Germany, 2020; Volume 292, pp. 519–531.
  13. Shumway, R.H.; Stoffer, D.S. Time Series: A Data Analysis Approach Using R; CRC Press: Boca Raton, FL, USA, 2019.
  14. R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2021.
  15. Crevits, R.; Croux, C. Robust estimation of linear state space models. Commun. Stat.-Simul. Comput. 2019, 48, 1694–1705.
  16. Ali, K.; Tahir, M. Maximum likelihood-based robust state estimation over a horizon length during measurement outliers. Trans. Inst. Meas. Control 2021, 43, 510–518.
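Table 1. RMSE, MAE, MAPE and convergence rate of Θ with 2000 simulations of sample size n = 50, considering Gaussian errors (NC = non-contaminated; C = contaminated). The convergence rate is shown as valid estimates/total simulations (%).
| $\phi$ | $\sigma_\varepsilon^2$ | $\sigma_e^2$ | Errors | RMSE $\phi$ | RMSE $\sigma_\varepsilon^2$ | RMSE $\sigma_e^2$ | MAE $\phi$ | MAE $\sigma_\varepsilon^2$ | MAE $\sigma_e^2$ | MAPE (%) $\phi$ | MAPE (%) $\sigma_\varepsilon^2$ | MAPE (%) $\sigma_e^2$ | Convergence rate |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.25 | 0.10 | 0.05 | NC | 0.2542 | 0.0563 | 0.0502 | 0.1894 | 0.0484 | 0.0452 | 75.7423 | 48.3383 | 90.4927 | 2000/2679 (75%) |
| 0.25 | 0.10 | 0.05 | C | 0.3823 | 0.3325 | 0.3095 | 0.2983 | 0.2375 | 0.2052 | 119.3057 | 237.5420 | 410.4236 | 2000/3059 (65%) |
| 0.25 | 1.00 | 0.10 | NC | 0.2598 | 0.4638 | 0.3699 | 0.1934 | 0.3634 | 0.2632 | 77.3528 | 36.3408 | 263.1570 | 2000/2466 (81%) |
| 0.25 | 1.00 | 0.10 | C | 0.3514 | 1.3520 | 1.2319 | 0.2717 | 1.0957 | 0.7552 | 108.6993 | 109.5651 | 755.1521 | 2000/2295 (87%) |
| 0.25 | 5.00 | 2.00 | NC | 0.2880 | 2.7078 | 2.2864 | 0.2184 | 2.2837 | 1.9787 | 87.3520 | 45.6746 | 98.9341 | 2000/2515 (80%) |
| 0.25 | 5.00 | 2.00 | C | 0.3820 | 6.3580 | 5.6467 | 0.2994 | 5.1446 | 4.1499 | 119.7610 | 102.8927 | 207.4966 | 2000/2223 (90%) |
| 0.25 | 0.10 | 1.00 | NC | 0.3320 | 0.6580 | 0.6630 | 0.2634 | 0.4794 | 0.5380 | 105.3703 | 479.3548 | 53.7974 | 2000/3520 (57%) |
| 0.25 | 0.10 | 1.00 | C | 0.4851 | 1.5345 | 1.1760 | 0.3916 | 1.0672 | 0.9876 | 156.6558 | 1067.1860 | 98.7612 | 2000/2533 (79%) |
| 0.25 | 2.00 | 5.00 | NC | 0.3097 | 3.3616 | 3.3738 | 0.2404 | 2.6656 | 2.7813 | 96.1717 | 133.2785 | 55.6251 | 2000/3137 (64%) |
| 0.25 | 2.00 | 5.00 | C | 0.4735 | 6.8657 | 5.6124 | 0.3706 | 4.8976 | 4.7397 | 148.2420 | 244.8814 | 94.7940 | 2000/2265 (88%) |
| 0.25 | 0.05 | 0.10 | NC | 0.2672 | 0.0750 | 0.0727 | 0.2077 | 0.0621 | 0.0620 | 83.0659 | 124.1018 | 62.0471 | 2000/3015 (66%) |
| 0.25 | 0.05 | 0.10 | C | 0.4473 | 0.3198 | 0.3676 | 0.3585 | 0.2216 | 0.2602 | 143.4173 | 443.1005 | 260.2065 | 2000/3033 (66%) |
| 0.75 | 0.10 | 0.05 | NC | 0.1595 | 0.0503 | 0.0367 | 0.1228 | 0.0413 | 0.0309 | 16.3687 | 41.2797 | 61.7597 | 2000/2265 (88%) |
| 0.75 | 0.10 | 0.05 | C | 0.4356 | 0.3444 | 0.5165 | 0.3019 | 0.1916 | 0.3533 | 40.2592 | 191.5928 | 706.5637 | 2000/3843 (52%) |
| 0.75 | 1.00 | 0.10 | NC | 0.1190 | 0.3430 | 0.1885 | 0.0917 | 0.2728 | 0.1408 | 12.2261 | 27.2783 | 140.7727 | 2000/2374 (84%) |
| 0.75 | 1.00 | 0.10 | C | 0.3056 | 1.3890 | 2.3812 | 0.2071 | 0.9184 | 1.7949 | 27.6161 | 91.8402 | 1794.8840 | 2000/2552 (78%) |
| 0.75 | 5.00 | 2.00 | NC | 0.1364 | 2.2249 | 1.5857 | 0.1062 | 1.8382 | 1.3111 | 14.1653 | 36.7633 | 65.5527 | 2000/2220 (90%) |
| 0.75 | 5.00 | 2.00 | C | 0.2899 | 6.8220 | 10.8105 | 0.1897 | 4.5106 | 8.5797 | 25.2983 | 90.2117 | 428.9829 | 2000/2192 (91%) |
| 0.75 | 0.10 | 1.00 | NC | 0.3152 | 0.4849 | 0.4972 | 0.2410 | 0.3009 | 0.3666 | 32.1341 | 300.8559 | 36.6612 | 2000/2695 (74%) |
| 0.75 | 0.10 | 1.00 | C | 0.5611 | 1.4225 | 1.4693 | 0.3981 | 0.8159 | 1.2037 | 53.0740 | 815.9315 | 120.3714 | 2000/2751 (73%) |
| 0.75 | 2.00 | 5.00 | NC | 0.2362 | 2.6149 | 2.4755 | 0.1784 | 1.8878 | 1.9114 | 23.7931 | 94.3914 | 38.2272 | 2000/2228 (90%) |
| 0.75 | 2.00 | 5.00 | C | 0.4479 | 6.7006 | 7.5337 | 0.3085 | 4.2198 | 6.1402 | 41.1287 | 210.9902 | 122.8036 | 2000/2302 (87%) |
| 0.75 | 0.05 | 0.10 | NC | 0.2296 | 0.0582 | 0.0526 | 0.1743 | 0.0429 | 0.0414 | 23.2456 | 85.7212 | 41.3812 | 2000/2223 (90%) |
| 0.75 | 0.05 | 0.10 | C | 0.5148 | 0.3089 | 0.4349 | 0.3731 | 0.1787 | 0.3175 | 49.7412 | 357.4894 | 317.4564 | 2000/3234 (62%) |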
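Table 2. RMSE, MAE, MAPE and convergence rate of Θ with 2000 simulations of sample size n = 200, considering Gaussian errors (NC = non-contaminated; C = contaminated). The convergence rate is shown as valid estimates/total simulations (%).
| $\phi$ | $\sigma_\varepsilon^2$ | $\sigma_e^2$ | Errors | RMSE $\phi$ | RMSE $\sigma_\varepsilon^2$ | RMSE $\sigma_e^2$ | MAE $\phi$ | MAE $\sigma_\varepsilon^2$ | MAE $\sigma_e^2$ | MAPE (%) $\phi$ | MAPE (%) $\sigma_\varepsilon^2$ | MAPE (%) $\sigma_e^2$ | Convergence rate |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.25 | 0.10 | 0.05 | NC | 0.2125 | 0.0533 | 0.0486 | 0.1552 | 0.0481 | 0.0445 | 62.0843 | 48.0637 | 89.0227 | 2000/2142 (93%) |
| 0.25 | 0.10 | 0.05 | C | 0.4414 | 0.2937 | 0.3973 | 0.3511 | 0.2170 | 0.3170 | 140.4550 | 216.9786 | 634.0003 | 2000/3415 (59%) |
| 0.25 | 1.00 | 0.10 | NC | 0.1827 | 0.3872 | 0.3342 | 0.1263 | 0.2890 | 0.2466 | 50.5172 | 28.8980 | 246.6108 | 2000/2158 (93%) |
| 0.25 | 1.00 | 0.10 | C | 0.3302 | 1.2494 | 1.3655 | 0.2531 | 1.1183 | 0.8983 | 101.2448 | 111.8291 | 898.3335 | 2000/2339 (86%) |
| 0.25 | 5.00 | 2.00 | NC | 0.2257 | 2.4857 | 2.2298 | 0.1647 | 2.1584 | 1.9656 | 65.8709 | 43.1676 | 98.2816 | 2000/2114 (95%) |
| 0.25 | 5.00 | 2.00 | C | 0.3210 | 6.0353 | 5.7585 | 0.2525 | 5.2843 | 4.1940 | 101.0001 | 105.6860 | 209.6988 | 2000/2171 (92%) |
| 0.25 | 0.10 | 1.00 | NC | 0.3294 | 0.5910 | 0.5952 | 0.2693 | 0.4203 | 0.4418 | 107.7206 | 420.3230 | 44.1772 | 2000/3064 (65%) |
| 0.25 | 0.10 | 1.00 | C | 0.5079 | 1.1490 | 1.2565 | 0.4159 | 0.7094 | 1.1202 | 166.3406 | 709.3999 | 112.0214 | 2000/2934 (68%) |
| 0.25 | 2.00 | 5.00 | NC | 0.2942 | 3.1051 | 3.0346 | 0.2352 | 2.4888 | 2.4204 | 94.0959 | 124.4377 | 48.4077 | 2000/2432 (82%) |
| 0.25 | 2.00 | 5.00 | C | 0.4888 | 5.2005 | 5.9769 | 0.3932 | 3.6031 | 5.3909 | 157.2640 | 180.1539 | 107.8174 | 2000/2355 (85%) |
| 0.25 | 0.05 | 0.10 | NC | 0.2512 | 0.0690 | 0.0672 | 0.1992 | 0.0574 | 0.0555 | 79.6690 | 114.7299 | 55.4981 | 2000/2353 (85%) |
| 0.25 | 0.05 | 0.10 | C | 0.4992 | 0.2898 | 0.3841 | 0.4026 | 0.1951 | 0.3182 | 161.0268 | 390.1951 | 318.2143 | 2000/3550 (56%) |
| 0.75 | 0.10 | 0.05 | NC | 0.0791 | 0.0306 | 0.0223 | 0.0613 | 0.0243 | 0.0177 | 8.1741 | 24.2574 | 35.4005 | 2000/2020 (99%) |
| 0.75 | 0.10 | 0.05 | C | 0.3249 | 0.1774 | 0.6307 | 0.1998 | 0.1042 | 0.5622 | 26.6395 | 104.1552 | 1124.3160 | 2000/5655 (35%) |
| 0.75 | 1.00 | 0.10 | NC | 0.0557 | 0.1838 | 0.1057 | 0.0442 | 0.1468 | 0.0859 | 5.8966 | 14.6823 | 85.8500 | 2000/2175 (92%) |
| 0.75 | 1.00 | 0.10 | C | 0.2726 | 0.7185 | 2.5998 | 0.1459 | 0.5113 | 2.3158 | 19.4529 | 51.1345 | 2315.7740 | 2000/3564 (56%) |
| 0.75 | 5.00 | 2.00 | NC | 0.0763 | 1.4259 | 0.9946 | 0.0596 | 1.1414 | 0.7971 | 7.9484 | 22.8271 | 39.8563 | 2000/2022 (99%) |
| 0.75 | 5.00 | 2.00 | C | 0.1241 | 3.4324 | 10.9591 | 0.0944 | 2.3950 | 10.0756 | 12.5843 | 47.8992 | 503.7812 | 2000/2054 (97%) |
| 0.75 | 0.10 | 1.00 | NC | 0.2457 | 0.3409 | 0.3348 | 0.1779 | 0.1836 | 0.2116 | 23.7169 | 183.5833 | 21.1571 | 2000/2139 (94%) |
| 0.75 | 0.10 | 1.00 | C | 0.4690 | 0.8662 | 1.5181 | 0.3137 | 0.4036 | 1.3994 | 41.8257 | 403.6151 | 139.9393 | 2000/2673 (75%) |
| 0.75 | 2.00 | 5.00 | NC | 0.1293 | 1.4609 | 1.3363 | 0.0943 | 1.0084 | 0.9691 | 12.5672 | 50.4175 | 19.3819 | 2000/2012 (99%) |
| 0.75 | 2.00 | 5.00 | C | 0.3233 | 3.2270 | 7.9895 | 0.1826 | 1.8840 | 7.3360 | 24.3493 | 94.1989 | 146.7208 | 2000/2320 (86%) |
| 0.75 | 0.05 | 0.10 | NC | 0.1246 | 0.0326 | 0.0291 | 0.0927 | 0.0232 | 0.0216 | 12.3547 | 46.3956 | 21.6304 | 2000/2025 (99%) |
| 0.75 | 0.05 | 0.10 | C | 0.3633 | 0.2014 | 0.4895 | 0.2410 | 0.1021 | 0.4373 | 32.1288 | 204.1943 | 437.3301 | 2000/4233 (47%) |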
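Table 3. RMSE, MAE, MAPE and convergence rate of Θ with 2000 simulations of sample size n = 500, considering Gaussian errors (NC = non-contaminated; C = contaminated). The convergence rate is shown as valid estimates/total simulations (%).
| $\phi$ | $\sigma_\varepsilon^2$ | $\sigma_e^2$ | Errors | RMSE $\phi$ | RMSE $\sigma_\varepsilon^2$ | RMSE $\sigma_e^2$ | MAE $\phi$ | MAE $\sigma_\varepsilon^2$ | MAE $\sigma_e^2$ | MAPE (%) $\phi$ | MAPE (%) $\sigma_\varepsilon^2$ | MAPE (%) $\sigma_e^2$ | Convergence rate |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.25 | 0.10 | 0.05 | NC | 0.1670 | 0.0487 | 0.0451 | 0.1246 | 0.0440 | 0.0410 | 49.8262 | 43.9675 | 81.9983 | 2000/2090 (96%) |
| 0.25 | 0.10 | 0.05 | C | 0.5099 | 0.2843 | 0.4946 | 0.4266 | 0.2027 | 0.4261 | 170.6503 | 202.6937 | 852.1144 | 2000/3936 (51%) |
| 0.25 | 1.00 | 0.10 | NC | 0.1322 | 0.3212 | 0.2834 | 0.0926 | 0.2411 | 0.2142 | 37.0469 | 24.1096 | 214.2324 | 2000/2073 (96%) |
| 0.25 | 1.00 | 0.10 | C | 0.3757 | 1.0511 | 1.6760 | 0.2954 | 0.9210 | 1.3679 | 118.1795 | 92.1025 | 1367.9340 | 2000/2309 (87%) |
| 0.25 | 5.00 | 2.00 | NC | 0.1665 | 2.1973 | 2.0204 | 0.1219 | 1.9401 | 1.8018 | 48.7737 | 38.8022 | 90.0924 | 2000/2015 (99%) |
| 0.25 | 5.00 | 2.00 | C | 0.3571 | 5.2313 | 7.1197 | 0.2745 | 4.4927 | 6.0116 | 109.7836 | 89.8549 | 300.5794 | 2000/2154 (93%) |
| 0.25 | 0.10 | 1.00 | NC | 0.3186 | 0.5157 | 0.5157 | 0.2655 | 0.3497 | 0.3555 | 106.1993 | 349.7172 | 35.5477 | 2000/2793 (72%) |
| 0.25 | 0.10 | 1.00 | C | 0.5660 | 1.0072 | 1.2870 | 0.4735 | 0.5918 | 1.1856 | 189.3834 | 591.7733 | 118.5649 | 2000/2666 (75%) |
| 0.25 | 2.00 | 5.00 | NC | 0.2596 | 2.7002 | 2.6426 | 0.2080 | 2.1282 | 2.0529 | 83.2062 | 106.4111 | 41.0575 | 2000/2102 (95%) |
| 0.25 | 2.00 | 5.00 | C | 0.4327 | 4.9215 | 5.7057 | 0.3449 | 3.4861 | 5.2227 | 137.9698 | 174.3029 | 104.4536 | 2000/2347 (85%) |
| 0.25 | 0.05 | 0.10 | NC | 0.2375 | 0.0645 | 0.0628 | 0.1900 | 0.0528 | 0.0510 | 75.9833 | 105.5881 | 50.9948 | 2000/2082 (96%) |
| 0.25 | 0.05 | 0.10 | C | 0.5787 | 0.2691 | 0.4908 | 0.5039 | 0.1635 | 0.4430 | 201.5559 | 326.9245 | 443.0177 | 2000/4751 (42%) |
| 0.75 | 0.10 | 0.05 | NC | 0.0477 | 0.0195 | 0.0142 | 0.0373 | 0.0154 | 0.0114 | 4.9771 | 15.3516 | 22.7729 | 2000/2003 (100%) |
| 0.75 | 0.10 | 0.05 | C | 0.1696 | 0.0817 | 0.6618 | 0.1106 | 0.0549 | 0.6455 | 14.7532 | 54.8753 | 1291.0500 | 2000/2501 (80%) |
| 0.75 | 1.00 | 0.10 | NC | 0.0395 | 0.1343 | 0.0782 | 0.0318 | 0.1081 | 0.0647 | 4.2341 | 10.8147 | 64.6665 | 2000/2090 (96%) |
| 0.75 | 1.00 | 0.10 | C | 0.0732 | 0.3663 | 2.5760 | 0.0587 | 0.2834 | 2.5003 | 7.8213 | 28.3405 | 2500.3230 | 2000/2815 (71%) |
| 0.75 | 5.00 | 2.00 | NC | 0.0474 | 0.9427 | 0.6600 | 0.0371 | 0.7485 | 0.5228 | 4.9477 | 14.9700 | 26.1379 | 2000/2005 (100%) |
| 0.75 | 5.00 | 2.00 | C | 0.0744 | 1.9273 | 10.4652 | 0.0578 | 1.4293 | 10.1291 | 7.7126 | 28.5863 | 506.4527 | 2000/2020 (99%) |
| 0.75 | 0.10 | 1.00 | NC | 0.1732 | 0.2068 | 0.2027 | 0.1219 | 0.1011 | 0.1174 | 16.2546 | 101.1061 | 11.7441 | 2000/2001 (100%) |
| 0.75 | 0.10 | 1.00 | C | 0.2439 | 0.5109 | 1.5706 | 0.2001 | 0.2151 | 1.5087 | 26.6834 | 215.1086 | 150.8725 | 2000/2390 (84%) |
| 0.75 | 2.00 | 5.00 | NC | 0.0723 | 0.7514 | 0.7300 | 0.0554 | 0.5623 | 0.5663 | 7.3812 | 28.1170 | 11.3252 | 2000/2000 (100%) |
| 0.75 | 2.00 | 5.00 | C | 0.1162 | 1.6095 | 7.9601 | 0.0903 | 1.0526 | 7.6541 | 12.0342 | 52.6321 | 153.0817 | 2000/2014 (99%) |
| 0.75 | 0.05 | 0.10 | NC | 0.0689 | 0.0175 | 0.0159 | 0.0528 | 0.0131 | 0.0122 | 7.0341 | 26.2519 | 12.2090 | 2000/2002 (100%) |
| 0.75 | 0.05 | 0.10 | C | 0.1914 | 0.1201 | 0.5832 | 0.1534 | 0.0547 | 0.5635 | 20.4470 | 109.4875 | 563.4770 | 2000/2293 (87%) |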
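Table 4. RMSE, MAE, MAPE and convergence rate of Θ with 2000 simulations of sample size n = 50, considering exponential errors (NC = non-contaminated; C = contaminated). The convergence rate is shown as valid estimates/total simulations (%).
| $\phi$ | $\sigma_\varepsilon^2$ | $\sigma_e^2$ | Errors | RMSE $\phi$ | RMSE $\sigma_\varepsilon^2$ | RMSE $\sigma_e^2$ | MAE $\phi$ | MAE $\sigma_\varepsilon^2$ | MAE $\sigma_e^2$ | MAPE (%) $\phi$ | MAPE (%) $\sigma_\varepsilon^2$ | MAPE (%) $\sigma_e^2$ | Convergence rate |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.25 | 0.10 | 0.05 | NC | 0.2403 | 0.0621 | 0.0520 | 0.1799 | 0.0504 | 0.0457 | 71.9672 | 50.3763 | 91.4927 | 2000/2635 (76%) |
| 0.25 | 0.10 | 0.05 | C | 0.3995 | 0.3399 | 0.3274 | 0.3134 | 0.2428 | 0.2110 | 125.3731 | 242.7797 | 422.0013 | 2000/3048 (66%) |
| 0.25 | 1.00 | 0.10 | NC | 0.2591 | 0.5315 | 0.3983 | 0.1934 | 0.4320 | 0.2635 | 77.3567 | 43.2020 | 263.5235 | 2000/2442 (82%) |
| 0.25 | 1.00 | 0.10 | C | 0.3516 | 1.3525 | 1.3138 | 0.2744 | 1.0954 | 0.7735 | 109.7515 | 109.5363 | 773.5130 | 2000/2313 (86%) |
| 0.25 | 5.00 | 2.00 | NC | 0.2810 | 3.0601 | 2.4320 | 0.2140 | 2.4909 | 1.9947 | 85.6126 | 49.8187 | 99.7367 | 2000/2489 (80%) |
| 0.25 | 5.00 | 2.00 | C | 0.3788 | 6.3086 | 5.6419 | 0.2971 | 5.0720 | 4.0209 | 118.8333 | 101.4390 | 201.0444 | 2000/2221 (90%) |
| 0.25 | 0.10 | 1.00 | NC | 0.3352 | 0.7070 | 0.7121 | 0.2656 | 0.4958 | 0.6145 | 106.2327 | 495.7865 | 61.4506 | 2000/3845 (52%) |
| 0.25 | 0.10 | 1.00 | C | 0.4979 | 1.4952 | 1.2090 | 0.4045 | 1.0352 | 1.0202 | 161.7847 | 1035.2270 | 102.0164 | 2000/2500 (80%) |
| 0.25 | 2.00 | 5.00 | NC | 0.3036 | 3.5020 | 3.6159 | 0.2359 | 2.7018 | 3.0956 | 94.3541 | 135.0912 | 61.9127 | 2000/3273 (61%) |
| 0.25 | 2.00 | 5.00 | C | 0.4756 | 7.4254 | 5.7965 | 0.3764 | 5.2303 | 4.8901 | 150.5748 | 261.5164 | 97.8024 | 2000/2281 (88%) |
| 0.25 | 0.05 | 0.10 | NC | 0.2712 | 0.0818 | 0.0775 | 0.2089 | 0.0644 | 0.0676 | 83.5522 | 128.8785 | 67.6088 | 2000/3045 (66%) |
| 0.25 | 0.05 | 0.10 | C | 0.4505 | 0.3498 | 0.3421 | 0.3573 | 0.2485 | 0.2372 | 142.9146 | 496.9232 | 237.1900 | 2000/3014 (66%) |
| 0.75 | 0.10 | 0.05 | NC | 0.1611 | 0.0564 | 0.0398 | 0.1246 | 0.0450 | 0.0322 | 16.6099 | 45.0059 | 64.3029 | 2000/2273 (88%) |
| 0.75 | 0.10 | 0.05 | C | 0.4359 | 0.3223 | 0.5405 | 0.3033 | 0.1875 | 0.3750 | 40.4453 | 187.5322 | 750.0313 | 2000/3694 (54%) |
| 0.75 | 1.00 | 0.10 | NC | 0.1175 | 0.4488 | 0.1929 | 0.0925 | 0.3574 | 0.1397 | 12.3275 | 35.7367 | 139.7342 | 2000/2364 (85%) |
| 0.75 | 1.00 | 0.10 | C | 0.3433 | 1.3662 | 2.5133 | 0.2284 | 0.9272 | 1.8790 | 30.4537 | 92.7218 | 1879.0050 | 2000/2470 (81%) |
| 0.75 | 5.00 | 2.00 | NC | 0.1448 | 2.7020 | 1.7189 | 0.1120 | 2.1216 | 1.3609 | 14.9294 | 42.4323 | 68.0429 | 2000/2181 (92%) |
| 0.75 | 5.00 | 2.00 | C | 0.3000 | 6.8179 | 11.0149 | 0.1977 | 4.5487 | 8.6278 | 26.3555 | 90.9748 | 431.3905 | 2000/2176 (92%) |
| 0.75 | 0.10 | 1.00 | NC | 0.3093 | 0.4945 | 0.5622 | 0.2368 | 0.3007 | 0.4524 | 31.5769 | 300.7228 | 45.2368 | 2000/2672 (75%) |
| 0.75 | 0.10 | 1.00 | C | 0.5765 | 1.3806 | 1.5393 | 0.4103 | 0.7817 | 1.2490 | 54.7124 | 781.7267 | 124.9006 | 2000/2792 (72%) |
| 0.75 | 2.00 | 5.00 | NC | 0.2394 | 2.8641 | 2.9144 | 0.1801 | 1.9810 | 2.3408 | 24.0172 | 99.0487 | 46.8162 | 2000/2221 (90%) |
| 0.75 | 2.00 | 5.00 | C | 0.4688 | 6.9340 | 7.5328 | 0.3241 | 4.3006 | 6.0393 | 43.2112 | 215.0307 | 120.7854 | 2000/2277 (88%) |
| 0.75 | 0.05 | 0.10 | NC | 0.2345 | 0.0614 | 0.0594 | 0.1767 | 0.0437 | 0.0484 | 23.5599 | 87.3926 | 48.3987 | 2000/2246 (89%) |
| 0.75 | 0.05 | 0.10 | C | 0.5399 | 0.3405 | 0.4259 | 0.4039 | 0.2083 | 0.3024 | 53.8548 | 416.5115 | 302.3952 | 2000/3314 (60%) |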
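Table 5. RMSE, MAE, MAPE and convergence rate of Θ with 2000 simulations of sample size n = 200, considering exponential errors (NC = non-contaminated; C = contaminated). The convergence rate is shown as valid estimates/total simulations (%).
| $\phi$ | $\sigma_\varepsilon^2$ | $\sigma_e^2$ | Errors | RMSE $\phi$ | RMSE $\sigma_\varepsilon^2$ | RMSE $\sigma_e^2$ | MAE $\phi$ | MAE $\sigma_\varepsilon^2$ | MAE $\sigma_e^2$ | MAPE (%) $\phi$ | MAPE (%) $\sigma_\varepsilon^2$ | MAPE (%) $\sigma_e^2$ | Convergence rate |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.25 | 0.10 | 0.05 | NC | 0.2195 | 0.0567 | 0.0502 | 0.1605 | 0.0501 | 0.0458 | 64.1779 | 50.0747 | 91.6168 | 2000/2185 (92%) |
| 0.25 | 0.10 | 0.05 | C | 0.4489 | 0.2919 | 0.4062 | 0.3589 | 0.2159 | 0.3230 | 143.5695 | 215.8788 | 645.9832 | 2000/3303 (61%) |
| 0.25 | 1.00 | 0.10 | NC | 0.1942 | 0.4247 | 0.3510 | 0.1343 | 0.3304 | 0.2550 | 53.7222 | 33.0384 | 255.0102 | 2000/2194 (91%) |
| 0.25 | 1.00 | 0.10 | C | 0.3275 | 1.2229 | 1.3829 | 0.2520 | 1.0916 | 0.8967 | 100.8199 | 109.1563 | 896.7205 | 2000/2299 (87%) |
| 0.25 | 5.00 | 2.00 | NC | 0.2352 | 2.6233 | 2.2958 | 0.1709 | 2.2450 | 2.0030 | 68.3418 | 44.8991 | 100.1478 | 2000/2120 (94%) |
| 0.25 | 5.00 | 2.00 | C | 0.3157 | 6.0485 | 5.8589 | 0.2491 | 5.2427 | 4.2303 | 99.6284 | 104.8537 | 211.5132 | 2000/2170 (92%) |
| 0.25 | 0.10 | 1.00 | NC | 0.3263 | 0.6068 | 0.6071 | 0.2698 | 0.4264 | 0.4734 | 107.9090 | 426.4313 | 47.3421 | 2000/3235 (62%) |
| 0.25 | 0.10 | 1.00 | C | 0.5063 | 1.1797 | 1.2615 | 0.4136 | 0.7430 | 1.1099 | 165.4252 | 743.0232 | 110.9928 | 2000/2793 (72%) |
| 0.25 | 2.00 | 5.00 | NC | 0.2871 | 3.0873 | 3.0756 | 0.2286 | 2.4294 | 2.4630 | 91.4476 | 121.4710 | 49.2608 | 2000/2409 (83%) |
| 0.25 | 2.00 | 5.00 | C | 0.4800 | 5.3433 | 5.8893 | 0.3878 | 3.7179 | 5.2539 | 155.1106 | 185.8963 | 105.0775 | 2000/2349 (85%) |
| 0.25 | 0.05 | 0.10 | NC | 0.2547 | 0.0706 | 0.0689 | 0.2040 | 0.0582 | 0.0576 | 81.5998 | 116.3832 | 57.6243 | 2000/2381 (84%) |
| 0.25 | 0.05 | 0.10 | C | 0.4861 | 0.2824 | 0.3672 | 0.3959 | 0.1923 | 0.3042 | 158.3634 | 384.6917 | 304.2286 | 2000/3600 (56%) |
| 0.75 | 0.10 | 0.05 | NC | 0.0796 | 0.0350 | 0.0233 | 0.0617 | 0.0274 | 0.0186 | 8.2315 | 27.4339 | 37.1357 | 2000/2037 (98%) |
| 0.75 | 0.10 | 0.05 | C | 0.3272 | 0.2014 | 0.6101 | 0.1953 | 0.1114 | 0.5404 | 26.0350 | 111.3510 | 1080.8030 | 2000/5646 (35%) |
| 0.75 | 1.00 | 0.10 | NC | 0.0597 | 0.2508 | 0.1085 | 0.0474 | 0.1994 | 0.0873 | 6.3241 | 19.9427 | 87.2755 | 2000/2177 (92%) |
| 0.75 | 1.00 | 0.10 | C | 0.2891 | 0.7251 | 2.6068 | 0.1467 | 0.5148 | 2.3368 | 19.5576 | 51.4762 | 2336.7630 | 2000/3530 (57%) |
| 0.75 | 5.00 | 2.00 | NC | 0.0746 | 1.6329 | 1.0466 | 0.0594 | 1.3146 | 0.8439 | 7.9157 | 26.2910 | 42.1946 | 2000/2025 (99%) |
| 0.75 | 5.00 | 2.00 | C | 0.1272 | 3.6999 | 10.6605 | 0.0940 | 2.5298 | 9.7850 | 12.5288 | 50.5952 | 489.2506 | 2000/2058 (97%) |
| 0.75 | 0.10 | 1.00 | NC | 0.2397 | 0.3397 | 0.3613 | 0.1728 | 0.1807 | 0.2566 | 23.0433 | 180.7138 | 25.6634 | 2000/2212 (90%) |
| 0.75 | 0.10 | 1.00 | C | 0.4477 | 0.8176 | 1.5542 | 0.3042 | 0.3849 | 1.4218 | 40.5591 | 384.9222 | 142.1828 | 2000/2667 (75%) |
| 0.75 | 2.00 | 5.00 | NC | 0.1296 | 1.5155 | 1.5429 | 0.0951 | 1.0538 | 1.1744 | 12.6814 | 52.6910 | 23.4870 | 2000/2015 (99%) |
| 0.75 | 2.00 | 5.00 | C | 0.3411 | 3.2286 | 8.1396 | 0.1906 | 1.8699 | 7.4604 | 25.4199 | 93.4933 | 149.2072 | 2000/2354 (85%) |
| 0.75 | 0.05 | 0.10 | NC | 0.1199 | 0.0326 | 0.0327 | 0.0888 | 0.0235 | 0.0253 | 11.8369 | 46.9942 | 25.2911 | 2000/2029 (99%) |
| 0.75 | 0.05 | 0.10 | C | 0.4057 | 0.1971 | 0.4953 | 0.2636 | 0.0980 | 0.4410 | 35.1415 | 196.0500 | 440.9552 | 2000/4253 (47%) |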

Table 6. RMSE, MAE, MAPE and convergence rate of Θ with 2000 simulations of sample sizes n = 500, considering exponential errors (NC = non-contaminated; C = contaminated).

| ϕ | σ_ε² | σ_e² | NC/C | RMSE ϕ | RMSE σ_ε² | RMSE σ_e² | MAE ϕ | MAE σ_ε² | MAE σ_e² | MAPE ϕ (%) | MAPE σ_ε² (%) | MAPE σ_e² (%) | Convergence rate |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.25 | 0.10 | 0.05 | NC | 0.1774 | 0.0502 | 0.0459 | 0.1295 | 0.0449 | 0.0415 | 51.8133 | 44.9298 | 83.0451 | 2000/2101 (95%) |
| 0.25 | 0.10 | 0.05 | C | 0.5105 | 0.2796 | 0.4923 | 0.4312 | 0.2000 | 0.4234 | 172.4715 | 200.0235 | 846.7976 | 2000/3882 (52%) |
| 0.25 | 1.00 | 0.10 | NC | 0.1360 | 0.3393 | 0.2890 | 0.0925 | 0.2583 | 0.2157 | 37.0141 | 25.8315 | 215.7496 | 2000/2068 (97%) |
| 0.25 | 1.00 | 0.10 | C | 0.3751 | 1.0426 | 1.6910 | 0.2938 | 0.9117 | 1.3735 | 117.5078 | 91.1686 | 1373.5180 | 2000/2331 (86%) |
| 0.25 | 5.00 | 2.00 | NC | 0.1704 | 2.2503 | 2.0470 | 0.1234 | 1.9625 | 1.8020 | 49.3534 | 39.2493 | 90.0986 | 2000/2017 (99%) |
| 0.25 | 5.00 | 2.00 | C | 0.3479 | 5.3039 | 7.0687 | 0.2679 | 4.5285 | 5.9227 | 107.1780 | 90.5707 | 296.1330 | 2000/2129 (94%) |
| 0.25 | 0.10 | 1.00 | NC | 0.3131 | 0.5300 | 0.5411 | 0.2597 | 0.3703 | 0.3968 | 103.8980 | 370.3198 | 39.6789 | 2000/2816 (71%) |
| 0.25 | 0.10 | 1.00 | C | 0.5597 | 1.0212 | 1.2769 | 0.4684 | 0.6074 | 1.1618 | 187.3698 | 607.3962 | 116.1793 | 2000/2652 (75%) |
| 0.25 | 2.00 | 5.00 | NC | 0.2601 | 2.7288 | 2.7233 | 0.2073 | 2.1620 | 2.1463 | 82.9108 | 108.0979 | 42.9251 | 2000/2126 (94%) |
| 0.25 | 2.00 | 5.00 | C | 0.4306 | 4.9082 | 5.7249 | 0.3452 | 3.5071 | 5.1919 | 138.0972 | 175.3540 | 103.8379 | 2000/2317 (86%) |
| 0.25 | 0.05 | 0.10 | NC | 0.2376 | 0.0645 | 0.0627 | 0.1901 | 0.0524 | 0.0510 | 76.0432 | 104.7235 | 50.9923 | 2000/2074 (96%) |
| 0.25 | 0.05 | 0.10 | C | 0.5858 | 0.2454 | 0.4984 | 0.5118 | 0.1446 | 0.4560 | 204.7275 | 289.2989 | 456.0194 | 2000/4715 (42%) |
| 0.75 | 0.10 | 0.05 | NC | 0.0475 | 0.0231 | 0.0155 | 0.0372 | 0.0181 | 0.0123 | 4.9720 | 18.1002 | 24.5343 | 2000/2004 (100%) |
| 0.75 | 0.10 | 0.05 | C | 0.1591 | 0.0842 | 0.6609 | 0.1062 | 0.0545 | 0.6454 | 14.1550 | 54.5197 | 1290.8090 | 2000/2474 (81%) |
| 0.75 | 1.00 | 0.10 | NC | 0.0384 | 0.1704 | 0.0773 | 0.0307 | 0.1373 | 0.0643 | 4.0941 | 13.7260 | 64.3178 | 2000/2080 (96%) |
| 0.75 | 1.00 | 0.10 | C | 0.0721 | 0.3674 | 2.5627 | 0.0581 | 0.2882 | 2.4862 | 7.7527 | 28.8233 | 2486.2120 | 2000/2794 (72%) |
| 0.75 | 5.00 | 2.00 | NC | 0.0466 | 1.0946 | 0.6805 | 0.0370 | 0.8577 | 0.5377 | 4.9267 | 17.1536 | 26.8825 | 2000/2003 (100%) |
| 0.75 | 5.00 | 2.00 | C | 0.0724 | 1.9116 | 10.6844 | 0.0572 | 1.4456 | 10.3496 | 7.6297 | 28.9121 | 517.4798 | 2000/2039 (98%) |
| 0.75 | 0.10 | 1.00 | NC | 0.1725 | 0.1945 | 0.2149 | 0.1211 | 0.0988 | 0.1473 | 16.1486 | 98.8105 | 14.7318 | 2000/2010 (100%) |
| 0.75 | 0.10 | 1.00 | C | 0.2492 | 0.5142 | 1.5627 | 0.1972 | 0.2080 | 1.4974 | 26.2946 | 208.0308 | 149.7398 | 2000/2384 (84%) |
| 0.75 | 2.00 | 5.00 | NC | 0.0760 | 0.8252 | 0.9283 | 0.0576 | 0.6068 | 0.7308 | 7.6842 | 30.3421 | 14.6165 | 2000/2001 (100%) |
| 0.75 | 2.00 | 5.00 | C | 0.1204 | 1.7123 | 7.9869 | 0.0934 | 1.0745 | 7.6659 | 12.4592 | 53.7261 | 153.3177 | 2000/2013 (99%) |
| 0.75 | 0.05 | 0.10 | NC | 0.0684 | 0.0185 | 0.0190 | 0.0524 | 0.0138 | 0.0149 | 6.9855 | 27.6962 | 14.9040 | 2000/2001 (100%) |
| 0.75 | 0.05 | 0.10 | C | 0.1908 | 0.1162 | 0.5854 | 0.1535 | 0.0531 | 0.5665 | 20.4642 | 106.2381 | 566.4536 | 2000/2334 (86%) |

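The summary measures reported in the tables can be illustrated with a short sketch. It assumes the standard definitions of RMSE, MAE and MAPE computed over the valid estimates of a single parameter, with the percentage error taken relative to the true value, and it takes the convergence rate as the number of valid estimates divided by the total number of simulated series. The tabulated values are consistent with this reading (for example, MAPE is approximately 100 × MAE divided by the true value), but the function names and the sample estimates below are hypothetical and are not taken from the paper.

```python
import math

def estimation_metrics(estimates, true_value):
    """Return (RMSE, MAE, MAPE in %) of estimates against a fixed true parameter.

    Assumption: standard definitions, with MAPE relative to the true value.
    """
    errors = [est - true_value for est in estimates]
    n = len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    # With a constant true value, MAPE reduces to 100 * MAE / |true value|.
    mape = 100.0 * mae / abs(true_value)
    return rmse, mae, mape

def convergence_rate(n_valid, n_total):
    """Percentage of simulated series that produced a valid estimate."""
    return 100.0 * n_valid / n_total

# Hypothetical estimates of phi for a true value of 0.75 (illustrative only).
rmse, mae, mape = estimation_metrics([0.70, 0.80, 0.78, 0.69], 0.75)
print(f"RMSE={rmse:.4f}  MAE={mae:.4f}  MAPE={mape:.4f}%")

# 2000 valid estimates out of 2185 simulated series, as in the first row of Table 5.
print(f"convergence rate = {convergence_rate(2000, 2185):.1f}%")
```

Under these assumptions, 2000 valid estimates out of 2185 simulated series give a rate of about 91.5%, which matches the 92% reported for that scenario after rounding.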
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

MDPI and ACS Style

Pereira, F.C.; Gonçalves, A.M.; Costa, M. Outliers Impact on Parameter Estimation of Gaussian and Non-Gaussian State Space Models: A Simulation Study. Eng. Proc. 2022, 18, 31. https://0-doi-org.brum.beds.ac.uk/10.3390/engproc2022018031
