Common metrics for Time Series Analysis

joydeep bhattacharjee
Dec 29, 2019

Whenever our data has a temporal component attached to it, that is, when there is a chance that the current value under scrutiny depends on past values of the same variable or on its position in the whole sequence, we should do time series analysis. A typical example is demand forecasting, where we try to understand the demand for an item at some point in the future. It is common sense that ice cream sells more in the summer. Similarly, if a particular store sold 1000 apples last month, it is unlikely to sell 2000 in the current month. In such scenarios we use techniques that come under the purview of time series analysis, in which we try to forecast some quantity of interest, such as the future sales of a product.

Let us say we are trying to understand future sales of apples. Before you begin forecasting, you should understand how to measure the quality of the predictions. There are many metrics that can be used for this.

R-squared

The fundamental definition of the coefficient of determination R² is

R² = 1 − SS_res / SS_tot

where SS_res is the sum of squared residuals of the predictions and SS_tot is the total sum of squares, i.e. the sum of squared deviations of the observed values from their mean. In other words, R² measures how much of the variance in the dependent variable is explained by the model. A high value means the model accounts for most of the variance in the true values, while a low value means the predictions explain little of that variance.

The important point to note about R-squared is that it does not show whether the model will make satisfactory future predictions. It only shows whether the model is a good fit to the observed values, and how good a “fit” it is. A high R² means that the correlation between observed and predicted values is high. With time series, however, we generally care more about how well the model predicts future values than about how well it fits past values. Another disadvantage is that R² tends to increase as you add more features, so it is very easy for researchers to fool themselves.

If the model does not include a constant term, then R² values are not very meaningful. In autoregressive models, the constant term (or drift, in time series terminology) is often eliminated.

Check out a possible Python implementation below.

>>> y_true = [3, -0.5, 2, 7]
>>> y_pred = [2.5, 0.0, 2, 8]
>>> mean_observed_data = sum(y_true) / len(y_true)
>>> ss_tot = sum([(y - mean_observed_data)**2 for y in y_true])
>>> ss_res = sum([(y - y_hat)**2 for y, y_hat in zip(y_true, y_pred)])
>>> div = ss_res/ss_tot
>>> r2 = 1 - div
>>> print(r2)
0.9486081370449679

In a real scenario you should use a function from a well-tested library such as scikit-learn

>>> from sklearn.metrics import r2_score
>>> r2_score(y_true, y_pred)
0.9486081370449679
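
Note also that R² as defined above is not bounded below by zero: a forecast that does worse than simply predicting the sample mean gives a negative score. Below is a small sketch continuing the session above, using a deliberately bad constant forecast (the value 10 is made up for illustration).

>>> y_bad = [10, 10, 10, 10]  # constant forecast far from the observed values
>>> print(round(r2_score(y_true, y_bad), 3))  # worse than predicting the mean, so R² < 0
-6.957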

Mean Absolute Error

Mean absolute error (MAE) is the average of the absolute values of the deviations between forecasts and observations:

MAE = (1/n) × Σ |y_i − ŷ_i|

This type of error measurement is useful because the prediction errors are measured in the same unit as the original series.

Thus the MAE tells you how big an error you can expect from the forecast on average. It is also fairly robust to outliers, much more so than squared-error measures, so looking at the MAE is useful if the training data is corrupted with outliers, i.e. with huge positive or negative values that you believe are unlikely to recur in the future. The measure is also easy to interpret. Hence, if the data is homogeneous, use this error measure to compare different models.

Keep in mind that the forecast that minimises the MAE is not necessarily unique, so optimising against it can sometimes appear to behave erratically.

Check out the Python implementation and the usage with the scikit-learn library below

>>> y_true = [3, -0.5, 2, 7]
>>> y_pred = [2.5, 0.0, 2, 8]
>>> import math
>>> abs_err_sum = sum([math.fabs(y - y_hat) for y, y_hat in zip(y_true, y_pred)])
>>> mae = abs_err_sum/len(y_true)
>>> print(mae)
0.5
>>> from sklearn.metrics import mean_absolute_error
>>> print(mean_absolute_error(y_true, y_pred))
0.5

Median Absolute Error

Median absolute error (MedAE) is similar to the MAE. To calculate MedAE, take the absolute differences and then find their median value.

Similar to the MAE, the data needs to be fairly homogeneous for this measure to be useful. An interesting advantage is that the score is driven only by the middle of the error distribution, so a few missing or badly corrupted values have little effect on it. Using the median is an extreme way of trimming extreme values; hence the median absolute error reduces the bias in favour of low forecasts.

Check out the Python implementation and the usage with the scikit-learn library below

>>> import statistics
>>> import math
>>> y_true = [3, -0.5, 2, 7]
>>> y_pred = [2.5, 0.0, 2, 8]
>>> abs_errors = [math.fabs(y - y_hat) for y, y_hat in zip(y_true, y_pred)]
>>> med_ae = statistics.median(abs_errors)
>>> print(med_ae)
0.5
>>> from sklearn.metrics import median_absolute_error
>>> print(median_absolute_error(y_true, y_pred))
0.5
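
To see how the median trims extreme errors, compare the MAE and the MedAE when a single forecast is badly wrong. This is a small illustrative sketch; the corrupted forecast value of 100 is made up for the example.

>>> from sklearn.metrics import mean_absolute_error, median_absolute_error
>>> y_true = [3, -0.5, 2, 7]
>>> y_pred_outlier = [2.5, 0.0, 2, 100]  # the last forecast is a wild outlier
>>> print(mean_absolute_error(y_true, y_pred_outlier))  # dragged up by the single large error
23.5
>>> print(median_absolute_error(y_true, y_pred_outlier))  # barely affected
0.5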

Mean Squared Error

The mean squared error (MSE) is the average of the squares of the forecast errors:

MSE = (1/n) × Σ (y_i − ŷ_i)²

Because the errors are squared, larger errors carry more weight in the score.

This is one of the most common measures used to evaluate and select models, but once a model is found, other error measures such as the MAE are generally reported. For example, finding that the forecast is off by about ±5 units on average is a useful result in and of itself, but an MSE of 65.34 is harder to interpret. A way of dealing with this is to take the square root of the MSE. This is called the Root Mean Squared Error (RMSE) and it has the advantage of being in the same unit as the forecast variable.

Also, since both the MSE and the RMSE take the square of the errors, outliers will have a huge effect on the resulting error.

Check out the Python implementation and the usage with scikit-learn below

>>> from sklearn.metrics import mean_squared_error
>>> print(mean_squared_error(y_true, y_pred))
0.375
>>> err_sq = sum([(y - y_hat)**2 for y, y_hat in zip(y_true, y_pred)])
>>> mse = err_sq/len(y_true)
>>> print(mse)
0.375
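
Since the RMSE is just the square root of the MSE, it can be computed directly from the value above. A minimal sketch, continuing the session above:

>>> import math
>>> rmse = math.sqrt(mean_squared_error(y_true, y_pred))  # RMSE = sqrt(MSE)
>>> print(round(rmse, 4))
0.6124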

Mean Absolute Percentage Error

In the above cases, the ballpark of where the data lies cannot be understood from the error measure itself. This is where the mean absolute percentage error (MAPE) comes in: being a percentage error, it gives a good idea of the relative error. The formula is given by

MAPE = (100% / n) × Σ |(y_i − ŷ_i) / y_i|

The problem with this approach is that the actual values in the denominator can be small or zero, so there is a chance of division by zero or of the value blowing up.

MAPE also puts a heavier penalty on negative errors, where y < ŷ. As a consequence, when MAPE is used to compare the accuracy of prediction methods, it will tend to select a method whose forecasts are too low.
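
To see the asymmetry, compare an over-forecast and an under-forecast of the same absolute size (the numbers are made up for illustration). If the actual value is 100 and the forecast is 150, the absolute percentage error is 50/100 = 50%; if the actual value is 150 and the forecast is 100, it is 50/150 ≈ 33%. The over-forecast is penalised more heavily, so methods that forecast low look better under MAPE.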

When going through the literature or doing your own analysis, you will encounter all of these error measures as well as many more. Be careful of the pitfalls of each error measure and check whether it will behave well in your context. Various models use the mean squared error, mean absolute error or mean absolute percentage error to identify the optimum parameters. Keep in mind that these are all measures of best fit on seen data, so all analysis can go for a toss once black swan events occur.

There is no implementation in scikit-learn, so below is a vectorised implementation using numpy. Source: https://stackoverflow.com/a/42251083/5417164

import numpy as np

def mape_vectorized_v2(a, b):
    # a = actual values, b = forecasts (numpy arrays); skip zeros in a to avoid division by zero
    mask = a != 0
    return (np.fabs(a - b) / a)[mask].mean()  # a fraction; multiply by 100 for a percentage
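
Here is a quick usage sketch with a small made-up series of monthly apple sales (the numbers are purely illustrative); note that the inputs must be numpy arrays for the vectorised operations to work.

>>> import numpy as np
>>> actual = np.array([100, 250, 125, 400])    # observed sales
>>> forecast = np.array([110, 240, 100, 380])  # forecast sales
>>> print(round(mape_vectorized_v2(actual, forecast), 4))
0.0975

Multiplying by 100 gives a MAPE of roughly 9.75%.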

Thanks for reading this post. If you found this useful, please click on the claps button and share the post with your friends and colleagues.
