Complaint Time Series with Bayesian Models

Summary

WHAT

Complaint Forecast
1,473,407 financial product complaints registered from December 1, 2011 to December 31, 2019

HOW

Bayesian Multiplicative model

Additive and Multiplicative models were tested

RESULTS

Predictions with 23% error (MAPE) on the testing data
Trend and weekly cycle were well represented

TECHNOLOGIES

VS Code

Python

Pandas

NumPy

Prophet

Seaborn

Matplotlib

Introduction

Communication channels between companies and customers are essential for gathering feedback and improving products and services. Complaints are among the most common forms of feedback because they expose weaknesses in processes, logistics, and raw materials, among other areas. Modeling the trend in the number of complaints can provide a baseline for evaluating future improvements.

In this study, my objective was to fit additive and multiplicative time series models to the historical complaints against financial institutions and use them to predict future complaints.

Methods

The database of financial product and service complaints maintained by the Consumer Financial Protection Bureau (CFPB), a United States government agency, was used. It contains 1,473,407 complaints registered from December 1, 2011 to December 31, 2019.

Among all products with complaints, the top three account for 61% of all complaints. To work with a clean target, this study focused on the main product category, “Credit reporting, credit repair services, or other personal consumer reports”.

Within this product category, Equifax Inc. is the company with the most complaints, as the next plot shows, so it was the company selected for the study.
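The selection described above can be sketched with Pandas as follows. This is a minimal illustration, not the code used in the study: the column names ("Date received", "Product", "Company"), the exact company spelling, and the helper name daily_complaints are assumptions about the CFPB export and should be checked against the actual file.

```python
import pandas as pd

# Assumed values; verify the exact strings against the downloaded CSV.
PRODUCT = "Credit reporting, credit repair services, or other personal consumer reports"
COMPANY = "EQUIFAX, INC."


def daily_complaints(df: pd.DataFrame, product: str = PRODUCT, company: str = COMPANY) -> pd.DataFrame:
    """Count complaints per day for one product and one company.

    Returns a frame with columns 'ds' (date) and 'y' (count), the layout
    Prophet expects. Days without complaints are simply absent.
    """
    mask = (df["Product"] == product) & (df["Company"] == company)
    dates = pd.to_datetime(df.loc[mask, "Date received"]).dt.floor("D")
    counts = dates.value_counts().sort_index()
    return counts.rename_axis("ds").reset_index(name="y")
```

Naming the output columns ds and y at this stage avoids a rename step before the data is passed to Prophet.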

As the next figure shows, the daily series contains outliers that need to be removed. Therefore, all days with more than 250 complaints were excluded from this analysis.

The cleaned dataset is shown below.
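The 250-complaints-per-day cutoff amounts to a single filter on the daily counts. The sketch below assumes a frame with a count column named y (an illustrative name) and a hypothetical helper drop_outlier_days.

```python
import pandas as pd

MAX_DAILY_COMPLAINTS = 250  # threshold stated in the text


def drop_outlier_days(daily: pd.DataFrame, max_daily: int = MAX_DAILY_COMPLAINTS) -> pd.DataFrame:
    """Keep only the days whose complaint count does not exceed the threshold."""
    return daily[daily["y"] <= max_daily].reset_index(drop=True)
```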

To perform the analysis, this dataset was divided into training data (80%) and testing data (20%).
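The text does not say how the 80/20 split was made. For time series a chronological split is the usual choice, so the sketch below assumes the earliest 80% of days form the training data and the latest 20% the testing data. The function name and the ds column are illustrative.

```python
import pandas as pd


def chronological_split(daily: pd.DataFrame, train_frac: float = 0.8):
    """Split a daily series by time: earliest rows for training, latest for testing."""
    ordered = daily.sort_values("ds").reset_index(drop=True)
    cut = int(len(ordered) * train_frac)
    return ordered.iloc[:cut], ordered.iloc[cut:]
```

Splitting by time rather than at random keeps future observations out of the training data, which would otherwise make the test error look better than it is.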

The Python libraries Pandas and NumPy were used for data manipulation, and Matplotlib and Seaborn for visualization. The time series analysis was done with the Python library Prophet, comparing two Bayesian models: an additive model and a multiplicative model. A cross-validation analysis was run for each model. Finally, each model was evaluated on the testing data in order to choose the best one.
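As a reminder of what distinguishes the two models, the decomposition can be written roughly as below, where g(t) is the trend, s(t) the seasonal component, and the last term the error. The notation is an assumption made for illustration, and holiday effects are left out because the study does not mention them.

```latex
\begin{align}
\text{Additive:} \quad & y(t) = g(t) + s(t) + \varepsilon_t \\
\text{Multiplicative:} \quad & y(t) = g(t)\,\bigl(1 + s(t)\bigr) + \varepsilon_t
\end{align}
```

In the multiplicative form the size of the seasonal swing scales with the trend, which suits a series whose weekly variation grows as the overall number of complaints grows.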

Results

To evaluate the performance of each model on the training dataset, a cross-validation was run for each one, with a horizon of 100 days, a period of 80 days, and an initial training window of 400 days.

The main metric used to compare models was MAPE (Mean Absolute Percentage Error), the mean of the absolute errors expressed as a percentage of the real values. It is one of the most widely used metrics in time series analysis because it is easy to interpret. For instance, a value of 50% means that the predicted number of complaints differs from the real number by 50% on average.
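To make the definition concrete, MAPE can be computed with NumPy as in this minimal sketch. The function name is illustrative, and it assumes no real value is zero, since the metric is undefined there.

```python
import numpy as np


def mape(actual, predicted) -> float:
    """Mean Absolute Percentage Error, returned in percent."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs((actual - predicted) / actual)) * 100)


# Two days with 100 and 200 real complaints, predicted as 150 and 100:
# each prediction is off by 50% of the real value, so the MAPE is 50%.
print(mape([100, 200], [150, 100]))  # 50.0
```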

As the next figure shows, the MAPE differs between the two models. The Multiplicative model performs better, with a MAPE of 27%, meaning that its predictions deviate from the real values by 27% on average.

The next step was to have each model predict the values for the dates in the testing data, calculate the MAPE, and compare the performance of the models. The Multiplicative model was again the better one, predicting the complaints with an average error of 23%. This result is close to the cross-validation value, which suggests no significant overfitting.

The plot of the real and predicted values shows that both models represent the cyclic behavior of the complaints well for almost all weeks in the testing data, but the Multiplicative model is more flexible in fitting the variability of the real data.

All the above results support the multiplicative model as the best model for this dataset.

The Prophet library can detect changes in trend along the data series, as the next figure shows for the multiplicative model. There are seven points where the complaints break from the trend, possibly because of changes in procedures, changes in demand, or the opening of new branches.

The components of the time series confirm the weekly cycle and, more importantly, show a trend of increasing complaints over time and an irregular but clear yearly cycle of ups and downs, with minimums at the beginning of each year.

Conclusion

Both the additive and the multiplicative model fit the complaint data well, above all capturing the trend over the years and predicting the weekly cycle; the Multiplicative model performed best.

The predictions made in this study can serve as a null model of complaints, in which the predicted values are interpreted as the outcome of past conditions continuing unchanged into the future. After corrective actions are implemented, Equifax complaints should fall below the predictions to demonstrate that the actions were effective. Thus, this study not only highlights the trend but also sets expectations for corrective actions.