At DemandForecasting.com, it is our view that we have only just reached an inflection point that will see deep learning methods dominate other methods in forecasting applications in the near future. This is part two of a two-part article. In part one I introduced the history of forecasting and explained how, until recently, simple forecasting methods tended to perform as well as complicated ones. I also outlined how two forecasting methods that make use of deep learning have surpassed classical statistical methods in accuracy.
In this second part I give eight reasons why we believe that deep-learning-based forecasting methods will become the de facto standard for serious forecasting problems:
Many classical time series models were developed specifically for forecasting. By contrast, the key deep learning models were developed for general purposes and have been adapted to particular use cases because of their power. RNNs (because of their ability to work with sequential data) and gradient boosting methods have recently proven useful for forecasting, but they still have inherent weaknesses. For example, even RNNs tend to overweight early data, failing to account for the fact that recent observations are usually more informative than historical ones, a peculiarity of time series data. As RNNs mature, or as deep learning models are developed specifically with time series forecasting in mind, their performance in this use case will improve.
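To make the recency point concrete, a classical method such as simple exponential smoothing builds this assumption in directly: its forecast is a weighted average of past observations whose weights decay geometrically with age. The Python sketch below is illustrative only; the smoothing parameter `alpha` and the helper names `ses_weights` and `ses_forecast` are our own choices, and nothing here describes how any particular RNN weights its inputs.

```python
import numpy as np

def ses_weights(n: int, alpha: float) -> np.ndarray:
    """Weights simple exponential smoothing places on the last n observations.

    Index 0 is the most recent observation: weight_k = alpha * (1 - alpha) ** k,
    so older observations receive geometrically smaller weights.
    """
    k = np.arange(n)
    return alpha * (1.0 - alpha) ** k

def ses_forecast(y: np.ndarray, alpha: float) -> float:
    """One-step-ahead simple exponential smoothing forecast, initialised at y[0]."""
    level = y[0]
    for value in y[1:]:
        # Blend the newest observation with the previous smoothed level.
        level = alpha * value + (1.0 - alpha) * level
    return float(level)

weights = ses_weights(5, alpha=0.5)
next_value = ses_forecast(np.array([10.0, 12.0, 11.0, 13.0]), alpha=0.5)
```

With `alpha=0.5`, each step further back in time receives half the weight of the step after it, and `next_value` is 12.0.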
We’ve only just seen models that make use of deep learning (in particular, DNNs) start to outperform classical models. As these findings become established empirical facts, more commercial and academic interest will turn to improving deep learning methods for forecasting. We’ve recently seen commercial ML-based forecasting models start to be productionised, from Smyl’s model (used by Uber) to Facebook’s Prophet and Amazon’s DeepAR (and, more recently, Amazon’s release of the GluonTS package).
DNNs simply have more capacity to account for peripheral features and cross-sectional data, that is, to leverage information from other time series when making a prediction, which adds to the predictive ability of the model. In the M4 competition, participants were given a very generic dataset from which it was hard to extract features (for example, timestamps, which are typically available in real-world datasets, were not provided). Real-world datasets often contain many surrounding features that can be used across time series to improve model performance. As a simple example, attaching a timestamp to each data point allows the effects of dates (including whether a date fell on a holiday), weekdays, times of day and so on to be learned across time series. It is fair to suggest that, had such data been available, methods that made use of ML might have performed even better in the competition, and pure ML methods might have performed better than they did (every method that used a purely ML approach, without combining or hybridising it with a statistical one, underperformed the statistical benchmark adopted).
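As a rough sketch of what a timestamp makes possible, the Python snippet below derives a few calendar features from a date. The `HOLIDAYS` set is a hypothetical placeholder (a real system would load a region-specific calendar), and `calendar_features` is an illustrative helper rather than part of any library. Because these features depend only on the date, the same columns can be attached to every series in a dataset and learned from jointly by a single model.

```python
from datetime import date, timedelta

# Hypothetical holiday calendar used only for this illustration.
HOLIDAYS = {date(2019, 12, 25), date(2020, 1, 1)}

def calendar_features(d: date) -> dict:
    """Derive simple date features that can be shared across many series."""
    return {
        "weekday": d.weekday(),          # Monday = 0 ... Sunday = 6
        "is_weekend": d.weekday() >= 5,  # Saturday or Sunday
        "month": d.month,
        "is_holiday": d in HOLIDAYS,
    }

start = date(2019, 12, 23)
rows = [calendar_features(start + timedelta(days=i)) for i in range(3)]
```

Here 23 December 2019 was a Monday, so `rows[0]["weekday"]` is 0, and `rows[2]` (Christmas Day) has `is_holiday` set to True.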
In forecasting, simply averaging models has been shown to be an extremely effective way to improve accuracy on average and to outperform the individual component models, in a manner analogous to the central limit theorem or to the benefits of diversifying a portfolio in finance. DNNs can build on this finding by weighting the models in an ensemble in a more sophisticated way than a simple average. Their capacity to relate each model’s prediction errors to features of the time series is what makes this possible.
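The Python sketch below contrasts an equal-weight average with one simple alternative: weighting each model by the inverse of its mean squared error on a validation window. This is a deliberately basic stand-in, not a DNN; a learned approach of the kind described above would instead predict the weights from features of each series. The function names and the toy numbers are illustrative assumptions.

```python
import numpy as np

def simple_average(forecasts: np.ndarray) -> np.ndarray:
    """Equal-weight combination; forecasts has shape (n_models, horizon)."""
    return forecasts.mean(axis=0)

def inverse_mse_weights(val_forecasts: np.ndarray, val_actuals: np.ndarray,
                        eps: float = 1e-12) -> np.ndarray:
    """Weight each model by 1 / MSE on a validation window, normalised to sum to 1.

    eps guards against division by zero for a model with perfect validation fit.
    """
    mse = ((val_forecasts - val_actuals) ** 2).mean(axis=1)
    inv = 1.0 / (mse + eps)
    return inv / inv.sum()

def weighted_combination(forecasts: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Combine model forecasts using per-model weights."""
    return weights @ forecasts

# Toy validation data: model A is nearly right, model B is consistently 2 too high.
val_actuals = np.array([10.0, 11.0, 12.0])
val_forecasts = np.array([[10.0, 11.0, 13.0],
                          [12.0, 13.0, 14.0]])
w = inverse_mse_weights(val_forecasts, val_actuals)

future = np.array([[13.0, 14.0],
                   [15.0, 16.0]])
equal = simple_average(future)
weighted = weighted_combination(future, w)
```

Model A was far more accurate on the validation window, so it receives about 12/13 of the weight and the combined forecast stays close to its predictions, whereas the simple average splits the difference.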
Deep learning models, almost by definition, learn over time. As they remain in use, they can learn from the quality of their historical predictions, which can be used to forecast their own errors; those expected errors can then be corrected for. This is a second-order problem not typically considered by traditional forecasting techniques.
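One minimal way to picture forecasting a model’s own errors is to track the residuals (actual minus forecast) as they arrive, estimate the next residual with an exponentially weighted moving average, and add that estimate back onto new forecasts. The Python `BiasCorrector` class below is a hypothetical illustration under those assumptions; a deep learning system would typically learn a far richer error model than a single smoothed offset.

```python
class BiasCorrector:
    """Hypothetical helper: learn a model's systematic error and correct for it.

    The expected next error is an exponentially weighted moving average of
    past errors, where error = actual - forecast.
    """

    def __init__(self, alpha: float = 0.3):
        self.alpha = alpha           # weight given to the newest error
        self.expected_error = 0.0    # start by assuming an unbiased model

    def update(self, forecast: float, actual: float) -> None:
        """Record the error of a forecast once its actual value is known."""
        error = actual - forecast
        self.expected_error = self.alpha * error + (1 - self.alpha) * self.expected_error

    def correct(self, forecast: float) -> float:
        """Shift a new forecast by the expected error."""
        return forecast + self.expected_error

corrector = BiasCorrector(alpha=0.3)
for _ in range(50):
    corrector.update(forecast=10.0, actual=12.0)  # model under-forecasts by 2
adjusted = corrector.correct(10.0)
```

If a model consistently under-forecasts by 2 units, the expected error converges towards 2 and `adjusted` approaches 12.0.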
Deep learning methods place fewer constraints than classical methods on the data required for an accurate forecast. Partly because they can make use of surrounding data, and partly because of the inductive power of DNNs, they need less data from any individual time series to forecast accurately than classical time series techniques do. This is useful when working with small datasets, but it also lets practitioners trim large datasets, saving computing power. On the flip side, the more data (or the more types of data) provided, the better deep learning methods get at forecasting, whereas traditional methods need a minimum amount of data to be adequate and do not necessarily become more accurate with further data.
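The pooling idea behind needing less data from each individual series can be sketched with a global model: one set of parameters fitted on training windows drawn from many short series at once. For brevity, the Python sketch below uses a linear AR(p) model solved by least squares as a stand-in for a DNN; the simulated series, the lag order `p` and all function names are illustrative assumptions. Each series has only four observations, which would give very few training pairs on its own, yet the pooled fit recovers the shared dynamics.

```python
import numpy as np

def make_windows(series: np.ndarray, p: int):
    """Turn one series into (lagged inputs, next value) pairs for an AR(p) model."""
    X = np.array([series[i:i + p] for i in range(len(series) - p)])
    y = series[p:]
    return X, y

def fit_global_ar(all_series, p: int) -> np.ndarray:
    """Fit a single AR(p) model with intercept on windows pooled from all series."""
    pairs = [make_windows(s, p) for s in all_series if len(s) > p]
    X = np.vstack([X_i for X_i, _ in pairs])
    y = np.concatenate([y_i for _, y_i in pairs])
    X = np.hstack([X, np.ones((len(X), 1))])  # intercept column
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef  # [lag coefficients..., intercept]

def predict_next(series: np.ndarray, coef: np.ndarray) -> float:
    """One-step-ahead forecast from the last p values of a series."""
    p = len(coef) - 1
    return float(np.dot(series[-p:], coef[:-1]) + coef[-1])

def simulate(start: float, n: int) -> np.ndarray:
    """Noise-free toy series following y_t = 0.8 * y_{t-1} + 1."""
    values = [start]
    for _ in range(n - 1):
        values.append(0.8 * values[-1] + 1.0)
    return np.array(values)

rng = np.random.default_rng(0)
short_series = [simulate(s, 4) for s in rng.uniform(0.0, 10.0, size=20)]
coef = fit_global_ar(short_series, p=1)
```

Because the toy data are noise-free, `coef` comes out at approximately `[0.8, 1.0]`; with real, noisy data the benefit of pooling is that the shared parameters are estimated from every series rather than from a handful of points each.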
DNNs can forecast probabilistically rather than only providing point predictions or prediction intervals, and probabilistic modelling can also be used within the layers of a DNN. When using forecasts in real-world settings, practitioners can then see probability-weighted outcomes, which may inform decision making. Under traditional approaches, producing probabilistic forecasts uses too much computing power to be feasible.
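As an illustration of a probabilistic output, assume a network that, like Amazon’s DeepAR with a Gaussian likelihood, emits a mean `mu` and standard deviation `sigma` for each forecast step. The Python sketch below shows the loss such a network would minimise during training, how the predicted distribution turns into forecast quantiles, and the pinball loss commonly used to score a quantile forecast. It uses only the standard library, and the helper names are our own.

```python
import math
from statistics import NormalDist

def gaussian_nll(y: float, mu: float, sigma: float) -> float:
    """Negative log-likelihood of y under N(mu, sigma^2).

    A network with a Gaussian output head would minimise this during training.
    """
    return 0.5 * math.log(2 * math.pi * sigma ** 2) + (y - mu) ** 2 / (2 * sigma ** 2)

def quantile_forecast(mu: float, sigma: float, levels=(0.1, 0.5, 0.9)) -> dict:
    """Turn a predicted (mu, sigma) pair into forecast quantiles."""
    dist = NormalDist(mu, sigma)
    return {q: dist.inv_cdf(q) for q in levels}

def pinball_loss(y: float, pred: float, q: float) -> float:
    """Quantile (pinball) loss: under-forecasts cost q, over-forecasts cost 1 - q."""
    diff = y - pred
    return max(q * diff, (q - 1) * diff)

quantiles = quantile_forecast(100.0, 10.0)
```

Here `quantiles` gives a median of 100 with a 10% to 90% range of roughly 87.2 to 112.8, a range a planner can act on directly rather than a single number.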