A problem common to many companies is predicting the volume of purchases for a given product. One approach is to analyze the time series of historical volumes. Demand often shows a clear periodicity (at various levels) over time, for example as a function of the day of the week, the month of the year, the presence or absence of holidays, the variability of the price, and so on. To take several variables into account at once, and at the same time to exploit the possible correlations among them while preserving their temporal order, a Recurrent Neural Network (RNN) can be used.
Recurrent Neural Network Architecture
Feed-forward neural networks, such as the Multi-Layer Perceptron and the Convolutional Neural Network, operate on vectors of fixed size. Applications in which the order of the input variables affects the next value may require inputs and/or outputs that are sequences (possibly of variable length). RNN-type neural networks also include "backward" or same-level connections. The most commonly used models, for example Long Short-Term Memory (LSTM), use connections within the same layer. At each step of a given sequence, identifiable with a temporal instant t, the layer in question receives, in addition to the input for instant t, x(t), the output of the previous step, y(t-1). This allows the neural network to base its output at instant t, y(t), also on the "history", i.e. on all the elements of the temporal sequence and on their mutual position, exploiting a "memory effect". Training a Recurrent Neural Network requires Back Propagation Through Time. The typical architecture of an RNN is shown in Fig. 1.
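The recurrence described above can be sketched in a few lines of Python. This is a minimal vanilla-RNN step, not the LSTM discussed later; the weight names (W_x, W_y, b) and the tanh activation are illustrative assumptions, not taken from any specific library.

```python
import numpy as np

def rnn_step(x_t, y_prev, W_x, W_y, b):
    """One recurrent step: y(t) depends on the input x(t) AND on y(t-1)."""
    return np.tanh(x_t @ W_x + y_prev @ W_y + b)

def rnn_forward(xs, W_x, W_y, b):
    """Unroll the recurrence over a whole temporal sequence ("through time")."""
    y = np.zeros(W_y.shape[0])   # initial state: no history yet
    outputs = []
    for x_t in xs:               # one step per element of the sequence
        y = rnn_step(x_t, y, W_x, W_y, b)
        outputs.append(y)
    return np.array(outputs)
```

Unrolling the loop in this way corresponds to the right-hand drawing of Fig. 1, and it is this unrolled view over which Back Propagation Through Time computes gradients.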
The drawing on the left uses the cyclic representation, while the one on the right unrolls the cycle along a time line. For details on the hidden representations, one can refer to the many texts in the literature, e.g. https://atcold.github.io/pytorch-Deep-Learning/it/week06/06-3/. A common Long Short-Term Memory unit consists of a memory cell, an input gate, an output gate, and a forget gate. The cell remembers values over arbitrary time intervals, and the three gates regulate the flow of information into and out of the cell. The input transformation functions for each cell are shown in Fig. 2, and multiple cells can be chained sequentially; see http://colah.github.io/posts/2015-08-Understanding-LSTMs/ for a detailed walkthrough.
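The gate structure just described can be sketched as a single LSTM step following the standard gate equations. This is a conceptual sketch only: the weight and bias names (W_f, W_i, W_o, W_c, b_f, …) are illustrative, and real libraries fuse these operations for efficiency.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W_f, W_i, W_o, W_c, b_f, b_i, b_o, b_c):
    """One LSTM step: the three gates regulate the flow through the cell."""
    z = np.concatenate([h_prev, x_t])            # previous output + current input
    f = sigmoid(W_f @ z + b_f)                   # forget gate: what to discard
    i = sigmoid(W_i @ z + b_i)                   # input gate: what new info to store
    o = sigmoid(W_o @ z + b_o)                   # output gate: what to expose
    c = f * c_prev + i * np.tanh(W_c @ z + b_c)  # cell remembers across steps
    h = o * np.tanh(c)                           # new output / hidden state
    return h, c
```

Because the cell state c is updated multiplicatively by the forget gate rather than rewritten at every step, the unit can retain information over arbitrary time intervals, which is what makes LSTMs effective on long sequences.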
In one of the most recent projects completed for a Premoneo client in the logistics market, an LSTM was used to forecast the demand for a given product, using the following input series: sales volume history, date, day of the week (Monday = 1, …, Sunday = 7), month of the year (from 1 to 12), and the daily unit price of the product. The tested architecture chains 200 LSTM cells to a single output neuron. The periodicity chosen for each input feature is 7 days, and training was carried out on the history covering the whole of 2020. The mean absolute percentage error (MAPE) was used as the loss function, with the Adam method (adaptive moment estimation) as the optimizer. Training was performed with 400 epochs and a batch size of 10. The prediction was obtained over 30 days, and as a figure of merit the agreement between prediction and actuals over those 30 days was observed. The result is shown in Fig. 3, where "Forecast" is the volume trend predicted by the LSTM as a function of the date, expressed as a number of days (shifted so the first day is 0), and "Actual" is the same variable taken from the history of actuals for the same period.
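A setup of this kind can be sketched in Keras (the library used in this test) as follows. This is a hypothetical reconstruction from the description above, not the project's actual source code: a single LSTM layer with 200 units feeding one output neuron, MAPE loss, and the Adam optimizer; the data loading and scaling steps are omitted.

```python
from tensorflow import keras

WINDOW = 7       # periodicity chosen for each input feature (7 days)
N_FEATURES = 5   # volume history, date, day of week, month, daily unit price

model = keras.Sequential([
    keras.Input(shape=(WINDOW, N_FEATURES)),  # one 7-day window of 5 features
    keras.layers.LSTM(200),                   # 200 LSTM cells
    keras.layers.Dense(1),                    # single output neuron: next volume
])
model.compile(optimizer="adam", loss="mape")  # MAPE loss, Adam optimizer

# Training as described in the text, on the history covering all of 2020:
# model.fit(X_train, y_train, epochs=400, batch_size=10)
```

The 30-day forecast is then produced by sliding the 7-day window forward one day at a time and feeding each window to `model.predict`.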
The sample from which the actual values shown in Fig. 3 were taken was not used in training; it is an independent test sample. Although other analysis methods exist, such as ARIMA or GLM, Recurrent Neural Networks make it possible to exploit not only the historical time series of prices but also the correlations with other variables. Moreover, thanks to the many open-source libraries available (such as Keras, used in this test), which implement different models in a versatile way with minimal changes to the source code, testing multiple architectures is practical and fast. For datasets above a certain dimensionality, the use of neural networks can provide an advantage in terms of computational speed and prediction accuracy over analytical models.
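As an illustration of that versatility, swapping the recurrent cell type in Keras is essentially a one-line change. The helper below is purely hypothetical (the project used the LSTM variant); it only shows how little code changes between architectures.

```python
from tensorflow import keras

def build_model(cell="lstm", units=200, window=7, n_features=5):
    """Build the same forecasting network with a different recurrent cell."""
    layer_cls = {"lstm": keras.layers.LSTM, "gru": keras.layers.GRU}[cell]
    model = keras.Sequential([
        keras.Input(shape=(window, n_features)),
        layer_cls(units),           # the only line that differs between models
        keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mape")
    return model
```

Comparing, say, `build_model("lstm")` and `build_model("gru")` on the same training data is then just a loop over configurations, which is what makes testing multiple architectures practical and fast.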