According to my contract, I’m not allowed to name the company, so I will refer to it as Alpha Distribution.
The competition took place in August and September 2019 and covered three consumers. The rules were simple: we built our models on actual consumption values for 2012-2019. During August and September, on every working day we forecast the next day's consumption for the three consumers. On Fridays, we forecast three days ahead: Saturday, Sunday, and Monday. Alpha Distribution made the same forecasts under the same conditions to bid on the day-ahead market. If you are not familiar with the day-ahead market, see my latest paper, The Three-Headed Dragon: Electricity, Trading, Analysis.
At the beginning of October 2019, we compared forecasts. The result of the competition was a draw. Or deuce, as they say in tennis. The forecast errors are in the table below:
The fourth column, Consensus Forecast, is the average of Alpha Distribution's forecast and ours. The consensus values and errors were calculated when we compared the forecasts.
For Consumer 1, the largest by volume, the result was a clear draw. In August our forecast was slightly better; in September Alpha Distribution's outperformed ours. After two months, both stood at 1.9%. Note that the error is calculated as the mean absolute error in percent. From previous experience, I know that the average of two independent forecasts of similar quality is usually more accurate than either input; I call such a forecast a consensus forecast. In this case, the consensus error was 1.7%.
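In code, the error metric and the consensus forecast look roughly like this. This is a minimal sketch with made-up daily consumption values, not the actual competition data:

```python
import numpy as np

def mape(actual, forecast):
    """Mean absolute error in percent of actual consumption."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return float(np.mean(np.abs(actual - forecast) / actual) * 100)

# Illustrative daily consumption values (e.g. MWh) -- hypothetical numbers.
actual = np.array([100.0, 104.0, 98.0, 101.0])
forecast_a = np.array([102.0, 103.0, 96.0, 103.0])  # e.g. Alpha Distribution
forecast_b = np.array([98.0, 106.0, 99.0, 100.0])   # e.g. our tool

# The consensus forecast is the simple average of the two inputs.
consensus = (forecast_a + forecast_b) / 2

print(mape(actual, forecast_a))
print(mape(actual, forecast_b))
print(mape(actual, consensus))
```

With these toy numbers, the consensus error comes out lower than either input error, because the two forecasts miss in opposite directions on some days and the errors partially cancel.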
For Consumer 2, the smallest by volume, our forecast was noticeably better than Alpha Distribution's. The fight was less tough here: in August we won by around 1%, in September we lost by about 0.1%. Overall, we gained 0.5% over Alpha Distribution, and this is the only case of the three where the consensus did not improve accuracy.
For Consumer 3, the middle one by volume, we lost about 0.6% in August and fought back partially in September. After two months, Alpha Distribution's forecast was 0.1% better than ours: 2.3% against 2.4%. Here, again, the consensus gave a slight improvement of 0.1%: the consensus error was 2.2%.
Analyst recommendations to improve accuracy
After analyzing the competition results, I formulated recommendations for Alpha Distribution to improve forecast quality for the three consumers in question:
- For Consumer 1, run two systems in parallel, Alpha Distribution's internal system and our forecast system, and calculate the consensus. The improvement over the two months was 0.2%.
- For Consumer 2, use only our forecast system. The improvement over the two months was 0.5%.
- For Consumer 3, run two systems in parallel, Alpha Distribution's internal system and our forecast system, and calculate the consensus. The improvement over the two months was 0.1%.
I took part in similar competitions in 2012-2014 when, together with my friend Sergey, we built a forecast service for the Wholesale Electricity and Capacity Market of Russia. You can read a short account in my blog post 'The short-term electricity price forecast model on the most similar pattern.' I lost all those competitions and became disillusioned with this scheme of forecast quality assessment. My impression was that it was impossible to prepare a model in a couple of weeks and beat a model, even one in MS Excel, that had been developed for several years and applied daily. I still think that accuracy is a matter of time.
This year, I tried again with Alpha Distribution. It was scary. Before August, I understood that we were challenging one of the most advanced companies in the Russian wholesale market in terms of analytics and modeling. Developing the forecast system from scratch (not even a system yet, but for now a tool in Python) took me 3.5 months. I wrote 90% of the Python code; a colleague of mine helped with ETL and daily forecasting during the competition. With that new tool, we challenged a heavyweight champion. The goal was to stay on our feet, and we were still standing when the match was over. We got our share of victory and defeat and, as a result, experience.
The conclusion I have come to so far is that one model is not enough to improve accuracy; two or more models of similar quality are required. In that case the consensus, as the most straightforward mechanism for ensembling models, gives a noticeable improvement.
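The same idea extends beyond two inputs. A minimal sketch, assuming equally weighted models (the model outputs below are hypothetical, and the optional weights are an assumption, not something used in the competition):

```python
import numpy as np

def consensus_forecast(forecasts, weights=None):
    """Combine several model forecasts by (optionally weighted) averaging."""
    stacked = np.vstack(forecasts)  # shape: (n_models, n_periods)
    return np.average(stacked, axis=0, weights=weights)

# Hypothetical forecasts from three models for the same three periods.
model_1 = [100.0, 104.0, 98.0]
model_2 = [102.0, 101.0, 97.0]
model_3 = [98.0, 103.0, 99.0]

equal = consensus_forecast([model_1, model_2, model_3])
print(equal)
```

With `weights=None`, `np.average` reduces to the simple mean, which is exactly the consensus used above; weights would only make sense once you have evidence that one model is consistently stronger.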
And yep, I still think it is impossible to win such competitions!