Today, or perhaps yesterday, marks one year since I started working in Python. It is my main working tool now and will probably remain so in the near future. Every working day starts with launching PyCharm and coding in Python.
What have I learned in the first year?
1) New mathematical models.
Within one year, I have learned and developed solutions using the following algorithms/models:
- Empirical Mode Decomposition;
- Neural networks: dense and LSTM;
- Gradient boosting: XGBoost, LightGBM, CatBoost.
Some models are well developed, like the neural networks; some are still first drafts, like EMD and Prophet. My boosting models are currently half-finished.
2) Kaggle.
I discovered Kaggle. I registered there five years ago but did not open it again until fall 2019. In late October, I found a competition on forecasting electricity, water, and steam consumption for 1,500 buildings around the world. I built a first, very immature solution and got a feel for the problem. What impressed me most, though, were the winners' solutions, which they openly and kindly shared with the rest of the world. Going through them, I learned modern approaches and techniques for solving a problem of that scale. Kaggle is unique: it lets us study best practices, discuss ideas and, most importantly, share Python modeling code.
3) High demand for Python data scientists.
I become a more sought-after power analyst every time I mention Python. What I have learned: everyone wants mathematics in Python. After only 10 months of practice, I organized my first Python workshop with little difficulty. Together with the Education Center, we are now planning a new one on the same subject, “The Power Price and Consumption Forecast Using Neural Network in Python”. This time, it will last two days. It is worth noting that 90% of participants in the Kaggle power consumption forecast competition used Python in their solutions.
What is the core difference between Python and MATLAB or R?
1) Level of “development.”
I have worked in MATLAB for 12 years, in R for six months, and in Python for one year, and I once built an application in Java (using the Eclipse IDE). From this experience, I have concluded the following:
- MATLAB is a pure data science tool. You take an input, calculate the result, and put the result in your scientific paper. That’s it. The main advantage of MATLAB is the ability to rerun calculations from the middle: you calculate for hours, then save the workspace in a single .mat file and continue from where you stopped at a later date. MATLAB is the only tool mentioned here that allows such a trick effortlessly. I do not expect a great future for MATLAB, because its license is expensive and its algorithms are closed to users.
- R is the first step from pure science to software development. In R, we use libraries, yet R still has a kind of workspace (analogous to MATLAB's). Jumping from MATLAB to R is easy because the coding logic is similar. What I do not like about R is the debugger: it is the worst debugger I have ever worked with, cumbersome and opaque. I hope RStudio has since improved that important part of the application.
- Python is the next step from math to software development. It is a high-level object-oriented language, yet it also allows simple and clear math coding à la R and MATLAB. You can still import data from a text file in two lines, which is far less straightforward in a strictly object-oriented language like Java. This flexibility in coding style allows Python mathematics to be integrated smoothly into IT infrastructure. In addition, Python has countless libraries covering different types of mathematical models.
- Java is the complete opposite of MATLAB: a pure cross-platform software development tool. I do not believe many math models are being implemented in Java today.
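The two-line text import mentioned above can be sketched roughly as follows. This is only an illustration using the standard library's csv module; the file name, column names, and values are made up for the example, and the file is written to a temporary directory just to keep the snippet self-contained. In practice, many people use pandas.read_csv for the same job.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical sample data (hourly power consumption), written to a
# temporary file only so that the example can run on its own.
sample = "hour,consumption_mw\n0,101.5\n1,98.2\n2,95.7\n"
path = Path(tempfile.mkdtemp()) / "consumption.csv"
path.write_text(sample)

# The actual import: two lines.
with open(path, newline="") as f:
    rows = list(csv.DictReader(f))

# Values arrive as strings, so convert them before doing any math.
consumption = [float(r["consumption_mw"]) for r in rows]
mean_consumption = sum(consumption) / len(consumption)
print(len(rows), round(mean_consumption, 2))
```

Each row comes back as a dictionary keyed by the header line, so the data is ready for calculations right away, without declaring any classes first.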
2) Complete openness and investment from Google, Microsoft, Facebook, Yandex, etc.
It is worth knowing that many Python libraries are developed and openly published by IT giants. Here are a few examples I have worked with:
- Prophet: Facebook's library for time series analysis; quite sophisticated;
- LightGBM: Microsoft's solution in the area of decision trees;
- CatBoost: a Yandex solution we are proud of, also in the area of decision trees (this area is extremely popular right now);
- TensorFlow: a backend for neural network training developed by Google. By the way, Kaggle belongs to Google too.
I believe that the rapid development of mathematics for both big and small data problems is a result of the openness of mathematical models. Giants like Microsoft and Google clearly understand that the most efficient way to solve the mathematical problems that people and industries face today is to collaborate with the entire world: to share ideas, code, and best practices openly.