How to write code that scales and accelerates your work as a data scientist or machine learning engineer.
When I was a junior Data Scientist, my goal was to write code that simply worked.
I used to see Python only as a way to run Pandas, NumPy, or Matplotlib. Like everybody else, I started in a Jupyter Notebook, processing the data and training models cell by cell.
I remember my first job in a company.
As the project progressed, the notebook grew, and despite the explanations I added in Markdown cells, the code began to get messy.
The first model was finally trained, evaluated, and shipped to production with the developers’ help.
However, as in any Machine Learning project, deploying a model is not the end of the journey but the beginning…
Several weeks later, I had to start over and revisit the notebook. To be honest, it was almost easier to create a new notebook. Requirements had changed. The code was too messy to attempt any modifications.
Furthermore, shipping the processing algorithm to production was a painful task. Data had to be processed identically in three places: the notebook, the training pipeline, and the inference pipeline.
The need to write the code three times meant that any modification in the notebook required corresponding changes in the different pipelines, increasing the likelihood of introducing bugs.
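To make the duplication problem concrete, here is a minimal sketch of the alternative: one processing function that the notebook, the training pipeline, and the inference pipeline would all import instead of each keeping its own copy. The function name `preprocess`, the field names, and the cleaning steps are hypothetical and only serve as an illustration.

```python
# Hypothetical shared module (e.g. preprocessing.py); names and steps are illustrative.
from typing import Dict, List, Optional

Record = Dict[str, Optional[float]]


def preprocess(records: List[Record], default_age: float = 0.0) -> List[Record]:
    """Apply the same cleaning steps wherever the data is used."""
    cleaned = []
    for record in records:
        row = dict(record)  # copy so the caller's data is not mutated
        if row.get("age") is None:
            row["age"] = default_age  # fill missing values
        # Derived feature: income expressed in thousands
        row["income_k"] = (row.get("income") or 0.0) / 1000.0
        cleaned.append(row)
    return cleaned


# Training and inference call the same function, so a change is made only once.
training_rows = preprocess(
    [{"age": 31.0, "income": 52000.0}, {"age": None, "income": 48000.0}]
)
inference_rows = preprocess([{"age": None, "income": 61000.0}])
```

With a single definition, a change in the processing logic happens in one place rather than three.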
Doing Machine Learning at this time was painful for me.
Until I started to apply software engineering best practices.
My code, my relationship with my colleagues, and my efficiency in delivering ML pipelines improved significantly.
One of those best practices was applying the SOLID principles.
Why should you learn to code with SOLID principles?
You probably recognized yourself in my story.
Don’t worry—you’re not alone.