Scale your Machine Learning Projects with SOLID principles

How to write code that scales and accelerates your work as a data scientist or machine learning engineer.

Jeremy Arancio

Published in Towards Data Science

13 min read

14 hours ago


When I was a junior Data Scientist, my goal was to write code that simply worked.

I used to see Python only as a way to run Pandas, NumPy, or Matplotlib. Like everybody else, I started in a Jupyter Notebook, processing the data and training models cell by cell.

I remember my first job in a company.

As the project progressed, the notebook grew, and despite providing explanations with markdowns, the code began to get messy.

The first model was finally trained, evaluated, and shipped to production with the developers’ help.

However, as in any Machine Learning project, deploying a model is not the end of the journey but the beginning…

Several weeks later, I had to start over and revisit the notebook. To be honest, it was almost easier to create a new notebook. Requirements had changed. The code was too messy to attempt any modifications.

Furthermore, shipping the processing algorithm to production was a painful task. Data had to be processed identically in the notebook, the training pipeline, and the inference pipeline.

The need to write the code three times meant that any modification in the notebook required corresponding changes in the different pipelines, increasing the likelihood of introducing bugs.

Doing Machine Learning at this time was painful for me.

Until I started to apply Software Engineering best practices.

My code, my relationship with my colleagues, and my efficiency in delivering ML pipelines improved significantly.

One of those best practices was applying the SOLID principles.

Photo by Clément Hélardot on Unsplash

Why should you learn to code with SOLID principles?

You probably recognized yourself in my story.

Don’t worry—you’re not alone.
