Scale your Machine Learning Projects with SOLID principles

How to write code that scales and accelerates your work as a data scientist or machine learning engineer.

Jeremy Arancio

Published in Towards Data Science

When I was a junior Data Scientist, my goal was to write code that simply worked.

I used to see Python only as a way to run pandas, NumPy, or Matplotlib. Like everybody else, I started in a Jupyter Notebook, processing the data and training models cell by cell.

I remember my first job in a company.

As the project progressed, the notebook grew, and despite the explanations I added in Markdown cells, the code began to get messy.

The first model was finally trained, evaluated, and shipped to production with the developers’ help.

However, as in any Machine Learning project, deploying a model is not the end of the journey but the beginning…

Several weeks later, requirements had changed, and I had to revisit the notebook. The code was too messy to attempt any modifications. To be honest, it was almost easier to create a new notebook and start over.

Furthermore, shipping the processing algorithm to production was a painful task. Data had to be processed identically in the notebook, in the training pipeline, and in the inference pipeline.

The need to write the code three times meant that any modification in the notebook required corresponding changes in the different pipelines, increasing the likelihood of introducing bugs.
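To make the duplication problem concrete, here is a minimal sketch of the alternative: one processing function that every pipeline imports instead of re-implementing. The names `preprocess`, `build_training_set`, and `prepare_inference_input`, as well as the `text` field, are hypothetical and only serve the illustration; they are not taken from the project described here.

```python
# Illustrative sketch only: all function and field names are assumptions.


def preprocess(record: dict) -> dict:
    """Clean one raw record. This is the single place where processing rules live."""
    text = (record.get("text") or "").strip()
    return {"text": text.lower(), "length": len(text)}


def build_training_set(raw_records: list[dict]) -> list[dict]:
    """Training pipeline: reuses preprocess instead of copying its logic."""
    return [preprocess(record) for record in raw_records]


def prepare_inference_input(raw_record: dict) -> dict:
    """Inference pipeline: goes through exactly the same function."""
    return preprocess(raw_record)


if __name__ == "__main__":
    raw = {"text": "  Hello World "}
    # Both pipelines produce the same features for the same raw input.
    print(build_training_set([raw])[0] == prepare_inference_input(raw))
```

With this layout, a change to the processing rules is made once in `preprocess`, and the notebook, the training pipeline, and the inference pipeline all pick it up.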

Doing Machine Learning at that time was painful for me.

Until I started to apply software engineering best practices.

My code, my relationship with my colleagues, and my efficiency in delivering ML pipelines improved significantly.

One of those best practices was applying the SOLID principles.

Photo by Clément Hélardot on Unsplash

Why should you learn to code with SOLID principles?

You probably recognized yourself in my story.

Don’t worry, you’re not alone.
