AI

Six Ways to Control Style and Content in Diffusion Models | Towards Data Science

Stable Diffusion 1.5/2.0/2.1/XL 1.0, DALL-E, Imagen… In recent years, Diffusion Models have showcased stunning quality in image generation. However, while they produce great results for generic concepts, they struggle to generate high-quality images for more specialised queries, for example images in a specific style that was not frequently seen in the training dataset. We could retrain the whole model from scratch on a vast number of images illustrating the concepts needed.

Read More »
AI

The Gamma Hurdle Distribution | Towards Data Science

Which Outcome Matters?

Here is a common scenario: an A/B test was conducted, where a random sample of units (e.g. customers) was selected for a campaign and received Treatment A. Another sample was selected to receive Treatment B. “A” could be a communication or offer and “B” could be no communication or no offer. “A” could be 10% off and “B” could be 20% off. Two groups, two different treatments, where A and
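A gamma hurdle model splits such an outcome into two parts: a Bernoulli "hurdle" (did the customer respond at all?) and a gamma-distributed amount, conditional on responding. A minimal sketch of the implied expected value per unit, with all numbers and names (`p_purchase`, `gamma_shape`) made up for illustration:

```python
def gamma_hurdle_mean(p_purchase: float, gamma_shape: float, gamma_scale: float) -> float:
    """Expected outcome per unit: P(spend > 0) * E[spend | spend > 0]."""
    conditional_mean = gamma_shape * gamma_scale  # mean of a gamma distribution
    return p_purchase * conditional_mean

# Compare treatment A vs. treatment B on overall expected spend per unit:
# A converts 30% of customers, B converts 25%; spend given conversion is the same.
lift = gamma_hurdle_mean(0.30, 2.0, 25.0) - gamma_hurdle_mean(0.25, 2.0, 25.0)
print(round(lift, 2))  # 0.30*50 - 0.25*50 = 2.5
```

The point of the decomposition is that a treatment can move either part (response rate or spend amount) and the overall expected value combines both.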

Read More »
AI

Triangle Forecasting: Why Traditional Impact Estimates Are Inflated (And How to Fix Them) | Towards Data Science

Accurate impact estimations can make or break your business case. Yet, despite their importance, most teams use oversimplified calculations that can lead to inflated projections. These shot-in-the-dark numbers not only destroy credibility with stakeholders but can also result in misallocation of resources and failed initiatives. But there’s a better way to forecast the effects of gradual customer acquisition, without requiring messy Excel spreadsheets and formulas that error out. By the end of this article, you will
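The inflation comes from treating gradually acquired customers as if they were all present from day one. A minimal sketch of the "triangle" idea, with illustrative numbers (not from the article): a cohort acquired in month m only contributes value for the remaining months of the year, so the first-year total is a triangular sum.

```python
def triangle_impact(new_customers_per_month: int, value_per_customer_month: float,
                    months: int = 12) -> float:
    # Cohort acquired in month m contributes for (months - m) months.
    return sum(new_customers_per_month * value_per_customer_month * (months - m)
               for m in range(months))

def naive_impact(new_customers_per_month: int, value_per_customer_month: float,
                 months: int = 12) -> float:
    # Oversimplified: pretends every customer is active for the full year.
    return new_customers_per_month * months * value_per_customer_month * months

print(triangle_impact(100, 10.0))  # 78000.0
print(naive_impact(100, 10.0))     # 144000.0 -- inflated by nearly 2x
```

Even in this toy setup, the naive estimate almost doubles the projected first-year impact.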

Read More »
AI

I Tried Making my Own (Bad) LLM Benchmark to Cheat in Escape Rooms | Towards Data Science

Recently, DeepSeek announced their latest model, R1, and article after article came out praising its performance relative to cost, and how the release of such open-source models could genuinely change the course of LLMs forever. That is really exciting! And also, too big of a scope to write about… but when a model like DeepSeek comes out of nowhere with a steel chair, boasting similar performance levels to other models, what does performance really mean in this

Read More »
AI

Synthetic Data Generation with LLMs | Towards Data Science

Popularity of RAG

Over the past two years while working with financial firms, I’ve observed firsthand how they identify and prioritize Generative AI use cases, balancing complexity with potential value. Retrieval-Augmented Generation (RAG) often stands out as a foundational capability across many LLM-driven solutions, striking a balance between ease of implementation and real-world impact. By combining a retriever that surfaces relevant documents with an LLM that synthesizes responses, RAG streamlines knowledge access, making it invaluable for applications like customer support, research,
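The retriever half of that combination can be sketched in a few lines. This toy version scores documents by word overlap with the query and returns the top match to place in the LLM prompt; a real system would use embeddings and a vector index, and all document names here are made up.

```python
docs = {
    "refund_policy": "customers may request a refund within 30 days",
    "shipping": "orders ship within two business days",
}

def retrieve(query: str, k: int = 1) -> list[str]:
    # Score each document by how many query words it shares, keep the top k.
    q = set(query.lower().split())
    scored = sorted(docs.items(),
                    key=lambda kv: len(q & set(kv[1].split())),
                    reverse=True)
    return [name for name, _ in scored[:k]]

print(retrieve("how do I get a refund"))  # ['refund_policy']
```

The retrieved text would then be concatenated into the prompt so the LLM can ground its answer in it.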

Read More »
AI

The Method of Moments Estimator for Gaussian Mixture Models | Towards Data Science

Audio Processing is one of the most important application domains of digital signal processing (DSP) and machine learning. Modeling acoustic environments is an essential step in developing digital audio processing systems such as speech recognition, speech enhancement, acoustic echo cancellation, etc. Acoustic environments are filled with background noise that can have multiple sources. For example, when sitting in a coffee shop, walking down the street, or driving your car, you hear sounds that can be

Read More »
AI

Efficient Metric Collection in PyTorch: Avoiding the Performance Pitfalls of TorchMetrics | Towards Data Science

Metric collection is an essential part of every machine learning project, enabling us to track model performance and monitor training progress. Ideally, metrics should be collected and computed without introducing any additional overhead to the training process. However, just like other components of the training loop, inefficient metric computation can introduce unnecessary overhead, increase training-step times, and inflate training costs. This post is the seventh in our series on performance profiling and optimization in PyTorch.
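The general principle the post explores can be sketched without PyTorch at all: accumulate cheap running state on every training step, and defer the actual metric computation (and, on a GPU, any device-to-host synchronization) to the end of the epoch. This plain-Python accuracy tracker is an illustrative stand-in, not the TorchMetrics API.

```python
class RunningAccuracy:
    def __init__(self):
        self.correct = 0
        self.total = 0

    def update(self, preds, targets):
        # O(batch) bookkeeping only; nothing is aggregated per step.
        self.correct += sum(p == t for p, t in zip(preds, targets))
        self.total += len(targets)

    def compute(self):
        # The (potentially expensive) reduction runs once per epoch.
        return self.correct / self.total

metric = RunningAccuracy()
metric.update([1, 0, 1], [1, 1, 1])
metric.update([0, 0], [0, 1])
print(metric.compute())  # 0.6
```

The pitfall the title refers to is doing the `compute`-style work (or forcing a sync) inside every step instead of once per epoch.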

Read More »
AI

Introduction to Minimum Cost Flow Optimization in Python | Towards Data Science

Minimum cost flow optimization minimizes the cost of moving flow through a network of nodes and edges. Nodes include sources (supply) and sinks (demand), with different costs and capacity limits. The aim is to find the least costly way to move volume from sources to sinks while adhering to all capacity limitations.

Applications

Applications of minimum cost flow optimization are vast and varied, spanning multiple industries and sectors. This approach is crucial in logistics and
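The objective and constraints can be made concrete with a tiny transportation-style instance: two sources with supply, two sinks with demand, per-edge unit costs and capacities. Brute force over integer flows is only for illustration (all numbers are made up); real problems use a dedicated solver such as `networkx.min_cost_flow`.

```python
from itertools import product

supply = {"s1": 4, "s2": 3}          # units available at each source
demand = {"t1": 5, "t2": 2}          # units required at each sink
cost = {("s1", "t1"): 2, ("s1", "t2"): 4, ("s2", "t1"): 5, ("s2", "t2"): 1}
cap = {e: 5 for e in cost}           # per-edge capacity limit

best = None
edges = list(cost)
for flows in product(*(range(cap[e] + 1) for e in edges)):
    f = dict(zip(edges, flows))
    # Feasible: every source ships exactly its supply, every sink gets its demand.
    if all(sum(f[s, t] for t in demand) == supply[s] for s in supply) and \
       all(sum(f[s, t] for s in supply) == demand[t] for t in demand):
        total = sum(f[e] * cost[e] for e in edges)
        if best is None or total < best[0]:
            best = (total, f)

print(best[0])  # 15: route the cheap s2->t2 edge first, rest via s1->t1 and s2->t1
```

The optimum here sends 2 units over the cheapest edge (s2→t2, cost 1), then satisfies t1 with 4 units from s1 and 1 from s2, for a total cost of 15.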

Read More »
AI

A Visual Guide to How Diffusion Models Work | Towards Data Science

This article is aimed at those who want to understand exactly how Diffusion Models work, with no prior knowledge expected. I’ve tried to use illustrations wherever possible to provide visual intuitions on each part of these models. I’ve kept mathematical notation and equations to a minimum, and where they are necessary I’ve tried to define and explain them as they occur.

Intro

I’ve framed this article around three main questions: What exactly is it that

Read More »
AI

Myths vs. Data: Does an Apple a Day Keep the Doctor Away? | Towards Data Science

Introduction

“Money can’t buy happiness.” “You can’t judge a book by its cover.” “An apple a day keeps the doctor away.” You’ve probably heard these sayings several times, but do they actually hold up when we look at the data? In this article series, I want to take popular myths/sayings and put them to the test using real-world data. We might confirm some unexpected truths, or debunk some popular beliefs. Hopefully, in either case we

Read More »