Aaron Birnbaum and Matthew Makansi

AI

How to Train LLMs to “Think” (o1 & DeepSeek-R1) | Towards Data Science

In September 2024, OpenAI released its o1 model, trained with large-scale reinforcement learning, giving it “advanced reasoning” capabilities. Unfortunately, the details of how they pulled this off were never shared publicly. Today, however, DeepSeek (an AI research lab) has replicated this reasoning behavior and published the full technical details of their approach. In this article, I will discuss the key ideas behind this innovation and describe how they work under the hood. OpenAI’s

Read More »
AI

Generative AI and Civic Institutions | Towards Data Science

Different sectors, different goals Recent events have got me thinking about AI as it relates to our civic institutions — think government, education, public libraries, and so on. We often forget that civic and governmental organizations are inherently deeply different from private companies and profit-making enterprises. They exist to enable people to live their best lives, protect people’s rights, and make opportunities accessible, even if (especially if) this work doesn’t have immediate monetary returns. The

Read More »
AI

LLM + RAG: Creating an AI-Powered File Reader Assistant | Towards Data Science

Introduction AI is everywhere. It is hard not to interact with a Large Language Model (LLM) at least once a day. Chatbots are here to stay. They’re in your apps, they help you write better, they compose emails, they read emails…well, they do a lot. And I don’t think that’s a bad thing. In fact, my opinion is quite the opposite – at least so far. I defend and advocate for the use of

Read More »
AI

Data Science: From School to Work, Part II | Towards Data Science

In my previous article, I highlighted the importance of effective project management in Python development. Now, let’s shift our focus to the code itself and explore how to write clean, maintainable code — an essential practice in professional and collaborative environments.  Readability & Maintainability: Well-structured code is easier to read, understand, and modify. Other developers — or even your future self — can quickly grasp the logic without struggling to decipher messy code. Debugging & Troubleshooting: Organized code with clear variable

Read More »
AI

Avoidable and Unavoidable Randomness in GPT-4o | Towards Data Science

Of course there is randomness in GPT-4o’s outputs. After all, the model samples from a probability distribution when choosing each token. But what I didn’t understand was that those probabilities themselves are not deterministic. Even with consistent prompts, fixed seeds, and temperature set to zero, GPT-4o still introduces subtle, frustrating randomness. To be clear, there’s no fix for this, and it might not even be something OpenAI could fix if they wanted to
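One plausible low-level source of this behavior (an illustration on my part, not an analysis from the article) is that floating-point addition is not associative, so the order in which a GPU accumulates partial sums can shift the computed token probabilities between runs:

```python
# Floating-point addition is not associative: grouping the same three
# numbers differently gives results that differ in the last bits.
# This is why a GPU reduction whose summation order varies run-to-run
# can produce slightly different logits from identical inputs.
a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c
right = a + (b + c)
print(left == right)  # the two groupings disagree
```

A tiny discrepancy in the logits is enough to flip which token has the highest probability in borderline cases, even at temperature zero.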

Read More »
AI

Vision Transformers (ViT) Explained: Are They Better Than CNNs? | Towards Data Science

1. Introduction Ever since the introduction of the self-attention mechanism, Transformers have been the top choice for Natural Language Processing (NLP) tasks. Self-attention-based models are highly parallelizable and require substantially fewer parameters, making them much more computationally efficient, less prone to overfitting, and easier to fine-tune for domain-specific tasks [1]. Furthermore, the key advantage of transformers over past models (like RNNs, LSTMs, GRUs, and other neural-based architectures that dominated the NLP domain
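For readers unfamiliar with the mechanism the excerpt refers to, here is a minimal NumPy sketch of scaled dot-product self-attention (my own illustration, not code from the article):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # pairwise token similarities
    scores -= scores.max(axis=-1, keepdims=True)  # shift for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the keys
    return weights @ V, weights

# Self-attention: queries, keys, and values all come from the same tokens.
tokens = np.random.default_rng(0).normal(size=(4, 8))  # 4 tokens, dim 8
out, attn = scaled_dot_product_attention(tokens, tokens, tokens)
```

Because every token attends to every other token in a single matrix product, the whole computation parallelizes across the sequence, which is the efficiency advantage the excerpt highlights.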

Read More »
AI

Unraveling Large Language Model Hallucinations | Towards Data Science

Introduction In a YouTube video titled Deep Dive into LLMs like ChatGPT, Andrej Karpathy, former Senior Director of AI at Tesla, discusses the psychology of Large Language Models (LLMs) as emergent cognitive effects of the training pipeline. This article is inspired by his explanation of LLM hallucinations and the information presented in the video. You have probably seen model hallucinations: instances where LLMs generate incorrect, misleading, or entirely fabricated information that appears plausible. These hallucinations

Read More »
AI

Announcing the Towards Data Science Author Payment Program | Towards Data Science

At TDS, we see value in every article we publish and recognize that authors share their work with us for a wide range of reasons — some wish to spread their knowledge and help other learners, others aim to grow their public profile and advance in their career, and some look at writing as an additional income stream. In many cases, it’s a combination of all of the above. Historically, there was no direct monetization

Read More »
AI

Mind the GAP: Geometry Aware Passthrough mitigates cybersickness

User study design We introduce a comprehensive protocol focused on key VST use cases to holistically assess visually induced discomfort and cybersickness in VST HMDs. We then use this protocol to compare our GAP algorithm to DP. To ensure reproducibility, repeatability, and real-life relevance, we began with tasks identified in the literature, tested them in a pilot study, and iteratively refined the task nature and duration based on participant feedback. A total of 25 consenting participants with

Read More »
AI

I Won’t Change Unless You Do | Towards Data Science

In Game Theory, how can players ever settle on a choice if a better option might still be available? Maybe one player still wants to change their decision. But if they do, maybe the other player wants to change too. How can they ever hope to escape this vicious circle? To solve this problem, the concept of a Nash equilibrium, which I will explain in this article, is fundamental to game
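To make the “no one wants to switch” idea concrete, here is a minimal sketch (my own illustration, not from the article) that brute-force checks for Nash equilibria in the classic prisoner’s dilemma; the payoff numbers are the standard illustrative ones:

```python
from itertools import product

# Prisoner's dilemma payoffs as (row player, column player);
# C = cooperate, D = defect. Higher is better.
payoffs = {
    ("C", "C"): (-1, -1),
    ("C", "D"): (-3,  0),
    ("D", "C"): ( 0, -3),
    ("D", "D"): (-2, -2),
}
actions = ["C", "D"]

def is_nash(a, b):
    # Nash equilibrium: neither player gains by unilaterally switching.
    row_ok = all(payoffs[(a, b)][0] >= payoffs[(alt, b)][0] for alt in actions)
    col_ok = all(payoffs[(a, b)][1] >= payoffs[(a, alt)][1] for alt in actions)
    return row_ok and col_ok

equilibria = [(a, b) for a, b in product(actions, actions) if is_nash(a, b)]
```

Only mutual defection survives the check: from (D, D), switching to cooperation makes either player strictly worse off, so the “vicious circle” of revisions stops there.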

Read More »