The Most Valuable LLM Dev Skill is Easy to Learn, But Costly to Practice.

Here’s how not to waste your budget on evaluating models and systems

Image created by the author using Flux1.1 Pro.

You can build a fortress in two ways: Start stacking bricks one above the other, or draw a picture of the fortress you’re about to build and plan its execution; then, keep evaluating it against your plan.

We all know the second one is the only way we can possibly build a fortress.

Sometimes, I’m the worst follower of my own advice. I’m talking about jumping straight into a notebook to build an LLM app. It’s the worst thing we can do to a project.

Before we begin anything, we need a mechanism that tells us we’re moving in the right direction: one that says whether the last thing we tried was better or worse than what came before.

In software engineering, it’s called test-driven development. For machine learning, it’s evaluation.

The first step and the most valuable skill in developing LLM-powered applications is to define how you’ll evaluate your project.

Evaluating LLM applications is nothing like software testing. I don’t mean to downplay the challenges of software testing, but evaluating LLMs isn’t as straightforward.