Daniel Kharitonov


Unsupervised LLM Evaluations

A practitioner's guide to judging outputs of large language models

Daniel Kharitonov · Published in Towards Data Science · 12 min read

TL;DR: Evaluating AI-generated outputs is critical for building robust applications of large language models because it allows complex AI applications to be split into simple stages with built-in error control. It is relatively straightforward to evaluate generative outputs in a supervised mode, where the “right answers” can be …
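As a concrete illustration of the supervised mode the teaser refers to, the sketch below scores model outputs against known reference answers with an exact-match rule after light normalization. This is a minimal example, not the article's code; `normalize`, `supervised_accuracy`, and the sample data are illustrative names.

```python
def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so formatting noise does not count as an error."""
    return " ".join(text.lower().split())

def supervised_accuracy(outputs: list[str], references: list[str]) -> float:
    """Fraction of model outputs that match their reference answers after normalization."""
    hits = sum(normalize(o) == normalize(r) for o, r in zip(outputs, references))
    return hits / len(references)

if __name__ == "__main__":
    outputs = ["Paris", "  blue whale "]   # model answers
    references = ["paris", "Blue Whale"]   # known "right answers"
    print(f"accuracy = {supervised_accuracy(outputs, references):.2f}")  # accuracy = 1.00
```

In an unsupervised setting, where no reference answers exist, this scoring slot is typically filled by an LLM judge rather than an exact-match rule.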


Enforcing JSON outputs in commercial LLMs

A comprehensive guide

Daniel Kharitonov · Published in Towards Data Science · 9 min read

TL;DR: We tested the structured output capabilities of Google Gemini Pro, Anthropic Claude, and OpenAI GPT. In their best-performing configurations, all three models can generate structured outputs at the scale of thousands of JSON objects. However, the APIs vary significantly in the effort required to prompt the models to produce JSON and in …
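As one hedged illustration of what enforcing JSON output looks like in practice, the sketch below uses OpenAI's documented JSON mode (`response_format={"type": "json_object"}`). It is not the benchmark harness from the article; the model name, prompt, and `get_json` helper are placeholders.

```python
import json

from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def get_json(prompt: str) -> dict:
    """Request a structured response via OpenAI's JSON mode and parse it."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; any JSON-mode-capable model works
        response_format={"type": "json_object"},
        messages=[
            # JSON mode requires the word "JSON" to appear in the messages.
            {"role": "system", "content": "Reply only with a JSON object."},
            {"role": "user", "content": prompt},
        ],
    )
    return json.loads(resp.choices[0].message.content)

if __name__ == "__main__":
    obj = get_json('Describe Paris as JSON with keys "city" and "country".')
    print(obj["city"], obj["country"])
```

Note that JSON mode guarantees syntactically valid JSON but not conformance to a particular schema; the schema still has to be spelled out in the prompt, which is exactly the kind of prompting effort the comparison above measures.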
