AI Evals & Discovery

Show notes

What you’ll learn in this episode:

  • What “evals” actually mean in the AI/ML world
  • Why evals are more than just quality assurance
  • The difference between golden datasets, synthetic data, and real-world traces
  • How to identify error modes and turn them into evals
  • When to use code-based evals vs. LLM-as-judge evals (see the sketch after this list)
  • How discovery practices inform every step of AI product evaluation
  • Why evals require continuous maintenance (and what “criteria drift” means for your product)
  • The relationship between evals, guardrails, and ongoing human oversight
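
To make the code-based vs. LLM-as-judge distinction concrete, here is a minimal Python sketch. It is not from the episode; every function name, prompt, and field is illustrative, and the judge call is stubbed out so you can wire it to whatever model client you use.

```python
# Minimal sketch contrasting the two eval styles discussed in the episode.
# All names here are illustrative, not from the episode.

import re


def code_based_eval(output: str) -> bool:
    """Code-based eval: a deterministic check you can express as code.
    Example: the response must contain a well-formed order ID."""
    return re.search(r"\bORD-\d{6}\b", output) is not None


JUDGE_PROMPT = """You are grading an AI assistant's reply.
Question: {question}
Reply: {reply}
Does the reply answer the question politely and accurately? Answer PASS or FAIL."""


def llm_as_judge_eval(question: str, reply: str, call_llm) -> bool:
    """LLM-as-judge eval: subjective criteria (tone, helpfulness) that are
    hard to encode as rules, so another model grades the output.
    `call_llm` is whatever client function you use to reach your model."""
    verdict = call_llm(JUDGE_PROMPT.format(question=question, reply=reply))
    return verdict.strip().upper().startswith("PASS")


if __name__ == "__main__":
    # Code-based evals are cheap and exact; run them on every trace.
    print(code_based_eval("Your order ORD-123456 has shipped."))  # True

    # LLM-as-judge evals cost a model call; a stub stands in here.
    fake_llm = lambda prompt: "PASS"
    print(llm_as_judge_eval("Where is my order?", "It shipped yesterday!", fake_llm))
```

A rough rule of thumb from the episode's framing: reach for code-based evals when the pass/fail criterion is objective and checkable (format, presence of required fields, policy violations you can pattern-match), and reserve LLM-as-judge evals for fuzzier qualities like tone or helpfulness, where you still spot-check the judge's verdicts against human labels.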

Resources & Links:

Mentioned in the episode:

Coming soon from Teresa:

  • Weekly Monday posts sharing lessons learned while building AI products
  • A new podcast interviewing cross-functional teams about real-world AI product development stories
