Josh Pitzalis

  • How to Build Bulletproof LLM Eval Systems
    The Step-by-Step Evaluation Framework That Companies Like Uber and Netflix Use to Get 99%+ Large Language Model Reliability in Production

    If you’re tired of LLM applications that work in demos but fail with real users… this comprehensive guide will show you exactly how to build the evaluation framework that engineering teams at top companies use…

    Josh

    June 29, 2025
    Generative AI
    evals
  • Systematic Error Detection for AI Systems

    How to Know if Your Application Actually Works

    So you’ve built your RAG bot, your customer service chatbot, or your AI-powered application. It looks great in demos, but how do you actually know if it works when real users start interacting with it? This is where systematic error detection comes in—a methodical approach borrowed from…

    Josh

    June 7, 2025
    Generative AI
    data-annotation, error-analysis, quantitative-analysis
  • LLM Evaluation Framework: Beyond the Vibe Check

    Building an LLM application that passes the initial “vibe check” is just the beginning. The real challenge lies in making it production-ready, reliable, and systematically defensible against edge cases. This guide explores the fundamental principles and lifecycle approach to evaluating LLMs, drawing from cutting-edge research in ML reliability. Why Evaluating LLMs Matters More Than You…

    Josh

    June 6, 2025
    Generative AI
    evaluation
  • Building Custom Annotation Tools for AI Error Analysis

    Why custom dataset annotation solutions often outperform generic data labeling platforms for machine learning systems

    When your AI application isn’t performing as expected, you need to understand where it’s failing. This requires systematic error analysis through data annotation—having domain experts review queries and responses to identify patterns and problems. While off-the-shelf annotation tools seem like…

    Josh

    June 5, 2025
    Generative AI
    data-annotation, error-analysis
  • How to Evaluate RAG Systems

    Every few weeks, someone declares that retrieval augmented generation (RAG) is dead. But here’s the thing: retrieval isn’t going anywhere. Any large language model system worth its salt needs to retrieve data at some point—whether through an MCP call, database query, or document lookup. The real question isn’t whether retrieval augmented generation systems are dead,…

    Josh

    June 4, 2025
    Generative AI
    rag

Blog at WordPress.com.
