Grounding LLM responses in chunks retrieved from an external corpus so the model reasons over real, citable sources instead of parametric memory alone.
Build production evaluation pipelines for LLM applications — golden datasets, LLM-as-judge, rubrics, statistical significance, regression detection, and evals vs tests.
Trace, debug, evaluate, and monitor LLM applications with LangSmith. Covers tracing setup, datasets, evaluators, prompt hub, comparing runs, and CI integration.
Measure and improve RAG pipeline quality with ragas. Covers faithfulness, answer relevancy, context precision, context recall, dataset format, LLM judges, and CI integration.
Evaluate and monitor LLM applications with TruLens. Covers the RAG triad, feedback functions, TruChain, TruLlama, custom evaluators, the dashboard, and CI integration.
navigation
actions
cheat sheet pages