#evaluation (5)

Concepts (1)

Retrieval-Augmented Generation (RAG)

Grounding LLM responses in chunks retrieved from an external corpus so the model reasons over real, citable sources instead of parametric memory alone.

llm vector-search ai grounding embeddings hallucination evaluation web-researched

Cheat sheets (4)

LLM Evaluations

Build production evaluation pipelines for LLM applications — golden datasets, LLM-as-judge, rubrics, statistical significance, regression detection, and evals vs tests.

evals evaluation llm-as-judge golden-dataset regression statistics prompting

LangSmith — LLM Observability & Evaluation

Trace, debug, evaluate, and monitor LLM applications with LangSmith. Covers tracing setup, datasets, evaluators, prompt hub, comparing runs, and CI integration.

python langsmith llm observability tracing evaluation langchain ai

ragas — RAG Evaluation Framework

Measure and improve RAG pipeline quality with ragas. Covers faithfulness, answer relevancy, context precision, context recall, dataset format, LLM judges, and CI integration.

python ragas rag evaluation llm ai retrieval metrics

TruLens — LLM App Evaluation

Evaluate and monitor LLM applications with TruLens. Covers the RAG triad, feedback functions, TruChain, TruLlama, custom evaluators, the dashboard, and CI integration.

python trulens rag evaluation llm ai observability feedback
