John Berryman moved from aerospace engineering into search, then into ML and LLMs. His path: Eventbrite search → GitHub code search → data science → GitHub Copilot. At each step he gravitated toward work with more math and machine learning.
RAG Explained
"RAG is not a thing. RAG is two things." It breaks into:
- Search - finding relevant information
- Prompt engineering - presenting that information to the model
These should be treated, and optimized, as separate problems (sketched below).
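One way to keep the two halves separate and independently testable is shown below; the term-overlap scoring and the prompt template are illustrative stand-ins, not Berryman's implementation:

```python
from dataclasses import dataclass

@dataclass
class Document:
    title: str
    text: str

def retrieve(query: str, corpus: list[Document], k: int = 3) -> list[Document]:
    """Search half of RAG: rank documents by naive term overlap with the query.
    In practice this would be BM25, embeddings, or a hybrid retriever."""
    terms = set(query.lower().split())
    scored = sorted(
        corpus,
        key=lambda d: len(terms & set(d.text.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(query: str, docs: list[Document]) -> str:
    """Prompt-engineering half of RAG: present the retrieved context in a
    familiar, clearly delimited format before asking the question."""
    context = "\n\n".join(f"## {d.title}\n{d.text}" for d in docs)
    return (
        "Answer the question using only the context below.\n\n"
        f"{context}\n\n"
        f"Question: {query}\nAnswer:"
    )

# The two halves can be evaluated separately: retrieval with search metrics
# such as recall@k, prompting with output evals.
```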
The Little Red Riding Hood Principle
When prompting LLMs, stay on the path of what models have seen in training. Use formats, structures, and patterns they recognize from their training data:
- For code, use docstrings and proper formatting
- For financial data, use SEC report structures
- Use Markdown for better formatting
Models respond better to familiar structures, as in the example below.
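As a hypothetical illustration, a code request framed as a docstring plus function signature (the shape code models saw constantly in training) tends to steer the model better than free-form prose:

```python
# Off the path: a free-form prose request.
prose_prompt = "Please write me some Python that checks whether a string is a palindrome."

# On the path: a docstring and signature, the format code models were trained on.
# The model's natural continuation is the function body itself.
on_path_prompt = '''\
def is_palindrome(s: str) -> bool:
    """Return True if `s` reads the same forwards and backwards,
    ignoring case and non-alphanumeric characters."""
'''
```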
Testing Prompts
Testing strategies:
- Start with "vibe testing" - human evaluation of outputs
- Develop systematic tests based on observed failure patterns
- Use token probabilities to measure model confidence (sketch after this list)
- For few-shot prompts, watch for diminishing returns as examples increase
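Token probabilities are one concrete signal here. A sketch using the OpenAI Python SDK's `logprobs` option on a classification-style prompt; the model name and the sentiment task are illustrative assumptions:

```python
import math
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "Answer with exactly one word: positive or negative."},
        {"role": "user", "content": "Review: 'The venue was fine, I guess.'"},
    ],
    logprobs=True,    # return log-probabilities for the sampled tokens
    top_logprobs=3,   # and the top alternatives at each position
    max_tokens=1,
)

# Convert the first token's log-probability into a confidence estimate.
first_token = resp.choices[0].logprobs.content[0]
print(f"answer={first_token.token!r} p={math.exp(first_token.logprob):.2f}")
for alt in first_token.top_logprobs:
    print(f"  alt={alt.token!r} p={math.exp(alt.logprob):.2f}")
```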
Managing Token Limits
When designing prompts, divide content into:
- Static elements (boilerplate, instructions)
- Dynamic elements (user inputs, context)
Prioritize content by:
- Must-have information
- Nice-to-have information
- Optional information, included only if space allows (a token-budgeting sketch follows)
Even with larger context windows, efficiency remains important for cost and latency.
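A sketch of that prioritization as an explicit token budget, counted with `tiktoken`; the tier names, the budget size, and the encoding choice are assumptions for illustration:

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # tokenizer used by many recent OpenAI models

def n_tokens(text: str) -> int:
    return len(enc.encode(text))

def assemble_prompt(static: str, must_have: list[str], nice_to_have: list[str],
                    optional: list[str], budget: int = 4000) -> str:
    """Fill the prompt in priority order; lower tiers go in only if room remains."""
    parts = [static] + must_have           # static boilerplate + must-have context always go in
    used = sum(n_tokens(p) for p in parts)
    for tier in (nice_to_have, optional):  # then spend what is left, tier by tier
        for chunk in tier:
            cost = n_tokens(chunk)
            if used + cost > budget:
                break
            parts.append(chunk)
            used += cost
    return "\n\n".join(parts)
```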
Completion vs. Chat Models
Chat models have won out despite initial concerns about their constraints:
- Completion models allow more flexibility in how the document is formatted
- Chat models are more reliable and better aligned with common use cases
- Most applications now use chat models, even for completion-style tasks (see the sketch below)
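For instance, a completion-style task ("continue this code") is now typically expressed through the chat interface; a small sketch with the OpenAI SDK, where the model name and instructions are placeholders:

```python
from openai import OpenAI

client = OpenAI()

# Completion-style task expressed as a chat exchange: the document to be
# continued simply becomes the user message.
resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "Continue the user's Python code. Reply with code only."},
        {"role": "user", "content": "def median(xs: list[float]) -> float:\n"},
    ],
)
print(resp.choices[0].message.content)
```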
Applications: Workflows vs. Assistants
Two main LLM application patterns:
- Assistants: Human-in-the-loop interactions where users guide and correct
- Workflows: Decomposed tasks where LLMs handle well-defined steps with safeguards
Breaking Down Complex Problems
Two approaches:
- Horizontal: Split into sequential steps with clear inputs/outputs
- Vertical: Divide by case type, with specialized handling for each scenario
Example: For SOX compliance, break the problem horizontally (understand the control, find evidence, extract data, compile the report) and vertically (by audit type), as sketched below.
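In a sketch of that decomposition, the horizontal steps become a pipeline of small functions with clear inputs and outputs, and the vertical cases become a dispatch table. The SOX step names come from the example above; `call_llm` and the audit-type names are hypothetical placeholders.

```python
from typing import Callable

def call_llm(prompt: str) -> str:
    """Hypothetical helper wrapping whatever model API you use; returns the model's text."""
    raise NotImplementedError

# Horizontal decomposition: sequential steps with clear inputs/outputs.
def understand_control(control_text: str) -> str:
    return call_llm(f"Summarize the intent of this SOX control:\n\n{control_text}")

def find_evidence(control_summary: str, documents: list[str]) -> str:
    joined = "\n\n".join(documents)
    return call_llm(
        f"Which of these documents evidence the control below?\n\n"
        f"Control: {control_summary}\n\nDocuments:\n{joined}"
    )

def extract_data(evidence: str) -> str:
    return call_llm(f"Extract the key facts (dates, approvers, amounts) from:\n\n{evidence}")

def compile_report(facts: str) -> str:
    return call_llm(f"Write a short audit finding based on these facts:\n\n{facts}")

# Vertical decomposition: route each audit type to its own specialized pipeline.
def access_review_audit(control_text: str, documents: list[str]) -> str:
    return compile_report(extract_data(find_evidence(understand_control(control_text), documents)))

PIPELINES: dict[str, Callable[[str, list[str]], str]] = {
    "access_review": access_review_audit,
    # "change_management": change_management_audit, ...one entry per case type
}

def run_audit(audit_type: str, control_text: str, documents: list[str]) -> str:
    return PIPELINES[audit_type](control_text, documents)
```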
On Agents
Agents exist on a spectrum from assistants to workflows, characterized by:
- Having some autonomy to make decisions
- Using tools to interact with the environment
- Usually requiring human oversight, as in the loop sketched below
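A minimal sketch of that middle ground using OpenAI tool calling: the model gets bounded autonomy and a single tool, and a human approves each tool call before it runs. The tool, its stubbed result, and the approval prompt are illustrative assumptions, not a specific framework.

```python
import json
from openai import OpenAI

client = OpenAI()

TOOLS = [{
    "type": "function",
    "function": {
        "name": "search_issues",
        "description": "Search the issue tracker and return matching titles.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

def search_issues(query: str) -> str:
    return json.dumps(["Issue 42: flaky login test"])  # stubbed tool result

messages = [{"role": "user", "content": "Find open issues about flaky tests."}]
for _ in range(5):  # some autonomy, but bounded
    resp = client.chat.completions.create(model="gpt-4o-mini", messages=messages, tools=TOOLS)
    msg = resp.choices[0].message
    if not msg.tool_calls:          # no more tool use: final answer
        print(msg.content)
        break
    messages.append(msg)
    for call in msg.tool_calls:
        args = json.loads(call.function.arguments)
        # Human oversight: confirm before executing the tool call.
        if input(f"Run {call.function.name}({args})? [y/N] ").lower() != "y":
            raise SystemExit("Stopped by user.")
        result = search_issues(**args)
        messages.append({"role": "tool", "tool_call_id": call.id, "content": result})
```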
Best Practices
For building with LLMs:
- Start simple: API key + Jupyter notebook
- Build prototypes and iterate quickly
- Add evaluation as you scale
- Keep users in the loop until models prove reliability
John Berryman:
Nicolay Gerold: