Practical AI

Representation Engineering (Activation Hacking)

Feb 28, 2024
This episode covers activation hacking and how it relates to representation engineering, along with impressions from a recent hackathon. The hosts also share their thoughts on recent developments, including OpenAI's new Sora model for video generation. Other topics include AI safety, prompting techniques, the GPTScript language, database optimization, and the potential of smaller models like Gemma.
ANECDOTE

Treehacks Projects

  • Daniel Whitenack attended Treehacks at Stanford, witnessing impressive AI projects.
  • One winning project used LoRa mesh network devices for disaster relief, transcribing audio and using an LLM for command and control.
INSIGHT

Representation Engineering

  • Representation engineering (activation hacking) offers a new way to control AI models beyond prompt engineering.
  • It involves directly manipulating the model's hidden states to induce specific behaviors or tones.
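As a rough illustration of what "manipulating hidden states" can mean, the sketch below shifts a hidden state along a fixed direction, scaled by a strength value. It is a toy under stated assumptions and not the hosts' implementation: the function name `steer`, the four-dimensional vectors, and the `tone_direction` values are all hypothetical, and a real model would apply this kind of shift to per-layer activations during the forward pass.

```python
from typing import List

Vector = List[float]


def steer(hidden_state: Vector, control_vector: Vector, strength: float) -> Vector:
    """Return a copy of hidden_state shifted along control_vector.

    A positive strength pushes the state toward the behavior the vector
    encodes; a negative strength pushes it away.
    """
    if len(hidden_state) != len(control_vector):
        raise ValueError("hidden state and control vector must have the same length")
    return [h + strength * c for h, c in zip(hidden_state, control_vector)]


# Toy 4-dimensional hidden state and a hypothetical "tone" direction.
hidden = [0.5, -1.0, 2.0, 0.0]
tone_direction = [1.0, 0.0, -1.0, 0.5]

more = steer(hidden, tone_direction, 2.0)   # push toward the tone
less = steer(hidden, tone_direction, -2.0)  # push away from the tone
print(more)
print(less)
```

Unlike a prompt, the strength here is a continuous knob, which is one reason this style of control is often described as complementary to prompt engineering.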
ADVICE

Creating Control Vectors

  • Create contrasting prompt pairs (e.g., happy vs. sad) and collect hidden states from the model's responses.
  • Calculate differences between corresponding hidden states to create control vectors.
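The two steps above can be sketched with plain lists standing in for hidden states. This is a simplified illustration, not the method described in the episode: it averages the pairwise differences, whereas practical implementations may reduce the differences in other ways (for example, with a principal component analysis). The function name `control_vector` and the toy `happy_states` and `sad_states` values are assumptions made up for the example; in practice the states would be collected from a model's layers.

```python
from typing import List, Sequence

Vector = List[float]


def control_vector(positive_states: Sequence[Vector],
                   negative_states: Sequence[Vector]) -> Vector:
    """Average the element-wise differences between paired hidden states."""
    if not positive_states or len(positive_states) != len(negative_states):
        raise ValueError("need the same non-zero number of positive and negative states")
    dim = len(positive_states[0])
    total = [0.0] * dim
    for pos, neg in zip(positive_states, negative_states):
        if len(pos) != dim or len(neg) != dim:
            raise ValueError("all hidden states must have the same length")
        for i in range(dim):
            total[i] += pos[i] - neg[i]
    count = len(positive_states)
    return [t / count for t in total]


# Toy hidden states for two contrasting prompt pairs (happy vs. sad).
happy_states = [[1.0, 0.0, 2.0], [3.0, 1.0, 0.0]]
sad_states = [[0.0, 0.0, 1.0], [1.0, 2.0, 0.0]]

vector = control_vector(happy_states, sad_states)
print(vector)
```

Averaging over several pairs is meant to cancel out differences that are specific to one prompt and keep the part that tracks the contrast itself.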