Representation Engineering (Activation Hacking)

Feb 28, 2024

Discover the intriguing concept of Activation Hacking and how it relates to representation engineering, featuring insights from a recent hackathon. The hosts share their thoughts on the latest advancements, including OpenAI's new Sora model for video generation. Explore the nuances of AI safety, prompting techniques, and the innovative GPTScript language. They also discuss database optimization and the exciting potential of smaller models like Gemma. Join the conversation on utilizing AI responsibly while engaging with the vibrant community.

AI Snips

ANECDOTE

TreeHacks Projects

Daniel Whitenack attended TreeHacks, the hackathon at Stanford, where he saw impressive AI projects.
One winning project used LoRa mesh network devices for disaster relief: it transcribed audio and used an LLM for command and control.

INSIGHT

Representation Engineering

Representation engineering (activation hacking) offers a new way to control AI models beyond prompt engineering.
It involves directly manipulating the model's hidden states to induce specific behaviors or tones.
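
As a rough illustration of what "manipulating hidden states" can mean, the sketch below shifts a hidden-state vector along a steering direction. It is a minimal sketch under stated assumptions, not the method from the episode: the function name `steer`, the `strength` parameter, and the toy numbers are all hypothetical, and a real model would apply this to tensors inside a layer rather than to plain lists.

```python
def steer(hidden_state, control_vector, strength=1.0):
    """Return a copy of hidden_state shifted along control_vector.

    Hypothetical helper: `strength` scales how far the state is pushed;
    a negative value pushes in the opposite direction.
    """
    if len(hidden_state) != len(control_vector):
        raise ValueError("hidden_state and control_vector must have the same length")
    return [h + strength * c for h, c in zip(hidden_state, control_vector)]


# Toy placeholder values, not real model activations.
hidden = [0.2, -1.0, 0.5]
happy_direction = [1.0, 0.0, -0.5]

steered = steer(hidden, happy_direction, strength=2.0)
print(steered)
```

The point of the sketch is only that the intervention happens on the model's internal numbers, not on the prompt text.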

ADVICE

Creating Control Vectors

Create contrasting prompt pairs (e.g., happy vs. sad) and collect hidden states from the model's responses.
Calculate differences between corresponding hidden states to create control vectors.
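
The two steps above can be sketched in a few lines. This is a simplified illustration, not a definitive implementation: the hidden states are hard-coded toy lists standing in for values collected from a model, the function name `control_vector` is hypothetical, and averaging the per-pair differences is an assumed way of reducing them to a single vector (the snip only says to calculate the differences).

```python
def control_vector(positive_states, negative_states):
    """Average the element-wise differences between paired hidden states.

    Each argument is a list of equal-length vectors; positive_states[i]
    and negative_states[i] come from the same contrasting prompt pair.
    Averaging is an assumption made for this sketch.
    """
    if not positive_states or len(positive_states) != len(negative_states):
        raise ValueError("need the same non-zero number of positive and negative states")
    diffs = [
        [p - n for p, n in zip(pos, neg)]
        for pos, neg in zip(positive_states, negative_states)
    ]
    count = len(diffs)
    # zip(*diffs) groups the values of each dimension across all pairs.
    return [sum(column) / count for column in zip(*diffs)]


# Toy placeholder values for two happy/sad prompt pairs, not real activations.
happy_states = [[1.0, 0.0, 2.0], [3.0, 1.0, 0.0]]
sad_states = [[0.0, 0.0, 1.0], [1.0, 1.0, -1.0]]

vector = control_vector(happy_states, sad_states)
print(vector)
```

In practice the states would be collected per layer from the model while it processes each prompt in a pair, and the resulting vector would be computed separately for each layer.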
