

Interviewing Andrew Trask on how language models should store (and access) information
Oct 10, 2024
Andrew Trask, a passionate AI researcher and leader of the OpenMined organization, shares insights on privacy-preserving AI and data access. He discusses the importance of secure enclaves in AI evaluation and the complexities of copyright laws impacting language models. Trask explores the ethical dilemmas of using non-licensed data, federated learning's potential, and challenges startups face in the AI landscape. He emphasizes the need for innovative infrastructures and the synergy between Digital Rights Management and secure computing for better data governance.
Twitter's Data Challenge
- Twitter lacked ground-truth demographic data for its bias studies and had to rely on external sources.
- Sensitive data, such as census records, cannot easily be shared due to privacy laws, which hinders this kind of research.
Secure Enclaves Explained
- Secure enclaves encrypt data in RAM using a chip-specific key, so the data stays private even while it is being computed on.
- They also produce a signed hash ("measurement") of the program they are running, letting outside parties verify exactly what code will touch their data (see the sketch after this list).
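
To make the attestation idea concrete, here is a minimal Python sketch of the verification flow described above. Everything in it is an assumption for illustration: Ed25519 (via the `cryptography` package) stands in for the vendor's hardware attestation key, and the "program" is just a byte string; real enclaves (e.g., Intel SGX, AWS Nitro) use vendor-specific certificate chains and quote formats.

```python
# Minimal sketch of remote attestation for a secure enclave.
# Assumptions (not from the episode): Ed25519 stands in for the
# hardware vendor's attestation key; the "program" is a byte string.
import hashlib

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric import ed25519

# --- Inside the enclave (simulated) --------------------------------
# The enclave hashes the exact program it is running (the
# "measurement") and signs that hash with a chip-specific key.
chip_key = ed25519.Ed25519PrivateKey.generate()
program = b"def evaluate(model, private_data): ..."  # code the enclave runs
measurement = hashlib.sha256(program).digest()
attestation = chip_key.sign(measurement)

# --- On the verifier's side -----------------------------------------
# A data owner checks two things before releasing sensitive data:
# 1. the signature chains back to trusted hardware, and
# 2. the measurement matches the program they audited and approved.
vendor_pubkey = chip_key.public_key()        # obtained from the vendor in practice
expected = hashlib.sha256(program).digest()  # hash of the audited source

try:
    vendor_pubkey.verify(attestation, measurement)  # raises if forged
    if measurement == expected:
        print("attestation OK: safe to send encrypted data to the enclave")
    else:
        print("refusing: enclave is running different code than audited")
except InvalidSignature:
    print("attestation failed: signature not from trusted hardware")
```

The key property is that the data owner can refuse to send anything until both checks pass, so trust rests on the hardware vendor's key rather than on whoever operates the machine.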
LLM Information Storage
- LLMs store both syntactic information (grammar) and semantic information (facts about the world) in the same weights, so updating a single fact requires retraining the whole model.
- Andrew Trask suggests separating the two, so that world knowledge lives in a swappable external database and can be updated without retraining (see the sketch after this list).
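
Trask's proposal resembles retrieval-style setups where factual knowledge is fetched at run time rather than baked into the weights. The toy Python sketch below (all names such as `FactStore` and `answer` are hypothetical, not from any library) shows why this makes updates cheap: changing a fact means swapping the store, not retraining the model.

```python
# Toy sketch of separating syntax from semantics: the "model" only
# knows how to phrase answers; facts live in a swappable store.
from dataclasses import dataclass


@dataclass
class FactStore:
    """Stands in for the external 'real-world database' of semantic knowledge."""
    facts: dict[str, str]

    def lookup(self, query: str) -> str:
        return self.facts.get(query, "unknown")


def answer(query: str, store: FactStore) -> str:
    """Stands in for the trained model: it handles phrasing (syntax)
    and retrieves the fact (semantics) at run time."""
    return f"According to the current store, {query} is {store.lookup(query)}."


# Updating world knowledge means swapping the store, not retraining.
store_2023 = FactStore({"the UK prime minister": "Rishi Sunak"})
store_2024 = FactStore({"the UK prime minister": "Keir Starmer"})

print(answer("the UK prime minister", store_2023))
print(answer("the UK prime minister", store_2024))
```

In a real system the "model" would be a trained network and the store a retrieval index over documents; the point of the separation is that a fact update touches only the store.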