
Speaker 2
Interesting. So you worked at Google on very technical topics, but as a product lead there you had quite a big impact, bridging what researchers do and having an impact on the business, internally and externally. A couple of years later you joined OpenAI, and you're now working on the LLM evaluation team. First things first: how did you join OpenAI? How did you get to work there? How did you make this transition? After Google you had a bunch of experiences, but how did you end up joining OpenAI?
Speaker 1
Yeah. I was actually working as VP of Product for a machine learning observability company called Aporia in Tel Aviv. Machine learning observability is maybe a fancy term for Datadog, but for machine learning models: being able to really identify and investigate model-specific issues like performance degradations, model drift, or data quality issues. So I was very focused on building products for measuring, investigating, and acting on model performance, not just across models as an entity but across systems and applications. At the time, OpenAI was naturally gaining traction, and ChatGPT blew the world up, for the right reasons, and it was extremely fascinating. You went from assessing model quality for these fairly small, simple model architectures, discrete classifiers, to hundreds of billions of tokens now being nested into a single large language model that's much more generally applicable, with much broader use cases and much higher capabilities and performance. So I was fascinated. I actually wasn't looking for a new opportunity; I was fairly content in my role and had a lot of impact. They reached out to me: someone in my network was, I guess, perusing my personal site or my LinkedIn, reached out in a LinkedIn message, and said, "Hey, I passed along your resume. Your experience is extremely relevant, and we're looking for someone to help lead our evaluation team at OpenAI." It was one of those opportunities that was very difficult to refuse. And so, a tremendous number of interviews later, maybe eight or nine, I found myself moving from Tel Aviv to San Francisco.
Speaker 2
When did you join the company? It's been a couple of months now, is that right?
Speaker 1
I joined in December, early December.
Speaker 2
I'm keen to understand a bit more about your current role at OpenAI. We mentioned that you're working on the LLM evaluation team, but what's your role like, and what does it really mean to work on LLM evaluation?
Speaker 1
Yeah, so being part of the model evaluation team is really interesting, because the objective of the team is not just to measure and benchmark our model capabilities, but to really assess what it means for the model to be good at something. And as you can imagine, what we perceive to be good today is much better than what we perceived to be good even six months ago. So it's this constantly moving target of figuring out: what sort of intelligence does the model need to exemplify and be proficient at? What sort of domains does it need to know about and reason about? What sort of capabilities should it be able to perform, and with what degree of precision? So it's really a fascinating science of figuring out how smart is smart, how capable is truly capable, and how you consistently improve those capabilities across different areas of expertise and the breadth of things that our users care about.
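The per-domain capability measurement described here could be sketched roughly like this. Everything below is illustrative, not OpenAI's actual tooling: the domains, prompts, and the trivially simple exact-match grader are invented to show the shape of the idea.

```python
# Toy per-domain capability eval: grade a model's answers against
# references and report accuracy per domain. All names, prompts, and
# the grader are illustrative; this is not OpenAI's internal tooling.

def exact_match(prediction: str, reference: str) -> bool:
    """Simplest possible grader: normalized string equality."""
    return prediction.strip().lower() == reference.strip().lower()

def evaluate(model, eval_sets):
    """Run `model` (a callable: prompt -> answer) over each domain's
    (prompt, reference) pairs and return per-domain accuracy."""
    results = {}
    for domain, examples in eval_sets.items():
        correct = sum(exact_match(model(p), ref) for p, ref in examples)
        results[domain] = correct / len(examples)
    return results

# Hypothetical eval data: two domains, two examples each.
eval_sets = {
    "arithmetic": [("2+2=", "4"), ("3*3=", "9")],
    "geography": [("Capital of France?", "Paris"),
                  ("Capital of Japan?", "Tokyo")],
}

# A stub "model" that gets one geography question wrong.
answers = {"2+2=": "4", "3*3=": "9",
           "Capital of France?": "Paris",
           "Capital of Japan?": "Kyoto"}
scores = evaluate(lambda p: answers.get(p, ""), eval_sets)
```

In practice the grader is the hard part (real evals use rubric or model-based grading rather than exact match), but the per-domain breakdown is what makes "good at what?" answerable at all.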
Speaker 2
And there's also this challenge, I feel. I've done a couple of GenAI projects, and the question comes up: I have two models. How do I really know that one is better than the other? Is that a challenge on your side too? Or, because you've got lots of data and lots of processes, is that an easy thing for you? Or is it still difficult to be sure that one model is actually better than another? Because it can answer one set or domain of questions correctly but be slightly worse in another area. So it's not like a traditional ML algorithm, where you've got a mean absolute error and the algorithm with the lowest error is the best. With LLMs, it's a bit more nuanced, right?
Speaker 1
Yeah, it's a great question, and it's like a daily struggle: this candidate model versus incumbent model, which one is better? I think the biggest challenge there is that you oftentimes have a pretty diverse set of eval datasets which you run various benchmarks on, and you might excel or improve on one eval and regress on another. So how do you really balance out the decision-making of, ultimately, is this model better or worse? There are several techniques that we've developed at OpenAI which are novel and really inspiring, and I think that's the more creative aspect of my role, which I'm very fortunate to be pushing. One other thing that's very interesting is being able to come up with different constraints and visualizations to try to make sense of this data. The way you described my work at Google was bridging the gap between research and product, things that get exposed and used. The way I would describe my work here is bridging the gap between metrics and actionable decisions. Because these numbers that give you a score on how well you did on this eval don't really tell a story. Being able to make sense of that in a novel fashion is something I'm super passionate about, because to me a metric is not an insight; a story is an insight: how did we get here, and what can we do to improve this? So bridging that gap between the metrics and the action a researcher should take to fix or improve something is the science that we're working on.
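The candidate-versus-incumbent tension described here (improve on one eval, regress on another) can be made concrete with a small sketch. The scores, eval names, and regression tolerance below are all invented for illustration; the point is that an aggregate number hides exactly the regressions you care about.

```python
# Sketch of a candidate-vs-incumbent comparison across several eval
# sets: per-eval deltas, flagged regressions, and a naive aggregate.
# All scores and the tolerance are illustrative, not real results.

def compare(incumbent, candidate, regression_tol=0.01):
    """Given {eval_name: score} for both models, return per-eval
    deltas, evals where the candidate regressed beyond tolerance,
    and the mean delta across all evals."""
    deltas = {name: candidate[name] - incumbent[name] for name in incumbent}
    regressions = [n for n, d in deltas.items() if d < -regression_tol]
    mean_delta = sum(deltas.values()) / len(deltas)
    return deltas, regressions, mean_delta

incumbent = {"math": 0.72, "coding": 0.65, "safety": 0.91}
candidate = {"math": 0.80, "coding": 0.70, "safety": 0.88}

deltas, regressions, mean_delta = compare(incumbent, candidate)
# The candidate improves on math and coding but regresses on safety.
# The mean delta alone is positive and hides that story, which is
# why "metrics are not insights" in the discussion above.
```

A real decision process would weight evals by importance and use statistical significance rather than a flat tolerance, but the shape, per-eval deltas plus explicit regression flags, is the useful part.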
Speaker 2
So it's not enough to just look at the numbers and see, okay, this model is an improvement on this particular metric, we're going to release it to everyone. It's a bit more nuanced, especially for LLMs.
Speaker 1
Absolutely.
Speaker 2
Well, keen to understand also: what's your advice for people who build GenAI apps? What can they do if they want to make some improvements to their models? What can they do in their day-to-day to make sure that their LLM is actually an improvement? Some quick checks or tips to make sure that the V2 they build is actually better than the V1?
Speaker 1
Great question. I think, in many cases, what's in everybody's best interest is to curate a dataset, a holdout set, a test set, whatever you want to call it, to evaluate your model regularly, and to make sure that test set is being updated continuously. That's central to model evaluation, and to deciding whether version one is better or worse than version two. The question I often ask folks is: what is the cost or consequence of a missed prediction, of a false positive or a false negative? In the case of autonomous driving, the cost is very high: if you perceive a human as a cone, it could lead to a fatality, and God forbid that happens. But if your bank perceives your expensive shopping spree as fraud and blocks your credit card, the cost of that is probably not you switching credit card companies; it's an inconvenience, or a text message that says, hey, is this really you? So let's say model version one is actually better in terms of its false positive rate than model version two, but model version two's latency has improved significantly. Again, I ask the question: what's the cost of the misprediction if your model were to hallucinate or give a wrong answer in this way? So get your golden test set and continue to curate it. Users' behaviors drift, data drifts. Make sure that you're continuously measuring progress and updating that dataset so it's reflective, diverse, and representative of what you expect to see in production.
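The "what does a mistake cost you?" question above can be turned into a simple expected-cost comparison. The error rates and dollar costs below are made up for illustration, borrowing the fraud-detection framing from the conversation; the takeaway is that which model is "better" depends on the consequences you assign to each error type.

```python
# Expected-cost comparison of two model versions, weighting each error
# type by its business consequence. All rates and costs are invented
# for illustration; the point is that "better" depends on what a
# mistake costs you, not on any single error rate.

def expected_cost(fp_rate, fn_rate, fp_cost, fn_cost):
    """Expected cost per prediction from false positives and negatives."""
    return fp_rate * fp_cost + fn_rate * fn_cost

# Fraud-detection framing: a false positive is a blocked legitimate
# purchase (a minor inconvenience), a false negative is missed fraud
# (expensive). Relative costs chosen to reflect that asymmetry.
FP_COST, FN_COST = 1.0, 200.0

v1 = expected_cost(fp_rate=0.02, fn_rate=0.010, fp_cost=FP_COST, fn_cost=FN_COST)
v2 = expected_cost(fp_rate=0.05, fn_rate=0.008, fp_cost=FP_COST, fn_cost=FN_COST)
# v1 has the better false-positive rate, but v2 misses less fraud;
# under these costs v2 wins despite annoying more customers.
```

Run the same comparison with fraud cheap and blocked purchases expensive, and the ranking flips, which is exactly why the cost question has to come before the metric question.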
Speaker 2
Okay, that's interesting, because that's actually quite close to regular ML: you need a very good, representative test set, and you also need a bunch of metrics. Usually one metric is not enough; you talked about latency and false positives, and there are other metrics. And then there's what you're doing now: a story between the metrics and the actual business goal. So it's not enough to look at just the metrics; it's also about telling a story, summarizing all the information you've got and making sense of this data to then say, okay, is my model really better or not?
Speaker 1
Absolutely. I think fundamental principles apply everywhere; it's about building on top of those fundamental principles to really differentiate yourself and your science for decision-making.
Speaker 2
So, a few general questions I have on OpenAI. The first one is: where do you see the future of the field? OpenAI released Sora, which is more of a visual model, and we've got ChatGPT, which is more of a text model, so multimodality is probably part of the future there. Agents are also becoming more popular. But where do you see the future of GenAI and LLMs?
Speaker 1
You know, it's hard to predict, but I'm super excited about it. When people throw around the word AGI, what does that really mean? I find it analogous to product-market fit, which is something that not a lot of people can define or describe, but every product person or founder or CEO who has built something that reached product-market fit can attest that when you get there, you know you're there. Maybe it's because your servers aren't scaling fast enough, you can't acquire enough GPUs, and you truly can't handle all the user requests flooding your support or customer service channel. So I'm not sure how to define AGI, whether it's some number of modalities or being better on these sets of advanced placement exams in these domains. But it's something I hope for and am optimistic about, because I truly believe that AGI will help alleviate a ton of societal and global issues that are very difficult to reason about and to come up with constructive, better solutions for, across a variety of important industries, from medical to finance to tax to policy research. So I am very excited about the future of OpenAI and the fact that its mission is to build safe AGI, which I think is the critical keyword there. I think we're not too far from it, which is exciting, and some would say terrifying, but again, I'm optimistic about the good use of this type of technology, which I think can truly benefit humanity in ways nothing before it ever has.
Speaker 2
So you don't know what AGI will look like, but you think that once we get it, we'll know it, because demand will just burst. If you invent the algorithm, OpenAI will get so much demand that you'll suddenly realize it's there.
Speaker 1
Yeah. Or think about when you first interacted with ChatGPT: I'd like to think you were blown away. I'll speak for myself: I was blown away. It could do a ton of stuff, and it was impressive, hard to believe it was a machine at first glance. Let's take you back in time: if you were a four- or five-year-old, or just a child, there's probably not a lot you could ask it that it couldn't answer or help with. So some would say the current state-of-the-art large language model is AGI for a child's level of thinking and reasoning. As we as humans have advanced, again, it's this moving target of what AGI is. I think we'll know when we get there, not just because of the flood of demand, but because of positive change in areas that were stagnant for decades, if not centuries. Maybe it's cancer research: maybe we don't have to deal with losing loved ones because we've synthesized a drug that previously wasn't possible. Maybe we don't have to deal with global deprivation or hunger in third-world countries because we're able to innovate new methods for agriculture, or synthetic meat, or whatever it is. I think we will have made very significant leaps, in a short amount of time, that weren't previously possible.
Speaker 2
Yeah. And when will this happen?
Speaker 1
I hope within our lifetime; I think much sooner than that, actually. I think within 10 to 20 years is a sure bet; within five to 10 years is a risky bet. But I'm optimistic. And one thing I can brag about at OpenAI, that is different from other companies and the Googles of the world, is that the passion and talent of the people I work with on a daily basis is impressive. And so is the velocity the company prides itself on: moving quickly while ensuring that we follow safe principles that align with our mission.
