Enhancing AI Model Security Against Malicious Behavior

This chapter delves into testing the effectiveness of safety techniques in AI models to combat potential malicious acts like poisoned or backdoored models. It evaluates different training methods to address harmful behaviors and explores strategies to improve reasoning and efficiency in AI frameworks.

Episode notes

Like many of us, I've been using Gmail for roughly 20 years now, and while I've tried several alternative email clients over the years, for me, none have really stuck.

Shortwave, however, has truly impressed me, and I have continued to use it past the point of curiosity and into the realm of forming a genuinely new, and likely lasting, habit.

My favorite feature, by far, is the "AI assistant", which presents in the increasingly familiar form factor of the natural language sidebar chatbot. It can help you search through and configure your inbox, check your availability and schedule meetings, and refer to similar emails you've sent in the past so that it can imitate your style when drafting responses for you.

As someone who has long since given up on inbox zero and really just wants an AI assistant to help me navigate my overloaded inbox, I can definitely say I've had a few magical experiences with this product – the time saved in searching for things I know exist but can't quite remember the keywords for has, on its own, been delightful.

Andrew, who previously founded Firebase and has already had one company acquired by Google, was extremely open about the technology underlying Shortwave, reflecting the fact that this is no thin wrapper, and it was a ton of fun to get so deep into the details.

We covered:

Shortwave's RAG stack, which is powered by a full download and re-indexing of your entire inbox, a process that takes hours and costs Shortwave real money, but which creates a remarkably responsive experience downstream.
How the AI assistant works from user message input to AI response, including tool selection, query reformulation, feature extraction, retrieval, re-ranking, and answer generation.
Which models Shortwave is using, a list that includes Mistral, fine-tuned GPT-3.5, and GPT-4-turbo, and that is always subject to change, right now even more than usual, since Claude-3 launched just after we recorded and long-context Gemini 1.5 is on the horizon as well.
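
For readers who want a concrete picture of the stages named above, here is a minimal sketch of how tool selection, query reformulation, feature extraction, retrieval, re-ranking, and answer generation might chain together. It is a toy under loud assumptions: every function name, the `Email` type, the sample inbox, and the keyword heuristics are hypothetical stand-ins, and none of it reflects Shortwave's actual code, models, or APIs. A real system would presumably use LLM calls and an embedding index where this sketch uses simple string matching.

```python
from dataclasses import dataclass


@dataclass
class Email:
    sender: str
    subject: str
    body: str


# Hypothetical toy inbox; not Shortwave's data model.
INBOX = [
    Email("alice@example.com", "Q3 budget review", "Attached is the Q3 budget spreadsheet."),
    Email("bob@example.com", "Lunch on Friday?", "Want to grab lunch on Friday?"),
    Email("carol@example.com", "Budget follow-up", "A few questions on the budget numbers."),
]

STOPWORDS = {"the", "a", "an", "me", "my", "about", "find", "email", "emails", "from", "on", "is"}


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of punctuation-stripped words."""
    return {w.strip(".,?!:").lower() for w in text.split()} - {""}


def select_tool(message: str) -> str:
    """Tool selection (assumption: a keyword check stands in for an LLM decision)."""
    return "calendar" if "schedule" in tokenize(message) else "email_search"


def reformulate(message: str) -> set[str]:
    """Query reformulation: drop filler words and keep the search terms."""
    return tokenize(message) - STOPWORDS


def extract_features(message: str) -> dict:
    """Feature extraction: pull a sender hint from a 'from <name>' phrase, if any."""
    words = message.lower().split()
    features = {}
    if "from" in words[:-1]:
        features["sender"] = words[words.index("from") + 1].strip(".,?!")
    return features


def retrieve(terms: set[str], features: dict, inbox: list[Email]) -> list[Email]:
    """Retrieval: keep emails that pass the sender filter and share a search term."""
    hits = []
    for email in inbox:
        if "sender" in features and features["sender"] not in email.sender:
            continue
        if terms & tokenize(email.subject + " " + email.body):
            hits.append(email)
    return hits


def rerank(terms: set[str], hits: list[Email]) -> list[Email]:
    """Re-ranking: count subject matches double, body matches once."""
    def score(email: Email) -> int:
        return 2 * len(terms & tokenize(email.subject)) + len(terms & tokenize(email.body))
    return sorted(hits, key=score, reverse=True)


def generate_answer(ranked: list[Email]) -> str:
    """Answer generation (assumption: a template stands in for an LLM response)."""
    if not ranked:
        return "No matching emails found."
    return f"Found {len(ranked)} email(s): " + "; ".join(e.subject for e in ranked)


def assistant(message: str, inbox: list[Email] = INBOX) -> str:
    """Run the stages in the order listed in the episode notes."""
    tool = select_tool(message)
    if tool != "email_search":
        return f"(would hand off to the {tool} tool)"
    terms = reformulate(message)
    features = extract_features(message)
    return generate_answer(rerank(terms, retrieve(terms, features, inbox)))


print(assistant("budget emails from carol"))
```

The point of the sketch is only the shape of the flow: each stage narrows or reorders what the next stage sees, so the final generation step works from a small, relevant set of emails rather than the whole inbox.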

Finally, we discussed how Andrew thinks about building and timing product launches in such a fast-moving space, his vision for Shortwave, how he expects email to evolve, how we will deal with the inevitable rise in high-quality spam, and how deep AI integration will help manage knowledge on a company-wide basis.

If you're building with LLMs, this conversation has a ton of great nuggets that you won't want to miss.

And if you're an email user, and I'm pretty sure all of you are, I definitely recommend checking out Shortwave.

Note that this is not a paid promotion. Andrew was kind enough to give me a free year of Shortwave, but that's it. I am genuinely just super enthusiastic about this product, and you have my commitment that I will always be transparent about any sponsorship deals that we do in the future.

As always, we appreciate it when folks share The Cognitive Revolution with friends. And right now, in particular, as we are ramping up the new feed, a social media share would be especially valuable.

And of course, your feedback is always welcome – you can leave a message on our new website at CognitiveRevolution.ai, or DM me on the social media platform of your choice.

Now, let's dive deep into the AI technology powering the future of email with Andrew Lee and Shortwave.
