Hey everyone, this is Alex Volkov
This week was incredibly packed with news. It started strong on Sunday with the x.ai Grok announcement, continued Monday with all the releases from OpenAI Dev Day, then GitHub Universe Copilot announcements, and to top it all off, we postponed the live recording to see what hu.ma.ne has in store for us as far as AI devices go (they finally announced the Pin with all its features).
In between, we got a new AI unicorn from Hong Kong, 01.ai, which dropped Yi, a new SOTA 34B model with a whopping 200K context window and a commercial license, founded by ex-Google China lead Kai-Fu Lee.
Above all, this week was a monumental one for me personally. ThursdAI has been a passion project for the longest time (240 days), and it has led me to incredible places: being invited to the ai.engineer summit to do media, then getting invited to OpenAI Dev Day (to podcast from there, too), interviewing and befriending folks from HuggingFace, Github, Adobe, Google, OpenAI and of course open source friends like Nous Research and Alignment Labs, interviewing authors of papers, hackers of projects and fine-tuners, and of course all of you, who tune in from week to week. Thank you!
It's all been so humbling and fun, which makes me ever more excited to share the next chapter. Starting Monday, I'm joining Weights & Biases as an AI Evangelist!
I couldn't be more excited to continue the ThursdAI mission of spreading knowledge about AI, connecting the AI engineers and the fine-tuners, the data scientists and the GenAI folks, the super advanced cutting-edge crowd and the folks who fear AI, now with the backing of such an incredible and important company in the AI space.
ThursdAI will continue as an X space, newsletter and podcast as we gradually find a common voice, and we'll keep bringing awareness of W&B's incredible brand to newer developers, products and communities. Expect more on this very soon!
Ok, now to the actual AI news.
TL;DR of all topics covered:
* OpenAI Dev Day
* GPT-4 Turbo with 128K context, 3x cheaper than GPT-4
* Assistant API - OpenAI's new Agent API, with retrieval memory, code interpreter, function calling, JSON mode
* GPTs - Shareable, configurable GPT agents with memory, code interpreter, DALL-E, Browsing, custom instructions and actions
* Copyright Shield - OpenAI's lawyers will protect you from copyright lawsuits
* Dev Day emergency pod on Latent Space with Swyx, Alessio, Simon and me! (Listen)
* OpenSource LLMs
* 01.ai launches Yi-34B, a commercially licensed model with a 200K context window that tops the HuggingFace leaderboards across all sizes (Announcement)
* Vision
* GPT-4 Vision API finally announced, rejoice, it's as incredible as we've imagined it to be
* Voice
* OpenAI TTS models with 6 very realistic, multilingual voices (no cloning though)
* AI Art & Diffusion
* <2.5 seconds full SDXL inference with FAL (Announcement)
OpenAI Dev Day
So much to cover from OpenAI that it gets its own section in the newsletter today.
I was lucky enough to get invited to, and attend, the first-ever OpenAI developer conference (AKA Dev Day), and it was an absolute blast. It was also incredible to experience it together with the 8.5 thousand of you who tuned into our live stream on X as we walked to the event, watched the keynote together (thanks Ray for the restream) and talked with OpenAI folks about the updates. Huge shoutout to LDJ, Nisten, Ray, Phlo, Swyx and many other folks who held the space while we were otherwise engaged with deep dives, meeting folks and doing interviews!
So now for some actual reporting! What did we get from OpenAI? omg we got so much, as developers, as users (and as attendees, I will add more on this later)
GPT4-Turbo with 128K context length
The major announcement was a new model, GPT-4 Turbo, which is supposedly faster than GPT-4 while being 3x cheaper on input (2x on output), with a whopping 128K context length and better accuracy (significantly better recall and attention throughout that context length).
With JSON mode, significantly improved function calling capabilities, an updated knowledge cut-off (April 2023) and higher rate limits, this new model is already being rolled into products everywhere and is a significant upgrade for many folks.
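If you want to see what that looks like in practice, here's a minimal sketch of calling GPT-4 Turbo with JSON mode via the OpenAI Python SDK (v1.x); the model alias, prompts and key handling are assumptions for illustration, not something shown on stage:

```python
# A minimal sketch (not from the keynote): calling GPT-4 Turbo with JSON mode
# via the OpenAI Python SDK v1.x. The model alias and prompts are assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4-1106-preview",               # GPT-4 Turbo preview, 128K context
    response_format={"type": "json_object"},  # JSON mode: output is valid JSON
    messages=[
        # JSON mode expects the word "JSON" to appear somewhere in the messages
        {"role": "system", "content": "Reply with a JSON object with keys 'summary' and 'tags'."},
        {"role": "user", "content": "Summarize this week's OpenAI Dev Day announcements."},
    ],
)
print(response.choices[0].message.content)  # a JSON string you can json.loads()
```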
GPTs - A massive shift in agent landscapes by OpenAI
Another (semi-separate) thing that Sam talked about was GPTs, their version of agents (not to be confused with the Assistants API, which is also agents, but for developers; they are not the same and it is confusing).
GPTs, I think, are a genius marketing move by OpenAI, and in many regards they replace Plugins (which never quite found product-market fit).
GPTs are instances of, well... GPT-4 Turbo, that you can create by simply chatting with BuilderGPT. They can have their own custom instruction set and capabilities that you can turn on and off, like browsing the web with Bing, creating images with DALL-E, and writing and executing code with Code Interpreter (bye bye Advanced Data Analysis, we don't miss ya).
GPTs also have memory: you (and your users) can upload a bunch of documents, and the GPT will vectorize them and extract the relevant information. So think: your personal tax assistant that has all 3 years of your tax returns.
And they have eyes: GPT-4V is built in, so you can drop in screenshots, images and all kinds of combinations of things.
Additionally, you can define actions for your GPT (similar to how Plugins were developed previously, via an OpenAPI schema), and the GPT will be able to use those actions to do tasks outside of the GPT context, like send emails, check things in your documentation and much more; pretty much anything that's possible via an API is now possible via actions.
One big thing that's missing for me: GPTs are reactive, so they won't reach out to you or your user when there's something new, like a fresh email to summarize or a newly completed task, but I'm sure OpenAI will close that gap at some point.
GPTs are not Assistants; they are similar but not the same, and it's quite confusing. GPTs are created online and are then shareable via links.
Which, btw, I did: I created a GPT that uses several of the available tools, browsing for real-time weather info and date/time, and generates on-the-fly, never-seen-before weather art. It's really fun to play with, let me know what you think (HERE). The image above was generated by the Visual Weather GPT.
Unified "All tools" mode for everyone (who pays)
One tiny thing that Sam mentioned on stage is in fact huge IMO: the removal of the mode selector in ChatGPT. All premium users now get a single interface that is multi-modal on input and output (I call it MMIO) - it understands images (vision) + text on input, and can browse the web and generate images, text and graphs (as it runs code) on the output.
This is a significant capabilities upgrade for many folks who use these tools but previously had to choose between DALL-E, Browse or Code Interpreter mode. The model now intelligently selects which tool to use for a given task, and this means more and more "generality" for the models as they learn and gain new capabilities in the form of tools.
This, in addition to a MASSIVE 128K context window, means that ChatGPT has been significantly upgraded, and you still pay $20/mo. Gotta love that!
Assistant API (OpenAI Agents)
This is the big announcement for developers: we all got access to a new and significantly improved Assistants API, which improves our experience in several categories:
* Creating Assistants - Assistants are OpenAI's first foray into the world of AGENTS, and it's quite exciting! You can create an assistant via the API (not quite the same as GPTs, we'll cover the differences later), each with its own set of instructions (that you don't have to pass with every prompt), tools like code interpreter and retrieval, and functions. You can also select the model, so you don't have to use the new GPT-4 Turbo (but you should!). There's a minimal code sketch of the whole flow right after this list.
* Code Interpreter - Assistants are able to write and execute code now, which is a whole world of excitement! Having code abilities (that execute in a safe environment on OpenAI's side) is a significant boost in many regards, and many tasks require bits of code "on the fly", for example time-zone math. You no longer have to write that code yourself, you can ask your assistant.
* Retrieval - OpenAI (and apparently Qdrant!) have given all developers built-in RAG (retrieval augmented generation) capabilities plus document uploading and understanding. You can upload files like documentation via the API, or let your users upload files, and parse and extract information out of them! This is another huge thing: basically, memory is built in for you now.
* Stateful API - this API introduces the concept of threads, where OpenAI manages the state of your conversation. You can assign one user per thread, send that user's queries to the thread and just relay the responses back; you no longer have to ship the whole history back and forth! It's quite incredible, but it raises questions about pricing and counting tokens. Per OpenAI (I asked), if you want to calculate costs on the fly, you have to use the get-thread endpoint and count the tokens already in the thread (and that can be a LOT, since the context length is now 128K tokens).
* JSON mode and better function calling - you can now set the API to respond in JSON mode, an incredible improvement for devs that previously was only achievable via functions. Functions got an upgrade too, with the ability to call multiple functions. Functions are added as "actions" when creating the assistant, so you can give your assistant abilities that it executes by returning functions with the right parameters to you. Think: "set the mood" returns a function to call the smart lights, and "play" returns a function that calls the Spotify API.
* Multiple assistants can join a thread - you can create specific assistants, each with its own custom instructions, capabilities and tools, that all join the same thread with the user.
* Parallel functions - this is also new: the Assistants API can now return several functions for you to execute at once, which could enable things like scenes in a smart home. Say you want to "set the mood": several functions would be returned from the API, one that turns off the lights, one that starts the music, and one that turns on the mood lighting.
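As promised above, here's a minimal sketch of the basic Assistants flow (create an assistant, open a thread, send a message, run it) using the OpenAI Python SDK v1.x beta endpoints; the assistant name, instructions and prompt are made up for illustration:

```python
# A minimal sketch of the Assistants API flow (OpenAI Python SDK v1.x, beta
# endpoints). The assistant name, instructions and prompt are made up.
import time
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# 1) Create an assistant with its own instructions, model and tools
assistant = client.beta.assistants.create(
    name="Tax helper",
    instructions="You help the user reason about their tax documents.",
    model="gpt-4-1106-preview",
    tools=[{"type": "code_interpreter"}],
)

# 2) Create a thread - OpenAI keeps the conversation state for us
thread = client.beta.threads.create()

# 3) Add a user message to the thread (no need to resend prior history)
client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="What is 17% of $84,000?",
)

# 4) Run the assistant on the thread and poll until it finishes
run = client.beta.threads.runs.create(thread_id=thread.id, assistant_id=assistant.id)
while run.status not in ("completed", "failed", "requires_action"):
    time.sleep(1)
    run = client.beta.threads.runs.retrieve(thread_id=thread.id, run_id=run.id)

# 5) Read the latest assistant reply back from the thread
messages = client.beta.threads.messages.list(thread_id=thread.id)
print(messages.data[0].content[0].text.value)
```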
Vision
GPT-4 Vision
Finally, it's here: multimodality for developers to implement, the moment I personally have been waiting for since GPT-4 was launched (and ThursdAI started) back on March 14 (240 days ago, but who's counting).
GPT-4 Vision takes images and text and can handle many vision-related tasks, like analysis, understanding and captioning. Many folks are already splitting videos frame by frame and analyzing whole videos (in addition to running Whisper on the audio to get what is said).
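For developers, a hedged sketch of what a basic GPT-4 Vision request can look like with the Python SDK; the model alias, prompt and image URL are assumptions for illustration:

```python
# A hedged sketch of a GPT-4 Vision call (OpenAI Python SDK v1.x); the model
# alias and image URL are assumptions for illustration.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4-vision-preview",
    max_tokens=300,
    messages=[{
        "role": "user",
        "content": [  # content can mix text parts and image parts
            {"type": "text", "text": "What is happening in this frame?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/frame.jpg"}},
        ],
    }],
)
print(response.choices[0].message.content)
```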
Hackers and developers like friend of the pod Robert created quick hacks, like a browser extension that lets you select any screenshot on the page and ask GPT-4 Vision about it; another friend of the pod, SkalskiP, created a hot dog classifier Gradio space and is maintaining an awesome list of vision experiments on GitHub.
Voice
Text to speech models
OpenAI decided to help us all build agents properly, and agents need not only ears (for which they gave us Whisper, with v3 released as well) but also a voice, and we finally got TTS from OpenAI: 6 very beautiful, emotional voices that you can run very easily and cheaply. You can't generate more voices or clone yet (that's only for friends of OpenAI like Spotify and others), but you can use the 6 we got (plus a secret pirate one they apparently trained but never released!).
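If you want to try the voices, here's a tiny sketch of the speech endpoint as I understand it (Python SDK v1.x assumed; the voice, text and output file name are just examples):

```python
# A tiny sketch of the new text-to-speech endpoint (OpenAI Python SDK v1.x);
# "alloy" is one of the six built-in voices, the output file name is made up.
from openai import OpenAI

client = OpenAI()

speech = client.audio.speech.create(
    model="tts-1",          # there is also a higher-quality "tts-1-hd"
    voice="alloy",
    input="Hey everyone, this is ThursdAI!",
)
speech.stream_to_file("thursdai.mp3")  # write the returned audio to disk
```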
They sound ultra-realistic and are multilingual as well; you can just pass in different languages and voila. Friend of the pod Simon Willison created a quick CLI tool called ospeak that you can pipe text into, and it'll use your OpenAI key to read that text out with those super nice voices!
Whisper v3 was released!
https://github.com/openai/whisper/discussions/1762
The large-v3 model shows improved performance over a wide variety of languages. The plot in the release notes covers all languages where Whisper large-v3 has a lower than 60% error rate on Common Voice 15 and Fleurs, showing a 10% to 20% reduction in errors compared to large-v2.
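And if you'd rather run it locally, a minimal sketch with the open-source whisper package (assumes `openai-whisper` and ffmpeg are installed; the audio file name is made up):

```python
# A minimal local sketch using the open-source whisper package (assumes
# `pip install -U openai-whisper` and ffmpeg on PATH); the audio file is made up.
import whisper

model = whisper.load_model("large-v3")    # downloads the checkpoint on first use
result = model.transcribe("episode.mp3")  # language is auto-detected
print(result["text"])
```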
HUMANE
Humane Ai Pin is ready for pre-order at $699
The HUMANE Pin was finally announced, and here is the breakdown: they have a clever way to achieve "all-day battery life" with a hot-swap system, a magnetic battery booster that you can swap out when you run low (pretty genius TBH).
It's passive, so it's not "always listening", but apparently there is a wake word, and you can activate it by touch. It runs on the T-Mobile network (which sucks for folks like me, where T-Mobile just doesn't have reception in the neighborhood).
No apps, just AI experiences powered by OpenAI, with a laser-powered projector UI on your hand and voice controls.
AI voice input allows interactions like asking for information (with web browsing), and judging from the demo it is SIGNIFICANTLY better than Siri or "OK Google": it can rewrite your messages for you, catch you up on multiple messages and even search through them! You can ask for retrieval from previous messages.
The Pin is multimodal, with voice input and vision.
Holding down the touchpad while someone's speaking to you in a different language will automatically translate that language for you, and then translate your reply back into their language with your own intonation! Bye bye language barriers!
And with vision, you can do things like track calories by showing it what you ate, buy things you see in a store (but online), or take pictures and videos and have them all stored, transcribed, in your personal AI memory.
Starting at $699, with a $24/mo subscription that comes with unlimited AI queries, storage and service (again, T-Mobile only), a Tidal music subscription and more.
I think it's lovely that someone is trying to take on the Google/Apple duopoly with a completely re-imagined AI device, and I can't wait to pre-order mine and test it out. It will be an interesting balancing act with 2 phone numbers, and also a monthly payment that basically makes the device useless if you stop paying.
Phew, this was a big update, not to mention there's a whole 2-hour podcast I want you to listen to on top of this. Thank you for reading, for subscribing, and for participating in the community. I can't wait to finally relax after this long week (still jet-lagged) and prepare for my new Monday!
I want to send a heartfelt shoutout to my friend swyx, who not only lets me on to Latent Space from time to time (including the last recap emergency pod), but is also my lifeline to SF, where everything happens! Thanks man, I really appreciate all you've done for me and ThursdAI.
Can't wait to see you all on the next ThursdAI, and as always, replies, comments and congratulations are welcome as replies and DMs. I'd really appreciate it!
- Alex