A significant update has been made in the ChatGPT desktop app with the introduction of a search feature that allows users to find real-time answers and links to trusted sources. This feature enhances the interactivity of the application by enabling users to ask direct questions and receive instant responses. The availability of this search function aims to improve the utility of the ChatGPT platform, particularly in providing current information and adapting to user needs. With this change, ChatGPT is positioned to better compete with other platforms that integrate direct search capabilities.
A Halloween-themed AI toy named Fester has been developed, showcasing innovative technology and integration with AI features. Fester interacts with users through voice commands and can recognize various costumes, demonstrating the potential of AI in entertainment and education during festive occasions. The toy leverages machine learning models to create engaging experiences, making it a unique addition to the realm of AI-powered devices. This project serves as an example of how AI can be fun and functional, enriching children's Halloween experiences.
Microsoft has released OmniParser, an advanced UI parsing model designed for web automation tasks. This state-of-the-art tool aims to enhance user interaction with applications by understanding web interfaces better than previous models. With OmniParser, users can expect improved performance in navigating and automating tasks on the web, allowing for more efficient use of various applications. This release reflects ongoing advancements in AI technology, particularly in making software more accessible and user-friendly.
ZhipuAI's GLM-4-Voice model has made its debut in the open-source community, marking a notable advancement in voice processing capabilities. Unlike many previous voice models, GLM-4-Voice can understand and speak both Chinese and English, making it useful to a diverse user base. As an end-to-end voice model, it opens up applications in broader areas, including customer service and accessibility. Its introduction is a meaningful step toward more sophisticated human-computer interaction through voice.
Meta has released LongVU, a video language model focused on comprehension of long video content. By analyzing video sequences in context, LongVU can summarize extended footage and extract meaningful insights from it. This is particularly relevant in a world increasingly centered on video media, giving users a way to derive value from long-form content. By addressing core challenges in video understanding, LongVU aims to redefine how we interact with video resources online.
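The TL;DR below notes that LongVU downsamples frames using DINOv2 features to drop redundant scenes, fuses DINOv2 and SigLIP features, and passes a reduced set of visual tokens to a Qwen2 or Llama-3.2-3B backbone. Here is a rough, purely illustrative sketch of that frame-reduction idea; the random tensors stand in for real DINOv2/SigLIP features, and the threshold and token budget are made-up numbers, not LongVU's actual code or hyperparameters.

```python
import torch
import torch.nn.functional as F

def drop_redundant_frames(frame_feats: torch.Tensor, sim_threshold: float = 0.95):
    """Keep a frame only if it differs enough from the last kept frame.

    frame_feats: (num_frames, dim) per-frame features (in LongVU these would
    come from DINOv2; here they are whatever you pass in).
    """
    kept = [0]
    for i in range(1, frame_feats.size(0)):
        sim = F.cosine_similarity(frame_feats[i], frame_feats[kept[-1]], dim=0)
        if sim < sim_threshold:  # frame is not a near-duplicate of the last kept one
            kept.append(i)
    return kept

# Toy demo with stand-in features (not real DINOv2/SigLIP outputs).
num_frames, dim = 512, 256
dino_feats = torch.randn(num_frames, dim)    # stand-in "DINOv2" features
siglip_feats = torch.randn(num_frames, dim)  # stand-in "SigLIP" features

kept_idx = drop_redundant_frames(dino_feats)

# "Fuse" the two feature streams for surviving frames (simple concat here),
# then cap the total token budget before handing off to the language model.
fused = torch.cat([dino_feats[kept_idx], siglip_feats[kept_idx]], dim=-1)
token_budget = 2048  # hypothetical context limit
visual_tokens = fused[:token_budget]
print(f"kept {len(kept_idx)}/{num_frames} frames -> {visual_tokens.shape[0]} visual tokens")
```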
OpenAI has introduced a new evaluation benchmark called SimpleQA, designed to improve the accuracy of AI responses by minimizing hallucinations. SimpleQA emphasizes correctness by providing a grading system that categorizes responses as correct, incorrect, or not attempted. This initiative is part of OpenAI's ongoing efforts to enhance the quality and reliability of responses generated by their models. By pushing for higher standards in AI assessments, SimpleQA aims to foster trust among users regarding AI reliability.
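Since the benchmark grades every answer into one of three buckets, here is a small, hypothetical scoring helper showing how those labels might be rolled up into headline numbers; the metric names and sample data are my own illustration, not OpenAI's actual eval code.

```python
from collections import Counter

# Each graded response carries one of SimpleQA's three categories.
grades = ["correct", "incorrect", "not_attempted", "correct", "incorrect",
          "correct", "not_attempted", "correct"]  # toy data, not real results

counts = Counter(grades)
total = len(grades)
attempted = counts["correct"] + counts["incorrect"]

overall_correct = counts["correct"] / total
correct_given_attempted = (counts["correct"] / attempted) if attempted else 0.0

# A model that declines to guess ("not_attempted") gives up some overall accuracy
# but can keep its correct-given-attempted rate high, which is one way to read
# calibration on this kind of benchmark.
print(f"overall correct:         {overall_correct:.2%}")
print(f"correct given attempted: {correct_given_attempted:.2%}")
print(f"not attempted:           {counts['not_attempted'] / total:.2%}")
```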
At GitHub Universe, the company announced that developers will now be able to choose from multiple AI models for software development. This includes the integration of models like Claude and Gemini alongside OpenAI models, giving developers the flexibility to select the best-performing model for their needs. In addition, GitHub has launched Spark, a new tool that lets developers build micro-apps using natural language. These innovations mark a significant evolution in software development practices and demonstrate GitHub's commitment to improving the developer experience.
Hey everyone, Happy Halloween! Alex here, coming to you live from my mad scientist lair! For the first-ever live video stream of ThursdAI, I dressed up as a mad scientist and had my co-host, Fester the AI-powered skeleton, join me (as well as my usual co-hosts, haha) for a very energetic and hopefully entertaining video stream!
Since it's Halloween today, Fester (and I) have a very busy schedule, so no super-long ThursdAI newsletter today. We're still not at the point where Gemini can write a decent draft that takes everything we talked about and covers all the breaking news, so I'm afraid I'll have to wish you a Happy Halloween and ask that you watch/listen to the episode.
The TL;DR and show links from today don't cover all the breaking news, but the major things we saw (and caught live on the show as breaking news) were: ChatGPT now has search, and Gemini has grounded search as well (it seems OpenAI's streak of releasing something right before Google announces it continues).
Here's a quick trailer of the major things that happened:
This week's buzz - Halloween AI toy with Weave
In this week's buzz, my long-awaited Halloween project is finally live and operational!
I've posted a public Weave dashboard here, and the code (which you can run on your Mac!) here
Really looking forward to seeing all the amazing costumes the kiddos come up with and how Gemini responds to them. Follow along!
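If you want a feel for what the toy is doing under the hood, here is a minimal sketch of the kind of loop it runs, assuming a Gemini vision call traced with Weave; the project name, model choice, prompt, and file paths are placeholders of mine, not the actual code linked above.

```python
import weave
import google.generativeai as genai
from PIL import Image

# Trace every costume-recognition call to a Weave dashboard (placeholder project name).
weave.init("halloween-fester")
genai.configure(api_key="YOUR_GEMINI_API_KEY")
model = genai.GenerativeModel("gemini-1.5-flash")

@weave.op()
def greet_costume(photo_path: str) -> str:
    """Ask Gemini to identify the costume and write a spooky greeting."""
    image = Image.open(photo_path)
    prompt = (
        "You are Fester, a friendly Halloween skeleton. "
        "Describe the costume in this photo and greet the kid in one spooky sentence."
    )
    response = model.generate_content([prompt, image])
    return response.text

if __name__ == "__main__":
    print(greet_costume("doorbell_snapshot.jpg"))  # placeholder image path
```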
Ok, and finally, my raw TL;DR notes and links for this week. Happy Halloween everyone, I'm running off to spook the kiddos (and of course record and post about it!)
ThursdAI - Oct 31 - TL;DR
TL;DR of all topics covered:
* Open Source LLMs:
* Microsoft's OmniParser: SOTA UI parsing (MIT Licensed)
* Groundbreaking model for web automation (MIT license).
* State-of-the-art UI parsing and understanding.
* Outperforms GPT-4V in parsing web UI.
* Designed for web automation tasks.
* Can be integrated into various development workflows.
* ZhipuAI's GLM-4-Voice: End-to-end Chinese/English speech
* End-to-end voice model for Chinese and English speech.
* Open-sourced and readily available.
* Focuses on direct speech understanding and generation.
* Potential applications in various speech-related tasks.
* Meta releases LongVU: Video LM for long videos
* Handles long videos with impressive performance.
* Uses DINOv2 for downsampling, eliminating redundant scenes.
* Fuses features using DINOv2 and SigLIP.
* Select tokens are passed to Qwen2/Llama-3.2-3B.
* Demo and model are available on HuggingFace.
* Potential for significant advancements in video understanding.
* OpenAI's new factuality benchmark (Blog, GitHub)
* Introducing SimpleQA: new factuality benchmark
* Goal: high correctness, diversity, challenging for frontier models
* Question Curation: AI trainers, verified by second trainer
* Quality Assurance: 3% inherent error rate
* Topic Diversity: wide range of topics
* Grading Methodology: "correct", "incorrect", "not attempted"
* Model Comparison: smaller models answer fewer correctly
* Calibration Measurement: larger models more calibrated
* Limitations: only for short, fact-seeking queries
* Conclusion: drive research on trustworthy AI
* Big CO LLMs + APIs:
* ChatGPT now has Search! (X)
* Grounds results by browsing the web
* Still hallucinates
* Reincarnation of SearchGPT inside ChatGPT
* Apple Intelligence Launch: Image features for iOS 18.2
* Officially launched for developers in iOS 18.2.
* Includes Image Playground and Genmoji.
* Aims to enhance image creation and manipulation on iPhones.
* GitHub Universe AI News: Copilot expands, new Spark tool
* GitHub Copilot now supports Claude, Gemini, and OpenAI models.
* GitHub Spark: Create micro-apps using natural language.
* Expanding the capabilities of AI-powered coding tools.
* Copilot now supports multi-file edits in VS Code, similar to Cursor, and faster code reviews.
* GitHub Copilot extensions are planned for release in 2025.
* Grok Vision: Image understanding now in Grok
* Finally has vision capabilities (currently via X, API coming soon).
* Can now understand and explain images, even jokes.
* Early version, with rapid improvements expected.
* OpenAI advanced voice mode updates (X)
* Input tokens are 70% cheaper thanks to automatic caching (X)
* Advanced voice mode is now in the desktop app
* Claude - new Mac / PC desktop app released this morning
* This week's Buzz:
* My AI Halloween toy skeleton is greeting kids right now (and reporting to a Weave dashboard)
* Vision & Video:
* Meta's LongVU: Video LM for long videos (see Open Source LLMs for details)
* Grok Vision on X (see Big CO LLMs + APIs for details)
* Voice & Audio:
* MaskGCT: New SoTA Text-to-Speech
* New open-source state-of-the-art text-to-speech model.
* Zero-shot voice cloning, emotional TTS, long-form synthesis, variable speed synthesis, bilingual (Chinese & English).
* Available on Hugging Face.
* ZhipuAI's GLM-4-Voice: End-to-end Chinese/English speech (see Open Source LLMs for details)
* Advanced Voice Mode on Desktops (see Big CO LLMs + APIs for details).
* AI Art & Diffusion:
* Recraft's Red Panda: new SOTA image diffusion
* High-performing image diffusion model, beating Black Forest Labs' Flux.
* 72% win rate, higher ELO than competitors.
* Creates SVG files, editable as vector files.
* From Recraft V3.
* Tools:
* Bolt.new by StackBlitz: In-browser full-stack dev environment
* Platform for prompting, editing, running, and deploying full-stack apps directly in your browser.
* Uses WebContainers.
* Supports npm, Vite, Next.js, and integrations with Netlify, Cloudflare, and Supabase.
* Free to use.
* Jina AI's Meta-Prompt: Improved LLM Codegen