#128 - Big & Small Data in 2023 w/ Joe Reis & Matt Housley
May 29, 2023
Debate between big and small data in 2023. Exploration of AI and ML, massive LLMs vs. small data alternatives. Will Big Data reign supreme or will small data become more popular?
The podcast debates whether big data will keep its dominance over small data in the data engineering and AI landscapes.
Training and maintaining large language models is costly, and doubts about whether they confer a lasting competitive advantage shift the emphasis to data quality.
Tools like DuckDB highlight the shift towards small data processing on individual machines for efficient operations.
Deep dives
Evolution of Big Data and Google File System
The podcast delves into the evolution of big data, starting in the late '90s with search engines like AltaVista and moving on to the Google File System (GFS) and MapReduce. The discussion highlights how companies like John King's, facing large data problems back in the mid-2000s, needed tools like Hadoop to process vast amounts of data. It touches on the hype surrounding big data, the shift in focus from technologies to data sizes, and current definitions of data scalability and complexity.
Challenges and Risks with Training Large Language Models
The podcast explores the challenges and risks associated with training large language models like LLaMA and GPT-4. It discusses the significant costs involved, estimating around $10 million to train GPT-4, and the need for multiple GPUs. The conversation also touches on the leaked Google memo that casts doubt on the competitive advantages of large language models, emphasizing the importance of data quality over scale for effective problem-solving.
Transition towards Small Data and Localized Computing
Turning to small data and localized computing, the podcast mentions tools like DuckDB that process data efficiently on a single machine. It emphasizes the shift from scaling out to scaling up, which means getting the most out of one machine's resources for quicker operations. The discussion highlights the value of aligning technology with actual customer needs, distinguishing scenarios that require big data tools from those where small data and single-machine processing are more practical.
ChatGPT's Role in Writing Projects
ChatGPT serves as a tool for enhancing writing projects by offering stylistically distinct perspectives and aiding in brainstorming. While it can be used for routine tasks such as data modeling descriptions, it is less effective for generating entire book content, because the output needs substantial editing and refining to maintain the author's voice and style.
Impact of Generative AI on Writing and Editing
Generative AI like ChatGPT acts as a creative assistant for writers who hit blocks in their writing process, offering new ideas and helping the content flow. While it is primarily useful for suggesting new directions or topics, users must rework the generated text to match their personal voice and refine it for readability. Generative AI tools may also be better suited to editing tasks, where they suggest ways to restructure sentences and paragraphs to improve overall quality.
There's a lot of debate on big and small data. For systems and compute, some say "Big Data is Dead", while others challenge this notion. In AI and ML, big tech companies can pour tons of money and data into building massive LLMs, while open source provides compelling "small data" alternatives to the LLM walled gardens. So which is it? Will Big Data reign supreme or will small data become more popular? Matt and I riff on these topics and more. #data #dataengineering #chatgpt #ai #bigdata