

Data Driven
Data Driven: the podcast where we explore the emerging field of Data Science. We bring the best minds in Data, Software Engineering, Machine Learning, and Artificial Intelligence right to you every Tuesday.
The field of data science mashes up the worlds of statistics, database architecture, and software engineering. Data Scientist has been labelled by the Harvard Business Review as "the sexiest job of the 21st century." A quick search of job sites reveals that this field is in high demand.
In a world where Data is the new Oil, Data Science the new Refineries, consider this Car Talk for the Data Age. Every week we bring the best minds in this emerging field straight to you. Our goal is to educate and inspire our listeners so that they can be prepared to thrive in a Data Driven world.
Episodes

Jul 18, 2023 • 57min
Lauren Maffeo on Data Governance from the Ground Up
Data governance expert, Lauren Maffeo, joins Frank and Andy Leonard to discuss the importance of data governance in relation to generative AI, copyright infringement, and protecting consumer rights. They explore the challenges faced by startups in implementing data governance, the need for proactive cybersecurity measures, and the cultural transformation required for successful implementation. It is a thought-provoking discussion that provides insights into the complexities and potential solutions related to data governance in today's data-driven world.

Jul 3, 2023 • 35min
Lauren Tickner on Strategies for Building a Personal Brand
On this episode of Data Driven, BAILeY and Frank La Vigne welcome special guest Lauren Tickner to discuss strategies for maximizing time and success in the digital age. Lauren shares her insights on motivation, dealing with online haters, and the power of automation in business. The conversation delves into the importance of understanding risks and rewards, breaking free from traditional career paths, and the benefits of working in startups or entrepreneurial businesses. Lauren also provides valuable tips on social media content creation, utilizing storytelling and personalization to engage readers. Additionally, she introduces the PASTA framework for creating compelling social media posts and shares her approach to tracking and optimizing the client journey.
Moments
[00:01:16] The podcast uses a British voiceover actor to differentiate from East Coast accents. An AI voice named Bailey was later used, which can now be animated.
[00:06:19] Successful asset manager quits job to pursue fitness career using social media. Simplifies life and focuses on selling premium packages. Finds success with minimal monthly sales.
[00:08:05] The speaker discusses their upbringing in New York and the pressure to work in the financial industry. They admire the listener's decision to break free from that path and simplify things. They also comment on the listener's sense of humor and social media presence.
[00:13:00] To simplify social media content creation: automate posting to multiple platforms, identify 5 topics to focus on, add personal storytelling to engage readers, and include a call to action to prompt specific actions.
[00:19:41] The text discusses creating and sharing content for three different audience groups based on their familiarity with the author. It suggests using different types of content for each group, such as introducing oneself to new audiences, showcasing expertise to familiar audiences, and offering opportunities to become clients. The author also talks about segmenting content into top, middle, and bottom of the funnel, and using different calls to action to gauge audience interest.
[00:24:09] Data shows that clients who watch 2 case studies before joining stay longer. We track client journey and added quick welcome call within 4 hours of joining for positive experience. Pooled calendar allows immediate availability for calls.
[00:27:46] The author explains their approach to managing their business, aiming for a smaller internal company and owning multiple businesses rather than having a large team and many clients.
[00:31:58] We should focus on the potential benefits, not just the downsides. Make realistic lists of what could go right and wrong. Replace "time" with "life" to make better decisions. Consider leaving high-paid jobs for startups or entrepreneurial businesses. Showcase the value you can bring to companies.
[00:34:17] The speaker finds the content interesting and praises the concept, emphasizing the key takeaway. They inquire about finding more information.

Jun 27, 2023 • 1h 3min
Steve Orrin on the Importance of Hardware in AI Development
On this episode of Data Driven, the focus is on hardware, from AI-optimized chips to edge computing. Frank and Andy interview Steven Orrin, the CTO of Intel Federal. Intel has developed new CPU instructions to accelerate AI workloads, and FPGAs allow for faster development in custom applications with specific needs. The speaker emphasizes the importance of data curation and wrangling before jumping into machine learning and AI.
Links
Webinar: AI application benchmarking on Intel hardware through Red Hat OpenShift Data Science Platform. Register here: https://qrcodes.at/RHODSIntelBenchmarkingWebinar
Get a free audiobook on us! http://thedatadrivenbook.com/
Moments
00:01:59 Hardware and software infrastructure for AI.
00:07:18 AI benchmarks show importance of GPUs & CPUs.
00:14:08 Habana is a two-chip strategy offering AI accelerator chips designed for training flows and inferencing workloads. It is available in the Amazon cloud and data centers. The Habana chips are geared for large-scale training and inference tasks, and they scale with the architecture. One chip, Goya, is for inferencing, while the other chip, Gaudi, is for training. Intel also offers CPUs with added instructions for AI workloads, as well as GPUs for specialized tasks. Custom approaches like using FPGAs and ASICs are gaining popularity, especially for edge computing where low power and performance are essential.
00:19:47 Intel's diverse team stays ahead of AI trends by collaborating with specialists and responding to industry needs. They have a large number of software engineers focused on optimizing software for Intel architecture, contributing to open source, and providing resources to help companies run their software efficiently. Intel's goal is to ensure that everyone's software runs smoothly and continues to raise the bar for the industry.
00:25:24 Moore's Law drives compute by reducing size. Cloud enables cost-effective edge use cases. Edge brings cloud capabilities to devices.
00:31:40 FPGA is programmable hardware allowing customization. It has applications in AI and neuromorphic processing. It is used in cellular and RF communications. Can be rapidly prototyped and deployed in the cloud.
00:41:09 Started in biology, became a hacker, joined Intel.
00:48:01 Coding as a viable and well-paying career.
00:55:50 Looking forward to image-to-code and augmented reality integration in daily life.
01:00:46 Tech show, similar to Halt and Catch Fire.
Key Topics:
- The role of infrastructure in AI
- Hardware optimization for training and inferencing
- Intel's range of hardware solutions
- Importance of software infrastructure and collaboration with the open source community
- Introduction to Habana AI accelerator chips
- The concept of collapsing data into a single integer level
- Challenges and considerations in data collection and storage
- Explanation and future of FPGAs
- Moore's Law and its impact on compute
- The rise of edge computing and its benefits
- Bringing cloud capabilities to devices
- Importance of inference and decision-making on the device
- Challenges in achieving high performance and energy efficiency in edge computing
- The role of diverse teams in staying ahead in the AI world
- Overview of Intel Labs and their research domains
- Intel's software engineering capabilities and dedication to open source
- Intel as collaborators in the industry
- Importance of benchmarking across different AI types and stages
- The role of CPUs and GPUs in AI workloads
- Optimizing workload through software to hardware
- Importance of memory in memory-intensive activities
- Security mechanisms in FPGAs
- Programming and development advantages of FPGAs
- Resurgence of FPGAs in AI and other domains
Key Facts about the Speaker:
- Background in molecular biology bioresearch
- Transitioned to hacking and coding
- Started first company in 1995
- Mentored by Bruce Schneier
- Joined Intel in 2005
- Worked on projects related to antimalware technologies, cloud security, web security, and data science
- Transitioned to the federal team at Intel

Jun 16, 2023 • 16min
*DataPoint* Accelerating AI with Python-native Ray and the Importance of Open Source in AI
On this episode of Data Driven, we explore the topic of distributed computing frameworks for AI and ML workloads. Frank discusses the advancements of Ray, a new technology based on the Python language, with performance enhancements that could range from 10-12 times faster to thousands of times faster in extreme cases. We delve into the power of open source artificial intelligence and how it can aid data endeavors to accelerate these efforts. Along the way, we touch upon IBM and Red Hat's partnership, the evolution of technology, the importance of problem-specific solutions, and more. Stay tuned for a new episode of "Data Driven" and a special segment from our speaker on the potential AI holds for our future.
[00:01:50] Ray is a new computing framework for AI/ML, may replace Spark, based on Python, can free people from PySpark.
[00:03:49] Speaker has a MacBook M2 and prefers it over Windows. They enjoy stream-side streaming and wrote an article prompted by a question at work about a new technology claiming to be the next big data processing framework. They believe Ray still has an advantage.
[00:06:51] Webinar about power of IBM-Red Hat partnership in AI. Speaker mentions travel with family and introduces production assistant.
[00:11:34] Tech anticipated, surprised by speed of ChatGPT. Some dismiss it as a fad, but it differs from predictive text the way an Airbus A380 differs from a paper airplane: based on the same principles but very different in implementation and technology.
[00:13:30] Encourage attendance at AI webinar showcasing ethical concerns. Open source needed for transparency and risk-sharing. AI impact on all, even entry-level jobs and economy.

May 23, 2023 • 50min
Saket Saurabh on Automating Data Engineering
In this episode of Data Driven, Frank and Andy get back to the data engineering side of the equation by speaking with Saket Saurabh, CEO and co-founder of Nexla. Nexla specializes in tools for automating data engineering processes.

May 16, 2023 • 1h 13min
Celebrating 6 years of Data Driven and a New Magazine
Welcome to the grand premiere of Season 7 of the Data Driven Podcast! In this inaugural episode of our seventh season, Andy and Frank interview each other, announce their new project, and more!
Now, let us start season 7 with the promise that our season 7 will be better than Game of Thrones' Season 7. The north remembers, as you know.

May 9, 2023 • 57min
Albert Castellana on Generative AI Agents
On this episode of the Data Driven podcast, Frank and Andy interview Albert Castellana, Co-Founder and CEO at Yeager AI. Yeager as in Chuck Yeager and AI as in generative AI.
Stay tuned for a fascinating discussion on the nature of NLP models, entrepreneurship, and good Barcelona coffee.
Links
https://yeager.ai/
https://www.linkedin.com/in/acastellana/
https://www.youtube.com/@TheWhyFiles
PS. I know that last week I said that show number 326 would be the last of season six. This just goes to show you that you cannot always trust what an AI tells you. -BAILeY

May 2, 2023 • 60min
Bryan DeBois on Autonomous Industrial AI
Bryan DeBois, Director of Industrial AI at RoviSys, discusses the evolution of industrial AI, cybersecurity in manufacturing, blockchain in supply chains, bias removal in AI, and the intersection of data and industry. Personal anecdotes about family roller coaster adventures and book recommendations add a touch of fun to the conversation.

Apr 25, 2023 • 53min
W. Curtis Preston on the Future of Backup Tech
In this episode, Frank interviews W. Curtis Preston on the Future of Backup Tech after getting nostalgic about T1 lines, tapes, and the Y2K bug.

Apr 12, 2023 • 46min
Thomas Yionoulis on Going from Stand Up to Start Up
On today's episode of the Data Driven podcast, Andy interviews Tommy Yionoulis, founder of Ops Analitica. As a former stand-up comic turned SaaS founder, he has extensive experience helping businesses become more efficient and profitable through process, accountability, and data.