Simon Willison: How Datasette Helps with Investigative Reporting (Part 2)
Dec 2, 2023
Simon Willison, former software architect at The Guardian and a JSK Journalism Fellow at Stanford University, built Datasette to help journalists analyze data. He shares fun use cases of ChatGPT and discusses the intersection of data journalism and coding with GPT-3. The conversation also explores using ChatGPT to write satirical news articles and addresses bias in story generation.
Datasette: An Open Source Tool for Data Exploration and Publishing
Datasette is an open source project created by Simon Willison. Its aim is to help news organizations publish the data behind their stories: making the numbers available to the audience makes those stories more impactful and trustworthy. The idea for Datasette came while Willison worked at The Guardian, where he saw the value of publishing data alongside stories. The first version of Datasette was a web application that let people browse and explore data through a web interface and a JSON API. Willison then added a plugin system, similar in spirit to WordPress plugins, which lets users add features such as mapping or different kinds of data transformation. There are currently about 130 Datasette plugins available. Willison also recognized a gap around medium-sized data sets: too large for Excel, yet not something journalists should need advanced coding skills to work with. To address this, Datasette has been evolving to serve that middle ground, with tools for analysis, visualization, and sharing within newsrooms. Willison is also developing a hosted version called Datasette Cloud, which lets newsrooms use the software without installing it locally.
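To make the "JSON API" mentioned above concrete, here is a minimal sketch of how a newsroom script might pull query results out of a Datasette instance using only the Python standard library. It assumes the `/<database>.json?sql=...` endpoint and the `_shape=array` option described in Datasette's JSON API documentation; the host, database name, and query are placeholders, and the helper functions are hypothetical names introduced for illustration.

```python
import json
import urllib.request
from urllib.parse import urlencode


def datasette_sql_url(base_url: str, database: str, sql: str) -> str:
    """Build a Datasette JSON API URL that runs a read-only SQL query.

    Assumes the /<database>.json?sql=...&_shape=array pattern, where
    _shape=array asks Datasette to return rows as a plain list of objects.
    """
    query = urlencode({"sql": sql, "_shape": "array"})
    return f"{base_url.rstrip('/')}/{database}.json?{query}"


def parse_rows(payload: str) -> list:
    """Decode a _shape=array response body into a list of row dicts."""
    rows = json.loads(payload)
    if not isinstance(rows, list):
        raise ValueError("expected a JSON array of rows")
    return rows


def fetch_rows(base_url: str, database: str, sql: str) -> list:
    """Fetch and decode rows over HTTP (requires network access)."""
    url = datasette_sql_url(base_url, database, sql)
    with urllib.request.urlopen(url) as response:
        return parse_rows(response.read().decode("utf-8"))
```

For example, `fetch_rows("https://datasette.example.org", "contracts", "select name from vendors")` would return a list of dictionaries, one per row, ready for further analysis. The URL-building and parsing steps are kept separate from the network call so they can be checked without a live server.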
The Transformative Role of Generative AI in Data Analysis
Simon Willison discusses the transformative role of generative AI in data analysis. He highlights ChatGPT's new Code Interpreter feature, which has the model write and run code so that users can analyze data in a more approachable way. Willison explains that Datasette can also benefit from generative AI by building AI features into the software. One example is a feature he recently developed called enrichments, which lets users extract specific information from a table by running GPT-3.5 calls against its rows, enabling AI-assisted analysis. Willison describes his ambition to use AI to generate leads for journalists, emphasizing that AI can help uncover stories in data as long as its output is carefully managed, fact-checked, and verified.
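The enrichment idea described above boils down to a simple loop: for each row, build a prompt from one column, send it to a language model, and store the answer in a new column. The sketch below illustrates that general pattern only; it is not the code or API of Datasette's enrichments feature. The `ask_model` callable is a hypothetical stand-in for a real GPT-3.5 call, and the column names and prompt are invented for the example.

```python
from typing import Callable, Iterable


def enrich_rows(
    rows: Iterable[dict],
    source_column: str,
    target_column: str,
    prompt_template: str,
    ask_model: Callable[[str], str],
) -> list:
    """Run one model call per row and store the reply in a new column.

    `ask_model` is a hypothetical stand-in for a hosted model call
    (e.g. GPT-3.5): it takes a prompt string and returns reply text.
    """
    enriched = []
    for row in rows:
        prompt = prompt_template.format(text=row[source_column])
        answer = ask_model(prompt).strip()
        # Copy the row so the original data stays untouched and the
        # model's answer can be checked against it during verification.
        new_row = dict(row)
        new_row[target_column] = answer
        enriched.append(new_row)
    return enriched


if __name__ == "__main__":
    # A fake model so the example runs offline; swap in a real API call.
    def fake_model(prompt: str) -> str:
        return "Yes" if "contract" in prompt.lower() else "No"

    rows = [
        {"id": 1, "text": "City contract awarded to a new vendor"},
        {"id": 2, "text": "Park reopened after renovation"},
    ]
    for row in enrich_rows(rows, "text", "mentions_contract",
                           "Summarize: {text}", fake_model):
        print(row)
```

Writing the model's answer into a separate column, rather than overwriting the source data, keeps the AI output clearly labeled as a lead to be fact-checked, in line with the caution Willison stresses.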
Privacy and Security Concerns in AI Applications
Simon Willison acknowledges the privacy and security concerns in AI applications, particularly with OpenAI models such as GPT. He highlights the risk of data leakage, especially when using the API or relying on hosted versions of the software. Willison explains that although AI companies say they do not train models on submitted data, there is still uncertainty and little transparency about how that data is used and whether it is vulnerable to breaches. He urges caution, especially in sensitive areas of reporting, and suggests running models on local hardware for added security. Willison also notes the security vulnerabilities that can arise with hosted software and describes the ongoing work to keep Datasette secure, in both the hosted version and the open source version.
AI as a Productivity Boost for Programming and Beyond
Simon Willison highlights how AI, particularly GPT, has become a valuable productivity tool for programming. He estimates that AI assistance makes him 2 to 5 times more productive as a programmer. AI has significantly enhanced his programming workflow by helping with code completion, suggesting API methods, and assisting with debugging. Beyond programming, Willison describes how AI can help with brainstorming, writing, and even entertainment. He encourages people to learn to code, since AI assistance can speed up the learning process and make it less frustrating and more enjoyable. He envisions a future where AI empowers professionals in many fields, helping them automate tedious tasks, generate ideas, and work more productively.
In this second part of the episode with Simon Willison, he shares how Datasette, the open-source data exploration and publishing tool he built, can help journalists perform data analysis with minimal technical expertise. He also shares some fun ways he uses ChatGPT in his personal life.
Simon, a former software architect at The Guardian and a JSK Journalism Fellow at Stanford University, currently works full-time building open-source tools for data journalism. Before becoming an independent open-source developer, Simon was an engineering director at Eventbrite. He is also known as the co-creator of the Django web framework, a key tool in Python web development.
If you're intrigued to discover how Datasette works and how it can help you in your newsroom, don't miss the opportunity to connect directly with Simon Willison.
🎧 Tune in to hear how AI has the potential to help amplify data journalism
🔔 Course registration is now open for the Wonder Tools X Newsroom Robots Generative AI for Media Pros Masterclass, a live cohort-based course taught by Jeremy Caplan and Nikita Roy.