Unleashing Unconstrained News Knowledge Graphs to Combat Misinformation // Robert Caulk // #279
Dec 20, 2024
Robert Caulk, the founder of Emergent Methods and an expert in large-scale applications, discusses the cutting-edge development of unconstrained knowledge graphs to counter misinformation. He reveals how new tools allow for the processing of vast amounts of news data more efficiently. The podcast explores the integration of knowledge graphs with AI, enhancing user interaction and the fight against false narratives. Caulk emphasizes the ethical challenges of data handling and the role of advanced AI models in improving sentiment analysis, showcasing a future of responsible information management.
The podcast contrasts traditional ontologies with ontology-free approaches in knowledge graph construction, emphasizing the need for adaptability in evolving data relationships.
Large Language Models significantly enhance the ability to process and communicate complex information succinctly, optimizing analysis of vast datasets in cost-effective formats.
Ensuring the quality of news-related data requires a robust filtering system combining human oversight and automated checks to combat misinformation effectively.
Deep dives
Understanding Knowledge Graphs and Ontology
Knowledge graphs serve as a robust framework for connecting diverse information sources, enabling complex reasoning over relationships. The podcast delves into the ongoing debate about utilizing traditional ontologies versus an ontology-free approach in building knowledge graphs. Traditional ontologies provide structured relationships, allowing for efficient data filtering but can become cumbersome when new relationships emerge, as constant updates are necessary to capture evolving connections. In contrast, an ontology-free model allows for emergent relationships to form naturally, providing high-resolution connections that emphasize critical insights without imposing predefined structures.
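The contrast can be sketched in a few lines of Python. This is a minimal illustration, not the approach used at AskNews: the `Triple` type, the `ONTOLOGY` set, and the example entities and relation names are all hypothetical. It shows how a fixed ontology silently drops a relationship it was never told about, while an ontology-free build keeps it.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    relation: str
    obj: str


# Hypothetical fixed ontology: the only relation types a constrained graph accepts.
ONTOLOGY = {"ACQUIRED", "LOCATED_IN", "EMPLOYED_BY"}


def build_constrained(triples: list[Triple]) -> list[Triple]:
    """Keep only triples whose relation is predefined in the ontology."""
    return [t for t in triples if t.relation in ONTOLOGY]


def build_unconstrained(triples: list[Triple]) -> list[Triple]:
    """Keep every extracted triple; relation types emerge from the data."""
    return list(triples)


# Hypothetical triples extracted from news text.
extracted = [
    Triple("Acme Corp", "ACQUIRED", "Widget Ltd"),
    Triple("Acme Corp", "ACCUSED_OF_PRICE_FIXING_BY", "Regulator X"),
    Triple("Widget Ltd", "LOCATED_IN", "Lyon"),
]

constrained = build_constrained(extracted)
unconstrained = build_unconstrained(extracted)

# Relationships the fixed ontology could not represent.
dropped = [t for t in unconstrained if t not in constrained]
print(dropped)
```

The trade-off runs both ways: the constrained graph is easy to filter by relation type, whereas the unconstrained one retains the novel relationship at the price of an open-ended relation vocabulary.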
Navigating Infinite Relationships
The conversation highlights the challenge of dealing with infinite relationships, which can undermine the efficacy of graph databases. While traditional knowledge graphs struggle with the vast and dynamic nature of relationships, the discussed approach relies on high-level reasoning capabilities to manage them. Instead of scrambling to define every conceivable connection, this method allows for dynamic understanding and efficient querying of relevant information. This adaptability is especially crucial given the overwhelming number of relationships present in a real-time data scenario, such as news articles.
The Role of LLMs in Knowledge Representation
Large Language Models (LLMs) play a pivotal role in converting complex information into concise formats suitable for further analysis. The podcast illustrates how using LLMs enhances the capacity to communicate intricate relationship data in a token-efficient manner. By representing information in structures like JSON, analysts can save significantly on token usage while conveying essential relationships. This lowers cost and gives downstream analysis of large datasets a more compact foundation to work from.
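As a rough illustration of the idea, the sketch below compares a hypothetical article sentence with a compact JSON encoding of the relationships it contains. The article text, the entities, and the short keys `s`, `r`, and `o` are assumptions made for the example, and character count is only a crude proxy for token count, which depends on the tokenizer in use.

```python
import json

# Hypothetical source text for one news item.
article = (
    "Acme Corp announced on Tuesday that it has completed its long-rumored "
    "acquisition of Widget Ltd, a manufacturer headquartered in Lyon, after "
    "several months of negotiations between the two companies."
)

# Hypothetical relationships an LLM might extract from that text.
relationships = [
    {"s": "Acme Corp", "r": "acquired", "o": "Widget Ltd"},
    {"s": "Widget Ltd", "r": "headquartered in", "o": "Lyon"},
]

# separators=(",", ":") removes the whitespace json.dumps adds by default.
compact = json.dumps(relationships, separators=(",", ":"))

print(compact)
print(f"article: {len(article)} chars, compact JSON: {len(compact)} chars")
```

Short keys and whitespace-free separators keep the payload small, and the structure round-trips through `json.loads` without loss.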
Managing Data Quality and Credibility
Ensuring data quality is essential when building knowledge graphs, particularly in the context of news-related data where misinformation is prevalent. The discussion emphasizes a robust filtering system that includes human oversight and automated checks for credibility, sentiment, and reporting voice. This dual approach allows for a nuanced understanding of how different news sources align or contradict each other, providing a comprehensive picture of any given topic. Furthermore, the podcast sheds light on how leveraging diverse perspectives can help mitigate risks associated with misinformation, thus enhancing the reliability of insights generated.
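One way such a dual filter might be wired together is sketched below. Everything in it is hypothetical: the `Article` fields, the score range, the thresholds, and the three routing labels are assumptions chosen for illustration, not a description of the pipeline discussed in the episode. Sentiment checks are left out to keep the example short.

```python
from dataclasses import dataclass


@dataclass
class Article:
    source: str
    credibility: float    # hypothetical score in [0, 1] from an automated check
    reporting_voice: str  # e.g. "objective", "opinion", "sensational"


# Hypothetical thresholds; real values would need tuning against labeled data.
ACCEPT_AT = 0.8
REJECT_BELOW = 0.3


def triage(article: Article) -> str:
    """Route an article to 'accept', 'reject', or 'human_review'."""
    if article.credibility < REJECT_BELOW:
        return "reject"
    if article.credibility >= ACCEPT_AT and article.reporting_voice == "objective":
        return "accept"
    # Ambiguous cases go to a person rather than being decided automatically.
    return "human_review"


batch = [
    Article("wire-service.example", 0.92, "objective"),
    Article("tabloid.example", 0.85, "sensational"),
    Article("unknown-blog.example", 0.10, "opinion"),
]
decisions = [triage(a) for a in batch]
print(decisions)
```

The automated checks handle the clear cases at volume, while the middle band is where human oversight is spent.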
Real-Time Querying and User Experience
The flexibility in constructing dynamic, real-time knowledge graphs is key to addressing user queries effectively. The ability to quickly spin up knowledge graphs tailored to individual user requests enables analysts to gain insights they need without being overwhelmed by data. This process involves interpreting natural language inputs and translating them into actionable queries, representing a significant leap forward in how users interact with complex datasets. Ultimately, this iterative and user-centric approach ensures that the knowledge graph remains pertinent and adaptable to the ever-evolving landscape of news and information.
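A per-request graph of this kind could be sketched as follows. This is an assumption-laden toy: the function name `subgraph_for_query`, the example triples, and the `max_hops` parameter are hypothetical, and a plain substring match stands in for the natural-language interpretation step, which in practice would presumably involve an LLM.

```python
from collections import defaultdict, deque

Triple = tuple[str, str, str]  # (subject, relation, object)


def subgraph_for_query(query: str, triples: list[Triple], max_hops: int = 1) -> list[Triple]:
    """Return triples within max_hops of any entity named in the query."""
    # Index each entity to the triples that touch it.
    by_entity: dict[str, list[Triple]] = defaultdict(list)
    for t in triples:
        by_entity[t[0]].append(t)
        by_entity[t[2]].append(t)

    # Stand-in for query interpretation: case-insensitive substring match.
    q = query.lower()
    seeds = [e for e in by_entity if e.lower() in q]

    seen_entities = set(seeds)
    selected: list[Triple] = []
    frontier = deque((e, 0) for e in seeds)
    while frontier:
        entity, depth = frontier.popleft()
        if depth >= max_hops:
            continue
        for t in by_entity[entity]:
            if t not in selected:
                selected.append(t)
            for neighbor in (t[0], t[2]):
                if neighbor not in seen_entities:
                    seen_entities.add(neighbor)
                    frontier.append((neighbor, depth + 1))
    return selected


# Hypothetical store of extracted triples.
store = [
    ("Acme Corp", "acquired", "Widget Ltd"),
    ("Widget Ltd", "headquartered in", "Lyon"),
    ("Other Inc", "sued", "Someone Else"),
]

one_hop = subgraph_for_query("What is happening with Acme Corp?", store, max_hops=1)
two_hop = subgraph_for_query("What is happening with Acme Corp?", store, max_hops=2)
print(one_hop)
print(two_hop)
```

Limiting the hop count keeps the spun-up graph small enough to hand to an analyst or a model without pulling in unrelated parts of the store.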
Robert Caulk is responsible for directing software development, enabling research, coordinating company projects, quality control, proposing external collaborations, and securing funding. He believes firmly in open-source, having spent 12 years accruing over 1000 academic citations building open-source software in domains such as machine learning, image analysis, and coupled physical processes. He received his Ph.D. from Université Grenoble Alpes, France, in computational mechanics.
Unleashing Unconstrained News Knowledge Graphs to Combat Misinformation // MLOps Podcast #279 with Robert Caulk, Founder of Emergent Methods.
// Abstract
Indexing hundreds of thousands of news articles per day into a knowledge graph (KG) was previously impossible due to the strict requirement that high-level reasoning, general world knowledge, and full-text context *must* be present for proper KG construction.
The latest tools now enable such general world knowledge and reasoning to be applied cost-effectively to high volumes of news articles. Beyond the low cost of processing these news articles, these tools are also opening up a new, controversial approach to KG building: unconstrained KGs.
We discuss the construction and exploration of the largest news knowledge graph on the planet, hosted on an endpoint at AskNews.app. During the talk, we aim to highlight some of the sacrifices and benefits that go hand-in-hand with using the infamous unconstrained KG approach.
We conclude the talk by explaining how knowledge graphs like these help to mitigate misinformation. We provide some examples of how our clients are using this graph, such as generating sports forecasts, generating better social media posts, generating regional security alerts, and combating human trafficking.
// Bio
Robert is the founder of Emergent Methods, where he directs research and software development for large-scale applications. He is currently overseeing the structuring of hundreds of thousands of news articles per day in order to build the best news retrieval API in the world: https://asknews.app.
Timestamps:
[00:00] Rob's preferred coffee
[00:05] Takeaways
[00:55] Please like, share, leave a review, and subscribe to our MLOps channels!
[01:00] Join our Local Organizer Carousel!
[02:15] Knowledge Graphs and ontology
[07:43] Ontology vs Noun Approach
[12:46] Ephemeral tools for efficiency
[17:26] Oracle to PostgreSQL migration
[22:20] MEM Graph life cycle
[29:14] Knowledge Graph Investigation Insights
[33:37] Fine-tuning and distillation of LLMs
[39:28] DAG workflow and quality control
[46:23] Crawling nodes with Phi 3 Llama
[50:05] AI pricing risks and strategies
[56:14] Data labeling and poisoning
[58:34] API costs vs News latency
[1:02:10] Product focus and value
[1:04:52] Ensuring reliable information
[1:11:01] Podcast transcripts as News
[1:13:08] Ontology trade-offs explained
[1:15:00] Wrap up