
Paco Nathan: Overview of the AI Tech Stack and Business Ecosystem – Episode 2
Content + AI
AI Technology in Content Design and Strategy
Exploring the integration of AI technology in content creation and strategy, emphasizing the collaboration between AI systems and subject matter experts. Discusses the significance of high-quality data sets curated by domain experts, challenges in data set development, and the evolving landscape of AI technology in various industries.
Episode notes
Paco Nathan
Most of us are learning about AI on the fly and just got started in the past year or two.
Paco Nathan has been working with AI since the 1980s and has been doing digital business nearly as long.
His background in both the technical and commercial sides of artificial intelligence gives him a unique perspective on the field that can help newcomers like me and you get oriented to this new landscape.
We talked about:
his extensive history in the AI field, including work with some of the earliest chatbots
how graphs can serve as a way to ground and contextualize unstructured content
how content that is structured properly can help users and drive action
the tech stack underlying the current generation of AI tools
two technologies at the base level of the stack: sequence-to-sequence and diffusion
the benefits of SSM, small specialized models, over LLMs
his take on the impact of LLM chat agents on content and editorial practice
four take-homes from his recent immersion in AI conferences and gatherings:
the superiority of small, specialized learning models (SSMs) over LLMs
the issue of losing domain knowledge as experts age and retire
the importance of using your own data sets
the need for detailed task analysis as you begin building any AI model
the contrasts and interplay between AI developments at large, well-funded entities like Alphabet, Meta, and Microsoft and the smaller, more diverse ecosystem around open-source AI projects
Paco's bio
Paco Nathan is Managing Partner at Derwen, Inc., and author of Latent Space, along with other books, plus popular videos and tutorials about machine learning, natural language, and related topics. Known as a "player/coach", with more than 40 years of tech industry experience, ranging from Bell Labs to early-stage start-ups. Werner Herzog is his spirit animal.
Board member for Argilla.io; Advisor for KUNGFU.AI. Lead committer on PyTextRank, kglab. Formerly: Director, Community Evangelism for Apache Spark at Databricks.
Long, long ago, when the world was much younger, Paco led a media collective / indie bookstore / performance art space / large online community called FringeWare. Beginning in 1992, this was one of the first online bookstores and likely the first commercial use of a chat bot on the Internet.
Connect with Paco online
Derwen.ai
Argilla.io
Video
Here’s the video version of our conversation:
https://youtu.be/bjU_q36cggw
Podcast intro transcript
This is the Content and AI podcast, episode number 2. You'd think from news stories and social media that AI is mostly about large language models like ChatGPT and big companies like Microsoft and Google. In fact, there's a large, well-established community of open-source AI projects and a variety of technologies in addition to LLM-based chat agents. With more than 40 years of experience in artificial intelligence and in the tech business world, Paco Nathan is uniquely qualified to orient us in the current AI landscape.
Interview transcript
Larry:
Hey everyone, welcome to episode number two of the Content and AI Podcast. I'm really happy today to welcome to the show Paco Nathan. Paco, we could talk literally for 20 hours about this stuff we're going to talk about today. But what Paco and I are going to talk about today should just kinda get you grounded in making sense of the AI ecosystem. Paco's been doing this stuff forever. He studied AI back in the, what, the 80s or something like that. Anyhow, welcome, Paco. Oh, and one last quick thing. Paco is the managing director of Derwen.ai, his consulting company. So welcome, Paco. Tell the folks a little bit more about your work at Derwen.ai and some of your discoveries around AI lately.
Paco:
Oh, fantastic. Thank you very much, Larry, I really appreciate it. Yeah, Derwen, we're really focused on open-source integration to support machine learning in general. But we focus a lot on natural language and graph technologies. And for what it's worth, I got into doing graph work, which is how we met, I got into that because of natural language. I was working with a family of algorithms. There's some research that had come out on, basically, taking raw text and being able to start to put structure into it, and turn it into a graph by using natural language.
Paco:
So we ended up using, like I say, these kinds of technologies, mostly in open-source, for enterprise customers. Really, to help them build applications of knowledge graphs, and now large language models have become very popular. And we've been doing this for a while.
Paco:
One of the projects I'm involved with, there is an open-source project called Argilla, based in Spain. I'm in Spain right now, actually. We started six years ago in natural language, when some of the first open-source large language models were coming out, LLMLP templates, things like that. Argilla has been doing a lot of those open-source integration paths with spaCy and other kinds of NLP projects. But using them in enterprise, like I say, for the past six years. It's been interesting because, five years ago, we made a business decision to focus on large language models in enterprise business use cases. Back then, people would be like, "Language models? That seems very narrow. Why do you want to focus on this?"
Larry:
Yeah, but you're not an "I told you so" kind of guy, but you still must have some fun with it. Yeah. Well, that's great.
Larry:
And what you were just saying, too, about the natural language stuff and the graph stuff, there's this ... One of the things we've seen lately is the large language models, which are notoriously hallucination prone and not very bright, being merged with techniques like retrieval augmented generation to access a knowledge source, like a knowledge graph or something like that.
Paco:
Like a knowledge graph.
Larry:
Yeah. So that gets into the ecosystem part of this. It's not ChatGPT all the time. As you just said, you've been doing this way before they came on the scene. I'd love to get just your quick overview of that ecosystem. There's the natural language, ML flavored stuff, the knowledge representation stuff. What else is relevant, especially in terms of content practice, do you think?
Paco:
Well certainly, we can also talk about the ecosystem more, but let's first focus on where the building blocks are. Obviously, a lot of people are interested in chat, we'll touch on that later. You know, I've been working with chat apps for a long time. Going back to the early 80s, that's what we used for our class projects. I used to TA a course that Andrew Ng eventually took over and made popular. But we would teach Eliza to people doing chatbots, back in the early 80s.
Paco:
Yeah. Not all the world is chat. There is a lot of the world that has to do with text, and images, and video. And a lot of the text is structured in ways ... For instance, we work a lot with manufacturers here in Europe. And you might think that manufacturing data is all about process controls, and factories, and automation. It's not. The stuff that we work with, and so much of the important data, is all PDF documents. Because you've got patent applications from your possible competitors, you've got environmental impact reports where your competitors might disclose. You've got scientific papers being published that you have to keep up on. You've got your regulatory norms that you're publishing to the European Commission, or whatever. And, just on, and on, and on. You end up with hundreds of millions of PDF documents.
Paco:
To be able to use those, you can't really do much with that in a data lake, you have to process it. So you need to use NLP to extract out the information and the relationships. And then, the next step is gosh, this is all linked. The scientific papers are referencing things that are also in the patent applications, and that has a lot to do with our competitors' factories. And by the way, if we've got thousands of vendors in a network, at the end of the day, you end up with a very large graph. This is how you make sense of it. This is how you rationalize it, is by grounding in a graph. Which is, like you say, with retrieval augmented generation, yeah, people are realizing, "A knowledge graph might be good for grounding our data."
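A minimal sketch of the grounding idea described above, under stated assumptions: the `extracted` dictionary stands in for the output of an NLP extraction step (the document IDs and entity names are invented for illustration), and the graph is a plain adjacency map rather than any particular graph library. It links each document to the entities found in it, then follows shared entities to find related documents.

```python
from collections import defaultdict

# Hypothetical output of an NLP extraction step:
# document id -> set of entities mentioned in that document.
extracted = {
    "paper_17": {"polymer X", "Acme GmbH"},
    "patent_42": {"polymer X", "coating process"},
    "env_report_3": {"Acme GmbH", "coating process"},
}

def build_graph(doc_entities):
    """Build an undirected document-entity graph as an adjacency map.

    Assumes document ids and entity names never collide.
    """
    graph = defaultdict(set)
    for doc, entities in doc_entities.items():
        for entity in entities:
            graph[doc].add(entity)
            graph[entity].add(doc)
    return graph

def related_documents(graph, doc):
    """Return the documents that share at least one entity with `doc`."""
    related = set()
    for entity in graph[doc]:
        related |= graph[entity]
    related.discard(doc)
    return related

graph = build_graph(extracted)
print(sorted(related_documents(graph, "paper_17")))
```

In this toy data, the scientific paper ends up connected to both the patent and the environmental report because they mention the same material or company, which is the kind of cross-document linking the graph is meant to surface.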
Larry:
Yeah. And the way you just talked about that, too. So much of the fuss in the content world is around generative AI, and just creating content. But you just talked about just one use case for understanding what you've got already. There's huge power in that. In fact, I just saw a paper the other day where somebody ... I don't know exactly which technologies were at play, but a lot of the technologies just take random 500 character or 500 word chunks of a document.
Paco:
Right.
Larry:
And this was a new technique to take the inherent, implied semantic meaning of headings in a PDF doc and do the chunking that way to get better results.
Paco:
Oh, yeah.
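The contrast Larry describes can be sketched roughly as follows. This is an illustration only, not the method from the paper he mentions: it assumes the PDF has already been converted to plain text and that headings are marked with a leading `# `, which is a hypothetical convention chosen for the example. The function names are also invented here.

```python
def chunk_fixed(text, size=500):
    """Split text into consecutive chunks of at most `size` characters."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def chunk_by_heading(text, marker="# "):
    """Split text into (heading, body) chunks, one per heading line.

    Text before the first heading is returned with a heading of None.
    """
    chunks = []
    heading, body = None, []
    for line in text.splitlines():
        if line.startswith(marker):
            # Close the previous chunk before starting a new one.
            if heading is not None or body:
                chunks.append((heading, "\n".join(body).strip()))
            heading, body = line[len(marker):].strip(), []
        else:
            body.append(line)
    if heading is not None or body:
        chunks.append((heading, "\n".join(body).strip()))
    return chunks

doc = "# Scope\nApplies to plant A.\n# Emissions\nNOx limits are listed here.\nSee annex."

for heading, body in chunk_by_heading(doc):
    print(heading, "->", body)
```

Fixed-size chunks can cut a section in half, while heading-based chunks keep each section's text together with the heading that describes it.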
Larry:
But that's just an optimization of this kind of stuff. So there's the analytical understanding part of what you've got. When I look at a lot of those PDFs I'm like, "Oh, for crying out loud. Why didn't you hire me 10 years ago?" I think this might get to what you're talking about with those knowledge graphs, that having a structure, and meaning, and semantic attributes of stuff - before you create it, that's just a pet project of mine. How can AI help with that kind of stuff? Like workflows, maybe, around that.
Paco:
Yeah. Well, it really cuts both ways. And actually, I'm here in A Coruña doing a talk about this. It's about this intersection of graphs and language for industry AI applications. So it's really cutting both ways.
Paco:
On the one hand, usage, it's good to be using graphs to organize things because, when you think of ChatGPT or any of the chatbots,


