AI-powered
podcast player
Listen to all your favourite podcasts with AI-powered features
OCR Text Extraction, Flow Charts, and Configuration Scripts
This chapter explores the use of OCR to extract text from images and its implementation in the prompt. It delves into the challenges of resizing and tiling input images and discusses the potential applications of the AI model in generating flow charts and configuration scripts for AWS.
Happy 2024! We appreciated all the feedback on the listener survey (still open, link here)! Surprising to see that some people’s favorite episodes were others’ least, but we’ll always work on improving our audio quality and booking great guests. Help us out by leaving reviews on Twitter, YouTube, and Apple Podcasts! 🙏
Big thanks to Chris Anderson for the latest review - be like Chris!
Note to the Audio-only Listener
Because of the nature of today’s topic, it makes the most sense to follow along the demo on video rather than audio. There’s also about 30 mins of demos and technical detail that we had to remove from the audio version, because they didn’t make sense without video.
Trailer here.
Full 90min chat:
(In other words, pls jump over and watch on our YouTube if you can! Did you know we are now posting every episode to YouTube? We’ve been multimodal for a long time!)
Trend 1: GPT4-V Coding
You might remember Greg Brockman’s hand-scribble-to-working-website demo from the GPT-4 demo from March. This was largely inaccessible to the rest of us until the GPT4-V API was released at Dev Day in November.
As mentioned in our November 2023 recap, one of the biggest viral trends was tldraw’s open source “Make It Real” demo: starting from a simple wireframe and text annotations, you could create a real, functioning UI with the click of a button.
Provoking another crisis of confidence in developer circles:
And using state charts:
And provoking responses from Excalidraw, a competitor.
You can see us creating a Replit clone in this silent video here:
Since our intervew the new GPT4V Coding metagame has been merging app UI’s and SQL with Supabase (another AIE Summit speaker) and other backend tools:
* converting ERDs to SQL (part 2, for MariaDB)
Trend 2: Latent Consistency Models
As covered in the Latent Space Paper Club in November, 3 papers drove a roughly ~100x acceleration in the speed of text to image generation over the past year:
* Consistency Models (with Ilya Sutskever)
* Latent Consistency Models (from Tsinghua)
* LCM-LoRA (also Tsinghua, same authors)
With the invaluable help of Fal.ai (friends of the show and AI Engineer Summit and progenitors of the viral GPU Rich/Poor hats mentioned on the Semianalysis episode), TLDraw has also been at the forefront of putting this research into production, with two projects:
* drawfast: add a prompt, start sketching into the canvas and see each stroke affect the drawing. Overlap multiple of them to extend and merge drawings.
* lens: a collaborative canvas where in real time people can draw and have their sketch turn into AI-generated art. Start drawing at the bottom and see it scroll into the magic canvas.
For nontechnical people in your life, we do recommend showing them lens.tldraw.com (and its predecessor that we discuss on the show) on your and their mobile devices.
The Rise of Multimodal Prompting
At the first AI Engineer Summit in October, Logan (our first guest!) declared this the Year of Multimodality. Over the next 2 months we saw an explosion of activity in multimodal: GPT-4V’s API release at OpenAI Dev Day (our coverage here), LLaVA (our chat with author here on Visual Instruction Tuning), BakLLaVA, Qwen-VL, CogVLM, etc.
On today’s episode we have Steve Ruiz, founder of tldraw. The project originally started as an open source whiteboard that Steve built for himself and then “accidentally made a really, really good visual multimodal prompting application environment”. Turns out that infinite canvas and generative models are a very good match:
* Design is iterative: DALL-E, Midjourney, etc all work in a linear way: prompt goes in, 1-4 images come back. As you generate more, the previous images scroll away from your view. In a canvas environment, you can see the progression of your generation and visually “branch” by putting new prompts in different spaces.
* UI has “layers”: when designing interfaces there are different layers to it: the functionality, the style, the state, etc. Some of what they are building in tldraw is bringing images into the canvas to influence different layers: “One thing that we've done is to bring in screenshots of other apps, like here's Stripe.com, like make it look like Stripe, you know? Or like here's Linear.com, like let's do it this way”.
In the episode we spend a lot more time talking through all of these ideas and how Steve’s background in fine arts came back to being really useful in building a multi-modal AI canvas. Enjoy!
Show Notes
* tldraw
* Perfect Free Hand and Perfect Arrows
* “Make Real, the story so far”
* Dog CEO
* Other whiteboarding products mentioned
* FigJam
* See also Steve’s interviews on the Slow Steady Pod and TWiSt, and subscribe to his tldraw substack!
Timestamps
* [00:00:00] Introductions
* [00:01:02] Steve's Background In Fine Arts and Transition into Tech
* [00:08:22] The creation of tldraw and its open source origin
* [00:15:44] The Inception and Growth of tldraw
* [00:18:40] The Integration of AI with tldraw and Make It Real Feature
* [00:21:56] Discussion on Multimodal Prompting and Iterative Design
* [00:32:32] The Concept of Parallel Prompting in Design
* [00:34:11] Impact of AI on developer jobs
* [00:37:28] Additional Layers in Multimodal Prompting
* [00:45:18] Introduction of DrawFast and Lens Projects
* [00:50:03] tldraw 2.0 and the future of the project
* [00:55:41] The Competitive Landscape of Canvas Tools and tldraw's Unique Position
* [01:00:22] Advice for Founders: Following Your Interests and Desires
Transcript
Swyx: Welcome back to Latent Space. I'm very excited to have my good friend, Steve Ruiz. How are you this morning? [00:00:13]
Steve: Hey, how's it going? [00:00:14]
Swyx: I have had the good fortune of knowing you before you got famous and actually hanging out in the precise office and studio that you're recording from right now. Congrats on Make It Real. Congrats on tldraw. I think it's been something that's sort of years in the making, but it's probably looks like overnight success to a lot of people. [00:00:32]
Steve: Yeah. Thank you. It's kind of a funny story. I don't know. Where should we jump into it? [00:00:37]
Swyx: Well, I like to give you a little background on the person. You don't have a lot of detail on LinkedIn, just like myself. I just found out just before recording that you're also a late entrance into tech. So maybe just like, what's your background coming into something like tldraw? What makes you so unique at doing sort of creative collaborative experiences like that? I know you and I've actually used tldraw, so I have some appreciation for how hard this thing is. [00:01:02]
Steve: Yeah. Like you said, I kind of came into this a little late and kind of came into it from a weird angle. My background is actually in fine art and studio art. I have my master's from University of Chicago in visual art, would write about contemporary art and put together exhibitions and do my own paintings and drawings. And that was back when I was living in Chicago. And then when I moved over to the UK, you know, got a new studio, kept that going. But when I turned 30, I kind of decided I should probably make some money and work with other people closer than I was at the time. Studio art is primarily a solo thing. I'd always had kind of like an analytical kind of side to me. My day jobs were, you know, I was working for lawyers. I was doing this writing, like magazines and stuff. So when I did that kind of that switch back eventually to design and product design, I was also able to use a tiny little bit of technical skill that I had had just building like WordPress websites for myself and other artists as portfolios. Kind of take that, just some natural curiosity around the way that products work and kind of create a career direction that was more around prototyping and like technical design and kind of like doing the design on the bits of a product that really couldn't be designed otherwise. So the interactive bits, the bits which are maybe more, there's more questions about them. There's no clear answer to terms of like, how should this work? You know, in all those places, you kind of have to build something in order to, to figure out what you want to build. It turns out, you know, to skip right to the end for a moment, like canvas is full of those types of problems. So it's no surprise that I ended up there. It's like kind of an extreme form of the same problem. But yeah, so I was working, this was back in like 2017, 2018. And I used at the time a product called Framer. That was back when it was more of like a code product than what it is now, which is more of like a visual builder that is kind of backed by code. So I'm sort of just drilled into that. It was cool. Uber was using it. No one knew how it worked. No one could use it. So I got good at it and got a lot of advancement, early traction, whatever in my career based on that. But it also taught me to code, taught me to think about building things that other people are going to use. Taught me about kind of like the type of code that you write when you're in an exploratory phase rather than like in an execution, like production phase. And I actually ended up working for Framer. I did their education for a year, which was very different than the type of product design that I was doing before that. I did a lot of video tutorials and writing and tweeting, trying to figure out some way to make technical design content interesting, you know, in little chunks that people could consume. I joke that like they probably got less out of me in that job than I got out of the job itself. Like because, yeah, I walked away from that. Not sure if I'd helped anyone really learn how to use Framer, but I certainly learned how to, how to tweet and learn how to record a good GIF and learn how to talk into a microphone and all that type of stuff. And so in the next roles that I had, I worked for a company called Play out in New York who is also doing design tools and I really wanted to work in design tools after that. Play's doing like a mobile, I guess right now it's like just general iOS, macOS platform specific design tools where you're using actual elements from the kind of widgets from that component collection in your designs and kind of bringing that a lot closer to the end product. At the same time I started getting into open source, I'd kind of done some popular open source before. This was now 2019, it was, it was locked down. I had a little bit more time. I also had a daughter, so not that much more time. I guess that open source that I started getting into started swinging back towards some of my kind of artistic interests or studio interests and kind of visual interests. Those are the parts where I felt like the problem space was, was really underserved. It wasn't necessarily like technical problems that were really hard. It was more subjective problems where I think the thing that was lacking was the taste or the opinions or the like feeling for what good solutions were. So the first kind of problem like this that I got into was arrows. I had, you know, two boxes or two points arbitrarily placed. I want a good looking arrow, like a quote mark, like good looking arrow between the two. Well, that could be anything. That's not a math problem. Maybe it involves some angles and linear geometry and vectors and all that, but it's like the good looking part was just like my own taste and my own eye and like tons and tons of iterations and arrows are super tricky and there's a million ways for this, you know, edge cases when things are overlapping or things are too far away or too close and all this. But I was working on this and I was working on this in public on Twitter, recording gifs of boxes and arrows kind of squishing together and all that. And I think people really liked that and they liked kind of following me on this somewhat obsessive journey, which was technical, but it wasn't like it wasn't like trying to crack an algorithm. It was like trying to trying to figure out and identify the the rules governing an aesthetic experience or an ecstatic thing, which was a good looking arrow that became perfect arrows and that was pretty popular. But the next one really is what kind of broke my popularity on Twitter or just in the space and that was a project that ended up being called perfect freehand. This is a little hard to describe. If you've ever used like an iPad pencil or drew with like a stylus in Photoshop or something, like the harder you push, the thicker the line gets and the lighter you push, the thinner the line gets. It kind of is like this ink experience and that's it's not an easy problem. But if you're doing it in a kind of a Photoshop style, like raster environment, you know, the solution is pretty straightforward. You interpolate like tons and tons of tons of whatever shape you're drawing in between each point that you've actually moved your mouse to and you just change the size of that little stamp that you're making. So it's like a little circle, slightly bigger circle, slightly bigger circle, slightly bigger circle, but they're all really tightly packed together and so it looks like a kind of a line that's changing its width as it moves. My angle on this, the reason why I spent so much time on it was that I wanted to do that using vectors. I wanted to get a bunch of points in and then like a polygon that sort of defined the outside of that shape coming out because I like to work in SVG and it turned out that this was like an insanely hard problem that no one had solved. And if they have solved it, they certainly didn't open source it, but I couldn't find any good example of a variable width line that actually worked fast enough and consistent enough, etc. for it to be digital ink. And so again, I did this in public, did this on Twitter, a million GIFs of lines that look terrible, but you know, like slowly attracting more, like getting closer to the solution, attracting more people who had solved this problem or tried to do this or they wrote their PhD on ink and let me tell you about, you know, how arcs work in this environment and all this stuff. [00:07:35]
Swyx: Wow. [00:07:36]
Steve: And I, it was fantastic. Like I met so many good people who had like, were experts on this or something like it. And slowly we made a really, really good, tight little library for doing exactly what I wanted. Like here are a bunch of mouse points or just arbitrary points, like give me back a polygon that surrounds them. And let me essentially draw a line around the edge of that polygon, fill it in and it'll look like ink. So that was perfect freehand. And that's now used in like Canva uses it, like draw.io uses it, ExcalDraw uses it. We use it at tldraw all over the place. It's just like a significantly better than the next best solution in that space. And there really wasn't even any known solution in that space. So someday I'm going to be checking out at a hotel and see my own ink and you know, a little iPad or something like that. [00:08:21]
Swyx: That's amazing. [00:08:22]
Steve: So that's kind of led right into tldraw is that I had integrated my ink into Excalidraw and I, you know, spent time in that code base. And I'd also like created several like infinite canvas like tools to help me build perfect freehand and visualize it and sort of do my ink pan and zoom in and, and program against this thing. And so I had done, including Globstud design, which I won't necessarily talk about, but it's a kind of like a weird experimental design tool, but anyway, it was like, it was an infinite canvas. It was like, you know, Framer, Figma, et cetera. And after doing Excalidraw and been working on these kinds of projects that were in the same area, I was like, you know, maybe there's, there's a market here for, or not even a market. It was just like, I think the thing that I want to work on next is like a general purpose, kind of like whiteboard like engine, mostly for myself. I'd built globs, but the only thing that you could put on the canvas in globs was a glob. So I had all this like code and these solutions that I, you know, was like hanging around. It could kind of see how I would adapt it. And so that's what I started doing. And that was the next story that I was kind of telling on Twitter is that like, okay, here's how selection works in something like Figma or, or something like Miro or Framer or Sketch. It's these sort of conventions that are part of this really complicated thing called like the infinite canvas, you know, going all the way back to Flash and before then, you know, Adobe Illustrator and before then all the way back. And they're all pretty consistent between products. Like if you're making a canvas this way, you have to kind of do them all. Like your undo, redo should work in a specific way. Your selection should work in a specific way. Like, you know, the camera position and how the camera moves should work in a certain way. All the modifier, like option, drag to clone. And all of those became their little vignettes of how I was building this thing. This was now like spring of 2021. And I had everyone from any infinite canvas related creative product kind of like in my inbox being like, Hey, can you come work for us? I was like, let's talk, let's do this. And so I was either going to go work for Figma or Adobe. And I ended up going with Adobe in part because I think FigJam had just come out and the team at Figma were like, well, this is competitive with FigJam. I'm like, this thing is like nothing. It's like a little open source, you know, it's like no one uses this. It's just me trying to get to 10,000 Twitter followers, but you know, it's mine. So no. So I went with Adobe, but I told them, I'm like, I don't want to start for six months. Like this is actually a pretty fun project for me. I want to get it out of my system, you know, let me start in January and just work on this. And so they said yes. And I quit working with Play and said, I'm going to go work on this little open source thing for six months. I have some contracting money in the bank. Let's drain the company account and do this. And that's not what happened. I went full time from a Wednesday on Thursday. I had a very large communications company say, Hey, we're moving a whiteboard that we've designed for specific touchscreen devices. We're moving that into the browser. It turns out people want to use the whiteboard on their phones and on their laptops and all that like they do with Miro. And so we need to make this thing that we wrote in C++ to be highly performant on these, you know, kind of tiny microcomputers that were part of these interactive touchscreen TVs. We have to make this work on the web and we don't think it's going to be good enough. We have to build from scratch. We don't have the team. Can we just build on what you're building? At the time, this thing wasn't open source, it was just sort of, but it was getting there And I'm like, yeah, sure. Like, give me like $75,000. I'll let you see the source code. I don't want to talk to you very often, you know, like I'm not working for you. I never want to see your code, but you can look at mine. And they said yes to that. And so I was, you know, funded for those first six months. And I got to work on this without having to feel bad about it. And I'd also eventually opened up tldraw to be like sponsorware that if you were sponsoring me on GitHub, you could access it, you know, in its kind of primitive state on tldraw.com. And it had like a couple hundred people join that way and sponsor me. So at one point, like my sponsorship was, you know, over $5,000 a month, which is not massive money, but it's like I wasn't doing anything different. So it was pretty good. That's a kind of a passive thing. Anyway, I shipped it at the end of November 2021. And it was very popular. I just open source everything. It was just like, you know, the tldraw.com app, the library, the canvas, and it was organized in a certain way. It just made it all public. Everything was MIT, you know, let's just throw this out into the world and see where it goes. Well, it went pretty far. There was like number one on Hacker News for a while. It was like the top trending repo on GitHub. A lot of people, like 40,000 people showed up at tldraw.com to use it on that launch date, which was all good. Like so far, this was all within my same narrative of, okay, this is cool. I'll make this and then I'll go do something else afterwards. The thing that really surprised me was how many teams wanted to build on this. And they weren't like, they weren't building whiteboards. They weren't Miro competitors or Figma competitors. They were just like apps that you wouldn't expect to have infinite canvases inside of them and they wouldn't have built it except that I had suddenly made this very easy. And I had suddenly shrunk the development time of this like whiteboard like feature in their product from like three years and three people to three weeks and one person and not even one person just like no new developers, no new team, no new graphics experts, no computational geometry guys. Like, you know, we can do this. The canvas itself is like React all the way down. So even if you wanted to customize it, you'd just be writing React components and then a little bit more code on top. And so I was totally overwhelmed by inbound from companies who were like, I want to build this or I want to acquire you or I want to, I want you to build something for me. Or, you know, I want this in my app, you know, how do you help me or how can I do this? And people were shipping things also like within two weeks, three weeks, like production ready. Like people had taken this and run with it. And so the story that I started to get around tldraw was that like, OK, well, this is this is a cool little whiteboard, but it's also kind of like filling a gap that no one knew was there in the same way that like Mapbox or Google Maps, you know, provide maps for apps that would definitely not build maps themselves. Like maps are insanely hard, like your little local food delivery app like wouldn't just wouldn't have a map in it, you know, like easy. But it is a value add. If they can have it in there, then absolutely it is a value add. It's just completely impractical to do themselves. And what I learned talking to folks was that like every PM had used Miro or used Figma or used one of these other collaborative tools. And every creative product person was like, well, this is fun. Collaboration is fun. This canvas thing is pretty cool. Like, you know, why can't we put our CRM on the canvas or why can't we do our sales stuff here? Like we're already kind of using Miro for this. Like, why couldn't we give this to our customers as well? Like, why don't we build a product around this? And it was just a technical no until, you know, November 24th, 2021, when suddenly it was like a technical maybe and there was absolutely demand. So hence the, you know, I had to call Adobe and say, no, I'm not going to come in on Monday. Like it turned out that the best possible outcome of this happened and there's actually a company here. And then I went out and I raised a seed round from Lux in New York and Amplify in California and a whole bunch of really great angels, you know, on the story of, yeah, this is cool It's a good app, feels good. Companies want it. And, you know, by then I had almost $200,000 of sponsorship, you know, and people were just signing up and signing up because there was no way to even be a customer. [00:15:44]
Swyx: You're not saying 200k a month. [00:15:46]
Steve: No, no, no. But like, I mean, I had had up to then the amount of sponsorship that I had received was around $200,000. I think some of the recurring stuff was like, like 5,000 a month. Yeah. But yeah. [00:16:00]
Swyx: Which is in the top echelon. A lot. [00:16:02]
Steve: Yeah. Oh yeah. Certainly. Just the amount of like kind of validation that had come in around this was like more than usual. So raise a round, put together a team here in London and basically had just been building this whiteboard SDK since then, you know, we, we reconfigured the project around, okay, we're going to be building this not necessarily for end users, but for, for teams to use as kind of an infrastructure product, a developer product, something closer to Mapbox, you know, we were making demos to kind of like show different ways that it could be used. Certainly the collaboration thing is a big one, but the fact that you could put anything on the canvas that you can put on a website just because it is all HTML, CSS all the way down and that was going really well, it was already a good story. And then I just raised like a 2 million extension for the company while I was on the final pitch for that, the dev day was happening at OpenAI. And in the morning I woke up and I was getting all this kind of action on Twitter because a developer at Figma had used tldraw to make this little demo where you could draw a website, click a button and get back a, a big pop-up that had your, your website in there. It was like a prompt, like you're a developer, you just got this wireframe from your designer. Can you give it back to a single page HTML file? And it would do it and it could do it. And then you could show that website to whoever is using the app. And we took that and we're like, wow, you could do so much more with tldraw. It's just like, it's, it's only scratching the surface of the type of integration that you could do. Again, we had just finished the race. Pressure was off a little bit. It was kind of getting towards the end of the year. I was like, all right, let's, let's just take this and have some fun. Let's make some, some viral s**t. Maybe we'll get like 200 likes or something like that. And it exploded. It was like, I think we're at like last 30 days, like 22 million views or something like that. It's just like Kanye West numbers. It was, it was really, really, really popular for a couple of days. If you're on Twitter and at all technical, you might've just seen a ton of tldraw stuff on your timeline or about two weeks ago, three weeks ago. [00:17:55]
Swyx: Well, so yeah, that, that, that kind of brings us up almost to today. You just released something two hours ago, which we should talk about. Maybe this will bring a good time to bring up the screens, you know, for those who are listening. [00:18:08]
Steve: Let me, let me share. [00:18:09]
Swyx: We're recording a video as well. You can jump over to the YouTube to see stuff, but this is an inherently visual podcast, so we have to show stuff on the screen. The incremental thing I got from your blog post. So you did do a write up, which thank you for that, because I actually didn't know that you did a write up. It was just drawn up. [00:18:26]
Steve: Oh yeah. [00:18:27]
Swyx: Videos. This is the power of open source, right? That someone else had the idea. You weren't even focused on Dev Day. Someone else had the idea and just like, you know, made it without your permission or talking with you. And then the idea could spread back to you and you could run with it. [00:18:40]
Steve: Yeah, exactly. And we had made a lot of the bits and pieces like in place already based on, you know, I mean, it's, it's well documented or it's documented. There's tons of examples and all that. Yeah. Yeah. And I mean, it's a big library as far as an open source library goes, but yeah, you can work with it. And once this thing got popular, the first thing we did was create like a starter kit so that someone could take it and like run with it. So this is normal tldraw where you draw, you can whatever, move things around. It works if you've used Figma, if you've used Miro, it's kind of, kind of familiar to that. And you can put pretty much anything on this canvas that you want, like YouTube links, et cetera, because this canvas is HTML and CSS like divs and stuff all the way down. You can put things like YouTube videos on there. You can even make them play because again, like anything you can do in a website you can do on tldraws canvas. What's fun is because it is a canvas all the way down, you can also like draw on top and like do the kind of canvas manipulation stuff that you might do with normal shapes, but also with this type of content. So that ended up becoming like a big part of why make it real got kind of popular. So anyway, I'll show you make it real now. This was a hastily built layer on top of the kind of tldraw engine SDK that we sent out. And the idea here is that you can make a wireframe and we're going to send it to GPT-4 with vision with like a prompt, like much like the original one that Sawyer Hood had come up with, which is you are a web developer, you work with designers, they give you wireframes and notes and screenshots and all sorts of stuff. Could be anything. Your job is to come back with a single HTML file that has all the styles, all the JavaScript, all the markup necessary in order to make a real working prototype based on what you've been sent. It also has emotional manipulation, like you love your designers and you want them to be happy and like the better your prototype is, the happier they are. Oh, in the prompts? Yeah. Yeah. Yeah. Again, it's open source. You can read, read the prompt. It's kind of a funny one. This is part of the joy of like a multimodal prompt is we send it the photo, which kind of looks like the same as if you had done a copy and paste thing. Yeah. So like an image as well as all the text. And you had all this functionality worked out prior. [00:21:00]
Swyx: Yeah. [00:21:01]
Steve: Yeah. Yeah. [00:21:03]
Swyx: Yeah. Like that's what I find so poetic about this, that you were just ready. [00:21:06]
Steve: Yeah. It feels like we had gone off, you know, as collaboration and AI and stuff was going in one direction, we kind of just went off in our own weird, like, hey, the world is really going to need a whiteboard at some point direction. And then it just, they kind of met us where we were at. And then we've been able to just be like, show up on day one of this new world of possibility with like the thing that if I hadn't spent the last two years building this, I would spend the next two years building this. Like it is the right product for this type of a feature. So anyway, they give us back a HTML. We stick it into an iframe, put that onto the canvas, just like we did with that YouTube link and I can interact with it. So it should be going from orange to pink, orange to pink, hey, it's given us a hex code. I can click the hex code and it gives me, you know, it says it's copied it to the clipboard. [00:21:55]
Swyx: That's incredible. [00:21:56]
Steve: Like this alone is like super cool in something like V0 or some of these other kind of prompting environments, like the only way for you to then make this better, oh, maybe you can do this with ChatchubbyT or something and you could write like, oh, actually, you know, you missed the labels. Like it should say orange and pink, you know, on top of this thing. And it doesn't. So you could go back here and like, you know, make sure that this is, whatever, you could change the input. But because this is tldrawn, because you can draw on top of this stuff, you could also, you know, write on top. Like you could kind of modify this and maybe even give it the same type of markup that you would give to a designer or something like that, you know, and draw some arrows or maybe paste in a screenshot and say, hey, make it look more stylistically close to this other thing. And then what you do is you select the website that they gave you back, the previous result, along with all this markup, and you use that as the new input. And so that's going to give you something kind of like an image that looks like this that you've now sent. But we've also kind of tweaked the prompt a little bit when you do include a previous result and say like, hey, the wireframes coming back are annotations or markup based on what you sent before. And there you go. So now we have a new prompt that, sure enough, the labels are there, you know, it still works just like before. The button is full width and, you know, it still works just the same. So we send it back. Again, we send it the image, we send it the text, the prompt. We also send it all of the text items themselves separately because ChetchiBT is not really great with recognizing text. So we say like, oh, by the way, your vision's not so good. So we've ensured to have our copywriter, you know, list out all the copy that you can use. I think we even send it back the HTML that they used for the previous result. So we just dump like as much information as possible at GPT-4 with vision. And that's how you're able to get these sort of iterative results. And it is like legitimately good, like it feels like work. It feels like you're actually doing stuff when you're iterating through this way and slowly shaping and adding complexity and doing step by step, you know, as you're building something. And then you can copy a link to that and open that in a new tab like we host it. It's there forever. You can bookmark this. If you really just needed a slider between orange and pink, well, now you have one, you know, whether you could code it or not, or maybe not worth building or using a no code tool to build. But we just made that in five minutes. If you are more on the co-design, you want to use this as a kind of a foundation of a real project or maybe just to like see how it like, how does that actually work? You can open it up in StackBlitz or CodeSandbox. I think tomorrow we'll have Repl.it and yeah, see all the code, see what Chachapiti came up with and kind of use it or adapt it or, you know, keep it going or do whatever you want with it. Yeah. Cause it is, it is real. Yeah. [00:24:50]
Swyx: Make real. Yeah. It's interesting that you can also, I've seen some of your other demos. It looks like you're about to move us on to another. [00:24:57]
Steve: Yeah. I'm going to grab a couple. Okay. So what I have on the screen now just to narrate, describe it is, is I have a drawing of a, like a kitchen timer, you know, where you can add a minute, add a second, you know, start or stop the timer or, or reset the timer. And then next to it, I also have a state chart, like state machine describing the three states of the timer stopped running or complete and like what each one of those buttons should do in terms of transitions or changing the state. I think you can hand this to pretty much any designer or developer and get back a working result. [00:25:32]
Swyx: Like it's fully spec'd sort of. I mean, our friend David Korsheid might say, you know, develop a state chart first and then, you know, plug it into X state. [00:25:38]
Steve: Yeah, exactly. Well, let's do a couple of things in parallel. First thing I'm going to do is I am just going to make a box over here and I'm going to say kitchen timer right in the middle of the box. And this is going to be the only prompt that I'm going to, I'm going to give it. Okay. Just going to click make real and just the, the kitchen timer box. As you see with these multimodal prompting, like someone will draw a calculator, like in a lot of complexity and say, you know, it makes this real and sure enough, you get back like a really complex full calculator. But if you did the same thing and you just said empty box, but just the word calculator, it would give you back the same thing is that it knows what a calculator looks like and it knows how it works and all that. So next let's also give it just the user interface, like without the state chart, we'll leave the state chart out, but we'll do just the user interface. And then we'll do just the state chart, you know, and say, Hey, make this real. And then we'll do both the state chart and the UI. So we have four different prompts with four potential different results based on, you know, variations of the same, same input. So first off our kitchen timer, where all we did was we, we sent it a box with the word kitchen timer. It has, I don't know what this box is for, but we have a time we have start, stop and reset. So I can double click in, I can click start. It doesn't do anything. Oh, what is this? Oh, whoa. If this, okay, well, if the numbers there, yeah, then it'll, it'll stop. If I stop it, it stops. I can start it. It'll keep going again. Okay. And I can reset it. And there we go. The only weird thing is that it works. Yeah. It has a, a number input field for the number of seconds that I can, I can type out. But yeah. You know what? In a pinch, in a pinch, I'll take it. If I really needed just to count 60 seconds or something. Next we have, or the result where the input was just my drawing of a kitchen timer. I didn't tell it it was a kitchen timer. I didn't send it the words kitchen timer and I didn't tell it how it should work, but it did produce something that kind of looks the same. Let's see if it works. So I'm going to click minute, second, start, reset. No. So unfortunately it did not make any working UI, although it did, you know, put the buttons in the right place or something like that. [00:27:51]
Swyx: Maybe it over focuses on the UI because you told it, you just, that's all you gave it. Yeah. [00:27:57]
Steve: Yeah. I mean, there is in the prompt kind of language around, like use what you know about the way that applications work in order to sort of fill in the blanks here in terms of the behavior and all that. But let's go to the next one. This one is where we only sent it the state chart. There's also something in the prompt that says like, if it's red, it's not part of the UI. Like if it's red, then like treat that as an annotation rather than a thing that you should, should actually make. So this time it actually looks a lot like the previous one. But it does have these minute, second buttons. Oh, weird. It has plus and minus minute, seconds, and it also has this like stop state written at the bottom. So there's four buttons, you know, minus minute, minus second, plus minute, plus second, and then there's start and reset. So does it work? I can add a minute. I can also subtract a minute. All right. [00:28:44]
Swyx: Honestly, that's pretty smart. [00:28:45]
Steve: I can add a second. I can also, yeah. And if I press start, we're now in the running state. Apparently it's going up rather than down. And I can reset it and okay. I'm just curious if I, if I do give it a, an additional prompt here and I say like this should count down, not up and just kind of do an arrow towards the start button here. Let me see if that'll make a real one. But, and then finally we look at the other example, which is where we sent the state chart and the UI. We get something that looks much, much more like our user interface. The question is, does it work? Yes, it does. Perfect. I can stop it. Amazing. [00:29:24]
Swyx: Start it. [00:29:25]
Steve: That's a working timer. Reset it. Wonderful. And in this case, my feedback was accepted. I went back to the one where I, I'd asked it to count down and not up and it all looks the same, but now it's counting down. So I think for folks, especially who have worked in design and who have worked in sort of like user experience design in particular, like this should feel pretty familiar, kind of sketching out and trying to do your best to specify like what it is you want and see what you get back from your designers. You see what you get back from your developer, but having like a environment in which to have that like game loop, that like iteration cycle alone and instantaneous and essentially free is really, really wild. And you end up spending a lot of time kind of like not only getting into the head of the AI and sort of being like, okay, well, why are they getting confused? You know, what am I sending that is confusing? How do I send more information in order to like produce a better result? But also it really forces you to clarify your own expectations of like somewhere up here, I have a drag and drop list, you know, where you can drag list items between and like I started working on this and started specking it out. I was like, man, this is like actually like not only really hard to produce a good result, but it's also like just really hard to describe is that like the failure was really on my end for just not knowing how to get the information in there because I didn't actually know how this thing should work. But I could figure it out. I have an environment in which to figure that out. It's fun. [00:30:49]
Swyx: That's amazing. I'm still processing. [00:30:51]
Steve: During this, like, because this thing went massively popular on Twitter, thousands of retweets. And there were some folks who like were subtweeting it about like, you know, get over it. It's just a wireframing or no code tool or something like that. One guy did say like, you know, I prefer to wireframe like the old fashioned way with pen and paper. And I was like, oh yeah, no, that works too. Like this works with screenshots. I can just take the screenshot here of posted of the drawing that he had made. You know, it's not even like a good photo. There's a pen, you know, across one of the screens, etc. But if you just give that with no other information, like as a prompt, you get back a pretty good result. You know, just from this like photo of a piece of paper on the guy's desk, you have a not completely arbitrary result, like working website here that was inferred from just that picture with no other input, not even like titles or anything else. And of course, it's like responsive and all this stuff. And so the idea of, yes, I've worked really hard to make all of our shapes, you know, really good and our arrows obsessively good and all this stuff. But like the fun of the infinite canvas and tldraw in particular is that you could just dump like whatever you want onto the canvas, screenshots, text, images, other websites, sticky notes, all that stuff. And the model, even as something that was in preview, like the very, very first sort of multimodal model can do a really good job at just taking all that stuff as the input. And yeah, like so we accidentally made a really, really good visual multimodal prompting application environments or user experience environment. I'm not even sure what we're going to call this thing. [00:32:32]
Swyx: You also had in our pre-show prep, you also talked about parallel prompting. Is that basically just prompting and then moving on to something else? Is that what you've been showing us? [00:32:41]
Steve: Yeah, that's kind of what we did up here with the stopwatches. The fact that we could get multiple prompts going at the same time and like arrange them spatially. People have done this also with imagery to say, OK, well, here we're going to use DALI. We're going to kind of like make a tree of prompts as you go, different iterations based on whatever. You make four iterations. You pick your favorite one. You keep going. Kind of like what you do in MidJourney. But to have that spatial and to have that like arranged on a canvas so that it actually can make sense to you and you can kind of look back and follow it, follow forward that like whiteboards, infinite canvas stuff is just really good for a lot of things. So organizing like a whole bunch of different content that is irregular or ephemeral or has a kind of like ad hoc meaning configuration, like, you know, things that are next to each other or things that are in a grid or in this case, you know, just even what we have here for what we did with the stopwatch, like there's an implicit meaning of like, OK, the source is on the left, the result is on the right and any further iterations are further on the right. Right. Like we didn't put that into a data model. We didn't structure that in any way. It doesn't actually that meaning relationship doesn't really exist in any part of the product. It just exists to us because we can make sense of it for this type of thing. Not only is it cool that now a model can make sense of it as well, but yeah, for organizing complex iterations of imagery, complex iterations of outputs, et cetera, like, yeah, the canvas is a place. I really do believe that. Yeah. [00:34:11]
Swyx: I mean, that's that's that's really incredible. I think a few developers are kind of scared about, you know, how much of this their jobs this does. Obviously, there's a lot more that they can't do. [00:34:22]
Steve: Yeah. Will this take my job story is is interesting. [00:34:26]
Swyx: I think I'm not actually concerned, but I'm curious. I think this augments actually my concern as a developer is that this is good, but not good enough. You know, like it's good for throwaway UI, but would I actually export the code and take that code? I don't know. It looks like your first MVP was just HTML files, which, you know, if it's a single HTML file, it can have some JS and some CSS. I saw some problems with layout in there, which I don't know how for sure it is a layout. It's it looks like you could just prompt it for Tailwind if you want Tailwind. I assume it can generate React. I don't know. What are the limitations of this thing? [00:35:05]
Steve: There's the limitations that are in that particular demo, which is that, like, it couldn't do React because it needs to just be a single compiled thing, excuse me, ready to go. So it needs to be a single compiled thing just ready to go. You can't do any multi page stuff or anything like that. But that's more of like how we're structuring the project rather than like a specific requirement of the project itself. There's two kind of things. There's one is like how big is the input window and how big is the output window or something. In theory, you could have the input be here's a entire full stack React application together with all my UI and all this, all my components, etc. And here is a screenshot that I took of the landing page where the menu is in the wrong spot, you know, and I'm going to annotate that with some arrows and some text in order to say, like, here's where I want it to be or here's what I want, etc. And for the output to be, you know, a diff that I can apply to my code base, like basically like produce the commit that would change this and have that commit be against multiple files and etc. in order to have potentially like a solution that is just ready to go applicable like a patch, a PR that you can make. There really isn't any limit in that and we've seen with Copilot, etc. The challenge is more on the input side than the output side. Absolutely. You could figure out a way for this thing to spit out like a working iOS app or something like that. The question is like, how do you tell it what you want and how do you iterate when it gets it wrong? And just doing zero shot, zero shot, zero shot is like really a frustrating process. But if you do have a way of iterating, if you do have a way of kind of like step by step moving towards the solution that you want and kind of like getting it into like, okay, well, this is good, but it's not great, could be better, etc. That's how you actually make that type of complex output more practical or more realistic is that you probably won't get ever get the prompt just right. Even if you have like a really, really, really good three generations from now agent, like you still have to put that information in, but you're never going to put all the information in the first time you need to be able to iterate on it. And so with visual stuff, I feel like the canvas, like what we were looking at, that's part of what it unlocks is that like space of iteration, that space of you have a way of marking up the result and using that as the new prompt. And that's that's kind of new. [00:37:28]
Swyx: Yeah. Multimodal prompting is such a brilliant concept that, you know, I think it's going to be a norm for some things. In my mind, you demonstrated, you know, coming from Photoshop, there's this concept of layers. You can kind of simulate layers in tldraw. And I see like emergent property of layers in this kind of prompting, which is there's the UI layer, and then there's the state chart layer. And those two things seem like pretty useful in specifying a prompt. I was just wondering if you've thought a little bit about like other dimensions or other layers that would be useful in multimodal prompting. [00:38:02]
Steve: Yeah. One thing that we've done is to bring in screenshots of other apps, like here's Stripe.com, like make it look like Stripe, you know? Or like here's Linear.com, like let's do it this way. [00:38:16]
Swyx: Make my dev tool a website or make pop. Exactly. You should just, you should just like give a design and ask it to make pop instead of make real. [00:38:25]
Steve: Yeah, exactly. Make it more, make it more, make more pop. So there's the idea of like bringing in style as like a, as another part of the input. Flowcharts are absolutely useful. I mean, this is, it really just boils down to like, what would you really give a developer who you are working completely asynchronous with, you know, if you had to spec out a project and put that, print it out on paper and mail it to a developer and they were going to mail back a disk with an HTML file on it, like what would you send? If you were sending this to the moon or something. So yeah, definitely like descriptions of how the state should operate and specs on that. We've even just pasted in code, like, like here's a whole bunch of Jason that you can use and have it just read that as the, as the input data. You can point it at specific endpoints. You can say like, I want you to hit this endpoint and then display the results, you know, as, as cards or as items or something like that. And not, I mean, you don't even have to like wire this up. It's not like retool or anything where you, you have to register that, you know, it's not built into the tool. You just. [00:39:29]
Swyx: From an endpoint. [00:39:30]
Steve: Yeah. Yeah. Yeah. I'm trying to think of what a good demo endpoint would be. We could, maybe we could do one more, more test. What is it? Like dog.co? [00:39:38]
Swyx: Is that? Yeah. Dog.co is a good demo. Yeah. [00:39:42]
Steve: I've used that one. I mean, this might be kind of like the box with the word calculator. Like it might just know because it's probably been in a bunch of tutorials. [00:39:48]
Swyx: It's in the training set. Yeah. You're not sharing by the way. [00:39:51]
Steve: You know what? We'll, we'll do it anyway. We'll, I'll, I'll share it. We'll try. [00:39:55]
Swyx: Dog.co is, is one of those like demo APIs that you just set up just because it's not offensive. [00:40:02]
Steve: And. [00:40:03]
Swyx: Yeah, exactly. There's some useful dogs and everyone likes looking at dogs. [00:40:07]
Steve: You can, you can get dog.co. [00:40:09]
Swyx: I definitely didn't think about hitting endpoints just because it's just not in any of the demos I've seen. [00:40:15]
Steve: Yeah. But it works. Let me see. I'll, I'll have a big button down here. Show me a dog. Okay. So that's going to be our show me a dog button. This should be a picture of a dog. [00:40:26]
Swyx: Oh, that's a great dog. No, that's a cut. [00:40:30]
Steve: Thank you. And then we'll, we'll do some annotations here. We'll say like when, when this is clicked, get a new dog. [00:40:36]
Swyx: There's those perfect arrows coming in. [00:40:38]
Steve: Yeah, exactly. When clicked, get a new dog from, from, I'll just paste in this and put the result in the image. Okay. So it's, it's more of a, more of an instruction than you would normally. [00:40:52]
Swyx: Yeah. One thing that it's going to have to guess is that, you know, the, the response format, right? [00:40:57]
Steve: Cause it could be anything. This is true. Let's see if it works. Yeah. And let's see if it hit the end point in the right way. So dog button. Yeah. Okay. It hit the right red end point, Jason dog image, and then it put it in. So there you go. You have yourself a JavaScript tutorial in a box ready to go. And I think like, we probably wouldn't do this on camera, but like, you can say, you know, like, like use the auth token, you know, whatever, and like, you know, go like really get real data back from this thing. There's no reason why it wouldn't be able to do that. [00:41:34]
Swyx: You're kind of relying on the OCR. [00:41:35]
Steve: Well, not really, because again, what, inside of the prompt for this, we do give it like an array of all the texts that you've put in. We say like, look, I know your vision isn't so good, or you have a hard time reading text sometimes when it's small, because what, like the input that you get is pretty wild. It's like, it takes this as a PNG, and then it like, I can't do this in tldraw, but it resizes it, it squishes it into a 512 by 512 image or something like that. [00:42:05]
Swyx: It tiles it. [00:42:06]
Steve: Yeah. The text especially can get kind of like chunked up, especially if it's small. So we send those strings separately so that it can kind of reassemble anything that it can't read right off the bat. This is a weird future that we've found ourselves in. Pretty cool. Yeah. [00:42:23]
Swyx: I mean, you know, one layer I automatically think of is back-end, right? Like as someone who has worked at AWS, I see a lot of systems diagrams, like cloud diagrams, entity relationship diagrams for database. So I wonder if like anyone's tackled extending this to back-end, and then obviously the next level from that is full-stack apps where you have back-end in front of it. [00:42:43]
Steve: Yeah. I mean, I guess there's someone on Twitter that was using this to generate flowcharts. I'm not a back-end guy, so I don't actually know exactly what the output was, but I believe it was like a configuration script for AWS that was built off of this, like, I think you just copy and pasted a diagram that he had made in tldraw anyway and said, okay, let's throw this at this thing and see what it comes up with. Tweaking the prompt to say like, rather than building single page websites, you just return the JSON description of this configuration or something like that, or return a script that would set this up. You could tweak it to say like, here are all the entity relationships between different tables or items in tables, and give me back the SQL initialization or something that would make all these tables and set up these relationships. Yeah. It's just, again, the hard part is getting that information in. I don't know, pictures are really good. [00:43:35]
Swyx: They may speak a thousand words. Awesome. So that's one of two, what I think about multimodal viral hits in November. The other one also, you had a part to play in it, which is the local consistency models trend, where I think you worked with Fel. [00:43:51]
Steve: Yeah. So actually, I do have something to show here. We actually have a couple of things to show here. We connected with Fel because they used tldraw to create a demo for their LCM, right? Yeah. So we did that, and we made a drawfast.tldraw.com, which is basically, you get these shapes, these little draw fast shapes, and it puts the result, basically grabs that new image and puts it right next to it. And these are extremely fast. So as I'm moving things, you should see the image updating as well. And I think this was originally not a wise princess, I don't know, I'd play this more with my daughter than anything else, what this looks like. Yeah, the kids must love it. And, oh my gosh, she does. And actually, we had a lot of folks on Twitter being like, this is not good, like, whatever. Because I had a video of, whatever, my daughter drawing, and she made this awesome drawing of a mermaid, and we turned it into this really anonymous, crappy version of an illustration of a mermaid. And they're like, no, no, the children's drawing is much more interesting. I'm like, yeah, yeah, yeah, come on, who cares? Of course it is. But, you know, this is fun. [00:45:03]
Swyx: Yeah, I do think you might do animations, like some kind of, like, you could make some kind of, this is almost like stop motion film. Yeah. Yeah. I mean, we just, we need to do more work on consistency, but like, this is getting there. [00:45:18]
Steve: Yeah, it is. The fun is that like, you end up, after playing with this for a little while, you end up like, getting really into the particularities of the input. Like what can you do with a design tool? Okay. You can move things around, right? I can grab some of these and move them around, like, oh yeah, there's a highlighter here too. So we could do some highlighting, you know, that'll, that'll do stuff. And then we couldn't help ourselves. We started making these like stories. So all right, well then I'll move on to the other one that we, that we released earlier today. Yeah. Which is called lens.teeldraw.com. So that was drawfast.teeldraw.com. And again, this is probably not making a good podcast audio, but the image updates as soon as possible based on what the input drawing is. And it is pretty hypnotic. So this one's a little riskier because it's live. So we took a project called Together, which is a vertically scrolling, infinite drawing collaborative experience, a little bit like a chat room. As you're drawing, everything's just sort of moving up and it just disappears off the top of the screen, never to be seen again. So it's kind of just fun to play with. [00:46:21]
Swyx: By the way, one of the most magical chat experiences I ever had was with you. I think you were like with your daughter or something and I was, I was, whatever, showing off together. And you started writing, I started writing and we had chat on together.teeldraw.com. [00:46:34]
Steve: Yeah. [00:46:35]
Swyx: I was like, what is this? [00:46:37]
Steve: It's super cool. Inevitably someone will write like, you know, where are you from? And everyone's like chiming in and talking about it. So I'll describe what's on the screen now, which is we're taking like a screenshot of the center, like a square out of the center of this chaotic, vertically scrolling chat experience. And we're sending that to the LCM and putting back the image based on like a prompt, like, you know, desert scene or busy marketplace or futuristic cityscape or something like that. And so it is updating like, you know, 10 times a second as we go. [00:47:12]
Swyx: It's updating surprisingly quickly, like 10 frames per second. [00:47:14]
Steve: No, I think it's now like to 32 milliseconds basically as you go. And so if I draw like a big orange thing down here, it's going to kind of show up into the drawing. Maybe I'll do a big black one so you can see better. Like it just sort of becomes part of the input to this prompt and it is extremely hypnotic. This is again like lens.teeldraw.com. Yeah. It's like this like slow moving, collaborative kind of like hallucination experience and it just never ends. I mean, yeah. I'm probably going to be funding Fal completely for the next, you know, their Series A or something like that. [00:47:55]
Swyx: I don't know. I have a healthy respect for like the amount of processing that must be going on behind these things. [00:48:00]
Steve: Yeah. Well, what's funny is that like, yeah, we're using like Cloudflare workers to do the updates and the CRDTs to do the collaboration and all this like whatever LCM models to populate this image or create this image. But there's also a laptop in my living room right now that is doing the actual screenshotting and sending that up. And so there's a big note that I had to write, you know, for my family to say like, don't turn off this laptop. Don't close this laptop because this needs to be on in order for this thing to work. And no matter how good our tech stack gets, we'll always come back to some laptop stuck in the corner that can't possibly be turned off. [00:48:39]
Swyx: That's pretty fun. [00:48:40]
Steve: Yeah. [00:48:41]
Swyx: I've heard of major businesses being run that way. Yeah, exactly. [00:48:43]
Steve: Raspberry Pi in the closet. [00:48:45]
Swyx: Yeah. You know, it's weird because it's really funny because like, you know, you are inventing your own art form. This is fine art. You know, going back to your degree, it's just a different kind of art. [00:48:54]
Steve: It's funny because like the output of this, like while it is like a visual output, the output like doesn't actually matter. Like it's gone in 16 milliseconds and it's not coming back. And I think with all this AI stuff right now, just where we are with it and just how completely unknown it is in terms of like, where is this useful? Like the best thing that you can get out of this is like the experience. And so I think of this much more as like, you know, the thing that people will walk away from, from playing with like lens.tiltro.com should be more of like that experience of having interacted with this thing or interacted with it, you know, among with others rather than like, oh, there's made my favorite image or something like that. I don't know. As a former image maker, like the idea of having, having like an aesthetic experience where the image is a major part, but it's, it's not necessarily like the important part or any one of these images isn't the important part. I don't know. There's something new feeling about this. Kind of fun. Certainly. I wish I could do a big critique with all the new media artists, people about this and about like what, you know, where does this fit into the sort of the, uh, like other people's work, et cetera. [00:50:03]
Swyx: That's for them to write. And, you know, for you to build, you know, I would encourage you to keep building there because you're definitely well done with your explorations. I can sort of round it out by sort of looking towards the future. You hinted a little bit, uh, you're working towards TL draw 2.0. So first of all, actually, it seems like you're very focused on the core mission of Canvas and the AI stuff is, is a side project for now. Why not pursue it as like a full, why not pivot and like be an AI company, right? Like that's, that's, I'm sure you've got a lot of those questions. Yeah. [00:50:35]
Steve: Yeah. I mean, when you, when you get something as viral as, as tldraw got, like I think I've talked to everyone, certainly every, every investor and yes, we, we probably could on for something like together or that draw fast thing, make a tiny little SaaS app, you know, give me $10 a month, play with this thing and you know, could make it, make it good. We could go in that direction. There's not much of a moat around any of this stuff. And we're seeing that just in, you know, I don't know, Gemini is going to come out in a couple of days, weeks or whatever. And if it's better than people are just going to use that until the next better thing comes along. Like there's not a lot of like unique defensible about like, Hey, it's an, it's a drawing app plus an LCM like model because there's going to be a lot of those models and there's going to be a lot of drawing app. The thing that I think is really unique for tldraw, the thing that we have added that is not easily created is the canvas itself. Is that like web-based, hackable, extendable, super refined interactions and all that stuff like all the thousand table stakes features that drive people nuts when building something like this, like they're all there, they're all good. Day one, you could build a really great experience, whether it's AI driven or not, like using tldraw in a way that it's just not practical to do if you're building it yourself. And especially if you're not doing like graphic stuff, there's really not that much else out there oriented towards this type of thing. And I think in a world where these types of AI driven capabilities are just going to keep coming out faster and faster, you know, I don't know, next year is going to be wild. Like every month there's going to be some new, you know, capability or something. The thing that I would want to see both just me as a person and as me as having built a business that I've built is for tldraw to sort of become the place where some of this prompting, some of these ideas are explored. Even if we decided to, okay, we're just going to close everything up, we're going to build a product based on this, and maybe it's a great product, but it would only be one direction, one ray kind of into this infinite space of possibility. And that could be successful, good, but like, I mean, we've built the sort of the direct manipulation core, but there are so many, even like AI specific APIs that we could build around tldraw for having, you know, like a virtual collaborator or working with images in a more rich way. There's just so much that we could build in order to make this the best possible place to explore, not just one direction, but like, you know, many, many, many directions. And I think that narrative, that gets me much more excited. And I think we're also just like the team that we have and the tech that we have and the skills that we have, we're more of the team to build that rather than like to become like a SaaS product company. I'm not saying we'll never do like a, you know, pay us 10 bucks a month and we can, you can play with our magic toy, but primarily my goal is to make tldraw either the place to explore these different models, or you might think of it as like the battleground on which the winners will be kind of identified. Like right now we're using open AI for the make real thing. Maybe next week we'll be using Gemini and now it's, now it's a question of, okay, well now we have an environment in which to compare these two models with the same input and a very advanced form of input. But yeah, like, let's see which one does better. Now, nothing would make me happier than to be at sort of like the battlefield for multimodal prompting and multimodal AI experience. [00:53:58]
Swyx: I should also shout out Baklava as the open source vision and multimodal model. So I fully understand you want to, you want to own the light cone of multimodal prompting. I think that that'll probably be the title of the episode. What's coming up for tldraw 2.0? [00:54:15]
Steve: So really the tldraw that you are using now and that I'm using are basically 2.0. It's been in pre-release for a long time. Really the only change that's going to happen once we launch it is we're going to start selling commercial licenses for it. So if you are using tldraw in a commercial product or if you want to, then, you know, if you're funded or if you have revenue, then you'll buy a license and I'll add you to our special list of customers. So yeah, it's mostly just go to market and the necessary changes around there. There will be some kind of fun changes, secret saucy changes that launch, but nothing substantial, nothing breaking. We've put a lot of effort in the last, like it's crazy that we've only had an open source since May of this year, this new version, right? And we've been very busy since then, but it is, it's stable, it's robust. We put it through a lot of usage and caught a lot of the issues. So it's absolutely ready to go. But I have a one or two conversations with my lawyer before we, we turned, turn over the license and start, start moving it that way. Gotcha. [00:55:12]
Swyx: And then maybe I think if I could get your commentary before we close on just the competition out there, like you are not the only sort of canvas tool. I think I get now that I was going to ask about like Figma, FigJam, and they have some AI thing that they're also doing. I think Adobe is also working on similar things. Canvas also working on similar things, but they're all individual point solutions, whereas you're more the open source canvas to power all of them. I feel like it's just Excalidraw. That's like the other alternative that's, that remains. [00:55:41]
Steve: I think Excalidraw, and I like Excalidraw a lot, I contributed there and we, we retweet each other and tease each other on Twitter. And early on, I was copying features from them. Now they're copying features from me, so I, but no, it's the collaboration space is so, has so many dominant players, like that I, I think me and Excalidraw are tiny within that. There's two things. One is that we made this very strange bet on using a kind of a web canvas that our canvas is not like an HTML element or HTML canvas element. It's like normal React components all the way down. So if you wanted to add something interactive and have that participate in the sort of space of the canvas, the way that we were doing our iframes, kind of like being able to write on top of an iframe, you can't do that in Excalidraw. You can't do that anywhere. That is like a very strange tech choice that we made around tldraw that is, you know, finding its home in a few different ways. Most of the people who pick tldraw and approach me like the inbound that I get are folks for whom that's like the killer feature, be able to put interactive widgets on the canvas using just React. No matter how good Figma's like AI solution is, and I hope it's great because I love Figma and I use it, it's not going to solve every possible problem in this space. It's not even going to like touch, you know, like you can't like none of these things. And I mean, I already had identified like, OK, there was a point where like any Kanban board was like was Trello, right? When you when you talked about Kanban boards, you were talking about Trello. Kanban boards are in every productivity app now. I think the same thing is going to happen with collaborative whiteboards. It's like people like them. I'm making it easy. People are already doing it even without tldraw when it's hard. Like, like, yeah, that's going to become a kind of a commodity user experience in a lot of different products. Probably, you know, give me a diagram from a text prompt like, yeah, that is probably going to be a commodity to give me an image from a text prompt like, yeah, that's just going to be everywhere. We're just going to assume that that's, you know, it's like adding a GIF to a to a chat or something that there's no mode there. I do hope that Figma has an amazing AI integration, but I think the thing that it will help you do is use Figma, like generating an image won't be super useful, but like generating it now, autocomplete this design absolutely would be. And I hope to launch something amazing there. But yeah, like I said, there's just a million different directions that this stuff could go in. The canvas is just like a input device that allows a certain type of user experience. And that's certainly not limited to design. That's not limited to whiteboarding. It's not limited to collaboration or anything like that. Yeah, my hope is that there are those like 10,000 products that could be made with what we're making. [00:58:26]
Swyx: Yeah. That's a really great mission. And I see why you're so passionate about it. You're the right team for it. Okay. You know, a couple of lightning round questions. One, which is like, if you had some AI capability that you would wish for that you don't have yet, what would it be? [00:58:39]
Steve: Oh, that's a really good question. [00:58:42]
Swyx: Helps people to do some research. [00:58:44]
Steve: Yeah. I think probably related to, it's not quite a CRM, but like a human, just normal relationship management. This is something that I've never had a problem with until I had a startup, actually, where there's just a lot more people involved in my life. And it's hard to keep up with them all. And I think this is probably something that like an EA kind of does of saying like, hey, there's a birthday coming up or something like that. But also just, you know, identifying opportunities to work together, to connect or who's an expert on this thing that I'm working on, like that doesn't always occur to me. And I think the value of your network, that even if you're good at that, you're probably only scratching the surface of like, you know, how you could be helping the people around you and how they could be helping you based on like the specific context of like what you're working on and the problems on your table today. Yeah. [00:59:36]
Swyx: I've also wanted to build a CRM on top of Twitter because you have all the info there about what people working on your past conversations with each other and your shared interests. You know, like a bare minimum to search it, but to proactively suggest is the next layer. And I guess AI chief of staff, AI executive assistant, something like that. I think like some people are working on that, but the problem is so big that they're working on like the automation piece. So like Lindy, I had at my conference where they're like, it's a virtual assistant that you can trigger on your desktop or via email. And it mostly deals with scheduling, but also helps you do a little bit of research. So that, yeah, I think the agents field will progress there. We might take 10 years to do it. Yeah. [01:00:19]
Steve: Yeah. I can wait. It's all good. [01:00:22]
Swyx: And then finally, advice for founders, like what has helped you the most as a founder, you know, you're two years into your journey. [01:00:29]
Steve: Yeah. So this, this kind of comes a little bit out of what you learn in art school type of thing. But yeah, but one thing is that basically like when you're a studio artist or you're in a studio or whatever, there's no external constraints. You just kind of are running on, well, what do I feel like working on? And the further you get away from like, what do I feel like working on kind of like the worse your work becomes. So having like a really good feeling for that sort of desire and being able to respect and follow that desire as like, because it's not arbitrary. Is that like, if you really, really feel like working on thing, like that might be the sort of the tip of a very complex iceberg of analysis of like the field or like what people are talking about or something that you, directions and market or something like that. Like I don't know, I think with, with tldraw and with, as, as a founder on this, the thing that I've tried to do and I've tried to preserve is like being able to prioritize based on like what is most interesting right now. And that is, that is true for what code we write and like what features we work on. That's true for like which partners we, you know, we spend time with in terms of who is using tldraw, the types of problems that we want to solve, like using your own sort of sense of what's interesting as a filter and what you want to work on, like what sounds like a fun thing to work on right now as a filter. It's not naive and it can be kind of part of your, your secret sauce. I think a lot of early founders are encouraged against that and to, to be working backwards from a certain outcome and all that. And yeah, you do have to have to do that. You have to put that into the, into the mix as well, but be sure that you're, you're picking the best parts out of that mix. I don't know, the parts that you want to work on. [01:02:12]
Swyx: Well, I mean, what's the point of doing this if you don't have some fun, indulge your curiosity. [01:02:16]
Steve: Yeah. The worst case, you'll, you'll build something that you love. Yeah. Yeah, exactly. Good things can come out. Good things can absolutely come out of like. [01:02:24]
Swyx: You had an 8,000% increase in your followers or something. [01:02:29]
Steve: Yeah. Yeah. If you're a, if you're a sub stack reader, the tldraw sub stack 72 hours into this big make real virality explosion, I sat down and wrote a blog post and I, I wanted to at least capture that, that vibe of what it felt like in the middle of that, that hurricane. So it's, it's a pretty fun one. Very special. It's good to read back. [01:02:48]
Swyx: Well, I'm sure it's not the last time we'll see you do something crazy viral. I'm sure that a lot of people will be exploring tldraw. I hope a lot of people, honestly, one thing I'm thinking about is like embedding tldraw into my, my input box. I can't tldraw be like, you know, part of the input. [01:03:05]
Steve: Hey, I'm, I'm talking to the good folks over at OpenAI tomorrow. Fingers crossed. Maybe we, maybe we get it in, inside of a chat GPT or something. Cause yeah, like, [01:03:15]
Swyx: I need to, I need to move faster. [01:03:17]
Steve: Like what? You want to like take a drawing or take a photo and then annotate it or like, you know, sketch something out. You should be able to do that. [01:03:29]
Swyx: It's yeah, exactly. [01:03:31]
Steve: Yeah. It's just a good, it's just a good thing. Yeah. The people cry out for it. I failed it fast enough. [01:03:38]
Swyx: Well, thank you for inspiring the rest of us. Thank you for everything. And I'm sure we'll, we'll hear from more from you over the next few years. [01:03:46]
Steve: So thanks. [01:03:46]
Swyx: Thanks for your time. Awesome. [01:03:48]
Steve: Thank you for your time. [01:04:01]
Listen to all your favourite podcasts with AI-powered features
Listen to the best highlights from the podcasts you love and dive into the full episode
Hear something you like? Tap your headphones to save it with AI-generated key takeaways
Send highlights to Twitter, WhatsApp or export them to Notion, Readwise & more
Listen to all your favourite podcasts with AI-powered features
Listen to the best highlights from the podcasts you love and dive into the full episode