As I work on more AI content generation installments, here are some quick thoughts spurred by OpenAI’s newly announced, open-source(!) speech-to-text model, Whisper.
There was a golden age of web development in the early 2000s, when rapid advances in tech and tooling made it incredibly easy for even the least talented tech worker to churn out a clickable, singing, tap-dancing web page demo. Those demos could impress even technical upper-level managers who just weren’t current enough to grasp how easy it had gotten to make something that looked really slick. (Some of my older readers did a bit of this back in the ColdFusion days… real ones know.)
AI content generation is in this midwit programmer phase right now. It doesn’t take much in the way of tech mojo to stand up a basic web front-end for a machine-learning-backed service that performs absolute miracles. Or if you have an existing product, you and a team of 0.01X programmers can roll out a pretty spectacular new feature by just adding a few new API calls.
This dynamic is why this space is not only going to keep growing but is also poised to support a meta-explosion of pick-and-shovel plays that help users with discovery and evaluation. (Hi, yes, it’s me. I will be doing some of that on this Substack.)
To be fair: Many of the teams behind the current wave of web frontend + ML API apps are in fact gearing up to tackle their own set of Hard Problems. This easy stuff is just what it looks like when you’re trying to put boots on the ground and start attracting early adopters. At some point, one of these weekend-hackathon-looking apps will morph into something truly novel and interesting that everyone else tries to copy.
So what I’m sort of trollishly calling “the midwit programmer phase” is really more like the early exploration phase. When it comes to commercializing a new technology, whether it’s HTML or ML, you have to start somewhere and then iterate into something better as you explore the problem/solution space.
You can get a sense of how this is playing out right now by checking out the Airtable from my first article on AI content generation, a list of apps and models that I’m still adding to. There’s also GPT3Demo, which is a great resource for exploring the space, and one that I hope will be maintained.
(If you come across any more of these types of collections, please drop them in the comments or ping me on Twitter. I’d love to add to this list.)
Decentralized AI and the new features flood
The space of ML-powered web apps was already pretty hot when the only real ML available to developers was walled off behind APIs for carefully gatekept, centralized models. (I’m looking at you, OpenAI.)
Then came Stable Diffusion.
What we’ve seen since Stable Diffusion launched is an explosion of new apps and projects. The new model’s open-source nature — anyone can run it, retrain it, tweak it, customize it, host it, etc. — instantly solved at least three business model problems that plagued the centrally hosted, API-accessible models:
It’s risky to build your entire business around a single, unique API from a single provider that is not one of Amazon, Facebook, Google, Apple, or Microsoft. And even with these BigCos, you’re still taking a business risk, but there’s some mitigation in that you can often replace the service if you lose access to it. Not so if you’ve built your entire business on a single OpenAI API.
You’re forced to pass on to your own users the metered billing that the upstream API is using. In other words, your users are going to have to purchase credits on your platform, and then spend those credits doing things. Even if you don’t want to be locked into this model for whatever reason, you kinda are.
You may be making desktop software that’s standalone, and you don’t want to be dealing with an API, much less a metered one. One moment, your customers are happily using the app that they paid for; the next, they’re being asked for micropayments in order to unlock the new, ML-powered hotness.
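To make the metered-billing problem concrete, here’s a rough sketch of what the pass-through looks like in code. The prices, the CreditAccount class, and the generate_image function are all made up for illustration; none of this reflects any real provider’s pricing or API.

```python
from dataclasses import dataclass

# Hypothetical numbers for illustration only; not any real provider's pricing.
UPSTREAM_COST_PER_CALL_CENTS = 2  # what the upstream API charges you per call
MARKUP_CENTS = 1                  # your margin on top of each call


@dataclass
class CreditAccount:
    """A user's prepaid balance, in cents."""
    balance_cents: int

    def charge_for_call(self) -> int:
        """Deduct the price of one upstream call and return the amount charged."""
        price = UPSTREAM_COST_PER_CALL_CENTS + MARKUP_CENTS
        if self.balance_cents < price:
            raise ValueError("insufficient credits: user must top up first")
        self.balance_cents -= price
        return price


def generate_image(account: CreditAccount, prompt: str) -> str:
    """Bill the user first, then (pretend to) call the metered upstream API."""
    account.charge_for_call()
    # A real app would make the upstream API request here.
    return f"<image for {prompt!r}>"
```

Every feature that touches the upstream model has to go through something like charge_for_call, which is why the credits model tends to leak into the whole product whether you want it there or not.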
Open-source models like Stable Diffusion and OpenAI’s new Whisper fix all of these problems at once.
It’s safe to build on these models because you don’t have to worry about a single provider acting as judge, jury, and executioner for you and your investors.
You can design the billing to work however you want, and you can even compete on price if you’re willing to be aggressive about finding or building cheap hosting.
Finally, if you make desktop software that runs locally, you can incorporate new ML-powered features with zero external API calls or billing worries.
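As a rough sketch of that last point, here’s roughly what wiring a locally run Whisper model into an app feature could look like. It assumes the open-source openai-whisper Python package (plus ffmpeg) is installed; the add_captions feature and the Transcriber type are hypothetical names I’m using for illustration.

```python
from typing import Callable

# A "transcriber" is any function from an audio file path to text.
Transcriber = Callable[[str], str]


def make_local_whisper(model_name: str = "base") -> Transcriber:
    """Build a transcriber that runs Whisper on the local machine.

    Assumes the open-source `openai-whisper` package (and ffmpeg) is installed.
    The import is deferred so the rest of the app works without it.
    """
    import whisper  # third-party; only needed when this backend is used

    model = whisper.load_model(model_name)

    def transcribe(audio_path: str) -> str:
        # transcribe() returns a dict; the full transcript is under "text".
        return model.transcribe(audio_path)["text"]

    return transcribe


def add_captions(audio_path: str, transcriber: Transcriber) -> str:
    """Hypothetical app feature: turn an audio file into a caption line."""
    text = transcriber(audio_path).strip()
    return f"[caption] {text}"
```

Calling add_captions("talk.mp3", make_local_whisper()) runs entirely on the user’s machine after the one-time download of the model weights, with no API key and no metering. Because the feature only depends on the Transcriber shape, you could also swap in a hosted backend later without touching the rest of the app.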
The upshot is that we’re going to see a lot of new ML-powered apps and features in the coming weeks and months, as everyone who can do some basic web coding piles into this market. It’s a classic gold rush, where everyone is scrambling to just stake out territory (users, branding) and then they’ll figure out how to mine it.
What I’m probably most excited about is the fact that new features based on these models will start showing up in existing products via plug-ins and new releases. I’d say that by the end of the year, any app that’s popular for image editing and drawing will get AI superpowers. My kids will be using Procreate to do things like in the video below:
Indie and open-source text editors and word processors will also start getting text generation and transformation features natively built-in so that advanced phrase completion will no longer be limited to Google products.
Finally, programmers are already piling aboard the ML-assisted coding train in droves. GitHub Copilot is being rapidly adopted in my circles, and there’s an extension for VS Code that lets you use it right inside the popular editor. These tools really do work, and they’re going to make software better. They’re also going to slightly increase the speed of the aforementioned app and feature roll-outs.
A huge opportunity right now is definitely to take an existing growth channel, build a small UI layer on top of GPT-3, and market it.
In terms of defensibility, there is none. You're still dependent on the API layer, but if that layer charges too much, people will move to build competing models, or they'll keep building similar apps until the market gets saturated.
Who will be the Pets.com of AI?