Late last year, in the first part of this wrap up of technology in 2022, I looked at the fall of Twitter and rise of Mastodon.
This week, to round out 2022 and kick off 2023, let’s focus on a second significant transformation taking place right now, one that feels like it will resonate for some time to come–ChatGPT, Stable Diffusion and increasingly convincing generative AI.
It’s remarkable how much has happened in this area in the month or so since I started writing this, and how widely diffused the knowledge and use of ChatGPT in particular has become.
But Clubhouse came and went in a hot minute a couple of years back–is ChatGPT, and more generally generative AI, whether text or image (or music, or moving image, or 3D model, or…), something that will have a long-term impact? Or be a similar flash in the pan?
ChatGPT, Stable Diffusion and increasingly convincing generative AI
30 or more years ago I lived in Italy. I didn’t really speak Italian (over the couple of years I lived there I became functional, but I wasn’t going to be reading The Inferno in Italian any time soon).
A colleague had a language agency, where he translated all sorts of things including Brevetti (patents). Because I had something of a science (and law) background he asked whether I might edit the translations of Italian patents into English that his translators had done.
I tried. But they basically made no sense. I mean all the words did. And the sentences were grammatical. But essentially meaningless.
So I asked to also be given the original Italian patents to help, and increasingly found myself translating the patents myself (the language, while technical, was in a sense quite simple, plus all those years of high school Latin didn’t end up entirely going to waste!).
For some reason, this ancient episode from my life has recently come to mind when thinking about ChatGPT–and how the translators understood the language they were translating, but not the concepts, while I barely understood the language, but did understand the ideas being conveyed.
But let’s back up–ChatGPT. You’ve likely heard of it (or of GPT-3, which has been around since mid-2020, and earlier incarnations from OpenAI for longer, but GPT-3 was widely considered the real breakthrough in this technology).
GPT stands for “Generative Pre-trained Transformer”, a relatively recent approach to machine learning. GPT-3 generates text based on prompts by the user. The generated text is predictive–GPT-3 is trained on an enormous database of text including billions of web pages, Wikipedia, and large databases of books.
Then, based on what is most likely to come next (‘likely’ is doing a lot of work here), GPT-3 generates text. Which might sound preposterous on the face of it–essentially saying what on average comes next, based on billions of examples; how can that work? But it turns out this generally leads to very persuasive text.
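To make the idea of ‘predicting what comes next’ concrete, here’s a toy sketch in Python: a bigram model that counts which word follows which in a tiny corpus, then extends a prompt by always picking the most frequent follower. This is emphatically not how GPT-3 works–it uses a vast transformer neural network, not lookup tables, and the corpus and functions here are invented for illustration–but it captures the framing of text generation as next-token prediction.

```python
from collections import defaultdict, Counter

def train_bigrams(corpus: str) -> dict:
    """Count, for each word, which words follow it in the corpus."""
    words = corpus.split()
    follows = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows

def generate(follows: dict, prompt: str, length: int = 10) -> str:
    """Extend a prompt by repeatedly picking the most likely next word."""
    out = prompt.split()
    for _ in range(length):
        candidates = follows.get(out[-1])
        if not candidates:
            break  # nothing ever followed this word in training
        out.append(candidates.most_common(1)[0][0])
    return " ".join(out)

corpus = "the cat sat on the mat the cat ate the fish"
model = train_bigrams(corpus)
print(generate(model, "the cat", length=3))
```

Scale the corpus up to billions of pages, and the ‘most likely next token’ up to a probability distribution learned by a neural network, and you have the rough shape of the thing.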
Before we go on, why not give it a try? Head over to OpenAI, and create an account if you haven’t already. It’s free, no credit card required.
OK, now we’ve added to the record number of people who’ve tried ChatGPT since its launch a few weeks ago, is there something more interesting there? Or is this largely vapid hype, like ‘Web3’?
As I’ve said more than once, one advantage of getting older (other than it beats the alternative) is you have seen more and more happen, and have more experience to compare current events with–and while history may not repeat itself, it does, as Mark Twain wryly observed, rhyme.
So what does ChatGPT rhyme with? The early internet? Web based search? The early PC? The early Web? Or early push technologies?
An analogy I’ve used, and seen others use as well, is the idea of power tools. A power drill doesn’t enable anything fundamentally different from a hand-operated drill–except scale. The same is true of a powered saw, and other such tools.
Challenges and Opportunities
New technologies, and in particular their widespread adoption, tend to bring out two types of response–excessively optimistic, and excessively pessimistic. We’ve seen both on display in recent weeks. Which is not to say there’s no merit in either position, but neither, of course, is likely to capture the full picture.
So briefly, here are some of the challenges that have been given wide consideration in recent weeks, and then let’s turn to some thoughts about the opportunities these tools present, in concrete, actionable terms.
Let’s start with the more pessimistic. Concerns about generative AI have been raised on a number of fronts–from educators worried that students will no longer do original work, to artists and other creators concerned about the impact on the value of their work, and about the ethics and legality of training these models on copyrighted material.
Intellectual Property and ownership
These models are trained, as we saw, on huge databases of other people’s work. In the case of GPT, and at least some of the generative art models, a lot of this is copyrighted, or otherwise protected by intellectual property law (some designs might be trademarks, for instance). In the case of Copilot, GitHub’s GPT-based generative code creation tool, while the claim is that it’s trained on open source code, this hasn’t stopped a class action lawsuit being brought against them.
This is doubtless a thorny legal and ethical issue, one that will play out over years (lawsuits of any significance take months, even years, to finalise, especially when they present novel challenges, and legislation takes longer still).
There’s less likely to be a direct impact on users of these technologies, though I suspect the first piece of music produced by a music generation AI that becomes popular will likely find whoever ‘owns’ that music facing lawsuits–something that’s become increasingly common for music produced by humans.
There’s a flip-side to this too–it’s not at all clear that text, images, music and so on produced by AI can be copyrighted.
An economic system founded on the idea of property rights, including in ideas (patents) and their expression (copyright, trademarks), faces a very serious challenge when those ideas and expressions can be generated at scale by machines and yet that output belongs to no one.
Accuracy and ‘hallucinations’
As folks experimenting with and exploring the use of ChatGPT quickly found, its output can be very plausible and persuasive, but often not entirely accurate, and difficult to verify. Folks have begun (since late 2022) to use the term ‘hallucinating’ to describe “a confident response by an artificial intelligence that does not seem to be justified by its training data”.
Efforts exist to address this–having systems provide citations for example (though there are examples of ‘hallucinated’ citations in GPT responses).
This seems more an artefact of the current state of these systems, than a fundamental challenge of all such systems, and we’re likely to see future versions of GPT and other models capable of citing their sources (rather than simply inventing them).
An even darker Web
The dark forest theory of the web points to the increasingly life-like but life-less state of being online. Most open and publicly available spaces on the web are overrun with bots, advertisers, trolls, data scrapers, clickbait, keyword-stuffing “content creators,” and algorithmically manipulated junk.
–Maggie Appleton
Algorithms run the web and its economies–the auction for our attention, the engine that powers the economy of the Web, has little human input, at least not in real time. And as Maggie Appleton observes, the content of the Web too, designed to appeal to the algorithms that decide how much to pay for ads and where, is increasingly machine generated. GPT-like systems threaten only to accelerate this ‘Red Queen Dilemma’.
Will ‘truthy’, plausible sounding content, generated at a scale far beyond what’s possible by even teams of humans, swamp the web, and the algorithms of search engines? And if so, what implications will that have?
The impact on Education
If you know anyone in, or have anything to do with, education–particularly higher education–you’ll know that plagiarism, and students having others complete their assessment tasks for them, is a considerable challenge for educators. But it’s also risky (it’s serious academic misconduct that will have significant ramifications if you’re caught), and not inexpensive. Having AI that can essentially research and write assessments for you, at a fraction of the cost and effort of having humans do it, is doubtless a significant challenge to what education, and in particular assessment, looks like.
With concerns over Wikipedia and even calculators still not uncommon among some educators, I imagine the implications of, and responses to, AI systems will evolve over years and even decades. Initially, I imagine use of these tools will commonly be banned (several states in Australia have already banned the use of these technologies in public schools), and systems for detecting their use will emerge and become widely adopted (the anti-plagiarism industry, even at high school level, is enormous), while fewer folks might ask “what does education look like when these tools are widely available?”
In just the last day or two, OpenAI announced a new classifier ‘trained to distinguish between AI-written and human-written text‘ and then literally today an AI researcher found that this classifier thinks an AI wrote Macbeth.
Another Red Queen challenge.
Phew–that seems rather gloomy. So what of the opportunities? Of course there are many big picture ones we might imagine–the long-term stuff of science fiction. And that’s exciting and fun to explore, but my focus is the here and now. What might these technologies enable, and how might you take advantage of them in the short term–to make what you build better?
I’ve used the analogy of these AI systems as power tools. A power drill, power saw, these don’t really enable anything particularly new–humans have been drilling and sawing for millennia.
But something happened when we added electricity (or, earlier still, steam power) to mechanical processes that until then required human (and animal) muscle power to complete. Flooded land, until then uneconomical or impossible to drain, became farmland. Miners could mine deeper, below water tables, as it became possible to drain mines. OK, so maybe these weren’t entirely good outcomes, but this was one of the key drivers of the industrial revolution.
Already, systems like GPT enable things that, while otherwise possible, are so time consuming as to be prohibitively expensive to complete, or available only to very well resourced organisations.
Here are a few examples I’ve been using, or exploring, in recent months–in particular with our video streaming platform, Conffab.
Transcriptions and captioning
This one is relatively straightforward, and in a way obvious, but it’s also instructive.
For years we’ve transcribed our conference presentations. For a lot of that time we used human-based services like Rev. We kept our eye on AI-based transcription, but its relative inaccuracy meant the amount of work editing transcripts made it unfeasible.
But regardless of its relative accuracy, human-based transcription has a couple of significant drawbacks. It’s relatively expensive (this is not a criticism, it’s an observation about the cost of such services)–which means you tend to be judicious about what you have transcribed. And it takes time–hours to a day or more to turn around, say, a 40 minute presentation.
A couple of years ago we started using a service called Descript. Initially the accuracy was solid, though the results still required a reasonable amount of editing, but two things were fundamentally different, and transformative for us.
First, the turnaround time went from hours or more to minutes. And the cost went from a relatively high marginal cost (every minute of transcription cost us money) to a fixed monthly amount covering more than we’d need. And over the 2-3 years we’ve been using Descript, the accuracy has improved dramatically.
In a similar vein, we recently live-streamed Web Directions Summit. For us, going without real-time captioning would have been a deal-breaker, so we explored the options.
With a six-track conference, human-based captioning was genuinely unaffordable, and incorporating it into our streaming workflow would have required a really significant amount of engineering.
But Mux had recently introduced real-time AI-based captioning, so we went with that (we also explored services like assembly.ai). It was impressively accurate, and had much less lag than human-based real-time captioning is often prone to. In essence, it made streaming feasible for this event, when it otherwise simply would not have been affordable, due to the cost of captioning.
Chapters and summaries
One goal we have for Conffab is to make presentations valuable even without necessarily having to watch them in their entirety (or sometimes at all).
One feature we’ve long had on our roadmap to help with that is ‘chapters’ and summaries you could read, then, if a part of the video seemed relevant, jump straight to it.
This is something that takes a lot of human expertise and effort–and with nearly 1,000 presentations to date, and hopefully many more in future, not something we had capacity for.
Recently I’ve been exploring the use of ChatGPT, and services like assembly.ai and Deepgram, to provide chapters and summaries, and we’ve started adding this feature (we went with assembly.ai due to how it did the chaptering). It’s not perfect, but it provides a starting point for us, or speakers, to build upon, cutting down work dramatically. And as with the dramatically improving accuracy of transcription over the last couple of years, we expect this to only improve.
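To give a sense of the plumbing involved: chaptering services of this kind typically return each chapter as start and end times plus a short headline. The field names, millisecond units, and sample headlines below are assumptions for illustration, not any particular service’s actual response–but given data in roughly that shape, turning it into a WebVTT chapters track a video player can consume is straightforward:

```python
def ms_to_timestamp(ms: int) -> str:
    """Convert milliseconds to a WebVTT HH:MM:SS.mmm timestamp."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"

def chapters_to_vtt(chapters: list[dict]) -> str:
    """Render chapter markers as a WebVTT file, usable as the source
    of a <track kind="chapters"> element on an HTML video player."""
    lines = ["WEBVTT", ""]
    for i, chapter in enumerate(chapters, start=1):
        lines.append(str(i))
        lines.append(f"{ms_to_timestamp(chapter['start'])} --> "
                     f"{ms_to_timestamp(chapter['end'])}")
        lines.append(chapter["headline"])
        lines.append("")
    return "\n".join(lines)

# Hypothetical chapter data, in the assumed shape
chapters = [
    {"start": 0, "end": 185_000, "headline": "Introduction and speaker background"},
    {"start": 185_000, "end": 512_000, "headline": "Why accessibility matters"},
]
print(chapters_to_vtt(chapters))
```

The point isn’t the dozen lines of glue code–it’s that the expensive, expert part (deciding where chapters fall and what they’re about) is now a service call.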
The right question
These are some of the ways we’re using generative and related technologies–like power tools, they take ideas and features that could be valuable, but are simply unfeasible in terms of cost or effort, and make them feasible.
To get more of a sense of what’s possible, GPT-demo has hundreds of examples that might provide inspiration, or a solution to a challenge you’ve been facing–why not take a little time exploring what folks are doing right now?
My sense is this is very much a beginning, of something genuine and valuable, perhaps even transformative.
An important lesson I have taken from seeing mobile, the Web, and even the PC in their early days is that when you think something might have this sort of value, explore it–this is when real opportunities exist to do something new and interesting.
So seize that.