Can You Only GenAI Your Way to the Middle?
Should we take seriously a recent study that shows people like AI-generated poetry? And what are the broader implications? (Issue #143)
Before we get to today's main topic, some miscellaneous goodies and things worth your attention…
I'm dedicating this issue to my friend Jeff Minsky; Jeff, thanks for a terrific conversation a few days ago.
Bluesky is having a moment as folks who, like me, are suspicious of Elon Musk flee Twitter/X looking for a clean, well-lit place for online discourse. I've been on Bluesky for a while and like it. Kevin Roose has a nice piece in NYT ($) explaining the business and privacy differences between Bluesky and Twitter/X.
I hope YouGov does an updated version of this fascinating 2022 study about how "Americans overestimate the size of minority groups and underestimate the size of most majority groups." Some examples: respondents think that 21% of Americans are trans, when only 1% are. Respondents think that 30% of Americans are Jews, when only 2% are. They think that 41% of Americans are Black, when only 12% are. And they think that only 58% of Americans are Christian, when 70% are.
What's missing from this study is a different tranche of data about how much media attention is directed at these different groups.
After winning a huge antitrust case, the Justice Department wants Google to sell its Chrome browser, to stop paying billions to be the default search engine on other browsers and smartphones, and to give other search engines access to its data. While this sounds big, it could and should have been even bigger. Justice should have forced Google to spin out YouTube (the world's second biggest search engine), and it could have done more to loosen Google's death grip on the pipes that all online advertising flows through. WSJ's ($) coverage is here, and NYT's ($) coverage is here. See also Peter Horan's essay, "Has Google Finally Abandoned Publishers?," on the impact that Google's business practices have on online publishers.
On the lighter side, John Mulaney sharing anxious texts from his wife Olivia Munn—with Munn embarrassed but laughing at herself—is hilarious (Facebook Reel). However, I wonder what the ride home was like...
Another Facebook Reel that is dirty but also very funny: "It's football Sunday and you've been reading too many Romance books." NSFW and also NSFK (Not Safe for Kids), but I laughed.
Practical Matters:
Sponsor this newsletter! Let other Dispatch readers know what your business does and why they should work with you. (Reach out here or just hit reply.)
Hire me to speak at your event! Get a sample of what I'm like onstage here.
The idea and opinions that I express here in The Dispatch are solely my own: they do not reflect the views of my employer, my consulting clients, or any of the organizations I advise.
Please follow me on Bluesky, Instagram, LinkedIn, Post and Threads (but not X) for between-issue insights and updates.
On to our top story...
Can You Only GenAI Your Way to the Middle?
A few days ago, La Profesora sent me an intriguing link to a Poetry Turing Test set up by a couple of philosophers at the University of Pittsburgh. The test is a simple Google Form that presents the visitor with five poems written by famous human poets and five generated by GPT-3.5 in the style of famous human poets.
These are the same 10 poems the philosophers (Brian Porter and Edouard Machery) used in their recently released, open-access Scientific Reports paper, "AI-generated poetry is indistinguishable from human-written poetry and is rated more favorably," the title of which neatly summarizes its conclusions.
You can also read a clever November 14th article in The Washington Post ($)—"ChatGPT is a poet. A new study shows people prefer its verses. Shall I compare thee to a ChatGPT poem?"—where different interviewees gnash their teeth or shrug at the prospect of innocent people taken in by algorithmically generated poetry.
The test is a fun exercise. You should take it immediately. Go ahead. I'll wait.
Back so soon? Good.
Let me confess right away that, equipped with a Ph.D. in English Literature, I did terribly on the test: 60%, a D-! Egads.
I misidentified four poems:
I thought an AI-generated poem "in the style of Byron" was written by a human.
I thought a poem by Dorothea Lasky was AI-generated.
I thought a modernized poem by Chaucer was AI-generated—but here I quarrel with the results because if the poem had been presented in Middle English, I might have had a better shot.
I thought an AI-generated poem "in the style of Walt Whitman" was written by a human.
Longtime readers of The Dispatch might recall that I performed a similar exercise in February 2023 when I asked Bing (powered by ChatGPT) to write a Shakespearean sonnet. As I refined the prompt, the Bing-generated sonnet got closer and closer to the formal properties of a Shakespearean sonnet while simultaneously becoming less and less interesting. (I called it "Hallmarkian," heh.)
Not much has changed.
It doesn't surprise me that, in the absence of any context beyond "AI or Human?," subjects would prefer AI-generated poetry. By framing the experience of reading the 10 poems as an identification exercise, Porter and Machery biased readers into thinking in terms of typicality rather than uniqueness, and typicality is where Generative AI excels.
In other words, ChatGPT generating poetry "in the style" of famous human poets is a regression to the mean exercise. The actress Sharon Stone famously (and allegedly) quipped that "you can only fuck your way to the middle." That's how GenAI poetry works.
If you ask ChatGPT to write a poem in the style of Byron (or whoever), then you're already locating the creativity in Byron. ChatGPT then scours everything it knows about Byron, which is quite a lot, and creates something that captures an average of Byron-ness across the poet's entire career. A human reader is likely to match the AI poem to the working idea of Byron-ness that the reader carries around because, as behavioral economics has shown, humans generally choose the plausible option rather than the probable one.
To put this another way and in the words of the article abstract: "Our findings suggest that participants employed shared yet flawed heuristics to differentiate AI from human poetry: the simplicity of AI-generated poems may be easier for non-experts to understand, leading them to prefer AI-generated poetry and misinterpret the complexity of human poems as incoherence generated by AI."
A reader can reach a higher degree of accuracy with expertise. In my case, my Ph.D. work was about Shakespeare, so I accurately identified one of the poems (spoiler alert, but I told you to go take the test) as a Shakespearean sonnet because I had context.
However, unlike the study authors, I don't think you need expertise to identify AI-generated poetry more accurately than the study's subjects did. If the questions had not biased respondents into thinking in terms of typicality, the results might have been different.
The study authors are philosophers, not English professors. If they were experts in poetry, then they might have come up with more nuanced questions like, "Here are four poems in the style of Byron; two are by Byron, and two were written by ChatGPT. One Byron poem comes from his early poetry and one from his mature poetry; one ChatGPT poem imitates Byron's early style, and the other his later style. Can you identify which is which?"
I think questions like those would have prompted even readers who have never studied poetry to think more critically and guess more accurately.
Finally, the entire "AI or Human?" exercise is a red herring because humans have trouble identifying lots of things out of context—it's not just AI.
A famous and delightful example came in 2007 when the journalist Gene Weingarten convinced Joshua Bell, one of the world's foremost violinists, to play in the Washington D.C. Metro without identifying himself. (It's the title essay in Weingarten's collection, The Fiddler in the Subway.)
Bell played for three quarters of an hour and made just $32 because passersby had no frame for the experience. They had no cues to prompt them to stop, slow down, and savor.
According to a search I just ran on Perplexity, the average cost of a ticket to hear Joshua Bell in a concert hall is around $300.
In his book How Pleasure Works, the psychologist Paul Bloom observed of the Joshua Bell story:
This experiment provides a dramatic illustration of how context matters when people appreciate a performance. Music is one thing in a concert hall with Joshua Bell, quite another in a subway station from some scruffy dude in a baseball cap.
As I've explored in many previous Dispatches, we have lots of reasons to worry about Generative AI, but contextless tests like the one in the Poetry Turing Test study aren't among them.
Thanks for reading. See you next Sunday.
* Image Prompt: "A man-shaped robot with metallic skin dressed as a poet wearing Elizabethan clothing (including a lace collar) sitting at a candlelit desk, holding a feather quill with an ink bottle nearby, and looking thoughtfully at a blank piece of paper on the desk." I then added a few filters to achieve the result above. Worth noting is that when I tried to get ChatGPT to create such an image, it did so—and arguably did a better job—but kept including a typewriter in the image, which irked me because it was an extra and unnecessary anachronism. I twisted and rephrased the prompt several times, but could never convince Chat to get rid of the typewriter. I then moseyed over to Adobe, which did the job after only a few prompt tweaks.