Robinson’s poetry is not widely read. “Miniver Cheevy” is in every major American literature anthology, which means it is assigned and summarized and rarely inhabited. The poem is a precise, devastating portrait of a man who has convinced himself that the problem with his life is the wrong century — that had he been born into Camelot or Florence under the Medici, he would have been magnificent. It is also one of the sharpest analyses of authoritarian nostalgia in American verse, written in 1910, and it has never been more relevant. Most people who could benefit from it will never read it.
This is what AI tools should be for: not replacing the original, but building the bridge to it.
The question I was testing was specific: can a single AI-generated prompt produce a complete music video from a poem written 115 years ago? Not a series of clips requiring individual direction. Not a storyboard a human then had to execute. One prompt, dropped into CapCut, producing a coherent visual essay from beginning to end — from Miniver in the candlelit tavern to the rusted sword lying in a modern puddle.
The answer, as of this experiment, is: almost. Which is both less than I wanted and more than I expected.
The Hard Part Isn’t Image Quality. It’s Continuity.
Songbird is built around that problem.
Songbird is a directorial AI layer that sits between lyrics and image generation in the Musinique production chain. It takes song text and transforms it into a continuous sequence of visual prompts — not a collection of isolated images, but a directed sequence with the structural logic of a camera operator working from a marked-up script. The core principle is threading: the same character, consistent lighting, a camera path that flows rather than cuts randomly.
It is built around three ideas borrowed from film: entry (inheriting motion from the preceding moment), beat (one clear event per unit), and exit (a lead-in that points toward what comes next). The result, when it works, feels less like a slideshow and more like a rough cut.
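The entry-beat-exit idea can be sketched as a small data model. To be clear, this is my own illustration of the threading concept, not Songbird's actual internals; every name below is hypothetical.

```python
from dataclasses import dataclass

@dataclass
class VisualUnit:
    """One unit of the sequence: inherit motion, stage one event, point onward."""
    entry: str   # motion inherited from the preceding moment
    beat: str    # the one clear event in this unit
    exit: str    # lead-in that points toward what comes next

def thread(units: list[VisualUnit]) -> list[str]:
    """Check that each unit's entry picks up the previous unit's exit,
    then flatten the sequence into ordered prompt fragments."""
    for prev, curr in zip(units, units[1:]):
        if prev.exit != curr.entry:
            raise ValueError(f"broken thread: {prev.exit!r} -> {curr.entry!r}")
    return [f"{u.entry}; {u.beat}; {u.exit}" for u in units]

units = [
    VisualUnit("static candlelit close-up",
               "Miniver raises a glass in the tavern",
               "slow pull toward the door"),
    VisualUnit("slow pull toward the door",
               "a bare modern room, khaki suit on a hook",
               "pan to a rain-streaked window"),
]
print(thread(units))
```

The point of the validation step is the whole argument of this piece: a slideshow is a list of images, while a rough cut is a list whose joints have been checked.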
Standard AI video generation requires what amounts to frame-by-frame direction — you generate an image, evaluate it, prompt the next one, maintain continuity manually through reference images and careful reuse of character seeds and settings. A four-minute video built this way can take twelve to twenty hours of iteration. The Songbird-to-CapCut pipeline ran in two.
For this experiment, I ran Robinson’s “Miniver Cheevy” through Songbird’s song mode — music-video performance logic — combined with a vocal clone rendering of the poem. The voice is a deep baritone trained on my own recordings, processed to carry Robinson’s short, punchy four-beat lines — the meter that makes the poem feel like someone trying to be dignified and failing. The production: sparse gospel-blues, brushed percussion, slide guitar in a minor key. The poem as a song that sounds like it was always a song.
Then Songbird generated the prompt. One prompt. Into CapCut.
What the Single Prompt Does
The Songbird-to-CapCut pipeline compresses the entire directorial logic into a single compound prompt that CapCut’s AI video tools then execute as a sequence. The prompt carries not just visual descriptions but implied camera movement, tonal continuity, character consistency, and narrative arc. It is a storyboard written in natural language.
For “Miniver Cheevy,” the prompt built seven visual units from Robinson’s eight stanzas — the tavern fantasy, the bare modern room with the khaki suit, the rocking horse in the neon rain, the armor at the rain-streaked window, the rusted sword on the sidewalk, the cracked helmet on the desk beside pencils and paper, the faded movie poster peeling off a brick wall. Each unit specified not just what was in frame but what the frame felt like — the quality of light, the angle that implied the character’s relationship to what he was seeing, the transition logic that connected one image to the next.
CapCut received this as a single input and generated the full sequence.
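As a rough illustration of that compilation step, assumed rather than taken from CapCut's documented input format: a list of per-unit descriptions is flattened into one compound prompt, with each unit carrying its frame, the quality the frame should feel like, and the transition cue that connects it to the next image.

```python
def compile_prompt(units: list[dict]) -> str:
    """Join per-unit descriptions into a single compound prompt string."""
    parts = []
    for i, u in enumerate(units, start=1):
        parts.append(
            f"Scene {i}: {u['frame']}. Mood: {u['mood']}. "
            f"Transition: {u['transition']}."
        )
    return " ".join(parts)

units = [
    {"frame": "rocking horse on a neon-lit street in the rain",
     "mood": "childlike fantasy, ignored by the passing city",
     "transition": "pull back to reveal the commercial street"},
    {"frame": "rusted sword lying in a sidewalk puddle",
     "mood": "residual amber warmth from distant shop windows",
     "transition": "tilt up toward a rain-streaked window"},
]
print(compile_prompt(units))
```

The design choice worth noting is that the transition lives inside the unit that exits, not the unit that enters; the pull-back to the rocking horse is written as the previous shot's exit, which is what lets the generated sequence arrive with scale instead of cutting to it.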
What Worked
The visual coherence was better than I expected. The character — lean, worn, dressed in contemporary poverty with remnants of medieval fantasy draped over him — remained legible across the sequence without reference-image locking. The tavern established a color temperature (candlelight warm, stone cold) that persisted as a visual motif through the harder images: even the contemporary sidewalk with the rusty sword had a residual warmth in the amber of distant shop windows.
The transition logic held. This surprised me most. The rocking horse in the rain is a pivot in the poem — it is the moment where Miniver’s fantasy is shown as a child’s toy, stationary, chipped, ignored by the city moving past it. The prompt specified the transition into this image as a pull-back from the previous close shot, creating sudden scale: the commercial street behind the small figure of the horse. CapCut executed this as written. The image arrived with the emotional weight the poem requires.
Songbird’s beat-exit-entry logic appears to be doing real work. The prompt is not just a list of image descriptions. It is a sequence with momentum, and that momentum is partly what CapCut is reading.
What Didn’t Work
Poetry that depends on formal compression — the counted syllable, the withheld line — is fundamentally resistant to visual equivalence. This is not a failure of the tool. It is a limitation of the medium, and a productive one.
Robinson’s most important device is the truncated fourth line. Each of his eight stanzas builds three lines of iambic tetrameter and then delivers a three-beat close that deflates whatever rhetoric the preceding lines have built. “And he had reasons.” “And kept on drinking.” A visual prompt cannot embed a structural joke. It can depict a cough. It can frame a man staring at an untouched glass. It cannot reproduce the timing of a line that arrives half a beat before you expect the stanza to end.
The visual essay is in conversation with the poem. It is not a translation of it. This forces the video into genuine collaboration with Robinson’s text rather than replacement of it — which is, on reflection, the correct relationship.
There was also character drift in the later images. By the sixth visual unit — the cracked helmet on the desk — the character had lost some definition. The desk and pencils read clearly. The helmet read clearly. The implied human presence was ambiguous in a way that was not fully intentional: whose desk is this? Robinson’s or Miniver’s? A second-pass prompt could resolve this. The single-prompt workflow, by design, cannot revise itself mid-sequence.
What Comes Next
The character drift issue needs a consistency layer that Songbird does not yet have — likely a reference-image anchor system that can be passed alongside the text prompt to maintain character fidelity across longer sequences. That is the next build.
The next Songbird experiment will run a longer text — probably “Richard Cory,” Robinson’s other Tilbury Town masterpiece, or one of the longer character studies — to test whether the continuity logic holds across more complex narrative arcs. The public domain poetry series has at least a dozen candidates.
The formal compression problem — how to visualize a three-beat punch line — may not have a clean solution, and I have made my peace with that. Some things the poem can do that the video cannot. The productive question is what the video can do that the poem cannot, and the answer there is: reach the person who will never open the anthology.
If you want to follow the Songbird experiment as it develops — including the “Richard Cory” test — subscribe and reply with a public domain poem you think deserves a music video. The next one might be yours.
Robinson’s poetry is not widely read. Miniver Cheevy, born too late, scratched his head and kept on thinking. He would have had thoughts about AI video generation. He would have told you about the films they would have made in a more cinematically enlightened era. He would have thought, and thought, and thought, and thought about it.
Robinson just made the thing.
Vocal clone: Nik Bear Brown baritone, processed through the Musinique production chain. Visual prompts: Songbird song mode. Video generation: CapCut AI. Source poem: Edwin Arlington Robinson, “Miniver Cheevy” (1910), public domain.
Tags: AI video generation, public domain poetry, music production, Songbird, creative AI tools