The Tuning Fork
A song does not describe a feeling. It strikes one.
The machine-made text is not the artwork. The artwork is this page as software: the choosing, switching, sourcing, refusing, and reframing built into the reading.
I have been thinking about songwriting, and it looks to me like the cleanest test of everything the last three pieces argued, because a song's whole job is to do the one thing a shadow is not supposed to be able to do: carry the feeling across, not just its picture.
So the first honesty is this: I do not think the machine's fluent outputs are art by themselves. They are material, instrument, weather in a jar. The art here is the composed encounter — the post as software, the alternate readings, the citations, the argument, the refusal to let the generated thing pretend it made its own Why. Writing is part of the surface. Software is the medium.
Two conversations set this off. A songwriter told me how she works: she starts from a kernel — one small idea, a single charged image — and extrapolates it outward until it is a whole song. The other was a researcher studying the latent space of a model that diffuses songs into being. The old version of this thought got the mechanism wrong. A diffusion model is not sitting there guessing the next note. It begins from noise and learns a path back toward data: add noise to examples during training, then reverse the process at generation time, removing noise until a sample appears.
That correction matters. It makes the model more interesting, not less. It is not a clumsy continuer. It is a sculptor with a learned chisel: every step asks what in this field of static does not belong to the manifold of music. What struck me was not that the machine's songs were bad. It was that the carving still had no reason internal to the song except resemblance. It could remove what sounded unlike music without knowing why this metaphor, this tension, this bridge, this return, should make a listener ring.
Two engines, pointed different ways
The model denoises: given a noisy latent and a learned score (the gradient of the log-density, which points toward regions where data is more likely), what should change next? Score-based diffusion papers describe exactly this: a forward process corrupts data with noise, and a learned reverse process transforms noise back into data. Audio systems such as AudioLDM and Moûsai put that logic into latent audio or music spaces, where the object being carved is not pixels but a compressed representation of sound. Scott H. Hawley's work around Harmonai, Dance Diffusion, and later Stable Audio is the local version of that story: high-resolution musical audio treated as something that can be encoded, visualized, and reconstructed through learned latent structure.
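To make the two processes concrete, here is a minimal sketch in Python, under loud assumptions. The "corpus" is a single bell curve of numbers (DATA_MEAN and DATA_STD are made up), so the score that a real system has to learn with a network can be written down exactly. The noise schedule and the names forward_noise, score, and sample are illustrative and not taken from any of the cited systems. The reverse loop follows the ancestral sampler from Ho et al., rewritten in terms of the score.

```python
import math
import random
import statistics

# Assumption: the whole "corpus" is one bell curve of numbers, not audio.
DATA_MEAN, DATA_STD = 3.0, 0.5

# Assumption: a linear noise schedule chosen for this toy, not from any cited system.
STEPS = 400
BETAS = [1e-4 + (0.03 - 1e-4) * t / (STEPS - 1) for t in range(STEPS)]

# ALPHA_BARS[t]: share of the clean signal's variance left after t + 1 noising steps.
ALPHA_BARS = []
_running = 1.0
for _beta in BETAS:
    _running *= 1.0 - _beta
    ALPHA_BARS.append(_running)


def forward_noise(x0, t, rng):
    """Forward process: shrink the clean value and mix in Gaussian noise."""
    keep = ALPHA_BARS[t]
    return math.sqrt(keep) * x0 + math.sqrt(1.0 - keep) * rng.gauss(0.0, 1.0)


def score(x, t):
    """Gradient of the log-density of the noised data at step t.

    A real model learns this with a network; for Gaussian data it is exact.
    """
    mean = math.sqrt(ALPHA_BARS[t]) * DATA_MEAN
    var = ALPHA_BARS[t] * DATA_STD ** 2 + (1.0 - ALPHA_BARS[t])
    return -(x - mean) / var


def sample(rng):
    """Reverse process: start from pure noise and denoise step by step."""
    x = rng.gauss(0.0, 1.0)
    for t in reversed(range(STEPS)):
        beta = BETAS[t]
        x = (x + beta * score(x, t)) / math.sqrt(1.0 - beta)
        if t > 0:  # no fresh noise on the final step
            x += math.sqrt(beta) * rng.gauss(0.0, 1.0)
    return x


if __name__ == "__main__":
    rng = random.Random(0)
    draws = [sample(rng) for _ in range(1000)]
    print(statistics.fmean(draws), statistics.pstdev(draws))
```

Run as a script, the printed mean and spread should land near DATA_MEAN and DATA_STD. That is the whole achievement: the samples resemble the corpus. Nothing in the loop knows what the numbers are for.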
So the real contrast is not continuation versus intention. There are music systems that really are language models over audio tokens, and there are diffusion systems that carve from noise. The contrast I care about is distributional restoration versus felt aim. The writer starts from an intended effect on a listener that does not exist yet and selects means to cause it. The diffusion model starts from disorder and moves toward the learned shape of music. Both move. Only one has a body in the room asking whether the movement landed.
Put it in the ladder from the second piece. Denoising the surface is still mostly rung one — association across the face of the shadow — unless the system has a way to model the cause the surface is a surface of. The song is the shadow; the feeling is the cause that cast it, or at least the cause the song is trying to manufacture. A diffusion model can become a superb restorer of the surface. That is already a serious thing. But surface restoration is not yet a theory of why this sound should matter to this listener now.
What the metaphor is for
Recall the first piece: the inner state is illegible, sealed off even from the person who has it. You cannot hand it over whole — if you could, there would be no need for the song. So the writer transduces it into a structure built so that a feeling adjacent to the original reconstructs in whoever stands in front of it. "A linguistic isolation of the Platonic ideal of the feeling," as I kept wanting to call it: strip the feeling down to its invariant, then find the one concrete, relatable image that carries that invariant across intact.
This is where the name of the thing arrives. A tuning fork does not describe a pitch — you cannot describe a pitch to someone who needs one. You strike the fork, and the air carries it, until a second fork across the room begins to ring at the same frequency without being touched. The kernel is the strike. The song is the ringing. And the listener is the second fork — resonating, not being told.
A feeling, driven
The other thing the writer does, that the model does not automatically do: she is not holding a feeling flat. She is driving it — arranging the metaphors so the intensity climbs, so arousal increases, so the chorus lifts and the tension set in the verse finally pays. This is not mystical. Music cognition keeps circling the same fact from different angles: emotion in music is tied to mechanisms such as expectancy, memory, imagery, contagion, appraisal, and bodily arousal. A song is not just a sound-object. It is an event unfolding inside a listener.
A bare diffusion objective has no target curve in that sense. It has no native way to represent that the song must go somewhere, that this line must resolve what that line opened, that anticipation should tighten here and release there. It can reproduce the texture of an emotional arc, because arcs are common on the surface and it inherited their shape. But inheriting a contour is not the same as choosing an affective path. It cannot tell you why the bridge belongs where it is, because to the model the bridge is only a region the sample can be carved into.
The model has learned what songs that moved people tended to look like. It has no contact with the moving. It is fluent in the shadow of feeling, and blind to the feeling that cast it.
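For reference, the simplified training objective from Ho et al. (listed under Sources) can be written as below. Here x_0 is a training example, ε is the Gaussian noise mixed into it, ᾱ_t is the share of signal left at step t, and ε_θ is the network's guess at the noise. Every term is about recovering the example. Nothing in it refers to a listener or to a position in an arc, which is the sense in which the objective is "bare"; conditioned or fine-tuned systems add terms this line does not show.

```latex
L_{\text{simple}}(\theta)
  = \mathbb{E}_{t,\, x_0,\, \epsilon}
    \left[ \left\| \epsilon
      - \epsilon_\theta\!\left( \sqrt{\bar{\alpha}_t}\, x_0
        + \sqrt{1 - \bar{\alpha}_t}\, \epsilon,\; t \right)
    \right\|^2 \right]
```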
Where the Why could enter
The third piece gave the lever: self-training sharpens where the verifier is immanent — where win and loss are decidable inside the rules, as in Go — and collapses where the arbiter sits outside, in the world. A song is the second case in its purest form. The arbiter is a listener. "Does this move you?" cannot be decided from the denoising loss alone. That is exactly why diffusion can become flawlessly song-shaped and still empty: coherence without correspondence, a thing consistent with the corpus of songs and in contact with none of the feeling they were for.
To get the Why in, the machine would need something the diffusion objective does not provide by itself: a target feeling, a generator of candidate musical moves, and — the missing piece — a critic that predicts the affect each move produces and keeps the ones that advance the curve. An affect-verifier. That is not impossible in principle. MusicLM and MusicGen already show how text, melody, and human evaluations can condition or judge generation; affective computing and music-emotion research give names to the listener-side variables. The frontier move from the third piece was manufacturing verifiers, turning transcendent-truth domains into immanent ones.
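The three parts named above (a target feeling, a generator of candidate moves, a critic that keeps what advances the curve) can be sketched as a greedy loop. Everything in this block is hypothetical: the move names and their intensity changes in MOVES, the TARGET_CURVE, and above all predict_affect, which is simple addition standing in for a listener model nobody has. It shows the shape of the search, not a working affect-verifier.

```python
# Hypothetical moves and the change in listener arousal each is assumed to cause.
MOVES = {
    "hold the groove": 0.0,
    "thin the texture": -0.1,
    "add a harmony": 0.1,
    "build the drums": 0.2,
    "lift the key": 0.3,
    "drop to voice alone": -0.4,
    "bring in the full band": 0.5,
}

# Hypothetical target: the arousal the writer wants at the end of each section.
TARGET_CURVE = [0.2, 0.3, 0.5, 0.8, 0.4, 0.9]


def propose_moves():
    """Generator: every move this toy system knows how to make."""
    return list(MOVES)


def predict_affect(arousal, move):
    """Critic stand-in: predicted arousal after the move, clamped to [0, 1]."""
    return min(1.0, max(0.0, arousal + MOVES[move]))


def compose(target_curve, start=0.2):
    """Greedy search: per section, keep the move predicted to land nearest the target."""
    arousal, plan = start, []
    for target in target_curve:
        best = min(
            propose_moves(),
            key=lambda move: abs(predict_affect(arousal, move) - target),
        )
        arousal = predict_affect(arousal, best)
        plan.append((best, arousal))
    return plan


if __name__ == "__main__":
    for move, arousal in compose(TARGET_CURVE):
        print(f"{move}: {arousal:.1f}")
```

The plan tracks the curve perfectly only because the critic is true by construction. With a real listener, predict_affect is the hard part, and the loop is only as good as that one function.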
The hard truth here is that the verifier for a song is a human nervous system — the least immanent arbiter there is. You can simulate it, approximate it, train a proxy on reactions. But the moment the proxy becomes the target, you are back to optimizing a shadow. The model may learn what people tend to rate as moving. That is useful. It is also not the same thing as carrying the moving itself.
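A toy version of that slide from proxy to target, with every number invented for illustration: true_response is a made-up listener who is moved most at moderate intensity and numbed beyond it, and the proxy is a straight line fitted to ratings collected only from quiet-to-moderate songs. Optimizing the line pushes intensity as far as the search allows, past the point where the imagined listener stops responding.

```python
def true_response(intensity):
    """Hypothetical listener: peaks at intensity 0.6, falls to zero by 1.2."""
    return max(0.0, 1.0 - ((intensity - 0.6) / 0.6) ** 2)


def fit_proxy(pairs):
    """Ordinary least-squares line through (intensity, rating) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in pairs)
        / sum((x - mean_x) ** 2 for x, _ in pairs)
    )
    return lambda x: mean_y + slope * (x - mean_x)


# Assumption: ratings were only ever collected for intensities 0.0 to 0.5.
observed = [(i / 10, true_response(i / 10)) for i in range(6)]
proxy = fit_proxy(observed)

# The optimizer is free to search a wider range than the ratings covered.
search_grid = [i / 10 for i in range(16)]
proxy_pick = max(search_grid, key=proxy)
listener_pick = max(search_grid, key=true_response)

if __name__ == "__main__":
    print("proxy chooses", proxy_pick, "listener response", true_response(proxy_pick))
    print("listener would choose", listener_pick)
```

Within the range it was trained on, the proxy is a fair guide: more intensity did mean higher ratings there. The failure only appears once the proxy is the thing being maximized.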
The honest doubt
I should not make it too clean, because the first piece left a fork open and it reopens here. Maybe there is no feeling sitting behind the song to aim at. The stronger reading is that the metaphor does not point at a pre-existing feeling — it constitutes one. The feeling did not fully exist until the lyric gave it a shape. If that is right, then "find the right metaphor for feeling X" is the wrong objective, because X is not fixed before the search. The feeling is the fixed point the song converges to — manufactured by the writing, exactly as the third piece said the ideal is manufactured, not waiting in some heaven to be recovered.
Which collapses the distance I was so sure of. The writer and the diffusion model would then be doing rhyming versions of the same thing: grinding toward an attractor that no single line contained. The only difference — and it is the whole difference — is that the writer's room has a listener in it, and the model's does not. She is driving a resonance she can feel land. The machine is striking a fork in a vacuum.
So maybe that is the end of it for now. A song is a verifier we carry in the body. Until a machine has one of its own — or learns to borrow ours honestly, not as a proxy to be gamed — it can strike the fork with perfect technique and never once hear whether the room rang.
Grandma's house had a jar by the stove, blue glass, no label, always warm at the rim.
No one knew what she kept in it. The cousins said salt. My uncle said a starter from before the war. My mother said not to ask questions about things that fed you.
When the soup went thin, grandma tapped the jar and pinched out thyme. When the bread sulked in the bowl, she tapped the jar and found salt. When a cousin came in red-eyed and pretending not to be, she tapped it once and the kitchen smelled like cinnamon, wool, and the back seat of a car in August.
The jar never offered two things. It never rattled like dice. It opened as if the room had already told it what was missing.
The Jar
One winter I stole it down from the shelf. I expected little drawers inside it, each with a name: bay, pepper, sugar, grief. Instead there was weather. Flour blew sideways. Onion skins spun like leaves. A train passed somewhere under the pepper. Rainwater climbed the glass and forgot to fall.
I shook it. Nothing useful came out. A gray dust settled on my palm and tasted like every dinner at once.
Grandma found me with my hand in the sink. She took the jar, held it over the pot, and did not shake it. She listened. The soup clicked softly. The spoon leaned against the enamel. The window worried in its frame. Then she reached in and brought out exactly what the room had been asking for.
The Kitchen
She could start anywhere. A heel of bread. A black banana. A sentence someone said too quickly and regretted before it landed. Around that small thing she built a meal, turning the flame down, then up, then down again, until the whole house moved toward the table.
There were recipes in the drawer, but the paper was not the cooking. The paper never knew when my grandfather had gone quiet. The paper never saw my mother stand too long at the back door. Grandma would read the room first, then the recipe, then the room again.
Sometimes she made the kitchen wait. She let onions brown until everyone became hungry at the same time. She let the kettle scream a little before pouring. She knew exactly when to lift the lid.
The Bell
Above the pantry hung a little brass bell. Grandma rang it when supper was ready, but that was not all it did. The first note touched the jars, then the window, then the bones of the chairs. If you were sulking in the hall, the bell reached you before her voice did.
No one came because the bell explained dinner. They came because something in them had begun ringing too.
Some houses teach every object the shape of hunger. Some hands know which object to touch.
The Wrong Jar
Years later, after she died, we found another jar at an estate sale. Same blue glass. Same warm-looking rim. We filled it with everything we remembered: thyme, salt, cinnamon, onion skin, a thread from her blanket, a button from my grandfather's coat.
It worked, in a way. Shake it over soup and the soup tasted like soup from that house. Shake it over bread and the bread browned like hers. At Christmas my cousin closed her eyes and said, almost.
Almost became the family word for it. The jar knew the wallpaper, the good plates, the scrape of the chair legs. It knew how our meals had looked from the ceiling. It did not know which chair was empty.
The Blanket
Her blanket stayed on the porch rocker. The blue square had a burn from a sparkler. The red square smelled faintly of cedar no matter how many times we washed it. Nobody knew whether the blanket remembered us or we remembered ourselves through it.
On cold nights we still used it. The warmth did not arrive all at once. It gathered, square by square, until you stopped thinking about being cold.
I keep the second jar now. Sometimes I open it and let the weather turn. Sometimes, if the house is quiet enough, I can almost hear what it is trying to leave out.
Grandma's Jar
Verse 1
Blue glass sweating by the stove
No label, no lid tied tight
Grandma kept it near the matches
Where the pilot caught the light
Mama said don't ask the thing
That keeps bread at the door
Pre-Chorus
She would listen to the kettle
Like it had a little heart
Tap the rim once with her knuckle
And the weather came apart
Chorus
Oh, grandma's jar
What did you leave out?
Thyme from the thunder
Warmth from the doubt
Ring, little kitchen
Ring, little star
I don't know what love is
But it came from that jar
Verse 2
When the bread sat low and heavy
She found salt inside the rain
When I came in red and quiet
Cinnamon knew my name
There was wool in it, and summer
There were words she never spoke
Chorus
Oh, grandma's jar
What did you leave out?
Thyme from the thunder
Warmth from the doubt
Ring, little kitchen
Ring, little star
I don't know what love is
But it came from that jar
Bridge
Years went by and we bought another
Same blue glass, same shining scar
Filled it up with thread and buttons
Tried to teach it who we are
Soup tasted almost like December
Bread browned almost like before
But almost is an empty chair
Pulled up to an open door
Final Chorus
Oh, grandma's jar
What did you leave out?
A hand on the kettle
A room turning south
Ring, little kitchen
Ring where you are
I don't know what love is
But I still keep the jar
Blue glass quiet by the window
Leaving out the hardest part
This version is the build log for the page as software-art: what each move is doing, what it preserves from the sourced essay, and why the generated material is treated as raw material rather than the finished artwork.
Medium claim. I made the selector state the position plainly: the AI text, fable, and song are not being presented as autonomous art. The artwork is the system of choices, constraints, sources, refusals, and transformations that the reader moves through.
Opening object. I started with the blue jar by the stove so the essay has one concrete object to orbit. The jar stands in for the generative system, but the first paragraph treats it as household magic, not as a technical diagram.
Unknown mechanism. The family's guesses (salt, a starter from before the war, something better left unasked) make the mechanism social and partial. Nobody can see the process; they only inherit stories about what the jar seems to do.
From utility to feeling. Thyme and salt show ordinary correctness. The cinnamon-wool-car smell for the red-eyed cousin shifts the jar into the song problem: the right output is not just accurate, it changes a person.
Not next-token guessing. The jar never offers two things and never rattles like dice. That quietly replaces "what comes next?" with "what is missing from the whole field?"
The Jar
Inside the latent. The child stealing the jar open gives the reader one glimpse inside the process. Labeled drawers would imply retrieval; weather implies mixed potential, noise, memory, and unstable form.
Bad sample. The gray dust that tastes like every dinner at once is the model output with no situated aim. It resembles the family meals statistically, but it has no reason to be this ingredient now.
Human verifier. Grandma listening before reaching into the jar is the missing critic. She reads pot, room, window, mood, and timing before the jar becomes specific.
The Kitchen
Kernel. The heel of bread, black banana, and regretted sentence are concrete kernels. Each is small, but charged enough to organize the whole meal around it.
Surface versus situation. The recipe drawer separates formal instructions from live judgment. The paper knows sequence; grandma knows whether the room can receive that sequence.
Arc. Browned onions, a screaming kettle, and the lifted lid turn emotional structure into timing. This is verse-tension-chorus-release, but through appetite and pressure instead of explanation.
The Bell
Tuning fork. The bell moves the title image into the house. It does not describe dinner; it causes the jars, chairs, hallway, and people to answer.
Resonance. People come because something in them rings. That is the essay's thesis in story form: the work succeeds when the listener becomes the second instrument.
Aphorism. The blockquote compresses the distinction between learned pattern and directed touch. A house can learn the shape of hunger; a hand still has to know which object matters.
The Wrong Jar
Proxy model. The estate-sale jar is the trained proxy. The family fills it with traces: spices, blanket thread, coat button. It learns resemblance from artifacts.
Almost. "Almost" is the proxy failure in one word. The new jar passes taste, color, and nostalgia checks, but it cannot feel the absence it is trying to replace.
Surface reconstruction. The ceiling-view sentence says the wrong jar knows the room as appearance. It can reconstruct the table from above without knowing which chair hurts from inside the room.
The Blanket
Honest doubt. The blanket returns quietly to ask whether warmth is stored in the object or produced with the person under it. That mirrors the essay's doubt about whether feeling pre-exists the song or is made by it.
Final turn. The narrator keeps the second jar and listens to what it is trying to leave out. That final phrase lets the reader make the diffusion connection themselves: the jar's art is subtraction, and the open question is whether subtraction can know care.
Song version. I added a separate lyric instead of replacing the fable. After trimming, it keeps a lean song shape: two verses, one pre-chorus, repeated choruses, a short bridge for the failed replacement, and a two-line tag for the unresolved jar.
Sources
- Jonathan Ho, Ajay Jain, and Pieter Abbeel, "Denoising Diffusion Probabilistic Models", 2020.
- Yang Song et al., "Score-Based Generative Modeling through Stochastic Differential Equations", 2021.
- Haohe Liu et al., "AudioLDM: Text-to-Audio Generation with Latent Diffusion Models", 2023.
- Flavio Schneider et al., "Moûsai: Text-to-Music Generation with Long-Context Latent Diffusion", 2023.
- Belmont University, "Belmont Physics Professor Attends Launch of 'Unicorn' Fellowship Sponsor Stability AI", 2022.
- National University of Singapore School of Computing, Scott H. Hawley event bio, 2025.
- Andrea Agostinelli et al., "MusicLM: Generating Music From Text", 2023.
- Jade Copet et al., "Simple and Controllable Music Generation", 2023.
- Patrik N. Juslin and Daniel Västfjäll, "Emotional Responses to Music: The Need to Consider Underlying Mechanisms", 2008.
- Sarah A. Sauvé et al., "Effects of Pitch and Timing Expectancy on Musical Emotion", 2017.