How it's made

AI-voiced. Human-directed.

The voices on this show are AI-generated and the hosts are fictional. We don't hide that — we document it. Below is the entire production pipeline, specific enough that you could rebuild it yourself: the model, the voice IDs, the word budgets, the exact tags we use to direct delivery, and the command that masters every episode. The point isn't the tech, though. It's the human judgment wrapped around it — which is the whole difference between a real show and AI slop.

Nate Hargrove · The Seasoned Pragmatist · AI voice: “Jon”

Ava Vasquez · The Modern Builder · AI voice: “Lauren”

Nate and Ava are fictional characters, voiced by AI — but everything they argue about is real.

The whole pipeline

Seven steps from a real problem to a published episode. Four of them are human, and two of those are hard gates: nothing gets voiced until a person approves the script, and nothing ships until a person hears the finished cut.

Legend: Human · AI · Automated · ✋ = human gate

1. Source the topic · Human
Mine real, anonymized customer conversations for one recurring, specific problem.
2. Write the script (content pass) · AI
LLM-assisted draft: 24–28 turns, ~850 spoken words, Introducer/Explorer structure.
3. Editorial sign-off · Human ✋
A PreSales practitioner approves the actual conversation before a single word is voiced.
4. Performance pass · Human
Add bracketed delivery tags to 30–40% of lines so it breathes like real talk.
5. Synthesize the voices · AI
One call to ElevenLabs eleven_v3 Text-to-Dialogue returns the whole episode as MP3.
6. Master the audio · Automated
ffmpeg: atempo 1.08 + two-pass loudnorm to −16 LUFS / −1.5 dBTP.
7. Final listen & publish · Human ✋
A human reviews the finished episode in Podigee, then schedules the release.

Step 1

Topics come from real conversations

Every episode starts with something that actually happened. We pull recurring themes from real, recorded customer calls and meetings — the questions SE leaders keep asking, the problems that surface again and again — and distill them into one specific, concrete topic. A model isn't guessing what sounds plausible; we're mining genuine field signal.

One hard rule governs this step: strict anonymization. Real company names, people, and deal details never reach a script. Everything is generalized to an archetype: “a mid-market SaaS company,” not the actual customer. The value is the pattern and the pain point, never the private story.
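As a safety net behind the human rule, a script could also be scanned for known identifiers before it moves on. The sketch below is only an illustration, not the show's actual tooling: the `ARCHETYPES` mapping and both function names are hypothetical, and a word list can only catch names someone has already listed.

```python
import re

# Hypothetical mapping from real identifiers to archetypes (entries invented for illustration).
ARCHETYPES = {
    "Acme Corp": "a mid-market SaaS company",
    "Jane Doe": "the VP of Sales",
}

def _pattern(name: str) -> re.Pattern:
    # Whole-word, case-insensitive match for one identifier.
    return re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)

def anonymize(text: str, archetypes: dict[str, str]) -> str:
    """Replace every listed identifier with its archetype."""
    # Longest names first, so a full company name wins over a shorter overlapping entry.
    for name in sorted(archetypes, key=len, reverse=True):
        text = _pattern(name).sub(lambda _m, repl=archetypes[name]: repl, text)
    return text

def leaked_names(text: str, archetypes: dict[str, str]) -> list[str]:
    """Return listed identifiers still present, so the script can be blocked."""
    return [name for name in archetypes if _pattern(name).search(text)]
```

A non-empty result from `leaked_names` would be a reason to stop and fix the draft by hand rather than trust the automatic replacement.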

Steps 2–3

Writing the script — in two passes

Each episode is a two-person conversation built on one dynamic: an Introducer brings a specific idea, and an Explorer — experienced, but new to this angle — questions and pushes back until they land a practical move together. They alternate roles every episode, so neither is always the teacher.

Structure: Cold open → intro → exploration → the move → outro (24–28 turns)

Length: ~830–880 spoken words → ~4:50 once sped up. About 5 minutes, on purpose.

Ceiling: Under 5,000 API characters per episode, tags included. A hard limit.
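These three budgets are easy to check mechanically before a script goes to review. The following is a minimal sketch, assuming the script is stored as one `Speaker: text` line per turn and that only the text after the speaker label counts toward the API character ceiling; the `Budget` class and `measure` function are hypothetical names.

```python
import re
from dataclasses import dataclass

TAG = re.compile(r"\[[^\]]+\]")        # a bracketed delivery tag such as [curious]
TURN = re.compile(r"^(\w+):\s*(.+)$")  # "Speaker: text" (assumed script layout)

@dataclass
class Budget:
    turns: int
    spoken_words: int
    api_chars: int

    def problems(self) -> list[str]:
        """List every budget the script misses; an empty list means it fits."""
        issues = []
        if not 24 <= self.turns <= 28:
            issues.append(f"{self.turns} turns, expected 24-28")
        if not 830 <= self.spoken_words <= 880:
            issues.append(f"{self.spoken_words} spoken words, expected 830-880")
        if self.api_chars >= 5000:
            issues.append(f"{self.api_chars} API characters, must stay under 5000")
        return issues

def measure(script: str) -> Budget:
    texts = [m.group(2) for line in script.splitlines() if (m := TURN.match(line.strip()))]
    # Spoken words exclude tags; API characters include them.
    spoken = sum(len(TAG.sub(" ", text).split()) for text in texts)
    chars = sum(len(text) for text in texts)
    return Budget(turns=len(texts), spoken_words=spoken, api_chars=chars)
```

Calling `measure(script).problems()` on a draft returns the list of misses, which makes it straightforward to reject a script before anyone spends time on sign-off.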

Then we run a humanizer pass — the step that strips the AI tells. Concretely, that means: no rule-of-three, no sycophantic openers, varied sentence length, real opinions, the hosts allowed to disagree — and a banned-word list that kills the usual giveaways like delve, landscape, crucial, pivotal, foster. Every script is also checked line-by-line against a detailed character bible so Nate and Ava stay consistent — their experience, their blind spots, what each can credibly say.
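The banned-word part of that pass lends itself to a simple automated check. This sketch uses only the five words named above (the real list is presumably longer), and the function name is hypothetical; the other humanizer rules need a human ear.

```python
import re

# The five examples named in the text; the full list is not published here.
BANNED = ("delve", "landscape", "crucial", "pivotal", "foster")

def banned_hits(script: str, banned: tuple[str, ...] = BANNED) -> list[tuple[int, str]]:
    """Return (line_number, matched_word) for every banned word in the script."""
    # \w* also catches simple suffixed forms such as "fosters" or "landscapes".
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, banned)) + r")\w*", re.IGNORECASE)
    hits = []
    for number, line in enumerate(script.splitlines(), start=1):
        for match in pattern.finditer(line):
            hits.append((number, match.group(0).lower()))
    return hits
```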

Pass 1 · Content · Clean dialogue. No tags. The conversation itself: what they say.

✋ Human sign-off · Approve the content. Revise until it rings true.

Pass 2 · Performance · Add delivery tags. How they say it: [tags] + prosody. Then voice it.

We deliberately split content from performance. Pass 1 is only about what the hosts say — and it doesn't move forward until a human signs off. Tags come second.

Step 4

Directing the performance

Once the content is approved, a second pass adds bracketed delivery tags. ElevenLabs' eleven_v3 model reads these as performance directions — not as words to speak — and they make the delivery noticeably more alive. We tag roughly 30–40% of lines, never more than two per line, and lean on plain prosody too: ellipses for trailing pauses, double-dashes for interruptions, CAPS for emphasis.

[thoughtfully] reflective openers, thinking out loud

[curious] follow-up questions, leaning into the topic

[skeptical] pushback: “I don't buy that”

[leaning in] the conversation getting good

[firmly] strong opinion, drawing a line

[pausing] a beat before a punchline or pivot

[matter-of-fact] an uncomfortable truth, no drama

[fired up] climax energy, conviction at the peak
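The two tagging rules (roughly 30–40% of lines tagged, never more than two tags on a line) can be verified with a few lines of code. This is a sketch under the assumption that each list item is one spoken line; `tag_report` is a hypothetical helper, not part of any ElevenLabs tooling.

```python
import re

TAG = re.compile(r"\[[^\]]+\]")  # a bracketed delivery tag such as [firmly]

def tag_report(lines: list[str], low: float = 0.30, high: float = 0.40,
               max_per_line: int = 2) -> dict:
    """Summarize how densely a script is tagged."""
    counts = [len(TAG.findall(line)) for line in lines]
    share = sum(1 for c in counts if c) / len(lines) if lines else 0.0
    return {
        "share": share,                                  # fraction of lines with at least one tag
        "in_range": low <= share <= high,                # the 30-40% guideline
        "overloaded_lines": [i for i, c in enumerate(counts, start=1) if c > max_per_line],
    }
```

The share is a guideline for the whole episode, so a short excerpt can legitimately be tagged more heavily than the target range.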

Here's a real cold open — the opening of “Liability Is the Moat” [20] — exactly as it goes to the voice model:

Nate: [thoughtfully] We won a deal last quarter we had no business winning. The competitor's product was better. On paper, we should have lost.
Ava: [curious] What happened?
Nate: [leaning in] One question in the final round. The CFO leaned forward: "if the model gets it wrong, who pays?" The competitor's AE froze. Ours said, "we do. Here's the SLA, here's the indemnification." Deal closed.
Ava: [pausing] Wait, that was it? One question?
Nate: [firmly] Exactly. A model can produce a tax return. A model cannot SIGN one. The signature is the value.

Step 5

Synthesizing the voices

Two voices carry the show, each cast to fit the character. We use the ElevenLabs Text-to-Dialogue API — one call generates the entire episode as a single, naturally-paced conversation (not line-by-line, which sounds mechanical at the turn boundaries).

Nate → “Jon” · ElevenLabs · Natural Authority (gravelly, measured) · voice ID: sB7vwSCyX0tQmU24cW2C

Ava → “Lauren” · ElevenLabs · Friendly, Comforting (warm, quick) · voice ID: DODLEQrClDo8wCz460ld

The whole episode is one payload: the model, a single setting, and the tagged lines mapped to voice IDs.

POST https://api.elevenlabs.io/v1/text-to-dialogue?output_format=mp3_44100_128
{
  "model_id": "eleven_v3",
  "settings": { "stability": 0.3 },
  "inputs": [
    { "voice_id": "sB7vwSCyX0tQmU24cW2C",   // Nate  · "Jon"   (Natural Authority)
      "text": "[thoughtfully] One of my SE Directors tried this..." },
    { "voice_id": "DODLEQrClDo8wCz460ld",   // Ava   · "Lauren" (Friendly, Comforting)
      "text": "[curious] Wait -- the customer asked for that?" }
  ]
}

That's the complete settings object — stability: 0.3 and nothing else; the dialogue endpoint ignores the usual extras and lets the model carry the voice. It returns an MP3 directly (44.1 kHz, mono, 128 kbps) in about three minutes. The API key lives in an environment variable, never in the payload or this page.
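For anyone rebuilding this, the payload can be assembled from the tagged script and sent with the standard library alone. This is a minimal sketch, not the show's actual code: the `Speaker: text` script layout, the function names, and the `ELEVENLABS_API_KEY` variable name are assumptions, while the URL, model, setting, and voice IDs are the ones listed above. The key is sent in the `xi-api-key` header.

```python
import json
import os
import re
import urllib.request

API_URL = "https://api.elevenlabs.io/v1/text-to-dialogue?output_format=mp3_44100_128"
VOICES = {"Nate": "sB7vwSCyX0tQmU24cW2C", "Ava": "DODLEQrClDo8wCz460ld"}
TURN = re.compile(r"^(\w+):\s*(.+)$")  # "Speaker: text" (assumed script layout)

def build_payload(script: str) -> dict:
    """Map each tagged line to its voice ID, in speaking order."""
    inputs = []
    for line in script.splitlines():
        match = TURN.match(line.strip())
        if not match:
            continue  # skip blank lines and anything that is not a turn
        speaker, text = match.groups()
        # An unknown speaker raises KeyError, which is better than voicing the wrong host.
        inputs.append({"voice_id": VOICES[speaker], "text": text})
    return {"model_id": "eleven_v3", "settings": {"stability": 0.3}, "inputs": inputs}

def synthesize(script: str, out_path: str = "raw.mp3") -> None:
    """Send the whole episode in one request and save the returned MP3."""
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(build_payload(script)).encode("utf-8"),
        headers={
            "xi-api-key": os.environ["ELEVENLABS_API_KEY"],  # assumed variable name
            "Content-Type": "application/json",
        },
        method="POST",
    )
    # Generous timeout, since generation takes minutes rather than seconds.
    with urllib.request.urlopen(request, timeout=600) as response, open(out_path, "wb") as f:
        f.write(response.read())
```

Keeping `build_payload` separate from the network call means the payload can be inspected, or checked against the character ceiling, before any credits are spent.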

script.md (tagged dialogue) → payload.json (inputs[] + model) → eleven_v3 (/text-to-dialogue) → raw .mp3 (one API call) → ffmpeg (atempo + loudnorm) → mastered .mp3 (−16 LUFS) → Podigee (RSS feed)

One call per episode · 5000-character API ceiling · MP3 44.1 kHz mono 128 kbps

Step 6

Mastering the audio

Raw output is never published as-is. Every episode gets two treatments in ffmpeg: an 8% speed-up (atempo=1.08) that tightens the pacing and lifts the energy, and a two-pass loudness normalization to a widely used podcast loudness target (−16 LUFS integrated, −1.5 dBTP true peak), so it sits at the same level as the intro, the outro, and every other show in someone's queue.

# Pass 1 — measure loudness (with the 8% speed-up applied)
ffmpeg -i raw.mp3 -af "atempo=1.08,loudnorm=I=-16:TP=-1.5:LRA=11:print_format=json" -f null -

# Pass 2 — apply speed-up + normalize to broadcast targets
ffmpeg -i raw.mp3 -af "atempo=1.08,loudnorm=I=-16:TP=-1.5:LRA=11:measured_I=..:measured_TP=..:linear=true" \
  -c:a libmp3lame -b:a 128k -ar 44100 -ac 1  episode.mp3
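The measured values in Pass 2 come from the JSON that Pass 1 prints to stderr, so the two commands are easy to chain in a small script. The sketch below is one way to do that, assuming `ffmpeg` is on the PATH; the function names are hypothetical, and passing `measured_LRA`, `measured_thresh`, and `offset` alongside the two values shown above is an assumption about what the elided part of the command contains.

```python
import json
import subprocess

TARGETS = "I=-16:TP=-1.5:LRA=11"  # the loudness targets from the commands above

def parse_loudnorm_stats(stderr: str) -> dict:
    """Extract the flat JSON object that loudnorm prints at the end of ffmpeg's stderr."""
    start = stderr.rindex("{")
    end = stderr.rindex("}") + 1
    return json.loads(stderr[start:end])

def pass2_filter(stats: dict) -> str:
    """Build the Pass 2 filter string from the Pass 1 measurements."""
    return (
        f"atempo=1.08,loudnorm={TARGETS}"
        f":measured_I={stats['input_i']}:measured_TP={stats['input_tp']}"
        f":measured_LRA={stats['input_lra']}:measured_thresh={stats['input_thresh']}"
        f":offset={stats['target_offset']}:linear=true"
    )

def master(src: str = "raw.mp3", dst: str = "episode.mp3") -> None:
    # Pass 1: measure loudness with the speed-up applied; nothing is written to disk.
    measure = subprocess.run(
        ["ffmpeg", "-hide_banner", "-i", src,
         "-af", f"atempo=1.08,loudnorm={TARGETS}:print_format=json",
         "-f", "null", "-"],
        capture_output=True, text=True, check=True,
    )
    stats = parse_loudnorm_stats(measure.stderr)
    # Pass 2: apply the speed-up and normalize, then encode as 128 kbps mono MP3.
    subprocess.run(
        ["ffmpeg", "-y", "-i", src, "-af", pass2_filter(stats),
         "-c:a", "libmp3lame", "-b:a", "128k", "-ar", "44100", "-ac", "1", dst],
        check=True,
    )
```

Running `master()` in the folder that holds `raw.mp3` would produce `episode.mp3`; the `-y` flag overwrites an existing output file, which may or may not be what you want.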

Step 7

Publish — with a human in the loop

The mastered file is uploaded to our host (Podigee), where a human listens to the finished episode one last time. Only then is it scheduled — a new episode goes live on a fixed weekly rhythm. A person hears every episode before you do. No exceptions.

Why this isn't “AI slop”

AI slop is what you get when a model runs end-to-end with no expertise and nobody accountable for whether it's any good. We built the opposite. Look back at the pipeline: the topic comes from real field experience, a practitioner signs off on the script before it's ever voiced, a humanizer pass strips the tells, a character bible keeps the hosts honest, and a human hears the final cut before it ships. The AI gives the show its voice and helps us draft faster. It is never the source of truth, and it never has the last word. The voices are synthetic; the experience behind them is earned from training 350+ solution engineers.

What's human, what's AI

The topics & lived experience · Human
The frameworks & point of view · Human
Character consistency & editorial sign-off · Human
Performance direction (the tags) · Human
Final review before publishing · Human
Drafting assistance (writing faster) · AI + Human
The two voices you hear · AI
Audio mastering (speed + loudness) · Automated

Produced by SE Rockstars — the team behind the Trusted Advisor Academy, founded by Tim Brömme & Jan-Erik Jank. Building your own AI show and want to compare notes? We'd genuinely like to hear from you.
