I Spent 90 Days Testing Claude on brand voice So You Wouldn't Have To

At Stoned Fruit, Rebecca and I build custom GPTs for founders who want em. It's part of what we offer — a way for founders to create content in their actual voice without having to be the one typing every sentence themselves…

The problem is they're built on OpenAI.

And OpenAI has a brand reputation problem that's getting worse.

Our clients are increasingly uncomfortable with the association. Some of them have asked us directly whether there's another option. So I went looking.

Claude was the obvious candidate. Anthropic has spent years positioning itself as the thoughtful alternative — safer, more aligned, less Silicon Valley in a way that registers with the kind of founders who hire us. On paper, the switch made sense. We just needed to know whether the tool could actually do the job before we moved client work over.

I gave it 90 days.


The test

I used myself as the test subject.

This wasn't laziness. It was the rigorous choice. My voice is the most documented voice I had access to. Seven years of writing. Defined frameworks. Explicit voice rules — no closing lines, grade 11 essay style, "we" not "you," conversational over corporate, endings that just stop when the point is made.

If Claude could hold a brand voice with parameters, memory, and explicit correction, it would hold mine. I'd know within a few weeks.

I set it up properly. System prompts. Memory entries. Detailed instructions. I corrected drift when it happened, the way I'd train a custom GPT. I worked across multiple projects — my newsletter, blog posts, podcast scripts, client work for Stoned Fruit, research for the Progress Report series I'm building, frameworks I was developing for Queens of Coin.

If anything, I was an easy case. I came in knowing what I wanted, I corrected immediately when something landed wrong, and I gave the tool every chance to learn.

What Claude Got Wrong

The voice drift happened constantly.

Closing lines. I'd explicitly told it not to add punchy wrap-up sentences. It added them anyway. Sometimes three drafts in a row. I'd correct it, get a clean version, and then the next piece would have a closing line again.

Corporate register. My voice is conversational. Sometimes profane. It would slip into marketing-speak, especially in CTAs. "Empower your business." "Unlock your potential." Things I would never write. Things nobody writes who actually sounds like themselves.

Over-explaining. My voice trusts the reader. Claude doesn't. It would add connective tissue I didn't want, walk through the logic I'd already implied, restate the obvious. Every draft needed cutting.

Tidy endings. I write essays that stop when the point is made. Claude writes essays that resolve. There's a difference. A resolved ending tells the reader what they just read. A stopped ending trusts them to know.

Hedging. I don't hedge. Claude hedges constantly. "It might be the case that." "Some would argue." "It could be said." My writing makes a claim and stands behind it. Claude's writing manages risk.

The recurring pattern was the most telling part. I'd correct the same drift fifty times across ninety days and the fifty-first time, it would happen again. The corrections didn't compound. They didn't stick. The voice was always being approximated, never held.

What ChatGPT Does Differently

The custom GPTs we build at Stoned Fruit hold voice for client production work.

They're not perfect either. They need refinement. They drift sometimes. But the architecture is different in a way that matters. When a custom GPT drifts, you can prompt it back — "this is Kat, what's Kat's voice?" — and it recalibrates. The correction sticks for longer. The voice settles back into itself and stays there for the rest of the session.

I tried the same correction approach with Claude. It would apologize, produce one clean piece, and drift on the next one. Same conversation. Same prompt. Same explicit instructions. The voice didn't lock.

I'm not making a technical claim about why this is. I'm not an engineer. I'm a person who tested a tool for a specific job over ninety days and noticed a pattern.

The pattern is this: custom GPTs at OpenAI hold voice. Claude approximates voice. For the specific job of producing client content at scale, custom GPTs are the better tool.

The brand reputation issue is real and I take it seriously. But our clients are paying us for output. They're paying for content that sounds like them. The philosophical question of which AI company we prefer matters less than whether the work is good. So the switch we were considering doesn't make sense. Not yet. Maybe not ever.

That was the answer I came for.

The Surprise

While the test was failing, I was using Claude for other things. Research for the Progress Report series — the long investigative content I'm building from old issues of Discover Magazine. Structural work on my own blog posts. Organizing material across projects. Pulling threads from past conversations. Pushing back on my own thinking when I needed something to argue with.

It was useful. Really useful. In a way I hadn't expected.

The tool that couldn't hold my voice for production work turned out to be excellent at a completely different job: thinking partnership. Pushback. Research. Synthesis. Holding the shape of a project across multiple sessions. Helping me see what I already knew but hadn't articulated yet.

Some of the best strategic thinking I've done this year happened in Claude conversations. Not because Claude is smarter than me. Because I needed something to push against, and I'd never had that before. The thing about working solo is that you do all the thinking inside your own head. Having a collaborator who can hold a question, return it sharpened, and stay in the same conversation across weeks — that's a different mode of working.

I came in testing whether Claude could replace ChatGPT for client production. The answer was no.

I left understanding that I'd been using the wrong frame the whole time.


What's Actually Happening Here????

The AI tools are being marketed as interchangeable general assistants. Each company is claiming theirs is the smartest, the most capable, the most aligned. They benchmark against each other on the same tests. They release similar features within weeks of each other. They look, on the surface, like substitutes.

They're not. They have actual specializations that don't map to how they're sold

OpenAI's product, with custom GPT architecture, holds voice for production work. That's what it's actually good at. Anthropic's product, with its conversational structure and willingness to push back, is good at thinking partnership. That's what it's actually good at. Neither company is marketing itself this way because "we're a thinking partner" doesn't scale to the same total addressable market as "we'll write your content."

The result is that most founders are using the wrong tool for the job they're trying to do. Not because they're choosing badly. Because the marketing flattened the differences. The choice gets framed as "which AI is best" instead of "what am I trying to do, and which tool is shaped for that work."

Underneath this is something I think about a lot in my own work, which is that the founders most attracted to "AI in your brand voice" tooling are the ones whose brand voice isn't strong enough yet to know it's being flattened. The ones with actual voice will notice immediately that the approximation isn't holding. So the product gets sold hardest to the people it serves worst. The buyers who'd benefit most from a thinking partner — founders with clear voice, real ideas, strong frameworks — are being sold a writing assistant they don't need.

This is PART of what I do for a living. I look at the gap between what's being sold and what's actually happening underneath. The AI tooling category is the same pattern I see in coaching, in courses, in the entire online business industry. The product is being designed for the buyer's anxieties, not the buyer's actual problem.


What I Found Out

I came to Claude looking for a tool to assist in content creation and I found out I don't need AI to write like at all.

I already know what I want to say. I have a voice. I have frameworks. I have years of thinking about buyer psychology and pricing and brand perception. When I sit down to write, the bottleneck isn't generation. It's everything around generation — the organizing, the structuring, the research, the formatting, the admin of getting a piece from my head into a published post.

What I needed wasn't a writing assistant. It was a collaborator. Someone to push against. To organize the material I already have. To handle the parts of the work that aren't thinking. To stay in a conversation across weeks and remember what I was building.

Ninety days of testing Claude didn't teach me whether to switch tools at Stoned Fruit. It taught me what I actually needed help with. Which turns out to be different from what AI is being sold to do.

The tools work. They're real. They're useful. But not the way they're marketed.

For client production work at Stoned Fruit, custom GPTs at OpenAI stay where they are. For my own thinking work, Claude is now part of how I do my job. Different tools, different specializations, different jobs.

The marketing wants you to pick one. The honest answer is they're not the same product.


Previous
Previous

What I Learned About Myself From 90 Days of Using Claude AI

Next
Next

What Queens of Coin Actually Does