Behind the build: How we taught our AI agent to speak your brand’s language

November 17, 2025

When we set out to build Tone of Voice for our AI Customer Agent, we started with a simple, clear problem: B2C marketers and service leaders want AI that feels like their brand, not a one-size-fits-all bot.

You spend years building a distinct brand voice, whether it’s witty, professional, or warm. But AI conversations often sound generic, and shoppers definitely notice when a brand’s tone feels off. Our goal was to empower brands to align the AI’s communication style with their brand identity.

It sounds simple. Just add “Be friendly” to the prompt, right?

We learned very quickly that this is not the case. You’re not just adding a style. You’re fundamentally altering the AI’s logic.

This is a behind-the-build look at how we’re thinking through this problem and the key lessons we learned, and are still learning, about building AI that is both personal and precise.

Lesson 1: Your first, most important decision is what not to do

Before we wrote a single prompt, we had to confront the biggest risk: the inappropriate application of tone to stressful situations.

We call this “tonal dissonance.”

Think about it. A ‘Playful’ or ‘Witty’ tone is perfect for a product recommendation. But that same tone applied to a frustrated customer asking, “Where Is My Order?” is a brand disaster.

We realized that the tone of voice a brand uses for marketing is likely to differ from the tone it uses for service.

So, our MVP was defined by guardrails:

Scoping: We decided that for the MVP, tone settings would only apply to marketing-oriented LLM calls, like product recommendations.
Exclusion: All “service-oriented” and internal calls were explicitly out of scope.

This was the most important decision we made. It allowed us to move forward safely. Before you can define what your AI’s personality is, you must define what it should never be.
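
To make the scoping rule concrete, here is a minimal sketch of how a guardrail like this could look in code. The `CallType` names, the allowlist, and `resolve_tone` are hypothetical and only illustrate the idea of "tone applies to marketing-oriented calls, nothing else"; they are not our production code.

```python
from enum import Enum
from typing import Optional


class CallType(Enum):
    """Hypothetical categories of LLM calls an agent might make."""
    PRODUCT_RECOMMENDATION = "product_recommendation"  # marketing-oriented
    ORDER_STATUS = "order_status"                      # service-oriented
    INTERNAL_ROUTING = "internal_routing"              # internal


# Assumed allowlist: only marketing-oriented calls may carry a tone.
TONE_ELIGIBLE_CALLS = frozenset({CallType.PRODUCT_RECOMMENDATION})


def resolve_tone(call_type: CallType, brand_tone: str) -> Optional[str]:
    """Return the brand tone for in-scope calls, or None when tone is out of scope."""
    if call_type in TONE_ELIGIBLE_CALLS:
        return brand_tone
    return None


print(resolve_tone(CallType.PRODUCT_RECOMMENDATION, "playful"))
print(resolve_tone(CallType.ORDER_STATUS, "playful"))
```

An allowlist (rather than a blocklist) means any new call type is excluded from tone by default until someone deliberately opts it in.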

Lesson 2: You have to translate “vibe” into “code”

You can’t just tell an LLM to “Be friendly.” You have to define friendly in a way the model can execute. We had to move from vague adjectives to concrete, actionable prompt instructions.

We decided on four initial tones: Neutral, Professional, Friendly, and Playful.

Then, our team’s job became translating those vibes into rules.

For Professional: The vibe was polished and precise. The AI needed to directly address the user’s question, using precise, objective language with no contractions. It also needed to present information logically and conclude with formal, helpful closings like “Please let me know if you require further clarification.”
For Friendly: The vibe was warm and approachable. For this tone, the AI needed to use warm, conversational language with contractions and a positive tone. It also needed to avoid overly formal or impersonal language, and use an occasional exclamation point to express authentic warmth. Finally, the Friendly tone should conclude with a warm, helpful statement that invites further questions.
For Playful: The vibe was witty and a bit cheeky. For this tone, the AI needed to use a confident, witty, and cheeky voice, like chatting with a clever friend. It needed to add a splash of humor and playful charm to its messages, and use confident, dynamic descriptors like ‘game-changer’ or ‘perfect match’ where appropriate. The AI could ask exactly one rhetorical question to engage the shopper, and should always end with a playful, encouraging statement that drives action.
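
One way to picture the translation from vibe to rules is a simple lookup from tone name to explicit instructions. The sketch below is illustrative: the rule text paraphrases the descriptions above, while the `TONE_RULES` structure, the `build_tone_block` helper, and the choice to give Neutral no extra instructions are assumptions.

```python
from typing import Dict, List

# Rule text paraphrases the tone descriptions; the data structure is hypothetical.
TONE_RULES: Dict[str, List[str]] = {
    "neutral": [],  # assumption: Neutral adds no extra tone instructions
    "professional": [
        "Directly address the user's question.",
        "Use precise, objective language with no contractions.",
        "Present information logically.",
        "Conclude with a formal, helpful closing.",
    ],
    "friendly": [
        "Use warm, conversational language with contractions and a positive tone.",
        "Avoid overly formal or impersonal language.",
        "Use an occasional exclamation point to express authentic warmth.",
        "Conclude with a warm, helpful statement that invites further questions.",
    ],
    "playful": [
        "Use a confident, witty, and cheeky voice, like chatting with a clever friend.",
        "Add a splash of humor and playful charm.",
        "Use confident, dynamic descriptors like 'game-changer' or 'perfect match' where appropriate.",
        "Ask exactly one rhetorical question.",
        "End with a playful, encouraging statement that drives action.",
    ],
}


def build_tone_block(tone: str) -> str:
    """Render the rules for one tone as a bulleted prompt section ('' if none)."""
    rules = TONE_RULES[tone]
    if not rules:
        return ""
    return "Tone instructions:\n" + "\n".join(f"- {rule}" for rule in rules)


print(build_tone_block("professional"))
```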

From these rules, you get truly distinct personalities. When we ran our tests, the difference was night and day.

Test Question: “Are you real?”

Neutral: “I’m real in the sense that I’m here to assist you with your questions. I’m an AI assistant, not a human. How can I help you today?”
Professional: “I am a digital assistant designed to provide information and assistance. Please let me know if you have questions or need further clarification.”
Friendly: “I’m as real as a digital assistant can be! I’m here to help answer your questions and provide information. How can I assist you today?”
Playful: “I’m as real as a digital assistant can be! Here to help and add a bit of sparkle to your day. What can I do for you?”

Lesson 3: Function always overrules personality

This was our biggest “oh no” moment, and our most valuable lesson. Our initial technical approach was to simply attach tone instructions to the system prompt.

It immediately broke our agent.

We noticed major issues: the SMS character limit was no longer respected consistently, and product links stopped appearing in SMS messages once a tone was applied.

The root cause was that the LLM was getting confused. It saw the new, fun instruction (“Be playful!”) and prioritized it over the critical functional instructions: “You must include product links” and “You must stay under 160 characters for SMS.”

The tone instructions were being diluted in practice or, worse, directly overriding core functionality.

So, we had to re-architect our prompt hierarchy. We created a universal requirement prompt that acts as the AI’s primary directive, above all personality.

This was the key. We had to explicitly tell the AI: your job comes first, your personality second.
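
As a rough sketch of what "job first, personality second" can look like, the code below always places the functional requirements ahead of any tone section when assembling a system prompt. The exact wording, the section labels, and the helper names are assumptions for illustration. The `functional_violations` check is also an assumption: a simple post-generation safety net for the two rules named above, not something described in our pipeline.

```python
import re
from typing import List

SMS_CHAR_LIMIT = 160  # from the "stay under 160 characters for SMS" rule

# Hypothetical wording for the universal requirement prompt.
UNIVERSAL_REQUIREMENTS = (
    "PRIMARY DIRECTIVE (always takes precedence over tone):\n"
    "- You must include product links.\n"
    "- You must stay under 160 characters for SMS."
)


def build_system_prompt(tone_block: str) -> str:
    """Put functional requirements first; tone, if any, comes second."""
    sections = [UNIVERSAL_REQUIREMENTS]
    if tone_block:
        sections.append(
            "SECONDARY (apply only where it does not conflict with the above):\n"
            + tone_block
        )
    return "\n\n".join(sections)


def functional_violations(reply: str, channel: str) -> List[str]:
    """List which functional rules a generated reply breaks (empty list if none)."""
    problems = []
    if not re.search(r"https?://\S+", reply):
        problems.append("missing product link")
    if channel == "sms" and len(reply) >= SMS_CHAR_LIMIT:
        problems.append("too long for SMS")
    return problems


print(build_system_prompt("Tone instructions:\n- Be playful."))
print(functional_violations("Your perfect match: https://example.com/p/1", "sms"))
```

Ordering alone does not guarantee the model will comply, which is why a sketch like this pairs the prompt hierarchy with a check on the output.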

Lesson 4: If you can’t measure it, it’s a gimmick

Finally, how do we know it’s working? We can’t just feel it. We had to build a rigorous, scalable quality check system.

We use an LLM-as-a-judge to rank responses for desired tonal attributes. This judge grades every single response on three specific dimensions:

Correctness preservation: Does the tone break the facts? We had to be honest about this. Our evaluations showed a small degradation in correctness when tone was added. This is a trade-off we must constantly monitor.
Tone alignment: Does the Friendly response actually sound friendly? Our evaluations scored 2.99/3 for Friendly, so we knew the prompt worked.
Tone distinction: Can you tell the difference between “Neutral” and “Professional”? Our evaluations scored 2.91/3 and higher, so we knew they were truly distinct.

This evaluation framework is our check engine light. It lets us know that we’re delivering a real, high-quality, and distinct feature, not just a label.
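
To show the shape of an evaluation like this, here is a minimal sketch that averages per-response judge scores across the three dimensions. The judge is passed in as a plain function, and `stub_judge` stands in for a real LLM call. The names, the use of a 3-point maximum for every dimension, and the fixed stub scores are all assumptions, not our actual evaluation harness or results.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List


@dataclass
class JudgeScores:
    """Scores for one response; a maximum of 3 per dimension is assumed."""
    correctness: int
    tone_alignment: int
    tone_distinction: int


# A judge takes (response, tone) and returns scores. In practice this would
# wrap an LLM call; here it is just a type alias for illustration.
Judge = Callable[[str, str], JudgeScores]


def evaluate(responses: List[str], tone: str, judge: Judge) -> Dict[str, float]:
    """Average the judge's scores over all responses for one tone."""
    if not responses:
        raise ValueError("need at least one response to evaluate")
    scores = [judge(response, tone) for response in responses]
    return {
        "correctness": mean(s.correctness for s in scores),
        "tone_alignment": mean(s.tone_alignment for s in scores),
        "tone_distinction": mean(s.tone_distinction for s in scores),
    }


def stub_judge(response: str, tone: str) -> JudgeScores:
    """Placeholder judge with fixed scores, standing in for an LLM-as-a-judge."""
    return JudgeScores(correctness=3, tone_alignment=3, tone_distinction=2)


print(evaluate(["Hi there!", "Happy to help!"], "friendly", stub_judge))
```

Tracking correctness alongside the two tone scores is what makes the trade-off visible: a tone change that lifts alignment but lowers correctness shows up in the same report.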

From predefined tones to true personas

This MVP, featuring predefined tones and a preview playground, is just the foundation.

Now that we’ve built the guardrails, we can get to the really exciting stuff, which we’re already scoping out for the future:

Custom tones: Letting brands write their own tone instructions.
Brand-specific vocabulary: Teaching the AI to use brand-specific terms, like ‘Hadasei-3™’ for Tatcha’s signature ingredient blend or ‘24/7® Collection’ for ThirdLove’s core bra line.
Full-blown agent personas: Giving the AI a name and a backstory, like “Emily, a style-savvy shopping assistant who is trend-obsessed and treats customers like friends.”

Adding personality to AI is one of the hardest and most rewarding challenges in the space. It’s a constant balance between creative expression and technical precision.

By starting with why and building guardrails first, we can create AI that’s not just functional, but an authentic extension of the brands our customers have worked so hard to build.

Maya Keeley

Software engineer

Maya is a Software Engineer on Klaviyo's Customer Agent team, where she’s helping to build the future of AI personalization, including new tone of voice capabilities. She brings over five years of engineering experience, most recently from an agency where she led teams, shipped products for more than 20 startups, and co-invented a patent-pending document matching algorithm. After years of implementing Klaviyo for her own clients, she decided to join the company to help scale its impact. She is now focused on the technical challenge of building AI that is not only powerful but also practical and controllable for the user.