Mati Staniszewski (ElevenLabs): Why your voice will be the new AI interface

Mati Staniszewski, co-founder and CEO of ElevenLabs, has become a defining voice in generative voice AI and multimodal interaction. Learn how natural-sounding AI, agentic workflows, and voice interfaces will change the way people communicate with software and with each other.

Key takeaways

  • Voice will become the primary way humans interact with technology, replacing keyboards and screens as AI agents move from reactive support tools to proactive participants in daily workflows.
  • Emotion, intonation, and even imperfections make AI voices feel human. The hardest technical problem isn't generating speech, but understanding context well enough to deliver it naturally.
  • Trust in AI voice requires passing the Turing test in real conversations. That includes knowing when to pause, when to interrupt, and how to respond to emotional cues.
  • Engineering talent belongs everywhere, not just on product teams. Embedding technical resources across legal, operations, and go-to-market functions creates better processes and faster decision cycles.
  • Responsible AI use should start on day one. Safety mechanics, content moderation, and IP protection need to be baked into the development process from the first iteration.

For most of computing history, humans have adapted to machines. We learned to type, click, and swipe our way through interfaces designed around technical constraints rather than human instincts. Voice changes that equation entirely.

Founded in 2022, ElevenLabs has become the global standard for AI voice technology, powering everything from customer service agents that handle millions of conversations weekly to tools that help people who've lost their voices reconnect with their families.

At the helm is Mati Staniszewski, who co-founded ElevenLabs with his best friend of 15 years after recognizing a simple truth: the way we interact with technology is fundamentally broken.

In this conversation – part of our Perspectives series – Mati explains how voice will become the primary interface for AI, why emotion matters just as much as accuracy, and how ElevenLabs ships innovation at speed by organizing around small, autonomous teams with full ownership over their domains.

From one voice for every character to voices for everyone

Mati's story begins with an observation about how foreign films were dubbed in Poland, with a single monotone voice for every character. In his view, performances were flattened, and stories lost their emotional weight.

That early frustration planted a question. What would it take to break the language barrier entirely?

When ChatGPT launched in late 2022, Mati and his co-founder recognized that the technology needed to make this change had finally arrived. Large language models (LLMs) could understand context and process meaning across languages. But the interface was still text on a screen, making it a translation of the old paradigm rather than something genuinely new.

Voice, they realized, was the missing layer. Not just speech synthesis, but emotionally intelligent, context-aware audio that could make AI feel less like a tool and more like a conversation.

“We spend so much time on screens and keyboards. It's almost crazy. The ideal interaction between humans and technology will be different. Voice will be one of the primary interfaces, the way we're speaking now. It makes for a more real, nuanced, and interesting experience.”

Mati Staniszewski, Co-Founder and CEO, ElevenLabs

That insight became the foundation for ElevenLabs, which now supports more than 7,000 customers and processes over a million AI-driven voice interactions every week.

Why making AI voices human is harder than it sounds

ElevenLabs didn't set out to build another text-to-speech engine. The team wanted to solve a deeper problem: how do you make a machine understand context the way a voice actor would?

Traditional speech models relied on hard-coded features: male versus female, young versus old, or happy versus sad. Those labels captured some characteristics, but they missed the nuance that makes voices feel distinct. ElevenLabs took a different approach, letting the model learn what makes a voice unique without forcing it into predefined categories.

“We figured out a method of not hard-coding anything. We relied more on the model to figure out the characteristics of the voice, and then we created a better decoder than anybody in this space.”

Mati Staniszewski, Co-Founder and CEO, ElevenLabs

But understanding voice characteristics is only half of the equation. The harder part is context. The same sentence can sound completely different depending on what comes before it, who's speaking, and what emotion the moment requires. A great voice actor reads the script and adjusts instinctively. AI needs to learn that same intuition.

Intonation, pacing, and even imperfections all contribute to making speech feel natural. Accordingly, ElevenLabs trained its models to understand emotional patterns the way LLMs predict the next word in a sentence. The result is speech that doesn't just sound clear – it also sounds alive.

"The intonation, the emotions, even the imperfections are part of what makes a voice special," Mati says.

From static support to proactive agents

When Mati talks about the future of AI voice, he draws a clear line between reactive tools and proactive experiences.

Most customer service systems today operate in reaction mode. A customer has a problem, calls a number, navigates an IVR menu, and eventually reaches someone who can help. AI agents can make that experience faster and more accurate, but the real shift happens when agents stop waiting for problems and start participating in the full customer journey.

“It's not just a reactive help. It should be part of the full experience. You should be able to be on an e-commerce store and have an agent speak through you, show you items, explain what's good and bad about them, and navigate through the whole experience.”

Mati Staniszewski, Co-Founder and CEO, ElevenLabs

That vision is already playing out in practice. Meesho, India's largest e-commerce platform, uses ElevenLabs to power conversational shopping experiences. Immobiliare, Italy's biggest real estate marketplace, integrates voice agents directly into property searches. Square has also built voice into its ordering and checkout workflows, making AI a natural part of transactions rather than just a fallback option.

The pattern is consistent. In each case, voice moves from the call center to the storefront, from support to sales, and from solving problems to creating experiences.

Why education will change faster than most people expect

Beyond commerce, Mati sees education as one of the most transformative use cases for voice AI and one of the closest to his heart.

"It could democratize access to learning, which is incredibly exciting," he says.

That shift is already visible on websites like Chess.com, which lets users learn from AI versions of grandmasters like Magnus Carlsen and Hikaru Nakamura. MasterClass now offers voice agents that can walk users through cooking lessons with Gordon Ramsay or negotiation tactics with Chris Voss, not as recorded videos but as interactive conversations.

The implications stretch far beyond celebrity voices. With AI, personalized tutoring becomes affordable at scale. Students can learn at their own pace, ask questions in real time, and receive feedback tailored to their specific gaps in understanding. This model doesn't replace teachers. It extends their reach and brings high-quality instruction to students who would never have access otherwise.

ElevenLabs recently launched its Iconic Voice Marketplace, which includes educators and historical figures like Richard Feynman and Alan Turing. Rather than nostalgia, the idea is utility. Imagine learning physics directly from Feynman's voice, asking follow-up questions, and receiving explanations that adapt to your level of understanding.

“This will be phenomenal. You'll be able to learn, have a personalized experience from the get-go that understands what you're good at and what you're bad at, and tailor materials accordingly.”

Mati Staniszewski, Co-Founder and CEO, ElevenLabs

Restoring human voices through AI technology

For all the commercial applications ElevenLabs enables, the most moving use cases involve people who've lost their voices to ALS, cancer, or other conditions.

ElevenLabs has worked with more than 300 organizations and helped restore over 3,000 voices, allowing people to speak again using AI versions of their own voice. The company recently hosted a speaker at its annual summit who had lost her voice but used ElevenLabs to address the audience with her original accent intact.

“Seeing their reaction of just being able to connect with their family, with their close ones in a completely technological way, but bringing it back to the human and so similar to what they wanted – it was amazing.”

Mati Staniszewski, Co-Founder and CEO, ElevenLabs

These aren't edge cases. They're proof that the technology works, that it matters, and that voice carries emotional weight that no text interface can replicate.

ElevenLabs has also created a marketplace where people can license their voices to others, earning passive income when their voice is used. So far, more than 10,000 voices are available through the platform, and ElevenLabs has paid out $11 million to contributors.

How small teams with full ownership can ship faster

As ElevenLabs has scaled to nearly 400 people, Mati has held firm to an organizational principle that runs counter to most enterprise playbooks: small teams with high autonomy move faster than hierarchies with clear approval chains.

The company operates as a collection of independent labs – voice lab, agents lab, music lab – each with full ownership over its domain. Teams make their own decisions, set their own priorities, and execute without waiting for sequential approvals or cross-functional sign-offs.

“We have a lot of small teams, effectively small labs. Each team has a very high degree of independence and execution. When we hire the best people that are within those teams or leading those teams, they have almost full degree of ownership on leading the way.”

Mati Staniszewski, Co-Founder and CEO, ElevenLabs

The approach works, but only under specific conditions. First, the company needs to hire people who are aligned with the mission and excited about the same future. Second, teams need to trust that their members are better than leadership at their specific domains. Third, leaders need to be comfortable with less visibility into day-to-day work.

Mati is direct about the tradeoffs. He and his co-founder each manage around 20 direct reports. Most of those reports manage another 20 people. That structure creates broad spans of control and eliminates layers, but it requires trust at every level.

“It only works if you have people on those small teams that you can trust and rely on. That needs to stay true.”

Mati Staniszewski, Co-Founder and CEO, ElevenLabs

The company also avoids titles entirely, which allows people to take on any role that fits their strengths from day one rather than waiting for promotions or permission.

Engineering talent belongs everywhere, not just in product

One of ElevenLabs' less obvious structural decisions has been embedding engineers across non-technical teams. Legal, operations, sales, and marketing all include technical resources who build tools, automate workflows, and improve processes.

Mati points to several examples where this approach paid off quickly.

On the go-to-market side, ElevenLabs built a voice agent that qualifies inbound leads by having a conversation with prospects at the end of the contact form. Around 30% of prospects now choose to speak with the agent rather than just submitting a form, and the qualification rate for those leads is significantly higher. People share more information in conversation than they would by typing, and in the early stages they often feel more comfortable speaking with an AI agent than with a human salesperson.

The company built a similar agent for candidate experience, where job applicants can ask questions about benefits, remote work policies, and interview preparation before they ever speak with a recruiter.

On the internal side, ElevenLabs created an LLM-powered wiki that employees can query for company information, along with tools that pull data from Salesforce and Gong to answer questions like, "What's the latest on this deal?" directly in Slack.
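The article doesn't say how these internal tools are wired together, but the standard pattern for an LLM-powered wiki is retrieval-augmented generation: index the company's documents, pull the passages most relevant to a question, and hand them to the model as context. Here's a minimal, hypothetical Python sketch – the wiki entries are invented, retrieval is naive keyword overlap, and ask_llm is a stub standing in for whatever model the company actually calls:

```python
# Minimal, hypothetical sketch of an internal "ask the wiki" tool.
# Real systems typically use vector embeddings and a live LLM API;
# here retrieval is simple keyword overlap and the LLM call is stubbed.

WIKI = {
    "remote-work": "Employees may work remotely up to three days per week.",
    "benefits": "Health coverage starts on day one; dental is included.",
    "deals": "Deal notes are synced nightly from Salesforce and Gong.",
}

def retrieve(question: str, k: int = 2) -> list[str]:
    """Rank wiki pages by how many question words they share."""
    q_words = set(question.lower().split())
    scored = sorted(
        WIKI.values(),
        key=lambda text: len(q_words & set(text.lower().split())),
        reverse=True,
    )
    return scored[:k]

def ask_llm(prompt: str) -> str:
    # Stub: in practice this would call the company's LLM of choice.
    return f"[LLM answer based on prompt of {len(prompt)} chars]"

def answer(question: str) -> str:
    context = "\n".join(retrieve(question))
    prompt = f"Answer using only this context:\n{context}\n\nQ: {question}"
    return ask_llm(prompt)

print(answer("What is the remote work policy?"))
```

A production version would swap the keyword scoring for vector embeddings and wire answer() into a Slack bot handler, but the retrieve-then-prompt shape stays the same.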

These aren't product features. They're operational improvements that make the company run faster, and they only exist because engineers were embedded in teams that traditionally wouldn't have direct access to technical resources.

Mati notes that this structure isn't unique to ElevenLabs. During a recent visit to Ukraine, he learned that the government uses a similar model, deploying technical resources from the Ministry of Digital Transformation to other agencies like Education, Economy, and Foreign Affairs to drive innovation locally.

“We have engineering talent across each of those different divisions. And that helps us continue the innovation and speed.”

Mati Staniszewski, Co-Founder and CEO, ElevenLabs

Responsible AI starts on day one

ElevenLabs operates in a space where misuse is a constant risk. Voice cloning can be used for fraud, impersonation, and disinformation. The company has chosen to build safety mechanics into every product from the start rather than treating governance as an afterthought.

“Having guardrails from the start of the process actually makes the entire process easier. It shouldn't be an afterthought. It should be the first thing.”

Mati Staniszewski, Co-Founder and CEO, ElevenLabs

That philosophy shows up in several ways. Every voice output includes traceability markers that identify whether content is AI-generated. The platform also uses moderation systems to detect and block attempts to clone voices without permission or generate harmful content. And ElevenLabs partners with AI security institutes in the US and UK, sharing data and models to help regulators understand how the technology works and where risks might appear.
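The piece doesn't detail how ElevenLabs' traceability markers work; real audio provenance schemes typically embed a watermark in the signal itself. Purely to illustrate the concept, here is a hypothetical registry approach in Python, where the provider keeps a keyed fingerprint of every clip it generates:

```python
# Loose, hypothetical illustration of audio provenance (not ElevenLabs'
# actual mechanism, which the article doesn't describe). Each generated
# clip gets a keyed fingerprint stored in a registry; later, the provider
# can check whether a clip matches a known AI-generated output.
import hashlib
import hmac

SECRET_KEY = b"provider-held-signing-key"  # illustrative only
registry: set[str] = set()

def register_generated(audio: bytes) -> str:
    """Fingerprint a freshly generated clip and record it."""
    tag = hmac.new(SECRET_KEY, audio, hashlib.sha256).hexdigest()
    registry.add(tag)
    return tag

def is_known_ai_generated(audio: bytes) -> bool:
    """Check whether a clip matches a recorded AI-generated output."""
    tag = hmac.new(SECRET_KEY, audio, hashlib.sha256).hexdigest()
    return tag in registry

clip = b"\x00\x01fake-pcm-bytes"
register_generated(clip)
print(is_known_ai_generated(clip))           # True
print(is_known_ai_generated(b"other clip"))  # False
```

The limitation is instructive: a byte-level fingerprint breaks as soon as the audio is re-encoded, which is exactly why practical systems embed the marker in the waveform itself. The sketch shows only the verification workflow, not a deployable design.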

Beyond safety, the company has focused on bringing people and intellectual property into the ecosystem in ways that create value rather than extracting it. The voice marketplace allows individuals to license their voices and earn passive income. The iconic voice marketplace brings estates and public figures into partnerships where their voices can be used for specific projects – from education to brand campaigns and creative work – with clear permissions and compensation.

“We try to build safety mechanics from the beginning. Traceability, moderation, partnering with security institutes, and making sure people and IP are part of it.”

Mati Staniszewski, Co-Founder and CEO, ElevenLabs

What happens when language barriers fall

When Mati looks five years into the future, he sees a world where language is no longer a limitation.

“You could travel anywhere in the world, immerse yourself in that culture locally, understand people around you, and have no barrier to doing that. You feel like you're speaking with another person. That just feels like an ultimate flow.”

Mati Staniszewski, Co-Founder and CEO, ElevenLabs

Real-time translation already exists. But it's clunky, with typed text, awkward pauses, and interfaces that require conscious effort. Mati's vision is more seamless: walking through a market in Tokyo, ordering food in a restaurant in Buenos Aires, or having a deep conversation with someone whose native language you don't speak, all without reaching for a keyboard or waiting for an app to catch up.

Voice makes that possible. Paired with visual context and the reasoning capabilities of LLMs, AI can translate not just words but tone, emotion, and cultural nuance. The result is communication that feels natural rather than mediated.

ElevenLabs is already working toward that future. The company recently launched a speech-to-text model that supports 100 languages (roughly 70 more than most competing models) with lower latency and higher accuracy than anything else on the market. More than translating words, it's about preserving meaning across contexts.

The interface of the future is still being written

Mati is quick to acknowledge that voice alone won't define how humans interact with AI. The future will be multimodal, with voice, vision, text, and reasoning all working together.

“Voice will be such an important delivery driver. But it will not be the only pillar. To be able to learn, you'll need a visual piece. For language barriers to fall, you'll need a combination of visual and audio models fusing together. And for the general interface of technology – whether it's robots or other devices – you'll need voice, but also vision and understanding of what's happening around you.”

Mati Staniszewski, Co-Founder and CEO, ElevenLabs

That shift will take time. Mati believes the core experience, where AI meets users on their terms rather than forcing them to adapt, is three to five years away. Robots and fully immersive environments will take longer, probably decades.

But the direction is clear. Instead of learning programming languages, mastering keyboard shortcuts, or navigating nested menus, people will simply ask for what they want. The machine will handle the translation.

"All of that shifting to what we want and how we want it… I think it's going to be a phenomenal change," Mati says.

Conclusion

Voice is the most natural interface humans have. It carries emotion, context, and nuance in ways that text never will. As AI agents move from reactive tools to proactive participants in everyday life, voice becomes the layer that makes those interactions feel human.

But building voice AI at scale requires more than models. It requires small teams with real ownership, engineers across the organization, and safety mechanics built into every product from day one. It requires thinking about IP, compensation, and governance as core features rather than compliance checkboxes.

For leaders navigating AI adoption, Mati's advice is straightforward: try the tools yourself. Don't just read about them or delegate testing to others. Build something, even if it's simple. Understand what works, what doesn't, and where the real value sits.

The companies that thrive in the AI era are the ones that stay curious, move fast, and remember that the technology exists to serve people first, not the other way around.
