How Sanas built a real-time video translation app in 3 months using Expo


Jason Lin

Guest Author

Scott Hickmann

Guest Author

John Hong

Guest Author

How Sanas built the world's first instant translation video app in 3 months using Expo. Real-time voice across 25+ languages with sub-2-second latency.

From prototype to App Store: How Sanas built an instant language translation app

For decades, engineers have dreamed of building a true universal translator: technology that allows two people to speak different languages yet hear each other as if they shared one. At Sanas, we set out to make that dream real.

In just a few months, a small team at Sanas, a Speech AI company, built and shipped the world’s first instant language translation video calling app using Expo. The app combines advanced large language model (LLM) translation research with Expo’s modern mobile tooling to make real-time multilingual conversations possible.

Building something this ambitious required a development stack that balanced flexibility, native performance, and speed of iteration. Expo gave us exactly that foundation. With its unified developer experience, fast iteration cycle, and access to native APIs, we could focus on solving one of AI’s most human problems: helping people understand each other.

The challenge of real-time, low latency voice translation

Real-time, low-latency voice translation is one of the hardest problems in communication technology. To feel natural, it has to capture and stream speech instantly, translate and synthesize new audio fast enough for live calls, and still preserve the unique tone and emotion of every speaker. All this needs to work under unpredictable network conditions, scale to more than 25 languages, and deliver reliably in a mobile video call.

Achieving all this within a mobile video call meant solving for latency, performance, and voice fidelity simultaneously. Even a few hundred milliseconds of lag can make a conversation feel stilted. We needed a system that felt effortless for users — on both iOS and Android — and a development framework that let us test and ship improvements daily.
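To make the latency constraint concrete, it helps to see how a sub-2-second end-to-end target decomposes across the pipeline stages described above. The numbers below are purely illustrative, not Sanas's measured figures:

```typescript
// Illustrative latency budget (all numbers hypothetical) for one
// speech-to-translated-speech round trip in a live call.
const budgetMs = {
  audioCapture: 100,      // chunked capture + encode on device
  networkUplink: 150,     // client to edge server
  speechRecognition: 400, // streaming ASR finalization
  translation: 500,       // LLM translation of the finalized segment
  speechSynthesis: 450,   // streamed TTS, time to first audio chunk
  networkDownlink: 150,   // edge server back to client
  playbackStart: 100,     // decode + jitter buffer on device
};

const totalMs = Object.values(budgetMs).reduce((a, b) => a + b, 0);
console.log(totalMs); // 1850, inside the 2-second target
```

A budget like this makes the trade-offs visible: shaving the translation stage buys headroom for jitter buffering, and any stage that blows its allocation is immediately felt as conversational lag.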

Why Sanas chose Expo

Our team members had used Expo during our time at Stanford, where it’s the recommended framework for mobile development in several courses. We knew it would let us move quickly without losing access to native APIs. With Expo, we could build, test, and deploy daily while maintaining full control over performance-critical components.

Expo’s unified toolchain and integrated services helped us stay focused on the hard problems of translation accuracy, latency, and naturalness.

  • Expo SDK 54 gave us deep integration with native APIs while maintaining a consistent developer experience.
  • EAS Build handled CI/CD for both iOS and Android without the overhead of manual configuration.
  • EAS Update enabled instant OTA rollouts to our internal testers and early users.

By offloading the complexity of builds, updates, and native configuration to Expo, we freed ourselves to focus on what mattered most: pushing the limits of speech translation.

Key Expo Modules and integrations

To achieve instant translation and smooth communication, we built the app around a mix of Expo modules and custom native integrations:

  • expo-audio: Audio recording and playback for live capture and synthesis
  • expo-camera: Real-time video streaming and participant feed control
  • expo-webrtc (custom integration): Encrypted peer-to-peer calling layer
  • expo-haptics: Subtle tactile feedback to signal translation events
  • expo-notifications: Call invites, missed call alerts, and translation updates
  • expo-secure-store: Local encryption for authentication and API credentials
  • expo-updates: Seamless OTA feature rollouts
  • Custom native modules: For LLM streaming inference and on-device voice cloning

Together, these modules allowed a small team to create a production-grade experience that feels instant and intuitive, without compromising on security, voice fidelity, or translation speed.

Sanas application architecture at a glance

Behind the scenes, the app runs on a streamlined architecture that balances edge intelligence with on-device efficiency. Each layer is designed to minimize delay and maximize fidelity:

  • The client is a React Native + Expo app responsible for UI, state management, and local audio/video processing.
  • An edge server handles low-latency speech recognition, translation, and voice synthesis using a fine-tuned multilingual LLM.
  • WebRTC transport manages real-time, bi-directional communication with dynamic quality adaptation to ensure stability even under network strain.
  • EAS Build and Update orchestrates build automation, staging channels, and feature flagging for A/B latency testing.
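The "dynamic quality adaptation" in the transport layer can be sketched as a simple policy that maps observed network stats to an outbound bitrate tier. This is a hedged illustration, not Sanas's actual logic; a real WebRTC app would feed the chosen value into something like `RTCRtpSender.setParameters()`, and the thresholds here are invented:

```typescript
// Hypothetical bitrate selection under network strain.
type NetworkStats = { rttMs: number; packetLossPct: number };

const TIERS_KBPS = [250, 600, 1200, 2500]; // low to high video quality

function pickBitrateKbps({ rttMs, packetLossPct }: NetworkStats): number {
  // Start at the top tier and back off one step per sign of strain.
  let tier = TIERS_KBPS.length - 1;
  if (rttMs > 150) tier--;
  if (rttMs > 400) tier--;
  if (packetLossPct > 2) tier--;
  if (packetLossPct > 8) tier--;
  return TIERS_KBPS[Math.max(tier, 0)];
}
```

The point of a tiered scheme is stability: stepping between a few fixed rates avoids the oscillation you get when chasing a continuously recomputed "optimal" bitrate on a noisy link.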

By distributing computation intelligently between the device and the edge, we achieved a translation experience that feels instantaneous to users while maintaining consistent performance across platforms.
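The staging channels and build automation mentioned above map onto EAS configuration in `eas.json`. A minimal sketch (profile names are illustrative, not Sanas's actual config) might look like:

```json
{
  "build": {
    "staging": {
      "channel": "staging",
      "distribution": "internal"
    },
    "production": {
      "channel": "production",
      "autoIncrement": true
    }
  }
}
```

Each build profile is pinned to an update channel, so an `eas update --channel staging` reaches internal testers without touching the production fleet.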

Example: Streaming translation in real time

One of the trickiest parts of building instant translation is coordinating live audio capture, transcription, translation, and synthesized speech playback all over a single connection with minimal latency. Here's how we wired it up:

```typescript
// A single WebSocket handles the entire translation pipeline
ws.onmessage = event => {
  const message = JSON.parse(event.data);
  switch (message.type) {
    case 'ready': {
      // Server is ready — start streaming audio chunks
      LiveAudioStream.on('data', (data: string) => {
        ws.send(JSON.stringify({ type: 'audio', data }));
      });
      LiveAudioStream.start();
      break;
    }
    case 'transcription': {
      // Progressive transcription with complete + partial words
      const completeText = message.complete.map(w => w.word).join('');
      const partialText = message.partial.map(w => w.word).join('');
      updateTranscription(completeText, partialText);
      break;
    }
    case 'translation': {
      // Full translation arrives after user stops speaking
      displayTranslation(message.complete.map(w => w.word).join(''));
      break;
    }
    case 'speech': {
      // Streamed TTS audio chunks with character-level timestamps
      audioChunks.push(base64ToUint8Array(message.audio));
      break;
    }
  }
};
```

Multiplexing transcription, translation, and voice synthesis over a single persistent WebSocket lets us avoid the round-trip overhead of separate API calls. With this method, users see their words transcribed as they speak and hear the translated response almost immediately after they finish.
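The `speech` case above relies on a `base64ToUint8Array` helper to turn the server's base64-encoded TTS chunks into raw bytes. Here is one self-contained way to write it in plain TypeScript, avoiding `atob` and `Buffer` since neither is guaranteed to exist in every React Native JS runtime (this is a generic sketch, not Sanas's exact implementation):

```typescript
// Decode a base64 string into raw bytes without atob/Buffer.
const B64 = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

function base64ToUint8Array(b64: string): Uint8Array {
  const clean = b64.replace(/=+$/, ''); // strip padding
  const bytes: number[] = [];
  let buffer = 0;
  let bits = 0;
  for (const ch of clean) {
    // Accumulate 6 bits per character; emit a byte once 8+ bits are buffered.
    // (Invalid characters are not handled in this sketch.)
    buffer = (buffer << 6) | B64.indexOf(ch);
    bits += 6;
    if (bits >= 8) {
      bits -= 8;
      bytes.push((buffer >> bits) & 0xff);
    }
  }
  return Uint8Array.from(bytes);
}
```

For example, `base64ToUint8Array('AQID')` yields the bytes `[1, 2, 3]`.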

The ROI of building with Expo

In just three months, a small team took our idea from prototype to production, launching on both the App Store and Google Play. The app now supports more than 25 languages, each preserving the speaker’s natural tone and identity. With average translation latency under 2 seconds, the experience feels fluid and native across devices.

Expo was the reason this pace was possible. Its unified toolchain, open-source flexibility, and cloud-based build system gave us everything we needed to deliver at scale. Rather than managing complex native configurations, we could focus our time on advancing the core translation models and improving the live experience.

As one of our engineers put it, “Expo let us focus entirely on pushing the limits of translation, not configuring build systems.”

What’s next for Sanas?

The work doesn’t stop here. Our team is now focused on the next frontier: making real-time translation even more expressive and human. For more information, you can read our technical writeup here.

Expo continues to be a key part of that journey. Its fast, reliable update system lets us ship model improvements and new features every week, turning research breakthroughs into real user experiences almost instantly.

At Sanas, our goal has always been simple: to help people understand and be understood, no matter the language. Expo helps us make that vision real.

