Platform

Real-time AI avatars, built for production.

Photorealistic faces, streaming full-duplex voice, and your choice of LLM — on web, mobile, or physical kiosks.

Real-time: Voice-to-voice round trip
50+: Supported languages
SaaS · On-prem · Kiosk: Deployment shapes
2022: Building digital humans

Architecture

How the platform fits together

Voice in, voice out

Streaming ASR captures speech, the LLM generates a response, TTS speaks it back — typically around two seconds end-to-end through the voice-backend pipeline on a co-located GPU pool.
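As a rough illustration of the three-stage turn described above, here is a minimal Python sketch that chains ASR, LLM, and TTS stages and records how long each one takes. Everything in it is an assumption made for illustration: the names `VoiceTurn`, `voice_to_voice`, and the `fake_*` stages are hypothetical stand-ins, not the platform's API, and the stages run one after another rather than streaming as the production pipeline does.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class VoiceTurn:
    """Collects per-stage wall-clock timings for one voice-to-voice turn."""
    stage_timings: List[Tuple[str, float]] = field(default_factory=list)

    def run(self, name: str, stage: Callable, payload):
        # Time a single stage and remember how long it took.
        start = time.perf_counter()
        result = stage(payload)
        self.stage_timings.append((name, time.perf_counter() - start))
        return result

    @property
    def total_seconds(self) -> float:
        # End-to-end latency is the sum of the sequential stages.
        return sum(seconds for _, seconds in self.stage_timings)


# Hypothetical stand-ins for the real speech and language services.
def fake_asr(audio: bytes) -> str:
    return "what time does gate 12 open"


def fake_llm(transcript: str) -> str:
    return f"You asked: {transcript}"


def fake_tts(reply: str) -> bytes:
    return reply.encode("utf-8")


def voice_to_voice(audio: bytes, asr=fake_asr, llm=fake_llm, tts=fake_tts):
    """Run one turn: speech in, transcript, reply text, speech out."""
    turn = VoiceTurn()
    transcript = turn.run("asr", asr, audio)
    reply = turn.run("llm", llm, transcript)
    speech = turn.run("tts", tts, reply)
    return speech, turn


speech, turn = voice_to_voice(b"\x00\x01")
```

Swapping a real service in for any `fake_*` stage leaves the timing logic unchanged, which makes it easy to see which stage dominates the round trip.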

Real-time avatar rendering

The 2D backend renders frames with Lipsync-2D, encodes them with NVENC, and publishes the stream over WebRTC using WHIP for ingest. The 3D backend renders Unreal Engine 5 characters with cinematic quality.
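For readers unfamiliar with WHIP (WebRTC-HTTP Ingestion Protocol): a publisher opens a session by sending its SDP offer in an HTTP POST with the `application/sdp` content type. The sketch below only builds that request with Python's standard library and does not send it. The endpoint URL, the bearer token, and the truncated SDP string are hypothetical placeholders, and whether the 2D backend structures its ingest exactly this way is an assumption.

```python
from urllib.request import Request


def build_whip_offer(endpoint: str, sdp_offer: str, bearer_token: str) -> Request:
    """Build (but do not send) the HTTP POST that opens a WHIP session."""
    return Request(
        endpoint,
        data=sdp_offer.encode("utf-8"),
        headers={
            "Content-Type": "application/sdp",
            "Authorization": f"Bearer {bearer_token}",
        },
        method="POST",
    )


# Hypothetical values for illustration only; a real offer is a full SDP body.
request = build_whip_offer(
    "https://ingest.example.com/whip/avatar-1",
    "v=0\r\n",
    "example-token",
)
```

In the WHIP flow, the server answers a successful offer with `201 Created`, an SDP answer in the body, and a `Location` header naming the session resource. An HTTP DELETE on that resource ends the session.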

Bring any LLM

OpenAI, Anthropic, Google, Mistral, or your own fine-tuned model behind a single config. Swap engines without re-architecting.
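One common way to get "swap with a config change" behavior is a small provider registry keyed by a config value. The Python sketch below shows that pattern under stated assumptions: `LLMConfig`, `register`, `build_llm`, and the two toy providers are hypothetical names invented for this example, and no real vendor SDK is called.

```python
from dataclasses import dataclass
from typing import Callable, Dict

Responder = Callable[[str], str]


@dataclass(frozen=True)
class LLMConfig:
    provider: str  # registry key, e.g. "echo"
    model: str     # model name passed through to the provider


_REGISTRY: Dict[str, Callable[[LLMConfig], Responder]] = {}


def register(provider: str):
    """Decorator that adds a provider factory to the registry."""
    def decorator(factory: Callable[[LLMConfig], Responder]):
        _REGISTRY[provider] = factory
        return factory
    return decorator


# Toy providers standing in for real vendor integrations.
@register("echo")
def _echo(config: LLMConfig) -> Responder:
    return lambda prompt: f"[{config.model}] {prompt}"


@register("shout")
def _shout(config: LLMConfig) -> Responder:
    return lambda prompt: f"[{config.model}] {prompt.upper()}"


def build_llm(config: LLMConfig) -> Responder:
    """Look up the configured provider and return a callable responder."""
    try:
        factory = _REGISTRY[config.provider]
    except KeyError:
        raise ValueError(f"unknown provider: {config.provider!r}") from None
    return factory(config)


# Only the config differs; the calling code stays the same.
first = build_llm(LLMConfig(provider="echo", model="model-a"))
second = build_llm(LLMConfig(provider="shout", model="model-b"))
```

Because callers only ever see the `Responder` callable, changing engines means editing the config rather than the conversation code.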

Deploy on your terms

Central SaaS, on-prem GCP, or an air-gapped kiosk appliance — the same product, three deployment shapes. One customer per stack when isolation matters.

Engagement timeline

From kickoff to first conversation.

What our pilot engagements typically look like — three weeks from the first call to real users talking to your avatar.

01
Week 1

Avatar selection + scope

Pick an avatar from the live catalog or brief a custom one. We agree on the use case, the channel, and the metric the pilot is judged on — usually first-contact resolution or completion rate.

02
Week 2

Integration

Wire your LLM (or use one of ours), load your knowledge base, and plug the avatar into the surface that matters: web, mobile, or a kiosk on your wall.

03
Week 3+

Production

Go live to real users. We watch metrics with you. If the pilot lands, you graduate to Growth or Enterprise without re-platforming.

Platform capabilities

Built for Real-World AI

Real-Time Lip Sync

Neural rendering generates facial expressions and lip movements frame-by-frame, synchronized to speech in real time.

Voice In, Voice Out

Full-duplex voice: ASR captures speech, the LLM generates a response, and TTS speaks it back, typically in around two seconds end-to-end on a co-located GPU pool.

50+ Languages

Speak to your avatar in Arabic, Mandarin, Spanish, Russian, or any of 50+ supported languages. Auto-detect included.

Bring Any LLM

OpenAI, Anthropic, Google, Mistral, or your own fine-tuned model. Swap with a config change.

2D & 3D Avatars

Photorealistic video avatars (Lipsync-2D / Wav2Lip) or cinematic 3D characters (Unreal Engine 5).

Kiosk-Ready

Hardware-tested for public deployments: airports, malls, hotels, hospitals. Offline fallback included.

Use cases

AI Avatars Across Industries

Replace Your Info Desk with AI

Airport terminals, shopping malls, hotel lobbies. Multilingual help 24/7 — no staff needed.

First-Contact Resolution, Every Time

Handle returns, troubleshoot products, guide users through processes. Consistent service across every channel.

Hire and Onboard on Autopilot

Screen candidates, answer policy questions, walk new hires through day one. Your brand, your tone, 24/7.

Education That Talks Back

Museum guides, training modules, product explainers. A character people actually want to talk to.

Platform FAQ

Latency is typically around two seconds voice-to-voice on the central SaaS deployment with a co-located GPU pool. Network distance and LLM choice can stretch it; on-prem and kiosk deployments target the same envelope when their GPU pool is co-located.
Platform — AIvatars