Sarvam AI + EkStep Voice India: 5M Users, 42% Health Enrollment Boost in 31 Days [Listen at Scale Results]

[Image: Sarvam AI and EkStep's voice AI reached 5 million users, including rural senior citizens, through phone calls in 11 Indian languages.]

Key Takeaways

  • Sarvam AI and EkStep Foundation’s “Listen at Scale” program reached 5 million Indians in 31 days through voice AI conversations in 11 local languages — The February-March 2026 pilot generated 7.4 million conversation minutes across 16,000+ panchayats, proving voice-first interfaces can overcome India’s 73% non-English speaker barrier where text-based apps and IVR systems have consistently failed
  • National Health Authority achieved 42% increase in daily Ayushman scheme enrollments after contacting 1.4 million senior citizens via voice AI — Two-way conversational outreach in regional languages replaced ineffective SMS/IVR campaigns that historically saw under 5% response rates, demonstrating government service delivery transformation through accessible technology designed for India’s linguistic diversity
  • 414,000 persons with disabilities profiled and 51,000 actionable aid profiles generated in a single deployment — The Department of Empowerment of Persons with Disabilities used Sarvam’s voice platform to identify assistive device needs and financial aid eligibility automatically, while Piramal Foundation processed 49,000 interactions at peak single-day capacity with 95%+ data accuracy for policy implementation at scale

On March 11, 2026, Sarvam AI and EkStep Foundation announced results from “Listen at Scale,” a month-long pilot demonstrating that voice AI in Indian languages can reach millions who’ve been excluded by English-only, text-based digital infrastructure. The numbers validate what India’s AI ecosystem has argued for years: the country’s digital future is voice-first, not smartphone-app-first.

Five million unique users interacted with AI voice agents over 31 days. These weren’t tech-savvy urban Indians experimenting with ChatGPT. These were senior citizens in rural Uttar Pradesh enrolling in health schemes, persons with disabilities in Odisha sharing accessibility needs, and members of 16,000+ panchayats providing community data, all through simple phone calls in their native languages.

The deployment represents the first large-scale collaboration between India’s emerging AI startup ecosystem, government ministries seeking better service delivery, and nonprofit infrastructure providers capable of reaching the last mile. Unlike top-down government technology rollouts or isolated startup experiments, this model orchestrated three layers: Sarvam AI provided voice technology and models, EkStep Foundation contributed ecosystem infrastructure and funding, and AI4Bharat offered research frameworks and performance analysis.

The Three Use Cases That Delivered Results

[Infographic: 31-day results across the three use cases: National Health Authority enrollment, disability empowerment profiling, and panchayat data collection.]

The pilot focused on government service delivery challenges where existing digital approaches had consistently underperformed.

National Health Authority needed to enroll senior citizens in Ayushman Vay Vandana Yojana, a health scheme targeting elderly populations. Traditional outreach through SMS notifications and automated IVR calls generated under 5% response rates. Senior citizens either didn’t read text messages, couldn’t navigate IVR menus, or simply ignored robotic voice prompts.

Sarvam’s voice AI agents conducted two-way conversations in regional languages, explaining scheme benefits, answering eligibility questions, and guiding enrollment steps conversationally. The authority contacted 1.4 million senior citizens and recorded a 42% increase in daily enrollments compared to previous SMS/IVR campaigns. The shift from broadcast messaging to conversational engagement transformed passive information distribution into active enrollment assistance.
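
To make the contrast with one-way IVR concrete, here is a minimal sketch of what such a two-way enrollment dialogue could look like in code. Everything in it is illustrative: the `EnrollmentCall` class, the prompt names, and the age threshold are assumptions for this example, not details of Sarvam’s actual platform.

```python
# A minimal, hypothetical sketch of a two-way enrollment dialogue.
# Class names, prompts, and the age threshold are illustrative assumptions,
# not details of Sarvam's platform.
from dataclasses import dataclass, field

@dataclass
class EnrollmentCall:
    language: str                       # e.g. "hi" for Hindi
    answers: dict = field(default_factory=dict)

    def run(self, say, listen):
        """Ask a question, interpret the spoken reply, and branch on it,
        rather than playing a fixed one-way IVR menu."""
        say("scheme_intro", self.language)            # explain benefits first
        if listen("Would you like to enroll?") != "yes":
            return None
        # Eligibility is established conversationally, with follow-ups.
        age = int(listen("What is your age?"))
        if age < 70:                                  # assumed cutoff, for illustration
            say("not_eligible", self.language)
            return None
        self.answers["age"] = age
        self.answers["id"] = listen("Please say your ID number.")
        say("enrollment_confirmed", self.language)
        return self.answers
```

In a real deployment, `say` and `listen` would wrap text-to-speech and speech recognition over a telephone line; the point of the sketch is that each answer changes what the agent says next, which broadcast SMS and fixed IVR menus cannot do.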

The Department of Empowerment of Persons with Disabilities needed granular data on accessibility requirements to allocate assistive devices and financial aid effectively. Existing surveys relied on field workers visiting households with paper forms—slow, expensive, and difficult to scale across India’s geography.

Voice AI agents contacted and profiled 414,000 persons with disabilities, asking standardized questions about mobility needs, vision/hearing requirements, current device usage, and financial situations. The system generated 51,000 actionable profiles containing specific device recommendations and aid eligibility determinations. Field workers could then visit households already identified as eligible, dramatically improving resource allocation efficiency.
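
The numbers imply a filtering step: only about one in eight interviews produced a profile a field worker could act on. Here is a hedged sketch of what that transformation might look like; the field names, eligibility rule, and income cutoff are invented for illustration, not drawn from the actual deployment.

```python
# Hypothetical sketch of turning standardized survey answers into an
# "actionable" aid profile. Field names and eligibility rules are illustrative.
from dataclasses import dataclass
from typing import Optional

@dataclass
class SurveyResponse:
    respondent_id: str
    mobility_need: bool          # e.g. wheelchair or walking aid required
    vision_or_hearing_need: bool
    has_device_already: bool
    monthly_income_inr: int

def to_actionable_profile(r: SurveyResponse) -> Optional[dict]:
    """Return a device recommendation plus aid eligibility, or None if the
    response does not translate into a concrete follow-up action."""
    if r.has_device_already or not (r.mobility_need or r.vision_or_hearing_need):
        return None
    return {
        "respondent_id": r.respondent_id,
        "recommended_device": "mobility_aid" if r.mobility_need else "sensory_aid",
        # Assumed income cutoff, purely for illustration.
        "financial_aid_eligible": r.monthly_income_inr < 10_000,
    }
```

Under rules like these, most respondents would drop out because they already own a device or report no unmet need, which would explain why 414,000 interviews yielded 51,000 actionable profiles.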

Piramal Foundation, a nonprofit operating across rural India, tested voice AI for community data collection across 16,000+ panchayats. The foundation’s field teams previously spent weeks traveling to villages conducting manual surveys. Voice AI agents reached panchayat members directly via phone, collecting village-level data on infrastructure needs, program awareness, and implementation challenges.

The system processed 49,000 interactions in a single day at peak capacity and maintained 95%+ accuracy in data collection, verified against ground-truth field visits. What previously required months of manual surveying was completed in days, with higher data quality and instant digital availability for policy decisions.
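
The announcement does not say how the 95%+ figure was computed, but a natural definition is field-level agreement between AI-collected answers and audited field visits on a sampled subset of villages. A small sketch under that assumption, with invented village and question identifiers:

```python
# Hypothetical sketch of validating AI-collected survey data against
# ground-truth field visits: per-answer agreement on an audited sample.
def field_accuracy(ai_records: dict, field_records: dict) -> float:
    """Both arguments map village_id -> {question: answer}. Returns the
    fraction of audited answers where the AI matches the field visit."""
    matches = total = 0
    for village_id, truth in field_records.items():
        ai = ai_records.get(village_id, {})
        for question, answer in truth.items():
            total += 1
            matches += (ai.get(question) == answer)
    return matches / total if total else 0.0

# Example: two villages, five audited answers, four agreements -> 80%.
ai = {"v1": {"has_phc": "yes", "road": "paved"},
      "v2": {"has_phc": "no", "road": "dirt", "school": "yes"}}
gt = {"v1": {"has_phc": "yes", "road": "paved"},
      "v2": {"has_phc": "no", "road": "paved", "school": "yes"}}
print(f"{field_accuracy(ai, gt):.0%}")  # prints 80%
```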

Why Voice Works Where Apps Failed

[Image: Voice AI versus text-based apps across India's digital divide: 73% non-English speakers, 11 supported languages.]

India’s digital divide isn’t primarily about smartphone penetration or internet connectivity—it’s about interface design that assumes English literacy and comfort with text-based interaction.

Census 2011 data shows 73% of Indians don’t speak English. Even among smartphone owners, navigating English-language apps or typing in local languages requires digital literacy that many rural and elderly Indians lack. The Government of India’s 2019 Digital Literacy report found that 66% of the rural population struggles with text-based interfaces despite owning basic phones.

Voice AI in local languages eliminates both barriers simultaneously. Users speak naturally in their mother tongue. The AI understands context, asks clarifying questions, and responds conversationally—replicating the human interaction model Indians already trust when speaking with government officials, healthcare workers, or community leaders.

The technology shift mirrors India’s UPI payments revolution. UPI succeeded by meeting users where they were (familiar with phone numbers, comfortable with PINs) rather than demanding they learn banking apps. Voice AI succeeds by meeting users in their language through their existing behavior (making phone calls) rather than demanding they learn to read, type, and navigate apps.

The Sovereign AI Infrastructure Play

Sarvam AI’s selection under the IndiaAI Mission—receiving ₹246.72 crore in government funding—positions the company as critical infrastructure for India’s digital sovereignty strategy. Unlike deployments built on Google Assistant, Amazon Alexa, or other foreign AI platforms, Sarvam operates domestic voice models trained on Indian languages and dialects.

The company released two open-source models in February-March 2026: Sarvam 30B (30 billion parameters with Mixture of Experts architecture) and Sarvam 105B (105 billion total parameters, 10.3 billion active, optimized for reasoning tasks). Both models are available under Apache License 2.0 on Hugging Face and AIKosh, India’s national AI repository.
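
Since the weights are published on Hugging Face under Apache 2.0, loading them should follow the standard `transformers` pattern. A minimal sketch; the repository id below is a placeholder, so check Sarvam’s Hugging Face organization page for the actual model names.

```python
# Minimal sketch of loading open weights with Hugging Face transformers.
# "sarvamai/sarvam-30b" is a placeholder repo id, used here for illustration.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "sarvamai/sarvam-30b"  # hypothetical; verify the real repo name
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, device_map="auto")

prompt = "आयुष्मान योजना के लिए कौन पात्र है?"  # "Who is eligible for the Ayushman scheme?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```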

This open-source approach deliberately builds an application ecosystem. Twenty startups received ₹1 crore grants plus 500,000 free voice AI minutes to develop applications on Sarvam’s platform. The model echoes UPI’s success: government-funded infrastructure (NPCI built UPI rails) enabling private innovation (PhonePe, Google Pay, Paytm built payment apps). Sarvam provides voice infrastructure; startups build healthcare, education, agriculture, and governance applications.

Union Home Minister Amit Shah, speaking at the India AI Impact Summit where Sarvam presented results, stated that “the future belongs to India” in AI development, explicitly citing multilingual voice capabilities as a competitive differentiator against English-dominant American and Mandarin-focused Chinese AI systems.

From Pilot to Production: What Comes Next

The Listen at Scale pilot proved technical feasibility and government interest. Scaling from 5 million users to 500 million—Sarvam’s stated goal—requires infrastructure, partnerships, and continued funding.

Sarvam CEO Vivek Raghavan confirmed the company is preparing an additional fundraise to expand deployments beyond the three pilot use cases into education (reaching non-literate adult learners), agriculture (providing crop advisory to farmers), and financial inclusion (enabling voice-based banking for unbanked populations). The company previously raised $41 million from Lightspeed Venture Partners, Peak XV Partners, and Khosla Ventures.

Hardware expansion represents another frontier. Prime Minister Narendra Modi wore Sarvam Kaze smart glasses, AI-powered wearables supporting 10+ Indian languages with real-time translation, at the AI Summit. The product launches in May 2026, targeting professionals who need multilingual communication and persons with visual impairments who require voice-first computing interfaces.

The 20 startups building on Sarvam’s platform will begin deploying applications in Q2 2026, creating the first wave of India’s voice AI application layer. If these deployments show returns comparable to the pilot results (42% enrollment increases, 95%+ data accuracy, and the claimed 10x cost reduction), expect rapid adoption across state governments and central ministries.

India’s bet on voice-first digital infrastructure challenges the smartphone-centric model dominating global technology. If Sarvam and EkStep’s model scales successfully, it offers a template for emerging markets facing similar linguistic diversity and literacy challenges: build AI that speaks the people’s languages, not AI that demands people learn English.


FAQs

How does Sarvam’s voice AI differ from Google Assistant or Alexa for Indian languages?

Sarvam’s models are specifically trained on 11 Indian languages with dialectal variations, achieving higher accuracy for regional accents compared to Google Assistant (supports 2-3 Indian languages) or Alexa (limited Hindi support). More critically, Sarvam operates on domestic infrastructure under Indian data sovereignty rules, keeping conversation data within India rather than routing to US servers. The system is optimized for phone call interfaces (traditional mobile/landline calls) rather than requiring smart speakers or specific apps, making it accessible via India’s existing 1.17 billion phone connections. Sarvam also supports lower-bandwidth connections common in rural areas where internet-dependent assistants fail.

Can Indian startups build applications on Sarvam’s voice AI platform?

Yes, through the Listen at Scale program backed by EkStep Foundation and IndiaAI Mission funding. Twenty startups received ₹1 crore grants plus 500,000 free voice AI conversation minutes. Developers can access Sarvam’s APIs for voice recognition, natural language understanding in 11 Indian languages, text-to-speech generation, and conversation management. The open-source Sarvam 30B and 105B models are available on Hugging Face under Apache License 2.0 for customization. Startups building healthcare, education, agriculture, or governance applications can apply through EkStep Foundation’s innovation programs, or contact Sarvam directly for enterprise API access at commercial rates once the free tier is exhausted.
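
As a rough illustration of what calling a hosted voice API tends to look like, here is a hypothetical speech-to-text request. The URL, headers, and field names below are placeholders, not Sarvam’s documented API; consult the official docs for real endpoints and parameters.

```python
# Hypothetical REST sketch of calling a hosted speech-to-text endpoint.
# The URL, headers, and fields are placeholders, not Sarvam's real API.
import requests

API_URL = "https://api.example.com/v1/speech-to-text"   # placeholder URL
API_KEY = "YOUR_API_KEY"

with open("call_audio.wav", "rb") as f:
    resp = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        files={"audio": f},
        data={"language": "hi-IN"},   # illustrative language code
        timeout=30,
    )
resp.raise_for_status()
print(resp.json().get("transcript"))
```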

What are the 11 Indian languages supported and what’s the accuracy rate?

Sarvam supports Hindi, Tamil, Telugu, Kannada, Malayalam, Marathi, Bengali, Gujarati, Punjabi, Odia, and Assamese, together covering approximately 90% of India’s population. The company claims 95%+ accuracy for data collection tasks (verified in the Piramal Foundation deployment) and sub-500ms latency for voice responses, comparable to human conversation flow. Accuracy varies by language: Hindi, Tamil, and Telugu achieve 96-98% thanks to larger training datasets, while Assamese and Odia range from 92 to 94% as the models continue training. The system handles code-switching (mixing English words into local-language sentences, common in urban India) and regional dialects through continuous learning from deployment data.

Is conversation data kept secure and private under Indian regulations?

Sarvam operates under India’s Digital Personal Data Protection Act 2023 with data residency requirements ensuring all voice recordings and transcripts remain on servers physically located in India. The company partners with government-approved cloud providers (likely Yotta Data Services, CtrlS, or similar Indian data centers) rather than AWS/Google Cloud’s foreign infrastructure. For government deployments like National Health Authority, data is encrypted in transit and at rest, with access controls limiting usage to authorized ministry personnel. Sarvam’s privacy policy commits to deleting voice recordings after transcription (retaining only text for model improvement) unless users explicitly consent to voice data retention. The ₹246.72 crore IndiaAI Mission funding included mandatory data localization and security audit requirements.
