Vietnam's Sovereign AI Conversation Is Stuck One Layer Too High 🇻🇳
I went to bundle the leading Vietnamese on-device speech model into a project this week and discovered I cannot ship it. The Vietnamese Zipformer published at VLSP 2025 – the de facto Vietnamese ASR model that everyone working on on-device voice ends up depending on – is licensed CC-BY-NC-ND-4.0. Non-commercial. No derivatives. Anyone who wants to ship a Vietnamese voice product has to either accept that they cannot earn money from it, or train a replacement themselves on a legally clean speech corpus that does not yet exist in the open.
This is the layer where Vietnam's sovereign AI fight actually is. Not at the foundation model. The conversation about sovereign Vietnamese AI keeps circling around the question of whether the country needs a 30 to 70 billion parameter Vietnamese model trained from scratch by a state-backed consortium. Vietnam has bigger and more relevant problems one rung lower, and they are all solvable without a state-scale check.
How the stack actually layers
The AI industry usually gets framed as three layers: silicon and infrastructure at the bottom, foundation models in the middle, applications on top. That framing is roughly right and roughly useless for thinking about where Vietnam stands in 2026. The middle is hiding a layer.
The four-layer view. The conversation about sovereign Vietnamese AI is dominated by the foundation-model layer. The actual ownership leverage in 2026 sits one layer down.
Read it bottom-up. Compute is the slowest, most expensive layer to build – decades of accumulated IP, billions in capex – and Vietnam has uncharacteristically built physical capacity. Foundation models commoditize on a 12 to 24 month cycle as new open releases arrive; building one is a multi-million-dollar, multi-year effort, and Vietnam has three credible attempts in flight already. Applications are the visible layer where users live: low capital, high competition, thin margins as model APIs commoditize.
The layer everyone misses sits between foundation models and applications. Specialized small models, open evaluation, license-clean data, and compliance tooling are not "just engineering" – they are where ownership of Vietnamese-language judgments actually gets encoded, where regulatory moats accrue under the new AI law, and where state-scale capital is not required to compete. The rest of this post is about why that layer is the one with both the largest gap and the lowest capital barrier in 2026.
What's already in the country
The dominant story about Vietnamese AI is that the country is absent from the foundation-model layer. That story was true two years ago. It is not accurate now.
FPT's AI Factory has been running thousands of NVIDIA H100 GPUs since January 2025, on a $200M build with NVIDIA, serving 18,000-plus users across healthcare, IT, and financial services and now adding HGX H200 and HGX B300 capacity. Viettel operates a cluster of 22 NVIDIA DGX B200 systems at around 1.5 ExaFLOPs (FP8) out of its Hoa Lac Technical Center, trains its own Vietnamese-specialized model on top of Llama 3 (Llama3-ViettelSolutions-8B, curated with NVIDIA NeMo Curator), and is also customizing Nemotron architecture for Vietnamese. GreenNode – the AI-cloud subsidiary VNG formed by merging its cloud and AI infrastructure units – released GreenMind-Medium-14B-R1 in September 2025: the first open-source Vietnamese reasoning LLM packaged on NVIDIA NIM, single-H100 deployable, described in an April 2025 paper out of GreenNode.
The VinAI story is more nuanced than the headlines suggest. In April 2025, Vingroup sold 65% of MovianAI to Qualcomm for $67M. The headline read is "Vietnam loses its top AI lab." The operational read is more mixed: Dr. Hung Bui, formerly of Google DeepMind, continues to lead the team from Hanoi. The talent did not leave the country. The public vinai Hugging Face org carries the explicit notice that "Effective April 1, 2025, Qualcomm acquired VinAI's Research and GenAI teams. Consequently, this Hugging Face organization is no longer being updated with new models or datasets." PhoBERT, BARTpho, ViT5, PhoWhisper, PhoGPT – the entire community-default Vietnamese backbone family froze on that date. They are still heavily used because there is no successor.
So the country has Vietnamese-owned compute at scale, three ongoing Vietnamese foundation-model attempts (one open-source, two production), and one frozen-but-still-default model family. That is not the picture of an absent layer 2. It is the picture of a layer 2 that has actors and momentum, with a coordination gap between them.
Where the gap actually is
The gap is the layer below the foundation models. Three things are missing, and all three are achievable without state-scale capital.
Open evaluation. There is no widely cited register-stratified, dialect-stratified Vietnamese language model benchmark that all the active producers agree on. The biggest active producer of public Vietnamese benchmarks is UIT-VNUHCM – UIT-ViQuAD, ViLexNorm, ViGLUE, VLUE, the Multi-Dialect Vietnamese corpus at EMNLP 2024. Each is good. None is composed into a shared eval matrix that the active Vietnamese model producers (FPT, Viettel, GreenNode, AITeamVN, 5CD-AI) all publish numbers against. Whoever defines that matrix in 2026 defines what "good Vietnamese" means in every paper that follows for the next decade. That is a higher-leverage bet than another foundation model, because foundation models commoditize on a 12 to 24 month cycle and benchmarks set the field for ten years.
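To make "shared eval matrix" concrete, here is a minimal sketch of what such a grid could look like. The benchmark names are real UIT-VNUHCM datasets mentioned above; the register and dialect strata and the cell structure are my assumptions, not an existing specification.

```python
from itertools import product

# Hypothetical stratification axes -- placeholder names, not a ratified spec.
REGISTERS = ["formal", "social-media", "spoken-transcript"]
DIALECTS = ["northern", "central", "southern"]
# Real UIT-VNUHCM benchmarks named in the post.
BENCHMARKS = ["UIT-ViQuAD", "ViLexNorm", "ViGLUE"]

def eval_matrix():
    """Enumerate every cell a model producer would publish a number for."""
    return [
        {"benchmark": b, "register": r, "dialect": d, "score": None}
        for b, r, d in product(BENCHMARKS, REGISTERS, DIALECTS)
    ]

cells = eval_matrix()
print(len(cells))  # 27 cells: 3 benchmarks x 3 registers x 3 dialects
```

The point of the sketch is that the coordination artifact is tiny: a declared grid of cells plus an agreement that every producer fills in every cell. The hard part is the three meetings, not the code.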
License-clean Vietnamese data. The sherpa-onnx case is the canary, and it is not isolated. The widely used VIVOS corpus from AILAB-VNUHCM is 15 hours, but it is itself CC-BY-NC-SA-4.0 – academic-only. Mozilla CommonVoice Vietnamese is CC0 but limited in scale, scripted rather than spontaneous, and dominated by short read-aloud sentences. The competitive Vietnamese ASR models train on stitched corpora that include ViVoice, PhoAudioBook, and pseudo-labelled VLSP test data – much of which carries restrictive or unverified per-corpus licensing, which is why the resulting models inherit non-commercial terms downstream. There is no clean 1000 to 2000 hour CC-BY Vietnamese speech corpus that an SME building a voice product can legally train on. A coordinated annotation effort at $200K to $500K could permanently unlock the field for everyone.
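The license-inheritance problem above is mechanical enough to gate in code. A minimal sketch, assuming SPDX-style license identifiers; the corpus names follow the post, but the "unverified" labels and the gate itself are my illustration, not an audit.

```python
# Licenses that permit commercial derivatives (SPDX-style ids, an assumption).
COMMERCIAL_OK = {"CC0-1.0", "CC-BY-4.0", "MIT", "Apache-2.0"}

# Per-corpus licensing as described in the post; "unverified" is the
# honest label for stitched corpora with unclear terms.
CORPORA = {
    "VIVOS": "CC-BY-NC-SA-4.0",   # academic-only, taints derivatives
    "CommonVoice-vi": "CC0-1.0",
    "ViVoice": "unverified",
    "PhoAudioBook": "unverified",
}

def commercially_clean(corpora: dict) -> list:
    """Keep only corpora whose license permits commercial derivatives."""
    return [name for name, lic in corpora.items() if lic in COMMERCIAL_OK]

print(commercially_clean(CORPORA))  # ['CommonVoice-vi']
```

Run the gate before stitching a training manifest, not after training: a model trained on one NC corpus inherits the restriction for the whole artifact.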
Compliance-aware specialized models. This was a vague gap until December 2025. It has a hard deadline now.
The law changes the math
Law 134/2025/QH15 – Vietnam's first standalone AI law – takes effect March 1, 2026. Eighteen-month compliance grace for healthcare, finance, and education (full compliance by September 1, 2027). Twelve-month grace for everything else (March 1, 2027). It is the first standalone, legally binding AI law in Southeast Asia, ahead of Singapore, Indonesia, and Malaysia, which still operate under voluntary AI governance frameworks rather than binding statute.
What the law actually requires of high-risk AI providers (Article 14):
- Risk management measures, regularly reviewed.
- Training and operational data quality governance – provenance, balance, traceability.
- A technical dossier and operational log sufficient for conformity assessment and post-deployment inspection.
- Human oversight and intervention design.
- Transparency and incident handling, including machine-readable marking of AI-generated content (Article 11.2).
- Explainability – a functional description, input data types, and risk management measures, disclosed to the authority, users, and affected persons. Source code and weights are explicitly out of scope of the disclosure obligation, but operational behavior is in.
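Pending the implementing decrees, one plausible shape for the Article 11.2 machine-readable marking requirement is a sidecar manifest. This is a sketch under my own assumptions: the field names are loosely modeled on C2PA assertions, and no ratified Vietnamese profile exists yet.

```python
import hashlib
import json
from datetime import datetime, timezone

def sidecar_marking(content: bytes, model_id: str) -> str:
    """Hypothetical sidecar manifest marking content as AI-generated.

    Field names are assumptions loosely modeled on C2PA-style
    assertions -- not a ratified profile under Law 134/2025/QH15.
    """
    manifest = {
        "claim": "ai_generated",
        "generator": model_id,
        "content_sha256": hashlib.sha256(content).hexdigest(),
        "created": datetime.now(timezone.utc).isoformat(),
        "law_ref": "134/2025/QH15 Art. 11.2",
    }
    return json.dumps(manifest, ensure_ascii=False, indent=2)

print(sidecar_marking("Xin chào".encode("utf-8"), "example-model-v1"))
```

A sidecar is the fallback path: it survives formats that cannot embed provenance, at the cost of being separable from the content it marks. Embedded C2PA manifests are the stronger binding where the format allows it.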
Foreign providers must appoint a legal representative in Vietnam. The Ministry of Science and Technology is the lead authority. The Prime Minister will publish the Danh mục – the list of high-risk AI systems requiring pre-deployment certification – by separate decree, still pending as of May 2026.
Tucked into Article 25.1 is a clause that does not get attention and should: SMEs and startups are entitled by statute to free sample dossier templates and self-evaluation tools. The government is on the hook for providing them. This is an explicit invitation for an open-source compliance toolkit. Whoever ships that toolkit first becomes the default reference implementation for the 2026 to 2027 SME compliance scramble.
A foreign foundation model on its own cannot satisfy this stack. A Vietnamese-deployed product built on a foreign model can – but only if you bolt on the dossier generation, the operational log, the AI-content marking, the explainability layer, and the risk classification. Those bolts are the actual sovereign AI work for the next two years. They are not a 70B model. They are a stack of unglamorous compliance and operational tooling that nobody is funding yet.
What I am building, and what I am not
I am starting on two pieces of this stack under nrl-ai, and I want to be honest about both their state and their gaps.
nom is a Vietnamese NLP package – diacritic restoration, spell correction, register classification, retrieval, OCR with diacritic-aware metrics, and license-tracked evaluation sets. The thesis is the specialized-small-model and evaluation layers, not a foundation model. MIT and Apache licensed, pinned dependencies, runnable benchmarks. A nom.compliance module is in design, anchored to specific articles of Law 134/2025/QH15 – dossier templates that version against the pending Government decrees, AI-content-marking helpers aligned to C2PA with a sidecar fallback for environments without a ratified Vietnamese profile, and risk-tier self-classification tooling for SMEs under Article 25.1.
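To show what "risk-tier self-classification tooling" could mean in practice, here is a deliberately naive sketch. The tier logic is a placeholder of my own: the real Danh mục of high-risk systems is a pending Government decree, and the deadlines below are the sector grace periods stated in the post.

```python
from dataclasses import dataclass

# Sectors with the 18-month compliance grace period under Law 134/2025/QH15.
GRACE_SECTORS = {"healthcare", "finance", "education"}

@dataclass
class SystemProfile:
    sector: str
    affects_legal_rights: bool

def classify(profile: SystemProfile) -> dict:
    """Placeholder risk-tier self-classification.

    The actual high-risk criteria await the Danh muc decree; this
    stands in with two signals an SME can answer about itself today.
    """
    high_risk = profile.sector in GRACE_SECTORS or profile.affects_legal_rights
    deadline = "2027-09-01" if profile.sector in GRACE_SECTORS else "2027-03-01"
    return {"high_risk": high_risk, "compliance_deadline": deadline}

print(classify(SystemProfile(sector="healthcare", affects_legal_rights=False)))
# {'high_risk': True, 'compliance_deadline': '2027-09-01'}
```

The design intent is that the classification logic versions against decrees as they land, so an SME re-runs the same tool and gets an updated answer rather than re-reading the statute.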
edgevox is open-source on-device Vietnamese voice agent infrastructure. Sherpa-ONNX Zipformer ASR plus Piper Vietnamese TTS plus a small-language-model tool-calling layer with grammar-constrained output. The production unit is a CPU laptop, a Raspberry Pi 5, or a Jetson Orin Nano – not a cloud GPU.
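The grammar-constrained tool-calling layer is the piece worth illustrating, since it is what keeps a small model from free-wheeling on-device. A minimal sketch: the tool registry and the post-hoc validation stand in for a real grammar-constrained decoder (GBNF-style constraints at decode time), and the tool names are hypothetical.

```python
import json

# Hypothetical tool registry: name -> required args with expected types.
TOOLS = {
    "set_timer": {"minutes": int},
    "play_music": {"query": str},
}

def validate_tool_call(raw: str) -> dict:
    """Reject any model output that is not a known tool with typed args.

    In production the same schema would be compiled into decode-time
    grammar constraints; this validator is the belt-and-suspenders check.
    """
    call = json.loads(raw)
    schema = TOOLS[call["tool"]]          # KeyError -> unknown tool
    for arg, typ in schema.items():
        if not isinstance(call["args"][arg], typ):
            raise TypeError(f"{arg} must be {typ.__name__}")
    return call

print(validate_tool_call('{"tool": "set_timer", "args": {"minutes": 5}}'))
# {'tool': 'set_timer', 'args': {'minutes': 5}}
```

Constraining at decode time rather than validating after the fact is what makes the approach viable with sub-1B models: the model never gets the chance to emit an out-of-grammar token.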
What I am not yet able to claim:
- No measured TTFT on a named device class. The bench harness is in progress. When the numbers land they will be in the repo with warmup, best-of-N, hardware pinned, dependency versions stated. Not in marketing copy. If you see a Vietnamese on-device voice latency number in a press release without that protocol, treat it as fiction.
- The upstream ASR license is unresolved. The CC-BY-NC-ND-4.0 issue I opened applies to every Vietnamese on-device voice project that uses the VLSP 2025 winning model, mine included. Resolving it at the data layer is on the roadmap, but it requires either a paid annotation push or a community-funded open speech corpus. Neither is something one person ships alone.
What I am not building, and what an obvious coalition could:
- A coordinated Vietnamese evaluation consortium across UIT-VNUHCM, JAIST Nguyen Lab, VNU-UET, and the active independent open teams. License-clean. Register-stratified. Dialect-aware. Published against by every Vietnamese model producer. This needs three meetings, not three years.
- A 1000 to 2000 hour CC-BY Vietnamese speech corpus. $200K to $500K of paid annotation. One organization, or one well-coordinated consortium, ships it once, and every Vietnamese on-device voice project for the next decade can build legally on top.
- A reference open-source compliance toolkit fulfilling Article 25.1 – dossier templates, self-evaluation flows, AI-content marking, risk-tier classifier. The Ministry of Science and Technology is statutorily on the hook for providing equivalents. Beating them to the reference implementation defines the conventions.
If any of these resonate with what you are already trying to do, my contact is in the footer. Coordination compounds.
What individual practitioners do this quarter
If you are a Vietnamese ML engineer reading this, the temptation when the conversation is about $10M consortia is to disengage. The actually useful response is to ship one small thing that compounds.
Pick one. Ship it before September.
- Publish one license-clean Vietnamese evaluation set with at least 1,000 examples per stratification cell, sourced documentation, and a runnable bench harness. Five GitHub stars and ten downloads in the first month is success. It compounds from there.
- Train one specialized Vietnamese model – diacritic restoration, register classification, dialect ASR, legal reranker – under 1B parameters, MIT or Apache licensed, with a model card that includes verified numbers from a committed measurement script. Push to Hugging Face under a stable URL.
- Demo one on-device Vietnamese inference path on a named consumer device class (Jetson Orin Nano, Raspberry Pi 5, Apple Silicon laptop, mid-tier Intel CPU laptop). Publish TTFT, RTF, and peak memory with warmup and best-of-N protocol. Even one rigorously measured data point is more than the entire VLSP 2025 ASR/SER overview disclosed.
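The measurement protocol named in the last bullet (warmup, best-of-N, pinned hardware and versions) fits in a dozen lines. A minimal sketch; the reporting of hardware and dependency versions still has to happen alongside it, outside the timer.

```python
import statistics
import time

def bench(fn, warmup: int = 3, runs: int = 10) -> dict:
    """Minimal measurement protocol: discard warmup iterations,
    then report best-of-N and median over the timed runs."""
    for _ in range(warmup):
        fn()                               # warmup: fills caches, JITs, etc.
    times = []
    for _ in range(runs):
        t0 = time.perf_counter()
        fn()
        times.append(time.perf_counter() - t0)
    return {"best_s": min(times), "median_s": statistics.median(times)}

# Stand-in workload; swap in your model's first-token path to measure TTFT.
result = bench(lambda: sum(range(100_000)))
print(result)
```

Commit the script next to the numbers it produced. A latency claim that cannot be re-run is marketing copy, which is the post's whole complaint about VLSP 2025.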
Pick one. Ship it under your real name with verified numbers. The compounding work is what is in your hands today, and Vietnam already has the pieces above and below this layer to make it count.
The sovereign AI conversation is stuck one layer too high. The work that defines what "good Vietnamese" means in AI for the next decade is one layer down, and it is not waiting for permission.
What matters
- Vietnam already has compute at scale (FPT AI Factory, Viettel DGX B200), three Vietnamese foundation-model attempts in flight (GreenMind, ViettelSolutions-8B, Nemotron-vi), and the first binding AI law in Southeast Asia. The "missing layer 2" framing is two years out of date.
- The actual sovereignty gap is one layer down: open evaluation infrastructure, license-clean Vietnamese speech and text corpora, and compliance-aware specialized models that operationalize Law 134/2025.
- Law 134/2025 takes effect March 1, 2026, with high-risk-sector compliance deadlines extending into 2027. It creates demand for specialized Vietnamese tooling that no foreign foundation model can satisfy alone.
- Article 25.1 entitles SMEs and startups to free sample dossiers and self-evaluation tools. Whoever ships an open-source reference implementation first defines the conventions for the SME compliance scramble.
- For individual practitioners: ship one open evaluation, one specialized model, or one on-device demo on a named device class with measured numbers, before September. Under your real name, under a stable license.

