Korean Medical AI: How a Local Model Beat GPT-5.1

In the global AI arms race, bigger is almost always assumed to be better. Yet a 31-billion-parameter model built by a Korean startup has just outscored models from OpenAI, Google, and Anthropic on a medical benchmark, and it did so without a Silicon Valley-sized budget. Korean medical AI firm Acryl has quietly demonstrated that domain precision, not raw scale, may be the sharper competitive weapon.

The model, named ALLM.H, recorded an accuracy of 96.78% on the KorMedMCQA Doctor Test, a 435-question benchmark drawn from the Korean Medical Licensing Examination (KMLE) for 2022 to 2024. For context, the KMLE is the national exam every doctor in Korea must pass before practicing medicine, which makes it a direct stress test of high-level clinical knowledge. Anthropic’s Claude Opus 4 came closest, at 96.55%. Google’s Gemini 2.5 Pro reached 90.80%, and OpenAI’s GPT-5.1 scored 90.11%. HARI, the open-source medical model from Seoul National University Hospital and a domestic rival, trailed at 89.20%, 7.58 percentage points behind ALLM.H.
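Because the Doctor Test has only 435 questions, each percentage can be converted back into a count of correct answers. The short Python sketch below does that, assuming every reported score is simply correct answers divided by the same 435 questions (the article does not describe how each model was prompted or graded). Under that assumption, ALLM.H's lead over Claude Opus 4 comes down to a single question.

```python
# Reported KorMedMCQA Doctor Test accuracies (percent), as quoted in the article.
SCORES = {
    "ALLM.H": 96.78,
    "Claude Opus 4": 96.55,
    "Gemini 2.5 Pro": 90.80,
    "GPT-5.1": 90.11,
    "HARI": 89.20,
}
NUM_QUESTIONS = 435  # size of the Doctor Test question set


def correct_answers(accuracy_pct: float, total: int = NUM_QUESTIONS) -> int:
    """Convert a percentage score back to a whole number of correct answers.

    Assumption: the score is plain correct / total, reported to two decimals.
    """
    return round(accuracy_pct / 100 * total)


def gaps_vs_leader(scores: dict[str, float]) -> dict[str, tuple[float, int]]:
    """Each model's gap to the top scorer, in percentage points and in questions."""
    leader = max(scores, key=scores.get)
    lead_pct = scores[leader]
    lead_count = correct_answers(lead_pct)
    return {
        name: (round(lead_pct - pct, 2), lead_count - correct_answers(pct))
        for name, pct in scores.items()
        if name != leader
    }


if __name__ == "__main__":
    for name, pct in SCORES.items():
        print(f"{name:15s} {pct:6.2f}%  ~{correct_answers(pct)}/{NUM_QUESTIONS}")
    for name, (pp, q) in gaps_vs_leader(SCORES).items():
        print(f"ALLM.H leads {name} by {pp} pp ({q} question(s))")
```

Run as a script, it prints each score next to its implied question count: roughly 421 correct answers for ALLM.H versus 420 for Claude Opus 4, while HARI's 7.58-point deficit works out to 33 questions.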

Specialized, domain-specific AI can carve out advantages that even the largest general-purpose models struggle to match.

The Strategy Behind Korean Medical AI: Leaner, Smarter, Cheaper

What makes this result particularly compelling is how Acryl achieved it. Rather than training a massive, resource-intensive model from scratch, the company took a more surgical approach: ALLM.H is built on Google’s open-source Gemma 4 (31B) foundation model and fine-tuned through meticulous data curation and a sophisticated training pipeline. In other words, Acryl did not compete on compute; it competed on craft.

This approach belongs to a growing school of thought sometimes called the SLM (small language model) strategy: the idea that a smaller model, trained on superior domain-specific data, can outperform a much larger generalist. It is not merely a technical preference but a capital-efficiency argument. GPU clusters at the scale of OpenAI’s or Google’s cost hundreds of millions of dollars; by contrast, Acryl’s approach requires a fraction of that investment.

It is not the size of the model, but the quality of the data that counts.

For investors, this matters enormously. It suggests that the AI race is not exclusively a game for trillion-dollar incumbents: smaller, more agile firms can stake out valuable niches, provided they command superior data pipelines and deep domain expertise. Acryl’s result offers a concrete proof of concept for that thesis.

Park Woi-jin, CEO of Acryl, put it plainly: “Achieving performance that surpasses Claude Opus 4 and GPT-5.1 with a 31B model shows that data strategy and learning pipeline design are the core components. This is the result of combining our large-scale model training infrastructure with the optimization know-how of our evaluation platform, Jonathan.”

From Benchmark to Bedside: The Commercialization Roadmap

Acryl is not stopping at benchmark results. The company has a clear roadmap for deploying ALLM.H in real-world clinical settings across Korea under two major government-backed programs: Doctor Answer 3.0 and K-ARPA. These state-led projects, co-managed by Korea’s Ministry of Health and Welfare and the Ministry of Science and ICT, are designed to accelerate the commercialization of medical AI. Government participation provides both funding stability and regulatory alignment, two factors that can make or break healthcare AI deployments.

The model will undergo empirical testing at Yonsei Medical Center and Kyungpook National University Hospital, two of Korea’s largest academic medical institutions. There, it will assist clinical staff with decision-making, analyze large volumes of medical data, and support patient consultations.

Crucially, Acryl plans to deploy ALLM.H using an on-premise system — meaning the software runs on local servers inside each hospital, not on external cloud infrastructure. This is a direct response to Korea’s strict data privacy framework, which governs how patient information can be stored and processed. For healthcare providers globally, data residency is one of the most persistent barriers to AI adoption. Therefore, Acryl’s on-premise approach addresses that concern head-on, making the product far more viable for regulated environments.

On-premise deployment is not a technical footnote — it is a commercial strategy that unlocks hospital doors.

A Family of Models: Building a Hospital-Grade AI Ecosystem

Acryl’s ambitions extend well beyond a single flagship model. The company plans to expand ALLM.H into a “family model” architecture: a suite of specialized sub-models tailored to individual clinical departments. A cardiology unit, for instance, would interact with a model fine-tuned on cardiology-specific data, while an oncology ward would use a different variant. This modular structure allows hospitals to build a customized AI ecosystem rather than relying on a single generalist tool.
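The article does not say how a request would reach the right department's model. One simple way to picture a “family model” setup is a lookup table that maps each department to its fine-tuned variant and falls back to a general model for everything else. The sketch below assumes exactly that; the `FamilyRouter` class and every model identifier in it are hypothetical, not names Acryl has published.

```python
from dataclasses import dataclass

# Hypothetical identifier for the general-purpose fallback model.
GENERAL_MODEL = "allm-h-general"


@dataclass
class FamilyRouter:
    """Dispatch a request to a department-specific sub-model, else the general model."""

    department_models: dict[str, str]
    fallback: str = GENERAL_MODEL

    def route(self, department: str) -> str:
        # Normalize so "Cardiology" and " cardiology " resolve to the same entry.
        key = department.strip().lower()
        return self.department_models.get(key, self.fallback)


# Hypothetical department-to-model table for one hospital.
router = FamilyRouter(
    department_models={
        "cardiology": "allm-h-cardiology",
        "oncology": "allm-h-oncology",
    }
)

if __name__ == "__main__":
    for dept in ["Cardiology", "oncology", "dermatology"]:
        print(f"{dept} -> {router.route(dept)}")
```

In this sketch, adding a department is just adding a table entry, and departments without their own variant keep working through the general model, which mirrors how a family of models could grow one specialty at a time.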

This architecture is both a product strategy and a moat-building exercise. Each department-level model requires its own curated dataset and validation process — creating barriers to replication that raw compute power alone cannot overcome. Furthermore, the deeper Acryl embeds itself into a hospital’s workflows, the harder it becomes to displace.

For Korea’s broader healthcare AI ecosystem, this modular approach represents a meaningful evolution, from point solutions toward integrated, institution-wide AI infrastructure.

Beyond Medicine: A B2B Platform Play

The technology behind ALLM.H will not stay confined to hospitals. Acryl intends to integrate its fine-tuning methodology into Jonathan, the company’s proprietary LLM evaluation and operations platform. Through Jonathan, Acryl plans to offer what it describes as an end-to-end AI internalization pipeline — a full-stack solution for organizations that want to embed specialized AI into their operations without building from scratch.

Target sectors include finance, law, and manufacturing. These industries share the conditions that shaped ALLM.H: vast amounts of domain-specific data, strict regulatory requirements, and a need for accuracy that general-purpose models cannot reliably deliver. In other words, the conditions that made ALLM.H possible in medicine exist elsewhere too.

This positions Acryl not merely as a healthcare AI vendor but as a specialized AI infrastructure company, a distinction that carries very different valuation implications. Nevertheless, execution risk remains. Expanding from one successful vertical into three new ones at once is a different kind of challenge, and the real test will be whether the company can replicate its data curation discipline in unfamiliar domains.

Market Context: Why the Timing Is Right

The commercial case for medical AI in Korea is strengthening rapidly. According to Grand View Research, Korea’s generative AI in healthcare market is projected to grow at a compound annual growth rate (CAGR) of 35.4% from 2026, reaching approximately $649 million by 2033. That trajectory reflects a convergence of factors: an aging population, a highly digitized hospital system, strong government investment, and a medical workforce under sustained pressure.
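A compound annual growth rate works backwards as well as forwards: an end value reached after n years at rate r implies a starting value of end / (1 + r)^n. The sketch below applies that formula to the Grand View Research figures, assuming seven annual compounding steps from 2026 to 2033. The report's own base year and base value may differ, so the output illustrates what the quoted rate implies rather than reproducing a figure from the report.

```python
def implied_start_value(end_value: float, cagr: float, years: int) -> float:
    """Back out the starting value implied by an end value and a constant CAGR."""
    if years < 0:
        raise ValueError("years must be non-negative")
    return end_value / (1 + cagr) ** years


def project(start_value: float, cagr: float, years: int) -> list[float]:
    """Year-by-year values under constant compound growth (index 0 is the start year)."""
    return [start_value * (1 + cagr) ** t for t in range(years + 1)]


if __name__ == "__main__":
    END_2033 = 649.0      # USD millions, as cited from Grand View Research
    RATE = 0.354          # 35.4% CAGR
    YEARS = 2033 - 2026   # assumption: seven compounding steps

    start = implied_start_value(END_2033, RATE, YEARS)
    print(f"Implied 2026 market size: ~${start:.1f}M")
    for year, value in zip(range(2026, 2034), project(start, RATE, YEARS)):
        print(year, f"${value:.1f}M")
```

Under that assumption, the implied 2026 starting point is roughly $78 million, meaning the market would grow a little more than eightfold over the period.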

Korea’s healthcare system is among the most data-rich in the world. National health insurance coverage is near-universal, meaning patient records flow through a centralized infrastructure at scale. However, strict data localization laws have historically made it difficult for foreign cloud-based AI vendors to operate freely. As a result, domestic players with on-premise capabilities, such as Acryl, hold a structural advantage that international competitors cannot easily replicate.

The market is large, the regulatory environment favors local players, and the technical proof of concept now exists.

For investors tracking medical AI in Korea, Acryl’s benchmark result is a signal worth taking seriously. It demonstrates that Korea is producing not just AI consumers but AI architects: companies capable of building world-class models with a fraction of the resources that define the global frontier. The question now is whether Acryl can scale that precision from a benchmark leaderboard to a sustainable, multi-sector business. The blueprint is credible. The execution is what remains to be proven.