
Anterior reduces clinical review time by 75% with Amazon Bedrock and Llama


Anterior, a clinician-led AI company building automation for healthcare payers (insurance companies), set out to solve one of healthcare’s hardest data problems: identifying and structuring clinical documents that often arrive as hundreds of pages of unstructured records. After implementing Llama models from Meta on Amazon Bedrock to power document identification within customers’ Amazon Web Services (AWS) environments, Anterior achieved production-grade performance while meeting strict healthcare data governance requirements. Using this approach, Anterior delivered complete document extraction, improved metadata accuracy, and enabled downstream automation that reduces manual clinical review by 75 percent.

Addressing healthcare’s document identification challenge

Healthcare administrative costs in the United States exceed $950 billion annually in a $5 trillion industry. Much of this burden comes from clinical review workflows inside health plans, where physicians and nurses manually review large packets of medical records to approve treatments, verify coverage, and manage patient care. Anterior is a clinician-led AI company focused on automating these workflows for healthcare payers, organizations that sit at the intersection of providers and patients. 

At the center of these workflows is a task that sounds deceptively simple: before AI can reason about a clinical case, it must understand what it's looking at. Document identification is the prerequisite for all downstream automation. Anterior must segment each incoming clinical packet into its constituent documents, identify where each begins and ends, and extract structured metadata including document type, title, author, and creation date. Only then can clinical automation proceed, whether that’s routing an MRI report to the right step in a prior authorization review, surfacing recent imaging for a clinician, or verifying that documentation supports a recommended course of care. However, clinical packets can be hundreds of pages long and arrive as faxes, scanned PDFs, and merged multi-document files. They may combine imaging, tables, forms, and even handwritten notes in ways traditional AI and ML approaches have long struggled to handle reliably at production scale. 

The stakes of getting this wrong are high. “Even small errors in document identification can cascade downstream, because you’re basing clinical decisions on incomplete or incorrect information,” said Khadija Mahmoud, MD, clinician scientist at Anterior. A misidentified document boundary could mean surfacing clinical information from the wrong part of a patient record, while a dropped page could create a compliance gap. Any model capable of handling production-grade document identification also has to meet strict healthcare data governance requirements. Many of Anterior’s largest customers require that all AI processing, including LLM inference on Protected Health Information (PHI), occur entirely within their AWS environment, making external APIs or third-party infrastructure unacceptable.

Building a scalable pipeline for clinical automation

Anterior implemented a document identification workflow powered by Meta Llama models running on Amazon Bedrock. This architecture processes complex clinical document packets end to end within a customer's AWS environment, so patient data never leaves that boundary. The workflow operates as a two-stage pipeline. In the first stage, large clinical PDFs are processed using optical character recognition (OCR) and layout-aware parsing. Each page is converted into structured text extracts while preserving page references and unique identifiers. In the second stage, a language model analyzes these parsed extracts to determine document boundaries, classify document type, and extract metadata such as title, author, creation date, and a clinical description. This stage is where Llama models on Amazon Bedrock do the work. 
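A rough sketch of how the second stage of such a pipeline might call a Llama model through the Bedrock Converse API; the model ID, prompt wording, and JSON output schema below are illustrative assumptions, not Anterior's actual implementation:

```python
import json

# Illustrative model identifier; real Bedrock model IDs vary by region,
# account access, and model version. This is an assumption, not Anterior's.
MODEL_ID = "meta.llama4-maverick-17b-instruct-v1:0"

def build_prompt(pages):
    """Assemble parsed page extracts (stage-one output) into one prompt.

    `pages` is a list of dicts like {"page": 1, "text": "..."} produced by
    OCR and layout-aware parsing, with page references preserved.
    """
    body = "\n\n".join(f"[PAGE {p['page']}]\n{p['text']}" for p in pages)
    return (
        "Segment this clinical packet into its constituent documents. "
        "Return a JSON array; for each document include start_page, "
        "end_page, doc_type, title, author, creation_date, description.\n\n"
        + body
    )

def parse_documents(model_text):
    """Parse the model's JSON reply into document records, ordered by page."""
    return sorted(json.loads(model_text), key=lambda d: d["start_page"])

def identify_documents(pages, region="us-east-1"):
    """Second-stage call: invoke Llama on Bedrock via the Converse API."""
    import boto3  # requires AWS credentials and Bedrock model access

    client = boto3.client("bedrock-runtime", region_name=region)
    response = client.converse(
        modelId=MODEL_ID,
        messages=[{"role": "user", "content": [{"text": build_prompt(pages)}]}],
        inferenceConfig={"maxTokens": 2048, "temperature": 0.0},
    )
    return parse_documents(response["output"]["message"]["content"][0]["text"])
```

Because the Bedrock client runs inside the customer's own AWS account, the page text in the prompt never crosses that boundary, which is the property the architecture is built around.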

Anterior evaluated Llama 4 Maverick 17B and Llama 4 Scout 17B against a frontier-scale proprietary multimodal model using identical prompts, datasets, and evaluation criteria. The evaluation ran entirely within AWS infrastructure and measured production readiness across accuracy, completeness, consistency, and latency. Datasets were generated through Anterior's synthetic data pipeline and curated by clinician scientists to reflect real-world complexity: ambiguous formatting, multi-document packets, and edge cases. Llama was a strong candidate for several reasons: it supports multimodal inputs (which aligns with the inherently multimodal nature of clinical data), enables efficient inference for high-throughput workloads, and offers a large context window that comfortably handles lengthy clinical packets. It is also among the most tunable open-weight models available, allowing Anterior to tailor model behavior through prompting and system-level constraints and explore smaller, specialized models tuned to specific clinical tasks rather than relying solely on frontier-scale models. 
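The field-level scoring such a head-to-head evaluation implies can be sketched as follows; the metadata field names and the exact-match criterion are assumptions for illustration, not Anterior's published evaluation criteria:

```python
def field_accuracy(predictions, gold,
                   fields=("doc_type", "title", "author", "creation_date")):
    """Compute per-field exact-match accuracy of extracted metadata.

    `predictions` and `gold` are parallel lists of metadata dicts, one per
    document: model output on one side, clinician-curated labels on the
    other. Running this for each candidate model on the same dataset gives
    a like-for-like comparison.
    """
    scores = {}
    for field in fields:
        matches = sum(
            1 for p, g in zip(predictions, gold) if p.get(field) == g.get(field)
        )
        scores[field] = matches / len(gold)
    return scores
```

In practice, fields like descriptions would need fuzzier scoring than exact match (the article cites "description faithfulness"), but the harness shape is the same: identical inputs, identical criteria, per-field scores per model.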

Running Llama on Amazon Bedrock allowed the company’s team of clinicians and engineers to focus on solving the clinical problem rather than managing infrastructure. Bedrock provides a unified interface for evaluating and deploying foundation models while integrating directly with AWS environments. "Many major health plans we work with ask the same question: 'Can we run AI on PHI inside our AWS environment?' Bedrock-hosted Llama models let us say yes without compromising performance," said Anuj Iravane, applied AI lead at Anterior. Bedrock also preserves flexibility: Anterior can evaluate additional models or deploy custom fine-tuned versions as clinical requirements evolve without rebuilding its architecture. 

Accelerating clinical decisions and operational efficiency

Across a dataset of clinician-curated synthetic clinical cases, both Llama 4 Maverick 17B and Llama 4 Scout 17B delivered production-grade performance for clinical document identification. Both are mixture-of-experts models that activate only 17B parameters per token, yet they matched a frontier-scale model with hundreds of billions of parameters while running more efficiently. They achieved complete page coverage, meaning every page in a clinical packet was assigned exactly once with no dropped or duplicated content. The results were particularly strong in metadata extraction. Llama models matched or exceeded the frontier baseline when identifying key information such as document authorship and descriptions. Author identification accuracy reached as high as 97 percent, compared with 93.5 percent for the frontier model, while description faithfulness reached 98.4 percent. “We were impressed,” said Iravane. “Llama models on Bedrock matched our frontier baseline at a fraction of the cost—and in metadata extraction, they actually outperformed it. You don’t need the biggest model to solve healthcare’s hardest problems.” 
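The complete-page-coverage criterion, every page assigned exactly once, is straightforward to verify mechanically. A minimal sketch, assuming each document boundary is reported as an inclusive (start_page, end_page) range:

```python
def check_page_coverage(boundaries, total_pages):
    """Verify every page 1..total_pages is assigned to exactly one document.

    `boundaries` is a list of inclusive (start_page, end_page) tuples.
    Returns (dropped_pages, duplicated_pages); both lists empty means
    complete coverage with no dropped or duplicated content.
    """
    counts = {p: 0 for p in range(1, total_pages + 1)}
    for start, end in boundaries:
        for page in range(start, end + 1):
            counts[page] = counts.get(page, 0) + 1
    dropped = sorted(p for p, c in counts.items() if c == 0)
    duplicated = sorted(p for p, c in counts.items() if c > 1)
    return dropped, duplicated
```

For example, `check_page_coverage([(1, 3), (5, 6)], 6)` reports page 4 as dropped. A check like this is cheap enough to run on every packet in production, turning the compliance concern about dropped pages into a hard invariant rather than a statistical hope.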

Latency across models was comparable, but the efficiency advantages of smaller Llama models running on Bedrock compound at scale. As document volumes grow, Anterior can process more cases per unit of compute at a lower cost per document without sacrificing accuracy. The downstream impact on healthcare workflows is significant. In prior authorization review, the Anterior platform reduces manual clinical review time by 75 percent while maintaining 99.24 percent clinical accuracy. A KLAS Research case study found the system reduced patient wait times for cancer care approvals from days or weeks to just 155 seconds. For a regional healthcare organization serving about one million covered lives, these improvements translate to approximately $30 million in annual operational savings. Faster document understanding ultimately means faster clinical decisions and quicker access to care for patients. 

Anterior moved from initial integration to deployment in six weeks. Llama models are now part of the company’s production document identification workflow serving multiple enterprise customers. The results also validated a broader architectural approach: that smaller open-weight models hosted on Amazon Bedrock can compete with frontier-scale general-purpose models across healthcare workflows. “Much of US healthcare lives on AWS,” said Iravane. “Proving that Llama models on Bedrock can match frontier performance means our customers can deploy faster, control costs better, and maintain the security posture they require.” 

Harshvardhan Chunawala

Harshvardhan Chunawala is a Solutions Architect at Amazon Web Services and an AWS Academy Authorized Educator, as well as an AWS Golden Jacket holder. He partners with startup founders and C-suite executives globally to architect scalable, secure cloud infrastructure on AWS across industries. At AWS, he serves as a TFC member for Aerospace, AI/ML, Security, Education, and ISV. He collaborates across multiple Amazon teams, including Amazon Leo, Amazon Security, and AWS Agentic AI, among others, to shape and deliver frontier cloud capabilities across satellite, security, and trustworthy agentic AI services. A globally recognized technologist and expert in cloud security, Harshvardhan is a Visiting Faculty member and researcher at Carnegie Mellon University (CMU), mentoring the next generation of cloud professionals and entrepreneurs. Harshvardhan served as Space Mission Engineering Lead and Mission Operator at CMU for America's first lunar mission since Apollo 17. Outside of work, Harshvardhan pursues his love of the skies through skydiving and flying planes.

Chakravarthy Nagarajan

Chakravarthy Nagarajan is a Principal Solutions Architect specialized in machine learning. In his current role, he helps customers solve real-world, complex business problems using machine learning and generative AI solutions. He focuses on model customization, helping enterprises realize the true potential of LLMs through the power of customization.

Khadija Mahmoud

Khadija Mahmoud, MD is a Clinician Scientist in applied AI at Anterior, working on clinical datasets, benchmarking, and model evaluation across real-world healthcare systems. Her research focuses on clinical reasoning, bias and fairness in AI, and the safe deployment of machine learning in healthcare, with work published in peer-reviewed journals. She holds an MSc in Artificial Intelligence in Healthcare and an MD from Imperial College London.

Anuj Iravane

Anuj Iravane leads AI at Anterior, where he works at the intersection of research and production to build reliable, policy-adherent, and self-improving agents for clinical reasoning. Before Anterior, he worked on recommender systems at Amazon. He holds a BS in Computer Science from Northwestern University. Beyond AI, he is a producer and director in India's independent cinema ecosystem.
