
Apple Foundation Models Explained: What It Changes

Jun 20, 2026

Apple Foundation Models are Apple's family of in-house AI models, now in their third generation. They run directly on iPhone, iPad, and Mac, read images as well as text, and are exposed to any developer through a single framework, with no cloud round-trip and no per-token bill for the on-device work.

That is not a forecast. It is what Apple shipped at WWDC 2026 on June 8, 2026.


TL;DR

On June 8, 2026, Apple introduced the third generation of Apple Foundation Models and opened them to developers. The on-device line now includes AFM 3 Core, a 3-billion-parameter model, and AFM 3 Core Advanced, a 20-billion-parameter sparse model that activates only 1 to 4 billion parameters per request, according to Apple Machine Learning Research. The on-device models are natively multimodal, and according to MacRumors the Foundation Models framework now accepts image input directly, so an app can pass a photo or a scanned document to the on-device model and analyze it locally, with no server call and no per-token cost. The framework also gained server-side support for calling third-party models like Claude and Gemini through the same Swift API. This post explains what Apple Foundation Models are, how the mechanism works in plain English, why this arrived now, who shipped it, the honest limits, and what it does and does not mean for small and mid-size businesses over the next 12 to 36 months.


What Actually Happened (The Apple Foundation Models Signal)

At its Worldwide Developers Conference on June 8, 2026, Apple announced the third generation of its Apple Foundation Models — the same model family that powers Apple Intelligence features across iOS, iPadOS, and macOS — and, for the first time at this depth, handed the keys to outside developers.

There are two on-device models in the new generation. According to Apple Machine Learning Research, AFM 3 Core is a 3-billion-parameter dense model, while AFM 3 Core Advanced is a 20-billion-parameter sparse model that "activates just 1 to 4 billion parameters at a time depending on the request." The trick that makes a 20-billion-parameter model fit on a phone is storage: the same source notes the advanced model stores its full weights in flash memory (NAND) rather than requiring all of them in active memory (DRAM), so consumer hardware can run it.

The capability that matters most for business workflows is multimodality. The on-device models are natively multimodal and now accept image input, so a developer's app can analyze photos and documents on the device itself, without a cloud round-trip or a per-token charge. In practice, the Foundation Models framework lets a developer attach an image directly to a model request, and the on-device model reads the text in the photo or scanned document itself. That combination, a local model that reads pictures and pulls the text out of them, is what turns "take a photo of this receipt" into a structured workflow that never leaves the phone.

Apple also widened the framework outward. According to MacRumors, the Foundation Models framework gained server-side model integration that lets developers call third-party models such as Claude and Gemini through the same Swift API, and Apple made Foundation Models on Private Cloud Compute free for developers with fewer than 2 million App Store downloads. The same report says Apple plans an open-source release of the framework later in summer 2026.

The tooling around it moved too. According to MacRumors, Xcode 27's install size is 30% smaller and Xcode Cloud builds run up to twice as fast, the kind of friction reduction that decides whether a small dev shop actually adopts a new platform capability.

Developer-access metric | Figure
Free Private Cloud Compute threshold | Fewer than 2 million App Store downloads
Per-token cost for on-device inference | $0
Xcode 27 install-size reduction | 30%
Xcode Cloud build speedup | Up to 2x

Sources: MacRumors.

Model | Total parameters | Active per request | Generation
AFM 3 Core | 3 billion | 3 billion | 3rd
AFM 3 Core Advanced | 20 billion | 1–4 billion | 3rd

Sources: Apple Machine Learning Research; MacRumors.


What "Apple Foundation Models" Actually Means — The Mechanism

Strip away the branding and there are three plain ideas doing the work.

The model runs where the data already is. A traditional cloud AI feature takes your input, ships it to a data center, runs the model on a rented GPU, and ships the answer back. Apple Foundation Models on-device do the inference on the chip in your hand. The practical consequences are not subtle: there is no network latency, no per-token API invoice for that step, and — for a receipt, a W-2, a damaged-roof photo, or a paystub — the sensitive image does not have to leave the device at all.
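To put rough numbers on the per-token point, here is a minimal Python sketch of the arithmetic. The document volume, tokens per document, and cloud price per million tokens are hypothetical placeholders, not quotes from Apple or any provider; the only figure taken from this article is the $0 per-token cost of on-device inference.

```python
# Per-token model cost for one month of document intake.
# Every input except ON_DEVICE_PRICE is a HYPOTHETICAL placeholder.

ON_DEVICE_PRICE = 0.0  # $ per million tokens: on-device inference has no per-token bill


def monthly_model_cost(docs_per_month: int, tokens_per_doc: int,
                       price_per_million_tokens: float) -> float:
    """Dollar cost of the model step alone for a month of documents."""
    total_tokens = docs_per_month * tokens_per_doc
    return total_tokens / 1_000_000 * price_per_million_tokens


if __name__ == "__main__":
    docs = 5_000            # hypothetical: receipts photographed per month
    tokens_per_doc = 2_000  # hypothetical: image + prompt + structured output
    cloud_price = 3.00      # hypothetical $ per million tokens for a cloud API

    cloud = monthly_model_cost(docs, tokens_per_doc, cloud_price)
    local = monthly_model_cost(docs, tokens_per_doc, ON_DEVICE_PRICE)
    print(f"cloud model step:     ${cloud:,.2f}")  # $30.00 with these inputs
    print(f"on-device model step: ${local:,.2f}")  # $0.00
```

The sketch counts only the per-token model bill. It deliberately leaves out device hardware, battery, and any cloud fallback, which is where the real trade-offs live.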

The model reads pictures, not just words. "Natively multimodal" means image understanding is baked into the same model rather than bolted on as a separate service. Because that image reading runs inside the same on-device model, you get a clean pipeline: capture an image, let the model read its text, and have it reason over the result — all locally. For any business that drowns in photographed or scanned paper, that is the whole ballgame.

The sparse model spends compute only when it has to. AFM 3 Core Advanced has 20 billion parameters' worth of knowledge but, per Apple Machine Learning Research, activates only 1 to 4 billion of them on any given request. That "instruction-following pruning" is why a model this large can run on battery-powered consumer hardware without melting it — you carry the capacity but pay the energy cost only for the slice each task needs.
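The sources cited here do not describe how this pruning works internally, so the sketch below swaps in a generic, well-known stand-in to show the idea: a router scores blocks of weights for the current request and only the top-k blocks run. The 20-blocks-of-1-billion layout, the difficulty-to-k policy, and the random router scores are all hypothetical; the sketch only shows how the active share of a 20-billion-parameter model can land between 5% (1 billion) and 20% (4 billion).

```python
import random

# HYPOTHETICAL layout: a 20B-parameter model split into 20 blocks of 1B each.
NUM_BLOCKS = 20
PARAMS_PER_BLOCK = 1_000_000_000


def blocks_for_request(difficulty: float) -> int:
    """Hypothetical policy: map difficulty in [0, 1] to 1..4 blocks,
    mirroring "1 to 4 billion active parameters, depending on the request"."""
    if not 0.0 <= difficulty <= 1.0:
        raise ValueError("difficulty must be between 0 and 1")
    return 1 + round(difficulty * 3)


def select_blocks(router_scores: list[float], k: int) -> list[int]:
    """Top-k selection: indices of the k highest-scoring blocks, in order."""
    ranked = sorted(range(len(router_scores)),
                    key=lambda i: router_scores[i], reverse=True)
    return sorted(ranked[:k])


def active_share(chosen: list[int]) -> float:
    """Fraction of the model's weights that actually run for this request."""
    return len(chosen) * PARAMS_PER_BLOCK / (NUM_BLOCKS * PARAMS_PER_BLOCK)


if __name__ == "__main__":
    rng = random.Random(0)
    scores = [rng.random() for _ in range(NUM_BLOCKS)]  # stand-in router output
    for difficulty in (0.0, 0.5, 1.0):
        chosen = select_blocks(scores, blocks_for_request(difficulty))
        print(f"difficulty {difficulty}: blocks {chosen}, "
              f"{active_share(chosen):.0%} of weights active")
```

With this policy, a difficulty of 0.0 runs one block (5% of the weights) and 1.0 runs four (20%). Real routers score blocks per input, often layer by layer; the fixed random scores here only stand in for that step.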

So the developer workflow, as of June 2026, looks like this:

  1. The app captures or receives an image (a photo, a PDF page, a scanned form)

  2. The on-device model reads the text directly out of that image

  3. The model interprets that text alongside the user's instruction

  4. The model returns structured output (fields, a summary, a classification) locally

  5. For heavier reasoning, the app can route to a server model — Apple's own or a third party like Claude — through the same API

  6. The result flows into the app's own logic, with no per-token cost incurred for the on-device steps
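The six steps above can be sketched as a small Python pipeline. This is not Apple's Swift API: `read_text_on_device` and `interpret_locally` are hypothetical stand-ins for the on-device model (here, a byte decode and a few regexes so the sketch runs anywhere), `ReceiptFields` is an illustrative schema, and `escalate` represents whatever server model an app routes to in step 5.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ReceiptFields:
    """Step 4's structured output. The field names are illustrative."""
    vendor: Optional[str]
    total: Optional[float]
    date: Optional[str]

    def is_complete(self) -> bool:
        return None not in (self.vendor, self.total, self.date)


def read_text_on_device(image: bytes) -> str:
    """Step 2, HYPOTHETICAL stand-in for the on-device model reading an image.
    It just decodes the bytes so the sketch runs without Apple's framework."""
    return image.decode("utf-8", errors="replace")


def interpret_locally(text: str) -> ReceiptFields:
    """Steps 3-4, HYPOTHETICAL stand-in: a real app would ask the model for
    structured output; regexes keep this sketch self-contained."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    total = re.search(r"\btotal\s+\$?(\d+\.\d{2})", text, re.IGNORECASE)
    date = re.search(r"\b\d{4}-\d{2}-\d{2}\b", text)
    return ReceiptFields(
        vendor=lines[0] if lines else None,
        total=float(total.group(1)) if total else None,
        date=date.group(0) if date else None,
    )


def process_receipt(image: bytes,
                    escalate: Callable[[str], ReceiptFields]) -> ReceiptFields:
    """Step 1 hands in the image; step 6 receives the returned fields."""
    text = read_text_on_device(image)
    fields = interpret_locally(text)
    if not fields.is_complete():
        fields = escalate(text)  # step 5: only incomplete results leave the device
    return fields


if __name__ == "__main__":
    def server_model(text: str) -> ReceiptFields:
        # Placeholder for a paid server call (Apple's or a third party's).
        return ReceiptFields(vendor=None, total=None, date=None)

    photo = b"Corner Hardware\n2026-06-12\nSUBTOTAL $40.00\nTOTAL $42.50\n"
    print(process_receipt(photo, server_model))
```

The design choice worth copying is the shape, not the regexes: the local step always runs first, and the paid step is a single, explicit call that only fires when the local result fails a completeness check.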

The line between "free local step" and "paid cloud step" in step 5 is the design decision every team adopting this will spend the most time on.
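One way to make that decision explicit is a small routing policy. The sketch below is hypothetical: the `Task` flags and the rule order are our assumptions, not Apple guidance. The only sourced input is the free Private Cloud Compute threshold of fewer than 2 million App Store downloads, per MacRumors; the sketch says nothing about pricing above that threshold or for third-party models.

```python
from dataclasses import dataclass

# Per MacRumors: Private Cloud Compute is free for developers with
# fewer than 2 million App Store downloads.
PCC_FREE_DOWNLOAD_LIMIT = 2_000_000


@dataclass(frozen=True)
class Task:
    name: str
    needs_deep_reasoning: bool     # hypothetical flag set by the app's own logic
    contains_sensitive_data: bool  # e.g. SSNs, financial statements


def pcc_is_free(app_downloads: int) -> bool:
    return app_downloads < PCC_FREE_DOWNLOAD_LIMIT


def choose_route(task: Task, app_downloads: int) -> str:
    """HYPOTHETICAL policy: stay on-device unless the task needs heavier
    reasoning; keep sensitive data on Apple's own tiers; otherwise prefer
    Private Cloud Compute while it is free, then fall back to a third party."""
    if not task.needs_deep_reasoning:
        return "on-device"
    if task.contains_sensitive_data or pcc_is_free(app_downloads):
        return "private-cloud-compute"
    return "third-party"  # e.g. Claude or Gemini through the same framework


if __name__ == "__main__":
    tasks = [
        Task("read receipt fields", False, True),
        Task("summarize a financial statement", True, True),
        Task("draft a marketing blurb", True, False),
    ]
    for downloads in (50_000, 3_000_000):
        for t in tasks:
            print(f"{downloads:>9,} downloads | {t.name}: {choose_route(t, downloads)}")
```

Writing the policy as code, even a toy one, forces the team to decide the order of the rules, which is exactly where the hard conversations about privacy and cost happen.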


Why Now: The Constraint That Broke

On-device AI has been "almost ready" for years. Three constraints held it back, and all three just moved.

The first was memory. A capable model historically needed all its weights resident in DRAM, and phones do not have data-center DRAM. The third-generation design sidesteps this: according to Apple Machine Learning Research, AFM 3 Core Advanced keeps its full weights in flash storage and activates only a 1-to-4-billion-parameter slice per request, so the active memory footprint stays within what a phone can spare.
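The arithmetic behind that footprint is simple enough to sketch: weight memory is roughly parameters times bits per weight, divided by eight. The sources cited here do not state the bit width Apple uses, so the 4-bit figure below is a hypothetical assumption, and the sketch ignores activations, caches, and runtime overhead.

```python
# Rough weight-memory estimate: parameters x bits per weight / 8 bits per byte.
GB = 1e9  # decimal gigabytes


def weight_gb(params: float, bits_per_weight: float) -> float:
    """Approximate storage for a model's weights, in decimal GB."""
    return params * bits_per_weight / 8 / GB


if __name__ == "__main__":
    BITS = 4  # HYPOTHETICAL quantization; not a figure from Apple's sources

    print(f"full 20B model (kept in flash): {weight_gb(20e9, BITS):.1f} GB")
    for active in (1e9, 4e9):
        print(f"active slice of {active / 1e9:.0f}B params: "
              f"{weight_gb(active, BITS):.1f} GB")
```

Under that assumption, the full model needs about 10 GB of storage while a 1-to-4-billion-parameter slice needs roughly 0.5 to 2 GB, which is why keeping the bulk in flash and working only on the active slice matters on a phone.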

The second was quality. An on-device model small enough to run on a phone used to be too weak to trust on real work. That gap narrowed sharply this generation. According to Apple Machine Learning Research, in human side-by-side text evaluations AFM 3 Core was preferred 45.6% of the time, up from a 23.3% baseline for the prior generation — and on image understanding, AFM 3 Core was preferred more than 61% of the time. The model got good enough that local processing stopped being a downgrade.

The third shift is distribution. A capability nobody can build on is a demo, not a platform. Making Foundation Models on Private Cloud Compute free for developers under 2 million App Store downloads and planning an open-source release of the framework, per MacRumors, is what turns a model into an ecosystem. The constraint that broke was not just technical; it was that the on-ramp finally got wide enough for ordinary software teams to drive onto.


Who Shipped It, and the Benchmarks They Published

Apple shipped it, and — unusually — published comparative numbers rather than vibes. The figures below are Apple's own human-preference evaluations, where raters chose between the new model's output and a baseline.

Capability | New AFM 3 result | Prior-generation baseline
Text (Core, on-device) | 45.6% preferred | 23.3% preferred
Text (Cloud, server) | 64.7% preferred | 8.7% preferred
Image understanding (Cloud) | 37.8% preferred | 9.6% preferred
Text-to-speech (mean opinion score, Core Advanced) | 4.15 | 3.87

Sources: Apple Machine Learning Research.

According to Apple Machine Learning Research, the server model AFM 3 Cloud was preferred 64.7% of the time on text, versus 8.7% before. That is the largest jump in the table, and a signal that the heavier reasoning you would route out at step 5 of the workflow above has improved too, not just the on-device tier.
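The "largest jump" claim is easy to check against the table. This short Python sketch uses only the three human-preference rows from Apple Machine Learning Research as reported above and computes each row's gain in percentage points and as a ratio; the mean-opinion-score row is left out because it is on a different scale.

```python
# Human-preference rates (%) from the table above: (AFM 3 result, prior baseline).
PREFERENCE = {
    "Text (Core, on-device)": (45.6, 23.3),
    "Text (Cloud, server)": (64.7, 8.7),
    "Image understanding (Cloud)": (37.8, 9.6),
}


def gains(table: dict[str, tuple[float, float]]) -> dict[str, tuple[float, float]]:
    """Map each row to (percentage-point gain, new/old ratio), rounded."""
    return {
        name: (round(new - old, 1), round(new / old, 2))
        for name, (new, old) in table.items()
    }


if __name__ == "__main__":
    results = gains(PREFERENCE)
    for name, (points, ratio) in results.items():
        print(f"{name}: +{points} points, {ratio}x")
    largest = max(results, key=lambda name: results[name][0])
    print("largest jump:", largest)
```

Percentage points are the fairer comparison here: a ratio flatters rows with a very low baseline, which is why the cloud text row reads as a roughly sevenfold gain.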

On privacy, Apple's framing is explicit: the same source states, "We do not use our users' private personal data or user interactions when training our foundation models," and positions processing as happening on-device or through Private Cloud Compute. For regulated industries — accounting, insurance, lending — that data-handling posture is not a footnote; it is often the deciding factor in whether a tool can be used at all.


What It Means for Your Industry (The Cluster)

Apple Foundation Models are horizontal: they touch any business that handles photographed documents, scanned forms, or barcoded items. But the operational impact differs sharply by industry, because the bottleneck task differs. We have written dedicated, workflow-level breakdowns for three document-heavy verticals.

Each of those breaks down the specific daily tasks, the cost lines, and the staffing decisions this shifts. The common thread: the teams already routing documents through US Tech Automations workflows will treat on-device extraction as a model swap, not a rebuild — the orchestration, routing, and human-review steps stay; only the place the OCR runs changes.


Signal vs Speculation

Everything above this line is sourced fact as of June 2026. Everything in this section is our interpretation — labeled as such, so you can weigh it yourself.

What is demonstrated fact (sourced): Apple shipped third-generation on-device models with image input, a developer framework that accepts image attachments, free developer access under a download threshold, and third-party model routing — with published human-preference benchmarks showing real quality gains, per Apple Machine Learning Research and MacRumors.

Our read: the cost structure is the story, not the benchmark. A 45.6%-versus-23.3% preference bump is nice, but the durable shift is that high-volume, image-to-structured-data work — the exact work small firms pay per-page or per-document to outsource — can now run at zero marginal model cost on a device the staff already own. If that holds, the economic floor under "send the receipts to an offshore data-entry vendor" rises every quarter.
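To see what "the floor rises" means in numbers, here is a minimal comparison of outsourced per-document data entry against on-device extraction plus human review of flagged documents. Every input is a hypothetical placeholder, not data from the sources; swap in your own vendor price, review rate, and labor cost.

```python
def outsourced_cost(docs: int, price_per_doc: float) -> float:
    """Monthly spend when a vendor keys in every document."""
    return docs * price_per_doc


def on_device_cost(docs: int, review_share: float, minutes_per_review: float,
                   hourly_rate: float, model_cost: float = 0.0) -> float:
    """Monthly spend when the model extracts locally (no per-token bill)
    and staff review only the flagged share of documents. `model_cost`
    covers any paid cloud fallback."""
    reviewed_docs = docs * review_share
    review_hours = reviewed_docs * minutes_per_review / 60
    return review_hours * hourly_rate + model_cost


if __name__ == "__main__":
    docs = 2_000  # hypothetical monthly volume
    vendor = outsourced_cost(docs, price_per_doc=0.50)          # hypothetical price
    local = on_device_cost(docs, review_share=0.10,             # hypothetical 10% flagged
                           minutes_per_review=3, hourly_rate=30.0)
    print(f"outsourced: ${vendor:,.2f}  on-device + review: ${local:,.2f}")
```

The review share is the number that matters most: it is the part of the cost that does not go to zero, and it depends on how well the fields and exception rules are defined.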

Our read: this is a 12-to-36-month adoption curve, not a switch. The models shipped in June 2026; the apps that wrap them for a specific accounting or insurance workflow have to be built, tested, and trusted. Expect the first wave to be document-capture features inside existing vertical apps, then standalone intake tools, then deeper agentic flows. The open-source release planned for later in summer 2026, per MacRumors, likely accelerates the middle of that curve.

Our read: "on-device" will become a compliance feature, not just a speed feature. For firms handling SSNs, financial statements, and PHI-adjacent data, "the document never left the phone" is a sentence that closes deals and passes audits. We expect that framing to show up in vendor marketing long before the deepest technical capabilities do.

The honest limits: these are model and framework releases, not turnkey vertical products. Someone still has to build the app, define the fields, handle the exceptions, and decide what gets reviewed by a human. The benchmarks are Apple's own human-preference studies, not independent third-party evaluations. And on-device inference quality still trails the best server models for the hardest reasoning — which is precisely why the framework keeps a cloud-routing escape hatch.


How Teams Should Prepare

You do not need to write Swift to get ready. The preparation is operational, and it is the part US Tech Automations workflows are built to handle: map which of your document tasks are high-volume and low-judgment (those move to automated capture first), define the exact fields each document type must yield, and decide your human-review threshold before you turn anything on. The model is the easy part; the field definitions, exception routing, and review gates are where projects succeed or stall.
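Field definitions and a review threshold can be written down before any model is involved. The sketch below assumes your pipeline attaches a confidence score to each extracted field; the sources above do not say whether or how the model exposes one, so treat `DocumentSpec`, the field names, and the 0.90 threshold as hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DocumentSpec:
    doc_type: str
    required_fields: tuple[str, ...]
    min_confidence: float  # below this, a person checks the field


def review_reasons(spec: DocumentSpec, fields: dict[str, str],
                   confidence: dict[str, float]) -> list[str]:
    """Return why a document needs human review; an empty list means auto-approve."""
    reasons = []
    for name in spec.required_fields:
        if not fields.get(name):
            reasons.append(f"missing {name}")
        elif confidence.get(name, 0.0) < spec.min_confidence:
            reasons.append(f"low confidence on {name}")
    return reasons


# Hypothetical spec for one document type.
PAYSTUB = DocumentSpec("paystub", ("employer", "pay_period_end", "gross_pay"), 0.90)

if __name__ == "__main__":
    extracted = {"employer": "Acme Corp", "pay_period_end": "2026-05-31", "gross_pay": ""}
    scores = {"employer": 0.98, "pay_period_end": 0.72}
    print(review_reasons(PAYSTUB, extracted, scores))
```

Returning reasons rather than a bare yes/no gives the reviewer a starting point and gives you a log of which fields fail most often.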

The sequence we recommend: inventory your document intake, instrument the current cost and turnaround per document type, then pilot automated extraction on the single highest-volume type. Firms that operationalize this first — wiring on-device or model-routed extraction into a US Tech Automations intake-and-review flow rather than treating each document type as a one-off — get the compounding benefit, because every new document type plugs into the same orchestration instead of starting from scratch.
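The inventory-then-pilot sequence reduces to a few lines once the numbers are instrumented. In this hypothetical sketch, each document type records its monthly volume, current monthly cost, and average turnaround, and the pilot candidate is simply the highest-volume type. The document types and figures are illustrative, not benchmarks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntakeStats:
    doc_type: str
    monthly_volume: int
    monthly_cost: float          # current spend on this type, in dollars
    avg_turnaround_hours: float  # intake to usable data

    @property
    def cost_per_doc(self) -> float:
        return self.monthly_cost / self.monthly_volume if self.monthly_volume else 0.0


def pilot_candidate(inventory: list[IntakeStats]) -> IntakeStats:
    """Pick the single highest-volume document type to pilot first."""
    if not inventory:
        raise ValueError("inventory is empty")
    return max(inventory, key=lambda s: s.monthly_volume)


if __name__ == "__main__":
    inventory = [  # hypothetical baseline, measured before any automation
        IntakeStats("receipts", 1_200, 600.0, 48.0),
        IntakeStats("vendor invoices", 300, 450.0, 24.0),
        IntakeStats("W-2s", 40, 120.0, 72.0),
    ]
    pick = pilot_candidate(inventory)
    print(f"pilot: {pick.doc_type} (${pick.cost_per_doc:.2f}/doc, "
          f"{pick.avg_turnaround_hours:.0f}h turnaround)")
```

The baseline numbers are the point: without a before-measurement of cost per document and turnaround, the pilot has nothing to be compared against.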


Key Takeaways

  • Apple Foundation Models are Apple's in-house AI family; the third generation shipped June 8, 2026 and is open to developers.

  • The on-device tier is AFM 3 Core (3 billion parameters) plus AFM 3 Core Advanced (20 billion, 1–4 billion active), per Apple Machine Learning Research.

  • On-device models now accept image input and read the text in photos and documents, so document capture can run locally with no per-token cost.

  • According to MacRumors, Foundation Models on Private Cloud Compute are free for developers under 2 million App Store downloads, the framework routes to Claude and Gemini, and an open-source release is planned for later in summer 2026.

  • The business impact is industry-specific; the bottleneck task (receipts, ACORD forms, paystubs) decides where the value lands.

  • This is a 12-to-36-month adoption curve: real value comes from wiring extraction into existing intake-and-review workflows, not from the model alone.


Frequently Asked Questions

What are Apple Foundation Models?

Apple Foundation Models are Apple's family of in-house AI models that power Apple Intelligence and, as of the third generation announced June 8, 2026, are available to developers through the Foundation Models framework. The on-device versions run on the device's own chip and now read images as well as text.

How big are the new on-device models?

The on-device tier has two models. According to Apple Machine Learning Research, AFM 3 Core is a 3-billion-parameter model and AFM 3 Core Advanced is a 20-billion-parameter sparse model that activates only 1 to 4 billion parameters per request.

Can Apple Foundation Models read photos and documents?

Yes. The on-device models are natively multimodal and accept image input, so an app can read and extract the text from a photo or scanned document locally without a cloud round-trip.

Is there a per-token cost to use them?

There is no per-token charge for the on-device inference, because it runs on the device rather than a server. According to MacRumors, Apple also made Foundation Models on Private Cloud Compute free for developers with fewer than 2 million App Store downloads.

Can developers use Claude or Gemini through Apple's framework?

Yes. According to MacRumors, the Foundation Models framework gained server-side support for calling third-party models such as Claude and Gemini through the same Swift API, so an app can mix Apple's on-device model with a cloud model behind one interface.

Does this mean I can stop paying for document data entry?

Not overnight, but it lowers the floor. The capability to extract structured data from images locally at no per-token cost exists as of June 2026; what is left is building the app, defining the fields, and setting human-review gates — the operational layer that an orchestrated intake-and-review workflow handles.


Operationalize It

Apple supplied the models; the value comes from wiring them into a workflow your team actually runs. If you want to turn on-device or model-routed document extraction into a repeatable intake-and-review process — with the field definitions, exception routing, and review gates that make it trustworthy — explore how agentic workflows from US Tech Automations orchestrate the steps around the model. Start with your single highest-volume document type, instrument the before-and-after, and expand from there.

About the Author

Garrett Mullins
Workflow Specialist

Helping businesses leverage automation for operational efficiency.
