MiniMax M3 Explained: What It Actually Changes Now

Jun 14, 2026

MiniMax M3 is an open-weight, 1-million-token-context model that pairs frontier-level coding with native image and video input, released on June 1, 2026. That single sentence is the whole story, and most of this page exists to unpack it in plain English.

Search results for the exact phrase "MiniMax M3" were nearly empty a few days ago, because the model is that new. This page is our attempt to be the clearest explanation of it on the internet right now: what shipped, how it works without the math, why it could only happen now, and where the honest limits are.

TL;DR

MiniMax M3 scores 59.0% on SWE-Bench Pro, ahead of GPT-5.5 (58.6%) and Gemini 3.1 Pro (54.2%), according to figures from Codersera. It is the first model to beat both on this coding benchmark while shipping as open weights.
It handles up to 1 million tokens of context and natively understands images and video, not just text.
It uses a new sparse-attention design (MSA) that makes long-context inference dramatically faster than the previous generation.
Pricing at launch was roughly an order of magnitude below comparable frontier models.
The catch: it is not the single best coder on every test, parameter counts were not disclosed at launch, and "open weights" arrived on a short delay rather than on day one.

If you only read one section, read Signal vs Speculation below — that is where we separate what is proven from what we are guessing.

What actually happened

On June 1, 2026, the Shanghai-based lab MiniMax released M3 to the public via its API and agent product. According to SiliconFlow, the model ships with a 1M-token context window and support for image and video inputs, and is built on the MiniMax Sparse Attention (MSA) architecture. The same listing prices it at $0.30 per million input tokens and $1.20 per million output tokens.

The 1M-context headline is confirmed directly by Pandaily, whose coverage is titled around M3 launching "with 1M context and native multimodal capabilities." The combination of long context, multimodal input, and open weights is what made the release notable rather than incremental.

The benchmark numbers are where it gets interesting. According to DataNorth, M3 scored 59.0% on SWE-Bench Pro, 83.5 on BrowseComp, 66.0% on Terminal-Bench 2.1, and 74.2% on MCP Atlas. SWE-Bench Pro tests whether a model can resolve real software-engineering issues end to end, so a score in this range puts M3 in genuine frontier company rather than the mid-tier.

Why this matters in one table

What M3 claims	The figure	Why a non-engineer cares
Coding ability (SWE-Bench Pro)	59.0%	Resolves real code issues, not toy problems
Context window	1,000,000 tokens	Reads a whole repo or document set at once
Launch input price	$0.30 / 1M tokens	An order of magnitude cheaper than typical frontier rates
Launch output price	$1.20 / 1M tokens	Cheaper long outputs make agents affordable to run
Modalities	text, image, video	Reads screenshots, scans, and clips directly

That table is the elevator version. The rest of this page explains how each row is possible.

The mechanism, in plain language

A language model "pays attention" to every token it has seen so it can decide what to write next. The problem is that classic attention gets expensive fast: doubling the input can roughly quadruple the work. That cost is the reason long-context models have historically been slow, pricey, or both.

MiniMax M3's answer is MiniMax Sparse Attention (MSA) — a design that lets the model skip most of the irrelevant token-to-token comparisons instead of computing all of them. You can picture it as the difference between re-reading an entire book to answer one question versus jumping to the three pages that matter.
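To make the scaling concrete, here is a small Python sketch that counts token-to-token comparisons. The dense count follows directly from every token attending to every token. The sparse count is only a generic illustration in which each token looks at a fixed budget of other tokens; the budget of 2,048 is a made-up number, and the function is not a description of how MSA actually chooses what to skip, which the sources cited here do not specify.

```python
def dense_comparisons(num_tokens: int) -> int:
    """Classic attention: every token is compared with every token (n * n)."""
    return num_tokens * num_tokens


def budgeted_comparisons(num_tokens: int, budget: int = 2048) -> int:
    """Generic sparse illustration (NOT MSA itself): each token is compared
    with at most `budget` tokens. The budget value is hypothetical."""
    return num_tokens * min(num_tokens, budget)


if __name__ == "__main__":
    for n in (250_000, 500_000, 1_000_000):
        dense = dense_comparisons(n)
        sparse = budgeted_comparisons(n)
        print(f"{n:>9,} tokens: dense={dense:.2e}  budgeted={sparse:.2e}")
```

Doubling the input quadruples the dense count but only doubles the budgeted one, which is the intuition behind the book analogy.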

The payoff is measurable. The developer guide from Codersera reports M3 delivers a 9.7x prefill speedup and a 15.6x decoding speedup at 1M context versus the prior MiniMax M2. Prefill is the model reading your prompt; decoding is it writing the answer. Both getting an order of magnitude faster is what makes a million-token window practical rather than theoretical.

There is a second number worth knowing. According to apidog, MiniMax reports that MSA cuts per-token compute to roughly 1/20 of its previous-generation model. Compute is the dominant variable cost in running these models, so cutting it that far is the lever behind the low launch price.

Plain-English glossary

Term	What it means here
Context window	How much text/image the model can consider at once
Token	A chunk of text, roughly 3/4 of a word
Open weights	The trained model file is published for anyone to download and run
Sparse attention	Skipping most token comparisons to save compute
Prefill / decode	Reading the prompt / writing the response
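The token rule of thumb in the glossary is enough for a back-of-the-envelope size check. The sketch below turns a word count into a rough token estimate; the 0.75 ratio is the approximation from the table, and real counts depend on the language and on the model's tokenizer, so treat the result as a planning figure rather than a measurement.

```python
import math

# Rule of thumb from the glossary: one token is roughly 3/4 of a word.
# Actual ratios vary by language and tokenizer.
WORDS_PER_TOKEN = 0.75


def estimate_tokens(word_count: int) -> int:
    """Rough token estimate for a given word count, rounded up."""
    return math.ceil(word_count / WORDS_PER_TOKEN)


if __name__ == "__main__":
    # Hypothetical example: a 300,000-word contract set.
    print(estimate_tokens(300_000))
```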

Why now — what constraint broke

For two years the practical wall has not been intelligence; it has been the cost and speed of long context. A model that could read a million tokens but took minutes and dollars per call was a demo, not a workflow. The constraint that broke is the efficiency of attention at long context, and M3's MSA design is the concrete instance of that break.

According to DataNorth, M3 generates output at approximately 100 tokens per second while supporting the full 1M-token window. Speed at long context, not raw benchmark score, is the variable that turns a model into something you can put inside an automated process that runs hundreds of times a day.
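As a rough feel for what that throughput means in practice, the sketch below converts an output length into writing time. It assumes the reported figure of about 100 tokens per second holds steadily and it ignores prefill time, network latency, and queueing, so real calls will take longer.

```python
def decode_seconds(output_tokens: int, tokens_per_second: float = 100.0) -> float:
    """Time spent writing the answer at a steady decode rate.

    The default rate is the approximate figure reported for M3; prefill
    and network overhead are deliberately left out of this estimate.
    """
    return output_tokens / tokens_per_second


if __name__ == "__main__":
    # Hypothetical example: a 2,000-token summary.
    print(f"{decode_seconds(2_000):.1f} seconds of decoding")
```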

The pricing follows from the efficiency. According to SiliconFlow, input runs $0.30 per million tokens and output $1.20 per million tokens. When a frontier-grade model costs roughly a tenth of what comparable models cost, the math on automating a task that was previously "too expensive to run on AI" quietly flips.
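The arithmetic is simple enough to check yourself. This sketch prices a single call at the launch list prices quoted above; the workload in the example (an 800,000-token document set, a 4,000-token answer, 300 calls a day) is hypothetical, and current prices may differ from the launch figures.

```python
INPUT_USD_PER_MILLION = 0.30   # launch input price, per the SiliconFlow listing
OUTPUT_USD_PER_MILLION = 1.20  # launch output price, per the SiliconFlow listing


def call_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Cost of one call at the launch list prices."""
    return (
        input_tokens / 1_000_000 * INPUT_USD_PER_MILLION
        + output_tokens / 1_000_000 * OUTPUT_USD_PER_MILLION
    )


if __name__ == "__main__":
    # Hypothetical workload, for illustration only.
    per_call = call_cost_usd(800_000, 4_000)
    print(f"per call: ${per_call:.4f}")
    print(f"300 calls a day: ${per_call * 300:.2f}")
```

On those assumptions a near-full-window call costs about a quarter of a dollar, and almost all of that is input, so the input price is the number to watch for long-document work.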

Who shipped it

MiniMax is a Shanghai-based AI lab, and M3 is the successor to its M2-generation models. The "open-weight" framing is central to why this release matters beyond benchmarks: according to apidog, MiniMax promised to publish the model weights and a technical report within roughly 10 days of the June 1, 2026 launch. Open weights mean an operator can, in principle, run the model on their own infrastructure rather than only through a vendor API — which changes the data-control and cost calculus for regulated or cost-sensitive teams.

If you operate document-heavy processes, the relevant question is not "is this the smartest model" but "can I swap it in." Teams already routing documents through US Tech Automations workflows can treat M3 as a model swap at the extraction-and-reasoning step rather than a rebuild of the pipeline around it.

The honest limits

No model is free of trade-offs, and pretending otherwise is how you get burned.

Limit	What it means	Evidence basis
Not #1 on every test	M3 leads on some benchmarks, trails Claude Opus 4.7 on others	BrowseComp 83.5 beats Opus 4.7's 79.3, per DataNorth; coding leadership is narrower
Parameters undisclosed	Total/active parameter counts were not published at launch	apidog and search coverage note the technical report was still pending
Open weights on a delay	Weights were promised within ~10 days, not day one	apidog and Codersera both report the ~10-day window
Output is text	Multimodal inputs, but it writes text, not images or video	Codersera notes text-only output

According to Codersera, M3 scored 59.0% on SWE-Bench Pro versus 58.6% for GPT-5.5 and 54.2% for Gemini 3.1 Pro — a real lead, but a narrow one over GPT-5.5. A 0.4-point gap is a tie for practical purposes; the durable advantage is openness and price, not a runaway quality margin.

There is also a context nuance. According to Codersera, the window is up to 1 million tokens with a guaranteed minimum of 512K. That distinction matters if you are designing a process that depends on a specific context size: design for the floor, not the ceiling.
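One way to apply "design for the floor" is a pre-flight size check before a request is sent. The sketch below is a minimal version under two assumptions that are ours, not the sources': that "512K" means 512,000 tokens, and that space reserved for the answer counts against the same window. The function name and labels are hypothetical.

```python
GUARANTEED_MIN_TOKENS = 512_000  # assumption: "512K" read as 512,000 tokens
MAX_TOKENS = 1_000_000           # reported ceiling


def classify_request(prompt_tokens: int, reserved_output_tokens: int = 0) -> str:
    """Label a request by which part of the context window it needs.

    Assumes the reserved output space shares the window with the prompt.
    """
    total = prompt_tokens + reserved_output_tokens
    if total <= GUARANTEED_MIN_TOKENS:
        return "safe"         # fits inside the guaranteed floor
    if total <= MAX_TOKENS:
        return "best_effort"  # relies on the window above the floor
    return "too_large"        # must be split or trimmed first


if __name__ == "__main__":
    print(classify_request(500_000, reserved_output_tokens=8_000))
    print(classify_request(700_000))
```

A process that only ever sends "safe" requests keeps working even if the larger window is not available for a given call.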

How M3 stacks up on the numbers we can verify

It helps to see the verified figures in one place rather than scattered through the prose. The table below collects only numbers that appear in the sources cited on this page — nothing inferred.

Benchmark / metric	MiniMax M3	Comparison point
SWE-Bench Pro	59.0%	GPT-5.5 at 58.6%, Gemini 3.1 Pro at 54.2%
BrowseComp	83.5	Claude Opus 4.7 at 79.3
Terminal-Bench 2.1	66.0%	—
MCP Atlas	74.2%	—
Prefill speedup at 1M	9.7x	vs prior MiniMax M2
Decode speedup at 1M	15.6x	vs prior MiniMax M2

Two things stand out. First, M3 is not uniformly ahead — its clearest lead is on autonomous browsing, where the figures from DataNorth put it at 83.5 on BrowseComp against Opus 4.7's 79.3, while its coding lead over GPT-5.5 is a fraction of a point. Second, the speedups are the load-bearing numbers for anyone planning to actually run it: a model that is fast and cheap at 1M context is usable in production in a way that a marginally-higher benchmark score does not capture.

For a business reader, the practical filter is simple. Ignore the rows that are within a point or two of the competition — those flip with every release. Pay attention to the speed-and-price rows, because those are what determine whether you can afford to run the model on every document, every day, inside an automated process. That is the durable change, and it is the lens we use in the forecast below.

Signal vs Speculation

Everything above this line is sourced fact, recapped once below. Everything marked "Our read" is forecast, and labeled as such.

Demonstrated fact (sourced): M3 exists, scores 59.0% on SWE-Bench Pro, handles up to 1M tokens, accepts image and video input, runs on MSA, and launched at roughly $0.30/$1.20 per million tokens on June 1, 2026.

Our read, looking a few years out: The benchmark lead will be forgotten within a quarter — frontier scores leapfrog constantly. The durable change is the price-and-openness combination. If a 59%-SWE-Bench model genuinely costs a tenth of the prior frontier and can be self-hosted, then the bottleneck for small and mid-size businesses moves off the model entirely and onto workflow design: what to feed it, how to check its output, and how to wire it into the tools you already use.

Our read on adoption timing: We expect the first wave of practical SMB and mid-market use to be long-document tasks — reading entire contract sets, code repositories, or claim files in one pass — because that is exactly where the 1M window and the low price intersect. The teams that win in 2026–2027 will not be the ones with the best model; they will be the ones who already had clean, observable workflows that a cheaper, longer-context model could simply plug into. M3 is a reminder that the model layer is becoming a commodity and the workflow layer is becoming the moat.

What would change our read: If the open weights never ship in usable form, or if the published technical report reveals a parameter count that makes self-hosting impractical, the "openness" advantage shrinks back toward "just another API," and the story becomes about price alone.

What to do with this if you run a business

You do not need to chase the model. You need a process that can adopt a better model without a rebuild. Concretely:

Identify the one document-heavy task that is currently "too expensive" to automate. The price drop is what makes it newly viable.
Make sure that task already produces structured, checkable output — a model is only as safe as the verification around it.
Treat the model as a swappable part. A team running extraction inside US Tech Automations workflows changes one configuration value to test M3 against their current model on real inputs, then keeps whichever wins on their data.
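The three steps above can be sketched as a small comparison harness. This is an illustration of the pattern only, not the API of any particular automation product or model vendor: `Model`, `Checker`, `pass_rate`, and `pick_model` are hypothetical names, and the stand-in functions in the example take the place of real model calls.

```python
from typing import Callable, Dict, List

Model = Callable[[str], str]          # document in, extracted output out
Checker = Callable[[str, str], bool]  # (document, output) -> passed verification?


def pass_rate(model: Model, documents: List[str], check: Checker) -> float:
    """Share of documents whose output passes the verification check."""
    if not documents:
        return 0.0
    passed = sum(1 for doc in documents if check(doc, model(doc)))
    return passed / len(documents)


def pick_model(models: Dict[str, Model], documents: List[str],
               check: Checker, current: str) -> str:
    """Return the best-scoring model name, keeping `current` on ties."""
    best_name = current
    best_rate = pass_rate(models[current], documents, check)
    for name, model in models.items():
        rate = pass_rate(model, documents, check)
        if rate > best_rate:
            best_name, best_rate = name, rate
    return best_name


if __name__ == "__main__":
    # Stand-in "models" so the sketch runs without any network calls.
    docs = ["total: 10", "total: 25"]
    models = {
        "current": lambda doc: "",                            # extracts nothing
        "candidate": lambda doc: doc.split(":")[1].strip(),   # extracts the amount
    }
    check = lambda doc, out: out.isdigit()
    print(pick_model(models, docs, check, current="current"))
```

Keeping the incumbent on ties is a deliberate choice here: a swap should have to earn its place on your own documents.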

For the industry-specific version of this, see what M3 means for small businesses, for manufacturers, and for accounting firms.

Frequently asked questions

What is MiniMax M3 in one sentence?

MiniMax M3 is an open-weight AI model released June 1, 2026 that combines frontier coding ability, a 1-million-token context window, and native image and video understanding. According to Pandaily, it launched with 1M context and native multimodal capabilities.

Is MiniMax M3 actually better than GPT-5.5?

On one specific coding benchmark, narrowly yes. The benchmark table published by Codersera shows M3 at 59.0% on SWE-Bench Pro versus 58.6% for GPT-5.5 — a real but small lead. On other tasks the ranking differs, so "better" depends entirely on the job.

How much does MiniMax M3 cost to use?

At launch it was priced well below comparable frontier models. The model listing on SiliconFlow shows $0.30 per million input tokens and $1.20 per million output tokens, roughly an order of magnitude under typical frontier rates.

What does the 1 million token context window actually let me do?

It lets the model read very large inputs in a single pass — a full code repository, a multi-document contract set, or a long claim file. The documentation summarized by apidog puts the context window at up to 1,000,000 tokens, so you can feed it material that would have required chunking before.

Can I run MiniMax M3 on my own servers?

Eventually, in principle, because it is open-weight. The guide from Codersera notes the weights and technical report were scheduled for release within roughly 10 days of the June 1 launch, after which self-hosting becomes possible for teams with the right infrastructure.

Is MiniMax M3 multimodal — can it make images?

It reads images and video but writes text. Its multimodal capability is on the input side, so it can analyze a screenshot or a scanned page, but it does not generate images or video as output.

Should a small business switch to MiniMax M3 right now?

Only if you already have an automated, checkable workflow to plug it into. The smart move as of June 2026 is to treat it as a swappable model inside an existing process and A/B test it on your real data, not to rebuild anything around the headline.

Key Takeaways

MiniMax M3 scores 59.0% on SWE-Bench Pro, edging GPT-5.5 (58.6%) and beating Gemini 3.1 Pro (54.2%), per Codersera.
The real story is price and openness, not the benchmark margin — long context got cheap enough to run inside everyday workflows.
MSA architecture delivers a roughly 9.7x prefill and 15.6x decode speedup at 1M context, which is why the price could fall.
Self-hosting depends on the open weights actually shipping in usable form; treat that as a "wait and verify," not a done deal.
The competitive edge for businesses moves to the workflow layer — clean, observable processes that can adopt any better model as a swap.

The takeaway for operators is simple: build the pipeline, not the dependency. When you route work through agentic automation workflows, the model becomes the easy part to change — and releases like M3 turn into a quiet upgrade instead of a fire drill.

About the Author

US Tech Automations Team

AI Automation Specialists

We design and run agentic automation workflows for small and mid-size operators, and we track frontier model releases for the practical changes they create in real systems.
