

Tune On The Trace

Ariel Agor

On June 2, 2026, Microsoft used the Build keynote stage to announce Frontier Tuning. The press release sentence was small. "Teaching AI to work the way you do." Read that sentence twice. The pitch is about a trace. The model is incidental.

Seven days later, on June 9, Anthropic released Claude Fable 5 to the public at $10 per million input tokens and $50 per million output tokens. Less than half the rate of the prior Mythos Preview. Two days after that, on June 11, the Wall Street Journal reported that OpenAI was weighing steep price cuts to its enterprise API to defend share. Chinese providers had already undercut both by as much as nine times.

Same month. Same week, even. Two completely different sentences about what AI is going to cost an enterprise from here on.

For two years, the question on every executive's desk has been the same one. Custom AI versus off-the-shelf tools. Buy a platform from a name brand or build something proprietary inside the wall. Pick a vendor or pick an engineering bet. That question is now broken, and what happened in June broke it. The shelf got commodity-priced. The custom got redefined. Anyone still arguing buy-versus-build in those words is solving last year's problem with last year's vocabulary.

The real question, the one Microsoft asked out loud at Build and Anthropic asked sideways on its pricing page, is whether your company owns a trace clean enough to tune on. Because the model is now cheap. The trace is what is expensive.

The Shelf Went To Commodity

Start with the price story, because price is the clearest signal you ever get.

On June 9, Anthropic listed Fable 5 at $10 in, $50 out per million tokens. The previous Mythos Preview tier had been more than double that. Fable 5 sits at the top of the lab's public model lineup, the model that handles long-horizon work and can stay productive across days inside an agent harness. The Wall Street Journal piece on June 11, picked up by CNBC the same day, reported that OpenAI was weighing matching cuts. Industry trackers note that token prices have fallen sharply across the year, with Chinese providers cutting deeper still.
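Here is the arithmetic behind the "utility bill" framing, using the Fable 5 list prices quoted above. The workload size (200 million input tokens and 20 million output tokens a month) is an assumption chosen for illustration, not a figure from any source. `monthly_cost` is a hypothetical helper.

```python
# Fable 5 list prices as quoted in the text, in US dollars per million tokens.
FABLE_5_INPUT_PER_M = 10.0
FABLE_5_OUTPUT_PER_M = 50.0


def monthly_cost(input_tokens: int, output_tokens: int,
                 input_per_m: float, output_per_m: float) -> float:
    """Dollar cost of a workload billed separately for input and output tokens."""
    return ((input_tokens / 1_000_000) * input_per_m
            + (output_tokens / 1_000_000) * output_per_m)


# Hypothetical workload: 200M input tokens and 20M output tokens per month.
cost = monthly_cost(200_000_000, 20_000_000, FABLE_5_INPUT_PER_M, FABLE_5_OUTPUT_PER_M)
print(f"${cost:,.2f}")      # $3,000.00

# The same workload at a ninth of the price, the upper end of the undercut cited earlier.
print(f"${cost / 9:,.2f}")  # $333.33
```

At these rates the model line item for a substantial workload is closer to a software subscription than to a headcount decision, and that is the sense in which the article calls it a utility bill.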

In other words, the model you can rent off the public API has converged. Quality differences narrow with every release. Price differences narrow even faster. The thing every off-the-shelf platform sits on, the foundation model, is becoming infrastructure the way electricity became infrastructure in the 1920s.

This is the punchline that has been building since 2023 and that arrived as fact in June. Access to a frontier model is a utility bill. The bill is going down. The utility is reachable from any laptop, including your competitor's.

The follow-on implication is the one most boards miss. If the model is utility, then the value the off-the-shelf platforms add on top of the model has to do its own work. Each platform has to justify itself separately from the model it wraps. And the moment you ask a platform to justify itself, you discover that most of them are a thin wrapper that retrieves your documents and pipes them to a model anyone can rent for ten dollars per million input tokens.

That is how the off-the-shelf market collapses. Quietly, on the price page.

What Microsoft Said Out Loud

The Microsoft Build 2026 keynote did something the labs have been hinting at for a year. It put a different layer in front of the model and said, clearly, the value lives here. The training signal is the differentiator. The weights are commodity.

Frontier Tuning, in Microsoft's own words on the Microsoft 365 Developer Blog, captures enterprise behavioral signals inside a managed reinforcement learning environment and feeds them back into model behavior. The system watches the trace of work being done. It watches the sequence of tool calls, the decisions made, the corrections applied, the outcomes achieved. It learns the patterns. It tunes the model on the work, not on a static export.

Microsoft published one number to make the point land. An internal Frontier Tuning deployment lifted task completion from 13 percent to 87 percent. A 13 percent agent is unusable. An 87 percent agent runs a function. Reinforcement learning on a real trace is what produces a jump like that. Static fine-tuning on a curated dataset has never produced one.

The Mayo Clinic partnership announced the same day is the customer-facing version of the same idea. Microsoft and Mayo Clinic are co-creating a frontier model for healthcare on Mayo's de-identified clinical data, deploying first inside Mayo's environment and only then becoming available to other organizations through Azure Foundry. The model is being grown inside the workflow. Not built outside and shipped in.

In the same window, OpenAI wound down its fine-tuning platform to new users, leaving Anthropic and Google as the primary fine-tuners alongside Microsoft's new RL runtime. Two big labs and one cloud provider all moved, in the same month, to position the work of tuning as the differentiator. The model itself is being given away on price.

This is the announcement under the announcement. The companies who build the frontier models are betting that their next revenue layer is the loop that wraps your workflow around the model.

The Trace Is The Asset

What is a trace?

A trace is the record of how a piece of work actually got done. Start to finish, capture the sequence. A customer service ticket arrives. A support agent reads three other tickets. The agent pulls a contract clause. The agent escalates to a senior. The senior overrides the suggested response, types eleven characters into the closing line, and resolves. That entire chain, including the override and the eleven characters, is the trace. The SOP document is a summary of what should have happened. The trace is a recording of what did.
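A minimal sketch of what a trace record might look like for that ticket. The `Step` and `Trace` classes, the action names, and the ticket IDs are all hypothetical placeholders; a real schema would also carry timestamps, identities, and links back to source systems. What matters is the shape: an ordered sequence of who did what, plus an outcome, with the override kept as a step instead of being lost.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Step:
    actor: str        # who acted, e.g. "agent" or "senior"
    action: str       # what they did, e.g. "read_ticket" or "override_response"
    detail: str = ""  # payload: ticket id, clause reference, what was edited


@dataclass
class Trace:
    task_id: str
    steps: List[Step] = field(default_factory=list)
    outcome: Optional[str] = None  # None while the work is still open

    def record(self, actor: str, action: str, detail: str = "") -> None:
        """Append one step, preserving the order in which the work happened."""
        self.steps.append(Step(actor, action, detail))


# The support-ticket example from the text, recorded step by step (IDs are placeholders).
trace = Trace(task_id="ticket-001")
for related in ("ticket-0417", "ticket-0522", "ticket-0610"):
    trace.record("agent", "read_ticket", related)
trace.record("agent", "pull_contract_clause", "contract clause")
trace.record("agent", "escalate", "to senior")
trace.record("senior", "override_response", "replaced suggested reply")
trace.record("senior", "edit_closing_line", "11 characters typed")
trace.outcome = "resolved"

print(len(trace.steps), trace.outcome)  # 7 resolved
```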

Almost no company has a clean version of this asset. They have logs. They have CRM records. They have email threads. They have meeting transcripts. None of those things are the trace. The trace is the sequence captured with enough fidelity that a reinforcement learning environment can replay it and ask the model to do better.

Retrieval-augmented generation, the technique that powers most off-the-shelf AI deployments today, gives the model access to facts. The trace gives the model access to judgments. Facts are commodities. Judgments are not. The difference between a senior underwriter and a junior underwriter is not what they know. It is what they decide in cases where the policy is silent. The decisions live in the trace.

Microsoft's pitch with Frontier Tuning is direct. The trace stays inside the Azure compliance boundary. No proprietary signal leaves the wall. The tuning is in-house in the operational sense, even though the model belongs to the lab. This is a structural argument about where value lives. The labs are saying, quietly, that the part of your business they cannot replicate is the part that has to be tuned into the model. And the only way to do that tuning is to use a tool that runs on top of their model, on their cloud, with their reinforcement learning runtime.

Read that as a leasing arrangement. The model is the building. The trace is the business operating inside the building. The lab is the landlord. The tenant who owns nothing but a clean trace still owns the only thing that compounds.

Custom AI Versus Off-The-Shelf Tools, Re-Posed

Here is the question reframed.

The old buy-versus-build debate assumed two end states. State one: rent a tool from a vendor and live inside the rules of that tool. State two: hire engineers, license a base model, and train a thing of your own from data you control. Menlo Ventures' enterprise survey from late 2025 found that 76 percent of AI use cases were purchased rather than built, up from 53 percent the year before. That number gets cited as evidence that buying won the war.

Read that number more carefully. Seventy-six percent of AI use cases got purchased because the lower layer of the stack got commoditized faster than anyone could justify building it themselves. Of course you do not roll your own foundation model. Of course you do not roll your own inference infrastructure. Frontier model API access is a Stripe-grade utility now. You buy the model the way you buy hosting.

The decision that matters lives one layer up. The trace layer. The reinforcement learning runtime. The eval suite. The continuous-tuning loop. That layer is where your operational moat sits, and that layer has to be designed. There is no off-the-shelf trace. There cannot be. By definition, the asset is your company's actual behavior, captured at the resolution a tuning loop can use.

So the real question for a board this quarter is not "do we build or do we buy." The real question is, "what part of our work, today, generates a trace clean enough to tune on, and what part is silent?" Whatever generates a clean trace can be compounded by AI. Whatever is silent stays a manual line item forever, because there is nothing for the model to learn from.

This reframes a lot of meetings. The IT function that picks a vendor and signs an SLA still adds value. The strategic work has moved up the stack. It is choosing which business processes get instrumented well enough to feed a tuner. That choice is permanent. It defines what your enterprise can outsource to silicon over the next decade.

What Off-The-Shelf Cannot Learn

A practical test of the framing. List the decisions in your business that, today, depend on a person who has been there long enough to have an instinct. Pricing exceptions on key accounts. Which tickets get escalated and to whom. How to phrase a renewal email to a customer who churned and came back. When to allow a refund outside policy.

Every one of those decisions is a moat. Every one of them is invisible to any off-the-shelf tool you can buy in 2026. Because every one of them is a judgment your senior person makes by reading context the documentation does not capture.

An off-the-shelf chatbot retrieves the policy. It cannot retrieve the override. The override is the value. The override is what your senior person earned the salary for. If you do not record the override, with context, in a form a reinforcement learning loop can replay, you have not captured the moat. The off-the-shelf tool will give the policy answer. Your senior person knew when the policy answer was the wrong one. That knowledge dies with the person.
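One way to record the override as training signal, sketched under assumptions: whenever a person edits or replaces what the policy (or the model) suggested, store the context, the suggestion, and what was actually sent. `OverrideExample` and `capture_override` are hypothetical names. A suggested-versus-final pair is the general shape that preference-style tuning methods consume, with the suggestion as the rejected answer and the human's version as the preferred one; whether a given vendor's tuning runtime accepts data in this form is an assumption to verify.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OverrideExample:
    context: str    # what the person saw: the request, the account, its history
    suggested: str  # the policy answer (or the model's draft)
    final: str      # what the person actually sent


def capture_override(context: str, suggested: str, final: str) -> Optional[OverrideExample]:
    """Return an example only when the human changed the suggestion; otherwise None."""
    if suggested.strip() == final.strip():
        return None  # accepted as-is: nothing was overridden
    return OverrideExample(context=context, suggested=suggested, final=final)


accepted = capture_override("Refund request on day 10.", "Refund approved.", "Refund approved.")
override = capture_override(
    "Refund request on day 40; customer has renewed three years running.",
    "Refunds are only available within 30 days of purchase.",
    "Given your history with us, we have approved this refund as a one-time exception.",
)
print(accepted is None, override is not None)  # True True
```

The context field is the part most logging setups drop, and it is the part that lets a tuner learn when the policy answer was wrong rather than just that it was.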

This is why "custom AI versus off-the-shelf tools" was always the wrong axis. The axis that matters is whether you have the operational discipline to capture overrides as training signal. Companies who do can hand the model the judgment and watch it compound. Companies who do not get a chatbot that answers the easy half of their queue and stalls on the half that mattered.

The June news from Microsoft is the labs admitting this out loud. They are no longer selling the model. They are selling the infrastructure to train the model on your overrides, inside your wall, on a continuous loop. The product is the loop. The model is the input you no longer have to pay full price for.

What To Do This Quarter

Three questions get a board to the right side of this.

The first question is about visibility. Walk five of your highest-value workflows from the start of the work to the moment the work is judged complete, and ask whether the trace is captured at the resolution a tuner could use. If the work happens across fifteen browser tabs, on Slack threads, and in email replies, the trace is silent. You can rent any off-the-shelf model and you will be impressed for a quarter, then plateau, because the model has nothing to learn from. The fix is operational. Instrument the workflows. Turn the work into a record that has a beginning, a sequence, and an outcome.
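The visibility test can be made mechanical. The sketch below assumes a hypothetical record format (a start time, timestamped steps, and an outcome) and reports which of the three properties named above, a beginning, a sequence, and an outcome, a workflow record is missing. The field names `started_at`, `ts`, and `outcome` are illustrative, not any product's schema.

```python
from typing import Any, Dict, List


def tunability_gaps(record: Dict[str, Any]) -> List[str]:
    """Report which of beginning / sequence / outcome a workflow record lacks.

    Hypothetical record shape:
      {"started_at": float, "steps": [{"ts": float, "action": str}, ...], "outcome": str}
    """
    gaps = []
    if record.get("started_at") is None:
        gaps.append("no beginning")
    steps = record.get("steps") or []
    if not steps:
        gaps.append("no sequence")
    else:
        times = [step.get("ts") for step in steps]
        # Steps without timestamps, or out of order, cannot be replayed as a sequence.
        if any(t is None for t in times) or times != sorted(times):
            gaps.append("sequence not ordered")
    if not record.get("outcome"):
        gaps.append("no outcome")
    return gaps


# Work scattered across Slack and email: actions exist, but nothing ties them together.
silent = {"steps": [{"action": "slack_reply"}, {"action": "email_reply"}]}
print(tunability_gaps(silent))  # ['no beginning', 'sequence not ordered', 'no outcome']

# The same work, instrumented end to end.
instrumented = {
    "started_at": 0.0,
    "steps": [{"ts": 1.0, "action": "triage"}, {"ts": 2.0, "action": "resolve"}],
    "outcome": "resolved",
}
print(tunability_gaps(instrumented))  # []
```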

The second question is about boundary. Can the trace stay inside a wall your governance team can defend? Frontier Tuning's whole pitch is that the reinforcement learning happens inside the Azure compliance boundary. The cloud vendors built this argument deliberately, because they know the trace is the part you cannot ship out of the company. If your current AI vendor needs you to send raw operational behavior to a multi-tenant system so the model can improve, you are paying twice. You are paying the rent on the model, and you are paying with the asset that should compound for you.

The third question is about authorship. Does anyone in your company own the loop? Not the vendor relationship. The loop. The eval suite, the trace ingestion, the override capture, the periodic tuning, the rollback path. This is a real role and most companies do not have it. The companies that will compound across the next three years are the ones who hire for this in 2026. The ones who treat it as a side project of IT will look up in 2028 and find their competitors' models running their decisions for them.
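A minimal sketch of two pieces of that loop, the eval suite and the rollback path, assuming each held-out case pairs a situation drawn from the trace with the decision the senior person actually made. `accuracy`, `promote_or_rollback`, and the two toy models are hypothetical, and exact-match scoring is a deliberate simplification; real evals usually need graded or rubric-based scoring.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical case shape: (situation from a held-out trace, decision actually made).
Case = Tuple[str, str]
Model = Callable[[str], str]


def accuracy(model: Model, cases: Sequence[Case]) -> float:
    """Fraction of held-out cases where the model reproduces the recorded decision."""
    if not cases:
        raise ValueError("eval suite is empty")
    return sum(model(situation) == decision for situation, decision in cases) / len(cases)


def promote_or_rollback(current: Model, candidate: Model,
                        cases: Sequence[Case], min_gain: float = 0.0) -> Model:
    """Return the candidate only if it beats the current model by more than min_gain."""
    if accuracy(candidate, cases) > accuracy(current, cases) + min_gain:
        return candidate
    # Rollback path: the tuned candidate is discarded and the current model keeps serving.
    return current


# Toy models standing in for a deployed model and a freshly tuned one.
def policy_model(situation: str) -> str:
    return "deny"  # always gives the policy answer


def tuned_model(situation: str) -> str:
    return "approve" if "long-tenured" in situation else "deny"


cases = [
    ("refund outside policy, long-tenured key account", "approve"),
    ("refund outside policy, first-month account", "deny"),
]
chosen = promote_or_rollback(policy_model, tuned_model, cases)
print(chosen is tuned_model)  # True
```

The design choice worth keeping is the gate: a newly tuned model replaces the current one only when it does measurably better on recorded decisions, and otherwise the current model stays in place.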

The Architecture Is The Work

Off-the-shelf tools have a place. Use them where the model is the answer. Coding assistance, content drafting, meeting summaries, sales call analysis. The model is the product. Buy it. Pay the utility bill. Switch when a cheaper one shows up, which it will, on a six-month rhythm now.

But anywhere the decision is the product, the architecture is the work. Capturing the trace is the work. Tuning on the trace is the work. Owning the eval that says whether the model got the override right is the work. The labs have made the model cheap so they can sell you the rest. The rest is exactly the part you should build with someone who has done it before.

Agor AI Advisory does this. We do not sell a platform. We design the loop. We instrument the workflow, define the trace schema, build the eval suite, integrate the continuous-tuning runtime your governance team will sign off on, and stay through the first three tuning cycles. The model can come from Anthropic, OpenAI, Microsoft, Google, or whoever is cheapest the week we ship. The model is the part the lab gives away. The architecture around it is the part your business keeps.

The shelf got cheap in June. The custom moved underneath, into the trace your work leaves behind. The companies that will own their decade are the ones who notice the shift this quarter and start recording.

Schedule a strategic consultation with us today.

