Six Hires, Not Sixty

Ariel Agor

Earlier this year, Mark Zuckerberg personally offered Andrew Tulloch a job. The package was worth as much as $1.5 billion over at least six years. Tulloch, a co-founder of Mira Murati's Thinking Machines Lab, said no. The reporting landed in Entrepreneur and was picked up across the tech press. That was the high-water mark.

The mid-range water mark was Ruoming Pang, who left Apple for Meta in July with a package worth a reported $200 million. In June 2025, Sam Altman said publicly that Meta had been offering OpenAI staff $100 million signing bonuses, "and even more than that," in straight-up cash. Meta poached more than fifty researchers and engineers in roughly six months. By March 2026, the company's own AI hiring had to be frozen because the spend got out of hand even for Meta. Apple lost at least four AI researchers to Mark Zuckerberg's effort in July alone.

These numbers were not paid to lab directors. They were paid to individual contributors who write code, run experiments, and tune loss functions. They are the market clearing price for the people your VP of HR is bidding against when she sets the salary band for the "senior ML engineer" listed on your new Center of Excellence org chart.

The conventional advice for the last three years was simple. Stand up an internal AI team. Hire a Chief AI Officer. Build a center of excellence. Train your existing engineers on prompt engineering. Buy enterprise licenses. Roll out pilots. Scale what works. Sixty-page McKinsey decks and IBM thought-leadership white papers were written in this shape, and most of them are still on your shared drive.

That advice was written for a world that ended around the time Jeff Bezos closed his second billion-dollar round in five months.

The Wages You Cannot Match

Project Prometheus, the AI lab Bezos quietly founded in November 2025 with chemist Vik Bajaj, is closing on a $10 billion round at a $38 billion valuation, with JPMorgan and BlackRock anchoring. Total capital raised pushes past $16 billion. The lab has roughly 120 employees. They were poached, according to Built In and the Wikipedia entry compiled from the reporting, from Meta, OpenAI, Anthropic, xAI, Nvidia and Google DeepMind. One of them, Kyle Kosic, co-founded xAI before he left for Bezos. The lab focuses on physical AI: models trained on real-world experimental data, robotics interactions, and engineering workflows for aerospace, automotive, advanced manufacturing, and drug discovery.

A hundred and twenty employees. Sixteen billion dollars. Five months. Those are the unit economics of the talent market you are competing in.

Now look at the median enterprise. Frontier-lab software engineers cleared $600K to $795K in total comp as of May 2026, per Levels.fyi compensation data. The OpenAI L5 individual contributor band sits at about $1.15 million, including $336K in base and $774K in stock. Anthropic is offering nine-figure packages to senior researchers, per multiple recent reports. PwC's Global AI Jobs Barometer documented a 56% wage premium for AI skills, up from 25% the year before. The premium more than doubled in twelve months.

Your company's compensation bands are not built for this. They were calibrated by Mercer and Willis Towers Watson consultants who benchmarked against the SaaS engineering market of 2022. The model is broken. The senior ML engineer you want to hire is already getting recruiter pings from Meta with seven figures attached. The mid-level data scientist you trained internally is being courted by Anthropic. The applied research lead you just promoted is interviewing at Prometheus.

This is the part where most strategy decks say "you need to differentiate on mission, equity, and culture." That advice was right in 2018. It is no longer responsive. The frontier labs sell mission too. They also pay nine figures.

The Team That Cannot Hold

Even if you could afford to hire, the team you assembled would not stay together for a budget cycle.

Gartner's most recent prediction, from May 2026, is that half of enterprises without a people-centric AI strategy will lose their top AI talent by 2027. The IBM CEO Study from the same month found that 76% of organizations now have a Chief AI Officer, up from 26% the prior year. Postings for Chief AI Officer and equivalent senior AI executive titles grew roughly 400% from 2023 through early 2026. Read those numbers together. Half of the organizations surveyed stood up a CAIO in the last twelve months. Half of the enterprises that lack a people-centric strategy will watch their best people walk out the door over the next eighteen.

This is not a recruiting problem. It is a structural one.

The internal AI team modeled on a frontier lab is a category error. A frontier lab is a research organization with a single product line, a venture-scale funding base, and a value chain that ends at the model. Your company has none of those properties. You have a P&L, a board that wants margin, an existing workforce that already does the thinking, and a need for AI capability that shows up everywhere from contract review to logistics scheduling to call center triage. A research team cannot serve that need. A research team builds models. You do not need models. You need decisions, faster.

The headcount-shaped AI team also assumes the role shapes are stable. They are not. The "prompt engineer" role peaked in late 2024 and was largely automated by mid-2025. The "RAG engineer" role peaked in 2025 and is being absorbed into the standard application engineer skill set this year. The "agent orchestration engineer" role is hot right now and will be standard within twelve months. The skills you train your team on this quarter will be commodity by year-end. The job descriptions you wrote in January are already wrong.

A roster of thirty specialists assembled around the current model is a static answer to a question that mutates monthly. The roster ages out faster than you can backfill it.

Building an Internal AI Team Without Pretending to Be a Frontier Lab

What does building an internal AI team actually look like, given all of this?

It looks small. It looks senior. It looks composable. It looks like a routing function, not a production function.

The unit of work for a useful internal AI team in 2026 is the decision about which cognitive capability to route a given problem to, on what cadence, with what evaluation, under what governance. The cognitive capability itself sits at OpenAI, Anthropic, Google DeepMind, or your private model hosted on Bedrock or Vertex. Your team does not build it. They place the bet.

That changes the headcount equation completely. You do not need thirty ML engineers. You need six routers. Or, if your company is larger, twelve. Almost never sixty.

Here is the shape I have been recommending to clients this quarter.

The Spine

Four to six senior people who own the four jobs no vendor will ever do for you: routing, evaluation, governance, and institutional memory.

Routing decides which model, which agent, which workflow, which human, and which fallback handles a given class of work. The router holds the budget. She knows what Claude Opus costs to run for two minutes versus GPT-5 versus the Gemini 2.5 Pro batch endpoint, and she watches the cost curve weekly.

Evaluation builds the test set that proves the AI is doing the work correctly, the regression suite that catches drift after a model upgrade, and the calibration loop that tells leadership when accuracy has moved from 91% to 84% because OpenAI shipped a quiet update on a Tuesday.

Governance owns the policies, the audit trail, the kill switches, the data exit conditions, and the answer to the regulator's question. She works closely with legal and security, but she sits inside the AI function so the policies actually inform implementation.

Institutional memory is the librarian. She owns the corpus of past prompts, past evaluations, past mistakes, past fixes, past procurement decisions. When a new model release lands, she is the one who knows which seventeen workflows need a re-evaluation, and why two of them will fail in production if you flip the switch on Monday.

These four jobs require senior people. None of them are entry level. None of them are filled by a recent bootcamp graduate. You can pay them well below frontier-lab rates because the work is not model research. The work is judgment, coordination, and institutional continuity. The market for these people exists, and it is not the same market Meta is paying nine figures into.
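The routing job described above can be made concrete with a small sketch. It is a minimal illustration, not a recommended implementation: the route names, costs, and accuracy figures are invented placeholders, and `pick_route` is a hypothetical helper that picks the cheapest option clearing an accuracy bar and hands the work to a human when nothing qualifies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    """One candidate way to handle a class of work (all values hypothetical)."""
    name: str             # placeholder label, not a real model identifier
    cost_per_task: float  # estimated dollars per task, maintained by the router
    eval_accuracy: float  # latest score on the internal eval set, 0.0 to 1.0


# Hypothetical routing table: work class -> candidate routes.
ROUTING_TABLE = {
    "contract_review": [
        Route("large-model-a", cost_per_task=0.40, eval_accuracy=0.93),
        Route("mid-model-b", cost_per_task=0.08, eval_accuracy=0.91),
        Route("small-model-c", cost_per_task=0.01, eval_accuracy=0.78),
    ],
}

HUMAN_FALLBACK = "human_review"


def pick_route(work_class: str, min_accuracy: float, max_cost: float) -> str:
    """Return the cheapest route that clears the accuracy bar within budget.

    Falls back to a human when no route qualifies or the work class is unknown.
    """
    candidates = [
        r for r in ROUTING_TABLE.get(work_class, [])
        if r.eval_accuracy >= min_accuracy and r.cost_per_task <= max_cost
    ]
    if not candidates:
        return HUMAN_FALLBACK
    return min(candidates, key=lambda r: r.cost_per_task).name
```

The point of the design is that the table, not the code, carries the judgment. The evaluator updates `eval_accuracy`, the router updates `cost_per_task`, and the fallback to a human is explicit rather than accidental.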

Borrowed Cognition

Your team does not write a model. Your team writes the integration around someone else's model.

The cognition itself comes from API calls. Anthropic, OpenAI, Google DeepMind, xAI, Mistral, Databricks, the AWS Bedrock catalog, the Azure OpenAI deployment, Vertex on GCP. The router decides which one for which job. The evaluator confirms it actually works. The governance lead approves the contract terms. The institutional memory function logs the choice for future reference.

When OpenAI ships a new version, your team does not retrain. Your team re-evaluates and reroutes. When Anthropic releases a new Sonnet variant, your team does not panic. Your team A/B tests on the eval set and updates the routing table.
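As a rough sketch of what "re-evaluate and reroute" means in practice, the gate below compares a candidate model against the incumbent on the same eval set and approves the switch only if accuracy does not regress. The exact-match scoring, the `should_reroute` helper, and the tolerance parameter are simplifying assumptions; real eval sets usually need richer grading than string equality.

```python
from typing import Callable, Sequence, Tuple

# An eval case pairs an input with the answer the business has signed off on.
EvalCase = Tuple[str, str]
Model = Callable[[str], str]  # stand-in for a call to a vendor API


def accuracy(model: Model, cases: Sequence[EvalCase]) -> float:
    """Fraction of eval cases where the model output matches the expected answer."""
    if not cases:
        raise ValueError("eval set is empty")
    correct = sum(1 for prompt, expected in cases if model(prompt) == expected)
    return correct / len(cases)


def should_reroute(
    incumbent: Model,
    candidate: Model,
    cases: Sequence[EvalCase],
    max_regression: float = 0.0,
) -> bool:
    """Approve the candidate only if it scores at least as well as the incumbent,
    minus an explicitly chosen tolerance."""
    return accuracy(candidate, cases) >= accuracy(incumbent, cases) - max_regression
```

The routing table changes only when this gate passes, which turns a vendor release from an emergency into a scheduled check.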

You buy capability from the labs that print billion-dollar offer letters. You do not try to print one yourself.

Augmented Operators

The bulk of your AI capability does not live in the internal AI team. It lives in the operators: the salespeople, the contract analysts, the supply chain managers, the call center agents, the underwriters, the radiologists, the technical writers. These are the people who already hold the domain expertise your business runs on. AI augmentation makes them faster, sharper, and able to take on work that used to require a colleague.

Your six-person spine builds and maintains the tools the operators use. The operators do the actual work. The split is critical. The team that owns the AI capability does not own the business outcome. The team that owns the business outcome does not own the AI capability. The spine and the operators are loosely coupled.

This is the part that the McKinsey decks always get wrong. They keep drawing the AI team as a central function that "delivers AI use cases" to the rest of the business. That model produces a backlog, a queue, an internal vendor relationship, and a long cycle time. The model that works is the opposite. The spine gives the operators the AI primitives. The operators compose them, route them, and own the output. The spine watches, audits, and improves.

Agentic Execution

The last layer is the agents. Software that runs continuously, takes actions on systems of record, and reports back. This is the layer where most of the cost shows up, because agents run tokens by the hour. It is also the layer where the largest fraction of your eventual labor savings comes from.

The spine specifies which workflows agents should own. The evaluator confirms the agent does the work to spec. The governance lead defines the kill switches. The institutional memory function records what the agent learned. The operators interact with the agent the way they used to interact with a junior colleague.

This is the layer behind the Axios story earlier this year about the half-billion-dollar Claude bill. An agent runs unsupervised, in a loop, with no budget controls and no evaluation harness, and at month end the finance team finds out what unbounded inference actually costs. Your spine prevents that. Without a spine, you do not deploy agents at scale. Period.
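A budget control of the kind described here does not need to be elaborate. The sketch below is one hypothetical shape for it: `run_agent` and its `step` callback are illustrative names, and the costs are whatever your own metering reports. The loop stops when the task finishes, when a step limit is reached, or when spend hits the cap.

```python
from typing import Callable, Tuple

# A step performs one unit of agent work and reports (cost in dollars, finished?).
Step = Callable[[], Tuple[float, bool]]


def run_agent(step: Step, cap_usd: float, max_steps: int) -> Tuple[str, float]:
    """Run an agent loop under a spending cap and a step limit.

    The cap is checked before each step, so total spend can overshoot the cap
    by at most the cost of a single step.
    """
    spent = 0.0
    for _ in range(max_steps):
        if spent >= cap_usd:
            return "halted_budget", spent
        cost, done = step()
        spent += cost
        if done:
            return "completed", spent
    return "halted_max_steps", spent
```

Both limits are deliberate. The cap bounds the bill, and the step limit catches the agent that loops cheaply but never finishes.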

What Six Hires Look Like

A workable internal AI team for a mid-market company, in mid-2026, is something close to this.

One Head of AI Engineering, reporting to the CTO. She owns the spine. She is senior, ideally a former staff or principal engineer who has built and shipped production systems with LLMs in the loop. She is not a researcher. She has read the papers. She does not write them.

One AI Evaluation Lead. He owns the test sets, the eval framework, the regression suites, and the calibration dashboards. He thinks like a quality engineer and a data scientist. He is the one who notices when accuracy slips between releases.

One AI Governance and Policy Lead. She owns the policy stack, the audit trail, the model risk management documentation, the regulator readiness, and the procurement reviews. She sits between AI engineering, legal, and security. She has done compliance work before. She is not a lawyer; she briefs the lawyers.

One AI Platform Engineer. He owns the integration plumbing: the API connectors, the observability stack, the budget meters, the routing layer, the prompt management system. He is a full-stack engineer who has earned the trust to build production infrastructure.

Two Applied AI Leads, one each for the first two business lines that need AI. They are embedded in the line. They are paid by the line, but report dotted-line to the Head of AI Engineering for technical standards. Their job is to translate domain expertise into evaluations, prompts, and agent specifications, and to backstop the operators when the AI does something they cannot interpret.

That is six. For a company up to about a billion dollars in revenue, that is enough. For a five-billion-dollar company, it might be twelve, with one Applied AI Lead per business line and a second platform engineer. For a company larger than that, you stage the build by line of business and keep each spine small.

You do not need a Chief AI Officer to manage this team. The Head of AI Engineering reports to the CTO. The Governance Lead has a dotted line to the General Counsel and the CISO. The Applied AI Leads sit in the business. The whole structure is six to twelve people, and it is built to compose, not to scale headcount.

The hire-by-hire budget for that team, in major US metros at competitive but not frontier rates, lands somewhere around three to five million dollars in fully loaded annual cost. The annual infrastructure budget, including model API spend, observability, evaluation tooling, and vector storage, lands somewhere between one and four million depending on usage. Four to nine million in total, all-in, gets you a defensible internal AI capability for a company that does several hundred million in revenue. That is roughly one-third of what the 2023 Center of Excellence playbook would have cost, and it does more work.
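For readers who want to check the arithmetic, summing the people range and the infrastructure range quoted above gives the all-in envelope. The snippet uses only the figures from the text, and `add_ranges` is a throwaway helper for illustration.

```python
# Figures are the ranges quoted in the text, in millions of US dollars per year.
people_cost = (3.0, 5.0)  # fully loaded cost of the six-person team
infra_cost = (1.0, 4.0)   # model API spend, observability, eval tooling, storage


def add_ranges(a, b):
    """Add two (low, high) ranges endpoint by endpoint."""
    return (a[0] + b[0], a[1] + b[1])


total_low, total_high = add_ranges(people_cost, infra_cost)
print(f"All-in annual cost: ${total_low:.0f}M to ${total_high:.0f}M")
# prints "All-in annual cost: $4M to $9M"
```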

The Architecture Is Not the Headcount

The reason this works is that AI capability in 2026 is not a labor input. It is a routing problem.

The model exists. Multiple models exist. The labs spent two hundred billion dollars on the buildout last year and they will spend more this year. Your six-person spine writes none of that code, trains none of those models, and operates none of those data centers. Your spine decides which capability to invoke for which job, monitors whether it worked, and steps in when it did not.

That decision-rights structure is the architecture. The architecture is what you actually build internally. The headcount supports the architecture. The architecture does not exist to justify the headcount.

This is the thing the conventional advice misses. The Center of Excellence model says: hire a team, the team produces the AI capability, the rest of the business consumes it. The composable model says: hire a routing layer, borrow the AI capability from the labs, embed it where the work is, and let the routing layer keep the whole thing honest.

The first model assumes AI capability is scarce, expensive, and labor-intensive, and that producing it internally is the lever. The second model assumes AI capability is abundant, getting cheaper monthly, and the lever is choosing well among external options. The numbers in mid-2026 say the second model is correct. Frontier-lab inference costs dropped roughly an order of magnitude in the last twelve months. The supply of capability is moving up and to the right. The supply of senior judgment about how to use it is not.

A team built for the first model will spend two years standing up infrastructure that the labs are commoditizing as you build it. A team built for the second model will spend two years compounding institutional knowledge about routing, evaluation, and governance, which is the thing the labs will never sell you. The composable team wins, slowly, and then suddenly.

Build the Spine, This Quarter

You do not have until next year to figure this out.

The market is moving fast. Bezos's lab raised sixteen billion dollars in five months. Meta is offering nine figures for individual contributors and was forced to halt hiring in March because the spend got out of hand even for them. Apple lost researchers to Meta in batches. The enterprise leaders I work with are not trying to compete on talent. They are racing to build the spine that lets them compose external capability faster than their competitors can.

If you are still operating from the 2023 playbook, you are about to spend somewhere between fifteen and forty million dollars hiring an AI team that will be poached out from under you, building infrastructure the labs will commoditize, and producing a backlog of internal AI projects that the business cannot consume. The Center of Excellence will be a cost center your CFO is asking pointed questions about by Q2 2027.

There is a better path. It costs less. It moves faster. It composes external cognition with internal judgment. It is built around six to twelve senior people who own routing, evaluation, governance, and institutional memory, and a much larger group of augmented operators who own the business outcomes. It is built for a world where the model gets cheaper every quarter and the judgment to wield it gets more valuable every quarter.

Architecting this kind of capability requires a partner who has built it, watched it work, and watched the alternative fail. The shape of the spine, the routing logic, the evaluation framework, the governance posture, the operator augmentation playbook, and the agentic execution patterns are knowable but specific. They differ between a regional bank, a logistics company, and a clinical research organization. They differ again between a private equity firm with a portfolio of operating companies and a single operating company with a hundred lines of business. The framework is universal. The build is not.

We have architected this kind of internal AI capability for clients across financial services, professional services, and industrial operations. The pattern holds. The savings are real. The capability sticks.
