Agentic workflows are starting to behave like background cloud spend with a friendly face.
They run on pull requests, they look productive, and they quietly chew through tokens while nobody is paying close attention. That’s fine when you are experimenting. It gets a lot less fine once those workflows are firing all day in CI.
GitHub published a solid post today on how it reduced token usage across its own agentic workflows, and the useful part is not the percentages. It is the pattern behind them.
The Real Problem is Not the Model
A lot of teams still treat agent cost as a model-selection problem.
Use a cheaper model. Trim the prompt. Hope for the best.
That can help, but GitHub’s write-up makes the more important point: a surprising amount of agent work is not really agent work at all. It is just data fetching, tool schema overhead, and repeated context stuffing.
This matters because these costs compound in automation.
A chat session with a developer is one thing. A workflow that runs on every pull request is a budget leak if it is built carelessly.
The Biggest Win: Stop Making the Model Do Boring Plumbing
GitHub moved predictable data gathering out of the LLM loop and into ordinary CLI steps. Instead of having the agent repeatedly call tools to fetch a pull request diff, changed files, comments, or metadata, the workflow pre-downloads that information and writes it into files before the agent even starts.
That is exactly the right call. If a step is deterministic, don’t wrap it in reasoning theatre.
In practice, that means things like:
- Fetch PR diffs before the agent starts
- Fetch changed files up front
- Collect issue metadata with CLI or API calls
- Write predictable context into workspace files once
- Let the agent read local files instead of burning turns on tool calls
This is not glamorous, but it is probably the cleanest cost-control technique available right now.
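Here is a minimal sketch of the pre-fetch step in Python. It assumes the GitHub CLI (`gh`) is installed and authenticated on the runner. The function name `prefetch_pr_context`, the output file names, and the chosen `--json` fields are illustrative assumptions, not GitHub's actual workflow code, so check the flags against your installed `gh` version. The CLI runner is injectable, which lets you exercise the file-writing logic without network access.

```python
import json
import subprocess
from pathlib import Path
from typing import Callable, List, Sequence

# A runner takes `gh` arguments and returns stdout as text.
Runner = Callable[[Sequence[str]], str]


def run_gh(args: Sequence[str]) -> str:
    """Run the GitHub CLI; raises CalledProcessError on a non-zero exit."""
    result = subprocess.run(["gh", *args], capture_output=True, text=True, check=True)
    return result.stdout


def prefetch_pr_context(pr_number: int, out_dir: Path, run: Runner = run_gh) -> List[Path]:
    """Fetch deterministic PR context once and write it to workspace files.

    The agent later reads these files instead of spending turns on fetch tool calls.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    pr = str(pr_number)

    files = {
        "pr.diff": run(["pr", "diff", pr]),
        "changed_files.txt": run(["pr", "diff", pr, "--name-only"]),
        # Re-serialize so the file is valid, readable JSON for the agent.
        "pr_metadata.json": json.dumps(
            json.loads(run(["pr", "view", pr, "--json", "title,body,files,comments"])),
            indent=2,
        ),
    }

    written = []
    for name, content in files.items():
        path = out_dir / name
        path.write_text(content, encoding="utf-8")
        written.append(path)
    return written
```

In a workflow, a step that calls something like `prefetch_pr_context(pr_number, Path("agent-context"))` would run before the agent step. The agent prompt would then point at `agent-context/` rather than exposing PR-fetching tools.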
MCP Tool Bloat is Real
Another useful finding: unused MCP tools are expensive passengers.
If you register a giant toolset, the schemas for those tools often become part of the request context over and over again. If the agent only needs two tools but you hand it forty, you are paying for the other thirty-eight every turn.
That is one of those problems that sounds small until you multiply it across a busy repo.
The practical fix is simple:
- Audit which tools a workflow actually uses
- Remove the ones it never touches
- Keep specialized workflows on narrow toolsets
- Stop defaulting to the full kitchen sink
This also has a side benefit: narrower toolsets usually make agents less weird, since the model has fewer irrelevant tools to choose from on every turn.
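The audit can start from something as plain as a log of which tools each run actually called. The sketch below assumes a hypothetical log format (one list of tool names per run) and a registry that maps tool names to their JSON schemas. Serialized schema size in characters stands in as a rough proxy for per-turn overhead, because exact token counts depend on the model's tokenizer.

```python
import json
from collections import Counter
from typing import Dict, List

ToolSchemas = Dict[str, dict]  # tool name -> JSON schema sent to the model


def audit_tool_usage(call_logs: List[List[str]]) -> Counter:
    """Count how often each tool was called across past runs of one workflow."""
    usage: Counter = Counter()
    for run_calls in call_logs:
        usage.update(run_calls)
    return usage


def prune_toolset(registered: ToolSchemas, usage: Counter, min_calls: int = 1) -> ToolSchemas:
    """Keep only the tools that were called at least `min_calls` times."""
    # Counter returns 0 for tools that never appear in the logs.
    return {name: schema for name, schema in registered.items() if usage[name] >= min_calls}


def schema_overhead_chars(tools: ToolSchemas) -> int:
    """Rough per-request overhead: total size of the serialized tool schemas."""
    return sum(len(json.dumps(schema)) for schema in tools.values())
```

Comparing `schema_overhead_chars` before and after pruning, then multiplying by turns per run, gives a rough sense of what the unused tools were costing. A short observation window can miss tools that are rarely used but still legitimate, so the logs should cover enough runs before you remove anything.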
Measure Cost Like an Engineer, Not Like a Marketer
GitHub also did something sensible with measurement.
Raw token counts are noisy. They hide the difference between cheap and expensive models, and they blur together small runs with big ones. So GitHub used a weighted metric it calls Effective Tokens, which treats output tokens as more expensive, cache reads as cheaper, and model tiers differently.
You do not have to copy that exact formula, but you should steal that mindset.
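To make the idea concrete, here is a hedged sketch of a weighted token metric in the same spirit. The weights and tier multipliers are invented placeholders, not the values GitHub uses for Effective Tokens. The post describes the shape of the metric (output tokens weighted up, cache reads weighted down, model tiers weighted differently), and the actual numbers should come from your own pricing.

```python
from dataclasses import dataclass

# Placeholder weights for illustration only; NOT GitHub's Effective Tokens values.
OUTPUT_WEIGHT = 4.0        # output tokens usually cost more than input tokens
CACHE_READ_WEIGHT = 0.1    # cached input is usually much cheaper than fresh input
MODEL_TIER_MULTIPLIER = {"small": 0.25, "standard": 1.0, "large": 3.0}


@dataclass
class CallUsage:
    model_tier: str
    fresh_input_tokens: int
    cache_read_tokens: int
    output_tokens: int


def weighted_tokens(call: CallUsage) -> float:
    """Fold one model call into a single comparable number."""
    base = (
        call.fresh_input_tokens
        + CACHE_READ_WEIGHT * call.cache_read_tokens
        + OUTPUT_WEIGHT * call.output_tokens
    )
    return base * MODEL_TIER_MULTIPLIER[call.model_tier]
```

With a metric like this, a run that reuses a large cached prefix scores lower than one that resends the same context fresh. Raw token counts blur exactly that distinction.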
If you are evaluating agent efficiency, track at least:
- Total runs per workflow
- Turns per run
- Tokens per call
- Output token growth
- Cache-read versus fresh-input usage
- Cost per successful run, not just cost per run
Otherwise you will end up congratulating yourself for “optimizing” a workflow that is really just doing less work.
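Here is a minimal sketch of a per-workflow summary that covers the list above. It assumes a hypothetical `RunRecord` shape, which your CI logs or provider usage reports would need to populate. Cost per successful run divides total spend, including failed runs, by the number of successes. That way, a workflow that fails often shows up as expensive as it really is.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class RunRecord:
    workflow: str
    turns: int
    tokens_per_call: List[int]   # total tokens for each model call in the run
    fresh_input_tokens: int
    cache_read_tokens: int
    output_tokens: int
    cost: float                  # in whatever currency or unit you bill by
    succeeded: bool


def summarize(runs: List[RunRecord]) -> Dict[str, Dict[str, float]]:
    """Aggregate efficiency metrics per workflow."""
    by_workflow: Dict[str, List[RunRecord]] = {}
    for run in runs:
        by_workflow.setdefault(run.workflow, []).append(run)

    summary = {}
    for workflow, rs in by_workflow.items():
        calls = [t for r in rs for t in r.tokens_per_call]
        fresh = sum(r.fresh_input_tokens for r in rs)
        cached = sum(r.cache_read_tokens for r in rs)
        successes = sum(r.succeeded for r in rs)
        total_cost = sum(r.cost for r in rs)
        summary[workflow] = {
            "runs": len(rs),
            "turns_per_run": mean(r.turns for r in rs),
            "tokens_per_call": mean(calls) if calls else 0.0,
            "output_tokens_per_run": mean(r.output_tokens for r in rs),
            "cache_read_share": cached / (cached + fresh) if cached + fresh else 0.0,
            # Failed runs still cost money, so their spend is charged to the successes.
            "cost_per_successful_run": total_cost / successes if successes else float("inf"),
        }
    return summary
```

Output token growth is not a single-window number. One way to get it is to run `summarize` separately on two time windows and compare `output_tokens_per_run` between them.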
The Cheapest Agent Turn is the One You Never Make
That line is basically the whole lesson.
GitHub saw some of its best savings by skipping LLM work entirely when a workflow was clearly irrelevant. For example, a security-sensitive workflow could gate on whether the pull request touched the files that actually matter.
That is a very normal engineering move, and that is why it works.
Before an agent starts, ask:
- Does this event actually need model reasoning?
- Can a simple rule short-circuit the run?
- Can I classify the change with cheap deterministic checks first?
- Am I calling an agent because it is useful, or because it is available?
A lot of agent systems are still overusing intelligence where filtering would do.
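The questions above often reduce to a few lines of ordinary code. Below is a hedged sketch of a pre-agent gate that classifies a change from its changed-file list alone. The path patterns are hypothetical placeholders for whatever your security-sensitive workflow actually cares about. Sensitive paths are checked first so that a mixed change is never skipped as docs-only.

```python
from fnmatch import fnmatchcase
from typing import List

# Hypothetical patterns; replace with the paths your workflow actually guards.
SENSITIVE_PATTERNS = [".github/workflows/*", "auth/*", "Dockerfile", "*.lock"]
DOCS_PATTERNS = ["*.md", "docs/*"]


def _matches_any(path: str, patterns: List[str]) -> bool:
    # fnmatch's `*` also matches "/", so "auth/*" covers nested files too.
    return any(fnmatchcase(path, pattern) for pattern in patterns)


def classify_change(changed_files: List[str]) -> str:
    """Label a change with cheap deterministic checks, no model involved."""
    if not changed_files:
        return "empty"
    # Check sensitive paths first so a mixed change is never skipped as docs-only.
    if any(_matches_any(f, SENSITIVE_PATTERNS) for f in changed_files):
        return "security-relevant"
    if all(_matches_any(f, DOCS_PATTERNS) for f in changed_files):
        return "docs-only"
    return "other"


def should_run_agent(changed_files: List[str]) -> bool:
    """Only spend model tokens when the change touches paths that matter."""
    return classify_change(changed_files) == "security-relevant"
```

You can get the changed-file list cheaply without a model, for example from `git diff --name-only` against the base branch. A CI step could skip the agent job entirely whenever `should_run_agent` returns `False`.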
Where This Gets Interesting Next
The article hints at the next step: breaking monolithic agents into smaller subagents and using cheaper models where possible.
That is probably right, but I think the more important design rule is even simpler: separate orchestration, retrieval, and judgment.
Use normal code for retrieval. Use lightweight checks for gating. Use the model only where ambiguity actually exists.
That is how these systems start looking less like expensive demos and more like infrastructure.
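One way to express that separation is to make each stage a plain function and let ordinary code do the orchestration. This is a sketch, not the design GitHub describes. `Context`, `run_pipeline`, and the three callable types are hypothetical names, and `judge` stands in for whatever agent or model call you actually use.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Context:
    changed_files: List[str]
    diff: str


Retriever = Callable[[int], Context]  # plain code: CLI/API calls, no model
Gate = Callable[[Context], bool]      # cheap deterministic rules
Judge = Callable[[Context], str]      # the only stage allowed to call a model


def run_pipeline(pr_number: int, retrieve: Retriever, gate: Gate, judge: Judge) -> Optional[str]:
    """Orchestration is ordinary code; it decides whether the model runs at all."""
    context = retrieve(pr_number)
    if not gate(context):
        return None  # skipped: no model tokens spent
    return judge(context)
```

Because `judge` is just a parameter, routing a narrow workflow to a cheaper model or a smaller subagent means passing a different callable. Retrieval and gating stay untouched and can be tested without any model.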
Why Developers Should Care
If you are running Copilot, Codex, Claude, or any other agent inside CI, this is worth paying attention to now, not later.
The problem with agent cost is that it usually arrives looking like convenience.
Then the monthly bill shows up.
GitHub’s post is useful because it is not selling magic. It is documenting ordinary engineering discipline applied to agent systems: measure the work, cut the dead weight, move deterministic steps out of the loop, and stop pretending every API call needs a model.
Source links
- GitHub Blog: Improving token efficiency in GitHub Agentic Workflows
- GitHub Blog: Agent pull requests are everywhere. Here’s how to review them.

A seasoned Senior Solutions Architect with 20 years of experience in technology design and implementation. Renowned for innovative solutions and strategic insights, he excels in driving complex projects to success. Outside work, he is a passionate fisherman and fish keeper, specializing in planted tanks.