AI that Codes - Saturation, Pricing, and the Measurement Problem

"Google it." That was the first piece of coding advice I received, at my first internship in 2016. Eighteen-year-old Ram had just discovered the magic of crowdsourcing: any question you had about code had already been asked by someone on Stack Overflow and promptly answered by someone else. Back then, coding was almost research.

Sometime in 2022, I stared at my VS Code window in awe while building a Streamlit application. As I typed, autocompleted code appeared in a translucent font, and it was exactly the code I intended to write. I didn't quite understand how it worked, and I didn't have much appetite to find out. It was magic. It looked like the future!

It's 2026. I use Cursor and Gemini CLI at work and Claude Code for my side projects. I have experimented with Codex, Antigravity and Amp with varying degrees of success. In about 10 years, I have watched coding go from **writing / copying code** to **autocomplete** to **autonomous code generation**.

![[code-write-evolution.png]]

Like any reasonable person, I am compelled to ask: *Where exactly is this all going?*

## From Blue Ocean to Blood in the Water

A [video I watched recently (from Caleb Writes Code)](https://www.youtube.com/watch?v=LOjwfOf39mg) traces the evolution nicely. Copilot was a glorified autocomplete. Then Cursor and Windsurf forked entire IDEs to let AI do more. Then terminal-based tools like Aider and Claude Code changed the game entirely. Developers went from writing code with AI suggestions to writing specs and letting AI write the code. For the first time, developers could work "one layer up in the stack," as the video puts it, translating business requirements into prompts rather than directly into code.

But here's the thing: everyone figured this out at roughly the same time. Claude Code, OpenCode, Cursor, Windsurf, Aider, Cline, Roo. The list keeps growing.

What was once a blue ocean of opportunity is now very much a red ocean, with blood in the water and big players circling. According to [Greptile's State of AI Coding 2025 report](https://www.greptile.com/state-of-ai-coding-2025), over 85% of developers now use AI coding tools. [GetDX's research](https://getdx.com/blog/ai-coding-assistant-pricing/) shows that most developers use 2-3 different tools simultaneously. The market is projected to reach $12.3 billion by 2027.

![[robo-sharks-blood.png]]

Most of these tools are backed by well-funded companies offering surprisingly cheap subscriptions, at prices most enterprises can comfortably afford. Cursor hit a $9.9 billion valuation by mid-2025 and has since [roughly tripled to $29.3 billion](https://www.cnbc.com/2025/11/13/cursor-ai-startup-funding-round-valuation.html). GitHub Copilot generates over $2 billion in annual revenue.

So if everyone's tooling is becoming broadly similar in capability, and everyone's pricing is becoming broadly competitive, **what exactly differentiates them?**

## The Pricing Game

Anthropic recently banned third-party applications from using its subsidised subscription pricing, forcing them onto the more expensive API tier. This caused a fair bit of drama, particularly around OpenCode. OpenCode is an open-source alternative to Claude Code that had been riding on that cheaper access. [Anthropic's official position has been that OpenCode's approach was "hacky"](https://news.ycombinator.com/item?id=46549823). Even so, it's fair to say the pricing strategy may need revisiting if a "loophole" that big existed in the first place.

But what I specifically want to talk about is the overall "cheap" pricing of coding agents. By "cheap", I mainly mean pricing that large enterprises can comfortably afford if they find value in the tooling.
At least for now, these companies appear to be following the *make it cheap, get people hooked, then adjust the economics once you've got them* playbook. Cursor did exactly that last June. It introduced a sudden $20 usage cap on its Pro plan, and the tool's utility dropped overnight for heavy users. It's not quite predatory pricing in the *you-know-who-cab-provider* sense, as a colleague put it on an internal call; there's no local cabbie being run out of business here. But it's not entirely open either. These companies are building ecosystems, and ecosystems have walls. [Industry analysts predict 20-30% price reductions across mid-tier plans by Q3 2026](https://getdx.com/blog/ai-coding-assistant-pricing/) as competition intensifies. That assumes the players can sustain their losses long enough to outlast each other.

> *When the product is cheap, you're being acquired as a user. When the price goes up, you find out whether you were a customer or a commodity.*

## The Measurement Problem No One Talks About

Here's where the real challenge for enterprises lies. Say I'm running a team of five developers. We adopt an *agentic* coding tool, and over the month we rack up £1,000 in API costs. The question my finance team will inevitably ask is: what value did that £1,000 deliver?

Finance vs Tech: a classic.

![[dev-fin-fight.png]]

Now layer in individual performance. Developer A accounts for 50% of the API spend but delivered less impact than Developer D, who barely touched the tool. Next month, do I allocate more budget to Developer D? On what basis?

Honestly? I have no idea how to answer that. I don't see how anyone else could answer it with quantified numbers either. Without knowing the value each pound generated, we are all just guessing. The research, too, is genuinely mixed.

A [randomised controlled trial by METR](https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/), published in mid-2025, studied experienced open-source developers working on codebases they had contributed to for years. When AI tools were allowed, developers actually took 19% longer to complete tasks than when they weren't. Yet those same developers *believed* AI had made them 20% faster. It's a striking perception gap. The study's scope matters, though: these were highly experienced developers on familiar codebases. The results might look different for junior developers, or for anyone working on unfamiliar code.

Vendor-funded studies tell a different story. [GitHub claims](https://techcrunch.com/2025/07/30/github-copilot-crosses-20-million-all-time-users/) Copilot users complete tasks 55% faster, with 46% of code now generated by AI and an 88% retention rate. These numbers are impressive, but they come from the company selling the product. [Faros AI's research](https://www.faros.ai/blog/ai-software-engineering) offers a middle ground: 75% of engineers use AI tools, yet most organisations see no measurable performance gains. [MIT Technology Review](https://www.technologyreview.com/2025/12/15/1128352/rise-of-ai-coding-developers-2026/) recently ran a piece noting that AI coding is now everywhere, but not everyone is convinced it's actually helping.

Did we ship features faster? Probably. By how much? Hard to say. Did code quality improve? Maybe. How would I even measure that? Lines of code is meaningless; it's a vanity metric. Velocity points? Those are made-up numbers we use to feel productive. Sorry, PMs!

According to [Index.dev](https://www.index.dev/blog/developer-productivity-statistics-with-ai-tools), median PR size increased 33%. Lines of code per developer grew from 4,450 to 7,839, a jump of roughly 76%. However, projects that relied too heavily on AI saw 41% more bugs and a 7.2% drop in system stability. More code is not better code.

Also, AI is a skill amplifier. If you are a great data engineer, your workflows are already solid, and AI can make your work even better. But if you are new on the job (say, a grad), AI can make your work look like the aftermath of handing a garland to a monkey. It's got no bloody clue what to do with it!

If a team lead can't measure value, how can they compare Tool X to Tool Y? The only thing left to compare is price. So the entire competitive landscape collapses into a pricing war. That happens regardless of whether one tool's LLM is more accurate or its agentic workflows are more sophisticated.

> *We're in an industry obsessed with data and metrics, yet we can't measure the ROI of our own AI tools. Oh, the irony!*

## What This Means for the Rest of Us

Caleb made another interesting point in the video: AI adoption in coding will only truly take off when developers also become good product managers. The tribal knowledge stuck in people's heads needs to come out into well-defined documents, into `claude.md` files and spec sheets that AI can actually work with. That's a cultural shift, not a technical one. And it's slow.

Meanwhile, enterprises are being asked to sign off on AI budgets without clear measurement frameworks. Finance teams are being asked to trust that the spend is worthwhile. Developers are being evaluated on "AI adoption" without anyone defining what good adoption looks like.

I also came across an article in which [AWS's enterprise strategy team](https://aws.amazon.com/blogs/enterprise-strategy/measuring-the-impact-of-ai-assistants-on-software-development/) recommends A/B testing. The idea is to compare productivity between teams that use AI and teams that don't. It sounds great, but how practical would it be? You tell me! Nevertheless, a lack of clear measurement is not enough evidence for me to call bullshit on the whole AI coding industry.
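
The statistics behind AWS's A/B suggestion are the easy part; the hard part is everything around them. Below is a minimal sketch of a two-sided permutation test. It assumes you pick pull-request cycle time (hours from opening a PR to merging it) as the metric. It also assumes you have one team using an AI tool and a comparable team without it. The numbers are made up for illustration. `permutation_test` is a hypothetical helper written for this post, not something taken from the AWS article.

```python
import random
from statistics import mean


def permutation_test(treatment, control, n_resamples=10_000, seed=0):
    """Two-sided permutation test on the difference in means.

    Returns (observed_difference, p_value), where observed_difference is
    mean(treatment) - mean(control). The p-value estimates how often a random
    relabelling of the pooled data yields a gap at least as large.
    """
    rng = random.Random(seed)  # seeded so results are reproducible
    observed = mean(treatment) - mean(control)
    pooled = list(treatment) + list(control)
    n_treatment = len(treatment)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # randomly reassign the "AI" / "no AI" labels
        diff = mean(pooled[:n_treatment]) - mean(pooled[n_treatment:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    p_value = (extreme + 1) / (n_resamples + 1)
    return observed, p_value


# Hypothetical PR cycle times in hours; illustrative only, not real data.
ai_team = [20.5, 18.0, 30.2, 12.4, 25.1, 16.8, 22.0, 19.3]
control_team = [24.0, 27.5, 21.9, 33.1, 26.4, 29.8, 23.2, 28.7]

if __name__ == "__main__":
    diff, p = permutation_test(ai_team, control_team)
    print(f"Mean difference (AI - control): {diff:.2f} hours, p = {p:.3f}")
```

A permutation test is a reasonable choice here because it doesn't assume normally distributed data, which suits small, skewed samples like a handful of PRs. Even a small p-value only says the gap is unlikely to be chance alone. It says nothing about whether faster merges meant more value delivered. Two real teams are also rarely comparable enough (different codebases, seniority, workloads) for the comparison to be clean.
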

I use AI coding tools day to day as an analytics engineer and rookie web developer, and I won't complain about the tools I use. The flaws in my code usually trace back to my own lackadaisical foundations. The exciting possibilities and successes tend to come from pairing a clear sense of what I need with agentic tooling. I strongly believe that if you use these tools the way they are meant to be used, you'd share that opinion.

## The Open Source Chatter

No matter how low a tool is priced, we all love getting something for free. That, in a nutshell, is the beauty of open source. In software, a "FREE" label doesn't just mean the absence of a financial burden. It means more flexibility. It means no vendor lock-in. It means freedom.

This is exactly why there is a strong case for open-source LLMs. If the model itself is the main cost driver, enterprises can simply swap in cheaper open models and break free from vendor lock-in. Problem solved.

Except it's not that simple. [Raffi Krikorian, Mozilla's CTO](https://www.turingpost.com/p/krikorian), makes an interesting point: organisations *need a certain level of maturity* before they can *even consider open source adoption*. For most teams still iterating on prototypes, it's far easier to just plug in an API and pay for it. The OpenAI API is, as Krikorian puts it, "kind of lovely" for rapid experimentation. Only when applications stabilise does the open-source math start to make sense[^1].

Here's where it gets counterintuitive. If you're like me, you'd expect immature teams, the ones with tight budgets, to be drawn toward free open-source alternatives. But they're often the ones who can least afford the engineering overhead. Open source isn't free; it just shifts the cost from subscription fees to internal complexity.

![[ai-stack.png]]
*Source: Image generated with ChatGPT from a sketch of the discussion in [A Fight Worth Having: the Case for Open Source AI](https://www.turingpost.com/p/krikorian).*

The bigger issue is what Krikorian calls the missing "LAMP stack for AI." The stack has four layers:

- compute at the bottom (still difficult to democratise)
- the model layer (increasingly commoditised)
- data (underexplored)
- developer experience at the top

Most open-source energy has gone into models. But the real gap is the connective tissue: the glue that makes it easy for developers to build without reinventing the wheel every time. Tools like LangChain and frameworks like [Flower AI](https://flower.ai/) are chipping away at this. Even so, we're nowhere near the plug-and-play simplicity of, say, spinning up a web server.

Standards are emerging, though. [OpenRouter](https://openrouter.ai/) provides a unified API across multiple model providers. Anthropic's Model Context Protocol (MCP) is pushing toward interoperability. These matter because they build choice into your stack early. With one abstraction layer and one API call, you can swap providers without rewriting your application. That's the Terraform playbook applied to AI: *don't get locked in; build optionality from day one*.

And then there's the data question, which might be the biggest tipping point of all. Who owns the context you feed into these models? If I train a system on my company's proprietary data, should that value accrue to me or to the model provider? How do individuals get recognised, let alone compensated, for the data that trained the LLMs we're all using? Federated learning offers one path forward. Specialised models are trained locally, across heterogeneous datasets and hardware, without shipping everything to a centralised GPU farm.

## Where I Am Left

I use these AI coding tools daily. Cursor helps me deliver so much more at work.
Claude Code has genuinely helped me build things and understand basic web development outside my day job. Individually, I can see improvements, especially when I already know what I'm doing. When I have a clear mental model of the problem and just need to get the code out, AI helps.

But the counterfactual is really hard to prove, for an individual enterprise or for the industry at large. Without AI, would we suffer? Would delivery slow down? Would quality drop? I genuinely don't know. And I suspect most teams don't either.

The questions I'm sitting with:

- How do we build measurement frameworks for AI-assisted development that aren't just vibes?
- If pricing becomes the main differentiator, what happens to innovation in the space?
- Are we, as developers, being set up for a lock-in we'll regret in two years?

I don't have answers. But I think these are the questions worth asking before we all get too comfortable with our shiny new tools.

## References

- [Claude Code Debacle: OpenCode, AI Coding](https://www.youtube.com/watch?v=LOjwfOf39mg) by Caleb Writes Code
- [Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity](https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/) by METR
- [The AI Productivity Paradox Research Report](https://www.faros.ai/blog/ai-software-engineering) by Faros AI
- [AI coding is now everywhere. But not everyone is convinced.](https://www.technologyreview.com/2025/12/15/1128352/rise-of-ai-coding-developers-2026/) by MIT Technology Review
- [AI coding assistant pricing 2025: Complete cost comparison](https://getdx.com/blog/ai-coding-assistant-pricing/) by GetDX
- [Developer Productivity Statistics with AI Tools 2025](https://www.index.dev/blog/developer-productivity-statistics-with-ai-tools) by Index.dev
- [The State of AI Coding 2025](https://www.greptile.com/state-of-ai-coding-2025) by Greptile
- [Measuring the Impact of AI Assistants on Software Development](https://aws.amazon.com/blogs/enterprise-strategy/measuring-the-impact-of-ai-assistants-on-software-development/) by AWS
- [Cursor Secures $2.3 Billion Series D at $29.3 Billion Valuation](https://www.cnbc.com/2025/11/13/cursor-ai-startup-funding-round-valuation.html) by CNBC
- [GitHub Copilot crosses 20M all-time users](https://techcrunch.com/2025/07/30/github-copilot-crosses-20-million-all-time-users/) by TechCrunch
- [A Fight Worth Having: The Case for Open-Source AI](https://www.turingpost.com/p/krikorian) by Turing Post (interview with Raffi Krikorian)

[^1]: Pinterest reportedly saved $10 million by switching to open models, but that's Pinterest, not a scrappy startup figuring out product-market fit.