OPINION · Field note · 001

Why 40% of AI Projects Get Canceled - And the Five Decisions That Separate the Rest

Gartner says more than 40% of AI agent projects will be canceled by the end of 2027. The reason is almost never the technology. Here is what actually goes wrong - and a practical framework for not ending up in that pile.

Published: Jun 16, 2026

Reading time: 15 min

Author: Nikita Chetverikov

Category: on-prem · opinion

01 - THE CFO, THREE VENDORS, AND THREE PRESENTATIONS THAT CHANGED NOTHING

Picture a financial director at a mid-sized company. She has a clear problem: her accounts-receivable process is slow, error-prone, and eats up her team's best hours. She hears about AI. She invites three vendors to present.

Three presentations arrive. All of them are beautiful. All of them promise transformation. None of them contains a single question about how her department actually works - what her team does on a Tuesday morning, where the exceptions come from, who signs off on disputed invoices.

She picks one, signs a contract, and six months later the project is quietly shelved.

This story is not unusual. It is, in fact, the dominant pattern in enterprise AI right now. And the number that proves it is worth sitting with for a moment before we move on.

02 - THE NUMBER THAT SHOULD SURPRISE NO ONE WHO HAS DONE THIS WORK

In June 2025, Gartner published a prediction, alongside a January 2025 poll of 3,412 webinar attendees about their organizations' investment in agentic AI. The finding: more than 40% of agentic AI projects will be canceled by the end of 2027 - due to escalating costs, unclear business value, or inadequate risk controls.

The Gartner analyst who authored the research put it plainly: "Most agentic AI projects right now are early-stage experiments or proof of concepts that are mostly driven by hype and are often misapplied."

This is not a fringe finding. IBM's 2025 CEO Study found that only 16% of AI initiatives have achieved scale at the enterprise level. MIT's NANDA initiative reported that 95% of generative AI pilots fail to convert into measurable business value. The Project Management Institute puts the overall AI project failure rate at 70-80%, driven by neglected data quality, overpromising, and misaligned expectations.

The numbers vary by source, but the direction is consistent: most AI investments do not deliver what was promised.

Here is the part that practitioners find unsurprising: the failure is almost never because the model hallucinated, or because the technology was "not ready." When you look at the actual case breakdowns, the same models that produce disasters in one company produce genuine results in another. The difference is not the AI. It is every decision made before and around the AI.

03 - WHAT IS ACTUALLY AT STAKE - AND WHY "WE'LL FIGURE IT OUT" IS NOT A PLAN

The public narrative around failed AI projects tends to follow a familiar script: the technology was immature, the models hallucinated, the vendor oversold. This framing is comfortable because it places the blame outside the organization.

But Compunnel's 2026 analysis of AI failures found that most failures are not due to technical issues at all - they are the result of organizational and strategic misalignment. The technology worked. The organization around it did not.

For a decision-maker, this matters in a specific way. If the problem were purely technical, you could wait for better models. But if the problem is how your organization structures its investment - how it defines the work, selects the right approach, and evaluates results - then waiting does not help. You will make the same mistakes with a more powerful model.

The cost is not just the failed project budget. It is the months of internal disruption, the erosion of trust in future initiatives, and - if you deployed something customer-facing prematurely - the damage to relationships that took years to build. Gartner estimates that in 2026, one in three companies will harm customer experience by deploying AI before the governance is in place to manage it.

The good news is that the failure pattern is well-documented and avoidable. The five levers below are not theory - they are the decision points where projects either hold together or fall apart.

04 - THE REAL PROBLEM: YOU CANNOT AUTOMATE WHAT YOU CANNOT DESCRIBE

Here is the insight that runs through every serious analysis of AI project outcomes, and it is deceptively simple: a language model is a probabilistic system - it makes educated guesses about the next best word or action. It does not have a built-in understanding of your business logic, your approval hierarchies, or what "good" looks like in your specific context.

The workflow - the clear, step-by-step description of where data comes from, where it is allowed to go, what happens when something goes wrong, who validates the result, and who is accountable - is the deterministic frame that keeps the probabilistic AI under control. Without it, you are not deploying AI. You are releasing a very articulate guesser into your operations and hoping for the best.

This is why the CFO's story at the top of this article ended the way it did. The vendor sold access to a capable model wrapped in a nice interface. But a model is just an engine. The car - the workflow, the exception handling, the quality criteria, the accountability chain - was never built.
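To make the idea of a deterministic frame concrete, here is a minimal Python sketch for the invoice-matching case. Everything in it is hypothetical: `guess_match` is a stand-in for a model call, and the field names, the confidence threshold, and the `human_review` route are assumptions for illustration, not anyone's product. The point is only that fixed rules, not the model, decide what happens to each guess.

```python
from dataclasses import dataclass


@dataclass
class Suggestion:
    payment_id: str
    confidence: float  # score from 0.0 to 1.0


def guess_match(invoice: dict, payments: list[dict]) -> Suggestion:
    """Stand-in for the probabilistic step (a model call in a real system).

    Hypothetical logic: pick the payment whose amount is closest.
    """
    best = min(payments, key=lambda p: abs(p["amount_cents"] - invoice["amount_cents"]))
    exact = best["amount_cents"] == invoice["amount_cents"]
    return Suggestion(best["id"], 0.95 if exact else 0.5)


def process_invoice(invoice: dict, payments: list[dict], min_confidence: float = 0.9) -> dict:
    """Deterministic frame: fixed rules decide what happens to the guess."""

    def to_human(reason: str) -> dict:
        # Exceptions always go to a person, with a stated reason.
        return {"invoice": invoice["id"], "route": "human_review", "reason": reason}

    if not payments:
        return to_human("no candidate payments")

    guess = guess_match(invoice, payments)
    payment = next(p for p in payments if p["id"] == guess.payment_id)

    # Rule 1: amounts must agree exactly, whatever the model says.
    if payment["amount_cents"] != invoice["amount_cents"]:
        return to_human("amount mismatch")
    # Rule 2: low-confidence guesses are never posted automatically.
    if guess.confidence < min_confidence:
        return to_human("low confidence")

    return {"invoice": invoice["id"], "route": "auto_post", "payment": payment["id"]}
```

In this sketch the model can be swapped for a better one without touching the rules, which is the practical meaning of keeping the workflow separate from the engine.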

And here is where it gets concrete: accounts receivable is not one task. It is at least three fundamentally different kinds of work. Invoice matching is routine data reconciliation. Dispute resolution with clients requires reading correspondence, understanding context, and communicating with empathy. Reporting requires synthesis and judgment. Dumping all three into a single vendor request - "we need AI for accounts receivable" - is how you get a beautiful demo that collapses on contact with real traffic.

The golden rule, stated plainly: don't automate what you can't describe.

05 - FIVE LEVERS - AND HOW TO KNOW WHICH ONE TO PULL

Once you accept that the conversation about AI should start with the workflow and not the technology, a practical framework emerges. Think of it as five levers, each appropriate for a different combination of circumstances.

Lever 1: Automate (fully absorb the process)

If a process is frequent, routine, and the cost of an error is low or easy to catch, full automation is the right move. The difference between modern AI agents and the older rule-based bots (the kind that broke whenever someone changed a field name in a form) is the ability to handle unstructured input - a messy email, a handwritten note, an awkward phrasing. That is a genuine capability improvement.

But vendors love to demo the cleanest 5% of cases. In real traffic, exceptions can make up 60% of the volume. If your process is full of hard-to-predict edge cases, automation will choke. Which brings you to the next lever.

Lever 2: Buy (purchase a ready-made solution)

The market now offers two distinct flavors of "buy." The first is buying primitives - modular building blocks with intelligence already embedded, like a document classification module or a request-routing model. You assemble them into your own process. The second is buying an entire workflow turnkey - a system designed for a specific function end to end.

The critical question before buying anything turnkey: does the vendor's version of this process match at least 80%, ideally 90%, of how your organization actually works? If a law firm primarily handles standard contracts, a purpose-built legal document system fits. If the firm's competitive advantage lies in a specific approval process developed over fifteen years, a boxed solution becomes a Procrustean bed - you will spend the budget forcing the system to fit rather than using it.

Lever 3: Build (custom development)

When your context is the point - your proprietary data, your specific risk thresholds, your institutional knowledge - you have to build. This is where the largest number of catastrophic failures occur, and the reason is almost always the same: management delegates the work to the engineering team with a vague brief ("build us a risk analysis system, make it good") and no definition of what "good" means.

Engineers will close the ticket. The code will compile. But AI output quality cannot be evaluated by checking for errors in the code. You need explicit quality criteria - rubrics, benchmarks, mechanisms for evaluating whether the output is actually correct and not just plausible. If the business owner cannot define what a high-quality result looks like, the system will quietly produce confident-sounding wrong answers and no one will notice until the damage is done.
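What "explicit quality criteria" can look like in code is sketched below, under stated assumptions. The three rubric items, the `case` fields, and the 120-word limit are invented for a hypothetical risk-summary task. A real rubric would come from the business owner and would usually need human or model-assisted grading for the criteria that simple string checks cannot capture.

```python
from typing import Callable

# A rubric maps a criterion name to a check on (model output, test case).
Rubric = dict[str, Callable[[str, dict], bool]]

# Hypothetical criteria for a risk-summary task.
RUBRIC: Rubric = {
    "mentions_counterparty": lambda out, case: case["counterparty"] in out,
    "states_risk_level": lambda out, case: case["expected_level"] in out.lower(),
    "within_length_limit": lambda out, case: len(out.split()) <= 120,
}


def score_output(output: str, case: dict, rubric: Rubric = RUBRIC) -> dict[str, bool]:
    """Apply every rubric criterion to one output."""
    return {name: check(output, case) for name, check in rubric.items()}


def pass_rate(results: list[dict[str, bool]]) -> dict[str, float]:
    """Share of benchmark cases that pass each criterion."""
    names = results[0].keys()
    return {n: sum(r[n] for r in results) / len(results) for n in names}
```

Run over a fixed set of benchmark cases after every model or prompt change, a table like the one `pass_rate` returns gives the business owner something to sign off on other than "the code compiles".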

Lever 4: Hire (bring in specific expertise)

When a project stalls because of a genuine skills gap, the instinct is to hire a "purple unicorn" - someone who is simultaneously a deep domain expert, a machine learning engineer, a systems architect, and a change manager. This person does not exist in meaningful supply, and the hiring market in 2026 is noisy enough (AI-generated resumes, deepfake video interviews) that finding even a competent specialist takes months.

The more effective approach: identify the single specific competency your team lacks for the next six to twelve months, and hire or train for exactly that. A process engineering specialist. An expert in designing evaluation systems. Someone who understands your industry and can learn the AI tools - not the other way around. Corporate context takes years to develop. Technology can be learned.

Lever 5: Wait (deliberate non-action)

This one sounds like business suicide in an environment where every conference talk warns that your competitors are moving faster. But it is often the most rational choice.

Organizational capacity for change is finite. If you have a working, deterministic analytics system - one that takes a precise query and returns a precise number - replacing it with a probabilistic model right now creates problems where none existed. The smarter move: leave the underlying system alone, and build a natural-language interface on top of it so that a manager can ask a question in plain English and the system translates it into a safe, exact query. You wait in one place to make a targeted improvement in another.
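One way to read "a safe, exact query" is sketched below in Python with the standard-library `sqlite3` module. It is an illustration under assumptions: the `sales` table, the two approved queries, and the keyword-matching `interpret` function (a stand-in for a language model) are all hypothetical. The design choice it shows is that the model only selects from pre-approved, parameterized queries and never writes SQL itself.

```python
import sqlite3

# Whitelist of exact, pre-approved queries (hypothetical schema).
ALLOWED_QUERIES = {
    "revenue_by_region": "SELECT SUM(amount) FROM sales WHERE region = ?",
    "order_count_by_region": "SELECT COUNT(*) FROM sales WHERE region = ?",
}
ALLOWED_REGIONS = ("north", "south")


def interpret(question: str) -> dict:
    """Stand-in for the model: maps plain English to a query name and a parameter.

    Hypothetical keyword matching; a real system would call a model here.
    """
    q = question.lower()
    name = "order_count_by_region" if "how many" in q else "revenue_by_region"
    region = next((r for r in ALLOWED_REGIONS if r in q), None)
    return {"query": name, "region": region}


def answer(question: str, conn: sqlite3.Connection):
    intent = interpret(question)
    # Deterministic guardrail: anything outside the whitelist is rejected.
    if intent["query"] not in ALLOWED_QUERIES or intent["region"] not in ALLOWED_REGIONS:
        raise ValueError("question could not be mapped to an approved query")
    # Parameterized execution: the region is passed as data, never spliced into SQL.
    row = conn.execute(ALLOWED_QUERIES[intent["query"]], (intent["region"],)).fetchone()
    return row[0]
```

The underlying system still returns a precise number for a precise query. The probabilistic part is confined to choosing which approved question was asked, and a wrong choice fails loudly instead of producing a plausible figure.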

The matrix that ties it together

Imagine a simple grid. On one axis: how specific is this process to your organization (from generic to highly unique)? On the other: how mature is the vendor market for this particular function?

Generic process, mature market: buy a standard solution (Zendesk, Workday).
Generic process, immature market: prototype narrowly, or wait.
Unique process, mature market with primitives: build on top of purchased components - you keep the uniqueness, you save years of foundational work.
Unique process, empty market: build from scratch. High risk. High potential upside. Proceed with eyes open.

This grid does not make the decision for you. But it stops you from applying the wrong lever to the wrong situation - which is how most of those 40% end up canceled.
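For readers who think in code, the grid reduces to a four-cell lookup. The Python sketch below is a simplification: it collapses "mature market with primitives" into `mature` and "empty market" into `immature`, and the function and key names are hypothetical.

```python
# The four cells of the grid: (process specificity, vendor market maturity).
GRID = {
    ("generic", "mature"): "buy a standard solution",
    ("generic", "immature"): "prototype narrowly, or wait",
    ("unique", "mature"): "build on top of purchased components",
    ("unique", "immature"): "build from scratch",
}


def suggest_lever(specificity: str, market: str) -> str:
    """Return the grid's suggestion for one process; a starting point, not a decision."""
    try:
        return GRID[(specificity, market)]
    except KeyError:
        raise ValueError(f"unknown cell: {specificity!r}, {market!r}") from None
```

The lookup is applied per process, not per department: the three kinds of accounts-receivable work could land in three different cells.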

06 - WHAT THIS LOOKS LIKE IN PRACTICE - AND WHAT NOT TO EXPECT

The framework above is not a guarantee. It is a way of structuring your thinking before you spend money. A few honest observations from working with organizations that have gone through this:

What tends to work: Starting with one process, not a department. Defining success criteria before the first line of code is written. Having a business owner - not just a technology team - accountable for the quality of the output. Treating the first deployment as a learning exercise with a defined evaluation period, not a permanent solution.

What does not work: Broad RFPs that bundle ten different workflows into one request. Evaluating AI vendors based on demos (demos are always clean; your data is not). Assuming that a capable model will figure out your business logic on its own. Hiring for a vague "AI leader" role without specifying what problem they are solving in the first twelve months.

Honest limits of this approach: Even a well-structured AI investment will encounter surprises. Models improve and then change behavior with updates. Processes that seemed stable turn out to have undocumented exceptions. Quality evaluation is an ongoing activity, not a one-time setup. The organizations that succeed are the ones that build the internal capability to evaluate and adjust - not the ones that hand off responsibility to a vendor and walk away.

The PMI's analysis of AI project failures makes a point worth repeating: AI requires a data-centric approach, where understanding the data and the process comes before the technology selection. This is not a technical observation. It is a management discipline.

And the deeper implication - the one worth sitting with - is this: if AI will eventually absorb every process that can be clearly described and regulated, then the lasting value of human judgment lies precisely in the processes that cannot be fully described. The ability to navigate ambiguity, to read a situation that has no precedent, to make a call when the data does not yet exist. That is not a threat to human work. It is a description of what human work is becoming.

07 - WHERE TO START IF YOU ARE LOOKING AT THIS HONESTLY

The Gartner number - more than 40% of agentic AI projects canceled by the end of 2027 - is not a reason to slow down. It is a reason to be deliberate. The organizations on the right side of that statistic are not the ones with the biggest AI budgets or the most vendor relationships. They are the ones that started by describing their work clearly, matched the right lever to each process, and built the internal capacity to evaluate results.

If you are a partner at a law firm, a clinic director, or a financial consultant looking at this honestly, the first question is not "which AI tool should we buy?" It is: "Which one specific process can we describe precisely enough to hand to a system - and what does a good result look like?"

That question is harder than it sounds. But it is the only one worth asking first.

If you want to work through it with someone who builds these systems for organizations where the data cannot leave the building, an independent assessment of what this would actually take for your specific situation is a reasonable starting point - no vendor agenda, no colorful presentation. And if your organization needs to demonstrate to clients or regulators that AI is being used responsibly, the compliance documentation and due diligence pack covers what auditors and counsel typically ask for.

Nikita Chetverikov

Fullstack · Private AI
