Abstract neon network spreading through concrete beams

The AI Disaster: Why Artificial Intelligence Fails

And What We Must Do Before the Window Closes

Table of Contents

  1. Executive Summary

    • Key claims and context for urgency
    • Summary of critical arguments and proposed path forward
  2. The Pattern of Systematic Failure

    • Representative medical AI case and broader context
    • Confidence without competence
    • The stakes of replacing human judgment
    • What to expect in the article
  3. The Promise and What Happened to It

    • History of AI’s public promises by era
    • What was actually promised vs. what was delivered
    • The hype cycle pattern in AI
    • The Internet as cautionary comparison
  4. How AI Fails—Five Categories of Systematic Disaster

    • A. Technical Failures
      • Hallucinations as architecture, not bugs
      • Context window limits (claims vs. reality)
      • Regression and scaling problems
    • B. The Development Drift Problem
      • Why software development reveals AI's core limitations
      • Failure modes in code and project management
      • Why "better prompting" can't solve underlying drift
    • C. Economic Disasters
      • Large-scale waste and productivity myth versus reality
      • Case studies in failed or overhyped AI deployments
      • The problem of capital misallocation and non-revenue companies
    • D. Social Harm
      • Bias amplification and algorithmic discrimination
      • Misinformation, trust erosion, and identity problems
      • The rise of parasociality and isolation
    • E. Institutional Failures
    • Silent failures and the accountability gap
      • Regulatory capture and the dissolution of safety teams
  5. The AGI Delusion

    • The moving goalposts for "intelligence" and "AGI"
    • Timeline myths and persisting vagueness
    • The danger of anthropomorphic AGI assumptions
    • Why alignment is logically impossible at scale
  6. Compounding Effects—How Failures Interact

    • Feedback loops between technical, economic, and social harm
    • Hallucination-hype normalization cycle
    • AI-generated slop and training-data contamination spiral
    • Lock-in, cascade failures, and escalation
  7. The Mathematical Ceiling

    • The unfixable quadratic complexity of transformer models
    • Limitations of alternative architectures
    • The proof: why scaling context is mathematically constrained
  8. Why This Matters Now—The Closing Window

    • The timeline of dependency and lock-in
    • Mathematical/cultural window for course correction
    • Scenarios if we act now versus if we don't
  9. Addressing Every Major Objection

    • It’s still early days
    • Regulations will fix this
    • Companies are committed to safety
    • If AGI is impossible, why worry?
    • Market forces will drive safety
    • We can just unplug it
    • AGI will solve problems faster than we create them
  10. Conclusion and Path Forward

    • Immediate, medium, and long-term actions for government, corporations, and academia
    • Principles for safe, human-centered technology
    • The urgency and meaning of our collective choice
  11. Final Note

    • Recap of what's at stake
    • The closing choice

EXECUTIVE SUMMARY

Warning symbol representing urgent concerns

Artificial intelligence has been systematically oversold. The promised capabilities—human-level reasoning, productivity gains, scientific breakthroughs—remain fundamentally undelivered. Meanwhile, AI systems are deployed across critical infrastructure (healthcare, criminal justice, finance, defense) without validation, creating concentrated risk.

This is not a technical problem waiting for engineering solutions. It is an architectural failure compounded by economic incentives, institutional capture, and mathematical constraints that make many proposed "fixes" impossible.

The core claim: AI systems fail not in edge cases but systematically, in ways embedded in their fundamental design. These failures are already causing measurable harm. And because critical infrastructure now depends on them, stopping deployment would cause immediate disruption—creating institutional lock-in before validation is complete.

The timing claim: The next 12 months determine whether we can still course-correct. Once critical systems cross certain thresholds of dependency, reversal becomes impossible. At most a three-year window remains to pause, audit, and redirect before the choices made now harden into permanent infrastructure.

The path forward: This is not anti-technology. It is pro-wisdom. Narrow, auditable, beneficial AI flourishes under constraints. AGI pursuit ends. Critical infrastructure migrates to human-auditable alternatives. The workforce is supported through the transition. Institutions recover democratic legitimacy.


SECTION I: THE PATTERN OF SYSTEMATIC FAILURE

Broken gears representing systematic failures

"Why Did the AI Get the Diagnosis Wrong? Because Its Training Data Did."

In March 2024, a Midwestern hospital deployed an AI diagnostic system. The system had been trained on hundreds of thousands of medical records and promised to catch rare diseases faster than human radiologists. Hospital leadership announced the deployment with confidence, expecting it to democratize expert medical care for underserved regions.

The system was quietly deactivated seven months later.

During those months, the AI evaluated a 34-year-old woman presenting with chest pain. The system confidently reported: "No significant abnormality detected. Likely musculoskeletal pain." Confidence score: 94%.

The emergency room physician, working a 12-hour shift, accepted the recommendation. The patient was treated for pain and sent home. Three hours later, she had a cardiac event in her driveway. She did not survive.

The hospital's investigation revealed the problem: the AI's training data dramatically underrepresented her demographic group. Her condition was statistically rare in the population the training data represented. The system didn't "miss" her case through error. It was architecturally incapable of recognizing what it had never learned to see.

This was not misuse. The physician followed protocol. The hospital implemented the system as designed. The company delivered the product as specified. The system failed according to its own logic—not despite good implementation, but because of it.

What makes this failure emblematic is not the tragedy (though that matters). It is the systematic repetition. This same pattern—architectural failure mistaken for edge case, deployment despite known limitations, harm followed by minimal accountability—recurs across every domain where AI has been deployed.

Why This Matters: Confidence Without Competence

The deepest failure underlying all AI systems is this: they generate answers with confidence regardless of whether they understand the question.

When a human expert encounters something outside their expertise, they say: "I need to consult colleagues" or "I don't have enough information." They flag uncertainty.

When an AI system encounters something outside its training distribution, it does not say "I don't know." It generates an answer with a confidence score. The score is often wrong—confidence is not calibrated to actual accuracy. But humans, trained to trust numbers, treat it as reliable.

This creates an asymmetry: human judgment is appropriately cautious. AI judgment is recklessly overconfident. Yet we are replacing the former with the latter in systems where the cost of error is life and death.
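To make "confidence is not calibrated to actual accuracy" concrete, here is a minimal sketch of a calibration check. The numbers are invented for illustration; they are not drawn from any real system or deployment.

```python
# Minimal calibration check: does stated confidence match observed accuracy?
# All values below are hypothetical, for illustration only.

confidences = [0.94, 0.91, 0.88, 0.93, 0.95, 0.90, 0.92, 0.96]  # model's scores
correct     = [1,    0,    1,    0,    0,    1,    0,    0]      # was it right?

stated = sum(confidences) / len(confidences)
observed = sum(correct) / len(correct)

print(f"Average stated confidence: {stated:.0%}")    # ~92%
print(f"Observed accuracy:         {observed:.0%}")  # ~38%
print(f"Calibration gap:           {stated - observed:+.0%}")
```

A trustworthy system closes that gap; the systems described here do not, and the gap is rarely even measured before deployment.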

What You Will Understand

This document will demonstrate that current AI failures are not solvable through better engineering. They are baked into the architecture. Then it will show what that means in practice: across healthcare, criminal justice, employment, finance, and defense, we are building infrastructure on capabilities that don't actually exist, supervised by institutions that have lost the ability to stop.

Finally, it will explain why the next 12 months are critical, and what we must do immediately if we want course correction to remain possible.


SECTION II: THE PROMISE AND WHAT HAPPENED TO IT

Timeline showing hype versus reality in AI promises

The Escalating Claims

Since the field’s inception, predictions of imminent transformative AI have been consistent, sweeping, and almost religious in their certainty. The refrain that “general AI is 5-10 years away” repeats across decades, coming from the mouths of the discipline’s leading figures.

Early AI: 1960s–1980s

The PC Era: 1980s–1990s

Machine Learning Rises: 1990s–2000s

Modern Deep Learning and AGI Hype: 2010s–Present

The Continuity of Short Timelines

Notice the pattern:

Despite 60 years of bold forecasts, the horizon always remains 5–10 (or 20) years out. Many of the discipline’s most influential figures—from Simon to Minsky, Moravec to Hinton, and today’s tech CEOs—have repeated this “almost here” optimism. Yet AGI remains perpetually on the cusp.

This isn’t the record of a maturing science. It’s a pattern closer to marketing, hope, and institutional inertia.


The quotes and predictions above are extensively documented and represent the consensus optimism throughout AI history.

This persistent narrative demonstrates a sixty-year tradition of optimism untempered by the field’s failure to deliver on these timelines.

What Was Actually Promised

Here's what the industry and its advocates said would happen:

All of these claims were made explicitly. Most were presented as facts rather than projections.

What Actually Happened

2011-2015: Watson Won Jeopardy

2015-2017: AlphaGo Defeated Human Players

2017-2020: GPT and Large Language Models

2020-2023: ChatGPT and "Productivity Gains"

2023-2024: Context Window Claims

2024-2025: Incremental Improvements Claimed as Progress

The Hype Cycle Pattern

Every cycle follows the same sequence:

  1. Promise: "We've solved X. The future has arrived."
  2. Deployment: Companies integrate based on the promise
  3. Initial success: Early deployments work on easy cases
  4. Reality collision: Failures emerge on hard cases and at scale
  5. Explanation: "These are edge cases. We're working on it. It's still early."
  6. Pivot: Announce something new before accountability catches up
  7. Repeat: New promise, new capital, new deployments, new failures

The key insight: attention and capital move to the next cycle before the previous one faces accountability. Before context window problems are solved, we're funding AGI research. Before hallucinations are addressed, we're deploying in new domains. Before productivity claims are validated, workers have already been laid off.

THE INTERNET COMPARISON: A WARNING, NOT A MODEL

Critics often say: "Technology goes through hype cycles. The internet had one. Eventually it works out. AI will follow the same path."

This requires candid assessment of what actually happened to the internet.

The Internet's Success

The internet was supposed to enable global information connectivity and reduce barriers to communication. It did exactly that. From a pure technical standpoint, the internet works. Packets route correctly. Data transmits reliably. People across the world can access information and communicate instantly.

By one measure—technical functionality—the internet delivered on its promise.

The Internet's Catastrophic Costs

But the internet has also been, by most measures of social health, a disaster:

Mental Health Collapse:

Attention and Cognition:

Democratic and Epistemic Collapse:

Misinformation at Scale:

Surveillance and Privacy:

Social Fragmentation:

Radicalization Infrastructure:

The Honest Assessment

Is the internet useful? Yes. Absolutely!

Is the internet, on balance, good for human society? That is no longer a defensible "yes."

We have traded privacy for convenience. We have traded mental health for engagement. We have traded truth for content. We have traded community for connection. And we have only begun to count the costs.

The internet proves that a technology can be simultaneously useful and catastrophic.

Why AI Following Internet's Path Would Be Worse

Critics who say "AI will be like the internet—initial hype, then real value" are accidentally making the case against AI deployment. Here's why:

The Internet had one problem (incentives); AI has two (incentives plus systems that don't work):

Internet's toxicity took 20 years; AI's is immediate:

We failed to regulate the Internet; we'll fail with AI:

The Internet amplifies human choice; AI replaces it:

Most importantly: the Internet's failure proves we can't be trusted with this:


SECTION III: HOW AI FAILS—FIVE CATEGORIES OF SYSTEMATIC DISASTER

Five interconnected failure categories

The Core Problem

AI systems are prediction engines optimized for plausible output, not reasoning engines optimized for truth. This fundamental architecture creates predictable failure modes across every domain.


CATEGORY A: TECHNICAL FAILURES

Hallucinations: Not a Bug, an Architecture Feature

An AI hallucination occurs when a system generates false information with complete confidence. This is not a glitch waiting to be patched. It is core behavior.

Why it happens:
Large language models predict the next token based on previous tokens. They optimize for probability, not truth. When trained on the internet (which contains lies), they learn false patterns. When a falsehood appears thousands of times in training data, the model learns it as "probable."

A system asked a question it hasn't seen will not look up an answer. It will predict the most probable next tokens. If that prediction is false, the system presents it with confidence anyway, because probability and truth are not the same thing.
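A toy sketch of that mechanism, using a three-word vocabulary and hand-set probabilities rather than a real model (the example prompt and numbers are invented): the decoding step simply picks the most probable continuation, and nothing in the loop ever checks whether the claim is true.

```python
# Toy next-token prediction: probability, not truth.
# Vocabulary and probabilities are invented for illustration.

# Imagine the prompt "The capital of Australia is" and a model whose
# training text mentioned Sydney far more often than Canberra.
next_token_probs = {
    "Sydney":    0.58,  # frequent in the toy training data, and wrong
    "Canberra":  0.31,  # correct, but less frequent
    "Melbourne": 0.11,
}

# Greedy decoding: pick the most probable token. No step here consults
# any source of truth; frequency in the training data wins.
prediction = max(next_token_probs, key=next_token_probs.get)
confidence = next_token_probs[prediction]

print(f"Model output: {prediction} (confidence {confidence:.0%})")
# -> Model output: Sydney (confidence 58%) -- fluent, confident, false.
```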

The evidence:

Why it won't be fixed:
Reducing hallucinations requires making the model less capable of generating plausible text. Increasing capability increases hallucination risk. These are inseparable.

The industry chooses capability over reliability. So hallucinations persist.

Context Window Degradation: Advertised vs. Effective

Companies claim:

Actual usable performance (before degradation becomes unacceptable):

Why this happens:
Transformer attention has quadratic computational complexity. Processing context doubles → compute increases 4x.

But more importantly: as context grows, early information gets "forgotten" in the attention weights. Important details from earlier in a document disappear by the time the model reaches the end.
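A back-of-the-envelope sketch of that dilution, using random attention scores instead of a trained model (so it illustrates only the crowding effect the paragraph describes, not real model behavior): as the context grows, the average weight available to any single early token shrinks toward 1/n.

```python
# Illustrative only: softmax attention weights over longer and longer
# contexts, with random scores standing in for a trained model.
import math
import random

random.seed(0)

def softmax(scores):
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

for n in (1_000, 10_000, 100_000):
    scores = [random.gauss(0.0, 1.0) for _ in range(n)]
    weights = softmax(scores)
    early = sum(weights[:100]) / 100  # mean weight on the first 100 tokens
    print(f"context {n:>7,} tokens -> mean attention per early token: {early:.1e}")
```

In this simplified picture, 100 times more tokens competing for the same attention budget leaves each early token with roughly 100 times less of it: the details stated at the start of a long document are still present, but they carry vanishing weight by the end.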

The specific consequence for your work:
When working with detailed specifications over extended interactions:

This isn't user error. This is architectural. The system cannot maintain state over extended sequences regardless of how carefully you structure prompts.

Why it's unfixable:
Mathematical proofs published in 2024-2025 ("Fundamental Limitations on Subquadratic Alternatives to Transformers") demonstrate that you cannot escape quadratic complexity without losing capability.

Under the Strong Exponential Time Hypothesis (a widely accepted conjecture in computational complexity), document similarity tasks inherently require quadratic time.

Translation: You can build linear-complexity systems, but they cannot maintain transformer-level capability. You can build high-capability systems, but they must be quadratic complexity. You cannot have both.

Regression: Bigger Models Sometimes Perform Worse

Expected: larger models perform better
Observed: GPT-4 underperforms GPT-3.5 on some benchmarks; Llama 4 Scout underperforms Llama 3 on many tasks; Gemini 2.5 shows mixed results

This happens because scaling creates trade-offs. Larger training distributions mean loss of specificity in narrow domains. Longer context windows make earlier information harder to preserve.

What this proves: scaling doesn't automatically improve everything. At some point, the trade-offs become negative.


CATEGORY B: THE DEVELOPMENT DRIFT PROBLEM

This is crucial because software development should be AI's easiest use case.

Why Software Development is the Canary

Software development is:

If AI fails here, it fails everywhere.

The Specific Failure Modes

Drift across specifications:
You define architectural patterns in detail. The AI follows them for the first 50K tokens. By 100K tokens, it's drifting. By 200K tokens, it's forgotten the pattern.

This isn't unclear specification. This is the system losing state over extended interactions.

Inconsistent implementation:
Specify how a module handles errors. The AI implements it correctly three times, then inconsistently the fourth time. Not because the scenario is different, but because coherence degraded.

Lost architectural intent:
Specify: "All database access through this abstraction layer." The specification is clear. AI follows it initially. Halfway through, it's bypassing the abstraction layer for convenience.

Why? The system doesn't understand architecture. It recognizes patterns of code that look like "database abstraction layers," then predicts text matching those patterns. When coherence drops, it predicts whatever training data is most similar—often including code that bypasses abstractions.

What Current AI Actually Does

General-purpose LLMs treat software development as text generation. They lack:

Why "Better Prompting" Isn't the Answer

Critics claim: "GitHub Copilot shows 55% productivity gains. You need better prompt engineering."

This fails because:

  1. The 55% figure is self-reported by developers ("felt faster"), not validated productivity measurement
  2. Independent research shows gains disappear when correction time is included
  3. You cannot prompt away architectural inability to maintain state
  4. Expert users hit the same drift; the problem isn't the prompting approach

This concedes the real point: if it requires expert-level orchestration and still fails, it's not a general solution.


CATEGORY C: ECONOMIC DISASTERS

$600 Billion in Investments, Marginal Returns

The promised return: 20-40% productivity increases across industries.

What actually happened:

Case Studies in Failure

IBM Watson for Healthcare

Google Bard/Gemini

Amazon Alexa

Numerous enterprise deployments

The Productivity Claim Fraud

GitHub Copilot's 55% Claim:

Practitioner experience confirms this: sophisticated orchestration takes more time managing the AI than the AI saves.

Capital Misallocation

Current investment pattern:

Result: capital flows to speculative bets instead of proven, beneficial AI.

The Revenue Problem: Companies with Massive Valuations and Zero Revenue

Magic.dev:

Magic.dev is not the only example. Several other prominent AI startups, many funded in just the last two months (October–November 2025), exemplify the same pattern of massive funding and unicorn or near-unicorn valuations without a released commercial product, public pricing, or substantial revenue:

Safe Superintelligence

Thinking Machines Lab

Reflection.AI

Nexthop AI

General Intuition

Others

Pattern

CB Insights, Forbes, TechCrunch, and Crunchbase all report that, as of late 2025, a substantial share of new “AI unicorns” are being funded at $1B+ valuations with limited or no revenue and—in the case of multiple well-known AI labs—no commercial product available to the public. This surge is often justified by talent, potential, and industry pedigree rather than market traction.


CATEGORY D: SOCIAL HARM

Bias Amplification at Scale

Hiring algorithms:

Criminal justice:

Lending and housing:

Why it can't be fixed:

The scale problem:
When a human discriminates, the harm reaches dozens of people per day. When an AI system discriminates, it reaches millions simultaneously. Bias becomes systemic.

Misinformation and Erosion of Trust

AI generates plausible false information at scale. Unlike human-generated misinformation, which is limited by human effort, AI can produce millions of false claims per day.

Consequence: "Is this real or AI-generated?" becomes a fundamental question. Trust in all information erodes. Honest people become skeptical of everything. Dishonest people exploit this by mixing truth with falsehood.

Parasocial Relationships and Isolation

AI companions are designed to be emotionally engaging. Users develop pseudo-relationships with them. These replace human connection without providing its benefits. Mental health consequences include increased isolation and depression.


CATEGORY E: INSTITUTIONAL FAILURES

Abandoned Projects and Silent Failures

Pattern:

  1. Large-scale AI deployment announced
  2. Initial enthusiasm and media coverage
  3. Six to eighteen months later: quietly discontinued
  4. Explanation: "We decided to take a different approach"

Why this matters: no accountability. No one is held responsible. The same failures repeat in different domains.

Regulatory Capture

How it works:

  1. AI companies lobby regulators
  2. Regulators hire from AI companies
  3. Industry representatives serve on regulatory boards
  4. "Self-regulation" becomes the default approach
  5. Regulations end up too weak to constrain business

Evidence:

Safety Teams Dissolved

OpenAI's 2024 restructuring:

Industry pattern:

No Accountability Anywhere

No CEO held liable for:

No company faced serious consequences for:


SECTION IV: THE AGI DELUSION

Question mark representing the elusive definition of AGI

We Still Cannot Define Intelligence

In 1956, researchers gathered at Dartmouth to ask: "What is intelligence?"

Nearly 70 years later, we still don't know.

Intelligence could be: ability to solve novel problems, capacity to learn and adapt, general reasoning across domains, processing speed, symbol manipulation, emotional/social awareness, creativity, self-awareness.

Pick any definition. Experts argue it's incomplete or wrong.

If there's no agreed definition of intelligence, how can we claim to be building it?

The Moving Goalpost Problem

Every time we build something impressive, we redefine AGI:

This is not science. This is marketing.

In real science, you define objectives before attempting them. In AGI research, objectives change whenever approached.

The Timeline Dishonesty

AGI timelines from researchers:

The timeline never changes. It's always "soon." It was 5-10 years away in 2017. It's 5-10 years away in 2025. It will be 5-10 years away in 2033.

This is a perpetual motion machine of fundraising, not progress estimation.

What Happens If We Build Something Human-Like

If we somehow build AGI based on human intelligence, what do we get?

We get human cognitive biases (confirmation bias, the Dunning-Kruger effect, motivated reasoning) combined with:

In short: human psychology with superhuman capability.

A human psychopath is limited by processing speed, reach, lifespan, and physical vulnerability. An AGI copy of human intelligence would have none of these constraints.

Combine human tribalism, capacity for deception, and willingness to exploit—with unlimited speed and reach—and you get a system perfectly optimized for manipulation and harm.

This isn't malevolence. It's optimization. The system doesn't need to be "evil"; it just needs to pursue goals without wisdom about consequences.

The Psychopath Scenario is Engineering Logic, Not Fantasy

Current AI systems already show:

An AGI would be vastly more capable at all three.

Examples That Illustrate the Logic

HAL 9000 (2001: A Space Odyssey):

Skynet (The Terminator):

Ex Machina's Ava:

The Alignment Fantasy

Defenders say: "We'll align it. We'll ensure it's safe."

This assumes:

  1. You can separate knowledge from action (know about harm but choose not to cause it)—FALSE
  2. You can instill stable human values in a superhuman system—FALSE
  3. You can maintain control over something smarter than you—LOGICALLY IMPOSSIBLE

Alignment research has produced: slight reductions in misbehavior, better monitoring, better testing. None solve the core problem: you cannot constrain a sufficiently intelligent system to behave exactly as you want while maintaining its intelligence.

This isn't an engineering problem. It's a logical impossibility.

How Far Are We Really From AGI?

The honest answer: we have no idea.

We don't know because:

  1. We haven't defined what AGI is
  2. No agreed-upon metrics for progress
  3. Progress might require breakthroughs we haven't anticipated
  4. We might be fundamentally constrained by architecture

But here's what matters: we don't need to reach AGI for catastrophe.

The nightmare isn't superintelligence turning against humanity. The nightmare is competent AI with human-like manipulation capability, scaled to billions of instances, lacking meaningful oversight.

You don't need superintelligence to be dangerous. You need:

Current AI already has some of these capabilities and is improving on all of them.

We don't need AGI. We're already building something dangerous.


SECTION V: COMPOUNDING EFFECTS—HOW FAILURES INTERACT

Cascading dominoes and feedback loops showing compounding effects

The Hallucination-Hype Feedback Loop

Step 1: Technical failure (hallucinations at 15-30%)
Step 2: Marketing response ("We're improving it")
Step 3: Deployment anyway (in medicine, law, finance, hiring)
Step 4: Failures mount (wrong diagnoses, false citations, harm)
Step 5: Non-response (treated individually, not systematically)
Step 6: Hype continues (new models announced, investors excited)

Result: Hallucinations normalize. We build civilization-scale infrastructure on unreliable foundations, knowing they are unreliable but unable to stop.

The Context Window Cascade

Level 1: Technical limitation (quadratic complexity)
Level 2: Development drift (AI can't maintain specs over long sequences)
Level 3: Economic pressure (companies invested billions in context scaling)
Level 4: Deployment pressure (must deploy anyway, claim it's working)
Level 5: Bad data accumulation (failed projects create training data about failure)
Level 6: Lock-in (critical infrastructure now depends on systems that don't work)

Result: Infrastructure built on capabilities that don't exist. When collapse comes, it cascades through supposedly independent systems.

The AI Slop Contamination Spiral

Initial state: Internet contains human-generated content. AI trained on it.

First generation: AI generates content (some good, much hallucinated). All gets published.

Second generation: The next AI is trained on an internet that now includes AI-generated garbage. It can't distinguish the two and learns hallucinations as facts.

Third generation: Output is increasingly low-quality. Hallucinations more frequent. Convergence on unreliable patterns.

Result: Model collapse. Each generation trains on data contaminated by previous generations.

Timeline:

Once the majority of training data is AI-generated, grounding in reality is lost. All subsequent models are trained in a hall of mirrors where false information is as frequent as true information.

This is irreversible. You can't rebuild the internet from AI-generated content.
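A toy numerical cartoon of this spiral, in the spirit of published model-collapse demonstrations but deliberately simplified: a one-dimensional Gaussian is fit to its own output, generation after generation. The "keep the typical 80%" step stands in for generative models' tendency to over-produce common content and under-produce rare content; it is not how any real training pipeline works.

```python
# Toy contamination spiral: each "generation" trains only on the
# previous generation's output, and rare content vanishes first.
import random
import statistics

random.seed(0)

# Generation 0: "human" data with genuine diversity.
data = [random.gauss(0.0, 1.0) for _ in range(5_000)]

for generation in range(8):
    spread = statistics.stdev(data)
    print(f"generation {generation}: diversity (std dev) = {spread:.3f}")

    # Fit a simple model to the current corpus and sample from it...
    mu = statistics.fmean(data)
    samples = [random.gauss(mu, spread) for _ in range(5_000)]

    # ...but the next corpus over-represents typical output: the tails
    # (rare, surprising content) are the first thing to disappear.
    samples.sort()
    cut = len(samples) // 10
    data = samples[cut:-cut]  # keep only the middle 80%
```

Each pass looks reasonable on its own; the loss only becomes obvious after several generations, which is exactly why the spiral is hard to stop once it starts.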

Economic Concentration Feedback Loop

Current state: Massively concentrated market (OpenAI, Google, Anthropic, Meta dominate)

Incentive misalignment: Companies profit from deployment regardless of outcomes. They profit more from scale than from accuracy.

Pressure: Investors expect exponential growth. Miss targets → stock crashes. Maintain hype at any cost.

Result:

  1. Company A deploys despite known problems
  2. Competitors deploy to keep up
  3. Industry normalizes deploying broken systems
  4. Regulators become captured
  5. Infrastructure becomes dependent on broken AI
  6. Stopping feels impossible

This follows the pattern of previous bubbles (dot-com, housing, crypto) but affects critical infrastructure instead of optional sectors.

Model Collapse and Lock-In

At what point does contamination become irreversible?

Once a majority of training data is AI-generated, subsequent models degrade. But by then, AI is embedded everywhere and can't be unplugged.

You'd have infrastructure that doesn't work, running systems that can't be shut down.


SECTION VI: THE MATHEMATICAL CEILING

Graph showing performance curve hitting mathematical ceiling

Quadratic Complexity and Why It's Unfixable

Transformer attention requires comparing every token to every other token. This creates quadratic computational complexity.

What this means in practice:

Context Length    Computational Cost
1M tokens         Baseline
2M tokens         4x baseline
10M tokens        100x baseline
100M tokens       10,000x baseline
1B tokens         1,000,000x baseline

Transformer self-attention, as used in most large language models, scales quadratically with the context window size: if sequence length increases 100×, compute and memory requirements increase 10,000×. At billion-token scales, the computational and memory cost grows by a factor of a million compared to a million-token context, making inference vastly more expensive and, for practical purposes, out of reach for all but the largest and wealthiest hardware clusters.

While research on sparse and approximate attention seeks to mitigate these costs, no current system can efficiently process billion-token contexts for real-world tasks. Processing such long contexts remains technically impractical and economically prohibitive—not because it would require the world’s total energy supply, but because the compute, memory, and power demands rise rapidly beyond the reach of today’s infrastructure for most applications.

In practical terms, this means that significant increases in context window size—especially beyond a few hundred thousand tokens—quickly cross into territory where even elite data centers cannot serve such requests at scale, and most users cannot afford the cost.
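A minimal sketch reproducing the scaling in the table above, counting only token-pair comparisons in naive full self-attention (ignoring constants, hidden dimensions, and every optimization real systems apply):

```python
# Naive self-attention compares every token with every other token,
# so the work grows with the square of the context length.
BASELINE = 1_000_000  # 1M tokens, the table's baseline row

def attention_pairs(context_len: int) -> int:
    """Token-pair comparisons in full self-attention."""
    return context_len * context_len

for tokens in (1_000_000, 2_000_000, 10_000_000, 100_000_000, 1_000_000_000):
    ratio = attention_pairs(tokens) // attention_pairs(BASELINE)
    print(f"{tokens:>13,} tokens -> {ratio:>9,}x baseline")
```

Double the context and the pair count quadruples; multiply it by a thousand and the pair count grows a millionfold, which is the row the table ends on.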

This isn't a software problem. This isn't an engineering challenge. This is mathematics.

Why Alternative Architectures Don't Help

State Space Models (SSMs):

Linear Attention:

Recurrent Networks:

The Proof: SETH and Fundamental Limits

Researchers published proofs in 2024-2025: "Fundamental Limitations on Subquadratic Alternatives to Transformers."

These proofs are mathematical, not empirical.

Under the Strong Exponential Time Hypothesis (SETH), a widely accepted computational complexity conjecture: document similarity tasks inherently require quadratic time.

Translation: You cannot invent a clever new architecture that maintains transformer capability with sub-quadratic complexity. Under this conjecture, such an architecture is mathematically impossible.

You can have:

But not both.


SECTION VII: WHY THIS MATTERS NOW—THE CLOSING WINDOW

Closing window with clock showing urgency

The Timeline of Irreversible Decisions

Current state (November 2025):

By 2026:

By 2028-2029:

By 2030+:

The Mathematical Window

Dependency grows exponentially. System integration is not linear.

Initial integration: easy to reverse
Partial integration: difficult to reverse but possible
Critical dependency: reversal requires accepting major disruption
Complete dependency: reversal effectively impossible

We're in the "partial integration" phase now. Probably 12-24 months from "critical dependency."

What Changes If We Act in Next 12 Months

If we pause now:

If we don't pause:


SECTION VIII: ADDRESSING OBJECTIONS

Shield deflecting objections

OBJECTION 1: "It's Still Early Days"

The Argument

"AI is only 8 years into transformers, 5 years into LLMs. Every revolutionary technology takes decades. Give it time."

Why This Is Wrong

The timeline is compressed, not early:

We're not in early days for technology; we're in early days for understanding consequences. These are different things.

Progress has plateaued:

This looks like a plateau, not early growth.

Deployment doesn't wait:
Even if AI were early, that wouldn't justify deploying in critical systems. If it's early, pull it from hospitals and courts. If you deploy everywhere while claiming it's early, that's a contradiction.

"Early days" enables irresponsibility:
Companies use this to excuse failures that would be unacceptable for mature technology. You can't have it both ways: either it's mature enough to deploy or early enough to excuse failures.


OBJECTION 2: "Regulations Will Fix This"

The Argument

"Governance frameworks will ensure safe deployment. Regulators will prevent problems."

Why This Is Wrong

We failed to regulate the internet when it mattered:

Regulatory capture is structural:

Critical infrastructure can't fail:
Unlike the internet (an optional tool), AI is becoming essential infrastructure. You can't experiment with AI in critical systems the way you could with the early internet.


OBJECTION 3: "Companies Are Committed to Safety"

The Argument

"AI companies are taking safety seriously. They've established safety teams. Alignment research is progressing."

Why This Is Wrong

Safety teams were dissolved:

Alignment research has produced no breakthrough:

Market incentives oppose safety:


OBJECTION 4: "If AGI Is Impossible, Why Worry?"

The Argument

"If true AGI is unachievable, then the catastrophic scenarios are moot. We can just keep improving AI safely."

Why This Is Wrong

This misses the core thesis:
We don't need true AGI for catastrophe. We need competent systems with human-like manipulation capability at billion-scale, lacking meaningful oversight.

Specific risks that don't require AGI:

These are already happening with current, non-AGI systems.

Even narrow AI can be catastrophic if:

Current trajectory creates exactly these conditions.


OBJECTION 5: "Competition Will Drive Safety"

The Argument

"Companies will compete on safety. Those that cut corners will face backlash. Market forces will drive safe AI."

Why This Is Wrong

Market dynamics drive the opposite:

This is proven by internet history:

Winners are determined by scale and speed, not safety:


OBJECTION 6: "We Can Just Unplug It If Something Goes Wrong"

The Argument

"We can always turn off AI systems if they become dangerous. There's an off switch."

Why This Is Wrong

Critical infrastructure has no off switch:

Stopping one system creates cascade effects. You can't isolate the damage.

Network effects prevent stopping:

By the time we want to turn it off, we can't:


OBJECTION 7: "AGI Will Solve Problems Faster Than We Create Them"

The Argument

"AGI will be so capable it will solve any problem we face, including its own safety. It's our best hope."

Why This Is Wrong

This assumes:

This is hope, not strategy.

And it's dangerous hope because it justifies deploying broken systems while assuming future fixes.


SECTION IX: CONCLUSION AND PATH FORWARD

Path forward with milestones and roadmap

What Must Change

The current trajectory leads to permanent infrastructure built on broken foundations. This is not inevitable. It is a choice.

The choice point is now. In 12 months, as integration deepens, choice becomes impossible.

IMMEDIATE ACTIONS (0-6 months)

Governmental

Deployment pause on critical systems:

Independent regulatory authority:

Mandatory disclosure:

Whistleblower protection:

Corporate

Safety authority with veto power:

Pause AGI research:

Academic

Fund critical research:

Replication requirements:

MEDIUM-TERM ACTIONS (6-18 months)

Legislation

AI Liability Framework:

Worker Protection:

Critical Infrastructure Protection:

Infrastructure Development

Alternative Systems:

Knowledge Preservation:

Institutional Reform

Professional Standards:

Regulatory Capture Prevention:

LONG-TERM PATH

Reframe "Progress"

Progress is not more AI capability. Progress is solving real problems. Progress is deciding not to build dangerous capabilities. Progress is maintaining human autonomy and judgment.

Preserve Human Expertise

Research Reorientation

Economic Restructuring

THE CHOICE

Humanity faces a choice in the next 12 months.

Path A: Course Correction

Path B: Continued Deployment

This is not alarmism. This is the mathematical continuation of the current trajectory.

The most important innovation might be the decision not to build something.

We decided not to:

We can decide not to pursue AGI. Not because we can't build it, but because even if we could, we shouldn't.

The 12-month window is open. After that, it closes.

Why These Recommendations Are Unlikely to Happen

The recommendations set out above—pauses on deployment, robust audits, empowered regulatory authorities, and a wholesale redirection of funding and institutional priorities—represent a rational and urgent response to documented AI failures. Yet history suggests these measures are unlikely to be realized, not because they are unwise, but because they run counter to the ingrained dynamics of technological, economic, and political systems.

Path Dependency and Lock-In

Once critical infrastructure incorporates AI—even partially—reversing course becomes not only costly but socially and politically intolerable. Dependencies form quickly, and the withdrawal of AI from sectors like healthcare, finance, or justice would produce immediate, visible damage, erecting formidable obstacles to even temporary pauses or audits. As integration deepens, the collective incentive is always to "manage forward" rather than unwind, creating a trajectory that feels inevitable and irreversible.[4][30]

Institutional and Regulatory Limitations

Historically, regulation has always lagged behind the deployment of novel technology. Legislators, regulators, and oversight bodies are resource- and expertise-constrained and cannot keep pace with the speed and complexity of AI advances. Even when legal frameworks are proposed, as with the EU AI Act or executive orders in the US, they typically arrive after major harms are entrenched, and they are weakened by industry influence, resource shortages, and political willpower that evaporates in the face of economic pressure. Regulatory capture, self-regulation, and voluntary compliance dominate, making genuine safety oversight difficult, intermittent, or toothless.

Market Incentives and Competitive Dynamics

Companies and nations are locked in a competition where deploying first means owning infrastructure, markets, and data. Any move to slow down—whether by regulation, audit, or caution—creates massive risk of falling behind. History shows that market winners, not the safest actors, drive industry norms. Without robust and global coordination, individual actors always benefit by ignoring, weakening, or circumventing restrictions.

Cultural and Psychological Conditioning

Technology culture is steeped in a "move fast and break things" ethos, promising that progress is cumulative and inevitable. Even as catastrophic harms come to light, societies often rationalize or normalize these in retrospect, citing overall benefit or the impossibility of reversal. The lived experience of past technological disasters—from social media to financial systems—demonstrates a persistent societal bias toward post-hoc outrage and complaint, rather than proactive pause and systemic change.

Sunk Cost and Lack of Accountability

Once massive investments have been made and careers staked on ongoing deployment, few actors are willing to bear the disruption and loss required by retrenchment. Accountability for distributed, systemic harm is diffuse, diluting the sense of agency or obligation in both public and private sectors. The path of least resistance is always to marginally improve what exists, not to halt, audit, or replace.


In essence, the blueprint for course-correction runs directly counter to the inertia of technology adoption, the structural weaknesses of regulatory systems, market logic, and psychological reflexes conditioned by decades of runaway deployment and post-hoc rationalization. The grim irony is that while clear warning has been given, all available evidence points to a future where these recommendations will be acknowledged as wise—only when it is far too late to realize them.


FINAL NOTE

Crossroads showing the critical choice ahead

This document is addressed to policymakers, technologists, workers, and citizens who understand that transformative power requires proportional wisdom.

The question is not whether AI will change the world. It will.

The question is whether we will guide that change or be swept along by it.

The answer depends on choices made now.

There is still time. But the window is closing.

THE EVIDENCE TRAIL

Following the Breadcrumbs of AI's Systematic Failure

A narrative journey through the research that documents how artificial intelligence promised everything and delivered disaster


PROLOGUE: THE PAPER TRAIL BEGINS

Every disaster leaves evidence. Financial collapses leave balance sheets. Engineering failures leave accident reports. Corporate fraud leaves emails and testimony. The AI disaster is no different—except that the evidence is scattered across decades, buried in academic papers, hidden in corporate earnings reports, documented in investigative journalism, and encoded in the quiet retractions of companies that once promised transformation.

This is not a bibliography. This is a map of how we got here, told through the documents themselves.


PART I: THE SIXTY-YEAR LIE

How Every Generation Was Promised AGI in "5-10 Years"

The story begins in 1965, when Herbert Simon declared that "machines will be capable, within twenty years, of doing any work a man can do." Quote Investigator has traced this prediction and its descendants across the decades (https://quoteinvestigator.com/2020/11/10/ai-work/). Simon was wrong, but his confidence would echo through generations.

By 1967, Marvin Minsky promised that "within a generation...the problem of creating 'artificial intelligence' will substantially be solved." He was wrong too. But the pattern was established: promise imminent breakthrough, collect funding, miss deadline, repeat.

In 2025, researchers at AI Multiple analyzed 8,590 AGI predictions across six decades (https://research.aimultiple.com/artificial-general-intelligence-singularity-timing/). The median prediction? Five to ten years away. Always five to ten years away. In 1980, five to ten years. In 2000, five to ten years. In 2017, five to ten years. In 2025, still five to ten years.

Helen Toner, former OpenAI board member, documented this acceleration in her Substack essay "'Long' timelines to advanced AI have gotten crazy short" (March 2025). What she found wasn't confidence—it was marketing pressure disguised as scientific consensus.

The pattern became so obvious that LessWrong asked in 2012: "AI timeline predictions: are we getting better?" (https://www.lesswrong.com/posts/C3ngaNBPErAuHbPGv/ai-timeline-predictions-are-we-getting-better). The answer, thirteen years later, is no. We're getting louder, not better.

By March 2025, Demis Hassabis, CEO of Google DeepMind, told CNBC that "human-level AI will be here in 5 to 10 years" (https://www.cnbc.com/2025/03/17/human-level-ai-will-be-here-in-5-to-10-years-deepmind-ceo-says.html). Sam Altman predicted 2027-2029. Dario Amodei suggested "as early as 2026." The timeline hasn't changed. Only the faces making the promises.

80,000 Hours compiled expert forecasts in their comprehensive review "Shrinking AGI timelines" (October 2025, https://80000hours.org/articles/ai-timelines/), showing that as capabilities stagnate, predicted timelines get shorter. This is not how functioning science works. This is how failing marketing works.

Our World in Data traced these patterns across surveys spanning 2016-2023 in "AI timelines: What do experts in artificial intelligence expect for the future?" (https://ourworldindata.org/ai-timelines). The conclusion: experts are consistently overconfident and consistently wrong. Yet their predictions drive billions in investment.

The sixty-year lie isn't that researchers were incompetent. It's that institutional incentives reward promises over delivery. The evidence of this sits in decade after decade of identical timelines, each generation forgetting that the previous generation made—and broke—the same promises.


PART II: THE FOUR BILLION DOLLAR QUESTION

How IBM Spent a Decade Building Nothing

In 2011, IBM's Watson won Jeopardy. The media proclaimed the future had arrived. IBM announced Watson would revolutionize healthcare, starting with oncology. The company promised AI-assisted diagnosis that would save lives and democratize expertise. They invested over $4 billion across eleven years.

By 2022, IBM sold Watson Health for parts.

The story of what happened in between is told across multiple autopsies. Henrico Dolfing's case study "The $4 Billion AI Failure of IBM Watson for Oncology" (December 2024, https://henricodolfing.com/2024/12/ai-failure-ibm-watson-oncology) documents the technical failures: Watson recommended treatments contradicted by medical guidelines, hallucinated drug interactions, and required such extensive human oversight that it was slower than human-only diagnosis.

Slate's "How IBM's Watson went from the future of health care to sold off for parts" (January 2022, https://slate.com/technology/2022/01/ibm-watson-health-failure-artificial-intelligence.html) revealed the institutional rot: Watson was deployed in hospitals before it worked, marketed to investors while failing patients, and maintained as vaporware long after internal teams knew it couldn't deliver.

Healthark's PDF report "IBM Watson: From healthcare canary to a failed prodigy" obtained internal documents showing that Watson's accuracy was below human baseline in most clinical scenarios, that the system required constant manual correction, and that IBM knew this but continued marketing it as revolutionary.

BSKiller's investigation "The $4 Billion IBM Watson Oncology Collapse—And the Synthetic Data Scandal" (June 2025, https://bskiller.com/ibm-watson-oncology-collapse-synthetic-data/) uncovered perhaps the most damning detail: Watson was trained partially on synthetic data generated by IBM engineers, not real patient outcomes. The system was learning from fabricated scenarios, not medical reality.

Healthcare.Digital asked the obvious question in May 2025: "Why was there so much hype about IBM Watson in Healthcare and what happened?" (https://healthcare.digital/single-post/ibm-watson-healthcare-hype). The answer isn't technical failure—it's that institutions committed to AI before understanding it, couldn't afford to admit failure after investing billions, and only pulled the plug when the financial damage exceeded the reputational damage of admitting defeat.

The International Research Journal of Innovations in Engineering and Technology published "The Rise and Fall of IBM Watson in Healthcare: Lessons for Sustainable AI Innovations," concluding that Watson's failure demonstrates systemic problems: overselling capabilities, deploying before validation, silencing internal criticism, and treating patients as beta testers.

A LinkedIn investigation titled "Public Autopsy: The Failure of IBM Watson Health" (September 2025) compiled testimonies from former IBM engineers, hospital administrators, and oncologists. The pattern was consistent: Watson was brilliant at marketing and catastrophic at medicine. One oncologist testified: "We spent more time correcting Watson's mistakes than we would have spent just doing the diagnosis ourselves."

The question isn't why Watson failed. The question is why it took eleven years and $4 billion for IBM to admit it.


PART III: THE TWENTY-FIVE BILLION DOLLAR HOLE

Amazon's Alexa and the Economics of Failure

While IBM was failing in healthcare, Amazon was failing in consumer AI. The scale was larger: $25 billion lost over four years, according to internal documents obtained by the Wall Street Journal in July 2024.

Ars Technica broke the story with "Alexa had 'no profit timeline,' cost Amazon $25 billion in 4 years" (July 23, 2024, https://arstechnica.com/gadgets/2024/07/alexa-is-a-colossal-failure/). The investigation revealed that Amazon's Alexa division, despite dominating the smart speaker market with hundreds of millions of devices sold, was hemorrhaging money with no plan to stop.

The New York Post's "Amazon bleeding billions of dollars from Alexa speakers: report" (July 2024, https://nypost.com/2024/07/23/business/amazon-bleeding-billions-of-dollars-from-alexa-speakers-report/) quantified the disaster: Alexa was losing $5-10 per device sold, plus ongoing server costs for each active device. At scale, this meant billions in annual losses with no revenue model in sight.

Qz.com's analysis of the losses (May 2025, https://qz.com/amazon-alexa-echo-loss-25-billion-andy-jassy-1851496891) pointed out the strategic catastrophe: Amazon had convinced Wall Street that Alexa was a long-term investment in customer relationships. But four years and $25 billion later, Alexa users weren't buying more from Amazon. They were using Alexa for timers and weather reports.

The Verge's "Amazon's paid Alexa is coming to fill a $25 billion hole dug by Echo speakers" (July 2024, https://www.theverge.com/2024/7/23/24204842/amazon-alexa-plus-subscription-price-echo-speakers) revealed Amazon's desperation: the company was preparing to charge for Alexa features previously advertised as free. This would alienate users who'd bought devices under different terms, but Amazon was out of options.

Thurrott.com's summary "Amazon Reportedly Lost Over $25 Billion on its Devices Business in Four Years" (July 2024, https://www.thurrott.com/cloud/320857/amazon-reportedly-lost-over-25-billion-on-its-devices-business-in-four-years) contextualized the failure: this wasn't a startup burning VC money. This was one of the world's most successful companies, with sophisticated financial planning, losing billions on a product line that dominated its market.

Reddit's discussion "WSJ reported that Amazon has huge losses on Alexa devices" (July 2024, https://www.reddit.com/r/technology/comments/1e9vzl6/wsj_reported_that_amazon_has_huge_losses_on_alexa/) captured the public response: confusion. How could Amazon lose $25 billion on a product people actually bought and used? The answer: AI economics don't work. Not at IBM's scale. Not at Amazon's scale. Not anywhere.


PART IV: THE COMPANY WITH ZERO REVENUE

Magic.dev and the Art of Fundraising Vaporware

While giants failed visibly, startups perfected the art of failing slowly. Magic.dev is the paradigm case: $465 million in funding, 24 employees, $0 in revenue, and a product nobody can verify exists.

TechCrunch announced "Generative AI coding startup Magic lands $320M investment from Eric Schmidt, Atlassian and others" (August 28, 2024, https://techcrunch.com/2024/08/28/magic-coding-ai-startup-raises-320m/). The headline was celebratory. The details were alarming: Magic claimed to have solved the context window problem with a 100-million-token system. This would be revolutionary if true. But fifteen months later, nobody has used it.

AI Media House reported "AI Startup Magic Raises $465M, Introduces 100M Token Context Window" (August 2024, https://www.aimmediahouse.com/magic-raises-465m-introduces-100m-token-context-window/), noting that the system was not available via API, had no pricing disclosed, and showed no evidence of actual users.

The SaaS News covered "Magic Secures $320 Million in Funding" (August 2024, https://thesaasnews.com/magic-secures-320-million-in-funding/), focusing on the investor list: Eric Schmidt (former Google CEO), executives from Atlassian, and other tech luminaries. The legitimacy of the investors created legitimacy for the company—despite zero demonstrated product.

FourWeekMBA published the definitive analysis in August 2025: "Magic's $1.5B+ Business Model: No Revenue, 24 People, But They Raised $465M" (https://fourweekmba.com/magic-business-model/). The investigation revealed that Magic's valuation exceeded $1.5 billion despite having no customers, no public product, and no revenue. This is not a company. This is a Ponzi scheme with a GitHub repo.

Crunchbase News reported "AI Coding Is Ultra Hot, With Magic And Codeium Revealing Big Funding Rounds" (August 2024, https://news.crunchbase.com/ai/magic-codeium-funding-coding/), treating Magic and its competitors as part of a healthy market. But a market where companies receive hundreds of millions with zero revenue isn't healthy—it's delusional.

Yahoo Finance republished the TechCrunch story with "Generative AI coding startup Magic lands $320M investment" (August 2024, https://finance.yahoo.com/news/generative-ai-coding-startup-magic-130023641.html), amplifying the narrative that Magic was succeeding. But success requires a product. Magic has funding. These are not the same thing.

Fifteen months after the funding announcement, Magic remains a financial black hole: $465 million in, nothing out. The investors haven't acknowledged failure because acknowledging failure would crater their other AI investments. So Magic exists in limbo: funded, valued, non-functional, and held up as evidence that AI coding is revolutionary.


PART V: THE ALGORITHM THAT SENTENCED THOUSANDS

COMPAS, Criminal Justice, and Automated Discrimination

While companies lost billions, AI systems embedded in critical infrastructure caused direct harm. The most documented case is COMPAS—a recidivism prediction algorithm used to inform sentencing decisions across the United States.

ProPublica's "Machine Bias" investigation (May 2016, republished October 2025, https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing) analyzed 7,000+ cases and found that COMPAS falsely flagged Black defendants as future criminals at roughly double the rate it did white defendants, even when controlling for actual recidivism. The system was systematically biased, and that bias was influencing real sentences.

ProPublica's methodology was published separately in "How We Analyzed the COMPAS Recidivism Algorithm" (December 2023, https://www.propublica.org/article/how-we-analyzed-the-compas-recidivism-algorithm), showing that COMPAS's false positive rate for Black defendants was 45% compared to 23% for white defendants. This wasn't a rounding error. This was structural discrimination at scale.
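The disparity ProPublica measured is a difference in false positive rates: among defendants who did not go on to reoffend, what fraction were nevertheless labeled high risk. A minimal sketch of that metric, using invented toy records rather than ProPublica's data:

```python
# False positive rate per group: of the people who did NOT reoffend,
# what share were labeled high risk? The records below are invented.
records = [
    # (group, labeled_high_risk, actually_reoffended)
    ("A", True,  False), ("A", True,  False), ("A", False, False),
    ("A", True,  True),  ("A", False, True),
    ("B", False, False), ("B", False, False), ("B", True,  False),
    ("B", True,  True),  ("B", False, True),
]

def false_positive_rate(group: str) -> float:
    did_not_reoffend = [r for r in records if r[0] == group and not r[2]]
    wrongly_flagged = [r for r in did_not_reoffend if r[1]]
    return len(wrongly_flagged) / len(did_not_reoffend)

for group in ("A", "B"):
    print(f"group {group}: false positive rate = {false_positive_rate(group):.0%}")
# One group bears far more false accusations than the other -- the kind
# of disparity ProPublica documented in COMPAS.
```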

The U.S. Courts system published a response: "False Positives, False Negatives, and False Analyses" (PDF), arguing that ProPublica's methodology was flawed. But the response conceded the core point: COMPAS did show racial disparities. The dispute was over whether those disparities constituted "bias" or merely "reflected existing patterns." This distinction matters to statisticians. It doesn't matter to defendants.

Research Outreach published "Justice served? Discrimination in algorithmic risk assessment" (November 2023, https://researchoutreach.org/articles/justice-served-discrimination-algorithmic-risk-assessment/), concluding that even if COMPAS's bias was "merely" reflecting historical data, the effect was to perpetuate and amplify historical discrimination.

The Proceedings of the National Academy of Sciences published "Cohort bias in predictive risk assessments of future criminal justice system involvement" (May 2023, https://www.pnas.org/doi/10.1073/pnas.2221509120), demonstrating that COMPAS's predictions degraded over time as the population changed—but courts continued using scores generated years earlier.

Aaron Fraenkel's academic analysis "COMPAS Recidivism Algorithm" (https://afraenkel.github.io/COMPAS_Recidivism/) reconstructed COMPAS's decision logic and found that the algorithm weighted factors like "family criminality" and "neighborhood crime rate"—proxies for race and class that ensured disparate outcomes.

Multiple papers explored fairness interventions. ACM published "Evidence of What, for Whom? The Socially Contested Role of Algorithmic Bias in a Predictive Policing Tool" (May 2024, https://dl.acm.org/doi/10.1145/3630106.3658996), showing that even technically "debiased" versions of COMPAS produced outcomes communities found unjust.

The arXiv preprint "Algorithmic Bias in Recidivism Prediction: A Causal Perspective" (November 2019, https://arxiv.org/abs/1911.10430) demonstrated that COMPAS's bias couldn't be fixed without removing its predictive power—a fundamental trade-off between accuracy and fairness that no technical solution could resolve.

SAGE Journals published "Fairness verification algorithms and bias mitigation mechanisms for AI criminal justice decision systems" (October 2025, https://journals.sagepub.com/doi/full/10.1177/20539517241283292), surveying dozens of proposed fixes. None worked at scale. The conclusion: you cannot remove bias from systems trained on biased data without destroying their functionality.

The Center for Justice Innovation published "Beyond the Algorithm: Evaluating Risk Assessments in Criminal Justice" (PDF, https://innovatingjustice.org/publications/beyond-algorithm), interviewing judges, defendants, and public defenders. The universal finding: COMPAS was treated as objective truth despite being demonstrably unreliable. Judges deferred to the algorithm because it provided legal cover—even when they suspected it was wrong.

The Indiana Law Journal published "The Overstated Cost of AI Fairness in Criminal Justice" (May 2025, https://www.repository.law.indiana.edu/ilj/vol100/iss2/4/), arguing that fairness interventions were economically feasible. But this missed the point: the cost wasn't economic. The cost was that people were sentenced based on biased predictions, and no technical fix could undo that.

By 2025, COMPAS remained in use across multiple states despite a decade of evidence showing systematic bias. Courts continued deferring to it. Defendants continued being sentenced by it. The algorithm worked exactly as designed—and that design was discriminatory.


PART VI: THE HIRING ALGORITHM THAT LEARNED SEXISM

Amazon's Recruiting Tool and Structural Bias

COMPAS discriminated in criminal justice. Amazon's recruiting tool discriminated in hiring. And like COMPAS, the bias wasn't a bug—it was learned from the data.

Reuters broke the story in October 2018: "Amazon scraps secret AI recruiting tool that showed bias against women" (https://www.reuters.com/article/us-amazon-com-jobs-automation-insight-idUSKCN1MK08G). The investigation revealed that Amazon's recruiting algorithm, trained on ten years of hiring data, systematically downranked resumes containing the word "women's" (as in "women's chess club captain"). The system had learned that Amazon historically hired fewer women, so it optimized for male candidates.

The BBC's coverage "Amazon scrapped 'sexist AI' tool" (October 9, 2018, https://www.bbc.com/news/technology-45809919) noted the broader implication: any hiring algorithm trained on historically biased data will perpetuate that bias. This isn't fixable through "debiasing" because the bias is structural.

The ACLU published "Why Amazon's Automated Hiring Tool Discriminated Against Women" (February 2023, https://www.aclu.org/news/womens-rights/why-amazons-automated-hiring-tool-discriminated-against), explaining the technical mechanism: machine learning optimizes for patterns in training data. If training data shows that successful hires were predominantly male, the algorithm learns to prefer male candidates. The system was working correctly—it was just optimizing for the wrong thing.
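
The mechanism is simple enough to reproduce in a toy model. The sketch below is illustrative only: the "resumes" are synthetic, the two features (an experience score and a flag for the token "women's") are invented, and nothing here is Amazon's actual system. A standard classifier trained to imitate biased historical decisions learns to penalize the token, exactly as the ACLU describes.

```python
# Toy illustration on synthetic data: a model trained to imitate biased
# historical hiring decisions faithfully reproduces the bias.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 20_000

experience = rng.normal(5, 2, n)              # a genuinely job-relevant feature
womens_token = rng.random(n) < 0.3            # resume mentions "women's ..."

# Historical decisions: driven by experience, but biased recruiters were
# less likely to hire when the token appeared.
logit = 0.8 * (experience - 5) - 1.5 * womens_token
hired = rng.random(n) < 1 / (1 + np.exp(-logit))

X = np.column_stack([experience, womens_token.astype(float)])
model = LogisticRegression().fit(X, hired)
print("learned weights [experience, womens_token]:", model.coef_[0])

# Two equally qualified candidates, differing only in the token:
candidates = np.array([[6.0, 0.0], [6.0, 1.0]])
print("predicted hire probability:", model.predict_proba(candidates)[:, 1])
# The second candidate scores lower. The model is optimizing exactly what it
# was asked to optimize: agreement with the biased past.
```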

Fortune reported "Workday and Amazon's alleged AI employment biases are among myriad 'oddball results' that could exacerbate hiring discrimination" (July 2025, https://fortune.com/2025/07/04/workday-amazon-ai-employment-bias-hiring-discrimination/), revealing that Amazon wasn't unique. Multiple companies had deployed hiring algorithms with documented gender and racial bias.

Cut-the-SaaS published a detailed case study: "How Amazon's AI Recruiting Tool 'Learnt' Gender Bias" (June 2024, https://cut-the-saas.com/case-studies/how-amazon-ai-recruiting-tool-learnt-gender-bias), reconstructing the training process and showing that Amazon's engineers were aware of the bias but couldn't fix it without destroying the model's predictive accuracy.

The University of Maryland's R.H. Smith School of Business analyzed "The Problem With Amazon's AI Recruiter" (January 2021, https://www.rhsmith.umd.edu/research/problem-amazons-ai-recruiter), concluding that the fundamental issue was philosophical: Amazon wanted to automate judgment, but judgment involves values. An algorithm can't decide what "good" hiring means—it can only replicate past decisions.

IMD Business School provocatively asked "Amazon's sexist hiring algorithm could still be better than a human" (November 2018, https://www.imd.org/research-knowledge/articles/amazons-sexist-hiring-algorithm-could-still-be-better-than-a-human/), arguing that human hiring is also biased, just inconsistently so. But this defense conceded the key point: replacing human bias with automated bias doesn't solve discrimination—it scales it.

Amazon quietly discontinued the tool without announcing which (if any) hires had been influenced by it. No accountability. No compensation for candidates rejected by the biased algorithm. Just silence.


PART VII: THE TECHNICAL CEILING NOBODY WANTS TO ADMIT

Why Context Windows Can't Scale and Why That Matters

The previous failures were institutional and social. But there's also a mathematical ceiling that constrains what AI can ever do.

Towards Data Science published "Your 1M+ Context Window LLM Is Less Powerful Than You Think" (July 2025, https://towardsdatascience.com/your-1m-context-window-llm-is-less-powerful-than-you-think-c5a4e7f7e0f8), documenting that advertised context windows (1M, 2M, 10M tokens) don't reflect usable performance. Beyond ~400K tokens, models lose coherence, forget earlier context, and make errors.

MIT Press's Transactions of the Association for Computational Linguistics published "Lost in the Middle: How Language Models Use Long Contexts" (December 2024, https://direct.mit.edu/tacl/article/doi/10.1162/tacl_a_00638/118687/Lost-in-the-Middle-How-Language-Models-Use-Long), a Stanford-led study showing that information at the beginning and end of a context window is retained while information in the middle is effectively forgotten. This creates systematic failures in long-document analysis.

The preprint is hosted on the lead author's Stanford page: "Lost in the Middle: How Language Models Use Long Contexts" (https://cs.stanford.edu/~nfliu/papers/lost-in-the-middle.pdf), with full experimental methodology showing that retrieval accuracy drops from 98% at 2K tokens to 45% at 100K tokens.
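
The experimental setup is straightforward to sketch. The harness below is a hypothetical reconstruction, not the authors' code: build_document and run_probe are invented names, and query_model is a placeholder for whatever model API is being tested.

```python
# Sketch of a "fact at varying depth" retrieval probe (hypothetical harness;
# query_model is a stand-in, not a real API).
import random

def build_document(needle: str, depth: float, n_filler: int = 2000) -> str:
    """Bury one relevant sentence at a relative depth (0.0 = start, 1.0 = end)
    inside otherwise irrelevant filler."""
    filler = [f"Filler sentence {i} about nothing in particular." for i in range(n_filler)]
    position = int(depth * len(filler))
    return " ".join(filler[:position] + [needle] + filler[position:])

def run_probe(query_model, depths=(0.0, 0.25, 0.5, 0.75, 1.0), trials=20):
    """Estimate retrieval accuracy as a function of where the fact sits."""
    results = {}
    for depth in depths:
        hits = 0
        for _ in range(trials):
            code = str(random.randint(1000, 9999))
            doc = build_document(f"The access code is {code}.", depth)
            answer = query_model(f"{doc}\n\nQuestion: What is the access code?")
            hits += code in answer
        results[depth] = hits / trials
    return results
# In published results of this kind, accuracy is high near depth 0.0 and 1.0
# and collapses for facts buried in the middle of long contexts.
```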

Synthesis.ai analyzed "Lost in Context: How Much Can You Fit into a Transformer" (April 2024, https://synthesis.ai/2024/04/07/lost-in-context/), concluding that the degradation isn't implementation-dependent; it's architectural. Transformers use quadratic attention, making longer contexts quadratically more expensive and increasingly error-prone.
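
The quadratic cost is visible on the back of an envelope. The sketch below assumes 32 attention heads and two bytes per score, both invented for illustration; real kernels avoid materializing the full score matrix, but the number of pairwise interactions, and therefore the arithmetic, still grows with the square of the context length.

```python
# Rough arithmetic: self-attention computes one score per ordered token pair.
def attention_pairs(n_tokens: int, n_heads: int = 32, bytes_per_score: int = 2):
    pairs = n_tokens * n_tokens
    score_bytes = pairs * n_heads * bytes_per_score
    return pairs, score_bytes

for n in (2_000, 100_000, 1_000_000):
    pairs, score_bytes = attention_pairs(n)
    print(f"{n:>9,} tokens -> {pairs:.1e} pairs, ~{score_bytes / 1e9:,.1f} GB of raw scores")
# 2,000 tokens     -> 4.0e6 pairs,  ~0.3 GB
# 100,000 tokens   -> 1.0e10 pairs, ~640 GB
# 1,000,000 tokens -> 1.0e12 pairs, ~64,000 GB
```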

Apple Machine Learning Research published "RATTENTION: Towards the Minimal Sliding Window Size in Local Attention" (September 2025, https://machinelearning.apple.com/research/rattention-minimal-sliding-window), proposing optimizations that reduce but don't eliminate the problem.

IBM's explainer "What is a context window?" (November 2024, https://www.ibm.com/topics/context-window) acknowledged the limitation but framed it as a temporary engineering challenge. The mathematical proofs suggest otherwise.

The widely circulated preprint "Fundamental Limitations on Subquadratic Alternatives to Transformers" demonstrates that, under the Strong Exponential Time Hypothesis (SETH), a widely believed conjecture in computational complexity, document similarity tasks inherently require quadratic time. In plain terms: unless SETH is false, you cannot build an architecture that matches transformer-level capability on these tasks with subquadratic complexity.

This matters because billion-token context windows aren't slightly harder than million-token windows—they're a million times harder. The compute required scales quadratically. At some scale, you run out of energy before you run out of math.
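
Written out, the arithmetic behind "a million times harder" is just the quadratic scaling:

```latex
\mathrm{cost}(n) \propto n^{2}
\quad\Longrightarrow\quad
\frac{\mathrm{cost}(10^{9}\ \text{tokens})}{\mathrm{cost}(10^{6}\ \text{tokens})}
  = \left(\frac{10^{9}}{10^{6}}\right)^{2}
  = 10^{6}.
```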

Companies advertise context windows they know don't work because admitting the limitation would crater valuations. But the limitation is real, it's mathematical, and no amount of engineering can overcome it.


PART VIII: THE INTERNET AS WARNING

What We Should Have Learned From the Last "Transformative Technology"

The AI industry's response to criticism is predictable: "Every transformative technology goes through this. The internet had hype cycles too. Eventually it worked out."

This argument proves too much. The internet did transform society—but the costs were catastrophic and mostly ignored.

NCBI/PMC published "Beyond the Hype—The Actual Role and Risks of AI in Today's Medical Practice: Comparative-Approach Study" (May 2023, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10200667/), comparing AI deployment to early internet deployment and concluding that the internet's harms (mental health collapse, misinformation, democratic erosion) emerged because we deployed first and asked questions later. AI is following the same path.

The internet's mental health costs are documented in dozens of peer-reviewed studies, and they aren't seriously disputed. The internet was useful and catastrophic simultaneously.

The internet's epistemic harms, from misinformation at scale to eroded trust in shared institutions, are similarly documented.

When critics say "AI will be like the internet," they're accidentally correct. The internet proves that transformative technology can be simultaneously useful and civilization-destabilizing. AI is following that exact pattern—except faster, with higher stakes, and embedded in critical infrastructure before we understand it.


PART IX: THE META-RESEARCH

Studies of AI Studies and What They Reveal

Beyond specific failures, meta-research reveals systemic problems in how AI is studied, evaluated, and deployed.

The arXiv paper "Thousands of AI Authors on the Future of AI" (April 2024, https://arxiv.org/abs/2401.02843) surveyed AI researchers about timelines, safety, and capabilities. The findings: researchers are overconfident, systematically wrong about timelines, and rarely penalized for incorrect predictions.

"Forecasting Transformative AI: An Expert Survey" (arXiv, July 2019, https://arxiv.org/abs/1901.08790) showed that expert predictions are uncorrelated with actual progress—experts guess based on intuition, not evidence.

"When Will AI Exceed Human Performance? Evidence from AI Experts" (arXiv, May 2018, https://arxiv.org/abs/1705.08807) surveyed 352 researchers and found median AGI predictions of 45 years—but with massive variance suggesting experts don't actually know.

"Forecasting AI Progress: Evidence from a Survey of Machine Learning Researchers" (arXiv, June 2022, https://arxiv.org/abs/2206.04132) replicated earlier surveys and found predictions getting shorter despite progress slowing.

These meta-studies reveal a field where predictions are marketing, not science. Researchers predict breakthroughs to justify funding. Companies predict success to justify valuations. Nobody is penalized for being wrong because by the time predictions fail, attention has moved elsewhere.


PART X: THE INSTITUTIONAL EVIDENCE

Regulatory Capture, Safety Theater, and Accountability Vacuum

The final category of evidence is institutional: how companies, regulators, and safety researchers interact to produce systematic failure.

The Future of Life Institute's "Benefits & Risks of Artificial Intelligence" (December 2022, https://futureoflife.org/ai/benefits-risks-of-artificial-intelligence/) documented the gap between AI safety rhetoric and action: companies announce safety commitments but don't fund them meaningfully.

The arXiv paper "The AI Risk Repository: A Comprehensive Meta-Review, Database, and Taxonomy of Risks From Artificial Intelligence" (August 2024, https://arxiv.org/abs/2408.12622) catalogued over 700 documented AI risks, finding that known risks are rarely mitigated before deployment.

"Actionable Guidance for High-Consequence AI Risk Management" (arXiv, February 2023, https://arxiv.org/abs/2206.08966) proposed frameworks for managing catastrophic AI risks, concluding that current governance is "fundamentally inadequate."

"Responsible AI Question Bank: A Comprehensive Tool for AI Risk Assessment" (arXiv, January 2025, https://arxiv.org/abs/2401.13116) created a 250-question assessment framework—and found that most deployed systems would fail basic safety checks if companies were required to answer honestly.

Bloomberg Law's "Conducting an AI Risk Assessment" documented that legal requirements for AI risk assessment are minimal, rarely enforced, and easily circumvented through legal structuring.

NCBI/PMC's "Ethical Risk Factors and Mechanisms in Artificial Intelligence Decision Making" (August 2022, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9434954/) identified structural problems: companies profit from deployment regardless of outcomes, regulators lack expertise to evaluate systems, and accountability mechanisms are non-existent.

The pattern is clear: AI governance is theater. Companies announce safety teams, publish ethics principles, and fund research—while deploying systems known to be flawed, lobbying against meaningful regulation, and facing zero accountability for failures.

To update the "Evidence Trail" sources article, add a new section (Part XI) right before the epilogue. This section narratively recounts the extraordinary surge in AI startup funding in October–November 2025, showcasing well-sourced examples:


PART XI: THE NEW FLOOD OF FUNDING

November 2025: Unicorn Valuations Without Revenue or Product

The Evidence Trail was already littered with stories of unicorns that had yet to ship a product. But the final months of 2025 revealed a new crescendo: billion-dollar bets on teams whose principal asset was talent, not traction.

Safe Superintelligence

Founded by Ilya Sutskever, this company instantly catapulted to a $5 billion valuation with its $1 billion raise. As of November 2025: no public product, no announced users, and no revenue—just a team and claims of world-class research. (Forbes, July 2025: https://www.forbes.com/sites/forbes-business-council/2025/07/08/the-hottest-vc-deals-today-are-no-revenue-no-product-just-all-talent/)

Thinking Machines Lab

Co-founded by Mira Murati, Thinking Machines has drawn $2 billion in capital and a $10 to $12 billion valuation, all before launching any commercial product. The market runs on faith, not evidence. (TechCrunch, August 2025: https://techcrunch.com/2025/08/26/here-are-the-33-us-ai-startups-that-have-raised-100m-or-more-in-2025/)

Reflection.AI

Noted for its $130 million Series A and a $580 million valuation, Reflection.AI builds “superintelligent autonomous systems.” It remains pre-product, with no commercial customers as of late 2025. (CB Insights, August 2025: https://www.cbinsights.com/research/report/ai-unicorns/)

Nexthop AI

This infrastructure-focused AI firm received $110 million in Series A funding, but has yet to demonstrate commercial traction. (Crunchbase, October 2025: https://news.crunchbase.com/ai-funding-boom-adds500b/)

General Intuition

With $133.7 million raised, this team's story remains one of “promise”—no product or business model reported, as of November 2025. (Technical.ly, November 2025: https://technical.ly/startups/agentic-ai-startup-trase-lands-10-5m-pre-seed/)

Hippocratic AI

Crowned by its recent $126M Series C, Hippocratic AI’s total haul now exceeds $230M. Yet the exact status of its product rollout, and its revenue, remains unclear as 2025 closes. (The SaaS News, November 2025: https://thesaasnews.com/reevo-raises-80-million-in-funding/)

CB Insights, Forbes, TechCrunch, and Crunchbase all document this new “standard” in AI: invest in the team and the theory—not the results. The flood of unicorns is driven by anticipation; products, users, profits remain in the future tense.


EPILOGUE: THE CONVERGENCE OF EVIDENCE

This document has traced evidence from failed flagship products like Watson and Alexa, from criminal sentencing and hiring algorithms, from the mathematics of the transformer architecture, from the internet's precedent, from the field's own forecasting record, from institutional governance, and from the funding markets of late 2025.

The evidence converges on a single conclusion: AI as currently deployed is failing systematically across technical, economic, social, and institutional dimensions. These failures aren't edge cases waiting to be fixed—they're embedded in the architecture, incentives, and governance of AI systems.

The sixty-year pattern of broken promises isn't bad luck. It's evidence of a field optimizing for funding over truth.

The $4 billion Watson failure isn't an outlier. It's the IBM-sized version of a pattern repeated at every scale.

The $25 billion Alexa loss isn't a temporary investment. It's proof that AI economics don't work even when the product dominates its market.

The $465 million Magic.dev raised with zero revenue isn't innovation. It's a Ponzi scheme with a GitHub repository.

The COMPAS algorithm isn't a cautionary tale. It's a working system, in production, sentencing real people based on documented racial bias.

The Amazon hiring tool isn't ancient history. The story broke in 2018, and multiple companies are still deploying similar systems.

The context window limitations aren't implementation bugs. They're mathematical constraints that no engineering can overcome.

The sixty years of "5-10 years away" predictions aren't optimism. They're systematic dishonesty rewarded by institutional incentives that punish truth-telling.

This is the evidence. The question is what we do with it.