What If You Never Needed an API Key Again? Building a Mesh LLM From Spare Compute — Mic Neale at AI Engineer Melbourne 2026

Almost every AI application today is built on a dependency. You make an API call to OpenAI, Anthropic, or Google, one of a handful of providers running models in massive data centres. Your code can't run without that call succeeding. Your costs scale with usage. Your privacy and control are mediated by someone else's terms of service.

What if that entire architecture was optional?

The current AI stack works because it's convenient and centralized. One company runs the model, everyone calls it, billing happens automatically. It's efficient at global scale. But it introduces bottlenecks, costs, and dependencies that make many applications impractical. Want to run inference offline? Want to avoid sending your data to a US data centre? Want to use a model fine-tuned to your specific domain? The centralized architecture makes all of this either expensive or impossible.

Mic Neale is exploring a different path: a decentralized mesh LLM where spare GPU capacity becomes pooled, shared infrastructure.

The idea is deceptively simple. Your GPU sits idle most of the time. Mine does too. Collectively, neighbourhoods, communities, and organizations have vastly more compute than they're using. What if that idle capacity could be automatically pooled? When you need inference, the mesh provides it. When your machine is free, it contributes to serving others' requests.
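As a rough illustration of that pooling idea, here is a minimal Python sketch. It is not Neale's implementation: the `Node` and `Mesh` classes, the `latency_ms` field, and the rule "send the request to the idle peer with the lowest latency" are all assumptions invented for this example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """One machine in the hypothetical mesh."""
    name: str
    latency_ms: float   # assumed round-trip time to this node
    busy: bool = False  # True while the owner is using the GPU


@dataclass
class Mesh:
    """A toy registry of participating machines (illustrative only)."""
    nodes: dict[str, Node] = field(default_factory=dict)

    def join(self, node: Node) -> None:
        self.nodes[node.name] = node

    def set_busy(self, name: str, busy: bool) -> None:
        self.nodes[name].busy = busy

    def route(self, requester: str) -> Node | None:
        """Return the idle peer with the lowest latency, or None if no peer is free."""
        idle = [
            n for n in self.nodes.values()
            if not n.busy and n.name != requester
        ]
        return min(idle, key=lambda n: n.latency_ms, default=None)


mesh = Mesh()
mesh.join(Node("alice", latency_ms=1.0))
mesh.join(Node("bob", latency_ms=12.0))
mesh.join(Node("carol", latency_ms=40.0))

# Alice needs inference: the closest idle peer serves it.
print(mesh.route("alice").name)

# Bob starts using his own GPU, so the mesh falls back to Carol.
mesh.set_busy("bob", True)
print(mesh.route("alice").name)
```

A real mesh would also need discovery, authentication, and a way to learn latencies; the sketch only shows the give-when-idle, take-when-needed shape of the idea.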

This isn't new as a concept: peer-to-peer networks, distributed computing, and grid computing have existed for decades. But LLMs add complexity. Models are huge. A state-of-the-art model can have 70 billion parameters or more, typically too large for the memory of a single consumer GPU, so running it requires coordination across several machines. Latency matters: a 500 ms request shouldn't take 5 seconds because calls are being routed across the globe. Reliability matters too: if one node goes down, the inference shouldn't fail.
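To put a number on "huge", the back-of-the-envelope calculation below estimates how much memory the weights of a 70-billion-parameter model need at different precisions, and how many GPUs that implies. The 24 GB per-GPU figure is an assumption chosen for illustration, and the estimate ignores the KV cache and activations, so real requirements are higher.

```python
import math


def weight_memory_gb(params_billion: float, bits_per_param: int) -> float:
    """Memory needed for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_param / 8


def min_gpus(params_billion: float, bits_per_param: int, gpu_memory_gb: float) -> int:
    """Fewest GPUs whose combined memory can hold the weights."""
    return math.ceil(weight_memory_gb(params_billion, bits_per_param) / gpu_memory_gb)


GPU_MEMORY_GB = 24  # assumption: memory of one high-end consumer card

for bits in (16, 8, 4):
    size = weight_memory_gb(70, bits)
    count = min_gpus(70, bits, GPU_MEMORY_GB)
    print(f"{bits}-bit weights: {size:.0f} GB, at least {count} GPUs")
```

Under these assumptions the weights take 140 GB at 16-bit precision (at least six such cards) and 35 GB at 4-bit (at least two), which is why a single household machine is rarely enough on its own.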

Neale's prototype work shows it's not just theoretically possible; it's practically viable. The technical challenges are real (model sharding across heterogeneous hardware, latency optimization, fault tolerance, scheduling), but they're solvable. And the economic implications are enormous.
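One of those challenges, sharding a model across heterogeneous hardware, can be sketched in a few lines. The example below splits a model's layers into contiguous blocks sized in proportion to each node's memory. This is a generic proportional-split heuristic, not the scheme Neale uses; the function name `assign_layers`, the node names, and the 80-layer model are hypothetical.

```python
from __future__ import annotations


def assign_layers(num_layers: int, node_memory_gb: dict[str, float]) -> dict[str, range]:
    """Give each node a contiguous block of layers proportional to its memory."""
    total = sum(node_memory_gb.values())
    # Ideal (fractional) share of layers for each node.
    quotas = {name: num_layers * mem / total for name, mem in node_memory_gb.items()}
    counts = {name: int(q) for name, q in quotas.items()}

    # Rounding down may leave a few layers unassigned; hand them to the
    # nodes with the largest fractional remainders.
    leftover = num_layers - sum(counts.values())
    by_remainder = sorted(quotas, key=lambda n: quotas[n] - counts[n], reverse=True)
    for name in by_remainder[:leftover]:
        counts[name] += 1

    # Turn the counts into contiguous layer ranges, in the order nodes were given.
    plan: dict[str, range] = {}
    start = 0
    for name in node_memory_gb:
        plan[name] = range(start, start + counts[name])
        start += counts[name]
    return plan


plan = assign_layers(80, {"desktop": 24, "laptop": 16, "old-rig": 8})
for name, layers in plan.items():
    print(f"{name}: layers {layers.start} to {layers.stop - 1} ({len(layers)} layers)")
```

Keeping each node's layers contiguous means activations cross the network only once per node boundary. A production scheduler would also have to weigh bandwidth, compute speed, and what happens when a node drops out mid-request.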

Consider the numbers. A single high-end GPU costs a few thousand dollars. A neighbourhood with 100 households probably has the equivalent of 10–20 high-end GPUs sitting idle. At roughly $5,000 each, that's $50,000–$100,000 of compute capacity doing nothing most of the time. If even a fraction of it could be harnessed collectively, the value of that pooled compute would dwarf what the entire community spends on API calls.
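The arithmetic behind those figures is easy to check. The snippet below recovers the per-GPU price the dollar range implies, then estimates how many GPU-hours a neighbourhood could pool each day. The idle hours and the fraction actually harvested are illustrative assumptions, not numbers from the talk.

```python
def implied_unit_price(total_value: float, num_gpus: int) -> float:
    """Per-GPU price implied by a total hardware value."""
    return total_value / num_gpus


def pooled_gpu_hours_per_day(num_gpus: int, idle_hours_per_day: float,
                             harvest_fraction: float) -> float:
    """GPU-hours per day available to the mesh if only part of the idle time is used."""
    return num_gpus * idle_hours_per_day * harvest_fraction


# Both ends of the quoted range imply the same unit price.
print(implied_unit_price(50_000, 10))
print(implied_unit_price(100_000, 20))

IDLE_HOURS = 20          # assumption: each GPU is idle 20 hours a day
HARVEST_FRACTION = 0.25  # assumption: only a quarter of idle time is pooled

for gpus in (10, 20):
    hours = pooled_gpu_hours_per_day(gpus, IDLE_HOURS, HARVEST_FRACTION)
    print(f"{gpus} GPUs: {hours:.0f} GPU-hours per day")
```

Even with only a quarter of the idle time harvested, these assumptions yield 50 to 100 GPU-hours a day for a single neighbourhood.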

But this goes deeper than cost. A mesh LLM is inherently more resilient than a centralized service. No single company can shut it down. No API rate limit can throttle it. Models can be updated by the community rather than dictated by a vendor. Applications can run on models fine-tuned for local languages, cultures, and specific domains—something that centralized providers have little incentive to optimize.

There are hard problems here. How do you coordinate millions of machines offering different hardware specs? How do you ensure fairness—that people contributing more compute get equitable access? How do you prevent bad actors from poisoning the mesh or stealing inferences? How do you handle the economics: who gets paid, and how much?
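The fairness question, at least, has a familiar starting point: keep a ledger of what each participant has contributed and consumed, and serve whoever has the best balance first. The sketch below is one possible policy among many and is not drawn from Neale's design; `CreditLedger`, its method names, and the use of GPU-seconds as the unit are hypothetical.

```python
from collections import defaultdict


class CreditLedger:
    """Tracks GPU-seconds each participant has given to and taken from the mesh."""

    def __init__(self) -> None:
        # Positive balance: contributed more than consumed.
        self.balance = defaultdict(float)

    def record_served(self, participant: str, gpu_seconds: float) -> None:
        """Credit a participant whose machine served someone else's request."""
        self.balance[participant] += gpu_seconds

    def record_consumed(self, participant: str, gpu_seconds: float) -> None:
        """Debit a participant whose request was served by the mesh."""
        self.balance[participant] -= gpu_seconds

    def next_in_queue(self, waiting: list[str]) -> str:
        """Pick the waiting participant with the highest balance (ties: first listed)."""
        return max(waiting, key=lambda p: self.balance[p])


ledger = CreditLedger()
ledger.record_served("alice", 100.0)
ledger.record_consumed("alice", 30.0)
ledger.record_served("bob", 10.0)
ledger.record_consumed("bob", 50.0)

# Alice has contributed more than she has used, so she is served first.
print(ledger.next_in_queue(["bob", "alice"]))
```

A ledger like this says nothing about bad actors or payment. It also assumes reported GPU-seconds can be trusted, which a real mesh could not assume without some form of verification.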

Neale's background matters here. Two decades building developer tooling, distributed systems, and AI infrastructure at companies like CloudBees and Red Hat means he understands both the technical depth and the organizational reality of getting decentralized systems to work at scale. His work on Goose, Block's open-source AI coding agent, has forced him to think about what happens when you're not constrained by a single vendor's model-serving infrastructure.

The mesh LLM shifts power and capability from data-centre operators to communities. It's not about replacing the cloud; it's about making alternatives viable. For many applications—especially those where latency is flexible, privacy is essential, or cost is prohibitive with centralized APIs—a mesh approach becomes not just interesting but necessary.

What emerges is a different kind of decentralization. Not the "everyone runs everything" fantasy of some blockchain narratives, but practical mutual aid: my spare GPU serves your inference; your spare GPU serves mine. The economics work because we're using capacity that's otherwise wasted.

Mic Neale's work on building mesh LLMs offers a glimpse of what becomes possible when we shift from a "data centre required" model to "your neighbourhood has enough." This isn't about ideology; it's about engineering a system that's more resilient, more equitable, and more practical for the next decade of AI applications.

Hear Neale explore the technical realities, the economic model, and the implications at AI Engineer Melbourne 2026 (June 3–4).
