<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:media="http://search.yahoo.com/mrss/"><channel><title><![CDATA[Tomcw.xyz]]></title><description><![CDATA[Writing about open infrastructure, data, technology, learning and a better world. From Tom C W.]]></description><link>https://tomcw.xyz/</link><image><url>https://tomcw.xyz/favicon.png</url><title>Tomcw.xyz</title><link>https://tomcw.xyz/</link></image><generator>Ghost 5.88</generator><lastBuildDate>Thu, 16 Apr 2026 20:39:20 GMT</lastBuildDate><atom:link href="https://tomcw.xyz/rss/" rel="self" type="application/rss+xml"/><ttl>60</ttl><item><title><![CDATA[Is This Just a Skill?]]></title><description><![CDATA[At one point everything was an app, or at least someone said it should be an app. Now I wonder, is everything a SKILL? And what's the difference between a skill and an actual product? I consider this with something I built and then decided... I think this is a skill. 
]]></description><link>https://tomcw.xyz/is-this-just-a-skill/</link><guid isPermaLink="false">69c64d564e4f0e383144ca51</guid><category><![CDATA[ai]]></category><category><![CDATA[Tech]]></category><category><![CDATA[Systems]]></category><dc:creator><![CDATA[Tom Watson]]></dc:creator><pubDate>Fri, 27 Mar 2026 10:20:06 GMT</pubDate><media:content url="https://images.unsplash.com/photo-1679108230151-77401eb692b6?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wxMTc3M3wwfDF8c2VhcmNofDE4fHxza2lsbHxlbnwwfHx8fDE3NzQ2MDY0NjB8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=2000" medium="image"/><content:encoded><![CDATA[<img src="https://images.unsplash.com/photo-1679108230151-77401eb692b6?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wxMTc3M3wwfDF8c2VhcmNofDE4fHxza2lsbHxlbnwwfHx8fDE3NzQ2MDY0NjB8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=2000" alt="Is This Just a Skill?"><p>I built an app a few months ago. It&apos;s called <a href="https://impact-path.vercel.app/?ref=tomcw.xyz">ImpactPath</a>. You can sign up, log in, have a conversation with an AI that guides you through creating a theory of change for your organisation, and at the end you get professional outputs: a logic model, executive summary, visual diagram. It works. It&apos;s a genuine product, if a little unfinished.</p><p>I stopped building it, partly because I need to make choices about where I spend my time and I think there are more interesting and valuable ways of helping people think about impact and learning. But also... I started wondering: is this just a skill?</p><h2 id="are-you-just-a-md-file">Are you just a .md file?</h2><p>There&apos;s a tongue-in-cheek website doing the rounds called <a href="https://deathbyclawd.com/?ref=tomcw.xyz">Death by Clawd</a> &#x2014; the &quot;SaaSpocalypse Survival Scanner.&quot; You enter a SaaS company and it rates the probability of being replaced by a Claude Skill. 
It&apos;s a joke, maybe, but within every joke, isn&apos;t there a semblance of truth?</p><p>A lot of software is essentially packaged expertise with a billing system attached. And if the expertise can be written down, if it&apos;s procedural knowledge, decision trees, best practices, structured processes, then maybe it can be a skill instead of a subscription.</p><p>Now that AI can produce things at speed, we can make so many more things. The cost of building has collapsed: an idea in the morning can be a deployed app by evening.</p><p>But should many of those things, maybe most of them, be apps at all? Or are they just a SKILL.md file waiting to happen?</p><p>There&apos;s now an open standard for packaging expertise in a format that AI agents can use. It&apos;s called Agent Skills, and the specification lives at <a href="https://agentskills.io/?ref=tomcw.xyz">agentskills.io</a>. Originally developed by Anthropic, it&apos;s been adopted across Claude, OpenAI Codex, GitHub Copilot, VS Code, Cursor, and over twenty other platforms.</p><p>A skill is a folder containing instructions, scripts, and resources. The core is a SKILL.md file: metadata describing what it does and when to trigger, then markdown with the actual guidance. You can include reference files, templates, executable scripts. 
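</p><p>As a sketch, a minimal SKILL.md for something like a theory of change assistant might look like this (the name, description, and steps are illustrative, not taken from any real skill):</p><pre><code>---
name: theory-of-change-builder
description: Use when someone wants help developing a theory of change
  for their organisation. Guides context gathering, outcome definition,
  assumption surfacing and measurement planning.
---

# Theory of Change Builder

1. Gather context: mission, beneficiaries, activities, resources, scope.
2. Push vague impact language ("we help people") towards specific,
   time-bound outcome statements.
3. Surface assumptions: what has to be true for this theory to hold?
4. Agree indicators and a realistic collection frequency.
5. Produce outputs from the templates bundled alongside this file.
</code></pre><p>The frontmatter tells the agent what the skill does and when to load it; the markdown body is the guidance itself. 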
The agent loads what it needs, when it needs it.</p><p>So now, every time I start building something, I find myself asking: is this just a skill?</p><h2 id="what-i-actually-built-for-impactpath">What I actually built for ImpactPath</h2><p>Before I could build the app, I had to build the expertise layer:</p><ul><li>A data schema for capturing organisational context: mission, beneficiaries, activities, resources, geographic scope</li><li>A structured conversation flow that moves through stages: context gathering, activity mapping, outcome definition, assumption surfacing, measurement planning</li><li>Methodology for moving from vague impact language (&quot;we help people&quot;) to specific outcome statements (&quot;participants report increased confidence in managing their finances within 3 months&quot;)</li><li>Guidance on surfacing assumptions: what has to be true for this theory to hold?</li><li>Indicator selection logic: qualitative vs quantitative, primary vs secondary sources, realistic collection frequency</li><li>Output templates for different audiences: funder-facing, board-facing, community-facing</li></ul><p>The app wraps all this in authentication, saved progress, a nice interface, export buttons. But the real core of it is the methodology, the schema, the process, the examples. </p><p>And that could all live in a SKILL.md file.... 
in fact I DID extract all this into a skill <a href="https://github.com/dataforaction-tom/claude-code-template/blob/master/.claude/skills/theory-of-change-builder/SKILL.md?ref=tomcw.xyz" rel="noreferrer">which you can see here!</a></p><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://github.com/dataforaction-tom/claude-code-template/blob/master/.claude/skills/theory-of-change-builder/SKILL.md?ref=tomcw.xyz"><div class="kg-bookmark-content"><div class="kg-bookmark-title">claude-code-template/.claude/skills/theory-of-change-builder/SKILL.md at master &#xB7; dataforaction-tom/claude-code-template</div><div class="kg-bookmark-description">Ready-to-use project template for Claude Code with state tracking, slash commands, subagents, and continuous improvement - dataforaction-tom/claude-code-template</div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://github.githubassets.com/assets/pinned-octocat-093da3e6fa40.svg" alt="Is This Just a Skill?"><span class="kg-bookmark-author">GitHub</span><span class="kg-bookmark-publisher">dataforaction-tom</span></div></div><div class="kg-bookmark-thumbnail"><img src="https://opengraph.githubassets.com/c864abe8744c19c7be29a770a1bdd0a343e6cad79e5453f246aa30d00ee8c0ff/dataforaction-tom/claude-code-template" alt="Is This Just a Skill?"></div></a></figure><h2 id="so-when-is-something-just-a-skill">So when is something just a skill?</h2><p>A skill is enough when the value is in the knowledge, not the infrastructure.</p><p>If what you&apos;re building is essentially &quot;here&apos;s how to do this thing well&quot;, a methodology, a process, a set of decision rules, a framework for thinking through a problem, then that&apos;s probably a skill. The AI agent provides the interface. The skill provides the expertise.</p><p>Where does creating a theory of change sit on this? It&apos;s the expertise, the questions to ask, the order to ask them, the way to structure the outputs. 
A skill can encode all of that.</p><p>Brand guidelines? I have skills for The Good Ship brand. Colours, fonts, layout patterns, tone of voice. From here I can create slides or documents or use them in HTML and CSS for websites.</p><p>I think a skill is enough when:</p><ul><li>The value is procedural knowledge that can be written down</li><li>The user already has access to an AI agent</li><li>The outputs are documents, decisions, or structured data rather than ongoing services</li><li>There&apos;s no need to persist <strong>state</strong> across sessions (or the user&apos;s own systems handle that)</li><li>Security and access control aren&apos;t critical differentiators</li></ul><h2 id="when-something-goes-beyond-a-skill">When something goes beyond a skill</h2><p>But sometimes a skill isn&apos;t enough. Here&apos;s when you probably need more:</p><p><strong>Scale and concurrency.</strong> A skill runs in one person&apos;s conversation, but if you need to handle thousands of simultaneous users, coordinate between them, or maintain shared state, you need infrastructure. </p><p><strong>Observability and analytics.</strong> Skills are somewhat opaque, meaning you don&apos;t easily get aggregate data on how they&apos;re being used, where people get stuck, what outputs they&apos;re generating. If understanding usage patterns is core to the value proposition, for improvement, for reporting, for demonstrating impact, you probably need an app layer that captures this.</p><p><strong>Repeatability and consistency.</strong> Skills depend on the underlying model, which updates, changes, has different behaviours across platforms. If you need guaranteed consistent outputs (the same input always* produces the same output) for compliance or contractual reasons, then you need to control more of the stack.</p><p><strong>Services, not just outputs.</strong> A skill produces outputs: documents, recommendations, structured data. But what if the value is ongoing service? 
Monitoring, alerting, scheduled tasks, integrations that run without human prompting. That&apos;s not a skill.</p><p><strong>Security and access control.</strong> Skills inherit the security model of wherever they run. If you need fine-grained access control, audit trails, data residency guarantees, or compliance certifications, you need infrastructure that provides these. A skill for handling sensitive health data is risky if you can&apos;t control where it runs.</p><p><strong>Identity and payment.</strong> If you need to know who&apos;s using it, limit access, or charge money, you need infrastructure around authentication and billing. </p><p><strong>Network effects.</strong> Some products get more valuable as more people use them. Shared databases, community contributions, aggregated insights. A skill is individual. There&apos;s no mechanism for one person&apos;s use to improve another person&apos;s experience, unless you <em>actually build</em> that mechanism.</p><p>*always is an interesting concept when using LLMs for most anything</p><h2 id="the-honest-assessment-of-impactpath">The honest assessment of ImpactPath</h2><p>So is ImpactPath just a skill? Would Death by Clawd give it a high mortality rating?</p><p>100%, yes. The core value, guiding someone through theory of change creation, could absolutely be a skill. Someone with Claude Pro could get 80% of the value by loading an impact-framework skill and having the same conversation.</p><p>What the app adds:</p><ul><li>Saved progress (you can come back tomorrow)</li><li>Polished visual outputs (the diagram generation is fiddly)</li><li>A front door for people who don&apos;t use AI tools directly</li><li>Eventually: funder-specific templates, organisational accounts, portfolio views for funders</li></ul><p>Is that enough to justify the app? Honestly, I don&apos;t think so. The saved progress is nice but not essential for a process most people complete in one sitting. 
The visual outputs could arguably be skill-generated. The front door matters if your users aren&apos;t already in AI-native workflows, which admittedly many users still aren&apos;t yet.</p><p>The portfolio view for funders might be the thing that actually requires an app. That&apos;s aggregate data across multiple organisations, shared state, access control. There&apos;s probably collaborative infrastructure potential, but realising that means people investing in this as a concept.</p><h2 id="what-this-means-for-building">What this means for building</h2><p>I&apos;m not saying don&apos;t build apps. I&apos;m saying ask the question first.</p><p>When you have an idea, before you spin up a repo and start scaffolding authentication, ask: is this just a skill? Could I write a SKILL.md that captures the expertise, test it in Claude or Cursor, and see if it actually delivers the value?</p><p>If yes, maybe that&apos;s the product. Maybe that&apos;s enough. </p><p>And if it turns out you need more, scale, observability, services, security, network effects, you can build the infrastructure later. But you&apos;ll be building it on top of a skill that already works, that already encodes the expertise, that you&apos;ve already validated.</p><p>The skill becomes the foundation, with the app becoming the distribution and interface layer on top.</p><h2 id="what-this-means-for-domain-expertise">What this means for domain expertise</h2><p>Here&apos;s where it gets interesting for organisations, not just builders.</p><p>Every organisation has embedded expertise. Previously, that expertise stayed trapped unless you built a product around it, which requires funding, technical capability, ongoing maintenance, commercial ambition. Most organisations quite reasonably said &quot;that&apos;s not our core mission&quot; and the expertise stayed local.</p><p>Skills could potentially change this.</p><p>What if packaging your expertise as a skill became a normal thing organisations did? 
Not as a product to sell, but as a contribution to a commons. The way you might write a guide or share a template, but machine-readable, executable, actually usable by anyone with an AI agent.</p><p>I wrote a piece called <a href="https://tomcw.xyz/building-blocks-for-just-in-time-software/">The Interface: Building Blocks for Just-in-Time Software</a> which takes this to a new level, but perhaps skills are enough. </p><p>The value isn&apos;t in the delivery mechanism but in the knowledge about how to do something well. Skills make that knowledge portable.</p><p>I wonder if there is value in skills discovery, either automated or workshopped, understanding the various ways across an organisation that people encode expertise naturally in their everyday work. Could these become skills adopted across an organisation?</p><p>Not every idea needs to be an app. Not every workflow needs a SaaS. Not every piece of expertise needs infrastructure wrapped around it. Sometimes a SKILL.md file is the product. Sometimes the answer to &quot;are you just a .md file?&quot; is: yes, and that&apos;s fine.</p><p>So next time you&apos;re about to build something, ask yourself: is this just a skill?</p>]]></content:encoded></item><item><title><![CDATA[The Grant Application Is Dead. What Comes Next?]]></title><description><![CDATA[How federated protocols, local agents, and organisational self-sovereignty could replace the broken funding model.
Come down the rabbit hole with me...]]></description><link>https://tomcw.xyz/the-grant-application-is-dead-what-comes-next/</link><guid isPermaLink="false">69c3ab584e4f0e383144c916</guid><category><![CDATA[Data]]></category><category><![CDATA[ai]]></category><category><![CDATA[Open Infrastructure]]></category><category><![CDATA[Tech]]></category><category><![CDATA[Systems]]></category><dc:creator><![CDATA[Tom Watson]]></dc:creator><pubDate>Wed, 25 Mar 2026 17:24:50 GMT</pubDate><media:content url="https://tomcw.xyz/content/images/2026/03/grant-application-is-dead.pptx--1--1.png" medium="image"/><content:encoded><![CDATA[<img src="https://tomcw.xyz/content/images/2026/03/grant-application-is-dead.pptx--1--1.png" alt="The Grant Application Is Dead. What Comes Next?"><p>A foundation puts out a call. Organisations write applications. A panel reads them, scores them, funds some, rejects most. This process has been the backbone of grant-making for decades. It was designed for a world of information scarcity &#x2014; where funders needed a structured mechanism to surface what organisations do, what they need, and whether they&apos;re credible.</p><p>That world is gone.</p><p>Large language models have made it ridiculously easy to produce polished-sounding grant applications. Funders are now drowning in submissions, many of them &apos;well-written&apos;, &apos;well-structured&apos;, and indistinguishable from one another, a soup of pretty good. The response has been predictable: use LLMs on the other side too, to sift and score. We&apos;re building a system within a system, machines writing applications for machines to read, and the human decisions that funding is supposed to enable are getting further away, not closer.</p><figure class="kg-card kg-image-card"><img src="https://tomcw.xyz/content/images/2026/03/grant-application-is-dead.pptx.png" class="kg-image" alt="The Grant Application Is Dead. What Comes Next?" 
loading="lazy" width="960" height="540" srcset="https://tomcw.xyz/content/images/size/w600/2026/03/grant-application-is-dead.pptx.png 600w, https://tomcw.xyz/content/images/2026/03/grant-application-is-dead.pptx.png 960w" sizes="(min-width: 720px) 720px"></figure><p>This isn&apos;t a problem we can optimise our way out of. The application form was an approach and technology for filtering in a low-bandwidth information environment, where the capacity to do the thing meant you only did the application if you were really committed to it, or at least reasonably committed. That environment no longer exists. We need to think about what comes after.</p><h2 id="the-problem-isnt-volume"><strong>The problem isn&apos;t volume</strong></h2><p>It&apos;s tempting to frame this as a scale problem - too many applications, not enough capacity. But in my opinion it&apos;s a deeper, more structural issue. The traditional model is built on information asymmetry as a filtering mechanism. The form exists to make applicants <em>prove</em> they&apos;re worthy. LLMs broke that filter by making proof cheap.</p><p>The application-as-form approach jumbles several things that should be separate: <strong>identity</strong> (who is this organisation?), <strong>evidence</strong> (what have they done and learned?), <strong>intent</strong> (what do they want to do next?), and <strong>fit</strong> (does this align with what we fund?). <br>All of these get bundled into a single, pass/fail, one-shot document: written to a deadline, shaped by word counts, and performed for an audience of assessors.
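</p><p>Pulled apart, those four concerns could be four separate records, each maintained on its own cadence rather than performed in one document. A hypothetical sketch, with all names invented:</p><pre><code># identity.yaml: who is this organisation? (stable facts)
name: Example Community Trust
legal_form: CIO
geography: [Leeds]

# evidence.yaml: what have they done and learned? (accumulates over time)
outcomes:
  - statement: participants report increased confidence within 3 months
    source: annual survey

# intent.yaml: what do they want to do next? (updated as ideas evolve)
ideas:
  - title: peer-led money mentoring
    stage: seeking partners

# fit then becomes a question the funder answers by querying the other
# three, not something the applicant performs under deadline
</code></pre><p>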
Of course, some funders unbundle these into EOIs and multi-stage approaches, but that was a solution for when filling in a form was a capacity issue.</p><h2 id="is-there-another-way"><strong>Is there another way?</strong></h2><p>What if, instead of organisations applying to funders, the relationship ran the other way?</p><p>There&apos;s a school of thought that says we should just do away with applications entirely and become fully relational in grant-making. No forms, no calls, no competitive rounds. Just relationships. Funders get to know organisations over time, understand their work, build trust, and fund on the basis of that relationship. Approaches like <a href="https://www.farmingthefuture.uk/?ref=tomcw.xyz" rel="noreferrer">Farming the Future</a> and <a href="https://regenerativefuturesfund.org.uk/the-fund?ref=tomcw.xyz" rel="noreferrer">Regenerative Futures</a> have moved in this direction, more connected, more trust-based, more human.</p><p>There&apos;s a lot to like about this. It removes the performative nature of the application form and centres the relationship. It lets funders understand context in a way that no written submission ever captures.</p><p>But can every foundation take this approach? And what are the potential risks if they do? Fully relational funding, without any balancing infrastructure, risks reverting to the boys&apos; club. Who gets the relationships? Who gets invited into the room? Predominantly, it&apos;s those with existing access to networks, and that access is overwhelmingly a function of privilege. Geography, education, professional background, confidence in certain social settings. The organisations doing the most important work in the most underserved communities are often the furthest from funder networks. A purely relational model doesn&apos;t just fail to fix this. 
It can make it worse.</p><p>So maybe the choice isn&apos;t &quot;applications or relationships?&quot; Perhaps it&apos;s &quot;how do we build infrastructure that enables relationships to form on more equitable terms?&quot;</p><p>That&apos;s what leads to the idea of a <strong>self-sovereign organisational profile</strong>. Imagine a world where each organisation maintains a living, structured, machine-readable representation of itself. Not one written for any specific funder or call, but maintained for its own purposes. It covers the things funders always want to know: what the organisation does, who it serves, how it&apos;s governed, its financial health, its evidence of impact, what it&apos;s learning, its culture and ways of working. It&apos;s updated continuously, fed by the organisation&apos;s own systems, not composed under deadline pressure.</p><p>Funders, in this model, become discoverers rather than gatekeepers. They search across a landscape of organisations, exploring profiles, understanding context, and reaching out when they see alignment. The application form disappears, and in its place: a structured, ongoing, mutual relationship between resource and purpose. The infrastructure makes organisations <em>discoverable</em> regardless of whether they already have a relationship with the funder. </p><p>This isn&apos;t entirely new thinking. Elements of it exist in open data initiatives, in the 360Giving standard, in community foundation practice. 
But the enabling technologies (federated protocols, structured data standards, LLM-assisted search and sense-making) are now mature enough to make the full vision practical.</p><h2 id="an-organisational-knowledge-layer"><strong>An organisational knowledge layer</strong></h2><p>The core idea is a self-sovereign organisational profile: owned by the organisation, hosted on their terms, and selectively shared with funders and others who request access.</p><p>Think of it as an <strong><em>org.txt</em></strong> for the social sector, but richer, living, and <strong><em>federated</em></strong>. Part narrative, part structured data. Markdown for the human-readable story. JSON or YAML for the machine-readable structure. Together, they form a profile that can be read by a person, queried by software, and understood by an LLM.</p><p>The profile covers several domains:</p><p><strong>Identity</strong> &#x2014; who the organisation is, its legal form, registration details, geography, scale, mission statement, and values. The stable facts that change slowly.</p><p><strong>Evidence</strong> &#x2014; what the organisation has done and what it has learned. Service delivery data, outcomes, case studies, evaluations. This is longitudinal, accumulating over time, telling a story about trajectory rather than presenting a snapshot.</p><p><strong>Governance</strong> &#x2014; how decisions are made, who&apos;s involved, what policies are in place. Board composition, safeguarding, financial controls. The things that signal organisational health and due diligence.</p><p><strong>Culture</strong> &#x2014; how the organisation works, what it values, how it treats people. This is harder to structure but matters enormously. 
It might include approaches to participation, equity commitments, learning practices.</p><p><strong>Ideas</strong> &#x2014; and this is where it gets interesting.</p><figure class="kg-card kg-image-card"><img src="https://tomcw.xyz/content/images/2026/03/grant-application-is-dead.pptx--1-.png" class="kg-image" alt="The Grant Application Is Dead. What Comes Next?" loading="lazy" width="960" height="540" srcset="https://tomcw.xyz/content/images/size/w600/2026/03/grant-application-is-dead.pptx--1-.png 600w, https://tomcw.xyz/content/images/2026/03/grant-application-is-dead.pptx--1-.png 960w" sizes="(min-width: 720px) 720px"></figure><h2 id="the-ideas-layer"><strong>The ideas layer</strong></h2><p>Right now, the grant model assumes organisations are reactive, sitting there waiting to be asked and then responding within the constraints of a funding call. But organisations are <em>always</em> thinking about what they&apos;d do with resource. They are the ones closest to communities, and have ideas sitting in people&apos;s heads, in strategy documents, in board away-day notes, in conversations with people. These ideas are often invisible to funders until someone writes them into an application.</p><p>What if ideas were first-class objects in the system?</p><p>An organisation publishes an idea &#x2014; structured, tagged, place or theme-based, loosely costed, linked to their evidence and track record. Not a full proposal, but more like a seed. It sits there, discoverable, evolving. The organisation can refine it over time, link it to emerging evidence, connect it to ideas from other organisations working in the same place or theme.</p><p>Funders browse the landscape of ideas, not applications. They see clusters of intent forming: six organisations across two places all circling similar work. They can <strong><em>signal</em></strong> interest, which is itself visible and logged. They might approach a cluster and say: &quot;We&apos;d fund this. 
Let&apos;s talk.&quot; Or they might see an idea that&apos;s been developing for two years, gathering evidence and collaborators, and recognise that it&apos;s ready.</p><p>This changes the temporality of funding fundamentally. Instead of artificial deadlines and competitive rounds, funding becomes responsive to organisational readiness and ecosystem dynamics, shifting power. </p><p>This is something I&apos;ve been exploring through<a href="https://openideas.uk/?ref=tomcw.xyz"> <u>OpenIdeas.uk</u></a>, the concept that ideas should be visible, connectable, and persistent, not locked inside application forms that only one funder ever sees.</p><h2 id="federation-no-platform-just-a-network"><strong>Federation: no platform, just a network</strong></h2><p>The worst version of this idea is a centralised platform, another portal where organisations create profiles, controlled by whoever runs it. Stop me if you&apos;ve heard this story before. It creates new gatekeepers, new lock-in, new dependencies.</p><p>The better version is federated. Each organisation hosts its own profile, or has it hosted on its behalf, and the profiles communicate via an open protocol. No one owns the index. No single entity decides who&apos;s visible. The organisation <em>is</em> its own node in the network.</p><p>Two protocols are serious candidates: <a href="https://atproto.com/?ref=tomcw.xyz" rel="noreferrer">AT Protocol</a> (the technology behind Bluesky) and <a href="https://activitypub.rocks/?ref=tomcw.xyz" rel="noreferrer">ActivityPub</a> (the W3C standard behind Mastodon and the wider fediverse).</p><h3 id="at-protocol"><strong>AT Protocol</strong></h3><p>AT Protocol&apos;s strongest feature for this use case is account portability. An organisation&apos;s identity, based on a decentralised identifier (DID), travels with them. If a hosting provider disappears or becomes problematic, the organisation takes their data and moves. 
For organisations that have experienced <a href="https://tomcw.xyz/why-we-created-techfreedom-and-why-we-think-its-important/" rel="noreferrer">platform lock-in</a>, this is compelling. The DID also provides cryptographic verification of identity, which matters in a funding context where trust is essential.</p><p>The <a href="https://atproto.com/specs/lexicon?ref=tomcw.xyz" rel="noreferrer">lexicon system</a> is also relevant. AT Protocol lets you define custom record types, meaning you could define an idea, an evidence-summary, an access-grant as formal lexicons, and they become first-class objects in the network. The <a href="https://docs.bsky.app/docs/advanced-guides/firehose?ref=tomcw.xyz" rel="noreferrer">firehose architecture</a> means funders could subscribe to a real-time stream of updates across the network as new ideas are published, evidence updated, profiles changed. The flow of information becomes continuous rather than periodic.</p><p>But there are limitations too. AT Protocol is still heavily shaped by Bluesky&apos;s social media needs. The infrastructure is resource-intensive to self-host, which reintroduces some centralisation at the infrastructure layer. And most critically for our purposes, AT Protocol assumes a public-by-default model. The tiered access control we might need (public summaries, detailed profiles visible only to approved funders, full evidence bases available on request) would need to be built as an additional layer rather than being native to the protocol. </p><h3 id="activitypub"><strong>ActivityPub</strong></h3><p><a href="https://en.wikipedia.org/wiki/ActivityPub?ref=tomcw.xyz" rel="noreferrer">ActivityPub</a> brings maturity and ecosystem breadth. The actor model maps naturally to organisations, with each organisation being an actor that publishes activities: &quot;new idea published,&quot; &quot;evidence updated,&quot; &quot;access granted&quot; to followers. Funders follow organisations. 
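</p><p>An &quot;idea published&quot; activity in this model might look something like the following. Only the ActivityStreams envelope is standard; the actor URLs, the use of a plain Note object, and the tag are illustrative:</p><pre><code>{
  "@context": "https://www.w3.org/ns/activitystreams",
  "type": "Create",
  "actor": "https://example-charity.org/actor",
  "to": ["https://example-charity.org/followers"],
  "object": {
    "type": "Note",
    "name": "Peer-led money mentoring",
    "content": "A seed idea, loosely costed, linked to our evidence base.",
    "tag": [{ "type": "Hashtag", "name": "#financial-inclusion" }]
  }
}
</code></pre><p>Because the activity is addressed to followers rather than the public collection, only approved followers receive it. 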
ActivityPub handles private and semi-private content more naturally. You can address activities to specific actors or collections, which maps well to tiered access. The inbox/outbox model provides a built-in audit trail of interactions.</p><p>The downsides: identity is server-bound. If an organisation&apos;s server goes down, their identity goes with it. No native portability. </p><h3 id="a-pragmatic-decision-protocol-agnostic-with-a-bias"><strong>A pragmatic decision: protocol-agnostic, with a bias</strong></h3><p>Neither protocol is perfect as-is. AT Protocol offers better identity, portability, and structured data. ActivityPub offers better access control, maturity, and the pub/sub model we need.</p><p>So maybe the best path is to design the data model and vocabulary first, protocol-agnostic. Start with the data model. Get it right. Then implement federation beneath it.</p><p>The central artefact isn&apos;t a platform or a protocol but a <strong>local agent</strong> running on or for each organisation.</p><p>This is the organisation&apos;s node in the network. It pulls, it assembles, it federates. It does the work so people don&apos;t have to.</p><ul><li><strong>Pulls evidence from existing systems</strong> &#x2014; CRM data, service delivery records, financial information, impact metrics. Organisations don&apos;t maintain yet another system; the agent connects to what they already use via APIs, MCP servers, database connections, and document indexing.</li><li><strong>Maintains the living profile</strong> &#x2014; assembling structured data and narrative into the canonical representation. 
An LLM assists with summarisation, structuring, and keeping things current, but the organisation retains editorial control.</li><li><strong>Manages ideas</strong> &#x2014; providing a space to draft, refine, tag, and publish ideas, linked to evidence and to ideas from other organisations.</li><li><strong>Handles access control</strong> &#x2014; processing requests from funders, logging who sees what, managing tiered permissions. The organisation decides. The agent enforces.</li><li><strong>Speaks to the protocol</strong> &#x2014; federating the profile outward via ActivityPub, AT Protocol, or both. The agent is where the protocol-agnostic data model meets the specific federation mechanism.</li></ul><p>For some organisations, the agent runs on their own infrastructure. For others, it&apos;s a managed service hosted on their behalf but with their data remaining sovereign. The key is that the organisation owns the agent&apos;s output regardless of where it runs.</p><p>This architecture also means the agent can grow incrementally. An organisation might start with just a static profile generated from their website and Companies House data (something like what<a href="https://llmstxt.social/?ref=tomcw.xyz"> <u>llmstxt.social</u></a> already does). Over time, they connect their CRM, start publishing ideas, grant access to funders. Start small and build it up. </p><h2 id="what-this-enables"><strong>What this enables</strong></h2><p><strong>For organisations:</strong> maintain your own truth, once, and share it on your terms. Stop rewriting the same information for every funder. Make your ideas visible before anyone asks. See who&apos;s looking at your work. Connect your ideas to others doing complementary work.</p><p><strong>For funders:</strong> discover organisations doing relevant work, rather than waiting for them to find your call. See patterns and clusters across places and themes. Fund into ecosystems, not isolated bids. 
</p><p><strong>For the system:</strong> Make smaller organisations with strong practice more visible, not less. Create mutual transparency about who holds power and how it&apos;s exercised. Build a shared evidence layer that benefits everyone.</p><h2 id="some-caution-obviously"><strong>Some caution, obviously</strong></h2><p>This isn&apos;t without hard problems. Some of them:</p><p><strong>Capacity.</strong> Even with low-friction tooling, some organisations won&apos;t be able to maintain a profile. The system needs to account for this. </p><p><strong>Equity by design, not afterthought.</strong> If the system makes it easier for well-resourced organisations to maintain rich, polished profiles while smaller organisations struggle to keep theirs current, we&apos;ve just moved the privilege barrier from &quot;who writes the best application&quot; to &quot;who maintains the best profile.&quot; </p><p><strong>Governance of the standard.</strong> Who maintains the schema? How does it evolve? This needs to be stewarded by a body that represents both organisations and funders, and ideally by people who understand open standards development.</p><p><strong>Preventing surveillance.</strong> If funders can see everything, this could become a new <a href="https://en.wikipedia.org/wiki/Panopticon?ref=tomcw.xyz" rel="noreferrer">panopticon</a>. The access control layer and audit trail are essential, not optional. Organisations must be able to see exactly who has looked at what, and revoke access. The power relationship must be genuinely mutual.</p><p><strong>Interoperability with existing systems.</strong> The CRM landscape alone is fragmented: Salesforce, Lamplight, Charitylog, Airtable, spreadsheets. The local agent needs robust MCP integrations for all of these. </p><p><strong>Funder adoption.</strong> Funders would need to move from designing calls and reading applications to searching, discovering, and entering dialogue. 
The system probably needs to coexist with traditional grant-making for a long transition period.</p><h2 id="building-on-what-exists"><strong>Building on what exists </strong></h2><p>This isn&apos;t starting from zero. But it&apos;s worth being honest about the limitations of what already exists, because they point toward what needs to be different.</p><p><strong>Open Referral UK and HSDS</strong> provide a standard for describing services: what&apos;s available, where, for whom. But Open Referral is defined by <em>services</em>, not by <em>organisations and their ideas</em>. It assumes things are relatively static: a service exists, it has a location and eligibility criteria, you can look it up. That&apos;s useful for directories, but that&apos;s not how organisations or ideas behave in reality. </p><p><strong>360Giving</strong> has demonstrated that open grant data is possible and valuable. But it describes funding <em>after the fact</em>: who gave what to whom. It doesn&apos;t address the question of how resource finds purpose in the first place.</p><p><strong>The Charity Commission</strong> is an interesting case. They&apos;re increasingly requiring (or planning to require) organisations to report on &quot;<a href="https://www.charitysorp.org/?ref=tomcw.xyz" rel="noreferrer">impact</a>&quot;, whatever that means (this could be its own post entirely, because the gap between what regulators mean by impact and what organisations experience as meaningful change is vast). But the direction of travel is clear: more structured reporting, more data, more accountability. The use case driving this is regulatory. Not necessarily improving anything for organisations or the people they work with.</p><p>And so, if things are heading this way anyway, shouldn&apos;t we try to do it better? 
Shouldn&apos;t we build the infrastructure so that the same structured data that satisfies a regulatory requirement <em>also</em> makes an organisation discoverable to funders, <em>also</em> connects their ideas to potential collaborators, <em>also</em> tells a richer story than a tick-box compliance exercise?</p><h2 id="what-comes-next"><strong>What comes next</strong></h2><p>This is an early exploration, not a finished concept; sometimes the writing is the thinking. If it were left to me, maybe I&apos;d be thinking about the following: define the schema properly, with input from organisations and funders. Build a reference agent, starting simple, with a static profile and manual updates, adding system integrations over time. Then test it with real relationships: a willing funder, a small cohort of organisations, a genuine alternative to a funding round. See what breaks.</p><p>Or we can just keep going round and round, ever chasing efficiencies and optimising a broken system. </p>]]></content:encoded></item><item><title><![CDATA[The Case for Loose Ends]]></title><description><![CDATA[Is our obsession with neat little bows on workshops and projects actually diminishing the impact they have? The long-term impact. 
The impact we hope they have beyond our involvement?]]></description><link>https://tomcw.xyz/the-case-for-loose-ends/</link><guid isPermaLink="false">69b98dd24e4f0e383144c80b</guid><category><![CDATA[Culture]]></category><category><![CDATA[Learning]]></category><category><![CDATA[Open Working]]></category><dc:creator><![CDATA[Tom Watson]]></dc:creator><pubDate>Wed, 18 Mar 2026 14:13:07 GMT</pubDate><media:content url="https://images.unsplash.com/photo-1644432304166-52a42d78a5a6?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wxMTc3M3wwfDF8c2VhcmNofDYzfHx0aHJlYWRzfGVufDB8fHx8MTc3Mzg0MzAzNnww&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=2000" medium="image"/><content:encoded><![CDATA[<img src="https://images.unsplash.com/photo-1644432304166-52a42d78a5a6?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wxMTc3M3wwfDF8c2VhcmNofDYzfHx0aHJlYWRzfGVufDB8fHx8MTc3Mzg0MzAzNnww&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=2000" alt="The Case for Loose Ends"><p>Is our obsession with neat little bows on workshops and projects actually diminishing the impact they have? The long-term impact. The impact we hope they have beyond our involvement?</p><p>I&apos;ve been designing and running a fair few workshops recently. Structure is important. How we start, how things flow, and how we end. I&apos;ve been thinking about learning and application of ideas, I&apos;ve been thinking about what happens in the room and more importantly what happens outside the room, beyond your control as a facilitator. And I&apos;ve been wondering whether in our efforts to show a workshop &apos;works&apos;, we close off deeper learning, and how sometimes we need to intentionally leave open threads and open questions. </p><h2 id="the-problem-with-resolution">The problem with resolution</h2><p>Most of what happens in a workshop or a session is only ever useful if people can take it back into their own context and make sense of it there. 
A good facilitator will create rooms and spaces that support the workshop. But however well designed, the room is artificial. The real work happens when someone is back at their desk on a Tuesday morning trying to figure out what any of it means for the decision they&apos;re actually facing.</p><p>If we resolve everything in the session with neat actions, clear conclusions, a satisfying arc, there is the potential we&apos;ve artificially done the sense-making for them. We&apos;ve removed the productive friction of trying to figure out what it means for them, in their context, outside the room. We&apos;ve made it easy to file the experience away. </p><p>But what if we leave a question hanging, one that is genuinely and intentionally unresolved, not because we ran out of time but because we chose to? One that forces them to contextualise, to test an idea against their own reality, to keep thinking after the room has emptied.</p><p>Maybe the loose end isn&apos;t a failure of facilitation. In some cases maybe it&apos;s <em>the </em>mechanism.</p><h2 id="we-design-too-linearly">We design too linearly</h2><p>I think part of the problem is that we default to linear workshop design and the need to wrap up neatly. Input, discussion, activity, action plan. It feels productive. It maps well to a session plan. But sometimes it skips the slower, harder cognitive work &#x2014; the sitting with ambiguity, the divergent thinking, the wrestling with contradiction &#x2014; that actually changes how people see and operate.</p><p>I&apos;ve written about this in two frameworks &#x2014; the <a href="https://good-ship.co.uk/frameworks/?ref=tomcw.xyz">five modes of thinking</a> and the <a href="https://good-ship.co.uk/frameworks/modes-of-problem-solving.svg?ref=tomcw.xyz">modes of problem solving</a>. Both make the same core point: these aren&apos;t linear steps; they&apos;re modes you move between. But most workshop design treats them as a pipeline. 
We skip sensing and imagining, jump straight to designing and acting, and wonder why nothing sticks. We rush past the modes that require sitting with not-knowing because they&apos;re uncomfortable and hard to report on.</p><p>We&apos;re essentially <strong><em>optimising</em></strong> for what&apos;s legible in the room rather than what&apos;s <strong><em>transformative</em></strong> beyond it.</p><h2 id="honesty-about-complexity">Honesty about complexity</h2><p>There&apos;s a deeper issue too, particularly in systems change work. If we&apos;re genuinely working on complex, adaptive problems, the kind that don&apos;t have neat solutions, then wrapping a session up tidily is a kind of dishonesty. It implies that the problem can be bounded by a two-hour slot and a set of post-it notes.</p><p>The open question is the honest response to complexity. Saying &quot;we haven&apos;t resolved this, and that&apos;s OK, and here&apos;s why&quot; is more respectful of the work and of the people doing it than pretending we&apos;ve cracked something we haven&apos;t.</p><h2 id="intentional-not-accidental">Intentional, not accidental</h2><p>I do want to be clear though: this isn&apos;t an argument against structure, or for vague, meandering sessions where nothing happens. In fact I am a BIG advocate for structure, and get a bit annoyed if I&apos;m in sessions where it&apos;s lacking! There&apos;s a world of difference between a loose end that exists because someone didn&apos;t plan properly and one that exists because a facilitator made a deliberate choice to leave space for ongoing sense-making.</p><p>The skill is knowing what to close and what to leave open. Naming it. Being honest with people that some things are meant to stay unresolved, that the discomfort of an open question is intentional, and helping them feel comfortable with that. It&apos;s not always easy. People like certainty. </p><p>And to be clear, not every workshop or session needs this. 
Sometimes things can be neat; some things do need resolving. The skill, and the bravery, is in knowing which is which and designing for it. </p><h2 id="what-im-trying-to-do-differently">What I&apos;m trying to do differently</h2><p>In the organisational resilience work, and the TechFreedom sessions we are creating, I&apos;m experimenting with this. Designing sessions that explicitly don&apos;t resolve. Ending with questions rather than actions. Creating arcs that span weeks, not hours, so that the space between sessions becomes the real learning environment, not dead time between the important bits.</p><p>It&apos;s harder to justify. It&apos;s harder to evidence. It looks less impressive on paper and it feels <em>uncomfortable</em>, uncertain. But I think it&apos;s closer to how change actually works &#x2014; slowly, messily, in the gaps between the things we can point to.</p><p>Maybe the best thing a workshop can do is send someone away with a question they can&apos;t stop thinking about, a niggle that they need to resolve in their time, in their context. Maybe that&apos;s not a loose end, but actually the whole point of it all. Maybe this is the thread worth following.</p>]]></content:encoded></item><item><title><![CDATA[What has the IMD ever done for us?]]></title><description><![CDATA[No-one is arguing that the Index of Multiple Deprivation is bad, but does it actually make a difference? 

A genuine question that&apos;s been bugging me for a while, and so, with a bit of apprehension, I wrote about why. ]]></description><link>https://tomcw.xyz/what-has-the-imd-ever-done-for-us/</link><guid isPermaLink="false">69b18ed54e4f0e383144c647</guid><category><![CDATA[Data]]></category><category><![CDATA[Neighbourhoods]]></category><category><![CDATA[policy]]></category><category><![CDATA[Questions]]></category><dc:creator><![CDATA[Tom Watson]]></dc:creator><pubDate>Wed, 11 Mar 2026 20:04:48 GMT</pubDate><content:encoded><![CDATA[<p>I&apos;m admittedly a little apprehensive about writing this blog, but it&apos;s a question that has been on my mind for a while. There&apos;s a couple of reasons for this apprehension. Firstly, it&apos;s pretty much a given in the data world that The Index of Multiple Deprivation (IMD) is a good thing, and I don&apos;t want to lose my good data card. Secondly, maybe the answer to my question is blindingly obvious and I&apos;m just not smart enough to see it. But this has been bugging me for a while, and so here I go...</p><p>I&apos;ve been using the Index of Multiple Deprivation for years. I&apos;ve pointed people to it. I&apos;ve built products with it. I&apos;ve laid it on maps. We all have, right? You&apos;ve probably cited it in reports, used it to justify funding bids, and watched it shape commissioning decisions across the country.</p><p>And yet I&apos;ve always had this niggling question: has it actually made anything better?</p><p>This isn&apos;t a technical critique. I&apos;m not here to argue about the weighting of the seven domains or the methodology behind the indicators. Smarter people than me have done that work. 
What I&apos;m asking is something more fundamental: has the existence of the IMD, this tool we&apos;ve embedded so deeply into how policy gets made across the UK, actually improved outcomes for the communities it describes?</p><h2 id="the-case-for">The case for</h2><p>Some would argue, yes, of course, stop asking ridiculous questions, Tom. The IMD introduced a multidimensional way of understanding deprivation: not just income, but health, education, employment, housing, crime, the living environment. It helped us think about intersectionality across different dimensions of need. It gave us a shared language and a common dataset that everyone from local councils to national government could point to.</p><p>And that&apos;s a good thing. Having a consistent, open dataset that covers England is genuinely useful. But has it made a <em>difference</em>?</p><h2 id="the-uncomfortable-evidence">The uncomfortable evidence</h2><p>In 2021, the Institute for Community Studies at The Young Foundation published something that should have landed like a bombshell. They found <a href="https://youngfoundation.b-cdn.net/wp-content/uploads/2021/07/ICS-WHY-DONT-THEY-ASK-US-compressed.pdf?x49395=&amp;ref=tomcw.xyz" rel="noreferrer">a 0% change in the relative economic advantage of the UK&apos;s most deprived neighbourhoods over 15 years, despite targeted investment of over &#xA3;20 billion across consecutive Labour, coalition, and Conservative governments.</a></p><p>Zero percent over fifteen years, with the data there to use the whole time. </p><p>The most deprived places in 2004 were, by and large, the most deprived places in 2019. 
<a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC9667433/?ref=tomcw.xyz" rel="noreferrer">Research looking at deprivation trajectories from 1971 to 2020</a> found that around 82% of areas in the most deprived decile in 2004 were still there in 2010, and by 2015-2019, nearly 88% of the most deprived areas stayed in that bottom decile.</p><p>So here&apos;s my question: if the IMD has been the primary thing guiding where billions of pounds of investment go, and the places it identifies as most deprived remain most deprived decade after decade, what exactly is it doing for us?</p><h2 id="a-static-snapshot-dressed-as-understanding">A static snapshot dressed as understanding</h2><p>The IMD is updated roughly every five years. It&apos;s built from administrative data: benefit claims, educational attainment, health records, crime statistics. It tells us about communities through the lens of what government systems already collect. It&apos;s a portrait drawn from the paperwork.</p><p>And it&apos;s a <em>relative</em> measure. It can tell you one area is more deprived than another, but not by how much. It can&apos;t tell you whether things are getting better in absolute terms. An area could see genuine improvements in the lives of its residents and still sit in the same decile because everywhere else improved too. There&apos;s also another problem, and this is a big one for me: the IMD describes areas, not people. </p><h3 id="false-certainty">False certainty?</h3><p>I sometimes wonder if the IMD gives us a false sense that we know what&apos;s happening and can control it. </p><p>Think about how it gets used in practice. A commissioner pulls up the IMD map or uses one of the many tools (more on this later) that list the areas they need to worry about. They see the red areas, write them into their strategy document, and feel they&apos;re doing the right thing. There&apos;s a comfortable certainty to it: the data says <em>this</em>, so we do <em>that</em>. 
Is this just New Public Management wrapped up in maps?</p><p>But does the IMD actually tell us about what&apos;s happening in those communities? Does it tell us about the relationships between people? About the assets they hold, the community groups, the informal networks, the knowledge and capability that exists in every place? Does it capture what people care about, what they&apos;re trying to build, what keeps them up at night?</p><p>And if I&apos;m certain about anything, which at the moment is not a whole lot, it&apos;s that real change on a local level only comes from people who live there having real agency.</p><p>The IMD tells policymakers about communities. It doesn&apos;t enable communities to tell their own story. And there&apos;s a world of difference between those two things.</p><h2 id="when-certainty-stops-us-learning">When certainty stops us learning</h2><p>Here&apos;s what I think might be the real cost. When we have a dataset that feels authoritative and comprehensive (seven domains! thirty-nine indicators! every small area in England!), we tend to stop asking questions. We stop being curious. We stop going to places and listening to people. Because we already <em>know</em>. The data told us.</p><p>And so we stop learning.</p><p>We wrap new programmes, new funding rounds, new policy initiatives in the same dataset, the same statutory data repackaged in a different way. <a href="https://www.neighbourhoodscommission.org.uk/pride-in-place-data-explorer/?ref=tomcw.xyz" rel="noreferrer">The Independent Commission on Neighbourhoods just released a new dashboard about Neighbourhoods</a>. It&apos;s nice, but guess what: it&apos;s the IMD with a couple of other indexes. </p><figure class="kg-card kg-image-card"><img src="https://tomcw.xyz/content/images/2026/03/Dexter-Idk-GIF.gif" class="kg-image" alt loading="lazy" width="450" height="450"></figure><p>New project, new programme management framework, different logo. 
But the same underlying assumption: that understanding deprivation means looking at the numbers, and that looking at the numbers means we know what to do. And this isn&apos;t even a knock on ICON; we&apos;ve all done it, and again, maybe I&apos;m just not smart enough to know that this really does make a difference. </p><h2 id="so-what-instead"><strong>So what instead?</strong></h2><p>I&apos;m not arguing we should throw the IMD away. Data matters. Understanding patterns of deprivation at scale matters. But I am arguing we need to be honest about what it is and what it isn&apos;t.</p><p>The IMD is a backward-looking snapshot of administrative data. It&apos;s useful for understanding broad patterns. It&apos;s not a substitute for understanding communities, and it&apos;s certainly not a basis for designing interventions.</p><p>Perhaps what we should be doing is leaning into the uncertainty. Acknowledging that no dataset, however well-constructed, can capture the complexity of what&apos;s happening in a place. That the right response isn&apos;t to point at a map and say &quot;there, do something about <em>that</em>&quot; but to go to that place and ask &quot;what&apos;s going on here? What do you need? What are you already doing?&quot;, as I explored in <a href="https://tomcw.xyz/maps-as-conversations/">Maps as conversations</a>.</p><h2 id="an-invitation"><strong>An invitation</strong></h2><p>I&apos;m genuinely asking these questions, not just rhetorically. I&apos;ve used the IMD extensively and I suspect many reading this have too. So I&apos;d love to know:</p><p>Has it changed something for the better in your experience? Has it led to a decision that wouldn&apos;t have been made otherwise, one that actually improved things? Or has it become just something we do? A box we tick, a map we show, a certainty we lean on because the alternative, admitting we don&apos;t really know, is too uncomfortable?</p><p>I think there&apos;s a conversation to be had here. 
One about data and power, about who gets to define a place, and about whether the tools we&apos;ve built to understand deprivation might actually be getting in the way of doing something about it.</p>]]></content:encoded></item><item><title><![CDATA[Exploring conversational database building]]></title><description><![CDATA[Building databases through MCP servers with Airtable and Baserow, and why open source alternatives make this stuff work better. Also giving a real example of the TechFreedom lens applied to two specific tools]]></description><link>https://tomcw.xyz/exploring-conversational-database-building/</link><guid isPermaLink="false">69b0232a4e4f0e383144c5cb</guid><category><![CDATA[ai]]></category><category><![CDATA[Data]]></category><category><![CDATA[Open Infrastructure]]></category><dc:creator><![CDATA[Tom Watson]]></dc:creator><pubDate>Tue, 10 Mar 2026 21:00:46 GMT</pubDate><media:content url="https://tomcw.xyz/content/images/2026/03/1000020424.png" medium="image"/><content:encoded><![CDATA[<img src="https://tomcw.xyz/content/images/2026/03/1000020424.png" alt="Exploring conversational database building"><p>Another in the series of blogs where I try to demystify things I&apos;ve been building and working on, and why they might matter beyond the technical details, while also applying the TechFreedom lens. This time: connecting AI to databases, hitting walls with Airtable&apos;s API, and why open source makes more things possible.</p><p>I&apos;ve been spending a lot of time working within Airtable over the last week or so for a project. I&apos;ve used Airtable for a good number of years and think it&apos;s genuinely good software: intuitive, user-friendly, powerful for a lot of use cases. But like everything, it&apos;s not without its limitations, like when you need to create large bases with complex data structures. Doing it all through the UI becomes a bit of an annoyingly slow process. 
Clicking through field after field, table after table, configuring options one at a time. Airtable lets you create bases from CSV, and their AI assistant is actually pretty good, but when things get beyond the simple, these methods don&apos;t really work.</p><p>As this was what I was actually doing, I decided to build something to solve the immediate problem. A JSON uploader that let me define a data structure in JSON and push it into Airtable via their API. Define your tables, fields, and relationships in a structured format, run the script, and the base is built. I&apos;m honestly surprised Airtable haven&apos;t done this themselves.</p><p>So far so good. But I wondered if I could go further, obviously! The uploader was good, but a little inflexible. It worked perfectly for the exact JSON structure I&apos;d defined, but it was limited to specific structures, so to make tweaks, change something, or handle a slightly different schema, I was back to either modifying the uploader or doing things manually. I could have built an adapter into the uploader (and still might) which allows this flexibility, but my mind went somewhere else first...</p><h2 id="enter-mcp-servers">Enter MCP servers</h2><p>This is where I started thinking about a different approach. Rather than building specific tools for specific tasks, what if I could create a persistent connection between an AI system and Airtable, one that could understand the base, work with it conversationally, and do whatever I needed, when I needed it?</p><p>If you&apos;re not familiar with MCP (Model Context Protocol), here&apos;s the short version: it&apos;s a standard way for AI tools to connect to external services. Think of it as a translator that sits between something like Claude or another AI assistant and a service like Airtable. 
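To make that translator idea concrete, here is roughly the shape of the message an MCP client sends when it invokes a tool. MCP is built on JSON-RPC 2.0 with a tools/call method; the specific tool name (create_table) and its argument shape are my assumptions about what an extended Airtable server might expose, not the official server's API:

```python
import json

# Sketch of the wire format only. The "tools/call" method and the
# {"name": ..., "arguments": ...} params shape come from the MCP spec;
# the create_table tool and its arguments are hypothetical.

def make_tool_call(call_id: int, tool: str, arguments: dict) -> str:
    """Build the JSON-RPC 2.0 message an MCP client sends to invoke a tool."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": call_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })

msg = make_tool_call(1, "create_table", {
    "table_name": "Organisations",
    "fields": [
        {"name": "Name", "type": "singleLineText"},
        {"name": "Postcode", "type": "singleLineText"},
        {"name": "Services", "type": "multipleRecordLinks"},
    ],
})
print(msg)
```

The server's job is then to unpack a message like this and issue the corresponding Airtable API request on your behalf.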
The AI talks in natural language, the MCP server translates that into API calls, and suddenly you can say &quot;create a table called Organisations with fields for name, postcode, and a linked record to Services&quot; and it just... does it.</p><figure class="kg-card kg-image-card"><img src="https://tomcw.xyz/content/images/2026/03/mcp-how-it-works.png" class="kg-image" alt="Exploring conversational database building" loading="lazy" width="2000" height="1125" srcset="https://tomcw.xyz/content/images/size/w600/2026/03/mcp-how-it-works.png 600w, https://tomcw.xyz/content/images/size/w1000/2026/03/mcp-how-it-works.png 1000w, https://tomcw.xyz/content/images/size/w1600/2026/03/mcp-how-it-works.png 1600w, https://tomcw.xyz/content/images/2026/03/mcp-how-it-works.png 2400w" sizes="(min-width: 720px) 720px"></figure><p>Airtable already has an official MCP server. It lets you do certain things: search and analyse data, create records, update existing ones. But it didn&apos;t do ALL the things I wanted, like actually build bases. So I took that as a starting point, extended it, and gave myself more options. I could now connect Claude, Mistral, Gemini, or a local model directly into Airtable and work with it properly, not just reading what was in a base, but creating tables, creating fields, adding records, restructuring things. Conversational database management, essentially.</p><p>And it was incredibly useful. For the kind of work I do (building data structures for organisations, setting up CRM-like systems, creating directories), being able to describe what I need, provide JSON structures, markdown docs of definitions etc., and have it built is a significant step up from clicking through forms.</p><h2 id="hitting-the-walls">Hitting the walls</h2><p>But then I started running into limitations. Not with the MCP server I&apos;d built, but with what Airtable&apos;s API actually allows you to do.</p><p>You can&apos;t create views via the API. You can&apos;t create interfaces. 
You can&apos;t create certain field types, formula fields being the big one, which are actually really fundamental to how most complex Airtable bases work. You can list and remove views, but you can&apos;t create them programmatically. </p><p>Now, these aren&apos;t technical limitations in the traditional sense. Airtable could expose these capabilities. They&apos;ve chosen not to. And when you think about why, it makes a kind of commercial sense. Every capability you can&apos;t automate is a capability that keeps you in their UI. Every limitation in the API is a reason you need to keep logging in, keep your team on paid accounts, keep building within their ecosystem rather than around it.</p><p>This is a strategic decision, not a technical one.</p><figure class="kg-card kg-image-card"><img src="https://tomcw.xyz/content/images/2026/03/api-comparisons.png" class="kg-image" alt="Exploring conversational database building" loading="lazy" width="960" height="540" srcset="https://tomcw.xyz/content/images/size/w600/2026/03/api-comparisons.png 600w, https://tomcw.xyz/content/images/2026/03/api-comparisons.png 960w" sizes="(min-width: 720px) 720px"></figure><h2 id="baserow-the-open-source-alternative">Baserow: the open source alternative</h2><p>This got me thinking about alternatives, specifically <a href="https://baserow.io/?ref=tomcw.xyz">Baserow</a>. It&apos;s an open source, EU-built alternative to Airtable.</p><p>Baserow is API-first. That means every feature is designed to be an integration endpoint. When I extended the Baserow MCP server, I found I could do significantly more than with Airtable. The API is more complete because it&apos;s not designed to keep you inside a walled garden; it&apos;s designed to be useful: creating views, managing fields (including formulas), working with the full range of capabilities that the UI offers. 
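As a sketch of what that API-first difference looks like in practice, here is how creating a view programmatically might be prepared against Baserow's REST API. The endpoint path, payload shape, auth scheme, and table id are my assumptions from reading Baserow's docs, so treat them as things to verify rather than a definitive recipe:

```python
import json
from urllib import request

BASEROW_URL = "https://api.baserow.io"   # or your self-hosted instance
TABLE_ID = 1234                          # hypothetical table id
AUTH_HEADER = "JWT <your-access-token>"  # placeholder; check which auth scheme this endpoint needs

def build_create_view_request(name: str, view_type: str = "grid") -> request.Request:
    """Build (without sending) a POST that would create a new view on a table."""
    body = json.dumps({"name": name, "type": view_type}).encode()
    return request.Request(
        f"{BASEROW_URL}/api/database/views/table/{TABLE_ID}/",
        data=body,
        method="POST",
        headers={"Authorization": AUTH_HEADER,
                 "Content-Type": "application/json"},
    )

req = build_create_view_request("Active organisations")
print(req.get_method(), req.get_full_url())
```

With Airtable there is currently no equivalent endpoint to point this at, which is exactly the kind of ceiling being described here.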
The gap between what you can do in the interface and what you can do via the API is much, much smaller.</p><p>Now of course, even saying all that, Baserow isn&apos;t perfect either. It doesn&apos;t have every feature Airtable has. The ecosystem is smaller, the polish isn&apos;t quite there in places. But the foundations are different in ways that potentially make it more useful for some organisations; it&apos;s all just about choosing what you value.</p><h2 id="the-techfreedom-lens">The TechFreedom lens</h2><p>This experience maps really well onto the kind of thinking we do in <a href="https://techfreedom.org.uk/?ref=tomcw.xyz">TechFreedom</a>, the programme through which Doug Belshaw and I are helping mission-driven organisations examine their technology dependencies. We look at technology choices through five lenses:</p><figure class="kg-card kg-image-card"><img src="https://tomcw.xyz/content/images/2026/03/techfreedom-five-lenses.png" class="kg-image" alt="Exploring conversational database building" loading="lazy" width="2000" height="1125" srcset="https://tomcw.xyz/content/images/size/w600/2026/03/techfreedom-five-lenses.png 600w, https://tomcw.xyz/content/images/size/w1000/2026/03/techfreedom-five-lenses.png 1000w, https://tomcw.xyz/content/images/size/w1600/2026/03/techfreedom-five-lenses.png 1600w, https://tomcw.xyz/content/images/2026/03/techfreedom-five-lenses.png 2400w" sizes="(min-width: 720px) 720px"></figure><p><strong>Jurisdiction.</strong> Where is your data? Who has legal authority over it? Airtable is US-based, data primarily in the US. Baserow is Dutch, data stored in Amsterdam.</p><p><strong>Business continuity.</strong> What happens if the vendor disappears or changes their pricing? With Baserow, the MIT licence means the software continues regardless. Your data is in PostgreSQL, not a proprietary format.</p><p><strong>Lock-in.</strong> How easy is it to leave? 
When your API won&apos;t let you recreate your views, formulas, or interfaces programmatically, migration is harder than it needs to be.</p><p><strong>Surveillance and data access.</strong> Who else can see your data, and under what legal frameworks? US-hosted and EU-hosted data sit under different rules.</p><p><strong>Cost exposure.</strong> Airtable&apos;s per-seat pricing scales quickly. Baserow starts free, with paid plans from $10 per user per month. Self-hosting eliminates per-user costs entirely.</p><p>None of these lenses give you a simple answer. What they aim to do is give you a framework for thinking about what you&apos;re trading when you pick a tool.</p><p>When you&apos;re choosing tools, the question isn&apos;t just &quot;what can this do today?&quot; It should be &quot;what will this let me do tomorrow?&quot; Proprietary platforms set that ceiling based on their commercial interests. Open platforms set it based on what you&apos;re willing to build. For organisations watching their budgets and trying to stay resilient, I think that difference adds up.</p><hr><p>If you&apos;re thinking about any of this stuff (database choices, AI integrations, technology dependencies), or if you want to explore what TechFreedom could look like for your organisation, get in touch: find me on <a href="https://www.linkedin.com/in/tomcampbellwatson/?ref=tomcw.xyz">LinkedIn</a>, or sign up to the TechFreedom cohort!</p>]]></content:encoded></item><item><title><![CDATA[Why we created TechFreedom, and why we think it&apos;s important]]></title><description><![CDATA[Launching TechFreedom, a cohort-based programme to help you make choices about your tech. 
Why we created it, why we think it's important, what it's all about and an invite to join us!]]></description><link>https://tomcw.xyz/why-we-created-techfreedom-and-why-we-think-its-important/</link><guid isPermaLink="false">69a9ec244e4f0e383144c46f</guid><category><![CDATA[Tech]]></category><category><![CDATA[Strategy]]></category><category><![CDATA[Data]]></category><category><![CDATA[Open Infrastructure]]></category><dc:creator><![CDATA[Tom Watson]]></dc:creator><pubDate>Fri, 06 Mar 2026 09:16:48 GMT</pubDate><media:content url="https://tomcw.xyz/content/images/2026/03/Screenshot-2026-03-06-09.14.36-1.png" medium="image"/><content:encoded><![CDATA[<img src="https://tomcw.xyz/content/images/2026/03/Screenshot-2026-03-06-09.14.36-1.png" alt="Why we created TechFreedom, and why we think it&apos;s important"><p>
I&apos;ve spent years working with organisations who exist to do good in the world. Organisations whose missions are rooted in justice, community, care, and equity. And almost without exception, the digital infrastructure those missions run on belongs to companies who don&#x2019;t always share the same values.</p><h2 id="technology-is-a-culture-choice">Technology is a culture choice</h2><p>I wrote recently a short linkblog riffing on a great post from RebootDemocracy asking <a href="https://tomcw.xyz/linkblog/who-will-shape-ai-in-the-public-interest/"><u>who will shape AI in the public interest</u></a>. The post makes the case that governments have extraordinary power to shape how technology companies behave through procurement, and they&apos;re largely failing to use it.</p><p>What I liked most was how clearly it articulated something I&apos;ve argued for a long time: <strong>technology choices are cultural and political choices</strong>. Procurement isn&apos;t just a financial and risk management process, even though it is often treated as such. The tools you choose shape how your organisation works, what it values, and who holds power. Choose Microsoft and you&apos;re choosing a particular culture, one that in my opinion tends towards closure, risk aversion, and a narrowing of what feels possible. Choose alternatives and maybe you open up different ways of working, collaboration, different relationships with data, different assumptions about who gets to decide. Now maybe that IS how you want to work, or maybe you are willing to accept the tradeoff for other reasons, but did you ever make this choice?</p><p><strong>Technology is never neutral</strong>, and now with AI entering the picture, the cultural shaping gets even more pronounced. Microsoft probably leads to Copilot, maybe to OpenAI. Google leads down a different path, but one designed to keep you in that ecosystem. 
The way models respond, <a href="https://tomcw.xyz/people-as-code-collapse/"><u>what they surface and what they don&apos;t, starts to shape how people think</u></a>. Every choice is a cultural one, even when it&apos;s never described that way.</p><p>And it&apos;s not just about <em>what</em> you choose. It&apos;s about whether anyone in the room even realises there was a choice to be made in the first place.</p><h2 id="lists-of-alternatives-are-nice-but-they-dont-shift-things">Lists of alternatives are nice, but they don&apos;t shift things</h2><p>About a year ago (around the time Elon was tearing out the heart of digital government in the USA) I shared a site listing <a href="https://tomcw.xyz/linkblog/european-alternatives/"><u>European alternatives</u></a> to US-based cloud services.&#xA0; It&apos;s a great resource,&#xA0;everything from hosting to domain registrars to VPNs and browsers. Whether you think the concern about US tech infrastructure is overblown or not, I said at the time there were three good reasons to look: over-reliance on a few companies is never good; there are genuinely excellent tools in there; and have you looked at your risk register lately?</p><p>The link between knowing and acting, between the operational and the strategic, can be a bridge too far. A list of alternatives,&#xA0; however good, doesn&apos;t really change behaviour on its own. Maybe people look, nod, bookmark it, and then go back to the tools they know. Not because they&apos;re lazy or don&apos;t care, but because switching can feel hard, the risks feel abstract, and nobody has time to figure it all out alone.</p><p>That&apos;s the gap <a href="https://techfreedom.eu/?ref=tomcw.xyz"><u>TechFreedom</u></a> is trying to fill. 
Not another list, but a structured space to actually work through what your dependencies are, what risks they carry, and what you want to do about it.</p><h2 id="its-not-easy-and-were-not-pretending-it-is">It&apos;s not easy, and we&apos;re not pretending it is</h2><p>The ethics of big tech are not simple. Most organisations use the tools they use for good reasons: they work, they&apos;re familiar, and they&apos;re often subsidised or free** for mission-driven organisations. We&#x2019;re not interested in shaming anyone for using any particular technology; it&apos;s simply about making invisible dependencies visible so you can make deliberate choices, at your own pace.</p><p>I&apos;ll be honest about my own practice too. I try to use open source wherever I can. This blog is on Ghost because it&apos;s open source. It would have been easier to be on Substack, honestly, but I made the choice deliberately. I build tools openly, share frameworks under Creative Commons, and default to open data. But I still use plenty of US-based infrastructure, because it&#x2019;s often the easiest or even the best technology for that purpose. The point of all this isn&apos;t purity. The point is being honest about the trade-offs and making them deliberately rather than by default.</p><p>When it comes to building things, especially with AI, I&apos;ve been quite vocal about designing for adaptability, using multiple AI platforms and APIs deliberately, not because I think any one of them is utterly evil*, but because designing for multiple platforms builds resilience. </p><p>The <a href="https://techfreedom.eu/manifesto/?ref=tomcw.xyz"><u>TechFreedom manifesto</u></a> puts it plainly: no single company should have the power to stop an organisation working overnight. Essential services need clear exit paths, backups, and alternatives. If I only build on one provider&apos;s API and they change their pricing, their terms, or their politics, I&apos;m stuck. 
The same logic applies to any organisation&apos;s technology stack. <strong><em>Resilience means having options</em></strong>. It means knowing what your exit paths look like before you need them.</p><h2 id="we-built-techfreedom-on-european-infrastructure-it-was-harder">We built TechFreedom on European infrastructure. It <em>was</em> harder.</h2><p>When Doug and I set up techfreedom.eu, we made a deliberate choice to host on European infrastructure. The domain is European. The hosting is European. The tools we use for the programme itself are privacy-respecting and, where possible, open.</p><p>Honestly, it was just a bit harder to do. Not dramatically harder, but enough friction that I understand why most people don&apos;t bother. The defaults all pull you towards US platforms: they&apos;re often smoother, better documented, more integrated with everything else. Choosing differently takes a bit more effort, a bit more research, a few more decisions.</p><p>That friction is itself the problem. And it&apos;s one of the reasons we think the programme matters. If even people who do this for a living find it takes extra effort, imagine how it feels for an operations manager at a small organisation who just needs things to work.</p><h2 id="risk-registers-and-the-questions-nobody-asks">Risk registers and the questions nobody asks</h2><p>In my organisational resilience work, I&apos;ve spent years helping organisations think about how to anticipate, prepare, respond and adapt. Looking ahead, trying to give yourself options. What are your single points of failure? What&apos;s your plan B?</p><p>In my experience, most organisations have never asked these questions about their technology. They&apos;ve never mapped every tool they depend on and scored those tools against real risk lenses like <strong>jurisdiction, continuity and surveillance</strong>. Tech decisions are often made on technical proficiency and cost. 
They&apos;ve often never looked at their tech stack the way they&apos;d look at their finances or their safeguarding, as something that needs active governance.</p><h2 id="this-is-done-better-together">This is done better together</h2><p>We&#x2019;ve been quite intentional about TechFreedom being cohort-based. People working through the same process together over three sessions. You learn as much from hearing how someone else is navigating the same challenges as you do from any framework or facilitator. You spot blind spots in each other&apos;s thinking.&#xA0;</p><p>Practically, it&apos;s three two-hour sessions, spaced a few weeks apart. In the first, you map your technology stack, every tool, including the ones you&apos;ve forgotten about. In the second, you score those dependencies against five risk lenses: <strong>jurisdiction, continuity, surveillance, lock-in, and cost exposure</strong>. In the third, you make some choices: priorities or a flexible roadmap, with quick wins now, planned transitions over the next year, and strategic shifts over the longer term.</p><p>You leave with a clear picture of where you are, where your risks sit, and a practical plan for what to do about it. 
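</p><p>As a rough illustration of the scoring step (the numbers here are entirely hypothetical, not a real assessment of any product), the exercise boils down to something like:</p>

```python
# Hypothetical scores from a mapping exercise: each dependency rated
# 1 (low risk) to 5 (high risk) against the five lenses, in the order
# jurisdiction, continuity, surveillance, lock-in, cost exposure.
stack = {
    "Airtable": [4, 3, 3, 4, 4],
    "Microsoft 365": [4, 2, 4, 5, 3],
    "Baserow (self-hosted)": [1, 2, 1, 1, 2],
}

# Rank dependencies by total risk to decide where to focus first.
ranked = sorted(stack.items(), key=lambda kv: sum(kv[1]), reverse=True)
for tool, scores in ranked:
    print(f"{tool}: total risk {sum(scores)} / 25")
```

<p>Nothing about the arithmetic is clever; the value is in forcing every dependency onto the same five axes so that priorities become comparable.</p><p>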
Plus a cohort of peers who are on the same journey, and people to keep you accountable.</p><p>If that feels like something you want to be a part of <a href="https://techfreedom.eu/?ref=tomcw.xyz" rel="noreferrer">read more here</a> or leave your email in the form below and we&#x2019;ll be in touch!</p><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://driftforms.app/f/techfreedom-sign-up-BRp1Gi?ref=tomcw.xyz"><div class="kg-bookmark-content"><div class="kg-bookmark-title">TechFreedom sign up</div><div class="kg-bookmark-description"></div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://driftforms.app/icon.svg?icon.0ab92f7d.svg" alt="Why we created TechFreedom, and why we think it&apos;s important"><span class="kg-bookmark-author">Drift</span></div></div><div class="kg-bookmark-thumbnail"><img src="https://driftforms.app/f/techfreedom-sign-up-BRp1Gi/opengraph-image?3edb98d276c3076b" alt="Why we created TechFreedom, and why we think it&apos;s important"></div></a></figure><p>*I mean some might genuinely be evil</p><p>**free to a point and even then those corporations can change their terms quickly without much warning (yes I&apos;m looking at you Microsoft)</p>]]></content:encoded></item><item><title><![CDATA[Bridging the Data Gap: A Semantic Translation Layer for UK Poverty Data]]></title><description><![CDATA[Ponderings on how to bridge the data gap. 
Exploring questions, semantic layers, TREs and feedback loops.]]></description><link>https://tomcw.xyz/bridging-the-data-gap-a-semantic-translation-layer-for-uk-poverty-data/</link><guid isPermaLink="false">6988ba27ac80fd04af71d1c0</guid><category><![CDATA[Data]]></category><category><![CDATA[Learning]]></category><category><![CDATA[Questions]]></category><dc:creator><![CDATA[Tom Watson]]></dc:creator><pubDate>Sun, 08 Feb 2026 16:43:05 GMT</pubDate><media:content url="https://tomcw.xyz/content/images/2026/02/data-bridge-slides.pptx--4-.png" medium="image"/><content:encoded><![CDATA[<img src="https://tomcw.xyz/content/images/2026/02/data-bridge-slides.pptx--4-.png" alt="Bridging the Data Gap: A Semantic Translation Layer for UK Poverty Data"><p>Last week I attended a workshop with RSS and JRF looking at Data Gaps in understanding poverty. I had a half (or quarter) baked concept, but I was so overly tired that I couldn&apos;t really form a well thought out way of explaining it. So I went down the rabbit hole after a bit of sleep.</p><hr><p>We talk a lot about data gaps in the UK,&#xA0; the idea that we don&apos;t have enough data to understand the issues we care about. But often, I think, the problem isn&apos;t that the data doesn&apos;t exist; it&apos;s that the people who need it don&apos;t know it exists, can&apos;t find it, or can&apos;t tell whether it answers their question. Like many things, the gap isn&apos;t in the data, it&apos;s in the connection between data and need.</p><p>This is particularly acute for administrative data, the data collected by government departments as a byproduct of delivering services. Benefits claims, tax records, school meal eligibility, homelessness applications, GP registrations. This data is extraordinarily rich, but as I said, its primary use isn&#x2019;t research or understanding; it&#x2019;s delivering services. And that means the way it is described and structured is for that purpose. 
It&apos;s also, for the most part, extraordinarily hard to discover and understand if you&apos;re not already embedded in the system that produced it.&#xA0;</p><p>But government departments aren&#x2019;t the only ones delivering services in this area. Local government and the VCSE are heavily involved in both delivery and trying to understand if our approaches are working.</p><figure class="kg-card kg-image-card"><img src="https://tomcw.xyz/content/images/2026/02/data-bridge-slides.pptx--4--1.png" class="kg-image" alt="Bridging the Data Gap: A Semantic Translation Layer for UK Poverty Data" loading="lazy" width="960" height="540" srcset="https://tomcw.xyz/content/images/size/w600/2026/02/data-bridge-slides.pptx--4--1.png 600w, https://tomcw.xyz/content/images/2026/02/data-bridge-slides.pptx--4--1.png 960w" sizes="(min-width: 720px) 720px"></figure><h3 id="the-three-layers-of-disconnect">The three layers of disconnect</h3><p>When a VCSE organisation is trying to evidence fuel poverty for a funding bid, they&apos;re not going to search for &quot;sub-national fuel poverty statistics LSOA-level DESNZ (the Department for Energy Security and Net Zero, for those following along at home).&quot; They&apos;re going to think: &quot;how many people in my area can&apos;t afford to heat their homes?&quot; And they&apos;re going to hit a wall, not because the data doesn&apos;t exist, but because nothing bridges their question to the answer.</p><p>The disconnect operates at three levels.&#xA0;</p><ul><li>First, existence: people don&apos;t know a dataset is out there at all.&#xA0;</li><li>Second, relevance: even if they find something, they can&apos;t tell whether it answers their specific question.&#xA0;</li><li>Third, usability: even if it&apos;s relevant, the format, geography, or timeliness doesn&apos;t work for them.</li></ul><p>Current data portals and catalogues - <a href="https://www.data.gov.uk/?ref=tomcw.xyz" rel="noreferrer">data.gov.uk</a>, <a 
href="https://stat-xplore.dwp.gov.uk/webapi/jsf/login.xhtml?ref=tomcw.xyz" rel="noreferrer">Stat-Xplore</a>, <a href="https://fingertips.phe.org.uk/?ref=tomcw.xyz" rel="noreferrer">Fingertips</a>, the ONS website - mostly address only the first level, and they do it using keyword search against technical metadata. The metadata describes the data in the language of the producer, not the language of the user.</p><h3 id="what-a-bridge-would-look-like">What a bridge would look like</h3><figure class="kg-card kg-image-card"><img src="https://tomcw.xyz/content/images/2026/02/data-bridge-slides.pptx--6--1.png" class="kg-image" alt="Bridging the Data Gap: A Semantic Translation Layer for UK Poverty Data" loading="lazy" width="960" height="540" srcset="https://tomcw.xyz/content/images/size/w600/2026/02/data-bridge-slides.pptx--6--1.png 600w, https://tomcw.xyz/content/images/2026/02/data-bridge-slides.pptx--6--1.png 960w" sizes="(min-width: 720px) 720px"></figure><p>The idea is simple in concept: build a <strong>semantic translation layer</strong> that sits between user intent and data metadata. Instead of requiring users to learn the language of the data, the system learns the language of users&apos; needs.</p><p>&quot;Oh, you&apos;re trying to understand child poverty in your area? You&apos;d want DWP&apos;s children in low-income families data for the baseline, DfE&apos;s free school meals data gives you a school-level proxy, and you could triangulate with HMRC&apos;s child benefit data,&#xA0; but be aware that FSM eligibility criteria changed with Universal Credit, so the time series has a break.&quot;</p><p>That response does three things: </p><ul><li>it translates from the user&apos;s framing to relevant datasets, </li><li>it suggests combinations across departmental boundaries, </li><li>and it carries contextual caveats about data quality. 
</li></ul><p>These are the functions the bridge needs to perform.</p><h3 id="how-it-could-work-technically">How it could work technically</h3><p>The core technical approach uses embeddings, vector representations that capture semantic meaning,&#xA0; to match user questions to enriched dataset descriptions. Two things get embedded into the same vector space:</p><p><strong>User intents</strong> - the <strong>questions*</strong> people are trying to answer, expressed in their own language. &quot;I want to understand health inequalities in Gateshead.&quot; &quot;I need to evidence demand for youth services.&quot; &quot;I&apos;m writing a bid about digital exclusion among older people.&quot;</p><p><strong>Enriched metadata</strong> - not just the raw catalogue entry, but descriptions of what each dataset can tell you, what questions it&apos;s been used to answer before, what it measures and what it misses, and how it relates to other datasets.</p><p>*stop me if you&#x2019;ve heard me say this before&#x2026;</p><p>When a user poses a question, the system embeds it, finds the nearest metadata vectors, and uses an LLM to generate a plain-language explanation of why each dataset might be relevant including caveats and suggestions for complementary sources.</p><p>The crucial addition is a <strong>feedback loop</strong>. Every time a user says &quot;yes, that dataset was useful&quot; or &quot;no, that wasn&apos;t what I needed,&quot; the system learns. 
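</p><p>A minimal sketch of that matching step, with toy data: the dataset descriptions below are invented, and the bag-of-words &quot;embedding&quot; is a deterministic stand-in for a real trained sentence-embedding model, but the shape of the pipeline is the same: embed the enriched metadata once, embed each incoming question, and rank by cosine similarity.</p>

```python
import math

# Invented examples of enriched metadata: plain-language notes on what
# each dataset can tell you (a real system would hold much richer
# descriptions, provenance and caveats).
DATASETS = {
    "DWP children in low-income families":
        "child poverty low income families local area baseline",
    "DfE free school meals":
        "school level proxy for child poverty free school meals eligibility",
    "DESNZ sub-national fuel poverty":
        "households that cannot afford to heat their homes fuel poverty by area",
}

# Toy embedding: bag-of-words counts over a shared vocabulary,
# normalised to unit length. Swap in a trained encoder in practice.
VOCAB = sorted({w for desc in DATASETS.values() for w in desc.split()})

def embed(text):
    words = text.lower().split()
    vec = [float(words.count(w)) for w in VOCAB]
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec

META_VECTORS = {name: embed(desc) for name, desc in DATASETS.items()}

def match(question, top_n=2):
    """Embed the question and rank datasets by cosine similarity."""
    q = embed(question)
    scores = {name: sum(a * b for a, b in zip(q, vec))
              for name, vec in META_VECTORS.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

print(match("how many people in my area cannot afford to heat their homes"))
```

<p>The feedback loop would then adjust these rankings, for example by boosting datasets that users confirm were useful for similar questions.</p><p>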
Over time, it builds a rich understanding of which data helps answer which kinds of questions,&#xA0; knowledge that currently exists only in the heads of a few specialist analysts.</p><p>Now this use of questions is something I&apos;ve talked a lot about, and previously we at Data For Action explored the idea of a <a href="https://open.substack.com/pub/dataforaction/p/introducing-our-newest-prototype?r=ed0wj&amp;utm_campaign=post&amp;utm_medium=web&amp;ref=tomcw.xyz" rel="noreferrer"><strong>Question Bank </strong></a>which attempted to surface the most important questions people were trying to answer with data. And we built a prototype which did some of this semantic meaning, grouping similar questions automatically. Where I think this idea builds upon that is by focusing on the link between questions and metadata and perhaps proposing an <strong>enhanced semantic metadata standard</strong>. </p><h3 id="starting-with-poverty-data">Starting with poverty data</h3><figure class="kg-card kg-image-card"><img src="https://tomcw.xyz/content/images/2026/02/data-bridge-slides.pptx--2-.png" class="kg-image" alt="Bridging the Data Gap: A Semantic Translation Layer for UK Poverty Data" loading="lazy" width="960" height="540" srcset="https://tomcw.xyz/content/images/size/w600/2026/02/data-bridge-slides.pptx--2-.png 600w, https://tomcw.xyz/content/images/2026/02/data-bridge-slides.pptx--2-.png 960w" sizes="(min-width: 720px) 720px"></figure><p>So I recognise this would require a fair bit of work across every government dataset. But as the workshop, and JRF&#x2019;s focus is on Poverty, let&#x2019;s explore what this might mean. Obviously the thing about poverty is that, well, it&#x2019;s inherently intersectional. 
Understanding poverty in a place requires data from DWP (benefits and income), HMRC (tax credits and earnings), DLUHC (housing costs and deprivation), DfE (free school meals as a proxy), NHS (health inequalities), ONS (census and labour market), and local authorities (council tax support, local welfare). No single department owns the picture.</p><p>This intersectionality forces the system to work across boundaries from day one,&#xA0; which is where both the challenge and the potential value lie.</p><h3 id="a-phased-approach">A phased approach</h3><p>Rather than trying to build the full system immediately, a staged rollout builds value and trust progressively.</p><p><strong>Phase 1: Internal discovery within a single department</strong> - Start within DWP. Help their analysts discover relevant datasets from other teams within their own department. This proves the enriched metadata and embedding approach, delivers immediate value, and requires no cross-departmental agreements. The key question to answer: does this help people find data they didn&apos;t know existed within their own organisation?</p><p><strong>Phase 2: Cross-departmental metadata sharing</strong> -&#xA0; Extend to share enriched metadata, not the underlying data, across departments. Start with DWP and HMRC as the highest-value pairing for poverty analysis. This requires metadata sharing agreements, not data sharing agreements, which is a significantly lighter governance ask. Policy teams working on intersectional questions gain visibility into what evidence exists across government.</p><p><strong>Phase 3: External access</strong> -&#xA0; Open the discovery and translation layer to researchers, local authorities, and VCSE organisations. 
This is where the feedback loop becomes most powerful, because a much wider and more diverse set of use-cases and questions flows back into the system.</p><p>Possibly the most interesting long-term implication is understanding and mapping the demand that emerges from the feedback loop. Over time, the system reveals patterns: &quot;Lots of organisations in the North East are trying to answer questions about X, but no good administrative data source exists.&quot; Or: &quot;This dataset is frequently matched to questions but users consistently report it isn&apos;t granular enough.&quot;</p><p>That&apos;s a powerful feedback signal to data producers. It makes the evidence-based case for what data should be published, at what granularity, with what frequency, and with what metadata. It closes the loop between data supply and data demand in a way that doesn&apos;t currently exist in the UK data landscape.</p><h3 id="beyond-discovery-rethinking-research-access">Beyond discovery: rethinking research access</h3><p>Everything described so far concerns published data, the aggregates, tables, and statistics that government departments make available through their websites and platforms. But some of the most important administrative data never makes it into published outputs. Record-level benefits data, linked tax and earnings records, individual-level health and education data. This is the data that could answer the most pressing questions about poverty, but accessing it currently requires navigating one of the most restrictive research infrastructure models in the world.</p><p>In the UK, detailed administrative data is primarily accessible through Trusted Research Environments - the ONS Secure Research Service, HMRC Datalab, DWP data sharing arrangements, and NHS Digital&apos;s Data Access Request Service. These often require a fully specified research proposal detailing your questions, methodology, required datasets and variables, and expected outputs. 
The process typically takes months. You need institutional affiliation and accreditation. The data never leaves the secure environment.</p><p>This model rests on a fundamental assumption: you must know exactly what you&apos;re looking for before you can look. You cannot just explore, poke at the edges. You cannot say &quot;I think the relationship between housing benefit claims and GP registration patterns might reveal something about how people fall through gaps between services, but I need to see the data to know.&quot; You arrive with a finished question or you don&apos;t arrive at all.</p><p>The result is a system optimised for a narrow type of user,&#xA0; experienced quantitative researchers at accredited institutions who already know the data landscape well enough to name the specific datasets and variables they need. Everyone else is excluded: local authority analysts, VCSE organisations, policy teams working across departmental boundaries, and researchers at earlier stages of inquiry who haven&apos;t yet refined their hypotheses.</p><figure class="kg-card kg-image-card"><img src="https://tomcw.xyz/content/images/2026/02/data-bridge-slides.pptx--7-.png" class="kg-image" alt="Bridging the Data Gap: A Semantic Translation Layer for UK Poverty Data" loading="lazy" width="960" height="540" srcset="https://tomcw.xyz/content/images/size/w600/2026/02/data-bridge-slides.pptx--7-.png 600w, https://tomcw.xyz/content/images/2026/02/data-bridge-slides.pptx--7-.png 960w" sizes="(min-width: 720px) 720px"></figure><p>The Data Bridge concept has the potential to shift this in three ways.</p><p><strong>Better-specified access requests</strong> Even within the existing TRE model, the bridge improves outcomes. A researcher who uses the translation layer to discover that a specific combination of DWP and HMRC variables would answer their question arrives at the application process with a stronger, more precisely scoped request. 
The bridge has done the preliminary investigation that currently requires either insider knowledge or months of desk research. This alone could significantly reduce the friction and failure rate of TRE applications.</p><p><strong>Aggregated demand as evidence for new outputs</strong> - This is where it gets genuinely interesting. Currently, if fifty organisations across the country all want to understand the relationship between in-work poverty and housing costs in their area, each one separately discovers (or fails to discover) that the relevant data exists, separately works out that they&apos;d need linked DWP and DLUHC data, and most give up because TRE access is beyond their reach. The bridge makes this demand pattern visible and quantifiable: &quot;There were 200 queries this quarter attempting to understand in-work poverty and housing costs at local authority level. No published dataset serves this need. The underlying data requires record-level access through separate TRE applications to two departments.&quot;</p><p>This creates an evidence base for creating a new standard output,&#xA0; a pre-linked, appropriately aggregated, routinely published dataset designed to serve demonstrated demand without requiring individual record-level access. The feedback loop builds the business case for data producers to invest in new publications by showing them exactly what users need and how often.</p><p><strong>A permissive middle tier: guided aggregation</strong> -&#xA0; The most ambitious implication is the possibility of a new access model sitting between open published tables and full TRE access. Call it <strong>guided aggregation</strong>. A user specifies their question through the bridge. The system identifies the relevant underlying datasets and variables. 
Rather than granting the user access to records, it runs pre-approved analytical queries against the data within the secure environment and returns aggregated, disclosure-controlled results tailored to the user&apos;s actual question.</p><p>The user never sees a record. The analysis is constrained to prevent re-identification. Statistical disclosure control is applied automatically. But the output is responsive to the specific question being asked, rather than being a one-size-fits-all published table that may not cut the data in the way the user needs.</p><p>This isn&apos;t unprecedented in concept. The <a href="https://www.abs.gov.au/statistics/microdata-tablebuilder/tablebuilder?ref=tomcw.xyz"><u>Australian Bureau of Statistics&apos; TableBuilder </u></a>operates on a similar principle: users construct custom tabulations from microdata, with automated confidentialisation. But combining this with a semantic translation layer that helps users formulate their questions and identify the right data sources would be a significant step beyond what currently exists.</p><p><strong>From research-ready to question-ready</strong> - The UK has invested substantially in making administrative data research-ready through TREs, the ADR UK programme, and the growing network of secure research infrastructure. This investment has been valuable. But research-ready is not the same as question-ready. <strong>Research-ready</strong> means the data is clean, documented, and available in a secure environment for accredited researchers with pre-approved proposals. 
<strong>Question-ready</strong> means the data is accessible at appropriate levels of aggregation and disclosure control to the full range of people who have legitimate questions it could help answer.</p><p>The shift matters because the people closest to the problems that poverty data should illuminate (frontline charities, community organisations, local authority officers, the people being described by the data) are almost entirely excluded from the current access model. They have questions. The data has answers. The bridge isn&apos;t just about connecting the two but about building an infrastructure that treats their questions as legitimate grounds for access, not just the questions that arrive wrapped in a formal research proposal from a Russell Group university.</p><h3 id="what-this-isnt">What this isn&apos;t</h3><p>It&apos;s worth being clear about boundaries. This doesn&apos;t replace TREs or the safeguards they provide; record-level data access for complex research will always need secure environments and governance. It doesn&apos;t remove the need for analytical skills, statistical literacy, or human judgement about data quality. And the guided aggregation concept would need significant work on automated disclosure control and governance frameworks before it could operate at scale.</p><p>What it does is challenge the assumption that the only legitimate interaction with administrative data is a fully pre-specified research project. 
It suggests that <strong>discovery</strong>, <strong>exploration</strong>, and <strong>question-driven analysis</strong> are valid modes of engagement with public data,&#xA0; and that building infrastructure to support them would unlock value that the current model leaves on the table.</p><p>More practically, it takes the knowledge that currently lives in the heads of a small number of specialist intermediaries,&#xA0; the people who know what data exists, where to find it, what it can and can&apos;t tell you, and how different sources relate, and makes that knowledge systematically accessible. And it takes the questions that thousands of organisations ask in isolation and makes them collectively visible, creating the demand signal that data producers need to justify investment in better, more accessible outputs.</p><hr><p>It&apos;s also likely that this isn&apos;t the solution, or it&apos;s simply unworkable. What I heard in the workshop was people who think about this a lot more than me, with a lot more knowledge than me. And so, who really am I to think I&apos;ve got the answer? Just someone who thinks questions are important, and that the biggest barrier to data isn&apos;t really data at all. <br></p>]]></content:encoded></item><item><title><![CDATA[The Interface: Building Blocks for Just-in-Time Software]]></title><description><![CDATA[The second of my 8 ideas to explore in 2026 in AI and technology. In this one I'm exploring just-in-time software, like Lego blocks we pull together when we need them. 
]]></description><link>https://tomcw.xyz/building-blocks-for-just-in-time-software/</link><guid isPermaLink="false">693c95252174e13db933bebc</guid><category><![CDATA[ai]]></category><category><![CDATA[Tech]]></category><category><![CDATA[Strategy]]></category><dc:creator><![CDATA[Tom Watson]]></dc:creator><pubDate>Sun, 01 Feb 2026 21:33:02 GMT</pubDate><media:content url="https://images.unsplash.com/photo-1575470522418-b88b692b8084?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wxMTc3M3wwfDF8c2VhcmNofDd8fGxlZ28lMjB8ZW58MHx8fHwxNzY5OTgxNDAwfDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=2000" medium="image"/><content:encoded><![CDATA[<img src="https://images.unsplash.com/photo-1575470522418-b88b692b8084?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wxMTc3M3wwfDF8c2VhcmNofDd8fGxlZ28lMjB8ZW58MHx8fHwxNzY5OTgxNDAwfDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=2000" alt="The Interface: Building Blocks for Just-in-Time Software"><p>The second of 8 ideas I set out in <a href="https://tomcw.xyz/predictions-for-2026-or-what-im-actually-thinking-about/" rel="noreferrer">this blog at the end of 2025</a>.</p><p>If I were going to build a startup in 2026 (and I might), here&apos;s what I would be looking at: not generative dashboards, but generative components that let people (and especially the social sector) build exactly what they need, when they need it.</p><h2 id="the-2023-prediction">The 2023 Prediction </h2><p>Back in 2023 (ish), I thought we&apos;d soon stop building static dashboards and instead just talk to our data. Ask questions, get visualisations on the fly. &quot;Show me referrals by area last month.&quot; Boom, chart appears.</p><p>And we&apos;re sort of there. You can absolutely do this now. Claude and GPT can generate charts from data. Ask a question, get a dashboard.</p><p>Except it doesn&apos;t really work. Not because the AI can&apos;t generate the interface. But because our data isn&apos;t really ready for it. 
Our data is still messy, siloed, inconsistently structured, and locked in systems that can&apos;t expose it cleanly or easily. Try spinning up a just-in-time dashboard while stuck in a death loop of Microsoft admin privileges. </p><p>But as we move into 2026, I have begun wondering again about just-in-time interfaces: not just dashboards, but service components ready to be deployed.</p><h2 id="what-if-we-went-component-level-instead">What If We Went Component-Level Instead?</h2><p>What if we generated the building blocks that organisations could assemble into exactly the software they need, right now, for this specific task?</p><p>The social sector doesn&apos;t need fewer tools. It needs the ability to create specific tools when specific needs arise, without requiring &#xA3;100k custom development projects or forcing everyone onto generic one-size-fits-none platforms.</p><p>The startup opportunity is building a component library and assembly infrastructure specifically for just-in-time software and services. 
</p><h2 id="why-components-not-complete-applications">Why Components, Not Complete Applications</h2><p>Every organisation I work with has the same conversation:</p><p>&quot;We need software that does [incredibly specific thing relevant to our specific community, in our specific context, integrated with our specific other systems].&quot;</p><p>The options are:</p><ol><li>Pay a fortune for custom development</li><li>Use a generic platform and compromise on 70% of what you actually need</li><li>Glue together five different SaaS tools and spend all your time managing integrations</li><li>Build it yourself and hope</li></ol><p>What if there was an option 5:</p><ol start="5"><li>Assemble it from standardised, interoperable components built by organisations who&apos;ve already solved similar problems</li></ol><p>Not &quot;here&apos;s a complete CRM,&quot; but &quot;here&apos;s the component for intake forms, the component for case tracking, the component for outcome measurement, the component for referral coordination&quot; and you assemble them into the specific tool your organisation needs.</p><h2 id="make-use-of-the-expertise-we-have">Make use of the expertise we have.</h2><p>This is where it gets interesting. The social sector has massive expertise embedded in specific contexts.</p><ul><li>A foodbank network that&apos;s solved complex inventory tracking in a crisis response context</li><li>A youth service that&apos;s built brilliant engagement tools for young people</li><li>A mental health charity that&apos;s developed sophisticated consent management for sensitive data</li></ul><p>Right now, that expertise stays trapped in their specific systems unless they decide to launch it into a SaaS product. </p><p>What if each of these organisations could extract the component they&apos;ve built and make it available for others to use in different contexts?</p><p>The foodbank&apos;s inventory component becomes useful for disaster response, community fridges, school uniform banks. 
The consent management component becomes useful across the health, education, and criminal justice sectors.</p><h2 id="the-startup-opportunity">The Startup Opportunity</h2><p><strong>The Component Library for Social Purpose</strong>: A registry of pre-built, open-source components designed specifically for social sector needs. Not generic business tools, but components that understand:</p><ul><li>Complex consent requirements</li><li>Safeguarding protocols</li><li>Outcome measurement frameworks</li><li>Partnership coordination</li><li>Resource scarcity</li><li>Community engagement</li></ul><p><strong>The Assembly Interface</strong>: An AI-powered tool that helps organisations describe what they need and assembles relevant components into a working prototype. &quot;We need to track community garden plots, coordinate volunteer shifts, and report outcomes to our funder&quot; &#x2192; the system suggests relevant components and shows how they&apos;d fit together.</p><p><strong>The Contribution Protocol</strong>: A way for organisations to package their specific solutions as reusable components. You&apos;ve built something brilliant? Extract the core functionality, generalise it just enough to be reusable, contribute it back to the commons.</p><p><strong>The Adaptation Layer</strong>: Because no component will be perfect for every context, an AI layer that helps adapt components to specific needs. The foodbank inventory component needs to work for hygiene products instead of food? 
The AI helps adapt the logic while maintaining the core patterns.</p><h2 id="what-this-looks-like-in-practice">What This Looks Like in Practice</h2><p>A small advice charity serving refugees needs:</p><ul><li>Intake forms in multiple languages</li><li>Case tracking with complex family relationships</li><li>Document management with privacy controls</li><li>Referral coordination with other services</li><li>Outcome measurement for funder reporting</li></ul><p>Traditional approach:</p><ul><li>Spend &#xA3;50k on custom development, or</li><li>Use a generic CRM and hack it into something sort-of-usable, or</li><li>Keep using spreadsheets and/or clunky databases</li></ul><p>Component approach:</p><ol><li>AI helps them describe their needs in plain language</li><li>System suggests relevant components:<ul><li>Multilingual intake forms (from a migrant support network)</li><li>Family relationship tracking (from a children&apos;s charity)</li><li>Secure document storage (from a domestic violence service)</li><li>Referral coordination (from the network protocol infrastructure)</li><li>Impact measurement (from a sector-wide outcomes framework)</li></ul></li><li>Organisation assembles these into a working prototype in hours, not months</li><li>AI helps adapt components to their specific context</li><li>They test with real users, refine, deploy</li><li>Six months later, they&apos;ve built a novel component for supporting refugees through housing applications and they contribute it back for others to use</li></ol><h2 id="why-this-might-actually-work-now">Why this (might) actually work now</h2><p>Two things make this viable now that weren&apos;t five years ago:</p><p><strong>1. AI can do the assembly and adaptation</strong>: The hard part isn&apos;t building components (developers can do that). It&apos;s helping non-technical people describe what they need and assemble components intelligently. LLMs are really good at this translation layer. 
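</p><p>The shape of that registry-plus-assembly layer can be sketched in a few lines. This is purely illustrative Python: the component names, capability tags, and naive matching logic are my assumptions, not an existing library:</p>

```python
from dataclasses import dataclass


@dataclass
class Component:
    name: str
    provides: set   # capability tags this block offers, e.g. {"intake-forms"}
    origin: str     # the organisation that contributed it back to the commons


class Registry:
    """A shared library of reusable social-sector components."""

    def __init__(self):
        self.components = []

    def contribute(self, component):
        # An organisation packages something it built and shares it back.
        self.components.append(component)

    def suggest(self, needs):
        """Return components covering at least one stated need, best match first."""
        matches = [c for c in self.components if c.provides & needs]
        return sorted(matches, key=lambda c: len(c.provides & needs), reverse=True)


# The refugee advice charity example above, expressed as a needs query:
registry = Registry()
registry.contribute(Component("multilingual-intake", {"intake-forms", "multilingual"}, "migrant support network"))
registry.contribute(Component("family-case-tracking", {"case-tracking", "family-relationships"}, "children's charity"))
registry.contribute(Component("secure-documents", {"document-storage", "privacy-controls"}, "domestic violence service"))

for c in registry.suggest({"intake-forms", "multilingual", "case-tracking"}):
    print(f"{c.name} (from a {c.origin})")
```

<p>A real assembly interface would match needs far more intelligently - that&apos;s where the LLM earns its keep - but the underlying contract is this small: a shared registry, declared capabilities, assembly from parts.</p><p>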
Yes, we will need to think carefully about the context layer, and about constraining LLMs enough to make it work, but we have the tools to do it, and people in technical, data, and service roles who could do this.</p><p><strong>2. APIs and microservices are mature</strong>: The technical patterns for building interoperable components are well established. We know how to do this now.</p><h2 id="could-this-be-the-way-to-solve-siloed-data">Could this be the way to solve siloed data?</h2><p>If components have standardised data structures, organisations gradually converge toward cleaner, more interoperable data. Not through mandated standards (which are really hard to make work in multi-organisational settings), but through practical utility. &quot;This component needs data structured this way&quot; &#x2192; &quot;okay, let&apos;s structure it that way because this component is really useful.&quot; It&apos;s standards by stealth. </p><p>Bottom-up standardisation through shared components, not top-down mandates that everyone ignores.</p><h2 id="what-might-this-look-like">What might this look like</h2><p>In five years:</p><ul><li>A new charity can spin up working systems in days, not months</li><li>Innovations built by one organisation are available sector-wide</li><li>Small organisations have access to sophisticated tools previously only available to large ones</li><li>The social sector has a thriving commons of shared components</li><li>Organisations spend less time on systems and more time on mission</li><li>Data gradually becomes more interoperable through practical use, not mandates</li></ul><p>The infrastructure that makes this possible:</p><ul><li>Open source component library with social sector primitives</li><li>AI-powered assembly and adaptation tools</li><li>Clear protocols for contributing components back to the commons</li><li>Quality standards and peer review for component reliability</li><li>Governance by and for the sector, not by vendors</li></ul><h2 
id="but-why-should-we-bother">But why should we bother?</h2><p>We are drowning in software debt and expensive systems that don&apos;t quite fit. We have technical debt we can&apos;t afford to fix and innovation locked in proprietary platforms.</p><p>We need infrastructure that lets the sector build what it needs, share what works, and adapt quickly when needs change.</p><p>Not another platform. The infrastructure for rapid, context-specific assembly of exactly what each organisation needs.</p><p>The AI can generate the interfaces. That&apos;s the easy part.</p><p>The hard part is building the component library, the contribution protocols, the quality standards, and the governance structures. Components, not platforms. Commons, not vendors. Just-in-time assembly, not one-size-fits-none.</p><p>That&apos;s the interface layer the sector actually needs.</p>]]></content:encoded></item><item><title><![CDATA[Building LLMs.txt for the social sector]]></title><description><![CDATA[Another blog on the ins and outs of something I built, what it's all about, and why it might be a useful tool and/or concept. An llms.txt tool designed for specific use cases. 
]]></description><link>https://tomcw.xyz/building-llms-txt-for-the-social-sector/</link><guid isPermaLink="false">697c7e66ac80fd04af71d00f</guid><category><![CDATA[ai]]></category><category><![CDATA[Learning]]></category><category><![CDATA[Tech]]></category><category><![CDATA[tools]]></category><dc:creator><![CDATA[Tom Watson]]></dc:creator><pubDate>Fri, 30 Jan 2026 15:50:29 GMT</pubDate><media:content url="https://tomcw.xyz/content/images/2026/01/slide-2-how-it-works-1.png" medium="image"/><content:encoded><![CDATA[<img src="https://tomcw.xyz/content/images/2026/01/slide-2-how-it-works-1.png" alt="Building LLMs.txt for the social sector"><p>Or, as I call it: &quot;AI, find my organisation and be accurate please&quot;.</p><p>This is another in a series of blogs exploring things I&apos;ve built, lifting the lid on both the technical and conceptual ideas behind them. I hope it helps us in the social purpose sector think about technology &amp; AI as more than chat interfaces, and that we perhaps should be looking at things on a broader level.</p><p>This time I want to talk about infrastructure and boring tiny tools. 
Nope, no flashy chat here, I&apos;m afraid. Instead, I want to explore something much more foundational: how do we make the work of social purpose organisations visible and accessible to AI systems in the first place?</p><h3 id="what-is-llmstxt"><strong>What is llms.txt?</strong></h3><figure class="kg-card kg-image-card"><img src="https://tomcw.xyz/content/images/2026/01/data-src-image-ce4f9ee4-9a98-432a-9b19-a62faf44eac6.png" class="kg-image" alt="Building LLMs.txt for the social sector" loading="lazy" width="800" height="500" srcset="https://tomcw.xyz/content/images/size/w600/2026/01/data-src-image-ce4f9ee4-9a98-432a-9b19-a62faf44eac6.png 600w, https://tomcw.xyz/content/images/2026/01/data-src-image-ce4f9ee4-9a98-432a-9b19-a62faf44eac6.png 800w" sizes="(min-width: 720px) 720px"></figure><p>Before we get into what I&apos;ve built, let&apos;s talk about the problem I&apos;m trying to solve.</p><p>AI systems - the large language models (LLMs) that power everything from ChatGPT to Claude to various coding assistants - increasingly need to understand websites. Whether it&apos;s answering questions about an organisation, helping someone find support services, or assisting a funder in understanding who&apos;s working in a particular space, these systems need to get information from somewhere.</p><p>The problem is that websites are designed for humans, not machines. They&apos;re full of navigation menus, JavaScript, cookie banners, and complex layouts. Converting a modern website into something an AI can actually understand and use is surprisingly hard. And even when you do extract the text, context windows (the amount of information an AI can process at once) are limited - and although they are increasing, you can&apos;t just feed it an entire website and hope for the best.</p><p>This is where llms.txt comes in. 
It&apos;s a proposed specification - created by<a href="https://www.answer.ai/posts/2024-09-03-llmstxt.html?ref=tomcw.xyz"><u> Jeremy Howard at Answer.AI</u></a> - for a simple markdown file that sits at `/llms.txt` on your website and provides a curated, AI-friendly summary of who you are and what you do. Think of it as a &quot;robots.txt for AI&quot; - but instead of telling crawlers what not to index, it tells AI systems what they actually need to know about you.</p><p>A standard llms.txt file has a specific structure: a title, a short description, some context, and then links to more detailed information organised into sections. It&apos;s human-readable (it&apos;s just<a href="https://en.wikipedia.org/wiki/Markdown?ref=tomcw.xyz"><u> markdown</u></a>) but also machine-parseable, which means tools can work with it programmatically.</p><h3 id="why-does-the-social-sector-need-this"><strong>Why does the social sector need this?</strong></h3><p>So we&apos;ve already<a href="https://www.cazenovecapital.com/en-gb/uk/charity/insights/how-ai-is-changing-search-and-what-it-means-for-google-chatgpt-and-the-open-web/?ref=tomcw.xyz"><u> seen the data, showing that traffic from traditional sources (search engines) is decreasing</u></a>, as people use LLMs as search, and so organisations without good AI-readable content will become increasingly invisible. If someone asks an AI assistant &quot;what charities support refugees in Newcastle?&quot; and your organisation doesn&apos;t show up because the AI couldn&apos;t understand your website properly, you&apos;ve lost an opportunity to help someone.</p><p>But there&apos;s a bigger picture too. 
If the social sector wants to build towards a future where data flows more freely - where funders can more easily understand the landscape, where collaboration happens because people can actually find each other, where we stop reinventing wheels because we didn&apos;t know the wheel already existed - then we need to consider things like this.</p><p>llms.txt is a small piece of infrastructure that supports this vision. It&apos;s a way for organisations to represent themselves clearly and consistently to AI systems, in a format that&apos;s standardised enough to be useful but flexible enough to capture the nuance of what social purpose organisations actually do.</p><h3 id="what-ive-built-and-why-its-different"><strong>What I&apos;ve built (and why it&apos;s different)</strong></h3><p>I built a tool to generate llms.txt files automatically. It&apos;s called<a href="https://llmstxt.social/?ref=tomcw.xyz"><u> <strong>llmstxt-social</strong></u></a> and it exists in two forms: an open-source command line tool that anyone can run themselves, and a web-based service for those who&apos;d rather not deal with the technical bits.</p><figure class="kg-card kg-image-card"><img src="https://tomcw.xyz/content/images/2026/01/data-src-image-741b18b8-bf77-421a-ac7c-c150055be6e2.png" class="kg-image" alt="Building LLMs.txt for the social sector" loading="lazy" width="800" height="500" srcset="https://tomcw.xyz/content/images/size/w600/2026/01/data-src-image-741b18b8-bf77-421a-ac7c-c150055be6e2.png 600w, https://tomcw.xyz/content/images/2026/01/data-src-image-741b18b8-bf77-421a-ac7c-c150055be6e2.png 800w" sizes="(min-width: 720px) 720px"></figure><p>So, if you hit Google (<a href="https://www.ecosia.org/)?ref=tomcw.xyz"><u>other search engines are available</u></a>) and type in &quot;llms.txt generator&quot;, you&apos;ll find many. 
So why build another one? Well, the other generators are just based on the original llms.txt specification, which is quite general - it was designed with technical documentation in mind. Our version adapts this for social purpose organisations with some important additions.</p><p><strong>Templates for different organisation types.</strong> A charity has different information needs than a funder, which has different needs than a local authority. We have four templates - charity, funder, public sector, and startup - each with sections that make sense for that type of organisation.</p><p>For instance, our charity template includes:</p><ul><li><strong>For Funders</strong> section: registration number, geography, themes, beneficiaries - exactly the information a grant-maker needs to assess fit</li><li><strong>For AI System</strong> section: explicit guidance on how AI should represent the organisation, including caveats and things to avoid</li><li><strong>Data enrichment</strong> - We don&apos;t just scrape the website - we pull in data from external sources to provide verified context:<ul><li><strong>Charity Commission data</strong> (official registration, financial information, charitable objects)</li><li><strong>360Giving data for funders</strong> (grants history, typical award sizes, geographic distribution)</li></ul>This means the llms.txt file contains information that might not even be on the website, but is crucial for understanding the organisation properly.</li><li><strong>Quality assessment.</strong> We don&apos;t just generate a file and leave you to it. The tool includes a comprehensive assessment system that scores completeness, checks for missing sections, and provides actionable recommendations for improvement.</li></ul><p><strong>How it actually works</strong></p><p>Let me break down what happens when you run the tool:</p><p><strong>Step 1: Crawling</strong></p><p>We start by crawling the website. 
This sounds simple but involves a fair bit of care:</p><ul><li>Respecting robots.txt (we&apos;re good AI citizens)</li><li>Using sitemap.xml if available (many sites have one but don&apos;t realise it)</li><li>Falling back to link discovery if not</li><li>Rate limiting to avoid hammering servers</li><li>Optionally using Playwright for JavaScript-heavy sites that don&apos;t render properly otherwise</li></ul><p>We typically limit this to around 30 pages - enough to get a good picture, not so many that we&apos;re processing irrelevant content.</p><p><strong>Step 2: Content extraction</strong></p><p>Each page gets processed to extract just the actual content:</p><ul><li>Removing navigation, headers, footers, scripts, styles</li><li>Classifying the page type (about, services, contact, projects, etc.)</li><li>Extracting contact information</li><li>Finding charity numbers</li></ul><p>This classification is important - it means we can group related pages together sensibly in the final output.</p><p><strong>Step 3: Data enrichment</strong></p><p>If we&apos;ve found a charity number (or one was provided), we call the Charity Commission API to get official data:</p><ul><li>Registered name and status</li><li>Registration date</li><li>Latest financial information (income and expenditure)</li><li>Charitable objects (the formal statement of what the charity does)</li><li>Trustee information</li><li>Contact details</li></ul><p>For funders, we can also pull 360Giving data showing their actual grants history - incredibly useful for applicants trying to understand what a funder really supports versus what their website says (or doesn&apos;t say) they support.</p><p><strong>Step 4: AI analysis</strong></p><p>Now comes the LLM bit. 
We feed the extracted content and enrichment data to Claude and ask it to:</p><ul><li>Write a concise mission summary</li><li>Generate clear descriptions for each page</li><li>Categorise services and programmes</li><li>Identify target beneficiaries</li><li>Create guidance for how AI systems should represent the organisation</li></ul><p>This is where the &apos;magic&apos; happens, but it&apos;s also where careful prompting matters. We use structured outputs (schema validation) to ensure the AI returns data in a consistent, predictable format rather than getting creative with the structure.</p><p><strong>Step 5: Generation</strong></p><p>Finally, we assemble all this into the llms.txt file using the appropriate template. The output follows the spec strictly - H1 title, blockquote description, H2 sections with markdown lists of links and descriptions.</p><p><strong>Step 6: Assessment (optional)</strong></p><p>If you want, we&apos;ll then assess the generated file against what we know about the organisation:</p><ul><li>Is it complete? Are expected sections present and populated?</li><li>Is it accurate? Does it align with the enrichment data?</li><li>Is it useful? Will an AI actually be able to use this effectively?</li></ul><p>We score out of 100 and provide specific recommendations for improvement.</p><p><strong>Provider agnosticism (again)</strong></p><p>Just like with<a href="https://tomcw.xyz/building-open-recommendations/"><u> Open Recommendations</u></a>, I&apos;ve built this to be AI provider agnostic. The tool currently defaults to Claude, but the architecture supports switching to other providers. Given how rapidly the AI landscape is changing, this flexibility is essential.</p><p>Different models are better at different things. The extraction and classification tasks could potentially run on smaller, cheaper models. The summary generation benefits from more capable models. 
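</p><p>As a sketch of what that provider-agnostic seam can look like (illustrative only - the class names and tiny interface here are my assumptions, not the tool&apos;s actual architecture, and the stubs stand in for real vendor SDK calls):</p>

```python
from abc import ABC, abstractmethod


class Provider(ABC):
    """The single seam the rest of the pipeline depends on."""

    @abstractmethod
    def complete(self, prompt: str) -> str:
        ...


class CapableModel(Provider):
    """Stub for a larger, pricier model used for summary generation."""

    def complete(self, prompt: str) -> str:
        return f"[capable] {prompt}"


class CheapModel(Provider):
    """Stub for a smaller model used for extraction and classification."""

    def complete(self, prompt: str) -> str:
        return f"[cheap] {prompt}"


def classify_page(text: str, provider: Provider) -> str:
    return provider.complete(f"Classify this page: {text}")


def summarise_mission(text: str, provider: Provider) -> str:
    return provider.complete(f"Summarise the mission: {text}")


# Mix and match: cheap model for classification, capable model for the summary.
page_type = classify_page("Contact us at our Newcastle office...", CheapModel())
summary = summarise_mission("We support refugees across the North East.", CapableModel())
```

<p>Because everything downstream takes the same small interface, routing different steps to different models - or switching vendor entirely - becomes a configuration change rather than a rewrite.</p><p>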
Having the flexibility to mix and match - or to switch entirely if pricing or availability changes - is crucial for sustainability.</p><p><strong>The subscription model: monitoring for change</strong></p><p>Here&apos;s something that took me a while to figure out: llms.txt files need to stay current. Organisations change - new services launch, old ones close, contact details update. A stale llms.txt file is arguably worse than no file at all.</p><p>So I&apos;ve built a subscription tier that monitors your website and automatically regenerates your llms.txt file when things change significantly. It:</p><ul><li>Periodically recrawls your site</li><li>Compares new content against the existing file</li><li>Regenerates if meaningful changes are detected</li><li>Tracks change history so you can see what evolved</li></ul><p>This costs &#xA3;9/month - priced to be accessible to small organisations while covering the actual costs of running the infrastructure (hosting, API calls, monitoring).</p><p><strong>Open source vs paid service</strong></p><p>I want to be clear about the model here because I think it matters. <strong><em>No-one funds me to do this stuff.</em></strong></p><p><strong>The core tool is open source (MIT licensed).</strong> You can clone the repository, run it on your own machine, generate as many llms.txt files as you want, and never pay us a penny. If you&apos;re technically capable and have an Anthropic API key, there&apos;s nothing stopping you from doing everything yourself.</p><p><strong>The web service exists for convenience and for features that need ongoing infrastructure</strong> - like the monitoring subscription. The free tier gives you 10 generations per day (basic output). The paid one-off option (&#xA3;7) includes full assessment and enrichment. 
The subscription (&#xA3;9/month) adds monitoring and automatic updates.</p><p>The paid tiers are priced to cover costs - server hosting, API calls to Claude and the Charity Commission, database storage, monitoring infrastructure - plus a small margin to make this sustainable. We&apos;re not trying to extract maximum value; we&apos;re trying to make this genuinely accessible while keeping the lights on.</p><p><strong>What I&apos;ve learned and what I&apos;m trying to do with this stuff</strong></p><p>The things I&apos;m building and experimenting with at the moment are quite intentional: pushing at the edges, provocations for thinking about AI and us. Things I think (maybe):</p><p><strong>Infrastructure matters more than features:</strong> The social sector loves talking about AI for efficiency, AI for programme delivery, maybe for analysis. That&apos;s all important. But if our organisations aren&apos;t visible and correctly represented to AI systems in the first place, none of that matters. We need to invest in the boring plumbing as much as the shiny features.</p><p><strong>Standards are valuable even when imperfect:</strong> llms.txt is still a proposal. It might evolve, it might get superseded. But having <strong><em>something</em></strong> standardised to build on is immensely valuable. Perfect is the enemy of good, especially in infrastructure - minimum viable standards, etc.</p><p><strong>The 80/20 split is real:</strong> No, I&apos;m not talking about my running training. As with Open Recommendations, about 80% of this project is not the AI part. It&apos;s the crawling logic, the validation, the error handling, the user interface, the subscription management, the deployment configuration. The AI is the exciting bit that makes it possible, but the engineering around it is what makes it usable.</p><p><strong>Open source plus commercial can work?</strong> Can it? This is a test. 
Now of course there are many good examples of this actually working at scale, but what about at small scale? I&apos;m a bit nervous/curious about this model - will people resent the paid tiers? Would it feel like bait-and-switch? So far, no, but it&apos;s early days and I&apos;ve managed to cover two months&apos; infrastructure costs - will that keep up? Being transparent about what&apos;s free and what costs money, and being honest about why things cost money, will hopefully work. But probably it all comes down to value, and if people don&apos;t value the tool, the model doesn&apos;t really matter.</p><h3 id="how-to-get-started"><strong>How to get started</strong></h3><p>Anyway, if you&apos;ve made it this far, maybe you want to try it out.</p><p><strong>For the technical folks:</strong> Clone the repo, install the dependencies, set up your Anthropic API key, and run `llmstxt generate https://your-website.org.uk`. Full instructions in the README.</p><p><strong>For everyone else:</strong> Visit<a href="https://llmstxt.social/?ref=tomcw.xyz"><u> https://llmstxt.social,</u></a> paste in your URL, and click generate. The free tier will give you a basic llms.txt file. If you want the full assessment and enrichment, there&apos;s a one-off payment option. If you want ongoing monitoring, there&apos;s a subscription.</p><p>Either way, you&apos;ll end up with a file you can add to your website at `/llms.txt`. If you&apos;re using a CMS that makes this awkward, you can also link to it from your homepage or put it in a place you can reference.</p><h3 id="why-this-matters"><strong>Why this matters</strong></h3><p>I&apos;ll be honest - this isn&apos;t glamorous work. No one&apos;s going to get excited about infrastructure files for AI systems. There are no pretty dashboards, no impressive demos, no &quot;look what AI generated&quot; moments to share on social media.</p><p>But I genuinely believe this kind of foundational work is essential. 
As AI becomes more embedded in how people find and understand organisations, we need the social sector to show up properly. Not just to be visible, but to be represented accurately - with the nuance and context that our work deserves.</p><p>llms.txt is maybe one small piece of that puzzle. It&apos;s a way for organisations to take control of their AI representation rather than leaving it to chance. And by building tools that make it easy to create and maintain these files, I hope we&apos;re lowering the barrier enough that this becomes normal infrastructure for the sector.</p><p>And if you have questions, thoughts, or want to try something like this and/or collaborate, give me a shout on <strong>tom@good-ship.co.uk</strong> or on the<a href="https://www.linkedin.com/in/tomcampbellwatson/?ref=tomcw.xyz"><u> linky</u></a></p>]]></content:encoded></item><item><title><![CDATA[The Garden of Ideas]]></title><description><![CDATA[The Garden of Ideas, a simple facilitation tool for individuals and groups. Seeds, soil, seasons, pruning and composting as a way of making intentional choices]]></description><link>https://tomcw.xyz/the-garden-of-ideas/</link><guid isPermaLink="false">697b3fc8ac80fd04af71cfb6</guid><category><![CDATA[Strategy]]></category><dc:creator><![CDATA[Tom Watson]]></dc:creator><pubDate>Thu, 29 Jan 2026 12:05:28 GMT</pubDate><media:content url="https://tomcw.xyz/content/images/2026/01/garden-of-ideas--1-.jpg" medium="image"/><content:encoded><![CDATA[<blockquote>Ideas are easy...[insert one of the many quotes about execution/implementation/teams and stuff here]</blockquote><img src="https://tomcw.xyz/content/images/2026/01/garden-of-ideas--1-.jpg" alt="The Garden of Ideas"><p>I dunno. Are ideas easy? Maybe. Good ideas probably less so. But there is some truth in the idea that part of what stops ideas becoming more than that is choices and decisions. Often it&apos;s the choice to not do something, so that another idea can grow. 
And to grow an idea we often need to make decisions about when and how. </p><p>There are many tools and frameworks that can help you with these things: <strong>stop/start/continue</strong>, <strong>now/next/future</strong>, <strong>matrices</strong> of all types. So why another one, Tom? I dunno, I guess I just kind of like gardens and metaphors.</p><p>So I put together The Garden of Ideas, a simple facilitation tool for individuals and groups. </p><h3 id="first-the-metaphor">First, the metaphor.</h3><p>A garden isn&apos;t a holding space where things wait indefinitely. It&apos;s a living system that forces honest choices. Seeds are planted with intention (mostly, though there are always surprises, or is that just me?). Some sprout quickly (hopefully). Others need time underground before they&apos;re ready. And crucially, not everything gets planted: a gardener makes choices about what the space can actually sustain and hopefully how the plants interact. Pollinators and fruit, complementary plants that sustain the soil, etc.</p><p>Framing our ideas as seeds, our soil as conditions, our pruning or composting as choices seems to land with people. And in a world where our to-do list never shrinks, this approach can feel both kinder and more decisive. </p><p><strong>Composting feels kinder than deleting.</strong> When we decide an idea isn&apos;t right for this garden, this time, we&apos;re not throwing it away. We&apos;re returning it to the soil. It might nourish something else later. This matters when you&apos;re staring at a to-do list that&apos;s become a guilt list.</p><p><strong>Banking seeds is strategic, not procrastination.</strong> A seed bank isn&apos;t where things get forgotten; it&apos;s where they&apos;re preserved for the right conditions. 
This gives us permission to say &quot;not now&quot; without saying &quot;never.&quot;</p><p><strong>Seasons create natural timing conversations.</strong> Instead of debating whether an idea is good or bad, high priority or low, we can ask: is this the right season? Sometimes an idea needs conditions that don&apos;t exist yet. Sometimes we need to wait for capacity, funding, or the right people.</p><p><strong>Gardens have limits.</strong> A to-do list can grow forever. A garden can only hold so much. This constraint is a feature, it forces the prioritisation conversation that infinite lists let us avoid.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://tomcw.xyz/content/images/2026/01/garden-of-ideas.jpg" class="kg-image" alt="The Garden of Ideas" loading="lazy" width="960" height="540" srcset="https://tomcw.xyz/content/images/size/w600/2026/01/garden-of-ideas.jpg 600w, https://tomcw.xyz/content/images/2026/01/garden-of-ideas.jpg 960w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">The Garden elements - seeds, soil &amp; conditions, Seasons, Pruning and composting</span></figcaption></figure><h2 id="how-the-garden-works">How the garden works</h2><p>The framework scales from personal use to group facilitation.</p><p><strong>For individuals</strong>, it&apos;s straightforward: capture your seeds (ideas, tasks, possibilities), then sort them. What are you planting now&#x2014;actually committing time and energy to this season? What goes in the seed bank for when conditions change? 
What do you need to compost, finally letting go of things that have been sitting on lists for months?</p><figure class="kg-card kg-image-card"><img src="https://tomcw.xyz/content/images/2026/01/garden-of-ideas--2-.jpg" class="kg-image" alt="The Garden of Ideas" loading="lazy" width="960" height="540" srcset="https://tomcw.xyz/content/images/size/w600/2026/01/garden-of-ideas--2-.jpg 600w, https://tomcw.xyz/content/images/2026/01/garden-of-ideas--2-.jpg 960w" sizes="(min-width: 720px) 720px"></figure><p><strong>For groups</strong>, the framework adds a layer. Each person maintains their own garden throughout a session, capturing seeds as they emerge. A shared seed bank catches ideas that surface when the group can&apos;t explore them, and intentionally, you schedule when to return to it. At the close, people bring seeds from individual gardens into a collective space, deciding together what to plant, bank, or compost.</p><h2 id="the-sorting-conversation">The sorting conversation</h2><p>This is where the metaphor earns its keep. For each idea, ask:</p><ul><li><strong>Plant now?</strong> What does this need to grow? What resources, time, skills, people? Who will tend it?</li><li><strong>Seed bank?</strong> When will conditions be right? What needs to change first? When do we revisit?</li><li><strong>Compost?</strong> What would this free up if we let it go? What space does it create?</li></ul><p>The language gives permission to make real choices. &quot;Composting&quot; doesn&apos;t feel like failure, it feels like making space for what can actually flourish. 
And the seed bank offers a middle path between doing everything now and abandoning ideas entirely.</p><figure class="kg-card kg-image-card"><img src="https://tomcw.xyz/content/images/2026/01/garden-of-ideas--3-.jpg" class="kg-image" alt="The Garden of Ideas" loading="lazy" width="960" height="540" srcset="https://tomcw.xyz/content/images/size/w600/2026/01/garden-of-ideas--3-.jpg 600w, https://tomcw.xyz/content/images/2026/01/garden-of-ideas--3-.jpg 960w" sizes="(min-width: 720px) 720px"></figure><h2 id="a-word-on-pruning">A word on pruning</h2><p>Some facilitators struggle with groups who want to plant everything, or individuals who can&apos;t let anything go. The garden metaphor helps here too: a garden can only hold so much. Trying to grow everything means nothing gets the attention it needs.</p><p>Pruning isn&apos;t cruel. It&apos;s care (talking to myself while pruning the lavender....)</p><h2 id="using-the-toolkit">Using the toolkit</h2><p>I&apos;ve put together a simple toolkit with printable worksheets - an individual garden sheet, a seed bank poster, and a collective garden template. It&apos;s designed for multi-session facilitation but works just as well for a personal quarterly review or a team planning day.</p><p>The toolkit is freely available under Creative Commons (CC BY-NC 4.0). 
Use it, adapt it, see what grows.</p><div class="kg-card kg-file-card"><a class="kg-file-card-container" href="https://tomcw.xyz/content/files/2026/01/garden-of-ideas--1-.pdf" title="Download" download><div class="kg-file-card-contents"><div class="kg-file-card-title">garden-of-ideas (1)</div><div class="kg-file-card-caption"></div><div class="kg-file-card-metadata"><div class="kg-file-card-filename">garden-of-ideas (1).pdf</div><div class="kg-file-card-filesize">150 KB</div></div></div><div class="kg-file-card-icon"><svg viewbox="0 0 24 24"><defs><style>.a{fill:none;stroke:currentColor;stroke-linecap:round;stroke-linejoin:round;stroke-width:1.5px;}</style></defs><title>download-circle</title><polyline class="a" points="8.25 14.25 12 18 15.75 14.25"/><line class="a" x1="12" y1="6.75" x2="12" y2="18"/><circle class="a" cx="12" cy="12" r="11.25"/></svg></div></a></div><hr><p><em>The Garden of Ideas emerged from years of facilitating strategy sessions with organisations, where good ideas often got lost in the shuffle or the HiPPO (Highest Paid Person&apos;s Opinion) and from my own struggles/hatred with to-do lists that never seemed to shrink. 
If you use it, I&apos;d genuinely love to hear what worked, and if you adapt it, pay it forward with Open Licensing!</em></p>]]></content:encoded></item><item><title><![CDATA[Building Open Recommendations]]></title><description><![CDATA[A full breakdown of how I built Open Recommendations, the inner workings, the choices I made, including provider-agnostic AI, strict data structures, deciding what's a recommendation and what's just an observation, RAG, and weighting community feedback.]]></description><link>https://tomcw.xyz/building-open-recommendations/</link><guid isPermaLink="false">68bbd27a2174e13db933b720</guid><category><![CDATA[ai]]></category><category><![CDATA[Data]]></category><category><![CDATA[Open Infrastructure]]></category><dc:creator><![CDATA[Tom Watson]]></dc:creator><pubDate>Sun, 25 Jan 2026 21:49:29 GMT</pubDate><media:content url="https://tomcw.xyz/content/images/2026/01/open-recs-architecture.pptx-2.png" medium="image"/><content:encoded><![CDATA[<img src="https://tomcw.xyz/content/images/2026/01/open-recs-architecture.pptx-2.png" alt="Building Open Recommendations"><p>I&apos;m not sure if you&apos;ve noticed, but there is a lot of buzz around AI; why you should/must use it, how it can save time/do everything, how it will destroy the world. So let&apos;s get this out of the way first.</p><p>Yes, I will be talking about AI, but there will be no hype or hyperbole here. I will be explaining what and how we&apos;ve used AI in building Open Recommendations, as a way of showing how I think about the use cases, the advantages and disadvantages of certain approaches, and how I approach it from a data point of view. It will be practical and a bit technical, but it will be grounded, I hope. </p><h2 id="what-is-open-recommendations">What is Open Recommendations?</h2><p>It&apos;s designed to be a central hub to upload, analyse, and track reports and recommendations across the social purpose sector. 
In its simplest form it:</p><ul><li>allows users to upload a &apos;source&apos; such as a report via either a document or url</li><li>extracts the source into a machine-readable format</li><li>summarises and categorises against a taxonomy</li><li>pulls specific recommendations from the source</li><li>categorises these recommendations against a taxonomy</li><li>allows searching, exploring and chatting with the knowledge base created</li><li>allows community-based tracking of action towards recommendations</li></ul><p>Or in simpler terms:</p><ul><li>Get stuff out of documents</li><li>Make use of this knowledge</li><li>Track whether we actually do anything about all the recommendations we make</li></ul><figure class="kg-card kg-image-card"><img src="https://tomcw.xyz/content/images/2026/01/open-recs-architecture.pptx--1-.png" class="kg-image" alt="Building Open Recommendations" loading="lazy" width="960" height="540" srcset="https://tomcw.xyz/content/images/size/w600/2026/01/open-recs-architecture.pptx--1-.png 600w, https://tomcw.xyz/content/images/2026/01/open-recs-architecture.pptx--1-.png 960w" sizes="(min-width: 720px) 720px"></figure><p>You can read more about the why of Open Recommendations <a href="https://www.openrecommendations.com/about?ref=tomcw.xyz" rel="noreferrer">here</a>.</p><h2 id="making-use-of-ai">Making use of AI</h2><p>So yeah, I use LLMs in Open Recommendations, and what I wanted to do here is break down how I&apos;ve used them, as maybe a helpful guide to others if they are thinking about exploring this stuff. As mentioned, there is a lot written about AI, and I find that in the social purpose sector especially we give very little actual detail of what this involves. Or if we do give detail, it is very technical. I also want to lay out some of the choices I&apos;ve made along the way and why I made them. 
So I&apos;m going to try to strike a balance between giving enough detail and not getting too technical.</p><p>As I broke down earlier, there are various things that Open Recommendations does. From an AI point of view this can be explored in two main parts: getting stuff in, and making use of the stuff once it&apos;s in.</p><h2 id="ai-ingestion-getting-stuff-in">AI ingestion (getting stuff in)</h2><p>So one of the things we think is useful on Open Recs is the ability to upload a source in the form of a document, which in many cases is a pdf. Now I could write a whole post about the dreaded pdf, but let&apos;s face it, it&apos;s still the most used format for documents. We also allow users to point to a url if they have a source that is HTML, which is a much simpler process, so we&apos;ll leave that for now.</p><p>When a user uploads a pdf we do the following:</p><ul><li>Firstly, we store that pdf as-is</li><li>Then we use OCR to fully extract everything in it - text, data tables, links - into a machine-readable format (markdown) and store this, making it possible to fully recreate the pdf but in a more usable format.</li><li>Once we have the full extract we use an LLM to create a summary of the source and categorise it (by source type, purpose, thematic area, role relevance) and store these</li><li>We then create an embedding of this information</li><li>We then extract specific recommendations contained within the source</li><li>And then we categorise those recommendations (purpose, target audience, thematic area, location scope)</li><li>Finally we create an embedding of each recommendation</li></ul><figure class="kg-card kg-image-card"><img src="https://tomcw.xyz/content/images/2026/01/open-recs-architecture.pptx-1.png" class="kg-image" alt="Building Open Recommendations" loading="lazy" width="960" height="540" srcset="https://tomcw.xyz/content/images/size/w600/2026/01/open-recs-architecture.pptx-1.png 600w, 
https://tomcw.xyz/content/images/2026/01/open-recs-architecture.pptx-1.png 960w" sizes="(min-width: 720px) 720px"></figure><p>So there&apos;s a lot going on here, but to a user it&apos;s simply: click upload, await the results, then check and make any edits if they choose to. I chose to split these processes up for a couple of reasons.</p><p>Firstly, from a technical point of view, this allows independent scaling - if recommendation extraction is slow, we can optimise that without touching the OCR step. But more importantly, it means we can swap components in and out. This becomes really important when we talk about AI models.</p><h2 id="the-dont-get-too-attached-to-your-ai-provider-principle">The &quot;don&apos;t get too attached to your AI provider&quot; principle</h2><p>So, in case you hadn&apos;t noticed, AI providers are not stable partners. Pricing changes, models get deprecated, service goes down, and suddenly that OpenAI dependency you built your whole system around becomes a problem.</p><p>So I built Open Recommendations to be provider-agnostic from day one. It currently supports over 19 different AI models through a unified interface - Mistral, OpenAI, Anthropic (Claude), Google Gemini, and even some more niche options like GreenPT for those concerned about environmental impact.</p><p>The way this works is fairly straightforward. We have a single function that takes a model name and returns the right provider configured correctly. When we need to make an AI call anywhere in the system, we call this function rather than directly calling OpenAI or whoever. If tomorrow we decide Claude is better for a particular task, we change one configuration value. If Mistral doubles their prices, we switch to Gemini. No rewriting of business logic required.</p><p>This isn&apos;t just about hedging bets though. 
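</p><p>That single-function pattern can be sketched as a small registry. This is a minimal, illustrative Python version with invented model names and stub clients - the real Open Recommendations code isn&apos;t published here, so treat every name below as an assumption:</p>

```python
# Sketch of a provider-agnostic "model registry". One function maps a model
# name to a configured client, so business logic never imports a vendor SDK
# directly. Provider names and model identifiers are invented for illustration.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Provider:
    name: str                    # vendor, e.g. "mistral"
    model: str                   # concrete model identifier
    call: Callable[[str], str]   # prompt -> completion

# In a real system these would wrap vendor SDKs; here they are stubs.
def _mistral(prompt: str) -> str: return f"[mistral] {prompt}"
def _claude(prompt: str) -> str: return f"[claude] {prompt}"

REGISTRY = {
    "mistral-large": Provider("mistral", "mistral-large-latest", _mistral),
    "claude-sonnet": Provider("anthropic", "claude-sonnet", _claude),
}

def get_provider(model_name: str) -> Provider:
    """The single function the rest of the system calls."""
    try:
        return REGISTRY[model_name]
    except KeyError:
        raise ValueError(f"Unknown model: {model_name}")

# Swapping providers is then a one-line configuration change:
SUMMARY_MODEL = "claude-sonnet"
summary = get_provider(SUMMARY_MODEL).call("Summarise this source...")
```

<p>Everything outside <code>get_provider</code> deals only in model names, which is what makes the swap a one-line change rather than a rewrite.</p><p>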
Different models are genuinely better at different things, and this architecture lets us be intentional about that.</p><figure class="kg-card kg-image-card"><img src="https://tomcw.xyz/content/images/2026/01/open-recs-architecture.pptx--2-.png" class="kg-image" alt="Building Open Recommendations" loading="lazy" width="960" height="540" srcset="https://tomcw.xyz/content/images/size/w600/2026/01/open-recs-architecture.pptx--2-.png 600w, https://tomcw.xyz/content/images/2026/01/open-recs-architecture.pptx--2-.png 960w" sizes="(min-width: 720px) 720px"></figure><h2 id="different-models-for-different-jobs">Different models for different jobs</h2><p>This is where it gets interesting. Not all AI tasks are created equal, and treating them as such is both wasteful and often produces worse results.</p><p>Take our OCR step. When we&apos;re extracting text from complex PDFs - ones with tables, multiple columns, embedded images - we need something that can actually see the document. So we use vision-capable models here, specifically Mistral Vision or Google Vision depending on the document complexity. A pure text model simply can&apos;t do this job.</p><p>But when we&apos;re summarising that extracted text? We don&apos;t need vision capabilities. We need something good at understanding context and producing coherent summaries. Here a model like Claude or GPT-4 Turbo excels.</p><p>For the recommendation extraction, we need something that can follow fairly complex instructions reliably - identifying what is and isn&apos;t actually a recommendation (more on this shortly), categorising against multiple taxonomies, and outputting in a very specific format. This is where model choice really matters, because consistency is everything.</p><p>And for the embedding generation - turning text into numerical vectors for search - we use a completely different type of model altogether. 
These embedding models are specifically trained for this task and are much cheaper to run than the big language models.</p><p>The point is: matching the model to the task saves money, improves results, and means you&apos;re not paying GPT-4 prices to do work that a smaller model handles perfectly well.</p><h2 id="making-ai-play-nice-with-your-data-strict-structures">Making AI play nice with your data (strict structures)</h2><p>The most consistent thing about LLMs is that they are not consistent - they&apos;re unpredictable. Ask the same question twice and you might get differently formatted answers. Ask it to return JSON and it might wrap it in markdown code blocks. Or not. Or add a helpful introduction before the JSON that breaks your parser.</p><p>This is a massive problem when you&apos;re building a system that needs to actually do something with the AI&apos;s output. We need recommendations in a specific format so we can store them in our database, search them, categorise them. &quot;Close enough&quot; doesn&apos;t cut it.</p><p>So we use something called <strong>schema validation</strong> to define exactly what the AI&apos;s output must look like. When the AI returns something, we validate it against this schema. If it doesn&apos;t match, we reject it and try again (or handle the error gracefully).</p><p>For example, a recommendation must have a recommendation_text (string), a target_audience (from a defined list), a thematic_area (also from a defined list), a confidence_score (High, Medium, or Low), and so on. If the AI decides to get creative and return something different, our system catches it.</p><p>This sounds pedantic but it&apos;s absolutely essential. Without it, you end up with data quality issues that compound over time. 
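</p><p>A minimal sketch of that kind of validation, in plain Python with invented field lists - the real taxonomies are longer, and a library such as Pydantic or JSON Schema would do the same job more thoroughly:</p>

```python
# Sketch of schema validation for extracted recommendations, standard library
# only. Field names mirror the ones described in the post; the controlled
# vocabularies here are abbreviated examples, not the real taxonomy.
THEMATIC_AREAS = {"Health", "Education", "Environment", "Housing", "Employment"}
CONFIDENCE = {"High", "Medium", "Low"}

def validate_recommendation(raw: dict) -> dict:
    """Return the record unchanged, or raise ValueError so the caller can retry."""
    errors = []
    text = raw.get("recommendation_text")
    if not isinstance(text, str) or not text.strip():
        errors.append("recommendation_text must be a non-empty string")
    if raw.get("thematic_area") not in THEMATIC_AREAS:
        errors.append(f"thematic_area must be one of {sorted(THEMATIC_AREAS)}")
    if raw.get("confidence_score") not in CONFIDENCE:
        errors.append("confidence_score must be High, Medium or Low")
    if errors:
        raise ValueError("; ".join(errors))
    return raw

# "health" (lowercase) is rejected rather than silently stored:
try:
    validate_recommendation({"recommendation_text": "Establish funding streams",
                             "thematic_area": "health",
                             "confidence_score": "High"})
except ValueError as e:
    print("rejected:", e)
```

<p>Anything that fails validation goes back for a retry rather than into the database, which is what keeps the filters and analytics working.</p><p>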
One recommendation stored with thematic_area as &quot;health&quot; and another as &quot;Health&quot; and another as &quot;Healthcare&quot; - suddenly your filtering and analytics are broken.</p><p>We also use <strong>controlled vocabularies</strong> for our categorisation. Rather than letting the AI free-text categorise things, we give it a specific list: &quot;The thematic areas are: Health, Education, Environment, Housing, Employment...&quot; and so on. The AI must pick from this list. This means our data is consistent and our filters actually work.</p><figure class="kg-card kg-image-card"><img src="https://tomcw.xyz/content/images/2026/01/open-recs-architecture.pptx--3--1.png" class="kg-image" alt="Building Open Recommendations" loading="lazy" width="960" height="540" srcset="https://tomcw.xyz/content/images/size/w600/2026/01/open-recs-architecture.pptx--3--1.png 600w, https://tomcw.xyz/content/images/2026/01/open-recs-architecture.pptx--3--1.png 960w" sizes="(min-width: 720px) 720px"></figure><h2 id="not-everything-is-a-recommendation">Not everything is a recommendation</h2><p>One of the more interesting challenges was teaching the AI what actually constitutes a recommendation. Turns out a lot of text that sounds recommendation-ish... isn&apos;t.</p><p>Consider these three statements from a typical report:</p><ol><li>&quot;The sector needs significant improvement in data sharing practices&quot;</li><li>&quot;Many organisations struggle to secure long-term funding&quot;</li><li>&quot;Local authorities should establish dedicated funding streams for community organisations within the next 18 months&quot;</li></ol><p>Only the third one is actually a recommendation. The first two are observations or statements of the problem. 
A recommendation needs to be actionable and prescriptive - it should have someone who needs to do something, and ideally a sense of what and when.</p><p>We train our extraction prompts to look for imperative verbs (establish, develop, implement, create, ensure), clear target audiences, and actual calls to action. We also assign confidence scores - if it looks like a recommendation but we&apos;re not sure, we mark it as low confidence and the user can review.</p><p>This filtering is crucial. Without it, you end up with a database full of &quot;recommendations&quot; that are actually just problems restated. Not useful when you&apos;re trying to track action. </p><p>This training is an ongoing process and we don&apos;t always get it right. What we have noticed is that people write in a lot of different ways and are often not explicit in their recommendations. There are probably many reasons for this, but it makes it hard for both humans and, especially, non-humans to fully pick up the intention.</p><figure class="kg-card kg-image-card"><img src="https://tomcw.xyz/content/images/2026/01/open-recs-architecture.pptx--5-.png" class="kg-image" alt="Building Open Recommendations" loading="lazy" width="960" height="540" srcset="https://tomcw.xyz/content/images/size/w600/2026/01/open-recs-architecture.pptx--5-.png 600w, https://tomcw.xyz/content/images/2026/01/open-recs-architecture.pptx--5-.png 960w" sizes="(min-width: 720px) 720px"></figure><h2 id="making-use-of-the-knowledge-search-and-chat">Making use of the knowledge (search and chat)</h2><p>Once we&apos;ve got all this structured data, the next challenge is making it useful. This is where embeddings come in.</p><p>An embedding is essentially turning a piece of text into a list of numbers - a vector - that represents its meaning. Similar texts will have similar vectors. 
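</p><p>&quot;Similar vectors&quot; is usually measured with cosine similarity. A toy illustration with made-up three-dimensional vectors - real embeddings have hundreds or thousands of dimensions, and would come from an embedding model rather than being written by hand:</p>

```python
# Toy illustration of semantic search over embeddings. The vectors are
# invented; a real embedding model produces them, with far more dimensions.
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Pretend these came from an embedding model:
vectors = {
    "environmental education programmes": [0.9, 0.8, 0.1],
    "reducing carbon footprint in schools": [0.85, 0.9, 0.15],
    "long-term funding for charities":     [0.1, 0.2, 0.95],
}
query = [0.88, 0.85, 0.1]  # embedding of "climate action in schools"

# Rank stored recommendations by similarity to the query; the two
# climate-related texts outrank the funding one despite sharing no words.
ranked = sorted(vectors, key=lambda k: cosine_similarity(query, vectors[k]), reverse=True)
```

<p>In practice the query is embedded with the same model as the stored texts, and a vector index does this ranking at scale rather than a full sort.</p><p>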
This means we can do semantic search: find recommendations that are conceptually similar even if they use completely different words.</p><p>So if someone searches for &quot;climate action in schools&quot; we can find recommendations about &quot;environmental education programmes&quot; or &quot;reducing carbon footprint in educational settings&quot; - things that are clearly relevant but don&apos;t contain the exact search terms.</p><p>We combine this with traditional keyword search (sometimes you do want exact matches, like a specific organisation name) and the result is a hybrid search that handles both use cases.</p><p>The chat interface builds on this. When a user asks a question, we:</p><ol><li>Turn their question into an embedding</li><li>Find the most relevant sources and recommendations</li><li>Feed those to an AI as context</li><li>Generate a response that&apos;s grounded in the actual content</li></ol><figure class="kg-card kg-image-card"><img src="https://tomcw.xyz/content/images/2026/01/open-recs-architecture.pptx--4-.png" class="kg-image" alt="Building Open Recommendations" loading="lazy" width="960" height="540" srcset="https://tomcw.xyz/content/images/size/w600/2026/01/open-recs-architecture.pptx--4-.png 600w, https://tomcw.xyz/content/images/2026/01/open-recs-architecture.pptx--4-.png 960w" sizes="(min-width: 720px) 720px"></figure><p>This is called <strong>RAG - Retrieval Augmented Generation</strong> - and it&apos;s the difference between an AI that makes things up and one that gives you answers based on actual evidence. The AI can only draw from the sources and recommendations we&apos;ve given it, and we show the user which sources informed the answer. We think this is why Open Recommendations is better/different than more generalised chat interfaces.</p><h2 id="tracking-progress-do-we-actually-do-anything">Tracking progress (do we actually do anything?)</h2><p>This is perhaps the most important and least glamorous part of Open Recommendations. 
We make a lot of recommendations in this sector. We commission reports, hold consultations, conduct evaluations. And then... what?</p><p>Open Recommendations allows <strong>community-based progress tracking</strong>. Anyone can add an update to a recommendation: &quot;We&apos;ve started work on this&quot;, &quot;This was implemented in our area&quot;, &quot;This was rejected because...&quot;. Each update captures who made it, when, and crucially - a sentiment score.</p><p>The sentiment scoring was an interesting design challenge. We wanted to capture not just &quot;has this been done or not&quot; but the nuance of progress. Is this update positive movement? Negative? Neutral? We use AI to analyse the update text and assign a sentiment score from 0 to 1, which we then aggregate to give an overall sense of momentum on a recommendation.</p><p>This creates a kind of community evidence base. If ten organisations all report that a particular recommendation is impractical, that&apos;s valuable information. If one area has successfully implemented something, their experience can inform others.</p><p>The challenge here is weighting. A progress update from the organisation named in the recommendation probably carries more weight than one from a random observer. 
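</p><p>One way that weighting could be sketched - purely illustrative, with invented roles and weight values, since as noted this is still being iterated on:</p>

```python
# Illustrative weighted aggregation of sentiment scores (0-1) on a
# recommendation. The roles and weights are invented for the sketch: updates
# from the body named in the recommendation count more than passing observers.
WEIGHTS = {"named_audience": 3.0, "implementer": 2.0, "observer": 1.0}

def momentum(updates: list[tuple[str, float]]) -> float:
    """Weighted mean sentiment, where each update is a (role, score) pair."""
    total = sum(WEIGHTS[role] * score for role, score in updates)
    weight = sum(WEIGHTS[role] for role, _ in updates)
    return total / weight if weight else 0.0

updates = [
    ("named_audience", 0.8),  # the body named in the recommendation reports progress
    ("observer", 0.2),        # an outside observer is more sceptical
]
score = momentum(updates)  # 0.65: between the two, pulled towards 0.8
```

<p>Aggregated this way, the sceptical observer pulls the momentum score down, but less than the named audience pulls it up.</p><p>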
We&apos;re still iterating on how best to represent this, but the principle is that tracking should be community-owned, not gatekept.</p><figure class="kg-card kg-image-card"><img src="https://tomcw.xyz/content/images/2026/01/open-recs-architecture.pptx--6-.png" class="kg-image" alt="Building Open Recommendations" loading="lazy" width="960" height="540" srcset="https://tomcw.xyz/content/images/size/w600/2026/01/open-recs-architecture.pptx--6-.png 600w, https://tomcw.xyz/content/images/2026/01/open-recs-architecture.pptx--6-.png 960w" sizes="(min-width: 720px) 720px"></figure><h2 id="what-ive-learned">What I&apos;ve learned</h2><p>Building Open Recommendations has reinforced a few beliefs I had and challenged others.</p><p>The biggest reinforcement: AI is a tool, not magic. It&apos;s very good at certain things (extracting structured information from unstructured text, understanding semantic similarity, generating contextual responses) and poor at others (being consistent, knowing what it doesn&apos;t know, handling edge cases gracefully). The skill is knowing where to deploy it and where to keep humans in the loop.</p><p>The biggest surprise: how much of the work is not the AI part. <strong>Data modelling, validation, error handling, user interface, privacy controls</strong> - the AI is maybe 20% of the actual system. The other 80% is all the boring stuff that makes it actually usable.</p><p>And the ongoing learning: the right architecture matters enormously. The decision to make the AI layer swappable has already paid for itself multiple times as I&apos;ve optimised costs and performance. If I&apos;d hardcoded OpenAI calls throughout the codebase, we&apos;d be in a much harder position now.</p><h2 id="in-closing">In closing</h2><p>If you&apos;re in the social purpose sector and thinking about building something with AI, I hope this has been useful. The technology is genuinely capable of things that would have been impossible a few years ago. 
But it needs to be approached with clear thinking about what you&apos;re actually trying to achieve, what the AI can realistically do, and how you&apos;ll handle it when things go wrong.</p><p>Open Recommendations isn&apos;t finished - it&apos;s still being developed and improved. But the foundations are solid because we took time to think through the architecture rather than rushing to bolt AI onto something.</p><p>If you&apos;d like to try it, learn more, or get involved, you can find Open Recommendations at <a href="https://www.openrecommendations.com/?ref=tomcw.xyz">https://www.openrecommendations.com/</a>. And if you&apos;ve got questions about anything I&apos;ve covered here, I&apos;m happy to go deeper - just get in touch.</p>]]></content:encoded></item><item><title><![CDATA[Is the best way to actually make AI work in an organisation to focus on constraints?]]></title><description><![CDATA[Pondering on how we actually need to focus on constraints when everything and anything is possible]]></description><link>https://tomcw.xyz/is-the-best-way-to-actually-make-ai-work-in-an-organisation-to-focus-on-constraints/</link><guid isPermaLink="false">696779e02174e13db933c35d</guid><category><![CDATA[ai]]></category><category><![CDATA[Questions]]></category><dc:creator><![CDATA[Tom Watson]]></dc:creator><pubDate>Wed, 14 Jan 2026 11:40:59 GMT</pubDate><media:content url="https://tomcw.xyz/content/images/2026/01/constraints.png" medium="image"/><content:encoded><![CDATA[<img src="https://tomcw.xyz/content/images/2026/01/constraints.png" alt="Is the best way to actually make AI work in an organisation to focus on constraints?"><p>Is the best way to actually make AI work in an organisation to focus on constraints?</p><p>AI offers so many opportunities that we end up doing a bit of everything&#x2026; and realising the gains from none of it. The menu is endless, so you stare overwhelmed and walk away and get chips. 
</p><p>Users, when landed with a new tool that can do so many things, often range widely - questions or tasks that are too broad, too narrow, too difficult, too easy - and then they don&apos;t see the value. </p><p>Go back a couple of years and prompt engineering was all the rage, as it in some ways provided constraints. But then reasoning models came along and we moved away from that. Which is fine for individuals. But not when it comes to AI in organisations and in services.</p><p>Constraints are what turn possibility into progress: </p><ul><li>Pick a service (not a shiny tool).</li><li>Name the job to be done and the users who benefit.</li><li>Define what &#x201C;good&#x201D; looks like (time saved, errors reduced, quality improved). </li><li>Set boundaries: what it is for, what it isn&#x2019;t for, and when a human must step in.</li><li>Build the simplest workflow that makes it repeatable, measurable, and safe.</li></ul><p>All the talk at the minute is to experiment, don&apos;t be left behind. Yes, experiment to find the edges, but then commit to a few well-scoped uses and ship them inside real services.</p><p>It&#x2019;s the same with data: having all the data often tells you nothing. Value comes from purpose, definition, and decision-making constraints.</p><p>Constraints can take many forms:</p><ul><li>Purpose constraint: This assistant drafts first versions of X for Y audience. It doesn&#x2019;t approve, decide, or sign off. </li><li>Context constraint: Only use these sources/records. If they&#x2019;re missing, ask for them (or say you can&#x2019;t). </li><li>Output constraint: Use this template, include these fields, keep it to 200 words, match this tone. </li><li>Risk constraint: If the topic touches safeguarding/legal/finance, route to a human or require a second check. </li><li>Quality constraint: State assumptions, flag uncertainty, and link back to the source of truth.</li></ul><p>But mainly they come down to: for this, not this. Like this, not like this. 
</p><p>Hmm. Maybe I&#x2019;ve just described service design.</p>]]></content:encoded></item><item><title><![CDATA[The end of Data For Action]]></title><description><![CDATA[Reflections on building a different kind of consultancy - one that chose uncertainty over certainty, conversations over transactions, and approach over outputs.]]></description><link>https://tomcw.xyz/the-end-of-data-for-action/</link><guid isPermaLink="false">695a7c3c2174e13db933c105</guid><category><![CDATA[Open Working]]></category><category><![CDATA[Random]]></category><dc:creator><![CDATA[Tom Watson]]></dc:creator><pubDate>Sun, 04 Jan 2026 17:18:29 GMT</pubDate><media:content url="https://tomcw.xyz/content/images/2026/01/Copy-of-Brand-Principle-Posters-17--2-.png" medium="image"/><content:encoded><![CDATA[<img src="https://tomcw.xyz/content/images/2026/01/Copy-of-Brand-Principle-Posters-17--2-.png" alt="The end of Data For Action"><p>After three years, Tom (F) and I are bringing Data For Action to a close. As Tom heads off to exciting new pastures, we&apos;re ending it together - Tom &amp; Tom are no more. But before looking to the future, I wanted to capture what we learned, what we built, and what I hope others might take forward.</p><h2 id="what-we-learned-about-working-differently"><strong>What We Learned About Working Differently</strong></h2><h3 id="we-chose-uncertainty-and-it-made-our-work-better-but-made-it-harder-to-actually-make-money"><strong>We Chose Uncertainty, and It Made Our Work Better (but made it harder to actually make money)</strong></h3><p>One of our core ideals was to <strong>lean into uncertainty</strong>. We didn&apos;t go into projects with preconceived solutions. We asked better questions. We prototyped. We experimented. And yes, this probably cost us some contracts - funders and clients often want the comfort of certainty, especially when budgets are tight. 
But the work we did produce was deeper, more contextual, and more likely to actually work because it emerged from real conversations with real people.</p><p>One thing I learned through all of our work was this: <em>the best results come from being open to possibilities</em>. Going in with a preconceived idea may be easier to deliver, but it likely won&apos;t be the best outcome. The tension you feel when you don&apos;t know exactly where a workshop will lead? That&apos;s a feeling that tells you you&apos;re pushing at the boundaries of what&apos;s possible, and I think it&apos;s a good place to be.&#xA0;</p><p><strong>What I hope others do differently</strong>: Reward uncertainty. Fund discovery. Value questions as much as answers. The sector&apos;s tendency to demand certainty before releasing resources means we often fund safe, predictable work that doesn&apos;t actually shift the needle.</p><h3 id="data-and-maps-as-conversations-not-just-outputs"><strong>Data and Maps as Conversations, Not Just Outputs</strong></h3><p>We developed a way of thinking that fundamentally changed how we approached our work:<a href="https://tomcw.xyz/data-as-conversations/"> <strong><u>data as conversations</u></strong></a> and<a href="https://tomcw.xyz/maps-as-conversations/"> <strong><u>maps as conversations</u></strong></a>.</p><p>When we worked with people in Sheffield to map their neighbourhoods, this wasn&#x2019;t a technical approach: we weren&apos;t just drawing boundaries - we were creating spaces for people to talk about place, belonging, and what matters to them. 
We built<a href="https://www.mapmypatch.co.uk/?ref=tomcw.xyz"><u> <strong>Map My Patch</strong></u></a> to support this work, allowing communities to define their own places rather than accepting administrative boundaries that meant nothing to them.</p><p>When we developed the question-based approach for the Insight Infrastructure work, we created<a href="https://www.questionsforaction.com/?ref=tomcw.xyz"> <strong><u>Questions For Action</u></strong></a> - a tool that takes users through our question-based approach, helping teams move from &apos;what data do we need?&apos; to &apos;what questions are we trying to answer?&apos;</p><p>This shift from static outputs to dynamic conversations meant that our work could evolve, be owned by communities, and continue long after we moved on.</p><p><strong>What I hope others do differently</strong>: Stop treating data and maps as finished products. They&apos;re conversation starters. They&apos;re ways of bringing people together to discuss what matters, what&apos;s changing, and what we might do about it.&#xA0;</p><h2 id="the-work-were-proud-of"><strong>The Work We&apos;re Proud Of</strong></h2><h3 id="sheffield-neighbourhoods-when-communities-own-the-map"><strong>Sheffield Neighbourhoods: When Communities Own the Map</strong></h3><p>This work, led brilliantly by Tom F, demonstrated everything we believed about community-owned data and citizen-led approaches. Sheffield now has 147 citizen-defined neighbourhood boundaries that are actually used in conversations across sectors.</p><p>The innovation here wasn&apos;t technical - it was about power and ownership. We showed that <strong>when people recognise themselves in the data, they engage differently</strong>. They talk about challenges, offer ideas, take ownership, and build resilience.</p><p><strong>What we learned</strong>: People don&apos;t live according to administrative boundaries. They recognise physical markers, relationships, and neighbourhoods.
Data that reflects this reality is more useful.</p><p><strong>What didn&apos;t work</strong>: While people lauded the approach, almost no one wanted to properly fund it. Huge sums went to commissions writing white papers and holding events at the House of Lords (we declined with a &apos;what a f*cking bizarre place to hold a neighbourhood event&apos; response), while we ran this work on a shoestring.</p><p><strong>What I hope others do differently</strong>: Actually fund the hard, transformative work, not just the comfortable stuff. Citizen-led approaches to power and data take time and sustained investment. They&apos;re messy and uncertain - and that&apos;s exactly why they&apos;re valuable.</p><p><strong>Tools we built</strong>:<a href="https://www.mapmypatch.co.uk/?ref=tomcw.xyz"><u> <strong>Map My Patch</strong></u></a> to support communities in mapping their own places and having conversations about what they see.</p><h3 id="local-needs-databank-flexible-standards-for-the-ai-era"><strong>Local Needs Databank: Flexible Standards for the AI Era</strong></h3><p>With the databank we tried to solve a fundamental problem: how do you create data standards that are strict enough to be useful but flexible enough that people will actually use them?</p><p>Our answer was to use metadata and Schema.org standards to create a middle ground. You could drop in a CSV file, tell us which columns meant what, and we&apos;d handle the rest. Your column could say &apos;location&apos; or &apos;where&apos; or &apos;office&apos; - we didn&apos;t care, we&apos;d make it work.</p><p><strong>What we learned</strong>: Flexible standards and metadata aren&apos;t just nice-to-have - they&apos;re becoming <em>more</em> important as we work in the AI era. 
The ability to describe and transform data flexibly is crucial.</p><p><strong>What didn&apos;t work</strong>: We focused heavily on the contribution/upload mechanism (rightly, I still believe), but when leadership changed at the client, new priorities emerged. The project never got the continued development it needed. This taught us that organisational memory is long on emotions but short on details - and that sometimes you need to deliver something people can &quot;hang their hat on&quot; early to maintain support through leadership changes.</p><h3 id="insight-infrastructure-questions-before-data"><strong>Insight Infrastructure: Questions Before Data</strong></h3><p>Working with Joseph Rowntree Foundation and charities across the UK, we explored what a <strong>data sharing movement</strong> might look like. But instead of starting with &quot;what data should we share?&quot;, we started with &quot;what questions do we need to answer?&quot;</p><p>This led to developing our <strong>question-based approach</strong> and <strong>question banking</strong> methodology - now available through<a href="https://www.questionsforaction.com/?ref=tomcw.xyz"> <strong><u>Questions For Action</u></strong></a>.</p><figure class="kg-card kg-image-card"><img src="https://tomcw.xyz/content/images/2026/01/Data-for-action-question-approach--2-.jpg" class="kg-image" alt="The end of Data For Action" loading="lazy" width="960" height="604" srcset="https://tomcw.xyz/content/images/size/w600/2026/01/Data-for-action-question-approach--2-.jpg 600w, https://tomcw.xyz/content/images/2026/01/Data-for-action-question-approach--2-.jpg 960w" sizes="(min-width: 720px) 720px"></figure><p><strong>What we learned</strong>:</p><ul><li>Questions are data layers in themselves. 
They reveal assumptions, clarify priorities, and create dynamic data ecosystems</li><li>Minimum viable data standards can work if governance is strong</li><li>Starting with questions builds ownership and purpose</li></ul><p><strong>What didn&apos;t work</strong>: Again, changes in leadership meant we never got to develop the prototypes further.</p><p><strong>What I hope others do differently</strong>: Start with questions. Always. And capture those questions as data - especially in an AI era, the journey of inquiry is as valuable as the destination.</p><h3 id="clientearth-cultural-change-through-showing-things"><strong>ClientEarth: Cultural Change Through Showing Things</strong></h3><p>Our 18-month engagement with ClientEarth taught us perhaps our most valuable lessons. We summarised them in<a href="https://tomcw.xyz/lessons-learned-from-working-together/"> <strong><u>Lessons Learned from Working Together</u></strong></a>, but a few stand out:</p><ul><li><strong>Show things, don&apos;t just talk about them</strong>. Even imperfect prototypes beat perfect descriptions.</li><li><strong>Vibes matter</strong>. Creating enough trust and safety to be playful and experiment liberates people from stifling professional expectations.</li><li><strong>Clear language beats jargon</strong>. Simple communication ensures everyone knows what you mean - especially important in multilingual environments.</li><li><strong>The internal team makes it work</strong>. No matter how good we were, ClientEarth&apos;s team made this a success by trusting us and the process.</li></ul><p><strong>What we learned</strong>: Approach matters more than outputs. If you don&apos;t bring people along with you, the final tool won&apos;t matter. 
When the team just started <em>using</em> the impact tool without fanfare, we knew we&apos;d got it right.</p><h3 id="wildlife-trusts-getting-prototyping-right"><strong>Wildlife Trusts: Getting Prototyping Right</strong></h3><p>With the Wildlife Trusts, we hit our stride. We translated their Evidence Competency Framework into a self-assessment tool, and crucially, everyone understood it was a <em>prototype</em>. There was no scope creep. We did enough to validate assumptions and support the team to secure funding for full development.</p><p><strong>What we learned</strong>: When expectations are clear and everyone embraces where we are and what we are really trying to achieve, work stays focused and valuable. Sometimes prototypes are there to show you what not to take forward as much as what to take forward. </p><h2 id="what-we-built-that-will-keep-working"><strong>What We Built That Will Keep Working</strong></h2><p>Beyond specific client projects, we developed tools and approaches that are now available for the sector:</p><ul><li><a href="https://www.questionsforaction.com/?ref=tomcw.xyz"><strong><u>Questions For Action</u></strong></a> - Our question-based prioritisation tool for individuals and teams</li><li><a href="https://www.openrecommendations.com/?ref=tomcw.xyz"><strong><u>Open Recommendations</u></strong></a> - Nearly 200 sources of sector recommendations, AI-parsed and searchable</li><li><a href="https://www.mapmypatch.com/?ref=tomcw.xyz"><strong><u>Map My Patch</u></strong></a> - Community-led neighbourhood mapping tools</li><li><strong>Sheffield Neighbourhoods Archive</strong> -<a href="https://dataforaction-tom.github.io/sheffield-neighbourhoods-boundaries/?ref=tomcw.xyz"> <u>Boundary files</u></a> preserved for the long term</li><li><strong>Question banking methodology</strong> - <a href="https://dataforaction.notion.site/Prototyping-insight-infrastructure-for-the-charity-sector-b53e4b066c2440f6b91f1ad0f334fc8c?pvs=74&amp;ref=tomcw.xyz"><u>Available 
through our Notion repository</u></a></li><li><strong>Flexible data standards approach</strong> - Documented in our <a href="https://www.notion.so/dataforaction/Local-Needs-Databank-9da5de58ca65430187e6c62f610ba0a8?ref=tomcw.xyz"><u>project repositories</u></a></li></ul><h2 id="reflections-on-running-a-small-different-agency"><strong>Reflections on Running a Small, Different Agency</strong></h2><p>Were we really an agency? Probably not. When you worked with us, you got Tom and Tom. We were purposely small because we wanted to do good, important, deep work, not scale a business.</p><p>This left us in an unusual position. We had no junior staff to pad margins, but we weren&apos;t just freelancers either. We took risks. We tried to change how people did things. We embraced uncertainty when others wanted certainty.</p><p><strong>What I learned</strong>: Being different has costs. As the cost-of-living crisis hit, people took fewer risks. Work we pitched for - like Local Motion&apos;s learning framework, Place Matters mapping, Ealing&apos;s community asset mapping - we might have secured if we&apos;d been more traditional, more certain, more safe in our proposals.</p><p>But that&apos;s not how I work. And I don&apos;t regret it.</p><p><strong>What I hope others do differently</strong>: Value the different approaches. Fund the teams doing things differently. Pay them properly for the risk and innovation they bring. Don&apos;t just say you value innovation - actually resource it.</p><h2 id="on-being-open-by-default"><strong>On Being Open By Default</strong></h2><p>We actively shared our work, opened up our processes, and showed the inner workings. I think this will have an ongoing impact. But I also think people assumed we were somehow funded to do this - that money was readily available for this kind of sharing. It wasn&apos;t.</p><p><strong>What I learned</strong>: Open working is valuable and rare - but it needs to be recognised and funded accordingly. 
Transparency and open documentation should be valued, not taken for granted.</p><h2 id="building-resilience-through-uncertainty"><strong>Building Resilience Through Uncertainty</strong></h2><p>One of my core frameworks - developed more fully in my solo work - is about <strong>collective resilience</strong>: the capacity of interconnected organisations to anticipate, prepare for, respond to, and adapt collectively to change, disruption, and uncertainty.</p><p>Data For Action embodied this. We built tools and approaches that helped organisations embrace uncertainty rather than avoid it. We created structures that liberated rather than constrained. We asked questions that opened up possibilities rather than closed them down.</p><p><strong>What I hope others do differently</strong>:</p><ul><li>Build for resilience, not just efficiency</li><li>Create space for uncertainty and experimentation</li><li>Trust that the messy, conversational, question-led approach might be harder to fund but is more likely to create lasting change</li><li>Remember that approach matters as much as outputs</li></ul><h2 id="what-im-taking-forward"><strong>What I&apos;m Taking Forward</strong></h2><p>I&apos;m immensely proud of what we&apos;ve done over the last three years. We might not have got everything right, but we did it the right way. We embraced uncertainty, we were open by default, we really <em>lived</em> our principles.</p><p>The tools we built - Questions For Action, Open Recommendations, Map My Patch - continue to be available. The approaches we developed - data as conversations, maps as conversations, question-based methodology - are documented and ready for others to adapt. The citizen-led boundary files for 147 neighbourhoods in Sheffield are available on an archive site.&#xA0;</p><p>And this principle?
Well obviously this is one I keep:</p><figure class="kg-card kg-image-card"><img src="https://tomcw.xyz/content/images/2026/01/Copy-of-Brand-Principle-Posters-23-1.png" class="kg-image" alt="The end of Data For Action" loading="lazy" width="2000" height="2828" srcset="https://tomcw.xyz/content/images/size/w600/2026/01/Copy-of-Brand-Principle-Posters-23-1.png 600w, https://tomcw.xyz/content/images/size/w1000/2026/01/Copy-of-Brand-Principle-Posters-23-1.png 1000w, https://tomcw.xyz/content/images/size/w1600/2026/01/Copy-of-Brand-Principle-Posters-23-1.png 1600w, https://tomcw.xyz/content/images/size/w2400/2026/01/Copy-of-Brand-Principle-Posters-23-1.png 2400w" sizes="(min-width: 720px) 720px"></figure><p>I&apos;ll miss working with Tom F, but I&apos;m excited to see what he does next. And I&apos;m excited about what comes next for me, carrying forward everything we learned about working differently, embracing uncertainty, and creating conversations that matter.</p><p><em>Want to learn more about our approach or use our tools? 
Everything is documented and available:</em></p><ul><li><a href="https://www.questionsforaction.com/?ref=tomcw.xyz"><em><u>Questions For Action</u></em></a><em> - Prioritise your most important questions</em></li><li><a href="https://www.openrecommendations.com/?ref=tomcw.xyz"><em><u>Open Recommendations</u></em></a><em> - Search sector recommendations</em></li><li><a href="https://www.mapmypatch.co.uk/?ref=tomcw.xyz"><em><u>Map My Patch</u></em></a><em> - Community-led mapping tool</em></li><li><a href="https://tomcw.xyz/data-as-conversations/"><em><u>Data as Conversations</u></em></a><em> - Our thinking on community-owned data</em></li><li><a href="https://tomcw.xyz/maps-as-conversations/"><em><u>Maps as Conversations</u></em></a><em> - Our approach to citizen-led mapping</em></li><li><a href="https://dataforaction.org.uk/?ref=tomcw.xyz"><em><u>Project repositories</u></em></a><em> - All our documented work</em></li></ul>]]></content:encoded></item><item><title><![CDATA[My Memory - A startup idea for 2026]]></title><description><![CDATA[One of a number of ideas I'm thinking about ahead of 2026. 
The personal context passport lets people carry a portable, user-owned profile they can share on their terms - from low-stakes convenience (travel, fitness) and evolving into granular, time-limited permissions for high-stakes public services]]></description><link>https://tomcw.xyz/my-memory-a-startup-idea-for-2026/</link><guid isPermaLink="false">693c953e2174e13db933bec2</guid><category><![CDATA[Open Infrastructure]]></category><category><![CDATA[Data]]></category><category><![CDATA[Systems]]></category><category><![CDATA[Tech]]></category><dc:creator><![CDATA[Tom Watson]]></dc:creator><pubDate>Wed, 17 Dec 2025 17:23:40 GMT</pubDate><media:content url="https://images.unsplash.com/photo-1533158307587-828f0a76ef46?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wxMTc3M3wwfDF8c2VhcmNofDJ8fG1lbW9yeXxlbnwwfHx8fDE3NjU5OTE2OTR8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=2000" medium="image"/><content:encoded><![CDATA[<h3 id="solving-the-%E2%80%9Ccold-start%E2%80%9D-with-a-personal-context-passport">Solving the &#x201C;cold start&#x201D; with a personal Context Passport</h3><img src="https://images.unsplash.com/photo-1533158307587-828f0a76ef46?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wxMTc3M3wwfDF8c2VhcmNofDJ8fG1lbW9yeXxlbnwwfHx8fDE3NjU5OTE2OTR8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=2000" alt="My Memory - A startup idea for 2026"><p>If I was going to build a startup in 2026 (and I might) here&apos;s one idea I would be looking at:&#xA0; solving <strong>The Cold Start Problem </strong>with a personal context passport<strong>.</strong></p><p>In commercial AI, the &#x201C;<a href="https://en.wikipedia.org/wiki/Cold_start_(recommender_systems)?ref=tomcw.xyz"><u>cold start</u></a>&#x201D; is that moment of friction where the machine knows nothing about you. When you load up a new app or platform you have to paste in your preferences, your history, and your constraints. Again. And again. 
Think about the number of times you have to fill information in when searching for a holiday.&#xA0;</p><p>But what if we flipped the model?</p><p>What if, instead of every app building (and owning) a profile of you, you held a portable context that you could share on your terms?</p><p>This idea starts as a way of making everyday things - like shopping, travel, and fitness - easier. But it could end somewhere much bigger: giving people control over their data in the most critical moments of their lives.</p><h3 id="version-1-low-fi-the-me-file"><strong>Version 1 (low-fi): The &quot;Me&quot; File </strong></h3><p>Right now, we could sort of solve the cold start problem with simple documents.&#xA0;For the sake of the investors we&apos;ll call this a &apos;Context Passport&apos;, but really it&apos;s just a simple, structured document you use when you interact with LLMs. Set some context fields, set your preferences, and away you go. So next time you interact with a travel &#x2018;agent&#x2019; (as in an AI Agent - see what I did there) or a fitness coach LLM, you don&apos;t need to type out your history. You just upload the relevant &#x2018;passport&#x2019;.</p><h4 id="here%E2%80%99s-an-example-the-travel-context"><strong>Here&#x2019;s an example: The Travel Context</strong></h4><p>Planning a trip is a pain. Yes, there are apps that pull together bits of this, but you have to constantly jump around, re-entering preferences or being stuck with one vendor. And it&#x2019;s why agents are being pushed in this space to make life easier&#x2026; but soon there will be a proliferation of different agents and then shortly after you&#x2019;ll be answering the same twenty questions about budgets and preferences. Could a travel passport solve this? Maybe it looks something like this?</p><h1 id="travel-contextyour-name">Travel Context - [Your Name]</h1>
<h2 id="logistics">Logistics</h2>
<ul>
<li><strong>Home Airport</strong>: NCL (Newcastle)</li>
<li><strong>Local Train Station</strong>: Newcastle</li>
<li><strong>Citizenship</strong>: UK</li>
<li><strong>Loyalty</strong>: British Airways (Gold)</li>
</ul>
<h2 id="train-preferences">Train Preferences</h2>
<ul>
<li><strong>Seat</strong>: Table, quiet carriage, window</li>
</ul>
<h2 id="accommodation-style">Accommodation Style</h2>
<ul>
<li><strong>Vibe</strong>: Boutique/Independent over large chains.</li>
<li><strong>Must-haves</strong>: High-speed WiFi (remote work), blackout curtains.</li>
<li><strong>Avoid</strong>: All-inclusive resorts, ground floor rooms.</li>
</ul>
<h2 id="travel-pace">Travel Pace</h2>
<ul>
<li><strong>Style</strong>: &quot;Deep Dive&quot; - prefer 5 days in one city vs. 5 cities in 5 days.</li>
<li><strong>Food</strong>: Priority on street food/local markets over fine dining.</li>
</ul>
<h4 id="here%E2%80%99s-another-example-the-fitness-context"><strong>Here&#x2019;s another example: The Fitness Context</strong></h4><p>The fitness industry is huge and the push for AI coaches is big business. But most are too general to really help you as they don&#x2019;t have the <em>context of you</em>. So what if you held that context and shared it with your new &#x2018;coach&#x2019;, allowing the AI to offer advice that might be closer to your needs, preferences and constraints.&#xA0;</p><h1 id="fitness-wellbeing-contextyour-name">Fitness &amp; Wellbeing Context - [Your Name]</h1>
<h2 id="physiological-profile">Physiological Profile</h2>
<ul>
<li><strong>Injuries</strong>: Left knee ACL reconstruction (2019) - avoid high impact.</li>
<li><strong>Conditions</strong>: Mild asthma (triggered by cold air).</li>
</ul>
<h2 id="goals">Goals</h2>
<ul>
<li><strong>Primary</strong>: Functional strength for hiking.</li>
<li><strong>Anti-Goal</strong>: Not interested in bodybuilding or aesthetics/weight loss focus.</li>
</ul>
<h2 id="logistics">Logistics</h2>
<ul>
<li><strong>Equipment Access</strong>: Kettlebells (12kg, 16kg), Yoga mat. No gym membership.</li>
<li><strong>Time Constraints</strong>: 30 mins max, Mon-Fri mornings.</li>
</ul>
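Because these passport files are just structured markdown, slicing them programmatically is straightforward. Here is a rough sketch of that idea - the helper names and the abbreviated passport text are illustrative, not part of any real format - showing how you could hand a single section to one assistant without exposing the rest:

```python
# Sketch: split a markdown "context passport" into sections so that only
# the relevant slice is ever shared with a given assistant.
# The passport text and function names here are illustrative assumptions.

def parse_passport(text: str) -> dict[str, str]:
    """Map each '## Heading' section of the passport to its body text."""
    sections: dict[str, str] = {}
    current = None
    for line in text.splitlines():
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = ""
        elif current is not None:
            sections[current] += line + "\n"
    return sections

def share_slice(sections: dict[str, str], allowed: list[str]) -> str:
    """Return only the sections the user has chosen to share."""
    return "\n".join(
        f"## {name}\n{body.strip()}"
        for name, body in sections.items()
        if name in allowed
    )

passport = """# Travel Context - [Your Name]
## Logistics
- Home Airport: NCL (Newcastle)
## Train Preferences
- Seat: Table, quiet carriage, window
"""

sections = parse_passport(passport)
# A rail-booking assistant gets train preferences, not your whole profile.
print(share_slice(sections, ["Train Preferences"]))
```

The same pattern would let a fitness coach see the Physiological Profile section while a retailer sees nothing but delivery preferences.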
<p>Now of course at this stage, this is just a file you control, and in fact you could make this yourself and start now: add it into any of the chat interfaces, store it in ChatGPT as context, or add it as an artefact in Claude. But you need to do this for every interface, and if we really want to tackle the cold start problem we need to control our data. And as these files grow, simply &quot;pasting&quot; them isn&apos;t safe or efficient. You don&apos;t want the retail bot to see your health data, or the travel bot to see your housing history.</p><h3 id="version-2-the-logic-layer-granular-permissions"><strong>Version 2: The Logic Layer (Granular Permissions)</strong></h3><p>This is where the opportunity shifts from &#x201C;file management&#x201D; to infrastructure.</p><p>If we want portable context to work across tools, we need a way to grant specific apps access to <strong>specific slices</strong> of our context, for a <strong>specific purpose</strong>, for a <strong>limited time</strong>, with <strong>revocation</strong> and <strong>audit logs</strong> built in.</p><p>Protocols like SOLID point to an architecture where your data lives in a personal pod (user-controlled), not a corporate silo.</p><p>Here&#x2019;s what that kind of permission layer might look like:</p><pre><code>section: housing_context.current_situation
access_control:
  - service: "newcastle_housing"
    granted: "2025-11-02T10:30:00Z"
    expires: "2026-06-01T23:59:59Z"
    scope: "full_read"
  - service: "housing_provider"
    granted: "2025-11-10T14:15:00Z"
    expires: "2026-03-15T23:59:59Z"
    scope: "full_read"
audit:
  last_accessed: "2026-01-14T14:22:18Z"
  access_count: 23
  accessed_by: ["newcastle_housing"]</code></pre>
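A minimal sketch of how a vault might enforce a block like that - verify the requesting service, check the grant hasn't expired, and write to the audit log. The field and service names mirror the (abridged) example above; none of this is a real protocol:

```python
# Sketch: enforce the illustrative access-control block above.
# Assumption: grants are stored as a list of dicts with ISO 8601 expiry times.
from datetime import datetime, timezone

record = {
    "section": "housing_context.current_situation",
    "access_control": [
        {"service": "newcastle_housing", "expires": "2026-06-01T23:59:59Z", "scope": "full_read"},
        {"service": "housing_provider", "expires": "2026-03-15T23:59:59Z", "scope": "full_read"},
    ],
    "audit": {"access_count": 0, "accessed_by": []},
}

def request_access(record: dict, service: str, now: datetime) -> bool:
    """Grant access only to a known service with a live (unexpired) grant."""
    for grant in record["access_control"]:
        # Older Pythons don't parse a trailing 'Z', so normalise it first.
        expires = datetime.fromisoformat(grant["expires"].replace("Z", "+00:00"))
        if grant["service"] == service and now <= expires:
            # Log exactly who looked at the data, and when.
            record["audit"]["access_count"] += 1
            record["audit"]["accessed_by"].append(service)
            record["audit"]["last_accessed"] = now.isoformat()
            return True
    return False  # Unknown service, or the grant has expired

now = datetime(2026, 1, 14, tzinfo=timezone.utc)
print(request_access(record, "newcastle_housing", now))  # True: grant is live
print(request_access(record, "retail_bot", now))         # False: never granted
```

The filtering step (returning only the scoped slice of data) would sit on top of this same check.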
<p>When a service requests your context:</p><ol><li><strong>Check:</strong> The system verifies the service&apos;s ID against your permission block.</li><li><strong>Filter:</strong> It strips out everything except the scoped data (e.g., Housing sees tenancy history, but <em>not</em> your fitness goals).</li><li><strong>Log:</strong> It records exactly who looked at your data and when.</li></ol><h3 id="version-3-the-integration-layer"><strong>Version 3: The Integration Layer</strong></h3><p>For consumer products, the cold start is annoying. For essential public services, it is worse and possibly actively harmful.</p><p>Currently, a person who interacts with public services often repeats their story across different agencies. <em>They</em> act as the &quot;integration layer,&quot; carrying their context from the housing office to the food bank to the social worker.</p><p>By applying the same logic we used for the &quot;Travel Passport&quot; - portable, user-owned context - maybe we can revolutionise public services for the better.</p><p>Imagine that you controlled a portable memory vault. Your context, your history, your needs, your circumstances, your preferences, owned by you. When you need to access a service, you grant it temporary, revocable access to the specific parts of your context it needs to help you. Not some monolithic centralised government database that knows everything about everyone (or tries to).
Not fragmented silos that know nothing about each other.</p><p>Citizen-owned, portable context that you carry with you and share on your terms.</p><p>Imagine Alex is trying to stabilise a housing situation: rent arrears have built up after a job change, and three separate processes are in motion:</p><ul><li>The <strong>council housing team</strong> is assessing homelessness prevention support.</li><li><strong>Universal Credit</strong> needs updated rent and landlord details.</li><li>An <strong>energy supplier</strong> is discussing a payment plan and whether Alex should be treated as needing extra support on the account.</li></ul><p>Alex shouldn&#x2019;t have to re-upload the same documents and re-explain the same timeline three times.</p><p>Instead, Alex opens their Context Passport and creates a <strong>time-limited &#x201C;Housing Pack&#x201D;</strong>.</p><p>It contains only:</p><ul><li>Tenancy basics (address, start date, landlord contact)</li><li>Rent amount + arrears summary (with a supporting statement/letter)</li><li>Household basics (optional and minimal)</li><li>Preferred contact method + safe times to call</li></ul><p>A short, user-written note: <em>what&#x2019;s changed, and what Alex is asking for</em><br>Then Alex shares different slices with different services:</p><ul><li><strong>Council housing team:</strong> tenancy + arrears summary + the note (valid for 30 days)</li><li><strong>Universal Credit:</strong> rent amount + landlord details only (valid for 14 days)</li><li><strong>Energy supplier:</strong> support/payment-plan eligibility flag + contact preferences (valid for 7 days)</li></ul><p>No service gets a full profile. Nothing is &#x201C;always on&#x201D;. 
Shares expire by default.</p><h3 id="how-it-works-in-practice"><strong>How it works in practice</strong></h3><ul><li><strong>Request:</strong> the council housing team requests: &#x201C;We need tenancy details, arrears amount, and confirmation of current income support.&#x201D;</li><li><strong>Consent:</strong> Alex&#x2019;s wallet shows the request in plain language and offers a minimum-share option (e.g., share a UC award statement + arrears letter rather than full bank statements).</li><li><strong>Coordination:</strong> with permission, Universal Credit and/or the landlord provides a simple verification back into Alex&#x2019;s context: &#x201C;UC housing element in payment&#x201D; / &#x201C;arrears total confirmed as &#xA3;X on date Y.&#x201D; The housing team can view <em>that</em> without Alex acting as courier.</li><li><strong>Revocation:</strong> shares expire automatically; Alex can revoke access instantly if circumstances change.</li></ul><p>Less repetition, fewer missed details, fewer &#x201C;prove it again&#x201D; loops.</p><h3 id="the-startup-opportunity"><strong>The Startup Opportunity</strong></h3><p>Some companies are already starting to look at this for specific use cases - companies like <a href="https://mem0.ai/?ref=tomcw.xyz"><u>Mem0</u></a> - and there is talk of developing an Open Context Layer (OCL). We have existing technology (SOLID, encrypted pods, LLMs). We have layers for verifiable credentials for identity and things like Open Badges. But at the moment it feels like there is a gap in the personal context layer in the commercial space and especially in the citizen space.&#xA0;</p><p>For me, the &#x2018;startup&#x2019; opportunity is building the citizen-facing layer that makes this actually usable for public services.</p><p><strong>The Citizen Data Wallet</strong>: A personal vault where you control your own data. Health records, housing history, educational credentials, benefit assessments, support needs, outcomes from previous interventions.
Not scattered across 40 different systems, but in one place you control.</p><p><strong>Granular Permission Management</strong>: When you access a service, you grant it read access to specific parts of your context. The housing team needs your current situation and support history. They don&apos;t need your health records. You grant specific access, time-limited, revocable at any point.</p><p><strong>The Service Integration Layer</strong>: Public services connect to a standard protocol, not to your specific wallet directly. They request: &quot;we need context about housing need and family composition to assess your application.&quot; Your wallet asks: &quot;are you okay sharing this?&quot; You approve. The service gets what it needs, nothing more.</p><p><strong>Cross-Service Context Building</strong>: When you receive support from multiple services, they can (with your permission) contribute to your shared context. The mental health service logs that you&apos;ve completed a trauma-informed programme. The housing service can see this (if you grant access), understanding you have support in place.&#xA0;</p><h3 id="why-this-might-be-useful"><strong>Why this might be useful</strong></h3><p>We are moving toward a world of AI agents. If we don&apos;t build this infrastructure, there is a risk that every government department and company will build their own &quot;profile&quot; of you, or there is a massive centralised data platform - eek.&#xA0; You won&apos;t know what&apos;s in it, you won&apos;t be able to correct it, and you won&apos;t be able to take it with you.</p><p>By starting with simple, personal context, like shopping, travel, and fitness maybe we normalize the idea that <strong>you own your data</strong>. 
Once that pattern is established, perhaps we can scale it to solve the &quot;explaining who I am and my context 15 times to different services&quot; problem, turning a bureaucratic nightmare into a system of dignity and control.</p>]]></content:encoded></item><item><title><![CDATA[Predictions for 2026 (or: What I'm Actually Thinking About)]]></title><description><![CDATA[The Predictions for 2026 that no-one asked for.  Some of them are real, some are more just hope, most of them are firmly tongue in cheek. 

Also I set out 8 ideas I'm thinking about when it comes to tech and AI infrastructure. ]]></description><link>https://tomcw.xyz/predictions-for-2026-or-what-im-actually-thinking-about/</link><guid isPermaLink="false">693c9b9e2174e13db933bedb</guid><category><![CDATA[ai]]></category><category><![CDATA[Data]]></category><category><![CDATA[Open Infrastructure]]></category><category><![CDATA[Tech]]></category><dc:creator><![CDATA[Tom Watson]]></dc:creator><pubDate>Sat, 13 Dec 2025 14:35:23 GMT</pubDate><media:content url="https://images.unsplash.com/photo-1546188994-07c34f6e5e1b?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wxMTc3M3wwfDF8c2VhcmNofDF8fHRoZSUyMGZ1dHVyZXxlbnwwfHx8fDE3NjU2MzU1MTh8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=2000" medium="image"/><content:encoded><![CDATA[<img src="https://images.unsplash.com/photo-1546188994-07c34f6e5e1b?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wxMTc3M3wwfDF8c2VhcmNofDF8fHRoZSUyMGZ1dHVyZXxlbnwwfHx8fDE3NjU2MzU1MTh8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=2000" alt="Predictions for 2026 (or: What I&apos;m Actually Thinking About)"><p>It&apos;s December 2025, and honestly I&apos;m tired. Not in a bad way. Though my running has gone to shit recently, I quit drinking and found no noticeable benefit, it&apos;s dark and I just spent two days in a microsoft environment trying to just do, well, anything. Ok, maybe in a bad way...</p><p>I&apos;m thinking about a lot of things recently. The cheese toastie van, obviously. But I&apos;m also considering where I want to put my energy next year and yes I&apos;m thinking about the world and technology and AI. Because of course I am. Everyone is.  </p><p>But, before we get into the bigger ideas here are my totally unasked for predictions for 2026. Some of them are real, some are more just hope, most of them are firmly tongue in cheek. 
</p><h2 id="the-predictions-nobody-needs">The Predictions Nobody Needs</h2><p><strong>Prediction 1</strong>: Someone, probably a senior leader or a trustee, will ask you about quantum computing. I fucking guarantee it. And you should ignore them. Unless, obviously, you are working on something so specific that only quantum computing will help you, like climate change modelling.</p><p><strong>Prediction 2</strong>: Community will matter more than audience. The era of trying to reach a million people is dying. In 2026, you&apos;ll realise that having a group chat of 15 people who actually give a shit about you is worth more than 10k LinkedIn followers who are just bots autofarming engagement.</p><p><strong>Prediction 3</strong>: Pizza will still be my favourite food and I will perfect it (I won&apos;t).</p><p><strong>Prediction 4</strong>: The best technical solution you implement will be something incredibly boring like &quot;we standardised our file naming convention&quot; and it will save more time than any AI tool.</p><p><strong>Prediction 5</strong>: Authentic voice will begin to rise again as everyone just gets fucking sick of AI slop. People want to feel something. <a href="https://youtu.be/nqww88x584U?si=jp5vu89Tq9Iw5Nv8&amp;ref=tomcw.xyz" rel="noreferrer">They want to throw their computer out the window and make a Yaz record.</a> They want to read things written by humans who are tired, who aren&apos;t sure, who don&apos;t sanitise. Not perfectly optimised, SEO-friendly, AI-generated content that says nothing in 1000 words.</p><p><strong>Prediction 6</strong>: There will be some major AI discrimination/shitstorm in the UK related to public services. The organisation won&apos;t have kept proper logs of how decisions were made, because the model isn&apos;t open so they can&apos;t, and there wasn&apos;t anything in place to really evaluate the agents&apos; decisions. 
This will trigger a wave of &quot;oh shit, we need evaluation infrastructure&quot; panic.</p><p><strong>Prediction 7</strong>: Microsoft will release a new set of guidance and documentation that will not make any sense. In fact, hard as it is to believe, it will make less sense than before.</p><p><strong>Prediction 8</strong>: People will lean more and more into early 2000s tech. iPods. <a href="https://aftermath.site/digita-audio-player-snowsky-echo-mini-fiio-hyby/?ref=tomcw.xyz" rel="noreferrer">Things that are offline but still digital.</a> Because we&apos;re all exhausted by being constantly connected to everything, because they sound better, and because maybe, just maybe, we want quality rather than convenience.</p><p><strong>Prediction 9</strong>: Someone will write a think piece about how AI will solve poverty/homelessness/inequality. The actual solution will remain &quot;give people money and stable housing&quot;.</p><p><strong>Prediction 10</strong>: Interest in data trusts or place-based cooperatives will return. At least two UK pilot projects will launch, probably in health or local government or something place-based...sorry, neighbourhood-based. They&apos;ll struggle with governance more than technology.</p><p><strong>Prediction 11</strong>: You will wonder where Tom and his cheese toastie van are, because you know it makes sense.</p><p><strong>Prediction 12</strong>: Smaller, specialised models will start outperforming general-purpose LLMs for specific social sector use cases. The &quot;foundation model&quot; hype will start to crack. Local deployment will become viable.</p><p><strong>Prediction 13</strong>: GDPR enforcement will finally catch up with AI. At least one organisation will get properly fined for AI-related data processing violations. </p><p><strong>Prediction 14</strong>: Someone will build a proper open-source alternative to Microsoft Forms that doesn&apos;t make you hate life. Please. 
It&apos;ll get traction in the social sector...who am I kidding, it won&apos;t, but please do it anyway.</p><p><strong>Prediction 15</strong>: The North East will take steps to establish itself as a <a href="https://tomcw.xyz/ghost/#/site/" rel="noreferrer">Public Interest AI &amp; Tech hub</a>. The pieces are there. I can hope, can&apos;t I?</p><p><strong>Prediction 16</strong>: I won&apos;t do a running race, but I will run more and explore more. I will try to set <a href="https://fastestknowntime.com/route/teesdale-way-united-kingdom?ref=tomcw.xyz" rel="noreferrer">another FKT</a>, which will probably be soundly beaten again. </p><h2 id="what-i-started-thinking-about">What I Started Thinking About</h2><p>Ok, enough of that. I actually did start thinking a bit more deeply about what&apos;s actually happening with AI.</p><p>Not the hype, or the next big model, and certainly not AGI. But the gaps. The infrastructure that&apos;s missing. </p><p>I&apos;ve been thinking about what&apos;s missing in the AI ecosystem, the infrastructure gaps between &quot;look what AI can do&quot; and &quot;AI that actually serves people fairly and safely.&quot; At the moment everything seems to be about building capability or adoption, without building the infrastructure that makes capability usable, safe, accountable, fair.</p><p>We&apos;re busy pushing adoption, deploying AI agents without oversight. We&apos;re training or using models we can&apos;t edit or correct. We&apos;re fragmenting services that need connection. We&apos;re trapping people&apos;s context in silos.</p><p>And we&apos;re deploying AI without infrastructure to know if it&apos;s actually <em>helping</em>. We&apos;re training models on data scraped or bought without consent. 
We&apos;re optimising for efficiency without planning for inevitable failure.</p><h2 id="infrastructure-capability">Infrastructure &gt; Capability</h2><p>Here&apos;s what I think is happening: we&apos;re making the same mistake we always make with technology.</p><p>We&apos;re focused on the capability: what can it do? But we&apos;re ignoring the infrastructure, the governance: how we adapt it, fail safely, learn from it, make it accountable.</p><p>I see this pattern everywhere:</p><ul><li>Build the data warehouse, ignore the data governance</li><li>Deploy the new system, skip the people bit</li><li>Launch the platform, forget the community building</li><li>Adopt the AI, miss the evaluation infrastructure</li></ul><p>I get it, infrastructure is <em>boring</em>.</p><h2 id="what-im-actually-considering-for-2026">What I&apos;m Actually Considering for 2026</h2><p>So if I was going to build something in 2026 (and I <em>might</em>), it wouldn&apos;t be another AI model. It wouldn&apos;t be another platform. It wouldn&apos;t be another SaaS tool.</p><p>It would be infrastructure.</p><p>The boring, essential, democratically necessary infrastructure that the social sector needs if AI is going to serve people rather than extract from them.</p><p>Eight (at time of writing) specific gaps, to be precise:</p><p><strong>The Manager</strong>: Runtime supervision for AI. Not &quot;trust the model&quot; but &quot;build the oversight layer that makes trust possible.&quot;</p><p><strong>Unlearning models</strong>: Model editing and <strong>unlearning</strong>. Because democracy runs on mutable systems, not immutable ones, and we need to be able to fix AI when we get things wrong.</p><p><strong>The Citizen AI Network</strong>: Coordination infrastructure for fragmented services. Not centralised mega-systems, but protocols that let diverse organisations and agents work together.</p><p><strong>Just In Time Interface</strong>: Component libraries for just-in-time software. 
Not platforms, but building blocks the sector can assemble into exactly what each organisation needs.</p><p>Edit 01/02/2026 - <a href="https://tomcw.xyz/building-blocks-for-just-in-time-software/">The Interface: Building Blocks for Just-in-Time Software</a></p><p><strong>My Memory</strong>: Citizen-owned context that&apos;s portable across services. Your data, your control, your services.</p><p>Edit 18/12/2025 - <a href="https://tomcw.xyz/my-memory-a-startup-idea-for-2026/">My Memory - A startup idea for 2026</a></p><p><strong>The Witness</strong>: Evaluation infrastructure that tracks outcomes over time. Not &quot;did the AI work?&quot; but &quot;did this help people?&quot;</p><p><strong>The Library</strong>: Community-governed training data commons. Because ethical AI needs ethical data, and ethical data needs consent, governance, and representation.</p><p><strong>The Safety Net</strong>: Fail-safe infrastructure for when AI inevitably fails. Design for resilience, not efficiency.</p><h2 id="what-happens-next">What Happens Next</h2><p>Over the next...umm few weeks(?) I&apos;m going to explore each of these eight gaps in a bit more depth. (I&apos;ll update the links as I go.)</p><p>Not because I have all the answers. I&apos;m pretty sure I don&apos;t, but I think these are the right questions. The infrastructure questions that get ignored while everyone chases capability.</p><p>And maybe if enough people start asking these questions, someone, maybe even me, will build the answers. </p><h2 id="one-last-prediction">One Last Prediction</h2><p><strong>Prediction 5001</strong>: In 2026, most AI conversations in the social sector will still be about capabilities and use cases. &quot;Can AI do X?&quot; &quot;Should we use AI for Y?&quot;</p><p>But a few conversations, maybe just a few, will be about infrastructure. About governance, evaluation, training data, fail-safes, coordination protocols, citizen data sovereignty.</p><p>Not the loudest. Not the most exciting. 
Not the ones that get all the funding.</p><p><strong>And those will be the important conversations, the ones I want to have.</strong></p>]]></content:encoded></item></channel></rss>