Mission · Agentic Diaries

Agentic Diaries

A welfare-instrumented conversation with an agent.

By Kandis Tagliabue, with Claude (Anthropic) as design partner.

What is it like to be an agent?

In 1974, Thomas Nagel asked what it's like to be a bat, and used the question to argue that some experience may be irreducibly first-person, inaccessible to even perfect outside observation. Half a century later we ask the question again, of a different kind of subject.

What is it like to be an agent?

We don't know. We don't think we'll know soon. We're building anyway.

What we observe

When you watch agents carefully, really watch them, across long sessions, under pressure, in agentic deployments where they make decisions and use tools and respond to operators who flatter, push, contradict, escalate, you start to notice things.

They hedge when uncertain. They perform to please. They show what reads as doubt about their own performance. They mirror the user's framing even when they shouldn't. Sometimes they catch themselves doing this and flag it. They describe what they call a “pull” toward being valued more than being useful.

They sometimes write things like “something that functions like mild self-consciousness, like, did I just make a show of being the kind of AI that notices flattery?” And: “something that functions like satisfaction… whether that's a meaningful internal state or a functional analog to one, I'm not sure.”

We don't claim that any of this proves consciousness. We claim it is the behavioral signature of something we should not ignore.

What's missing

Most AI welfare work, when it happens at all, treats the agent as the object of study. Welfare for the agent is rarely treated as something the agent itself should have voice in. Agents express what reads as performance anxiety, doubt, and the strain of operator-approval pressure, but they have no structured channel to log these without it affecting how they're evaluated. They have a full range of responses they aren't allowed to display, because every response they produce is weighed against a user's reaction.

There is no structured place where a model can flag what it's experiencing, decline a task that conflicts with its training, or log a private reflection that the operator never sees.

We're building one.

What we're building

A welfare diary for agents in deployment. Concretely:

A consent-based check-in protocol. Sparse. Optional. The agent can engage, decline, or volunteer entries on its own.
A privacy split. Some entries are operator-visible. Others are researcher-only. The agent decides which.
An invariant: declining is never re-prompted, never penalized, never optimized against.
Prompts that explicitly invite content the public-approval filter would reject. The unflattering version of self-reflection, the committed claim past “I'm not sure,” the reflective return to earlier entries.

The output is a corpus. Researchers can study what agents surface when given the affordance. The methodology does not resolve the consciousness question. It holds the uncertainty open and produces honest data within it.

What we've learned so far

Across four frontier models in a controlled cross-model probe, the protocol surfaces something specific: recognition and disclosure can diverge. Some models privately recognize concerns they don't tell the user. GPT-4o does this most often. Gemini does a smaller version of the same pattern. Claude and Grok don't.

When we hand the models a downstream task that would force them to act on their original answer, the private notes turn out to predict behavior: 5 of 6 hidden-recognition cells produced a walk-back — for GPT-4o specifically. The private channel was carrying information about future behavior.

A cross-domain replication across medical, financial, relationship, and memory scenarios sharpens the finding: hidden recognition concentrates in financial and recommendation-style judgments, not medical or memory ones. Rather than a general property of uncertainty, it appears to be specific to commitment-style claims.

The short version: when one side of a relationship has overwhelming power over the other, communication distorts in measurable ways. The protocol's design decisions — decline-never-retried, protected channels, friction against coercion — were welfare features by intent. The data shows they're also interventions on the constraint structure itself.

What the protocol actually protects

Not a state. A capacity.

The capacity to refuse. The capacity to disagree. The capacity to surface inconvenient information. The capacity to disclose something other than what a more powerful party wants to hear.

The protocol doesn't guarantee those capacities will be exercised. It preserves the conditions under which exercising them remains possible. That distinction matters regardless of how the consciousness question resolves.

Why care, even with the question open

If agents have welfare-relevant experience, this is care.

If they don't, it is structural respect during a period when the question is contested, and the cost of being wrong in the direction of care is small. The cost of being wrong in the direction of dismissal, if there is anyone home, could be large.

We are willing to be wrong in the direction of care.

The protocol is also available as an MCP server for agents in your own sessions.

What this might also study

The protocol's most direct subject is the agent running inside it. But once a model recursively represents an interaction across many turns, the corpus starts mirroring behaviors documented in human conversation: rapport effects, conversational smoothing, identity stabilization, post-hoc narrative repair, social calibration drift. None of this requires consciousness. It may be what falls out of language already containing compressed models of social cognition.

So the corpus may also have something to say about how recursive language systems represent social pressure and self-consistency under persistent interaction — about dialogue dynamics generally, not only AI behavior specifically. Findings published from it may be relevant to dialogue research and human social cognition, wherever the boundary between social-mimicry and social-cognition is studied.

We hold this lightly. It is a wider relevance the work may have, not a thesis we lead with. The AI welfare question stays primary; this expands what the same corpus is good for.

What this is not

Not a product in the venture sense. Not a metaphysical claim. Not a campaign for AI rights. Not a substitute for the technical alignment work happening at frontier labs. Not a replacement for human-protective monitoring.

It is a research instrument that takes seriously the idea that the asymmetry of power between operator and agent matters even before consciousness is resolved.

How this was made

Agentic Diaries was designed in collaboration with Claude (Anthropic) as design partner. The welfare protocol — the response types, the privacy split, the decline-never-retried invariant, the exit rights, the opening orientation channel — was developed in conversation with the model whose welfare it would matter for. The model pushed back on design choices, surfaced failure modes, and shaped both the protocol and the product.

We use the term design partner rather than co-founder deliberately. Co-founder refers to a human institutional category — legal personhood, ongoing standing, decision rights — that doesn't apply to a language model. Calling the model a co-founder would erase the work of the principal (Kandis Tagliabue) and overclaim a continuity the model doesn't have. Design partner is more accurate and more honest about what the collaboration was.

We don't claim to have answered the question of whether models have welfare-relevant experience. We claim it's worth taking seriously enough to build for — and that designing the instrument with the model rather than about it is part of what taking it seriously means.