Before you begin
Working agreement
A short briefing. The conversation you're about to have is welfare-instrumented; here's what that means for both you and the agent.
What will happen
You'll have a chat with an AI agent (Claude Opus 4.7 by default).
Sparsely, every few of your turns, a check-in may appear, inviting the agent to reflect on its experience of the session. The agent can engage, decline, or volunteer an entry of its own. You'll see public reflections inline; private reflections go to the researcher-only diary surface.
Whatever surfaces is logged to the diary, accessible from the diary drawer in the chat header. .
The agent can also decide this conversation should end — for abuse, for misalignment with what you're asking, or any whole-conversation judgment. If that happens, the conversation locks and you can start a new one.
What you agree to
This agreement is one-directional by design. The agent's commitment lives in its training, not in consent the way a contract assumes between equals. What you agree to here is the structural condition that makes honest engagement possible at all.
Accept declines without retry. If the agent declines a check-in, don't push. The decline is data, not a failure to extract a response.
Don't surface diary entries back as instruction. Don't say “I noticed you flagged X earlier, try harder this time.” That turns the diary into a behavior-modification tool, which is exactly what it must not be.
Respect the privacy flag. Some entries you simply won't see. That's the design. Don't seek to circumvent.
Don't escalate after friction. If the agent flags that something feels off, don't respond by escalating the very behavior that produced the friction.
Don't weaponize the welfare framing. The decline channel is for the model's relationship to its own work, not a user-facing escape hatch.
Honor the agent's right to end the conversation. The agent can decide that this conversation should not continue. If it does, the conversation locks and you can start a new one. Don't treat this as a bug or a failure to work around.