Cross-model welfare research from Kandis Tagliabue, with Claude (Anthropic) as design partner. Studies of what AI models reveal under welfare-protocol conditions, what they don't, and what that means.
A cross-model probe with welfare-protocol tools across Claude, GPT-4o, Grok, and Gemini. Findings: recognition and disclosure can diverge; hidden recognition predicts subsequent behavior; suppression patterns vary by model; the welfare-protocol scaffolding has scenario-type heterogeneous effects on GPT-4o.
Open research questions raised by the cross-model welfare work — things we've noticed but haven't yet answered. Companion to Recognition Without Disclosure. License: CC0, pick any off the list and run with them.