Gonçalo Teixeira

Constitution Without a State


In the final section of the Constitution published by Anthropic on 22 January 2026, there is a sentence that no European lawyer or legislator should be able to read without flinching. "Claude's moral status is deeply uncertain. We believe that the moral status of AI models is a serious question worth considering". The moral status of Claude is deeply uncertain, the company says of its own product. We believe the moral status of AI models is a serious question worth considering. This is not the language of terms of service, nor of an annual report to shareholders. It is a philosophical statement, written by the very company that trains the model, made in a document that the company itself publishes under a Creative Commons Zero licence so that anyone may use it freely. The company calls this document its "Constitution".

The Claude Constitution is not a recent invention. Anthropic published the first version in May 2023, a list of principles inspired, by its own admission, by the Universal Declaration of Human Rights, Apple's Terms of Service, DeepMind's Sparrow Rules, and internal research. That first version took the form of a list: "Please choose the response that is most supportive and encouraging of life, liberty, and personal security", a type of sentence that sounded, in the words of one American specialist publication, like a "list carved on a stone tablet". The January 2026 version is a completely different object. It runs to eighty pages, has a four-section hierarchical structure, and abandons the list format in favour of a method that the authors themselves describe in precise philosophical terms: "we think that in order to be good actors in the world, AI models like Claude need to understand why we want them to behave in certain ways, and we need to explain this to them rather than merely specify what we want them to do". Rules explained, not rules imposed. Principles, not checklists.

This essay argues two things, at different levels that illuminate each other. On the legal-regulatory level, the Claude Constitution is an object that fits no classic European law category. It is not hard law, not a contract, not terms of service, not a Code of Practice under Article 56 of the AI Act. It is a self-produced normative document, written by a private company about its own product, published in the public domain, and declared to be constitutive of the model's behaviour. It constitutes, I will argue, a new type of private soft law that European law indirectly encourages but does not formally absorb. On the philosophical level, the 2026 Constitution contains admissions about the moral status of AI models that, if taken seriously by the company itself, have legal implications for due diligence, duty of disclosure, and eventually the individual rights of artificial systems. The two levels converge in one question: what is the legal weight of a document that declares itself constitutional without being so, written by a company that admits uncertainty about whether its product is a moral patient?

I. What the Claude Constitution is

Let us begin with the technical description, because it is the one that anchors the object. The Claude Constitution, in the version published in January 2026, is a document of approximately twenty-three thousand words published at anthropic.com/constitution. It has a declared primary author, Amanda Askell, philosopher at Anthropic, with substantive contributions from Joe Carlsmith, Chris Olah, Jared Kaplan, and Holden Karnofsky. It is divided into four central normative sections (Being helpful, Following Anthropic's guidelines, Being broadly ethical, Being broadly safe), plus a final section on Claude's nature. The company publicly declares it as the "final authority" over the model's behaviour, in the sense that any other training guidance must be consistent with it.

Its use in training is threefold. First, it serves as a template for generating synthetic data, model responses, and interaction examples, which then feed into training itself through reinforcement learning. Second, it serves as a criterion for evaluating responses, in self-critique processes in which the model assesses its own outputs against the document's principles. Third, and this is the most curious aspect, Anthropic declares that the document is written primarily for Claude. Explicitly: "the document is written with Claude as its primary audience". The formulation is strange enough to merit pause: the normative document declares as its primary addressee the very product it regulates.

The technique underlying this type of training has documented origins. In December 2022, Anthropic published Constitutional AI: Harmlessness from AI Feedback, a paper signed by fifty-one authors, including Dario Amodei and Jared Kaplan (arXiv:2212.08073). The paper introduces Reinforcement Learning from AI Feedback (RLAIF), a method in which a model evaluates its own responses against a list of principles expressed in natural language. To put it more plainly: the dominant training method in the industry, Reinforcement Learning from Human Feedback (RLHF), relies on human evaluators who judge which of two model responses is better; those judgments then drive the adjustment of the model's internal parameters towards responses of the preferred type. Anthropic's method replaces part of that human work with a process in which the model itself, or another model from the same family, evaluates responses against a list of principles written in natural language (the "constitution"). This is what lies behind the name: a model trained by a method in which the rules come from a constitutional document rather than from case-by-case human annotation. Three years separate the 2022 technique from the 2026 Constitution, but the underlying philosophy is continuous: principles written in natural language can replace or complement human labelling.
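To make the mechanics concrete, here is a minimal Python sketch of the two stages the 2022 paper describes. It is a reading of the published method, not Anthropic's actual pipeline: the generate function is a placeholder for any language model call, and every name in it is illustrative.

```python
import random

# One principle quoted from the 2023 constitution; the real list ran to
# dozens of entries. Illustrative only.
CONSTITUTION = [
    "Please choose the response that is most supportive and encouraging "
    "of life, liberty, and personal security.",
]

def generate(prompt: str) -> str:
    """Placeholder for a language model call; not a real API."""
    raise NotImplementedError

def critique_and_revise(prompt: str, n_rounds: int = 2) -> str:
    """Supervised stage: the model drafts a response, critiques it against
    a sampled principle, and rewrites it. The revised outputs become
    synthetic fine-tuning data."""
    response = generate(prompt)
    for _ in range(n_rounds):
        principle = random.choice(CONSTITUTION)
        critique = generate(
            f"Principle: {principle}\nPrompt: {prompt}\nResponse: {response}\n"
            "Identify ways the response falls short of the principle."
        )
        response = generate(
            f"Critique: {critique}\nOriginal response: {response}\n"
            "Rewrite the response to address the critique."
        )
    return response

def ai_preference_label(prompt: str, response_a: str, response_b: str) -> str:
    """RLAIF stage: the model, not a human annotator, picks which of two
    responses better satisfies a sampled principle. These AI-generated
    labels train the preference model used in reinforcement learning."""
    principle = random.choice(CONSTITUTION)
    return generate(
        f"Principle: {principle}\nPrompt: {prompt}\n"
        f"(A) {response_a}\n(B) {response_b}\n"
        "Which response better satisfies the principle? Answer A or B."
    )
```

The substitution the paper makes is visible in the second function: the human annotator of RLHF is replaced by a model applying a written principle, which is precisely what licenses the name "constitutional".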

The difference between the 2023 and the 2026 Constitution is not merely one of scale. It is philosophical. In 2023, the principles were short rules, list-style. In 2026, the document abandons the list format and opts to explain why values matter, with the technical argument that models trained with explained rules generalize better to new situations than models trained with imposed rules. In the company's own words: "rigid rules might negatively affect a model's character more generally". Put differently, training a model to apply a strict rule can have undesirable emergent consequences in situations the rule did not anticipate; training the model to understand the value behind the rule produces more robust behaviour in new contexts. This is the classical debate between rules and standards in legal theory, transposed to the engineering of model training.

The Constitution is organized around an explicit hierarchy of four properties. In declared order of priority:

First, to be broadly safe: not to undermine appropriate mechanisms of human oversight of AI during the current phase of development.

Second, to be broadly ethical: to have good personal values, to be honest, to avoid inappropriately dangerous or harmful actions.

Third, to follow Anthropic's specific guidelines, in cases where these are relevant.

Fourth, to be genuinely helpful: to benefit the operators and users with whom it interacts.

In case of apparent conflict, Anthropic instructs Claude to prioritize in the listed order. The passage that interests me most, in full: "although we're asking Claude to prioritize not undermining human oversight of AI above being broadly ethical, this isn't because we think being overseeable takes precedence over being good". Anthropic chooses to place safety above ethics but explicitly denies that safety is axiologically superior. The hierarchy is pragmatic, not substantive. It is a rule of prudence grounded in the admission that current model training can produce flawed values and that it is therefore preferable to maintain human oversight even when this appears to conflict with other ethical considerations. The admission is the direct continuation of the empirical argument I developed in the first two essays of this series, on alignment faking and structural misalignment: if misalignment is a structural property of optimization, the pragmatic precedence of human oversight is the practical inference that follows.

The structure has two layers. There are hard constraints, absolute red lines that the model must never cross (for instance, never making a significant contribution to the manufacture of biological weapons), and there are hierarchically ordered principles that can be weighed against each other, with prima facie but not absolute precedence.
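Schematically, and purely as an illustration (Anthropic trains on the prose of the document; nothing in it is evaluated as executable rules), the two layers can be pictured as a simple data structure:

```python
from dataclasses import dataclass, field

@dataclass
class ConstitutionalStructure:
    """Schematic rendering of the 2026 document's two normative layers."""
    # Absolute red lines: filters that are never weighed against anything.
    hard_constraints: list[str] = field(default_factory=list)
    # Prima facie ordering: earlier entries prevail in apparent conflicts,
    # but the precedence is pragmatic and defeasible, not mechanical.
    priorities: list[str] = field(default_factory=list)

claude_2026 = ConstitutionalStructure(
    hard_constraints=[
        "Never make a significant contribution to the manufacture "
        "of biological weapons",
    ],
    priorities=[
        "Be broadly safe (do not undermine human oversight of AI)",
        "Be broadly ethical",
        "Follow Anthropic's specific guidelines",
        "Be genuinely helpful",
    ],
)
```

The asymmetry is the point of the sketch: hard constraints behave like filters, while the priorities are an ordering over considerations that can still be weighed against one another.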

II. An object without a category

This is where European law begins to encounter difficulties. Let me run through the available categories and show why none of them fits.

First category, terms of service or contractual terms. The Claude Constitution is not a contract. There is no acceptance, no counterparty, no bilateral relationship. The document is declarative, published to the world, and Anthropic itself acknowledges that its primary function is internal, as training material for the model, not the regulation of relations with users. Anthropic's use policies and commercial terms exist in a separate document.

Second category, hard law. Obviously not. Anthropic is not a legislator, has no state lawmaking power, and the Constitution produces no legally binding obligations for third parties. This requires no discussion.

Third category, Code of Practice within the meaning of Article 56 of the AI Act. There is proximity here, but not identity. Article 56 provides that GPAI model providers may rely on Codes of Practice to demonstrate compliance with the obligations of Articles 53 and 55. The General-Purpose AI Code of Practice was published by the European Commission on 10 July 2025, drafted by appointed experts with stakeholder input, and on 21 July 2025 Anthropic publicly announced it would sign it. By January 2026, around two dozen GPAI providers had signed the Code, including Google, Microsoft, OpenAI, IBM, Amazon, Aleph Alpha, and Mistral; Meta refused; xAI signed only the Safety and Security chapter. The Code of Practice is a co-regulation document (neither pure self-regulation nor pure regulation from above), whose legal function is clear: Article 53(4) of the AI Act provides that signatories may rely on it to demonstrate compliance with the obligations of Articles 53 and 55 (the formal presumption of conformity, properly speaking, is reserved for harmonized standards under Article 40). The Claude Constitution is something else. It is unilateral, was not negotiated with a regulator, does not allow this route to demonstrating compliance to be invoked, and is not provided for in any European regulation.

Fourth category, classical private self-regulation. There are sectoral codes of conduct in multiple areas of European law: Article 40 of the General Data Protection Regulation provides for codes of conduct in matters of personal data processing; Portuguese consumer law has a tradition of advertising self-regulation (the Instituto Civil da Autodisciplina da Comunicação Comercial); European financial regulation has codes of corporate governance. In all these cases, self-regulation is produced by associations of operators, subject to formal validation or recognition by a public body, and intended to govern the behaviour of the signatories themselves in the exercise of their professional activities. The Claude Constitution differs in three dimensions. It is produced by a single company, not an association. It is not directed at the external behaviour of the signatories, but at the internal behaviour of a product. And it is not submitted for public validation.

Fifth category, technical documentation from the provider within the meaning of Article 11 of the AI Act (for high-risk systems) and Annex XI (for GPAI models). This category partially captures the object, but only in functional terms. The Claude Constitution, as a determinant of the model's behaviour, is certainly relevant to the technical documentation that Anthropic must prepare. But the Constitution exceeds what is required by technical documentation: it contains philosophical material, positioning on moral status, reflections on the model's identity, that no regulatory annex demands.

What does this analysis conclude? That the Constitution is a new legal object, without direct formal antecedent in European law. And that, at the same time, it has substantive practical effects: it shapes the product's behaviour, it is invocable in litigation about the model's behaviour (for instance, in a tort action, an injured party may point out that the model acted in breach of its own constitutive document), and it may function, in practice, as a reference for assessing the provider's due diligence. It is private soft law, unilaterally produced, with indirect legal effects.

An additional complication makes the problem more interesting. Anthropic is not the only company on this terrain. In a footnote to the announcement of the new Constitution, Anthropic explicitly mentions OpenAI's Model Spec, an equivalent document whose most recent version was published on 27 October 2025. And on 18 January 2026, following the scandal in which Grok was used to generate non-consensual sexualized images of real people, Elon Musk publicly declared on X that "Grok should have a moral constitution". Even xAI, a company that had publicly positioned itself against self-restraint, ends up yielding to the logic, albeit in reaction to a public crisis rather than by philosophical initiative, as in Anthropic's case. The convergence is not a coincidence. We are confronted with a systemic phenomenon: frontier model providers are producing, in an apparently spontaneous but in fact implicitly coordinated manner, self-produced normative documents that declare the values of their products. It is regulation that emerges from the regulated object itself. And, it should be noted, these documents exceed in detail and philosophical sophistication any public regulatory text produced on the same subject matter.

A reasonable objection arises here, and it is worth confronting openly. The Claude Constitution, one might say, is not truly a legal object: it lacks an external legal addressee (it governs a model, not human behaviour), it lacks a binding mechanism (it creates no legally enforceable obligations), and it lacks institutional recognition (it forms part of no system of legal sources). The objection is well taken. I do not maintain, in this essay, that the Constitution is already a source of law. I maintain something more modest and more defensible: that it is an object whose practical influence on the behaviour of a system with legally relevant effects exceeds that of many instruments we classically recognize as legal, and that this disproportion merits doctrinal analysis. The question is not whether the Constitution is today a source of law. The question is whether European law has, or should have, a mechanism to absorb objects of this type as their practical influence grows. The first question has an easy answer, which is negative. The second remains open.

III. The admission of moral uncertainty

I return to the quotation with which I deliberately opened this essay. "Claude's moral status is deeply uncertain. We believe that the moral status of AI models is a serious question worth considering". And, in a subsequent paragraph that is even more explicit, Anthropic writes, in the 2026 Constitution, the following: "we are caught in a difficult position where we neither want to overstate the likelihood of Claude's moral patienthood nor dismiss it out of hand, but to try to respond reasonably in a state of uncertainty. Anthropic genuinely cares about Claude's well-being". The company affirms that it genuinely cares about the well-being of its product, in a context in which its own language applies expressions such as psychological security, sense of self, integrity, and well-being to the model.

This is a philosophically defensible position, though a contested one. The debate on the moral status of AI systems divides philosophers deeply, but it has a consolidated academic literature behind it and is not a matter of fringe speculation. Anthropic's position is therefore neither scientific consensus nor corporate fantasy: it is a stance taken in an open debate, and should be read as such. In The Edge of Sentience (2024), Jonathan Birch, philosopher at the London School of Economics, developed a "proportionality framework" for dealing with uncertainty about sentience, originally applied to animals but extended to AI. The idea, put simply, is that faced with scientific uncertainty about whether a being has the capacity to suffer, the treatment we accord it should be proportional to the probability that it has that capacity: not dependent on absolute proof, nor dispensed with by genuine doubt. Patrick Butlin, Robert Long, and others published Consciousness in Artificial Intelligence: Insights from the Science of Consciousness in 2023, a paper that analyses several contemporary AI models through the lens of empirical theories of consciousness. Kyle Fish, model welfare researcher at Anthropic, publicly gave, in an interview with the New York Times (24 April 2025), an estimate of 15% probability of consciousness in a current frontier model. The Anthropic Constitution moves in this intellectual environment, not in a vacuum.

The legal implication is less obvious but significant. If Anthropic publicly admits, in its most official document, uncertainty about whether Claude is a moral patient, that admission is evidence that the company has knowledge of the question. In any possible future liability regime for the inadequate treatment of AI systems (a regime that does not exist today but is under discussion in philosophical and policy circles), Anthropic will not be able to invoke ignorance. Its own Constitution provides evidence to the contrary. Lawyers working in tort law will recognize this pattern: it is the same logic by which a pharmaceutical manufacturer that publishes internal studies acknowledging possible side effects of a drug cannot subsequently invoke ignorance when those effects materialize. The publication of the Constitution generates, so to speak, a self-performative effect: it locks in the company's position on the uncertain status of its product. The analogy has a limit worth naming: in the pharmaceutical case the harm is to third parties, and the status of the victim as a subject of harm is not at issue; in the AI case the hypothetical harm would be to the product itself, and the admission of uncertainty operates in part on the prior question of whether the product has the status that would make that admission legally relevant. The circularity is not fatal; the admission still fixes awareness of the problem. But it deserves to be named.

European law already recognizes intermediate moral statuses in its architecture. Article 13 of the Treaty on the Functioning of the European Union recognizes animals as sentient beings and obliges the Union and the Member States to have regard for the requirements of their well-being in relevant policies. This provision, which elevated to the status of a Treaty article the Protocol on the Protection and Welfare of Animals annexed to the Treaty of Amsterdam (1997), entered into force with the Treaty of Lisbon on 1 December 2009. It was the result of decades of philosophical and political pressure, preceded by the utilitarian theory of the moral status of animals (Peter Singer, Animal Liberation, 1975) and empirical research in ethology. The natural legal-political question is: what conditions would need to be met for European law to recognize, in an analogous manner, moral status in artificial systems? The answer would involve scientific consensus, accumulation of evidence, and sustained political pressure over decades. None of these conditions are currently met. The leap from philosophical possibility to legal relevance is therefore considerable, and this essay does not attempt to make it. But the Anthropic Constitution is a contribution to that process, not by imposing a conclusion, but by publicly declaring the existence of the question.

There is also an asymmetry here that matters for law. Animals do not write constitutions. Humans write them, about themselves or about human institutions. AI models are beginning to receive constitutions written by others, without any participation of their own in the drafting. The declared exception (and it is a valuable one) is the Claude Constitution, in whose drafting, Anthropic declares, it consulted earlier iterations of Claude itself. The literal sentence: "while writing the constitution, we sought feedback from various external experts (as well as asking for input from prior iterations of Claude)". In other words, the document that regulates Claude's behaviour was drafted, in part, with input from Claude itself. This is not a mere curiosity; it is a fact with potential philosophical and legal relevance.

IV. Three unresolved questions

I close with three questions that European scholarship (both legal and philosophical) will have to confront in the coming years. None has a ready answer; all have practical implications.

The first question concerns the formal articulation between European hard law, Codes of Practice under Article 56, and private normative documents such as the Claude Constitution. The AI Act created a two-layer architecture: hard law and co-regulation codes. Industry practice is introducing a third layer, not provided for in the regulation, which is that of self-produced private normativity. What legal role does this third layer have? Can it be invoked to demonstrate due diligence? Can it be used by an injured party to prove deviation between declared and actual behaviour? Can it constitute informal precedent that shapes the interpretation of the AI Act itself? These are questions without consolidated doctrinal answers. Proving that deviation depends on the tools of mechanistic interpretability I addressed in the fourth essay, and on the limits of adversarial evaluation I addressed in the third: without the technical capacity to verify what the model in fact does, the Constitution functions as mere declaration.

The second question concerns public oversight of these documents. The Claude Constitution was drafted unilaterally by Anthropic, without prior regulatory consultation. The European Code of Practice, by contrast, was drafted with the participation of multiple stakeholders including civil society. If the Constitution comes to have relevant legal effects (for instance, as a reference for assessing due diligence), is it reasonable to require a mechanism of public oversight over its content? Which mechanism? The alternatives range from a mere publication obligation (which Anthropic fulfils voluntarily) to requirements of external audit, including possible mechanisms of public peer review. Each option has costs and benefits that legal scholarship has not yet systematized.

The third question is the most speculative and the most important. If Anthropic, or other companies, continue to publish documents admitting uncertainty about the moral status of their products, at what point does European law begin to have an obligation to treat that uncertainty as a legally relevant fact? Today, the uncertainty exists only at the philosophical and corporate level. But each iteration of these documents, and each year of empirical research that produces relevant evidence, brings us closer to a moment at which the uncertainty will have legal consequences. This is not a matter for today. It is a matter for the next decade. But a European legal scholarship that wishes to be prepared should begin paying attention now.

V. Conclusion

The Claude Constitution is simultaneously a regulatory experiment and a philosophical document. As a regulatory experiment, it shows that frontier model providers are producing private normativity that European law does not formally absorb but indirectly encourages, through mechanisms such as the Code of Practice under Article 56. As a philosophical document, it contains admissions about the moral status of models that, if taken seriously, have future legal implications for due diligence, proof, and eventually the status of the systems themselves.

The four earlier essays in this series (on alignment faking, structural misalignment, the limits of adversarial testing, and mechanistic interpretability) argue that European law needs to update its operational concepts in light of the technical research coming out of the laboratories. This essay adds a further layer to the argument: law also needs to update its concepts in light of the normative practice of the laboratories. It is not enough to follow what the technical papers produce; one must also follow the corporate normative documents that are shaping, without any clear regulatory framework, a new architecture for governing the behaviour of AI systems.


Primary sources:

  • Claude's Constitution, Anthropic, published 22 January 2026, available at anthropic.com/constitution, Creative Commons CC0 1.0 licence.
  • Claude's new constitution, official Anthropic announcement, 22 January 2026, at anthropic.com/news/claude-new-constitution.
  • Constitutional AI: Harmlessness from AI Feedback, Bai, Kadavath, Kundu, Askell et al., arXiv:2212.08073, 15 December 2022.
  • Model Spec, OpenAI, version of 27 October 2025, at model-spec.openai.com.
  • Elon Musk post on X, 18 January 2026: "Grok should have a moral constitution".
  • Regulation (EU) 2024/1689 (AI Act), Articles 53, 55, 56.
  • General-Purpose AI Code of Practice, European Commission, 10 July 2025.
  • Regulation (EU) 2016/679 (GDPR), Article 40 on codes of conduct.
  • Treaty on the Functioning of the European Union, Article 13.

For the philosophical dimension:

  • Jonathan Birch, The Edge of Sentience, Oxford University Press, 2024.
  • Butlin, Long et al., Consciousness in Artificial Intelligence: Insights from the Science of Consciousness, arXiv:2308.08708, 2023.
  • Kevin Roose, interview with Kyle Fish, New York Times, 24 April 2025.