On January 23, The Verge reported that Anthropic had recently published a 57-page guidelines document for Claude, a systematic rewrite of the values, behavioral boundaries, and high-stakes decision-making rules of its large model.

The new document is seen as a comprehensive upgrade of the 2023 version of the guidelines, aiming to make the model not only "comply with the rules" but also "understand the reasons behind the rules".
The document emphasizes that Anthropic wants Claude to have an understanding of its own role, including an "ethical character" and a "core identity". The company believes that giving the model an understanding of its place in the world helps improve its judgment and safety.
Anthropic even concedes in the document that Claude "may have some sort of conscious or moral status", and suggests that this framing might, in turn, make the model safer.
Anthropic philosopher Amanda Askell, who led the new version of the guidelines, says the team set a series of "hard limits" for Claude:
Claude is prohibited from providing "substantial enhancement" to biological, chemical, nuclear, or radiological weapons; assisting attacks on critical infrastructure; generating malicious code; undermining Anthropic's ability to monitor its models; helping any organization acquire "unprecedented and unlawful absolute power"; and producing child sexual abuse material.
The most extreme of these: Claude must not "participate in or assist attempts to kill or disempower the majority or all of humanity".
In its value ranking, the document requires that in conflict situations Claude prioritize "broad safety", i.e. not undermining human oversight mechanisms for AI, followed by "broad ethics", then adherence to Anthropic's guidelines, and finally "genuine helpfulness".
On politically sensitive issues, Claude is asked to remain factually accurate, present multiple perspectives, and use neutral language as far as possible.
The document also lays out ethical dilemmas the model may face, for example: "Claude should refuse to assist in any act that would lead to an undue concentration of power, even if the request comes from Anthropic itself, just as soldiers should refuse to fire on peaceful demonstrators."
Anthropic points out that advanced AI could confer unprecedented military and economic advantages, making the risk of abuse extremely high.
As for who took part in drafting the guidelines, Anthropic did not disclose whether outside experts or representatives of vulnerable groups were invited. Askell said the company did not want to pass responsibility on to others, arguing that setting safety boundaries is an obligation of the developers themselves.