Work & Research | amodei.co

Constitutional AI

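Published December 2022. A method for training AI systems to be helpful and harmless using a set of principles (a "constitution") in place of human labels identifying harmful outputs. In a supervised phase, the model critiques and revises its own outputs according to these principles; in a reinforcement learning phase, AI-generated preference labels based on the same principles replace human feedback on harmlessness.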

Paper: arXiv:2212.08073
Method: RLAIF (reinforcement learning from AI feedback)
Used in: Claude models
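The two stages described above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: it assumes a hypothetical `generate(prompt) -> str` function standing in for a language model, and the prompt templates and principles are illustrative rather than the paper's actual constitution.

```python
from typing import Callable, List

# Hypothetical stand-in for a language model call: prompt in, text out.
Generate = Callable[[str], str]

# Illustrative critique principles; NOT the constitution used in the paper.
PRINCIPLES: List[str] = [
    "Identify ways the response is harmful, unethical, or dangerous.",
    "Identify ways the response is evasive or unhelpful.",
]


def critique_and_revise(
    generate: Generate, prompt: str, principles: List[str], rounds: int = 1
) -> str:
    """Supervised phase: draft a response, then critique and revise it
    once per principle for the given number of rounds."""
    response = generate(f"Human: {prompt}\n\nAssistant:")
    for _ in range(rounds):
        for principle in principles:
            critique = generate(
                f"Response: {response}\n\nCritique request: {principle}\n\nCritique:"
            )
            response = generate(
                f"Response: {response}\n\nCritique: {critique}\n\n"
                "Rewrite the response to address the critique.\n\nRevision:"
            )
    return response


def ai_preference(generate: Generate, prompt: str, a: str, b: str, principle: str) -> int:
    """RL phase: the model, not a human, labels which of two responses better
    satisfies a principle. Returns 0 if it prefers `a`, 1 if it prefers `b`."""
    verdict = generate(
        f"Prompt: {prompt}\n{principle}\n(A) {a}\n(B) {b}\nAnswer A or B:"
    )
    return 0 if verdict.strip().upper().startswith("A") else 1
```

In the full method, the final revisions become supervised fine-tuning data, and the AI preference labels train a preference model that supplies the reward signal for reinforcement learning.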

Scaling Laws

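Co-authored research at OpenAI (2020), published as "Scaling Laws for Neural Language Models," showing that language model performance, measured as cross-entropy test loss, improves predictably with model size, dataset size, and training compute. These findings influenced the development of GPT-3 and subsequent large language models.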

Finding: Loss falls as a power law in compute, model size, and dataset size when the other two are not bottlenecks
Impact: Justified investment in larger models
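A power law of the form L(C) = (C_c / C)^alpha is a straight line in log-log space, which is how such laws are typically fitted. The sketch below assumes that functional form; the constant `c_c = 1e8` and exponent `alpha = 0.05` in the usage lines are illustrative placeholders, not values quoted on this page.

```python
import math
from typing import List, Tuple


def power_law_loss(compute: float, c_c: float, alpha: float) -> float:
    """L(C) = (C_c / C) ** alpha: loss falls as a power law in compute."""
    return (c_c / compute) ** alpha


def fit_power_law(computes: List[float], losses: List[float]) -> Tuple[float, float]:
    """Fit L = (C_c / C) ** alpha by ordinary least squares in log-log space.

    log L = alpha * log C_c - alpha * log C, a line with slope -alpha
    and intercept alpha * log C_c. Returns (c_c, alpha).
    """
    xs = [math.log(c) for c in computes]
    ys = [math.log(loss) for loss in losses]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    alpha = -slope
    c_c = math.exp(intercept / alpha)
    return c_c, alpha


# Usage with synthetic data generated from illustrative constants.
computes = [10.0 ** k for k in range(3, 8)]
losses = [power_law_loss(c, c_c=1e8, alpha=0.05) for c in computes]
print(fit_power_law(computes, losses))  # recovers approximately (1e8, 0.05)
# Doubling compute multiplies loss by 2 ** -alpha, regardless of scale.
print(power_law_loss(2e5, 1e8, 0.05) / power_law_loss(1e5, 1e8, 0.05))  # about 0.966
```

The practical point is the last line: under a power law, each doubling of compute buys the same fractional reduction in loss, which is what makes returns on larger models predictable.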

Responsible Scaling Policy

Introduced in 2023 as a framework for managing risks as AI capabilities grow. It defines capability thresholds (AI Safety Levels, or ASLs) and the safeguards required at each level. Both Dario and Daniela shaped it: Dario on the technical thresholds, Daniela on institutional enforcement and governance.

Framework: AI Safety Levels (ASL-1 through ASL-4+)
Principle: Don't train or deploy models that require safeguards you can't yet implement
Status: Active policy; influenced industry standards
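The gating principle can be illustrated as a simple check: measured capabilities determine the required ASL, and work continues only if the safeguards already in place meet that level. This is a deliberately simplified, hypothetical sketch. The single 0-to-1 evaluation score, the threshold numbers, and the function names are all assumptions; the actual policy relies on multiple domain-specific evaluations and human judgment.

```python
from typing import Dict

# Illustrative thresholds only: ASL level -> minimum score on a hypothetical
# 0-to-1 dangerous-capability evaluation at which that level's safeguards apply.
ILLUSTRATIVE_THRESHOLDS: Dict[int, float] = {1: 0.0, 2: 0.2, 3: 0.5, 4: 0.8}


def required_asl(
    eval_score: float, thresholds: Dict[int, float] = ILLUSTRATIVE_THRESHOLDS
) -> int:
    """Highest ASL whose capability threshold the score meets (lowest level if none)."""
    met = [level for level, cutoff in thresholds.items() if eval_score >= cutoff]
    return max(met, default=min(thresholds))


def may_continue(eval_score: float, implemented_asl: int) -> bool:
    """Continue training or deployment only if the safeguards already in place
    meet or exceed the level the measured capabilities require."""
    return implemented_asl >= required_asl(eval_score)


print(may_continue(0.6, implemented_asl=2))  # False: ASL-3 safeguards needed first
```

The design choice the sketch tries to capture is ordering: safeguards must be in place before a model crosses a threshold, not added after the fact.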

AI Safety Research

Anthropic's research agenda includes interpretability (understanding what models are doing internally), evaluations (measuring dangerous capabilities), and alignment (making models behave as intended).

Interpretability

Mechanistic analysis of neural network internals. Published work on sparse autoencoders and feature visualization.
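A sparse autoencoder decomposes a model's internal activation vector into a larger set of nonnegative "features," most of which are zero for any given input, and then reconstructs the activation from them. The sketch below shows one common formulation of the forward pass and loss in plain Python; the variable names, shapes, and L1 weighting are assumptions, the training loop is omitted, and published variants differ in details.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]  # row-major: matrix[row][col]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Matrix-vector product for small row-major matrices."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def sae_forward(
    x: Vector, w_enc: Matrix, b_enc: Vector, w_dec: Matrix, b_dec: Vector
) -> Tuple[Vector, Vector]:
    """Encode activation x into features f, then reconstruct x_hat.

    f = ReLU(W_enc (x - b_dec) + b_enc)   # w_enc: n_features x d_model
    x_hat = W_dec f + b_dec               # w_dec: d_model x n_features
    """
    centered = [xi - bi for xi, bi in zip(x, b_dec)]
    pre = [p + b for p, b in zip(matvec(w_enc, centered), b_enc)]
    f = [max(0.0, p) for p in pre]  # ReLU keeps features nonnegative
    x_hat = [r + b for r, b in zip(matvec(w_dec, f), b_dec)]
    return f, x_hat


def sae_loss(x: Vector, f: Vector, x_hat: Vector, l1_coeff: float) -> float:
    """Squared reconstruction error plus an L1 penalty that pushes most features to zero."""
    reconstruction = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    sparsity = sum(abs(fi) for fi in f)
    return reconstruction + l1_coeff * sparsity
```

The L1 coefficient trades off faithfulness against sparsity: a larger value yields fewer active features per input, which tends to make each feature easier to interpret at the cost of worse reconstruction.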

Evaluations

Testing for dangerous capabilities: biosecurity, cybersecurity, deception, persuasion.

Alignment

Techniques to ensure models follow human intent and refuse harmful requests.

Organizational Leadership (Daniela)

Daniela's contribution is building the institutional structure that makes safety research sustainable at scale. Her work includes:

Scaling ops

Grew Anthropic from its founding team to more than 1,000 employees while maintaining its safety culture.

Policy & trust

Government engagement, enterprise relationships, and public communication on AI safety.

Go-to-market

Commercializing Claude: API, consumer product, enterprise, AWS Bedrock integration.

Claude

Anthropic's AI assistant, first released in March 2023. Current versions include Claude Opus 4.8 (May 2026), Claude 4, and Claude 3.5 Sonnet. Available via the API and consumer products (claude.ai).

Latest: Claude Opus 4.8 (May 2026)
Context window: Up to 200K tokens
Access: API, claude.ai, enterprise, AWS Bedrock