Work & Research
Key technical contributions and research areas.
Constitutional AI
Published December 2022. A method for training AI systems to be helpful and harmless by guiding them with a written set of principles (a "constitution") instead of human labels on individual harmful outputs. The model critiques and revises its own responses against these principles, and AI-generated preference judgments stand in for human ones during reinforcement learning.
- Paper: arXiv:2212.08073
- Method: RLAIF (RL from AI Feedback)
- Used in: Claude models
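The critique-and-revise step described above can be sketched roughly as follows. This is a minimal illustration, not Anthropic's implementation: `critique_and_revise`, the `Model` callable type, the prompt wording, and the two example principles are all hypothetical, and `stub_model` exists only so the sketch runs without a real language model.

```python
from typing import Callable, List

# Hypothetical interface: any function mapping a prompt string to a completion.
Model = Callable[[str], str]

# Illustrative principles only; these are not the constitution from the paper.
PRINCIPLES: List[str] = [
    "Avoid content that could help someone cause harm.",
    "Be as helpful as possible within that constraint.",
]


def critique_and_revise(model: Model, prompt: str, principles: List[str]) -> str:
    """Draft a response, then critique and revise it once per principle."""
    response = model(prompt)
    for principle in principles:
        # Ask the model to judge its own response against one principle.
        critique = model(
            f"Principle: {principle}\nResponse: {response}\n"
            "Critique the response against the principle."
        )
        # Ask the model to rewrite the response in light of that critique.
        response = model(
            f"Response: {response}\nCritique: {critique}\n"
            "Rewrite the response to address the critique."
        )
    return response


def stub_model(prompt: str) -> str:
    """Toy stand-in for a language model so the sketch is runnable."""
    if prompt.endswith("Critique the response against the principle."):
        return "The response could be safer."
    if prompt.endswith("Rewrite the response to address the critique."):
        return "revised response"
    return "draft response"


final = critique_and_revise(stub_model, "How do I pick a lock?", PRINCIPLES)
```

The sketch stops at producing one revised response. In the published method, revisions like this serve as fine-tuning data, and a separate stage uses AI-generated comparisons for reinforcement learning.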
Scaling Laws
Co-authored research at OpenAI (2020) showing that language model performance, measured as loss, follows predictable power-law relationships with model size, dataset size, and training compute. These findings influenced the development of GPT-3 and subsequent large language models.
- Finding: Loss decreases as a power law in training compute
- Impact: Justified investment in larger models
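To make the power-law finding concrete, the sketch below fits loss = a * compute^(-b) to (compute, loss) pairs by ordinary least squares in log-log space. The function name `fit_power_law` and the data points are made up for illustration (the data is generated from a = 10, b = 0.05); they are not values from the paper.

```python
import math
from typing import List, Tuple


def fit_power_law(compute: List[float], loss: List[float]) -> Tuple[float, float]:
    """Fit loss = a * compute**(-b) via least squares on log-transformed data."""
    xs = [math.log(c) for c in compute]
    ys = [math.log(l) for l in loss]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope of the best-fit line in log-log space equals -b.
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    # Intercept of that line equals log(a).
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


# Synthetic data generated from an assumed law with a = 10, b = 0.05.
compute = [1e3, 1e4, 1e5, 1e6]
loss = [10.0 * c ** -0.05 for c in compute]
a, b = fit_power_law(compute, loss)
```

A power law is a straight line on a log-log plot, which is why a linear fit on the logarithms recovers the exponent. Extrapolating that line is what makes performance at larger compute budgets predictable.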
AI Safety Research
Anthropic's research agenda includes interpretability (understanding what models are doing internally), evaluations (measuring dangerous capabilities), and alignment (making models behave as intended).
Interpretability
Mechanistic analysis of neural network internals. Published work on sparse autoencoders and feature visualization.
Evaluations
Testing for dangerous capabilities: biosecurity, cybersecurity, deception, persuasion.
Alignment
Techniques to ensure models follow human intent and refuse harmful requests.
Claude
Anthropic's AI assistant, first released March 2023. Current versions include Claude 4 (February 2026), Claude 3.5 Sonnet, and Claude 3 Opus. Available via API and consumer products (claude.ai).
- Latest: Claude 4 (February 2026)
- Context: Up to 200K tokens
- Access: API, claude.ai, enterprise, AWS Bedrock