Paper Note - Anthropic Constitutional AI

Constitutional AI: Harmlessness from AI Feedback

Paper Note - Anthropic RLHF

Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback

What Are Human Intent & Human Preference in RLHF?