Stackelberg Learning from Human Feedback: Preference Optimization as a Sequential Game - Barna Pásztor, Thomas Kleine Buening, Andreas Krause