Transformers with RL or SFT Provably Learn Sparse Boolean Functions, But Differently - Bochen Lyu, Yiyang Jia, Xiaohao Cai, Zhanxing Zhu