Learning an Efficient Optimizer via Hybrid-Policy Sub-Trajectory Balance