Deviations from the Nash equilibrium and emergence of tacit collusion in a two-player optimal execution game with reinforcement learning