In recent years, quantitative investment methods combined with artificial intelligence have attracted more and more attention from investors and researchers. Existing related methods based on the supervised learning are not very suitable for learning problems with long-term goals and delayed rewards in real futures trading. In this paper, therefore, we model the price prediction problem as a Markov decision process (MDP), and optimize it by reinforcement learning with expert trajectory. In the p...