Representations for Stable Off-Policy Reinforcement Learning