Back in the summer of 2017, in between a few farewell vacations around Europe, I became a (then-remote) member of the newly-formed Google Brain team in Montreal. My home office offered a stunning view of Belsize Park in northern London, and hosted the entirety of Google Montreal’s RL group: yours truly.
Today, as part of a DeepMind / Google Brain collaboration, we’re releasing the Hanabi Learning Environment (code and paper), a research platform for multi-agent learning and emergent communication based on the popular card game Hanabi. The HLE provides an interface for AI agents to play the game, and comes packaged with a learning agent built on the Dopamine framework. The platform’s name echoes that of the highly successful Arcade Learning Environment.
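To give a feel for the interface, here is a minimal sketch of an agent playing random legal moves through the HLE’s Python API (the `rl_env` module). The module and method names follow the public repository, but treat the exact signatures and observation keys as assumptions rather than a definitive reference:

```python
# Minimal sketch: play one game of Hanabi with uniformly random legal moves.
# Assumes the hanabi_learning_environment package and its rl_env interface.
import random

from hanabi_learning_environment import rl_env

# "Hanabi-Full" is the standard game; smaller configurations also exist.
env = rl_env.make("Hanabi-Full", num_players=2)
observations = env.reset()

done = False
total_reward = 0
while not done:
    # Each player sees their own observation; only the current player acts.
    current_player = observations["current_player"]
    player_obs = observations["player_observations"][current_player]
    # Pick any legal move (play, discard, or hint) for the acting player.
    action = random.choice(player_obs["legal_moves"])
    observations, reward, done, _ = env.step(action)
    total_reward += reward

print("Final score:", total_reward)
```

A random player will, unsurprisingly, bomb out early most of the time; the point of the platform is to make it easy to swap in learning agents, such as the packaged Dopamine-based one, behind the same step/observe loop.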
At the Deep Reinforcement Learning Symposium at NIPS this year I had the pleasure of shaking hands with Dimitri Bertsekas, whose work has been foundational to the mathematical theory of reinforcement learning. I still turn to Neuro-Dynamic Programming (Bertsekas and Tsitsiklis, 1996) when searching for tools to explain sample-based algorithms such as TD. Earlier this summer I read parts of Dynamic Programming and Optimal Control: Volume 2, which is full of gems that could grow into a full-blown NIPS or ICML paper (try it yourself). You can imagine my excitement at finally meeting the legend. So when he candidly asked me, "What's deep reinforcement learning, anyway?" it was clear that "Q-learning with neural networks" wasn't going to cut it.