<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Reinforcement-Learning on Marc G. Bellemare</title><link>https://marcgbellemare.info/en/tags/reinforcement-learning/</link><description>Recent content in Reinforcement-Learning on Marc G. Bellemare</description><generator>Hugo</generator><language>en</language><copyright>© {year} Marc G. Bellemare</copyright><lastBuildDate>Sun, 03 Feb 2019 00:00:00 +0000</lastBuildDate><atom:link href="https://marcgbellemare.info/en/tags/reinforcement-learning/index.xml" rel="self" type="application/rss+xml"/><item><title>A Cooperative Benchmark: Announcing the Hanabi Learning Environment</title><link>https://marcgbellemare.info/en/blog/2019/hanabi-learning-environment/</link><pubDate>Sun, 03 Feb 2019 00:00:00 +0000</pubDate><guid>https://marcgbellemare.info/en/blog/2019/hanabi-learning-environment/</guid><description>&lt;p&gt;Today, as part of a DeepMind / Google Brain team collaboration, we&amp;rsquo;re releasing the
Hanabi Learning Environment (&lt;a href="https://github.com/deepmind/hanabi-learning-environment"&gt;code&lt;/a&gt;
and paper), a research platform for multiagent learning and emergent communication
based on the popular card game &lt;a href="https://boardgamegeek.com/boardgame/98778/hanabi/"&gt;Hanabi&lt;/a&gt;.
The HLE provides an interface for AI agents to play the game, and comes packaged with
a learning agent based on the &lt;a href="https://github.com/google/dopamine"&gt;Dopamine framework&lt;/a&gt;.
The platform&amp;rsquo;s name echoes that of the highly successful
&lt;a href="https://github.com/mgbellemare/Arcade-Learning-Environment"&gt;Arcade Learning Environment&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Hanabi is a two- to five-player cooperative game designed by Antoine Bauza. It was a
revelation at the 2012 Internationale Spieltage in Essen and went on to win the Spiel des
Jahres, the most prestigious prize for board games, in 2013. In Hanabi, players work
together to build five card sequences, each of a different colour. What makes the game
interesting is that players can see their teammates&amp;rsquo; cards, but not their own.
Communication happens in great part through &amp;ldquo;hint&amp;rdquo; moves, where one person tells another
something about their cards so that they know what to play or discard. Because only a
limited number of hints can be given, good players communicate strategically and
make use of conventions, for example &amp;ldquo;discard your oldest card first&amp;rdquo;.&lt;/p&gt;</description></item><item><title>Classic and Modern Reinforcement Learning</title><link>https://marcgbellemare.info/en/blog/2017/classic-and-modern-rl/</link><pubDate>Sun, 10 Dec 2017 00:00:00 +0000</pubDate><guid>https://marcgbellemare.info/en/blog/2017/classic-and-modern-rl/</guid><description>&lt;p&gt;At the Deep Reinforcement Learning Symposium at NIPS this year I had the pleasure of
shaking hands with Dimitri Bertsekas, whose work has been foundational to the
mathematical theory of reinforcement learning. I still turn to
&lt;a href="http://www.athenasc.com/ndpbook.html"&gt;Neuro-Dynamic Programming&lt;/a&gt;
(Bertsekas and Tsitsiklis, 1996) when searching for tools to explain sample-based
algorithms such as TD. Earlier this summer I read parts of
&lt;a href="http://www.athenasc.com/dpbook.html"&gt;Dynamic Programming and Optimal Control: Volume 2&lt;/a&gt;,
which is full of gems, each of which could grow into a full-blown NIPS or ICML paper (try it
yourself). You can imagine my excitement at finally meeting the legend. So when he
candidly asked me, &amp;ldquo;What&amp;rsquo;s deep reinforcement learning, &lt;em&gt;anyway&lt;/em&gt;?&amp;rdquo; it was clear
that &amp;ldquo;Q-learning with neural networks&amp;rdquo; wasn&amp;rsquo;t going to cut it.&lt;/p&gt;</description></item><item><title>Introducing the ALE 0.6</title><link>https://marcgbellemare.info/en/blog/2017/ale-0-6-release/</link><pubDate>Tue, 28 Nov 2017 00:00:00 +0000</pubDate><guid>https://marcgbellemare.info/en/blog/2017/ale-0-6-release/</guid><description>&lt;p&gt;It&amp;rsquo;s been quite a long time since the Arcade Learning Environment saw any significant
changes. Today we&amp;rsquo;re releasing version 0.6, which provides support for two new features:
&lt;em&gt;modes&lt;/em&gt; and &lt;em&gt;difficulties&lt;/em&gt;. As it turns out, there are more buttons on the Atari 2600
console than the ALE lets you play with. The &lt;strong&gt;select&lt;/strong&gt; button is typically used to
determine which of the many game modes to play. For example, there are a total of 8 modes
in Freeway, including different car formations, trucks, and varying vehicle speeds.
Here are modes 0 to 2, plus two modes from Space Invaders:&lt;/p&gt;</description></item></channel></rss>