<?xml version="1.0" encoding="utf-8"?><?xml-stylesheet type="text/xsl" href="atom.xsl"?>
<feed xmlns="http://www.w3.org/2005/Atom">
    <id>https://reccobeats.com/blog</id>
    <title>ReccoBeats Blog</title>
    <updated>2019-09-05T00:00:00.000Z</updated>
    <generator>https://github.com/jpmonette/feed</generator>
    <link rel="alternate" href="https://reccobeats.com/blog"/>
    <subtitle>ReccoBeats Blog</subtitle>
    <icon>https://reccobeats.com/img/favicon.ico</icon>
    <entry>
        <title type="html"><![CDATA[Tic-Tac-Toe AI using Deep Q-Learning]]></title>
        <id>https://reccobeats.com/blog/welcome-docusaurus-v2</id>
        <link href="https://reccobeats.com/blog/welcome-docusaurus-v2"/>
        <updated>2019-09-05T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[This is my first post on Docusaurus.]]></summary>
        <content type="html"><![CDATA[<p>On December 5, 2017, DeepMind introduced AlphaZero to the world, a chess-playing program that defeated Stockfish simply by playing against itself for 4 hours. Impressed by AlphaZero, I wrote a simple AI to play Tic-Tac-Toe based on Deep Q-Learning, and today I want to share it with you.</p>
<p>In this article, I will cover Markov Decision Process, Q-Learning, Deep Q-Learning, and the application of Deep Q-learning to Tic-Tac-Toe.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="1-what-is-a-markov-decision-process-the-problem-with-mdp">1. What is a Markov Decision Process? The Problem with MDP<a href="https://reccobeats.com/blog/welcome-docusaurus-v2#1-what-is-a-markov-decision-process-the-problem-with-mdp" class="hash-link" aria-label="Direct link to 1. What is a Markov Decision Process? The Problem with MDP" title="Direct link to 1. What is a Markov Decision Process? The Problem with MDP">​</a></h2>
<p>A Markov Decision Process (MDP) provides a mathematical framework for modeling decision-making in situations where outcomes are partly random and partly under the control of a decision-maker. For a problem to be reduced to an MDP, the states in that problem must satisfy the Markov property: each state depends only on the state preceding it.</p>
<p>The transition from state  to state  has a transition probability defined by a specific formula.</p>
<p>Describing the Tic-Tac-Toe problem as an MDP</p>
<p><strong>Agent:</strong> An agent can be an entity that interacts with the environment by observing and deciding on actions based on those observations. In Tic-Tac-Toe, the agent is the player. The player's task is to win the match; using the symbols 'X' and 'O', 2 players take turns filling in empty cells, and the side that gets 3 in a row wins.</p>
<p><strong>State:</strong> After each action taken by the agent, the environment returns a state for the agent to observe and determine the next action. In Tic-Tac-Toe, I represent the state as an array of length 9, corresponding to the 9 cells in the 3x3 board. Since the machine cannot understand the characters 'X' or 'O', we define empty cells as having a value of 0, cells marked by player X as 1, and O as -1.</p>
<p><strong>Action:</strong> Each cell on the board corresponds to an action numbered from 0 to 8. For cells that have already been marked, that action is considered invalid and ignored.</p>
<p><strong>Reward:</strong> For every action the agent takes, the environment returns a reward. We can view the reward as the environment's assessment of the agent's accuracy in taking that action; throughout the interaction with the environment, the agent's goal is to maximize the total reward. In Tic-Tac-Toe, the reward is +1 when the agent chooses an action that ends the game with a win, -1 for a loss, and 0 otherwise.</p>
<p>For the agent to select an action, we need an <strong>action-value function</strong>, denoted as . The action-value function indicates the total reward when choosing action  at state .  (gamma) is the discount factor; this value characterizes the importance of future rewards—the smaller the value, the less important they are.</p>
<p>Since the agent's goal is to maximize total reward, the remaining task is to find the action with the largest action-value.</p>
<p>Returning to Tic-Tac-Toe, suppose we currently have a state. The next turn belongs to player O. Passing each action in the state through the action-value function, we get specific values. Ignoring occupied cells, we see that the action at cell 4 has the highest value, so the agent will choose action 4. In reality, this move is good, so we can say the policy at this state is optimal.</p>
<p>Thus, we have modeled the problem as an MDP. But how do we create the action-value function? We can create the action-value function by solving the Bellman equation; however, with Tic-Tac-Toe, the number of states is too large for calculation, so the Q-learning algorithm will help us do this.</p>
<ol start="2">
<li>What is Q-learning?</li>
</ol>
<p>Q-learning is model-free; by repeating actions and receiving rewards, Q-learning can create an action-value function. The action-value function in Q-learning is a look-up table called a <strong>Q-table</strong>, initialized with arbitrary values.</p>
<p>With each step in an episode, the agent selects an action using <strong>epsilon-greedy</strong>.</p>
<ul>
<li>
<p><strong>Epsilon-greedy</strong> is an algorithm for selecting actions based on an epsilon value (). First, the algorithm randomizes a number  (). If , it selects a random action; otherwise, it chooses the greedy action with the highest value.</p>
</li>
<li>
<p>The epsilon value is used to balance <strong>exploration</strong> and <strong>exploitation</strong>. If the agent only chooses greedy actions, it might miss other actions with higher rewards. If it only chooses random actions, the Q-table will converge slowly.</p>
</li>
<li>
<p>In practice, one initializes epsilon as a large number, such as 0.99, and then gradually decreases this value to a smaller number, like 0.1.</p>
</li>
</ul>
<p>Next, the action is sent to the environment, and the environment returns the next state and reward. The Q-table is updated based on the formula where  is the learning rate ().</p>
<ol start="3">
<li>Deep Q-learning</li>
</ol>
<p>The limitation of Q-learning is that if the state space is too large, it takes a lot of capacity to store the Q-table and slows down learning time because the Q-table is accessed continuously during training. Thus, <strong>Deep Q-learning (DQN)</strong> was born.</p>
<p>DQN inherits everything from Q-learning, but the Q-table is replaced by a deep neural network from deep learning (Deep neural network + Q-learning = Deep Q-learning). To calculate the action-value, we simply input the state, and the neural network immediately outputs the action-values.</p>
<p>However, to avoid <strong>overfitting</strong>, we use two neural networks here: a main network called the <strong>action-value network</strong> and an auxiliary network called the <strong>target action-value network</strong>.</p>
<p>Another issue with DQN is <strong>catastrophic forgetting</strong>. To reduce this problem, we use a "lifehack" called <strong>experience replay</strong>. For each step, we have a tuple <code>(s, a, r, s', done)</code> saved to memory, where <code>done</code> is a flag indicating if  is a terminal state. We then sample a mini-batch and feed it into the neural network for training.</p>
<p>The update function also differs slightly from Q-learning. Let's look at the entire algorithm process.</p>
<ol start="4">
<li>Applying DQN to Tic-Tac-Toe</li>
</ol>
<p>Now let's start coding.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="environment-setup">Environment Setup<a href="https://reccobeats.com/blog/welcome-docusaurus-v2#environment-setup" class="hash-link" aria-label="Direct link to Environment Setup" title="Direct link to Environment Setup">​</a></h3>
<p>First, build the Tic-Tac-Toe environment.</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#bfc7d5;--prism-background-color:#292d3e"><div class="codeBlockContent_biex"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#bfc7d5;background-color:#292d3e"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#bfc7d5"><span class="token keyword" style="font-style:italic">import</span><span class="token plain"> random</span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain"></span><span class="token keyword" style="font-style:italic">class</span><span class="token plain"> </span><span class="token class-name" style="color:rgb(255, 203, 107)">Tictactoe_v0</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">    </span><span class="token keyword" style="font-style:italic">def</span><span class="token plain"> </span><span class="token function" style="color:rgb(130, 170, 255)">__init__</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token plain">self</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">        </span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">cite_start</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token plain">self</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">board </span><span class="token operator" style="color:rgb(137, 221, 255)">=</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token number" style="color:rgb(247, 140, 108)">0</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token plain"> </span><span class="token operator" style="color:rgb(137, 221, 255)">*</span><span class="token plain"> </span><span class="token number" style="color:rgb(247, 140, 108)">9</span><span class="token plain"> </span><span class="token comment" style="color:rgb(105, 112, 152);font-style:italic"># [cite: 346]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">        self</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">wining_position </span><span class="token operator" style="color:rgb(137, 221, 255)">=</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token number" style="color:rgb(247, 140, 108)">0</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> </span><span class="token number" style="color:rgb(247, 140, 108)">1</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> </span><span class="token number" style="color:rgb(247, 140, 108)">2</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token number" style="color:rgb(247, 140, 108)">3</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> </span><span class="token number" style="color:rgb(247, 140, 108)">4</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> </span><span class="token number" style="color:rgb(247, 140, 108)">5</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token number" style="color:rgb(247, 140, 108)">6</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> </span><span class="token number" style="color:rgb(247, 140, 108)">7</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> </span><span class="token number" style="color:rgb(247, 140, 108)">8</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">                                </span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token number" style="color:rgb(247, 140, 108)">0</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> </span><span class="token number" style="color:rgb(247, 140, 108)">3</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> </span><span class="token number" style="color:rgb(247, 140, 108)">6</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token number" style="color:rgb(247, 140, 108)">1</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> </span><span class="token number" style="color:rgb(247, 140, 108)">4</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> </span><span class="token number" style="color:rgb(247, 140, 108)">7</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token number" style="color:rgb(247, 140, 108)">2</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> </span><span class="token number" style="color:rgb(247, 140, 108)">5</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> </span><span class="token number" style="color:rgb(247, 140, 108)">8</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">                                </span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">cite_start</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token number" style="color:rgb(247, 140, 108)">0</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> </span><span class="token number" style="color:rgb(247, 140, 108)">4</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> </span><span class="token number" style="color:rgb(247, 140, 108)">8</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token number" style="color:rgb(247, 140, 108)">6</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> </span><span class="token number" style="color:rgb(247, 140, 108)">4</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> </span><span class="token number" style="color:rgb(247, 140, 108)">2</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token plain"> </span><span class="token comment" style="color:rgb(105, 112, 152);font-style:italic"># [cite: 347-368]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">        </span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">cite_start</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token plain">self</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">current_turn </span><span class="token operator" style="color:rgb(137, 221, 255)">=</span><span class="token plain"> </span><span class="token number" style="color:rgb(247, 140, 108)">1</span><span class="token plain"> </span><span class="token comment" style="color:rgb(105, 112, 152);font-style:italic"># [cite: 369]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">        </span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">cite_start</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token plain">self</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">player_mark </span><span class="token operator" style="color:rgb(137, 221, 255)">=</span><span class="token plain"> </span><span class="token number" style="color:rgb(247, 140, 108)">1</span><span class="token plain"> </span><span class="token comment" style="color:rgb(105, 112, 152);font-style:italic"># [cite: 370]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">    </span><span class="token keyword" style="font-style:italic">def</span><span class="token plain"> </span><span class="token function" style="color:rgb(130, 170, 255)">reset</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token plain">self</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> is_human_first</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">        </span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">cite_start</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token plain">self</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">board </span><span class="token operator" style="color:rgb(137, 221, 255)">=</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token number" style="color:rgb(247, 140, 108)">0</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token plain"> </span><span class="token operator" style="color:rgb(137, 221, 255)">*</span><span class="token plain"> </span><span class="token number" style="color:rgb(247, 140, 108)">9</span><span class="token plain"> </span><span class="token comment" style="color:rgb(105, 112, 152);font-style:italic"># [cite: 372]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">        </span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">cite_start</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token plain">self</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">current_turn </span><span class="token operator" style="color:rgb(137, 221, 255)">=</span><span class="token plain"> </span><span class="token number" style="color:rgb(247, 140, 108)">1</span><span class="token plain"> </span><span class="token comment" style="color:rgb(105, 112, 152);font-style:italic"># [cite: 373]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">        </span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">cite_start</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token plain">self</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">player_mark </span><span class="token operator" style="color:rgb(137, 221, 255)">=</span><span class="token plain"> </span><span class="token number" style="color:rgb(247, 140, 108)">1</span><span class="token plain"> </span><span class="token keyword" style="font-style:italic">if</span><span class="token plain"> is_human_first </span><span class="token keyword" style="font-style:italic">else</span><span class="token plain"> </span><span class="token operator" style="color:rgb(137, 221, 255)">-</span><span class="token number" style="color:rgb(247, 140, 108)">1</span><span class="token plain"> </span><span class="token comment" style="color:rgb(105, 112, 152);font-style:italic"># [cite: 374]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">        </span><span class="token keyword" style="font-style:italic">if</span><span class="token plain"> </span><span class="token keyword" style="font-style:italic">not</span><span class="token plain"> is_human_first</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">            </span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">cite_start</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token plain">self</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">env_act</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token plain"> </span><span class="token comment" style="color:rgb(105, 112, 152);font-style:italic"># [cite: 376]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">        </span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">cite_start</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token keyword" style="font-style:italic">return</span><span class="token plain"> self</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">board</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">copy</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token plain"> </span><span class="token comment" style="color:rgb(105, 112, 152);font-style:italic"># [cite: 381]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">    </span><span class="token keyword" style="font-style:italic">def</span><span class="token plain"> </span><span class="token function" style="color:rgb(130, 170, 255)">check_win</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token plain">self</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">        </span><span class="token keyword" style="font-style:italic">for</span><span class="token plain"> pst </span><span class="token keyword" style="font-style:italic">in</span><span class="token plain"> self</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">wining_position</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">            </span><span class="token comment" style="color:rgb(105, 112, 152);font-style:italic"># Check for win condition</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">            </span><span class="token keyword" style="font-style:italic">if</span><span class="token plain"> </span><span class="token builtin" style="color:rgb(130, 170, 255)">str</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token plain">self</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">board</span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">pst</span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token number" style="color:rgb(247, 140, 108)">0</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token plain"> </span><span class="token operator" style="color:rgb(137, 221, 255)">+</span><span class="token plain"> </span><span class="token builtin" style="color:rgb(130, 170, 255)">str</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token plain">self</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">board</span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">pst</span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token number" style="color:rgb(247, 140, 108)">1</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token plain"> </span><span class="token operator" style="color:rgb(137, 221, 255)">+</span><span class="token plain"> </span><span class="token builtin" style="color:rgb(130, 170, 255)">str</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token plain">self</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">board</span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">pst</span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token number" style="color:rgb(247, 140, 108)">2</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token plain"> </span><span class="token keyword" style="font-style:italic">in</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token builtin" style="color:rgb(130, 170, 255)">str</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token plain">self</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">current_turn</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token plain"> </span><span class="token operator" style="color:rgb(137, 221, 255)">*</span><span class="token plain"> </span><span class="token number" style="color:rgb(247, 140, 108)">3</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">                 </span><span class="token keyword" style="font-style:italic">if</span><span class="token plain"> self</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">current_turn </span><span class="token operator" style="color:rgb(137, 221, 255)">==</span><span class="token plain"> self</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">player_mark</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">                     </span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">cite_start</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token keyword" style="font-style:italic">return</span><span class="token plain"> </span><span class="token number" style="color:rgb(247, 140, 108)">1</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> </span><span class="token boolean" style="color:rgb(255, 88, 116)">True</span><span class="token plain"> </span><span class="token comment" style="color:rgb(105, 112, 152);font-style:italic"># [cite: 386]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">                 </span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">cite_start</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token keyword" style="font-style:italic">return</span><span class="token plain"> </span><span class="token operator" style="color:rgb(137, 221, 255)">-</span><span class="token number" style="color:rgb(247, 140, 108)">1</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> </span><span class="token boolean" style="color:rgb(255, 88, 116)">True</span><span class="token plain"> </span><span class="token comment" style="color:rgb(105, 112, 152);font-style:italic"># [cite: 387] # Note: OCR suggests 1, but context implies loss/opponent win</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">        </span><span class="token keyword" style="font-style:italic">if</span><span class="token plain"> </span><span class="token number" style="color:rgb(247, 140, 108)">0</span><span class="token plain"> </span><span class="token keyword" style="font-style:italic">not</span><span class="token plain"> </span><span class="token keyword" style="font-style:italic">in</span><span class="token plain"> self</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">board</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">            </span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">cite_start</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token keyword" style="font-style:italic">return</span><span class="token plain"> </span><span class="token number" style="color:rgb(247, 140, 108)">0</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> </span><span class="token boolean" style="color:rgb(255, 88, 116)">True</span><span class="token plain"> </span><span class="token comment" style="color:rgb(105, 112, 152);font-style:italic"># [cite: 389]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">        </span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">cite_start</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token keyword" style="font-style:italic">return</span><span class="token plain"> </span><span class="token number" style="color:rgb(247, 140, 108)">0</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> </span><span class="token boolean" style="color:rgb(255, 88, 116)">False</span><span class="token plain"> </span><span class="token comment" style="color:rgb(105, 112, 152);font-style:italic"># [cite: 390]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">    </span><span class="token keyword" style="font-style:italic">def</span><span class="token plain"> </span><span class="token function" style="color:rgb(130, 170, 255)">env_act</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token plain">self</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">        </span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">cite_start</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token plain">action </span><span class="token operator" style="color:rgb(137, 221, 255)">=</span><span class="token plain"> random</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">choice</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">i </span><span class="token keyword" style="font-style:italic">for</span><span class="token plain"> i </span><span class="token keyword" style="font-style:italic">in</span><span class="token plain"> </span><span class="token builtin" style="color:rgb(130, 170, 255)">range</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token builtin" style="color:rgb(130, 170, 255)">len</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token plain">self</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">board</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token plain"> </span><span class="token keyword" style="font-style:italic">if</span><span class="token plain"> self</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">board</span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">i</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token plain"> </span><span class="token operator" style="color:rgb(137, 221, 255)">==</span><span class="token plain"> </span><span class="token number" style="color:rgb(247, 140, 108)">0</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token plain"> </span><span class="token comment" style="color:rgb(105, 112, 152);font-style:italic"># [cite: 392]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">        </span><span class="token comment" style="color:rgb(105, 112, 152);font-style:italic"># Basic logic to block or win (simplified from OCR text)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">        </span><span class="token keyword" style="font-style:italic">for</span><span class="token plain"> pst </span><span class="token keyword" style="font-style:italic">in</span><span class="token plain"> self</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">wining_position</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">            com </span><span class="token operator" style="color:rgb(137, 221, 255)">=</span><span class="token plain"> </span><span class="token builtin" style="color:rgb(130, 170, 255)">str</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token plain">self</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">board</span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">pst</span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token number" style="color:rgb(247, 140, 108)">0</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token plain"> </span><span class="token operator" style="color:rgb(137, 221, 255)">+</span><span class="token plain"> </span><span class="token builtin" style="color:rgb(130, 170, 255)">str</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token plain">self</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">board</span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">pst</span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token number" style="color:rgb(247, 140, 108)">1</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token plain"> </span><span class="token operator" style="color:rgb(137, 221, 255)">+</span><span class="token plain"> </span><span class="token builtin" style="color:rgb(130, 170, 255)">str</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token plain">self</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">board</span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">pst</span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token number" style="color:rgb(247, 140, 108)">2</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">            </span><span class="token keyword" style="font-style:italic">if</span><span class="token plain"> com</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">replace</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token string" style="color:rgb(195, 232, 141)">'0'</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> </span><span class="token string" style="color:rgb(195, 232, 141)">''</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token plain"> </span><span class="token operator" style="color:rgb(137, 221, 255)">==</span><span class="token plain"> </span><span class="token builtin" style="color:rgb(130, 170, 255)">str</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token plain">self</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">current_turn</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token plain"> </span><span class="token operator" style="color:rgb(137, 221, 255)">*</span><span class="token plain"> </span><span class="token number" style="color:rgb(247, 140, 108)">2</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">                 </span><span class="token keyword" style="font-style:italic">if</span><span class="token plain"> self</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">board</span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">pst</span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token number" style="color:rgb(247, 140, 108)">0</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token plain"> </span><span class="token operator" style="color:rgb(137, 221, 255)">==</span><span class="token plain"> </span><span class="token number" style="color:rgb(247, 140, 108)">0</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"> action </span><span class="token operator" style="color:rgb(137, 221, 255)">=</span><span class="token plain"> pst</span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token number" style="color:rgb(247, 140, 108)">0</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">                 </span><span class="token keyword" style="font-style:italic">elif</span><span class="token plain"> self</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">board</span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">pst</span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token number" style="color:rgb(247, 140, 108)">1</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token plain"> </span><span class="token operator" style="color:rgb(137, 221, 255)">==</span><span class="token plain"> </span><span class="token number" style="color:rgb(247, 140, 108)">0</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"> action </span><span class="token operator" style="color:rgb(137, 221, 255)">=</span><span class="token plain"> pst</span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token number" style="color:rgb(247, 140, 108)">1</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">                 </span><span class="token keyword" style="font-style:italic">else</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"> action </span><span class="token operator" style="color:rgb(137, 221, 255)">=</span><span class="token plain"> pst</span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token number" style="color:rgb(247, 140, 108)">2</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">        </span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">        </span><span class="token keyword" style="font-style:italic">if</span><span class="token plain"> self</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">board</span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">action</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token plain"> </span><span class="token operator" style="color:rgb(137, 221, 255)">!=</span><span class="token plain"> </span><span class="token number" style="color:rgb(247, 140, 108)">0</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">            </span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">cite_start</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token keyword" style="font-style:italic">raise</span><span class="token plain"> Exception</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token string" style="color:rgb(195, 232, 141)">'Invalid action'</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token plain"> </span><span class="token comment" style="color:rgb(105, 112, 152);font-style:italic"># [cite: 406]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">        </span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">cite_start</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token plain">self</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">board</span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">action</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token plain"> </span><span class="token operator" style="color:rgb(137, 221, 255)">=</span><span class="token plain"> self</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">current_turn </span><span class="token comment" style="color:rgb(105, 112, 152);font-style:italic"># [cite: 407]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">        </span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">cite_start</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token plain">reward</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> done </span><span class="token operator" style="color:rgb(137, 221, 255)">=</span><span class="token plain"> self</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">check_win</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token plain"> </span><span class="token comment" style="color:rgb(105, 112, 152);font-style:italic"># [cite: 408]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">        </span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">cite_start</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token plain">self</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">current_turn </span><span class="token operator" style="color:rgb(137, 221, 255)">=</span><span class="token plain"> self</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">current_turn </span><span class="token operator" style="color:rgb(137, 221, 255)">*</span><span class="token plain"> </span><span class="token operator" style="color:rgb(137, 221, 255)">-</span><span class="token number" style="color:rgb(247, 140, 108)">1</span><span class="token plain"> </span><span class="token comment" style="color:rgb(105, 112, 152);font-style:italic"># [cite: 409-410]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">        </span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">cite_start</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token keyword" style="font-style:italic">return</span><span class="token plain"> reward</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> done </span><span class="token comment" style="color:rgb(105, 112, 152);font-style:italic"># [cite: 411]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">    </span><span class="token keyword" style="font-style:italic">def</span><span class="token plain"> </span><span class="token function" style="color:rgb(130, 170, 255)">step</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token plain">self</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> action</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">        </span><span class="token keyword" style="font-style:italic">if</span><span class="token plain"> self</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">board</span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">action</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token plain"> </span><span class="token operator" style="color:rgb(137, 221, 255)">!=</span><span class="token plain"> </span><span class="token number" style="color:rgb(247, 140, 108)">0</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">            </span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">cite_start</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token keyword" style="font-style:italic">raise</span><span class="token plain"> Exception</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token string" style="color:rgb(195, 232, 141)">'Invalid action'</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token plain"> </span><span class="token comment" style="color:rgb(105, 112, 152);font-style:italic"># [cite: 414]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">        </span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">cite_start</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token plain">self</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">board</span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">action</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token plain"> </span><span class="token operator" style="color:rgb(137, 221, 255)">=</span><span class="token plain"> self</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">current_turn </span><span class="token comment" style="color:rgb(105, 112, 152);font-style:italic"># [cite: 415]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">        </span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">cite_start</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token plain">reward</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> done </span><span class="token operator" style="color:rgb(137, 221, 255)">=</span><span class="token plain"> self</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">check_win</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token plain"> </span><span class="token comment" style="color:rgb(105, 112, 152);font-style:italic"># [cite: 416]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">        </span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">cite_start</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token plain">self</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">current_turn </span><span class="token operator" style="color:rgb(137, 221, 255)">=</span><span class="token plain"> self</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">current_turn </span><span class="token operator" style="color:rgb(137, 221, 255)">*</span><span class="token plain"> </span><span class="token operator" style="color:rgb(137, 221, 255)">-</span><span class="token number" style="color:rgb(247, 140, 108)">1</span><span class="token plain"> </span><span class="token comment" style="color:rgb(105, 112, 152);font-style:italic"># [cite: 417-419]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">        </span><span class="token keyword" style="font-style:italic">if</span><span class="token plain"> done</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">            </span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">cite_start</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token keyword" style="font-style:italic">return</span><span class="token plain"> self</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">board</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">copy</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> reward</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> done</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> </span><span class="token boolean" style="color:rgb(255, 88, 116)">None</span><span class="token plain"> </span><span class="token comment" style="color:rgb(105, 112, 152);font-style:italic"># [cite: 420]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">        </span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">cite_start</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token plain">reward</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> done </span><span class="token operator" style="color:rgb(137, 221, 255)">=</span><span class="token plain"> self</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">env_act</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token plain"> </span><span class="token comment" style="color:rgb(105, 112, 152);font-style:italic"># [cite: 421]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">        </span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">cite_start</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token keyword" style="font-style:italic">return</span><span class="token plain"> self</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">board</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">copy</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> reward</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> done</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> </span><span class="token boolean" style="color:rgb(255, 88, 116)">None</span><span class="token plain"> </span><span class="token comment" style="color:rgb(105, 112, 152);font-style:italic"># [cite: 422]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain" style="display:inline-block"></span><br></span></code></pre><div class="buttonGroup__atx"><button type="button" aria-label="Copy code to clipboard" title="Copy" class="clean-btn"><span class="copyButtonIcons_eSgA" aria-hidden="true"><svg viewBox="0 0 24 24" class="copyButtonIcon_y97N"><path fill="currentColor" d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg viewBox="0 0 24 24" class="copyButtonSuccessIcon_LjdS"><path fill="currentColor" d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div></div>
<p>Implement Epsilon-Greedy</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#bfc7d5;--prism-background-color:#292d3e"><div class="codeBlockContent_biex"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#bfc7d5;background-color:#292d3e"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#bfc7d5"><span class="token keyword" style="font-style:italic">import</span><span class="token plain"> numpy </span><span class="token keyword" style="font-style:italic">as</span><span class="token plain"> np</span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain"></span><span class="token keyword" style="font-style:italic">class</span><span class="token plain"> </span><span class="token class-name" style="color:rgb(255, 203, 107)">EpsilonGreedy</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">    </span><span class="token keyword" style="font-style:italic">def</span><span class="token plain"> </span><span class="token function" style="color:rgb(130, 170, 255)">__init__</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token plain">self</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> epsilon</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">        </span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">cite_start</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token plain">self</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">epsilon </span><span class="token operator" style="color:rgb(137, 221, 255)">=</span><span class="token plain"> epsilon </span><span class="token comment" style="color:rgb(105, 112, 152);font-style:italic"># [cite: 426]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">    </span><span class="token keyword" style="font-style:italic">def</span><span class="token plain"> </span><span class="token function" style="color:rgb(130, 170, 255)">perform</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token plain">self</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> q_value</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> action_space</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"> </span><span class="token builtin" style="color:rgb(130, 170, 255)">list</span><span class="token plain"> </span><span class="token operator" style="color:rgb(137, 221, 255)">=</span><span class="token plain"> </span><span class="token boolean" style="color:rgb(255, 88, 116)">None</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">        </span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">cite_start</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token plain">prob </span><span class="token operator" style="color:rgb(137, 221, 255)">=</span><span class="token plain"> np</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">random</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">sample</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token plain"> </span><span class="token comment" style="color:rgb(105, 112, 152);font-style:italic"># get probability of taking random action [cite: 432-433]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">        </span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">cite_start</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token keyword" style="font-style:italic">if</span><span class="token plain"> prob </span><span class="token operator" style="color:rgb(137, 221, 255)">&lt;</span><span class="token plain"> self</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">epsilon</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"> </span><span class="token comment" style="color:rgb(105, 112, 152);font-style:italic"># take random action [cite: 434]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">            </span><span class="token keyword" style="font-style:italic">if</span><span class="token plain"> action_space </span><span class="token keyword" style="font-style:italic">is</span><span class="token plain"> </span><span class="token boolean" style="color:rgb(255, 88, 116)">None</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"> </span><span class="token comment" style="color:rgb(105, 112, 152);font-style:italic"># all actions are available</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">                </span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">cite_start</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token keyword" style="font-style:italic">return</span><span class="token plain"> np</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">random</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">randint</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token builtin" style="color:rgb(130, 170, 255)">len</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token plain">q_value</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token plain"> </span><span class="token comment" style="color:rgb(105, 112, 152);font-style:italic"># [cite: 436]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">            </span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">cite_start</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token keyword" style="font-style:italic">return</span><span class="token plain"> np</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">random</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">choice</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token plain">action_space</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token plain"> </span><span class="token comment" style="color:rgb(105, 112, 152);font-style:italic"># [cite: 437]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">        </span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">cite_start</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token keyword" style="font-style:italic">else</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"> </span><span class="token comment" style="color:rgb(105, 112, 152);font-style:italic"># take greedy action [cite: 438]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">            </span><span class="token keyword" style="font-style:italic">if</span><span class="token plain"> action_space </span><span class="token keyword" style="font-style:italic">is</span><span class="token plain"> </span><span class="token boolean" style="color:rgb(255, 88, 116)">None</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">                </span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">cite_start</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token keyword" style="font-style:italic">return</span><span class="token plain"> np</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">argmax</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token plain">q_value</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token plain"> </span><span class="token comment" style="color:rgb(105, 112, 152);font-style:italic"># [cite: 440]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">            </span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">cite_start</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token keyword" style="font-style:italic">return</span><span class="token plain"> </span><span class="token builtin" style="color:rgb(130, 170, 255)">max</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">q_value</span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">a</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> a</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token plain"> </span><span class="token keyword" style="font-style:italic">for</span><span class="token plain"> a </span><span class="token keyword" style="font-style:italic">in</span><span class="token plain"> action_space</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> key</span><span class="token operator" style="color:rgb(137, 221, 255)">=</span><span class="token keyword" style="font-style:italic">lambda</span><span class="token plain"> x</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"> x</span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token number" style="color:rgb(247, 140, 108)">0</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token number" style="color:rgb(247, 140, 108)">1</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token plain"> </span><span class="token comment" style="color:rgb(105, 112, 152);font-style:italic"># [cite: 441]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">    </span><span class="token keyword" style="font-style:italic">def</span><span class="token plain"> </span><span class="token function" style="color:rgb(130, 170, 255)">decay</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token plain">self</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> decay_value</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> lower_bound</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">        </span><span class="token comment" style="color:rgb(105, 112, 152);font-style:italic"># Adjust the epsilon value by the formula: epsilon = max(decayValue * epsilon, lowerBound)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">        </span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">cite_start</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token plain">self</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">epsilon </span><span class="token operator" style="color:rgb(137, 221, 255)">=</span><span class="token plain"> </span><span class="token builtin" style="color:rgb(130, 170, 255)">max</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token plain">self</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">epsilon </span><span class="token operator" style="color:rgb(137, 221, 255)">*</span><span class="token plain"> decay_value</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> lower_bound</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token plain"> </span><span class="token comment" style="color:rgb(105, 112, 152);font-style:italic"># [cite: 447]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain" style="display:inline-block"></span><br></span></code></pre><div class="buttonGroup__atx"><button type="button" aria-label="Copy code to clipboard" title="Copy" class="clean-btn"><span class="copyButtonIcons_eSgA" aria-hidden="true"><svg viewBox="0 0 24 24" class="copyButtonIcon_y97N"><path fill="currentColor" d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg viewBox="0 0 24 24" class="copyButtonSuccessIcon_LjdS"><path fill="currentColor" d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div></div>
<p>Implement Experience Replay</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#bfc7d5;--prism-background-color:#292d3e"><div class="codeBlockContent_biex"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#bfc7d5;background-color:#292d3e"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#bfc7d5"><span class="token keyword" style="font-style:italic">class</span><span class="token plain"> </span><span class="token class-name" style="color:rgb(255, 203, 107)">ExperienceReplay</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">    </span><span class="token keyword" style="font-style:italic">def</span><span class="token plain"> </span><span class="token function" style="color:rgb(130, 170, 255)">__init__</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token plain">self</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> e_max</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"> </span><span class="token builtin" style="color:rgb(130, 170, 255)">int</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">        </span><span class="token keyword" style="font-style:italic">if</span><span class="token plain"> e_max </span><span class="token operator" style="color:rgb(137, 221, 255)">&lt;=</span><span class="token plain"> </span><span class="token number" style="color:rgb(247, 140, 108)">0</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">            </span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">cite_start</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token keyword" style="font-style:italic">raise</span><span class="token plain"> ValueError</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token string" style="color:rgb(195, 232, 141)">'Invalid value for memory size'</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token plain"> </span><span class="token comment" style="color:rgb(105, 112, 152);font-style:italic"># [cite: 451-452]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">        self</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">e_max </span><span class="token operator" style="color:rgb(137, 221, 255)">=</span><span class="token plain"> e_max</span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">        self</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">memory </span><span class="token operator" style="color:rgb(137, 221, 255)">=</span><span class="token plain"> </span><span class="token builtin" style="color:rgb(130, 170, 255)">list</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">        </span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">cite_start</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token plain">self</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">index </span><span class="token operator" style="color:rgb(137, 221, 255)">=</span><span class="token plain"> </span><span class="token number" style="color:rgb(247, 140, 108)">0</span><span class="token plain"> </span><span class="token comment" style="color:rgb(105, 112, 152);font-style:italic"># [cite: 455]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">    </span><span class="token keyword" style="font-style:italic">def</span><span class="token plain"> </span><span class="token function" style="color:rgb(130, 170, 255)">add_experience</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token plain">self</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> sample</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"> </span><span class="token builtin" style="color:rgb(130, 170, 255)">list</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">        </span><span class="token keyword" style="font-style:italic">if</span><span class="token plain"> </span><span class="token builtin" style="color:rgb(130, 170, 255)">len</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token plain">sample</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token plain"> </span><span class="token operator" style="color:rgb(137, 221, 255)">!=</span><span class="token plain"> </span><span class="token number" style="color:rgb(247, 140, 108)">5</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">            </span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">cite_start</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token keyword" style="font-style:italic">raise</span><span class="token plain"> Exception</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token string" style="color:rgb(195, 232, 141)">'Invalid sample'</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token plain"> </span><span class="token comment" style="color:rgb(105, 112, 152);font-style:italic"># [cite: 457-458]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">        </span><span class="token keyword" style="font-style:italic">if</span><span class="token plain"> </span><span class="token builtin" style="color:rgb(130, 170, 255)">len</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token plain">self</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">memory</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token plain"> </span><span class="token operator" style="color:rgb(137, 221, 255)">&lt;</span><span class="token plain"> self</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">e_max</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">             </span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">cite_start</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token plain">self</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">memory</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">append</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token plain">sample</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token plain"> </span><span class="token comment" style="color:rgb(105, 112, 152);font-style:italic"># [cite: 462]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">        </span><span class="token keyword" style="font-style:italic">else</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">             </span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">cite_start</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token plain">self</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">memory</span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">self</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">index</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token plain"> </span><span class="token operator" style="color:rgb(137, 221, 255)">=</span><span class="token plain"> sample </span><span class="token comment" style="color:rgb(105, 112, 152);font-style:italic"># [cite: 463]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">        </span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">cite_start</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token plain">self</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">index </span><span class="token operator" style="color:rgb(137, 221, 255)">=</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token plain">self</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">index </span><span class="token operator" style="color:rgb(137, 221, 255)">+</span><span class="token plain"> </span><span class="token number" style="color:rgb(247, 140, 108)">1</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token plain"> </span><span class="token operator" style="color:rgb(137, 221, 255)">%</span><span class="token plain"> self</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">e_max </span><span class="token comment" style="color:rgb(105, 112, 152);font-style:italic"># [cite: 464]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">    </span><span class="token keyword" style="font-style:italic">def</span><span class="token plain"> </span><span class="token function" style="color:rgb(130, 170, 255)">sample_experience</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token plain">self</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> sample_size</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"> </span><span class="token builtin" style="color:rgb(130, 170, 255)">int</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> cer_mode</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"> </span><span class="token builtin" style="color:rgb(130, 170, 255)">bool</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">        </span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">cite_start</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token plain">samples </span><span class="token operator" style="color:rgb(137, 221, 255)">=</span><span class="token plain"> random</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">sample</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token plain">self</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">memory</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> sample_size</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token plain"> </span><span class="token comment" style="color:rgb(105, 112, 152);font-style:italic"># [cite: 466]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">        </span><span class="token keyword" style="font-style:italic">if</span><span class="token plain"> cer_mode</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">            </span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">cite_start</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token plain">samples</span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token operator" style="color:rgb(137, 221, 255)">-</span><span class="token number" style="color:rgb(247, 140, 108)">1</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token plain"> </span><span class="token operator" style="color:rgb(137, 221, 255)">=</span><span class="token plain"> self</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">memory</span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">self</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">index </span><span class="token operator" style="color:rgb(137, 221, 255)">-</span><span class="token plain"> </span><span class="token number" style="color:rgb(247, 140, 108)">1</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token plain"> </span><span class="token comment" style="color:rgb(105, 112, 152);font-style:italic"># [cite: 467]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">        </span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">        </span><span class="token comment" style="color:rgb(105, 112, 152);font-style:italic"># state_samples, action_samples, reward_samples, next_state_samples, done_samples</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">        s_batch</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> a_batch</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> r_batch</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> ns_batch</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> done_batch </span><span class="token operator" style="color:rgb(137, 221, 255)">=</span><span class="token plain"> </span><span class="token builtin" style="color:rgb(130, 170, 255)">map</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token plain">np</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">array</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> </span><span class="token builtin" style="color:rgb(130, 170, 255)">zip</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token operator" style="color:rgb(137, 221, 255)">*</span><span class="token plain">samples</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">        </span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">cite_start</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token keyword" style="font-style:italic">return</span><span class="token plain"> s_batch</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> a_batch</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> r_batch</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> ns_batch</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> done_batch </span><span class="token comment" style="color:rgb(105, 112, 152);font-style:italic"># [cite: 468]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">    </span><span class="token keyword" style="font-style:italic">def</span><span class="token plain"> </span><span class="token function" style="color:rgb(130, 170, 255)">get_size</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token plain">self</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">        </span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">cite_start</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token keyword" style="font-style:italic">return</span><span class="token plain"> </span><span class="token builtin" style="color:rgb(130, 170, 255)">len</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token plain">self</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">memory</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token plain"> </span><span class="token comment" style="color:rgb(105, 112, 152);font-style:italic"># [cite: 474]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain" style="display:inline-block"></span><br></span></code></pre><div class="buttonGroup__atx"><button type="button" aria-label="Copy code to clipboard" title="Copy" class="clean-btn"><span class="copyButtonIcons_eSgA" aria-hidden="true"><svg viewBox="0 0 24 24" class="copyButtonIcon_y97N"><path fill="currentColor" d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg viewBox="0 0 24 24" class="copyButtonSuccessIcon_LjdS"><path fill="currentColor" d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div></div>
<p>Implement DQN</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#bfc7d5;--prism-background-color:#292d3e"><div class="codeBlockContent_biex"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#bfc7d5;background-color:#292d3e"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#bfc7d5"><span class="token keyword" style="font-style:italic">from</span><span class="token plain"> tensorflow</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">keras</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">models </span><span class="token keyword" style="font-style:italic">import</span><span class="token plain"> Sequential</span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain"></span><span class="token keyword" style="font-style:italic">from</span><span class="token plain"> tensorflow</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">keras</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">layers </span><span class="token keyword" style="font-style:italic">import</span><span class="token plain"> Dense</span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain"></span><span class="token keyword" style="font-style:italic">from</span><span class="token plain"> tensorflow</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">keras </span><span class="token keyword" style="font-style:italic">import</span><span class="token plain"> optimizers</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> losses</span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain"></span><span class="token keyword" style="font-style:italic">class</span><span class="token plain"> </span><span class="token class-name" style="color:rgb(255, 203, 107)">DQN</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"> </span><span class="token comment" style="color:rgb(105, 112, 152);font-style:italic"># Inherits BaseModel in original code</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">    </span><span class="token keyword" style="font-style:italic">def</span><span class="token plain"> </span><span class="token function" style="color:rgb(130, 170, 255)">__init__</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token plain">self</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> discount_factor</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"> </span><span class="token builtin" style="color:rgb(130, 170, 255)">float</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> epsilon</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"> </span><span class="token builtin" style="color:rgb(130, 170, 255)">float</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> e_min</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"> </span><span class="token builtin" style="color:rgb(130, 170, 255)">int</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> e_max</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"> </span><span class="token builtin" style="color:rgb(130, 170, 255)">int</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">        </span><span class="token comment" style="color:rgb(105, 112, 152);font-style:italic"># super().__init__(discount_factor, epsilon, e_min, e_max)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">        </span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">cite_start</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token plain">self</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">epsilon_greedy </span><span class="token operator" style="color:rgb(137, 221, 255)">=</span><span class="token plain"> EpsilonGreedy</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token plain">epsilon</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token plain"> </span><span class="token comment" style="color:rgb(105, 112, 152);font-style:italic"># [cite: 480]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">        </span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">cite_start</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token plain">self</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">gamma </span><span class="token operator" style="color:rgb(137, 221, 255)">=</span><span class="token plain"> discount_factor </span><span class="token comment" style="color:rgb(105, 112, 152);font-style:italic"># [cite: 481]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">        </span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">cite_start</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token plain">self</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">e_min </span><span class="token operator" style="color:rgb(137, 221, 255)">=</span><span class="token plain"> e_min </span><span class="token comment" style="color:rgb(105, 112, 152);font-style:italic"># [cite: 482]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">        </span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">cite_start</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token plain">self</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">exp_replay </span><span class="token operator" style="color:rgb(137, 221, 255)">=</span><span class="token plain"> ExperienceReplay</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token plain">e_max</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token plain"> </span><span class="token comment" style="color:rgb(105, 112, 152);font-style:italic"># [cite: 483]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">        </span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">cite_start</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token plain">self</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">training_network </span><span class="token operator" style="color:rgb(137, 221, 255)">=</span><span class="token plain"> Sequential</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token plain"> </span><span class="token comment" style="color:rgb(105, 112, 152);font-style:italic"># [cite: 484-485]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">        </span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">cite_start</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token plain">self</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">target_network </span><span class="token operator" style="color:rgb(137, 221, 255)">=</span><span class="token plain"> Sequential</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token plain"> </span><span class="token comment" style="color:rgb(105, 112, 152);font-style:italic"># [cite: 486-487]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">        </span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">cite_start</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token plain">self</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">cache </span><span class="token operator" style="color:rgb(137, 221, 255)">=</span><span class="token plain"> </span><span class="token builtin" style="color:rgb(130, 170, 255)">list</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token plain"> </span><span class="token comment" style="color:rgb(105, 112, 152);font-style:italic"># [cite: 488]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">    </span><span class="token keyword" style="font-style:italic">def</span><span class="token plain"> </span><span class="token function" style="color:rgb(130, 170, 255)">observe</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token plain">self</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> state</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> action_space</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"> </span><span class="token builtin" style="color:rgb(130, 170, 255)">list</span><span class="token plain"> </span><span class="token operator" style="color:rgb(137, 221, 255)">=</span><span class="token plain"> </span><span class="token boolean" style="color:rgb(255, 88, 116)">None</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">        </span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">cite_start</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token plain">q_value </span><span class="token operator" style="color:rgb(137, 221, 255)">=</span><span class="token plain"> self</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">training_network</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">predict</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token plain">np</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">array</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">state</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">ravel</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token plain"> </span><span class="token comment" style="color:rgb(105, 112, 152);font-style:italic"># [cite: 490]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">        </span><span class="token keyword" style="font-style:italic">if</span><span class="token plain"> action_space </span><span class="token keyword" style="font-style:italic">is</span><span class="token plain"> </span><span class="token keyword" style="font-style:italic">not</span><span class="token plain"> </span><span class="token boolean" style="color:rgb(255, 88, 116)">None</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">            </span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">cite_start</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token keyword" style="font-style:italic">return</span><span class="token plain"> </span><span class="token builtin" style="color:rgb(130, 170, 255)">max</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">q_value</span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">a</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> a</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token plain"> </span><span class="token keyword" style="font-style:italic">for</span><span class="token plain"> a </span><span class="token keyword" style="font-style:italic">in</span><span class="token plain"> action_space</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> key</span><span class="token operator" style="color:rgb(137, 221, 255)">=</span><span class="token keyword" style="font-style:italic">lambda</span><span class="token plain"> x</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"> x</span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token number" style="color:rgb(247, 140, 108)">0</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token number" style="color:rgb(247, 140, 108)">1</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token plain"> </span><span class="token comment" style="color:rgb(105, 112, 152);font-style:italic"># [cite: 492]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">        </span><span class="token keyword" style="font-style:italic">return</span><span class="token plain"> np</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">argmax</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token plain">q_value</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">    </span><span class="token keyword" style="font-style:italic">def</span><span class="token plain"> </span><span class="token function" style="color:rgb(130, 170, 255)">observe_on_training</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token plain">self</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> state</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> action_space</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"> </span><span class="token builtin" style="color:rgb(130, 170, 255)">list</span><span class="token plain"> </span><span class="token operator" style="color:rgb(137, 221, 255)">=</span><span class="token plain"> </span><span class="token boolean" style="color:rgb(255, 88, 116)">None</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token plain"> </span><span class="token operator" style="color:rgb(137, 221, 255)">-</span><span class="token operator" style="color:rgb(137, 221, 255)">&gt;</span><span class="token plain"> </span><span class="token builtin" style="color:rgb(130, 170, 255)">int</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">        q_value </span><span class="token operator" style="color:rgb(137, 221, 255)">=</span><span class="token plain"> self</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">training_network</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">predict</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token plain">np</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">array</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">state</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">ravel</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">        </span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">cite_start</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token plain">action </span><span class="token operator" style="color:rgb(137, 221, 255)">=</span><span class="token plain"> self</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">epsilon_greedy</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">perform</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token plain">q_value</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> action_space</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token plain"> </span><span class="token comment" style="color:rgb(105, 112, 152);font-style:italic"># [cite: 493]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">        </span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">cite_start</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token plain">self</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">cache</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">extend</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">state</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> action</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token plain"> </span><span class="token comment" style="color:rgb(105, 112, 152);font-style:italic"># [cite: 494]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">        </span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">cite_start</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token keyword" style="font-style:italic">return</span><span class="token plain"> action </span><span class="token comment" style="color:rgb(105, 112, 152);font-style:italic"># [cite: 495]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">    </span><span class="token keyword" style="font-style:italic">def</span><span class="token plain"> </span><span class="token function" style="color:rgb(130, 170, 255)">take_reward</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token plain">self</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> reward</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> next_state</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> done</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">        self</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">cache</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">extend</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">reward</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> next_state</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> done</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">        </span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">cite_start</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token plain">self</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">exp_replay</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">add_experience</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token plain">self</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">cache</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">copy</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token plain"> </span><span class="token comment" style="color:rgb(105, 112, 152);font-style:italic"># [cite: 497]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">        </span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">cite_start</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token plain">self</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">cache</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">clear</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token plain"> </span><span class="token comment" style="color:rgb(105, 112, 152);font-style:italic"># [cite: 498]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">    </span><span class="token keyword" style="font-style:italic">def</span><span class="token plain"> </span><span class="token function" style="color:rgb(130, 170, 255)">train_network</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token plain">self</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> sample_size</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"> </span><span class="token builtin" style="color:rgb(130, 170, 255)">int</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> batch_size</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"> </span><span class="token builtin" style="color:rgb(130, 170, 255)">int</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> epochs</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"> </span><span class="token builtin" style="color:rgb(130, 170, 255)">int</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> verbose</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"> </span><span class="token builtin" style="color:rgb(130, 170, 255)">int</span><span class="token plain"> </span><span class="token operator" style="color:rgb(137, 221, 255)">=</span><span class="token plain"> </span><span class="token number" style="color:rgb(247, 140, 108)">2</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">        </span><span class="token keyword" style="font-style:italic">if</span><span class="token plain"> self</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">exp_replay</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">get_size</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token plain"> </span><span class="token operator" style="color:rgb(137, 221, 255)">&gt;=</span><span class="token plain"> self</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">e_min</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">            </span><span class="token comment" style="color:rgb(105, 112, 152);font-style:italic"># state_samples, action_samples, reward_samples, next_state_samples, done_samples</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">            s_batch</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> a_batch</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> r_batch</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> ns_batch</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> done_batch </span><span class="token operator" style="color:rgb(137, 221, 255)">=</span><span class="token plain"> self</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">exp_replay</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">sample_experience</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token plain">sample_size</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> </span><span class="token boolean" style="color:rgb(255, 88, 116)">False</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">            states</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> q_values </span><span class="token operator" style="color:rgb(137, 221, 255)">=</span><span class="token plain"> self</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">replay</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token plain">s_batch</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> a_batch</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> r_batch</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> ns_batch</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> done_batch</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">            history </span><span class="token operator" style="color:rgb(137, 221, 255)">=</span><span class="token plain"> self</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">training_network</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">fit</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token plain">states</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> q_values</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> epochs</span><span class="token operator" style="color:rgb(137, 221, 255)">=</span><span class="token plain">epochs</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> batch_size</span><span class="token operator" style="color:rgb(137, 221, 255)">=</span><span class="token plain">batch_size</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> verbose</span><span class="token operator" style="color:rgb(137, 221, 255)">=</span><span class="token plain">verbose</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">            </span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">cite_start</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token keyword" style="font-style:italic">return</span><span class="token plain"> history</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">history</span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token string" style="color:rgb(195, 232, 141)">'loss'</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token plain"> </span><span class="token comment" style="color:rgb(105, 112, 152);font-style:italic"># [cite: 500]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">    </span><span class="token keyword" style="font-style:italic">def</span><span class="token plain"> </span><span class="token function" style="color:rgb(130, 170, 255)">replay</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token plain">self</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> states</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> actions</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> rewards</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> next_states</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> terminals</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">        </span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">cite_start</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token plain">q_values </span><span class="token operator" style="color:rgb(137, 221, 255)">=</span><span class="token plain"> self</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">target_network</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">predict</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token plain">np</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">array</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token plain">states</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token plain"> </span><span class="token comment" style="color:rgb(105, 112, 152);font-style:italic"># get q value at state t [cite: 502]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">        nq_values </span><span class="token operator" style="color:rgb(137, 221, 255)">=</span><span class="token plain"> self</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">target_network</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">predict</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token plain">np</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">array</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token plain">next_states</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token plain"> </span><span class="token comment" style="color:rgb(105, 112, 152);font-style:italic"># get q value at state t+1</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">        </span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">        </span><span class="token keyword" style="font-style:italic">for</span><span class="token plain"> i </span><span class="token keyword" style="font-style:italic">in</span><span class="token plain"> </span><span class="token builtin" style="color:rgb(130, 170, 255)">range</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token builtin" style="color:rgb(130, 170, 255)">len</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token plain">states</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">            a </span><span class="token operator" style="color:rgb(137, 221, 255)">=</span><span class="token plain"> actions</span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">i</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">            </span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">cite_start</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token plain">done </span><span class="token operator" style="color:rgb(137, 221, 255)">=</span><span class="token plain"> terminals</span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">i</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token plain"> </span><span class="token comment" style="color:rgb(105, 112, 152);font-style:italic"># [cite: 503-504]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">            </span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">cite_start</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token plain">r </span><span class="token operator" style="color:rgb(137, 221, 255)">=</span><span class="token plain"> rewards</span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">i</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token plain"> </span><span class="token comment" style="color:rgb(105, 112, 152);font-style:italic"># [cite: 508]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">            </span><span class="token keyword" style="font-style:italic">if</span><span class="token plain"> done</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">                 </span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">cite_start</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token plain">q_values</span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">i</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">a</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token plain"> </span><span class="token operator" style="color:rgb(137, 221, 255)">=</span><span class="token plain"> r </span><span class="token comment" style="color:rgb(105, 112, 152);font-style:italic"># [cite: 511]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">            </span><span class="token keyword" style="font-style:italic">else</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">                 </span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">cite_start</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token plain">q_values</span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">i</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">a</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token plain"> </span><span class="token operator" style="color:rgb(137, 221, 255)">=</span><span class="token plain"> r </span><span class="token operator" style="color:rgb(137, 221, 255)">+</span><span class="token plain"> self</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">gamma </span><span class="token operator" style="color:rgb(137, 221, 255)">*</span><span class="token plain"> np</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token builtin" style="color:rgb(130, 170, 255)">max</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token plain">nq_values</span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">i</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token plain"> </span><span class="token comment" style="color:rgb(105, 112, 152);font-style:italic"># [cite: 513-514]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">        </span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">cite_start</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token keyword" style="font-style:italic">return</span><span class="token plain"> states</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> q_values </span><span class="token comment" style="color:rgb(105, 112, 152);font-style:italic"># [cite: 515]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">    </span><span class="token keyword" style="font-style:italic">def</span><span class="token plain"> </span><span class="token function" style="color:rgb(130, 170, 255)">update_target_network</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token plain">self</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token punctuation" style="color:rgb(199, 146, 234)">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">        </span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">cite_start</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token plain">self</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">target_network</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">set_weights</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token plain">self</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">training_network</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">get_weights</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token plain"> </span><span class="token comment" style="color:rgb(105, 112, 152);font-style:italic"># [cite: 517]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain" style="display:inline-block"></span><br></span></code></pre><div class="buttonGroup__atx"><button type="button" aria-label="Copy code to clipboard" title="Copy" class="clean-btn"><span class="copyButtonIcons_eSgA" aria-hidden="true"><svg viewBox="0 0 24 24" class="copyButtonIcon_y97N"><path fill="currentColor" d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg viewBox="0 0 24 24" class="copyButtonSuccessIcon_LjdS"><path fill="currentColor" d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div></div>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="building-the-dqn-model">Building the DQN model<a href="https://reccobeats.com/blog/welcome-docusaurus-v2#building-the-dqn-model" class="hash-link" aria-label="Direct link to Building the DQN model" title="Direct link to Building the DQN model">​</a></h3>
<p>I use Keras to build the neural network.</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#bfc7d5;--prism-background-color:#292d3e"><div class="codeBlockContent_biex"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#bfc7d5;background-color:#292d3e"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#bfc7d5"><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">cite_start</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token plain">env </span><span class="token operator" style="color:rgb(137, 221, 255)">=</span><span class="token plain"> Tictactoe_v0</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token plain"> </span><span class="token comment" style="color:rgb(105, 112, 152);font-style:italic"># [cite: 519]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain"></span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">cite_start</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token plain">agent </span><span class="token operator" style="color:rgb(137, 221, 255)">=</span><span class="token plain"> DQN</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token number" style="color:rgb(247, 140, 108)">0.7</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> </span><span class="token number" style="color:rgb(247, 140, 108)">1</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> </span><span class="token number" style="color:rgb(247, 140, 108)">4096</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> </span><span class="token number" style="color:rgb(247, 140, 108)">1048576</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token plain"> </span><span class="token comment" style="color:rgb(105, 112, 152);font-style:italic"># [cite: 520]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain"></span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">cite_start</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token plain">op1 </span><span class="token operator" style="color:rgb(137, 221, 255)">=</span><span class="token plain"> optimizers</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">RMSprop</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token plain">learning_rate</span><span class="token operator" style="color:rgb(137, 221, 255)">=</span><span class="token number" style="color:rgb(247, 140, 108)">0.00025</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token plain"> </span><span class="token comment" style="color:rgb(105, 112, 152);font-style:italic"># [cite: 521]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain"></span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">cite_start</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token plain">agent</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">training_network</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">add</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token plain">Dense</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token number" style="color:rgb(247, 140, 108)">128</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> activation</span><span class="token operator" style="color:rgb(137, 221, 255)">=</span><span class="token string" style="color:rgb(195, 232, 141)">'relu'</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> input_shape</span><span class="token operator" style="color:rgb(137, 221, 255)">=</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token number" style="color:rgb(247, 140, 108)">9</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token plain"> </span><span class="token comment" style="color:rgb(105, 112, 152);font-style:italic"># [cite: 522]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain"></span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">cite_start</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token plain">agent</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">training_network</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">add</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token plain">Dense</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token number" style="color:rgb(247, 140, 108)">128</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> activation</span><span class="token operator" style="color:rgb(137, 221, 255)">=</span><span class="token string" style="color:rgb(195, 232, 141)">'relu'</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token plain"> </span><span class="token comment" style="color:rgb(105, 112, 152);font-style:italic"># [cite: 523]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain"></span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">cite_start</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token plain">agent</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">training_network</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">add</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token plain">Dense</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token number" style="color:rgb(247, 140, 108)">9</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> activation</span><span class="token operator" style="color:rgb(137, 221, 255)">=</span><span class="token string" style="color:rgb(195, 232, 141)">'linear'</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token plain"> </span><span class="token comment" style="color:rgb(105, 112, 152);font-style:italic"># [cite: 524]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain"></span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">cite_start</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token plain">agent</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">training_network</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token builtin" style="color:rgb(130, 170, 255)">compile</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token plain">optimizer</span><span class="token operator" style="color:rgb(137, 221, 255)">=</span><span class="token plain">op1</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> loss</span><span class="token operator" style="color:rgb(137, 221, 255)">=</span><span class="token plain">losses</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">mean_squared_error</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token plain"> </span><span class="token comment" style="color:rgb(105, 112, 152);font-style:italic"># [cite: 525]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain"></span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">cite_start</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token plain">op2 </span><span class="token operator" style="color:rgb(137, 221, 255)">=</span><span class="token plain"> optimizers</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">RMSprop</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token plain">learning_rate</span><span class="token operator" style="color:rgb(137, 221, 255)">=</span><span class="token number" style="color:rgb(247, 140, 108)">0.00025</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token plain"> </span><span class="token comment" style="color:rgb(105, 112, 152);font-style:italic"># [cite: 526]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain"></span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">cite_start</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token plain">agent</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">target_network</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">add</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token plain">Dense</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token number" style="color:rgb(247, 140, 108)">128</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> activation</span><span class="token operator" style="color:rgb(137, 221, 255)">=</span><span class="token string" style="color:rgb(195, 232, 141)">'relu'</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> input_shape</span><span class="token operator" style="color:rgb(137, 221, 255)">=</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token number" style="color:rgb(247, 140, 108)">9</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token plain"> </span><span class="token comment" style="color:rgb(105, 112, 152);font-style:italic"># [cite: 527]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain"></span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">cite_start</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token plain">agent</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">target_network</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">add</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token plain">Dense</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token number" style="color:rgb(247, 140, 108)">128</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> activation</span><span class="token operator" style="color:rgb(137, 221, 255)">=</span><span class="token string" style="color:rgb(195, 232, 141)">'relu'</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token plain"> </span><span class="token comment" style="color:rgb(105, 112, 152);font-style:italic"># [cite: 528]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain"></span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">cite_start</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token plain">agent</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">target_network</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">add</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token plain">Dense</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token number" style="color:rgb(247, 140, 108)">9</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> activation</span><span class="token operator" style="color:rgb(137, 221, 255)">=</span><span class="token string" style="color:rgb(195, 232, 141)">'linear'</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token plain"> </span><span class="token comment" style="color:rgb(105, 112, 152);font-style:italic"># [cite: 529]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain"></span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">cite_start</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token plain">agent</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">target_network</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token builtin" style="color:rgb(130, 170, 255)">compile</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token plain">optimizer</span><span class="token operator" style="color:rgb(137, 221, 255)">=</span><span class="token plain">op2</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> loss</span><span class="token operator" style="color:rgb(137, 221, 255)">=</span><span class="token plain">losses</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">mean_squared_error</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token plain"> </span><span class="token comment" style="color:rgb(105, 112, 152);font-style:italic"># [cite: 530]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">agent</span><span class="token punctuation" style="color:rgb(199, 146, 234)">.</span><span class="token plain">update_target_network</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain" style="display:inline-block"></span><br></span></code></pre><div class="buttonGroup__atx"><button type="button" aria-label="Copy code to clipboard" title="Copy" class="clean-btn"><span class="copyButtonIcons_eSgA" aria-hidden="true"><svg viewBox="0 0 24 24" class="copyButtonIcon_y97N"><path fill="currentColor" d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg viewBox="0 0 24 24" class="copyButtonSuccessIcon_LjdS"><path fill="currentColor" d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div></div>
<p>Here, I only trained the AI to play first.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="evaluation">Evaluation<a href="https://reccobeats.com/blog/welcome-docusaurus-v2#evaluation" class="hash-link" aria-label="Direct link to Evaluation" title="Direct link to Evaluation">​</a></h3>
<p>After the training process, the model is evaluated. I had the agent play against a random player and a minimax player. Each round consisted of 1000 matches, and the first move was chosen randomly. Here are the results.</p>
<p><strong>VS MINIMAX PLAYER</strong></p>
<table><thead><tr><th>Training times</th><th>Wins</th><th>Draws</th><th>Loses</th></tr></thead><tbody><tr><td>50000</td><td>0</td><td>1000</td><td>0</td></tr></tbody></table>
<p><strong>VS RANDOM PLAYER</strong></p>
<table><thead><tr><th>Training times</th><th>Wins</th><th>Draws</th><th>Loses</th></tr></thead><tbody><tr><td>50000</td><td>809</td><td>190</td><td>1</td></tr></tbody></table>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="conclusion">Conclusion<a href="https://reccobeats.com/blog/welcome-docusaurus-v2#conclusion" class="hash-link" aria-label="Direct link to Conclusion" title="Direct link to Conclusion">​</a></h3>
<p>When playing against the Minimax player, the AI plays perfectly. However, when the opponent is a random player, the AI makes mistakes. I trained for another 100,000 episodes, but the results were not more optimistic, so I stopped.</p>
<p>You can view my entire code here.</p>
<p><strong>References</strong></p>
<ul>
<li>
<p>introduction-deep-q-learning-python</p>
</li>
<li>
<p>Catastrophic Interference in Connectionist Networks: The Sequential Learning Problem</p>
</li>
<li>
<p>Reinforcement Learning: An Introduction</p>
</li>
<li>
<p>Human Level Control Through Deep Reinforcement Learning</p>
</li>
</ul>]]></content>
        <author>
            <name>Joel Marcey</name>
            <uri>https://github.com/JoelMarcey</uri>
        </author>
        <category label="hello" term="hello"/>
        <category label="docusaurus-v2" term="docusaurus-v2"/>
    </entry>
</feed>