Introduction to gym

博士伦2014
2018.11.29 20:24

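1. Components

OpenAI Gym consists of two parts:

  1. The gym open-source library: a collection of test problems. When you test a reinforcement learning algorithm, the test problem is the environment. For example, when an agent plays a game, the environment is the game itself, observed through its screen frames. These environments share a common interface, which lets users write general-purpose algorithms.
  2. The OpenAI Gym service: a site (for example, for the game CartPole-v0: https://gym.openai.com/envs/CartPole-v0) and an API that let users compare their test results.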

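2. Interface

The core interface of gym is Env, which serves as the unified environment interface. Env contains the following core functions:

  • reset(self): resets the state of the environment and returns an observation.
  • step(self, action): the physics engine. Advances the environment by one time step and returns observation, reward, done, info.
  • render(self, mode='human', close=False): the graphics engine. Redraws one frame of the environment. The default mode is usually human-friendly, for example popping up a window.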
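
The calls listed above combine into the usual interaction loop. The following is a minimal sketch that runs without gym installed: `CoinFlipEnv` is a hypothetical stand-in class (not part of gym) that only follows the reset/step convention described above, and `run_episode` is an illustrative helper. With a real environment, `env` would come from `gym.make(...)` instead.

```python
import random


class CoinFlipEnv:
    """Hypothetical stand-in environment (not part of gym).

    It mimics the interface described above: reset() returns an observation,
    step(action) returns (observation, reward, done, info).
    """

    def __init__(self, max_steps=10, seed=0):
        self.max_steps = max_steps
        self.rng = random.Random(seed)
        self.steps = 0
        self.coin = 0

    def reset(self):
        # Start a new episode and return the first observation.
        self.steps = 0
        self.coin = self.rng.randint(0, 1)
        return self.coin

    def step(self, action):
        # Reward 1.0 if the action matches the coin that was shown, else 0.0.
        reward = 1.0 if action == self.coin else 0.0
        self.steps += 1
        self.coin = self.rng.randint(0, 1)  # next observation
        done = self.steps >= self.max_steps
        return self.coin, reward, done, {"steps": self.steps}


def run_episode(env, policy):
    """Run one episode with the given policy and return the total reward."""
    observation = env.reset()
    total_reward, done = 0.0, False
    while not done:
        action = policy(observation)
        observation, reward, done, info = env.step(action)
        total_reward += reward
    return total_reward


# A policy that copies the observed coin is rewarded on every step.
total = run_episode(CoinFlipEnv(max_steps=10), policy=lambda obs: obs)
print(total)
```

Because the loop only relies on reset and step, the same `run_episode` helper would work with any environment that follows this interface, which is the point of having a common Env.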

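3. Registering your own environment (simulator)

  1. The goal is to register your own environment in the registry. Suppose you have defined your environment in the following structure: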
myenv/
    __init__.py
    myenv.py
  2. myenv.py contains the class for our own environment. In __init__.py, enter the following code:
from gym.envs.registration import register
register(
    id='MyEnv-v0',
    entry_point='myenv.myenv:MyEnv', # the first myenv is the folder name, the second myenv is the file name, and MyEnv is the class name inside that file
)
  3. To use our own environment:
import gym
import myenv # be sure to import your own environment package; this step is easy to overlook
env = gym.make('MyEnv-v0')
  4. Put the myenv directory on PYTHONPATH, or start python from its parent directory.
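
The steps above explain why `import myenv` matters: the `register(...)` call only runs when the package's `__init__.py` is imported, and `gym.make` can only find ids that have been registered. The following is a simplified sketch of that idea, not gym's actual implementation. The `_registry` dict and the toy `register` and `make` functions are assumptions for illustration, and a standard-library class stands in for an environment class so the example runs without gym.

```python
import importlib

# Toy registry: maps an environment id to a "module.path:ClassName" string.
_registry = {}


def register(id, entry_point):
    """Record where the class for `id` lives; nothing is imported yet.

    The parameter is named `id` only to mirror the keyword used above.
    """
    if id in _registry:
        raise ValueError("Cannot re-register id: {}".format(id))
    _registry[id] = entry_point


def make(id, **kwargs):
    """Import the module named in the entry point and instantiate the class."""
    if id not in _registry:
        raise KeyError("No registered env with id: {}".format(id))
    module_name, class_name = _registry[id].split(":")
    module = importlib.import_module(module_name)  # module must be importable
    cls = getattr(module, class_name)
    return cls(**kwargs)


# Demonstration: a standard-library class stands in for an environment class.
register(id="Fraction-v0", entry_point="fractions:Fraction")
obj = make("Fraction-v0", numerator=1, denominator=2)
print(type(obj).__name__)
```

In this sketch, calling `make` with an id that was never registered raises `KeyError`, which is the toy analogue of forgetting `import myenv`. The module part of the entry point must also be importable, which is why the package has to be on PYTHONPATH or in the current directory.
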
Complete example. Directory structure:
myenv/
    __init__.py
    my_hotter_colder.py
-------------------
__init__.py file:
-------------------
from gym.envs.registration import register
register(
    id='MyHotterColder-v0',
    entry_point='myenv.my_hotter_colder:MyHotterColder',
)
-------------------
my_hotter_colder.py file:
-------------------
import gym
from gym import spaces
from gym.utils import seeding
import numpy as np

class MyHotterColder(gym.Env):
    """Hotter Colder
    The goal of hotter colder is to guess closer to a randomly selected number

    After each step the agent receives an observation of:
    0 - No guess yet submitted (only after reset)
    1 - Guess is lower than the target
    2 - Guess is equal to the target
    3 - Guess is higher than the target

    The reward is calculated as:
    ((min(action, self.number) + self.bounds) / (max(action, self.number) + self.bounds)) ** 2

    Ideally an agent will be able to recognise the 'scent' of a higher reward and
    increase the rate at which it guesses in that direction until the reward reaches
    its maximum
    """
    def __init__(self):
        self.range = 1000  # The randomly selected number lies within +/- this value
        self.bounds = 2000  # Action space bounds

        self.action_space = spaces.Box(low=np.array([-self.bounds]), high=np.array([self.bounds]))
        self.observation_space = spaces.Discrete(4)

        self.number = 0
        self.guess_count = 0
        self.guess_max = 200
        self.observation = 0

        self.seed()
        self.reset()

    def seed(self, seed=None):
        self.np_random, seed = seeding.np_random(seed)
        return [seed]

    def step(self, action):
        assert self.action_space.contains(action)

        if action < self.number:
            self.observation = 1

        elif action == self.number:
            self.observation = 2

        elif action > self.number:
            self.observation = 3

        reward = ((min(action, self.number) + self.bounds) / (max(action, self.number) + self.bounds)) ** 2

        self.guess_count += 1
        done = self.guess_count >= self.guess_max

        return self.observation, reward[0], done, {"number": self.number, "guesses": self.guess_count}

    def reset(self):
        self.number = self.np_random.uniform(-self.range, self.range)
        self.guess_count = 0
        self.observation = 0
        return self.observation
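
To make the observation codes and the reward formula in the class above concrete, here is a small sketch that re-implements them as plain functions so it runs without gym. `feedback`, `reward`, and `bisect_guess` are hypothetical helper names, not part of the environment. The bisection agent is just one possible strategy, and scalars are used instead of the one-element arrays that the action space produces.

```python
BOUNDS = 2000  # same value as self.bounds in the class above


def feedback(guess, number):
    """Observation code, following step(): 1 lower, 2 equal, 3 higher."""
    if guess < number:
        return 1
    if guess == number:
        return 2
    return 3


def reward(guess, number, bounds=BOUNDS):
    """Reward as computed in step(): squared ratio of the shifted values."""
    return ((min(guess, number) + bounds) / (max(guess, number) + bounds)) ** 2


def bisect_guess(number, low=-BOUNDS, high=BOUNDS, max_guesses=200):
    """Hypothetical agent: halve the search interval using the observation."""
    guess = (low + high) / 2.0
    for _ in range(max_guesses):
        guess = (low + high) / 2.0
        obs = feedback(guess, number)
        if obs == 1:      # guess is too low: keep the upper half
            low = guess
        elif obs == 3:    # guess is too high: keep the lower half
            high = guess
        else:             # exact hit
            break
    return guess


print(reward(0.0, 500.0))  # (2000 / 2500) ** 2
final = bisect_guess(123.456)
print(abs(final - 123.456) < 1e-6, reward(final, 123.456))
```

The reward approaches 1 as the guess approaches the target, so an agent that only sees the reward could follow that gradient, while an agent that uses the observation codes can narrow the interval directly, as the bisection sketch does.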

References:

  1. https://github.com/openai/gym/issues/626
  2. https://github.com/openai/gym/tree/master/gym/envs#how-to-create-new-environments-for-gym
  3. https://github.com/openai/gym/blob/522c2c532293399920743265d9bc761ed18eadb3/gym/envs/__init__.py