Predictive neural networks for reinforcement learning
Ingredients:
- Visual network: a CNN encoder (CNN1) or similar that takes the visual input (raw pixels) and produces an embedding e_t (a vector of, say, 512 values)
- Predictive network: from the embedding e_t and the chosen action a_t, predicts the next embedding e_t+1
- Policy network: from the embedding e_t and the model prediction e_t+1, produces the action a_t to be performed by the agent (the three networks are sketched right after this list)
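A minimal PyTorch sketch of the three ingredients. The conv layout, the 84x84 RGB input size, the 4-action discrete space, and all class names are assumptions for illustration, not part of the notes; the policy here consumes only e_t, matching Model 1's Step 1 below:

```python
import torch
import torch.nn as nn

EMB = 512        # embedding size, as suggested above
N_ACTIONS = 4    # assumed discrete action space

class CNN1(nn.Module):
    """Visual network: raw pixels -> embedding e_t (assumes 84x84 RGB frames)."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
        )
        self.fc = nn.Linear(64 * 7 * 7, EMB)   # 7x7 is the conv output for 84x84 input

    def forward(self, frames):                 # frames: (B, 3, 84, 84)
        return self.fc(self.conv(frames))

class PredNet(nn.Module):
    """Predictive network: (e_t, one-hot a_t) -> predicted e^_t+1."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(EMB + N_ACTIONS, 256), nn.ReLU(),
            nn.Linear(256, EMB),
        )

    def forward(self, e_t, a_onehot):
        return self.net(torch.cat([e_t, a_onehot], dim=-1))

class PolicyNet(nn.Module):
    """Policy network: e_t -> action logits (as in Model 1, Step 1)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(EMB, 256), nn.ReLU(),
            nn.Linear(256, N_ACTIONS),
        )

    def forward(self, e_t):
        return self.net(e_t)
```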
Training:
- use imitation learning to train the CNN1 encoder (together with a policy head)
- train the policy network (controller) to achieve the target reward
- train the predictive network to predict the next embedding (training sketches follow below)
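For the imitation-learning step, a behavioral-cloning sketch under the same assumptions as above; the batch of expert frames and action labels is assumed to come from some demonstration dataset:

```python
import torch.nn.functional as F

cnn1, policy = CNN1(), PolicyNet()
bc_opt = torch.optim.Adam(
    list(cnn1.parameters()) + list(policy.parameters()), lr=1e-4
)

def imitation_step(frames, expert_actions):
    """One behavioral-cloning update: match the expert's action labels.

    frames: (B, 3, 84, 84) float tensor; expert_actions: (B,) long tensor.
    """
    logits = policy(cnn1(frames))
    loss = F.cross_entropy(logits, expert_actions)   # imitate the expert
    bc_opt.zero_grad()
    loss.backward()
    bc_opt.step()
    return loss.item()
```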
Models
Model 1 (ours)
Step 1: frame f_t → CNN1 → embedding e_t → policy → action a_t
Step 2: e_t, a_t → pred_net → e^_t+1
Step 3: play one step: a_t → game → f_t+1 → CNN1 → e_t+1
Step 4: minimize ||e_t+1 - e^_t+1||
Note: our model, with a single shared CNN and no Inverse model (one update is sketched below)
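Putting Model 1's four steps together, a hedged sketch of one predictive update on a transition already collected from the game. Freezing CNN1 under this loss is an assumption, consistent with the Training list (CNN1 is trained by imitation only):

```python
pred_net = PredNet()
pred_opt = torch.optim.Adam(pred_net.parameters(), lr=1e-4)

def model1_update(frame_t, frame_next, a_t):
    """One predictive update for Model 1 on an observed transition.

    frame_t, frame_next: (B, 3, 84, 84) tensors; a_t: (B,) long actions
    actually played in the game (Step 3 happens outside this function).
    """
    with torch.no_grad():                        # CNN1 is trained by imitation only
        e_t = cnn1(frame_t)                      # Step 1 encoding
        e_next = cnn1(frame_next)                # Step 3: target embedding e_t+1
    a_onehot = F.one_hot(a_t, N_ACTIONS).float()
    e_hat = pred_net(e_t, a_onehot)              # Step 2: e^_t+1
    loss = F.mse_loss(e_hat, e_next)             # Step 4: ||e_t+1 - e^_t+1||^2
    pred_opt.zero_grad()
    loss.backward()
    pred_opt.step()
    return loss.item()
```

Freezing CNN1 keeps the target e_t+1 stable; whether the predictive loss should also shape the encoder is left open by the notes.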
Model 2 (from here)
Step 1: Policy: S_t → CNN1 + π classifier → a_t
Step 2: S_t, a_t → Forward model → r^_t+1; and: S_t, S_t+1 → CNN2 → Inverse model → a^_t
Step 3: minimize ||r^_t+1 - r_t+1|| and ||a_t - a^_t|| for the Forward and Inverse models respectively
Note: two CNN models that have to learn the same representation! Not efficient (see the sketch below)
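A sketch of Model 2's two auxiliary losses with a second encoder, under the same assumptions as above. Reading r_t+1 as a scalar regression target for the Forward model, and scoring a_t vs. a^_t with cross-entropy, are both interpretations of the notes:

```python
cnn2 = CNN1()   # second encoder, same architecture: Model 2's duplicated cost

class ForwardModel(nn.Module):
    """(features of S_t, one-hot a_t) -> r^_t+1 (a scalar, per the notes)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(EMB + N_ACTIONS, 256), nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, phi_t, a_onehot):
        return self.net(torch.cat([phi_t, a_onehot], dim=-1)).squeeze(-1)

class InverseModel(nn.Module):
    """(features of S_t, features of S_t+1) -> logits for a^_t."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * EMB, 256), nn.ReLU(),
            nn.Linear(256, N_ACTIONS),
        )

    def forward(self, phi_t, phi_next):
        return self.net(torch.cat([phi_t, phi_next], dim=-1))

fwd, inv = ForwardModel(), InverseModel()

def model2_losses(s_t, s_next, a_t, r_next):
    """Step 3 of Model 2; r_next: (B,) float, a_t: (B,) long."""
    phi_t, phi_next = cnn2(s_t), cnn2(s_next)    # CNN2: the duplicated encoder
    a_onehot = F.one_hot(a_t, N_ACTIONS).float()
    fwd_loss = F.mse_loss(fwd(phi_t, a_onehot), r_next)    # ||r^_t+1 - r_t+1||
    inv_loss = F.cross_entropy(inv(phi_t, phi_next), a_t)  # a_t vs a^_t
    return fwd_loss + inv_loss
```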
Model 3 (modified from Model 2)
Step 1: Policy: S_t → CNN1 + π classifier → a_t
Step 2: S_t, a_t → Forward model → r^_t+1; and: S_t, S_t+1 → CNN1 → Inverse model → a^_t
Step 3: minimize ||r^_t+1 - r_t+1|| and ||a_t - a^_t|| for the Forward and Inverse models respectively
Note: just one CNN model (only the encoder changes with respect to Model 2; see the sketch below)
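Model 3 then reduces to swapping the encoder; a sketch of the same losses sharing CNN1, under the same assumptions as the Model 2 sketch:

```python
def model3_losses(s_t, s_next, a_t, r_next):
    """Same losses as Model 2, but reusing the policy's encoder CNN1."""
    phi_t, phi_next = cnn1(s_t), cnn1(s_next)    # the only change vs. Model 2
    a_onehot = F.one_hot(a_t, N_ACTIONS).float()
    fwd_loss = F.mse_loss(fwd(phi_t, a_onehot), r_next)
    inv_loss = F.cross_entropy(inv(phi_t, phi_next), a_t)
    return fwd_loss + inv_loss
```

Sharing CNN1 means the Forward/Inverse gradients also shape the representation the policy uses, which is the efficiency gain the note points to.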