Predictive neural networks for reinforcement learning
Ingredients:
- Visual network: a CNN encoder (CNN1) or similar that takes the visual input (raw pixels) and produces an embedding e_t (a vector of, say, 512 values)
- Predictive network: from the embedding e_t and the chosen action a_t, predicts the next embedding e_t+1
- Policy network: from the embedding e_t and the model prediction e_t+1, produces the action a_t to be performed by the agent (the three networks are sketched right after this list)
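A minimal PyTorch sketch of the three ingredients. The conv layout, the 84x84 RGB input size, the 4-action discrete space, and all class names are assumptions for illustration, not part of the notes; the policy here consumes only e_t, matching Model 1's Step 1 below:

```python
import torch
import torch.nn as nn

EMB = 512        # embedding size, as suggested above
N_ACTIONS = 4    # assumed discrete action space

class CNN1(nn.Module):
    """Visual network: raw pixels -> embedding e_t (assumes 84x84 RGB frames)."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
        )
        self.fc = nn.Linear(64 * 7 * 7, EMB)   # 7x7 is the conv output for 84x84 input

    def forward(self, frames):                 # frames: (B, 3, 84, 84)
        return self.fc(self.conv(frames))

class PredNet(nn.Module):
    """Predictive network: (e_t, one-hot a_t) -> predicted e^_t+1."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(EMB + N_ACTIONS, 256), nn.ReLU(),
            nn.Linear(256, EMB),
        )

    def forward(self, e_t, a_onehot):
        return self.net(torch.cat([e_t, a_onehot], dim=-1))

class PolicyNet(nn.Module):
    """Policy network: e_t -> action logits (as in Model 1, Step 1)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(EMB, 256), nn.ReLU(),
            nn.Linear(256, N_ACTIONS),
        )

    def forward(self, e_t):
        return self.net(e_t)
```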
Training:
- use imitation learning to train the CNN1 encoder (together with a policy head)
- train the policy network (controller) to achieve the target reward
- train the predictive network to predict the next embedding (training sketches follow below)
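For the imitation-learning step, a behavioral-cloning sketch under the same assumptions as above; the batch of expert frames and action labels is assumed to come from some demonstration dataset:

```python
import torch.nn.functional as F

cnn1, policy = CNN1(), PolicyNet()
bc_opt = torch.optim.Adam(
    list(cnn1.parameters()) + list(policy.parameters()), lr=1e-4
)

def imitation_step(frames, expert_actions):
    """One behavioral-cloning update: match the expert's action labels.

    frames: (B, 3, 84, 84) float tensor; expert_actions: (B,) long tensor.
    """
    logits = policy(cnn1(frames))
    loss = F.cross_entropy(logits, expert_actions)   # imitate the expert
    bc_opt.zero_grad()
    loss.backward()
    bc_opt.step()
    return loss.item()
```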
Models
Model 1 (ours)
Step 1: frame f_t → CNN1 → embedding e_t → policy → action a_t
Step 2: e_t, a_t → pred_net → e^_t+1
Step 3: play one step: a_t → game → f_t+1 → CNN1 → e_t+1
Step 4: minimize ||e_t+1 - e^_t+1||
Note: our model, with a single shared CNN and no Inverse model (one update is sketched below)
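Putting Model 1's four steps together, a hedged sketch of one predictive update on a transition already collected from the game. Freezing CNN1 under this loss is an assumption, consistent with the Training list (CNN1 is trained by imitation only):

```python
pred_net = PredNet()
pred_opt = torch.optim.Adam(pred_net.parameters(), lr=1e-4)

def model1_update(frame_t, frame_next, a_t):
    """One predictive update for Model 1 on an observed transition.

    frame_t, frame_next: (B, 3, 84, 84) tensors; a_t: (B,) long actions
    actually played in the game (Step 3 happens outside this function).
    """
    with torch.no_grad():                        # CNN1 is trained by imitation only
        e_t = cnn1(frame_t)                      # Step 1 encoding
        e_next = cnn1(frame_next)                # Step 3: target embedding e_t+1
    a_onehot = F.one_hot(a_t, N_ACTIONS).float()
    e_hat = pred_net(e_t, a_onehot)              # Step 2: e^_t+1
    loss = F.mse_loss(e_hat, e_next)             # Step 4: ||e_t+1 - e^_t+1||^2
    pred_opt.zero_grad()
    loss.backward()
    pred_opt.step()
    return loss.item()
```

Freezing CNN1 keeps the target e_t+1 stable; whether the predictive loss should also shape the encoder is left open by the notes.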
Model 2 (from here)
Step 1: Policy: S_t → CNN1 + π classifier → a_t
Step 2: S_t, a_t → Forward model → r^_t+1; and: S_t, S_t+1 → CNN2 → Inverse model → a^_t
Step 3: minimize ||r^_t+1 - r_t+1|| and ||a_t - a^_t|| for the Forward and Inverse models respectively
Note: two CNN models that have to learn the same representation! Not efficient (see the sketch below)
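A sketch of Model 2's two auxiliary losses with a second encoder, under the same assumptions as above. Reading r_t+1 as a scalar regression target for the Forward model, and scoring a_t vs. a^_t with cross-entropy, are both interpretations of the notes:

```python
cnn2 = CNN1()   # second encoder, same architecture: Model 2's duplicated cost

class ForwardModel(nn.Module):
    """(features of S_t, one-hot a_t) -> r^_t+1 (a scalar, per the notes)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(EMB + N_ACTIONS, 256), nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, phi_t, a_onehot):
        return self.net(torch.cat([phi_t, a_onehot], dim=-1)).squeeze(-1)

class InverseModel(nn.Module):
    """(features of S_t, features of S_t+1) -> logits for a^_t."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * EMB, 256), nn.ReLU(),
            nn.Linear(256, N_ACTIONS),
        )

    def forward(self, phi_t, phi_next):
        return self.net(torch.cat([phi_t, phi_next], dim=-1))

fwd, inv = ForwardModel(), InverseModel()

def model2_losses(s_t, s_next, a_t, r_next):
    """Step 3 of Model 2; r_next: (B,) float, a_t: (B,) long."""
    phi_t, phi_next = cnn2(s_t), cnn2(s_next)    # CNN2: the duplicated encoder
    a_onehot = F.one_hot(a_t, N_ACTIONS).float()
    fwd_loss = F.mse_loss(fwd(phi_t, a_onehot), r_next)    # ||r^_t+1 - r_t+1||
    inv_loss = F.cross_entropy(inv(phi_t, phi_next), a_t)  # a_t vs a^_t
    return fwd_loss + inv_loss
```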
Model 3 (modified from Model 2)
Step 1: Policy: S_t → CNN1 + π classifier → a_t
Step 2: S_t, a_t → Forward model → r^_t+1; and: S_t, S_t+1 → CNN1 → Inverse model → a^_t
Step 3: minimize ||r^_t+1 - r_t+1|| and ||a_t - a^_t|| for the Forward and Inverse models respectively
Note: just one CNN model (only the encoder changes with respect to Model 2; see the sketch below)
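Model 3 then reduces to swapping the encoder; a sketch of the same losses sharing CNN1, under the same assumptions as the Model 2 sketch:

```python
def model3_losses(s_t, s_next, a_t, r_next):
    """Same losses as Model 2, but reusing the policy's encoder CNN1."""
    phi_t, phi_next = cnn1(s_t), cnn1(s_next)    # the only change vs. Model 2
    a_onehot = F.one_hot(a_t, N_ACTIONS).float()
    fwd_loss = F.mse_loss(fwd(phi_t, a_onehot), r_next)
    inv_loss = F.cross_entropy(inv(phi_t, phi_next), a_t)
    return fwd_loss + inv_loss
```

Sharing CNN1 means the Forward/Inverse gradients also shape the representation the policy uses, which is the efficiency gain the note points to.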