Experience Replay
This module contains our experience-replay buffer classes:

- SimpleReplayBuffer: A simple ring buffer for experience replay.
- PrioritizedReplayBuffer: A simple ring buffer for experience replay, with prioritized sampling.

For specific examples of agents that use a replay buffer, have a look at the agents for Atari games.
Object Reference
- class coax.experience_replay.SimpleReplayBuffer(capacity, random_seed=None)
A simple ring buffer for experience replay.
- Parameters:
capacity (positive int) – The capacity of the experience replay buffer.
random_seed (int, optional) – To get reproducible results.
- add(transition_batch)
Add a transition to the experience replay buffer.
- Parameters:
transition_batch (TransitionBatch) – A TransitionBatch object.
- sample(batch_size=32)
Get a batch of transitions to be used for bootstrapped updates.
- Parameters:
batch_size (positive int, optional) – The desired batch size of the sample.
- Returns:
transitions (TransitionBatch) – A TransitionBatch object.
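The ring-buffer semantics described above can be sketched in a few lines of plain Python. This is an illustrative stand-in, not coax's actual implementation: it stores arbitrary objects rather than TransitionBatch objects, and it assumes uniform sampling with replacement.

```python
import random


class RingReplayBuffer:
    """Illustrative fixed-capacity ring buffer; not coax's implementation."""

    def __init__(self, capacity, random_seed=None):
        self.capacity = capacity
        self._storage = []
        self._index = 0  # position where the next transition is written
        self._rnd = random.Random(random_seed)

    def add(self, transition):
        if len(self._storage) < self.capacity:
            self._storage.append(transition)
        else:
            # buffer full: overwrite the oldest entry (ring-buffer behaviour)
            self._storage[self._index] = transition
        self._index = (self._index + 1) % self.capacity

    def sample(self, batch_size=32):
        # uniform sampling with replacement (an assumption of this sketch)
        return [self._rnd.choice(self._storage) for _ in range(batch_size)]

    def __len__(self):
        return len(self._storage)
```

Once the buffer is full, each add overwrites the oldest stored transition, so the buffer always holds the most recent capacity transitions.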
- class coax.experience_replay.PrioritizedReplayBuffer(capacity, alpha=1.0, beta=1.0, epsilon=0.0001, random_seed=None)
A simple ring buffer for experience replay, with prioritized sampling.
This class uses proportional sampling, which means that the transitions are sampled with relative probability \(p_i\) defined as:
\[p_i\ =\ \frac {\left(|\mathcal{A}_i| + \epsilon\right)^\alpha} {\sum_{j=1}^N \left(|\mathcal{A}_j| + \epsilon\right)^\alpha}\]

Here \(\mathcal{A}_i\) are advantages provided at insertion time and \(N\) is the capacity of the buffer, which may be quite large. The \(\mathcal{A}_i\) are typically just TD errors collected from a value-function updater, e.g. QLearning.td_error.

Since the prioritized samples are biased, the sample method also produces non-trivial importance weights (stored in the TransitionBatch.W attribute). The logic for constructing these weights for a sample of batch size \(n\) is:

\[w_i\ =\ \frac{\left(Np_i\right)^{-\beta}}{\max_{j=1}^n \left(Np_j\right)^{-\beta}}\]

See section 3.4 of https://arxiv.org/abs/1511.05952 for more details.
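The two formulas can be checked numerically with a short NumPy sketch. The advantage values below are made up, the names alpha, beta and epsilon mirror the constructor parameters, and this is a hand computation rather than the buffer's internal code:

```python
import numpy as np

# Hypothetical advantages collected at insertion time (stand-in values).
Adv = np.array([0.5, -2.0, 0.1, 1.5])
alpha, beta, epsilon = 1.0, 1.0, 1e-4

# Proportional priorities: p_i proportional to (|A_i| + eps)^alpha.
scores = (np.abs(Adv) + epsilon) ** alpha
p = scores / scores.sum()

# Importance weights, normalized by the max so that w_i <= 1.
# For this tiny example the "sample" is the whole buffer, so N = n.
N = len(Adv)
w = (N * p) ** -beta
w = w / w.max()

assert np.isclose(p.sum(), 1.0)
# The largest |advantage| gets the highest priority...
assert p.argmax() == 1
# ...and therefore the smallest importance weight.
assert w.argmin() == 1
```

Note how the importance weights counteract the sampling bias: the transitions that are over-sampled (large \(p_i\)) receive the smallest weights.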
- Parameters:
capacity (positive int) – The capacity of the experience replay buffer.
alpha (positive float, optional) – The sampling temperature \(\alpha>0\).
beta (positive float, optional) – The importance-weight exponent \(\beta>0\).
epsilon (positive float, optional) – The small regulator \(\epsilon>0\).
random_seed (int, optional) – To get reproducible results.
- add(transition_batch, Adv)
Add a transition to the experience replay buffer.
- Parameters:
transition_batch (TransitionBatch) – A TransitionBatch object.
Adv (ndarray) – A batch of advantages, used to construct the priorities \(p_i\).
- sample(batch_size=32)
Get a batch of transitions to be used for bootstrapped updates.
- Parameters:
batch_size (positive int, optional) – The desired batch size of the sample.
- Returns:
transitions (TransitionBatch) – A TransitionBatch object.
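Proportional sampling itself can be mimicked with numpy.random.choice, again as a sketch under the definitions above rather than coax internals; the transitions and advantages are made-up stand-ins:

```python
import numpy as np

rng = np.random.default_rng(13)

# Hypothetical stored transitions and their advantages (stand-in values).
transitions = ["t0", "t1", "t2", "t3"]
Adv = np.array([0.5, -2.0, 0.1, 1.5])
alpha, epsilon = 1.0, 1e-4

# Proportional priorities, as in the formula above.
scores = (np.abs(Adv) + epsilon) ** alpha
p = scores / scores.sum()

# Draw indices with the proportional probabilities (with replacement).
idx = rng.choice(len(transitions), size=1000, replace=True, p=p)
batch = [transitions[i] for i in idx]

# High-|advantage| transitions dominate the sample.
counts = np.bincount(idx, minlength=len(transitions))
```

With these values, the transition with the largest \(|\mathcal{A}_i|\) (index 1) is drawn most often and the one with the smallest (index 2) least often, which is exactly the bias the importance weights are there to correct.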