Gradient estimates of return distributions
Type of publication: | Conference paper |
Citation: | dimitrakakis:pascal:2005 |
Booktitle: | PASCAL Workshop on Principled Methods of Trading Exploration and Exploitation |
Year: | 2005 |
Note: | IDIAP-RR 05-29 |
Crossref: | dimitrakakis:rr05-29: |
Abstract: | We present a general method for maintaining estimates of the distribution of parameters in arbitrary models. This is then applied to the estimation of probability distributions over actions in value-based reinforcement learning. While this approach is similar to other techniques that maintain a confidence measure for action-values, it nevertheless offers an insight into current techniques and hints at potential avenues of further research. |
Userfields: | ipdmembership={learning}, |
Projects: | Idiap |
Authors | |