Gradient estimates of return distributions
| Type of publication: | Conference paper |
| Citation: | dimitrakakis:pascal:2005 |
| Booktitle: | PASCAL Workshop on Principled Methods of Trading Exploration and Exploitation |
| Year: | 2005 |
| Note: | IDIAP-RR 05-29 |
| Crossref: | dimitrakakis:rr05-29: |
| Abstract: | We present a general method for maintaining estimates of the distribution of parameters in arbitrary models. This is then applied to the estimation of probability distributions over actions in value-based reinforcement learning. While this approach is similar to other techniques that maintain a confidence measure for action-values, it nevertheless offers an insight into current techniques and hints at potential avenues of further research. |
| Userfields: | ipdmembership={learning}, |
| Projects: | Idiap |