Nonparametric Variational Regularisation of Pretrained Transformers

Type of publication:	Conference paper
Citation:	Fehr_COLM_2024
Publication status:	Accepted
Booktitle:	First Conference on Language Modelling
Year:	2024
Month:	October
Crossref:	Fehr_NVRegularisation_2023: Nonparametric Variational Regularisation of Pretrained Transformers, Fehr, Fabio and Henderson, James, in: ArXiv, 2023
URL:	https://openreview.net/forum?i...
Abstract:	Pretrained transformers have demonstrated impressive abilities, but tend not to generalise well out-of-domain and are very expensive to fine-tune on new domain data. Nonparametric Variational Information Bottleneck (NVIB) has been proposed as a regulariser for training cross-attention in transformers, potentially addressing this domain overfitting problem. We extend the NVIB framework to replace all types of attention functions in transformers. We show that existing pretrained transformers can be reinterpreted as nonparametric variational models using an empirical prior distribution and identity initialisation with controllable hyperparameters. We then show that changing the initialisation introduces a novel, information-theoretic post-training regularisation in the attention mechanism, which improves out-of-domain generalisation on NLP tasks without any additional training. This success supports the hypothesis that the way pretrained transformer embeddings represent information is accurately characterised by nonparametric variational Bayesian models.
Keywords:	Nonparametric VIB, Out-of-domain generalisation, Post-training regularisation, Reinterpretation, transformers
Projects	Idiap EVOLANG
Authors	Fehr, Fabio; Henderson, James
Attachments
Fehr_COLM_2024.pdf