HyperMixer: An MLP-based Low Cost Alternative to Transformers
Type of publication: | Conference paper |
Citation: | Mai_ACL2023_2023 |
Publication status: | Published |
Booktitle: | Proc. of the 61st Annual Meeting of the Association for Computational Linguistics |
Year: | 2023 |
Month: | July |
Pages: | 15632-15654 |
Location: | Toronto, Canada |
Organization: | Association for Computational Linguistics |
ISBN: | 978-1-959429-72-2 |
Crossref: | Idiap-Internal-RR-08-2022 |
DOI: | http://dx.doi.org/10.18653/v1/2023.acl-long.871 |
Abstract: | Transformer-based architectures are the model of choice for natural language understanding, but they come at a significant cost, as they have quadratic complexity in the input length, require a lot of training data, and can be difficult to tune. In the pursuit of lower costs, we investigate simple MLP-based architectures. We find that existing architectures such as MLPMixer, which achieves token mixing through a static MLP applied to each feature independently, are too detached from the inductive biases required for natural language understanding. In this paper, we propose a simple variant, HyperMixer, which forms the token mixing MLP dynamically using hypernetworks. Empirically, we demonstrate that our model performs better than alternative MLP-based models, and on par with Transformers. In contrast to Transformers, HyperMixer achieves these results at substantially lower costs in terms of processing time, training data, and hyperparameter tuning. |
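Illustration: the token-mixing mechanism described in the abstract (a token-mixing MLP whose weights are generated dynamically by hypernetworks) can be sketched roughly as below. This is an illustrative reading of the abstract, not the authors' released code; the per-token two-layer hypernetworks, the hidden width d_hidden, and the GELU activations are assumptions.

    import torch
    import torch.nn as nn

    class HyperTokenMixer(nn.Module):
        """Token mixing where the mixing MLP's weights are produced per input
        by hypernetworks (sketch of the idea, not the paper's exact model)."""
        def __init__(self, d_model: int, d_hidden: int):
            super().__init__()
            # Hypernetworks: map each token vector to one row of W1 / W2.
            self.hyper_w1 = nn.Sequential(
                nn.Linear(d_model, d_model), nn.GELU(), nn.Linear(d_model, d_hidden))
            self.hyper_w2 = nn.Sequential(
                nn.Linear(d_model, d_model), nn.GELU(), nn.Linear(d_model, d_hidden))
            self.activation = nn.GELU()

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (batch, num_tokens, d_model); position information is assumed
            # to have been added to x beforehand.
            w1 = self.hyper_w1(x)                            # (batch, N, d_hidden)
            w2 = self.hyper_w2(x)                            # (batch, N, d_hidden)
            mixed = self.activation(w1.transpose(1, 2) @ x)  # (batch, d_hidden, d_model)
            return w2 @ mixed                                # (batch, N, d_model)

    # Usage: mix a batch of 2 sequences of 16 tokens with 64-dim features.
    tokens = torch.randn(2, 16, 64)
    out = HyperTokenMixer(d_model=64, d_hidden=128)(tokens)
    print(out.shape)  # torch.Size([2, 16, 64])

Because the generated weight matrices scale with the number of tokens rather than forming an N-by-N attention map, the mixing step stays linear in the input length, which is the cost advantage the abstract refers to.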
Keywords: | |
Projects: | Idiap, LAOS, NAST, COMPBIO |
Authors: | |