HyperMixer: An MLP-based Low Cost Alternative to Transformers
Type of publication: | Conference paper |
Citation: | Mai_ACL2023_2023 |
Publication status: | Published |
Booktitle: | Proc. of the 61st Annual Meeting of the Association for Computational Linguistics |
Year: | 2023 |
Month: | July |
Pages: | 15632-15654 |
Location: | Toronto, Canada |
Organization: | Association for Computational Linguistics |
ISBN: | 978-1-959429-72-2 |
Crossref: | Idiap-Internal-RR-08-2022 |
DOI: | http://dx.doi.org/10.18653/v1/2023.acl-long.871 |
Abstract: | Transformer-based architectures are the model of choice for natural language understanding, but they come at a significant cost, as they have quadratic complexity in the input length, require a lot of training data, and can be difficult to tune. In the pursuit of lower costs, we investigate simple MLP-based architectures. We find that existing architectures such as MLPMixer, which achieves token mixing through a static MLP applied to each feature independently, are too detached from the inductive biases required for natural language understanding. In this paper, we propose a simple variant, HyperMixer, which forms the token mixing MLP dynamically using hypernetworks. Empirically, we demonstrate that our model performs better than alternative MLP-based models, and on par with Transformers. In contrast to Transformers, HyperMixer achieves these results at substantially lower costs in terms of processing time, training data, and hyperparameter tuning. |
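Illustration: the token-mixing mechanism described in the abstract (a token-mixing MLP whose weights are generated dynamically by hypernetworks) can be sketched roughly as below. This is an illustrative reading of the abstract, not the authors' released code; the per-token two-layer hypernetworks, the hidden width d_hidden, and the GELU activations are assumptions.

    import torch
    import torch.nn as nn

    class HyperTokenMixer(nn.Module):
        """Token mixing where the mixing MLP's weights are produced per input
        by hypernetworks (sketch of the idea, not the paper's exact model)."""
        def __init__(self, d_model: int, d_hidden: int):
            super().__init__()
            # Hypernetworks: map each token vector to one row of W1 / W2.
            self.hyper_w1 = nn.Sequential(
                nn.Linear(d_model, d_model), nn.GELU(), nn.Linear(d_model, d_hidden))
            self.hyper_w2 = nn.Sequential(
                nn.Linear(d_model, d_model), nn.GELU(), nn.Linear(d_model, d_hidden))
            self.activation = nn.GELU()

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (batch, num_tokens, d_model); position information is assumed
            # to have been added to x beforehand.
            w1 = self.hyper_w1(x)                            # (batch, N, d_hidden)
            w2 = self.hyper_w2(x)                            # (batch, N, d_hidden)
            mixed = self.activation(w1.transpose(1, 2) @ x)  # (batch, d_hidden, d_model)
            return w2 @ mixed                                # (batch, N, d_model)

    # Usage: mix a batch of 2 sequences of 16 tokens with 64-dim features.
    tokens = torch.randn(2, 16, 64)
    out = HyperTokenMixer(d_model=64, d_hidden=128)(tokens)
    print(out.shape)  # torch.Size([2, 16, 64])

Because the generated weight matrices scale with the number of tokens rather than forming an N-by-N attention map, the mixing step stays linear in the input length, which is the cost advantage the abstract refers to.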
Keywords: | |
Projects: | Idiap, LAOS, NAST, COMPBIO |
Authors: | |