Idiap Research Institute
HyperMixer: An MLP-based Low Cost Alternative to Transformers
Type of publication: Conference paper
Citation: Mai_ACL2023_2023
Publication status: Published
Booktitle: Proc. of the 61st Annual Meeting of the Association for Computational Linguistics
Year: 2023
Month: July
Pages: 15632-15654
Location: Toronto, Canada
Organization: Association for Computational Linguistics
ISBN: 978-1-959429-72-2
Crossref: Idiap-Internal-RR-08-2022
DOI: http://dx.doi.org/10.18653/v1/2023.acl-long.871
Abstract: Transformer-based architectures are the model of choice for natural language understanding, but they come at a significant cost, as they have quadratic complexity in the input length, require a lot of training data, and can be difficult to tune. In the pursuit of lower costs, we investigate simple MLP-based architectures. We find that existing architectures such as MLPMixer, which achieves token mixing through a static MLP applied to each feature independently, are too detached from the inductive biases required for natural language understanding. In this paper, we propose a simple variant, HyperMixer, which forms the token mixing MLP dynamically using hypernetworks. Empirically, we demonstrate that our model performs better than alternative MLP-based models, and on par with Transformers. In contrast to Transformers, HyperMixer achieves these results at substantially lower costs in terms of processing time, training data, and hyperparameter tuning.
Projects: Idiap
Authors: Mai, Florian
Pannatier, Arnaud
Fehr, Fabio
Chen, Haolin
Marelli, François
Fleuret, Francois
Henderson, James
  • Mai_ACL2023_2023.pdf