Text-only adaptation in LLM-based ASR through text denoising
| Type of publication: | Conference paper |
| Citation: | Burdisso_ICASSP2026_2026 |
| Publication status: | Accepted |
| Booktitle: | ICASSP |
| Year: | 2026 |
| Abstract: | Adapting automatic speech recognition (ASR) systems based on large language models (LLMs) to new domains using text-only data is a significant yet underexplored challenge. Standard fine-tuning of the LLM on target-domain text often disrupts the critical alignment between speech and text modalities learned by the projector, degrading performance. We introduce a novel text-only adaptation method that emulates the audio projection task by treating it as a text denoising task. Our approach thus trains the LLM to recover clean transcripts from noisy inputs. This process effectively adapts the model to a target domain while preserving cross-modal alignment. Our solution is lightweight, requiring no architectural changes or additional parameters. Extensive evaluation on two datasets demonstrates up to 22.1% relative improvement, outperforming recent state-of-the-art text-only adaptation methods. |
| Main Research Program: | AI for Everyone |
| Additional Research Programs: | AI for Everyone |
| Keywords: | |
| Projects: | Idiap UNIPHORE ELOQUENCE |
| Authors: | |
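The abstract describes training the LLM to recover clean transcripts from noisy inputs. The sketch below shows one way such (noisy, clean) training pairs might be built from target-domain text. The abstract does not specify the noise function, so the character-level corruption used here, along with the names `add_noise`, `make_denoising_pairs`, and the `rate` parameter, are illustrative assumptions rather than the authors' method.

```python
import random
import string


def add_noise(text, rate=0.15, seed=None):
    """Corrupt text with random character deletions, substitutions, and insertions.

    Hypothetical noise model: the paper's actual noising scheme is not
    described in the abstract.
    """
    rng = random.Random(seed)
    alphabet = string.ascii_lowercase + " "
    out = []
    for ch in text:
        if rng.random() < rate:
            op = rng.choice(("delete", "substitute", "insert"))
            if op == "delete":
                # Drop the character entirely.
                continue
            if op == "substitute":
                # Replace the character with a random one.
                out.append(rng.choice(alphabet))
                continue
            # Insert: keep the character and add a random one after it.
            out.append(ch)
            out.append(rng.choice(alphabet))
        else:
            out.append(ch)
    return "".join(out)


def make_denoising_pairs(transcripts, rate=0.15, seed=0):
    """Return (noisy_input, clean_target) pairs for text-only fine-tuning."""
    rng = random.Random(seed)
    return [
        (add_noise(t, rate, seed=rng.randrange(2**32)), t)
        for t in transcripts
    ]


if __name__ == "__main__":
    # Made-up target-domain sentences, for illustration only.
    domain_text = [
        "please reset my account password",
        "i would like to check my order status",
    ]
    for noisy, clean in make_denoising_pairs(domain_text):
        print(repr(noisy), "->", repr(clean))
```

In a setup like this, each noisy string would stand in for the projector's output and the clean transcript would be the training target, so the LLM sees target-domain text while still practising the mapping from imperfect input to a clean transcript.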