Idiap Abstract Text Summarization System for German Text Summarization Task
Type of publication: Idiap-RR
Citation: Parida_Idiap-RR-03-2020
Number: Idiap-RR-03-2020
Year: 2020
Month: 1
Institution: Idiap
Abstract: Text summarization is considered a challenging task in the NLP community. Datasets for multilingual text summarization are rare and difficult to construct. In this work, we build an abstractive text summarizer for German text using the state-of-the-art “Transformer” model. We propose an iterative data augmentation approach that uses synthetic data along with real summarization data for German. To generate the synthetic data, we exploit the Common Crawl (German) dataset, which covers different domains. The synthetic data is effective under low-resource conditions and is particularly helpful in multilingual scenarios, where the availability of summarization data remains a challenge.
Projects Innosuisse-SM2
Authors Parida, Shantipriya
Motlicek, Petr
  • Parida_Idiap-RR-03-2020.pdf (MD5: c0775389ac704c586c855e3c0d80a999)