The 10 most viewed publications of 2022

February 5, 2024


In this work, we demonstrate that multilingual large-scale sequence-to-sequence (seq2seq) models, pre-trained on a mixture of denoising and causal language modeling (CLM) tasks, are more efficient few-shot learners than decoder-only models on various tasks. In particular, we train a 20-billion-parameter multilingual seq2seq model called the Alexa Teacher Model (AlexaTM 20B) and show that it achieves state-of-the-art (SOTA) performance on one-shot summarization tasks, outperforming the much larger 540B-parameter PaLM decoder model.

Related content

With an encoder-decoder architecture rather than a decoder-only one, the Alexa Teacher Model outperforms other large language models on few-shot tasks such as summarization and machine translation.

AlexaTM 20B also achieves SOTA in one-shot machine translation, especially for low-resource languages, across almost all language pairs supported by the model (Arabic, English, French, German, Hindi, Italian, Japanese, Marathi, Portuguese, Spanish, Tamil, and Telugu) on the Flores-101 dataset. We also show that, in the zero-shot setting, AlexaTM 20B outperforms GPT-3 (175B) on the SuperGLUE and SQuADv2 datasets and provides SOTA performance on multilingual tasks such as XNLI, XCOPA, PAWS-X, and XWinograd. Overall, our results present a compelling case for seq2seq models as a powerful alternative to decoder-only models for large-scale language model (LLM) training.

Read and download the paper


