


Posted by Jakob Uszkoreit, Software Engineer, Natural Language Understanding

Neural networks, in particular recurrent neural networks (RNNs), are now at the core of the leading approaches to language understanding tasks such as language modeling, machine translation and question answering. In “Attention Is All You Need”, we introduce the Transformer, a novel neural network architecture based on a self-attention mechanism that we believe to be particularly well suited for language understanding.

In our paper, we show that the Transformer outperforms both recurrent and convolutional models on academic English to German and English to French translation benchmarks. On top of higher translation quality, the Transformer requires less computation to train and is a much better fit for modern machine learning hardware, speeding up training by up to an order of magnitude.

[Figure] BLEU scores (higher is better) of single models on the standard WMT newstest2014 English to German translation benchmark.

[Figure] BLEU scores (higher is better) of single models on the standard WMT newstest2014 English to French translation benchmark.
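
For context on the metric behind these comparisons, here is a minimal sketch of computing a corpus-level BLEU score. The sacrebleu package and the toy sentences are our own illustrative choices, not the evaluation setup described in the paper.

```python
# A minimal sketch of corpus-level BLEU scoring with the `sacrebleu`
# package; the tool choice and example sentences are illustrative,
# not the paper's actual evaluation pipeline.
import sacrebleu

hypotheses = ["the cat sat on the mat"]    # system outputs, one per segment
references = [["the cat sat on the mat"]]  # one reference stream, parallel to hypotheses

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU = {bleu.score:.1f}")          # 100.0 for an exact match
```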

Accuracy and Efficiency in Language Understanding

Neural networks usually process language by generating fixed- or variable-length vector-space representations. After starting with representations of individual words or even pieces of words, they aggregate information from surrounding words to determine the meaning of a given bit of language in context.
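
To make those two steps concrete, here is a minimal NumPy sketch: tokens start out as individual vectors, and a scaled dot-product self-attention layer, the mechanism at the heart of the Transformer, aggregates information from the surrounding tokens. This is a single attention head with no masking or learned training, and the toy vocabulary, dimensions, and random weights are hypothetical stand-ins for learned parameters.

```python
# A minimal NumPy sketch (not the paper's actual implementation) of how
# self-attention turns per-token vectors into context-aware vectors.
import numpy as np

rng = np.random.default_rng(0)
d_model = 8  # size of each token's vector representation (hypothetical)

# Step 1: represent individual words (or word pieces) as vectors.
# A random table stands in for learned embeddings.
vocab = {"the": 0, "transformer": 1, "reads": 2, "text": 3}
embeddings = rng.normal(size=(len(vocab), d_model))
tokens = ["the", "transformer", "reads", "text"]
x = embeddings[[vocab[t] for t in tokens]]  # shape (4, d_model)

# Step 2: scaled dot-product self-attention (single head, no mask).
# Each token forms a query, a key and a value; the attention weights
# decide how much to aggregate from every other token in the sentence.
W_q, W_k, W_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
Q, K, V = x @ W_q, x @ W_k, x @ W_v

scores = Q @ K.T / np.sqrt(d_model)           # (4, 4) token-to-token affinities
scores -= scores.max(axis=-1, keepdims=True)  # for numerical stability
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)  # softmax
context = weights @ V                         # one context-aware vector per token

print(weights.round(2))  # each row sums to 1.0: how a token attends to the rest
print(context.shape)     # (4, 8)
```

Each output row in `context` mixes the value vectors of all tokens in proportion to the attention weights, which is one way the meaning of a word can be informed by its surroundings in a single step rather than by sequential recurrence.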
