Neural Machine Translation with Deep Attention
【Abstract】Deepening neural models has been proven very successful in improving a model's capacity when solving complex learning tasks, such as machine translation. Previous efforts on deep neural machine translation mainly focus on the encoder and the decoder, but little on the attention mechanism. However, the attention mechanism is of vital importance for inducing the translation correspondence between different languages, where shallow neural networks are relatively insufficient, especially when the encoder and decoder are deep. In this paper, we propose a deep attention model (DeepAtt). Based on the low-level attention information, DeepAtt is capable of automatically determining what should be passed or suppressed from the corresponding encoder layer so as to make the distributed representation appropriate for high-level attention and translation. We conduct experiments on NIST Chinese-English, WMT English-German and WMT English-French translation tasks, where, with 5 attention layers, DeepAtt yields very competitive performance against the state-of-the-art results. We empirically find that with an adequate increase of attention layers, DeepAtt tends to produce more accurate attention weights. An in-depth analysis on the translation of important context words further reveals that DeepAtt significantly improves the faithfulness of system translations.
This paper proposes a deep attention mechanism that fuses semantic information between a deep encoder and a deep decoder, further strengthening a translation system's ability to model the translation relationship between the source and target languages. The proposed model exploits the contextual information learned by low-level attention to automatically determine how to extract and filter source-side semantic information from the corresponding encoder layer and integrate it into the corresponding decoder layer. This gives the high-level attention richer information for modeling deep translation relationships and makes the model's hidden representations better suited to predicting target words. On Chinese-English, English-German and English-French translation tasks, the new model achieves near state-of-the-art results. The work was carried out jointly by Jinsong Su's team at our School of Software and Deyi Xiong's team at Tianjin University. The corresponding author is Associate Professor Jinsong Su of our School of Software, and the first author is Biao Zhang, a master's student at the School of Software. The authors were supported by the National Natural Science Foundation of China (Nos. 61672440 and 61622209), the Fundamental Research Funds for the Central Universities (Grant No. ZK1024), and the Scientific Research Project of the National Language Committee of China (Grant No. YB135-49). Biao Zhang gratefully acknowledges the support of the Baidu Scholarship.
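The pass-or-suppress behavior described above can be sketched as a stack of attention layers, each attending over one encoder layer and using a sigmoid gate to decide how much of the new context to pass upward versus keeping the context from the layer below. This is an illustrative numpy sketch, not the paper's exact equations; all weight matrices (`Wq`, `Wk`, `Wg`) and dimensions are assumptions made for the example.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_layer(query, enc_layer, prev_ctx, Wq, Wk, Wg):
    """One illustrative deep-attention layer: score this encoder
    layer's states using the decoder query plus the context from
    the layer below, then gate what is passed vs. suppressed."""
    # attention weights over the source positions of this encoder layer
    scores = enc_layer @ (Wk @ (Wq @ np.concatenate([query, prev_ctx])))
    weights = softmax(scores)                     # (src_len,)
    ctx = weights @ enc_layer                     # (d,) new context vector
    # sigmoid gate: element-wise mix of new context and lower-level context
    gate = 1.0 / (1.0 + np.exp(-(Wg @ np.concatenate([ctx, prev_ctx]))))
    return gate * ctx + (1.0 - gate) * prev_ctx, weights

rng = np.random.default_rng(0)
d, src_len, n_layers = 8, 5, 3                   # toy sizes, not the paper's
query = rng.standard_normal(d)                   # current decoder state
enc_layers = [rng.standard_normal((src_len, d)) for _ in range(n_layers)]
ctx = np.zeros(d)                                # no context below layer 1
for enc in enc_layers:
    Wq = rng.standard_normal((d, 2 * d)) * 0.1
    Wk = rng.standard_normal((d, d)) * 0.1
    Wg = rng.standard_normal((d, 2 * d)) * 0.1
    ctx, weights = attention_layer(query, enc, ctx, Wq, Wk, Wg)
print(ctx.shape, round(float(weights.sum()), 6))  # (8,) 1.0
```

The gate is the key design choice: each layer can keep lower-level context intact (gate near 0) or overwrite it with information freshly attended from its own encoder layer (gate near 1), which is one simple way to realize the "pass or suppress" behavior the abstract describes.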
Source: IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018. doi.org/10.1109/TPAMI.2018.2876404