The attention mechanism was originally developed to improve neural machine translation (NMT) by letting the model selectively focus on parts of the source sentence during translation.
Today attention is one of the most useful tools for a broad range of applied tasks in NLP, computer vision, speech, and beyond.
We have witnessed this simple architectural trick mature into a standard instrument at the disposal of deep learning engineers and researchers.
One of the most popular network architectures from Google, the Transformer, is based solely on the attention mechanism, dispensing with recurrence and convolutions entirely.
Besides producing major improvements in translation quality, it provides a new architecture for many other NLP tasks. The paper itself is very clearly written, but the conventional wisdom has been that it is quite difficult to implement correctly.
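To make the core idea concrete before the seminar: the building block of the Transformer is scaled dot-product attention, which is only a few lines of code. Below is a minimal NumPy sketch (function and variable names are ours, not from the paper):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # softmax(Q K^T / sqrt(d_k)) V, as in "Attention Is All You Need".
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # (n_queries, n_keys) similarity scores
    weights = softmax(scores, axis=-1)   # each query's weights sum to 1
    return weights @ V, weights

# Toy example: 2 queries attend over 3 key/value pairs of dimension 4.
rng = np.random.default_rng(0)
Q = rng.normal(size=(2, 4))
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))
out, w = scaled_dot_product_attention(Q, K, V)
```

The full model adds multi-head projections, masking, and positional encodings on top of this kernel, and those details are exactly where implementations tend to go wrong.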
We will dedicate part of the seminar to working through those implementation details.
More recently, OpenAI demonstrated that large gains on NLP tasks can be realized by generative pre-training of a language model on a diverse corpus of unlabeled text, followed by discriminative fine-tuning on each specific task.
Since this approach shows strong results across such a wide range of NLP tasks, it is natural to ask whether the same techniques can be applied to source-code-related tasks. So this is not only a seminar but also a research proposal we can discuss afterwards.
Speaker: Rauf Kurbanov.
Presentation language: Russian.
Date and time: January 30th, 20:00-21:30.
Location: Times, room 204.
* Luong, Minh-Thang, Hieu Pham, and Christopher D. Manning. “Effective approaches to attention-based neural machine translation.” arXiv preprint arXiv:1508.04025 (2015).
* Yang, Zichao, et al. "Hierarchical attention networks for document classification." Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2016.
* Vaswani, Ashish, et al. "Attention is all you need." Advances in Neural Information Processing Systems. 2017.
* Radford, Alec, et al. "Improving language understanding by generative pre-training." OpenAI technical report (2018).
Videos from previous seminars are available at http://bit.ly/MLJBSeminars