Adaptive Sampled Softmax with Kernel Based Sampling
In classification tasks it is common to use the Softmax function, which turns a model's outputs into class probabilities. If the number of classes N is large, computing the gradients becomes a performance bottleneck: plain Softmax requires O(N) time. This problem arises, for example, in language modelling and recommender systems.
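To make the O(N) cost concrete, here is a minimal sketch of a plain Softmax (names and values are illustrative): the normalization term sums over all N class logits, so every gradient step touches every class.

```python
import numpy as np

def softmax(logits):
    # Numerically stable Softmax: subtract the max before exponentiating.
    # The sum in the denominator runs over all N classes -- this is the
    # O(N) cost that becomes a bottleneck for large vocabularies.
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()

probs = softmax(np.array([2.0, 1.0, 0.1]))
```

The output is a valid probability distribution over the classes (non-negative, summing to one).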
In practice, efficient approximations of Softmax are used, e.g. Sampled Softmax, which computes the loss over a small sample of classes instead of all of them. The sampling distribution is crucial for the quality of the approximation. Nevertheless, despite its importance, almost all recent applications still use simple sampling distributions, such as uniform, which leads to either poor quality or poor performance.
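A minimal sketch of Sampled Softmax with a uniform sampling distribution, assuming the standard setup where the loss is computed over the target class plus a small set of sampled negatives (the function name and parameters here are illustrative, not from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)

def sampled_softmax_loss(logits, target, num_sampled):
    """Cross-entropy over the target class plus a uniform sample of negatives.

    Cost is O(num_sampled) per example instead of O(N).
    """
    n = logits.shape[0]
    # Draw negatives uniformly, excluding the target for simplicity.
    negatives = rng.choice(np.delete(np.arange(n), target),
                           size=num_sampled, replace=False)
    idx = np.concatenate(([target], negatives))
    # The usual -log q(i) logit correction is the same constant for every
    # class under uniform sampling, so it cancels inside the Softmax.
    z = logits[idx] - logits[idx].max()
    p = np.exp(z) / np.exp(z).sum()
    return -np.log(p[0])  # target sits at position 0 of the sample

loss = sampled_softmax_loss(np.zeros(10_000), target=42, num_sampled=20)
```

Because the sample is small and uniform, the estimate of the true gradient can be noisy; this bias/variance trade-off is exactly why the choice of sampling distribution matters.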
At the seminar we will discuss the issues of Sampled Softmax and take a look at a recent method of designing the sampling distribution that resolves them.
Speaker: Egor Shcherbin.
Presentation language: Russian.
Date and time: April 10th, 2019, 6:30-8:00 pm.
Location: Times, room 204.
Videos from previous seminars are available at http://bit.ly/MLJBSeminars