Lukasz Kaiser, Martin Lang, Simon Leßenich, Christof Löding: A Unified Approach to Boundedness Properties in MSO. CSL 2015: 441-456
Lukasz Kaiser, Samy Bengio: Can Active Memory Replace Attention? CoRR abs/1610.08613 (2016)
Google's Neural Machine Translation System: Bridging the Gap between Human...
Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, Jeff Klingner, Apurva Shah, Melvin Johnson, Xiaobing Liu,...
Charles Jordan, Lukasz Kaiser: Machine Learning with Guarantees using Descriptive Complexity and SMT Solvers. CoRR abs/1609.02664 (2016)
TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems.
Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Gregory S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian J. Goodfellow, Andrew Harp,...
Minh-Thang Luong, Quoc V. Le, Ilya Sutskever, Oriol Vinyals, Lukasz Kaiser: Multi-task Sequence to Sequence Learning. ICLR (Poster) 2016
Lukasz Kaiser, Ilya Sutskever: Neural GPUs Learn Algorithms. ICLR (Poster) 2016
Lukasz Kaiser, Samy Bengio: Can Active Memory Replace Attention? NIPS 2016: 3774-3782
Lukasz Kaiser, Aidan N. Gomez, Noam Shazeer, Ashish Vaswani, Niki Parmar, Llion Jones, Jakob Uszkoreit: One Model To Learn Them All. CoRR abs/1706.05137 (2017)
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin: Attention Is All You Need. CoRR abs/1706.03762 (2017)
Lukasz Kaiser, Aidan N. Gomez, François Chollet: Depthwise Separable Convolutions for Neural Machine Translation. CoRR abs/1706.03059 (2017)
Lukasz Kaiser, Ofir Nachum, Aurko Roy, Samy Bengio: Learning to Remember Rare Events. CoRR abs/1703.03129 (2017)
Gabriel Pereyra, George Tucker, Jan Chorowski, Lukasz Kaiser, Geoffrey E. Hinton: Regularizing Neural Networks by Penalizing Confident Output Distributions. CoRR abs/1701.06548 (2017)
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin: Attention is All you Need. NIPS 2017: 5998-6008
Gabriel Pereyra, George Tucker, Jan Chorowski, Lukasz Kaiser, Geoffrey E. Hinton: Regularizing Neural Networks by Penalizing Confident Output Distributions. ICLR (Workshop) 2017
Lukasz Kaiser, Ofir Nachum, Aurko Roy, Samy Bengio: Learning to Remember Rare Events. ICLR (Poster) 2017
Yang Li, Lukasz Kaiser, Samy Bengio, Si Si: Area Attention. CoRR abs/1810.10126 (2018)
Mostafa Dehghani, Stephan Gouws, Oriol Vinyals, Jakob Uszkoreit, Lukasz Kaiser: Universal Transformers. CoRR abs/1807.03819 (2018)
Tensor2Tensor for Neural Machine Translation.
Ashish Vaswani, Samy Bengio, Eugene Brevdo, François Chollet, Aidan N. Gomez, Stephan Gouws, Llion Jones, Lukasz Kaiser, Nal Kalchbrenner, Niki Parmar, Ryan Sepassi, Noam Shazeer, Jakob Uszkoreit:...
Lukasz Kaiser, Aurko Roy, Ashish Vaswani, Niki Parmar, Samy Bengio, Jakob Uszkoreit, Noam Shazeer: Fast Decoding in Sequence Models using Discrete Latent Variables. CoRR abs/1803.03382 (2018)
Niki Parmar, Ashish Vaswani, Jakob Uszkoreit, Lukasz Kaiser, Noam Shazeer, Alexander Ku: Image Transformer. CoRR abs/1802.05751 (2018)
Peter J. Liu, Mohammad Saleh, Etienne Pot, Ben Goodrich, Ryan Sepassi, Lukasz Kaiser, Noam Shazeer: Generating Wikipedia by Summarizing Long Sequences. CoRR abs/1801.10198 (2018)
Lukasz Kaiser, Samy Bengio: Discrete Autoencoders for Sequence Models. CoRR abs/1801.09797 (2018)
Aidan N. Gomez, Sicong Huang, Ivan Zhang, Bryan M. Li, Muhammad Osama, Lukasz Kaiser: Unsupervised Cipher Cracking Using Discrete GANs. CoRR abs/1801.04883 (2018)
Niki Parmar, Ashish Vaswani, Jakob Uszkoreit, Lukasz Kaiser, Noam Shazeer, Alexander Ku, Dustin Tran: Image Transformer. ICML 2018: 4052-4061
Lukasz Kaiser, Samy Bengio, Aurko Roy, Ashish Vaswani, Niki Parmar, Jakob Uszkoreit, Noam Shazeer: Fast Decoding in Sequence Models Using Discrete Latent Variables. ICML 2018: 2395-2404
Peter J. Liu, Mohammad Saleh, Etienne Pot, Ben Goodrich, Ryan Sepassi, Lukasz Kaiser, Noam Shazeer: Generating Wikipedia by Summarizing Long Sequences. ICLR (Poster) 2018
Lukasz Kaiser, Aidan N. Gomez, François Chollet: Depthwise Separable Convolutions for Neural Machine Translation. ICLR (Poster) 2018
Aidan N. Gomez, Sicong Huang, Ivan Zhang, Bryan M. Li, Muhammad Osama, Lukasz Kaiser: Unsupervised Cipher Cracking Using Discrete GANs. ICLR (Poster) 2018
Tensor2Tensor for Neural Machine Translation.
Ashish Vaswani, Samy Bengio, Eugene Brevdo, François Chollet, Aidan N. Gomez, Stephan Gouws, Llion Jones, Lukasz Kaiser, Nal Kalchbrenner, Niki Parmar, Ryan Sepassi, Noam Shazeer, Jakob Uszkoreit:...
The Best of Both Worlds: Combining Recent Advances in Neural Machine...
Mia Xu Chen, Orhan Firat, Ankur Bapna, Melvin Johnson, Wolfgang Macherey, George F. Foster, Llion Jones, Mike Schuster, Noam Shazeer, Niki Parmar, Ashish Vaswani, Jakob Uszkoreit, Lukasz Kaiser,...
Daniel Duckworth, Arvind Neelakantan, Ben Goodrich, Lukasz Kaiser, Samy Bengio: Parallel Scheduled Sampling. CoRR abs/1906.04331 (2019)
Urvashi Khandelwal, Kevin Clark, Dan Jurafsky, Lukasz Kaiser: Sample Efficient Text Summarization Using a Single Pre-Trained Transformer. CoRR abs/1905.08836 (2019)
Model-Based Reinforcement Learning for Atari.
Lukasz Kaiser, Mohammad Babaeizadeh, Piotr Milos, Blazej Osinski, Roy H. Campbell, Konrad Czechowski, Dumitru Erhan, Chelsea Finn, Piotr Kozakowski, Sergey Levine, Ryan Sepassi, George Tucker, Henryk...
Yang Li, Lukasz Kaiser, Samy Bengio, Si Si: Area Attention. ICML 2019: 3846-3855
Mostafa Dehghani, Stephan Gouws, Oriol Vinyals, Jakob Uszkoreit, Lukasz Kaiser: Universal Transformers. ICLR (Poster) 2019
Rethinking Attention with Performers.
Krzysztof Choromanski, Valerii Likhosherstov, David Dohan, Xingyou Song, Andreea Gane, Tamás Sarlós, Peter Hawkins, Jared Davis, Afroz Mohiuddin, Lukasz Kaiser, David Belanger, Lucy J. Colwell, Adrian...
Nikita Kitaev, Lukasz Kaiser, Anselm Levskaya: Reformer: The Efficient Transformer. CoRR abs/2001.04451 (2020)
Nikita Kitaev, Lukasz Kaiser, Anselm Levskaya: Reformer: The Efficient Transformer. ICLR 2020
Model Based Reinforcement Learning for Atari.
Lukasz Kaiser, Mohammad Babaeizadeh, Piotr Milos, Blazej Osinski, Roy H. Campbell, Konrad Czechowski, Dumitru Erhan, Chelsea Finn, Piotr Kozakowski, Sergey Levine, Afroz Mohiuddin, Ryan Sepassi, George...
Sebastian Jaszczur, Aakanksha Chowdhery, Afroz Mohiuddin, Lukasz Kaiser, Wojciech Gajewski, Henryk Michalewski, Jonni Kanerva: Sparse is Enough in Scaling Transformers. CoRR abs/2111.12763 (2021)
Training Verifiers to Solve Math Word Problems.
Karl Cobbe, Vineet Kosaraju, Mohammad Bavarian, Mark Chen, Heewoo Jun, Lukasz Kaiser, Matthias Plappert, Jerry Tworek, Jacob Hilton, Reiichiro Nakano, Christopher Hesse, John Schulman: Training...
Piotr Nawrot, Szymon Tworkowski, Michal Tyrolski, Lukasz Kaiser, Yuhuai Wu, Christian Szegedy, Henryk Michalewski: Hierarchical Transformers Are More Efficient Language Models. CoRR abs/2110.13711 (2021)
Evaluating Large Language Models Trained on Code.
Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Pondé de Oliveira Pinto, Jared Kaplan, Harrison Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman, Alex Ray, Raul Puri, Gretchen Krueger,...
Piotr Kozakowski, Lukasz Kaiser, Henryk Michalewski, Afroz Mohiuddin, Katarzyna Kanska: Q-Value Weighted Regression: Reinforcement Learning with Limited Data. CoRR abs/2102.06782 (2021)
Sebastian Jaszczur, Aakanksha Chowdhery, Afroz Mohiuddin, Lukasz Kaiser, Wojciech Gajewski, Henryk Michalewski, Jonni Kanerva: Sparse is Enough in Scaling Transformers. NeurIPS 2021: 9895-9907
Rethinking Attention with Performers.
Krzysztof Marcin Choromanski, Valerii Likhosherstov, David Dohan, Xingyou Song, Andreea Gane, Tamás Sarlós, Peter Hawkins, Jared Quincy Davis, Afroz Mohiuddin, Lukasz Kaiser, David Benjamin Belanger,...
Piotr Nawrot, Szymon Tworkowski, Michal Tyrolski, Lukasz Kaiser, Yuhuai Wu, Christian Szegedy, Henryk Michalewski: Hierarchical Transformers Are More Efficient Language Models. NAACL-HLT (Findings)...
Piotr Kozakowski, Lukasz Kaiser, Henryk Michalewski, Afroz Mohiuddin, Katarzyna Kanska: Q-Value Weighted Regression: Reinforcement Learning with Limited Data. IJCNN 2022: 1-8
Lukasz Kucinski, Witold Drzewakowski, Mateusz Olko, Piotr Kozakowski, Lukasz Maziarka, Marta Emilia Nowakowska, Lukasz Kaiser, Piotr Milos: tsGT: Stochastic Time Series Modeling With Transformer. CoRR...