Pretraining
Supervised Pretraining for Vision:
Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation, Girshick et al., Nov. 2013
Exploring the Limits of Weakly Supervised Pretraining, Mahajan et al., May 2018
Rethinking ImageNet Pre-training, He et al., Nov. 2018
Unsupervised Pretraining for Vision:
Context Encoders: Feature Learning by Inpainting, Pathak et al., April 2016
Learning Representations for Automatic Colorization, Larsson et al., March 2016
Representation Learning with Contrastive Predictive Coding (CPC), van den Oord et al., July 2018
Learning Deep Representations by Mutual Information Estimation and Maximization, Hjelm et al., Aug. 2018
Difference of Entropies Mutual Information Pretraining:
Information Theoretic Co-Training, McAllester, Feb. 2018
Formal Limitations on the Measurement of Mutual Information, McAllester and Stratos, Nov. 2018
Unsupervised Pretraining for Language:
Efficient Estimation of Word Representations in Vector Space (Word2Vec), Mikolov et al., Jan. 2013
Advances in Pre-Training Distributed Word Representations (FastText, BPE), Mikolov et al., Dec. 2017
Attention is All You Need (The Transformer), Vaswani et al., June 2017
Deep contextualized word representations (ELMo), Peters et al., Feb. 2018
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, Devlin et al., Oct. 2018
Language Models are Unsupervised Multitask Learners (GPT-2), Radford et al., Feb. 2019
Unsupervised Machine Translation (UMT):
Unsupervised Neural Machine Translation, Artetxe et al., Oct. 2017
Unsupervised Machine Translation Using Monolingual Corpora Only, Lample et al., Oct. 2017
An Effective Approach to Unsupervised Machine Translation, Artetxe et al., Feb. 2019