- [Paper Review] Language Models are Unsupervised Multitask Learners (GPT-2)
- [Paper Review] BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
- [Paper Review] Deep contextualized word representations (ELMo)
- Transformer and Self-Attention
- Sequence to Sequence (seq2seq) and Attention
- Long Short-Term Memory (LSTM)
- Recurrent Neural Networks (RNN)
- Word Embedding
- [Paper Review] Sequence to Sequence Learning with Neural Networks
- [Paper Review] Efficient Estimation of Word Representations in Vector Space