Model Collection
| Model | Year | Description |
| --- | --- | --- |
| BERT | 2018 | Bidirectional Encoder Representations from Transformers |
| GPT | 2018 | Improving Language Understanding by Generative Pre-Training |
| RoBERTa | 2019 | A Robustly Optimized BERT Pretraining Approach |
| GPT-2 | 2019 | Language Models are Unsupervised Multitask Learners |
| T5 | 2019 | Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer |
| BART | 2019 | Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension |
| ALBERT | 2019 | A Lite BERT for Self-supervised Learning of Language Representations |
| XLNet | 2019 | Generalized Autoregressive Pretraining for Language Understanding and Generation |
| CTRL | 2019 | CTRL: A Conditional Transformer Language Model for Controllable Generation |
| ERNIE | 2019 | ERNIE: Enhanced Representation through Knowledge Integration |
| GShard | 2020 | GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding |
| GPT-3 | 2020 | Language Models are Few-Shot Learners |
| LaMDA | 2021 | LaMDA: Language Models for Dialog Applications |
| PanGu-α | 2021 | PanGu-α: Large-scale Autoregressive Pretrained Chinese Language Models with Auto-parallel Computation |
| mT5 | 2021 | mT5: A massively multilingual pre-trained text-to-text transformer |
| CPM-2 | 2021 | CPM-2: Large-scale Cost-effective Pre-trained Language Models |
| T0 | 2021 | Multitask Prompted Training Enables Zero-Shot Task Generalization |
| HyperCLOVA | 2021 | What Changes Can Large-scale Language Models Bring? Intensive Study on HyperCLOVA: Billions-scale Korean Generative Pretrained Transformers |
| Codex | 2021 | Evaluating Large Language Models Trained on Code |
| ERNIE 3.0 | 2021 | ERNIE 3.0: Large-scale Knowledge Enhanced Pre-training for Language Understanding and Generation |
| Jurassic-1 | 2021 | Jurassic-1: Technical Details and Evaluation |
| FLAN | 2021 | Finetuned Language Models Are Zero-Shot Learners |
| MT-NLG | 2021 | Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model |
| Yuan 1.0 | 2021 | Yuan 1.0: Large-Scale Pre-trained Language Model in Zero-Shot and Few-Shot Learning |
| WebGPT | 2021 | WebGPT: Browser-assisted question-answering with human feedback |
| Gopher | 2021 | Scaling Language Models: Methods, Analysis & Insights from Training Gopher |
| ERNIE 3.0 Titan | 2021 | ERNIE 3.0 Titan: Exploring Larger-scale Knowledge Enhanced Pre-training for Language Understanding and Generation |
| GLaM | 2021 | GLaM: Efficient Scaling of Language Models with Mixture-of-Experts |
| InstructGPT | 2022 | Training language models to follow instructions with human feedback |
| GPT-NeoX-20B | 2022 | GPT-NeoX-20B: An Open-Source Autoregressive Language Model |
| AlphaCode | 2022 | Competition-Level Code Generation with AlphaCode |
| CodeGen | 2022 | CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis |
| Chinchilla | 2022 | Shows that, for a given compute budget, the best performance is achieved not by the largest models but by smaller models trained on more data. |
| Tk-Instruct | 2022 | Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks |
| UL2 | 2022 | UL2: Unifying Language Learning Paradigms |
| PaLM | 2022 | PaLM: Scaling Language Modeling with Pathways |
| OPT | 2022 | OPT: Open Pre-trained Transformer Language Models |
| BLOOM | 2022 | BLOOM: A 176B-Parameter Open-Access Multilingual Language Model |
| GLM-130B | 2022 | GLM-130B: An Open Bilingual Pre-trained Model |
| AlexaTM | 2022 | AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model |
| Flan-T5 | 2022 | Scaling Instruction-Finetuned Language Models |
| Sparrow | 2022 | Improving alignment of dialogue agents via targeted human judgements |
| U-PaLM | 2022 | Transcending Scaling Laws with 0.1% Extra Compute |
| mT0 | 2022 | Crosslingual Generalization through Multitask Finetuning |
| Galactica | 2022 | Galactica: A Large Language Model for Science |
| OPT-IML | 2022 | OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization |
| LLaMA | 2023 | LLaMA: Open and Efficient Foundation Language Models |
| GPT-4 | 2023 | GPT-4 Technical Report |
| PanGu-Σ | 2023 | PanGu-Σ: Towards Trillion Parameter Language Model with Sparse Heterogeneous Computing |
| BloombergGPT | 2023 | BloombergGPT: A Large Language Model for Finance |
| Cerebras-GPT | 2023 | Cerebras-GPT: Open Compute-Optimal Language Models Trained on the Cerebras Wafer-Scale Cluster |