这里是截至目前关于GPT的一切你应该知道的,包括论文,开源模型,网站,博客... 我将它们整理到一起,以便大家更方便地了解和使用GPT。项目不定期更新,内容可能不全,还请大家补充!
Here is everything you should know about GPT so far, including papers, open-source models, websites, blogs... I have organized them together so that everyone can understand and use GPT more easily. The project is updated from time to time and the content may be incomplete, so additions are welcome!
paper:Improving Language Understanding by Generative Pre-Training
code:openai/finetune-transformer-lm
paper:Language Models are Unsupervised Multitask Learners
code:openai/gpt-2
paper:Language Models are Few-Shot Learners
code:未开源(Not open source)
paper:CPM: A Large-scale Generative Chinese Pre-trained Language Model
code:TsinghuaAI/CPM
paper:FastMoE: A Fast Mixture-of-Expert Training System
code:laekov/fastmoe
paper:CPM-2: Large-scale Cost-effective Pre-trained Language Models
code:TsinghuaAI/CPM
paper:Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
code:NVIDIA/Megatron-LM
paper:ERNIE 3.0: Large-scale Knowledge Enhanced Pre-training for Language Understanding and Generation (base)
code:PaddleNLP/ernie-3.0
paper:Constitutional AI: Harmlessness from AI Feedback
code:未开源(Not open source)
paper:GLaM: Efficient Scaling of Language Models with Mixture-of-Experts
code:未开源(Not open source)
paper:Scaling Language Models: Methods, Analysis & Insights from Training Gopher
code:未开源(Not open source)
paper:LaMDA: Language Models for Dialog Applications
code:未开源(Not open source)
code:未开源(Not open source)
paper:Training Compute-Optimal Large Language Models
code:未开源(Not open source)
paper (two papers for reference):
Learning to summarize from human feedback
Training language models to follow instructions with human feedback
code:未开源(Not open source)
paper:PaLM: Scaling Language Modeling with Pathways
paper:OPT: Open Pre-trained Transformer Language Models
paper:BaGuaLu: Targeting Brain Scale Pretrained Models with over 37 Million Cores
code:未开源(Not open source)
paper:Solving Quantitative Reasoning Problems with Language Models
code:未开源(Not open source)
paper:BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
code:
paper:GLM: General Language Model Pretraining with Autoregressive Blank Infilling
code:THUDM/GLM
paper:GLM-130B: An Open Bilingual Pre-trained Model
code:THUDM/GLM-130B
paper:LLaMA: Open and Efficient Foundation Language Models
paper:Llama 2: Open Foundation and Fine-Tuned Chat Models
code:tatsu-lab/stanford_alpaca
paper:GPT-4 Technical Report
code:未开源(Not open source)
code:lm-sys/FastChat
paper:Cerebras-GPT: Open Compute-Optimal Language Models Trained on the Cerebras Wafer-Scale Cluster
code:huggingface/cerebras
code:huawei-noah/Pretrained-Language-Model/PanGu-α
paper:PanGu-Σ: Towards Trillion Parameter Language Model with Sparse Heterogeneous Computing
code:未开源(Not open source)
paper:PanGu-π: Enhancing Language Model Architectures via Nonlinearity Compensation
code:未开源(Not open source)
paper:Yuan 1.0: Large-Scale Pre-trained Language Model in Zero-Shot and Few-Shot Learning
code:未开源(Not open source)
paper:XuanYuan 2.0: A Large Chinese Financial Chat Model with Hundreds of Billions Parameters
code:未开源(Not open source)
paper:BloombergGPT: A Large Language Model for Finance
code:未开源(Not open source)
code:未开源(Not open source)
paper:Mengzi: Towards Lightweight yet Ingenious Pre-trained Models for Chinese
code:Langboat/Mengzi
paper:PaLM 2 Technical Report
code:未开源(Not open source)
paper:LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention
code:ZrrSkywalker/LLaMA-Adapter
code:OpenBMB/CPM-Bee
paper:Efficient Estimation of Word Representations in Vector Space
paper:Distributed Representations of Sentences and Documents
code:Gensim/Doc2Vec Model
(Gensim is a popular Python NLP library)
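example (not from the paper; a minimal Gensim Doc2Vec sketch with a toy corpus and illustrative parameter values):
# Minimal Gensim Doc2Vec sketch (toy corpus; parameters are illustrative)
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

corpus = [
    TaggedDocument(words=["language", "models", "learn", "representations"], tags=[0]),
    TaggedDocument(words=["doc2vec", "extends", "word2vec", "to", "documents"], tags=[1]),
]
model = Doc2Vec(corpus, vector_size=50, window=2, min_count=1, epochs=40)

# Infer a fixed-length vector for an unseen document
vec = model.infer_vector(["new", "document", "about", "language", "models"])
print(vec.shape)  # (50,)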
paper:context2vec: Learning Generic Context Embedding with Bidirectional LSTM
code:orenmel/context2vec
paper:Mixing Dirichlet Topic Models and Word Embeddings to Make lda2vec
code:cemoody/lda2vec
paper:Training Temporal Word Embeddings with a Compass
code:valedica/twec
paper:Multilingual Universal Sentence Encoder for Semantic Retrieval
code:Dimitre/universal-sentence-encoder
paper:Enriching Word Vectors with Subword Information
code:facebookresearch/fastText
paper:Deep contextualized word representations
code:HIT-SCIR/ELMoForManyLangs
paper:GloVe: Global Vectors for Word Representation
code:stanfordnlp/GloVe
paper:Attention Is All You Need
paper:BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
code:google-research/bert
paper:Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
code:google-research/text-to-text-transfer-transformer
paper:Reformer: The Efficient Transformer
code:google/trax/reformer
paper:Longformer: The Long-Document Transformer
code:allenai/longformer
paper:Cross-lingual Language Model Pretraining
code:facebookresearch/XLM
paper:Unified Language Model Pre-training for Natural Language Understanding and Generation
code:microsoft/unilm
paper:RoBERTa: A Robustly Optimized BERT Pretraining Approach
code:huggingface/roberta
paper:ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
paper:DeBERTa: Decoding-enhanced BERT with Disentangled Attention
code:microsoft/DeBERTa
code:huggingface/bart
paper:XLNet: Generalized Autoregressive Pretraining for Language Understanding
code:zihangdai/xlnet
paper:An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
code:google-research/vision_transformer
paper:Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
code:microsoft/Swin-Transformer
paper:DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
code:huggingface/distilbert-base-uncased-distilled-squad
paper:Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
code:LoniQin/english-spanish-translation-switch-transformer (Not Source Code, just an example)
paper:Charformer: Fast Character Transformers via Gradient-based Subword Tokenization
code:google-research/charformer
paper:Big Bird: Transformers for Longer Sequences
paper:ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators
paper:GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding
paper:Attention Is All You Need
paper:Attention Is All You Need
paper:Neural Machine Translation by Jointly Learning to Align and Translate
code:Tensorflow/AdditiveAttention
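example (not from the paper; a minimal sketch of Bahdanau-style additive attention using the tf.keras AdditiveAttention layer referenced above, with illustrative shapes):
# Minimal additive (Bahdanau) attention sketch with tf.keras (shapes are illustrative)
import tensorflow as tf

query = tf.random.normal((2, 4, 16))   # (batch, target_len, dim)
value = tf.random.normal((2, 6, 16))   # (batch, source_len, dim)

attention = tf.keras.layers.AdditiveAttention()
context = attention([query, value])    # weighted sum over source positions
print(context.shape)                   # (2, 4, 16)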
paper:ETC: Encoding Long and Structured Inputs in Transformers
paper:Generating Long Sequences with Sparse Transformers
paper:Efficient Content-Based Sparse Attention with Routing Transformers
code:lucidrains/routing-transformer
paper:N-gram Language Model (Not Original Paper)
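example (not from any paper; a toy bigram language model with maximum-likelihood counts, using an illustrative corpus and a hypothetical helper function):
# Toy bigram language model with maximum-likelihood estimates (corpus is illustrative)
from collections import Counter

tokens = "the cat sat on the mat the cat ate".split()
unigrams = Counter(tokens)
bigrams = Counter(zip(tokens, tokens[1:]))

# P(w2 | w1) = count(w1, w2) / count(w1)
def bigram_prob(w1, w2):
    return bigrams[(w1, w2)] / unigrams[w1]

print(bigram_prob("the", "cat"))  # 2/3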
paper:Efficient Estimation of Word Representations in Vector Space
code:SeanLee97/nlp_learning/word2vec/cbow
paper:An Overview of Bag of Words; Importance, Implementation, Applications, and Challenges (Not Original Paper)
paper:Efficient Estimation of Word Representations in Vector Space
code:SeanLee97/nlp_learning/word2vec/skipgram
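example (not from the paper; a minimal sketch contrasting CBOW and skip-gram with Gensim's Word2Vec, using a toy corpus and illustrative parameters):
# CBOW vs. skip-gram with Gensim Word2Vec (toy corpus; parameters are illustrative)
from gensim.models import Word2Vec

sentences = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "log"],
]
cbow = Word2Vec(sentences, vector_size=32, window=2, min_count=1, sg=0)      # sg=0 -> CBOW
skipgram = Word2Vec(sentences, vector_size=32, window=2, min_count=1, sg=1)  # sg=1 -> skip-gram

print(skipgram.wv.most_similar("cat", topn=2))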
paper:Sequence to Sequence Learning with Neural Networks
paper:Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
code:google/sentencepiece
code:google/sentencepiece
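example (a minimal sketch of training and applying a BPE subword model with the sentencepiece library; the input file name and vocabulary size are assumptions):
# Train and apply a BPE subword model with sentencepiece (file name and vocab size are assumptions)
import sentencepiece as spm

spm.SentencePieceTrainer.train(
    input="corpus.txt", model_prefix="bpe", vocab_size=8000, model_type="bpe"
)
sp = spm.SentencePieceProcessor(model_file="bpe.model")

print(sp.encode("Attention is all you need", out_type=str))  # subword pieces
print(sp.encode("Attention is all you need"))                # piece ids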
paper:Finding Structure in Time
code:tensorflow/keras/RNN (Not Source Code)
paper:Long Short-Term Memory
code:tensorflow/keras/LSTM (Not Source Code)
paper:Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation
code:tensorflow/keras/GRU (Not Source Code)
paper:Framewise Phoneme Classification with Bidirectional LSTM Networks
code:tensorflow/keras/bidirectional (Not Source Code)
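example (not source code; a minimal tf.keras sketch combining the Embedding, Bidirectional, and LSTM layers referenced above, with illustrative sizes):
# Minimal text classifier with Keras recurrent layers (vocabulary size and dimensions are illustrative)
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=10000, output_dim=64),   # token ids -> dense vectors
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),     # BiLSTM reads the sequence both ways
    tf.keras.layers.Dense(1, activation="sigmoid"),              # binary label
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# One dummy batch of token ids just to show the output shape
print(model(tf.constant([[3, 7, 42, 9, 0]])).shape)  # (1, 1)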
intro:Quick updates on newly released arXiv papers
intro:Collections of AI products (tools) and GPT-based products (tools)
web:https://library.phygital.plus/
web:https://www.ai-anywhere.com/ (currently desktop client only)
web:https://www.researchercosmos.com/ (most features are currently available only in the desktop client)
intro:Write notes intelligently
web:https://www.notion.so/ (inside Notion)
Inside Microsoft 365 (Not available in some countries)
Inside WPS(A Chinese Microsoft 365-like Product)
Inside Windows (Not available in some countries)
Inside 飞书 (Lark) (Chinese office software) (Not currently open to the public)
Inside 钉钉 (DingTalk) (Chinese office software) (Not currently open to the public)
web:https://yiyan.baidu.com/welcome
web:https://tongyi.aliyun.com/
web:https://chat.sensetime.com/
code:OpenLMlab/MOSS
web:https://moss.fastnlp.top/ (Not currently open to the public)
web:http://www.datagrand.com/products/aigc/ (Not currently open to the public)
web:https://tiangong.kunlun.com/
web:http://www.4paradigm.com/product/SageGPT.html (Not currently open to the public)
web:https://maas.cloudwalk.com/ (Not currently open to the public)
web:https://luca-beta.modelbest.cn/
web:https://shanhai.unisound.com/
code:https://github.com/THUDM/ChatGLM2-6B
web:https://chat.baichuan-ai.com/chat
web:https://www.doubao.com/chat/
web:https://www.mathgpt.com
intro:One of ChatGPT's strongest competitors
code:Significant-Gravitas/Auto-GPT
web:https://agentgpt.reworkd.ai/
code:reworkd/AgentGPT
code:Vision-CAIR
web:https://huggingface.co/chat
web:https://www.perplexity.ai/
web:https://factgpt-fe.vercel.app/
五、关于GPT和LLM的博客与文章(主要来自微信公众号和Medium)(Blogs and articles about GPT and LLM, mainly from WeChat official accounts and Medium)
人工智能(AI)的发展日新月异,AI的进步速度远远超乎我们的想象,我们应该始终保持学习的动力,积极主动拥抱AI新时代。但同时,也要看到大规模应用AI所带来的潜在风险和LLM的局限性。因此,我们应该拥有独立思考的能力,辩证看待AI,AIGC和LLM的发展。无论如何,我们的终极目标都是让AI造福人类,造福世界。
Artificial Intelligence (AI) is advancing rapidly, and the pace of progress far exceeds our imagination. We should stay motivated to keep learning and actively embrace the new era of AI. At the same time, we must also recognize the potential risks of deploying AI at scale and the limitations of LLMs. We should therefore retain the ability to think independently and view the development of AI, AIGC, and LLMs dialectically. In any case, our ultimate goal is for AI to benefit humanity and the world.