这里是截至目前关于GPT的一切你应该知道的,包括论文,开源模型,网站,博客... 我将它们整理到一起,以便大家更方便地了解和使用GPT。项目不定期更新,内容可能不全,还请大家补充!
Here is everything you should know about GPT so far, including papers, open-source models, websites, blogs... I have organized them together so that everyone can understand and use GPT more easily. The project is updated from time to time and the content may be incomplete, so additions are welcome!
paper:Improving Language Understanding by Generative Pre-Training
code:openai/finetune-transformer-lm
paper:Language Models are Unsupervised Multitask Learners
code:openai/gpt-2
paper:Language Models are Few-Shot Learners
code:未开源(Not open source)
paper:CPM: A Large-scale Generative Chinese Pre-trained Language Model
code:TsinghuaAI/CPM
paper:FastMoE: A Fast Mixture-of-Expert Training System
code:laekov/fastmoe
paper:CPM-2: Large-scale Cost-effective Pre-trained Language Models
code:TsinghuaAI/CPM
paper:Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
code:NVIDIA/Megatron-LM
paper:ERNIE 3.0: Large-scale Knowledge Enhanced Pre-training for Language Understanding and Generation (base)
code:PaddleNLP/ernie-3.0
paper:Constitutional AI: Harmlessness from AI Feedback
code:未开源(Not open source)
paper:GLaM: Efficient Scaling of Language Models with Mixture-of-Experts
code:未开源(Not open source)
paper:Scaling Language Models: Methods, Analysis & Insights from Training Gopher
code:未开源(Not open source)
paper:LaMDA: Language Models for Dialog Applications
code:未开源(Not open source)
code:未开源(Not open source)
paper:Training Compute-Optimal Large Language Models
code:未开源(Not open source)
paper (two papers for reference):
Learning to summarize from human feedback
Training language models to follow instructions with human feedback
code:未开源(Not open source)
paper:PaLM: Scaling Language Modeling with Pathways
paper:OPT: Open Pre-trained Transformer Language Models
paper:BaGuaLu: Targeting Brain Scale Pretrained Models with over 37 Million Cores
code:未开源(Not open source)
paper:Solving Quantitative Reasoning Problems with Language Models
code:未开源(Not open source)
paper:BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
code:
paper:GLM: General Language Model Pretraining with Autoregressive Blank Infilling
code:THUDM/GLM
paper:GLM-130B: An Open Bilingual Pre-trained Model
code:THUDM/GLM-130B
paper:LLaMA: Open and Efficient Foundation Language Models
paper:Llama 2: Open Foundation and Fine-Tuned Chat Models
code:tatsu-lab/stanford_alpaca
paper:GPT-4 Technical Report
code:未开源(Not open source)
code:lm-sys/FastChat
paper:Cerebras-GPT: Open Compute-Optimal Language Models Trained on the Cerebras Wafer-Scale Cluster
code:huggingface/cerebras
code:huawei-noah/Pretrained-Language-Model/PanGu-α
paper:PanGu-Σ: Towards Trillion Parameter Language Model with Sparse Heterogeneous Computing
code:未开源(Not open source)
paper:PanGu-π: Enhancing Language Model Architectures via Nonlinearity Compensation
code:未开源(Not open source)
paper:Yuan 1.0: Large-Scale Pre-trained Language Model in Zero-Shot and Few-Shot Learning
code:未开源(Not open source)
paper:XuanYuan 2.0: A Large Chinese Financial Chat Model with Hundreds of Billions Parameters
code:未开源(Not open source)
paper:BloombergGPT: A Large Language Model for Finance
code:未开源(Not open source)
code:未开源(Not open source)
paper:Mengzi: Towards Lightweight yet Ingenious Pre-trained Models for Chinese
code:Langboat/Mengzi
paper:PaLM 2 Technical Report
code:未开源(Not open source)
paper:LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention
code:ZrrSkywalker/LLaMA-Adapter
code:OpenBMB/CPM-Bee
paper:Efficient Estimation of Word Representations in Vector Space
paper:Distributed Representations of Sentences and Documents
code:Gensim/Doc2Vec Model
(Gensim is a popular Python NLP library)
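example (not from the paper; a minimal Gensim Doc2Vec sketch with a toy corpus and illustrative parameter values):
# Minimal Gensim Doc2Vec sketch (toy corpus; parameters are illustrative)
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

corpus = [
    TaggedDocument(words=["language", "models", "learn", "representations"], tags=[0]),
    TaggedDocument(words=["doc2vec", "extends", "word2vec", "to", "documents"], tags=[1]),
]
model = Doc2Vec(corpus, vector_size=50, window=2, min_count=1, epochs=40)

# Infer a fixed-length vector for an unseen document
vec = model.infer_vector(["new", "document", "about", "language", "models"])
print(vec.shape)  # (50,)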
paper:context2vec: Learning Generic Context Embedding with Bidirectional LSTM
code:orenmel/context2vec
paper:Mixing Dirichlet Topic Models and Word Embeddings to Make lda2vec
code:cemoody/lda2vec
paper:Training Temporal Word Embeddings with a Compass
code:valedica/twec
paper:Multilingual Universal Sentence Encoder for Semantic Retrieval
code:Dimitre/universal-sentence-encoder
paper:Enriching Word Vectors with Subword Information
code:facebookresearch/fastText
paper:Deep contextualized word representations
code:HIT-SCIR/ELMoForManyLangs
paper:GloVe: Global Vectors for Word Representation
code:stanfordnlp/GloVe
paper:Attention Is All You Need
paper:BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
code:google-research/bert
paper:Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
code:google-research/text-to-text-transfer-transformer
paper:Reformer: The Efficient Transformer
code:google/trax/reformer
paper:Longformer: The Long-Document Transformer
code:allenai/longformer
paper:Cross-lingual Language Model Pretraining
code:facebookresearch/XLM
paper:Unified Language Model Pre-training for Natural Language Understanding and Generation
code:microsoft/unilm
paper:RoBERTa: A Robustly Optimized BERT Pretraining Approach
code:huggingface/roberta
paper:ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
paper:DeBERTa: Decoding-enhanced BERT with Disentangled Attention
code:microsoft/DeBERTa
code:huggingface/bart
paper:XLNet: Generalized Autoregressive Pretraining for Language Understanding
code:zihangdai/xlnet
paper:An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
code:google-research/vision_transformer
paper:Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
code:microsoft/Swin-Transformer
paper:DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
code:huggingface/distilbert-base-uncased-distilled-squad
paper:Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
code:LoniQin/english-spanish-translation-switch-transformer (Not Source Code, just an example)
paper:Charformer: Fast Character Transformers via Gradient-based Subword Tokenization
code:google-research/charformer
paper:Big Bird: Transformers for Longer Sequences
paper:ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators
paper:GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding
paper:Attention Is All You Need
paper:Attention Is All You Need
paper:Neural Machine Translation by Jointly Learning to Align and Translate
code:Tensorflow/AdditiveAttention
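example (not from the paper; a minimal sketch of Bahdanau-style additive attention using the tf.keras AdditiveAttention layer referenced above, with illustrative shapes):
# Minimal additive (Bahdanau) attention sketch with tf.keras (shapes are illustrative)
import tensorflow as tf

query = tf.random.normal((2, 4, 16))   # (batch, target_len, dim)
value = tf.random.normal((2, 6, 16))   # (batch, source_len, dim)

attention = tf.keras.layers.AdditiveAttention()
context = attention([query, value])    # weighted sum over source positions
print(context.shape)                   # (2, 4, 16)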
paper:ETC: Encoding Long and Structured Inputs in Transformers
paper:Generating Long Sequences with Sparse Transformers
paper:Efficient Content-Based Sparse Attention with Routing Transformers
code:lucidrains/routing-transformer
paper:N-gram Language Model (Not Original Paper)
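example (not from any paper; a toy bigram language model with maximum-likelihood counts, using an illustrative corpus and a hypothetical helper function):
# Toy bigram language model with maximum-likelihood estimates (corpus is illustrative)
from collections import Counter

tokens = "the cat sat on the mat the cat ate".split()
unigrams = Counter(tokens)
bigrams = Counter(zip(tokens, tokens[1:]))

# P(w2 | w1) = count(w1, w2) / count(w1)
def bigram_prob(w1, w2):
    return bigrams[(w1, w2)] / unigrams[w1]

print(bigram_prob("the", "cat"))  # 2/3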
paper:Efficient Estimation of Word Representations in Vector Space
code:SeanLee97/nlp_learning/word2vec/cbow
paper:An Overview of Bag of Words; Importance, Implementation, Applications, and Challenges (Not Original Paper)
paper:Efficient Estimation of Word Representations in Vector Space
code:SeanLee97/nlp_learning/word2vec/skipgram
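example (not from the paper; a minimal sketch contrasting CBOW and skip-gram with Gensim's Word2Vec, using a toy corpus and illustrative parameters):
# CBOW vs. skip-gram with Gensim Word2Vec (toy corpus; parameters are illustrative)
from gensim.models import Word2Vec

sentences = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "log"],
]
cbow = Word2Vec(sentences, vector_size=32, window=2, min_count=1, sg=0)      # sg=0 -> CBOW
skipgram = Word2Vec(sentences, vector_size=32, window=2, min_count=1, sg=1)  # sg=1 -> skip-gram

print(skipgram.wv.most_similar("cat", topn=2))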
paper:Sequence to Sequence Learning with Neural Networks
paper:Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
code:google/sentencepiece
code:google/sentencepiece
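example (a minimal sketch of training and applying a BPE subword model with the sentencepiece library; the input file name and vocabulary size are assumptions):
# Train and apply a BPE subword model with sentencepiece (file name and vocab size are assumptions)
import sentencepiece as spm

spm.SentencePieceTrainer.train(
    input="corpus.txt", model_prefix="bpe", vocab_size=8000, model_type="bpe"
)
sp = spm.SentencePieceProcessor(model_file="bpe.model")

print(sp.encode("Attention is all you need", out_type=str))  # subword pieces
print(sp.encode("Attention is all you need"))                # piece ids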
paper:Finding Structure in Time
code:tensorflow/keras/RNN (Not Source Code)
paper:Long Short-Term Memory
code:tensorflow/keras/LSTM (Not Source Code)
paper:Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation
code:tensorflow/keras/GRU (Not Source Code)
paper:Framewise Phoneme Classification with Bidirectional LSTM Networks
code:tensorflow/keras/bidirectional (Not Source Code)
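example (not source code; a minimal tf.keras sketch combining the Embedding, Bidirectional, and LSTM layers referenced above, with illustrative sizes):
# Minimal text classifier with Keras recurrent layers (vocabulary size and dimensions are illustrative)
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=10000, output_dim=64),   # token ids -> dense vectors
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),     # BiLSTM reads the sequence both ways
    tf.keras.layers.Dense(1, activation="sigmoid"),              # binary label
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# One dummy batch of token ids just to show the output shape
print(model(tf.constant([[3, 7, 42, 9, 0]])).shape)  # (1, 1)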
intro:Quick updates on newly released arXiv papers
intro:Collections of AI products (tools) and GPT-based products (tools)
web:https://library.phygital.plus/
web:https://www.ai-anywhere.com/ (currently desktop client only)
web:https://www.researchercosmos.com/ (most features are currently available only in the desktop client)
intro:Write notes intelligently
web:https://www.notion.so/ (inside Notion)
Inside Microsoft 365 (Not available in some countries)
Inside WPS(A Chinese Microsoft 365-like Product)
Inside Windows (Not available in some countries)
Inside 飞书 (Lark) (Chinese office software) (Not currently open to the public)
Inside 钉钉 (DingTalk) (Chinese office software) (Not currently open to the public)
web:https://yiyan.baidu.com/welcome
web:https://tongyi.aliyun.com/
web:https://chat.sensetime.com/
code:OpenLMlab/MOSS
web:https://moss.fastnlp.top/ (Not currently open to the public)
web:http://www.datagrand.com/products/aigc/ (Not currently open to the public)
web:https://tiangong.kunlun.com/
web:http://www.4paradigm.com/product/SageGPT.html (Not currently open to the public)
web:https://maas.cloudwalk.com/ (Not currently open to the public)
web:https://luca-beta.modelbest.cn/
web:https://shanhai.unisound.com/
code:https://github.com/THUDM/ChatGLM2-6B
web:https://chat.baichuan-ai.com/chat
web:https://www.doubao.com/chat/
web:https://www.mathgpt.com
intro:One of ChatGPT's strongest competitors
code:Significant-Gravitas/Auto-GPT
web:https://agentgpt.reworkd.ai/
code:reworkd/AgentGPT
code:Vision-CAIR
web:https://huggingface.co/chat
web:https://www.perplexity.ai/
web:https://factgpt-fe.vercel.app/
五、关于GPT和LLM的博客与文章(主要来自微信公众号和Medium)(Blogs and articles about GPT and LLM, mainly from WeChat official accounts and Medium)
人工智能(AI)的发展日新月异,AI的进步速度远远超乎我们的想象,我们应该始终保持学习的动力,积极主动拥抱AI新时代。但同时,也要看到大规模应用AI所带来的潜在风险和LLM的局限性。因此,我们应该拥有独立思考的能力,辩证看待AI,AIGC和LLM的发展。无论如何,我们的终极目标都是让AI造福人类,造福世界。
Artificial Intelligence (AI) is advancing rapidly, and the pace of progress far exceeds our imagination. We should stay motivated to keep learning and actively embrace the new era of AI. At the same time, we must also recognize the potential risks of deploying AI at scale and the limitations of LLMs. We should therefore retain the ability to think independently and view the development of AI, AIGC, and LLMs dialectically. In any case, our ultimate goal is for AI to benefit humanity and the world.