- Datatang (数据堂)
- Corpus Online (语料库在线)
- 3 Million Instacart Orders, Open Sourced
- ACM Multimedia Systems Conference Dataset Archive
- A comprehensive dataset for stock movement prediction from tweets and historical stock prices.
- A dataset for book recommendations: ten thousand books, one million ratings
- An awesome list of high-quality datasets ⭐
- An awesome list of high-quality open datasets in public domains ⭐
- A new dataset for Attribute Based Classification and Zero-Shot Learning
- Audio Data Links
- Clustering basic benchmark
- CNSD: Chinese Natural Language Inference Dataset
- Cool Datasets ⭐
- Corpora of misspellings for download
- Datasets for Data Science and Machine Learning
- DeepDive Open Datasets ⭐
- FiveThirtyEight open data for visualizations
- Hard Drive Data and Stats
- Open Datasets
- Picture and specifications scraper
- Pixiv Dataset Overview
- SLAC: A Sparsely Labeled ACtions Dataset from MIT and Facebook
- Some good papers I like
- Standardized data set for machine learning of protein structure
- Telenav.AI competition public repository
- The Quick, Draw! Dataset
- Wolfram Data Repository
- 300 Faces In-the-Wild Challenge
- A dataset for personalized highlight detection
- A Large-Scale Dataset for Vehicle Re-Identification in the Wild
- A MNIST-like fashion product database ⭐
- Caltech 10,000 Web Faces
- CASIA WebFace Database
- Cross-Age Celebrity Dataset
- DeepFashion: Fashion Landmark Detection
- EMOTIC Dataset
- Face Recognition for Web-Scale Datasets
- IMDB-WIKI – 500k+ face images with age and gender labels
- Kaggle Datasets
- Labeled Faces in the Wild Home
- Large-scale CelebFaces Attributes (CelebA) Dataset
- LLD - Large Logo Dataset
- Medical imaging datasets
- Media Integration and Communication Center
- MegaFace Dataset
- MSRA-CFW: Data Set of Celebrity Faces on the Web
- Netizen-Style Commenting on Fashion Photos – Dataset and Diversity Measures
- Open Images Dataset V4
- SCUT HEAD is a large-scale head detection dataset
- Street View Image, Pose, and 3D Cities Dataset
- VGG Face Dataset
- VGGFace2 Dataset
- WebVision Visual Dataset 2.0
- WIDER FACE: A Face Detection Benchmark
- YouTube Faces DB
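- Large-scale Chinese NLP corpus
- Chinese and English corpora for dialogue systems
- Sogou Labs (搜狗实验室)
- Sentiment analysis: a roundup of free, publicly available text corpora and training datasets
- Word lists for Chinese sentiment analysis
- People's Daily segmented and tagged corpus
- Shared resources of the Language Technology Platform from the HIT Center for Information Retrieval (HIT CIR)
- Chinese sentence structure treebank
- Chinese conversation corpus
- Small Chinese corpus data: some useful Chinese corpus datasets
- Chinese names corpus: Chinese full names, surnames, given names, forms of address, Japanese names, transliterated names, English names
- Chinese emergency events corpus
- United Nations Parallel Corpus
- Insurance industry corpus
- Chinese Xinhua Dictionary database, including xiehouyu (two-part allegorical sayings), idioms, and Chinese characters; provides a Xinhua Dictionary API
- Datasets for training Chinese and English chatbot systems
- The most comprehensive database of classical Chinese poetry
- PTT Gossiping board Q&A Chinese corpus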
- Acemap Knowledge Graph ⭐
- A dataset of 200k English plaintext jokes.
- Alphabetical list of free/public domain datasets with text data for use in NLP
- A New Multi-Turn, Multi-Domain, Task-Oriented Dialogue Dataset
- A text file containing 479k English words for all your dictionary/word-based projects
- BBC Sound Effects Archive Resource
- CCMatrix: Mining Billions of High-Quality Parallel Sentences on the WEB
- Chat corpus collection from various open sources
- Chinese Nlp Corpus
- Chinese Text in the Wild
- CoLA - The Corpus of Linguistic Acceptability ⭐
- Collections of Chinese NLP corpus
- Cornell NLVR
- Course materials for Text as Data Lab
- Datasets of Annotated Semantic Relationships
- Datasets for Entity Recognition
- Japanese Word Similarity Dataset
- Movie Review Data
- Multi-Domain Sentiment Dataset
- Open Domain Question Answering ⭐
- Open Speech and Language Resources ⭐
- Poetry-related datasets collected by THUAIPoet (Jiuge) group.
- Public Datasets For Recommender Systems
- Second International Chinese Word Segmentation Bakeoff Data ⭐
- Taiga Corpus
- Ten thousand books, six million ratings
- The Big Bad NLP Database
- The DBpedia Knowledge Base
- The Movies Corpus
- TriviaQA: A Large Scale Dataset for Reading Comprehension and Question Answering
- Yelp Open Dataset
- Database of 700,000 Chinese couplets