This repository contains the datasets and code for our paper "Let's Synthesize Step by Step: Iterative Dataset Synthesis with Large Language Models by Extrapolating Errors from Small Models".
If you just want to explore our dataset, we provide a simple JSON version in ./Datasets. The format for each task is as follows (see the loading sketch after the list):
- IMDb: [[Review text], [label]]
- QNLI: [[premises], [questions], [labels]]
- RTE: [[premises], [hypothesis], [label]]
- AdQA: [[contexts], [questions], [answers], [start_idx], [end_idx], [id]]
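For a quick look at the data, here is a minimal Python loading sketch; the exact file name (Datasets/IMDb.json) is an assumption, so adjust it to whichever JSON file is actually present in ./Datasets:

```python
import json

# Minimal sketch: load the IMDb split, which is stored as two parallel lists.
# The file name below is an assumption -- use the actual file in ./Datasets.
with open("Datasets/IMDb.json", "r") as f:
    texts, labels = json.load(f)  # IMDb format: [[Review text], [label]]

print(len(texts), "examples")
print(texts[0], labels[0])
```

The other tasks follow the same pattern, with additional parallel lists (e.g., premises, questions, and labels for QNLI).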
We provide a simple click-and-run example showing how to use our generated dataset; the code is in ./ModelTraining.
To run our code, first install the environment from ./ModelTraining/environment.yml (e.g., with conda).
For each task, the main file is ./ModelTraining/main_<task_name>.py. It automatically stores the misclassified data in the corresponding folder. You can experiment with our generated dataset by running the corresponding main file, as in the sketch below.
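As a rough illustration, one of the main files might be launched from Python like this; the script name (main_imdb.py) and the argument-free invocation are assumptions, so check ./ModelTraining for the actual entry points and flags:

```python
# Minimal sketch: run a task-specific training script as a subprocess.
# The file name below is an assumption -- substitute the real
# main_<task_name>.py for your task.
import subprocess

subprocess.run(["python", "ModelTraining/main_imdb.py"], check=True)
```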
Because the generation code is somewhat involved, please give us a little more time to finish cleaning it up. It will be released soon.
We thank the anonymous reviewers for their feedback on our paper. MS acknowledges support from the Swiss National Science Foundation (Project No. 197155), a Responsible AI grant by the Haslerstiftung, and an ETH Grant (ETH-19 21-1).
If you use this code, please cite our paper:
@inproceedings{wang2023let,
  title={Let’s Synthesize Step by Step: Iterative Dataset Synthesis with Large Language Models by Extrapolating Errors from Small Models},
  author={Wang, Ruida and Zhou, Wangchunshu and Sachan, Mrinmaya},
  booktitle={Findings of the Association for Computational Linguistics: EMNLP 2023},
  pages={11817--11831},
  year={2023}
}