forked from Ravoxsg/MiddleSum

Code for the paper "On Context Utilization in Summarization with Large Language Models" (ACL 2024)

ntunlp/MiddleSum

 
 

Getting Started

Once you clone the repo, create a dedicated conda environment with Python 3.8:

cd MiddleSum/
conda create --name middlesum python=3.8.17

Next activate the environment:

conda activate middlesum

Then install all the dependencies:

pip install -r requirements.txt

Do not forget to change the values in the src/keys.py file. You need to enter the path to your home working directory, your HuggingFace token, and your OpenAI key (the latter only if you want to use GPT-3.5).

Prepare the data

Two datasets need a small amount of extra preparation: Multi-XScience and SummScreen.

For Multi-XScience, we add a separation token between documents. Please run:

python src/prepare_data/prepare_multixscience.py
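As a rough illustration of what "adding a separation token between documents" means, the sketch below joins the documents of one multi-document example into a single source string and splits it back. The token string `|||||` and the function names `join_documents` and `split_documents` are assumptions made for this example; the actual script may use a different token and logic.

```python
# Hypothetical separator; the token actually used by the repo script may differ.
SEP_TOKEN = "|||||"

def join_documents(documents, sep_token=SEP_TOKEN):
    """Concatenate non-empty documents with a separator token between them."""
    cleaned = [doc.strip() for doc in documents if doc.strip()]
    return f" {sep_token} ".join(cleaned)

def split_documents(source, sep_token=SEP_TOKEN):
    """Recover the individual documents from a joined source string."""
    return [part.strip() for part in source.split(sep_token) if part.strip()]
```

Keeping an explicit separator makes it possible to recover document boundaries later, for example when reordering or dropping documents in control experiments.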

For SummScreen, first download the dataset here: https://github.com/mingdachen/SummScreen.
Then place it in src/prepare_data/summscreen/.
Next, run:

python src/prepare_data/prepare_summscreen.py

To build the MiddleSum evaluation dataset, please run:

python src/prepare_data/prepare_middlesum.py

For this script to work, you must already have saved, under summaries/, the files with labels for the whole datasets, and computed the sentence-level alignments between label sentences and source sentences. Please see the next section for this.
For MiddleSum, the code adds a "queries" list (alongside the sources and the labels) that tracks which original dataset each data point comes from.

If you just want to work on MiddleSum, use the JSONL file MiddleSum/middlesum.jsonl in this repo. It contains one dictionary per data point of the shuffled version of MiddleSum, shuffled with the same permutation that was applied before running inference with all LLMs.
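A minimal sketch for reading such a file with the Python standard library is given here. The helper name `load_jsonl` is an assumption for illustration, and no field names are assumed: inspect the keys of the first record to see what each dictionary contains.

```python
import json

def load_jsonl(path):
    """Read a JSON Lines file into a list of dictionaries, one per data point."""
    records = []
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    return records
```

From the repo root, `data = load_jsonl("MiddleSum/middlesum.jsonl")` followed by `sorted(data[0].keys())` would list the available fields.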

The 3 scripts above will save source documents and label summaries under: raw_summaries/dataset_name/subset_name/

Experiments

Inference

First, generate the summaries. Two consumer-grade (24-48GB) GPUs are enough:

CUDA_VISIBLE_DEVICES=0,1 python src/llm_inference.py --dataset <dataset_name> --subset <subset_name> --clean_model_name <llm_name> 

This will save generated summaries (as well as files with the sources and files with the labels) under summaries/dataset_name/subset_name/.

Then, you need to score the generated summaries:

python src/main.py --dataset <dataset_name> --subset <subset_name> --clean_model_name <llm_name> --metric <metric_name>

This will save scores under scores/dataset_name/subset_name/.

Alternatively, you can download the summaries with this link. Then place them in summaries/.

Research Questions

To reproduce the analysis in RQ1, about mapping bigrams in generated summaries to the source:

python src/rq1_alignment_bigrams.py --dataset <dataset_name> --subset <subset_name> --clean_model_name <llm_name> 
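To give an idea of what mapping summary bigrams to the source can look like, here is a simplified sketch. It assumes lowercased whitespace tokenization and keeps only the first occurrence of each bigram in the source; the function names are hypothetical, and the repo script may tokenize and match differently.

```python
def bigrams(tokens):
    """Return the list of consecutive token pairs."""
    return list(zip(tokens, tokens[1:]))

def align_bigrams(summary, source):
    """Map each summary bigram found in the source to a relative position in [0, 1).

    Assumptions: lowercased whitespace tokenization, first occurrence in the source.
    """
    src_tokens = source.lower().split()
    sum_tokens = summary.lower().split()
    # Index of the first occurrence of every source bigram.
    first_pos = {}
    for i, bg in enumerate(bigrams(src_tokens)):
        first_pos.setdefault(bg, i)
    n_src = max(len(src_tokens) - 1, 1)  # number of source bigrams (at least 1)
    positions = []
    for bg in bigrams(sum_tokens):
        if bg in first_pos:
            positions.append(first_pos[bg] / n_src)
    return positions
```

A histogram of the returned positions over a dataset shows which parts of the source the generated summaries draw from.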

To reproduce the analysis in RQ2, about mapping sentences in generated summaries to the visible source:

python src/rq2_alignment_sentences.py --dataset <dataset_name> --subset <subset_name> --clean_model_name <llm_name> 

This script also takes a --compute_alignment argument, which controls whether the sentence-level alignments are built and saved, or loaded from disk. Building them can be quite time-consuming. These alignments are required for all long-input datasets before building the MiddleSum dataset with src/prepare_data/prepare_middlesum.py.

To reproduce the analysis in RQ3, about checking the correlation between the mean position of salient info and the source:

python src/rq3_mean_salient_position.py --dataset <dataset_name> --subset <subset_name> --clean_model_name <llm_name> --metric <metric_name>

Analysis

To run the control experiment on salient position of Figure 3, for instance on Multi-XScience placing the relevant document at position 0:

CUDA_VISIBLE_DEVICES=0,1 python src/llm_control_inference.py --dataset multixscience --subset test --control_n_docs True --n_docs 7 --control position --control_doc_pos 0 --swap_docs True --clean_model_name <llm_name>

To run the control experiment of Table 3 with only the first and last documents, for instance on Multi-News:

CUDA_VISIBLE_DEVICES=0,1 python src/llm_control_inference.py --dataset multinews --subset test --control_n_docs True --n_docs 5 --control filling --swap_docs False --clean_model_name <llm_name>

For the same setup but including 3 random documents between the first and last ones:

CUDA_VISIBLE_DEVICES=0,1 python src/llm_control_inference.py --dataset multinews --subset test --control_n_docs True --n_docs 5 --control filling --swap_docs True --clean_model_name <llm_name>

To run inference on MiddleSum with the focus prompt (Figure 5):

CUDA_VISIBLE_DEVICES=0,1 python src/llm_inference.py --dataset middlesum --subset test --clean_model_name <llm_name> --focus_prompt True

with hierarchical inference (Figure 5):

CUDA_VISIBLE_DEVICES=0,1 python src/llm_inference.py --dataset middlesum --subset test --clean_model_name <llm_name> --inference_method pyramidal

with incremental inference (Figure 5):

CUDA_VISIBLE_DEVICES=0,1 python src/llm_inference.py --dataset middlesum --subset test --clean_model_name <llm_name> --inference_method incremental
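The pyramidal and incremental methods above can be sketched generically as follows. This is one common reading of the two techniques, not necessarily how src/llm_inference.py implements them: `summarize` is a hypothetical callable standing in for an LLM call, and chunking by word count is an assumption.

```python
def chunk_words(text, chunk_size):
    """Split text into consecutive chunks of at most chunk_size words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]

def pyramidal_summarize(text, summarize, chunk_size):
    """Summarize each chunk independently, then summarize the joined chunk summaries."""
    partial = [summarize(chunk) for chunk in chunk_words(text, chunk_size)]
    return summarize(" ".join(partial))

def incremental_summarize(text, summarize, chunk_size):
    """Walk through the chunks in order, refreshing a running summary at each step."""
    summary = ""
    for chunk in chunk_words(text, chunk_size):
        summary = summarize((summary + " " + chunk).strip())
    return summary
```

In both cases each call to `summarize` only sees a short input, which is the motivation for these methods when salient information sits in the middle of a long source.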

To run inference with a truncated input length, for instance on Arxiv with length 2048 (Figure 6):

CUDA_VISIBLE_DEVICES=0,1 python src/llm_inference.py --dataset arxiv --subset test --clean_model_name <llm_name> --enforced_max_length 2048

Citation

If you find any of this useful, please kindly consider citing our paper in your publication.

@article{ravaut2023context,
  title={On Context Utilization in Summarization with Large Language Models},
  author={Ravaut, Mathieu and Joty, Shafiq and Sun, Aixin and Chen, Nancy F},
  journal={arXiv e-prints},
  pages={arXiv--2310},
  year={2023}
}
