Existing Datasets

Summary:

How did they do it?
- 10,921 PDFs (2007 version)
- ???? PDFs (2016 version)
- Tool: PDFBox
What did they do (novelty)?
- Inter- and intra-reference linking
- Proposed a benchmarking task for this linking

Corpling@GU (Georgetown University) has the ACL Anthology from 1985 to 2022, but it is behind a firewall.

Other work using these datasets, and similar work:

- "Citation Analysis, Centrality, and the ACL Anthology": A detailed citation network analysis. It even lists the most-cited papers within the network, which would be good to see. It also calculates the impact factor of the ACL Anthology, which is interesting.
- "Purpose and Polarity of Citation: Towards NLP-based Bibliometrics": This might be of interest to people working on citation context classification.
- "CORD-19: The COVID-19 Open Research Dataset": This paper can serve as a template for our work. It is very similar to what we are doing, except that our work is in the ACL domain. We can take inspiration from it and tailor our work to the linguistics community. The tasks mentioned in Section 4 (Research directions) are especially interesting for us.
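
As a rough illustration of the in-network citation counting mentioned for the citation network analysis paper above, here is a minimal sketch. It assumes the network is available as a list of (citing, cited) pairs. The paper IDs and the helper names `in_network_citation_counts` and `most_cited` are hypothetical and do not come from the dataset or the paper.

```python
from collections import Counter

# Hypothetical toy edge list: (citing_paper_id, cited_paper_id).
# The IDs are made up for illustration and are not taken from any dataset.
citations = [
    ("P3", "P1"),
    ("P4", "P1"),
    ("P4", "P2"),
    ("P5", "P1"),
    ("P5", "P3"),
]


def in_network_citation_counts(edges):
    """Count how often each paper is cited by other papers in the network."""
    return Counter(cited for _citing, cited in edges)


def most_cited(edges, n=1):
    """Return the n most-cited papers as (paper_id, count) pairs."""
    return in_network_citation_counts(edges).most_common(n)


counts = in_network_citation_counts(citations)
print(most_cited(citations))
```

Only citations between papers inside the network are counted, so the numbers will be lower than total citation counts from external sources.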

Most tasks built around AAN (ACL Anthology Network) are summarization tasks that use the citation contexts provided by the dataset.
