Anthology build information #157
I just noticed the format on the main page. It seems that in fact many of these items are already checked off. The main missing components are (a) to copy over the two yml files to the build directory and (b) to make linked paths relative. It might be nice to add a Makefile so people could just type |
Another issue: we need the copyright forms. Ideally these would be placed in a parallel folder, |
There's also a point of confusion: many workshop organizers seem to think that we do the building. We (note to self) need a validation script that tells them whether their tarball is up-to-spec, and if not, what the likely reasons are. I really do like the build format and the use of GitHub repos for each workshop! It makes it quite easy to find everything.
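A rough sketch of what such a validation script could look like, in Python. It assumes the layout asked for in the issue description (the two yml files at the root of the build directory, a PDF subdirectory, and optional frontmatter and proceedings PDFs). The function name `validate_build_dir` and the accepted directory names are hypothetical and not part of ACLPUB2.

```python
from pathlib import Path

# Assumed layout, taken from the wish list in the issue description.
# These names are illustrative, not a published spec.
REQUIRED_FILES = ("papers.yml", "conference_details.yml")
OPTIONAL_FILES = ("front_matter.pdf", "proceedings.pdf")
PDF_DIR_NAMES = ("watermarked_pdfs", "pdfs")  # either name is accepted here


def validate_build_dir(build_dir):
    """Return (errors, warnings) as two lists of human-readable strings."""
    root = Path(build_dir)
    errors, warnings = [], []

    if not root.is_dir():
        return [f"{root} is not a directory"], warnings

    # The metadata files must sit at the root of the delivered directory.
    for name in REQUIRED_FILES:
        if not (root / name).is_file():
            errors.append(f"missing {name}: copy it to the root of the build directory")

    # At least one of the accepted PDF directories must exist.
    if not any((root / d).is_dir() for d in PDF_DIR_NAMES):
        errors.append("missing PDF directory: expected one of " + ", ".join(PDF_DIR_NAMES))

    # Frontmatter and full proceedings are only reported, never fatal.
    for name in OPTIONAL_FILES:
        if not (root / name).is_file():
            warnings.append(f"{name} not found (fine if it was not built)")

    return errors, warnings
```

Calling `validate_build_dir("build")` returns two lists; an empty error list means the directory matches the assumed layout, and each error string names the likely fix.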
It seems many people are confused about who's going to do the building (them or us). Having a Makefile target ( |
Many workshops use "main" for the |
We need to add examples where people can list the SIG so it gets automatically back-linked.
We should build automatically using GitHub Actions, from a raw format (will add watermarks)
(Generally using this as a dumping ground for Anthology needs/wants, will sort out and clean up eventually)
While it's fresh (ACL 2023 ingestion) I want to make a few notes about the build format.
First, my understanding. I haven't used the software, but it seems that ACLPUB2 takes its input files and produces output in a build directory, similar to, say, cmake. I like this design. The output has the constructed book and watermarked PDFs, among other things. It also seems to create other directories; in addition to `build`, there is `output`, `output/inputs`, and sometimes some other directories, with some redundancy in files.

For ingestion, it would be helpful if you delivered to us a single build directory. It will help to know how ingestion works. We run ingest_aclpub2.py with the following syntax:
It looks in this directory for the files `papers.yml` and `conference_details.yml`, which contain all the necessary metadata. It then reads through the papers in order, assigning Anthology IDs starting with 1 (0 is reserved for frontmatter), and copying and renaming PDFs for upload.

Here is a list of problems that the format delivered to us creates.
- `build`, `output`, `output/inputs`, etc.
- `output/inputs/`.
- `papers.yml` that we read should have PDF and attachment links that are relative paths, so that we can read them directly with no hidden or hard-coded assumptions
- `build/front_matter.pdf` and `build/proceedings.pdf`?)

Here is what would be ideal for us:
- `build/`, or a new directory, say `anthology/`
- `papers.yml` and `conference_details.yml` should be copied to this directory's root
- `watermarked_pdfs` directory inside it. I think this already exists. Since it is part of the build, it could also just be called `pdfs/` (the idea being that this subdir contains "built" PDFs, i.e., watermarked ones)
- `attachments/` subdirectory.
- `front_matter.pdf` and `proceedings.pdf` (if built) will be in the build directory, too, so we can test for their presence
- `papers.yml` should be updated to be relative to the build directory, and ideally should be contained within the build directory

Here is an example layout that would work extremely well for us
Note that if PDFs were missing, we would still ingest the metadata. So this would allow us, for example, to introduce a three-stage ingestion:
This would just be for *ACL main conferences; workshops would have to stick to a single-stage ingest.
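To make the ingestion steps described above concrete, here is a minimal Python sketch of the ID assignment and path resolution. It is not the code in ingest_aclpub2.py. It assumes `papers.yml` has already been parsed into a list of dicts, and the `"file"` and `"title"` keys, along with the function name `plan_ingestion`, are hypothetical stand-ins for whatever the real metadata uses.

```python
from pathlib import Path

FRONTMATTER_ID = 0  # reserved for frontmatter, so papers start at 1


def plan_ingestion(papers, build_dir):
    """Assign Anthology IDs in order and resolve PDF paths.

    `papers` is the already-parsed content of papers.yml (a list of dicts).
    The "file" key is a hypothetical field holding a PDF path that is
    relative to `build_dir`.
    """
    root = Path(build_dir)
    plan = []
    for anthology_id, paper in enumerate(papers, start=1):
        rel = paper.get("file")
        pdf = root / rel if rel else None
        plan.append({
            "id": anthology_id,
            "title": paper.get("title"),
            # A missing PDF does not drop the entry: the metadata is kept
            # and the PDF can be attached in a later ingestion stage.
            "pdf": pdf if pdf is not None and pdf.is_file() else None,
        })
    return plan
```

Because every path is resolved against the build directory, the sketch only works if the links in `papers.yml` are relative, which is the point of the request above. Entries whose `"pdf"` is `None` are the ones a later stage would need to revisit.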