Ingestion: CCL 2024 #4535

Merged
merged 5 commits into from
Feb 3, 2025

@mjpost mjpost commented Jan 31, 2025

  1. In the GitHub sidebar, add the PR to work items and the current milestone
  2. In the GitHub sidebar, under "Development", make sure to link to the corresponding issue
  3. Make sure the branch is merged with the latest master branch
  4. Ensure that there are editors listed in the <meta> block
  5. For workshops, add a <venue>ws</venue> tag to its meta block
  6. For workshops, add a backlink from the main event's <event> block
  7. Add events to their relevant SIGs
  8. Look at the venue listing for prior years, and ensure that the new volume titles are consistent. You can do this by clicking on the venue name from a paper page, which will take you to the venue listing.
  9. Navigate to the event page preview (e.g., https://preview.aclanthology.org/icnlsp-ingestion/events/icnlsp-2021/), and page through, to see if there are any glaring mistakes
  10. Skim through the complete listing, looking for mis-parsed author names.
  11. Download the frontmatter and verify that the table of contents matches at least three randomly-selected papers
  12. Download 3–5 PDFs (including the first and last one) and make sure they are correct (title, authors, page numbers).
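
Checklist items 4 and 5 could be partly automated. The following is a minimal sketch, not part of the Anthology tooling: the function name `check_volume_meta` and the sample XML are hypothetical, and it assumes that editors appear as `<editor>` children of each volume's `<meta>` block and that venues appear as `<venue>` children there.

```python
import xml.etree.ElementTree as ET


def check_volume_meta(xml_text, is_workshop=False):
    """Return a list of problems found in each volume's <meta> block.

    Assumed layout (not confirmed by this PR): <volume> elements containing
    a <meta> child with <editor> and <venue> children.
    """
    problems = []
    root = ET.fromstring(xml_text)
    for volume in root.iter("volume"):
        vol_id = volume.get("id", "?")
        meta = volume.find("meta")
        if meta is None:
            problems.append(f"volume {vol_id}: no <meta> block")
            continue
        # Checklist item 4: at least one editor must be listed.
        if not meta.findall("editor"):
            problems.append(f"volume {vol_id}: no editors listed")
        # Checklist item 5: workshops need a <venue>ws</venue> tag.
        venues = [v.text for v in meta.findall("venue")]
        if is_workshop and "ws" not in venues:
            problems.append(f"volume {vol_id}: missing <venue>ws</venue>")
    return problems


# Hypothetical sample data for illustration only.
SAMPLE = """<collection id="demo">
  <volume id="1">
    <meta>
      <booktitle>Demo Proceedings</booktitle>
      <venue>demo</venue>
    </meta>
  </volume>
</collection>"""

for problem in check_volume_meta(SAMPLE, is_workshop=True):
    print(problem)
```

A check like this only covers the mechanical part of the checklist; the visual checks on the preview site and the PDFs still need to be done by hand.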

@mjpost mjpost linked an issue Jan 31, 2025 that may be closed by this pull request
2 tasks
@mjpost mjpost added this to the 2025Q1 milestone Jan 31, 2025
@mjpost mjpost commented Jan 31, 2025

For future reference: we encountered the same errors as in #2487. The ingestion process was:

  • Revert to commit 97d61a5
  • Manually create a new venv with python3.10
  • Downgrade pybtex to 0.22.2
  • Hack the imports to make pybtex work
  • Run the ingestion
  • Run the script bin/author_names_to_variants.py on the CCL file
  • Manually enter the metadata blocks, which were ignored because the book PDF wasn't found

We'll need a new ingestion format to handle name variants in the future. Per #4045, I suggest this be supported in aclpub2 alone.

Build successful. Some useful links:

This preview will be removed when the branch is merged.

@mjpost mjpost changed the title from "Ingest CCL 2024" to "Ingestion: CCL 2024" Jan 31, 2025
@mjpost mjpost self-assigned this Jan 31, 2025

@anthology-assist anthology-assist left a comment

LGTM!

</paper>
<paper id="4">
<title><fixed-case>C</fixed-case>hinese Frame Semantic Parsing Evaluation</title>
<author><first>Yang</first><last>Peiyuan</last></author>

Some authors' ZH names are missing.
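
One way to locate the affected entries is to list authors that have no Chinese-script name variant. This is a rough sketch under stated assumptions: it supposes that variants are stored as `<variant script="hani">` children of `<author>`, which this thread does not confirm, and the function name `authors_missing_variant` and the sample names are hypothetical.

```python
import xml.etree.ElementTree as ET


def authors_missing_variant(xml_text, script="hani"):
    """Return (paper id, author name) pairs for authors lacking a variant.

    Assumes variants are <variant script="..."> children of <author>.
    """
    root = ET.fromstring(xml_text)
    missing = []
    for paper in root.iter("paper"):
        for author in paper.findall("author"):
            has_variant = any(
                v.get("script") == script for v in author.findall("variant")
            )
            if not has_variant:
                first = author.findtext("first", default="")
                last = author.findtext("last", default="")
                missing.append((paper.get("id"), f"{first} {last}".strip()))
    return missing


# Hypothetical sample data for illustration only.
SAMPLE = """<collection id="demo">
  <volume id="1">
    <paper id="1">
      <title>Hypothetical Paper</title>
      <author><first>San</first><last>Zhang</last><variant script="hani"><first>三</first><last>张</last></variant></author>
      <author><first>Si</first><last>Li</last></author>
    </paper>
  </volume>
</collection>"""

for paper_id, name in authors_missing_variant(SAMPLE):
    print(f"paper {paper_id}: {name} has no ZH name variant")
```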

@mjpost mjpost merged commit 723825b into master Feb 3, 2025
4 checks passed
@mjpost mjpost deleted the ingest-ccl-2024 branch February 3, 2025 00:36
Development

Successfully merging this pull request may close these issues.

Ingestion Request [07-20-2024]: CCL 2024
2 participants