AutoPR for paper information correction #4147

Merged on Dec 27, 2024 (106 commits; the diff below shows changes from 101 commits)

Commits
aa79252
Auto pr for paper information correction (#4138)
yufei118liu Dec 10, 2024
f93544f
Auto pr (#4146)
yufei118liu Dec 11, 2024
6831fe1
Merge branch 'master' into autopr
mjpost Dec 11, 2024
6404262
Add new correction form and paper link to form
mjpost Dec 13, 2024
7cad0c6
black
mjpost Dec 13, 2024
4325c92
Define missing variable
mjpost Dec 13, 2024
8fc8bc3
Merge branch 'master' into autopr
mjpost Dec 13, 2024
afd9952
Add issue title; make id conditional
mjpost Dec 13, 2024
228162c
Merge remote-tracking branch 'origin/master' into autopr
mjpost Dec 13, 2024
5571e85
Remove non-working hidden template
mjpost Dec 13, 2024
755c471
Fix inline variable expansion
mjpost Dec 13, 2024
635222f
Correct template to JSON
mjpost Dec 13, 2024
ebfeafb
Partial bulk edit script
mjpost Dec 13, 2024
b727d5f
black
mjpost Dec 13, 2024
f17bacb
Remove first attempt
mjpost Dec 13, 2024
62b9787
Remove alert, fix string
mjpost Dec 13, 2024
949d051
Remove unused keys from hugo stub
mjpost Dec 17, 2024
6a6adfa
Add {title,booktitle,abstract}_raw
mjpost Dec 17, 2024
d2bb10b
Use raw fields in form
mjpost Dec 17, 2024
904fc44
Format authors as single lines
mjpost Dec 17, 2024
4017fbd
Remove variable
mjpost Dec 17, 2024
b2f37cc
black
mjpost Dec 17, 2024
490b352
Fix variable
mjpost Dec 17, 2024
4189ad5
Remove custom function
mjpost Dec 17, 2024
d711d8e
Update template
mjpost Dec 17, 2024
0d3d24d
Remove booktitle_raw
mjpost Dec 17, 2024
3a3a690
Revert ingestion requestion change
mjpost Dec 17, 2024
71703aa
black
mjpost Dec 17, 2024
514f3dc
Testing metadata correction locations
mjpost Dec 17, 2024
855b099
black
mjpost Dec 17, 2024
200bf64
Playing with correction link location
mjpost Dec 17, 2024
ee2da1e
black
mjpost Dec 17, 2024
f18f44d
Fiddling
mjpost Dec 17, 2024
450144c
Change button color
mjpost Dec 17, 2024
65161fe
Add metadata issue annotator
mjpost Dec 18, 2024
5409a8a
black
mjpost Dec 18, 2024
212c530
Loosen JSON check
mjpost Dec 18, 2024
01e45cc
Merge remote-tracking branch 'origin/master' into autopr
mjpost Dec 18, 2024
a58abce
Simplify workflow
mjpost Dec 18, 2024
5ac541e
Merge branch 'master' into autopr
mjpost Dec 18, 2024
a942394
Remove manual metadata correction template
mjpost Dec 18, 2024
1b14581
black
mjpost Dec 18, 2024
04cd465
core.setFailed -> console.log
mjpost Dec 18, 2024
f1cc6ea
Split into steps
mjpost Dec 18, 2024
257f431
Merge remote-tracking branch 'origin/master' into autopr
mjpost Dec 18, 2024
3837995
Add JSON block syntax
mjpost Dec 18, 2024
7d48671
Add LLM-based issue validation
mjpost Dec 19, 2024
9fe9644
Progress on bulk script
mjpost Dec 19, 2024
5955694
black
mjpost Dec 19, 2024
494194a
Refactoring a bit in preparation for the dialog
mjpost Dec 19, 2024
136fd8a
Added example dialog script
mjpost Dec 19, 2024
7b57813
First-pass LLM validator working
mjpost Dec 20, 2024
8ea1a2f
Add LLM metadata validator
mjpost Dec 20, 2024
7d89733
Merge branch 'llm-validator' into autopr
mjpost Dec 20, 2024
fdf9d7d
whitespace
mjpost Dec 20, 2024
64d5adf
Nice example but dragging doesn't work
mjpost Dec 20, 2024
d61e0f4
More finessing
mjpost Dec 21, 2024
6057899
Merge remote-tracking branch 'origin/master' into autopr
mjpost Dec 21, 2024
7e09025
Remove temp files
mjpost Dec 21, 2024
b150ed9
Restore dl
mjpost Dec 21, 2024
d28c334
Add code for display dialog
mjpost Dec 21, 2024
76239c2
Added missing styles
mjpost Dec 21, 2024
de2dbdc
Minor rewording
mjpost Dec 21, 2024
8f57839
black
mjpost Dec 21, 2024
265b01b
rename
mjpost Dec 21, 2024
fef5c2b
Restore additional entries
mjpost Dec 21, 2024
1eacc43
Add license header
mjpost Dec 21, 2024
a06a988
Add pen icon
mjpost Dec 21, 2024
e0de401
Remove duplicate authorsContainer
mjpost Dec 21, 2024
fafe50a
Use evidenced dialog closer
mjpost Dec 21, 2024
3940b9b
Added dummy static file used for layout testing
mjpost Dec 21, 2024
038fe39
black
mjpost Dec 21, 2024
30e0e7b
black
mjpost Dec 21, 2024
3f4aa3c
Move styling propers
mjpost Dec 22, 2024
1845849
Only submit changes
mjpost Dec 22, 2024
a87bc3e
Fiddle with responsiveness
mjpost Dec 25, 2024
89262c5
Try to fit drag control with first name
mjpost Dec 25, 2024
283aa96
Trying again
mjpost Dec 25, 2024
8e2bd47
Debugging grouping, add alerts
mjpost Dec 26, 2024
d2f55ed
More compact grabber
mjpost Dec 26, 2024
0902946
Fix spacing, remove alerts
mjpost Dec 26, 2024
2cfae96
Restore anth ID in title
mjpost Dec 26, 2024
c11e7c2
Update documentation
mjpost Dec 26, 2024
7692326
Bulk script works
mjpost Dec 26, 2024
25ce356
Remove existing authors
mjpost Dec 26, 2024
4b7db8b
Add --dry-run
mjpost Dec 26, 2024
2ce4c75
Add author node in correct place
mjpost Dec 26, 2024
29269e0
Properly handle updated subtags
mjpost Dec 26, 2024
f449cc7
black
mjpost Dec 26, 2024
37cd0b0
Merge branch 'master' into autopr
mjpost Dec 26, 2024
0253ffe
Add author IDs to comparison
mjpost Dec 26, 2024
b81970a
Preserve only explicit IDs
mjpost Dec 26, 2024
4252ded
Add flag to close old issues
mjpost Dec 27, 2024
fe50ccd
Update note
mjpost Dec 27, 2024
3099ce7
Remove test code
mjpost Dec 27, 2024
62181e2
Restore deleted lines
mjpost Dec 27, 2024
68b9c04
Added workflow to remove approval status
mjpost Dec 27, 2024
7e592c9
Look for "approved" label
mjpost Dec 27, 2024
85e9ce7
Cleanup
mjpost Dec 27, 2024
26b29e7
black
mjpost Dec 27, 2024
f6e173f
Use try block around each issue
mjpost Dec 27, 2024
f4c2d3e
clean up header
mjpost Dec 27, 2024
6503e48
Add missing anthology id key to javascript
mjpost Dec 27, 2024
50fe7f9
typos
mjpost Dec 27, 2024
5540e45
Fix XML file paths
mjpost Dec 27, 2024
599ef1e
Remove LLM checker
mjpost Dec 27, 2024

73 changes: 0 additions & 73 deletions .github/ISSUE_TEMPLATE/01-metadata-correction.yml

This file was deleted.

13 changes: 5 additions & 8 deletions .github/ISSUE_TEMPLATE/99-bulk-metadata-correction.yml
@@ -7,19 +7,16 @@ body:
  - type: markdown
    attributes:
      value: >
        This form is activated by following a link from each paper page in the Anthology (e.g., https://preview.aclanthology.org/autopr/K17-1003/). The form will list the title, abstract, and authors in JSON format, which you can manipulate to make corrections, such as adjusting a title, correcting an author name, adding a missing author, or reordering names.

        Please note to take care to preserve structure such as the `<fixed-case>` tag that is sometimes present in titles.

        **This form is not meant to be used manually.** Instead, it is activated by following the "Fix metadata" link from each paper page in the Anthology (e.g., https://aclanthology.org/K17-1003/). Clicking this button displays a UI tool for modifying the title, abstract, and author list. Submission of that form will automatically populate the field below.
  - type: markdown
    attributes:
      value: >
        Corrections will be processed in bulk on a weekly basis after verification by Anthology staff.
  - type: textarea
    id: data
    attributes:
      label: JSON code block
      description: Please make corrections below, taking care to preserve the JSON structure (e.g., no trailing commas at the end of lists). If you add an author, do not worry about the ID unless you know what it is.
    validations:
      required: true
  - type: markdown
    attributes:
      value: |
        **Note:** If you request an author name correction for yourself, please help ensure your name is correct for future publications by setting it correctly in Softconf/OpenReview.
        **Important:** If you request an author name correction for yourself, please help ensure your name is correct for future publications by setting it correctly in Softconf/OpenReview.

Copilot AI Dec 27, 2024

The phrase 'If you request an author name correction for yourself, please help ensure your name is correct for future publications by setting it correctly in Softconf/OpenReview.' should be 'If you request an author name correction for yourself, please ensure your name is correct for future publications by setting it correctly in Softconf/OpenReview.' for clarity.

Suggested change
**Important:** If you request an author name correction for yourself, please help ensure your name is correct for future publications by setting it correctly in Softconf/OpenReview.
**Important:** If you request an author name correction for yourself, please ensure your name is correct for future publications by setting it correctly in Softconf/OpenReview.

5 changes: 1 addition & 4 deletions .github/workflows/annotate-metadata-issue.yml
@@ -18,7 +18,6 @@ jobs:
             const hasRequiredLabels =
               labels.includes('correction') &&
               labels.includes('metadata');
-
             core.setOutput('has_required_labels', hasRequiredLabels.toString());

       - name: Parse JSON from issue body
@@ -56,9 +55,7 @@ jobs:
           script: |
             const anthology_id = core.getInput('anthology_id');
             const comment = `
-            Found ACL Anthology entry:
-
-            📄 Paper: https://aclanthology.org/${anthology_id}
+            Found ACL Anthology entry: https://aclanthology.org/${anthology_id}

             ![Thumbnail](https://aclanthology.org/thumb/${anthology_id}.jpg)
             `;
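
The annotator workflow above checks for both labels, parses JSON from the issue body, and posts a comment linking to the paper. The parsing step itself is not visible in this diff, so the following is only a minimal Python sketch of that flow under stated assumptions: the function names, the regular expression, and the payload field names other than `anthology_id` are hypothetical and are not taken from the workflow.

```python
import json
import re

FENCE = "`" * 3  # three backticks, built indirectly so this example nests cleanly


def has_required_labels(labels):
    """Mirror the workflow's check: both labels must be present."""
    return "correction" in labels and "metadata" in labels


def extract_json_block(body):
    """Return the first fenced JSON object in an issue body, or None (assumed format)."""
    fence = re.escape(FENCE)
    pattern = fence + r"(?:json)?\s*(\{.*?\})\s*" + fence
    match = re.search(pattern, body, flags=re.DOTALL)
    if match is None:
        return None
    try:
        return json.loads(match.group(1))
    except json.JSONDecodeError:
        return None


def build_comment(anthology_id):
    """Build the comment text in the new single-line form shown in the diff."""
    return (
        f"Found ACL Anthology entry: https://aclanthology.org/{anthology_id}\n\n"
        f"![Thumbnail](https://aclanthology.org/thumb/{anthology_id}.jpg)"
    )


# Hypothetical issue body; field names besides anthology_id are assumptions.
payload = {
    "anthology_id": "K17-1003",
    "title": "An example title",
    "authors": [{"first": "A", "last": "B"}],
}
body = (
    "### JSON code block\n\n"
    + FENCE + "json\n" + json.dumps(payload, indent=2) + "\n" + FENCE + "\n"
)

data = extract_json_block(body)
if has_required_labels(["correction", "metadata"]) and data is not None:
    print(build_comment(data["anthology_id"]))
```

Returning `None` on a parse failure, rather than raising, loosely matches the direction of the "core.setFailed -> console.log" commit, though the actual error handling in the workflow may differ.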
38 changes: 38 additions & 0 deletions .github/workflows/reset-metadata-approval.yml
@@ -0,0 +1,38 @@
name: Reset Approval on Edit

on:
  issues:
    types:
      - edited

jobs:
  reset-approval:
    if: contains(join(github.event.issue.labels.*.name, ','), 'correction') || contains(join(github.event.issue.labels.*.name, ','), 'metadata')
    runs-on: ubuntu-latest
    steps:
      - name: Leave a comment and reset approval status
        uses: actions/github-script@v6
        with:
          script: |
            const { issue, repository } = context.payload;
            const owner = repository.owner.login;
            const repo = repository.name;
            const approvedLabel = "approved";

            // Check if the issue has the "approved" label and remove it
            if (issue.labels.some(label => label.name === approvedLabel)) {
              await github.rest.issues.removeLabel({
                owner,
                repo,
                issue_number: issue.number,
                name: approvedLabel
              });

              // Add a comment to notify about the edit
              await github.rest.issues.createComment({
                owner,
                repo,
                issue_number: issue.number,
                body: "Approval status has been reset after the issue was edited."
              });
            }
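
To make the control flow of this workflow easier to follow, here is a small Python sketch of its decision logic only; it makes no API calls. The function names and the tuple-based "action plan" are hypothetical. The job-level condition joins the label names into one string and uses `contains`, which for strings is a case-insensitive substring match in GitHub Actions expressions, and it combines the two checks with `||`. That makes it looser than the annotator's check, which requires both labels exactly.

```python
APPROVED_LABEL = "approved"
RESET_COMMENT = "Approval status has been reset after the issue was edited."


def job_should_run(label_names):
    """Approximate the job-level `if`: substring match on the joined label names."""
    joined = ",".join(label_names).lower()
    return "correction" in joined or "metadata" in joined


def plan_reset(label_names):
    """Return the actions the script step would take, in order (hypothetical format)."""
    if not job_should_run(label_names):
        return []
    # The script compares label names exactly (case-sensitive) against "approved".
    if APPROVED_LABEL not in label_names:
        return []
    return [("remove_label", APPROVED_LABEL), ("create_comment", RESET_COMMENT)]


print(plan_reset(["correction", "metadata", "approved"]))
print(plan_reset(["correction", "metadata"]))
```

Note that the comment is only posted when the label was actually present, so editing an unapproved issue produces no notification.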
27 changes: 19 additions & 8 deletions bin/anthology/utils.py
@@ -508,14 +508,25 @@ def parse_element(
     return attrib


-def make_simple_element(tag, text=None, attrib=None, parent=None, namespaces=None):
-    """Convenience function to create an LXML node"""
-    el = (
-        etree.Element(tag, nsmap=namespaces)
-        if parent is None
-        else etree.SubElement(parent, tag)
-    )
-    if text:
+def make_simple_element(
+    tag, text=None, attrib=None, parent=None, sibling=None, namespaces=None
+):
+    """Convenience function to create an LXML node.
+
+    :param tag: the tag name
+    :param text: the text content of the node
+    :param attrib: a dictionary of attributes
+    :param parent: the parent node
+    :param sibling: if provided and found, the new node will be inserted after this node
+    """
+    el = etree.Element(tag, nsmap=namespaces)
+    if parent is not None:
+        if sibling is not None:
+            parent.insert(parent.index(sibling) + 1, el)
+        else:
+            parent.append(el)
+
+    if text is not None:
         el.text = str(text)
     if attrib:
         for key, value in attrib.items():
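
The two behavioral changes in this hunk are the new `sibling` parameter and the switch from `if text:` to `if text is not None:`. The sketch below illustrates both using the standard library's `xml.etree.ElementTree` instead of lxml, so it is an adaptation and not the repository's code. Two details are assumptions: the body of the attribute loop (cut off in the diff) is filled in with `el.set`, and a sibling that is not a child of `parent` falls back to appending. The lxml version in the diff calls `parent.index(sibling)`, which raises `ValueError` in that case.

```python
import xml.etree.ElementTree as ET


def make_simple_element(tag, text=None, attrib=None, parent=None, sibling=None):
    """Stdlib sketch: create a node, optionally placing it right after `sibling`."""
    el = ET.Element(tag)
    if parent is not None:
        children = list(parent)
        if sibling is not None and sibling in children:
            # Insert directly after the given sibling
            parent.insert(children.index(sibling) + 1, el)
        else:
            # No sibling given, or sibling not under parent (assumed fallback)
            parent.append(el)
    if text is not None:
        # `is not None` keeps falsy values such as 0 or "" instead of dropping them
        el.text = str(text)
    if attrib:
        for key, value in attrib.items():
            el.set(key, value)  # assumed loop body; not shown in the diff
    return el


paper = ET.fromstring(
    "<paper><title>T</title><author>A</author><abstract>X</abstract></paper>"
)
first_author = paper.find("author")
make_simple_element("author", text="B", parent=paper, sibling=first_author)
tags = [(child.tag, child.text) for child in paper]
print(tags)
```

Inserting after a sibling is what lets a new `<author>` node land next to the existing authors instead of at the end of the `<paper>` element, which is presumably the purpose of the "Add author node in correct place" commit.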
3 changes: 3 additions & 0 deletions bin/create_hugo_yaml.py
@@ -67,13 +67,15 @@ def export_anthology(anthology, outdir, clean=False, dryrun=False):
         if paper.parent_volume.ingest_date:
             data["ingest_date"] = paper.parent_volume.ingest_date
         data["title_html"] = paper.get_title("html")
+        data["title_raw"] = paper.get_title("xml")
         if "xml_title" in data:
             del data["xml_title"]
         if "xml_booktitle" in data:
             data["booktitle_html"] = paper.get_booktitle("html")
             del data["xml_booktitle"]
         if "xml_abstract" in data:
             data["abstract_html"] = paper.get_abstract("html")
+            data["abstract_raw"] = paper.get_abstract("xml")
             del data["xml_abstract"]
         if "xml_url" in data:
             del data["xml_url"]
@@ -138,6 +140,7 @@ def export_anthology(anthology, outdir, clean=False, dryrun=False):
         log.debug("export_anthology: processing volume '{}'".format(id_))
         data = volume.as_dict()
         data["title_html"] = volume.get_title("html")
+        data["title_raw"] = volume.get_title("xml")
         del data["xml_booktitle"]
         if "xml_abstract" in data:
             del data["xml_abstract"]
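
The new `title_raw` and `abstract_raw` fields export the XML form of each field alongside the HTML form, which is what allows the correction form to keep markup such as `<fixed-case>` intact. As a rough illustration of the difference between a raw and a plain-text rendering, here is a stdlib sketch; `inner_xml` and `plain_text` are hypothetical helpers and do not reflect how `get_title("xml")` is implemented in the Anthology library.

```python
import xml.etree.ElementTree as ET


def inner_xml(element):
    """Return the element's content with child markup preserved (hypothetical helper)."""
    serialized = ET.tostring(element, encoding="unicode")
    if serialized.endswith("/>"):
        return ""  # self-closing element has no content
    # Strip the outer opening and closing tags, keeping everything in between
    return serialized[serialized.index(">") + 1 : serialized.rindex("</")]


def plain_text(element):
    """Return the text content with all markup removed."""
    return "".join(element.itertext())


title = ET.fromstring(
    "<title>Parsing with <fixed-case>BERT</fixed-case> models</title>"
)
raw = inner_xml(title)
text = plain_text(title)
print(raw)
print(text)
```

A form that round-trips `raw` can write the corrected title back without losing the casing protection, whereas a form built on `text` would drop it.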