How to evaluate commonsense locality? #424

Open
lliutianc opened this issue Nov 16, 2024 · 4 comments
Labels
question Further information is requested

Comments

@lliutianc

Hi,

Thanks for maintaining the repo!

After reading through the code and your paper, "Editing Large Language Models: Problems, Methods, and Opportunities", I am not sure how to evaluate the locality results shown in Table 4 of the paper. The relevant data appears to be under "locality", but I couldn't find an example of how to use it properly. Could you share a minimal example?

@zxlzr zxlzr added the question Further information is requested label Nov 16, 2024
@littlefive5
Collaborator

littlefive5 commented Nov 17, 2024

Hi there, you can find it in Appendix B.3.3. For the computation, we combine the question with each choice as the input, compute the loss for each choice, and select the choice with the minimum loss as the answer.
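A minimal sketch of the selection rule described above, not the repo's actual implementation. It assumes the loss is the mean negative log-likelihood over the scored tokens; whether the question tokens are included in the loss is not stated in this thread. `score_fn` is a hypothetical callable that returns per-token log-probabilities for a text (in practice it would wrap a language model), and `toy_score_fn` is a stand-in so the example runs without a model.

```python
import math
from typing import Callable, List, Sequence, Tuple


def mean_nll(token_logprobs: Sequence[float]) -> float:
    """Mean negative log-likelihood over the scored tokens."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return -sum(token_logprobs) / len(token_logprobs)


def perplexity(token_logprobs: Sequence[float]) -> float:
    """PPL = exp(mean NLL); monotonic in the loss, so the ranking is the same."""
    return math.exp(mean_nll(token_logprobs))


def pick_choice(
    question: str,
    choices: Sequence[str],
    score_fn: Callable[[str], Sequence[float]],  # hypothetical scorer
) -> Tuple[int, List[float]]:
    """Return the index of the choice with the lowest loss, plus all losses."""
    if not choices:
        raise ValueError("need at least one choice")
    # Combine the question with each choice and score the combined text.
    losses = [mean_nll(score_fn(f"{question} {choice}")) for choice in choices]
    best = min(range(len(choices)), key=losses.__getitem__)
    return best, losses


def toy_score_fn(text: str) -> List[float]:
    """Stand-in scorer (assumption): favors texts ending in 'water'."""
    return [-0.1, -0.2] if text.endswith("water") else [-2.0, -3.0]


best, losses = pick_choice("Where do fish live?", ["sand", "water", "sky"], toy_score_fn)
print(best, losses)
```

Because perplexity is the exponential of the mean loss, picking the minimum loss and picking the minimum PPL give the same answer under this definition.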

@zxlzr
Contributor

zxlzr commented Nov 17, 2024

Hi buddy, do you have any further questions?

@lliutianc
Author

Thanks for your answer. By loss, do you mean PPL (perplexity)? Also, are distracting neighbor and other attribution computed the same way?

@littlefive5
Collaborator

Yes, PPL.
However, distracting neighbor and other attribution are computed with the token-level exact metric, which our evaluation code can calculate.
This means you can use our code directly to get the distracting neighbor and other attribution results, but you need to evaluate the reasoning task on your own.
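For illustration only, here is one way a token-level exact metric could be computed. This is an assumption about what the metric means (position-wise agreement between two token-id sequences, for example the model's predictions before and after an edit), not a copy of the repo's evaluation code. The function name `token_match_rate` and the example ids are hypothetical.

```python
from typing import Sequence


def token_match_rate(reference_ids: Sequence[int], predicted_ids: Sequence[int]) -> float:
    """Fraction of positions where the two token-id sequences agree exactly."""
    if len(reference_ids) != len(predicted_ids):
        raise ValueError("sequences must have the same length")
    if not reference_ids:
        raise ValueError("sequences must be non-empty")
    matches = sum(1 for ref, pred in zip(reference_ids, predicted_ids) if ref == pred)
    return matches / len(reference_ids)


# Hypothetical token ids: three of the four positions agree.
print(token_match_rate([11, 42, 7, 99], [11, 42, 8, 99]))
```

Unlike the multiple-choice case, this kind of metric needs no loss comparison across candidates; it only checks whether the predicted tokens are unchanged.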
