How to evaluate commonsense locality? #424

lliutianc · 2024-11-16T03:00:31Z

Hi,

Thanks for maintaining the repo!

After reading through the code and your paper "Editing Large Language Models: Problems, Methods, and Opportunities", I am not sure how to evaluate the locality results shown in Table 4 of the paper. The relevant dataset seems to be "locality", but I couldn't find an example showing how to use it properly. Could you share a minimal example?

littlefive5 · 2024-11-17T06:04:17Z

Hi there, you can find the details in Appendix B.3.3. For the computation, we combine the question and each choice as the input, compute the loss for each choice, and select the choice with the minimum loss as the answer.
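A minimal sketch of the selection rule described above, assuming you already have a function that returns the log-probability the model assigns to each token of a choice when conditioned on the question. The names `choice_loss`, `pick_answer`, `accuracy`, and the `TokenLogProbs` interface are hypothetical, not part of the repo, and averaging the loss over only the choice tokens is an assumption (the loss could also be taken over the whole question+choice sequence). Since perplexity is exp(mean loss), picking the minimum loss also picks the minimum-perplexity choice.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical interface (assumption): given (question, choice), return the
# natural-log probability of each choice token, conditioned on the question.
# With a causal LM this would come from log_softmax over the logits at the
# choice-token positions.
TokenLogProbs = Callable[[str, str], Sequence[float]]


def choice_loss(question: str, choice: str, token_logprobs: TokenLogProbs) -> float:
    """Mean negative log-likelihood (cross-entropy loss) over the choice tokens."""
    logps = token_logprobs(question, choice)
    if len(logps) == 0:
        raise ValueError("choice produced no tokens")
    return -sum(logps) / len(logps)


def choice_ppl(question: str, choice: str, token_logprobs: TokenLogProbs) -> float:
    """Perplexity is exp(mean NLL), so it ranks choices the same way as the loss."""
    return math.exp(choice_loss(question, choice, token_logprobs))


def pick_answer(question: str, choices: Sequence[str], token_logprobs: TokenLogProbs) -> int:
    """Index of the choice with the minimum loss (ties go to the earliest choice)."""
    losses = [choice_loss(question, c, token_logprobs) for c in choices]
    return min(range(len(choices)), key=lambda i: losses[i])


def accuracy(examples: List[Tuple[str, List[str], int]], token_logprobs: TokenLogProbs) -> float:
    """Fraction of (question, choices, gold_index) examples answered correctly."""
    correct = sum(pick_answer(q, cs, token_logprobs) == gold for q, cs, gold in examples)
    return correct / len(examples)


# Toy stand-in for a real model: each whitespace-separated word is one "token";
# words in FAVORED get probability 0.9, all other words get 0.1.
FAVORED = {"water", "sunlight"}


def toy_logprobs(question: str, choice: str) -> List[float]:
    return [math.log(0.9) if w in FAVORED else math.log(0.1) for w in choice.split()]


examples = [
    ("What do plants need to grow?", ["sand", "water"], 1),
    ("What do plants use to make food?", ["sunlight", "rocks"], 0),
]
print(pick_answer(*examples[0][:2], toy_logprobs))  # → 1
print(accuracy(examples, toy_logprobs))  # → 1.0
```

With a real model, the delicate part is the boundary between question and choice tokens: tokenizing the concatenated string once and masking out the question positions is usually safer than tokenizing the two parts separately.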

zxlzr · 2024-11-17T11:13:20Z

Hi buddy, do you have any further questions?

lliutianc · 2024-11-17T23:23:23Z

Thanks for your answer. By loss, do you mean perplexity (PPL)? Also, were the distracting neighbor and other attribution results computed with the same logic?

littlefive5 · 2024-11-18T02:41:33Z

Yes, PPL.
However, distracting neighbor and other attribution are computed with a token-level exact-match metric, which our evaluation code already calculates.
This means you can use our code directly to get the distracting neighbor and other attribution results, but you will need to evaluate the reasoning task on your own.
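For the token-level metric, here is a minimal sketch under our own assumptions: the model is run with teacher forcing on the reference answer, the argmax prediction at each answer position is compared with the reference token, and the score is the fraction of matching positions. The helper names are hypothetical, and the repo's evaluation code remains the authoritative implementation; this thread does not specify, for example, which reference tokens it compares against.

```python
from typing import List, Sequence


def greedy_predictions(logits: Sequence[Sequence[float]]) -> List[int]:
    """Argmax token id at each target position (logits[i] = vocab scores for position i)."""
    return [max(range(len(row)), key=lambda j: row[j]) for row in logits]


def token_accuracy(pred_ids: Sequence[int], target_ids: Sequence[int]) -> float:
    """Fraction of target positions where the predicted token equals the reference token."""
    if len(pred_ids) != len(target_ids) or len(target_ids) == 0:
        raise ValueError("predictions and targets must be non-empty and of equal length")
    return sum(p == t for p, t in zip(pred_ids, target_ids)) / len(target_ids)


# Toy example: 3 target positions over a 3-token vocabulary.
logits = [
    [0.1, 0.2, 0.9],  # argmax -> 2
    [0.8, 0.1, 0.1],  # argmax -> 0
    [0.0, 0.3, 0.7],  # argmax -> 2
]
target = [2, 0, 1]
preds = greedy_predictions(logits)
print(preds)  # → [2, 0, 2]
print(token_accuracy(preds, target))
```

A per-example score of 1.0 means every answer token matched exactly; averaging these scores over the dataset gives a single number per metric.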

zxlzr added the "question" label (Further information is requested) on Nov 16, 2024
