SyntaxGym redundant suites? #184

mschrimpf · 2023-05-23T00:41:43Z

Is there a reason for both specifying the GitHub links and keeping the file contents themselves for the test suites?

jimn2 · 2023-05-23T01:09:35Z

This is a result of Jon G and me doing it two different ways. Jon stored the file contents locally in the BrainScore repo and used them directly, but that struck me as not very flexible, so I later set it up to read the data externally using URLs. I left both in, figuring that other users would then have examples of how to do it either way. Technically, though, one method or the other could be removed.
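For readers comparing the two approaches, here is a minimal sketch of a loader that accepts either a local file path or a URL. The function name `load_suite` and the assumption that a suite is a single JSON file are hypothetical and for illustration only; this is not the code in the repo.

```python
import json
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlopen


def load_suite(source: str) -> dict:
    """Load a JSON test suite from a local path or an http(s) URL.

    Hypothetical helper: the real benchmark code may load suites differently.
    """
    if urlparse(source).scheme in ("http", "https"):
        # Remote case: fetch the file over the network and parse it.
        with urlopen(source) as response:
            return json.load(response)
    # Local case: read the file stored alongside the code.
    return json.loads(Path(source).read_text(encoding="utf-8"))
```

With one entry point like this, callers do not need to know where the data lives, which is one way to keep a single consistent code path whichever storage option is kept.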

mschrimpf · 2023-05-23T01:26:32Z

Which one would you say is preferable? I think it would be clearer to have one consistent way. We can always re-introduce the secondary option if necessary, but right now this feels like YAGNI.

jimn2 · 2023-05-23T01:30:48Z

There are advantages and disadvantages to both ways, but my gut feeling was that using URLs is more general and also avoids the possible headache of somebody trying to store huge amounts of data in our repo.

jimn2 · 2023-05-23T10:36:36Z

I thought about this some more, and I think if we are only going to keep one, we should keep the one that best matches how most of the other benchmarks work. Certainly storing the data locally is simpler, and accessing external data is an added feature that maybe we ain't gonna need.

How is the data stored/accessed with most of the other benchmarks? If you let me know, I'll fix this today.

mschrimpf · 2023-05-23T14:52:35Z

Mostly the data is on S3 or some server, so the URL access you built is likely most aligned with that, especially because you're already pointing to a specific version of the data, so the files are not going to change on us. Although maybe we want to add some checksums to verify integrity?
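As a rough illustration of the checksum idea, the sketch below verifies downloaded bytes against a pinned SHA-256 digest. The helper names `sha256_of` and `verify_checksum` and the sample bytes are hypothetical; nothing like this is confirmed to exist in the repo.

```python
import hashlib


def sha256_of(data: bytes) -> str:
    """Return the hex-encoded SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def verify_checksum(data: bytes, expected_sha256: str) -> bytes:
    """Return data unchanged if its digest matches, otherwise raise ValueError."""
    actual = sha256_of(data)
    if actual != expected_sha256:
        raise ValueError(
            f"checksum mismatch: expected {expected_sha256}, got {actual}"
        )
    return data


# Usage with made-up suite contents: pin the digest once, then check every load.
suite_bytes = b'{"meta": {"name": "example_suite"}}'
pinned_digest = sha256_of(suite_bytes)
verify_checksum(suite_bytes, pinned_digest)  # passes silently
```

A digest check of this kind flags any byte-level change to the file, independent of whether the benchmark score changes.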

jimn2 · 2023-05-23T15:09:12Z

Aren't the specific values given in test_integration.py essentially acting as checks on these data files being changed? (I know it's possible to cleverly change those files and still get the same score, but it's unlikely.)

mschrimpf · 2023-05-23T17:44:53Z

true!

jimn2 · 2023-05-23T18:05:15Z

Thought about this a little more. The way the SyntaxGym benchmarks work, the resulting scores are very discrete and not unique. You can definitely change multiple tokens and still end up with the same score. So no matter what kind of checksums we add, there will almost always be the possibility of a slightly changed data file not being caught. So what's there right now in test_integration.py is about as good as we probably want to do (anything more effective would cost a lot).
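To make the "discrete and not unique" point concrete, here is a toy sketch. It assumes the score is the fraction of items whose prediction holds (here: the ungrammatical condition has higher surprisal than the grammatical one). The function `suite_score` and all surprisal values are invented for illustration; they are not taken from the actual suites or benchmark code.

```python
def suite_score(items):
    """Fraction of items where ungrammatical surprisal exceeds grammatical surprisal.

    Each item is a (grammatical, ungrammatical) surprisal pair; values are made up.
    """
    passed = [ungrammatical > grammatical for grammatical, ungrammatical in items]
    return sum(passed) / len(passed)


# Hypothetical surprisals before and after some tokens in the data file changed.
original = [(3.1, 5.4), (4.0, 3.2), (2.5, 6.0), (5.5, 5.1)]
altered = [(3.6, 4.9), (4.4, 2.0), (2.5, 7.3), (6.0, 5.9)]

print(suite_score(original))  # 0.5
print(suite_score(altered))   # 0.5
```

Because only the direction of each comparison matters, many different data files map to the same score, so an expected-score test cannot detect every change.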

mschrimpf assigned jimn2 May 23, 2023

mschrimpf added the "question" label (Further information is requested) May 23, 2023
