Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Questions about unique_entries #18

Open
stevencarlislewalker opened this issue Oct 25, 2022 · 9 comments
Open

Questions about unique_entries #18

stevencarlislewalker opened this issue Oct 25, 2022 · 9 comments
Assignees

Comments

@stevencarlislewalker
Copy link
Member

stevencarlislewalker commented Oct 25, 2022

Should the unlisting be recursive?

Nested fields can generate lists that are too flat (maybe?). For example,

> iidda:::unique_entries(iidda.api::ops$metadata(), "titles")
  [1] "Canadian Weekly CDI by Province in 1958"                                                 
  [2] "en"                                                                                      
  [3] "Canadian Weekly CDI by Province in 1957"                                                 
  [4] "en"                                                                                      
  [5] "Canadian Weekly CDI by Province in 1956"                                                 
  [6] "en"               

Verify that there is no missing export

I expected to be able to use this function the unique_entries function with :::, but it was not exported.

Is this intentional @mikeroswell ?

@stevencarlislewalker stevencarlislewalker changed the title Verrify that there is no missing export Questions about unique_entries Oct 25, 2022
@mikeroswell
Copy link
Collaborator

Looks like the "too flat" thing is a bug that I can fix... good catch.

I'm not set on exporting or not, but it seemed to me like unique_entries was a little utility function that a downstream user wouldn't probably want to work with... actually, my guess is you have that function somewhere already and I should call the existing one... do you have some kind of "custom unlist" function for working with those metadata already?

Anyways, in my little mental schema, unique_field_entries was the wrapper of the unlister and ops$metadata that I imagined a user would interact with directly when querying the api, but happy to call your unlister or fix mine, and happy to export if you think it's something someone other than us would use.

@stevencarlislewalker
Copy link
Member Author

This all makes sense, thanks @mikeroswell. I have no specific unlister in mind, so please just fix yours.

@mikeroswell
Copy link
Collaborator

Ok! I found yours in data_frame_tools.R in a function called get_unique_col_values (which I assume works fine, but i'll test). I'll pull that out, test it, and then call it in both get_unique_col_values and also in unique_field_entries.

@stevencarlislewalker
Copy link
Member Author

Excellent

@mikeroswell
Copy link
Collaborator

yikes, the metadata are a bit more complicated than the columns! What would you want behavior to be? For your e.g., would you want this function to return c("title", "lang")? Some structured list like list(lang = c("en", "fr"), title = c("my first title", "my second title", etc.)), or what it gives now (all the tokens anywhere inside the top level field(s) matching your search string)?

@stevencarlislewalker
Copy link
Member Author

I don't know.

Indeed the structure of the metadata is nested with no restrictions on depth, other than those described technically here: https://github.com/datacite/schema/blob/master/source/json/kernel-4.3/datacite_4.3_schema.json. In fact we actually validate against this schema, which ensures that we conform to the spec perfectly.

I think that we should return what the API returns. However, we could change what the API returns if we think it is not useful. The existing return value that you describe is the result of an automatically generated jq query that we thought might be useful, but we could indeed change it. The user is currently free to issue their own jq query but I don't think that we should try to wrap jq in R (unless perhaps if something already exists).

I think that the answers to all of this will depend on what is to be done with the results. For example, if it is just to learn about what is out there then I think we can just leave the return value as it is. If it is to feed into a subsequent query then we might want to process it first perhaps ... or perhaps not. It might make sense to chat.

@mikeroswell
Copy link
Collaborator

Propose (unless something you're currently working on depends on resolving this) we table for now and revisit when we have use cases in mind. I can't chat today but we're scheduled for tomorrow a.m.

@stevencarlislewalker
Copy link
Member Author

I couldn't agree more. Talk tomorrow.

@stevencarlislewalker
Copy link
Member Author

I couldn't agree more. Talk tomorrow.

It turned out that your results @mikeroswell were more interesting and so didn't talk about this 'tomorrow'. But let's talk about it the next time we meet.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants