Questions about unique_entries #18

stevencarlislewalker · 2022-10-25T19:36:20Z

Should the unlisting be recursive?

Nested fields can generate lists that are too flat (maybe?). For example,

> iidda:::unique_entries(iidda.api::ops$metadata(), "titles")
  [1] "Canadian Weekly CDI by Province in 1958"                                                 
  [2] "en"                                                                                      
  [3] "Canadian Weekly CDI by Province in 1957"                                                 
  [4] "en"                                                                                      
  [5] "Canadian Weekly CDI by Province in 1956"                                                 
  [6] "en"

Verify that there is no missing export

I expected to be able to use this function the unique_entries function with :::, but it was not exported.

iidda-tools/R/iidda/R/regex_query_tools.R

Line 71 in 88b4781

Is this intentional @mikeroswell ?

The text was updated successfully, but these errors were encountered:

mikeroswell · 2022-10-25T20:01:25Z

Looks like the "too flat" thing is a bug that I can fix... good catch.

I'm not set on exporting or not, but it seemed to me like unique_entries was a little utility function that a downstream user wouldn't probably want to work with... actually, my guess is you have that function somewhere already and I should call the existing one... do you have some kind of "custom unlist" function for working with those metadata already?

Anyways, in my little mental schema, unique_field_entries was the wrapper of the unlister and ops$metadata that I imagined a user would interact with directly when querying the api, but happy to call your unlister or fix mine, and happy to export if you think it's something someone other than us would use.

stevencarlislewalker · 2022-10-25T20:03:30Z

This all makes sense, thanks @mikeroswell. I have no specific unlister in mind, so please just fix yours.

mikeroswell · 2022-10-25T20:14:53Z

Ok! I found yours in data_frame_tools.R in a function called get_unique_col_values (which I assume works fine, but i'll test). I'll pull that out, test it, and then call it in both get_unique_col_values and also in unique_field_entries.

stevencarlislewalker · 2022-10-25T20:15:45Z

Excellent

mikeroswell · 2022-10-26T14:52:54Z

yikes, the metadata are a bit more complicated than the columns! What would you want behavior to be? For your e.g., would you want this function to return c("title", "lang")? Some structured list like list(lang = c("en", "fr"), title = c("my first title", "my second title", etc.)), or what it gives now (all the tokens anywhere inside the top level field(s) matching your search string)?

stevencarlislewalker · 2022-10-26T15:06:34Z

I don't know.

Indeed the structure of the metadata is nested with no restrictions on depth, other than those described technically here: https://github.com/datacite/schema/blob/master/source/json/kernel-4.3/datacite_4.3_schema.json. In fact we actually validate against this schema, which ensures that we conform to the spec perfectly.

I think that we should return what the API returns. However, we could change what the API returns if we think it is not useful. The existing return value that you describe is the result of an automatically generated jq query that we thought might be useful, but we could indeed change it. The user is currently free to issue their own jq query but I don't think that we should try to wrap jq in R (unless perhaps if something already exists).

I think that the answers to all of this will depend on what is to be done with the results. For example, if it is just to learn about what is out there then I think we can just leave the return value as it is. If it is to feed into a subsequent query then we might want to process it first perhaps ... or perhaps not. It might make sense to chat.

mikeroswell · 2022-10-26T15:11:46Z

Propose (unless something you're currently working on depends on resolving this) we table for now and revisit when we have use cases in mind. I can't chat today but we're scheduled for tomorrow a.m.

stevencarlislewalker · 2022-10-26T15:14:15Z

I couldn't agree more. Talk tomorrow.

stevencarlislewalker · 2022-10-28T15:29:10Z

I couldn't agree more. Talk tomorrow.

It turned out that your results @mikeroswell were more interesting and so didn't talk about this 'tomorrow'. But let's talk about it the next time we meet.

stevencarlislewalker assigned mikeroswell Oct 25, 2022

stevencarlislewalker changed the title ~~Verrify that there is no missing export~~ Questions about unique_entries Oct 25, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Questions about unique_entries #18

Questions about unique_entries #18

stevencarlislewalker commented Oct 25, 2022 •

edited

Loading

mikeroswell commented Oct 25, 2022

stevencarlislewalker commented Oct 25, 2022

mikeroswell commented Oct 25, 2022

stevencarlislewalker commented Oct 25, 2022

mikeroswell commented Oct 26, 2022

stevencarlislewalker commented Oct 26, 2022

mikeroswell commented Oct 26, 2022

stevencarlislewalker commented Oct 26, 2022

stevencarlislewalker commented Oct 28, 2022

Questions about unique_entries #18

Questions about unique_entries #18

Comments

stevencarlislewalker commented Oct 25, 2022 • edited Loading

Should the unlisting be recursive?

Verify that there is no missing export

mikeroswell commented Oct 25, 2022

stevencarlislewalker commented Oct 25, 2022

mikeroswell commented Oct 25, 2022

stevencarlislewalker commented Oct 25, 2022

mikeroswell commented Oct 26, 2022

stevencarlislewalker commented Oct 26, 2022

mikeroswell commented Oct 26, 2022

stevencarlislewalker commented Oct 26, 2022

stevencarlislewalker commented Oct 28, 2022

stevencarlislewalker commented Oct 25, 2022 •

edited

Loading