Restrict Wikidata reconciliation queries by item type #101
Hi, David. Thank you very much for your report. The bug you found is indeed critical. First, let me briefly explain how QIDs are fetched using the Wikidata reconciliation service:
The reconciliation service returns a series of candidate QIDs, each with a matching score. If there is an exact match, the QID is saved to the item. (Whether a candidate counts as an exact match is decided by the reconciliation service; its documentation is not explicit about how, so I would have to re-check its source code to be sure.) If there are only partial matches, the user may be asked to choose among them (in non-batch operations only). Restricting matches to items of the same item type sounds like a great suggestion, so I will make it the subject of this issue. In addition, including author names and publication date in the reconciliation query may also help catch these mismatches. I had tried this before, but it didn't work as expected; apparently because of a bug. Now that the bug seems to have been solved, I'll try again (#103). Finally, there is yet another case related to the bug you reported: book chapters in Zotero that carry the ISBN of the book they belong to. Right now this ISBN is included in the query's
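A minimal sketch of the decision logic described above, assuming the service's response follows the W3C Reconciliation API shape, where each candidate carries an `id`, a `score` and a boolean `match` flag set by the service. `Candidate` and `choose_qid` are hypothetical names for illustration, not Cita's actual code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    qid: str
    score: float
    match: bool  # True when the reconciliation service flags an exact match


def choose_qid(
    candidates: list[Candidate], batch: bool
) -> tuple[Optional[str], list[Candidate]]:
    """Return (QID to save, candidates to suggest to the user)."""
    exact = [c for c in candidates if c.match]
    if exact:
        # The service decided this is an exact match: save it directly.
        return max(exact, key=lambda c: c.score).qid, []
    if batch:
        # Batch operations never prompt the user, so nothing is saved.
        return None, []
    # Non-batch: offer the partial matches, best score first.
    return None, sorted(candidates, key=lambda c: c.score, reverse=True)
```

Note that nothing in this flow looks at the candidate's type, which is how a journal can be accepted for a book chapter with the same title.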
Regarding the mapping of item types: I think the mapping can be built straightforwardly (following the mapping in Zotkat), with one exception: a conference paper may appear as a conference paper, as a journal article, or as a book chapter (if the proceedings are published as a serial volume or as a book, which often happens). So in that case it would be good to allow all three item types for a possible Wikidata match.
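The mapping with the conference-paper exception could look roughly like this. The Zotero type names (`journalArticle`, `bookSection`, `conferencePaper`, `book`) are Zotero's own; the Wikidata side uses plain labels as placeholders, since a real implementation would use the class QIDs (e.g. taken from Zotkat). `ALLOWED_WIKIDATA_TYPES` and `type_matches` are hypothetical names.

```python
# Hypothetical mapping from Zotero item types to the Wikidata types a match may have.
# Labels are placeholders for the corresponding Wikidata class QIDs.
ALLOWED_WIKIDATA_TYPES: dict[str, set[str]] = {
    "journalArticle": {"journal article"},
    "bookSection": {"book chapter"},
    "book": {"book"},
    # Proceedings may be published as a serial volume or as a book,
    # so a conference paper may be catalogued under any of these.
    "conferencePaper": {"conference paper", "journal article", "book chapter"},
}


def type_matches(zotero_type: str, wikidata_type: str) -> bool:
    """Check whether a candidate's Wikidata type is acceptable for a Zotero item."""
    allowed = ALLOWED_WIKIDATA_TYPES.get(zotero_type)
    if allowed is None:
        # Unmapped Zotero type: do not reject candidates on type alone.
        return True
    return wikidata_type in allowed
```

Letting unmapped types through is one possible design choice; rejecting them instead would be stricter but would block matches for every item type not yet mapped.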
Hi, @dlindem! Sorry I couldn't reply to your message sooner. I'll take your suggestion about conference papers into account when I have time to fix this. Do you know whether a conference paper published as a conference paper, a journal article or a book chapter should be treated as separate items in Wikidata? I wonder if this relates to what was discussed here about whether a preprint and the final article should have different items in Wikidata (I think they should). By analogy, if Zotero had a separate "preprint" item type (it doesn't, but I'm using it as an analogy to the conference paper case), and a user had a preprint item in their library for which Wikidata reconciliation returned the corresponding final article, I wouldn't treat that as an exact match, but as a partial match (i.e., a suggestion), as described in my reply above.
I'm afraid that wouldn't work, because of how the reconciliation service works. A book chapter in Wikidata doesn't seem to have a reference to the ISBN of the book it belongs to. Rather, it has a "part of" statement pointing to a "version, edition, translation" item (hopefully), which in turn has an ISBN.
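To see why an ISBN in the query cannot match a chapter directly, here is a toy, in-memory stand-in for the Wikidata statements involved. `P361` ("part of") and `P212` ("ISBN-13") are real Wikidata properties; the item IDs and the ISBN value are placeholders, and the helper functions are hypothetical.

```python
# Toy stand-in for Wikidata statements; item IDs and the ISBN are placeholders.
PART_OF = "P361"   # Wikidata "part of"
ISBN_13 = "P212"   # Wikidata "ISBN-13"

STATEMENTS: dict[str, dict[str, list[str]]] = {
    "Q_chapter": {PART_OF: ["Q_edition"]},        # chapter -> part of -> edition
    "Q_edition": {ISBN_13: ["<volume ISBN>"]},    # only the edition carries the ISBN
}


def direct_isbns(item: str) -> list[str]:
    """ISBNs stated on the item itself (what a one-hop query can see)."""
    return STATEMENTS.get(item, {}).get(ISBN_13, [])


def isbns_via_part_of(item: str) -> list[str]:
    """ISBNs reached by following "part of" one hop, then reading the ISBN."""
    result: list[str] = []
    for parent in STATEMENTS.get(item, {}).get(PART_OF, []):
        result.extend(direct_isbns(parent))
    return result
```

A property-based reconciliation query only compares statements on the candidate itself (`direct_isbns`), so the chapter's ISBN is never found there; reaching it requires the extra hop.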
That's a good idea. I can have Cita store these in the Extra field, just as it does with QIDs and OCCs (#109).
Non-strict (partial) type matching is not yet supported by openrefine-wikibase (wetneb/openrefine-wikibase/issues/4). As a workaround, I will send two requests to this API: the first with a specific item type, and the second (if the first returns no exact matches) with a more general item type. Any matches (exact or not) returned by this second request will be treated as partial matches. In addition, addressing #52 would also help users decide whether a partial match refers to their item or not.
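The two-request workaround can be sketched as follows. `reconcile` is a hypothetical callable standing in for one request to the reconciliation API, assumed to return `(qid, is_exact)` pairs; the type arguments would be Wikidata class QIDs in practice.

```python
from typing import Callable, Optional

# Hypothetical signature: reconcile(title, type) -> list of (qid, is_exact) pairs.
Reconcile = Callable[[str, str], list[tuple[str, bool]]]


def reconcile_with_fallback(
    title: str, specific_type: str, general_type: str, reconcile: Reconcile
) -> tuple[Optional[str], list[str]]:
    """Return (exact QID or None, QIDs to offer as partial matches)."""
    first = reconcile(title, specific_type)
    for qid, is_exact in first:
        if is_exact:
            return qid, []  # exact match for the specific type: accept it
    # No exact match: keep the first request's candidates as suggestions.
    partial = [qid for qid, _ in first]
    # Second request with the more general type; whatever it returns,
    # exact or not, is only ever treated as a partial match.
    for qid, _ in reconcile(title, general_type):
        if qid not in partial:
            partial.append(qid)
    return None, partial
```

With this scheme, a same-titled item of the wrong type, such as the journal in the original report, can still surface, but only as a suggestion the user has to confirm.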
Should be published in v0.2.4.
I had a book chapter item in Zotero, with the ISBN of the containing volume: Wooldridge, Russon (2004): "Lexicography", in Schreibman et al. (eds.): A Companion to Digital Humanities, Oxford: Blackwell, ISBN 978-1-4051-0321-3.
The fetched Q-ID is Q96725112, which refers to a journal (not to an article!) with the same title.
I think reconciliation based only on titles is very error-prone. Restricting matches to items of the same item type would be one strategy (allowing only book chapters in this case).
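One way to apply such a restriction is at query time. The W3C Reconciliation API accepts a batch of queries keyed by arbitrary IDs, each with a `query` string and optional `type` and `limit` fields. A minimal sketch of building that payload follows; `build_reconciliation_queries` is a hypothetical helper and `Q_BOOK_CHAPTER` is a placeholder for the real Wikidata class QID.

```python
import json


def build_reconciliation_queries(items: list[tuple[str, str]], limit: int = 3) -> str:
    """Build a batch of reconciliation queries restricted by type.

    items: (title, wikidata_type_qid) pairs; the type QIDs are assumed to come
    from a Zotero-to-Wikidata item type mapping.
    """
    queries = {}
    for i, (title, type_qid) in enumerate(items):
        # Each query names the expected type, so the service only
        # proposes candidates of that type.
        queries[f"q{i}"] = {"query": title, "type": type_qid, "limit": limit}
    return json.dumps(queries)
```

For the chapter above, `build_reconciliation_queries([("Lexicography", "Q_BOOK_CHAPTER")])` yields a single query `q0` restricted to book chapters, which would exclude a journal with the same title.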