add openalex document type classification draft
Showing 3 changed files with 2,579 additions and 0 deletions.
@@ -0,0 +1,63 @@
@misc{haupka_2024,
  author = {Haupka, Nick and Culbert, Jack H. and Schniedermann, Alexander and Jahn, Najko and Mayr, Philipp},
  title = {Analysis of the Publication and Document Types in {OpenAlex}, {Web} of {Science}, {Scopus}, {PubMed} and {Semantic} {Scholar}},
  url = {https://arxiv.org/abs/2406.15154},
  doi = {10.48550/arXiv.2406.15154},
  publisher = {arXiv},
  year = {2024}
}
@article{donner_document_2017,
  title = {Document type assignment accuracy in the journal citation index data of {Web} of {Science}},
  volume = {113},
  issn = {1588-2861},
  url = {https://doi.org/10.1007/s11192-017-2483-y},
  doi = {10.1007/s11192-017-2483-y},
  number = {1},
  journal = {Scientometrics},
  author = {Donner, Paul},
  year = {2017},
  pages = {219--236}
}
@article{mokhnacheva_document_2023,
  title = {Document {Types} {Indexed} in {WoS} and {Scopus}: {Similarities}, {Differences}, and {Their} {Significance} in the {Analysis} of {Publication} {Activity}},
  volume = {50},
  issn = {1934-8118},
  url = {https://doi.org/10.3103/S0147688223010033},
  doi = {10.3103/S0147688223010033},
  number = {1},
  journal = {Scientific and Technical Information Processing},
  author = {Mokhnacheva, Yu. V.},
  year = {2023},
  pages = {40--46}
}
@article{visser_large-scale_2021,
  title = {Large-scale comparison of bibliographic data sources: {Scopus}, {Web} of {Science}, {Dimensions}, {Crossref}, and {Microsoft} {Academic}},
  volume = {2},
  issn = {2641-3337},
  url = {https://doi.org/10.1162/qss_a_00112},
  doi = {10.1162/qss_a_00112},
  number = {1},
  journal = {Quantitative Science Studies},
  author = {Visser, Martijn and van Eck, Nees Jan and Waltman, Ludo},
  year = {2021},
  pages = {20--41}
}
@misc{alperin_analysis_2024,
  title = {An analysis of the suitability of {OpenAlex} for bibliometric analyses},
  url = {http://arxiv.org/abs/2404.17663},
  doi = {10.48550/arXiv.2404.17663},
  abstract = {Scopus and the Web of Science have been the foundation for research in the science of science even though these traditional databases systematically underrepresent certain disciplines and world regions. In response, new inclusive databases, notably OpenAlex, have emerged. While many studies have begun using OpenAlex as a data source, few critically assess its limitations. This study, conducted in collaboration with the OpenAlex team, addresses this gap by comparing OpenAlex to Scopus across a number of dimensions. The analysis concludes that OpenAlex is a superset of Scopus and can be a reliable alternative for some analyses, particularly at the country level. Despite this, issues of metadata accuracy and completeness show that additional research is needed to fully comprehend and address OpenAlex's limitations. Doing so will be necessary to confidently use OpenAlex across a wider set of analyses, including those that are not at all possible with more constrained databases.},
  urldate = {2024-05-06},
  publisher = {arXiv},
  author = {Alperin, Juan Pablo and Portenoy, Jason and Demes, Kyle and Larivière, Vincent and Haustein, Stefanie},
  year = {2024}
}
@misc{van_eck_2024_10949622,
  author = {van Eck, Nees Jan and Waltman, Ludo},
  title = {A methodology for identifying core sources and core publications in {OpenAlex}},
  month = apr,
  year = {2024},
  publisher = {Zenodo},
  doi = {10.5281/zenodo.10949622},
  url = {https://doi.org/10.5281/zenodo.10949622}
}

_posts/openalex_document_types/openalex_document_types_2024.Rmd (42 additions, 0 deletions)
@@ -0,0 +1,42 @@
---
title: "Recent Changes in Document Type Classification in OpenAlex Compared to Web of Science and Scopus"
description: In June 2024, we published a preprint on the classification of document types in OpenAlex compared to Web of Science, Scopus, PubMed and Semantic Scholar. In this follow-up study, we investigate further developments in OpenAlex and compare them with the proprietary databases Scopus and WoS.
author:
  - name: Nick Haupka
    affiliation: State and University Library Göttingen
    affiliation_url: https://www.sub.uni-goettingen.de/
  - name: Sophia Dörner
    affiliation: State and University Library Göttingen
    affiliation_url: https://www.sub.uni-goettingen.de/
  - name: Najko Jahn
    affiliation: State and University Library Göttingen
    affiliation_url: https://www.sub.uni-goettingen.de/
date: "`r Sys.Date()`"
output: distill::distill_article
bibliography: literature.bib
draft: TRUE
---
Over the past months, OpenAlex has revised its classification of document types, making it more independent of Crossref, and has introduced new document types such as [preprints and reviews](https://groups.google.com/g/openalex-users/c/YujaIIjY02A). In addition, OpenAlex now also adopts document types from [PubMed](https://groups.google.com/g/openalex-users/c/eXiWOlBXKC0). These changes raise the question of how they affect analyses based on OpenAlex data.

Here, we build on the results of our preprint [@haupka_2024] from June 2024 and provide an update. The preprint highlighted differences in the curation strategies of scholarly database operators, which complicate the calculation of accurate bibliometric figures. Similar findings were obtained by @donner_document_2017, @visser_large-scale_2021 and @alperin_analysis_2024, who reported deviations between the numbers of publications derived from bibliometric databases when restricting the data to certain document types.

The correct classification of documents is crucial for bibliometric studies, as document types are used in various measurements and reports as well as for searches in databases and catalogues. Divergent classifications of a publication can lead to imprecise statements about the scholarly landscape, e.g. if a document is labelled as an article in one database and as a letter in another. As our preprint has shown, scholarly databases differ considerably in the proportion of publications they label as research texts versus editorial texts. As of 2023, 1% of the publications analysed in OpenAlex were labelled as editorial texts, compared to over 10% in the commercial databases (based on journal publications from 2012 to 2022). A change to the document types in OpenAlex could also affect the CWTS Leiden Ranking Open Edition [@van_eck_2024_10949622], which relies on the document types in OpenAlex.
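
Shares like these come down to grouping each database's document type labels into research and editorial discourse and counting per database. The following base-R snippet is a minimal sketch of that calculation. The toy records, the labels and the mapping `discourse_map` are hypothetical and do not reproduce the mapping used in the preprint.

```r
# Hypothetical toy records (not real data): one row per publication and
# database, with the document type label assigned by that database.
records <- data.frame(
  database = c(rep("OpenAlex", 4), rep("Scopus", 4)),
  doc_type = c("article", "article", "review", "editorial",
               "Article", "Editorial", "Letter", "Review"),
  stringsAsFactors = FALSE
)

# Hypothetical mapping from lower-cased type labels to discourse groups.
discourse_map <- c(
  article   = "research",
  review    = "research",
  editorial = "editorial",
  letter    = "editorial"
)
records$discourse <- unname(discourse_map[tolower(records$doc_type)])

# Unmapped labels would become NA and be dropped silently by table().
stopifnot(!anyNA(records$discourse))

# Share of each discourse group within each database (each row sums to 1).
shares <- prop.table(table(records$database, records$discourse), margin = 1)
print(round(shares, 2))
```

Keeping the mapping explicit makes it visible which labels count as editorial texts in each database, and that is exactly where the databases diverge.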

In this blog post, we examine the recent changes in document type classification in OpenAlex and contrast the findings with the approaches of Scopus and Web of Science.

## Data and Methods
For our analysis, we reused the dataset compiled for the preprint and updated the respective document types to reflect current developments. The initial dataset, compiled from data from mid-2023, included approximately 9.5 million publications that occur in OpenAlex as well as in Scopus, WoS, PubMed and Semantic Scholar. To align with the methods applied in the preprint, we restricted the data to the publication years 2012 to 2022 and considered only journal items. The OpenAlex data used in this report are from July 2024; the Scopus and WoS data are from April 2024.
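
The restriction and update steps can be sketched in base R as a filter followed by a join on DOIs. The column names (`publication_year`, `source_type`, `openalex_type_2023`, `openalex_type_2024`) and the example DOIs under the `10.1000` example prefix are hypothetical placeholders. The real dataset and its schema may differ.

```r
# Hypothetical excerpt of the preprint dataset with mid-2023 OpenAlex types.
preprint <- data.frame(
  doi = c("10.1000/a", "10.1000/B", "10.1000/c", "10.1000/d"),
  publication_year = c(2011L, 2015L, 2020L, 2022L),
  source_type = c("journal", "journal", "repository", "journal"),
  openalex_type_2023 = c("article", "article", "article", "article"),
  stringsAsFactors = FALSE
)

# Hypothetical document types from a more recent OpenAlex snapshot.
openalex_2024 <- data.frame(
  doi = c("10.1000/b", "10.1000/c", "10.1000/d"),
  openalex_type_2024 = c("article", "preprint", "review"),
  stringsAsFactors = FALSE
)

# DOIs are case-insensitive, so normalise them before joining.
preprint$doi <- tolower(preprint$doi)
openalex_2024$doi <- tolower(openalex_2024$doi)

# Keep journal items published 2012-2022, as in the preprint.
in_scope <- subset(
  preprint,
  publication_year >= 2012 & publication_year <= 2022 & source_type == "journal"
)

# Attach the updated types; all.x = TRUE keeps items absent from the new snapshot.
updated <- merge(in_scope, openalex_2024, by = "doi", all.x = TRUE)

# Flag items whose OpenAlex type changed between the two snapshots.
updated$type_changed <- updated$openalex_type_2023 != updated$openalex_type_2024
print(updated[, c("doi", "openalex_type_2023", "openalex_type_2024", "type_changed")])
```

Items missing from the newer snapshot keep an `NA` type, so `type_changed` is `NA` for them. Those items should be counted separately rather than treated as unchanged.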

## Findings

Plot 1: Comparison of OpenAlex and Scopus

Plot 2: Comparison of OpenAlex and Web of Science
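
Both comparisons rest on cross-tabulating the document types that two databases assign to the same publications. Here is a minimal base-R sketch. The matched records and their label values are hypothetical, and the same code would apply to Web of Science by swapping in its type column.

```r
# Hypothetical matched records: one row per publication found in both databases.
matched <- data.frame(
  openalex_type = c("article", "article", "review", "article", "editorial"),
  scopus_type   = c("Article", "Letter", "Review", "Editorial", "Editorial"),
  stringsAsFactors = FALSE
)

# Contingency table: rows are OpenAlex types, columns are Scopus types.
crosstab <- table(OpenAlex = matched$openalex_type, Scopus = matched$scopus_type)
print(crosstab)

# Share of publications whose labels agree after case normalisation.
agreement <- mean(tolower(matched$openalex_type) == tolower(matched$scopus_type))
print(agreement)
```

Comparing lower-cased labels only works where both vocabularies use the same words. For labels that exist in only one database, an explicit mapping between the two vocabularies is needed.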

Table 1: Comparison of the shares of research and editorial discourse

## Discussion and Conclusion

## Funding {.appendix}

This work is funded by the Bundesministerium für Bildung und Forschung (BMBF) project KBOPENBIB (16WIK2301E). We acknowledge the support of the [German Competence Center for Bibliometrics](https://bibliometrie.info/).