add openalex document type classification draft
naustica committed Aug 20, 2024
1 parent 14178e1 commit b532802
Showing 3 changed files with 2,579 additions and 0 deletions.
63 changes: 63 additions & 0 deletions _posts/openalex_document_types/literature.bib
Original file line number Diff line number Diff line change
@@ -0,0 +1,63 @@
@misc{haupka_2024,
author = {Nick Haupka and Jack H. Culbert and Alexander Schniedermann and Najko Jahn and Philipp Mayr},
title = {Analysis of the Publication and Document Types in OpenAlex, Web of Science, Scopus, PubMed and Semantic Scholar},
year = {2024},
url = {https://arxiv.org/abs/2406.15154},
}
@article{donner_document_2017,
title = {Document type assignment accuracy in the journal citation index data of {Web} of {Science}},
volume = {113},
issn = {1588-2861},
url = {https://doi.org/10.1007/s11192-017-2483-y},
doi = {10.1007/s11192-017-2483-y},
number = {1},
journal = {Scientometrics},
author = {Donner, Paul},
year = {2017},
pages = {219--236}
}
@article{mokhnacheva_document_2023,
title = {Document {Types} {Indexed} in {WoS} and {Scopus}: {Similarities}, {Differences}, and {Their} {Significance} in the {Analysis} of {Publication} {Activity}},
volume = {50},
issn = {1934-8118},
url = {https://doi.org/10.3103/S0147688223010033},
doi = {10.3103/S0147688223010033},
number = {1},
journal = {Scientific and Technical Information Processing},
author = {Mokhnacheva, Yu. V.},
year = {2023},
pages = {40--46}
}
@article{visser_large-scale_2021,
title = {Large-scale comparison of bibliographic data sources: {Scopus}, {Web} of {Science}, {Dimensions}, {Crossref}, and {Microsoft} {Academic}},
volume = {2},
issn = {2641-3337},
url = {https://doi.org/10.1162/qss_a_00112},
doi = {10.1162/qss_a_00112},
number = {1},
journal = {Quantitative Science Studies},
author = {Visser, Martijn and van Eck, Nees Jan and Waltman, Ludo},
year = {2021},
pages = {20--41},
}
@misc{alperin_analysis_2024,
title = {An analysis of the suitability of {OpenAlex} for bibliometric analyses},
url = {http://arxiv.org/abs/2404.17663},
doi = {10.48550/arXiv.2404.17663},
abstract = {Scopus and the Web of Science have been the foundation for research in the science of science even though these traditional databases systematically underrepresent certain disciplines and world regions. In response, new inclusive databases, notably OpenAlex, have emerged. While many studies have begun using OpenAlex as a data source, few critically assess its limitations. This study, conducted in collaboration with the OpenAlex team, addresses this gap by comparing OpenAlex to Scopus across a number of dimensions. The analysis concludes that OpenAlex is a superset of Scopus and can be a reliable alternative for some analyses, particularly at the country level. Despite this, issues of metadata accuracy and completeness show that additional research is needed to fully comprehend and address OpenAlex's limitations. Doing so will be necessary to confidently use OpenAlex across a wider set of analyses, including those that are not at all possible with more constrained databases.},
urldate = {2024-05-06},
publisher = {arXiv},
author = {Alperin, Juan Pablo and Portenoy, Jason and Demes, Kyle and Larivière, Vincent and Haustein, Stefanie},
year = {2024}
}
@misc{van_eck_2024_10949622,
author = {Van Eck, Nees Jan and
Waltman, Ludo},
title = {{A methodology for identifying core sources and
core publications in OpenAlex}},
month = apr,
year = 2024,
publisher = {Zenodo},
doi = {10.5281/zenodo.10949622},
url = {https://doi.org/10.5281/zenodo.10949622}
}
42 changes: 42 additions & 0 deletions _posts/openalex_document_types/openalex_document_types_2024.Rmd
Original file line number Diff line number Diff line change
@@ -0,0 +1,42 @@
---
title: "Recent Changes in Document Type Classification in OpenAlex Compared to Web of Science and Scopus"
description: In June 2024, we published a preprint on the classification of document types in OpenAlex compared to Web of Science, Scopus, PubMed and Semantic Scholar. In this follow-up study, we investigate further developments in OpenAlex and compare the results with the proprietary databases Scopus and WoS.
author:
- name: Nick Haupka
affiliation: State and University Library Göttingen
affiliation_url: https://www.sub.uni-goettingen.de/
- name: Sophia Dörner
affiliation: State and University Library Göttingen
affiliation_url: https://www.sub.uni-goettingen.de/
- name: Najko Jahn
affiliation: State and University Library Göttingen
affiliation_url: https://www.sub.uni-goettingen.de/
date: "`r Sys.Date()`"
output: distill::distill_article
bibliography: literature.bib
draft: TRUE
---
Over the last months, OpenAlex has revised its classification of document types, making it more independent of Crossref, and has introduced new document types such as [preprints and reviews](https://groups.google.com/g/openalex-users/c/YujaIIjY02A). In addition, document types are now also adopted from [PubMed](https://groups.google.com/g/openalex-users/c/eXiWOlBXKC0), which raises the question of how these changes affect analyses based on OpenAlex data.

Here, we build on the results of our preprint [@haupka_2024] from June 2024 and provide an update. Our investigation highlighted differences in the curation strategies of scholarly database operators, which complicate the derivation of accurate bibliometric figures. Similar findings were obtained by @donner_document_2017, @visser_large-scale_2021 and @alperin_analysis_2024, who reported deviations between publication counts derived from bibliometric databases when restricting to certain document types.

The correct classification of documents is crucial for bibliometric surveys, as document types are used for various measurements and reports and also for searches in databases and catalogues. Divergent classification of a publication can lead to imprecise assertions about the scholarly landscape, e.g. if a document is labelled as an article in one database and as a letter in another. As our preprint has shown, scholarly databases differ considerably in how many publications they label as research texts and how many as editorial texts. As of 2023, 1% of the publications analysed in OpenAlex were labelled as editorial texts, compared to over 10% in the commercial databases (based on publications in journals from 2012 to 2022). A change to the document types contained in OpenAlex could also have an impact on the CWTS Leiden Ranking Open Edition [@van_eck_2024_10949622], which is based on the document types in OpenAlex.

In this blog post, we examine the recent changes in document type classification in OpenAlex and contrast the findings with the approaches of Scopus and Web of Science.

## Data and Methods
For our analysis, we reused the dataset that was compiled for the preprint and updated the respective document types to reflect current developments. The initial dataset included approximately 9.5 million publications that occur in OpenAlex as well as in Scopus, WoS, PubMed and Semantic Scholar. To align with the methods applied in the preprint, we restricted the data to the publication years 2012 to 2022 and only considered items from journals. The OpenAlex data used in this report is from July 2024; the Scopus and WoS data is from April 2024. The initial dataset was compiled using data from mid-2023.
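
The restriction described above can be illustrated with a minimal Python sketch. It is not the pipeline used for the analysis: the records are made up, and the field names `publication_year`, `source_type` and `openalex_type` are assumptions chosen for readability.

```python
from collections import Counter

# Hypothetical records; field names and values are assumptions for illustration only.
records = [
    {"doi": "10.1000/a", "publication_year": 2011, "source_type": "journal", "openalex_type": "article"},
    {"doi": "10.1000/b", "publication_year": 2015, "source_type": "journal", "openalex_type": "article"},
    {"doi": "10.1000/c", "publication_year": 2020, "source_type": "journal", "openalex_type": "editorial"},
    {"doi": "10.1000/d", "publication_year": 2019, "source_type": "repository", "openalex_type": "preprint"},
    {"doi": "10.1000/e", "publication_year": 2022, "source_type": "journal", "openalex_type": "article"},
    {"doi": "10.1000/f", "publication_year": 2023, "source_type": "journal", "openalex_type": "article"},
]


def in_scope(record, first_year=2012, last_year=2022):
    """Keep journal items published between first_year and last_year (inclusive)."""
    return (
        first_year <= record["publication_year"] <= last_year
        and record["source_type"] == "journal"
    )


# Apply the restriction, then count document types in the remaining subset.
subset = [r for r in records if in_scope(r)]
type_counts = Counter(r["openalex_type"] for r in subset)

for doc_type, n in type_counts.most_common():
    print(doc_type, n)
```

Keeping the year range as function parameters makes it easy to check how sensitive the document type counts are to the chosen publication window.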

## Findings

Plot 1: Comparison OpenAlex and Scopus

Plot 2: Comparison OpenAlex and Web of Science

Table 1: Comparison shares research and editorial discourse
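
As a rough illustration of how shares of research and editorial discourse could be computed for such a table, here is a minimal Python sketch. The grouping of document types into the two categories and the example labels are assumptions made for this sketch and are not taken from the preprint.

```python
from collections import Counter

# Assumed grouping of document types; the actual mapping may differ.
RESEARCH_TYPES = {"article", "review"}
EDITORIAL_TYPES = {"editorial", "letter", "erratum"}


def discourse(doc_type):
    """Map a document type to 'research', 'editorial' or 'other'."""
    if doc_type in RESEARCH_TYPES:
        return "research"
    if doc_type in EDITORIAL_TYPES:
        return "editorial"
    return "other"


def discourse_shares(doc_types):
    """Return the share of each discourse category among the given labels."""
    counts = Counter(discourse(t) for t in doc_types)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {k: counts[k] / total for k in ("research", "editorial", "other")}


# Hypothetical labels for the same five publications in two databases.
openalex_types = ["article", "article", "article", "review", "editorial"]
scopus_types = ["article", "letter", "editorial", "review", "editorial"]

openalex_shares = discourse_shares(openalex_types)
scopus_shares = discourse_shares(scopus_types)

print("OpenAlex:", openalex_shares)
print("Scopus:", scopus_shares)
```

Because both lists describe the same publications, any difference in the shares reflects differing classification rather than differing coverage.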

## Discussion and Conclusion

## Funding {.appendix}

This work is funded by the Bundesministerium für Bildung und Forschung (BMBF) project KBOPENBIB (16WIK2301E). We acknowledge the support of the [German Competence Center for Bibliometrics](https://bibliometrie.info/).