Analysis shows strengths and potential of open bibliometric data compared to Scopus and Web of Science
Members of the KB recently published a paper in Scientometrics (Culbert et al. 2025) presenting a large-scale comparison of three of the bibliometric databases provided in the KB: the Web of Science (WoS), Scopus and OpenAlex.
In this paper, we matched records classified as articles and published between 2015 and 2022 across the three databases based on their DOIs, excluding from each database any article whose DOI was shared with another record in that database. This gave us a “shared” corpus of 16,788,282 records (out of the ~71M in WoS, 65M in Scopus and 243M in OpenAlex).
Figure: Venn diagram of the deduplicated match.
We then focused on calculating the reference coverage in each database, that is, the proportion of an article’s references that are themselves available as records in that database. Our results are as follows:
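As a rough illustration of the matching step, the sketch below keeps only DOIs that occur exactly once within each database and then intersects the resulting sets. It is not the code used in the paper: the record layout (a dict with a `doi` key), the function names and the toy records are all hypothetical, lowercasing DOIs before comparison is an assumption, and the filters on document type and publication year are omitted.

```python
from collections import Counter


def unique_dois(records):
    """Return the DOIs that occur exactly once among one database's records."""
    counts = Counter(
        rec["doi"].strip().lower()  # assumption: compare DOIs case-insensitively
        for rec in records
        if rec.get("doi")  # skip records without a DOI
    )
    return {doi for doi, n in counts.items() if n == 1}


def shared_corpus(*databases):
    """Intersect the unambiguous DOIs of every database."""
    return set.intersection(*(unique_dois(records) for records in databases))


# Toy records, invented for illustration only
wos = [{"doi": "10.1/a"}, {"doi": "10.1/b"}, {"doi": "10.1/c"}]
scopus = [{"doi": "10.1/A"}, {"doi": "10.1/b"}, {"doi": "10.1/b"}]
openalex = [{"doi": "10.1/a"}, {"doi": "10.1/b"}, {"doi": None}]

shared = shared_corpus(wos, scopus, openalex)
print(sorted(shared))
```

In this toy data, `10.1/b` is dropped because it appears twice in the Scopus records, and `10.1/c` is dropped because it appears only in WoS.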
Metric | WoS | Scopus | OpenAlex |
---|---|---|---|
Whole Corpus | | | |
Reported Average Reference Count | 24.765 | 31.254 | – |
Pre-calculated Average Source Reference Count | 16.867 | 18.692 | 7.572 |
Internal Coverage | 68.1% | 59.8% | – |
Shared Corpus (2015–2022) | | | |
All References | | | |
Reported Average Reference Count | 43.185 | 43.320 | – |
Pre-calculated Average Source Reference Count | 33.416 | 33.363 | 34.863 |
Internal Coverage | 77.4% | 77.0% | – |
References 1996–2022 | | | |
Calculated Average Reference Count | 38.226 | 38.062 | – |
Calculated Average Source Reference Count | 31.207 | 33.359 | 31.823 |
Internal Coverage | 81.6% | 87.6% | – |
OpenAlex does not report a total reference count: it only lists references that are themselves records in OpenAlex, a figure we have termed the source reference count. This also prevents us from calculating its internal coverage directly. However, if we assume that the fairly similar reference counts in WoS and Scopus are accurate, we see:
- On the whole corpus, OpenAlex has a much lower source reference count than WoS and Scopus.
- When restricted to articles in the shared corpus, published between 2015 and 2022, this number rises significantly and surpasses the source reference counts of WoS and Scopus.
- However, when restricting the references to those published between 1996 and 2022 (to prevent any potential bias against the newer Scopus), we see that OpenAlex has around the same number of reported source references as the other two databases.
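We can therefore infer from our assumption that the reference coverage of OpenAlex is somewhere between 83.2% and 83.6% for articles within the Shared Corpus and with references between 1996 and 2022. These bounds come from dividing the OpenAlex source reference count by the calculated reference counts of WoS and Scopus respectively.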
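The arithmetic behind this range can be reproduced in a few lines. The sketch below uses the figures from the "References 1996–2022" rows of the table; the variable and function names are illustrative only, and it relies on the stated assumption that the WoS and Scopus reference counts are accurate stand-ins for the total that OpenAlex does not report.

```python
# Shared corpus, references published 1996-2022 (values from the table)
calculated_reference_count = {"WoS": 38.226, "Scopus": 38.062}
openalex_source_reference_count = 31.823


def coverage(source_count, total_count):
    """Share of references that are themselves records in the database."""
    return source_count / total_count


# Assumption: each provider's count stands in for the unknown OpenAlex total
estimates = {
    db: coverage(openalex_source_reference_count, total)
    for db, total in calculated_reference_count.items()
}

low, high = min(estimates.values()), max(estimates.values())
print(f"Estimated OpenAlex coverage: {low:.1%} to {high:.1%}")
# prints "Estimated OpenAlex coverage: 83.2% to 83.6%"
```

The same `coverage` function applied to each provider's own source reference count gives the internal coverage rows of the table, for example 31.207 / 38.226 ≈ 81.6% for WoS.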
We also wished to check that the figures reported by the providers are accurate. To do so, we calculated the ratio of references per record across the whole of each corpus, with the following results:
Whole Corpus | WoS | Scopus | OpenAlex |
---|---|---|---|
Ratio of References per Record | 24.765 | 30.979 | 7.592 |
Reported Average Total Reference Count | 24.765 | 31.254 | – |
Reported Average Source Reference Count | 16.867 | 18.692 | 7.572 |
This indicates that caution may be required when utilising Scopus’ reported reference counts. The discrepancy was smaller, however, for a similar calculation on articles only.
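To put a number on the discrepancy, the following sketch compares the calculated ratio of references per record with the reported average total reference count for WoS and Scopus, using the whole-corpus values from the table. The names are illustrative, and expressing the gap relative to the reported figure is a choice made here for illustration, not necessarily the measure used in the paper.

```python
# Whole-corpus values from the table
ratio_per_record = {"WoS": 24.765, "Scopus": 30.979}
reported_average = {"WoS": 24.765, "Scopus": 31.254}


def relative_gap(computed, reported):
    """Signed difference between computed and reported, relative to reported."""
    return (computed - reported) / reported


gaps = {
    db: relative_gap(ratio_per_record[db], reported_average[db])
    for db in ratio_per_record
}

for db, gap in gaps.items():
    print(f"{db}: {gap:+.2%}")
```

For WoS the two figures coincide, while for Scopus the calculated ratio is just under 0.9% below the reported average.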
We also performed additional analyses of other metadata, aggregated by journal, showing that:
- The per-journal distribution of reference coverage relative to OpenAlex is similar for WoS and for Scopus, implying that the reason for the differing reference coverage is independent of the database. The distributions of reference coverage in WoS and Scopus are also similar to each other.
- Article Funding information is better captured in both WoS and Scopus than OpenAlex.
- Open Access information is similarly captured in all databases.
- ORCID identifiers are much better captured in OpenAlex.
- Abstracts are better covered in WoS and Scopus than in OpenAlex.
In summary, we believe we have been able to identify some interesting differences and similarities between OpenAlex, WoS and Scopus, demonstrating that while OpenAlex covers a much larger set of records, it still performs comparably to WoS and Scopus on reference coverage for a modern and “curated” dataset. Furthermore, we have shown other aspects of the record metadata, such as Open Access information, to be equivalent.
References
Citation
@online{culbert2025,
author = {Culbert, Jack},
title = {Analysis Shows Strengths and Potential of Open Bibliometric
Data Compared to {Scopus} and {Web} of {Science}},
date = {2025-06-10},
url = {http://www.open-bibliometrics.de/posts/20250610-OpenAlexCoveragePaper/},
langid = {en}
}