Announcing OPENBIB Initial Data Release

data
Authors
Affiliations

Nick Haupka

State and University Library Göttingen

Jack Culbert

GESIS – Leibniz Institute for the Social Sciences

Paul Donner

German Centre for Higher Education Research and Science Studies (DZHW)

Najko Jahn

State and University Library Göttingen

Christopher Lenke

Universität Bielefeld

Philipp Mayr

GESIS – Leibniz Institute for the Social Sciences

Andreas Meier

Forschungszentrum Jülich

Bernhard Mittermaier

Forschungszentrum Jülich

Barbara Scheidt

Forschungszentrum Jülich

Stephan Stahlschmidt

German Centre for Higher Education Research and Science Studies (DZHW)

Niels Taubert

Universität Bielefeld

Published

May 6, 2025


The German Kompetenznetzwerk Bibliometrie is pleased to announce our initial release of selected and curated OpenAlex data as part of our work to foster the use of open bibliometrics data and contribute to the evolving open bibliometrics landscape.

We have also started to provide complete data snapshots from OpenAlex within the same bibliometric data research infrastructure as Scopus and Web of Science on an ongoing basis [1], and have examined the suitability of open bibliometrics data [2, 3, 4].

We believe that our selected curated datasets will make a meaningful contribution to the growing landscape of open bibliometrics data based on OpenAlex.

Initial data release

Our release includes:

Datasets with a Focus on Germany

  • A curated set of publications from German research organisations indexed in OpenAlex according to the established Bielefeld Institutionenkodierung [5, 6]
  • Additional funding information for articles of research projects funded by the German Research Foundation and reported to the programme “Open Access Publishing”, curated and expanded from the German Open Access Monitor
Figure 1: Comparison of publications affiliated with German research organisations, 2014-2024, indexed by OpenAlex. The visualisation also shows the sectors additionally introduced by OPENBIB. Most institutions appear below the diagonal line, indicating improved institutional coding for German research organisations by OPENBIB using OpenAlex metadata. [Notebook used to create this figure]

Datasets with a Focus on Global Developments

  • Publisher disambiguation with emphasis on imprints and mergers
  • Preserved and enriched metadata about articles, institutions and journals covered by transformative agreements, curated and expanded from the initial data provided by the cOAlition S Journal Checker Tool
  • A classifier for the identification of research articles (document type classification)
Figure 2: Estimate of articles indexed in OpenAlex under transformative agreements, 2020-2024. Note that only first author affiliations were matched to participating institutions according to transformative agreement data provided by the cOAlition S Journal Checker Tool. [Notebook used to create this figure]
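To illustrate what matching only first author affiliations means in practice, here is a minimal sketch in Python. It is not necessarily the procedure used for Figure 2: the `Agreement` fields, the article dictionary layout, the year-range check and the example identifiers are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Agreement:
    """Hypothetical agreement row: journal, participating institution, validity period."""
    issn: str
    ror: str
    start_year: int
    end_year: int


def first_author_under_agreement(article: dict, agreements: list[Agreement]) -> bool:
    """Return True if any institution of the FIRST author participates in an
    agreement that covers the article's journal in its publication year."""
    authors = article.get("authors", [])
    if not authors:
        return False
    # Only the first author's institutions are considered; co-authors are ignored.
    first_author_rors = set(authors[0].get("rors", []))
    return any(
        a.issn == article["issn"]
        and a.ror in first_author_rors
        and a.start_year <= article["year"] <= a.end_year
        for a in agreements
    )


# Made-up example data (identifiers are placeholders, not real records).
agreements = [Agreement("1234-5678", "https://ror.org/example1", 2020, 2022)]
articles = [
    # First author at a participating institution: counted.
    {"issn": "1234-5678", "year": 2021,
     "authors": [{"rors": ["https://ror.org/example1"]},
                 {"rors": ["https://ror.org/example2"]}]},
    # Only the second author is at a participating institution: not counted.
    {"issn": "1234-5678", "year": 2021,
     "authors": [{"rors": ["https://ror.org/example2"]},
                 {"rors": ["https://ror.org/example1"]}]},
    # Published after the agreement ended: not counted.
    {"issn": "1234-5678", "year": 2023,
     "authors": [{"rors": ["https://ror.org/example1"]}]},
]
matched = [first_author_under_agreement(article, agreements) for article in articles]
print(matched)
```

Restricting the match to the first author makes the estimate conservative: articles whose eligible author appears later in the byline are not counted.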

How to get it?

The dataset, including individual data files and documentation, is available under a permissive CC0 license and can be found at:

Haupka, N., Culbert, J., Donner, P., Jahn, N., Lenke, C., Mayr, P., Meier, A., Mittermaier, B., Scheidt, B., Stahlschmidt, S., & Taubert, N. (2025). OPENBIB: Selected curated open metadata based on OpenAlex (0.1) [Data set]. Kompetenznetzwerk Bibliometrie. https://doi.org/10.5281/zenodo.15308680

The data is also available to KB users in the dedicated data infrastructure provided by FIZ Karlsruhe, and publicly on BigQuery, provided by SUB Göttingen (dataset openbib).
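For readers who prefer to script the download, the following is a minimal sketch of listing the files attached to the Zenodo record. It assumes that the numeric suffix of the DOI (15308680) can be used as a record ID with Zenodo's public REST API, and that the JSON response carries a `files` list whose entries have `key`, `size` and `links.self` fields. The helper names `list_files` and `fetch_record` are illustrative and not part of OPENBIB.

```python
import json
from urllib.request import urlopen

# Assumption: record ID taken from the DOI suffix 10.5281/zenodo.15308680.
RECORD_URL = "https://zenodo.org/api/records/15308680"


def list_files(record: dict) -> list[tuple[str, int, str]]:
    """Return (file name, size in bytes, download URL) for each file in a record."""
    files = []
    for entry in record.get("files", []):
        name = entry["key"]
        size = entry.get("size", 0)
        url = entry.get("links", {}).get("self", "")
        files.append((name, size, url))
    return files


def fetch_record(url: str = RECORD_URL) -> dict:
    """Fetch the record metadata as JSON (requires network access)."""
    with urlopen(url, timeout=30) as response:
        return json.load(response)
```

Calling `list_files(fetch_record())` should then return one tuple per data or documentation file, which can be passed to any download tool. If the response layout differs from the assumption above, only `list_files` needs adjusting.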

What’s next?

With this announcement, we start a blog post series in which we will share use cases. We would also like to invite you to a virtual validation hackathon, which will be announced shortly. We hope to learn from your experiences with these initial datasets as we continue our curation of essential data entities for the German research landscape and beyond.

Your feedback and contributions will be valuable as we continue to provide the community with open bibliometrics resources based on established and leading data sources like OpenAlex.

Footnotes

  1. Schmidt, M., Rimmert, C., Stephen, D., Lenke, C., Donner, P., Gärtner, S., Taubert, N., Bausenwein, T., & Stahlschmidt, S. (2024). The Data Infrastructure of the German Kompetenznetzwerk Bibliometrie: An Enabling Intermediary between Raw Data and Analysis. Zenodo. https://doi.org/10.5281/zenodo.13935407

  2. Culbert, J. H., Hobert, A., Jahn, N., Haupka, N., Schmidt, M., Donner, P., & Mayr, P. (2025). Reference coverage analysis of OpenAlex compared to Web of Science and Scopus. Scientometrics, 130(4), 2475–2492. https://doi.org/10.1007/s11192-025-05293-3

  3. Haupka, N., Culbert, J. H., Schniedermann, A., Jahn, N., & Mayr, P. (2024). Analysis of the publication and document types in OpenAlex, Web of Science, Scopus, PubMed and Semantic Scholar. arXiv. https://arxiv.org/abs/2406.15154

  4. Jahn, N. (2025). Estimating transformative agreement impact on hybrid open access: A comparative large-scale study using Scopus, Web of Science and open metadata. arXiv. https://arxiv.org/abs/2504.15038

  5. Rimmert, C., Schwechheimer, H., & Winterhager, M. (2017). Disambiguation of author addresses in bibliometric databases - technical report. https://bibliometrie.info/downloads/DisambiguationOfAuthorAddressesInBibliometricDatabases.pdf

  6. Lenke, C., & Taubert, N. C. (2024). Institutionen-Kodierung für Adressdaten wissenschaftlicher Publikationen. Bielefeld University. https://doi.org/10.4119/unibi/2999367