New OPENBIB Data Release
Launch of the new OPENBIB Data Release
Last year, we, the Kompetenznetzwerk Bibliometrie, presented the initial OPENBIB data release, which contains selected and curated metadata from OpenAlex. In this blog post, we introduce an updated OPENBIB data release based on the OpenAlex snapshot from July 2025. Alongside updated curation pipelines, this release also features an author disambiguation system developed by GESIS - Leibniz Institute for the Social Sciences. We hope that this public release will make an important contribution to the bibliometric community.
Beyond this release, we regularly publish the results of our ongoing investigation of OpenAlex in journals [1][2][3][4][5] and as blog posts [6][7].
Data Release
Our release includes:
Datasets with a Focus on Germany:
- A curated set of publications from German research organisations indexed in OpenAlex, coded according to the established Bielefeld Institutionenkodierung [8][9]
- Additional funding information for articles from research projects funded by the German Research Foundation and reported to the programme “Open Access Publishing”, curated and expanded from the German Open Access Monitor
- A disambiguated author dataset, developed by GESIS, for publications affiliated with German institutions

Datasets with a Focus on Global Developments:
- Publisher disambiguation with emphasis on imprints and mergers
- Preserved and enriched metadata about articles, institutions and journals covered by transformative agreements, curated and expanded from the initial data provided by the cOAlition S Journal Checker Tool
- A classifier for the identification of research articles (document type classification) [10]
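
The SQL query for Figure 2 below joins OpenAlex works to the OPENBIB tables on DOIs. OpenAlex stores DOIs as full `https://doi.org/...` URLs, while the query compares them against bare, lowercased DOIs. The following is a minimal Python sketch of that matching step, not the pipeline used for the release: the helper `normalize_doi`, the sample rows, and the sector code `"uni"` are hypothetical and only illustrate the idea.

```python
DOI_PREFIX = "https://doi.org/"


def normalize_doi(doi: str) -> str:
    """Lowercase a DOI and drop a leading https://doi.org/ resolver prefix."""
    return doi.strip().lower().removeprefix(DOI_PREFIX)


# Hypothetical sample rows; values are illustrative, not taken from OPENBIB.
openalex_works = [
    {"doi": "https://doi.org/10.1234/ABC-1", "publication_year": 2020},
    {"doi": "https://doi.org/10.1234/not-in-kb", "publication_year": 2021},
]
kb_addresses = [
    {"doi": "10.1234/abc-1", "sector_id": ["uni"]},
]

# Index the curated rows by normalized DOI, then look up each OpenAlex work.
kb_by_doi = {normalize_doi(row["doi"]): row for row in kb_addresses}
matched = [
    (key, kb_by_doi[key]["sector_id"])
    for key in (normalize_doi(work["doi"]) for work in openalex_works)
    if key in kb_by_doi
]
print(matched)
```

The sketch removes only the literal prefix (`str.removeprefix`, Python 3.9+). In PostgreSQL, by contrast, `TRIM(chars FROM s)` treats `chars` as a set of characters and strips them from both ends, so the two approaches may differ for DOIs that end in one of those characters.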

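SQL code used for Figure 2
-- Count distinct DOIs per sector, research-article flag and publication year
SELECT COUNT(DISTINCT dt.doi) AS n,
       UNNEST(kb_a_addr.sector_id) AS sector_id,
       is_research,
       publication_year
FROM oal_rep_20250711.works AS oal
-- OpenAlex DOIs are full URLs: trim the https://doi.org/ characters and
-- lowercase both sides before matching against the curated address table
JOIN kb_project_openbib.add_institution_kb_a_oal_b_20250711 AS kb_a_addr
  ON LOWER(TRIM('https://doi.org/' FROM oal.doi)) = LOWER(kb_a_addr.doi)
-- Attach the document type classification
JOIN kb_project_openbib.add_document_types_20250711 AS dt
  ON LOWER(kb_a_addr.doi) = LOWER(dt.doi)
WHERE oal.publication_year BETWEEN 2014 AND 2024
GROUP BY sector_id, is_research, publication_year
Blog post series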
Over the last year, we published several blog posts as part of our data release.
How to get it?
This dataset is available under a permissive CC0 license and can be downloaded from Zenodo:
Haupka, N., Culbert, J., Donner, P., Jahn, N., Lenke, C., Mayr, P., Meier, A., Mittermaier, B., Scheidt, B., Stahlschmidt, S., & Taubert, N. (2026). OPENBIB: Selected curated open metadata based on OpenAlex (0.3) [Data set]. Kompetenznetzwerk Bibliometrie. https://doi.org/10.5281/zenodo.18429476
Detailed documentation of the OPENBIB snapshot is available on GitHub: https://github.com/kbopenbib/kbopenbib_data.
This data release is also available to KB users in the dedicated data infrastructure provided by FIZ Karlsruhe, and publicly on BigQuery, provided by SUB Göttingen (dataset openbib).
What’s next?
We are planning to hold a Do-A-Thon within the next few weeks, which will be based on this public data release. Further details will be provided in the near future. We are looking forward to your feedback, as it helps us to refine and improve our curation processes!
Footnotes
[1] Schmidt, M., Rimmert, C., Stephen, D., Lenke, C., Donner, P., Gärtner, S., Taubert, N., Bausenwein, T., & Stahlschmidt, S. (2025). The Data Infrastructure of the German Kompetenznetzwerk Bibliometrie: An Enabling Intermediary between Raw Data and Analysis. Quantitative Science Studies, 6(1), 1129–1146. https://doi.org/10.1162/QSS.a.20
[2] Culbert, J. H., Hobert, A., Jahn, N., Haupka, N., Schmidt, M., Donner, P., & Mayr, P. (2025). Reference coverage analysis of OpenAlex compared to Web of Science and Scopus. Scientometrics, 130(4), 2475–2492. https://doi.org/10.1007/s11192-025-05293-3
[3] Haupka, N., Culbert, J. H., Schniedermann, A., Jahn, N., & Mayr, P. (2025). Analysis of the publication and document types in OpenAlex, Web of Science, Scopus, PubMed and Semantic Scholar. Quantitative Science Studies. https://doi.org/10.1162/QSS.a.406
[4] Jahn, N. (2025). Estimating transformative agreement impact on hybrid open access: a comparative large-scale study using Scopus, Web of Science and open metadata. Scientometrics. https://doi.org/10.1007/s11192-025-05390-3
[5] Haupka, N. (2026). Presenting a classifier to improve the identification of research journal publications in OpenAlex. Scientometrics. https://doi.org/10.1007/s11192-025-05524-7
[6] Jahn, N. (2025). Decreasing Affiliation Metadata Coverage in OpenAlex. Scholarly Communication Analytics. https://doi.org/10.59350/z3c5x-bfk63
[7] Haupka, N. (2025). Is Semantic Scholar Suitable for Enriching References in OpenAlex? Scholarly Communication Analytics. https://doi.org/10.59350/8t2s7-vtw86
[8] Rimmert, C., Schwechheimer, H., & Winterhager, M. (2017). Disambiguation of author addresses in bibliometric databases - technical report. https://bibliometrie.info/downloads/DisambiguationOfAuthorAddressesInBibliometricDatabases.pdf
[9] Lenke, C., & Taubert, N. C. (2025). Institutionen-Kodierung für Adressdaten wissenschaftlicher Publikationen, Version 2.0. Bielefeld University. https://pub.uni-bielefeld.de/record/3006783
[10] Haupka, N. (2026). Presenting a classifier to improve the identification of research journal publications in OpenAlex. Scientometrics. https://doi.org/10.1007/s11192-025-05524-7
Citation
@online{haupka2026,
author = {Haupka, Nick and Culbert, Jack and Donner, Paul and Jahn,
Najko and Lenke, Christopher and Mayr, Philipp and Meier, Andreas
and Mittermaier, Bernhard and Scheidt, Barbara and Stahlschmidt,
Stephan and Taubert, Niels},
title = {New {OPENBIB} {Data} {Release}},
date = {2026-02-11},
url = {http://www.open-bibliometrics.de/posts/20260211-OpenDataRelease/},
langid = {en}
}