Location
School of Law Seminar Room 3.09
Start Date
3-6-2026 4:00 PM
End Date
3-6-2026 4:30 PM
Description
The rise of open research information resources is transforming the way we track, analyse and study research systems. Increasingly, sources such as OpenAIRE, OpenAlex, Crossref, DataCite, ORCID, ROR and others are being used as the basis for making decisions, designing interventions and understanding progress. This operates at every scale, from the small scale, where access to data and evidence is easier than it has ever been, to the very large-scale analysis of whole systems. Modern open data sources provide access, including access to full copies of the data, but there has been less focus on providing this access in a way that allows complex querying and joining of whole data archives, for example to compare the coverage of research outputs by OpenAlex and OpenAIRE, or to analyse global information on clinical trials using affiliation data from OpenAlex and clinical trials information from PubMed. Another valuable possibility is the ability to incorporate local data enrichments from national or regional data sources to support local data needs or to improve the overall pool of data. Google BigQuery has emerged as a powerful tool for combining and working on these large datasets at scale. Multiple groups (including the InSysPo team at Campinas, SUB Göttingen and Sesame Open Science) have created versions of specific open datasets in the BigQuery system, which anyone can access to run their own analyses. Here, the ‘provider’ pays for storage, and the user covers the costs of processing.
For large-scale analyses that require entire data sources to be combinable and actionable at scale, this approach can add something valuable to the overall Open Research Information ecosystem. In this presentation, we will share examples of use cases and discuss key questions around the coordination and sustainability of this approach (including potential alternatives to Google BigQuery), and what would be needed to make this attractive for different stakeholders.
Included in
Sharing the load: Building an open research information collective