Analysing preprint metadata to understand adoption and impact

Preprinting in the life sciences has grown rapidly in recent years but still represents a very small fraction (~3%) of the biomedical literature published each year. One of ASAPbio’s major tasks is to engage the research community about continuing adoption and developing best practices for preprints – but we rely on fairly broad metrics when monitoring preprint adoption, and it would be informative to understand the level of preprint posting within individual communities, whether bounded by institution or research area.

In 2019, we collaborated with the ScholCommLab to better understand the status of preprint adoption and impact in specific research communities by analysing available preprint metadata. The ScholCommLab is an interdisciplinary team of researchers based in Vancouver and Ottawa, Canada, interested in all aspects of scholarly communication.

From May to September 2019, ASAPbio supported Mario Malički and Janina Sarol to work on the Preprints Uptake and Use project as Visiting Scholars under the supervision of Juan Pablo Alperin, ScholCommLab, Simon Fraser University, Vancouver.

Janina, Mario and Juan’s work involved collating metadata from preprint servers relevant to biology and assessing the completeness and accuracy of these metadata before analysing them to understand adoption and impact. The team reported progress in a series of blog posts exploring the challenges of working with preprint metadata from various sources.

[2020-01-27 update] COS have provided more details about their approach and vision for the OSF Preprint infrastructure.

The aims of the Preprints Uptake and Use project were to:

Consolidate available data sources into an efficient and usable database of preprint metadata across servers.
Analyse these data to understand the level and impact of preprinting in specific communities, for example by keyword or institution/region.
Map these data to available scholarly network data to understand nodes of preprint adoption and non-adoption by scientists in the context of their research connections.
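As a rough illustration of the first two aims, the sketch below merges records from several servers into one collection keyed by DOI and then counts preprints per keyword. It is not the project's actual pipeline: the record fields (`doi`, `keywords`), the server names and the sample DOIs are all hypothetical, and real server metadata would need far more cleaning than this.

```python
from collections import Counter


def normalise_doi(doi):
    """Lower-case a DOI and strip a resolver prefix so duplicates match."""
    doi = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi


def consolidate(sources):
    """Merge per-server record lists into one dict keyed by normalised DOI.

    `sources` maps a server name to a list of record dicts. The first
    record seen for a given DOI is kept; later duplicates are ignored.
    """
    merged = {}
    for server, records in sources.items():
        for record in records:
            key = normalise_doi(record["doi"])
            if key not in merged:
                merged[key] = {**record, "doi": key, "server": server}
    return merged


def count_by(merged, field):
    """Count preprints per value of a list-valued field such as keywords."""
    counts = Counter()
    for record in merged.values():
        # set() stops a value repeated within one record being counted twice
        for value in set(record.get(field, [])):
            counts[value] += 1
    return counts


# Hypothetical sample data: field names and DOIs are made up for illustration.
sources = {
    "server_a": [
        {"doi": "10.0000/example.1", "keywords": ["genomics"]},
    ],
    "server_b": [
        {"doi": "https://doi.org/10.0000/EXAMPLE.1", "keywords": ["genomics"]},
        {"doi": "10.0000/example.2", "keywords": ["ecology", "genomics"]},
    ],
}

merged = consolidate(sources)
keyword_counts = count_by(merged, "keywords")
```

Keying on a normalised DOI is only one possible design choice; it assumes every record carries a DOI, which may not hold for all servers.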

Existing data sources and monitors present indicators of growth in broad research areas and large regions, for example:

The preprint servers release annual overview statistics for their own platforms, often on Twitter and sometimes as articles, including Narock & Goldstein, 2019 [doi]; Sever et al., 2019 [doi];
Preprinting and author information data for bioRxiv are collected for Rxivist (by Rich Abdill and Ran Blekhman) on a monthly basis, with stats per major research category as defined by bioRxiv (bioRxiv recently released an API that may further support monitoring of content on this platform);
Prepubmed.org (by Jordan Anaya) indexed preprinting data by month until December 2018 across some platforms, which can be displayed by subject area and number of new corresponding authors, and we have continued monitoring several sources of preprints since December 2018 (see https://asapbio.org/preprint-info/biology-preprints-over-time);
The European Commission’s Open Science Monitor includes preprinting activity split by European country until early 2017;
Searching Europe PMC and calling the Crossref API can retrieve preprints linked to their published versions with associated metadata for venues that report to Crossref;
The SHARE infrastructure developed by the Association of Research Libraries (ARL) in partnership with the Center for Open Science (COS) consolidates some metadata across various sources.
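To give a flavour of the Crossref route mentioned above, here is a minimal sketch that builds a query URL for preprint records and pulls the DOI of any linked published version out of a returned record. It assumes that Crossref labels preprints with the type `posted-content` and reports the link under a `relation` / `is-preprint-of` field; that reflects our understanding of the response format and should be checked against the current Crossref documentation. The sample record is invented and no network request is made.

```python
from urllib.parse import urlencode

CROSSREF_WORKS = "https://api.crossref.org/works"


def preprint_query_url(rows=20):
    """Build a Crossref works URL restricted to posted content (preprints)."""
    params = {"filter": "type:posted-content", "rows": rows}
    return f"{CROSSREF_WORKS}?{urlencode(params)}"


def published_dois(record):
    """Return DOIs of published versions linked from one preprint record.

    Assumes the link appears as record["relation"]["is-preprint-of"],
    a list of dicts with "id-type" and "id" keys.
    """
    relations = record.get("relation", {}).get("is-preprint-of", [])
    return [rel["id"] for rel in relations if rel.get("id-type") == "doi"]


# Invented record shaped like the assumed response format (DOIs are made up).
sample_record = {
    "DOI": "10.0000/preprint.1",
    "type": "posted-content",
    "relation": {
        "is-preprint-of": [
            {"id-type": "doi", "id": "10.0000/journal.1", "asserted-by": "subject"}
        ]
    },
}

url = preprint_query_url(rows=5)
linked = published_dois(sample_record)
```

A record with no `relation` entry simply yields an empty list, which matters in practice because such links only exist where they have been reported to Crossref.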

ASAPbio are interested in reaching a more granular understanding of adoption by individuals and groups, so that we can focus our work on supporting the productive use of preprints in the life sciences. For example, developing or improving ways to search, filter and analyse preprint metadata may help us to:

Improve the accuracy and timeliness with which we monitor and report preprint adoption in the life sciences;
Support ASAPbio ambassadors to develop personalised information for their lab, institution or conference network, by showing real examples of preprint posting by researchers within or associated with their network, potentially engaging with these authors to understand the impact on their science and whether they would recommend the use of preprints to their peers;
Support researchers wishing to collate lists of preprints presented at their conference (several researchers have manually recruited entries to such lists using PreLists);
Identify preprint authors who appear to be nucleating adoption – such as by leading co-authors to preprint for the first time and who go on to preprint again independently – and understand the drivers behind this influence.

The ScholCommLab team continue to share their findings – follow the project on Twitter (#scholcommlab) or sign up for the ScholCommLab’s newsletter to find out when future project outputs become available.

With thanks to Juan Pablo Alperin, Alice Fleerackers, Lauren Maggio, Mario Malički, Janina Sarol, and the ScholCommLab for their contributions.
