FORCE2019: Establishing a shared vision for preprints

Following a panel discussion about “Who will influence the success of preprints in biology and to what end?” at FORCE2019 (summarised here), we continued the discussion over dinner with the panellists and other community stakeholders:

On table 1:

Emmy Tsang (facilitator), eLife
Theo Bloom, BMJ and medRxiv
Andrea Chiarelli, Research Consulting
Scott Edmunds, GigaScience
Amye Kenall, Springer Nature
Fiona Murphy, independent consultant
Michael Parkin, Europe PMC, EMBL-EBI
Alex Wade, Chan Zuckerberg Initiative

On table 2:

Naomi Penfold (facilitator), ASAPbio
Juan Pablo Alperin, ScholCommLab/Public Knowledge Project
Humberto Debat, National Institute of Agricultural Technology (Argentina)
Jo Havemann, AfricArXiv
Maria Levchenko, Europe PMC, EMBL-EBI
Lucia Loffreda, Research Consulting
Claire Rawlinson, BMJ and medRxiv
Dario Taraborelli, Chan Zuckerberg Initiative

To tackle some tricky issues as a group with diverse perspectives, we discussed five straw-man statements about how preprints may or may not function. Emmy’s table discussed:

The level of editorial checks and/or peer review that a preprint has been through should be transparently communicated at the point of access to the preprint
It should always be free for an author to post a preprint
Preprints should not be used to establish priority of discovery
Preprint servers should be agnostic to upstream and downstream tools and processes

Meanwhile Naomi’s table (pictured above) discussed “preprint servers should not be supported by research funders and policymakers unless they demonstrate community governance”, before exchanging different visions for what preprints could be.

Straw-man statement 1: The level of editorial checks and/or peer review that a preprint has been through should be transparently communicated at the point of access to the preprint.

While we generally agreed that editorial checks and any reviewing done on a preprint should be transparently communicated, we quickly realised we have different visions for what transparency means in this context. It is important that we take readers’ needs and experiences into consideration: a researcher who is casually browsing may just need to know the level of scrutiny a preprint has been through (none? pre-screening for compliance with ethical and legal requirements and for scientific relevance? some level of deeper peer review?), while a researcher who is digging deep into that research topic or method may find peer-review comments and version histories useful. Some information, such as retractions, should be communicated clearly to all readers. For effective curation, it will also be crucial that information on the checks and reviews be adequately captured using a well-defined and agreed-upon metadata schema. But how can such data be practically captured across a distributed set of servers? Peer-review and editorial processes nowadays vary hugely between journals and preprint servers, so to what extent can we effectively schematise these processes?
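
To make the schema question more concrete, here is a minimal sketch of what a shared record of checks and reviews might look like. It is purely illustrative: the class names, field names, the three scrutiny levels and the sample identifier and date are our own assumptions, not an existing standard or any server's actual metadata model.

```python
from dataclasses import dataclass, field, asdict
from enum import Enum
from typing import List, Optional
import json


class ScrutinyLevel(str, Enum):
    """Hypothetical coarse levels a casual reader might want to see."""
    NONE = "none"
    SCREENED = "screened"
    PEER_REVIEWED = "peer-reviewed"


@dataclass
class ReviewEvent:
    """One check or review applied to a preprint version (hypothetical fields)."""
    kind: str                      # assumed vocabulary: "screening" or "peer-review"
    performed_by: str              # server staff, journal, community group, ...
    date: str                      # ISO 8601 date string
    report_url: Optional[str] = None  # link to comments, if they are public


@dataclass
class PreprintStatus:
    """Review metadata for a single preprint version."""
    preprint_id: str
    version: int
    retracted: bool = False
    events: List[ReviewEvent] = field(default_factory=list)

    def scrutiny_level(self) -> ScrutinyLevel:
        # Report the deepest level of scrutiny recorded so far.
        kinds = {event.kind for event in self.events}
        if "peer-review" in kinds:
            return ScrutinyLevel.PEER_REVIEWED
        if "screening" in kinds:
            return ScrutinyLevel.SCREENED
        return ScrutinyLevel.NONE

    def summary(self) -> str:
        # Retractions are surfaced first, since all readers need to see them.
        prefix = "RETRACTED; " if self.retracted else ""
        return f"{prefix}scrutiny: {self.scrutiny_level().value}"


# Placeholder values for illustration only.
status = PreprintStatus(
    preprint_id="example-0001",
    version=2,
    events=[ReviewEvent(kind="screening", performed_by="server staff", date="2019-10-01")],
)
print(status.summary())                      # short label for casual readers
print(json.dumps(asdict(status), indent=2))  # full record for deeper reading or harvesting
```

The short summary serves the casual reader, while the full record serves someone digging into the review history. Whether a small fixed vocabulary like this could cover the variety of real editorial processes is exactly the open question raised above.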

Straw-man statement 2: It should always be free for an author to post a preprint.

We unanimously agreed that preprints should be free at the point of use.

Straw-man statement 3: Preprints should not be used to establish priority of discovery.

Ideally, the priority of discovery should not matter, but we recognised that, in the current research climate, this issue should be addressed. Once a preprint is posted publicly, scientific priority of the work described in the preprint is established. We recognise that current legal instruments may not act in line with this: for example, US patent law still establishes priority based on the filing of the patent application, and any public disclosure – by preprint or informal meeting – can undermine this. Further consideration and clarity are needed on how posting a preprint intersects with priority claims and what this means for discovery and intellectual property.

Straw-man statement 4: Preprint servers should be agnostic to upstream and downstream tools and processes.

To use preprints to their full potential, we think preprint servers should be compatible and interoperable with upstream and downstream tools, software and partners, and at the same time not indifferent to information or pointers towards emerging practices, community standards and so on. For example, upstream processes to capture and curate metadata can be invaluable for discovery. Community preprint servers can also advise on best practices for downstream workflows, adding value to the work and facilitating reuse and further contributions.

Straw-man statement 5: Preprint servers should not be supported by research funders and policymakers unless they demonstrate community governance.

What do we mean by community governance and why is this important?

We discussed that a major motivation behind the push for community-led infrastructure is to minimise the chance of commercial interests being prioritised over benefit to science, as has happened with the loss of ownership and access to peer-reviewed manuscripts (by the collective) due to the profit imperative of commercial publishers. Here, we may be asking: are commercial interests prioritised over the purpose of sharing knowledge and facilitating discourse, and how might we ensure this isn’t the case for preprint servers?

Beyond commercial drivers, we acknowledged that service/infrastructure providers (publishers, technologists) are making process and design choices that affect user behaviour. This was not raised as a criticism – instead, several of us agreed that the behaviour of individual researchers is often largely guided by their immediate individual needs and not collective gain, due in part to the pressure and constraints of the environment they are working within. People who work at publishing organisations bring professional skills and knowledge to the reporting of science that are complementary to those of academic editors, reviewers and authors. The question is how to ensure process and design choices are in line with what will most readily advance scholarship.

We discussed how no single stakeholder can represent the best interests of science, nor is there a single vision on how to best achieve it. Is advancing the growth of a particular server a boon for the whole community? Or should all decisions be made in the interests of the collective? Whose content should we pay attention to, and how do we know who to trust? How is any one group’s decision-making process accountable to the whole? We asked these questions with a shared understanding that many journals operate as a collaboration between members of the academic community and publishing staff, and that some preprint servers (such as bioRxiv) are operated along the same lines. However, whether and how this works may not be transparent, and the lack of transparency may be the central issue when it comes to trusting that decisions are in the collective best interests. Leaving the decision of who to trust to funders or policymakers may not reflect what the broader community wants, either.

So, how might decisions at a preprint server be made in a way that the broader community can trust? We looked to other examples of community-led governance – whether that’s the community having input into decisions or being able to hold decision-makers accountable for them, particularly to moderate any decisions influenced by commercial interests. One mechanism is to run an open request for comments (RFC; for example, see https://meta.wikimedia.org/wiki/Requests_for_comment) so that anyone can provide input. However, there needs to be a transparent and fair process to decide whose input is acted upon, and a recognition that such processes do not guarantee better outcomes. Alternatively, projects could employ a combination of mechanisms to listen to different stakeholders: for example, the team behind Europe PMC listen to users through product research, to academics through a scientific advisory board, and to policymakers through a coalition of funders. This latter approach can make decision-making resilient, as it is not easily directed by a single stakeholder (such as anyone representing the commercial bottom line), but it can be costly in terms of management resources.

User behaviour is influenced by social and technological decisions made at the infrastructure level, so how a preprint server is run, and by whom, will contribute to whose vision for preprints in biology will ultimately play out in reality. The discussion continued online after our dinner.

Can we establish a shared vision for preprints in biology?

Our experiences, spheres of knowledge and values all influence what we each envision preprints to be and become: from helping results to be shared in a timely manner, to disrupting the current commercial publishing enterprise.

Emmy’s table discussed how confusion around what constitutes a preprint (and what does not) creates difficulties when developing tools, policies and infrastructure for them. With different use cases for preprints, and where communities may want to share different pre-publication research outputs, it was proposed that narrowing the definition of preprints to “manuscripts ready for journal publication” could help simplify technological development, communication and advocacy work. Preprint servers would then have the sole purpose of housing and serving preprints. This may not capture all use cases of preprints, but it was seen as a worthwhile tradeoff for increasing adoption at this moment in time. However, on Naomi’s table, we proposed it may be useful to be transparent about more complicated and/or extended visions for change, to avoid progress stalling once adoption of this simplified definition is stable.

Importantly, we discussed our concerns about preprints, sometimes envisioning situations we did not want to see materialise:

Preprints may not always be free to post and read, depending on the financial models used to sustain the costs of preprint infrastructure – there was a word of caution about how the open-access movement in the US and Europe is currently pursuing the use of article processing charges (APCs) to pay for open access. This may be how preprints are paid for unless other options, such as direct support by funders and institutions (for example, through libraries), are used.
With preprints available publicly, what if they are misunderstood or misinterpreted? What if incorrect science is spread like “fake news”? We discussed how some patient groups are able to critique the literature without formal science education and that peer review does not guarantee correctness. Offering readers greater transparency and information about whether and how the work has been reviewed by other experts would be helpful.
Preprints may not disrupt scholarship – we may continue to operate in a world that is not optimised for rapid, open, equitable access to the production and consumption of knowledge. This may be seen today in the use of preprinting to claim priority of discovery without including access to the underlying datasets, and in the uptake of journal-integrated pre-publication where authors can show they have passed the triage stage at journal brands with prestigious reputations.
Publisher platforms may generate lock-in, as authors post the preprint to their platform and are then directed to remain within that publisher’s peer-review channels.
We briefly talked about the use of open resources to generate profit: do preprints need protecting from commercial exploitation through the use of licensing clauses, such as share alike (-SA)? Maybe not: profit generation on open resources may not be a problem, as long as the community agrees that the benefits of openness continue to outweigh any exploitation, as is currently seen to be the case for Wikipedia.

So what did we want to see happen? We concluded by sharing our own visions for preprints, including:

The main venue for research dissemination, in a timely manner, which is free to authors and readers, and upon which peer review proceeds. This peer review may be community-organised; it may be more efficient and timely when this is needed, for example during infectious disease outbreaks. The verification and validation of a preprint may change over time, and versioning enables the full history to be interrogated.
A transparent record of scientific discourse that is a resource for learning accepted and/or preferred practices, within a discipline (for example, the appropriate statistical method to apply in a given experimental setup) or more broadly (for example, how to be a constructive peer reviewer and responsible author).
Supporting faster and better advances in medicine, particularly in a world where patients have improved their own lives by hacking medical technologies (e.g. #WeAreNotWaiting) or showing their clinician(s) evidence from the literature.
A vehicle through which researchers can connect and engage with other audiences (patients, policymakers), and learn how to do this well.
A way for knowledge generation and use to be more equitable and inclusive, for example by increasing the visibility of researchers around the globe (as AfricArXiv and others are doing for researchers in or from Africa).
A vehicle for scholarly discourse that does not necessitate in-person attendance at conferences, reducing the use of air travel and avoiding exclusion due to costs, visa issues and other barriers.

Moving forward, suggestions were to include different voices in the discussion, provide more thought leadership, develop a consensus vision for the future of preprints, develop best practice guidelines for preprint servers, and provide users with sufficient information and clarity to help them choose (through action) the future they wish to see.

What is the future you wish to see? We invite you to talk about this with your colleagues and leave a comment below.