On the topic of preprints, “how do I find them?” remains one of the most common questions. While several search tools already index preprints, many require researchers to look outside of their normal workflows.
On June 9, the National Library of Medicine announced a pilot to include NIH-funded preprints in PubMed and PMC beginning this week. While the NIH has been encouraging the use of preprints for years, this launch, detailed in a post on the NLM Director’s blog, marks the first time these research objects have been discoverable alongside peer-reviewed articles in major NIH databases. Below, Kathryn Funk, Program Manager of PubMed Central (PMC) at the National Library of Medicine (NLM), NIH, answers some of our questions about this project.
Why did the NIH decide to run this pilot now?
NLM has been discussing with NIH leadership how to increase the discovery of preprints for some time, with a plan in place to pilot the inclusion of preprints in PMC, NLM’s full-text database, this year. The COVID-19 pandemic, in which preprints are obviously playing an important role in communicating early research findings, has understandably given those discussions and our pilot plans a greater sense of urgency. We’re hoping by launching this effort now, we’ll be able to more easily and quickly connect people to the information they are looking for, which gets to the heart of how we see our role as a library.
Which preprints will be available to view on PubMed/PMC?
As the pilot name—NIH Preprint Pilot—suggests, the focus will be on collecting preprints that report NIH-funded research results. In the planned first phase, we’ll be making COVID-19-related preprints with NIH support available in PMC with a corresponding citation in PubMed. Our curation efforts for this phase are focused on preprints identified by subject matter experts for inclusion in the iSearch COVID-19 Portfolio tool developed by the NIH Office of Portfolio Analysis. To be eligible for inclusion in the pilot, the preprint must be easily identifiable as NIH-supported either in the author affiliations or acknowledgments.
Building on lessons learned and workflows developed during the first phase of the pilot, we hope to leverage existing tools (e.g., My Bibliography) and introduce more automated preprint curation processes that will allow us to scale up the pilot across the spectrum of NIH research.
What considerations did the NIH take into account when deciding which servers to index?
NLM has always valued industry and community best practices in publishing. Though the preprint landscape is obviously less established than that around journal publishing and is still evolving, we didn’t want to reinvent the wheel. Rather, our goal was to build on emerging good practice and industry guidance, as well as NIH guidance.
We pulled together general considerations based on the NIH guidance for selecting interim research product repositories (NOT-OD-17-050) and recommendations for preprint servers outlined in the Committee on Publication Ethics Discussion Document on preprints (Version 1). Additionally, we looked to the best practices in journal publishing that might be more broadly applicable to preprints, such as those outlined in the Principles of Transparency and Best Practice in Scholarly Publishing.
What this ultimately means for the pilot is that we may consider including preprint servers that demonstrate transparency in policies and practices, such as clear indicators of peer review status, a stated screening process, publication ethics policies, and licensing options. We also look at the availability of a server’s preservation policies/strategy and whether the content is made available in human- and machine-readable formats. The hope is that by outlining these considerations we can provide a framework to further the conversation around best practices for preprints. At the same time, recognizing this is a pilot, we anticipate there may be a need for our policies and expectations to evolve, and a certain degree of flexibility is needed upfront.
For practical, resource-related reasons, we also consider the general scope of a preprint server and the volume of identified NIH-supported preprints.
As with the rest of the pilot, we are taking a phased approach to including preprint servers. Right now, as we’re using the COVID-19 Portfolio tool for preprint identification, we’re looking at the servers they are currently indexing.
How will preprints be labeled and linked to their final articles?
Having a clear indicator that a record is a preprint – and context for what this means – is incredibly important to us. PMC and PubMed are accessed by the general public as well as researchers, and we want to be transparent about the types of article content we are making available.
Both PMC and PubMed will have large green info banners on preprint records that clearly identify the articles as preprints. The preprint banners will include text indicating the papers have not been peer reviewed. We’ll also be using indicators of preprint status within the citations, much like we do with ahead-of-print records and author manuscripts. All preprint records will link to further information about the pilot for additional context.
As for linking to a final published article from a preprint (and vice versa), we’re exploring a few mechanisms and resources for identifying these links. When the link is already available in the published article metadata, or when we can confidently identify it by matching on title, author list, and abstract, we’ll provide the related article link in the standard place for related content in each database.
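To make the idea of matching on title, author list, and abstract concrete, here is a minimal sketch in Python. It is not NLM's actual method: the record fields, the function names, and the similarity thresholds are all assumptions chosen for illustration, and a production workflow would likely rely on richer metadata and human review.

```python
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def similarity(a, b):
    """Return a 0..1 similarity ratio between two strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def author_overlap(authors_a, authors_b):
    """Jaccard overlap between two author lists (0..1)."""
    set_a = {normalize(name) for name in authors_a}
    set_b = {normalize(name) for name in authors_b}
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def is_confident_match(preprint, article,
                       title_min=0.9, authors_min=0.6, abstract_min=0.7):
    """Hypothetical rule: link only when all three signals agree.

    The thresholds are illustrative assumptions, not published values.
    """
    return (
        similarity(preprint["title"], article["title"]) >= title_min
        and author_overlap(preprint["authors"], article["authors"]) >= authors_min
        and similarity(preprint["abstract"], article["abstract"]) >= abstract_min
    )


# Invented example records, for illustration only.
preprint = {
    "title": "Structure of a viral spike protein",
    "authors": ["A. Author", "B. Researcher"],
    "abstract": "We report the structure of a viral spike protein at 3.5 A resolution.",
}
published = {
    "title": "Structure of a Viral Spike Protein",
    "authors": ["A. Author", "B. Researcher"],
    "abstract": "We report the structure of a viral spike protein at 3.2 A resolution.",
}
unrelated = {
    "title": "Opioid prescribing trends in rural clinics",
    "authors": ["C. Clinician"],
    "abstract": "We survey prescribing data from rural clinics over ten years.",
}

print(is_confident_match(preprint, published))
print(is_confident_match(preprint, unrelated))
```

Requiring all three signals to clear their thresholds favors precision over recall, which fits the stated aim of adding a link only when the match can be made confidently.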
How will the pilot be evaluated, and what do you hope to learn from it?
At the most basic level, we’ll want to see evidence that we’re meeting the NIH goal of increasing the discoverability of preprints. Are preprints discoverable in search results? Are they accessed? Are they cited?
Additionally, we hope to cultivate a greater awareness and understanding of preprints as a research product. We are interested in whether increased discoverability seems to change preprint sharing practice (e.g., more people posting preprints) and if it has an impact on the licensing of preprints and acknowledgment of support by NIH investigators.
Finally, we want to administer this pilot in a way that ensures continued trust in NLM resources and supports responsible scientific communications, so any evaluation that we do will also consider the feedback we receive via email, social media, and word of mouth.
NLM committed in its 2017-2027 Strategic Plan to anticipating developments such as preprints and novel ways of organizing articles, with the ultimate goal of accelerating discovery and advancing health. This pilot offers an opportunity to begin to explore what this commitment could look like in practice and inform future NLM efforts around preprints.
How would you like to see preprints used in the future?
Right now, there is so much dialogue – rightfully so – around the role preprints are playing in sharing COVID-19 research. For me, the question looking into the future is how can we build on this accelerated discovery and open sharing of results across cancer research and opioid research and Alzheimer’s research (the list goes on) to improve human health across the spectrum? This is our mission at NIH, and we want to see if preprints are an effective way to do that.
Also, as a national library, we’re very cognizant at NLM of the fact that we aren’t just collecting research for today’s researchers. We’re building collections that will be of value decades and centuries into the future. So if we can get to a place where our collections can reflect a scholarly ecosystem and open scholarly dialogue, where early research results are openly shared as preprints alongside the supporting data and then later linked to the published journal article with the relevant peer review documents, that feels like a huge win to me both in the here and now and for future researchers and readers.