Appendix 1: Rationale for a Central Service

Do we need new infrastructure and governance for preprints?

Because the preprint server arXiv was born very early in the history of the internet and served its community well, it has become the de facto repository for preprints in the physical sciences, mathematics, and computer science without any major competitors. During the past two decades, various scientific disciplines have decided to join arXiv rather than start their own servers. Thus, arXiv has become a “central server” for the physical science community and has achieved high visibility.

The success of preprints in physics was aided by the coalescence of a large body of content in one highly visible site (arXiv) that had a high standard for quality, attracted outstanding work, and had a scientist-led governance model. Biologists could attempt to replicate this single server model. However, biologists already deposit work in several existing preprint servers, notably bioRxiv (established 2013), PeerJ Preprints (established 2013), and the q-bio section of arXiv (established 2003). In addition, PLOS has had a long-standing interest in posting pre-peer review manuscripts as an option for submitted papers, which could add considerable content in the near future. F1000 has developed a publishing platform that provides access to manuscripts before formal peer review (effectively a preprint; see definition below) and the Wellcome Trust has adopted the F1000 platform to launch Wellcome Open Research. Other journals or funding agencies may also decide to develop similar dissemination mechanisms for pre-peer review content. Thus, the concept of disseminating “pre-peer review” manuscripts is broadening beyond a traditional “arXiv”-like server.

The future of preprints in biology is now poised at a moment that is both exciting and fragile. If organized and thought through properly, preprints could accelerate scientific communication, serve the public good, clarify priority of discovery, and help career transitions of young scientists (see Preprints for the Life Sciences, Science). The development of preprints could be governed by the scientific community, in partnership with publishers and other service providers, leading to exciting and innovative possibilities for science communication.

However, the future of preprints could be less bright. Preprints could become fragmented among the efforts of multiple competing parties, lack overall visibility and critical mass, fail to harness modern possibilities for dissemination and use, and lack clear governance. Preprints may fail to achieve the level of respectability needed to convince scientists, funders, and universities that the disclosure of work by scientists plays a valuable role in the ecosystem of science communication, alongside post-peer review journal publications. If preprints fail to grow substantially in submissions and readership in the next five years (e.g. to the level of arXiv), scientists and funders will view them as a failed experiment. Because the future of preprints is poised in a critical time window, it is important to think through issues of execution that will maximize the chance of preprint adoption by the community.

If the scientific community does not act, continued fragmentation of preprint sites could undermine the potential of this communication system by generating:

  • Ambiguity about what qualifies as an acceptable preprint and a recognized content provider. Currently, funding agencies and universities are considering whether preprints or other “pre-peer review” publications should be included in applications. However, what counts as a “recognized preprint server” is ambiguous at the moment. Every server or publisher may define its own screening protocol, causing uncertainty about whether a preprint has been screened for plagiarism or adheres to ethical standards. In the current system, each journal, funding agency, and hiring or promotion committee must define a list of approved preprint sources based on its own assessment of preprint servers. This practice, which is already occurring at certain journals, will create a situation that is confusing and discouraging for researchers.
  • Lack of visibility and difficulty of discovery. If preprints are spread across multiple sites, they will become more difficult to find. Maximizing discoverability, visibility, and respectability is key to adoption and widespread use by scientists, as the success of arXiv suggests.
  • Variable and potentially limited access to data. In the current system, each server sets its own licensing policies and is responsible for archiving its own content. This puts content in danger of being held under restrictive licenses or lost altogether.

  • Limited potential for technology development. If each server must create or outsource IT infrastructure, overall costs of the preprint system will be high, many servers will not have funds for more advanced IT development, and the potential for using and disseminating information may be limited.

Value of a Central Service

To overcome the deficiencies described above, we believe it would be in the best interest of the scientific community to create a Central Service (name subject to change at a later date) that will aggregate “pre-peer review” manuscripts from several sources, maintain standards of quality for its intake, preserve content for posterity, and disseminate information in a manner that advances scientific progress. The Central Preprint Service would, in essence, function as a database that serves the public good, analogous to the Protein Data Bank or PubMed Central. We envision that a Central Preprint Service will be supported by a consortium of funding agencies for a minimum five-year term of operation. It will be overseen by a governance body that will be 1) international, 2) led by highly respected members of the scientific community, and 3) transparent in all of its proceedings, actions, and recommendations.

Partnerships with journals and servers

The Central Service will host manuscripts that contain 1) data, 2) the methods needed by other scientists to replicate that data, and 3) an interpretation of that data. The governing body will determine how manuscripts are screened for entry into the Service (for example, to exclude content that is plagiarized, non-scientific, or in violation of ethical guidelines). However, the Service will not engage in validation or judgment of the work as is performed by traditional peer review. Thus, the Central Service will work as a partner, and not a competitor, with existing journals.

The Service also seeks to act as a partner with preprint servers and publishers that ingest manuscripts from authors. Partners that deposit their content into the Central Preprint Service will benefit from additional infrastructure support (e.g. plagiarism detection and conversion tools) and, most importantly, will have greater appeal to scientists, who will want their preprints broadly viewed and recognized by grant and promotion committees.

Creation of a Central Preprint Service for the Life Sciences

ASAPbio is iteratively seeking community feedback on a draft model for a Central Preprint Service. We will integrate community and stakeholder feedback into a proposal, containing several model variants, to funders this fall. Please leave your feedback on the utility of the Central Service, its features, and the model described in the Summary in the comment section at the bottom of the page, or email it privately to jessica.polka at gmail.com. Additional comments are posted on hypothes.is.

Central Service model documents

Summary

At the ASAPbio Funders’ Workshop (May 24, 2016, NIH), representatives from 16 funding agencies requested that ASAPbio “develop a proposal describing the governance, infrastructure and standards desired for a preprint service that represents the views of the broadest number of stakeholders.” We are now holding a Technical Workshop to advise on the infrastructure and standards for a Central Service (CS) for preprints. ASAPbio will integrate the output of the meeting and community and stakeholder feedback into a proposal to funding agencies this fall. The funders may issue a formal RFA to which any interested parties could apply for funding. More details on this process are found at the end of Appendix 2.

Background

The preprint ecosystem in biology is already diverse; major players include bioRxiv, PeerJ Preprints, the q-bio section of arXiv, and others. In addition, platforms such as F1000Research and Wellcome Open Research are producing increasing volumes of pre-peer reviewed content. PLOS has a stated commitment to exploring posting of manuscripts before peer review, and other services may be developed in the future.

Increasing the number of intake mechanisms for the dissemination of pre-peer reviewed manuscripts has several advantages, for example: 1) generating more choices for scientists, 2) promoting innovative author services, and 3) increasing the overall volume of manuscripts, thus helping to establish a system of scientist-driven disclosure of research. However, an increasing number of intake mechanisms may also lead to confusion and difficulty in finding preprints, heterogeneous standards of ethical disclosure, duplication of effort in the creation of infrastructure, and uncertainty about long-term preservation. (See a more complete discussion of why we think it is essential to aggregate content in Appendix 1.)

Based upon funder interest from the May 24th Workshop, ASAPbio will propose that funding agencies support the creation of a Central Service (CS) that will aggregate preprint content from multiple entities. This service will have features of PubMed (indexing/search) and PubMed Central (collection, storage, and output of manuscripts and other data).

The advantages of this system for the scientific community would be:

  1. Oversight by a Governance Body. The content, performance, and services of the CS would be overseen by a Governance Body composed of highly respected scientists and technical experts. The formation of the Governance Body, which will have international representation and be transparent in its operation, will be addressed by a separate ASAPbio task force and will not be discussed in the Technical Workshop. The connection between the CS and a community-led Governance Body will ensure that preprints continue to serve the public good and develop in ways that benefit the scientific community, beyond the needs of individual publishers and servers. The formation of a central, well-functioning Governance Body has been repeatedly described by funders and scientists as an essential element in gaining respectability for preprints and guiding the system in the future.
  2. Guaranteed stable preservation. Archiving content through a CS better assures permanence of the scientific record, even if a preprint server/publisher decides to discontinue its services. This is a key feature for both scientists and funders.
  3. Greater discoverability and visibility for scientists. The CS would become the location for scientists to search for all new pre-peer reviewed content. Lessons from arXiv indicate that a highly visible, highly respected single site for searching for new findings is essential for the scientific community.
  4. Clarity on what qualifies as a respected preprint. Scientists want their preprints to “count” for hiring, promotions, and grant applications. However, universities and funding agencies are concerned about quality control for preprints and how they can guide their scientists and reviewers on what qualifies as a credible preprint or preprint server. The CS/Governance Body will work with universities and funders to apply uniform standards for author identity, plagiarism checks, and moderation of problems, and to create ethical guidelines for research and disclosure. Thus, content on the CS, coming from several sources, will meet uniform guidelines acceptable to funders and universities.
  5. Better services for scientists. Scientists, as consumers, want better ways of viewing content. They want to read manuscripts in an XML format on the web or as a PDF download, more easily link to references, and more easily view figures and movies. The CS would perform document conversion to ease viewing and searching of material, thereby accelerating new discoveries. The CS would have an API to enable innovative reuse by other parties to provide services that could be valuable for scientists beyond the scope of the CS (e.g. evaluations of work, journal clubs, additional search engines).
  6. Reduced overall cost. The Central Service can efficiently provide services (such as archiving, automated screening, and document conversion) that otherwise would be provided redundantly by each intake server/publisher.

We discussed various models for the CS with stakeholders (see Appendix 2 for types of models and the feedback that we received). This document describes the current iteration of the model, which is still in draft form. We will present several variations to funders this fall, based on feedback received, including the comments here. If you prefer, you may email comments privately to jessica.polka at gmail.com.

The CS would undertake several functions, including centralized document conversion, accrediting (via setting guidelines for intake), archiving, search, and an API for third-party use. We are currently considering that the CS would not display full text, but would instead send the converted full text back to the intake server for display.

In this draft model:

  • Servers would facilitate the submission of a .doc or .tex file and a standardized set of metadata (e.g. author names and, potentially, ORCID iDs) to the CS. From this file, the CS could extract an HTML or XML file (possibly including links to references, figures, etc). A minimal sketch of this flow appears after this list.
  • If this file passes CS screening (including plagiarism detection and, potentially, human moderation), it would be admitted into the central database, assigned a unique ID, and sent back to the intake provider for display.
  • The CS would archive the original .doc file and other associated files, and also make these available via an API; as reference-extraction technology improves, new HTML/XML derivatives can be prepared. The CS would reserve the right to display content if the intake provider is not able to do so or if required by the funders or governance body.
  • Readers could search for preprints (or receive alerts) through CS-hosted tools that would display metadata (including abstracts); readers would be sent to the intake server for full-text display of preprints.
  • All aspects of the central service would be under the control of a governing body, which would have international representation from the scientific community and could develop over time.
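
As a concrete illustration of this flow, here is a minimal sketch in Python of how a partner server might deposit a manuscript with the CS. The endpoint URL, field names, and response shape are hypothetical illustrations (no such API has been specified); only the requests library and its standard call signature are real.

    import json
    import requests  # real HTTP library; the service it talks to below is hypothetical

    # Hypothetical metadata a partner server might send alongside the manuscript file.
    metadata = {
        "title": "Example preprint title",
        "authors": [{"name": "Jane Doe", "orcid": "0000-0002-1825-0097"}],  # ORCID optional
        "license": "CC-BY",
        "source_server": "example-preprint-server",
    }

    # Deposit the author's .doc/.tex file plus metadata with the Central Service.
    with open("manuscript.docx", "rb") as f:
        response = requests.post(
            "https://central-service.example.org/api/v1/deposits",  # hypothetical URL
            files={"manuscript": f},
            data={"metadata": json.dumps(metadata)},
        )

    # If the deposit passes screening, the CS would return a unique ID and the
    # converted full text for the intake server to display.
    deposit = response.json()
    print(deposit["cs_id"], deposit["status"])  # e.g. "cs:123456", "screened"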

The Technical Workshop will discuss the features, mechanisms, existing infrastructure, potential concerns and challenges, and timelines for implementation for the elements in orange on the diagram below. 

[Diagram: CS model v2]

ASAPbio will continue to modify the model before and after the Technical Workshop before presenting several variations to funders in the fall.

Below: possible early-stage implementation

[Diagram: CS model v2 initial]

Four foundations announce support for ASAPbio

This announcement was originally posted on the Simons Foundation website.

On June 20, four foundations announced their support for ASAPbio (Accelerating Science and Publication in Biology), a scientist-driven effort with a mission to promote the use of preprints in the life sciences. The combined total provisional funding — from the Alfred P. Sloan Foundation, the Gordon and Betty Moore Foundation, the Laura and John Arnold Foundation and the Simons Foundation — is $400,000 for work to be conducted over the next 18 months.

The hope is that use of preprints will catalyze scientific discovery, facilitate career advancement and improve the culture of communication within the biology community.

Vale & Hyman publish eLife article on preprints & priority

Tony Hyman and ASAPbio founder Ron Vale have just published a Point of View in eLife building on their earlier blog post.

ABSTRACT: The job of a scientist is to make a discovery and then communicate this new knowledge to others. For a scientist to be successful, he or she needs to be able to claim credit or priority for discoveries throughout their career. However, despite being fundamental to the reward system of science, the principles for establishing the “priority of discovery” are rarely discussed. Here we break down priority into two steps: disclosure, in which the discovery is released to the world-wide community; and validation, in which other scientists assess the accuracy, quality and importance of the work. Currently, in biology, disclosure and an initial validation are combined in a journal publication. Here, we discuss the advantages of separating these steps into disclosure via a preprint, and validation via a combination of peer review at a journal and additional evaluation by the wider scientific community.

Summary of the ASAPbio Funders’ Workshop

The following is a message from funding agency representatives who attended our recent Funders’ Workshop.

As research funders who attended the ASAPbio Funders’ Workshop for Preprints held at the National Institutes of Health (NIH) on May 23-24, 2016, we wish to provide a brief summary of the meeting. This follows the initial Funders’ Perspective drawn from the first ASAPbio Workshop held on February 16-17, 2016, and continues our desire to be transparent while the community continues to explore the value of preprints to the biomedical research enterprise.

At this workshop, the funders were presented with a summary from the first workshop and the results of a survey conducted by ASAPbio. This was followed by an open discussion of the scholarly and technical goals of a preprint service. The agenda then moved to a discussion of two exemplary models of shared governance of a resource in an international setting: Europe PubMed Central (Europe PMC) and the Worldwide Protein Data Bank (wwPDB). The final context setting for the funders’ discussion was provided by representatives of existing and anticipated preprint services: arXiv, bioRxiv, PeerJ, F1000 Research, and PLOS. What followed was an open session with all stakeholders present and a closed session involving only the funders.

The consensus of the workshop attendees reflected high enthusiasm about further development of a preprint service for the life sciences. At the end of the day, it was agreed by all in attendance that:

  1. A preprint policy that is as homogeneous as possible across funders is desired, especially in the way that preprints are considered as part of grant proposal submission and review. A subgroup of funders will draft a concept paper addressing some of the policy issues that might arise when implementing such a preprint policy. This draft will be shared with other funders for their input.
  2. The funders asked ASAPbio to develop a proposal describing the governance, infrastructure and standards desired for a preprint service that represents the views of the broadest number of stakeholders. The proposal should include a budget, goals, milestones and an implementation timeline to bring an appropriate community-defined preprint service into operation.
  3. This letter be distributed as widely as possible to inform all stakeholders of the continued interest by funders in expanding the use of preprints by the life sciences community.

Philip Bourne, The National Institutes of Health
Maryrose Franko, Health Research Alliance
Michele Garfinkel, European Molecular Biology Organization
Judith Glaven, Howard Hughes Medical Institute
Eric Green, The National Institutes of Health
Josh Greenberg, The Alfred P. Sloan Foundation
Jennifer Hansen, Bill and Melinda Gates Foundation
Robert Kiley, The Wellcome Trust
Cecy Marden, The Wellcome Trust
Paul Lasko, Canadian Institutes of Health Research
Maria Leptin, European Molecular Biology Organization
Tony Peatfield, Medical Research Council, UK
Brooke Rosenzweig, The Helmsley Trust
Jane Silverthorne, The National Science Foundation
John Spiro, The Simons Foundation
Michael Stebbins, The Arnold Foundation
Nils Stenseth, European Research Council
Carly Strasser, Gordon and Betty Moore Foundation
Neil Thakur, The National Institutes of Health
K. VijayRaghavan, Department of Biotechnology, India

Moore Foundation requests grantee feedback on preprint policy

The Data-Driven Discovery group at the Gordon and Betty Moore Foundation released a post on Medium today soliciting feedback on proposed changes to their policies on a variety of open access practices. Preprints are discussed as follows:

Ideally, all journal articles would first be available as preprints. Preprints are versions of your manuscript that are not yet peer reviewed. Many journals allow you to submit articles that have been available as preprints (see this list for more information). Read more about the benefits of preprints here. Typical places where preprints are deposited for free (read more from the Jabberwocky Ecology blog):

  • arXiv (for physics, mathematics, computer science, quantitative biology)
  • bioRxiv (for any biology research)
  • PeerJ Preprints (for biology, medical/health sciences, computer sciences)
  • figshare (for any research)

You can read more and provide input at the post.

Simons Foundation supports preprints in grants

On May 20, 2016, a Simons Foundation initiative, SFARI, announced that it has changed its policies to support and encourage the use of preprints.

The Simons Foundation Autism Research Initiative (SFARI) recently made two important changes that we hope will help to accelerate the pace of autism research. First, we changed our grant award letter to strongly encourage all SFARI Investigators to post preprints on recognized servers in parallel with (or even before) submission to a peer-reviewed journal. Second, our biosketch form was updated to include space for SFARI grant applicants to list manuscripts deposited in preprint servers; we and our outside peer reviewers will take these manuscripts into consideration when making funding decisions.

Read more on the SFARI website here.

ASAPbio attendees’ commentary in Science

A group of attendees of the ASAPbio meeting published a commentary in the “Policy Forum” section of the journal Science on May 20, 2016. Written by scientists and representatives from journals and funding agencies, the paper serves as a meeting report and summary of opinions on the use of preprints in the life sciences.

Correction: This paper contains a sentence stating that “the median review time at journals has grown from 85 days to >150 days during the past decade.” This is true of Nature, but not journals as a whole. Daniel Himmelstein’s analysis shows that delays across all journals have remained stable.

Document 4: What does IT infrastructure for a next-generation preprint service look like?

Authored by Jo McEntyre and Phil Bourne

Goal: To satisfy the fundamental requirements of establishing scientific priority rapidly and cheaply through providing the ability to publish and access open preprints, balanced with the desire to support open science innovation around publishing workflows.

Approach: An internationally supported, open archive (or platform) for preprints as infrastructure is ideal because (a) should the use of preprints become widespread, there is potential to reap long-term open science benefits, as is the case for public data resources, and (b) some core functions only need to be done once, not over and over (think: CrossRef, ORCID, INSDC, PDB, PMC/Europe PMC). Ideally this would involve working with existing preprint servers to provide a core platform and archival support.

Some assumptions

  • No point-of-service cost to post a preprint for the author.
  • Licenses that support reuse (i.e. CC-BY) of posted articles.
  • Preprints will be citable (have DOIs).
  • Should be embedded with related infrastructures such as Europe PMC/PMC, ORCID, CrossRef and public data resources.
  • Reuse and integration as core values – by various stakeholder groups including publishers, algorithm developers, text miners, other service providers
  • Standard implementations of key requirements across multiple stakeholders, e.g. version control, event notification (such as publication in a journal or citation of a preprint), and article format standards (JATS); a hypothetical example notification appears after this list.
  • All preprints basically discoverable and minable through a single search portal.
  • Transparent reporting/management builds trust and authority around priority
  • International and representative governance
  • Metrics to provide data on meaningful use of the content
  • Tools to manage submissions e.g. triage, communication etc. in keeping with existing manuscript submission systems
  • Public commentary on submissions
  • Linkage with final published version of the article (when it exists)
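
To make the event-notification and linkage assumptions above concrete, a notification linking a preprint to its later journal version might look like the sketch below (Python). The schema and field names are purely hypothetical; real services such as CrossRef define their own formats.

    # Hypothetical event payload linking a preprint to its final journal version.
    # All field names and DOIs are illustrative placeholders, not a real schema.
    event = {
        "event_type": "published-as",
        "preprint_doi": "10.1101/000000",      # placeholder preprint DOI
        "published_doi": "10.1000/xyz123",     # placeholder journal DOI
        "occurred_at": "2016-09-01T12:00:00Z",
        "source": "journal-publisher",
    }

Consumers of such events (the archive itself, search portals, or third-party tools) could then update version links and alert readers that a published version exists.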

Data Ingest

The preprint server should be considered an active archive. This means that all content can be accessed at any time and certain core services are provided to enable access by both people and machines.

  1. Basic submission support and support for standard automated screening.
  2. Possible limited branding on submission portals.
  3. Competition on screening methods, or other author services by existing preprint servers (or others) is possible.
  4. Advantages: simplified content flow, standards implementation, content in one place for future use.

Basic Services

  • A stand-alone archive. Initial submission needs to be very quick for the author: i.e. basic metadata plus files establishes priority.
  • Files rapidly published as PDFs with DOI and posted after screening/author services.
  • Ingest mechanisms could be diversified through existing preprint servers – but some basic [automated] criteria would always need to be met (automated to retain speed). For example, it could require an ORCID for the submitting author as a simple trust mechanism, with further validation against grant IDs. Algorithms working on content (plagiarism, detection of poor animal welfare, scope, obscenity) could operate. There is scope for automated screening to be phased in and improved over time (a sketch of such a screening gate appears after this list).
  • This model provides the opportunity for innovation around screening algorithms by the platform as well as third parties. It also provides business opportunities around author services.
  • Importantly, it also provides opportunities for innovation around coordinated submission of other materials relevant to the article, for example data or software. But any integration of this nature would need to be lightweight for submitting authors, as the speed of publication is a non-negotiable feature of the preprint service.
  • Core version management would be required, both regarding new versions of the same article and linking with any future published versions of the article in journals.
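
As one illustration of the phased-in automated screening described above, the following Python sketch gates a submission on the presence of a submitter ORCID plus pluggable content checks. All names and the check logic are hypothetical; a production system would call real plagiarism and classification services.

    from dataclasses import dataclass, field
    from typing import Callable, List, Optional, Tuple

    @dataclass
    class Submission:
        submitter_orcid: Optional[str]  # simple trust mechanism: ORCID required
        text: str
        checks_failed: List[str] = field(default_factory=list)

    def screen(sub: Submission,
               content_checks: List[Tuple[str, Callable[[str], bool]]]) -> bool:
        """Run fast automated checks so that posting stays quick.

        Each content check (e.g. a plagiarism scorer or a scope classifier)
        returns True to pass; failures are recorded for human follow-up
        rather than silently rejected.
        """
        if sub.submitter_orcid is None:
            sub.checks_failed.append("missing-orcid")
        for name, check in content_checks:
            if not check(sub.text):
                sub.checks_failed.append(name)
        return not sub.checks_failed

    # Hypothetical usage, with a stub standing in for a real plagiarism service.
    sub = Submission(submitter_orcid="0000-0002-1825-0097", text="...")
    if screen(sub, [("plagiarism", lambda text: True)]):
        print("admit and post")
    else:
        print("hold for human review:", sub.checks_failed)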

Authenticated Content

  • After basic services get content in and published, the preprint service could generate JATS XML for authenticated content. Authenticated content could be defined by a number of criteria; it could include, for example, preprints from PIs funded by organisations that support the infrastructure, or popular preprints. (A minimal JATS-style skeleton is sketched after this list.)
  • There is a cost to generating JATS XML. Limiting this added value to authenticated content could help control costs and give some confidence around that content for promoting discoverability via existing infrastructures.
  • Conversion to JATS XML will take some time, and would require input from the submitter to sign off on the resulting converted article. However, it has the bonus of being integrity-checked (e.g. all the figures are present), available for deep indexing and integration, and more widely discoverable via Europe PMC/PMC. Wider discoverability could be an incentive for authors to take the modest amount of extra time required to provide this data quality.
  • Note this could be an ingest point into the archive for XML content from other services/platforms.
  • In the future this more rigorous treatment may be extended to basic services, as methods that directly convert Word to Scholarly HTML and JATS XML mature and improve, and as costs fall. However, it is likely that publishing speed will be an issue for some time and that a degree of submitter involvement will always be required.
  • The availability of content in JATS XML provides many opportunities for innovation around the provision of more structure in articles for integration purposes (e.g. tagging reagents, data citations and other deep linking mechanisms).
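
For readers unfamiliar with JATS, the Python sketch below assembles a minimal JATS-style skeleton with the standard library. It only illustrates the kind of structure conversion targets; a real JATS document must validate against the full JATS DTD/schema, and the content values here are placeholders.

    import xml.etree.ElementTree as ET

    # Minimal JATS-style skeleton; element names follow the JATS tag set,
    # but this is an illustration, not a schema-valid production document.
    article = ET.Element("article", {"article-type": "research-article"})
    front = ET.SubElement(article, "front")
    meta = ET.SubElement(front, "article-meta")

    title_group = ET.SubElement(meta, "title-group")
    ET.SubElement(title_group, "article-title").text = "Example preprint title"

    contribs = ET.SubElement(meta, "contrib-group")
    contrib = ET.SubElement(contribs, "contrib", {"contrib-type": "author"})
    name = ET.SubElement(contrib, "name")
    ET.SubElement(name, "surname").text = "Doe"
    ET.SubElement(name, "given-names").text = "Jane"

    abstract = ET.SubElement(meta, "abstract")
    ET.SubElement(abstract, "p").text = "One-paragraph abstract goes here."

    body = ET.SubElement(article, "body")
    ET.SubElement(body, "p").text = "Full text, figures, and references follow."

    print(ET.tostring(article, encoding="unicode"))

Deep indexing and integration (e.g. tagged reagents or data citations, as noted above) would hang further markup off this same structure.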

Post publication screening, filtering and sorting on the preprint platform

  • All content would be available for post-publication human screening such as user reporting of problematic content, commenting and so on.
  • More sophisticated algorithms that rank, sort and filter search results based on trust, content or other criteria could be developed by the platform and, most importantly, by third parties.

Data Out

  • All content available for bulk download (PDFs and XML), and via APIs, as well as through website search and browse (a sketch of a hypothetical API client appears after this list).
  • Authenticated content could be made available via established archives (e.g. Europe PMC/PMC) as a clear subset.
  • Core services managed centrally, for example content sharing with journals (this could be in collaboration with e.g. CrossRef, since they already have some infrastructure around this).
  • There are possibilities for sharing article XML, comments, and reviews across publication workflows with journals and other platforms, thus saving processing costs.
  • There are countless opportunities to support further innovation on the content by both commercial and academic parties with an open platform approach.
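
As an illustration of the "data out" side described above, a text-mining client might page through preprint records as sketched below in Python. The base URL, parameters, and response fields are hypothetical; only the requests library and its standard call signature are real.

    import requests  # real HTTP library; the API below is hypothetical

    def iter_preprints(base_url="https://central-service.example.org/api/v1"):
        """Yield preprint metadata records, following cursor-based pagination."""
        cursor = "*"
        while cursor:
            resp = requests.get(f"{base_url}/preprints",
                                params={"cursor": cursor, "format": "json"})
            resp.raise_for_status()
            page = resp.json()
            yield from page["records"]        # each record: id, title, abstract, links
            cursor = page.get("next_cursor")  # absent/None when the listing ends

    # A text miner could then fetch each record's XML for deep indexing.
    for record in iter_preprints():
        print(record["id"], record["title"])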

Document 6: Additional Questions for Possible Consideration

Drafted by ASAPbio

How can funders help to validate preprints as a mechanism for communication?

In a Commentary in Science published on May 20, 2016, co-authors representing several funding agencies recommended:

1) Publishing an explicit statement encouraging researchers to make early versions of their manuscripts available through acceptable preprint repositories.

2) Permitting the citation of preprints in acceptable repositories in grant proposals as evidence of productivity, research progress and/or preliminary work.

3) Providing guidance to reviewers on how to assess preprints in grant proposals.

How do funders envision taking these recommendations forward within their own agencies? Can ASAPbio assist in those efforts by working with scientific societies, institutions, journals, and advocacy groups?

Special considerations for human research?

Are there special limitations or concerns regarding preprints and human research that should be taken into consideration for a funder-supported core preprint service?

Gathering data on preprint usage?

Currently, views on the effectiveness and potential pitfalls of how we communicate and evaluate scientific findings are mostly opinion-driven rather than data-driven. Might funders wish to gather data concerning preprint servers (or compare work going to preprint servers and journals)? Do preprint servers facilitate the transmission of irreproducible work? Or do preprints reduce the appearance of problematic journal publications? Do scientists submit lower-quality work to preprint servers, or is the content similar to journal submissions? Is transmission of “pseudo-science” a problem in reality? Do grant committees find preprints useful or burdensome? Are there additional questions that could be informed by data?

Managing Quality Control?

What kind of quality control would funders like to see (i.e. preventing pseudoscience)? How possible is it to ensure uniform quality control on multiple servers? Must all screening be done manually by members of the community, or could algorithms be useful as a mechanism for prioritizing human screening (based upon ORCID numbers, grant support, prior publications, etc.)? Do funders want additional quality control provisions (e.g. e-signing by the submitting author of agreements on authorship and ethical standards of data gathering)? Would acknowledgment and linkage of submitted work to grant support help to solidify the credibility of submitted work? Should the service remove or flag a preprint that has been shown through subsequent review to contain incorrect or falsified data? How important is QC for preprints? Which features of QC should be implemented now, and which could be approached in the future?