Document 3: Implementation of the Preprint Service

Drafted by ASAPbio

At present, biologists submit very few preprints. However, possible growth to the level of arXiv (100,000 submissions/year) or beyond needs to be considered. Such growth would challenge the IT capacity (including robust data back-up) and quality control screening systems of existing servers, as well as heighten the need to integrate this information. While preprints are still a nascent effort in biology, now is an opportune moment to think through a good preprint system that will be accepted by the biology community, have good functionality, and have lasting value. A particularly important topic for discussion will be whether a consortium of funders will want to support:

1) a single server for the intake of preprints, or 2) a system of linked but independent servers with common standards for quality control, data exchange, etc.? Factors to consider for these models:

Maintaining uniform data sharing standards, licensing, and quality control of input. The wwPDB provides an example of a central body that sets standards for multiple PDB servers so that they all contribute in a uniform manner to a single global archive. This demonstrates the feasibility of the multiple server model. However, is it the most efficient model? If one were to build a system from scratch today, would it be easier to achieve these same goals with one server?

Governance.  A single server supported by the consortium would presumably have one governing body.  With multiple servers, how would governance work for setting standards for integration?  Would funders (or funders/scientists) be involved in the appointments to the advisory boards of individual servers?

International Representation and Involvement. Preprints are a global resource of knowledge, and international involvement is critical for the realization of this vision. A single preprint server located in the United States and funded only by US agencies may not be perceived as a global resource or attract scientists from around the world. A single server could be supported by a consortium of international funders. Alternatively, different preprint servers for different geographic regions could emerge and be supported by regional funding agencies, along the lines of the PDB and PMC.

Overall Cost and Funding Mechanism. Funding of a single server by a consortium of cooperative parties is relatively straightforward, but how would that single server be chosen? Would it be through a competitive call for a contract? On the other hand, if there are multiple preprint servers, at what level would the funders engage? Would funders build the IT infrastructure for linking the data? Or would they fund the operation of multiple intake servers? If so, would there be redundant costs for operating several servers versus funding one, and would these be mitigated by the extra value created?

Long-Term Archival and Preservation.  Preprints should be a permanent record of scientific work and should be backed up. How would the one versus multiple server models affect the implementation of an effective strategy for maintaining a permanent record?

Spurring Innovation. A primary argument for the multiple preprint server model is the potential to promote innovation. arXiv, for example, only has PDF download and no commentary features. What if a physicist wants a nice HTML web interface for their manuscript and internet commentary on their work? Perhaps multiple, competitive intake servers with different interfaces and features would be beneficial for physicists? However, with a single server, innovation could still occur at a level above the initial submission. With free access to the single server’s API, for-profit or non-profit entities could provide added value, which could include better customized search engines for information, recommendations of work, post-publication peer review, and discussion forums. By separating 1) the initial submission to a highly visible and stable platform (what all scientists want) from 2) additional services (what some scientists want and will be willing to pay for), a marketplace for innovation can still develop, and new ideas can be tested based upon need and performance. This type of innovation, however, requires that the server be developed with an open API and the right licensing terms.

Do We Know What to Build? Supporting one server entails risk since it could fail for a variety of reasons. Multiple servers might mitigate risk and perhaps even promote a Darwinian competition with an eventual winner (or with multiple winners each providing value). This marketplace rationale is reasonable. However, there are also counterarguments and questions. Will economics (financial support through scientist grants or directly from funders) be sufficient to allow the growth of many flourishing, properly maintained, and innovative preprint servers? Can we start by building one consortium-funded server now that will succeed in its goals and not be likely to fail?

An Alternative Model:  Preprints from Journal Submissions

Every journal could develop its own “preprint service” by posting submitted work while in review. One advantage is that the entire process (submission to publication) could be made transparent (an interesting model being pioneered by F1000 Research). Funders would not need to pay for the preprint directly since that cost will be absorbed by the journal (but passed along ultimately to the scientist who pays final publication fees). Innovation is promoted since each journal can develop its own preprint style.

One disadvantage is that funders will have to develop a mechanism for assembling the preprint database (within PubMed or a new system) from preprints ingested through many journals. Furthermore, while some journals accept the majority of submitted manuscripts after peer review, most do not. This will create a non-viable starting position for many scientists, since a preprint will be linked to a journal that might ultimately reject the work. Finally, an important premise of preprints is to circumvent the current problem of judging quality based upon journal name.