
Preprint journal clubs

Have you organized a preprint journal club, or are you interested in running one? Please take this survey by ASAPbio Ambassadors Sam Hindle and Daniela Saderi!

For authors, one of the most exciting potential benefits of preprints is the ability to attract early feedback from broad and diverse sources during the preparation of a scientific manuscript. Preprint journal clubs can provide this input – and a more meaningful review experience for their own members as well. Here are some examples and resources for setting them up.

Prachee Avasthi’s preprint journal club

Prachee Avasthi, an Assistant Professor at the University of Kansas Medical Center, runs a course for graduate students called “Analysis of Scientific Papers.” The class takes the shape of a journal club in which students learn how to critically evaluate scientific manuscripts.

What makes Prachee’s course unique is that the papers under evaluation are drawn exclusively from preprints. As she explains in the video above, this has several benefits:

  • Students’ feedback is actually useful to authors since it’s created while a manuscript is under revision, instead of after it has been published.
  • Since students are expected to share their reviews, they must pay more attention to maintaining high-quality commentary and a productive tone.
  • Posting these reviews publicly helps to demonstrate the review process to other students and to scientists interested in the evolution of the paper in question.

Prachee has generously shared her syllabus and introductory slide deck, and the students’ reviews can be found on the Winnower.

Other preprint journal clubs

We’d like to collect a list of similar groups and the tools that facilitate them. If you’re aware of others, let us know at jessica.polka@asapbio.org.

Whether in a course or on a blog, feedback from journal clubs can have a positive impact on authors and their science.

10 ways to support preprints (besides posting one)

Preprinting in biology is gaining steam, but the practice is still far from the norm: the rate of uploads to all preprint servers combined is about 1% of the rate of new PubMed entries. The most obvious way for individual scientists to help turn the tide is, of course, to preprint their own work. But given how long it now takes to accumulate the data for a paper, this opportunity might not come up as often as we’d like.

So, what else can we do to promote the productive use of preprints in biology?

1. Cite preprints

Many biologists, especially early career researchers, are concerned that their preprints won’t be properly acknowledged.

If you’d like to see some anecdata, here’s a word cloud generated by digitally polling the audience at the EMBO Long Term (postdoctoral) Fellowship retreat in November 2016 (39 devices responded). The prompt was, “What is your biggest concern about preprints?”

[Word cloud of audience responses]

While we have yet to hear an example of a preprint author getting scooped, the concern remains very real. To counter this fear, we need to set an expectation that work disclosed in preprints will be cited fairly when relevant to other preprints and journal articles. A commitment to fairly cite relevant preprints was included in a draft statement from our first meeting, and it was widely endorsed.

2. Comment on a preprint

One of the greatest opportunities preprinting presents is the chance to receive more feedback on a paper. For example, Nikolai Slavov describes how thoughtful, constructive feedback helped his paper improve:

By using this feedback mechanism, we can strengthen one anothers’ science.

3. Set up email alerts

With an increasing number of preprint servers, it can be difficult to manually visit each one to stay on top of the literature. ASAPbio is working to facilitate an aggregation tool that will make this easier, but for now, there are several ways to get automatic email alerts on preprints of interest to you, including PrePubMed’s RSS tool. More details and instructions for different preprint alert options are described here.
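
If the server you follow offers an RSS or Atom feed, a few lines of Python can serve as a do-it-yourself alert. Below is a minimal sketch using the feedparser library; the feed URL is a placeholder to be replaced with the one provided by your preprint server or generated by PrePubMed's RSS tool.

```python
# A minimal sketch of a preprint alert script using the feedparser library.
# FEED_URL is a placeholder -- substitute the RSS/Atom feed URL offered by
# your preprint server or generated by PrePubMed's RSS tool.
import feedparser

FEED_URL = "https://example.org/preprints/rss"  # placeholder

feed = feedparser.parse(FEED_URL)
for entry in feed.entries:
    print(entry.title, "->", entry.link)
```

Run it on a schedule (e.g. with cron) and compare against previously seen links to approximate an email alert.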


4. Review a preprint in your journal club

Reviewing preprints may be even more rewarding than reviewing papers: you have the option to share your opinions with the authors, publicly or privately.

It’s a great educational experience for students, too. Prachee Avasthi at the University of Kansas Medical Center draws material for her “Analysis of Scientific Papers” course exclusively from preprint servers. She’s generously shared her syllabus and introductory slide deck, and the students’ reviews can be found on the Winnower.

See more examples of preprint journal clubs here.

5. Stickers

There are probably researchers in your department who aren’t aware that someone they know has posted a preprint. You can spark conversations around your lab or at conferences by affixing a sticker to your laptop, water bottle, or office door. See examples below for inspiration.

[Sticker photos shared by @mcdawg, @pollyp, @sciezgin, @BrianKelch, @clathrin, @aidarodrigo, @JonathanPDrury, and @vinjlynch]

Fill out this simple form to request some free stickers.

6. Add a message to your email signature

You can raise awareness about preprints with every message sent. Here’s an example:

7. Tell your preprint story

We’re collecting stories about researchers’ experiences with preprints. You can tweet them to @jessicapolka or use the #ASAPbio hashtag. You can also make a video (similar to Nikolai’s, above), or email jessica.polka@asapbio.org if you’d like to share something in a longer written format.

8. Become an ambassador

ASAPbio ambassadors have agreed to act as local points of contact for discussions regarding preprints. They are listed on asapbio.org and have a private discussion group and access to shared presentation materials like slides and posters.

Sign up here.

9. Promote policy change

Journal, funder, and university policies are critical to making preprinting a viable option in biology. If journals with restrictive preprint policies operate in your field, you could write to the editors to request that they reconsider. Requests can be made spontaneously or as part of an ongoing correspondence to bring attention to the matter.

If you’re on a faculty search committee, consider working to insert a call for preprints into the job ad:

We’re keeping a list of university, funder, and journal policies to provide examples of existing progressive policies that you can reference.

10. Add a slide about preprints to the end of your talks

Of course, this works best if customized to fit your own experience (e.g., a screenshot of the preprint you’ve been discussing in the talk). You can download a template in pptx here.

Note: the original version of this post encouraged responses to the NIH’s RFI on preprints as item #10.

Please share more ideas below!

Appendix 2: Current feedback on Central Service features

Central Service model documents

Current discussions with the community on proposed features of the Central Service

Surveys and information from scientists

We will continue to engage the scientific community on what services they want to see in a next generation of preprints. However, based upon a survey that ASAPbio conducted in May 2016 (Results summary (pdf) and Anonymized responses (xls)) and other resources (e.g. Preprint user stories compiled by Jennifer Lin at Crossref and ASAPbio survey #1 (early 2016)), we believe that biologists want:

  • High visibility and discoverability of preprints
    • A single recognized website
    • Good search tool
    • Email notifications
  • Web-readable XML format
    • Clickable links that display figures
    • Clickable links to references
    • Export to more readable and compact PDFs
  • A system for cross-referencing versions of the same work
    • Linking the final journal publication to preprint versions (and vice versa), so that the history of the work is transparent and preserved


Input from servers, publishers, funders and data management experts

In July-August 2016, ASAPbio conducted informal interviews with preprint servers, funders, scientists and developers. We originally presented a variety of Central Preprint Service models of increasing complexity and centralization, ranging from a PubMed-like metadata search tool (Model 1) to a PubMed Central-like database that hosts well-formed XML content (JATS) and makes it available through a web display tool and an API (Model 4). One version also included a central submission tool (Model 5).

[Figure: the five proposed Central Service models]

While responses to the creation of a central tool were generally very positive, opinions on the best implementation varied. Below is a summary of some of the critical feedback we received.

  • Models 1 & 2 provide little benefit over the current state of affairs. These models generated less interest among funding agencies. There are already multiple ways to search preprints (search.bioPreprint, PrePubMed, Google Scholar), and existing preprint servers already preserve their own content.
  • Models without an open API and common licensing will stifle innovation. Without free access to content, third parties will have difficulty implementing new services (such as peer review, data mining, or aggregation).
  • Providing central submission and full-text display would be undesirable for some existing servers. These tools would directly compete with existing servers for traffic and recognition in the community. Also, display in multiple locations could disrupt download/view metrics and commenting systems. However, some funders felt that the CS should be able to display full text as well as drive traffic to server sites. Some funders have expressed an interest in allowing submission directly to the CS (Model 5), but most favor a practical solution that embraces the needs of the ecosystem.
  • Many of the original models are complicated, and development of any system with many moving parts will take a long time. Therefore, “perfection must not be the enemy of the good.” There will be a need to generate a CS that works “out of the box” and to improve on it over time. The CS needs to take into account realistic development of technologies.
  • Technological limitations make the use of JATS impractical. No good unsupervised .doc -> JATS converters currently exist, so the conversion process requires human intervention (see the conversion sketch after this list).
  • Document conversion is costly. Server-side conversion to a structured format (such as JATS) is expensive (on the order of ~$20+); therefore, it doesn’t make sense for preprint servers to provide this, especially when preprints generate no revenue. The CS should be close to cost-neutral to servers and other publishing entities.
  • Licensing has generated a diversity of opinion. Some parties favor author or publisher choice in licensing, arguing that scientists will have concerns about the re-use of their material. Our own surveys and interactions with scientists suggest that most do not understand licensing options and their associated benefits/disadvantages. Most funders favor a uniform licensing policy for the material in the CS in order to allow re-use in innovative ways and avoid complicated restrictions for data mining. The license most favored at the moment is CC-BY, although this choice may require further research and engagement with the scientific community.
  • The servers, platforms, and publishers consulted were generally interested in working with the CS. However, alignment and preferences varied between Models 2 and 4.
  • A major topic on which opinion varies is ‘display’. Some funders and publishers feel the CS should be capable of displaying its archived content. Others feel that the CS should not display content to readers (other than abstracts) and that display should reside with servers and publishers.
  • Use existing technologies whenever possible. Don’t reinvent the wheel, and carefully evaluate existing software/infrastructure.
  • Balance immediate concerns against opportunities for future development. Expressed by many, this sentiment emphasizes the need for a governance body that can continuously weigh these issues over time and make adjustments. In addition, interoperability between preprint systems in biology, physics, and other disciplines may need to be considered in the future.
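
On the conversion point above, here is a minimal sketch of what an unsupervised .docx -> JATS attempt looks like, calling pandoc through the pypandoc wrapper. The file names are placeholders, and this assumes a pandoc installation recent enough to include a JATS writer; as the feedback notes, the output still requires human checking.

```python
# A minimal sketch of unsupervised .docx -> JATS conversion via pandoc,
# called through the pypandoc wrapper. File names are placeholders, and
# this assumes a pandoc version that ships the JATS writer. In practice
# the result still needs human review of figures, references, and metadata.
import pypandoc

pypandoc.convert_file(
    "manuscript.docx",
    to="jats",
    outputfile="manuscript.xml",
    extra_args=["--standalone"],
)
print("Wrote manuscript.xml -- review it by hand before display.")
```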

We have drafted a provisional model of the Central Service (Summary) that takes into account the input summarized above. The model emphasizes the development of document conversion services and the provision of web-ready full-text outputs to the intake server. Providing full-text display via intake servers/publishers will deliver to scientists many of the benefits they want while giving intake servers incentives to participate in the program.

Possible benefits of the proposed service

To scientists

  • Preservation
  • Ease of use and readability (through web display at intake server)
  • Adherence to standards of author identity and ethical guidelines for research and disclosure
  • Potential for innovative reuse (with appropriate attribution)

To intake servers/platforms

  • No-cost document conversion into web-readable format
  • No-cost preservation
  • Improved exposure through a search portal that links exclusively to the intake server for display
  • “Accreditation” of servers or individual preprints through central screening process

To funders

  • Uniform standards of quality
  • Access to entire corpus via API
  • Ability to search/filter by funding source

Desired technical features for discussion

We welcome comments on the list of desired features below, which could become an agenda for discussion at the Technical Workshop (August 30, 2016).

Input (collected from the author)

  • Original manuscript file (.doc so that reference metadata can be extracted)
  • Supplementary files
  • ORCID
  • License (note: the Governance Body task force will also address this issue)
  • User authentication
  • Metadata (if extracted from .doc, get the user to check)
  • Grant support
  • Ethical statements (note: a separate task force will also address this issue)
    • Self-ID COI
    • All authors agree on submission
    • Methods needed to reproduce this work are contained within the work
    • The work has been conducted in agreement with human & animal research guidelines
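
Purely as an illustration, the inputs above could be modeled as a simple record like the following Python sketch. Every field name and default here is hypothetical; the actual schema, licensing options, and ethical statements would be set by the Governance Body and the relevant task forces.

```python
# A hypothetical model of the author-supplied inputs listed above.
# All field names and defaults are illustrative, not a proposed standard.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Submission:
    manuscript_file: str                      # original .doc/.docx for metadata extraction
    supplementary_files: List[str] = field(default_factory=list)
    orcid: str = ""                           # submitting author's ORCID iD
    license: str = ""                         # e.g. "CC-BY"; a Governance Body decision
    grants: List[str] = field(default_factory=list)
    title: str = ""                           # extracted from the .doc, confirmed by the user
    authors: List[str] = field(default_factory=list)
    # Ethical statements
    coi_disclosed: bool = False               # self-identified conflicts of interest
    all_authors_agree: bool = False
    methods_included: bool = False            # methods needed to reproduce the work
    ethics_guidelines_followed: bool = False  # human & animal research guidelines
```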

Document conversion

  • Extraction of text from source file
  • Extraction of metadata (such as title, authors, affiliations, keywords, and abstract)
  • Extraction of references
  • Insertion of figures, or recognition of existing in-line figures
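
To give a concrete sense of the first of these steps, here is a minimal sketch using the python-docx library to pull text and core metadata from a .docx source file. The file name is a placeholder, and a real pipeline (reference parsing, figure handling) would be considerably more involved.

```python
# A minimal sketch of text and metadata extraction from a .docx source
# file using the python-docx library. "manuscript.docx" is a placeholder;
# reference extraction and figure handling would need further tooling.
from docx import Document

doc = Document("manuscript.docx")

# Core metadata embedded in the file -- to be confirmed by the submitter.
props = doc.core_properties
print("Title:   ", props.title)
print("Author:  ", props.author)
print("Keywords:", props.keywords)

# Body text, paragraph by paragraph.
body = "\n".join(p.text for p in doc.paragraphs)
print(f"Extracted {len(body.split())} words of body text.")
```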

Screening and moderation

  • Automated plagiarism detection
  • Automated detection of non-scientific content (via arXiv-like algorithm)
  • Interface for human-supervised screening/curation/moderation
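
As a toy illustration of the automated piece, textual overlap between a submission and previously indexed documents can be scored with word shingles and Jaccard similarity. Production plagiarism detectors are far more sophisticated, and the threshold below is arbitrary; high-scoring matches would be routed to the human-supervised interface rather than rejected outright.

```python
# A toy sketch of similarity scoring for plagiarism screening: k-word
# shingles compared with Jaccard similarity. Real detectors are far more
# sophisticated; the 0.3 threshold is arbitrary. Matches above threshold
# would be routed to human-supervised screening, not rejected outright.
def shingles(text: str, k: int = 5) -> set:
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(max(len(words) - k + 1, 0))}

def jaccard(a: str, b: str) -> float:
    sa, sb = shingles(a), shingles(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0

def flag_for_review(submission: str, corpus: dict) -> list:
    """Return IDs of indexed documents similar enough to warrant a human look."""
    return [doc_id for doc_id, text in corpus.items()
            if jaccard(submission, text) > 0.3]
```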

Versions and identifiers

  • Unique, persistent ID for each version
  • All versions linked to one another (and to published journal article)
  • Linked to datasets
  • Tombstone pages for retracted content

Archiving

  • Stable archiving of source file (.doc) and also derivatives
  • Permission to display content if intake server reaches end of life

API

  • Bulk download of all content (.doc) and also derivatives
  • Filtering by metadata
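
As a sketch of what this could look like to a consumer, consider the hypothetical client below; the base URL, endpoint, query parameters, and response fields are all invented for illustration, since no such API exists yet.

```python
# A hypothetical client for the proposed API. The base URL, endpoint,
# query parameters, and response fields are invented for illustration.
import requests

BASE = "https://central-service.example.org/api/v1"  # hypothetical

resp = requests.get(f"{BASE}/preprints", params={
    "funder": "NIH",             # filter by funding-source metadata
    "posted_after": "2016-01-01",
})
resp.raise_for_status()
for record in resp.json().get("results", []):
    print(record.get("doi"), "-", record.get("title"))
```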

Discovery tool

  • Full-text indexing of all content in the central database
  • Advanced search (boolean operators, search fields such as author, keyword, funding support)
  • Alerts (RSS/email)
  • Display of abstracts, etc., but an exclusive link to the intake server for full-text display

Proposal development process

The output of the Technical Workshop will be announced in a Request for Information (RFI), in response to which any interested party can provide information on the development and approximate costs of a CS. The responses to the RFI will be shared by ASAPbio with major international funding agencies for potential consortium support. Pending their collective interest in financially supporting a plan for a CS and refining its method of operation and governance, a formal RFA, to which interested parties could apply for funding, may follow.


Appendix 1: Rationale for a Central Service

Central Service model documents

Do we need new infrastructure and governance for preprints?

Because the preprint server arXiv was born very early in the history of the internet and served its community well, it has become the de facto repository for preprints in the physical, mathematical, and computer sciences without any major competitors. During the past two decades, various scientific disciplines have decided to join arXiv rather than start their own servers. Thus, arXiv has become a “central server” for the physical science community and has achieved high visibility.

The success of preprints in physics was aided by the coalescence of a large body of content in one highly visible site (arXiv) that had a high standard for quality, attracted outstanding work, and had a scientist-led governance model. Biologists could attempt to replicate this single server model. However, biologists already deposit work in several existing preprint servers, notably bioRxiv (established 2013), PeerJ Preprints (established 2013), and the q-bio section of arXiv (established 2003). In addition, PLOS has had a long-standing interest in posting pre-peer review manuscripts as an option for submitted papers, which could add considerable content in the near future. F1000 has developed a publishing platform that provides access to manuscripts before formal peer review (effectively a preprint; see definition below) and the Wellcome Trust has adopted the F1000 platform to launch Wellcome Open Research. Other journals or funding agencies may also decide to develop similar dissemination mechanisms for pre-peer review content. Thus, the concept of disseminating “pre-peer review” manuscripts is broadening beyond a traditional “arXiv”-like server.

The future of preprints in biology is poised at a moment that is both exciting and fragile. If organized and thought through properly, preprints could accelerate scientific communication, serve the public good, clarify priority of discovery, and help career transitions of young scientists (see Preprints for the Life Sciences, Science). The development of preprints could be governed by the scientific community, in partnership with publishers and other service providers, leading to exciting and innovative possibilities for science communication.

However, the future of preprints could be less bright. Preprints could become fragmented among the efforts of multiple competing parties, lack overall visibility and critical mass, fail to harness modern possibilities for dissemination and use, and lack clear governance. Preprints may fail to achieve the level of respectability needed to convince scientists, funders, and universities that the disclosure of work by scientists plays a valuable role in the ecosystem of science communication, alongside post-peer review journal publications. If preprints fail to grow substantially in submissions and readership in the next five years (e.g. to the level of arXiv), scientists and funders will view them as a failed experiment. Because the future of preprints is poised in a critical time window, it is important to think through issues of execution that will maximize the chance of preprint adoption by the community.

If the scientific community does not act, continued fragmentation of preprint sites could undermine the potential of this communication system by generating:

  • Ambiguity about what qualifies as an acceptable preprint and a recognized content provider. Currently, funding agencies and universities are considering whether preprints or other “pre-peer review” publications should be included in applications. However, what is defined as a “recognized preprint server” is ambiguous at the moment. Every server or publisher may define their own screening protocol, causing uncertainty about whether a preprint has been screened for plagiarism or adheres to ethical standards. In this current system, each journal, funding agency, and hiring or promoting committee must define a list of approved preprint sources based on their own assessments of preprint servers. This practice, which is already occurring at certain journals, will create a situation that is confusing and discouraging for researchers.
  • Lack of visibility and difficulty of discovery. If preprints are spread across multiple sites, they will become more difficult to find. Maximizing discoverability, visibility, and respectability are key to adoption and widespread use by scientists, as is suggested by the success of arXiv.
  • Variable and potentially limited access to data. In the current system, each server sets its own licensing policies and is responsible for archiving its own content. This puts content in danger of being held under restrictive licenses or lost altogether.

  • Limited potential for technology development. If each server must create or outsource IT infrastructure, overall costs of the preprint system will be high, many servers will not have funds for more advanced IT development, and the potential for using and disseminating information may be limited.

Value of a Central Service

To overcome the deficiencies described above, we believe it would be in the best interest of the scientific community to create a Central Service (name subject to change at a later date) that will aggregate “pre-peer review” manuscripts from several sources, maintain standards of quality for its intake, preserve content for posterity, and disseminate information in a manner that advances scientific progress. The Central Preprint Service would, in essence, function as a database that serves the public good, analogous to the Protein Data Bank or PubMed Central. We envision that a Central Preprint Service will be supported by a consortium of funding agencies for a minimum five-year term of operation. It will be overseen by a governance body that will be 1) international, 2) led by highly respected members of the scientific community, and 3) transparent in all of its proceedings, actions, and recommendations.

Partnerships with journals and servers

The Central Service will host manuscripts that contain 1) data, 2) the methods needed by other scientists to replicate that data, and 3) an interpretation of that data. The governing body will determine how manuscripts are screened for entry into the Service (for example, to exclude content that is plagiarized, non-scientific, or in violation of ethical guidelines). However, the Service will not engage in validation or judgment of the work as is performed by traditional peer review. Thus, the Central Service will work as a partner, and not a competitor, with existing journals.

The Service also seeks to act as a partner with preprint servers and publishers that ingest manuscripts from authors. Partners who deposit their content into the Central Preprint Service will benefit from additional infrastructure support (e.g. plagiarism detection, conversion tools, etc.) and, most importantly, will have greater appeal to scientists, who will want their preprints broadly viewed and recognized by grant and promotion committees.

Document 4: What does IT infrastructure for a next generation preprint service look like?

Authored by Jo McEntyre and Phil Bourne

Goal: To satisfy the fundamental requirements of establishing scientific priority rapidly and cheaply by providing the ability to publish and access open preprints, balanced with the desire to support open science innovation around publishing workflows.

Approach: An internationally supported, open archive (or platform) for preprints as infrastructure is ideal because (a) should the use of preprints become widespread, there is potential to reap long-term open science benefits, as is the case for public data resources, and (b) some core functions only need to be done once, not over and over (think: CrossRef, ORCID, INSDC, PDB, PMC/EuropePMC). Ideally this would involve working with existing preprint servers to provide a core platform and archival support.

Some assumptions

  • No point-of-service cost to post a preprint for the author.
  • Licenses that support reuse (i.e. CC-BY) of posted articles.
  • Preprints will be citable (have DOIs).
  • Should be embedded with related infrastructures such as Europe PMC/PMC, ORCID, CrossRef and public data resources.
  • Reuse and integration as core values – by various stakeholder groups including publishers, algorithm developers, text miners, and other service providers
  • Standard implementations of key requirements across multiple stakeholders e.g. version control, events notification (such as publication in a journal or preprint citation), article format standards (JATS)
  • All preprints basically discoverable and minable through a single search portal.
  • Transparent reporting/management builds trust and authority around priority
  • International and representative governance
  • Metrics to provide data on meaningful use of the content
  • Tools to manage submissions, e.g. triage, communication, etc., in keeping with existing manuscript submission systems
  • Public commentary on submissions
  • Linkage with final published version of the article (when it exists)

Data Ingest

The preprint server should be considered an active archive. This means that all content can be accessed at any time and certain core services are provided to enable access by both people and machines.

  1. Basic submission support and support for standard automated screening.
  2. Possible limited branding on submission portals.
  3. Competition on screening methods or other author services by existing preprint servers (or others) is possible.
  4. Advantages: simplified content flow, standards implementation, content in one place for future use.

Basic Services

  • A stand-alone archive. Initial submission needs to be very quick for the author: i.e. basic metadata plus files establishes priority.
  • Files rapidly published as PDFs with DOI and posted after screening/author services.
  • Ingest mechanisms could be diversified through existing preprint servers – but some basic [automated] criteria would always need to be met (automated to retain speed). For example, it could require an ORCID iD for the submitting author as a simple trust mechanism (see the checksum sketch after this list), with further validation against grant IDs. Algorithms working on content (plagiarism, detection of poor animal welfare, scope, obscenity) could operate. There is scope for automated screening to be phased in and improved over time.
  • This model provides the opportunity for innovation around screening algorithms by the platform as well as third parties. It also provides business opportunities around author services.
  • Importantly, it also provides opportunities for innovation around coordinated submission of other materials relevant to the article, for example data or software. But any integration of this nature would need to be lightweight for submitting authors, as the speed of publication is a non-negotiable feature of the preprint service.
  • Core version management would be required, both regarding new versions of the same article and linking with any future published versions of the article in journals.
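
On the ORCID point above, one cheap automated check is validating the iD's ISO 7064 mod 11-2 checksum before accepting a submission. This only confirms the identifier is well-formed; real validation would also query the ORCID registry. A minimal sketch:

```python
# A sketch of a cheap automated trust check: validate an ORCID iD's
# checksum (ISO 7064 mod 11-2). This only confirms the identifier is
# well-formed; real validation would also query the ORCID registry.
def valid_orcid(orcid: str) -> bool:
    digits = orcid.replace("-", "")
    if len(digits) != 16:
        return False
    total = 0
    for ch in digits[:-1]:
        if not ch.isdigit():
            return False
        total = (total + int(ch)) * 2
    check = (12 - total % 11) % 11
    expected = "X" if check == 10 else str(check)
    return digits[-1].upper() == expected

print(valid_orcid("0000-0002-1825-0097"))  # True -- a widely published example iD
```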

Authenticated Content

  • After basic services get content in and published, the preprint service could generate JATS XML for authenticated content. Authenticated content could be defined by a number of criteria: e.g. preprints from PIs funded by organisations that support the infrastructure, or popular preprints.
  • There is a cost to generating JATS XML. Limiting this added value to authenticated content could help control costs and give some confidence around that content for promoting discoverability via existing infrastructures.
  • Conversion to JATS XML will take some time, and would require input from the submitter to sign off on the resulting converted article. However, it has the bonus of being integrity-checked (e.g. all the figures are present), available for deep indexing and integration, and more widely discoverable via Europe PMC/PMC. Wider discoverability could be an incentive for authors to take the modest amount of extra time required to provide this data quality.
  • Note this could be an ingest point into the archive for XML content from other services/platforms.
  • In the future this more rigorous treatment may be extended to basic services, as methods that directly convert Word to Scholarly HTML and JATS XML mature and improve and their costs fall. However, publishing speed is likely to remain an issue for some time, and a degree of submitter involvement will always be required.
  • The availability of content in JATS XML provides many opportunities for innovation around the provision of more structure in articles for integration purposes (e.g. tagging reagents, data citations, and other deep linking mechanisms).

Post publication screening, filtering and sorting on the preprint platform

  • All content would be available for post-publication human screening such as user reporting of problematic content, commenting and so on.
  • More sophisticated algorithms that rank, sort, and filter search results based on trust, content, or other criteria could be developed by the platform and, most importantly, by third parties.

Data Out

  • All content available for bulk download (PDFs and XML), and via APIs as well as through website search & browse
  • Authenticated content could be made available via established archives (e.g. Europe PMC (PMC)) as a clear subset.
  • Core services managed centrally, for example content sharing with journals (this could be in collaboration with e.g. CrossRef, since they already have some infrastructure around this)
  • There are possibilities for sharing article XML, comments, and reviews across publication workflows with journals and other platforms, thus saving processing costs.
  • There are countless opportunities to support further innovation on the content by both commercial and academic parties with an open platform approach.

Document 6: Additional Questions for Possible Consideration

Drafted by ASAPbio

How can funders help to validate preprints as a mechanism for communication?

In a Commentary in Science published on May 20, 2016, co-authors representing several funding agencies recommended:

1) Publishing an explicit statement encouraging researchers to make early versions of their manuscripts available through acceptable preprint repositories.

2) Permitting the citation of preprints in acceptable repositories in grant proposals as evidence of productivity, research progress and/or preliminary work.

3) Providing guidance to reviewers on how to assess preprints in grant proposals.

How do funders envision taking these recommendations forward within their own agencies? Can ASAPbio assist in those efforts by working with scientific societies, institutions, journals, and advocacy groups?

Special considerations for human research?

Are there special limitations or concerns regarding preprints and human research that should be taken into consideration for a funder-supported core preprint service?

Gathering data on preprint usage?

Currently, our understanding of the effectiveness and potential pitfalls of how we communicate and evaluate scientific findings is mostly opinion-driven rather than data-driven. Might funders wish to gather data concerning preprint servers (or compare work going to preprint servers and journals)? Do preprint servers facilitate the transmission of irreproducible work? Or do preprints reduce the appearance of problematic journal publications? Do scientists submit lower-quality work to preprint servers, or is the content similar to journal submissions? Is transmission of “pseudo-science” a problem in reality? Do grant committees find preprints useful or burdensome? Are there additional questions that could be informed by data?

Managing Quality Control?

What kind of quality control would funders like to see (i.e., preventing pseudoscience)? How possible is it to ensure uniform quality control on multiple servers? Must all screening be done manually by members of the community, or could algorithms be useful as a mechanism for prioritizing human screening (based upon ORCID numbers, grant support, prior publications, etc.)? Do funders want additional quality control provisions (e.g. e-signing by the submitting author of agreements on authorship and ethical standards of data gathering)? Would acknowledgment and linkage of submitted work to grant support help to solidify the credibility of submitted work? Should the service remove or flag a preprint that has been shown through subsequent review to contain incorrect or falsified data? How important is QC for preprints? Which features of QC should be implemented now, and which could be approached in the future?

Document 5: Existing databases funded by consortiums

Drafted by ASAPbio

arXiv

arXiv is the most directly comparable model in terms of the database content (preprints). Important elements of arXiv’s success are:

  1. single point of ingestion and one-stop shopping for viewing (everyone in the physics community wakes up and searches arXiv),
  2. high visibility and quality (a reason why scientists submit to arXiv to establish priority),
  3. operated for the community good on a not-for-profit basis by a trusted academic institution (Cornell) which has been operating for a century,
  4. funding by a consortium (a major private foundation (the Simons Foundation) and institutions), and
  5. governance by scientists (not just a passive advisory board).

From the arXiv web site:

            In January 2010, Cornell University Library (CUL) undertook a three-year planning effort to establish a long-term sustainable support model for arXiv, one that reduced arXiv’s financial burden and dependence on a single institution and transitioned it to a collaboratively governed, community-supported resource. CUL identified institutions worldwide where the use of arXiv was most active and worked collaboratively with them to develop a membership and governance model based on voluntary institutional contributions. A formal long-term plan took effect in January 2013. In this new model, arXiv is supported by libraries and research laboratories worldwide that represent arXiv’s heaviest users, as well as by CUL and generous matching funds from the Simons Foundation.

Protein Data Bank /Worldwide Protein Data Bank

The Protein Data Bank is a worldwide cooperative of independently supported databases. Thus the wwPDB is a multiple-server model based upon geography and geographically located funding agencies. A common archive of structures is updated and mirrored on all sites, although each site maintains its independence in terms of ingestion and its own websites (researchers can choose the site from which they download). The incoming data are more complex than those handled by preprint servers, since different types of data are deposited (e.g. X-ray, NMR, etc.). Quality control is a more important issue for structural data than for preprints. Posting on the PDB is validation (not true for preprints) and constitutes a major part of the PDB mission. Overall these databases are viewed as being very successful and are reasonably well funded (RCSB alone receives $6.5 million in funding from US government agencies). Arguably, elements of the worldwide collaboration might be subject to inefficiencies and difficulties in governance, but overall the system is a reasonable model for organizing and distributing information as a public good.

The following was prepared by Stephen K. Burley (Director, RCSB Protein Data Bank).

Protein Data Bank Archive and the Worldwide PDB Protein Data Bank Organization:

The Protein Data Bank (PDB) is the single global archive for experimentally determined, atomic-level structures of biological macromolecules. The PDB archive is managed by the Worldwide Protein Data Bank organization (wwPDB; http://wwpdb.org) [Berman et al. 2003], which currently includes three founding regional data centers, located in the US (RCSB Protein Data Bank or RCSB PDB; http://rcsb.org), Japan (Protein Data Bank Japan or PDBj; http://pdbj.org), and Europe (Protein Data Bank in Europe or PDBe; http://pdbe.org), plus a global NMR specialist data repository, BioMagResBank, composed of deposition sites in the US (BMRB; http://www.bmrb.wisc.edu) and Japan (PDBj-BMRB; http://bmrbdep.pdbj.org). Together, these wwPDB partners collect, annotate, validate, and disseminate standardized PDB data to the public without any limitations on its use. The wwPDB collaboration is governed by an agreement signed by all four partners (last revised in 2013; http://www.wwpdb.org/about/agreement). The activities of the wwPDB partners are overseen by the wwPDB Advisory Committee, currently chaired by Dr. Andrew Byrd (NCI).

PDB Archive Data Contents:

The PDB archive contains information about structural models that have been derived from three experimental methods, including X-ray/neutron/electron crystallography, NMR spectroscopy, and 3D electron microscopy (3DEM). In addition to the 3D coordinates, the details of the chemistry of the polymers and small molecules are archived, as are metadata describing the experimental conditions, data-processing statistics and structural features such as the secondary and quaternary structure. The structure-factor amplitudes (or intensities) used to determine X-ray structures, and chemical shifts and restraints used in determining NMR structures are also archived. The electron density maps used to derive 3DEM models are archived in EMDB [Lawson et al. 2016] and the experimental data underpinning them can be archived in EMPIAR [Iudin et al. 2016].

wwPDB Partner Responsibilities:

The RCSB PDB provides Data In services for all depositions coming from the Americas (North and South) and Oceania. PDBe provides Data In services for all depositions coming from Europe and Africa. PDBj provides Data In services for all depositions coming from Asia. BMRB archives additional NMR data that are not captured by the other three wwPDB partners during archival data depositions. The RCSB PDB serves as the global Archive Keeper, coordinating weekly updates of the PDB archive with PDBe, PDBj, and BMRB. wwPDB partners distribute identical copies of PDB data from redundant, regional FTP sites at no charge and with no limitations on utilization. All four wwPDB partners also distribute PDB data at no charge and with no limitations on utilization from their own value-added websites, in healthy competition.

wwPDB Partner Funding:

RCSB PDB is supported by NSF [DBI-1338415], NIH, DOE; PDBe by EMBL-EBI, Wellcome Trust [104948], BBSRC [BB/J007471/1, BB/K016970/1, BB/K020013/1, BB/M013146/1, BB/M011674/1, BB/M020347/1, BB/M020428/1], EU [284209, 675858], and MRC [MR/L007835/1]; PDBj by JST-NBDC, and BMRB by NIGMS [1R01 GM109046].

Governance of the RCSB PDB:

Excerpted from “RCSB Protein Data Bank Advisory Committee Terms of Reference”

The RCSB PDB is managed by two members of the RCSB: Rutgers, The State University of New Jersey and University of California, San Diego, and is funded by the National Science Foundation, the National Institutes of Health, and the Department of Energy through a cooperative agreement. The current Director is Dr. Stephen Burley and the Associate Director is Dr. Helen Berman, who was previously the Director. Both are located at Rutgers. The site head at UCSD is Dr. Peter Rose. In addition, there is a leadership team in charge of key aspects of the RCSB mission including operations, application development, biocuration, data architecture, education and outreach.

The RCSB PDB Protein Data Bank Advisory Committee (RCSB PDBAC) is responsible for providing independent advice to the RCSB PDB Director and staff on current and pending issues of policy, operations, technical implementation, and project performance. The Advisory Committee consists of members chosen from the scientific community, who are recognized experts in their fields, including but not limited to structural biology, cell and molecular biology, computational biology, information technology, and education. These scientists will be drawn from academia and industry. The AC is appointed by the Director in consultation with other members of the RCSB PDB, the AC Chair, and others. The 3-year term of membership is renewable.

The RCSB PDBAC meets once a year. The Director is responsible for developing the meeting agenda in consultation with the Chair and, where deemed appropriate, funding agency staff. Meetings typically last a full working day. At the conclusion of each meeting, a written report is prepared by the members of the RCSB PDBAC describing its discussions, including any specific conclusions or recommendations with respect to changes in management and policies of the RCSB PDB. As specified by the cooperative agreement, this report is provided to the Director within 30 days of the AC meeting. The Director formulates a response to the report, addressing recommendations made, issues raised for further consideration, etc., and provides the Chair with the response. The report and the attendant responses are incorporated in the Annual Progress Report submitted to the National Science Foundation.

Europe PMC

The cooperative funding of Europe PMC is an interesting model for a consortium. In this case, each funder supports Europe PMC in proportion to their annual research spend. One funder (the Wellcome Trust) provides the lead role in organizing the consortium. The system of governance involves both the funders and the scientific community.

Prepared by Robert Kiley, the Wellcome Trust

Europe PMC is run, managed, and developed by EMBL-EBI (the European Bioinformatics Institute) on behalf of the 26 Europe PMC Funders, which include the Wellcome Trust, the Medical Research Council, Cancer Research UK, the European Research Council, and the World Health Organization.

A grant of £5.7M ($8.3M, €7.2M) has been awarded to Dr Jo McEntyre by the Wellcome Trust, on behalf of the Europe PMC Funders.  This grant runs from 2016 to 2021.

Governance

Europe PMC has three governing bodies: the Funders’ Group, Funder Committee and Scientific Advisory Board.

  • The Funders’ Group is made up of research funders who both mandate the deposition of research papers arising from their funding in this repository and provide funding to facilitate this. It is responsible for setting the overall direction of travel for Europe PMC, and meets annually.
  • The Funder Committee is a subset of the Funders’ Group which meets twice a year to review completed developments, comment on future development and approve the release of funds on behalf of the Funders’ Group.
  • The Scientific Advisory Board meets annually to review progress on the development of the service over the past year, and the plans for development for the forthcoming year. The Board ensures development is sensitive to the needs of the scientific community. It also advises the Europe PMC Funder Committee on the overall effective use of funds from the Europe PMC grant.

Funding

The Wellcome Trust – on behalf of the Europe PMC Funders Group – provides grant funding to EBI to cover the cost of supporting, maintaining and developing the Europe PMC repository.

In turn all the Europe PMC funders (with the exception of the European Research Council, ERC) reimburse the Wellcome Trust according to the payment schedule detailed in the Collaboration Agreement.  ERC’s contribution to funding Europe PMC is made via a grant to the Wellcome Trust.

Each funder supports Europe PMC in proportion to their annual research spend.  This was deemed to be the most equitable way of spreading the costs across all funders.

Additional funders can join the Funders’ Group during the course of the grant by signing an addendum to the Collaboration Agreement.  In turn, EBI can submit development proposals to apply for the additional funds provided by new funders. Such applications are considered by the Funder Committee.

Document 3: Implementation of the Preprint Service

Drafted by ASAPbio

At present, biologists submit very few preprints. However, possible growth to the level of arXiv (100,000 submissions/year) or beyond needs to be considered. This will challenge the IT capacity (including robust data back-up) and quality control screening systems of existing servers, as well as heighten the need to integrate this information. While still a nascent effort in biology, now is an opportune moment to think through a preprint system that will be accepted by the biology community, have good functionality, and have lasting value. A particularly important topic for discussion will be whether a consortium of funders will want to support:

1) a single server for the intake of preprints, or 2) a system of linked but independent servers with common standards for quality control, data exchange, etc.? Factors to consider for these models:

Maintaining uniform data sharing standards, licensing, and quality control of input. The wwPDB provides an example of a central body that sets standards for multiple PDB servers so that they all contribute in a uniform manner to a single global archive. This demonstrates the feasibility of the multiple-server model. However, is it the most efficient model? If one were to build a system from scratch today, would it be easier to achieve these same goals with one server?

Governance.  A single server supported by the consortium would presumably have one governing body.  With multiple servers, how would governance work for setting standards for integration?  Would funders (or funders/scientists) be involved in the appointments to the advisory boards of individual servers?

International Representation and Involvement. Preprints are a global resource of knowledge, and international involvement is critical for the realization of this vision. A single preprint server located in the United States with funding only from US agencies may not be perceived as a global resource or attract scientists from around the world. A single server could instead be supported by a consortium of international funders. Alternatively, different preprint servers for different geographic regions could emerge and be supported by regional funding agencies, along the lines of the PDB and PMC.

Overall Cost and Funding Mechanism. Funding of a single server by a consortium of cooperative parties is relatively straightforward, but how would that single server be chosen? Would it be through a competitive call for a contract? On the other hand, if there are multiple preprint servers, at what level would the funders engage? Would funders build the IT infrastructure for linking the data? Or would they fund the operation of multiple intake servers? If so, would there be redundant costs for operating several servers versus funding one, and would these be offset by the extra value created?

Long-Term Archival and Preservation.  Preprints should be a permanent record of scientific work and should be backed up. How would the one versus multiple server models affect the implementation of an effective strategy for maintaining a permanent record?

Spurring Innovation. A primary argument for the multiple preprint server model is the potential to promote innovation. arXiv, for example, only offers PDF downloads and no commentary features. What if a physicist wants a nice HTML web interface for their manuscript and internet commentary on their work? Perhaps multiple, competitive intake servers with different interfaces and features would be beneficial for physicists. However, with a single server, innovation could still occur at a level above the initial submission. With free access to the single server’s API, for-profit or non-profit entities could provide added value, which could include better customized search engines, recommendations of work, post-publication peer review, and discussion forums. By separating 1) the initial submission to a highly visible and stable platform (what all scientists want) from 2) additional services (what some scientists want and will be willing to pay for), a marketplace for innovation can still emerge, and new ideas can be tested based upon need and performance. This type of innovation, however, requires that the server be developed with an open API and the right licensing terms.

Do We Know What to Build? Supporting one server entails risk, since it could fail for a variety of reasons. Multiple servers might mitigate risk and perhaps even promote a Darwinian competition with an eventual winner (or with multiple winners each providing value). This marketplace rationale is reasonable. However, there are also counter-arguments and questions. Will economics (financial support through scientist grants or directly from funders) be sufficient to allow the growth of many flourishing, properly maintained, and innovative preprint servers? Can we start by building one consortium-funded server now that will succeed in its goals and not be likely to fail?

An Alternative Model:  Preprints from Journal Submissions

Every journal could develop its own “preprint service” by posting submitted work while it is in review. One advantage is that the entire process (submission to publication) could be made transparent (an interesting model being pioneered by F1000 Research). Funders would not need to pay for the preprint directly, since that cost will be absorbed by the journal (but passed along ultimately to the scientist, who pays final publication fees). Innovation is promoted, since each journal can develop its own preprint style.

Disadvantages are that funders will have to develop a mechanism for creating the preprint database (into PubMed or a new mechanism) from ingestion through many journals. Furthermore, while some journals accept the majority of submitted manuscripts after peer review, most do not. This will create a non-viable starting position for many scientists, since a preprint will be linked to a journal that might ultimately reject the work. Furthermore, an important premise of preprints is to circumvent the current problem of judging quality based upon journal name.

Document 2: A preprint service supported by an international consortium of funders

Drafted by ASAPbio

As a public good, preprints should obey a single data standard that will enable them to reside in a single database. This database should be permanent, well-maintained, free for all to use, and easily accessible and searchable. This preprint service should have an outstanding governance structure that will represent the needs of the scientific community and oversee the adaptations that will inevitably be needed in the future. Further issues regarding the implementation of this preprint service are considered in Document 3.

As an eventual outcome of this meeting, we recommend the funding of a preprint service in biology by an international consortium of public and private agencies. The contract for this preprint service should support the initial development of infrastructure, yearly operating costs (with the possibility of metric-driven growth), a system of trustworthy governance, and a commitment that the server will not charge submission fees for at least its first five years.

The reasons for recommending a single preprint service supported by public/private funding are:

Best chance of adoption by the biology community. Communication through preprints is foreign to biologists. Developing a highly trusted preprint service in the life sciences that is directly supported by major funding agencies and governed by outstanding scientists will promote buy-in to this form of communication.

High visibility. Scientists want their work to be widely viewed, since this helps them to establish reputation and priority. In the case of arXiv, having a single platform where scientists go to look for new work in the field immensely aids visibility and helps physicists establish priority of discovery. Maximum visibility will also ultimately require a search engine that integrates preprints with peer-reviewed publications, which should be considered in the initial development of the preprint service as well.

Ensuring trustworthy governance. Direct funding of a preprint service will enable funders to play a more direct role in the governance and future directions of preprints, rather than being relegated to a more peripheral role. An international component of governance needs to be considered from the outset for this global resource.

Maintaining quality and standards.  A consortium-funded preprint service could help to define quality control for submission and standards for sharing and maintaining data.

Overall cost and ease of funding. The cost of direct support of a preprint service by a consortium of funders will be small, certainly in comparison to the investments made by funders, directly or indirectly, in journal publication and the open access of journal publications. As an alternative to funding a preprint service, funders could provide financial support directly to scientists to pay for preprint submissions (e.g. as a specific new line item on a grant). This is similar to the present-day journal system, but in this model the scientific community and funders have little say in cost and governance. Furthermore, for such a scientist-payer model to take effect globally, all funding agencies would need to develop a new policy allowing scientists to include preprint costs in their budgets. Even if achieved, this plan is less democratic, as it favors scientists with larger grants over scientists who have less funding but would like to benefit from preprint use.

Time. There is a sense of momentum and support for the growth of preprints in biology, but this momentum can be lost as quickly as it is being gained. Funders could play a major role in sustaining this momentum by becoming involved and supporting a preprint service. If funders take a lead role, together with influential junior and senior scientists, then preprints have a chance of becoming common practice in biology. If funders take a back seat over the next couple of years, if biologists adopt a wait-and-see attitude while waiting for funders to become interested, and if preprint submissions tick upward only at a slow rate, biologists will see preprints as a nice idea in theory but a failed experiment in practice. It might prove difficult to recover from such a situation.

Document 1: Defining Basic Objectives of a Core Preprint Service

Drafted by ASAPbio

Preprints are a global archive of knowledge that serves the public good.

Here, we hope to discuss and define the basic functions that scientists and funding agencies seek from a Core Preprint Service. We use the term “Core” to define the basic features that meet these goals and might constitute a “minimum viable product”. Additional services (either for- or not-for-profit) could grow on top of this core. The needs for good governance and the potential for adaptability are critical, since it may not be possible to build or even envision all elements of an initial core preprint service.

What Scientists Desire:

  • Good visibility of their work. If preprints are difficult to find or behind a paywall, then the purposes of using preprints (to share findings rapidly, to establish priority, and to obtain feedback) are diminished.
  • Credit for their efforts. Preprints need to be acknowledged by funders and universities as a component of the evidence of productivity, particularly recent productivity.
  • Easy access to preprints to aid their own research projects. The ability to search the entire preprint archive easily and receive RSS feeds will be appealing to biologists, as has proven true for physicists with arXiv.
  • Easy, free (or low-cost) author uploads. Like many internet services, being free and easy to use (i.e. to upload to) will lower barriers for submission and facilitate widespread use in the life sciences. The option of charging small fees may be possible later, if there is larger-scale adoption and possibly added features.
  • The ability to update and submit new versions of the manuscript. Revisions are critical for a preprint service.
  • Freedom to submit to a journal of the author’s choice. Preprints should not be aligned with a single journal or publisher in a way that creates the perception that the preprint server is a preferential ingestion mechanism for specific journals/ publishers.
  • Sustainable model and long-term permanence of submitted work.
  • Governance by well-established and trusted parties. Trust is an important component of a preprint service. Scientists and funders should have a voice in its governance.

Funders desire:

  • Open and rapid communication. The ability to make their supported research openly and rapidly available to the worldwide scientific community and thus advance scientific progress.
  • Information to help make funding decisions. The desire to make well-informed decisions on grant applications. Preprints offer access to the most recent and publicly accessible work from an applicant and facilitate merit-based evaluation of recent productivity.
  • Permanence of the scientific record. Stability and permanence of the scientific record (including maintaining different versions of the work).
  • Community support. Excellent leadership and governance from the scientific community, credibility, and wide use.
  • Facilitate sharing and innovation. The potential for future innovations and changes in ways that maximize the funder’s scientific mission, promote data sharing, and improve scientific evaluation.
  • Upfront quality control to screen for pseudoscientific work and plagiarism. This QC process could include other assurances that fraudulent or low-quality science will not be maintained as a permanent and misleading record. Preprints will also enable more scientists to evaluate and potentially help to correct work before it reaches journal publication.