Daniel Mietchen, School of Data Science, University of Virginia, @EvoMRI
Peer review of preprints typically has two main components: (a) general comments on parameters like the quality, scope, topicality or structure of the manuscript and (b) comments on details, e.g. inaccuracies, missing references, confusing plot parameters, unsubstantiated claims or typos. The general part is usually a synthesis that is best expressed in a new piece of text. Claims in this text are frequently substantiated with examples from the details part, though rarely in a click-through manner.
Preprints may include – or be accompanied by – a variety of materials, possibly in various file formats and locations. Curation of preprints has to take into account the submitted files and any associated materials, as well as appropriate metadata.
Feedback on preprints is more likely if (c) it is honoured in some way, (d) the preprint or its components appear in the workflows of potential reviewers, (e) it is clearer how to review certain kinds of materials, (f) tooling is available to support the review process and (g) a few other conditions are met. This proposal is mostly about (e), partly about (d) and (f), and it would perhaps add a nudge to (c) and (g).
Reviewing or curating the materials associated with preprints poses challenges that can and do vary as a function of parameters like their kind, format or location. For instance, if the preprint (i) is publicly available online in HTML format, (ii) is not versioned, (iii) does not contain non-textual elements like figures, tables, equations, code snippets or external identifiers and (iv) does not depend on things like software or data located elsewhere, then it can be reviewed by simply posting web annotations by means of a tool like Hypothesis (example). If any of these conditions is not met, then the review and/or curation workflows may require modification, particularly for the detailed comments. Some of these modifications can be simple: just mentioning the preprint’s version ID in the review report can be enough to address the versioning problem. Others may be more complex: reviewing software typically requires installing and running it, which may be complicated by dependencies or incompatibilities with respect to external software or data.
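Conditions (i) to (iv) can be read as a checklist that decides whether plain web annotation is enough. The following is a minimal sketch of that reading, not a proposed implementation. The `PreprintProfile` class, its field names and the `review_adjustments` function are hypothetical. The wording of the adjustments is illustrative: only the version ID and software examples come from the text above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PreprintProfile:
    """Hypothetical summary of one preprint with respect to conditions (i)-(iv)."""
    public_html: bool        # (i) publicly available online in HTML format
    versioned: bool          # (ii) holds when this is False
    has_non_textual: bool    # (iii) holds when this is False
    has_external_deps: bool  # (iv) holds when this is False


def review_adjustments(p: PreprintProfile) -> List[str]:
    """Return illustrative workflow adjustments for each unmet condition.

    An empty list means that plain web annotation may be sufficient.
    """
    notes = []
    if not p.public_html:
        notes.append("find a way to annotate a non-HTML or non-public document")
    if p.versioned:
        notes.append("state the preprint's version ID in the review report")
    if p.has_non_textual:
        notes.append("choose a review approach for each non-textual element")
    if p.has_external_deps:
        notes.append("install and run external software or fetch external data")
    return notes


# A preprint that meets all four conditions needs no adjustment.
simple = PreprintProfile(True, False, False, False)
print(review_adjustments(simple))  # prints []
```

Keeping each condition as a separate field makes it easy to add further criteria once the best practice recommendations take shape.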
Some best practice recommendations for preprint review and curation by content type, along with examples and validation tools that are contextualized from the perspective of these recommendations.
Initially, a one-off analysis of a random subset of preprints, ideally from a range of platforms, in terms of their non-textual components as well as the file formats and locations of associated materials. Ideally, such an analysis would be performed automatically on a daily basis thereafter, for all preprints meeting certain criteria.
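As a rough illustration of such an analysis, the sketch below draws a reproducible random sample from a list of preprint records and counts how many preprints contain each kind of component or file format. The record layout (`components`, `file_formats`, `platform`) and the toy data are assumptions made for this example. They do not reflect the metadata schema of any actual preprint platform.

```python
import random
from collections import Counter

# Toy records; the field names are assumptions, not a real platform schema.
CORPUS = [
    {"id": "p1", "platform": "A", "components": ["figure", "table"], "file_formats": ["pdf"]},
    {"id": "p2", "platform": "B", "components": ["equation"], "file_formats": ["pdf", "html"]},
    {"id": "p3", "platform": "A", "components": [], "file_formats": ["html"]},
]


def draw_sample(records, k, seed=0):
    """Draw up to k records at random; a fixed seed keeps the sample reproducible."""
    rng = random.Random(seed)
    return rng.sample(records, min(k, len(records)))


def tally(records, field):
    """Count the number of preprints in which each value of `field` occurs."""
    counts = Counter()
    for record in records:
        # set() ensures a preprint is counted once per value, even if it repeats.
        counts.update(set(record.get(field, [])))
    return counts


sample = draw_sample(CORPUS, k=2)
print(tally(sample, "components"))
print(tally(sample, "file_formats"))
```

The same two functions could be rerun on a schedule against newly posted preprints, which is one way the daily analysis mentioned above might be approached.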
The project has several milestones, to be reached in an iterative fashion:
- identification of the initial test corpus
- identification of the initial set of criteria for analyzing the test corpus
- identification of best practice examples (perhaps with further improvements) with respect to these criteria
- distillation of the analysis into best practice recommendations
- distillation of the recommendations into validation tools, badge system and dashboard
- testing, documentation and training regarding the above
- Development of validation tools that can check — individually or in bulk — preprints or possibly also published articles for compliance with the recommendations.
- Development of the technical components of a badge system that signals compliance with the recommendations at the level of individual preprints and perhaps platforms or other units of organization (e.g. journals).
- Development of a dashboard to monitor adoption of the recommendations across the preprint landscape.
- Tools assisting with the discovery, review and curation of specific content types.
- Collaborative development of the best practice recommendations and suitable examples.
- Development and testing of the validation and badge systems.
- Development and testing of mechanisms to discover, review and curate specific content types.
- Documentation and training materials for authors and reviewers of preprints as well as operators of preprint platforms or other stakeholders.
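The validation tools, badge system and dashboard listed above could plausibly share one small core: a set of named checks, a badge level derived from the check results, and an adoption rate computed across many preprints. The sketch below illustrates that idea only. The two checks in `CHECKS` are placeholder criteria invented for the example, since the actual recommendations are yet to be developed, and the record fields and badge levels are likewise hypothetical.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]
Check = Callable[[Record], bool]

# Placeholder criteria only; the real recommendations do not exist yet.
CHECKS: Dict[str, Check] = {
    "states_version": lambda r: bool(r.get("version")),
    "lists_associated_materials": lambda r: bool(r.get("materials")),
}


def validate(record: Record, checks: Dict[str, Check] = CHECKS) -> Dict[str, bool]:
    """Run every check on one preprint record and report pass or fail per check."""
    return {name: check(record) for name, check in checks.items()}


def badge(results: Dict[str, bool]) -> str:
    """Map check results to a hypothetical badge level."""
    passed = sum(results.values())
    if passed == len(results):
        return "full"
    return "partial" if passed else "none"


def adoption_rate(records: List[Record], checks: Dict[str, Check] = CHECKS) -> float:
    """Share of records that pass all checks, as a dashboard might display it."""
    if not records:
        return 0.0
    full = sum(badge(validate(r, checks)) == "full" for r in records)
    return full / len(records)


preprints = [
    {"version": "v2", "materials": ["data.csv"]},
    {"version": "v1"},
    {},
]
print([badge(validate(p)) for p in preprints])  # prints ['full', 'partial', 'none']
```

Because `validate` works on a single record and `adoption_rate` on a list, the same checks can serve both individual and bulk validation.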
To move any of the above forward.