These FAQs are intended to guide researchers in selecting a license for their preprint. Have a question that isn’t addressed here? Please email katie.corker@asapbio.org.

Last update: 2024-08-28

 

Disclaimer

The information provided in this Licensing FAQ is not legal advice and your use of this FAQ does not create an attorney-client relationship. The examples provided throughout are intended to be illustrative only and are not exhaustive of all facts that may be relevant to a determination about your rights or risks. Please consult an attorney if you would like legal advice about your rights, obligations, or individual situation.

Copyright and CC license basics

For more on CC licenses, see the CC FAQ.

Who holds the copyright in a manuscript?

An author automatically gets a copyright the minute that an original creative work is fixed in a tangible medium of expression.

Unless an author has already transferred their copyright to a publisher or the work is a work-made-for-hire (in the United States), they own and control the copyright and have the option to retain all rights or license it under a Creative Commons or other public license. The author can also transfer or license some or all of the copyright to a journal. Note that U.S. copyright law considers the copyright holder of a work to be the employer in a work-made-for-hire situation. Therefore, in some cases, your institution or employer may be the rightsholder of your manuscript, and its permission may be required to transfer or license the copyright.

What legal rights do co-authors have?

In the case of papers with multiple co-authors, most jurisdictions (including the United States) allow each co-author (in legal terms, a “joint author”) to legally grant non-exclusive rights to publish the work and apply a Creative Commons license. In the absence of an agreement to the contrary among co-authors, each co-author can legally do this without the permission of the others.

Does the act of posting a preprint transfer copyright or “sign rights away” to the preprint server provider?

That depends on the terms required by the preprint server provider, but ASAPbio is not aware of any preprint servers that request or require copyright transfer. Most of the time, authors of preprints retain copyright of their work and can subsequently license or transfer it to a publisher if they choose. In order to legally post the preprint, almost all server providers require that authors grant them a perpetual, non-exclusive license to host and distribute the preprint (either under a Creative Commons license, or otherwise). Note that non-exclusivity allows the author to engage in other licensing activities; the purpose of this grant to the preprint server is to ensure it has the right to make the work available on its website. Check the terms of service of a preprint server before submitting your article.

Why should authors consider applying an open license to their preprints?

Preprints provide a mechanism for rapidly communicating research. The Creative Commons (CC) licenses break down the traditional barriers to sharing by communicating explicit rights and permissions up front with anyone.

The default of copyright for all works is “all rights reserved.” This means that unless usage permissions are granted up front, anyone who encounters the work cannot be sure of what rights, if any, they have to use the material. In the face of uncertainty, this requires them to seek permission from the author before using the material. CC licenses offer a standard, “some rights reserved” approach, enabling an author to communicate to the public up front how they can use their works without violating copyright. Another benefit of using CC licenses (CC BY and its family of licenses) is that they all require that users provide attribution to the original author and include a link back to the original whenever the material is used and shared.

Creative Commons also offers CC0, a waiver that places work in the public domain and removes any and all copyright worldwide, for those authors not wishing to retain any control over their material. CC0 is used by some U.S. government agencies, as their works are already in the public domain as a matter of U.S. copyright law and the laws of many (though not all) other countries. Therefore, when an employee of the U.S. government indicates a work is CC0, the U.S. government is waiving any copyrights it may have around the world (see further discussion at Project Open Data).

Releasing preprints under CC licenses has several benefits for authors. It removes the need to communicate with individual reusers, which reduces transaction costs for everyone, and eliminates headaches for authors who otherwise must deal with requests for reuse on a case-by-case basis. Furthermore, it increases the potential exposure for work by allowing it to be displayed or adapted in new contexts. (More on this in the “Derivatives” section).

However, CC licenses, just like any other public license and even some custom licenses, may complicate the analysis of view and download metrics, since articles may be reposted elsewhere and reusers are not obligated to return metrics information to the original preprint server. Furthermore, since CC licenses provide broad permission to reuse work, they reduce an author’s ability to take action against reuses they dislike compared with retaining all rights to an article.

The National Institutes of Health has encouraged the use of CC BY licenses on preprints for grantees, and several other funders encourage CC BY licenses for other work products.

If an author posts a preprint under a CC license, does that mean that they can’t publish later versions under other terms, or transfer their rights in the final, published manuscript to a publisher?

No. Authors maintain the ability to publish subsequent versions under a different license or even assign their copyright to a publisher if they wish. Any new copyright in a subsequent version of a manuscript covers only the additions, changes, or other new material appearing for the first time in that particular version.

 

An author can always enter into other agreements with publishers for the final, published version, whether that means transferring copyright to publish in a paywalled/subscription journal or licensing the final product under a different CC license to accommodate the requirements of an open access journal. Remember, however, that the preprint itself, in the form published as a preprint, is always available for reuse under the CC license until the applicable copyright term expires (in some jurisdictions, 70 years after the death of the author, at which point the work enters the public domain). Anyone may share, adapt, and reuse the preprint as permitted by the specific CC license applied to it, but the publisher’s license to publish (or its copyright) covers the publisher’s version of record.

Will journals publish my work if I’ve previously posted it in a preprint version under a CC license?

That depends on the journal. As the SHERPA/RoMEO database reveals, most paywalled/subscription journals in the basic life sciences are willing to consider submissions that have previously circulated as preprints, and policies that refuse to consider submissions based upon the license of the preprint are extremely rare. Again, authors should check policies listed on the journal website before submitting.

If you have posted a preprint, it is a recommended practice to disclose to the publisher that the preprint was previously published and indicate the license or terms under which the preprint was posted.

Can the author revoke a CC license applied to a preprint?

No. CC licenses are irrevocable on the preprint version until the applicable term of copyright expires; however, nothing in any CC license requires the author or anyone else to keep the work on a website or otherwise make it continually available. Additionally, in an effort to ensure preprints remain part of the permanent scholarly record, many preprint servers will not as a matter of practice allow material to be removed.

If an author’s CC BY-licensed preprint is reproduced on another website/repository that they don’t want to be associated with, what recourse does the author have?

CC licenses contain several provisions that enable authors to distance themselves from such sites or require that the preprint be removed altogether. Two situations might arise. First, if a reuser violates any of the terms of the license (for example, by not attributing the author or by suggesting the author endorses the reuser’s website, project, publication, program, etc.), the license terminates immediately for that reuser. Without the author’s permission, the reuser must then remove the article or risk a claim of copyright infringement, unless (under the CC 4.0 licenses) they fix the problem within 30 days of discovering the violation, as described below. Second, in the absence of a violation, the author may simply not want to be associated with their work when it is re-published.

License violations: The following situations result in violations of CC licenses, causing the license to terminate for the reuser (meaning that they can no longer use the work). In each of the examples below, under the CC 4.0 licenses, the offending reuser has 30 days to remedy the problem and get their rights back under the license. (Note, they may still be liable for infringement for the period up until they fix the problem.)

  1. Improper suggestion of endorsement. All CC licenses require adherence to the license terms, and among other things prohibit reusers and re-publishers from implying the author endorses them. If you believe a reuser’s publication of your article suggests that you, the licensor, endorse the reuser or the reuser’s views, this may be a violation of the license and you can insist the article or any reference to you be removed. This may include situations where you believe a journal to which you did not submit your article has reprinted your article in order to suggest you endorse the credibility of the journal.
  2. Failure to provide proper attribution. If the attribution and marking requirements are not met, then the license is violated and the publisher is subject to a claim of copyright infringement and no longer has permission to publish the manuscript or paper.
  3. Failure to remove attribution upon request. If the author does not like how the material has been used, or how the work has been modified, the reuser must take reasonable means to remove the attribution information upon request (but need not remove the material itself). If the reuser does not, then this is a violation of the license.
  4. Failure to indicate that changes were made. Anyone modifying licensed material must indicate that the original has been modified, even if the modifications are minor. This includes small changes that are allowed under the NoDerivatives licenses (see below). This ensures that changes made to the original material, whether or not the original author approves of them, are not associated with the original author. If the user fails to indicate that changes were made, this is a violation of the license.
  5. Failure to link back to the original version. Reusers must link back to the original document as published. This allows the public to see the original work in context, as differentiated from the context in which the work is re-published.
  6. Other failures. All CC licenses contain additional restrictions on reusers. These include a prohibition on imposing technological protection measures (known as DRM) or adding additional restrictions on reuse by others that prevents them from exercising rights granted by the CC license (e.g., putting an NC-licensed work behind a paywall).

CC recommends that violations of the licenses be handled amicably. Often, mistakes in marking and other terms are unintentional. As mentioned above, with the version 4.0 licenses, reusers have a period of time to fix marking and other license violations, and can get their rights back if they fix them within 30 days of discovery (though authors still can recover damages for the violation period). CC strongly encourages authors to apply the most recent version of the licenses, currently at 4.0, for the many benefits it has over earlier versions.
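The 30-day window mentioned above is simple date arithmetic, and a minimal Python sketch can make it concrete. The function names and the counting convention (the cure date may fall on any day up to and including the 30th day after discovery) are assumptions made for illustration; they are not taken from the license text and do not settle how a court would count the days.

```python
from datetime import date, timedelta
from typing import Optional

# Assumption for illustration: the window runs 30 calendar days from the
# date the reuser discovers the violation, inclusive of the last day.
CURE_WINDOW_DAYS = 30


def cure_deadline(discovered_on: date) -> date:
    """Return the last day on which a fix still falls inside the window."""
    return discovered_on + timedelta(days=CURE_WINDOW_DAYS)


def rights_reinstated(discovered_on: date, cured_on: Optional[date]) -> bool:
    """True if the violation was fixed inside the window (hypothetical helper).

    A reuser who never fixes the problem (cured_on is None) does not get
    their rights back under this sketch.
    """
    if cured_on is None:
        return False
    return discovered_on <= cured_on <= cure_deadline(discovered_on)


if __name__ == "__main__":
    discovered = date(2024, 8, 1)
    print(cure_deadline(discovered))
    print(rights_reinstated(discovered, date(2024, 8, 15)))
    print(rights_reinstated(discovered, date(2024, 9, 5)))
```

Even when the fix lands inside the window, the reuser may still be liable for the period before the fix, which this sketch does not model.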

Dissociation: Even if the license has not been violated, an author has the right to require that the publisher remove all attribution from the article whenever reasonably practicable. The extent to which this right is available depends on which license version is used. Under all versions of the license, this right exists if the publisher has created a collection (e.g., where a publisher may aggregate your original work with other separate and independent works in their journal) or if a derivative of your work has been made. In pre-4.0 licenses, this right does not apply when your work is reproduced as a stand-alone republication without changes.

Commercial uses

All Creative Commons licenses allow commercial reuses of a work except for works licensed under the three NonCommercial licenses. Under those three licenses (BY-NC, BY-NC-SA and BY-NC-ND), only non-commercial uses are allowed. Below are some examples of what is and is not allowed under the NC licenses. Note that Creative Commons maintains a page on its website with information about the definition of NonCommercial. Please refer to that as the definitive, most up-to-date explanation about how the NC term operates. The examples provided below are illustrative and are not exhaustive in explanation; outcomes may vary depending on the particular facts of your situation.

What’s the difference between scholarly citation and requirements for attribution under a CC license?

While works under Creative Commons licenses can be legally reused as a matter of copyright if attribution requirements are met, those legal requirements are not necessarily the same as scholarly citation requirements for some disciplines. CC’s attribution requirements are not intended to serve as a substitute for those citation requirements. Sometimes the CC requirements are more lenient than applicable citation requirements, sometimes more strict. CC licenses provide a baseline, standard requirement to identify the work’s author, source, and license terms, and to indicate if changes were made. They are not a replacement for scholarly norms, which operate independently from legal requirements.

As such, plagiarism can occur even if copyright is not violated. For example, if authors use the ideas or knowledge contained in a preprint to advance their work, but do not reproduce any of its material, they may be professionally, but not legally, obligated to cite that preprint.

How can I reuse CC licensed material (from a preprint or otherwise) in my paper?

For general examples, please visit CC’s marking page. Examples in the context of academic papers can be found in the figure legends of these papers about yeti crabs (see figure 1), volcanic zones (see figure 1), and nanoparticles (see figures 3-6 & 8-9).

Please be aware that CC BY-NC content may require separate permissions from the author if you plan to publish in a subscription journal, and CC BY-ND content may require separate author permission if you modify it substantially.
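As a rough illustration, the attribution elements described earlier in this FAQ (author, source, license terms, and an indication of whether changes were made) can be assembled into a credit line for a figure legend. The sketch below is in Python. The `ReusedItem` class, the `attribution_line` function, the inclusion of a title, the wording of the credit line, and the example.org URL are all hypothetical choices made for illustration; CC’s marking page remains the reference for recommended practice.

```python
from dataclasses import dataclass


@dataclass
class ReusedItem:
    """Facts about a reused item (hypothetical structure for illustration)."""
    title: str          # including a title is an assumption of this sketch
    author: str
    source_url: str     # link back to the original
    license_name: str
    license_url: str
    modified: bool = False
    change_note: str = ""


def attribution_line(item: ReusedItem) -> str:
    """Build a one-line credit naming author, source, license, and changes."""
    parts = [
        f'"{item.title}" by {item.author}',
        f"source: {item.source_url}",
        f"licensed under {item.license_name} ({item.license_url})",
    ]
    if item.modified:
        # Even minor modifications must be indicated.
        parts.append(f"modified: {item.change_note or 'changes were made'}")
    return "; ".join(parts) + "."


if __name__ == "__main__":
    figure = ReusedItem(
        title="Figure 2",
        author="A. Researcher",
        source_url="https://example.org/preprint/123",  # placeholder URL
        license_name="CC BY-NC 4.0",
        license_url="https://creativecommons.org/licenses/by-nc/4.0/",
        modified=True,
        change_note="cropped",
    )
    print(attribution_line(figure))
```

A credit line like this addresses only the license’s attribution terms; any scholarly citation your discipline expects is a separate matter.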

What is “non-commercial” use, and what does it mean to license a work under a CC NonCommercial license?

The Creative Commons licenses define non-commercial as “not primarily intended for or directed towards commercial advantage or monetary compensation.” The definition does not depend on the identity of the user: a for-profit company can use NC-licensed content in situations that do not violate the NC term, just as a nonprofit organization’s use of NC-licensed content can violate the term. Courts in the United States have held that bona fide non-commercial reusers may outsource the copying of NC-licensed content so long as the materials are used by the entity requesting the copies for NC purposes. This is the case even if the outsourced copyshop makes a profit. The courts found that entities must act through employees and others, and so long as the copyshop is acting on behalf of the entity, it is protected by the entity’s license, just as a salaried employee of the entity is protected by the entity’s license. For more information and use case scenarios, visit the CC website.

The term “commercial advantage” and what constitutes primary (vs secondary or other) intent are not defined (much like “fair use” is not strictly defined under U.S. law). Some argue that this vagueness limits reuse. For reusers who need certainty, this may mean they should not use NC-licensed content and instead opt for content licensed under one of the CC licenses that permit commercial use. Authors should also be aware that NC-licensed content cannot be uploaded to Wikipedia or Wikimedia Commons, which means that your preprint manuscript (or sections of it, including diagrams) may not be distributed through those platforms.

What protections does a CC BY-NC license provide against use of content by AI or large language models?

It must first be noted that CC licenses (including licenses with non-commercial restrictions) are only applicable in situations where copyright is applicable. Creative Commons has argued that “we believe there are strong arguments that, in most cases, using copyrighted works to train generative AI models would be fair use in the United States, and such training can be protected by the text and data mining exception in the EU. However, whether these limitations apply may depend on the particular use case.” Ongoing legal action (of the New York Times vs. OpenAI) will test this interpretation. If AI use is determined to be “fair use,” then CC license restrictions would not apply, but see the full post for other considerations.

If copyright protections are applicable in a given situation, then a given CC license could be used to restrict reuse by AI or large language models. In particular, a CC BY license would require attribution for reuse, and a CC BY-NC license would require attribution and restrict use to non-commercial uses. In short, a CC BY-NC license may or may not discourage reuse by AI or large language models, depending on whether copyright is relevant in a given context. Readers should be aware that some commercial publishers (including Taylor & Francis and Wiley) have sold access to research they have published to companies training large language models. Authors often surrender their copyright when they publish research in a journal or edited book, giving publishers the right to resell those works as they see fit.

Does inclusion of the work in a subscription or OA journal qualify as “commercial” use?

If the journal’s publication of the preprint is “primarily intended for commercial advantage or monetary compensation”, then yes. This is almost always the case when a journal charges a subscription fee for viewing or downloading articles. Note, however, that so long as the journal has received some additional permission from the author of the article to commercialize the article through terms of use or private agreement (even if the article is under an NC license for the public), the journal can make commercial uses of the article. This is because CC licenses do not restrict licensors from relaxing the terms of the license or releasing their work on other terms, in addition to the terms of the CC license. Under CC, authors are always free to choose to relax terms of the CC license or provide altogether different terms even though the CC license continues to apply to the work in the form as licensed under its terms to others.

 

Example scenario 1:

Researcher A releases their manuscript (including all figures) on a preprint server under a CC BY-NC license. Researcher B wants to reproduce a figure from researcher A’s CC BY-NC preprint in a review they’re writing for a subscription journal. After the review is published, readers must have a paid subscription to access Researcher B’s article containing researcher A’s CC BY-NC figure. Does this violate the CC BY-NC license because it is a commercial use by the journal?

 

It depends. Inclusion of a portion of a work for purposes of criticism or commentary may be permitted by fair use in the United States (and other countries as well; for more information, see the fair use section below). Otherwise, unless either Researcher B or the subscription journal has additional permission from Researcher A allowing them to commercially reuse the figure, the subscription journal’s charge for access to the figure likely violates the CC BY-NC license.

 

Example scenario 2:

Researcher A releases their manuscript on a preprint server under a CC BY-NC license. Researcher B wants to reproduce a figure from researcher A’s CC BY-NC preprint in a review they’re writing for a CC BY open access journal. Researcher B must pay an APC (article processing charge) to the journal in order to publish the review. After the review is published, anyone can read the article for free. Researcher B’s article contains researcher A’s CC BY-NC figure. Does either Researcher B or the open access journal violate the CC BY-NC license if the journal article is licensed under a CC license allowing commercial reuse?

Researcher B may be incorporating the content under fair use (see scenario 1, above), and if so could never be violating the BY-NC license. Assuming fair use does not apply, however, it is unlikely Researcher B is violating the BY-NC license because paying an APC to have a work published is not primarily intended for commercial advantage to Researcher B (they are paying, not receiving payment, to distribute the work further). The journal must comply with the NC license absent permission from Researcher A. If the journal is providing the article without charging a fee, it is unlikely the journal is violating the BY-NC license. However, Researcher B and the journal should be certain to properly mark the figure as being under BY-NC, so that reusers of the review know that they cannot make commercial uses of the figure even if they may be able to commercialize the rest of the review under the BY license.

Commercial Uses table

As a reminder, the CC NonCommercial licenses prohibit uses that are “primarily intended for commercial advantage or monetary compensation.” The answers that follow are conditioned on this definition, and the answers (where given) are based on the best reading of the licenses but may not apply to your factual situation.

 

| Category | Type of use | Is this a commercial use? |
| --- | --- | --- |
| General dissemination | A website that generates revenue through advertising offers others’ NC-licensed content for download (not just providing a link) | Often depends on prevalence and placement of ads and how dominant the NC-licensed content is relative to other commercially-licensed content (e.g., having to click through an ad in order to see content very likely violates NC; but side ads that are not necessary to viewing the content may not unless they dominate the site) |
| General dissemination | A website makes NC-licensed content available behind a paywall (e.g., in a book or an article that requires a subscription or per-article charge to access) | Yes, absent permission from the licensor |
| General dissemination | A website requires payment of a subscription fee to access content | Yes, absent permission from the licensor |
| Educational uses | A public school district whose use is only non-commercial engages a commercial copyshop to make prints of NC-licensed content and the copyshop makes a profit when doing so | No, as long as the school district is a bona fide non-commercial user and the copyshop is acting solely at the school district’s direction and didn’t need its own license before making copies for the district at its direction. |
| Educational uses | A student paying tuition is provided a link to an NC-licensed resource that is required reading for the course | The institution is not violating the NC restriction because the institution is not implicating copyright (no license is needed to supply a link, at least in the United States). The student is not violating the NC license as long as they use the resource only for NC purposes. If they resell the copy they downloaded from the link, the student may be violating the NC license. |
| Persuasion | A pharmaceutical company uses NC-licensed material in its marketing materials with the intention of influencing a market or its sales potential (e.g., papers related to pharmaceutical sales) | Likely. |
| Persuasion | Grassroots activists use NC-licensed materials to lobby Congress to adopt permissive policies on GMOs | Depends. If the activists have a financial stake in GMO companies and are lobbying primarily with the intention of commercial gain for themselves or their employer, then likely yes. |
| Persuasion | A GMO-producing company (or someone with a financial stake in GMOs) uses NC-licensed materials to lobby Congress to adopt permissive policies on GMOs that will result in an increase in stock price and profits | Very likely, yes |

 

Remember: the for- or not-for-profit status of the reuser does not matter.

Derivatives

Note that under CC licenses, whether changes to a work are sufficiently creative to result in a derivative work depends on copyright law. Not all changes result in a derivative work. For more information, see CC’s FAQs and this Circular published by the U.S. Copyright Office. University libraries and offices of scholarly communication may be able to provide additional information.

What is an “adaptation” of a work, and what does it mean to license a work under a CC BY-ND (Attribution-NoDerivatives) license?

An adaptation is a work based on one or more pre-existing works. What constitutes an adaptation depends on a particular country’s copyright law. Generally, a modification rises to the level of an adaptation under copyright law when the modified work (which is based upon the original work) manifests sufficient new creativity to be copyrightable itself as a new work.

One example that is always an adaptation under international treaties is the translation of a work from one language to another. This means that a preprint published in English under a CC BY-ND license cannot be translated into Arabic and shared further without the explicit permission of the author (the translation could be used internally, though, without violating a CC BY-ND 4.0 license).

However, note that all CC licenses allow the user to exercise the rights permitted under the license in any format or medium—these are not considered derivative works under the ND licenses. This allows for format shifting. For example, if an author publishes a preprint under any CC ND license in a .txt or .doc file, anyone (including the preprint publisher) may convert the file to PDF without violating the NoDerivatives restriction.

Small changes such as correcting typographical errors or inserting small annotations rarely result in the creation of a derivative work. Also note that excerpts typically do not qualify as derivatives (see below).

Does annotation of the work qualify as an adaptation?

Example scenario: An archive of life sciences literature displays CC-licensed preprints on its own site along with inline annotations that provide readers with more information (such as links to relevant databases not present in the original text) about genes, species, and molecules mentioned. Does this constitute a derivative work?

Probably. Again, it depends on whether the annotations are sufficiently creative such that the article plus annotations can be said to be a new work as a matter of copyright under applicable law. A few annotations may not be sufficient, but the more substantive and prevalent the annotations (which themselves may be protectable expression as a matter of copyright), the more likely it is that the resulting work (the article as annotated) will be considered an adaptation.

Note that the Creative Commons 4.0 ND licenses expressly allow adaptations to be made so long as the work in adapted form is not shared. For example, a company may annotate for internal purposes a work under an ND license and circulate it internally. Even if the annotated version is considered an adaptation, this does not violate version 4.0 of the ND licenses so long as the annotated copy is not shared outside the company.

Does inclusion in a compilation qualify as an adaptation?

Example scenario: Researcher A publishes a preprint under a CC BY-ND license. Researcher B wants to compare an assay in this preprint to other similar experiments published in other articles. Researcher B uses a tool that reproduces individual figures from these papers (properly attributed to their original source) into a compilation of figures, adding their own notes. The compilation can be viewed by other researchers. Is this a derivative work of the preprint?

 

Probably not. This is more likely a compilation or collection, not a derivative, because the figure is incorporated as a separate and independent work into a larger work. To constitute a derivative, the resulting (larger) work has to be based on or derived from the original. Most of the time, excerpting from larger works for inclusion with other works doesn’t result in a derivative of the original, full work. Instead, the exclusive right under copyright at issue is the reproduction right, not the right to create derivatives. That said, under all CC licenses, if an excerpt of the original is made, then whether or not a derivative is created, the reuser still must provide attribution and link back to the original so that others can view the excerpt in its original, unmodified form and context.

Derivatives table

The answers that follow are based on the CC ND licenses, and the answers (where given) are based on the best reading of the licenses but may not apply to your factual situation.

 

| Category | Type of change | Is this a derivative/adaptation under a CC license? |
| --- | --- | --- |
| Formatting | Converting a .pdf to .txt or .doc | No |
| Formatting | Converting a .pdf to a tagged format like JATS XML (the conversion labels certain parts of the manuscript as author, title, funder, etc., effectively annotating them) | No, not without more changes |
| Annotations | Adding factual information about gene, protein, chemical, or species names | Depends (see discussion above) |
| Annotations | Adding comments or opinions to the original manuscript (for example, a reader annotating a manuscript) | Depends (see discussion above) |
| Annotations | Adding comments or opinions to a reproduction of the manuscript | Depends (see discussion above) |
| Excerpts | Copying a figure to a collection of figures, which is annotated (for example, Refigure) | Not likely |
| Revising the manuscript | Correcting a typo | No |
| Revising the manuscript | Copyediting an entire manuscript | Generally yes, but depends on level of creativity involved |
| Revising the manuscript | Adding a sentence | Not likely |
| Revising the manuscript | Changing the layout or design of a figure | Depends. Not if simply changing formats, but the more adjustments and design changes are made, the more likely this is a derivative |
| Revising the manuscript | Changing a figure originally published in color to black and white | Unlikely, but depends on the degree of creativity used by the person making the changes |
| Revising the manuscript | Adjusting the manuscript to the journal’s format (two columns instead of one, placing figures in the text instead of at the end, adding hyperlinks to references, etc.) | Depends |
| Translations | Translating the text to another language | Yes |

If yes:

  • The change requires explicit permission from the author if all rights are reserved or an ND license has been selected, unless the adaptation is only used internally (not shared).
  • A separate copyright, owned by the entity making this change, is created. Note that only the contributions of the entity making the change are protected by copyright; the new copyright does not extend to the pre-existing copyrightable elements.

If no:

  • No permission from the author is needed to make the change, assuming that the process of making the change does not violate other usage restrictions (like distribution or NonCommercial if the BY-NC-ND license has been applied).


 

Share Alike

Note that the CC ShareAlike licenses only require that derivatives be licensed under the same license terms if they are shared. Both triggers must be met: a derivative is made, and the derivative is shared. No CC license requires that a derivative work, once created, be shared.

What does it mean to license a work under a CC ShareAlike license?

The CC ShareAlike licenses (BY-SA or BY-NC-SA) require users to “distribute…contributions under the same license as the original.” See this CC page for more information. Currently, some preprint servers, such as arXiv, offer these licenses, while others, like bioRxiv, do not.

 

Example scenario

Researcher A posts their preprint, including all the diagrams it contains, under a CC BY-SA license. A textbook author adapts a diagram from the paper into a figure in a textbook. As a condition of reuse, the textbook author must license their adapted diagram under CC BY-SA, but need not license the remainder of the book under BY-SA. This is because the modified diagram has been incorporated into a larger work that is not itself a derivative of the diagram. The ShareAlike obligation therefore applies only to the diagram as modified.


 

Fair use

CC licenses do not reduce, limit, or restrict any rights under exceptions and limitations to copyright, such as fair use or fair dealing.

Do limitations and exceptions to copyright—such as fair use—permit activities such as text and data mining of preprints?

Yes. In the United States and some other jurisdictions, text and data mining is permitted as a fair use or under another exception to copyright. Additionally, where fair use and other exceptions and limitations do not apply, all CC licenses (even the NoDerivatives licenses) permit text and data mining of the licensed works. This is one of the benefits of using a CC license: reusers are guaranteed the right to text and data mine so long as a derivative of the work that has been mined is not distributed (where the license is an ND license) and, if under an NC license, their creation of derivatives for internal-only use is not for commercial purposes.

 

According to Wikipedia, fair use is “a doctrine in the law of the United States that permits limited use of copyrighted material without having to first acquire permission from the copyright holder.” In other countries, other types of exceptions and limitations may allow a work to be used without violating copyright. Text and data mining is considered a fair use in the U.S. as a result of a few important court cases. Because it is considered a fair use, users need not rely on or comply with a CC license to exercise those rights. The ability of users to freely conduct text and data mining on works is less clear in other countries.

 

All CC licenses, including the NoDerivatives licenses, allow for text and data mining for uses consistent with the terms of the license. Under the ND licenses, the output from text and data mining may not be shared if the output can be said to be a derivative of the ND-licensed work. Most of the time, output from text and data mining should not be considered a derivative of the underlying ND-licensed work, especially if the output is data (which is not copyrightable) and is not the original data but new, computationally generated data resulting from analysis of the original data.

 

Example scenario

A researcher downloads and conducts text and data mining on 30 datasets, all licensed under a CC license of some type, including BY-ND. After conducting their analysis, they generate data that include prevalence data, observations about differences and similarities, and the like. The researcher publishes a paper describing the results of their research, including the data generated by their text and data mining, without reproducing the datasets themselves, though they link to them consistent with academic norms. They have not violated any of the CC licenses because the results they published do not include copies of the underlying datasets, only the results of their analysis. Moreover, their research paper is their own original work and is not based on or derived from (in a copyright sense) the datasets.


 

Acknowledgements

Special thanks to Diane Peters (Creative Commons), Tim Vollmer (Creative Commons), and Donna Okubo (PLOS), along with other members of our (now sunsetted) licensing group, which included Emilie David (AAAS), Michele Garfinkel (EMBO), Daniel Himmelstein (UPenn), Heather Joseph (SPARC), Arti Rai (Duke), Sowmya Swaminathan (Springer Nature), Neil Thakur (formerly NIH), Ron Vale (UCSF), and Dick Wilder (formerly Bill & Melinda Gates Foundation).

Additional input was provided by Richard Sever, Ross Mounce, Kevin Smith, Lisa Macklin, Peter Suber, Martyn Rittman, Nick Wehner, Michael Johansson, Joseph Bruckner, Rafael Silva-Rocha, Mayank Chugh, and Jon Tennant.