AboutCode team
Open source for open source

ScanCode LicenseDB -- 2,000+ licenses curated in a public database

· 4 min read
AboutCode team
Open source for open source

The ScanCode LicenseDB is all about identifying a wide variety of licenses that are actually found in software.

New software licenses appear constantly (like mushrooms popping out of the ground after a heavy rain) and old nearly-forgotten ones are rediscovered when someone scans a codebase that incorporates legacy code (like finding rare medieval manuscripts in the back shelves of a library). The ScanCode LicenseDB precisely identifies and organizes licenses and their metadata so that multiple members of the software community can understand exactly which licenses are being referenced in project documentation.

If you have seen a license notice, passed it on to your legal team for scrutiny, and completed that review, then you probably do not want to repeat that process over and over again.

With over 2,000 licenses, ScanCode LicenseDB is arguably the largest free list of curated software licenses available on the internet, and an essential reference resource for license compliance and SBOMs. ScanCode LicenseDB is available as a website, a JSON or YAML API, and a git repository, making it easy to reuse and integrate into tools that need a database of reference software licenses.

Here are some key points about the ScanCode LicenseDB:

  • Lists 2,470 licenses recognized by scancode-toolkit as of 2026-01-29.
  • Identifies each license by the license key defined in scancode-toolkit.
  • Provides an SPDX identifier (with link) for every license and exception on the SPDX License List, and a “LicenseRef” identifier for every license and exception not on the SPDX License List.
  • Provides license texts in plain text format.
  • Provides license texts and metadata in YAML and JSON.
  • Is freely accessible via API.
  • Licenses its data under CC-BY-4.0.
  • Is community supported on GitHub.
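The identifier rule in the list above can be sketched in a few lines of Python. This is only an illustration: the function name and parameters are hypothetical, and the "LicenseRef-scancode-" prefix is modeled on identifiers of the form LicenseRef-scancode-<license key>.

```python
from typing import Optional


def spdx_identifier(license_key: str, spdx_license_key: Optional[str] = None) -> str:
    """Return the identifier to use for a license in an SPDX-style expression.

    license_key: the ScanCode license key (hypothetical input).
    spdx_license_key: the SPDX id when the license is on the SPDX License
    List, otherwise None.
    """
    if spdx_license_key:
        return spdx_license_key
    # Assumption: licenses not on the SPDX License List get a LicenseRef
    # identifier built from the ScanCode license key.
    return f"LicenseRef-scancode-{license_key}"


print(spdx_identifier("mit", "MIT"))
print(spdx_identifier("commercial-license"))
```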

And below are some frequently asked questions about the ScanCode LicenseDB.

Q: What are the inclusion criteria for a license to be in the ScanCode LicenseDB?

A: The only requirements are that the license has a text and that it is used in existing code. The ScanCode LicenseDB includes multiple categories of licenses, not just open source: permissive, copyleft, commercial, proprietary free, source-available, etc. More information on license categories is available here: https://scancode-licensedb.aboutcode.org/help.html#license-categories

Q: Does the ScanCode LicenseDB compete with other license lists, such as the SPDX license list?

A: No. The ScanCode LicenseDB is intended to supplement other license lists. When new licenses are discovered by scancode-toolkit or the software community, they are added to the list with references to other lists whenever possible.

Q: What is the process for adding or correcting licenses in the ScanCode LicenseDB?

A: License curation is primarily a task of the active participants in AboutCode.org, but any member of the software community is welcome to log and respond to issues at https://github.com/nexB/scancode-licensedb/issues. See https://scancode-licensedb.aboutcode.org/help.html#support for more details.

Q: Is a license in the ScanCode LicenseDB “approved” or “recommended for use”?

A: The ScanCode LicenseDB is all about identifying the wide variety of licenses that are actually found in software. There is no attempt to approve or disapprove of license terms, and there is no attempt to correct poorly written licenses. The only license interpretation provided is a license category, which represents the best judgment of the license curators.

Q: How are licenses discovered (detected) by scancode-toolkit?

A: For license detection, ScanCode uses a (large) number of license texts and license detection ‘rules’ that are compiled in a search index. When scanning, the text of the target file is extracted and used to query the license search index and find license matches.
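To make the idea of a search index more concrete, here is a toy sketch in Python. It is not ScanCode's actual algorithm: the rule texts, the token-coverage measure, and the 0.8 threshold are all assumptions chosen for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(rules):
    """Map each token to the set of rule ids whose text contains it."""
    index = defaultdict(set)
    for rule_id, text in rules.items():
        for token in set(tokenize(text)):
            index[token].add(rule_id)
    return index


def match(index, rules, text, threshold=0.8):
    """Return (rule_id, coverage) pairs, best first.

    Coverage is the share of a rule's distinct tokens found in the text.
    """
    query = set(tokenize(text))
    candidates = set()
    for token in query:
        candidates |= index.get(token, set())
    results = []
    for rule_id in candidates:
        rule_tokens = set(tokenize(rules[rule_id]))
        coverage = len(rule_tokens & query) / len(rule_tokens)
        if coverage >= threshold:
            results.append((rule_id, round(coverage, 2)))
    return sorted(results, key=lambda r: (-r[1], r[0]))


# Hypothetical rule texts, much shorter than real license rules.
RULES = {
    "mit_notice": "Licensed under the MIT License",
    "gpl2_plus_notice": "GNU General Public License version 2 or later",
}
INDEX = build_index(RULES)
print(match(INDEX, RULES, "/* This file is licensed under the MIT license. */"))
# prints [('mit_notice', 1.0)]
```

The index only narrows the search to candidate rules that share at least one token with the file; each candidate is then scored against the full rule text.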

For copyright detection, ScanCode uses a grammar that defines the most common and less common forms of copyright statements. When scanning, the target file text is extracted and ‘parsed’ with this grammar to extract copyright statements.
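As a rough illustration of extracting a copyright statement from text, the sketch below uses a single regular expression. This is a deliberate simplification and not the grammar that ScanCode uses: it covers only one common statement form, and the pattern and function name are assumptions.

```python
import re

# Matches forms such as "Copyright (c) 2019-2024 Example Corp."
COPYRIGHT = re.compile(
    r"copyright\s+(?:\(c\)\s+|©\s+)?"
    r"(?P<years>\d{4}(?:\s*-\s*\d{4})?)\s+"
    r"(?P<holder>[^\n*/]+)",
    re.IGNORECASE,
)


def find_copyrights(text):
    """Return a list of (years, holder) tuples found in the text."""
    return [
        (m.group("years"), m.group("holder").strip())
        for m in COPYRIGHT.finditer(text)
    ]


sample = " * Copyright (c) 2019-2024 Example Corp.\n * All rights reserved."
print(find_copyrights(sample))
# prints [('2019-2024', 'Example Corp.')]
```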

More detailed information is available at https://scancode-toolkit.readthedocs.io/en/stable/explanation/scancode-license-detection.html#scancode-license-detection.

Q: How can I get help or contribute to ScanCode LicenseDB?

A: You can chat with the AboutCode community on Gitter, or report issues or ask questions at https://github.com/nexB/scancode-licensedb/issues.

What is a Dual License Anyway?

· 4 min read
AboutCode team
Open source for open source

Make it easier for users and remove the word “Dual” from your software project notice vocabulary.

“This project is licensed under a Dual License of BSD and GPL.”

What does “Dual” mean in this context? In a practical sense, it means you have to dig more deeply into the licensing for that project to figure out what this license statement means:

  • Both the BSD AND GPL apply? (conjunctive)
  • Or choose between BSD OR GPL? (disjunctive)
  • Which version of BSD?
  • And which version of GPL?

Typically, but not always, this example statement means that you have a choice of BSD-3-Clause OR GPL-2.0-or-later, because these are the most common versions of those licenses. As the consumer of the software project, you must reach that interpretation and make that choice yourself, usually after exploring the other license notices in the project files. You must then declare that choice in the attribution of the project(s) or product(s) that use that software.

But doesn’t “Dual” mean “consisting of two parts”? Well, yes, that is true in standard English usage, but in the historical practice of many open source projects, this term is ambiguously applied. This wreaks havoc on license detection programs, and creates more busy-work for anyone wanting to use the “Dual-licensed” software.

If you are publishing an open source project, you may of course declare that the project code is under one license, and the project documentation is under another license, and the sample files are under another license. That makes perfect sense, especially if you do not use the word “Dual”. In fact, it would be best to remove the word “Dual” from your project notice vocabulary altogether. If you are publishing a project under a choice of licenses, you should probably indicate what the default license is in case the user of your software does not understand that a stated license conclusion is necessary, and you should avoid referring to that choice as a “Dual” license.

The best solution is to use a standard license expression which explicitly states whether the relationship between two licenses is “AND” or “OR”. The most common syntax for license expressions is from the SPDX v2.3 specification. There are many examples from the SPDX license list or the ScanCode LicenseDB. License identification precision provides the clarity that potential users of your software need to be compliant with the licensing terms.

Dual FOSS/Proprietary Licenses

An increasingly common occurrence in software project licensing is the statement that a project is dual-licensed under a FOSS license and a commercial alternative. This usually means there is a choice between the two licenses, and again the word “dual” is misleading because it makes no sense for both the FOSS license (especially a copyleft license) and the commercial alternative to apply simultaneously and equally. Also note that in such cases, the commercial alternative is often a requirement if the usage of the software goes beyond certain restrictions (e.g. number of users, deployment on a public network, embedding in a commercial product, etc.). Any license notices of this kind should be carefully reviewed to avoid legal risks.

The best practice for a multiple license use case is to state a valid license expression using the correct operator and standard license identifiers. Some examples:

  • /* SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0-or-later */

  • /* SPDX-License-Identifier: BSD-3-Clause AND MIT */

  • /* SPDX-License-Identifier: AGPL-3.0-only OR LicenseRef-scancode-commercial-license */
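The difference between the two operators can be shown with a small Python sketch that splits a flat expression into alternatives. It is a simplification under stated assumptions: it handles only uppercase AND and OR without parentheses, relies on AND binding more tightly than OR, and the function name is hypothetical. A complete parser is more appropriate for real use.

```python
def license_choices(expression):
    """Split a flat license expression into alternatives.

    Returns a list of alternatives; each alternative is the list of
    licenses that all apply together if that alternative is chosen.
    """
    if "(" in expression or ")" in expression:
        raise ValueError("parenthesized expressions are outside this sketch")
    alternatives = []
    # OR separates choices; AND joins licenses that apply simultaneously.
    for alternative in expression.split(" OR "):
        alternatives.append([part.strip() for part in alternative.split(" AND ")])
    return alternatives


print(license_choices("BSD-3-Clause OR GPL-2.0-or-later"))
# prints [['BSD-3-Clause'], ['GPL-2.0-or-later']]
print(license_choices("BSD-3-Clause AND MIT"))
# prints [['BSD-3-Clause', 'MIT']]
```

One alternative containing two licenses means both apply; two alternatives mean the user picks one.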

You can, of course, provide additional explanatory text, remembering always to avoid the inappropriate use of the word “dual”.

More details about license expression syntax are provided in SPDX’s docs.

License Clarity Scoring in ScanCode

· 5 min read
AboutCode team
Open source for open source

When automating SCA, License Clarity Scoring helps determine if scan results require more review.

When automating Software Composition Analysis (SCA) with a scanning tool, you need to quickly evaluate the results – especially to determine whether or not the results require a deeper investigation.

ScanCode now includes License Clarity Scoring to provide users with a confidence level regarding the automated scan results.

License Clarity is a set of criteria that indicate how clearly, comprehensively and accurately a software project has defined and communicated the licensing that applies to the project software. Note that this is not an indication of the license clarity of any software dependencies.

License Clarity Scoring in ScanCode uses these criteria to rank how well a software project provides licensing information.

Declared License Expression

declared_license_expression is the primary license expression as determined from the declaration(s) of the authors of the package.

The new summary fields are:

  • declared_license_expression
  • declared_holder
  • primary_language
  • other_license_expressions
  • other_holders
  • other_languages

Note that the term declared_license_expression is used for the concept of a primary license expression in order to align with community usage, such as SPDX.

Here is how ScanCode determines the values for the declared_license_expression, declared_holder, and primary_language of a package when it scans a codebase:

  1. Look at the root of a codebase to see if there are any package manifest files that have origin information.
  2. If there is package data available, collect the license expression, holder, and package language and use that information as the declared_license_expression, declared_holder, and primary_language.
  3. If there are multiple package manifests at the codebase root, then concatenate all of the license expressions and holders together and use those concatenated values to construct the declared_license_expression and declared_holder.
  4. If there is no package data, then collect license and holder information from key files (such as LICENSE, NOTICE, README, COPYING, and ADDITIONAL_LICENSE_INFO). Try to find the primary license from the licenses referenced by the key files. If unable to determine a single license that is the primary, then concatenate all of the detected license expressions from key files together and use that as a conjunctive declared_license_expression. Concatenate all of the detected holders from key files together as the declared_holder.
  5. Note that a count of how many times a license identifier occurs in a codebase does NOT necessarily identify a license that appears in the (primary) declared_license_expression due to the typical inclusion of multiple third-party libraries that may have varying standards for license declaration. It is possible that the declared_license_expression constructed by this process may not appear literally in the codebase.
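The fallback order described in the steps above can be sketched in Python as follows. This is a simplified illustration and not ScanCode's implementation: the function names and inputs are hypothetical, the way a primary license is chosen among key files is left to the caller, and joining multiple manifest expressions with AND is an assumption (the steps only state a conjunctive join for key files).

```python
def combine(expressions):
    """Join distinct expressions with AND, keeping first-seen order."""
    unique = list(dict.fromkeys(e for e in expressions if e))
    if not unique:
        return None
    if len(unique) == 1:
        return unique[0]
    # Wrap compound expressions in parentheses so the join stays unambiguous.
    return " AND ".join(f"({e})" if " " in e else e for e in unique)


def declared_license_expression(manifest_expressions, key_file_expressions,
                                primary_license=None):
    """Pick a declared license expression for a codebase.

    manifest_expressions: expressions from package manifests at the root.
    key_file_expressions: expressions detected in key files (LICENSE, ...).
    primary_license: a single primary license found among key files, if any.
    """
    # Steps 1 to 3: package data at the codebase root takes precedence.
    if manifest_expressions:
        return combine(manifest_expressions)
    # Step 4: fall back to key files, preferring a single primary license.
    if primary_license:
        return primary_license
    return combine(key_file_expressions)


print(declared_license_expression(["apache-2.0"], ["mit"]))
# prints apache-2.0
print(declared_license_expression([], ["mit", "gpl-2.0 OR bsd-new"]))
# prints mit AND (gpl-2.0 OR bsd-new)
```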

As of DejaCode 4.2, you can also access the new license clarity scoring fields and summary fields in the Scan tab of the Package details user view.

When you scan a Package from DejaCode, you can view the Scan Results in a Scan tab on the Package details user view. DejaCode presents a selection of scan details with an emphasis on license detection. You can also download the complete Scan Results in .json format.

You can set the values from declared_license_expression, declared_holder, and primary_language to the package definition in DejaCode.

License Clarity Scoring

The license clarity score is a value from 0-100 calculated by combining the weighted values determined for each of the scoring elements: Declared license, Identification precision, License texts, Declared copyright, Ambiguous compound licensing, and Conflicting license categories.

Declared license (Scoring weight = 40)

When true, indicates that the software package licensing is documented at top-level or well-known locations (key files) in the software project, typically in a package manifest, NOTICE, LICENSE, COPYING or README file.

Identification precision (Scoring weight = 40)

Identification precision indicates how well the license statement(s) of the software identify known licenses that can be designated by precise keys (identifiers) as provided in a publicly available license list, such as the ScanCode LicenseDB, the SPDX license list, the OSI license list, or a URL pointing to a specific license text in a project or organization website.

License texts (Scoring weight = 10)

License texts are provided to support the declared license expression in files such as a package manifest, NOTICE, LICENSE, COPYING or README.

Declared copyright (Scoring weight = 10)

When true, indicates that the software package copyright is documented at top-level or well-known locations (key files) in the software project, typically in a package manifest, NOTICE, LICENSE, COPYING or README file.

Ambiguous compound licensing (Scoring negative weight = -10)

When true, indicates that the software has a license declaration that makes it difficult to construct a reliable license expression, such as in the case of multiple licenses where the conjunctive versus disjunctive relationship is not well defined.

Conflicting license categories (Scoring negative weight = -20)

When true, indicates the declared_license_expression of the software is in the permissive category, but that other potentially conflicting categories, such as copyleft and proprietary, have been detected in lower level code.
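A minimal Python sketch of how the weighted elements might combine into a single score is shown below. It rests on assumptions: Declared copyright is taken to weigh 10 (which makes the positive weights sum to 100), the element names are hypothetical keys, and clamping the result to the 0-100 range is a guess at how negative weights are handled.

```python
# Weights per scoring element; negative weights are penalties.
WEIGHTS = {
    "declared_license": 40,
    "identification_precision": 40,
    "license_texts": 10,
    "declared_copyright": 10,  # assumption: inferred from the 0-100 range
    "ambiguous_compound_licensing": -10,
    "conflicting_license_categories": -20,
}


def clarity_score(elements):
    """Sum the weights of the elements that are true, clamped to 0-100."""
    score = sum(weight for name, weight in WEIGHTS.items() if elements.get(name))
    return max(0, min(100, score))


print(clarity_score({
    "declared_license": True,
    "identification_precision": True,
    "conflicting_license_categories": True,
}))
# prints 60
```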

Want to see License Clarity Scoring in action? Download ScanCode.io or sign up for a free DejaCode account.

ScanCode provides you with the license clarity score when you specify the --summary option for a scan. ScanCode.io specifies that option for you automatically.
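Once a scan has produced JSON results, the score can be used to decide whether a deeper review is needed. The Python sketch below assumes a layout in which the results contain a top-level "summary" object with a "license_clarity_score" object holding a "score" value; that layout, the function names, and the review threshold of 80 are assumptions for illustration and should be checked against your actual scan output.

```python
import json


def clarity_from_scan(scan):
    """Extract the declared expression and clarity score from scan results."""
    summary = scan.get("summary") or {}
    clarity = summary.get("license_clarity_score") or {}
    return {
        "declared_license_expression": summary.get("declared_license_expression"),
        "score": clarity.get("score"),
    }


def needs_review(scan, minimum=80):
    """Flag scans with a missing score or a score below the threshold."""
    score = clarity_from_scan(scan)["score"]
    return score is None or score < minimum


# Hypothetical, heavily trimmed scan results.
scan = json.loads(
    '{"summary": {"declared_license_expression": "apache-2.0",'
    ' "license_clarity_score": {"score": 90}}}'
)
print(clarity_from_scan(scan))
print(needs_review(scan))
```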

DejaCode makes it even easier and specifies all the scan options that you need automatically when you request a package scan.

Using Copyleft-licensed software components in a Java application

· 7 min read
AboutCode team
Open source for open source

Key considerations while using Copyleft-licensed software components in a Java application.

This document explains some key considerations for the use of Copyleft-licensed software components in a Java application in two contexts:

  • Execution of the Java code in a shared JVM.
  • Combining class files in a shared executable JAR – and by extension in a Combined JAR (e.g. uber-jar or fat jar).

For this document, “JAR” refers specifically to an executable Java library that is a collection of .class files packaged into a file with the .jar extension; it does not refer to the use of a .jar file as an archive file only (such as for packaging source files for a Java library).

The purpose of this document is to present a “conservative” interpretation of what linking or interaction may mean in the Java context. It is not based on any particular product or application and we are not aware of any specific license compliance enforcement actions in this area.

“Strong” Copyleft-licensed Components

The execution of any software component licensed under GPL (or another “strong” Copyleft license such as AGPL, Sleepycat, etc.) in a JVM effectively links that component with all other software components in that JVM process. Those other components therefore become subject to GPL license obligations, including redistribution of source code.

The net impact of this interaction inside a JVM is that you should not deploy any GPL-licensed code in a commercial Java-based product, unless that GPL-licensed code is executed in a separate JVM. This use case is possible, but quite rare in practice.

In such rare cases, the GPL-licensed component should be used as-is and un-modified.

If a modification is absolutely required, the purpose of the modification must not be to expose some privileged way to communicate with this library from proprietary code such as exposing a socket interface or other API for the sole benefit of avoiding a direct call to the Copyleft-licensed library.

Such modifications would be considered as essentially similar to running the Copyleft-licensed library in the same JVM process and making direct calls so that the Copyleft obligation would still apply.

“Limited” Copyleft-licensed Components

Any code included within a JAR can be considered to be statically linked with any other code in that JAR, even though, strictly speaking, there is no concept of “static linking” in Java technology.

The primary logic here is that a JAR is an executable program and all of the files inside it interact within that context.

Clearly there are many programming-level differences between:

  1. the process of compiling and linking C/C++ source files into an executable program and
  2. the process of converting .java or other source files (such as Scala) into .class files and packaging them into a JAR.

But there are more similarities than differences. The net impact of this interaction inside a JAR is that you should not deploy any Copyleft-licensed code in a JAR in combination with any proprietary code.

The impact of software interaction of .class files within a JAR varies according to the specific subtype of limited Copyleft license. There are three primary subtypes to consider:

  1. LGPL
  2. GPL with Classpath Exception
  3. “Public” or file-based licenses (CDDL, EPL, MPL)

1) LGPL

The LGPL version 2 and version 3 licenses are quite different, but in both cases there are specific terms and conditions related to software interaction and these provide the strongest case that combining .class files in an executable .jar is a form of static linking.

2) GPL with Classpath Exception

This license permits static linking of “independent modules”, but it may be hard to argue that .class files combined into a single JAR are independent.

3) “Public” or file-based licenses (CDDL, EPL, MPL)

The Copyleft impact from these licenses is primarily limited to the file level, so this is the best case to argue that you can combine class files into one JAR without Copyleft impact.

For a component licensed under any of the Limited Copyleft licenses, you do have the option to dynamically link separate libraries (JARs) within a JVM. This is different from GPL-licensed code, as described above, because you can dynamically link libraries under a Limited Copyleft license inside a JVM without a Copyleft impact on other libraries.

The recommended best practice is to deploy any Java library under a Limited Copyleft license as a separate “dynamic” library as provisioned from the original OSS project. This is the best way to avoid Copyleft impact.
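The conservative guidance in this document can be summarized as a small lookup, sketched here in Python. The groupings and wording simply restate the text above; the function name and the license family labels are hypothetical, and this is not legal advice or a complete policy.

```python
# License families as grouped in this document.
STRONG_COPYLEFT = {"GPL", "AGPL", "Sleepycat"}
LIMITED_COPYLEFT = {"LGPL", "GPL with Classpath Exception", "CDDL", "EPL", "MPL"}


def deployment_guidance(license_family):
    """Return the conservative deployment guidance for a license family."""
    if license_family in STRONG_COPYLEFT:
        return "run unmodified in a separate JVM"
    if license_family in LIMITED_COPYLEFT:
        return "deploy as a separate JAR, as provisioned from the original project"
    return "no Copyleft-specific guidance in this document"


for family in ("GPL", "LGPL", "MIT"):
    print(family, "->", deployment_guidance(family))
```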

Combined JARs: uber-jars, mega-jars and fat-jars

Java code is typically packaged and redistributed as pre-compiled .class files assembled in one or more JAR libraries. Open source Java libraries are commonly downloaded at build time from a repository such as Maven (either a private or the Maven Central public repository).

The process of creating a Combined JAR is to combine the .class files from all of the third-party dependency JARs together with proprietary-licensed .class files in a single JAR. This larger Combined JAR mixes open source (and possibly Copyleft-licensed code) and proprietary code in a single JAR.

Creating larger Combined JARs is typically automated as part of a product build. Maven-based build plugins and tools include Maven Shade, one-jar, fat jar and others.

In most cases, this is an addition to the build that is easily reversed to revert to a multi-jar deployment approach. The technical purpose of building a Combined JAR may be to:

  • Simplify the deployment or configuration of some larger Java applications by reducing the number of .jar libraries to be deployed.
  • Simplify runtime configuration. In particular the Java class paths do not need to be configured to reference the dependencies since they are all contained in a single executable library.
  • Accelerate initial loading of the application in the JVM where startup time is critical for the application. This acceleration is likely to be minimal.

In addition to the Copyleft interaction issues outlined above, some other disadvantages of using Combined JARs are:

  • In the process of creating a Combined JAR, some common files with the same name and path (such as NOTICE, LICENSE) may be overwritten in a Combined JAR. Only one copy of each such file will exist in the Combined JAR. The terms of most open source licenses do not permit you to remove license or notice files.
  • The repackaging of un-modified JARs in a Combined JAR could be considered to be a modification. Most Copyleft licenses require you to track and document changes so this repackaging may require additional documentation work for the product team.
  • Tracing the package-version of an individual third-party component included in a Combined JAR may be difficult, which in turn may make it difficult to comply with Copyleft license conditions that require an offer to redistribute package-version-specific source code.
  • When updating software, the entire Combined JAR will need to be rebuilt even if most individual third-party packages are unchanged. In particular, if a single third-party component JAR needs to be updated for a vulnerability fix, bug fix, or new feature, then the whole Combined JAR needs to be redistributed to customers.
  • If several larger Combined JARs are created in a product, the resulting size of the executables may be larger, as the contents of every shared third-party JAR will be duplicated in each Combined JAR instead of being shared across modules. Thus, a Combined JAR can impede the possibility and flexibility of Java library reuse.
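The first disadvantage above, where files such as NOTICE or LICENSE overwrite each other, can be checked before building a Combined JAR. The Python sketch below lists non-class entries that appear in more than one dependency JAR. The function names are hypothetical and the demo JARs are built in memory; in practice you would pass paths to real .jar files.

```python
import io
import zipfile
from collections import defaultdict


def colliding_entries(jars):
    """Find non-class entries present in more than one JAR.

    jars: mapping of JAR name to a path or file-like object.
    Returns {entry_path: [jar names]} for entries that would collide.
    """
    owners = defaultdict(list)
    for jar_name, source in jars.items():
        with zipfile.ZipFile(source) as jar:
            for entry in jar.namelist():
                # Skip directories and compiled classes.
                if entry.endswith("/") or entry.endswith(".class"):
                    continue
                owners[entry].append(jar_name)
    return {entry: names for entry, names in owners.items() if len(names) > 1}


def make_jar(entries):
    """Build an in-memory JAR (a zip archive) for demonstration only."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as jar:
        for name, content in entries.items():
            jar.writestr(name, content)
    buffer.seek(0)
    return buffer


jars = {
    "a.jar": make_jar({"META-INF/LICENSE": "Apache-2.0 text", "com/a/A.class": ""}),
    "b.jar": make_jar({"META-INF/LICENSE": "MIT text", "com/b/B.class": ""}),
}
print(colliding_entries(jars))
# prints {'META-INF/LICENSE': ['a.jar', 'b.jar']}
```

Any entry reported here needs explicit handling in the build (for example, merging or renaming) so that no license or notice file is silently dropped.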

In general, Combined JARs are best suited for deployment of Java applications in an internal system/IT- or SaaS-only use case where some of their benefits are measurable and there are fewer issues related to license compliance and Copyleft-licensed component interaction.

When used in a commercial product that is distributed in any way, the issues attached to Combined JARs usually outweigh any technical benefits that they may offer.