Replace inconsistent internal license identifiers with persistent SPDX identifiers using scancode-toolkit #75

storytracer · 2024-05-27T10:48:57Z

Our license identifiers in licenses.py are not standardized and are incompatible with other datasets. I suggest using the ScanCode LicenseDB as our reference and relying on SPDX identifiers as persistent identifiers. The LicenseDB is the most extensive collection of open source licenses out there, is used by The Stack v2, and comes with a ready-to-use Python library, scancode-toolkit.

We would not need to import all of scancode-toolkit, just the licensedcode module. Here's a usage example: https://github.com/nexB/scancode-toolkit/blob/4f49985c26b6b8f951f116dfa3fe39ce27bd4ce8/etc/scripts/licenses/synclic.py.

Essentially, we would only need to add scancode-toolkit to our requirements.txt and two lines to our licenses.py:

```python
from licensedcode.cache import get_licenses_by_spdx_key

# Maps SPDX license keys (e.g. "cc-by-sa-2.5") to licensedcode License objects
SPDX_LICENSES = get_licenses_by_spdx_key()
```
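Since the rest of licenses.py would then look licenses up by SPDX identifier, a small lookup helper could keep callers from tripping over casing or typos. The SPDX specification says license identifiers should be matched case-insensitively. The sketch below assumes the mapping is keyed by lowercase SPDX identifiers (as the `"cc-by-sa-2.5"` key used later in this issue suggests) and uses a hypothetical stand-in dict so it runs without scancode-toolkit installed; `lookup_license` is an illustrative name, not part of licensedcode.

```python
# Hypothetical stand-in for SPDX_LICENSES = get_licenses_by_spdx_key(), so the sketch runs
# without scancode-toolkit; the values are placeholder strings, not real License objects.
SPDX_LICENSES = {
    "cc-by-sa-2.5": "placeholder for the CC-BY-SA-2.5 License object",
    "mit": "placeholder for the MIT License object",
}


def lookup_license(spdx_id, licenses=SPDX_LICENSES):
    """Return the entry for an SPDX identifier, matching case-insensitively.

    Assumes the mapping is keyed by lowercase SPDX identifiers.
    """
    key = spdx_id.strip().lower()
    try:
        return licenses[key]
    except KeyError:
        # Fail loudly so an unknown or misspelled identifier is never silently accepted.
        raise KeyError(f"unknown SPDX license identifier: {spdx_id!r}") from None


entry = lookup_license("CC-BY-SA-2.5")  # same entry as SPDX_LICENSES["cc-by-sa-2.5"]
```

Raising on unknown identifiers, rather than returning `None`, seems safer for a licensing allow-list: a typo should stop the pipeline instead of quietly letting a dataset through.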

The License class from the licensedcode module contains the ScanCode key, the SPDX identifier, the license URL, a link to the ScanCode LicenseDB, a short and a long name, the full license text and, very importantly, a category such as "Public Domain", "Permissive" or "Copyleft", which could be very useful for us.
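As a rough illustration of how the category could feed an allow-list, here is a minimal sketch that keeps only licenses whose category is in an allowed set. `LicenseRecord` is a hypothetical stand-in for licensedcode's License class; the attribute names (`spdx_license_key`, `short_name`, `category`) are assumed to mirror it, and the sample entries are illustrative values, not copied from the LicenseDB.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LicenseRecord:
    """Hypothetical stand-in for licensedcode's License, with only the fields used here."""
    spdx_license_key: str
    short_name: str
    category: str


# Illustrative sample data; in practice the mapping would come from get_licenses_by_spdx_key().
LICENSES = {
    "mit": LicenseRecord("MIT", "MIT License", "Permissive"),
    "cc0-1.0": LicenseRecord("CC0-1.0", "CC0 1.0 Universal", "Public Domain"),
    "gpl-3.0-only": LicenseRecord("GPL-3.0-only", "GPL 3.0", "Copyleft"),
}


def build_allow_list(licenses, allowed_categories):
    """Return the sorted SPDX keys whose category is in allowed_categories."""
    return sorted(
        key for key, lic in licenses.items() if lic.category in allowed_categories
    )


allowed = build_allow_list(LICENSES, {"Permissive", "Public Domain"})
```

A category filter is only a starting point; individual licenses could still be added to or removed from the resulting list by hand.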

I would encourage us to identify the licenses of all our sources with scancode-toolkit, regardless of whether we end up including those datasets. In addition, we could create a new allow-list of licenses as a subset of the ScanCode LicenseDB. For backwards compatibility, we could re-assign our current hard-coded license identifiers to the license's short name and deprecate them at the same time, e.g.:

```python
CC_BY_SA_2_5 = SPDX_LICENSES["cc-by-sa-2.5"].short_name
```


craffel · 2024-05-27T13:35:30Z

On a quick skim, I think all of our allowed licenses are in the DB, so I agree that this would be nice to do.
