
Replace inconsistent internal license identifiers with persistent SPDX identifiers using scancode-toolkit #75

Open
storytracer opened this issue May 27, 2024 · 1 comment

@storytracer

Our license identifiers in licenses.py are not standardized and are incompatible with other datasets. I would suggest using the ScanCode LicenseDB as our reference and relying on SPDX identifiers as persistent identifiers. The LicenseDB is the most extensive collection of open source licenses out there, is used by The Stack v2, and comes with a ready-to-use Python library, scancode-toolkit.

We would not need to import all of scancode-toolkit, just the licensedcode module. Here's a usage example: https://github.com/nexB/scancode-toolkit/blob/4f49985c26b6b8f951f116dfa3fe39ce27bd4ce8/etc/scripts/licenses/synclic.py.

Essentially we would just need to add scancode-toolkit to our requirements.txt and two lines to our licenses.py:

```python
from licensedcode.cache import get_licenses_by_spdx_key
SPDX_LICENSES = get_licenses_by_spdx_key()
```
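SPDX treats license identifiers as case-insensitive, and the lookups in this issue use lowercase keys such as "cc-by-sa-2.5", while sources usually spell them "CC-BY-SA-2.5". A minimal sketch of a normalizing lookup, assuming `SPDX_LICENSES` is keyed by lowercased SPDX identifiers; `lookup_spdx` is a hypothetical helper, not part of scancode-toolkit:

```python
from typing import Mapping, TypeVar

T = TypeVar("T")


def lookup_spdx(licenses: Mapping[str, T], spdx_id: str) -> T:
    """Look up a license by SPDX identifier, ignoring case and surrounding whitespace.

    Assumes `licenses` is keyed by lowercased SPDX identifiers (hypothetical helper).
    """
    key = spdx_id.strip().lower()
    try:
        return licenses[key]
    except KeyError:
        # Re-raise with the identifier as the caller wrote it, which is easier to debug.
        raise KeyError(f"unknown SPDX identifier: {spdx_id!r}") from None
```

With the real mapping this would be called as `lookup_spdx(SPDX_LICENSES, "CC-BY-SA-2.5")`; unknown identifiers fail loudly instead of silently falling back to an internal name.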

The License class from the licensedcode module contains the ScanCode identifier, the SPDX identifier, the license URL, a link to the ScanCode LicenseDB, a short and a long name, the full license text and, very importantly, a category such as "Public Domain", "Permissive" or "Copyleft", which could be very useful for us.
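Because every license carries a category, an allow-list could be derived from categories rather than maintained by hand. A minimal sketch, assuming a hypothetical `LicenseRecord` stand-in whose field names are meant to mirror scancode's License attributes (check them against the library); the sample records are illustrative and their categories should be verified against the LicenseDB:

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class LicenseRecord:
    """Hypothetical stand-in for the License fields used here."""
    spdx_license_key: str
    category: str


def build_allow_list(
    licenses: Iterable[LicenseRecord], allowed_categories: Iterable[str]
) -> frozenset[str]:
    """Return the lowercased SPDX keys of all licenses in an allowed category."""
    allowed = set(allowed_categories)
    # Lowercase so the result matches a mapping keyed like "cc-by-sa-2.5".
    return frozenset(
        lic.spdx_license_key.lower() for lic in licenses if lic.category in allowed
    )


# Illustrative records only; in practice these would come from SPDX_LICENSES.values().
SAMPLE = [
    LicenseRecord("MIT", "Permissive"),
    LicenseRecord("Apache-2.0", "Permissive"),
    LicenseRecord("GPL-3.0-only", "Copyleft"),
    LicenseRecord("CC0-1.0", "Public Domain"),
]
```

For example, `build_allow_list(SAMPLE, {"Public Domain", "Permissive"})` keeps `mit`, `apache-2.0` and `cc0-1.0` and drops the copyleft entry. Individual licenses could still be added or removed on top of the category filter.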

I would encourage us to identify all licenses in our sources using the scancode-toolkit library, regardless of whether we end up including those datasets. In addition, we could create a new allow-list of licenses as a subset of the ScanCode LicenseDB. For backwards compatibility, we could re-assign our current hard-coded license identifiers to the short name of the corresponding license and deprecate their use at the same time, e.g.:

```python
CC_BY_SA_2_5 = SPDX_LICENSES["cc-by-sa-2.5"].short_name
```
@craffel
Collaborator

craffel commented May 27, 2024

On a quick skim, I think all of our allowed licenses are in the DB, so I agree: this would be nice to do.
