Our license identifiers in licenses.py are not standardized and are incompatible with other datasets. I would suggest using the ScanCode LicenseDB as our reference and relying on SPDX identifiers as persistent identifiers. The ScanCode LicenseDB is the most extensive collection of Open Source licenses out there, is used by the Stack v2, and comes with the ready-to-use Python library scancode-toolkit.
The License class from the licensedcode module contains the ScanCode identifier, the SPDX identifier, the license URL, a link to the ScanCode LicenseDB, a short and a long name, the actual license text and, very importantly, a categorization into "Public Domain", "Permissive", "Copyleft", etc., which could be very useful for us.
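As a rough illustration of why the categorization could be useful, here is a minimal sketch that filters license records by category. It does not import scancode-toolkit: LicenseRecord is a hypothetical stand-in for the License class, its field names are assumptions, and the three entries are hand-written sample data rather than output from the LicenseDB.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LicenseRecord:
    """Hypothetical stand-in for the License class (field names are assumptions)."""
    key: str               # ScanCode identifier
    spdx_license_key: str  # SPDX identifier
    category: str          # e.g. "Public Domain", "Permissive", "Copyleft"
    short_name: str


# Hand-written sample data, NOT loaded from the ScanCode LicenseDB.
SAMPLE_LICENSES = [
    LicenseRecord("mit", "MIT", "Permissive", "MIT License"),
    LicenseRecord("cc0-1.0", "CC0-1.0", "Public Domain", "CC0-1.0"),
    LicenseRecord("gpl-3.0", "GPL-3.0-only", "Copyleft", "GPL 3.0"),
]

# Which categories to accept is a project decision; this set is only an example.
ALLOWED_CATEGORIES = {"Public Domain", "Permissive"}


def build_allow_list(licenses, allowed_categories):
    """Return a dict of SPDX identifier -> record for the accepted categories."""
    return {
        lic.spdx_license_key: lic
        for lic in licenses
        if lic.category in allowed_categories
    }


ALLOW_LIST = build_allow_list(SAMPLE_LICENSES, ALLOWED_CATEGORIES)
```

Keying the result by SPDX identifier keeps the lookup compatible with other datasets that also use SPDX identifiers.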
I would encourage us to identify all licenses in our sources using the scancode-toolkit library, regardless of whether we want to include the datasets in the end or not. In addition, we could create a new allow-list of licenses as a subset of the ScanCode LicenseDB. To ensure backwards compatibility, we could additionally re-assign our current hard-coded license identifiers to the short name of the corresponding license and simultaneously deprecate their use, e.g.:
CC_BY_SA_2_5 = SPDX_LICENSES["cc-by-sa-2.5"].name
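The deprecation part could be sketched roughly as follows. This is only one possible approach, not something already in licenses.py: the LegacyLicenseNames class and the _LEGACY_TO_KEY mapping are hypothetical, and SPDX_LICENSES is replaced here by a one-entry stand-in with a made-up name value so that the snippet runs without scancode-toolkit.

```python
import warnings
from types import SimpleNamespace

# Stand-in for the real mapping; the name value is sample data only.
SPDX_LICENSES = {"cc-by-sa-2.5": SimpleNamespace(name="CC-BY-SA-2.5")}

# Hypothetical mapping from the old hard-coded identifiers to license keys.
_LEGACY_TO_KEY = {"CC_BY_SA_2_5": "cc-by-sa-2.5"}


class LegacyLicenseNames:
    """Resolve old identifiers on attribute access and warn about their use."""

    def __getattr__(self, legacy_name):
        try:
            key = _LEGACY_TO_KEY[legacy_name]
        except KeyError:
            raise AttributeError(legacy_name) from None
        warnings.warn(
            f"{legacy_name} is deprecated; use SPDX_LICENSES[{key!r}] instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return SPDX_LICENSES[key].name


legacy = LegacyLicenseNames()
```

Accessing legacy.CC_BY_SA_2_5 still returns the license name, so existing callers keep working, but each access emits a DeprecationWarning that points at the replacement.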
We would not need to import all of scancode-toolkit, just the licensedcode module. Here's a usage example: https://github.com/nexB/scancode-toolkit/blob/4f49985c26b6b8f951f116dfa3fe39ce27bd4ce8/etc/scripts/licenses/synclic.py
Essentially, we would just need to add scancode-toolkit to our requirements.txt and two lines to our licenses.py.