Detect Secrets Stream is a server tool which ingests metadata for all git pushes on your company's GitHub Enterprise server (public repositories by default; private repositories are opt-in only). For each push, it scans the push contents for secrets. Once a secret is found and verified, its metadata is stored in a database and the raw secret is stored in Vault.
There is a companion Admin tool which enables org admins to:
- Opt-in their private repository for scanning
- Add security folks to be notified of leaked tokens
Under the hood, the server tool uses the developer tool to scan for secrets. See IBM/detect-secrets for more information about the developer tool.
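For a sense of what the underlying scanner does, the developer tool can also be run standalone against a local checkout. This is standard `detect-secrets` CLI usage (see the IBM/detect-secrets repo for installation details), not something the server requires you to run:

```shell
# scan the current repository and write findings to a baseline file
detect-secrets scan > .secrets.baseline

# interactively audit the findings
detect-secrets audit .secrets.baseline
```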
- `python3`
  - If you are using macOS, you can use `pyenv` to install Python 3: `brew install pyenv; pyenv install 3.8.5`
  - Optionally, you can set your system's default Python to Python 3 with `pyenv global 3.8.5`. Then restart your shell and run `python --version` to validate that the default Python version is 3.
- `docker` - https://docs.docker.com/get-docker/
- `skaffold`, v1.12.1 and above
- `kustomize`, v3.8.1 and above. Do NOT use the version bundled with `kubectl`, as it does not support some options used by this project (e.g. replicas).
- `container-structure-test`, used for Docker image validation (see its installation instructions).
- `pipenv`, used to manage all Python dependencies.

You can install all of the tools except `docker` and Python 3 with the one-liner below:

```shell
brew install kustomize skaffold container-structure-test pipenv
```
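After installation, a quick sanity check of the versions can save trouble later. This is only a convenience sketch; it assumes the tools above are on your `PATH`, and version subcommands may vary slightly between tool releases:

```shell
python --version                  # expect Python 3.x
skaffold version                  # expect v1.12.1 or above
kustomize version                 # expect v3.8.1 or above
container-structure-test version  # prints the installed version
pipenv --version
```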
- Navigate into the cloned repo
- Start the `pipenv` shell with `pipenv shell`
- Install Python dependencies with `pipenv install --dev`
- Initialize the `pre-commit` tool with `pre-commit install` (the combined sequence is shown below)
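Put together, the local setup looks like the following sketch. The checkout directory name is an assumption; use whatever directory you cloned the repo into:

```shell
cd detect-secrets-stream   # assumed checkout directory name
pipenv shell               # start the pipenv shell
pipenv install --dev       # install Python dependencies, including dev dependencies
pre-commit install         # install the pre-commit git hook
```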
Example secrets can be found in secrets.template (more docs coming). For your local dev environment, some secrets are auto-generated. For prod environments, you will need to supply the real secrets.
To set up your local dev secrets, first run `./kustomize_envs/dev/gen-secret.sh`. It will create two hidden folders under `kustomize_envs/dev/`:

- `secret_generated/` contains automatically-generated secrets. Secrets are regenerated every time `gen-secret.sh` is re-run.
- `secret_manual/` contains manually-entered secrets. Running `gen-secret.sh` generates a template for the secrets under this folder; you need to manually update each secret to match your environment. Re-running `gen-secret.sh` will not overwrite files that are non-empty.
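For example, after running the script you can confirm that both folders were created (folder names per the description above):

```shell
./kustomize_envs/dev/gen-secret.sh
ls -a kustomize_envs/dev/   # should now include the generated and manual secret folders
```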
This table contains information on what the values of the manually-entered secrets should be set to:
| File name | Value |
|---|---|
| `app.key` | The test GitHub App's private key. Download this from the GitHub App config UI. |
| `env.txt` | The test GitHub App's app ID. Obtained from the GitHub App config UI. |
| `db2consv_zs.lic` | The IBM DB2 license certificate file. You can read more about how to retrieve it here. This is only needed if looking for DB2 for z or DB2 for i secrets. |
| `email.conf` | Your company's internal email regex. Replace `mycompany.com` with your company's email domain. |
| `ghe_revocation.token` | The Jenkins job trigger token to revoke the GHE token. |
| `github.conf` | `github.mycompany.com` - replace this with your company's GHE domain. `tokens` - a list of GitHub API tokens (no scopes necessary); this token pool is needed for when a single token's rate limit has been reached. `<org>` and `<repo>` should be replaced with the organization/repository containing the org set configuration for private repository scanning. |
| `iam.conf` | The IBM Cloud IAM API key for an admin account which can resolve an IBM Cloud IAM token owner. |
| `kafka.conf` | `brokers_sasl` - comma-separated Kafka broker list, for example `broker-1:9093,broker-2:9093,broker-3:9093`. `api_key` - Kafka API key used to publish to and consume from the queue. When using the IBM Cloud Event Streams service, you can obtain these values from the Event Streams console by navigating to the Service credentials panel and creating a new service credential: `brokers_sasl` is the value of `kafka_brokers_sasl` (without `"` or spaces) and `api_key` is the value of `api_key` from your service credential. |
| `revoker_urls.conf` | Replace `github.mycompany.com` with your company's GHE URL, `artifactory` with your company's Artifactory URL, and `jenkins` with your company's Jenkins URL, which should contain a Jenkins job to revoke the GHE token. |
Besides the secrets described under Local dev secrets, for production environments you also need to prepare the secrets which are auto-generated in the dev environment.
This table contains information on what the values of those secrets should be set to:
| File name | Value |
|---|---|
| `basic_auth.conf` | Basic auth info for the ingestion and revoker layers. |
| `dc_iv_file` and `dc_key_file` | The key files used for deterministic encryption. This will be replaced by non-deterministic encryption later. |
| `gd_db.conf` | Database-related secrets. |
| `hmac.key` | The HMAC key used in the hashing algorithm. |
| `encryption.key` and `encryption.key.pub` | The encryption key pair used in non-deterministic encryption. |
| `vault.conf` | Vault-related secrets. DSS uses AppRole for auth and KV v1 as the secrets engine. |
You don't need to unlock secrets when running unit tests.
```shell
# Run all tests
make test

# Run only the unit tests
make test-unit
```
You can also run the unit tests for a single module, which provides faster feedback if you are just writing code for that module:

```shell
# The part after the last dash (-) corresponds to a folder name under detect_secrets_stream
# For example, run unit tests for just the files under bp_lookup
make test-unit-bp_lookup
# Run unit tests for just the files under pi_cleaner
make test-unit-pi_cleaner
```
This requires a personal or staging environment. See kustomize_envs/dev/README.md for more details.
This repo provides a utility module which enables an admin to perform many routine tasks. The utility is invoked through `python -m detect_secrets_stream.util.secret_util`.
Running the utility requires several environment variables. An example environment variable file (`.env.example`) has been provided.
From a `fish` shell, you can do something like the following:

```fish
cp .env.example .env.prod
env (grep -v '^#' .env.prod | xargs -n1) python -m detect_secrets_stream.util.secret_util --help
```
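If you are using `bash` or `zsh` instead, an equivalent invocation might look roughly like the sketch below (it assumes the same `.env.prod` file and that the values contain no spaces):

```shell
cp .env.example .env.prod
# pass the non-comment lines of .env.prod as environment variables for a single invocation
env $(grep -v '^#' .env.prod | xargs) python -m detect_secrets_stream.util.secret_util --help
```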
Based on the token ID:

```shell
python -m detect_secrets_stream.util.secret_util decrypt-token-by-id [token_id]
```

Based on the token UUID:

```shell
python -m detect_secrets_stream.util.secret_util decrypt-token-by-uuid [uuid]
```
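For instance, a concrete invocation to decrypt the token whose database ID is 12345 would look like this (the ID is made up purely for illustration):

```shell
# 12345 is a hypothetical token id
python -m detect_secrets_stream.util.secret_util decrypt-token-by-id 12345
```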
List the admins of an organization:

```shell
python -m detect_secrets_stream.util.secret_util get-org-admins [ORG_NAME]
```
```shell
# export required env vars first
# macOS
python -m detect_secrets_stream.util.secret_util backfill --size=10000 --from $(date -j -f "%a %b %d %T %Z %Y" "Wed Sep 11 00:00:00 EDT 2019" +"%s") --to $(date -j -f "%a %b %d %T %Z %Y" "Wed Sep 12 00:00:00 EDT 2019" +"%s")
# Linux
python -m detect_secrets_stream.util.secret_util backfill --size=10000 --from $(date -d '06/12/2012 07:21:22' +"%s") --to $(date -d '06/12/2012 08:21:22' +"%s")
```
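Since `--from` and `--to` are plain epoch-second values (both examples above just wrap `date +"%s"`), you can also compute them up front to keep the command readable. GNU `date` syntax shown; the timestamps are arbitrary examples:

```shell
# compute the window boundaries first (GNU date)
FROM_TS=$(date -d '2019-09-11 00:00:00' +"%s")
TO_TS=$(date -d '2019-09-12 00:00:00' +"%s")
python -m detect_secrets_stream.util.secret_util backfill --size=10000 --from "$FROM_TS" --to "$TO_TS"
```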
Manually add a commit to the `diff-scan` queue.

Note: you must set the `KAFKA_CLIENT_ID` and `GD_KAFKA_CONF` environment variables.

- `KAFKA_CLIENT_ID` is the name of the Kafka client used for manual ingestion. It can be anything, such as `manual-ingest`.
- `GD_KAFKA_CONF` points to the Kafka configuration file. The production config is stored under `kustomize_envs/prod-secrets/secret/kafka.conf`. The example below shows what the config file should contain.
```ini
[kafka]
brokers_sasl = my_sasl1.us-east.containers.appdomain.cloud:9000,my_sasl2.us-east.containers.appdomain.cloud:9000,my_sasl3.us-east.containers.appdomain.cloud:9000
api_key = my_api_key
```
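For a one-off manual ingestion you might export the two variables like this (the client ID and the path are only illustrative; any client name and a valid config path will do):

```shell
export KAFKA_CLIENT_ID=manual-ingest          # any identifier works
export GD_KAFKA_CONF=/path/to/kafka.conf      # path to a config like the example above
```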
Sample usage:
```shell
# export required env vars first
python -m detect_secrets_stream.util.secret_util ingest-commit -r <repo> -c <commit>
```
More options: running the command above with `--help` will reveal help info on more options. For example, if you know the branch or the repository's visibility, you can supply these as well.
By default, it assumes the commit is from the master branch and that the repository's visibility is public.

```shell
python -m detect_secrets_stream.util.secret_util --help
```
The mechanism for removing old secrets is a cronjob which cleans PI on a daily basis.
It removes the following PI for all tokens that have been remediated for over seven days:
- Owner email
- Secret
- Encrypted secret
- Other factors
- Author name
- Author email
- Pusher username
- Pusher email
- Committer name
- Committer email
- Repo slug
- Location url
Every four hours, a cronjob checks if all live tokens in the database are still live and updates their statuses accordingly.
```sql
CREATE ROLE scan_worker_role;
GRANT SELECT, INSERT ON ALL TABLES IN SCHEMA public TO scan_worker_role;
GRANT ALL ON ALL SEQUENCES IN SCHEMA public TO scan_worker_role;
GRANT TRUNCATE ON TABLE public.vmt_report TO scan_worker_role;

CREATE ROLE vmt_role WITH LOGIN;
GRANT CONNECT ON DATABASE <db_name> TO vmt_role;
GRANT USAGE ON SCHEMA public TO vmt_role;
GRANT SELECT ON public.vmt_report TO vmt_role;
CREATE USER vmt_user WITH IN GROUP vmt_role PASSWORD [redacted]
```
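To sanity-check the grants afterwards, a query against the information schema can help. This assumes a PostgreSQL database and `psql` access, and is not a project-specific command:

```shell
# list the privileges granted on vmt_report (standard PostgreSQL information_schema query)
psql -d <db_name> -c "SELECT grantee, privilege_type FROM information_schema.role_table_grants WHERE table_name = 'vmt_report';"
```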
| Column name | Type | Description |
|---|---|---|
| `vuln_id` | VARCHAR | The vulnerability ID |
| `token_owner_email` | VARCHAR | The token owner's email address |
| `token_type` | VARCHAR | The type of token, such as Slack, GHE, or Softlayer (10 < types < 100) |
| `vulnerability` | VARCHAR | The vulnerability |
| `pusher_email` | VARCHAR | The commit pusher's email address |
| `committer_email` | VARCHAR | The committer's email address |
| `author_email` | VARCHAR | The author's email address |
| `date_last_tested` | TIMESTAMPTZ | The date the token was last tested |
| `date_remediated` | TIMESTAMPTZ | The date the token was remediated |
| `security_focals` | VARCHAR | The security focals |
| `repo_public` | BOOLEAN | Whether the token has been leaked in at least one public repository |
| `repo_private` | BOOLEAN | Whether the token has been leaked in at least one private repository |
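As a sketch of how a VMT integration might consume `vmt_report` as `vmt_user`, a query like the one below could pull the unremediated findings (column names per the table above; the filter and column selection are only illustrative):

```shell
# pull unremediated findings from vmt_report (illustrative query, assumes psql access)
psql -d <db_name> -U vmt_user -c "SELECT vuln_id, token_type, date_last_tested FROM public.vmt_report WHERE date_remediated IS NULL;"
```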
See the additional docs for more information: