A Harlequin adapter for Databricks. Supports connecting to Databricks SQL warehouses or Databricks Runtime (DBR) interactive clusters.
harlequin-databricks depends on harlequin, so installing this package will also install Harlequin.
To install this adapter into an activated virtual environment:
pip install harlequin-databricks
poetry add harlequin-databricks
If you do not already have Harlequin installed:
pipx install harlequin-databricks
If you would like to add the Databricks adapter to an existing Harlequin installation:
pipx inject harlequin harlequin-databricks
Alternatively, you can install Harlequin with the databricks extra:
pip install harlequin[databricks]
poetry add harlequin[databricks]
pipx install harlequin[databricks]
To connect to Databricks, provide the following as CLI arguments:
- server-hostname
- http-path
- credentials for one of the following authentication methods:
  - a personal access token (PAT)
  - a username and password
  - an OAuth U2M type
  - a service principal client ID and secret for OAuth M2M
harlequin -a databricks --server-hostname ***.cloud.databricks.com --http-path /sql/1.0/endpoints/*** --access-token dapi***
harlequin -a databricks --server-hostname ***.cloud.databricks.com --http-path /sql/1.0/endpoints/*** --username *** --password ***
For OAuth user-to-machine (U2M) authentication, supply either databricks-oauth or azure-oauth to the --auth-type CLI argument:
harlequin -a databricks --server-hostname ***.cloud.databricks.com --http-path /sql/1.0/endpoints/*** --auth-type databricks-oauth
For OAuth machine-to-machine (M2M) authentication, you need to pip install databricks-sdk as an additional dependency (databricks-sdk is an optional dependency of harlequin-databricks) and supply the --client-id and --client-secret CLI arguments:
harlequin -a databricks --server-hostname ***.cloud.databricks.com --http-path /sql/1.0/endpoints/*** --client-id *** --client-secret ***
We recommend adding an alias for your connection string to your .bash_profile/.zprofile so you can launch harlequin-databricks with a short command like hdb each time.
Run this command (once) to create the alias:
echo 'alias hdb="harlequin -a databricks --server-hostname ***.cloud.databricks.com --http-path /sql/1.0/endpoints/*** --access-token dapi***"' >> ~/.bash_profile
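An alias written this way stores the access token in plain text inside the profile file. As an alternative, here is a minimal sketch of a shell function that reads the connection details from environment variables instead. The variable names DATABRICKS_SERVER_HOSTNAME, DATABRICKS_HTTP_PATH and DATABRICKS_TOKEN are assumptions chosen for this example, not something this adapter is documented to read by itself; only the harlequin flags come from the commands above.

```shell
# Sketch: define hdb as a function rather than an alias.
# The three variable names below are assumptions made for this example.
hdb() {
    # Refuse to start if any connection detail is missing.
    if [ -z "${DATABRICKS_SERVER_HOSTNAME:-}" ] ||
       [ -z "${DATABRICKS_HTTP_PATH:-}" ] ||
       [ -z "${DATABRICKS_TOKEN:-}" ]; then
        echo "hdb: set DATABRICKS_SERVER_HOSTNAME, DATABRICKS_HTTP_PATH and DATABRICKS_TOKEN first" >&2
        return 1
    fi

    # Forward any extra arguments ("$@") to harlequin unchanged.
    harlequin -a databricks \
        --server-hostname "$DATABRICKS_SERVER_HOSTNAME" \
        --http-path "$DATABRICKS_HTTP_PATH" \
        --access-token "$DATABRICKS_TOKEN" \
        "$@"
}
```

Placed in .bash_profile or .zprofile, the function is available in every new login shell. Because extra arguments are forwarded, other adapter options can be appended directly after hdb.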
Supply the --skip-legacy-indexing command line flag if you do not care about legacy metastores (e.g. hive_metastore) being indexed in Harlequin's Data Catalog pane.
This flag will skip indexing of old non-Unity Catalog metastores (i.e. they won't appear in the Data Catalog pane with this flag).
Because of the way legacy Databricks metastores work, a separate SQL query is required to fetch the metadata of each table in a legacy metastore. This means indexing them for Harlequin's Data Catalog pane is slow.
Databricks's Unity Catalog upgrade brought Information Schema, which allows harlequin-databricks to fetch metadata for all Unity Catalog assets with only two SQL queries.
So if your Databricks instance is running Unity Catalog, and you no longer care about the legacy metastores, setting the --skip-legacy-indexing CLI flag is recommended as it will mean much faster indexing & refreshing of the assets in the Data Catalog pane.
Each time you start Harlequin, it will execute SQL commands from a Databricks initialization script. For example:
USE CATALOG my_catalog;
SET TIME ZONE 'Asia/Tokyo';
DECLARE yesterday DATE DEFAULT CURRENT_DATE - INTERVAL '1' DAY;
Multi-line SQL is allowed, but must be terminated by a semicolon.
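As a rough sketch, the initialization script can be created from the shell with a heredoc. The SQL statements are the ones from the example above, with the DECLARE statement split across two lines to show a multi-line statement ending in a semicolon. INIT_FILE is a hypothetical variable introduced for this example; it defaults to ~/.databricksrc, and the snippet leaves an existing file untouched.

```shell
# INIT_FILE is a name made up for this sketch; it defaults to ~/.databricksrc.
INIT_FILE="${INIT_FILE:-$HOME/.databricksrc}"

if [ -e "$INIT_FILE" ]; then
    # Never overwrite a script that is already there.
    echo "$INIT_FILE already exists; leaving it untouched" >&2
else
    # The quoted delimiter ('SQL') keeps the shell from expanding anything inside.
    cat > "$INIT_FILE" <<'SQL'
USE CATALOG my_catalog;
SET TIME ZONE 'Asia/Tokyo';
DECLARE yesterday DATE
    DEFAULT CURRENT_DATE - INTERVAL '1' DAY;
SQL
fi
```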
By default, Harlequin will execute the script found at ~/.databricksrc. However, you can provide a different path using the --init-path option (aliased to -i or -init):
harlequin -a databricks --init-path /path/to/my/script.sql
If you would like to open Harlequin without running the script you have at ~/.databricksrc, you can either pass a nonexistent path (or /dev/null) to the option above, or start Harlequin with the --no-init option:
harlequin -a databricks --no-init
For more details on other command line options, run:
harlequin --help
For more information, see the harlequin-databricks Docs.
Please report bugs/issues with this adapter via the GitHub issues page. You are welcome to attempt fixes yourself by forking this repo then opening a PR.
For feature suggestions, please post in the discussions.