harlequin-databricks

A Harlequin adapter for Databricks. Supports connecting to Databricks SQL warehouses or Databricks Runtime (DBR) interactive clusters.

Installation

harlequin-databricks depends on harlequin, so installing this package will also install Harlequin.

Using pip

To install this adapter into an activated virtual environment:

pip install harlequin-databricks

Using poetry

poetry add harlequin-databricks

Using pipx

If you do not already have Harlequin installed:

pipx install harlequin-databricks

If you would like to add the Databricks adapter to an existing Harlequin installation:

pipx inject harlequin harlequin-databricks

As an Extra

Alternatively, you can install Harlequin with the databricks extra:

pip install harlequin[databricks]

poetry add harlequin[databricks]

pipx install harlequin[databricks]

Connecting to Databricks

To connect to Databricks, provide the following as CLI arguments:

server-hostname
http-path
credentials for one of the following authentication methods:
- a personal access token (PAT)
- a username and password
- an OAuth U2M auth type (databricks-oauth or azure-oauth)
- a service principal client ID and secret for OAuth M2M

Personal Access Token (PAT) authentication:

harlequin -a databricks --server-hostname ***.cloud.databricks.com --http-path /sql/1.0/endpoints/*** --access-token dapi***

Username and password (basic) authentication:

harlequin -a databricks --server-hostname ***.cloud.databricks.com --http-path /sql/1.0/endpoints/*** --username *** --password ***

OAuth U2M authentication:

For OAuth user-to-machine (U2M) authentication supply either databricks-oauth or azure-oauth to the --auth-type CLI argument:

harlequin -a databricks --server-hostname ***.cloud.databricks.com --http-path /sql/1.0/endpoints/*** --auth-type databricks-oauth

OAuth M2M authentication:

For OAuth machine-to-machine (M2M) authentication, first install databricks-sdk, which is an optional dependency of harlequin-databricks (pip install databricks-sdk). Then supply the --client-id and --client-secret CLI arguments:

harlequin -a databricks --server-hostname ***.cloud.databricks.com --http-path /sql/1.0/endpoints/*** --client-id *** --client-secret ***

Store an alias for your connection string

We recommend you include an alias for your connection string in your .bash_profile/.zprofile so you can launch harlequin-databricks with a short command like hdb each time.

Run this command (once) to create the alias. If you use zsh, replace ~/.bash_profile with ~/.zprofile:

echo 'alias hdb="harlequin -a databricks --server-hostname ***.cloud.databricks.com --http-path /sql/1.0/endpoints/*** --access-token dapi***"' >> ~/.bash_profile
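The alias above stores the access token in plain text in your profile file. If you would rather keep it elsewhere, one possible alternative is a small shell function that reads the connection details from environment variables. This is only a sketch: the function name hdb and the HDB_* variable names are assumptions made for illustration, and they are read by the function itself, not by Harlequin or the adapter.

```shell
# Sketch only: the HDB_* variable names are hypothetical. They are read by this
# function, not by harlequin or harlequin-databricks.
hdb() {
    if [ -z "${HDB_SERVER_HOSTNAME:-}" ] || [ -z "${HDB_HTTP_PATH:-}" ] || [ -z "${HDB_ACCESS_TOKEN:-}" ]; then
        echo "hdb: set HDB_SERVER_HOSTNAME, HDB_HTTP_PATH and HDB_ACCESS_TOKEN first" >&2
        return 1
    fi
    # "$@" forwards any extra options to harlequin, e.g. `hdb --no-init`.
    harlequin -a databricks \
        --server-hostname "$HDB_SERVER_HOSTNAME" \
        --http-path "$HDB_HTTP_PATH" \
        --access-token "$HDB_ACCESS_TOKEN" \
        "$@"
}
```

Define the function in your profile file, export the three variables from wherever you keep secrets, and launch with hdb as before.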

Using Unity Catalog and want fast Data Catalog indexing?

Supply the --skip-legacy-indexing command line flag if you do not care about legacy metastores (e.g. hive_metastore) being indexed in Harlequin's Data Catalog pane.

This flag will skip indexing of old non-Unity Catalog metastores (i.e. they won't appear in the Data Catalog pane with this flag).

Because of the way legacy Databricks metastores work, a separate SQL query is required to fetch the metadata of each table in a legacy metastore. This means indexing them for Harlequin's Data Catalog pane is slow.

Databricks's Unity Catalog upgrade brought Information Schema, which allows harlequin-databricks to fetch metadata for all Unity Catalog assets with only two SQL queries.

So if your Databricks instance is running Unity Catalog, and you no longer care about the legacy metastores, setting the --skip-legacy-indexing CLI flag is recommended as it will mean much faster indexing & refreshing of the assets in the Data Catalog pane.
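As an illustration, the flag is simply appended to the usual connection arguments. The sketch below wraps that in a shell function; the name hdb_uc is hypothetical, and the *** values are placeholders to replace with your own hostname, HTTP path and token.

```shell
# Sketch only: hdb_uc is a hypothetical name; replace the *** placeholders
# with your own connection details.
hdb_uc() {
    # --skip-legacy-indexing leaves legacy metastores (e.g. hive_metastore)
    # out of the Data Catalog pane.
    harlequin -a databricks \
        --server-hostname "***.cloud.databricks.com" \
        --http-path "/sql/1.0/endpoints/***" \
        --access-token "***" \
        --skip-legacy-indexing \
        "$@"
}
```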

Initialization Scripts

Each time you start Harlequin, it will execute SQL commands from a Databricks initialization script. For example:

USE CATALOG my_catalog;
SET TIME ZONE 'Asia/Tokyo';
DECLARE yesterday DATE DEFAULT CURRENT_DATE - INTERVAL '1' DAY;

Multi-line SQL is allowed, but must be terminated by a semicolon.

Configuring the Script Location

By default, Harlequin will execute the script found at ~/.databricksrc. However, you can provide a different path using the --init-path option (aliased to -i or -init):

harlequin -a databricks --init-path /path/to/my/script.sql

Disabling Initialization

If you would like to open Harlequin without running the script you have at ~/.databricksrc, you can either pass a nonexistent path (or /dev/null) to the option above, or start Harlequin with the --no-init option:

harlequin -a databricks --no-init
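The two options above can be combined in a small wrapper that picks a project-local script when one exists and otherwise skips initialization. This is a sketch under assumptions: the function name hdb_here and the file name init.sql are hypothetical, and connection arguments still have to be passed in as usual.

```shell
# Sketch only: hdb_here and "init.sql" are hypothetical names.
hdb_here() {
    if [ -f ./init.sql ]; then
        # A project-local script exists: run it instead of ~/.databricksrc.
        harlequin -a databricks --init-path ./init.sql "$@"
    else
        # No local script: start without running any initialization script.
        harlequin -a databricks --no-init "$@"
    fi
}
```

Any arguments given to hdb_here (such as --server-hostname and --http-path) are forwarded to harlequin unchanged.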

Other CLI options:

For more details on other command line options, run:

harlequin --help

For more information, see the harlequin-databricks Docs.

Issues, Contributions and Feature Requests

Please report bugs/issues with this adapter via the GitHub issues page. You are welcome to attempt fixes yourself by forking this repo then opening a PR.

For feature suggestions, please post in the discussions.

Special thanks to...

Ted Conbeer, Josh Temple & Tyler Hillery.
