Databricks SQL Connector for Python

The Databricks SQL Connector for Python is a Python library that allows you to use Python code to run SQL commands on Databricks clusters and Databricks SQL endpoints.

Requirements

A development machine running Python 3.7 or higher.
An existing cluster or SQL endpoint.

Get started

Gather the following information for the cluster or SQL endpoint that you want to use:

Cluster – The server hostname of the cluster. You can get this from the Server Hostname value in the Advanced Options > JDBC/ODBC tab for your cluster.
SQL endpoint – The server hostname of the SQL endpoint. You can get this from the Server Hostname value in the Connection Details tab for your SQL endpoint.
A valid access token. You can use an Azure Databricks personal access token for the workspace. You can also use an Azure Active Directory access token.

Generate a personal access token

Authentication using Databricks personal access tokens | Databricks on AWS

The number of personal access tokens per user is limited to 600 per workspace.

Click Settings in the lower left corner of your Databricks workspace.
Click User Settings.
Go to the Access Tokens tab.
Click the Generate New Token button.
Optionally enter a description (comment) and expiration period.

Install the Databricks SQL Connector for Python library

Install the Databricks SQL Connector for Python library on your development machine by running pip install databricks-sql-connector.

$ pip install databricks-sql-connector

Query data

The following code example demonstrates how to call the Databricks SQL Connector for Python to run a basic SQL command on a cluster or SQL endpoint.

from databricks import sql

with sql.connect(server_hostname="<server-hostname>",
                 http_path="<http-path>",
                 access_token="<access-token>") as connection:
    with connection.cursor() as cursor:
        cursor.execute("SELECT * FROM <database-name>.<table-name> LIMIT 2")
        result = cursor.fetchall()

        for row in result:
          print(row)

Insert data

from databricks import sql

with sql.connect(server_hostname="...", http_path="...", access_token="...") as connection:
  with connection.cursor() as cursor:
    cursor.execute("CREATE TABLE IF NOT EXISTS squares (x int, x_squared int)")
    squares = [(i, i * i) for i in range(100)]
    values = ",".join([f"({x}, {y})" for (x, y) in squares])

    cursor.execute(f"INSERT INTO squares VALUES {values}")

Query metadata

from databricks import sql

with sql.connect(server_hostname="...", http_path="...", access_token="...") as connection:
  with connection.cursor() as cursor:
    cursor.columns(schema_name="default", table_name="squares")
    print(cursor.fetchall())

Cursor and connection management

from databricks import sql

connection = sql.connect(server_hostname="...", http_path="...", access_token="...")
cursor = connection.cursor()

cursor.execute("SELECT * from range(10)")
print(cursor.fetchall())

cursor.close()
connection.close()

Configure logging

import logging

from databricks import sql

logging.getLogger('databricks.sql').setLevel(logging.DEBUG)
sql.connect(...)