Databricks SQL Connector for Python
The Databricks SQL Connector for Python is a Python library that allows you to use Python code to run SQL commands on Databricks clusters and Databricks SQL endpoints.
Requirements
- A development machine running Python 3.7 or higher.
- An existing cluster or SQL endpoint.
Get started
Gather the following information for the cluster or SQL endpoint that you want to use (see the sketch after this list for one way to load these values in your code):
- Cluster – The server hostname and HTTP path of the cluster. You can get these from the Server Hostname and HTTP Path values in the Advanced Options > JDBC/ODBC tab for your cluster.
- SQL endpoint – The server hostname and HTTP path of the SQL endpoint. You can get these from the Server Hostname and HTTP Path values in the Connection Details tab for your SQL endpoint.
- A valid access token. You can use an Azure Databricks personal access token for the workspace. You can also use an Azure Active Directory access token.
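One common approach (not required by the connector) is to keep these values out of your code, for example in environment variables, and read them at run time. A minimal sketch, assuming hypothetical variable names:

import os

# Hypothetical environment variable names; use whatever names fit your setup.
server_hostname = os.environ["DATABRICKS_SERVER_HOSTNAME"]
http_path = os.environ["DATABRICKS_HTTP_PATH"]
access_token = os.environ["DATABRICKS_TOKEN"]

# These values can then be passed to sql.connect(...) in the examples below.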
Generate a personal access token
The number of personal access tokens per user is limited to 600 per workspace.
- Click Settings in the lower left corner of your Databricks workspace.
- Click User Settings.
- Go to the Access Tokens tab.
- Click the Generate New Token button.
- Optionally enter a description (comment) and expiration period.
- Click the Generate button.
- Copy the generated token and store it in a secure location; it is not shown again after you close the dialog.
Install the Databricks SQL Connector for Python library
Install the Databricks SQL Connector for Python library on your development machine by running pip install databricks-sql-connector:
$ pip install databricks-sql-connector
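To confirm that the library is available, you can optionally run a quick import check; no output means the import succeeded:

$ python -c "from databricks import sql"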
Query data
The following code example demonstrates how to call the Databricks SQL Connector for Python to run a basic SQL command on a cluster or SQL endpoint.
from databricks import sql with sql.connect(server_hostname="<server-hostname>", http_path="<http-path>", access_token="<access-token>") as connection: with connection.cursor() as cursor: cursor.execute("SELECT * FROM <database-name>.<table-name> LIMIT 2") result = cursor.fetchall() for row in result: print(row)
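Besides fetchall, the cursor also exposes the usual DB API-style fetchone and fetchmany methods, which can be useful for larger results. A minimal sketch, using the same placeholder values as above:

from databricks import sql

with sql.connect(server_hostname="<server-hostname>",
                 http_path="<http-path>",
                 access_token="<access-token>") as connection:
    with connection.cursor() as cursor:
        cursor.execute("SELECT * FROM <database-name>.<table-name>")

        # Retrieve rows in batches of 10 instead of loading everything at once.
        rows = cursor.fetchmany(10)
        while rows:
            for row in rows:
                print(row)
            rows = cursor.fetchmany(10)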
Insert data
from databricks import sql

with sql.connect(server_hostname="...",
                 http_path="...",
                 access_token="...") as connection:
    with connection.cursor() as cursor:
        cursor.execute("CREATE TABLE IF NOT EXISTS squares (x int, x_squared int)")

        squares = [(i, i * i) for i in range(100)]
        values = ",".join([f"({x}, {y})" for (x, y) in squares])

        cursor.execute(f"INSERT INTO squares VALUES {values}")
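To verify the insert, you can read the table back with an ordinary query, for example:

from databricks import sql

with sql.connect(server_hostname="...",
                 http_path="...",
                 access_token="...") as connection:
    with connection.cursor() as cursor:
        # Read back a few of the rows that were just inserted.
        cursor.execute("SELECT * FROM squares ORDER BY x LIMIT 5")
        for row in cursor.fetchall():
            print(row)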
Query metadata
from databricks import sql

with sql.connect(server_hostname="...",
                 http_path="...",
                 access_token="...") as connection:
    with connection.cursor() as cursor:
        cursor.columns(schema_name="default", table_name="squares")
        print(cursor.fetchall())
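The cursor provides similar helpers for other kinds of metadata, such as schemas and tables. A minimal sketch, assuming the tables helper accepts a schema_name argument in the same way that columns does:

from databricks import sql

with sql.connect(server_hostname="...",
                 http_path="...",
                 access_token="...") as connection:
    with connection.cursor() as cursor:
        # List the tables in the default schema and print the metadata rows.
        cursor.tables(schema_name="default")
        print(cursor.fetchall())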
Cursor and connection management
from databricks import sql

connection = sql.connect(server_hostname="...",
                         http_path="...",
                         access_token="...")
cursor = connection.cursor()

cursor.execute("SELECT * from range(10)")
print(cursor.fetchall())

cursor.close()
connection.close()
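When you manage the cursor and connection manually like this, it is safer to close them in a finally block so they are released even if a query raises an exception. A minimal sketch:

from databricks import sql

connection = sql.connect(server_hostname="...",
                         http_path="...",
                         access_token="...")
cursor = connection.cursor()

try:
    cursor.execute("SELECT * from range(10)")
    print(cursor.fetchall())
finally:
    # Close the cursor first, then the connection, even if the query failed.
    cursor.close()
    connection.close()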
Configure logging
import logging
from databricks import sql

logging.getLogger('databricks.sql').setLevel(logging.DEBUG)
sql.connect(...)
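If you want the debug output captured somewhere specific, you can combine this with standard logging configuration. A minimal sketch that writes the connector's debug messages to a file (the file name is only an example):

import logging
from databricks import sql

# Route all log records, including the connector's, to an example log file.
logging.basicConfig(filename="databricks_sql_debug.log", level=logging.DEBUG)
logging.getLogger('databricks.sql').setLevel(logging.DEBUG)

connection = sql.connect(server_hostname="...",
                         http_path="...",
                         access_token="...")
connection.close()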