The Databricks SQL Connector for Python

The Databricks SQL Connector for Python is a Python library that allows you to use Python code to run SQL commands on Databricks clusters and Databricks SQL endpoints.

Requirements

Get started

Gather the following information for the cluster or SQL endpoint that you want to use:

  • Cluster – The server hostname of the cluster. You can get this from the Server Hostname value in the Advanced Options > JDBC/ODBC tab for your cluster.
  • SQL endpoint – The server hostname of the SQL endpoint. You can get this from the Server Hostname value in the Connection Details tab for your SQL endpoint.

Generate a personal access token

The number of personal access tokens per user is limited to 600 per workspace.

  1. Click User Settings Icon Settings in the lower left corner of your Databricks workspace.
  2. Click User Settings.
  3. Go to the Access Tokens tab.
  4. Click the Generate New Token button.
  5. Optionally enter a description (comment) and expiration period.

Install the Databricks SQL Connector for Python library

Install the Databricks SQL Connector for Python library on your development machine by running pip install databricks-sql-connector.

$ pip install databricks-sql-connector

Query data

The following code example demonstrates how to call the Databricks SQL Connector for Python to run a basic SQL command on a cluster or SQL endpoint.

from databricks import sql

with sql.connect(server_hostname="<server-hostname>",
                 http_path="<http-path>",
                 access_token="<access-token>") as connection:
    with connection.cursor() as cursor:
        cursor.execute("SELECT * FROM <database-name>.<table-name> LIMIT 2")
        result = cursor.fetchall()

        for row in result:
          print(row)

Insert data

from databricks import sql

with sql.connect(server_hostname="...", http_path="...", access_token="...") as connection:
  with connection.cursor() as cursor:
    cursor.execute("CREATE TABLE IF NOT EXISTS squares (x int, x_squared int)")
    squares = [(i, i * i) for i in range(100)]
    values = ",".join([f"({x}, {y})" for (x, y) in squares])

    cursor.execute(f"INSERT INTO squares VALUES {values}")

Query metadata

from databricks import sql

with sql.connect(server_hostname="...", http_path="...", access_token="...") as connection:
  with connection.cursor() as cursor:
    cursor.columns(schema_name="default", table_name="squares")
    print(cursor.fetchall())

Cursor and connection management

from databricks import sql

connection = sql.connect(server_hostname="...", http_path="...", access_token="...")
cursor = connection.cursor()

cursor.execute("SELECT * from range(10)")
print(cursor.fetchall())

cursor.close()
connection.close()

Configure logging

import logging

from databricks import sql

logging.getLogger('databricks.sql').setLevel(logging.DEBUG)
sql.connect(...)

Convert a JSON string to a dict

Example1

import json

json_string = """
{
    "id": "abc123",
    "name": "Bob"
}
"""
json_object = json.loads(json_string)

print(type(json_object))  # <class 'dict'>
print(json_object["id"])  # abc123

data = json.dumps(json_object)
print(type(data))         # <class 'str'>
print(data)               # {"id": "abc123", "name": "Bob"}

Example2

import json

jsonString = '{"a":54, "b": {"c":87}}'
aDict = json.loads(jsonString)

print(aDict)
print(aDict['a'])
print(aDict['b'])
print(aDict['b']['c'])
{'a': 54, 'b': {'c': 87}}
54
{'c': 87}
87

Python and REST APIs

  1. ติดตั้ง requests
  2. ทดลองเรียกแบบ GET
  3. ทดลองเรียกแบบ GET ที่เป็น trust a self signed SSL certificate
  4. ทดลองเรียกแบบ POST

ทดลองเรียกแบบ POST

1. ติดตั้ง requests

python -m pip install requests

2. ทดลองเรียกแบบ GET

import requests
api_url = "https://jsonplaceholder.typicode.com/todos/1"
response = requests.get(api_url)

print(response.status_code)
print(response.headers["Content-Type"])
print(response.json())
200
application/json; charset=utf-8
{'userId': 1, 'id': 1, 'title': 'delectus aut autem', 'completed': False}

3. ทดลองเรียกแบบ GET ที่เป็น trust a self signed SSL certificate

เช่น สร้างโปรเจ็กส์ webapi ขึ้นมาด้วย .Net 6 แล้วรันแบบ localhost ก็จะเป็นแบบ self signed SSL certificate

import requests
api_url = "https://localhost:7237/WeatherForecast"
response = requests.get(api_url)

print(response.status_code)
print(response.headers["Content-Type"])
print(response.json())

จะได้ error

requests.exceptions.SSLError: HTTPSConnectionPool(host='localhost', port=7237): Max retries exceeded with url: /WeatherForecast (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate (_ssl.c:1129)')))

วิธีแก้

ให้ทำการ export certificate ไฟล์ ออกมาจาก chrome ก่อน โดยไปที่ More tools | Developer tools

ไปที่ Security | Overview แล้วคลิกที่ View certificate

ที่แท็ป Details เลือก Copy to File…

แล้วเลือก Base64 encoded X.509 (.CER)

ปรับโคีด Python ให้อ้างถึงไฟล์ .cer

import requests
api_url = "https://localhost:7237/WeatherForecast"
response = requests.get(api_url, verify='./webapiCS.cer')

print(response.status_code)
print(response.headers["Content-Type"])
print(response.json())
200
application/json; charset=utf-8
[{'date': '2022-05-18T15:16:09.823195+07:00', 'temperatureC': -3, 'temperatureF': 27, 'summary': 'Mild'}, {'date': '2022-05-19T15:16:09.823412+07:00', 'temperatureC': -7, 'temperatureF': 20, 'summary': 'Cool'}, {'date': '2022-05-20T15:16:09.823414+07:00', 'temperatureC': -17, 'temperatureF': 2, 'summary': 'Warm'}, {'date': '2022-05-21T15:16:09.8234141+07:00', 'temperatureC': 7, 'temperatureF': 44, 'summary': 'Warm'}, {'date': '2022-05-22T15:16:09.8234142+07:00', 'temperatureC': -4, 'temperatureF': 25, 'summary': 'Hot'}]

หรือแก้ไขด้วยการไม่ต้อง verify

r = requests.get(url, verify=False)

4. ทดลองเรียกแบบ POST

import requests
api_url = "https://jsonplaceholder.typicode.com/todos"
todo = {"userId": 1, "title": "Buy milk", "completed": False}
response = requests.post(api_url, json=todo)
response.json()

print(response.status_code)
print(response.headers["Content-Type"])
print(response.json())
201
application/json; charset=utf-8
{'userId': 1, 'title': 'Buy milk', 'completed': False, 'id': 201}

หรือ

import requests
import json
api_url = "https://jsonplaceholder.typicode.com/todos"
todo = {"userId": 1, "title": "Buy milk", "completed": False}
headers =  {"Content-Type":"application/json"}
response = requests.post(api_url, data=json.dumps(todo), headers=headers)
response.json()

print(response.status_code)
print(response.headers["Content-Type"])
print(response.json())

Python Virtual Environments

ติดตั้ง virtualenv

$ pip install virtualenv

สร้างไดเร็กทอรี

$ mkdir python-virtual-environments 
$ cd python-virtual-environments

Create a new virtual environment inside the directory ชื่อ env

# Python 2
$ virtualenv env

# or

# Python 3
$ python3 -m venv env

activate scripts บน linux

$ source env/bin/activate
(env) $

activate scripts บน Windows

> .\env\scripts\activate
(env) >

ถ้าเจอ Error ตามนี้

+ .\env\scripts\activate
+ ~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : SecurityError: (:) [], PSSecurityException
    + FullyQualifiedErrorId : UnauthorizedAccess

ให้แก้ไขด้วยการเปิด PowerShell แบบ “Run as Administrator” (How to enable execution of PowerShell scripts? – Super User) แล้วพิมพ์คำสั่ง

set-executionpolicy remotesigned

Python – Segmentation fault

เวลาเจอ error ว่า Segmentation fault โดยไม่มีคำอธิบายใดๆ จะไปไม่เป็น

เลยจะให้พ่น error ออกมาด้วย Python faulthandler module

First add the following to the top of your module.

import faulthandler; faulthandler.enable()

Then re-run your program with the faulthandler startup flag.

# pass as an argument
python -Xfaulthandler test.py

# Or as an environment variable.
PYTHONFAULTHANDLER=1 python test.py

ก็จะได้ประมาณนี้

$ /usr/local/bin/python3.7  -Xfaulthandler test.py
Fatal Python error: Segmentation fault

Current thread 0x00007fc9540b6740 (most recent call first):
  File "/home/jack/.local/lib/python3.7/site-packages/pandas/core/internals/managers.py", line 261 in items
  File "/home/jack/.local/lib/python3.7/site-packages/pandas/core/internals/managers.py", line 310 in __len__
  File "/home/jack/.local/lib/python3.7/site-packages/pandas/core/series.py", line 599 in __len__
  File "/home/jack/.local/lib/python3.7/site-packages/pandas/io/json/_json.py", line 962 in _try_convert_data
  File "/home/jack/.local/lib/python3.7/site-packages/pandas/io/json/_json.py", line 1172 in <lambda>
  File "/home/jack/.local/lib/python3.7/site-packages/pandas/io/json/_json.py", line 1152 in _process_converter
  File "/home/jack/.local/lib/python3.7/site-packages/pandas/io/json/_json.py", line 1172 in _try_convert_types
  File "/home/jack/.local/lib/python3.7/site-packages/pandas/io/json/_json.py", line 892 in parse
  File "/home/jack/.local/lib/python3.7/site-packages/pandas/io/json/_json.py", line 777 in _get_object_parser
  File "/home/jack/.local/lib/python3.7/site-packages/pandas/io/json/_json.py", line 755 in read
  File "/home/jack/.local/lib/python3.7/site-packages/pandas/io/json/_json.py", line 618 in read_json
  File "/home/jack/.local/lib/python3.7/site-packages/pandas/util/_decorators.py", line 296 in wrapper
  File "/home/jack/.local/lib/python3.7/site-packages/pandas/util/_decorators.py", line 199 in wrapper
  File "test.py", line 4 in <module>
Segmentation fault

ปัญหา Error – lzma

UserWarning: Could not import the lzma module.

/home/ctlfw/.local/lib/python3.7/site-packages/pandas/compat/__init__.py:120: UserWarning: Could not import the lzma module. Your installed Python is incomplete. Attempting to use lzma compression will result in a RuntimeError.
  warnings.warn(msg)

ทางแก้ติดตั้ง lzma-dev

For ubuntu: sudo apt-get install liblzma-dev

For centos: sudo yum install -y xz-devel

เข้าไปในโฟลเดอร์ที่ใช้ติดตั้ง Python แล้วรัน

Then sudo configure && sudo make && sudo make install

แปลงจาก .json เป็น .csv ด้วย Python 3.8

แปลงด้วย Pandas

ติดตั้ง Pandas

python -m pip install pandas

แปลงจาก .json เป็น .csv ด้วย Pandas

import pandas as pd
df = pd.read_json (r'test.json')
df.to_csv (r'test.csv', index = None)

แปลงด้วย csv, json

import csv, json
fileInput = r'test.json'
fileOutput = r'test.csv'
inputFile = open(fileInput, 'r', encoding='utf-8') #open json file
outputFile = open(fileOutput, 'w', encoding='utf-8') #load csv file
data = json.load(inputFile) #load json content
inputFile.close() #close the input file
output = csv.writer(outputFile) #create a csv.write
output.writerow(data[0].keys())  # header row
f = True
for row in data:
    output.writerow(row.values()) #values row

แต่บางทีจะมีบรรทัดวางเกินมา ก็ลบบรรทัดว่างทิ้ง

filename_ip = 'test.csv'
filename_op = 'test.csv'
file = open(filename_ip, 'r', encoding='utf-8')
text = file.read()
lines = text.split('\n')
new_text = ''
for i in range(0, len(lines)):
    if (lines[i]!=''):
        new_text += lines[i] + '\n'
f = open(filename_op, 'w', encoding='utf-8')
f.write(new_text)
f.close()

ส่วนวิธีนี้ยังไม่ได้ลอง Convert JSON to CSV in Python – GeeksforGeeks

Python 3.8 วาดกราฟด้วย Graphviz

ติดตั้ง graphviz บน Ubuntu

sudo apt install graphviz

ติดตั้ง graphviz บน Windows เปิด cmd ด้วยสิทธิ administrator

choco install graphviz

dependency-graph

ติดตั้งไพธอนไลบรารี graphviz-0.19

python3.8 -m pip install graphviz

ที่บรรทัดแรกของไฟล์ dependency_graph.py เพิ่ม

#!/usr/bin/python3.8

ทำให้ไฟล์ dependency_graph.py รันได้

chmod 755 dependency_graph.py

หาตัวอย่าง source code ภาษา c (GitHub – pvigier/ecs: A simple and easy to use entity-component-system C++ library) มาไว้ใน folder src แล้วทดลองรัน

./dependency_graph.py src/ out -f png

จะได้ไฟล์ภาพ out.png

usage: dependency_graph.py [-h] [-f {bmp,gif,jpg,png,pdf,svg}] [-v] [-c]
                           folder output

positional arguments:
  folder                Path to the folder to scan
  output                Path of the output file without the extension

optional arguments:
  -h, --help            show this help message and exit
  -f {bmp,gif,jpg,png,pdf,svg}, --format {bmp,gif,jpg,png,pdf,svg}
                        Format of the output
  -v, --view            View the graph
  -c, --cluster         Create a cluster for each subfolder

แต่ถ้ารันแล้วได้ error

'dot' is not recognized as an internal or external command

แสดงว่ายังติดตั้ง graphviz ไม่สำเร็จ หรือ หา path ไม่เจอ Graphviz’s dot tool on Windows

pydeps

ติดตั้ง pydeps

python -m pip install pydeps

ตัวอย่างการเรียกใช้

pydeps pydeps
pydeps pydeps --rankdir TB
pydeps pydeps --rankdir BT

ให้ output เป็นไฟล์ png

pydeps -T png pydeps

Python Regular expression

  • docs.python.org – re — Regular expression operations

ค้นหาตัวอักษรระหว่าง AAA และ ZZZ

import re

text = 'gfgfdAAA1234ZZZuijjk'

try:
    found = re.search('AAA(.+?)ZZZ', text).group(1)
except AttributeError:
    # AAA, ZZZ not found in the original string
    found = '' # apply your error handling
print (found)
# found: 1234
import re

text = 'page-0188482009-20211001-0002.json'

try:
    found = re.search('page-\w+-(.+?)-', text).group(1)
except AttributeError:
    found = ''
print (found)
# found: 20211001

หาชื่อไฟล์ที่อยู่หลัง folder ที่กำหนด

import re

text = 'content/ptn_year=2021/ptn_month=10/ptn_day=01/content-20211001.json'

try:
    found = re.search('ptn_day=\w+/(.*)', text).group(1)
except AttributeError:
    found = ''
print (found)
# found: content-20211001.json

Python3 ติดต่อดาต้าเบส MySQL

Create Connection

import mysql.connector

mydb = mysql.connector.connect(
    host="localhost",
    user="yourusername",
    password="yourpassword"
)

print(mydb)
# <mysql.connector.connection_cext.CMySQLConnection object at 0x000002A56602E948>

หรือระบุ database

import mysql.connector

mydb = mysql.connector.connect(
    host="localhost",
    user="yourusername",
    password="yourpassword",
    database="mydatabase"
)

Creating a Database

import mysql.connector

mydb = mysql.connector.connect(
    host="localhost",
    user="yourusername",
    password="yourpassword"
)

mycursor = mydb.cursor()

mycursor.execute("CREATE DATABASE mydatabase")

Show databases

mycursor = mydb.cursor()

mycursor.execute("SHOW DATABASES")

for x in mycursor:
    print(x)

Creating a Table

mycursor.execute("CREATE TABLE customers (name VARCHAR(255), address VARCHAR(255))")
mycursor.execute("CREATE TABLE customers (id INT AUTO_INCREMENT PRIMARY KEY, name VARCHAR(255), address VARCHAR(255))")

Show table

mycursor.execute("SHOW TABLES")

for x in mycursor:
  print(x)

Alter table

mycursor.execute("ALTER TABLE customers ADD COLUMN id INT AUTO_INCREMENT PRIMARY KEY")

Delete table

mycursor.execute("DROP TABLE customers")

Insert Into Table

sql = "INSERT INTO customers (name, address) VALUES (%s, %s)"
val = ("John", "Highway 21")
mycursor.execute(sql, val)

mydb.commit()

print(mycursor.rowcount, "record inserted.")

Insert Multiple Rows

sql = "INSERT INTO customers (name, address) VALUES (%s, %s)"
val = [
    ('Peter', 'Lowstreet 4'),
    ('Amy', 'Apple st 652'),
    ('Hannah', 'Mountain 21'),
    ('Michael', 'Valley 345'),
    ('Sandy', 'Ocean blvd 2'),
    ('Betty', 'Green Grass 1'),
    ('Richard', 'Sky st 331'),
    ('Susan', 'One way 98'),
    ('Vicky', 'Yellow Garden 2'),
    ('Ben', 'Park Lane 38'),
    ('William', 'Central st 954'),
    ('Chuck', 'Main Road 989'),
    ('Viola', 'Sideway 1633')
]

mycursor.executemany(sql, val)

mydb.commit()

print(mycursor.rowcount, "was inserted.")

Select From a Table

We use the fetchall() method, which fetches all rows from the last executed statement.

mycursor.execute("SELECT * FROM customers")

myresult = mycursor.fetchall()

for x in myresult:
    print(x)

fetchone()

The fetchone() method will return the first row of the result:

mycursor.execute("SELECT * FROM customers")

myresult = mycursor.fetchone()

print(myresult)