1. สร้างตาราง students
%sql CREATE TABLE students ( id INT, name STRING, value DOUBLE);
ดูคำสั่งสร้างตารางนี้ด้วย SHOW CREATE TABLE
%sql SHOW CREATE TABLE students
%sql CREATE TABLE spark_catalog.default.students ( id INT, name STRING, value DOUBLE) USING delta TBLPROPERTIES ( 'delta.minReaderVersion' = '1', 'delta.minWriterVersion' = '2')
Using DESCRIBE EXTENDED allows us to see important metadata about our table.
%sql DESCRIBE EXTENDED students

%python
df1 = spark.sql('DESCRIBE EXTENDED students')
df1.show()
+--------------------+--------------------+-------+ | col_name| data_type|comment| +--------------------+--------------------+-------+ | id| int| null| | name| string| null| | value| double| null| | | | | |# Detailed Table ...| | | | Catalog| spark_catalog| | | Database| default| | | Table| students| | | Type| MANAGED| | | Location|dbfs:/user/hive/w...| | | Provider| delta| | | Owner| root| | | Is_managed_location| true| | | Table Properties|[delta.minReaderV...| | +--------------------+--------------------+-------+
DESCRIBE DETAIL is another command that allows us to explore table metadata.
%sql DESCRIBE DETAIL students

%python
df2 = spark.sql('DESCRIBE DETAIL students')
df2.show(vertical=True)
-RECORD 0--------------------------------
format | delta
id | 46681f33-7201-4c6...
name | spark_catalog.def...
description | null
location | dbfs:/user/hive/w...
createdAt | 2023-04-17 16:24:...
lastModified | 2023-04-17 16:24:32
partitionColumns | []
numFiles | 0
sizeInBytes | 0
properties | {}
minReaderVersion | 1
minWriterVersion | 2
ดู Delta Lake Files
%python
display(dbutils.fs.ls('dbfs:/user/hive/warehouse/students'))

%python
li_file = dbutils.fs.ls('dbfs:/user/hive/warehouse/students')
df3 = sqlContext.createDataFrame(li_file)
df3.show()
+--------------------+--------------------+----+----------------+ | path| name|size|modificationTime| +--------------------+--------------------+----+----------------+ |dbfs:/user/hive/w...| _delta_log/| 0| 1681750558289| +--------------------+--------------------+----+----------------+
Reviewing Delta Lake Transactions
%sql DESCRIBE HISTORY students

%python
df4 = spark.sql('DESCRIBE HISTORY students')
df4.show(vertical=True)
-RECORD 0-----------------------------------
version | 0
timestamp | 2023-04-17 16:24:32
userId | 8501686721698164
userName | odl_user_915759@d...
operation | CREATE TABLE
operationParameters | {isManaged -> tru...
job | null
notebook | {1477724271071511}
clusterId | 0415-162149-6ai590aw
readVersion | null
isolationLevel | WriteSerializable
isBlindAppend | true
operationMetrics | {}
userMetadata | null
engineInfo | Databricks-Runtim...
2. เพิ่มข้อมูล 3 ครั้ง
%sql INSERT INTO students VALUES (1, "Yve", 1.0); INSERT INTO students VALUES (2, "Omar", 2.5); INSERT INTO students VALUES (3, "Elia", 3.3);
%python
df2 = spark.sql('DESCRIBE DETAIL students')
df2.show(vertical=True)
RECORD 0--------------------------------
format | delta
id | 46681f33-7201-4c6...
name | spark_catalog.def...
description | null
location | dbfs:/user/hive/w...
createdAt | 2023-04-17 16:24:...
lastModified | 2023-04-17 16:51:01
partitionColumns | []
numFiles | 3
sizeInBytes | 2613
properties | {}
minReaderVersion | 1
minWriterVersion | 2
%python
li_file = dbutils.fs.ls('dbfs:/user/hive/warehouse/students')
df3 = sqlContext.createDataFrame(li_file)
df3.show()
+--------------------+--------------------+----+----------------+ | path| name|size|modificationTime| +--------------------+--------------------+----+----------------+ |dbfs:/user/hive/w...| _delta_log/| 0| 1681750558289| |dbfs:/user/hive/w...|part-00000-1d6df3...| 868| 1681750256000| |dbfs:/user/hive/w...|part-00000-57bc12...| 872| 1681750261000| |dbfs:/user/hive/w...|part-00000-ec8db3...| 873| 1681750259000| +--------------------+--------------------+----+----------------+

%python display(spark.sql(f"SELECT * FROM json.`dbfs:/user/hive/warehouse/students/_delta_log/00000000000000000001.json`"))

รันคำสั่งนี้จะ error
%sql SELECT * FROM parquet.`dbfs:/user/hive/warehouse/students/part-00000-1d6df344-3187-42fd-8591-df4ed47b403f.c000.snappy.parquet`
AnalysisException: Incompatible format detected.
A transaction log for Delta was found at `dbfs:/user/hive/warehouse/students/_delta_log`,
but you are trying to read from `dbfs:/user/hive/warehouse/students/part-00000-57bc1277-813b-41c0-990d-4ba8fd05c9b1.c000.snappy.parquet` using format("parquet"). You must use
'format("delta")' when reading and writing to a delta table.
To disable this check, SET spark.databricks.delta.formatCheck.enabled=false
To learn more about Delta, see https://docs.databricks.com/delta/index.html; line 1 pos 14
ให้ SET ค่านี้ก่อน
%sql SET spark.databricks.delta.formatCheck.enabled=false
รันใหม่จะได้ละ
%sql SELECT * FROM parquet.`dbfs:/user/hive/warehouse/students/part-00000-1d6df344-3187-42fd-8591-df4ed47b403f.c000.snappy.parquet`
