Databricks SQL – Tables

Note that I have “USING DELTA” at the end of the CREATE statement. This is optional because Delta is the default table format.

USE demo;
CREATE OR REPLACE TABLE managed_table (width INT, length INT, height INT) USING DELTA;
INSERT INTO managed_table VALUES (3, 2, 1);
SELECT * FROM managed_table;

USE demo;
DESCRIBE EXTENDED managed_table;
DESCRIBE TABLE EXTENDED managed_table;

USE demo;
SHOW CREATE TABLE managed_table;

USE demo;
DESCRIBE HISTORY managed_table;

USE demo;
SELECT *
FROM managed_table
VERSION AS OF 0;

RESTORE TABLE employee TO TIMESTAMP AS OF '2022-08-02 00:00:00';
RESTORE TABLE employee TO VERSION AS OF 1;

USE demo;
DROP TABLE IF EXISTS managed_table;

CREATE TABLE [USING] (Databricks SQL)

Defines a managed or external table, optionally using a data source.

{ { [CREATE OR] REPLACE TABLE | CREATE TABLE [ IF NOT EXISTS ] }
  table_name
  [ column_specification ] [ USING data_source ]
  [ table_clauses ]
  [ AS query ] }

column_specification
  ( { column_identifier column_type [ NOT NULL ]
      [ GENERATED ALWAYS AS ( expr ) |
        GENERATED { ALWAYS | BY DEFAULT } AS IDENTITY [ ( [ START WITH start ] [ INCREMENT BY step ] ) ] ]
      [ COMMENT column_comment ] } [, ...] )

table_clauses
  { OPTIONS clause |
    PARTITIONED BY clause |
    clustered_by_clause |
    LOCATION path [ WITH ( CREDENTIAL credential_name ) ] |
    COMMENT table_comment |
    TBLPROPERTIES clause } [...]

clustered_by_clause
  { CLUSTERED BY ( cluster_column [, ...] )
    [ SORTED BY ( { sort_column [ ASC | DESC ] } [, ...] ) ]
    INTO num_buckets BUCKETS }

Examples

-- Creates a Delta table
> CREATE TABLE student (id INT, name STRING, age INT);

-- Use data from another table
> CREATE TABLE student_copy AS SELECT * FROM student;

-- Creates a CSV table from an external directory
> CREATE TABLE student USING CSV LOCATION '/mnt/csv_files';

-- Specify table comment and properties
> CREATE TABLE student (id INT, name STRING, age INT)
    COMMENT 'this is a comment'
    TBLPROPERTIES ('foo'='bar');

-- Specify table comment and properties with different clauses order
> CREATE TABLE student (id INT, name STRING, age INT)
    TBLPROPERTIES ('foo'='bar')
    COMMENT 'this is a comment';

-- Create partitioned table
> CREATE TABLE student (id INT, name STRING, age INT)
    PARTITIONED BY (age);

-- Create a table with a generated column
> CREATE TABLE rectangles(a INT, b INT,
                          area INT GENERATED ALWAYS AS (a * b));
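
The column_specification grammar above also allows identity columns. A minimal sketch (the `orders` table and its columns are made up for illustration; identity columns must be BIGINT):

-- Create a table whose id is generated automatically
> CREATE TABLE orders (
    order_id BIGINT GENERATED ALWAYS AS IDENTITY (START WITH 1 INCREMENT BY 1),
    amount INT);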

DESCRIBE TABLE (Databricks SQL)

Returns the basic metadata information of a table: column name, column type, and column comment. Optionally, you can specify a partition spec or column name to return the metadata for that partition or column.

{ DESC | DESCRIBE } [ TABLE ] [ EXTENDED | FORMATTED ] table_name { [ PARTITION clause ] | [ column_name ] }
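
Examples

A short sketch, assuming the `student` table from the CREATE TABLE examples above exists:

-- Basic metadata for the whole table
> DESCRIBE TABLE student;

-- Extended metadata, including detailed table information (as used in the walkthrough above)
> DESCRIBE TABLE EXTENDED student;

-- Metadata for a single column (column_name per the syntax above)
> DESCRIBE TABLE student age;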

SHOW CREATE TABLE (Databricks SQL)

Returns the CREATE TABLE statement or CREATE VIEW statement that was used to create a given table or view.

SHOW CREATE TABLE { table_name | view_name }

Examples

> CREATE TABLE test (c INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    STORED AS TEXTFILE
    TBLPROPERTIES ('prop1' = 'value1', 'prop2' = 'value2');

> SHOW CREATE TABLE test;
                                       createtab_stmt
 ----------------------------------------------------
 CREATE TABLE `default`.`test` (`c` INT)
 USING text
 TBLPROPERTIES (
   'transient_lastDdlTime' = '1586269021',
   'prop1' = 'value1',
   'prop2' = 'value2')

DESCRIBE HISTORY (Databricks SQL)

You can retrieve information on the operations, user, timestamp, and so on for each write to a Delta table by running the history command. The operations are returned in reverse chronological order. By default, table history is retained for 30 days.

View the versions and timestamps of a Delta table:

DESCRIBE HISTORY table_name

Examples

DESCRIBE HISTORY '/data/events/'          -- get the full history of the table

DESCRIBE HISTORY delta.`/data/events/`

DESCRIBE HISTORY '/data/events/' LIMIT 1  -- get the last operation only

DESCRIBE HISTORY eventsTable

# Python: the same history via the Delta Lake API
from delta.tables import *

deltaTable = DeltaTable.forPath(spark, pathToTable)

fullHistoryDF = deltaTable.history()    # get the full history of the table

lastOperationDF = deltaTable.history(1) # get the last operation

RESTORE (Delta Lake on Databricks)

Restores a Delta table to an earlier state. Restoring to an earlier version number or a timestamp is supported.

RESTORE [ TABLE ] table_name [ TO ] time_travel_version

time_travel_version
 { TIMESTAMP AS OF timestamp_expression |
   VERSION AS OF version }
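
Examples

A sketch of the typical flow, assuming a Delta table named `employee` with earlier versions; a RESTORE is itself recorded as a new entry in the table history:

-- Look up the version or timestamp to roll back to
> DESCRIBE HISTORY employee;

-- Roll back by version number
> RESTORE TABLE employee TO VERSION AS OF 1;

-- Or roll back to the state at a given timestamp
> RESTORE TABLE employee TO TIMESTAMP AS OF '2022-08-02 00:00:00';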

DROP TABLE (Databricks SQL)

DROP TABLE [ IF EXISTS ] table_name

Examples

-- Assumes a table named `employeetable` exists.
> DROP TABLE employeetable;

-- Assumes a table named `employeetable` exists in the `userdb` schema
> DROP TABLE userdb.employeetable;

-- Assumes a table named `employeetable` does not exist.
-- Throws exception
> DROP TABLE employeetable;
  Error: Table or view not found: employeetable;

-- Assumes a table named `employeetable` does not exist. Try with IF EXISTS;
-- this time it will not throw an exception
> DROP TABLE IF EXISTS employeetable;

Dropping External Tables Does Not Delete Data

Even though we dropped the table, we are still able to query the data directly in the filesystem because it still exists in the object store.

Now, this is one of the coolest features of Databricks SQL. We’ve talked about how the use of schemas and tables is just an organizational construct. The data files in this location can be queried directly, even though they are not part of a table or schema. We use tables and schemas simply to organize data in a way familiar to you.

SELECT * FROM delta.`wasbs://courseware@dbacademy.blob.core.windows.net/data-analysis-with-databricks/v01/sales`;