Read data stored in tables and views

  1. Databases and tables
  2. Views
  3. Reading from a Table/View
  4. Temporary Views

1. Databases and tables

An Azure Databricks database (schema) is a collection of tables. An Azure Databricks table is a collection of structured data. You can cache, filter, and perform any operations supported by Apache Spark DataFrames on Azure Databricks tables. You can query tables with Spark APIs and Spark SQL.
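As a quick sketch of the concepts above, a database and a table can be created directly in SQL. The column names here are hypothetical, chosen only to match the `jack_db.jack_table1` example used later:

```sql
-- Hypothetical example: create a database (schema) and a table inside it.
CREATE DATABASE IF NOT EXISTS jack_db;

CREATE TABLE IF NOT EXISTS jack_db.jack_table1 (
  user_id   INT,
  user_name STRING
)
USING DELTA;  -- Delta Lake is the default table format on Databricks
```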

2. Views

A view is a virtual table with no physical data of its own; it is defined by the result set of a SQL query. ALTER VIEW and DROP VIEW change only metadata.
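To illustrate, a minimal sketch assuming the hypothetical `jack_db.jack_table1` table exists; note that the ALTER and DROP statements touch only the view's metadata, never the underlying data:

```sql
-- Hypothetical example: define a view over an existing table.
CREATE VIEW IF NOT EXISTS jack_db.jack_view_recent AS
SELECT user_id, user_name
FROM jack_db.jack_table1
WHERE user_id > 100;

-- Metadata-only operations: the table's data files are untouched.
ALTER VIEW jack_db.jack_view_recent RENAME TO jack_db.jack_view_renamed;
DROP VIEW IF EXISTS jack_db.jack_view_renamed;
```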

3. Reading from a Table/View

tempDF = spark.read.table("jack_db.jack_table1")
tempDF.printSchema()
display(tempDF)
%sql
SELECT *
FROM jack_db.jack_table1
LIMIT 5

Review: Reading from Tables

  • No Spark job is executed – the schema is stored in the table definition on Databricks.
  • The data itself is stored in DBFS. If we used JDBC instead, Spark would open a connection to the database and read the data in. With an object store (like the one backing DBFS), it reads the data from the source.
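Because the schema lives in the table definition (the metastore), it can be inspected without scanning any data files. A hedged example, again assuming the hypothetical `jack_db.jack_table1`:

```sql
-- Returns column names and types from the metastore;
-- no job is run against the underlying files.
DESCRIBE TABLE jack_db.jack_table1;
```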

4. Temporary Views

We can create a temporary view with createOrReplaceTempView().

# create a temporary view from the resulting DataFrame
tempDF.createOrReplaceTempView("jack_view1")
%sql
SELECT *
FROM jack_view1
ORDER BY user_id DESC
LIMIT 5

**Note #1:** The method createOrReplaceTempView(..) is bound to the SparkSession, meaning the view is discarded once the session ends.

**Note #2:** On the other hand, the method createOrReplaceGlobalTempView(..) is bound to the Spark application.

Or to put that another way, I can use createOrReplaceTempView(..) only within this notebook. However, I can call createOrReplaceGlobalTempView(..) in this notebook and then access the view from another.
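The same distinction can be expressed in SQL. Global temporary views are registered in the reserved `global_temp` database and must be qualified with it when queried; the view name below is hypothetical:

```sql
-- Hypothetical example: a global temp view, visible to every notebook
-- attached to the same Spark application (cluster).
CREATE OR REPLACE GLOBAL TEMPORARY VIEW jack_global_view1 AS
SELECT * FROM jack_db.jack_table1;

-- From this notebook, or any other on the same cluster:
SELECT * FROM global_temp.jack_global_view1 LIMIT 5;
```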