Databricks Utilities (dbutils)

  1. List available utilities
  2. Data utility (dbutils.data)
  3. File system utility (dbutils.fs)
  4. Library utility (dbutils.library)
  5. Notebook utility (dbutils.notebook)
  6. Secrets utility (dbutils.secrets)
  7. Widgets utility (dbutils.widgets)
  8. Databricks Utilities API library
  9. Limitations

1. List available utilities

dbutils.help()

List available commands for a utility

dbutils.fs.help()

Display help for a command

dbutils.fs.help("cp")

2. Data utility (dbutils.data)

Commands: summarize

dbutils.data.help()

summarize command (dbutils.data.summarize)

Calculates and displays summary statistics of an Apache Spark DataFrame or pandas DataFrame. This command is available for Python, Scala and R.

df = spark.read.format('csv').load(
  '/databricks-datasets/Rdatasets/data-001/csv/ggplot2/diamonds.csv',
  header=True,
  inferSchema=True
)
dbutils.data.summarize(df)

Note that the visualization uses SI notation to concisely render numerical values smaller than 0.01 or larger than 10000.
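The exact formatting rules of the summary visualization are not spelled out here, so the following is only a sketch of what SI notation means for the thresholds above. The function name si_format, the prefix table, and the three-significant-digit default are assumptions for illustration.

```python
import math

# Assumed subset of SI prefixes, keyed by power-of-1000 exponent.
SI_PREFIXES = {-4: "p", -3: "n", -2: "µ", -1: "m", 0: "", 1: "k", 2: "M", 3: "G", 4: "T"}

def si_format(value: float, digits: int = 3) -> str:
    """Render value with an SI prefix when |value| < 0.01 or |value| > 10000."""
    magnitude = abs(value)
    if value == 0 or 0.01 <= magnitude <= 10000:
        return f"{value:g}"  # plain rendering inside the thresholds
    # Pick the power of 1000 that brings the value into the 1..1000 range.
    exponent = math.floor(math.log10(magnitude) / 3)
    exponent = max(min(exponent, 4), -4)  # clamp to the prefixes defined above
    scaled = value / (1000 ** exponent)
    return f"{scaled:.{digits}g}{SI_PREFIXES[exponent]}"
```

For example, si_format(123456) gives "123k" and si_format(0.0042) gives "4.2m", while 500 is left as "500".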

3. File system utility (dbutils.fs)

Commands: cp, head, ls, mkdirs, mount, mounts, mv, put, refreshMounts, rm, unmount, updateMount

cp command (dbutils.fs.cp)

Copies a file or directory, possibly across filesystems.

dbutils.fs.cp("/FileStore/old_file.txt", "/tmp/new/new_file.txt")
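To copy a whole directory rather than a single file, cp takes an optional recurse argument. The helper below is a hypothetical sketch: copy_directory is not part of dbutils, and it receives the notebook's dbutils object as a parameter so it can be exercised with a stand-in outside Databricks.

```python
def copy_directory(dbutils, source_dir: str, target_dir: str) -> list:
    """Copy source_dir and everything below it to target_dir.

    Returns the sorted names of the entries now directly under target_dir.
    """
    # Copying a directory (as opposed to one file) requires recurse=True.
    dbutils.fs.cp(source_dir, target_dir, recurse=True)
    return sorted(info.name for info in dbutils.fs.ls(target_dir))
```

In a notebook you would call it as copy_directory(dbutils, "/FileStore/old_dir/", "/tmp/new_dir/"), where both paths are placeholders.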

head command (dbutils.fs.head)

Returns up to the specified maximum number of bytes of the given file. The bytes are returned as a UTF-8 encoded string.

dbutils.fs.head("/tmp/my_file.txt", 25)
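Because dbutils.fs.head only runs inside Databricks, the sketch below mimics the behavior described above on a local file: read at most max_bytes bytes and decode them as UTF-8. The name local_head is hypothetical, the 65536 default is an assumed mirror of head's default limit (verify with dbutils.fs.help("head")), and replacing a multi-byte character that is split at the limit is a choice of this sketch, not documented dbutils behavior.

```python
def local_head(path: str, max_bytes: int = 65536) -> str:
    """Local stand-in for dbutils.fs.head: return up to max_bytes of a file as text."""
    with open(path, "rb") as f:
        data = f.read(max_bytes)  # never reads more than max_bytes bytes
    # A multi-byte UTF-8 character cut off at the byte limit is replaced with
    # U+FFFD instead of raising (an assumption of this sketch).
    return data.decode("utf-8", errors="replace")
```

Note that the limit counts bytes, not characters, so a file containing non-ASCII text yields fewer than max_bytes characters.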

ls command (dbutils.fs.ls)

Lists the contents of a directory.

dbutils.fs.ls("/tmp")

# Print the full path of each entry under /mnt/training/
files = dbutils.fs.ls("/mnt/training/")

for fileInfo in files:
  print(fileInfo.path)

print("-"*80)

display(..)

Besides printing each item returned by dbutils.fs.ls(..), we can also pass that collection to another Databricks-specific command, display(..), which renders it as a table.

files = dbutils.fs.ls("/mnt/training/")

display(files)
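dbutils.fs.ls lists only one directory level. A recursive walk can be sketched on top of it, assuming (as in typical ls output) that directory entries carry a trailing "/" in their name. The generator walk_files is hypothetical and takes the dbutils object as a parameter.

```python
def walk_files(dbutils, root: str):
    """Yield the path of every file under root, descending into subdirectories."""
    for info in dbutils.fs.ls(root):
        # Assumption: directories are reported with a trailing "/" in info.name.
        if info.name.endswith("/"):
            yield from walk_files(dbutils, info.path)
        else:
            yield info.path
```

Usage in a notebook: for path in walk_files(dbutils, "/mnt/training/"): print(path). Each directory costs one ls call, so very large trees are slow to walk this way from the driver.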

mkdirs command (dbutils.fs.mkdirs)

Creates the given directory if it does not exist. Also creates any necessary parent directories.

dbutils.fs.mkdirs("/tmp/parent/child/grandchild")

mount command (dbutils.fs.mount)

Mounts the specified source directory into DBFS at the specified mount point.

dbutils.fs.mount(
  source = "wasbs://<container-name>@<storage-account-name>.blob.core.windows.net",
  mount_point = "/mnt/<mount-name>",
  extra_configs = {"<conf-key>":dbutils.secrets.get(scope = "<scope-name>", key = "<key-name>")})
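Mounting onto a mount point that already exists raises an error, so notebooks that run repeatedly often check dbutils.fs.mounts() first. The helper mount_if_needed below is a hypothetical sketch of that pattern; the source, mount point, and configs you pass are the same placeholders as in the example above.

```python
def mount_if_needed(dbutils, source: str, mount_point: str, extra_configs: dict) -> bool:
    """Mount source at mount_point unless it is already mounted.

    Returns True if a new mount was created, False if it already existed.
    """
    mounted = {m.mountPoint for m in dbutils.fs.mounts()}
    if mount_point in mounted:
        # Already mounted: skip, since mounting again would fail.
        # (Use dbutils.fs.updateMount to change an existing mount.)
        return False
    dbutils.fs.mount(source=source, mount_point=mount_point, extra_configs=extra_configs)
    return True
```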

mounts command (dbutils.fs.mounts)

Displays information about what is currently mounted within DBFS.

dbutils.fs.mounts()

# Print each mount point and the storage source it maps to
mounts = dbutils.fs.mounts()

for mount in mounts:
  print(mount.mountPoint + " >> " + mount.source)

print("-"*80)
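Each entry returned by dbutils.fs.mounts() pairs a mountPoint with a source, which is enough to work out which storage location backs a given DBFS path. The function source_for_path is a hypothetical sketch: it picks the longest matching mount point and strips an optional "dbfs:" scheme prefix, which is an assumption about how paths are written.

```python
def source_for_path(dbutils, path: str):
    """Return (mountPoint, source) of the most specific mount containing path, or None."""
    if path.startswith("dbfs:"):
        path = path[len("dbfs:"):]  # compare plain /... paths only
    best = None
    for m in dbutils.fs.mounts():
        prefix = m.mountPoint.rstrip("/") + "/"
        # Match whole path segments so /mnt/training does not match /mnt/trainingX
        if path == m.mountPoint or path.startswith(prefix):
            if best is None or len(m.mountPoint) > len(best.mountPoint):
                best = m
    return None if best is None else (best.mountPoint, best.source)
```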

mv command (dbutils.fs.mv)

Moves a file or directory, possibly across filesystems. A move is a copy followed by a delete, even for moves within filesystems.

dbutils.fs.mv("/FileStore/my_file.txt", "/tmp/parent/child/grandchild")
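Since a move is described as a copy followed by a delete, its effect can be sketched with cp and rm. This is an illustration of the documented semantics, not how dbutils.fs.mv is actually implemented; the helper name move is hypothetical, and recurse is passed through for directories.

```python
def move(dbutils, source: str, target: str, recurse: bool = False) -> None:
    """Illustrative move built from the two steps described above."""
    # Step 1: copy. If this raises, the source is left untouched.
    dbutils.fs.cp(source, target, recurse)
    # Step 2: delete the original only after the copy has returned.
    dbutils.fs.rm(source, recurse)
```

One practical consequence: moving a large directory costs roughly as much as copying it, even within the same filesystem.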

put command (dbutils.fs.put)

Writes the specified string to a file. The string is UTF-8 encoded. The optional third argument, overwrite, replaces the file if it already exists.

dbutils.fs.put("/tmp/hello_db.txt", "Hello, Databricks!", True)
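put and head combine naturally into a write-then-read-back check. The helper write_and_verify below is a hypothetical sketch; it asks head for the full encoded length of the contents because head returns at most the requested number of bytes.

```python
def write_and_verify(dbutils, path: str, contents: str) -> bool:
    """Write contents to path (overwriting) and confirm it reads back unchanged."""
    dbutils.fs.put(path, contents, overwrite=True)
    # Request exactly as many bytes as the UTF-8 encoding of contents occupies.
    return dbutils.fs.head(path, len(contents.encode("utf-8"))) == contents
```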

refreshMounts command (dbutils.fs.refreshMounts)

Forces all machines in the cluster to refresh their mount cache, ensuring they receive the most recent information.

dbutils.fs.refreshMounts()

rm command (dbutils.fs.rm)

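Removes a file or directory. To remove a directory together with all of its contents, pass True as the second argument (recurse).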

dbutils.fs.rm("/tmp/hello_db.txt")

unmount command (dbutils.fs.unmount)

Deletes a DBFS mount point.

dbutils.fs.unmount("/mnt/<mount-name>")

updateMount command (dbutils.fs.updateMount)

Similar to the dbutils.fs.mount command, but updates an existing mount point instead of creating a new one. Returns an error if the mount point is not present.

dbutils.fs.updateMount(
  source = "wasbs://<container-name>@<storage-account-name>.blob.core.windows.net",
  mount_point = "/mnt/<mount-name>",
  extra_configs = {"<conf-key>":dbutils.secrets.get(scope = "<scope-name>", key = "<key-name>")})

4. Library utility (dbutils.library)

The library utility is deprecated.

Utilities for session-isolated libraries.

Commands: install, installPyPI, list, restartPython, updateCondaEnv

5. Notebook utility (dbutils.notebook)

Commands: exit, run

The notebook utility allows you to chain together notebooks and act on their results. See Notebook workflows.
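dbutils.notebook.run(path, timeout_seconds, arguments) returns the string that the child notebook passes to dbutils.notebook.exit, so JSON is a convenient way to hand back structured results. The helper run_child and the result keys are hypothetical; arguments are converted to strings because they reach the child as widget values.

```python
import json

def run_child(dbutils, path: str, timeout_seconds: int, params: dict) -> dict:
    """Run a child notebook and decode the JSON string it returns via exit()."""
    # Argument values arrive in the child as widget values, i.e. strings.
    arguments = {key: str(value) for key, value in params.items()}
    raw = dbutils.notebook.run(path, timeout_seconds, arguments)
    return json.loads(raw)

# In the child notebook, the final cell would end with something like:
#   dbutils.notebook.exit(json.dumps({"status": "ok"}))
```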

6. Secrets utility (dbutils.secrets)

Commands: get, getBytes, list, listScopes

The secrets utility allows you to store and access sensitive credentials without making them visible in notebooks. See Secret management and Use the secrets in a notebook. To list the available commands, run dbutils.secrets.help().
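listScopes and list expose only metadata (scope names and key names), so they can be used to check what is available without reading any secret value. The function secret_inventory is a hypothetical sketch built on those two commands.

```python
def secret_inventory(dbutils) -> dict:
    """Map each secret scope name to its sorted key names; values are never read."""
    return {
        scope.name: sorted(meta.key for meta in dbutils.secrets.list(scope.name))
        for scope in dbutils.secrets.listScopes()
    }
```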

7. Widgets utility (dbutils.widgets)

Commands: combobox, dropdown, get, getArgument, multiselect, remove, removeAll, text

The widgets utility allows you to parameterize notebooks. See Widgets. To list the available commands, run dbutils.widgets.help().

combobox command (dbutils.widgets.combobox)

Creates and displays a combobox widget with the specified programmatic name, default value, choices, and optional label.

dbutils.widgets.combobox(
  name='fruits_combobox',
  defaultValue='banana',
  choices=['apple', 'banana', 'coconut', 'dragon fruit'],
  label='Fruits'
)

print(dbutils.widgets.get("fruits_combobox"))

# banana

dropdown command (dbutils.widgets.dropdown)

Creates and displays a dropdown widget with the specified programmatic name, default value, choices, and optional label.

dbutils.widgets.dropdown(
  name='toys_dropdown',
  defaultValue='basketball',
  choices=['alphabet blocks', 'basketball', 'cape', 'doll'],
  label='Toys'
)

print(dbutils.widgets.get("toys_dropdown"))

# basketball

get command (dbutils.widgets.get)

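Gets the current value of the widget with the specified programmatic name. This programmatic name can be either the name of a widget in the notebook, such as fruits_combobox, or the name of a parameter passed to the notebook as part of a notebook task.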

dbutils.widgets.get('fruits_combobox')

# banana

getArgument command (dbutils.widgets.getArgument)

Gets the current value of the widget with the specified programmatic name. If the widget does not exist, an optional message can be returned.

This command is deprecated. Use dbutils.widgets.get instead.

dbutils.widgets.getArgument('fruits_combobox', 'Error: Cannot find fruits combobox')

# Deprecation warning: Use dbutils.widgets.text() or dbutils.widgets.dropdown() to create a widget and dbutils.widgets.get() to get its bound value.
# Out[3]: 'banana'
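Because getArgument is deprecated, its optional fallback can be reproduced around dbutils.widgets.get, which raises an error when the widget does not exist. The helper get_widget_or_default is hypothetical, and it catches Exception broadly because the exact exception type surfaced in Python is not documented here.

```python
def get_widget_or_default(dbutils, name: str, default: str) -> str:
    """Return the widget's current value, or default if no such widget exists."""
    try:
        return dbutils.widgets.get(name)
    except Exception:
        # Assumption: any failure here means the widget is not defined.
        return default
```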

multiselect command (dbutils.widgets.multiselect)

Creates and displays a multiselect widget with the specified programmatic name, default value, choices, and optional label.

dbutils.widgets.multiselect(
  name='days_multiselect',
  defaultValue='Tuesday',
  choices=['Monday', 'Tuesday', 'Wednesday', 'Thursday',
    'Friday', 'Saturday', 'Sunday'],
  label='Days of the Week'
)

print(dbutils.widgets.get("days_multiselect"))

# Tuesday
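When several days are selected, get returns them as a single comma-separated string (for example "Tuesday,Friday"). The helper selected_values is a hypothetical sketch that splits that string into a list; it assumes no choice itself contains a comma.

```python
def selected_values(dbutils, name: str) -> list:
    """Split a multiselect widget's value into a list of the selected choices."""
    raw = dbutils.widgets.get(name)
    # Empty fragments are dropped, so an empty value yields [].
    return [value for value in raw.split(",") if value]
```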

remove command (dbutils.widgets.remove)

Removes the widget with the specified programmatic name.

If you add a command to remove a widget, you cannot add a subsequent command to create a widget in the same cell. You must create the widget in another cell.

dbutils.widgets.remove('fruits_combobox')

removeAll command (dbutils.widgets.removeAll)

Removes all widgets from the notebook.

If you add a command to remove all widgets, you cannot add a subsequent command to create any widgets in the same cell. You must create the widgets in another cell.

dbutils.widgets.removeAll()

text command (dbutils.widgets.text)

Creates and displays a text widget with the specified programmatic name, default value, and optional label.

dbutils.widgets.text(
  name='your_name_text',
  defaultValue='Enter your name',
  label='Your name'
)

print(dbutils.widgets.get("your_name_text"))

# Enter your name
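Widget values always come back as strings, so a text widget meant to hold a number needs explicit conversion. The helper int_widget and the widget name batch_size_text are hypothetical; the sketch falls back to a default when the text is not an integer.

```python
def int_widget(dbutils, name: str, default: int) -> int:
    """Read a text widget as an integer, falling back to default if it is not one."""
    raw = dbutils.widgets.get(name).strip()  # tolerate surrounding whitespace
    try:
        return int(raw)
    except ValueError:
        return default
```

For example, after dbutils.widgets.text(name='batch_size_text', defaultValue='500', label='Batch size'), calling int_widget(dbutils, 'batch_size_text', 100) would return 500.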

8. Databricks Utilities API library

To accelerate application development, it can be helpful to compile, build, and test applications before you deploy them as production jobs. To enable you to compile against Databricks Utilities, Databricks provides the dbutils-api library. 

9. Limitations

Calling dbutils inside executors can produce unexpected results or errors.

If you need to run file system operations on executors, there are faster and more scalable alternatives to dbutils.

For information about executors, see Cluster Mode Overview on the Apache Spark website.