- List available utilities
- Data utility (dbutils.data)
- File system utility (dbutils.fs)
- Library utility (dbutils.library)
- Notebook utility (dbutils.notebook)
- Secrets utility (dbutils.secrets)
- Widgets utility (dbutils.widgets)
- Databricks Utilities API library
- Limitations
Databricks Utilities – dbutils
- You can access the DBFS through the Databricks Utilities class (and other file IO routines).
- An instance of DBUtils is already declared for us as dbutils.
- For in-notebook documentation on DBUtils you can execute the command dbutils.help().
1. List available utilities
dbutils.help()
List available commands for a utility
dbutils.help?
dbutils.fs.help()
Display help for a command
dbutils.fs.help('cp')
2. Data utility (dbutils.data)
Commands: summarize
dbutils.data.help()
summarize command (dbutils.data.summarize)
Calculates and displays summary statistics of an Apache Spark DataFrame or pandas DataFrame. This command is available for Python, Scala and R.
df = spark.read.format('csv').load(
    '/databricks-datasets/Rdatasets/data-001/csv/ggplot2/diamonds.csv',
    header=True,
    inferSchema=True
)
dbutils.data.summarize(df)
Note that the visualization uses SI notation to concisely render numerical values smaller than 0.01 or larger than 10000.
3. File system utility (dbutils.fs)
Commands: cp, head, ls, mkdirs, mount, mounts, mv, put, refreshMounts, rm, unmount, updateMount
cp command (dbutils.fs.cp)
Copies a file or directory, possibly across filesystems.
dbutils.fs.cp("/FileStore/old_file.txt", "/tmp/new/new_file.txt")
head command (dbutils.fs.head)
Returns up to the specified maximum number of bytes of the given file. The bytes are returned as a UTF-8 encoded string.
dbutils.fs.head("/tmp/my_file.txt", 25)
ls command (dbutils.fs.ls)
Lists the contents of a directory.
dbutils.fs.ls("/tmp")
files = dbutils.fs.ls("/mnt/training/")
for fileInfo in files:
    print(fileInfo.path)
print("-"*80)
display(..)
Besides printing each item returned from dbutils.fs.ls(..), we can also pass that collection to another Databricks-specific command called display(..).
files = dbutils.fs.ls("/mnt/training/")
display(files)
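Because dbutils.fs.ls lists only one directory level, a recursive walk is sometimes useful. The following is a minimal sketch, not part of the Databricks API: iter_files is a hypothetical helper name, dbutils is passed in as a parameter so the function can also be exercised outside a notebook, and it assumes that directory entries are reported with a trailing "/" in their name.

```python
from typing import Iterator


def iter_files(dbutils, root: str) -> Iterator[str]:
    """Yield the path of every file under `root`, descending into subdirectories.

    `iter_files` is a hypothetical helper, not a dbutils command.
    """
    for info in dbutils.fs.ls(root):
        # Assumption: directory entries carry a trailing "/" in their name.
        if info.name.endswith("/"):
            yield from iter_files(dbutils, info.path)
        else:
            yield info.path
```

In a notebook this could be called as iter_files(dbutils, "/mnt/training/") and the resulting paths printed in a loop. For very large directory trees a driver-side recursive walk like this may be slow.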
mkdirs command (dbutils.fs.mkdirs)
Creates the given directory if it does not exist. Also creates any necessary parent directories.
dbutils.fs.mkdirs("/tmp/parent/child/grandchild")
mount command (dbutils.fs.mount)
Mounts the specified source directory into DBFS at the specified mount point.
dbutils.fs.mount(
    source = "wasbs://<container-name>@<storage-account-name>.blob.core.windows.net",
    mount_point = "/mnt/<mount-name>",
    extra_configs = {"<conf-key>": dbutils.secrets.get(scope = "<scope-name>", key = "<key-name>")}
)
mounts command (dbutils.fs.mounts)
Displays information about what is currently mounted within DBFS.
dbutils.fs.mounts()
mounts = dbutils.fs.mounts()
for mount in mounts:
    print(mount.mountPoint + " >> " + mount.source)
print("-"*80)
mv command (dbutils.fs.mv)
Moves a file or directory, possibly across filesystems. A move is a copy followed by a delete, even for moves within filesystems.
dbutils.fs.mv("/FileStore/my_file.txt", "/tmp/parent/child/grandchild")
put command (dbutils.fs.put)
Writes the specified string to a file. The string is UTF-8 encoded.
dbutils.fs.put("/tmp/hello_db.txt", "Hello, Databricks!", True)
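The third argument in the put call above is the overwrite flag. As a rough illustration of how put and head fit together, the sketch below writes a string and reads back the first few bytes. write_and_preview is a hypothetical helper name, and dbutils is passed in as a parameter so the function can be tested with a stand-in object.

```python
def write_and_preview(dbutils, path: str, text: str, preview_bytes: int = 25) -> str:
    """Write `text` to `path`, replacing any existing file, then return a short preview.

    `write_and_preview` is a hypothetical helper, not a dbutils command.
    """
    # Third positional argument: overwrite the file if it already exists.
    dbutils.fs.put(path, text, True)
    # head returns at most `preview_bytes` bytes, decoded as a UTF-8 string.
    return dbutils.fs.head(path, preview_bytes)
```

In a notebook, write_and_preview(dbutils, "/tmp/hello_db.txt", "Hello, Databricks!") would write the file and return its leading bytes.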
refreshMounts command (dbutils.fs.refreshMounts)
Forces all machines in the cluster to refresh their mount cache, ensuring they receive the most recent information.
dbutils.fs.refreshMounts()
rm command (dbutils.fs.rm)
Removes a file or directory.
dbutils.fs.rm("/tmp/hello_db.txt")
unmount command (dbutils.fs.unmount)
Deletes a DBFS mount point.
dbutils.fs.unmount("/mnt/<mount-name>")
updateMount command (dbutils.fs.updateMount)
Similar to the dbutils.fs.mount command, but updates an existing mount point instead of creating a new one. Returns an error if the mount point is not present.
dbutils.fs.updateMount(
    source = "wasbs://<container-name>@<storage-account-name>.blob.core.windows.net",
    mount_point = "/mnt/<mount-name>",
    extra_configs = {"<conf-key>": dbutils.secrets.get(scope = "<scope-name>", key = "<key-name>")}
)
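Since updateMount returns an error when the mount point is missing, and mount is meant for creating a new one, it can be convenient to choose between them based on the output of mounts(). The following is one possible sketch. mount_or_update is a hypothetical helper name, and dbutils is passed in explicitly so the logic can be checked with a stand-in object.

```python
def mount_or_update(dbutils, source: str, mount_point: str, extra_configs: dict) -> str:
    """Mount `source` at `mount_point`, or update the mount if it already exists.

    `mount_or_update` is a hypothetical helper, not a dbutils command.
    Returns "mounted" or "updated" so callers can log which path was taken.
    """
    existing = {m.mountPoint for m in dbutils.fs.mounts()}
    if mount_point in existing:
        dbutils.fs.updateMount(
            source=source, mount_point=mount_point, extra_configs=extra_configs
        )
        action = "updated"
    else:
        dbutils.fs.mount(
            source=source, mount_point=mount_point, extra_configs=extra_configs
        )
        action = "mounted"
    # Ask the other machines in the cluster to refresh their mount cache.
    dbutils.fs.refreshMounts()
    return action
```

The source, mount point, and configuration values are the same placeholders used in the mount example and must be filled in for a real storage account.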
4. Library utility (dbutils.library)
The library utility is deprecated.
Utilities for session isolated libraries
Commands: install, installPyPI, list, restartPython, updateCondaEnv
5. Notebook utility (dbutils.notebook)
The notebook utility allows you to chain together notebooks and act on their results. See Notebook workflows.
dbutils.notebook.help()
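As an illustration of chaining notebooks and acting on their results, the sketch below wraps dbutils.notebook.run in a simple retry loop. run_with_retry and its max_retries parameter are hypothetical names chosen for this example, and dbutils is passed in as a parameter so the function can be tested with a stand-in. The value returned is whatever the child notebook passes to dbutils.notebook.exit.

```python
from typing import Dict, Optional


def run_with_retry(
    dbutils,
    notebook_path: str,
    timeout_seconds: int,
    arguments: Optional[Dict[str, str]] = None,
    max_retries: int = 3,
) -> str:
    """Run a notebook, retrying up to `max_retries` times after a failure.

    `run_with_retry` is a hypothetical helper, not a dbutils command.
    """
    attempts = 0
    while True:
        try:
            return dbutils.notebook.run(notebook_path, timeout_seconds, arguments or {})
        except Exception:
            attempts += 1
            if attempts > max_retries:
                # Give up and surface the last error to the caller.
                raise
```

Retrying blindly is only appropriate when the child notebook is safe to run more than once.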
6. Secrets utility (dbutils.secrets)
Commands: get, getBytes, list, listScopes
The secrets utility allows you to store and access sensitive credential information without making it visible in notebooks. See Secret management and Use the secrets in a notebook. To list the available commands, run dbutils.secrets.help().
dbutils.secrets.help()
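The listScopes, list, and get commands can be combined to check that a secret exists before reading it. The following is a minimal sketch under two assumptions: that scope entries expose a name attribute and secret metadata entries expose a key attribute. get_secret_or_none is a hypothetical helper name, and dbutils is passed in so the logic can be tested with a stand-in.

```python
from typing import Optional


def get_secret_or_none(dbutils, scope: str, key: str) -> Optional[str]:
    """Return the secret value, or None if the scope or key is not listed.

    `get_secret_or_none` is a hypothetical helper, not a dbutils command.
    """
    # Assumption: each scope entry has a `name` attribute.
    scope_names = {s.name for s in dbutils.secrets.listScopes()}
    if scope not in scope_names:
        return None
    # Assumption: each secret metadata entry has a `key` attribute.
    key_names = {m.key for m in dbutils.secrets.list(scope)}
    if key not in key_names:
        return None
    return dbutils.secrets.get(scope=scope, key=key)
```

Avoid printing or logging the returned value, since the purpose of the utility is to keep credentials out of notebook output.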
7. Widgets utility (dbutils.widgets)
Commands: combobox, dropdown, get, getArgument, multiselect, remove, removeAll, text
The widgets utility allows you to parameterize notebooks. See Widgets. To list the available commands, run dbutils.widgets.help().
combobox command (dbutils.widgets.combobox)
Creates and displays a combobox widget with the specified programmatic name, default value, choices, and optional label.
dbutils.widgets.combobox(
    name='fruits_combobox',
    defaultValue='banana',
    choices=['apple', 'banana', 'coconut', 'dragon fruit'],
    label='Fruits'
)
print(dbutils.widgets.get("fruits_combobox"))

# banana
dropdown command (dbutils.widgets.dropdown)
Creates and displays a dropdown widget with the specified programmatic name, default value, choices, and optional label.
dbutils.widgets.dropdown(
    name='toys_dropdown',
    defaultValue='basketball',
    choices=['alphabet blocks', 'basketball', 'cape', 'doll'],
    label='Toys'
)
print(dbutils.widgets.get("toys_dropdown"))

# basketball
get command (dbutils.widgets.get)
Gets the current value of the widget with the specified programmatic name.
dbutils.widgets.get('fruits_combobox') # banana
getArgument command (dbutils.widgets.getArgument)
Gets the current value of the widget with the specified programmatic name. If the widget does not exist, an optional message can be returned.
This command is deprecated. Use dbutils.widgets.get instead.
dbutils.widgets.getArgument('fruits_combobox', 'Error: Cannot find fruits combobox')

# Deprecation warning: Use dbutils.widgets.text() or dbutils.widgets.dropdown() to create a widget and dbutils.widgets.get() to get its bound value.
# Out[3]: 'banana'
multiselect command (dbutils.widgets.multiselect)
Creates and displays a multiselect widget with the specified programmatic name, default value, choices, and optional label.
dbutils.widgets.multiselect(
    name='days_multiselect',
    defaultValue='Tuesday',
    choices=['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday'],
    label='Days of the Week'
)
print(dbutils.widgets.get("days_multiselect"))

# Tuesday
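When more than one value is selected in a multiselect widget, dbutils.widgets.get returns a single string. The sketch below assumes the selected values are joined with commas and splits them back into a list. selected_values is a hypothetical helper name, and dbutils is passed in so the function can be tested with a stand-in.

```python
from typing import List


def selected_values(dbutils, name: str) -> List[str]:
    """Return the current selections of a multiselect widget as a list.

    `selected_values` is a hypothetical helper, not a dbutils command.
    """
    raw = dbutils.widgets.get(name)
    # Assumption: selections are joined with commas in the returned string.
    return [value for value in raw.split(",") if value]
```

A choice that itself contains a comma cannot be told apart from two separate selections with this approach, so it is best suited to simple choice labels such as the day names above.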
remove command (dbutils.widgets.remove)
Removes the widget with the specified programmatic name.
If you add a command to remove a widget, you cannot add a subsequent command to create a widget in the same cell. You must create the widget in another cell.
dbutils.widgets.remove('fruits_combobox')
removeAll command (dbutils.widgets.removeAll)
Removes all widgets from the notebook.
If you add a command to remove all widgets, you cannot add a subsequent command to create any widgets in the same cell. You must create the widgets in another cell.
dbutils.widgets.removeAll()
text command (dbutils.widgets.text)
Creates and displays a text widget with the specified programmatic name, default value, and optional label.
dbutils.widgets.text(
    name='your_name_text',
    defaultValue='Enter your name',
    label='Your name'
)
print(dbutils.widgets.get("your_name_text"))

# Enter your name
8. Databricks Utilities API library
To accelerate application development, it can be helpful to compile, build, and test applications before you deploy them as production jobs. To enable you to compile against Databricks Utilities, Databricks provides the dbutils-api library.
9. Limitations
Calling dbutils inside of executors can produce unexpected results or potentially result in errors.
If you need to run file system operations on executors using dbutils, there are several faster and more scalable alternatives available:
- For file copy or move operations, see the faster approach described in Parallelize filesystem operations.
- For file system list and delete operations, see the Spark-based parallel listing and delete methods in How to list and delete files faster in Databricks.
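Separately from the approaches in the linked articles, one way to avoid calling dbutils on executors is to keep the calls on the driver and overlap them with a thread pool. The sketch below is not taken from those articles. copy_many and max_workers are hypothetical names, and it assumes that concurrent dbutils.fs.cp calls from driver threads are safe in your environment.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable, List, Tuple


def copy_many(dbutils, pairs: Iterable[Tuple[str, str]], max_workers: int = 8) -> List[bool]:
    """Copy each (source, destination) pair from the driver using a thread pool.

    `copy_many` is a hypothetical helper, not a dbutils command.
    Assumption: dbutils.fs.cp may be called concurrently from driver threads.
    """
    def copy_one(pair: Tuple[str, str]) -> bool:
        source, destination = pair
        return dbutils.fs.cp(source, destination)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order and re-raises the first failure.
        return list(pool.map(copy_one, pairs))
```

This keeps all dbutils calls on the driver, so it does not scale with cluster size. For very large numbers of files, the Spark-based methods referenced above are likely the better fit.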
For information about executors, see Cluster Mode Overview on the Apache Spark website.