Databricks CLI

CLI

Install the CLI

pip install databricks-cli

Update the CLI

pip install databricks-cli --upgrade

Set up authentication

databricks configure --token

After you complete the prompts, your access credentials are stored in the file ~/.databrickscfg on Unix, Linux, or macOS, or %USERPROFILE%\.databrickscfg on Windows. The file contains a default profile entry:

[DEFAULT]
host = <workspace-URL>
token = <personal-access-token>
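The file can also hold additional named profiles that you select with `--profile`. A sketch of what the file might look like with a second profile named `test` (matching the `--profile test` examples below); all values are placeholders:

```ini
[DEFAULT]
host = <workspace-URL>
token = <personal-access-token>

[test]
host = <test-workspace-URL>
token = <test-personal-access-token>
```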

Use the CLI

$ databricks --help
Usage: databricks [OPTIONS] COMMAND [ARGS]...

Options:
  -v, --version   x.xx.x
  --debug         Debug Mode. Shows full stack trace on error.
  --profile TEXT  CLI connection profile to use.
                  The default profile is "DEFAULT".

  -h, --help      Show this message and exit.

Commands:
  clusters        Utility to interact with Databricks clusters.
  configure       Configures host and authentication info for the CLI.
  fs              Utility to interact with DBFS.
  groups          Utility to interact with Databricks groups.
  instance-pools  Utility to interact with Databricks instance pools.
  jobs            Utility to interact with jobs.
  libraries       Utility to interact with libraries.
  pipelines       Utility to interact with the Databricks Delta Pipelines.
  runs            Utility to interact with the jobs runs.
  secrets         Utility to interact with Databricks secret API.
  stack           [Beta] Utility to deploy and download Databricks resource
                  stacks.

  workspace       Utility to interact with the Databricks workspace.
databricks jobs list --profile test
databricks jobs list --profile test --output JSON | jq '.jobs[] | select(.job_id == 123) | .settings'
databricks clusters list --profile test
databricks clusters list --profile test --output JSON | jq '[ .clusters[] | { name: .cluster_name, id: .cluster_id } ]'
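If jq is not available, the same filtering can be done in Python. A minimal sketch, assuming the JSON shape shown by `--output JSON` (a top-level `clusters` array); the sample data below is illustrative, not real workspace output:

```python
import json

# Sample data shaped like `databricks clusters list --output JSON` would print.
raw = """
{
  "clusters": [
    {"cluster_name": "etl", "cluster_id": "1234-567890-abcd123", "state": "RUNNING"},
    {"cluster_name": "adhoc", "cluster_id": "2345-678901-bcde234", "state": "TERMINATED"}
  ]
}
"""

data = json.loads(raw)

# Equivalent of: jq '[ .clusters[] | { name: .cluster_name, id: .cluster_id } ]'
summary = [{"name": c["cluster_name"], "id": c["cluster_id"]} for c in data["clusters"]]
print(json.dumps(summary, indent=2))
```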

JSON string parameters

databricks jobs run-now --job-id 9 --jar-params '["20180505", "alantest"]'
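Because `--jar-params` takes a whole JSON array as a single string argument, hand-quoting it in the shell is error-prone. A sketch of building the same invocation from Python, where `json.dumps` handles the escaping (the job ID 9 and parameter values are just the ones from the example above):

```python
import json
import shlex

# Build the --jar-params value as a JSON array string; json.dumps
# takes care of escaping any embedded quotes in the values.
jar_params = json.dumps(["20180505", "alantest"])

# Passing the list form to subprocess.run would avoid shell quoting entirely;
# shlex.join here just renders the equivalent shell command for display.
cmd = ["databricks", "jobs", "run-now", "--job-id", "9", "--jar-params", jar_params]
print(shlex.join(cmd))
```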

Runs CLI

Requirements to call the Jobs REST API 2.0

1. Update the CLI to version 0.16.0 or above.

2. Run the command:

databricks jobs configure --version=2.0

This adds the setting jobs-api-version = 2.0 to the file ~/.databrickscfg on Unix, Linux, or macOS, or %USERPROFILE%\.databrickscfg on Windows. All job runs CLI (and jobs CLI) subcommands will call the Jobs REST API 2.0 by default.
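A sketch of the resulting DEFAULT profile entry after the command (values are placeholders):

```ini
[DEFAULT]
host = <workspace-URL>
token = <personal-access-token>
jobs-api-version = 2.0
```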

Subcommands and general usage

$ databricks runs --help
Usage: databricks runs [OPTIONS] COMMAND [ARGS]...

  Utility to interact with jobs runs.

Options:
  -v, --version   0.11.0
  --debug         Debug Mode. Shows full stack trace on error.
  --profile TEXT  CLI connection profile to use. The default profile is
                  "DEFAULT".

  -h, --help      Show this message and exit.

Commands:
  cancel      Cancels the run specified.
  get         Gets the metadata about a run in json form.
  get-output  Gets the output of a run The output schema is documented...
  list        Lists job runs.
  submit      Submits a one-time run.

Get the output of a run

databricks runs get-output --run-id 119
{
  "metadata": {
    "job_id": 239,
    "run_id": 119,
    "number_in_job": 1,
    "original_attempt_run_id": 119,
    "state": {
      "life_cycle_state": "TERMINATED",
      "result_state": "SUCCESS",
      "state_message": ""
    },
    "task": {
      "notebook_task": {
        "notebook_path": "/Users/someone@example.com/notebooks/my-notebook.ipynb"
      }
    },
    "cluster_spec": {
      "new_cluster": {
        "spark_version": "8.1.x-scala2.12",
        "aws_attributes": {
          "zone_id": "us-west-2c",
          "availability": "SPOT_WITH_FALLBACK"
        },
        "node_type_id": "m5d.large",
        "enable_elastic_disk": false,
        "num_workers": 1
      }
    },
    "cluster_instance": {
      "cluster_id": "1234-567890-abcd123",
      "spark_context_id": "1234567890123456789"
    },
    "start_time": 1618510327335,
    "setup_duration": 191000,
    "execution_duration": 41000,
    "cleanup_duration": 2000,
    "end_time": 1618510561615,
    "trigger": "ONE_TIME",
    "creator_user_name": "someone@example.com",
    "run_name": "my-notebook-run",
    "run_page_url": "https://dbc-a1b2345c-d6e7.cloud.databricks.com/?o=1234567890123456#job/239/run/1",
    "run_type": "JOB_RUN",
    "attempt_number": 0
  },
  "notebook_output": {}
}
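The timing fields in the metadata are Unix epoch milliseconds (`start_time`, `end_time`) and per-phase durations in milliseconds. A small sketch that checks them against the sample output above:

```python
# Timing fields copied from the sample `runs get-output` metadata above.
metadata = {
    "start_time": 1618510327335,
    "setup_duration": 191000,
    "execution_duration": 41000,
    "cleanup_duration": 2000,
    "end_time": 1618510561615,
}

# Total wall-clock time of the run, in milliseconds.
wall_clock_ms = metadata["end_time"] - metadata["start_time"]

# Sum of the reported per-phase durations; the phases are rounded,
# so this is close to (but not exactly) the wall-clock time.
phase_total_ms = (
    metadata["setup_duration"]
    + metadata["execution_duration"]
    + metadata["cleanup_duration"]
)

print(wall_clock_ms, phase_total_ms)
```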

The value of notebook_output comes from the dbutils.notebook.exit() call in the notebook. For details, see "Jobs API 2.0" in the Databricks on AWS documentation.
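A sketch of returning a value this way. Note that dbutils is predefined only inside a Databricks notebook, and the payload below is hypothetical; the point is that whatever string is passed to dbutils.notebook.exit() shows up as the run's notebook output:

```python
import json

# Hypothetical payload; any string (up to the API's size limit) can be returned.
payload = json.dumps({"status": "OK", "rows_processed": 42})

# Inside a Databricks notebook, ending with this call makes the string
# available in the run's notebook_output:
# dbutils.notebook.exit(payload)
print(payload)
```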