OPTIMIZE

Optimizes the layout of Delta Lake data. Optionally, you can optimize only a subset of the data or colocate data by column. If you do not specify colocation (ZORDER BY), bin-packing optimization is performed, which compacts small files into larger ones.
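To make the idea of bin-packing concrete, here is a conceptual sketch. It is not Delta Lake's implementation. It assumes a first-fit-decreasing heuristic, a hypothetical `DataFile` record, and a caller-chosen `target_size`. The actual file-selection rules and target file size are determined by Delta Lake and its configuration.

```python
from dataclasses import dataclass


@dataclass
class DataFile:
    """Hypothetical record for one data file (size in any consistent unit)."""
    path: str
    size: int


def plan_bin_packing(files, target_size):
    """Group small files into bins whose total size is at most target_size.

    Uses first-fit decreasing. Files already at or above target_size are
    left alone, and only bins that merge two or more files are returned,
    because rewriting a single file on its own gains nothing.
    """
    small = sorted(
        (f for f in files if f.size < target_size),
        key=lambda f: f.size,
        reverse=True,
    )
    bins = []  # each entry is [total_size, list_of_files]
    for f in small:
        for b in bins:
            if b[0] + f.size <= target_size:
                b[0] += f.size
                b[1].append(f)
                break
        else:
            # No existing bin has room for this file: start a new bin.
            bins.append([f.size, [f]])
    return [b[1] for b in bins if len(b[1]) > 1]


# Sizes in MB for readability; "f" (150 MB) is already above the target.
files = [DataFile("a", 10), DataFile("b", 20), DataFile("c", 30),
         DataFile("d", 40), DataFile("e", 90), DataFile("f", 150)]
for group in plan_bin_packing(files, target_size=100):
    print([f.path for f in group])
```

Each returned group stands for a set of small files that would be rewritten together as one larger file. Here that yields two groups, `e` + `a` and `d` + `c` + `b`.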

Syntax

OPTIMIZE table_name [WHERE predicate]
  [ZORDER BY (col_name1 [, ...] ) ]

Parameters

  • table_name – Identifies an existing Delta table. The name must not include a temporal specification.
  • WHERE – Optimizes only the subset of rows matching the given partition predicate. Only filters on partition columns are supported.
  • ZORDER BY – Colocates related values of the specified columns in the same set of files. Delta Lake data-skipping algorithms use this co-locality to reduce the amount of data that needs to be read. You can specify multiple columns for ZORDER BY as a comma-separated list; however, the effectiveness of the locality drops with each additional column.
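The following minimal sketch illustrates why colocating values helps data skipping. It assumes two non-negative integer columns, a hypothetical fixed number of rows per file, and per-file min/max statistics. It orders rows with a Morton (bit-interleaving) Z-order curve. This is an illustration of the general technique, not Delta Lake's implementation.

```python
def z_value(x, y, bits=16):
    """Morton (Z-order) value: interleave the bits of non-negative x and y."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)      # x bits go to even positions
        z |= ((y >> i) & 1) << (2 * i + 1)  # y bits go to odd positions
    return z


def write_files(rows, rows_per_file, key):
    """Sort rows by key and cut them into fixed-size 'files'."""
    ordered = sorted(rows, key=key)
    return [ordered[i:i + rows_per_file]
            for i in range(0, len(ordered), rows_per_file)]


def min_max_stats(files):
    """Per-file (min, max) for each column: the statistics data skipping reads."""
    return [[(min(col), max(col)) for col in zip(*rows)] for rows in files]


def files_to_read(stats, col, value):
    """Indices of files whose [min, max] range for col could contain value."""
    return [i for i, s in enumerate(stats) if s[col][0] <= value <= s[col][1]]


rows = [(x, y) for x in range(4) for y in range(4)]  # a 4 x 4 grid of (x, y)

zorder = min_max_stats(write_files(rows, 4, key=lambda r: z_value(*r)))
by_y = min_max_stats(write_files(rows, 4, key=lambda r: (r[1], r[0])))

print(files_to_read(zorder, 0, 3), files_to_read(zorder, 1, 3))
print(files_to_read(by_y, 0, 3), files_to_read(by_y, 1, 3))
```

With the Z-order layout, an equality filter on either column reads 2 of the 4 files. With the layout sorted by `y` alone, a filter on `y` reads 1 file, but a filter on `x` cannot skip any file. Interleaving more columns spreads each column's values across more files, which is one way to see why locality weakens with each additional ZORDER BY column.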

Examples

OPTIMIZE delta.`/data/events`

OPTIMIZE events

OPTIMIZE events WHERE date >= '2022-11-18'

OPTIMIZE events
WHERE date >= current_timestamp() - INTERVAL 1 day
ZORDER BY (eventType)