Optimizes the layout of Delta Lake data. You can optionally optimize only a subset of the data or colocate data by column. If you do not specify colocation with ZORDER BY, bin-packing optimization is performed.
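To give a feel for what bin-packing means here, the following is a minimal Python sketch that groups small files into larger output files using a first-fit-decreasing heuristic. It is an illustration only: the function name `plan_compaction`, the `target_mb` threshold, and the choice of heuristic are assumptions made for this example and are not taken from Delta Lake's implementation.

```python
def plan_compaction(file_sizes_mb, target_mb=1024):
    """Group file sizes into bins whose totals stay within target_mb.

    Hypothetical helper: first-fit decreasing, not Delta Lake's actual algorithm.
    A single file larger than target_mb ends up alone in its own bin.
    """
    bins = []  # each bin lists the input files that would be rewritten as one output file
    for size in sorted(file_sizes_mb, reverse=True):
        for candidate in bins:
            if sum(candidate) + size <= target_mb:
                candidate.append(size)
                break
        else:
            # No existing bin has room, so start a new output file.
            bins.append([size])
    return bins


small_files = [100, 200, 300, 400, 500, 600]  # sizes in MB, made up for the example
plan = plan_compaction(small_files, target_mb=1024)
print(len(small_files), "input files ->", len(plan), "output files")
```

With these made-up sizes the six input files are planned into three output files, each at or below the assumed target size.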
Syntax
OPTIMIZE table_name [WHERE predicate] [ZORDER BY (col_name1 [, ...] ) ]
Parameters
- table_name – Identifies an existing Delta table. The name must not include a temporal specification.
- WHERE – Optimize the subset of rows matching the given partition predicate. Only filters involving partition key attributes are supported.
- ZORDER BY – Colocate column information in the same set of files. Co-locality is used by Delta Lake data-skipping algorithms to dramatically reduce the amount of data that needs to be read. You can specify multiple columns for ZORDER BY as a comma-separated list. However, the effectiveness of the locality drops with each additional column.
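The link between co-locality and data skipping can be sketched in a few lines of Python. The example below interleaves the bits of two columns into a Morton code (one common way to build a Z-order curve), splits the sorted rows into fixed-size "files", and counts how many files a min/max check would still have to read for an equality filter. The names `z_value` and `files_read`, the 8x8 grid, and the four-rows-per-file layout are assumptions for illustration; the source does not describe how Delta Lake computes its ordering.

```python
def z_value(coords, bits=3):
    """Interleave the low `bits` bits of each coordinate into one Morton code."""
    z = 0
    n = len(coords)
    for bit in range(bits):
        for i, c in enumerate(coords):
            z |= ((c >> bit) & 1) << (bit * n + i)
    return z


def files_read(rows, col, value, rows_per_file=4):
    """Count files whose min/max range on column `col` could contain `value`."""
    files = [rows[i:i + rows_per_file] for i in range(0, len(rows), rows_per_file)]
    hits = 0
    for f in files:
        vals = [r[col] for r in f]
        if min(vals) <= value <= max(vals):
            hits += 1
    return hits


# 64 rows with two columns (x, y), each ranging over 0..7; 16 files of 4 rows.
points = [(x, y) for x in range(8) for y in range(8)]
linear = sorted(points)                                    # ordered by x, then y
zordered = sorted(points, key=lambda p: z_value(p, bits=3))

for name, layout in (("linear", linear), ("z-order", zordered)):
    print(name, "x == 5:", files_read(layout, 0, 5), "y == 5:", files_read(layout, 1, 5))
```

In this toy layout the linear ordering reads 2 of 16 files for a filter on the first column but 8 for a filter on the second, while the Z-ordered layout reads 4 files for either filter. The locality is shared between the columns, which is consistent with the note that effectiveness drops as more columns are added.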
Examples
OPTIMIZE delta.`/data/events`
OPTIMIZE events
OPTIMIZE events WHERE date >= '2022-11-18'
OPTIMIZE events WHERE date >= current_timestamp() - INTERVAL 1 day ZORDER BY (eventType)
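If you generate these statements from a script, a small helper that follows the syntax shown on this page may be convenient. The sketch below is a hypothetical Python function, `build_optimize`, invented for this example; it only assembles the statement text, performs no escaping or validation, and does not execute anything.

```python
def build_optimize(table_name, where=None, zorder_by=()):
    """Assemble an OPTIMIZE statement: table name, optional WHERE, optional ZORDER BY.

    Hypothetical helper for illustration; inputs are inserted as-is, so only
    pass trusted identifiers and predicates.
    """
    parts = [f"OPTIMIZE {table_name}"]
    if where:
        parts.append(f"WHERE {where}")
    if zorder_by:
        parts.append(f"ZORDER BY ({', '.join(zorder_by)})")
    return " ".join(parts)


print(build_optimize("events"))
print(build_optimize("events", where="date >= '2022-11-18'"))
print(build_optimize(
    "events",
    where="date >= current_timestamp() - INTERVAL 1 day",
    zorder_by=["eventType"],
))
```

The three calls print the last three statements from the examples above.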