VACUUM

VACUUM table_name [RETAIN num HOURS] [DRY RUN]

Parameters

  • table_name – Identifies an existing Delta table. The name must not include a temporal specification.
  • RETAIN num HOURS – The retention threshold.
  • DRY RUN – Return a list of up to 1000 files to be deleted.

แสดงรายชื่อไฟล์ที่จะถูกลบ

VACUUM table_name RETAIN 720 HOURS DRY RUN

ทำการลบไฟล์

VACUUM table_name RETAIN 720 HOURS
1 day24 HOURS
1 week168 HOURS
2 week336 HOURS
30 days720 HOURS

Configure data retention for time travel

To time travel to a previous version, you must retain both the log and the data files for that version.

The data files backing a Delta table are never deleted automatically; data files are deleted only when you run VACUUMVACUUM does not delete Delta log files; log files are automatically cleaned up after checkpoints are written.

By default you can time travel to a Delta table up to 30 days old unless you have:

  • Run VACUUM on your Delta table.
  • Changed the data or log file retention periods using the following table properties:
    • delta.logRetentionDuration = "interval <interval>": controls how long the history for a table is kept. The default is interval 30 days.Each time a checkpoint is written, Databricks automatically cleans up log entries older than the retention interval. If you set this config to a large enough value, many log entries are retained. This should not impact performance as operations against the log are constant time. Operations on history are parallel but will become more expensive as the log size increases.
    • delta.deletedFileRetentionDuration = "interval <interval>": controls how long ago a file must have been deleted before being a candidate for VACUUM. The default is interval 7 days.To access 30 days of historical data even if you run VACUUM on the Delta table, set delta.deletedFileRetentionDuration = "interval 30 days". This setting may cause your storage costs to go up.

ถ้ารัน VACUUM เลย data files ที่เกิน 7 วันจะถูกลบ

ถ้าจะเก็บ data files ให้มากกว่า 7 วัน โดยไม่ต้องมาคอยกำหนดค่า RETAIN num HOURS ให้ไป SET delta.deletedFileRetentionDuration ก่อน แล้วค่อยรัน VACUUM

SET and UNSET TBLPROPERTIES

ALTER TABLE table_name
   { RENAME TO clause |
     ADD COLUMN clause |
     ALTER COLUMN clause |
     DROP COLUMN clause |
     RENAME COLUMN clause |
     ADD CONSTRAINT clause |
     DROP CONSTRAINT clause |
     ADD PARTITION clause |
     DROP PARTITION clause |
     RENAME PARTITION clause |
     RECOVER PARTITIONS clause |
     SET TBLPROPERTIES clause |
     UNSET TBLPROPERTIES clause |
     SET SERDE clause |
     SET LOCATION clause |
     SET OWNER TO clause }

Example

ALTER TABLE dbx.tab1 SET TBLPROPERTIES ('delta.logRetentionDuration' = 'interval 30 days');
ALTER TABLE dbx.tab1 SET TBLPROPERTIES ('delta.deletedFileRetentionDuration' = 'interval 30 days');

ALTER TABLE dbx.tab1 UNSET TBLPROPERTIES ('delta.deletedFileRetentionDuration');