From the Starburst Galaxy navigation menu, select Data > Data maintenance.
Data maintenance jobs run tasks that improve performance and reduce storage in Apache Iceberg tables. Supported tasks include data file compaction, statistics collection, and deletion of outdated snapshots and orphaned files.
To perform data maintenance operations on live tables, see data maintenance for Kafka streaming ingestion and data maintenance for file ingestion.
The Data maintenance pane has the following levels:
To create a data maintenance job, click Create maintenance task in the top, catalog, or schema details levels.
Provide the following information in the Configure data maintenance dialog:
Maintenance task | Description |
---|---|
Compaction | Improves performance by optimizing your data file size. |
Profiling and statistics | Improves performance by analyzing the table and collecting statistics about your data. |
Snapshot expiration | Reduces storage by deleting data snapshots. |
Delete orphan files | Reduces storage by deleting orphaned data files. This rule includes files that are not part of a table. |
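For readers who want a rough idea of what each task does at the SQL level, the following is a minimal sketch that pairs each task in the table with a comparable statement from the open source Trino Iceberg connector. It is an assumption that these statements resemble what a data maintenance job runs; this page does not state how Starburst Galaxy executes the tasks. The function name `maintenance_statements`, the table name, and the `7d` retention value are hypothetical and used only for illustration.

```python
def maintenance_statements(table, retention="7d"):
    """Return one illustrative Trino SQL statement per maintenance task.

    Assumption: the mapping below is an analogy to Trino Iceberg connector
    commands, not a description of what Starburst Galaxy runs internally.
    """
    return {
        # Rewrites small data files into larger ones.
        "Compaction": f"ALTER TABLE {table} EXECUTE optimize",
        # Collects table and column statistics for the query planner.
        "Profiling and statistics": f"ANALYZE {table}",
        # Removes snapshots older than the retention threshold.
        "Snapshot expiration": (
            f"ALTER TABLE {table} EXECUTE "
            f"expire_snapshots(retention_threshold => '{retention}')"
        ),
        # Removes files that no table snapshot references.
        "Delete orphan files": (
            f"ALTER TABLE {table} EXECUTE "
            f"remove_orphan_files(retention_threshold => '{retention}')"
        ),
    }


# Hypothetical fully qualified table name.
for task, sql in maintenance_statements("example_catalog.example_schema.orders").items():
    print(f"{task}: {sql}")
```

The sketch only builds strings and does not connect to a cluster, so it is safe to run anywhere.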
In the Execution details section, select an executing role and a cluster from the respective drop-down menus.
In the Job schedule section:
For Select frequency: Choose an hourly, daily, weekly, monthly, or annual schedule from the drop-down menu. The corresponding values depend on the schedule:
hh:mm, then specify AM or PM.
hh:mm, specify AM or PM, then select one or more days of the week.
hh:mm, specify AM or PM, then select a date.
MM/DD hh:mm. Specify AM or PM.
For Enter cron expression: Enter the desired schedule in the form of a UNIX cron expression. For example, a cycle scheduled to run weekly at 9:30 AM on Monday, Wednesday, and Friday:
30 9 * * 1,3,5
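To check how the five fields of the expression above are read (minute, hour, day of month, month, day of week), the following is a small sketch of a parser for standard five-field cron syntax. It is not the parser Starburst Galaxy uses. The function names `expand_field` and `parse_cron` are hypothetical, and the sketch only handles `*`, single numbers, comma lists, and ranges, with days of the week numbered 0 (Sunday) through 6 (Saturday).

```python
DAY_NAMES = ["Sunday", "Monday", "Tuesday", "Wednesday",
             "Thursday", "Friday", "Saturday"]


def expand_field(field, low, high):
    """Expand one cron field into a sorted list of integers.

    Supports '*', single values, comma-separated lists, and 'a-b' ranges.
    Step syntax such as '*/5' is intentionally not handled in this sketch.
    """
    values = set()
    for part in field.split(","):
        if part == "*":
            start, end = low, high
        elif "-" in part:
            start_text, end_text = part.split("-")
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        if not (low <= start <= end <= high):
            raise ValueError(f"value out of range in field {field!r}")
        values.update(range(start, end + 1))
    return sorted(values)


def parse_cron(expression):
    """Split a five-field cron expression and expand each field."""
    fields = expression.split()
    if len(fields) != 5:
        raise ValueError("expected five space-separated fields")
    minute, hour, day_of_month, month, day_of_week = fields
    return {
        "minutes": expand_field(minute, 0, 59),
        "hours": expand_field(hour, 0, 23),
        "days_of_month": expand_field(day_of_month, 1, 31),
        "months": expand_field(month, 1, 12),
        "weekdays": [DAY_NAMES[d] for d in expand_field(day_of_week, 0, 6)],
    }


schedule = parse_cron("30 9 * * 1,3,5")
print(schedule["hours"], schedule["minutes"], schedule["weekdays"])
```

For `30 9 * * 1,3,5`, the minute field expands to 30, the hour field to 9, and the day-of-week field to Monday, Wednesday, and Friday, which matches the schedule described above.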
Click Save.
All scheduled data maintenance jobs are listed in the Data maintenance pane beginning at the top details level.
As with other panes in Starburst Galaxy, the top row of this pane provides catalog > schema > table breadcrumbs to show which detail level you are on. Click the names in the breadcrumb list to navigate among the levels.
The header for the catalog and schema details levels includes the symbol key, which explains the task symbols:
The Search field at the top, catalog, and schema details levels lets you restrict the list to matching values.
The Last run status drop-down menu at the schema details level lets you restrict the list to jobs that are scheduled, running, completed, or failed.
The Maintenance task drop-down menu at the catalog and schema details levels lets you restrict the list to a single task type.
The header for the top details level shows the total number of catalogs with tasks, and provides a search bar and drop-down menus that let you customize which details appear in the list of catalogs.
To view data maintenance jobs of a certain status, use the Last run status drop-down menu. Use the search bar to find catalogs.
The list of catalogs has the following columns:
Tables with maintenance: The total number of table-level jobs.
To view catalog level details, click the name of a catalog from the top details level.
Catalog level details are organized in the following columns:
Schedule: The next scheduled run time.
To view schema level details, click the name of a schema from the catalog details level. The schema level details list can include individual tables or maintenance tasks set up to run for all tables in a schema.
Schema level details have the following columns:
The Status and Tasks columns remain blank until a task has been executed at least once.
Use the options menu to edit the task or to run it now without waiting for its start time.
For more information on individual data maintenance jobs, click a table name from the schema details level.
The title of the table details level task pane is the name of the table. The top portion of the pane provides a summary of the selected data maintenance job, a Run now button, and an options menu that allows you to edit the task.
The Task history section is organized in the following columns:
Elapsed time: The duration of the data maintenance job.
All editing is performed at the schema and table details levels. The schema details level allows for bulk and individual job edits, while the table details level allows for individual job edits only.
To make bulk edits, go to the schema details level, and follow these steps:
To make individual edits:
Existing data maintenance jobs exclude tables that have separate maintenance schedules. To include these tables, delete the data maintenance job associated with them. After you delete the job, the previously excluded tables are included in the data maintenance job.
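The exclusion rule above can be pictured as a set difference. The following sketch is a conceptual model only, not a Starburst Galaxy API; the function name `tables_covered_by_schema_job` and the table names are hypothetical.

```python
def tables_covered_by_schema_job(all_tables, tables_with_own_job):
    """Return the tables a schema-wide job applies to.

    Conceptual model: a schema-wide job skips any table that has its
    own, separately scheduled maintenance job.
    """
    return sorted(set(all_tables) - set(tables_with_own_job))


schema_tables = ["orders", "customers", "lineitem"]

# "orders" has a separate maintenance schedule, so the schema job skips it.
print(tables_covered_by_schema_job(schema_tables, {"orders"}))

# After the table-level job is deleted, "orders" is included again.
print(tables_covered_by_schema_job(schema_tables, set()))
```

The first call returns `customers` and `lineitem`; the second returns all three tables.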
To delete jobs in bulk, go to the schema details level, and follow these steps:
To make individual deletes:
At the schema details level:
At the table details level: