Scheduler
Hydra Scheduler schedules tasks to run at regular intervals on your data warehouse.
Hydra Scheduler is a cron-style task scheduler that can run arbitrary SQL inside your data warehouse on any schedule. You can use tasks to move data between data sources (using foreign data wrappers), move data from row to columnar tables, perform transformations, refresh materialized views, or any other regular maintenance tasks.
Scheduler uses the standard cron syntax:
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€ min (0 - 59)
β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€ hour (0 - 23)
β”‚ β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€ day of month (1 - 31)
β”‚ β”‚ β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€ month (1 - 12)
β”‚ β”‚ β”‚ β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€ day of week (0 - 6) (0 to 6 are Sunday to
β”‚ β”‚ β”‚ β”‚ β”‚ Saturday, or use names; 7 is also Sunday)
* * * * *
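A few illustrative expressions, read directly from the diagram above:

```
30 * * * *     at minute 30 of every hour
0 6 * * 1      at 06:00 every Monday
*/15 * * * *   every 15 minutes
0 0 1 * *      at midnight on the first of each month
```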
In addition, you can use the @hourly, @daily, @weekly, @monthly, and @yearly shortcuts. These are equivalent to:
@hourly  = 0 * * * *
@daily   = 0 0 * * *
@weekly  = 0 0 * * 0
@monthly = 0 0 1 * *
@yearly  = 0 0 1 1 *
Note that the timezone used for all tasks is UTC.
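Because schedules are interpreted in UTC, convert local times before scheduling. For example, a task meant to run at 09:00 US Eastern during daylight saving time (UTC-4) would be scheduled as:

```
0 13 * * *   13:00 UTC = 09:00 EDT
```

Keep in mind that a fixed UTC schedule will drift by an hour relative to local time when daylight saving time starts or ends.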

Administering tasks
All administration of tasks occurs in the postgres database. Your Hydra user has permission to connect to this database for this purpose.
To connect to the database, connect normally, then switch to postgres using \c postgres.
$ psql service=hydra
d123456=> \c postgres
psql (13.7, server 13.6 (Ubuntu 13.6-1.pgdg18.04+1))
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, bits: 256, compression: off)
You are now connected to database "postgres" as user "u123456".
postgres=>

Creating a task
To create a task:
SELECT cron.schedule_in_database(
  'refresh-abc', -- task name
  '30 * * * *', -- schedule
  'd123456', -- database
  $$REFRESH MATERIALIZED VIEW CONCURRENTLY abc$$ -- command
);

Deleting a task
To delete a task:
SELECT cron.unschedule('refresh-abc');

Modify, enable, or disable a task
You can alter any of a task's parameters, such as its schedule, its command, and whether it is currently active.
First, obtain the job ID:
SELECT cron.jobid FROM cron.job WHERE jobname = 'refresh-abc';
You can then alter the task. NULL indicates that no change is desired. The final parameter controls whether the task is active:
SELECT cron.alter_job(
  42, -- job id
  NULL, -- schedule, NULL = do not change
  NULL, -- command, NULL = do not change
  NULL, -- database, NULL = do not change
  NULL, -- username, NULL = do not change
  false -- active: false = inactive, true = active
);
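On recent pg_cron versions, cron.alter_job's optional parameters default to NULL, so you can combine the job-ID lookup and the change into one statement using named arguments. A sketch; verify named-argument support on your version:

```sql
-- change only the schedule of 'refresh-abc', leaving everything else untouched
SELECT cron.alter_job(jobid, schedule := '45 * * * *')
FROM cron.job
WHERE jobname = 'refresh-abc';
```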

Review tasks
The cron.job table contains all current jobs and their parameters.
SELECT * FROM cron.job;

Each time a job runs, a record is placed in cron.job_run_details.
SELECT * FROM cron.job_run_details
ORDER BY start_time DESC
LIMIT 10;
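The status column of cron.job_run_details records whether each run succeeded, so you can also filter for recent failures. A sketch, assuming pg_cron's standard status values:

```sql
-- most recent failed runs, with the error captured in return_message
SELECT jobid, start_time, return_message
FROM cron.job_run_details
WHERE status = 'failed'
ORDER BY start_time DESC
LIMIT 10;
```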

Cleaning the task logs is at your discretion. You could schedule a task to remove logs after a set time:
SELECT cron.schedule_in_database(
  'clean-cron-logs',
  '0 0 * * *',
  'postgres',
  $$DELETE FROM cron.job_run_details WHERE end_time < now() - interval '7 days'$$
);

Use Cases
Any SQL you can write can be turned into a scheduled task, but it is even better to write a function that you then schedule. This allows you to write and test the function before scheduling the task. Functions can perform anything you need to do on your data warehouse.

Update materialized views
Materialized views are a great way to cache the results of expensive analytical queries for your dashboards to consume. You can write a function to refresh your materialized views periodically:
CREATE OR REPLACE FUNCTION refresh_materialize_views_hourly()
RETURNS void LANGUAGE PLPGSQL AS $function$
BEGIN
  REFRESH MATERIALIZED VIEW CONCURRENTLY abc;
  REFRESH MATERIALIZED VIEW CONCURRENTLY def;
END;
$function$;
You can then schedule this task:
SELECT cron.schedule_in_database(
  'refresh_materialize_views_hourly',
  '15 * * * *',
  'd123456',
  $$SELECT refresh_materialize_views_hourly()$$
);
⚠
Refreshing too many views concurrently may cause a very large load spike on your database. You can use pg_sleep() or multiple tasks to spread the load over time.
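One way to apply the pg_sleep() suggestion is to pause between refreshes inside the function. A sketch; the 60-second pause is an arbitrary choice you should tune to your workload:

```sql
CREATE OR REPLACE FUNCTION refresh_materialize_views_staggered()
RETURNS void LANGUAGE PLPGSQL AS $function$
BEGIN
  REFRESH MATERIALIZED VIEW CONCURRENTLY abc;
  PERFORM pg_sleep(60); -- wait 60 seconds before the next refresh
  REFRESH MATERIALIZED VIEW CONCURRENTLY def;
END;
$function$;
```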

Move data to columnar storage
You can create an hourly or daily process to move new data into columnar. This makes managing the incoming flow of data easier, as you can replicate data into row-based tables at any interval, then archive data in columnar storage once it no longer changes.
CREATE OR REPLACE FUNCTION copy_events_to_columnar()
RETURNS void LANGUAGE PLPGSQL AS $function$
BEGIN
  -- copy in new events
  INSERT INTO events_columnar
  SELECT *
  FROM events_row
  WHERE events_row.created_at > (
    SELECT MAX(created_at)
    FROM events_columnar
    -- adding this clause greatly helps performance
    WHERE created_at > NOW() - interval '1 week'
  );
  -- clear the events_row table
  TRUNCATE events_row;
END;
$function$;
⚠
This query will only work as intended if some recent data exists in events_columnar. You should do an initial data load before running this function.
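The initial load can be as simple as copying everything currently in the row table. A sketch, assuming both tables share the same columns:

```sql
-- one-time backfill before scheduling copy_events_to_columnar()
INSERT INTO events_columnar
SELECT * FROM events_row;
```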
You can then schedule the task to move this data every day:
SELECT cron.schedule_in_database(
  'copy_events_to_columnar',
  '@daily',
  'd123456',
  $$SELECT copy_events_to_columnar()$$
);
As an added bonus, you can create a view to query both tables. Because rows are removed from events_row once copied, the tables are disjoint, so UNION ALL is correct and avoids an unnecessary de-duplication pass:
CREATE VIEW events_all AS
SELECT * FROM events_columnar
UNION ALL
SELECT * FROM events_row;
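Dashboards can then query events_all without caring where each row lives. For example, assuming the created_at column used in the function above:

```sql
-- daily event counts across both row and columnar storage
SELECT date_trunc('day', created_at) AS day, count(*)
FROM events_all
GROUP BY 1
ORDER BY 1;
```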