Troubleshooting the Apache Airflow Scheduler: DAG Not Triggered at Scheduled Time

Upsolver Team
Streaming Data
December 25, 2022

It can be frustrating when the scheduler fails to trigger DAGs to run at the scheduled time, disrupting your workflows. This article will provide examples of why DAGs may not be triggered, how to fix this issue, and introduce a tool called SQLake for simplifying data pipeline orchestration.

Some common reasons a DAG is not triggered at its scheduled time are:

1. Airflow Runs Jobs at The End of An Interval

One possible cause is the start date of the DAG. In the code below, the start date (presumably supplied through default_args, which is not shown) is set dynamically to the current date using the time module. However, Airflow runs a job at the end of its schedule interval, not the beginning. This means that the first run of the DAG happens only after one full interval has elapsed from the start date, rather than at the start date itself.

dag = DAG(
    'run_job',
    default_args=default_args,
    catchup=False,
)

To solve this problem, you can either hard-code a static start date for the DAG or make sure that the dynamic start date is far enough in the past that at least one full schedule interval has already elapsed.

It is generally recommended to use static start dates to have more control over when the DAG is run, especially if you need to re-run jobs or backfill data.
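
To see why a dynamic start date can keep a DAG from triggering, the interval-end rule can be sketched in plain Python. This is a simplified model, not Airflow's scheduler code: the function names first_run_due_at and is_due, the hourly interval, and the dates are all assumptions made for illustration.

```python
from datetime import datetime, timedelta


def first_run_due_at(start_date, interval):
    """Return the earliest moment the first run can trigger.

    Simplified model: the run covering [start_date, start_date + interval)
    is triggered only once that interval has ended.
    """
    return start_date + interval


def is_due(start_date, interval, now):
    """Return True if the first interval has fully elapsed at time `now`."""
    return now >= first_run_due_at(start_date, interval)


interval = timedelta(hours=1)  # hypothetical hourly schedule

# Static start date: the first run becomes due exactly one interval later.
static_start = datetime(2022, 12, 1, 0, 0)
print(first_run_due_at(static_start, interval))  # 2022-12-01 01:00:00

# Dynamic start date: every time the DAG file is parsed, "now" has moved
# forward, so the end of the first interval is always one hour away.
for parse_time in (datetime(2022, 12, 25, 9, 0), datetime(2022, 12, 25, 12, 0)):
    dynamic_start = parse_time  # stands in for a start date computed at parse time
    print(is_due(dynamic_start, interval, now=parse_time))  # False
```

With the static start date the first run has a fixed due time, while the dynamic start date pushes that time forward on every parse, which is one reason static start dates are easier to reason about.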

2. Airflow Webserver and Scheduler Misconfiguration

Another possible cause is the configuration of the Airflow webserver and scheduler. A typical symptom is a DAG that executes only once after you restart the webserver and scheduler.

In that case, the problem may lie in the configuration of the two components or in the connectivity between them. Check the webserver and scheduler logs, or try restarting both again to see whether that resolves the issue.
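
Alongside the logs, the scheduler's heartbeat is a quick signal of whether it is still running. The sketch below assumes a health payload shaped like the JSON served by the Airflow 2 webserver's /health endpoint, with a scheduler object containing status and latest_scheduler_heartbeat. The function name scheduler_is_alive, the 30-second threshold, and the sample payload are illustrative assumptions, and fetching the payload over HTTP is left out.

```python
from datetime import datetime, timedelta, timezone


def scheduler_is_alive(health, now, max_heartbeat_age=timedelta(seconds=30)):
    """Return True if the payload reports a healthy, recently seen scheduler.

    `health` is a dict parsed from the health JSON (assumed shape).
    `now` must be timezone-aware, because the heartbeat carries a UTC offset.
    """
    scheduler = health.get("scheduler", {})
    heartbeat = scheduler.get("latest_scheduler_heartbeat")
    if scheduler.get("status") != "healthy" or heartbeat is None:
        return False
    last_beat = datetime.fromisoformat(heartbeat)
    return now - last_beat <= max_heartbeat_age


# Hypothetical payload, written by hand for this example.
sample_health = {
    "metadatabase": {"status": "healthy"},
    "scheduler": {
        "status": "healthy",
        "latest_scheduler_heartbeat": "2022-12-25 09:00:00+00:00",
    },
}

ten_seconds_later = datetime(2022, 12, 25, 9, 0, 10, tzinfo=timezone.utc)
ten_minutes_later = datetime(2022, 12, 25, 9, 10, 0, tzinfo=timezone.utc)
print(scheduler_is_alive(sample_health, ten_seconds_later))  # True
print(scheduler_is_alive(sample_health, ten_minutes_later))  # False
```

A stale or missing heartbeat suggests the scheduler process has stopped or cannot reach the metadata database, which narrows down where to look in the logs.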

3. Check Dependencies

If you are still experiencing problems with the scheduler not triggering DAGs at the scheduled time, other issues may be at play. It could be related to the specific version of Airflow you are using, or there may be problems with your DAG code or the dependencies it uses.
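
A DAG file that fails to import cannot be scheduled, so one simple check is whether the modules it depends on are importable in the scheduler's Python environment. The following is a minimal sketch using only the standard library; the function name missing_modules and the module names in the example are hypothetical, and it should be run with the same interpreter the scheduler uses.

```python
import importlib.util


def missing_modules(module_names):
    """Return the names from `module_names` that cannot be found for import."""
    missing = []
    for name in module_names:
        try:
            spec = importlib.util.find_spec(name)
        except (ModuleNotFoundError, ValueError):
            # Raised for dotted names whose parent package is absent,
            # or for modules in an inconsistent state.
            spec = None
        if spec is None:
            missing.append(name)
    return missing


# "json" ships with Python; the second name is a made-up package.
print(missing_modules(["json", "package_that_is_not_installed_xyz"]))
# ['package_that_is_not_installed_xyz']
```

An empty list means every listed module was found; any name that comes back is a candidate cause for a DAG that never appears in the scheduler.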

Alternative Approach – Automated Orchestration

Although Airflow is a valuable tool, it can be challenging to troubleshoot. SQLake is a good alternative that enables the automation of data pipeline orchestration.

With SQLake you can:

Build reliable, maintainable, and testable data ingestion and processing pipelines for batch and streaming data, using familiar SQL syntax.
Create jobs that are executed once and continue to run until stopped, with no need for scheduling or orchestration.
Rely on a compute cluster that scales up and down automatically, simplifying the deployment and management of your data pipelines.

Here is a code example of ingesting multiple S3 data sources into SQLake staging tables, the first step toward joining them and applying simple enrichments to the data.

Run the following code in SQLake:

/* Ingest data */
-- 1. Create a read-only connection to the S3 bucket that holds the sample data.
CREATE S3 CONNECTION airflow_alternative_pipelines_samples
    AWS_ROLE = 'arn:aws:iam::949275490180:role/samples_role'
    EXTERNAL_ID = 'AIRFLOW_ALTERNATIVE_SAMPLES'
    READ_ONLY = TRUE;
-- 2. Create empty tables to use as staging for orders and sales info.
CREATE TABLE default_glue_catalog.database_a137bd.orders_raw_data()
    PARTITIONED BY $event_date;
CREATE TABLE default_glue_catalog.database_a137bd.sales_info_raw_data()
    PARTITIONED BY $event_date;
-- 3. Create streaming jobs to ingest raw orders and sales data into the staging tables.
-- Note: both jobs read through a connection named upsolver_s3_samples, which is
-- assumed to already exist; it is not the connection created in step 1.
CREATE SYNC JOB load_orders_raw_data_from_s3
    CONTENT_TYPE = JSON
    AS COPY FROM S3 upsolver_s3_samples
        BUCKET = 'upsolver-samples'
        PREFIX = 'orders/'
    INTO default_glue_catalog.database_a137bd.orders_raw_data;
CREATE SYNC JOB load_sales_info_raw_data_from_s3
    CONTENT_TYPE = JSON
    AS COPY FROM S3 upsolver_s3_samples
        BUCKET = 'upsolver-samples'
        PREFIX = 'sales_info/'
    INTO default_glue_catalog.database_a137bd.sales_info_raw_data;

Published in: Blog, Streaming Data
