Troubleshooting Airflow “unrecognized arguments” Error

The “unrecognized arguments” error in Airflow indicates that the command-line arguments you entered are not supported by the Airflow CLI. It is typically caused by entering an incorrect command or mistyping an argument. This article covers common causes of the “unrecognized arguments” error in Apache Airflow and introduces an alternative, SQLake, which automates data pipeline orchestration for data engineers.

Common reasons for receiving “unrecognized arguments” in Apache Airflow:

a. The scheduler is trying to execute the DAG file as a command line argument, rather than as a Python script.

This might be because the DAG file is not being properly imported into the Airflow system, or because there is a problem with the DAG file itself.

To troubleshoot this issue, you can try the following steps:

  1. Confirm that the DAG file is located in the correct location within the Airflow DAGs folder, and that it has the correct file name and extension (e.g. .py).
  2. Check the Airflow logs for any errors or messages that might provide more context on the issue. You can find the logs in the “logs” folder within the Airflow home directory.
  3. Make sure that the DAG file is correctly formatted and follows the guidelines for creating a DAG in Airflow: it has the correct import statements, the DAG object is correctly defined and initialized at module level, and it has at least one task associated with it (see the minimal example after this list).
  4. Check the Airflow web UI to see if the DAG is listed under the “DAGs” tab. If it is not listed, it may indicate that it is not being properly imported into the system.
  5. Restart the Airflow scheduler and web server to see if that resolves the issue.
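
For reference, here is a minimal sketch of a well-formed DAG file written against the Airflow 2.x API; the names example_dag and say_hello are illustrative only. A file that deviates from this basic shape (no module-level DAG object, or no tasks) is a common reason the scheduler cannot import it.

# minimal_dag.py -- place in the Airflow DAGs folder ($AIRFLOW_HOME/dags by default)
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def say_hello():
    print("hello from Airflow")

# The DAG object must be defined at module level so the scheduler can discover it.
with DAG(
    dag_id="example_dag",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # Every DAG needs at least one task.
    hello = PythonOperator(
        task_id="say_hello",
        python_callable=say_hello,
    )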

b. The script cannot parse the arguments being passed to it.

To fix this issue, you can try these steps:

  1. Confirm that the arguments are being passed correctly to the script. Make sure that you are using the correct syntax for passing arguments to the script, and that the arguments are being passed in the correct order.
  2. Check the syntax of the argparse.ArgumentParser() object and the add_argument() method. Make sure that the required arguments are marked as required using the required parameter, and that the correct action and type are being specified for each argument.
  3. Make sure that the dest parameter in the add_argument() method is correctly specified. The dest parameter specifies the name of the attribute that will hold the argument’s value.
    For example, if dest is set to "mac", the value of the -m argument will be stored in the args.mac attribute (see the sketch after this list).
  4. Check the script for any syntax errors or other issues that might prevent it from running correctly.
  5. Test the script with different sets of arguments to see if the issue is consistently reproducible. This can help narrow down the cause of the issue.
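
To make the points above concrete, here is a minimal, hypothetical argparse sketch; the -m/--mac-address option and its dest of "mac" are illustrative and not tied to any particular Airflow script.

# parse_args_example.py -- illustrative argument parsing; adapt the names to your script
import argparse

parser = argparse.ArgumentParser(description="Example argument parsing")

# required=True makes the argument mandatory; type converts the raw string;
# dest controls the attribute the value is stored under (args.mac here).
parser.add_argument("-m", "--mac-address", dest="mac", type=str, required=True,
                    help="MAC address to operate on")
parser.add_argument("-v", "--verbose", action="store_true",
                    help="enable verbose output")

args = parser.parse_args()
print(args.mac, args.verbose)

Passing an option the parser does not define, for example python parse_args_example.py -m 00:11:22:33:44:55 --fast, fails with “unrecognized arguments: --fast”, while the same invocation without --fast parses cleanly; comparing a failing invocation against the parser definition usually pinpoints the mismatch.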

c. The airflow connections command is not being given the arguments it expects for deleting a connection.

These are the steps to take:

  1. Confirm that you are using the correct syntax for the airflow connections command for your Airflow version. In Airflow 1.10.x, the syntax for deleting a connection is airflow connections --delete --conn_id CONN_ID, where CONN_ID is the ID of the connection you want to delete; in Airflow 2.x, the CLI uses subcommands instead of flags, so the equivalent is airflow connections delete CONN_ID.
  2. Make sure that you are passing the delete action and the connection ID in the form your version expects. Mixing the two styles (for example, passing --conn_id to an Airflow 2.x installation) is a common cause of the “unrecognized arguments” error.
  3. Check the Airflow logs for any errors or messages that might provide more context. You can find the logs in the “logs” folder within the Airflow home directory.
  4. Make sure that the connection you are trying to delete exists in the Airflow system. You can check the list of connections by running airflow connections --list (Airflow 1.10.x) or airflow connections list (Airflow 2.x).
  5. If you are using the Airflow CLI inside a script, make sure that the script has the necessary permissions to execute the airflow connections command (the sketch after this list shows one way to surface the CLI’s output from a script).
  6. If you are still having trouble, run airflow connections --help (or, on Airflow 2.x, airflow connections delete --help) to see exactly which arguments and subcommands your version accepts, and compare them against the command you are running.
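
If you drive the CLI from a script, a minimal sketch like the following captures the exact error text Airflow prints; the connection ID my_conn_id is a placeholder, and the command shown is the Airflow 2.x form (swap in the 1.10.x flags if that is what your installation uses).

# delete_connection_example.py -- illustrative; surfaces the CLI's error output
import subprocess

result = subprocess.run(
    ["airflow", "connections", "delete", "my_conn_id"],  # my_conn_id is a placeholder
    capture_output=True,
    text=True,
)

# A non-zero return code plus the stderr text usually shows exactly which
# argument the CLI did not recognize.
print("return code:", result.returncode)
print(result.stdout)
print(result.stderr)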

Alternative Approach – Automated Orchestration:

Airflow is a great tool, but it can be difficult to debug. SQLake is an alternative that allows you to automate data pipeline orchestration.

With SQLake you can:

  • Build reliable, maintainable, and testable data ingestion and processing pipelines for batch and streaming data, using familiar SQL syntax.
  • Run jobs that execute once and continue to run until stopped, with no scheduling or orchestration required.
  • Rely on a compute cluster that scales up and down automatically, simplifying the deployment and management of your data pipelines.

Here is a code example that ingests data from multiple S3 sources into SQLake staging tables, the first step before joining the datasets and applying simple enrichments.

/* Ingest data into SQLake */

-- 1. Create a connection to SQLake sample data source.
CREATE S3 CONNECTION upsolver_s3_samples
    AWS_ROLE = 'arn:aws:iam::949275490180:role/upsolver_samples_role'
    EXTERNAL_ID = 'SAMPLES'
    READ_ONLY = TRUE;

-- 2. Create empty tables to use as staging for orders and sales.
CREATE TABLE default_glue_catalog.database_a137bd.orders_raw_data()
    PARTITIONED BY $event_date;

CREATE TABLE default_glue_catalog.database_a137bd.sales_info_raw_data()
    PARTITIONED BY $event_date;

-- 3. Create streaming jobs to ingest raw orders and sales data into the staging tables.
CREATE SYNC JOB load_orders_raw_data_from_s3
   CONTENT_TYPE = JSON
   AS COPY FROM S3 upsolver_s3_samples 
      BUCKET = 'upsolver-samples' 
      PREFIX = 'orders/' 
   INTO default_glue_catalog.database_a137bd.orders_raw_data; 

CREATE SYNC JOB load_sales_info_raw_data_from_s3
   CONTENT_TYPE = JSON
   AS COPY FROM S3 upsolver_s3_samples 
      BUCKET = 'upsolver-samples' 
      PREFIX = 'sales_info/'
   INTO default_glue_catalog.database_a137bd.sales_info_raw_data;