#apache-airflow
How to Copy/Move S3 Files With Apache Airflow
- Create an AWS account, and ensure you have the right IAM roles and policies in place before proceeding with the following code.
- Create a working instance of Apache Airflow locally or on your preferred cloud provider.
- Create an Airflow connection with your AWS access key, AWS secret key, and `role_arn`.
- The connection extras will look something like this; replace `<your-role-arn>` with the ARN of the role you created: `{"region_name": "us-west-2", "role_arn": "<your-role-arn>", "assume_role_method": "assume_role"}`
- Add the following DAG to your dags folder.
- Run the code. (For a quick local run without the scheduler, see the `dag.test()` sketch after the DAG.)
```python
from datetime import datetime
import os

from airflow import DAG
from airflow.providers.amazon.aws.operators.s3 import S3FileTransformOperator

# Workaround for a SIGSEGV error when running Airflow locally
os.environ['no_proxy'] = '*'

DAG_ID = "s3_file_transform"

with DAG(
    dag_id=DAG_ID,
    schedule=None,
    start_date=datetime(2022, 11, 10),
    tags=["example"],
    catchup=False,
) as dag:
    move_files = S3FileTransformOperator(
        task_id="move_files",
        source_s3_key="s3://v-glue-example-bucket/example.txt",
        dest_s3_key="s3://v-glue-example-bucket/processed/{{ ds }}/example.txt",
        transform_script="/bin/cp",
    )
```
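
To run the DAG without the scheduler, one option is Airflow's `dag.test()` helper (available in Airflow 2.5+). A minimal sketch, assuming it is appended to the bottom of the DAG file above and that a local Airflow metadata database has been initialised:

```python
if __name__ == "__main__":
    # Executes every task in this DAG in a single process,
    # which is handy for a quick local check before deploying.
    dag.test()
```

Otherwise, trigger the DAG from the Airflow UI or CLI as usual.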
- You can change the `transform_script` from `/bin/cp` to `/bin/mv` to move files instead of copying them.
- Note that `dest_s3_key` contains `{{ ds }}`. This Jinja template renders to the run's logical date, so a new object is written every time the DAG runs.
- You can also point `transform_script` at your own Python script; see the sketch below.
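
On that last point: the operator downloads the source object to a temporary file and invokes `transform_script` with the source path and the destination path as its first two arguments (which is why `/bin/cp` and `/bin/mv` work), then uploads the destination file to `dest_s3_key`. A custom transform is any executable script that follows the same contract. A minimal sketch, assuming a hypothetical `transform.py` saved alongside the DAG and made executable:

```python
#!/usr/bin/env python3
# Example transform script for S3FileTransformOperator.
# The operator is expected to call it as:
#   transform.py <source_tmp_file> <dest_tmp_file> [script_args...]
import sys

source_path, dest_path = sys.argv[1], sys.argv[2]

# Uppercase the file contents as a stand-in for a real transformation.
with open(source_path) as src, open(dest_path, "w") as dst:
    for line in src:
        dst.write(line.upper())
```

Then set `transform_script` to the script's path (for example `transform_script="/opt/airflow/dags/transform.py"`, a hypothetical location), and whatever the script writes to the destination file ends up at `dest_s3_key`.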