Integrate JFrog Artifactory with Amazon Managed Workflows for Apache Airflow (MWAA)
Overview
- When managing data pipelines and workflows in AWS Managed Apache Airflow, it’s often necessary to integrate with various external services for tasks such as downloading binaries, libraries, or other dependencies.
- JFrog Artifactory is a popular binary repository manager that allows you to securely store and manage artifacts. Integrating JFrog with AWS Managed Apache Airflow enables seamless and secure downloading of these artifacts, ensuring that your workflows can reliably access the necessary resources.
- This integration is crucial for maintaining consistency and security across your deployment pipelines, allowing you to automate complex workflows without manual intervention.
- By using AWS Secrets Manager and a startup script, we can securely manage the credentials required for this integration, ensuring that sensitive information is handled safely.
Demo
Managing secrets securely is crucial when working with workflows that require sensitive data, such as authentication tokens or API keys. AWS Managed Apache Airflow provides a robust platform for orchestrating workflows but requires careful handling of secrets.
This blog post walks you through the steps to securely inject secrets during the initialization of an Amazon MWAA environment so that your workflows can download binaries from JFrog Artifactory.
Prerequisites
- AWS Account with access to Managed Apache Airflow.
- AWS CLI configured on your local machine.
- JFrog Artifactory account and API key.
- AWS Secrets Manager set up to store the JFrog credentials.
- S3 bucket to store the startup script.
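Before moving on, you can confirm that the AWS CLI is pointed at the right account and can reach the bucket you plan to use; a quick check (the bucket name below is a placeholder):

# Confirm which account and role the CLI is using
aws sts get-caller-identity
# Confirm the bucket intended for the startup script is reachable
aws s3 ls s3://your-bucket-name/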
Step 1: Store JFrog Credentials in AWS Secrets Manager
- Open AWS Secrets Manager in the AWS Management Console.
- Create a new secret and select Other type of secret.
- Add key/value pairs for your JFrog credentials (e.g., username and api_key).
- Name your secret, e.g., jfrog/credentials.
- Save the secret and note down the secret ARN; the CLI equivalent is sketched below.
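If you prefer the command line, the same secret can be created with the AWS CLI; this is a minimal sketch and the credential values are placeholders. Also make sure the MWAA execution role is allowed to call secretsmanager:GetSecretValue on this secret, otherwise the startup script in the next step will not be able to read it.

aws secretsmanager create-secret \
    --name jfrog/credentials \
    --description "JFrog Artifactory credentials for MWAA" \
    --secret-string '{"username":"your-jfrog-username","api_key":"your-jfrog-api-key"}'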
Step 2: Create a Startup Script
Create a startup script that will be executed during the initialization of the Airflow environment. This script fetches the JFrog credentials from AWS Secrets Manager and exports them as environment variables so that the Airflow components can read them.
#!/bin/bash

# Install the AWS CLI if it is not already available
if ! command -v aws &> /dev/null
then
    echo "AWS CLI not found. Installing..."
    curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
    unzip awscliv2.zip
    sudo ./aws/install
fi

# Fetch the JFrog credentials from AWS Secrets Manager (jq is used to parse the JSON secret)
SECRET=$(aws secretsmanager get-secret-value --secret-id jfrog/credentials --query SecretString --output text)
USERNAME=$(echo "$SECRET" | jq -r .username)
API_KEY=$(echo "$SECRET" | jq -r .api_key)

# Export the credentials; variables exported from an MWAA startup script are
# made available to the Airflow scheduler, workers, and web server
export JFROG_USERNAME="$USERNAME"
export JFROG_API_KEY="$API_KEY"
Save this script as init_script.sh.
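Before uploading, it can help to confirm that the script's secret lookup resolves as expected from your own machine (this assumes jq is installed locally):

# Should print the keys stored in the secret, e.g. ["api_key", "username"]
aws secretsmanager get-secret-value \
    --secret-id jfrog/credentials \
    --query SecretString --output text | jq 'keys'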
Step 3: Upload the Startup Script to S3
Upload init_script.sh to the Amazon S3 bucket associated with your Airflow environment (the same versioned bucket that holds your DAGs).
aws s3 cp init_script.sh s3://your-bucket-name/init_script.sh
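You can verify the upload and confirm that versioning is enabled on the bucket, since Amazon MWAA references startup scripts by S3 object version (the bucket name is a placeholder):

# Confirm the object exists
aws s3 ls s3://your-bucket-name/init_script.sh
# MWAA requires a versioned bucket; this should report "Enabled"
aws s3api get-bucket-versioning --bucket your-bucket-name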
Step 4: Configure AWS Managed Apache Airflow to Use the Startup Script
Update your Airflow environment so that it runs the startup script from the S3 bucket.
- Open the Amazon MWAA console.
- Select your environment and choose Edit.
- In the DAG code in Amazon S3 section, enter the S3 path to the startup script in the Startup script file field (e.g., s3://your-bucket-name/init_script.sh), then save your changes.
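If you manage the environment from the command line rather than the console, the startup script can also be attached with the AWS CLI; a hedged sketch, assuming your environment is named your-mwaa-environment and that the script path is given relative to the environment's S3 bucket:

aws mwaa update-environment \
    --name your-mwaa-environment \
    --startup-script-s3-path init_script.sh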
Step 5: Modify Airflow DAG to Use Environment Variables
Modify your Airflow DAG to use the environment variables set by the startup script for authentication.
from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.utils.dates import days_ago
import os
import requests

# Define default arguments
default_args = {
    'owner': 'airflow',
    'start_date': days_ago(1),
}

# Define the DAG
dag = DAG(
    'download_binaries_from_jfrog',
    default_args=default_args,
    schedule_interval=None,
)

def download_binaries():
    # Retrieve the credentials exported by the startup script
    username = os.getenv('JFROG_USERNAME')
    api_key = os.getenv('JFROG_API_KEY')

    # JFrog Artifactory download URL
    jfrog_url = 'https://your-jfrog-instance/artifactory/path/to/your/binary'

    # Download the binary, authenticating with basic auth (username + API key)
    response = requests.get(jfrog_url, auth=(username, api_key))
    if response.status_code == 200:
        with open('/path/to/save/binary', 'wb') as f:
            f.write(response.content)
    else:
        raise Exception(f'Failed to download binary from JFrog: HTTP {response.status_code}')

# Define the task
download_task = PythonOperator(
    task_id='download_binaries',
    python_callable=download_binaries,
    dag=dag,
)
Step 6: Deploy and Test the DAG
- Upload the DAG to your Amazon MWAA environment's S3 bucket, as shown below.
- Trigger the DAG manually via the Airflow UI to ensure it works correctly.
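Deploying the DAG is just an S3 copy into the environment's dags folder; a minimal sketch, assuming the DAG is saved as download_binaries_from_jfrog.py and the bucket name is a placeholder:

# Copy the DAG into the environment's dags folder
aws s3 cp download_binaries_from_jfrog.py s3://your-bucket-name/dags/

After the scheduler parses the file, the DAG appears in the Airflow UI.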
Conclusion
- By following these steps, you can securely insert secrets during the initialization of AWS Managed Apache Airflow and download binaries from JFrog.
- Using a startup script with AWS Secrets Manager ensures that your sensitive information is handled securely and dynamically at runtime, reducing the risk of exposure.