
Integrate JFrog Artifactory with Amazon Managed Workflows for Apache Airflow

Overview

  • When managing data pipelines and workflows in Amazon Managed Workflows for Apache Airflow (MWAA), it’s often necessary to integrate with external services for tasks such as downloading binaries, libraries, or other dependencies.
  • JFrog Artifactory is a popular binary repository manager that lets you securely store and manage artifacts. Integrating JFrog with Amazon MWAA enables seamless and secure downloading of these artifacts, ensuring that your workflows can reliably access the resources they need.
  • This integration is crucial for maintaining consistency and security across your deployment pipelines, allowing you to automate complex workflows without manual intervention.
  • By using AWS Secrets Manager and a startup script, we can securely manage the credentials required for this integration, ensuring that sensitive information is handled safely.

Demo

  • Managing secrets securely is crucial when working with workflows that require sensitive data, such as authentication tokens or API keys. Amazon MWAA provides a robust platform for orchestrating workflows but requires careful handling of secrets.

  • This blog post will guide you through the steps to securely inject secrets during the initialization of an Amazon MWAA environment and use them to download binaries from JFrog.

    • Prerequisites

      • AWS account with access to Amazon MWAA.
      • AWS CLI configured on your local machine.
      • JFrog Artifactory account and API key.
      • AWS Secrets Manager set up to store JFrog credentials.
      • S3 bucket to store the startup script.
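      • To confirm the AWS CLI is configured before you begin, you can run a quick identity check:

        aws sts get-caller-identity
        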
    • Step 1: Store JFrog Credentials in AWS Secrets Manager

      • Open AWS Secrets Manager in the AWS Management Console.
      • Create a new secret and select Other type of secret.
      • Add key/value pairs for your JFrog credentials (e.g., username and api_key).
      • Name your secret, e.g., jfrog/credentials.
      • Save the secret and note down the Secret ARN.
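      • Alternatively, you can create the same secret from the AWS CLI; a minimal sketch, where the username and API key values are placeholders for your own:

        aws secretsmanager create-secret \
            --name jfrog/credentials \
            --secret-string '{"username": "your-jfrog-username", "api_key": "your-jfrog-api-key"}'
        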
    • Step 2: Create a Startup Script

      • Create a startup script that runs as each Airflow component (scheduler, workers, and web server) starts up. This script fetches the JFrog credentials from AWS Secrets Manager and exports them as environment variables.

        #!/bin/bash
        
        # Install AWS CLI if not already installed
        if ! command -v aws &> /dev/null
        then
            echo "AWS CLI not found. Installing..."
            curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
            unzip awscliv2.zip
            sudo ./aws/install
        fi
        
        # Fetch the secret from AWS Secrets Manager and parse it
        # (jq is assumed to be available on the image)
        SECRET=$(aws secretsmanager get-secret-value --secret-id jfrog/credentials --query SecretString --output text)
        USERNAME=$(echo "$SECRET" | jq -r .username)
        API_KEY=$(echo "$SECRET" | jq -r .api_key)
        
        # Export the credentials so that the Airflow processes started after
        # this script can read them (MWAA passes variables exported in the
        # startup script on to the Airflow components)
        export JFROG_USERNAME="$USERNAME"
        export JFROG_API_KEY="$API_KEY"
        
      • Save this script as init_script.sh.
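
      • For the get-secret-value call in the script to succeed, your environment's execution role must be allowed to read the secret. A minimal sketch using an inline policy (the role name your-mwaa-execution-role is a placeholder for your own):

        aws iam put-role-policy \
            --role-name your-mwaa-execution-role \
            --policy-name AllowReadJfrogSecret \
            --policy-document '{
                "Version": "2012-10-17",
                "Statement": [{
                    "Effect": "Allow",
                    "Action": "secretsmanager:GetSecretValue",
                    "Resource": "arn:aws:secretsmanager:*:*:secret:jfrog/credentials*"
                }]
            }'
        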

    • Step 3: Upload the Startup Script to S3

      • Upload the init_script.sh to an S3 bucket that your Airflow environment has access to.

        aws s3 cp init_script.sh s3://your-bucket-name/init_script.sh
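
      • Note that Amazon MWAA requires versioning to be enabled on this bucket; if it is not already enabled, you can turn it on first (the bucket name is a placeholder):

        aws s3api put-bucket-versioning --bucket your-bucket-name --versioning-configuration Status=Enabled
        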
        
    • Step 4: Configure Amazon MWAA to Use the Startup Script

      • Configure your Airflow environment to use the startup script from the S3 bucket.

      • Open the Amazon MWAA console.

      • Select your environment.

      • Edit the environment.

      • In the DAG code in Amazon S3 section, add the S3 path to the startup script in the Startup script file field (e.g., s3://your-bucket-name/init_script.sh).
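
      • Alternatively, you can attach the startup script from the AWS CLI; a minimal sketch (the environment name is a placeholder, and the script path is given relative to the environment's S3 bucket):

        aws mwaa update-environment \
            --name your-mwaa-environment \
            --startup-script-s3-path init_script.sh
        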

    • Step 5: Modify Airflow DAG to Use Environment Variables

      • Modify your Airflow DAG to use the environment variables set by the startup script for authentication.

        from airflow import DAG
        from airflow.operators.python import PythonOperator
        from airflow.utils.dates import days_ago
        import os
        import requests
        
        # Define default arguments
        default_args = {
            'owner': 'airflow',
            'start_date': days_ago(1)
        }
        
        # Define the DAG
        dag = DAG(
            'download_binaries_from_jfrog',
            default_args=default_args,
            schedule_interval=None,
        )
        
        def download_binaries():
            # Retrieve the credentials exported by the startup script
            username = os.getenv('JFROG_USERNAME')
            api_key = os.getenv('JFROG_API_KEY')
        
            # JFrog Artifactory endpoint for the binary to download
            jfrog_url = 'https://your-jfrog-instance/artifactory/path/to/your/binary'
        
            # Download the binary using HTTP basic auth (JFrog accepts the API
            # key as the password). Note: passing auth= overwrites any
            # Authorization header, so only one auth mechanism is used here.
            response = requests.get(jfrog_url, auth=(username, api_key), timeout=60)
        
            if response.status_code == 200:
                with open('/path/to/save/binary', 'wb') as f:
                    f.write(response.content)
            else:
                raise Exception(
                    f'Failed to download binary from JFrog: HTTP {response.status_code}'
                )
        
        # Define the task
        download_task = PythonOperator(
            task_id='download_binaries',
            python_callable=download_binaries,
            dag=dag,
        )
        
    • Step 6: Deploy and Test the DAG

      • Upload the DAG to your Amazon MWAA environment (see the upload example below).
      • Trigger the DAG manually via the Airflow UI to ensure it works correctly.
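      • For example, assuming your environment reads DAGs from the dags/ prefix of your bucket and the DAG file is saved as download_binaries_from_jfrog.py (both placeholders for your own setup):

        aws s3 cp download_binaries_from_jfrog.py s3://your-bucket-name/dags/
        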

Conclusion

  • By following these steps, you can securely inject secrets during the initialization of an Amazon MWAA environment and use them to download binaries from JFrog.
  • Using a startup script with AWS Secrets Manager ensures that your sensitive information is handled securely and dynamically at runtime, reducing the risk of exposure.