Sanchit Dilip Jain/Redshift Data Analysis with Generative SQL ๐Ÿ”

Created Sun, 07 Jul 2024 12:00:00 +0000 Modified Sun, 07 Jul 2024 20:06:17 +0000
729 Words 3 min

Redshift Data Analysis with Generative SQL

  1. What is Amazon Redshift Serverless?

    • Amazon Redshift Serverless makes it convenient for you to run and scale analytics without having to provision and manage data warehouses.
    • Amazon Redshift automatically provisions and scales data warehouse capacity to deliver fast performance for demanding and unpredictable workloads. You pay only for the capacity that you use.
  2. What is Amazon Q?

    • Amazon Q generates code, tests, debugs, and has multistep planning and reasoning capabilities that can transform and implement new code generated from developer requests.
    • Amazon Q also makes it easier for employees to get answers to questions across business data such as company policies, product information, business results, code base, employees, and many other topics by connecting to enterprise data repositories to summarize the data logically, analyze trends, and engage in dialogue about the data.

Demo

  • In this blog, we’ll explore data analysis using the Redshift Query Editor V2. We will set up a Redshift serverless cluster using AWS CFT, and our goal is to use Amazon Q to generate queries so we can focus on querying data.

    • Provisioning Redshift serverless cluster via AWS Cloudformation

      • Download the Cloudformation from this URL and upload at AWS Cloudformation console

      • The CloudFormation stack will create the necessary resources required for the demo. Check the CloudFormation console and wait for the status CREATE_COMPLETE as shown below

    • Download a sample dataset from here and upload to an existing Amazon S3 bucket in the same account and same region of Redshift serverless cluster

    • Setup Redshift Query Editor V2

      • Navigate to the Redshift Query Editor V2

      • Press the Configure account button to open the Query Editor.

      • In the panel area click on Severless: workgroup to open the connection screen

      • Click Create connection to connect to the database.

    • Load Your Data into Redshift Query Editor V2

      • Click the Load Data button to initiate the data loading process.

      • Choose the Browse S3 button, select the bucket (same bucket you uploaded data in Step 2), and then pick the .clickstream_data.csv. file. This CSV contains the dataset for our analysis.

      • After selecting the file, click Choose, and then Next to proceed.

      • Choose to Load new table in the following screen.

      • In the Schema field, select public. Name your new table: clickstream_data works well.

      • For the IAM role, choose the pre-set RedshiftCopyUnload Role.

      • Complete the setup by clicking Create table and then the Load Data button.

    • Using Generative SQL for Queries

      • The clickstream data provides a detailed record of user interactions on a fictional website, capturing information such as IP addresses, event timestamps, device types, types of events (like visiting the home page or product catalog), product categories, and unique user IDs. In this section you will use Generative SQL with Amazon Q to understand the data.

        • Click the Plus sign to open a new Notebook. This will be your playground for data queries.

        • Next, click on the Generative SQL button then Settings in the popup

        • Click the Generative SQL checkbox, then click Save

        • Now click Generative SQL to bring up the chat window. Here’s where you’ll ask your data questions.

        • Start with a simple query to get a feel for the dataset. Type in: Show me all the product types in clickstream_data. Then, click Add to notebook and hit the Run button in the cell. This query gives you a snapshot of the different products in your dataset.

        • Ready to dive deeper? Try a more targeted query. Ask: Show me the users who engaged the most with “video games”. This will help you identify which users are most interested in video games, offering a glimpse into user preferences and behaviors.

        • Let’s run few more queries

          • Show me the most popular device types

          • Show me the top 5 users with the device type of tablet

          • Show me the most engaged users with a device type of mobile that had an event_type of the order_checkout.

          • Show me the top products viewed by users on mobile devices.

Conclusion

  • That wraps up our demo through data analysis with Amazon Q and Generative SQL. We’ve seen how these powerful tools can transform the way you interact with and understand your data.
  • Remember, while next generation developer tools like Amazon Q can significantly boost your productivity, it’s not infallible. Keep an eye out for areas where Q might not get things quite right and be ready to step in and guide the process.