Sanchit Dilip Jain/Permissions Management for Amazon Redshift DataShares 💡

Permissions Management for Amazon Redshift DataShares

What is Amazon Redshift Data Sharing?
- Amazon Redshift Data Sharing is a feature that enables Redshift cluster owners to share live data across different Amazon Redshift clusters, accounts, and regions.
- With data sharing, users can easily and securely share data with other Redshift users, without having to copy or move the data.
What are Amazon Redshift Data Sharing Benefits?
- Data sharing is a great way to improve your security and performance, as well as save money.
  - Enhanced Security - By using Amazon Redshift data sharing, you can ensure that only authorized users have access to sensitive data by limiting the number of tables they can access. This helps prevent accidental or malicious disclosure of confidential information during queries and analysis.
  - Improved Performance - Data sharing improves query performance by allowing multiple users with different privileges to access the same table simultaneously without affecting one another’s results or causing errors in the query plan generated by SQL queries run against it (for example, if two people were trying to run a join on two distinct tables).
  - Cost savings - Sharing data across accounts and regions can help to reduce data storage costs and eliminate the need to move data.
Adding/Removing individual objects from DataShares
- As a Producer Cluster administrator retains control of the datasets that are shared, the Administrator can add new objects to or remove them from the datashare.
- Access can also be granted or revoked to datashares as a whole for the consumer clusters, AWS accounts, or AWS Regions.
- When permissions are revoked, consumer clusters immediately lose access to the shared objects and stop seeing them in the list of INBOUND datashares in SVV_DATASHARES
Role-based Access control for Amazon Redshift Data SharingHeader anchor link
- On the consumer side, a consumer cluster administrator can determine which users and groups should get access to the shared data.
- An administrator can control access at the database or schema level. To further restrict access, views can be created on top of shared objects, exposing only the necessary data.
- We can then use these views to give access to the users and groups.
INCLUDENEW option in Data Sharing
- The INCLUDENEW clause specifies whether to add any future tables, views or SQL UDFs created in a specified schema to the datashare.
- Current tables, views, or SQL UDFs in the specified schema aren’t added to the datashare. The default setting is False.
- IncludeNew option should be used with great care. If set to TRUE this will automatically add all the objects created in the future in the specified schema to the datashare automatically.
- This might not be ideal in cases where fine-grained control is required on objects being shared. In such cases, leave the setting to default, which is FALSE.

Demo

Objective:
- In this lab, we will learn how to quickly add/remove objects from datashares as well as implement role-based access control for fine-grained access management
Prerequisite
- Complete Workload Isolation demo using Amazon Redshift Data Sharing lab
- Ensure both the clusters etl-redshift-cluster and bi-redshift-cluster are in a start state.
Remove the Order table from DataShare
- Go to Amazon Redshift Console by clicking link
- Select etl-redshift-cluster under Provisioned Clusters from the Redshift console
- Select Query Editor V2 to query etl-redshift-cluster. Configure Query Editor v2, leave settings as default, and press Configure Account
- Let’s connect to our database by selecting etl-redshift-cluster. Select temporary credentials on the list of options and connect using these credentials
  - Database name: dev
  - Database user: awsuser
- Once connected, run the below command on etl-redshift-cluster to remove the individual object from datashares:
  ALTER DATASHARE tpc_share REMOVE TABLE public.orders;
- Run the below query on the etl-redshift-cluster and confirm that public.orders table is no longer shared:
  select * from svv_datashare_objects;
Add the Orders table to the DataShare
- Run the below command to add an individual object to datashares:
  ALTER DATASHARE tpc_share Add TABLE public.orders;
- Run the below query on the etl-redshift-cluster and confirm that public.orders table is added to the date share:
  select * from svv_datashare_objects;
INCLUDENEW option in Data Sharing
- Run below SQL to add public schema to tpc_share datashare with INCLUDENEW=TRUE:
  ALTER DATASHARE tpc_share SET INCLUDENEW = TRUE for SCHEMA public;
- View all objects in tpc_share and verify that the include_new option is set to ’true’ for public schema:
  select * from svv_datashare_objects;
- Create a dummy table in public schema:
  CREATE TABLE public.include_new_true (col1 int, col2 char(1));
- Verify by querying the svv_datashare_objects table that the dummy table public.include_new_true is added automatically to datashare
  select * from svv_datashare_objects;
- Set INCLUDENEW option to FALSE and confirm that table is not added automatically with the default configuration
  ALTER DATASHARE tpc_share SET INCLUDENEW = FALSE for SCHEMA public;
- Create a dummy table in public schema again and verify this time it is not added to datashare.
  CREATE TABLE public.include_new_false (col1 int, col2 char(1))
- Verify by querying the svv_datashare_objects table that the dummy table public.include_new_fasle is not added automatically to datashare
  select * from svv_datashare_objects;
Role-based Access control for Amazon Redshift Data Sharing - Perform below steps for role based access control:
- Click bi-redshift-cluster under the Amazon Redshift Clusters tab
- Click on the Datashares tab and Connect to the database under Databases created in my namespace using the following temporary credentials: Database name: dev Database user: awsuser
- Create two roles for two different group of users. Data Analysts wants to have access to all underlying tables shared via data sharing. However, BI users need to run pre-defined reports with limited access.
  CREATE ROLE bi_users; CREATE ROLE data_analysts;
- Create users and assign roles to users respectively using below SQL:
  CREATE USER dan PASSWORD 'Redshift123' ; GRANT ROLE data_analysts to dan; CREATE USER sam PASSWORD 'Redshift123'; GRANT ROLE bi_users to sam;
- Create Views for all the tables shared via data sharing. External tables cannot be added to regular views and therefore create view with no schema binding
```
CREATE VIEW supplier_view AS SELECT * FROM tpc.public.supplier WITH NO SCHEMA BINDING;

CREATE VIEW lineitem_view AS SELECT * FROM tpc.public.lineitem WITH NO SCHEMA BINDING;

CREATE VIEW nation_view AS SELECT * FROM tpc.public.nation WITH NO SCHEMA BINDING;

CREATE VIEW orders_view AS SELECT * FROM tpc.public.orders WITH NO SCHEMA BINDING;

CREATE VIEW part_view AS SELECT * FROM tpc.public.part WITH NO SCHEMA BINDING;

CREATE VIEW partsupp_view AS SELECT * FROM tpc.public.partsupp WITH NO SCHEMA BINDING;

CREATE VIEW region_view AS SELECT * FROM tpc.public.region WITH NO SCHEMA BINDING;
```
- Create View aggregating lineitem data to the supplier level. This data needs to be visible just to BI users.
```
CREATE VIEW supplier_shipmode_agg as
    select l_suppkey, l_shipmode, datepart(year, L_SHIPDATE) l_shipyear,
    SUM(L_QUANTITY) TOTAL_QTY,
    SUM(L_DISCOUNT) TOTAL_DISCOUNT,
    SUM(L_TAX) TOTAL_TAX,
    SUM(L_EXTENDEDPRICE) TOTAL_EXTENDEDPRICE
    from tpc.public.lineitem
    group by 1,2,3
    WITH NO SCHEMA BINDING;
```
- Grant required access on views to data_analysts role.
```
    GRANT SELECT ON 
    lineitem_view, nation_view, orders_view, part_view,
    partsupp_view, region_view, supplier_view
    TO ROLE data_analysts;
```
- Grant required access on supplier_shipmode_agg view to bi_users role.
  GRANT SELECT ON supplier_shipmode_agg TO ROLE bi_users;
- Connect to bi-redshift-cluster using dan credentials. You will need to right click on bi-redshift-cluster and then select Edit connection
- Verify that data analysts (dan) can access all 1:1 views and cannot access supplier_shipmode_agg view. Run below query to confirm that data analyst(dan) can access 1:1 views
  Select * from nation_view limit 100;
  Run below query to confirm that data analyst(dan) cannot access supplier_shipmode_agg view
  Select * from supplier_shipmode_agg limit 100;
- Connect to bi-redshift-cluster using dan credentials. You will need to right click on bi-redshift-cluster and then select Edit connection
- Verify that sam has access to supplier_shipmode_agg view but cannot access 1:1 views. Run below query to confirm that bi_user(sam) can access supplier_shipmode_agg view
  Select * from supplier_shipmode_agg limit 100;
  Run below query to confirm that bi_user(sam) cannot access 1:1 views
  Select * from nation_view limit 100;

Resources

Visit this page to find the latest documentation.