Important
This post is to help customers who are looking for AES encryption and decryption UDF implementation in Amazon Redshift to answer “How AES encryption and decryption UDF can be implemented in Amazon Redshift?”. This post should not be considered as “Best Practice” for such implementation.
How to implement AES Encryption & Decryption UDF in Amazon Redshift
What is this post about?
This post is to help customers who are looking for AES encryption and decryption UDF implementation in Amazon Redshift to answer "How AES encryption and decryption UDF can be implemented in Amazon Redshift?".
Beside Amazon Redshift cluster level encryption, from data governance and protection perspective often customers want to use Advanced Encryption Standard (AES)for their sensitive data to be stored in Redshift database columns. As data loading is part of ELT, this is often helpful for ELT developers and architects to simplify their ELT process for Amazon Redshift and reducing development efforts using User Defined Function where developers need to encrypt data. At the same time, authorized users and data scientist can also leverage the decrypted data using decryption UDF.
Advanced Encryption Standard (AES)
The Advanced Encryption Standard (AES) is a specification for the encryption of electronic data established by the U.S. National Institute of Standards and Technology (NIST) in 2001.
UDF Deployment
These user defined functions use PyPI pyaes module to encrypt and decrypt data using AES encrypt and decrypt methods.
AES encryption supports 128 bits (16 bytes), 192 bits (24 bytes) or 256 bits (32 bytes), any key length other than supported will throw error during encryption and/or decryption.
See for more details: https://github.com/ricmoo/pyaes
Create library
CREATE OR REPLACE LIBRARY pyaes
LANGUAGE plpythonu
FROM 'https://tinyurl.com/redshift-udfs/pyaes.zip?raw=true';
;
Create encrypt function
--- Create a separate schema to deploy encryption UDF
CREATE SCHEMA udf_enc;
--- Create UDF
CREATE OR REPLACE FUNCTION udf_enc.aes_encrypt(input VARCHAR(max), vKey VARCHAR(256))
RETURNS VARCHAR STABLE AS $$
import pyaes
import binascii
if input is None:
return None
key = vKey # Your Key here
aes=pyaes.AESModeOfOperationCTR(key)
cipher_txt=aes.encrypt(input)
cipher_txt2=binascii.hexlify(cipher_txt)
return str(cipher_txt2.decode('utf-8'))
$$ LANGUAGE plpythonu ;
Create decrypt function
--- Create a separate schema to deploy Decryption UDF to control decryption access to only authorized users.
CREATE SCHEMA udf_dec;
--- Create UDF
CREATE OR REPLACE FUNCTION
udf_dec.aes_decrypt(encrypted_msg varchar(max), vKey VARCHAR(256))
RETURNS VARCHAR STABLE AS $$
import pyaes
import binascii
if encrypted_msg is None or len(str(encrypted_msg)) == 0:
return None
key = vKey # Your decryption key here
aes = pyaes.AESModeOfOperationCTR(key)
encrypted_msg2=binascii.unhexlify(encrypted_msg)
decrypted_msg2 = aes.decrypt(encrypted_msg2)
return str(decrypted_msg2.decode('utf-8'))
$$ LANGUAGE plpythonu ;
Test functionality
Setup Test Environment and Data
--- Create schema to contain sensitive data
CREATE SCHEMA secure_edw_t;
--- Create views schema
CREATE SCHEMA edw_v;
--- Create a sample user
CREATE USER edw_v_read password 'B!@nK73x7';
--- Grant USAGE privileges on schema to make object visible to user (this will NOT make data visible to user)
GRANT USAGE ON SCHEMA secure_edw_t TO edw_v_read;
GRANT USAGE ON SCHEMA edw_v TO edw_v_read;
--- Create table
CREATE TABLE secure_edw_t.emp_secure
(
emp_id int,
emp_name VARCHAR(255),
emp_phone varchar(255),
emp_name_enc VARCHAR(255),
emp_phone_enc varchar(255)
);
--- Create view to project only encrypted values to the user
CREATE VIEW edw_v.emp
AS
(
SELECT emp_id
, emp_name_enc as emp_name
, emp_phone_enc as emp_phone
FROM secure_edw_t.emp_secure );
GRANT SELECT ON edw_v.emp TO edw_v_read;
--- Populate test data in Table
INSERT INTO secure_edw_t.emp_secure (emp_id, emp_name, emp_phone)
Values (1, 'Azhar', '2341232345');
INSERT INTO secure_edw_t.emp_secure (emp_id, emp_name, emp_phone)
Values (2, 'Humayun', '4323445676');
INSERT INTO secure_edw_t.emp_secure (emp_id, emp_name, emp_phone)
Values (3, 'Rawish', '2221233213');
INSERT INTO secure_edw_t.emp_secure (emp_id, emp_name, emp_phone)
Values (4, 'Khayyam', '9808987658');
INSERT INTO secure_edw_t.emp_secure (emp_id, emp_name, emp_phone)
Values (5, 'Kawish', '2342342456');
INSERT INTO secure_edw_t.emp_secure (emp_id, emp_name, emp_phone)
Values (6, 'Shariq', '6575768475');
INSERT INTO secure_edw_t.emp_secure (emp_id, emp_name, emp_phone)
Values (7, 'Qasim', '3213453234');
See GRANT for more details about granting appropriate permission in Amazon Redshift.
Encrypting Data
In this simple test, we are using different keys for corresponding columns for encryption.
- emp_name is encrypted using key
empnameKey/fhci4=dnv73./xorb3f05
- emp_phone is encrypted using key
empphoneKey29s0vne03]jv023n=bn34
SQL
--- Encrypt data
UPDATE secure_edw_t.emp_secure
SET emp_name_enc = udf_enc.aes_encrypt(emp_name, LPAD('empnameKey/fhci4=dnv73./xorb3f05', 32, 'z')),
emp_phone_enc = udf_enc.aes_encrypt(emp_phone, LPAD('empphoneKey29s0vne03]jv023n=bn34', 32, 'z'))
;
SELECT * FROM secure_edw_t.emp_secure;
Result
Actual data in table will be like below:
emp_id | emp_name | emp_phone | emp_name_enc | emp_phone_enc |
---|---|---|---|---|
1 | Azhar | 2341232345 | e8a769c636 | 9345901d200893aeca19 |
2 | Humayun | 4323445676 | e1a86cc63d3194 | 9545961f260f94abc91a |
3 | Rawish | 2221233213 | fbbc76ce372c | 9344961d200892afcf1f |
4 | Khayyam | 9808987658 | e2b560de3d2597 | 984e94142b0396abcb14 |
5 | Kawish | 2342342456 | e2bc76ce372c | 9345901e210f93a9cb1a |
6 | Shariq | 6575768475 | fab560d52d35 | 97439319250d99a9c919 |
7 | Qasim | 3213453234 | f8bc72ce29 | 9244951f260e92afcd18 |
Test decryption and user data visibility
Test values decryption
Using correct key
SQL
SELECT emp_id,
udf_dec.aes_decrypt(emp_name, LPAD('empnameKey/fhci4=dnv73./xorb3f05', 32, 'z')) emp_name,
udf_dec.aes_decrypt(emp_phone, LPAD('empphoneKey29s0vne03]jv023n=bn34', 32, 'z')) emp_phone
FROM edw_v.emp;
Result
emp_id | emp_name | emp_phone |
---|---|---|
1 | Azhar | 2341232345 |
2 | Humayun | 4323445676 |
3 | Rawish | 2221233213 |
4 | Khayyam | 9808987658 |
5 | Kawish | 2342342456 |
6 | Shariq | 6575768475 |
7 | Qasim | 3213453234 |
Using incorrect key
SQL
SELECT emp_id,
udf_dec.aes_decrypt(emp_name, LPAD('empWRONGKey/fhci4=dnv73./xorb3', 32, 'z')) emp_name,
aes_decrypt(emp_phone, LPAD('phoneWRONGKey29s0vne03]jv023n=', 32, 'z')) emp_phone
FROM edw_v.emp;
Result
SQL Error [500310] [XX000]: [Amazon](500310)
Invalid operation:
UnicodeDecodeError: 'utf8' codec can't decode
byte 0xfb in position 0: invalid start byte.
Please look at svl_udf_log for more information
Details:
-----------------------------------------------
error: UnicodeDecodeError: 'utf8' codec can't decode byte 0xfb in position 0: invalid start byte. Please look at svl_udf_log for more information
code: 10000
context: UDF
query: 3405510
location: udf_client.cpp:369
process: query0_2002_3405510 [pid=43012]
-----------------------------------------------;
Check for error details if error occurred
SELECT * FROM svl_udf_log;
See svl_udf_log for more details
- Test data visibility for user
Change session user
We are using SET SESSION AUTHORIZATION
command to switch user session temporary to “edw_v_read”. If you are not running this test using superuser, then you have to logoff and login using the “edw_v_read” credentials to test.
See SET SESSION AUTHORIZATION for more details.
SET SESSION AUTHORIZATION 'edw_v_read';
Read data from view
SQL
SELECT *
FROM edw_v.emp;
Result
emp_id | emp_name | emp_phone |
---|---|---|
1 | e8a769c636 | 9345901d200893aeca19 |
2 | e1a86cc63d3194 | 9545961f260f94abc91a |
3 | fbbc76ce372c | 9344961d200892afcf1f |
4 | e2b560de3d2597 | 984e94142b0396abcb14 |
5 | e2bc76ce372c | 9345901e210f93a9cb1a |
6 | fab560d52d35 | 97439319250d99a9c919 |
7 | f8bc72ce29 | 9244951f260e92afcd18 |
Attempt to read data from table
SQL
SELECT *
FROM secure_edw_t.emp_secure;
Result
SQL Error [500310] [42501]: [Amazon](500310)
Invalid operation: permission denied for relation emp_secure;