Implementing Attribute-Level Encryption with DynamoDB

Matt Perreault
Published in Real Kinetic Blog · May 1, 2024 · 7 min read

There will come a time in your data/software engineering career when you will be working with sensitive data. When that time comes, you will be expected to have a security story for that data. Most, if not all, modern cloud-managed database systems offer encryption at rest, and for most use cases that level of protection is fine. Note, however, that the data is still visible as cleartext when viewed from the AWS console or when queried, provided you have the proper IAM access.

However, you may find yourself working with extremely sensitive data, such as bank routing numbers or health data. In that case, the transparent encryption AWS provides may not be sufficient, and the data should only be decrypted when it needs to be read directly in the application. Application-level encryption does not rely on the underlying transport and at-rest encryption. This provides an additional layer of security: to access the data, you need both access to the database and its transparent encryption key as well as the application-level encryption key.

In this post, I am going to demonstrate a pattern for implementing attribute-level encryption for Amazon DynamoDB with KMS in Python using Lambda as the runtime. The test system is a simple event-driven architecture that is made up of S3, Lambda and DynamoDB. Even though this project is using DynamoDB as the data store, this pattern can be used for any other database or storage system. The infrastructure is built and managed using AWS CDK.

If you would like to skip straight to the full code base for this article you can find it at the Real Kinetic open source code lab project https://gitlab.com/real-kinetic-oss/code-lab/encrypt-dynamo.

In order to run through this project, you will need a few setup pieces in place:

  1. AWS Account
  2. AWS CDK
  3. Python >= 3.10

Make sure your ~/.aws/config and ~/.aws/credentials files are set up with a default region and your AWS credentials so CDK can pull them. An example config and credentials file are shown below.

# ~/.aws/config
[default]
region=us-east-1
# ~/.aws/credentials
[default]
aws_access_key_id = MYAWSACCESSKEYID
aws_secret_access_key = MYAWSSECRETACCESSKEY

If you have never worked with CDK before, you should run through the docs first before getting too far into this post: https://docs.aws.amazon.com/cdk/v2/guide/home.html

The reference architecture we are going to use to test our work is very simple. The workflow kicks off when a JSON file is uploaded to the S3 bucket. That file contains the data we want to encrypt and put into the DynamoDB table: a newline-delimited JSON blob with both nested and top-level attributes that need to be encrypted.

{"id": "abc123", "encrypt": "I will be encrypted one day!", "secret_item": {"nested_attribute": "Encrypt me!"}}
{"id": "efg456", "encrypt": "I am sensitive data!"}

Now, let’s look at the stack and the infrastructure pieces needed in order to make this work.

Within our CDK stack, we start by creating our KMS key that will be used to encrypt the attributes.

from os import path
from constructs import Construct
from aws_cdk import (Stack, aws_dynamodb as dynamo, aws_iam as iam, aws_kms as kms,
                     aws_lambda as lambda_, aws_lambda_event_sources as event_sources, aws_s3 as s3)

class EncryptDynamoStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)
        key = kms.Key(self, "DynamoAttributeKey", enable_key_rotation=True)
        key.add_alias("alias/dynamo-attribute-key")

Next, we will define our Lambda function and then add an IAM policy to it that will allow us to encrypt and describe that key.

        encrypt_lambda = lambda_.Function(
            self,
            "EncryptDynamoLambda",
            runtime=lambda_.Runtime.PYTHON_3_12,
            handler="index.handler",
            code=lambda_.Code.from_asset(
                path.join(path.dirname(path.dirname(__file__)), "lambda_handler")
            ),
            environment={"DYNAMO_TABLE": "encrypted"},
        )

        encrypt_lambda.add_to_role_policy(
            iam.PolicyStatement(
                actions=["kms:Encrypt", "kms:DescribeKey"],
                effect=iam.Effect.ALLOW,
                resources=[key.key_arn],
            )
        )

The final pieces will be our S3 bucket which will start the pipeline and our DynamoDB table. Note that we will add an S3 event source to our Lambda which will act as our entry point to the system.

        landing_bucket = s3.Bucket(
            self,
            "DynamoLandingBucket",
            block_public_access=s3.BlockPublicAccess.BLOCK_ALL,
            encryption=s3.BucketEncryption.S3_MANAGED,
            enforce_ssl=True,
            versioned=False,
        )

        dynamo_table = dynamo.Table(
            self,
            "DynamoTable",
            table_name="encrypted",
            partition_key=dynamo.Attribute(
                name="id",
                type=dynamo.AttributeType.STRING,
            ),
            billing_mode=dynamo.BillingMode.PAY_PER_REQUEST,
        )

        encrypt_lambda.add_event_source(
            source=event_sources.S3EventSource(
                landing_bucket, events=[s3.EventType.OBJECT_CREATED_PUT]
            )
        )

        landing_bucket.grant_read(encrypt_lambda)
        dynamo_table.grant_read_write_data(encrypt_lambda)

That is all there is for the infrastructure. Now, let’s get into the fun stuff of the logic needed for our attribute-level encryption.

The general logic of the Lambda handler code is:

  1. Parse event for object and bucket keys
  2. Fetch records to encrypt from S3
  3. Encrypt records
  4. Batch write encrypted records to Dynamo

import json
import os

import boto3
from botocore.exceptions import ClientError

# Module-level clients are reused across warm Lambda invocations
s3_client = boto3.client("s3")
kms_client = boto3.client("kms")
table = boto3.resource("dynamodb").Table(os.environ["DYNAMO_TABLE"])

def handler(event, context):
    # 1) Parse event for object and bucket keys
    s3_inputs = [
        {"Bucket": record["s3"]["bucket"]["name"], "Key": record["s3"]["object"]["key"]}
        for record in event.get("Records")
    ]

    # Resolve the key ID from the alias we created in the CDK stack
    key_response = kms_client.describe_key(KeyId="alias/dynamo-attribute-key")
    key_id = key_response["KeyMetadata"]["KeyId"]

    for s3_input in s3_inputs:
        # 2) Fetch records to encrypt from S3
        records = get_records(s3_input)
        # 3) Encrypt records with KMS key
        encrypted_records = encrypt_records(records, key_id)
        # 4) Batch write encrypted records to Dynamo
        write_records(encrypted_records)

    return 200

A few important pieces of information to call out. First, we parse the event, building a list of dictionaries that reference the S3 bucket and object key for each object put in S3.
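
For reference, the S3 put event the handler parses looks roughly like this (trimmed to just the fields we read; the bucket name and key are placeholders):

# Trimmed S3 put event -- only the fields the handler actually reads
event = {
    "Records": [
        {
            "s3": {
                "bucket": {"name": "my-landing-bucket"},
                "object": {"key": "test.json"},
            }
        }
    ]
}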

We then need to get the KMS key ID that we defined in our CDK stack from earlier. Here I use the “describe_key” function on the KMS Boto3 client to get that information.

Then we iterate over each of our S3 bucket-key dictionaries and run our algorithm.

The “get_records” function returns a list of records that will be written to the database. Since this is pretty standard stuff I won’t go into much detail but will paste the code below for anyone interested.

def get_records(s3_input: dict[str, str]) -> list[dict]:
    response = s3_client.get_object(
        Bucket=s3_input.get("Bucket"), Key=s3_input.get("Key")
    )
    records = response.get("Body").read().decode("utf-8")
    return [json.loads(record) for record in records.strip().split("\n")]

Now we get to the crux of the article: encrypting a subset of attributes that contain sensitive information. If you recall from above, our data has an “encrypt” field and a “nested_attribute” field. These are the two fields I am targeting for encryption. At the global level of our module (outside the handler) I have defined a set that contains the fields I want to encrypt.

encrypt_set = {"encrypt", "nested_attribute"}

The “encrypt_records” function takes our list of records and the ID of the KMS key we will use to encrypt the data. It iterates over each record and, using a dictionary comprehension, calls the “encrypt_field” function on each key-value pair. It returns a list of our encrypted records. This is where Python’s list and dictionary comprehensions make these tasks very clean.

def encrypt_records(records: list[dict], key_id: str) -> list[dict]:
    encrypted_records = []
    for record in records:
        encrypted_records.append(
            {k: encrypt_field(k, v, key_id) for k, v in record.items()}
        )

    return encrypted_records

The “encrypt_field” function will recursively check the record for the attributes that need to be encrypted and if it finds a field it should encrypt it will then use the KMS Boto3 client to encrypt the attribute and replace the plaintext attribute with the encrypted cipher text blob. Let’s take a peek at the code.

def encrypt_field(field: str, attribute: str | dict, key_id: str) -> str | bytes | dict:
    if isinstance(attribute, dict):
        # Recurse into nested dictionaries so every sensitive field is covered
        return {k: encrypt_field(k, v, key_id) for k, v in attribute.items()}
    if field in encrypt_set:
        return kms_client.encrypt(
            KeyId=key_id, Plaintext=bytes(attribute, "utf-8")
        ).get("CiphertextBlob")

    return attribute
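
To make the recursion concrete, here is roughly what the first sample record looks like after running through these functions (the ciphertext bytes are placeholders; real KMS output will differ):

record = {
    "id": "abc123",
    "encrypt": "I will be encrypted one day!",
    "secret_item": {"nested_attribute": "Encrypt me!"},
}

encrypted = encrypt_records([record], key_id)[0]
# "id" is not in encrypt_set, so it passes through unchanged.
# "encrypt" is replaced with the raw CiphertextBlob bytes from KMS.
# "secret_item" is a dict, so encrypt_field recurses and encrypts
# "nested_attribute". The result is shaped like:
# {
#     "id": "abc123",
#     "encrypt": b"\x01\x02...",
#     "secret_item": {"nested_attribute": b"\x01\x02..."},
# }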

Once the program runs through our “encrypt_field” and “encrypt_records” functions, we now have a list of our encrypted items that are ready to be written to Dynamo.

Our final step is to write these records to the Dynamo table. Here, I am using the “batch_writer” in a context manager, which gives us some nice features such as automatically retrying unprocessed items in the batch.

def write_records(records: list) -> None:
    with table.batch_writer(overwrite_by_pkeys=["id"]) as batch:
        for record in records:
            try:
                batch.put_item(Item=record)
            except ClientError:
                print(f"Failed to put item: {record.get('id')} to dynamo")
                raise

Time to test out our system! Run through a few quick CDK commands to synthesize the CloudFormation stack, bootstrap, and then deploy to AWS.

$ cdk synth
$ cdk bootstrap
$ cdk deploy

Once the stack is deployed, go to the AWS console and make sure that everything is as expected. I like to go to the Lambda console and click Applications on the left-hand side to see all the pieces of the application. If you followed the code I have put up, your application should be named EncryptDynamoStack.

You should see the resources we defined: the Lambda function, the S3 bucket, the DynamoDB table, and the KMS key.

Navigate to the S3 bucket that was created and upload a test JSON file. This will kick off the application, and assuming everything ran correctly, you should now see your encrypted and unencrypted attributes in your Dynamo table.
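
For example, you could upload the sample file with boto3 (the bucket name here is a placeholder; use the generated name from the cdk deploy output or the S3 console):

import boto3

# Placeholder name -- CDK generates the real bucket name at deploy time
bucket_name = "encryptdynamostack-dynamolandingbucket-example"
boto3.client("s3").upload_file("test.json", bucket_name, "test.json")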

In order to decrypt these values, you would create an IAM policy with the “kms:Decrypt” and “kms:DescribeKey” actions on the KMS key created in the CDK stack and attach it to the resource in charge of decrypting these attributes. Then run the pattern described in this post in reverse, this time calling the KMS client’s “decrypt” function on the same attributes.
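
As a rough sketch of what that could look like (this “decrypt_field” helper is illustrative, not part of the repository code), the read path mirrors “encrypt_field” almost exactly:

import boto3

kms_client = boto3.client("kms")
# Mirrors the encrypt_set used on the write path
decrypt_set = {"encrypt", "nested_attribute"}

def decrypt_field(field: str, attribute, key_id: str):
    # Recurse into nested dictionaries, mirroring encrypt_field
    if isinstance(attribute, dict):
        return {k: decrypt_field(k, v, key_id) for k, v in attribute.items()}
    if field in decrypt_set:
        # Dynamo's resource layer returns binary attributes wrapped in a
        # Binary object; .value unwraps the raw ciphertext bytes
        ciphertext = attribute.value if hasattr(attribute, "value") else attribute
        return kms_client.decrypt(KeyId=key_id, CiphertextBlob=ciphertext)[
            "Plaintext"
        ].decode("utf-8")
    return attribute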

I hope you found this tutorial helpful! If you have any questions, please feel free to leave a comment and I will get to it as soon as I can. Or if you need help building or managing your cloud stack please reach out to us at Real Kinetic — we are happy to set up a meeting!
