Deploying an AWS Lambda Function that Processes Historical Data from a DynamoDB Table and Publishes Messages to an Amazon SNS Topic (using Terraform)

In the television and movie industry, attracting viewers is the most important part of the business. Some shows have won wide acclaim thanks to the love and support of their viewers, while others have vanished into thin air after failing to appeal to a larger audience. For the many decades that entertainment has shaped our lives, mass media conglomerates have focused primarily on advertising and promotion to draw big crowds to every TV show or movie they release. Over the past few years, with the introduction of more sophisticated data collection techniques and marketing analytics, these companies have started using historical and average viewership data to determine whether a show is still in fashion with the public or has lost its place over time. That helps them make the big decision: renew the show for another season, or part ways after the current season ends. Historical data collection and analysis would be a real pain without ways to automate the process, but luckily the cloud eases this difficulty with auto-scaling databases and on-demand function instances that we can mold to work with our data. This blog will explore how you can use the Amazon Web Services cloud provider to set up a Lambda Function that compares historical viewership data for a particular TV show, stored in a NoSQL DynamoDB table, against the current numbers, and informs the promotional team of the show's reception via a message sent to a Simple Notification Service (SNS) topic they are subscribed to. The entire deployment process will be performed via Terraform, currently a leading Infrastructure-as-Code (IaC) tool in the IT world.

Figure 1 shows an architecture diagram of the entire workflow for visual clarity and understanding. The following segments will dive into each part of this workflow in more depth and explain how they were set up and accomplished.

· Setting up the main Terraform script to create an AWS Lambda Function and role to access the function

· Attaching policies to the role defined to access DynamoDB, SNS, and Simple Queue Service (SQS) from the Lambda Function

· Enabling the DynamoDB Trigger on the Lambda Function to process all records in the table as soon as a change is encountered

· Writing the actual function code to read records from the DynamoDB table, compare their values against historical and average parameters, and upload messages to the SNS topic based on the trend observed

· Verifying the contents of the message delivered to the topic by setting up a subscriber queue and polling the messages in the queue

Figure 1. Workflow diagram for deploying an AWS Lambda Function that compares current viewership response with historical trends and uploads show continuation suggestions to SNS topics

Setting up the main Terraform script to create an AWS Lambda Function and role to access the function

While all cloud providers offer a console or portal to help users create and deploy resources, infrastructure as code, or IaC, has become the leading approach to resource development and deployment in recent times. This is primarily because it lets you launch resources from anywhere in the world with simple declarative configuration files, needing nothing beyond your code editor and a command line, unlike more traditional approaches to setting up infrastructure. Amongst the popular IaC tools in the industry, like Chef, Puppet, and Ansible, Terraform has earned its place for popularity and ease of use in IT-enabled businesses. Figure 2 below shows the Terraform script I developed to set up a basic Lambda Function and a role to access the function.

Figure 2. Terraform script to set up the AWS provider and credentials and create the Lambda Function resource and corresponding role

Let’s break this down into parts for better understanding. The first thing every main Terraform script must have is access to the particular cloud provider it is creating and deploying resources to; in our case, that is the Amazon Web Services (AWS) provider. We must specify the provider version(s) our script is compatible with, the region we want to place our resources in, and, last but not least, the access and secret keys for our user account in AWS (for anything beyond a demo, prefer environment variables or a shared credentials file over keys hard-coded in the script). The keys can be found under Identity and Access Management (IAM) > Access Management > Users > Security Credentials > Access Keys, as shown in Figure 3.

Figure 3. Access and secret keys used to log in to a particular user account on AWS from Terraform

Next, the role the Lambda Function will assume is created, along with the basic trust policy that allows this access. This and the other resource definitions used in this tutorial can be found in the Terraform Registry under the AWS provider. Lastly, the function itself is set up by specifying the zip file with the code located inside of it, the name of the function, the role the function will be using (created above), and the runtime it will use to execute the code.
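Since Figure 2 itself may not render everywhere, here is a hedged Terraform sketch of the shape such a script typically takes. The file names, role and function names, and runtime below are illustrative assumptions, not necessarily the author's exact values:

```hcl
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 4.0"
    }
  }
}

# Credentials can also come from environment variables or a shared
# credentials file instead of being passed in as variables.
provider "aws" {
  region     = "us-east-1"
  access_key = var.access_key
  secret_key = var.secret_key
}

# Role that the Lambda Function assumes at execution time; the trust
# policy allows the Lambda service to use it.
resource "aws_iam_role" "lambda_role" {
  name = "demo-lambda-role"
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Action    = "sts:AssumeRole"
      Principal = { Service = "lambda.amazonaws.com" }
    }]
  })
}

# The function itself: zip file with the code, name, role, and runtime.
resource "aws_lambda_function" "views_processor" {
  filename         = "function.zip"
  function_name    = "demo-show-views-processor"
  role             = aws_iam_role.lambda_role.arn
  handler          = "index.handler"
  runtime          = "nodejs14.x"
  source_code_hash = filebase64sha256("function.zip")
}
```

Running `terraform init` followed by `terraform apply` against a script of this shape creates the role and function in one go.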

Attaching policies to the role defined to access DynamoDB, SNS, and SQS from the Lambda Function

To obtain historical viewership data for a particular television show, an Amazon DynamoDB table, which stores unstructured, non-relational data as key/value pairs, can be used. For this tutorial, a table holding this data has already been created, and it will be linked to the Lambda Function so the data can be processed in the function code, explained in the next section. To allow the function to access this table and retrieve records from it, a policy must be attached to the Lambda Function role created in the previous section. Similarly, policies to access SNS topics and SQS queues and to upload messages to them must be created and attached to the same role, so that the teams can be alerted to the success status of a TV show, as explained in later sections. Policies grant roles access to particular resources/services on AWS, which cannot be accessed by default. Figures 4–12 show how to set up these policies as separate resources and create role policy attachments in the main Terraform script.

Figure 4. IAM Policy that provides full access to DynamoDB — 1
Figure 5. IAM Policy that provides full access to DynamoDB — 2
Figure 6. IAM Policy that provides full access to DynamoDB — 3 and role policy attachment to Lambda Function role
Figure 7. IAM Policy that provides Lambda Functions access to DynamoDB and role policy attachment to Lambda Function role
Figure 8. IAM Policy that provides role access to SNS and role policy attachment to Lambda Function role
Figure 9. IAM Policy that provides full access to SNS and role policy attachment to Lambda Function role
Figure 10. IAM Policy that provides Lambda Functions basic execution access and role policy attachment to Lambda Function role
Figure 11. IAM Policy that provides Lambda Functions access to publish to SNS and role policy attachment to Lambda Function role
Figure 12. IAM Policy that provides full access to SQS and role policy attachment to Lambda Function role

It may be hard to see how I came up with the list of actions for each policy, especially the DynamoDB one. Luckily, definitions for all of these policies are available under IAM > Roles for the role we created in the section above: just click Attach policies and type in the resource you would like to connect the role with. Figure 13 shows the full-access DynamoDB policy definition, the same one used to create the actual policy via Terraform.

Figure 13. Full Access to DynamoDB Policy Definition (in the IAM console)
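For reference, one such policy and its attachment can be sketched in Terraform roughly as follows. The resource names are illustrative, and the role reference assumes the role resource was named lambda_role in the earlier script:

```hcl
# Full-access DynamoDB policy, mirroring the console definition.
resource "aws_iam_policy" "dynamodb_full_access" {
  name = "demo-dynamodb-full-access"
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect   = "Allow"
      Action   = "dynamodb:*"
      Resource = "*"
    }]
  })
}

# Attach the policy to the Lambda Function role created earlier.
resource "aws_iam_role_policy_attachment" "dynamodb_attach" {
  role       = aws_iam_role.lambda_role.name
  policy_arn = aws_iam_policy.dynamodb_full_access.arn
}
```

The SNS, SQS, and basic execution policies follow the same pattern, each with its own policy document and attachment resource.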

The same technique was followed to create all the other policies as well. The next sections will delve into how these resources link together to perform the overall task of determining viewer response: the Lambda Function, the DynamoDB table, and the SNS topics and SQS queues respectively.

Enabling the DynamoDB Trigger on the Lambda Function to process all records in the table as soon as a change is encountered

To make the process repeatable and more intelligent, a trigger is created that executes the Lambda Function to process viewership records from the DynamoDB table; it fires every time a new viewership record for an episode is uploaded to the table. Figures 14–17 show where to add a trigger on the Lambda Function and how to configure it to connect to the DynamoDB table of interest. Something important to note here is the Amazon Resource Name (ARN) used to reference the actual table, demoShowViews. The ARN is like an address for the table, encoding its location (region) and the type of resource it is (DynamoDB). An ARN must be specified for any AWS resource when creating links between resources, or even when simply referencing one.

Figure 14. Adding a DynamoDB trigger to the Lambda Function (Step 1)
Figure 15. Adding a DynamoDB trigger to the Lambda Function (Step 2)
Figure 16. Adding a DynamoDB trigger to the Lambda Function (Step 3)
Figure 17. Adding a DynamoDB trigger to the Lambda Function (Step 4)

Once this process is complete and the trigger is in the enabled state, as shown in Figure 17, we are ready to test whether adding a new record really does trigger the function immediately and let it execute its code to compare viewership against history and the average: the focus of our next section.
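As an aside, the same trigger can be expressed in Terraform instead of the console, as an event source mapping. Note that the mapping consumes the table's stream ARN (DynamoDB Streams must be enabled on the table), not the table ARN itself. A hedged sketch with placeholder values, assuming the function resource was named views_processor:

```hcl
resource "aws_lambda_event_source_mapping" "show_views_trigger" {
  # Stream ARN of the demoShowViews table (placeholder shown here).
  event_source_arn  = "arn:aws:dynamodb:us-east-1:123456789012:table/demoShowViews/stream/2021-01-01T00:00:00.000"
  function_name     = aws_lambda_function.views_processor.arn
  starting_position = "LATEST"
  batch_size        = 100
}
```

Either way, the result is the same: each batch of stream records is handed to the function as the event it receives.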

Writing the actual function code to read records from the DynamoDB table, compare their values against historical and average parameters, and upload messages to the SNS topic based on the trend observed

Setting up the DynamoDB trigger on the Lambda Function in the previous section automatically creates a handler with an asynchronous function that parses values out of each record. We are going to modify this starter code into what is shown in Figure 18.

Figure 18. Lambda Function body to parse records from the DynamoDB table and publish corresponding messages to the SNS topic

Alright, this is a lot to take in at once, so let’s break it down part by part. The first thing we need when working with DynamoDB, or any other AWS resource, inside a Lambda Function is the aws-sdk. It provides the library of classes and methods each AWS service exposes for performing actions on the data it works with. Once the SDK has been required into a variable, here conveniently called AWS, its configuration must be updated with the region in which the resources are located. In this demo I created all the resources in us-east-1, the N. Virginia region, and hence provide that as the input to the SDK configuration.

After the SDK set-up is complete, we can dive into the handler, whose purpose is to “export” the object it contains, which in this case is an asynchronous function body. The function takes two inputs, an event and a context. The event is what we’re interested in, since it carries the information about why the function was triggered: details about a new record insertion, modification of an existing record, or deletion of a record. Our goal is to pull values out of the records that were inserted or changed, and the natural way to do this is a for loop that iterates through all the records in the event. Since this is a DynamoDB trigger event, each record (delivered in JSON format) carries a dynamodb object with key/value pairs describing the change in the table we’re reading from. For the purposes of this demo, I will only be inserting new records (corresponding to new episodes and their respective view counts), so each record’s dynamodb object will contain a NewImage; an OldImage appears only when an existing record is modified or deleted. The current episode number and its view count are parsed out of the NewImage under the episode and views keys respectively, and stored in variables for later comparison against the other records’ viewership values.
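As a concrete illustration of the record parsing just described, here is a minimal pure-function sketch. The attribute names episode and views follow the demo table, and the sample event shape mirrors what DynamoDB Streams delivers (numeric values arrive wrapped in DynamoDB's typed format, e.g. { N: "4" }):

```javascript
// Extract the episode number and view count from one DynamoDB Streams
// record. Numeric attributes come wrapped as { N: "..." } strings, so
// they must be converted back to numbers.
function parseNewImage(record) {
  const image = record.dynamodb.NewImage;
  return {
    episode: Number(image.episode.N),
    views: Number(image.views.N),
  };
}

// Shape of one record from a DynamoDB Streams INSERT event:
const sampleRecord = {
  eventName: 'INSERT',
  dynamodb: {
    NewImage: {
      episode: { N: '4' },
      views: { N: '1250000' },
    },
  },
};

console.log(parseNewImage(sampleRecord)); // { episode: 4, views: 1250000 }
```

In the real handler this function would be called once per entry of event.Records inside the for loop.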

The next step is to set up the DynamoDB Document Client, a class provided by the aws-sdk that lets the Lambda Function submit queries and work with items in the table of interest. We initialize this client with the region in which the table is located, again us-east-1 in my case, and store it in a variable for later use. After that, a query to search the demoShowViews table for a specific episode's view count is formatted and stored in a variable called params. As is evident, this looks nothing like a traditional SQL SELECT statement: this is a NoSQL table holding non-relational values, so the query is expressed as a JSON-style parameter object naming the table and key. I have placed the query formatting and the remaining logic inside a for loop so that every record in the table, up to and including the current episode's record, is retrieved and parsed. To actually fetch a record, the params are passed to the get method on the document client, and the promise method is chained on to convert the callback-style request into a promise, so that either a success or an error result always comes back. The await keyword in front of this statement makes the function wait for the promise to resolve, since database table and query operations generally take some time to go through. The result of the query, a record from the demoShowViews table, is stored in JSON form in a constant called data, whose Item property holds the viewership values for that record.
Inside the same loop, I've added a counter and a running sum: the counter tracks how many episodes have a higher viewership than the current one, and the sum accumulates total views so an average can be computed over all the episodes. Once the loop completes, the average number of views across all episodes telecast so far is stored in a variable called avgViews, to be used in a mathematical comparison in the next block of code.
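The counter-and-sum logic of that loop can be sketched as a small pure function, separated here from the AWS calls so it is easy to reason about (the function name and return shape are my own, not the author's):

```javascript
// Given the view counts of all previous episodes and the current
// episode's views, compute what the handler needs: how many earlier
// episodes out-performed the current one, and the average across all
// episodes including the current one.
function summarizeViews(previousViews, currentViews) {
  let higherCount = 0;
  let totalViews = currentViews;
  for (const views of previousViews) {
    if (views > currentViews) higherCount += 1;
    totalViews += views;
  }
  const avgViews = totalViews / (previousViews.length + 1);
  return { higherCount, avgViews };
}

console.log(summarizeViews([900000, 1100000, 1000000], 1250000));
// { higherCount: 0, avgViews: 1062500 }
```

In the actual handler, previousViews would be filled by the awaited DocumentClient get calls described above.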

Now that the response has been determined, it is time to publish it to a topic that the promotional and creative teams subscribe to, so they can learn how their show is performing. To do this, we will use the Amazon Simple Notification Service (SNS), which offers the ability to create topics that some users and applications publish to and that others subscribe to and read data from. For this demo, I’ve already created an SNS topic called show-viewership-responses.fifo, which supports first-in, first-out message storage and delivery, as shown in Figures 19–21.

Figure 19. Creating an SNS topic on the console (Step 1)
Figure 20. Creating an SNS topic on the console (Step 2)
Figure 21. Creating an SNS topic on the console (Step 3)

In order to publish to this topic, we will once again make use of the aws-sdk, this time via the publish method provided by the SNS class. Before a message can be published, it needs to be formatted into the structure SNS accepts, which, similar to the DynamoDB query input, is a JSON key/value pair structure. Since we are uploading to a FIFO topic, we must provide the message content, the topic ARN (again, essential for referencing any AWS resource), a message group ID, and a message deduplication ID. The group ID identifies the ordering group the message belongs to, while the deduplication ID must be unique for each message upload. This parameter object is stored in a variable and passed as the input to the publish method discussed above.

In this demo, three types of messages can be sent to the SNS topic: when the show is doing worse than the average viewership response, when it is doing better than average but viewership has declined overall, and when it is very successful and needs no further attention. The first condition is met when the current viewership numbers fall below the average determined in the previous section of code. The second condition is reached when the current numbers are above the average but still behind at least one previous episode's statistics. The final condition covers the remaining situation, where the show beats the average and is not behind even one previous episode. Once the message has been formatted according to the condition matched, the promise method is again chained on the SNS publish call to get a guaranteed success or error result, and an await retrieves parameters from the sent message, such as the message ID, confirming it was published to the topic without an error. This summarizes the entire code involved to retrieve records from a DynamoDB table, make comparisons amongst the data, and publish the overall response as a message to an SNS topic.
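The three-way decision and the FIFO publish input can be sketched together as one pure function. This is a hedged illustration: the topic ARN uses a placeholder account ID, the message wording is my own rather than the author's exact text, and the deduplication ID is made unique with a timestamp:

```javascript
// Decide which of the three messages applies and build the parameter
// object for SNS publish. For a FIFO topic, MessageGroupId and
// MessageDeduplicationId are both required, and the deduplication ID
// must be unique for every message.
function buildPublishParams(episode, views, avgViews, higherCount) {
  let message;
  if (views < avgViews) {
    // Condition 1: below the average viewership response.
    message = `Episode ${episode} fell below the average viewership. Consider winding the show down.`;
  } else if (higherCount > 0) {
    // Condition 2: above average, but trails at least one earlier episode.
    message = `Episode ${episode} beat the average but trails ${higherCount} earlier episode(s). Refresh the show with new ideas.`;
  } else {
    // Condition 3: above average and ahead of every earlier episode.
    message = `Episode ${episode} is the strongest yet. The show needs no further attention.`;
  }
  return {
    Message: message,
    TopicArn: 'arn:aws:sns:us-east-1:123456789012:show-viewership-responses.fifo', // placeholder account ID
    MessageGroupId: 'show-viewership',
    MessageDeduplicationId: `${episode}-${Date.now()}`, // must be unique per publish
  };
}
```

The returned object would then be handed to the SNS service object, awaiting `sns.publish(params).promise()` exactly as described above.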

Verifying the contents of the message delivered to the topic by setting up a subscriber queue and polling the messages in the queue

Although the message ID confirms that a message was published to the SNS topic, to read the contents of the message and ensure the right one was sent, we need the help of the Amazon Simple Queue Service, or SQS. SQS lets us set up queues to send or receive messages, either within the queue itself or from a topic. The only things we have to do on our side are create the queue (in this case a FIFO queue, since our topic is a FIFO topic), subscribe it to the topic of interest, and poll for messages after the trigger has fired and the function has executed to send messages to the topic. Figures 22–27 walk through this process step by step.

Figure 22. Creating an SQS queue on the console (Step 1)
Figure 23. Creating an SQS queue on the console (Step 2)
Figure 24. Creating an SQS queue on the console (Step 3)
Figure 25. Creating an SQS queue on the console (Step 4)
Figure 26. Creating an SQS queue on the console (Step 5)
Figure 27. Subscribing the SQS queue to the SNS topic on the console
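The same queue and subscription could alternatively be defined in Terraform rather than the console. A hedged sketch, with an illustrative queue name and a placeholder account ID in the topic ARN:

```hcl
# FIFO queue to receive messages from the FIFO topic.
resource "aws_sqs_queue" "viewership_queue" {
  name                        = "show-viewership-queue.fifo"
  fifo_queue                  = true
  content_based_deduplication = true
}

# Subscribe the queue to the SNS topic. The queue additionally needs a
# queue policy allowing the topic to send messages to it (omitted here
# for brevity).
resource "aws_sns_topic_subscription" "queue_sub" {
  topic_arn = "arn:aws:sns:us-east-1:123456789012:show-viewership-responses.fifo" # placeholder
  protocol  = "sqs"
  endpoint  = aws_sqs_queue.viewership_queue.arn
}
```

With the subscription in place, every message published to the topic lands in the queue, ready to be polled.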

To verify that the right messages are sent for the three scenarios outlined in the previous section, I inserted different kinds of records into the demoShowViews DynamoDB table. As can be seen in Figure 28, I initially placed only three records in the table for testing purposes. When I add a fourth record, deliberately given a higher view count than the other three, the Lambda Function triggers to process it and sends the success message (matching the final condition mentioned above) to the SNS topic. The message can be seen in its entirety when polled by the subscriber queue. Figures 29–31 show this test.

Figure 28. DynamoDB Table with initial testing records detailing episode number and view count
Figure 29. Fourth record with a higher view count has been inserted into the DynamoDB Table
Figure 30. A message has been delivered to the SNS topic and is polled in by the SQS queue
Figure 31. Contents of the message delivered to the SNS topic (satisfies condition 3)

The next test I conducted satisfies the first condition: the viewership numbers for the current episode are set lower than all previous episodes, meaning they fall below the average response. When this record was inserted into the table, the function was again triggered and sent the message shown in Figure 32 to the SNS topic, confirming that the response matches the first condition.

Figure 32. Contents of the message delivered to the SNS topic (satisfies condition 1)

The last test, as many of you have probably inferred already, validates the second condition: viewership response is better than the average but still behind a few earlier episodes in the series. As expected, the function triggers to analyze this record and publishes a message to the topic suggesting the show be refreshed with new ideas, depicted in Figure 33. From these tests, it is clear that the Lambda Function interfaces properly with DynamoDB and SNS/SQS, making the right conditional evaluations and uploading the appropriate messages to the topic of interest, for the promotional and creative teams to use in their decisions about the show.

Figure 33. Contents of the message delivered to the SNS topic (satisfies condition 2)

Bringing it back to focus…

There are many factors that drive the success of a TV show or movie, like creativity, innovation, dialogue, and direction, but the most important contributor of all is the viewer. Viewers have the power to decide whether a show runs for 17 seasons like Grey’s Anatomy or vanishes after a mere six episodes, and hence their response and ratings are what large mass media firms use to make concrete decisions on how to continue, or discontinue, a show. Until 15–20 years ago, everything had to be maintained in physical files, and people had to meticulously analyze large sets of data to make accurate observations and decisions. Thanks to the rapid, widespread adoption of cloud technology, we no longer need to rely on manual methods and can do everything in an automated, programmatic fashion with less physical infrastructure and data storage inside our firms. This blog gives some insight into how firms can use the various services in AWS, such as DynamoDB to store records, Amazon SNS to publish messages to topics, and Amazon SQS to read messages delivered from topics, to form a comprehensive and intelligent approach to analyzing viewership trends for TV shows. The great thing about this approach is that, even though I’ve demonstrated it on a very small data set, everything can be scaled out and up to accommodate business needs with minor configuration changes, making the same methods applicable to a much larger set of historical viewership data. Now that’s what we call the power of cloud technology.

BSMS Mechanical Engineering grad from Georgia Tech, now working as a DevOps Developer at Warner Media. I have a passion for cars and coding alike.