AWS Case Studies: Serverless Monitoring and Notification Pipeline
We recently worked with a Fortune 100 manufacturer of heavy equipment that is focused on quality, productivity, and effectively connecting its customers with data-driven insights via technology. As an international, publicly traded organization, it is also careful about managing security, risk, and compliance. So, when this manufacturer asked if we could set up an audit and notification system, we were happy to roll up our sleeves and begin work. (You can read the full case study of this Fortune 100 customer here.)
For this manufacturer, we set up an audit and notification system using serverless tools. The goal of the notification system was to alert the operations and information security teams to any known issues surfacing in an account: for example, a VPC that is running out of IP addresses, a violation of the corporate security standard, or an Amazon RDS database volume that is not encrypted or is missing a required tag. The customer wanted the system not only to be low maintenance but also extensible, so that new rules could be added conveniently and the same system could be used to audit multiple AWS accounts. We took a serverless approach to this notification system.
We implemented the core of the business logic as AWS Lambda functions, which make the system easy to manage and eliminate the need for traditional IT constructs like virtual machines and databases. We structured the system so that each rule is implemented by a separate Lambda function. These Lambda functions run periodically in a single account, auditing the customer accounts and generating the relevant notifications.
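As a sketch of what one such rule might look like, the following Lambda handler flags unencrypted RDS instances, one of the example checks mentioned above. The SNS topic ARN and all function names here are illustrative assumptions, not the customer's actual code:

```python
def find_unencrypted_rds(rds_client):
    """Return identifiers of RDS instances whose storage is not encrypted."""
    offenders = []
    paginator = rds_client.get_paginator("describe_db_instances")
    for page in paginator.paginate():
        for db in page["DBInstances"]:
            if not db.get("StorageEncrypted", False):
                offenders.append(db["DBInstanceIdentifier"])
    return offenders

def handler(event, context):
    import boto3  # imported lazily so the rule logic stays unit-testable
    offenders = find_unencrypted_rds(boto3.client("rds"))
    if offenders:
        boto3.client("sns").publish(
            TopicArn="arn:aws:sns:us-east-1:123456789012:audit-alerts",  # placeholder
            Subject="Unencrypted RDS instances found",
            Message="\n".join(offenders),
        )
    return offenders
```

Keeping the check itself as a pure function that takes a client makes each rule easy to test in isolation, which matters when new rules are added frequently.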
Handling Multiple Accounts
Being a large corporation, this customer has more than 50 AWS accounts that they want continuously audited and monitored. However, handling multiple AWS accounts raises a two-part challenge: (1) how does the system audit all accounts securely, and (2) how do administrators add new accounts for auditing? To solve this challenge, we maintain an Amazon DynamoDB table with the IDs of all their accounts. In addition to the account ID, the table contains a column with the ARN of the IAM role in each account. The Lambda functions loop through all accounts; for each account, they assume the appropriate cross-account AWS IAM role and run the rule check.
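A minimal sketch of that loop, assuming a hypothetical `audited-accounts` table with `AccountId` and `RoleArn` attributes (the table and attribute names are our illustration, not the customer's actual schema):

```python
def list_audited_accounts(dynamodb):
    """Read (account_id, role_arn) pairs from the account registry table.

    Scan pagination is omitted for brevity; at ~50 accounts one page suffices.
    """
    table = dynamodb.Table("audited-accounts")  # hypothetical table name
    return [(item["AccountId"], item["RoleArn"]) for item in table.scan()["Items"]]

def session_for(role_arn):
    """Assume the cross-account role and return a boto3 session scoped to it."""
    import boto3  # imported lazily so the registry logic stays unit-testable
    creds = boto3.client("sts").assume_role(
        RoleArn=role_arn, RoleSessionName="account-audit")["Credentials"]
    return boto3.Session(
        aws_access_key_id=creds["AccessKeyId"],
        aws_secret_access_key=creds["SecretAccessKey"],
        aws_session_token=creds["SessionToken"],
    )

def run_rule_everywhere(rule_check):
    """Apply one audit rule in every registered account."""
    import boto3
    for account_id, role_arn in list_audited_accounts(boto3.resource("dynamodb")):
        rule_check(account_id, session_for(role_arn))
```

Each rule Lambda can reuse the same loop, so adding an account never requires touching the rules themselves.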
Our AWS consultants created an automated process for adding a new account. To simplify the deployment of the IAM role in multiple accounts, we used AWS Service Catalog and its cross-account sharing feature. When a new account is added, the Service Catalog product is shared with the new account. The product is then deployed in the account, and the new account is added to the list of accounts in the DynamoDB mentioned earlier.
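The registration step might look roughly like the following sketch (the portfolio ID, table schema, and function name are hypothetical; launching the shared product inside the new account remains a separate step, as described above):

```python
def register_account(account_id, role_arn, servicecatalog, registry_table,
                     portfolio_id="port-EXAMPLE"):
    """Share the audit portfolio with a new account and add it to the registry."""
    # Share the Service Catalog portfolio containing the IAM-role product
    servicecatalog.create_portfolio_share(PortfolioId=portfolio_id,
                                          AccountId=account_id)
    # Record the account so the audit Lambdas start including it in their loop
    registry_table.put_item(Item={"AccountId": account_id, "RoleArn": role_arn})
```

The clients are passed in rather than created inside the function, which keeps the onboarding step testable and lets the same code run from a CLI or a Lambda.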
The most interesting part of the project is the deployment pipeline, implemented using AWS CodePipeline. The code for the Lambda functions is stored on an on-premises GitHub Enterprise server. CodePipeline is triggered whenever this code changes, e.g., when a function is added, removed, or modified.
To ease deployments, updates, and rollbacks, we use AWS CloudFormation for its built-in support of these operations. We set up each Lambda function as its own CloudFormation stack and wrap them all as nested stacks in an orchestration stack.
This top-level CloudFormation template is auto-generated by a Python script. The script reads the directories containing the code for each of the Lambda functions and uses the directory names to create the template. This makes the process for adding a new notification very simple: create a new directory and write the code in that folder. The Python script adds it to the template automatically, and CodePipeline deploys it.
We trigger the Python script as part of AWS CodePipeline (again created through CloudFormation). The script generates the orchestration template, adding, updating, or deleting the nested stacks as needed.
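A simplified version of such a generator script might look like this (the directory layout, S3 bucket name, and logical-ID convention are illustrative assumptions, not the customer's actual scheme):

```python
import json
from pathlib import Path

def build_orchestration_template(rules_dir, artifact_bucket="audit-artifacts"):
    """Emit a CloudFormation template with one nested stack per rule directory."""
    resources = {}
    for rule in sorted(p.name for p in Path(rules_dir).iterdir() if p.is_dir()):
        # Turn a directory name like "vpc-ip-check" into logical ID "VpcIpCheckStack"
        logical_id = "".join(part.capitalize() for part in rule.split("-")) + "Stack"
        resources[logical_id] = {
            "Type": "AWS::CloudFormation::Stack",
            "Properties": {
                # Each rule's packaged child template, uploaded by the pipeline
                "TemplateURL": f"https://{artifact_bucket}.s3.amazonaws.com/"
                               f"{rule}/template.yaml"
            },
        }
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Auto-generated orchestration stack for audit rules",
        "Resources": resources,
    }

if __name__ == "__main__":
    print(json.dumps(build_orchestration_template("rules"), indent=2))
```

Because the template is derived entirely from the directory listing, a new rule directory committed to the repository is all the pipeline needs to create its nested stack, and a deleted directory removes it on the next run.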
The Lambda functions themselves log their data to Amazon CloudWatch Logs. From there, we used another Lambda function to copy the data into the customer's existing in-house ELK cluster, where the customer set up a dashboard to view and use the audit information.
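Forwarding from CloudWatch Logs to an external cluster is typically done with a subscription-triggered Lambda that decodes the standard gzipped, base64-encoded `awslogs` payload. A sketch, with the actual ELK shipping left as a placeholder:

```python
import base64
import gzip
import json

def decode_awslogs_event(event):
    """Decode the CloudWatch Logs subscription payload delivered to Lambda."""
    raw = base64.b64decode(event["awslogs"]["data"])
    # Payload contains logGroup, logStream, and a list of logEvents
    return json.loads(gzip.decompress(raw))

def handler(event, context):
    payload = decode_awslogs_event(event)
    docs = [
        {"log_group": payload["logGroup"],
         "message": e["message"],
         "ts": e["timestamp"]}
        for e in payload["logEvents"]
    ]
    # ship_to_elk(docs)  # hypothetical: POST the documents to the ELK bulk API
    return len(docs)
```

The decode step is the only fixed part of this pattern; how the documents are indexed depends entirely on the receiving cluster's schema.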
This is how we architected the solution for this Fortune 100 organization. As the world moves toward serverless computing, we expect the definition of operations to change rapidly. A new class of problems will emerge alongside new best practices, and Flux7 has every intention of being at the cutting edge solving them. For more reading on the future of DevOps, please refer to our resource page on the topic, check out my recent presentation at the DevOps Remote Conference on Effectively Planning for each of the Integral DevOps Processes, or subscribe to our blog below to receive weekly updates, analysis, and tips and tricks.
You can read more AWS case studies here.