AWS Operational Excellence - Well-Architected Framework Pillar

Well, we're already onto Part 7 of our deep-dive into the AWS Well-Architected Framework. This time, we'll explore the Operational Excellence pillar. If you haven’t checked out the earlier six parts of this series, sign up at the bottom of this page. We’re moving on assuming that you’ve read the previous posts and are familiar with the AWS Well-Architected Framework and its general design principles. Following the format of our earlier looks at the other pillars, we’re going to break down the Operational Excellence pillar as follows:

Operational Excellence Pillar Overview
The Pillar Design Principles
Best Practice Areas

AWS Well-Architected Operational Excellence Pillar Overview

The Operational Excellence pillar is critical in providing the right support structure for your cloud workloads and helping you achieve your organizational objectives. The pillar includes guidance and best practices to help ensure you're running your existing cloud workloads, and designing new ones, as efficiently as possible. By following this guidance, you'll be able to improve the metrics and insight available on your overall business operations, and continuously improve your processes to meet the changing demands of your cloud infrastructure and business. Next, we'll explore this pillar's design principles, which will guide us toward that Operational Excellence dream.


Operational Excellence Pillar General Design Principles

Perform Operations as Code
Make Frequent, Small, Reversible Changes
Refine Operations Procedures Frequently
Anticipate Failure
Learn From All Operational Failures

Let's now explore each of these five design principles in turn.

Perform Operations as Code

First, AWS pushes us to take advantage of Infrastructure-as-Code (IaC) for our cloud environments. When using IaC tools such as AWS CloudFormation, you can treat your cloud infrastructure resources like your application software. With IaC, you can reduce human error and enable consistent, repeatable, and automated ways of provisioning and maintaining your cloud environments. Read about our latest customer success story, where Autimo helped implement a secure development platform based on IaC practices, all in line with the principles of the AWS Well-Architected Pillars.
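To make "operations as code" a little more concrete, here is a minimal sketch, assuming Python is used to generate a CloudFormation template as JSON. The bucket resource, the logical ID `ArtifactBucket`, and the `build_template` helper are hypothetical examples chosen for illustration; only the template keys (`AWSTemplateFormatVersion`, `Resources`, `AWS::S3::Bucket`, `VersioningConfiguration`, `Tags`) come from CloudFormation itself.

```python
import json


def build_template(environment: str) -> dict:
    """Return a CloudFormation template (as a dict) for one environment.

    Hypothetical example: every environment gets the same resource
    definition, differing only in the tag value, which is what makes
    provisioning consistent and repeatable.
    """
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": f"Example artifact bucket for the {environment} environment",
        "Resources": {
            "ArtifactBucket": {
                "Type": "AWS::S3::Bucket",
                "Properties": {
                    # Keep prior object versions so changes can be rolled back.
                    "VersioningConfiguration": {"Status": "Enabled"},
                    "Tags": [{"Key": "Environment", "Value": environment}],
                },
            }
        },
    }


if __name__ == "__main__":
    # Render the template as JSON; in practice the output (or this script)
    # would live in version control and be deployed through a pipeline.
    print(json.dumps(build_template("staging"), indent=2))
```

CloudFormation also supports native template parameters for per-environment values; the Python wrapper here only keeps the sketch self-contained. The key idea is that the infrastructure definition is reviewed, versioned, and deployed like any other code.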

Make Frequent, Small, Reversible Changes

Shifting to more of an architectural mindset, design your workloads so that their components can be updated, or swapped out for other services, in the future. This principle suggests we keep our changes small and fail fast. Keeping the scope of each change to a minimum helps reduce failures caused by complexity and makes troubleshooting easier when things do go wrong. Then, if things break or don't go according to plan, reverse the changes or decisions quickly, learn from them, and adjust as needed.
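The "small, reversible change" loop can be sketched in a few lines. This is a hypothetical illustration, not an AWS API: `Deployment`, `release`, and the `health_check` callback are invented names, assuming each release swaps one version for another and a health check decides whether it stays.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Deployment:
    """Hypothetical tracker for the active version and the history needed to revert."""
    active: str
    history: list[str] = field(default_factory=list)

    def apply(self, new_version: str) -> None:
        # Remember the previous version so the change is reversible.
        self.history.append(self.active)
        self.active = new_version

    def rollback(self) -> None:
        # Revert to the last known-good version.
        self.active = self.history.pop()


def release(deploy: Deployment, new_version: str,
            health_check: Callable[[str], bool]) -> bool:
    """Apply one small change; revert immediately if the health check fails."""
    deploy.apply(new_version)
    if health_check(deploy.active):
        return True
    deploy.rollback()
    return False
```

For example, `release(d, "v2", lambda v: True)` keeps v2 active, while a following `release(d, "v3", lambda v: False)` puts v2 back in place. Because each release changes only one thing, a failed health check points directly at the change that caused it.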

Refine Operations Procedures Frequently

Moving right along, AWS advises that we refine our operational procedures often. If you're going through a procedure or performing a task and it's painful for whatever reason, note these pain points and look for ways to improve them. Some operational procedures rarely get used on a day-to-day basis, such as disaster recovery procedures or responses to events that only occur occasionally. It's important to review these procedures regularly as well, since they aren't exercised often enough to stay up-to-date on their own. Running frequent game day events with a variety of scenarios can help keep your operational procedures fresh: they force teams to run through existing procedures (or discover where none exist) and can identify multiple areas where improvements or updates are needed.
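One way to keep rarely used procedures from going stale is to track when each was last exercised and flag the old ones as candidates for the next game day. The sketch below assumes a hypothetical runbook inventory; the runbook names, dates, `stale_runbooks` helper, and 180-day threshold are all illustrative values, not recommendations from AWS.

```python
from datetime import date, timedelta

# Hypothetical runbook inventory: name -> date the procedure was last exercised.
RUNBOOKS = {
    "restore-database-from-backup": date(2023, 1, 15),
    "rotate-tls-certificates": date(2023, 9, 1),
    "fail-over-to-secondary-region": date(2022, 6, 30),
}


def stale_runbooks(runbooks: dict[str, date], today: date,
                   max_age_days: int = 180) -> list[str]:
    """Return runbooks not exercised within max_age_days, oldest first."""
    cutoff = today - timedelta(days=max_age_days)
    stale = [name for name, last in runbooks.items() if last < cutoff]
    return sorted(stale, key=lambda name: runbooks[name])
```

With `today=date(2023, 10, 1)`, the failover and database-restore runbooks would be flagged, which is exactly the kind of rarely exercised procedure worth rehearsing during a game day.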

Anticipate Failure

Continuing on the importance of game day events, they can also bring possible sources of failure to the forefront, so you can mitigate them before they become a live failure scenario that impacts your customers and business. Game days also prepare staff for handling failure events and help ensure they respond properly and address the situation effectively. They typically surface a number of improvement areas around training, documentation, processes, and overall workload architecture or application design.
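A game day often boils down to injecting a failure and checking that the workload degrades gracefully instead of breaking. Here is a minimal, self-contained sketch of that idea, assuming a hypothetical downstream dependency with a configurable failure rate and a cached fallback; `FlakyDependency`, `get_data`, and `game_day` are illustrative names, not part of any AWS service.

```python
import random


class FlakyDependency:
    """Hypothetical downstream service with an injectable failure rate."""

    def __init__(self, failure_rate: float, seed: int = 0):
        self.failure_rate = failure_rate
        self.rng = random.Random(seed)  # seeded so runs are reproducible

    def fetch(self) -> str:
        if self.rng.random() < self.failure_rate:
            raise ConnectionError("injected failure")
        return "live data"


def get_data(dep: FlakyDependency, retries: int = 2) -> str:
    """Try the dependency a few times, then degrade to a cached fallback."""
    for _ in range(retries + 1):
        try:
            return dep.fetch()
        except ConnectionError:
            continue
    return "cached data"


def game_day(failure_rate: float, calls: int = 100) -> dict[str, int]:
    """Inject failures at the given rate and count how requests were served."""
    dep = FlakyDependency(failure_rate)
    counts = {"live data": 0, "cached data": 0}
    for _ in range(calls):
        counts[get_data(dep)] += 1
    return counts
```

Running `game_day(1.0)` simulates a total outage of the dependency: every request should still be answered from the fallback, with no unhandled exceptions. If it isn't, you have found a failure source before your customers did.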

Learn From All Operational Failures

Operational Excellence is more about the journey than about reaching any final state of excellence. It is a constant, never-ending cycle of fail, learn, improve. AWS stresses the importance of learning from all operational events and failures and sharing those lessons across teams, since the important lessons are often relevant, at least in principle, to many different teams across the organization.

Operational Excellence Pillar Best Practice Areas

The AWS Well-Architected Operational Excellence Pillar contains 4 best practice areas to consider when reviewing operational aspects of your cloud workloads.

Organization
Prepare
Operate
Evolve

As with the other parts of this series, we'll dive into each of these best practice areas in a future article, examining each in depth and providing additional insights to help you on your journey toward Operational Excellence. Don't miss the upcoming parts of this series and other Cloud and DevOps content: sign up for our newsletter at the bottom of the page!

Need Help Today?

The Autimo team can help you navigate the complex cloud security landscape and help build automation and baked-in security into your engineering platforms, leaving your business to focus on delivering value to your customers through your products and solutions.

AWS Secure Environment Accelerator (ASEA)

The Autimo team has specialized implementation experience with the AWS Secure Environment Accelerator, which helps Canadian and global AWS customers with NIST 800-53 and Canadian Centre for Cyber Security’s ITSG-33 specifications around network design and segmentation, data security, centralized identity and access management (IAM), logging and auditing, and more. The reference ASEA AWS landing zone can be a complex implementation. The Autimo team can help you understand your organization's requirements and navigate the design and implementation details based on your unique business, security, and compliance needs.

Operational Excellence. Delivered. Learn how Autimo helped Hero Group implement a turnkey development platform focused on security and productivity, using infrastructure automation with Terraform infrastructure-as-code (IaC).

“Without Autimo’s help there is no way we’d have been able to hit the level of productivity we have seen. To be able to forget about the infrastructure and just get on with writing code has been transformative for our team.”

Markus Westerholz, CTO, Hero Innovation Group
