### Slide 1:

![Slide 1](slide_1.png)

### Slide 2:

![Slide 2](slide_2.png)

::: Notes


:::

### Slide 3:

![Slide 3](slide_3.png)

::: Notes

In this course, we'll examine cloud operations through a realistic scenario. You'll take on the role of a systems administrator at Example Corp, a company transitioning to cloud technology. We'll explore the patterns and frameworks that can inform your architectural decisions, and we'll critically evaluate their trade-offs and limitations.

#### Instructor notes

#### Student notes

To guide you through the process of learning how to become a *cloud operations administrator* for an AWS environment, this course presents a *learning scenario*. In this scenario, you are a systems administrator for a company called Example Corp. that is transitioning to an AWS Cloud environment. To prepare for the transition, you must understand the typical responsibilities involved in cloud operations and the AWS Well-Architected Framework. You must also understand the best practices for preparing, operating, and evolving an AWS environment.

:::

### Slide 4:

![Slide 4](slide_4.png)

::: Notes


:::

### Slide 5:

![Slide 5](slide_5.png)

::: Notes

Cloud operations encompasses all the activities needed to run and maintain your systems in the cloud. Think of it as six core responsibilities: first, you deploy infrastructure and applications consistently. Second, you monitor everything to maintain visibility. Third, you fortify your systems against disruptions. Fourth, you sustain them by optimizing resource usage. Fifth, you secure everything against threats. And finally, you optimize for cost and performance.

#### Instructor notes

#### Student notes

**Cloud operations** : *Deploy, monitor, fortify, sustain, secure, and optimize* complex computing systems.

* **Deploy** : Provision infrastructure, services, and other resources in a consistent, repeatable manner. This process requires you to plan for capacity, testing, workload management, and remediation, which might involve rollbacks.
* **Monitor** : Maintain *visibility* into your applications, operational health, business outcomes, and customer impact.
* **Fortify** : Put measures in place to reduce infrastructure or service disruptions. Use high availability solutions and prepare to remediate any disruption quickly.
* **Sustain** : Use a *data-driven approach* to optimize your workload so that it uses resources efficiently. Understand and minimize the environmental impact of running cloud workloads.
* **Secure** : Protect your systems, data, and assets from unauthorized access or usage. You need both preventive measures and mitigation strategies.
* **Optimize** : Tune your workloads for resource utilization, cost, return on investment (ROI), security, and performance.

:::

### Slide 6:

![Slide 6](slide_6.png)

::: Notes

The AWS Well-Architected Framework presents a categorization of design considerations for cloud systems, organized into six areas. These principles emerged from patterns observed in deployed systems. The framework provides a reference point for evaluating your architecture, though applying any principle requires understanding its specific context and constraints.

#### Instructor notes

#### Student notes

The *AWS Well-Architected Framework* is a set of *general design principles and conceptual themes*, organized into six pillars. The framework also includes *strategies and best practices* for operating cloud workloads, which help you run reliable, secure, efficient, and cost-effective systems. These strategies and best practices come from years of helping thousands of AWS customers design and operate their cloud footprint. For more information, see the "AWS Well-Architected Framework" technical paper at
https://docs.aws.amazon.com/wellarchitected/latest/framework/welcome.html.

:::

### Slide 7:

![Slide 7](slide_7.png)

::: Notes

Before examining the six pillars, consider some foundational design tensions. How do you determine capacity requirements without prior data? What trade-offs emerge when you test at scale versus optimize for cost? When should you prioritize automation, and what risks does it introduce? These questions shape architectural decisions differently depending on context.

#### Instructor notes

#### Student notes

The Well-Architected Framework identifies a set of *general design principles* to facilitate good design in the cloud. The principles include the following:

* **Stop guessing your capacity needs** : When you decide on capacity requirements, choose an option that you can *scale up and down as needed*.
* **Test systems at production scale** : In the cloud, you can test in a production-scale environment at a fraction of the cost. Provision a test environment and decommission it as soon as you complete the testing.
* **Automate** : With automation, you can deploy your environment in a repeatable manner and track changes and their impact.
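Capacity that follows demand can be reasoned about with a simple rule. As a minimal sketch, assuming a proportional model (a simplification of how target-tracking scaling behaves, not the exact AWS Auto Scaling algorithm), the following computes a desired instance count from the current count, an observed metric such as average CPU utilization, and a target value. The minimum and maximum sizes are hypothetical bounds chosen for illustration.

```python
import math

def desired_capacity(current: int, observed: float, target: float,
                     min_size: int = 1, max_size: int = 10) -> int:
    """Return the instance count that would bring `observed` near `target`.

    Assumes load spreads evenly, so the per-instance metric scales
    inversely with the instance count (a simplified model).
    """
    if current < 1 or target <= 0:
        raise ValueError("current must be >= 1 and target must be positive")
    # Proportional rule: scale the fleet by observed/target and round up
    # so the fleet is never sized below what the load requires.
    raw = math.ceil(current * observed / target)
    # Clamp to the hypothetical configured bounds.
    return max(min_size, min(max_size, raw))

print(desired_capacity(current=4, observed=90.0, target=60.0))  # 6
print(desired_capacity(current=4, observed=20.0, target=60.0))  # 2
```

Rounding up favors availability over cost. A production scaling policy would typically also use cooldown or warm-up periods so the fleet does not oscillate between sizes.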

:::

### Slide 8:

![Slide 8](slide_8.png)

::: Notes

Additional considerations emerge when you think about evolution and learning. How does your architecture adapt as requirements shift? What decisions require data versus judgment calls? Game days create controlled failure scenarios, but they may not reflect actual failure modes or organizational chaos. Consider what assumptions these practices make about your system and team.

#### Instructor notes

#### Student notes

The Well-Architected Framework also includes the following principles:

* **Allow for evolutionary architectures** : *Evolutionary architectures* provide the flexibility to adopt new innovations, technologies, and feature sets. When you take advantage of deployment automation and testing, you reduce the impact of evolving your design.
* **Drive architectures using data** : In the cloud, you can collect large amounts of data about your environment. This data provides *insight into the behavior of your workloads* so that you can make fact-based decisions.
* **Improve through game days** : A game day simulates a failure or event to test systems, processes, and team responses. Simulate real-world events, such as outages, to understand how your environment responds and to determine whether you need to update your processes. Schedule game days regularly.
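Driving decisions with data usually starts with summarizing raw measurements. As a minimal sketch using made-up latency samples (hypothetical values, not real workload data), the following computes the median and the 95th percentile with Python's standard `statistics` module. Comparing such numbers against a target is a fact-based input to a design decision rather than a guess.

```python
import statistics

def latency_summary(samples_ms: list[float]) -> dict[str, float]:
    """Summarize request latencies (milliseconds) as p50 and p95."""
    if len(samples_ms) < 2:
        raise ValueError("need at least two samples")
    # quantiles(n=100) returns the 99 cut points p1..p99; index 94 is p95.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {"p50": statistics.median(samples_ms), "p95": cuts[94]}

# Hypothetical samples: mostly fast requests with a slow tail.
samples = [float(x) for x in range(10, 110, 10)] + [400.0, 900.0]
summary = latency_summary(samples)
print(summary)
```

The gap between the median and the tail is the kind of signal that an average alone would hide.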

:::

### Slide 9:

![Slide 9](slide_9.png)

::: Notes

The framework organizes considerations into six areas, each representing different types of design tension. Operational excellence concerns how you build and run systems. Security addresses access control and threat management. Reliability covers failure recovery and capacity. Performance efficiency involves resource utilization. Cost optimization requires understanding spending and value. Sustainability examines environmental impact. These pillars often create trade-offs with one another; none dominates universally.

#### Instructor notes

#### Student notes

The Well-Architected Framework principles and best practices are grouped into the following *six conceptual pillars* :

* **Operational excellence** - The ability to run and monitor systems to deliver business value and continually improve supporting processes and procedures.
* **Security** - The ability to protect information, systems, and assets while delivering business value through risk assessments and mitigation strategies.
* **Reliability** - The ability of a system to do the following:
  * Recover from infrastructure or service disruptions.
  * Dynamically acquire computing resources to meet demand.
  * Mitigate disruptions such as misconfigurations or transient network issues.
* **Performance efficiency** - The ability to use computing resources efficiently to meet system requirements and the ability to maintain that efficiency as demand changes and technologies evolve.
* **Cost optimization** - The ability to run systems to deliver business value at the lowest price point.
* **Sustainability** - The ability to minimize the environmental impacts of running cloud workloads.

For more information, review "AWS Well-Architected" on the AWS Architecture Center at
https://aws.amazon.com/architecture/well-architected.

:::

### Slide 10:

![Slide 10](slide_10.png)

::: Notes

Operational excellence explores how you structure teams, visibility, and automation. Key questions: How do you align team structure with business outcomes? What does observability require in terms of cost and complexity? When does automation help versus introduce new failure modes? How do you balance small changes with necessary large-scale improvements? What are the trade-offs of different operational models?

#### Instructor notes

#### Student notes

The operational excellence pillar has the following design principles:

**Organize teams around business outcomes** : The ability of a team to achieve business outcomes comes from *leadership vision, effective operations, and a business-aligned operating model* that uses people, process, and technology to scale, optimize productivity, and differentiate through agility, responsiveness, and adaptation. Goals and operational KPIs are aligned at all levels to sustain long-term value.

**Implement observability for actionable insights** : Gain a comprehensive understanding of *workload behavior, performance, reliability, cost, and health* by establishing key performance indicators (KPIs) and leveraging observability telemetry to make informed decisions, take prompt action, and proactively improve performance, reliability, and cost.

**Safely automate where possible** : Define your entire workload and its operations as *code*, automate operations by initiating them in response to events, and employ automation safety by configuring guardrails to achieve consistent responses, limit human error, and reduce operator toil.
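One way to read "configuring guardrails" is to cap how much an automated action may touch at once. As a minimal sketch, with a hypothetical `restart` callback and a hypothetical 25 percent limit chosen only for illustration, the following remediation routine acts automatically on a small number of unhealthy instances but stops and escalates to a person when the failure looks too widespread for a scripted fix.

```python
from typing import Callable

def remediate(unhealthy: list[str], fleet_size: int,
              restart: Callable[[str], None],
              max_fraction: float = 0.25) -> str:
    """Restart unhealthy instances unless too much of the fleet is affected."""
    if fleet_size <= 0:
        raise ValueError("fleet_size must be positive")
    # Guardrail: a large share of failures suggests a systemic problem
    # (for example, a bad deployment) that restarts will not fix.
    if len(unhealthy) / fleet_size > max_fraction:
        return "escalate"
    for instance_id in unhealthy:
        restart(instance_id)  # hypothetical action supplied by the caller
    return "remediated"

restarted: list[str] = []
print(remediate(["i-01"], fleet_size=8, restart=restarted.append))  # remediated
print(remediate(["i-01", "i-02", "i-03"], 8, restarted.append))     # escalate
```

The instance IDs are placeholders. The design choice is that automation handles the well-understood, low-impact case, while anything beyond the guardrail goes to a human decision.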

**Make frequent, small, reversible changes** : Design scalable, loosely coupled workloads so that you can update components regularly through *automated deployment techniques* and small, incremental changes. Small changes reduce the blast radius and allow faster reversal when failures occur, so you can deliver beneficial changes while maintaining quality and adapting quickly.
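Small, reversible changes are often delivered as a staged rollout. As a minimal sketch, assuming a hypothetical `error_rate_at(percent)` health signal and illustrative traffic steps and threshold, the following shifts traffic to a new version in increments and returns traffic to 0 percent as soon as the error rate crosses the threshold. Only the traffic in the current step is exposed to a faulty change.

```python
from typing import Callable

def staged_rollout(error_rate_at: Callable[[int], float],
                   steps: tuple[int, ...] = (10, 25, 50, 100),
                   max_error_rate: float = 0.01) -> tuple[bool, list[int]]:
    """Shift traffic step by step; roll back on the first unhealthy step.

    Returns (succeeded, history of traffic percentages on the new version).
    """
    history: list[int] = []
    for percent in steps:
        history.append(percent)
        # In practice this check would read alarms or metrics after a bake time.
        if error_rate_at(percent) > max_error_rate:
            history.append(0)  # roll back: all traffic returns to the old version
            return False, history
    return True, history

healthy = staged_rollout(lambda p: 0.002)
faulty = staged_rollout(lambda p: 0.05 if p >= 50 else 0.002)
print(healthy)  # (True, [10, 25, 50, 100])
print(faulty)   # (False, [10, 25, 50, 0])
```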

**Refine operations procedures frequently** : As you evolve your workloads, evolve your operations by holding *regular reviews, validating procedures, updating them accordingly*, and communicating changes to all stakeholders and teams.

**Anticipate failure** : Test the effectiveness of your procedures and team response against *simulated failures* to make informed decisions and manage open risks.

**Learn from all operational events and metrics** : Drive improvement through *lessons learned from operational events and failures*, sharing data and anecdotes on how operations contribute to business outcomes across the organization.

**Use managed services** : Reduce operational burden by using AWS managed services and building operational procedures around interactions with those services. For more information, review the "Operational Excellence Pillar: AWS Well-Architected Framework" technical paper at
https://docs.aws.amazon.com/wellarchitected/latest/framework/oe-design-principles.html.

:::

### Slide 11:

![Slide 11](slide_11.png)

::: Notes

Operational excellence also involves organizational structure. You must establish priorities, but whose priorities and how do you measure them? You simplify operating models, but simplification often means fewer options. You enable teams to act, but this requires clear decision-making authority—how do you establish that? These organizational questions often prove more difficult than technical ones.

#### Instructor notes

#### Student notes

**Set operational priorities** : As part of your organization, your teams need *a shared understanding of your entire workload, their role in it, and shared business goals* to set the priorities that will achieve business success. Well-defined priorities maximize the benefits of your efforts. Evaluate internal and external customer needs by involving key stakeholders, including business, development, and operations teams, to determine where to focus efforts. Be aware of guidelines or obligations defined by your organizational governance and external factors, such as regulatory compliance requirements and industry standards that may mandate or emphasize specific focus. Evaluate threats to the business, such as business risk, liabilities, and information security threats, and maintain this information in a risk registry. Evaluate the impact of risks and tradeoffs between competing interests or alternative approaches. For example, you may emphasize accelerating speed to market for new features over cost optimization. Manage benefits and risks to make informed decisions when determining where to focus efforts. Some risks or choices may be acceptable for a time, while others require action. Review your priorities regularly so that you can update them as needs change.
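A risk registry becomes more useful when its entries can be ranked. As a minimal sketch, assuming a common likelihood-times-impact score on 1 to 5 scales (a convention chosen here for illustration, not one the framework prescribes), the following ranks hypothetical registry entries and flags those that score at or above an action threshold. Entries below the threshold correspond to risks that "may be acceptable for a time".

```python
from dataclasses import dataclass

@dataclass
class Risk:
    name: str
    likelihood: int  # 1 (rare) to 5 (almost certain); illustrative scale
    impact: int      # 1 (minor) to 5 (severe); illustrative scale
    owner: str

    @property
    def score(self) -> int:
        return self.likelihood * self.impact

def prioritize(registry: list[Risk],
               action_threshold: int = 12) -> list[tuple[str, int, bool]]:
    """Rank risks by score and flag those at or above the action threshold."""
    ranked = sorted(registry, key=lambda r: r.score, reverse=True)
    return [(r.name, r.score, r.score >= action_threshold) for r in ranked]

# Hypothetical entries for Example Corp.
registry = [
    Risk("Single-AZ database", likelihood=3, impact=5, owner="platform"),
    Risk("Unrotated access keys", likelihood=2, impact=4, owner="security"),
    Risk("Manual deployments", likelihood=4, impact=2, owner="ops"),
]
for row in prioritize(registry):
    print(row)
```

Reviewing the registry regularly means rescoring entries as conditions change, which re-sorts the list automatically.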

**Simplify management of operating models** : Your teams need to understand their *roles in the success of other teams*, the role of other teams in their success, and have shared goals. Understanding responsibility, ownership, how decisions are made, and who has authority to make decisions helps focus efforts and maximize the benefits from your teams. No single operating model can support all teams and their workloads in your organization. Verify that there are identified owners for each application, workload, platform, and infrastructure component, and that each process and procedure has an identified owner responsible for its definition and performance. Clearly define the responsibilities of team members so that they can act appropriately, and have mechanisms to request additions, changes, and exceptions so that you do not constrain innovation. Define agreements between teams that describe how they work together to support each other and your business outcomes. Use tools or services that permit you to centrally govern your environments across accounts, such as AWS Organizations, to help manage your operating models. AWS Control Tower expands this management capability by allowing you to define blueprints for the setup of accounts, apply ongoing governance using AWS Organizations, and automate provisioning of new accounts. Adding managed services to your operating model can save you time and resources, and lets you keep your internal teams lean and focused on strategic outcomes that differentiate your business.
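The guidance to verify that every application, workload, platform, and infrastructure component has an identified owner lends itself to an automated check. As a minimal sketch over a hypothetical in-memory inventory (in a real account, this data might come from resource tags), the following lists components with no `Owner` value. The tag key `Owner` and the component names are assumptions for illustration.

```python
def missing_owners(inventory: dict[str, dict[str, str]]) -> list[str]:
    """Return component names whose tags lack a non-empty 'Owner' value."""
    return sorted(
        name for name, tags in inventory.items()
        if not tags.get("Owner", "").strip()
    )

# Hypothetical inventory: component name -> tags.
inventory = {
    "orders-api": {"Owner": "team-orders", "Environment": "prod"},
    "reporting-db": {"Environment": "prod"},
    "build-runner": {"Owner": " ", "Environment": "dev"},
}
print(missing_owners(inventory))  # ['build-runner', 'reporting-db']
```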

**Support organizational culture** : Provide support for your team members so they can be *more effective in taking action* and supporting your business outcomes. Engaged senior leadership sets expectations and measures success. Senior leadership is the sponsor, advocate, and driver for the adoption of best practices and the evolution of the organization. Let team members take action when outcomes are at risk to minimize impact, and encourage them to escalate to decision makers and stakeholders when they believe there is a risk so that it can be addressed and incidents avoided. Provide timely, clear, and actionable communications of known risks and planned events so that team members can take appropriate action. Encourage experimentation to accelerate learning and keep team members interested and engaged. Teams need to grow their skill sets to adopt new technologies and to support changes in demand and responsibilities. Support this by providing dedicated structured time for learning. Use cross-organizational diversity to seek multiple unique perspectives, increase innovation, challenge your assumptions, and reduce the risk of confirmation bias. Grow inclusion, diversity, and accessibility within your teams to gain beneficial perspectives.

:::

### Slide 12:

![Slide 12](slide_12.png)

::: Notes

Operational excellence requires thinking about how systems will be run. When you design as code, you gain reproducibility but add complexity. Runbooks and playbooks help with consistency, but they can also become outdated or miss novel situations. Ask yourself: what level of standardization makes sense for your environment? What flexibility do you lose?

#### Instructor notes

#### Student notes

As part of the operations team, you need to understand your workloads and the role you play in these workloads. Your goals should align and support business outcomes.

* **Design for operations** : The design of your workload should include how you will deploy, update, and operate it. View your entire workload as *code*, which includes your application, infrastructure, policy, governance, and operations. Include methods of observation, such as logs and metrics, for both technical and business indicators. By incorporating a tagging strategy into the design, you can manage several operational responsibilities such as automation, cost tracking, resource organization, and access control.
* **Operational readiness** : Operational readiness focuses on preparing to operate your workloads. Use *runbooks and playbooks* to help run your processes consistently. Runbooks are predefined procedures to achieve a specific outcome; documenting procedures in runbooks lets you respond consistently and promptly to well-understood events. Playbooks guide the investigation of issues. Ensure that you have adequate staff coverage with the skill sets needed to operate your workload and your operations tools.
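Viewing the workload as code, with a tagging strategy built into the design, can be as simple as generating the template that describes a resource. As a minimal sketch, the following Python builds a small CloudFormation template for an S3 bucket with tags for environment, ownership, and cost tracking, then prints it as JSON. The logical ID `ReportsBucket` and the tag keys and values are hypothetical; the structure (`AWSTemplateFormatVersion`, `Resources`, `Type`, `Properties`, and a `Tags` list of `Key`/`Value` pairs) follows the CloudFormation template format.

```python
import json

def tagged_bucket_template(environment: str, owner: str, cost_center: str) -> dict:
    """Build a CloudFormation template (as a dict) for one tagged S3 bucket."""
    tags = {"Environment": environment, "Owner": owner, "CostCenter": cost_center}
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Example Corp reports bucket (illustrative)",
        "Resources": {
            "ReportsBucket": {
                "Type": "AWS::S3::Bucket",
                "Properties": {
                    # CloudFormation expects tags as a list of Key/Value objects.
                    "Tags": [{"Key": k, "Value": v} for k, v in tags.items()],
                },
            }
        },
    }

template = tagged_bucket_template("dev", "team-reports", "CC-1234")
print(json.dumps(template, indent=2))
```

Because the template is plain data, it can be version-controlled, reviewed, and deployed repeatedly, for example with `aws cloudformation deploy --template-file template.json --stack-name <stack-name>`.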

:::

### Slide 13:

![Slide 13](slide_13.png)

::: Notes

Understanding system health requires deciding what to measure and what baselines mean. What metrics matter most? How much instrumentation is enough before overhead becomes a problem? Root cause analysis provides insights but is also subject to hindsight bias. Consider: what are the limits of observability? When does more data create less clarity?

#### Instructor notes

#### Student notes

**Understand operational health** : Have a strategy to easily understand the operational health of your workload. Collect *log data and define baseline metrics*. Then use Amazon CloudWatch dashboards to present both system-level and business-level views of your metrics.

**Respond to events** : Be ready to respond to both planned and unplanned events. Use *runbooks and playbooks* to respond to these events. You can trigger scripted responses to operational events based on the metrics that you monitor. Use a *root cause analysis* to prevent unplanned events from recurring.
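Baseline metrics make it possible to tell normal variation from an event worth responding to. As a minimal sketch, assuming the baseline is the mean of recent values and that a deviation of more than three standard deviations warrants attention (an illustrative rule, not how CloudWatch anomaly detection is implemented), the following flags new data points that fall outside the band.

```python
import statistics

def outside_baseline(history: list[float], new_points: list[float],
                     k: float = 3.0) -> list[float]:
    """Return the new points that fall outside mean ± k standard deviations."""
    if len(history) < 2:
        raise ValueError("need at least two historical points")
    mean = statistics.fmean(history)
    spread = statistics.stdev(history)
    low, high = mean - k * spread, mean + k * spread
    return [x for x in new_points if not (low <= x <= high)]

# Hypothetical requests-per-minute history and new observations.
history = [100.0, 104.0, 98.0, 101.0, 97.0, 102.0, 99.0, 103.0]
print(outside_baseline(history, [105.0, 140.0, 60.0]))  # [140.0, 60.0]
```

In CloudWatch, the comparable step would be an alarm on the metric, with either a static threshold or an anomaly detection band, that notifies a team or starts a runbook.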

:::

### Slide 14:

![Slide 14](slide_14.png)

::: Notes

Continuous improvement requires examining what actually happened and why. Analyze successes and failures, but be aware that interpretation varies—the same event can support different conclusions. Logging creates data, but it doesn't automatically create understanding. Consider: how do you distinguish signal from noise in your operational data? What obstacles prevent teams from actually adopting lessons?

#### Instructor notes

#### Student notes

* **Learn from experience** : Provide time to analyze your operations activities, failures, and experiments so that you can improve your procedures. *Implement a logging strategy* that uses CloudWatch to aggregate logs from operations activities, workloads, and infrastructure. Your strategy should include analysis of the log data and a method to visualize it.
* **Share learning** : *Share lessons learned* with other teams in your organization. Sharing can prevent avoidable errors and also ease development efforts. When you manage applications, compute, infrastructure, and operations as code, you can easily release, share, and adopt them. You can share Amazon Machine Images (AMIs), AWS Lambda functions, AWS CloudFormation templates, and other resources.
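A logging strategy pays off only if the aggregated logs are analyzed. As a minimal sketch, assuming hypothetical JSON log lines with `service` and `level` fields (an invented format, not a CloudWatch Logs format), the following counts errors per service, the kind of summary a team might visualize and share as a lesson learned.

```python
import json
from collections import Counter

def errors_by_service(lines: list[str]) -> Counter:
    """Count ERROR-level entries per service, skipping malformed lines."""
    counts: Counter = Counter()
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # malformed line: skip rather than fail the analysis
        if not isinstance(entry, dict):
            continue  # valid JSON but not a log object
        if entry.get("level") == "ERROR":
            counts[entry.get("service", "unknown")] += 1
    return counts

logs = [
    '{"service": "orders", "level": "ERROR", "msg": "timeout"}',
    '{"service": "orders", "level": "INFO", "msg": "ok"}',
    '{"service": "billing", "level": "ERROR", "msg": "declined"}',
    'not json',
    '{"service": "orders", "level": "ERROR", "msg": "timeout"}',
]
print(errors_by_service(logs).most_common())  # [('orders', 2), ('billing', 1)]
```

Inside AWS, CloudWatch Logs Insights provides a query language for this kind of aggregation without custom code.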

:::

### Slide 15:

![Slide 15](slide_15.png)

::: Notes

AWS offers services that address various operational challenges. These tools can support your practices, but they work within particular constraints and assumptions. Using a service doesn't automatically solve the underlying problem—you still must answer the difficult questions about what to measure, when to act, and how to coordinate your team.

#### Instructor notes

Walk through each row and briefly describe how each service supports the corresponding best practice area.

#### Student notes

The following are services associated with the organizational best practices.

**Organizational priorities** (Operational Excellence pillar best practice): Consider using the *AWS Well-Architected Tool*, *AWS Trusted Advisor*, and *AWS Support* with this best practice.

**Managing operating models** (Operational Excellence pillar best practice): Consider using *AWS Organizations* and *AWS Control Tower* with this best practice.

**Organizational culture** (Operational Excellence pillar best practice): Consider using *AWS Training and Certification* and *AWS Managed Services* with this best practice.

:::

### Slide 16:

![Slide 16](slide_16.png)

::: Notes

Various AWS services address operational challenges: infrastructure as code, configuration tracking, systems management. However, tools alone don't create excellence—they require clear operational thinking and discipline. Ask yourself: which of these services actually solve my problems versus creating new dependencies?

#### Instructor notes

#### Student notes

The following are services associated with the *operational excellence best practices*:

**Operational priorities** : *AWS Trusted Advisor* and *AWS Support* provide real-time guidance and support to help you follow AWS best practices.

**Design for operations** : *CloudFormation*, *AWS Developer Tools*, and *AWS X-Ray* help you create infrastructure as code and trace requests through your applications.

**Operational readiness** : *AWS Config* and *AWS Systems Manager* help you track changes and automate management tasks on your instances.

:::

### Slide 17:

![Slide 17](slide_17.png)

::: Notes

Monitoring and response tools offer real-time visibility, but they raise questions: What should trigger automatic responses versus manual intervention? How do you avoid alert fatigue without missing critical issues? Tools provide data and automation, but human judgment remains central to operational decisions.

#### Instructor notes

#### Student notes

The following are services associated with the *operational excellence best practices* :

**Understand operational health** : *Amazon CloudWatch Logs*, *Amazon OpenSearch Service*, and *AWS Health Dashboard* provide insights into your system operations and logs.

**Respond to events** : *CloudWatch*, *Amazon EventBridge*, *Amazon SNS*, *AWS Auto Scaling*, and *Systems Manager* help you monitor, respond to, and scale your workloads.
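Amazon EventBridge routes an event to targets when the event matches a rule's event pattern. As a minimal sketch of the idea, the following implements only the simplest matching rule: each pattern field lists allowed values, nested objects are matched recursively, and event fields absent from the pattern are ignored. Real EventBridge patterns support more operators, such as prefix and numeric matching, that this code does not. The sample event is trimmed and hypothetical.

```python
def matches(pattern: dict, event: dict) -> bool:
    """Return True if every field in the pattern matches the event.

    Simplified semantics: a list in the pattern means "the event value must
    equal one of these"; a dict means "match the nested object recursively".
    """
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True

# Pattern for EC2 instances entering the "stopped" state.
pattern = {
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Instance State-change Notification"],
    "detail": {"state": ["stopped"]},
}
# Hypothetical, trimmed event.
event = {
    "source": "aws.ec2",
    "detail-type": "EC2 Instance State-change Notification",
    "detail": {"instance-id": "i-0example", "state": "stopped"},
}
print(matches(pattern, event))  # True
print(matches(pattern, {**event, "detail": {"state": "running"}}))  # False
```

A matched event would then go to a target such as an Amazon SNS topic or a Systems Manager Automation runbook, which connects detection to response.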

:::

### Slide 18:

![Slide 18](slide_18.png)

::: Notes

As workloads evolve, gaining insights requires tools for analysis and communication. But sharing practices across an organization faces real obstacles: What works in one team may not transfer. Knowledge degrades over time. Consider how your organization actually adopts new practices—tools are necessary but not sufficient.

#### Instructor notes

#### Student notes

The following are services associated with *operational excellence* that help gain *insight as workloads evolve* :

**Gain insights** : *Amazon QuickSight*, *Amazon Athena*, and *CloudWatch* help you visualize and analyze your operational data.

**Share updated standards** : *Amazon SNS*, *AMIs*, *Lambda*, and *CloudFormation* enable you to share and standardize operational procedures and infrastructure.

:::

### Slide 19:

![Slide 19](slide_19.png)

::: Notes

The AWS Well-Architected Tool provides a structured review process against documented principles. It offers a framework for conversation and evaluation. However, the tool reflects particular assumptions about what matters. Use it to test your thinking, not to replace it—the recommendations still require your judgment about context and trade-offs.

#### Instructor notes

#### Student notes

The *AWS Well-Architected Tool* is a *self-service tool* that helps you to review and measure the state of your workloads in comparison to the latest AWS architectural best practices. The AWS Well-Architected Tool provides recommendations for making your workloads more *reliable, secure, efficient, and cost-effective*.

:::

### Slide 20:

![Slide 20](slide_20.png)

::: Notes

The Tool poses questions about your operational practices. Answering them reveals where your architecture differs from documented patterns, but alignment with a framework doesn't guarantee success. The questions may not address your specific constraints or environment. Use them as a starting point for deeper thinking about your actual systems.

#### Instructor notes

#### Student notes

These are *sample questions* from the Operational Excellence pillar in the AWS Well-Architected Tool.

:::

### Slide 21:

![Slide 21](slide_21.png)

::: Notes

The pillars represent competing concerns. Trade-offs between them are inevitable and must be made deliberately. A startup prioritizes different concerns than a regulated financial institution. An ecommerce system faces different constraints than an internal tool. The critical skill is identifying what you're sacrificing when you choose a direction and whether that sacrifice is acceptable in your context.

#### Instructor notes

#### Student notes

You make *tradeoffs between pillars* based on your *business context*. These business decisions can drive your engineering priorities. You might optimize to reduce cost at the expense of reliability in development environments. For mission-critical solutions, you might optimize reliability with increased costs. In ecommerce solutions, performance can affect revenue and customer propensity to buy. Security and operational excellence are generally not traded off against the other pillars. For more information, review the AWS Well-Architected website at
https://aws.amazon.com/architecture/well-architected.

:::

### Slide 22:

![Slide 22](slide_22.png)

::: Notes

As you prepare for your company's cloud transition, use the framework as a reference point, not as a prescription. It captures patterns from deployed systems but doesn't account for every context. The Tool structures evaluation but doesn't make decisions for you. Your architectural thinking must go beyond frameworks to address your specific constraints, team capabilities, and business needs.

#### Instructor notes

#### Student notes

To summarize, you are a *systems administrator*. Your company, Example Corp., is transitioning to the cloud. As your department prepares for this transition, everyone must understand how to use the *AWS Well-Architected Framework*, a collection of principles and best practices that AWS has gathered from years of working with thousands of customers making the same transition. The AWS Well-Architected Tool is a self-service tool that helps you review your architecture and adopt best practices.

:::

### Slide 23:

![Slide 23](slide_23.png)

::: Notes


#### Instructor notes

#### Student notes

:::

### Slide 24:

![Slide 24](slide_24.png)

::: Notes

This appendix covers the remaining five pillars of the Well-Architected Framework.

:::

### Slide 25:

![Slide 25](slide_25.png)

::: Notes

The security pillar addresses threat modeling and defense. It involves questions: Who needs access to what, and how do you verify that access is legitimate? How do you protect data from unauthorized use while keeping it accessible? What happens when a breach occurs? Security always involves trade-offs—perfect security is impractical, so you must decide what level of protection makes sense for your assets and threats.

#### Instructor notes

#### Student notes

The security pillar of the AWS Well-Architected Framework has the following *design principles* :

**Implement a strong identity foundation** : Apply the *principle of least privilege* by granting only the permissions required to perform a task. Also centralize permissions management and reduce the use of long-term credentials.

**Use traceability** : Track *activity in your environment* and who performs it.

**Apply security at all layers** : With AWS, you can implement *security practices* at various levels: virtual private cloud (VPC), subnet, load balancer, and instance. Also implement security at the operating system and application levels.

**Automate security best practices** : Design your architecture with security in mind, including the implementation of *controls that are defined and managed by code*.

**Protect data in transit and data at rest** : Use mechanisms such as *encryption, tokenization, and access control*.

**Keep people away from your data** : Design your workloads to reduce or eliminate the need for direct access to your data. Minimize manual data processing to protect against data loss, tampering, and human error.

**Prepare for security events** : *Implement an incident management process* and incorporate automation into the process. For more information, review the "Security Pillar: AWS Well-Architected Framework" technical paper at
https://docs.aws.amazon.com/wellarchitected/latest/security-pillar/welcome.html.
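In AWS, least privilege is expressed through IAM policy documents. As a minimal sketch, the following builds a policy that allows only reading objects under one prefix of one bucket (the bucket name `example-corp-reports` and the prefix are hypothetical) and adds a simple review check that flags Allow statements with wildcard actions or an all-resources `"*"`. The check is a rough heuristic for illustration; IAM Access Analyzer provides policy validation within AWS.

```python
import json

def read_only_prefix_policy(bucket: str, prefix: str) -> dict:
    """IAM policy allowing s3:GetObject on one prefix of one bucket only."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/{prefix}/*"],
            }
        ],
    }

def broad_statements(policy: dict) -> list[int]:
    """Return indexes of Allow statements with wildcard actions or Resource '*'."""
    flagged = []
    for i, stmt in enumerate(policy["Statement"]):
        actions = stmt.get("Action", [])
        resources = stmt.get("Resource", [])
        # IAM accepts a single string or a list; normalize both to lists.
        actions = [actions] if isinstance(actions, str) else actions
        resources = [resources] if isinstance(resources, str) else resources
        if stmt.get("Effect") == "Allow" and (
            any("*" in a for a in actions) or any(r == "*" for r in resources)
        ):
            flagged.append(i)
    return flagged

policy = read_only_prefix_policy("example-corp-reports", "monthly")
print(json.dumps(policy, indent=2))
print(broad_statements(policy))  # []
too_broad = {"Version": "2012-10-17",
             "Statement": [{"Effect": "Allow", "Action": "s3:*", "Resource": "*"}]}
print(broad_statements(too_broad))  # [0]
```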

:::

### Slide 26:

![Slide 26](slide_26.png)

::: Notes

Security implementation involves multiple design areas. Identity and access management determine who can do what—this requires clear policies and enforcement. Infrastructure protection uses layers of defense, but each layer adds complexity and cost. Data management requires classification and encryption, but these don't guarantee confidentiality. Incident response processes help, but they cannot prevent all breaches. Consider what your actual threat model is and whether your security posture matches it.

#### Instructor notes

#### Student notes

* **Identity and access management** : Identity and access management is key to securing your cloud workloads. Ensure that only *authorized and authenticated users* have access to your resources and services. Apply the principle of least privilege to determine the level of access you grant to your resources and services. Implement strong credential management and review permissions frequently.
* **Detective controls** : *Identify potential security threats* or incidents. Design these controls to *collect logs, analyze the data, and report potential risks*.
* **Infrastructure protection** : Protect the *systems and services* within your workload. Use *multiple layers of defense* to secure your workload. With Amazon Virtual Private Cloud (Amazon VPC), you can create private, secured, and scalable environments.
* **Data protection** : Use *data classification, management, and encryption*. Classify your data by levels of sensitivity. Manage your data by enforcing appropriate access controls and adopting effective backup solutions.
* **Incident response** : *Put processes in place* to identify the potential impact of an incident, facilitate containment, run forensics, and determine a root cause. You can use AWS APIs and resource tagging to improve your response time.
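Classifying data by sensitivity is useful because each level can map to the controls it requires. As a minimal sketch, the following uses hypothetical classification levels and control names (Example Corp would define its own scheme) to report which controls a data store is missing for its classification.

```python
# Hypothetical classification scheme: level -> controls required at that level.
REQUIRED_CONTROLS: dict[str, set[str]] = {
    "public": set(),
    "internal": {"encryption_at_rest", "backup"},
    "confidential": {"encryption_at_rest", "encryption_in_transit",
                     "access_logging", "backup"},
}

def missing_controls(classification: str, controls: set[str]) -> set[str]:
    """Return the controls required for the classification but not in place."""
    if classification not in REQUIRED_CONTROLS:
        raise ValueError(f"unknown classification: {classification}")
    return REQUIRED_CONTROLS[classification] - controls

# Hypothetical data store that is encrypted at rest and backed up.
print(sorted(missing_controls("confidential", {"encryption_at_rest", "backup"})))
# ['access_logging', 'encryption_in_transit']
```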

:::

### Slide 27:

![Slide 27](slide_27.png)

::: Notes

The reliability pillar addresses availability and recovery. It asks: What happens when components fail? How quickly can you detect and recover? What level of availability do your users actually need? Designing for failure requires understanding your specific failure modes and recovery capabilities. Horizontal scaling and automation help, but they don't eliminate failure—they change its character.

#### Instructor notes

#### Student notes

The reliability pillar of the AWS Well-Architected Framework has the following *design principles* :

**Test recovery procedures** : *Test how your system fails*, and validate your recovery procedures.

**Automatically recover from failure** : *Monitor key operational metrics* and initiate remediation actions based on these metrics.
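
The following Python sketch shows one way to turn "monitor key metrics and initiate remediation" into logic. Remediation runs when *m* of the last *n* datapoints breach a threshold, and only on the transition into the alarm state. A single noisy datapoint therefore does not trigger it, and a sustained breach does not trigger it repeatedly. The `BreachDetector` class, the 80 percent CPU threshold, and the `replace-instance` action are hypothetical. In AWS, you would typically configure this kind of evaluation in an Amazon CloudWatch alarm rather than write it yourself.

```python
from collections import deque
from typing import Callable


class BreachDetector:
    """Hypothetical helper: call `remediate` when m of the last n datapoints exceed `threshold`."""

    def __init__(self, threshold: float, m: int, n: int, remediate: Callable[[], None]):
        if not 1 <= m <= n:
            raise ValueError("require 1 <= m <= n")
        self.threshold = threshold
        self.m = m
        self.window = deque(maxlen=n)  # keeps only the most recent n breach flags
        self.remediate = remediate
        self.in_alarm = False

    def observe(self, value: float) -> bool:
        self.window.append(value > self.threshold)
        breaching = sum(self.window) >= self.m
        # Act only on the transition into alarm so a sustained breach is not remediated repeatedly.
        if breaching and not self.in_alarm:
            self.remediate()
        self.in_alarm = breaching
        return breaching


actions = []
detector = BreachDetector(threshold=80.0, m=3, n=5,
                          remediate=lambda: actions.append("replace-instance"))
for cpu_percent in [50, 85, 90, 70, 95, 97, 60]:
    detector.observe(cpu_percent)
print(actions)  # ['replace-instance']
```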

**Scale horizontally** : *Increase aggregate system availability* by replacing one large resource with multiple small resources.
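
You can estimate the availability gain from scaling horizontally with a short calculation. This calculation assumes that replicas fail independently and that any single replica can serve the full load. Under those assumptions, the workload is unavailable only when every replica is down at the same time, so availability = 1 - (1 - a)^n for n replicas that each have availability a. The function names below are illustrative.

```python
def parallel_availability(per_node: float, nodes: int) -> float:
    """Availability of `nodes` independent replicas when any one of them can serve the full load."""
    if not 0.0 <= per_node <= 1.0 or nodes < 1:
        raise ValueError("per_node must be in [0, 1] and nodes must be >= 1")
    # The workload is down only if every replica is down at the same time.
    return 1.0 - (1.0 - per_node) ** nodes


def annual_downtime_hours(availability: float) -> float:
    return (1.0 - availability) * 365 * 24


for nodes in (1, 2, 3):
    a = parallel_availability(0.99, nodes)
    print(f"{nodes} node(s): {a:.6f} available, {annual_downtime_hours(a):.4f} h/year down")
```

Under these assumptions, one node at 99 percent availability is down about 87.6 hours per year. By comparison, three independent 99 percent nodes are down about 0.0088 hours, or roughly 32 seconds. In practice, replicas can share failure modes, such as an Availability Zone, a deployment, or a dependency, so treat the result as an optimistic estimate. This is one way horizontal scaling changes the character of failure rather than eliminating it.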

**Stop guessing your capacity needs** : Instead of provisioning for an estimated peak, use resources that you can *scale up and down as needed*.

**Manage change in automation** : Make *changes to your infrastructure* by using automation. Changes to the automation itself must also be managed so that they can be tracked and reviewed. For more information, review the "Reliability Pillar: AWS Well-Architected Framework" technical paper at
https://docs.aws.amazon.com/wellarchitected/latest/reliability-pillar/welcome.html.

:::

### Slide 28:

![Slide 28](slide_28.png)

::: Notes

Reliability requires understanding service limits and how they affect your system. Scaling addresses some constraints but creates others. Backups provide recovery options, but they require testing to verify they actually work. Recovery procedures degrade over time without practice. Consider: what are your actual failure scenarios? Do your preparations address them? How often do you actually test recovery?

#### Instructor notes

#### Student notes

**Understand foundational limits** : *Consider the foundational requirements* of your workload. Understand your *service limits* and how they can potentially impact your workloads.

**Configure to scale with demand** : *Change management* focuses on how your environment adapts to changes in demand. Create a configuration that scales with demand. A key component of a scalable environment is a *monitoring strategy* that automatically alerts when key performance indicators (KPIs) deviate from expected norms.
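
A configuration that scales with demand needs a rule that converts a monitored KPI into a capacity decision. The Python sketch below uses a simple proportional rule. If the fleet runs at `metric` with `current` instances, the rule assumes that the same load spread across N instances would run at about `current * metric / N`. It then picks the smallest N that brings the metric down to the target, clamped to a minimum and maximum fleet size. The function and its parameters are hypothetical, and the assumption that load scales linearly will not hold for every workload. Managed features such as Amazon EC2 Auto Scaling target tracking policies implement their own logic, which this sketch does not reproduce.

```python
import math


def desired_capacity(current: int, metric: float, target: float,
                     min_size: int, max_size: int) -> int:
    """Hypothetical proportional rule: size the fleet so the per-instance metric nears `target`."""
    if current < 1 or target <= 0:
        raise ValueError("current must be >= 1 and target must be > 0")
    # Assumes even load spread: total load ~ current * metric, per-instance load ~ total / N.
    wanted = math.ceil(current * metric / target)
    return max(min_size, min(max_size, wanted))


print(desired_capacity(current=4, metric=90.0, target=60.0, min_size=2, max_size=12))   # 6
print(desired_capacity(current=6, metric=20.0, target=60.0, min_size=2, max_size=12))   # 2
print(desired_capacity(current=10, metric=95.0, target=50.0, min_size=2, max_size=12))  # 12
```

In the last case, the rule wants 19 instances but is capped at the configured maximum of 12. This is a reminder that service limits and maximum sizes also constrain how far you can scale.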

**Implement failure management** : *Failure management* focuses on identifying failures and the appropriate response. Back up your data, and periodically recover the data to verify backup integrity. Monitor your workload to detect failures, and automate remediation. Frequently test your recovery procedures, and identify opportunities for improvement.
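
Periodically restoring a backup and checking the result is the most reliable way to know that the backup is usable. The Python sketch below restores a backup into a separate directory and compares SHA-256 checksums of the original and the restored copy. The file copy is a hypothetical stand-in for a real restore step, such as restoring from a snapshot, and the file names are illustrative. A matching checksum proves that the bytes survived. It does not prove that an application can use them, so a fuller test would also start the application or query the restored database.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large backups do not have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(source: Path, backup: Path, restore_dir: Path) -> bool:
    """Restore `backup` into `restore_dir` and check that it matches `source` byte for byte."""
    expected = sha256_of(source)
    # Stand-in for a real restore (for example, from a snapshot); here it is a plain file copy.
    restored = Path(shutil.copy2(backup, restore_dir / source.name))
    return sha256_of(restored) == expected


with tempfile.TemporaryDirectory() as tmp_name:
    tmp = Path(tmp_name)
    (tmp / "restore").mkdir()
    source = tmp / "orders.db"            # hypothetical data file
    source.write_bytes(b"order-1001,paid\n")
    backup = tmp / "orders.db.bak"
    shutil.copy2(source, backup)          # stand-in for taking the backup
    print(verify_restore(source, backup, tmp / "restore"))  # True
    backup.write_bytes(b"corrupted")      # simulate a damaged backup
    print(verify_restore(source, backup, tmp / "restore"))  # False
```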

:::

### Slide 29:

![Slide 29](slide_29.png)

::: Notes

Performance efficiency involves matching resources to requirements and adapting as both change. Using managed services trades control for convenience. Global deployment improves some metrics while worsening others. Serverless works for certain workloads but adds constraints. Experimentation reveals what actually matters, but it requires time and resources. Ask: what performance matters for your application? What are you willing to trade for efficiency?

#### Instructor notes

#### Student notes

The *performance efficiency pillar* of the AWS Well-Architected Framework has the following design principles:

**Democratize advanced technologies** : *Adopt new complex technologies* as a service from AWS or other cloud vendors rather than investing time and expertise internally.

**Go global in minutes** : *Deploy your workloads* in multiple AWS Regions around the world to lower latency for customers and improve their experience.

**Use serverless architectures** : Some workloads are better suited to a *serverless architecture*, which removes the need for you to run and maintain servers.

**Experiment more often** : Because of the scalable nature of the cloud, you can carry out *comparative testing* with different configurations.

**Gain mechanical sympathy** : *Use the technology approach* that aligns best with what you are trying to achieve. For more information, review the "Performance Efficiency Pillar: AWS Well-Architected Framework" technical paper at
https://docs.aws.amazon.com/wellarchitected/latest/performance-efficiency-pillar/welcome.html.

:::

### Slide 30:

![Slide 30](slide_30.png)

::: Notes

Performance efficiency requires matching architecture to actual requirements, not assumed ones. Monitor what happens in production, not just test environments. As workloads evolve, your choices become suboptimal—plan to revisit them. Decide where performance actually matters; optimizing everything is wasteful. This requires continuous attention and willingness to change direction when evidence suggests it.

#### Instructor notes

#### Student notes

**Base selection on requirements** : Focus on *architecting the optimal solution* for your workloads. Decide which combination of architectural approaches best meets your requirements.

**Review architecture regularly** : *Design your architecture to evolve* so that you can adopt new services and features that might improve your workloads.

**Have a comprehensive monitoring strategy** : Use a strategy that includes *active and passive monitoring*. The strategy should involve five phases: generation, aggregation, real-time processing and alarms, storage, and analytics.
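
The following small in-memory pipeline illustrates the five phases. This Python sketch generates latency datapoints from request records and aggregates them per minute. It then raises an alarm for minutes whose mean latency exceeds a threshold, stores per-minute summaries, and analyzes the stored history. The record format, the 250 ms threshold, and the function names are all hypothetical. In practice, services such as Amazon CloudWatch would handle these phases rather than in-process code. The sketch shows passive monitoring of real requests. Active monitoring would add the results of synthetic probes to the same pipeline.

```python
from collections import defaultdict
from statistics import mean


def generate(requests):
    """Phase 1, generation: emit (minute, latency) datapoints from raw request records."""
    return [(r["minute"], r["latency_ms"]) for r in requests]


def aggregate(points):
    """Phase 2, aggregation: group datapoints into one-minute buckets."""
    buckets = defaultdict(list)
    for minute, latency in points:
        buckets[minute].append(latency)
    return dict(buckets)


def alarm(buckets, threshold_ms):
    """Phase 3, real-time processing and alarms: minutes whose mean latency breaches the threshold."""
    return [minute for minute, values in sorted(buckets.items()) if mean(values) > threshold_ms]


def store(buckets, archive):
    """Phase 4, storage: keep compact per-minute summaries (an in-memory dict here)."""
    for minute, values in buckets.items():
        archive[minute] = {"count": len(values), "mean": mean(values), "max": max(values)}


def analyze(archive):
    """Phase 5, analytics: query stored history, here for the minute with the worst mean latency."""
    return max(archive, key=lambda minute: archive[minute]["mean"])


requests = [  # hypothetical request records
    {"minute": 0, "latency_ms": 120}, {"minute": 0, "latency_ms": 80},
    {"minute": 1, "latency_ms": 400}, {"minute": 1, "latency_ms": 300},
    {"minute": 2, "latency_ms": 90},
]
buckets = aggregate(generate(requests))
print(alarm(buckets, threshold_ms=250))  # [1]
archive = {}
store(buckets, archive)
print(analyze(archive))  # 1
```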

**Balance tradeoffs** : *Carefully balance tradeoffs* so that you can optimize your workloads. Focus on the areas where performance is most critical.

:::

### Slide 31:

![Slide 31](slide_31.png)

::: Notes

Cost optimization requires understanding the relationship between spending and value. The pay-as-you-go model means you can waste money quickly on unused resources. Measurement is essential, but it's also challenging—what counts as value? Managed services reduce your operational burden but increase costs in other areas. Consider: what's your actual cost per unit of value? Are you optimizing for the right metric?

#### Instructor notes

#### Student notes

The cost optimization pillar of the AWS Well-Architected Framework has the following design principles:

**Adopt a consumption model** : *Pay only for what you use*. This model aligns your spend with your business requirements.

**Measure overall efficiency** : *Measure the business output* of the workload and the costs associated with delivering it.
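
Measuring overall efficiency means tracking cost per unit of business output, not just total spend. The Python sketch below divides monthly cost by a business metric: orders processed. The figures and the choice of orders as the unit are hypothetical. Choose the unit that best represents value for your workload.

```python
def unit_cost(total_cost: float, business_units: int) -> float:
    """Cost per unit of business output, e.g. dollars per order processed."""
    if business_units <= 0:
        raise ValueError("business_units must be positive")
    return total_cost / business_units


# Hypothetical monthly figures for an order-processing workload.
months = {
    "month 1": {"cost_usd": 12_000.0, "orders": 400_000},
    "month 2": {"cost_usd": 13_500.0, "orders": 540_000},
}
for month, figures in months.items():
    per_order = unit_cost(figures["cost_usd"], figures["orders"])
    print(f"{month}: ${figures['cost_usd']:,.0f} total, ${per_order:.4f} per order")
```

In this example, total spend rises 12.5 percent from the first month to the second, but the cost per order falls from $0.0300 to $0.0250. The workload became more efficient even though the bill grew.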

**Stop spending money on data center operations** : AWS does the work of racking, stacking, and powering servers. As a result, you can focus on your customers and organization projects.

**Analyze and attribute expenditure** : The cloud makes it easier to *identify the usage and cost* of systems, which provides transparent attribution of IT costs to individual workload owners.
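
Attribution depends on consistent tagging. The Python sketch below sums cost line items by an `owner` tag and reports untagged spend in its own bucket so that gaps in tagging stay visible. The line-item format, resource names, and tag key are hypothetical. In AWS, you would typically activate the tag as a cost allocation tag and work from your billing data.

```python
from collections import defaultdict


def attribute_costs(line_items, tag_key="owner"):
    """Sum cost per value of `tag_key`; untagged spend gets its own bucket so it stays visible."""
    totals = defaultdict(float)
    for item in line_items:
        owner = item.get("tags", {}).get(tag_key, "UNTAGGED")
        totals[owner] += item["cost_usd"]
    return dict(totals)


# Hypothetical cost line items with resource tags.
line_items = [
    {"resource": "web-server-1", "cost_usd": 310.0, "tags": {"owner": "checkout-team"}},
    {"resource": "search-node-1", "cost_usd": 120.5, "tags": {"owner": "search-team"}},
    {"resource": "scratch-volume", "cost_usd": 42.0, "tags": {}},
    {"resource": "orders-db", "cost_usd": 890.0, "tags": {"owner": "checkout-team"}},
]
print(attribute_costs(line_items))
# {'checkout-team': 1200.0, 'search-team': 120.5, 'UNTAGGED': 42.0}
```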

**Use managed services to reduce costs** : Use *managed and application-level services* to reduce the cost of ownership. For more information, review the "Cost Optimization Pillar: AWS Well-Architected Framework" technical paper at
https://docs.aws.amazon.com/wellarchitected/latest/cost-optimization-pillar/welcome.html.

:::

### Slide 32:

![Slide 32](slide_32.png)

::: Notes

Cost optimization requires selecting appropriate resources and rightsizing them. Matching supply to demand is difficult when demand is unpredictable. Understanding spending requires visibility and accounting discipline. Regular review catches optimization opportunities but also creates overhead. The challenge is that optimizing for cost often conflicts with other priorities—you must decide what trade-offs you're willing to accept.

#### Instructor notes

#### Student notes

**Use cost-effective resources** : Focus on *cost savings* by using the appropriate services, resources, and configurations for your workload. Approaches for cost-effectiveness include *appropriate provisioning*, *rightsizing*, *purchase options*, *geographic selection*, and *managed services*.
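
Rightsizing can be sketched as choosing the smallest size that covers observed demand plus headroom. The Python example below takes CPU-utilization samples from the current instance and computes the 95th percentile. It then converts that percentile into the vCPUs needed with 25 percent headroom and picks the smallest size in a catalog that meets that need. The size catalog, the headroom value, and the percentile choice are all assumptions. A real rightsizing decision should also consider memory, network, storage, and price.

```python
from statistics import quantiles

# Hypothetical size catalog (name -> vCPUs); real instance types, ratios, and prices differ.
SIZES = {"small": 2, "medium": 4, "large": 8, "xlarge": 16}


def p95(samples: list[float]) -> float:
    # quantiles(..., n=20) returns 19 cut points; the last one is the 95th percentile.
    return quantiles(samples, n=20)[-1]


def rightsize(cpu_percent_samples: list[float], current_vcpus: int, headroom: float = 0.25) -> str:
    """Smallest catalog size whose vCPUs cover p95 CPU demand plus headroom."""
    needed_vcpus = current_vcpus * p95(cpu_percent_samples) / 100 * (1 + headroom)
    for name, vcpus in sorted(SIZES.items(), key=lambda kv: kv[1]):
        if vcpus >= needed_vcpus:
            return name
    # Nothing in the catalog is big enough; return the largest and flag it for human review.
    return max(SIZES, key=SIZES.get)


samples = [10.0] * 18 + [12.0, 14.0]  # hypothetical CPU % samples from a 16-vCPU instance
print(rightsize(samples, current_vcpus=16))  # medium
```

Here, the 95th percentile is 13.9 percent of 16 vCPUs. With headroom, that comes to about 2.8 vCPUs, so the sketch recommends `medium` (4 vCPUs) instead of the current 16.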

**Match supply with demand** : *Matching supply with demand* is key to optimizing the cost efficiency of a workload. You can do this by eliminating costly and wasteful overprovisioning. You can automatically scale resources to match demand.

**Have expenditure awareness** : *Understand your business cost drivers*. This awareness is critical to managing your business expenditure effectively. Gather data, analyze, and then report on key factors such as stakeholders, visibility, governance, cost attribution, and tagging.

**Optimize over time** : *Review your architecture regularly* to ensure that it continues to be the most cost-effective. Evaluate the cost impact of using new services, features, technology, and resources.

:::

### Slide 33:

![Slide 33](slide_33.png)

::: Notes

The sustainability pillar addresses environmental impact. Understanding carbon footprint requires tracking energy use and emissions across your stack. Setting improvement goals is worthwhile but challenging—what baseline do you measure against? Hardware efficiency and managed services reduce impact, but they create other constraints. Consider: how much environmental impact is acceptable? Where should you focus your efforts relative to other priorities?

#### Instructor notes

#### Student notes

The sustainability pillar of the AWS Well-Architected Framework has the following design principles:

**Understand your impact** : *Measure the environmental impact* of your cloud workloads, from customer use to eventual decommissioning and retirement.

**Establish sustainability goals** : For each cloud workload, *establish long-term sustainability goals*, such as reducing the compute and storage resources required per transaction.
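
A per-transaction goal needs a baseline and a consistent unit. The Python sketch below normalizes vCPU-hours and storage to an intensity per 1,000 transactions. It then reports how much of a targeted reduction has been achieved. The figures, the function names, and the 30 percent target are hypothetical.

```python
def per_1k_transactions(vcpu_hours: float, storage_gb_months: float, transactions: int) -> dict:
    """Normalize resource use to an intensity per 1,000 transactions."""
    if transactions <= 0:
        raise ValueError("transactions must be positive")
    return {
        "vcpu_hours": vcpu_hours * 1000 / transactions,
        "storage_gb_months": storage_gb_months * 1000 / transactions,
    }


def progress_toward_goal(baseline: float, current: float, target_reduction: float) -> float:
    """Fraction of a targeted reduction achieved so far; 1.0 means the goal is met."""
    achieved = (baseline - current) / baseline
    return achieved / target_reduction


# Hypothetical quarterly figures and a hypothetical 30 percent reduction goal.
baseline = per_1k_transactions(vcpu_hours=2000, storage_gb_months=500, transactions=1_000_000)
current = per_1k_transactions(vcpu_hours=2100, storage_gb_months=450, transactions=1_400_000)
progress = progress_toward_goal(baseline["vcpu_hours"], current["vcpu_hours"], target_reduction=0.30)
print(f"vCPU-hours per 1k tx: {baseline['vcpu_hours']:.2f} -> {current['vcpu_hours']:.2f}")
print(f"progress toward goal: {progress:.0%}")
```

In this example, total vCPU-hours grew from 2,000 to 2,100. However, vCPU-hours per 1,000 transactions fell from 2.0 to 1.5, a 25 percent reduction, which is roughly 83 percent of the way to the 30 percent goal.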

**Maximize utilization** : *Rightsize workloads* and implement efficient design to ensure high utilization and maximize the energy efficiency of the underlying hardware.

**Use managed services** : *Sharing services* across a broad customer base helps maximize resource use, which reduces the amount of infrastructure needed to support cloud workloads.

**Reduce the downstream impact** : *Reduce the amount of energy or resources* required to use your services. Reduce or eliminate the need for customers to upgrade their devices to use your services. For more information, review the "Sustainability Pillar: AWS Well-Architected Framework" technical paper at
https://docs.aws.amazon.com/wellarchitected/latest/sustainability-pillar/sustainability-pillar.html.

:::

### Slide 34:

![Slide 34](slide_34.png)

::: Notes

Sustainability implementation involves choosing infrastructure locations, optimizing for actual usage patterns, and managing data efficiently. Regional selection matters, but availability and latency constraints may override sustainability concerns. Understanding real usage patterns takes observation and analysis. Data management for efficiency creates other trade-offs. Ask yourself: which sustainability measures align with your other business priorities? Which require significant trade-offs? Where is your focus most valuable?

#### Instructor notes

#### Student notes

**Region selection** : Choose *Regions near Amazon renewable energy projects* and Regions where the grid has a lower published carbon intensity than other locations.
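
Region selection is a constrained choice. The Python sketch below first filters Regions by a latency requirement. It then picks the lowest published grid carbon intensity among the Regions that remain. This reflects the point that availability and latency constraints may override sustainability preferences. The Region names, latencies, and carbon-intensity figures are hypothetical placeholders, not real data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RegionOption:
    name: str
    p95_latency_ms: float    # measured from your users to the Region
    carbon_g_per_kwh: float  # published grid carbon intensity for the Region


def pick_region(options: list[RegionOption], max_latency_ms: float) -> Optional[str]:
    """Among Regions that meet the latency requirement, pick the lowest carbon intensity."""
    eligible = [o for o in options if o.p95_latency_ms <= max_latency_ms]
    if not eligible:
        return None  # no Region meets the requirement; sustainability cannot override it
    return min(eligible, key=lambda o: o.carbon_g_per_kwh).name


# Hypothetical placeholders, not real Regions or real carbon-intensity data.
options = [
    RegionOption("region-a", p95_latency_ms=40, carbon_g_per_kwh=400),
    RegionOption("region-b", p95_latency_ms=70, carbon_g_per_kwh=150),
    RegionOption("region-c", p95_latency_ms=180, carbon_g_per_kwh=30),
]
print(pick_region(options, max_latency_ms=100))  # region-b
print(pick_region(options, max_latency_ms=50))   # region-a
```

Tightening the latency requirement from 100 ms to 50 ms forces the choice of `region-a`, despite its higher carbon intensity.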

**User behavior patterns** : *User behavior best practices* describe how you can align your workload more closely with how your customers use it. Look at when, where, and how your customers use your workload, and especially at what they don't use.

**Software and architecture patterns** : *Optimize software and architecture* for asynchronous and scheduled jobs. Remove or refactor workload components with low or no use.

**Data patterns** : *Analyze data patterns* to implement data management practices that reduce the provisioned storage required to support your workload.
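
One data-pattern practice is to move data to colder storage tiers as it ages and to flag data that nobody uses. The Python sketch below assigns each object a target tier based on the number of days since it was last accessed. The tier names and age cutoffs are hypothetical. In Amazon S3, lifecycle configurations and S3 Intelligent-Tiering can automate transitions like these. This sketch shows only the classification logic.

```python
from datetime import date

# Hypothetical tiers: (maximum days since last access, tier name), checked in order.
TIERS = [(30, "hot"), (180, "infrequent"), (365, "archive")]


def tier_for(last_accessed: date, today: date) -> str:
    age_days = (today - last_accessed).days
    for max_age_days, tier in TIERS:
        if age_days <= max_age_days:
            return tier
    return "delete-candidate"  # unused for over a year: review before deleting


def plan(objects: dict[str, date], today: date) -> dict[str, str]:
    """Map each object key to a target tier based on days since it was last accessed."""
    return {key: tier_for(last, today) for key, last in objects.items()}


objects = {  # hypothetical object keys and last-access dates
    "reports/latest.csv": date(2024, 6, 25),
    "logs/2024-03.gz": date(2024, 3, 31),
    "backups/2023-10.tar": date(2023, 10, 1),
    "exports/old-campaign.csv": date(2023, 2, 1),
}
print(plan(objects, today=date(2024, 6, 30)))
```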

**Hardware patterns** : Using the capabilities of the cloud, you can *make frequent changes* to your workload implementations and aim to *minimize the amount of hardware* you need to operate your workload.

**Development and deployment process** : *Adopt development and testing methods* that rapidly introduce sustainability improvements. Keep your workload up to date. Increase the utilization of your build environments.

:::

