Calculating Sustainability Metrics in a Public Cloud

Brent Segner
8 min read · Feb 26, 2024


In the past few years, there has been tremendous growth in interest in environmental sustainability. Whether it is reusable water bottles, better gas mileage, or composting leftover food, sustainability has taken a central focus. These efforts have had varying degrees of adoption and success. Still, the overarching trend is that they gain better traction when individuals and companies receive a benefit alongside the environmental one.

As concerns about climate change and environmental impact continue to rise, many companies increasingly focus on reducing their carbon footprint and adopting more sustainable practices across all operations, including cloud computing. Given the tremendous amount of power and water required to support services within hyperscale data centers, the current consensus is that data centers are responsible for 2%-5% of all annual greenhouse gas emissions.

The major cloud providers (AWS, GCP, Azure, etc.) have made a tremendous effort to reduce the environmental impact of their operations by shifting toward renewable energy sources such as wind, solar, and hydroelectric power to run their data centers. In addition to cultivating cleaner energy sources, there has been a significant focus on implementing energy-efficient technologies and practices within the data centers themselves to minimize energy consumption. These measures include advanced cooling systems, optimized server utilization, and hardware with higher energy-efficiency ratings. As a result, the average Power Usage Effectiveness (PUE) score for hyperscale data centers is 1.2, and potentially less.

Many cloud providers report virtually zero emissions by purchasing credits from carbon offset programs (reforestation initiatives or renewable energy projects) to balance out the carbon emissions produced by their data centers. While this is a good news story in many regards, the lack of transparency into the underlying infrastructure creates a challenge for customers: because providers report at a carbon-neutral level, it is difficult for their customers to understand the Scope 3 impact of the services running on that infrastructure. Ultimately, this level of visibility is necessary for GreenOps (Green Operations) teams within businesses to make informed decisions and align their technology architecture with their sustainability goals.

Calculating Scope 3 Footprint

Although it initially appears to be an unlikely pairing, FinOps (Financial Operations) teams have emerged as a vehicle to enable GreenOps teams to work towards their carbon reduction goals. FinOps and GreenOps are closely tied together because both fundamentally involve optimizing resource usage in cloud computing environments, albeit with different focuses: FinOps on cost optimization, GreenOps on environmental sustainability. Despite their distinct goals, there are several areas where the practices intersect and complement each other. In the case of understanding the Scope 3 footprint, the FinOps practice of establishing unit metrics opens the door to meaningful estimations of power consumption and, in turn, carbon emissions from these environments.

The methodology described in this article to calculate wattage draw (and, in turn, carbon) leverages one of my previous blogs on determining the number of CPU hours consumed by various compute services in AWS (EC2, Lambda, Fargate). While the CPU/instance hours are reasonably straightforward and can be calculated using the AWS Cost & Usage Report (CUR), deriving the power is more challenging since we do not have visibility into the infrastructure. Arriving at kilowatt-hour (kWh) and metric tons of carbon dioxide equivalent (MTCO2e) figures will require several assumptions.

Calculation Assumptions

Assumption 1: The first of these assumptions involves a wattage draw for the various resources. Since this article's principal focus is on calculating wattage draw based on reporting from the CUR, I will refrain from going too deep into the nuances of how different processor types and utilization levels influence power consumption. If you want a deeper dive into the subject, an excellent article by @Benjamin Davy breaks down the issue well. Since the topic has been exceptionally well researched, we will use their AWS EC2 Instance Power Dataset as the foundation of our power calculations.

Assumption 2: The second of these assumptions involves the oversubscription of CPU resources on AWS. In this context, we are not focused on the performance implications of oversubscription but rather on how it would reduce the presumed power draw for any CPU hour of utilization, since the resource is shared. In the case of EC2, the power research referenced under Assumption 1 indicates that the resources might be slightly underclocked to achieve the best combination of power and performance, but the processors do not appear to be oversubscribed. The AWS documentation confirms this and offers a mechanism to adjust the P-state and leverage Turbo Boost if additional performance is needed.

Although we do not have precise insight into how the infrastructure is tuned for Lambda and Fargate, since those are managed services, performance testing indicates some oversubscription. Given the fall-off in performance as I/O increases, the Fargate testing suggests modest contention for resources; taking a conservative approach, we will assume the oversubscription is at least 2:1. The Lambda performance benchmarking and platform economics are indicative of a much higher level of oversubscription (potentially up to 20:1). Still, given the nature of the service, it is challenging to pin down a specific number, so we will use 3:1 as a starting point.

Assumption 3: The third assumption concerns CPU model and utilization, since wattage draw directly influences the carbon calculations. In the case of EC2, the CPU model is less relevant since the dataset used to calculate wattage draw already accounts for it, and average CPU utilization can be collected from CloudWatch metrics. Fargate and Lambda require one last set of assumptions. AWS will likely use slightly older CPU models, as infrastructure that is no longer heavily leveraged for EC2 requirements is repurposed to support serverless workloads. These would likely come from the Broadwell or Skylake families, consuming about 7 watts at 50% TDP.

Assumption 4: The fourth and final assumption is the factor used to convert each kWh of power consumed into MTCO2e produced. This factor is published by the EPA for users who want to know the greenhouse gas equivalencies associated with electricity consumed (not reduced), based on a national average emissions factor.

Summarized Assumptions

The following summarizes the base assumptions we will use for the analysis. They are intended to be conservative and reflective of performance testing conducted in the field, but can be changed based on individual testing results:

EC2 oversubscription: 1:1 (none)
Fargate oversubscription: 2:1
Lambda oversubscription: 3:1
Serverless power draw: ~7 watts per CPU hour (50% TDP)
Power Usage Effectiveness (PUE): 1.2
MTCO2e conversion factor: 0.0042 per kWh

MTCO2e Calculations

Since each user will have their own tooling and naming conventions specific to their environment, this section focuses less on line-by-line code and more on the methodology used to reconstruct the wattage consumed and carbon produced by each service. Given the volume of data involved, the examples in this section use PySpark.

Step 1: Load the EC2 power dataset above into a data frame.

instance_data = [['a1.medium', 1],
                 ['a1.large', 3],
                 ['a1.xlarge', 7],
                 ['a1.2xlarge', 15],
                 ['a1.4xlarge', 30],
                 ['a1.metal', 30],
                 ['c1.medium', 4],
                 ['c1.xlarge', 19],
                 ...]  # remainder of the power dataset

Step 2: Pull the usage from the EC2, Fargate & Lambda services from the Cost & Usage report. This blog defines the fields that can be used as explicit filters for each service. Note that for EC2, it is not necessary to determine the number of CPU hours since instance hours will be the foundation for the wattage calculation. As each data frame is pulled in, I like to label the service to make it easier to work with later on when we perform the aggregations.

Step 3: This step applies the wattage consumed per hour to the respective datasets. EC2 is handled slightly differently from the serverless data, since it requires a join with the power data frame. This step brings in several of the assumptions and data points from above: the conversion of watts to kWh (/1000), the PUE factor (1.2), and the conversion to MTCO2e (0.0042).

EC2 Only (Join EC2 & Instance Power Data Frames)

Apply Wattage / Carbon Calculation (note that EC2 uses per-instance watts while serverless uses 7 watts per CPU hour)

Step 4: Once all the data frames have applied the total wattage & carbon calculations, they can be merged back into a single standard data frame to make analytics easier.

Step 5: The fifth step applies the oversubscription ratio discussed as part of the second assumption. There are several different ways to do this, but I prefer to do it in two steps: first, add it to the data frame using the label we applied in Step 2, and then create a new column with the adjusted value.

Add oversubscription value

Create an adjusted carbon value, including the calculation.

Step 6: The final step brings us back to a previous blog: the data can now be summarized into meaningful unit metrics.

Conclusion

Ultimately, estimating a company's carbon footprint in the public cloud is an imperfect science, given the variability of resources, volatility of the data, and lack of transparency into the environment. The goal we are working towards is to maximize resource efficiency in cloud environments. Whether you approach the problem from a FinOps perspective, optimizing resource allocation and utilization to minimize costs, or a GreenOps perspective, reducing energy consumption and carbon emissions, both disciplines arrive at the same net result: by optimizing resource usage to reduce energy consumption and carbon emissions, GreenOps practices also lead to cost savings.

Overall, integrating FinOps and GreenOps practices can help organizations achieve synergies between cost optimization and environmental sustainability in their cloud operations. By aligning financial goals with ecological objectives, businesses can maximize the value of their cloud investments while minimizing their environmental impact. The approach taken within this article to attempt to quantify the carbon footprint in the public cloud is just one of many. If you have a different system or method that you have found successful towards these ends, I would love to hear about it in the comments.

Note: The information and perspectives held within this blog represent my personal opinion and not that of my employer or foundations that I am affiliated with.


Brent Segner

Distinguished Engineer @ Capital One| Focused on all things FinOps | Passionate about Data Science, Machine Learning, AI, Python & Open Source