Unit Metrics in the Cloud

Brent Segner
5 min read · Feb 24, 2024

For any business to succeed, there is a tremendous need to develop a deep understanding of how efficiently the money spent to operate the company is being used. With the immense investment in the cloud, even small decreases in efficiency or modest changes to the types of resources being selected can have profound cost implications at scale. Cloud unit metrics play an essential role in safeguarding against these small shifts in the operating environment, which can otherwise have a seismic impact on the income statement.

While these cloud unit metrics can serve as a “north star” in guiding organizations toward their goals, particularly in cost management and resource optimization, they become difficult to implement across non-homogeneous environments. Whether a company is spread across a multi-cloud environment (GCP, AWS, Azure) or has different resource types in the same cloud (i.e., Compute: EC2, Lambda, Fargate, etc.), the value to the business lies in a single unit metric that scales across all of them.

Since the footprint within each organization is unique, it would take a blog post longer than I care to write (or anyone would care to read) to address all of the conceivable scenarios. Instead of boiling the ocean, I will focus this post on interpreting the CPU Hour metric across the AWS Compute footprint, specifically the services mentioned above (EC2, Lambda, and Fargate). As with previous posts, this reflects my own interpretations and methodologies, so I welcome any feedback on how others may have approached the same challenge.

CPU Hour Calculations

If you are just starting down the unit metrics path, the CPU Hour metric is among the most significant. With compute occupying one of the largest shares of cloud spend, the CPU Hour metric can be used in a variety of ways, from determining what percentage of the resource footprint runs on cloud-native technologies versus VMs to tracking whether the environment is increasing or decreasing in overall capacity over time. The challenge is that not all interpretations of CPU hours are straightforward, and each service requires a slightly different approach. In this section, we will break down the methodologies that can be used to infer the CPU hours consumed across the three primary compute resource types within AWS.

Note: In these examples, we will use the AWS Cost & Usage Report (CUR) as the authoritative source for time and resources consumed. Because the typical record volume in the CUR is reasonably large, the filters are expressed against a Spark data frame, but the same methodology could be applied to an Athena query or a relational database if needed.
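For context, a minimal setup might look like the sketch below. The S3 path is a placeholder, and the column names follow the Parquet CUR naming convention (e.g., lineItem_UsageStartDate); adjust both to match how your report is delivered.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("cur-cpu-hours").getOrCreate()

# Hypothetical S3 location of a Parquet-format CUR export
cur = spark.read.parquet("s3://my-cur-bucket/cur/parquet/")

# Restrict to the billing period of interest (column name assumes the
# Parquet CUR schema; adjust if your report uses a different naming convention)
cur = cur.filter(
    (F.col("lineItem_UsageStartDate") >= "2024-01-01")
    & (F.col("lineItem_UsageStartDate") < "2024-02-01")
)
```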

Fargate CPU Hours

Fargate, as reported in the CUR, is reasonably straightforward in that each task reports the total number of vCPU hours consumed. Once the dataset is filtered to the Fargate vCPU usage records (example below), the sum of the lineItem_UsageAmount field represents the total CPU hours consumed by the tasks.
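A sketch of that filter and aggregation is below. The usage type string in the CUR carries a region prefix (e.g., USE1-Fargate-vCPU-Hours:perCPU), so matching on the “Fargate-vCPU-Hours” substring is an assumption that may need adjusting for your environment.

```python
# Fargate tasks bill vCPU time under the ECS/EKS product codes;
# the per-vCPU usage records already carry hours in lineItem_UsageAmount
fargate = cur.filter(
    (F.col("lineItem_ProductCode").isin("AmazonECS", "AmazonEKS"))
    & (F.col("lineItem_UsageType").contains("Fargate-vCPU-Hours"))
)

fargate_cpu_hours = fargate.agg(
    F.sum("lineItem_UsageAmount").alias("fargate_cpu_hours")
)
fargate_cpu_hours.show()
```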

EC2 CPU Hours

While not quite as simple as Fargate, the EC2 CPU Hours metric is still a reasonably straightforward calculation. At its core, it is simply the number of vCPUs contained within an instance multiplied by the total duration that the instance was running.

Example:

Assuming that the data frame is already filtered to the date range needed, the next step is to filter to the EC2 instance usage records. As a side note, depending on the services being run within the environment, the data returned may include EC2 instances that back ECS clusters.
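One way to express that filter is sketched below, again assuming the Parquet CUR column names. Matching on “BoxUsage” captures On-Demand instance hours; the Spot pattern is included as an assumption, so broaden or narrow the patterns to match the purchase options in scope.

```python
# EC2 instance-hour records: On-Demand ("BoxUsage") and Spot usage types
ec2 = cur.filter(
    (F.col("lineItem_ProductCode") == "AmazonEC2")
    & (F.col("product_instanceType").isNotNull())
    & (F.col("lineItem_UsageType").rlike("BoxUsage|SpotUsage"))
)
```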

Once the data frame consists of just EC2 instances, the calculation can be applied to derive the number of CPU hours consumed.
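A minimal sketch of that calculation, assuming the vCPU count is exposed in the product_vcpu column and that lineItem_UsageAmount for these records is expressed in instance hours:

```python
# CPU hours per record = vCPUs on the instance * hours the instance ran
ec2 = ec2.withColumn(
    "cpu_hours",
    F.col("product_vcpu").cast("double") * F.col("lineItem_UsageAmount"),
)

ec2_cpu_hours = ec2.agg(F.sum("cpu_hours").alias("ec2_cpu_hours"))
ec2_cpu_hours.show()
```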

Lambda CPU Hours

The least straightforward of the three CPU Hours metrics to calculate is Lambda. Since the CPU allocated to a Lambda function is determined by the amount of memory configured at provisioning time, there is no direct reference in the CUR to the CPU resources consumed, so the number has to be derived. According to AWS, a function is allocated approximately one full vCPU for every 1,769 MB of memory configured.

After we filter to the Lambda records, selecting the items where the lineItem_LineItemDescription field contains “AWS Lambda - Total Compute,” the lineItem_UsageAmount field represents the memory allocated multiplied by the run time. Dividing that value by the 1,769 MB-per-vCPU ratio (and normalizing the time component to hours if the usage is reported in seconds) yields the approximate CPU hours used for execution.
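The sketch below assumes the usage for these rows is reported in GB-seconds (as the Lambda-GB-Second usage type suggests), so the conversion divides by 1.769 GB per vCPU and by 3,600 seconds per hour; if your report expresses the usage differently, adjust the unit conversion accordingly.

```python
# Lambda compute usage ("Total Compute" rows, billed in GB-seconds)
lambda_usage = cur.filter(
    (F.col("lineItem_ProductCode") == "AWSLambda")
    & (F.col("lineItem_LineItemDescription").contains("AWS Lambda - Total Compute"))
)

# Derived vCPU hours: GB-seconds / (1.769 GB per vCPU) / (3,600 seconds per hour)
lambda_cpu_hours = lambda_usage.agg(
    (F.sum("lineItem_UsageAmount") / 1.769 / 3600).alias("lambda_cpu_hours")
)
lambda_cpu_hours.show()
```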

Conclusion

After overcoming the initial hurdle of calculating the number of CPU Hours consumed by a service, the real benefit of these “north star” metrics can be realized: creating insights into the business. Aside from simply understanding the total number of CPU hours consumed by the infrastructure, they are a stepping stone toward cost efficiency measurements such as “Cost Per CPU Hour” or sustainability measurements such as “Carbon Per CPU Hour,” which will be covered in an upcoming blog.

Ultimately, any unit metric, whether CPU Hours or another, should help confirm a proper alignment with business goals. Whether the focus is on reducing costs, improving efficiency, or optimizing performance, the best metrics are the ones that reflect organizational priorities and drive actions toward achieving the declared goals.

Note: The information and perspectives held within this blog represent my personal opinion and not that of my employer or foundations that I am affiliated with.

