The Insider's Guide to Arm Cortex-M Development

By: Zachary Lasiuk, Pareena Verma, Jason Andrews

Overview of this book

Cortex-M has been around since 2004, so why a new book now? With new microcontrollers based on the Cortex-M55 and Cortex-M85 being introduced this year, Cortex-M continues to expand. New software concepts, such as standardized software reuse, have emerged alongside new topics including security and machine learning. Development methodologies have also significantly advanced, with more embedded development taking place in the cloud and increased levels of automation. Due to these advances, a single engineer can no longer understand an entire project and requires new skills to be successful. This book provides a unique view of how to navigate and apply the latest concepts in microcontroller development. The book is split into two parts. First, you’ll be guided through how to select the ideal set of hardware, software, and tools for your specific project. Next, you’ll explore how to implement essential topics for modern embedded developers. Throughout the book, there are examples for you to learn by working with real Cortex-M devices with all software available on GitHub. You will gain experience with the small Cortex-M0+, the powerful Cortex-M55, and more Cortex-M processors. By the end of this book, you’ll be able to practically apply modern Cortex-M software development concepts.
Table of Contents (15 chapters)
Part 1: Get Set Up
Part 2: Sharpen Your Skills

Processor selection based on performance and power

Another way to choose and understand Cortex-M processors is to rank them by how well they match performance and power requirements. Without structure, this can be a daunting task, with a wide range of possibilities (from the number of interrupts to the overall price, and everything in between). In this section, we will define six categories to evaluate and go over a few examples of how to use them in practice to select the right processor for your project. Again, this is also a helpful framework for understanding the capabilities of Cortex-M processors.

We will select the right processor using an approach we will call requirement heuristics. This means translating your key project requirements into predefined areas and following simple steps to get to the right Cortex-M processor. The six areas are listed here:

  • Power
  • DSP performance
  • ML performance
  • Security
  • Safety
  • Cost

In each area, we rank the processors that best meet the project requirement. You can then select the areas that matter most to your project and find the processor that meets these needs. Let’s discuss each area before showing some examples.

Power

Minimizing power consumption is crucial in environments where power is highly constrained. A common use case is distributed sensors that must operate continuously for long periods without being serviced.

When looking at power metrics, it helps to understand some of the technical jargon. There are two common types of power measurement: static (or leakage) power consumption and dynamic power consumption. Static power consumption measures the amount of power used by the processor when it is not actively processing anything, such as when it is in a “sleep” mode but with the power still on. Dynamic power consumption measures the power consumed when the processor is actively working on a task. Dynamic power is often measured while running an industry-standard software workload called Dhrystone, to enable consistent comparisons. It is expressed in microwatts per megahertz (uW/MHz); normalizing power by frequency enables consistent comparisons between processors running at different clock frequencies.

The following table shows the dynamic power of the different Cortex-M processors on the same node size:

Table 1.1 – Dynamic power across different Cortex-M processors on 40 LP node size

Another factor that affects the power consumption of the core is the technology node size used to manufacture the silicon. This is referred to as “technology,” “process node,” or just “node,” depending on the context. The node size generally refers to the physical size of the transistors; as the node size decreases, more transistors can be packed onto the same-size silicon wafer. Smaller node sizes also generally lead to reduced power consumption, both static and dynamic. Knowing the node size is therefore important for accurately comparing different chips or boards.

Using a consistent node size of 40 nm and publicly available benchmarking data, we can rank the Cortex-M processors on the low-power axis, like so:

Figure 1.2 – Ranking power consumption for Cortex-M

Note that the processors in the preceding figure are ordered from best to worst, starting at the top. In this case, the processors requiring the least power to operate are ranked visually higher. We are also not displaying all the Cortex-M processors in this (and subsequent) figures; we’re only displaying processors that perform well in the category. The spacing between processors is not intended to communicate precise quantitative differences, only a general ranking based on power consumption. The dotted line is also a qualitative distinction, indicating a notable separation between the processor capabilities on this axis.

The Cortex-M0+ comes in first, as one of the lowest-power 32-bit processors on the market. It can get as low as 4 uW/MHz in dynamic power consumption when manufactured at 40 nm. This processor is being used at the forefront of low-power technologies. It can even be used in applications without a battery, relying on energy harvesting from the environment to power the device. Now that is low power! We talk about energy harvesting and ultra-low-power applications in Chapter 10, Looking Ahead.

The Cortex-M23 is essentially tied with the Cortex-M0+ in terms of minimizing power. It can achieve power figures similar to those of the Cortex-M0+ when configured minimally, although including its security features increases power consumption. Overall, given how new the Cortex-M23 is and that it is more often manufactured at smaller node sizes such as 28 nm and below, the Cortex-M23 is equally viable for minimizing power consumption.

The Cortex-M0 also minimizes power draw and is only slightly behind the Cortex-M0+ and Cortex-M23. Because the two processors are so closely related, the Cortex-M0+ is typically a better option than the Cortex-M0.

The Cortex-M33, Cortex-M3, and Cortex-M4 all have about triple the power draw of the Cortex-M0+. If the lower-power processors do not have enough processing power or features for your use case, these processors are likely a good fit.

DSP performance

DSP is needed when taking real-world signals and digitizing them to perform some computations. This is exceedingly common in movement, image, or audio processing applications when data is coming in real time. Devices with sensors and motors to detect and act on real-time data rely heavily on DSP.

The computational nature of these types of DSP applications is really centered on what we call scalar processing. You may be familiar with the word scalar from math or physics classes. A scalar is a quantity that has only one characteristic. For example, measuring gas pressure 10 times a second for 1 second will produce 10 data points. Each point has one characteristic: the magnitude of the gas pressure at that instant. These types of DSP applications, which include audio processing as well, lend themselves well to scalar processing.

To measure how good Cortex-M processors are at scalar processing, there are two common benchmarks: CoreMark and Dhrystone. Using these imperfect but generally helpful benchmarks, you can compare how well different processors run scalar workloads such as the DSP use cases discussed previously. You can download and view the Dhrystone scores (in Dhrystone Million Instructions per Second per MHz, or DMIPS/MHz) and CoreMark scores for all Cortex-M series processors here:

Using these publicly available CoreMark benchmark scores compiled with the Arm Compiler for Embedded, we can rank the Cortex-M processors in terms of DSP performance, as follows:

Figure 1.3 – Ranking DSP performance for Cortex-M

The benchmark numbers quoted next are valid at the time of this book’s publication. Due to subtle changes in firmware, benchmarks, and compilers, the numbers may change slightly over time. These changes will be small, and the rankings listed are still directionally accurate.

As the newest Arm Cortex-M processor, the Cortex-M85 provides the highest scalar and signal-processing performance to date in the Cortex-M family. It boasts a CoreMark score of 6.28 CoreMark/MHz, is suitable for the most demanding DSP applications, and also includes TrustZone security features.

Although it has been superseded by the Cortex-M85, the Cortex-M7 is still a good choice for less demanding DSP applications or where functional safety is critical. The Cortex-M7 has a CoreMark score of 5.29 CoreMark/MHz.

The Cortex-M55 and Cortex-M33 are similar in scalar performance, with CoreMark scores of 4.4 and 4.1 CoreMark/MHz, respectively.

The Cortex-M4 and Cortex-M3 are the next steps down in performance, with CoreMark scores of 3.54 and 3.44 CoreMark/MHz, respectively. The Cortex-M4 is better suited to DSP use cases due to its DSP instruction extensions and optional FPU, neither of which the Cortex-M3 has. The Cortex-M4 is commonly used in sensor fusion, motor control, and wearables. The Cortex-M3 is used for more balanced applications with lower area and power requirements.

Applications involving video processing are more demanding than traditional DSP software and benefit from operating on multiple data elements simultaneously, which is called vector processing. Vector processing also accelerates the most popular workload today: ML.

ML performance

Because of its increased popularity and potential in edge devices, we will devote an entire chapter to ML in Chapter 6, Leveraging Machine Learning. In this section, we will give an overview of how ML workloads are executed in hardware to identify the right Cortex-M processor for the job.

ML, at a computational level, is matrix math. Neural networks (NNs) are represented by layers of neurons; in a fully connected layer, each neuron in one layer is connected to each neuron in the next layer. When an input is given (such as a picture of a cat to an image recognition network), it gets separated into distinct features and sent through each layer, one at a time. In practice, this means that at each layer, there are x inputs going into n nodes. This leads to x*n multiply-accumulate computations at each layer, and a network could have dozens of layers, with potentially hundreds of nodes in each layer.

In scalar computing, this could result in tens of thousands of calculations performed one after the other. In vector computing, you instead load multiple values into a single vector register, one per lane, and a single instruction performs the same calculation on every lane at once, so the x*n calculations complete in far fewer steps. This is the benefit of vector processing, which has existed in larger Arm cores for years via NEON technology. The Helium extension (the M-Profile Vector Extension, or MVE) brings this technology to Cortex-M processors without significantly increasing area and power.

Using matrix multiplication performance as a benchmark, we can rank the Cortex-M processors in terms of ML performance, like so:

Figure 1.4 – Ranking ML performance for Cortex-M

The Cortex-M85 processor boasts the most recent implementation of the Helium vector processing technology. It brings more ML functionality to edge devices and enhances applications such as robotics, drones, and smart home control.

The Cortex-M55 processor was the first Cortex-M processor with Helium vector processing technology. It brings anomaly and object detection use cases to the edge when implemented standalone. When paired with an NPU (such as the Ethos-U55 discussed earlier in this chapter), gesture detection and speech recognition use cases can be unlocked while still controlling power consumption and cost. Even by itself, the Cortex-M55 has about an order of magnitude (OOM) better ML performance than the next closest, the Cortex-M7.

The Cortex-M7 processor is a superscalar processor, meaning it can issue more than one instruction per clock cycle, which parallelizes scalar workloads. This allows it to run DSP applications faster, but the more computationally intensive ML use cases are more of a challenge. This processor is suitable for basic ML use cases such as vibration and keyword detection.

The Cortex-M4 processor is often stretched to its computational limits when applied to ML use cases. In most cases, it should only be considered if the ML use case is around vibration/keyword detection or sensor fusion, and there is a strict power or cost constraint.

Security

As IoT devices have become more important and ubiquitous in people’s lives, security has become a strong requirement. Basic security measures, such as cryptographic password storage, are no longer sufficient on their own. As the value and volume of what is stored on edge devices increases, malicious actors have a proportionally greater incentive to attack them.

We will also devote Chapter 7, Enforcing Security, to the topic of security and dive into more specifics there. This section gives you an overview of the key considerations to remember when selecting a processor with security in mind. To successfully secure your software and project, the underlying hardware needs to enable some essential features such as software isolation, memory protection, secure boot, and more. Arm has added a security extension called TrustZone to the newer Cortex-M processors; it enhances these security basics, adds more functionality in hardware, and makes security implementations easier. TrustZone enables you to isolate sections of memory or peripherals at the hardware level, making attacks more difficult and more contained if they do occur. The full details, benefits, and a quick-start guide for this extension will be provided in a later chapter.

Note that this is an optional extension, so make sure to verify that it is implemented in the processor of any development board you are considering.

Using TrustZone and additional security features as a guide, we can rank the Cortex-M processors in terms of security features, as follows:

Figure 1.5 – Ranking security for Cortex-M processors

In practice, these processors all contain the TrustZone security extension and are all excellent options for developing a secure project. They are all based on the Armv8-M architecture, whereas the other Cortex-M processors are based on Armv7-M or Armv6-M. They are ordered from most to least recently released, but other requirements such as low power, ML, or DSP performance should decide which of these processors to select. Note that Arm also has a Platform Security Architecture (PSA) certification that validates the security implementations at the development-board level.

The PSA and TrustZone implementations in software are discussed in more detail in Chapter 7, Enforcing Security. Resources to learn more about the different Arm instruction sets (outside the scope of this book) are listed under the Further reading section at the end of this chapter.

Important note

The Cortex-M35P processor is a specialized processor intended for the highest level of security. It features built-in tamper resistance and physical protection from invasive and non-invasive attack vectors. This makes it ideal for devices that protect valuable resources but are accessible to the public, such as a smart door lock. The core is similar to the Cortex-M33, adding that physical layer of protection. If your product needs physical, tamper-resistant security as the primary requirement, this is definitely a Cortex-M processor to consider.

Safety

In the embedded space, safety requirements typically arise in regulated, safety-critical environments. These safety requirements can be categorized as either diagnostic requirements or systematic requirements. Diagnostic requirements relate to the management of random faults on the device and are addressed by adding hardware features for fault detection (FD) and control. Systematic requirements relate to demonstrating the avoidance of systematic failures and are typically addressed through the design process and verification.

Products sold into high-safety environments must demonstrate a level of risk reduction as defined by international standards. The International Electrotechnical Commission (IEC) 61508 standard defines general Safety Integrity Levels (SILs), with SIL 4 being the most strict and SIL 1 being the least strict. The automotive industry has a dedicated level system defined in ISO 26262, the Automotive Safety Integrity Level (ASIL), with ASIL D being the strictest and ASIL A being the least strict.

Some common safety features include the following:

  • Exception handling, which prevents software crashes in the case of system faults.
  • Memory Protection Units (MPUs), which protect data integrity by preventing invalid memory accesses.
  • Software test libraries (STLs) test for faults at startup and runtime. Note that this is not a feature of the processor, but instead, a suite of software tests provided to run on a specific processor.
  • Dual-Core Lockstep (DCLS), where two processors run the same code in lockstep and their outputs are compared to detect system errors.
  • Error Correction Code (ECC), which automatically detects and corrects memory errors.
  • Memory Built-In Self-Test (MBIST) interfaces that enable memory integrity validation while the processor is running.

We can rank the Cortex-M processors in terms of safety features, showing the cutoff for which processors are capable of reaching certain safety levels, as follows:

Figure 1.6 – Ranking safety features of Cortex-M processors

The Cortex-M7 is alone in the Cortex-M family in offering ECC, MBIST, and DCLS features alongside the more common MPU and exception handling. The Cortex-M55, Cortex-M33, and Cortex-M23 do not contain all of those features but are still capable of meeting the strict SIL 3 and ASIL D safety levels.

The Cortex-M4, Cortex-M3, and Cortex-M0+ all offer enough safety features, such as STLs, MPUs, and exception handling, to achieve the less strict SIL 2 and ASIL B safety levels.

The Cortex-M35P processor is highly effective for safety applications as well as security applications. It contains most of the already listed safety features, adding in heightened observability to ensure expected behavior and more.

Now that we have looked at some key features that can drive your processor selection, let us look at how cost can impact this decision-making process.

Cost

Minimizing cost is a common requirement in deeply embedded and IoT spaces. When looking at microcontrollers or boards to purchase, the cost should be obvious and does not require much explanation.

We can, however, provide some context for what contributes to the cost of a microcontroller, the largest factor being silicon area. A larger die takes up more of each silicon wafer, so fewer dies fit on a wafer and each one costs more to make. Production volume also affects cost: the higher the production volume, the lower the cost per unit. Typically, the smaller the Cortex-M processor is, the less expensive it will be to manufacture, and thus the lower the price to purchase. We will review the Cortex-M processors with the lowest area so that you have a starting point to look for boards with these processors to have the best chance of minimizing your overall cost.

Important note

These Cortex-M processors are highly configurable, and implementing more features will increase the area and likely increase cost. When looking for a microcontroller or board while minimizing cost, make sure to select a product with only the features you need, to hold down the cost as much as possible.

Here are the Cortex-M processors ranked in terms of lowest possible area:

Figure 1.7 – Ranking area of Cortex-M processors

The Cortex-M0 and Cortex-M0+ are tied at the lowest area and are commonly found at the lowest price points. These are excellent choices for low-cost applications if they have enough functionality.

The Cortex-M23 is just behind the Cortex-M0 and Cortex-M0+ in terms of area. The Cortex-M23 has the benefit of enhanced security features, making it a great choice for low-cost connected use cases.

The Cortex-M3, Cortex-M4, and Cortex-M33 all follow and are notably larger than the previous Cortex-M processors. These are the next best options to look at when you need more functionality while keeping costs low.

With this background about selecting Cortex-M processors, let’s look at how to find development boards that include Cortex-M processors.