The number of instructions a processor can execute per second, usually quoted in millions (MIPS), is one measure of processor performance. This figure depends on the processor architecture, the clock speed, and the memory performance, among other factors. The microcontroller can be clocked from one of three oscillator sources. A high-speed external (HSE) clock is derived from a 25 MHz crystal oscillator connected between two pins of the microcontroller. A high-speed internal (HSI) clock is sourced from an internal 16 MHz resistor-capacitor (RC) controlled oscillator, and a Phase-Locked Loop (PLL) can be configured to provide multiples of either HSE or HSI.
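As a rough illustration of how the PLL produces a multiple of its input clock, the sketch below computes the PLL output from an input frequency and three divider/multiplier settings, using the relationship f_vco = f_in × N / M and f_out = f_vco / P given for the main PLL in RM0090. The function name pll_output_hz and the particular M, N, and P values mentioned afterwards are illustrative assumptions, not settings taken from this project; the permitted ranges should be checked against the reference manual.

```c
#include <assert.h>
#include <stdint.h>

/* Nominal oscillator frequencies quoted in the text. */
#define HSI_HZ 16000000u /* internal RC oscillator */
#define HSE_HZ 25000000u /* external crystal oscillator */

/*
 * Hypothetical helper: main PLL output frequency in Hz.
 *   f_vco = f_in * N / M
 *   f_out = f_vco / P
 * The caller chooses M, N and P; this sketch does not check them
 * against the limits in the reference manual.
 */
static uint32_t pll_output_hz(uint32_t f_in_hz, uint32_t m, uint32_t n, uint32_t p)
{
    assert(m != 0u && p != 0u);
    /* Use 64-bit arithmetic so that f_in * N cannot overflow. */
    uint64_t f_vco = (uint64_t)f_in_hz * n / m;
    return (uint32_t)(f_vco / p);
}
```

With these assumed settings, pll_output_hz(HSE_HZ, 25u, 336u, 2u) and pll_output_hz(HSI_HZ, 16u, 336u, 2u) both evaluate to 168000000, showing how either oscillator can be multiplied up to the same system clock.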
A peripheral called reset and clock control (RCC) allows the clock source to be selected and configured using a circuit known as a clock tree. The RCC peripheral also sources clocks for other microcontroller peripherals, and these also need to be configured. Following a hard reset, the RCC configuration is determined by the RCC register default values given in the RM0090 Reference Manual (www.st.com). Selecting Startup from the Device submenu of the RTE manager copies an assembly language file named
(the .s file extension is conventionally used to identify assembly language files) to our project. This file holds the exception table. The reset exception generated by a hard reset (that is, pressing the reset button on the evaluation board) causes the microcontroller's program counter to be loaded with the address of the reset handler (identified by the symbol Reset_Handler), and this in turn calls a function named SystemInit(), defined in the file system_stm32f4xx.c. This function configures the RCC to use the 16 MHz HSI clock before calling the function
helloBlinky, and measure the frequency of the 'blinks'. We should see about 4 blinks/second or 4 Hz. It may be easier to count the blinks in a 10-second period.
When we examine the program code shown earlier, we see that the program spends most of its time executing the two nested for loops. The statements inside these loops are executed thousands of times. Some readers may have spotted that there are no statements inside the loop body; even so, the loop counter must be updated and tested on each iteration. This requires an addition (ADD) instruction followed by a compare (CMP) instruction to be executed.
We need to do some elementary math to work out how long it will take to execute these two instructions. Checking Table 3.1 of the ARM Cortex-M4 Processor Technical Reference Manual, we see that each takes 1 cycle to execute. Since SystemInit() configures the RCC to use the HSI (16 MHz) clock, the time needed to switch the LED ON and OFF once will be 2 × 1,000,000 × 1/(16 × 10^6) s × 2 (instructions) = 0.25 s = 250 ms (that is, about 4 blinks per second).
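The same arithmetic can be written as a small C function, which makes it easy to see how the estimate changes with the clock frequency. This is only a first-order sketch: like the calculation above, it counts just the ADD and CMP cycles, and it assumes that the factor of 2 stands for two delays per blink (one with the LED ON, one with it OFF) of 1,000,000 iterations each. The name blink_period_ms and its parameters are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: estimated time in milliseconds for one
 * complete LED ON/OFF cycle.
 *   clock_hz             processor clock frequency
 *   iterations_per_delay loop iterations in each delay (assumed)
 *   cycles_per_iteration clock cycles per iteration (ADD + CMP = 2)
 */
static uint32_t blink_period_ms(uint32_t clock_hz,
                                uint32_t iterations_per_delay,
                                uint32_t cycles_per_iteration)
{
    assert(clock_hz != 0u);
    /* Two delays per blink: one while the LED is ON, one while OFF. */
    uint64_t cycles = 2ull * iterations_per_delay * cycles_per_iteration;
    /* Convert cycles to milliseconds; the result is truncated. */
    return (uint32_t)(cycles * 1000u / clock_hz);
}
```

Calling blink_period_ms(16000000u, 1000000u, 2u) reproduces the 250 ms figure; passing a higher clock frequency gives a proportionally shorter period.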
To understand how the processor achieves this level of performance, we need to look at the processor architecture. The processor implements the ARMv7-M architecture profile described at http://infocenter.arm.com. ARMv7-M is a 32-bit architecture, and the internal registers and data path are all 32 bits wide. ARMv7-M supports the Thumb Instruction Set Architecture (ISA) with Thumb-2 technology, which includes both 16-bit and 32-bit instructions.
ARM processors were originally inspired by the Reduced Instruction Set Computing (RISC) architectures developed in the 1980s. RISC attempted to improve on the performance of the Complex Instruction Set Computing (CISC) architectures of the era by defining an ISA with a small number of instructions, each of which could be executed in one processor clock cycle. In the three decades since RISC was proposed, the size and complexity of RISC ISAs have increased, but the goal is still to minimize the number of clock cycles needed to execute each instruction. With this in mind, ARM Cortex-M3 and M4 processors have a three-stage instruction pipeline and a Harvard bus architecture. Computers that use a Harvard architecture have separate memories and buses for instructions and data, rather than the shared memory system used by von Neumann architectures, and the higher memory bandwidth this affords can give better performance.
The Cortex-M4 processor also provides signal processing support, including Single Instruction Multiple Data (SIMD) instructions and a fast Multiply-Accumulate (MAC) unit. Together with an optional Floating Point Unit (FPU), these features allow the Cortex-M4 to achieve much higher performance in Digital Signal Processing (DSP) applications than the earlier Cortex-M3.
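To make the multiply-accumulate idea concrete, the sketch below shows the operation at the heart of many DSP routines: a dot product of two arrays of 16-bit samples. It is written in portable C, and the function name dot_product_s16 is hypothetical. Whether a particular compiler maps the loop onto the Cortex-M4's MAC or SIMD instructions depends on the toolchain and its optimization settings, so this should be read as an illustration of the pattern rather than of the generated code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: dot product of two arrays of 16-bit samples.
 * Each iteration performs one multiply followed by one accumulate,
 * which is the operation a MAC unit is designed to speed up.
 */
static int64_t dot_product_s16(const int16_t *x, const int16_t *h, size_t n)
{
    assert(x != NULL && h != NULL);
    int64_t acc = 0; /* wide accumulator to avoid overflow */
    for (size_t i = 0; i < n; ++i) {
        /* Widen to 32 bits before multiplying, then accumulate. */
        acc += (int32_t)x[i] * (int32_t)h[i];
    }
    return acc;
}
```

For example, the arrays {1, 2, 3, 4} and {5, 6, 7, 8} give 1×5 + 2×6 + 3×7 + 4×8 = 70.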
Besides manufacturers' data sheets, there are a few books that address the Cortex-M4. Joseph Yiu's books (http://store.elsevier.com/Newnes/IMP_73/) on the Cortex-M3 and M4 processors are aimed at programmers, embedded product designers, and System-on-Chip (SoC) engineers. Books for undergraduate courses include a series by Jonathan Valvano (http://users.ece.utexas.edu/~valvano) and a text written by Daniel Lewis (http://catalogue.pearsoned.co.uk). Trevor Martin has also written an excellent guide to STM32 microcontrollers; it is one of a number of insider guides that can be downloaded from http://www.hitex.com.