The instruction set of the M7 is the same as that of the M4 (see below), but a big
difference is a high-performance 6-stage pipeline with dual-issue (it
executes up to two instructions per clock cycle).
The M7 is a superscalar MCU: this means that it can execute two instructions every clock cycle.
In other words, the Cortex-M7 fetches instructions from flash 64 bits at a time.
But to reach this performance
it is very important that the compiler is clever, because
the M7 can execute at the same time only certain pairs of
instructions; see the red box below.
If the compiler places two load instructions in two consecutive 32-bit words of flash, the M7 executes the first load and, at the next clock cycle, the second load: you have lost the possibility to execute two instructions in that clock cycle.
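The effect of instruction ordering described above can be illustrated with a small host-side model. This is only a sketch: the instruction classes and the pairing rule (two adjacent instructions issue together unless both are loads) are assumptions made for illustration, taken from the statement above, and are not the real M7 issue rules; data dependencies are ignored.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical instruction classes, for this toy model only. */
typedef enum { OP_ALU, OP_LOAD, OP_STORE, OP_MAC } op_class;

/* Toy pairing rule (an assumption, not the real M7 rules):
   two adjacent instructions can issue together unless both are loads. */
int can_pair(op_class first, op_class second)
{
    return !(first == OP_LOAD && second == OP_LOAD);
}

/* Count issue cycles for an in-order instruction stream, pairing
   adjacent instructions whenever the toy rule allows it. */
unsigned count_cycles(const op_class *ops, size_t n)
{
    assert(ops != NULL || n == 0);
    unsigned cycles = 0;
    size_t i = 0;
    while (i < n) {
        if (i + 1 < n && can_pair(ops[i], ops[i + 1]))
            i += 2;   /* both instructions issue in this cycle */
        else
            i += 1;   /* only one instruction issues */
        cycles++;
    }
    return cycles;
}
```

In this model the stream LOAD, LOAD, ALU, ALU takes 3 cycles, while the same work ordered as LOAD, ALU, LOAD, ALU takes 2 cycles. This is the kind of reordering a clever compiler is expected to do.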
At the moment (October 2015),
to reach full performance from L1 cache or TCM memory, you must manually
write the necessary instructions.
See the MPU instructions in the Hands-On sections.
There are some examples concerning cache, ITCM, DTCM, etc. in the Hands-On of this training.
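As a rough idea of what placing code and data in TCM by hand can look like, here is a minimal sketch using the GCC section attribute. The section names .itcm_text and .dtcm_data and the macro names are hypothetical: the linker script of your project must map those sections to the ITCM and DTCM address ranges, and the startup code must copy their initial contents there. This is not the code used in the Hands-On.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical section names: the target linker script must map them
   to the ITCM and DTCM address ranges of the device. */
#define IN_ITCM __attribute__((section(".itcm_text")))
#define IN_DTCM __attribute__((section(".dtcm_data")))

/* Data buffer intended for DTCM (fast data access). */
IN_DTCM int32_t samples[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };

/* Time-critical routine intended for ITCM (fast instruction fetch). */
IN_ITCM int32_t sum_samples(const int32_t *buf, size_t n)
{
    assert(buf != NULL);
    int32_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += buf[i];
    return acc;
}
```

Calling sum_samples(samples, 8) returns 36. On a PC build the attributes only create extra sections in the executable; the speed benefit exists only on a target where those sections really are located in TCM.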
ARM Cortex-Mx acronyms
ALU = Arithmetic Logic Unit
SIMD = Single Instruction, Multiple Data
MPU = Memory Protection Unit
MAC = Multiply–Accumulate operation
LSU = Load Store Unit
DPU = Data Processing Unit
ITCM & DTCM = Instruction and Data Tightly Coupled Memory; the memory system includes support for the connection of local Tightly Coupled Memory called ITCM (16K) and DTCM (64K)
STB = Store Buffer
BIU = Bus Interface Unit
TCU = Tightly-Coupled interface Unit
EPPB = External Private Peripheral Bus (the APB external PPB)
In the x86 family, superscalar architecture first appeared with the Intel Pentium (P5) in 1993. Superscalar means that the processor is able to process multiple instructions in parallel (in our case, the Cortex-M7, up to 2; that is why it is called dual-issue).