@ -25,62 +25,189 @@ x Cortex-A57 clusters running at the following frequencies:
Juno supports CPU, cluster and system power down states, corresponding to power
levels 0, 1 and 2 respectively. It does not support any retention states.
We used the upstream `TF master as of 31/01/2017`_, building the platform using
the ``ENABLE_RUNTIME_INSTRUMENTATION`` option:

.. code:: shell

    make PLAT=juno ENABLE_RUNTIME_INSTRUMENTATION=1 \
        SCP_BL2=<path/to/scp-fw.bin> \
        BL33=<path/to/test-fw.bin> \
        all fip

When using the debug build of TF, there was no noticeable difference in the
results.
The tests are based on an ARM-internal test framework. The release build of this
framework was used because the results in the debug build became skewed; the
console output prevented some of the tests from executing in parallel.
The tests consist of both parallel and sequential tests, which are broadly
described as follows:
- **Parallel Tests** This type of test powers on all the non-lead CPUs and
brings them and the lead CPU to a common synchronization point. The lead CPU
then initiates the test on all CPUs in parallel.
Given that runtime instrumentation using PMF is invasive, there is a small
(unquantified) overhead on the results. PMF uses the generic counter for
timestamps, which runs at 50MHz on Juno.
- **Sequential Tests** This type of test powers on each non-lead CPU in
sequence. The lead CPU initiates the test on a non-lead CPU then waits for the
test to complete before proceeding to the next non-lead CPU. The lead CPU then
executes the test on itself.
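
The two orchestration patterns described above can be sketched on a host machine with threads standing in for CPUs. This is only an analogy under stated assumptions: ``run_parallel``, ``run_sequential`` and the ``test`` callback are hypothetical names, not part of the test framework, and a thread barrier stands in for the common synchronization point.

```python
import threading


def run_parallel(cpus, lead, test):
    """All CPUs meet at a barrier, then run the test at the same time."""
    barrier = threading.Barrier(len(cpus))  # assumes lead is in cpus
    results = {}

    def worker(cpu):
        barrier.wait()  # common synchronization point
        results[cpu] = test(cpu)

    # Starting a thread stands in for powering on a non-lead CPU.
    threads = [
        threading.Thread(target=worker, args=(cpu,))
        for cpu in cpus
        if cpu != lead
    ]
    for thread in threads:
        thread.start()
    worker(lead)  # the lead CPU reaches the barrier and runs the test too
    for thread in threads:
        thread.join()
    return results


def run_sequential(cpus, lead, test):
    """Run the test on one non-lead CPU at a time, then on the lead CPU."""
    order = []

    def worker(cpu):
        order.append((cpu, test(cpu)))

    for cpu in cpus:
        if cpu == lead:
            continue
        thread = threading.Thread(target=worker, args=(cpu,))
        thread.start()
        thread.join()  # wait for completion before the next CPU
    worker(lead)  # the lead CPU executes the test on itself last
    return order
```

In the sequential case the results are contended by at most one "CPU" at a time, which is the property that separates the two sets of measurements.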
The following source trees and binaries were used:
- TF-A [`v2.9-rc0`_]
- TFTF [`v2.9-rc0`_]
Please see the Runtime Instrumentation `Testing Methodology`_ page for more
details.
Procedure
---------
#. Build TFTF with runtime instrumentation enabled:

    .. code:: shell

        make CROSS_COMPILE=aarch64-none-elf- PLAT=juno \
            TESTS=runtime-instrumentation all

#. Fetch Juno's SCP binary from TF-A's archive:

    .. code:: shell

        curl --fail --connect-timeout 5 --retry 5 -sLS -o scp_bl2.bin \
            https://downloads.trustedfirmware.org/tf-a/css_scp_2.12.0/juno/release/juno-bl2.bin

#. Build TF-A with the following build options:

    .. code:: shell

        make CROSS_COMPILE=aarch64-none-elf- PLAT=juno \
            BL33="/path/to/tftf.bin" SCP_BL2="scp_bl2.bin" \
            ENABLE_RUNTIME_INSTRUMENTATION=1 fiptool all fip

#. Load the following images onto the development board: ``fip.bin``,
   ``scp_bl2.bin``.

Results
-------
``CPU_SUSPEND`` to deepest power level
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. table:: ``CPU_SUSPEND`` latencies (µs) to deepest power level in parallel
+---------+------+-----------+---------+-------------+
| Cluster | Core | Powerdown | Wakeup  | Cache Flush |
+=========+======+===========+=========+=============+
| 0 | 0 | 243.76 | 239.92 | 6.32 |
+---------+------+-----------+---------+-------------+
| 0 | 1 | 663.5 | 30.32 | 167.82 |
+---------+------+-----------+---------+-------------+
| 1 | 0 | 105.12 | 22.84 | 5.88 |
+---------+------+-----------+---------+-------------+
| 1 | 1 | 384.16 | 19.06 | 4.7 |
+---------+------+-----------+---------+-------------+
| 1 | 2 | 523.98 | 270.46 | 4.74 |
+---------+------+-----------+---------+-------------+
| 1 | 3 | 950.54 | 220.9 | 89.2 |
+---------+------+-----------+---------+-------------+
.. table:: ``CPU_SUSPEND`` latencies (µs) to deepest power level in serial
+---------+------+-----------+---------+-------------+
| Cluster | Core | Powerdown | Wakeup  | Cache Flush |
+=========+======+===========+=========+=============+
| 0 | 0 | 266.96 | 31.74 | 167.92 |
+---------+------+-----------+---------+-------------+
| 0 | 1 | 266.9 | 31.52 | 167.82 |
+---------+------+-----------+---------+-------------+
| 1 | 0 | 279.86 | 23.42 | 87.52 |
+---------+------+-----------+---------+-------------+
| 1 | 1 | 101.38 | 18.8 | 4.64 |
+---------+------+-----------+---------+-------------+
| 1 | 2 | 101.18 | 19.28 | 4.64 |
+---------+------+-----------+---------+-------------+
| 1 | 3 | 101.32 | 19.02 | 4.62 |
+---------+------+-----------+---------+-------------+
``CPU_SUSPEND`` to power level 0
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. table:: ``CPU_SUSPEND`` latencies (µs) to power level 0 in parallel
+---------+------+-----------+---------+-------------+
| Cluster | Core | Powerdown | Wakeup  | Cache Flush |
+=========+======+===========+=========+=============+
| 0 | 0 | 661.94 | 22.88 | 9.66 |
+---------+------+-----------+---------+-------------+
| 0 | 1 | 801.64 | 23.38 | 9.62 |
+---------+------+-----------+---------+-------------+
| 1 | 0 | 105.56 | 16.02 | 8.12 |
+---------+------+-----------+---------+-------------+
| 1 | 1 | 245.42 | 16.26 | 7.78 |
+---------+------+-----------+---------+-------------+
| 1 | 2 | 384.42 | 16.1 | 7.84 |
+---------+------+-----------+---------+-------------+
| 1 | 3 | 523.74 | 15.4 | 8.02 |
+---------+------+-----------+---------+-------------+
.. table:: ``CPU_SUSPEND`` latencies (µs) to power level 0 in serial
+---------+------+-----------+---------+-------------+
| Cluster | Core | Powerdown | Wakeup  | Cache Flush |
+=========+======+===========+=========+=============+
| 0 | 0 | 102.16 | 23.64 | 6.7 |
+---------+------+-----------+---------+-------------+
| 0 | 1 | 101.66 | 23.78 | 6.6 |
+---------+------+-----------+---------+-------------+
| 1 | 0 | 277.74 | 15.96 | 4.66 |
+---------+------+-----------+---------+-------------+
| 1 | 1 | 98.0 | 15.88 | 4.64 |
+---------+------+-----------+---------+-------------+
| 1 | 2 | 97.66 | 15.88 | 4.62 |
+---------+------+-----------+---------+-------------+
| 1 | 3 | 97.76 | 15.38 | 4.64 |
+---------+------+-----------+---------+-------------+
``CPU_OFF`` on all non-lead CPUs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
``CPU_OFF`` on all non-lead CPUs in sequence, then ``CPU_SUSPEND`` on the lead
core to the deepest power level.
.. table:: ``CPU_OFF`` latencies (µs) on all non-lead CPUs
+---------+------+-----------+---------+-------------+
| Cluster | Core | Powerdown | Wakeup  | Cache Flush |
+=========+======+===========+=========+=============+
| 0 | 0 | 265.38 | 34.12 | 167.36 |
+---------+------+-----------+---------+-------------+
| 0 | 1 | 265.72 | 33.98 | 167.48 |
+---------+------+-----------+---------+-------------+
| 1 | 0 | 185.3 | 23.18 | 87.42 |
+---------+------+-----------+---------+-------------+
| 1 | 1 | 101.58 | 23.46 | 4.48 |
+---------+------+-----------+---------+-------------+
| 1 | 2 | 101.66 | 22.02 | 4.72 |
+---------+------+-----------+---------+-------------+
| 1 | 3 | 101.48 | 22.22 | 4.52 |
+---------+------+-----------+---------+-------------+
``PSCI_VERSION`` in parallel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. table:: ``PSCI_VERSION`` latency (µs) in parallel on all cores
+-------------+--------+--------------+
| Cluster | Core | Latency |
+=============+========+==============+
| 0 | 0 | 1.22 |
+-------------+--------+--------------+
| 0 | 1 | 1.2 |
+-------------+--------+--------------+
| 1 | 0 | 0.6 |
+-------------+--------+--------------+
| 1 | 1 | 1.08 |
+-------------+--------+--------------+
| 1 | 2 | 1.04 |
+-------------+--------+--------------+
| 1 | 3 | 1.04 |
+-------------+--------+--------------+
Annotated Historic Results
--------------------------
The following results are based on the upstream `TF master as of 31/01/2017`_.
TF-A was built using the same build instructions as detailed in the procedure
above.
In the results below, CPUs 0-3 refer to CPUs in the little cluster (A53) and
CPUs 4-5 refer to CPUs in the big cluster (A57). In all cases CPU 4 is the lead
CPU.
``PSCI_ENTRY`` refers to the time taken from entering the TF PSCI implementation
to the point the hardware enters the low power state (WFI). Referring to the TF
runtime instrumentation points, this corresponds to:
``(RT_INSTR_ENTER_HW_LOW_PWR - RT_INSTR_ENTER_PSCI)``.
``PSCI_EXIT`` refers to the time taken from the point the hardware exits the low
power state to exiting the TF PSCI implementation. This corresponds to:
``(RT_INSTR_EXIT_PSCI - RT_INSTR_EXIT_HW_LOW_PWR)``.
``CFLUSH_OVERHEAD`` refers to the part of ``PSCI_ENTRY`` taken to flush the
caches. This corresponds to: ``(RT_INSTR_EXIT_CFLUSH - RT_INSTR_ENTER_CFLUSH)``.
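
The three definitions above amount to simple timestamp subtractions. A minimal sketch, assuming the six instrumentation timestamps for one CPU have already been collected as integer counter ticks; the ``RtInstrTimestamps`` container and ``psci_latencies`` helper are hypothetical names for illustration, not TF interfaces.

```python
from dataclasses import dataclass


@dataclass
class RtInstrTimestamps:
    """Counter ticks captured at each runtime instrumentation point."""
    enter_psci: int        # RT_INSTR_ENTER_PSCI
    enter_cflush: int      # RT_INSTR_ENTER_CFLUSH
    exit_cflush: int       # RT_INSTR_EXIT_CFLUSH
    enter_hw_low_pwr: int  # RT_INSTR_ENTER_HW_LOW_PWR
    exit_hw_low_pwr: int   # RT_INSTR_EXIT_HW_LOW_PWR
    exit_psci: int         # RT_INSTR_EXIT_PSCI


def psci_latencies(ts):
    """Return the three reported metrics, in counter ticks."""
    return {
        "PSCI_ENTRY": ts.enter_hw_low_pwr - ts.enter_psci,
        "PSCI_EXIT": ts.exit_psci - ts.exit_hw_low_pwr,
        "CFLUSH_OVERHEAD": ts.exit_cflush - ts.enter_cflush,
    }


# Made-up tick values, chosen only to exercise the arithmetic.
sample = RtInstrTimestamps(
    enter_psci=1000,
    enter_cflush=1200,
    exit_cflush=1500,
    enter_hw_low_pwr=1800,
    exit_hw_low_pwr=9000,
    exit_psci=9400,
)
print(psci_latencies(sample))
```

Because the cache flush happens between entering PSCI and entering the low power state, ``CFLUSH_OVERHEAD`` is contained within ``PSCI_ENTRY`` rather than added to it.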
Note there is very little variance observed in the values given (~1 µs), although
the values for each CPU are sometimes interchanged, depending on the order in
which locks are acquired. Also, there is very little variance observed between
executing the tests sequentially in a single boot or rebooting between tests.
Given that runtime instrumentation using PMF is invasive, there is a small
(unquantified) overhead on the results. PMF uses the generic counter for
timestamps, which runs at 50MHz on Juno.
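
A 50 MHz counter advances once every 20 ns, so timestamp differences resolve to steps of 0.02 µs. The conversion from ticks to microseconds can be sketched as follows; ``ticks_to_us`` and ``COUNTER_FREQ_HZ`` are hypothetical names used for illustration only.

```python
COUNTER_FREQ_HZ = 50_000_000  # generic counter frequency on Juno, per the text


def ticks_to_us(ticks, freq_hz=COUNTER_FREQ_HZ):
    """Convert a tick count from the generic counter into microseconds."""
    return ticks * 1_000_000 / freq_hz


# One tick is the smallest measurable interval.
print(ticks_to_us(1))
# A latency of 243.76 µs would correspond to a delta of 12188 ticks.
print(ticks_to_us(12188))
```

This granularity is worth keeping in mind when comparing latencies that differ by only a few hundredths of a microsecond.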
Results and Commentary
----------------------
``PSCI_ENTRY`` corresponds to the powerdown latency, ``PSCI_EXIT`` to the wakeup
latency, and ``CFLUSH_OVERHEAD`` to the latency of the cache flush operation.
``CPU_SUSPEND`` to deepest power level on all CPUs in parallel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@ -290,3 +417,5 @@ effects, given that these measurements are at the nano-second level.
.. _Juno R1 platform: https://developer.arm.com/documentation/100122/latest/
.. _TF master as of 31/01/2017: https://git.trustedfirmware.org/TF-A/trusted-firmware-a.git/tree/?id=c38b36d
.. _v2.9-rc0: https://git.trustedfirmware.org/TF-A/trusted-firmware-a.git/tree/?h=v2.9-rc0
.. _Testing Methodology: ../perf/psci-performance-methodology.html