Manish Pandey
2 years ago
committed by
TrustedFirmware Code Review
3 changed files with 175 additions and 1 deletions
@ -0,0 +1,117 @@ |
PSCI Performance Measurement
============================

TF-A provides two instrumentation tools for analysing the PSCI
implementation:

* PSCI Statistics (PSCI STAT)
* Runtime Instrumentation

This page explains how to enable each tool and how to use it.

Performance Measurement Framework
---------------------------------

The Performance Measurement Framework (`PMF`_) provides mechanisms for
collecting and retrieving timestamps at runtime from the Performance Monitoring
Unit (`PMU`_). The PMU is a generalized abstraction for accessing the CPU
hardware registers used to measure hardware events. This means, for instance,
that the PMU might be used to place instrumentation points at logical locations
in code for tracing purposes.

TF-A utilises the PMF as a backend for the two instrumentation services it
provides: PSCI Statistics and Runtime Instrumentation. These services use the
PMF to collect and retrieve timestamps. For instance, the PSCI Statistics
service registers the PMF service ``psci_svc`` to track its residency
statistics.

The PMF reserves a unique ID, a name, and space in memory for ``psci_svc``. The
framework provides a convenient interface for PSCI Statistics to retrieve
values from ``psci_svc`` at runtime. Alternatively, the service may be
configured such that the PMF dumps those values to the console. A platform may
choose to expose SMCs that allow retrieval of these timestamps from the
service.

The PMF is enabled with the Boolean build flag ``ENABLE_PMF``.

PSCI Statistics
---------------

PSCI Statistics is a runtime service that provides residency statistics for
the power states used by the platform. The service tracks residency time and
entry count. Residency time is the total time a processing element (PE) has
spent in a particular power state. Entry count is the number of times the PE
has entered that power state. PSCI Statistics implements the optional functions
``PSCI_STAT_RESIDENCY`` and ``PSCI_STAT_COUNT`` from the `PSCI`_
specification.


.. c:macro:: PSCI_STAT_RESIDENCY

    :param target_cpu: Contains a copy of the affinity fields in the MPIDR
      register for identifying the target core (see section 5.1.4 of the
      `PSCI`_ specification for more details).
    :param power_state: Identifier for a specific local state. Generally, this
      parameter takes the same form as the ``power_state`` parameter described
      for ``CPU_SUSPEND`` in section 5.4.2.

    :returns: Time spent in ``power_state``, in microseconds, by ``target_cpu``
      and the highest level expressed in ``power_state``.


.. c:macro:: PSCI_STAT_COUNT

    :param target_cpu: Follows the same format as ``PSCI_STAT_RESIDENCY``.
    :param power_state: Follows the same format as ``PSCI_STAT_RESIDENCY``.

    :returns: Number of times the state expressed in ``power_state`` has been
      used by ``target_cpu`` and the highest level expressed in
      ``power_state``.

The implementation provides residency statistics only for low power states,
and does this regardless of the entry mechanism into those states. The
statistics it collects are set to 0 during shutdown or reset.

PSCI Statistics is enabled with the Boolean build flag
``ENABLE_PSCI_STAT``. All Arm platforms use the PMF as the collection backend
unless another one is provided, so ``ENABLE_PMF`` is implicitly enabled.

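As a rough illustration of how the two statistics relate, the values returned
by ``PSCI_STAT_RESIDENCY`` and ``PSCI_STAT_COUNT`` for the same ``target_cpu``
and ``power_state`` can be combined into an average residency per entry. The
helper name and the sample numbers below are hypothetical and are not part of
TF-A; this is a minimal host-side sketch only.

.. code-block:: python

    def mean_residency_us(residency_us: int, entry_count: int) -> float:
        """Average time per entry into a power state, in microseconds.

        residency_us: value as returned by PSCI_STAT_RESIDENCY (microseconds).
        entry_count:  value as returned by PSCI_STAT_COUNT.
        """
        if entry_count == 0:
            # The state was never entered (or the statistics were reset).
            return 0.0
        return residency_us / entry_count


    # Hypothetical sample: 1500 us spent in the state over 3 entries.
    print(mean_residency_us(1500, 3))

Guarding against a zero count matters because the statistics are set to 0
during shutdown or reset.
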
Runtime Instrumentation
-----------------------

The Runtime Instrumentation Service is an instrumentation tool that wraps
around the PMF to provide timestamp data. Although the service is not
restricted to PSCI, TF-A uses it primarily to quantify the total time spent in
the PSCI implementation. The tool can also be used to instrument other
components in TF-A. It is enabled with the Boolean build flag
``ENABLE_RUNTIME_INSTRUMENTATION`` and, as with PSCI Statistics, requires the
PMF to be enabled.

In PSCI, this service provides instrumentation points in the following code
paths:

* Entry into the PSCI SMC handler
* Exit from the PSCI SMC handler
* Entry to low power state
* Exit from low power state
* Entry into cache maintenance operations in PSCI
* Exit from cache maintenance operations in PSCI

The service captures the cycle count at each point, which allows the time spent
in the implementation to be calculated, given the counter frequency.

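The conversion from captured counts to elapsed time can be sketched as
follows. The function name is hypothetical, the 50 MHz frequency is only an
illustrative value, and the sketch assumes the counter does not wrap between
the two captures.

.. code-block:: python

    def ticks_to_us(start_ticks: int, end_ticks: int, counter_hz: int) -> float:
        """Convert the difference of two counter captures to microseconds."""
        if counter_hz <= 0:
            raise ValueError("counter frequency must be positive")
        # Assumes end_ticks >= start_ticks, i.e. no counter wrap-around.
        return (end_ticks - start_ticks) * 1_000_000 / counter_hz


    # Illustrative values: 2500 ticks elapsed on a 50 MHz counter.
    print(ticks_to_us(1000, 3500, 50_000_000))

At 50 MHz each tick is 20 ns, so this sets the resolution of any interval
derived from such timestamps.
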
PSCI SMC Handler Instrumentation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The entry timestamp is captured as early as possible during the runtime
exception, prior to entry into the handler itself. All timestamps are stored in
memory for later retrieval. The exit timestamp is captured after normal return
from the PSCI SMC handler or, if a low power state was requested, in the warm
boot path.

*Copyright (c) 2023, Arm Limited. All rights reserved.*

.. _PMF: ../design/firmware-design.html#performance-measurement-framework
.. _PMU: performance-monitoring-unit.html
.. _PSCI: https://developer.arm.com/documentation/den0022/latest/
@ -0,0 +1,55 @@ |
Runtime Instrumentation Methodology
===================================

This document outlines steps for undertaking performance measurements of key
operations in the Trusted Firmware-A Power State Coordination Interface (PSCI)
implementation, using the in-built Performance Measurement Framework (PMF) and
runtime instrumentation timestamps.

Framework
~~~~~~~~~

The tests are based on the ``runtime-instrumentation`` test suite provided by
the Trusted Firmware Test Framework (TFTF). The release build of this framework
was used because the results from the debug build were skewed: the console
output prevented some of the tests from executing in parallel.

The tests consist of both parallel and sequential tests, which are broadly
described as follows:

- **Parallel Tests** This type of test powers on all the non-lead CPUs and
  brings them and the lead CPU to a common synchronization point. The lead CPU
  then initiates the test on all CPUs in parallel.

- **Sequential Tests** This type of test powers on each non-lead CPU in
  sequence. The lead CPU initiates the test on a non-lead CPU then waits for the
  test to complete before proceeding to the next non-lead CPU. The lead CPU then
  executes the test on itself.

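The difference between the two test shapes can be sketched on a host machine
with threads standing in for CPUs. This is an analogy only, not TFTF code: the
function names are hypothetical, CPU 0 is assumed to play the role of the lead
CPU, and a ``threading.Barrier`` stands in for the common synchronization
point.

.. code-block:: python

    import threading


    def run_parallel(num_cpus, test):
        """Release all 'CPUs' from a common barrier, then run the test on each."""
        barrier = threading.Barrier(num_cpus)
        results = [None] * num_cpus

        def worker(cpu):
            barrier.wait()  # common synchronization point
            results[cpu] = test(cpu)

        threads = [threading.Thread(target=worker, args=(cpu,))
                   for cpu in range(num_cpus)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        return results


    def run_sequential(num_cpus, test):
        """Run the test on each non-lead 'CPU' in turn, then on the lead (0)."""
        order = list(range(1, num_cpus)) + [0]
        return [(cpu, test(cpu)) for cpu in order]


    print(run_parallel(4, lambda cpu: cpu * 2))
    print(run_sequential(4, lambda cpu: cpu * 2))

In the parallel shape the CPUs contend with each other while the test runs,
whereas in the sequential shape each CPU is measured in isolation.
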
Note that very little variance (~1 µs) is observed in the values given, although
the values for each CPU are sometimes interchanged, depending on the order in
which locks are acquired. Also, very little variance is observed between
executing the tests sequentially in a single boot and rebooting between tests.

Given that runtime instrumentation using PMF is invasive, there is a small
(unquantified) overhead on the results. PMF uses the generic counter for
timestamps, which runs at 50 MHz on Juno.

Metrics
~~~~~~~

.. glossary::

   Powerdown Latency
        Time taken from entering the TF-A PSCI implementation to the point the
        hardware enters the low power state (WFI). Referring to the TF-A
        runtime instrumentation points, this corresponds to:
        ``(RT_INSTR_ENTER_HW_LOW_PWR - RT_INSTR_ENTER_PSCI)``.

   Wakeup Latency
        Time taken from the point the hardware exits the low power state to
        exiting the TF-A PSCI implementation. This corresponds to:
        ``(RT_INSTR_EXIT_PSCI - RT_INSTR_EXIT_HW_LOW_PWR)``.

   Cache Flush Latency
        Time taken to flush the caches during powerdown. This corresponds to:
        ``(RT_INSTR_EXIT_CFLUSH - RT_INSTR_ENTER_CFLUSH)``.
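
The three metrics above can be computed from one set of captured timestamps as
in the following sketch. The function name, the dictionary layout and the
sample values are assumptions made for illustration; only the instrumentation
point names come from the definitions above. The results are in the same units
as the input timestamps.

.. code-block:: python

    def psci_latencies(ts):
        """Derive the three latency metrics from a dict of timestamps."""
        return {
            "powerdown": ts["RT_INSTR_ENTER_HW_LOW_PWR"] - ts["RT_INSTR_ENTER_PSCI"],
            "wakeup": ts["RT_INSTR_EXIT_PSCI"] - ts["RT_INSTR_EXIT_HW_LOW_PWR"],
            "cache_flush": ts["RT_INSTR_EXIT_CFLUSH"] - ts["RT_INSTR_ENTER_CFLUSH"],
        }


    # Hypothetical timestamps for a single suspend/resume cycle on one CPU.
    sample = {
        "RT_INSTR_ENTER_PSCI": 100,
        "RT_INSTR_ENTER_CFLUSH": 150,
        "RT_INSTR_EXIT_CFLUSH": 400,
        "RT_INSTR_ENTER_HW_LOW_PWR": 600,
        "RT_INSTR_EXIT_HW_LOW_PWR": 10_000,
        "RT_INSTR_EXIT_PSCI": 10_250,
    }
    print(psci_latencies(sample))

In this sample the cache flush interval falls inside the powerdown interval, so
the two values should not be summed.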