This patch adds support for ARM Cortex-A35 processor in the CPU
specific framework, as described in the Cortex-A35 TRM (r0p0).
Change-Id: Ief930a0bdf6cd82f6cb1c3b106f591a71c883464
On the ARMv8 architecture, cache maintenance operations by set/way on the last
level of integrated cache do not affect the system cache. This means that such a
flush or clean operation could result in the data being pushed out to the system
cache rather than main memory. Another CPU could access this data before it
enables its data cache or MMU. Such accesses could be serviced from the main
memory instead of the system cache. If the data in the sysem cache has not yet
been flushed or evicted to main memory then there could be a loss of
coherency. The only mechanism to guarantee that the main memory will be updated
is to use cache maintenance operations to the PoC by MVA(See section D3.4.11
(System level caches) of ARMv8-A Reference Manual (Issue A.g/ARM DDI0487A.G).
This patch removes the reliance of Trusted Firmware on the flush by set/way
operation to ensure visibility of data in the main memory. Cache maintenance
operations by MVA are now used instead. The following are the broad category of
changes:
1. The RW areas of BL2/BL31/BL32 are invalidated by MVA before the C runtime is
initialised. This ensures that any stale cache lines at any level of cache
are removed.
2. Updates to global data in runtime firmware (BL31) by the primary CPU are made
visible to secondary CPUs using a cache clean operation by MVA.
3. Cache maintenance by set/way operations are only used prior to power down.
NOTE: NON-UPSTREAM TRUSTED FIRMWARE CODE SHOULD MAKE EQUIVALENT CHANGES IN
ORDER TO FUNCTION CORRECTLY ON PLATFORMS WITH SUPPORT FOR SYSTEM CACHES.
FixesARM-software/tf-issues#205
Change-Id: I64f1b398de0432813a0e0881d70f8337681f6e9a
This patch unifies the bakery lock api's across coherent and normal
memory implementation of locks by using same data type `bakery_lock_t`
and similar arguments to functions.
A separate section `bakery_lock` has been created and used to allocate
memory for bakery locks using `DEFINE_BAKERY_LOCK`. When locks are
allocated in normal memory, each lock for a core has to spread
across multiple cache lines. By using the total size allocated in a
separate cache line for a single core at compile time, the memory for
other core locks is allocated at link time by multiplying the single
core locks size with (PLATFORM_CORE_COUNT - 1). The normal memory lock
algorithm now uses lock address instead of the `id` in the per_cpu_data.
For locks allocated in coherent memory, it moves locks from
tzfw_coherent_memory to bakery_lock section.
The bakery locks are allocated as part of bss or in coherent memory
depending on usage of coherent memory. Both these regions are
initialised to zero as part of run_time_init before locks are used.
Hence, bakery_lock_init() is made an empty function as the lock memory
is already initialised to zero.
The above design lead to the removal of psci bakery locks from
non_cpu_power_pd_node to psci_locks.
NOTE: THE BAKERY LOCK API WHEN USE_COHERENT_MEM IS NOT SET HAS CHANGED.
THIS IS A BREAKING CHANGE FOR ALL PLATFORM PORTS THAT ALLOCATE BAKERY
LOCKS IN NORMAL MEMORY.
Change-Id: Ic3751c0066b8032dcbf9d88f1d4dc73d15f61d8b
This patch migrates the rest of Trusted Firmware excluding Secure Payload and
the dispatchers to the new platform and context management API. The per-cpu
data framework APIs which took MPIDRs as their arguments are deleted and only
the ones which take core index as parameter are retained.
Change-Id: I839d05ad995df34d2163a1cfed6baa768a5a595d
Denver is NVIDIA's own custom-designed, 64-bit, dual-core CPU which is
fully ARMv8 architecture compatible. Each of the two Denver cores
implements a 7-way superscalar microarchitecture (up to 7 concurrent
micro-ops can be executed per clock), and includes a 128KB 4-way L1
instruction cache, a 64KB 4-way L1 data cache, and a 2MB 16-way L2
cache, which services both cores.
Denver implements an innovative process called Dynamic Code Optimization,
which optimizes frequently used software routines at runtime into dense,
highly tuned microcode-equivalent routines. These are stored in a
dedicated, 128MB main-memory-based optimization cache. After being read
into the instruction cache, the optimized micro-ops are executed,
re-fetched and executed from the instruction cache as long as needed and
capacity allows.
Effectively, this reduces the need to re-optimize the software routines.
Instead of using hardware to extract the instruction-level parallelism
(ILP) inherent in the code, Denver extracts the ILP once via software
techniques, and then executes those routines repeatedly, thus amortizing
the cost of ILP extraction over the many execution instances.
Denver also features new low latency power-state transitions, in addition
to extensive power-gating and dynamic voltage and clock scaling based on
workloads.
Signed-off-by: Varun Wadekar <vwadekar@nvidia.com>
The return value from the SYS_WRITE semihosting operation is 0 if
the call is successful or the number of bytes not written, if there
is an error. The implementation of the write function in the
semihosting driver treats the return value as the number of bytes
written, which is wrong. This patch fixes it.
Change-Id: Id39dac3d17b5eac557408b8995abe90924c85b85
This patch fixes an issue in the cpu specific register reporting
of FVP AEM model whereby crash reporting itself triggers an exception
thus resulting in recursive crash prints. The input to the
'size_controlled_print' in the crash reporting framework should
be a NULL terminated string. As there were no cpu specific register
to be reported on FVP AEM model, the issue was caused by passing 0
instead of NULL terminated string to the above mentioned function.
Change-Id: I664427b22b89977b389175dfde84c815f02c705a
In order for the symbol table in the ELF file to contain the size of
functions written in assembly, it is necessary to report it to the
assembler using the .size directive.
To fulfil the above requirements, this patch introduces an 'endfunc'
macro which contains the .endfunc and .size directives. It also adds
a .func directive to the 'func' assembler macro.
The .func/.endfunc have been used so the assembler can fail if
endfunc is omitted.
FixesARM-Software/tf-issues#295
Change-Id: If8cb331b03d7f38fe7e3694d4de26f1075b278fc
Signed-off-by: Kévin Petit <kevin.petit@arm.com>
This patch removes the `owner` field in bakery_lock_t structure which
is the data structure used in the bakery lock implementation that uses
coherent memory. The assertions to protect against recursive lock
acquisition were based on the 'owner' field. They are now done based
on the bakery lock ticket number. These assertions are also added
to the bakery lock implementation that uses normal memory as well.
Change-Id: If4850a00dffd3977e218c0f0a8d145808f36b470
This patch optimizes the data structure used with the bakery lock
implementation for coherent memory to save memory and minimize memory
accesses. These optimizations were already part of the bakery lock
implementation for normal memory and this patch now implements
it for the coherent memory implementation as well. Also
included in the patch is a cleanup to use the do-while loop while
waiting for other contenders to finish choosing their tickets.
Change-Id: Iedb305473133dc8f12126726d8329b67888b70f1
This patch defines the ARRAY_SIZE macro for calculating number of elements
in an array and uses it where appropriate.
Change-Id: I72746a9229f0b259323972b498b9a3999731bc9b
The cpu-ops pointer was initialized before enabling the data cache in the cold
and warm boot paths. This required a DCIVAC cache maintenance operation to
invalidate any stale cache lines resident in other cpus.
This patch moves this initialization to the bl31_arch_setup() function
which is always called after the data cache and MMU has been enabled.
This change removes the need:
1. for the DCIVAC cache maintenance operation.
2. to initialise the CPU ops upon resumption from a PSCI CPU_SUSPEND
call since memory contents are always preserved in this case.
Change-Id: Ibb2fa2f7460d1a1f1e721242025e382734c204c6
The CPU specific reset handlers no longer have the freedom
of using any general purpose register because it is being invoked
by the BL3-1 entry point in addition to BL1. The Cortex-A57 CPU
specific reset handler was overwriting x20 register which was being
used by the BL3-1 entry point to save the entry point information.
This patch fixes this bug by reworking the register allocation in the
Cortex-A57 reset handler to avoid using x20. The patch also
explicitly mentions the register clobber list for each of the
callee functions invoked by the reset handler
Change-Id: I28fcff8e742aeed883eaec8f6c4ee2bd3fce30df
This patch adds the missing features to the C library included
in the Trusted Firmware to build PolarSSL:
- strcasecmp() function
- exit() function
- sscanf()* function
- time.h header file (and its dependencies)
* NOTE: the sscanf() function is not a real implementation. It just
returns the number of expected arguments by counting the number of
'%' characters present in the formar string. This return value is
good enough for PolarSSL because during the certificate parsing
only the return value is checked. The certificate validity period
is ignored.
Change-Id: I43bb3742f26f0bd458272fccc3d72a7f2176ab3d
This patch adds support to call the reset_handler() function in BL3-1 in the
cold and warm boot paths when another Boot ROM reset_handler() has already run.
This means the BL1 and BL3-1 versions of the CPU and platform specific reset
handlers may execute different code to each other. This enables a developer to
perform additional actions or undo actions already performed during the first
call of the reset handlers e.g. apply additional errata workarounds.
Typically, the reset handler will be first called from the BL1 Boot ROM. Any
additional functionality can be added to the reset handler when it is called
from BL3-1 resident in RW memory. The constant FIRST_RESET_HANDLER_CALL is used
to identify whether this is the first version of the reset handler code to be
executed or an overridden version of the code.
The Cortex-A57 errata workarounds are applied only if they have not already been
applied.
FixesARM-software/tf-issue#275
Change-Id: Id295f106e4fda23d6736debdade2ac7f2a9a9053
This patch moves the bakery locks out of coherent memory to normal memory.
This implies that the lock information needs to be placed on a separate cache
line for each cpu. Hence the bakery_lock_info_t structure is allocated in the
per-cpu data so as to minimize memory wastage. A similar platform per-cpu
data is introduced for the platform locks.
As a result of the above changes, the bakery lock api is completely changed.
Earlier, a reference to the lock structure was passed to the lock implementation.
Now a unique-id (essentially an index into the per-cpu data array) and an offset
into the per-cpu data for bakery_info_t needs to be passed to the lock
implementation.
Change-Id: I1e76216277448713c6c98b4c2de4fb54198b39e0
This patch is an optimization in the bakery_lock_get() function
which removes the wfe() when waiting for other contenders to choose
their ticket i.e when their `entering` flag is set. Since the time
taken to execute bakery_get_ticket() by other contenders is bounded,
this wait is a bounded time wait. Hence the removal of wfe() and the
corresponding sev() and dsb() in bakery_get_ticket() may result
in better time performance during lock acquisition.
Change-Id: I141bb21294226b54cb6e89e7cac0175c553afd8d
This patch fixes a crash due to corruption of cpu_ops
data structure. During the secondary CPU boot, after the
cpu_ops has been initialized in the per cpu-data, the
dcache lines need to invalidated so that the update in
memory can be seen later on when the dcaches are turned ON.
Also, after initializing the psci per cpu data, the dcache
lines are flushed so that they are written back to memory
and dirty dcache lines are avoided.
FixesARM-Software/tf-issues#271
Change-Id: Ia90f55e9882690ead61226eea5a5a9146d35f313
This patch fixes a bug in the bakery lock implementation where a data
synchronisation barrier instruction is not issued before sending an event as
mandated by the ARMv8 ARM. This can cause a event to be signalled before the
related memory accesses have completed resulting in erroneous execution.
FixesARM-software/tf-issues#272
Change-Id: I5ce02bf70afb001d967b9fa4c3f77442931d5349
This patch optimizes the Cortex-A57 cluster power down sequence by not
flushing the Level1 data cache. The L1 data cache and the L2 unified
cache are inclusive. A flush of the L2 by set/way flushes any dirty
lines from the L1 as well. This is a known safe deviation from the
Cortex-A57 TRM defined power down sequence. This optimization can be
enabled by the platform through the 'SKIP_A57_L1_FLUSH_PWR_DWN' build
flag. Each Cortex-A57 based platform must make its own decision on
whether to use the optimization.
This patch also renames the cpu-errata-workarounds.md to
cpu-specific-build-macros.md as this facilitates documentation
of both CPU Specific errata and CPU Specific Optimization
build macros.
Change-Id: I299b9fe79e9a7e08e8a0dffb7d345f9a00a71480
This the patch replaces the DSB SY with DSB ISH
after disabling L2 prefetches during the Cortex-A57
power down sequence.
Change-Id: I048d12d830c1b974b161224eff079fb9f8ecf52d
Prior to this patch, the errata workarounds were applied for any version
of the CPU in the release build and in the debug build an assert
failure resulted when the revision did not match. This patch applies
errata workarounds in the Cortex-A57 reset handler only if the 'variant'
and 'revision' fields read from the MIDR_EL1 match. In the debug build,
a warning message is printed for each errata workaround which is not
applied.
The patch modifies the register usage in 'reset_handler` so
as to adhere to ARM procedure calling standards.
FixesARM-software/tf-issues#242
Change-Id: I51b1f876474599db885afa03346e38a476f84c29
This patch adds level specific cache maintenance functions
to cache_helpers.S. The new functions 'dcsw_op_levelx',
where '1 <= x <= 3', allow to perform cache maintenance by
set/way for that particular level of cache. With this patch,
functions to support cache maintenance upto level 3 have
been implemented since it is the highest cache level for
most ARM SoCs.
These functions are now utilized in CPU specific power down
sequences to implement them as mandated by processor specific
technical reference manual.
Change-Id: Icd90ce6b51cff5a12863bcda01b93601417fd45c
This patch adds workarounds for selected errata which affect the Cortex-A57 r0p0
part. Each workaround has a build time flag which should be used by the platform
port to enable or disable the corresponding workaround. The workarounds are
disabled by default. An assertion is raised if the platform enables a workaround
which does not match the CPU revision at runtime.
Change-Id: I9ae96b01c6ff733d04dc733bd4e67dbf77b29fb0
This patch adds handlers for dumping Cortex-A57 and Cortex-A53 specific register
state to the CPU specific operations framework. The contents of CPUECTLR_EL1 are
dumped currently.
Change-Id: I63d3dbfc4ac52fef5e25a8cf6b937c6f0975c8ab
This patch adds CPU core and cluster power down sequences to the CPU specific
operations framework introduced in a earlier patch. Cortex-A53, Cortex-A57 and
generic AEM sequences have been added. The latter is suitable for the
Foundation and Base AEM FVPs. A pointer to each CPU's operations structure is
saved in the per-cpu data so that it can be easily accessed during power down
seqeunces.
An optional platform API has been introduced to allow a platform to disable the
Accelerator Coherency Port (ACP) during a cluster power down sequence. The weak
definition of this function (plat_disable_acp()) does not take any action. It
should be overriden with a strong definition if the ACP is present on a
platform.
Change-Id: I8d09bd40d2f528a28d2d3f19b77101178778685d
This patch adds an optional platform API (plat_reset_handler) which allows the
platform to perform any actions immediately after a cold or warm reset
e.g. implement errata workarounds. The function is called with MMU and caches
turned off. This API is weakly defined and does nothing by default but can be
overriden by a platform with a strong definition.
Change-Id: Ib0acdccbd24bc756528a8bd647df21e8d59707ff
This patch introduces a framework which will allow CPUs to perform
implementation defined actions after a CPU reset, during a CPU or cluster power
down, and when a crash occurs. CPU specific reset handlers have been implemented
in this patch. Other handlers will be implemented in subsequent patches.
Also moved cpu_helpers.S to the new directory lib/cpus/aarch64/.
Change-Id: I1ca1bade4d101d11a898fb30fea2669f9b37b956
Move the remaining IO storage source file (io_storage.c) from the
lib to the drivers directory. This requires that platform ports
explicitly add this file to the list of source files.
Also move the IO header files to a new sub-directory, include/io.
Change-Id: I862b1252a796b3bcac0d93e50b11e7fb2ded93d6
The intent of io_init() was to allow platform ports to provide
a data object (io_plat_data_t) to the IO storage framework to
allocate into. The abstraction was incomplete because io_plat_data_t
uses a platform defined constant and the IO storage framework
internally allocates other arrays using platform defined constants.
This change simplifies the implementation by instantiating the
supporting objects in the IO storage framework itself. There is now
no need for the platform to call io_init().
The FVP port has been updated accordingly.
THIS CHANGE REQUIRES ALL PLATFORM PORTS THAT USE THE IO STORAGE
FRAMEWORK TO BE UDPATED.
Change-Id: Ib48ac334de9e538064734334c773f8b43df3a7dc
Fix the following issues with the console log output:
* Make sure the welcome string is the first thing in the log output
(during normal boot).
* Prefix each message with the BL image name so it's clear which
BL the output is coming from.
* Ensure all output is wrapped in one of the log output macros so it can
be easily compiled out if necessary. Change some of the INFO() messages
to VERBOSE(), especially in the TSP.
* Create some extra NOTICE() and INFO() messages during cold boot.
* Remove all usage of \r in log output.
FixesARM-software/tf-issues#231
Change-Id: Ib24f7acb36ce64bbba549f204b9cde2dbb46c8a3
The patch implements a macro ASM_ASSERT() which can
be invoked from assembly code. When assertion happens,
file name and line number of the check is written
to the crash console.
FixesARM-software/tf-issues#95
Change-Id: I6f905a068e1c0fa4f746d723f18df60daaa00a86
This patch reworks the manner in which the M,A, C, SA, I, WXN & EE bits of
SCTLR_EL3 & SCTLR_EL1 are managed. The EE bit is cleared immediately after reset
in EL3. The I, A and SA bits are set next in EL3 and immediately upon entry in
S-EL1. These bits are no longer managed in the blX_arch_setup() functions. They
do not have to be saved and restored either. The M, WXN and optionally the C
bit are set in the enable_mmu_elX() function. This is done during both the warm
and cold boot paths.
FixesARM-software/tf-issues#226
Change-Id: Ie894d1a07b8697c116960d858cd138c50bc7a069
This patch implements a "tf_printf" which supports only the commonly
used format specifiers in Trusted Firmware, which uses a lot less
stack space than the stdlib printf function.
FixesARM-software/tf-issues#116
Change-Id: I7dfa1944f4c1e634b3e2d571f49afe02d109a351
This patch adds a 'flags' parameter to each exception level specific function
responsible for enabling the MMU. At present only a single flag which indicates
whether the data cache should also be enabled is implemented. Subsequent patches
will use this flag when enabling the MMU in the warm boot paths.
Change-Id: I0eafae1e678c9ecc604e680851093f1680e9cefa
Currently the TCR bits are hardcoded in xlat_tables.c. In order to
map higher physical address into low virtual address, the TCR bits
need to be configured accordingly.
This patch is to save the max VA and PA and calculate the TCR.PS/IPS
and t0sz bits in init_xlat_tables function.
Change-Id: Ia7a58e5372b20200153057d457f4be5ddbb7dae4
Making the simple mmio_read_*() and mmio_write_*() functions inline
saves 360 bytes of code in FVP release build.
FixesARM-software/tf-issues#210
Change-Id: I65134f9069f3b2d8821d882daaa5fdfe16355e2f
The bakery lock code currently expects the calling code to pass
the MPIDR_EL1 of the current CPU.
This is not always done correctly. Also the change to provide
inline access to system registers makes it more efficient for the
bakery lock code to obtain the MPIDR_EL1 directly.
This change removes the mpidr parameter from the bakery lock
interface, and results in a code reduction of 160 bytes for the
ARM FVP port.
FixesARM-software/tf-issues#213
Change-Id: I7ec7bd117bcc9794a0d948990fcf3336a367d543
Replace the current out-of-line assembler implementations of
the system register and system instruction operations with
inline assembler.
This enables better compiler optimisation and code generation
when accessing system registers.
FixesARM-software/tf-issues#91
Change-Id: I149af3a94e1e5e5140a3e44b9abfc37ba2324476
Current ATF uses a direct physical-to-virtual mapping, that is, a physical
address is mapped to the same address in the virtual space. For example,
physical address 0x8000_0000 is mapped to 0x8000_0000 virtual. This
approach works fine for FVP as all its physical addresses fall into 0 to
4GB range. But for other platform where all I/O addresses are 48-bit long,
If we follow the same direct mapping, we would need virtual address range
from 0 to 0x8fff_ffff_ffff, which is about 144TB. This requires a
significant amount of memory for MMU tables and it is not necessary to use
that much virtual space in ATF.
The patch is to enable mapping a physical address range to an arbitrary
virtual address range (instead of flat mapping)
Changed "base" to "base_va" and added "base_pa" in mmap_region_t and
modified functions such as mmap_add_region and init_xlation_table etc.
FixesARM-software/tf-issues#158
Previously, the enable_mmu_elX() functions were implicitly part of
the platform porting layer since they were included by generic
code. These functions have been placed behind 2 new platform
functions, bl31_plat_enable_mmu() and bl32_plat_enable_mmu().
These are weakly defined so that they can be optionally overridden
by platform ports.
Also, the enable_mmu_elX() functions have been moved to
lib/aarch64/xlat_tables.c for optional re-use by platform ports.
These functions are tightly coupled with the translation table
initialization code.
FixesARM-software/tf-issues#152
Change-Id: I0a2251ce76acfa3c27541f832a9efaa49135cc1c
Previously, platform.h contained many declarations and definitions
used for different purposes. This file has been split so that:
* Platform definitions used by common code that must be defined
by the platform are now in platform_def.h. The exact include
path is exported through $PLAT_INCLUDES in the platform makefile.
* Platform definitions specific to the FVP platform are now in
/plat/fvp/fvp_def.h.
* Platform API declarations specific to the FVP platform are now
in /plat/fvp/fvp_private.h.
* The remaining platform API declarations that must be ported by
each platform are still in platform.h but this file has been
moved to /include/plat/common since this can be shared by all
platforms.
Change-Id: Ieb3bb22fbab3ee8027413c6b39a783534aee474a
This patch adds support in the TSP to program the secure physical
generic timer to generate a EL-1 interrupt every half second. It also
adds support for maintaining the timer state across power management
operations. The TSPD ensures that S-EL1 can access the timer by
programming the SCR_EL3.ST bit.
This patch does not actually enable the timer. This will be done in a
subsequent patch once the complete framework for handling S-EL1
interrupts is in place.
Change-Id: I1b3985cfb50262f60824be3a51c6314ce90571bc
Addresses were declared as "unsigned int" in drivers/arm/peripherals/pl011/pl011.h and in function init_xlation_table. Changed to use "unsigned long" instead
FixesARM-software/tf-issues#156
This patch implements the register reporting when unhandled exceptions are
taken in BL3-1. Unhandled exceptions will result in a dump of registers
to the console, before halting execution by that CPU. The Crash Stack,
previously called the Exception Stack, is used for this activity.
This stack is used to preserve the CPU context and runtime stack
contents for debugging and analysis.
This also introduces the per_cpu_ptr_cache, referenced by tpidr_el3,
to provide easy access to some of BL3-1 per-cpu data structures.
Initially, this is used to provide a pointer to the Crash stack.
panic() now prints the the error file and line number in Debug mode
and prints the PC value in release mode.
The Exception Stack is renamed to Crash Stack with this patch.
The original intention of exception stack is no longer valid
since we intend to support several valid exceptions like IRQ
and FIQ in the trusted firmware context. This stack is now
utilized for dumping and reporting the system state when a
crash happens and hence the rename.
FixesARM-software/tf-issues#79 Improve reporting of unhandled exception
Change-Id: I260791dc05536b78547412d147193cdccae7811a
The data cache clean and invalidate operations dcsw_op_all()
and dcsw_op_loius() were implemented to invoke a DSB and ISB
barrier for every set/way operation. This adds a substantial
performance penalty to an already expensive operation.
These functions have been reworked to provide an optimised
implementation derived from the code in section D3.4 of the
ARMv8 ARM. The helper macro setup_dcsw_op_args has been moved
and reworked alongside the implementation.
FixesARM-software/tf-issues#146
Change-Id: Icd5df57816a83f0a842fce935320a369f7465c7f
There are a small number of non-EL specific helper functions
which are no longer used, and also some unusable helper
functions for non-existant registers.
This change removes all of these functions.
Change-Id: Idd656cef3b59cf5c46fe2be4029d72288b649c24