
esp32/mpthreadport: Fix uneven GIL allocation between Python threads.

Explicitly yield each time a thread mutex is unlocked.

Key to understanding this bug is that Python threads run at equal RTOS
priority. Although the ESP-IDF FreeRTOS scheduler (and I think vanilla
FreeRTOS) will round-robin equal-priority tasks in the ready state, it does
not make a similar guarantee for tasks moving between ready and waiting.

The pathological case of this bug is when one Python thread task is busy
(i.e. never blocks): it will hog the CPU more than expected, sometimes for
an unbounded amount of time. This happens even though it periodically
unlocks the GIL to allow another task to run.

Assume T1 is busy and T2 is blocked waiting for the GIL. T1 is executing
and hits a condition to yield execution:

1. T1 calls MP_THREAD_GIL_EXIT
2. FreeRTOS sees T2 is waiting for the GIL and moves it to the Ready list
   (but does not preempt, as T2 is same priority, so T1 keeps running).
3. T1 immediately calls MP_THREAD_GIL_ENTER and re-takes the GIL.
4. Pre-emptive context switch happens, T2 wakes up, sees GIL is not
   available, and goes on the waiting list for the GIL again.

To break this cycle step 4 must happen before step 3, but this may be a
very narrow window of time so it may not happen regularly - and
quantisation of the timing of the tick interrupt to trigger a context
switch may mean it never happens.

Yielding at the end of step 2 maximises the chance for another task to run.
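
The cycle described above can be illustrated with a toy model. This is only
a sketch under stated assumptions, not FreeRTOS code: the function
`simulate` and its parameters (`hold`, `gap`, `tick`, `t2_hold`) are
hypothetical names invented for illustration, time is counted in arbitrary
units, and the tick interrupt is assumed to fire at exact multiples of
`tick`. T1 holds the GIL for `hold` units, then leaves a window of `gap`
units between unlock and re-lock. Without a yield, T2 only gets the GIL if a
tick lands inside that window.

```python
def simulate(yield_on_unlock, total_time=1000, hold=9, gap=1, tick=10, t2_hold=9):
    """Toy model: count how often the waiting task T2 obtains the GIL.

    All parameters are hypothetical and in arbitrary time units.
    """
    now = 0
    t2_runs = 0
    while now < total_time:
        # T1 holds the GIL for `hold` units, then unlocks it (steps 1-2):
        # T2 becomes ready, but T1 keeps running.
        now += hold
        if yield_on_unlock:
            # Model of the explicit yield: the ready task runs right away.
            switched = True
        else:
            # Without a yield, T2 only runs if a tick interrupt lands inside
            # the window [now, now + gap) before T1 re-locks (step 3).
            next_tick = -(-now // tick) * tick  # first tick at or after `now`
            switched = next_tick < now + gap
        if switched:
            # T2 takes the GIL, holds it for `t2_hold` units, then waits again.
            t2_runs += 1
            now += t2_hold
        else:
            # T1 re-locks; a later tick wakes T2 only to find the GIL taken
            # (step 4), so T2 goes back to waiting.
            now += gap
    return t2_runs


if __name__ == "__main__":
    print("T2 acquisitions without yield:", simulate(False))
    print("T2 acquisitions with yield:   ", simulate(True))
```

With these default values the tick always falls just after T1 has re-locked,
so the model shows the quantisation effect mentioned above: T2 never gets
the GIL unless T1 yields on unlock. Real scheduling is far less regular, so
the model only demonstrates the mechanism, not actual timings.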

Adds a test that fails on esp32 before this fix and passes afterwards.

Fixes issue #15423.

This work was funded through GitHub Sponsors.

Signed-off-by: Angus Gratton <angus@redyak.com.au>
pull/15476/head
Angus Gratton 4 months ago
committed by Damien George
parent
commit
337742f6c7
3 changed files:
  ports/esp32/mpthreadport.c       5
  tests/thread/thread_coop.py      53
  tests/thread/thread_coop.py.exp  2

5
ports/esp32/mpthreadport.c

@@ -221,6 +221,11 @@ int mp_thread_mutex_lock(mp_thread_mutex_t *mutex, int wait) {
 void mp_thread_mutex_unlock(mp_thread_mutex_t *mutex) {
     xSemaphoreGive(mutex->handle);
+    // Python threads run at equal priority, so pre-emptively yield here to
+    // prevent pathological imbalances where a thread unlocks and then
+    // immediately re-locks a mutex before a context switch can occur, leaving
+    // another thread waiting for an unbounded period of time.
+    taskYIELD();
 }
 void mp_thread_deinit(void) {

53
tests/thread/thread_coop.py

@@ -0,0 +1,53 @@
+# Threads should be semi-cooperative, to the point where one busy
+# thread can't starve out another.
+#
+# (Note on ports without the GIL this one should always be true, on ports with GIL it's
+# a test of the GIL behaviour.)
+import _thread
+import sys
+from time import ticks_ms, ticks_diff, sleep_ms
+done = False
+ITERATIONS = 5
+SLEEP_MS = 250
+MAX_DELTA = 30
+if sys.platform in ("win32", "linux", "darwin"):
+    # Conventional operating systems get looser timing restrictions
+    SLEEP_MS = 300
+    MAX_DELTA = 100
+def busy_thread():
+    while not done:
+        pass
+def test_sleeps():
+    global done
+    ok = True
+    for _ in range(ITERATIONS):
+        t0 = ticks_ms()
+        sleep_ms(SLEEP_MS)
+        t1 = ticks_ms()
+        d = ticks_diff(t1, t0)
+        if d < SLEEP_MS - MAX_DELTA or d > SLEEP_MS + MAX_DELTA:
+            print("Slept too long ", d)
+            ok = False
+    print("OK" if ok else "Not OK")
+    done = True
+# make the thread the busy one, and check sleep time on main task
+_thread.start_new_thread(busy_thread, ())
+test_sleeps()
+sleep_ms(100)
+done = False
+# now swap them
+_thread.start_new_thread(test_sleeps, ())
+busy_thread()

2
tests/thread/thread_coop.py.exp

@@ -0,0 +1,2 @@
+OK
+OK
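
The timing check in thread_coop.py accepts a sleep only if its measured
duration is within MAX_DELTA of SLEEP_MS, and it prints the same "Slept too
long" message for both directions. As an illustration only, the bounds can
be written as a small standalone function. `classify_sleep` is a
hypothetical helper and is not part of this commit; the values in the usage
lines are the non-desktop defaults from the test (SLEEP_MS = 250,
MAX_DELTA = 30).

```python
def classify_sleep(measured_ms, sleep_ms, max_delta):
    """Return "short", "long" or "ok" for a measured sleep duration.

    Hypothetical helper mirroring the bounds used in thread_coop.py:
    a sleep passes when it is within `max_delta` of `sleep_ms`, inclusive.
    """
    if measured_ms < sleep_ms - max_delta:
        return "short"
    if measured_ms > sleep_ms + max_delta:
        return "long"
    return "ok"


if __name__ == "__main__":
    # Values at and just outside the 250 +/- 30 ms window.
    for measured in (219, 220, 250, 280, 281):
        print(measured, classify_sleep(measured, 250, 30))
```

A starved thread would show up as "long" here: its sleep_ms() call cannot
return until it regains the GIL, so the measured duration grows past the
upper bound.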