esp32/mpthreadport: Fix uneven GIL allocation between Python threads.

Explicitly yield each time a thread mutex is unlocked.

Key to understanding this bug is that Python threads run at equal RTOS priority, and although ESP-IDF FreeRTOS (and I think vanilla FreeRTOS) scheduler will round-robin equal priority tasks in the ready state it does not make a similar guarantee for tasks moving between ready and waiting.

The pathological case of this bug is when one Python thread task is busy (i.e. never blocks) it will hog the CPU more than expected, sometimes for an unbounded amount of time. This happens even though it periodically unlocks the GIL to allow another task to run.

Assume T1 is busy and T2 is blocked waiting for the GIL. T1 is executing and hits a condition to yield execution:

1. T1 calls MP_THREAD_GIL_EXIT
2. FreeRTOS sees T2 is waiting for the GIL and moves it to the Ready list (but does not preempt, as T2 is same priority, so T1 keeps running).
3. T1 immediately calls MP_THREAD_GIL_ENTER and re-takes the GIL.
4. Pre-emptive context switch happens, T2 wakes up, sees GIL is not available, and goes on the waiting list for the GIL again.

To break this cycle step 4 must happen before step 3, but this may be a very narrow window of time so it may not happen regularly - and quantisation of the timing of the tick interrupt to trigger a context switch may mean it never happens.

Yielding at the end of step 2 maximises the chance for another task to run.

Adds a test that fails on esp32 before this fix and passes afterwards.

Fixes issue #15423.

This work was funded through GitHub Sponsors.

Signed-off-by: Angus Gratton <angus@redyak.com.au>
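
The four-step cycle described above can be illustrated with a toy discrete-time model. This is only a sketch under stated assumptions, not FreeRTOS or MicroPython code: the function `simulate`, its parameters, and the one-step cost assigned to each acquire, release, and failed lock attempt are all hypothetical. The default `tick_period` and `hold_steps` are deliberately chosen so that the tick never lands in the window between unlock and re-lock, which is the worst-case alignment the message warns about.

```python
READY, BLOCKED = "ready", "blocked"
ACQUIRE, WORK, RELEASE = "acquire", "work", "release"


def simulate(yield_on_unlock, steps=10_000, tick_period=100, hold_steps=31):
    """Toy model of two equal-priority tasks contending for one lock (the GIL).

    Returns (handovers, work): how often the lock changed owner, and how many
    work steps each task executed while holding it. All names and timings are
    illustrative assumptions, not measurements of real hardware.
    """
    state = [READY, READY]
    phase = [ACQUIRE, ACQUIRE]
    remaining = [0, 0]
    work = [0, 0]
    gil = None          # index of the task holding the lock, or None
    last_owner = None
    handovers = 0
    running = 0

    for step in range(steps):
        other = 1 - running
        # Tick interrupt: round-robin to the other task only if it is ready.
        if step > 0 and step % tick_period == 0 and state[other] == READY:
            running, other = other, running
        t = running

        if phase[t] == ACQUIRE:
            if gil is None:
                gil = t
                if last_owner is not None and last_owner != t:
                    handovers += 1
                last_owner = t
                phase[t] = WORK
                remaining[t] = hold_steps
            else:
                # Lock is taken: go back on the waiting list (step 4).
                state[t] = BLOCKED
                running = other
        elif phase[t] == WORK:
            work[t] += 1
            remaining[t] -= 1
            if remaining[t] == 0:
                phase[t] = RELEASE
        else:
            # Unlock (step 1); the waiter becomes ready but does not
            # preempt, as it has the same priority (step 2).
            gil = None
            state[other] = READY
            phase[t] = ACQUIRE
            if yield_on_unlock:
                # Models the explicit yield: let the waiter run now.
                running = other
            # Otherwise this task re-locks on its next step (step 3).

    return handovers, work


if __name__ == "__main__":
    for flag in (False, True):
        handovers, work = simulate(yield_on_unlock=flag)
        print("yield" if flag else "no yield", handovers, work)
```

In this model, with the default parameters, the run without the yield never hands the lock to the second task, while the run with the yield alternates ownership at every unlock. Other parameter choices let the tick drift into the window occasionally, which corresponds to the "may not happen regularly" case rather than complete starvation.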
4 months ago · 337742f6c7
3 changed files with 60 additions and 0 deletions
--- a/ports/esp32/mpthreadport.c
+++ b/ports/esp32/mpthreadport.c
@@ -221,6 +221,11 @@ int mp_thread_mutex_lock(mp_thread_mutex_t *mutex, int wait) {
 void mp_thread_mutex_unlock(mp_thread_mutex_t *mutex) {
    xSemaphoreGive(mutex->handle);
    // Python threads run at equal priority, so pre-emptively yield here to
    // prevent pathological imbalances where a thread unlocks and then
    // immediately re-locks a mutex before a context switch can occur, leaving
    // another thread waiting for an unbounded period of time.
    taskYIELD();
 }
 void mp_thread_deinit(void) {
--- a/tests/thread/thread_coop.py
+++ b/tests/thread/thread_coop.py
@@ -0,0 +1,53 @@
 # Threads should be semi-cooperative, to the point where one busy
 # thread can't starve out another.
 #
 # (Note on ports without the GIL this one should always be true, on ports with GIL it's
 # a test of the GIL behaviour.)
 import _thread
 import sys
 from time import ticks_ms, ticks_diff, sleep_ms
 done = False
 ITERATIONS = 5
 SLEEP_MS = 250
 MAX_DELTA = 30
 if sys.platform in ("win32", "linux", "darwin"):
    # Conventional operating systems get looser timing restrictions
    SLEEP_MS = 300
    MAX_DELTA = 100
 def busy_thread():
    while not done:
        pass
 def test_sleeps():
    global done
    ok = True
    for _ in range(ITERATIONS):
        t0 = ticks_ms()
        sleep_ms(SLEEP_MS)
        t1 = ticks_ms()
        d = ticks_diff(t1, t0)
        if d < SLEEP_MS - MAX_DELTA or d > SLEEP_MS + MAX_DELTA:
            print("Slept too long ", d)
            ok = False
    print("OK" if ok else "Not OK")
    done = True
 # make the thread the busy one, and check sleep time on main task
 _thread.start_new_thread(busy_thread, ())
 test_sleeps()
 sleep_ms(100)
 done = False
 # now swap them
 _thread.start_new_thread(test_sleeps, ())
 busy_thread()
--- a/tests/thread/thread_coop.py.exp
+++ b/tests/thread/thread_coop.py.exp
@@ -0,0 +1,2 @@
 OK
 OK