summaryrefslogtreecommitdiffstats
path: root/vcl/README.scheduler.md
diff options
context:
space:
mode:
authorDaniel Baumann <daniel.baumann@progress-linux.org>2024-04-07 09:06:44 +0000
committerDaniel Baumann <daniel.baumann@progress-linux.org>2024-04-07 09:06:44 +0000
commited5640d8b587fbcfed7dd7967f3de04b37a76f26 (patch)
tree7a5f7c6c9d02226d7471cb3cc8fbbf631b415303 /vcl/README.scheduler.md
parentInitial commit. (diff)
downloadlibreoffice-upstream.tar.xz
libreoffice-upstream.zip
Adding upstream version 4:7.4.7.upstream/4%7.4.7upstream
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to '')
-rw-r--r--vcl/README.scheduler.md435
1 files changed, 435 insertions, 0 deletions
diff --git a/vcl/README.scheduler.md b/vcl/README.scheduler.md
new file mode 100644
index 000000000..bfaf53e01
--- /dev/null
+++ b/vcl/README.scheduler.md
@@ -0,0 +1,435 @@
+# VCL Scheduler
+
+## Introduction
+
+The VCL scheduler handles LOs primary event queue. It is simple by design,
+currently just a single-linked list, processed in list-order by priority
+using round-robin for reoccurring tasks.
+
+The scheduler has the following behaviour:
+
+B.1. Tasks are scheduled just priority based
+B.2. Implicitly cooperative AKA non-preemptive
+B.3. It's not "fair" in any way (a consequence of B.2)
+B.4. Tasks are handled round-robin (per priority)
+B.5. Higher priorities have lower values
+B.6. A small set of priorities instead of an flexible value AKA int
+
+There are some consequences due to this design.
+
+C.1. Higher priority tasks starve lower priority tasks
+ As long as a higher task is available, lower tasks are never run!
+ See Anti-pattern.
+
+C.2. Tasks should be split into sensible blocks
+ If this can't really be done, process pending tasks by calling
+ `Application::Reschedule()`. Or use a thread.
+
+C.3. This is not an OS scheduler
+ There is no real way to "fix" B.2. and B.3.
+ If you need to do a preemptive task, use a thread!
+ Otherwise make your task suspendable.
+
+
+## Driving the scheduler AKA the system timer
+
+ 1. There is just one system timer, which drives LO event loop
+ 2. The timer has to run in the main window thread
+ 3. The scheduler is run with the Solar mutex acquired
+ 4. The system timer is a single-shot timer
+ 5. The scheduler system event / message has a low system priority.
+ All system events should have a higher priority.
+
+Every time a task is started, the scheduler timer is adjusted. When the timer
+fires, it posts an event to the system message queue. If the next most
+important task is an Idle (AKA instant, 0ms timeout), the event is pushed to
+the back of the queue, so we don't starve system messages, otherwise to the
+front.
+
+Every time the scheduler is invoked it searches for the next task to process,
+restarts the timer with the timeout for the next event and then invokes the
+task. After invoking the task and if the task is still active, it is pushed
+to the end of the queue and the timeout is eventually adjusted.
+
+
+## Locking
+
+The locking is quite primitive: all interaction with internal Scheduler
+structures are locked. This includes the `ImplSchedulerContext` and the
+`Task::mpSchedulerData`, which is actually a part of the scheduler.
+Before invoking the task, we have to release the lock, so others can
+Start new Tasks.
+
+The Scheduler just processes its own Tasks in the main thread and needs
+the `SolarMutex` for it and for `DeInit` (tested by `DBG_TESTSOLARMUTEX`). All
+the other interaction just take the scheduler mutex or don't need locking
+at all.
+
+There is a "workaround" for static Task objects, which would crash LO on
+destruction, because `Task::~Task` would try to de-register itself in the
+Scheduler, while the `SchedulerLock` would be long gone. OTOH this makes
+Task handling less error-prone, than doing "special" manual cleanup.
+There is also a "= TODOs and ideas =" to get rid if static Tasks.
+
+Actually the scheduler mutex should never be locked when calling into
+non-scheduler code, so it was converted to a non-recursive
+`std::mutex`.
+
+
+## Idle processing
+
+Confusingly, there are 2 concepts that are called 'idle':
+
+* Instant (zero timeout) tasks, represented e.g. by the Idle class. This is
+a misnomer, as these tasks are processed after returning to the main loop.
+This is not necessarily when LO is idle, in fact such tasks may be invoked
+while there is input in the OS event queue pending.
+(TODO: This case should be fixed by renaming.)
+
+* Low priority tasks, represented by priorities `TaskPriority::HIGH_IDLE` and
+lower. In addition to being invoked only when there is no task with a higher
+priority, pending input in the OS event queue also takes precedence.
+
+
+## Lifecycle / thread-safety of Scheduler-based objects
+
+A scheduler object it thread-safe in the way, that it can be associated to
+any thread and any thread is free to call any functions on it. The owner must
+guarantee that the `Invoke()` function can be called, while the Scheduler object
+exists / is not disposed.
+
+
+## Anti-pattern: Dependencies via (fine grained) priorities
+
+"Idle 1" should run before "Idle 2", therefore give "Idle 1" a higher priority
+then "Idle 2". This just works correct for low frequency idles, but otherwise
+always breaks!
+
+If you have some longer work - even if it can be split by into schedulable,
+smaller blocks - you normally don't want to schedule it with a non-default
+priority, as it starves all lower priority tasks. Even if a block was processed
+in "Idle 1", it is scheduled with the same (higher) priority again. Changing
+the "Idle" to a "Timer" also won't work, as this breaks the dependency.
+
+What is needed is task based dependency handling, so if "Task 1" is done, it
+has to start "Task 2" and if "Task 1" is started again, it has to stop
+"Task 2". This currently has to be done by the implementor, but this feature
+can be added to the scheduler reasonably.
+
+
+## Implementation details
+
+### General: event priority for `DoYield`
+
+There are three types of events, with different priority:
+
+1. LO user events
+2. System events
+3. LO Scheduler event
+
+They should be processed according to the following code:
+
+```
+bool ImplYield(bool bWait, bool bAllCurrent)
+{
+ DBG_TESTSOLARMUTEX();
+ assert(IsMainThread());
+
+ bool bWasEvent = ProcessUserEvents( bAllCurrent );
+ if ( !bAllCurrent && bWasEvent )
+ return true;
+
+ SolarMutexReleaser();
+ bWasEvent = ProcessSystemEvents( bAllCurrent, &bWasSchedulerEvent ) || bWasEvent;
+ if ( !bWasSchedulerEvent && IsSchedulerEvent() )
+ {
+ ProcessSchedulerEvent()
+ bWasEvent = true;
+ }
+ if ( !bWasEvent && bWait )
+ {
+ WaitForSystemEvents();
+ bWasEvent = true;
+ }
+ return bWasEvent;
+}
+```
+
+### General: main thread deferral
+
+In almost all VCL backends, we run main thread deferrals by disabling the
+`SolarMutex` using a boolean. In the case of the redirect, this makes
+`tryToAcquire` and `doAcquire` return `true` or `1`, while a release is ignored.
+Also the `IsCurrentThread()` mutex check function will act accordingly, so all
+the `DBG_TESTSOLARMUTEX` won't fail.
+
+Since we just disable the locks when we start running the deferred code in the
+main thread, we won't let the main thread run into stuff, where it would
+normally wait for the SolarMutex.
+
+Eventually this will move into the `SolarMutex`. KDE / Qt also does main
+thread redirects using `Qt::BlockingQueuedConnection`.
+
+### General: processing all current events for `DoYield`
+
+This is easily implemented for all non-priority queue based implementations.
+Windows and macOS both have a timestamp attached to their events / messages,
+so simply get the current time and just process anything < timestamp.
+For the **KDE** backend this is already the default behaviour - single event
+processing isn't even supported. The headless backend accomplishes this by
+just processing a copy of the list of current events.
+
+Problematic in this regard is the **Gtk+** backend. `g_main_context_iteration`
+dispatches "only those highest priority event sources". There is no real way
+to tell, when these became ready. I've added a workaround idea to the TODO
+list. FWIW: **Qt** runs just a single timer source in the glib main context,
+basically the same we're doing with the LO scheduler as a system event.
+
+The gen **X11** backend has some levels of redirection, but needs quite some work
+to get this fixed.
+
+### General: non-main thread yield
+
+Yielding from a non-main thread must not wait in the main thread, as this
+may block the main thread until some events happen.
+
+Currently we wait on an extra conditional, which is cleared by the main event
+loop.
+
+### General: invalidation of elapsed timer event messages
+
+Since the system timer to run the scheduler is single-shot, there should never
+be more than one elapsed timer event in system event queue. When stopping or
+restarting the timer, we eventually have to remove the now invalid event from
+the queue.
+
+But for the Windows and macOS backends this may fail as they have delayed
+posting of events, so a consecutive remove after a post will actually yield no
+remove. On Windows we even get unwanted processing of events outside of the
+main event loop, which may call the Scheduler, as timer management is handled
+in critical scheduler code.
+
+To prevent these problems, we don't even try to remove these events, but
+invalidate them by versioning the timer events. Timer events with invalid
+versions are processed but simply don't run the scheduler.
+
+### General: track time of long running tasks
+
+There is `TaskStopwatch` class. It'll track the time and report a timeout either
+when the tasks time slice is finished or some system event did occur.
+
+Eventually it will be merged into the main scheduler, so each invoked task can
+easily track it's runtime and eventually this can be used to "blame" / find
+other long running tasks, so interactivity can be improved.
+
+There were some questions coming up when implementing it:
+
+#### Why does the scheduler not detect that we only have idle tasks pending,
+and skip the instant timeout?
+
+You never know how long a task will run. Currently the scheduler simply asks
+each task when it'll be ready to run, until two runnable tasks are found.
+Normally this is very quick, as LO has a lot of one-shot instant tasks / Idles
+and just a very few long term pending Timers.
+
+Especially UNO calls add a lot of Idles to the task list, which just need to
+be processed in order.
+
+#### Why not use things like Linux timer wheels?
+
+LO has relatively few timers and a lot one-shot Idles. 99% of time the search
+for the next task is quick, because there are just ~5 long term timers per
+document (cache invalidation, cursor blinking etc.).
+
+This might become a problem, if you have a lot of open documents, so the long
+term timer list increases AKA for highly loaded LOOL instances.
+
+But the Linux timer wheel mainly relies on the facts that the OS timers are
+expected to not expire, as they are use to catch "error" timeouts, which rarely
+happen, so this definitely not matches LO's usage.
+
+#### Not really usable to find misbehaving tasks
+
+The TaskStopwatch class is just a little time keeper + detecting of input
+events. This is not about misbehaving Tasks, but long running tasks, which
+have to yield to the Scheduler, so other Tasks and System events can be
+processed.
+
+There is the TODO to merge the functionality into the Scheduler itself, at
+which point we can think about profiling individual Tasks to improve
+interactivity.
+
+### macOS implementation details
+
+Generally the Scheduler is handled as expected, except on resize, which is
+handled with different runloop-modes in macOS. In case of a resize, the normal
+`runloop` is suspended in `sendEvent`, so we can't call the scheduler via posted
+main loop-events. Instead the scheduler uses the timer again.
+
+Like the Windows backend, all Cocoa / GUI handling also has to be run in
+the main thread. We're emulating Windows out-of-order PeekMessage processing,
+via a `YieldWakeupEvent` and two conditionals. When in a `RUNINMAIN` call, all
+the `DBG_TESTSOLARMUTEX` calls are disabled, as we can't release the
+`SolarMutex`, but we can prevent running any other `SolarMutex` based code.
+Those wakeup events must be ignored to prevent busy-locks. For more info read
+the "General: main thread deferral" section.
+
+We can neither rely on macOS dispatch_sync code block execution nor the
+message handling, as both can't be prioritized or filtered and the first
+does also not allow nested execution and is just processed in sequence.
+
+There is also a workaround for a problem for pushing tasks to an empty queue,
+as [NSApp postEvent: ... atStart: NO] doesn't append the event, if the
+message queue is empty.
+
+An additional problem is the filtering of events on Window close. This drops
+posted timer events, when a Window is closed resulting in a busy DoYield loop,
+so we have to re-post the event, after closing a window.
+
+### Windows implementation details
+
+Posted or sent event messages often trigger processing of WndProc in
+`PeekMessage`, `GetMessage` or `DispatchMessage`, independently from the message
+to fetch, remove or dispatch ("During this call, the system delivers pending,
+nonqueued messages..."). Additionally messages have an inherited priority
+based on the function used to generate them. Even if `WM_TIMER` messages should
+have the lowest priority, a manually posted `WM_TIMER` is processed with the
+priority of a PostMessage message.
+
+So we're giving up on processing all our Scheduler events as a message in the
+system message loop. Instead we just indicate a 0ms timer message by setting
+the `m_bDirectTimeout` in the timer object. This timer is always processed, if
+the system message wasn't already our timer. As a result we can also skip the
+polling. All this is one more reason to drop the single message processing
+in favour of always processing all pending (system) events.
+
+There is another special case, we have to handle: window updates during move
+and resize of windows. These system actions run in their own nested message
+loop. So we have to completely switch to timers, even for 0 ms. But these
+posted events prevent any event processing, while we're busy. The only viable
+solution seems to be to switch to `WM_TIMER` based timers, as these generate
+messages with the lowest system priority (but they don't allow 0 ms timeouts).
+So processing slows down during resize and move, but we gain working painting,
+even when busy.
+
+An additional workaround is implemented for the delayed queuing of posted
+messages, where `PeekMessage` in `WinSalTimer::Stop()` won't be able remove the
+just posted timer callback message. See "General: invalidation of elapsed
+timer event messages" for the details.
+
+To run the required GUI code in the main thread without unlocking the
+`SolarMutex`, we "disable" it. For more infos read the "General: main thread
+deferral" section.
+
+### KDE implementation details
+
+This implementation also works as intended. But there is a different Yield
+handling, because Qts `QAbstractEventDispatcher::processEvents` will always
+process all pending events.
+
+
+## TODOs and ideas
+
+### Task dependencies AKA children
+
+Every task can have a list of children / a child.
+
+ * When a task is stopped, the children are started.
+ * When a task is started, the children are stopped.
+
+This should be easy to implement.
+
+### Per priority time-sorted queues
+
+This would result in O(1) scheduler. It was used in the Linux kernel for some
+time (search Ingo Molnar's O(1) scheduler). This can be a scheduling
+optimization, which would prevent walking longer event list. But probably the
+management overhead would be too large, as we have many one-shot events.
+
+To find the next task the scheduler just walks the (constant) list of priority
+queues and schedules the first ready event of any queue.
+
+The downside of this approach: Insert / Start / Reschedule(for "auto" tasks)
+now need O(log(n)) to find the position in the queue of the priority.
+
+### Always process all (higher priority) pending events
+
+Currently `Application::Reschedule()` processes a single event or "all" events,
+with "all" defined as "100 events" in most backends. This already is ignored
+by the KDE backend, as Qt defines its `QAbstractEventDispatcher::processEvents`
+processing all pending events (there are ways to skip event classes, but no
+easy way to process just a single event).
+
+Since the Scheduler is always handled by the system message queue, there is
+really no more reasoning to stop after 100 events to prevent LO Scheduler
+starvation.
+
+### Drop static inherited or composed Task objects
+
+The sequence of destruction of static objects is not defined. So a static Task
+can not be guaranteed to happen before the Scheduler. When dynamic unloading
+is involved, this becomes an even worse problem. This way we could drop the
+mbStatic workaround from the Task class.
+
+### Run the LO application in its own thread
+
+This would probably get rid of most of the macOS and Windows implementation
+details / workarounds, but is quite probably a large amount of work.
+
+Instead of LO running in the main process / thread, we run it in a 2nd thread
+and defer al GUI calls to the main thread. This way it'll hopefully not block
+and can process system events.
+
+That's just a theory - it definitely needs more analysis before even attending
+an implementation.
+
+### Re-evaluate the macOS `ImplNSAppPostEvent`
+
+Probably a solution comparable to the Windows backends delayed PostMessage
+workaround using a validation timestamp is better then the current peek,
+remove, re-postEvent, which has to run in the main thread.
+
+Originally I didn't evaluate, if the event is actually lost or just delayed.
+
+### Drop `nMaxEvents` from Gtk+ based backends
+
+```
+gint last_priority = G_MAXINT;
+bool bWasEvent = false;
+do {
+ gint max_priority;
+ g_main_context_acquire( NULL );
+ bool bHasPending = g_main_context_prepare( NULL, &max_priority );
+ g_main_context_release( NULL );
+ if ( bHasPending )
+ {
+ if ( last_priority > max_priority )
+ {
+ bHasPending = g_main_context_iteration( NULL, bWait );
+ bWasEvent = bWasEvent || bHasPending;
+ }
+ else
+ bHasPending = false;
+ }
+}
+while ( bHasPending )
+```
+
+The idea is to use g_main_context_prepare and keep the `max_priority` as an
+indicator. We cannot prevent running newer lower events, but we can prevent
+running new higher events, which should be sufficient for most stuff.
+
+This also touches user event processing, which currently runs as a high
+priority idle in the event loop.
+
+### Drop nMaxEvents from gen (X11) backend
+
+A few layers of indirection make this code hard to follow. The `SalXLib::Yield`
+and `SalX11Display::Yield` architecture makes it impossible to process just the
+current events. This really needs a refactoring and rearchitecture step, which
+will also affect the Gtk+ and KDE backend for the user event handling.
+
+### Merge TaskStopwatch functionality into the Scheduler
+
+This way it can be easier used to profile Tasks, eventually to improve LO's
+interactivity.