Tuesday, 17 December 2019

GMemoryMonitor (low-memory-monitor, 2nd phase)

TL;DR

Use GMemoryMonitor in glib 2.63.3 and newer in your applications to lower overall memory usage, and detect low memory conditions.

low-memory-monitor

To start with, let's come back to low-memory-monitor, announced at the end of August.

It's not really a “low memory monitor”. I know, the name is deceiving, but it actually monitors memory pressure stalls, and how hard it is for the kernel to allocate memory when applications need it. The longer it takes to allocate memory, the longer the kernel takes to allocate it, usually because it needs to move memory around to make room for a big allocation, when an application starts up for example, or prepares an in-memory buffer for saving.

It is not a daemon that will kill programs on low memory. It's not a user-space out-of-memory killer, and does not take those policy decisions. It can however be configured to ask the kernel to do that. The kernel doesn't really know what it's doing though, and user-space isn't helping either, so best disable that for now...

As listed in low-memory-monitor's README (and in the announcement post), there were a number of similar projects around, but none that would offer everything we needed, eg.:
  • Has a D-Bus interface to propagate low memory conditions
  • Requires Linux 5.2's kernel memory pressure stalls information (Android's lowmemorykiller daemon has loads of code to get the same information from the kernel for older versions, and it really is quite a lot of code)
  • Written in a compiled language to save on startup/memory usage costs (around 500 lines of C code, as counted by sloccount)
  • Built-in policy, based upon values used in Android and Endless OS
 GMemoryMonitor

Next up, in our effort to limit memory usage, we'll need some help from applications. That's where GMemoryMonitor comes in. It's simple enough, listen to the low-memory-warning signal and free some image thumbnails, index caches, or dump some data to disk, when you receive a signal.

The signal also gives you a “warning level”, with 255 being when low-memory-monitor would trigger the kernel's OOM killer, and lower values different levels of “try to be a good citizen”.

The more astute amongst you will have noticed that low-memory-monitor runs as root, on the system bus, and wonder how those new fangled (5 years old today!) sandboxed applications would receive those signals. Fear not! Support for a portal version of GMemoryMonitor landed in xdg-desktop-portal on the same day as in glib. Everything tied together with installed tests that use the real xdg-desktop-portal to test the portal and unsandboxed versions.

How about an OOM killer?

By using memory pressure stall information, we receive information about the state of the kernel before getting into swapping that'd cause the machine to become unusable. This also means that, as our threshold for keeping everything ticking is low, if we were to kill high memory consumers, we'd get a butter smooth desktop, but, based on my personal experience, your browser and your mail client would take it in turns disappearing from your desktop in a way that you wouldn't even notice.

We'll definitely need to think about our next step in application state management, and changing our running applications paradigm.

Distributions should definitely disable the OOM killer for now, and possibly try their hands at upstream some systemd OOMPolicy and OOMScoreAdjust options for system daemons.

Conclusion

Creating low-memory-monitor was easy enough, getting everything else in place was decidedly more complicated. In addition to requiring changes to glib, xdg-desktop-portal and python-dbusmock, it also required a lot of work on the glib CI to save me from having to write integration tests in C that would have required a lot of scaffolding. So thanks to all involved in particular Philip Withnall for his patience reviewing my changes.

Friday, 13 December 2019

Dual-GPU support follow-up: NVIDIA driver support

If you remember, back in 2016, I did the work to get a “Launch on Discrete GPU” menu item added to application in gnome-shell.

This cycle I worked on adding support for the NVIDIA proprietary driver, so that the menu item shows up, and the right environment variables are used to launch applications on that device.

Tested with another unsupported device...


Behind the scenes

There were a number of problems with the old detection code in switcheroo-control:
- it required the graphics card to use vga_switcheroo in the kernel, which the NVIDIA driver didn't do
- it could support more than 2 GPUs
- and it didn't really actually know which GPU was going to be the “main” one

And, on top of all that, gnome-shell expected the Mesa OpenGL stack to be used, so it only knew the right environment variables to do that, and only for one secondary GPU.

So we've extended switcheroo-control and its API to do all this.

(As a side note, commenters asked me about the KDE support, and how it would integrate, and it turns out that KDE's code just checks for the presence of a file in /sys, which is only present when vga_switcheroo is used. So I would encourage KDE to adopt the switcheroo-control D-Bus API for this)

Closing

All this will be available in Fedora 32, using GNOME 3.36 and switcheroo-control 2.0. We might backport this to Fedora 31 after it's been tested, and if there is enough interest.