Detecting wake-ups
First, we need to detect wake-ups. This is a well-known and long-established process, using the venerable PowerTop.
# powertop --html=/tmp/foo.html --time=60
You can increase the number of seconds to get a more realistic view of your machine's idleness, but this is enough to show the main culprits.
In my case, evolution, gnote and devhelp were all waking up about 30 times a second whilst mostly idle. Evolution might be an outlier, as it also talks to the network, and is a bigger application to debug, so I started with devhelp.
$ strace -vvv -p `pidof devhelp`
Process 19069 attached
restart_syscall(<... resuming interrupted call ...>) = 0
recvfrom(6, 0x23fece4, 4096, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=5, events=POLLIN}, {fd=6, events=POLLIN}, {fd=3, events=POLLIN}], 3, 17) = 0 (Timeout)
recvfrom(6, 0x23fece4, 4096, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=5, events=POLLIN}, {fd=6, events=POLLIN}, {fd=3, events=POLLIN}], 3, 17) = 0 (Timeout)
And the screen fills up with EAGAIN errors. This looks a lot like a timeout being called too often.
Debugging wake-ups
I started sprinkling debug in g_main_context_prepare(), the function that prepares the various timeout and idle sources for dispatch, and calculates the timeouts for each poll() operation.
Something like:
if (source_timeout > 0 && source_timeout <= 20)
    g_message ("Source '%s' has very low timeout %d",
               g_source_get_name (source), source_timeout);
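To see why one impatient source is enough to keep the whole process awake, it helps to remember that the main loop polls with the smallest timeout any of its sources asks for. The following is a simplified model in plain C, not GLib's actual implementation: struct fake_source and next_poll_timeout() are hypothetical names, and -1 stands for "no timeout", as it does for poll().

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a main loop source; not a GLib type. */
struct fake_source {
    const char *name;   /* may be NULL, like an unnamed source */
    int timeout_ms;     /* -1 means "no timeout, wait forever" */
};

/* Return the timeout to pass to poll(): the smallest non-negative
 * timeout of all sources, or -1 if none has one. Sources asking for
 * 1 to 20 ms are reported on stderr, mirroring the debug check. */
int next_poll_timeout(const struct fake_source *sources, size_t n)
{
    int timeout = -1;

    assert(sources != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        int t = sources[i].timeout_ms;

        if (t < 0)
            continue;
        if (t > 0 && t <= 20)
            fprintf(stderr, "Source '%s' has very low timeout %d\n",
                    sources[i].name ? sources[i].name : "(null)", t);
        if (timeout < 0 || t < timeout)
            timeout = t;
    }
    return timeout;
}
```

Given three sources with timeouts of 5000, 17 and -1, the function returns 17, so the loop would wake up every 17 ms no matter how patient the other sources are. If the 17 ms source has no name, the message only says '(null)', which tells us nothing about who is responsible.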
The problem is we end up getting a null source name for almost all of the sources. This is where g_source_set_name() and its sibling g_source_set_name_by_id() come in handy.
timeout_id = g_timeout_add (timeout, myfunction, mydata);
g_source_set_name_by_id (timeout_id, "[module-name] myfunction");
And we start doing that all over GTK+. As you can see from the patches in the bug, it's not just the timeouts added by g_timeout_add() that we need to name.
In custody
The huge amount of debug shown when running our application with the gmain.c debug above tells us:
GLib-Message: Source 0x2d4f380 '[gtk+] gdk_frame_clock_paint_idle' has very low timeout 17
even when the window doesn't change, is in the background, and not updating. About 30 times a second.
Who are you gonna call?
Or by whom have you been called, rather. This is a small section of my ~/.gdbinit which will break on a particular function, print a backtrace, and continue. It makes it easier to work with the logs after the fact, especially for calls that happen often, when you're not interested in every one of them.
set breakpoint pending on
break gdk_frame_clock_begin_updating
commands
bt
continue
end
We did the same for gdk_frame_clock_begin_updating and found a backtrace similar to the one in Bugzilla. After that, we only needed to read some code and figure out what was going on. The result was a bug in GTK+, likely a regression from GTK+ 3.8.
Your laptop should last a bit longer when the updates hit.
TL;DR
Name your timeouts with g_source_set_name_by_id(), run powertop, and file bugs against broken applications.
Update: Fixed powertop command-line.