Wednesday, 17 August 2022

Speeding up the kernel testing loop

When I create kernel contributions, I usually rely on a specific hardware, which makes using a system on which I need to deploy kernels too complicated or time-consuming to be worth it. Yes, I'm an idiot that hacks the kernel directly on their main machine, though in my defense, I usually just need to compile drivers rather than full kernels.

But sometimes I work on a part of the kernel that can't be easily swapped out, like the USB sub-system. In which case I need to test out full kernels.

I usually prefer compiling full kernels as RPMs, on my Fedora system as it makes it easier to remove old test versions and clearly tag more information in the changelog or version numbers if I need to.

Step one, build as non-root

First, if you haven't already done so, create an ~/.rpmmacros file (I know...), and add a few lines so you don't need to be root, or write stuff in /usr to create RPMs.

$ cat ~/.rpmmacros
%_topdir        /home/hadess/Projects/packages
%_tmppath        %{_topdir}/tmp

Easy enough. Now we can use fedpkg or rpmbuild to create RPMs. Don't forget to run those under “powerprofilesctl launch” to speed things up a bit.

Step two, build less

We're hacking the kernel, so let's try and build from upstream. Instead of the aforementioned fedpkg, we'll use “make binrpm-pkg” in the upstream kernel, which builds the kernel locally, as it normally would, and then packages just the binaries into an RPM. This means that you can't really redistribute the results of this command, but it's fine for our use.

 If you choose to build a source RPM using “make rpm-pkg”, know that this one will build the kernel inside rpmbuild, this will be important later.

 Now that we're building from the kernel sources, that's our time to activate the cheat code. Run “make localmodconfig”. It will generate a .config file containing just the currently loaded modules. Don't forget to modify it to include your new driver, or driver for a device you'll need for testing.

Step three, build faster

If running “make rpm-pkg” is the same as running “make ; make modules” and then packaging up the results, does that mean that the “%{?_smp_mflags}” RPM macro is ignored, I make you ask rhetorically. The answer is yes. “make -j16 rpm-pkg”. Boom. Faster.

Step four, build fasterer

As we're building in the kernel tree locally before creating a binary package, already compiled modules and binaries are kept, and shouldn't need to be recompiled. This last trick can however be used to speed up compilation significantly if you use multiple kernel trees, or need to clean the build tree for whatever reason. In my tests, it made things slightly slower for a single tree compilation.

$ sudo dnf install -y ccache
$ make CC="ccache gcc" -j16 binrpm-pkg

Easy.

And if you want to speed up the rpm-pkg build:

$ cat ~/.rpmmacros
[...]
%__cc            ccache gcc
%__cxx            ccache g++

More information is available in Speeding Up Linux Kernel Builds With Ccache.

Step five, package faster

Now, if you've implemented all this, you'll see that the compilation still stops for a significant amount of time just before writing “Wrote kernel...rpm”. A quick look at top will show a single CPU core pegged to 100% CPU. It's rpmbuild compressing the package that you will just install and forget about.

$ cat ~/.rpmmacros
[...]
%_binary_payload    w2T16.xzdio

More information is available in Accelerating Ceph RPM Packaging: Using Multithreaded Compression.

TL;DR and further work

All those changes sped up the kernel compilation part of my development from around 20 minutes to less than 2 minutes on my desktop machine.

$ cat ~/.rpmmacros
%_topdir        /home/hadess/Projects/packages
%_tmppath        %{_topdir}/tmp
%__cc            ccache gcc
%__cxx            ccache g++
%_binary_payload    w2T16.xzdio


$ powerprofilesctl launch make CC="ccache gcc" -j16 binrpm-pkg

I believe there's still significant speed ups that could be done, in the kernel, by parallelising some of the symbols manipulation, caching the BTF parsing for modules, switching the single-threaded vmlinux bzip2 compression, and not generating a headers RPM (note: tested this last one, saves 4 seconds :)

 

The results of my tests. YMMV, etc.

Command Time spent Notes
koji build --scratch --arch-override=x86_64 f36 kernel.src.rpm 129 minutes It's usually quicker, but that day must have been particularly busy
fedpkg local 70 minutes No rpmmacros changes except setting the workdir in $HOME
powerprofilesctl launch fedpkg local 25 minutes
localmodconfig / bin-rpmpkg 19 minutes Defaults to "-j2"
localmodconfig -j16 / bin-rpmpkg 1:48 minutes
powerprofilesctl launch localmodconfig ccache -j16 / bin-rpmpkg 7 minutes Cold cache
powerprofilesctl launch localmodconfig ccache -j16 / bin-rpmpkg 1:45 minutes Hot cache
powerprofilesctl launch localmodconfig xzdio -j16 / bin-rpmpkg 1:20 minutes

Saturday, 5 February 2022

“Videos” de-clutter-ification

(I nearly went with clutterectomy, but that would be doing our old servant project a disservice.)

Yesterday, I finally merged the work-in-progress branch porting totem to GStreamer's GTK GL sink widget, undoing a lot of the work done in 2011 and 2014 to port the video widget and then to finally make use of its features.

But GTK has been modernised (in GTK3 but in GTK4 even more so), GStreamer grew a collection of GL plugins, Wayland and VA-API matured and clutter (and its siblings clutter-gtk, and clutter-gst) didn't get the resources they needed to follow.

Screenshot_from_2022-02-03_18-03-40A screenshot with practically no changes, as expected

The list of bug fixes and enhancements is substantial:

  • Makes some files that threw shaders warnings playable
  • Fixes resize lag for the widgets embedded in the video widget
  • Fixes interactions with widgets on some HDR capable systems, or even widgets disappearing sometimes (!)
  • Gets rid of the floating blank windows under Wayland
  • Should help with tearing, although that's highly dependent on the system
  • Hi-DPI support
  • Hardware acceleration (through libva)

Until the port to GTK4, we expect a overall drop in performance on systems where there's no VA-API support, and the GTK4 port should bring it to par with the fastest of players available for GNOME.

You can install a Preview version right now by running:

$ flatpak install --user https://flathub.org/beta-repo/appstream/org.gnome.Totem.Devel.flatpakref

and filing bug in the GNOME GitLab.

Next stop, a GTK4 port!