Archiv für November 2011

Apport 1.90: Client-side duplicate checking

Apport and the retracer bot in the Canonical data center have provided server-side automatic closing of duplicate crash report bugs for quite a long time. As we have only kept Apport crash detection enabled in the development release, we got away with this as bugs usually did not get so many duplicates that they became unmanageable. Also, the number of duplicates provided a nice hint to how urgent and widespread a crash actually was.

However, it’s time to end that era and provide something better now:

  • This probably caused a lot of frustration when a reporter of the crash spent time, bandwidth, and creativity to upload the crash data and create a description for it, only to find that it got closed as a duplicate 20 minutes later.
  • Some highly visible crashes sometimes generated up to a hundred duplicates in Launchpad, which was prone to timeouts, and needless catch-up by the retracers.
  • We plan to have a real crash database soon, and eventually want to keep Apport enabled in stable releases. This will raise the number of duplicates that we get by several magnitudes.
  • For common crashes we had to write manual bug patterns to avoid getting even more duplicates.

So with the just released Apport 1.90 we introduce client-side duplicate checking. So from now, when you report a crash, you are likely to see “We already know about this” right away, without having to upload or type anything, and you will get directed to the bug page. You should mark yourself as affected and/or subscribe to the bug, both to get a notification when it gets fixed, and also to properly raise the “hotness” of the bug to bubble up to developer attention.

For the technically interested, this is how we detect duplicates for the “signal” crashes like SIGSEGV (as opposed to e. g. Python crashes, where we always have a fully symbolic stack trace):
As we cannot rely on symbolic stack traces, and do not want to force every user to download tons of debug symbols, Apport now falls back to generating a “crash address signature” which combines the absolute addresses of the (non-symbolic) stack trace and the /proc/pid/maps mapping to a stack of libraries and the relative offsets within those, which is stable under ASLR for a given set of dependency versions. As the offsets are specific to the architecture, we form the signature as combination of the executable name, the signal number, the architecture, and the offset list. For example, the i386 signature of bug looks like this:

/usr/bin/rhythmbox:11:i686:/usr/lib/libgstpbutils-0.10.so.0.24.0+c284:/usr/lib/i386-linux-gnu/libgobject-2.0.so.0.3000.0+3337a:/usr/lib/i386-linux-gnu/libgobject-2.0.so.0.3000.0+8e0

As library dependencies can change, we have more than one architecture, and the faulty function can be called from different entry points, there can be many address signatures for a bug, so the database maintains an N:1 mapping. In its current form the signatures are taken as-is, which is much more strict than it needs to be. Once this works in principle, we can refine the matching to also detect duplicates from different entry points by reducing the part that needs to match to the common prefix of several signatures which were proven to be a duplicate by the retracer (which gets a fully symbolic stack trace).

The retracer bots now exports the current duplicate/address signature database to http://people.canonical.com/~ubuntu-archive/apport-duplicates in an indexed text format from where Apport clients can quickly check whether a bug is known.

For the Launchpad crash database implementation we actually check if the bug is readable by the reporter, i. e. it is private and the reporter is in a subscribed team, or the bug is public; if not, we let him report the bug anyway and duplicate it later through the existing server-side retracer, so that the reporter has a chance of getting subscribed to the bug. We also let the bug be filed if the currently existing symbolic stack trace is bad (tagged as apport-failed-retrace) or if a developer wants a new symbolic stack trace with the current libraries (tagged as apport-request-retrace).

As this is a major new feature, I decided that it’s time to call this Apport 2.0. This is the first public beta towards it, thus called 1.90. With Apport’s test driven and agile development the version numbers do not mean much anyway (the retracer bots in the data center always just run trunk, for example), so this is as good time as any to reset the rather large “.26″ minor version that we are at right now.

Tags: , , , , ,

12.04: Testing FTW

I arrived back home in Augsburg, from last week’s Ubuntu Developer Summit in Orlando, FL. As this is a quality/LTS cycle, we pretty much already knew in advance what to do (bug fixing, bug fixing, some boot speed, and did I mention bug fixing?), but still we had many highly interesting and exciting sessions this time, not so much about what we are going to do, but how we are going to build 12.04.

So far our common practice has been to toss everything new into the development release until Feature Freeze and then try and clean up most of the fallout. Me and many other developers have always cried for having more time for fixing long-standing bugs and not introducing breakage in the first place. It seems that now with 12.04, Ubuntu/Canonical are actually getting serious about it.

(Any resemblance to that postcard from the Kennedy Space Center which I went to last Sunday is of course absolutely unintended and purely coincidental :-) ).

The mission statement is now to have working ISOs, stable → development, and daily intra-development upgrades every day, quick and regular cleanup of uninstallable packages, component-mismatches, NBS etc., backed by a new “stable +1″ team backed by three people on a rotational shift.

QA team is now setting up daily automatic smoketesting of the installer and other packages which have tests. For the latter we’ll convert some packages to the DEP-8, the proposed format for running autopkgtest on (I’ll do udisks, postgresql-common, pygobject, apport, and jockey soon).

We’ll try do put uploads which might break something (like new libraries) to a staging area first, against which we can run test suites of reverse dependencies before it lands in the new release. As doing this on a large scale still requires infrastructure to be created, we’ll only exercise it for a few packages by uploading to precise-proposed first, but this has a high potential for extension.

We want to commit to fixing major breakage within 3 hours of development time, or otherwise revert the faulty package to the previous version (unless that aggravates problems, such as file conflicts).

Finally, for Canonical upstreams we are introducing “acceptance criteria”, which will hopefully significantly raise the quality and lower the regressions of each Unity etc. release.

So, the mission is clear. In practice we’ll probably have to make some real-life concessions, and Murphy’s law dictates that there still will be some breakage, but we can learn from that as we go.

Let’s build 12.04 LTS!

Tags: , , , ,

Riding the Pangolin

Just took the plunge, using the excellent bandwidth and local mirror at UDS:

$ lsb_release -irc
Distributor ID: Ubuntu
Release: 12.04
Codename: precise

Nothing blew up in my face, so it seems today is a good day to die^Wupgrade.

Tags: , , ,