Part of our efforts to reduce power consumption is to identify processes which keep waking up the disk even when the computer is idle. This already resulted in a few bug reports (and some fixes, too), but we only really just began with this.
Unfortunately there is no really good tool to trace file access events system-wide. powertop claims to, but its output is both very incomplete, and also wrong (e. g. it claims that read accesses are writes). strace gives you everything you do and don’t want to know about what’s going on, but is per-process, and attaching strace to all running and new processes is cumbersome. blktrace is system-wide, but operates at a way too low level for this task: its output has nothing to do any more with files or even inodes, just raw block numbers which are impossible to convert back to an inode and file path.
So I created a little tool called fatrace (“file access trace”, not “fat race”
) which uses fanotify, a couple of /proc lookups and some glue to provide this. By default it monitors the whole system, i. e. all mounts (except the virtual ones like /proc, tmpfs, etc.), but you can also tell it to just consider the mount of the current directory. You can write the log into a file (stdout by default), and run it for a specified number of seconds. Optional time stamps and PID filters are also provided.
$ sudo fatrace rsyslogd(967): W /var/log/auth.log notify-osd(2264): O /usr/share/pixmaps/weechat.xpm compiz(2001): R device 8:2 inode 658203 [...]
It shows the process name and pid, the event type (Rread, Write, Open, or Close), and the path. Sometimes its’ not possible to determine a path (usually because it’s a temporary file which already got deleted, and I suspect mmaps as well), in that case it shows the device and inode number; such programs then need closer inspection with strace.
If you run this in gnome-terminal, there is an annoying feedback loop, as gnome-terminal causes a disk access with each output line, which then causes another output line, ad infinitum. To fix this, you can either redirect output to a file (-o /tmp/trace) or ignore the PID of gnome-terminal (-p `pidof gnome-terminal`).
So to investigate which programs are keeping your disk spinning, run something like
$ sudo fatrace -o /tmp/trace -s 60
and then do nothing until it finishes.
My next task will be to write an integration program which calls fatrace and powertop, and creates a nice little report out of that raw data, sorted by number of accesses and process name, and all that. But it might already help some folks as it is right now.
The code lives in bzr branch lp:fatrace (web view), you can just run make and sudo ./fatrace. I also uploaded a package to Ubuntu Precise, but it still needs to go through the NEW queue. I also made a 0.1 release, so you can just grab the release tarball if you prefer. Have a look at the manpage and --help, it should be pretty self-explanatory.
#1 by Jon on 2012/02/07 - 20:14
Zitieren
I wonder why g-t reads or writes on every command execution. I tried plowing through the strace output but couldn’t figure it out before my attention span waned. That seems *wrong* to me, though.
#2 by Anonymous on 2012/02/07 - 20:40
Zitieren
Note that perf and systemtap can both trace filesystem calls and other syscalls systemwide, and perf lives in the upstream kernel tree, making it pretty straightforward to make use of.
#3 by Mikkel Kamstrup Erlandsen on 2012/02/07 - 21:02
Zitieren
Hey I love this Martin!
Unfortunately it needs to be rejected as there are no unit test…
Wondering why you don’t readlink() on /proc/self/fd/$fd to get the pathnames instead of doing the fstat() dance? That might also solve the races you are seeing..?
#4 by Roger on 2012/02/07 - 21:32
Zitieren
What’s wrong with using lsof?
#5 by pitti on 2012/02/07 - 21:37
Zitieren
@Roger: How do you use lsof to achieve this?
#6 by Michael on 2012/02/07 - 21:38
Zitieren
Awesome, thanks for writing this!
I usually use atop for that purpose, but it will only show you which process reads/writes, not precisely which file.
#7 by pitti on 2012/02/07 - 21:38
Zitieren
@Mikkel: I do use readlink(/proc/self/fd/$fd) to figure out the file name, but as I don’t know _which_ fd, I need to iterate through them all and compare their inodes against the inode I got from fsnotify.
#8 by pitti on 2012/02/07 - 21:42
Zitieren
@Anonymous: systemtap is hard to use on Ubuntu as we don’t have a compatible kernel at the moment. I am not aware that perf could do this, its documentation (I looked at https://perf.wiki.kernel.org/articles/t/u/t/Tutorial.html) doesn’t seem to indicate this. But if it does, I’m happy to use this one instead.
Thanks for the pointer!
#9 by Michael on 2012/02/07 - 21:48
Zitieren
Hi Martin,
interesting tool. Just wanted to mention that you probably want “bzr branch lp:fatrace” and not “bzr branch lp:fanotify”
#10 by pitti on 2012/02/07 - 21:53
Zitieren
Thanks for spotting! Fixed the post.
#11 by Anonymous on 2012/02/07 - 22:10
Zitieren
@Martin: Take a look at perf’s -a option for systemwide profiling, and the rw-by-pid trace script. Various other stuff exists as well for other variations on that theme.
#12 by Peter on 2012/02/08 - 00:18
Zitieren
My system seems to trigger some bug in fatrace. It runs for a couple of seconds, and then exits with a “read: Value too large for defined data type” error message.
I placed a couple of log printfs to catch the source, and it is the read() in the “read all events in a loop” loop. The value of errno seems to be EOVERFLOW, but I have no clue what is causing it. Increasing the buffer size to 65536 didn’t help.
#13 by foo on 2012/02/08 - 03:56
Zitieren
How come you didn’t upload fatrace to Debian? You seem to be a Debian member so there should be no problem for you to do that?
#14 by Frank Ch. Eigler on 2012/02/08 - 04:23
Zitieren
“systemtap is hard to use on Ubuntu as we don’t have a compatible kernel at the moment”
Why do you think so? Monitoring file accesses is pure kernel-space probing, which has worked fine on Ubuntu and elsewhere.
#15 by pitti on 2012/02/08 - 06:33
Zitieren
Can upload to Debian once it proved to work and is not redundant. Other posts suggest that perf can do the same, I’ll try that today.
#16 by Iñigo on 2012/02/08 - 10:45
Zitieren
# strace -q -f -e open,close,stat -p $(pgrep -d ‘ -p’ ‘.*’) 2>&1 | grep -E ‘(open|close|stat)’
(press Ctrl+C like in a videogame to stop it)
Greetings from panet Debian, if the package is not in Debian, please tag your posts in such feed as [ubuntu only] in the title.
#17 by Iñigo on 2012/02/08 - 10:48
Zitieren
Clarification note: I’m not talking in name of th Debian project, I’m just a person that reads th feed, and at a side of my poor English, I try to be educated in my request.
#18 by pitti on 2012/02/08 - 11:09
Zitieren
That still won’t catch new processes (temporary callouts, for example), and also will report a lot of events which do not cause actual disk actitivy (/dev/shm/, sockets, etc.).
#19 by Iñigo on 2012/02/08 - 12:55
Zitieren
Hello again,
Of course my oneliner example is a poor man’s solution and can be improved and extended, (and can too write to a log file, review more calls like lstat, unlink, chmod, etc)
Hey, it’s just 2 piped commands, not a program
Other cheap solution to the topic issue, maybe to use:
echo 1 > /proc/sys/vm/block_dump
tail -f /var/log/syslog
echo 0 > /proc/sys/vm/block_dump
Again: It’s not “the same” and of course not comparable to a specific and targeted program.
Greets.
#20 by Iñigo on 2012/02/08 - 13:00
Zitieren
And about my comment on “feeds” and “topics”, there is:
http://planet.debian.org/deriv/
Every thing fits in the universe, human beings too.
o/
#21 by htorque on 2012/02/08 - 13:20
Zitieren
I don’t see any close events in the log? It ran for seven hours on an idle system, produced ~120k lines, 60% of the entries were caused by update.mlocate and update-apt-xapian-index, 60% of the 22k writes came from nm-applet. Will repeat it this night on battery power, too see if it makes any difference.
#22 by Marius Gedminas on 2012/02/08 - 15:53
Zitieren
I’m curious about the powertop issue. Simple bug or is it noticing writes caused by atime updates?
#23 by oliver on 2012/02/08 - 18:53
Zitieren
jon: AFAIK KDEs Konsole program keeps the scrollback buffer in a (hidden) file in /tmp/; maybe gnome-terminal does the same?
#24 by Ryan on 2012/02/08 - 23:56
Zitieren
Nice! It reminds me of opensnoop for dtrace. This looks like a useful tool.
#25 by pitti on 2012/02/09 - 08:10
Zitieren
@htorque: Was missing FAN_CLOSE in fanotify_mark(), fixed in trunk. Thanks!
#26 by htorque on 2012/02/09 - 12:23
Zitieren
Thanks!
I ran it tonight on battery for a couple of hours, and the writes I saw were: nmbd: 5/h, smbd: 5-6/h, rsyslogd: 2/h, upowerd: 6/h. Unfortunately, most of the time they didn’t overlap.
The rest (without update.mlocate and update-apt-xapian-index not a lot) pretty much was just a set of repeated reads (cached, I assume?).
#27 by pitti on 2012/02/16 - 07:55
Zitieren
Do you have any guidance how to use this? All the google hits about perf on LWN are wildly out of date (e. g. perf trace does not exist any more), and manpages and documentation are not very helpful since they describe commands that don’t work.
I recorded some data with “sudo perf record -a”, got a perf.data file. A simple “perf report -i perf.data” shows me some stuff, but no file access. But none of these work:
perf script report /usr/src/linux-headers-3.2.0-16/tools/perf/scripts/perl/rw-by-pid.pl
perf script report rw-by-pid.pl
perf script report rw-by-pid
They all say “Please specify a valid report script(see ‘perf script -l’ for listing)”, but “perf script -l” does not work, it just shows the help again. strace shows that it doesn’t even make an attempt to search for the file.
Thanks in advance!
#28 by pitti on 2012/02/16 - 08:24
Zitieren
Oh, I see what you mean with /proc/self/fd/$fd now. That indeed works much better and is much easier, I just committed that to trunk. Will release 0.2 today. Thanks for pointing out!
Pingback: New fatrace released, Debian package coming « Martin Pitt
#29 by james on 2013/04/12 - 08:22
Zitieren
This would not work with NFS client file access, correct? As fsnotify only works for local files.
#30 by pitti on 2013/04/19 - 06:07
Zitieren
Right, that does not go through the kernel’s VFS and hence not through fanotify. I’m less sure about inotify support, though. Also, the NFS client knows about file accesses already, so maybe that information can be retrieved from that.
#31 by Teresa e Junior on 2013/05/06 - 02:32
Zitieren
Thanks so much for this tool. Since it got into Debian, I’ve rewritten a lot of scripts and config files to prevent constant wakeups (eg. sleep loops, syslog, cache writes, mail checks…), and the difference is noticeable!
Good news on power savings also this morning from Ingo Molnar on http://lkml.indiana.edu/hypermail/linux/kernel/1305.0/01831.html
To the first poster @Jon, any terminal based on libvte and also Konsole writes to /tmp for every single line of output. The only way to achieve sanity and safety is either running an ugly terminal emulator or running tmux inside your favorite terminal, since it creates an internal buffer.