[Humbledown highlights] Build an inode Lookup Database using SQLite3

Originally published by myself on humbledown.org at Sun Aug 28 19:25:47 NZST 2011
and since recovered to this location. It has not been tested since its original publication.

Occassionally, such as when investigating NFS traffic, you need to determine which filename(s) correspond to a given inode, or set of inodes. Now, you could answer this question using a command such as find /someplace -inum 12345 -print, but that will generally be quite slow, and very cumbersome if you wanted to lookup multiple inodes.

I decided to do this a different way, by creating a database to map between inode and path. Furthermore, I decided to do this using sqlite3, because a) it ought to be faster than grepping through a file looking for a particular inode, b) sqlite3 is pretty useful and I thought this would be a good excuse to practice using it, c) it was easy to do so.
One important thing to note: if you model a table using an inode as a primary key (which must by definition be unique), then you will run into problems. This is because although inodes are unique in a filesystem, the paths that correspond to an inode are not, so you might have multiple rows with the same inode field. So, although we would want the 'inode' field indexed, it is not a primary key. SQLite3 doesn't care though, as each row has an internal row number.

To generate the database (which is held in one file; SQLite3 is very convenient like that), we can do the following:

$ sqlite3 inodes.sqlite3db 'CREATE TABLE inodes (inode INTEGER, path STRING)'

Done. Now we just need to populate it with data. We can use the find command to help with this. Since inodes are unique to a filesystem, I'm using the -xdev option to prevent crossing over into other filesystems. I'm using the vertical bar to separate the fields in the output, as this is the default for sqlite3's .import function, although it can be changed.

$ find / -xdev -printf "%i|%p\n" | head
2|/
261121|/boot
261131|/boot/vmlinuz-2.6.32-27-generic
261154|/boot/initrd.img-2.6.32-29-generic
261153|/boot/initrd.img-2.6.32-21-generic
261126|/boot/vmlinuz-2.6.31-20-generic
261140|/boot/System.map-2.6.32-21-generic
261179|/boot/abi-2.6.32-29-generic
261132|/boot/System.map-2.6.32-26-generic
261695|/boot/abi-2.6.32-32-generic

Okay, so that shows what the import data is like. We would write the whole thing to a file and then import that, but it makes more sence to pipe it directly into sqlite3. Note that .import can actually read from stdin, but you need to specify it as /dev/stdin, as it doesn't understand -. I'm using sudo to prevent a lot of potential Permission denied messages, although I still get one, which is the Gnome Virtual File System.

$ time sudo find / -xdev -printf "%i|%p\n" | sqlite3 inodes.sqlite3db '.import /dev/stdin inodes'
find: `/home/cameron/.gvfs': Permission denied

real    1m32.258s
user    0m3.708s
sys     0m11.265s

Verifying...

$ sqlite3 inodes.sqlite3db 'SELECT * FROM inodes LIMIT 5;'
2|/
261121|/boot
261131|/boot/vmlinuz-2.6.32-27-generic
261154|/boot/initrd.img-2.6.32-29-generic
261153|/boot/initrd.img-2.6.32-21-generic

Good, so the data has been imported, and we are ready to begin querying. Let's step back a little first: note that we have directories as well, as directories are just a special type of file that lists what is in the directory. We can use the debugfs command to output the contents of an inode (in this example, inode 2):

$ sudo debugfs -R 'cat <2>' /dev/sda5 | hexdump -C
debugfs 1.41.11 (14-Mar-2010)
00000000  02 00 00 00 0c 00 01 02  2e 00 00 00 02 00 00 00  |................|
00000010  0c 00 02 02 2e 2e 00 00  0b 00 00 00 14 00 0a 02  |................|
00000020  6c 6f 73 74 2b 66 6f 75  6e 64 00 00 10 00 00 00  |lost+found......|
00000030  14 00 0b 07 76 6d 6c 69  6e 75 7a 2e 6f 6c 64 00  |....vmlinuz.old.|
00000040  01 f4 0b 00 0c 00 03 02  6d 6e 74 00 01 fc 03 00  |........mnt.....|
00000050  0c 00 04 02 62 6f 6f 74  0f 00 00 00 18 00 0e 07  |....boot........|
00000060  69 6e 69 74 72 64 2e 69  6d 67 2e 6f 6c 64 00 00  |initrd.img.old..|
00000070  0e 00 00 00 10 00 05 07  63 64 72 6f 6d 00 00 00  |........cdrom...|
00000080  01 f6 09 00 0c 00 03 02  65 74 63 00 01 f0 0f 00  |........etc.....|
00000090  10 00 05 02 6d 65 64 69  61 00 00 00 04 f0 0f 00  |....media.......|
...

Anyway, back to querying. To query a single inode is pretty easy:

$ time sqlite3 inodes.sqlite3db 'SELECT * FROM inodes WHERE inode = 1044571;'
1044571|/root/.synaptic/lock

real    0m0.238s
user    0m0.136s
sys     0m0.092s

To query multiple random inodes is a bit more annoying:

$ time sqlite3 inodes.sqlite3db 'SELECT * FROM inodes WHERE (inode = 2419) or (inode = 1044571);'
1044571|/root/.synaptic/lock
2419|/var/lock

real    0m0.313s
user    0m0.108s
sys     0m0.184s

But we can overcome that nicely:

$ time echo -e '784412\n784499\n784499' \
>  | while read inode
>  do
>     echo "SELECT * FROM inodes WHERE inode = $inode;" 
>  done | sqlite3 inodes.sqlite3db
784412|/var/log/syslog.7.gz
784499|/var/log/user.log.3.gz
784499|/var/log/user.log.3.gz

real    0m0.688s
user    0m0.360s
sys     0m0.296s

So, it took about 1m30s to build the database, less than a second to query it multiple times. Compare that to using 'find ... -inum X -print' multiple times, each of which takes about 1m30s (ignoring any filesystem caching).

Limitations of this method include: must be periodically updated; is not real-time, so you might not have all short-lived files; an inode might not always map to something currently in existance (eg. open a file by name, delete the file [entry in directory], then when the file is actually closed, the file will be deleted: this is common for temporary files).

Using this method in conjunction with something like monitoring NFS traffic, you could use it to find out which files are being accessed. When watching/sampling NFS traffic, you will only determine a mapping between inode and pathname when a LOOKUP request is made, not when operating on a file. If you sample 200,000 NFS packets, you might not find any mappings, depending on the nature of the traffic. So this can be very useful to determine what is going on.

Comments

Popular posts from this blog

ORA-12170: TNS:Connect timeout — resolved

Getting MySQL server to run with SSL

From DNS Packet Capture to analysis in Kibana