A File System Change Logger
© Amit Singh. All Rights Reserved. Written in May 2005
Background
Spotlight was a highly anticipated feature of Mac OS X 10.4 "Tiger". From a technical standpoint, you could think of Spotlight as roughly encompassing the following:
- An in-kernel notification mechanism that can inform user-space "subscribers" of file system changes as they happen.
- A database of various types of file-related information, in particular, of harvested metadata.
- A suite of programs for receiving change notifications, harvesting metadata, updating the database, and so on.
- The end-user and programmer interfaces.
This discussion involves accessing and demonstrating the lowest-level building block, the change notification mechanism.
Although this is a wild guess, and I could be entirely wrong, the Spotlight project might have been an 18-month to 2-year project from conception to release.
I was once (from mid-2003 to 2004) involved in an endeavor conceptually very similar to Spotlight, though less consequential, the details of which I am unable to discuss. Consequently, I have been very interested in Spotlight ever since I first heard about it. Even though it may be easy to explain or understand what Spotlight is, implementing the technology, especially in a commercial end-user system, is a monumental task in my opinion. I think Apple engineers have done a very admirable job with Spotlight, making pragmatic choices that work well.
fslogger
fslogger is a user-space program that subscribes to the same file system event notification mechanism as the Spotlight metadata server.
Note that fslogger does not use the Spotlight APIs. It uses the mechanism underlying Spotlight.
fslogger's mode of operation is very simple. The following points are particularly noteworthy:
- It must be run as root, for example, via the sudo command. This is because subscribing to the kernel's file system change notification service requires super-user privileges.
- It takes no useful arguments. You can, however, pass in arbitrary arguments that will result in an informational message.
- It does not interfere with the working of Spotlight, since the kernel supports multiple such subscribers.
- It will not run on any Mac OS X version older than 10.4.
Once active, fslogger will wait for change notifications to arrive from the file system layer in the kernel. The various file system operations that are communicated to fslogger (and other subscribers such as Spotlight, specifically the metadata server) include:
- File creation
- Folder creation
- File or folder deletion
- Changes to the stat structure (for example, a permission change)
- Renaming of a file or a folder
- Content modification
- Content exchange between two files
- Finder information changes
- Change of ownership
The "exchange" operation on HFS Plus is used to exchange fork data of two files by simply swapping certain information in the Catalog file, thus preserving the file ID when updating an existing file.
fslogger receives and displays the aforementioned events practically instantly, courtesy of the kernel's support. An event notification contains details of the file system object(s) on which the event happened. fslogger processes this information, enhancing it marginally (for example, by determining human-friendly names corresponding to process, user, and group identifiers). The information displayed by fslogger for a typical file system event includes:
- The type of event (such as file creation or deletion)
- The process ID of the process responsible for the event, and if possible, the program name. Note that if the process has already exited, the corresponding program name may no longer be available. fslogger displays the word "exited?" in such cases.
- The path name of the file system object.
- The device number of the volume containing the file system object.
- The "inode number" of the file system object.
- The mode of the file system object, including its vnode type (for example, VREG for a regular file, VDIR for a folder, and so on).
- The user and group identifiers owning the file system object.
In its current version, fslogger uses a relatively small queue for holding change notifications. Under heavy file system activity, the queue may become full, and the kernel may have to drop an event, an action which itself is an event, and is reported as such by fslogger.
Microsoft Windows
It must be noted that certain versions of Microsoft Windows already provide analogous features, with some differing aspects. The NTFS Change Journal provides a persistent file system change log. When file system objects are added, deleted, or modified, the change is recorded in a per-volume journal.
The Change Journal is useful for file system replicators and indexers, incremental backup applications, virus and security-related scanners, and so on. In particular, Microsoft uses this mechanism for implementing the Indexing Service feature of Windows XP Professional.
Windows provides another related feature, the ReadDirectoryChangesW function, which retrieves directory-specific change information.
A sample output from fslogger is shown below.
$ sudo ./fslogger
fslogger ready
=> received 164 bytes
# Event
type = CREATE FILE
pid = 286 (zsh)
# Details
# type len data
VNODE 28 path = /Users/amit/Desktop/foo.txt
DEVICE 4 dev = 0xe000002 (major 14, minor 2)
INODE 4 ino = 808517
MODE 4 mode = -rw-r--r-- (0x0081a4, vnode type VREG)
UID 4 uid = 501 (amit)
GID 4 gid = 501 (amit)
DONE (0xb33f)
=> received 84 bytes
# Event
type = CONTENT MODIFIED
pid = 79 (Finder)
# Details
# type len data
VNODE 30 path = /Users/amit/Desktop/.DS_Store
DEVICE 4 dev = 0xe000002 (major 14, minor 2)
INODE 4 ino = 235865
MODE 4 mode = -rw------- (0x008180, vnode type VREG)
UID 4 uid = 501 (amit)
GID 4 gid = 501 (amit)
DONE (0xb33f)
=> received 88 bytes
# Event
type = CONTENT MODIFIED
pid = 3571 (mdimport)
# Details
# type len data
VNODE 34 path = /private/tmp/objc_sharing_ppc_501
DEVICE 4 dev = 0xe000002 (major 14, minor 2)
INODE 4 ino = 801404
MODE 4 mode = -rw------- (0x008180, vnode type VREG)
UID 4 uid = 501 (amit)
GID 4 gid = 0 (wheel)
DONE (0xb33f)
...
Uses
Apple's existing use of this notification support in Spotlight is a good demonstration of its utility and power. fslogger aims to further demonstrate the flexibility of this feature. Consider some scenarios in which fslogger (or more appropriately, a custom program that uses this feature) could be useful. Note that I am merely thinking aloud here, and these ideas may vary greatly in feasibility or utility.
- Monitoring and analyzing file system activity, perhaps for the purpose of better understanding the functioning of a running system, if nothing else. In general, watching "almost exactly what's changing" on your volume can be an interesting experience in itself. For example, one can wonder on academic grounds about the number of file system objects that change in a given time interval during typical use.
- Tracking what file system objects are being added, removed, or modified during installation of some software.
- Implementing your own file "management" schemes, perhaps for experimental purposes. Thus, you could write your own version of Spotlight, although it will be rather hard to justify in light of what Apple provides!
- Implementing your own arbitrarily complex (or arbitrarily "meta") file system event handlers. In short, do "something" when "something" happens in a file system.
- Implementing an implicit or automatic file versioning system.
- Maintaining a "work log" that attempts to determine how much you "work" based on the number of file system objects, or perhaps even the amount of file data, that you change during a given time. An overwhelming amount of web-browsing-related activity could be a suitable indicator of, well, something!
- Implementing custom logging schemes, such as ones that could tell you "what you were working on" on a given date.
- Using this feature along with others such as Automator or Dashboard to create something more useful.
Caveat
The interface that fslogger uses is private to Apple. Currently, there is a caveat regarding the use of this interface by third parties (including fslogger). While the change notification interface supports multiple clients, there is a single kernel buffer for holding events that are to be delivered to one or more subscribers, with the primary subscriber being Spotlight. Now, the kernel must hold events until it has notified all subscribers that are interested in them. Since there is a single buffer, a slow subscriber can cause it to overflow. If this happens, events will be dropped — for all subscribers, including Spotlight. Consequently, Spotlight may need to look at the entire volume to determine "what changed".
fslogger is meant to be a learning tool. If you use it, you must understand the aforementioned caveat. If you cause heavy enough file system activity (what's "heavy" will vary greatly, depending on your system and its currently available resources), both fslogger and Spotlight may miss events, causing Spotlight to spend some extra time looking at your volume. Note that Spotlight will not reindex the entire volume — it will only look for the changes that it missed.
A typical example of heavy file system activity (one that may quite possibly cause events to be dropped) is unpacking a giant tarball. Finally, if events are missed, fslogger will report this, since dropping events is itself an event.
Download
FSLogger-1.1.dmg (only for Mac OS X 10.4.x "Tiger")
FSLogger-2.1.dmg (only for Mac OS X 10.5.x "Leopard")
FSLogger Source Code