A Note on Pathname Processing in HFSDebug

A couple of weeks ago, I released HFSDebug 4. I’ve since updated it to make its pathname processing a little more sophisticated. Depending on how (and how seriously) you use HFSDebug, knowing the details could be useful.

When you specify a file system object to HFSDebug using a pathname, how HFSDebug treats the pathname usually depends on which other arguments, if any, you supply.

Something to Read and Forget: The “Legacy” Mode

A typical invocation is quite simple: you give HFSDebug a path. If the path exists on an HFS+ volume, HFSDebug will operate on the underlying volume.

$ sudo hfsdebug /mach_kernel
...

In this case, HFSDebug will begin by doing a stat(2) call on the path. (As you will see shortly, this step, along with the entire “legacy mode,” is optional: you can make HFSDebug do even this part “from scratch” without using stat(2) or other file system calls on the volume.) Since the goal is to make it possible to examine the file system’s internal structure as opposed to what the user “sees” through layers of interfaces, it matters whether the object in question is a file hard link, a directory hard link, a symbolic link, etc. Specifically, we need to get at the link reference that the pathname represents, not the target it resolves to. For symbolic links, we can use lstat(2) to give us the node ID of the reference. For file and directory hard links, we have to use something else.

How do we know if something is a hard link? HFSDebug examines the st_nlink (link count) and st_mode fields of the resultant stat structure.

In the case of regular files, a link count of 2 or higher means the object is a “known” hard link.
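The regular-file check can be sketched in a few lines of Python. This is only an illustration of the st_nlink/st_mode test described above, not HFSDebug’s actual code; the function name is_known_file_hard_link is my own invention for this example.

```python
import os
import stat

def is_known_file_hard_link(path):
    """Return True if path is a regular file with a link count of 2 or higher."""
    # lstat() examines the object itself, so a symbolic link is not followed.
    st = os.lstat(path)
    return stat.S_ISREG(st.st_mode) and st.st_nlink >= 2
```

Note that a symbolic link never passes this test, because its st_mode says it is a link and not a regular file.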

In the case of directories, what “link count” means is quite context-sensitive on HFS+. The stat structure’s st_nlink for a directory normally represents the directory’s item count. The directory itself (the "." entry) and its parent (the ".." entry) together add 2 to the count. Thus, if you have a directory with, say, 4 files and 4 subdirectories in it, a stat(2) call would report an st_nlink value of 10. However, if the folder count bit is enabled on the mount, the meaning of st_nlink changes: it then represents a count of only the subdirectories. In the aforementioned example, the link count would be 6 instead of 10. The folder count bit is currently only enabled for case-sensitive (HFSX) volumes. Besides the stat structure’s st_nlink field, HFS+ can separately provide the children count and the real hard link count for directories. The getattrlist(2) call can be used to retrieve these: they are the ATTR_DIR_ENTRYCOUNT and ATTR_DIR_LINKCOUNT directory attributes, respectively. Once we do know a directory’s “real” hard link count, again, a value of 2 or higher means it is a “known” hard link.
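The st_nlink arithmetic for directories is easy to get wrong, so here it is as a small Python sketch. It only models the two cases described above and does not query a real volume; the function name and its parameters are hypothetical.

```python
def expected_dir_nlink(files, subdirs, folder_count_enabled=False):
    """Model the st_nlink value reported for a directory, per the description above."""
    # The "." and ".." entries always contribute 2.
    if folder_count_enabled:
        # With the folder count bit set, only subdirectories are counted.
        return subdirs + 2
    # Otherwise every item in the directory is counted.
    return files + subdirs + 2

print(expected_dir_nlink(4, 4))                             # 10
print(expected_dir_nlink(4, 4, folder_count_enabled=True))  # 6
```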

By a “known” file or directory hard link, I mean that we know that it currently is a hard link. That means the object we are looking at is a link reference and the “visible” node ID isn’t that of the reference but is that of its target. (See my previous post for more information.) In this case, HFSDebug will retrieve the object’s parent folder’s node ID and use it in conjunction with the object’s name to do a “from scratch” lookup of the object’s node ID. (This is a traditional { parent_nodeid, name } ==> nodeid lookup.)
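To make the { parent_nodeid, name } ==> nodeid lookup concrete, here is a toy Python model in which a dictionary stands in for the on-disk catalog. The node IDs (other than the root folder’s ID of 2), the names, and the function lookup_node_id are all made up for illustration, and the sketch ignores details such as HFS+ name comparison rules.

```python
# Hypothetical catalog contents; a real volume keeps these records in a B-tree.
ROOT_FOLDER_ID = 2  # HFS+ reserves node ID 2 for the root folder

CATALOG = {
    (ROOT_FOLDER_ID, "tmp"): 16,
    (16, "report.txt"): 4711,
}

def lookup_node_id(parent_node_id, name):
    """Return the node ID stored for (parent_node_id, name), or None if absent."""
    return CATALOG.get((parent_node_id, name))
```

Looking up (16, "report.txt") in this model yields 4711, which is the ID of the on-disk entry itself. For a hard link, that is the link reference rather than the target that stat(2) would report.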

However, things are no simpler when the object’s link count is 1. That’s because it could have had a higher link count in the past, with the other links since deleted. As I described earlier, the on-disk object then continues to be a reference to the real content, which lives in a special hidden HFS+ folder. If that’s the case, the hidden folder will contain a file or a folder whose name is formed from the object’s visible node ID: iNode%d or dir_%d for files and folders, respectively. HFSDebug can check for this from scratch; if no such file or folder exists, the object is neither a current nor a past hard link.
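The names to look for in the hidden folder follow directly from the two format strings above. A minimal Python sketch, with a function name of my own choosing:

```python
def link_target_name(visible_node_id, is_directory):
    """Build the hidden-folder entry name that would hold a hard link's content."""
    fmt = "dir_%d" if is_directory else "iNode%d"
    return fmt % visible_node_id

print(link_target_name(1234, is_directory=False))  # iNode1234
print(link_target_name(1234, is_directory=True))   # dir_1234
```

If neither name exists in the hidden folder for the given node ID, the object has never been a hard link.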

If this sounds unnecessarily complex, well, some of it is. Until version 4, HFSDebug did not have the ability to process complete pathnames from scratch. With version 4, you can do things like the following on mounted and unmounted volumes alike.

$ sudo hfsdebug -d /dev/disk0s2 /mach_kernel
...

Simpler, Better

In the new mode, HFSDebug will no longer do a stat(2)/lstat(2) or otherwise involve the file system. Obviously, if it has to support unmounted volumes, that’s how it has to be. It will take the pathname and process it component by component, which is easier to conceptualize than the “legacy” mode I described above. (Well, to be fair to the legacy mode, it became uglier with the advent of directory hard links.)

The following are examples of how HFSDebug will handle things based on some of its arguments.

$ sudo hfsdebug /foo/bar/baz
# legacy mode; will use stat(2)/lstat(2) to kick things off
...
$ sudo hfsdebug -d /dev/diskN /foo/bar/baz
# new mode; volume can be mounted or unmounted
...
$ sudo hfsdebug -P /foo/bar/baz
# new mode; uses root volume, which is obviously mounted
...
$ sudo hfsdebug -d /dev/diskN -P /foo/bar/baz
# new mode; volume can be mounted or unmounted

If you are wondering why I haven’t removed the legacy mode altogether, it’s because I want to keep it around for some time so that I can compare things while testing.

Just remember that you can use the -P argument (note that it’s the capital P) to specify a path and HFSDebug will use the new mode on both mounted and unmounted volumes. The path must be absolute.

Some Rules

Now, if HFSDebug is not involving the file system at all, it had better be able to handle arbitrarily convoluted pathnames. Using things like realpath(3) is not an option since realpath(3) would want to call stat(2)/lstat(2). Consider a path like the following.

/foo////././bar///../baz/../blah/..//////.././dir/are/you/../crazy/
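For reference, this is what a purely lexical cleanup of that path looks like. I am using Python’s posixpath.normpath here only as a stand-in for “canonicalizing the dots and the slashes”; it never looks at a volume and it is not what HFSDebug does.

```python
import posixpath

crazy = "/foo////././bar///../baz/../blah/..//////.././dir/are/you/../crazy/"

# normpath() works on the string alone: it drops "." and empty components
# and cancels each ".." against the component before it.
print(posixpath.normpath(crazy))  # /dir/are/crazy
```

The result is only trustworthy if every component that a “..” backs out of really is a directory, which a string-only cleanup cannot know.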

Besides, there could be components in the path that are symbolic or hard links. Symbolic links could point to targets that have equally crazy pathnames. It’s not just a matter of canonicalizing the dots and the slashes. We have several requirements as illustrated by the following examples. (Some of these are simply HFSDebug conventions.)

  • We must ensure that all intermediate components resolve to directories. They can be actual directories, valid symbolic links to directories, or directory hard links. Remember that in the case of symbolic or hard links, the on-disk object will be a “file”—HFSDebug will need to resolve them from scratch too.
  • If there is a “..”, we must not blindly go “up” one level: we must ensure that what we are going back from is a directory. realpath(3) actually doesn’t care about this: it will canonicalize /path/to/file.txt/../file.txt to /path/to/file.txt.
  • Although we do want to resolve intermediate components that are links, we must not resolve the terminal component if it happens to be a hard link or a symbolic link. That’s because our goal is to look at what’s on disk for the given path. Besides, in the case of a link, the details shown by HFSDebug will include the full pathname to the link’s target. If we wanted further details on the target, we could run HFSDebug on it.
  • If the path has a terminal slash, HFSDebug will ensure that the component is a directory. If it happens to be a directory hard link or a symbolic link, HFSDebug will resolve it in this case. Consider an example: suppose there is a symbolic link /tmp/somesymlink. The following is what HFSDebug will do depending on the arguments and what the link points to.

# somesymlink points to a file
#
$ sudo hfsdebug -P /tmp/somesymlink
... # will show details of the link itself
$ sudo hfsdebug -P /tmp/somesymlink/
... # will complain that the link target is not a directory

# somesymlink points to a directory
#
$ sudo hfsdebug -P /tmp/somesymlink
... # will show details of the link itself
$ sudo hfsdebug -P /tmp/somesymlink/
... # will show details of the directory somesymlink points to

# somesymlink points to a non-existent target
#
$ sudo hfsdebug -P /tmp/somesymlink
... # will show details of the link itself
$ sudo hfsdebug -P /tmp/somesymlink/
... # will complain that path /tmp/somesymlink/ was not found on the volume
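The rules above can be sketched as a component-by-component walk over a toy in-memory volume. This is a simplified Python model under my own assumptions, not HFSDebug’s implementation: the node layout, the node IDs (apart from the root folder’s ID of 2), the names make_volume, resolve, and PathError, and the symbolic link depth limit are all hypothetical, and hard links are left out entirely.

```python
ROOT = 2        # HFS+ root folder ID; every other ID below is made up
MAX_LINKS = 8   # arbitrary limit to stop symbolic link loops

class PathError(Exception):
    pass

def make_volume():
    """Build a tiny hypothetical volume containing /tmp and a few objects."""
    def d(parent, **children):
        return {"kind": "dir", "parent": parent, "children": children}
    return {
        ROOT: d(ROOT, tmp=16),
        16: d(ROOT, afile=17, adir=18, tofile=19, todir=20, dangling=21),
        17: {"kind": "file"},
        18: d(16),
        19: {"kind": "symlink", "target": "afile"},
        20: {"kind": "symlink", "target": "/tmp/adir"},
        21: {"kind": "symlink", "target": "/tmp/missing"},
    }

def resolve(nodes, path, start=ROOT, follow_last=False, depth=0):
    """Return the node ID for path; relative paths begin at directory start."""
    if depth > MAX_LINKS:
        raise PathError("too many levels of symbolic links")
    cur = ROOT if path.startswith("/") else start  # cur is always a directory
    names = [n for n in path.split("/") if n]      # collapses repeated slashes
    want_dir = follow_last or path.endswith("/")
    for i, name in enumerate(names):
        last = (i == len(names) - 1)
        if name == ".":
            continue
        if name == "..":
            # Safe: cur is a directory, so we never back out of a file.
            cur = nodes[cur]["parent"]
            continue
        child = nodes[cur]["children"].get(name)
        if child is None:
            raise PathError("not found: " + path)
        if last and not want_dir:
            return child  # a terminal link is reported, not followed
        node = nodes[child]
        if node["kind"] == "symlink":
            # Intermediate links, or a terminal link followed by a slash,
            # must lead to a directory.
            child = resolve(nodes, node["target"], cur, True, depth + 1)
            node = nodes[child]
        if node["kind"] != "dir":
            raise PathError("not a directory: " + name)
        cur = child
    return cur
```

With this model, resolve(make_volume(), "/tmp/todir") returns the ID of the link itself, while "/tmp/todir/" returns the ID of the directory it points to. Both "/tmp/tofile/" and "/tmp/dangling/" raise PathError, and so does "/tmp/afile/../afile", because the walk refuses to go “up” from something that is not a directory.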


All contents of this site, unless otherwise noted, are ©1994-2014 Amit Singh. All Rights Reserved.