-
Notifications
You must be signed in to change notification settings - Fork 0
TarExtendedAttributes
Storing Extended Attributes in Tar Files
Also see: TarNFS4ACLs and TarPosix1eACLs
Currently, I know of several different techniques for storing POSIX.1e-style extended attributes within tar files. Apple has extended GNU tar with support for a "resource file" stored as a separate entry. Joerg Schilling's _star_ and my own _libarchive_ library (used by _bsdtar_) use pax extensions for this purpose. Solaris and AIX added new tar entry types for this purpose.
Libarchive stores each POSIX.1e extended attribute as a separate pax attribute. The pax attributes are named LIBARCHIVE.xattr.name, where name is the name of the POSIX.1e attribute. The name includes a namespace identifier (so an attribute "foo" in the user namespace from a FreeBSD system would get stored as LIBARCHIVE.xattr.user.foo). Non-ASCII bytes in the name get encoded as % followed by two uppercase hex digits. The value of the attribute is base-64 encoded. In particular, note that the pax attribute name and value are both entirely ASCII and hence are correct UTF-8. This was done to promote simple implementation for pax readers and writers that use strict UTF-8 parsing for all headers.
This approach works reasonably well for small attributes but could be cumbersome for very large attributes. It also does not support extended metadata (such as per-EA ownership and permissions) supported on some systems.
Joerg Schilling's star uses a very similar approach. It stores each attribute under SCHILY.xattr._name_ as above. I haven't found any documentation for how it handles non-ASCII bytes in the name or value.
The documentation included with star 1.5 (April, 2008) includes comments about the non-portability of this approach and suggests that this may be dropped in favor of a different approach that supports Solaris as well.
As of January 2009, GNU tar seems to be considering this approach, based on patches contributed by Red Hat.
Libarchive reads this format but does not currently write it.
Apple stores a separate resource file for each file written to a tar archive. This is based on the "`copyfile()`" library function. The resource file is stored as a full tar file entry just before the regular file in the archive and has the same name, except the last path element has "`._`" prepended. For example, the file "`some/dir/foo`" would have a resource file "`some/dir/._foo`".
Note that because the resource file has the same name, long filename extensions get unnecessarily duplicated with this arrangement. For example, using Apple's patched GNU tar distributed with Mac OS X through version 10.5, a long filename would generate the following sequence in the tar archive:
- 'L' header for the long filename of the resource file
- Body of the 'L' header with the filename for the resource file
- '0' header for the resource file itself
- Contents of the resource file
- 'L' header for the long filename of the file itself
- Body of the 'L' header with the filename for the file
- '0' header for the file
- Contents of the file.
Starting with libarchive 3.0, the main libarchive distribution supports the Apple format on Mac OS. It does not parse the contents of the resource file or provide any support for Mac OS resources on other platforms.
AIX tar adds an 'E' entry following the regular file entry for each extended attribute attached to the file. The pathname of this entry is the name of the attribute. The body contains the value of the attribute. By using the standard ustar pathname and body, AIX avoids encoding concerns encoding for these fields.
The example I've seen seems to include some data beyond the end of the EA body but I'm not sure what that is for.
Unfortunately, the AIX approach requires a minimum of 1024 bytes in the archive for each EA. It also puts the EAs after the main entry. This is fine if you have few EAs and they tend to be large, but the applications of EAs that I've seen tend to use very small values. Adapting libarchive to support AIX EAs would also be difficult; libarchive assumes that all metadata (including EAs) precede the file data.
I've seen references to an 'E' entry used by Solaris tar but haven't yet found any detailed documentation or examples of this entry.