[nycbug-talk] DragonFly HAMMER filesystem

Yarema yds at CoolRat.org
Wed Oct 17 17:26:33 EDT 2007

--On Wednesday, October 17, 2007 13:22:54 -0400 Isaac Levy 
<ike at lesmuug.org> wrote:

> Hi All,
> On Oct 15, 2007, at 12:18 PM, Pete Wright wrote:
>> On Mon, Oct 15, 2007 at 11:44:45AM -0400, Yarema wrote:
>>> Ike,
>>> I've gathered you like the ZFS implementation in FreeBSD.
>>> Check out Matt Dillon's HAMMER filesystem design document:
>>> http://Leaf.DragonFlyBSD.org/mailarchive/kernel/2007-10/msg00006.html
>>> ... and why he chose not to use Sun's ZFS:
>>> http://Leaf.DragonFlyBSD.org/mailarchive/kernel/2007-10/msg00008.html


> I see what Hammer is trying to accomplish, and the clustering/replication
> goals are AMAZING. However, there is one big show-stopper for me using it:
> "...A volume is thus limited to 16TB..."
> I can't move forward with this.  Right now the largest filesystems I
> touch are just over 10TB, so I'm under the limit- but the people using
> them are outgrowing them at a quick pace.  In another year I expect to be
> working with 30TB for single fileservers.
> (btw those boxes use UFS2 on FreeBSD-6-REL, and the boxes are SOOOO
> stable)
> I don't mean to sound macho about the disk space, but it's a real and
> growing concern for me right now.
> Managing it (with new features) is one thing, Hammer and ZFS both provide
> great tools for dealing with the increased space- but the raw idea of
> simply storing more bits is most important.
> --
> Maximum filesystem size comparison (what about maximum file size for
> Hammer btw?):
> Hammer: 16TB
> UFS2: 1YiB (Yobibyte) http://en.wikipedia.org/wiki/Yobibyte
> ZFS: 16EiB (Exbibytes) http://en.wikipedia.org/wiki/EiB
> No time to do the math and figure out how many TB fit into a Yobibyte or
> an Exbibyte, but as far as my little brain can comprehend, both UFS2 and
> ZFS will meet my needs in the coming years.
> Sidenote, pretty neat specs (someone should add an entry for Hammer?!):
> http://en.wikipedia.org/wiki/Comparison_of_file_systems
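
Since the math came up, here is a quick Python sketch of how many TB fit into each of those limits.  It assumes binary units (1TB taken as 2^40 bytes) and takes the sizes listed above at face value; the Hammer figure is the per-volume one.

```python
# Express the quoted maximum sizes in TiB.
# Assumption: binary prefixes, so TiB = 2**40, EiB = 2**60, YiB = 2**80 bytes.
TIB = 2**40

limits_bytes = {
    "Hammer (per volume)": 16 * TIB,
    "ZFS": 16 * 2**60,   # 16 EiB
    "UFS2": 1 * 2**80,   # 1 YiB
}

# Integer division is exact here because every size is a power of two >= 2**40.
limits_tib = {name: size // TIB for name, size in limits_bytes.items()}

for name, tib in limits_tib.items():
    print(f"{name}: {tib:,} TiB")
# Hammer (per volume): 16 TiB
# ZFS: 16,777,216 TiB
# UFS2: 1,099,511,627,776 TiB
```

Either of the larger two is many orders of magnitude beyond a 30TB fileserver.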


Looking over the design document again, I'm not sure if the limit is 16TB 
or 524288TB:

    HAMMER's storage management limits it to 32768 volumes, 32768 clusters
    per volume, and 32768 16K filesystem buffers per cluster.   A volume
    is thus limited to 16TB and a HAMMER filesystem as a whole is limited
    to 524288TB.  HAMMER's on-disk structures are designed to allow future
    expansion through expansion of these limits.  In particular, the volume
    id is intended to be expanded to a full 32 bits in the future and using
    a larger buffer size will also greatly increase the cluster and volume
    size limitations by increasing the number of elements the buffer-
    restricted radix trees can manage.
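
Working the quoted figures through suggests both numbers apply, just at different levels: 16TB per volume and 524288TB for a whole filesystem spanning 32768 volumes.  A quick check in Python, assuming "16K" means 16384 bytes and "TB" means 2^40 bytes (the constant names are mine, not from the design document):

```python
# Reproduce the limits from the counts quoted in the design document.
VOLUMES = 32768               # volumes per filesystem
CLUSTERS_PER_VOLUME = 32768
BUFFERS_PER_CLUSTER = 32768
BUFFER_SIZE = 16 * 1024       # assumption: "16K" = 16384 bytes

TIB = 2**40                   # assumption: "TB" in the document is 2**40 bytes

cluster_bytes = BUFFERS_PER_CLUSTER * BUFFER_SIZE      # 2**29 bytes = 512 MiB
volume_bytes = CLUSTERS_PER_VOLUME * cluster_bytes     # 2**44 bytes
filesystem_bytes = VOLUMES * volume_bytes              # 2**59 bytes

print(volume_bytes // TIB)       # 16
print(filesystem_bytes // TIB)   # 524288
```

So a single volume tops out at 16TB, but a multi-volume filesystem could in principle grow well past 30TB.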

Perhaps now would be a good time to drop a note to Matt Dillon expressing 
your concerns.  Maybe he'll expand these limits now instead of "in the 
future".

What I find exciting about the HAMMER design is the versioning capabilities:

    A HAMMER filesystem can be mounted with an as-of date to access a
    snapshot of the system.  Snapshots do not have to be explicitly taken
    but are instead based on the retention policy you specify for any
    given HAMMER filesystem.  It is also possible to access individual files
    or directories (and their contents) using an as-of extension on the
    file name.

This is like having CVS or Subversion built into the filesystem... almost.

... and let's not forget HAMMER database files:

    HAMMER uses 64 bit keys internally and makes key-based files directly
    available to userland.  Key-based files are not regular files and do not
    operate using a normal data offset space.

    You cannot copy a database file using a regular file copier.  The
    file type will not be S_IFREG but instead will be S_IFDB.   The file
    must be opened with O_DATABASE.  Reads which normally seek the file
    forward will instead iterate through the records and lseek/qseek can
    be used to acquire or set the key prior to the read/write operation.
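
To make the access pattern concrete, here is a toy in-memory model in Python.  It is only a sketch of the behaviour described above (seek sets a key, each read returns the next record in key order); the class and method names are made up for illustration and are not the HAMMER interface, which the design document does not spell out.

```python
import bisect


class KeyedFile:
    """Toy stand-in for a key-based file. Hypothetical, not the HAMMER API."""

    def __init__(self):
        self._keys = []      # keys kept in sorted order
        self._records = {}   # key -> record bytes
        self._cursor = 0     # smallest key the next read may return

    def write(self, key, record):
        """Store a record under a 64-bit key, replacing any existing one."""
        if not 0 <= key < 2**64:
            raise ValueError("key must fit in 64 bits")
        if key not in self._records:
            bisect.insort(self._keys, key)
        self._records[key] = record

    def seek(self, key):
        """Set the key position used by the next read."""
        self._cursor = key

    def read(self):
        """Return the next (key, record) at or after the cursor, or None."""
        i = bisect.bisect_left(self._keys, self._cursor)
        if i == len(self._keys):
            return None
        key = self._keys[i]
        self._cursor = key + 1   # advance past the record just returned
        return key, self._records[key]


f = KeyedFile()
f.write(30, b"c")
f.write(10, b"a")
f.write(20, b"b")

f.seek(15)
first = f.read()    # (20, b"b"): first record with key >= 15
second = f.read()   # (30, b"c"): reads iterate in key order
third = f.read()    # None: no more records
```

The point is that the "offset" is a key rather than a byte position, which is why a regular file copier would not know what to do with such a file.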

Think how iTunes or Amarok create a db cache of all the meta tags.  Dovecot 
does the same for email headers.  Wouldn't it be nice if these apps had db 
support at the filesystem level instead of having to roll their own 
solution?  The BeOS bfs had this with indexed attributes and people loved 
being able to get instant results searching through their mp3 collection 
and mail store.

