[nycbug-talk] DragonFly HAMMER filesystem
Yarema
yds at CoolRat.org
Wed Oct 17 17:26:33 EDT 2007
--On Wednesday, October 17, 2007 13:22:54 -0400 Isaac Levy
<ike at lesmuug.org> wrote:
> Hi All,
>
> On Oct 15, 2007, at 12:18 PM, Pete Wright wrote:
>> On Mon, Oct 15, 2007 at 11:44:45AM -0400, Yarema wrote:
>>> Ike,
>>>
>>> I've gathered you like the ZFS implementation in FreeBSD.
>>> Check out Matt Dillon's HAMMER filesystem design document:
>>> http://Leaf.DragonFlyBSD.org/mailarchive/kernel/2007-10/msg00006.html
>>> ... and why he chose not to use Sun's ZFS:
>>> http://Leaf.DragonFlyBSD.org/mailarchive/kernel/2007-10/msg00008.html
<..snipped..>
> I see what Hammer is trying to accomplish, and the clustering/replication
> goals are AMAZING- however, I have one big show-stopper for me using it:
>
> "...A volume is thus limited to 16TB..."
>
> I can't move forward with this. Right now the largest filesystems I
> touch are just over 10TB, so I'm under the limit- but the people using
> them are outgrowing them at a quick pace. In another year I expect to be
> working with 30TB for single fileservers.
> (btw those boxes use UFS2 on FreeBSD-6-REL, and the boxes are SOOOO
> stable)
>
> I don't mean to sound macho about the disk space, but it's a real and
> growing concern for me right now.
> Managing it (with new features) is one thing, Hammer and ZFS both provide
> great tools for dealing with the increased space- but the raw idea of
> simply storing more bits is most important.
>
> --
> Maximum filesystem size comparison (what about maximum file size for
> Hammer btw?):
>
> Hammer: 16TB
> UFS2: 1YiB (1 Yobibyte = ) http://en.wikipedia.org/wiki/Yobibyte
> ZFS: 16EiB (Exbibytes) http://en.wikipedia.org/wiki/EiB
>
> No time to do the math and figure out how many TB fit into a Yobibyte or
> an Exbibyte, but as far as my little brain can comprehend, both UFS2 and
> ZFS will meet my needs in the coming years.
>
> Sidenote, pretty neat specs (someone should add an entry for Hammer?!):
> http://en.wikipedia.org/wiki/Comparison_of_file_systems
Ike,
Looking over the design document again, I'm not sure if the limit is 16TB
or 524288TB:
    HAMMER's storage management limits it to 32768 volumes, 32768 clusters
    per volume, and 32768 16K filesystem buffers per cluster. A volume
    is thus limited to 16TB and a HAMMER filesystem as a whole is limited
    to 524288TB. HAMMER's on-disk structures are designed to allow future
    expansion through expansion of these limits. In particular, the volume
    id is intended to be expanded to a full 32 bits in the future and using
    a larger buffer size will also greatly increase the cluster and volume
    size limitations by increasing the number of elements the
    buffer-restricted radix trees can manage.
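As a quick sanity check on those figures, here is a small Python sketch of the arithmetic. The variable names are illustrative, and it assumes the design document's "16K" and "TB" are binary units (KiB and TiB). It also does the TB conversions for the ZFS and UFS2 limits quoted above.

```python
# Sanity check of the HAMMER limits quoted from the design document.
# Assumption: "16K" means 16 KiB and "TB" means TiB (binary units).
KIB = 1024
MIB = 1024 ** 2
TIB = 1024 ** 4

BUFFER_SIZE = 16 * KIB          # one filesystem buffer
BUFFERS_PER_CLUSTER = 32768
CLUSTERS_PER_VOLUME = 32768
VOLUMES_PER_FILESYSTEM = 32768

cluster_bytes = BUFFER_SIZE * BUFFERS_PER_CLUSTER
volume_bytes = cluster_bytes * CLUSTERS_PER_VOLUME
filesystem_bytes = volume_bytes * VOLUMES_PER_FILESYSTEM

cluster_mib = cluster_bytes // MIB
volume_tib = volume_bytes // TIB
filesystem_tib = filesystem_bytes // TIB

# The other limits mentioned in the thread, expressed in TiB.
zfs_tib = (16 * 1024 ** 6) // TIB   # 16 EiB
ufs2_tib = (1024 ** 8) // TIB       # 1 YiB

print(f"cluster:    {cluster_mib} MiB")      # 512 MiB
print(f"volume:     {volume_tib} TiB")       # 16 TiB
print(f"filesystem: {filesystem_tib} TiB")   # 524288 TiB
print(f"ZFS limit:  {zfs_tib} TiB")          # 16777216 TiB
print(f"UFS2 limit: {ufs2_tib} TiB")         # 1099511627776 TiB
```

So under that reading, 16TB is the per-volume limit and 524288TB (512 PiB) is the limit for a filesystem spanning all 32768 volumes.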
Perhaps now would be a good time to drop a note to Matt Dillon expressing
your concerns. Maybe he'll expand these limits now instead of "in the
future".
What I find exciting about the HAMMER design is the versioning capabilities:
    A HAMMER filesystem can be mounted with an as-of date to access a
    snapshot of the system. Snapshots do not have to be explicitly taken
    but are instead based on the retention policy you specify for any
    given HAMMER filesystem. It is also possible to access individual files
    or directories (and their contents) using an as-of extension on the
    file name.
This is like having CVS or Subversion built into the filesystem... almost.
... and let's not forget HAMMER database files:
    HAMMER uses 64 bit keys internally and makes key-based files directly
    available to userland. Key-based files are not regular files and do not
    operate using a normal data offset space.
    You cannot copy a database file using a regular file copier. The
    file type will not be S_IFREG but instead will be S_IFDB. The file
    must be opened with O_DATABASE. Reads which normally seek the file
    forward will instead iterate through the records and lseek/qseek can
    be used to acquire or set the key prior to the read/write operation.
Think how iTunes or Amarok create a db cache of all the meta tags. Dovecot
does the same for email headers. Wouldn't it be nice if these apps had db
support at the filesystem level instead of having to roll their own
solution? The BeOS bfs had this with indexed attributes and people loved
being able to get instant results searching through their mp3 collection
and mail store.
--
Yarema
http://yds.CoolRat.org