[nycbug-talk] Approach for the NFS cluster

Tue May 19 13:23:00 EDT 2009

>>>>> "sk" == Steven Kreuzer <skreuzer at exit2shell.com> writes:

    sk> This is an extremely oversimplified explanation of how you
    sk> could provide HA NFS.

Yeah, though this is coming from someone who's never done it, that
sounds like a good summary.  Except that I don't know of any actual
clustering software built over ggate, and it's not something you roll
yourself with shell scripts.  The volume cannot be mounted on both
nodes at the same time because obviously the filesystem doesn't
support that, so, like other HA stuff, there has to be a heartbeat
network connection or a SCSI reservation scheme or some such magic so
the inactive node knows it's time to take over the storage,
fsck/log-roll it, mount it, export the NFS.  It's not like they can
both be ready all the time, and CARP will decide which one gets the
work---not possible.  Also the active node has to notice if, for some
reason, it has lost control by the rules of the heartbeat/reservation
scheme even though it doesn't feel crashed, and in that case it should
crash itself.
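
To make the two rules above concrete, here is a minimal sketch of the
decision each node would make on every tick.  It is not taken from any
real clustering package: the names (NodeState, decide) and the timeout
values are made up for illustration, and a real implementation would
also need the actual heartbeat or reservation plumbing.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    STAY = "stay"              # keep doing what we are doing
    TAKE_OVER = "take over"    # standby: fsck/log-roll, mount, export NFS
    SELF_FENCE = "self-fence"  # active: we lost control, so crash ourselves


@dataclass
class NodeState:
    active: bool               # True if this node has the volume mounted and exported
    last_peer_beat: float      # when we last heard the peer's heartbeat (seconds)
    last_lease_renewal: float  # when we last confirmed our own hold on the storage


# Hypothetical timings.  The standby waits longer than the active node's
# lease, so the old active has fenced itself before the volume is mounted again.
LEASE_SECONDS = 10.0
TAKEOVER_SECONDS = 15.0


def decide(state: NodeState, now: float) -> Action:
    if state.active:
        # The active node may feel healthy, but if it could not renew its
        # hold on the storage in time it must assume it has lost control.
        if now - state.last_lease_renewal > LEASE_SECONDS:
            return Action.SELF_FENCE
        return Action.STAY
    # The standby only takes the storage once the peer has been silent
    # for longer than the active node's lease.
    if now - state.last_peer_beat > TAKEOVER_SECONDS:
        return Action.TAKE_OVER
    return Action.STAY
```

The point of the two different timeouts is that the volume is never
mounted on both nodes at once: by the time the standby acts, the old
active node has already given up.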

There may also be some app-specific magic in NFS.  The feature that
lets clients go through server reboots without losing any data, even
on open files, should make it much easier to clusterify than SMB: on
NFS this case is explicitly supported: among other things, any writes
still sitting in the server's filesystem and disk write caches are also
kept in the clients so they can be replayed if the server crashes.  But
there may be some tricky corner cases the clustering software needs to
handle.  For example, on Solaris if using ZFS, you can ``disable the
ZIL'' to improve NFS performance in the case where you're opening,
writing, closing files frequently, but the cost of disabling is that
you lose this stateless-server-reboot feature.
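
A toy model of that replay behaviour, as I understand it, might look
like the sketch below.  ToyServer and ToyClient are invented for
illustration and are not a real NFS implementation; boot_id stands in
for the write verifier that an NFSv3 server returns from WRITE and
COMMIT, which changes when the server reboots.

```python
class ToyServer:
    def __init__(self):
        self.boot_id = 1   # stand-in for the NFSv3 write verifier
        self.cache = {}    # volatile write cache, lost on crash
        self.disk = {}     # stable storage, survives a crash

    def write(self, offset, data):
        # Unstable write: lands in the cache only.
        self.cache[offset] = data
        return self.boot_id

    def commit(self):
        # Flush the cache to stable storage.
        self.disk.update(self.cache)
        self.cache.clear()
        return self.boot_id

    def crash_and_reboot(self):
        self.cache.clear()
        self.boot_id += 1


class ToyClient:
    def __init__(self, server):
        self.server = server
        self.pending = {}  # offset -> (data, verifier), kept until committed

    def write(self, offset, data):
        verifier = self.server.write(offset, data)
        self.pending[offset] = (data, verifier)

    def commit(self):
        verifier = self.server.commit()
        # Any write acknowledged under a different verifier may have been
        # lost in a reboot, so send it again from the client's copy.
        stale = {off: d for off, (d, v) in self.pending.items() if v != verifier}
        if stale:
            for off, d in stale.items():
                self.server.write(off, d)
            self.server.commit()
        self.pending.clear()
```

In this model, a server that answers commit() without really moving
the data to disk (roughly what disabling the ZIL amounts to, if I have
that right) would leave the client believing its copy can be thrown
away, which is exactly how the reboot guarantee gets lost.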

    sk> suggest you look at Isilon, NetApp and Sun,

The Solaris clustering stuff may actually be $0.  I'm not sure though,
never run it.  The clustering stuff is not the same thing as the pNFS
stuff.  +1 on Steven's point that you can do this with regular NFS on
the clients---only the servers need to be special.  But they need to
be pretty special.  The old clusters used a SCSI chain with two host
adapters, one at each end of the bus, so there's no external
terminator (just the integrated terminator in the host adapters).
These days probably you will need a SAS chassis with connections for
two initiators, unless the ggate thing works.  But there's a need to
flush write buffers deterministically when told to for the NFS corner
case, and some clusters use this SCSI-2 reservation command,
so...shared storage is not so much this abstract modular good-enough
blob.