[nycbug-talk] 5.4 Jails, nullfs or unionfs?

Tue Apr 19 02:14:38 EDT 2005

Turning on the -v flag here,

On Apr 19, 2005, at 12:49 AM, Isaac Levy wrote:

> All the filesystem mounts in 5.x are different- I'm not sure what's 
> inside and outside of stable any more, except now fundamental things 
> in 5.x, like devfs and procfs...

Actually, I didn't think it fair of me to just blurt that out without 
more explanation.

On Apr 19, 2005, at 1:04 AM, pete wright wrote:

> I remember these topics coming up, and from what I understood about
> most of the issues people were trying to address, devfs seemed to
> handle most of them.  Specifically in regards to jailing, one is
> able to create devfs rulesets per jail.
Yes- absolutely- it's really WAY cleaner to work with devices now, 
inside and outside of jails IMHO.

> Thusly an admin is able to
> have some level of control over what devices are accessible per jail,
> and how they are used.
Yes- as well as processes now too.  This stuff is cleaned up NIIIICE 
for 5.x,

All filesystems, real and abstracted, have been heavily rebuilt for
FreeBSD 5.x.  Given tonight's jailing threads, this deserves some
attention for the record (George mentioned all the new fabby
jail-related sysctl variables, and devices play a big role in this for
jailing):

----------
- devfs

In FreeBSD 4.x, devices are built statically: everything lives in /dev.
So if one wants to limit access to various devices, one simply has to
remove/delete them; it's that simple.  This has always had one
pain-in-the-tail drawback: what if one wants to reinstate a device for
a particular reason?  One has to rebuild the devices for the jail from
the master system, then go about re-deleting the devices one still
doesn't want jailed users to have access to.  Clumsy, but functional.

In FreeBSD 5.x, devices are provided by devfs, mounted with mount_devfs.
This is terrific for jailing, inasmuch as the start scripts for a given
jail can mount devfs and apply a devfs ruleset (see devfs(8)) that hides
various devices, and the jail never gets them; it's that simple.  This
makes reconfiguring the devices allocated to a particular jail pretty
elegant and flexible.  So there's one nice big change which makes
things simpler.
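To show why a ruleset beats deleting nodes, here is a toy Python model of the usual hide-everything-then-unhide pattern.  It is not devfs(8) syntax: the rule list, the device names and the `visible_nodes` helper are all hypothetical, and it only mimics the idea that rules are applied in order, so a later rule overrides an earlier one.

```python
from fnmatch import fnmatchcase

# Hypothetical, simplified model of a devfs ruleset for a jail: an ordered
# list of (pattern, action) rules.  Every matching rule is applied in turn,
# so a later rule overrides an earlier one for the same node.
JAIL_RULESET = [
    ("*", "hide"),         # start by hiding every device node
    ("null", "unhide"),    # then re-expose a small, harmless set
    ("zero", "unhide"),
    ("random", "unhide"),
    ("urandom", "unhide"),
    ("pty*", "unhide"),
    ("tty*", "unhide"),
]


def visible_nodes(nodes, ruleset):
    """Return the device nodes a jail would see under `ruleset`."""
    visible = []
    for node in nodes:
        shown = True  # nodes are visible unless some rule hides them
        for pattern, action in ruleset:
            if fnmatchcase(node, pattern):
                shown = action == "unhide"
        if shown:
            visible.append(node)
    return visible


# Made-up device list, standing in for the host's /dev
HOST_DEV = ["ad0", "ad0s1a", "mem", "kmem", "null", "zero", "random",
            "urandom", "ptyp0", "ttyp0", "bpf0"]

print(visible_nodes(HOST_DEV, JAIL_RULESET))

# Reinstating a device is just one more rule, not a /dev rebuild:
print(visible_nodes(HOST_DEV, JAIL_RULESET + [("bpf*", "unhide")]))
```

The point of the second call is the 4.x contrast: giving a jail back a device is a one-line change to its rules instead of rebuilding /dev and deleting nodes all over again.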

Sysctl variables affected:

security.jail.getfsstatroot_only
     This determines whether or not a jailed process can see all
     system mountpoints, a weakness somewhat swept under the rug in 4.x.
kern.securelevel
     Locking down a given jail's securelevel can be used to neatly
     mitigate the risks of memory-hog fork bombs and other resource-based
     attacks.  In the 4.x era, these risks could really only be mitigated
     well by running an entire box at a given securelevel, which is a
     serious pain in the tail for day-to-day management IMHO.  This lets
     a master server and its jails run at different securelevels, making
     things like locking down login.conf as immutable (maxproc/memoryuse)
     simple and granular.

     I personally feel this is the most significant improvement in
     jailing over the 4.x era.
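Here is a toy Python model of that arrangement, assuming (as described above) that a jail can raise its own securelevel while the host stays lower, and that the stricter of the two applies inside the jail.  The `Host`/`Jail` classes and their methods are hypothetical; the kernel behaviour it leans on is that securelevel can only be raised, and that at securelevel 1 and above the schg (system immutable) flag cannot be cleared.

```python
# Hypothetical model of per-jail securelevels, assuming that a jail may
# raise its own securelevel independently of the host, and that the higher
# (stricter) of the two is what gets enforced inside the jail.


class Host:
    def __init__(self, securelevel=-1):
        self.securelevel = securelevel


class Jail:
    def __init__(self, name, host, securelevel=-1):
        self.name = name
        self.host = host
        self.securelevel = securelevel

    def effective_securelevel(self):
        # The stricter of the host's and the jail's own level applies
        return max(self.host.securelevel, self.securelevel)

    def set_securelevel(self, level):
        # Like kern.securelevel, a jail's level may be raised but not lowered
        if level < self.securelevel:
            raise PermissionError("securelevel can only be raised")
        self.securelevel = level

    def can_clear_schg(self):
        # At securelevel > 0 the system immutable flag (schg) cannot be
        # cleared, so an schg login.conf and its maxproc/memoryuse limits
        # stay locked.
        return self.effective_securelevel() <= 0


host = Host(securelevel=-1)      # master server stays easy to manage
web = Jail("web", host)
shell = Jail("shell", host)

shell.set_securelevel(1)         # lock down only the risky jail

print(web.name, web.effective_securelevel(), web.can_clear_schg())
print(shell.name, shell.effective_securelevel(), shell.can_clear_schg())
```

The master server and the "web" jail keep the day-to-day convenience of a low securelevel, while only the "shell" jail gets its resource limits frozen.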

----------

- procfs

In FreeBSD 4.x, procfs had some serious issues within jails, but it was
not really necessary for most practical applications, so there wasn't
much fuss about simply not using it.  Early procfs had a massive hole
where unprivileged local users could use it to gain superuser
privileges, due to insufficient access control checks in procfs.  This
was fixed (around 2000 I believe, sorted out by 4.4), but procfs was
slated for a full overhaul, so fixing it for jailing purposes was a
waste of time.  This contributed to the more restricted feel of a jail:
top needed patching, and the patches broke with every dot release of
the 4.x era, so much so that at the time of this writing nobody is
bothering to patch top for 4.10 or 4.11, to my knowledge.
Not having procfs had little real impact on *using* various
applications, mostly internet applications, from jails; it just made
for a fairly clumsy feel at times.
(note- this is where the PostgreSQL/jail dysfunctional issues come from
:)
Some folks graciously wrote some fine hacks for managing jails more
easily: jps, jtop, and jkill, all perl wrappers which parse the output
of their regular counterparts.
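The core trick of a jps-style wrapper can be sketched in a few lines of Python.  This is not the actual jps code: the `jailed_processes` helper and the sample output are made up, and it assumes, as FreeBSD's ps(1) does, that jailed processes carry a "J" in the STAT column and that COMMAND is the last column.

```python
# Hypothetical jps-style filter: keep only the rows of ps(1) output whose
# STAT column contains "J", the flag ps uses to mark jailed processes.

# Made-up sample output, e.g. from something like `ps -ax -o pid,stat,command`
SAMPLE_PS = """\
  PID STAT COMMAND
    1 ILs  /sbin/init --
  412 Ss   /usr/sbin/sshd
  733 SsJ  /usr/sbin/sshd
  801 IsJ  /usr/local/sbin/httpd -DSSL
"""


def jailed_processes(ps_output):
    """Return (pid, command) pairs for every jailed process in ps_output."""
    lines = ps_output.splitlines()
    header = lines[0].split()
    pid_col = header.index("PID")
    stat_col = header.index("STAT")
    result = []
    for row in lines[1:]:
        if not row.strip():
            continue  # skip blank lines
        # Split into exactly len(header) fields; COMMAND (assumed to be the
        # last column) keeps its embedded spaces.
        fields = row.split(None, len(header) - 1)
        if "J" in fields[stat_col]:
            result.append((int(fields[pid_col]), fields[-1]))
    return result


for pid, command in jailed_processes(SAMPLE_PS):
    print(pid, command)
```

Screen-scraping like this is exactly why such wrappers were fragile: any change to the ps output format breaks them.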
Also, there were only three real ways to control certain aspects of
jails, and each setting applied to all the jails on the system at once:

jail.set_hostname_allowed
     Turning this off is sane because a number of external jail
     management utilities in 4.x use the jail's hostname to get their
     job done.
jail.socket_unixiproute_only
     Turning this off lets a jail access network protocol stacks that
     have not had jail functionality added to them, so one can use more
     than UNIX domain sockets, IPv4 sockets, and routing sockets.  Most
     internet apps don't call for any more than this.
jail.sysvipc_allowed
     This is dirty, and downright insecure (but lets one use PostgreSQL
     from a 4.x jail, for example).
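A toy Python model of what jail.socket_unixiproute_only decides when a jailed process asks for a socket, assuming its default of 1.  Domain names are plain strings rather than real constants, and `may_create_socket` is hypothetical; the detail worth noticing is the single global setting, so flipping it opens every jail at once.

```python
# Hypothetical model of the 4.x jail.socket_unixiproute_only check.  There
# is one system-wide value, shared by every jail on the box.
SYSCTLS = {"jail.socket_unixiproute_only": 1}

# Domains a jailed process may still use while the knob is on
UNIX_IP_ROUTE = {"PF_LOCAL", "PF_INET", "PF_ROUTE"}


def may_create_socket(domain, jailed, sysctls=SYSCTLS):
    """Return True if a process may create a socket in `domain`."""
    if not jailed:
        return True                 # the host itself is unaffected
    if sysctls["jail.socket_unixiproute_only"]:
        return domain in UNIX_IP_ROUTE
    return True                     # knob off: every protocol stack


for domain in ("PF_LOCAL", "PF_INET", "PF_ROUTE", "PF_INET6"):
    print(domain, may_create_socket(domain, jailed=True))
```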

In FreeBSD 5.x, procfs is an entirely new animal, and thoughtfully
enhances jail use and management.  Alongside the procfs rebuild, the
jls(8) and jexec(8) utilities have come into being, for listing and
executing things in jails, respectively.  Also, killall now has a -j
flag to kill an entire jailed process tree cleanly (retirement for
jkill, thanks for all the bloodshed!).  It's really the way it ought to
be now.
This enhances the entire way that one can lock down jails too, using
the sysctl settings, and basically lets an administrator tweak all the
things which were previously almost impossible to enable in jails:

security.jail.allow_raw_sockets
     Want ping or traceroute from jails for some reason?  Have at it!
security.jail.set_hostname_allowed
     This is useful, as a number of utilities (and admins) may still use
     hostnames to manage jails from a master system, but jailing
     utilities now seem able to mitigate the risk of jails getting lost
     in the shuffle, by having more elegant ways to grab the jailed
     process trees.
security.jail.socket_unixiproute_only
     This is identical to its 4.x counterpart, and turning it off still
     leaves things like access to IPv6 stacks crudely wide open, with
     access to every stack, for a practical example.
security.jail.sysvipc_allowed
     Again, this is identical to its 4.x counterpart, and is a good
     lesson that complete backwards compatibility breeds Swiss-cheese
     insecure systems, IMHO.
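To make the interplay of the two socket knobs concrete, here is a toy Python model of why ping6 from a jail needs more than allow_raw_sockets.  The `JailSysctls` dataclass, its defaults (picked to match "as restrictive as is sane") and `jail_may_open` are assumptions for illustration, not kernel code; it only encodes that ping uses a raw IPv4 socket and ping6 a raw IPv6 one.

```python
from dataclasses import dataclass

# Hypothetical model of the 5.x jail socket knobs.  The defaults below are
# assumptions meant to reflect "as restrictive as is sane".


@dataclass
class JailSysctls:
    allow_raw_sockets: int = 0
    socket_unixiproute_only: int = 1


UNIX_IP_ROUTE = {"PF_LOCAL", "PF_INET", "PF_ROUTE"}


def jail_may_open(sysctls, domain, sock_type):
    """Return True if a jailed process may open (domain, sock_type)."""
    if sysctls.socket_unixiproute_only and domain not in UNIX_IP_ROUTE:
        return False  # all-or-nothing: IPv6 needs the knob turned off
    if sock_type == "SOCK_RAW" and not sysctls.allow_raw_sockets:
        return False  # ping/traceroute need raw sockets
    return True


# ping uses a raw IPv4 socket; ping6 a raw IPv6 one
PING = ("PF_INET", "SOCK_RAW")
PING6 = ("PF_INET6", "SOCK_RAW")

for label, s in [("defaults", JailSysctls()),
                 ("raw on", JailSysctls(allow_raw_sockets=1)),
                 ("raw on, unixiproute off",
                  JailSysctls(allow_raw_sockets=1, socket_unixiproute_only=0))]:
    print(label, jail_may_open(s, *PING), jail_may_open(s, *PING6))
```

Only the last combination lets ping6 through, and it does so by opening every protocol stack, which is the crudeness complained about above.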

--
Seems like a lot of jail-specific stuff, but in the end, it really
isn't.  The defaults are set up to be as restrictive as is sane (i.e.
no securelevel choices are made in the defaults, etc.), so there are no
real surprises.

Now, with regard to nullfs and unionfs, I have *no idea* what the state
of these is for FreeBSD, or for jails, and I'm not personally aware of
any mandatory use cases for them in jails to begin with.  I can think
of things which would become nicer to manage (a single update to a
ports tree for all systems perhaps, or a single userland image template
for massively parallel jailed clusters, etc.), but these cases seem to
me better suited to chrooted environments, because of the implied
homogeneity.  So I'm stumped (and looking for a reason to get excited
about nullfs or unionfs!)...

Rocket-
.ike