[talk] Does swap still matter?

Wed Mar 16 17:18:56 EDT 2022

> On Wed, Mar 16, 2022 at 2:22 PM Mark Saad <nonesuch at longcount.org> wrote:
> > Compressed pages (z-ram) is a cool idea in Linux and esxi 

> Depends on your embedded setup. A lot of projects put a high priority
> on predictability. You don't even allocate RAM because it means a
> potential run-time failure or arbitrary blocking. Better to just stick
> it all in the BSS

I don't think these never-call-malloc engine controller systems are the
most helpful thing to add to the taxonomy because they are basically 
emulations of analog computers and not POSIX, though I see the analogy
with a swapless non-overcommitted embedded system.  They're like an extreme
version of it with the same motivation!

I think the question for the POSIX world is whether RAM explosion failures
can be isolated on the time-sharing system.  Thrashing usually breaks
process / user / jail isolation.  Overcommit + OOM-killing, I think, can
sometimes be configured to preserve isolation.  Something as simple as "kill
the thing with the biggest RSS" is actually decent user- or process-isolation
compared to thrashing.
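
To make the "biggest RSS" policy concrete, here is a minimal sketch, assuming
a Linux-style /proc where /proc/<pid>/status carries a VmRSS line in kB.  The
helper names (parse_vmrss_kb, scan_proc, pick_victim) are made up for
illustration, and the real Linux OOM killer uses a more involved badness
score than this.

```python
import os
import re


def parse_vmrss_kb(status_text):
    """Return VmRSS in kB from /proc/<pid>/status text, or 0 if the line is absent."""
    m = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(m.group(1)) if m else 0


def scan_proc(proc_root="/proc"):
    """Collect {pid: rss_kb} for every process visible under proc_root."""
    rss = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a pid directory
        try:
            with open(os.path.join(proc_root, entry, "status")) as f:
                rss[int(entry)] = parse_vmrss_kb(f.read())
        except OSError:
            continue  # process exited between listdir and open
    return rss


def pick_victim(rss_by_pid):
    """Return the pid with the largest RSS, or None if there are no candidates."""
    if not rss_by_pid:
        return None
    return max(rss_by_pid, key=rss_by_pid.get)


if __name__ == "__main__":
    # Only reports the candidate; it does not send any signal.
    print(pick_victim(scan_proc()))
```

The point is only that the victim is whoever is actually holding the RAM, so
the damage stays with the process that exploded instead of being smeared
across everyone the way thrashing does it.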

What about swap inside a VM guest, so the guest's RAM is limited, but the
guest can swap?
 * virtual disk: may break isolation between VMs, if storage QoS is not good
 * zram: won't break isolation between VMs

What about with containers instead of VMs?  This can be almost arbitrarily
flexible, with these "tree" schedulers that overcomplicated garbage like
systemd or nice garbage like SMF set up, where users are isolated from one
another and from other instances of themselves (fork bomb on your X session?
You can still ssh in.), and processes within a user session are isolated from
one another, so each process gets an equal share of CPU and 1000-thread
processes are not overserved.  A system that can isolate at this level is
superior to one that can only isolate VMs because it can be more "work
conserving", with more borrowing and sharing and stuff.
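
A toy sketch of what the tree buys you, assuming equal weights at every level
and every leaf always runnable (a real scheduler hands idle share to whoever
can use it).  The tree layout, the names in it, and both functions are
hypothetical; leaf values are thread counts and leaf names must be unique.

```python
def effective_shares(tree, total=1.0):
    """Split `total` equally among the children at each level.

    Returns {leaf_name: share}.  A leaf's thread count is ignored, which is
    the whole point: a process gets its slot no matter how many threads it has.
    """
    shares = {}
    if not tree:
        return shares
    per_child = total / len(tree)
    for name, subtree in tree.items():
        if isinstance(subtree, dict):
            shares.update(effective_shares(subtree, per_child))
        else:
            shares[name] = per_child
    return shares


def flat_thread_shares(tree):
    """Share each process gets if every thread is scheduled equally, with no tree."""
    leaves = {}

    def walk(node):
        for name, sub in node.items():
            if isinstance(sub, dict):
                walk(sub)
            else:
                leaves[name] = sub  # sub is the thread count

    walk(tree)
    total_threads = sum(leaves.values())
    return {name: n / total_threads for name, n in leaves.items()}


if __name__ == "__main__":
    tree = {
        "alice": {
            "x-session": {"firefox": 1000, "xterm": 1},
            "ssh": {"bash": 1},
        },
        "bob": {"ssh": {"make": 1}},
    }
    print(effective_shares(tree))
    print(flat_thread_shares(tree))
```

With the tree, the 1000-thread process ends up with 1/8 of the CPU and the
ssh shell keeps 1/4; with flat per-thread scheduling the same process takes
1000/1003 and everything else starves.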

Has anyone used zram with LXC?  Can you set physical memory limits on
the containers, and is zram paging accounted to container CPU limits?  A
quick lmgtfy says, "yes, you probably can":

 https://github.com/lxc/lxd/issues/3337#issuecomment-303596914

If so, zram may be able to preserve user isolation where traditional swap
can't, but I'm not sure it is so.  The difference: zram consumes only CPU,
which can (theoretically? or actually?) be accounted to the page faulter and
scheduled.  Other swap mechanisms consume storage bandwidth, which is often
not accounted or scheduled, and if it somehow is, it won't be with the
full-fancy tree scheduler available for CPU scheduling, so it can turn into
a user isolation breakdown.
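
For reference, the per-container knobs in question look roughly like this on
cgroup v2.  This is a sketch, not a recipe: the control-file names
(memory.max, memory.swap.max, cpu.max) are the cgroup v2 ones, but the
function names and the directory you pass in are hypothetical, LXC/LXD would
normally write these for you, and none of it settles whether zram's
compression CPU really gets charged to the faulting container.

```python
import os


def cgroup_limits(mem_bytes, swap_bytes, cpu_quota_us, cpu_period_us=100000):
    """Map cgroup v2 control-file names to the strings to write into them."""
    return {
        "memory.max": str(mem_bytes),            # hard cap on RAM
        "memory.swap.max": str(swap_bytes),      # cap on swap usage
        "cpu.max": f"{cpu_quota_us} {cpu_period_us}",  # quota per period, in microseconds
    }


def apply_limits(cgroup_dir, limits):
    """Write each limit into its control file under cgroup_dir.

    On a real system cgroup_dir is a directory under the cgroup2 mount and
    writing to it needs the right privileges.
    """
    for name, value in limits.items():
        with open(os.path.join(cgroup_dir, name), "w") as f:
            f.write(value + "\n")


if __name__ == "__main__":
    # 1 GiB RAM, 512 MiB swap, half of one CPU; printed, not applied.
    for name, value in cgroup_limits(1 << 30, 512 << 20, 50000).items():
        print(name, "=", value)
```

If the compression work does land inside the container's cpu.max budget, a
container that pages heavily into zram only burns its own CPU quota, which is
the isolation property argued for above.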