On Wed, Mar 19, 2025 at 05:25:17PM -0400, Theodore Ts'o wrote:
> On Wed, Mar 19, 2025 at 01:44:13PM -0400, Demi Marie Obenour wrote:
> > > Note that this won't help if you have malicious hardware that
> > > *pretends* to be a USB storage device, but which doesn't behave
> > > like an honest storage device.  For example, reading a
> > > particular sector returns one set of data at time T, and
> > > different data at time T+X, with no intervening writes.  There
> > > is no real defense against this attack, since there is no way
> > > to authenticate the external storage device; you could have a
> > > registry of USB vendor and model IDs, but a device can always
> > > lie about its ID numbers.
> >
> > This attack can be defended against by sandboxing the filesystem
> > driver and copying files to trusted storage before using them.
> > You can authenticate devices based on what port they are plugged
> > into, and Qubes OS is working on exactly that.
>
> Copying files to trusted storage is not sufficient.  The problem is
> that an untrustworthy storage device can still play games with
> metadata blocks.  If you are willing to copy the entire storage
> device to trustworthy storage, then run fsck on the file system,
> and then mount it, then *sure*, that would help.  But if the
> storage device is very large or very slow, this might not be
> practical.

Copying files is not sufficient on its own.  You need to _also_
sandbox the file system driver, which defeats the attack you mention
above: the attacker can compromise the VM running the file system,
but that doesn't give the attacker anything particularly useful.
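To make that concrete, here is a minimal privilege-separation sketch
in C.  It is purely illustrative (it is not Qubes OS's actual
mechanism, and parse_filesystem() is a hypothetical stand-in for a
real driver): the parser is confined so that compromising it yields
nothing beyond the device it was already allowed to read.

/*
 * Illustrative only: run the untrusted filesystem parser in a child
 * that has dropped all privileges and can see nothing but the
 * already-open device descriptor.  A compromised parser can then
 * only return bogus file data, which is the failure mode I am
 * calling acceptable above.  Must start as root for chroot()/setuid().
 */
#include <fcntl.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical parser entry point; a real driver is vastly bigger. */
static int parse_filesystem(int devfd) { (void)devfd; return 0; }

int main(void)
{
	int devfd = open("/dev/sdb1", O_RDONLY); /* untrusted device */
	if (devfd < 0) { perror("open"); return 1; }

	pid_t pid = fork();
	if (pid == 0) {
		/* Child: lock ourselves into an empty directory and
		 * drop root before touching any attacker-controlled
		 * bytes.  (Error handling trimmed for brevity.) */
		if (chroot("/var/empty") || chdir("/"))
			_exit(1);
		if (setgid(65534) || setuid(65534)) /* nobody */
			_exit(1);
		_exit(parse_filesystem(devfd));
	}

	int status;
	waitpid(pid, &status, 0);
	return WIFEXITED(status) ? WEXITSTATUS(status) : 1;
}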
> > > Like everything else, security and usability and performance
> > > and costs are all engineering tradeoffs....
> >
> > Is the tradeoff fundamental, or is it a consequence of Linux
> > being a monolithic kernel?  If Linux were a microkernel and every
> > filesystem driver ran as a userspace process with no access to
> > anything but the device it is accessing, then there would be no
> > tradeoff when it comes to filesystems: a compromised filesystem
> > driver would have no more access than the device itself, so
> > compromising a filesystem driver would be of much less value to
> > an attacker.  There is still the problem that plug and play is
> > incompatible with not trusting devices to identify themselves,
> > but that's a different concern.
>
> Microkernels have historically been a performance disaster.  Yes,
> you can invest a *vast* amount of effort into trying to make a
> microkernel OS more performant, but in the meantime, the competing
> monolithic kernel will have gotten even faster, or added more
> features, leaving the microkernel in the dust.

The L4 family of microkernels, and especially seL4, shows that
microkernels do not need to be slow.  I do agree that making a
microkernel-based OS fast is hard, but on the other hand, running an
entire Linux VM just to host a single application isn't exactly an
efficient use of resources either, and the latter is what systems
like Kata Containers wind up doing.

> The effort needed to create a new file system from scratch, taking
> it all the way from the initial design, implementation, testing,
> and performance tuning to something customers are comfortable
> depending on for enterprise workloads, is between 50 and 100
> engineer-years.  This estimate came from looking at the development
> effort needed for various file systems implemented on monolithic
> kernels, including Digital's AdvFS (part of Digital Unix and
> OSF/1), IBM's AIX, and Sun's ZFS, as well as GPFS from IBM
> (although that was a cluster file system, and the effort estimated
> from my talking to the engineering managers and tech leads was
> around 200 person-years).
>
> I'm not sure how much harder it will be to make a performant file
> system which is suitable for enterprise workloads from a
> performance, feature, and stability perspective, *and* to make it
> secure against storage devices which are outside the TCB, *and* to
> make it work on a microkernel.  But I'm going to guess it would
> inflate these effort estimates by at least 50%, if not more.

My understanding is that "secure against storage devices which are
outside the TCB" mostly requires two things:

1. Either a programming language in which memory-safety
   vulnerabilities are difficult to introduce by accident, or a
   sandbox that ensures a compromised file system driver cannot do
   more than cause file system operations to return wrong results.

2. A way to kill a file system that is caught in an infinite loop,
   is eating too much memory, or is otherwise the victim of a denial
   of service attack, without crashing the whole system.  (This is
   not needed if denial of service attacks are outside your threat
   model; see the sketch below for the user-space version.)

I'm not asking you (or anyone else) to write a filesystem driver
that has no bugs in the face of arbitrarily corrupted input.  I
_expect_ that there will be bugs in that case.  Right now, Linux
kernel file systems are written in C and run in the kernel, which
means that a bug can easily result in a complete system compromise.
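The sketch below is my own illustration of point 2, not existing
code: run_fs_service() is a hypothetical stand-in for the real
driver, which gets hard CPU and memory caps via setrlimit() plus a
wall-clock watchdog in the parent.  If the service spins, leaks, or
hangs, only the one mount is lost; the rest of the system survives.

/*
 * Illustrative only: cap the filesystem service's CPU and address
 * space, and kill it from a watchdog if it stops making progress.
 */
#include <signal.h>
#include <stdio.h>
#include <sys/resource.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical stand-in: a driver stuck in an infinite loop. */
static int run_fs_service(void) { for (;;) ; }

int main(void)
{
	pid_t pid = fork();
	if (pid == 0) {
		/* 5 seconds of CPU time, then SIGXCPU/SIGKILL ... */
		struct rlimit cpu = { 5, 5 };
		/* ... and a 256 MiB address-space ceiling. */
		struct rlimit mem = { 256u << 20, 256u << 20 };
		setrlimit(RLIMIT_CPU, &cpu);
		setrlimit(RLIMIT_AS, &mem);
		_exit(run_fs_service());
	}

	/* Watchdog: if the service is blocked rather than spinning,
	 * the CPU limit never fires, so also enforce a 10-second
	 * wall-clock deadline from the parent. */
	time_t deadline = time(NULL) + 10;
	int status;
	while (waitpid(pid, &status, WNOHANG) == 0) {
		if (time(NULL) >= deadline) {
			kill(pid, SIGKILL);
			waitpid(pid, &status, 0);
			break;
		}
		sleep(1);
	}
	printf("filesystem service reaped; system still running\n");
	return 0;
}

In a microkernel, or in the Qubes-style VM sandbox, the same
kill-and-restart lever exists for the driver as a whole.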
> Of course, if we're just writing a super simple file system that
> is suitable for backups and file transfers, but not much else,
> that would probably take much less effort.  But if we need to
> support file exchange with storage devices using NTFS or HFS,
> those aren't simple file systems.  So the VM sandbox approach
> might still be the better way to go.

Certainly the VM sandbox is the simplest approach in the short term.

P.S.: For all that I may disagree with you on a lot of things, I am
very grateful for all the work you have put into making ext4 as
solid a filesystem as it is, as well as for your other innovations
(like creating /dev/{u,}random).
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab