On Tue, Jan 21, 2025 at 11:37:38PM -0800, Christoph Hellwig wrote:
> On Wed, Jan 22, 2025 at 08:29:09AM +0100, Gerhard Wiesinger wrote:
> > BTW: Why does it break the ACID properties?
>
> It doesn't if implemented properly, which of course means out of place
> writes.
>
> The only sane way to implement compression in XFS would be using out
> of place writes, which we support for reflinks and which is heavily
> used by the new zoned mode.  For the latter retrofitting compression
> would be relatively easy, but it first needs to get merged, then
> stabilize and mature, and then we'll need to see if we have enough
> use cases.  So don't plan for it.

... but out of place writes mean that every single fdatasync() called
by the database now requires a file system level transaction commit.
So now every single fdatasync(2) results in the data blocks getting
written out to a new location on disk (this is what out of place
writes mean), followed by a CACHE FLUSH, followed by the metadata
updates to point at the new location on the disk, first written to the
file system transaction log, followed by the fs commit block, followed
by a *second* CACHE FLUSH command.

So now let's look at a sample scenario where the database needs to
update 3 different 4k blocks (for example, where you are crediting
$100 to an income account, followed by a $100 debit to an expense
account, followed by the database commit).

Without transparent compression, the commit looks like this (assuming
the database is properly using fdatasync so it's not asking the file
system to update the ctime/mtime of the database file):

1) random write A (4k write)
2) random write B (4k write)
3) random write C (4k write)
4) CACHE FLUSH

With transparent compression:

1) random write A
2) random write B
3) random write C
4) CACHE FLUSH
5) update the location of compression cluster A written to the fs journal
6) update the location of compression cluster B written to the fs journal
7) update the location of compression cluster C written to the fs journal
8) write the commit block to the fs journal
9) CACHE FLUSH

This kills performance, and as I mentioned, in general IOPS are
expensive and write bandwidth is often far more expensive than bytes
of storage.  This is true for the raw storage used by the cloud
provider, for the extra network bandwidth between the host and the
cluster file system storing the emulated cloud block device, and for
the amount of money charged to the cloud customer, because it does
cost the cloud provider more money.

If you try to do transparent compression using update-in-place (for
example, via the technique in the Stac patent) then you don't need to
update the location on disk.  But given that you are replacing a 64k
compression cluster every time you update a 4k block, if you crash in
the middle of the 64k compression cluster update, that cluster could
get corrupted --- at which point you break the database's ACID
properties.

Finally, note that both Amazon and Google have first-party cloud
products (RDS and CloudSQL, respectively) that provide the customer
with the full MySQL and Postgres feature set.  So if you want to
enable database-level compression, I believe you *can* do it.
Compression is not free, and not magic, but if it works for you, you
*can* enable it if you are using MySQL or Postgres.

Now, if you are using a database that doesn't support database-level
compression, then why don't you try demanding that the vendor
providing the database add compression as a feature?  Of course, they
might ask you as the customer to pay $$$, but the development cost to
add new features, whether in the database or the file system, is also
not free.
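If it helps make the commit sequences above concrete, here is a
minimal sketch in C (mine, purely illustrative; "dbfile" is a
hypothetical stand-in for a preallocated database file) of what the
commit pattern looks like from the application's side: three random
4k overwrites followed by a single fdatasync(2).

	/*
	 * Illustrative only: three random 4k overwrites of a
	 * preallocated file, then one fdatasync(2) to commit.
	 */
	#include <fcntl.h>
	#include <stdio.h>
	#include <stdlib.h>
	#include <string.h>
	#include <unistd.h>

	#define BLKSZ 4096

	static void write_block(int fd, off_t blkno, char fill)
	{
		char buf[BLKSZ];

		memset(buf, fill, sizeof(buf));
		if (pwrite(fd, buf, BLKSZ, blkno * BLKSZ) != BLKSZ) {
			perror("pwrite");
			exit(1);
		}
	}

	int main(void)
	{
		int fd = open("dbfile", O_RDWR);

		if (fd < 0) {
			perror("open dbfile");
			exit(1);
		}

		write_block(fd, 10, 'A');	/* 1) random write A */
		write_block(fd, 500, 'B');	/* 2) random write B */
		write_block(fd, 900, 'C');	/* 3) random write C */

		/*
		 * 4) one fdatasync(2) == one CACHE FLUSH.  The blocks
		 * are overwritten in place, and fdatasync() does not
		 * have to push timestamp updates to stable storage,
		 * so no file system transaction commit is required.
		 * With out of place writes (e.g. for transparent
		 * compression), steps 5-9 above happen behind this
		 * one call.
		 */
		if (fdatasync(fd) < 0) {
			perror("fdatasync");
			exit(1);
		}
		close(fd);
		return 0;
	}

Whether a given file system can satisfy that fdatasync() without a
journal commit depends on the implementation, but for in-place
overwrites of already-allocated blocks it is the common case; out of
place writes take that option away.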
Cheers,

					- Ted
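P.S.  If you want to see what this costs on your own storage, here is
a quick-and-dirty micro-benchmark sketch along the same lines (again
mine and purely illustrative; "testfile" is a hypothetical 64MB test
file, e.g. created with "dd if=/dev/zero of=testfile bs=1M count=64"):

	/* Measures "3 random 4k writes + fdatasync" commits per second. */
	#include <fcntl.h>
	#include <stdio.h>
	#include <stdlib.h>
	#include <string.h>
	#include <time.h>
	#include <unistd.h>

	#define BLKSZ	4096
	#define NBLOCKS	16384		/* 64MB test file */
	#define COMMITS	100

	int main(void)
	{
		char buf[BLKSZ];
		struct timespec t0, t1;
		int fd = open("testfile", O_RDWR);

		if (fd < 0) {
			perror("open testfile");
			return 1;
		}
		memset(buf, 0xab, sizeof(buf));

		clock_gettime(CLOCK_MONOTONIC, &t0);
		for (int i = 0; i < COMMITS; i++) {
			/* three random 4k writes, as in the scenario above */
			for (int j = 0; j < 3; j++) {
				off_t blk = random() % NBLOCKS;

				if (pwrite(fd, buf, BLKSZ,
					   blk * BLKSZ) != BLKSZ) {
					perror("pwrite");
					return 1;
				}
			}
			/* ... followed by the database's commit */
			if (fdatasync(fd) < 0) {
				perror("fdatasync");
				return 1;
			}
		}
		clock_gettime(CLOCK_MONOTONIC, &t1);

		double secs = (t1.tv_sec - t0.tv_sec) +
			      (t1.tv_nsec - t0.tv_nsec) / 1e9;
		printf("%d commits in %.3f s (%.1f commits/sec)\n",
		       COMMITS, secs, COMMITS / secs);
		return 0;
	}

The fdatasync() is the expensive part (on a device with a volatile
write cache, each one typically implies at least one CACHE FLUSH), and
every extra flush a file system design adds per commit shows up
directly in that commits/sec number.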