On Fri, 2025-07-04 at 20:51 +0100, Phillip Lougher wrote:
> > On 26/06/2025 15:27 BST Joakim Tjernlund (Nokia) <joakim.tjernlund@xxxxxxxxx> wrote:
> >
> > On Thu, 2025-06-26 at 10:09 +0200, Joakim Tjernlund wrote:
> > > We have an app running on a squashfs RFS (XZ compressed) and an appfs also on squashfs.
> > > Whenever we validate a SW update image (stream an image.xz, uncompress it and send the output to /dev/null),
> > > the apps are starved/blocked and make almost no progress, system time in top goes up to 99+%
> > > and the console also becomes unresponsive.
> > >
> > > This feels like the kernel is stuck/busy in a loop and does not let apps execute.
>
> I have been away at the Glastonbury festival, hence the delay in replying. But
> this isn't really anything to do with Squashfs per se, and basic computer
> science theory explains what is going on here. So I'm surprised no one else has
> responded.
>
> > > Kernel 5.15.185
> > >
> > > Any ideas/pointers?
>
> Yes,
>
> > > Jocke
> >
> > This will reproduce the stuck behaviour we see:
> > > cd /tmp (/tmp is a tmpfs)
> > > wget https://fullimage.xz/
>
> You've identified the cause here.
>
> > So just downloading it to tmpfs will confuse squashfs, seems to
> > me that squashfs somehow sees the xz compressed pages in page cache/VFS and
> > tries to do something with them.
>
> But this is completely the wrong conclusion. Squashfs doesn't "magically"
> see files downloaded into a different filesystem and try to do something
> with them.
>
> What is happening is that the system is thrashing, because the page cache
> doesn't have enough remaining space to contain the working set of the running
> application(s).
>
> See the Wikipedia article
> https://en.wikipedia.org/wiki/Thrashing_(computer_science)
>
> Tmpfs filesystems (/tmp here) are not backed by physical media, and their
> contents are stored in the page cache.
> So in effect, if fullImage.xz takes
> most of the page cache (system RAM), then there is not much space left to store
> the pages of the applications that are running, and they constantly replace
> each other's pages.
>
> To make it easy, imagine we have two processes A and B, and the page cache
> doesn't have enough space to store the pages of both processes.
>
> Now:
>
> 1. Process A starts and demand-pages pages into the page cache from the
> Squashfs root filesystem. This takes CPU resources to decompress the pages.
> Process A runs for a while and then gets descheduled.
>
> 2. Process B starts and demand-pages pages into the page cache, replacing
> Process A's pages. It runs for a while and then gets descheduled.
>
> 3. Process A restarts and finds all its pages have gone from the page cache,
> and so it has to re-demand-page the pages back. This replaces Process B's pages.
>
> 4. Process B restarts and finds all its pages have gone from the page cache ...
>
> In effect the system spends all its time reading pages from the
> Squashfs root filesystem, and doesn't do anything else, and hence it looks
> like it has hung.
>
> This is not a fault with Squashfs, and it will happen with any filesystem
> (ext4 etc) when system memory is too small to contain the working set of
> pages.
>
> Now, to repeat, what has caused this is the download of that fullImage.xz,
> which has filled most of the page cache (system RAM). To prevent that
> from happening, there are two obvious solutions:
>
> 1. Split fullImage.xz into pieces and only download one piece at a time. This
> will avoid filling up the page cache and the system thrashing.
>
> 2. Kill all unnecessary applications and processes before downloading
> fullImage.xz. In doing that you reduce the working set to fit the available
> RAM, which will again prevent thrashing.
>
> Hope that helps.
>
> Phillip

You are absolutely right, above was low RAM due to filling the tmpfs RAM.
But what threw me off was that I observed the same when streaming XZ to /dev/null.
After some digging I found why: some XZ options do not respect the "-0" preset
w.r.t. dict size and reset it back to the default.
Once I changed from
 "-0 --check=crc32 --arm --lzma2=lp=2,lc=2"
to
 "-0 --check=crc32 --lzma2=dict=128KiB"
I got a stable system.

Perhaps xz -l could be improved to include dict size to make this more obvious?

 Jocke
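
For readers following the thread, Phillip's four-step A/B description can be sketched as a toy simulation. This is only an illustration under stated assumptions: the page cache is modelled as a strict LRU of fixed size (the real kernel's reclaim policy is more elaborate), and the helper names `count_faults` and `run_schedule` as well as the page counts are made up for the example.

```python
from collections import OrderedDict


def count_faults(cache_pages, accesses):
    """Count misses for a strict LRU cache holding at most `cache_pages` pages."""
    cache = OrderedDict()
    faults = 0
    for page in accesses:
        if page in cache:
            cache.move_to_end(page)        # hit: mark as most recently used
        else:
            faults += 1                    # miss: page must be read (and decompressed) again
            cache[page] = True
            if len(cache) > cache_pages:
                cache.popitem(last=False)  # evict the least recently used page
    return faults


def run_schedule(pages_per_process, rounds):
    """A and B alternate; each time slice touches the process's whole working set."""
    a = [("A", i) for i in range(pages_per_process)]
    b = [("B", i) for i in range(pages_per_process)]
    return (a + b) * rounds


accesses = run_schedule(pages_per_process=100, rounds=50)
roomy = count_faults(cache_pages=200, accesses=accesses)  # both working sets fit
tight = count_faults(cache_pages=150, accesses=accesses)  # they do not fit

print(f"cache fits both working sets: {roomy} faults in {len(accesses)} accesses")
print(f"cache too small:              {tight} faults in {len(accesses)} accesses")
```

With room for both working sets, only the first touch of each page faults. Once the cache is smaller than the combined working set, every access in this model faults, which corresponds to the "spends all its time reading pages" situation described above.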
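
The dictionary-size observation can be checked without xz -l by reading the LZMA2 filter properties from the first block header of an .xz stream. The sketch below uses Python's standard `lzma` module instead of the xz command line, so it is an analogy to the options quoted above and not a reproduction of them: the ARM filter is left out, the payload is made up, and `read_vli` and `lzma2_dict_size` are hypothetical helpers written for this example. It relies on my reading of the .xz file format (12-byte stream header, filter ID 0x21 for LZMA2, dictionary size packed into one property byte).

```python
import lzma


def read_vli(buf, pos):
    """Decode an .xz variable-length integer; return (value, next_position)."""
    value = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def lzma2_dict_size(xz_bytes):
    """Return the LZMA2 dictionary size recorded in the first block header."""
    assert xz_bytes[:6] == b"\xfd7zXZ\x00", "not an .xz stream"
    pos = 12                              # skip the 12-byte stream header
    flags = xz_bytes[pos + 1]             # byte 0 of the block header is its size
    pos += 2
    if flags & 0x40:                      # optional compressed-size field
        _, pos = read_vli(xz_bytes, pos)
    if flags & 0x80:                      # optional uncompressed-size field
        _, pos = read_vli(xz_bytes, pos)
    for _ in range((flags & 0x03) + 1):   # a block has 1 to 4 filters
        filter_id, pos = read_vli(xz_bytes, pos)
        prop_len, pos = read_vli(xz_bytes, pos)
        props = xz_bytes[pos:pos + prop_len]
        pos += prop_len
        if filter_id == 0x21:             # LZMA2
            bits = props[0] & 0x3F
            if bits == 40:
                return 2**32 - 1
            return (2 | (bits & 1)) << (bits // 2 + 11)
    raise ValueError("no LZMA2 filter in the first block")


data = b"example payload " * 4096         # made-up input

# Preset 0 alone: dictionary size comes from the preset.
preset_only = lzma.compress(data, check=lzma.CHECK_CRC32, preset=0)
# Custom LZMA2 options without a preset: unspecified options fall back to the default preset.
custom_chain = lzma.compress(data, check=lzma.CHECK_CRC32,
                             filters=[{"id": lzma.FILTER_LZMA2, "lp": 2, "lc": 2}])
# Custom LZMA2 options with an explicit dictionary size.
explicit_dict = lzma.compress(data, check=lzma.CHECK_CRC32,
                              filters=[{"id": lzma.FILTER_LZMA2, "preset": 0,
                                        "dict_size": 128 * 1024}])

for name, blob in [("preset 0", preset_only), ("lp=2,lc=2", custom_chain),
                   ("dict=128KiB", explicit_dict)]:
    print(f"{name:12s} dict size = {lzma2_dict_size(blob) // 1024} KiB")
```

In this sketch the custom chain without an explicit dictionary size ends up with the default preset's dictionary, not the preset 0 one. That appears to be the same effect as described above, and a decompressor has to allocate a dictionary of the recorded size, which matters on a system that is already short of RAM.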