On Fri, Aug 29, 2025 at 10:57 AM Sridhar, Kanchana P
<kanchana.p.sridhar@xxxxxxxxx> wrote:
>
>
> > -----Original Message-----
> > From: Barry Song <21cnbao@xxxxxxxxx>
> > Sent: Thursday, August 28, 2025 4:29 PM
> > To: Sridhar, Kanchana P <kanchana.p.sridhar@xxxxxxxxx>
> > Cc: linux-kernel@xxxxxxxxxxxxxxx; linux-mm@xxxxxxxxx;
> > hannes@xxxxxxxxxxx; yosry.ahmed@xxxxxxxxx; nphamcs@xxxxxxxxx;
> > chengming.zhou@xxxxxxxxx; usamaarif642@xxxxxxxxx;
> > ryan.roberts@xxxxxxx; ying.huang@xxxxxxxxxxxxxxxxx; akpm@linux-
> > foundation.org; senozhatsky@xxxxxxxxxxxx; linux-crypto@xxxxxxxxxxxxxxx;
> > herbert@xxxxxxxxxxxxxxxxxxx; davem@xxxxxxxxxxxxx;
> > clabbe@xxxxxxxxxxxx; ardb@xxxxxxxxxx; ebiggers@xxxxxxxxxx;
> > surenb@xxxxxxxxxx; Accardi, Kristen C <kristen.c.accardi@xxxxxxxxx>;
> > Gomes, Vinicius <vinicius.gomes@xxxxxxxxx>; Feghali, Wajdi K
> > <wajdi.k.feghali@xxxxxxxxx>; Gopal, Vinodh <vinodh.gopal@xxxxxxxxx>
> > Subject: Re: [PATCH v11 22/24] mm: zswap: Allocate pool batching resources
> > if the compressor supports batching.
> >
> > > >
> > > > If ZSWAP_MAX_BATCH_SIZE is set to 8 and there is no hardware batching,
> > > > compression is done with a step size of 1. If the hardware step size is 4,
> > > > compression occurs in two steps. If the hardware step size is 6, the first
> > > > compression uses a step size of 6, and the second uses a step size of 2.
> > > > Do you think this will work?
> > >
> > > Hi Barry,
> > >
> > > This would be non-optimal from code simplicity and latency perspectives.
> > > One of the benefits of using the hardware accelerator's "batch parallelism"
> > > is cost amortization across the batch. We might lose this benefit if we make
> > > multiple calls to zswap_compress() to ask the hardware accelerator to
> > > batch compress in smaller batches. Compression throughput would also
> > > be sub-optimal.
> >
> > I guess it wouldn’t be an issue if both ZSWAP_MAX_BATCH_SIZE and
> > pool->compr_batch_size are powers of two. As you mentioned, we still
> > gain improvement with ZSWAP_MAX_BATCH_SIZE batching even when
> > pool->compr_batch_size == 1, by compressing pages one by one but
> > batching other work such as zswap_entries_cache_alloc_batch() ?
> >
> > >
> > > In my patch-series, the rule is simple: if an algorithm has specified a
> > > batch-size, carve out the folio by that "batch-size" # of pages to be
> > > compressed as a batch in zswap_compress(). This custom batch-size
> > > is capped at ZSWAP_MAX_BATCH_SIZE.
> > >
> > > If an algorithm has not specified a batch-size, the default batch-size
> > > is 1. In this case, carve out the folio by ZSWAP_MAX_BATCH_SIZE
> > > # of pages to be compressed as a batch in zswap_compress().
> >
> > Yes, I understand your rule. However, having two global variables is still
> > somewhat confusing. It might be clearer to use a single variable with a
> > comment, since one variable can clearly determine the value of the other.
> >
> > Can we get the batch_size at runtime based on pool->compr_batch_size?
> >
> > /*
> >  * If hardware compression supports batching, we use the hardware step size.
> >  * Otherwise, we use ZSWAP_MAX_BATCH_SIZE for batching, but still compress
> >  * one page at a time.
> >  */
> > batch_size = pool->compr_batch_size > 1 ? pool->compr_batch_size :
> >              ZSWAP_MAX_BATCH_SIZE;
> >
> > We probably don’t need this if both pool->compr_batch_size and
> > ZSWAP_MAX_BATCH_SIZE are powers of two?
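
To make the shape I have in mind concrete, here is a rough sketch only
(the helper names below are placeholders, not functions from your
series, and error unwinding is ignored):

/* Placeholder helpers, purely for illustration: */
static bool prepare_store_batch(struct zswap_pool *pool, unsigned int nr);
static bool compress_pages(struct zswap_pool *pool, struct folio *folio,
                           long start, unsigned int nr);

static int zswap_store_sketch(struct zswap_pool *pool, struct folio *folio,
                              long nr_pages)
{
        unsigned int batch, step;
        long index, i;

        for (index = 0; index < nr_pages; index += batch) {
                /* store-side batch: always up to ZSWAP_MAX_BATCH_SIZE pages */
                batch = min_t(unsigned int, nr_pages - index,
                              ZSWAP_MAX_BATCH_SIZE);

                /* batched non-compression work, e.g. entry allocation */
                if (!prepare_store_batch(pool, batch))
                        return -ENOMEM;

                /* compressor step: pool->compr_batch_size pages per call */
                for (i = 0; i < batch; i += step) {
                        step = min_t(unsigned int, batch - i,
                                     pool->compr_batch_size);
                        if (!compress_pages(pool, folio, index + i, step))
                                return -EINVAL;
                }
        }

        return 0;
}

When both ZSWAP_MAX_BATCH_SIZE and pool->compr_batch_size are powers of
two, the inner step always divides the outer batch evenly, so the 6 + 2
style split discussed above can’t happen, and compr_batch_size == 1
simply degenerates to compressing one page at a time while still
batching the other work.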
>
> I am not sure I understand this rationale, but I do want to reiterate
> that the patch-set implements a simple set of rules/design choices
> to provide a batching framework for software and hardware compressors,
> that has shown good performance improvements with both, while
> unifying zswap_store()/zswap_compress() code paths for both.

I’m really curious: if ZSWAP_MAX_BATCH_SIZE = 8 and compr_batch_size = 4,
why wouldn’t batch_size = 8 and compr_batch_size = 4 perform better than
batch_size = 4 and compr_batch_size = 4?

In short, I’d like the case of compr_batch_size == 1 to be treated the
same as compr_batch_size == 2, 4, etc., since you can still see
performance improvements when ZSWAP_MAX_BATCH_SIZE = 8 and
compr_batch_size == 1, as batching occurs even outside compression.
Therefore, I would expect batch_size == 8 and compr_batch_size == 2 to
perform better than when both are 2. The only thing preventing this
from happening is that compr_batch_size might be 5, 6, or 7, which are
not powers of two?

> As explained before, keeping the two variables as distinct u8 members
> of struct zswap_pool is a design choice with these benefits:
>
> 1) Saves computes by avoiding computing this in performance-critical
> zswap_store() code. I have verified that dynamically computing the
> batch_size based on pool->compr_batch_size impacts latency.

Ok, I’m a bit surprised, since this small computation wouldn’t cause a
branch misprediction at all.

In any case, if you want to keep both variables, that’s fine. But can
we at least rename one of them? For example, use pool->store_batch_size
and pool->compr_batch_size instead of pool->batch_size and
pool->compr_batch_size, since pool->batch_size generally has a broader
semantic scope than compr_batch_size.

Thanks
Barry