> > If ZSWAP_MAX_BATCH_SIZE is set to 8 and there is no hardware batching,
> > compression is done with a step size of 1. If the hardware step size is 4,
> > compression occurs in two steps. If the hardware step size is 6, the first
> > compression uses a step size of 6, and the second uses a step size of 2.
> > Do you think this will work?
>
> Hi Barry,
>
> This would be non-optimal from code simplicity and latency perspectives.
> One of the benefits of using the hardware accelerator's "batch parallelism"
> is cost amortization across the batch. We might lose this benefit if we make
> multiple calls to zswap_compress() to ask the hardware accelerator to
> batch compress in smaller batches. Compression throughput would also
> be sub-optimal.

I guess it wouldn't be an issue if both ZSWAP_MAX_BATCH_SIZE and
pool->compr_batch_size are powers of two. As you mentioned, we still gain
an improvement from ZSWAP_MAX_BATCH_SIZE batching even when
pool->compr_batch_size == 1, by compressing pages one by one while
batching other work such as zswap_entries_cache_alloc_batch()?

> In my patch-series, the rule is simple: if an algorithm has specified a
> batch-size, carve out the folio by that "batch-size" # of pages to be
> compressed as a batch in zswap_compress(). This custom batch-size
> is capped at ZSWAP_MAX_BATCH_SIZE.
>
> If an algorithm has not specified a batch-size, the default batch-size
> is 1. In this case, carve out the folio by ZSWAP_MAX_BATCH_SIZE
> # of pages to be compressed as a batch in zswap_compress().

Yes, I understand your rule. However, having two global variables is still
somewhat confusing. It might be clearer to use a single variable with a
comment, since one variable can fully determine the value of the other.

Can we get the batch_size at runtime based on pool->compr_batch_size?

/*
 * If hardware compression supports batching, we use the hardware step
 * size. Otherwise, we use ZSWAP_MAX_BATCH_SIZE for batching, but still
 * compress one page at a time.
 */
batch_size = pool->compr_batch_size > 1 ? pool->compr_batch_size :
             ZSWAP_MAX_BATCH_SIZE;

We probably don't need this if both pool->compr_batch_size and
ZSWAP_MAX_BATCH_SIZE are powers of two?

> > I don't quite understand why you want to save
> > ZSWAP_MAX_BATCH_SIZE - X resources, since even without hardware
> > batching you are still allocating all ZSWAP_MAX_BATCH_SIZE resources.
> > This is the case for all platforms except yours.
>
> Not sure I understand... Just to clarify, this is not done to save on
> resources, rather for the reasons stated above.
>
> We are already saving on resources by only allocating
> "pool->compr_batch_size" number of resources
> (*not* ZSWAP_MAX_BATCH_SIZE resources):
>
> pool->compr_batch_size = min(ZSWAP_MAX_BATCH_SIZE,
>                              crypto_acomp_batch_size(acomp_ctx->acomp));
>
> For non-Intel platforms, this means only 1 dst buffer is allocated, as
> explained in the commit log for this patch.

You are right. I misunderstood your code :-)

> "A "u8 compr_batch_size" member is added to "struct zswap_pool", as per
> Yosry's suggestion. pool->compr_batch_size is set as the minimum of the
> compressor's max batch-size and ZSWAP_MAX_BATCH_SIZE. Accordingly, it
> proceeds to allocate the necessary compression dst buffers in the
> per-CPU acomp_ctx."

This is fine, but it still doesn't provide a strong reason for having two
global variables when one can fully determine the value of the other.

Thanks
Barry