[RFC] Looking at the IAA driver

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

I've been staring at the crypto IAA driver and checking if "wq_table"
relies on disabling bottom halves/ expecting that softirq is not
preemtible. It does not look like it does. wq_table_next_wq() returns a
"struct idxd_wq *" which could return twice the same but the struct
itself has its own locking which looks fine.
Please correct me if I am wrong ;)

While looking at it, I noticed a few other things:

- wq_table
  This could be initialized as DEFINE_PER_CPU() instead the
  alloc_percpu() it has now. I see no benefit from delaying it. The
  driver is loaded on its own once the dmaengine counterpart is loaded
  first and added the special bus plus the device. So the allocation
  could be avoided.

- nr_cpus, nr_nodes, nr_cpus_per_node.
  This is derived from
   - nr_cpus = num_possible_cpus();
   - for_each_node_with_cpus(node) nr_nodes++;
   - nr_cpus_per_node = nr_cpus / nr_nodes;

   Moving on. nr_cpus_per_node is only used as
       cpus_per_iaa = (nr_nodes * nr_cpus_per_node) / nr_iaa;

   Given the above definition for nr_cpus_per_node we can substitute
       cpus_per_iaa = nr_cpus / nr_iaa

   and then remove nr_cpus_per_node as I see no other user. If we ignore
   the one pr_debug() then we could also remove nr_nodes because there
   are no users.

   There are several iterations which go from 0 to nr_nodes. I would
   recommend using for_each_possible_cpu(). Then nr_cpus could be
   removed.

- wq_table->wqs
  This is allocated as (max_wqs * sizeof(struct wq *)). I don't know
  where "struct wq" is from but it is not "struct idxd_wq" and gcc
  probably doesn't care because it is a pointer. Also the memory for the
  per-CPU memory is allocated always from the _current_ NUMA node. Not
  bad but also not ideal.
  How huge can max_wqs can get? If it is, say 64, then you could make a
  fix array of 64 which is part of struct wq_table_entry and avoid the
  allocation. 
  Another variant would be to do something like
   struct wq_table_entry {
         int     max_wqs;
         int     n_wqs;
         int     cur_wq;
         struct idxd_wq wqs[];
   };
   and then
       wq_table = alloc_percpu(struct_size(wq_table, wqs, max_wqs));

   voilà.
   I didn't figure out how the final struct idxd_wq is allocated but it
   would make sense to allocate it node local the CPU if possible.

Sebastian





[Index of Archives]     [Kernel]     [Gnu Classpath]     [Gnu Crypto]     [DM Crypt]     [Netfilter]     [Bugtraq]
  Powered by Linux