On Thu, 26 Jun 2025 12:26:19 +0200
Lorenzo Pieralisi <lpieralisi@xxxxxxxxxx> wrote:

> The GICv5 architecture implements Interrupt Translation Service
> (ITS) components in order to translate events coming from peripherals
> into interrupt events delivered to the connected IRSes.
>
> Events (ie MSI memory writes to ITS translate frame), are translated
> by the ITS using tables kept in memory.
>
> ITS translation tables for peripherals is kept in memory storage
> (device table [DT] and Interrupt Translation Table [ITT]) that
> is allocated by the driver on boot.
>
> Both tables can be 1- or 2-level; the structure is chosen by the
> driver after probing the ITS HW parameters and checking the
> allowed table splits and supported {device/event}_IDbits.
>
> DT table entries are allocated on demand (ie when a device is
> probed); the DT table is sized using the number of supported
> deviceID bits in that that's a system design decision (ie the
> number of deviceID bits implemented should reflect the number
> of devices expected in a system) therefore it makes sense to
> allocate a DT table that can cater for the maximum number of
> devices.
>
> DT and ITT tables are allocated using the kmalloc interface;
> the allocation size may be smaller than a page or larger,
> and must provide contiguous memory pages.
>
> LPIs INTIDs backing the device events are allocated one-by-one
> and only upon Linux IRQ allocation; this to avoid preallocating
> a large number of LPIs to cover the HW device MSI vector
> size whereas few MSI entries are actually enabled by a device.
>
> ITS cacheability/shareability attributes are programmed
> according to the provided firmware ITS description.
>
> The GICv5 partially reuses the GICv3 ITS MSI parent infrastructure
> and adds functions required to retrieve the ITS translate frame
> addresses out of msi-map and msi-parent properties to implement
> the GICv5 ITS MSI parent callbacks.
>
> Co-developed-by: Sascha Bischoff <sascha.bischoff@xxxxxxx>
> Signed-off-by: Sascha Bischoff <sascha.bischoff@xxxxxxx>
> Co-developed-by: Timothy Hayes <timothy.hayes@xxxxxxx>
> Signed-off-by: Timothy Hayes <timothy.hayes@xxxxxxx>
> Signed-off-by: Lorenzo Pieralisi <lpieralisi@xxxxxxxxxx>
> Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> Cc: Marc Zyngier <maz@xxxxxxxxxx>

Hi Lorenzo,

It almost certainly doesn't matter, but there are a couple of release
paths in here where things don't happen in the same order as the
equivalent error tear-down paths (i.e. they are not the reverse of
setup). There may well be a good reason for that, but I couldn't
immediately spot what it was.

Also, a follow-up similar to my earlier comment about the table sizing
code not matching the comments above it. Same thing going on here.

Jonathan

> diff --git a/drivers/irqchip/irq-gic-v5-its.c b/drivers/irqchip/irq-gic-v5-its.c
> new file mode 100644
> index 000000000000..cba632eb0273
> --- /dev/null
> +++ b/drivers/irqchip/irq-gic-v5-its.c
> +/*
> + * Function to check whether the device table or ITT table support
> + * a two-level table and if so depending on the number of id_bits
> + * requested, determine whether a two-level table is required.
> + *
> + * Return the 2-level size value if a two level table is deemed
> + * necessary.
> + */
> +static bool gicv5_its_l2sz_two_level(bool devtab, u32 its_idr1, u8 id_bits, u8 *sz)
> +{
> +	unsigned int l2_bits, l2_sz;
> +
> +	if (devtab && !FIELD_GET(GICV5_ITS_IDR1_DT_LEVELS, its_idr1))
> +		return false;
> +
> +	if (!devtab && !FIELD_GET(GICV5_ITS_IDR1_ITT_LEVELS, its_idr1))
> +		return false;
> +
> +	/*
> +	 * Pick an L2 size that matches the pagesize; if a match
> +	 * is not found, go for the smallest supported l2 size granule.

Similar to before, this description is confusing. If the page size is
64K and 16K + 4K are supported, we choose 16K, which is not the
smallest supported granule (4K is). The condition the comment refers
to only applies when nothing but granules larger than the page size is
supported. A possible rewording is sketched at the end of this mail.

> +	 *
> +	 * This ensures that we will always be able to allocate
> +	 * contiguous memory at L2.
> +	 */
> +	switch (PAGE_SIZE) {
> +	case SZ_64K:
> +		if (GICV5_ITS_IDR1_L2SZ_SUPPORT_64KB(its_idr1)) {
> +			l2_sz = GICV5_ITS_DT_ITT_CFGR_L2SZ_64k;
> +			break;
> +		}
> +		fallthrough;
> +	case SZ_16K:
> +		if (GICV5_ITS_IDR1_L2SZ_SUPPORT_16KB(its_idr1)) {
> +			l2_sz = GICV5_ITS_DT_ITT_CFGR_L2SZ_16k;
> +			break;
> +		}
> +		fallthrough;
> +	case SZ_4K:
> +		if (GICV5_ITS_IDR1_L2SZ_SUPPORT_4KB(its_idr1)) {
> +			l2_sz = GICV5_ITS_DT_ITT_CFGR_L2SZ_4k;
> +			break;
> +		}
> +		if (GICV5_ITS_IDR1_L2SZ_SUPPORT_16KB(its_idr1)) {
> +			l2_sz = GICV5_ITS_DT_ITT_CFGR_L2SZ_16k;
> +			break;
> +		}
> +		if (GICV5_ITS_IDR1_L2SZ_SUPPORT_64KB(its_idr1)) {
> +			l2_sz = GICV5_ITS_DT_ITT_CFGR_L2SZ_64k;
> +			break;
> +		}
> +
> +		l2_sz = GICV5_ITS_DT_ITT_CFGR_L2SZ_4k;
> +		break;
> +	}
> +
> +	l2_bits = gicv5_its_l2sz_to_l2_bits(l2_sz);
> +
> +	if (l2_bits > id_bits)
> +		return false;
> +
> +	*sz = l2_sz;
> +
> +	return true;
> +}
> +/*
> + * Register a new device in the device table. Allocate an ITT and
> + * program the L2DTE entry according to the ITT structure that
> + * was chosen.
> + */
> +static int gicv5_its_device_register(struct gicv5_its_chip_data *its,
> +				     struct gicv5_its_dev *its_dev)
> +{
> +	u8 event_id_bits, device_id_bits, itt_struct, itt_l2sz;
> +	phys_addr_t itt_phys_base;
> +	bool two_level_itt;
> +	u32 idr1, idr2;
> +	__le64 *dte;
> +	u64 val;
> +	int ret;
> +
> +	device_id_bits = devtab_cfgr_field(its, DEVICEID_BITS);
> +
> +	if (its_dev->device_id >= BIT(device_id_bits)) {
> +		pr_err("Supplied DeviceID (%u) outside of Device Table range (%u)!",
> +		       its_dev->device_id, (u32)GENMASK(device_id_bits - 1, 0));
> +		return -EINVAL;
> +	}
> +
> +	dte = gicv5_its_devtab_get_dte_ref(its, its_dev->device_id, true);
> +	if (!dte)
> +		return -ENOMEM;
> +
> +	if (FIELD_GET(GICV5_DTL2E_VALID, le64_to_cpu(*dte)))
> +		return -EBUSY;
> +
> +	/*
> +	 * Determine how many bits we need, validate those against the max.
> +	 * Based on these, determine if we should go for a 1- or 2-level ITT.
> +	 */
> +	event_id_bits = order_base_2(its_dev->num_events);
> +
> +	idr2 = its_readl_relaxed(its, GICV5_ITS_IDR2);
> +
> +	if (event_id_bits > FIELD_GET(GICV5_ITS_IDR2_EVENTID_BITS, idr2)) {
> +		pr_err("Required EventID bits (%u) larger than supported bits (%u)!",
> +		       event_id_bits,
> +		       (u8)FIELD_GET(GICV5_ITS_IDR2_EVENTID_BITS, idr2));
> +		return -EINVAL;
> +	}
> +
> +	idr1 = its_readl_relaxed(its, GICV5_ITS_IDR1);
> +
> +	/*
> +	 * L2 ITT size is programmed into the L2DTE regardless of
> +	 * whether a two-level or linear ITT is built, init it.
> +	 */
> +	itt_l2sz = 0;
> +
> +	two_level_itt = gicv5_its_l2sz_two_level(false, idr1, event_id_bits,
> +						 &itt_l2sz);
> +	if (two_level_itt)
> +		ret = gicv5_its_create_itt_two_level(its, its_dev, event_id_bits,
> +						     itt_l2sz,
> +						     its_dev->num_events);
> +	else
> +		ret = gicv5_its_create_itt_linear(its, its_dev, event_id_bits);
> +	if (ret)
> +		return ret;
> +
> +	itt_phys_base = two_level_itt ? virt_to_phys(its_dev->itt_cfg.l2.l1itt) :
> +					virt_to_phys(its_dev->itt_cfg.linear.itt);
> +
> +	itt_struct = two_level_itt ? GICV5_ITS_DT_ITT_CFGR_STRUCTURE_TWO_LEVEL :
> +				     GICV5_ITS_DT_ITT_CFGR_STRUCTURE_LINEAR;
> +
> +	val = FIELD_PREP(GICV5_DTL2E_EVENT_ID_BITS, event_id_bits) |
> +	      FIELD_PREP(GICV5_DTL2E_ITT_STRUCTURE, itt_struct) |
> +	      (itt_phys_base & GICV5_DTL2E_ITT_ADDR_MASK) |
> +	      FIELD_PREP(GICV5_DTL2E_ITT_L2SZ, itt_l2sz) |
> +	      FIELD_PREP(GICV5_DTL2E_VALID, 0x1);
> +
> +	its_write_table_entry(its, dte, val);
> +
> +	ret = gicv5_its_device_cache_inv(its, its_dev);
> +	if (ret) {
> +		gicv5_its_free_itt(its_dev);
> +		its_write_table_entry(its, dte, 0);

If it makes no difference, unwind in reverse order of setup, so swap
the two lines above (i.e. clear the DTE before freeing the ITT it
points to). A minimal sketch is at the end of this mail.

> +		return ret;
> +	}
> +
> +	return 0;
> +}
> +static struct gicv5_its_dev *gicv5_its_alloc_device(struct gicv5_its_chip_data *its, int nvec,
> +						     u32 dev_id)
> +{
> +	struct gicv5_its_dev *its_dev;
> +	void *entry;
> +	int ret;
> +
> +	its_dev = gicv5_its_find_device(its, dev_id);
> +	if (!IS_ERR(its_dev)) {
> +		pr_err("A device with this DeviceID (0x%x) has already been registered.\n",
> +		       dev_id);
> +
> +		return ERR_PTR(-EBUSY);
> +	}
> +
> +	its_dev = kzalloc(sizeof(*its_dev), GFP_KERNEL);
> +	if (!its_dev)
> +		return ERR_PTR(-ENOMEM);
> +
> +	its_dev->device_id = dev_id;
> +	its_dev->num_events = nvec;
> +
> +	ret = gicv5_its_device_register(its, its_dev);
> +	if (ret) {
> +		pr_err("Failed to register the device\n");
> +		goto out_dev_free;
> +	}
> +
> +	gicv5_its_device_cache_inv(its, its_dev);
> +
> +	its_dev->its_node = its;
> +
> +	its_dev->event_map = (unsigned long *)bitmap_zalloc(its_dev->num_events, GFP_KERNEL);
> +	if (!its_dev->event_map) {
> +		ret = -ENOMEM;
> +		goto out_unregister;
> +	}
> +
> +	entry = xa_store(&its->its_devices, dev_id, its_dev, GFP_KERNEL);
> +	if (xa_is_err(entry)) {
> +		ret = xa_err(entry);
> +		goto out_bitmap_free;
> +	}
> +
> +	return its_dev;
> +
> +out_bitmap_free:
> +	bitmap_free(its_dev->event_map);
> +out_unregister:
> +	gicv5_its_device_unregister(its, its_dev);
> +out_dev_free:
> +	kfree(its_dev);
> +	return ERR_PTR(ret);
> +}
> +
> +static int gicv5_its_msi_prepare(struct irq_domain *domain, struct device *dev,
> +				 int nvec, msi_alloc_info_t *info)
> +{
> +	u32 dev_id = info->scratchpad[0].ul;
> +	struct msi_domain_info *msi_info;
> +	struct gicv5_its_chip_data *its;
> +	struct gicv5_its_dev *its_dev;
> +
> +	msi_info = msi_get_domain_info(domain);
> +	its = msi_info->data;
> +
> +	guard(mutex)(&its->dev_alloc_lock);
> +
> +	its_dev = gicv5_its_alloc_device(its, nvec, dev_id);
> +	if (IS_ERR(its_dev))
> +		return PTR_ERR(its_dev);
> +
> +	its_dev->its_trans_phys_base = info->scratchpad[1].ul;
> +	info->scratchpad[0].ptr = its_dev;
> +
> +	return 0;
> +}
> +
> +static void gicv5_its_msi_teardown(struct irq_domain *domain, msi_alloc_info_t *info)
> +{
> +	struct gicv5_its_dev *its_dev = info->scratchpad[0].ptr;
> +	struct msi_domain_info *msi_info;
> +	struct gicv5_its_chip_data *its;
> +
> +	msi_info = msi_get_domain_info(domain);
> +	its = msi_info->data;
> +
> +	guard(mutex)(&its->dev_alloc_lock);
> +
> +	if (WARN_ON_ONCE(!bitmap_empty(its_dev->event_map, its_dev->num_events)))
> +		return;
> +	gicv5_its_device_unregister(its, its_dev);
> +	bitmap_free(its_dev->event_map);
> +	xa_erase(&its->its_devices, its_dev->device_id);

I was expecting this to be in the reverse order of what happens in
*msi_prepare() (and *msi_alloc() under that). That would give the
order:

	xa_erase();
	bitmap_free();
	gicv5_its_device_unregister();
	kfree(its_dev);

If there is a reason for this ordering, it might be good to add a
comment calling it out. Otherwise, a sketch of the reordered teardown
is at the end of this mail.

> +	kfree(its_dev);
> +}
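To expand on the table sizing comment: as far as I can tell the code
picks the largest supported L2 granule that does not exceed PAGE_SIZE,
and only falls back to the smallest supported granule when every
supported granule is larger than a page. If that reading is right,
something along these lines (wording only, no functional change
intended, just a sketch) would match the code better:

	/*
	 * Pick the largest supported L2 size granule that does not
	 * exceed PAGE_SIZE, so that an L2 table can always be
	 * allocated as physically contiguous memory. If only granules
	 * larger than PAGE_SIZE are supported, fall back to the
	 * smallest supported one.
	 */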
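For the gicv5_its_device_register() error path, the reverse-of-setup
unwind I had in mind is simply the following (untested sketch,
assuming its_write_table_entry() and gicv5_its_free_itt() have no
hidden ordering requirement between them):

	ret = gicv5_its_device_cache_inv(its, its_dev);
	if (ret) {
		/* Undo in reverse order: clear the DTE first, then free the ITT it points to */
		its_write_table_entry(its, dte, 0);
		gicv5_its_free_itt(its_dev);
		return ret;
	}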
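And similarly for gicv5_its_msi_teardown(): a teardown mirroring the
reverse of gicv5_its_alloc_device() (kzalloc() -> device register ->
bitmap_zalloc() -> xa_store(), undone back to front) would look
something like the sketch below. Untested, and only worth doing if
there really is no dependency forcing the current order:

static void gicv5_its_msi_teardown(struct irq_domain *domain, msi_alloc_info_t *info)
{
	struct gicv5_its_dev *its_dev = info->scratchpad[0].ptr;
	struct msi_domain_info *msi_info;
	struct gicv5_its_chip_data *its;

	msi_info = msi_get_domain_info(domain);
	its = msi_info->data;

	guard(mutex)(&its->dev_alloc_lock);

	if (WARN_ON_ONCE(!bitmap_empty(its_dev->event_map, its_dev->num_events)))
		return;

	/* Release in the reverse order of gicv5_its_alloc_device() */
	xa_erase(&its->its_devices, its_dev->device_id);
	bitmap_free(its_dev->event_map);
	gicv5_its_device_unregister(its, its_dev);
	kfree(its_dev);
}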