Re: [RFC bpf-next 2/9] bpf, x86: add new map type: instructions set

Anton Protopopov <a.s.protopopov@xxxxxxxxx> · Thu, 19 Jun 2025 18:55:23 +0000

On 25/06/17 05:57PM, Eduard Zingerman wrote:
> On Sun, 2025-06-15 at 08:59 +0000, Anton Protopopov wrote:
> 
> Meta: "instruction set" is a super confusing name, at-least for me the
>       first thought is about actual set of instructions supported by
>       some h/w. instruction_info? instruction_offset? just
>       "iset"/"ioffset"?
> 
> [...]
> 
> > On map creation/initialization, before loading the program, each
> > element of the map should be initialized to point to an instruction
> > offset within the program. Before the program load such maps should
> > be made frozen. After the program verification xlated and jitted
> > offsets can be read via the bpf(2) syscall.
> 
> I think such maps would be a bit more ergonomic it original
> instruction index would be saved as well, e.g:
> 
>   (original_offset, xlated_offset, jitted_offset)
> 
> Otherwise user would have to recover original offset from some
> external mapping. This information is stored in orig_xlated_off
> anyway.

I do agree that this might be convenient to have the original_offset.
But the only use case I see here is "BPF debuggers". Such programs
will be able to build this mapping themselves.

I would add it as is, the only obstacle I see now is map key size.
Now from BPF point of view and from userspace point of view it is 8.
Userspace sees (u32 xlated, u32 jitted), and BPF sees *ip. I haven't
looked at how much work it is to have different key sizes for
userspace and BPF, and if this breaks things too much.

> [...]
> 
> > diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
> > index 15672cb926fc..923c38f212dc 100644
> > --- a/arch/x86/net/bpf_jit_comp.c
> > +++ b/arch/x86/net/bpf_jit_comp.c
> > @@ -1615,6 +1615,8 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image, u8 *rw_image
> 
> [...]
> 
> > @@ -2642,6 +2645,14 @@ st:			if (is_imm8(insn->off))
> >  				return -EFAULT;
> >  			}
> >  			memcpy(rw_image + proglen, temp, ilen);
> > +
> > +			/*
> > +			 * Instruction sets need to know how xlated code
> > +			 * maps to jited code
> > +			 */
> > +			abs_xlated_off = bpf_prog->aux->subprog_start + i - 1 - adjust_off;
> 
> Nit: `adjust_off` is a bit hard to follow, maybe move the following:
> 
> 	abs_xlated_off = bpf_prog->aux->subprog_start + i - 1;
> 
>      to the beginning of the loop?

Thank, this isn't transparent indeed. I will move things to be more
readable.

> 
> > +			bpf_prog_update_insn_ptr(bpf_prog, abs_xlated_off, proglen, ilen,
> > +						 jmp_offset, image + proglen);
> 
> Nit: initialize `jmp_offset` at each loop iteration to 0?
>      otherwise it would denote jump offset of the last processed
>      jump instruction for all following non-jump instructions.

Yes, thanks.

> >  		}
> >  		proglen += ilen;
> >  		addrs[i] = proglen;
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > index 8189f49e43d6..008bcd44c60e 100644
> > --- a/include/linux/bpf.h
> > +++ b/include/linux/bpf.h
> > @@ -3596,4 +3596,25 @@ static inline bool bpf_is_subprog(const struct bpf_prog *prog)
> >  	return prog->aux->func_idx != 0;
> >  }
> >
> > +int bpf_insn_set_init(struct bpf_map *map, const struct bpf_prog *prog);
> > +int bpf_insn_set_ready(struct bpf_map *map);
> > +void bpf_insn_set_release(struct bpf_map *map);
> > +void bpf_insn_set_adjust(struct bpf_map *map, u32 off, u32 len);
> > +void bpf_insn_set_adjust_after_remove(struct bpf_map *map, u32 off, u32 len);
> > +
> > +struct bpf_insn_ptr {
> 
> Could you please add comments describing each field?
> E.g.: "address of the instruction in the jitted image",
>       "for jump instructions, the relative offset of the jump target",
>       "index of the original instruction",
>       "original value of the corresponding bpf_insn_set_value.xlated_off".

Sure, will add.

(Also, not to repeat "yes" many times, all your comments below look
to make sense, will address them. Thanks.)

> > +	void *jitted_ip;
> > +	u32 jitted_len;
> > +	int jitted_jump_offset;
> > +	struct bpf_insn_set_value user_value; /* userspace-visible value */
> > +	u32 orig_xlated_off;
> > +};
> 
> [...]
> 
> > diff --git a/kernel/bpf/bpf_insn_set.c b/kernel/bpf/bpf_insn_set.c
> > new file mode 100644
> 
> [...]
> 
> > +static int insn_set_check_btf(const struct bpf_map *map,
> > +			      const struct btf *btf,
> > +			      const struct btf_type *key_type,
> > +			      const struct btf_type *value_type)
> > +{
> > +	u32 int_data;
> > +
> > +	if (BTF_INFO_KIND(key_type->info) != BTF_KIND_INT)
> > +		return -EINVAL;
> > +
> > +	if (BTF_INFO_KIND(value_type->info) != BTF_KIND_INT)
> > +		return -EINVAL;
> > +
> > +	int_data = *(u32 *)(key_type + 1);
> 
> Nit: use btf_type_int() accessor?
> 
> > +	if (BTF_INT_BITS(int_data) != 32 || BTF_INT_OFFSET(int_data))
> > +		return -EINVAL;
> > +
> > +	int_data = *(u32 *)(value_type + 1);
> > +	if (BTF_INT_BITS(int_data) != 32 || BTF_INT_OFFSET(int_data))
> 
> Should this check for `BTF_INT_BITS(int_data) != 64`?
> 
> > +		return -EINVAL;
> > +
> > +	return 0;
> > +}
> 
> [...]
> 
> > +int bpf_insn_set_init(struct bpf_map *map, const struct bpf_prog *prog)
> > +{
> > +	struct bpf_insn_set *insn_set = cast_insn_set(map);
> > +	int i;
> > +
> > +	if (!is_frozen(map))
> > +		return -EINVAL;
> > +
> > +	if (!valid_offsets(insn_set, prog))
> > +		return -EINVAL;
> > +
> > +	/*
> > +	 * There can be only one program using the map
> > +	 */
> > +	mutex_lock(&insn_set->state_mutex);
> > +	if (insn_set->state != INSN_SET_STATE_FREE) {
> > +		mutex_unlock(&insn_set->state_mutex);
> > +		return -EBUSY;
> > +	}
> > +	insn_set->state = INSN_SET_STATE_INIT;
> > +	mutex_unlock(&insn_set->state_mutex);
> > +
> > +	/*
> > +	 * Reset all the map indexes to the original values.  This is needed,
> > +	 * e.g., when a replay of verification with different log level should
> > +	 * be performed.
> > +	 */
> > +	for (i = 0; i < map->max_entries; i++)
> > +		insn_set->ptrs[i].user_value.xlated_off = insn_set->ptrs[i].orig_xlated_off;
> > +
> > +	return 0;
> > +}
> > +
> > +int bpf_insn_set_ready(struct bpf_map *map)
> 
> What is the reasoning for not needing to take the mutex here and in
> the bpf_insn_set_release?
> 
> > +{
> > +	struct bpf_insn_set *insn_set = cast_insn_set(map);
> > +	int i;
> > +
> > +	for (i = 0; i < map->max_entries; i++) {
> > +		if (insn_set->ptrs[i].user_value.xlated_off == INSN_DELETED)
> > +			continue;
> > +		if (!insn_set->ips[i])
> > +			return -EFAULT;
> > +	}
> > +
> > +	insn_set->state = INSN_SET_STATE_READY;
> > +	return 0;
> > +}
> > +
> > +void bpf_insn_set_release(struct bpf_map *map)
> > +{
> > +	struct bpf_insn_set *insn_set = cast_insn_set(map);
> > +
> > +	insn_set->state = INSN_SET_STATE_FREE;
> > +}
> 
> [...]
> 
> (... I'll continue reading through patch-set a bit later ...)