On 7/7/2025 2:18 PM, Margolin, Michael wrote:
On 7/7/2025 1:28 PM, Leon Romanovsky wrote:
CAUTION: This email originated from outside of the organization. Do
not click links or open attachments unless you can confirm the sender
and know the content is safe.
On Mon, Jul 07, 2025 at 12:51:40PM +0300, Margolin, Michael wrote:
On 7/7/2025 9:28 AM, Leon Romanovsky wrote:
CAUTION: This email originated from outside of the organization. Do
not click links or open attachments unless you can confirm the
sender and know the content is safe.
On Sun, Jul 06, 2025 at 07:32:05PM +0300, Margolin, Michael wrote:
On 7/6/2025 10:25 AM, Leon Romanovsky wrote:
CAUTION: This email originated from outside of the organization.
Do not click links or open attachments unless you can confirm the
sender and know the content is safe.
On Thu, Jul 03, 2025 at 06:23:14PM +0000, Michael Margolin wrote:
reinit_completion(&comp_ctx->wait_event);
@@ -557,17 +559,19 @@ static int
efa_com_wait_and_process_admin_cq_interrupts(struct efa_comp_ctx
*com
if (comp_ctx->status == EFA_CMD_COMPLETED)
ibdev_err_ratelimited(
aq->efa_dev,
- "The device sent a completion but
the driver didn't receive any MSI-X interrupt for admin cmd
%s(%d) status %d (ctx: 0x%p, sq producer: %d, sq consumer: %d,
cq consumer: %d)\n",
+ "The device sent a completion but
the driver didn't receive any MSI-X interrupt for admin cmd
%s(%d) status %d (id: %d, sq producer: %d, sq consumer: %d, cq
consumer: %d)\n",
efa_com_cmd_str(comp_ctx->cmd_opcode),
comp_ctx->cmd_opcode, comp_ctx->status,
- comp_ctx, aq->sq.pc, aq->sq.cc,
aq->cq.cc);
+ comp_ctx->cmd_id, aq->sq.pc,
aq->sq.cc,
+ aq->cq.cc);
else
ibdev_err_ratelimited(
aq->efa_dev,
- "The device didn't send any
completion for admin cmd %s(%d) status %d (ctx 0x%p, sq
producer: %d, sq consumer: %d, cq consumer: %d)\n",
+ "The device didn't send any
completion for admin cmd %s(%d) status %d (id: %d, sq producer:
%d, sq consumer: %d, cq consumer: %d)\n",
efa_com_cmd_str(comp_ctx->cmd_opcode),
comp_ctx->cmd_opcode, comp_ctx->status,
- comp_ctx, aq->sq.pc, aq->sq.cc,
aq->cq.cc);
+ comp_ctx->cmd_id, aq->sq.pc,
aq->sq.cc,
+ aq->cq.cc);
I have very strong feeling that you don't really use these prints
in real life.
For example, comp_ctx->cmd_id is printed with %d, while code and
comment
around cmd_id in __efa_com_submit_admin_cmd() suggests that it
needs to be 0x%X.
It has a lot of information separated to LSB and MSB bits which
are not readable
while printing with %d.
You are also printing comp_ctx->status, which is clear from
if/else section.
So no, I don't buy this claim for "additional debug information",
while
existing is not used.
What do you mean by that?!?
If you take a close look on the prints, you will see the reasons.
For example, you print comp_ctx->status which can be only 0 or 1,
while it is already clear what its value from the print itself.
These errors are extremely rare and are not manually reproducible,
that's
why we want to collect as much information as we can when it happens.
Do it out-of-tree, there is no need in upstream code for internal
debug
sessions.
It's not for internal debug, it is used in production. Why would I
upstream
internal debug prints?
It is used in internal cloud for the NICs not available to the rest of
the world. So yes, it is your internal debug print.
I'm less bothered by the format as long as we have the info we need.
Like I said, it is clear that you never actually relied on this
information.
Better if you completely delete these prints and keep them
out-of-tree.
Not sure that I follow, can you please elaborate on what "relying"
means
from your POV?
"relied" == "used"
The NIC might be available only in AWS but customers decide what Linux
versions they use. Many of our customers choose to use upstream
versions and avoid taking debug or modified versions.
You are right saying the appearance for this print is rare, that's why
as you pointed out, it isn't perfectly formatted. But this is since
the issue rarely reproduce, what is exactly the reason we need the
print in-place when it does.
I can definitely improve the styling if it makes things better.
Michael
Leon, do you have any additional concerns regarding this patch? It's
more than 6 weeks in the pipe now.
We are asking customers to use patched drivers and we shouldn't be there.
Michael