Hi Tomi,

On Thu, Jun 26, 2025 at 11:30:44AM +0300, Tomi Valkeinen wrote:
> On 19/06/2024 03:17, Laurent Pinchart wrote:
> > Hello,
> >
> > This patch series is the second version of the vsp1 driver conversion to using the subdev active state. This version suffers from the same main lockdep issue as v1 (see below), but I decided to repost it anyway, as the first 17 patches can go in already while we figure out how to handle the lockdep issue.
> >
> > The driver is fairly complex and creates many subdevs, exposing them to userspace through subdev nodes but also controlling them from within the kernel when acting as a backend for the R-Car DU display driver. It is thus a good testing ground to validate the subdev active state API.
> >
> > The first 17 patches are miscellaneous refactoring and cleanups to prepare for the actual conversion. In the middle of that is patch 11/19, which adds a function to dump a pipeline to the kernel log; it came in very handy for debugging.
> >
> > Patch 18/19 is the actual conversion to the active state API. While I tried to keep it as small as possible by handling all the refactoring in separate patches, it's still the largest one in the series, mostly due to the fact that the driver creates many subdevs. If that's any consolation, the diffstat is nice (net removal of 261 lines). Patch 19/19 is then an additional cleanup on top.
> >
> > The good news is that both the VSP test suite ([1]) and the display test suite ([2]) pass. The bad news is that lockdep complains quite heavily:
> >
> > [ 276.810170] ============================================
> > [ 276.815489] WARNING: possible recursive locking detected
> > [ 276.820813] 6.10.0-rc3-00175-g639ee34a20f1 #803 Not tainted
> > [ 276.826401] --------------------------------------------
> > [ 276.831724] yavta/1481 is trying to acquire lock:
> > [ 276.836442] ffff00000ff26868 (vsp1_entity:531:sd->active_state->lock){+.+.}-{3:3}, at: v4l2_subdev_link_validate+0x140/0x308
> > [ 276.847736]
> > [ 276.847736] but task is already holding lock:
> > [ 276.853581] ffff00000ff27c68 (vsp1_entity:531:sd->active_state->lock){+.+.}-{3:3}, at: v4l2_subdev_link_validate+0x118/0x308
> > [ 276.864856]
> > [ 276.864856] other info that might help us debug this:
> > [ 276.871396] Possible unsafe locking scenario:
> > [ 276.871396]
> > [ 276.877326] CPU0
> > [ 276.879779] ----
> > [ 276.882232] lock(vsp1_entity:531:sd->active_state->lock);
> > [ 276.887827] lock(vsp1_entity:531:sd->active_state->lock);
> > [ 276.893422]
> > [ 276.893422] *** DEADLOCK ***
> > [ 276.893422]
> > [ 276.899349] May be due to missing lock nesting notation
> > [ 276.899349]
> > [ 276.906145] 3 locks held by yavta/1481:
> > [ 276.909996] #0: ffff00000fcaa7a0 (&video->lock){+.+.}-{3:3}, at: __video_do_ioctl+0x19c/0x5b0
> > [ 276.918666] #1: ffff00000fc9f418 (&mdev->graph_mutex){+.+.}-{3:3}, at: vsp1_video_streamon+0xec/0x4e8
> > [ 276.928031] #2: ffff00000ff27c68 (vsp1_entity:531:sd->active_state->lock){+.+.}-{3:3}, at: v4l2_subdev_link_validate+0x118/0x308
> > [ 276.939744]
> > [ 276.939744] stack backtrace:
> > [ 276.944114] CPU: 1 PID: 1481 Comm: yavta Not tainted 6.10.0-rc3-00175-g639ee34a20f1 #803
> > [ 276.952225] Hardware name: Renesas Salvator-X 2nd version board based on r8a77965 (DT)
> > [ 276.960157] Call trace:
> > [ 276.962616] dump_backtrace+0xa0/0x128
> > [ 276.966391] show_stack+0x20/0x38
> > [ 276.969723] dump_stack_lvl+0x8c/0xd0
> > [ 276.973412] dump_stack+0x1c/0x28
> > [ 276.976747] print_deadlock_bug+0x29c/0x3b0
> > [ 276.980950] __lock_acquire+0x165c/0x1b80
> > [ 276.984978] lock_acquire.part.0+0x168/0x310
> > [ 276.989265] lock_acquire+0x70/0x90
> > [ 276.992772] __mutex_lock+0x10c/0x4b8
> > [ 276.996458] mutex_lock_nested+0x2c/0x40
> > [ 277.000396] v4l2_subdev_link_validate+0x140/0x308
> > [ 277.005213] __media_pipeline_start+0x880/0xc30
> > [ 277.009763] __video_device_pipeline_start+0x48/0x68
> > [ 277.014750] vsp1_video_streamon+0x148/0x4e8
> > [ 277.019044] v4l_streamon+0x50/0x70
> > [ 277.022553] __video_do_ioctl+0x4cc/0x5b0
> > [ 277.026583] video_usercopy+0x3c4/0xb90
> > [ 277.030438] video_ioctl2+0x20/0x48
> > [ 277.033942] v4l2_ioctl+0xa4/0xc8
> > [ 277.037278] __arm64_sys_ioctl+0xec/0x118
> > [ 277.041310] invoke_syscall+0x68/0x198
> > [ 277.045084] el0_svc_common.constprop.0+0x80/0x150
> > [ 277.049900] do_el0_svc+0x38/0x50
> > [ 277.053237] el0_svc+0x4c/0xc0
> > [ 277.056309] el0t_64_sync_handler+0x120/0x130
> > [ 277.060685] el0t_64_sync+0x190/0x198
> >
> > The tests don't cause an actual deadlock; lockdep reports a potential deadlock based on lock classes only.
> >
> > Subdev initialization is handled in one helper function for all subdevs created by the driver. This causes all the active state locks to be created with the same lock class. As a result, when validating links and trying to lock both the sink and source states, lockdep complains of recursive locking, even though the two locks are different mutex instances.
>
> Sounds familiar =).
>
> > Working around the issue is likely possible. The v4l2_subdev_init_finalize() function is actually a macro calling __v4l2_subdev_init_finalize() with an explicit lock class. The VSP1 driver could do the same and push the lock class one layer up, to the callers of vsp1_entity_init(). This would only solve part of the problem, as the driver creates multiple subdevs of the same type, so dynamic allocation of lock classes might be required. That would however be a bad solution, or rather not a solution to the actual problem. There is a reason why lockdep groups locks in classes, besides just saving memory. When locks belonging to different instances of the same object type are taken recursively, it is often the sign of a bad design.
> >
> > Even if we worked around this problem, lockdep would later complain about AB-BA locking at link validation time. The VSP1 entities can be assembled in a pipeline in any order, so even if the link validation helpers are careful to always lock in the sink-source order, the sink and source could get swapped.
> >
> > I believe this calls for a rework of locking for subdev states. The option I'm envisioning is to lock all subdevs in a pipeline when starting the pipeline, with a new media_pipeline_lock() call just before media_pipeline_start(). The link validation helpers would then be called with all locks taken, and so would the .s_stream() subdev operations. This would simplify locking in drivers as a result, which I think is a very desirable side effect. Then, after starting the pipeline, the top-level driver would call media_pipeline_unlock(). Similar locking would be performed at pipeline stop time, to ensure that the .s_stream() operations would also be called with the locks held. Obviously, just moving locking around won't fix the lockdep issues, and the second part of the fix would be to use ww-mutexes instead of regular mutexes. The result would be similar to the implementation of the KMS atomic API, giving me some confidence that it goes in the right direction. An additional difficulty, however, is that we need to lock both the subdev active state and the control handler with the same lock.
>
> This has been the long term plan, but perhaps it's not so far off in the future, then...

Sounds like I need to give it a try then.
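To make it a bit more concrete, below is a rough sketch of what I have in mind for the top-level driver. media_pipeline_lock() and media_pipeline_unlock() don't exist yet, their names are tentative, and the vsp1 details are simplified from memory, so please take it as an illustration of the idea rather than a plan for the implementation.

static int vsp1_video_streamon(struct vsp1_video *video,
			       struct vsp1_pipeline *pipe)
{
	int ret;

	/*
	 * Tentative new helper: take the active state locks of all the
	 * subdevs in the pipeline, internally backed by ww-mutexes to
	 * avoid lock ordering issues.
	 */
	media_pipeline_lock(&pipe->pipe);

	/* Link validation then runs with all the state locks held. */
	ret = video_device_pipeline_start(&video->video, &pipe->pipe);
	if (ret < 0)
		goto unlock;

	/*
	 * ... configure the entities and call the .s_stream() operations,
	 * still with all the state locks held ...
	 */

unlock:
	/* Tentative counterpart, dropping all the state locks. */
	media_pipeline_unlock(&pipe->pipe);

	return ret;
}

The same would happen at pipeline stop time, so that .s_stream() always runs with the state locks held.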
> But should the control state be merged into the active state first? I fear that's a large amount of work in itself, but managing the active state and control state locking separately with the scheme you outline sounds complex.
>
> Or can we add a new requirement for active-state-enabled subdevs: the control handler _must_ use the same lock as the active state (which, I think, is often done already). Would that help?

I think that would help, and I also think that drivers are largely doing that already. I don't think we document it, though.
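For reference, the pattern looks roughly like this. The foo_* names are made up for the example; the relevant parts are the control handler's .lock pointer and the subdev's .state_lock pointer.

struct foo_subdev {
	struct v4l2_subdev sd;
	struct v4l2_ctrl_handler hdl;
	/* One mutex serializing both the controls and the active state. */
	struct mutex lock;
};

static int foo_subdev_init(struct foo_subdev *foo)
{
	int ret;

	mutex_init(&foo->lock);

	/* Make the control handler use the shared lock... */
	ret = v4l2_ctrl_handler_init(&foo->hdl, 4);
	if (ret < 0)
		return ret;

	foo->hdl.lock = &foo->lock;
	foo->sd.ctrl_handler = &foo->hdl;

	/* ...and make the active state use the same lock. */
	foo->sd.state_lock = &foo->lock;

	ret = v4l2_subdev_init_finalize(&foo->sd);
	if (ret < 0)
		v4l2_ctrl_handler_free(&foo->hdl);

	return ret;
}

The same state_lock mechanism could also point all the vsp1 subdevs to a single driver-wide mutex, which is more or less the temporary solution you describe below.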
> Or as a temporary solution... If there's a central place in the vsp1 infrastructure, can we have a single (shared) state lock for all the vsp1 subdevs? That would mean taking the same lock multiple times, but somehow I recall that a similar thing was already done in some of the drivers.

I'll check that.

> So I agree with the goal. But it would be nice to not block this series until all the work is done. Or can we annotate the locking for this particular case so that lockdep doesn't care about vsp1 locks?

All the preparatory patches (01/19 to 17/19) have been merged; it's only the last two that are blocked. I'm not sure about lock annotations; I'll give it a look.

-- 
Regards,

Laurent Pinchart