Hi,

I'm experiencing a hung-task issue with tgtd during sustained heavy write operations and would appreciate some guidance on debugging it further. I'm not very experienced with Linux or iSCSI, so I'm looking for pointers on where to look.

Setup:
- Host: Debian (kernel 6.1.0-37-amd64) running in a Proxmox VM
- Physical storage: Proxmox cluster with Ceph storage (6x 500 GB local SSDs, replication factor 2 = ~2 TB effective)
- VM storage: /dev/sdb backed by Ceph RBD, presented to the TGT VM
- TGT configuration: serving iSCSI targets for network boot via iPXE
- Storage stack: Windows client → iSCSI over Ethernet → TGT (rdwr) → LVM thin → Ceph RBD → local SSDs
- Target configuration: see below
- Use case: LAN party network boot - multiple Windows clients boot from disk images over the network (and also just because I like the concept)

Client configuration:
- Windows client boots via iPXE from the iSCSI target
- PrimoCache installed with a 20 GB RAM cache (the connection to the iSCSI disks can be a little slow; PrimoCache is effectively a Windows read/write cache)
- PrimoCache is configured to cache all reads and writes to improve network storage performance
- Steam installation target: ~70 GB game download/installation

Issue manifestation:
During a GTA 5 installation via Steam, the first ~20 GB of writes are absorbed by PrimoCache's RAM buffer and appear to complete quickly. Once the cache fills up, PrimoCache begins flushing data to the network storage (TGT). The installation progresses normally for a while as data streams through the cache to the iSCSI target, but at some point (usually around 30-40 GB into the installation) all I/O completely ceases. The Windows client shows the Steam download as paused/stalled, and the system becomes effectively frozen for storage operations.

At this point the TGT target becomes unresponsive: no new I/O operations are processed, and the client cannot complete any disk operations. This coincides with the kernel hung-task message appearing in dmesg.
Kernel trace (excerpt; full dmesg output below):

[14259.621377] INFO: task tgtd:824 blocked for more than 241 seconds.
[14259.627485] Call Trace:
[14259.628232]  <TASK>
[14259.628867]  __schedule+0x34d/0x9e0
[14259.629728]  schedule+0x5a/0xd0
[14259.630527]  io_schedule+0x42/0x70
[14259.631387]  folio_wait_bit_common+0x13d/0x350
[14259.632388]  ? filemap_alloc_folio+0x100/0x100
[14259.633441]  folio_wait_writeback+0x28/0x80
[14259.634347]  __filemap_fdatawait_range+0x90/0x120
[14259.635403]  file_write_and_wait_range+0x6f/0x90
[14259.636510]  blkdev_fsync+0x13/0x30
[14259.637373]  __x64_sys_fdatasync+0x4b/0x90

The trace shows tgtd is blocked in fdatasync(), waiting for writeback to complete.

Current LVM thin pool status:
- thin_pool: 499.75 GB, 17.11% data, 16.42% metadata
- Target LV: iscsi_grandia2, 250 GB, 34.20% used
- Chunk size: default (64 KB)

Additional observations:
- The issue appears to be triggered specifically by sustained write workloads
- Multiple I_T nexus connections (4-23) from the same initiator (iqn.2010-04.org.ipxe:98ae8321-e8c8-ab14-ae97-345a60b59b19), all from IP 10.62.0.57 - what could this mean?
- No obvious issues visible in Ceph cluster health during the hang
- LVM thin pool has plenty of free space (17.11% data usage, 16.42% metadata usage)

Some more information:

root@vm-iscsibeasttgt2:~# tgtadm --lld iscsi --mode target --op show --tid 5
nop_interval=0
nop_count=0
MaxRecvDataSegmentLength=8192
HeaderDigest=None
DataDigest=None
InitialR2T=Yes
MaxOutstandingR2T=1
ImmediateData=Yes
FirstBurstLength=65536
MaxBurstLength=262144
DataPDUInOrder=Yes
DataSequenceInOrder=Yes
ErrorRecoveryLevel=0
IFMarker=No
OFMarker=No
DefaultTime2Wait=2
DefaultTime2Retain=20
OFMarkInt=Reject
IFMarkInt=Reject
MaxConnections=1
RDMAExtensions=Yes
TargetRecvDataSegmentLength=262144
InitiatorRecvDataSegmentLength=262144
MaxOutstandingUnexpectedPDUs=0
MaxXmitDataSegmentLength=8192
MaxQueueCmd=128

root@vm-iscsibeasttgt2:~# lvs -a
  LV                VG       Attr       LSize    Pool      Origin Data%  Meta%  Move Log Cpy%Sync Convert
  iscsi_devedselvm  iscsi_vg Vwi-aotz-- 100.00g  thin_pool         0.00
  iscsi_grandia2    iscsi_vg Vwi-aotz-- 250.00g  thin_pool        34.20
  [lvol0_pmspare]   iscsi_vg ewi------- 128.00m
  thin_pool         iscsi_vg twi-aotz-- <499.75g                  17.11  16.42
  [thin_pool_tdata] iscsi_vg Twi-ao---- <499.75g
  [thin_pool_tmeta] iscsi_vg ewi-ao---- 128.00m

dmesg:

[14259.621377] INFO: task tgtd:824 blocked for more than 241 seconds.
[14259.623000]       Not tainted 6.1.0-37-amd64 #1 Debian 6.1.140-1
[14259.624314] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[14259.625871] task:tgtd state:D stack:0 pid:824 ppid:1 flags:0x00004002
[14259.627485] Call Trace:
[14259.628232]  <TASK>
[14259.628867]  __schedule+0x34d/0x9e0
[14259.629728]  schedule+0x5a/0xd0
[14259.630527]  io_schedule+0x42/0x70
[14259.631387]  folio_wait_bit_common+0x13d/0x350
[14259.632388]  ? filemap_alloc_folio+0x100/0x100
[14259.633441]  folio_wait_writeback+0x28/0x80
[14259.634347]  __filemap_fdatawait_range+0x90/0x120
[14259.635403]  file_write_and_wait_range+0x6f/0x90
[14259.636510]  blkdev_fsync+0x13/0x30
[14259.637373]  __x64_sys_fdatasync+0x4b/0x90
[14259.638283]  do_syscall_64+0x55/0xb0
[14259.639220]  ? __rseq_handle_notify_resume+0xa9/0x4a0
[14259.640361]  ? do_futex+0x106/0x1b0
[14259.641207]  ? futex_wake+0x7c/0x180
[14259.641997]  ? do_futex+0xda/0x1b0
[14259.642843]  ? __x64_sys_futex+0x8e/0x1d0
[14259.643869]  ? exit_to_user_mode_prepare+0x40/0x1e0
[14259.644972]  ? syscall_exit_to_user_mode+0x1e/0x40
[14259.645986]  ? do_syscall_64+0x61/0xb0
[14259.646786]  ? do_syscall_64+0x61/0xb0
[14259.647659]  ? do_syscall_64+0x61/0xb0
[14259.648528]  ? do_syscall_64+0x61/0xb0
[14259.649453]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[14259.650454] RIP: 0033:0x7f3194c90cba
[14259.651218] RSP: 002b:00007f316d7fac70 EFLAGS: 00000293 ORIG_RAX: 000000000000004b
[14259.652659] RAX: ffffffffffffffda RBX: 00005576cb22d8a0 RCX: 00007f3194c90cba
[14259.654022] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000000000000000f
[14259.655403] RBP: 0000000000000000 R08: 0000000000000035 R09: 00000000ffffffff
[14259.656820] R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000000
[14259.658114] R13: 000000000000000f R14: 00007ffcb3047e80 R15: 00007f316cffb000
[14259.659461] </TASK>

tgt-admin -s:

Target 5: iqn.2024-11.local.vm-iscsibeast:grandia2
    System information:
        Driver: iscsi
        State: ready
    I_T nexus information:
        I_T nexus: 4
            Initiator: iqn.2010-04.org.ipxe:98ae8321-e8c8-ab14-ae97-345a60b59b19 alias: none
            Connection: 1
                IP Address: 10.62.0.57
        I_T nexus: 5
            Initiator: iqn.2010-04.org.ipxe:98ae8321-e8c8-ab14-ae97-345a60b59b19 alias: none
            Connection: 1
                IP Address: 10.62.0.57
        I_T nexus: 6
            Initiator: iqn.2010-04.org.ipxe:98ae8321-e8c8-ab14-ae97-345a60b59b19 alias: none
            Connection: 1
                IP Address: 10.62.0.57
        I_T nexus: 7
            Initiator:
iqn.2010-04.org.ipxe:98ae8321-e8c8-ab14-ae97-345a60b59b19 alias: none
            Connection: 1
                IP Address: 10.62.0.57
        [snip: I_T nexus 8 through 21 are identical - same initiator IQN, Connection: 1, IP Address: 10.62.0.57]
        I_T nexus: 23
            Initiator:
iqn.2010-04.org.ipxe:98ae8321-e8c8-ab14-ae97-345a60b59b19 alias: none
            Connection: 1
                IP Address: 10.62.0.57
    LUN information:
        LUN: 0
            Type: controller
            SCSI ID: IET 00050000
            SCSI SN: beaf50
            Size: 0 MB, Block size: 1
            Online: Yes
            Removable media: No
            Prevent removal: No
            Readonly: No
            SWP: No
            Thin-provisioning: No
            Backing store type: null
            Backing store path: None
            Backing store flags:
        LUN: 1
            Type: disk
            SCSI ID: IET 00050001
            SCSI SN: beaf51
            Size: 268435 MB, Block size: 512
            Online: Yes
            Removable media: No
            Prevent removal: No
            Readonly: No
            SWP: No
            Thin-provisioning: No
            Backing store type: rdwr
            Backing store path: /dev/iscsi_vg/iscsi_grandia2
            Backing store flags:
    Account information:
    ACL information:
        ALL

root@vm-iscsibeasttgt2:~# cat /etc/tgt/conf.d/iscsi.conf
# TGT iSCSI Configuration
# Auto-generated by debian-server-setup-tgt.sh
# Using all TGT defaults for optimal connection management

# Default settings
default-driver iscsi

# Prevent connection accumulation for iPXE clients
<driver iscsi>
    MaxConnections 1
</driver>

# Include user-specific target files
include /etc/tgt/conf.d/targets/*.conf

So I'm mainly curious whether anyone has ideas about what is causing this. I would expect "slow" storage simply to slow down write operations, not to cause a complete freeze and, eventually, a Windows bluescreen reporting "CRITICAL_PROCESS_DIED".

I'm also not sure whether this is the right place to ask these questions, as I've never used a mailing list before. If it isn't, feel free to point me to the right place.

Kind regards,
Devedse
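P.P.S. One idea I've been toying with, though this is purely a guess on my part and the values below are made up: lowering the kernel's dirty-page limits on the TGT VM so background writeback starts earlier and each fdatasync() has less backlog to wait on. Something like this sysctl fragment:

```
# /etc/sysctl.d/90-tgt-writeback.conf  (hypothetical values, untested)
# Kick off background writeback once 64 MB of page cache is dirty...
vm.dirty_background_bytes = 67108864
# ...and throttle writers at 256 MB dirty, instead of the default
# percentage-of-RAM limits (vm.dirty_background_ratio / vm.dirty_ratio).
vm.dirty_bytes = 268435456
```

(applied with `sysctl --system`). Would that be expected to help here, or would it just move the stall around?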
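P.S. For the next reproduction, my plan is to watch the kernel's writeback counters from a second shell on the TGT VM while the stall happens. This is just a sketch I came up with (my assumption: if "Writeback" sits frozen at a large value while tgtd is in state D, the stall is below tgtd, i.e. in the thin pool / RBD layer rather than in tgtd itself):

```shell
#!/bin/sh
# Sample the dirty/writeback page counters from /proc/meminfo once per
# second. During a healthy flush, Dirty should shrink as Writeback
# completes; during the hang I expect both values to stop moving.
for i in 1 2 3 4 5; do
    printf '%s  ' "$(date +%T)"
    awk '/^(Dirty|Writeback):/ { printf "%s %s kB  ", $1, $2 } END { print "" }' /proc/meminfo
    sleep 1
done
```

Does that sound like a reasonable way to narrow down which layer is stuck, or is there a better tool for this?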