tgtd hung task during heavy I/O - fdatasync blocking indefinitely

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

I'm experiencing a hung task issue with tgtd during sustained heavy
write operations and would appreciate some guidance on debugging this
further. I'm not very experienced in Linux nor iSCSI so I'm looking
for some pointers as to where to look.

Setup:

Host: Debian 6.1.0-37-amd64 running in Proxmox VM
Physical storage: Proxmox cluster with Ceph storage (6x 500GB local
SSDs, replication factor 2 = ~2TB effective)
VM storage: /dev/sdb backed by Ceph RBD, presented to TGT VM
TGT configuration: serving iSCSI targets for network boot via iPXE
Storage stack: Windows client → iSCSI over Ethernet → TGT (rdwr) → LVM
thin → Ceph RBD → local SSDs
Target configuration: See below
Use case: LAN party network boot - multiple Windows clients boot from
disk images over network (And just because I like this concept)

Client Configuration:

Windows client boots via iPXE from iSCSI target
PrimoCache installed with 20GB RAM cache (since the connection to the
iSCSI disks can be a little slow, PrimoCache is effectively a windows
read-write cache)
PrimoCache configuration: caches all reads/writes to improve network
storage performance
Steam installation target: ~70GB game download/installation

Issue Manifestation: During the GTA 5 installation via Steam, the
first ~20GB of writes are absorbed by PrimoCache's RAM buffer and
appear to complete quickly. However, once the cache fills up,
PrimoCache begins flushing data to the network storage (TGT).

The installation progresses normally for a while as data streams
through the cache to the iSCSI target, but at some point (usually
around 30-40GB into the installation), all I/O operations completely
cease. The Windows client shows the Steam download as paused/stalled,
and the system becomes effectively frozen for storage operations.

At this point, the TGT target becomes unresponsive - no new I/O
operations are processed, and the client cannot complete any disk
operations. This coincides with the kernel hung task message appearing
in dmesg.

Kernel trace:

[14259.621377] INFO: task tgtd:824 blocked for more than 241 seconds.
[14259.627485] Call Trace:
[14259.628232]  <TASK>
[14259.628867]  __schedule+0x34d/0x9e0
[14259.629728]  schedule+0x5a/0xd0
[14259.630527]  io_schedule+0x42/0x70
[14259.631387]  folio_wait_bit_common+0x13d/0x350
[14259.632388]  ? filemap_alloc_folio+0x100/0x100
[14259.633441]  folio_wait_writeback+0x28/0x80
[14259.634347]  __filemap_fdatawait_range+0x90/0x120
[14259.635403]  file_write_and_wait_range+0x6f/0x90
[14259.636510]  blkdev_fsync+0x13/0x30
[14259.637373]  __x64_sys_fdatasync+0x4b/0x90


The trace shows tgtd is blocked in fdatasync(), waiting for writeback
to complete.

Current LVM thin pool status:

thin_pool: 499.75GB, 17.11% data, 16.42% metadata
Target LV: iscsi_grandia2, 250GB, 34.20% used
Chunk size: default (64KB)

Additional Observations:

The issue appears to be triggered specifically by sustained write workloads
Multiple I_T nexus connections (4-23) from the same initiator
(iqn.2010-04.org.ipxe:98ae8321-e8c8-ab14-ae97-345a60b59b19) all from
IP 10.62.0.57 - What could this mean?
No obvious issues visible in Ceph cluster health during the hang
LVM thin pool has plenty of free space (17.11% data usage, 16.42%
metadata usage)

Some more information:

root@vm-iscsibeasttgt2:~# tgtadm --lld iscsi --mode target --op show --tid 5
nop_interval=0
nop_count=0
MaxRecvDataSegmentLength=8192
HeaderDigest=None
DataDigest=None
InitialR2T=Yes
MaxOutstandingR2T=1
ImmediateData=Yes
FirstBurstLength=65536
MaxBurstLength=262144
DataPDUInOrder=Yes
DataSequenceInOrder=Yes
ErrorRecoveryLevel=0
IFMarker=No
OFMarker=No
DefaultTime2Wait=2
DefaultTime2Retain=20
OFMarkInt=Reject
IFMarkInt=Reject
MaxConnections=1
RDMAExtensions=Yes
TargetRecvDataSegmentLength=262144
InitiatorRecvDataSegmentLength=262144
MaxOutstandingUnexpectedPDUs=0
MaxXmitDataSegmentLength=8192
MaxQueueCmd=128

root@vm-iscsibeasttgt2:~# lvs -a
  LV                VG       Attr       LSize    Pool      Origin
Data%  Meta%  Move Log Cpy%Sync Convert
  iscsi_devedselvm  iscsi_vg Vwi-aotz--  100.00g thin_pool        0.00
  iscsi_grandia2    iscsi_vg Vwi-aotz--  250.00g thin_pool        34.20
  [lvol0_pmspare]   iscsi_vg ewi-------  128.00m
  thin_pool         iscsi_vg twi-aotz-- <499.75g                  17.11  16.42
  [thin_pool_tdata] iscsi_vg Twi-ao---- <499.75g
  [thin_pool_tmeta] iscsi_vg ewi-ao----  128.00m

dmesg:
[14259.621377] INFO: task tgtd:824 blocked for more than 241 seconds.
[14259.623000]       Not tainted 6.1.0-37-amd64 #1 Debian 6.1.140-1
[14259.624314] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[14259.625871] task:tgtd            state:D stack:0     pid:824
ppid:1      flags:0x00004002
[14259.627485] Call Trace:
[14259.628232]  <TASK>
[14259.628867]  __schedule+0x34d/0x9e0
[14259.629728]  schedule+0x5a/0xd0
[14259.630527]  io_schedule+0x42/0x70
[14259.631387]  folio_wait_bit_common+0x13d/0x350
[14259.632388]  ? filemap_alloc_folio+0x100/0x100
[14259.633441]  folio_wait_writeback+0x28/0x80
[14259.634347]  __filemap_fdatawait_range+0x90/0x120
[14259.635403]  file_write_and_wait_range+0x6f/0x90
[14259.636510]  blkdev_fsync+0x13/0x30
[14259.637373]  __x64_sys_fdatasync+0x4b/0x90
[14259.638283]  do_syscall_64+0x55/0xb0
[14259.639220]  ? __rseq_handle_notify_resume+0xa9/0x4a0
[14259.640361]  ? do_futex+0x106/0x1b0
[14259.641207]  ? futex_wake+0x7c/0x180
[14259.641997]  ? do_futex+0xda/0x1b0
[14259.642843]  ? __x64_sys_futex+0x8e/0x1d0
[14259.643869]  ? exit_to_user_mode_prepare+0x40/0x1e0
[14259.644972]  ? syscall_exit_to_user_mode+0x1e/0x40
[14259.645986]  ? do_syscall_64+0x61/0xb0
[14259.646786]  ? do_syscall_64+0x61/0xb0
[14259.647659]  ? do_syscall_64+0x61/0xb0
[14259.648528]  ? do_syscall_64+0x61/0xb0
[14259.649453]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[14259.650454] RIP: 0033:0x7f3194c90cba
[14259.651218] RSP: 002b:00007f316d7fac70 EFLAGS: 00000293 ORIG_RAX:
000000000000004b
[14259.652659] RAX: ffffffffffffffda RBX: 00005576cb22d8a0 RCX: 00007f3194c90cba
[14259.654022] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000000000000000f
[14259.655403] RBP: 0000000000000000 R08: 0000000000000035 R09: 00000000ffffffff
[14259.656820] R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000000
[14259.658114] R13: 000000000000000f R14: 00007ffcb3047e80 R15: 00007f316cffb000
[14259.659461]  </TASK>

tgt-admin -s:
Target 5: iqn.2024-11.local.vm-iscsibeast:grandia2
    System information:
        Driver: iscsi
        State: ready
    I_T nexus information:
        I_T nexus: 4
            Initiator:
iqn.2010-04.org.ipxe:98ae8321-e8c8-ab14-ae97-345a60b59b19 alias: none
            Connection: 1
                IP Address: 10.62.0.57
        I_T nexus: 5
            Initiator:
iqn.2010-04.org.ipxe:98ae8321-e8c8-ab14-ae97-345a60b59b19 alias: none
            Connection: 1
                IP Address: 10.62.0.57
        I_T nexus: 6
            Initiator:
iqn.2010-04.org.ipxe:98ae8321-e8c8-ab14-ae97-345a60b59b19 alias: none
            Connection: 1
                IP Address: 10.62.0.57
        I_T nexus: 7
            Initiator:
iqn.2010-04.org.ipxe:98ae8321-e8c8-ab14-ae97-345a60b59b19 alias: none
            Connection: 1
                IP Address: 10.62.0.57
        I_T nexus: 8
            Initiator:
iqn.2010-04.org.ipxe:98ae8321-e8c8-ab14-ae97-345a60b59b19 alias: none
            Connection: 1
                IP Address: 10.62.0.57
        I_T nexus: 9
            Initiator:
iqn.2010-04.org.ipxe:98ae8321-e8c8-ab14-ae97-345a60b59b19 alias: none
            Connection: 1
                IP Address: 10.62.0.57
        I_T nexus: 10
            Initiator:
iqn.2010-04.org.ipxe:98ae8321-e8c8-ab14-ae97-345a60b59b19 alias: none
            Connection: 1
                IP Address: 10.62.0.57
        I_T nexus: 11
            Initiator:
iqn.2010-04.org.ipxe:98ae8321-e8c8-ab14-ae97-345a60b59b19 alias: none
            Connection: 1
                IP Address: 10.62.0.57
        I_T nexus: 12
            Initiator:
iqn.2010-04.org.ipxe:98ae8321-e8c8-ab14-ae97-345a60b59b19 alias: none
            Connection: 1
                IP Address: 10.62.0.57
        I_T nexus: 13
            Initiator:
iqn.2010-04.org.ipxe:98ae8321-e8c8-ab14-ae97-345a60b59b19 alias: none
            Connection: 1
                IP Address: 10.62.0.57
        I_T nexus: 14
            Initiator:
iqn.2010-04.org.ipxe:98ae8321-e8c8-ab14-ae97-345a60b59b19 alias: none
            Connection: 1
                IP Address: 10.62.0.57
        I_T nexus: 15
            Initiator:
iqn.2010-04.org.ipxe:98ae8321-e8c8-ab14-ae97-345a60b59b19 alias: none
            Connection: 1
                IP Address: 10.62.0.57
        I_T nexus: 16
            Initiator:
iqn.2010-04.org.ipxe:98ae8321-e8c8-ab14-ae97-345a60b59b19 alias: none
            Connection: 1
                IP Address: 10.62.0.57
        I_T nexus: 17
            Initiator:
iqn.2010-04.org.ipxe:98ae8321-e8c8-ab14-ae97-345a60b59b19 alias: none
            Connection: 1
                IP Address: 10.62.0.57
        I_T nexus: 18
            Initiator:
iqn.2010-04.org.ipxe:98ae8321-e8c8-ab14-ae97-345a60b59b19 alias: none
            Connection: 1
                IP Address: 10.62.0.57
        I_T nexus: 19
            Initiator:
iqn.2010-04.org.ipxe:98ae8321-e8c8-ab14-ae97-345a60b59b19 alias: none
            Connection: 1
                IP Address: 10.62.0.57
        I_T nexus: 20
            Initiator:
iqn.2010-04.org.ipxe:98ae8321-e8c8-ab14-ae97-345a60b59b19 alias: none
            Connection: 1
                IP Address: 10.62.0.57
        I_T nexus: 21
            Initiator:
iqn.2010-04.org.ipxe:98ae8321-e8c8-ab14-ae97-345a60b59b19 alias: none
            Connection: 1
                IP Address: 10.62.0.57
        I_T nexus: 23
            Initiator:
iqn.2010-04.org.ipxe:98ae8321-e8c8-ab14-ae97-345a60b59b19 alias: none
            Connection: 1
                IP Address: 10.62.0.57
    LUN information:
        LUN: 0
            Type: controller
            SCSI ID: IET     00050000
            SCSI SN: beaf50
            Size: 0 MB, Block size: 1
            Online: Yes
            Removable media: No
            Prevent removal: No
            Readonly: No
            SWP: No
            Thin-provisioning: No
            Backing store type: null
            Backing store path: None
            Backing store flags:
        LUN: 1
            Type: disk
            SCSI ID: IET     00050001
            SCSI SN: beaf51
            Size: 268435 MB, Block size: 512
            Online: Yes
            Removable media: No
            Prevent removal: No
            Readonly: No
            SWP: No
            Thin-provisioning: No
            Backing store type: rdwr
            Backing store path: /dev/iscsi_vg/iscsi_grandia2
            Backing store flags:
    Account information:
    ACL information:
        ALL

root@vm-iscsibeasttgt2:~# cat /etc/tgt/conf.d/iscsi.conf
# TGT iSCSI Configuration
# Auto-generated by debian-server-setup-tgt.sh
# Using all TGT defaults for optimal connection management

# Default settings
default-driver iscsi

# Prevent connection accumulation for iPXE clients
<driver iscsi>
    MaxConnections 1
</driver>

# Include user-specific target files
include /etc/tgt/conf.d/targets/*.conf

So I'm mainly curious if someone has some ideas what is causing this
problem. I would expect that "slow" storage would just "slow down"
write operations. Not cause a complete system freeze and eventual
bluescreen in windows stating "Critical_process_died". Also I'm not
sure if this is the place to ask these questions as I've never used a
mailing list before. So if not, feel free to point me to the right
place.

Kind regards,
Devedse





[Index of Archives]     [Linux SCSI]     [Linux RAID]     [Linux Clusters]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]

  Powered by Linux