On 31/5/25 23:43, Achilleas Mantzios wrote:
Hi
a query is stuck with the above, it seems it waits for parallel worker to finish, however , there are no parallel works running :
postgres@[local]/dynacom=# SELECT application_name, backend_type, backend_start,xact_start,query_start,wait_event_type, wait_event ,state FROM pg_stat_activity;
application_name | backend_type | backend_start | xact_start | query_start | wait_event_type | wait_event | state
------------------+------------------------------+-------------------------------+-------------------------------+-------------------------------+-----------------+---------------------+--------
| autovacuum launcher | 2024-11-29 17:48:50.92935+02 | | | Activity | AutoVacuumMain |
| logical replication launcher | 2024-11-29 17:48:50.929496+02 | | | Activity | LogicalLauncherMain |
DBMIRROR | client backend | 2025-05-31 19:04:16.724305+03 | 2025-05-31 19:05:21.686093+03 | 2025-05-31 19:05:21.909936+03 | IPC | ParallelFinish | active
| client backend | 2025-05-31 23:31:30.030806+03 | | 2025-05-31 23:35:05.045573+03 | Client | ClientRead | idle
psql | client backend | 2025-05-31 23:29:33.863485+03 | 2025-05-31 23:35:09.322972+03 | 2025-05-31 23:35:09.322972+03 | | | active
RXMLFVSLS | client backend | 2025-05-31 23:32:37.351131+03 | | 2025-05-31 23:35:09.295221+03 | Client | ClientRead | idle
psql | client backend | 2025-04-28 16:59:55.968442+03 | | 2025-05-27 16:43:56.338228+03 | Client | ClientRead | idle
| background writer | 2024-11-29 17:48:50.916876+02 | | | Activity | BgWriterMain |
| archiver | 2024-12-03 18:57:36.447067+02 | | | Activity | ArchiverMain |
| checkpointer | 2024-11-29 17:48:50.916648+02 | | | Activity | CheckpointerMain |
| walwriter | 2024-11-29 17:48:50.928789+02 | | | Activity | WalWriterMain |
(11 rows)
postgres@[local]/dynacom=#
So, I will terminate this backend now to get the system working again, we are curious why this happened, our system serves daily 22M+ transactions, this is Saturday night hence the low traffic.
postgres@smadb:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Debian
Description: Debian GNU/Linux 12 (bookworm)
Release: 12
Codename: bookworm
postgres@smadb:~$ psql -Aqt -c 'select version()'
PostgreSQL 16.6 on x86_64-pc-linux-gnu, compiled by gcc (Debian 12.2.0-14) 12.2.0, 64-bit
postgres@smadb:~$
Some additional info :
1690535 : the pid in question
we found no trace or indication for OOM killer ,
root@smadb:~# strace -p 1690535
strace: Process 1690535 attached
epoll_wait(12,
also lsof showed :
postgres
1690535 1690649 Sweeper postgres 0r CHR
1,3 0t0 4 /dev/null
postgres 1690535 1690649 Sweeper postgres 1w
FIFO 0,14 0t0 213624178 pipe
postgres 1690535 1690649 Sweeper postgres 2w
FIFO 0,14 0t0 213624178 pipe
postgres 1690535 1690649 Sweeper postgres 3u
a_inode 0,15 0 1059 [signalfd]
postgres 1690535 1690649 Sweeper postgres 4r
FIFO 0,14 0t0 213624177 pipe
postgres 1690535 1690649 Sweeper postgres 5u
a_inode 0,15 0 1059
[eventpoll:3,4,11]
postgres 1690535 1690649 Sweeper postgres 6u
REG 8,32 8192 157352475
/raid4/pgsql/data/PG_16_202
307071/207491653/2601
postgres 1690535 1690649 Sweeper postgres 7u
REG 8,32 450560 157352631
/raid4/pgsql/data/PG_16_202
307071/207491653/207493206
postgres 1690535 1690649 Sweeper postgres 8u
REG 8,32 40960 157356847
/raid4/pgsql/data/PG_16_202
307071/207491653/207503536
postgres 1690535 1690649 Sweeper postgres 9u
REG 8,32 40960 157356848
/raid4/pgsql/data/PG_16_202
307071/207491653/207503538
postgres 1690535 1690649 Sweeper postgres 10u
REG 8,32 40960 157357848
/raid4/pgsql/data/PG_16_202
307071/207491653/207504627
postgres 1690535 1690649 Sweeper postgres 11u
IPv4 1241927029 0t0 TCP
smadb.internal.net:postgres
ql->sma.internal.net:42615 (ESTABLISHED)
postgres
1690535 1690649 Sweeper postgres 12u a_inode
0,15 0 1059 [eventpoll:3,4]
So , as far as we understand, it waited for an inode ?
I tried to pg_terminate_backend(1690535); it did nothing to the process, then pg_cancel_backend(1690535);
I went to the shell and did a normal
kill 1690535
it did nothing, and before i kill -9 it, I just tried kill -HUP, and it woke up with
10.9.0.10(42615)
[1690535] 683b2880.19cba7 2025-05-31 23:53:48.231 EEST DBMIRROR
postgres@dynacom line:4 FATAL: terminating connection due to
administrator command
10.9.0.10(42615) [1690535] 683b2880.19cba7 2025-05-31
23:53:48.231 EEST DBMIRROR
postgres@dynacom line:5 STATEMENT: SELECT pd.XID,MAX(SeqId)
FROM dbmirror_Pending
pd LEFT JOIN dbmirror_MirroredTransaction mt INNER JOIN
dbmirror_MirrorHost mh ON mt.MirrorHostId = mh.MirrorHostId AND
mh.HostName= '192.168.211.1' ON pd.XID =
mt.XID WHERE mt.XID is null and (pd.slaveid is null or
pd.slaveid = '579') GROUP BY pd.XID ORDER BY MAX(pd.SeqId)
10.9.0.10(42615) [1690535] 683b2880.19cba7 2025-05-31
23:53:48.234 EEST DBMIRROR
postgres@dynacom line:6 LOG: disconnection: session time:
4:49:31.510 user=postgr
es database=dynacom host=10.9.0.10 port=42615