Hi all,

Our cluster has hit a CephFS metadata-related failure: user requests that touch a specific directory (/danlu/ProjectStorage/, roughly 1.4 billion objects) get stuck. The MDS logs show a recurring loop that prints all dir_frags of certain directories. The issue may have been caused by the frequent changes we made to the filesystem subtree export pins. Can it be resolved by resetting the MDS map with "ceph fs reset <fs name> --yes-i-really-mean-it"?

1. Failure Timeline

1.0 Our CephFS is deployed with multiple MDS nodes. We control MDS load balancing manually with static pins, using 11 ranks in total (Rank 0-10). Among them, MDS Rank 4 (pinned to /danlu/ProjectStorage/) carries a relatively large share of the metadata.

1.1 Abnormal user access was detected, and slow METADATA warnings appeared on MDS 4.

1.2 MDS 4 was restarted. During recovery (the replay or rejoin phase), its memory usage exceeded the physical machine's 384 GB, causing repeated OOM (out-of-memory) kills.

1.3 New ranks (11, 12, 13) were added, but the OOM problem persisted.

1.4 The server hosting MDS Rank 4 was replaced with one that has more memory. The MDS then started normally and became active.

1.5 While traversing the extended attributes of files under Rank 4, we found that /danlu/ProjectStorage/3990/ alone contains roughly 80 million objects. For load balancing, the /danlu/ProjectStorage/3990 subtree was statically pinned to the newly added Rank 11 (see the command sketch after this timeline).

1.6 While Rank 4 was exporting that data to Rank 11, mds.4 restarted automatically because mds_beacon_grace was set too short (120 seconds). Afterwards both Rank 4 and Rank 11 got stuck in an abnormal state (rejoin or clientreplay).

1.7 With the following settings, Rank 4 and Rank 11 returned to active (all of them were reverted later), but /danlu/ProjectStorage/ still failed to serve requests:
    mds_beacon_grace = 3600
    mds_bal_interval = 0
    mds_wipe_sessions = true
    mds_replay_unsafe_with_closed_session = true

1.8 We then tried to move the static pin of /danlu/ProjectStorage/ from Rank 4 to Rank 11.

1.9 Current (stable) state: the process status and logs of Rank 4 and Rank 11 show the md_submit thread running at 100% CPU. The logs indicate that Rank 4 is exporting the /danlu/ProjectStorage/3990 subtree to Rank 11, while Rank 11 reports empty imports (export_empty_import) and bounces them back to Rank 4.
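For reference, the pin changes (1.5, 1.8) and the temporary settings (1.7) amount to commands roughly like the following. This is only a sketch: /mnt/cephfs is just a placeholder for our client mount point, and "ceph config set" stands in for whatever mechanism is used to apply the options.

    # Pin the hot subtree to Rank 11 via the ceph.dir.pin xattr (placeholder mount point)
    setfattr -n ceph.dir.pin -v 11 /mnt/cephfs/danlu/ProjectStorage/3990
    # Verify the pin
    getfattr -n ceph.dir.pin /mnt/cephfs/danlu/ProjectStorage/3990

    # Temporary MDS settings from step 1.7 (reverted afterwards)
    ceph config set mds mds_beacon_grace 3600
    ceph config set mds mds_bal_interval 0
    ceph config set mds mds_wipe_sessions true
    ceph config set mds mds_replay_unsafe_with_closed_session true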
2. FS Status

ceph fs status

dl_cephfs - 186 clients
=========
+------+--------+---------------------------+---------------+-------+-------+
| Rank | State  | MDS                       | Activity      | dns   | inos  |
+------+--------+---------------------------+---------------+-------+-------+
|  0   | active | ceph-prod-45              | Reqs:    5 /s | 114k  | 113k  |
|  1   | active | ceph-prod-46              | Reqs:    2 /s | 118k  | 117k  |
|  2   | active | ceph-prod-47              | Reqs:   11 /s | 5124k | 5065k |
|  3   | active | ceph-prod-10              | Reqs:  117 /s | 2408k | 2396k |
|  4   | active | ceph-prod-01              | Reqs:    0 /s | 31.5k | 26.9k |
|  5   | active | ceph-prod-02              | Reqs:    0 /s | 80.0k | 78.9k |
|  6   | active | ceph-prod-11              | Reqs:    2 /s | 1145k | 1145k |
|  7   | active | ceph-prod-57              | Reqs:    1 /s | 169k  | 168k  |
|  8   | active | ceph-prod-44              | Reqs:   33 /s | 10.1M | 10.1M |
|  9   | active | ceph-prod-20              | Reqs:    4 /s | 196k  | 195k  |
|  10  | active | ceph-prod-43              | Reqs:    2 /s | 1758k | 1751k |
|  11  | active | ceph-prod-48              | Reqs:    0 /s | 2879k | 2849k |
|  12  | active | ceph-prod-60              | Reqs:    0 /s | 1875  | 1738  |
|  13  | active | fuxi-aliyun-ceph-res-tmp3 | Reqs:    3 /s | 80.8k | 60.3k |
+------+--------+---------------------------+---------------+-------+-------+
+-----------------+----------+-------+-------+
|       Pool      |   type   |  used | avail |
+-----------------+----------+-------+-------+
| cephfs_metadata | metadata | 2109G | 2741G |
|   cephfs_data   |   data   | 1457T |  205T |
+-----------------+----------+-------+-------+
+--------------------------+
|        Standby MDS       |
+--------------------------+
| fuxi-aliyun-ceph-res-tmp |
+--------------------------+

3. MDS Ops

3.1 MDS.4

{
    "description": "rejoin:client.92313222:258",
    "initiated_at": "2025-08-30 05:24:02.110611",
    "age": 9389.4134921530003,
    "duration": 9389.4136841620002,
    "type_data": {
        "flag_point": "dispatched",
        "reqid": "client.92313222:258",
        "op_type": "no_available_op_found",
        "events": [
            {
                "time": "2025-08-30 05:24:02.110611",
                "event": "initiated"
            },
            {
                "time": "2025-08-30 05:24:02.110611",
                "event": "header_read"
            },
            {
                "time": "2025-08-30 05:24:02.110610",
                "event": "throttled"
            },
            {
                "time": "2025-08-30 05:24:02.110626",
                "event": "all_read"
            },
            {
                "time": "2025-08-30 05:24:02.110655",
                "event": "dispatched"
            }
        ]
    }
}

3.2 MDS.11

{
    "ops": [
        {
            "description": "rejoin:client.87797615:61578323",
            "initiated_at": "2025-08-29 23:36:22.696648",
            "age": 38670.621505474002,
            "duration": 38670.621529709002,
            "type_data": {
                "flag_point": "dispatched",
                "reqid": "client.87797615:61578323",
                "op_type": "no_available_op_found",
                "events": [
                    {
                        "time": "2025-08-29 23:36:22.696648",
                        "event": "initiated"
                    },
                    {
                        "time": "2025-08-29 23:36:22.696648",
                        "event": "header_read"
                    },
                    {
                        "time": "2025-08-29 23:36:22.696646",
                        "event": "throttled"
                    },
                    {
                        "time": "2025-08-29 23:36:22.696659",
                        "event": "all_read"
                    },
                    {
                        "time": "2025-08-29 23:36:23.303274",
                        "event": "dispatched"
                    }
                ]
            }
        }
    ],
    "num_ops": 1
}
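For context, the two ops above come from the MDS admin sockets; something like the following should reproduce them (ceph-prod-01 holds Rank 4 and ceph-prod-48 holds Rank 11, per the status output above):

    # On the host running Rank 4
    ceph daemon mds.ceph-prod-01 dump_ops_in_flight
    # On the host running Rank 11
    ceph daemon mds.ceph-prod-48 dump_ops_in_flight
    # Blocked ops can be listed the same way
    ceph daemon mds.ceph-prod-48 dump_blocked_ops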
4. MDS Logs

4.1 mds.4

--- Start of Loop ---
2025-08-29 21:42:35.749 7fcfa3d45700 7 mds.4.migrator export_dir [dir 0x1000340af7a.00001* /danlu/ProjectStorage/ [2,head] auth{0=3,11=2} pv=173913808 v=173913806 cv=0/0 dir_auth=4 ap=2+1 state=1610613760|exporting f(v3168 91=0+91) n(v15319930 rc2025-07-17 14:10:02.703809 b6931715137150 1953080=1912827+40253) hs=1+0,ss=0+0 dirty=1 | request=1 child=1 subtree=1 subtreetemp=0 replicated=1 dirty=1 authpin=1 0x558fb40f8500] to 11
2025-08-29 21:42:35.749 7fcfa3d45700 7 mds.4.migrator already exporting
2025-08-29 21:42:35.749 7fcfa3d45700 7 mds.4.migrator export_dir [dir 0x1000340af7a.00110* /danlu/ProjectStorage/ [2,head] auth{0=3,11=2} pv=173913735 v=173913732 cv=0/0 dir_auth=4 ap=3+3 state=1610613760|exporting f(v3168 102=0+102) n(v15319926 rc2025-07-24 13:53:23.519020 b448426565839 422059=420844+1215) hs=3+0,ss=0+0 dirty=1 | request=1 child=1 subtree=1 subtreetemp=0 replicated=1 dirty=1 authpin=1 0x558fb40f8a00] to 11
2025-08-29 21:42:35.749 7fcfa3d45700 7 mds.4.migrator already exporting
2025-08-29 21:42:35.749 7fcfa3d45700 7 mds.4.migrator export_dir [dir 0x1000340af7a.00111* /danlu/ProjectStorage/ [2,head] auth{0=3,1=2,2=2,10=2,11=2,13=2} pv=173913731 v=173913730 cv=0/0 dir_auth=4 ap=2+1 state=1610613760|exporting f(v3168 102=0+102) n(v15319926 rc2025-08-27 12:17:12.186080 b25441350802673 324852174=320337481+4514693) hs=2+0,ss=0+0 | request=1 child=1 subtree=1 subtreetemp=0 replicated=1 dirty=1 authpin=1 0x558fb40f8f00] to 11
2025-08-29 21:42:35.749 7fcfa3d45700 7 mds.4.migrator already exporting
2025-08-29 21:42:35.749 7fcfa3d45700 7 mds.4.migrator export_dir [dir 0x1000340af7a.01001* /danlu/ProjectStorage/ [2,head] auth{0=3,1=2,2=2,3=2,6=2,7=2,8=3,9=2,10=2,11=2,13=2} pv=173914304 v=173914298 cv=0/0 dir_auth=4 ap=4+21 state=1610613760|exporting f(v3168 101=0+101) n(v15319926 rc2025-08-28 10:44:08.113080 b96173795872233 155213668=143249296+11964372) hs=2+0,ss=0+0 dirty=2 | request=1 child=1 subtree=1 subtreetemp=0 replicated=1 dirty=1 authpin=1 0x558fb40f9400] to 11
2025-08-29 21:42:35.749 7fcfa3d45700 7 mds.4.migrator already exporting
... many more export_dir / "already exporting" lines for other dir_frags omitted ...
--- End of Loop ---

4.2 mds.11

--- Start of Loop ---
2025-08-29 21:42:31.991 7f6ab9899700 2 mds.11.cache Memory usage: total 50859980, rss 25329048, heap 331976, baseline 331976, 0 / 2849528 inodes have caps, 0 caps, 0 caps per inode
2025-08-29 21:42:31.991 7f6ab9899700 7 mds.11.server recall_client_state: min=100 max=1048576 total=0 flags=0xa
2025-08-29 21:42:31.991 7f6ab9899700 7 mds.11.server recalled 0 client caps.
2025-08-29 21:42:31.991 7f6abc09e700 0 log_channel(cluster) log [WRN] : 2 slow requests, 0 included below; oldest blocked for > 30395.764013 secs
2025-08-29 21:42:31.995 7f6abe0a2700 3 mds.11.server handle_client_session client_session(request_renewcaps seq 1895) from client.91614761
2025-08-29 21:42:32.703 7f6abe0a2700 5 mds.ceph-prod-01 handle_mds_map old map epoch 1656906 <= 1656906, discarding
2025-08-29 21:42:32.703 7f6abe0a2700 3 mds.11.server handle_client_session client_session(request_renewcaps seq 1743) from client.91533406
2025-08-29 21:42:32.703 7f6abe0a2700 3 mds.11.server handle_client_session client_session(request_renewcaps seq 1737) from client.76133646
2025-08-29 21:42:32.703 7f6abe0a2700 3 mds.11.server handle_client_session client_session(request_renewcaps seq 1739) from client.91667655
2025-08-29 21:42:32.703 7f6abe0a2700 3 mds.11.server handle_client_session client_session(request_renewcaps seq 1895) from client.92253020
2025-08-29 21:42:32.703 7f6abe0a2700 3 mds.11.server handle_client_session client_session(request_renewcaps seq 1895) from client.92348807
2025-08-29 21:42:32.703 7f6abe0a2700 3 mds.11.server handle_client_session client_session(request_renewcaps seq 1895) from client.91480971
2025-08-29 21:42:32.703 7f6abe0a2700 3 mds.11.server handle_client_session client_session(request_renewcaps seq 1751) from client.87797615
2025-08-29 21:42:32.703 7f6abe0a2700 3 mds.11.server handle_client_session client_session(request_renewcaps seq 1739) from client.68366166
2025-08-29 21:42:32.703 7f6abe0a2700 3 mds.11.server handle_client_session client_session(request_renewcaps seq 1390) from client.66458847
2025-08-29 21:42:32.703 7f6abe0a2700 3 mds.11.server handle_client_session client_session(request_renewcaps seq 506) from client.91440537
2025-08-29 21:42:32.703 7f6abe0a2700 3 mds.11.server handle_client_session client_session(request_renewcaps seq 309) from client.91117710
2025-08-29 21:42:32.703 7f6abe0a2700 3 mds.11.server handle_client_session client_session(request_renewcaps seq 283) from client.92192971
2025-08-29 21:42:32.703 7f6abe0a2700 7 mds.11.locker handle_file_lock a=nudge on (inest mix->lock(2) g=4 dirty) from mds.4 [inode 0x5002aada391 [...2,head] /danlu/ProjectStorage/1352/L36/screenTouchHtml/2025-08-27/ auth{4=2,10=3} v621993 pv621997 ap=4 dirtyparent f(v1 m2025-08-28 10:37:07.785514 1=0+1) n(v2 rc2025-08-28 10:37:07.809515 b29740 4=1+3) (inest mix->lock(2) g=4 dirty) (ifile mix->lock(2) w=1 flushing scatter_wanted) (iversion lock) | dirtyscattered=2 lock=1 importing=0 dirfrag=1 dirtyrstat=0 dirtyparent=1 replicated=1 dirty=1 authpin=1 0x5579af8b5c00]
2025-08-29 21:42:32.703 7f6abe0a2700 7 mds.11.locker handle_file_lock trying nudge on (inest mix->lock(2) g=4 dirty) on [inode 0x5002aada391 [...2,head] /danlu/ProjectStorage/1352/L36/screenTouchHtml/2025-08-27/ auth{4=2,10=3} v621993 pv621997 ap=4 dirtyparent f(v1 m2025-08-28 10:37:07.785514 1=0+1) n(v2 rc2025-08-28 10:37:07.809515 b29740 4=1+3) (inest mix->lock(2) g=4 dirty) (ifile mix->lock(2) w=1 flushing scatter_wanted) (iversion lock) | dirtyscattered=2 lock=1 importing=0 dirfrag=1 dirtyrstat=0 dirtyparent=1 replicated=1 dirty=1 authpin=1 0x5579af8b5c00]
2025-08-29 21:42:32.991 7f6ab9899700 7 mds.11.cache trim bytes_used=6GB limit=32GB reservation=0.05% count=0
2025-08-29 21:42:32.991 7f6ab9899700 7 mds.11.cache trim_lru trimming 0 items from LRU size=2879068 mid=1961726 pintail=0 pinned=76602
2025-08-29 21:42:32.991 7f6ab9899700 7 mds.11.cache trim_lru trimmed 0 items
2025-08-29 21:42:32.991 7f6ab9899700 7 mds.11.migrator export_empty_import [dir 0x5002aef450c.100110000001011* /danlu/ProjectStorage/3990/scrapy_html/ [2,head] auth{4=2} v=228873 cv=0/0 dir_auth=11 state=1073741824 f(v1 m2025-06-10 11:04:10.789649 1091=1091+0) n(v8 rc2025-06-10 11:04:10.789649 b133525839 1091=1091+0) hs=0+0,ss=0+0 | subtree=1 subtreetemp=0 replicated=1 0x557627968000]
2025-08-29 21:42:32.991 7f6ab9899700 7 mds.11.migrator really empty, exporting to 4
2025-08-29 21:42:32.991 7f6ab9899700 7 mds.11.migrator exporting to mds.4 empty import [dir 0x5002aef450c.100110000001011* /danlu/ProjectStorage/3990/scrapy_html/ [2,head] auth{4=2} v=228873 cv=0/0 dir_auth=11 state=1073741824 f(v1 m2025-06-10 11:04:10.789649 1091=1091+0) n(v8 rc2025-06-10 11:04:10.789649 b133525839 1091=1091+0) hs=0+0,ss=0+0 | subtree=1 subtreetemp=0 replicated=1 0x557627968000]
2025-08-29 21:42:32.991 7f6ab9899700 7 mds.11.migrator export_dir [dir 0x5002aef450c.100110000001011* /danlu/ProjectStorage/3990/scrapy_html/ [2,head] auth{4=2} v=228873 cv=0/0 dir_auth=11 state=1073741824 f(v1 m2025-06-10 11:04:10.789649 1091=1091+0) n(v8 rc2025-06-10 11:04:10.789649 b133525839 1091=1091+0) hs=0+0,ss=0+0 | subtree=1 subtreetemp=0 replicated=1 0x557627968000] to 4
... many more export_empty_import lines for other dir_frags omitted ...
--- End of Loop; all of the export_empty_import lines above are printed on every iteration ---

5. Cluster Information

Version: v14.2.21

Current config:

[global]
... cluster IPs omitted ...
# updated on Sep 26, 2021
rbd_cache = false
rbd_op_threads = 4
mon_osd_nearfull_ratio = 0.89
mds_beacon_grace = 300
mon_max_pg_per_osd = 300

[mon]
mon_allow_pool_delete = true
mon osd allow primary affinity = true
mon_osd_cache_size = 40000
rocksdb_cache_size = 5368709120
debug_mon = 5/5
mon_lease_ack_timeout_factor = 4
mon_osd_min_down_reporters = 3

[mds]
mds_cache_memory_limit = 34359738368
#mds_cache_memory_limit = 23622320128
mds_bal_min_rebalance = 1000
mds_log_max_segments = 256
mds_session_blacklist_on_evict = false
mds_session_blacklist_on_timeout = false
mds_cap_revoke_eviction_timeout = 300

[osd]
# updated on Sep 26, 2021
osd_op_threads = 32
osd_op_num_shards = 16
osd_op_num_threads_per_shard_hdd = 1
osd_op_num_threads_per_shard_ssd = 2
#rbd_op_threads = 16
osd_disk_threads = 16
filestore_op_threads = 16
osd_scrub_min_interval = 1209600
osd_scrub_max_interval = 2592000
osd_deep_scrub_interval = 2592000
osd_scrub_begin_hour = 23
osd_scrub_end_hour = 18
osd_scrub_chunk_min = 5
osd_scrub_chunk_max = 25
osd_max_scrubs = 2
osd_op_queue_cut_off = high
osd_max_backfills = 4
bluefs_buffered_io = true
#bluestore_fsck_quick_fix_threads = 2
osd_max_pg_per_osd_hard_ratio = 6
osd_crush_update_on_start = false
osd_scrub_during_recovery = true
osd_repair_during_recovery = true
bluestore_cache_trim_max_skip_pinned = 10000
osd_heartbeat_grace = 60
osd_heartbeat_interval = 60
osd_op_thread_suicide_timeout = 2000
osd_op_thread_timeout = 90

[client]
rgw cache enabled = true
rgw cache lru size = 100000
rgw thread pool size = 4096
rgw_max_concurrent_requests = 4096
rgw override bucket index max shards = 32
rgw lifecycle work time = 00:00-24:00

Best Regards,
Jun He

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx