On Wed, 2025-08-06 at 14:23 +0530, Kotresh Hiremath Ravishankar wrote: > On Sat, Aug 2, 2025 at 1: 31 AM Viacheslav Dubeyko <Slava. Dubeyko@ ibm. com> wrote: > > On Fri, 2025-08-01 at 22: 59 +0530, Kotresh Hiremath Ravishankar wrote: > > > > Hi, > > > > 1. I will modify the commit message > > On Sat, Aug 2, 2025 at 1:31 AM Viacheslav Dubeyko <Slava.Dubeyko@xxxxxxx> wrote: > > > > On Fri, 2025-08-01 at 22:59 +0530, Kotresh Hiremath Ravishankar wrote: > > > > > > Hi, > > > > > > 1. I will modify the commit message to clearly explain the issue in the next revision. > > > 2. The maximum possible length for the fsname is not defined in mds side. I didn't find any restriction imposed on the length. So we need to live with it. > > > > We have two constants in Linux kernel [1]: > > > > #define NAME_MAX 255 /* # chars in a file name */ > > #define PATH_MAX 4096 /* # chars in a path name including nul */ > > > > I don't think that fsname can be bigger than PATH_MAX. > > As I had mentioned earlier, the CephFS server side code is not restricting the filesystem name > during creation. I validated the creation of a filesystem name with a length of 5000. > Please try the following. > > [kotresh@fedora build]$ alpha_str=$(< /dev/urandom tr -dc 'a-zA-Z' | head -c 5000) > [kotresh@fedora build]$ ceph fs new $alpha_str cephfs_data cephfs_metadata > [kotresh@fedora build]$ bin/ceph fs ls > > So restricting the fsname length in the kclient would likely cause issues. If we need to enforce the limitation, I think, it should be done at server side first and it’s a separate effort. > I am not sure that Linux kernel is capable to digest any name bigger than NAME_MAX. Are you sure that we can pass xfstests with filesystem name bigger than NAME_MAX? Another point here that we can put buffer with name inline into struct ceph_mdsmap if the name cannot be bigger than NAME_MAX, for example. In this case we don't need to allocate fs_name memory for every ceph_mdsmap_decode() call. > > > > > 3. I will fix up doutc in the next revision. > > > 4. The fs_name is part of the mdsmap in the server side [1]. The kernel client decodes only necessary fields from the mdsmap sent by the server. Until now, the fs_name > > > was not being decoded, as part of this fix, it's required and being decoded. > > > > > > > Correct me if I am wrong. I can create a Ceph cluster with several MDS servers. > > In this cluster, I can create multiple file system volumes. And every file > > system volume will have some name (fs_name). So, if we store fs_name into > > mdsmap, then which name do we imply here? Do we imply cluster name as fs_name or > > name of particular file system volume? > > In CephFS, we mainly deal with two maps MDSMap[1] and FSMap[2]. The MDSMap represents > the state for a particular single filesystem. So the ‘fs_name’ in the MDSMap points to a file system > name that the MDSMap represents. Each filesystem will have a distinct MDSMap. The FSMap was > introduced to support multiple filesystems in the cluster. The FSMap holds all the filesystems in the > cluster using the MDSMap of each file system. The clients subscribe to these maps. So when kclient > is receiving a mdsmap, it’s corresponding to the filesystem it’s dealing with. > So, it's sounds to me that MDS keeps multiple MDSMaps for multiple file systems. And kernel side receives only MDSMap for operations. The FSMap is kept on MDS side and kernel never receives it. Am I right here? Thanks, Slava. > [1] https://github.com/ceph/ceph/blob/main/src/mds/MDSMap.h > [2] https://github.com/ceph/ceph/blob/main/src/mds/FSMap.h > > Thanks, > Kotresh H R > > > > > Thanks, > > Slava. > > > > > > > > > > > > [1] > > https://elixir.bootlin.com/linux/v6.16/source/include/uapi/linux/limits.h#L12 > > > > > [1] https://github.com/ceph/ceph/blob/main/src/mds/MDSMap.h#L596 > > > > > > On Tue, Jul 29, 2025 at 11:57 PM Viacheslav Dubeyko <Slava.Dubeyko@xxxxxxx> wrote: > > > > On Tue, 2025-07-29 at 22:32 +0530, khiremat@xxxxxxxxxx wrote: > > > > > From: Kotresh HR <khiremat@xxxxxxxxxx> > > > > > > > > > > The mds auth caps check should also validate the > > > > > fsname along with the associated caps. Not doing > > > > > so would result in applying the mds auth caps of > > > > > one fs on to the other fs in a multifs ceph cluster. > > > > > The patch fixes the same. > > > > > > > > > > Steps to Reproduce (on vstart cluster): > > > > > 1. Create two file systems in a cluster, say 'a' and 'b' > > > > > 2. ceph fs authorize a client.usr / r > > > > > 3. ceph fs authorize b client.usr / rw > > > > > 4. ceph auth get client.usr >> ./keyring > > > > > 5. sudo bin/mount.ceph usr@.a=/ /kmnt_a_usr/ > > > > > 6. touch /kmnt_a_usr/file1 (SHOULD NOT BE ALLOWED) > > > > > 7. sudo bin/mount.ceph admin@.a=/ /kmnt_a_admin > > > > > 8. echo "data" > /kmnt_a_admin/admin_file1 > > > > > 9. rm -f /kmnt_a_usr/admin_file1 (SHOULD NOT BE ALLOWED) > > > > > > > > > > > > > I think we are missing to explain here which problem or > > > > symptoms will see the user that has this issue. I assume that > > > > this will be seen as the issue reproduction: > > > > > > > > With client_3 which has only 1 filesystem in caps is working as expected > > > > mkdir /mnt/client_3/test_3 > > > > mkdir: cannot create directory ‘/mnt/client_3/test_3’: Permission denied > > > > > > > > Am I correct here? > > > > > > > > > URL: https://tracker.ceph.com/issues/72167 > > > > > Signed-off-by: Kotresh HR <khiremat@xxxxxxxxxx> > > > > > --- > > > > > fs/ceph/mds_client.c | 10 ++++++++++ > > > > > fs/ceph/mdsmap.c | 11 ++++++++++- > > > > > fs/ceph/mdsmap.h | 1 + > > > > > 3 files changed, 21 insertions(+), 1 deletion(-) > > > > > > > > > > diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c > > > > > index 9eed6d73a508..ba91f3360ff6 100644 > > > > > --- a/fs/ceph/mds_client.c > > > > > +++ b/fs/ceph/mds_client.c > > > > > @@ -5640,11 +5640,21 @@ static int ceph_mds_auth_match(struct ceph_mds_client *mdsc, > > > > > u32 caller_uid = from_kuid(&init_user_ns, cred->fsuid); > > > > > u32 caller_gid = from_kgid(&init_user_ns, cred->fsgid); > > > > > struct ceph_client *cl = mdsc->fsc->client; > > > > > + const char *fs_name = mdsc->mdsmap->fs_name; > > > > > const char *spath = mdsc->fsc->mount_options->server_path; > > > > > bool gid_matched = false; > > > > > u32 gid, tlen, len; > > > > > int i, j; > > > > > > > > > > + if (auth->match.fs_name && strcmp(auth->match.fs_name, fs_name)) { > > > > > > > > Should we consider to use strncmp() here? > > > > We should have the limitation of maximum possible name length. > > > > > > > > > + doutc(cl, "fsname check failed fs_name=%s match.fs_name=%s\n", > > > > > + fs_name, auth->match.fs_name); > > > > > + return 0; > > > > > + } else { > > > > > + doutc(cl, "fsname check passed fs_name=%s match.fs_name=%s\n", > > > > > + fs_name, auth->match.fs_name); > > > > > > > > I assume that we could call the doutc with auth->match.fs_name == NULL. So, I am > > > > expecting to have a crash here. > > > > > > > > > + } > > > > > + > > > > > doutc(cl, "match.uid %lld\n", auth->match.uid); > > > > > if (auth->match.uid != MDS_AUTH_UID_ANY) { > > > > > if (auth->match.uid != caller_uid) > > > > > diff --git a/fs/ceph/mdsmap.c b/fs/ceph/mdsmap.c > > > > > index 8109aba66e02..f1431ba0b33e 100644 > > > > > --- a/fs/ceph/mdsmap.c > > > > > +++ b/fs/ceph/mdsmap.c > > > > > @@ -356,7 +356,15 @@ struct ceph_mdsmap *ceph_mdsmap_decode(struct ceph_mds_client *mdsc, void **p, > > > > > /* enabled */ > > > > > ceph_decode_8_safe(p, end, m->m_enabled, bad_ext); > > > > > /* fs_name */ > > > > > - ceph_decode_skip_string(p, end, bad_ext); > > > > > + m->fs_name = ceph_extract_encoded_string(p, end, NULL, GFP_NOFS); > > > > > + if (IS_ERR(m->fs_name)) { > > > > > + err = PTR_ERR(m->fs_name); > > > > > + m->fs_name = NULL; > > > > > + if (err == -ENOMEM) > > > > > + goto out_err; > > > > > + else > > > > > + goto bad; > > > > > + } > > > > > } > > > > > /* damaged */ > > > > > if (mdsmap_ev >= 9) { > > > > > @@ -418,6 +426,7 @@ void ceph_mdsmap_destroy(struct ceph_mdsmap *m) > > > > > kfree(m->m_info); > > > > > } > > > > > kfree(m->m_data_pg_pools); > > > > > + kfree(m->fs_name); > > > > > kfree(m); > > > > > } > > > > > > > > > > diff --git a/fs/ceph/mdsmap.h b/fs/ceph/mdsmap.h > > > > > index 1f2171dd01bf..acb0a2a3627a 100644 > > > > > --- a/fs/ceph/mdsmap.h > > > > > +++ b/fs/ceph/mdsmap.h > > > > > @@ -45,6 +45,7 @@ struct ceph_mdsmap { > > > > > bool m_enabled; > > > > > bool m_damaged; > > > > > int m_num_laggy; > > > > > + char *fs_name; > > > > > > > > The ceph_mdsmap structure describes servers in the mds cluster [1]. > > > > Semantically, I don't see any relation of fs_name with this structure. > > > > As a result, I don't see the point to keep this pointer in this structure. > > > > Why the fs_name has been placed in this structure? > > > > > > > > Thanks, > > > > Slava. > > > > > > > > > }; > > > > > > > > > > static inline struct ceph_entity_addr * > > > > > > > > [1] https://elixir.bootlin.com/linux/v6.16/source/fs/ceph/mdsmap.h#L11 > > > >