On Thu, Apr 24, 2025 at 06:09:45PM +0200, Luca Di Maio wrote: > On Wed, Apr 23, 2025 at 01:23:58PM -0700, Darrick J. Wong wrote: > > On Wed, Apr 23, 2025 at 06:03:17PM +0200, Luca Di Maio wrote: > > > +static void fail(char *msg, int i) > > > +{ > > > + fprintf(stderr, _("%s: %s [%d - %s]\n"), progname, msg, i, strerror(i)); > > > + exit(1); > > > +} > > > + > > > +static int newregfile(char *fname) > > > +{ > > > + int fd; > > > + off_t size; > > > + > > > + if ((fd = open(fname, O_RDONLY)) < 0 || (size = filesize(fd)) < 0) { > > > + fprintf(stderr, _("%s: cannot open %s: %s\n"), progname, fname, > > > + strerror(errno)); > > > + exit(1); > > > + } > > > + > > > + return fd; > > > +} > > > > Why is this copy-pasting code from proto.c? Put the new functions > > there, and then you don't need all this externing. > > > > Right, this is because with a separate flag I thought it would have been > better to keep it in a separate file. Common functionality goes together in a C module, regardless of how it gets called. > With the new behaviour you proposed in the previous mail (one -p flag, > check if file/directory) then I can unify back into proto.c, thus > removing all the exported functions changes. <nod> > > > + > > > +static void writetimestamps(struct xfs_inode *ip, struct stat statbuf) > > > +{ > > > + struct timespec64 ts; > > > + > > > + /* > > > + * Copy timestamps from source file to destination inode. > > > + * In order to not be influenced by our own access timestamp, > > > + * we set atime and ctime to mtime of the source file. > > > + * Usually reproducible archives will delete or not register > > > + * atime and ctime, for example: > > > + * https://www.gnu.org/software/tar/manual/html_section/Reproducibility.html > > > + */ > > > + ts.tv_sec = statbuf.st_mtime; > > > + ts.tv_nsec = statbuf.st_mtim.tv_nsec; > > > + inode_set_atime_to_ts(VFS_I(ip), ts); > > > + inode_set_ctime_to_ts(VFS_I(ip), ts); > > > + inode_set_mtime_to_ts(VFS_I(ip), ts); > > > > This seems weird to me that you'd set [ac]time to mtime. Why not open > > the source file O_ATIME and copy atime? And why would copying ctime not > > result in a reproducible build? > > > > Not sure what you do about crtime. > > > > The problem stems from the extraction of the artifact. Usually > reproducible archives will remove [ac]time and only keep mtime, but in > the moment that a file is extracted, any filesystem will assign [ac]time > to the moment of extraction. > This will add randomness not to the filesystem itself, because it will > be reproducible if acting on the same extracted archive, but it will not > be reproducible if acting on a new extraction of the same archive. > > Another approach we can do is what mkfs.ext4's populate functionality is > doing: while it preserves mtime, [cr,a,c]time is set to whatever time the > mkfs command is running. > > This would make it preserve the important timestamp (mtime) and move the > "problem" of the reproducible/changing timestamp to the environment, > while keeping the behaviour of mkfs.xfs sensible > > What do you think? The thing is, if you were relying on atime/mtime for detection of "file data changed since last read" then /not/ copying atime into the filesystem breaks that property in the image. How about copying [acm]time from the source file by default, but then add a new -p noatime option to skip the atime? ctime/crtime should be the current time when mkfs command is running. I assume that you have a gettimeofday type wrapper that makes it always return the same value? > > > + /* > > > + * copy over file content, attributes and > > > + * timestamps > > > + */ > > > + if (fd != 0) { > > > + writefile(ip, fname, fd); > > > + writeattrs(ip, fname, fd); > > > > Since we're adding features, should this read the fsxattr info from the > > source file, override it with the set fields in *fsxp, and set that on > > the file? If you're going to slurp up a directory, you might as well > > get all the non-xattr file attributes. > > > > Right, I thought creatproto() did that, but now I see that this is done > only for the root inode, I'll add this for others too, thanks. Right. > > > + libxfs_parent_finish(mp, ppargs); > > > + tp = NULL; > > > > Shouldn't this copy xattrs and fsxattrs to directories and symlinks too? > > > > Right, will add, thanks. <nod> > > > +/* > > > + * walk_dir will recursively list files and directories > > > + * and populate the mountpoint *mp with them using handle_direntry(). > > > + */ > > > +static void walk_dir(struct xfs_mount *mp, struct xfs_inode *pip, > > > + struct fsxattr *fsxp, char *cur_path) > > > +{ > > > + DIR *dir; > > > + struct dirent *entry; > > > + > > > + /* > > > + * open input directory and iterate over all entries in it. > > > + * when another directory is found, we will recursively call > > > + * populatefromdir. > > > + */ > > > + if ((dir = opendir(cur_path)) == NULL) > > > + fail(_("cannot open input dir"), 1); > > > + while ((entry = readdir(dir)) != NULL) { > > > + handle_direntry(mp, pip, fsxp, cur_path, entry); > > > + } > > > + closedir(dir); > > > +} > > > > nftw() ? Which has the nice feature of constraining the number of open > > dirs at any given time. > > > > --D > > > > The problem with nftw() is that working with callback functions, we will > need to switch to static variables for state, for example to keep track > of each ip's pip, while with the recursive approach we can have some > state and basically walk_dir() behaves similar to parseproto(), making > changes to the rest of the file minimal. > This seems to involve a lot more changes than now where we're basically > just adding a limited number of functions to proto.c. Eck, ok. Never mind then. I guess we could try to bump RLIMIT_NOFILE in that case to avoid EMFILE. --D > Thanks again for the review Darrick, > I'll wait for your feedback on the walk_dir() vs nftw() and the [ac]time > approach, > thanks > > L. >