Justin Tobler <jltobler@xxxxxxxxx> writes: > On 25/08/22 09:49AM, Junio C Hamano wrote: >> Justin Tobler <jltobler@xxxxxxxxx> writes: >> >> > /* >> > - * This creates one packfile per large blob unless bulk-checkin >> > - * machinery is "plugged". >> > + * This writes the specified object to a packfile. Objects written here >> > + * during the same transaction are written to the same packfile. The >> > + * packfile is not flushed until the transaction is flushed. The caller >> > + * is expected to ensure a valid transaction is setup for objects to be >> > + * recorded to. >> > * >> > * This also bypasses the usual "convert-to-git" dance, and that is on >> > * purpose. We could write a streaming version of the converting >> > diff --git a/object-file.c b/object-file.c >> > index 1740aa2b2e3..bc15af42450 100644 >> > --- a/object-file.c >> > +++ b/object-file.c >> > @@ -1253,19 +1253,26 @@ int index_fd(struct index_state *istate, struct object_id *oid, >> > * Call xsize_t() only when needed to avoid potentially unnecessary >> > * die() for large files. >> > */ >> > - if (type == OBJ_BLOB && path && would_convert_to_git_filter_fd(istate, path)) >> > + if (type == OBJ_BLOB && path && would_convert_to_git_filter_fd(istate, path)) { >> > ret = index_stream_convert_blob(istate, oid, fd, path, flags); >> > - else if (!S_ISREG(st->st_mode)) >> > + } else if (!S_ISREG(st->st_mode)) { >> > ret = index_pipe(istate, oid, fd, type, path, flags); >> > - else if ((st->st_size >= 0 && (size_t) st->st_size <= repo_settings_get_big_file_threshold(istate->repo)) || >> > - type != OBJ_BLOB || >> > - (path && would_convert_to_git(istate, path))) >> > + } else if ((st->st_size >= 0 && >> > + (size_t)st->st_size <= repo_settings_get_big_file_threshold(istate->repo)) || >> > + type != OBJ_BLOB || >> > + (path && would_convert_to_git(istate, path))) { >> > ret = index_core(istate, oid, fd, xsize_t(st->st_size), >> > type, path, flags); >> > - else >> > - ret = index_blob_bulk_checkin(the_repository->objects->transaction, >> > + } else { >> > + struct odb_transaction *transaction; >> > + >> > + transaction = begin_odb_transaction(the_repository->objects); >> > + ret = index_blob_bulk_checkin(transaction, >> > oid, fd, xsize_t(st->st_size), >> > path, flags); >> > + end_odb_transaction(transaction); >> > + } >> >> Interesting. If the caller does the odb transaction management >> itself by calling begin/end before calling this function and fed two >> or more large objects, the original code did the right thing with >> .nesting set to 1. In the new code, it still does the right thing >> even during the call to index_blob_bulk_checkin we raise .nesting by >> one, because the all non-zero .nesting values mean the same thing to >> the machinery, thanks to lazy initialization of the .objdir. > > In the original code, if no transaction was setup prior to invoking > index_blob_bulk_checkin() (i.e. nesting == 0), the object would be > written to its own packfile and flushed immediately. If a transaction > had been started further upstream (i.e nesting > 0), the object would be > written to a packfile, but not flushed. This allowed for subseqent calls > to index_blob_bulk_checkin() to write objects to the same packfile. Only > when transaction ended would the packfile be flushed. > > With this change, index_blob_bulk_checkin() must be provided a > transaction so it can write the object to the packfile. As there is now > always a transaction involved, there is no longer any automatic packfile > flushing. The caller is required to ensure a transaction handling as > appropriate. > >> So, isn't the comment above the function now less accurate than >> before? The caller of this function does not have to do anything >> and we do not expect the caller to "ensure a valid transaction" at >> all, no? > > I'm not quite sure I follow. index_blob_bulk_checkin() now expects a > transaction to be setup even if we intend to only write a single object. > Thus the call site in index_fd() is adjusted to ensure there is a > transaction via invoking begin_odb_transaction() and > end_odb_transaction() before/after the function respectively. > > Just to clarify, are we talking about the comment above > index_blob_bulk_checkin()? Ahh, sorry, I misread the patch and missed the hunk/file boundary; iow, thought the comment was about index_fd(), which was the source of my confusion. Thanks for correcting me.