On 25/08/22 09:49AM, Junio C Hamano wrote: > Justin Tobler <jltobler@xxxxxxxxx> writes: > > > /* > > - * This creates one packfile per large blob unless bulk-checkin > > - * machinery is "plugged". > > + * This writes the specified object to a packfile. Objects written here > > + * during the same transaction are written to the same packfile. The > > + * packfile is not flushed until the transaction is flushed. The caller > > + * is expected to ensure a valid transaction is setup for objects to be > > + * recorded to. > > * > > * This also bypasses the usual "convert-to-git" dance, and that is on > > * purpose. We could write a streaming version of the converting > > diff --git a/object-file.c b/object-file.c > > index 1740aa2b2e3..bc15af42450 100644 > > --- a/object-file.c > > +++ b/object-file.c > > @@ -1253,19 +1253,26 @@ int index_fd(struct index_state *istate, struct object_id *oid, > > * Call xsize_t() only when needed to avoid potentially unnecessary > > * die() for large files. > > */ > > - if (type == OBJ_BLOB && path && would_convert_to_git_filter_fd(istate, path)) > > + if (type == OBJ_BLOB && path && would_convert_to_git_filter_fd(istate, path)) { > > ret = index_stream_convert_blob(istate, oid, fd, path, flags); > > - else if (!S_ISREG(st->st_mode)) > > + } else if (!S_ISREG(st->st_mode)) { > > ret = index_pipe(istate, oid, fd, type, path, flags); > > - else if ((st->st_size >= 0 && (size_t) st->st_size <= repo_settings_get_big_file_threshold(istate->repo)) || > > - type != OBJ_BLOB || > > - (path && would_convert_to_git(istate, path))) > > + } else if ((st->st_size >= 0 && > > + (size_t)st->st_size <= repo_settings_get_big_file_threshold(istate->repo)) || > > + type != OBJ_BLOB || > > + (path && would_convert_to_git(istate, path))) { > > ret = index_core(istate, oid, fd, xsize_t(st->st_size), > > type, path, flags); > > - else > > - ret = index_blob_bulk_checkin(the_repository->objects->transaction, > > + } else { > > + struct odb_transaction *transaction; > > + > > + transaction = begin_odb_transaction(the_repository->objects); > > + ret = index_blob_bulk_checkin(transaction, > > oid, fd, xsize_t(st->st_size), > > path, flags); > > + end_odb_transaction(transaction); > > + } > > Interesting. If the caller does the odb transaction management > itself by calling begin/end before calling this function and fed two > or more large objects, the original code did the right thing with > .nesting set to 1. In the new code, it still does the right thing > even during the call to index_blob_bulk_checkin we raise .nesting by > one, because the all non-zero .nesting values mean the same thing to > the machinery, thanks to lazy initialization of the .objdir. In the original code, if no transaction was setup prior to invoking index_blob_bulk_checkin() (i.e. nesting == 0), the object would be written to its own packfile and flushed immediately. If a transaction had been started further upstream (i.e nesting > 0), the object would be written to a packfile, but not flushed. This allowed for subseqent calls to index_blob_bulk_checkin() to write objects to the same packfile. Only when transaction ended would the packfile be flushed. With this change, index_blob_bulk_checkin() must be provided a transaction so it can write the object to the packfile. As there is now always a transaction involved, there is no longer any automatic packfile flushing. The caller is required to ensure a transaction handling as appropriate. > So, isn't the comment above the function now less accurate than > before? The caller of this function does not have to do anything > and we do not expect the caller to "ensure a valid transaction" at > all, no? I'm not quite sure I follow. index_blob_bulk_checkin() now expects a transaction to be setup even if we intend to only write a single object. Thus the call site in index_fd() is adjusted to ensure there is a transaction via invoking begin_odb_transaction() and end_odb_transaction() before/after the function respectively. Just to clarify, are we talking about the comment above index_blob_bulk_checkin()? -Justin