Re: [PATCH v2 3/4] bulk-checkin: require transaction for index_blob_bulk_checkin()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 25/08/22 09:49AM, Junio C Hamano wrote:
> Justin Tobler <jltobler@xxxxxxxxx> writes:
> 
> >  /*
> > - * This creates one packfile per large blob unless bulk-checkin
> > - * machinery is "plugged".
> > + * This writes the specified object to a packfile. Objects written here
> > + * during the same transaction are written to the same packfile. The
> > + * packfile is not flushed until the transaction is flushed. The caller
> > + * is expected to ensure a valid transaction is setup for objects to be
> > + * recorded to.
> >   *
> >   * This also bypasses the usual "convert-to-git" dance, and that is on
> >   * purpose. We could write a streaming version of the converting
> > diff --git a/object-file.c b/object-file.c
> > index 1740aa2b2e3..bc15af42450 100644
> > --- a/object-file.c
> > +++ b/object-file.c
> > @@ -1253,19 +1253,26 @@ int index_fd(struct index_state *istate, struct object_id *oid,
> >  	 * Call xsize_t() only when needed to avoid potentially unnecessary
> >  	 * die() for large files.
> >  	 */
> > -	if (type == OBJ_BLOB && path && would_convert_to_git_filter_fd(istate, path))
> > +	if (type == OBJ_BLOB && path && would_convert_to_git_filter_fd(istate, path)) {
> >  		ret = index_stream_convert_blob(istate, oid, fd, path, flags);
> > -	else if (!S_ISREG(st->st_mode))
> > +	} else if (!S_ISREG(st->st_mode)) {
> >  		ret = index_pipe(istate, oid, fd, type, path, flags);
> > -	else if ((st->st_size >= 0 && (size_t) st->st_size <= repo_settings_get_big_file_threshold(istate->repo)) ||
> > -		 type != OBJ_BLOB ||
> > -		 (path && would_convert_to_git(istate, path)))
> > +	} else if ((st->st_size >= 0 &&
> > +		    (size_t)st->st_size <= repo_settings_get_big_file_threshold(istate->repo)) ||
> > +		   type != OBJ_BLOB ||
> > +		   (path && would_convert_to_git(istate, path))) {
> >  		ret = index_core(istate, oid, fd, xsize_t(st->st_size),
> >  				 type, path, flags);
> > -	else
> > -		ret = index_blob_bulk_checkin(the_repository->objects->transaction,
> > +	} else {
> > +		struct odb_transaction *transaction;
> > +
> > +		transaction = begin_odb_transaction(the_repository->objects);
> > +		ret = index_blob_bulk_checkin(transaction,
> >  					      oid, fd, xsize_t(st->st_size),
> >  					      path, flags);
> > +		end_odb_transaction(transaction);
> > +	}
> 
> Interesting.  If the caller does the odb transaction management
> itself by calling begin/end before calling this function and fed two
> or more large objects, the original code did the right thing with
> .nesting set to 1.  In the new code, it still does the right thing
> even during the call to index_blob_bulk_checkin we raise .nesting by
> one, because the all non-zero .nesting values mean the same thing to
> the machinery, thanks to lazy initialization of the .objdir.

In the original code, if no transaction was setup prior to invoking
index_blob_bulk_checkin() (i.e. nesting == 0), the object would be
written to its own packfile and flushed immediately. If a transaction
had been started further upstream (i.e nesting > 0), the object would be
written to a packfile, but not flushed. This allowed for subseqent calls
to index_blob_bulk_checkin() to write objects to the same packfile. Only
when transaction ended would the packfile be flushed.

With this change, index_blob_bulk_checkin() must be provided a
transaction so it can write the object to the packfile. As there is now
always a transaction involved, there is no longer any automatic packfile
flushing. The caller is required to ensure a transaction handling as
appropriate.

> So, isn't the comment above the function now less accurate than
> before?  The caller of this function does not have to do anything
> and we do not expect the caller to "ensure a valid transaction" at
> all, no?

I'm not quite sure I follow. index_blob_bulk_checkin() now expects a
transaction to be setup even if we intend to only write a single object.
Thus the call site in index_fd() is adjusted to ensure there is a
transaction via invoking begin_odb_transaction() and
end_odb_transaction() before/after the function respectively.

Just to clarify, are we talking about the comment above
index_blob_bulk_checkin()?

-Justin




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux