Re: [PATCH 04/16] reftable/block: simplify how we track restart points

Patrick Steinhardt <ps@xxxxxx> · Mon, 7 Apr 2025 14:31:07 +0200

On Thu, Apr 03, 2025 at 08:17:50AM -0700, Karthik Nayak wrote:
> Patrick Steinhardt <ps@xxxxxx> writes:
> 
> > Restart points record the location of reftable records that do not use
> > prefix compression and are used to perform a binary search inside of a
> > block. These restart points are encoded at the end of a block, between
> > the record data and the footer of a table.
> >
> > The block structure contains three different variables related to these
> > restart points:
> >
> >   - The block length contains the length of the reftable block up to the
> >     restart points.
> >
> >   - The restart count contains the number of restart points contained in
> >     the block.
> >
> >   - The restart bytes variable tracks where the restart point data
> >     begins.
> >
> > Tracking all three of these variables is unnecessary though as the data
> > can be derived from one another: the block length without restart points
> > is the exact same as the offset of the restart count data, which we
> > already track via the `restart_bytes` data.
> >
> 
> Nit: This para makes it seem as if we'd eliminate 'block length' in
> support of having/keeping `restart_bytes`, but we remove both.

We don't, we only remove the block length. The restart bytes are
retained, but they are renamed to `restart_off` to better reflect what
it actually contains. The next paragraph tries to explain this:

> > Refactor the code so that we track the location of restart bytes not as
> > a pointer, but instead as an offset. This allows us to trivially get rid
> > of the `block_len` variable as described above. This avoids having the
> > confusing `block_len` variable and allows us to do less bookkeeping
> > overall.
> >
> > Signed-off-by: Patrick Steinhardt <ps@xxxxxx>
> > ---
> >  reftable/block.c | 25 ++++++++++++-------------
> >  reftable/block.h |  8 +++++---
> >  reftable/table.c |  2 +-
> >  3 files changed, 18 insertions(+), 17 deletions(-)
> >
> > diff --git a/reftable/block.c b/reftable/block.c
> > index 97740187259..f2567a8f0fd 100644
> > --- a/reftable/block.c
> > +++ b/reftable/block.c
> > @@ -216,10 +216,9 @@ int block_reader_init(struct block_reader *br, struct reftable_block *block,
> >  	uint32_t full_block_size = table_block_size;
> >  	uint8_t typ = block->data[header_off];
> >  	uint32_t sz = reftable_get_be24(block->data + header_off + 1);
> > -	int err = 0;
> > -	uint16_t restart_count = 0;
> > -	uint32_t restart_start = 0;
> > -	uint8_t *restart_bytes = NULL;
> > +	uint16_t restart_count;
> > +	uint32_t restart_off;
> 
> Nit: I guess this is to be consistent with `header_off`, but I would
> think spelling it out as `header_offset` is much easier to understand.

Yeah, I'm not much of a fan of such abbreviations, either. But I'd like
to retain this abbreviation for the sake of consistency.

Patrick