Re: Support for transferring sparse files via scp/sftp correctly?

On Fri, 4 Apr 2025 at 07:07, Ron Frederick <ronf@xxxxxxxxxxxxx> wrote:
>
> On Apr 3, 2025, at 6:02 PM, Darren Tucker <dtucker@xxxxxxxxxxx> wrote:
> > On Sat, 29 Mar 2025 at 16:14, Ron Frederick <ronf@xxxxxxxxxxxxx> wrote:
> >> [...]
> >> If you don’t get all of the requested ranges in a single request, additional requests can be sent starting at just past the end of the last range previously returned.
> >>
> >> What do you think?
> >
> > That seems like it'd work well for things with SEEK_HOLE or equivalent, although there's always the chance of the underlying file changing between mapping it out and doing the transfer.
>
> Since my last message, I’ve also implemented support for this in Windows, which has a DeviceIoControl control code called FSCTL_QUERY_ALLOCATED_RANGES that returns an array of offset and length values, within a given range in a file (also specified by offset and length). So, it’s almost a direct mapping to the extension I proposed. I basically have three different versions of a request_ranges() function (Windows, systems with SEEK_DATA/SEEK_HOLE, and a dummy implementation for all other platforms which just returns the full range passed in).
>
> The risk of missing data due to file changes is no different than what could happen if you were reading data sequentially and something did a write to the source file after you had already copied that part of the file.
>
>
> > Damien pointed out that it's possible to do a reasonable but not perfect sparse file support by memcmp'ing your existing file buffer with a block of zeros and skipping the write if it matches.  OpenBSD's cp(1) does this (look for "skipholes"): https://cvsweb.openbsd.org/cgi-bin/cvsweb/src/bin/cp/utils.c?annotate=HEAD.

This should not be done. Either the system has SEEK_DATA/SEEK_HOLE or
Win32 (Windows and ReactOS) FSCTL_QUERY_ALLOCATED_RANGES, or you just
copy all bytes.
The misunderstanding is that sequences of 0x00 bytes are automatically
holes. That is not true. Holes represent ranges of "no data", and read
as 0x00 bytes only for backwards compatibility. Valid data ranges can
contain long sequences of 0x00 bytes, so PLEASE don't invent extra
holes in sparse files just because the data is a sequence of 0x00
bytes.
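
To make the distinction concrete, here is a minimal Python sketch of
asking the filesystem for the real data ranges via SEEK_DATA/SEEK_HOLE,
and falling back to "the whole range is data" where that is not
available. The name data_ranges() and its (offset, length) return shape
are illustrative assumptions, not the proposed SFTP extension or
anyone's actual implementation, and the Windows
FSCTL_QUERY_ALLOCATED_RANGES path is left out.

```python
import errno
import os


def data_ranges(fd, start, length):
    """Return [(offset, length), ...] for data inside [start, start+length).

    Hypothetical helper for illustration. It never inspects file contents,
    so a run of 0x00 bytes that the filesystem stores as data is reported
    as data. Note that lseek() moves the descriptor's file offset.
    """
    if length <= 0:
        return []
    end = start + length

    # Platform has no SEEK_DATA/SEEK_HOLE: treat everything as data.
    if not (hasattr(os, "SEEK_DATA") and hasattr(os, "SEEK_HOLE")):
        return [(start, length)]

    ranges = []
    pos = start
    try:
        while pos < end:
            try:
                data = os.lseek(fd, pos, os.SEEK_DATA)
            except OSError as exc:
                if exc.errno == errno.ENXIO:
                    break  # no data at or after pos
                raise
            if data >= end:
                break
            hole = os.lseek(fd, data, os.SEEK_HOLE)
            stop = min(hole, end)
            if stop <= data:
                break  # file changed underneath us; avoid looping forever
            ranges.append((data, stop - data))
            pos = stop
    except OSError:
        # Assumption: any other lseek failure means "not supported here",
        # so fall back to copying all bytes.
        return [(start, length)]
    return ranges
```

A copier built on this would read each returned range (for example with
os.pread) and write it at the same offset in the destination, then set
the destination length to the source length so a trailing hole is
preserved. How coarse the reported ranges are depends on the
filesystem.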

Lionel
_______________________________________________
openssh-unix-dev mailing list
openssh-unix-dev@xxxxxxxxxxx
https://lists.mindrot.org/mailman/listinfo/openssh-unix-dev



