Re: Support for transferring sparse files via scp/sftp correctly?

On 3/7/25 00:38, Damien Miller wrote:
On Wed, 5 Mar 2025, Cedric Blancher wrote:

On Tue, 4 Mar 2025 at 21:22, Chris Rapier <rapier@xxxxxxx> wrote:



On 3/4/25 05:34, Philipp Marek via openssh-unix-dev wrote:
Does OpenSSH scp/sftp mode transfer sparse files correctly, i.e. are
holes skipped and not transferred as chunks of 0 bytes? [1]

We're asking about sparse files in the >= 1PB range, which consists of
multi-TB holes with around 600-2000GB of valid data.


Perhaps rsync would be a good fit here,
it supports --sparse.
_______________________________________________
openssh-unix-dev mailing list
openssh-unix-dev@xxxxxxxxxxx
https://lists.mindrot.org/mailman/listinfo/openssh-unix-dev

I think one of the issues you are going to face is that SEEK_DATA and
SEEK_HOLE don't seem to be currently supported under OpenBSD. Since
that's the home OS for OpenSSH this could create portability issues.
While you can get around that with the judicious use of defines it means
that the feature set will start to shift between different OSes.
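
A rough sketch of what that kind of conditional support could look like, written in Python rather than C for brevity. The function name data_extents and the fallback policy are assumptions made for illustration and are not taken from OpenSSH. Where SEEK_DATA/SEEK_HOLE are missing or rejected, the whole file is simply reported as data, so the caller behaves as it does today.

```python
import errno
import os


def data_extents(fd, size):
    """Yield (offset, length) for each data region of an open file.

    Hypothetical helper: uses SEEK_DATA/SEEK_HOLE where the platform
    offers them and otherwise reports the whole file as one data region.
    """
    if not (hasattr(os, "SEEK_DATA") and hasattr(os, "SEEK_HOLE")):
        # No hole-seeking support on this platform: everything is data.
        if size > 0:
            yield (0, size)
        return

    offset = 0
    while offset < size:
        try:
            start = os.lseek(fd, offset, os.SEEK_DATA)
        except OSError as e:
            if e.errno == errno.ENXIO:
                return  # no data at or after offset; the rest is a hole
            # The call was rejected for some other reason: fall back to
            # treating the remainder of the file as data.
            yield (offset, size - offset)
            return
        # The end of the file counts as an implicit hole, so this
        # always returns a position greater than start.
        end = os.lseek(fd, start, os.SEEK_HOLE)
        yield (start, end - start)
        offset = end
```

The extents reported depend on the filesystem's allocation granularity, so a sender built on this would skip more or less depending on where the file lives.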

Unfortunately, OpenBSD does not implement many other APIs either. But other
OSes do implement SEEK_DATA+SEEK_HOLE, including FreeBSD, Linux,
Solaris, Illumos and even Cygwin. Even NFS has a SEEK operation to look up
holes and data sections in files.
SEEK_HOLE+SEEK_DATA are also now part of the POSIX standard, so IMO it
is time to face the bug that sparse files are not handled correctly
and fix it.

You and the others on this thread are IIRC the first people in sftp's
24 year history to ever ask for sparse file support. Its absence is not
a bug and adding it will almost certainly require new protocol extensions.

This is valid and one of the issues that I brought up with my co-developer when he brought this to my attention. I am not convinced it is needed except in a few corner cases as I tend to view SSH more as a transport mechanism than anything else. I don't see the need to replicate what is adequately handled by other tools. That said, I do think it's an interesting problem.

Being pushy with volunteer developers, telling us what our priorities
should be, assigning us work, etc. will not have the result you want.

Personally, I'm not asking any of the developers to do this work. I really do apologize if it came across that way. I think it's kind of a big ask and adequately handled by the use of rsync in most use cases. It is something I *might* look at doing because I can see the value for people in my community of HPC science users (which is who I develop HPN-SSH for). However, I, like you, need to balance that against other development priorities. This is on my 'maybe' list but that's about it.

If you want this to happen, I recommend starting by figuring out what
protocol extensions need to be made, and how to support sparse files
on systems without SEEK_DATA/HOLE - it should be pretty easy to do this on
upload without these flags and without extensions.

Avoiding the use of SEEK_HOLE and SEEK_DATA makes sense from a portability perspective and, after looking at rsync, seems feasible. That said, handling this in terms of the protocol is important. I'm willing, at times, to bend the protocol but not break it.
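
To make the portable approach concrete, here is a minimal sketch of zero-block detection, shown as a local copy in Python. It is not rsync's or OpenSSH's code; the name copy_sparse and the 64 KiB block size are assumptions picked for illustration. All-zero blocks are never written, the output position is moved past them instead, and a final truncate sets the length in case the file ends in a hole.

```python
import os

BLOCK = 64 * 1024  # assumed block size, not taken from rsync or OpenSSH


def copy_sparse(src, dst, block=BLOCK):
    """Copy src to dst (binary file objects), skipping all-zero blocks.

    dst must be a real, seekable file opened for writing. Returns the
    number of bytes the destination logically contains.
    """
    zeros = bytes(block)
    total = 0
    while True:
        chunk = src.read(block)
        if not chunk:
            break
        if chunk == zeros[:len(chunk)]:
            # Nothing but zeros: move the write position forward and
            # let the filesystem leave a hole where it can.
            dst.seek(len(chunk), os.SEEK_CUR)
        else:
            dst.write(chunk)
        total += len(chunk)
    # A trailing hole is never written, so set the final size explicitly.
    dst.truncate(total)
    return total
```

An upload could apply the same test on the client and simply not send the zero blocks, since each sftp write request already carries its own file offset. Whether the receiving filesystem actually stores the gaps as holes is up to that filesystem.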

Chris