-rw-r--r-- CMakeLists.txt |  15
-rw-r--r-- README.md      | 124
-rw-r--r-- doc/README.md  |  11
-rw-r--r-- doc/mscp.1.in  | 316
-rw-r--r-- doc/mscp.rst   | 210
5 files changed, 569 insertions, 107 deletions
diff --git a/CMakeLists.txt b/CMakeLists.txt
index d04f2a9..b6f8beb 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -142,6 +142,21 @@ target_compile_options(mscp PRIVATE ${MSCP_COMPILE_OPTS})
 
 install(TARGETS mscp RUNTIME DESTINATION bin)
 
+# mscp manpage and document
+configure_file(
+	${mscp_SOURCE_DIR}/doc/mscp.1.in
+	${PROJECT_BINARY_DIR}/mscp.1)
+
+add_custom_target(update-mscp-rst
+	COMMENT "Update doc/mscp.rst from mscp.1.in"
+	WORKING_DIRECTORY ${PROJECT_BINARY_DIR}
+	COMMAND
+	pandoc -s -f man mscp.1 -t rst -o ${PROJECT_SOURCE_DIR}/doc/mscp.rst)
+
+install(FILES ${PROJECT_BINARY_DIR}/mscp.1
+	DESTINATION ${CMAKE_INSTALL_MANDIR}/man1)
+
+
 # Test
 add_test(NAME pytest
 	COMMAND python3 -m pytest -v
diff --git a/README.md b/README.md
--- a/README.md
+++ b/README.md
@@ -3,17 +3,20 @@
 [](https://github.com/upa/mscp/actions/workflows/build-ubuntu.yml)
 [](https://github.com/upa/mscp/actions/workflows/build-macos.yml)
 [](https://github.com/upa/mscp/actions/workflows/test.yml)
-`mscp`, a variant of `scp`, copies files over multiple ssh (SFTP)
-connections. Multiple threads and connections in mscp transfer (1)
-multiple files simultaneously and (2) a large file in parallel. It
-would shorten the waiting time for transferring a lot of/large files
-over networks.
+`mscp`, a variant of `scp`, copies files over multiple SSH (SFTP)
+connections by multiple threads. It enables transferring (1) multiple
+files simultaneously and (2) a large file in parallel, reducing the
+transfer time for a lot of/large files over networks.
 
-You can use `mscp` like `scp`, for example, `mscp
-user@example.com:srcfile /tmp/dstfile`. Remote hosts only need to run
-standard `sshd` supporting the SFTP subsystem (e.g. openssh-server),
-and you need to be able to ssh to the hosts as usual. `mscp` does not
-require anything else.
+You can use `mscp` like `scp`, for example:
+
+```shell-session
+$ mscp user@example.com:srcfile /tmp/dstfile
+```
+
+Remote hosts only need to run standard `sshd` supporting the SFTP
+subsystem (e.g. openssh-server), and you need to be able to ssh to the
+hosts as usual. `mscp` does not require anything else.
 
 https://github.com/upa/mscp/assets/184632/19230f57-be7f-4ef0-98dd-cb4c460f570d
 
@@ -62,7 +65,7 @@ chmod 755 /usr/local/bin/mscp
 
 ## Build
 
-mscp depends on a patched [libssh](https://www.libssh.org/). The
+mscp depends on a patched [libssh](https://www.libssh.org/). The
 patch introduces asynchronous SFTP Write, which is derived from
 https://github.com/limes-datentechnik-gmbh/libssh (see [Re: SFTP Write
 async](https://archive.libssh.org/libssh/2020-06/0000004.html)).
@@ -94,105 +97,12 @@ make
 # install the mscp binary to CMAKE_INSTALL_PREFIX/bin (usually /usr/local/bin)
 make install
 ```
+
 Source tar balls (`mscp-X.X.X.tar.gz`, not `Source code`) in
 [Releases page](https://github.com/upa/mscp/releases) contains the
 patched version of libssh. So you can start from cmake with it.
 
-## Run
-
-- Usage
-
-```console
-$ mscp
-mscp v0.0.8: copy files over multiple ssh connections
-
-Usage: mscp [vqDHdNh] [-n nr_conns] [-m coremask] [-u max_startups]
-            [-s min_chunk_sz] [-S max_chunk_sz] [-a nr_ahead] [-b buf_sz]
-            [-l login_name] [-p port] [-i identity_file]
-            [-c cipher_spec] [-M hmac_spec] [-C compress] source ... target
-```
-
-- Example: copy a 15GB file on memory over a 100Gbps link
-  - Two Intel Xeon Gold 6130 machines directly connected with Intel E810 100Gbps NICs.
-  - Default `openssh-server` runs on the remote host.
-
-```console
-$ mscp /var/ram/test.img 10.0.0.1:/var/ram/
-[======================================] 100% 15GB/15GB 1.7GB/s 00:00 ETA
-```
-
-```console
-# with some optimizations. top speed reaches 3.0GB/s.
-$ mscp -n 5 -m 0x1f -c aes128-gcm@openssh.com /var/ram/test.img 10.0.0.1:/var/ram/
-[======================================] 100% 15GB/15GB 2.4GB/s 00:00 ETA
-```
-
-- `-v` option increments verbose output level.
-
-```console
-$ mscp test 10.0.0.1:
-[=======================================] 100% 49B /49B 198.8B/s 00:00 ETA
-```
-
-```console
-$ mscp -vv test 10.0.0.1:
-file: test/test1 -> ./test/test1
-file: test/testdir/asdf -> ./test/testdir/asdf
-file: test/testdir/qwer -> ./test/testdir/qwer
-file: test/test2 -> ./test/test2
-we have only 4 chunk(s). set number of connections to 4
-connecting to localhost for a copy thread...
-connecting to localhost for a copy thread...
-connecting to localhost for a copy thread...
-copy start: test/test1
-copy start: test/test2
-copy start: test/testdir/asdf
-copy start: test/testdir/qwer
-copy done: test/test1
-copy done: test/test2
-copy done: test/testdir/qwer
-copy done: test/testdir/asdf
-[=======================================] 100% 49B /49B 198.1B/s 00:00 ETA
-```
-
-- Full usage
-
-```console
-$ mscp -h
-mscp v0.0.9-11-g5802679: copy files over multiple ssh connections
-
-Usage: mscp [vqDHdNh] [-n nr_conns] [-m coremask] [-u max_startups]
-            [-s min_chunk_sz] [-S max_chunk_sz] [-a nr_ahead] [-b buf_sz]
-            [-l login_name] [-p port] [-F ssh_config] [-i identity_file]
-            [-c cipher_spec] [-M hmac_spec] [-C compress] source ... target
-
-    -n NR_CONNECTIONS  number of connections (default: floor(log(cores)*2)+1)
-    -m COREMASK        hex value to specify cores where threads pinned
-    -u MAX_STARTUPS    number of concurrent outgoing connections (default: 8)
-    -s MIN_CHUNK_SIZE  min chunk size (default: 64MB)
-    -S MAX_CHUNK_SIZE  max chunk size (default: filesize/nr_conn)
-
-    -a NR_AHEAD        number of inflight SFTP commands (default: 32)
-    -b BUF_SZ          buffer size for i/o and transfer
-
-    -v                 increment verbose output level
-    -q                 disable output
-    -D                 dry run. check copy destinations with -vvv
-    -r                 no effect
-
-    -l LOGIN_NAME      login name
-    -p PORT            port number
-    -F CONFIG          path to user ssh config (default ~/.ssh/config)
-    -i IDENTITY        identity file for public key authentication
-    -c CIPHER          cipher spec
-    -M HMAC            hmac spec
-    -C COMPRESS        enable compression: yes, no, zlib, zlib@openssh.com
-    -H                 disable hostkey check
-    -d                 increment ssh debug output level
-    -N                 enable Nagle's algorithm (default disabled)
-    -h                 print this help
-```
+## Documentation
 
-Note: mscp is still under development, and the author is not
-responsible for any accidents due to mscp.
+[manpage](/doc/mscp.rst) is available.
\ No newline at end of file
diff --git a/doc/README.md b/doc/README.md
new file mode 100644
index 0000000..d4bf747
--- /dev/null
+++ b/doc/README.md
@@ -0,0 +1,11 @@
+
+# Document
+
+The base file of documents is `mscp.1.in`. The manpage of mscp and
+`doc/mscp.rst` are generated from `mscp.1.in`.
+
+When `mscp.1.in` is changed, update `doc/mscp.rst` by:
+
+1. `cd build`
+2. `cmake ..`
+3. `make update-mscp-rst`
\ No newline at end of file
diff --git a/doc/mscp.1.in b/doc/mscp.1.in
new file mode 100644
index 0000000..545ea09
--- /dev/null
+++ b/doc/mscp.1.in
@@ -0,0 +1,316 @@
+.TH MSCP 1 "@MSCP_BUILD_VERSION@" "mscp" "User Commands"
+
+.SH NAME
+mscp \- copy files over multiple SSH connections
+
+.SH SYNOPSIS
+
+.B mscp
+.RB [ \-vqDHdNh ]
+[\c
+.BI \-n \ NR_CONNECTIONS\c
+]
+[\c
+.BI \-m \ COREMASK\c
+]
+[\c
+.BI \-u \ MAX_STARTUPS\c
+]
+[\c
+.BI \-I \ INTERVAL\c
+]
+[\c
+.BI \-s \ MIN_CHUNK_SIZE\c
+]
+[\c
+.BI \-S \ MAX_CHUNK_SIZE\c
+]
+[\c
+.BI \-a \ NR_AHEAD\c
+]
+[\c
+.BI \-b \ BUF_SIZE\c
+]
+[\c
+.BI \-l \ LOGIN_NAME\c
+]
+[\c
+.BR \-p |\c
+.BI \-P \ PORT\c
+]
+[\c
+.BI \-F \ CONFIG\c
+]
+[\c
+.BI \-i \ IDENTITY\c
+]
+[\c
+.BI \-c \ CIPHER\c
+]
+[\c
+.BI \-M \ HMAC\c
+]
+[\c
+.BI \-C \ COMPRESS\c
+]
+.I source ... target
+
+.SH DESCRIPTION
+
+.PP
+.B mscp
+copies files over multiple SSH (SFTP) connections by multiple
+threads. It enables transferring (1) multiple files simultaneously and
+(2) a large file in parallel, reducing the transfer time for a lot
+of/large files over networks.
+
+.PP
+The usage of
+.B mscp
+imitates the
+.B scp
+command of
+.I OpenSSH,
+for example:
+
+.nf
+	$ mscp srcfile user@example.com:dstfile
+.fi
+
+Remote hosts only need to run standard
+.B sshd
+supporting the SFTP subsystem, and users need to be able to
+.B ssh
+to the hosts as usual.
+.B mscp
+does not require anything else.
+
+.PP
+.B mscp
+uses
+.UR https://\:www\:.libssh\:.org
+libssh
+.UE
+as its SSH implementation. Thus, supported SSH features, for example,
+authentication, encryption, and various options in ssh_config, follow
+what
+.I libssh
+supports.
+
+.SH OPTIONS
+.TP
+.B \-n \fINR_CONNECTIONS\fR
+Specifies the number of SSH connections. The default value is
+calculated from the number of CPU cores on the host with the following
+formula: floor(log(nr_cores)*2)+1.
+
+.TP
+.B \-m \fICOREMASK\fR
+Configures CPU cores to be used by the hexadecimal bitmask. All CPU
+cores are used by default.
+
+.TP
+.B \-u \fIMAX_STARTUPS\fR
+Specifies the number of concurrent outgoing SSH connections.
+.B sshd
+limits the number of simultaneous SSH connection attempts by
+.I MaxStartups
+in
+.I sshd_config.
+The default
+.I MaxStartups
+is 10; thus, the default MAX_STARTUPS is 8.
+
+.TP
+.B \-I \fIINTERVAL\fR
+Specifies the interval (in seconds) between SSH connection
+attempts. Some firewall products treat SSH connection attempts from a
+single source IP address for a short period as a brute force attack.
+This option inserts intervals between the attempts to avoid being
+determined as an attack. The default value is 0.
+
+.TP
+.B \-s \fIMIN_CHUNK_SIZE\fR
+Specifies the minimum chunk size.
+.B mscp
+divides a file into chunks and copies the chunks in parallel.
+
+.TP
+.B \-S \fIMAX_CHUNK_SIZE\fR
+Specifies the maximum chunk size. The default is file size divided by
+the number of connections.
+
+.TP
+.B \-a \fINR_AHEAD\fR
+Specifies the number of inflight SFTP commands. The default value is
+32.
+
+.TP
+.B \-b \fIBUF_SIZE\fR
+Specifies the buffer size for I/O and transfer over SFTP. The default
+value is 16384. Note that the SSH specification restricts buffer size
+delivered over SSH. Changing this value is not recommended at present.
+
+.TP
+.B \-v
+Increments the verbose output level.
+
+.TP
+.B \-q
+Quiet mode: turns off all outputs.
+
+.TP
+.B \-D
+Dry-run mode: it scans source files to be copied, calculates chunks,
+and resolves destination file paths. Dry-run mode with
+.B -vv
+option enables confirming files to be copied and their destination
+paths.
+
+.TP
+.B \-r
+No effect.
+.B mscp
+copies recursively if a source path is a directory. This option exists
+only for compatibility.
+
+.TP
+.B \-l \fILOGIN_NAME\fR
+Specifies the username to log in on the remote machine as with
+.I ssh(1).
+
+.TP
+.B \-p,\-P \fIPORT\fR
+Specifies the port number to connect to on the remote machine as with
+ssh(1) and scp(1).
+
+.TP
+.B \-F \fICONFIG\fR
+Specifies an alternative per-user ssh configuration file. Note that
+acceptable options in the configuration file are what
+.I libssh
+supports.
+
+.TP
+.B \-i \fIIDENTITY\fR
+Specifies the identity file for public key authentication.
+
+.TP
+.B \-c \fICIPHER\fR
+Selects the cipher to use for encrypting the data transfer. See
+.UR https://\:www\:.libssh\:.org/\:features/
+libssh features
+.UE .
+
+.TP
+.B \-M \fIHMAC\fR
+Specifies MAC hash algorithms. See
+.UR https://\:www\:.libssh\:.org/\:features/
+libssh features
+.UE .
+
+.TP
+.B \-C \fICOMPRESS\fR
+Enables compression: yes, no, zlib, zlib@openssh.com. The default is
+none. See
+.UR https://\:www\:.libssh\:.org/\:features/
+libssh features
+.UE .
+
+.TP
+.B \-H
+Disables hostkey checking.
+
+.TP
+.B \-d
+Increments the ssh debug output level.
+
+.TP
+.B \-N
+Enables Nagle's algorithm. It is disabled by default.
+
+.TP
+.B \-h
+Prints help.
+
+.SH EXIT STATUS
+Exit status is 0 on success, and >0 if an error occurs.
+
+.SH NOTES
+
+.PP
+.B mscp
+uses glob(3) for globbing pathnames, including matching patterns for
+local and remote paths. However, globbing on the
+.I remote
+side does not work with musl libc (used in Alpine Linux and the
+single-binary version of mscp) because musl libc does not support
+GLOB_ALTDIRFUNC.
+
+.PP
+.B mscp
+does not support remote-to-remote copy, which
+.B scp
+supports.
+
+.SH EXAMPLES
+
+.PP
+Copy a local file to a remote host with a different name:
+
+.nf
+	$ mscp ~/src-file 10.0.0.1:copied-file
+.fi
+
+.PP
+Copy a local file and a directory to /tmp at a remote host:
+
+.nf
+	$ mscp ~/src-file dir1 10.0.0.1:/tmp
+.fi
+
+.PP
+In a long fat network, the following options might improve performance:
+
+.nf
+	$ mscp -n 64 -m 0xffff -a 64 -c aes128-gcm@openssh.com src 10.0.0.1:
+.fi
+
+.B -n
+increases the number of SSH connections beyond the default,
+.B -m
+pins threads to specific CPU cores,
+.B -a
+increases asynchronous inflight SFTP WRITE/READ commands, and
+.B -c aes128-gcm@openssh.com
+will be faster than the default chacha20-poly1305 cipher, particularly
+on hosts that support AES-NI.
+
+
+
+.SH "SEE ALSO"
+.BR scp (1),
+.BR ssh (1),
+.BR sshd (8).
+
+.SH "PAPER REFERENCE"
+
+
+Ryo Nakamura and Yohei Kuga. 2023. Multi-threaded scp: Easy and Fast
+File Transfer over SSH. In Practice and Experience in Advanced
+Research Computing (PEARC '23). Association for Computing Machinery,
+New York, NY, USA, 320–323.
+.UR https://\:doi\:.org/\:10.1145/\:3569951.3597582
+DOI
+.UE .
+
+
+.SH CONTACT INFORMATION
+.PP
+For patches, bug reports, or feature requests, please open an issue on
+.UR https://\:github\:.com/\:upa/\:mscp
+GitHub
+.UE .
+
+.SH AUTHORS
+Ryo Nakamura <upa@haeena.net>
diff --git a/doc/mscp.rst b/doc/mscp.rst
new file mode 100644
index 0000000..cd74f04
--- /dev/null
+++ b/doc/mscp.rst
@@ -0,0 +1,210 @@
+====
+MSCP
+====
+
+:Date: v0.1.2-14-g24617d2
+
+NAME
+====
+
+mscp - copy files over multiple SSH connections
+
+SYNOPSIS
+========
+
+**mscp** [**-vqDHdNh**] [ **-n**\ *NR_CONNECTIONS* ] [
+**-m**\ *COREMASK* ] [ **-u**\ *MAX_STARTUPS* ] [ **-I**\ *INTERVAL* ] [
+**-s**\ *MIN_CHUNK_SIZE* ] [ **-S**\ *MAX_CHUNK_SIZE* ] [
+**-a**\ *NR_AHEAD* ] [ **-b**\ *BUF_SIZE* ] [ **-l**\ *LOGIN_NAME* ] [
+**-p**\ \| **-P**\ *PORT* ] [ **-F**\ *CONFIG* ] [ **-i**\ *IDENTITY* ]
+[ **-c**\ *CIPHER* ] [ **-M**\ *HMAC* ] [ **-C**\ *COMPRESS* ] *source
+... target*
+
+DESCRIPTION
+===========
+
+**mscp** copies files over multiple SSH (SFTP) connections by multiple
+threads. It enables transferring (1) multiple files simultaneously and
+(2) a large file in parallel, reducing the transfer time for a lot
+of/large files over networks.
+
+The usage of **mscp** imitates the **scp** command of *OpenSSH,* for
+example:
+
+::
+
+   $ mscp srcfile user@example.com:dstfile
+
+Remote hosts only need to run standard **sshd** supporting the SFTP
+subsystem, and users need to be able to **ssh** to the hosts as usual.
+**mscp** does not require anything else.
+
+**mscp** uses `libssh <https://www.libssh.org>`__ as its SSH
+implementation. Thus, supported SSH features, for example,
+authentication, encryption, and various options in ssh_config, follow
+what *libssh* supports.
+
+OPTIONS
+=======
+
+**-n NR_CONNECTIONS**
+   Specifies the number of SSH connections. The default value is
+   calculated from the number of CPU cores on the host with the
+   following formula: floor(log(nr_cores)*2)+1.
+
+**-m COREMASK**
+   Configures CPU cores to be used by the hexadecimal bitmask. All CPU
+   cores are used by default.
+
+**-u MAX_STARTUPS**
+   Specifies the number of concurrent outgoing SSH connections. **sshd**
+   limits the number of simultaneous SSH connection attempts by
+   *MaxStartups* in *sshd_config.* The default *MaxStartups* is 10;
+   thus, the default MAX_STARTUPS is 8.
+
+**-I INTERVAL**
+   Specifies the interval (in seconds) between SSH connection attempts.
+   Some firewall products treat SSH connection attempts from a single
+   source IP address for a short period as a brute force attack. This
+   option inserts intervals between the attempts to avoid being
+   determined as an attack. The default value is 0.
+
+**-s MIN_CHUNK_SIZE**
+   Specifies the minimum chunk size. **mscp** divides a file into chunks
+   and copies the chunks in parallel.
+
+**-S MAX_CHUNK_SIZE**
+   Specifies the maximum chunk size. The default is file size divided by
+   the number of connections.
+
+**-a NR_AHEAD**
+   Specifies the number of inflight SFTP commands. The default value is
+   32.
+
+**-b BUF_SIZE**
+   Specifies the buffer size for I/O and transfer over SFTP. The default
+   value is 16384. Note that the SSH specification restricts buffer size
+   delivered over SSH. Changing this value is not recommended at
+   present.
+
+**-v**
+   Increments the verbose output level.
+
+**-q**
+   Quiet mode: turns off all outputs.
+
+**-D**
+   Dry-run mode: it scans source files to be copied, calculates chunks,
+   and resolves destination file paths. Dry-run mode with **-vv** option
+   enables confirming files to be copied and their destination paths.
+
+**-r**
+   No effect. **mscp** copies recursively if a source path is a
+   directory. This option exists only for compatibility.
+
+**-l LOGIN_NAME**
+   Specifies the username to log in on the remote machine as with
+   *ssh(1).*
+
+**-p,-P PORT**
+   Specifies the port number to connect to on the remote machine as with
+   ssh(1) and scp(1).
+
+**-F CONFIG**
+   Specifies an alternative per-user ssh configuration file. Note that
+   acceptable options in the configuration file are what *libssh*
+   supports.
+
+**-i IDENTITY**
+   Specifies the identity file for public key authentication.
+
+**-c CIPHER**
+   Selects the cipher to use for encrypting the data transfer. See
+   `libssh features <https://www.libssh.org/features/>`__.
+
+**-M HMAC**
+   Specifies MAC hash algorithms. See `libssh
+   features <https://www.libssh.org/features/>`__.
+
+**-C COMPRESS**
+   Enables compression: yes, no, zlib, zlib@openssh.com. The default is
+   none. See `libssh features <https://www.libssh.org/features/>`__.
+
+**-H**
+   Disables hostkey checking.
+
+**-d**
+   Increments the ssh debug output level.
+
+**-N**
+   Enables Nagle's algorithm. It is disabled by default.
+
+**-h**
+   Prints help.
+
+EXIT STATUS
+===========
+
+Exit status is 0 on success, and >0 if an error occurs.
+
+NOTES
+=====
+
+**mscp** uses glob(3) for globbing pathnames, including matching
+patterns for local and remote paths. However, globbing on the *remote*
+side does not work with musl libc (used in Alpine Linux and the
+single-binary version of mscp) because musl libc does not support
+GLOB_ALTDIRFUNC.
+
+**mscp** does not support remote-to-remote copy, which **scp** supports.
+
+EXAMPLES
+========
+
+Copy a local file to a remote host with a different name:
+
+::
+
+   $ mscp ~/src-file 10.0.0.1:copied-file
+
+Copy a local file and a directory to /tmp at a remote host:
+
+::
+
+   $ mscp ~/src-file dir1 10.0.0.1:/tmp
+
+In a long fat network, the following options might improve performance:
+
+::
+
+   $ mscp -n 64 -m 0xffff -a 64 -c aes128-gcm@openssh.com src 10.0.0.1:
+
+**-n** increases the number of SSH connections beyond the default, **-m** pins
+threads to specific CPU cores, **-a** increases asynchronous inflight
+SFTP WRITE/READ commands, and **-c aes128-gcm@openssh.com** will be
+faster than the default chacha20-poly1305 cipher, particularly on hosts
+that support AES-NI.
+
+SEE ALSO
+========
+
+**scp**\ (1), **ssh**\ (1), **sshd**\ (8).
+
+PAPER REFERENCE
+===============
+
+Ryo Nakamura and Yohei Kuga. 2023. Multi-threaded scp: Easy and Fast
+File Transfer over SSH. In Practice and Experience in Advanced Research
+Computing (PEARC '23). Association for Computing Machinery, New York,
+NY, USA, 320–323. `DOI <https://doi.org/10.1145/3569951.3597582>`__.
+
+CONTACT INFORMATION
+===================
+
+For patches, bug reports, or feature requests, please open an issue on
+`GitHub <https://github.com/upa/mscp>`__.
+
+AUTHORS
+=======
+
+Ryo Nakamura <upa@haeena.net>
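
The manpage added in this commit gives the default for `-n` as floor(log(nr_cores)*2)+1 and describes `-m` as a hexadecimal CPU core bitmask. The Python sketch below illustrates both calculations. It assumes that `log` is the natural logarithm, as C's log(3) is, and that bit i of the mask selects CPU core i. The helper names `default_nr_connections` and `cores_from_mask` are hypothetical and are not part of mscp.

```python
import math


def default_nr_connections(nr_cores: int) -> int:
    """Default -n value per the manpage formula floor(log(nr_cores)*2)+1.

    Assumption: log is the natural logarithm (as with C's log(3)).
    """
    if nr_cores < 1:
        raise ValueError("nr_cores must be >= 1")
    return math.floor(math.log(nr_cores) * 2) + 1


def cores_from_mask(coremask: str) -> list[int]:
    """Decode a hexadecimal -m COREMASK into a list of CPU core indices.

    Assumption: bit i of the mask selects core i (least significant bit = core 0).
    """
    mask = int(coremask, 16)  # int() accepts both "0x1f" and "1f" with base 16
    return [i for i in range(mask.bit_length()) if (mask >> i) & 1]


if __name__ == "__main__":
    for cores in (1, 8, 16, 64):
        print(f"{cores} cores -> {default_nr_connections(cores)} connections")
    print("0x1f ->", cores_from_mask("0x1f"))
```

Under these assumptions, an 8-core host defaults to 5 connections. The `-m 0xffff` mask from the long-fat-network example selects cores 0 through 15.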