diff options
author | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-04-28 15:55:15 +0000 |
---|---|---|
committer | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-04-28 15:55:15 +0000 |
commit | 02ad08238d02c56e16fc99788c732ff5e77a1759 (patch) | |
tree | 61ad5e18868748f4705e487b65c847377bb7031c /src/parallel_design.pod | |
parent | Initial commit. (diff) | |
download | parallel-upstream/20221122+ds.tar.xz parallel-upstream/20221122+ds.zip |
Adding upstream version 20221122+ds.upstream/20221122+dsupstream
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'src/parallel_design.pod')
-rw-r--r-- | src/parallel_design.pod | 1477 |
1 files changed, 1477 insertions, 0 deletions
diff --git a/src/parallel_design.pod b/src/parallel_design.pod new file mode 100644 index 0000000..85aee12 --- /dev/null +++ b/src/parallel_design.pod @@ -0,0 +1,1477 @@ +#!/usr/bin/perl -w + +# SPDX-FileCopyrightText: 2021-2022 Ole Tange, http://ole.tange.dk and Free Software and Foundation, Inc. +# SPDX-License-Identifier: GFDL-1.3-or-later +# SPDX-License-Identifier: CC-BY-SA-4.0 + +=encoding utf8 + + +=head1 Design of GNU Parallel + +This document describes design decisions made in the development of +GNU B<parallel> and the reasoning behind them. It will give an +overview of why some of the code looks the way it does, and will help +new maintainers understand the code better. + + +=head2 One file program + +GNU B<parallel> is a Perl script in a single file. It is object +oriented, but contrary to normal Perl scripts each class is not in its +own file. This is due to user experience: The goal is that in a pinch +the user will be able to get GNU B<parallel> working simply by copying +a single file: No need to mess around with environment variables like +PERL5LIB. + + +=head2 Choice of programming language + +GNU B<parallel> is designed to be able to run on old systems. That +means that it cannot depend on a compiler being installed - and +especially not a compiler for a language that is younger than 20 years +old. + +The goal is that you can use GNU B<parallel> on any system, even if +you are not allowed to install additional software. + +Of all the systems I have experienced, I have yet to see a system that +had GCC installed that did not have Perl. The same goes for Rust, Go, +Haskell, and other younger languages. I have, however, seen systems +with Perl without any of the mentioned compilers. + +Most modern systems also have either Python2 or Python3 installed, but +you still cannot be certain which version, and since Python2 cannot +run under Python3, Python is not an option. + +Perl has the added benefit that implementing the {= perlexpr =} +replacement string was fairly easy. + +The primary drawback is that Perl is slow. So there is an overhead of +3-10 ms/job and 1 ms/MB output (and even more if you use B<--tag>). + + +=head2 Old Perl style + +GNU B<parallel> uses some old, deprecated constructs. This is due to a +goal of being able to run on old installations. Currently the target +is CentOS 3.9 and Perl 5.8.0. + + +=head2 Scalability up and down + +The smallest system GNU B<parallel> is tested on is a 32 MB ASUS +WL500gP. The largest is a 2 TB 128-core machine. It scales up to +around 100 machines - depending on the duration of each job. + + +=head2 Exponentially back off + +GNU B<parallel> busy waits. This is because the reason why a job is +not started may be due to load average (when using B<--load>), and +thus it will not make sense to just wait for a job to finish. Instead +the load average must be rechecked regularly. Load average is not the +only reason: B<--timeout> has a similar problem. + +To not burn up too much CPU GNU B<parallel> sleeps exponentially +longer and longer if nothing happens, maxing out at 1 second. + + +=head2 Shell compatibility + +It is a goal to have GNU B<parallel> work equally well in any +shell. However, in practice GNU B<parallel> is being developed in +B<bash> and thus testing in other shells is limited to reported bugs. + +When an incompatibility is found there is often not an easy fix: +Fixing the problem in B<csh> often breaks it in B<bash>. In these +cases the fix is often to use a small Perl script and call that. + + +=head2 env_parallel + +B<env_parallel> is a dummy shell script that will run if +B<env_parallel> is not an alias or a function and tell the user how to +activate the alias/function for the supported shells. + +The alias or function will copy the current environment and run the +command with GNU B<parallel> in the copy of the environment. + +The problem is that you cannot access all of the current environment +inside Perl. E.g. aliases, functions and unexported shell variables. + +The idea is therefore to take the environment and put it in +B<$PARALLEL_ENV> which GNU B<parallel> prepends to every command. + +The only way to have access to the environment is directly from the +shell, so the program must be written in a shell script that will be +sourced and there has to deal with the dialect of the relevant shell. + + +=head3 env_parallel.* + +These are the files that implements the alias or function +B<env_parallel> for a given shell. It could be argued that these +should be put in some obscure place under /usr/lib, but by putting +them in your path it becomes trivial to find the path to them and +B<source> them: + + source `which env_parallel.foo` + +The beauty is that they can be put anywhere in the path without the +user having to know the location. So if the user's path includes +/afs/bin/i386_fc5 or /usr/pkg/parallel/bin or +/usr/local/parallel/20161222/sunos5.6/bin the files can be put in the +dir that makes most sense for the sysadmin. + + +=head3 env_parallel.bash / env_parallel.sh / env_parallel.ash / +env_parallel.dash / env_parallel.zsh / env_parallel.ksh / +env_parallel.mksh + +B<env_parallel.(bash|sh|ash|dash|ksh|mksh|zsh)> defines the function +B<env_parallel>. It uses B<alias> and B<typeset> to dump the +configuration (with a few exceptions) into B<$PARALLEL_ENV> before +running GNU B<parallel>. + +After GNU B<parallel> is finished, B<$PARALLEL_ENV> is deleted. + + +=head3 env_parallel.csh + +B<env_parallel.csh> has two purposes: If B<env_parallel> is not an +alias: make it into an alias that sets B<$PARALLEL> with arguments +and calls B<env_parallel.csh>. + +If B<env_parallel> is an alias, then B<env_parallel.csh> uses +B<$PARALLEL> as the arguments for GNU B<parallel>. + +It exports the environment by writing a variable definition to a file +for each variable. The definitions of aliases are appended to this +file. Finally the file is put into B<$PARALLEL_ENV>. + +GNU B<parallel> is then run and B<$PARALLEL_ENV> is deleted. + + +=head3 env_parallel.fish + +First all functions definitions are generated using a loop and +B<functions>. + +Dumping the scalar variable definitions is harder. + +B<fish> can represent non-printable characters in (at least) 2 +ways. To avoid problems all scalars are converted to \XX quoting. + +Then commands to generate the definitions are made and separated by +NUL. + +This is then piped into a Perl script that quotes all values. List +elements will be appended using two spaces. + +Finally \n is converted into \1 because B<fish> variables cannot +contain \n. GNU B<parallel> will later convert all \1 from +B<$PARALLEL_ENV> into \n. + +This is then all saved in B<$PARALLEL_ENV>. + +GNU B<parallel> is called, and B<$PARALLEL_ENV> is deleted. + + +=head2 parset (supported in sh, ash, dash, bash, zsh, ksh, mksh) + +B<parset> is a shell function. This is the reason why B<parset> can +set variables: It runs in the shell which is calling it. + +It is also the reason why B<parset> does not work, when data is piped +into it: B<... | parset ...> makes B<parset> start in a subshell, and +any changes in environment can therefore not make it back to the +calling shell. + + +=head2 Job slots + +The easiest way to explain what GNU B<parallel> does is to assume that +there are a number of job slots, and when a slot becomes available a +job from the queue will be run in that slot. But originally GNU +B<parallel> did not model job slots in the code. Job slots have been +added to make it possible to use B<{%}> as a replacement string. + +While the job sequence number can be computed in advance, the job slot +can only be computed the moment a slot becomes available. So it has +been implemented as a stack with lazy evaluation: Draw one from an +empty stack and the stack is extended by one. When a job is done, push +the available job slot back on the stack. + +This implementation also means that if you re-run the same jobs, you +cannot assume jobs will get the same slots. And if you use remote +executions, you cannot assume that a given job slot will remain on the +same remote server. This goes double since number of job slots can be +adjusted on the fly (by giving B<--jobs> a file name). + + +=head2 Rsync protocol version + +B<rsync> 3.1.x uses protocol 31 which is unsupported by version +2.5.7. That means that you cannot push a file to a remote system using +B<rsync> protocol 31, if the remote system uses 2.5.7. B<rsync> does +not automatically downgrade to protocol 30. + +GNU B<parallel> does not require protocol 31, so if the B<rsync> +version is >= 3.1.0 then B<--protocol 30> is added to force newer +B<rsync>s to talk to version 2.5.7. + + +=head2 Compression + +GNU B<parallel> buffers output in temporary files. B<--compress> +compresses the buffered data. This is a bit tricky because there +should be no files to clean up if GNU B<parallel> is killed by a power +outage. + +GNU B<parallel> first selects a compression program. If the user has +not selected one, the first of these that is in $PATH is used: B<pzstd +lbzip2 pbzip2 zstd pixz lz4 pigz lzop plzip lzip gzip lrz pxz bzip2 +lzma xz clzip>. They are sorted by speed on a 128 core machine. + +Schematically the setup is as follows: + + command started by parallel | compress > tmpfile + cattail tmpfile | uncompress | parallel which reads the output + +The setup is duplicated for both standard output (stdout) and standard +error (stderr). + +GNU B<parallel> pipes output from the command run into the compression +program which saves to a tmpfile. GNU B<parallel> records the pid of +the compress program. At the same time a small Perl script (called +B<cattail> above) is started: It basically does B<cat> followed by +B<tail -f>, but it also removes the tmpfile as soon as the first byte +is read, and it continuously checks if the pid of the compression +program is dead. If the compress program is dead, B<cattail> reads the +rest of tmpfile and exits. + +As most compression programs write out a header when they start, the +tmpfile in practice is removed by B<cattail> after around 40 ms. + +More detailed it works like this: + + bash ( command ) | + sh ( emptywrapper ( bash ( compound compress ) ) >tmpfile ) + cattail ( rm tmpfile; compound decompress ) < tmpfile + +This complex setup is to make sure compress program is only started if +there is input. This means each job will cause 8 processes to run. If +combined with B<--keep-order> these processes will run until the job +has been printed. + + +=head2 Wrapping + +The command given by the user can be wrapped in multiple +templates. Templates can be wrapped in other templates. + + + +=over 15 + +=item B<$COMMAND> + +the command to run. + + +=item B<$INPUT> + +the input to run. + + +=item B<$SHELL> + +the shell that started GNU Parallel. + + +=item B<$SSHLOGIN> + +the sshlogin. + + +=item B<$WORKDIR> + +the working dir. + + +=item B<$FILE> + +the file to read parts from. + + +=item B<$STARTPOS> + +the first byte position to read from B<$FILE>. + + +=item B<$LENGTH> + +the number of bytes to read from B<$FILE>. + + +=item --shellquote + +echo I<Double quoted $INPUT> + + +=item --nice I<pri> + +Remote: See B<The remote system wrapper>. + +Local: B<setpriority(0,0,$nice)> + +=item --cat + + cat > {}; $COMMAND {}; + perl -e '$bash = shift; + $csh = shift; + for(@ARGV) { unlink;rmdir; } + if($bash =~ s/h//) { exit $bash; } + exit $csh;' "$?h" "$status" {}; + +{} is set to B<$PARALLEL_TMP> which is a tmpfile. The Perl script +saves the exit value, unlinks the tmpfile, and returns the exit value +- no matter if the shell is B<bash>/B<ksh>/B<zsh> (using $?) or +B<*csh>/B<fish> (using $status). + +=item --fifo + + perl -e '($s,$c,$f) = @ARGV; + # mkfifo $PARALLEL_TMP + system "mkfifo", $f; + # spawn $shell -c $command & + $pid = fork || exec $s, "-c", $c; + open($o,">",$f) || die $!; + # cat > $PARALLEL_TMP + while(sysread(STDIN,$buf,131072)){ + syswrite $o, $buf; + } + close $o; + # waitpid to get the exit code from $command + waitpid $pid,0; + # Cleanup + unlink $f; + exit $?/256;' $SHELL -c $COMMAND $PARALLEL_TMP + +This is an elaborate way of: mkfifo {}; run B<$COMMAND> in the +background using B<$SHELL>; copying STDIN to {}; waiting for background +to complete; remove {} and exit with the exit code from B<$COMMAND>. + +It is made this way to be compatible with B<*csh>/B<fish>. + +=item --pipepart + + + < $FILE perl -e 'while(@ARGV) { + sysseek(STDIN,shift,0) || die; + $left = shift; + while($read = + sysread(STDIN,$buf, + ($left > 131072 ? 131072 : $left))){ + $left -= $read; + syswrite(STDOUT,$buf); + } + }' $STARTPOS $LENGTH + +This will read B<$LENGTH> bytes from B<$FILE> starting at B<$STARTPOS> +and send it to STDOUT. + +=item --sshlogin $SSHLOGIN + + ssh $SSHLOGIN "$COMMAND" + +=item --transfer + + ssh $SSHLOGIN mkdir -p ./$WORKDIR; + rsync --protocol 30 -rlDzR \ + -essh ./{} $SSHLOGIN:./$WORKDIR; + ssh $SSHLOGIN "$COMMAND" + +Read about B<--protocol 30> in the section B<Rsync protocol version>. + +=item --transferfile I<file> + +<<todo>> + +=item --basefile + +<<todo>> + +=item --return I<file> + + $COMMAND; _EXIT_status=$?; mkdir -p $WORKDIR; + rsync --protocol 30 \ + --rsync-path=cd\ ./$WORKDIR\;\ rsync \ + -rlDzR -essh $SSHLOGIN:./$FILE ./$WORKDIR; + exit $_EXIT_status; + +The B<--rsync-path=cd ...> is needed because old versions of B<rsync> +do not support B<--no-implied-dirs>. + +The B<$_EXIT_status> trick is to postpone the exit value. This makes it +incompatible with B<*csh> and should be fixed in the future. Maybe a +wrapping 'sh -c' is enough? + +=item --cleanup + +$RETURN is the wrapper from B<--return> + + $COMMAND; _EXIT_status=$?; $RETURN; + ssh $SSHLOGIN \(rm\ -f\ ./$WORKDIR/{}\;\ + rmdir\ ./$WORKDIR\ \>\&/dev/null\;\); + exit $_EXIT_status; + +B<$_EXIT_status>: see B<--return> above. + + +=item --pipe + + perl -e 'if(sysread(STDIN, $buf, 1)) { + open($fh, "|-", "@ARGV") || die; + syswrite($fh, $buf); + # Align up to 128k block + if($read = sysread(STDIN, $buf, 131071)) { + syswrite($fh, $buf); + } + while($read = sysread(STDIN, $buf, 131072)) { + syswrite($fh, $buf); + } + close $fh; + exit ($?&127 ? 128+($?&127) : 1+$?>>8) + }' $SHELL -c $COMMAND + +This small wrapper makes sure that B<$COMMAND> will never be run if +there is no data. + +=item --tmux + +<<TODO Fixup with '-quoting>> +mkfifo /tmp/tmx3cMEV && + sh -c 'tmux -S /tmp/tmsaKpv1 new-session -s p334310 -d "sleep .2" >/dev/null 2>&1'; +tmux -S /tmp/tmsaKpv1 new-window -t p334310 -n wc\ 10 \(wc\ 10\)\;\ perl\ -e\ \'while\(\$t++\<3\)\{\ print\ \$ARGV\[0\],\"\\n\"\ \}\'\ \$\?h/\$status\ \>\>\ /tmp/tmx3cMEV\&echo\ wc\\\ 10\;\ echo\ \Job\ finished\ at:\ \`date\`\;sleep\ 10; +exec perl -e '$/="/";$_=<>;$c=<>;unlink $ARGV; /(\d+)h/ and exit($1);exit$c' /tmp/tmx3cMEV + + +mkfifo I<tmpfile.tmx>; +tmux -S <tmpfile.tms> new-session -s pI<PID> -d 'sleep .2' >&/dev/null; +tmux -S <tmpfile.tms> new-window -t pI<PID> -n <<shell quoted input>> \(<<shell quoted input>>\)\;\ perl\ -e\ \'while\(\$t++\<3\)\{\ print\ \$ARGV\[0\],\"\\n\"\ \}\'\ \$\?h/\$status\ \>\>\ I<tmpfile.tmx>\&echo\ <<shell double quoted input>>\;echo\ \Job\ finished\ at:\ \`date\`\;sleep\ 10; +exec perl -e '$/="/";$_=<>;$c=<>;unlink $ARGV; /(\d+)h/ and exit($1);exit$c' I<tmpfile.tmx> + +First a FIFO is made (.tmx). It is used for communicating exit +value. Next a new tmux session is made. This may fail if there is +already a session, so the output is ignored. If all job slots finish +at the same time, then B<tmux> will close the session. A temporary +socket is made (.tms) to avoid a race condition in B<tmux>. It is +cleaned up when GNU B<parallel> finishes. + +The input is used as the name of the windows in B<tmux>. When the job +inside B<tmux> finishes, the exit value is printed to the FIFO (.tmx). +This FIFO is opened by B<perl> outside B<tmux>, and B<perl> then +removes the FIFO. B<Perl> blocks until the first value is read from +the FIFO, and this value is used as exit value. + +To make it compatible with B<csh> and B<bash> the exit value is +printed as: $?h/$status and this is parsed by B<perl>. + +There is a bug that makes it necessary to print the exit value 3 +times. + +Another bug in B<tmux> requires the length of the tmux title and +command to not have certain limits. When inside these limits, 75 '\ ' +are added to the title to force it to be outside the limits. + +You can map the bad limits using: + + perl -e 'sub r { int(rand(shift)).($_[0] && "\t".r(@_)) } print map { r(@ARGV)."\n" } 1..10000' 1600 1500 90 | + perl -ane '$F[0]+$F[1]+$F[2] < 2037 and print ' | + parallel --colsep '\t' --tagstring '{1}\t{2}\t{3}' tmux -S /tmp/p{%}-'{=3 $_="O"x$_ =}' \ + new-session -d -n '{=1 $_="O"x$_ =}' true'\ {=2 $_="O"x$_ =};echo $?;rm -f /tmp/p{%}-O*' + + perl -e 'sub r { int(rand(shift)).($_[0] && "\t".r(@_)) } print map { r(@ARGV)."\n" } 1..10000' 17000 17000 90 | + parallel --colsep '\t' --tagstring '{1}\t{2}\t{3}' \ + tmux -S /tmp/p{%}-'{=3 $_="O"x$_ =}' new-session -d -n '{=1 $_="O"x$_ =}' true'\ {=2 $_="O"x$_ =};echo $?;rm /tmp/p{%}-O*' + > value.csv 2>/dev/null + + R -e 'a<-read.table("value.csv");X11();plot(a[,1],a[,2],col=a[,4]+5,cex=0.1);Sys.sleep(1000)' + +For B<tmux 1.8> 17000 can be lowered to 2100. + +The interesting areas are title 0..1000 with (title + whole command) +in 996..1127 and 9331..9636. + +=back + +The ordering of the wrapping is important: + +=over 5 + +=item * + +$PARALLEL_ENV which is set in env_parallel.* must be prepended to the +command first, as the command may contain exported variables or +functions. + +=item * + +B<--nice>/B<--cat>/B<--fifo> should be done on the remote machine + +=item * + +B<--pipepart>/B<--pipe> should be done on the local machine inside B<--tmux> + +=back + + +=head2 Convenience options --nice --basefile --transfer --return +--cleanup --tmux --group --compress --cat --fifo --workdir --tag +--tagstring + +These are all convenience options that make it easier to do a +task. But more importantly: They are tested to work on corner cases, +too. Take B<--nice> as an example: + + nice parallel command ... + +will work just fine. But when run remotely, you need to move the nice +command so it is being run on the server: + + parallel -S server nice command ... + +And this will again work just fine, as long as you are running a +single command. When you are running a composed command you need nice +to apply to the whole command, and it gets harder still: + + parallel -S server -q nice bash -c 'command1 ...; cmd2 | cmd3' + +It is not impossible, but by using B<--nice> GNU B<parallel> will do +the right thing for you. Similarly when transferring files: It starts +to get hard when the file names contain space, :, `, *, or other +special characters. + +To run the commands in a B<tmux> session you basically just need to +quote the command. For simple commands that is easy, but when commands +contain special characters, it gets much harder to get right. + +B<--compress> not only compresses standard output (stdout) but also +standard error (stderr); and it does so into files, that are open but +deleted, so a crash will not leave these files around. + +B<--cat> and B<--fifo> are easy to do by hand, until you want to clean +up the tmpfile and keep the exit code of the command. + +The real killer comes when you try to combine several of these: Doing +that correctly for all corner cases is next to impossible to do by +hand. + +=head2 --shard + +The simple way to implement sharding would be to: + +=over 5 + +=item 1 + +start n jobs, + +=item 2 + +split each line into columns, + +=item 3 + +select the data from the relevant column + +=item 4 + +compute a hash value from the data + +=item 5 + +take the modulo n of the hash value + +=item 6 + +pass the full line to the jobslot that has the computed value + +=back + +Unfortunately Perl is rather slow at computing the hash value (and +somewhat slow at splitting into columns). + +One solution is to use a compiled language for the splitting and +hashing, but that would go against the design criteria of not +depending on a compiler. + +Luckily those tasks can be parallelized. So GNU B<parallel> starts n +sharders that do step 2-6, and passes blocks of 100k to each of those +in a round robin manner. To make sure these sharders compute the hash +the same way, $PERL_HASH_SEED is set to the same value for all sharders. + +Running n sharders poses a new problem: Instead of having n outputs +(one for each computed value) you now have n outputs for each of the n +values, so in total n*n outputs; and you need to merge these n*n +outputs together into n outputs. + +This can be done by simply running 'parallel -j0 --lb cat ::: +outputs_for_one_value', but that is rather inefficient, as it spawns a +process for each file. Instead the core code from 'parcat' is run, +which is also a bit faster. + +All the sharders and parcats communicate through named pipes that are +unlinked as soon as they are opened. + + +=head2 Shell shock + +The shell shock bug in B<bash> did not affect GNU B<parallel>, but the +solutions did. B<bash> first introduced functions in variables named: +I<BASH_FUNC_myfunc()> and later changed that to +I<BASH_FUNC_myfunc%%>. When transferring functions GNU B<parallel> +reads off the function and changes that into a function definition, +which is copied to the remote system and executed before the actual +command is executed. Therefore GNU B<parallel> needs to know how to +read the function. + +From version 20150122 GNU B<parallel> tries both the ()-version and +the %%-version, and the function definition works on both pre- and +post-shell shock versions of B<bash>. + + +=head2 The remote system wrapper + +The remote system wrapper does some initialization before starting the +command on the remote system. + +=head3 Make quoting unnecessary by hex encoding everything + +When you run B<ssh server foo> then B<foo> has to be quoted once: + + ssh server "echo foo; echo bar" + +If you run B<ssh server1 ssh server2 foo> then B<foo> has to be quoted +twice: + + ssh server1 ssh server2 \'"echo foo; echo bar"\' + +GNU B<parallel> avoids this by packing everyting into hex values and +running a command that does not need quoting: + + perl -X -e GNU_Parallel_worker,eval+pack+q/H10000000/,join+q//,@ARGV + +This command reads hex from the command line and converts that to +bytes that are then eval'ed as a Perl expression. + +The string B<GNU_Parallel_worker> is not needed. It is simply there to +let the user know, that this process is GNU B<parallel> working. + +=head3 Ctrl-C and standard error (stderr) + +If the user presses Ctrl-C the user expects jobs to stop. This works +out of the box if the jobs are run locally. Unfortunately it is not so +simple if the jobs are run remotely. + +If remote jobs are run in a tty using B<ssh -tt>, then Ctrl-C works, +but all output to standard error (stderr) is sent to standard output +(stdout). This is not what the user expects. + +If remote jobs are run without a tty using B<ssh> (without B<-tt>), +then output to standard error (stderr) is kept on stderr, but Ctrl-C +does not kill remote jobs. This is not what the user expects. + +So what is needed is a way to have both. It seems the reason why +Ctrl-C does not kill the remote jobs is because the shell does not +propagate the hang-up signal from B<sshd>. But when B<sshd> dies, the +parent of the login shell becomes B<init> (process id 1). So by +exec'ing a Perl wrapper to monitor the parent pid and kill the child +if the parent pid becomes 1, then Ctrl-C works and stderr is kept on +stderr. + +Ctrl-C does, however, kill the ssh connection, so any output from +a remote dying process is lost. + +To be able to kill all (grand)*children a new process group is +started. + + +=head3 --nice + +B<nice>ing the remote process is done by B<setpriority(0,0,$nice)>. A +few old systems do not implement this and B<--nice> is unsupported on +those. + + +=head3 Setting $PARALLEL_TMP + +B<$PARALLEL_TMP> is used by B<--fifo> and B<--cat> and must point to a +non-exitent file in B<$TMPDIR>. This file name is computed on the +remote system. + + +=head3 The wrapper + +The wrapper looks like this: + + $shell = $PARALLEL_SHELL || $SHELL; + $tmpdir = $TMPDIR || $PARALLEL_REMOTE_TMPDIR; + $nice = $opt::nice; + $termseq = $opt::termseq; + + # Check that $tmpdir is writable + -w $tmpdir || + die("$tmpdir is not writable.". + " Set PARALLEL_REMOTE_TMPDIR"); + # Set $PARALLEL_TMP to a non-existent file name in $TMPDIR + do { + $ENV{PARALLEL_TMP} = $tmpdir."/par". + join"", map { (0..9,"a".."z","A".."Z")[rand(62)] } (1..5); + } while(-e $ENV{PARALLEL_TMP}); + # Set $script to a non-existent file name in $TMPDIR + do { + $script = $tmpdir."/par". + join"", map { (0..9,"a".."z","A".."Z")[rand(62)] } (1..5); + } while(-e $script); + # Create a script from the hex code + # that removes itself and runs the commands + open($fh,">",$script) || die; + # ' needed due to rc-shell + print($fh("rm \'$script\'\n",$bashfunc.$cmd)); + close $fh; + my $parent = getppid; + my $done = 0; + $SIG{CHLD} = sub { $done = 1; }; + $pid = fork; + unless($pid) { + # Make own process group to be able to kill HUP it later + eval { setpgrp }; + # Set nice value + eval { setpriority(0,0,$nice) }; + # Run the script + exec($shell,$script); + die("exec failed: $!"); + } + while((not $done) and (getppid == $parent)) { + # Parent pid is not changed, so sshd is alive + # Exponential sleep up to 1 sec + $s = $s < 1 ? 0.001 + $s * 1.03 : $s; + select(undef, undef, undef, $s); + } + if(not $done) { + # sshd is dead: User pressed Ctrl-C + # Kill as per --termseq + my @term_seq = split/,/,$termseq; + if(not @term_seq) { + @term_seq = ("TERM",200,"TERM",100,"TERM",50,"KILL",25); + } + while(@term_seq && kill(0,-$pid)) { + kill(shift @term_seq, -$pid); + select(undef, undef, undef, (shift @term_seq)/1000); + } + } + wait; + exit ($?&127 ? 128+($?&127) : 1+$?>>8) + + +=head2 Transferring of variables and functions + +Transferring of variables and functions given by B<--env> is done by +running a Perl script remotely that calls the actual command. The Perl +script sets B<$ENV{>I<variable>B<}> to the correct value before +exec'ing a shell that runs the function definition followed by the +actual command. + +The function B<env_parallel> copies the full current environment into +the environment variable B<PARALLEL_ENV>. This variable is picked up +by GNU B<parallel> and used to create the Perl script mentioned above. + + +=head2 Base64 encoded bzip2 + +B<csh> limits words of commands to 1024 chars. This is often too little +when GNU B<parallel> encodes environment variables and wraps the +command with different templates. All of these are combined and quoted +into one single word, which often is longer than 1024 chars. + +When the line to run is > 1000 chars, GNU B<parallel> therefore +encodes the line to run. The encoding B<bzip2>s the line to run, +converts this to base64, splits the base64 into 1000 char blocks (so +B<csh> does not fail), and prepends it with this Perl script that +decodes, decompresses and B<eval>s the line. + + @GNU_Parallel=("use","IPC::Open3;","use","MIME::Base64"); + eval "@GNU_Parallel"; + + $SIG{CHLD}="IGNORE"; + # Search for bzip2. Not found => use default path + my $zip = (grep { -x $_ } "/usr/local/bin/bzip2")[0] || "bzip2"; + # $in = stdin on $zip, $out = stdout from $zip + my($in, $out,$eval); + open3($in,$out,">&STDERR",$zip,"-dc"); + if(my $perlpid = fork) { + close $in; + $eval = join "", <$out>; + close $out; + } else { + close $out; + # Pipe decoded base64 into 'bzip2 -dc' + print $in (decode_base64(join"",@ARGV)); + close $in; + exit; + } + wait; + eval $eval; + +Perl and B<bzip2> must be installed on the remote system, but a small +test showed that B<bzip2> is installed by default on all platforms +that runs GNU B<parallel>, so this is not a big problem. + +The added bonus of this is that much bigger environments can now be +transferred as they will be below B<bash>'s limit of 131072 chars. + + +=head2 Which shell to use + +Different shells behave differently. A command that works in B<tcsh> +may not work in B<bash>. It is therefore important that the correct +shell is used when GNU B<parallel> executes commands. + +GNU B<parallel> tries hard to use the right shell. If GNU B<parallel> +is called from B<tcsh> it will use B<tcsh>. If it is called from +B<bash> it will use B<bash>. It does this by looking at the +(grand)*parent process: If the (grand)*parent process is a shell, use +this shell; otherwise look at the parent of this (grand)*parent. If +none of the (grand)*parents are shells, then $SHELL is used. + +This will do the right thing if called from: + +=over 2 + +=item * + +an interactive shell + +=item * + +a shell script + +=item * + +a Perl script in `` or using B<system> if called as a single string. + +=back + +While these cover most cases, there are situations where it will fail: + +=over 2 + +=item * + +When run using B<exec>. + +=item * + +When run as the last command using B<-c> from another shell (because +some shells use B<exec>): + + zsh% bash -c "parallel 'echo {} is not run in bash; \ + set | grep BASH_VERSION' ::: This" + +You can work around that by appending '&& true': + + zsh% bash -c "parallel 'echo {} is run in bash; \ + set | grep BASH_VERSION' ::: This && true" + +=item * + +When run in a Perl script using B<system> with parallel as the first +string: + + #!/usr/bin/perl + + system("parallel",'setenv a {}; echo $a',":::",2); + +Here it depends on which shell is used to call the Perl script. If the +Perl script is called from B<tcsh> it will work just fine, but if it +is called from B<bash> it will fail, because the command B<setenv> is +not known to B<bash>. + +=back + +If GNU B<parallel> guesses wrong in these situation, set the shell using +B<$PARALLEL_SHELL>. + + +=head2 Always running commands in a shell + +If the command is a simple command with no redirection and setting of +variables, the command I<could> be run without spawning a +shell. E.g. this simple B<grep> matching either 'ls ' or ' wc E<gt>E<gt> c': + + parallel "grep -E 'ls | wc >> c' {}" ::: foo + +could be run as: + + system("grep","-E","ls | wc >> c","foo"); + +However, as soon as the command is a bit more complex a shell I<must> +be spawned: + + parallel "grep -E 'ls | wc >> c' {} | wc >> c" ::: foo + parallel "LANG=C grep -E 'ls | wc >> c' {}" ::: foo + +It is impossible to tell how B<| wc E<gt>E<gt> c> should be +interpreted without parsing the string (is the B<|> a pipe in shell or +an alternation in a B<grep> regexp? Is B<LANG=C> a command in B<csh> +or setting a variable in B<bash>? Is B<E<gt>E<gt>> redirection or part +of a regexp?). + +On top of this, wrapper scripts will often require a shell to be +spawned. + +The downside is that you need to quote special shell chars twice: + + parallel echo '*' ::: This will expand the asterisk + parallel echo "'*'" ::: This will not + parallel "echo '*'" ::: This will not + parallel echo '\*' ::: This will not + parallel echo \''*'\' ::: This will not + parallel -q echo '*' ::: This will not + +B<-q> will quote all special chars, thus redirection will not work: +this prints '* > out.1' and I<does not> save '*' into the file out.1: + + parallel -q echo "*" ">" out.{} ::: 1 + +GNU B<parallel> tries to live up to Principle Of Least Astonishment +(POLA), and the requirement of using B<-q> is hard to understand, when +you do not see the whole picture. + + +=head2 Quoting + +Quoting depends on the shell. For most shells '-quoting is used for +strings containing special characters. + +For B<tcsh>/B<csh> newline is quoted as \ followed by newline. Other +special characters are also \-quoted. + +For B<rc> everything is quoted using '. + + +=head2 --pipepart vs. --pipe + +While B<--pipe> and B<--pipepart> look much the same to the user, they are +implemented very differently. + +With B<--pipe> GNU B<parallel> reads the blocks from standard input +(stdin), which is then given to the command on standard input (stdin); +so every block is being processed by GNU B<parallel> itself. This is +the reason why B<--pipe> maxes out at around 500 MB/sec. + +B<--pipepart>, on the other hand, first identifies at which byte +positions blocks start and how long they are. It does that by seeking +into the file by the size of a block and then reading until it meets +end of a block. The seeking explains why GNU B<parallel> does not know +the line number and why B<-L/-l> and B<-N> do not work. + +With a reasonable block and file size this seeking is more than 1000 +time faster than reading the full file. The byte positions are then +given to a small script that reads from position X to Y and sends +output to standard output (stdout). This small script is prepended to +the command and the full command is executed just as if GNU +B<parallel> had been in its normal mode. The script looks like this: + + < file perl -e 'while(@ARGV) { + sysseek(STDIN,shift,0) || die; + $left = shift; + while($read = sysread(STDIN,$buf, + ($left > 131072 ? 131072 : $left))){ + $left -= $read; syswrite(STDOUT,$buf); + } + }' startbyte length_in_bytes + +It delivers 1 GB/s per core. + +Instead of the script B<dd> was tried, but many versions of B<dd> do +not support reading from one byte to another and might cause partial +data. See this for a surprising example: + + yes | dd bs=1024k count=10 | wc + + +=head2 --block-size adjustment + +Every time GNU B<parallel> detects a record bigger than +B<--block-size> it increases the block size by 30%. A small +B<--block-size> gives very poor performance; by exponentially +increasing the block size performance will not suffer. + +GNU B<parallel> will waste CPU power if B<--block-size> does not +contain a full record, because it tries to find a full record and will +fail to do so. The recommendation is therefore to use a +B<--block-size> > 2 records, so you always get at least one full +record when you read one block. + +If you use B<-N> then B<--block-size> should be big enough to contain +N+1 records. + + +=head2 Automatic --block-size computation + +With B<--pipepart> GNU B<parallel> can compute the B<--block-size> +automatically. A B<--block-size> of B<-1> will use a block size so +that each jobslot will receive approximately 1 block. B<--block -2> +will pass 2 blocks to each jobslot and B<-I<n>> will pass I<n> blocks +to each jobslot. + +This can be done because B<--pipepart> reads from files, and we can +compute the total size of the input. + + +=head2 --jobs and --onall + +When running the same commands on many servers what should B<--jobs> +signify? Is it the number of servers to run on in parallel? Is it the +number of jobs run in parallel on each server? + +GNU B<parallel> lets B<--jobs> represent the number of servers to run +on in parallel. This is to make it possible to run a sequence of +commands (that cannot be parallelized) on each server, but run the +same sequence on multiple servers. + + +=head2 --shuf + +When using B<--shuf> to shuffle the jobs, all jobs are read, then they +are shuffled, and finally executed. When using SQL this makes the +B<--sqlmaster> be the part that shuffles the jobs. The B<--sqlworker>s +simply executes according to Seq number. + + +=head2 --csv + +B<--pipepart> is incompatible with B<--csv> because you can have +records like: + + a,b,c + a," + a,b,c + a,b,c + a,b,c + ",c + a,b,c + +Here the second record contains a multi-line field that looks like +records. Since B<--pipepart> does not read then whole file when +searching for record endings, it may start reading in this multi-line +field, which would be wrong. + + +=head2 Buffering on disk + +GNU B<parallel> buffers output, because if output is not buffered you +have to be ridiculously careful on sizes to avoid mixing of outputs +(see excellent example on https://catern.com/posts/pipes.html). + +GNU B<parallel> buffers on disk in $TMPDIR using files, that are +removed as soon as they are created, but which are kept open. So even +if GNU B<parallel> is killed by a power outage, there will be no files +to clean up afterwards. Another advantage is that the file system is +aware that these files will be lost in case of a crash, so it does +not need to sync them to disk. + +It gives the odd situation that a disk can be fully used, but there +are no visible files on it. + + +=head3 Partly buffering in memory + +When using output formats SQL and CSV then GNU Parallel has to read +the whole output into memory. When run normally it will only read the +output from a single job. But when using B<--linebuffer> every line +printed will also be buffered in memory - for all jobs currently +running. + +If memory is tight, then do not use the output format SQL/CSV with +B<--linebuffer>. + + +=head3 Comparing to buffering in memory + +B<gargs> is a parallelizing tool that buffers in memory. It is +therefore a useful way of comparing the advantages and disadvantages +of buffering in memory to buffering on disk. + +On an system with 6 GB RAM free and 6 GB free swap these were tested +with different sizes: + + echo /dev/zero | gargs "head -c $size {}" >/dev/null + echo /dev/zero | parallel "head -c $size {}" >/dev/null + +The results are here: + + JobRuntime Command + 0.344 parallel_test 1M + 0.362 parallel_test 10M + 0.640 parallel_test 100M + 9.818 parallel_test 1000M + 23.888 parallel_test 2000M + 30.217 parallel_test 2500M + 30.963 parallel_test 2750M + 34.648 parallel_test 3000M + 43.302 parallel_test 4000M + 55.167 parallel_test 5000M + 67.493 parallel_test 6000M + 178.654 parallel_test 7000M + 204.138 parallel_test 8000M + 230.052 parallel_test 9000M + 255.639 parallel_test 10000M + 757.981 parallel_test 30000M + 0.537 gargs_test 1M + 0.292 gargs_test 10M + 0.398 gargs_test 100M + 3.456 gargs_test 1000M + 8.577 gargs_test 2000M + 22.705 gargs_test 2500M + 123.076 gargs_test 2750M + 89.866 gargs_test 3000M + 291.798 gargs_test 4000M + +GNU B<parallel> is pretty much limited by the speed of the disk: Up to +6 GB data is written to disk but cached, so reading is fast. Above 6 +GB data are both written and read from disk. When the 30000MB job is +running, the disk system is slow, but usable: If you are not using the +disk, you almost do not feel it. + +B<gargs> has a speed advantage up until 2500M where it hits a +wall. Then the system starts swapping like crazy and is completely +unusable. At 5000M it goes out of memory. + +You can make GNU B<parallel> behave similar to B<gargs> if you point +$TMPDIR to a tmpfs-filesystem: It will be faster for small outputs, +but may kill your system for larger outputs and cause you to lose +output. + + +=head2 Disk full + +GNU B<parallel> buffers on disk. If the disk is full, data may be +lost. To check if the disk is full GNU B<parallel> writes a 8193 byte +file every second. If this file is written successfully, it is removed +immediately. If it is not written successfully, the disk is full. The +size 8193 was chosen because 8192 gave wrong result on some file +systems, whereas 8193 did the correct thing on all tested filesystems. + + +=head2 Memory usage + +Normally GNU B<parallel> will use around 17 MB RAM constantly - no +matter how many jobs or how much output there is. There are a few +things that cause the memory usage to rise: + +=over 3 + +=item * + +Multiple input sources. GNU B<parallel> reads an input source only +once. This is by design, as an input source can be a stream +(e.g. FIFO, pipe, standard input (stdin)) which cannot be rewound and +read again. When reading a single input source, the memory is freed as +soon as the job is done - thus keeping the memory usage constant. + +But when reading multiple input sources GNU B<parallel> keeps the +already read values for generating all combinations with other input +sources. + +=item * + +Computing the number of jobs. B<--bar>, B<--eta>, and B<--halt xx%> +use B<total_jobs()> to compute the total number of jobs. It does this +by generating the data structures for all jobs. All these job data +structures will be stored in memory and take up around 400 bytes/job. + +=item * + +Buffering a full line. B<--linebuffer> will read a full line per +running job. A very long output line (say 1 GB without \n) will +increase RAM usage temporarily: From when the beginning of the line is +read till the line is printed. + +=item * + +Buffering the full output of a single job. This happens when using +B<--results *.csv/*.tsv> or B<--sql*>. Here GNU B<parallel> will read +the whole output of a single job and save it as csv/tsv or SQL. + +=back + + +=head2 Argument separators ::: :::: :::+ ::::+ + +The argument separator B<:::> was chosen because I have never seen +B<:::> used in any command. The natural choice B<--> would be a bad +idea since it is not unlikely that the template command will contain +B<-->. I have seen B<::> used in programming languanges to separate +classes, and I did not want the user to be confused that the separator +had anything to do with classes. + +B<:::> also makes a visual separation, which is good if there are +multiple B<:::>. + +When B<:::> was chosen, B<::::> came as a fairly natural extension. + +Linking input sources meant having to decide for some way to indicate +linking of B<:::> and B<::::>. B<:::+> and B<::::+> were chosen, so +that they were similar to B<:::> and B<::::>. + +In 2022 I realized that B<///> would have been an even better choice, +because you cannot have an file named B<///> whereas you I<can> have a +file named B<:::>. + + +=head2 Perl replacement strings, {= =}, and --rpl + +The shorthands for replacement strings make a command look more +cryptic. Different users will need different replacement +strings. Instead of inventing more shorthands you get more +flexible replacement strings if they can be programmed by the user. + +The language Perl was chosen because GNU B<parallel> is written in +Perl and it was easy and reasonably fast to run the code given by the +user. + +If a user needs the same programmed replacement string again and +again, the user may want to make his own shorthand for it. This is +what B<--rpl> is for. It works so well, that even GNU B<parallel>'s +own shorthands are implemented using B<--rpl>. + +In Perl code the bigrams B<{=> and B<=}> rarely exist. They look like a +matching pair and can be entered on all keyboards. This made them good +candidates for enclosing the Perl expression in the replacement +strings. Another candidate ,, and ,, was rejected because they do not +look like a matching pair. B<--parens> was made, so that the users can +still use ,, and ,, if they like: B<--parens ,,,,> + +Internally, however, the B<{=> and B<=}> are replaced by \257< and +\257>. This is to make it simpler to make regular expressions. You +only need to look one character ahead, and never have to look behind. + + +=head2 Test suite + +GNU B<parallel> uses its own testing framework. This is mostly due to +historical reasons. It deals reasonably well with tests that are +dependent on how long a given test runs (e.g. more than 10 secs is a +pass, but less is a fail). It parallelizes most tests, but it is easy +to force a test to run as the single test (which may be important for +timing issues). It deals reasonably well with tests that fail +intermittently. It detects which tests failed and pushes these to the +top, so when running the test suite again, the tests that failed most +recently are run first. + +If GNU B<parallel> should adopt a real testing framework then those +elements would be important. + +Since many tests are dependent on which hardware it is running on, +these tests break when run on a different hardware than what the test +was written for. + +When most bugs are fixed a test is added, so this bug will not +reappear. It is, however, sometimes hard to create the environment in +which the bug shows up - especially if the bug only shows up +sometimes. One of the harder problems was to make a machine start +swapping without forcing it to its knees. + + +=head2 Median run time + +Using a percentage for B<--timeout> causes GNU B<parallel> to compute +the median run time of a job. The median is a better indicator of the +expected run time than average, because there will often be outliers +taking way longer than the normal run time. + +To avoid keeping all run times in memory, an implementation of +remedian was made (Rousseeuw et al). + + +=head2 Error messages and warnings + +Error messages like: ERROR, Not found, and 42 are not very +helpful. GNU B<parallel> strives to inform the user: + +=over 2 + +=item * + +What went wrong? + +=item * + +Why did it go wrong? + +=item * + +What can be done about it? + +=back + +Unfortunately it is not always possible to predict the root cause of +the error. + + +=head2 Determine number of CPUs + +CPUs is an ambiguous term. It can mean the number of socket filled +(i.e. the number of physical chips). It can mean the number of cores +(i.e. the number of physical compute cores). It can mean the number of +hyperthreaded cores (i.e. the number of virtual cores - with some of +them possibly being hyperthreaded). + +On ark.intel.com Intel uses the terms I<cores> and I<threads> for +number of physical cores and the number of hyperthreaded cores +respectively. + +GNU B<parallel> uses uses I<CPUs> as the number of compute units and +the terms I<sockets>, I<cores>, and I<threads> to specify how the +number of compute units is calculated. + + +=head2 Computation of load + +Contrary to the obvious B<--load> does not use load average. This is +due to load average rising too slowly. Instead it uses B<ps> to list +the number of threads in running or blocked state (state D, O or +R). This gives an instant load. + +As remote calculation of load can be slow, a process is spawned to run +B<ps> and put the result in a file, which is then used next time. + + +=head2 Killing jobs + +GNU B<parallel> kills jobs. It can be due to B<--memfree>, B<--halt>, +or when GNU B<parallel> meets a condition from which it cannot +recover. Every job is started as its own process group. This way any +(grand)*children will get killed, too. The process group is killed +with the specification mentioned in B<--termseq>. + + +=head2 SQL interface + +GNU B<parallel> uses the DBURL from GNU B<sql> to give database +software, username, password, host, port, database, and table in a +single string. + +The DBURL must point to a table name. The table will be dropped and +created. The reason for not reusing an existing table is that the user +may have added more input sources which would require more columns in +the table. By prepending '+' to the DBURL the table will not be +dropped. + +The table columns are similar to joblog with the addition of B<V1> +.. B<Vn> which are values from the input sources, and Stdout and +Stderr which are the output from standard output and standard error, +respectively. + +The Signal column has been renamed to _Signal due to Signal being a +reserved word in MySQL. + + +=head2 Logo + +The logo is inspired by the Cafe Wall illusion. The font is DejaVu +Sans. + +=head2 Citation notice + +Funding a free software project is hard. GNU B<parallel> is no +exception. On top of that it seems the less visible a project is, the +harder it is to get funding. And the nature of GNU B<parallel> is that +it will never be seen by "the guy with the checkbook", but only by the +people doing the actual work. + +This problem has been covered by others - though no solution has been +found: https://www.slideshare.net/NadiaEghbal/consider-the-maintainer +https://www.numfocus.org/blog/why-is-numpy-only-now-getting-funded/ + +Before implementing the citation notice it was discussed with the +users: +https://lists.gnu.org/archive/html/parallel/2013-11/msg00006.html + +Having to spend 10 seconds on running B<parallel --citation> once is +no doubt not an ideal solution, but no one has so far come up with an +ideal solution - neither for funding GNU B<parallel> nor other free +software. + +If you believe you have the perfect solution, you should try it out, +and if it works, you should post it on the email list. Ideas that will +cost work and which have not been tested are, however, unlikely to be +prioritized. + +Running B<parallel --citation> one single time takes less than 10 +seconds, and will silence the citation notice for future runs. This is +comparable to graphical tools where you have to click a checkbox +saying "Do not show this again". But if that is too much trouble for +you, why not use one of the alternatives instead? See a list in: +B<man parallel_alternatives>. + +As the request for citation is not a legal requirement this is +acceptable under GPLv3 and cleared with Richard M. Stallman +himself. Thus it does not fall under this: +https://www.gnu.org/licenses/gpl-faq.en.html#RequireCitation + + +=head1 Ideas for new design + +=head2 Multiple processes working together + +Open3 is slow. Printing is slow. It would be good if they did not tie +up resources, but were run in separate threads. + + +=head2 --rrs on remote using a perl wrapper + +... | perl -pe '$/=$recend$recstart;BEGIN{ if(substr($_) eq $recstart) substr($_)="" } eof and substr($_) eq $recend) substr($_)="" + +It ought to be possible to write a filter that removed rec sep on the +fly instead of inside GNU B<parallel>. This could then use more cpus. + +Will that require 2x record size memory? + +Will that require 2x block size memory? + + +=head1 Historical decisions + +These decisions were relevant for earlier versions of GNU B<parallel>, +but not the current version. They are kept here as historical record. + + +=head2 --tollef + +You can read about the history of GNU B<parallel> on +https://www.gnu.org/software/parallel/history.html + +B<--tollef> was included to make GNU B<parallel> switch compatible +with the parallel from moreutils (which is made by Tollef Fog +Heen). This was done so that users of that parallel easily could port +their use to GNU B<parallel>: Simply set B<PARALLEL="--tollef"> and +that would be it. + +But several distributions chose to make B<--tollef> global (by putting +it into /etc/parallel/config) without making the users aware of this, +and that caused much confusion when people tried out the examples from +GNU B<parallel>'s man page and these did not work. The users became +frustrated because the distribution did not make it clear to them that +it has made B<--tollef> global. + +So to lessen the frustration and the resulting support, B<--tollef> +was obsoleted 20130222 and removed one year later. + + +=cut |