author     Daniel Baumann <daniel.baumann@progress-linux.org>   2024-04-19 09:43:15 +0000
committer  Daniel Baumann <daniel.baumann@progress-linux.org>   2024-04-19 09:43:15 +0000
commit     4fd40781fe3edc4ae5895e3cfddfbfea328ac0e8 (patch)
tree       efb19f2d9385f548990904757cdf44f3b9546fca /src/parallel_tutorial.pod
parent     Initial commit. (diff)
download   parallel-4fd40781fe3edc4ae5895e3cfddfbfea328ac0e8.tar.xz
           parallel-4fd40781fe3edc4ae5895e3cfddfbfea328ac0e8.zip
Adding upstream version 20240222+ds. (upstream/20240222+ds, upstream)
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'src/parallel_tutorial.pod')
-rw-r--r--   src/parallel_tutorial.pod | 3177
1 file changed, 3177 insertions, 0 deletions
diff --git a/src/parallel_tutorial.pod b/src/parallel_tutorial.pod new file mode 100644 index 0000000..90a3663 --- /dev/null +++ b/src/parallel_tutorial.pod @@ -0,0 +1,3177 @@ +#!/usr/bin/perl -w + +# SPDX-FileCopyrightText: 2021-2024 Ole Tange, http://ole.tange.dk and Free Software and Foundation, Inc. +# SPDX-License-Identifier: GFDL-1.3-or-later +# SPDX-License-Identifier: CC-BY-SA-4.0 + +=head1 GNU Parallel Tutorial + +This tutorial shows off much of GNU B<parallel>'s functionality. The +tutorial is meant to learn the options in and syntax of GNU +B<parallel>. The tutorial is B<not> to show realistic examples from the +real world. + +=head2 Reader's guide + +If you prefer reading a book buy B<GNU Parallel 2018> at +https://www.lulu.com/shop/ole-tange/gnu-parallel-2018/paperback/product-23558902.html +or download it at: https://doi.org/10.5281/zenodo.1146014 + +Otherwise start by watching the intro videos for a quick introduction: +https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1 + +Then browse through the examples (B<man parallel_examples>). That will give +you an idea of what GNU B<parallel> is capable of. + +If you want to dive even deeper: spend a couple of hours walking +through the tutorial (B<man parallel_tutorial>). Your command line +will love you for it. + +Finally you may want to look at the rest of the manual (B<man +parallel>) if you have special needs not already covered. + +If you want to know the design decisions behind GNU B<parallel>, try: +B<man parallel_design>. This is also a good intro if you intend to +change GNU B<parallel>. + + + +=head1 Prerequisites + +To run this tutorial you must have the following: + +=over 9 + +=item parallel >= version 20160822 + +Install the newest version using your package manager (recommended for +security reasons), the way described in README, or with this command: + + $ (wget -O - pi.dk/3 || lynx -source pi.dk/3 || curl pi.dk/3/ || \ + fetch -o - http://pi.dk/3 ) > install.sh + $ sha1sum install.sh + 12345678 51621b7f 1ee103c0 0783aae4 ef9889f8 + $ md5sum install.sh + 62eada78 703b5500 241b8e50 baf62758 + $ sha512sum install.sh + 160d3159 9480cf5c a101512f 150b7ac0 206a65dc 86f2bb6b bdf1a2bc 96bc6d06 + 7f8237c2 0964b67f bccf8a93 332528fa 11e5ab43 2a6226a6 ceb197ab 7f03c061 + $ bash install.sh + +This will also install the newest version of the tutorial which you +can see by running this: + + man parallel_tutorial + +Most of the tutorial will work on older versions, too. 
+ + +=item abc-file: + +The file can be generated by this command: + + parallel -k echo ::: A B C > abc-file + +=item def-file: + +The file can be generated by this command: + + parallel -k echo ::: D E F > def-file + +=item abc0-file: + +The file can be generated by this command: + + perl -e 'printf "A\0B\0C\0"' > abc0-file + +=item abc_-file: + +The file can be generated by this command: + + perl -e 'printf "A_B_C_"' > abc_-file + +=item tsv-file.tsv + +The file can be generated by this command: + + perl -e 'printf "f1\tf2\nA\tB\nC\tD\n"' > tsv-file.tsv + +=item num8 + +The file can be generated by this command: + + perl -e 'for(1..8){print "$_\n"}' > num8 + +=item num128 + +The file can be generated by this command: + + perl -e 'for(1..128){print "$_\n"}' > num128 + +=item num30000 + +The file can be generated by this command: + + perl -e 'for(1..30000){print "$_\n"}' > num30000 + +=item num1000000 + +The file can be generated by this command: + + perl -e 'for(1..1000000){print "$_\n"}' > num1000000 + +=item num_%header + +The file can be generated by this command: + + (echo %head1; echo %head2; \ + perl -e 'for(1..10){print "$_\n"}') > num_%header + +=item fixedlen + +The file can be generated by this command: + + perl -e 'print "HHHHAAABBBCCC"' > fixedlen + +=item For remote running: ssh login on 2 servers with no password in +$SERVER1 and $SERVER2 must work. + + SERVER1=server.example.com + SERVER2=server2.example.net + +So you must be able to do this without entering a password: + + ssh $SERVER1 echo works + ssh $SERVER2 echo works + +It can be setup by running B<ssh-keygen -t dsa; ssh-copy-id $SERVER1> +and using an empty passphrase, or you can use B<ssh-agent>. + +=back + + +=head1 Input sources + +GNU B<parallel> reads input from input sources. These can be files, the +command line, and stdin (standard input or a pipe). + +=head2 A single input source + +Input can be read from the command line: + + parallel echo ::: A B C + +Output (the order may be different because the jobs are run in +parallel): + + A + B + C + +The input source can be a file: + + parallel -a abc-file echo + +Output: Same as above. + +STDIN (standard input) can be the input source: + + cat abc-file | parallel echo + +Output: Same as above. + + +=head2 Multiple input sources + +GNU B<parallel> can take multiple input sources given on the command +line. GNU B<parallel> then generates all combinations of the input +sources: + + parallel echo ::: A B C ::: D E F + +Output (the order may be different): + + A D + A E + A F + B D + B E + B F + C D + C E + C F + +The input sources can be files: + + parallel -a abc-file -a def-file echo + +Output: Same as above. + +STDIN (standard input) can be one of the input sources using B<->: + + cat abc-file | parallel -a - -a def-file echo + +Output: Same as above. + +Instead of B<-a> files can be given after B<::::>: + + cat abc-file | parallel echo :::: - def-file + +Output: Same as above. + +::: and :::: can be mixed: + + parallel echo ::: A B C :::: def-file + +Output: Same as above. + +=head3 Linking arguments from input sources + +With B<--link> you can link the input sources and get one argument +from each input source: + + parallel --link echo ::: A B C ::: D E F + +Output (the order may be different): + + A D + B E + C F + +If one of the input sources is too short, its values will wrap: + + parallel --link echo ::: A B C D E ::: F G + +Output (the order may be different): + + A F + B G + C F + D G + E F + +For more flexible linking you can use B<:::+> and B<::::+>. 
They work +like B<:::> and B<::::> except they link the previous input source to +this input source. + +This will link ABC to GHI: + + parallel echo :::: abc-file :::+ G H I :::: def-file + +Output (the order may be different): + + A G D + A G E + A G F + B H D + B H E + B H F + C I D + C I E + C I F + +This will link GHI to DEF: + + parallel echo :::: abc-file ::: G H I ::::+ def-file + +Output (the order may be different): + + A G D + A H E + A I F + B G D + B H E + B I F + C G D + C H E + C I F + +If one of the input sources is too short when using B<:::+> or +B<::::+>, the rest will be ignored: + + parallel echo ::: A B C D E :::+ F G + +Output (the order may be different): + + A F + B G + + +=head2 Changing the argument separator. + +GNU B<parallel> can use other separators than B<:::> or B<::::>. This is +typically useful if B<:::> or B<::::> is used in the command to run: + + parallel --arg-sep ,, echo ,, A B C :::: def-file + +Output (the order may be different): + + A D + A E + A F + B D + B E + B F + C D + C E + C F + +Changing the argument file separator: + + parallel --arg-file-sep // echo ::: A B C // def-file + +Output: Same as above. + + +=head2 Changing the argument delimiter + +GNU B<parallel> will normally treat a full line as a single argument: It +uses B<\n> as argument delimiter. This can be changed with B<-d>: + + parallel -d _ echo :::: abc_-file + +Output (the order may be different): + + A + B + C + +NUL can be given as B<\0>: + + parallel -d '\0' echo :::: abc0-file + +Output: Same as above. + +A shorthand for B<-d '\0'> is B<-0> (this will often be used to read files +from B<find ... -print0>): + + parallel -0 echo :::: abc0-file + +Output: Same as above. + +=head2 End-of-file value for input source + +GNU B<parallel> can stop reading when it encounters a certain value: + + parallel -E stop echo ::: A B stop C D + +Output: + + A + B + +=head2 Skipping empty lines + +Using B<--no-run-if-empty> GNU B<parallel> will skip empty lines. + + (echo 1; echo; echo 2) | parallel --no-run-if-empty echo + +Output: + + 1 + 2 + + +=head1 Building the command line + +=head2 No command means arguments are commands + +If no command is given after parallel the arguments themselves are +treated as commands: + + parallel ::: ls 'echo foo' pwd + +Output (the order may be different): + + [list of files in current dir] + foo + [/path/to/current/working/dir] + +The command can be a script, a binary or a Bash function if the function is +exported using B<export -f>: + + # Only works in Bash + my_func() { + echo in my_func $1 + } + export -f my_func + parallel my_func ::: 1 2 3 + +Output (the order may be different): + + in my_func 1 + in my_func 2 + in my_func 3 + +=head2 Replacement strings + +=head3 The 7 predefined replacement strings + +GNU B<parallel> has several replacement strings. 
If no replacement +strings are used the default is to append B<{}>: + + parallel echo ::: dir/file.ext dir2/file2.ext2 + +Output: + + dir/file.ext + dir2/file2.ext2 + +The default replacement string is B<{}>: + + parallel echo {} ::: dir/file.ext dir2/file2.ext2 + +Output: + + dir/file.ext + dir2/file2.ext2 + +The replacement string B<{.}> removes the extension: + + parallel echo {.} ::: dir/file.ext dir2/file2.ext2 + +Output: + + dir/file + dir2/file2 + +The replacement string B<{/}> removes the path: + + parallel echo {/} ::: dir/file.ext dir2/file2.ext2 + +Output: + + file.ext + file2.ext2 + +The replacement string B<{//}> keeps only the path: + + parallel echo {//} ::: dir/file.ext dir2/file2.ext2 + +Output: + + dir + dir2 + +The replacement string B<{/.}> removes the path and the extension: + + parallel echo {/.} ::: dir/file.ext dir2/file2.ext2 + +Output: + + file + file2 + +The replacement string B<{#}> gives the job number: + + parallel echo {#} ::: A B C + +Output (the order may be different): + + 1 + 2 + 3 + +The replacement string B<{%}> gives the job slot number (between 1 and +number of jobs to run in parallel): + + parallel -j 2 echo {%} ::: A B C + +Output (the order may be different and 1 and 2 may be swapped): + + 1 + 2 + 1 + +=head3 Changing the replacement strings + +The replacement string B<{}> can be changed with B<-I>: + + parallel -I ,, echo ,, ::: A/B.C + +Output: + + A/B.C + +The replacement string B<{.}> can be changed with B<--extensionreplace>: + + parallel --extensionreplace ,, echo ,, ::: A/B.C + +Output: + + A/B + +The replacement string B<{/}> can be replaced with B<--basenamereplace>: + + parallel --basenamereplace ,, echo ,, ::: A/B.C + +Output: + + B.C + +The replacement string B<{//}> can be changed with B<--dirnamereplace>: + + parallel --dirnamereplace ,, echo ,, ::: A/B.C + +Output: + + A + +The replacement string B<{/.}> can be changed with B<--basenameextensionreplace>: + + parallel --basenameextensionreplace ,, echo ,, ::: A/B.C + +Output: + + B + +The replacement string B<{#}> can be changed with B<--seqreplace>: + + parallel --seqreplace ,, echo ,, ::: A B C + +Output (the order may be different): + + 1 + 2 + 3 + +The replacement string B<{%}> can be changed with B<--slotreplace>: + + parallel -j2 --slotreplace ,, echo ,, ::: A B C + +Output (the order may be different and 1 and 2 may be swapped): + + 1 + 2 + 1 + +=head3 Perl expression replacement string + +When predefined replacement strings are not flexible enough a perl +expression can be used instead. One example is to remove two +extensions: foo.tar.gz becomes foo + + parallel echo '{= s:\.[^.]+$::;s:\.[^.]+$::; =}' ::: foo.tar.gz + +Output: + + foo + +In B<{= =}> you can access all of GNU B<parallel>'s internal functions +and variables. A few are worth mentioning. 
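The job object for the running job is also available as B<$job>. For
instance, the job sequence number behind B<{#}> can be computed with
B<< $job->seq() >> - this is how B<{#}> is defined in the B<--rpl>
definitions listed later in this section. A small sketch:

  parallel echo '{= $_=$job->seq() =}' ::: A B C

Output (the order may be different):

  1
  2
  3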
+ +B<total_jobs()> returns the total number of jobs: + + parallel echo Job {#} of {= '$_=total_jobs()' =} ::: {1..5} + +Output: + + Job 1 of 5 + Job 2 of 5 + Job 3 of 5 + Job 4 of 5 + Job 5 of 5 + +B<Q(...)> shell quotes the string: + + parallel echo {} shell quoted is {= '$_=Q($_)' =} ::: '*/!#$' + +Output: + + */!#$ shell quoted is \*/\!\#\$ + +B<skip()> skips the job: + + parallel echo {= 'if($_==3) { skip() }' =} ::: {1..5} + +Output: + + 1 + 2 + 4 + 5 + +B<@arg> contains the input source variables: + + parallel echo {= 'if($arg[1]==$arg[2]) { skip() }' =} \ + ::: {1..3} ::: {1..3} + +Output: + + 1 2 + 1 3 + 2 1 + 2 3 + 3 1 + 3 2 + +If the strings B<{=> and B<=}> cause problems they can be replaced with B<--parens>: + + parallel --parens ,,,, echo ',, s:\.[^.]+$::;s:\.[^.]+$::; ,,' \ + ::: foo.tar.gz + +Output: + + foo + +To define a shorthand replacement string use B<--rpl>: + + parallel --rpl '.. s:\.[^.]+$::;s:\.[^.]+$::;' echo '..' \ + ::: foo.tar.gz + +Output: Same as above. + +If the shorthand starts with B<{> it can be used as a positional +replacement string, too: + + parallel --rpl '{..} s:\.[^.]+$::;s:\.[^.]+$::;' echo '{..}' + ::: foo.tar.gz + +Output: Same as above. + +If the shorthand contains matching parenthesis the replacement string +becomes a dynamic replacement string and the string in the parenthesis +can be accessed as $$1. If there are multiple matching parenthesis, +the matched strings can be accessed using $$2, $$3 and so on. + +You can think of this as giving arguments to the replacement +string. Here we give the argument B<.tar.gz> to the replacement string +B<{%I<string>}> which removes I<string>: + + parallel --rpl '{%(.+?)} s/$$1$//;' echo {%.tar.gz}.zip ::: foo.tar.gz + +Output: + + foo.zip + +Here we give the two arguments B<tar.gz> and B<zip> to the replacement +string B<{/I<string1>/I<string2>}> which replaces I<string1> with +I<string2>: + + parallel --rpl '{/(.+?)/(.*?)} s/$$1/$$2/;' echo {/tar.gz/zip} \ + ::: foo.tar.gz + +Output: + + foo.zip + + +GNU B<parallel>'s 7 replacement strings are implemented as this: + + --rpl '{} ' + --rpl '{#} $_=$job->seq()' + --rpl '{%} $_=$job->slot()' + --rpl '{/} s:.*/::' + --rpl '{//} $Global::use{"File::Basename"} ||= + eval "use File::Basename; 1;"; $_ = dirname($_);' + --rpl '{/.} s:.*/::; s:\.[^/.]+$::;' + --rpl '{.} s:\.[^/.]+$::' + +=head3 Positional replacement strings + +With multiple input sources the argument from the individual input +sources can be accessed with S<< B<{>numberB<}> >>: + + parallel echo {1} and {2} ::: A B ::: C D + +Output (the order may be different): + + A and C + A and D + B and C + B and D + +The positional replacement strings can also be modified using B</>, B<//>, B</.>, and B<.>: + + parallel echo /={1/} //={1//} /.={1/.} .={1.} ::: A/B.C D/E.F + +Output (the order may be different): + + /=B.C //=A /.=B .=A/B + /=E.F //=D /.=E .=D/E + +If a position is negative, it will refer to the input source counted +from behind: + + parallel echo 1={1} 2={2} 3={3} -1={-1} -2={-2} -3={-3} \ + ::: A B ::: C D ::: E F + +Output (the order may be different): + + 1=A 2=C 3=E -1=E -2=C -3=A + 1=A 2=C 3=F -1=F -2=C -3=A + 1=A 2=D 3=E -1=E -2=D -3=A + 1=A 2=D 3=F -1=F -2=D -3=A + 1=B 2=C 3=E -1=E -2=C -3=B + 1=B 2=C 3=F -1=F -2=C -3=B + 1=B 2=D 3=E -1=E -2=D -3=B + 1=B 2=D 3=F -1=F -2=D -3=B + + +=head3 Positional perl expression replacement string + +To use a perl expression as a positional replacement string simply +prepend the perl expression with number and space: + + parallel echo '{=2 
s:\.[^.]+$::;s:\.[^.]+$::; =} {1}' \ + ::: bar ::: foo.tar.gz + +Output: + + foo bar + +If a shorthand defined using B<--rpl> starts with B<{> it can be used as +a positional replacement string, too: + + parallel --rpl '{..} s:\.[^.]+$::;s:\.[^.]+$::;' echo '{2..} {1}' \ + ::: bar ::: foo.tar.gz + +Output: Same as above. + + +=head3 Input from columns + +The columns in a file can be bound to positional replacement strings +using B<--colsep>. Here the columns are separated by TAB (\t): + + parallel --colsep '\t' echo 1={1} 2={2} :::: tsv-file.tsv + +Output (the order may be different): + + 1=f1 2=f2 + 1=A 2=B + 1=C 2=D + +=head3 Header defined replacement strings + +With B<--header> GNU B<parallel> will use the first value of the input +source as the name of the replacement string. Only the non-modified +version B<{}> is supported: + + parallel --header : echo f1={f1} f2={f2} ::: f1 A B ::: f2 C D + +Output (the order may be different): + + f1=A f2=C + f1=A f2=D + f1=B f2=C + f1=B f2=D + +It is useful with B<--colsep> for processing files with TAB separated values: + + parallel --header : --colsep '\t' echo f1={f1} f2={f2} \ + :::: tsv-file.tsv + +Output (the order may be different): + + f1=A f2=B + f1=C f2=D + +=head3 More pre-defined replacement strings with --plus + +B<--plus> adds the replacement strings B<{+/} {+.} {+..} {+...} {..} {...} +{/..} {/...} {##}>. The idea being that B<{+foo}> matches the opposite of B<{foo}> +and B<{}> = B<{+/}>/B<{/}> = B<{.}>.B<{+.}> = B<{+/}>/B<{/.}>.B<{+.}> = B<{..}>.B<{+..}> = +B<{+/}>/B<{/..}>.B<{+..}> = B<{...}>.B<{+...}> = B<{+/}>/B<{/...}>.B<{+...}>. + + parallel --plus echo {} ::: dir/sub/file.ex1.ex2.ex3 + parallel --plus echo {+/}/{/} ::: dir/sub/file.ex1.ex2.ex3 + parallel --plus echo {.}.{+.} ::: dir/sub/file.ex1.ex2.ex3 + parallel --plus echo {+/}/{/.}.{+.} ::: dir/sub/file.ex1.ex2.ex3 + parallel --plus echo {..}.{+..} ::: dir/sub/file.ex1.ex2.ex3 + parallel --plus echo {+/}/{/..}.{+..} ::: dir/sub/file.ex1.ex2.ex3 + parallel --plus echo {...}.{+...} ::: dir/sub/file.ex1.ex2.ex3 + parallel --plus echo {+/}/{/...}.{+...} ::: dir/sub/file.ex1.ex2.ex3 + +Output: + + dir/sub/file.ex1.ex2.ex3 + +B<{##}> is simply the number of jobs: + + parallel --plus echo Job {#} of {##} ::: {1..5} + +Output: + + Job 1 of 5 + Job 2 of 5 + Job 3 of 5 + Job 4 of 5 + Job 5 of 5 + +=head3 Dynamic replacement strings with --plus + +B<--plus> also defines these dynamic replacement strings: + +=over 19 + +=item B<{:-I<string>}> + +Default value is I<string> if the argument is empty. + +=item B<{:I<number>}> + +Substring from I<number> till end of string. + +=item B<{:I<number1>:I<number2>}> + +Substring from I<number1> to I<number2>. + +=item B<{#I<string>}> + +If the argument starts with I<string>, remove it. + +=item B<{%I<string>}> + +If the argument ends with I<string>, remove it. + +=item B<{/I<string1>/I<string2>}> + +Replace I<string1> with I<string2>. + +=item B<{^I<string>}> + +If the argument starts with I<string>, upper case it. I<string> must +be a single letter. + +=item B<{^^I<string>}> + +If the argument contains I<string>, upper case it. I<string> must be a +single letter. + +=item B<{,I<string>}> + +If the argument starts with I<string>, lower case it. I<string> must +be a single letter. + +=item B<{,,I<string>}> + +If the argument contains I<string>, lower case it. I<string> must be a +single letter. 
+ +=back + +They are inspired from B<Bash>: + + unset myvar + echo ${myvar:-myval} + parallel --plus echo {:-myval} ::: "$myvar" + + myvar=abcAaAdef + echo ${myvar:2} + parallel --plus echo {:2} ::: "$myvar" + + echo ${myvar:2:3} + parallel --plus echo {:2:3} ::: "$myvar" + + echo ${myvar#bc} + parallel --plus echo {#bc} ::: "$myvar" + echo ${myvar#abc} + parallel --plus echo {#abc} ::: "$myvar" + + echo ${myvar%de} + parallel --plus echo {%de} ::: "$myvar" + echo ${myvar%def} + parallel --plus echo {%def} ::: "$myvar" + + echo ${myvar/def/ghi} + parallel --plus echo {/def/ghi} ::: "$myvar" + + echo ${myvar^a} + parallel --plus echo {^a} ::: "$myvar" + echo ${myvar^^a} + parallel --plus echo {^^a} ::: "$myvar" + + myvar=AbcAaAdef + echo ${myvar,A} + parallel --plus echo '{,A}' ::: "$myvar" + echo ${myvar,,A} + parallel --plus echo '{,,A}' ::: "$myvar" + +Output: + + myval + myval + cAaAdef + cAaAdef + cAa + cAa + abcAaAdef + abcAaAdef + AaAdef + AaAdef + abcAaAdef + abcAaAdef + abcAaA + abcAaA + abcAaAghi + abcAaAghi + AbcAaAdef + AbcAaAdef + AbcAAAdef + AbcAAAdef + abcAaAdef + abcAaAdef + abcaaadef + abcaaadef + + +=head2 More than one argument + +With B<--xargs> GNU B<parallel> will fit as many arguments as possible on a +single line: + + cat num30000 | parallel --xargs echo | wc -l + +Output (if you run this under Bash on GNU/Linux): + + 2 + +The 30000 arguments fitted on 2 lines. + +The maximal length of a single line can be set with B<-s>. With a maximal +line length of 10000 chars 17 commands will be run: + + cat num30000 | parallel --xargs -s 10000 echo | wc -l + +Output: + + 17 + +For better parallelism GNU B<parallel> can distribute the arguments +between all the parallel jobs when end of file is met. + +Below GNU B<parallel> reads the last argument when generating the second +job. When GNU B<parallel> reads the last argument, it spreads all the +arguments for the second job over 4 jobs instead, as 4 parallel jobs +are requested. + +The first job will be the same as the B<--xargs> example above, but the +second job will be split into 4 evenly sized jobs, resulting in a +total of 5 jobs: + + cat num30000 | parallel --jobs 4 -m echo | wc -l + +Output (if you run this under Bash on GNU/Linux): + + 5 + +This is even more visible when running 4 jobs with 10 arguments. The +10 arguments are being spread over 4 jobs: + + parallel --jobs 4 -m echo ::: 1 2 3 4 5 6 7 8 9 10 + +Output: + + 1 2 3 + 4 5 6 + 7 8 9 + 10 + +A replacement string can be part of a word. B<-m> will not repeat the context: + + parallel --jobs 4 -m echo pre-{}-post ::: A B C D E F G + +Output (the order may be different): + + pre-A B-post + pre-C D-post + pre-E F-post + pre-G-post + +To repeat the context use B<-X> which otherwise works like B<-m>: + + parallel --jobs 4 -X echo pre-{}-post ::: A B C D E F G + +Output (the order may be different): + + pre-A-post pre-B-post + pre-C-post pre-D-post + pre-E-post pre-F-post + pre-G-post + +To limit the number of arguments use B<-N>: + + parallel -N3 echo ::: A B C D E F G H + +Output (the order may be different): + + A B C + D E F + G H + +B<-N> also sets the positional replacement strings: + + parallel -N3 echo 1={1} 2={2} 3={3} ::: A B C D E F G H + +Output (the order may be different): + + 1=A 2=B 3=C + 1=D 2=E 3=F + 1=G 2=H 3= + +B<-N0> reads 1 argument but inserts none: + + parallel -N0 echo foo ::: 1 2 3 + +Output: + + foo + foo + foo + +=head2 Quoting + +Command lines that contain special characters may need to be protected from the shell. 
+ +The B<perl> program B<print "@ARGV\n"> basically works like B<echo>. + + perl -e 'print "@ARGV\n"' A + +Output: + + A + +To run that in parallel the command needs to be quoted: + + parallel perl -e 'print "@ARGV\n"' ::: This wont work + +Output: + + [Nothing] + +To quote the command use B<-q>: + + parallel -q perl -e 'print "@ARGV\n"' ::: This works + +Output (the order may be different): + + This + works + +Or you can quote the critical part using B<\'>: + + parallel perl -e \''print "@ARGV\n"'\' ::: This works, too + +Output (the order may be different): + + This + works, + too + +GNU B<parallel> can also \-quote full lines. Simply run this: + + parallel --shellquote + Warning: Input is read from the terminal. You either know what you + Warning: are doing (in which case: YOU ARE AWESOME!) or you forgot + Warning: ::: or :::: or to pipe data into parallel. If so + Warning: consider going through the tutorial: man parallel_tutorial + Warning: Press CTRL-D to exit. + perl -e 'print "@ARGV\n"' + [CTRL-D] + +Output: + + perl\ -e\ \'print\ \"@ARGV\\n\"\' + +This can then be used as the command: + + parallel perl\ -e\ \'print\ \"@ARGV\\n\"\' ::: This also works + +Output (the order may be different): + + This + also + works + + +=head2 Trimming space + +Space can be trimmed on the arguments using B<--trim>: + + parallel --trim r echo pre-{}-post ::: ' A ' + +Output: + + pre- A-post + +To trim on the left side: + + parallel --trim l echo pre-{}-post ::: ' A ' + +Output: + + pre-A -post + +To trim on the both sides: + + parallel --trim lr echo pre-{}-post ::: ' A ' + +Output: + + pre-A-post + + +=head2 Respecting the shell + +This tutorial uses Bash as the shell. GNU B<parallel> respects which +shell you are using, so in B<zsh> you can do: + + parallel echo \={} ::: zsh bash ls + +Output: + + /usr/bin/zsh + /bin/bash + /bin/ls + +In B<csh> you can do: + + parallel 'set a="{}"; if( { test -d "$a" } ) echo "$a is a dir"' ::: * + +Output: + + [somedir] is a dir + +This also becomes useful if you use GNU B<parallel> in a shell script: +GNU B<parallel> will use the same shell as the shell script. + + +=head1 Controlling the output + +The output can prefixed with the argument: + + parallel --tag echo foo-{} ::: A B C + +Output (the order may be different): + + A foo-A + B foo-B + C foo-C + +To prefix it with another string use B<--tagstring>: + + parallel --tagstring {}-bar echo foo-{} ::: A B C + +Output (the order may be different): + + A-bar foo-A + B-bar foo-B + C-bar foo-C + +To see what commands will be run without running them use B<--dryrun>: + + parallel --dryrun echo {} ::: A B C + +Output (the order may be different): + + echo A + echo B + echo C + +To print the command before running them use B<--verbose>: + + parallel --verbose echo {} ::: A B C + +Output (the order may be different): + + echo A + echo B + A + echo C + B + C + +GNU B<parallel> will postpone the output until the command completes: + + parallel -j2 'printf "%s-start\n%s" {} {}; + sleep {};printf "%s\n" -middle;echo {}-end' ::: 4 2 1 + +Output: + + 2-start + 2-middle + 2-end + 1-start + 1-middle + 1-end + 4-start + 4-middle + 4-end + +To get the output immediately use B<--ungroup>: + + parallel -j2 --ungroup 'printf "%s-start\n%s" {} {}; + sleep {};printf "%s\n" -middle;echo {}-end' ::: 4 2 1 + +Output: + + 4-start + 42-start + 2-middle + 2-end + 1-start + 1-middle + 1-end + -middle + 4-end + +B<--ungroup> is fast, but can cause half a line from one job to be mixed +with half a line of another job. 
That has happened in the second line, +where the line '4-middle' is mixed with '2-start'. + +To avoid this use B<--linebuffer>: + + parallel -j2 --linebuffer 'printf "%s-start\n%s" {} {}; + sleep {};printf "%s\n" -middle;echo {}-end' ::: 4 2 1 + +Output: + + 4-start + 2-start + 2-middle + 2-end + 1-start + 1-middle + 1-end + 4-middle + 4-end + +To force the output in the same order as the arguments use B<--keep-order>/B<-k>: + + parallel -j2 -k 'printf "%s-start\n%s" {} {}; + sleep {};printf "%s\n" -middle;echo {}-end' ::: 4 2 1 + +Output: + + 4-start + 4-middle + 4-end + 2-start + 2-middle + 2-end + 1-start + 1-middle + 1-end + + +=head2 Saving output into files + +GNU B<parallel> can save the output of each job into files: + + parallel --files echo ::: A B C + +Output will be similar to this: + + /tmp/pAh6uWuQCg.par + /tmp/opjhZCzAX4.par + /tmp/W0AT_Rph2o.par + +By default GNU B<parallel> will cache the output in files in B</tmp>. This +can be changed by setting B<$TMPDIR> or B<--tmpdir>: + + parallel --tmpdir /var/tmp --files echo ::: A B C + +Output will be similar to this: + + /var/tmp/N_vk7phQRc.par + /var/tmp/7zA4Ccf3wZ.par + /var/tmp/LIuKgF_2LP.par + +Or: + + TMPDIR=/var/tmp parallel --files echo ::: A B C + +Output: Same as above. + +The output files can be saved in a structured way using B<--results>: + + parallel --results outdir echo ::: A B C + +Output: + + A + B + C + +These files were also generated containing the standard output +(stdout), standard error (stderr), and the sequence number (seq): + + outdir/1/A/seq + outdir/1/A/stderr + outdir/1/A/stdout + outdir/1/B/seq + outdir/1/B/stderr + outdir/1/B/stdout + outdir/1/C/seq + outdir/1/C/stderr + outdir/1/C/stdout + +B<--header :> will take the first value as name and use that in the +directory structure. This is useful if you are using multiple input +sources: + + parallel --header : --results outdir echo ::: f1 A B ::: f2 C D + +Generated files: + + outdir/f1/A/f2/C/seq + outdir/f1/A/f2/C/stderr + outdir/f1/A/f2/C/stdout + outdir/f1/A/f2/D/seq + outdir/f1/A/f2/D/stderr + outdir/f1/A/f2/D/stdout + outdir/f1/B/f2/C/seq + outdir/f1/B/f2/C/stderr + outdir/f1/B/f2/C/stdout + outdir/f1/B/f2/D/seq + outdir/f1/B/f2/D/stderr + outdir/f1/B/f2/D/stdout + +The directories are named after the variables and their values. + +=head1 Controlling the execution + +=head2 Number of simultaneous jobs + +The number of concurrent jobs is given with B<--jobs>/B<-j>: + + /usr/bin/time parallel -N0 -j64 sleep 1 :::: num128 + +With 64 jobs in parallel the 128 B<sleep>s will take 2-8 seconds to run - +depending on how fast your machine is. + +By default B<--jobs> is the same as the number of CPU cores. So this: + + /usr/bin/time parallel -N0 sleep 1 :::: num128 + +should take twice the time of running 2 jobs per CPU core: + + /usr/bin/time parallel -N0 --jobs 200% sleep 1 :::: num128 + +B<--jobs 0> will run as many jobs in parallel as possible: + + /usr/bin/time parallel -N0 --jobs 0 sleep 1 :::: num128 + +which should take 1-7 seconds depending on how fast your machine is. + +B<--jobs> can read from a file which is re-read when a job finishes: + + echo 50% > my_jobs + /usr/bin/time parallel -N0 --jobs my_jobs sleep 1 :::: num128 & + sleep 1 + echo 0 > my_jobs + wait + +The first second only 50% of the CPU cores will run a job. Then B<0> is +put into B<my_jobs> and then the rest of the jobs will be started in +parallel. 
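B<--jobs> also accepts values relative to the number of CPU cores:
B<+I<n>> adds I<n> jobs and B<-I<n>> subtracts I<n> jobs (see B<man
parallel> for the exact wording). A small sketch that leaves one core
free, and one that runs two extra jobs:

  /usr/bin/time parallel -N0 --jobs -1 sleep 1 :::: num128
  /usr/bin/time parallel -N0 --jobs +2 sleep 1 :::: num128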
+ +Instead of basing the percentage on the number of CPU cores +GNU B<parallel> can base it on the number of CPUs: + + parallel --use-cpus-instead-of-cores -N0 sleep 1 :::: num8 + +=head2 Shuffle job order + +If you have many jobs (e.g. by multiple combinations of input +sources), it can be handy to shuffle the jobs, so you get different +values run. Use B<--shuf> for that: + + parallel --shuf echo ::: 1 2 3 ::: a b c ::: A B C + +Output: + + All combinations but different order for each run. + +=head2 Interactivity + +GNU B<parallel> can ask the user if a command should be run using B<--interactive>: + + parallel --interactive echo ::: 1 2 3 + +Output: + + echo 1 ?...y + echo 2 ?...n + 1 + echo 3 ?...y + 3 + +GNU B<parallel> can be used to put arguments on the command line for an +interactive command such as B<emacs> to edit one file at a time: + + parallel --tty emacs ::: 1 2 3 + +Or give multiple argument in one go to open multiple files: + + parallel -X --tty vi ::: 1 2 3 + +=head2 A terminal for every job + +Using B<--tmux> GNU B<parallel> can start a terminal for every job run: + + seq 10 20 | parallel --tmux 'echo start {}; sleep {}; echo done {}' + +This will tell you to run something similar to: + + tmux -S /tmp/tmsrPrO0 attach + +Using normal B<tmux> keystrokes (CTRL-b n or CTRL-b p) you can cycle +between windows of the running jobs. When a job is finished it will +pause for 10 seconds before closing the window. + +=head2 Timing + +Some jobs do heavy I/O when they start. To avoid a thundering herd GNU +B<parallel> can delay starting new jobs. B<--delay> I<X> will make +sure there is at least I<X> seconds between each start: + + parallel --delay 2.5 echo Starting {}\;date ::: 1 2 3 + +Output: + + Starting 1 + Thu Aug 15 16:24:33 CEST 2013 + Starting 2 + Thu Aug 15 16:24:35 CEST 2013 + Starting 3 + Thu Aug 15 16:24:38 CEST 2013 + + +If jobs taking more than a certain amount of time are known to fail, +they can be stopped with B<--timeout>. 
The accuracy of B<--timeout> is +2 seconds: + + parallel --timeout 4.1 sleep {}\; echo {} ::: 2 4 6 8 + +Output: + + 2 + 4 + +GNU B<parallel> can compute the median runtime for jobs and kill those +that take more than 200% of the median runtime: + + parallel --timeout 200% sleep {}\; echo {} ::: 2.1 2.2 3 7 2.3 + +Output: + + 2.1 + 2.2 + 3 + 2.3 + +=head2 Progress information + +Based on the runtime of completed jobs GNU B<parallel> can estimate the +total runtime: + + parallel --eta sleep ::: 1 3 2 2 1 3 3 2 1 + +Output: + + Computers / CPU cores / Max jobs to run + 1:local / 2 / 2 + + Computer:jobs running/jobs completed/%of started jobs/ + Average seconds to complete + ETA: 2s 0left 1.11avg local:0/9/100%/1.1s + +GNU B<parallel> can give progress information with B<--progress>: + + parallel --progress sleep ::: 1 3 2 2 1 3 3 2 1 + +Output: + + Computers / CPU cores / Max jobs to run + 1:local / 2 / 2 + + Computer:jobs running/jobs completed/%of started jobs/ + Average seconds to complete + local:0/9/100%/1.1s + +A progress bar can be shown with B<--bar>: + + parallel --bar sleep ::: 1 3 2 2 1 3 3 2 1 + +And a graphic bar can be shown with B<--bar> and B<zenity>: + + seq 1000 | parallel -j10 --bar '(echo -n {};sleep 0.1)' \ + 2> >(perl -pe 'BEGIN{$/="\r";$|=1};s/\r/\n/g' | + zenity --progress --auto-kill --auto-close) + +A logfile of the jobs completed so far can be generated with B<--joblog>: + + parallel --joblog /tmp/log exit ::: 1 2 3 0 + cat /tmp/log + +Output: + + Seq Host Starttime Runtime Send Receive Exitval Signal Command + 1 : 1376577364.974 0.008 0 0 1 0 exit 1 + 2 : 1376577364.982 0.013 0 0 2 0 exit 2 + 3 : 1376577364.990 0.013 0 0 3 0 exit 3 + 4 : 1376577365.003 0.003 0 0 0 0 exit 0 + +The log contains the job sequence, which host the job was run on, the +start time and run time, how much data was transferred, the exit +value, the signal that killed the job, and finally the command being +run. + +With a joblog GNU B<parallel> can be stopped and later pickup where it +left off. It it important that the input of the completed jobs is +unchanged. + + parallel --joblog /tmp/log exit ::: 1 2 3 0 + cat /tmp/log + parallel --resume --joblog /tmp/log exit ::: 1 2 3 0 0 0 + cat /tmp/log + +Output: + + Seq Host Starttime Runtime Send Receive Exitval Signal Command + 1 : 1376580069.544 0.008 0 0 1 0 exit 1 + 2 : 1376580069.552 0.009 0 0 2 0 exit 2 + 3 : 1376580069.560 0.012 0 0 3 0 exit 3 + 4 : 1376580069.571 0.005 0 0 0 0 exit 0 + + Seq Host Starttime Runtime Send Receive Exitval Signal Command + 1 : 1376580069.544 0.008 0 0 1 0 exit 1 + 2 : 1376580069.552 0.009 0 0 2 0 exit 2 + 3 : 1376580069.560 0.012 0 0 3 0 exit 3 + 4 : 1376580069.571 0.005 0 0 0 0 exit 0 + 5 : 1376580070.028 0.009 0 0 0 0 exit 0 + 6 : 1376580070.038 0.007 0 0 0 0 exit 0 + +Note how the start time of the last 2 jobs is clearly different from the second run. 
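The joblog is a plain TAB-separated file, so the failed jobs can be
picked out with standard tools. A small sketch using B<awk> on the
B<Exitval> column (column 7), skipping the header line:

  awk -F '\t' 'NR > 1 && $7 != 0' /tmp/log

Output: The joblog lines for the jobs that exited with a non-zero
value (here seq 1, 2, and 3).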
+ +With B<--resume-failed> GNU B<parallel> will re-run the jobs that failed: + + parallel --resume-failed --joblog /tmp/log exit ::: 1 2 3 0 0 0 + cat /tmp/log + +Output: + + Seq Host Starttime Runtime Send Receive Exitval Signal Command + 1 : 1376580069.544 0.008 0 0 1 0 exit 1 + 2 : 1376580069.552 0.009 0 0 2 0 exit 2 + 3 : 1376580069.560 0.012 0 0 3 0 exit 3 + 4 : 1376580069.571 0.005 0 0 0 0 exit 0 + 5 : 1376580070.028 0.009 0 0 0 0 exit 0 + 6 : 1376580070.038 0.007 0 0 0 0 exit 0 + 1 : 1376580154.433 0.010 0 0 1 0 exit 1 + 2 : 1376580154.444 0.022 0 0 2 0 exit 2 + 3 : 1376580154.466 0.005 0 0 3 0 exit 3 + +Note how seq 1 2 3 have been repeated because they had exit value +different from 0. + +B<--retry-failed> does almost the same as B<--resume-failed>. Where +B<--resume-failed> reads the commands from the command line (and +ignores the commands in the joblog), B<--retry-failed> ignores the +command line and reruns the commands mentioned in the joblog. + + parallel --retry-failed --joblog /tmp/log + cat /tmp/log + +Output: + + Seq Host Starttime Runtime Send Receive Exitval Signal Command + 1 : 1376580069.544 0.008 0 0 1 0 exit 1 + 2 : 1376580069.552 0.009 0 0 2 0 exit 2 + 3 : 1376580069.560 0.012 0 0 3 0 exit 3 + 4 : 1376580069.571 0.005 0 0 0 0 exit 0 + 5 : 1376580070.028 0.009 0 0 0 0 exit 0 + 6 : 1376580070.038 0.007 0 0 0 0 exit 0 + 1 : 1376580154.433 0.010 0 0 1 0 exit 1 + 2 : 1376580154.444 0.022 0 0 2 0 exit 2 + 3 : 1376580154.466 0.005 0 0 3 0 exit 3 + 1 : 1376580164.633 0.010 0 0 1 0 exit 1 + 2 : 1376580164.644 0.022 0 0 2 0 exit 2 + 3 : 1376580164.666 0.005 0 0 3 0 exit 3 + + +=head2 Termination + +=head3 Unconditional termination + +By default GNU B<parallel> will wait for all jobs to finish before exiting. + +If you send GNU B<parallel> the B<TERM> signal, GNU B<parallel> will +stop spawning new jobs and wait for the remaining jobs to finish. If +you send GNU B<parallel> the B<TERM> signal again, GNU B<parallel> +will kill all running jobs and exit. + +=head3 Termination dependent on job status + +For certain jobs there is no need to continue if one of the jobs fails +and has an exit code different from 0. GNU B<parallel> will stop spawning new jobs +with B<--halt soon,fail=1>: + + parallel -j2 --halt soon,fail=1 echo {}\; exit {} ::: 0 0 1 2 3 + +Output: + + 0 + 0 + 1 + parallel: This job failed: + echo 1; exit 1 + parallel: Starting no more jobs. Waiting for 1 jobs to finish. + 2 + +With B<--halt now,fail=1> the running jobs will be killed immediately: + + parallel -j2 --halt now,fail=1 echo {}\; exit {} ::: 0 0 1 2 3 + +Output: + + 0 + 0 + 1 + parallel: This job failed: + echo 1; exit 1 + +If B<--halt> is given a percentage this percentage of the jobs must fail +before GNU B<parallel> stops spawning more jobs: + + parallel -j2 --halt soon,fail=20% echo {}\; exit {} \ + ::: 0 1 2 3 4 5 6 7 8 9 + +Output: + + 0 + 1 + parallel: This job failed: + echo 1; exit 1 + 2 + parallel: This job failed: + echo 2; exit 2 + parallel: Starting no more jobs. Waiting for 1 jobs to finish. + 3 + parallel: This job failed: + echo 3; exit 3 + +If you are looking for success instead of failures, you can use +B<success>. This will finish as soon as the first job succeeds: + + parallel -j2 --halt now,success=1 echo {}\; exit {} ::: 1 2 3 0 4 5 6 + +Output: + + 1 + 2 + 3 + 0 + parallel: This job succeeded: + echo 0; exit 0 + +GNU B<parallel> can retry the command with B<--retries>. This is useful if a +command fails for unknown reasons now and then. 
+ + parallel -k --retries 3 \ + 'echo tried {} >>/tmp/runs; echo completed {}; exit {}' ::: 1 2 0 + cat /tmp/runs + +Output: + + completed 1 + completed 2 + completed 0 + + tried 1 + tried 2 + tried 1 + tried 2 + tried 1 + tried 2 + tried 0 + +Note how job 1 and 2 were tried 3 times, but 0 was not retried because it had exit code 0. + +=head3 Termination signals (advanced) + +Using B<--termseq> you can control which signals are sent when killing +children. Normally children will be killed by sending them B<SIGTERM>, +waiting 200 ms, then another B<SIGTERM>, waiting 100 ms, then another +B<SIGTERM>, waiting 50 ms, then a B<SIGKILL>, finally waiting 25 ms +before giving up. It looks like this: + + show_signals() { + perl -e 'for(keys %SIG) { + $SIG{$_} = eval "sub { print \"Got $_\\n\"; }"; + } + while(1){sleep 1}' + } + export -f show_signals + echo | parallel --termseq TERM,200,TERM,100,TERM,50,KILL,25 \ + -u --timeout 1 show_signals + +Output: + + Got TERM + Got TERM + Got TERM + +Or just: + + echo | parallel -u --timeout 1 show_signals + +Output: Same as above. + +You can change this to B<SIGINT>, B<SIGTERM>, B<SIGKILL>: + + echo | parallel --termseq INT,200,TERM,100,KILL,25 \ + -u --timeout 1 show_signals + +Output: + + Got INT + Got TERM + +The B<SIGKILL> does not show because it cannot be caught, and thus the +child dies. + + +=head2 Limiting the resources + +To avoid overloading systems GNU B<parallel> can look at the system load +before starting another job: + + parallel --load 100% echo load is less than {} job per cpu ::: 1 + +Output: + + [when then load is less than the number of cpu cores] + load is less than 1 job per cpu + +GNU B<parallel> can also check if the system is swapping. + + parallel --noswap echo the system is not swapping ::: now + +Output: + + [when then system is not swapping] + the system is not swapping now + +Some jobs need a lot of memory, and should only be started when there +is enough memory free. Using B<--memfree> GNU B<parallel> can check if +there is enough memory free. Additionally, GNU B<parallel> will kill +off the youngest job if the memory free falls below 50% of the +size. The killed job will put back on the queue and retried later. + + parallel --memfree 1G echo will run if more than 1 GB is ::: free + +GNU B<parallel> can run the jobs with a nice value. This will work both +locally and remotely. + + parallel --nice 17 echo this is being run with nice -n ::: 17 + +Output: + + this is being run with nice -n 17 + +=head1 Remote execution + +GNU B<parallel> can run jobs on remote servers. It uses B<ssh> to +communicate with the remote machines. 
+ +=head2 Sshlogin + +The most basic sshlogin is B<-S> I<host>: + + parallel -S $SERVER1 echo running on ::: $SERVER1 + +Output: + + running on [$SERVER1] + +To use a different username prepend the server with I<username@>: + + parallel -S username@$SERVER1 echo running on ::: username@$SERVER1 + +Output: + + running on [username@$SERVER1] + +The special sshlogin B<:> is the local machine: + + parallel -S : echo running on ::: the_local_machine + +Output: + + running on the_local_machine + +If B<ssh> is not in $PATH it can be prepended to $SERVER1: + + parallel -S '/usr/bin/ssh '$SERVER1 echo custom ::: ssh + +Output: + + custom ssh + +The B<ssh> command can also be given using B<--ssh>: + + parallel --ssh /usr/bin/ssh -S $SERVER1 echo custom ::: ssh + +or by setting B<$PARALLEL_SSH>: + + export PARALLEL_SSH=/usr/bin/ssh + parallel -S $SERVER1 echo custom ::: ssh + +Several servers can be given using multiple B<-S>: + + parallel -S $SERVER1 -S $SERVER2 echo ::: running on more hosts + +Output (the order may be different): + + running + on + more + hosts + +Or they can be separated by B<,>: + + parallel -S $SERVER1,$SERVER2 echo ::: running on more hosts + +Output: Same as above. + +Or newline: + + # This gives a \n between $SERVER1 and $SERVER2 + SERVERS="`echo $SERVER1; echo $SERVER2`" + parallel -S "$SERVERS" echo ::: running on more hosts + +They can also be read from a file (replace I<user@> with the user on B<$SERVER2>): + + echo $SERVER1 > nodefile + # Force 4 cores, special ssh-command, username + echo 4//usr/bin/ssh user@$SERVER2 >> nodefile + parallel --sshloginfile nodefile echo ::: running on more hosts + +Output: Same as above. + +Every time a job finished, the B<--sshloginfile> will be re-read, so +it is possible to both add and remove hosts while running. + +The special B<--sshloginfile ..> reads from B<~/.parallel/sshloginfile>. + +To force GNU B<parallel> to treat a server having a given number of CPU +cores prepend the number of core followed by B</> to the sshlogin: + + parallel -S 4/$SERVER1 echo force {} cpus on server ::: 4 + +Output: + + force 4 cpus on server + +Servers can be put into groups by prepending I<@groupname> to the +server and the group can then be selected by appending I<@groupname> to +the argument if using B<--hostgroup>: + + parallel --hostgroup -S @grp1/$SERVER1 -S @grp2/$SERVER2 echo {} \ + ::: run_on_grp1@grp1 run_on_grp2@grp2 + +Output: + + run_on_grp1 + run_on_grp2 + +A host can be in multiple groups by separating the groups with B<+>, and +you can force GNU B<parallel> to limit the groups on which the command +can be run with B<-S> I<@groupname>: + + parallel -S @grp1 -S @grp1+grp2/$SERVER1 -S @grp2/SERVER2 echo {} \ + ::: run_on_grp1 also_grp1 + +Output: + + run_on_grp1 + also_grp1 + +=head2 Transferring files + +GNU B<parallel> can transfer the files to be processed to the remote +host. It does that using rsync. + + echo This is input_file > input_file + parallel -S $SERVER1 --transferfile {} cat ::: input_file + +Output: + + This is input_file + +If the files are processed into another file, the resulting file can be +transferred back: + + echo This is input_file > input_file + parallel -S $SERVER1 --transferfile {} --return {}.out \ + cat {} ">"{}.out ::: input_file + cat input_file.out + +Output: Same as above. 
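B<--return> can be repeated if a job leaves more than one file
behind. A small sketch where the job writes both B<{}.out> and a copy
of the input:

  echo This is input_file > input_file
  parallel -S $SERVER1 --transferfile {} --return {}.out --return {}.copy \
    cat {} ">"{}.out \; cp {} {}.copy ::: input_file
  cat input_file.out input_file.copy

Output: The input line twice - once from each returned file.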
+ +To remove the input and output file on the remote server use B<--cleanup>: + + echo This is input_file > input_file + parallel -S $SERVER1 --transferfile {} --return {}.out --cleanup \ + cat {} ">"{}.out ::: input_file + cat input_file.out + +Output: Same as above. + +There is a shorthand for B<--transferfile {} --return --cleanup> called B<--trc>: + + echo This is input_file > input_file + parallel -S $SERVER1 --trc {}.out cat {} ">"{}.out ::: input_file + cat input_file.out + +Output: Same as above. + +Some jobs need a common database for all jobs. GNU B<parallel> can +transfer that using B<--basefile> which will transfer the file before the +first job: + + echo common data > common_file + parallel --basefile common_file -S $SERVER1 \ + cat common_file\; echo {} ::: foo + +Output: + + common data + foo + +To remove it from the remote host after the last job use B<--cleanup>. + + +=head2 Working dir + +The default working dir on the remote machines is the login dir. This +can be changed with B<--workdir> I<mydir>. + +Files transferred using B<--transferfile> and B<--return> will be relative +to I<mydir> on remote computers, and the command will be executed in +the dir I<mydir>. + +The special I<mydir> value B<...> will create working dirs under +B<~/.parallel/tmp> on the remote computers. If B<--cleanup> is given +these dirs will be removed. + +The special I<mydir> value B<.> uses the current working dir. If the +current working dir is beneath your home dir, the value B<.> is +treated as the relative path to your home dir. This means that if your +home dir is different on remote computers (e.g. if your login is +different) the relative path will still be relative to your home dir. + + parallel -S $SERVER1 pwd ::: "" + parallel --workdir . -S $SERVER1 pwd ::: "" + parallel --workdir ... -S $SERVER1 pwd ::: "" + +Output: + + [the login dir on $SERVER1] + [current dir relative on $SERVER1] + [a dir in ~/.parallel/tmp/...] + + +=head2 Avoid overloading sshd + +If many jobs are started on the same server, B<sshd> can be +overloaded. GNU B<parallel> can insert a delay between each job run on +the same server: + + parallel -S $SERVER1 --sshdelay 0.2 echo ::: 1 2 3 + +Output (the order may be different): + + 1 + 2 + 3 + +B<sshd> will be less overloaded if using B<--controlmaster>, which will +multiplex ssh connections: + + parallel --controlmaster -S $SERVER1 echo ::: 1 2 3 + +Output: Same as above. + +=head2 Ignore hosts that are down + +In clusters with many hosts a few of them are often down. GNU B<parallel> +can ignore those hosts. In this case the host 173.194.32.46 is down: + + parallel --filter-hosts -S 173.194.32.46,$SERVER1 echo ::: bar + +Output: + + bar + +=head2 Running the same commands on all hosts + +GNU B<parallel> can run the same command on all the hosts: + + parallel --onall -S $SERVER1,$SERVER2 echo ::: foo bar + +Output (the order may be different): + + foo + bar + foo + bar + +Often you will just want to run a single command on all hosts with out +arguments. B<--nonall> is a no argument B<--onall>: + + parallel --nonall -S $SERVER1,$SERVER2 echo foo bar + +Output: + + foo bar + foo bar + +When B<--tag> is used with B<--nonall> and B<--onall> the B<--tagstring> is the host: + + parallel --nonall --tag -S $SERVER1,$SERVER2 echo foo bar + +Output (the order may be different): + + $SERVER1 foo bar + $SERVER2 foo bar + +B<--jobs> sets the number of servers to log in to in parallel. 
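So to log in to only one server at a time (a small sketch):

  parallel --nonall -j1 --tag -S $SERVER1,$SERVER2 echo foo bar

Output: Same as the B<--tag> example above, except the servers are
contacted one at a time.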
+ +=head2 Transferring environment variables and functions + +B<env_parallel> is a shell function that transfers all aliases, +functions, variables, and arrays. You active it by running: + + source `which env_parallel.bash` + +Replace B<bash> with the shell you use. + +Now you can use B<env_parallel> instead of B<parallel> and still have +your environment: + + alias myecho=echo + myvar="Joe's var is" + env_parallel -S $SERVER1 'myecho $myvar' ::: green + +Output: + + Joe's var is green + +The disadvantage is that if your environment is huge B<env_parallel> +will fail. + +When B<env_parallel> fails, you can still use B<--env> to tell GNU +B<parallel> to transfer an environment variable to the remote system. + + MYVAR='foo bar' + export MYVAR + parallel --env MYVAR -S $SERVER1 echo '$MYVAR' ::: baz + +Output: + + foo bar baz + +This works for functions, too, if your shell is Bash: + + # This only works in Bash + my_func() { + echo in my_func $1 + } + export -f my_func + parallel --env my_func -S $SERVER1 my_func ::: baz + +Output: + + in my_func baz + +GNU B<parallel> can copy all user defined variables and functions to +the remote system. It just needs to record which ones to ignore in +B<~/.parallel/ignored_vars>. Do that by running this once: + + parallel --record-env + cat ~/.parallel/ignored_vars + +Output: + + [list of variables to ignore - including $PATH and $HOME] + +Now all other variables and functions defined will be copied when +using B<--env _>. + + # The function is only copied if using Bash + my_func2() { + echo in my_func2 $VAR $1 + } + export -f my_func2 + VAR=foo + export VAR + + parallel --env _ -S $SERVER1 'echo $VAR; my_func2' ::: bar + +Output: + + foo + in my_func2 foo bar + +If you use B<env_parallel> the variables, functions, and aliases do +not even need to be exported to be copied: + + NOT='not exported var' + alias myecho=echo + not_ex() { + myecho in not_exported_func $NOT $1 + } + env_parallel --env _ -S $SERVER1 'echo $NOT; not_ex' ::: bar + +Output: + + not exported var + in not_exported_func not exported var bar + + +=head2 Showing what is actually run + +B<--verbose> will show the command that would be run on the local +machine. + +When using B<--cat>, B<--pipepart>, or when a job is run on a remote +machine, the command is wrapped with helper scripts. B<-vv> shows all +of this. + + parallel -vv --pipepart --block 1M wc :::: num30000 + +Output: + + <num30000 perl -e 'while(@ARGV) { sysseek(STDIN,shift,0) || die; + $left = shift; while($read = sysread(STDIN,$buf, ($left > 131072 + ? 131072 : $left))){ $left -= $read; syswrite(STDOUT,$buf); } }' + 0 0 0 168894 | (wc) + 30000 30000 168894 + +When the command gets more complex, the output is so hard to read, +that it is only useful for debugging: + + my_func3() { + echo in my_func $1 > $1.out + } + export -f my_func3 + parallel -vv --workdir ... 
--nice 17 --env _ --trc {}.out \ + -S $SERVER1 my_func3 {} ::: abc-file + +Output will be similar to: + + + ( ssh server -- mkdir -p ./.parallel/tmp/aspire-1928520-1;rsync + --protocol 30 -rlDzR -essh ./abc-file + server:./.parallel/tmp/aspire-1928520-1 );ssh server -- exec perl -e + \''@GNU_Parallel=("use","IPC::Open3;","use","MIME::Base64"); + eval"@GNU_Parallel";my$eval=decode_base64(join"",@ARGV);eval$eval;'\' + c3lzdGVtKCJta2RpciIsIi1wIiwiLS0iLCIucGFyYWxsZWwvdG1wL2FzcGlyZS0xOTI4N + TsgY2hkaXIgIi5wYXJhbGxlbC90bXAvYXNwaXJlLTE5Mjg1MjAtMSIgfHxwcmludChTVE + BhcmFsbGVsOiBDYW5ub3QgY2hkaXIgdG8gLnBhcmFsbGVsL3RtcC9hc3BpcmUtMTkyODU + iKSAmJiBleGl0IDI1NTskRU5WeyJPTERQV0QifT0iL2hvbWUvdGFuZ2UvcHJpdmF0L3Bh + IjskRU5WeyJQQVJBTExFTF9QSUQifT0iMTkyODUyMCI7JEVOVnsiUEFSQUxMRUxfU0VRI + 0BiYXNoX2Z1bmN0aW9ucz1xdyhteV9mdW5jMyk7IGlmKCRFTlZ7IlNIRUxMIn09fi9jc2 + ByaW50IFNUREVSUiAiQ1NIL1RDU0ggRE8gTk9UIFNVUFBPUlQgbmV3bGluZXMgSU4gVkF + TL0ZVTkNUSU9OUy4gVW5zZXQgQGJhc2hfZnVuY3Rpb25zXG4iOyBleGVjICJmYWxzZSI7 + YXNoZnVuYyA9ICJteV9mdW5jMygpIHsgIGVjaG8gaW4gbXlfZnVuYyBcJDEgPiBcJDEub + Xhwb3J0IC1mIG15X2Z1bmMzID4vZGV2L251bGw7IjtAQVJHVj0ibXlfZnVuYzMgYWJjLW + RzaGVsbD0iJEVOVntTSEVMTH0iOyR0bXBkaXI9Ii90bXAiOyRuaWNlPTE3O2RveyRFTlZ + MRUxfVE1QfT0kdG1wZGlyLiIvcGFyIi5qb2luIiIsbWFweygwLi45LCJhIi4uInoiLCJB + KVtyYW5kKDYyKV19KDEuLjUpO313aGlsZSgtZSRFTlZ7UEFSQUxMRUxfVE1QfSk7JFNJ + fT1zdWJ7JGRvbmU9MTt9OyRwaWQ9Zm9yazt1bmxlc3MoJHBpZCl7c2V0cGdycDtldmFse + W9yaXR5KDAsMCwkbmljZSl9O2V4ZWMkc2hlbGwsIi1jIiwoJGJhc2hmdW5jLiJAQVJHVi + JleGVjOiQhXG4iO31kb3skcz0kczwxPzAuMDAxKyRzKjEuMDM6JHM7c2VsZWN0KHVuZGV + mLHVuZGVmLCRzKTt9dW50aWwoJGRvbmV8fGdldHBwaWQ9PTEpO2tpbGwoU0lHSFVQLC0k + dW5sZXNzJGRvbmU7d2FpdDtleGl0KCQ/JjEyNz8xMjgrKCQ/JjEyNyk6MSskPz4+OCk=; + _EXIT_status=$?; mkdir -p ./.; rsync --protocol 30 --rsync-path=cd\ + ./.parallel/tmp/aspire-1928520-1/./.\;\ rsync -rlDzR -essh + server:./abc-file.out ./.;ssh server -- \(rm\ -f\ + ./.parallel/tmp/aspire-1928520-1/abc-file\;\ sh\ -c\ \'rmdir\ + ./.parallel/tmp/aspire-1928520-1/\ ./.parallel/tmp/\ ./.parallel/\ + 2\>/dev/null\'\;rm\ -rf\ ./.parallel/tmp/aspire-1928520-1\;\);ssh + server -- \(rm\ -f\ ./.parallel/tmp/aspire-1928520-1/abc-file.out\;\ + sh\ -c\ \'rmdir\ ./.parallel/tmp/aspire-1928520-1/\ ./.parallel/tmp/\ + ./.parallel/\ 2\>/dev/null\'\;rm\ -rf\ + ./.parallel/tmp/aspire-1928520-1\;\);ssh server -- rm -rf + .parallel/tmp/aspire-1928520-1; exit $_EXIT_status; + +=head1 Saving output to shell variables (advanced) + +GNU B<parset> will set shell variables to the output of GNU +B<parallel>. GNU B<parset> has one important limitation: It cannot be +part of a pipe. In particular this means it cannot read anything from +standard input (stdin) or pipe output to another program. + +To use GNU B<parset> prepend command with destination variables: + + parset myvar1,myvar2 echo ::: a b + echo $myvar1 + echo $myvar2 + +Output: + + a + b + +If you only give a single variable, it will be treated as an array: + + parset myarray seq {} 5 ::: 1 2 3 + echo "${myarray[1]}" + +Output: + + 2 + 3 + 4 + 5 + +The commands to run can be an array: + + cmd=("echo '<<joe \"double space\" cartoon>>'" "pwd") + parset data ::: "${cmd[@]}" + echo "${data[0]}" + echo "${data[1]}" + +Output: + + <<joe "double space" cartoon>> + [current dir] + + +=head1 Saving to an SQL base (advanced) + +GNU B<parallel> can save into an SQL base. Point GNU B<parallel> to a +table and it will put the joblog there together with the variables and +the output each in their own column. 
+ +=head2 CSV as SQL base + +The simplest is to use a CSV file as the storage table: + + parallel --sqlandworker csv:///%2Ftmp/log.csv \ + seq ::: 10 ::: 12 13 14 + cat /tmp/log.csv + +Note how '/' in the path must be written as %2F. + +Output will be similar to: + + Seq,Host,Starttime,JobRuntime,Send,Receive,Exitval,_Signal, + Command,V1,V2,Stdout,Stderr + 1,:,1458254498.254,0.069,0,9,0,0,"seq 10 12",10,12,"10 + 11 + 12 + ", + 2,:,1458254498.278,0.080,0,12,0,0,"seq 10 13",10,13,"10 + 11 + 12 + 13 + ", + 3,:,1458254498.301,0.083,0,15,0,0,"seq 10 14",10,14,"10 + 11 + 12 + 13 + 14 + ", + +A proper CSV reader (like LibreOffice or R's read.csv) will read this +format correctly - even with fields containing newlines as above. + +If the output is big you may want to put it into files using B<--results>: + + parallel --results outdir --sqlandworker csv:///%2Ftmp/log2.csv \ + seq ::: 10 ::: 12 13 14 + cat /tmp/log2.csv + +Output will be similar to: + + Seq,Host,Starttime,JobRuntime,Send,Receive,Exitval,_Signal, + Command,V1,V2,Stdout,Stderr + 1,:,1458824738.287,0.029,0,9,0,0, + "seq 10 12",10,12,outdir/1/10/2/12/stdout,outdir/1/10/2/12/stderr + 2,:,1458824738.298,0.025,0,12,0,0, + "seq 10 13",10,13,outdir/1/10/2/13/stdout,outdir/1/10/2/13/stderr + 3,:,1458824738.309,0.026,0,15,0,0, + "seq 10 14",10,14,outdir/1/10/2/14/stdout,outdir/1/10/2/14/stderr + + +=head2 DBURL as table + +The CSV file is an example of a DBURL. + +GNU B<parallel> uses a DBURL to address the table. A DBURL has this format: + + vendor://[[user][:password]@][host][:port]/[database[/table] + +Example: + + mysql://scott:tiger@my.example.com/mydatabase/mytable + postgresql://scott:tiger@pg.example.com/mydatabase/mytable + sqlite3:///%2Ftmp%2Fmydatabase/mytable + csv:///%2Ftmp/log.csv + +To refer to B</tmp/mydatabase> with B<sqlite> or B<csv> you need to +encode the B</> as B<%2F>. + +Run a job using B<sqlite> on B<mytable> in B</tmp/mydatabase>: + + DBURL=sqlite3:///%2Ftmp%2Fmydatabase + DBURLTABLE=$DBURL/mytable + parallel --sqlandworker $DBURLTABLE echo ::: foo bar ::: baz quuz + +To see the result: + + sql $DBURL 'SELECT * FROM mytable ORDER BY Seq;' + +Output will be similar to: + + Seq|Host|Starttime|JobRuntime|Send|Receive|Exitval|_Signal| + Command|V1|V2|Stdout|Stderr + 1|:|1451619638.903|0.806||8|0|0|echo foo baz|foo|baz|foo baz + | + 2|:|1451619639.265|1.54||9|0|0|echo foo quuz|foo|quuz|foo quuz + | + 3|:|1451619640.378|1.43||8|0|0|echo bar baz|bar|baz|bar baz + | + 4|:|1451619641.473|0.958||9|0|0|echo bar quuz|bar|quuz|bar quuz + | + +The first columns are well known from B<--joblog>. B<V1> and B<V2> are +data from the input sources. B<Stdout> and B<Stderr> are standard +output and standard error, respectively. + +=head2 Using multiple workers + +Using an SQL base as storage costs overhead in the order of 1 second +per job. + +One of the situations where it makes sense is if you have multiple +workers. + +You can then have a single master machine that submits jobs to the SQL +base (but does not do any of the work): + + parallel --sqlmaster $DBURLTABLE echo ::: foo bar ::: baz quuz + +On the worker machines you run exactly the same command except you +replace B<--sqlmaster> with B<--sqlworker>. + + parallel --sqlworker $DBURLTABLE echo ::: foo bar ::: baz quuz + +To run a master and a worker on the same machine use B<--sqlandworker> +as shown earlier. 
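While the workers are running, progress can be followed by querying
the same table. A sketch using the B<sql> wrapper and the column names
shown above:

  sql $DBURL 'SELECT Seq, Exitval, Command FROM mytable ORDER BY Seq;'

Output: One line per job in the table, similar to the full SELECT
shown earlier but limited to three columns.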
+
+
+=head1 --pipe
+
+The B<--pipe> functionality puts GNU B<parallel> in a different mode:
+Instead of treating the data on stdin (standard input) as arguments
+for a command to run, the data will be sent to stdin (standard input)
+of the command.
+
+The typical situation is:
+
+ command_A | command_B | command_C
+
+where command_B is slow, and you want to speed up command_B.
+
+=head2 Chunk size
+
+By default GNU B<parallel> will start an instance of command_B, read a
+chunk of 1 MB, and pass that to the instance. It then starts another
+instance, reads another chunk, and passes that to the second instance.
+
+ cat num1000000 | parallel --pipe wc
+
+Output (the order may be different):
+
+ 165668 165668 1048571
+ 149797 149797 1048579
+ 149796 149796 1048572
+ 149797 149797 1048579
+ 149797 149797 1048579
+ 149796 149796 1048572
+ 85349 85349 597444
+
+The size of the chunk is not exactly 1 MB because GNU B<parallel> only
+passes full lines - never half a line - so the block size is only
+1 MB on average. You can change the block size to 2 MB with B<--block>:
+
+ cat num1000000 | parallel --pipe --block 2M wc
+
+Output (the order may be different):
+
+ 315465 315465 2097150
+ 299593 299593 2097151
+ 299593 299593 2097151
+ 85349 85349 597444
+
+GNU B<parallel> treats each line as a record. If the order of records
+is unimportant (e.g. you need all lines processed, but you do not care
+which is processed first), then you can use B<--roundrobin>. Without
+B<--roundrobin> GNU B<parallel> will start a command per block; with
+B<--roundrobin> only the requested number of jobs will be started
+(B<--jobs>). The records will then be distributed between the running
+jobs:
+
+ cat num1000000 | parallel --pipe -j4 --roundrobin wc
+
+Output will be similar to:
+
+ 149797 149797 1048579
+ 299593 299593 2097151
+ 315465 315465 2097150
+ 235145 235145 1646016
+
+One of the 4 instances got a single block, 2 instances got 2 full
+blocks each, and one instance got 1 full and 1 partial block.
+
+=head2 Records
+
+GNU B<parallel> sees the input as records. The default record is a single
+line.
+
+Using B<-N140000> GNU B<parallel> will read 140000 records at a time:
+
+ cat num1000000 | parallel --pipe -N140000 wc
+
+Output (the order may be different):
+
+ 140000 140000 868895
+ 140000 140000 980000
+ 140000 140000 980000
+ 140000 140000 980000
+ 140000 140000 980000
+ 140000 140000 980000
+ 140000 140000 980000
+ 20000 20000 140001
+
+Note that the last job could not get the full 140000 lines, but
+only 20000 lines.
+
+If a record is 75 lines, B<-L> can be used:
+
+ cat num1000000 | parallel --pipe -L75 wc
+
+Output (the order may be different):
+
+ 165600 165600 1048095
+ 149850 149850 1048950
+ 149775 149775 1048425
+ 149775 149775 1048425
+ 149850 149850 1048950
+ 149775 149775 1048425
+ 85350 85350 597450
+ 25 25 176
+
+Note how GNU B<parallel> still reads a block of around 1 MB; but
+instead of passing full lines to B<wc> it passes 75 full lines at a
+time. This of course does not hold for the last job (which in this
+case got 25 lines).
+
+=head2 Fixed length records
+
+Fixed length records can be processed by setting B<--recend ''> and
+B<--block I<recordsize>>. A header of size I<n> can be processed with
+B<--header .{I<n>}>.
+
+Here is how to process a file with a 4-byte header and a 3-byte record
+size:
+
+ cat fixedlen | parallel --pipe --header .{4} --block 3 --recend '' \
+ 'echo start; cat; echo'
+
+Output:
+
+ start
+ HHHHAAA
+ start
+ HHHHCCC
+ start
+ HHHHBBB
+
+It may be more efficient to increase B<--block> to a multiple of the
+record size.
+
+=head2 Record separators
+
+GNU B<parallel> uses separators to determine where two records split.
+
+B<--recstart> gives the string that starts a record; B<--recend> gives the
+string that ends a record. The default is B<--recend '\n'> (newline).
+
+If both B<--recend> and B<--recstart> are given, then the record will only
+split if the recend string is immediately followed by the recstart
+string.
+
+Here the B<--recend> is set to B<', '>:
+
+ echo /foo, bar/, /baz, qux/, | \
+ parallel -kN1 --recend ', ' --pipe echo JOB{#}\;cat\;echo END
+
+Output:
+
+ JOB1
+ /foo, END
+ JOB2
+ bar/, END
+ JOB3
+ /baz, END
+ JOB4
+ qux/,
+ END
+
+Here the B<--recstart> is set to B</>:
+
+ echo /foo, bar/, /baz, qux/, | \
+ parallel -kN1 --recstart / --pipe echo JOB{#}\;cat\;echo END
+
+Output:
+
+ JOB1
+ /foo, barEND
+ JOB2
+ /, END
+ JOB3
+ /baz, quxEND
+ JOB4
+ /,
+ END
+
+Here both B<--recend> and B<--recstart> are set:
+
+ echo /foo, bar/, /baz, qux/, | \
+ parallel -kN1 --recend ', ' --recstart / --pipe \
+ echo JOB{#}\;cat\;echo END
+
+Output:
+
+ JOB1
+ /foo, bar/, END
+ JOB2
+ /baz, qux/,
+ END
+
+Note the difference between setting one string and setting both strings.
+
+With B<--regexp> the B<--recend> and B<--recstart> will be treated as
+regular expressions:
+
+ echo foo,bar,_baz,__qux, | \
+ parallel -kN1 --regexp --recend ,_+ --pipe \
+ echo JOB{#}\;cat\;echo END
+
+Output:
+
+ JOB1
+ foo,bar,_END
+ JOB2
+ baz,__END
+ JOB3
+ qux,
+ END
+
+GNU B<parallel> can remove the record separators with
+B<--remove-rec-sep>/B<--rrs>:
+
+ echo foo,bar,_baz,__qux, | \
+ parallel -kN1 --rrs --regexp --recend ,_+ --pipe \
+ echo JOB{#}\;cat\;echo END
+
+Output:
+
+ JOB1
+ foo,barEND
+ JOB2
+ bazEND
+ JOB3
+ qux,
+ END
+
+=head2 Header
+
+If the input data has a header, the header can be repeated for each
+job by matching the header with B<--header>. If headers start with
+B<%> you can do this:
+
+ cat num_%header | \
+ parallel --header '(%.*\n)*' --pipe -N3 echo JOB{#}\;cat
+
+Output (the order may be different):
+
+ JOB1
+ %head1
+ %head2
+ 1
+ 2
+ 3
+ JOB2
+ %head1
+ %head2
+ 4
+ 5
+ 6
+ JOB3
+ %head1
+ %head2
+ 7
+ 8
+ 9
+ JOB4
+ %head1
+ %head2
+ 10
+
+If the header is 2 lines, B<--header> 2 will work:
+
+ cat num_%header | parallel --header 2 --pipe -N3 echo JOB{#}\;cat
+
+Output: Same as above.
+
+=head2 --pipepart
+
+B<--pipe> is not very efficient. It maxes out at around 500
+MB/s. B<--pipepart> can easily deliver 5 GB/s. But there are a few
+limitations. The input has to be a normal file (not a pipe) given by
+B<-a> or B<::::>, and B<-L>/B<-l>/B<-N> do not work. B<--recend> and
+B<--recstart>, however, I<do> work, and records can often be split on
+that alone.
+
+ parallel --pipepart -a num1000000 --block 3m wc
+
+Output (the order may be different):
+
+ 444443 444444 3000002
+ 428572 428572 3000004
+ 126985 126984 888890
+
+
+=head1 Shebang
+
+=head2 Input data and parallel command in the same file
+
+GNU B<parallel> is often called as this:
+
+ cat input_file | parallel command
+
+With B<--shebang> the I<input_file> and B<parallel> can be combined into
+the same script.
+ +UNIX shell scripts start with a shebang line like this: + + #!/bin/bash + +GNU B<parallel> can do that, too. With B<--shebang> the arguments can be +listed in the file. The B<parallel> command is the first line of the +script: + + #!/usr/bin/parallel --shebang -r echo + + foo + bar + baz + +Output (the order may be different): + + foo + bar + baz + +=head2 Parallelizing existing scripts + +GNU B<parallel> is often called as this: + + cat input_file | parallel command + parallel command ::: foo bar + +If B<command> is a script, B<parallel> can be combined into a single +file so this will run the script in parallel: + + cat input_file | command + command foo bar + +This B<perl> script B<perl_echo> works like B<echo>: + + #!/usr/bin/perl + + print "@ARGV\n" + +It can be called as this: + + parallel perl_echo ::: foo bar + +By changing the B<#!>-line it can be run in parallel: + + #!/usr/bin/parallel --shebang-wrap /usr/bin/perl + + print "@ARGV\n" + +Thus this will work: + + perl_echo foo bar + +Output (the order may be different): + + foo + bar + +This technique can be used for: + +=over 9 + +=item Perl: + + #!/usr/bin/parallel --shebang-wrap /usr/bin/perl + + print "Arguments @ARGV\n"; + + +=item Python: + + #!/usr/bin/parallel --shebang-wrap /usr/bin/python + + import sys + print 'Arguments', str(sys.argv) + + +=item Bash/sh/zsh/Korn shell: + + #!/usr/bin/parallel --shebang-wrap /bin/bash + + echo Arguments "$@" + + +=item csh: + + #!/usr/bin/parallel --shebang-wrap /bin/csh + + echo Arguments "$argv" + + +=item Tcl: + + #!/usr/bin/parallel --shebang-wrap /usr/bin/tclsh + + puts "Arguments $argv" + + +=item R: + + #!/usr/bin/parallel --shebang-wrap /usr/bin/Rscript --vanilla --slave + + args <- commandArgs(trailingOnly = TRUE) + print(paste("Arguments ",args)) + + +=item GNUplot: + + #!/usr/bin/parallel --shebang-wrap ARG={} /usr/bin/gnuplot + + print "Arguments ", system('echo $ARG') + + +=item Ruby: + + #!/usr/bin/parallel --shebang-wrap /usr/bin/ruby + + print "Arguments " + puts ARGV + + +=item Octave: + + #!/usr/bin/parallel --shebang-wrap /usr/bin/octave + + printf ("Arguments"); + arg_list = argv (); + for i = 1:nargin + printf (" %s", arg_list{i}); + endfor + printf ("\n"); + +=item Common LISP: + + #!/usr/bin/parallel --shebang-wrap /usr/bin/clisp + + (format t "~&~S~&" 'Arguments) + (format t "~&~S~&" *args*) + +=item PHP: + + #!/usr/bin/parallel --shebang-wrap /usr/bin/php + <?php + echo "Arguments"; + foreach(array_slice($argv,1) as $v) + { + echo " $v"; + } + echo "\n"; + ?> + +=item Node.js: + + #!/usr/bin/parallel --shebang-wrap /usr/bin/node + + var myArgs = process.argv.slice(2); + console.log('Arguments ', myArgs); + +=item LUA: + + #!/usr/bin/parallel --shebang-wrap /usr/bin/lua + + io.write "Arguments" + for a = 1, #arg do + io.write(" ") + io.write(arg[a]) + end + print("") + +=item C#: + + #!/usr/bin/parallel --shebang-wrap ARGV={} /usr/bin/csharp + + var argv = Environment.GetEnvironmentVariable("ARGV"); + print("Arguments "+argv); + +=back + +=head1 Semaphore + +GNU B<parallel> can work as a counting semaphore. This is slower and less +efficient than its normal mode. + +A counting semaphore is like a row of toilets. People needing a toilet +can use any toilet, but if there are more people than toilets, they +will have to wait for one of the toilets to become available. + +An alias for B<parallel --semaphore> is B<sem>. + +B<sem> will follow a person to the toilets, wait until a toilet is +available, leave the person in the toilet and exit. 
+
+B<sem --fg> will follow a person to the toilets, wait until a toilet is
+available, stay with the person in the toilet and exit when the person
+exits.
+
+B<sem --wait> will wait for all persons to leave the toilets.
+
+B<sem> does not have a queue discipline, so the next person is chosen
+randomly.
+
+B<-j> sets the number of toilets.
+
+=head2 Mutex
+
+The default is to have only one toilet (this is called a mutex). The
+program is started in the background and B<sem> exits immediately. Use
+B<--wait> to wait for all B<sem>s to finish:
+
+ sem 'sleep 1; echo The first finished' &&
+ echo The first is now running in the background &&
+ sem 'sleep 1; echo The second finished' &&
+ echo The second is now running in the background
+ sem --wait
+
+Output:
+
+ The first is now running in the background
+ The first finished
+ The second is now running in the background
+ The second finished
+
+The command can be run in the foreground with B<--fg>, which will only
+exit when the command completes:
+
+ sem --fg 'sleep 1; echo The first finished' &&
+ echo The first finished running in the foreground &&
+ sem --fg 'sleep 1; echo The second finished' &&
+ echo The second finished running in the foreground
+ sem --wait
+
+The difference between this and just running the command is that a
+mutex is set, so if other B<sem>s were running in the background only one
+would run at a time.
+
+To control which semaphore is used, use
+B<--semaphorename>/B<--id>. Run this in one terminal:
+
+ sem --id my_id -u 'echo First started; sleep 10; echo First done'
+
+and simultaneously this in another terminal:
+
+ sem --id my_id -u 'echo Second started; sleep 10; echo Second done'
+
+Note how the second will only be started when the first has finished.
+
+=head2 Counting semaphore
+
+A mutex is like having a single toilet: When it is in use everyone
+else will have to wait. A counting semaphore is like having multiple
+toilets: Several people can use the toilets, but when they all are in
+use, everyone else will have to wait.
+
+B<sem> can emulate a counting semaphore. Use B<--jobs> to set the
+number of toilets like this:
+
+ sem --jobs 3 --id my_id -u 'echo Start 1; sleep 5; echo 1 done' &&
+ sem --jobs 3 --id my_id -u 'echo Start 2; sleep 6; echo 2 done' &&
+ sem --jobs 3 --id my_id -u 'echo Start 3; sleep 7; echo 3 done' &&
+ sem --jobs 3 --id my_id -u 'echo Start 4; sleep 8; echo 4 done' &&
+ sem --wait --id my_id
+
+Output:
+
+ Start 1
+ Start 2
+ Start 3
+ 1 done
+ Start 4
+ 2 done
+ 3 done
+ 4 done
+
+=head2 Timeout
+
+With B<--semaphoretimeout> you can force the command to run anyway after
+a period (positive number) or give up (negative number):
+
+ sem --id foo -u 'echo Slow started; sleep 5; echo Slow ended' &&
+ sem --id foo --semaphoretimeout 1 'echo Forced running after 1 sec' &&
+ sem --id foo --semaphoretimeout -2 'echo Give up after 2 secs'
+ sem --id foo --wait
+
+Output:
+
+ Slow started
+ parallel: Warning: Semaphore timed out. Stealing the semaphore.
+ Forced running after 1 sec
+ parallel: Warning: Semaphore timed out. Exiting.
+ Slow ended
+
+Note how the 'Give up' was not run.
+
+=head1 Informational
+
+GNU B<parallel> has some options to give short information about the
+configuration.
+
+B<--help> will print a summary of the most important options:
+
+ parallel --help
+
+Output:
+
+ Usage:
+
+ parallel [options] [command [arguments]] < list_of_arguments
+ parallel [options] [command [arguments]] (::: arguments|:::: argfile(s))...
+ cat ... | parallel --pipe [options] [command [arguments]]
+
+ -j n Run n jobs in parallel
+ -k Keep same order
+ -X Multiple arguments with context replace
+ --colsep regexp Split input on regexp for positional replacements
+ {} {.} {/} {/.} {#} {%} {= perl code =} Replacement strings
+ {3} {3.} {3/} {3/.} {=3 perl code =} Positional replacement strings
+ With --plus: {} = {+/}/{/} = {.}.{+.} = {+/}/{/.}.{+.} = {..}.{+..} =
+ {+/}/{/..}.{+..} = {...}.{+...} = {+/}/{/...}.{+...}
+
+ -S sshlogin Example: foo@server.example.com
+ --slf .. Use ~/.parallel/sshloginfile as the list of sshlogins
+ --trc {}.bar Shorthand for --transfer --return {}.bar --cleanup
+ --onall Run the given command with argument on all sshlogins
+ --nonall Run the given command with no arguments on all sshlogins
+
+ --pipe Split stdin (standard input) to multiple jobs.
+ --recend str Record end separator for --pipe.
+ --recstart str Record start separator for --pipe.
+
+ See 'man parallel' for details
+
+ Academic tradition requires you to cite works you base your article on.
+ When using programs that use GNU Parallel to process data for publication
+ please cite:
+
+ O. Tange (2011): GNU Parallel - The Command-Line Power Tool,
+ ;login: The USENIX Magazine, February 2011:42-47.
+
+ This helps funding further development; AND IT WON'T COST YOU A CENT.
+ If you pay 10000 EUR you should feel free to use GNU Parallel without citing.
+
+When asking for help, always report the full output of this:
+
+ parallel --version
+
+Output:
+
+ GNU parallel 20230122
+ Copyright (C) 2007-2024 Ole Tange, http://ole.tange.dk and Free Software
+ Foundation, Inc.
+ License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>
+ This is free software: you are free to change and redistribute it.
+ GNU parallel comes with no warranty.
+
+ Web site: https://www.gnu.org/software/parallel
+
+ When using programs that use GNU Parallel to process data for publication
+ please cite as described in 'parallel --citation'.
+
+In scripts, B<--minversion> can be used to ensure the user has at least
+this version:
+
+ parallel --minversion 20130722 && \
+ echo Your version is at least 20130722.
+
+Output:
+
+ 20160322
+ Your version is at least 20130722.
+
+If you are using GNU B<parallel> for research, the BibTeX citation can be
+generated using B<--citation>:
+
+ parallel --citation
+
+Output:
+
+ Academic tradition requires you to cite works you base your article on.
+ When using programs that use GNU Parallel to process data for publication
+ please cite:
+
+ @article{Tange2011a,
+ title = {GNU Parallel - The Command-Line Power Tool},
+ author = {O. Tange},
+ address = {Frederiksberg, Denmark},
+ journal = {;login: The USENIX Magazine},
+ month = {Feb},
+ number = {1},
+ volume = {36},
+ url = {https://www.gnu.org/s/parallel},
+ year = {2011},
+ pages = {42-47},
+ doi = {10.5281/zenodo.16303}
+ }
+
+ (Feel free to use \nocite{Tange2011a})
+
+ This helps funding further development; AND IT WON'T COST YOU A CENT.
+ If you pay 10000 EUR you should feel free to use GNU Parallel without citing.
+
+ If you send a copy of your published article to tange@gnu.org, it will be
+ mentioned in the release notes of the next version of GNU Parallel.
+
+With B<--max-line-length-allowed> GNU B<parallel> will report the maximal
+size of the command line:
+
+ parallel --max-line-length-allowed
+
+Output (may vary on different systems):
+
+ 131071
+
+B<--number-of-cpus> and B<--number-of-cores> run system-specific code to
+determine the number of CPUs and CPU cores on the system. On
+unsupported platforms they will return 1:
+
+ parallel --number-of-cpus
+ parallel --number-of-cores
+
+Output (may vary on different systems):
+
+ 4
+ 64
+
+=head1 Profiles
+
+The defaults for GNU B<parallel> can be changed systemwide by putting the
+command line options in B</etc/parallel/config>. They can be changed for
+a user by putting them in B<~/.parallel/config>.
+
+Profiles work the same way, but have to be referred to with B<--profile>:
+
+ echo '--nice 17' > ~/.parallel/nicetimeout
+ echo '--timeout 300%' >> ~/.parallel/nicetimeout
+ parallel --profile nicetimeout echo ::: A B C
+
+Output:
+
+ A
+ B
+ C
+
+Profiles can be combined:
+
+ echo '-vv --dry-run' > ~/.parallel/dryverbose
+ parallel --profile dryverbose --profile nicetimeout echo ::: A B C
+
+Output:
+
+ echo A
+ echo B
+ echo C
+
+
+=head1 Spread the word
+
+I hope you have learned something from this tutorial.
+
+If you like GNU B<parallel>:
+
+=over 2
+
+=item *
+
+(Re-)walk through the tutorial if you have not done so in the past year
+(https://www.gnu.org/software/parallel/parallel_tutorial.html)
+
+=item *
+
+Give a demo at your local user group/your team/your colleagues
+
+=item *
+
+Post the intro videos and the tutorial on Reddit, Mastodon, Diaspora*,
+forums, blogs, Identi.ca, Google+, Twitter, Facebook, Linkedin, and
+mailing lists
+
+=item *
+
+Request or write a review for your favourite blog or magazine
+(especially if you do something cool with GNU B<parallel>)
+
+=item *
+
+Invite me for your next conference
+
+=back
+
+If you use GNU B<parallel> for research:
+
+=over 2
+
+=item *
+
+Please cite GNU B<parallel> in your publications (use B<--citation>)
+
+=back
+
+If GNU B<parallel> saves you money:
+
+=over 2
+
+=item *
+
+(Have your company) donate to FSF or become a member
+https://my.fsf.org/donate/
+
+=back
+
+(C) 2013-2024 Ole Tange, GFDLv1.3+ (See
+LICENSES/GFDL-1.3-or-later.txt)
+
+
+=cut