#!/usr/bin/perl -w

# SPDX-FileCopyrightText: 2021-2024 Ole Tange, http://ole.tange.dk and Free Software and Foundation, Inc.
# SPDX-License-Identifier: GFDL-1.3-or-later
# SPDX-License-Identifier: CC-BY-SA-4.0

=encoding utf8


=head1 Design of GNU Parallel

This document describes design decisions made in the development of
GNU B<parallel> and the reasoning behind them. It will give an
overview of why some of the code looks the way it does, and will help
new maintainers understand the code better.


=head2 One file program

GNU B<parallel> is a Perl script in a single file. It is object
oriented, but unlike most Perl programs it does not put each class in
its own file. This is for the sake of user experience: The goal is
that in a pinch the user will be able to get GNU B<parallel> working
simply by copying a single file, with no need to mess around with
environment variables like PERL5LIB.
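
As a minimal sketch of what this means in practice: several classes
can share one Perl file simply by switching B<package> declarations,
so nothing has to be located through PERL5LIB. The class names below
are hypothetical and only illustrate the layout.

```perl

  use strict;
  use warnings;

  # Hypothetical classes, named only for this illustration.
  package Demo::Task;
  sub new     { my ($class, %args) = @_; return bless {%args}, $class; }
  sub command { my ($self) = @_; return $self->{command}; }

  package Demo::Queue;
  sub new  { my ($class) = @_; return bless { tasks => [] }, $class; }
  sub add  { my ($self, $task) = @_; push @{ $self->{tasks} }, $task; return $self; }
  sub size { my ($self) = @_; return scalar @{ $self->{tasks} }; }

  # Back to the main program, still in the same file.
  package main;
  my $queue = Demo::Queue->new;
  $queue->add(Demo::Task->new(command => "echo 1"))
        ->add(Demo::Task->new(command => "echo 2"));
  print $queue->size, " tasks queued\n";

```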


=head2 Choice of programming language

GNU B<parallel> is designed to be able to run on old systems. That
means that it cannot depend on a compiler being installed, and
especially not on a compiler for a language that is less than 20
years old.

The goal is that you can use GNU B<parallel> on any system, even if
you are not allowed to install additional software.

Of all the systems I have experienced, I have yet to see a system that
had GCC installed that did not have Perl. The same goes for Rust, Go,
Haskell, and other younger languages. I have, however, seen systems
with Perl without any of the mentioned compilers.

Most modern systems also have either Python2 or Python3 installed, but
you still cannot be certain which version, and since Python2 code
cannot run under Python3, Python is not an option.

Perl has the added benefit that implementing the {= perlexpr =}
replacement string was fairly easy.

The primary drawback is that Perl is slow. So there is an overhead of
3-10 ms/job and 1 ms/MB output (and even more if you use B<--tag>).


=head2 Old Perl style

GNU B<parallel> uses some old, deprecated constructs. This is due to a
goal of being able to run on old installations. Currently the target
is CentOS 3.9 and Perl 5.8.0.


=head2 Scalability up and down

The smallest system GNU B<parallel> is tested on is a 32 MB ASUS
WL500gP. The largest is a 2 TB 128-core machine. It scales up to
around 100 machines - depending on the duration of each job.


=head2 Exponentially back off

GNU B<parallel> busy waits. This is because a job may be held back by
the load average (when using B<--load>), and then it does not make
sense to just wait for a running job to finish. Instead the load
average must be rechecked regularly. Load average is not the only
reason: B<--timeout> has a similar problem.

To avoid burning too much CPU, GNU B<parallel> sleeps exponentially
longer and longer if nothing happens, maxing out at 1 second.
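
A minimal sketch of such a polling loop. The start value (1 ms), the
growth factor (2) and the helper names B<next_sleep> and
B<wait_until> are assumptions for illustration; the remote system
wrapper further down uses its own growth formula.

```perl

  use strict;
  use warnings;

  # Next interval: grow geometrically, but never beyond $max seconds.
  sub next_sleep {
      my ($current, $factor, $max) = @_;
      my $next = $current * $factor;
      return $next > $max ? $max : $next;
  }

  # Poll $ready until it returns true, sleeping longer each time
  # nothing has happened. Returns the interval it would sleep next.
  sub wait_until {
      my ($ready) = @_;
      my $sleep = 0.001;                          # start at 1 ms
      until ($ready->()) {
          select(undef, undef, undef, $sleep);    # sub-second sleep
          $sleep = next_sleep($sleep, 2, 1);      # cap at 1 second
      }
      return $sleep;
  }

```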


=head2 Shell compatibility

It is a goal to have GNU B<parallel> work equally well in any
shell. However, in practice GNU B<parallel> is being developed in
B<bash> and thus testing in other shells is limited to reported bugs.

When an incompatibility is found there is often not an easy fix:
Fixing the problem in B<csh> often breaks it in B<bash>. In these
cases the fix is often to use a small Perl script and call that.


=head2 env_parallel

B<env_parallel> is a dummy shell script that only runs if
B<env_parallel> has not been activated as an alias or a function. It
tells the user how to activate the alias/function for the supported
shells.

The alias or function will copy the current environment and run the
command with GNU B<parallel> in the copy of the environment.

The problem is that Perl cannot access all of the current environment,
e.g. aliases, functions and unexported shell variables.

The idea is therefore to take the environment and put it in
B<$PARALLEL_ENV> which GNU B<parallel> prepends to every command.

The only way to have access to the environment is directly from the
shell, so the program must be a shell script that is sourced, and it
therefore has to deal with the dialect of the relevant shell.


=head3 env_parallel.*

These are the files that implement the alias or function
B<env_parallel> for a given shell. It could be argued that these
should be put in some obscure place under /usr/lib, but by putting
them in your path it becomes trivial to find the path to them and
B<source> them:

  source `which env_parallel.foo`

The beauty is that they can be put anywhere in the path without the
user having to know the location. So if the user's path includes
/afs/bin/i386_fc5 or /usr/pkg/parallel/bin or
/usr/local/parallel/20161222/sunos5.6/bin the files can be put in the
directory that makes the most sense for the sysadmin.


=head3 env_parallel.bash / env_parallel.sh / env_parallel.ash /
env_parallel.dash / env_parallel.zsh / env_parallel.ksh /
env_parallel.mksh

B<env_parallel.(bash|sh|ash|dash|ksh|mksh|zsh)> defines the function
B<env_parallel>. It uses B<alias> and B<typeset> to dump the
configuration (with a few exceptions) into B<$PARALLEL_ENV> before
running GNU B<parallel>.

After GNU B<parallel> is finished, B<$PARALLEL_ENV> is deleted.


=head3 env_parallel.csh

B<env_parallel.csh> has two purposes: If B<env_parallel> is not an
alias, it makes it into an alias that sets B<$PARALLEL> to the
arguments and calls B<env_parallel.csh>.

If B<env_parallel> is an alias, then B<env_parallel.csh> uses
B<$PARALLEL> as the arguments for GNU B<parallel>.

It exports the environment by writing a variable definition to a file
for each variable.  The definitions of aliases are appended to this
file. Finally the file is put into B<$PARALLEL_ENV>.

GNU B<parallel> is then run and B<$PARALLEL_ENV> is deleted.


=head3 env_parallel.fish

First all function definitions are generated using a loop and
B<functions>.

Dumping the scalar variable definitions is harder.

B<fish> can represent non-printable characters in (at least) 2
ways. To avoid problems all scalars are converted to \XX quoting.

Then commands to generate the definitions are made and separated by
NUL.

This is then piped into a Perl script that quotes all values. List
elements will be appended using two spaces.

Finally \n is converted into \1 because B<fish> variables cannot
contain \n. GNU B<parallel> will later convert all \1 from
B<$PARALLEL_ENV> into \n.

This is then all saved in B<$PARALLEL_ENV>.

GNU B<parallel> is called, and B<$PARALLEL_ENV> is deleted.


=head2 parset (supported in sh, ash, dash, bash, zsh, ksh, mksh)

B<parset> is a shell function. This is the reason why B<parset> can
set variables: It runs in the shell which is calling it.

It is also the reason why B<parset> does not work when data is piped
into it: B<... | parset ...> makes B<parset> start in a subshell, and
changes to the environment therefore cannot make it back to the
calling shell.


=head2 Job slots

The easiest way to explain what GNU B<parallel> does is to assume that
there are a number of job slots, and when a slot becomes available a
job from the queue will be run in that slot. But originally GNU
B<parallel> did not model job slots in the code. Job slots have been
added to make it possible to use B<{%}> as a replacement string.

While the job sequence number can be computed in advance, the job slot
can only be computed the moment a slot becomes available. So it has
been implemented as a stack with lazy evaluation: Draw one from an
empty stack and the stack is extended by one. When a job is done, push
the available job slot back on the stack.
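
The lazily extended stack can be sketched like this. B<SlotStack> is
a hypothetical name, not a class from GNU B<parallel>:

```perl

  use strict;
  use warnings;

  package SlotStack;    # hypothetical name

  sub new { my ($class) = @_; return bless { free => [], count => 0 }, $class; }

  # Take a slot: reuse a released one if available,
  # otherwise extend the stack by one new slot number.
  sub take {
      my ($self) = @_;
      return pop @{ $self->{free} } if @{ $self->{free} };
      return ++$self->{count};
  }

  # A job finished: push its slot back so the next job can reuse it.
  sub release {
      my ($self, $slot) = @_;
      push @{ $self->{free} }, $slot;
      return;
  }

  package main;

```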

This implementation also means that if you re-run the same jobs, you
cannot assume jobs will get the same slots. And if you use remote
executions, you cannot assume that a given job slot will remain on the
same remote server. This goes double since the number of job slots can
be adjusted on the fly (by giving B<--jobs> a file name).


=head2 Rsync protocol version

B<rsync> 3.1.x uses protocol 31 which is unsupported by version
2.5.7. That means that you cannot push a file to a remote system using
B<rsync> protocol 31, if the remote system uses 2.5.7. B<rsync> does
not automatically downgrade to protocol 30.

GNU B<parallel> does not require protocol 31, so if the B<rsync>
version is >= 3.1.0 then B<--protocol 30> is added to force newer
B<rsync>s to talk to version 2.5.7.
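
A sketch of the version check, assuming the first line of
B<rsync --version> looks like B<rsync  version 3.1.3  protocol version 31>;
the helper name is hypothetical:

```perl

  use strict;
  use warnings;

  # Hypothetical helper: given the first line of `rsync --version`,
  # return the extra options to pass to rsync.
  sub rsync_protocol_opts {
      my ($version_line) = @_;
      my ($major, $minor) = $version_line =~ /version\s+(\d+)\.(\d+)/
          or return ();                      # unknown format: add nothing
      if ($major > 3 or ($major == 3 and $minor >= 1)) {
          return ("--protocol", "30");       # >= 3.1.0 defaults to protocol 31
      }
      return ();
  }

```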


=head2 Compression

GNU B<parallel> buffers output in temporary files.  B<--compress>
compresses the buffered data.  This is a bit tricky because there
should be no files to clean up if GNU B<parallel> is killed by a power
outage.

GNU B<parallel> first selects a compression program. If the user has
not selected one, the first of these that is in $PATH is used: B<pzstd
lbzip2 pbzip2 zstd pixz lz4 pigz lzop plzip lzip gzip lrz pxz bzip2
lzma xz clzip>. They are sorted by speed on a 128 core machine.
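
A sketch of the selection step: walk the list in order of preference
and pick the first program found as an executable file in $PATH.
B<first_in_path> is a hypothetical helper, not the code GNU
B<parallel> uses:

```perl

  use strict;
  use warnings;
  use File::Spec;

  # Preference order from the text above (fastest first).
  my @compressors = qw(pzstd lbzip2 pbzip2 zstd pixz lz4 pigz lzop
                       plzip lzip gzip lrz pxz bzip2 lzma xz clzip);

  # Return the first program in @programs that is an executable file
  # in one of the directories of $path, or undef if none is found.
  sub first_in_path {
      my ($path, @programs) = @_;
      my @dirs = grep { length($_) } split /:/, $path;
      for my $program (@programs) {
          for my $dir (@dirs) {
              my $full = File::Spec->catfile($dir, $program);
              return $program if -f $full and -x _;
          }
      }
      return undef;
  }

  my $chosen = first_in_path($ENV{PATH} || "", @compressors);
  print defined $chosen ? "would use $chosen\n" : "no compressor found\n";

```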

Schematically the setup is as follows:

  command started by parallel | compress > tmpfile
  cattail tmpfile | uncompress | parallel which reads the output

The setup is duplicated for both standard output (stdout) and standard
error (stderr).

GNU B<parallel> pipes output from the command run into the compression
program which saves to a tmpfile. GNU B<parallel> records the pid of
the compress program.  At the same time a small Perl script (called
B<cattail> above) is started: It basically does B<cat> followed by
B<tail -f>, but it also removes the tmpfile as soon as the first byte
is read, and it continuously checks if the pid of the compression
program is dead. If the compress program is dead, B<cattail> reads the
rest of tmpfile and exits.

As most compression programs write out a header when they start, the
tmpfile in practice is removed by B<cattail> after around 40 ms.

In more detail it works like this:

  bash ( command ) |
    sh ( emptywrapper ( bash ( compound compress ) ) >tmpfile )
  cattail ( rm tmpfile; compound decompress ) < tmpfile

This complex setup makes sure that the compress program is only
started if there is input. This means each job will cause 8 processes
to run. If combined with B<--keep-order> these processes will run
until the job has been printed.


=head2 Wrapping

The command given by the user can be wrapped in multiple
templates. Templates can be wrapped in other templates.


=over 15

=item B<$COMMAND>

the command to run.


=item B<$INPUT>

the input to run.


=item B<$SHELL>

the shell that started GNU Parallel.


=item B<$SSHLOGIN>

the sshlogin.


=item B<$WORKDIR>

the working dir.


=item B<$FILE>

the file to read parts from.


=item B<$STARTPOS>

the first byte position to read from B<$FILE>.


=item B<$LENGTH>

the number of bytes to read from B<$FILE>.


=item --shellquote

echo I<Double quoted $INPUT>


=item --nice I<pri>

Remote: See B<The remote system wrapper>.

Local: B<setpriority(0,0,$nice)>

=item --cat

  cat > {}; $COMMAND {};
  perl -e '$bash = shift;
    $csh = shift;
    for(@ARGV) { unlink;rmdir; }
    if($bash =~ s/h//) { exit $bash;  }
    exit $csh;' "$?h" "$status" {};

{} is set to B<$PARALLEL_TMP> which is a tmpfile. The Perl script
saves the exit value, unlinks the tmpfile, and returns the exit value
- no matter if the shell is B<bash>/B<ksh>/B<zsh> (using $?) or
B<*csh>/B<fish> (using $status).

=item --fifo

  perl -e '($s,$c,$f) = @ARGV;
    # mkfifo $PARALLEL_TMP
    system "mkfifo", $f;
    # spawn $shell -c $command &
    $pid = fork || exec $s, "-c", $c;
    open($o,">",$f) || die $!;
    # cat > $PARALLEL_TMP
    while(sysread(STDIN,$buf,131072)){
       syswrite $o, $buf;
    }
    close $o;
    # waitpid to get the exit code from $command
    waitpid $pid,0;
    # Cleanup
    unlink $f;
    exit $?/256;' $SHELL $COMMAND $PARALLEL_TMP

This is an elaborate way of doing: mkfifo {}; run B<$COMMAND> in the
background using B<$SHELL>; copy STDIN to {}; wait for the background
job to complete; remove {}; and exit with the exit code from
B<$COMMAND>.

It is made this way to be compatible with B<*csh>/B<fish>.

=item --pipepart


  < $FILE perl -e 'while(@ARGV) {
      sysseek(STDIN,shift,0) || die;
      $left = shift;
      while($read =
            sysread(STDIN,$buf,
                    ($left > 131072 ? 131072 : $left))){
        $left -= $read;
        syswrite(STDOUT,$buf);
      }
    }' $STARTPOS $LENGTH

This will read B<$LENGTH> bytes from B<$FILE> starting at B<$STARTPOS>
and send it to STDOUT.

=item --sshlogin $SSHLOGIN

  ssh $SSHLOGIN "$COMMAND"

=item --transfer

  ssh $SSHLOGIN mkdir -p ./$WORKDIR;
  rsync --protocol 30 -rlDzR \
        -essh ./{} $SSHLOGIN:./$WORKDIR;
  ssh $SSHLOGIN "$COMMAND"

Read about B<--protocol 30> in the section B<Rsync protocol version>.

=item --transferfile I<file>

<<todo>>

=item --basefile

<<todo>>

=item --return I<file>

  $COMMAND; _EXIT_status=$?; mkdir -p $WORKDIR;
  rsync --protocol 30 \
    --rsync-path=cd\ ./$WORKDIR\;\ rsync \
    -rlDzR -essh $SSHLOGIN:./$FILE ./$WORKDIR;
  exit $_EXIT_status;

The B<--rsync-path=cd ...> is needed because old versions of B<rsync>
do not support B<--no-implied-dirs>.

The B<$_EXIT_status> trick is to postpone the exit value. This makes it
incompatible with B<*csh> and should be fixed in the future. Maybe a
wrapping 'sh -c' is enough?

=item --cleanup

B<$RETURN> is the wrapper from B<--return>.

  $COMMAND; _EXIT_status=$?; $RETURN;
  ssh $SSHLOGIN \(rm\ -f\ ./$WORKDIR/{}\;\
                  rmdir\ ./$WORKDIR\ \>\&/dev/null\;\);
  exit $_EXIT_status;

B<$_EXIT_status>: see B<--return> above.


=item --pipe

  perl -e 'if(sysread(STDIN, $buf, 1)) {
      open($fh, "|-", "@ARGV") || die;
      syswrite($fh, $buf);
      # Align up to 128k block
      if($read = sysread(STDIN, $buf, 131071)) {
          syswrite($fh, $buf);
      }
      while($read = sysread(STDIN, $buf, 131072)) {
          syswrite($fh, $buf);
      }
      close $fh;
      exit ($?&127 ? 128+($?&127) : 1+$?>>8)
    }' $SHELL -c $COMMAND

This small wrapper makes sure that B<$COMMAND> will never be run if
there is no data.

=item --tmux

<<TODO Fixup with '-quoting>>

  mkfifo /tmp/tmx3cMEV &&
    sh -c 'tmux -S /tmp/tmsaKpv1 new-session -s p334310 -d "sleep .2" >/dev/null 2>&1';
  tmux -S /tmp/tmsaKpv1 new-window -t p334310 -n wc\ 10 \(wc\ 10\)\;\ perl\ -e\ \'while\(\$t++\<3\)\{\ print\ \$ARGV\[0\],\"\\n\"\ \}\'\ \$\?h/\$status\ \>\>\ /tmp/tmx3cMEV\&echo\ wc\\\ 10\;\ echo\ \Job\ finished\ at:\ \`date\`\;sleep\ 10;
  exec perl -e '$/="/";$_=<>;$c=<>;unlink $ARGV; /(\d+)h/ and exit($1);exit$c' /tmp/tmx3cMEV


  mkfifo <<tmpfile.tmx>>;
  tmux -S <<tmpfile.tms>> new-session -s p<<PID>> -d 'sleep .2' >&/dev/null;
  tmux -S <<tmpfile.tms>> new-window -t p<<PID>> -n <<shell quoted input>> \(<<shell quoted input>>\)\;\ perl\ -e\ \'while\(\$t++\<3\)\{\ print\ \$ARGV\[0\],\"\\n\"\ \}\'\ \$\?h/\$status\ \>\>\ <<tmpfile.tmx>>\&echo\ <<shell double quoted input>>\;echo\ \Job\ finished\ at:\ \`date\`\;sleep\ 10;
  exec perl -e '$/="/";$_=<>;$c=<>;unlink $ARGV; /(\d+)h/ and exit($1);exit$c' <<tmpfile.tmx>>

First a FIFO is made (.tmx). It is used for communicating the exit
value. Next a new tmux session is made. This may fail if there is
already a session, so the output is ignored. If all job slots finish
at the same time, then B<tmux> will close the session. A temporary
socket is made (.tms) to avoid a race condition in B<tmux>. It is
cleaned up when GNU B<parallel> finishes.

The input is used as the name of the window in B<tmux>. When the job
inside B<tmux> finishes, the exit value is printed to the FIFO (.tmx).
This FIFO is opened by B<perl> outside B<tmux>, and B<perl> then
removes the FIFO. B<Perl> blocks until the first value is read from
the FIFO, and this value is used as the exit value.

To make it compatible with B<csh> and B<bash> the exit value is
printed as: $?h/$status and this is parsed by B<perl>.

There is a bug that makes it necessary to print the exit value 3
times.

Another bug in B<tmux> requires the lengths of the tmux title and
command to stay outside certain ranges. When they fall inside these
ranges, 75 '\ ' are added to the title to push it outside them.

You can map the bad limits using:

  perl -e 'sub r { int(rand(shift)).($_[0] && "\t".r(@_)) } print map { r(@ARGV)."\n" } 1..10000' 1600 1500 90 |
    perl -ane '$F[0]+$F[1]+$F[2] < 2037 and print ' |
    parallel --colsep '\t' --tagstring '{1}\t{2}\t{3}' tmux -S /tmp/p{%}-'{=3 $_="O"x$_ =}' \
      new-session -d -n '{=1 $_="O"x$_ =}' true'\ {=2 $_="O"x$_ =};echo $?;rm -f /tmp/p{%}-O*'

  perl -e 'sub r { int(rand(shift)).($_[0] && "\t".r(@_)) } print map { r(@ARGV)."\n" } 1..10000' 17000 17000 90 |
    parallel --colsep '\t' --tagstring '{1}\t{2}\t{3}' \
      tmux -S /tmp/p{%}-'{=3 $_="O"x$_ =}' new-session -d -n '{=1 $_="O"x$_ =}' true'\ {=2 $_="O"x$_ =};echo $?;rm /tmp/p{%}-O*' \
      > value.csv 2>/dev/null

  R -e 'a<-read.table("value.csv");X11();plot(a[,1],a[,2],col=a[,4]+5,cex=0.1);Sys.sleep(1000)'

For B<tmux 1.8> 17000 can be lowered to 2100.

The interesting areas are title 0..1000 with (title + whole command)
in 996..1127 and 9331..9636.

=back
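
The exit value conventions used by the B<--cat>, B<--tmux> and
B<--pipe> wrappers above can be tried out in isolation. A minimal
sketch with hypothetical helper names; it assumes that in B<csh>
B<$?h> tests whether the variable B<h> is set and therefore expands
to 0 or 1 without a trailing B<h>:

```perl

  use strict;
  use warnings;

  # --cat: the wrapper receives "$?h" and "$status". In sh-like shells
  # the first ends in "h" (e.g. "3h"); in csh "$?h" asks whether the
  # variable h is set, giving "0" or "1" without an "h".
  sub cat_exit {
      my ($bash, $csh) = @_;
      return $bash if $bash =~ s/h//;
      return $csh;
  }

  # --tmux: the FIFO holds "$?h/$status" (printed 3 times). Read it as
  # "/"-separated records, like the wrapper's $/="/".
  sub tmux_exit {
      my ($fifo_text) = @_;
      my ($first, $second) = split m{/}, $fifo_text;
      return $1 if $first =~ /(\d+)h/;
      if (defined $second and $second =~ /^\s*(\d+)/) {
          return $1;
      }
      return 0;
  }

  # --pipe and the remote wrapper write this as:
  #   $?&127 ? 128+($?&127) : 1+$?>>8
  # "1+$?>>8" parses as "(1+$?)>>8", which equals "$?>>8" because the
  # low byte of $? is 0 when no signal was received.
  sub exit_from_wait_status {
      my ($status) = @_;
      my $signal = $status & 127;
      return $signal ? 128 + $signal : $status >> 8;
  }

```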

The ordering of the wrapping is important:

=over 5

=item *

B<$PARALLEL_ENV>, which is set in B<env_parallel.*>, must be prepended
to the command first, as the command may contain exported variables
or functions.

=item *

B<--nice>/B<--cat>/B<--fifo> should be done on the remote machine.

=item *

B<--pipepart>/B<--pipe> should be done on the local machine inside B<--tmux>.

=back


=head2 Convenience options --nice --basefile --transfer --return
--cleanup --tmux --group --compress --cat --fifo --workdir --tag
--tagstring

These are all convenience options that make it easier to do a
task. But more importantly: They are tested to work on corner cases,
too. Take B<--nice> as an example:

  nice parallel command ...

will work just fine. But when run remotely, you need to move the nice
command so it runs on the server:

  parallel -S server nice command ...

And this will again work just fine, as long as you are running a
single command. When you are running a composed command you need nice
to apply to the whole command, and it gets harder still:

  parallel -S server -q nice bash -c 'command1 ...; cmd2 | cmd3'

It is not impossible, but by using B<--nice> GNU B<parallel> will do
the right thing for you. Similarly when transferring files: It starts
to get hard when the file names contain space, :, `, *, or other
special characters.

To run the commands in a B<tmux> session you basically just need to
quote the command. For simple commands that is easy, but when commands
contain special characters, it gets much harder to get right.

B<--compress> not only compresses standard output (stdout) but also
standard error (stderr); and it does so into files that are open but
deleted, so a crash will not leave these files around.

B<--cat> and B<--fifo> are easy to do by hand, until you want to clean
up the tmpfile and keep the exit code of the command.

The real killer comes when you try to combine several of these: Doing
that correctly for all corner cases is next to impossible to do by
hand.

=head2 --shard

The simple way to implement sharding would be to:

=over 5

=item 1

start n jobs,

=item 2

split each line into columns,

=item 3

select the data from the relevant column

=item 4

compute a hash value from the data

=item 5

take the modulo n of the hash value

=item 6

pass the full line to the jobslot that has the computed value

=back
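
A single-process sketch of steps 2-6 (step 1, starting the n jobs, is
left out). As an assumption of this sketch, the hash is B<unpack>'s
32-bit byte checksum rather than Perl's internal hash function, which
keeps the result deterministic without setting $PERL_HASH_SEED.
B<shard_of> and B<shard_lines> are hypothetical names:

```perl

  use strict;
  use warnings;

  # Steps 2-5 for one line: split into columns, pick column $col
  # (1-based), hash the value and take it modulo $n.
  sub shard_of {
      my ($line, $colsep, $col, $n) = @_;
      chomp(my $copy = $line);
      my @fields = split /$colsep/, $copy, -1;                          # step 2
      my $value = defined $fields[$col - 1] ? $fields[$col - 1] : "";  # step 3
      my $hash = unpack("%32C*", $value);                               # step 4
      return $hash % $n;                                                # step 5
  }

  # Step 6: pass each full line to the output for its computed value.
  sub shard_lines {
      my ($lines, $colsep, $col, $n) = @_;
      my @outputs;
      push @outputs, [] for 1 .. $n;
      for my $line (@$lines) {
          push @{ $outputs[shard_of($line, $colsep, $col, $n)] }, $line;
      }
      return \@outputs;
  }

```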

Unfortunately Perl is rather slow at computing the hash value (and
somewhat slow at splitting into columns).

One solution is to use a compiled language for the splitting and
hashing, but that would go against the design criteria of not
depending on a compiler.

Luckily those tasks can be parallelized. So GNU B<parallel> starts n
sharders that do steps 2-6, and passes blocks of 100k to each of those
in a round robin manner. To make sure these sharders compute the hash
the same way, B<$PERL_HASH_SEED> is set to the same value for all
sharders.

Running n sharders poses a new problem: Instead of having n outputs
(one for each computed value) you now have n outputs for each of the n
values, so in total n*n outputs; and you need to merge these n*n
outputs together into n outputs.

This can be done by simply running B<parallel -j0 --lb cat :::
outputs_for_one_value>, but that is rather inefficient, as it spawns a
process for each file. Instead the core code from B<parcat> is run,
which is a bit faster.

All the sharders and parcats communicate through named pipes that are
unlinked as soon as they are opened.


=head2 Shell shock

The shell shock bug in B<bash> did not affect GNU B<parallel>, but the
solutions did. The fixes first made B<bash> export functions in
variables named I<BASH_FUNC_myfunc()> and later changed that to
I<BASH_FUNC_myfunc%%>. When transferring functions, GNU B<parallel>
reads off the function and turns it into a function definition, which
is copied to the remote system and executed before the actual command.
Therefore GNU B<parallel> needs to know how to read the function.

From version 20150122 GNU B<parallel> tries both the ()-version and
the %%-version, and the function definition works on both pre- and
post-shell shock versions of B<bash>.
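
A sketch of reading off exported functions in both naming styles. It
assumes the variable value has the form B<() {  body }> as exported by
B<bash>; B<bash_function_definitions> is a hypothetical helper:

```perl

  use strict;
  use warnings;

  # Collect exported bash functions from an environment hash,
  # accepting both the "()" and the "%%" variable-name style.
  sub bash_function_definitions {
      my (%env) = @_;
      my %definitions;
      for my $var (sort keys %env) {
          next unless $var =~ /^BASH_FUNC_(.+?)(?:\(\)|%%)$/;
          my $name = $1;
          # The value looks like "() {  body\n}"; prepending the
          # name turns it into a definition the shell can evaluate.
          $definitions{$name} = "$name $env{$var}";
      }
      return %definitions;
  }

  my %defs = bash_function_definitions(%ENV);
  print "$_\n" for sort keys %defs;

```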


=head2 The remote system wrapper

The remote system wrapper does some initialization before starting the
command on the remote system.

=head3 Make quoting unnecessary by hex encoding everything

When you run B<ssh server foo> then B<foo> has to be quoted once:

  ssh server "echo foo; echo bar"

If you run B<ssh server1 ssh server2 foo> then B<foo> has to be quoted
twice:

  ssh server1 ssh server2 \'"echo foo; echo bar"\'

GNU B<parallel> avoids this by packing everything into hex values and
running a command that does not need quoting:

  perl -X -e GNU_Parallel_worker,eval+pack+q/H10000000/,join+q//,@ARGV

This command reads hex from the command line and converts that to
bytes that are then eval'ed as a Perl expression.
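
A round-trip sketch of the hex packing. It uses B<H*> as the B<pack>
template instead of the fixed-length template shown above, and the
helper names are hypothetical:

```perl

  use strict;
  use warnings;

  # Encode Perl code as hex and split it into arguments of at most
  # $chunk characters. Hex digits never need shell quoting.
  sub to_hex_args {
      my ($code, $chunk) = @_;
      my $hex = unpack("H*", $code);
      return $hex =~ /(.{1,$chunk})/g;
  }

  # The receiving side: join the arguments, turn the hex back into
  # bytes and eval them as Perl.
  sub eval_hex_args {
      my $code = pack("H*", join("", @_));
      my $result = eval $code;
      die $@ if $@;
      return $result;
  }

  my @args = to_hex_args(q{"it's" . ' "quoted"'}, 8);
  print "perl -e ... @args\n";
  print eval_hex_args(@args), "\n";

```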

The string B<GNU_Parallel_worker> is not needed. It is simply there to
let the user know that this process is GNU B<parallel> working.

=head3 Ctrl-C and standard error (stderr)

If the user presses Ctrl-C the user expects jobs to stop. This works
out of the box if the jobs are run locally. Unfortunately it is not so
simple if the jobs are run remotely.

If remote jobs are run in a tty using B<ssh -tt>, then Ctrl-C works,
but all output to standard error (stderr) is sent to standard output
(stdout). This is not what the user expects.

If remote jobs are run without a tty using B<ssh> (without B<-tt>),
then output to standard error (stderr) is kept on stderr, but Ctrl-C
does not kill remote jobs. This is not what the user expects.

So what is needed is a way to have both. The reason why Ctrl-C does
not kill the remote jobs seems to be that the shell does not
propagate the hang-up signal from B<sshd>. But when B<sshd> dies, the
parent of the login shell becomes B<init> (process id 1). So by
exec'ing a Perl wrapper that monitors the parent pid and kills the
child if the parent pid changes (e.g. to 1), Ctrl-C works and stderr
is kept on stderr.

Ctrl-C does, however, kill the ssh connection, so any output from
a remote dying process is lost.

To be able to kill all (grand)*children a new process group is
started.


=head3 --nice

B<nice>ing the remote process is done by B<setpriority(0,0,$nice)>. A
few old systems do not implement this and B<--nice> is unsupported on
those.


=head3 Setting $PARALLEL_TMP

B<$PARALLEL_TMP> is used by B<--fifo> and B<--cat> and must point to a
non-existent file in B<$TMPDIR>. This file name is computed on the
remote system.


=head3 The wrapper

The wrapper looks like this:

  $shell = $PARALLEL_SHELL || $SHELL;
  $tmpdir = $TMPDIR || $PARALLEL_REMOTE_TMPDIR;
  $nice = $opt::nice;
  $termseq = $opt::termseq;

  # Check that $tmpdir is writable
  -w $tmpdir ||
      die("$tmpdir is not writable.".
  	" Set PARALLEL_REMOTE_TMPDIR");
  # Set $PARALLEL_TMP to a non-existent file name in $TMPDIR
  do {
      $ENV{PARALLEL_TMP} = $tmpdir."/par".
  	join"", map { (0..9,"a".."z","A".."Z")[rand(62)] } (1..5);
  } while(-e $ENV{PARALLEL_TMP});
  # Set $script to a non-existent file name in $TMPDIR
  do {
      $script = $tmpdir."/par".
  	join"", map { (0..9,"a".."z","A".."Z")[rand(62)] } (1..5);
  } while(-e $script);
  # Create a script from the hex code
  # that removes itself and runs the commands
  open($fh,">",$script) || die;
  # ' needed due to rc-shell
  print($fh("rm \'$script\'\n",$bashfunc.$cmd));
  close $fh;
  my $parent = getppid;
  my $done = 0;
  $SIG{CHLD} = sub { $done = 1; };
  $pid = fork;
  unless($pid) {
      # Make own process group to be able to kill HUP it later
      eval { setpgrp };
      # Set nice value
      eval { setpriority(0,0,$nice) };
      # Run the script
      exec($shell,$script);
      die("exec failed: $!");
  }
  while((not $done) and (getppid == $parent)) {
      # Parent pid is not changed, so sshd is alive
      # Exponential sleep up to 1 sec
      $s = $s < 1 ? 0.001 + $s * 1.03 : $s;
      select(undef, undef, undef, $s);
  }
  if(not $done) {
      # sshd is dead: User pressed Ctrl-C
      # Kill as per --termseq
      my @term_seq = split/,/,$termseq;
      if(not @term_seq) {
  	@term_seq = ("TERM",200,"TERM",100,"TERM",50,"KILL",25);
      }
      while(@term_seq && kill(0,-$pid)) {
  	kill(shift @term_seq, -$pid);
  	select(undef, undef, undef, (shift @term_seq)/1000);
      }
  }
  wait;
  exit ($?&127 ? 128+($?&127) : 1+$?>>8)
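
The kill loop at the end of the wrapper reads B<--termseq> as
alternating signal names and delays in milliseconds. A sketch that
only parses such a sequence (no processes are signalled;
B<termseq_pairs> is a hypothetical helper):

```perl

  use strict;
  use warnings;

  # Turn a --termseq string into [signal, milliseconds] pairs,
  # falling back to the same default sequence as the wrapper.
  sub termseq_pairs {
      my ($termseq) = @_;
      my @seq = split /,/, (defined $termseq ? $termseq : "");
      @seq = ("TERM", 200, "TERM", 100, "TERM", 50, "KILL", 25) unless @seq;
      my @pairs;
      while (@seq) {
          my ($signal, $ms) = splice(@seq, 0, 2);
          push @pairs, [$signal, defined $ms ? $ms : 0];
      }
      return @pairs;
  }

  for my $step (termseq_pairs(undef)) {
      printf "send %-4s then wait %d ms\n", @$step;
  }

```

As in the wrapper, the real loop stops early as soon as no process in
the group is left to signal.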


=head2 Transferring of variables and functions

Variables and functions given by B<--env> are transferred by running
a Perl script remotely that calls the actual command. The Perl script
sets B<$ENV{>I<variable>B<}> to the correct value before exec'ing a
shell that runs the function definition followed by the actual
command.

The function B<env_parallel> copies the full current environment into
the environment variable B<PARALLEL_ENV>. This variable is picked up
by GNU B<parallel> and used to create the Perl script mentioned above.
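A minimal sketch of such a wrapper generator, assuming a hypothetical helper B<env_wrapper> that takes the variables as a hash. It builds Perl code that assigns each value to B<%ENV> and then execs the shell. This is not GNU B<parallel>'s own code; function definitions, which would be prepended to the command, are left out.

```perl

  use strict;
  use warnings;

  # Hypothetical helper: quote a string as a Perl single-quoted literal
  sub perl_quote {
      my ($s) = @_;
      $s =~ s/([\\'])/\\$1/g;    # only \ and ' are special inside '...'
      return "'$s'";
  }

  # Hypothetical helper: build Perl code that sets each variable in
  # %ENV and then execs $shell to run $cmd
  sub env_wrapper {
      my ($vars, $shell, $cmd) = @_;
      my $code = "";
      for my $name (sort keys %$vars) {
          $code .= '$ENV{' . perl_quote($name) . '}='
                 . perl_quote($vars->{$name}) . ';';
      }
      $code .= 'exec(' . perl_quote($shell) . ',"-c",'
             . perl_quote($cmd) . ') || die "exec: $!";';
      return $code;
  }

  # Run the generated code with the current perl and capture the output
  my $code = env_wrapper({ GREETING => "it's here" }, "/bin/sh",
                         'echo "$GREETING"');
  open(my $ph, "-|", $^X, "-e", $code) or die "cannot run perl: $!";
  my $out = do { local $/; <$ph> };
  close $ph;
  print $out;

```

Generating a Perl script (rather than a shell script) means the values only have to be quoted for Perl, which behaves the same whatever the remote login shell is.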


=head2 Base64 encoded bzip2

B<csh> limits words of commands to 1024 chars. This is often too little
when GNU B<parallel> encodes environment variables and wraps the
command with different templates. All of these are combined and quoted
into one single word, which often is longer than 1024 chars.

When the line to run is > 1000 chars, GNU B<parallel> therefore
encodes it: the line is compressed with B<bzip2>, converted to base64,
and split into 1000 char blocks (so B<csh> does not fail). This Perl
script, which decodes, decompresses and B<eval>s the line, is then
prepended:

    @GNU_Parallel=("use","IPC::Open3;","use","MIME::Base64");
    eval "@GNU_Parallel";

    $SIG{CHLD}="IGNORE";
    # Search for bzip2. Not found => use default path
    my $zip = (grep { -x $_ } "/usr/local/bin/bzip2")[0] || "bzip2";
    # $in = stdin on $zip, $out = stdout from $zip
    my($in, $out,$eval);
    open3($in,$out,">&STDERR",$zip,"-dc");
    if(my $perlpid = fork) {
        close $in;
        $eval = join "", <$out>;
        close $out;
    } else {
        close $out;
        # Pipe decoded base64 into 'bzip2 -dc'
        print $in (decode_base64(join"",@ARGV));
        close $in;
        exit;
    }
    wait;
    eval $eval;

Perl and B<bzip2> must be installed on the remote system, but a small
test showed that B<bzip2> is installed by default on all platforms
that run GNU B<parallel>, so this is not a big problem.

The added bonus of this is that much bigger environments can now be
transferred as they will be below B<bash>'s limit of 131072 chars.
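The encoding side can be sketched in Perl as below. This sketch assumes the core modules B<IO::Compress::Bzip2> and B<IO::Uncompress::Bunzip2> instead of running the B<bzip2> binary, and B<encode_line> is a hypothetical name. It ends with the same round trip that the decoder script above performs.

```perl

  use strict;
  use warnings;
  use IO::Compress::Bzip2 qw(bzip2 $Bzip2Error);
  use IO::Uncompress::Bunzip2 qw(bunzip2 $Bunzip2Error);
  use MIME::Base64 qw(encode_base64 decode_base64);

  # Compress $line with bzip2, base64-encode it without newlines and
  # cut the result into words of at most 1000 chars each
  sub encode_line {
      my ($line) = @_;
      my $compressed;
      bzip2(\$line => \$compressed) or die "bzip2 failed: $Bzip2Error";
      my $b64 = encode_base64($compressed, "");   # "" = no line breaks
      return unpack("(A1000)*", $b64);
  }

  # A long, poorly compressible command line
  my $line = join(" ", "echo", map { int(rand(1e9)) } 1 .. 2000);
  my @words = encode_line($line);

  # The remote side joins the words, decodes and decompresses them
  my $bz = decode_base64(join("", @words));
  my $back;
  bunzip2(\$bz => \$back) or die "bunzip2 failed: $Bunzip2Error";
  printf "%d chars sent as %d words\n", length($line), scalar(@words);

```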


=head2 Which shell to use

Different shells behave differently. A command that works in B<tcsh>
may not work in B<bash>.  It is therefore important that the correct
shell is used when GNU B<parallel> executes commands.

GNU B<parallel> tries hard to use the right shell. If GNU B<parallel>
is called from B<tcsh> it will use B<tcsh>.  If it is called from
B<bash> it will use B<bash>. It does this by looking at the
(grand)*parent process: If the (grand)*parent process is a shell, use
this shell; otherwise look at the parent of this (grand)*parent. If
none of the (grand)*parents are shells, then $SHELL is used.

This will do the right thing if called from:

=over 2

=item *

an interactive shell

=item *

a shell script

=item *

a Perl script using `` (backticks) or B<system>, if called as a single string.

=back

While these cover most cases, there are situations where it will fail:

=over 2

=item *

When run using B<exec>.

=item *

When run as the last command using B<-c> from another shell (because
some shells use B<exec>):

  zsh% bash -c "parallel 'echo {} is not run in bash; \
       set | grep BASH_VERSION' ::: This"

You can work around that by appending '&& true':

  zsh% bash -c "parallel 'echo {} is run in bash; \
       set | grep BASH_VERSION' ::: This && true"

=item *

When run in a Perl script using B<system> with parallel as the first
string:

  #!/usr/bin/perl

  system("parallel",'setenv a {}; echo $a',":::",2);

Here it depends on which shell is used to call the Perl script. If the
Perl script is called from B<tcsh> it will work just fine, but if it
is called from B<bash> it will fail, because the command B<setenv> is
not known to B<bash>.

=back

If GNU B<parallel> guesses wrong in these situations, set the shell using
B<$PARALLEL_SHELL>.
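The (grand)*parent walk described at the start of this section can be sketched as follows. The process table is a hypothetical hash (pid => [parent pid, command name]) rather than live B<ps> output, and the list of shell names is an assumption, not GNU B<parallel>'s own list.

```perl

  use strict;
  use warnings;

  # Assumed list of shell names; GNU parallel's own list may differ
  my %is_shell = map { $_ => 1 }
      qw(ash bash csh dash fish ksh mksh rc sh tcsh zsh);

  # $proc: hypothetical process table, pid => [parent pid, command].
  # Walk from $pid towards init; return $fallback if no shell is found.
  sub find_shell {
      my ($proc, $pid, $fallback) = @_;
      my %seen;
      while (exists $proc->{$pid} and not $seen{$pid}++) {
          my ($ppid, $cmd) = @{ $proc->{$pid} };
          (my $name = $cmd) =~ s{^-}{};   # login shells look like "-bash"
          $name =~ s{.*/}{};              # strip any directory part
          return $name if $is_shell{$name};
          $pid = $ppid;
      }
      return $fallback;
  }

  my %proc = (
      4242 => [4200, "perl"],
      4200 => [4100, "-tcsh"],
      4100 => [1,    "sshd"],
      1    => [0,    "init"],
  );
  # $SHELL is the fallback when no (grand)*parent is a shell
  (my $default = $ENV{SHELL} || "/bin/sh") =~ s{.*/}{};
  print find_shell(\%proc, 4242, $default), "\n";   # tcsh

```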


=head2 Always running commands in a shell

If the command is a simple command with no redirection and no setting
of variables, the command I<could> be run without spawning a
shell. E.g. this simple B<grep> matching either 'ls ' or ' wc E<gt>E<gt> c':

  parallel "grep -E 'ls | wc >> c' {}" ::: foo

could be run as:

  system("grep","-E","ls | wc >> c","foo");

However, as soon as the command is a bit more complex a shell I<must>
be spawned:

  parallel "grep -E 'ls | wc >> c' {} | wc >> c" ::: foo
  parallel "LANG=C grep -E 'ls | wc >> c' {}" ::: foo

It is impossible to tell how B<| wc E<gt>E<gt> c> should be
interpreted without parsing the string (is the B<|> a pipe in shell or
an alternation in a B<grep> regexp?  Is B<LANG=C> a command in B<csh>
or setting a variable in B<bash>? Is B<E<gt>E<gt>> redirection or part
of a regexp?).

On top of this, wrapper scripts will often require a shell to be
spawned.

The downside is that you need to quote special shell chars twice:

  parallel echo '*' ::: This will expand the asterisk
  parallel echo "'*'" ::: This will not
  parallel "echo '*'" ::: This will not
  parallel echo '\*' ::: This will not
  parallel echo \''*'\' ::: This will not
  parallel -q echo '*' ::: This will not

B<-q> will quote all special chars, thus redirection will not work:
this prints '* > out.1' and I<does not> save '*' into the file out.1:

  parallel -q echo "*" ">" out.{} ::: 1

GNU B<parallel> tries to live up to the Principle Of Least
Astonishment (POLA), but the requirement of using B<-q> is hard to
understand when you do not see the whole picture.


=head2 Quoting

Quoting depends on the shell. For most shells '-quoting is used for
strings containing special characters.

For B<tcsh>/B<csh> newline is quoted as \ followed by newline. Other
special characters are also \-quoted.

For B<rc> everything is quoted using '.
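A minimal sketch of the three quoting styles. The function names and the set of characters treated as safe for B<csh> are assumptions, not GNU B<parallel>'s own quoting code.

```perl

  use strict;
  use warnings;

  # sh, bash, zsh, ...: wrap in '...'; each ' ends the quote, adds a
  # "'" and restarts the quote
  sub quote_sh {
      my ($s) = @_;
      $s =~ s/'/'"'"'/g;
      return "'$s'";
  }

  # csh/tcsh: \-quote every char not known to be safe; a newline thus
  # becomes \ followed by newline
  sub quote_csh {
      my ($s) = @_;
      $s =~ s{([^-_.+,/a-zA-Z0-9])}{\\$1}g;
      return $s;
  }

  # rc: wrap in '...' and double each '
  sub quote_rc {
      my ($s) = @_;
      $s =~ s/'/''/g;
      return "'$s'";
  }

  print quote_sh(q{it's $HOME}), "\n";   # 'it'"'"'s $HOME'

```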


=head2 --pipepart vs. --pipe

While B<--pipe> and B<--pipepart> look much the same to the user, they are
implemented very differently.

With B<--pipe> GNU B<parallel> reads the blocks from standard input
(stdin) and gives each block to the command on its standard input; so
every byte passes through GNU B<parallel> itself. This is the reason
why B<--pipe> maxes out at around 500 MB/sec.

B<--pipepart>, on the other hand, first identifies at which byte
positions blocks start and how long they are. It does that by seeking
into the file by the size of a block and then reading until it meets
the end of a record, which becomes the end of the block. The seeking
explains why GNU B<parallel> does not know the line number and why
B<-L/-l> and B<-N> do not work.

With a reasonable block and file size this seeking is more than 1000
times faster than reading the full file. The byte positions are then
given to a small script that reads from position X to Y and sends
output to standard output (stdout). This small script is prepended to
the command and the full command is executed just as if GNU
B<parallel> had been in its normal mode. The script looks like this:

  < file perl -e 'while(@ARGV) {
     sysseek(STDIN,shift,0) || die;
     $left = shift;
     while($read = sysread(STDIN,$buf,
                           ($left > 131072 ? 131072 : $left))){
       $left -= $read; syswrite(STDOUT,$buf);
     }
  }' startbyte length_in_bytes

It delivers 1 GB/s per core.
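The seek-and-scan step that finds the byte positions for B<--pipepart> can be sketched as below. B<block_positions> is a hypothetical name; the sketch assumes a single fixed record end string (like the default B<--recend> of newline) and ignores B<--recstart> and regexps.

```perl

  use strict;
  use warnings;

  # Return [start, length] pairs for blocks of about $blocksize bytes,
  # each ending just after $recend, by seeking instead of reading all
  sub block_positions {
      my ($fh, $blocksize, $recend) = @_;
      seek($fh, 0, 2) or die "seek: $!";          # 2 = SEEK_END
      my $size = tell($fh);
      return () if $size == 0;
      my @start = (0);
      my $try = $blocksize;
      while ($try < $size) {
          seek($fh, $try, 0) or die "seek: $!";   # 0 = SEEK_SET
          my ($data, $buf, $end) = ("", "");
          # Read forward until the end of a record is found
          while (read($fh, $buf, 4096)) {
              $data .= $buf;
              my $i = index($data, $recend);
              if ($i >= 0) { $end = $try + $i + length($recend); last }
          }
          last if not defined $end or $end >= $size;
          push @start, $end;
          $try = $end + $blocksize;
      }
      push @start, $size;
      return map { [ $start[$_], $start[$_ + 1] - $start[$_] ] }
          0 .. $#start - 1;
  }

  my $text = join "", map { "record $_\n" } 1 .. 20;
  open(my $fh, "<", \$text) or die "cannot open string: $!";
  printf "%d+%d\n", @$_ for block_positions($fh, 40, "\n");

```

Each start/length pair could then be handed to the small reader script above, so only a few bytes around each block boundary are read by the process that computes the positions.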

B<dd> was tried instead of the script, but many versions of B<dd> do
not support reading from one byte offset to another, and B<dd> may
deliver partial blocks when reading from a pipe. See this for a
surprising example:

  yes | dd bs=1024k count=10 | wc


=head2 --block-size adjustment

Every time GNU B<parallel> detects a record bigger than
B<--block-size> it increases the block size by 30%. A small
B<--block-size> gives very poor performance; by exponentially
increasing the block size performance will not suffer.

GNU B<parallel> will waste CPU power if B<--block-size> does not
contain a full record, because it tries to find a full record and will
fail to do so. The recommendation is therefore to use a
B<--block-size> > 2 records, so you always get at least one full
record when you read one block.

If you use B<-N> then B<--block-size> should be big enough to contain
N+1 records.
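The 30% growth can be sketched like this. B<grow_blocksize> is a hypothetical helper that repeats the increase until the record fits, which is what happens when a too-big record is detected again and again. Because the size grows by a factor each time, even a very large record needs only a few dozen increases.

```perl

  use strict;
  use warnings;

  # Grow $blocksize by 30% until a record of $reclen bytes fits.
  # Returns the new block size and the number of increases.
  sub grow_blocksize {
      my ($blocksize, $reclen) = @_;
      my $grows = 0;
      while ($blocksize < $reclen) {
          # +1 guarantees growth even for tiny block sizes
          $blocksize = int($blocksize * 1.3) + 1;
          $grows++;
      }
      return ($blocksize, $grows);
  }

  # A 1 GB record, starting from a 1 MB block
  my ($size, $n) = grow_blocksize(1_000_000, 1_000_000_000);
  print "$size after $n increases\n";

```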


=head2 Automatic --block-size computation

With B<--pipepart> GNU B<parallel> can compute the B<--block-size>
automatically. A B<--block-size> of B<-1> will use a block size so
that each jobslot will receive approximately 1 block.  B<--block -2>
will pass 2 blocks to each jobslot and B<->I<n> will pass I<n> blocks
to each jobslot.

This can be done because B<--pipepart> reads from files, and we can
compute the total size of the input.
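A sketch of the computation, assuming the block size is simply the total input size divided by the number of job slots times I<n>, rounded up; GNU B<parallel> may adjust the result further. Both function names are hypothetical.

```perl

  use strict;
  use warnings;
  use List::Util qw(sum);

  # --block -$n: give each of the $jobslots job slots about $n blocks
  sub auto_blocksize {
      my ($total, $jobslots, $n) = @_;
      my $blocks = $jobslots * $n;
      return int(($total + $blocks - 1) / $blocks);   # round up
  }

  # The total is known up front because --pipepart reads files;
  # missing or empty files count as 0 bytes
  sub total_size { return sum(0, map { -s $_ || 0 } @_) }

  print auto_blocksize(1000, 4, 1), "\n";   # 250

```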


=head2 --jobs and --onall

When running the same commands on many servers what should B<--jobs>
signify? Is it the number of servers to run on in parallel?  Is it the
number of jobs run in parallel on each server?

GNU B<parallel> lets B<--jobs> represent the number of servers to run
on in parallel. This is to make it possible to run a sequence of
commands (that cannot be parallelized) on each server, but run the
same sequence on multiple servers.


=head2 --shuf

When using B<--shuf> to shuffle the jobs, all jobs are read, then they
are shuffled, and finally executed. When using SQL this makes
B<--sqlmaster> the part that shuffles the jobs. The B<--sqlworker>s
simply execute the jobs in order of their Seq number.


=head2 --csv

B<--pipepart> is incompatible with B<--csv> because you can have
records like:

  a,b,c
  a,"
  a,b,c
  a,b,c
  a,b,c
  ",c
  a,b,c

Here the second record contains a multi-line field that looks like
records. Since B<--pipepart> does not read the whole file when
searching for record endings, it may start reading in this multi-line
field, which would be wrong.


=head2 Buffering on disk

GNU B<parallel> buffers output, because if output is not buffered you
have to be ridiculously careful about sizes to avoid mixing of outputs
(see the excellent example at https://catern.com/posts/pipes.html).

GNU B<parallel> buffers on disk in $TMPDIR using files that are
removed as soon as they are created, but kept open. So even
if GNU B<parallel> is killed by a power outage, there will be no files
to clean up afterwards. Another advantage is that the file system is
aware that these files will be lost in case of a crash, so it does
not need to sync them to disk.

This leads to the odd situation that a disk can be full, yet there
are no visible files on it.
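The unlink-but-keep-open trick can be sketched with the core module B<File::Temp>. This assumes POSIX file semantics (on some systems an open file cannot be removed), and B<unlinked_buffer> is a hypothetical name.

```perl

  use strict;
  use warnings;
  use File::Spec;
  use File::Temp qw(tempfile);

  # Create a buffer file in $TMPDIR and remove its name at once; the
  # open read/write handle keeps the data reachable
  sub unlinked_buffer {
      my $dir = $ENV{TMPDIR} || File::Spec->tmpdir;
      my ($fh, $name) = tempfile("parXXXXX", DIR => $dir);
      unlink($name) or die "unlink $name: $!";
      return ($fh, $name);
  }

  my ($fh, $name) = unlinked_buffer();
  print {$fh} "output from job 1\n";
  seek($fh, 0, 0) or die "seek: $!";     # also flushes the write buffer
  print "name gone: ", (-e $name ? "no" : "yes"), "\n";
  print "buffered: ", scalar <$fh>;

```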


=head3 Partly buffering in memory

When using the output formats SQL and CSV, GNU B<parallel> has to read
the whole output into memory. When run normally it will only read the
output from a single job. But when using B<--linebuffer> every line
printed will also be buffered in memory - for all jobs currently
running.

If memory is tight, then do not use the output format SQL/CSV with
B<--linebuffer>.


=head3 Comparing to buffering in memory

B<gargs> is a parallelizing tool that buffers in memory. It is
therefore a useful way of comparing the advantages and disadvantages
of buffering in memory to buffering on disk.

On a system with 6 GB RAM free and 6 GB free swap these commands were
tested with different sizes:

  echo /dev/zero | gargs "head -c $size {}" >/dev/null
  echo /dev/zero | parallel "head -c $size {}" >/dev/null

The results (B<JobRuntime> in seconds) are here:

  JobRuntime      Command
       0.344      parallel_test 1M
       0.362      parallel_test 10M
       0.640      parallel_test 100M
       9.818      parallel_test 1000M
      23.888      parallel_test 2000M
      30.217      parallel_test 2500M
      30.963      parallel_test 2750M
      34.648      parallel_test 3000M
      43.302      parallel_test 4000M
      55.167      parallel_test 5000M
      67.493      parallel_test 6000M
     178.654      parallel_test 7000M
     204.138      parallel_test 8000M
     230.052      parallel_test 9000M
     255.639      parallel_test 10000M
     757.981      parallel_test 30000M
       0.537      gargs_test 1M
       0.292      gargs_test 10M
       0.398      gargs_test 100M
       3.456      gargs_test 1000M
       8.577      gargs_test 2000M
      22.705      gargs_test 2500M
     123.076      gargs_test 2750M
      89.866      gargs_test 3000M
     291.798      gargs_test 4000M

GNU B<parallel> is pretty much limited by the speed of the disk: Up to
6 GB data is written to disk but cached, so reading is fast. Above 6
GB data are both written and read from disk. When the 30000MB job is
running, the disk system is slow, but usable: If you are not using the
disk, you almost do not feel it.

B<gargs> has a speed advantage up until 2500M where it hits a
wall. Then the system starts swapping like crazy and is completely
unusable. At 5000M it goes out of memory.

You can make GNU B<parallel> behave similarly to B<gargs> if you point
$TMPDIR to a tmpfs file system: It will be faster for small outputs,
but may kill your system for larger outputs and cause you to lose
output.


=head2 Disk full

GNU B<parallel> buffers on disk. If the disk is full, data may be
lost. To check if the disk is full, GNU B<parallel> writes an 8193-byte
file every second. If this file is written successfully, it is removed
immediately. If it is not written successfully, the disk is full. The
size 8193 was chosen because 8192 gave the wrong result on some file
systems, whereas 8193 did the correct thing on all tested filesystems.
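A sketch of a single check, assuming the hypothetical helper B<disk_has_room> is called once per second by its caller. A full disk may only be reported when buffered data is flushed, so the result of B<close> is checked as well as that of B<print>.

```perl

  use strict;
  use warnings;
  use File::Spec;
  use File::Temp qw(tempfile);

  # Try to write an 8193-byte file in $dir and remove it again.
  # Returns 1 if the write succeeded, 0 if the disk looks full.
  sub disk_has_room {
      my ($dir) = @_;
      my ($fh, $name) = eval { tempfile(DIR => $dir) };
      return 0 unless $fh;
      my $ok = print {$fh} "x" x 8193;
      # Errors such as ENOSPC may only show up when the buffer is flushed
      $ok = close($fh) && $ok;
      unlink $name;
      return $ok ? 1 : 0;
  }

  print disk_has_room(File::Spec->tmpdir) ? "room\n" : "full\n";

```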


=head2 Memory usage

Normally GNU B<parallel> will use around 17 MB RAM constantly - no
matter how many jobs or how much output there is. There are a few
things that cause the memory usage to rise:

=over 3

=item *

Multiple input sources. GNU B<parallel> reads an input source only
once. This is by design, as an input source can be a stream
(e.g. FIFO, pipe, standard input (stdin)) which cannot be rewound and
read again. When reading a single input source, the memory is freed as
soon as the job is done - thus keeping the memory usage constant.

But when reading multiple input sources GNU B<parallel> keeps the
already read values for generating all combinations with other input
sources.

=item *

Computing the number of jobs. B<--bar>, B<--eta>, and B<--halt xx%>
use B<total_jobs()> to compute the total number of jobs. It does this
by generating the data structures for all jobs. All these job data
structures will be stored in memory and take up around 400 bytes/job.

=item *

Buffering a full line. B<--linebuffer> will read a full line per
running job. A very long output line (say 1 GB without \n) will
increase RAM usage temporarily: From when the beginning of the line is
read till the line is printed.

=item *

Buffering the full output of a single job. This happens when using
B<--results *.csv/*.tsv> or B<--sql*>. Here GNU B<parallel> will read
the whole output of a single job and save it as csv/tsv or SQL.

=back


=head2 Argument separators ::: :::: :::+ ::::+

The argument separator B<:::> was chosen because I have never seen
B<:::> used in any command. The natural choice B<--> would be a bad
idea since it is not unlikely that the template command will contain
B<-->. I have seen B<::> used in programming languages to separate
classes, and I did not want the user to be confused that the separator
had anything to do with classes.

B<:::> also makes a visual separation, which is good if there are
multiple B<:::>.

When B<:::> was chosen, B<::::> came as a fairly natural extension.

Linking input sources meant having to decide on some way to indicate
linking of B<:::> and B<::::>. B<:::+> and B<::::+> were chosen, so
that they were similar to B<:::> and B<::::>.

In 2022 I realized that B<///> would have been an even better choice,
because you cannot have a file named B<///> whereas you I<can> have a
file named B<:::>.


=head2 Perl replacement strings, {= =}, and --rpl

The shorthands for replacement strings make a command look more
cryptic. Different users will need different replacement
strings. Instead of inventing more shorthands you get more
flexible replacement strings if they can be programmed by the user.

The language Perl was chosen because GNU B<parallel> is written in
Perl and it was easy and reasonably fast to run the code given by the
user.

If a user needs the same programmed replacement string again and
again, the user may want to make his own shorthand for it. This is
what B<--rpl> is for. It works so well that even GNU B<parallel>'s
own shorthands are implemented using B<--rpl>.

In Perl code the bigrams B<{=> and B<=}> rarely exist. They look like a
matching pair and can be entered on all keyboards. This made them good
candidates for enclosing the Perl expression in the replacement
strings. Another candidate ,, and ,, was rejected because they do not
look like a matching pair. B<--parens> was made, so that the users can
still use ,, and ,, if they like: B<--parens ,,,,>

Internally, however, the B<{=> and B<=}> are replaced by \257< and
\257>. This is to make it simpler to make regular expressions. You
only need to look one character ahead, and never have to look behind.
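A minimal sketch of why the internal delimiters make the matching simple. B<expand> is a hypothetical function: it supports only B<{=>...B<=}> with a single argument, and it does not handle the other replacement strings or B<--parens>.

```perl

  use strict;
  use warnings;

  # Expand {= perl expression =} in $template for the argument $arg.
  # The expression sees the argument in $_; the final $_ is inserted.
  sub expand {
      my ($template, $arg) = @_;
      # Turn the two-char delimiters into \257< and \257>
      # (\257 octal is chr(0xAF), written \x{AF} here)
      $template =~ s/\{=/\x{AF}</g;
      $template =~ s/=\}/\x{AF}>/g;
      # [^\x{AF}]* cannot run past the closing \257>, so no look-behind
      # and no non-greedy matching is needed
      $template =~ s{\x{AF}<([^\x{AF}]*)\x{AF}>}{
          my $code = $1;
          local $_ = $arg;
          eval $code;
          die $@ if $@;
          $_;
      }ge;
      return $template;
  }

  print expand('gunzip -c {= s/\.gz$// =} > {= $_ .= ".out" =}', "a.gz"),
      "\n";   # gunzip -c a > a.gz.out

```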


=head2 Test suite

GNU B<parallel> uses its own testing framework. This is mostly due to
historical reasons. It deals reasonably well with tests that are
dependent on how long a given test runs (e.g. more than 10 secs is a
pass, but less is a fail). It parallelizes most tests, but it is easy
to force a test to run as the single test (which may be important for
timing issues). It deals reasonably well with tests that fail
intermittently. It detects which tests failed and pushes these to the
top, so when running the test suite again, the tests that failed most
recently are run first.

If GNU B<parallel> were to adopt a real testing framework, then those
elements would be important.

Since many tests are dependent on which hardware it is running on,
these tests break when run on different hardware than what the test
was written for.

When a bug is fixed, a test is usually added, so the bug will not
reappear. It is, however, sometimes hard to create the environment in
which the bug shows up - especially if the bug only shows up
sometimes. One of the harder problems was to make a machine start
swapping without forcing it to its knees.


=head2 Median run time

Using a percentage for B<--timeout> causes GNU B<parallel> to compute
the median run time of a job. The median is a better indicator of the
expected run time than the average, because there will often be outliers
taking way longer than the normal run time.

To avoid keeping all run times in memory, an implementation of
remedian was made (Rousseeuw et al).
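A sketch of the remedian with base I<b>, assuming a simplified final estimate: the median of the highest non-empty buffer. A fuller version would also weigh the partly filled lower buffers. The function names are hypothetical. Memory use grows with the logarithm of the number of run times instead of linearly.

```perl

  use strict;
  use warnings;

  # Median of a list (lower middle element for even-sized lists)
  sub median {
      my @s = sort { $a <=> $b } @_;
      return $s[int($#s / 2)];
  }

  # A stack of buffers with $base slots each
  sub remedian_new { my ($base) = @_; return +{ base => $base, buf => [] } }

  # When a buffer is full, its median moves one level up and the
  # buffer is emptied
  sub remedian_add {
      my ($r, $x) = @_;
      for (my $level = 0; ; $level++) {
          my $buf = $r->{buf}[$level] ||= [];
          push @$buf, $x;
          return if @$buf < $r->{base};
          $x = median(@$buf);
          @$buf = ();
      }
  }

  # Simplified estimate: median of the highest non-empty buffer
  sub remedian_estimate {
      my ($r) = @_;
      for my $buf (reverse @{ $r->{buf} }) {
          return median(@$buf) if @$buf;
      }
      return undef;
  }

  my $r = remedian_new(3);
  remedian_add($r, $_) for 1 .. 27;
  print remedian_estimate($r), "\n";   # 14

```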


=head2 Error messages and warnings

Error messages like: ERROR, Not found, and 42 are not very
helpful. GNU B<parallel> strives to inform the user:

=over 2

=item *

What went wrong?

=item *

Why did it go wrong?

=item *

What can be done about it?

=back

Unfortunately it is not always possible to predict the root cause of
the error.


=head2 Determine number of CPUs

CPUs is an ambiguous term. It can mean the number of sockets filled
(i.e. the number of physical chips). It can mean the number of cores
(i.e. the number of physical compute cores). It can mean the number of
hyperthreaded cores (i.e. the number of virtual cores - with some of
them possibly being hyperthreaded).

On ark.intel.com Intel uses the terms I<cores> and I<threads> for
number of physical cores and the number of hyperthreaded cores
respectively.

GNU B<parallel> uses I<CPUs> as the number of compute units and
the terms I<sockets>, I<cores>, and I<threads> to specify how the
number of compute units is calculated.
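As an illustration of the three terms, this sketch counts sockets, cores and threads in Linux B</proc/cpuinfo> text. It assumes the x86 layout with B<physical id> and B<core id> fields (on other platforms these may be missing, giving 0), and B<count_cpus> is a hypothetical name. GNU B<parallel> itself uses different methods on different platforms.

```perl

  use strict;
  use warnings;

  # Count (sockets, cores, threads) in /proc/cpuinfo text
  sub count_cpus {
      my ($cpuinfo) = @_;
      my (%socket, %core, $phys);
      my $threads = 0;
      for (split /\n/, $cpuinfo) {
          if (/^processor\s*:/) {
              $threads++;                     # one entry per thread
          } elsif (/^physical id\s*:\s*(\d+)/) {
              $phys = $1;
              $socket{$1} = 1;
          } elsif (/^core id\s*:\s*(\d+)/ and defined $phys) {
              $core{"$phys:$1"} = 1;          # core ids repeat per socket
          }
      }
      return (scalar(keys %socket), scalar(keys %core), $threads);
  }

  if (open(my $fh, "<", "/proc/cpuinfo")) {
      local $/;
      printf "sockets=%d cores=%d threads=%d\n", count_cpus(scalar <$fh>);
  }

```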


=head2 Computation of load

Contrary to what one might expect, B<--load> does not use the load
average. This is because the load average rises too slowly. Instead it
uses B<ps> to list the number of threads in running or blocked state
(state D, O or R). This gives an instant load.

As remote calculation of load can be slow, a process is spawned to run
B<ps> and put the result in a file, which is then used next time.
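A sketch of counting the instant load from B<ps> output. It assumes one thread state per line, as printed on Linux by procps' B<ps -eL -o state>; B<instant_load> is a hypothetical name, and caching the result in a file is left out.

```perl

  use strict;
  use warnings;

  # Instant load: the number of threads whose state starts with D, O
  # or R (the "S" header line does not match)
  sub instant_load {
      my ($ps_output) = @_;
      return scalar grep { /^\s*[DOR]/ } split /\n/, $ps_output;
  }

  my $ps = qx{ps -eL -o state 2>/dev/null};
  print "instant load: ", instant_load($ps), "\n" if $? == 0;

```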


=head2 Killing jobs

GNU B<parallel> kills jobs. It can be due to B<--memfree>, B<--halt>,
or when GNU B<parallel> meets a condition from which it cannot
recover. Every job is started as its own process group. This way any
(grand)*children will get killed, too. The process group is killed
with the specification mentioned in B<--termseq>.


=head2 SQL interface

GNU B<parallel> uses the DBURL from GNU B<sql> to give database
software, username, password, host, port, database, and table in a
single string.

The DBURL must point to a table name. The table will be dropped and
created. The reason for not reusing an existing table is that the user
may have added more input sources which would require more columns in
the table. By prepending '+' to the DBURL the table will not be
dropped.
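A sketch of splitting a DBURL into its parts, including the leading '+'. The regexp is a simplified assumption about the format (I<vendor>://[I<user>[:I<password>]@][I<host>][:I<port>]/I<database>/I<table>), not GNU B<sql>'s own parser; URL escapes such as %2F are not decoded.

```perl

  use strict;
  use warnings;

  # Simplified DBURL parser (a sketch, not GNU sql's own code)
  sub parse_dburl {
      my ($url) = @_;
      my $keep = ($url =~ s/^\+//) ? 1 : 0;   # '+' = do not drop the table
      $url =~ m{^(\w+)://(?:([^:@/]*)(?::([^@/]*))?@)?([^:/]*)(?::(\d+))?/([^/]+)/([^/]+)\z}
          or die "Cannot parse DBURL: $url\n";
      return +{ keep => $keep, vendor => $1, user => $2, password => $3,
                host => $4, port => $5, database => $6, table => $7 };
  }

  my $db = parse_dburl('+mysql://user:pass@dbhost/mydb/jobs');
  print "$_=", $db->{$_} // "", "\n" for sort keys %$db;

```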

The table columns are similar to the joblog with the addition of B<V1>
.. B<Vn> which are values from the input sources, and Stdout and
Stderr which are the output from standard output and standard error,
respectively.

The Signal column has been renamed to _Signal due to Signal being a
reserved word in MySQL.


=head2 Logo

The logo is inspired by the Cafe Wall illusion. The font is DejaVu
Sans.

=head2 Citation notice

For details: See
https://git.savannah.gnu.org/cgit/parallel.git/tree/doc/citation-notice-faq.txt

Funding a free software project is hard. GNU B<parallel> is no
exception. On top of that it seems the less visible a project is, the
harder it is to get funding. And the nature of GNU B<parallel> is that
it will never be seen by "the guy with the checkbook", but only by the
people doing the actual work.

This problem has been covered by others - though no solution has been
found: https://www.slideshare.net/NadiaEghbal/consider-the-maintainer
https://www.numfocus.org/blog/why-is-numpy-only-now-getting-funded/

Before implementing the citation notice it was discussed with the
users:
https://lists.gnu.org/archive/html/parallel/2013-11/msg00006.html

Having to spend 10 seconds on running B<parallel --citation> once is
no doubt not an ideal solution, but no one has so far come up with an
ideal solution - neither for funding GNU B<parallel> nor other free
software.

If you believe you have the perfect solution, you should try it out,
and if it works, you should post it on the email list. Ideas that will
cost work and which have not been tested are, however, unlikely to be
prioritized.

Running B<parallel --citation> one single time takes less than 10
seconds, and will silence the citation notice for future runs. This is
comparable to graphical tools where you have to click a checkbox
saying "Do not show this again". But if that is too much trouble for
you, why not use one of the alternatives instead?  See a list in:
B<man parallel_alternatives>.

As the request for citation is not a legal requirement this is
acceptable under GPLv3 and cleared with Richard M. Stallman
himself. Thus it does not fall under this:
https://www.gnu.org/licenses/gpl-faq.en.html#RequireCitation


=head1 Ideas for new design

=head2 Multiple processes working together

Open3 is slow. Printing is slow. It would be good if they did not tie
up resources, but were run in separate threads.


=head2 --rrs on remote using a perl wrapper

Here I<$recstart> and I<$recend> stand for the values of
B<--recstart> and B<--recend>:

  ... | perl -pe 'BEGIN { $/ = $recend.$recstart } chomp;
          $. == 1 and s/^\Q$recstart\E//; eof and s/\Q$recend\E\z//'

It ought to be possible to write a filter that removes the record
separators on the fly instead of inside GNU B<parallel>. This could
then use more CPUs.

Will that require 2x record size memory?

Will that require 2x block size memory?


=head1 Historical decisions

These decisions were relevant for earlier versions of GNU B<parallel>,
but not the current version. They are kept here as historical record.


=head2 --tollef

You can read about the history of GNU B<parallel> on
https://www.gnu.org/software/parallel/history.html

B<--tollef> was included to make GNU B<parallel> switch-compatible
with the parallel from moreutils (which is made by Tollef Fog
Heen). This was done so that users of that parallel could easily port
their use to GNU B<parallel>: Simply set B<PARALLEL="--tollef"> and
that would be it.

But several distributions chose to make B<--tollef> global (by putting
it into /etc/parallel/config) without making the users aware of this,
and that caused much confusion when people tried out the examples from
GNU B<parallel>'s man page and these did not work. The users became
frustrated because the distribution did not make it clear to them that
it had made B<--tollef> global.

So to lessen the frustration and the resulting support, B<--tollef>
was obsoleted 20130222 and removed one year later.


=cut