.\" -*- mode: troff; coding: utf-8 -*-
.\" Automatically generated by Pod::Man 5.01 (Pod::Simple 3.43)
.\"
.\" Standard preamble:
.\" ========================================================================
.de Sp \" Vertical space (when we can't use .PP)
.if t .sp .5v
.if n .sp
..
.de Vb \" Begin verbatim text
.ft CW
.nf
.ne \\$1
..
.de Ve \" End verbatim text
.ft R
.fi
..
.\" \*(C` and \*(C' are quotes in nroff, nothing in troff, for use with C<>.
.ie n \{\
. ds C` ""
. ds C' ""
'br\}
.el\{\
. ds C`
. ds C'
'br\}
.\"
.\" Escape single quotes in literal strings from groff's Unicode transform.
.ie \n(.g .ds Aq \(aq
.el .ds Aq '
.\"
.\" If the F register is >0, we'll generate index entries on stderr for
.\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index
.\" entries marked with X<> in POD. Of course, you'll have to process the
.\" output yourself in some meaningful fashion.
.\"
.\" Avoid warning from groff about undefined register 'F'.
.de IX
..
.nr rF 0
.if \n(.g .if rF .nr rF 1
.if (\n(rF:(\n(.g==0)) \{\
. if \nF \{\
. de IX
. tm Index:\\$1\t\\n%\t"\\$2"
..
. if !\nF==2 \{\
. nr % 0
. nr F 2
. \}
. \}
.\}
.rr rF
.\" ========================================================================
.\"
.IX Title "PERLOPENTUT 1"
.TH PERLOPENTUT 1 2024-01-12 "perl v5.38.2" "Perl Programmers Reference Guide"
.\" For nroff, turn off justification. Always turn off hyphenation; it makes
.\" way too many mistakes in technical documents.
.if n .ad l
.nh
.SH NAME
perlopentut \- simple recipes for opening files and pipes in Perl
.SH DESCRIPTION
.IX Header "DESCRIPTION"
Whenever you do I/O on a file in Perl, you do so through what in Perl is
called a \fBfilehandle\fR. A filehandle is an internal name for an external
file. It is the job of the \f(CW\*(C`open\*(C'\fR function to make the association
between the internal name and the external name, and it is the job
of the \f(CW\*(C`close\*(C'\fR function to break that association.
.PP
For your convenience, Perl sets up a few special filehandles that are
already open when you run. These include \f(CW\*(C`STDIN\*(C'\fR, \f(CW\*(C`STDOUT\*(C'\fR, \f(CW\*(C`STDERR\*(C'\fR,
and \f(CW\*(C`ARGV\*(C'\fR. Since those are pre-opened, you can use them right away
without having to go to the trouble of opening them yourself:
.PP
.Vb 1
\& print STDERR "This is a debugging message.\en";
\&
\& print STDOUT "Please enter something: ";
\& $response = <STDIN> // die "how come no input?";
\& print STDOUT "Thank you!\en";
\&
\& while (<ARGV>) { ... }
.Ve
.PP
As you see from those examples, \f(CW\*(C`STDOUT\*(C'\fR and \f(CW\*(C`STDERR\*(C'\fR are output
handles, and \f(CW\*(C`STDIN\*(C'\fR and \f(CW\*(C`ARGV\*(C'\fR are input handles. They are
in all capital letters because they are reserved to Perl, much
like the \f(CW@ARGV\fR array and the \f(CW%ENV\fR hash are. Their external
associations were set up by your shell.
.PP
You will need to open every other filehandle on your own. Although there
are many variants, the most common way to call Perl's \fBopen()\fR function
is with three arguments and one return value:
.PP
\&\f(CW\*(C` \fR\f(CIOK\fR\f(CW = open(\fR\f(CIHANDLE\fR\f(CW, \fR\f(CIMODE\fR\f(CW, \fR\f(CIPATHNAME\fR\f(CW)\*(C'\fR
.PP
Where:
.IP \fIOK\fR 4
.IX Item "OK"
will be some defined value if the open succeeds, but
\&\f(CW\*(C`undef\*(C'\fR if it fails;
.IP \fIHANDLE\fR 4
.IX Item "HANDLE"
should be an undefined scalar variable to be filled in by the
\&\f(CW\*(C`open\*(C'\fR function if it succeeds;
.IP \fIMODE\fR 4
.IX Item "MODE"
is the access mode and the encoding format to open the file with;
.IP \fIPATHNAME\fR 4
.IX Item "PATHNAME"
is the external name of the file you want opened.
.PP
Most of the complexity of the \f(CW\*(C`open\*(C'\fR function lies in the many
possible values that the \fIMODE\fR parameter can take on.
.PP
One last thing before we show you how to open files: opening
files does not (usually) automatically lock them in Perl. See
perlfaq5 for how to lock.
.SH "Opening Text Files"
.IX Header "Opening Text Files"
.SS "Opening Text Files for Reading"
.IX Subsection "Opening Text Files for Reading"
If you want to read from a text file, first open it in
read-only mode like this:
.PP
.Vb 3
\& my $filename = "/some/path/to/a/textfile/goes/here";
\& my $encoding = ":encoding(UTF\-8)";
\& my $handle = undef; # this will be filled in on success
\&
\& open($handle, "< $encoding", $filename)
\& || die "$0: can\*(Aqt open $filename for reading: $!";
.Ve
.PP
As with the shell, in Perl the \f(CW"<"\fR is used to open the file in
read-only mode. If it succeeds, Perl allocates a brand new filehandle for
you and fills in your previously undefined \f(CW$handle\fR argument with a
reference to that handle.
.PP
Now you may use functions like \f(CW\*(C`readline\*(C'\fR, \f(CW\*(C`read\*(C'\fR, \f(CW\*(C`getc\*(C'\fR, and
\&\f(CW\*(C`sysread\*(C'\fR on that handle. Probably the most common input function
is the one that looks like an operator:
.PP
.Vb 2
\& $line = readline($handle);
\& $line = <$handle>; # same thing
.Ve
.PP
Because the \f(CW\*(C`readline\*(C'\fR function returns \f(CW\*(C`undef\*(C'\fR at end of file or
upon error, you will sometimes see it used this way:
.PP
.Vb 7
\& $line = <$handle>;
\& if (defined $line) {
\&     # do something with $line
\& }
\& else {
\&     # $line is not valid, so skip it
\& }
.Ve
.PP
You can also just quickly \f(CW\*(C`die\*(C'\fR on an undefined value this way:
.PP
.Vb 1
\& $line = <$handle> // die "no input found";
.Ve
.PP
However, if hitting EOF is an expected and normal event, you do not want to
exit simply because you have run out of input. Instead, you probably just want
to exit an input loop. You can then test to see if an actual error has caused
the loop to terminate, and act accordingly:
.PP
.Vb 6
\& while (<$handle>) {
\&     # do something with data in $_
\& }
\& if ($!) {
\&     die "unexpected error while reading from $filename: $!";
\& }
.Ve
.PP
\&\fBA Note on Encodings\fR: Having to specify the text encoding every time
might seem a bit of a bother. To set up a default encoding for \f(CW\*(C`open\*(C'\fR so
that you don't have to supply it each time, you can use the \f(CW\*(C`open\*(C'\fR pragma:
.PP
.Vb 1
\& use open qw< :encoding(UTF\-8) >;
.Ve
.PP
Once you've done that, you can safely omit the encoding part of the
open mode:
.PP
.Vb 2
\& open($handle, "<", $filename)
\& || die "$0: can\*(Aqt open $filename for reading: $!";
.Ve
.PP
But never use the bare \f(CW"<"\fR without having set up a default encoding
first. Otherwise, Perl cannot know which of the many, many, many possible
flavors of text file you have, and Perl will have no idea how to correctly
map the data in your file into actual characters it can work with. Other
common encoding formats include \f(CW"ASCII"\fR, \f(CW"ISO\-8859\-1"\fR,
\&\f(CW"ISO\-8859\-15"\fR, \f(CW"Windows\-1252"\fR, \f(CW"MacRoman"\fR, and even \f(CW"UTF\-16LE"\fR.
See perlunitut for more about encodings.
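.PP
Putting the pieces of this section together, here is a minimal sketch that
opens a text file with an explicit encoding, reads it one line at a time,
and closes it again. The sample file and its contents are made up for
illustration; it is created with the core \f(CW\*(C`File::Temp\*(C'\fR module only so
that the sketch runs on its own, and you would substitute your own
\&\f(CW$filename\fR:
.PP
.Vb 3
\& use strict;
\& use warnings;
\& use File::Temp qw(tempfile);
\&
\& # Hypothetical sample file, created only to make the sketch self\-contained.
\& my ($tmp_fh, $filename) = tempfile(UNLINK => 1);
\& binmode($tmp_fh, ":encoding(UTF\-8)") || die "cannot binmode temp file";
\& print {$tmp_fh} "first\ensecond\enthird\en";
\& close($tmp_fh) || die "couldn\*(Aqt close $filename: $!";
\&
\& open(my $handle, "< :encoding(UTF\-8)", $filename)
\&     || die "$0: can\*(Aqt open $filename for reading: $!";
\&
\& my @lines;
\& while (my $line = <$handle>) {
\&     chomp $line;              # strip the trailing newline
\&     push @lines, $line;
\& }
\& close($handle) || die "couldn\*(Aqt close $filename: $!";
\&
\& print scalar(@lines), " lines read\en";
.Ve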
.SS "Opening Text Files for Writing"
.IX Subsection "Opening Text Files for Writing"
When you want to write to a file, you first have to decide what to do about
any existing contents of that file. You have two basic choices here: to
preserve or to clobber.
.PP
If you want to preserve any existing contents, then you want to open the file
in append mode. As in the shell, in Perl you use \f(CW">>"\fR to open an
existing file in append mode. \f(CW">>"\fR creates the file if it does not
already exist.
.PP
.Vb 3
\& my $handle = undef;
\& my $filename = "/some/path/to/a/textfile/goes/here";
\& my $encoding = ":encoding(UTF\-8)";
\&
\& open($handle, ">> $encoding", $filename)
\& || die "$0: can\*(Aqt open $filename for appending: $!";
.Ve
.PP
Now you can write to that filehandle using any of \f(CW\*(C`print\*(C'\fR, \f(CW\*(C`printf\*(C'\fR,
\&\f(CW\*(C`say\*(C'\fR, \f(CW\*(C`write\*(C'\fR, or \f(CW\*(C`syswrite\*(C'\fR.
.PP
As noted above, if the file does not already exist, then the append-mode open
will create it for you. But if the file does already exist, its contents are
safe from harm because you will be adding your new text past the end of the
old text.
.PP
On the other hand, sometimes you want to clobber whatever might already be
there. To empty out a file before you start writing to it, you can open it
in write-only mode:
.PP
.Vb 3
\& my $handle = undef;
\& my $filename = "/some/path/to/a/textfile/goes/here";
\& my $encoding = ":encoding(UTF\-8)";
\&
\& open($handle, "> $encoding", $filename)
\& || die "$0: can\*(Aqt open $filename in write\-only mode: $!";
.Ve
.PP
Here again Perl works just like the shell in that the \f(CW">"\fR clobbers
an existing file.
.PP
As with the append mode, when you open a file in write-only mode,
you can now write to that filehandle using any of \f(CW\*(C`print\*(C'\fR, \f(CW\*(C`printf\*(C'\fR,
\&\f(CW\*(C`say\*(C'\fR, \f(CW\*(C`write\*(C'\fR, or \f(CW\*(C`syswrite\*(C'\fR.
.PP
What about read-write mode? You should probably pretend it doesn't exist,
because opening text files in read-write mode is unlikely to do what you
would like. See perlfaq5 for details.
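.PP
As a rough illustration of the difference between the two write modes, the
sketch below first clobbers a file and then appends to it. The scratch
directory and the name \f(CW\*(C`example.txt\*(C'\fR are assumptions made for the example,
not anything Perl requires. Note that it also checks the result of
\&\f(CW\*(C`close\*(C'\fR, since buffered output may not reach the file until then:
.PP
.Vb 3
\& use strict;
\& use warnings;
\& use File::Temp qw(tempdir);
\&
\& # Hypothetical scratch location, so the sketch touches no real files.
\& my $dir      = tempdir(CLEANUP => 1);
\& my $filename = "$dir/example.txt";
\& my $encoding = ":encoding(UTF\-8)";
\&
\& # ">" creates the file, or empties it if it already exists.
\& open(my $write_fh, "> $encoding", $filename)
\&     || die "$0: can\*(Aqt open $filename for writing: $!";
\& print {$write_fh} "first line\en"
\&     or die "couldn\*(Aqt write to $filename: $!";
\& close($write_fh) || die "couldn\*(Aqt close $filename: $!";
\&
\& # ">>" keeps what is already there and adds the new text after it.
\& open(my $append_fh, ">> $encoding", $filename)
\&     || die "$0: can\*(Aqt open $filename for appending: $!";
\& print {$append_fh} "second line\en"
\&     or die "couldn\*(Aqt write to $filename: $!";
\& close($append_fh) || die "couldn\*(Aqt close $filename: $!";
.Ve
.PP
After both steps the file should hold two lines; had the second \f(CW\*(C`open\*(C'\fR
used \f(CW">"\fR instead, only the second line would remain.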
.SH "Opening Binary Files"
.IX Header "Opening Binary Files"
If the file to be opened contains binary data instead of text characters,
then the \f(CW\*(C`MODE\*(C'\fR argument to \f(CW\*(C`open\*(C'\fR is a little different. Instead of
specifying the encoding, you tell Perl that your data are in raw bytes.
.PP
.Vb 3
\& my $filename = "/some/path/to/a/binary/file/goes/here";
\& my $encoding = ":raw :bytes";
\& my $handle = undef; # this will be filled in on success
.Ve
.PP
And then open as before, choosing \f(CW"<"\fR, \f(CW">>"\fR, or
\&\f(CW">"\fR as needed:
.PP
.Vb 2
\& open($handle, "< $encoding", $filename)
\& || die "$0: can\*(Aqt open $filename for reading: $!";
\&
\& open($handle, ">> $encoding", $filename)
\& || die "$0: can\*(Aqt open $filename for appending: $!";
\&
\& open($handle, "> $encoding", $filename)
\& || die "$0: can\*(Aqt open $filename in write\-only mode: $!";
.Ve
.PP
Alternatively, you can change to binary mode on an existing handle this way:
.PP
.Vb 1
\& binmode($handle) || die "cannot binmode handle";
.Ve
.PP
This is especially handy for the handles that Perl has already opened for you.
.PP
.Vb 2
\& binmode(STDIN) || die "cannot binmode STDIN";
\& binmode(STDOUT) || die "cannot binmode STDOUT";
.Ve
.PP
You can also pass \f(CW\*(C`binmode\*(C'\fR an explicit encoding to change it on the fly.
This isn't exactly "binary" mode, but we still use \f(CW\*(C`binmode\*(C'\fR to do it:
.PP
.Vb 2
\& binmode(STDIN, ":encoding(MacRoman)") || die "cannot binmode STDIN";
\& binmode(STDOUT, ":encoding(UTF\-8)") || die "cannot binmode STDOUT";
.Ve
.PP
Once you have your binary file properly opened in the right mode, you can
use all the same Perl I/O functions as you used on text files. However,
you may wish to use the fixed-size \f(CW\*(C`read\*(C'\fR instead of the variable-sized
\&\f(CW\*(C`readline\*(C'\fR for your input.
.PP
Here's an example of how to copy a binary file:
.PP
.Vb 3
\& my $BUFSIZ = 64 * (2 ** 10);
\& my $name_in = "/some/input/file";
\& my $name_out = "/some/output/file";
\&
\& my($in_fh, $out_fh, $buffer);
\&
\& open($in_fh, "<", $name_in)
\&     || die "$0: cannot open $name_in for reading: $!";
\& open($out_fh, ">", $name_out)
\&     || die "$0: cannot open $name_out for writing: $!";
\&
\& for my $fh ($in_fh, $out_fh) {
\&     binmode($fh) || die "binmode failed";
\& }
\&
\& while (read($in_fh, $buffer, $BUFSIZ)) {
\&     unless (print $out_fh $buffer) {
\&         die "couldn\*(Aqt write to $name_out: $!";
\&     }
\& }
\&
\& close($in_fh) || die "couldn\*(Aqt close $name_in: $!";
\& close($out_fh) || die "couldn\*(Aqt close $name_out: $!";
.Ve
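.PP
One thing a loop of the form \f(CW\*(C`while (read(...))\*(C'\fR cannot do is tell
end of file apart from a failed read: \f(CW\*(C`read\*(C'\fR returns 0 at end of file
but \f(CW\*(C`undef\*(C'\fR on error, and both are false. The following is a sketch of
one way to separate the two cases. The helper name \f(CW\*(C`copy_stream\*(C'\fR is
hypothetical, and in-memory filehandles (opened on references to scalars)
stand in for real files so that the example runs on its own:
.PP
.Vb 2
\& use strict;
\& use warnings;
\&
\& my $BUFSIZ = 64 * (2 ** 10);
\&
\& # Hypothetical helper: copy from one binary handle to another and
\& # return the number of bytes copied.
\& sub copy_stream {
\&     my ($in_fh, $out_fh) = @_;
\&     my $total = 0;
\&     while (1) {
\&         my $got = read($in_fh, my $buffer, $BUFSIZ);
\&         die "read failed: $!" unless defined $got;   # undef: an error
\&         last if $got == 0;                           # 0: end of file
\&         print {$out_fh} $buffer or die "write failed: $!";
\&         $total += $got;
\&     }
\&     return $total;
\& }
\&
\& # In\-memory handles stand in for real files in this sketch.
\& my $data = "abc" x 1000;
\& my $copy = "";
\& open(my $in_fh, "< :raw", \e$data)  || die "cannot open input: $!";
\& open(my $out_fh, "> :raw", \e$copy) || die "cannot open output: $!";
\&
\& my $bytes = copy_stream($in_fh, $out_fh);
\&
\& close($in_fh)  || die "couldn\*(Aqt close input: $!";
\& close($out_fh) || die "couldn\*(Aqt close output: $!";
\& print "$bytes bytes copied\en";
.Ve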
.SH "Opening Pipes"
.IX Header "Opening Pipes"
Perl also lets you open a filehandle into an external program or shell
command rather than into a file. You can do this in order to pass data
from your Perl program to an external command for further processing, or
to receive data from another program for your own Perl program to
process.
.PP
Filehandles into commands are also known as \fIpipes\fR, since they work on
the same inter-process communication principles as Unix pipelines. Such a
filehandle has an active program instead of a static file on its
external end, but in every other sense it works just like a more typical
file-based filehandle, with all the techniques discussed earlier in this
article just as applicable.
.PP
As such, you open a pipe using the same \f(CW\*(C`open\*(C'\fR call that you use for
opening files, setting the second (\f(CW\*(C`MODE\*(C'\fR) argument to special
characters that indicate either an input or an output pipe. Use \f(CW"\-|"\fR for a
filehandle that will let your Perl program read data from an external
program, and \f(CW"|\-"\fR for a filehandle that will send data to that
program instead.
.SS "Opening a pipe for reading"
.IX Subsection "Opening a pipe for reading"
Let's say you'd like your Perl program to process data stored in a nearby
directory called \f(CW\*(C`unsorted\*(C'\fR, which contains a number of text files.
You'd also like your program to sort all the contents from these files
into a single, alphabetically sorted list of unique lines before it
starts processing them.
.PP
You could do this through opening an ordinary filehandle into each of
those files, gradually building up an in-memory array of all the file
contents you load this way, and finally sorting and filtering that array
when you've run out of files to load. \fIOr\fR, you could offload all that
merging and sorting into your operating system's own \f(CW\*(C`sort\*(C'\fR command by
opening a pipe directly into its output, and get to work that much
faster.
.PP
Here's how that might look:
.PP
.Vb 2
\& open(my $sort_fh, \*(Aq\-|\*(Aq, \*(Aqsort \-u unsorted/*.txt\*(Aq)
\& or die "Couldn\*(Aqt open a pipe into sort: $!";
\&
\& # And right away, we can start reading sorted lines:
\& while (my $line = <$sort_fh>) {
\&     #
\&     # ... Do something interesting with each $line here ...
\&     #
\& }
.Ve
.PP
The second argument to \f(CW\*(C`open\*(C'\fR, \f(CW"\-|"\fR, makes it a read-pipe into a
separate program, rather than an ordinary filehandle into a file.
.PP
Note that the third argument to \f(CW\*(C`open\*(C'\fR is a string containing the
program name (\f(CW\*(C`sort\*(C'\fR) plus all its arguments: in this case, \f(CW\*(C`\-u\*(C'\fR to
specify a unique sort, and then a fileglob specifying the files to sort.
The resulting filehandle \f(CW$sort_fh\fR works just like a read-only
(\f(CW"<"\fR) filehandle, and your program can subsequently read data
from it as if it were opened onto an ordinary, single file.
.SS "Opening a pipe for writing"
.IX Subsection "Opening a pipe for writing"
Continuing the previous example, let's say that your program has
completed its processing, and the results sit in an array called
\&\f(CW@processed\fR. You want to print these lines to a file called
\&\f(CW\*(C`numbered.txt\*(C'\fR with a neatly formatted column of line-numbers.
.PP
Certainly you could write your own code to do this — or, once again,
you could kick that work over to another program. In this case, \f(CW\*(C`cat\*(C'\fR,
running with its own \f(CW\*(C`\-n\*(C'\fR option to activate line numbering, should do
the trick:
.PP
.Vb 2
\& open(my $cat_fh, \*(Aq|\-\*(Aq, \*(Aqcat \-n > numbered.txt\*(Aq)
\& or die "Couldn\*(Aqt open a pipe into cat: $!";
\&
\& for my $line (@processed) {
\&     print $cat_fh $line;
\& }
.Ve
.PP
Here, we use a second \f(CW\*(C`open\*(C'\fR argument of \f(CW"|\-"\fR, signifying that the
filehandle assigned to \f(CW$cat_fh\fR should be a write-pipe. We can then
use it just as we would a write-only ordinary filehandle, including the
basic function of \f(CW\*(C`print\*(C'\fR\-ing data to it.
.PP
Note that the third argument, specifying the command that we wish to
pipe to, sets up \f(CW\*(C`cat\*(C'\fR to redirect its output via that \f(CW">"\fR
symbol into the file \f(CW\*(C`numbered.txt\*(C'\fR. This can start to look a little
tricky, because that same symbol would have meant something
entirely different had it shown up in the second argument to \f(CW\*(C`open\*(C'\fR!
But here in the third argument, it's simply part of the shell command that
Perl will open the pipe into, and Perl itself doesn't attach any special
meaning to it.
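.PP
When you have finished writing, it is worth closing the pipe explicitly.
For a filehandle that came from a piped \f(CW\*(C`open\*(C'\fR, \f(CW\*(C`close\*(C'\fR waits for the
command to exit, places its status in \f(CW$?\fR, and returns false if the
command exited with a non-zero status. Here is a minimal sketch, assuming a
Unix-like system; the sample data and the \f(CW\*(C`cat > /dev/null\*(C'\fR command are
stand-ins chosen so that the example leaves no files behind:
.PP
.Vb 2
\& use strict;
\& use warnings;
\&
\& # Hypothetical data standing in for the results of earlier processing.
\& my @processed = ("alpha\en", "beta\en");
\&
\& # "cat > /dev/null" is a stand\-in command, so the sketch writes no files.
\& open(my $cat_fh, \*(Aq|\-\*(Aq, \*(Aqcat > /dev/null\*(Aq)
\&     or die "Couldn\*(Aqt open a pipe into cat: $!";
\&
\& for my $line (@processed) {
\&     print $cat_fh $line;
\& }
\&
\& # close() waits for the command to exit and stores its status in $?.
\& unless (close($cat_fh)) {
\&     die $! ? "Error closing the pipe into cat: $!"
\&            : "cat exited with status " . ($? >> 8);
\& }
.Ve
.PP
If \f(CW\*(C`close\*(C'\fR fails only because the command exited with a non-zero status,
\&\f(CW$!\fR is set to 0, which is why the sketch tests it before choosing a
message.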
.SS "Expressing the command as a list"
.IX Subsection "Expressing the command as a list"
For opening pipes, Perl offers the option to call \f(CW\*(C`open\*(C'\fR with a list
comprising the desired command and all its own arguments as separate
elements, rather than combining them into a single string as in the
examples above. For instance, we could have phrased the \f(CW\*(C`open\*(C'\fR call in
the first example like this:
.PP
.Vb 2
\& open(my $sort_fh, \*(Aq\-|\*(Aq, \*(Aqsort\*(Aq, \*(Aq\-u\*(Aq, glob(\*(Aqunsorted/*.txt\*(Aq))
\& or die "Couldn\*(Aqt open a pipe into sort: $!";
.Ve
.PP
When you call \f(CW\*(C`open\*(C'\fR this way, Perl invokes the given command directly,
bypassing the shell. As such, the shell won't try to interpret any
special characters within the command's argument list, which might
otherwise have unwanted effects. This can make for safer, less
error-prone \f(CW\*(C`open\*(C'\fR calls, useful in cases such as passing in variables
as arguments, or even just referring to filenames with spaces in them.
.PP
However, when you \fIdo\fR want to pass a meaningful metacharacter to the
shell, such as the \f(CW"*"\fR inside that final \f(CW\*(C`unsorted/*.txt\*(C'\fR argument
here, you can't use this alternate syntax. In this case, we have worked
around it via Perl's handy \f(CW\*(C`glob\*(C'\fR built-in function, which evaluates
its argument into a list of filenames — and we can safely pass that
resulting list right into \f(CW\*(C`open\*(C'\fR, as shown above.
.PP
Note also that representing piped-command arguments in list form like
this doesn't work on every platform. It will work on any Unix-based OS
that provides a real \f(CW\*(C`fork\*(C'\fR function (e.g. macOS or Linux), as well as
on Windows when running Perl 5.22 or later.
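.PP
To illustrate the point about filenames with spaces, here is a small
sketch, assuming a Unix-like system with a \f(CW\*(C`sort\*(C'\fR command on the path.
The scratch directory and the filename are invented for the example.
Because each list element reaches \f(CW\*(C`sort\*(C'\fR as a single argument, the
spaces in \f(CW$file\fR need no quoting or escaping:
.PP
.Vb 3
\& use strict;
\& use warnings;
\& use File::Temp qw(tempdir);
\&
\& # Hypothetical input file whose name contains spaces.
\& my $dir  = tempdir(CLEANUP => 1);
\& my $file = "$dir/name with spaces.txt";
\&
\& open(my $out_fh, "> :encoding(UTF\-8)", $file)
\&     || die "$0: can\*(Aqt open $file for writing: $!";
\& print {$out_fh} "b\ena\enb\en";
\& close($out_fh) || die "couldn\*(Aqt close $file: $!";
\&
\& # List form: the command and each argument are separate elements.
\& open(my $sort_fh, \*(Aq\-|\*(Aq, \*(Aqsort\*(Aq, \*(Aq\-u\*(Aq, $file)
\&     or die "Couldn\*(Aqt open a pipe into sort: $!";
\& my @sorted = <$sort_fh>;
\& close($sort_fh) or die "sort did not finish cleanly";
\&
\& print @sorted;
.Ve
.PP
Had the same command been written as the single string
\&\f(CW"sort \-u $file"\fR, the shell would have split the filename at each space
and handed \f(CW\*(C`sort\*(C'\fR three separate, nonexistent names.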
.SH "SEE ALSO"
.IX Header "SEE ALSO"
The full documentation for \f(CW\*(C`open\*(C'\fR
provides a thorough reference to this function, beyond the best-practice
basics covered here.
.SH "AUTHOR and COPYRIGHT"
.IX Header "AUTHOR and COPYRIGHT"
Copyright 2013 Tom Christiansen; now maintained by Perl5 Porters
.PP
This documentation is free; you can redistribute it and/or modify it under
the same terms as Perl itself.