Diffstat (limited to 'upstream/fedora-rawhide/man1/perlguts.1')
-rw-r--r-- upstream/fedora-rawhide/man1/perlguts.1 4465
1 files changed, 4465 insertions, 0 deletions
diff --git a/upstream/fedora-rawhide/man1/perlguts.1 b/upstream/fedora-rawhide/man1/perlguts.1
new file mode 100644
index 00000000..fe69a378
--- /dev/null
+++ b/upstream/fedora-rawhide/man1/perlguts.1
@@ -0,0 +1,4465 @@
+.\" -*- mode: troff; coding: utf-8 -*-
+.\" Automatically generated by Pod::Man 5.01 (Pod::Simple 3.43)
+.\"
+.\" Standard preamble:
+.\" ========================================================================
+.de Sp \" Vertical space (when we can't use .PP)
+.if t .sp .5v
+.if n .sp
+..
+.de Vb \" Begin verbatim text
+.ft CW
+.nf
+.ne \\$1
+..
+.de Ve \" End verbatim text
+.ft R
+.fi
+..
+.\" \*(C` and \*(C' are quotes in nroff, nothing in troff, for use with C<>.
+.ie n \{\
+. ds C` ""
+. ds C' ""
+'br\}
+.el\{\
+. ds C`
+. ds C'
+'br\}
+.\"
+.\" Escape single quotes in literal strings from groff's Unicode transform.
+.ie \n(.g .ds Aq \(aq
+.el .ds Aq '
+.\"
+.\" If the F register is >0, we'll generate index entries on stderr for
+.\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index
+.\" entries marked with X<> in POD. Of course, you'll have to process the
+.\" output yourself in some meaningful fashion.
+.\"
+.\" Avoid warning from groff about undefined register 'F'.
+.de IX
+..
+.nr rF 0
+.if \n(.g .if rF .nr rF 1
+.if (\n(rF:(\n(.g==0)) \{\
+. if \nF \{\
+. de IX
+. tm Index:\\$1\t\\n%\t"\\$2"
+..
+. if !\nF==2 \{\
+. nr % 0
+. nr F 2
+. \}
+. \}
+.\}
+.rr rF
+.\" ========================================================================
+.\"
+.IX Title "PERLGUTS 1"
+.TH PERLGUTS 1 2024-01-25 "perl v5.38.2" "Perl Programmers Reference Guide"
+.\" For nroff, turn off justification. Always turn off hyphenation; it makes
+.\" way too many mistakes in technical documents.
+.if n .ad l
+.nh
+.SH NAME
+perlguts \- Introduction to the Perl API
+.SH DESCRIPTION
+.IX Header "DESCRIPTION"
+This document attempts to describe how to use the Perl API, as well as
+to provide some info on the basic workings of the Perl core. It is far
+from complete and probably contains many errors. Please refer any
+questions or comments to the author below.
+.SH Variables
+.IX Header "Variables"
+.SS Datatypes
+.IX Subsection "Datatypes"
+Perl has three typedefs that handle Perl's three main data types:
+.PP
+.Vb 3
+\& SV Scalar Value
+\& AV Array Value
+\& HV Hash Value
+.Ve
+.PP
+Each typedef has specific routines that manipulate the various data types.
+.SS "What is an ""IV""?"
+.IX Subsection "What is an ""IV""?"
+Perl uses a special typedef IV which is a simple signed integer type that is
+guaranteed to be large enough to hold a pointer (as well as an integer).
+Additionally, there is the UV, which is simply an unsigned IV.
+.PP
+Perl also uses several special typedefs to declare variables to hold
+integers of (at least) a given size.
+Use I8, I16, I32, and I64 to declare a signed integer variable which has
+at least as many bits as the number in its name. These all evaluate to
+the native C type that is closest to the given number of bits, but no
+smaller than that number. For example, on many platforms, a \f(CW\*(C`short\*(C'\fR is
+16 bits long, and if so, I16 will evaluate to a \f(CW\*(C`short\*(C'\fR. But on
+platforms where a \f(CW\*(C`short\*(C'\fR isn't exactly 16 bits, Perl will use the
+smallest type that contains 16 bits or more.
+.PP
+U8, U16, U32, and U64 are to declare the corresponding unsigned integer
+types.
+.PP
+If the platform doesn't support 64\-bit integers, both I64 and U64 will
+be undefined. Use IV and UV to declare the largest practicable, and
+\&\f(CW\*(C`"WIDEST_UTYPE" in perlapi\*(C'\fR for the absolute maximum unsigned, but which
+may not be usable in all circumstances.
+.PP
+A numeric constant can be specified with "\f(CW\*(C`INT16_C\*(C'\fR" in perlapi,
+"\f(CW\*(C`UINTMAX_C\*(C'\fR" in perlapi, and similar.
+.SS "Working with SVs"
+.IX Subsection "Working with SVs"
+An SV can be created and loaded with one command. There are five types of
+values that can be loaded: an integer value (IV), an unsigned integer
+value (UV), a double (NV), a string (PV), and another scalar (SV).
+("PV" stands for "Pointer Value". You might think that it is misnamed
+because it is described as pointing only to strings. However, it is
+possible to have it point to other things. For example, it could point
+to an array of UVs. But,
+using it for non-strings requires care, as the underlying assumption of
+much of the internals is that PVs are just for strings. Often, for
+example, a trailing \f(CW\*(C`NUL\*(C'\fR is tacked on automatically. The non-string use
+is documented only in this paragraph.)
+.PP
+The seven routines are:
+.PP
+.Vb 7
+\& SV* newSViv(IV);
+\& SV* newSVuv(UV);
+\& SV* newSVnv(double);
+\& SV* newSVpv(const char*, STRLEN);
+\& SV* newSVpvn(const char*, STRLEN);
+\& SV* newSVpvf(const char*, ...);
+\& SV* newSVsv(SV*);
+.Ve
+.PP
+\&\f(CW\*(C`STRLEN\*(C'\fR is an integer type (\f(CW\*(C`Size_t\*(C'\fR, usually defined as \f(CW\*(C`size_t\*(C'\fR in
+\&\fIconfig.h\fR) guaranteed to be large enough to represent the size of
+any string that perl can handle.
+.PP
+In the unlikely case of an SV requiring more complex initialization, you
+can create an empty SV with newSV(len). If \f(CW\*(C`len\*(C'\fR is 0 an empty SV of
+type NULL is returned, else an SV of type PV is returned with len + 1 (for
+the \f(CW\*(C`NUL\*(C'\fR) bytes of storage allocated, accessible via SvPVX. In both cases
+the SV has the undef value.
+.PP
+.Vb 3
+\& SV *sv = newSV(0); /* no storage allocated */
+\& SV *sv = newSV(10); /* 10 (+1) bytes of uninitialised storage
+\& * allocated */
+.Ve
+.PP
+To change the value of an \fIalready-existing\fR SV, there are eight routines:
+.PP
+.Vb 9
+\& void sv_setiv(SV*, IV);
+\& void sv_setuv(SV*, UV);
+\& void sv_setnv(SV*, double);
+\& void sv_setpv(SV*, const char*);
+\& void sv_setpvn(SV*, const char*, STRLEN);
+\& void sv_setpvf(SV*, const char*, ...);
+\& void sv_vsetpvfn(SV*, const char*, STRLEN, va_list *,
+\& SV **, Size_t, bool *);
+\& void sv_setsv(SV*, SV*);
+.Ve
+.PP
+Notice that you can choose to specify the length of the string to be
+assigned by using \f(CW\*(C`sv_setpvn\*(C'\fR, \f(CW\*(C`newSVpvn\*(C'\fR, or \f(CW\*(C`newSVpv\*(C'\fR, or you may
+allow Perl to calculate the length by using \f(CW\*(C`sv_setpv\*(C'\fR or by specifying
+0 as the second argument to \f(CW\*(C`newSVpv\*(C'\fR. Be warned, though, that Perl will
+determine the string's length by using \f(CW\*(C`strlen\*(C'\fR, which depends on the
+string terminating with a \f(CW\*(C`NUL\*(C'\fR character, and not otherwise containing
+NULs.
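+.PP
+The difference between a measured and an explicit length can be seen
+without any Perl code at all. The following plain C sketch uses a
+hypothetical \f(CW\*(C`payload\*(C'\fR array with a NUL in the middle; it only
+illustrates how \f(CW\*(C`strlen\*(C'\fR behaves and does not call the Perl API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Five payload bytes with a NUL in the middle, followed by the
 * terminating NUL that the string literal adds. */
static const char payload[] = "ab\0cd";

/* Length as strlen sees it: counting stops at the first NUL. */
static size_t measured_length(void) { return strlen(payload); }

/* Length known to the caller: every byte except the final terminator. */
static size_t explicit_length(void) { return sizeof(payload) - 1; }

/* The measured length silently drops everything after the embedded NUL. */
static void show_difference(void)
{
    assert(measured_length() < explicit_length());
}
```

+.PP
+Data like this needs one of the routines that take an explicit length.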
+.PP
+The arguments of \f(CW\*(C`sv_setpvf\*(C'\fR are processed like \f(CW\*(C`sprintf\*(C'\fR, and the
+formatted output becomes the value.
+.PP
+\&\f(CW\*(C`sv_vsetpvfn\*(C'\fR is an analogue of \f(CW\*(C`vsprintf\*(C'\fR, but it allows you to specify
+either a pointer to a variable argument list or the address and length of
+an array of SVs. The last argument points to a boolean; on return, if that
+boolean is true, then locale-specific information has been used to format
+the string, and the string's contents are therefore untrustworthy (see
+perlsec). This pointer may be NULL if that information is not
+important. Note that this function requires you to specify the length of
+the format.
+.PP
+The \f(CW\*(C`sv_set*()\*(C'\fR functions are not generic enough to operate on values
+that have "magic". See "Magic Virtual Tables" later in this document.
+.PP
+All SVs that contain strings should be terminated with a \f(CW\*(C`NUL\*(C'\fR character.
+If a string is not \f(CW\*(C`NUL\*(C'\fR\-terminated there is a risk of
+core dumps and corruptions from code which passes the string to C
+functions or system calls which expect a \f(CW\*(C`NUL\*(C'\fR\-terminated string.
+Perl's own functions typically add a trailing \f(CW\*(C`NUL\*(C'\fR for this reason.
+Nevertheless, you should be very careful when you pass a string stored
+in an SV to a C function or system call.
+.PP
+To access the actual value that an SV points to, Perl's API exposes
+several macros that coerce the actual scalar type into an IV, UV, double,
+or string:
+.IP \(bu 4
+\&\f(CWSvIV(SV*)\fR (\f(CW\*(C`IV\*(C'\fR) and \f(CWSvUV(SV*)\fR (\f(CW\*(C`UV\*(C'\fR)
+.IP \(bu 4
+\&\f(CWSvNV(SV*)\fR (\f(CW\*(C`double\*(C'\fR)
+.IP \(bu 4
+Strings are a bit complicated:
+.RS 4
+.IP \(bu 4
+Byte string: \f(CW\*(C`SvPVbyte(SV*, STRLEN len)\*(C'\fR or \f(CWSvPVbyte_nolen(SV*)\fR
+.Sp
+If the Perl string is \f(CW"\exff\exff"\fR, then this returns a 2\-byte \f(CW\*(C`char*\*(C'\fR.
+.Sp
+This is suitable for Perl strings that represent bytes.
+.IP \(bu 4
+UTF\-8 string: \f(CW\*(C`SvPVutf8(SV*, STRLEN len)\*(C'\fR or \f(CWSvPVutf8_nolen(SV*)\fR
+.Sp
+If the Perl string is \f(CW"\exff\exff"\fR, then this returns a 4\-byte \f(CW\*(C`char*\*(C'\fR.
+.Sp
+This is suitable for Perl strings that represent characters.
+.Sp
+\&\fBCAVEAT\fR: That \f(CW\*(C`char*\*(C'\fR will be encoded via Perl's internal UTF\-8 variant,
+which means that if the SV contains non-Unicode code points (e.g.,
+0x110000), then the result may contain extensions over valid UTF\-8.
+See "is_strict_utf8_string" in perlapi for some methods Perl gives
+you to check the UTF\-8 validity of these macros' returns.
+.IP \(bu 4
+You can also use \f(CW\*(C`SvPV(SV*, STRLEN len)\*(C'\fR or \f(CWSvPV_nolen(SV*)\fR
+to fetch the SV's raw internal buffer. This is tricky, though; if your Perl
+string
+is \f(CW"\exff\exff"\fR, then depending on the SV's internal encoding you might get
+back a 2\-byte \fBOR\fR a 4\-byte \f(CW\*(C`char*\*(C'\fR.
+Moreover, if it's the 4\-byte string, that could come from either Perl
+\&\f(CW"\exff\exff"\fR stored UTF\-8 encoded, or Perl \f(CW"\exc3\exbf\exc3\exbf"\fR stored
+as raw octets. To differentiate between these you \fBMUST\fR look up the
+SV's UTF8 bit (cf. \f(CW\*(C`SvUTF8\*(C'\fR) to know whether the source Perl string
+is 2 characters (\f(CW\*(C`SvUTF8\*(C'\fR would be on) or 4 characters (\f(CW\*(C`SvUTF8\*(C'\fR would be
+off).
+.Sp
+\&\fBIMPORTANT:\fR Use of \f(CW\*(C`SvPV\*(C'\fR, \f(CW\*(C`SvPV_nolen\*(C'\fR, or
+similarly-named macros \fIwithout\fR looking up the SV's UTF8 bit is
+almost certainly a bug if non-ASCII input is allowed.
+.Sp
+When the UTF8 bit is on, the same \fBCAVEAT\fR about UTF\-8 validity applies
+here as for \f(CW\*(C`SvPVutf8\*(C'\fR.
+.RE
+.RS 4
+.Sp
+(See "How do I pass a Perl string to a C library?" for more details.)
+.Sp
+In \f(CW\*(C`SvPVbyte\*(C'\fR, \f(CW\*(C`SvPVutf8\*(C'\fR, and \f(CW\*(C`SvPV\*(C'\fR, the length of the \f(CW\*(C`char*\*(C'\fR returned
+is placed into the
+variable \f(CW\*(C`len\*(C'\fR (these are macros, so you do \fInot\fR use \f(CW&len\fR). If you do
+not care what the length of the data is, use \f(CW\*(C`SvPVbyte_nolen\*(C'\fR,
+\&\f(CW\*(C`SvPVutf8_nolen\*(C'\fR, or \f(CW\*(C`SvPV_nolen\*(C'\fR instead.
+The global variable \f(CW\*(C`PL_na\*(C'\fR can also be given to
+\&\f(CW\*(C`SvPVbyte\*(C'\fR/\f(CW\*(C`SvPVutf8\*(C'\fR/\f(CW\*(C`SvPV\*(C'\fR
+in this case. But that can be quite inefficient because \f(CW\*(C`PL_na\*(C'\fR must
+be accessed in thread-local storage in threaded Perl. In any case, remember
+that Perl allows arbitrary strings of data that may both contain NULs and
+might not be terminated by a \f(CW\*(C`NUL\*(C'\fR.
+.Sp
+Also remember that C doesn't allow you to safely say \f(CW\*(C`foo(SvPVbyte(s, len),
+len);\*(C'\fR. It might work with your
+compiler, but it won't work for everyone.
+Break this sort of statement up into separate assignments:
+.Sp
+.Vb 5
+\& SV *s;
+\& STRLEN len;
+\& char *ptr;
+\& ptr = SvPVbyte(s, len);
+\& foo(ptr, len);
+.Ve
+.RE
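+.PP
+The 2\-byte and 4\-byte results described above for the Perl string
+\&\f(CW"\exff\exff"\fR can be reproduced in plain C. The helper
+\&\f(CW\*(C`bytes_to_utf8\*(C'\fR below is hypothetical and is not part of the Perl
+API. It is a minimal sketch that treats each input byte as a code point
+in the range 0x00 to 0xFF and encodes it as UTF\-8.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not a Perl API function: encode code points
 * 0x00..0xFF (one per input byte) as UTF-8. `out` must have room for
 * 2 * len bytes. Returns the number of bytes written. */
static size_t bytes_to_utf8(const unsigned char *in, size_t len,
                            unsigned char *out)
{
    size_t n = 0;
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < len; i++) {
        if (in[i] < 0x80) {
            out[n++] = in[i];                                  /* ASCII: one byte */
        } else {
            out[n++] = (unsigned char)(0xC0 | (in[i] >> 6));   /* lead byte */
            out[n++] = (unsigned char)(0x80 | (in[i] & 0x3F)); /* continuation */
        }
    }
    return n;
}
```

+.PP
+Two input bytes of 0xFF come out as the four bytes 0xC3 0xBF 0xC3 0xBF,
+which is why the same buffer contents are ambiguous unless the UTF8 bit
+is consulted.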
+.PP
+If you want to know if the scalar value is TRUE, you can use:
+.PP
+.Vb 1
+\& SvTRUE(SV*)
+.Ve
+.PP
+Although Perl will automatically grow strings for you, if you need to force
+Perl to allocate more memory for your SV, you can use the macro
+.PP
+.Vb 1
+\& SvGROW(SV*, STRLEN newlen)
+.Ve
+.PP
+which will determine if more memory needs to be allocated. If so, it will
+call the function \f(CW\*(C`sv_grow\*(C'\fR. Note that \f(CW\*(C`SvGROW\*(C'\fR can only increase, not
+decrease, the allocated memory of an SV and that it does not automatically
+add space for the trailing \f(CW\*(C`NUL\*(C'\fR byte (perl's own string functions typically do
+\&\f(CW\*(C`SvGROW(sv, len + 1)\*(C'\fR).
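+.PP
+The grow-only behaviour and the explicit extra byte for the \f(CW\*(C`NUL\*(C'\fR can
+be modelled in plain C. This is a toy sketch under stated assumptions:
+\&\f(CW\*(C`struct toy_buf\*(C'\fR, \f(CW\*(C`toy_grow\*(C'\fR and \f(CW\*(C`toy_append\*(C'\fR are hypothetical
+names whose fields merely echo \f(CW\*(C`SvPVX\*(C'\fR, \f(CW\*(C`SvCUR\*(C'\fR and \f(CW\*(C`SvLEN\*(C'\fR. It is
+not how Perl lays out or reallocates an SV.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy string buffer for illustration; not Perl's SV layout. */
struct toy_buf {
    char  *pv;   /* start of the allocation */
    size_t cur;  /* bytes in use, excluding the trailing NUL */
    size_t len;  /* bytes allocated */
};

/* Ensure at least `newlen` bytes are allocated. Never shrinks, and adds
 * no room for a trailing NUL: the caller must ask for it explicitly. */
static char *toy_grow(struct toy_buf *b, size_t newlen)
{
    if (newlen > b->len) {
        char *p = realloc(b->pv, newlen);
        assert(p != NULL);  /* sketch only: no real error handling */
        b->pv = p;
        b->len = newlen;
    }
    return b->pv;
}

/* Append `n` bytes, requesting cur + n + 1 so that the NUL always fits. */
static void toy_append(struct toy_buf *b, const char *src, size_t n)
{
    char *s = toy_grow(b, b->cur + n + 1);
    memcpy(s + b->cur, src, n);
    b->cur += n;
    s[b->cur] = '\0';
}
```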
+.PP
+If you want to write to an existing SV's buffer and set its value to a
+string, use \fBSvPVbyte_force()\fR or one of its variants to force the SV to be
+a PV. This will remove any of various types of non-stringness from
+the SV while preserving the content of the SV in the PV. This can be
+used, for example, to append data from an API function to a buffer
+without extra copying:
+.PP
+.Vb 11
+\& (void)SvPVbyte_force(sv, len);
+\& s = SvGROW(sv, len + needlen + 1);
+\& /* something that modifies up to needlen bytes at s+len, but
+\& modifies newlen bytes
+\& eg. newlen = read(fd, s + len, needlen);
+\& ignoring errors for these examples
+\& */
+\& s[len + newlen] = \*(Aq\e0\*(Aq;
+\& SvCUR_set(sv, len + newlen);
+\& SvUTF8_off(sv);
+\& SvSETMAGIC(sv);
+.Ve
+.PP
+If you already have the data in memory or if you want to keep your
+code simple, you can use one of the sv_cat*() variants, such as
+\&\fBsv_catpvn()\fR. If you want to insert anywhere in the string you can use
+\&\fBsv_insert()\fR or \fBsv_insert_flags()\fR.
+.PP
+If you don't need the existing content of the SV, you can avoid some
+copying with:
+.PP
+.Vb 10
+\& SvPVCLEAR(sv);
+\& s = SvGROW(sv, needlen + 1);
+\& /* something that modifies up to needlen bytes at s, but modifies
+\& newlen bytes
+\& eg. newlen = read(fd, s, needlen);
+\& */
+\& s[newlen] = \*(Aq\e0\*(Aq;
+\& SvCUR_set(sv, newlen);
+\& SvPOK_only(sv); /* also clears SVf_UTF8 */
+\& SvSETMAGIC(sv);
+.Ve
+.PP
+Again, if you already have the data in memory or want to avoid the
+complexity of the above, you can use \fBsv_setpvn()\fR.
+.PP
+If you have a buffer allocated with \fBNewx()\fR and want to set that as the
+SV's value, you can use \fBsv_usepvn_flags()\fR. That has some requirements
+if you want to avoid perl re-allocating the buffer to fit the trailing
+NUL:
+.PP
+.Vb 5
+\& Newx(buf, somesize+1, char);
+\& /* ... fill in buf ... */
+\& buf[somesize] = \*(Aq\e0\*(Aq;
+\& sv_usepvn_flags(sv, buf, somesize, SV_SMAGIC | SV_HAS_TRAILING_NUL);
+\& /* buf now belongs to perl, don\*(Aqt release it */
+.Ve
+.PP
+If you have an SV and want to know what kind of data Perl thinks is stored
+in it, you can use the following macros to check the type of SV you have.
+.PP
+.Vb 3
+\& SvIOK(SV*)
+\& SvNOK(SV*)
+\& SvPOK(SV*)
+.Ve
+.PP
+Be aware that retrieving the numeric value of an SV can set IOK or NOK
+on that SV, even when the SV started as a string. Prior to Perl
+5.36.0 retrieving the string value of an integer could set POK, but
+this can no longer occur. From 5.36.0 this can be used to distinguish
+the original representation of an SV and is intended to make life
+simpler for serializers:
+.PP
+.Vb 10
+\& /* references handled elsewhere */
+\& if (SvIsBOOL(sv)) {
+\& /* originally boolean */
+\& ...
+\& }
+\& else if (SvPOK(sv)) {
+\& /* originally a string */
+\& ...
+\& }
+\& else if (SvNIOK(sv)) {
+\& /* originally numeric */
+\& ...
+\& }
+\& else {
+\& /* something special or undef */
+\& }
+.Ve
+.PP
+You can get and set the current length of the string stored in an SV with
+the following macros:
+.PP
+.Vb 2
+\& SvCUR(SV*)
+\& SvCUR_set(SV*, STRLEN val)
+.Ve
+.PP
+You can also get a pointer to the end of the string stored in the SV
+with the macro:
+.PP
+.Vb 1
+\& SvEND(SV*)
+.Ve
+.PP
+But note that these last three macros are valid only if \f(CWSvPOK()\fR is true.
+.PP
+If you want to append something to the end of the string stored in an \f(CW\*(C`SV*\*(C'\fR,
+you can use the following functions:
+.PP
+.Vb 6
+\& void sv_catpv(SV*, const char*);
+\& void sv_catpvn(SV*, const char*, STRLEN);
+\& void sv_catpvf(SV*, const char*, ...);
+\& void sv_vcatpvfn(SV*, const char*, STRLEN, va_list *, SV **,
+\& Size_t, bool *);
+\& void sv_catsv(SV*, SV*);
+.Ve
+.PP
+The first function calculates the length of the string to be appended by
+using \f(CW\*(C`strlen\*(C'\fR. In the second, you specify the length of the string
+yourself. The third function processes its arguments like \f(CW\*(C`sprintf\*(C'\fR and
+appends the formatted output. The fourth function works like \f(CW\*(C`vsprintf\*(C'\fR.
+You can specify the address and length of an array of SVs instead of the
+va_list argument. The fifth function
+extends the string stored in the first
+SV with the string stored in the second SV. It also forces the second SV
+to be interpreted as a string.
+.PP
+The \f(CW\*(C`sv_cat*()\*(C'\fR functions are not generic enough to operate on values that
+have "magic". See "Magic Virtual Tables" later in this document.
+.PP
+If you know the name of a scalar variable, you can get a pointer to its SV
+by using the following:
+.PP
+.Vb 1
+\& SV* get_sv("package::varname", 0);
+.Ve
+.PP
+This returns NULL if the variable does not exist.
+.PP
+If you want to know if this variable (or any other SV) is actually \f(CW\*(C`defined\*(C'\fR,
+you can call:
+.PP
+.Vb 1
+\& SvOK(SV*)
+.Ve
+.PP
+The scalar \f(CW\*(C`undef\*(C'\fR value is stored in an SV instance called \f(CW\*(C`PL_sv_undef\*(C'\fR.
+.PP
+Its address can be used whenever an \f(CW\*(C`SV*\*(C'\fR is needed. Make sure that
+you don't try to compare a random sv with \f(CW&PL_sv_undef\fR. For example
+when interfacing Perl code, it'll work correctly for:
+.PP
+.Vb 1
+\& foo(undef);
+.Ve
+.PP
+But won't work when called as:
+.PP
+.Vb 2
+\& $x = undef;
+\& foo($x);
+.Ve
+.PP
+So, to repeat, always use \fBSvOK()\fR to check whether an sv is defined.
+.PP
+Also you have to be careful when using \f(CW&PL_sv_undef\fR as a value in
+AVs or HVs (see "AVs, HVs and undefined values").
+.PP
+There are also the two values \f(CW\*(C`PL_sv_yes\*(C'\fR and \f(CW\*(C`PL_sv_no\*(C'\fR, which contain
+boolean TRUE and FALSE values, respectively. Like \f(CW\*(C`PL_sv_undef\*(C'\fR, their
+addresses can be used whenever an \f(CW\*(C`SV*\*(C'\fR is needed.
+.PP
+Do not be fooled into thinking that \f(CW\*(C`(SV *) 0\*(C'\fR is the same as \f(CW&PL_sv_undef\fR.
+Take this code:
+.PP
+.Vb 5
+\& SV* sv = (SV*) 0;
+\& if (I\-am\-to\-return\-a\-real\-value) {
+\& sv = sv_2mortal(newSViv(42));
+\& }
+\& sv_setsv(ST(0), sv);
+.Ve
+.PP
+This code tries to return a new SV (which contains the value 42) if it should
+return a real value, or undef otherwise. Instead it has returned a NULL
+pointer which, somewhere down the line, will cause a segmentation violation,
+bus error, or just weird results. Change the zero to \f(CW&PL_sv_undef\fR in the
+first line and all will be well.
+.PP
+To free an SV that you've created, call \f(CWSvREFCNT_dec(SV*)\fR. Normally this
+call is not necessary (see "Reference Counts and Mortality").
+.SS Offsets
+.IX Subsection "Offsets"
+Perl provides the function \f(CW\*(C`sv_chop\*(C'\fR to efficiently remove characters
+from the beginning of a string; you give it an SV and a pointer to
+somewhere inside the PV, and it discards everything before the
+pointer. The efficiency comes by means of a little hack: instead of
+actually removing the characters, \f(CW\*(C`sv_chop\*(C'\fR sets the flag \f(CW\*(C`OOK\*(C'\fR
+(offset OK) to signal to other functions that the offset hack is in
+effect, and it moves the PV pointer (called \f(CW\*(C`SvPVX\*(C'\fR) forward
+by the number of bytes chopped off, and adjusts \f(CW\*(C`SvCUR\*(C'\fR and \f(CW\*(C`SvLEN\*(C'\fR
+accordingly. (A portion of the space between the old and new PV
+pointers is used to store the count of chopped bytes.)
+.PP
+Hence, at this point, the start of the buffer that we allocated lives
+at \f(CW\*(C`SvPVX(sv) \- SvIV(sv)\*(C'\fR in memory and the PV pointer is pointing
+into the middle of this allocated storage.
+.PP
+This is best demonstrated by example. Normally copy-on-write will prevent
+the substitution operator from using this hack, but if you can craft a
+string for which copy-on-write is not possible, you can see it in play. In
+the current implementation, the final byte of a string buffer is used as a
+copy-on-write reference count. If the buffer is not big enough, then
+copy-on-write is skipped. First have a look at an empty string:
+.PP
+.Vb 7
+\& % ./perl \-Ilib \-MDevel::Peek \-le \*(Aq$a=""; $a .= ""; Dump $a\*(Aq
+\& SV = PV(0x7ffb7c008a70) at 0x7ffb7c030390
+\& REFCNT = 1
+\& FLAGS = (POK,pPOK)
+\& PV = 0x7ffb7bc05b50 ""\e0
+\& CUR = 0
+\& LEN = 10
+.Ve
+.PP
+Notice here the LEN is 10. (It may differ on your platform.) Extend the
+length of the string to one less than 10, and do a substitution:
+.PP
+.Vb 9
+\& % ./perl \-Ilib \-MDevel::Peek \-le \*(Aq$a=""; $a.="123456789"; $a=~s/.//; \e
+\& Dump($a)\*(Aq
+\& SV = PV(0x7ffa04008a70) at 0x7ffa04030390
+\& REFCNT = 1
+\& FLAGS = (POK,OOK,pPOK)
+\& OFFSET = 1
+\& PV = 0x7ffa03c05b61 ( "\e1" . ) "23456789"\e0
+\& CUR = 8
+\& LEN = 9
+.Ve
+.PP
+Here the number of bytes chopped off (1) is shown next as the OFFSET. The
+portion of the string between the "real" and the "fake" beginnings is
+shown in parentheses, and the values of \f(CW\*(C`SvCUR\*(C'\fR and \f(CW\*(C`SvLEN\*(C'\fR reflect
+the fake beginning, not the real one. (The first character of the string
+buffer happens to have changed to "\e1" here, not "1", because the current
+implementation stores the offset count in the string buffer. This is
+subject to change.)
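+.PP
+The bookkeeping behind the Dump output above can be sketched in plain C.
+This is a toy model, not \f(CW\*(C`sv_chop\*(C'\fR itself: \f(CW\*(C`struct toy_str\*(C'\fR,
+\&\f(CW\*(C`toy_chop\*(C'\fR and \f(CW\*(C`toy_real_start\*(C'\fR are hypothetical names, and for
+simplicity the offset is kept in a struct field rather than inside the
+string buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of a string with an optional hidden prefix. */
struct toy_str {
    char  *pv;      /* visible start of the string (cf. SvPVX) */
    size_t cur;     /* visible length (cf. SvCUR) */
    size_t len;     /* allocated bytes counted from pv (cf. SvLEN) */
    size_t offset;  /* bytes hidden in front of pv; 0 means none */
};

/* Drop the first n bytes without copying anything: move the visible
 * start forward and shrink both lengths by the same amount. */
static void toy_chop(struct toy_str *s, size_t n)
{
    assert(n <= s->cur);
    s->pv     += n;
    s->cur    -= n;
    s->len    -= n;
    s->offset += n;
}

/* Recover the address of the original allocation. */
static char *toy_real_start(const struct toy_str *s)
{
    return s->pv - s->offset;
}
```

+.PP
+Chopping one byte from a nine-character string held in a ten-byte buffer
+leaves a visible length of 8 and an allocated length of 9, the same
+\&\f(CW\*(C`CUR\*(C'\fR and \f(CW\*(C`LEN\*(C'\fR values that the Dump output shows.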
+.PP
+Something similar to the offset hack is performed on AVs to enable
+efficient shifting and splicing off the beginning of the array; while
+\&\f(CW\*(C`AvARRAY\*(C'\fR points to the first element in the array that is visible from
+Perl, \f(CW\*(C`AvALLOC\*(C'\fR points to the real start of the C array. These are
+usually the same, but a \f(CW\*(C`shift\*(C'\fR operation can be carried out by
+increasing \f(CW\*(C`AvARRAY\*(C'\fR by one and decreasing \f(CW\*(C`AvFILL\*(C'\fR and \f(CW\*(C`AvMAX\*(C'\fR.
+Again, the location of the real start of the C array only comes into
+play when freeing the array. See \f(CW\*(C`av_shift\*(C'\fR in \fIav.c\fR.
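+.PP
+The pointer arithmetic described for arrays can be modelled the same way.
+The sketch below is plain C with hypothetical names (\f(CW\*(C`struct toy_av\*(C'\fR,
+\&\f(CW\*(C`toy_shift\*(C'\fR) and \f(CW\*(C`int*\*(C'\fR elements standing in for \f(CW\*(C`SV*\*(C'\fR. It mirrors
+the description above and is not the code of \f(CW\*(C`av_shift\*(C'\fR.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of an array whose visible start can move forward. */
struct toy_av {
    int      **alloc;  /* real start of the C array (cf. AvALLOC) */
    int      **array;  /* first visible element (cf. AvARRAY) */
    ptrdiff_t  fill;   /* highest visible index, -1 if empty (cf. AvFILL) */
    ptrdiff_t  max;    /* highest index that fits, from array (cf. AvMAX) */
};

/* Remove and return the first element without moving the others. */
static int *toy_shift(struct toy_av *av)
{
    int *first;
    assert(av != NULL);
    if (av->fill < 0)
        return NULL;           /* empty array: nothing to shift */
    first = av->array[0];
    av->array++;               /* visible start moves forward by one */
    av->fill--;
    av->max--;
    return first;              /* alloc is left alone for later freeing */
}
```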
+.SS "What's Really Stored in an SV?"
+.IX Subsection "What's Really Stored in an SV?"
+Recall that the usual method of determining the type of scalar you have is
+to use the \f(CW\*(C`Sv*OK\*(C'\fR macros. Because a scalar can be both a number and a string,
+these macros will usually return TRUE, and calling the \f(CW\*(C`Sv*V\*(C'\fR
+macros will do the appropriate conversion of string to integer/double or
+integer/double to string.
+.PP
+If you \fIreally\fR need to know if you have an integer, double, or string
+pointer in an SV, you can use the following three macros instead:
+.PP
+.Vb 3
+\& SvIOKp(SV*)
+\& SvNOKp(SV*)
+\& SvPOKp(SV*)
+.Ve
+.PP
+These will tell you if you truly have an integer, double, or string pointer
+stored in your SV. The "p" stands for private.
+.PP
+There are various ways in which the private and public flags may differ.
+For example, in perl 5.16 and earlier a tied SV may have a valid
+underlying value in the IV slot (so SvIOKp is true), but the data
+should be accessed via the FETCH routine rather than directly,
+so SvIOK is false. (In perl 5.18 onwards, tied scalars use
+the flags the same way as untied scalars.) Another is when
+numeric conversion has occurred and precision has been lost: only the
+private flag is set on 'lossy' values. So when an NV is converted to an
+IV with loss, SvIOKp, SvNOKp and SvNOK will be set, while SvIOK won't be.
+.PP
+In general, though, it's best to use the \f(CW\*(C`Sv*V\*(C'\fR macros.
+.SS "Working with AVs"
+.IX Subsection "Working with AVs"
+There are two main, longstanding ways to create and load an AV. The first
+method creates an empty AV:
+.PP
+.Vb 1
+\& AV* newAV();
+.Ve
+.PP
+The second method both creates the AV and initially populates it with SVs:
+.PP
+.Vb 1
+\& AV* av_make(SSize_t num, SV **ptr);
+.Ve
+.PP
+The second argument points to an array containing \f(CW\*(C`num\*(C'\fR \f(CW\*(C`SV*\*(C'\fR's. Once the
+AV has been created, the SVs can be destroyed, if so desired.
+.PP
+Perl v5.36 added two new ways to create an AV and allocate an SV** array
+without populating it. These are more efficient than a \fBnewAV()\fR followed by an
+\&\fBav_extend()\fR.
+.PP
+.Vb 4
+\& /* Creates but does not initialize (Zero) the SV** array */
+\& AV *av = newAV_alloc_x(1);
+\& /* Creates and does initialize (Zero) the SV** array */
+\& AV *av = newAV_alloc_xz(1);
+.Ve
+.PP
+The numerical argument refers to the number of array elements to allocate, not
+an array index, and must be >0. The first form must only ever be used when all
+elements will be initialized before any read occurs. Reading a non-initialized
+SV* \- i.e. treating a random memory address as an SV* \- is a serious bug.
+.PP
+Once the AV has been created, the following operations are possible on it:
+.PP
+.Vb 4
+\& void av_push(AV*, SV*);
+\& SV* av_pop(AV*);
+\& SV* av_shift(AV*);
+\& void av_unshift(AV*, SSize_t num);
+.Ve
+.PP
+These should be familiar operations, with the exception of \f(CW\*(C`av_unshift\*(C'\fR.
+This routine adds \f(CW\*(C`num\*(C'\fR elements at the front of the array with the \f(CW\*(C`undef\*(C'\fR
+value. You must then use \f(CW\*(C`av_store\*(C'\fR (described below) to assign values
+to these new elements.
+.PP
+Here are some other functions:
+.PP
+.Vb 3
+\& SSize_t av_top_index(AV*);
+\& SV** av_fetch(AV*, SSize_t key, I32 lval);
+\& SV** av_store(AV*, SSize_t key, SV* val);
+.Ve
+.PP
+The \f(CW\*(C`av_top_index\*(C'\fR function returns the highest index value in an array (just
+like $#array in Perl). If the array is empty, \-1 is returned. The
+\&\f(CW\*(C`av_fetch\*(C'\fR function returns the value at index \f(CW\*(C`key\*(C'\fR, but if \f(CW\*(C`lval\*(C'\fR
+is non-zero, then \f(CW\*(C`av_fetch\*(C'\fR will store an undef value at that index.
+The \f(CW\*(C`av_store\*(C'\fR function stores the value \f(CW\*(C`val\*(C'\fR at index \f(CW\*(C`key\*(C'\fR, and does
+not increment the reference count of \f(CW\*(C`val\*(C'\fR. Thus the caller is responsible
+for taking care of that, and if \f(CW\*(C`av_store\*(C'\fR returns NULL, the caller will
+have to decrement the reference count to avoid a memory leak. Note that
+\&\f(CW\*(C`av_fetch\*(C'\fR and \f(CW\*(C`av_store\*(C'\fR both return \f(CW\*(C`SV**\*(C'\fR's, not \f(CW\*(C`SV*\*(C'\fR's as their
+return value.
+.PP
+A few more:
+.PP
+.Vb 3
+\& void av_clear(AV*);
+\& void av_undef(AV*);
+\& void av_extend(AV*, SSize_t key);
+.Ve
+.PP
+The \f(CW\*(C`av_clear\*(C'\fR function deletes all the elements in the AV* array, but
+does not actually delete the array itself. The \f(CW\*(C`av_undef\*(C'\fR function will
+delete all the elements in the array plus the array itself. The
+\&\f(CW\*(C`av_extend\*(C'\fR function extends the array so that it contains at least \f(CW\*(C`key+1\*(C'\fR
+elements. If \f(CW\*(C`key+1\*(C'\fR is less than the currently allocated length of the array,
+then nothing is done.
+.PP
+If you know the name of an array variable, you can get a pointer to its AV
+by using the following:
+.PP
+.Vb 1
+\& AV* get_av("package::varname", 0);
+.Ve
+.PP
+This returns NULL if the variable does not exist.
+.PP
+See "Understanding the Magic of Tied Hashes and Arrays" for more
+information on how to use the array access functions on tied arrays.
+.PP
+\fIMore efficient working with new or vanilla AVs\fR
+.IX Subsection "More efficient working with new or vanilla AVs"
+.PP
+Perl v5.36 and v5.38 introduced streamlined, inlined versions of some
+functions:
+.IP \(bu 4
+\&\f(CW\*(C`av_store_simple\*(C'\fR
+.IP \(bu 4
+\&\f(CW\*(C`av_fetch_simple\*(C'\fR
+.IP \(bu 4
+\&\f(CW\*(C`av_push_simple\*(C'\fR
+.PP
+These are drop-in replacements, but can only be used on straightforward
+AVs that meet the following criteria:
+.IP \(bu 4
+are not magical
+.IP \(bu 4
+are not readonly
+.IP \(bu 4
+are "real" (refcounted) AVs
+.IP \(bu 4
+have an av_top_index value > \-2
+.PP
+AVs created using \f(CWnewAV()\fR, \f(CW\*(C`av_make\*(C'\fR, \f(CW\*(C`newAV_alloc_x\*(C'\fR, and
+\&\f(CW\*(C`newAV_alloc_xz\*(C'\fR are all compatible at the time of creation. It is
+only if they are declared readonly or unreal, have magic attached, or
+are otherwise configured unusually that they will stop being compatible.
+.PP
+Note that some interpreter functions may attach magic to an AV as part
+of normal operations. It is therefore safest, unless you are sure of the
+lifecycle of an AV, to only use these new functions close to the point
+of AV creation.
+.SS "Working with HVs"
+.IX Subsection "Working with HVs"
+To create an HV, you use the following routine:
+.PP
+.Vb 1
+\& HV* newHV();
+.Ve
+.PP
+Once the HV has been created, the following operations are possible on it:
+.PP
+.Vb 2
+\& SV** hv_store(HV*, const char* key, U32 klen, SV* val, U32 hash);
+\& SV** hv_fetch(HV*, const char* key, U32 klen, I32 lval);
+.Ve
+.PP
+The \f(CW\*(C`klen\*(C'\fR parameter is the length of the key being passed in (Note that
+you cannot pass 0 in as a value of \f(CW\*(C`klen\*(C'\fR to tell Perl to measure the
+length of the key). The \f(CW\*(C`val\*(C'\fR argument contains the SV pointer to the
+scalar being stored, and \f(CW\*(C`hash\*(C'\fR is the precomputed hash value (zero if
+you want \f(CW\*(C`hv_store\*(C'\fR to calculate it for you). The \f(CW\*(C`lval\*(C'\fR parameter
+indicates whether this fetch is actually a part of a store operation, in
+which case a new undefined value will be added to the HV with the supplied
+key and \f(CW\*(C`hv_fetch\*(C'\fR will return as if the value had already existed.
+.PP
+Remember that \f(CW\*(C`hv_store\*(C'\fR and \f(CW\*(C`hv_fetch\*(C'\fR return \f(CW\*(C`SV**\*(C'\fR's and not just
+\&\f(CW\*(C`SV*\*(C'\fR. To access the scalar value, you must first dereference the return
+value. However, you should check to make sure that the return value is
+not NULL before dereferencing it.
+.PP
+The first of these two functions checks if a hash table entry exists, and the
+second deletes it.
+.PP
+.Vb 2
+\& bool hv_exists(HV*, const char* key, U32 klen);
+\& SV* hv_delete(HV*, const char* key, U32 klen, I32 flags);
+.Ve
+.PP
+If \f(CW\*(C`flags\*(C'\fR does not include the \f(CW\*(C`G_DISCARD\*(C'\fR flag then \f(CW\*(C`hv_delete\*(C'\fR will
+create and return a mortal copy of the deleted value.
+.PP
+And more miscellaneous functions:
+.PP
+.Vb 2
+\& void hv_clear(HV*);
+\& void hv_undef(HV*);
+.Ve
+.PP
+Like their AV counterparts, \f(CW\*(C`hv_clear\*(C'\fR deletes all the entries in the hash
+table but does not actually delete the hash table. The \f(CW\*(C`hv_undef\*(C'\fR deletes
+both the entries and the hash table itself.
+.PP
+Perl keeps the actual data in a linked list of structures with a typedef of HE.
+These contain the actual key and value pointers (plus extra administrative
+overhead). The key is a string pointer; the value is an \f(CW\*(C`SV*\*(C'\fR. However,
+once you have an \f(CW\*(C`HE*\*(C'\fR, to get the actual key and value, use the routines
+specified below.
+.PP
+.Vb 10
+\& I32 hv_iterinit(HV*);
+\& /* Prepares starting point to traverse hash table */
+\& HE* hv_iternext(HV*);
+\& /* Get the next entry, and return a pointer to a
+\& structure that has both the key and value */
+\& char* hv_iterkey(HE* entry, I32* retlen);
+\& /* Get the key from an HE structure and also return
+\& the length of the key string */
+\& SV* hv_iterval(HV*, HE* entry);
+\& /* Return an SV pointer to the value of the HE
+\& structure */
+\& SV* hv_iternextsv(HV*, char** key, I32* retlen);
+\& /* This convenience routine combines hv_iternext,
+\& hv_iterkey, and hv_iterval. The key and retlen
+\& arguments are return values for the key and its
+\& length. The value is the returned SV* */
+.Ve
+.PP
+If you know the name of a hash variable, you can get a pointer to its HV
+by using the following:
+.PP
+.Vb 1
+\& HV* get_hv("package::varname", 0);
+.Ve
+.PP
+This returns NULL if the variable does not exist.
+.PP
+The hash algorithm is defined in the \f(CW\*(C`PERL_HASH\*(C'\fR macro:
+.PP
+.Vb 1
+\& PERL_HASH(hash, key, klen)
+.Ve
+.PP
+The exact implementation of this macro varies by architecture and version
+of perl, and the return value may change per invocation, so the value
+is only valid for the duration of a single perl process.
+.PP
+See "Understanding the Magic of Tied Hashes and Arrays" for more
+information on how to use the hash access functions on tied hashes.
+.SS "Hash API Extensions"
+.IX Subsection "Hash API Extensions"
+Beginning with version 5.004, the following functions are also supported:
+.PP
+.Vb 2
+\& HE* hv_fetch_ent (HV* tb, SV* key, I32 lval, U32 hash);
+\& HE* hv_store_ent (HV* tb, SV* key, SV* val, U32 hash);
+\&
+\& bool hv_exists_ent (HV* tb, SV* key, U32 hash);
+\& SV* hv_delete_ent (HV* tb, SV* key, I32 flags, U32 hash);
+\&
+\& SV* hv_iterkeysv (HE* entry);
+.Ve
+.PP
+Note that these functions take \f(CW\*(C`SV*\*(C'\fR keys, which simplifies writing
+of extension code that deals with hash structures. These functions
+also allow passing of \f(CW\*(C`SV*\*(C'\fR keys to \f(CW\*(C`tie\*(C'\fR functions without forcing
+you to stringify the keys (unlike the previous set of functions).
+.PP
+They also return and accept whole hash entries (\f(CW\*(C`HE*\*(C'\fR), making their
+use more efficient (since the hash number for a particular string
+doesn't have to be recomputed every time). See perlapi for detailed
+descriptions.
+.PP
+The following macros must always be used to access the contents of hash
+entries. Note that the arguments to these macros must be simple
+variables, since they may get evaluated more than once. See
+perlapi for detailed descriptions of these macros.
+.PP
+.Vb 6
+\& HePV(HE* he, STRLEN len)
+\& HeVAL(HE* he)
+\& HeHASH(HE* he)
+\& HeSVKEY(HE* he)
+\& HeSVKEY_force(HE* he)
+\& HeSVKEY_set(HE* he, SV* sv)
+.Ve
+.PP
+These two lower level macros are defined, but must only be used when
+dealing with keys that are not \f(CW\*(C`SV*\*(C'\fRs:
+.PP
+.Vb 2
+\& HeKEY(HE* he)
+\& HeKLEN(HE* he)
+.Ve
+.PP
+Note that both \f(CW\*(C`hv_store\*(C'\fR and \f(CW\*(C`hv_store_ent\*(C'\fR do not increment the
+reference count of the stored \f(CW\*(C`val\*(C'\fR, which is the caller's responsibility.
+If these functions return a NULL value, the caller will usually have to
+decrement the reference count of \f(CW\*(C`val\*(C'\fR to avoid a memory leak.
+.SS "AVs, HVs and undefined values"
+.IX Subsection "AVs, HVs and undefined values"
+Sometimes you have to store undefined values in AVs or HVs. Although
+this may be a rare case, it can be tricky. That's because you're
+used to using \f(CW&PL_sv_undef\fR if you need an undefined SV.
+.PP
+For example, intuition tells you that this XS code:
+.PP
+.Vb 2
+\& AV *av = newAV();
+\& av_store( av, 0, &PL_sv_undef );
+.Ve
+.PP
+is equivalent to this Perl code:
+.PP
+.Vb 2
+\& my @av;
+\& $av[0] = undef;
+.Ve
+.PP
+Unfortunately, this isn't true. In perl 5.18 and earlier, AVs use \f(CW&PL_sv_undef\fR as a marker
+for indicating that an array element has not yet been initialized.
+Thus, \f(CW\*(C`exists $av[0]\*(C'\fR would be true for the above Perl code, but
+false for the array generated by the XS code. In perl 5.20, storing
+\&\f(CW&PL_sv_undef\fR will create a read-only element, because the scalar
+\&\f(CW&PL_sv_undef\fR itself is stored, not a copy.
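+.PP
+The perl 5.18 behaviour can be pictured with a small stand-alone C model.
+This is only a sketch under stated assumptions: \f(CWToyArray\fR,
+\&\f(CWtoy_store\fR and \f(CWtoy_exists\fR are invented names and are not part
+of the Perl API. It assumes, as described above, that one shared object
+doubles as the "not yet initialized" marker, so storing that object is
+indistinguishable from never storing anything.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy scalar; only its address matters in this model. */
typedef struct { int value; } ToyScalar;

/* One shared "undef" object, standing in for PL_sv_undef. */
static ToyScalar toy_undef;

#define TOY_LEN 4
typedef struct { ToyScalar *slot[TOY_LEN]; } ToyArray;

/* Mark every slot as uninitialized by pointing it at the sentinel. */
static void toy_init(ToyArray *a)
{
    for (size_t i = 0; i < TOY_LEN; i++)
        a->slot[i] = &toy_undef;
}

/* Store a pointer as-is; no copy is made. */
static void toy_store(ToyArray *a, size_t i, ToyScalar *s)
{
    a->slot[i] = s;
}

/* An element "exists" only if it holds something other than the sentinel. */
static bool toy_exists(const ToyArray *a, size_t i)
{
    return a->slot[i] != &toy_undef;
}
```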
+.PP
+Similar problems can occur when storing \f(CW&PL_sv_undef\fR in HVs:
+.PP
+.Vb 1
+\& hv_store( hv, "key", 3, &PL_sv_undef, 0 );
+.Ve
+.PP
+This will indeed make the value \f(CW\*(C`undef\*(C'\fR, but if you try to modify
+the value of \f(CW\*(C`key\*(C'\fR, you'll get the following error:
+.PP
+.Vb 1
+\& Modification of non\-creatable hash value attempted
+.Ve
+.PP
+In perl 5.8.0, \f(CW&PL_sv_undef\fR was also used to mark placeholders
+in restricted hashes. This caused such hash entries not to appear
+when iterating over the hash or when checking for the keys
+with the \f(CW\*(C`hv_exists\*(C'\fR function.
+.PP
+You can run into similar problems when you store \f(CW&PL_sv_yes\fR or
+\&\f(CW&PL_sv_no\fR into AVs or HVs. Trying to modify such elements
+will give you the following error:
+.PP
+.Vb 1
+\& Modification of a read\-only value attempted
+.Ve
+.PP
+To make a long story short, you can use the special variables
+\&\f(CW&PL_sv_undef\fR, \f(CW&PL_sv_yes\fR and \f(CW&PL_sv_no\fR with AVs and
+HVs, but you have to make sure you know what you're doing.
+.PP
+Generally, if you want to store an undefined value in an AV
+or HV, you should not use \f(CW&PL_sv_undef\fR, but rather create a
+new undefined value using the \f(CW\*(C`newSV\*(C'\fR function, for example:
+.PP
+.Vb 2
+\& av_store( av, 42, newSV(0) );
+\& hv_store( hv, "foo", 3, newSV(0), 0 );
+.Ve
+.SS References
+.IX Subsection "References"
+References are a special type of scalar that point to other data types
+(including other references).
+.PP
+To create a reference, use either of the following functions:
+.PP
+.Vb 2
+\& SV* newRV_inc((SV*) thing);
+\& SV* newRV_noinc((SV*) thing);
+.Ve
+.PP
+The \f(CW\*(C`thing\*(C'\fR argument can be any of an \f(CW\*(C`SV*\*(C'\fR, \f(CW\*(C`AV*\*(C'\fR, or \f(CW\*(C`HV*\*(C'\fR. The
+functions are identical except that \f(CW\*(C`newRV_inc\*(C'\fR increments the reference
+count of the \f(CW\*(C`thing\*(C'\fR, while \f(CW\*(C`newRV_noinc\*(C'\fR does not. For historical
+reasons, \f(CW\*(C`newRV\*(C'\fR is a synonym for \f(CW\*(C`newRV_inc\*(C'\fR.
+.PP
+Once you have a reference, you can use the following macro to dereference
+the reference:
+.PP
+.Vb 1
+\& SvRV(SV*)
+.Ve
+.PP
+then call the appropriate routines, casting the returned \f(CW\*(C`SV*\*(C'\fR to either an
+\&\f(CW\*(C`AV*\*(C'\fR or \f(CW\*(C`HV*\*(C'\fR, if required.
+.PP
+To determine if an SV is a reference, you can use the following macro:
+.PP
+.Vb 1
+\& SvROK(SV*)
+.Ve
+.PP
+To discover what type of value the reference refers to, use the following
+macro and then check the return value.
+.PP
+.Vb 1
+\& SvTYPE(SvRV(SV*))
+.Ve
+.PP
+The most useful types that will be returned are:
+.PP
+.Vb 4
+\& SVt_PVAV Array
+\& SVt_PVHV Hash
+\& SVt_PVCV Code
+\& SVt_PVGV Glob (possibly a file handle)
+.Ve
+.PP
+Any numerical value returned which is less than SVt_PVAV will be a scalar
+of some form.
+.PP
+See "svtype" in perlapi for more details.
+.SS "Blessed References and Class Objects"
+.IX Subsection "Blessed References and Class Objects"
+References are also used to support object-oriented programming. In perl's
+OO lexicon, an object is simply a reference that has been blessed into a
+package (or class). Once blessed, the programmer may now use the reference
+to access the various methods in the class.
+.PP
+A reference can be blessed into a package with the following function:
+.PP
+.Vb 1
+\& SV* sv_bless(SV* sv, HV* stash);
+.Ve
+.PP
+The \f(CW\*(C`sv\*(C'\fR argument must be a reference value. The \f(CW\*(C`stash\*(C'\fR argument
+specifies which class the reference will belong to. See
+"Stashes and Globs" for information on converting class names into stashes.
+.PP
+/* Still under construction */
+.PP
+The following function upgrades \f(CW\*(C`rv\*(C'\fR to a reference if it is not already
+one, and creates a new SV for \f(CW\*(C`rv\*(C'\fR to point to. If \f(CW\*(C`classname\*(C'\fR is
+non-null, the new SV is blessed into the specified class. The new SV is returned.
+.PP
+.Vb 1
+\& SV* newSVrv(SV* rv, const char* classname);
+.Ve
+.PP
+The following three functions copy integer, unsigned integer or double
+into an SV whose reference is \f(CW\*(C`rv\*(C'\fR. SV is blessed if \f(CW\*(C`classname\*(C'\fR is
+non-null.
+.PP
+.Vb 3
+\& SV* sv_setref_iv(SV* rv, const char* classname, IV iv);
+\& SV* sv_setref_uv(SV* rv, const char* classname, UV uv);
+\& SV* sv_setref_nv(SV* rv, const char* classname, NV nv);
+.Ve
+.PP
+The following function copies the pointer value (\fIthe address, not the
+string!\fR) into an SV whose reference is rv. SV is blessed if \f(CW\*(C`classname\*(C'\fR
+is non-null.
+.PP
+.Vb 1
+\& SV* sv_setref_pv(SV* rv, const char* classname, void* pv);
+.Ve
+.PP
+The following function copies a string into an SV whose reference is \f(CW\*(C`rv\*(C'\fR.
+Set length to 0 to let Perl calculate the string length. SV is blessed if
+\&\f(CW\*(C`classname\*(C'\fR is non-null.
+.PP
+.Vb 2
+\& SV* sv_setref_pvn(SV* rv, const char* classname, char* pv,
+\& STRLEN length);
+.Ve
+.PP
+The following function tests whether the SV is blessed into the specified
+class. It does not check inheritance relationships.
+.PP
+.Vb 1
+\& int sv_isa(SV* sv, const char* name);
+.Ve
+.PP
+The following function tests whether the SV is a reference to a blessed object.
+.PP
+.Vb 1
+\& int sv_isobject(SV* sv);
+.Ve
+.PP
+The following function tests whether the SV is derived from the specified
+class. SV can be either a reference to a blessed object or a string
+containing a class name. This is the function implementing the
+\&\f(CW\*(C`UNIVERSAL::isa\*(C'\fR functionality.
+.PP
+.Vb 1
+\& bool sv_derived_from(SV* sv, const char* name);
+.Ve
+.PP
+To check if you've got an object derived from a specific class you have
+to write:
+.PP
+.Vb 1
+\& if (sv_isobject(sv) && sv_derived_from(sv, class)) { ... }
+.Ve
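+.PP
+The difference between an exact class test and an inheritance test can be
+sketched with a stand-alone C model. The names \f(CWToyClass\fR,
+\&\f(CWtoy_isa\fR and \f(CWtoy_derived_from\fR are invented for illustration,
+and the model assumes single inheritance for brevity, which is a
+simplification of what Perl supports.

```c
#include <stddef.h>
#include <string.h>

typedef struct ToyClass {
    const char *name;
    const struct ToyClass *parent;   /* NULL at the root of the hierarchy */
} ToyClass;

/* Like sv_isa in spirit: true only when the class itself has this name. */
static int toy_isa(const ToyClass *cls, const char *name)
{
    return cls != NULL && strcmp(cls->name, name) == 0;
}

/* Like sv_derived_from in spirit: also true for any ancestor class. */
static int toy_derived_from(const ToyClass *cls, const char *name)
{
    for (; cls != NULL; cls = cls->parent)
        if (strcmp(cls->name, name) == 0)
            return 1;
    return 0;
}
```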
+.SS "Creating New Variables"
+.IX Subsection "Creating New Variables"
+To create a new Perl variable with an undef value which can be accessed from
+your Perl script, use the following routines, depending on the variable type.
+.PP
+.Vb 3
+\& SV* get_sv("package::varname", GV_ADD);
+\& AV* get_av("package::varname", GV_ADD);
+\& HV* get_hv("package::varname", GV_ADD);
+.Ve
+.PP
+Notice the use of GV_ADD as the second parameter. The new variable can now
+be set, using the routines appropriate to the data type.
+.PP
+There are additional macros whose values may be bitwise OR'ed with the
+\&\f(CW\*(C`GV_ADD\*(C'\fR argument to enable certain extra features. Those bits are:
+.IP GV_ADDMULTI 4
+.IX Item "GV_ADDMULTI"
+Marks the variable as multiply defined, thus preventing the:
+.Sp
+.Vb 1
+\& Name <varname> used only once: possible typo
+.Ve
+.Sp
+warning.
+.IP GV_ADDWARN 4
+.IX Item "GV_ADDWARN"
+Issues the warning:
+.Sp
+.Vb 1
+\& Had to create <varname> unexpectedly
+.Ve
+.Sp
+if the variable did not exist before the function was called.
+.PP
+If you do not specify a package name, the variable is created in the current
+package.
+.SS "Reference Counts and Mortality"
+.IX Subsection "Reference Counts and Mortality"
+Perl uses a reference count-driven garbage collection mechanism. SVs,
+AVs, or HVs (xV for short in the following) start their life with a
+reference count of 1. If the reference count of an xV ever drops to 0,
+then it will be destroyed and its memory made available for reuse.
+At the most basic internal level, reference counts can be manipulated
+with the following macros:
+.PP
+.Vb 3
+\& int SvREFCNT(SV* sv);
+\& SV* SvREFCNT_inc(SV* sv);
+\& void SvREFCNT_dec(SV* sv);
+.Ve
+.PP
+(There are also suffixed versions of the increment and decrement macros,
+for situations where the full generality of these basic macros can be
+exchanged for some performance.)
+.PP
+However, the way a programmer should think about references is not so
+much in terms of the bare reference count, but in terms of \fIownership\fR
+of references. A reference to an xV can be owned by any of a variety
+of entities: another xV, the Perl interpreter, an XS data structure,
+a piece of running code, or a dynamic scope. An xV generally does not
+know what entities own the references to it; it only knows how many
+references there are, which is the reference count.
+.PP
+To correctly maintain reference counts, it is essential to keep track
+of what references the XS code is manipulating. The programmer should
+always know where a reference has come from and who owns it, and be
+aware of any creation or destruction of references, and any transfers
+of ownership. Because ownership isn't represented explicitly in the xV
+data structures, only the reference count need be actually maintained
+by the code, and that means that this understanding of ownership is not
+actually evident in the code. For example, transferring ownership of a
+reference from one owner to another doesn't change the reference count
+at all, so may be achieved with no actual code. (The transferring code
+doesn't touch the referenced object, but does need to ensure that the
+former owner knows that it no longer owns the reference, and that the
+new owner knows that it now does.)
+.PP
+An xV that is visible at the Perl level should not become unreferenced
+and thus be destroyed. Normally, an object will only become unreferenced
+when it is no longer visible, often by the same means that makes it
+invisible. For example, a Perl reference value (RV) owns a reference to
+its referent, so if the RV is overwritten that reference gets destroyed,
+and the no-longer-reachable referent may be destroyed as a result.
+.PP
+Many functions have some kind of reference manipulation as
+part of their purpose. Sometimes this is documented in terms
+of ownership of references, and sometimes it is (less helpfully)
+documented in terms of changes to reference counts. For example, the
+\&\fBnewRV_inc()\fR function is documented to create a new RV
+(with reference count 1) and increment the reference count of the referent
+that was supplied by the caller. This is best understood as creating
+a new reference to the referent, which is owned by the created RV,
+and returning to the caller ownership of the sole reference to the RV.
+The \fBnewRV_noinc()\fR function instead does not
+increment the reference count of the referent, but the RV nevertheless
+ends up owning a reference to the referent. It is therefore implied
+that the caller of \f(CWnewRV_noinc()\fR is relinquishing a reference to the
+referent, making this conceptually a more complicated operation even
+though it does less to the data structures.
+.PP
+For example, imagine you want to return a reference from an XSUB
+function. Inside the XSUB routine, you create an SV which initially
+has just a single reference, owned by the XSUB routine. This reference
+needs to be disposed of before the routine is complete, otherwise it
+will leak, preventing the SV from ever being destroyed. So to create
+an RV referencing the SV, it is most convenient to pass the SV to
+\&\f(CWnewRV_noinc()\fR, which consumes that reference. Now the XSUB routine
+no longer owns a reference to the SV, but does own a reference to the RV,
+which in turn owns a reference to the SV. The ownership of the reference
+to the RV is then transferred by the process of returning the RV from
+the XSUB.
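+.PP
+The ownership bookkeeping above can be modelled in a few lines of
+stand-alone C. This is a sketch, not Perl's implementation: \f(CWToyXV\fR,
+\&\f(CWtoy_rv_inc\fR, \f(CWtoy_rv_noinc\fR and the \f(CWtoy_live\fR counter are
+invented names. It shows that the "noinc" variant hands the caller's
+reference over to the new RV, while the "inc" variant leaves the caller
+still owning one.

```c
#include <stdlib.h>

/* Toy reference-counted value; "referent" is set only for toy RVs. */
typedef struct ToyXV {
    int refcnt;
    struct ToyXV *referent;
} ToyXV;

static int toy_live = 0;   /* number of ToyXVs not yet destroyed */

/* A new value starts with one reference, owned by the caller. */
static ToyXV *toy_new(void)
{
    ToyXV *xv = calloc(1, sizeof *xv);
    if (!xv) abort();
    xv->refcnt = 1;
    toy_live++;
    return xv;
}

static ToyXV *toy_inc(ToyXV *xv) { xv->refcnt++; return xv; }

/* Give up one reference; destroy the value when none remain. */
static void toy_dec(ToyXV *xv)
{
    if (--xv->refcnt > 0) return;
    if (xv->referent) toy_dec(xv->referent);  /* an RV owns its referent */
    free(xv);
    toy_live--;
}

/* Like newRV_inc: the RV gets a new reference; the caller keeps its own. */
static ToyXV *toy_rv_inc(ToyXV *thing)
{
    ToyXV *rv = toy_new();
    rv->referent = toy_inc(thing);
    return rv;
}

/* Like newRV_noinc: the caller's reference is handed over to the RV. */
static ToyXV *toy_rv_noinc(ToyXV *thing)
{
    ToyXV *rv = toy_new();
    rv->referent = thing;
    return rv;
}
```

+.PP
+In this model, releasing an RV made with \f(CWtoy_rv_noinc\fR also destroys
+the referent, whereas after \f(CWtoy_rv_inc\fR the caller must still release
+its own reference.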
+.PP
+There are some convenience functions available that can help with the
+destruction of xVs. These functions introduce the concept of "mortality".
+Much documentation speaks of an xV itself being mortal, but this is
+misleading. It is really \fIa reference to\fR an xV that is mortal, and it
+is possible for there to be more than one mortal reference to a single xV.
+For a reference to be mortal means that it is owned by the temps stack,
+one of perl's many internal stacks, which will destroy that reference
+"a short time later". Usually the "short time later" is the end of
+the current Perl statement. However, it gets more complicated around
+dynamic scopes: there can be multiple sets of mortal references hanging
+around at the same time, with different death dates. Internally, the
+actual determinant for when mortal xV references are destroyed depends
+on two macros, SAVETMPS and FREETMPS. See perlcall and perlxs
+and "Temporaries Stack" below for more details on these macros.
+.PP
+Mortal references are mainly used for xVs that are placed on perl's
+main stack. The stack is problematic for reference tracking, because it
+contains a lot of xV references, but doesn't own those references: they
+are not counted. Currently, there are many bugs resulting from xVs being
+destroyed while referenced by the stack, because the stack's uncounted
+references aren't enough to keep the xVs alive. So when putting an
+(uncounted) reference on the stack, it is vitally important to ensure that
+there will be a counted reference to the same xV that will last at least
+as long as the uncounted reference. But it's also important that that
+counted reference be cleaned up at an appropriate time, and not unduly
+prolong the xV's life. For there to be a mortal reference is often the
+best way to satisfy this requirement, especially if the xV was created
+especially to be put on the stack and would otherwise be unreferenced.
+.PP
+To create a mortal reference, use the functions:
+.PP
+.Vb 3
+\& SV* sv_newmortal()
+\& SV* sv_mortalcopy(SV*)
+\& SV* sv_2mortal(SV*)
+.Ve
+.PP
+\&\f(CWsv_newmortal()\fR creates an SV (with the undefined value) whose sole
+reference is mortal. \f(CWsv_mortalcopy()\fR creates an xV whose value is a
+copy of a supplied xV and whose sole reference is mortal. \f(CWsv_2mortal()\fR
+mortalises an existing xV reference: it transfers ownership of a reference
+from the caller to the temps stack. Because \f(CW\*(C`sv_newmortal\*(C'\fR gives the new
+SV no value, it must normally be given one via \f(CW\*(C`sv_setpv\*(C'\fR, \f(CW\*(C`sv_setiv\*(C'\fR,
+etc. :
+.PP
+.Vb 2
+\& SV *tmp = sv_newmortal();
+\& sv_setiv(tmp, an_integer);
+.Ve
+.PP
+As that is multiple C statements, it is quite common to see this idiom instead:
+.PP
+.Vb 1
+\& SV *tmp = sv_2mortal(newSViv(an_integer));
+.Ve
+.PP
+The mortal routines are not just for SVs; AVs and HVs can be
+made mortal by passing their address (cast to \f(CW\*(C`SV*\*(C'\fR) to the
+\&\f(CW\*(C`sv_2mortal\*(C'\fR or \f(CW\*(C`sv_mortalcopy\*(C'\fR routines.
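+.PP
+The idea that the temps stack owns mortal references, and releases them in
+batches, can be sketched in stand-alone C. Everything here is a simplified
+model with invented names (\f(CWtoy_2mortal\fR, \f(CWtoy_savetmps\fR,
+\&\f(CWtoy_freetmps\fR); in particular, returning the old floor to the caller
+is an assumption of this sketch and not how perl restores it.

```c
#include <stdlib.h>

typedef struct { int refcnt; } ToySV;

static int toy_live = 0;               /* ToySVs not yet destroyed */

#define TOY_TMPS_MAX 16
static ToySV *toy_tmps[TOY_TMPS_MAX];  /* references owned by the temps stack */
static int toy_tmps_ix = 0;            /* number of entries in use */
static int toy_tmps_floor = 0;         /* entries below this outlive the scope */

static ToySV *toy_new(void)
{
    ToySV *sv = malloc(sizeof *sv);
    if (!sv) abort();
    sv->refcnt = 1;
    toy_live++;
    return sv;
}

static void toy_dec(ToySV *sv)
{
    if (--sv->refcnt == 0) { free(sv); toy_live--; }
}

/* Like sv_2mortal: the caller's reference now belongs to the temps stack. */
static ToySV *toy_2mortal(ToySV *sv)
{
    if (toy_tmps_ix == TOY_TMPS_MAX) abort();
    toy_tmps[toy_tmps_ix++] = sv;
    return sv;
}

/* Like SAVETMPS: remember where this scope's temporaries begin.
   Returns the previous floor so that the caller can restore it. */
static int toy_savetmps(void)
{
    int old = toy_tmps_floor;
    toy_tmps_floor = toy_tmps_ix;
    return old;
}

/* Like FREETMPS: release every mortal reference made since the floor. */
static void toy_freetmps(void)
{
    while (toy_tmps_ix > toy_tmps_floor)
        toy_dec(toy_tmps[--toy_tmps_ix]);
}
```

+.PP
+A value that has another owner besides the temps stack survives
+\&\f(CWtoy_freetmps\fR; only the mortal reference is released.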
+.SS "Stashes and Globs"
+.IX Subsection "Stashes and Globs"
+A \fBstash\fR is a hash that contains all variables that are defined
+within a package. Each key of the stash is a symbol
+name (shared by all the different types of objects that have the same
+name), and each value in the hash table is a GV (Glob Value). This GV
+in turn contains references to the various objects of that name,
+including (but not limited to) the following:
+.PP
+.Vb 6
+\& Scalar Value
+\& Array Value
+\& Hash Value
+\& I/O Handle
+\& Format
+\& Subroutine
+.Ve
+.PP
+There is a single stash called \f(CW\*(C`PL_defstash\*(C'\fR that holds the items that exist
+in the \f(CW\*(C`main\*(C'\fR package. To get at the items in other packages, append the
+string "::" to the package name. The items in the \f(CW\*(C`Foo\*(C'\fR package are in
+the stash \f(CW\*(C`Foo::\*(C'\fR in PL_defstash. The items in the \f(CW\*(C`Bar::Baz\*(C'\fR package are
+in the stash \f(CW\*(C`Baz::\*(C'\fR in \f(CW\*(C`Bar::\*(C'\fR's stash.
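+.PP
+The nesting rule can be illustrated with a stand-alone C helper that turns
+a package name into the sequence of stash keys looked up in turn. The
+function \f(CWtoy_stash_path\fR is invented for this sketch and does not
+touch any Perl data structure; it only shows the naming scheme.

```c
#include <stdio.h>
#include <string.h>

/* Split a package name such as "Bar::Baz" into the stash keys that are
   looked up in turn, starting from the top-level stash: "Bar::", "Baz::".
   Each key buffer holds up to 63 characters (longer names are truncated).
   Writes at most max keys and returns how many were written. */
static int toy_stash_path(const char *pkg, char keys[][64], int max)
{
    int n = 0;
    while (*pkg && n < max) {
        const char *sep = strstr(pkg, "::");
        size_t len = sep ? (size_t)(sep - pkg) : strlen(pkg);
        /* Each component becomes a key with "::" appended. */
        snprintf(keys[n++], 64, "%.*s::", (int)len, pkg);
        pkg += len + (sep ? 2 : 0);
    }
    return n;
}
```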
+.PP
+To get the stash pointer for a particular package, use the function:
+.PP
+.Vb 2
+\& HV* gv_stashpv(const char* name, I32 flags)
+\& HV* gv_stashsv(SV*, I32 flags)
+.Ve
+.PP
+The first function takes a literal string, the second uses the string stored
+in the SV. Remember that a stash is just a hash table, so you get back an
+\&\f(CW\*(C`HV*\*(C'\fR. The \f(CW\*(C`flags\*(C'\fR argument will create a new package if it is set to GV_ADD.
+.PP
+The name that \f(CW\*(C`gv_stash*v\*(C'\fR wants is the name of the package whose symbol table
+you want. The default package is called \f(CW\*(C`main\*(C'\fR. If you have multiply nested
+packages, pass their names to \f(CW\*(C`gv_stash*v\*(C'\fR, separated by \f(CW\*(C`::\*(C'\fR as in the Perl
+language itself.
+.PP
+Alternately, if you have an SV that is a blessed reference, you can find
+out the stash pointer by using:
+.PP
+.Vb 1
+\& HV* SvSTASH(SvRV(SV*));
+.Ve
+.PP
+then use the following to get the package name itself:
+.PP
+.Vb 1
+\& char* HvNAME(HV* stash);
+.Ve
+.PP
+If you need to bless or re-bless an object you can use the following
+function:
+.PP
+.Vb 1
+\& SV* sv_bless(SV*, HV* stash)
+.Ve
+.PP
+where the first argument, an \f(CW\*(C`SV*\*(C'\fR, must be a reference, and the second
+argument is a stash. The returned \f(CW\*(C`SV*\*(C'\fR can now be used in the same way
+as any other SV.
+.PP
+For more information on references and blessings, consult perlref.
+.SS "I/O Handles"
+.IX Subsection "I/O Handles"
+Like AVs and HVs, IO objects are another type of non-scalar SV which
+may contain input and output PerlIO objects or a \f(CW\*(C`DIR *\*(C'\fR
+from \fBopendir()\fR.
+.PP
+You can create a new IO object:
+.PP
+.Vb 1
+\& IO* newIO();
+.Ve
+.PP
+Unlike other SVs, a new IO object is automatically blessed into the
+IO::File class.
+.PP
+The IO object contains an input and output PerlIO handle:
+.PP
+.Vb 2
+\& PerlIO *IoIFP(IO *io);
+\& PerlIO *IoOFP(IO *io);
+.Ve
+.PP
+Typically if the IO object has been opened on a file, the input handle
+is always present, but the output handle is only present if the file
+is open for output. For a file, if both are present they will be the
+same PerlIO object.
+.PP
+Distinct input and output PerlIO objects are created for sockets and
+character devices.
+.PP
+The IO object also contains other data associated with Perl I/O
+handles:
+.PP
+.Vb 12
+\& IV IoLINES(io); /* $. */
+\& IV IoPAGE(io); /* $% */
+\& IV IoPAGE_LEN(io); /* $= */
+\& IV IoLINES_LEFT(io); /* $\- */
+\& char *IoTOP_NAME(io); /* $^ */
+\& GV *IoTOP_GV(io); /* $^ */
+\& char *IoFMT_NAME(io); /* $~ */
+\& GV *IoFMT_GV(io); /* $~ */
+\& char *IoBOTTOM_NAME(io);
+\& GV *IoBOTTOM_GV(io);
+\& char IoTYPE(io);
+\& U8 IoFLAGS(io);
+.Ve
+.PP
+Most of these are involved with formats.
+.PP
+\&\fBIoFLAGS()\fR may contain a combination of flags, the most interesting of
+which are \f(CW\*(C`IOf_FLUSH\*(C'\fR (\f(CW$|\fR) for autoflush and \f(CW\*(C`IOf_UNTAINT\*(C'\fR,
+settable with IO::Handle's \fBuntaint()\fR method.
+.PP
+The IO object may also contain a directory handle:
+.PP
+.Vb 1
+\& DIR *IoDIRP(io);
+.Ve
+.PP
+suitable for use with \fBPerlDir_read()\fR etc.
+.PP
+All of these accessor macros are lvalues; there are no distinct
+\&\f(CW_set()\fR macros to modify the members of the IO object.
+.SS "Double-Typed SVs"
+.IX Subsection "Double-Typed SVs"
+Scalar variables normally contain only one type of value, an integer,
+double, pointer, or reference. Perl will automatically convert the
+actual scalar data from the stored type into the requested type.
+.PP
+Some scalar variables contain more than one type of scalar data. For
+example, the variable \f(CW$!\fR contains either the numeric value of \f(CW\*(C`errno\*(C'\fR
+or its string equivalent from either \f(CW\*(C`strerror\*(C'\fR or \f(CW\*(C`sys_errlist[]\*(C'\fR.
+.PP
+To force multiple data values into an SV, you must do two things: use the
+\&\f(CW\*(C`sv_set*v\*(C'\fR routines to add the additional scalar type, then set a flag
+so that Perl will believe it contains more than one type of data. The
+four macros to set the flags are:
+.PP
+.Vb 4
+\& SvIOK_on
+\& SvNOK_on
+\& SvPOK_on
+\& SvROK_on
+.Ve
+.PP
+The particular macro you must use depends on which \f(CW\*(C`sv_set*v\*(C'\fR routine
+you called first. This is because every \f(CW\*(C`sv_set*v\*(C'\fR routine turns on
+only the bit for the particular type of data being set, and turns off
+all the rest.
+.PP
+For example, to create a new Perl variable called "dberror" that contains
+both the numeric and descriptive string error values, you could use the
+following code:
+.PP
+.Vb 2
+\& extern int dberror;
+\& extern char *dberror_list[];
+\&
+\& SV* sv = get_sv("dberror", GV_ADD);
+\& sv_setiv(sv, (IV) dberror);
+\& sv_setpv(sv, dberror_list[dberror]);
+\& SvIOK_on(sv);
+.Ve
+.PP
+If the order of \f(CW\*(C`sv_setiv\*(C'\fR and \f(CW\*(C`sv_setpv\*(C'\fR had been reversed, then the
+macro \f(CW\*(C`SvPOK_on\*(C'\fR would need to be called instead of \f(CW\*(C`SvIOK_on\*(C'\fR.
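+.PP
+Why the order of the calls decides which flag macro is needed can be seen
+in a stand-alone C model. The flag names and functions below
+(\f(CWTOY_IOK\fR, \f(CWtoy_setiv\fR, \f(CWtoy_setpv\fR, \f(CWtoy_iok_on\fR) are
+invented for this sketch; it assumes only what the text above states, namely
+that each setter turns on its own flag and turns off all the others.

```c
#include <string.h>

#define TOY_IOK 0x1   /* integer slot is valid */
#define TOY_POK 0x2   /* string slot is valid */

typedef struct {
    unsigned flags;
    long iv;
    char pv[32];
} ToySV;

/* Like sv_setiv: store the integer and mark ONLY the integer as valid. */
static void toy_setiv(ToySV *sv, long iv)
{
    sv->iv = iv;
    sv->flags = TOY_IOK;
}

/* Like sv_setpv: store the string and mark ONLY the string as valid. */
static void toy_setpv(ToySV *sv, const char *pv)
{
    strncpy(sv->pv, pv, sizeof sv->pv - 1);
    sv->pv[sizeof sv->pv - 1] = '\0';
    sv->flags = TOY_POK;
}

/* Like SvIOK_on: declare that the integer slot is valid as well. */
static void toy_iok_on(ToySV *sv) { sv->flags |= TOY_IOK; }
```

+.PP
+After \f(CWtoy_setiv\fR followed by \f(CWtoy_setpv\fR, the integer is still
+stored but no longer flagged as valid, which is why the integer flag is the
+one that has to be switched back on.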
+.SS "Read-Only Values"
+.IX Subsection "Read-Only Values"
+In Perl 5.16 and earlier, copy-on-write (see the next section) shared a
+flag bit with read-only scalars. So the only way to test whether
+\&\f(CW\*(C`sv_setsv\*(C'\fR, etc., will raise a "Modification of a read-only value" error
+in those versions is:
+.PP
+.Vb 1
+\& SvREADONLY(sv) && !SvIsCOW(sv)
+.Ve
+.PP
+Under Perl 5.18 and later, SvREADONLY only applies to read-only variables,
+and, under 5.20, copy-on-write scalars can also be read-only, so the above
+check is incorrect. You just want:
+.PP
+.Vb 1
+\& SvREADONLY(sv)
+.Ve
+.PP
+If you need to do this check often, define your own macro like this:
+.PP
+.Vb 5
+\& #if PERL_VERSION >= 18
+\& # define SvTRULYREADONLY(sv) SvREADONLY(sv)
+\& #else
+\& # define SvTRULYREADONLY(sv) (SvREADONLY(sv) && !SvIsCOW(sv))
+\& #endif
+.Ve
+.SS "Copy on Write"
+.IX Subsection "Copy on Write"
+Perl implements a copy-on-write (COW) mechanism for scalars, in which
+string copies are not immediately made when requested, but are deferred
+until made necessary by one or the other scalar changing. This is mostly
+transparent, but one must take care not to modify string buffers that are
+shared by multiple SVs.
+.PP
+You can test whether an SV is using copy-on-write with \f(CWSvIsCOW(sv)\fR.
+.PP
+You can force an SV to make its own copy of its string buffer by calling
+\&\f(CWsv_force_normal(sv)\fR or \f(CWSvPV_force_nolen(sv)\fR.
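+.PP
+The general shape of copy-on-write can be sketched in stand-alone C. This
+is a generic model with invented names (\f(CWToyBuf\fR, \f(CWtoy_copy\fR,
+\&\f(CWtoy_force_normal\fR); how perl actually tracks shared buffers is not
+shown here and should not be inferred from it. The point is only that a
+shared buffer must be un-shared before it is written to.

```c
#include <stdlib.h>
#include <string.h>

/* A string buffer that several toy scalars may share. */
typedef struct { int sharers; char text[32]; } ToyBuf;

typedef struct { ToyBuf *buf; } ToySV;

static ToySV toy_new(const char *s)
{
    ToySV sv;
    sv.buf = calloc(1, sizeof *sv.buf);   /* zero-filled, so text stays NUL-terminated */
    if (!sv.buf) abort();
    sv.buf->sharers = 1;
    strncpy(sv.buf->text, s, sizeof sv.buf->text - 1);
    return sv;
}

/* "Copy" by sharing the buffer; no bytes are duplicated yet. */
static ToySV toy_copy(ToySV *src)
{
    src->buf->sharers++;
    return *src;
}

/* Is the buffer currently shared with another toy scalar? */
static int toy_is_cow(const ToySV *sv) { return sv->buf->sharers > 1; }

/* Make sure this scalar has a private buffer before it is modified. */
static void toy_force_normal(ToySV *sv)
{
    if (!toy_is_cow(sv)) return;
    ToyBuf *own = malloc(sizeof *own);
    if (!own) abort();
    *own = *sv->buf;          /* the deferred copy happens here */
    own->sharers = 1;
    sv->buf->sharers--;
    sv->buf = own;
}

/* Safe in-place modification: un-share first, then write. */
static void toy_set_char(ToySV *sv, size_t i, char c)
{
    toy_force_normal(sv);
    sv->buf->text[i] = c;
}

/* Release this scalar's hold on its buffer. */
static void toy_free(ToySV *sv)
{
    if (--sv->buf->sharers == 0) free(sv->buf);
    sv->buf = NULL;
}
```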
+.PP
+If you want to make the SV drop its string buffer, use
+\&\f(CW\*(C`sv_force_normal_flags(sv, SV_COW_DROP_PV)\*(C'\fR or simply
+\&\f(CW\*(C`sv_setsv(sv, NULL)\*(C'\fR.
+.PP
+All of these functions will croak on read-only scalars (see the previous
+section for more on those).
+.PP
+To test that your code is behaving correctly and not modifying COW buffers,
+on systems that support \fBmmap\fR\|(2) (i.e., Unix) you can configure perl with
+\&\f(CW\*(C`\-Accflags=\-DPERL_DEBUG_READONLY_COW\*(C'\fR and it will turn buffer violations
+into crashes. You will find it to be marvellously slow, so you may want to
+skip perl's own tests.
+.SS "Magic Variables"
+.IX Subsection "Magic Variables"
+[This section still under construction. Ignore everything here. Post no
+bills. Everything not permitted is forbidden.]
+.PP
+Any SV may be magical, that is, it has special features that a normal
+SV does not have. These features are stored in the SV structure in a
+linked list of \f(CW\*(C`struct magic\*(C'\fR's, typedef'ed to \f(CW\*(C`MAGIC\*(C'\fR.
+.PP
+.Vb 10
+\& struct magic {
+\& MAGIC* mg_moremagic;
+\& MGVTBL* mg_virtual;
+\& U16 mg_private;
+\& char mg_type;
+\& U8 mg_flags;
+\& I32 mg_len;
+\& SV* mg_obj;
+\& char* mg_ptr;
+\& };
+.Ve
+.PP
+Note this is current as of patchlevel 0, and could change at any time.
+.SS "Assigning Magic"
+.IX Subsection "Assigning Magic"
+Perl adds magic to an SV using the sv_magic function:
+.PP
+.Vb 1
+\& void sv_magic(SV* sv, SV* obj, int how, const char* name, I32 namlen);
+.Ve
+.PP
+The \f(CW\*(C`sv\*(C'\fR argument is a pointer to the SV that is to acquire a new magical
+feature.
+.PP
+If \f(CW\*(C`sv\*(C'\fR is not already magical, Perl uses the \f(CW\*(C`SvUPGRADE\*(C'\fR macro to
+convert \f(CW\*(C`sv\*(C'\fR to type \f(CW\*(C`SVt_PVMG\*(C'\fR.
+Perl then continues by adding new magic
+to the beginning of the linked list of magical features. Any prior entry
+of the same type of magic is deleted. Note that this can be overridden,
+and multiple instances of the same type of magic can be associated with an
+SV.
+.PP
+The \f(CW\*(C`name\*(C'\fR and \f(CW\*(C`namlen\*(C'\fR arguments are used to associate a string with
+the magic, typically the name of a variable. \f(CW\*(C`namlen\*(C'\fR is stored in the
+\&\f(CW\*(C`mg_len\*(C'\fR field and if \f(CW\*(C`name\*(C'\fR is non-null then either a \f(CW\*(C`savepvn\*(C'\fR copy of
+\&\f(CW\*(C`name\*(C'\fR or \f(CW\*(C`name\*(C'\fR itself is stored in the \f(CW\*(C`mg_ptr\*(C'\fR field, depending on
+whether \f(CW\*(C`namlen\*(C'\fR is greater than zero or equal to zero respectively. As a
+special case, if \f(CW\*(C`(name && namlen == HEf_SVKEY)\*(C'\fR then \f(CW\*(C`name\*(C'\fR is assumed
+to contain an \f(CW\*(C`SV*\*(C'\fR and is stored as-is with its REFCNT incremented.
+.PP
+The sv_magic function uses \f(CW\*(C`how\*(C'\fR to determine which, if any, predefined
+"Magic Virtual Table" should be assigned to the \f(CW\*(C`mg_virtual\*(C'\fR field.
+See the "Magic Virtual Tables" section below. The \f(CW\*(C`how\*(C'\fR argument is also
+stored in the \f(CW\*(C`mg_type\*(C'\fR field. The value of
+\&\f(CW\*(C`how\*(C'\fR should be chosen from the set of macros
+\&\f(CW\*(C`PERL_MAGIC_foo\*(C'\fR found in \fIperl.h\fR. Note that before
+these macros were added, Perl internals used to directly use character
+literals, so you may occasionally come across old code or documentation
+referring to 'U' magic rather than \f(CW\*(C`PERL_MAGIC_uvar\*(C'\fR for example.
+.PP
+The \f(CW\*(C`obj\*(C'\fR argument is stored in the \f(CW\*(C`mg_obj\*(C'\fR field of the \f(CW\*(C`MAGIC\*(C'\fR
+structure. If it is not the same as the \f(CW\*(C`sv\*(C'\fR argument, the reference
+count of the \f(CW\*(C`obj\*(C'\fR object is incremented. If it is the same, or if
+the \f(CW\*(C`how\*(C'\fR argument is \f(CW\*(C`PERL_MAGIC_arylen\*(C'\fR, \f(CW\*(C`PERL_MAGIC_regdatum\*(C'\fR,
+\&\f(CW\*(C`PERL_MAGIC_regdata\*(C'\fR, or if it is a NULL pointer, then \f(CW\*(C`obj\*(C'\fR is merely
+stored, without the reference count being incremented.
+.PP
+See also \f(CW\*(C`sv_magicext\*(C'\fR in perlapi for a more flexible way to add magic
+to an SV.
+.PP
+There is also a function to add magic to an \f(CW\*(C`HV\*(C'\fR:
+.PP
+.Vb 1
+\& void hv_magic(HV *hv, GV *gv, int how);
+.Ve
+.PP
+This simply calls \f(CW\*(C`sv_magic\*(C'\fR and coerces the \f(CW\*(C`gv\*(C'\fR argument into an \f(CW\*(C`SV\*(C'\fR.
+.PP
+To remove the magic from an SV, call the function sv_unmagic:
+.PP
+.Vb 1
+\& int sv_unmagic(SV *sv, int type);
+.Ve
+.PP
+The \f(CW\*(C`type\*(C'\fR argument should be equal to the \f(CW\*(C`how\*(C'\fR value when the \f(CW\*(C`SV\*(C'\fR
+was initially made magical.
+.PP
+However, note that \f(CW\*(C`sv_unmagic\*(C'\fR removes all magic of a certain \f(CW\*(C`type\*(C'\fR from the
+\&\f(CW\*(C`SV\*(C'\fR. If you want to remove only certain
+magic of a \f(CW\*(C`type\*(C'\fR based on the magic
+virtual table, use \f(CW\*(C`sv_unmagicext\*(C'\fR instead:
+.PP
+.Vb 1
+\& int sv_unmagicext(SV *sv, int type, MGVTBL *vtbl);
+.Ve
+.SS "Magic Virtual Tables"
+.IX Subsection "Magic Virtual Tables"
+The \f(CW\*(C`mg_virtual\*(C'\fR field in the \f(CW\*(C`MAGIC\*(C'\fR structure is a pointer to an
+\&\f(CW\*(C`MGVTBL\*(C'\fR, which is a structure of function pointers and stands for
+"Magic Virtual Table" to handle the various operations that might be
+applied to that variable.
+.PP
+The \f(CW\*(C`MGVTBL\*(C'\fR has five (or sometimes eight) pointers to the following
+routine types:
+.PP
+.Vb 5
+\&    int  (*svt_get)  (pTHX_ SV* sv, MAGIC* mg);
+\&    int  (*svt_set)  (pTHX_ SV* sv, MAGIC* mg);
+\&    U32  (*svt_len)  (pTHX_ SV* sv, MAGIC* mg);
+\&    int  (*svt_clear)(pTHX_ SV* sv, MAGIC* mg);
+\&    int  (*svt_free) (pTHX_ SV* sv, MAGIC* mg);
+\&
+\&    int  (*svt_copy) (pTHX_ SV *sv, MAGIC* mg, SV *nsv,
+\&                      const char *name, I32 namlen);
+\&    int  (*svt_dup)  (pTHX_ MAGIC *mg, CLONE_PARAMS *param);
+\&    int  (*svt_local)(pTHX_ SV *nsv, MAGIC *mg);
+.Ve
+.PP
+This MGVTBL structure is set at compile-time in \fIperl.h\fR and there are
+currently 32 types. These different structures contain pointers to various
+routines that perform additional actions depending on which function is
+being called.
+.PP
+.Vb 8
+\& Function pointer    Action taken
+\& \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-    \-\-\-\-\-\-\-\-\-\-\-\-
+\& svt_get             Do something before the value of the SV is
+\&                     retrieved.
+\& svt_set             Do something after the SV is assigned a value.
+\& svt_len             Report on the SV\*(Aqs length.
+\& svt_clear           Clear something the SV represents.
+\& svt_free            Free any extra storage associated with the SV.
+\&
+\& svt_copy            copy tied variable magic to a tied element
+\& svt_dup             duplicate a magic structure during thread cloning
+\& svt_local           copy magic to local value during \*(Aqlocal\*(Aq
+.Ve
+.PP
+For instance, the MGVTBL structure called \f(CW\*(C`vtbl_sv\*(C'\fR (which corresponds
+to an \f(CW\*(C`mg_type\*(C'\fR of \f(CW\*(C`PERL_MAGIC_sv\*(C'\fR) contains:
+.PP
+.Vb 1
+\& { magic_get, magic_set, magic_len, 0, 0 }
+.Ve
+.PP
+Thus, when an SV is determined to be magical and of type \f(CW\*(C`PERL_MAGIC_sv\*(C'\fR,
+if a get operation is being performed, the routine \f(CW\*(C`magic_get\*(C'\fR is
+called. All the various routines for the various magical types begin
+with \f(CW\*(C`magic_\*(C'\fR. NOTE: the magic routines are not considered part of
+the Perl API, and may not be exported by the Perl library.
+.PP
+The last three slots are a recent addition, and for source code
+compatibility they are only checked for if one of the three flags
+\&\f(CW\*(C`MGf_COPY\*(C'\fR, \f(CW\*(C`MGf_DUP\*(C'\fR, or \f(CW\*(C`MGf_LOCAL\*(C'\fR is set in mg_flags.
+This means that most code can continue declaring
+a vtable as a 5\-element value. These three are
+currently used exclusively by the threading code, and are highly subject
+to change.
+.PP
+The current kinds of Magic Virtual Tables are:
+.PP
+.Vb 10
+\& mg_type
+\& (old\-style char and macro)   MGVTBL         Type of magic
+\& \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-   \-\-\-\-\-\-         \-\-\-\-\-\-\-\-\-\-\-\-\-
+\& \e0 PERL_MAGIC_sv             vtbl_sv        Special scalar variable
+\& #  PERL_MAGIC_arylen         vtbl_arylen    Array length ($#ary)
+\& %  PERL_MAGIC_rhash          (none)         Extra data for restricted
+\&                                             hashes
+\& *  PERL_MAGIC_debugvar       vtbl_debugvar  $DB::single, signal, trace
+\&                                             vars
+\& .  PERL_MAGIC_pos            vtbl_pos       pos() lvalue
+\& :  PERL_MAGIC_symtab         (none)         Extra data for symbol
+\&                                             tables
+\& <  PERL_MAGIC_backref        vtbl_backref   For weak ref data
+\& @  PERL_MAGIC_arylen_p       (none)         To move arylen out of XPVAV
+\& B  PERL_MAGIC_bm             vtbl_regexp    Boyer\-Moore
+\&                                             (fast string search)
+\& c  PERL_MAGIC_overload_table vtbl_ovrld     Holds overload table
+\&                                             (AMT) on stash
+\& D  PERL_MAGIC_regdata        vtbl_regdata   Regex match position data
+\&                                             (@+ and @\- vars)
+\& d  PERL_MAGIC_regdatum       vtbl_regdatum  Regex match position data
+\&                                             element
+\& E  PERL_MAGIC_env            vtbl_env       %ENV hash
+\& e  PERL_MAGIC_envelem        vtbl_envelem   %ENV hash element
+\& f  PERL_MAGIC_fm             vtbl_regexp    Formline
+\&                                             (\*(Aqcompiled\*(Aq format)
+\& g  PERL_MAGIC_regex_global   vtbl_mglob     m//g target
+\& H  PERL_MAGIC_hints          vtbl_hints     %^H hash
+\& h  PERL_MAGIC_hintselem      vtbl_hintselem %^H hash element
+\& I  PERL_MAGIC_isa            vtbl_isa       @ISA array
+\& i  PERL_MAGIC_isaelem        vtbl_isaelem   @ISA array element
+\& k  PERL_MAGIC_nkeys          vtbl_nkeys     scalar(keys()) lvalue
+\& L  PERL_MAGIC_dbfile         (none)         Debugger %_<filename
+\& l  PERL_MAGIC_dbline         vtbl_dbline    Debugger %_<filename
+\&                                             element
+\& N  PERL_MAGIC_shared         (none)         Shared between threads
+\& n  PERL_MAGIC_shared_scalar  (none)         Shared between threads
+\& o  PERL_MAGIC_collxfrm       vtbl_collxfrm  Locale transformation
+\& P  PERL_MAGIC_tied           vtbl_pack      Tied array or hash
+\& p  PERL_MAGIC_tiedelem       vtbl_packelem  Tied array or hash element
+\& q  PERL_MAGIC_tiedscalar     vtbl_packelem  Tied scalar or handle
+\& r  PERL_MAGIC_qr             vtbl_regexp    Precompiled qr// regex
+\& S  PERL_MAGIC_sig            vtbl_sig       %SIG hash
+\& s  PERL_MAGIC_sigelem        vtbl_sigelem   %SIG hash element
+\& t  PERL_MAGIC_taint          vtbl_taint     Taintedness
+\& U  PERL_MAGIC_uvar           vtbl_uvar      Available for use by
+\&                                             extensions
+\& u  PERL_MAGIC_uvar_elem      (none)         Reserved for use by
+\&                                             extensions
+\& V  PERL_MAGIC_vstring        (none)         SV was vstring literal
+\& v  PERL_MAGIC_vec            vtbl_vec       vec() lvalue
+\& w  PERL_MAGIC_utf8           vtbl_utf8      Cached UTF\-8 information
+\& X  PERL_MAGIC_destruct       vtbl_destruct  destruct callback
+\& x  PERL_MAGIC_substr         vtbl_substr    substr() lvalue
+\& Y  PERL_MAGIC_nonelem        vtbl_nonelem   Array element that does not
+\&                                             exist
+\& y  PERL_MAGIC_defelem        vtbl_defelem   Shadow "foreach" iterator
+\&                                             variable / smart parameter
+\&                                             vivification
+\& Z  PERL_MAGIC_hook           vtbl_hook      %{^HOOK} hash
+\& z  PERL_MAGIC_hookelem       vtbl_hookelem  %{^HOOK} hash element
+\& \e  PERL_MAGIC_lvref          vtbl_lvref     Lvalue reference
+\&                                             constructor
+\& ]  PERL_MAGIC_checkcall      vtbl_checkcall Inlining/mutation of call
+\&                                             to this CV
+\& ^  PERL_MAGIC_extvalue       (none)         Value magic available for
+\&                                             use by extensions
+\& ~  PERL_MAGIC_ext            (none)         Variable magic available
+\&                                             for use by extensions
+.Ve
+.PP
+When an uppercase and lowercase letter both exist in the table, then the
+uppercase letter is typically used to represent some kind of composite type
+(a list or a hash), and the lowercase letter is used to represent an element
+of that composite type. Some internals code makes use of this case
+relationship. However, 'v' and 'V' (vec and v\-string) are in no way related.
+.PP
+The \f(CW\*(C`PERL_MAGIC_ext\*(C'\fR, \f(CW\*(C`PERL_MAGIC_extvalue\*(C'\fR and \f(CW\*(C`PERL_MAGIC_uvar\*(C'\fR magic types
+are defined specifically for use by extensions and will not be used by perl
+itself. Extensions can use \f(CW\*(C`PERL_MAGIC_ext\*(C'\fR or \f(CW\*(C`PERL_MAGIC_extvalue\*(C'\fR magic to
+\&'attach' private information to variables (typically objects). This is
+especially useful because there is no way for normal perl code to corrupt this
+private information (unlike using extra elements of a hash object).
+\&\f(CW\*(C`PERL_MAGIC_extvalue\*(C'\fR is value magic (unlike \f(CW\*(C`PERL_MAGIC_ext\*(C'\fR and
+\&\f(CW\*(C`PERL_MAGIC_uvar\*(C'\fR) meaning that on localization the new value will not be
+magical.
+.PP
+Similarly, \f(CW\*(C`PERL_MAGIC_uvar\*(C'\fR magic can be used much like \fBtie()\fR to call a
+C function any time a scalar's value is used or changed. The \f(CW\*(C`MAGIC\*(C'\fR's
+\&\f(CW\*(C`mg_ptr\*(C'\fR field points to a \f(CW\*(C`ufuncs\*(C'\fR structure:
+.PP
+.Vb 5
+\&    struct ufuncs {
+\&        I32 (*uf_val)(pTHX_ IV, SV*);
+\&        I32 (*uf_set)(pTHX_ IV, SV*);
+\&        IV uf_index;
+\&    };
+.Ve
+.PP
+When the SV is read from or written to, the \f(CW\*(C`uf_val\*(C'\fR or \f(CW\*(C`uf_set\*(C'\fR
+function will be called with \f(CW\*(C`uf_index\*(C'\fR as the first arg and a pointer to
+the SV as the second. A simple example of how to add \f(CW\*(C`PERL_MAGIC_uvar\*(C'\fR
+magic is shown below. Note that the ufuncs structure is copied by
+sv_magic, so you can safely allocate it on the stack.
+.PP
+.Vb 10
+\&    void
+\&    Umagic(sv)
+\&        SV *sv;
+\&    PREINIT:
+\&        struct ufuncs uf;
+\&    CODE:
+\&        uf.uf_val   = &my_get_fn;
+\&        uf.uf_set   = &my_set_fn;
+\&        uf.uf_index = 0;
+\&        sv_magic(sv, 0, PERL_MAGIC_uvar, (char*)&uf, sizeof(uf));
+.Ve
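+.PP
+The two callbacks named in the example are left undefined. A minimal sketch
+of what they might look like follows; only the signatures are dictated by
+\&\f(CW\*(C`struct ufuncs\*(C'\fR, while the bodies and the counter \f(CW\*(C`my_write_count\*(C'\fR are
+assumptions made purely for illustration:
+.PP
+.Vb 10
+\&    static IV my_write_count = 0;   /* hypothetical bookkeeping */
+\&
+\&    /* called when the scalar is read; index is the stored uf_index */
+\&    static I32
+\&    my_get_fn(pTHX_ IV index, SV *sv)
+\&    {
+\&        PERL_UNUSED_ARG(index);
+\&        sv_setiv(sv, my_write_count);   /* supply a value on read */
+\&        return 0;
+\&    }
+\&
+\&    /* called after the scalar has been written */
+\&    static I32
+\&    my_set_fn(pTHX_ IV index, SV *sv)
+\&    {
+\&        PERL_UNUSED_ARG(index);
+\&        PERL_UNUSED_ARG(sv);
+\&        my_write_count++;               /* count assignments */
+\&        return 0;
+\&    }
+.Ve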
+.PP
+Attaching \f(CW\*(C`PERL_MAGIC_uvar\*(C'\fR to arrays is permissible but has no effect.
+.PP
+For hashes there is a specialized hook that gives control over hash
+keys (but not values). This hook calls \f(CW\*(C`PERL_MAGIC_uvar\*(C'\fR 'get' magic
+if the "set" function in the \f(CW\*(C`ufuncs\*(C'\fR structure is NULL. The hook
+is activated whenever the hash is accessed with a key specified as
+an \f(CW\*(C`SV\*(C'\fR through the functions \f(CW\*(C`hv_store_ent\*(C'\fR, \f(CW\*(C`hv_fetch_ent\*(C'\fR,
+\&\f(CW\*(C`hv_delete_ent\*(C'\fR, and \f(CW\*(C`hv_exists_ent\*(C'\fR. Accessing the key as a string
+through the functions without the \f(CW\*(C`..._ent\*(C'\fR suffix circumvents the
+hook. See "GUTS" in Hash::Util::FieldHash for a detailed description.
+.PP
+Note that because multiple extensions may be using \f(CW\*(C`PERL_MAGIC_ext\*(C'\fR
+or \f(CW\*(C`PERL_MAGIC_uvar\*(C'\fR magic, it is important for extensions to take
+extra care to avoid conflict. Typically only using the magic on
+objects blessed into the same class as the extension is sufficient.
+For \f(CW\*(C`PERL_MAGIC_ext\*(C'\fR magic, it is usually a good idea to define an
+\&\f(CW\*(C`MGVTBL\*(C'\fR, even if all its fields will be \f(CW0\fR, so that individual
+\&\f(CW\*(C`MAGIC\*(C'\fR pointers can be identified as a particular kind of magic
+using their magic virtual table. \f(CW\*(C`mg_findext\*(C'\fR provides an easy way
+to do that:
+.PP
+.Vb 1
+\&    STATIC MGVTBL my_vtbl = { 0, 0, 0, 0, 0, 0, 0, 0 };
+\&
+\&    MAGIC *mg;
+\&    if ((mg = mg_findext(sv, PERL_MAGIC_ext, &my_vtbl))) {
+\&        /* this is really ours, not another module\*(Aqs PERL_MAGIC_ext */
+\&        my_priv_data_t *priv = (my_priv_data_t *)mg\->mg_ptr;
+\&        ...
+\&    }
+.Ve
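+.PP
+A complementary sketch shows how such magic might be attached in the first
+place, using \f(CW\*(C`sv_magicext\*(C'\fR and a vtable whose only non-NULL slot is
+\&\f(CW\*(C`svt_free\*(C'\fR. The names \f(CW\*(C`my_free\*(C'\fR, \f(CW\*(C`my_free_vtbl\*(C'\fR and \f(CW\*(C`my_attach\*(C'\fR, and
+the layout of \f(CW\*(C`my_priv_data_t\*(C'\fR, are assumptions, not part of the Perl API:
+.PP
+.Vb 10
+\&    typedef struct { int handle; } my_priv_data_t;  /* assumed layout */
+\&
+\&    /* svt_free callback: runs when the magic is removed or sv is freed */
+\&    static int
+\&    my_free(pTHX_ SV *sv, MAGIC *mg)
+\&    {
+\&        PERL_UNUSED_ARG(sv);
+\&        Safefree(mg\->mg_ptr);       /* release the private data */
+\&        return 0;
+\&    }
+\&
+\&    static MGVTBL my_free_vtbl = { 0, 0, 0, 0, my_free, 0, 0, 0 };
+\&
+\&    static void
+\&    my_attach(pTHX_ SV *sv, int handle)
+\&    {
+\&        my_priv_data_t *priv;
+\&
+\&        Newxz(priv, 1, my_priv_data_t);
+\&        priv\->handle = handle;
+\&        /* namlen 0: mg_ptr holds the pointer itself, no copy is made */
+\&        sv_magicext(sv, NULL, PERL_MAGIC_ext, &my_free_vtbl,
+\&                    (const char *)priv, 0);
+\&    }
+.Ve
+.PP
+Because the vtable is private to the extension, the same address can later be
+passed to \f(CW\*(C`mg_findext\*(C'\fR to retrieve the data.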
+.PP
+Also note that the \f(CW\*(C`sv_set*()\*(C'\fR and \f(CW\*(C`sv_cat*()\*(C'\fR functions described
+earlier do \fBnot\fR invoke 'set' magic on their targets. This must
+be done by the user either by calling the \f(CWSvSETMAGIC()\fR macro after
+calling these functions, or by using one of the \f(CW\*(C`sv_set*_mg()\*(C'\fR or
+\&\f(CW\*(C`sv_cat*_mg()\*(C'\fR functions. Similarly, generic C code must call the
+\&\f(CWSvGETMAGIC()\fR macro to invoke any 'get' magic if they use an SV
+obtained from external sources in functions that don't handle magic.
+See perlapi for a description of these functions.
+For example, calls to the \f(CW\*(C`sv_cat*()\*(C'\fR functions typically need to be
+followed by \f(CWSvSETMAGIC()\fR, but they don't need a prior \f(CWSvGETMAGIC()\fR
+since their implementation handles 'get' magic.
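+.PP
+The two ways of triggering 'set' magic can be sketched as follows, assuming
+\&\f(CW\*(C`sv\*(C'\fR is an SV that may or may not carry magic:
+.PP
+.Vb 6
+\&    /* explicit: modify, then invoke \*(Aqset\*(Aq magic if sv has any */
+\&    sv_setiv(sv, 10);
+\&    SvSETMAGIC(sv);
+\&
+\&    /* combined: the _mg variant performs both steps */
+\&    sv_setiv_mg(sv, 10);
+.Ve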
+.SS "Finding Magic"
+.IX Subsection "Finding Magic"
+.Vb 2
+\&    MAGIC *mg_find(SV *sv, int type); /* Finds the magic pointer of that
+\&                                       * type */
+.Ve
+.PP
+This routine returns a pointer to a \f(CW\*(C`MAGIC\*(C'\fR structure stored in the SV.
+If the SV does not have that magical
+feature, \f(CW\*(C`NULL\*(C'\fR is returned. If the
+SV has multiple instances of that magical feature, the first one will be
+returned. \f(CW\*(C`mg_findext\*(C'\fR can be used
+to find a \f(CW\*(C`MAGIC\*(C'\fR structure of an SV
+based on both its magic type and its magic virtual table:
+.PP
+.Vb 1
+\& MAGIC *mg_findext(SV *sv, int type, MGVTBL *vtbl);
+.Ve
+.PP
+Also, if the SV passed to \f(CW\*(C`mg_find\*(C'\fR or \f(CW\*(C`mg_findext\*(C'\fR is not of type
+SVt_PVMG, Perl may core dump.
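+.PP
+A defensive lookup might therefore check the SV's type first. This is only a
+sketch; \f(CW\*(C`my_vtbl\*(C'\fR stands for a hypothetical vtable owned by the caller:
+.PP
+.Vb 7
+\&    MAGIC *mg = NULL;
+\&
+\&    /* only SVs upgraded to at least SVt_PVMG can hold a MAGIC chain */
+\&    if (SvTYPE(sv) >= SVt_PVMG)
+\&        mg = mg_findext(sv, PERL_MAGIC_ext, &my_vtbl);
+\&    if (mg) {
+\&        /* sv carries our magic */
+\&    }
+.Ve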
+.PP
+.Vb 1
+\& int mg_copy(SV* sv, SV* nsv, const char* key, STRLEN klen);
+.Ve
+.PP
+This routine checks to see what types of magic \f(CW\*(C`sv\*(C'\fR has. If the mg_type
+field is an uppercase letter, then the mg_obj is copied to \f(CW\*(C`nsv\*(C'\fR, but
+the mg_type field is changed to be the lowercase letter.
+.SS "Understanding the Magic of Tied Hashes and Arrays"
+.IX Subsection "Understanding the Magic of Tied Hashes and Arrays"
+Tied hashes and arrays are magical beasts of the \f(CW\*(C`PERL_MAGIC_tied\*(C'\fR
+magic type.
+.PP
+WARNING: As of the 5.004 release, proper usage of the array and hash
+access functions requires understanding a few caveats. Some
+of these caveats are actually considered bugs in the API, to be fixed
+in later releases, and are bracketed with [MAYCHANGE] below. If
+you find yourself actually applying such information in this section, be
+aware that the behavior may change in the future, umm, without warning.
+.PP
+The perl tie function associates a variable with an object that implements
+the various GET, SET, etc methods. To perform the equivalent of the perl
+tie function from an XSUB, you must mimic this behaviour. The code below
+carries out the necessary steps \-\- firstly it creates a new hash, and then
+creates a second hash which it blesses into the class which will implement
+the tie methods. Lastly it ties the two hashes together, and returns a
+reference to the new tied hash. Note that the code below does NOT call the
+TIEHASH method in the MyTie class; see
+"Calling Perl Routines from within C Programs" for details on how
+to do this.
+.PP
+.Vb 10
+\&    SV*
+\&    mytie()
+\&    PREINIT:
+\&        HV *hash;
+\&        HV *stash;
+\&        SV *tie;
+\&    CODE:
+\&        hash = newHV();
+\&        tie = newRV_noinc((SV*)newHV());
+\&        stash = gv_stashpv("MyTie", GV_ADD);
+\&        sv_bless(tie, stash);
+\&        hv_magic(hash, (GV*)tie, PERL_MAGIC_tied);
+\&        RETVAL = newRV_noinc((SV*)hash);
+\&    OUTPUT:
+\&        RETVAL
+.Ve
+.PP
+The \f(CW\*(C`av_store\*(C'\fR function, when given a tied array argument, merely
+copies the magic of the array onto the value to be "stored", using
+\&\f(CW\*(C`mg_copy\*(C'\fR. It may also return NULL, indicating that the value did not
+actually need to be stored in the array. [MAYCHANGE] After a call to
+\&\f(CW\*(C`av_store\*(C'\fR on a tied array, the caller will usually need to call
+\&\f(CWmg_set(val)\fR to actually invoke the perl level "STORE" method on the
+TIEARRAY object. If \f(CW\*(C`av_store\*(C'\fR did return NULL, a call to
+\&\f(CWSvREFCNT_dec(val)\fR will usually also be necessary to avoid a memory
+leak. [/MAYCHANGE]
+.PP
+The previous paragraph is applicable verbatim to tied hash access using the
+\&\f(CW\*(C`hv_store\*(C'\fR and \f(CW\*(C`hv_store_ent\*(C'\fR functions as well.
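+.PP
+Under the caveats above, a store that behaves sensibly for both tied and
+ordinary arrays might be sketched like this (\f(CW\*(C`av\*(C'\fR, the index and the
+stored value are assumed to come from the caller):
+.PP
+.Vb 9
+\&    SV *val = newSViv(42);
+\&    SV **svp = av_store(av, 0, val);
+\&
+\&    /* on a tied array val now carries element magic, so this invokes
+\&     * STORE; on an ordinary array it does nothing */
+\&    SvSETMAGIC(val);
+\&
+\&    /* NULL means the array did not take ownership of val */
+\&    if (svp == NULL)
+\&        SvREFCNT_dec(val);
+.Ve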
+.PP
+\&\f(CW\*(C`av_fetch\*(C'\fR and the corresponding hash functions \f(CW\*(C`hv_fetch\*(C'\fR and
+\&\f(CW\*(C`hv_fetch_ent\*(C'\fR actually return an undefined mortal value whose magic
+has been initialized using \f(CW\*(C`mg_copy\*(C'\fR. Note the value so returned does not
+need to be deallocated, as it is already mortal. [MAYCHANGE] But you will
+need to call \f(CWmg_get()\fR on the returned value in order to actually invoke
+the perl level "FETCH" method on the underlying TIE object. Similarly,
+you may also call \f(CWmg_set()\fR on the return value after possibly assigning
+a suitable value to it using \f(CW\*(C`sv_setsv\*(C'\fR, which will invoke the "STORE"
+method on the TIE object. [/MAYCHANGE]
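+.PP
+The corresponding fetch can be sketched in the same hedged spirit, again
+assuming \f(CW\*(C`av\*(C'\fR is supplied by the caller:
+.PP
+.Vb 9
+\&    SV **svp = av_fetch(av, 0, 0);
+\&
+\&    if (svp) {
+\&        SV *elem = *svp;
+\&        IV n;
+\&
+\&        SvGETMAGIC(elem);        /* invokes FETCH if elem is magical */
+\&        n = SvIV_nomg(elem);     /* read without running magic again */
+\&    }
+.Ve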
+.PP
+[MAYCHANGE]
+In other words, the array or hash fetch/store functions don't really
+fetch and store actual values in the case of tied arrays and hashes. They
+merely call \f(CW\*(C`mg_copy\*(C'\fR to attach magic to the values that were meant to be
+"stored" or "fetched". Later calls to \f(CW\*(C`mg_get\*(C'\fR and \f(CW\*(C`mg_set\*(C'\fR actually
+do the job of invoking the TIE methods on the underlying objects. Thus
+the magic mechanism currently implements a kind of lazy access to arrays
+and hashes.
+.PP
+Currently (as of perl version 5.004), use of the hash and array access
+functions requires the user to be aware of whether they are operating on
+"normal" hashes and arrays, or on their tied variants. The API may be
+changed to provide more transparent access to both tied and normal data
+types in future versions.
+[/MAYCHANGE]
+.PP
+You would do well to understand that the TIEARRAY and TIEHASH interfaces
+are mere sugar to invoke some perl method calls while using the uniform hash
+and array syntax. The use of this sugar imposes some overhead (typically
+about two to four extra opcodes per FETCH/STORE operation, in addition to
+the creation of all the mortal variables required to invoke the methods).
+This overhead will be comparatively small if the TIE methods are themselves
+substantial, but if they are only a few statements long, the overhead
+will not be insignificant.
+.SS "Localizing changes"
+.IX Subsection "Localizing changes"
+Perl has a very handy construction
+.PP
+.Vb 4
+\&    {
+\&      local $var = 2;
+\&      ...
+\&    }
+.Ve
+.PP
+This construction is \fIapproximately\fR equivalent to
+.PP
+.Vb 6
+\&    {
+\&      my $oldvar = $var;
+\&      $var = 2;
+\&      ...
+\&      $var = $oldvar;
+\&    }
+.Ve
+.PP
+The biggest difference is that the first construction would
+reinstate the initial value of \f(CW$var\fR, irrespective of how control exits
+the block: \f(CW\*(C`goto\*(C'\fR, \f(CW\*(C`return\*(C'\fR, \f(CW\*(C`die\*(C'\fR/\f(CW\*(C`eval\*(C'\fR, etc. It is a little bit
+more efficient as well.
+.PP
+There is a way to achieve a similar task from C via the Perl API: create a
+\&\fIpseudo-block\fR, and arrange for some changes to be automatically
+undone at the end of it, either explicitly or via a non-local exit (via
+\&\fBdie()\fR). A \fIblock\fR\-like construct is created by a pair of
+\&\f(CW\*(C`ENTER\*(C'\fR/\f(CW\*(C`LEAVE\*(C'\fR macros (see "Returning a Scalar" in perlcall).
+Such a construct may be created specially for some important localized
+task, or an existing one (like boundaries of enclosing Perl
+subroutine/block, or an existing pair for freeing TMPs) may be
+used. (In the second case the overhead of additional localization must
+be almost negligible.) Note that any XSUB is automatically enclosed in
+an \f(CW\*(C`ENTER\*(C'\fR/\f(CW\*(C`LEAVE\*(C'\fR pair.
+.PP
+Inside such a \fIpseudo-block\fR the following service is available:
+.ie n .IP """SAVEINT(int i)""" 4
+.el .IP "\f(CWSAVEINT(int i)\fR" 4
+.IX Item "SAVEINT(int i)"
+.PD 0
+.ie n .IP """SAVEIV(IV i)""" 4
+.el .IP "\f(CWSAVEIV(IV i)\fR" 4
+.IX Item "SAVEIV(IV i)"
+.ie n .IP """SAVEI32(I32 i)""" 4
+.el .IP "\f(CWSAVEI32(I32 i)\fR" 4
+.IX Item "SAVEI32(I32 i)"
+.ie n .IP """SAVELONG(long i)""" 4
+.el .IP "\f(CWSAVELONG(long i)\fR" 4
+.IX Item "SAVELONG(long i)"
+.ie n .IP """SAVEI8(I8 i)""" 4
+.el .IP "\f(CWSAVEI8(I8 i)\fR" 4
+.IX Item "SAVEI8(I8 i)"
+.ie n .IP """SAVEI16(I16 i)""" 4
+.el .IP "\f(CWSAVEI16(I16 i)\fR" 4
+.IX Item "SAVEI16(I16 i)"
+.ie n .IP """SAVEBOOL(int i)""" 4
+.el .IP "\f(CWSAVEBOOL(int i)\fR" 4
+.IX Item "SAVEBOOL(int i)"
+.ie n .IP """SAVESTRLEN(STRLEN i)""" 4
+.el .IP "\f(CWSAVESTRLEN(STRLEN i)\fR" 4
+.IX Item "SAVESTRLEN(STRLEN i)"
+.PD
+These macros arrange things to restore the value of integer variable
+\&\f(CW\*(C`i\*(C'\fR at the end of the enclosing \fIpseudo-block\fR.
+.ie n .IP SAVESPTR(s) 4
+.el .IP \f(CWSAVESPTR(s)\fR 4
+.IX Item "SAVESPTR(s)"
+.PD 0
+.ie n .IP SAVEPPTR(p) 4
+.el .IP \f(CWSAVEPPTR(p)\fR 4
+.IX Item "SAVEPPTR(p)"
+.PD
+These macros arrange things to restore the value of pointers \f(CW\*(C`s\*(C'\fR and
+\&\f(CW\*(C`p\*(C'\fR. \f(CW\*(C`s\*(C'\fR must be a pointer of a type which survives conversion to
+\&\f(CW\*(C`SV*\*(C'\fR and back, \f(CW\*(C`p\*(C'\fR should be able to survive conversion to \f(CW\*(C`char*\*(C'\fR
+and back.
+.ie n .IP """SAVERCPV(char **ppv)""" 4
+.el .IP "\f(CWSAVERCPV(char **ppv)\fR" 4
+.IX Item "SAVERCPV(char **ppv)"
+This macro arranges to restore the value of a \f(CW\*(C`char *\*(C'\fR variable which
+was allocated with a call to \f(CWrcpv_new()\fR to its previous state when
+the current pseudo block is completed. The pointer stored in \f(CW*ppv\fR at
+the time of the call will be refcount incremented and stored on the save
+stack. Later when the current \fIpseudo-block\fR is completed the value
+stored in \f(CW*ppv\fR will be refcount decremented, and the previous value
+restored from the savestack which will also be refcount decremented.
+.Sp
+This is the \f(CW\*(C`RCPV\*(C'\fR equivalent of \f(CWSAVEGENERICSV()\fR.
+.ie n .IP """SAVEGENERICSV(SV **psv)""" 4
+.el .IP "\f(CWSAVEGENERICSV(SV **psv)\fR" 4
+.IX Item "SAVEGENERICSV(SV **psv)"
+This macro arranges to restore the value of a \f(CW\*(C`SV *\*(C'\fR variable to its
+previous state when the current pseudo block is completed. The pointer
+stored in \f(CW*psv\fR at the time of the call will be refcount incremented
+and stored on the save stack. Later when the current \fIpseudo-block\fR is
+completed the value stored in \f(CW*psv\fR will be refcount decremented, and
+the previous value restored from the savestack which will also be refcount
+decremented. This is the C equivalent of \f(CW\*(C`local $sv\*(C'\fR.
+.ie n .IP """SAVEFREESV(SV *sv)""" 4
+.el .IP "\f(CWSAVEFREESV(SV *sv)\fR" 4
+.IX Item "SAVEFREESV(SV *sv)"
+The refcount of \f(CW\*(C`sv\*(C'\fR will be decremented at the end of
+\&\fIpseudo-block\fR. This is similar to \f(CW\*(C`sv_2mortal\*(C'\fR in that it is also a
+mechanism for doing a delayed \f(CW\*(C`SvREFCNT_dec\*(C'\fR. However, while \f(CW\*(C`sv_2mortal\*(C'\fR
+extends the lifetime of \f(CW\*(C`sv\*(C'\fR until the beginning of the next statement,
+\&\f(CW\*(C`SAVEFREESV\*(C'\fR extends it until the end of the enclosing scope. These
+lifetimes can be wildly different.
+.Sp
+Also compare \f(CW\*(C`SAVEMORTALIZESV\*(C'\fR.
+.ie n .IP """SAVEMORTALIZESV(SV *sv)""" 4
+.el .IP "\f(CWSAVEMORTALIZESV(SV *sv)\fR" 4
+.IX Item "SAVEMORTALIZESV(SV *sv)"
+Just like \f(CW\*(C`SAVEFREESV\*(C'\fR, but mortalizes \f(CW\*(C`sv\*(C'\fR at the end of the current
+scope instead of decrementing its reference count. This usually has the
+effect of keeping \f(CW\*(C`sv\*(C'\fR alive until the statement that called the currently
+live scope has finished executing.
+.ie n .IP """SAVEFREEOP(OP *op)""" 4
+.el .IP "\f(CWSAVEFREEOP(OP *op)\fR" 4
+.IX Item "SAVEFREEOP(OP *op)"
+The \f(CW\*(C`OP *\*(C'\fR is \f(CWop_free()\fRed at the end of \fIpseudo-block\fR.
+.ie n .IP SAVEFREEPV(p) 4
+.el .IP \f(CWSAVEFREEPV(p)\fR 4
+.IX Item "SAVEFREEPV(p)"
+The chunk of memory which is pointed to by \f(CW\*(C`p\*(C'\fR is \f(CWSafefree()\fRed at the
+end of the current \fIpseudo-block\fR.
+.ie n .IP """SAVEFREERCPV(char *pv)""" 4
+.el .IP "\f(CWSAVEFREERCPV(char *pv)\fR" 4
+.IX Item "SAVEFREERCPV(char *pv)"
+Ensures that a \f(CW\*(C`char *\*(C'\fR which was created by a call to \f(CWrcpv_new()\fR is
+\&\f(CWrcpv_free()\fRed at the end of the current \fIpseudo-block\fR.
+.Sp
+This is the RCPV equivalent of \f(CWSAVEFREESV()\fR.
+.ie n .IP """SAVECLEARSV(SV *sv)""" 4
+.el .IP "\f(CWSAVECLEARSV(SV *sv)\fR" 4
+.IX Item "SAVECLEARSV(SV *sv)"
+Clears a slot in the current scratchpad which corresponds to \f(CW\*(C`sv\*(C'\fR at
+the end of \fIpseudo-block\fR.
+.ie n .IP """SAVEDELETE(HV *hv, char *key, I32 length)""" 4
+.el .IP "\f(CWSAVEDELETE(HV *hv, char *key, I32 length)\fR" 4
+.IX Item "SAVEDELETE(HV *hv, char *key, I32 length)"
+The key \f(CW\*(C`key\*(C'\fR of \f(CW\*(C`hv\*(C'\fR is deleted at the end of \fIpseudo-block\fR. The
+string pointed to by \f(CW\*(C`key\*(C'\fR is \fBSafefree()\fRed. If one has a \fIkey\fR in
+short-lived storage, the corresponding string may be reallocated like
+this:
+.Sp
+.Vb 1
+\& SAVEDELETE(PL_defstash, savepv(tmpbuf), strlen(tmpbuf));
+.Ve
+.ie n .IP """SAVEDESTRUCTOR(DESTRUCTORFUNC_NOCONTEXT_t f, void *p)""" 4
+.el .IP "\f(CWSAVEDESTRUCTOR(DESTRUCTORFUNC_NOCONTEXT_t f, void *p)\fR" 4
+.IX Item "SAVEDESTRUCTOR(DESTRUCTORFUNC_NOCONTEXT_t f, void *p)"
+At the end of \fIpseudo-block\fR the function \f(CW\*(C`f\*(C'\fR is called with the
+only argument \f(CW\*(C`p\*(C'\fR which may be NULL.
+.ie n .IP """SAVEDESTRUCTOR_X(DESTRUCTORFUNC_t f, void *p)""" 4
+.el .IP "\f(CWSAVEDESTRUCTOR_X(DESTRUCTORFUNC_t f, void *p)\fR" 4
+.IX Item "SAVEDESTRUCTOR_X(DESTRUCTORFUNC_t f, void *p)"
+At the end of \fIpseudo-block\fR the function \f(CW\*(C`f\*(C'\fR is called with the
+implicit context argument (if any), and \f(CW\*(C`p\*(C'\fR which may be NULL.
+.Sp
+Note the \fIend of the current pseudo-block\fR may occur much later than
+the \fIend of the current statement\fR. You may wish to look at the
+\&\f(CWMORTALDESTRUCTOR_X()\fR macro instead.
+.ie n .IP """MORTALSVFUNC_X(SVFUNC_t f, SV *sv)""" 4
+.el .IP "\f(CWMORTALSVFUNC_X(SVFUNC_t f, SV *sv)\fR" 4
+.IX Item "MORTALSVFUNC_X(SVFUNC_t f, SV *sv)"
+At the end of \fIthe current statement\fR the function \f(CW\*(C`f\*(C'\fR is called with
+the implicit context argument (if any), and \f(CW\*(C`sv\*(C'\fR which may be NULL.
+.Sp
+Be aware that the parameter argument to the destructor function differs
+from the related \f(CWSAVEDESTRUCTOR_X()\fR in that it MUST be either NULL or
+an \f(CW\*(C`SV*\*(C'\fR.
+.Sp
+Note the \fIend of the current statement\fR may occur much earlier than
+the \fIend of the current pseudo-block\fR. You may wish to look at the
+\&\f(CWSAVEDESTRUCTOR_X()\fR macro instead.
+.ie n .IP """MORTALDESTRUCTOR_SV(SV *coderef, SV *args)""" 4
+.el .IP "\f(CWMORTALDESTRUCTOR_SV(SV *coderef, SV *args)\fR" 4
+.IX Item "MORTALDESTRUCTOR_SV(SV *coderef, SV *args)"
+At the end of \fIthe current statement\fR the Perl function contained in
+\&\f(CW\*(C`coderef\*(C'\fR is called with the arguments provided (if any) in \f(CW\*(C`args\*(C'\fR.
+See the documentation for \f(CWmortal_destructor_sv()\fR for details on
+how the \f(CW\*(C`args\*(C'\fR parameter is handled.
+.Sp
+Note the \fIend of the current statement\fR may occur much earlier than
+the \fIend of the current pseudo-block\fR. If you wish to call a perl
+function at the end of the current pseudo block you should use the
+\&\f(CWSAVEDESTRUCTOR_X()\fR API instead, which will require you create a
+C wrapper to call the Perl function.
+.ie n .IP SAVESTACK_POS() 4
+.el .IP \f(CWSAVESTACK_POS()\fR 4
+.IX Item "SAVESTACK_POS()"
+The current offset on the Perl internal stack (cf. \f(CW\*(C`SP\*(C'\fR) is restored
+at the end of \fIpseudo-block\fR.
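+.PP
+As an illustration of how several of the macros above combine, the following
+sketch sets up its own pseudo-block. The names \f(CW\*(C`my_depth\*(C'\fR, \f(CW\*(C`my_cleanup\*(C'\fR and
+\&\f(CW\*(C`my_work\*(C'\fR are hypothetical, and the work in the middle is left out:
+.PP
+.Vb 10
+\&    static int my_depth = 0;            /* hypothetical global state */
+\&
+\&    static void
+\&    my_cleanup(pTHX_ void *p)
+\&    {
+\&        PERL_UNUSED_ARG(p);
+\&        /* runs when the pseudo\-block is left, even via die() */
+\&    }
+\&
+\&    static void
+\&    my_work(pTHX)
+\&    {
+\&        char *buf;
+\&
+\&        ENTER;
+\&        SAVEINT(my_depth);              /* old value restored at LEAVE */
+\&        my_depth++;
+\&        Newx(buf, 64, char);
+\&        SAVEFREEPV(buf);                /* Safefree()d at LEAVE */
+\&        SAVEDESTRUCTOR_X(my_cleanup, NULL);
+\&        /* ... work that may die() goes here ... */
+\&        LEAVE;
+\&    }
+.Ve
+.PP
+Each save is registered before the value it protects is changed or used, so
+an exception raised during the work still unwinds all three.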
+.PP
+The following API list contains functions, thus one needs to
+provide pointers to the modifiable data explicitly (either C pointers,
+or Perlish \f(CW\*(C`GV *\*(C'\fRs). Where the above macros take \f(CW\*(C`int\*(C'\fR, a similar
+function takes \f(CW\*(C`int *\*(C'\fR.
+.PP
+Other macros above have functions implementing them, but it's probably
+best to just use the macro, and not those or the ones below.
+.ie n .IP """SV* save_scalar(GV *gv)""" 4
+.el .IP "\f(CWSV* save_scalar(GV *gv)\fR" 4
+.IX Item "SV* save_scalar(GV *gv)"
+Equivalent to Perl code \f(CW\*(C`local $gv\*(C'\fR.
+.ie n .IP """AV* save_ary(GV *gv)""" 4
+.el .IP "\f(CWAV* save_ary(GV *gv)\fR" 4
+.IX Item "AV* save_ary(GV *gv)"
+.PD 0
+.ie n .IP """HV* save_hash(GV *gv)""" 4
+.el .IP "\f(CWHV* save_hash(GV *gv)\fR" 4
+.IX Item "HV* save_hash(GV *gv)"
+.PD
+Similar to \f(CW\*(C`save_scalar\*(C'\fR, but localize \f(CW@gv\fR and \f(CW%gv\fR.
+.ie n .IP """void save_item(SV *item)""" 4
+.el .IP "\f(CWvoid save_item(SV *item)\fR" 4
+.IX Item "void save_item(SV *item)"
+Duplicates the current value of \f(CW\*(C`SV\*(C'\fR. On the exit from the current
+\&\f(CW\*(C`ENTER\*(C'\fR/\f(CW\*(C`LEAVE\*(C'\fR \fIpseudo-block\fR the value of \f(CW\*(C`SV\*(C'\fR will be restored
+using the stored value. It doesn't handle magic. Use \f(CW\*(C`save_scalar\*(C'\fR if
+magic is affected.
+.ie n .IP """SV* save_svref(SV **sptr)""" 4
+.el .IP "\f(CWSV* save_svref(SV **sptr)\fR" 4
+.IX Item "SV* save_svref(SV **sptr)"
+Similar to \f(CW\*(C`save_scalar\*(C'\fR, but will reinstate an \f(CW\*(C`SV *\*(C'\fR.
+.ie n .IP """void save_aptr(AV **aptr)""" 4
+.el .IP "\f(CWvoid save_aptr(AV **aptr)\fR" 4
+.IX Item "void save_aptr(AV **aptr)"
+.PD 0
+.ie n .IP """void save_hptr(HV **hptr)""" 4
+.el .IP "\f(CWvoid save_hptr(HV **hptr)\fR" 4
+.IX Item "void save_hptr(HV **hptr)"
+.PD
+Similar to \f(CW\*(C`save_svref\*(C'\fR, but localize \f(CW\*(C`AV *\*(C'\fR and \f(CW\*(C`HV *\*(C'\fR.
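+.PP
+Bearing in mind the advice above to prefer the macros where one exists, a
+call to \f(CW\*(C`save_scalar\*(C'\fR might be sketched as follows. The variable name
+\&\f(CW$main::verbose\fR is a made-up example:
+.PP
+.Vb 8
+\&    GV *gv = gv_fetchpvs("main::verbose", GV_ADD, SVt_PV);
+\&    SV *sv;
+\&
+\&    ENTER;
+\&    sv = save_scalar(gv);     /* like: local $main::verbose; */
+\&    sv_setiv(sv, 1);          /* like: $main::verbose = 1;   */
+\&    /* ... call code that reads $main::verbose ... */
+\&    LEAVE;                    /* the previous value is back in place */
+.Ve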
+.PP
+The \f(CW\*(C`Alias\*(C'\fR module implements localization of the basic types within the
+\&\fIcaller's scope\fR. People who are interested in how to localize things in
+the containing scope should take a look there too.
+.SH Subroutines
+.IX Header "Subroutines"
+.SS "XSUBs and the Argument Stack"
+.IX Subsection "XSUBs and the Argument Stack"
+The XSUB mechanism is a simple way for Perl programs to access C subroutines.
+An XSUB routine will have a stack that contains the arguments from the Perl
+program, and a way to map from the Perl data structures to a C equivalent.
+.PP
+The stack arguments are accessible through the \f(CWST(n)\fR macro, which returns
+the \f(CW\*(C`n\*(C'\fR'th stack argument. Argument 0 is the first argument passed in the
+Perl subroutine call. These arguments are \f(CW\*(C`SV*\*(C'\fR, and can be used anywhere
+an \f(CW\*(C`SV*\*(C'\fR is used.
+.PP
+Most of the time, output from the C routine can be handled through use of
+the RETVAL and OUTPUT directives. However, there are some cases where the
+argument stack is not already long enough to handle all the return values.
+An example is the POSIX \fBtzname()\fR call, which takes no arguments, but returns
+two, the local time zone's standard and summer time abbreviations.
+.PP
+To handle this situation, the PPCODE directive is used and the stack is
+extended using the macro:
+.PP
+.Vb 1
+\& EXTEND(SP, num);
+.Ve
+.PP
+where \f(CW\*(C`SP\*(C'\fR is the macro that represents the local copy of the stack pointer,
+and \f(CW\*(C`num\*(C'\fR is the number of elements the stack should be extended by.
+.PP
+Now that there is room on the stack, values can be pushed on it using the
+\&\f(CW\*(C`PUSHs\*(C'\fR macro. The pushed values will often need to be "mortal" (see
+"Reference Counts and Mortality"):
+.PP
+.Vb 7
+\& PUSHs(sv_2mortal(newSViv(an_integer)))
+\& PUSHs(sv_2mortal(newSVuv(an_unsigned_integer)))
+\& PUSHs(sv_2mortal(newSVnv(a_double)))
+\& PUSHs(sv_2mortal(newSVpv("Some String",0)))
+\& /* Although the last example is better written as the more
+\& * efficient: */
+\& PUSHs(newSVpvs_flags("Some String", SVs_TEMP))
+.Ve
+.PP
+Now, in the Perl program calling \f(CW\*(C`tzname\*(C'\fR, the two values can be
+assigned as in:
+.PP
+.Vb 1
+\& ($standard_abbrev, $summer_abbrev) = POSIX::tzname;
+.Ve
+.PP
+An alternative (and possibly simpler) method of pushing values on the stack is
+to use the macro:
+.PP
+.Vb 1
+\& XPUSHs(SV*)
+.Ve
+.PP
+This macro automatically adjusts the stack for you, if needed. Thus, you
+do not need to call \f(CW\*(C`EXTEND\*(C'\fR to extend the stack.
+.PP
+Despite suggestions in earlier versions of this document, the macros
+\&\f(CW\*(C`(X)PUSH[iunp]\*(C'\fR are \fInot\fR suited to XSUBs which return multiple results.
+For that, either stick to the \f(CW\*(C`(X)PUSHs\*(C'\fR macros shown above, or use the new
+\&\f(CW\*(C`m(X)PUSH[iunp]\*(C'\fR macros instead; see "Putting a C value on Perl stack".
+.PP
+For more information, consult perlxs and perlxstut.
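+.PP
+Putting the pieces above together, an XSUB that returns two values might be
+sketched like this. The function name \f(CW\*(C`pair\*(C'\fR and the values it returns are
+illustrative only:
+.PP
+.Vb 6
+\&    void
+\&    pair()
+\&    PPCODE:
+\&        EXTEND(SP, 2);                        /* room for two results */
+\&        PUSHs(sv_2mortal(newSViv(1)));
+\&        PUSHs(sv_2mortal(newSVpvs("one")));
+.Ve
+.PP
+From Perl, assuming the XSUB is installed in a hypothetical package
+\&\f(CW\*(C`My::Module\*(C'\fR, the results would be collected with
+\&\f(CW\*(C`my ($n, $name) = My::Module::pair();\*(C'\fR.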
+.SS "Autoloading with XSUBs"
+.IX Subsection "Autoloading with XSUBs"
+If an AUTOLOAD routine is an XSUB, as with Perl subroutines, Perl puts the
+fully-qualified name of the autoloaded subroutine in the \f(CW$AUTOLOAD\fR variable
+of the XSUB's package.
+.PP
+But it also puts the same information in certain fields of the XSUB itself:
+.PP
+.Vb 4
+\& HV *stash = CvSTASH(cv);
+\& const char *subname = SvPVX(cv);
+\& STRLEN name_length = SvCUR(cv); /* in bytes */
+\& U32 is_utf8 = SvUTF8(cv);
+.Ve
+.PP
+\&\f(CWSvPVX(cv)\fR contains just the sub name itself, not including the package.
+For an AUTOLOAD routine in UNIVERSAL or one of its superclasses,
+\&\f(CWCvSTASH(cv)\fR returns NULL during a method call on a nonexistent package.
+.PP
+\&\fBNote\fR: Setting \f(CW$AUTOLOAD\fR stopped working in 5.6.1, which did not support
+XS AUTOLOAD subs at all. Perl 5.8.0 introduced the use of fields in the
+XSUB itself. Perl 5.16.0 restored the setting of \f(CW$AUTOLOAD\fR. If you need
+to support 5.8\-5.14, use the XSUB's fields.
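+.PP
+A minimal sketch of an XS AUTOLOAD that reads the name from the XSUB's own
+fields follows. What it does with the name (here, simply croaking) is an
+assumption made for illustration:
+.PP
+.Vb 8
+\&    void
+\&    AUTOLOAD(...)
+\&    PREINIT:
+\&        HV *stash = CvSTASH(cv);
+\&        const char *pkg = stash ? HvNAME(stash) : NULL;
+\&    PPCODE:
+\&        croak("Undefined subroutine %s::%s called",
+\&              pkg ? pkg : "(unknown)", SvPVX(cv));
+.Ve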
+.SS "Calling Perl Routines from within C Programs"
+.IX Subsection "Calling Perl Routines from within C Programs"
+There are four routines that can be used to call a Perl subroutine from
+within a C program. These four are:
+.PP
+.Vb 4
+\& I32 call_sv(SV*, I32);
+\& I32 call_pv(const char*, I32);
+\& I32 call_method(const char*, I32);
+\& I32 call_argv(const char*, I32, char**);
+.Ve
+.PP
+The routine most often used is \f(CW\*(C`call_sv\*(C'\fR. The \f(CW\*(C`SV*\*(C'\fR argument
+contains either the name of the Perl subroutine to be called, or a
+reference to the subroutine. The second argument consists of flags
+that control the context in which the subroutine is called, whether
+or not the subroutine is being passed arguments, how errors should be
+trapped, and how to treat return values.
+.PP
+All four routines return the number of values that the subroutine returned
+on the Perl stack.
+.PP
+These routines used to be called \f(CW\*(C`perl_call_sv\*(C'\fR, etc., before Perl v5.6.0,
+but those names are now deprecated; macros of the same name are provided for
+compatibility.
+.PP
+When using any of these routines (except \f(CW\*(C`call_argv\*(C'\fR), the programmer
+must manipulate the Perl stack. The tools for doing so include the following
+macros and functions:
+.PP
+.Vb 11
+\& dSP
+\& SP
+\& PUSHMARK()
+\& PUTBACK
+\& SPAGAIN
+\& ENTER
+\& SAVETMPS
+\& FREETMPS
+\& LEAVE
+\& XPUSH*()
+\& POP*()
+.Ve
+.PP
+For a detailed description of calling conventions from C to Perl,
+consult perlcall.
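+.PP
+As a rough illustration of how these macros fit together, the following
+sketch calls a Perl subroutine with two integer arguments and fetches a
+single integer result. The Perl sub \f(CW\*(C`main::add\*(C'\fR and the C helper name
+\&\f(CW\*(C`call_add\*(C'\fR are hypothetical; only the macros and \f(CW\*(C`call_pv\*(C'\fR come from the
+Perl API, and perlcall remains the authoritative reference.
+.PP
+.Vb 3
+\&    #include "EXTERN.h"
+\&    #include "perl.h"
+\&    #include "XSUB.h"
+\&
+\&    /* Hypothetical helper: returns main::add(a, b) as an IV. */
+\&    static IV
+\&    call_add(pTHX_ IV a, IV b)
+\&    {
+\&        dSP;                       /* local copy of the stack pointer */
+\&        int count;
+\&        IV result;
+\&
+\&        ENTER;                     /* open a scope for temporaries */
+\&        SAVETMPS;
+\&
+\&        PUSHMARK(SP);              /* remember where the arguments start */
+\&        XPUSHs(sv_2mortal(newSViv(a)));
+\&        XPUSHs(sv_2mortal(newSViv(b)));
+\&        PUTBACK;                   /* publish the local stack pointer */
+\&
+\&        count = (int)call_pv("main::add", G_SCALAR);
+\&
+\&        SPAGAIN;                   /* refresh the local stack pointer */
+\&        if (count != 1)
+\&            croak("main::add returned %d values, expected 1", count);
+\&        result = POPi;
+\&        PUTBACK;
+\&
+\&        FREETMPS;                  /* release the mortals created above */
+\&        LEAVE;
+\&        return result;
+\&    }
+.Ve
+.PP
+The \f(CW\*(C`ENTER\*(C'\fR/\f(CW\*(C`SAVETMPS\*(C'\fR and \f(CW\*(C`FREETMPS\*(C'\fR/\f(CW\*(C`LEAVE\*(C'\fR pairs bracket the call so
+that the mortal argument SVs are freed once the result has been copied out.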
+.SS "Putting a C value on Perl stack"
+.IX Subsection "Putting a C value on Perl stack"
+A lot of opcodes (an opcode is an elementary operation in the internal perl
+stack machine) put an SV* on the stack. However, as an optimization
+the corresponding SV is (usually) not recreated each time. The opcodes
+reuse specially assigned SVs (\fItarget\fRs) which are (as a corollary)
+not constantly freed/created.
+.PP
+Each of the targets is created only once (but see
+"Scratchpads and recursion" below), and when an opcode needs to put
+an integer, a double, or a string on the stack, it just sets the
+corresponding parts of its \fItarget\fR and puts the \fItarget\fR on the stack.
+.PP
+The macro to put this target on the stack is \f(CW\*(C`PUSHTARG\*(C'\fR, and it is
+directly used in some opcodes, as well as indirectly in zillions of
+others, which use it via \f(CW\*(C`(X)PUSH[iunp]\*(C'\fR.
+.PP
+Because the target is reused, you must be careful when pushing multiple
+values on the stack. The following code will not do what you think:
+.PP
+.Vb 2
+\& XPUSHi(10);
+\& XPUSHi(20);
+.Ve
+.PP
+This translates as "set \f(CW\*(C`TARG\*(C'\fR to 10, push a pointer to \f(CW\*(C`TARG\*(C'\fR onto
+the stack; set \f(CW\*(C`TARG\*(C'\fR to 20, push a pointer to \f(CW\*(C`TARG\*(C'\fR onto the stack".
+At the end of the operation, the stack does not contain the values 10
+and 20, but actually contains two pointers to \f(CW\*(C`TARG\*(C'\fR, which we have set
+to 20.
+.PP
+If you need to push multiple different values then you should either use
+the \f(CW\*(C`(X)PUSHs\*(C'\fR macros, or else use the new \f(CW\*(C`m(X)PUSH[iunp]\*(C'\fR macros,
+none of which make use of \f(CW\*(C`TARG\*(C'\fR. The \f(CW\*(C`(X)PUSHs\*(C'\fR macros simply push an
+SV* on the stack, which, as noted under "XSUBs and the Argument Stack",
+will often need to be "mortal". The new \f(CW\*(C`m(X)PUSH[iunp]\*(C'\fR macros make
+this a little easier to achieve by creating a new mortal for you (via
+\&\f(CW\*(C`(X)PUSHmortal\*(C'\fR), pushing that onto the stack (extending it if necessary
+in the case of the \f(CW\*(C`mXPUSH[iunp]\*(C'\fR macros), and then setting its value.
+Thus, instead of writing this to "fix" the example above:
+.PP
+.Vb 2
+\& XPUSHs(sv_2mortal(newSViv(10)));
+\& XPUSHs(sv_2mortal(newSViv(20)));
+.Ve
+.PP
+you can simply write:
+.PP
+.Vb 2
+\& mXPUSHi(10);
+\& mXPUSHi(20);
+.Ve
+.PP
+On a related note, if you do use \f(CW\*(C`(X)PUSH[iunp]\*(C'\fR, then you're going to
+need a \f(CW\*(C`dTARG\*(C'\fR in your variable declarations so that the \f(CW\*(C`*PUSH*\*(C'\fR
+macros can make use of the local variable \f(CW\*(C`TARG\*(C'\fR. See also
+\&\f(CW\*(C`dTARGET\*(C'\fR and \f(CW\*(C`dXSTARG\*(C'\fR.
+.SS Scratchpads
+.IX Subsection "Scratchpads"
+The question remains of when the SVs which are \fItarget\fRs for opcodes
+are created. The answer is that they are created when the current
+unit\-\-a subroutine or a file (for opcodes for statements outside of
+subroutines)\-\-is compiled. During this time a special anonymous Perl
+array is created, which is called a scratchpad for the current unit.
+.PP
+A scratchpad keeps SVs which are lexicals for the current unit and are
+targets for opcodes. A previous version of this document
+stated that one can deduce that an SV lives on a scratchpad
+by looking at its flags: lexicals have \f(CW\*(C`SVs_PADMY\*(C'\fR set, and
+\&\fItarget\fRs have \f(CW\*(C`SVs_PADTMP\*(C'\fR set. But this has never been fully true.
+\&\f(CW\*(C`SVs_PADMY\*(C'\fR could be set on a variable that no longer resides in any pad.
+While \fItarget\fRs do have \f(CW\*(C`SVs_PADTMP\*(C'\fR set, it can also be set on variables
+that have never resided in a pad, but nonetheless act like \fItarget\fRs. As
+of perl 5.21.5, the \f(CW\*(C`SVs_PADMY\*(C'\fR flag is no longer used and is defined as
+0. \f(CWSvPADMY()\fR now returns true for anything without \f(CW\*(C`SVs_PADTMP\*(C'\fR.
+.PP
+The correspondence between OPs and \fItarget\fRs is not 1\-to\-1. Different
+OPs in the compile tree of the unit can use the same target, if this
+would not conflict with the expected life of the temporary.
+.SS "Scratchpads and recursion"
+.IX Subsection "Scratchpads and recursion"
+Strictly speaking, a compiled unit does not contain a pointer to the
+scratchpad AV itself. Instead it contains a pointer to an AV of
+(initially) one element, and this element is the scratchpad AV. Why do
+we need an extra level of indirection?
+.PP
+The answer is \fBrecursion\fR, and maybe \fBthreads\fR. Both of
+these can create several execution pointers going into the same
+subroutine. For the subroutine-child not to write over the temporaries
+for the subroutine-parent (the lifespan of which covers the call to the
+child), the parent and the child should have different
+scratchpads. (\fIAnd\fR the lexicals should be separate anyway!)
+.PP
+So each subroutine is born with an array of scratchpads (of length 1).
+On each entry to the subroutine it is checked that the current
+depth of the recursion is not more than the length of this array, and
+if it is, a new scratchpad is created and pushed into the array.
+.PP
+The \fItarget\fRs on this scratchpad are \f(CW\*(C`undef\*(C'\fRs, but they are already
+marked with correct flags.
+.SH "Memory Allocation"
+.IX Header "Memory Allocation"
+.SS Allocation
+.IX Subsection "Allocation"
+All memory meant to be used with the Perl API functions should be manipulated
+using the macros described in this section. The macros provide the necessary
+transparency between differences in the actual malloc implementation that is
+used within perl.
+.PP
+The following three macros are used to initially allocate memory:
+.PP
+.Vb 3
+\& Newx(pointer, number, type);
+\& Newxc(pointer, number, type, cast);
+\& Newxz(pointer, number, type);
+.Ve
+.PP
+The first argument \f(CW\*(C`pointer\*(C'\fR should be the name of a variable that will
+point to the newly allocated memory.
+.PP
+The second and third arguments \f(CW\*(C`number\*(C'\fR and \f(CW\*(C`type\*(C'\fR specify how many of
+the specified type of data structure should be allocated. The argument
+\&\f(CW\*(C`type\*(C'\fR is passed to \f(CW\*(C`sizeof\*(C'\fR. The final argument to \f(CW\*(C`Newxc\*(C'\fR, \f(CW\*(C`cast\*(C'\fR,
+should be used if the \f(CW\*(C`pointer\*(C'\fR argument is different from the \f(CW\*(C`type\*(C'\fR
+argument.
+.PP
+Unlike the \f(CW\*(C`Newx\*(C'\fR and \f(CW\*(C`Newxc\*(C'\fR macros, the \f(CW\*(C`Newxz\*(C'\fR macro calls \f(CW\*(C`memzero\*(C'\fR
+to zero out all the newly allocated memory.
+.SS Reallocation
+.IX Subsection "Reallocation"
+.Vb 3
+\& Renew(pointer, number, type);
+\& Renewc(pointer, number, type, cast);
+\& Safefree(pointer);
+.Ve
+.PP
+These three macros are used to change a memory buffer size or to free a
+piece of memory no longer needed. The arguments to \f(CW\*(C`Renew\*(C'\fR and \f(CW\*(C`Renewc\*(C'\fR
+match those of \f(CW\*(C`Newx\*(C'\fR and \f(CW\*(C`Newxc\*(C'\fR. (The obsolete \f(CW\*(C`New\*(C'\fR and \f(CW\*(C`Newc\*(C'\fR
+macros took an additional "magic cookie" argument, which none of these needs.)
+.SS Moving
+.IX Subsection "Moving"
+.Vb 3
+\& Move(source, dest, number, type);
+\& Copy(source, dest, number, type);
+\& Zero(dest, number, type);
+.Ve
+.PP
+These three macros are used to move, copy, or zero out previously allocated
+memory. The \f(CW\*(C`source\*(C'\fR and \f(CW\*(C`dest\*(C'\fR arguments point to the source and
+destination starting points. Perl will move, copy, or zero out \f(CW\*(C`number\*(C'\fR
+instances of the size of the \f(CW\*(C`type\*(C'\fR data structure (using the \f(CW\*(C`sizeof\*(C'\fR
+operator).
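+.PP
+The allocation, reallocation and moving macros are typically used together.
+The sketch below is only an illustration (the function name
+\&\f(CW\*(C`demo_buffers\*(C'\fR and the sizes are made up); it assumes the usual Perl
+headers are available.
+.PP
+.Vb 3
+\&    #include "EXTERN.h"
+\&    #include "perl.h"
+\&    #include "XSUB.h"
+\&
+\&    static void
+\&    demo_buffers(void)
+\&    {
+\&        IV *buf;
+\&        IV *copy;
+\&
+\&        Newxz(buf, 4, IV);          /* four IVs, all zeroed */
+\&        buf[0] = 42;
+\&
+\&        Renew(buf, 8, IV);          /* grow to eight; the new tail is uninitialised */
+\&        Zero(buf + 4, 4, IV);       /* so clear elements 4..7 explicitly */
+\&
+\&        Newx(copy, 8, IV);
+\&        Copy(buf, copy, 8, IV);     /* distinct buffers: Copy is enough */
+\&        Move(buf, buf + 1, 7, IV);  /* overlapping regions: use Move */
+\&
+\&        Safefree(copy);
+\&        Safefree(buf);
+\&    }
+.Ve
+.PP
+Note the argument order of \f(CW\*(C`Move\*(C'\fR and \f(CW\*(C`Copy\*(C'\fR: source first, then
+destination, which is the reverse of the C library's \f(CW\*(C`memmove\*(C'\fR and \f(CW\*(C`memcpy\*(C'\fR.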
+.SH PerlIO
+.IX Header "PerlIO"
+The most recent development releases of Perl have been experimenting with
+removing Perl's dependency on the "normal" standard I/O suite and allowing
+other stdio implementations to be used. This involves creating a new
+abstraction layer that then calls whichever implementation of stdio Perl
+was compiled with. All XSUBs should now use the functions in the PerlIO
+abstraction layer and not make any assumptions about what kind of stdio
+is being used.
+.PP
+For a complete description of the PerlIO abstraction, consult perlapio.
+.SH "Compiled code"
+.IX Header "Compiled code"
+.SS "Code tree"
+.IX Subsection "Code tree"
+Here we describe the internal form your code is converted to by
+Perl. Start with a simple example:
+.PP
+.Vb 1
+\& $a = $b + $c;
+.Ve
+.PP
+This is converted to a tree similar to this one:
+.PP
+.Vb 5
+\& assign\-to
+\& / \e
+\& + $a
+\& / \e
+\& $b $c
+.Ve
+.PP
+(but slightly more complicated). This tree reflects the way Perl
+parsed your code, but has nothing to do with the execution order.
+There is an additional "thread" going through the nodes of the tree
+which shows the order of execution of the nodes. In our simplified
+example above it looks like:
+.PP
+.Vb 1
+\& $b \-\-\-> $c \-\-\-> + \-\-\-> $a \-\-\-> assign\-to
+.Ve
+.PP
+But with the actual compile tree for \f(CW\*(C`$a = $b + $c\*(C'\fR it is different:
+some nodes are \fIoptimized away\fR. As a corollary, though the actual tree
+contains more nodes than our simplified example, the execution order
+is the same as in our example.
+.SS "Examining the tree"
+.IX Subsection "Examining the tree"
+If you have your perl compiled for debugging (usually done with
+\&\f(CW\*(C`\-DDEBUGGING\*(C'\fR on the \f(CW\*(C`Configure\*(C'\fR command line), you may examine the
+compiled tree by specifying \f(CW\*(C`\-Dx\*(C'\fR on the Perl command line. The
+output takes several lines per node, and for \f(CW\*(C`$b+$c\*(C'\fR it looks like
+this:
+.PP
+.Vb 10
+\& 5 TYPE = add ===> 6
+\& TARG = 1
+\& FLAGS = (SCALAR,KIDS)
+\& {
+\& TYPE = null ===> (4)
+\& (was rv2sv)
+\& FLAGS = (SCALAR,KIDS)
+\& {
+\& 3 TYPE = gvsv ===> 4
+\& FLAGS = (SCALAR)
+\& GV = main::b
+\& }
+\& }
+\& {
+\& TYPE = null ===> (5)
+\& (was rv2sv)
+\& FLAGS = (SCALAR,KIDS)
+\& {
+\& 4 TYPE = gvsv ===> 5
+\& FLAGS = (SCALAR)
+\& GV = main::c
+\& }
+\& }
+.Ve
+.PP
+This tree has 5 nodes (one per \f(CW\*(C`TYPE\*(C'\fR specifier), of which only 3 are
+not optimized away (one per number in the left column). The immediate
+children of the given node correspond to \f(CW\*(C`{}\*(C'\fR pairs on the same level
+of indentation, thus this listing corresponds to the tree:
+.PP
+.Vb 5
+\& add
+\& / \e
+\& null null
+\& | |
+\& gvsv gvsv
+.Ve
+.PP
+The execution order is indicated by \f(CW\*(C`===>\*(C'\fR marks, thus it is \f(CW\*(C`3
+4 5 6\*(C'\fR (node \f(CW6\fR is not included in the above listing), i.e.,
+\&\f(CW\*(C`gvsv gvsv add whatever\*(C'\fR.
+.PP
+Each of these nodes represents an op, a fundamental operation inside the
+Perl core. The code which implements each operation can be found in the
+\&\fIpp*.c\fR files; the function which implements the op with type \f(CW\*(C`gvsv\*(C'\fR
+is \f(CW\*(C`pp_gvsv\*(C'\fR, and so on. As the tree above shows, different ops have
+different numbers of children: \f(CW\*(C`add\*(C'\fR is a binary operator, as one would
+expect, and so has two children. To accommodate the various different
+numbers of children, there are various types of op data structure, and
+they link together in different ways.
+.PP
+The simplest type of op structure is \f(CW\*(C`OP\*(C'\fR: this has no children. Unary
+operators, \f(CW\*(C`UNOP\*(C'\fRs, have one child, and this is pointed to by the
+\&\f(CW\*(C`op_first\*(C'\fR field. Binary operators (\f(CW\*(C`BINOP\*(C'\fRs) have not only an
+\&\f(CW\*(C`op_first\*(C'\fR field but also an \f(CW\*(C`op_last\*(C'\fR field. The most complex type of
+op is a \f(CW\*(C`LISTOP\*(C'\fR, which has any number of children. In this case, the
+first child is pointed to by \f(CW\*(C`op_first\*(C'\fR and the last child by
+\&\f(CW\*(C`op_last\*(C'\fR. The children in between can be found by iteratively
+following the \f(CW\*(C`OpSIBLING\*(C'\fR pointer from the first child to the last (but
+see below).
+.PP
+There are also some other op types: a \f(CW\*(C`PMOP\*(C'\fR holds a regular expression,
+and has no children, and a \f(CW\*(C`LOOP\*(C'\fR may or may not have children. If the
+\&\f(CW\*(C`op_children\*(C'\fR field is non-zero, it behaves like a \f(CW\*(C`LISTOP\*(C'\fR. To
+complicate matters, if a \f(CW\*(C`UNOP\*(C'\fR is actually a \f(CW\*(C`null\*(C'\fR op after
+optimization (see "Compile pass 2: context propagation") it will still
+have children in accordance with its former type.
+.PP
+Finally, there is a \f(CW\*(C`LOGOP\*(C'\fR, or logic op. Like a \f(CW\*(C`LISTOP\*(C'\fR, this has one
+or more children, but it doesn't have an \f(CW\*(C`op_last\*(C'\fR field: so you have to
+follow \f(CW\*(C`op_first\*(C'\fR and then the \f(CW\*(C`OpSIBLING\*(C'\fR chain itself to find the
+last child. Instead it has an \f(CW\*(C`op_other\*(C'\fR field, which is comparable to
+the \f(CW\*(C`op_next\*(C'\fR field described below, and represents an alternate
+execution path. Operators like \f(CW\*(C`and\*(C'\fR, \f(CW\*(C`or\*(C'\fR and \f(CW\*(C`?\*(C'\fR are \f(CW\*(C`LOGOP\*(C'\fRs. Note
+that in general, \f(CW\*(C`op_other\*(C'\fR may not point to any of the direct children
+of the \f(CW\*(C`LOGOP\*(C'\fR.
+.PP
+Starting in version 5.21.2, perls built with the experimental
+define \f(CW\*(C`\-DPERL_OP_PARENT\*(C'\fR add an extra boolean flag for each op,
+\&\f(CW\*(C`op_moresib\*(C'\fR. When not set, this indicates that this is the last op in an
+\&\f(CW\*(C`OpSIBLING\*(C'\fR chain. This frees up the \f(CW\*(C`op_sibling\*(C'\fR field on the last
+sibling to point back to the parent op. Under this build, that field is
+also renamed \f(CW\*(C`op_sibparent\*(C'\fR to reflect its joint role. The macro
+\&\f(CWOpSIBLING(o)\fR wraps this special behaviour, and always returns NULL on
+the last sibling. With this build the \f(CWop_parent(o)\fR function can be
+used to find the parent of any op. Thus for forward compatibility, you
+should always use the \f(CWOpSIBLING(o)\fR macro rather than accessing
+\&\f(CW\*(C`op_sibling\*(C'\fR directly.
+.PP
+Another way to examine the tree is to use a compiler back-end module, such
+as B::Concise.
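+.PP
+To make the linkage concrete, here is a small sketch of walking the
+immediate children of an op. The helper name \f(CW\*(C`count_kids\*(C'\fR is
+hypothetical; it relies only on the \f(CW\*(C`OPf_KIDS\*(C'\fR flag, the \f(CW\*(C`op_first\*(C'\fR
+field and the \f(CW\*(C`OpSIBLING\*(C'\fR macro, and assumes a perl recent enough to
+provide \f(CW\*(C`OpSIBLING\*(C'\fR.
+.PP
+.Vb 3
+\&    #include "EXTERN.h"
+\&    #include "perl.h"
+\&    #include "XSUB.h"
+\&
+\&    /* Hypothetical helper: number of direct children of o. */
+\&    static int
+\&    count_kids(OP *o)
+\&    {
+\&        int n = 0;
+\&        OP *kid;
+\&
+\&        if (!(o\->op_flags & OPf_KIDS))
+\&            return 0;              /* no children, so no op_first to follow */
+\&
+\&        /* Any op with OPf_KIDS set has at least the UNOP layout. */
+\&        for (kid = cUNOPx(o)\->op_first; kid; kid = OpSIBLING(kid))
+\&            n++;
+\&        return n;
+\&    }
+.Ve
+.PP
+Because the loop goes through \f(CW\*(C`OpSIBLING\*(C'\fR rather than reading the sibling
+field directly, it stops cleanly at the last child on either kind of build.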
+.SS "Compile pass 1: check routines"
+.IX Subsection "Compile pass 1: check routines"
+The tree is created by the compiler while \fIyacc\fR code feeds it
+the constructions it recognizes. Since \fIyacc\fR works bottom-up, so does
+the first pass of perl compilation.
+.PP
+What makes this pass interesting for perl developers is that some
+optimization may be performed on this pass. This is optimization by
+so-called "check routines". The correspondence between node names
+and corresponding check routines is described in \fIopcode.pl\fR (do not
+forget to run \f(CW\*(C`make regen_headers\*(C'\fR if you modify this file).
+.PP
+A check routine is called when the node is fully constructed except
+for the execution-order thread. Since at this time there are no
+back-links to the currently constructed node, one can do almost any
+operation to the top-level node, including freeing it and/or creating
+new nodes above/below it.
+.PP
+The check routine returns the node which should be inserted into the
+tree (if the top-level node was not modified, the check routine returns
+its argument).
+.PP
+By convention, check routines have names \f(CW\*(C`ck_*\*(C'\fR. They are usually
+called from \f(CW\*(C`new*OP\*(C'\fR subroutines (or \f(CW\*(C`convert\*(C'\fR) (which in turn are
+called from \fIperly.y\fR).
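+.PP
+An extension can chain its own check routine in front of an existing one.
+The following sketch assumes the \f(CW\*(C`wrap_op_checker\*(C'\fR API function
+documented in perlapi (available in reasonably recent perls); the names
+\&\f(CW\*(C`my_ck_add\*(C'\fR and \f(CW\*(C`prev_ck_add\*(C'\fR are hypothetical.
+.PP
+.Vb 8
+\&    static Perl_check_t prev_ck_add;
+\&    static OP *my_ck_add(pTHX_ OP *o)
+\&    {
+\&        /* inspect or rebuild the freshly constructed add node here */
+\&        return prev_ck_add(aTHX_ o);  /* then defer to the previous checker */
+\&    }
+\&    BOOT:
+\&        wrap_op_checker(OP_ADD, my_ck_add, &prev_ck_add);
+.Ve
+.PP
+As with any check routine, whatever \f(CW\*(C`my_ck_add\*(C'\fR returns is the node that
+ends up in the tree.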
+.SS "Compile pass 1a: constant folding"
+.IX Subsection "Compile pass 1a: constant folding"
+Immediately after the check routine is called the returned node is
+checked for being compile-time executable. If it is (the value is
+judged to be constant) it is immediately executed, and a \fIconstant\fR
+node with the "return value" of the corresponding subtree is
+substituted instead. The subtree is deleted.
+.PP
+If constant folding was not performed, the execution-order thread is
+created.
+.SS "Compile pass 2: context propagation"
+.IX Subsection "Compile pass 2: context propagation"
+When a context for a part of compile tree is known, it is propagated
+down through the tree. At this time the context can have 5 values
+(instead of 2 for runtime context): void, boolean, scalar, list, and
+lvalue. In contrast with pass 1, this pass is processed from top
+to bottom: a node's context determines the context for its children.
+.PP
+Additional context-dependent optimizations are performed at this time.
+Since at this moment the compile tree contains back-references (via
+"thread" pointers), nodes cannot be \fBfree()\fRd now. To allow
+optimized-away nodes at this stage, such nodes are \fBnull()\fRified instead
+of being \fBfree()\fRd (i.e. their type is changed to OP_NULL).
+.SS "Compile pass 3: peephole optimization"
+.IX Subsection "Compile pass 3: peephole optimization"
+After the compile tree for a subroutine (or for an \f(CW\*(C`eval\*(C'\fR or a file)
+is created, an additional pass over the code is performed. This pass
+is neither top-down nor bottom-up, but in the execution order (with
+additional complications for conditionals). Optimizations performed
+at this stage are subject to the same restrictions as in pass 2.
+.PP
+Peephole optimizations are done by calling the function pointed to
+by the global variable \f(CW\*(C`PL_peepp\*(C'\fR. By default, \f(CW\*(C`PL_peepp\*(C'\fR just
+calls the function pointed to by the global variable \f(CW\*(C`PL_rpeepp\*(C'\fR.
+By default, that performs some basic op fixups and optimisations along
+the execution-order op chain, and recursively calls \f(CW\*(C`PL_rpeepp\*(C'\fR for
+each side chain of ops (resulting from conditionals). Extensions may
+provide additional optimisations or fixups, hooking into either the
+per-subroutine or recursive stage, like this:
+.PP
+.Vb 10
+\& static peep_t prev_peepp;
+\& static void my_peep(pTHX_ OP *o)
+\& {
+\& /* custom per\-subroutine optimisation goes here */
+\& prev_peepp(aTHX_ o);
+\& /* custom per\-subroutine optimisation may also go here */
+\& }
+\& BOOT:
+\& prev_peepp = PL_peepp;
+\& PL_peepp = my_peep;
+\&
+\& static peep_t prev_rpeepp;
+\& static void my_rpeep(pTHX_ OP *first)
+\& {
+\& OP *o = first, *t = first;
+\& /* o advances two ops per pass and t one, so a cyclic chain is caught */
+\& for (; o; o = o\->op_next, t = t\->op_next) {
+\& /* custom per\-op optimisation goes here */
+\& o = o\->op_next;
+\& if (!o || o == t) break;
+\& /* custom per\-op optimisation goes AND here */
+\& }
+\& prev_rpeepp(aTHX_ first);
+\& }
+\& BOOT:
+\& prev_rpeepp = PL_rpeepp;
+\& PL_rpeepp = my_rpeep;
+.Ve
+.SS "Pluggable runops"
+.IX Subsection "Pluggable runops"
+The compile tree is executed in a runops function. There are two runops
+functions, in \fIrun.c\fR and in \fIdump.c\fR. \f(CW\*(C`Perl_runops_debug\*(C'\fR is used
+with DEBUGGING and \f(CW\*(C`Perl_runops_standard\*(C'\fR is used otherwise. For fine
+control over the execution of the compile tree it is possible to provide
+your own runops function.
+.PP
+It's probably best to copy one of the existing runops functions and
+change it to suit your needs. Then, in the BOOT section of your XS
+file, add the line:
+.PP
+.Vb 1
+\& PL_runops = my_runops;
+.Ve
+.PP
+This function should be as efficient as possible to keep your programs
+running as fast as possible.
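+.PP
+As a starting point, a replacement might look like the following sketch,
+which is modelled on \f(CW\*(C`Perl_runops_standard\*(C'\fR as found in recent versions of
+\&\fIrun.c\fR (check your own perl's source, since the details may differ).
+The name \f(CW\*(C`my_runops\*(C'\fR matches the assignment above.
+.PP
+.Vb 10
+\&    static int
+\&    my_runops(pTHX)
+\&    {
+\&        OP *op = PL_op;
+\&        /* each pp function returns the next op to run, or NULL when done */
+\&        while ((PL_op = op = op\->op_ppaddr(aTHX))) {
+\&            /* per\-op instrumentation goes here; keep it cheap */
+\&        }
+\&        PERL_ASYNC_CHECK();
+\&        TAINT_NOT;
+\&        return 0;
+\&    }
+.Ve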
+.SS "Compile-time scope hooks"
+.IX Subsection "Compile-time scope hooks"
+As of perl 5.14 it is possible to hook into the compile-time lexical
+scope mechanism using \f(CW\*(C`Perl_blockhook_register\*(C'\fR. This is used like
+this:
+.PP
+.Vb 2
+\& STATIC void my_start_hook(pTHX_ int full);
+\& STATIC BHK my_hooks;
+\&
+\& BOOT:
+\& BhkENTRY_set(&my_hooks, bhk_start, my_start_hook);
+\& Perl_blockhook_register(aTHX_ &my_hooks);
+.Ve
+.PP
+This will arrange to have \f(CW\*(C`my_start_hook\*(C'\fR called at the start of
+compiling every lexical scope. The available hooks are:
+.ie n .IP """void bhk_start(pTHX_ int full)""" 4
+.el .IP "\f(CWvoid bhk_start(pTHX_ int full)\fR" 4
+.IX Item "void bhk_start(pTHX_ int full)"
+This is called just after starting a new lexical scope. Note that Perl
+code like
+.Sp
+.Vb 1
+\& if ($x) { ... }
+.Ve
+.Sp
+creates two scopes: the first starts at the \f(CW\*(C`(\*(C'\fR and has \f(CW\*(C`full == 1\*(C'\fR,
+the second starts at the \f(CW\*(C`{\*(C'\fR and has \f(CW\*(C`full == 0\*(C'\fR. Both end at the
+\&\f(CW\*(C`}\*(C'\fR, so calls to \f(CW\*(C`start\*(C'\fR and \f(CW\*(C`pre\*(C'\fR/\f(CW\*(C`post_end\*(C'\fR will match. Anything
+pushed onto the save stack by this hook will be popped just before the
+scope ends (between the \f(CW\*(C`pre_\*(C'\fR and \f(CW\*(C`post_end\*(C'\fR hooks, in fact).
+.ie n .IP """void bhk_pre_end(pTHX_ OP **o)""" 4
+.el .IP "\f(CWvoid bhk_pre_end(pTHX_ OP **o)\fR" 4
+.IX Item "void bhk_pre_end(pTHX_ OP **o)"
+This is called at the end of a lexical scope, just before unwinding the
+stack. \fIo\fR is the root of the optree representing the scope; it is a
+double pointer so you can replace the OP if you need to.
+.ie n .IP """void bhk_post_end(pTHX_ OP **o)""" 4
+.el .IP "\f(CWvoid bhk_post_end(pTHX_ OP **o)\fR" 4
+.IX Item "void bhk_post_end(pTHX_ OP **o)"
+This is called at the end of a lexical scope, just after unwinding the
+stack. \fIo\fR is as above. Note that it is possible for calls to \f(CW\*(C`pre_\*(C'\fR
+and \f(CW\*(C`post_end\*(C'\fR to nest, if there is something on the save stack that
+calls string eval.
+.ie n .IP """void bhk_eval(pTHX_ OP *const o)""" 4
+.el .IP "\f(CWvoid bhk_eval(pTHX_ OP *const o)\fR" 4
+.IX Item "void bhk_eval(pTHX_ OP *const o)"
+This is called just before starting to compile an \f(CW\*(C`eval STRING\*(C'\fR, \f(CW\*(C`do
+FILE\*(C'\fR, \f(CW\*(C`require\*(C'\fR or \f(CW\*(C`use\*(C'\fR, after the eval has been set up. \fIo\fR is the
+OP that requested the eval, and will normally be an \f(CW\*(C`OP_ENTEREVAL\*(C'\fR,
+\&\f(CW\*(C`OP_DOFILE\*(C'\fR or \f(CW\*(C`OP_REQUIRE\*(C'\fR.
+.PP
+Once you have your hook functions, you need a \f(CW\*(C`BHK\*(C'\fR structure to put
+them in. It's best to allocate it statically, since there is no way to
+free it once it's registered. The function pointers should be inserted
+into this structure using the \f(CW\*(C`BhkENTRY_set\*(C'\fR macro, which will also set
+flags indicating which entries are valid. If you do need to allocate
+your \f(CW\*(C`BHK\*(C'\fR dynamically for some reason, be sure to zero it before you
+start.
+.PP
+Once registered, there is no mechanism to switch these hooks off, so if
+that is necessary you will need to do this yourself. An entry in \f(CW\*(C`%^H\*(C'\fR
+is probably the best way, so the effect is lexically scoped; however it
+is also possible to use the \f(CW\*(C`BhkDISABLE\*(C'\fR and \f(CW\*(C`BhkENABLE\*(C'\fR macros to
+temporarily switch entries on and off. You should also be aware that
+generally speaking at least one scope will have opened before your
+extension is loaded, so you will see some \f(CW\*(C`pre\*(C'\fR/\f(CW\*(C`post_end\*(C'\fR pairs that
+didn't have a matching \f(CW\*(C`start\*(C'\fR.
+.ie n .SH "Examining internal data structures with the ""dump"" functions"
+.el .SH "Examining internal data structures with the \f(CWdump\fP functions"
+.IX Header "Examining internal data structures with the dump functions"
+To aid debugging, the source file \fIdump.c\fR contains a number of
+functions which produce formatted output of internal data structures.
+.PP
+The most commonly used of these functions is \f(CW\*(C`Perl_sv_dump\*(C'\fR; it's used
+for dumping SVs, AVs, HVs, and CVs. The \f(CW\*(C`Devel::Peek\*(C'\fR module calls
+\&\f(CW\*(C`sv_dump\*(C'\fR to produce debugging output from Perl-space, so users of that
+module should already be familiar with its format.
+.PP
+\&\f(CW\*(C`Perl_op_dump\*(C'\fR can be used to dump an \f(CW\*(C`OP\*(C'\fR structure or any of its
+derivatives, and produces output similar to \f(CW\*(C`perl \-Dx\*(C'\fR; in fact,
+\&\f(CW\*(C`Perl_dump_eval\*(C'\fR will dump the main root of the code being evaluated,
+exactly like \f(CW\*(C`\-Dx\*(C'\fR.
+.PP
+Other useful functions are \f(CW\*(C`Perl_dump_sub\*(C'\fR, which turns a \f(CW\*(C`GV\*(C'\fR into an
+op tree; \f(CW\*(C`Perl_dump_packsubs\*(C'\fR, which calls \f(CW\*(C`Perl_dump_sub\*(C'\fR on all the
+subroutines in a package, like so (thankfully, these are all xsubs, so
+there is no op tree):
+.PP
+.Vb 1
+\& (gdb) print Perl_dump_packsubs(PL_defstash)
+\&
+\& SUB attributes::bootstrap = (xsub 0x811fedc 0)
+\&
+\& SUB UNIVERSAL::can = (xsub 0x811f50c 0)
+\&
+\& SUB UNIVERSAL::isa = (xsub 0x811f304 0)
+\&
+\& SUB UNIVERSAL::VERSION = (xsub 0x811f7ac 0)
+\&
+\& SUB DynaLoader::boot_DynaLoader = (xsub 0x805b188 0)
+.Ve
+.PP
+and \f(CW\*(C`Perl_dump_all\*(C'\fR, which dumps all the subroutines in the stash and
+the op tree of the main root.
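+.PP
+The same functions can also be called from C code while debugging an
+extension. A minimal sketch (the helper name \f(CW\*(C`debug_show\*(C'\fR is hypothetical,
+and the usual Perl headers are assumed):
+.PP
+.Vb 6
+\&    static void
+\&    debug_show(pTHX_ SV *sv, const OP *o)
+\&    {
+\&        sv_dump(sv);   /* same format as Devel::Peek output */
+\&        op_dump(o);    /* output similar to perl \-Dx for this op */
+\&    }
+.Ve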
+.SH "How multiple interpreters and concurrency are supported"
+.IX Header "How multiple interpreters and concurrency are supported"
+.SS "Background and MULTIPLICITY"
+.IX Subsection "Background and MULTIPLICITY"
+The Perl interpreter can be regarded as a closed box: it has an API
+for feeding it code or otherwise making it do things, but it also has
+functions for its own use. This smells a lot like an object, and
+there is a way for you to build Perl so that you can have multiple
+interpreters, with one interpreter represented either as a C structure,
+or inside a thread-specific structure. These structures contain all
+the context, the state of that interpreter.
+.PP
+The macro that controls the major Perl build flavor is MULTIPLICITY. The
+MULTIPLICITY build has a C structure that packages all the interpreter
+state, which is being passed to various perl functions as a "hidden"
+first argument. MULTIPLICITY makes multi-threaded perls possible (with the
+ithreads threading model, related to the macro USE_ITHREADS.)
+.PP
+PERL_IMPLICIT_CONTEXT is a legacy synonym for MULTIPLICITY.
+.PP
+To see whether you have non-const data you can use a BSD (or GNU)
+compatible \f(CW\*(C`nm\*(C'\fR:
+.PP
+.Vb 1
+\& nm libperl.a | grep \-v \*(Aq [TURtr] \*(Aq
+.Ve
+.PP
+If this displays any \f(CW\*(C`D\*(C'\fR or \f(CW\*(C`d\*(C'\fR symbols (or possibly \f(CW\*(C`C\*(C'\fR or \f(CW\*(C`c\*(C'\fR),
+you have non-const data. The symbols the \f(CW\*(C`grep\*(C'\fR removed are as follows:
+\&\f(CW\*(C`Tt\*(C'\fR are \fItext\fR, or code, the \f(CW\*(C`Rr\*(C'\fR are \fIread-only\fR (const) data,
+and the \f(CW\*(C`U\*(C'\fR is <undefined>, external symbols referred to.
+.PP
+The test \fIt/porting/libperl.t\fR does this kind of symbol sanity
+checking on \f(CW\*(C`libperl.a\*(C'\fR.
+.PP
+All this obviously requires a way for the Perl internal functions to be
+either subroutines taking some kind of structure as the first
+argument, or subroutines taking nothing as the first argument. To
+enable these two very different ways of building the interpreter,
+the Perl source (as it does in so many other situations) makes heavy
+use of macros and subroutine naming conventions.
+.PP
+First problem: deciding which functions will be public API functions and
+which will be private. All functions whose names begin \f(CW\*(C`S_\*(C'\fR are private
+(think "S" for "secret" or "static"). All other functions begin with
+"Perl_", but just because a function begins with "Perl_" does not mean it is
+part of the API. (See "Internal
+Functions".) The easiest way to be \fBsure\fR a
+function is part of the API is to find its entry in perlapi.
+If it exists in perlapi, it's part of the API. If it doesn't, and you
+think it should be (i.e., you need it for your extension), submit an issue at
+<https://github.com/Perl/perl5/issues> explaining why you think it should be.
+.PP
+Second problem: there must be a syntax so that the same subroutine
+declarations and calls can pass a structure as their first argument,
+or pass nothing. To solve this, the subroutines are named and
+declared in a particular way. Here's a typical start of a static
+function used within the Perl guts:
+.PP
+.Vb 2
+\& STATIC void
+\& S_incline(pTHX_ char *s)
+.Ve
+.PP
+STATIC becomes "static" in C, and may be #define'd to nothing in some
+configurations in the future.
+.PP
+A public function (i.e. part of the internal API, but not necessarily
+sanctioned for use in extensions) begins like this:
+.PP
+.Vb 2
+\& void
+\& Perl_sv_setiv(pTHX_ SV* dsv, IV num)
+.Ve
+.PP
+\&\f(CW\*(C`pTHX_\*(C'\fR is one of a number of macros (in \fIperl.h\fR) that hide the
+details of the interpreter's context. THX stands for "thread", "this",
+or "thingy", as the case may be. (And no, George Lucas is not involved. :\-)
+The first character could be 'p' for a \fBp\fRrototype, 'a' for \fBa\fRrgument,
+or 'd' for \fBd\fReclaration, so we have \f(CW\*(C`pTHX\*(C'\fR, \f(CW\*(C`aTHX\*(C'\fR and \f(CW\*(C`dTHX\*(C'\fR, and
+their variants.
+.PP
+When Perl is built without options that set MULTIPLICITY, there is no
+first argument containing the interpreter's context. The trailing underscore
+in the pTHX_ macro indicates that the macro expansion needs a comma
+after the context argument because other arguments follow it. If
+MULTIPLICITY is not defined, pTHX_ will be ignored, and the
+subroutine is not prototyped to take the extra argument. The form of the
+macro without the trailing underscore is used when there are no additional
+explicit arguments.
+.PP
+When a core function calls another, it must pass the context. This
+is normally hidden via macros. Consider \f(CW\*(C`sv_setiv\*(C'\fR. It expands into
+something like this:
+.PP
+.Vb 6
+\& #ifdef MULTIPLICITY
+\& #define sv_setiv(a,b) Perl_sv_setiv(aTHX_ a, b)
+\& /* can\*(Aqt do this for vararg functions, see below */
+\& #else
+\& #define sv_setiv Perl_sv_setiv
+\& #endif
+.Ve
+.PP
+This works well, and means that XS authors can gleefully write:
+.PP
+.Vb 1
+\& sv_setiv(foo, bar);
+.Ve
+.PP
+and still have it work under all the modes Perl could have been
+compiled with.
+.PP
+This doesn't work so cleanly for varargs functions, though, as macros
+imply that the number of arguments is known in advance. Instead we
+either need to spell them out fully, passing \f(CW\*(C`aTHX_\*(C'\fR as the first
+argument (the Perl core tends to do this with functions like
+Perl_warner), or use a context-free version.
+.PP
+The context-free version of Perl_warner is called
+Perl_warner_nocontext, and does not take the extra argument. Instead
+it does \f(CW\*(C`dTHX;\*(C'\fR to get the context from thread-local storage. We
+\&\f(CW\*(C`#define warner Perl_warner_nocontext\*(C'\fR so that extensions get source
+compatibility at the expense of performance. (Passing an arg is
+cheaper than grabbing it from thread-local storage.)
+.PP
+You can ignore [pad]THXx when browsing the Perl headers/sources.
+Those are strictly for use within the core. Extensions and embedders
+need only be aware of [pad]THX.
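+.PP
+Putting these conventions together, the following sketch shows a function
+that takes the context, a call that passes the context on explicitly, and
+a varargs call spelled out in full. The function names \f(CW\*(C`S_set_answer\*(C'\fR
+and \f(CW\*(C`S_report\*(C'\fR are hypothetical; everything else is from the Perl API.
+.PP
+.Vb 5
+\&    STATIC void
+\&    S_set_answer(pTHX_ SV *sv)       /* pTHX_: other arguments follow */
+\&    {
+\&        sv_setiv(sv, 42);            /* the macro supplies aTHX_ if needed */
+\&    }
+\&
+\&    STATIC void
+\&    S_report(pTHX)                   /* pTHX: no other arguments */
+\&    {
+\&        SV *sv = newSV(0);
+\&        S_set_answer(aTHX_ sv);      /* aTHX_: pass the context on */
+\&        /* varargs function: no wrapper macro, so spell it out */
+\&        Perl_warner(aTHX_ packWARN(WARN_MISC),
+\&                    "answer is %" IVdf, SvIV(sv));
+\&        SvREFCNT_dec(sv);
+\&    }
+.Ve
+.PP
+The same source compiles whether or not MULTIPLICITY is defined, because
+\&\f(CW\*(C`pTHX_\*(C'\fR and \f(CW\*(C`aTHX_\*(C'\fR expand to nothing when there is no context argument.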
+.SS "So what happened to dTHR?"
+.IX Subsection "So what happened to dTHR?"
+\&\f(CW\*(C`dTHR\*(C'\fR was introduced in perl 5.005 to support the older thread model.
+The older thread model now uses the \f(CW\*(C`THX\*(C'\fR mechanism to pass context
+pointers around, so \f(CW\*(C`dTHR\*(C'\fR is not useful any more. Perl 5.6.0 and
+later still have it for backward source compatibility, but it is defined
+to be a no-op.
+.SS "How do I use all this in extensions?"
+.IX Subsection "How do I use all this in extensions?"
+When Perl is built with MULTIPLICITY, extensions that call
+any functions in the Perl API will need to pass the initial context
+argument somehow. The kicker is that you will need to write it in
+such a way that the extension still compiles when Perl hasn't been
+built with MULTIPLICITY enabled.
+.PP
+There are three ways to do this. First, the easy but inefficient way,
+which is also the default, in order to maintain source compatibility
+with extensions: whenever \fIXSUB.h\fR is #included, it redefines the aTHX
+and aTHX_ macros to call a function that will return the context.
+Thus, something like:
+.PP
+.Vb 1
+\& sv_setiv(sv, num);
+.Ve
+.PP
+in your extension will translate to this when MULTIPLICITY is
+in effect:
+.PP
+.Vb 1
+\& Perl_sv_setiv(Perl_get_context(), sv, num);
+.Ve
+.PP
+or to this otherwise:
+.PP
+.Vb 1
+\& Perl_sv_setiv(sv, num);
+.Ve
+.PP
+You don't have to do anything new in your extension to get this; since
+the Perl library provides \fBPerl_get_context()\fR, it will all just
+work.
+.PP
+The second, more efficient way is to use the following template for
+your Foo.xs:
+.PP
+.Vb 4
+\& #define PERL_NO_GET_CONTEXT /* we want efficiency */
+\& #include "EXTERN.h"
+\& #include "perl.h"
+\& #include "XSUB.h"
+\&
+\& STATIC void my_private_function(int arg1, int arg2);
+\&
+\& STATIC void
+\& my_private_function(int arg1, int arg2)
+\& {
+\& dTHX; /* fetch context */
+\& ... call many Perl API functions ...
+\& }
+\&
+\& [... etc ...]
+\&
+\& MODULE = Foo PACKAGE = Foo
+\&
+\& /* typical XSUB */
+\&
+\& void
+\& my_xsub(arg)
+\& int arg
+\& CODE:
+\& my_private_function(arg, 10);
+.Ve
+.PP
+Note that the only two changes from the normal way of writing an
+extension are the addition of a \f(CW\*(C`#define PERL_NO_GET_CONTEXT\*(C'\fR before
+including the Perl headers, and a \f(CW\*(C`dTHX;\*(C'\fR declaration at
+the start of every function that will call the Perl API. (You'll
+know which functions need this, because the C compiler will complain
+that there's an undeclared identifier in those functions.) No changes
+are needed for the XSUBs themselves, because the \fBXS()\fR macro is
+correctly defined to pass in the implicit context if needed.
+.PP
+The third, even more efficient way is to ape how it is done within
+the Perl guts:
+.PP
+.Vb 4
+\& #define PERL_NO_GET_CONTEXT /* we want efficiency */
+\& #include "EXTERN.h"
+\& #include "perl.h"
+\& #include "XSUB.h"
+\&
+\& /* pTHX_ only needed for functions that call Perl API */
+\& STATIC void my_private_function(pTHX_ int arg1, int arg2);
+\&
+\& STATIC void
+\& my_private_function(pTHX_ int arg1, int arg2)
+\& {
+\& /* dTHX; not needed here, because THX is an argument */
+\& ... call Perl API functions ...
+\& }
+\&
+\& [... etc ...]
+\&
+\& MODULE = Foo PACKAGE = Foo
+\&
+\& /* typical XSUB */
+\&
+\& void
+\& my_xsub(arg)
+\& int arg
+\& CODE:
+\& my_private_function(aTHX_ arg, 10);
+.Ve
+.PP
+This implementation never has to fetch the context using a function
+call, since it is always passed as an extra argument. Depending on
+your needs for simplicity or efficiency, you may mix the previous
+two approaches freely.
+.PP
+Never add a comma after \f(CW\*(C`pTHX\*(C'\fR yourself\-\-always use the form of the
+macro with the underscore for functions that take explicit arguments,
+or the form without the underscore for functions with no explicit arguments.
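+.PP
+For instance, a sketch with two hypothetical helpers, one taking no explicit
+arguments and one taking a single explicit argument, would be declared and
+called like this:
+.PP
+.Vb 5
+\& STATIC void my_reset(pTHX);            /* no explicit arguments */
+\& STATIC void my_store(pTHX_ IV value);  /* one explicit argument */
+\&
+\& my_reset(aTHX);
+\& my_store(aTHX_ 42);
+.Ve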
+.SS "Should I do anything special if I call perl from multiple threads?"
+.IX Subsection "Should I do anything special if I call perl from multiple threads?"
+If you create interpreters in one thread and then proceed to call them in
+another, you need to make sure perl's own Thread Local Storage (TLS) slot is
+initialized correctly in each of those threads.
+.PP
+The \f(CW\*(C`perl_alloc\*(C'\fR and \f(CW\*(C`perl_clone\*(C'\fR API functions will automatically set
+the TLS slot to the interpreter they created, so that there is no need to do
+anything special if the interpreter is always accessed in the same thread that
+created it, and that thread did not create or call any other interpreters
+afterwards. If that is not the case, you have to set the TLS slot of the
+thread before calling any functions in the Perl API on that particular
+interpreter. This is done by calling the \f(CW\*(C`PERL_SET_CONTEXT\*(C'\fR macro in that
+thread as the first thing you do:
+.PP
+.Vb 2
+\& /* do this before doing anything else with some_perl */
+\& PERL_SET_CONTEXT(some_perl);
+\&
+\& ... other Perl API calls on some_perl go here ...
+.Ve
+.PP
+(You can always get the current context via \f(CW\*(C`PERL_GET_CONTEXT\*(C'\fR.)
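+.PP
+A minimal sketch of a worker thread that borrows an interpreter created
+elsewhere follows. The function name \f(CW\*(C`worker\*(C'\fR, the thread entry-point
+signature, and the evaluated code are assumptions for illustration; the
+sketch also assumes that no other thread is using the same interpreter at
+the same time.
+.PP
+.Vb 11
+\& static void *
+\& worker(void *arg)
+\& {
+\&     PerlInterpreter *my_perl = (PerlInterpreter *)arg;
+\&
+\&     /* point this thread\*(Aqs TLS slot at the borrowed interpreter */
+\&     PERL_SET_CONTEXT(my_perl);
+\&
+\&     eval_pv("$main::visited++", TRUE);
+\&     return NULL;
+\& }
+.Ve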
+.SS "Future Plans and PERL_IMPLICIT_SYS"
+.IX Subsection "Future Plans and PERL_IMPLICIT_SYS"
+Just as MULTIPLICITY provides a way to bundle up everything
+that the interpreter knows about itself and pass it around, so too are
+there plans to allow the interpreter to bundle up everything it knows
+about the environment it's running on. This is enabled with the
+PERL_IMPLICIT_SYS macro. Currently it only works with USE_ITHREADS on
+Windows.
+.PP
+This makes it possible to provide an extra pointer (called the "host"
+environment) for all the system calls, so that
+all the system stuff can maintain its own state, broken down into
+seven C structures. These are thin wrappers around the usual system
+calls (see \fIwin32/perllib.c\fR) for the default perl executable, but for a
+more ambitious host (like the one that would do \fBfork()\fR emulation) all
+the extra work needed to pretend that different interpreters are
+actually different "processes", would be done here.
+.PP
+The Perl engine/interpreter and the host are orthogonal entities.
+There could be one or more interpreters in a process, and one or
+more "hosts", with free association between them.
+.SH "Internal Functions"
+.IX Header "Internal Functions"
+All of Perl's internal functions which will be exposed to the outside
+world are prefixed by \f(CW\*(C`Perl_\*(C'\fR so that they will not conflict with XS
+functions or functions used in a program in which Perl is embedded.
+Similarly, all global variables begin with \f(CW\*(C`PL_\*(C'\fR. (By convention,
+static functions start with \f(CW\*(C`S_\*(C'\fR.)
+.PP
+Inside the Perl core (\f(CW\*(C`PERL_CORE\*(C'\fR defined), you can get at the functions
+either with or without the \f(CW\*(C`Perl_\*(C'\fR prefix, thanks to a bunch of defines
+that live in \fIembed.h\fR. Note that extension code should \fInot\fR set
+\&\f(CW\*(C`PERL_CORE\*(C'\fR; this exposes the full perl internals, and is likely to cause
+breakage of the XS in each new perl release.
+.PP
+The file \fIembed.h\fR is generated automatically from
+\&\fIembed.pl\fR and \fIembed.fnc\fR. \fIembed.pl\fR also creates the prototyping
+header files for the internal functions, generates the documentation
+and a lot of other bits and pieces. It's important that when you add
+a new function to the core or change an existing one, you change the
+data in the table in \fIembed.fnc\fR as well. Here's a sample entry from
+that table:
+.PP
+.Vb 1
+\& Apd |SV** |av_fetch |AV* ar|I32 key|I32 lval
+.Ve
+.PP
+The first column is a set of flags, the second column the return type,
+the third column the name. Columns after that are the arguments.
+The flags are documented at the top of \fIembed.fnc\fR.
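+.PP
+Read column by column, the sample entry breaks down as follows. (The
+meanings given for the three flag letters are a summary as understood here;
+the comments at the top of \fIembed.fnc\fR remain the authoritative list.)
+.PP
+.Vb 8
+\& A            available to extensions (part of the public API)
+\& p            the C function carries the Perl_ prefix: Perl_av_fetch
+\& d            documentation for the function exists in the source
+\& SV**         return type
+\& av_fetch     name
+\& AV* ar       first argument
+\& I32 key      second argument
+\& I32 lval     third argument
+.Ve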
+.PP
+If you edit \fIembed.pl\fR or \fIembed.fnc\fR, you will need to run
+\&\f(CW\*(C`make regen_headers\*(C'\fR to force a rebuild of \fIembed.h\fR and other
+auto-generated files.
+.SS "Formatted Printing of IVs, UVs, and NVs"
+.IX Subsection "Formatted Printing of IVs, UVs, and NVs"
+If you are printing IVs, UVs, or NVs, then instead of the \fBstdio\fR\|(3) style
+formatting codes like \f(CW%d\fR, \f(CW%ld\fR, \f(CW%f\fR, you should use the
+following macros for portability:
+.PP
+.Vb 7
+\& IVdf IV in decimal
+\& UVuf UV in decimal
+\& UVof UV in octal
+\& UVxf UV in hexadecimal
+\& NVef NV %e\-like
+\& NVff NV %f\-like
+\& NVgf NV %g\-like
+.Ve
+.PP
+These will take care of 64\-bit integers and long doubles.
+For example:
+.PP
+.Vb 1
+\& printf("IV is %" IVdf "\en", iv);
+.Ve
+.PP
+The \f(CW\*(C`IVdf\*(C'\fR will expand to whatever is the correct format for the IVs.
+Note that the spaces are required around the format in case the code is
+compiled with C++, to maintain compliance with its standard.
+.PP
+Note that there are different "long doubles": Perl will use
+whatever the compiler has.
+.PP
+If you are printing addresses of pointers, use \f(CW%p\fR or UVxf combined
+with \fBPTR2UV()\fR.
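+.PP
+Putting several of these together, a sketch (the variables and their values
+are hypothetical, and \f(CW\*(C`sv\*(C'\fR is assumed to be a valid \f(CW\*(C`SV\ *\*(C'\fR) might read:
+.PP
+.Vb 7
+\& IV iv = \-42;
+\& UV uv = 255;
+\& NV nv = 0.5;
+\&
+\& printf("iv=%" IVdf " uv=%" UVuf " hex=%" UVxf "\en", iv, uv, uv);
+\& printf("nv=%" NVgf "\en", nv);
+\& printf("sv is at 0x%" UVxf "\en", PTR2UV(sv));
+.Ve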
+.SS "Formatted Printing of SVs"
+.IX Subsection "Formatted Printing of SVs"
+The contents of SVs may be printed using the \f(CW\*(C`SVf\*(C'\fR format, like so:
+.PP
+.Vb 1
+\& Perl_croak(aTHX_ "This croaked because: %" SVf "\en", SVfARG(err_msg))
+.Ve
+.PP
+where \f(CW\*(C`err_msg\*(C'\fR is an SV.
+.PP
+Not all scalar types are printable. Simple values certainly are: one of
+IV, UV, NV, or PV. Also, if the SV is a reference to some value,
+either it will be dereferenced and the value printed, or information
+about the type of that value and its address are displayed. The results
+of printing any other type of SV are undefined and likely to lead to an
+interpreter crash. NVs are printed using a \f(CW%g\fR\-ish format.
+.PP
+Note that the spaces are required around the \f(CW\*(C`SVf\*(C'\fR in case the code is
+compiled with C++, to maintain compliance with its standard.
+.PP
+Note that any filehandle being printed to under UTF\-8 must be expecting
+UTF\-8 in order to get good results and avoid Wide-character warnings.
+One way to do this for typical filehandles is to invoke perl with the
+\&\f(CW\*(C`\-C\*(C'\fR parameter. (See "\-C [number/list]" in perlrun.)
+.PP
+You can use this to concatenate two scalars:
+.PP
+.Vb 4
+\& SV *var1 = get_sv("var1", GV_ADD);
+\& SV *var2 = get_sv("var2", GV_ADD);
+\& SV *var3 = newSVpvf("var1=%" SVf " and var2=%" SVf,
+\& SVfARG(var1), SVfARG(var2));
+.Ve
+.PP
+\&\f(CW\*(C`SVf_QUOTEDPREFIX\*(C'\fR is similar to \f(CW\*(C`SVf\*(C'\fR except that it restricts the
+number of the characters printed, showing at most the first
+\&\f(CW\*(C`PERL_QUOTEDPREFIX_LEN\*(C'\fR characters of the argument, and rendering it with
+double quotes and with the contents escaped using double quoted string
+escaping rules. If the string is longer than this then ellipses "..."
+will be appended after the trailing quote. This is intended for error
+messages where the string is assumed to be a class name.
+.PP
+\&\f(CW\*(C`HvNAMEf\*(C'\fR and \f(CW\*(C`HvNAMEf_QUOTEDPREFIX\*(C'\fR are similar to \f(CW\*(C`SVf\*(C'\fR except they
+extract the string, length and utf8 flags from the argument using the
+\&\f(CWHvNAME()\fR, \f(CWHvNAMELEN()\fR, \f(CWHvNAMEUTF8()\fR macros. This is intended
+for stringifying a class name directly from a stash HV.
+.SS "Formatted Printing of Strings"
+.IX Subsection "Formatted Printing of Strings"
+If you just want the bytes printed in a 7bit NUL-terminated string, you can
+just use \f(CW%s\fR (assuming they are all really only 7bit). But if there is a
+possibility the value will be encoded as UTF\-8 or contains bytes above
+\&\f(CW0x7F\fR (and therefore 8bit), you should instead use the \f(CW\*(C`UTF8f\*(C'\fR format.
+And as its parameter, use the \f(CWUTF8fARG()\fR macro:
+.PP
+.Vb 1
+\& char * msg;
+\&
+\& /* U+2018: \exE2\ex80\ex98 LEFT SINGLE QUOTATION MARK
+\& U+2019: \exE2\ex80\ex99 RIGHT SINGLE QUOTATION MARK */
+\& if (can_utf8)
+\& msg = "\exE2\ex80\ex98Uses fancy quotes\exE2\ex80\ex99";
+\& else
+\& msg = "\*(AqUses simple quotes\*(Aq";
+\&
+\& Perl_croak(aTHX_ "The message is: %" UTF8f "\en",
+\& UTF8fARG(can_utf8, strlen(msg), msg));
+.Ve
+.PP
+The first parameter to \f(CW\*(C`UTF8fARG\*(C'\fR is a boolean: 1 if the string is in
+UTF\-8; 0 if string is in native byte encoding (Latin1).
+The second parameter is the number of bytes in the string to print.
+And the third and final parameter is a pointer to the first byte in the
+string.
+.PP
+Note that any filehandle being printed to under UTF\-8 must be expecting
+UTF\-8 in order to get good results and avoid Wide-character warnings.
+One way to do this for typical filehandles is to invoke perl with the
+\&\f(CW\*(C`\-C\*(C'\fR parameter. (See "\-C [number/list]" in perlrun.)
+.ie n .SS "Formatted Printing of ""Size_t"" and ""SSize_t"""
+.el .SS "Formatted Printing of \f(CWSize_t\fP and \f(CWSSize_t\fP"
+.IX Subsection "Formatted Printing of Size_t and SSize_t"
+The most general way to do this is to cast them to a UV or IV, and
+print as described under "Formatted Printing of IVs, UVs, and NVs"
+above.
+.PP
+But if you're using \f(CWPerlIO_printf()\fR, it's less typing and visual
+clutter to use the \f(CW%z\fR length modifier (for \fIsiZe\fR):
+.PP
+.Vb 1
+\& PerlIO_printf(PerlIO_stdout(), "STRLEN is %zu\en", len);
+.Ve
+.PP
+This modifier is not portable, so its use should be restricted to
+\&\f(CWPerlIO_printf()\fR.
+.ie n .SS "Formatted Printing of ""Ptrdiff_t"", ""intmax_t"", ""short"" and other special sizes"
+.el .SS "Formatted Printing of \f(CWPtrdiff_t\fP, \f(CWintmax_t\fP, \f(CWshort\fP and other special sizes"
+.IX Subsection "Formatted Printing of Ptrdiff_t, intmax_t, short and other special sizes"
+There are modifiers for these special situations if you are using
+\&\f(CWPerlIO_printf()\fR. See "size" in perlfunc.
+.SS "Pointer-To-Integer and Integer-To-Pointer"
+.IX Subsection "Pointer-To-Integer and Integer-To-Pointer"
+Because pointer size does not necessarily equal integer size,
+use the following macros to do it right.
+.PP
+.Vb 4
+\& PTR2UV(pointer)
+\& PTR2IV(pointer)
+\& PTR2NV(pointer)
+\& INT2PTR(pointertotype, integer)
+.Ve
+.PP
+For example:
+.PP
+.Vb 2
+\& IV iv = ...;
+\& SV *sv = INT2PTR(SV*, iv);
+.Ve
+.PP
+and
+.PP
+.Vb 2
+\& AV *av = ...;
+\& UV uv = PTR2UV(av);
+.Ve
+.PP
+There are also
+.PP
+.Vb 2
+\& PTR2nat(pointer) /* pointer to integer of PTRSIZE */
+\& PTR2ul(pointer) /* pointer to unsigned long */
+.Ve
+.PP
+And \f(CW\*(C`PTRV\*(C'\fR which gives the native type for an integer the same size as
+pointers, such as \f(CW\*(C`unsigned\*(C'\fR or \f(CW\*(C`unsigned long\*(C'\fR.
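+.PP
+A common use, sketched here with a hypothetical \f(CW\*(C`struct my_state\*(C'\fR, is to
+keep a C pointer inside an SV as an integer and recover it later:
+.PP
+.Vb 5
+\& struct my_state *state = ...;
+\& SV *holder = newSViv(PTR2IV(state));      /* pointer \-> IV */
+\&
+\& /* later, given the same SV: IV \-> pointer */
+\& struct my_state *again = INT2PTR(struct my_state *, SvIV(holder));
+.Ve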
+.SS "Exception Handling"
+.IX Subsection "Exception Handling"
+There are a couple of macros to do very basic exception handling in XS
+modules. You have to define \f(CW\*(C`NO_XSLOCKS\*(C'\fR before including \fIXSUB.h\fR to
+be able to use these macros:
+.PP
+.Vb 2
+\& #define NO_XSLOCKS
+\& #include "XSUB.h"
+.Ve
+.PP
+You can use these macros if you call code that may croak, but you need
+to do some cleanup before giving control back to Perl. For example:
+.PP
+.Vb 1
+\& dXCPT; /* set up necessary variables */
+\&
+\& XCPT_TRY_START {
+\& code_that_may_croak();
+\& } XCPT_TRY_END
+\&
+\& XCPT_CATCH
+\& {
+\& /* do cleanup here */
+\& XCPT_RETHROW;
+\& }
+.Ve
+.PP
+Note that you always have to rethrow an exception that has been
+caught. Using these macros, it is not possible to just catch the
+exception and ignore it. If you have to ignore the exception, you
+have to use the \f(CW\*(C`call_*\*(C'\fR function.
+.PP
+The advantage of using the above macros is that you don't have
+to set up an extra function for \f(CW\*(C`call_*\*(C'\fR, and that using these
+macros is faster than using \f(CW\*(C`call_*\*(C'\fR.
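+.PP
+As a fuller sketch, assuming a hypothetical \f(CW\*(C`fill_buffer()\*(C'\fR that may croak,
+a temporary buffer can be released on both the normal and the exceptional
+path:
+.PP
+.Vb 16
+\& char *buf;
+\& dXCPT;                  /* set up necessary variables */
+\&
+\& Newx(buf, 1024, char);
+\&
+\& XCPT_TRY_START {
+\&     fill_buffer(buf, 1024);
+\& } XCPT_TRY_END
+\&
+\& XCPT_CATCH
+\& {
+\&     Safefree(buf);      /* cleanup when an exception was caught */
+\&     XCPT_RETHROW;
+\& }
+\&
+\& Safefree(buf);          /* cleanup on the normal path */
+.Ve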
+.SS "Source Documentation"
+.IX Subsection "Source Documentation"
+There's an effort going on to document the internal functions and
+automatically produce reference manuals from them \-\- perlapi is one
+such manual which details all the functions which are available to XS
+writers. perlintern is the autogenerated manual for the functions
+which are not part of the API and are supposedly for internal use only.
+.PP
+Source documentation is created by putting POD comments into the C
+source, like this:
+.PP
+.Vb 2
+\& /*
+\& =for apidoc sv_setiv
+\&
+\& Copies an integer into the given SV. Does not handle \*(Aqset\*(Aq magic. See
+\& L<perlapi/sv_setiv_mg>.
+\&
+\& =cut
+\& */
+.Ve
+.PP
+Please try and supply some documentation if you add functions to the
+Perl core.
+.SS "Backwards compatibility"
+.IX Subsection "Backwards compatibility"
+The Perl API changes over time. New functions are
+added or the interfaces of existing functions are
+changed. The \f(CW\*(C`Devel::PPPort\*(C'\fR module tries to
+provide compatibility code for some of these changes, so XS writers don't
+have to code it themselves when supporting multiple versions of Perl.
+.PP
+\&\f(CW\*(C`Devel::PPPort\*(C'\fR generates a C header file \fIppport.h\fR that can also
+be run as a Perl script. To generate \fIppport.h\fR, run:
+.PP
+.Vb 1
+\& perl \-MDevel::PPPort \-eDevel::PPPort::WriteFile
+.Ve
+.PP
+Besides checking existing XS code, the script can also be used to retrieve
+compatibility information for various API calls using the \f(CW\*(C`\-\-api\-info\*(C'\fR
+command line switch. For example:
+.PP
+.Vb 1
+\& % perl ppport.h \-\-api\-info=sv_magicext
+.Ve
+.PP
+For details, see \f(CW\*(C`perldoc\ ppport.h\*(C'\fR.
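+.PP
+The generated header is then included after the usual Perl headers. A
+typical arrangement, assuming \fIppport.h\fR sits next to the XS file, is:
+.PP
+.Vb 4
+\& #include "EXTERN.h"
+\& #include "perl.h"
+\& #include "XSUB.h"
+\& #include "ppport.h"
+.Ve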
+.SH "Unicode Support"
+.IX Header "Unicode Support"
+Perl 5.6.0 introduced Unicode support. It's important for porters and XS
+writers to understand this support and make sure that the code they
+write does not corrupt Unicode data.
+.SS "What \fBis\fP Unicode, anyway?"
+.IX Subsection "What is Unicode, anyway?"
+In the olden, less enlightened times, we all used to use ASCII. Most of
+us did, anyway. The big problem with ASCII is that it's American. Well,
+no, that's not actually the problem; the problem is that it's not
+particularly useful for people who don't use the Roman alphabet. What
+used to happen was that particular languages would stick their own
+alphabet in the upper range of the sequence, between 128 and 255. Of
+course, we then ended up with plenty of variants that weren't quite
+ASCII, and the whole point of it being a standard was lost.
+.PP
+Worse still, if you've got a language like Chinese or
+Japanese that has hundreds or thousands of characters, then you really
+can't fit them into a mere 256, so they had to forget about ASCII
+altogether, and build their own systems using pairs of numbers to refer
+to one character.
+.PP
+To fix this, some people formed Unicode, Inc. and
+produced a new character set containing all the characters you can
+possibly think of and more. There are several ways of representing these
+characters, and the one Perl uses is called UTF\-8. UTF\-8 uses
+a variable number of bytes to represent a character. You can learn more
+about Unicode and Perl's Unicode model in perlunicode.
+.PP
+(On EBCDIC platforms, Perl uses instead UTF-EBCDIC, which is a form of
+UTF\-8 adapted for EBCDIC platforms. Below, we just talk about UTF\-8.
+UTF-EBCDIC is like UTF\-8, but the details are different. The macros
+hide the differences from you, just remember that the particular numbers
+and bit patterns presented below will differ in UTF-EBCDIC.)
+.SS "How can I recognise a UTF\-8 string?"
+.IX Subsection "How can I recognise a UTF-8 string?"
+You can't. This is because UTF\-8 data is stored in bytes just like
+non\-UTF\-8 data. The Unicode character 200 (\f(CW0xC8\fR for you hex types),
+capital E with a grave accent, is represented by the two bytes
+\&\f(CW\*(C`v195.136\*(C'\fR. Unfortunately, the non-Unicode string \f(CW\*(C`chr(195).chr(136)\*(C'\fR
+has that byte sequence as well. So you can't tell just by looking \-\- this
+is what makes Unicode input an interesting problem.
+.PP
+In general, you either have to know what you're dealing with, or you
+have to guess. The API function \f(CW\*(C`is_utf8_string\*(C'\fR can help; it'll tell
+you if a string contains only valid UTF\-8 characters, and the chances
+of a non\-UTF\-8 string looking like valid UTF\-8 become very small very
+quickly with increasing string length. On a character-by-character
+basis, \f(CW\*(C`isUTF8_CHAR\*(C'\fR
+will tell you whether the current character in a string is valid UTF\-8.
+.SS "How does UTF\-8 represent Unicode characters?"
+.IX Subsection "How does UTF-8 represent Unicode characters?"
+As mentioned above, UTF\-8 uses a variable number of bytes to store a
+character. Characters with values 0...127 are stored in one
+byte, just like good ol' ASCII. Character 128 is stored as
+\&\f(CW\*(C`v194.128\*(C'\fR; this continues up to character 191, which is
+\&\f(CW\*(C`v194.191\*(C'\fR. Now we've run out of bits (191 is binary
+\&\f(CW10111111\fR) so we move on; character 192 is \f(CW\*(C`v195.128\*(C'\fR. And
+so it goes on, moving to three bytes at character 2048.
+"Unicode Encodings" in perlunicode has pictures of how this works.
+.PP
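+The byte values quoted above can be checked by hand. In the two-byte form
+the bits of the code point fill the \f(CW\*(C`x\*(C'\fR positions of
+\&\f(CW\*(C`110xxxxx 10xxxxxx\*(C'\fR, and in the three-byte form those of
+\&\f(CW\*(C`1110xxxx 10xxxxxx 10xxxxxx\*(C'\fR:
+.PP
+.Vb 3
+\& 128  = 00010 000000        \-> 11000010 10000000           = 194.128
+\& 192  = 00011 000000        \-> 11000011 10000000           = 195.128
+\& 2048 = 0000 100000 000000  \-> 11100000 10100000 10000000  = 224.160.128
+.Ve
+.PP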
+Assuming you know you're dealing with a UTF\-8 string, you can find out
+how long the first character in it is with the \f(CW\*(C`UTF8SKIP\*(C'\fR macro:
+.PP
+.Vb 2
+\& char *utf = "\e305\e233\e340\e240\e201";
+\& I32 len;
+\&
+\& len = UTF8SKIP(utf); /* len is 2 here */
+\& utf += len;
+\& len = UTF8SKIP(utf); /* len is 3 here */
+.Ve
+.PP
+Another way to skip over characters in a UTF\-8 string is to use
+\&\f(CW\*(C`utf8_hop\*(C'\fR, which takes a string and a number of characters to skip
+over. You're on your own about bounds checking, though, so don't use it
+lightly.
+.PP
+All bytes in a multi-byte UTF\-8 character will have the high bit set,
+so you can test if you need to do something special with this
+character like this (the \f(CWUTF8_IS_INVARIANT()\fR is a macro that tests
+whether the byte is encoded as a single byte even in UTF\-8):
+.PP
+.Vb 7
+\& U8 *utf; /* Initialize this to point to the beginning of the
+\& sequence to convert */
+\& U8 *utf_end; /* Initialize this to 1 beyond the end of the sequence
+\& pointed to by \*(Aqutf\*(Aq */
+\& UV uv; /* Returned code point; note: a UV, not a U8, not a
+\& char */
+\& STRLEN len; /* Returned length of character in bytes */
+\&
+\& if (!UTF8_IS_INVARIANT(*utf))
+\& /* Must treat this as UTF\-8 */
+\& uv = utf8_to_uvchr_buf(utf, utf_end, &len);
+\& else
+\& /* OK to treat this character as a byte */
+\& uv = *utf;
+.Ve
+.PP
+You can also see in that example that we use \f(CW\*(C`utf8_to_uvchr_buf\*(C'\fR to get the
+value of the character; the inverse function \f(CW\*(C`uvchr_to_utf8\*(C'\fR is available
+for putting a UV into UTF\-8:
+.PP
+.Vb 6
+\& if (!UVCHR_IS_INVARIANT(uv))
+\& /* Must treat this as UTF8 */
+\& utf8 = uvchr_to_utf8(utf8, uv);
+\& else
+\& /* OK to treat this character as a byte */
+\& *utf8++ = uv;
+.Ve
+.PP
+You \fBmust\fR convert characters to UVs using the above functions if
+you're ever in a situation where you have to match UTF\-8 and non\-UTF\-8
+characters. You may not skip over UTF\-8 characters in this case. If you
+do this, you'll lose the ability to match hi-bit non\-UTF\-8 characters;
+for instance, if your UTF\-8 string contains \f(CW\*(C`v195.136\*(C'\fR, and you skip
+that character, you can never match a \f(CWchr(200)\fR in a non\-UTF\-8 string.
+So don't do that!
+.PP
+(Note that we don't have to test for invariant characters in the
+examples above. The functions work on any well-formed UTF\-8 input.
+It's just that it's faster to avoid the function overhead when it's not
+needed.)
+.SS "How does Perl store UTF\-8 strings?"
+.IX Subsection "How does Perl store UTF-8 strings?"
+Currently, Perl deals with UTF\-8 strings and non\-UTF\-8 strings
+slightly differently. A flag in the SV, \f(CW\*(C`SVf_UTF8\*(C'\fR, indicates that the
+string is internally encoded as UTF\-8. Without it, the byte value is the
+codepoint number and vice versa. This flag is only meaningful if the SV
+is \f(CW\*(C`SvPOK\*(C'\fR or immediately after stringification via \f(CW\*(C`SvPV\*(C'\fR or a
+similar macro. You can check and manipulate this flag with the
+following macros:
+.PP
+.Vb 3
+\& SvUTF8(sv)
+\& SvUTF8_on(sv)
+\& SvUTF8_off(sv)
+.Ve
+.PP
+This flag has an important effect on Perl's treatment of the string: if
+UTF\-8 data is not properly distinguished, regular expressions,
+\&\f(CW\*(C`length\*(C'\fR, \f(CW\*(C`substr\*(C'\fR and other string handling operations will have
+undesirable (wrong) results.
+.PP
+The problem comes when you have, for instance, a string that isn't
+flagged as UTF\-8, and contains a byte sequence that could be UTF\-8 \-\-
+especially when combining non\-UTF\-8 and UTF\-8 strings.
+.PP
+Never forget that the \f(CW\*(C`SVf_UTF8\*(C'\fR flag is separate from the PV value; you
+need to be sure you don't accidentally knock it off while you're
+manipulating SVs. More specifically, you cannot expect to do this:
+.PP
+.Vb 4
+\& SV *sv;
+\& SV *nsv;
+\& STRLEN len;
+\& char *p;
+\&
+\& p = SvPV(sv, len);
+\& frobnicate(p);
+\& nsv = newSVpvn(p, len);
+.Ve
+.PP
+The \f(CW\*(C`char*\*(C'\fR string does not tell you the whole story, and you can't
+copy or reconstruct an SV just by copying the string value. Check if the
+old SV has the UTF8 flag set (\fIafter\fR the \f(CW\*(C`SvPV\*(C'\fR call), and act
+accordingly:
+.PP
+.Vb 6
+\& p = SvPV(sv, len);
+\& is_utf8 = SvUTF8(sv);
+\& frobnicate(p, is_utf8);
+\& nsv = newSVpvn(p, len);
+\& if (is_utf8)
+\& SvUTF8_on(nsv);
+.Ve
+.PP
+In the above, your \f(CW\*(C`frobnicate\*(C'\fR function has been changed to be made
+aware of whether or not it's dealing with UTF\-8 data, so that it can
+handle the string appropriately.
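+.PP
+The creation of the new SV and the setting of its flag can also be folded
+into one call. A sketch using \f(CW\*(C`newSVpvn_flags\*(C'\fR, with the same variables
+as in the example above:
+.PP
+.Vb 1
+\& nsv = newSVpvn_flags(p, len, is_utf8 ? SVf_UTF8 : 0);
+.Ve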
+.PP
+Since just passing an SV to an XS function and copying the data of
+the SV is not enough to copy the UTF8 flags, even less right is just
+passing a \f(CW\*(C`char\ *\*(C'\fR to an XS function.
+.PP
+For full generality, use the \f(CW\*(C`DO_UTF8\*(C'\fR macro to see if the
+string in an SV is to be \fItreated\fR as UTF\-8. This takes into account
+if the call to the XS function is being made from within the scope of
+\&\f(CW\*(C`use\ bytes\*(C'\fR. If so, the underlying bytes that comprise the
+UTF\-8 string are to be exposed, rather than the character they
+represent. But this pragma should only really be used for debugging and
+perhaps low-level testing at the byte level. Hence most XS code need
+not concern itself with this, but various areas of the perl core do need
+to support it.
+.PP
+And this isn't the whole story. Starting in Perl v5.12, strings that
+aren't encoded in UTF\-8 may also be treated as Unicode under various
+conditions (see "ASCII Rules versus Unicode Rules" in perlunicode).
+This is only really a problem for characters whose ordinals are between
+128 and 255, and their behavior varies under ASCII versus Unicode rules
+in ways that your code cares about (see "The "Unicode Bug"" in perlunicode).
+There is no published API for dealing with this, as it is subject to
+change, but you can look at the code for \f(CW\*(C`pp_lc\*(C'\fR in \fIpp.c\fR for an
+example as to how it's currently done.
+.SS "How do I pass a Perl string to a C library?"
+.IX Subsection "How do I pass a Perl string to a C library?"
+A Perl string, conceptually, is an opaque sequence of code points.
+Many C libraries expect their inputs to be "classical" C strings, which are
+arrays of octets 1\-255, terminated with a NUL byte. Your job when writing
+an interface between Perl and a C library is to define the mapping between
+Perl and that library.
+.PP
+Generally speaking, \f(CW\*(C`SvPVbyte\*(C'\fR and related macros suit this task well.
+These assume that your Perl string is a "byte string", i.e., is either
+raw, undecoded input into Perl or is pre-encoded to, e.g., UTF\-8.
+.PP
+Alternatively, if your C library expects UTF\-8 text, you can use
+\&\f(CW\*(C`SvPVutf8\*(C'\fR and related macros. This has the same effect as encoding
+to UTF\-8 then calling the corresponding \f(CW\*(C`SvPVbyte\*(C'\fR\-related macro.
+.PP
+Some C libraries may expect other encodings (e.g., UTF\-16LE). To give
+Perl strings to such libraries
+you must either do that encoding in Perl then use \f(CW\*(C`SvPVbyte\*(C'\fR, or
+use an intermediary C library to convert from however Perl stores the
+string to the desired encoding.
+.PP
+Take care also that NULs in your Perl string don't confuse the C
+library. If possible, give the string's length to the C library; if that's
+not possible, consider rejecting strings that contain NUL bytes.
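+.PP
+A sketch of such an interface, where \f(CW\*(C`c_library_call()\*(C'\fR stands in for a
+hypothetical library function that takes a NUL-terminated byte string:
+.PP
+.Vb 8
+\& STRLEN len;
+\& const char *bytes = SvPVbyte(sv, len);   /* croaks on wide characters */
+\&
+\& /* reject embedded NULs, which the C library would read as the end */
+\& if (memchr(bytes, 0, len) != NULL)
+\&     Perl_croak(aTHX_ "String contains a NUL byte");
+\&
+\& c_library_call(bytes);
+.Ve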
+.PP
+\fIWhat about \fR\f(CI\*(C`SvPV\*(C'\fR\fI, \fR\f(CI\*(C`SvPV_nolen\*(C'\fR\fI, etc.?\fR
+.IX Subsection "What about SvPV, SvPV_nolen, etc.?"
+.PP
+Consider a 3\-character Perl string \f(CW\*(C`$foo = "\ex64\ex78\ex8c"\*(C'\fR.
+Perl can store these 3 characters either of two ways:
+.IP \(bu 4
+bytes: 0x64 0x78 0x8c
+.IP \(bu 4
+UTF\-8: 0x64 0x78 0xc2 0x8c
+.PP
+Now let's say you convert \f(CW$foo\fR to a C string thus:
+.PP
+.Vb 2
+\& STRLEN strlen;
+\& char *str = SvPV(foo_sv, strlen);
+.Ve
+.PP
+At this point \f(CW\*(C`str\*(C'\fR could point to a 3\-byte C string or a 4\-byte one.
+.PP
+Generally speaking, we want \f(CW\*(C`str\*(C'\fR to be the same regardless of how
+Perl stores \f(CW$foo\fR, so the ambiguity here is undesirable. \f(CW\*(C`SvPVbyte\*(C'\fR
+and \f(CW\*(C`SvPVutf8\*(C'\fR solve that by giving predictable output: use
+\&\f(CW\*(C`SvPVbyte\*(C'\fR if your C library expects byte strings, or \f(CW\*(C`SvPVutf8\*(C'\fR
+if it expects UTF\-8.
+.PP
+If your C library happens to support both encodings, then \f(CW\*(C`SvPV\*(C'\fR\-\-always
+in tandem with lookups to \f(CW\*(C`SvUTF8\*(C'\fR!\-\-may be safe and (slightly) more
+efficient.
+.PP
+\&\fBTESTING\fR \fBTIP:\fR Use utf8's \f(CW\*(C`upgrade\*(C'\fR and \f(CW\*(C`downgrade\*(C'\fR functions
+in your tests to ensure consistent handling regardless of Perl's
+internal encoding.
+.SS "How do I convert a string to UTF\-8?"
+.IX Subsection "How do I convert a string to UTF-8?"
+If you're mixing UTF\-8 and non\-UTF\-8 strings, it is necessary to upgrade
+the non\-UTF\-8 strings to UTF\-8. If you've got an SV, the easiest way to do
+this is:
+.PP
+.Vb 1
+\& sv_utf8_upgrade(sv);
+.Ve
+.PP
+However, you must not do this, for example:
+.PP
+.Vb 2
+\& if (!SvUTF8(left))
+\& sv_utf8_upgrade(left);
+.Ve
+.PP
+If you do this in a binary operator, you will actually change one of the
+strings that came into the operator, and, while it shouldn't be noticeable
+by the end user, it can cause problems in deficient code.
+.PP
+Instead, \f(CW\*(C`bytes_to_utf8\*(C'\fR will give you a UTF\-8\-encoded \fBcopy\fR of its
+string argument. This is useful for having the data available for
+comparisons and so on, without harming the original SV. There's also
+\&\f(CW\*(C`utf8_to_bytes\*(C'\fR to go the other way, but naturally, this will fail if
+the string contains any characters above 255 that can't be represented
+in a single byte.
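+.PP
+A sketch of that approach, assuming \f(CW\*(C`left\*(C'\fR is an SV that is not flagged
+as UTF\-8:
+.PP
+.Vb 7
+\& STRLEN len;
+\& const U8 *bytes = (const U8 *)SvPV_const(left, len);
+\& U8 *copy = bytes_to_utf8(bytes, &len);  /* len is now the UTF\-8 length */
+\&
+\& ... compare or combine using copy and len ...
+\&
+\& Safefree(copy);                         /* the copy is ours to free */
+.Ve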
+.SS "How do I compare strings?"
+.IX Subsection "How do I compare strings?"
+"sv_cmp" in perlapi and "sv_cmp_flags" in perlapi do a lexicographic
+comparison of two SVs, and handle UTF\-8ness properly. Note, however,
+that Unicode specifies a much fancier mechanism for collation, available
+via the Unicode::Collate module.
+.PP
+To just compare two strings for equality/non\-equality, you can just use
+\&\f(CWmemEQ()\fR and \f(CWmemNE()\fR as usual,
+except the strings must be both UTF\-8 or not UTF\-8 encoded.
+.PP
+To compare two strings case-insensitively, use
+\&\f(CWfoldEQ_utf8()\fR (the strings don't have to have
+the same UTF\-8ness).
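+.PP
+A sketch of a case-insensitive comparison of two SVs, here called
+\&\f(CW\*(C`sv1\*(C'\fR and \f(CW\*(C`sv2\*(C'\fR for illustration, passing \f(CW\*(C`NULL\*(C'\fR for the two
+end-pointer arguments because the whole of each string is to be compared:
+.PP
+.Vb 9
+\& STRLEN l1, l2;
+\& const char *p1 = SvPV_const(sv1, l1);
+\& const char *p2 = SvPV_const(sv2, l2);
+\&
+\& if (foldEQ_utf8(p1, NULL, l1, cBOOL(SvUTF8(sv1)),
+\&                 p2, NULL, l2, cBOOL(SvUTF8(sv2))))
+\& {
+\&     /* the strings are equal, ignoring case */
+\& }
+.Ve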
+.SS "Is there anything else I need to know?"
+.IX Subsection "Is there anything else I need to know?"
+Not really. Just remember these things:
+.IP \(bu 3
+There's no way to tell if a \f(CW\*(C`char\ *\*(C'\fR or \f(CW\*(C`U8\ *\*(C'\fR string is UTF\-8
+or not. But you can tell if an SV is to be treated as UTF\-8 by calling
+\&\f(CW\*(C`DO_UTF8\*(C'\fR on it, after stringifying it with \f(CW\*(C`SvPV\*(C'\fR or a similar
+macro. And, you can tell if SV is actually UTF\-8 (even if it is not to
+be treated as such) by looking at its \f(CW\*(C`SvUTF8\*(C'\fR flag (again after
+stringifying it). Don't forget to set the flag if something should be
+UTF\-8.
+Treat the flag as part of the PV, even though it's not \-\- if you pass on
+the PV to somewhere, pass on the flag too.
+.IP \(bu 3
+If a string is UTF\-8, \fBalways\fR use \f(CW\*(C`utf8_to_uvchr_buf\*(C'\fR to get at the value,
+unless \f(CWUTF8_IS_INVARIANT(*s)\fR in which case you can use \f(CW*s\fR.
+.IP \(bu 3
+When writing a character UV to a UTF\-8 string, \fBalways\fR use
+\&\f(CW\*(C`uvchr_to_utf8\*(C'\fR, unless \f(CW\*(C`UVCHR_IS_INVARIANT(uv)\*(C'\fR in which case
+you can use \f(CW\*(C`*s = uv\*(C'\fR.
+.IP \(bu 3
+Mixing UTF\-8 and non\-UTF\-8 strings is
+tricky. Use \f(CW\*(C`bytes_to_utf8\*(C'\fR to get
+a new string which is UTF\-8 encoded, and then combine them.
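+.PP
+The invariant shortcut mentioned in the list above can be illustrated with a
+toy reader and writer. This is only a sketch under stated assumptions: it
+covers code points up to 0x7FF on an ASCII platform, does not reject overlong
+sequences, and the names \f(CW\*(C`toy_is_invariant\*(C'\fR, \f(CW\*(C`toy_write\*(C'\fR and \f(CW\*(C`toy_read\*(C'\fR are
+hypothetical. Real code should use \f(CW\*(C`uvchr_to_utf8\*(C'\fR and
+\&\f(CW\*(C`utf8_to_uvchr_buf\*(C'\fR, which handle the full range.

```c
#include <assert.h>
#include <stddef.h>

/* On ASCII platforms, code points below 0x80 are encoded as themselves. */
static int
toy_is_invariant(unsigned long uv)
{
    return uv < 0x80;
}

/* Write one code point (at most 0x7FF) and return a pointer just past the
 * bytes written.  Invariants take the "*s = uv" shortcut. */
static unsigned char *
toy_write(unsigned char *d, unsigned long uv)
{
    assert(uv <= 0x7FF);
    if (toy_is_invariant(uv)) {
        *d++ = (unsigned char)uv;
    } else {
        *d++ = (unsigned char)(0xC0 | (uv >> 6));
        *d++ = (unsigned char)(0x80 | (uv & 0x3F));
    }
    return d;
}

/* Read one code point without running past 'end'.  Stores the number of
 * bytes consumed in *len; *len is 0 if the input is truncated or is not a
 * one- or two-byte sequence. */
static unsigned long
toy_read(const unsigned char *s, const unsigned char *end, size_t *len)
{
    *len = 0;
    if (s >= end)
        return 0;
    if (*s < 0x80) {                     /* invariant: the byte is the value */
        *len = 1;
        return *s;
    }
    if ((*s & 0xE0) == 0xC0 && end - s >= 2 && (s[1] & 0xC0) == 0x80) {
        *len = 2;
        return ((unsigned long)(*s & 0x1F) << 6) | (unsigned long)(s[1] & 0x3F);
    }
    return 0;
}

/* Round-trips 'A' (invariant) and U+00E9 (two bytes); returns 1 on success. */
static int
toy_utf8_demo(void)
{
    unsigned char buf[8];
    unsigned char *end = toy_write(toy_write(buf, 'A'), 0xE9);
    size_t l1, l2;
    unsigned long c1 = toy_read(buf, end, &l1);
    unsigned long c2 = toy_read(buf + l1, end, &l2);

    return c1 == 'A' && l1 == 1 && c2 == 0xE9 && l2 == 2 && end - buf == 3;
}
```

+.PP
+The reader takes an explicit end pointer for the same reason that the
+\&\f(CW\*(C`_buf\*(C'\fR form is the one recommended above: a truncated sequence must not cause
+a read past the end of the buffer.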
+.SH "Custom Operators"
+.IX Header "Custom Operators"
+Custom operator support is an experimental feature that allows you to
+define your own ops. This is primarily to allow the building of
+interpreters for other languages in the Perl core, but it also allows
+optimizations through the creation of "macro-ops" (ops which perform the
+functions of multiple ops which are usually executed together, such as
+\&\f(CW\*(C`gvsv, gvsv, add\*(C'\fR).
+.PP
+This feature is implemented as a new op type, \f(CW\*(C`OP_CUSTOM\*(C'\fR. The Perl
+core does not "know" anything special about this op type, and so it will
+not be involved in any optimizations. This also means that you can
+define your custom ops to be any op structure \-\- unary, binary, list and
+so on \-\- you like.
+.PP
+It's important to know what custom operators won't do for you. They
+won't let you add new syntax to Perl, directly. They won't even let you
+add new keywords, directly. In fact, they won't change the way Perl
+compiles a program at all. You have to do those changes yourself, after
+Perl has compiled the program. You do this either by manipulating the op
+tree using a \f(CW\*(C`CHECK\*(C'\fR block and the \f(CW\*(C`B::Generate\*(C'\fR module, or by adding
+a custom peephole optimizer with the \f(CW\*(C`optimize\*(C'\fR module.
+.PP
+When you do this, you replace ordinary Perl ops with custom ops by
+creating ops with the type \f(CW\*(C`OP_CUSTOM\*(C'\fR and the \f(CW\*(C`op_ppaddr\*(C'\fR of your own
+PP function. This should be defined in XS code, and should look like
+the PP ops in \f(CW\*(C`pp_*.c\*(C'\fR. You are responsible for ensuring that your op
+takes the appropriate number of values from the stack, and you are
+responsible for adding stack marks if necessary.
+.PP
+You should also "register" your op with the Perl interpreter so that it
+can produce sensible error and warning messages. Since it is possible to
+have multiple custom ops within the one "logical" op type \f(CW\*(C`OP_CUSTOM\*(C'\fR,
+Perl uses the value of \f(CW\*(C`o\->op_ppaddr\*(C'\fR to determine which custom op
+it is dealing with. You should create an \f(CW\*(C`XOP\*(C'\fR structure for each
+ppaddr you use, set the properties of the custom op with
+\&\f(CW\*(C`XopENTRY_set\*(C'\fR, and register the structure against the ppaddr using
+\&\f(CW\*(C`Perl_custom_op_register\*(C'\fR. A trivial example might look like:
+.PP
+.Vb 2
+\& static XOP my_xop;
+\& static OP *my_pp(pTHX);
+\&
+\& BOOT:
+\& XopENTRY_set(&my_xop, xop_name, "myxop");
+\& XopENTRY_set(&my_xop, xop_desc, "Useless custom op");
+\& Perl_custom_op_register(aTHX_ my_pp, &my_xop);
+.Ve
+.PP
+The available fields in the structure are:
+.IP xop_name 4
+.IX Item "xop_name"
+A short name for your op. This will be included in some error messages,
+and will also be returned as \f(CW\*(C`$op\->name\*(C'\fR by the B module, so
+it will appear in the output of modules like B::Concise.
+.IP xop_desc 4
+.IX Item "xop_desc"
+A short description of the function of the op.
+.IP xop_class 4
+.IX Item "xop_class"
+Which of the various \f(CW*OP\fR structures this op uses. This should be one of
+the \f(CW\*(C`OA_*\*(C'\fR constants from \fIop.h\fR, namely
+.RS 4
+.IP OA_BASEOP 4
+.IX Item "OA_BASEOP"
+.PD 0
+.IP OA_UNOP 4
+.IX Item "OA_UNOP"
+.IP OA_BINOP 4
+.IX Item "OA_BINOP"
+.IP OA_LOGOP 4
+.IX Item "OA_LOGOP"
+.IP OA_LISTOP 4
+.IX Item "OA_LISTOP"
+.IP OA_PMOP 4
+.IX Item "OA_PMOP"
+.IP OA_SVOP 4
+.IX Item "OA_SVOP"
+.IP OA_PADOP 4
+.IX Item "OA_PADOP"
+.IP OA_PVOP_OR_SVOP 4
+.IX Item "OA_PVOP_OR_SVOP"
+.PD
+This should be interpreted as '\f(CW\*(C`PVOP\*(C'\fR' only. The \f(CW\*(C`_OR_SVOP\*(C'\fR is because
+the only core \f(CW\*(C`PVOP\*(C'\fR, \f(CW\*(C`OP_TRANS\*(C'\fR, can sometimes be a \f(CW\*(C`SVOP\*(C'\fR instead.
+.IP OA_LOOP 4
+.IX Item "OA_LOOP"
+.PD 0
+.IP OA_COP 4
+.IX Item "OA_COP"
+.RE
+.RS 4
+.PD
+.Sp
+The other \f(CW\*(C`OA_*\*(C'\fR constants should not be used.
+.RE
+.IP xop_peep 4
+.IX Item "xop_peep"
+This member is of type \f(CW\*(C`Perl_cpeep_t\*(C'\fR, which expands to \f(CW\*(C`void
+(*Perl_cpeep_t)(pTHX_ OP *o, OP *oldop)\*(C'\fR. If it is set, this function
+will be called from \f(CW\*(C`Perl_rpeep\*(C'\fR when ops of this type are encountered
+by the peephole optimizer. \fIo\fR is the OP that needs optimizing;
+\&\fIoldop\fR is the previous OP optimized, whose \f(CW\*(C`op_next\*(C'\fR points to \fIo\fR.
+.PP
+\&\f(CW\*(C`B::Generate\*(C'\fR directly supports the creation of custom ops by name.
+.SH Stacks
+.IX Header "Stacks"
+Descriptions above occasionally refer to "the stack", but there are in fact
+many stack-like data structures within the perl interpreter. When otherwise
+unqualified, "the stack" usually refers to the value stack.
+.PP
+The various stacks have different purposes, and operate in slightly different
+ways. Their differences are noted below.
+.SS "Value Stack"
+.IX Subsection "Value Stack"
+This stack stores the values that regular perl code is operating on, usually
+intermediate values of expressions within a statement. The stack itself is
+formed of an array of SV pointers.
+.PP
+The base of this stack is pointed to by the interpreter variable
+\&\f(CW\*(C`PL_stack_base\*(C'\fR, of type \f(CW\*(C`SV **\*(C'\fR.
+.PP
+The head of the stack is \f(CW\*(C`PL_stack_sp\*(C'\fR, and points to the most
+recently-pushed item.
+.PP
+Items are pushed to the stack by using the \f(CWPUSHs()\fR macro or its variants
+described above; \f(CWXPUSHs()\fR, \f(CWmPUSHs()\fR, \f(CWmXPUSHs()\fR and the typed
+versions. Note carefully that the non\-\f(CW\*(C`X\*(C'\fR versions of these macros do not
+check the size of the stack and assume it to be big enough. These must be
+paired with a suitable check of the stack's size, such as the \f(CW\*(C`EXTEND\*(C'\fR macro
+to ensure it is large enough. For example
+.PP
+.Vb 5
+\& EXTEND(SP, 4);
+\& mPUSHi(10);
+\& mPUSHi(20);
+\& mPUSHi(30);
+\& mPUSHi(40);
+.Ve
+.PP
+This is slightly more performant than making four separate checks in four
+separate \f(CWmXPUSHi()\fR calls.
+.PP
+As a further performance optimisation, the various \f(CW\*(C`PUSH\*(C'\fR macros all operate
+using a local variable \f(CW\*(C`SP\*(C'\fR, rather than the interpreter-global variable
+\&\f(CW\*(C`PL_stack_sp\*(C'\fR. This variable is declared by the \f(CW\*(C`dSP\*(C'\fR macro \- though it is
+normally implied by XSUBs and similar so it is rare you have to consider it
+directly. Once declared, the \f(CW\*(C`PUSH\*(C'\fR macros will operate only on this local
+variable, so before invoking any other perl core functions you must use the
+\&\f(CW\*(C`PUTBACK\*(C'\fR macro to return the value from the local \f(CW\*(C`SP\*(C'\fR variable back to
+the interpreter variable. Similarly, after calling a perl core function which
+may have had reason to move the stack or push/pop values to it, you must use
+the \f(CW\*(C`SPAGAIN\*(C'\fR macro which refreshes the local \f(CW\*(C`SP\*(C'\fR value back from the
+interpreter one.
+.PP
+Items are popped from the stack by using the \f(CW\*(C`POPs\*(C'\fR macro or its typed
+versions. There is also a macro \f(CW\*(C`TOPs\*(C'\fR that inspects the topmost item without
+removing it.
+.PP
+Note specifically that SV pointers on the value stack do not contribute to the
+overall reference count of the xVs being referred to. If newly-created xVs are
+being pushed to the stack you must arrange for them to be destroyed at a
+suitable time; usually by using one of the \f(CW\*(C`mPUSH*\*(C'\fR macros or \f(CWsv_2mortal()\fR
+to mortalise the xV.
+.SS "Mark Stack"
+.IX Subsection "Mark Stack"
+The value stack stores individual perl scalar values as temporaries between
+expressions. Some perl expressions operate on entire lists; for that purpose
+we need to know where on the stack each list begins. This is the purpose of the
+mark stack.
+.PP
+The mark stack stores integers as I32 values, which are the height of the
+value stack at the time before the list began; thus the mark itself actually
+points to the value stack entry one before the list. The list itself starts at
+\&\f(CW\*(C`mark + 1\*(C'\fR.
+.PP
+The base of this stack is pointed to by the interpreter variable
+\&\f(CW\*(C`PL_markstack\*(C'\fR, of type \f(CW\*(C`I32 *\*(C'\fR.
+.PP
+The head of the stack is \f(CW\*(C`PL_markstack_ptr\*(C'\fR, and points to the most
+recently-pushed item.
+.PP
+Items are pushed to the stack by using the \f(CWPUSHMARK()\fR macro. Even though
+the stack itself stores (value) stack indices as integers, the \f(CW\*(C`PUSHMARK\*(C'\fR
+macro should be given a stack pointer directly; it will calculate the index
+offset by comparing to the \f(CW\*(C`PL_stack_base\*(C'\fR variable. Thus almost always the
+code to perform this is
+.PP
+.Vb 1
+\& PUSHMARK(SP);
+.Ve
+.PP
+Items are popped from the stack by the \f(CW\*(C`POPMARK\*(C'\fR macro. There is also a macro
+\&\f(CW\*(C`TOPMARK\*(C'\fR that inspects the topmost item without removing it. These macros
+return I32 index values directly. There is also the \f(CW\*(C`dMARK\*(C'\fR macro which
+declares a new SV double-pointer variable, called \f(CW\*(C`mark\*(C'\fR, which points at the
+marked stack slot; this is the usual macro that C code will use when operating
+on lists given on the stack.
+.PP
+As noted above, the \f(CW\*(C`mark\*(C'\fR variable itself will point at the most recently
+pushed value on the value stack before the list begins, and so the list itself
+starts at \f(CW\*(C`mark + 1\*(C'\fR. The values of the list may be iterated by code such as
+.PP
+.Vb 4
+\& for(SV **svp = mark + 1; svp <= PL_stack_sp; svp++) {
+\& SV *item = *svp;
+\& ...
+\& }
+.Ve
+.PP
+Note in particular that if the list is empty, \f(CW\*(C`mark\*(C'\fR will
+equal \f(CW\*(C`PL_stack_sp\*(C'\fR, so the loop body never runs.
+.PP
+Because the \f(CW\*(C`mark\*(C'\fR variable is converted to a pointer on the value stack,
+extra care must be taken if \f(CW\*(C`EXTEND\*(C'\fR or any of the \f(CW\*(C`XPUSH\*(C'\fR macros are
+invoked within the function, because the stack may need to be moved to
+extend it and so the existing pointer will now be invalid. If this may be a
+problem, a possible solution is to track the mark offset as an integer and
+recompute the mark pointer later on, after the stack has been moved.
+.PP
+.Vb 1
+\& I32 markoff = POPMARK;
+\&
+\& ...
+\&
+\& SV **mark = PL_stack_base + markoff;
+.Ve
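+.PP
+Why an offset survives a stack move while a pointer does not can be shown with
+a toy stack of integers. The sketch below is a model only, not perl's
+implementation: \f(CW\*(C`ToyStack\*(C'\fR, \f(CW\*(C`toy_extend\*(C'\fR and \f(CW\*(C`toy_push\*(C'\fR are hypothetical
+names that mimic the roles of the value stack, \f(CW\*(C`EXTEND\*(C'\fR and the non\-\f(CW\*(C`X\*(C'\fR push
+macros. Slot 0 is left unused in this toy so that the stack pointer can refer
+to "the most recently-pushed item" even when nothing has been pushed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy value stack; every name here is hypothetical. */
typedef struct {
    int   *base;   /* plays the role of PL_stack_base */
    int   *sp;     /* points at the most recently pushed item */
    size_t cap;    /* number of slots allocated */
} ToyStack;

static void
toy_init(ToyStack *st, size_t cap)
{
    assert(cap >= 1);
    st->base = malloc(cap * sizeof *st->base);
    assert(st->base != NULL);
    st->cap = cap;
    st->base[0] = 0;            /* unused slot below the first item */
    st->sp = st->base;
}

/* Like EXTEND(): make room for n more items, possibly moving the array. */
static void
toy_extend(ToyStack *st, size_t n)
{
    size_t used = (size_t)(st->sp - st->base) + 1;

    if (used + n > st->cap) {
        size_t off = (size_t)(st->sp - st->base);
        int *nb;

        st->cap = (used + n) * 2;
        nb = realloc(st->base, st->cap * sizeof *nb);
        assert(nb != NULL);
        st->base = nb;          /* old pointers into the array are now stale */
        st->sp = nb + off;
    }
}

/* Like the non-X push macros: no capacity check. */
static void
toy_push(ToyStack *st, int v)
{
    *++st->sp = v;
}

/* Pushes a four-item list above a mark, then sums the list. */
static int
toy_mark_demo(void)
{
    ToyStack st;
    ptrdiff_t markoff;
    int *mark, *p, sum = 0;

    toy_init(&st, 2);
    toy_extend(&st, 1);
    toy_push(&st, 99);              /* unrelated item below the list */
    markoff = st.sp - st.base;      /* remember the mark as an offset */

    toy_extend(&st, 4);             /* one check for four pushes; may move */
    toy_push(&st, 10);
    toy_push(&st, 20);
    toy_push(&st, 30);
    toy_push(&st, 40);

    mark = st.base + markoff;       /* recompute the pointer afterwards */
    for (p = mark + 1; p <= st.sp; p++)
        sum += *p;

    free(st.base);
    return sum;
}
```

+.PP
+Had the mark been kept as a pointer taken before the second
+\&\f(CW\*(C`toy_extend\*(C'\fR call, it could have referred to the old allocation once the
+array had been moved.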
+.SS "Temporaries Stack"
+.IX Subsection "Temporaries Stack"
+As noted above, xV references on the main value stack do not contribute to the
+reference count of an xV, and so another mechanism is used to track when
+temporary values which live on the stack must be released. This is the job of
+the temporaries stack.
+.PP
+The temporaries stack stores pointers to xVs whose reference counts will be
+decremented soon.
+.PP
+The base of this stack is pointed to by the interpreter variable
+\&\f(CW\*(C`PL_tmps_stack\*(C'\fR, of type \f(CW\*(C`SV **\*(C'\fR.
+.PP
+The head of the stack is indexed by \f(CW\*(C`PL_tmps_ix\*(C'\fR, an integer which stores the
+index in the array of the most recently-pushed item.
+.PP
+There is no public API to directly push items to the temporaries stack. Instead,
+the API function \f(CWsv_2mortal()\fR is used to mortalize an xV, adding its
+address to the temporaries stack.
+.PP
+Likewise, there is no public API to read values from the temporaries stack.
+Instead, the macros \f(CW\*(C`SAVETMPS\*(C'\fR and \f(CW\*(C`FREETMPS\*(C'\fR are used. The \f(CW\*(C`SAVETMPS\*(C'\fR
+macro establishes the base levels of the temporaries stack, by capturing the
+current value of \f(CW\*(C`PL_tmps_ix\*(C'\fR into \f(CW\*(C`PL_tmps_floor\*(C'\fR and saving the previous
+value to the save stack. Thereafter, whenever \f(CW\*(C`FREETMPS\*(C'\fR is invoked all of
+the temporaries that have been pushed since that level are reclaimed.
+.PP
+While it is common to see these two macros in pairs within an \f(CW\*(C`ENTER\*(C'\fR/
+\&\f(CW\*(C`LEAVE\*(C'\fR pair, it is not necessary to match them. It is permitted to invoke
+\&\f(CW\*(C`FREETMPS\*(C'\fR multiple times since the most recent \f(CW\*(C`SAVETMPS\*(C'\fR; for example in a
+loop iterating over elements of a list. While you can invoke \f(CW\*(C`SAVETMPS\*(C'\fR
+multiple times within a scope pair, it is unlikely to be useful. Subsequent
+invocations will move the temporaries floor further up, thus effectively
+trapping the existing temporaries to only be released at the end of the scope.
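+.PP
+The interaction between the floor and repeated \f(CW\*(C`FREETMPS\*(C'\fR calls can be
+sketched with a toy model. Nothing below is perl's code: \f(CW\*(C`ToyVal\*(C'\fR,
+\&\f(CW\*(C`toy_mortalize\*(C'\fR, \f(CW\*(C`toy_save_tmps\*(C'\fR and \f(CW\*(C`toy_free_tmps\*(C'\fR are hypothetical names, the
+stack has a fixed size, and the old floor is returned to the caller where perl
+would put it on the save stack.

```c
#include <assert.h>
#include <stddef.h>

/* A value with nothing but a reference count. */
typedef struct { int refcnt; } ToyVal;

#define TOY_TMPS_MAX 16
static ToyVal   *toy_tmps[TOY_TMPS_MAX];
static ptrdiff_t toy_tmps_ix    = -1;   /* index of the most recent entry */
static ptrdiff_t toy_tmps_floor = -1;   /* entries at or below this stay */

/* Like sv_2mortal(): record that one reference is to be dropped later. */
static ToyVal *
toy_mortalize(ToyVal *v)
{
    assert(toy_tmps_ix + 1 < TOY_TMPS_MAX);
    toy_tmps[++toy_tmps_ix] = v;
    return v;
}

/* Like SAVETMPS: raise the floor to the current top.  Returns the old floor
 * so that the caller can restore it on scope exit. */
static ptrdiff_t
toy_save_tmps(void)
{
    ptrdiff_t old = toy_tmps_floor;

    toy_tmps_floor = toy_tmps_ix;
    return old;
}

/* Like FREETMPS: release everything above the floor. */
static void
toy_free_tmps(void)
{
    while (toy_tmps_ix > toy_tmps_floor)
        toy_tmps[toy_tmps_ix--]->refcnt--;
}

/* Returns 1 if per-iteration temporaries are released inside the loop while
 * the outer temporary survives until the old floor is restored. */
static int
toy_tmps_demo(void)
{
    ToyVal outer = { 1 };
    ToyVal inner[3] = { { 1 }, { 1 }, { 1 } };
    ptrdiff_t old_floor;
    int i, ok;

    toy_mortalize(&outer);
    old_floor = toy_save_tmps();
    for (i = 0; i < 3; i++) {
        toy_mortalize(&inner[i]);
        toy_free_tmps();            /* releases only this iteration's temp */
    }
    ok = outer.refcnt == 1
      && inner[0].refcnt == 0 && inner[1].refcnt == 0 && inner[2].refcnt == 0;

    toy_tmps_floor = old_floor;     /* what leaving the scope would restore */
    toy_free_tmps();
    return ok && outer.refcnt == 0 && toy_tmps_ix == -1;
}
```

+.PP
+The loop shows the pattern described above: one floor, many releases, and
+temporaries created before the floor was raised are left alone until the
+enclosing scope ends.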
+.SS "Save Stack"
+.IX Subsection "Save Stack"
+The save stack is used by perl to implement the \f(CW\*(C`local\*(C'\fR keyword and other
+similar behaviours; any cleanup operations that need to be performed when
+leaving the current scope. Items pushed to this stack generally capture the
+current value of some internal variable or state, which will be restored when
+the scope is unwound due to leaving, \f(CW\*(C`return\*(C'\fR, \f(CW\*(C`die\*(C'\fR, \f(CW\*(C`goto\*(C'\fR or other
+reasons.
+.PP
+Whereas other perl internal stacks store individual items all of the same type
+(usually SV pointers or integers), the items pushed to the save stack are
+formed of many different types, having multiple fields to them. For example,
+the \f(CW\*(C`SAVEt_INT\*(C'\fR type needs to store both the address of the \f(CW\*(C`int\*(C'\fR variable
+to restore, and the value to restore it to. This information could have been
+stored using fields of a \f(CW\*(C`struct\*(C'\fR, but would have to be large enough to store
+three pointers in the largest case, which would waste a lot of space in most
+of the smaller cases.
+.PP
+Instead, the stack stores information in a variable-length encoding of \f(CW\*(C`ANY\*(C'\fR
+structures. The final value pushed is stored in the \f(CW\*(C`UV\*(C'\fR field which encodes
+the kind of item held by the preceding items; the count and types of which
+will depend on what kind of item is being stored. The kind field is pushed
+last because that will be the first field to be popped when unwinding items
+from the stack.
+.PP
+The base of this stack is pointed to by the interpreter variable
+\&\f(CW\*(C`PL_savestack\*(C'\fR, of type \f(CW\*(C`ANY *\*(C'\fR.
+.PP
+The head of the stack is indexed by \f(CW\*(C`PL_savestack_ix\*(C'\fR, an integer which
+stores the index in the array at which the next item should be pushed. (Note
+that this is different to most other stacks, which reference the most
+recently-pushed item).
+.PP
+Items are pushed to the save stack by using the various \f(CW\*(C`SAVE...()\*(C'\fR macros.
+Many of these macros take a variable and store both its address and current
+value on the save stack, ensuring that value gets restored on scope exit.
+.PP
+.Vb 5
+\& SAVEI8(i8)
+\& SAVEI16(i16)
+\& SAVEI32(i32)
+\& SAVEINT(i)
+\& ...
+.Ve
+.PP
+There are also a variety of other special-purpose macros which save particular
+types or values of interest. \f(CW\*(C`SAVETMPS\*(C'\fR has already been mentioned above.
+Others include \f(CW\*(C`SAVEFREEPV\*(C'\fR which arranges for a PV (i.e. a string buffer) to
+be freed, or \f(CW\*(C`SAVEDESTRUCTOR\*(C'\fR which arranges for a given function pointer to
+be invoked on scope exit. A full list of such macros can be found in
+\&\fIscope.h\fR.
+.PP
+There is no public API for popping individual values or items from the save
+stack. Instead, via the scope stack, the \f(CW\*(C`ENTER\*(C'\fR and \f(CW\*(C`LEAVE\*(C'\fR pair form a way
+to start and stop nested scopes. Leaving a nested scope via \f(CW\*(C`LEAVE\*(C'\fR will
+restore all of the saved values that had been pushed since the most recent
+\&\f(CW\*(C`ENTER\*(C'\fR.
+.SS "Scope Stack"
+.IX Subsection "Scope Stack"
+As with the mark stack to the value stack, the scope stack forms a pair with
+the save stack. The scope stack stores the height of the save stack at which
+nested scopes begin, and allows the save stack to be unwound back to that
+point when the scope is left.
+.PP
+When perl is built with debugging enabled, there is a second part to this
+stack storing human-readable string names describing the type of stack
+context. Each push operation saves the name as well as the height of the save
+stack, and each pop operation checks the topmost name with what is expected,
+causing an assertion failure if the name does not match.
+.PP
+The base of this stack is pointed to by the interpreter variable
+\&\f(CW\*(C`PL_scopestack\*(C'\fR, of type \f(CW\*(C`I32 *\*(C'\fR. If enabled, the scope stack names are
+stored in a separate array pointed to by \f(CW\*(C`PL_scopestack_name\*(C'\fR, of type
+\&\f(CW\*(C`const char **\*(C'\fR.
+.PP
+The head of the stack is indexed by \f(CW\*(C`PL_scopestack_ix\*(C'\fR, an integer which
+stores the index of the array or arrays at which the next item should be
+pushed. (Note that this is different to most other stacks, which reference the
+most recently-pushed item).
+.PP
+Values are pushed to the scope stack using the \f(CW\*(C`ENTER\*(C'\fR macro, which begins a
+new nested scope. Any items pushed to the save stack are then restored at the
+next nested invocation of the \f(CW\*(C`LEAVE\*(C'\fR macro.
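+.PP
+How the save stack and the scope stack cooperate can be sketched with a toy
+model. This is not perl's implementation: \f(CW\*(C`ToyAny\*(C'\fR, \f(CW\*(C`toy_enter\*(C'\fR,
+\&\f(CW\*(C`toy_save_int\*(C'\fR and \f(CW\*(C`toy_leave\*(C'\fR are hypothetical names, both stacks have a fixed
+size, and only one kind of saved item is supported. It does keep the two
+properties described above: the kind tag is pushed last so that it is popped
+first, and both indices refer to the next free slot.

```c
#include <assert.h>
#include <stddef.h>

/* One slot of the toy save stack, loosely modelled on a union of fields. */
typedef union {
    int     *ptr;
    int      ival;
    unsigned kind;
} ToyAny;

enum { TOY_SAVE_INT = 1 };

#define TOY_MAX 32
static ToyAny toy_save[TOY_MAX];
static size_t toy_save_ix;          /* next free slot on the save stack */
static size_t toy_scope[TOY_MAX];
static size_t toy_scope_ix;         /* next free slot on the scope stack */

/* Like ENTER: remember how high the save stack is right now. */
static void
toy_enter(void)
{
    assert(toy_scope_ix < TOY_MAX);
    toy_scope[toy_scope_ix++] = toy_save_ix;
}

/* Like SAVEINT(): store the address and current value, then the kind. */
static void
toy_save_int(int *p)
{
    assert(toy_save_ix + 3 <= TOY_MAX);
    toy_save[toy_save_ix++].ptr  = p;
    toy_save[toy_save_ix++].ival = *p;
    toy_save[toy_save_ix++].kind = TOY_SAVE_INT;    /* pushed last */
}

/* Like LEAVE: unwind the save stack back to the recorded height. */
static void
toy_leave(void)
{
    size_t floor;

    assert(toy_scope_ix > 0);
    floor = toy_scope[--toy_scope_ix];
    while (toy_save_ix > floor) {
        unsigned kind = toy_save[--toy_save_ix].kind;   /* popped first */

        switch (kind) {
        case TOY_SAVE_INT: {
            int  val = toy_save[--toy_save_ix].ival;
            int *p   = toy_save[--toy_save_ix].ptr;

            *p = val;
            break;
        }
        default:
            assert(!"unknown save type");
            break;
        }
    }
}

/* Nests two scopes around one variable; returns 21 if the inner leave
 * restores 2 and the outer leave restores 1. */
static int
toy_scope_demo(void)
{
    int x = 1, after_inner, after_outer;

    toy_enter();
    toy_save_int(&x);
    x = 2;

    toy_enter();
    toy_save_int(&x);
    x = 3;
    toy_leave();
    after_inner = x;

    toy_leave();
    after_outer = x;

    return after_inner * 10 + after_outer;
}
```

+.PP
+Because the kind tag is read before the fields it describes, the unwinding
+loop never needs to know in advance how many slots an entry occupies.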
+.SH "Dynamic Scope and the Context Stack"
+.IX Header "Dynamic Scope and the Context Stack"
+\&\fBNote:\fR this section describes a non-public internal API that is subject
+to change without notice.
+.SS "Introduction to the context stack"
+.IX Subsection "Introduction to the context stack"
+In Perl, dynamic scoping refers to the runtime nesting of things like
+subroutine calls, evals etc, as well as the entering and exiting of block
+scopes. For example, the restoring of a \f(CW\*(C`local\*(C'\fRised variable is
+determined by the dynamic scope.
+.PP
+Perl tracks the dynamic scope by a data structure called the context
+stack, which is an array of \f(CW\*(C`PERL_CONTEXT\*(C'\fR structures, each of which is
+a big union for all the types of context. Whenever a new scope is
+entered (such as a block, a \f(CW\*(C`for\*(C'\fR loop, or a subroutine call), a new
+context entry is pushed onto the stack. Similarly when leaving a block or
+returning from a subroutine call etc. a context is popped. Since the
+context stack represents the current dynamic scope, it can be searched.
+For example, \f(CW\*(C`next LABEL\*(C'\fR searches back through the stack looking for a
+loop context that matches the label; \f(CW\*(C`return\*(C'\fR pops contexts until it
+finds a sub or eval context or similar; \f(CW\*(C`caller\*(C'\fR examines sub contexts on
+the stack.
+.PP
+Each context entry is labelled with a context type, \f(CW\*(C`cx_type\*(C'\fR. Typical
+context types are \f(CW\*(C`CXt_SUB\*(C'\fR, \f(CW\*(C`CXt_EVAL\*(C'\fR etc., as well as \f(CW\*(C`CXt_BLOCK\*(C'\fR
+and \f(CW\*(C`CXt_NULL\*(C'\fR which represent a basic scope (as pushed by \f(CW\*(C`pp_enter\*(C'\fR)
+and a sort block. The type determines which parts of the context union are
+valid.
+.PP
+The main division in the context struct is between a substitution scope
+(\f(CW\*(C`CXt_SUBST\*(C'\fR) and block scopes, which are everything else. The former is
+just used while executing \f(CW\*(C`s///e\*(C'\fR, and won't be discussed further
+here.
+.PP
+All the block scope types share a common base, which corresponds to
+\&\f(CW\*(C`CXt_BLOCK\*(C'\fR. This stores the old values of various scope-related
+variables like \f(CW\*(C`PL_curpm\*(C'\fR, as well as information about the current
+scope, such as \f(CW\*(C`gimme\*(C'\fR. On scope exit, the old variables are restored.
+.PP
+Particular block scope types store extra per-type information. For
+example, \f(CW\*(C`CXt_SUB\*(C'\fR stores the currently executing CV, while the various
+for loop types might hold the original loop variable SV. On scope exit,
+the per-type data is processed; for example the CV has its reference count
+decremented, and the original loop variable is restored.
+.PP
+The macro \f(CW\*(C`cxstack\*(C'\fR returns the base of the current context stack, while
+\&\f(CW\*(C`cxstack_ix\*(C'\fR is the index of the current frame within that stack.
+.PP
+In fact, the context stack is actually part of a stack-of-stacks system;
+whenever something unusual is done such as calling a \f(CW\*(C`DESTROY\*(C'\fR or tie
+handler, a new stack is pushed, then popped at the end.
+.PP
+Note that the API described here changed considerably in perl 5.24; prior
+to that, big macros like \f(CW\*(C`PUSHBLOCK\*(C'\fR and \f(CW\*(C`POPSUB\*(C'\fR were used; in 5.24
+they were replaced by the inline static functions described below. In
+addition, the ordering and detail of how these macros/functions work
+changed in many ways, often subtly. In particular they didn't handle
+saving the savestack and temps stack positions, and required additional
+\&\f(CW\*(C`ENTER\*(C'\fR, \f(CW\*(C`SAVETMPS\*(C'\fR and \f(CW\*(C`LEAVE\*(C'\fR compared to the new functions. The
+old-style macros will not be described further.
+.SS "Pushing contexts"
+.IX Subsection "Pushing contexts"
+For pushing a new context, the two basic functions are
+\&\f(CW\*(C`cx = cx_pushblock()\*(C'\fR, which pushes a new basic context block and returns
+its address, and a family of similar functions with names like
+\&\f(CWcx_pushsub(cx)\fR which populate the additional type-dependent fields in
+the \f(CW\*(C`cx\*(C'\fR struct. Note that \f(CW\*(C`CXt_NULL\*(C'\fR and \f(CW\*(C`CXt_BLOCK\*(C'\fR don't have their
+own push functions, as they don't store any data beyond that pushed by
+\&\f(CW\*(C`cx_pushblock\*(C'\fR.
+.PP
+The fields of the context struct and the arguments to the \f(CW\*(C`cx_*\*(C'\fR
+functions are subject to change between perl releases, representing
+whatever is convenient or efficient for that release.
+.PP
+A typical context stack pushing can be found in \f(CW\*(C`pp_entersub\*(C'\fR; the
+following shows a simplified and stripped-down example of a non-XS call,
+along with comments showing roughly what each function does.
+.PP
+.Vb 6
+\& dMARK;
+\& U8 gimme = GIMME_V;
+\& bool hasargs = cBOOL(PL_op\->op_flags & OPf_STACKED);
+\& OP *retop = PL_op\->op_next;
+\& I32 old_ss_ix = PL_savestack_ix;
+\& CV *cv = ....;
+\&
+\& /* ... make mortal copies of stack args which are PADTMPs here ... */
+\&
+\& /* ... do any additional savestack pushes here ... */
+\&
+\& /* Now push a new context entry of type \*(AqCXt_SUB\*(Aq; initially just
+\& * doing the actions common to all block types: */
+\&
+\& cx = cx_pushblock(CXt_SUB, gimme, MARK, old_ss_ix);
+\&
+\& /* this does (approximately):
+\& CXINC; /* cxstack_ix++ (grow if necessary) */
+\& cx = CX_CUR(); /* and get the address of new frame */
+\& cx\->cx_type = CXt_SUB;
+\& cx\->blk_gimme = gimme;
+\& cx\->blk_oldsp = MARK \- PL_stack_base;
+\& cx\->blk_oldsaveix = old_ss_ix;
+\& cx\->blk_oldcop = PL_curcop;
+\& cx\->blk_oldmarksp = PL_markstack_ptr \- PL_markstack;
+\& cx\->blk_oldscopesp = PL_scopestack_ix;
+\& cx\->blk_oldpm = PL_curpm;
+\& cx\->blk_old_tmpsfloor = PL_tmps_floor;
+\&
+\& PL_tmps_floor = PL_tmps_ix;
+\& */
+\&
+\&
+\& /* then update the new context frame with subroutine\-specific info,
+\& * such as the CV about to be executed: */
+\&
+\& cx_pushsub(cx, cv, retop, hasargs);
+\&
+\& /* this does (approximately):
+\& cx\->blk_sub.cv = cv;
+\& cx\->blk_sub.olddepth = CvDEPTH(cv);
+\& cx\->blk_sub.prevcomppad = PL_comppad;
+\& cx\->cx_type |= (hasargs) ? CXp_HASARGS : 0;
+\& cx\->blk_sub.retop = retop;
+\& SvREFCNT_inc_simple_void_NN(cv);
+\& */
+.Ve
+.PP
+Note that \f(CWcx_pushblock()\fR sets two new floors: for the args stack (to
+\&\f(CW\*(C`MARK\*(C'\fR) and the temps stack (to \f(CW\*(C`PL_tmps_ix\*(C'\fR). While executing at this
+scope level, every \f(CW\*(C`nextstate\*(C'\fR (amongst others) will reset the args and
+tmps stack levels to these floors. Note that since \f(CW\*(C`cx_pushblock\*(C'\fR uses
+the current value of \f(CW\*(C`PL_tmps_ix\*(C'\fR rather than it being passed as an arg,
+this dictates at what point \f(CW\*(C`cx_pushblock\*(C'\fR should be called. In
+particular, any new mortals which should be freed only on scope exit
+(rather than at the next \f(CW\*(C`nextstate\*(C'\fR) should be created first.
+.PP
+Most callers of \f(CW\*(C`cx_pushblock\*(C'\fR simply set the new args stack floor to the
+top of the previous stack frame, but for \f(CW\*(C`CXt_LOOP_LIST\*(C'\fR it stores the
+items being iterated over on the stack, and so sets \f(CW\*(C`blk_oldsp\*(C'\fR to the
+top of these items instead. Note that, contrary to its name, \f(CW\*(C`blk_oldsp\*(C'\fR
+doesn't always represent the value to restore \f(CW\*(C`PL_stack_sp\*(C'\fR to on scope
+exit.
+.PP
+Note the early capture of \f(CW\*(C`PL_savestack_ix\*(C'\fR to \f(CW\*(C`old_ss_ix\*(C'\fR, which is
+later passed as an arg to \f(CW\*(C`cx_pushblock\*(C'\fR. In the case of \f(CW\*(C`pp_entersub\*(C'\fR,
+this is because, although most values needing saving are stored in fields
+of the context struct, an extra value needs saving only when the debugger
+is running, and it doesn't make sense to bloat the struct for this rare
+case. So instead it is saved on the savestack. Since this value gets
+calculated and saved before the context is pushed, it is necessary to pass
+the old value of \f(CW\*(C`PL_savestack_ix\*(C'\fR to \f(CW\*(C`cx_pushblock\*(C'\fR, to ensure that the
+saved value gets freed during scope exit. For most users of
+\&\f(CW\*(C`cx_pushblock\*(C'\fR, where nothing needs pushing on the save stack,
+\&\f(CW\*(C`PL_savestack_ix\*(C'\fR is just passed directly as an arg to \f(CW\*(C`cx_pushblock\*(C'\fR.
+.PP
+Note that where possible, values should be saved in the context struct
+rather than on the save stack; it's much faster that way.
+.PP
+Normally \f(CW\*(C`cx_pushblock\*(C'\fR should be immediately followed by the appropriate
+\&\f(CW\*(C`cx_pushfoo\*(C'\fR, with nothing between them; this is because if code
+in-between could die (e.g. a warning upgraded to fatal), then the context
+stack unwinding code in \f(CW\*(C`dounwind\*(C'\fR would see (in the example above) a
+\&\f(CW\*(C`CXt_SUB\*(C'\fR context frame, but without all the subroutine-specific fields
+set, and crashes would soon ensue.
+.PP
+Where the two must be separate, initially set the type to \f(CW\*(C`CXt_NULL\*(C'\fR or
+\&\f(CW\*(C`CXt_BLOCK\*(C'\fR, and later change it to \f(CW\*(C`CXt_foo\*(C'\fR when doing the
+\&\f(CW\*(C`cx_pushfoo\*(C'\fR. This is exactly what \f(CW\*(C`pp_enteriter\*(C'\fR does, once it's
+determined which type of loop it's pushing.
+.SS "Popping contexts"
+.IX Subsection "Popping contexts"
+Contexts are popped using \f(CWcx_popsub()\fR etc. and \f(CWcx_popblock()\fR. Note
+however, that unlike \f(CW\*(C`cx_pushblock\*(C'\fR, neither of these functions actually
+decrements the current context stack index; this is done separately using
+\&\f(CWCX_POP()\fR.
+.PP
+There are two main ways that contexts are popped. During normal execution
+as scopes are exited, functions like \f(CW\*(C`pp_leave\*(C'\fR, \f(CW\*(C`pp_leaveloop\*(C'\fR and
+\&\f(CW\*(C`pp_leavesub\*(C'\fR process and pop just one context using \f(CW\*(C`cx_popfoo\*(C'\fR and
+\&\f(CW\*(C`cx_popblock\*(C'\fR. On the other hand, things like \f(CW\*(C`pp_return\*(C'\fR and \f(CW\*(C`next\*(C'\fR
+may have to pop back several scopes until a sub or loop context is found,
+and exceptions (such as \f(CW\*(C`die\*(C'\fR) need to pop back contexts until an eval
+context is found. Both of these are accomplished by \f(CWdounwind()\fR, which
+is capable of processing and popping all contexts above the target one.
+.PP
+Here is a typical example of context popping, as found in \f(CW\*(C`pp_leavesub\*(C'\fR
+(simplified slightly):
+.PP
+.Vb 4
+\& U8 gimme;
+\& PERL_CONTEXT *cx;
+\& SV **oldsp;
+\& OP *retop;
+\&
+\& cx = CX_CUR();
+\&
+\& gimme = cx\->blk_gimme;
+\& oldsp = PL_stack_base + cx\->blk_oldsp; /* last arg of previous frame */
+\&
+\& if (gimme == G_VOID)
+\& PL_stack_sp = oldsp;
+\& else
+\& leave_adjust_stacks(oldsp, oldsp, gimme, 0);
+\&
+\& CX_LEAVE_SCOPE(cx);
+\& cx_popsub(cx);
+\& cx_popblock(cx);
+\& retop = cx\->blk_sub.retop;
+\& CX_POP(cx);
+\&
+\& return retop;
+.Ve
+.PP
+The steps above are in a very specific order, designed to be the reverse
+order of when the context was pushed. The first thing to do is to copy
+and/or protect any return arguments and free any temps in the current
+scope. Scope exits like an rvalue sub normally return a mortal copy of
+their return args (as opposed to lvalue subs). It is important to make
+this copy before the save stack is popped or variables are restored, or
+bad things like the following can happen:
+.PP
+.Vb 2
+\& sub f { my $x =...; $x } # $x freed before we get to copy it
+\& sub f { /(...)/; $1 } # PL_curpm restored before $1 copied
+.Ve
+.PP
+Although we wish to free any temps at the same time, we have to be careful
+not to free any temps which are keeping return args alive; nor to free the
+temps we have just created while mortal copying return args. Fortunately,
+\&\f(CWleave_adjust_stacks()\fR is capable of making mortal copies of return args,
+shifting args down the stack, and only processing those entries on the
+temps stack that are safe to do so.
+.PP
+In void context no args are returned, so it's more efficient to skip
+calling \f(CWleave_adjust_stacks()\fR. Also in void context, a \f(CW\*(C`nextstate\*(C'\fR op
+is likely to be imminently called which will do a \f(CW\*(C`FREETMPS\*(C'\fR, so there's
+no need to do that either.
+.PP
+The next step is to pop savestack entries: \f(CWCX_LEAVE_SCOPE(cx)\fR is just
+defined as \f(CWLEAVE_SCOPE(cx\->blk_oldsaveix)\fR. Note that during the
+popping, it's possible for perl to call destructors, call \f(CW\*(C`STORE\*(C'\fR to undo
+localisations of tied vars, and so on. Any of these can die or call
+\&\f(CWexit()\fR. In this case, \f(CWdounwind()\fR will be called, and the current
+context stack frame will be re-processed. Thus it is vital that all steps
+in popping a context are done in such a way to support reentrancy. The
+other alternative, of decrementing \f(CW\*(C`cxstack_ix\*(C'\fR \fIbefore\fR processing the
+frame, would lead to leaks and the like if something died halfway through,
+or overwriting of the current frame.
+.PP
+\&\f(CW\*(C`CX_LEAVE_SCOPE\*(C'\fR itself is safely re-entrant: if only half the savestack
+items have been popped before dying and getting trapped by eval, then the
+\&\f(CW\*(C`CX_LEAVE_SCOPE\*(C'\fRs in \f(CW\*(C`dounwind\*(C'\fR or \f(CW\*(C`pp_leaveeval\*(C'\fR will continue where
+the first one left off.
+.PP
+The next step is the type-specific context processing; in this case
+\&\f(CW\*(C`cx_popsub\*(C'\fR. In part, this looks like:
+.PP
+.Vb 4
+\& cv = cx\->blk_sub.cv;
+\& CvDEPTH(cv) = cx\->blk_sub.olddepth;
+\& cx\->blk_sub.cv = NULL;
+\& SvREFCNT_dec(cv);
+.Ve
+.PP
+where it's processing the just-executed CV. Note that before it decrements
+the CV's reference count, it nulls the \f(CW\*(C`blk_sub.cv\*(C'\fR. This means that if
+it re-enters, the CV won't be freed twice. It also means that you can't
+rely on such type-specific fields having useful values after the return
+from \f(CW\*(C`cx_popfoo\*(C'\fR.
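+.PP
+The benefit of clearing the field before dropping the reference can be seen in
+a toy model. The sketch is not perl's code: \f(CW\*(C`ToyCV\*(C'\fR, \f(CW\*(C`ToyFrame\*(C'\fR, \f(CW\*(C`toy_cv_dec\*(C'\fR
+and \f(CW\*(C`toy_popsub\*(C'\fR are hypothetical names, and re-entry is simulated by having
+the "destructor" process the same frame again, roughly as an unwind triggered
+during popping would.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int refcnt; int freed; } ToyCV;
typedef struct { ToyCV *cv; } ToyFrame;

/* Frame that the simulated unwinder would revisit. */
static ToyFrame *toy_cur;

static void toy_popsub(ToyFrame *cx);

/* Drop one reference; a NULL argument is a no-op.  When the count reaches
 * zero the value is "freed" and, to simulate a destructor that dies, the
 * current frame is processed a second time. */
static void
toy_cv_dec(ToyCV *cv)
{
    if (cv == NULL)
        return;
    assert(cv->refcnt > 0);
    if (--cv->refcnt == 0) {
        cv->freed++;
        toy_popsub(toy_cur);
    }
}

/* Type-specific pop: clear the field first, then drop the reference, so
 * that a second pass over the frame finds NULL rather than a stale pointer. */
static void
toy_popsub(ToyFrame *cx)
{
    ToyCV *cv = cx->cv;

    cx->cv = NULL;
    toy_cv_dec(cv);
}

/* Returns how many times the CV was freed; 1 means re-entry was harmless. */
static int
toy_reentry_demo(void)
{
    ToyCV    cv = { 1, 0 };
    ToyFrame cx;

    cx.cv = &cv;
    toy_cur = &cx;
    toy_popsub(&cx);
    return cv.freed;
}
```

+.PP
+If the two statements in \f(CW\*(C`toy_popsub\*(C'\fR were swapped, the second pass would see
+the old pointer and attempt to drop a reference that no longer exists.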
+.PP
+Next, \f(CW\*(C`cx_popblock\*(C'\fR restores all the various interpreter vars to their
+previous values or previous high water marks; it expands to:
+.PP
+.Vb 5
+\& PL_markstack_ptr = PL_markstack + cx\->blk_oldmarksp;
+\& PL_scopestack_ix = cx\->blk_oldscopesp;
+\& PL_curpm = cx\->blk_oldpm;
+\& PL_curcop = cx\->blk_oldcop;
+\& PL_tmps_floor = cx\->blk_old_tmpsfloor;
+.Ve
+.PP
+Note that it \fIdoesn't\fR restore \f(CW\*(C`PL_stack_sp\*(C'\fR; as mentioned earlier,
+which value to restore it to depends on the context type (specifically
+\&\f(CW\*(C`for (list) {}\*(C'\fR), and what args (if any) it returns; and that will
+already have been sorted out earlier by \f(CWleave_adjust_stacks()\fR.
+.PP
+Finally, the context stack pointer is actually decremented by \f(CWCX_POP(cx)\fR.
+After this point, it's possible that the current context frame could
+be overwritten by other contexts being pushed. Although things like ties
+and \f(CW\*(C`DESTROY\*(C'\fR are supposed to work within a new context stack, it's best
+not to assume this. Indeed on debugging builds, \f(CWCX_POP(cx)\fR deliberately
+sets \f(CW\*(C`cx\*(C'\fR to null to detect code that is still relying on the field
+values in that context frame. Note in the \f(CWpp_leavesub()\fR example above,
+we grab \f(CW\*(C`blk_sub.retop\*(C'\fR \fIbefore\fR calling \f(CW\*(C`CX_POP\*(C'\fR.
+.SS "Redoing contexts"
+.IX Subsection "Redoing contexts"
+Finally, there is \f(CWcx_topblock(cx)\fR, which acts like a super\-\f(CW\*(C`nextstate\*(C'\fR
+as regards resetting various vars to their base values. It is used in
+places like \f(CW\*(C`pp_next\*(C'\fR, \f(CW\*(C`pp_redo\*(C'\fR and \f(CW\*(C`pp_goto\*(C'\fR where rather than
+exiting a scope, we want to re-initialise the scope. As well as resetting
+\&\f(CW\*(C`PL_stack_sp\*(C'\fR like \f(CW\*(C`nextstate\*(C'\fR, it also resets \f(CW\*(C`PL_markstack_ptr\*(C'\fR,
+\&\f(CW\*(C`PL_scopestack_ix\*(C'\fR and \f(CW\*(C`PL_curpm\*(C'\fR. Note that it doesn't do a
+\&\f(CW\*(C`FREETMPS\*(C'\fR.
+.SH "Slab-based operator allocation"
+.IX Header "Slab-based operator allocation"
+\&\fBNote:\fR this section describes a non-public internal API that is subject
+to change without notice.
+.PP
+Perl's internal error-handling mechanisms implement \f(CW\*(C`die\*(C'\fR (and its internal
+equivalents) using longjmp. If this occurs during lexing, parsing or
+compilation, we must ensure that any ops allocated as part of the compilation
+process are freed. (Older Perl versions did not adequately handle this
+situation: when failing a parse, they would leak ops that were stored in
+C \f(CW\*(C`auto\*(C'\fR variables and not linked anywhere else.)
+.PP
+To handle this situation, Perl uses \fIop slabs\fR that are attached to the
+currently-compiling CV. A slab is a chunk of allocated memory. New ops are
+allocated as regions of the slab. If the slab fills up, a new one is created
+(and linked from the previous one). When an error occurs and the CV is freed,
+any ops remaining are freed.
+.PP
+Each op is preceded by two pointers: one points to the next op in the slab, and
+the other points to the slab that owns it. The next-op pointer is needed so
+that Perl can iterate over a slab and free all its ops. (Op structures are of
+different sizes, so the slab's ops can't merely be treated as a dense array.)
+The slab pointer is needed for accessing a reference count on the slab: when
+the last op on a slab is freed, the slab itself is freed.
+.PP
+The slab allocator puts the ops at the end of the slab first. This will tend to
+allocate the leaves of the op tree first, and the layout will therefore
+hopefully be cache-friendly. In addition, this means that there's no need to
+store the size of the slab (see below on why slabs vary in size), because Perl
+can follow pointers to find the last op.
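+.PP
+The layout described above can be sketched in a few lines of standalone C.
+This is a simplified model, not the code in Perl's op.c: the \f(CW\*(C`toy_\*(C'\fR
+names, the use of pointer-sized words as the unit of size, and the counter
+that records freed slabs are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct toy_slab toy_slab;

/* Two-pointer header placed directly in front of every op body. */
typedef struct toy_slot {
    struct toy_slot *next;  /* next op in the slab, for iteration */
    toy_slab        *slab;  /* owning slab, for reference counting */
} toy_slot;

struct toy_slab {
    toy_slot *first;    /* most recently allocated (lowest) slot */
    size_t    avail;    /* unused words remaining at the low end */
    int       refcnt;   /* one reference per live op */
    void    **mem;      /* the slab's storage, in pointer-sized words */
};

static int toy_slabs_freed = 0;     /* lets callers observe slab release */

static toy_slab *toy_slab_new(size_t words)
{
    toy_slab *s = malloc(sizeof *s);
    if (s == NULL)
        return NULL;
    s->mem = malloc(words * sizeof(void *));
    if (s->mem == NULL) { free(s); return NULL; }
    s->first = NULL;
    s->avail = words;
    s->refcnt = 0;
    return s;
}

/* Carve header + body off the END of the unused region. */
static void *toy_op_alloc(toy_slab *slab, size_t body_bytes)
{
    size_t w    = sizeof(void *);
    size_t need = (sizeof(toy_slot) + w - 1) / w + (body_bytes + w - 1) / w;
    toy_slot *slot;

    if (need > slab->avail)
        return NULL;            /* a real allocator would chain a new slab */
    slab->avail -= need;
    slot = (toy_slot *)(slab->mem + slab->avail);
    slot->next = slab->first;
    slot->slab = slab;
    slab->first = slot;
    slab->refcnt++;
    return slot + 1;            /* body starts right after the header */
}

/* Walk the next-op pointers; no per-op size is needed for this. */
static size_t toy_slab_count_ops(const toy_slab *slab)
{
    size_t n = 0;
    for (const toy_slot *p = slab->first; p != NULL; p = p->next)
        n++;
    return n;
}

/* Release one op; the slab goes away with its last op. */
static void toy_op_free(void *op)
{
    toy_slab *slab = ((toy_slot *)op - 1)->slab;
    assert(slab->refcnt > 0);
    if (--slab->refcnt == 0) {
        free(slab->mem);
        free(slab);
        toy_slabs_freed++;
    }
}
```

+.PP
+Because each new slot is taken from the end of the free region, a later
+allocation has a lower address than an earlier one, and the slab is
+released only when the last op on it is freed.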
+.PP
+It might seem possible to eliminate slab reference counts altogether, by having
+all ops implicitly attached to \f(CW\*(C`PL_compcv\*(C'\fR when allocated and freed when the
+CV is freed. That would also allow \f(CW\*(C`op_free\*(C'\fR to skip \f(CW\*(C`FreeOp\*(C'\fR altogether, and
+thus free ops faster. But that doesn't work in those cases where ops need to
+survive beyond their CVs, such as re-evals.
+.PP
+The CV also has to have a reference count on the slab. Sometimes the first op
+created is immediately freed. If the reference count of the slab reaches 0,
+then it will be freed with the CV still pointing to it.
+.PP
+CVs use the \f(CW\*(C`CVf_SLABBED\*(C'\fR flag to indicate that the CV has a reference count
+on the slab. When this flag is set, the slab is accessible via \f(CW\*(C`CvSTART\*(C'\fR when
+\&\f(CW\*(C`CvROOT\*(C'\fR is not set, or by subtracting two pointers \f(CW\*(C`(2*sizeof(I32 *))\*(C'\fR from
+\&\f(CW\*(C`CvROOT\*(C'\fR when it is set. The alternative to this approach of sneaking the slab
+into \f(CW\*(C`CvSTART\*(C'\fR during compilation would be to enlarge the \f(CW\*(C`xpvcv\*(C'\fR struct by
+another pointer. But that would make all CVs larger, even though slab-based op
+freeing is typically of benefit only for programs that make significant use of
+string eval.
+.PP
+When the \f(CW\*(C`CVf_SLABBED\*(C'\fR flag is set, the CV takes responsibility for freeing
+the slab. If \f(CW\*(C`CvROOT\*(C'\fR is not set when the CV is freed or undeffed, it is
+assumed that a compilation error has occurred, so the op slab is traversed and
+all the ops are freed.
+.PP
+Under normal circumstances, the CV forgets about its slab (decrementing the
+reference count) when the root is attached. So the slab reference counting that
+happens when ops are freed takes care of freeing the slab. In some cases, the
+CV is told to forget about the slab (\f(CW\*(C`cv_forget_slab\*(C'\fR) precisely so that the
+ops can survive after the CV is done away with.
+.PP
+Forgetting the slab when the root is attached is not strictly necessary, but
+avoids potential problems with \f(CW\*(C`CvROOT\*(C'\fR being written over. There is code all
+over the place, both in core and on CPAN, that does things with \f(CW\*(C`CvROOT\*(C'\fR, so
+forgetting the slab makes things more robust.
+.PP
+Since the CV takes ownership of its slab when flagged, that flag is never
+copied when a CV is cloned. Otherwise one CV could free a slab that another CV
+still points to, because forced freeing of ops ignores the reference count (but
+asserts that it looks right).
+.PP
+To avoid slab fragmentation, freed ops are marked as freed and attached to the
+slab's freed chain (an idea stolen from DBM::Deep). Those freed ops are reused
+when possible. Not reusing freed ops would be simpler, but it would result in
+significantly higher memory usage for programs with large \f(CW\*(C`if (DEBUG) {...}\*(C'\fR
+blocks.
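+.PP
+One way to picture the freed chain is as a singly linked list threaded
+through the released regions themselves. The sketch below is a
+simplification with hypothetical \f(CW\*(C`toy_\*(C'\fR names; in particular, the
+first-fit search is an assumption of the sketch rather than a statement
+about how Perl chooses a freed op to reuse.

```c
#include <assert.h>
#include <stddef.h>

/* Bookkeeping written into a released region. The region must be at
 * least sizeof(toy_freed) bytes and suitably aligned for it. */
typedef struct toy_freed {
    struct toy_freed *next;     /* next region on the freed chain */
    size_t            words;    /* capacity of this region */
} toy_freed;

/* Mark a region as freed and attach it to the head of the chain. */
static void toy_chain_push(toy_freed **chain, void *region, size_t words)
{
    toy_freed *f = region;
    assert(region != NULL);
    f->words = words;
    f->next  = *chain;
    *chain   = f;
}

/* Unlink and return the first region that can hold `words`, or NULL
 * if none fits and fresh slab space has to be used instead. */
static void *toy_chain_take(toy_freed **chain, size_t words)
{
    for (toy_freed **pp = chain; *pp != NULL; pp = &(*pp)->next) {
        if ((*pp)->words >= words) {
            toy_freed *hit = *pp;
            *pp = hit->next;
            return hit;
        }
    }
    return NULL;
}
```
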
+.PP
+\&\f(CW\*(C`SAVEFREEOP\*(C'\fR is slightly problematic under this scheme. Sometimes it can cause
+an op to be freed after its CV. If the CV has forcibly freed the ops on its
+slab and the slab itself, then we will be fiddling with a freed slab. Making
+\&\f(CW\*(C`SAVEFREEOP\*(C'\fR a no-op doesn't help, as sometimes an op can be savefreed when
+there is no compilation error, so the op would never be freed. It holds
+a reference count on the slab, so the whole slab would leak. So \f(CW\*(C`SAVEFREEOP\*(C'\fR
+now sets a special flag on the op (\f(CW\*(C`\->op_savefree\*(C'\fR). The forced freeing of
+ops after a compilation error won't free any ops thus marked.
+.PP
+Since many pieces of code create tiny subroutines consisting of only a few ops,
+and since a huge slab would be quite a bit of baggage for those to carry
+around, the first slab is always very small. To avoid allocating too many
+slabs for a single CV, each subsequent slab is twice the size of the previous.
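+.PP
+The growth rule is easy to state in code. In this sketch the starting size,
+the upper bound, and the \f(CW\*(C`toy_\*(C'\fR names are illustrative assumptions
+(the text above does not give Perl's actual constants, nor does it mention a
+cap); only the doubling itself comes from the description.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative sizes in pointer-sized words; both values are assumed. */
static const size_t TOY_FIRST_SLAB_WORDS = 64;
static const size_t TOY_MAX_SLAB_WORDS   = 2048;

/* Size of the slab that follows one of `prev_words` (0 = no slab yet). */
static size_t toy_next_slab_words(size_t prev_words)
{
    assert(prev_words <= TOY_MAX_SLAB_WORDS);
    if (prev_words == 0)
        return TOY_FIRST_SLAB_WORDS;        /* first slab stays small */
    if (prev_words >= TOY_MAX_SLAB_WORDS / 2)
        return TOY_MAX_SLAB_WORDS;          /* stop doubling at the cap */
    return prev_words * 2;                  /* otherwise double */
}

/* Number of slabs a CV needs to hold `total_words` of ops. */
static unsigned toy_slabs_needed(size_t total_words)
{
    unsigned n = 0;
    size_t size = 0, have = 0;
    while (have < total_words) {
        size = toy_next_slab_words(size);
        have += size;
        n++;
    }
    return n;
}
```

+.PP
+With these assumed sizes, a tiny subroutine fits in a single small slab,
+while the number of slabs grows only logarithmically with the size of the
+CV until the cap is reached.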
+.PP
+Smartmatch expects to be able to allocate an op at run time, run it, and then
+throw it away. For that to work the op is simply malloced when \f(CW\*(C`PL_compcv\*(C'\fR hasn't
+been set up. So all slab-allocated ops are marked as such (\f(CW\*(C`\->op_slabbed\*(C'\fR),
+to distinguish them from malloced ops.
+.SH AUTHORS
+.IX Header "AUTHORS"
+Until May 1997, this document was maintained by Jeff Okamoto
+<okamoto@corp.hp.com>. It is now maintained as part of Perl
+itself by the Perl 5 Porters <perl5\-porters@perl.org>.
+.PP
+With lots of help and suggestions from Dean Roehrich, Malcolm Beattie,
+Andreas Koenig, Paul Hudson, Ilya Zakharevich, Paul Marquess, Neil
+Bowers, Matthew Green, Tim Bunce, Spider Boardman, Ulrich Pfeifer,
+Stephen McCamant, and Gurusamy Sarathy.
+.SH "SEE ALSO"
+.IX Header "SEE ALSO"
+perlapi, perlintern, perlxs, perlembed