From fc22b3d6507c6745911b9dfcc68f1e665ae13dbc Mon Sep 17 00:00:00 2001 From: Daniel Baumann Date: Mon, 15 Apr 2024 21:43:11 +0200 Subject: Adding upstream version 4.22.0. Signed-off-by: Daniel Baumann --- upstream/fedora-rawhide/man1/perlguts.1 | 4465 +++++++++++++++++++++++++++++++ 1 file changed, 4465 insertions(+) create mode 100644 upstream/fedora-rawhide/man1/perlguts.1 (limited to 'upstream/fedora-rawhide/man1/perlguts.1') diff --git a/upstream/fedora-rawhide/man1/perlguts.1 b/upstream/fedora-rawhide/man1/perlguts.1 new file mode 100644 index 00000000..fe69a378 --- /dev/null +++ b/upstream/fedora-rawhide/man1/perlguts.1 @@ -0,0 +1,4465 @@ +.\" -*- mode: troff; coding: utf-8 -*- +.\" Automatically generated by Pod::Man 5.01 (Pod::Simple 3.43) +.\" +.\" Standard preamble: +.\" ======================================================================== +.de Sp \" Vertical space (when we can't use .PP) +.if t .sp .5v +.if n .sp +.. +.de Vb \" Begin verbatim text +.ft CW +.nf +.ne \\$1 +.. +.de Ve \" End verbatim text +.ft R +.fi +.. +.\" \*(C` and \*(C' are quotes in nroff, nothing in troff, for use with C<>. +.ie n \{\ +. ds C` "" +. ds C' "" +'br\} +.el\{\ +. ds C` +. ds C' +'br\} +.\" +.\" Escape single quotes in literal strings from groff's Unicode transform. +.ie \n(.g .ds Aq \(aq +.el .ds Aq ' +.\" +.\" If the F register is >0, we'll generate index entries on stderr for +.\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index +.\" entries marked with X<> in POD. Of course, you'll have to process the +.\" output yourself in some meaningful fashion. +.\" +.\" Avoid warning from groff about undefined register 'F'. +.de IX +.. +.nr rF 0 +.if \n(.g .if rF .nr rF 1 +.if (\n(rF:(\n(.g==0)) \{\ +. if \nF \{\ +. de IX +. tm Index:\\$1\t\\n%\t"\\$2" +.. +. if !\nF==2 \{\ +. nr % 0 +. nr F 2 +. \} +. 
\} +.\} +.rr rF +.\" ======================================================================== +.\" +.IX Title "PERLGUTS 1" +.TH PERLGUTS 1 2024-01-25 "perl v5.38.2" "Perl Programmers Reference Guide" +.\" For nroff, turn off justification. Always turn off hyphenation; it makes +.\" way too many mistakes in technical documents. +.if n .ad l +.nh +.SH NAME +perlguts \- Introduction to the Perl API +.SH DESCRIPTION +.IX Header "DESCRIPTION" +This document attempts to describe how to use the Perl API, as well as +to provide some info on the basic workings of the Perl core. It is far +from complete and probably contains many errors. Please refer any +questions or comments to the author below. +.SH Variables +.IX Header "Variables" +.SS Datatypes +.IX Subsection "Datatypes" +Perl has three typedefs that handle Perl's three main data types: +.PP +.Vb 3 +\& SV Scalar Value +\& AV Array Value +\& HV Hash Value +.Ve +.PP +Each typedef has specific routines that manipulate the various data types. +.SS "What is an ""IV""?" +.IX Subsection "What is an ""IV""?" +Perl uses a special typedef IV which is a simple signed integer type that is +guaranteed to be large enough to hold a pointer (as well as an integer). +Additionally, there is the UV, which is simply an unsigned IV. +.PP +Perl also uses several special typedefs to declare variables to hold +integers of (at least) a given size. +Use I8, I16, I32, and I64 to declare a signed integer variable which has +at least as many bits as the number in its name. These all evaluate to +the native C type that is closest to the given number of bits, but no +smaller than that number. For example, on many platforms, a \f(CW\*(C`short\*(C'\fR is +16 bits long, and if so, I16 will evaluate to a \f(CW\*(C`short\*(C'\fR. But on +platforms where a \f(CW\*(C`short\*(C'\fR isn't exactly 16 bits, Perl will use the +smallest type that contains 16 bits or more. +.PP +U8, U16, U32, and U64 are to declare the corresponding unsigned integer +types. 
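As a rough illustration of the "at least as many bits" guarantee described above, standard C99 offers a comparable promise through the least-width types in `<stdint.h>`. The sketch below uses only those standard types, not Perl's own typedefs; the names `my_i16`, `my_u32` and the helper functions are hypothetical and exist only for this example.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Standard C analogues of I16 and U32: the smallest native types that
 * have at least 16 and 32 bits. These are NOT Perl's typedefs. */
typedef int_least16_t  my_i16;   /* hypothetical name */
typedef uint_least32_t my_u32;   /* hypothetical name */

/* Storage width, in bits, of the types chosen on this platform. */
size_t my_i16_bits(void) { return sizeof(my_i16) * CHAR_BIT; }
size_t my_u32_bits(void) { return sizeof(my_u32) * CHAR_BIT; }

/* Returns 1 if my_i16 can hold the largest 16-bit signed value. */
int my_i16_holds_full_range(void)
{
    my_i16 v = INT_LEAST16_MAX;   /* guaranteed to be >= 32767 */
    return v >= 32767;
}
```

On a platform where `short` is exactly 16 bits, `my_i16_bits()` would typically report 16; elsewhere it may report more, which is the same trade-off the text describes for I16.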
+.PP +If the platform doesn't support 64\-bit integers, both I64 and U64 will +be undefined. Use IV and UV to declare the largest practicable, and +\&\f(CW\*(C`"WIDEST_UTYPE" in perlapi\*(C'\fR for the absolute maximum unsigned, but which +may not be usable in all circumstances. +.PP +A numeric constant can be specified with "\f(CW\*(C`INT16_C\*(C'\fR" in perlapi, +"\f(CW\*(C`UINTMAX_C\*(C'\fR" in perlapi, and similar. +.SS "Working with SVs" +.IX Subsection "Working with SVs" +An SV can be created and loaded with one command. There are five types of +values that can be loaded: an integer value (IV), an unsigned integer +value (UV), a double (NV), a string (PV), and another scalar (SV). +("PV" stands for "Pointer Value". You might think that it is misnamed +because it is described as pointing only to strings. However, it is +possible to have it point to other things. For example, it could point +to an array of UVs. But, +using it for non-strings requires care, as the underlying assumption of +much of the internals is that PVs are just for strings. Often, for +example, a trailing \f(CW\*(C`NUL\*(C'\fR is tacked on automatically. The non-string use +is documented only in this paragraph.) +.PP +The seven routines are: +.PP +.Vb 7 +\& SV* newSViv(IV); +\& SV* newSVuv(UV); +\& SV* newSVnv(double); +\& SV* newSVpv(const char*, STRLEN); +\& SV* newSVpvn(const char*, STRLEN); +\& SV* newSVpvf(const char*, ...); +\& SV* newSVsv(SV*); +.Ve +.PP +\&\f(CW\*(C`STRLEN\*(C'\fR is an integer type (\f(CW\*(C`Size_t\*(C'\fR, usually defined as \f(CW\*(C`size_t\*(C'\fR in +\&\fIconfig.h\fR) guaranteed to be large enough to represent the size of +any string that perl can handle. +.PP +In the unlikely case of a SV requiring more complex initialization, you +can create an empty SV with newSV(len). 
If \f(CW\*(C`len\*(C'\fR is 0 an empty SV of +type NULL is returned, else an SV of type PV is returned with len + 1 (for +the \f(CW\*(C`NUL\*(C'\fR) bytes of storage allocated, accessible via SvPVX. In both cases +the SV has the undef value. +.PP +.Vb 3 +\& SV *sv = newSV(0); /* no storage allocated */ +\& SV *sv = newSV(10); /* 10 (+1) bytes of uninitialised storage +\& * allocated */ +.Ve +.PP +To change the value of an \fIalready-existing\fR SV, there are eight routines: +.PP +.Vb 9 +\& void sv_setiv(SV*, IV); +\& void sv_setuv(SV*, UV); +\& void sv_setnv(SV*, double); +\& void sv_setpv(SV*, const char*); +\& void sv_setpvn(SV*, const char*, STRLEN); +\& void sv_setpvf(SV*, const char*, ...); +\& void sv_vsetpvfn(SV*, const char*, STRLEN, va_list *, +\& SV **, Size_t, bool *); +\& void sv_setsv(SV*, SV*); +.Ve +.PP +Notice that you can choose to specify the length of the string to be +assigned by using \f(CW\*(C`sv_setpvn\*(C'\fR, \f(CW\*(C`newSVpvn\*(C'\fR, or \f(CW\*(C`newSVpv\*(C'\fR, or you may +allow Perl to calculate the length by using \f(CW\*(C`sv_setpv\*(C'\fR or by specifying +0 as the second argument to \f(CW\*(C`newSVpv\*(C'\fR. Be warned, though, that Perl will +determine the string's length by using \f(CW\*(C`strlen\*(C'\fR, which depends on the +string terminating with a \f(CW\*(C`NUL\*(C'\fR character, and not otherwise containing +NULs. +.PP +The arguments of \f(CW\*(C`sv_setpvf\*(C'\fR are processed like \f(CW\*(C`sprintf\*(C'\fR, and the +formatted output becomes the value. +.PP +\&\f(CW\*(C`sv_vsetpvfn\*(C'\fR is an analogue of \f(CW\*(C`vsprintf\*(C'\fR, but it allows you to specify +either a pointer to a variable argument list or the address and length of +an array of SVs. The last argument points to a boolean; on return, if that +boolean is true, then locale-specific information has been used to format +the string, and the string's contents are therefore untrustworthy (see +perlsec). 
This pointer may be NULL if that information is not +important. Note that this function requires you to specify the length of +the format. +.PP +The \f(CW\*(C`sv_set*()\*(C'\fR functions are not generic enough to operate on values +that have "magic". See "Magic Virtual Tables" later in this document. +.PP +All SVs that contain strings should be terminated with a \f(CW\*(C`NUL\*(C'\fR character. +If it is not \f(CW\*(C`NUL\*(C'\fR\-terminated there is a risk of +core dumps and corruptions from code which passes the string to C +functions or system calls which expect a \f(CW\*(C`NUL\*(C'\fR\-terminated string. +Perl's own functions typically add a trailing \f(CW\*(C`NUL\*(C'\fR for this reason. +Nevertheless, you should be very careful when you pass a string stored +in an SV to a C function or system call. +.PP +To access the actual value that an SV points to, Perl's API exposes +several macros that coerce the actual scalar type into an IV, UV, double, +or string: +.IP \(bu 4 +\&\f(CWSvIV(SV*)\fR (\f(CW\*(C`IV\*(C'\fR) and \f(CWSvUV(SV*)\fR (\f(CW\*(C`UV\*(C'\fR) +.IP \(bu 4 +\&\f(CWSvNV(SV*)\fR (\f(CW\*(C`double\*(C'\fR) +.IP \(bu 4 +Strings are a bit complicated: +.RS 4 +.IP \(bu 4 +Byte string: \f(CW\*(C`SvPVbyte(SV*, STRLEN len)\*(C'\fR or \f(CWSvPVbyte_nolen(SV*)\fR +.Sp +If the Perl string is \f(CW"\exff\exff"\fR, then this returns a 2\-byte \f(CW\*(C`char*\*(C'\fR. +.Sp +This is suitable for Perl strings that represent bytes. +.IP \(bu 4 +UTF\-8 string: \f(CW\*(C`SvPVutf8(SV*, STRLEN len)\*(C'\fR or \f(CWSvPVutf8_nolen(SV*)\fR +.Sp +If the Perl string is \f(CW"\exff\exff"\fR, then this returns a 4\-byte \f(CW\*(C`char*\*(C'\fR. +.Sp +This is suitable for Perl strings that represent characters. +.Sp +\&\fBCAVEAT\fR: That \f(CW\*(C`char*\*(C'\fR will be encoded via Perl's internal UTF\-8 variant, +which means that if the SV contains non-Unicode code points (e.g., +0x110000), then the result may contain extensions over valid UTF\-8. 
+See "is_strict_utf8_string" in perlapi for some methods Perl gives +you to check the UTF\-8 validity of these macros' returns. +.IP \(bu 4 +You can also use \f(CW\*(C`SvPV(SV*, STRLEN len)\*(C'\fR or \f(CWSvPV_nolen(SV*)\fR +to fetch the SV's raw internal buffer. This is tricky, though; if your Perl +string +is \f(CW"\exff\exff"\fR, then depending on the SV's internal encoding you might get +back a 2\-byte \fBOR\fR a 4\-byte \f(CW\*(C`char*\*(C'\fR. +Moreover, if it's the 4\-byte string, that could come from either Perl +\&\f(CW"\exff\exff"\fR stored UTF\-8 encoded, or Perl \f(CW"\exc3\exbf\exc3\exbf"\fR stored +as raw octets. To differentiate between these you \fBMUST\fR look up the +SV's UTF8 bit (cf. \f(CW\*(C`SvUTF8\*(C'\fR) to know whether the source Perl string +is 2 characters (\f(CW\*(C`SvUTF8\*(C'\fR would be on) or 4 characters (\f(CW\*(C`SvUTF8\*(C'\fR would be +off). +.Sp +\&\fBIMPORTANT:\fR Use of \f(CW\*(C`SvPV\*(C'\fR, \f(CW\*(C`SvPV_nolen\*(C'\fR, or +similarly-named macros \fIwithout\fR looking up the SV's UTF8 bit is +almost certainly a bug if non-ASCII input is allowed. +.Sp +When the UTF8 bit is on, the same \fBCAVEAT\fR about UTF\-8 validity applies +here as for \f(CW\*(C`SvPVutf8\*(C'\fR. +.RE +.RS 4 +.Sp +(See "How do I pass a Perl string to a C library?" for more details.) +.Sp +In \f(CW\*(C`SvPVbyte\*(C'\fR, \f(CW\*(C`SvPVutf8\*(C'\fR, and \f(CW\*(C`SvPV\*(C'\fR, the length of the \f(CW\*(C`char*\*(C'\fR returned +is placed into the +variable \f(CW\*(C`len\*(C'\fR (these are macros, so you do \fInot\fR use \f(CW&len\fR). If you do +not care what the length of the data is, use \f(CW\*(C`SvPVbyte_nolen\*(C'\fR, +\&\f(CW\*(C`SvPVutf8_nolen\*(C'\fR, or \f(CW\*(C`SvPV_nolen\*(C'\fR instead. +The global variable \f(CW\*(C`PL_na\*(C'\fR can also be given to +\&\f(CW\*(C`SvPVbyte\*(C'\fR/\f(CW\*(C`SvPVutf8\*(C'\fR/\f(CW\*(C`SvPV\*(C'\fR +in this case. 
But that can be quite inefficient because \f(CW\*(C`PL_na\*(C'\fR must +be accessed in thread-local storage in threaded Perl. In any case, remember +that Perl allows arbitrary strings of data that may both contain NULs and +might not be terminated by a \f(CW\*(C`NUL\*(C'\fR. +.Sp +Also remember that C doesn't allow you to safely say \f(CW\*(C`foo(SvPVbyte(s, len), +len);\*(C'\fR. It might work with your +compiler, but it won't work for everyone. +Break this sort of statement up into separate assignments: +.Sp +.Vb 5 +\& SV *s; +\& STRLEN len; +\& char *ptr; +\& ptr = SvPVbyte(s, len); +\& foo(ptr, len); +.Ve +.RE +.PP +If you want to know if the scalar value is TRUE, you can use: +.PP +.Vb 1 +\& SvTRUE(SV*) +.Ve +.PP +Although Perl will automatically grow strings for you, if you need to force +Perl to allocate more memory for your SV, you can use the macro +.PP +.Vb 1 +\& SvGROW(SV*, STRLEN newlen) +.Ve +.PP +which will determine if more memory needs to be allocated. If so, it will +call the function \f(CW\*(C`sv_grow\*(C'\fR. Note that \f(CW\*(C`SvGROW\*(C'\fR can only increase, not +decrease, the allocated memory of an SV and that it does not automatically +add space for the trailing \f(CW\*(C`NUL\*(C'\fR byte (perl's own string functions typically do +\&\f(CW\*(C`SvGROW(sv, len + 1)\*(C'\fR). +.PP +If you want to write to an existing SV's buffer and set its value to a +string, use \fBSvPVbyte_force()\fR or one of its variants to force the SV to be +a PV. This will remove any of various types of non-stringness from +the SV while preserving the content of the SV in the PV. This can be +used, for example, to append data from an API function to a buffer +without extra copying: +.PP +.Vb 11 +\& (void)SvPVbyte_force(sv, len); +\& s = SvGROW(sv, len + needlen + 1); +\& /* something that modifies up to needlen bytes at s+len, but +\& modifies newlen bytes +\& eg. 
newlen = read(fd, s + len, needlen); +\& ignoring errors for these examples +\& */ +\& s[len + newlen] = \*(Aq\e0\*(Aq; +\& SvCUR_set(sv, len + newlen); +\& SvUTF8_off(sv); +\& SvSETMAGIC(sv); +.Ve +.PP +If you already have the data in memory or if you want to keep your +code simple, you can use one of the sv_cat*() variants, such as +\&\fBsv_catpvn()\fR. If you want to insert anywhere in the string you can use +\&\fBsv_insert()\fR or \fBsv_insert_flags()\fR. +.PP +If you don't need the existing content of the SV, you can avoid some +copying with: +.PP +.Vb 10 +\& SvPVCLEAR(sv); +\& s = SvGROW(sv, needlen + 1); +\& /* something that modifies up to needlen bytes at s, but modifies +\& newlen bytes +\& eg. newlen = read(fd, s, needlen); +\& */ +\& s[newlen] = \*(Aq\e0\*(Aq; +\& SvCUR_set(sv, newlen); +\& SvPOK_only(sv); /* also clears SVf_UTF8 */ +\& SvSETMAGIC(sv); +.Ve +.PP +Again, if you already have the data in memory or want to avoid the +complexity of the above, you can use \fBsv_setpvn()\fR. +.PP +If you have a buffer allocated with \fBNewx()\fR and want to set that as the +SV's value, you can use \fBsv_usepvn_flags()\fR. That has some requirements +if you want to avoid perl re-allocating the buffer to fit the trailing +NUL: +.PP +.Vb 5 +\& Newx(buf, somesize+1, char); +\& /* ... fill in buf ... */ +\& buf[somesize] = \*(Aq\e0\*(Aq; +\& sv_usepvn_flags(sv, buf, somesize, SV_SMAGIC | SV_HAS_TRAILING_NUL); +\& /* buf now belongs to perl, don\*(Aqt release it */ +.Ve +.PP +If you have an SV and want to know what kind of data Perl thinks is stored +in it, you can use the following macros to check the type of SV you have. +.PP +.Vb 3 +\& SvIOK(SV*) +\& SvNOK(SV*) +\& SvPOK(SV*) +.Ve +.PP +Be aware that retrieving the numeric value of an SV can set IOK or NOK +on that SV, even when the SV started as a string. Prior to Perl +5.36.0 retrieving the string value of an integer could set POK, but +this can no longer occur. 
From 5.36.0 this can be used to distinguish +the original representation of an SV and is intended to make life +simpler for serializers: +.PP +.Vb 10 +\& /* references handled elsewhere */ +\& if (SvIsBOOL(sv)) { +\& /* originally boolean */ +\& ... +\& } +\& else if (SvPOK(sv)) { +\& /* originally a string */ +\& ... +\& } +\& else if (SvNIOK(sv)) { +\& /* originally numeric */ +\& ... +\& } +\& else { +\& /* something special or undef */ +\& } +.Ve +.PP +You can get and set the current length of the string stored in an SV with +the following macros: +.PP +.Vb 2 +\& SvCUR(SV*) +\& SvCUR_set(SV*, STRLEN val) +.Ve +.PP +You can also get a pointer to the end of the string stored in the SV +with the macro: +.PP +.Vb 1 +\& SvEND(SV*) +.Ve +.PP +But note that these last three macros are valid only if \f(CWSvPOK()\fR is true. +.PP +If you want to append something to the end of the string stored in an \f(CW\*(C`SV*\*(C'\fR, +you can use the following functions: +.PP +.Vb 6 +\& void sv_catpv(SV*, const char*); +\& void sv_catpvn(SV*, const char*, STRLEN); +\& void sv_catpvf(SV*, const char*, ...); +\& void sv_vcatpvfn(SV*, const char*, STRLEN, va_list *, SV **, +\& Size_t, bool *); +\& void sv_catsv(SV*, SV*); +.Ve +.PP +The first function calculates the length of the string to be appended by +using \f(CW\*(C`strlen\*(C'\fR. In the second, you specify the length of the string +yourself. The third function processes its arguments like \f(CW\*(C`sprintf\*(C'\fR and +appends the formatted output. The fourth function works like \f(CW\*(C`vsprintf\*(C'\fR. +You can specify the address and length of an array of SVs instead of the +va_list argument. The fifth function +extends the string stored in the first +SV with the string stored in the second SV. It also forces the second SV +to be interpreted as a string. +.PP +The \f(CW\*(C`sv_cat*()\*(C'\fR functions are not generic enough to operate on values that +have "magic". See "Magic Virtual Tables" later in this document. 
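The practical difference between a strlen-measured append and an explicit-length append can be sketched in plain C. This is a toy model only, assuming a simple growable buffer: `ToyStr`, `toy_catpv` and `toy_catpvn` are hypothetical names that merely mirror the shape of \f(CW\*(C`sv_catpv\*(C'\fR and \f(CW\*(C`sv_catpvn\*(C'\fR; they are not part of the Perl API and do not reflect how an SV is implemented.

```c
#include <stdlib.h>
#include <string.h>

/* Toy growable string; a stand-in for illustration, not Perl's SV. */
typedef struct {
    char  *buf;   /* NUL-terminated once anything has been appended */
    size_t cur;   /* bytes in use, excluding the trailing NUL */
    size_t len;   /* bytes allocated */
} ToyStr;

/* Append exactly n bytes (explicit-length style). Returns 0 on success. */
int toy_catpvn(ToyStr *s, const char *src, size_t n)
{
    size_t need = s->cur + n + 1;          /* +1 for the trailing NUL */
    if (need > s->len) {
        char *p = realloc(s->buf, need);   /* realloc(NULL, n) allocates */
        if (p == NULL)
            return -1;
        s->buf = p;
        s->len = need;
    }
    memcpy(s->buf + s->cur, src, n);
    s->cur += n;
    s->buf[s->cur] = '\0';
    return 0;
}

/* Append a C string measured with strlen, so it stops at the first NUL. */
int toy_catpv(ToyStr *s, const char *src)
{
    return toy_catpvn(s, src, strlen(src));
}
```

Given the five bytes `"ab\0cd"`, the strlen-based call appends only the two bytes before the embedded NUL, whereas the explicit-length call with a length of 5 appends all of them. A buffer starts out as `ToyStr s = {NULL, 0, 0};` and is released with `free(s.buf)`.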
+.PP +If you know the name of a scalar variable, you can get a pointer to its SV +by using the following: +.PP +.Vb 1 +\& SV* get_sv("package::varname", 0); +.Ve +.PP +This returns NULL if the variable does not exist. +.PP +If you want to know if this variable (or any other SV) is actually \f(CW\*(C`defined\*(C'\fR, +you can call: +.PP +.Vb 1 +\& SvOK(SV*) +.Ve +.PP +The scalar \f(CW\*(C`undef\*(C'\fR value is stored in an SV instance called \f(CW\*(C`PL_sv_undef\*(C'\fR. +.PP +Its address can be used whenever an \f(CW\*(C`SV*\*(C'\fR is needed. Make sure that +you don't try to compare a random sv with \f(CW&PL_sv_undef\fR. For example, +when interfacing with Perl code, the comparison will work correctly for: +.PP +.Vb 1 +\& foo(undef); +.Ve +.PP +But it won't work when called as: +.PP +.Vb 2 +\& $x = undef; +\& foo($x); +.Ve +.PP +So, to repeat, always use \fBSvOK()\fR to check whether an sv is defined. +.PP +Also, you have to be careful when using \f(CW&PL_sv_undef\fR as a value in +AVs or HVs (see "AVs, HVs and undefined values"). +.PP +There are also the two values \f(CW\*(C`PL_sv_yes\*(C'\fR and \f(CW\*(C`PL_sv_no\*(C'\fR, which contain +boolean TRUE and FALSE values, respectively. Like \f(CW\*(C`PL_sv_undef\*(C'\fR, their +addresses can be used whenever an \f(CW\*(C`SV*\*(C'\fR is needed. +.PP +Do not be fooled into thinking that \f(CW\*(C`(SV *) 0\*(C'\fR is the same as \f(CW&PL_sv_undef\fR. +Take this code: +.PP +.Vb 5 +\& SV* sv = (SV*) 0; +\& if (I\-am\-to\-return\-a\-real\-value) { +\& sv = sv_2mortal(newSViv(42)); +\& } +\& sv_setsv(ST(0), sv); +.Ve +.PP +This code tries to return a new SV (which contains the value 42) if it should +return a real value, or undef otherwise. Instead it has returned a NULL +pointer which, somewhere down the line, will cause a segmentation violation, +bus error, or just weird results. Change the zero to \f(CW&PL_sv_undef\fR in the +first line and all will be well. +.PP +To free an SV that you've created, call \f(CWSvREFCNT_dec(SV*)\fR. 
Normally this +call is not necessary (see "Reference Counts and Mortality"). +.SS Offsets +.IX Subsection "Offsets" +Perl provides the function \f(CW\*(C`sv_chop\*(C'\fR to efficiently remove characters +from the beginning of a string; you give it an SV and a pointer to +somewhere inside the PV, and it discards everything before the +pointer. The efficiency comes by means of a little hack: instead of +actually removing the characters, \f(CW\*(C`sv_chop\*(C'\fR sets the flag \f(CW\*(C`OOK\*(C'\fR +(offset OK) to signal to other functions that the offset hack is in +effect, and it moves the PV pointer (called \f(CW\*(C`SvPVX\*(C'\fR) forward +by the number of bytes chopped off, and adjusts \f(CW\*(C`SvCUR\*(C'\fR and \f(CW\*(C`SvLEN\*(C'\fR +accordingly. (A portion of the space between the old and new PV +pointers is used to store the count of chopped bytes.) +.PP +Hence, at this point, the start of the buffer that we allocated lives +at \f(CW\*(C`SvPVX(sv) \- SvIV(sv)\*(C'\fR in memory and the PV pointer is pointing +into the middle of this allocated storage. +.PP +This is best demonstrated by example. Normally copy-on-write will prevent +the substitution operator from using this hack, but if you can craft a +string for which copy-on-write is not possible, you can see it in play. In +the current implementation, the final byte of a string buffer is used as a +copy-on-write reference count. If the buffer is not big enough, then +copy-on-write is skipped. First have a look at an empty string: +.PP +.Vb 7 +\& % ./perl \-Ilib \-MDevel::Peek \-le \*(Aq$a=""; $a .= ""; Dump $a\*(Aq +\& SV = PV(0x7ffb7c008a70) at 0x7ffb7c030390 +\& REFCNT = 1 +\& FLAGS = (POK,pPOK) +\& PV = 0x7ffb7bc05b50 ""\e0 +\& CUR = 0 +\& LEN = 10 +.Ve +.PP +Notice here the LEN is 10. (It may differ on your platform.) 
Extend the +length of the string to one less than 10, and do a substitution: +.PP +.Vb 9 +\& % ./perl \-Ilib \-MDevel::Peek \-le \*(Aq$a=""; $a.="123456789"; $a=~s/.//; \e +\& Dump($a)\*(Aq +\& SV = PV(0x7ffa04008a70) at 0x7ffa04030390 +\& REFCNT = 1 +\& FLAGS = (POK,OOK,pPOK) +\& OFFSET = 1 +\& PV = 0x7ffa03c05b61 ( "\e1" . ) "23456789"\e0 +\& CUR = 8 +\& LEN = 9 +.Ve +.PP +Here the number of bytes chopped off (1) is shown next as the OFFSET. The +portion of the string between the "real" and the "fake" beginnings is +shown in parentheses, and the values of \f(CW\*(C`SvCUR\*(C'\fR and \f(CW\*(C`SvLEN\*(C'\fR reflect +the fake beginning, not the real one. (The first character of the string +buffer happens to have changed to "\e1" here, not "1", because the current +implementation stores the offset count in the string buffer. This is +subject to change.) +.PP +Something similar to the offset hack is performed on AVs to enable +efficient shifting and splicing off the beginning of the array; while +\&\f(CW\*(C`AvARRAY\*(C'\fR points to the first element in the array that is visible from +Perl, \f(CW\*(C`AvALLOC\*(C'\fR points to the real start of the C array. These are +usually the same, but a \f(CW\*(C`shift\*(C'\fR operation can be carried out by +increasing \f(CW\*(C`AvARRAY\*(C'\fR by one and decreasing \f(CW\*(C`AvFILL\*(C'\fR and \f(CW\*(C`AvMAX\*(C'\fR. +Again, the location of the real start of the C array only comes into +play when freeing the array. See \f(CW\*(C`av_shift\*(C'\fR in \fIav.c\fR. +.SS "What's Really Stored in an SV?" +.IX Subsection "What's Really Stored in an SV?" +Recall that the usual method of determining the type of scalar you have is +to use \f(CW\*(C`Sv*OK\*(C'\fR macros. Because a scalar can be both a number and a string, +usually these macros will always return TRUE and calling the \f(CW\*(C`Sv*V\*(C'\fR +macros will do the appropriate conversion of string to integer/double or +integer/double to string. 
+.PP +If you \fIreally\fR need to know if you have an integer, double, or string +pointer in an SV, you can use the following three macros instead: +.PP +.Vb 3 +\& SvIOKp(SV*) +\& SvNOKp(SV*) +\& SvPOKp(SV*) +.Ve +.PP +These will tell you if you truly have an integer, double, or string pointer +stored in your SV. The "p" stands for private. +.PP +There are various ways in which the private and public flags may differ. +For example, in perl 5.16 and earlier a tied SV may have a valid +underlying value in the IV slot (so SvIOKp is true), but the data +should be accessed via the FETCH routine rather than directly, +so SvIOK is false. (In perl 5.18 onwards, tied scalars use +the flags the same way as untied scalars.) Another is when +numeric conversion has occurred and precision has been lost: only the +private flag is set on 'lossy' values. So when an NV is converted to an +IV with loss, SvIOKp, SvNOKp and SvNOK will be set, while SvIOK won't be. +.PP +In general, though, it's best to use the \f(CW\*(C`Sv*V\*(C'\fR macros. +.SS "Working with AVs" +.IX Subsection "Working with AVs" +There are two main, longstanding ways to create and load an AV. The first +method creates an empty AV: +.PP +.Vb 1 +\& AV* newAV(); +.Ve +.PP +The second method both creates the AV and initially populates it with SVs: +.PP +.Vb 1 +\& AV* av_make(SSize_t num, SV **ptr); +.Ve +.PP +The second argument points to an array containing \f(CW\*(C`num\*(C'\fR \f(CW\*(C`SV*\*(C'\fR's. Once the +AV has been created, the SVs can be destroyed, if so desired. +.PP +Perl v5.36 added two new ways to create an AV and allocate an SV** array +without populating it. These are more efficient than a \fBnewAV()\fR followed by an +\&\fBav_extend()\fR. 
+.PP +.Vb 4 +\& /* Creates but does not initialize (Zero) the SV** array */ +\& AV *av = newAV_alloc_x(1); +\& /* Creates and does initialize (Zero) the SV** array */ +\& AV *av = newAV_alloc_xz(1); +.Ve +.PP +The numerical argument refers to the number of array elements to allocate, not +an array index, and must be >0. The first form must only ever be used when all +elements will be initialized before any read occurs. Reading a non-initialized +SV* \- i.e. treating a random memory address as a SV* \- is a serious bug. +.PP +Once the AV has been created, the following operations are possible on it: +.PP +.Vb 4 +\& void av_push(AV*, SV*); +\& SV* av_pop(AV*); +\& SV* av_shift(AV*); +\& void av_unshift(AV*, SSize_t num); +.Ve +.PP +These should be familiar operations, with the exception of \f(CW\*(C`av_unshift\*(C'\fR. +This routine adds \f(CW\*(C`num\*(C'\fR elements at the front of the array with the \f(CW\*(C`undef\*(C'\fR +value. You must then use \f(CW\*(C`av_store\*(C'\fR (described below) to assign values +to these new elements. +.PP +Here are some other functions: +.PP +.Vb 3 +\& SSize_t av_top_index(AV*); +\& SV** av_fetch(AV*, SSize_t key, I32 lval); +\& SV** av_store(AV*, SSize_t key, SV* val); +.Ve +.PP +The \f(CW\*(C`av_top_index\*(C'\fR function returns the highest index value in an array (just +like $#array in Perl). If the array is empty, \-1 is returned. The +\&\f(CW\*(C`av_fetch\*(C'\fR function returns the value at index \f(CW\*(C`key\*(C'\fR, but if \f(CW\*(C`lval\*(C'\fR +is non-zero, then \f(CW\*(C`av_fetch\*(C'\fR will store an undef value at that index. +The \f(CW\*(C`av_store\*(C'\fR function stores the value \f(CW\*(C`val\*(C'\fR at index \f(CW\*(C`key\*(C'\fR, and does +not increment the reference count of \f(CW\*(C`val\*(C'\fR. Thus the caller is responsible +for taking care of that, and if \f(CW\*(C`av_store\*(C'\fR returns NULL, the caller will +have to decrement the reference count to avoid a memory leak. 
Note that +\&\f(CW\*(C`av_fetch\*(C'\fR and \f(CW\*(C`av_store\*(C'\fR both return \f(CW\*(C`SV**\*(C'\fR's, not \f(CW\*(C`SV*\*(C'\fR's as their +return value. +.PP +A few more: +.PP +.Vb 3 +\& void av_clear(AV*); +\& void av_undef(AV*); +\& void av_extend(AV*, SSize_t key); +.Ve +.PP +The \f(CW\*(C`av_clear\*(C'\fR function deletes all the elements in the AV* array, but +does not actually delete the array itself. The \f(CW\*(C`av_undef\*(C'\fR function will +delete all the elements in the array plus the array itself. The +\&\f(CW\*(C`av_extend\*(C'\fR function extends the array so that it contains at least \f(CW\*(C`key+1\*(C'\fR +elements. If \f(CW\*(C`key+1\*(C'\fR is less than the currently allocated length of the array, +then nothing is done. +.PP +If you know the name of an array variable, you can get a pointer to its AV +by using the following: +.PP +.Vb 1 +\& AV* get_av("package::varname", 0); +.Ve +.PP +This returns NULL if the variable does not exist. +.PP +See "Understanding the Magic of Tied Hashes and Arrays" for more +information on how to use the array access functions on tied arrays. +.PP +\fIMore efficient working with new or vanilla AVs\fR +.IX Subsection "More efficient working with new or vanilla AVs" +.PP +Perl v5.36 and v5.38 introduced streamlined, inlined versions of some +functions: +.IP \(bu 4 +\&\f(CW\*(C`av_store_simple\*(C'\fR +.IP \(bu 4 +\&\f(CW\*(C`av_fetch_simple\*(C'\fR +.IP \(bu 4 +\&\f(CW\*(C`av_push_simple\*(C'\fR +.PP +These are drop-in replacements, but can only be used on straightforward +AVs that meet the following criteria: +.IP \(bu 4 +are not magical +.IP \(bu 4 +are not readonly +.IP \(bu 4 +are "real" (refcounted) AVs +.IP \(bu 4 +have an av_top_index value > \-2 +.PP +AVs created using \f(CWnewAV()\fR, \f(CW\*(C`av_make\*(C'\fR, \f(CW\*(C`newAV_alloc_x\*(C'\fR, and +\&\f(CW\*(C`newAV_alloc_xz\*(C'\fR are all compatible at the time of creation. 
It is +only if they are declared readonly or unreal, have magic attached, or +are otherwise configured unusually that they will stop being compatible. +.PP +Note that some interpreter functions may attach magic to an AV as part +of normal operations. It is therefore safest, unless you are sure of the +lifecycle of an AV, to only use these new functions close to the point +of AV creation. +.SS "Working with HVs" +.IX Subsection "Working with HVs" +To create an HV, you use the following routine: +.PP +.Vb 1 +\& HV* newHV(); +.Ve +.PP +Once the HV has been created, the following operations are possible on it: +.PP +.Vb 2 +\& SV** hv_store(HV*, const char* key, U32 klen, SV* val, U32 hash); +\& SV** hv_fetch(HV*, const char* key, U32 klen, I32 lval); +.Ve +.PP +The \f(CW\*(C`klen\*(C'\fR parameter is the length of the key being passed in (Note that +you cannot pass 0 in as a value of \f(CW\*(C`klen\*(C'\fR to tell Perl to measure the +length of the key). The \f(CW\*(C`val\*(C'\fR argument contains the SV pointer to the +scalar being stored, and \f(CW\*(C`hash\*(C'\fR is the precomputed hash value (zero if +you want \f(CW\*(C`hv_store\*(C'\fR to calculate it for you). The \f(CW\*(C`lval\*(C'\fR parameter +indicates whether this fetch is actually a part of a store operation, in +which case a new undefined value will be added to the HV with the supplied +key and \f(CW\*(C`hv_fetch\*(C'\fR will return as if the value had already existed. +.PP +Remember that \f(CW\*(C`hv_store\*(C'\fR and \f(CW\*(C`hv_fetch\*(C'\fR return \f(CW\*(C`SV**\*(C'\fR's and not just +\&\f(CW\*(C`SV*\*(C'\fR. To access the scalar value, you must first dereference the return +value. However, you should check to make sure that the return value is +not NULL before dereferencing it. +.PP +The first of these two functions checks if a hash table entry exists, and the +second deletes it. 
+.PP +.Vb 2 +\& bool hv_exists(HV*, const char* key, U32 klen); +\& SV* hv_delete(HV*, const char* key, U32 klen, I32 flags); +.Ve +.PP +If \f(CW\*(C`flags\*(C'\fR does not include the \f(CW\*(C`G_DISCARD\*(C'\fR flag then \f(CW\*(C`hv_delete\*(C'\fR will +create and return a mortal copy of the deleted value. +.PP +And more miscellaneous functions: +.PP +.Vb 2 +\& void hv_clear(HV*); +\& void hv_undef(HV*); +.Ve +.PP +Like their AV counterparts, \f(CW\*(C`hv_clear\*(C'\fR deletes all the entries in the hash +table but does not actually delete the hash table. The \f(CW\*(C`hv_undef\*(C'\fR deletes +both the entries and the hash table itself. +.PP +Perl keeps the actual data in a linked list of structures with a typedef of HE. +These contain the actual key and value pointers (plus extra administrative +overhead). The key is a string pointer; the value is an \f(CW\*(C`SV*\*(C'\fR. However, +once you have an \f(CW\*(C`HE*\*(C'\fR, to get the actual key and value, use the routines +specified below. +.PP +.Vb 10 +\& I32 hv_iterinit(HV*); +\& /* Prepares starting point to traverse hash table */ +\& HE* hv_iternext(HV*); +\& /* Get the next entry, and return a pointer to a +\& structure that has both the key and value */ +\& char* hv_iterkey(HE* entry, I32* retlen); +\& /* Get the key from an HE structure and also return +\& the length of the key string */ +\& SV* hv_iterval(HV*, HE* entry); +\& /* Return an SV pointer to the value of the HE +\& structure */ +\& SV* hv_iternextsv(HV*, char** key, I32* retlen); +\& /* This convenience routine combines hv_iternext, +\& hv_iterkey, and hv_iterval. The key and retlen +\& arguments are return values for the key and its +\& length. The value is returned in the SV* argument */ +.Ve +.PP +If you know the name of a hash variable, you can get a pointer to its HV +by using the following: +.PP +.Vb 1 +\& HV* get_hv("package::varname", 0); +.Ve +.PP +This returns NULL if the variable does not exist. 
+.PP +The hash algorithm is defined in the \f(CW\*(C`PERL_HASH\*(C'\fR macro: +.PP +.Vb 1 +\& PERL_HASH(hash, key, klen) +.Ve +.PP +The exact implementation of this macro varies by architecture and version +of perl, and the return value may change per invocation, so the value +is only valid for the duration of a single perl process. +.PP +See "Understanding the Magic of Tied Hashes and Arrays" for more +information on how to use the hash access functions on tied hashes. +.SS "Hash API Extensions" +.IX Subsection "Hash API Extensions" +Beginning with version 5.004, the following functions are also supported: +.PP +.Vb 2 +\& HE* hv_fetch_ent (HV* tb, SV* key, I32 lval, U32 hash); +\& HE* hv_store_ent (HV* tb, SV* key, SV* val, U32 hash); +\& +\& bool hv_exists_ent (HV* tb, SV* key, U32 hash); +\& SV* hv_delete_ent (HV* tb, SV* key, I32 flags, U32 hash); +\& +\& SV* hv_iterkeysv (HE* entry); +.Ve +.PP +Note that these functions take \f(CW\*(C`SV*\*(C'\fR keys, which simplifies writing +of extension code that deals with hash structures. These functions +also allow passing of \f(CW\*(C`SV*\*(C'\fR keys to \f(CW\*(C`tie\*(C'\fR functions without forcing +you to stringify the keys (unlike the previous set of functions). +.PP +They also return and accept whole hash entries (\f(CW\*(C`HE*\*(C'\fR), making their +use more efficient (since the hash number for a particular string +doesn't have to be recomputed every time). See perlapi for detailed +descriptions. +.PP +The following macros must always be used to access the contents of hash +entries. Note that the arguments to these macros must be simple +variables, since they may get evaluated more than once. See +perlapi for detailed descriptions of these macros. 
+.PP +.Vb 6 +\& HePV(HE* he, STRLEN len) +\& HeVAL(HE* he) +\& HeHASH(HE* he) +\& HeSVKEY(HE* he) +\& HeSVKEY_force(HE* he) +\& HeSVKEY_set(HE* he, SV* sv) +.Ve +.PP +These two lower level macros are defined, but must only be used when +dealing with keys that are not \f(CW\*(C`SV*\*(C'\fRs: +.PP +.Vb 2 +\& HeKEY(HE* he) +\& HeKLEN(HE* he) +.Ve +.PP +Note that both \f(CW\*(C`hv_store\*(C'\fR and \f(CW\*(C`hv_store_ent\*(C'\fR do not increment the +reference count of the stored \f(CW\*(C`val\*(C'\fR, which is the caller's responsibility. +If these functions return a NULL value, the caller will usually have to +decrement the reference count of \f(CW\*(C`val\*(C'\fR to avoid a memory leak. +.SS "AVs, HVs and undefined values" +.IX Subsection "AVs, HVs and undefined values" +Sometimes you have to store undefined values in AVs or HVs. Although +this may be a rare case, it can be tricky. That's because you're +used to using \f(CW&PL_sv_undef\fR if you need an undefined SV. +.PP +For example, intuition tells you that this XS code: +.PP +.Vb 2 +\& AV *av = newAV(); +\& av_store( av, 0, &PL_sv_undef ); +.Ve +.PP +is equivalent to this Perl code: +.PP +.Vb 2 +\& my @av; +\& $av[0] = undef; +.Ve +.PP +Unfortunately, this isn't true. In perl 5.18 and earlier, AVs use \f(CW&PL_sv_undef\fR as a marker +for indicating that an array element has not yet been initialized. +Thus, \f(CW\*(C`exists $av[0]\*(C'\fR would be true for the above Perl code, but +false for the array generated by the XS code. In perl 5.20, storing +&PL_sv_undef will create a read-only element, because the scalar +&PL_sv_undef itself is stored, not a copy. 
+.PP +Similar problems can occur when storing \f(CW&PL_sv_undef\fR in HVs: +.PP +.Vb 1 +\& hv_store( hv, "key", 3, &PL_sv_undef, 0 ); +.Ve +.PP +This will indeed make the value \f(CW\*(C`undef\*(C'\fR, but if you try to modify +the value of \f(CW\*(C`key\*(C'\fR, you'll get the following error: +.PP +.Vb 1 +\& Modification of non\-creatable hash value attempted +.Ve +.PP +In perl 5.8.0, \f(CW&PL_sv_undef\fR was also used to mark placeholders +in restricted hashes. This caused such hash entries not to appear +when iterating over the hash or when checking for the keys +with the \f(CW\*(C`hv_exists\*(C'\fR function. +.PP +You can run into similar problems when you store \f(CW&PL_sv_yes\fR or +\&\f(CW&PL_sv_no\fR into AVs or HVs. Trying to modify such elements +will give you the following error: +.PP +.Vb 1 +\& Modification of a read\-only value attempted +.Ve +.PP +To make a long story short, you can use the special variables +\&\f(CW&PL_sv_undef\fR, \f(CW&PL_sv_yes\fR and \f(CW&PL_sv_no\fR with AVs and +HVs, but you have to make sure you know what you're doing. +.PP +Generally, if you want to store an undefined value in an AV +or HV, you should not use \f(CW&PL_sv_undef\fR, but rather create a +new undefined value using the \f(CW\*(C`newSV\*(C'\fR function, for example: +.PP +.Vb 2 +\& av_store( av, 42, newSV(0) ); +\& hv_store( hv, "foo", 3, newSV(0), 0 ); +.Ve +.SS References +.IX Subsection "References" +References are a special type of scalar that point to other data types +(including other references). +.PP +To create a reference, use either of the following functions: +.PP +.Vb 2 +\& SV* newRV_inc((SV*) thing); +\& SV* newRV_noinc((SV*) thing); +.Ve +.PP +The \f(CW\*(C`thing\*(C'\fR argument can be any of an \f(CW\*(C`SV*\*(C'\fR, \f(CW\*(C`AV*\*(C'\fR, or \f(CW\*(C`HV*\*(C'\fR. The +functions are identical except that \f(CW\*(C`newRV_inc\*(C'\fR increments the reference +count of the \f(CW\*(C`thing\*(C'\fR, while \f(CW\*(C`newRV_noinc\*(C'\fR does not. 
For historical +reasons, \f(CW\*(C`newRV\*(C'\fR is a synonym for \f(CW\*(C`newRV_inc\*(C'\fR. +.PP +Once you have a reference, you can use the following macro to dereference +the reference: +.PP +.Vb 1 +\& SvRV(SV*) +.Ve +.PP +then call the appropriate routines, casting the returned \f(CW\*(C`SV*\*(C'\fR to either an +\&\f(CW\*(C`AV*\*(C'\fR or \f(CW\*(C`HV*\*(C'\fR, if required. +.PP +To determine if an SV is a reference, you can use the following macro: +.PP +.Vb 1 +\& SvROK(SV*) +.Ve +.PP +To discover what type of value the reference refers to, use the following +macro and then check the return value. +.PP +.Vb 1 +\& SvTYPE(SvRV(SV*)) +.Ve +.PP +The most useful types that will be returned are: +.PP +.Vb 4 +\& SVt_PVAV Array +\& SVt_PVHV Hash +\& SVt_PVCV Code +\& SVt_PVGV Glob (possibly a file handle) +.Ve +.PP +Any numerical value returned which is less than SVt_PVAV will be a scalar +of some form. +.PP +See "svtype" in perlapi for more details. +.SS "Blessed References and Class Objects" +.IX Subsection "Blessed References and Class Objects" +References are also used to support object-oriented programming. In perl's +OO lexicon, an object is simply a reference that has been blessed into a +package (or class). Once the reference has been blessed, the programmer may +use it to access the various methods in the class. +.PP +A reference can be blessed into a package with the following function: +.PP +.Vb 1 +\& SV* sv_bless(SV* sv, HV* stash); +.Ve +.PP +The \f(CW\*(C`sv\*(C'\fR argument must be a reference value. The \f(CW\*(C`stash\*(C'\fR argument +specifies which class the reference will belong to. See +"Stashes and Globs" for information on converting class names into stashes. +.PP +/* Still under construction */ +.PP +The following function upgrades \f(CW\*(C`rv\*(C'\fR to a reference if it is not already +one, and creates a new SV for \f(CW\*(C`rv\*(C'\fR to point to. If \f(CW\*(C`classname\*(C'\fR is non-null, the SV +is blessed into the specified class. The new SV is returned.
+.PP +.Vb 1 +\& SV* newSVrv(SV* rv, const char* classname); +.Ve +.PP +The following three functions copy an integer, an unsigned integer, or a double +into an SV whose reference is \f(CW\*(C`rv\*(C'\fR. The SV is blessed if \f(CW\*(C`classname\*(C'\fR is +non-null. +.PP +.Vb 3 +\& SV* sv_setref_iv(SV* rv, const char* classname, IV iv); +\& SV* sv_setref_uv(SV* rv, const char* classname, UV uv); +\& SV* sv_setref_nv(SV* rv, const char* classname, NV nv); +.Ve +.PP +The following function copies the pointer value (\fIthe address, not the +string!\fR) into an SV whose reference is \f(CW\*(C`rv\*(C'\fR. The SV is blessed if \f(CW\*(C`classname\*(C'\fR +is non-null. +.PP +.Vb 1 +\& SV* sv_setref_pv(SV* rv, const char* classname, void* pv); +.Ve +.PP +The following function copies a string into an SV whose reference is \f(CW\*(C`rv\*(C'\fR. +The length of the string must be given in \f(CW\*(C`length\*(C'\fR. The SV is blessed if +\&\f(CW\*(C`classname\*(C'\fR is non-null. +.PP +.Vb 2 +\& SV* sv_setref_pvn(SV* rv, const char* classname, char* pv, +\& STRLEN length); +.Ve +.PP +The following function tests whether the SV is blessed into the specified +class. It does not check inheritance relationships. +.PP +.Vb 1 +\& int sv_isa(SV* sv, const char* name); +.Ve +.PP +The following function tests whether the SV is a reference to a blessed object. +.PP +.Vb 1 +\& int sv_isobject(SV* sv); +.Ve +.PP +The following function tests whether the SV is derived from the specified +class. The SV can be either a reference to a blessed object or a string +containing a class name. This is the function implementing the +\&\f(CW\*(C`UNIVERSAL::isa\*(C'\fR functionality. +.PP +.Vb 1 +\& bool sv_derived_from(SV* sv, const char* name); +.Ve +.PP +To check if you've got an object derived from a specific class you have +to write: +.PP +.Vb 1 +\& if (sv_isobject(sv) && sv_derived_from(sv, class)) { ...
} +.Ve +.SS "Creating New Variables" +.IX Subsection "Creating New Variables" +To create a new Perl variable with an undef value which can be accessed from +your Perl script, use the following routines, depending on the variable type. +.PP +.Vb 3 +\& SV* get_sv("package::varname", GV_ADD); +\& AV* get_av("package::varname", GV_ADD); +\& HV* get_hv("package::varname", GV_ADD); +.Ve +.PP +Notice the use of GV_ADD as the second parameter. The new variable can now +be set, using the routines appropriate to the data type. +.PP +There are additional macros whose values may be bitwise OR'ed with the +\&\f(CW\*(C`GV_ADD\*(C'\fR argument to enable certain extra features. Those bits are: +.IP GV_ADDMULTI 4 +.IX Item "GV_ADDMULTI" +Marks the variable as multiply defined, thus preventing the: +.Sp +.Vb 1 +\& Name used only once: possible typo +.Ve +.Sp +warning. +.IP GV_ADDWARN 4 +.IX Item "GV_ADDWARN" +Issues the warning: +.Sp +.Vb 1 +\& Had to create unexpectedly +.Ve +.Sp +if the variable did not exist before the function was called. +.PP +If you do not specify a package name, the variable is created in the current +package. +.SS "Reference Counts and Mortality" +.IX Subsection "Reference Counts and Mortality" +Perl uses a reference count-driven garbage collection mechanism. SVs, +AVs, or HVs (xV for short in the following) start their life with a +reference count of 1. If the reference count of an xV ever drops to 0, +then it will be destroyed and its memory made available for reuse. +At the most basic internal level, reference counts can be manipulated +with the following macros: +.PP +.Vb 3 +\& int SvREFCNT(SV* sv); +\& SV* SvREFCNT_inc(SV* sv); +\& void SvREFCNT_dec(SV* sv); +.Ve +.PP +(There are also suffixed versions of the increment and decrement macros, +for situations where the full generality of these basic macros can be +exchanged for some performance.) 
+.PP +However, the way a programmer should think about references is not so +much in terms of the bare reference count, but in terms of \fIownership\fR +of references. A reference to an xV can be owned by any of a variety +of entities: another xV, the Perl interpreter, an XS data structure, +a piece of running code, or a dynamic scope. An xV generally does not +know what entities own the references to it; it only knows how many +references there are, which is the reference count. +.PP +To correctly maintain reference counts, it is essential to keep track +of what references the XS code is manipulating. The programmer should +always know where a reference has come from and who owns it, and be +aware of any creation or destruction of references, and any transfers +of ownership. Because ownership isn't represented explicitly in the xV +data structures, only the reference count need be actually maintained +by the code, and that means that this understanding of ownership is not +actually evident in the code. For example, transferring ownership of a +reference from one owner to another doesn't change the reference count +at all, so may be achieved with no actual code. (The transferring code +doesn't touch the referenced object, but does need to ensure that the +former owner knows that it no longer owns the reference, and that the +new owner knows that it now does.) +.PP +An xV that is visible at the Perl level should not become unreferenced +and thus be destroyed. Normally, an object will only become unreferenced +when it is no longer visible, often by the same means that makes it +invisible. For example, a Perl reference value (RV) owns a reference to +its referent, so if the RV is overwritten that reference gets destroyed, +and the no-longer-reachable referent may be destroyed as a result. +.PP +Many functions have some kind of reference manipulation as +part of their purpose. 
Sometimes this is documented in terms +of ownership of references, and sometimes it is (less helpfully) +documented in terms of changes to reference counts. For example, the +\&\fBnewRV_inc()\fR function is documented to create a new RV +(with reference count 1) and increment the reference count of the referent +that was supplied by the caller. This is best understood as creating +a new reference to the referent, which is owned by the created RV, +and returning to the caller ownership of the sole reference to the RV. +The \fBnewRV_noinc()\fR function instead does not +increment the reference count of the referent, but the RV nevertheless +ends up owning a reference to the referent. It is therefore implied +that the caller of \f(CWnewRV_noinc()\fR is relinquishing a reference to the +referent, making this conceptually a more complicated operation even +though it does less to the data structures. +.PP +For example, imagine you want to return a reference from an XSUB +function. Inside the XSUB routine, you create an SV which initially +has just a single reference, owned by the XSUB routine. This reference +needs to be disposed of before the routine is complete, otherwise it +will leak, preventing the SV from ever being destroyed. So to create +an RV referencing the SV, it is most convenient to pass the SV to +\&\f(CWnewRV_noinc()\fR, which consumes that reference. Now the XSUB routine +no longer owns a reference to the SV, but does own a reference to the RV, +which in turn owns a reference to the SV. The ownership of the reference +to the RV is then transferred by the process of returning the RV from +the XSUB. +.PP +There are some convenience functions available that can help with the +destruction of xVs. These functions introduce the concept of "mortality". +Much documentation speaks of an xV itself being mortal, but this is +misleading. 
It is really \fIa reference to\fR an xV that is mortal, and it +is possible for there to be more than one mortal reference to a single xV. +For a reference to be mortal means that it is owned by the temps stack, +one of perl's many internal stacks, which will destroy that reference +"a short time later". Usually the "short time later" is the end of +the current Perl statement. However, it gets more complicated around +dynamic scopes: there can be multiple sets of mortal references hanging +around at the same time, with different death dates. Internally, the +time at which mortal xV references are destroyed is determined +by two macros, SAVETMPS and FREETMPS. See perlcall and perlxs +and "Temporaries Stack" below for more details on these macros. +.PP +Mortal references are mainly used for xVs that are placed on perl's +main stack. The stack is problematic for reference tracking, because it +contains a lot of xV references, but doesn't own those references: they +are not counted. Currently, there are many bugs resulting from xVs being +destroyed while referenced by the stack, because the stack's uncounted +references aren't enough to keep the xVs alive. So when putting an +(uncounted) reference on the stack, it is vitally important to ensure that +there will be a counted reference to the same xV that will last at least +as long as the uncounted reference. But it's also important that that +counted reference be cleaned up at an appropriate time, and not unduly +prolong the xV's life. A mortal reference is often the +best way to satisfy this requirement, particularly if the xV was created +specifically to be put on the stack and would otherwise be unreferenced. +.PP +To create a mortal reference, use the functions: +.PP +.Vb 3 +\& SV* sv_newmortal() +\& SV* sv_mortalcopy(SV*) +\& SV* sv_2mortal(SV*) +.Ve +.PP +\&\f(CWsv_newmortal()\fR creates an SV (with the undefined value) whose sole +reference is mortal.
\f(CWsv_mortalcopy()\fR creates an xV whose value is a +copy of a supplied xV and whose sole reference is mortal. \f(CWsv_2mortal()\fR +mortalises an existing xV reference: it transfers ownership of a reference +from the caller to the temps stack. Because \f(CW\*(C`sv_newmortal\*(C'\fR gives the new +SV no value, it must normally be given one via \f(CW\*(C`sv_setpv\*(C'\fR, \f(CW\*(C`sv_setiv\*(C'\fR, +etc.: +.PP +.Vb 2 +\& SV *tmp = sv_newmortal(); +\& sv_setiv(tmp, an_integer); +.Ve +.PP +As that takes multiple C statements, it is quite common to see this idiom instead: +.PP +.Vb 1 +\& SV *tmp = sv_2mortal(newSViv(an_integer)); +.Ve +.PP +The mortal routines are not just for SVs; AVs and HVs can be +made mortal by passing their address (cast to \f(CW\*(C`SV*\*(C'\fR) to the +\&\f(CW\*(C`sv_2mortal\*(C'\fR or \f(CW\*(C`sv_mortalcopy\*(C'\fR routines. +.SS "Stashes and Globs" +.IX Subsection "Stashes and Globs" +A \fBstash\fR is a hash that contains all variables that are defined +within a package. Each key of the stash is a symbol +name (shared by all the different types of objects that have the same +name), and each value in the hash table is a GV (Glob Value). This GV +in turn contains references to the various objects of that name, +including (but not limited to) the following: +.PP +.Vb 6 +\& Scalar Value +\& Array Value +\& Hash Value +\& I/O Handle +\& Format +\& Subroutine +.Ve +.PP +There is a single stash called \f(CW\*(C`PL_defstash\*(C'\fR that holds the items that exist +in the \f(CW\*(C`main\*(C'\fR package. To get at the items in other packages, append the +string "::" to the package name. The items in the \f(CW\*(C`Foo\*(C'\fR package are in +the stash \f(CW\*(C`Foo::\*(C'\fR in PL_defstash. The items in the \f(CW\*(C`Bar::Baz\*(C'\fR package are +in the stash \f(CW\*(C`Baz::\*(C'\fR in \f(CW\*(C`Bar::\*(C'\fR's stash.
+.PP +To get the stash pointer for a particular package, use one of these functions: +.PP +.Vb 2 +\& HV* gv_stashpv(const char* name, I32 flags) +\& HV* gv_stashsv(SV*, I32 flags) +.Ve +.PP +The first function takes a literal string; the second uses the string stored +in the SV. Remember that a stash is just a hash table, so you get back an +\&\f(CW\*(C`HV*\*(C'\fR. If the \f(CW\*(C`flags\*(C'\fR argument is set to GV_ADD, a new package is created +when it does not already exist. +.PP +The name that \f(CW\*(C`gv_stash*v\*(C'\fR wants is the name of the package whose symbol table +you want. The default package is called \f(CW\*(C`main\*(C'\fR. If you have multiply nested +packages, pass their names to \f(CW\*(C`gv_stash*v\*(C'\fR, separated by \f(CW\*(C`::\*(C'\fR as in the Perl +language itself. +.PP +Alternatively, if you have an SV that is a blessed reference, you can find +out the stash pointer by using: +.PP +.Vb 1 +\& HV* SvSTASH(SvRV(SV*)); +.Ve +.PP +then use the following to get the package name itself: +.PP +.Vb 1 +\& char* HvNAME(HV* stash); +.Ve +.PP +If you need to bless or re-bless an object you can use the following +function: +.PP +.Vb 1 +\& SV* sv_bless(SV*, HV* stash) +.Ve +.PP +where the first argument, an \f(CW\*(C`SV*\*(C'\fR, must be a reference, and the second +argument is a stash. The returned \f(CW\*(C`SV*\*(C'\fR can now be used in the same way +as any other SV. +.PP +For more information on references and blessings, consult perlref. +.SS "I/O Handles" +.IX Subsection "I/O Handles" +Like AVs and HVs, IO objects are another type of non-scalar SV which +may contain input and output PerlIO objects or a \f(CW\*(C`DIR *\*(C'\fR +from \fBopendir()\fR. +.PP +You can create a new IO object: +.PP +.Vb 1 +\& IO* newIO(); +.Ve +.PP +Unlike other SVs, a new IO object is automatically blessed into the +IO::File class.
+.PP +The IO object contains an input and output PerlIO handle: +.PP +.Vb 2 +\& PerlIO *IoIFP(IO *io); +\& PerlIO *IoOFP(IO *io); +.Ve +.PP +Typically if the IO object has been opened on a file, the input handle +is always present, but the output handle is only present if the file +is open for output. For a file, if both are present they will be the +same PerlIO object. +.PP +Distinct input and output PerlIO objects are created for sockets and +character devices. +.PP +The IO object also contains other data associated with Perl I/O +handles: +.PP +.Vb 12 +\& IV IoLINES(io); /* $. */ +\& IV IoPAGE(io); /* $% */ +\& IV IoPAGE_LEN(io); /* $= */ +\& IV IoLINES_LEFT(io); /* $\- */ +\& char *IoTOP_NAME(io); /* $^ */ +\& GV *IoTOP_GV(io); /* $^ */ +\& char *IoFMT_NAME(io); /* $~ */ +\& GV *IoFMT_GV(io); /* $~ */ +\& char *IoBOTTOM_NAME(io); +\& GV *IoBOTTOM_GV(io); +\& char IoTYPE(io); +\& U8 IoFLAGS(io); +.Ve +.PP +Most of these are involved with formats. +.PP +\&\fBIoFLAGS()\fR may contain a combination of flags, the most interesting of +which are \f(CW\*(C`IOf_FLUSH\*(C'\fR (\f(CW$|\fR) for autoflush and \f(CW\*(C`IOf_UNTAINT\*(C'\fR, +settable with IO::Handle's \fBuntaint()\fR method. +.PP +The IO object may also contain a directory handle: +.PP +.Vb 1 +\& DIR *IoDIRP(io); +.Ve +.PP +suitable for use with \fBPerlDir_read()\fR etc.
+.PP +All of these accessor macros are lvalues; there are no distinct +\&\f(CW_set()\fR macros to modify the members of the IO object. +.SS "Double-Typed SVs" +.IX Subsection "Double-Typed SVs" +Scalar variables normally contain only one type of value, an integer, +double, pointer, or reference. Perl will automatically convert the +actual scalar data from the stored type into the requested type. +.PP +Some scalar variables contain more than one type of scalar data. For +example, the variable \f(CW$!\fR contains either the numeric value of \f(CW\*(C`errno\*(C'\fR +or its string equivalent from either \f(CW\*(C`strerror\*(C'\fR or \f(CW\*(C`sys_errlist[]\*(C'\fR. +.PP +To force multiple data values into an SV, you must do two things: use the +\&\f(CW\*(C`sv_set*v\*(C'\fR routines to add the additional scalar type, then set a flag +so that Perl will believe it contains more than one type of data. The +four macros to set the flags are: +.PP +.Vb 4 +\& SvIOK_on +\& SvNOK_on +\& SvPOK_on +\& SvROK_on +.Ve +.PP +The particular macro you must use depends on which \f(CW\*(C`sv_set*v\*(C'\fR routine +you called first. This is because every \f(CW\*(C`sv_set*v\*(C'\fR routine turns on +only the bit for the particular type of data being set, and turns off +all the rest. +.PP +For example, to create a new Perl variable called "dberror" that contains +both the numeric and descriptive string error values, you could use the +following code: +.PP +.Vb 2 +\& extern int dberror; +\& extern char *dberror_list[]; +\& +\& SV* sv = get_sv("dberror", GV_ADD); +\& sv_setiv(sv, (IV) dberror); +\& sv_setpv(sv, dberror_list[dberror]); +\& SvIOK_on(sv); +.Ve +.PP +If the order of \f(CW\*(C`sv_setiv\*(C'\fR and \f(CW\*(C`sv_setpv\*(C'\fR had been reversed, then the +macro \f(CW\*(C`SvPOK_on\*(C'\fR would need to be called instead of \f(CW\*(C`SvIOK_on\*(C'\fR.
+.SS "Read-Only Values" +.IX Subsection "Read-Only Values" +In Perl 5.16 and earlier, copy-on-write (see the next section) shared a +flag bit with read-only scalars. So the only way to test whether +\&\f(CW\*(C`sv_setsv\*(C'\fR, etc., will raise a "Modification of a read-only value" error +in those versions is: +.PP +.Vb 1 +\& SvREADONLY(sv) && !SvIsCOW(sv) +.Ve +.PP +Under Perl 5.18 and later, SvREADONLY only applies to read-only variables, +and, under 5.20, copy-on-write scalars can also be read-only, so the above +check is incorrect. You just want: +.PP +.Vb 1 +\& SvREADONLY(sv) +.Ve +.PP +If you need to do this check often, define your own macro like this: +.PP +.Vb 5 +\& #if PERL_VERSION >= 18 +\& # define SvTRULYREADONLY(sv) SvREADONLY(sv) +\& #else +\& # define SvTRULYREADONLY(sv) (SvREADONLY(sv) && !SvIsCOW(sv)) +\& #endif +.Ve +.SS "Copy on Write" +.IX Subsection "Copy on Write" +Perl implements a copy-on-write (COW) mechanism for scalars, in which +string copies are not immediately made when requested, but are deferred +until made necessary by one or the other scalar changing. This is mostly +transparent, but one must take care not to modify string buffers that are +shared by multiple SVs. +.PP +You can test whether an SV is using copy-on-write with \f(CWSvIsCOW(sv)\fR. +.PP +You can force an SV to make its own copy of its string buffer by calling \f(CWsv_force_normal(sv)\fR or \f(CWSvPV_force_nolen(sv)\fR. +.PP +If you want to make the SV drop its string buffer, use +\&\f(CW\*(C`sv_force_normal_flags(sv, SV_COW_DROP_PV)\*(C'\fR or simply +\&\f(CW\*(C`sv_setsv(sv, NULL)\*(C'\fR. +.PP +All of these functions will croak on read-only scalars (see the previous +section for more on those). +.PP +To test that your code is behaving correctly and not modifying COW buffers, +on systems that support \fBmmap\fR\|(2) (i.e., Unix) you can configure perl with +\&\f(CW\*(C`\-Accflags=\-DPERL_DEBUG_READONLY_COW\*(C'\fR and it will turn buffer violations +into crashes.
You will find it to be marvellously slow, so you may want to +skip perl's own tests. +.SS "Magic Variables" +.IX Subsection "Magic Variables" +[This section still under construction. Ignore everything here. Post no +bills. Everything not permitted is forbidden.] +.PP +Any SV may be magical, that is, it has special features that a normal +SV does not have. These features are stored in the SV structure in a +linked list of \f(CW\*(C`struct magic\*(C'\fR's, typedef'ed to \f(CW\*(C`MAGIC\*(C'\fR. +.PP +.Vb 10 +\& struct magic { +\& MAGIC* mg_moremagic; +\& MGVTBL* mg_virtual; +\& U16 mg_private; +\& char mg_type; +\& U8 mg_flags; +\& I32 mg_len; +\& SV* mg_obj; +\& char* mg_ptr; +\& }; +.Ve +.PP +Note this is current as of patchlevel 0, and could change at any time. +.SS "Assigning Magic" +.IX Subsection "Assigning Magic" +Perl adds magic to an SV using the sv_magic function: +.PP +.Vb 1 +\& void sv_magic(SV* sv, SV* obj, int how, const char* name, I32 namlen); +.Ve +.PP +The \f(CW\*(C`sv\*(C'\fR argument is a pointer to the SV that is to acquire a new magical +feature. +.PP +If \f(CW\*(C`sv\*(C'\fR is not already magical, Perl uses the \f(CW\*(C`SvUPGRADE\*(C'\fR macro to +convert \f(CW\*(C`sv\*(C'\fR to type \f(CW\*(C`SVt_PVMG\*(C'\fR. +Perl then continues by adding new magic +to the beginning of the linked list of magical features. Any prior entry +of the same type of magic is deleted. Note that this can be overridden, +and multiple instances of the same type of magic can be associated with an +SV. +.PP +The \f(CW\*(C`name\*(C'\fR and \f(CW\*(C`namlen\*(C'\fR arguments are used to associate a string with +the magic, typically the name of a variable. 
\f(CW\*(C`namlen\*(C'\fR is stored in the +\&\f(CW\*(C`mg_len\*(C'\fR field and if \f(CW\*(C`name\*(C'\fR is non-null then either a \f(CW\*(C`savepvn\*(C'\fR copy of +\&\f(CW\*(C`name\*(C'\fR or \f(CW\*(C`name\*(C'\fR itself is stored in the \f(CW\*(C`mg_ptr\*(C'\fR field, depending on +whether \f(CW\*(C`namlen\*(C'\fR is greater than zero or equal to zero respectively. As a +special case, if \f(CW\*(C`(name && namlen == HEf_SVKEY)\*(C'\fR then \f(CW\*(C`name\*(C'\fR is assumed +to contain an \f(CW\*(C`SV*\*(C'\fR and is stored as-is with its REFCNT incremented. +.PP +The sv_magic function uses \f(CW\*(C`how\*(C'\fR to determine which, if any, predefined +"Magic Virtual Table" should be assigned to the \f(CW\*(C`mg_virtual\*(C'\fR field. +See the "Magic Virtual Tables" section below. The \f(CW\*(C`how\*(C'\fR argument is also +stored in the \f(CW\*(C`mg_type\*(C'\fR field. The value of +\&\f(CW\*(C`how\*(C'\fR should be chosen from the set of macros +\&\f(CW\*(C`PERL_MAGIC_foo\*(C'\fR found in \fIperl.h\fR. Note that before +these macros were added, Perl internals used to directly use character +literals, so you may occasionally come across old code or documentation +referring to 'U' magic rather than \f(CW\*(C`PERL_MAGIC_uvar\*(C'\fR for example. +.PP +The \f(CW\*(C`obj\*(C'\fR argument is stored in the \f(CW\*(C`mg_obj\*(C'\fR field of the \f(CW\*(C`MAGIC\*(C'\fR +structure. If it is not the same as the \f(CW\*(C`sv\*(C'\fR argument, the reference +count of the \f(CW\*(C`obj\*(C'\fR object is incremented. If it is the same, or if +the \f(CW\*(C`how\*(C'\fR argument is \f(CW\*(C`PERL_MAGIC_arylen\*(C'\fR, \f(CW\*(C`PERL_MAGIC_regdatum\*(C'\fR, +\&\f(CW\*(C`PERL_MAGIC_regdata\*(C'\fR, or if it is a NULL pointer, then \f(CW\*(C`obj\*(C'\fR is merely +stored, without the reference count being incremented. +.PP +See also \f(CW\*(C`sv_magicext\*(C'\fR in perlapi for a more flexible way to add magic +to an SV. 
+.PP +There is also a function to add magic to an \f(CW\*(C`HV\*(C'\fR: +.PP +.Vb 1 +\& void hv_magic(HV *hv, GV *gv, int how); +.Ve +.PP +This simply calls \f(CW\*(C`sv_magic\*(C'\fR and coerces the \f(CW\*(C`gv\*(C'\fR argument into an \f(CW\*(C`SV\*(C'\fR. +.PP +To remove the magic from an SV, call the function sv_unmagic: +.PP +.Vb 1 +\& int sv_unmagic(SV *sv, int type); +.Ve +.PP +The \f(CW\*(C`type\*(C'\fR argument should be equal to the \f(CW\*(C`how\*(C'\fR value when the \f(CW\*(C`SV\*(C'\fR +was initially made magical. +.PP +However, note that \f(CW\*(C`sv_unmagic\*(C'\fR removes all magic of a certain \f(CW\*(C`type\*(C'\fR from the +\&\f(CW\*(C`SV\*(C'\fR. If you want to remove only certain +magic of a \f(CW\*(C`type\*(C'\fR based on the magic +virtual table, use \f(CW\*(C`sv_unmagicext\*(C'\fR instead: +.PP +.Vb 1 +\& int sv_unmagicext(SV *sv, int type, MGVTBL *vtbl); +.Ve +.SS "Magic Virtual Tables" +.IX Subsection "Magic Virtual Tables" +The \f(CW\*(C`mg_virtual\*(C'\fR field in the \f(CW\*(C`MAGIC\*(C'\fR structure is a pointer to an +\&\f(CW\*(C`MGVTBL\*(C'\fR (short for "Magic Virtual Table"), which is a structure of +function pointers that handle the various operations that might be +applied to that variable. +.PP +The \f(CW\*(C`MGVTBL\*(C'\fR has five (or sometimes eight) pointers to the following +routine types: +.PP +.Vb 5 +\& int (*svt_get) (pTHX_ SV* sv, MAGIC* mg); +\& int (*svt_set) (pTHX_ SV* sv, MAGIC* mg); +\& U32 (*svt_len) (pTHX_ SV* sv, MAGIC* mg); +\& int (*svt_clear)(pTHX_ SV* sv, MAGIC* mg); +\& int (*svt_free) (pTHX_ SV* sv, MAGIC* mg); +\& +\& int (*svt_copy) (pTHX_ SV *sv, MAGIC* mg, SV *nsv, +\& const char *name, I32 namlen); +\& int (*svt_dup) (pTHX_ MAGIC *mg, CLONE_PARAMS *param); +\& int (*svt_local)(pTHX_ SV *nsv, MAGIC *mg); +.Ve +.PP +This MGVTBL structure is set at compile-time in \fIperl.h\fR and there are +currently 32 types.
These different structures contain pointers to various +routines that perform additional actions depending on which function is +being called. +.PP +.Vb 8 +\& Function pointer Action taken +\& \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\- \-\-\-\-\-\-\-\-\-\-\-\- +\& svt_get Do something before the value of the SV is +\& retrieved. +\& svt_set Do something after the SV is assigned a value. +\& svt_len Report on the SV\*(Aqs length. +\& svt_clear Clear something the SV represents. +\& svt_free Free any extra storage associated with the SV. +\& +\& svt_copy copy tied variable magic to a tied element +\& svt_dup duplicate a magic structure during thread cloning +\& svt_local copy magic to local value during \*(Aqlocal\*(Aq +.Ve +.PP +For instance, the MGVTBL structure called \f(CW\*(C`vtbl_sv\*(C'\fR (which corresponds +to an \f(CW\*(C`mg_type\*(C'\fR of \f(CW\*(C`PERL_MAGIC_sv\*(C'\fR) contains: +.PP +.Vb 1 +\& { magic_get, magic_set, magic_len, 0, 0 } +.Ve +.PP +Thus, when an SV is determined to be magical and of type \f(CW\*(C`PERL_MAGIC_sv\*(C'\fR, +if a get operation is being performed, the routine \f(CW\*(C`magic_get\*(C'\fR is +called. All the various routines for the various magical types begin +with \f(CW\*(C`magic_\*(C'\fR. NOTE: the magic routines are not considered part of +the Perl API, and may not be exported by the Perl library. +.PP +The last three slots are a recent addition, and for source code +compatibility they are only checked for if one of the three flags +\&\f(CW\*(C`MGf_COPY\*(C'\fR, \f(CW\*(C`MGf_DUP\*(C'\fR, or \f(CW\*(C`MGf_LOCAL\*(C'\fR is set in mg_flags. +This means that most code can continue declaring +a vtable as a 5\-element value. These three are +currently used exclusively by the threading code, and are highly subject +to change. 
+.PP +The current kinds of Magic Virtual Tables are: +.PP +.Vb 10 +\& mg_type +\& (old\-style char and macro) MGVTBL Type of magic +\& \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\- \-\-\-\-\-\- \-\-\-\-\-\-\-\-\-\-\-\-\- +\& \e0 PERL_MAGIC_sv vtbl_sv Special scalar variable +\& # PERL_MAGIC_arylen vtbl_arylen Array length ($#ary) +\& % PERL_MAGIC_rhash (none) Extra data for restricted +\& hashes +\& * PERL_MAGIC_debugvar vtbl_debugvar $DB::single, signal, trace +\& vars +\& . PERL_MAGIC_pos vtbl_pos pos() lvalue +\& : PERL_MAGIC_symtab (none) Extra data for symbol +\& tables +\& < PERL_MAGIC_backref vtbl_backref For weak ref data +\& @ PERL_MAGIC_arylen_p (none) To move arylen out of XPVAV +\& B PERL_MAGIC_bm vtbl_regexp Boyer\-Moore +\& (fast string search) +\& c PERL_MAGIC_overload_table vtbl_ovrld Holds overload table +\& (AMT) on stash +\& D PERL_MAGIC_regdata vtbl_regdata Regex match position data +\& (@+ and @\- vars) +\& d PERL_MAGIC_regdatum vtbl_regdatum Regex match position data +\& element +\& E PERL_MAGIC_env vtbl_env %ENV hash +\& e PERL_MAGIC_envelem vtbl_envelem %ENV hash element +\& f PERL_MAGIC_fm vtbl_regexp Formline +\& (\*(Aqcompiled\*(Aq format) +\& g PERL_MAGIC_regex_global vtbl_mglob m//g target +\& H PERL_MAGIC_hints vtbl_hints %^H hash +\& h PERL_MAGIC_hintselem vtbl_hintselem %^H hash element +\& I PERL_MAGIC_isa vtbl_isa @ISA array +\& i PERL_MAGIC_isaelem vtbl_isaelem @ISA array element +\& k PERL_MAGIC_nkeys vtbl_nkeys scalar(keys()) lvalue +\& L PERL_MAGIC_dbfile (none) Debugger %_mg_ptr; +\& ... +\& } +.Ve +.PP +Also note that the \f(CW\*(C`sv_set*()\*(C'\fR and \f(CW\*(C`sv_cat*()\*(C'\fR functions described +earlier do \fBnot\fR invoke 'set' magic on their targets. This must +be done by the user either by calling the \f(CWSvSETMAGIC()\fR macro after +calling these functions, or by using one of the \f(CW\*(C`sv_set*_mg()\*(C'\fR or +\&\f(CW\*(C`sv_cat*_mg()\*(C'\fR functions. 
Similarly, generic C code must call the +\&\f(CWSvGETMAGIC()\fR macro to invoke any 'get' magic if they use an SV +obtained from external sources in functions that don't handle magic. +See perlapi for a description of these functions. +For example, calls to the \f(CW\*(C`sv_cat*()\*(C'\fR functions typically need to be +followed by \f(CWSvSETMAGIC()\fR, but they don't need a prior \f(CWSvGETMAGIC()\fR +since their implementation handles 'get' magic. +.SS "Finding Magic" +.IX Subsection "Finding Magic" +.Vb 2 +\& MAGIC *mg_find(SV *sv, int type); /* Finds the magic pointer of that +\& * type */ +.Ve +.PP +This routine returns a pointer to a \f(CW\*(C`MAGIC\*(C'\fR structure stored in the SV. +If the SV does not have that magical +feature, \f(CW\*(C`NULL\*(C'\fR is returned. If the +SV has multiple instances of that magical feature, the first one will be +returned. \f(CW\*(C`mg_findext\*(C'\fR can be used +to find a \f(CW\*(C`MAGIC\*(C'\fR structure of an SV +based on both its magic type and its magic virtual table: +.PP +.Vb 1 +\& MAGIC *mg_findext(SV *sv, int type, MGVTBL *vtbl); +.Ve +.PP +Also, if the SV passed to \f(CW\*(C`mg_find\*(C'\fR or \f(CW\*(C`mg_findext\*(C'\fR is not of type +SVt_PVMG, Perl may core dump. +.PP +.Vb 1 +\& int mg_copy(SV* sv, SV* nsv, const char* key, STRLEN klen); +.Ve +.PP +This routine checks to see what types of magic \f(CW\*(C`sv\*(C'\fR has. If the mg_type +field is an uppercase letter, then the mg_obj is copied to \f(CW\*(C`nsv\*(C'\fR, but +the mg_type field is changed to be the lowercase letter. +.SS "Understanding the Magic of Tied Hashes and Arrays" +.IX Subsection "Understanding the Magic of Tied Hashes and Arrays" +Tied hashes and arrays are magical beasts of the \f(CW\*(C`PERL_MAGIC_tied\*(C'\fR +magic type. +.PP +WARNING: As of the 5.004 release, proper usage of the array and hash +access functions requires understanding a few caveats. 
Some +of these caveats are actually considered bugs in the API, to be fixed +in later releases, and are bracketed with [MAYCHANGE] below. If +you find yourself actually applying such information in this section, be +aware that the behavior may change in the future, umm, without warning. +.PP +The perl tie function associates a variable with an object that implements +the various GET, SET, etc methods. To perform the equivalent of the perl +tie function from an XSUB, you must mimic this behaviour. The code below +carries out the necessary steps \-\- firstly it creates a new hash, and then +creates a second hash which it blesses into the class which will implement +the tie methods. Lastly it ties the two hashes together, and returns a +reference to the new tied hash. Note that the code below does NOT call the +TIEHASH method in the MyTie class \- +see "Calling Perl Routines from within C Programs" for details on how +to do this. +.PP +.Vb 10 +\& SV* +\& mytie() +\& PREINIT: +\& HV *hash; +\& HV *stash; +\& SV *tie; +\& CODE: +\& hash = newHV(); +\& tie = newRV_noinc((SV*)newHV()); +\& stash = gv_stashpv("MyTie", GV_ADD); +\& sv_bless(tie, stash); +\& hv_magic(hash, (GV*)tie, PERL_MAGIC_tied); +\& RETVAL = newRV_noinc(hash); +\& OUTPUT: +\& RETVAL +.Ve +.PP +The \f(CW\*(C`av_store\*(C'\fR function, when given a tied array argument, merely +copies the magic of the array onto the value to be "stored", using +\&\f(CW\*(C`mg_copy\*(C'\fR. It may also return NULL, indicating that the value did not +actually need to be stored in the array. [MAYCHANGE] After a call to +\&\f(CW\*(C`av_store\*(C'\fR on a tied array, the caller will usually need to call +\&\f(CWmg_set(val)\fR to actually invoke the perl level "STORE" method on the +TIEARRAY object. If \f(CW\*(C`av_store\*(C'\fR did return NULL, a call to +\&\f(CWSvREFCNT_dec(val)\fR will also be usually necessary to avoid a memory +leak. 
[/MAYCHANGE] +.PP +The previous paragraph is applicable verbatim to tied hash access using the +\&\f(CW\*(C`hv_store\*(C'\fR and \f(CW\*(C`hv_store_ent\*(C'\fR functions as well. +.PP +\&\f(CW\*(C`av_fetch\*(C'\fR and the corresponding hash functions \f(CW\*(C`hv_fetch\*(C'\fR and +\&\f(CW\*(C`hv_fetch_ent\*(C'\fR actually return an undefined mortal value whose magic +has been initialized using \f(CW\*(C`mg_copy\*(C'\fR. Note the value so returned does not +need to be deallocated, as it is already mortal. [MAYCHANGE] But you will +need to call \f(CWmg_get()\fR on the returned value in order to actually invoke +the perl level "FETCH" method on the underlying TIE object. Similarly, +you may also call \f(CWmg_set()\fR on the return value after possibly assigning +a suitable value to it using \f(CW\*(C`sv_setsv\*(C'\fR, which will invoke the "STORE" +method on the TIE object. [/MAYCHANGE] +.PP +[MAYCHANGE] +In other words, the array or hash fetch/store functions don't really +fetch and store actual values in the case of tied arrays and hashes. They +merely call \f(CW\*(C`mg_copy\*(C'\fR to attach magic to the values that were meant to be +"stored" or "fetched". Later calls to \f(CW\*(C`mg_get\*(C'\fR and \f(CW\*(C`mg_set\*(C'\fR actually +do the job of invoking the TIE methods on the underlying objects. Thus +the magic mechanism currently implements a kind of lazy access to arrays +and hashes. +.PP +Currently (as of perl version 5.004), use of the hash and array access +functions requires the user to be aware of whether they are operating on +"normal" hashes and arrays, or on their tied variants. The API may be +changed to provide more transparent access to both tied and normal data +types in future versions. +[/MAYCHANGE] +.PP +You would do well to understand that the TIEARRAY and TIEHASH interfaces +are mere sugar to invoke some perl method calls while using the uniform hash +and array syntax. 
The use of this sugar imposes some overhead (typically +about two to four extra opcodes per FETCH/STORE operation, in addition to +the creation of all the mortal variables required to invoke the methods). +This overhead will be comparatively small if the TIE methods are themselves +substantial, but if they are only a few statements long, the overhead +will not be insignificant. +.SS "Localizing changes" +.IX Subsection "Localizing changes" +Perl has a very handy construction +.PP +.Vb 4 +\& { +\& local $var = 2; +\& ... +\& } +.Ve +.PP +This construction is \fIapproximately\fR equivalent to +.PP +.Vb 6 +\& { +\& my $oldvar = $var; +\& $var = 2; +\& ... +\& $var = $oldvar; +\& } +.Ve +.PP +The biggest difference is that the first construction would +reinstate the initial value of \f(CW$var\fR, irrespective of how control exits +the block: \f(CW\*(C`goto\*(C'\fR, \f(CW\*(C`return\*(C'\fR, \f(CW\*(C`die\*(C'\fR/\f(CW\*(C`eval\*(C'\fR, etc. It is a little bit +more efficient as well. +.PP +There is a way to achieve a similar task from C via the Perl API: create a +\&\fIpseudo-block\fR, and arrange for some changes to be automatically +undone at the end of it, either explicitly, or via a non-local exit (via +\&\fBdie()\fR). A \fIblock\fR\-like construct is created by a pair of +\&\f(CW\*(C`ENTER\*(C'\fR/\f(CW\*(C`LEAVE\*(C'\fR macros (see "Returning a Scalar" in perlcall). +Such a construct may be created specially for some important localized +task, or an existing one (like boundaries of enclosing Perl +subroutine/block, or an existing pair for freeing TMPs) may be +used. (In the second case the overhead of additional localization must +be almost negligible.) Note that any XSUB is automatically enclosed in +an \f(CW\*(C`ENTER\*(C'\fR/\f(CW\*(C`LEAVE\*(C'\fR pair.
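The pseudo-block idea can be sketched with a toy save stack. This is a plain-C model under stated assumptions, not how Perl's savestack is actually laid out: toy_enter, toy_leave, toy_save_int and the fixed-size arrays are hypothetical, and a real non-local exit (die) is not modelled. It only shows the bookkeeping: entering a block records a mark, saving records where a value lived and what it was, and leaving unwinds back to the mark.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_SAVE_MAX 64

/* One saved integer: where it lives and what it held. */
typedef struct { int *where; int old; } toy_save_entry;

static toy_save_entry toy_savestack[TOY_SAVE_MAX];
static size_t toy_save_ix = 0;           /* next free save slot */
static size_t toy_scope[TOY_SAVE_MAX];   /* toy_save_ix at each toy_enter() */
static size_t toy_scope_ix = 0;

/* Open a pseudo-block: remember how tall the save stack is now. */
void toy_enter(void) {
    assert(toy_scope_ix < TOY_SAVE_MAX);
    toy_scope[toy_scope_ix++] = toy_save_ix;
}

/* Arrange for *p to get its current value back when the block is left. */
void toy_save_int(int *p) {
    assert(toy_save_ix < TOY_SAVE_MAX);
    toy_savestack[toy_save_ix].where = p;
    toy_savestack[toy_save_ix].old = *p;
    toy_save_ix++;
}

/* Close the innermost pseudo-block, undoing its saves newest-first. */
void toy_leave(void) {
    size_t base;
    assert(toy_scope_ix > 0);
    base = toy_scope[--toy_scope_ix];
    while (toy_save_ix > base) {
        toy_save_entry *e = &toy_savestack[--toy_save_ix];
        *e->where = e->old;
    }
}

/* Rough analogue of: { local $var = 2; ... } */
int toy_demo(void) {
    int var = 1;
    toy_enter();
    toy_save_int(&var);
    var = 2;
    toy_leave();
    return var;   /* the value from before the block */
}
```

Because restoration happens in toy_leave rather than at each assignment, nested blocks unwind independently: an inner toy_leave restores only what was saved since the matching toy_enter.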
+.PP +Inside such a \fIpseudo-block\fR the following service is available: +.ie n .IP """SAVEINT(int i)""" 4 +.el .IP "\f(CWSAVEINT(int i)\fR" 4 +.IX Item "SAVEINT(int i)" +.PD 0 +.ie n .IP """SAVEIV(IV i)""" 4 +.el .IP "\f(CWSAVEIV(IV i)\fR" 4 +.IX Item "SAVEIV(IV i)" +.ie n .IP """SAVEI32(I32 i)""" 4 +.el .IP "\f(CWSAVEI32(I32 i)\fR" 4 +.IX Item "SAVEI32(I32 i)" +.ie n .IP """SAVELONG(long i)""" 4 +.el .IP "\f(CWSAVELONG(long i)\fR" 4 +.IX Item "SAVELONG(long i)" +.ie n .IP """SAVEI8(I8 i)""" 4 +.el .IP "\f(CWSAVEI8(I8 i)\fR" 4 +.IX Item "SAVEI8(I8 i)" +.ie n .IP """SAVEI16(I16 i)""" 4 +.el .IP "\f(CWSAVEI16(I16 i)\fR" 4 +.IX Item "SAVEI16(I16 i)" +.ie n .IP """SAVEBOOL(int i)""" 4 +.el .IP "\f(CWSAVEBOOL(int i)\fR" 4 +.IX Item "SAVEBOOL(int i)" +.ie n .IP """SAVESTRLEN(STRLEN i)""" 4 +.el .IP "\f(CWSAVESTRLEN(STRLEN i)\fR" 4 +.IX Item "SAVESTRLEN(STRLEN i)" +.PD +These macros arrange things to restore the value of integer variable +\&\f(CW\*(C`i\*(C'\fR at the end of the enclosing \fIpseudo-block\fR. +.ie n .IP SAVESPTR(s) 4 +.el .IP \f(CWSAVESPTR(s)\fR 4 +.IX Item "SAVESPTR(s)" +.PD 0 +.ie n .IP SAVEPPTR(p) 4 +.el .IP \f(CWSAVEPPTR(p)\fR 4 +.IX Item "SAVEPPTR(p)" +.PD +These macros arrange things to restore the value of pointers \f(CW\*(C`s\*(C'\fR and +\&\f(CW\*(C`p\*(C'\fR. \f(CW\*(C`s\*(C'\fR must be a pointer of a type which survives conversion to +\&\f(CW\*(C`SV*\*(C'\fR and back, \f(CW\*(C`p\*(C'\fR should be able to survive conversion to \f(CW\*(C`char*\*(C'\fR +and back. +.ie n .IP """SAVERCPV(char **ppv)""" 4 +.el .IP "\f(CWSAVERCPV(char **ppv)\fR" 4 +.IX Item "SAVERCPV(char **ppv)" +This macro arranges to restore the value of a \f(CW\*(C`char *\*(C'\fR variable which +was allocated with a call to \f(CWrcpv_new()\fR to its previous state when +the current pseudo block is completed. The pointer stored in \f(CW*ppv\fR at +the time of the call will be refcount incremented and stored on the save +stack. 
Later when the current \fIpseudo-block\fR is completed the value +stored in \f(CW*ppv\fR will be refcount decremented, and the previous value +restored from the savestack which will also be refcount decremented. +.Sp +This is the \f(CW\*(C`RCPV\*(C'\fR equivalent of \f(CWSAVEGENERICSV()\fR. +.ie n .IP """SAVEGENERICSV(SV **psv)""" 4 +.el .IP "\f(CWSAVEGENERICSV(SV **psv)\fR" 4 +.IX Item "SAVEGENERICSV(SV **psv)" +This macro arranges to restore the value of a \f(CW\*(C`SV *\*(C'\fR variable to its +previous state when the current pseudo block is completed. The pointer +stored in \f(CW*psv\fR at the time of the call will be refcount incremented +and stored on the save stack. Later when the current \fIpseudo-block\fR is +completed the value stored in \f(CW*psv\fR will be refcount decremented, and +the previous value restored from the savestack which will also be refcount +decremented. This is the C equivalent of \f(CW\*(C`local $sv\*(C'\fR. +.ie n .IP """SAVEFREESV(SV *sv)""" 4 +.el .IP "\f(CWSAVEFREESV(SV *sv)\fR" 4 +.IX Item "SAVEFREESV(SV *sv)" +The refcount of \f(CW\*(C`sv\*(C'\fR will be decremented at the end of +\&\fIpseudo-block\fR. This is similar to \f(CW\*(C`sv_2mortal\*(C'\fR in that it is also a +mechanism for doing a delayed \f(CW\*(C`SvREFCNT_dec\*(C'\fR. However, while \f(CW\*(C`sv_2mortal\*(C'\fR +extends the lifetime of \f(CW\*(C`sv\*(C'\fR until the beginning of the next statement, +\&\f(CW\*(C`SAVEFREESV\*(C'\fR extends it until the end of the enclosing scope. These +lifetimes can be wildly different. +.Sp +Also compare \f(CW\*(C`SAVEMORTALIZESV\*(C'\fR. +.ie n .IP """SAVEMORTALIZESV(SV *sv)""" 4 +.el .IP "\f(CWSAVEMORTALIZESV(SV *sv)\fR" 4 +.IX Item "SAVEMORTALIZESV(SV *sv)" +Just like \f(CW\*(C`SAVEFREESV\*(C'\fR, but mortalizes \f(CW\*(C`sv\*(C'\fR at the end of the current +scope instead of decrementing its reference count.
This usually has the +effect of keeping \f(CW\*(C`sv\*(C'\fR alive until the statement that called the currently +live scope has finished executing. +.ie n .IP """SAVEFREEOP(OP *op)""" 4 +.el .IP "\f(CWSAVEFREEOP(OP *op)\fR" 4 +.IX Item "SAVEFREEOP(OP *op)" +The \f(CW\*(C`OP *\*(C'\fR is \f(CWop_free()\fRed at the end of \fIpseudo-block\fR. +.ie n .IP SAVEFREEPV(p) 4 +.el .IP \f(CWSAVEFREEPV(p)\fR 4 +.IX Item "SAVEFREEPV(p)" +The chunk of memory which is pointed to by \f(CW\*(C`p\*(C'\fR is \f(CWSafefree()\fRed at the +end of the current \fIpseudo-block\fR. +.ie n .IP """SAVEFREERCPV(char *pv)""" 4 +.el .IP "\f(CWSAVEFREERCPV(char *pv)\fR" 4 +.IX Item "SAVEFREERCPV(char *pv)" +Ensures that a \f(CW\*(C`char *\*(C'\fR which was created by a call to \f(CWrcpv_new()\fR is +\&\f(CWrcpv_free()\fRed at the end of the current \fIpseudo-block\fR. +.Sp +This is the RCPV equivalent of \f(CWSAVEFREESV()\fR. +.ie n .IP """SAVECLEARSV(SV *sv)""" 4 +.el .IP "\f(CWSAVECLEARSV(SV *sv)\fR" 4 +.IX Item "SAVECLEARSV(SV *sv)" +Clears a slot in the current scratchpad which corresponds to \f(CW\*(C`sv\*(C'\fR at +the end of \fIpseudo-block\fR. +.ie n .IP """SAVEDELETE(HV *hv, char *key, I32 length)""" 4 +.el .IP "\f(CWSAVEDELETE(HV *hv, char *key, I32 length)\fR" 4 +.IX Item "SAVEDELETE(HV *hv, char *key, I32 length)" +The key \f(CW\*(C`key\*(C'\fR of \f(CW\*(C`hv\*(C'\fR is deleted at the end of \fIpseudo-block\fR. The +string pointed to by \f(CW\*(C`key\*(C'\fR is \fBSafefree()\fRed. 
If one has a \fIkey\fR in +short-lived storage, the corresponding string may be reallocated like +this: +.Sp +.Vb 1 +\& SAVEDELETE(PL_defstash, savepv(tmpbuf), strlen(tmpbuf)); +.Ve +.ie n .IP """SAVEDESTRUCTOR(DESTRUCTORFUNC_NOCONTEXT_t f, void *p)""" 4 +.el .IP "\f(CWSAVEDESTRUCTOR(DESTRUCTORFUNC_NOCONTEXT_t f, void *p)\fR" 4 +.IX Item "SAVEDESTRUCTOR(DESTRUCTORFUNC_NOCONTEXT_t f, void *p)" +At the end of \fIpseudo-block\fR the function \f(CW\*(C`f\*(C'\fR is called with the +only argument \f(CW\*(C`p\*(C'\fR which may be NULL. +.ie n .IP """SAVEDESTRUCTOR_X(DESTRUCTORFUNC_t f, void *p)""" 4 +.el .IP "\f(CWSAVEDESTRUCTOR_X(DESTRUCTORFUNC_t f, void *p)\fR" 4 +.IX Item "SAVEDESTRUCTOR_X(DESTRUCTORFUNC_t f, void *p)" +At the end of \fIpseudo-block\fR the function \f(CW\*(C`f\*(C'\fR is called with the +implicit context argument (if any), and \f(CW\*(C`p\*(C'\fR which may be NULL. +.Sp +Note the \fIend of the current pseudo-block\fR may occur much later than +the \fIend of the current statement\fR. You may wish to look at the +\&\f(CWMORTALDESTRUCTOR_X()\fR macro instead. +.ie n .IP """MORTALSVFUNC_X(SVFUNC_t f, SV *sv)""" 4 +.el .IP "\f(CWMORTALSVFUNC_X(SVFUNC_t f, SV *sv)\fR" 4 +.IX Item "MORTALSVFUNC_X(SVFUNC_t f, SV *sv)" +At the end of \fIthe current statement\fR the function \f(CW\*(C`f\*(C'\fR is called with +the implicit context argument (if any), and \f(CW\*(C`sv\*(C'\fR which may be NULL. +.Sp +Be aware that the parameter argument to the destructor function differs +from the related \f(CWSAVEDESTRUCTOR_X()\fR in that it MUST be either NULL or +an \f(CW\*(C`SV*\*(C'\fR. +.Sp +Note the \fIend of the current statement\fR may occur much before the +\&\fIend of the current pseudo-block\fR. You may wish to look at the +\&\f(CWSAVEDESTRUCTOR_X()\fR macro instead.
+.ie n .IP """MORTALDESTRUCTOR_SV(SV *coderef, SV *args)""" 4 +.el .IP "\f(CWMORTALDESTRUCTOR_SV(SV *coderef, SV *args)\fR" 4 +.IX Item "MORTALDESTRUCTOR_SV(SV *coderef, SV *args)" +At the end of \fIthe current statement\fR the Perl function contained in +\&\f(CW\*(C`coderef\*(C'\fR is called with the arguments provided (if any) in \f(CW\*(C`args\*(C'\fR. +See the documentation for \f(CWmortal_destructor_sv()\fR for details on +how the \f(CW\*(C`args\*(C'\fR parameter is handled. +.Sp +Note the \fIend of the current statement\fR may occur much before the +\&\fIend of the current pseudo-block\fR. If you wish to call a perl +function at the end of the current pseudo block you should use the +\&\f(CWSAVEDESTRUCTOR_X()\fR API instead, which will require you to create a +C wrapper to call the Perl function. +.ie n .IP SAVESTACK_POS() 4 +.el .IP \f(CWSAVESTACK_POS()\fR 4 +.IX Item "SAVESTACK_POS()" +The current offset on the Perl internal stack (cf. \f(CW\*(C`SP\*(C'\fR) is restored +at the end of \fIpseudo-block\fR. +.PP +The following API list contains functions, thus one needs to +provide pointers to the modifiable data explicitly (either C pointers, +or Perlish \f(CW\*(C`GV *\*(C'\fRs). Where the above macros take \f(CW\*(C`int\*(C'\fR, a similar +function takes \f(CW\*(C`int *\*(C'\fR. +.PP +Other macros above have functions implementing them, but it's probably +best to just use the macro, and not those or the ones below. +.ie n .IP """SV* save_scalar(GV *gv)""" 4 +.el .IP "\f(CWSV* save_scalar(GV *gv)\fR" 4 +.IX Item "SV* save_scalar(GV *gv)" +Equivalent to Perl code \f(CW\*(C`local $gv\*(C'\fR. +.ie n .IP """AV* save_ary(GV *gv)""" 4 +.el .IP "\f(CWAV* save_ary(GV *gv)\fR" 4 +.IX Item "AV* save_ary(GV *gv)" +.PD 0 +.ie n .IP """HV* save_hash(GV *gv)""" 4 +.el .IP "\f(CWHV* save_hash(GV *gv)\fR" 4 +.IX Item "HV* save_hash(GV *gv)" +.PD +Similar to \f(CW\*(C`save_scalar\*(C'\fR, but localize \f(CW@gv\fR and \f(CW%gv\fR.
+.ie n .IP """void save_item(SV *item)""" 4 +.el .IP "\f(CWvoid save_item(SV *item)\fR" 4 +.IX Item "void save_item(SV *item)" +Duplicates the current value of \f(CW\*(C`SV\*(C'\fR. On the exit from the current +\&\f(CW\*(C`ENTER\*(C'\fR/\f(CW\*(C`LEAVE\*(C'\fR \fIpseudo-block\fR the value of \f(CW\*(C`SV\*(C'\fR will be restored +using the stored value. It doesn't handle magic. Use \f(CW\*(C`save_scalar\*(C'\fR if +magic is affected. +.ie n .IP """SV* save_svref(SV **sptr)""" 4 +.el .IP "\f(CWSV* save_svref(SV **sptr)\fR" 4 +.IX Item "SV* save_svref(SV **sptr)" +Similar to \f(CW\*(C`save_scalar\*(C'\fR, but will reinstate an \f(CW\*(C`SV *\*(C'\fR. +.ie n .IP """void save_aptr(AV **aptr)""" 4 +.el .IP "\f(CWvoid save_aptr(AV **aptr)\fR" 4 +.IX Item "void save_aptr(AV **aptr)" +.PD 0 +.ie n .IP """void save_hptr(HV **hptr)""" 4 +.el .IP "\f(CWvoid save_hptr(HV **hptr)\fR" 4 +.IX Item "void save_hptr(HV **hptr)" +.PD +Similar to \f(CW\*(C`save_svref\*(C'\fR, but localize \f(CW\*(C`AV *\*(C'\fR and \f(CW\*(C`HV *\*(C'\fR. +.PP +The \f(CW\*(C`Alias\*(C'\fR module implements localization of the basic types within the +\&\fIcaller's scope\fR. People who are interested in how to localize things in +the containing scope should take a look there too. +.SH Subroutines +.IX Header "Subroutines" +.SS "XSUBs and the Argument Stack" +.IX Subsection "XSUBs and the Argument Stack" +The XSUB mechanism is a simple way for Perl programs to access C subroutines. +An XSUB routine will have a stack that contains the arguments from the Perl +program, and a way to map from the Perl data structures to a C equivalent. +.PP +The stack arguments are accessible through the \f(CWST(n)\fR macro, which returns +the \f(CW\*(C`n\*(C'\fR'th stack argument. Argument 0 is the first argument passed in the +Perl subroutine call. These arguments are \f(CW\*(C`SV*\*(C'\fR, and can be used anywhere +an \f(CW\*(C`SV*\*(C'\fR is used. 
+.PP +Most of the time, output from the C routine can be handled through use of +the RETVAL and OUTPUT directives. However, there are some cases where the +argument stack is not already long enough to handle all the return values. +An example is the POSIX \fBtzname()\fR call, which takes no arguments, but returns +two, the local time zone's standard and summer time abbreviations. +.PP +To handle this situation, the PPCODE directive is used and the stack is +extended using the macro: +.PP +.Vb 1 +\& EXTEND(SP, num); +.Ve +.PP +where \f(CW\*(C`SP\*(C'\fR is the macro that represents the local copy of the stack pointer, +and \f(CW\*(C`num\*(C'\fR is the number of elements the stack should be extended by. +.PP +Now that there is room on the stack, values can be pushed on it using the \f(CW\*(C`PUSHs\*(C'\fR +macro. The pushed values will often need to be "mortal" (see +"Reference Counts and Mortality"): +.PP +.Vb 7 +\& PUSHs(sv_2mortal(newSViv(an_integer))) +\& PUSHs(sv_2mortal(newSVuv(an_unsigned_integer))) +\& PUSHs(sv_2mortal(newSVnv(a_double))) +\& PUSHs(sv_2mortal(newSVpv("Some String",0))) +\& /* Although the last example is better written as the more +\& * efficient: */ +\& PUSHs(newSVpvs_flags("Some String", SVs_TEMP)) +.Ve +.PP +And now, in the Perl program calling \f(CW\*(C`tzname\*(C'\fR, the two values will be assigned +as in: +.PP +.Vb 1 +\& ($standard_abbrev, $summer_abbrev) = POSIX::tzname; +.Ve +.PP +An alternate (and possibly simpler) method of pushing values on the stack is +to use the macro: +.PP +.Vb 1 +\& XPUSHs(SV*) +.Ve +.PP +This macro automatically adjusts the stack for you, if needed. Thus, you +do not need to call \f(CW\*(C`EXTEND\*(C'\fR to extend the stack. +.PP +Despite suggestions in earlier versions of this document, the macros +\&\f(CW\*(C`(X)PUSH[iunp]\*(C'\fR are \fInot\fR suited to XSUBs which return multiple results.
+For that, either stick to the \f(CW\*(C`(X)PUSHs\*(C'\fR macros shown above, or use the new +\&\f(CW\*(C`m(X)PUSH[iunp]\*(C'\fR macros instead; see "Putting a C value on Perl stack". +.PP +For more information, consult perlxs and perlxstut. +.SS "Autoloading with XSUBs" +.IX Subsection "Autoloading with XSUBs" +If an AUTOLOAD routine is an XSUB, as with Perl subroutines, Perl puts the +fully-qualified name of the autoloaded subroutine in the \f(CW$AUTOLOAD\fR variable +of the XSUB's package. +.PP +But it also puts the same information in certain fields of the XSUB itself: +.PP +.Vb 4 +\& HV *stash = CvSTASH(cv); +\& const char *subname = SvPVX(cv); +\& STRLEN name_length = SvCUR(cv); /* in bytes */ +\& U32 is_utf8 = SvUTF8(cv); +.Ve +.PP +\&\f(CWSvPVX(cv)\fR contains just the sub name itself, not including the package. +For an AUTOLOAD routine in UNIVERSAL or one of its superclasses, +\&\f(CWCvSTASH(cv)\fR returns NULL during a method call on a nonexistent package. +.PP +\&\fBNote\fR: Setting \f(CW$AUTOLOAD\fR stopped working in 5.6.1, which did not support +XS AUTOLOAD subs at all. Perl 5.8.0 introduced the use of fields in the +XSUB itself. Perl 5.16.0 restored the setting of \f(CW$AUTOLOAD\fR. If you need +to support 5.8\-5.14, use the XSUB's fields. +.SS "Calling Perl Routines from within C Programs" +.IX Subsection "Calling Perl Routines from within C Programs" +There are four routines that can be used to call a Perl subroutine from +within a C program. These four are: +.PP +.Vb 4 +\& I32 call_sv(SV*, I32); +\& I32 call_pv(const char*, I32); +\& I32 call_method(const char*, I32); +\& I32 call_argv(const char*, I32, char**); +.Ve +.PP +The routine most often used is \f(CW\*(C`call_sv\*(C'\fR. The \f(CW\*(C`SV*\*(C'\fR argument +contains either the name of the Perl subroutine to be called, or a +reference to the subroutine. 
The second argument consists of flags +that control the context in which the subroutine is called, whether +or not the subroutine is being passed arguments, how errors should be +trapped, and how to treat return values. +.PP +All four routines return the number of arguments that the subroutine returned +on the Perl stack. +.PP +These routines used to be called \f(CW\*(C`perl_call_sv\*(C'\fR, etc., before Perl v5.6.0, +but those names are now deprecated; macros of the same name are provided for +compatibility. +.PP +When using any of these routines (except \f(CW\*(C`call_argv\*(C'\fR), the programmer +must manipulate the Perl stack. These include the following macros and +functions: +.PP +.Vb 11 +\& dSP +\& SP +\& PUSHMARK() +\& PUTBACK +\& SPAGAIN +\& ENTER +\& SAVETMPS +\& FREETMPS +\& LEAVE +\& XPUSH*() +\& POP*() +.Ve +.PP +For a detailed description of calling conventions from C to Perl, +consult perlcall. +.SS "Putting a C value on Perl stack" +.IX Subsection "Putting a C value on Perl stack" +A lot of opcodes (this is an elementary operation in the internal perl +stack machine) put an SV* on the stack. However, as an optimization +the corresponding SV is (usually) not recreated each time. The opcodes +reuse specially assigned SVs (\fItarget\fRs) which are (as a corollary) +not constantly freed/created. +.PP +Each of the targets is created only once (but see +"Scratchpads and recursion" below), and when an opcode needs to put +an integer, a double, or a string on the stack, it just sets the +corresponding parts of its \fItarget\fR and puts the \fItarget\fR on stack. +.PP +The macro to put this target on stack is \f(CW\*(C`PUSHTARG\*(C'\fR, and it is +directly used in some opcodes, as well as indirectly in zillions of +others, which use it via \f(CW\*(C`(X)PUSH[iunp]\*(C'\fR. +.PP +Because the target is reused, you must be careful when pushing multiple +values on the stack. 
The following code will not do what you think: +.PP +.Vb 2 +\& XPUSHi(10); +\& XPUSHi(20); +.Ve +.PP +This translates as "set \f(CW\*(C`TARG\*(C'\fR to 10, push a pointer to \f(CW\*(C`TARG\*(C'\fR onto +the stack; set \f(CW\*(C`TARG\*(C'\fR to 20, push a pointer to \f(CW\*(C`TARG\*(C'\fR onto the stack". +At the end of the operation, the stack does not contain the values 10 +and 20, but actually contains two pointers to \f(CW\*(C`TARG\*(C'\fR, which we have set +to 20. +.PP +If you need to push multiple different values then you should either use +the \f(CW\*(C`(X)PUSHs\*(C'\fR macros, or else use the new \f(CW\*(C`m(X)PUSH[iunp]\*(C'\fR macros, +none of which make use of \f(CW\*(C`TARG\*(C'\fR. The \f(CW\*(C`(X)PUSHs\*(C'\fR macros simply push an +SV* on the stack, which, as noted under "XSUBs and the Argument Stack", +will often need to be "mortal". The new \f(CW\*(C`m(X)PUSH[iunp]\*(C'\fR macros make +this a little easier to achieve by creating a new mortal for you (via +\&\f(CW\*(C`(X)PUSHmortal\*(C'\fR), pushing that onto the stack (extending it if necessary +in the case of the \f(CW\*(C`mXPUSH[iunp]\*(C'\fR macros), and then setting its value. +Thus, instead of writing this to "fix" the example above: +.PP +.Vb 2 +\& XPUSHs(sv_2mortal(newSViv(10))) +\& XPUSHs(sv_2mortal(newSViv(20))) +.Ve +.PP +you can simply write: +.PP +.Vb 2 +\& mXPUSHi(10) +\& mXPUSHi(20) +.Ve +.PP +On a related note, if you do use \f(CW\*(C`(X)PUSH[iunp]\*(C'\fR, then you're going to +need a \f(CW\*(C`dTARG\*(C'\fR in your variable declarations so that the \f(CW\*(C`*PUSH*\*(C'\fR +macros can make use of the local variable \f(CW\*(C`TARG\*(C'\fR. See also +\&\f(CW\*(C`dTARGET\*(C'\fR and \f(CW\*(C`dXSTARG\*(C'\fR. +.SS Scratchpads +.IX Subsection "Scratchpads" +The question remains on when the SVs which are \fItarget\fRs for opcodes +are created. 
The answer is that they are created when the current +unit\-\-a subroutine or a file (for opcodes for statements outside of +subroutines)\-\-is compiled. During this time a special anonymous Perl +array is created, which is called a scratchpad for the current unit. +.PP +A scratchpad keeps SVs which are lexicals for the current unit and are +targets for opcodes. A previous version of this document +stated that one can deduce that an SV lives on a scratchpad +by looking at its flags: lexicals have \f(CW\*(C`SVs_PADMY\*(C'\fR set, and +\&\fItarget\fRs have \f(CW\*(C`SVs_PADTMP\*(C'\fR set. But this has never been fully true. +\&\f(CW\*(C`SVs_PADMY\*(C'\fR could be set on a variable that no longer resides in any pad. +While \fItarget\fRs do have \f(CW\*(C`SVs_PADTMP\*(C'\fR set, it can also be set on variables +that have never resided in a pad, but nonetheless act like \fItarget\fRs. As +of perl 5.21.5, the \f(CW\*(C`SVs_PADMY\*(C'\fR flag is no longer used and is defined as +0. \f(CWSvPADMY()\fR now returns true for anything without \f(CW\*(C`SVs_PADTMP\*(C'\fR. +.PP +The correspondence between OPs and \fItarget\fRs is not 1\-to\-1. Different +OPs in the compile tree of the unit can use the same target, if this +would not conflict with the expected life of the temporary. +.SS "Scratchpads and recursion" +.IX Subsection "Scratchpads and recursion" +In fact it is not 100% true that a compiled unit contains a pointer to +the scratchpad AV. In fact it contains a pointer to an AV of +(initially) one element, and this element is the scratchpad AV. Why do +we need an extra level of indirection? +.PP +The answer is \fBrecursion\fR, and maybe \fBthreads\fR. Both +of these can create several execution pointers going into the same +subroutine. For the subroutine-child not to write over the temporaries +for the subroutine-parent (lifespan of which covers the call to the +child), the parent and the child should have different +scratchpads.
(\fIAnd\fR the lexicals should be separate anyway!) +.PP +So each subroutine is born with an array of scratchpads (of length 1). +On each entry to the subroutine it is checked that the current +depth of the recursion is not more than the length of this array, and +if it is, a new scratchpad is created and pushed into the array. +.PP +The \fItarget\fRs on this scratchpad are \f(CW\*(C`undef\*(C'\fRs, but they are already +marked with the correct flags. +.SH "Memory Allocation" +.IX Header "Memory Allocation" +.SS Allocation +.IX Subsection "Allocation" +All memory meant to be used with the Perl API functions should be manipulated +using the macros described in this section. The macros provide the necessary +transparency between differences in the actual malloc implementation that is +used within perl. +.PP +The following three macros are used to initially allocate memory: +.PP +.Vb 3 +\& Newx(pointer, number, type); +\& Newxc(pointer, number, type, cast); +\& Newxz(pointer, number, type); +.Ve +.PP +The first argument \f(CW\*(C`pointer\*(C'\fR should be the name of a variable that will +point to the newly allocated memory. +.PP +The second and third arguments \f(CW\*(C`number\*(C'\fR and \f(CW\*(C`type\*(C'\fR specify how many of +the specified type of data structure should be allocated. The argument +\&\f(CW\*(C`type\*(C'\fR is passed to \f(CW\*(C`sizeof\*(C'\fR. The final argument to \f(CW\*(C`Newxc\*(C'\fR, \f(CW\*(C`cast\*(C'\fR, +should be used if the type of the \f(CW\*(C`pointer\*(C'\fR variable differs from the \f(CW\*(C`type\*(C'\fR +argument. +.PP +Unlike the \f(CW\*(C`Newx\*(C'\fR and \f(CW\*(C`Newxc\*(C'\fR macros, the \f(CW\*(C`Newxz\*(C'\fR macro calls \f(CW\*(C`memzero\*(C'\fR +to zero out all the newly allocated memory.
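The shape of the three allocation macros can be pictured with the C library. This is a rough sketch, not Perl's definitions: TOY_NEWX, TOY_NEWXC and TOY_NEWXZ are hypothetical stand-ins built on malloc and calloc, whereas the real macros go through whichever allocator perl was built with and may add further checks. The sketch assumes that the cast argument names the element type the resulting pointer is cast to.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical models of Newx, Newxc and Newxz; NOT the real macros. */
#define TOY_NEWX(v, n, t)     ((v) = (t *)malloc((size_t)(n) * sizeof(t)))
#define TOY_NEWXC(v, n, t, c) ((v) = (c *)malloc((size_t)(n) * sizeof(t)))
#define TOY_NEWXZ(v, n, t)    ((v) = (t *)calloc((size_t)(n), sizeof(t)))

/* Sums a freshly zeroed array of n ints; zeroed memory sums to 0. */
int toy_sum_zeroed(int n) {
    int *a;
    int i, sum = 0;
    assert(n > 0);
    TOY_NEWXZ(a, n, int);
    assert(a != NULL);
    for (i = 0; i < n; i++) sum += a[i];
    free(a);
    return sum;
}

/* Sizes a buffer in ints but holds it through an unsigned char pointer,
 * which is the situation the extra cast argument exists for.
 * Returns the number of bytes that were requested. */
size_t toy_bytes_for_ints(int n) {
    unsigned char *buf;
    size_t bytes;
    assert(n > 0);
    bytes = (size_t)n * sizeof(int);
    TOY_NEWXC(buf, n, int, unsigned char);
    assert(buf != NULL);
    buf[bytes - 1] = 0xFF;   /* the last requested byte is usable */
    free(buf);
    return bytes;
}
```

In real XS code the matching release call is Safefree rather than free; the C library calls here exist only so the model builds without perl.h.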
+.SS Reallocation +.IX Subsection "Reallocation" +.Vb 3 +\& Renew(pointer, number, type); +\& Renewc(pointer, number, type, cast); +\& Safefree(pointer); +.Ve +.PP +These three macros are used to change a memory buffer size or to free a +piece of memory no longer needed. The arguments to \f(CW\*(C`Renew\*(C'\fR and \f(CW\*(C`Renewc\*(C'\fR +match those of \f(CW\*(C`Newx\*(C'\fR and \f(CW\*(C`Newxc\*(C'\fR. +.SS Moving +.IX Subsection "Moving" +.Vb 3 +\& Move(source, dest, number, type); +\& Copy(source, dest, number, type); +\& Zero(dest, number, type); +.Ve +.PP +These three macros are used to move, copy, or zero out previously allocated +memory. The \f(CW\*(C`source\*(C'\fR and \f(CW\*(C`dest\*(C'\fR arguments point to the source and +destination starting points. Perl will move, copy, or zero out \f(CW\*(C`number\*(C'\fR +instances of the size of the \f(CW\*(C`type\*(C'\fR data structure (using the \f(CW\*(C`sizeof\*(C'\fR +operator). +.SH PerlIO +.IX Header "PerlIO" +The most recent development releases of Perl have been experimenting with +removing Perl's dependency on the "normal" standard I/O suite and allowing +other stdio implementations to be used. This involves creating a new +abstraction layer that then calls whichever implementation of stdio Perl +was compiled with. All XSUBs should now use the functions in the PerlIO +abstraction layer and not make any assumptions about what kind of stdio +is being used. +.PP +For a complete description of the PerlIO abstraction, consult perlapio. +.SH "Compiled code" +.IX Header "Compiled code" +.SS "Code tree" +.IX Subsection "Code tree" +Here we describe the internal form your code is converted to by +Perl. Start with a simple example: +.PP +.Vb 1 +\& $a = $b + $c; +.Ve +.PP +This is converted to a tree similar to this one: +.PP +.Vb 5 +\& assign\-to +\& / \e +\& + $a +\& / \e +\& $b $c +.Ve +.PP +(but slightly more complicated).
This tree reflects the way Perl +parsed your code, but has nothing to do with the execution order. +There is an additional "thread" going through the nodes of the tree +which shows the order of execution of the nodes. In our simplified +example above it looks like: +.PP +.Vb 1 +\& $b \-\-\-> $c \-\-\-> + \-\-\-> $a \-\-\-> assign\-to +.Ve +.PP +But with the actual compile tree for \f(CW\*(C`$a = $b + $c\*(C'\fR it is different: +some nodes are \fIoptimized away\fR. As a corollary, though the actual tree +contains more nodes than our simplified example, the execution order +is the same as in our example. +.SS "Examining the tree" +.IX Subsection "Examining the tree" +If you have your perl compiled for debugging (usually done with +\&\f(CW\*(C`\-DDEBUGGING\*(C'\fR on the \f(CW\*(C`Configure\*(C'\fR command line), you may examine the +compiled tree by specifying \f(CW\*(C`\-Dx\*(C'\fR on the Perl command line. The +output takes several lines per node, and for \f(CW\*(C`$b+$c\*(C'\fR it looks like +this: +.PP +.Vb 10 +\& 5 TYPE = add ===> 6 +\& TARG = 1 +\& FLAGS = (SCALAR,KIDS) +\& { +\& TYPE = null ===> (4) +\& (was rv2sv) +\& FLAGS = (SCALAR,KIDS) +\& { +\& 3 TYPE = gvsv ===> 4 +\& FLAGS = (SCALAR) +\& GV = main::b +\& } +\& } +\& { +\& TYPE = null ===> (5) +\& (was rv2sv) +\& FLAGS = (SCALAR,KIDS) +\& { +\& 4 TYPE = gvsv ===> 5 +\& FLAGS = (SCALAR) +\& GV = main::c +\& } +\& } +.Ve +.PP +This tree has 5 nodes (one per \f(CW\*(C`TYPE\*(C'\fR specifier), but only 3 of them are +not optimized away (one per number in the left column). The immediate +children of the given node correspond to \f(CW\*(C`{}\*(C'\fR pairs on the same level +of indentation, thus this listing corresponds to the tree: +.PP +.Vb 5 +\& add +\& / \e +\& null null +\& | | +\& gvsv gvsv +.Ve +.PP +The execution order is indicated by \f(CW\*(C`===>\*(C'\fR marks, thus it is \f(CW\*(C`3 +4 5 6\*(C'\fR (node \f(CW6\fR is not included in the above listing), i.e., +\&\f(CW\*(C`gvsv gvsv add whatever\*(C'\fR.
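The difference between the tree shape and the execution-order thread can be sketched with a toy structure. Everything here is hypothetical: SketchOp and its field names are invented for the illustration and are not perl's op structures. The sketch builds the five-node tree from the listing above and then follows only the thread pointers, which skip the two null nodes.

```c
#include <stddef.h>
#include <string.h>

/* Toy stand-in for an op; NOT perl's OP structure. */
typedef struct SketchOp {
    const char *name;
    struct SketchOp *first;    /* first child (tree shape) */
    struct SketchOp *sibling;  /* next sibling (tree shape) */
    struct SketchOp *next;     /* execution-order thread */
} SketchOp;

/* Build the tree for $b + $c in nodes[0..4] and return the op that
 * executes first. nodes[0] is add, nodes[1] and nodes[2] are the
 * optimized-away null ops, nodes[3] and nodes[4] are the gvsv ops. */
SketchOp *sketch_build_add_tree(SketchOp nodes[5])
{
    static const char *names[5] = { "add", "null", "null", "gvsv", "gvsv" };

    for (size_t i = 0; i < 5; i++) {
        nodes[i].name = names[i];
        nodes[i].first = nodes[i].sibling = nodes[i].next = NULL;
    }
    /* Tree shape: add has two null children, each with one gvsv child. */
    nodes[0].first = &nodes[1];
    nodes[1].sibling = &nodes[2];
    nodes[1].first = &nodes[3];
    nodes[2].first = &nodes[4];
    /* Execution thread: gvsv, gvsv, add; the null ops are bypassed. */
    nodes[3].next = &nodes[4];
    nodes[4].next = &nodes[0];
    return &nodes[3];
}

/* Follow the execution thread from start, writing the op names into buf
 * separated by single spaces. Returns the number of ops written. */
size_t sketch_exec_order(const SketchOp *start, char *buf, size_t buflen)
{
    size_t count = 0, used = 0;

    if (buflen == 0)
        return 0;
    buf[0] = '\0';
    for (const SketchOp *o = start; o != NULL; o = o->next) {
        size_t len = strlen(o->name);
        size_t need = len + (count ? 1 : 0);
        if (used + need + 1 > buflen)
            break;                      /* stop rather than overflow */
        if (count)
            buf[used++] = ' ';
        memcpy(buf + used, o->name, len + 1);
        used += len;
        count++;
    }
    return count;
}
```

Walking the thread from the returned op visits three ops, matching the three numbered nodes in the listing, even though the tree itself holds five.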
+.PP +Each of these nodes represents an op, a fundamental operation inside the +Perl core. The code which implements each operation can be found in the +\&\fIpp*.c\fR files; the function which implements the op with type \f(CW\*(C`gvsv\*(C'\fR +is \f(CW\*(C`pp_gvsv\*(C'\fR, and so on. As the tree above shows, different ops have +different numbers of children: \f(CW\*(C`add\*(C'\fR is a binary operator, as one would +expect, and so has two children. To accommodate the various different +numbers of children, there are various types of op data structure, and +they link together in different ways. +.PP +The simplest type of op structure is \f(CW\*(C`OP\*(C'\fR: this has no children. Unary +operators, \f(CW\*(C`UNOP\*(C'\fRs, have one child, and this is pointed to by the +\&\f(CW\*(C`op_first\*(C'\fR field. Binary operators (\f(CW\*(C`BINOP\*(C'\fRs) have not only an +\&\f(CW\*(C`op_first\*(C'\fR field but also an \f(CW\*(C`op_last\*(C'\fR field. The most complex type of +op is a \f(CW\*(C`LISTOP\*(C'\fR, which has any number of children. In this case, the +first child is pointed to by \f(CW\*(C`op_first\*(C'\fR and the last child by +\&\f(CW\*(C`op_last\*(C'\fR. The children in between can be found by iteratively +following the \f(CW\*(C`OpSIBLING\*(C'\fR pointer from the first child to the last (but +see below). +.PP +There are also some other op types: a \f(CW\*(C`PMOP\*(C'\fR holds a regular expression, +and has no children, and a \f(CW\*(C`LOOP\*(C'\fR may or may not have children. If the +\&\f(CW\*(C`op_children\*(C'\fR field is non-zero, it behaves like a \f(CW\*(C`LISTOP\*(C'\fR. To +complicate matters, if a \f(CW\*(C`UNOP\*(C'\fR is actually a \f(CW\*(C`null\*(C'\fR op after +optimization (see "Compile pass 2: context propagation") it will still +have children in accordance with its former type. +.PP +Finally, there is a \f(CW\*(C`LOGOP\*(C'\fR, or logic op. 
Like a \f(CW\*(C`LISTOP\*(C'\fR, this has one +or more children, but it doesn't have an \f(CW\*(C`op_last\*(C'\fR field: so you have to +follow \f(CW\*(C`op_first\*(C'\fR and then the \f(CW\*(C`OpSIBLING\*(C'\fR chain itself to find the +last child. Instead it has an \f(CW\*(C`op_other\*(C'\fR field, which is comparable to +the \f(CW\*(C`op_next\*(C'\fR field described below, and represents an alternate +execution path. Operators like \f(CW\*(C`and\*(C'\fR, \f(CW\*(C`or\*(C'\fR and \f(CW\*(C`?\*(C'\fR are \f(CW\*(C`LOGOP\*(C'\fRs. Note +that in general, \f(CW\*(C`op_other\*(C'\fR may not point to any of the direct children +of the \f(CW\*(C`LOGOP\*(C'\fR. +.PP +Starting in version 5.21.2, perls built with the experimental +define \f(CW\*(C`\-DPERL_OP_PARENT\*(C'\fR add an extra boolean flag for each op, +\&\f(CW\*(C`op_moresib\*(C'\fR. When not set, this indicates that this is the last op in an +\&\f(CW\*(C`OpSIBLING\*(C'\fR chain. This frees up the \f(CW\*(C`op_sibling\*(C'\fR field on the last +sibling to point back to the parent op. Under this build, that field is +also renamed \f(CW\*(C`op_sibparent\*(C'\fR to reflect its joint role. The macro +\&\f(CWOpSIBLING(o)\fR wraps this special behaviour, and always returns NULL on +the last sibling. With this build the \f(CWop_parent(o)\fR function can be +used to find the parent of any op. Thus for forward compatibility, you +should always use the \f(CWOpSIBLING(o)\fR macro rather than accessing +\&\f(CW\*(C`op_sibling\*(C'\fR directly. +.PP +Another way to examine the tree is to use a compiler back-end module, such +as B::Concise. +.SS "Compile pass 1: check routines" +.IX Subsection "Compile pass 1: check routines" +The tree is created by the compiler while \fIyacc\fR code feeds it +the constructions it recognizes. Since \fIyacc\fR works bottom-up, so does +the first pass of perl compilation. +.PP +What makes this pass interesting for perl developers is that some +optimization may be performed on this pass. 
This is optimization by +so-called "check routines". The correspondence between node names +and corresponding check routines is described in \fIopcode.pl\fR (do not +forget to run \f(CW\*(C`make regen_headers\*(C'\fR if you modify this file). +.PP +A check routine is called when the node is fully constructed except +for the execution-order thread. Since at this time there are no +back-links to the currently constructed node, one can do most any +operation to the top-level node, including freeing it and/or creating +new nodes above/below it. +.PP +The check routine returns the node which should be inserted into the +tree (if the top-level node was not modified, check routine returns +its argument). +.PP +By convention, check routines have names \f(CW\*(C`ck_*\*(C'\fR. They are usually +called from \f(CW\*(C`new*OP\*(C'\fR subroutines (or \f(CW\*(C`convert\*(C'\fR) (which in turn are +called from \fIperly.y\fR). +.SS "Compile pass 1a: constant folding" +.IX Subsection "Compile pass 1a: constant folding" +Immediately after the check routine is called the returned node is +checked for being compile-time executable. If it is (the value is +judged to be constant) it is immediately executed, and a \fIconstant\fR +node with the "return value" of the corresponding subtree is +substituted instead. The subtree is deleted. +.PP +If constant folding was not performed, the execution-order thread is +created. +.SS "Compile pass 2: context propagation" +.IX Subsection "Compile pass 2: context propagation" +When a context for a part of compile tree is known, it is propagated +down through the tree. At this time the context can have 5 values +(instead of 2 for runtime context): void, boolean, scalar, list, and +lvalue. In contrast with the pass 1 this pass is processed from top +to bottom: a node's context determines the context for its children. +.PP +Additional context-dependent optimizations are performed at this time. 
+Since at this moment the compile tree contains back-references (via +"thread" pointers), nodes cannot be \fBfree()\fRd now. To allow +optimized-away nodes at this stage, such nodes are \fBnull()\fRified instead +of being \fBfree()\fRd (i.e. their type is changed to OP_NULL). +.SS "Compile pass 3: peephole optimization" +.IX Subsection "Compile pass 3: peephole optimization" +After the compile tree for a subroutine (or for an \f(CW\*(C`eval\*(C'\fR or a file) +is created, an additional pass over the code is performed. This pass +is neither top-down nor bottom-up, but in the execution order (with +additional complications for conditionals). Optimizations performed +at this stage are subject to the same restrictions as in pass 2. +.PP +Peephole optimizations are done by calling the function pointed to +by the global variable \f(CW\*(C`PL_peepp\*(C'\fR. By default, \f(CW\*(C`PL_peepp\*(C'\fR just +calls the function pointed to by the global variable \f(CW\*(C`PL_rpeepp\*(C'\fR. +By default, that performs some basic op fixups and optimisations along +the execution-order op chain, and recursively calls \f(CW\*(C`PL_rpeepp\*(C'\fR for +each side chain of ops (resulting from conditionals).
Extensions may +provide additional optimisations or fixups, hooking into either the +per-subroutine or recursive stage, like this: +.PP +.Vb 10 +\& static peep_t prev_peepp; +\& static void my_peep(pTHX_ OP *o) +\& { +\& /* custom per\-subroutine optimisation goes here */ +\& prev_peepp(aTHX_ o); +\& /* custom per\-subroutine optimisation may also go here */ +\& } +\& BOOT: +\& prev_peepp = PL_peepp; +\& PL_peepp = my_peep; +\& +\& static peep_t prev_rpeepp; +\& static void my_rpeep(pTHX_ OP *first) +\& { +\& OP *o = first, *t = first; +\& for(; o; o = o\->op_next, t = t\->op_next) { +\& /* custom per\-op optimisation goes here */ +\& o = o\->op_next; +\& if (!o || o == t) break; +\& /* custom per\-op optimisation goes AND here */ +\& } +\& prev_rpeepp(aTHX_ first); +\& } +\& BOOT: +\& prev_rpeepp = PL_rpeepp; +\& PL_rpeepp = my_rpeep; +.Ve +.SS "Pluggable runops" +.IX Subsection "Pluggable runops" +The compile tree is executed in a runops function. There are two runops +functions, in \fIrun.c\fR and in \fIdump.c\fR. \f(CW\*(C`Perl_runops_debug\*(C'\fR is used +with DEBUGGING and \f(CW\*(C`Perl_runops_standard\*(C'\fR is used otherwise. For fine +control over the execution of the compile tree it is possible to provide +your own runops function. +.PP +It's probably best to copy one of the existing runops functions and +change it to suit your needs. Then, in the BOOT section of your XS +file, add the line: +.PP +.Vb 1 +\& PL_runops = my_runops; +.Ve +.PP +This function should be as efficient as possible to keep your programs +running as fast as possible. +.SS "Compile-time scope hooks" +.IX Subsection "Compile-time scope hooks" +As of perl 5.14 it is possible to hook into the compile-time lexical +scope mechanism using \f(CW\*(C`Perl_blockhook_register\*(C'\fR.
This is used like +this: +.PP +.Vb 2 +\& STATIC void my_start_hook(pTHX_ int full); +\& STATIC BHK my_hooks; +\& +\& BOOT: +\& BhkENTRY_set(&my_hooks, bhk_start, my_start_hook); +\& Perl_blockhook_register(aTHX_ &my_hooks); +.Ve +.PP +This will arrange to have \f(CW\*(C`my_start_hook\*(C'\fR called at the start of +compiling every lexical scope. The available hooks are: +.ie n .IP """void bhk_start(pTHX_ int full)""" 4 +.el .IP "\f(CWvoid bhk_start(pTHX_ int full)\fR" 4 +.IX Item "void bhk_start(pTHX_ int full)" +This is called just after starting a new lexical scope. Note that Perl +code like +.Sp +.Vb 1 +\& if ($x) { ... } +.Ve +.Sp +creates two scopes: the first starts at the \f(CW\*(C`(\*(C'\fR and has \f(CW\*(C`full == 1\*(C'\fR, +the second starts at the \f(CW\*(C`{\*(C'\fR and has \f(CW\*(C`full == 0\*(C'\fR. Both end at the +\&\f(CW\*(C`}\*(C'\fR, so calls to \f(CW\*(C`start\*(C'\fR and \f(CW\*(C`pre\*(C'\fR/\f(CW\*(C`post_end\*(C'\fR will match. Anything +pushed onto the save stack by this hook will be popped just before the +scope ends (between the \f(CW\*(C`pre_\*(C'\fR and \f(CW\*(C`post_end\*(C'\fR hooks, in fact). +.ie n .IP """void bhk_pre_end(pTHX_ OP **o)""" 4 +.el .IP "\f(CWvoid bhk_pre_end(pTHX_ OP **o)\fR" 4 +.IX Item "void bhk_pre_end(pTHX_ OP **o)" +This is called at the end of a lexical scope, just before unwinding the +stack. \fIo\fR is the root of the optree representing the scope; it is a +double pointer so you can replace the OP if you need to. +.ie n .IP """void bhk_post_end(pTHX_ OP **o)""" 4 +.el .IP "\f(CWvoid bhk_post_end(pTHX_ OP **o)\fR" 4 +.IX Item "void bhk_post_end(pTHX_ OP **o)" +This is called at the end of a lexical scope, just after unwinding the +stack. \fIo\fR is as above. Note that it is possible for calls to \f(CW\*(C`pre_\*(C'\fR +and \f(CW\*(C`post_end\*(C'\fR to nest, if there is something on the save stack that +calls string eval. 
+.ie n .IP """void bhk_eval(pTHX_ OP *const o)""" 4 +.el .IP "\f(CWvoid bhk_eval(pTHX_ OP *const o)\fR" 4 +.IX Item "void bhk_eval(pTHX_ OP *const o)" +This is called just before starting to compile an \f(CW\*(C`eval STRING\*(C'\fR, \f(CW\*(C`do +FILE\*(C'\fR, \f(CW\*(C`require\*(C'\fR or \f(CW\*(C`use\*(C'\fR, after the eval has been set up. \fIo\fR is the +OP that requested the eval, and will normally be an \f(CW\*(C`OP_ENTEREVAL\*(C'\fR, +\&\f(CW\*(C`OP_DOFILE\*(C'\fR or \f(CW\*(C`OP_REQUIRE\*(C'\fR. +.PP +Once you have your hook functions, you need a \f(CW\*(C`BHK\*(C'\fR structure to put +them in. It's best to allocate it statically, since there is no way to +free it once it's registered. The function pointers should be inserted +into this structure using the \f(CW\*(C`BhkENTRY_set\*(C'\fR macro, which will also set +flags indicating which entries are valid. If you do need to allocate +your \f(CW\*(C`BHK\*(C'\fR dynamically for some reason, be sure to zero it before you +start. +.PP +Once registered, there is no mechanism to switch these hooks off, so if +that is necessary you will need to do this yourself. An entry in \f(CW\*(C`%^H\*(C'\fR +is probably the best way, so the effect is lexically scoped; however it +is also possible to use the \f(CW\*(C`BhkDISABLE\*(C'\fR and \f(CW\*(C`BhkENABLE\*(C'\fR macros to +temporarily switch entries on and off. You should also be aware that +generally speaking at least one scope will have opened before your +extension is loaded, so you will see some \f(CW\*(C`pre\*(C'\fR/\f(CW\*(C`post_end\*(C'\fR pairs that +didn't have a matching \f(CW\*(C`start\*(C'\fR. +.ie n .SH "Examining internal data structures with the ""dump"" functions" +.el .SH "Examining internal data structures with the \f(CWdump\fP functions" +.IX Header "Examining internal data structures with the dump functions" +To aid debugging, the source file \fIdump.c\fR contains a number of +functions which produce formatted output of internal data structures. 
+.PP +The most commonly used of these functions is \f(CW\*(C`Perl_sv_dump\*(C'\fR; it's used +for dumping SVs, AVs, HVs, and CVs. The \f(CW\*(C`Devel::Peek\*(C'\fR module calls +\&\f(CW\*(C`sv_dump\*(C'\fR to produce debugging output from Perl-space, so users of that +module should already be familiar with its format. +.PP +\&\f(CW\*(C`Perl_op_dump\*(C'\fR can be used to dump an \f(CW\*(C`OP\*(C'\fR structure or any of its +derivatives, and produces output similar to \f(CW\*(C`perl \-Dx\*(C'\fR; in fact, +\&\f(CW\*(C`Perl_dump_eval\*(C'\fR will dump the main root of the code being evaluated, +exactly like \f(CW\*(C`\-Dx\*(C'\fR. +.PP +Other useful functions are \f(CW\*(C`Perl_dump_sub\*(C'\fR, which turns a \f(CW\*(C`GV\*(C'\fR into an +op tree, \f(CW\*(C`Perl_dump_packsubs\*(C'\fR which calls \f(CW\*(C`Perl_dump_sub\*(C'\fR on all the +subroutines in a package like so: (Thankfully, these are all xsubs, so +there is no op tree) +.PP +.Vb 1 +\& (gdb) print Perl_dump_packsubs(PL_defstash) +\& +\& SUB attributes::bootstrap = (xsub 0x811fedc 0) +\& +\& SUB UNIVERSAL::can = (xsub 0x811f50c 0) +\& +\& SUB UNIVERSAL::isa = (xsub 0x811f304 0) +\& +\& SUB UNIVERSAL::VERSION = (xsub 0x811f7ac 0) +\& +\& SUB DynaLoader::boot_DynaLoader = (xsub 0x805b188 0) +.Ve +.PP +and \f(CW\*(C`Perl_dump_all\*(C'\fR, which dumps all the subroutines in the stash and +the op tree of the main root. +.SH "How multiple interpreters and concurrency are supported" +.IX Header "How multiple interpreters and concurrency are supported" +.SS "Background and MULTIPLICITY" +.IX Subsection "Background and MULTIPLICITY" +The Perl interpreter can be regarded as a closed box: it has an API +for feeding it code or otherwise making it do things, but it also has +functions for its own use. This smells a lot like an object, and +there is a way for you to build Perl so that you can have multiple +interpreters, with one interpreter represented either as a C structure, +or inside a thread-specific structure. 
These structures contain all +the context, the state of that interpreter. +.PP +The macro that controls the major Perl build flavor is MULTIPLICITY. The +MULTIPLICITY build has a C structure that packages all the interpreter +state, which is passed to various perl functions as a "hidden" +first argument. MULTIPLICITY makes multi-threaded perls possible (with the +ithreads threading model, related to the macro USE_ITHREADS). +.PP +PERL_IMPLICIT_CONTEXT is a legacy synonym for MULTIPLICITY. +.PP +To see whether you have non-const data you can use a BSD (or GNU) +compatible \f(CW\*(C`nm\*(C'\fR: +.PP +.Vb 1 +\& nm libperl.a | grep \-v \*(Aq [TURtr] \*(Aq +.Ve +.PP +If this displays any \f(CW\*(C`D\*(C'\fR or \f(CW\*(C`d\*(C'\fR symbols (or possibly \f(CW\*(C`C\*(C'\fR or \f(CW\*(C`c\*(C'\fR), +you have non-const data. The symbols the \f(CW\*(C`grep\*(C'\fR removed are as follows: +\&\f(CW\*(C`Tt\*(C'\fR are \fItext\fR, or code, the \f(CW\*(C`Rr\*(C'\fR are \fIread-only\fR (const) data, +and the \f(CW\*(C`U\*(C'\fR marks undefined symbols, that is, external symbols referred to. +.PP +The test \fIt/porting/libperl.t\fR does this kind of symbol sanity +checking on \f(CW\*(C`libperl.a\*(C'\fR. +.PP +All this obviously requires a way for the Perl internal functions to be +either subroutines taking some kind of structure as the first +argument, or subroutines taking nothing as the first argument. To +enable these two very different ways of building the interpreter, +the Perl source (as it does in so many other situations) makes heavy +use of macros and subroutine naming conventions. +.PP +First problem: deciding which functions will be public API functions and +which will be private. All functions whose names begin \f(CW\*(C`S_\*(C'\fR are private +(think "S" for "secret" or "static"). All other functions begin with +"Perl_", but just because a function begins with "Perl_" does not mean it is +part of the API. (See "Internal +Functions".)
The easiest way to be \fBsure\fR a +function is part of the API is to find its entry in perlapi. +If it exists in perlapi, it's part of the API. If it doesn't, and you +think it should be (i.e., you need it for your extension), submit an issue at + explaining why you think it should be. +.PP +Second problem: there must be a syntax so that the same subroutine +declarations and calls can pass a structure as their first argument, +or pass nothing. To solve this, the subroutines are named and +declared in a particular way. Here's a typical start of a static +function used within the Perl guts: +.PP +.Vb 2 +\& STATIC void +\& S_incline(pTHX_ char *s) +.Ve +.PP +STATIC becomes "static" in C, and may be #define'd to nothing in some +configurations in the future. +.PP +A public function (i.e. part of the internal API, but not necessarily +sanctioned for use in extensions) begins like this: +.PP +.Vb 2 +\& void +\& Perl_sv_setiv(pTHX_ SV* dsv, IV num) +.Ve +.PP +\&\f(CW\*(C`pTHX_\*(C'\fR is one of a number of macros (in \fIperl.h\fR) that hide the +details of the interpreter's context. THX stands for "thread", "this", +or "thingy", as the case may be. (And no, George Lucas is not involved. :\-) +The first character could be 'p' for a \fBp\fRrototype, 'a' for \fBa\fRrgument, +or 'd' for \fBd\fReclaration, so we have \f(CW\*(C`pTHX\*(C'\fR, \f(CW\*(C`aTHX\*(C'\fR and \f(CW\*(C`dTHX\*(C'\fR, and +their variants. +.PP +When Perl is built without options that set MULTIPLICITY, there is no +first argument containing the interpreter's context. The trailing underscore +in the pTHX_ macro indicates that the macro expansion needs a comma +after the context argument because other arguments follow it. If +MULTIPLICITY is not defined, pTHX_ will be ignored, and the +subroutine is not prototyped to take the extra argument. The form of the +macro without the trailing underscore is used when there are no additional +explicit arguments. 
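The trailing-underscore convention can be sketched outside perl with a few macros. All names here (SketchInterp, SKETCH_MULTIPLICITY, pSK, aSK and so on) are hypothetical and only mimic the shape of pTHX and aTHX; they are not the definitions in \fIperl.h\fR. The same function bodies compile whether or not the context argument exists.

```c
/* Toy interpreter state; NOT perl's interpreter structure. */
typedef struct SketchInterp {
    long counter;
} SketchInterp;

/* Comment this out to build the "single global interpreter" flavor. */
#define SKETCH_MULTIPLICITY 1

#ifdef SKETCH_MULTIPLICITY
#  define pSK        SketchInterp *my_sk   /* prototype, no other args */
#  define pSK_       pSK,                  /* prototype, more args follow */
#  define aSK        my_sk                 /* argument, no other args */
#  define aSK_       aSK,                  /* argument, more args follow */
#  define SK_COUNTER (my_sk->counter)
#else
static SketchInterp sketch_global;
#  define pSK        void
#  define pSK_
#  define aSK
#  define aSK_
#  define SK_COUNTER (sketch_global.counter)
#endif

/* Takes explicit arguments, so it uses the underscore form. */
static void sketch_bump(pSK_ long by)
{
    SK_COUNTER += by;
}

/* Takes no explicit arguments, so it uses the form without underscore. */
static long sketch_read(pSK)
{
    return SK_COUNTER;
}

/* A caller passes its own context on, again without writing any comma. */
long sketch_bump_twice(pSK_ long by)
{
    sketch_bump(aSK_ by);
    sketch_bump(aSK_ by);
    return sketch_read(aSK);
}
```

With SKETCH_MULTIPLICITY defined, a caller supplies the state explicitly, for example sketch_bump_twice(&interp, 5); without it the same source compiles to functions that take only the explicit arguments.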
+.PP +When a core function calls another, it must pass the context. This +is normally hidden via macros. Consider \f(CW\*(C`sv_setiv\*(C'\fR. It expands into +something like this: +.PP +.Vb 6 +\& #ifdef MULTIPLICITY +\& #define sv_setiv(a,b) Perl_sv_setiv(aTHX_ a, b) +\& /* can\*(Aqt do this for vararg functions, see below */ +\& #else +\& #define sv_setiv Perl_sv_setiv +\& #endif +.Ve +.PP +This works well, and means that XS authors can gleefully write: +.PP +.Vb 1 +\& sv_setiv(foo, bar); +.Ve +.PP +and still have it work under all the modes Perl could have been +compiled with. +.PP +This doesn't work so cleanly for varargs functions, though, as macros +imply that the number of arguments is known in advance. Instead we +either need to spell them out fully, passing \f(CW\*(C`aTHX_\*(C'\fR as the first +argument (the Perl core tends to do this with functions like +Perl_warner), or use a context-free version. +.PP +The context-free version of Perl_warner is called +Perl_warner_nocontext, and does not take the extra argument. Instead +it does \f(CW\*(C`dTHX;\*(C'\fR to get the context from thread-local storage. We +\&\f(CW\*(C`#define warner Perl_warner_nocontext\*(C'\fR so that extensions get source +compatibility at the expense of performance. (Passing an arg is +cheaper than grabbing it from thread-local storage.) +.PP +You can ignore [pad]THXx when browsing the Perl headers/sources. +Those are strictly for use within the core. Extensions and embedders +need only be aware of [pad]THX. +.SS "So what happened to dTHR?" +.IX Subsection "So what happened to dTHR?" +\&\f(CW\*(C`dTHR\*(C'\fR was introduced in perl 5.005 to support the older thread model. +The older thread model now uses the \f(CW\*(C`THX\*(C'\fR mechanism to pass context +pointers around, so \f(CW\*(C`dTHR\*(C'\fR is not useful any more. Perl 5.6.0 and +later still have it for backward source compatibility, but it is defined +to be a no-op. +.SS "How do I use all this in extensions?" 
+.IX Subsection "How do I use all this in extensions?" +When Perl is built with MULTIPLICITY, extensions that call +any functions in the Perl API will need to pass the initial context +argument somehow. The kicker is that you will need to write it in +such a way that the extension still compiles when Perl hasn't been +built with MULTIPLICITY enabled. +.PP +There are three ways to do this. First, the easy but inefficient way, +which is also the default, in order to maintain source compatibility +with extensions: whenever \fIXSUB.h\fR is #included, it redefines the aTHX +and aTHX_ macros to call a function that will return the context. +Thus, something like: +.PP +.Vb 1 +\& sv_setiv(sv, num); +.Ve +.PP +in your extension will translate to this when MULTIPLICITY is +in effect: +.PP +.Vb 1 +\& Perl_sv_setiv(Perl_get_context(), sv, num); +.Ve +.PP +or to this otherwise: +.PP +.Vb 1 +\& Perl_sv_setiv(sv, num); +.Ve +.PP +You don't have to do anything new in your extension to get this; since +the Perl library provides \fBPerl_get_context()\fR, it will all just +work. +.PP +The second, more efficient way is to use the following template for +your Foo.xs: +.PP +.Vb 4 +\& #define PERL_NO_GET_CONTEXT /* we want efficiency */ +\& #include "EXTERN.h" +\& #include "perl.h" +\& #include "XSUB.h" +\& +\& STATIC void my_private_function(int arg1, int arg2); +\& +\& STATIC void +\& my_private_function(int arg1, int arg2) +\& { +\& dTHX; /* fetch context */ +\& ... call many Perl API functions ... +\& } +\& +\& [... etc ...] +\& +\& MODULE = Foo PACKAGE = Foo +\& +\& /* typical XSUB */ +\& +\& void +\& my_xsub(arg) +\& int arg +\& CODE: +\& my_private_function(arg, 10); +.Ve +.PP +Note that the only two changes from the normal way of writing an +extension is the addition of a \f(CW\*(C`#define PERL_NO_GET_CONTEXT\*(C'\fR before +including the Perl headers, followed by a \f(CW\*(C`dTHX;\*(C'\fR declaration at +the start of every function that will call the Perl API. 
(You'll +know which functions need this, because the C compiler will complain +that there's an undeclared identifier in those functions.) No changes +are needed for the XSUBs themselves, because the \fBXS()\fR macro is +correctly defined to pass in the implicit context if needed. +.PP +The third, even more efficient way is to ape how it is done within +the Perl guts: +.PP +.Vb 4 +\& #define PERL_NO_GET_CONTEXT /* we want efficiency */ +\& #include "EXTERN.h" +\& #include "perl.h" +\& #include "XSUB.h" +\& +\& /* pTHX_ only needed for functions that call Perl API */ +\& STATIC void my_private_function(pTHX_ int arg1, int arg2); +\& +\& STATIC void +\& my_private_function(pTHX_ int arg1, int arg2) +\& { +\& /* dTHX; not needed here, because THX is an argument */ +\& ... call Perl API functions ... +\& } +\& +\& [... etc ...] +\& +\& MODULE = Foo PACKAGE = Foo +\& +\& /* typical XSUB */ +\& +\& void +\& my_xsub(arg) +\& int arg +\& CODE: +\& my_private_function(aTHX_ arg, 10); +.Ve +.PP +This implementation never has to fetch the context using a function +call, since it is always passed as an extra argument. Depending on +your needs for simplicity or efficiency, you may mix the previous +two approaches freely. +.PP +Never add a comma after \f(CW\*(C`pTHX\*(C'\fR yourself\-\-always use the form of the +macro with the underscore for functions that take explicit arguments, +or the form without the underscore for functions with no explicit arguments. +.SS "Should I do anything special if I call perl from multiple threads?" +.IX Subsection "Should I do anything special if I call perl from multiple threads?" +If you create interpreters in one thread and then proceed to call them in +another, you need to make sure perl's own Thread Local Storage (TLS) slot is +initialized correctly in each of those threads.
+.PP +The \f(CW\*(C`perl_alloc\*(C'\fR and \f(CW\*(C`perl_clone\*(C'\fR API functions will automatically set +the TLS slot to the interpreter they created, so that there is no need to do +anything special if the interpreter is always accessed in the same thread that +created it, and that thread did not create or call any other interpreters +afterwards. If that is not the case, you have to set the TLS slot of the +thread before calling any functions in the Perl API on that particular +interpreter. This is done by calling the \f(CW\*(C`PERL_SET_CONTEXT\*(C'\fR macro in that +thread as the first thing you do: +.PP +.Vb 2 +\& /* do this before doing anything else with some_perl */ +\& PERL_SET_CONTEXT(some_perl); +\& +\& ... other Perl API calls on some_perl go here ... +.Ve +.PP +(You can always get the current context via \f(CW\*(C`PERL_GET_CONTEXT\*(C'\fR.) +.SS "Future Plans and PERL_IMPLICIT_SYS" +.IX Subsection "Future Plans and PERL_IMPLICIT_SYS" +Just as MULTIPLICITY provides a way to bundle up everything +that the interpreter knows about itself and pass it around, so too are +there plans to allow the interpreter to bundle up everything it knows +about the environment it's running on. This is enabled with the +PERL_IMPLICIT_SYS macro. Currently it only works with USE_ITHREADS on +Windows. +.PP +This allows the ability to provide an extra pointer (called the "host" +environment) for all the system calls. This makes it possible for +all the system stuff to maintain their own state, broken down into +seven C structures. These are thin wrappers around the usual system +calls (see \fIwin32/perllib.c\fR) for the default perl executable, but for a +more ambitious host (like the one that would do \fBfork()\fR emulation) all +the extra work needed to pretend that different interpreters are +actually different "processes", would be done here. +.PP +The Perl engine/interpreter and the host are orthogonal entities. 
+There could be one or more interpreters in a process, and one or +more "hosts", with free association between them. +.SH "Internal Functions" +.IX Header "Internal Functions" +All of Perl's internal functions which will be exposed to the outside +world are prefixed by \f(CW\*(C`Perl_\*(C'\fR so that they will not conflict with XS +functions or functions used in a program in which Perl is embedded. +Similarly, all global variables begin with \f(CW\*(C`PL_\*(C'\fR. (By convention, +static functions start with \f(CW\*(C`S_\*(C'\fR.) +.PP +Inside the Perl core (\f(CW\*(C`PERL_CORE\*(C'\fR defined), you can get at the functions +either with or without the \f(CW\*(C`Perl_\*(C'\fR prefix, thanks to a bunch of defines +that live in \fIembed.h\fR. Note that extension code should \fInot\fR set +\&\f(CW\*(C`PERL_CORE\*(C'\fR; this exposes the full perl internals, and is likely to cause +breakage of the XS in each new perl release. +.PP +The file \fIembed.h\fR is generated automatically from +\&\fIembed.pl\fR and \fIembed.fnc\fR. \fIembed.pl\fR also creates the prototyping +header files for the internal functions, generates the documentation +and a lot of other bits and pieces. It's important that when you add +a new function to the core or change an existing one, you change the +data in the table in \fIembed.fnc\fR as well. Here's a sample entry from +that table: +.PP +.Vb 1 +\& Apd |SV** |av_fetch |AV* ar|I32 key|I32 lval +.Ve +.PP +The first column is a set of flags, the second column the return type, +the third column the name. Columns after that are the arguments. +The flags are documented at the top of \fIembed.fnc\fR. +.PP +If you edit \fIembed.pl\fR or \fIembed.fnc\fR, you will need to run +\&\f(CW\*(C`make regen_headers\*(C'\fR to force a rebuild of \fIembed.h\fR and other +auto-generated files. 
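The column layout of an \fIembed.fnc\fR entry described above (flags, return type, name, then arguments, separated by "|") can be illustrated with a small splitter. This is a hedged sketch only: sketch_split_entry and SKETCH_FIELD_LEN are hypothetical names, and the real processing is done by the regen scripts, not by code like this.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_FIELD_LEN 32

/* Split one entry on '|' into at most max_fields fields, trimming
 * surrounding whitespace and truncating fields that are too long.
 * Returns the number of fields stored. */
size_t sketch_split_entry(const char *line,
                          char fields[][SKETCH_FIELD_LEN],
                          size_t max_fields)
{
    size_t count = 0;
    const char *p = line;

    while (count < max_fields) {
        const char *bar = strchr(p, '|');
        const char *a = p;
        const char *b = bar ? bar : p + strlen(p);
        size_t len;

        while (a < b && isspace((unsigned char)*a))
            a++;
        while (b > a && isspace((unsigned char)b[-1]))
            b--;
        len = (size_t)(b - a);
        if (len >= SKETCH_FIELD_LEN)
            len = SKETCH_FIELD_LEN - 1;
        memcpy(fields[count], a, len);
        fields[count][len] = '\0';
        count++;
        if (!bar)
            break;
        p = bar + 1;
    }
    return count;
}
```

Applied to the sample entry, field 0 holds the flags, field 1 the return type, field 2 the function name, and the remaining fields one argument each.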
+.SS "Formatted Printing of IVs, UVs, and NVs" +.IX Subsection "Formatted Printing of IVs, UVs, and NVs" +If you are printing IVs, UVs, or NVs, then instead of the \fBstdio\fR\|(3) style +formatting codes like \f(CW%d\fR, \f(CW%ld\fR, \f(CW%f\fR, you should use the +following macros for portability: +.PP +.Vb 7 +\& IVdf IV in decimal +\& UVuf UV in decimal +\& UVof UV in octal +\& UVxf UV in hexadecimal +\& NVef NV %e\-like +\& NVff NV %f\-like +\& NVgf NV %g\-like +.Ve +.PP +These will take care of 64\-bit integers and long doubles. +For example: +.PP +.Vb 1 +\& printf("IV is %" IVdf "\en", iv); +.Ve +.PP +The \f(CW\*(C`IVdf\*(C'\fR will expand to whatever is the correct format for the IVs. +Note that the spaces are required around the format in case the code is +compiled with C++, to maintain compliance with its standard. +.PP +Note that there are different "long doubles": Perl will use +whatever the compiler has. +.PP +If you are printing addresses of pointers, use \f(CW%p\fR or UVxf combined +with \fBPTR2UV()\fR. +.SS "Formatted Printing of SVs" +.IX Subsection "Formatted Printing of SVs" +The contents of SVs may be printed using the \f(CW\*(C`SVf\*(C'\fR format, like so: +.PP +.Vb 1 +\& Perl_croak(aTHX_ "This croaked because: %" SVf "\en", SVfARG(err_msg)) +.Ve +.PP +where \f(CW\*(C`err_msg\*(C'\fR is an SV. +.PP +Not all scalar types are printable. Simple values certainly are: one of +IV, UV, NV, or PV. Also, if the SV is a reference to some value, +either it will be dereferenced and the value printed, or information +about the type of that value and its address are displayed. The results +of printing any other type of SV are undefined and likely to lead to an +interpreter crash. NVs are printed using a \f(CW%g\fR\-ish format. +.PP +Note that the spaces are required around the \f(CW\*(C`SVf\*(C'\fR in case the code is +compiled with C++, to maintain compliance with its standard.
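The format macros above work by being pasted between adjacent string literals. Standard C has the same idiom in \fIinttypes.h\fR, which gives a way to try the pattern without the Perl headers. This sketch is an analogy only: PRId64 and int64_t stand in for IVdf and IV, and sketch_format_i64 is a hypothetical helper.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdio.h>

/* Format a 64-bit integer using a format macro pasted between string
 * literals, with spaces around the macro as the text recommends.
 * Returns what snprintf returns: the length of the full output. */
int sketch_format_i64(char *buf, size_t buflen, int64_t value)
{
    return snprintf(buf, buflen, "IV is %" PRId64 "\n", value);
}
```

As with IVdf, the caller never writes a length modifier such as "l" or "ll"; the macro supplies whichever one matches the integer type on the current platform.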
+.PP +Note that any filehandle being printed to under UTF\-8 must be expecting +UTF\-8 in order to get good results and avoid Wide-character warnings. +One way to do this for typical filehandles is to invoke perl with the +\&\f(CW\*(C`\-C\*(C'\fR parameter. (See "\-C [number/list]" in perlrun.) +.PP +You can use this to concatenate two scalars: +.PP +.Vb 4 +\& SV *var1 = get_sv("var1", GV_ADD); +\& SV *var2 = get_sv("var2", GV_ADD); +\& SV *var3 = newSVpvf("var1=%" SVf " and var2=%" SVf, +\& SVfARG(var1), SVfARG(var2)); +.Ve +.PP +\&\f(CW\*(C`SVf_QUOTEDPREFIX\*(C'\fR is similar to \f(CW\*(C`SVf\*(C'\fR except that it restricts the +number of characters printed, showing at most the first +\&\f(CW\*(C`PERL_QUOTEDPREFIX_LEN\*(C'\fR characters of the argument, and rendering it with +double quotes and with the contents escaped using double quoted string +escaping rules. If the string is longer than this then an ellipsis "..." +will be appended after the trailing quote. This is intended for error +messages where the string is assumed to be a class name. +.PP +\&\f(CW\*(C`HvNAMEf\*(C'\fR and \f(CW\*(C`HvNAMEf_QUOTEDPREFIX\*(C'\fR are similar to \f(CW\*(C`SVf\*(C'\fR except they +extract the string, length and utf8 flags from the argument using the +\&\f(CWHvNAME()\fR, \f(CWHvNAMELEN()\fR, \f(CWHvNAMEUTF8()\fR macros. This is intended +for stringifying a class name directly from a stash HV. +.SS "Formatted Printing of Strings" +.IX Subsection "Formatted Printing of Strings" +If you just want the bytes printed in a 7\-bit NUL-terminated string, you can +just use \f(CW%s\fR (assuming they are all really only 7\-bit). But if there is a +possibility the value will be encoded as UTF\-8 or contains bytes above +\&\f(CW0x7F\fR (and therefore 8\-bit), you should instead use the \f(CW\*(C`UTF8f\*(C'\fR format.
+And as its parameter, use the \f(CWUTF8fARG()\fR macro: +.PP +.Vb 1 +\& char * msg; +\& +\& /* U+2018: \exE2\ex80\ex98 LEFT SINGLE QUOTATION MARK +\& U+2019: \exE2\ex80\ex99 RIGHT SINGLE QUOTATION MARK */ +\& if (can_utf8) +\& msg = "\exE2\ex80\ex98Uses fancy quotes\exE2\ex80\ex99"; +\& else +\& msg = "\*(AqUses simple quotes\*(Aq"; +\& +\& Perl_croak(aTHX_ "The message is: %" UTF8f "\en", +\& UTF8fARG(can_utf8, strlen(msg), msg)); +.Ve +.PP +The first parameter to \f(CW\*(C`UTF8fARG\*(C'\fR is a boolean: 1 if the string is in +UTF\-8; 0 if the string is in native byte encoding (Latin1). +The second parameter is the number of bytes in the string to print. +And the third and final parameter is a pointer to the first byte in the +string. +.PP +Note that any filehandle being printed to under UTF\-8 must be expecting +UTF\-8 in order to get good results and avoid Wide-character warnings. +One way to do this for typical filehandles is to invoke perl with the +\&\f(CW\*(C`\-C\*(C'\fR parameter. (See "\-C [number/list]" in perlrun.) +.ie n .SS "Formatted Printing of ""Size_t"" and ""SSize_t""" +.el .SS "Formatted Printing of \f(CWSize_t\fP and \f(CWSSize_t\fP" +.IX Subsection "Formatted Printing of Size_t and SSize_t" +The most general way to do this is to cast them to a UV or IV, and +print as in the +previous section. +.PP +But if you're using \f(CWPerlIO_printf()\fR, it's less typing and visual +clutter to use the \f(CW%z\fR length modifier (for \fIsiZe\fR): +.PP +.Vb 1 +\& PerlIO_printf(PerlIO_stdout(), "STRLEN is %zu\en", len); +.Ve +.PP +This modifier is not portable, so its use should be restricted to +\&\f(CWPerlIO_printf()\fR.
+.ie n .SS "Formatted Printing of ""Ptrdiff_t"", ""intmax_t"", ""short"" and other special sizes" +.el .SS "Formatted Printing of \f(CWPtrdiff_t\fP, \f(CWintmax_t\fP, \f(CWshort\fP and other special sizes" +.IX Subsection "Formatted Printing of Ptrdiff_t, intmax_t, short and other special sizes" +There are modifiers for these special situations if you are using +\&\f(CWPerlIO_printf()\fR. See "size" in perlfunc. +.SS "Pointer-To-Integer and Integer-To-Pointer" +.IX Subsection "Pointer-To-Integer and Integer-To-Pointer" +Because pointer size does not necessarily equal integer size, +use the following macros to convert between them correctly: +.PP +.Vb 4 +\& PTR2UV(pointer) +\& PTR2IV(pointer) +\& PTR2NV(pointer) +\& INT2PTR(pointertotype, integer) +.Ve +.PP +For example: +.PP +.Vb 2 +\& IV iv = ...; +\& SV *sv = INT2PTR(SV*, iv); +.Ve +.PP +and +.PP +.Vb 2 +\& AV *av = ...; +\& UV uv = PTR2UV(av); +.Ve +.PP +There are also +.PP +.Vb 2 +\& PTR2nat(pointer) /* pointer to integer of PTRSIZE */ +\& PTR2ul(pointer) /* pointer to unsigned long */ +.Ve +.PP +There is also \f(CW\*(C`PTRV\*(C'\fR, which gives the native type for an integer the same size as +pointers, such as \f(CW\*(C`unsigned\*(C'\fR or \f(CW\*(C`unsigned long\*(C'\fR. +.SS "Exception Handling" +.IX Subsection "Exception Handling" +There are a couple of macros to do very basic exception handling in XS +modules. You have to define \f(CW\*(C`NO_XSLOCKS\*(C'\fR before including \fIXSUB.h\fR to +be able to use these macros: +.PP +.Vb 2 +\& #define NO_XSLOCKS +\& #include "XSUB.h" +.Ve +.PP +You can use these macros if you call code that may croak, but you need +to do some cleanup before giving control back to Perl. For example: +.PP +.Vb 1 +\& dXCPT; /* set up necessary variables */ +\& +\& XCPT_TRY_START { +\& code_that_may_croak(); +\& } XCPT_TRY_END +\& +\& XCPT_CATCH +\& { +\& /* do cleanup here */ +\& XCPT_RETHROW; +\& } +.Ve +.PP +Note that you always have to rethrow an exception that has been +caught.
Using these macros, it is not possible to just catch the +exception and ignore it. If you have to ignore the exception, you +have to use one of the \f(CW\*(C`call_*\*(C'\fR functions. +.PP +The advantage of using the above macros is that you don't have +to set up an extra function for \f(CW\*(C`call_*\*(C'\fR, and that using these +macros is faster than using \f(CW\*(C`call_*\*(C'\fR. +.SS "Source Documentation" +.IX Subsection "Source Documentation" +There's an effort going on to document the internal functions and +automatically produce reference manuals from them \-\- perlapi is one +such manual which details all the functions which are available to XS +writers. perlintern is the autogenerated manual for the functions +which are not part of the API and are supposedly for internal use only. +.PP +Source documentation is created by putting POD comments into the C +source, like this: +.PP +.Vb 2 +\& /* +\& =for apidoc sv_setiv +\& +\& Copies an integer into the given SV. Does not handle \*(Aqset\*(Aq magic. See +\& L. +\& +\& =cut +\& */ +.Ve +.PP +Please try to supply some documentation if you add functions to the +Perl core. +.SS "Backwards compatibility" +.IX Subsection "Backwards compatibility" +The Perl API changes over time. New functions are +added or the interfaces of existing functions are +changed. The \f(CW\*(C`Devel::PPPort\*(C'\fR module tries to +provide compatibility code for some of these changes, so XS writers don't +have to code it themselves when supporting multiple versions of Perl. +.PP +\&\f(CW\*(C`Devel::PPPort\*(C'\fR generates a C header file \fIppport.h\fR that can also +be run as a Perl script. To generate \fIppport.h\fR, run: +.PP +.Vb 1 +\& perl \-MDevel::PPPort \-eDevel::PPPort::WriteFile +.Ve +.PP +Besides checking existing XS code, the script can also be used to retrieve +compatibility information for various API calls using the \f(CW\*(C`\-\-api\-info\*(C'\fR +command line switch.
For example: +.PP +.Vb 1 +\& % perl ppport.h \-\-api\-info=sv_magicext +.Ve +.PP +For details, see \f(CW\*(C`perldoc\ ppport.h\*(C'\fR. +.SH "Unicode Support" +.IX Header "Unicode Support" +Perl 5.6.0 introduced Unicode support. It's important for porters and XS +writers to understand this support and make sure that the code they +write does not corrupt Unicode data. +.SS "What \fBis\fP Unicode, anyway?" +.IX Subsection "What is Unicode, anyway?" +In the olden, less enlightened times, we all used to use ASCII. Most of +us did, anyway. The big problem with ASCII is that it's American. Well, +no, that's not actually the problem; the problem is that it's not +particularly useful for people who don't use the Roman alphabet. What +used to happen was that particular languages would stick their own +alphabet in the upper range of the sequence, between 128 and 255. Of +course, we then ended up with plenty of variants that weren't quite +ASCII, and the whole point of it being a standard was lost. +.PP +Worse still, if you've got a language like Chinese or +Japanese that has hundreds or thousands of characters, then you really +can't fit them into a mere 256, so they had to forget about ASCII +altogether, and build their own systems using pairs of numbers to refer +to one character. +.PP +To fix this, some people formed Unicode, Inc. and +produced a new character set containing all the characters you can +possibly think of and more. There are several ways of representing these +characters, and the one Perl uses is called UTF\-8. UTF\-8 uses +a variable number of bytes to represent a character. You can learn more +about Unicode and Perl's Unicode model in perlunicode. +.PP +(On EBCDIC platforms, Perl uses instead UTF-EBCDIC, which is a form of +UTF\-8 adapted for EBCDIC platforms. Below, we just talk about UTF\-8. +UTF-EBCDIC is like UTF\-8, but the details are different. 
The macros +hide the differences from you; just remember that the particular numbers +and bit patterns presented below will differ in UTF-EBCDIC.) +.SS "How can I recognise a UTF\-8 string?" +.IX Subsection "How can I recognise a UTF-8 string?" +You can't. This is because UTF\-8 data is stored in bytes just like +non\-UTF\-8 data. The Unicode character 200 (\f(CW0xC8\fR for you hex types), +capital E with a grave accent, is represented by the two bytes +\&\f(CW\*(C`v195.136\*(C'\fR. Unfortunately, the non-Unicode string \f(CW\*(C`chr(195).chr(136)\*(C'\fR +has that byte sequence as well. So you can't tell just by looking \-\- this +is what makes Unicode input an interesting problem. +.PP +In general, you either have to know what you're dealing with, or you +have to guess. The API function \f(CW\*(C`is_utf8_string\*(C'\fR can help; it'll tell +you if a string contains only valid UTF\-8 characters, and the chances +of a non\-UTF\-8 string looking like valid UTF\-8 become very small very +quickly with increasing string length. On a character-by-character +basis, \f(CW\*(C`isUTF8_CHAR\*(C'\fR +will tell you whether the current character in a string is valid UTF\-8. +.SS "How does UTF\-8 represent Unicode characters?" +.IX Subsection "How does UTF-8 represent Unicode characters?" +As mentioned above, UTF\-8 uses a variable number of bytes to store a +character. Characters with values 0...127 are stored in one +byte, just like good ol' ASCII. Character 128 is stored as +\&\f(CW\*(C`v194.128\*(C'\fR; this continues up to character 191, which is +\&\f(CW\*(C`v194.191\*(C'\fR. Now we've run out of bits (191 is binary +\&\f(CW10111111\fR) so we move on; character 192 is \f(CW\*(C`v195.128\*(C'\fR. And +so it goes on, moving to three bytes at character 2048. +"Unicode Encodings" in perlunicode has pictures of how this works.
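+.PP
+The byte values quoted above follow from simple bit arithmetic. The
+following is a minimal sketch of that arithmetic, assuming code points no
+larger than 0xFFFF and ignoring surrogates and UTF-EBCDIC. The helper
+\&\f(CW\*(C`toy_encode_utf8\*(C'\fR is hypothetical; it is not Perl's \f(CW\*(C`uvchr_to_utf8\*(C'\fR and is
+not part of the Perl API:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of the Perl API: encode code point cp
 * (at most 0xFFFF in this sketch) as UTF-8 into out[] and return the
 * number of bytes written.  Surrogates are not rejected here. */
static size_t toy_encode_utf8(uint32_t cp, unsigned char out[3])
{
    assert(cp <= 0xFFFF);
    if (cp < 0x80) {                  /* 0..127: one byte, same as ASCII */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                 /* 128..2047: two bytes */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));     /* top 5 bits */
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));   /* low 6 bits */
        return 2;
    }
    /* 2048..65535: three bytes */
    out[0] = (unsigned char)(0xE0 | (cp >> 12));            /* top 4 bits */
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));    /* mid 6 bits */
    out[2] = (unsigned char)(0x80 | (cp & 0x3F));           /* low 6 bits */
    return 3;
}
```

+.PP
+With this helper, 128 yields the bytes 194 128, 191 yields 194 191, 192
+yields 195 128, and 2048 yields 224 160 128, the first three-byte sequence.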
+.PP +Assuming you know you're dealing with a UTF\-8 string, you can find out +how long the first character in it is with the \f(CW\*(C`UTF8SKIP\*(C'\fR macro: +.PP +.Vb 2 +\& char *utf = "\e305\e233\e340\e240\e201"; +\& I32 len; +\& +\& len = UTF8SKIP(utf); /* len is 2 here */ +\& utf += len; +\& len = UTF8SKIP(utf); /* len is 3 here */ +.Ve +.PP +Another way to skip over characters in a UTF\-8 string is to use +\&\f(CW\*(C`utf8_hop\*(C'\fR, which takes a string and a number of characters to skip +over. You're on your own about bounds checking, though, so don't use it +lightly. +.PP +All bytes in a multi-byte UTF\-8 character will have the high bit set, +so you can test if you need to do something special with this +character like this (the \f(CWUTF8_IS_INVARIANT()\fR is a macro that tests +whether the byte is encoded as a single byte even in UTF\-8): +.PP +.Vb 7 +\& U8 *utf; /* Initialize this to point to the beginning of the +\& sequence to convert */ +\& U8 *utf_end; /* Initialize this to 1 beyond the end of the sequence +\& pointed to by \*(Aqutf\*(Aq */ +\& UV uv; /* Returned code point; note: a UV, not a U8, not a +\& char */ +\& STRLEN len; /* Returned length of character in bytes */ +\& +\& if (!UTF8_IS_INVARIANT(*utf)) +\& /* Must treat this as UTF\-8 */ +\& uv = utf8_to_uvchr_buf(utf, utf_end, &len); +\& else +\& /* OK to treat this character as a byte */ +\& uv = *utf; +.Ve +.PP +You can also see in that example that we use \f(CW\*(C`utf8_to_uvchr_buf\*(C'\fR to get the +value of the character; the inverse function \f(CW\*(C`uvchr_to_utf8\*(C'\fR is available +for putting a UV into UTF\-8: +.PP +.Vb 6 +\& if (!UVCHR_IS_INVARIANT(uv)) +\& /* Must treat this as UTF8 */ +\& utf8 = uvchr_to_utf8(utf8, uv); +\& else +\& /* OK to treat this character as a byte */ +\& *utf8++ = uv; +.Ve +.PP +You \fBmust\fR convert characters to UVs using the above functions if +you're ever in a situation where you have to match UTF\-8 and non\-UTF\-8 +characters. 
You may not skip over UTF\-8 characters in this case. If you +do this, you'll lose the ability to match high-bit non\-UTF\-8 characters; +for instance, if your UTF\-8 string contains \f(CW\*(C`v195.136\*(C'\fR, and you skip +that character, you can never match a \f(CWchr(200)\fR in a non\-UTF\-8 string. +So don't do that! +.PP +(Note that we don't have to test for invariant characters in the +examples above. The functions work on any well-formed UTF\-8 input. +It's just that it's faster to avoid the function overhead when it's not +needed.) +.SS "How does Perl store UTF\-8 strings?" +.IX Subsection "How does Perl store UTF-8 strings?" +Currently, Perl deals with UTF\-8 strings and non\-UTF\-8 strings +slightly differently. A flag in the SV, \f(CW\*(C`SVf_UTF8\*(C'\fR, indicates that the +string is internally encoded as UTF\-8. Without it, the byte value is the +codepoint number and vice versa. This flag is only meaningful if the SV +is \f(CW\*(C`SvPOK\*(C'\fR or immediately after stringification via \f(CW\*(C`SvPV\*(C'\fR or a +similar macro. You can check and manipulate this flag with the +following macros: +.PP +.Vb 3 +\& SvUTF8(sv) +\& SvUTF8_on(sv) +\& SvUTF8_off(sv) +.Ve +.PP +This flag has an important effect on Perl's treatment of the string: if +UTF\-8 data is not properly distinguished, regular expressions, +\&\f(CW\*(C`length\*(C'\fR, \f(CW\*(C`substr\*(C'\fR and other string handling operations will have +undesirable (wrong) results. +.PP +The problem comes when you have, for instance, a string that isn't +flagged as UTF\-8, and contains a byte sequence that could be UTF\-8 \-\- +especially when combining non\-UTF\-8 and UTF\-8 strings. +.PP +Never forget that the \f(CW\*(C`SVf_UTF8\*(C'\fR flag is separate from the PV value; you +need to be sure you don't accidentally knock it off while you're +manipulating SVs.
More specifically, you cannot expect to do this: +.PP +.Vb 4 +\& SV *sv; +\& SV *nsv; +\& STRLEN len; +\& char *p; +\& +\& p = SvPV(sv, len); +\& frobnicate(p); +\& nsv = newSVpvn(p, len); +.Ve +.PP +The \f(CW\*(C`char*\*(C'\fR string does not tell you the whole story, and you can't +copy or reconstruct an SV just by copying the string value. Check if the +old SV has the UTF8 flag set (\fIafter\fR the \f(CW\*(C`SvPV\*(C'\fR call), and act +accordingly: +.PP +.Vb 6 +\& p = SvPV(sv, len); +\& is_utf8 = SvUTF8(sv); +\& frobnicate(p, is_utf8); +\& nsv = newSVpvn(p, len); +\& if (is_utf8) +\& SvUTF8_on(nsv); +.Ve +.PP +In the above, your \f(CW\*(C`frobnicate\*(C'\fR function has been changed to be made +aware of whether or not it's dealing with UTF\-8 data, so that it can +handle the string appropriately. +.PP +Since just passing an SV to an XS function and copying the data of +the SV is not enough to copy the UTF8 flags, even less right is just +passing a \f(CW\*(C`char\ *\*(C'\fR to an XS function. +.PP +For full generality, use the \f(CW\*(C`DO_UTF8\*(C'\fR macro to see if the +string in an SV is to be \fItreated\fR as UTF\-8. This takes into account +if the call to the XS function is being made from within the scope of +\&\f(CW\*(C`use\ bytes\*(C'\fR. If so, the underlying bytes that comprise the +UTF\-8 string are to be exposed, rather than the character they +represent. But this pragma should only really be used for debugging and +perhaps low-level testing at the byte level. Hence most XS code need +not concern itself with this, but various areas of the perl core do need +to support it. +.PP +And this isn't the whole story. Starting in Perl v5.12, strings that +aren't encoded in UTF\-8 may also be treated as Unicode under various +conditions (see "ASCII Rules versus Unicode Rules" in perlunicode). 
+This is only really a problem for characters whose ordinals are between +128 and 255, and their behavior varies under ASCII versus Unicode rules +in ways that your code cares about (see "The "Unicode Bug"" in perlunicode). +There is no published API for dealing with this, as it is subject to +change, but you can look at the code for \f(CW\*(C`pp_lc\*(C'\fR in \fIpp.c\fR for an +example as to how it's currently done. +.SS "How do I pass a Perl string to a C library?" +.IX Subsection "How do I pass a Perl string to a C library?" +A Perl string, conceptually, is an opaque sequence of code points. +Many C libraries expect their inputs to be "classical" C strings, which are +arrays of octets 1\-255, terminated with a NUL byte. Your job when writing +an interface between Perl and a C library is to define the mapping between +Perl and that library. +.PP +Generally speaking, \f(CW\*(C`SvPVbyte\*(C'\fR and related macros suit this task well. +These assume that your Perl string is a "byte string", i.e., is either +raw, undecoded input into Perl or is pre-encoded to, e.g., UTF\-8. +.PP +Alternatively, if your C library expects UTF\-8 text, you can use +\&\f(CW\*(C`SvPVutf8\*(C'\fR and related macros. This has the same effect as encoding +to UTF\-8 then calling the corresponding \f(CW\*(C`SvPVbyte\*(C'\fR\-related macro. +.PP +Some C libraries may expect other encodings (e.g., UTF\-16LE). To give +Perl strings to such libraries +you must either do that encoding in Perl then use \f(CW\*(C`SvPVbyte\*(C'\fR, or +use an intermediary C library to convert from however Perl stores the +string to the desired encoding. +.PP +Take care also that NULs in your Perl string don't confuse the C +library. If possible, give the string's length to the C library; if that's +not possible, consider rejecting strings that contain NUL bytes. +.PP +\fIWhat about \fR\f(CI\*(C`SvPV\*(C'\fR\fI, \fR\f(CI\*(C`SvPV_nolen\*(C'\fR\fI, etc.?\fR +.IX Subsection "What about SvPV, SvPV_nolen, etc.?" 
+.PP +Consider a 3\-character Perl string \f(CW\*(C`$foo = "\ex64\ex78\ex8c"\*(C'\fR. +Perl can store these 3 characters either of two ways: +.IP \(bu 4 +bytes: 0x64 0x78 0x8c +.IP \(bu 4 +UTF\-8: 0x64 0x78 0xc2 0x8c +.PP +Now let's say you convert \f(CW$foo\fR to a C string thus: +.PP +.Vb 2 +\& STRLEN strlen; +\& char *str = SvPV(foo_sv, strlen); +.Ve +.PP +At this point \f(CW\*(C`str\*(C'\fR could point to a 3\-byte C string or a 4\-byte one. +.PP +Generally speaking, we want \f(CW\*(C`str\*(C'\fR to be the same regardless of how +Perl stores \f(CW$foo\fR, so the ambiguity here is undesirable. \f(CW\*(C`SvPVbyte\*(C'\fR +and \f(CW\*(C`SvPVutf8\*(C'\fR solve that by giving predictable output: use +\&\f(CW\*(C`SvPVbyte\*(C'\fR if your C library expects byte strings, or \f(CW\*(C`SvPVutf8\*(C'\fR +if it expects UTF\-8. +.PP +If your C library happens to support both encodings, then \f(CW\*(C`SvPV\*(C'\fR\-\-always +in tandem with lookups to \f(CW\*(C`SvUTF8\*(C'\fR!\-\-may be safe and (slightly) more +efficient. +.PP +\&\fBTESTING\fR \fBTIP:\fR Use utf8's \f(CW\*(C`upgrade\*(C'\fR and \f(CW\*(C`downgrade\*(C'\fR functions +in your tests to ensure consistent handling regardless of Perl's +internal encoding. +.SS "How do I convert a string to UTF\-8?" +.IX Subsection "How do I convert a string to UTF-8?" +If you're mixing UTF\-8 and non\-UTF\-8 strings, it is necessary to upgrade +the non\-UTF\-8 strings to UTF\-8. If you've got an SV, the easiest way to do +this is: +.PP +.Vb 1 +\& sv_utf8_upgrade(sv); +.Ve +.PP +However, you must not do this, for example: +.PP +.Vb 2 +\& if (!SvUTF8(left)) +\& sv_utf8_upgrade(left); +.Ve +.PP +If you do this in a binary operator, you will actually change one of the +strings that came into the operator, and, while it shouldn't be noticeable +by the end user, it can cause problems in deficient code. +.PP +Instead, \f(CW\*(C`bytes_to_utf8\*(C'\fR will give you a UTF\-8\-encoded \fBcopy\fR of its +string argument. 
This is useful for having the data available for +comparisons and so on, without harming the original SV. There's also +\&\f(CW\*(C`utf8_to_bytes\*(C'\fR to go the other way, but naturally, this will fail if +the string contains any characters above 255 that can't be represented +in a single byte. +.SS "How do I compare strings?" +.IX Subsection "How do I compare strings?" +"sv_cmp" in perlapi and "sv_cmp_flags" in perlapi do a lexicographic +comparison of two SVs, and handle UTF\-8ness properly. Note, however, +that Unicode specifies a much fancier mechanism for collation, available +via the Unicode::Collate module. +.PP +To just compare two strings for equality/non\-equality, you can just use +\&\f(CWmemEQ()\fR and \f(CWmemNE()\fR as usual, +except that the strings must either both be UTF\-8 encoded or both not be. +.PP +To compare two strings case-insensitively, use +\&\f(CWfoldEQ_utf8()\fR (the strings don't have to have +the same UTF\-8ness). +.SS "Is there anything else I need to know?" +.IX Subsection "Is there anything else I need to know?" +Not really. Just remember these things: +.IP \(bu 3 +There's no way to tell if a \f(CW\*(C`char\ *\*(C'\fR or \f(CW\*(C`U8\ *\*(C'\fR string is UTF\-8 +or not. But you can tell if an SV is to be treated as UTF\-8 by calling +\&\f(CW\*(C`DO_UTF8\*(C'\fR on it, after stringifying it with \f(CW\*(C`SvPV\*(C'\fR or a similar +macro. And, you can tell if an SV is actually UTF\-8 (even if it is not to +be treated as such) by looking at its \f(CW\*(C`SvUTF8\*(C'\fR flag (again after +stringifying it). Don't forget to set the flag if something should be +UTF\-8. +Treat the flag as part of the PV, even though it's not \-\- if you pass on +the PV to somewhere, pass on the flag too. +.IP \(bu 3 +If a string is UTF\-8, \fBalways\fR use \f(CW\*(C`utf8_to_uvchr_buf\*(C'\fR to get at the value, +unless \f(CWUTF8_IS_INVARIANT(*s)\fR in which case you can use \f(CW*s\fR.
+.IP \(bu 3 +When writing a character UV to a UTF\-8 string, \fBalways\fR use +\&\f(CW\*(C`uvchr_to_utf8\*(C'\fR, unless \f(CW\*(C`UVCHR_IS_INVARIANT(uv)\*(C'\fR in which case +you can use \f(CW\*(C`*s = uv\*(C'\fR. +.IP \(bu 3 +Mixing UTF\-8 and non\-UTF\-8 strings is +tricky. Use \f(CW\*(C`bytes_to_utf8\*(C'\fR to get +a new string which is UTF\-8 encoded, and then combine them. +.SH "Custom Operators" +.IX Header "Custom Operators" +Custom operator support is an experimental feature that allows you to +define your own ops. This is primarily to allow the building of +interpreters for other languages in the Perl core, but it also allows +optimizations through the creation of "macro-ops" (ops which perform the +functions of multiple ops which are usually executed together, such as +\&\f(CW\*(C`gvsv, gvsv, add\*(C'\fR). +.PP +This feature is implemented as a new op type, \f(CW\*(C`OP_CUSTOM\*(C'\fR. The Perl +core does not "know" anything special about this op type, and so it will +not be involved in any optimizations. This also means that you can +define your custom ops to be any op structure \-\- unary, binary, list and +so on \-\- you like. +.PP +It's important to know what custom operators won't do for you. They +won't let you add new syntax to Perl, directly. They won't even let you +add new keywords, directly. In fact, they won't change the way Perl +compiles a program at all. You have to make those changes yourself, after +Perl has compiled the program. You do this either by manipulating the op +tree using a \f(CW\*(C`CHECK\*(C'\fR block and the \f(CW\*(C`B::Generate\*(C'\fR module, or by adding +a custom peephole optimizer with the \f(CW\*(C`optimize\*(C'\fR module. +.PP +When you do this, you replace ordinary Perl ops with custom ops by +creating ops with the type \f(CW\*(C`OP_CUSTOM\*(C'\fR and the \f(CW\*(C`op_ppaddr\*(C'\fR of your own +PP function. This should be defined in XS code, and should look like +the PP ops in \f(CW\*(C`pp_*.c\*(C'\fR.
You are responsible for ensuring that your op +takes the appropriate number of values from the stack, and you are +responsible for adding stack marks if necessary. +.PP +You should also "register" your op with the Perl interpreter so that it +can produce sensible error and warning messages. Since it is possible to +have multiple custom ops within the one "logical" op type \f(CW\*(C`OP_CUSTOM\*(C'\fR, +Perl uses the value of \f(CW\*(C`o\->op_ppaddr\*(C'\fR to determine which custom op +it is dealing with. You should create an \f(CW\*(C`XOP\*(C'\fR structure for each +ppaddr you use, set the properties of the custom op with +\&\f(CW\*(C`XopENTRY_set\*(C'\fR, and register the structure against the ppaddr using +\&\f(CW\*(C`Perl_custom_op_register\*(C'\fR. A trivial example might look like: +.PP +.Vb 2 +\& static XOP my_xop; +\& static OP *my_pp(pTHX); +\& +\& BOOT: +\& XopENTRY_set(&my_xop, xop_name, "myxop"); +\& XopENTRY_set(&my_xop, xop_desc, "Useless custom op"); +\& Perl_custom_op_register(aTHX_ my_pp, &my_xop); +.Ve +.PP +The available fields in the structure are: +.IP xop_name 4 +.IX Item "xop_name" +A short name for your op. This will be included in some error messages, +and will also be returned as \f(CW\*(C`$op\->name\*(C'\fR by the B module, so +it will appear in the output of modules like B::Concise. +.IP xop_desc 4 +.IX Item "xop_desc" +A short description of the function of the op. +.IP xop_class 4 +.IX Item "xop_class" +Which of the various \f(CW*OP\fR structures this op uses.
This should be one of +the \f(CW\*(C`OA_*\*(C'\fR constants from \fIop.h\fR, namely +.RS 4 +.IP OA_BASEOP 4 +.IX Item "OA_BASEOP" +.PD 0 +.IP OA_UNOP 4 +.IX Item "OA_UNOP" +.IP OA_BINOP 4 +.IX Item "OA_BINOP" +.IP OA_LOGOP 4 +.IX Item "OA_LOGOP" +.IP OA_LISTOP 4 +.IX Item "OA_LISTOP" +.IP OA_PMOP 4 +.IX Item "OA_PMOP" +.IP OA_SVOP 4 +.IX Item "OA_SVOP" +.IP OA_PADOP 4 +.IX Item "OA_PADOP" +.IP OA_PVOP_OR_SVOP 4 +.IX Item "OA_PVOP_OR_SVOP" +.PD +This should be interpreted as '\f(CW\*(C`PVOP\*(C'\fR' only. The \f(CW\*(C`_OR_SVOP\*(C'\fR is because +the only core \f(CW\*(C`PVOP\*(C'\fR, \f(CW\*(C`OP_TRANS\*(C'\fR, can sometimes be a \f(CW\*(C`SVOP\*(C'\fR instead. +.IP OA_LOOP 4 +.IX Item "OA_LOOP" +.PD 0 +.IP OA_COP 4 +.IX Item "OA_COP" +.RE +.RS 4 +.PD +.Sp +The other \f(CW\*(C`OA_*\*(C'\fR constants should not be used. +.RE +.IP xop_peep 4 +.IX Item "xop_peep" +This member is of type \f(CW\*(C`Perl_cpeep_t\*(C'\fR, which expands to \f(CW\*(C`void +(*Perl_cpeep_t)(aTHX_ OP *o, OP *oldop)\*(C'\fR. If it is set, this function +will be called from \f(CW\*(C`Perl_rpeep\*(C'\fR when ops of this type are encountered +by the peephole optimizer. \fIo\fR is the OP that needs optimizing; +\&\fIoldop\fR is the previous OP optimized, whose \f(CW\*(C`op_next\*(C'\fR points to \fIo\fR. +.PP +\&\f(CW\*(C`B::Generate\*(C'\fR directly supports the creation of custom ops by name. +.SH Stacks +.IX Header "Stacks" +Descriptions above occasionally refer to "the stack", but there are in fact +many stack-like data structures within the perl interpreter. When otherwise +unqualified, "the stack" usually refers to the value stack. +.PP +The various stacks have different purposes, and operate in slightly different +ways. Their differences are noted below. +.SS "Value Stack" +.IX Subsection "Value Stack" +This stack stores the values that regular perl code is operating on, usually +intermediate values of expressions within a statement. The stack itself is +formed of an array of SV pointers. 
+.PP +The base of this stack is pointed to by the interpreter variable +\&\f(CW\*(C`PL_stack_base\*(C'\fR, of type \f(CW\*(C`SV **\*(C'\fR. +.PP +The head of the stack is \f(CW\*(C`PL_stack_sp\*(C'\fR, and points to the most +recently-pushed item. +.PP +Items are pushed to the stack by using the \f(CWPUSHs()\fR macro or its variants +described above; \f(CWXPUSHs()\fR, \f(CWmPUSHs()\fR, \f(CWmXPUSHs()\fR and the typed +versions. Note carefully that the non\-\f(CW\*(C`X\*(C'\fR versions of these macros do not +check the size of the stack and assume it to be big enough. These must be +paired with a suitable check of the stack's size, such as the \f(CW\*(C`EXTEND\*(C'\fR macro +to ensure it is large enough. For example +.PP +.Vb 5 +\& EXTEND(SP, 4); +\& mPUSHi(10); +\& mPUSHi(20); +\& mPUSHi(30); +\& mPUSHi(40); +.Ve +.PP +This is slightly more performant than making four separate checks in four +separate \f(CWmXPUSHi()\fR calls. +.PP +As a further performance optimisation, the various \f(CW\*(C`PUSH\*(C'\fR macros all operate +using a local variable \f(CW\*(C`SP\*(C'\fR, rather than the interpreter-global variable +\&\f(CW\*(C`PL_stack_sp\*(C'\fR. This variable is declared by the \f(CW\*(C`dSP\*(C'\fR macro \- though it is +normally implied by XSUBs and similar so it is rare you have to consider it +directly. Once declared, the \f(CW\*(C`PUSH\*(C'\fR macros will operate only on this local +variable, so before invoking any other perl core functions you must use the +\&\f(CW\*(C`PUTBACK\*(C'\fR macro to return the value from the local \f(CW\*(C`SP\*(C'\fR variable back to +the interpreter variable. Similarly, after calling a perl core function which +may have had reason to move the stack or push/pop values to it, you must use +the \f(CW\*(C`SPAGAIN\*(C'\fR macro which refreshes the local \f(CW\*(C`SP\*(C'\fR value back from the +interpreter one. 
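+.PP
+The interplay between the interpreter-wide stack pointer and the local copy
+can be pictured with a toy model. This is only a sketch: every \f(CW\*(C`toy_\*(C'\fR name
+below is hypothetical, the stack holds plain integers rather than SV
+pointers, and \f(CW\*(C`toy_extend\*(C'\fR returns the refreshed pointer where the real
+\&\f(CW\*(C`EXTEND\*(C'\fR macro updates \f(CW\*(C`SP\*(C'\fR in place:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy model only: these names mimic, but are not, the interpreter's own. */
static int   *toy_stack_base;   /* plays the role of PL_stack_base */
static int   *toy_stack_sp;     /* plays the role of PL_stack_sp   */
static size_t toy_stack_cap;    /* number of allocated slots       */

/* Allocate a tiny stack.  Slot 0 is never used, so an empty stack has
 * sp == base and sp always points at the most recently pushed item. */
static void toy_init(void)
{
    toy_stack_cap  = 2;
    toy_stack_base = malloc(toy_stack_cap * sizeof *toy_stack_base);
    assert(toy_stack_base != NULL);
    toy_stack_sp = toy_stack_base;
}

/* In the spirit of EXTEND: guarantee room for n more items above sp.
 * The array may move, so the (possibly relocated) pointer is returned. */
static int *toy_extend(int *sp, size_t n)
{
    size_t top = (size_t)(sp - toy_stack_base);   /* index of top item */
    if (top + n >= toy_stack_cap) {
        size_t cap  = top + n + 1;
        int   *base = realloc(toy_stack_base, cap * sizeof *base);
        assert(base != NULL);
        toy_stack_base = base;
        toy_stack_cap  = cap;
    }
    return toy_stack_base + top;
}

/* One size check, four unchecked pushes through a local pointer, then a
 * write-back of the local pointer to the shared one. */
static void toy_push_four(void)
{
    int *sp = toy_stack_sp;     /* like dSP: take a local copy          */
    sp = toy_extend(sp, 4);     /* like EXTEND(SP, 4)                   */
    *++sp = 10;                 /* like the non-X pushes: no size check */
    *++sp = 20;
    *++sp = 30;
    *++sp = 40;
    toy_stack_sp = sp;          /* like PUTBACK: publish the local copy */
}
```

+.PP
+In this model, forgetting the final assignment would leave \f(CW\*(C`toy_stack_sp\*(C'\fR
+pointing below the new items, which is the situation \f(CW\*(C`PUTBACK\*(C'\fR exists to
+prevent. Re-reading \f(CW\*(C`toy_stack_sp\*(C'\fR after calling something that may have
+moved the array corresponds to \f(CW\*(C`SPAGAIN\*(C'\fR.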
+.PP +Items are popped from the stack by using the \f(CW\*(C`POPs\*(C'\fR macro or its typed +versions. There is also a macro \f(CW\*(C`TOPs\*(C'\fR that inspects the topmost item without +removing it. +.PP +Note specifically that SV pointers on the value stack do not contribute to the +overall reference count of the xVs being referred to. If newly-created xVs are +being pushed to the stack you must arrange for them to be destroyed at a +suitable time, usually by using one of the \f(CW\*(C`mPUSH*\*(C'\fR macros or \f(CWsv_2mortal()\fR +to mortalise the xV. +.SS "Mark Stack" +.IX Subsection "Mark Stack" +The value stack stores individual perl scalar values as temporaries between +expressions. Some perl expressions operate on entire lists; for that purpose +we need to know where on the stack each list begins. This is the purpose of the +mark stack. +.PP +The mark stack stores integers as I32 values, which are the height of the +value stack at the time before the list began; thus the mark itself actually +points to the value stack entry one before the list. The list itself starts at +\&\f(CW\*(C`mark + 1\*(C'\fR. +.PP +The base of this stack is pointed to by the interpreter variable +\&\f(CW\*(C`PL_markstack\*(C'\fR, of type \f(CW\*(C`I32 *\*(C'\fR. +.PP +The head of the stack is \f(CW\*(C`PL_markstack_ptr\*(C'\fR, and points to the most +recently-pushed item. +.PP +Items are pushed to the stack by using the \f(CWPUSHMARK()\fR macro. Even though +the stack itself stores (value) stack indices as integers, the \f(CW\*(C`PUSHMARK\*(C'\fR +macro should be given a stack pointer directly; it will calculate the index +offset by comparing to the \f(CW\*(C`PL_stack_base\*(C'\fR variable. Thus almost always the +code to perform this is +.PP +.Vb 1 +\& PUSHMARK(SP); +.Ve +.PP +Items are popped from the stack by the \f(CW\*(C`POPMARK\*(C'\fR macro. There is also a macro +\&\f(CW\*(C`TOPMARK\*(C'\fR that inspects the topmost item without removing it.
These macros +return I32 index values directly. There is also the \f(CW\*(C`dMARK\*(C'\fR macro which +declares a new SV double-pointer variable, called \f(CW\*(C`mark\*(C'\fR, which points at the +marked stack slot; this is the usual macro that C code will use when operating +on lists given on the stack. +.PP +As noted above, the \f(CW\*(C`mark\*(C'\fR variable itself will point at the most recently +pushed value on the value stack before the list begins, and so the list itself +starts at \f(CW\*(C`mark + 1\*(C'\fR. The values of the list may be iterated by code such as +.PP +.Vb 4 +\& for(SV **svp = mark + 1; svp <= PL_stack_sp; svp++) { +\& SV *item = *svp; +\& ... +\& } +.Ve +.PP +Note specifically that in the case where the list is empty, \f(CW\*(C`mark\*(C'\fR will +equal \f(CW\*(C`PL_stack_sp\*(C'\fR. +.PP +Because the \f(CW\*(C`mark\*(C'\fR variable is converted to a pointer on the value stack, +extra care must be taken if \f(CW\*(C`EXTEND\*(C'\fR or any of the \f(CW\*(C`XPUSH\*(C'\fR macros are +invoked within the function, because the stack may need to be moved to +extend it and so the existing pointer will now be invalid. If this may be a +problem, a possible solution is to track the mark offset as an integer and +recompute the mark pointer later on, after the stack has been moved. +.PP +.Vb 1 +\& I32 markoff = POPMARK; +\& +\& ... +\& +\& SV **mark = PL_stack_base + markoff; +.Ve +.SS "Temporaries Stack" +.IX Subsection "Temporaries Stack" +As noted above, xV references on the main value stack do not contribute to the +reference count of an xV, and so another mechanism is used to track when +temporary values which live on the stack must be released. This is the job of +the temporaries stack. +.PP +The temporaries stack stores pointers to xVs whose reference counts will be +decremented soon. +.PP +The base of this stack is pointed to by the interpreter variable +\&\f(CW\*(C`PL_tmps_stack\*(C'\fR, of type \f(CW\*(C`SV **\*(C'\fR.
+.PP +The head of the stack is indexed by \f(CW\*(C`PL_tmps_ix\*(C'\fR, an integer which stores the +index in the array of the most recently-pushed item. +.PP +There is no public API to directly push items to the temporaries stack. Instead, +the API function \f(CWsv_2mortal()\fR is used to mortalize an xV, adding its +address to the temporaries stack. +.PP +Likewise, there is no public API to read values from the temporaries stack. +Instead, the macros \f(CW\*(C`SAVETMPS\*(C'\fR and \f(CW\*(C`FREETMPS\*(C'\fR are used. The \f(CW\*(C`SAVETMPS\*(C'\fR +macro establishes the base levels of the temporaries stack, by capturing the +current value of \f(CW\*(C`PL_tmps_ix\*(C'\fR into \f(CW\*(C`PL_tmps_floor\*(C'\fR and saving the previous +value to the save stack. Thereafter, whenever \f(CW\*(C`FREETMPS\*(C'\fR is invoked all of +the temporaries that have been pushed since that level are reclaimed. +.PP +While it is common to see these two macros in pairs within an \f(CW\*(C`ENTER\*(C'\fR/ +\&\f(CW\*(C`LEAVE\*(C'\fR pair, it is not necessary to match them. It is permitted to invoke +\&\f(CW\*(C`FREETMPS\*(C'\fR multiple times since the most recent \f(CW\*(C`SAVETMPS\*(C'\fR; for example in a +loop iterating over elements of a list. While you can invoke \f(CW\*(C`SAVETMPS\*(C'\fR +multiple times within a scope pair, it is unlikely to be useful. Subsequent +invocations will move the temporaries floor further up, thus effectively +trapping the existing temporaries to only be released at the end of the scope. +.SS "Save Stack" +.IX Subsection "Save Stack" +The save stack is used by perl to implement the \f(CW\*(C`local\*(C'\fR keyword and other +similar behaviours; any cleanup operations that need to be performed when +leaving the current scope. 
Items pushed to this stack generally capture the +current value of some internal variable or state, which will be restored when +the scope is unwound due to leaving, \f(CW\*(C`return\*(C'\fR, \f(CW\*(C`die\*(C'\fR, \f(CW\*(C`goto\*(C'\fR or other +reasons. +.PP +Whereas other perl internal stacks store individual items all of the same type +(usually SV pointers or integers), the items pushed to the save stack are +formed of many different types, having multiple fields to them. For example, +the \f(CW\*(C`SAVEt_INT\*(C'\fR type needs to store both the address of the \f(CW\*(C`int\*(C'\fR variable +to restore, and the value to restore it to. This information could have been +stored using fields of a \f(CW\*(C`struct\*(C'\fR, but would have to be large enough to store +three pointers in the largest case, which would waste a lot of space in most +of the smaller cases. +.PP +Instead, the stack stores information in a variable-length encoding of \f(CW\*(C`ANY\*(C'\fR +structures. The final value pushed is stored in the \f(CW\*(C`UV\*(C'\fR field which encodes +the kind of item held by the preceding items; the count and types of which +will depend on what kind of item is being stored. The kind field is pushed +last because that will be the first field to be popped when unwinding items +from the stack. +.PP +The base of this stack is pointed to by the interpreter variable +\&\f(CW\*(C`PL_savestack\*(C'\fR, of type \f(CW\*(C`ANY *\*(C'\fR. +.PP +The head of the stack is indexed by \f(CW\*(C`PL_savestack_ix\*(C'\fR, an integer which +stores the index in the array at which the next item should be pushed. (Note +that this is different to most other stacks, which reference the most +recently-pushed item). +.PP +Items are pushed to the save stack by using the various \f(CW\*(C`SAVE...()\*(C'\fR macros. +Many of these macros take a variable and store both its address and current +value on the save stack, ensuring that value gets restored on scope exit. 
+.PP +.Vb 5 +\& SAVEI8(i8) +\& SAVEI16(i16) +\& SAVEI32(i32) +\& SAVEINT(i) +\& ... +.Ve +.PP +There are also a variety of other special-purpose macros which save particular +types or values of interest. \f(CW\*(C`SAVETMPS\*(C'\fR has already been mentioned above. +Others include \f(CW\*(C`SAVEFREEPV\*(C'\fR which arranges for a PV (i.e. a string buffer) to +be freed, or \f(CW\*(C`SAVEDESTRUCTOR\*(C'\fR which arranges for a given function pointer to +be invoked on scope exit. A full list of such macros can be found in +\&\fIscope.h\fR. +.PP +There is no public API for popping individual values or items from the save +stack. Instead, via the scope stack, the \f(CW\*(C`ENTER\*(C'\fR and \f(CW\*(C`LEAVE\*(C'\fR pair form a way +to start and stop nested scopes. Leaving a nested scope via \f(CW\*(C`LEAVE\*(C'\fR will +restore all of the saved values that had been pushed since the most recent +\&\f(CW\*(C`ENTER\*(C'\fR. +.SS "Scope Stack" +.IX Subsection "Scope Stack" +As with the mark stack to the value stack, the scope stack forms a pair with +the save stack. The scope stack stores the height of the save stack at which +nested scopes begin, and allows the save stack to be unwound back to that +point when the scope is left. +.PP +When perl is built with debugging enabled, there is a second part to this +stack storing human-readable string names describing the type of stack +context. Each push operation saves the name as well as the height of the save +stack, and each pop operation checks the topmost name with what is expected, +causing an assertion failure if the name does not match. +.PP +The base of this stack is pointed to by the interpreter variable +\&\f(CW\*(C`PL_scopestack\*(C'\fR, of type \f(CW\*(C`I32 *\*(C'\fR. If enabled, the scope stack names are +stored in a separate array pointed to by \f(CW\*(C`PL_scopestack_name\*(C'\fR, of type +\&\f(CW\*(C`const char **\*(C'\fR. 
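The way the scope stack records save stack heights can be pictured with a small self-contained C model. This is only a sketch under simplifying assumptions: the toy_* names are hypothetical, and each saved entry is a fixed-size struct holding an int's address and old value, whereas the real save stack uses the variable-length ANY encoding described earlier.

```c
#include <assert.h>

/* Toy model only; every toy_* name is hypothetical. */
typedef struct {
    int *addr;                     /* variable to restore */
    int  old;                      /* value to restore it to */
} toy_save_entry;

static toy_save_entry toy_savestack[16];
static int toy_savestack_ix = 0;   /* index at which the NEXT entry is pushed */
static int toy_scopestack[8];      /* saved heights of the save stack */
static int toy_scopestack_ix = 0;  /* likewise: index of the next free slot */

/* Like ENTER: remember how high the save stack is right now. */
static void toy_enter(void)
{
    assert(toy_scopestack_ix < 8);
    toy_scopestack[toy_scopestack_ix++] = toy_savestack_ix;
}

/* Like SAVEINT(i): record the address and the current value. */
static void toy_saveint(int *p)
{
    assert(toy_savestack_ix < 16);
    toy_savestack[toy_savestack_ix].addr = p;
    toy_savestack[toy_savestack_ix].old  = *p;
    toy_savestack_ix++;
}

/* Like LEAVE: unwind the save stack back to the height recorded by the
 * matching toy_enter(), restoring values in reverse order. */
static void toy_leave(void)
{
    int height;

    assert(toy_scopestack_ix > 0);
    height = toy_scopestack[--toy_scopestack_ix];
    while (toy_savestack_ix > height) {
        toy_save_entry *e = &toy_savestack[--toy_savestack_ix];
        *e->addr = e->old;
    }
}

/* Two nested scopes, each localising the same variable. */
static int toy_demo(void)
{
    int x = 1;
    int inner;

    toy_enter();
    toy_saveint(&x);
    x = 2;
    toy_enter();
    toy_saveint(&x);
    x = 3;
    toy_leave();                   /* inner scope ends: x is 2 again */
    inner = x;
    toy_leave();                   /* outer scope ends: x is 1 again */
    return inner * 10 + x;
}
```

Both indices in the model point at the next free slot, mirroring the convention the text gives for PL_savestack_ix and PL_scopestack_ix.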
+.PP +The head of the stack is indexed by \f(CW\*(C`PL_scopestack_ix\*(C'\fR, an integer which +stores the index of the array or arrays at which the next item should be +pushed. (Note that this is different to most other stacks, which reference the +most recently-pushed item). +.PP +Values are pushed to the scope stack using the \f(CW\*(C`ENTER\*(C'\fR macro, which begins a +new nested scope. Any items pushed to the save stack are then restored at the +next nested invocation of the \f(CW\*(C`LEAVE\*(C'\fR macro. +.SH "Dynamic Scope and the Context Stack" +.IX Header "Dynamic Scope and the Context Stack" +\&\fBNote:\fR this section describes a non-public internal API that is subject +to change without notice. +.SS "Introduction to the context stack" +.IX Subsection "Introduction to the context stack" +In Perl, dynamic scoping refers to the runtime nesting of things like +subroutine calls, evals etc, as well as the entering and exiting of block +scopes. For example, the restoring of a \f(CW\*(C`local\*(C'\fRised variable is +determined by the dynamic scope. +.PP +Perl tracks the dynamic scope by a data structure called the context +stack, which is an array of \f(CW\*(C`PERL_CONTEXT\*(C'\fR structures, and which is +itself a big union for all the types of context. Whenever a new scope is +entered (such as a block, a \f(CW\*(C`for\*(C'\fR loop, or a subroutine call), a new +context entry is pushed onto the stack. Similarly when leaving a block or +returning from a subroutine call etc. a context is popped. Since the +context stack represents the current dynamic scope, it can be searched. +For example, \f(CW\*(C`next LABEL\*(C'\fR searches back through the stack looking for a +loop context that matches the label; \f(CW\*(C`return\*(C'\fR pops contexts until it +finds a sub or eval context or similar; \f(CW\*(C`caller\*(C'\fR examines sub contexts on +the stack. +.PP +Each context entry is labelled with a context type, \f(CW\*(C`cx_type\*(C'\fR. 
Typical +context types are \f(CW\*(C`CXt_SUB\*(C'\fR, \f(CW\*(C`CXt_EVAL\*(C'\fR etc., as well as \f(CW\*(C`CXt_BLOCK\*(C'\fR +and \f(CW\*(C`CXt_NULL\*(C'\fR which represent a basic scope (as pushed by \f(CW\*(C`pp_enter\*(C'\fR) +and a sort block. The type determines which parts of the context union are +valid. +.PP +The main division in the context struct is between a substitution scope +(\f(CW\*(C`CXt_SUBST\*(C'\fR) and block scopes, which are everything else. The former is +just used while executing \f(CW\*(C`s///e\*(C'\fR, and won't be discussed further +here. +.PP +All the block scope types share a common base, which corresponds to +\&\f(CW\*(C`CXt_BLOCK\*(C'\fR. This stores the old values of various scope-related +variables like \f(CW\*(C`PL_curpm\*(C'\fR, as well as information about the current +scope, such as \f(CW\*(C`gimme\*(C'\fR. On scope exit, the old variables are restored. +.PP +Particular block scope types store extra per-type information. For +example, \f(CW\*(C`CXt_SUB\*(C'\fR stores the currently executing CV, while the various +for loop types might hold the original loop variable SV. On scope exit, +the per-type data is processed; for example the CV has its reference count +decremented, and the original loop variable is restored. +.PP +The macro \f(CW\*(C`cxstack\*(C'\fR returns the base of the current context stack, while +\&\f(CW\*(C`cxstack_ix\*(C'\fR is the index of the current frame within that stack. +.PP +In fact, the context stack is actually part of a stack-of-stacks system; +whenever something unusual is done such as calling a \f(CW\*(C`DESTROY\*(C'\fR or tie +handler, a new stack is pushed, then popped at the end. +.PP +Note that the API described here changed considerably in perl 5.24; prior +to that, big macros like \f(CW\*(C`PUSHBLOCK\*(C'\fR and \f(CW\*(C`POPSUB\*(C'\fR were used; in 5.24 +they were replaced by the inline static functions described below.
In +addition, the ordering and detail of how these macros/functions work +changed in many ways, often subtly. In particular they didn't handle +saving the savestack and temps stack positions, and required additional +\&\f(CW\*(C`ENTER\*(C'\fR, \f(CW\*(C`SAVETMPS\*(C'\fR and \f(CW\*(C`LEAVE\*(C'\fR compared to the new functions. The +old-style macros will not be described further. +.SS "Pushing contexts" +.IX Subsection "Pushing contexts" +For pushing a new context, the two basic functions are +\&\f(CW\*(C`cx = cx_pushblock()\*(C'\fR, which pushes a new basic context block and returns +its address, and a family of similar functions with names like +\&\f(CWcx_pushsub(cx)\fR which populate the additional type-dependent fields in +the \f(CW\*(C`cx\*(C'\fR struct. Note that \f(CW\*(C`CXt_NULL\*(C'\fR and \f(CW\*(C`CXt_BLOCK\*(C'\fR don't have their +own push functions, as they don't store any data beyond that pushed by +\&\f(CW\*(C`cx_pushblock\*(C'\fR. +.PP +The fields of the context struct and the arguments to the \f(CW\*(C`cx_*\*(C'\fR +functions are subject to change between perl releases, representing +whatever is convenient or efficient for that release. +.PP +A typical example of pushing onto the context stack can be found in \f(CW\*(C`pp_entersub\*(C'\fR; the +following shows a simplified and stripped-down example of a non-XS call, +along with comments showing roughly what each function does. +.PP +.Vb 6 +\& dMARK; +\& U8 gimme = GIMME_V; +\& bool hasargs = cBOOL(PL_op\->op_flags & OPf_STACKED); +\& OP *retop = PL_op\->op_next; +\& I32 old_ss_ix = PL_savestack_ix; +\& CV *cv = ....; +\& +\& /* ... make mortal copies of stack args which are PADTMPs here ... */ +\& +\& /* ... do any additional savestack pushes here ...
*/ +\& +\& /* Now push a new context entry of type \*(AqCXt_SUB\*(Aq; initially just +\& * doing the actions common to all block types: */ +\& +\& cx = cx_pushblock(CXt_SUB, gimme, MARK, old_ss_ix); +\& +\& /* this does (approximately): +\& CXINC; /* cxstack_ix++ (grow if necessary) */ +\& cx = CX_CUR(); /* and get the address of new frame */ +\& cx\->cx_type = CXt_SUB; +\& cx\->blk_gimme = gimme; +\& cx\->blk_oldsp = MARK \- PL_stack_base; +\& cx\->blk_oldsaveix = old_ss_ix; +\& cx\->blk_oldcop = PL_curcop; +\& cx\->blk_oldmarksp = PL_markstack_ptr \- PL_markstack; +\& cx\->blk_oldscopesp = PL_scopestack_ix; +\& cx\->blk_oldpm = PL_curpm; +\& cx\->blk_old_tmpsfloor = PL_tmps_floor; +\& +\& PL_tmps_floor = PL_tmps_ix; +\& */ +\& +\& +\& /* then update the new context frame with subroutine\-specific info, +\& * such as the CV about to be executed: */ +\& +\& cx_pushsub(cx, cv, retop, hasargs); +\& +\& /* this does (approximately): +\& cx\->blk_sub.cv = cv; +\& cx\->blk_sub.olddepth = CvDEPTH(cv); +\& cx\->blk_sub.prevcomppad = PL_comppad; +\& cx\->cx_type |= (hasargs) ? CXp_HASARGS : 0; +\& cx\->blk_sub.retop = retop; +\& SvREFCNT_inc_simple_void_NN(cv); +\& */ +.Ve +.PP +Note that \f(CWcx_pushblock()\fR sets two new floors: for the args stack (to +\&\f(CW\*(C`MARK\*(C'\fR) and the temps stack (to \f(CW\*(C`PL_tmps_ix\*(C'\fR). While executing at this +scope level, every \f(CW\*(C`nextstate\*(C'\fR (amongst others) will reset the args and +tmps stack levels to these floors. Note that since \f(CW\*(C`cx_pushblock\*(C'\fR uses +the current value of \f(CW\*(C`PL_tmps_ix\*(C'\fR rather than it being passed as an arg, +this dictates at what point \f(CW\*(C`cx_pushblock\*(C'\fR should be called. In +particular, any new mortals which should be freed only on scope exit +(rather than at the next \f(CW\*(C`nextstate\*(C'\fR) should be created first. 
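The effect of the temps floor on mortals created before and after the push can be sketched with a toy model in plain C. It is an illustration only: the toy_* names are hypothetical, integer ids stand in for mortal SVs, and only the temps-floor part of what cx_pushblock and cx_popblock do is modelled.

```c
#include <assert.h>

/* Toy model only; every toy_* name is hypothetical. */
static int toy_tmps[16];          /* ids of "mortal" values */
static int toy_tmps_ix    = -1;   /* index of the most recently pushed temp */
static int toy_tmps_floor = -1;   /* temps at or below this index are protected */
static int toy_freed      = 0;    /* how many temps have been released so far */

/* Like sv_2mortal(): register a value for later release. */
static void toy_mortal(int id)
{
    assert(toy_tmps_ix < 15);
    toy_tmps[++toy_tmps_ix] = id;
}

/* Like FREETMPS: release everything above the current floor. */
static void toy_freetmps(void)
{
    while (toy_tmps_ix > toy_tmps_floor) {
        toy_tmps_ix--;
        toy_freed++;
    }
}

/* The temps part of a block push: save the old floor, raise it to the top. */
static int toy_pushblock(void)
{
    int old_floor = toy_tmps_floor;
    toy_tmps_floor = toy_tmps_ix;
    return old_floor;
}

/* The temps part of a block pop: put the old floor back. */
static void toy_popblock(int old_floor)
{
    toy_tmps_floor = old_floor;
}

static int toy_demo_floor(void)
{
    int old_floor, survivors_inside;

    toy_mortal(1);                /* created BEFORE the push: below the floor */
    old_floor = toy_pushblock();
    toy_mortal(2);                /* created after the push: above the floor */
    toy_freetmps();               /* what a nextstate inside the scope does */
    survivors_inside = toy_tmps_ix + 1;   /* only temp 1 is still registered */
    toy_popblock(old_floor);
    toy_freetmps();               /* after scope exit, temp 1 is released too */
    return survivors_inside * 10 + toy_freed;
}
```

In the model, temp 1 survives the FREETMPS inside the scope only because it was registered before the floor was raised, which is the ordering constraint the paragraph above describes.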
+.PP +Most callers of \f(CW\*(C`cx_pushblock\*(C'\fR simply set the new args stack floor to the +top of the previous stack frame, but for \f(CW\*(C`CXt_LOOP_LIST\*(C'\fR it stores the +items being iterated over on the stack, and so sets \f(CW\*(C`blk_oldsp\*(C'\fR to the +top of these items instead. Note that, contrary to its name, \f(CW\*(C`blk_oldsp\*(C'\fR +doesn't always represent the value to restore \f(CW\*(C`PL_stack_sp\*(C'\fR to on scope +exit. +.PP +Note the early capture of \f(CW\*(C`PL_savestack_ix\*(C'\fR to \f(CW\*(C`old_ss_ix\*(C'\fR, which is +later passed as an arg to \f(CW\*(C`cx_pushblock\*(C'\fR. In the case of \f(CW\*(C`pp_entersub\*(C'\fR, +this is because, although most values needing saving are stored in fields +of the context struct, an extra value needs saving only when the debugger +is running, and it doesn't make sense to bloat the struct for this rare +case. So instead it is saved on the savestack. Since this value gets +calculated and saved before the context is pushed, it is necessary to pass +the old value of \f(CW\*(C`PL_savestack_ix\*(C'\fR to \f(CW\*(C`cx_pushblock\*(C'\fR, to ensure that the +saved value gets freed during scope exit. For most users of +\&\f(CW\*(C`cx_pushblock\*(C'\fR, where nothing needs pushing on the save stack, +\&\f(CW\*(C`PL_savestack_ix\*(C'\fR is just passed directly as an arg to \f(CW\*(C`cx_pushblock\*(C'\fR. +.PP +Note that where possible, values should be saved in the context struct +rather than on the save stack; it's much faster that way. +.PP +Normally \f(CW\*(C`cx_pushblock\*(C'\fR should be immediately followed by the appropriate +\&\f(CW\*(C`cx_pushfoo\*(C'\fR, with nothing between them; this is because if code +in-between could die (e.g. a warning upgraded to fatal), then the context +stack unwinding code in \f(CW\*(C`dounwind\*(C'\fR would see (in the example above) a +\&\f(CW\*(C`CXt_SUB\*(C'\fR context frame, but without all the subroutine-specific fields +set, and crashes would soon ensue. 
+.PP +Where the two must be separate, initially set the type to \f(CW\*(C`CXt_NULL\*(C'\fR or +\&\f(CW\*(C`CXt_BLOCK\*(C'\fR, and later change it to \f(CW\*(C`CXt_foo\*(C'\fR when doing the +\&\f(CW\*(C`cx_pushfoo\*(C'\fR. This is exactly what \f(CW\*(C`pp_enteriter\*(C'\fR does, once it's +determined which type of loop it's pushing. +.SS "Popping contexts" +.IX Subsection "Popping contexts" +Contexts are popped using \f(CWcx_popsub()\fR etc. and \f(CWcx_popblock()\fR. Note, +however, that unlike \f(CW\*(C`cx_pushblock\*(C'\fR, neither of these functions actually +decrements the current context stack index; this is done separately using +\&\f(CWCX_POP()\fR. +.PP +There are two main ways that contexts are popped. During normal execution +as scopes are exited, functions like \f(CW\*(C`pp_leave\*(C'\fR, \f(CW\*(C`pp_leaveloop\*(C'\fR and +\&\f(CW\*(C`pp_leavesub\*(C'\fR process and pop just one context using \f(CW\*(C`cx_popfoo\*(C'\fR and +\&\f(CW\*(C`cx_popblock\*(C'\fR. On the other hand, things like \f(CW\*(C`pp_return\*(C'\fR and \f(CW\*(C`next\*(C'\fR +may have to pop back several scopes until a sub or loop context is found, +and exceptions (such as \f(CW\*(C`die\*(C'\fR) need to pop back contexts until an eval +context is found. Both of these are accomplished by \f(CWdounwind()\fR, which +is capable of processing and popping all contexts above the target one.
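The search-then-unwind pattern described above can be modelled in a few lines of C. This is a sketch, not perl's code: the toy_* names and the TOY_* type tags are hypothetical, a frame is reduced to its type tag, and the per-type cleanup that the real unwinding performs is only indicated by a comment.

```c
#include <assert.h>

/* Toy model only; every toy_* and TOY_* name is hypothetical. */
enum toy_cx_type { TOY_BLOCK, TOY_LOOP, TOY_SUB, TOY_EVAL };

static enum toy_cx_type toy_cxstack[16];
static int toy_cxstack_ix = -1;   /* index of the current (topmost) frame */
static int toy_popped = 0;        /* frames popped by unwinding so far */

static void toy_cx_push(enum toy_cx_type type)
{
    assert(toy_cxstack_ix < 15);
    toy_cxstack[++toy_cxstack_ix] = type;
}

/* Search from the top for the nearest frame of the wanted type.
 * Returns its index, or -1 if there is none. */
static int toy_find(enum toy_cx_type wanted)
{
    int i;

    for (i = toy_cxstack_ix; i >= 0; i--)
        if (toy_cxstack[i] == wanted)
            return i;
    return -1;
}

/* Pop every frame above index cxix, leaving frame cxix as the current one. */
static void toy_dounwind(int cxix)
{
    while (toy_cxstack_ix > cxix) {
        /* per-type cleanup for toy_cxstack[toy_cxstack_ix] would go here */
        toy_popped++;
        toy_cxstack_ix--;
    }
}
```

With frames pushed in the order eval, sub, loop, block, a return-like operation would look up the sub frame with toy_find(TOY_SUB) and pass that index to toy_dounwind(), popping the loop and block frames in one go.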
+.PP +Here is a typical example of context popping, as found in \f(CW\*(C`pp_leavesub\*(C'\fR +(simplified slightly): +.PP +.Vb 4 +\& U8 gimme; +\& PERL_CONTEXT *cx; +\& SV **oldsp; +\& OP *retop; +\& +\& cx = CX_CUR(); +\& +\& gimme = cx\->blk_gimme; +\& oldsp = PL_stack_base + cx\->blk_oldsp; /* last arg of previous frame */ +\& +\& if (gimme == G_VOID) +\& PL_stack_sp = oldsp; +\& else +\& leave_adjust_stacks(oldsp, oldsp, gimme, 0); +\& +\& CX_LEAVE_SCOPE(cx); +\& cx_popsub(cx); +\& cx_popblock(cx); +\& retop = cx\->blk_sub.retop; +\& CX_POP(cx); +\& +\& return retop; +.Ve +.PP +The steps above are in a very specific order, designed to be the reverse +order of when the context was pushed. The first thing to do is to copy +and/or protect any return arguments and free any temps in the current +scope. Scope exits like an rvalue sub normally return a mortal copy of +their return args (as opposed to lvalue subs). It is important to make +this copy before the save stack is popped or variables are restored, or +bad things like the following can happen: +.PP +.Vb 2 +\& sub f { my $x =...; $x } # $x freed before we get to copy it +\& sub f { /(...)/; $1 } # PL_curpm restored before $1 copied +.Ve +.PP +Although we wish to free any temps at the same time, we have to be careful +not to free any temps which are keeping return args alive; nor to free the +temps we have just created while mortal copying return args. Fortunately, +\&\f(CWleave_adjust_stacks()\fR is capable of making mortal copies of return args, +shifting args down the stack, and only processing those entries on the +temps stack that are safe to do so. +.PP +In void context no args are returned, so it's more efficient to skip +calling \f(CWleave_adjust_stacks()\fR. Also in void context, a \f(CW\*(C`nextstate\*(C'\fR op +is likely to be imminently called which will do a \f(CW\*(C`FREETMPS\*(C'\fR, so there's +no need to do that either. 
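The stack-shifting part of this step can be pictured with a toy function in plain C. This is a deliberately reduced sketch: toy_adjust and the TOY_* tags are hypothetical names, ints stand in for SV pointers, and the mortal copying and temps processing that leave_adjust_stacks() also performs are left out. Using 0 as a placeholder when scalar context has no value to return is an assumption of the toy.

```c
#include <assert.h>

/* Toy model only; toy_adjust and the TOY_* tags are hypothetical. */
enum toy_gimme { TOY_VOID, TOY_SCALAR, TOY_LIST };

/* The return values live in stack[from + 1 .. *sp]. Move them down so that
 * they start at stack[to + 1], and update *sp to the new top index. */
static void toy_adjust(int *stack, int *sp, int from, int to,
                       enum toy_gimme gimme)
{
    int nargs = *sp - from;
    int i;

    assert(to <= from && nargs >= 0);
    if (gimme == TOY_VOID) {
        *sp = to;                       /* nothing is returned */
    } else if (gimme == TOY_SCALAR) {
        /* keep only the last value; 0 is this toy's "no value" placeholder */
        stack[to + 1] = (nargs > 0) ? stack[*sp] : 0;
        *sp = to + 1;
    } else {
        for (i = 0; i < nargs; i++)     /* to <= from, so copying upwards is safe */
            stack[to + 1 + i] = stack[from + 1 + i];
        *sp = to + nargs;
    }
}
```

In the pp_leavesub excerpt above, the from and to positions are the same (both are oldsp); the toy accepts two positions only to show what shifting down means when they differ.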
+.PP +The next step is to pop savestack entries: \f(CWCX_LEAVE_SCOPE(cx)\fR is just +defined as \f(CWLEAVE_SCOPE(cx\->blk_oldsaveix)\fR. Note that during the +popping, it's possible for perl to call destructors, call \f(CW\*(C`STORE\*(C'\fR to undo +localisations of tied vars, and so on. Any of these can die or call +\&\f(CWexit()\fR. In this case, \f(CWdounwind()\fR will be called, and the current +context stack frame will be re-processed. Thus it is vital that all steps +in popping a context are done in such a way as to support reentrancy. The +other alternative, of decrementing \f(CW\*(C`cxstack_ix\*(C'\fR \fIbefore\fR processing the +frame, would lead to leaks and the like if something died halfway through, +or overwriting of the current frame. +.PP +\&\f(CW\*(C`CX_LEAVE_SCOPE\*(C'\fR itself is safely re-entrant: if only half the savestack +items have been popped before dying and getting trapped by eval, then the +\&\f(CW\*(C`CX_LEAVE_SCOPE\*(C'\fRs in \f(CW\*(C`dounwind\*(C'\fR or \f(CW\*(C`pp_leaveeval\*(C'\fR will continue where +the first one left off. +.PP +The next step is the type-specific context processing; in this case +\&\f(CW\*(C`cx_popsub\*(C'\fR. In part, this looks like: +.PP +.Vb 4 +\& cv = cx\->blk_sub.cv; +\& CvDEPTH(cv) = cx\->blk_sub.olddepth; +\& cx\->blk_sub.cv = NULL; +\& SvREFCNT_dec(cv); +.Ve +.PP +where it's processing the just-executed CV. Note that before it decrements +the CV's reference count, it nulls the \f(CW\*(C`blk_sub.cv\*(C'\fR. This means that if +it re-enters, the CV won't be freed twice. It also means that you can't +rely on such type-specific fields having useful values after the return +from \f(CW\*(C`cx_popfoo\*(C'\fR.
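The null-before-decrement idiom can be tried out in isolation with a small C model. The types and the toy_* names below are hypothetical, and the model keeps only the part that matters here: detaching the pointer from the frame before releasing the reference, so that processing the same frame a second time is harmless.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only; every toy_* name is hypothetical. */
typedef struct {
    int refcnt;                  /* outstanding references */
    int times_freed;             /* must never exceed 1 */
} toy_cv;

typedef struct {
    toy_cv *cv;                  /* stands in for cx->blk_sub.cv */
} toy_frame;

/* NULL-safe decrement; "frees" the value when the count reaches zero. */
static void toy_refcnt_dec(toy_cv *cv)
{
    if (cv != NULL && --cv->refcnt == 0)
        cv->times_freed++;
}

/* Re-entrancy-safe pop: clear the frame's field first, then release. If the
 * release ends up re-processing this frame, the field is already NULL. */
static void toy_popsub(toy_frame *cx)
{
    toy_cv *cv = cx->cv;

    cx->cv = NULL;
    toy_refcnt_dec(cv);
}
```

Calling toy_popsub() twice on the same frame releases the reference only once; with the two statements in the opposite order, a re-entrant second call would still see the stale pointer and decrement again.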
+.PP +Next, \f(CW\*(C`cx_popblock\*(C'\fR restores all the various interpreter vars to their +previous values or previous high water marks; it expands to: +.PP +.Vb 5 +\& PL_markstack_ptr = PL_markstack + cx\->blk_oldmarksp; +\& PL_scopestack_ix = cx\->blk_oldscopesp; +\& PL_curpm = cx\->blk_oldpm; +\& PL_curcop = cx\->blk_oldcop; +\& PL_tmps_floor = cx\->blk_old_tmpsfloor; +.Ve +.PP +Note that it \fIdoesn't\fR restore \f(CW\*(C`PL_stack_sp\*(C'\fR; as mentioned earlier, +which value to restore it to depends on the context type (specifically +\&\f(CW\*(C`for (list) {}\*(C'\fR), and what args (if any) it returns; and that will +already have been sorted out earlier by \f(CWleave_adjust_stacks()\fR. +.PP +Finally, the context stack pointer is actually decremented by \f(CWCX_POP(cx)\fR. +After this point, it's possible that the current context frame could +be overwritten by other contexts being pushed. Although things like ties +and \f(CW\*(C`DESTROY\*(C'\fR are supposed to work within a new context stack, it's best +not to assume this. Indeed on debugging builds, \f(CWCX_POP(cx)\fR deliberately +sets \f(CW\*(C`cx\*(C'\fR to null to detect code that is still relying on the field +values in that context frame. Note in the \f(CWpp_leavesub()\fR example above, +we grab \f(CW\*(C`blk_sub.retop\*(C'\fR \fIbefore\fR calling \f(CW\*(C`CX_POP\*(C'\fR. +.SS "Redoing contexts" +.IX Subsection "Redoing contexts" +Finally, there is \f(CWcx_topblock(cx)\fR, which acts like a super\-\f(CW\*(C`nextstate\*(C'\fR +with regard to resetting various vars to their base values. It is used in +places like \f(CW\*(C`pp_next\*(C'\fR, \f(CW\*(C`pp_redo\*(C'\fR and \f(CW\*(C`pp_goto\*(C'\fR where rather than +exiting a scope, we want to re-initialise the scope. As well as resetting +\&\f(CW\*(C`PL_stack_sp\*(C'\fR like \f(CW\*(C`nextstate\*(C'\fR, it also resets \f(CW\*(C`PL_markstack_ptr\*(C'\fR, +\&\f(CW\*(C`PL_scopestack_ix\*(C'\fR and \f(CW\*(C`PL_curpm\*(C'\fR.
Note that it doesn't do a +\&\f(CW\*(C`FREETMPS\*(C'\fR. +.SH "Slab-based operator allocation" +.IX Header "Slab-based operator allocation" +\&\fBNote:\fR this section describes a non-public internal API that is subject +to change without notice. +.PP +Perl's internal error-handling mechanisms implement \f(CW\*(C`die\*(C'\fR (and its internal +equivalents) using longjmp. If this occurs during lexing, parsing or +compilation, we must ensure that any ops allocated as part of the compilation +process are freed. (Older Perl versions did not adequately handle this +situation: when failing a parse, they would leak ops that were stored in +C \f(CW\*(C`auto\*(C'\fR variables and not linked anywhere else.) +.PP +To handle this situation, Perl uses \fIop slabs\fR that are attached to the +currently-compiling CV. A slab is a chunk of allocated memory. New ops are +allocated as regions of the slab. If the slab fills up, a new one is created +(and linked from the previous one). When an error occurs and the CV is freed, +any ops remaining are freed. +.PP +Each op is preceded by two pointers: one points to the next op in the slab, and +the other points to the slab that owns it. The next-op pointer is needed so +that Perl can iterate over a slab and free all its ops. (Op structures are of +different sizes, so the slab's ops can't merely be treated as a dense array.) +The slab pointer is needed for accessing a reference count on the slab: when +the last op on a slab is freed, the slab itself is freed. +.PP +The slab allocator puts the ops at the end of the slab first. This will tend to +allocate the leaves of the op tree first, and the layout will therefore +hopefully be cache-friendly. In addition, this means that there's no need to +store the size of the slab (see below on why slabs vary in size), because Perl +can follow pointers to find the last op. 
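The layout described in the preceding paragraphs (two header pointers per op, allocation from the end of the slab, and a reference count that frees the slab with its last op) can be sketched as a toy allocator in C. Everything here is a simplified assumption for illustration: the toy_* names and the fixed slab size are hypothetical, sizes are counted in pointer-sized words, and freed-op reuse and the CV's own reference on the slab are not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy model only; every toy_* name and TOY_SLAB_WORDS are hypothetical. */
enum { TOY_SLAB_WORDS = 64 };

typedef struct toy_slab {
    int    refcnt;                  /* number of live ops on this slab */
    size_t first;                   /* index of the newest slot; TOY_SLAB_WORDS if empty */
    void  *words[TOY_SLAB_WORDS];   /* header words and op payloads */
} toy_slab;

static toy_slab *toy_slab_new(void)
{
    toy_slab *slab = malloc(sizeof *slab);
    assert(slab != NULL);
    slab->refcnt = 0;
    slab->first  = TOY_SLAB_WORDS;
    return slab;
}

/* Carve an op of nwords payload words from the END of the free space.
 * Slot layout: [0] next op in the slab, [1] owning slab, [2..] payload.
 * Returns a pointer to the payload, or NULL if the slab is full. */
static void **toy_op_alloc(toy_slab *slab, size_t nwords)
{
    size_t need = nwords + 2;
    size_t at;

    if (need > slab->first)
        return NULL;
    at = slab->first - need;
    slab->words[at] = (slab->first < TOY_SLAB_WORDS)
                          ? (void *)&slab->words[slab->first]
                          : NULL;
    slab->words[at + 1] = slab;
    slab->first = at;
    slab->refcnt++;
    return &slab->words[at + 2];
}

/* Walk the next-op pointers, as needed when ops have different sizes. */
static int toy_slab_count(toy_slab *slab)
{
    void **slot;
    int count = 0;

    if (slab->first == TOY_SLAB_WORDS)
        return 0;
    for (slot = &slab->words[slab->first]; slot != NULL; slot = (void **)slot[0])
        count++;
    return count;
}

/* Free one op, reaching its slab through the header word just before the
 * payload. Returns 1 if this was the last op and the slab was freed too. */
static int toy_op_free(void **op)
{
    toy_slab *slab = (toy_slab *)op[-1];

    if (--slab->refcnt == 0) {
        free(slab);
        return 1;
    }
    return 0;
}
```

Because each slot records where the next one starts, toy_slab_count() can visit ops of different sizes without the slab having to store a table of offsets.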
+.PP +It might seem possible to eliminate slab reference counts altogether, by having +all ops implicitly attached to \f(CW\*(C`PL_compcv\*(C'\fR when allocated and freed when the +CV is freed. That would also allow \f(CW\*(C`op_free\*(C'\fR to skip \f(CW\*(C`FreeOp\*(C'\fR altogether, and +thus free ops faster. But that doesn't work in those cases where ops need to +survive beyond their CVs, such as re-evals. +.PP +The CV also has to have a reference count on the slab. Sometimes the first op +created is immediately freed. If the reference count of the slab reaches 0, +then it will be freed with the CV still pointing to it. +.PP +CVs use the \f(CW\*(C`CVf_SLABBED\*(C'\fR flag to indicate that the CV has a reference count +on the slab. When this flag is set, the slab is accessible via \f(CW\*(C`CvSTART\*(C'\fR when +\&\f(CW\*(C`CvROOT\*(C'\fR is not set, or by subtracting two pointers \f(CW\*(C`(2*sizeof(I32 *))\*(C'\fR from +\&\f(CW\*(C`CvROOT\*(C'\fR when it is set. The alternative to this approach of sneaking the slab +into \f(CW\*(C`CvSTART\*(C'\fR during compilation would be to enlarge the \f(CW\*(C`xpvcv\*(C'\fR struct by +another pointer. But that would make all CVs larger, even though slab-based op +freeing is typically of benefit only for programs that make significant use of +string eval. +.PP +When the \f(CW\*(C`CVf_SLABBED\*(C'\fR flag is set, the CV takes responsibility for freeing +the slab. If \f(CW\*(C`CvROOT\*(C'\fR is not set when the CV is freed or undeffed, it is +assumed that a compilation error has occurred, so the op slab is traversed and +all the ops are freed. +.PP +Under normal circumstances, the CV forgets about its slab (decrementing the +reference count) when the root is attached. So the slab reference counting that +happens when ops are freed takes care of freeing the slab. In some cases, the +CV is told to forget about the slab (\f(CW\*(C`cv_forget_slab\*(C'\fR) precisely so that the +ops can survive after the CV is done away with. 
+.PP +Forgetting the slab when the root is attached is not strictly necessary, but +avoids potential problems with \f(CW\*(C`CvROOT\*(C'\fR being written over. There is code all +over the place, both in core and on CPAN, that does things with \f(CW\*(C`CvROOT\*(C'\fR, so +forgetting the slab makes things more robust and avoids potential problems. +.PP +Since the CV takes ownership of its slab when flagged, that flag is never +copied when a CV is cloned, as one CV could free a slab that another CV still +points to, since forced freeing of ops ignores the reference count (but asserts +that it looks right). +.PP +To avoid slab fragmentation, freed ops are marked as freed and attached to the +slab's freed chain (an idea stolen from DBM::Deep). Those freed ops are reused +when possible. Not reusing freed ops would be simpler, but it would result in +significantly higher memory usage for programs with large \f(CW\*(C`if (DEBUG) {...}\*(C'\fR +blocks. +.PP +\&\f(CW\*(C`SAVEFREEOP\*(C'\fR is slightly problematic under this scheme. Sometimes it can cause +an op to be freed after its CV. If the CV has forcibly freed the ops on its +slab and the slab itself, then we will be fiddling with a freed slab. Making +\&\f(CW\*(C`SAVEFREEOP\*(C'\fR a no-op doesn't help, as sometimes an op can be savefreed when +there is no compilation error, so the op would never be freed. It holds +a reference count on the slab, so the whole slab would leak. So \f(CW\*(C`SAVEFREEOP\*(C'\fR +now sets a special flag on the op (\f(CW\*(C`\->op_savefree\*(C'\fR). The forced freeing of +ops after a compilation error won't free any ops thus marked. +.PP +Since many pieces of code create tiny subroutines consisting of only a few ops, +and since a huge slab would be quite a bit of baggage for those to carry +around, the first slab is always very small. To avoid allocating too many +slabs for a single CV, each subsequent slab is twice the size of the previous. 
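To get a feel for the effect of doubling, the following C helper counts how many slabs a CV would need under that policy. It is a back-of-the-envelope sketch: toy_slabs_needed is a hypothetical name, the sizes are in arbitrary units chosen by the caller rather than perl's real constants, and no upper limit on slab size is modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of slabs needed to hold `total` units of ops
 * when the first slab holds `first` units and each later slab is twice the
 * size of the one before it. */
static size_t toy_slabs_needed(size_t total, size_t first)
{
    size_t slabs = 0;
    size_t size  = first;
    size_t held  = 0;

    assert(first > 0);
    while (held < total) {
        held += size;       /* allocate one more slab of the current size */
        slabs++;
        size *= 2;          /* the next slab is twice as large */
    }
    return slabs;
}
```

With a first slab of 64 units, 1000 units of ops fit in slabs of 64, 128, 256, 512 and 1024 units, whereas fixed 64-unit slabs would need 16 of them; a sub needing only 10 units still pays for just the one small slab.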
+.PP +Smartmatch expects to be able to allocate an op at run time, run it, and then +throw it away. For that to work the op is simply malloced when \f(CW\*(C`PL_compcv\*(C'\fR hasn't +been set up. So all slab-allocated ops are marked as such (\f(CW\*(C`\->op_slabbed\*(C'\fR), +to distinguish them from malloced ops. +.SH AUTHORS +.IX Header "AUTHORS" +Until May 1997, this document was maintained by Jeff Okamoto. +It is now maintained as part of Perl +itself by the Perl 5 Porters. +.PP +With lots of help and suggestions from Dean Roehrich, Malcolm Beattie, +Andreas Koenig, Paul Hudson, Ilya Zakharevich, Paul Marquess, Neil +Bowers, Matthew Green, Tim Bunce, Spider Boardman, Ulrich Pfeifer, +Stephen McCamant, and Gurusamy Sarathy. +.SH "SEE ALSO" +.IX Header "SEE ALSO" +perlapi, perlintern, perlxs, perlembed