Diffstat (limited to 'upstream/archlinux/man1/perlhacktips.1perl')
-rw-r--r-- | upstream/archlinux/man1/perlhacktips.1perl | 2067 |
1 files changed, 2067 insertions, 0 deletions
diff --git a/upstream/archlinux/man1/perlhacktips.1perl b/upstream/archlinux/man1/perlhacktips.1perl new file mode 100644 index 00000000..ad533d60 --- /dev/null +++ b/upstream/archlinux/man1/perlhacktips.1perl @@ -0,0 +1,2067 @@ +.\" -*- mode: troff; coding: utf-8 -*- +.\" Automatically generated by Pod::Man 5.01 (Pod::Simple 3.43) +.\" +.\" Standard preamble: +.\" ======================================================================== +.de Sp \" Vertical space (when we can't use .PP) +.if t .sp .5v +.if n .sp +.. +.de Vb \" Begin verbatim text +.ft CW +.nf +.ne \\$1 +.. +.de Ve \" End verbatim text +.ft R +.fi +.. +.\" \*(C` and \*(C' are quotes in nroff, nothing in troff, for use with C<>. +.ie n \{\ +. ds C` "" +. ds C' "" +'br\} +.el\{\ +. ds C` +. ds C' +'br\} +.\" +.\" Escape single quotes in literal strings from groff's Unicode transform. +.ie \n(.g .ds Aq \(aq +.el .ds Aq ' +.\" +.\" If the F register is >0, we'll generate index entries on stderr for +.\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index +.\" entries marked with X<> in POD. Of course, you'll have to process the +.\" output yourself in some meaningful fashion. +.\" +.\" Avoid warning from groff about undefined register 'F'. +.de IX +.. +.nr rF 0 +.if \n(.g .if rF .nr rF 1 +.if (\n(rF:(\n(.g==0)) \{\ +. if \nF \{\ +. de IX +. tm Index:\\$1\t\\n%\t"\\$2" +.. +. if !\nF==2 \{\ +. nr % 0 +. nr F 2 +. \} +. \} +.\} +.rr rF +.\" ======================================================================== +.\" +.IX Title "PERLHACKTIPS 1perl" +.TH PERLHACKTIPS 1perl 2024-02-11 "perl v5.38.2" "Perl Programmers Reference Guide" +.\" For nroff, turn off justification. Always turn off hyphenation; it makes +.\" way too many mistakes in technical documents. +.if n .ad l +.nh +.SH NAME +perlhacktips \- Tips for Perl core C code hacking +.SH DESCRIPTION +.IX Header "DESCRIPTION" +This document will help you learn the best way to go about hacking on +the Perl core C code. 
It covers common problems, debugging, profiling, +and more. +.PP +If you haven't read perlhack and perlhacktut yet, you might want +to do that first. +.SH "COMMON PROBLEMS" +.IX Header "COMMON PROBLEMS" +Perl source now permits some specific C99 features which we know are +supported by all platforms, but mostly plays by ANSI C89 rules. +You don't care about some particular platform having broken Perl? I +hear there is still a strong demand for J2EE programmers. +.SS "Perl environment problems" +.IX Subsection "Perl environment problems" +.IP \(bu 4 +Not compiling with threading +.Sp +Compiling with threading (\-Duseithreads) completely rewrites the +function prototypes of Perl. You better try your changes with that. +Related to this is the difference between "Perl_\-less" and "Perl_\-ly" +APIs, for example: +.Sp +.Vb 2 +\& Perl_sv_setiv(aTHX_ ...); +\& sv_setiv(...); +.Ve +.Sp +The first one explicitly passes in the context, which is needed for +e.g. threaded builds. The second one does that implicitly; do not get +them mixed. If you are not passing in a aTHX_, you will need to do a +dTHX as the first thing in the function. +.Sp +See "How multiple interpreters and concurrency are +supported" in perlguts for further discussion about context. +.IP \(bu 4 +Not compiling with \-DDEBUGGING +.Sp +The DEBUGGING define exposes more code to the compiler, therefore more +ways for things to go wrong. You should try it. +.IP \(bu 4 +Introducing (non-read-only) globals +.Sp +Do not introduce any modifiable globals, truly global or file static. +They are bad form and complicate multithreading and other forms of +concurrency. The right way is to introduce them as new interpreter +variables, see \fIintrpvar.h\fR (at the very end for binary +compatibility). +.Sp +Introducing read-only (const) globals is okay, as long as you verify +with e.g. 
\f(CW\*(C`nm libperl.a|egrep \-v \*(Aq [TURtr] \*(Aq\*(C'\fR (if your \f(CW\*(C`nm\*(C'\fR has +BSD-style output) that the data you added really is read-only. (If it +is, it shouldn't show up in the output of that command.) +.Sp +If you want to have static strings, make them constant: +.Sp +.Vb 1 +\& static const char etc[] = "..."; +.Ve +.Sp +If you want to have arrays of constant strings, note carefully the +right combination of \f(CW\*(C`const\*(C'\fRs: +.Sp +.Vb 2 +\& static const char * const yippee[] = +\& {"hi", "ho", "silver"}; +.Ve +.IP \(bu 4 +Not exporting your new function +.Sp +Some platforms (Win32, AIX, VMS, OS/2, to name a few) require any +function that is part of the public API (the shared Perl library) to be +explicitly marked as exported. See the discussion about \fIembed.pl\fR in +perlguts. +.IP \(bu 4 +Exporting your new function +.Sp +The new shiny result of either genuine new functionality or your +arduous refactoring is now ready and correctly exported. So what could +possibly go wrong? +.Sp +Maybe simply that your function did not need to be exported in the +first place. Perl has a long and not so glorious history of exporting +functions that it should not have. +.Sp +If the function is used only inside one source code file, make it +static. See the discussion about \fIembed.pl\fR in perlguts. +.Sp +If the function is used across several files, but intended only for +Perl's internal use (and this should be the common case), do not export +it to the public API. See the discussion about \fIembed.pl\fR in +perlguts. +.SS C99 +.IX Subsection "C99" +Starting from 5.35.5 we now permit some C99 features in the core C source. +However, code in dual life extensions still needs to be C89 only, because it +needs to compile against earlier version of Perl running on older platforms. 
+Also note that our headers need to also be valid as C++, because XS extensions +written in C++ need to include them, hence \fImember structure initialisers\fR +can't be used in headers. +.PP +C99 support is still far from complete on all platforms we currently support. +As a baseline we can only assume C89 semantics with the specific C99 features +described below, which we've verified work everywhere. It's fine to probe for +additional C99 features and use them where available, providing there is also a +fallback for compilers that don't support the feature. For example, we use C11 +thread local storage when available, but fall back to POSIX thread specific +APIs otherwise, and we use \f(CW\*(C`char\*(C'\fR for booleans if \f(CW\*(C`<stdbool.h>\*(C'\fR isn't +available. +.PP +Code can use (and rely on) the following C99 features being present +.IP \(bu 4 +mixed declarations and code +.IP \(bu 4 +64 bit integer types +.Sp +For consistency with the existing source code, use the typedefs \f(CW\*(C`I64\*(C'\fR and +\&\f(CW\*(C`U64\*(C'\fR, instead of using \f(CW\*(C`long long\*(C'\fR and \f(CW\*(C`unsigned long long\*(C'\fR directly. +.IP \(bu 4 +variadic macros +.Sp +.Vb 2 +\& void greet(char *file, unsigned int line, char *format, ...); +\& #define logged_greet(...) greet(_\|_FILE_\|_, _\|_LINE_\|_, _\|_VA_ARGS_\|_); +.Ve +.Sp +Note that \f(CW\*(C`_\|_VA_OPT_\|_\*(C'\fR is a gcc extension not yet in any published standard. +.IP \(bu 4 +declarations in for loops +.Sp +.Vb 3 +\& for (const char *p = message; *p; ++p) { +\& putchar(*p); +\& } +.Ve +.IP \(bu 4 +member structure initialisers +.Sp +But not in headers, as support was only added to C++ relatively recently. 
+.Sp +Hence this is fine in C and XS code, but not headers: +.Sp +.Vb 4 +\& struct message { +\& char *action; +\& char *target; +\& }; +\& +\& struct message mcguffin = { +\& .target = "member structure initialisers", +\& .action = "Built" +\& }; +.Ve +.IP \(bu 4 +flexible array members +.Sp +This is standards conformant: +.Sp +.Vb 4 +\& struct greeting { +\& unsigned int len; +\& char message[]; +\& }; +.Ve +.Sp +However, the source code already uses the "unwarranted chumminess with the +compiler" hack in many places: +.Sp +.Vb 4 +\& struct greeting { +\& unsigned int len; +\& char message[1]; +\& }; +.Ve +.Sp +Strictly it \fBis\fR undefined behaviour accessing beyond \f(CW\*(C`message[0]\*(C'\fR, but this +has been a commonly used hack since K&R times, and using it hasn't been a +practical issue anywhere (in the perl source or any other common C code). +Hence it's unclear what we would gain from actively changing to the C99 +approach. +.IP \(bu 4 +\&\f(CW\*(C`//\*(C'\fR comments +.Sp +All compilers we tested support their use. Not all humans we tested support +their use. +.PP +Code explicitly should not use any other C99 features. For example +.IP \(bu 4 +variable length arrays +.Sp +Not supported by \fBany\fR MSVC, and this is not going to change. +.Sp +Even "variable" length arrays where the variable is a constant expression +are syntax errors under MSVC. +.IP \(bu 4 +C99 types in \f(CW\*(C`<stdint.h>\*(C'\fR +.Sp +Use \f(CW\*(C`PERL_INT_FAST8_T\*(C'\fR etc as defined in \fIhandy.h\fR +.IP \(bu 4 +C99 format strings in \f(CW\*(C`<inttypes.h>\*(C'\fR +.Sp +\&\f(CW\*(C`snprintf\*(C'\fR in the VMS libc only added support for \f(CW\*(C`PRIdN\*(C'\fR etc very recently, +meaning that there are live supported installations without this, or formats +such as \f(CW%zu\fR. 
+.Sp +(perl's \f(CW\*(C`sv_catpvf\*(C'\fR etc use parser code code in \f(CW\*(C`sv.c\*(C'\fR, which supports the +\&\f(CW\*(C`z\*(C'\fR modifier, along with perl-specific formats such as \f(CW\*(C`SVf\*(C'\fR.) +.PP +If you want to use a C99 feature not listed above then you need to do one of +.IP \(bu 4 +Probe for it in \fIConfigure\fR, set a variable in \fIconfig.sh\fR, and add fallback logic in the headers for platforms which don't have it. +.IP \(bu 4 +Write test code and verify that it works on platforms we need to support, before relying on it unconditionally. +.PP +Likely you want to repeat the same plan as we used to get the current C99 +feature set. See the message at https://markmail.org/thread/odr4fjrn72u2fkpz +for the C99 probes we used before. Note that the two most "fussy" compilers +appear to be MSVC and the vendor compiler on VMS. To date all the *nix +compilers have been far more flexible in what they support. +.PP +On *nix platforms, \fIConfigure\fR attempts to set compiler flags appropriately. +All vendor compilers that we tested defaulted to C99 (or C11) support. +However, older versions of gcc default to C89, or permit \fImost\fR C99 (with +warnings), but forbid \fIdeclarations in for loops\fR unless \f(CW\*(C`\-std=gnu99\*(C'\fR is +added. The alternative \f(CW\*(C`\-std=c99\*(C'\fR \fBmight\fR seem better, but using it on some +platforms can prevent \f(CW\*(C`<unistd.h>\*(C'\fR declaring some prototypes being +declared, which breaks the build. gcc's \f(CW\*(C`\-ansi\*(C'\fR flag implies \f(CW\*(C`\-std=c89\*(C'\fR so we +can no longer set that, hence the Configure option \f(CW\*(C`\-gccansipedantic\*(C'\fR now only +adds \f(CW\*(C`\-pedantic\*(C'\fR. +.PP +The Perl core source code files (the ones at the top level of the source code +distribution) are automatically compiled with as many as possible of the +\&\f(CW\*(C`\-std=gnu99\*(C'\fR, \f(CW\*(C`\-pedantic\*(C'\fR, and a selection of \f(CW\*(C`\-W\*(C'\fR flags (see +cflags.SH). 
Files in \fIext/\fR \fIdist/\fR \fIcpan/\fR etc are compiled with the same +flags as the installed perl would use to compile XS extensions. +.PP +Basically, it's safe to assume that \fIConfigure\fR and \fIcflags.SH\fR have +picked the best combination of flags for the version of gcc on the platform, +and attempting to add more flags related to enforcing a C dialect will +cause problems either locally, or on other systems that the code is shipped +to. +.PP +We believe that the C99 support in gcc 3.1 is good enough for us, but we don't +have a 19 year old gcc handy to check this :\-) +If you have ancient vendor compilers that don't default to C99, the flags +you might want to try are +.IP AIX 4 +.IX Item "AIX" +\&\f(CW\*(C`\-qlanglvl=stdc99\*(C'\fR +.IP HP/UX 4 +.IX Item "HP/UX" +\&\f(CW\*(C`\-AC99\*(C'\fR +.IP Solaris 4 +.IX Item "Solaris" +\&\f(CW\*(C`\-xc99\*(C'\fR +.SS "Symbol Names and Namespace Pollution" +.IX Subsection "Symbol Names and Namespace Pollution" +\fIChoosing legal symbol names\fR +.IX Subsection "Choosing legal symbol names" +.PP +C reserves for its implementation any symbol whose name begins with an +underscore followed immediately by either an uppercase letter \f(CW\*(C`[A\-Z]\*(C'\fR +or another underscore. C++ further reserves any symbol containing two +consecutive underscores, and further reserves in the global name space any +symbol beginning with an underscore, not just ones followed by a +capital. We care about C++ because \f(CW\*(C`hdr\*(C'\fR files need to be compilable by +it, and some people do all their development using a C++ compiler. +.PP +The consequences of failing to do this are probably none. Unless you +stumble on a name that the implementation uses, things will work. +Indeed, the perl core has more than a few instances of using +implementation-reserved symbols. (These are gradually being changed.) 
+But your code might stop working any time that the implementation +decides to use a name you already had chosen, potentially many years +before. +.PP +It's best then to: +.ie n .IP "\fBDon't begin a symbol name with an underscore\fR; (\fIe.g.\fR, don't use: ""_FOOBAR"")" 4 +.el .IP "\fBDon't begin a symbol name with an underscore\fR; (\fIe.g.\fR, don't use: \f(CW_FOOBAR\fR)" 4 +.IX Item "Don't begin a symbol name with an underscore; (e.g., don't use: _FOOBAR)" +.PD 0 +.ie n .IP "\fBDon't use two consecutive underscores in a symbol name\fR; (\fIe.g.\fR, don't use ""FOO_\|_BAR"")" 4 +.el .IP "\fBDon't use two consecutive underscores in a symbol name\fR; (\fIe.g.\fR, don't use \f(CWFOO_\|_BAR\fR)" 4 +.IX Item "Don't use two consecutive underscores in a symbol name; (e.g., don't use FOO__BAR)" +.PD +.PP +POSIX also reserves many symbols. See Section 2.2.2 in +<http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html>. +Perl also has conflicts with that. +.PP +Perl reserves for its use any symbol beginning with \f(CW\*(C`Perl\*(C'\fR, \f(CW\*(C`perl\*(C'\fR, or +\&\f(CW\*(C`PL_\*(C'\fR. Any time you introduce a macro into a \f(CW\*(C`hdr\*(C'\fR file that doesn't +follow that convention, you are creating the possibility of a namespace +clash with an existing XS module, unless you restrict it by, say, +.PP +.Vb 3 +\& #ifdef PERL_CORE +\& # define my_symbol +\& #endif +.Ve +.PP +There are many symbols in \f(CW\*(C`hdr\*(C'\fR files that aren't of this form, and +which are accessible from XS namespace, intentionally or not, just about +anything in \fIconfig.h\fR, for example. +.PP +Having to use one of these prefixes detracts from the readability of the +code, and hasn't been an actual issue for non-trivial names. Things +like perl defining its own \f(CW\*(C`MAX\*(C'\fR macro have been problematic, but they +were quickly discovered, and a \f(CW\*(C`#ifdef\ PERL_CORE\*(C'\fR guard added. 
+.PP +So there's no rule imposed about using such symbols, just be aware of +the issues. +.PP +\fIChoosing good symbol names\fR +.IX Subsection "Choosing good symbol names" +.PP +Ideally, a symbol name should correctly and precisely describe its +intended purpose. But there is a tension between that and getting names +that are overly long and hence awkward to type and read. Metaphors +could be helpful (a poetic name), but those tend to be culturally +specific, and may not translate for someone whose native language isn't +English, or even comes from a different cultural background. Besides, +the talent of writing poetry seems to be rare in programmers. +.PP +Certain symbol names don't reflect their purpose, but are nonetheless +fine to use because of long-standing conventions. These often +originated in the field of Mathematics, where \f(CW\*(C`i\*(C'\fR and \f(CW\*(C`j\*(C'\fR are +frequently used as subscripts, and \f(CW\*(C`n\*(C'\fR as a population count. Since at +least the 1950's, computer programs have used \f(CW\*(C`i\*(C'\fR, \fIetc.\fR as loop +variables. +.PP +Our guidance is to choose a name that reasonably describes the purpose, +and to comment its declaration more precisely. +.PP +One certainly shouldn't use misleading nor ambiguous names. \f(CW\*(C`last_foo\*(C'\fR +could mean either the final \f(CW\*(C`foo\*(C'\fR or the previous \f(CW\*(C`foo\*(C'\fR, and so could +be confusing to the reader, or even to the writer coming back to the +code after a few months of working on something else. Sometimes the +programmer has a particular line of thought in mind, and it doesn't +occur to them that ambiguity is present. +.PP +There are probably still many off\-by\-1 bugs around because the name +"\f(CW\*(C`av_len\*(C'\fR" in perlapi doesn't correspond to what other \fI\-len\fR constructs +mean, such as "\f(CW\*(C`sv_len\*(C'\fR" in perlapi. 
Awkward (and controversial) +synonyms were created to use instead that conveyed its true meaning +("\f(CW\*(C`av_top_index\*(C'\fR" in perlapi). Eventually, though someone had the better +idea to create a new name to signify what most people think \f(CW\*(C`\-len\*(C'\fR +signifies. So "\f(CW\*(C`av_count\*(C'\fR" in perlapi was born. And we wish it had been +thought up much earlier. +.SS "Writing safer macros" +.IX Subsection "Writing safer macros" +Macros are used extensively in the Perl core for such things as hiding +internal details from the caller, so that it doesn't have to be +concerned about them. For example, most lines of code don't need +to know if they are running on a threaded versus unthreaded perl. That +detail is automatically mostly hidden. +.PP +It is often better to use an inline function instead of a macro. They +are immune to name collisions with the caller, and don't magnify +problems when called with parameters that are expressions with side +effects. There was a time when one might choose a macro over an inline +function because compiler support for inline functions was quite +limited. Some only would actually only inline the first two or three +encountered in a compilation. But those days are long gone, and inline +functions are fully supported in modern compilers. +.PP +Nevertheless, there are situations where a function won't do, and a +macro is required. One example is when a parameter can be any of +several types. A function has to be declared with a single explicit +.PP +Or maybe the code involved is so trivial that a function would be just +complicating overkill, such as when the macro simply creates a mnemonic +name for some constant value. +.PP +If you do choose to use a non-trivial macro, be aware that there are +several avoidable pitfalls that can occur. Keep in mind that a macro is +expanded within the lexical context of each place in the source it is +called. 
If you have a token \f(CW\*(C`foo\*(C'\fR in the macro and the source happens +also to have \f(CW\*(C`foo\*(C'\fR, the meaning of the macro's \f(CW\*(C`foo\*(C'\fR will become that +of the caller's. Sometimes that is exactly the behavior you want, but +be aware that this tends to be confusing later on. It effectively turns +\&\f(CW\*(C`foo\*(C'\fR into a reserved word for any code that calls the macro, and this +fact is usually not documented nor considered. It is safer to pass +\&\f(CW\*(C`foo\*(C'\fR as a parameter, so that \f(CW\*(C`foo\*(C'\fR remains freely available to the +caller and the macro interface is explicitly specified. +.PP +Worse is when the equivalence between the two \f(CW\*(C`foo\*(C'\fR's is coincidental. +Suppose for example, that the macro declares a variable +.PP +.Vb 1 +\& int foo +.Ve +.PP +That works fine as long as the caller doesn't define the string \f(CW\*(C`foo\*(C'\fR +in some way. And it might not be until years later that someone comes +along with an instance where \f(CW\*(C`foo\*(C'\fR is used. For example a future +caller could do this: +.PP +.Vb 1 +\& #define foo bar +.Ve +.PP +Then that declaration of \f(CW\*(C`foo\*(C'\fR in the macro suddenly becomes +.PP +.Vb 1 +\& int bar +.Ve +.PP +That could mean that something completely different happens than +intended. It is hard to debug; the macro and call may not even be in +the same file, so it would require some digging and gnashing of teeth to +figure out. +.PP +Therefore, if a macro does use variables, their names should be such +that it is very unlikely that they would collide with any caller, now or +forever. One way to do that, now being used in the perl source, is to +include the name of the macro itself as part of the name of each +variable in the macro. 
Suppose the macro is named \f(CW\*(C`SvPV\*(C'\fR. Then we +could have +.PP +.Vb 1 +\& int foo_svpv_ = 0; +.Ve +.PP +This is harder to read than plain \f(CW\*(C`foo\*(C'\fR, but it is pretty much +guaranteed that a caller will never naively use \f(CW\*(C`foo_svpv_\*(C'\fR (and run +into problems). (The lowercasing makes it clearer that this is a +variable, but assumes that there won't be two elements whose names +differ only in the case of their letters.) The trailing underscore +makes it even more unlikely to clash, as those, by convention, signify a +private variable name. (See "Choosing legal symbol names" for +restrictions on what names you can use.) +.PP +This kind of name collision doesn't happen with the macro's formal +parameters, so they don't need to have complicated names. But there are +pitfalls when a parameter is an expression, or has some Perl magic +attached. When calling a function, C will evaluate the parameter once, +and pass the result to the function. But when calling a macro, the +parameter is copied as-is by the C preprocessor to each instance inside +the macro. This means that when evaluating a parameter having side +effects, the function and macro results differ. This is particularly +fraught when a parameter has overload magic, say it is a tied variable +that reads the next line in a file upon each evaluation. Having it read +multiple lines per call is probably not what the caller intended. If a +macro refers to a potentially overloadable parameter more than once, it +should first make a copy and then use that copy the rest of the time. +There are macros in the perl core that violate this, but are gradually +being converted, usually by changing to use inline functions instead. +.PP +Above we said "first make a copy". In a macro, that is easier said than +done, because macros are normally expressions, and declarations aren't +allowed in expressions. 
But the \f(CW\*(C`STMT_START\*(C'\fR\ ..\ \f(CW\*(C`STMT_END\*(C'\fR +construct, described in perlapi, allows you to +have declarations in most contexts, as long as you don't need a return +value. If you do need a value returned, you can make the interface such +that a pointer is passed to the construct, which then stores its result +there. (Or you can use GCC brace groups. But these require a fallback +if the code will ever get executed on a platform that lacks this +non-standard extension to C. And that fallback would be another code +path, which can get out-of-sync with the brace group one, so doing this +isn't advisable.) In situations where there's no other way, Perl does +furnish "\f(CW\*(C`PL_Sv\*(C'\fR" in perlintern and "\f(CW\*(C`PL_na\*(C'\fR" in perlapi to use (with a +slight performance penalty) for some such common cases. But beware that +a call chain involving multiple macros using them will zap the other's +use. These have been very difficult to debug. +.PP +For a concrete example of these pitfalls in action, see +<https://perlmonks.org/?node_id=11144355> +.SS "Portability problems" +.IX Subsection "Portability problems" +The following are common causes of compilation and/or execution +failures, not common to Perl as such. The C FAQ is good bedtime +reading. Please test your changes with as many C compilers and +platforms as possible; we will, anyway, and it's nice to save oneself +from public embarrassment. +.PP +Also study perlport carefully to avoid any bad assumptions about the +operating system, filesystems, character set, and so forth. +.PP +Do not assume an operating system indicates a certain compiler. +.IP \(bu 4 +Casting pointers to integers or casting integers to pointers +.Sp +.Vb 3 +\& void castaway(U8* p) +\& { +\& IV i = p; +.Ve +.Sp +or +.Sp +.Vb 3 +\& void castaway(U8* p) +\& { +\& IV i = (IV)p; +.Ve +.Sp +Both are bad, and broken, and unportable. Use the \fBPTR2IV()\fR macro that +does it right. 
(Likewise, there are \fBPTR2UV()\fR, \fBPTR2NV()\fR, \fBINT2PTR()\fR, and +\&\fBNUM2PTR()\fR.) +.IP \(bu 4 +Casting between function pointers and data pointers +.Sp +Technically speaking casting between function pointers and data +pointers is unportable and undefined, but practically speaking it seems +to work, but you should use the \fBFPTR2DPTR()\fR and \fBDPTR2FPTR()\fR macros. +Sometimes you can also play games with unions. +.IP \(bu 4 +Assuming sizeof(int) == sizeof(long) +.Sp +There are platforms where longs are 64 bits, and platforms where ints +are 64 bits, and while we are out to shock you, even platforms where +shorts are 64 bits. This is all legal according to the C standard. (In +other words, "long long" is not a portable way to specify 64 bits, and +"long long" is not even guaranteed to be any wider than "long".) +.Sp +Instead, use the definitions IV, UV, IVSIZE, I32SIZE, and so forth. +Avoid things like I32 because they are \fBnot\fR guaranteed to be +\&\fIexactly\fR 32 bits, they are \fIat least\fR 32 bits, nor are they +guaranteed to be \fBint\fR or \fBlong\fR. If you explicitly need +64\-bit variables, use I64 and U64. +.IP \(bu 4 +Assuming one can dereference any type of pointer for any type of data +.Sp +.Vb 2 +\& char *p = ...; +\& long pony = *(long *)p; /* BAD */ +.Ve +.Sp +Many platforms, quite rightly so, will give you a core dump instead of +a pony if the p happens not to be correctly aligned. +.IP \(bu 4 +Lvalue casts +.Sp +.Vb 1 +\& (int)*p = ...; /* BAD */ +.Ve +.Sp +Simply not portable. Get your lvalue to be of the right type, or maybe +use temporary variables, or dirty tricks with unions. 
+.IP \(bu 4 +Assume \fBanything\fR about structs (especially the ones you don't +control, like the ones coming from the system headers) +.RS 4 +.IP \(bu 8 +That a certain field exists in a struct +.IP \(bu 8 +That no other fields exist besides the ones you know of +.IP \(bu 8 +That a field is of certain signedness, sizeof, or type +.IP \(bu 8 +That the fields are in a certain order +.RS 8 +.IP \(bu 8 +While C guarantees the ordering specified in the struct definition, +between different platforms the definitions might differ +.RE +.RS 8 +.RE +.IP \(bu 8 +That the sizeof(struct) or the alignments are the same everywhere +.RS 8 +.IP \(bu 8 +There might be padding bytes between the fields to align the fields \- +the bytes can be anything +.IP \(bu 8 +Structs are required to be aligned to the maximum alignment required by +the fields \- which for native types is for usually equivalent to +\&\fBsizeof()\fR of the field +.RE +.RS 8 +.RE +.RE +.RS 4 +.RE +.IP \(bu 4 +Assuming the character set is ASCIIish +.Sp +Perl can compile and run under EBCDIC platforms. See perlebcdic. +This is transparent for the most part, but because the character sets +differ, you shouldn't use numeric (decimal, octal, nor hex) constants +to refer to characters. You can safely say \f(CW\*(AqA\*(Aq\fR, but not \f(CW0x41\fR. +You can safely say \f(CW\*(Aq\en\*(Aq\fR, but not \f(CW\*(C`\e012\*(C'\fR. However, you can use +macros defined in \fIutf8.h\fR to specify any code point portably. +\&\f(CWLATIN1_TO_NATIVE(0xDF)\fR is going to be the code point that means +LATIN SMALL LETTER SHARP S on whatever platform you are running on (on +ASCII platforms it compiles without adding any extra code, so there is +zero performance hit on those). The acceptable inputs to +\&\f(CW\*(C`LATIN1_TO_NATIVE\*(C'\fR are from \f(CW0x00\fR through \f(CW0xFF\fR. If your input +isn't guaranteed to be in that range, use \f(CW\*(C`UNICODE_TO_NATIVE\*(C'\fR instead. 
+\&\f(CW\*(C`NATIVE_TO_LATIN1\*(C'\fR and \f(CW\*(C`NATIVE_TO_UNICODE\*(C'\fR translate the opposite +direction. +.Sp +If you need the string representation of a character that doesn't have a +mnemonic name in C, you should add it to the list in +\&\fIregen/unicode_constants.pl\fR, and have Perl create \f(CW\*(C`#define\*(C'\fR's for you, +based on the current platform. +.Sp +Note that the \f(CW\*(C`is\fR\f(CIFOO\fR\f(CW\*(C'\fR and \f(CW\*(C`to\fR\f(CIFOO\fR\f(CW\*(C'\fR macros in \fIhandy.h\fR work +properly on native code points and strings. +.Sp +Also, the range 'A' \- 'Z' in ASCII is an unbroken sequence of 26 upper +case alphabetic characters. That is not true in EBCDIC. Nor for 'a' to +\&'z'. But '0' \- '9' is an unbroken range in both systems. Don't assume +anything about other ranges. (Note that special handling of ranges in +regular expression patterns and transliterations makes it appear to Perl +code that the aforementioned ranges are all unbroken.) +.Sp +Many of the comments in the existing code ignore the possibility of +EBCDIC, and may be wrong therefore, even if the code works. This is +actually a tribute to the successful transparent insertion of being +able to handle EBCDIC without having to change pre-existing code. +.Sp +UTF\-8 and UTF-EBCDIC are two different encodings used to represent +Unicode code points as sequences of bytes. Macros with the same names +(but different definitions) in \fIutf8.h\fR and \fIutfebcdic.h\fR are used to +allow the calling code to think that there is only one such encoding. +This is almost always referred to as \f(CW\*(C`utf8\*(C'\fR, but it means the EBCDIC +version as well. Again, comments in the code may well be wrong even if +the code itself is right. For example, the concept of UTF\-8 \f(CW\*(C`invariant +characters\*(C'\fR differs between ASCII and EBCDIC. On ASCII platforms, only +characters that do not have the high-order bit set (i.e. 
whose ordinals +are strict ASCII, 0 \- 127) are invariant, and the documentation and +comments in the code may assume that, often referring to something +like, say, \f(CW\*(C`hibit\*(C'\fR. The situation differs and is not so simple on +EBCDIC machines, but as long as the code itself uses the +\&\f(CWNATIVE_IS_INVARIANT()\fR macro appropriately, it works, even if the +comments are wrong. +.Sp +As noted in "TESTING" in perlhack, when writing test scripts, the file +\&\fIt/charset_tools.pl\fR contains some helpful functions for writing tests +valid on both ASCII and EBCDIC platforms. Sometimes, though, a test +can't use a function and it's inconvenient to have different test +versions depending on the platform. There are 20 code points that are +the same in all 4 character sets currently recognized by Perl (the 3 +EBCDIC code pages plus ISO 8859\-1 (ASCII/Latin1)). These can be used in +such tests, though there is a small possibility that Perl will become +available in yet another character set, breaking your test. All but one +of these code points are C0 control characters. The most significant +controls that are the same are \f(CW\*(C`\e0\*(C'\fR, \f(CW\*(C`\er\*(C'\fR, and \f(CW\*(C`\eN{VT}\*(C'\fR (also +specifiable as \f(CW\*(C`\ecK\*(C'\fR, \f(CW\*(C`\ex0B\*(C'\fR, \f(CW\*(C`\eN{U+0B}\*(C'\fR, or \f(CW\*(C`\e013\*(C'\fR). The single +non-control is U+00B6 PILCROW SIGN. The controls that are the same have +the same bit pattern in all 4 character sets, regardless of the UTF8ness +of the string containing them. The bit pattern for U+B6 is the same in +all 4 for non\-UTF8 strings, but differs in each when its containing +string is UTF\-8 encoded. The only other code points that have some sort +of sameness across all 4 character sets are the pair 0xDC and 0xFC. 
+Together these represent upper\- and lowercase LATIN LETTER U WITH +DIAERESIS, but which is upper and which is lower may be reversed: 0xDC +is the capital in Latin1 and 0xFC is the small letter, while 0xFC is the +capital in EBCDIC and 0xDC is the small one. This factoid may be +exploited in writing case insensitive tests that are the same across all +4 character sets. +.IP \(bu 4 +Assuming the character set is just ASCII +.Sp +ASCII is a 7 bit encoding, but bytes have 8 bits in them. The 128 extra +characters have different meanings depending on the locale. Absent a +locale, currently these extra characters are generally considered to be +unassigned, and this has presented some problems. This has being +changed starting in 5.12 so that these characters can be considered to +be Latin\-1 (ISO\-8859\-1). +.IP \(bu 4 +Mixing #define and #ifdef +.Sp +.Vb 6 +\& #define BURGLE(x) ... \e +\& #ifdef BURGLE_OLD_STYLE /* BAD */ +\& ... do it the old way ... \e +\& #else +\& ... do it the new way ... \e +\& #endif +.Ve +.Sp +You cannot portably "stack" cpp directives. For example in the above +you need two separate \fBBURGLE()\fR #defines, one for each #ifdef branch. +.IP \(bu 4 +Adding non-comment stuff after #endif or #else +.Sp +.Vb 5 +\& #ifdef SNOSH +\& ... +\& #else !SNOSH /* BAD */ +\& ... +\& #endif SNOSH /* BAD */ +.Ve +.Sp +The #endif and #else cannot portably have anything non-comment after +them. If you want to document what is going (which is a good idea +especially if the branches are long), use (C) comments: +.Sp +.Vb 5 +\& #ifdef SNOSH +\& ... +\& #else /* !SNOSH */ +\& ... +\& #endif /* SNOSH */ +.Ve +.Sp +The gcc option \f(CW\*(C`\-Wendif\-labels\*(C'\fR warns about the bad variant (by +default on starting from Perl 5.9.4). +.IP \(bu 4 +Having a comma after the last element of an enum list +.Sp +.Vb 5 +\& enum color { +\& CERULEAN, +\& CHARTREUSE, +\& CINNABAR, /* BAD */ +\& }; +.Ve +.Sp +is not portable. Leave out the last comma. 
+.Sp +Also note that whether enums are implicitly morphable to ints varies +between compilers, you might need to (int). +.IP \(bu 4 +Mixing signed char pointers with unsigned char pointers +.Sp +.Vb 4 +\& int foo(char *s) { ... } +\& ... +\& unsigned char *t = ...; /* Or U8* t = ... */ +\& foo(t); /* BAD */ +.Ve +.Sp +While this is legal practice, it is certainly dubious, and downright +fatal in at least one platform: for example VMS cc considers this a +fatal error. One cause for people often making this mistake is that a +"naked char" and therefore dereferencing a "naked char pointer" have an +undefined signedness: it depends on the compiler and the flags of the +compiler and the underlying platform whether the result is signed or +unsigned. For this very same reason using a 'char' as an array index is +bad. +.IP \(bu 4 +Macros that have string constants and their arguments as substrings of +the string constants +.Sp +.Vb 2 +\& #define FOO(n) printf("number = %d\en", n) /* BAD */ +\& FOO(10); +.Ve +.Sp +Pre-ANSI semantics for that was equivalent to +.Sp +.Vb 1 +\& printf("10umber = %d\e10"); +.Ve +.Sp +which is probably not what you were expecting. Unfortunately at least +one reasonably common and modern C compiler does "real backward +compatibility" here, in AIX that is what still happens even though the +rest of the AIX compiler is very happily C89. +.IP \(bu 4 +Using printf formats for non-basic C types +.Sp +.Vb 2 +\& IV i = ...; +\& printf("i = %d\en", i); /* BAD */ +.Ve +.Sp +While this might by accident work in some platform (where IV happens to +be an \f(CW\*(C`int\*(C'\fR), in general it cannot. IV might be something larger. 
The situation is even +worse with more specific types (defined by Perl's +configuration step in \fIconfig.h\fR): +.Sp +.Vb 2 +\& Uid_t who = ...; +\& printf("who = %d\en", who); /* BAD */ +.Ve +.Sp +The problem here is that Uid_t might be not only not \f(CW\*(C`int\*(C'\fR\-wide but it +might also be unsigned, in which case large uids would be printed as +negative values. +.Sp +There is no simple solution to this because of \fBprintf()\fR's limited +intelligence, but for many types the right format is available with +either an 'f' or '_f' suffix, for example: +.Sp +.Vb 2 +\& IVdf /* IV in decimal */ +\& UVxf /* UV in hexadecimal */ +\& +\& printf("i = %"IVdf"\en", i); /* The IVdf is a string constant. */ +\& +\& Uid_t_f /* Uid_t in decimal */ +\& +\& printf("who = %"Uid_t_f"\en", who); +.Ve +.Sp +Or you can try casting to a "wide enough" type: +.Sp +.Vb 1 +\& printf("i = %"IVdf"\en", (IV)something_very_small_and_signed); +.Ve +.Sp +See "Formatted Printing of Size_t and SSize_t" in perlguts for how to +print those. +.Sp +Also remember that the \f(CW%p\fR format really does require a void pointer: +.Sp +.Vb 2 +\& U8* p = ...; +\& printf("p = %p\en", (void*)p); +.Ve +.Sp +The gcc option \f(CW\*(C`\-Wformat\*(C'\fR scans for such problems. +.IP \(bu 4 +Blindly passing va_list +.Sp +Not all platforms support passing va_list to further varargs (stdarg) +functions. The right thing to do is to copy the va_list using +\&\fBPerl_va_copy()\fR if NEED_VA_COPY is defined. +.IP \(bu 4 +Using gcc statement expressions +.Sp +.Vb 1 +\& val = ({...;...;...}); /* BAD */ +.Ve +.Sp +While a nice extension, it's not portable. Historically, Perl used +them in macros if available to gain some extra speed (essentially +as a funky form of inlining), but we now support (or emulate) C99 +\&\f(CW\*(C`static inline\*(C'\fR functions, so use them instead. Declare functions as +\&\f(CW\*(C`PERL_STATIC_INLINE\*(C'\fR to transparently fall back to emulation where needed.
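The advice on statement expressions can be sketched in plain C99, outside the Perl core. This is only an illustration under stated assumptions: the names clamp_to_byte and next_clamped are made up for the example, the gcc-only macro appears solely in a comment for contrast, and inside the core the function would be declared PERL_STATIC_INLINE rather than bare static inline.

```c
#include <assert.h>
#include <stddef.h>

/* Non-portable gcc statement expression, shown only for contrast:
 *   #define CLAMP_TO_BYTE(v) \
 *       ({ int v_ = (v); v_ < 0 ? 0 : v_ > 255 ? 255 : v_; })
 */

/* Portable replacement; clamp_to_byte is a hypothetical name. */
static inline int clamp_to_byte(int v)
{
    if (v < 0)
        return 0;
    if (v > 255)
        return 255;
    return v;
}

/* Hypothetical caller: the argument expression is evaluated exactly
 * once, which is the property the statement expression was used for. */
static int next_clamped(int *counter)
{
    assert(counter != NULL);
    return clamp_to_byte((*counter)++);
}
```

Because the argument is an ordinary function parameter, side effects such as the increment above happen once, with no need for a temporary inside a macro body.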
+.IP \(bu 4 +Binding together several statements in a macro +.Sp +Use the macros \f(CW\*(C`STMT_START\*(C'\fR and \f(CW\*(C`STMT_END\*(C'\fR. +.Sp +.Vb 3 +\& STMT_START { +\& ... +\& } STMT_END +.Ve +.Sp +But there can be subtle (but avoidable if you do it right) bugs +introduced with these; see "\f(CW\*(C`STMT_START\*(C'\fR" in perlapi for best practices +for their use. +.IP \(bu 4 +Testing for operating systems or versions when you should be testing for +features +.Sp +.Vb 3 +\& #ifdef _\|_FOONIX_\|_ /* BAD */ +\& foo = quux(); +\& #endif +.Ve +.Sp +Unless you know with 100% certainty that \fBquux()\fR is only ever available +for the "Foonix" operating system \fBand\fR that is available \fBand\fR +correctly working for \fBall\fR past, present, \fBand\fR future versions of +"Foonix", the above is very wrong. This is more correct (though still +not perfect, because the below is a compile-time check): +.Sp +.Vb 3 +\& #ifdef HAS_QUUX +\& foo = quux(); +\& #endif +.Ve +.Sp +How does the HAS_QUUX become defined where it needs to be? Well, if +Foonix happens to be Unixy enough to be able to run the Configure +script, and Configure has been taught about detecting and testing +\&\fBquux()\fR, the HAS_QUUX will be correctly defined. In other platforms, the +corresponding configuration step will hopefully do the same. +.Sp +In a pinch, if you cannot wait for Configure to be educated, or if you +have a good hunch of where \fBquux()\fR might be available, you can +temporarily try the following: +.Sp +.Vb 3 +\& #if (defined(_\|_FOONIX_\|_) || defined(_\|_BARNIX_\|_)) +\& # define HAS_QUUX +\& #endif +\& +\& ... +\& +\& #ifdef HAS_QUUX +\& foo = quux(); +\& #endif +.Ve +.Sp +But in any case, try to keep the features and operating systems +separate. 
+.Sp +A good resource on the predefined macros for various operating +systems, compilers, and so forth is +<http://sourceforge.net/p/predef/wiki/Home/> +.IP \(bu 4 +Assuming the contents of static memory pointed to by the return values +of Perl wrappers for C library functions doesn't change. Many C library +functions return pointers to static storage that can be overwritten by +subsequent calls to the same or related functions. Perl has wrappers +for some of these functions. Originally many of those wrappers returned +those volatile pointers. But over time almost all of them have evolved +to return stable copies. To cope with the remaining ones, do a +"savepv" in perlapi to make a copy, thus avoiding these problems. You +will have to free the copy when you're done to avoid memory leaks. If +you don't have control over when it gets freed, you'll need to make the +copy in a mortal scalar, like so +.Sp +.Vb 1 +\& SvPVX(sv_2mortal(newSVpv(volatile_string, 0))) +.Ve +.SS "Problematic System Interfaces" +.IX Subsection "Problematic System Interfaces" +.IP \(bu 4 +Perl strings are NOT the same as C strings: They may contain \f(CW\*(C`NUL\*(C'\fR +characters, whereas a C string is terminated by the first \f(CW\*(C`NUL\*(C'\fR. +That is why Perl API functions that deal with strings generally take a +pointer to the first byte and either a length or a pointer to the byte +just beyond the final one. +.Sp +And this is the reason that many of the C library string handling +functions should not be used. They don't cope with the full generality +of Perl strings. It may be that your test cases don't have embedded +\&\f(CW\*(C`NUL\*(C'\fRs, and so the tests pass, whereas there may well eventually arise +real-world cases where they fail. A lesson here is to include \f(CW\*(C`NUL\*(C'\fRs +in your tests. Now it's fairly rare in most real world cases to get +\&\f(CW\*(C`NUL\*(C'\fRs, so your code may seem to work, until one day a \f(CW\*(C`NUL\*(C'\fR comes +along. 
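To make the point about embedded NULs concrete, here is a small sketch in plain C. It assumes nothing from the Perl API: count_byte is a hypothetical helper that takes a pointer and an explicit length, in the same spirit as the Perl API functions described above, and relies only on the standard memchr().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count occurrences of byte c in a buffer of known
 * length, NULs included.  It uses memchr() with an explicit length and
 * never calls strlen() or strchr(). */
static size_t count_byte(const char *buf, size_t len, char c)
{
    const char *p = buf;
    const char *end = buf + len;
    size_t n = 0;

    assert(buf != NULL);
    while (p < end && (p = memchr(p, c, (size_t)(end - p))) != NULL) {
        n++;    /* found one; continue just past it */
        p++;
    }
    return n;
}
```

Given the five bytes "ab\0cb", strlen() reports 2, so any code that derives the length that way never sees the second 'b'. Passing the real length lets count_byte() see all five bytes, including the NUL itself.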
+.Sp +Here's an example. It used to be a common paradigm, for decades, in the +perl core to use \f(CW\*(C`strchr("list",\ c)\*(C'\fR to see if the character \f(CW\*(C`c\*(C'\fR is +any of the ones given in \f(CW"list"\fR, a double-quote-enclosed string of +the set of characters that we are seeing if \f(CW\*(C`c\*(C'\fR is one of. As long as +\&\f(CW\*(C`c\*(C'\fR isn't a \f(CW\*(C`NUL\*(C'\fR, it works. But when \f(CW\*(C`c\*(C'\fR is a \f(CW\*(C`NUL\*(C'\fR, \f(CW\*(C`strchr\*(C'\fR +returns a pointer to the terminating \f(CW\*(C`NUL\*(C'\fR in \f(CW"list"\fR. This likely +will result in a segfault or a security issue when the caller uses that +end pointer as the starting point to read from. +.Sp +A solution to this and many similar issues is to use the \f(CW\*(C`mem\*(C'\fR\fI\-foo\fR C +library functions instead. In this case \f(CW\*(C`memchr\*(C'\fR can be used to see if +\&\f(CW\*(C`c\*(C'\fR is in \f(CW"list"\fR and works even if \f(CW\*(C`c\*(C'\fR is \f(CW\*(C`NUL\*(C'\fR. These functions +need an additional parameter to give the string length. +In the case of literal string parameters, perl has defined macros that +calculate the length for you. See "String Handling" in perlapi. +.IP \(bu 4 +\&\fBmalloc\fR\|(0), \fBrealloc\fR\|(0), calloc(0, 0) are non-portable. To be portable +allocate at least one byte. (In general you should rarely need to work +at this low level, but instead use the various malloc wrappers.) +.IP \(bu 4 +\&\fBsnprintf()\fR \- the return type is unportable. Use \fBmy_snprintf()\fR instead. +.SS "Security problems" +.IX Subsection "Security problems" +Last but not least, here are various tips for safer coding. +See also perlclib for libc/stdio replacements one should use. +.IP \(bu 4 +Do not use \fBgets()\fR +.Sp +Or we will publicly ridicule you. Seriously. +.IP \(bu 4 +Do not use \fBtmpfile()\fR +.Sp +Use \fBmkstemp()\fR instead. 
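As a rough sketch of the mkstemp() advice, assuming a POSIX system: the helper name make_scratch_file and the /tmp/perlhack\-XXXXXX template are illustrative only and are not part of the Perl core.

```c
#define _XOPEN_SOURCE 700   /* request the POSIX declaration of mkstemp() */
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: create and open a unique scratch file.
 * 'path' must be a writable buffer of 'size' bytes.  On success it holds
 * the name mkstemp() chose and the open descriptor is returned;
 * on failure -1 is returned. */
static int make_scratch_file(char *path, size_t size)
{
    static const char tmpl[] = "/tmp/perlhack-XXXXXX";  /* illustrative template */

    assert(path != NULL);
    if (size < sizeof(tmpl))
        return -1;                      /* buffer too small for the template */
    memcpy(path, tmpl, sizeof(tmpl));   /* copies the trailing NUL too */
    return mkstemp(path);               /* rewrites the XXXXXX in place */
}
```

The caller owns both the descriptor and the file, so it should close() the descriptor and unlink() the path when it is done.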
+.IP \(bu 4 +Do not use \fBstrcpy()\fR or \fBstrcat()\fR or \fBstrncpy()\fR or \fBstrncat()\fR +.Sp +Use \fBmy_strlcpy()\fR and \fBmy_strlcat()\fR instead: they either use the native +implementation, or Perl's own implementation (borrowed from the public +domain implementation of INN). +.IP \(bu 4 +Do not use \fBsprintf()\fR or \fBvsprintf()\fR +.Sp +If you really want just plain byte strings, use \fBmy_snprintf()\fR and +\&\fBmy_vsnprintf()\fR instead, which will try to use \fBsnprintf()\fR and +\&\fBvsnprintf()\fR if those safer APIs are available. If you want something +fancier than a plain byte string, use +\&\f(CW\*(C`Perl_form\*(C'\fR() or SVs and +\&\f(CWPerl_sv_catpvf()\fR. +.Sp +Note that glibc \f(CWprintf()\fR, \f(CWsprintf()\fR, etc. are buggy before glibc +version 2.17. They won't allow a \f(CW\*(C`%.s\*(C'\fR format with a precision to +create a string that isn't valid UTF\-8 if the current underlying locale +of the program is UTF\-8. What happens is that the \f(CW%s\fR and its operand are +simply skipped without any notice. See +<https://sourceware.org/bugzilla/show_bug.cgi?id=6530>. +.IP \(bu 4 +Do not use \fBatoi()\fR +.Sp +Use \fBgrok_atoUV()\fR instead. \fBatoi()\fR has ill-defined behavior on overflows, +and cannot be used for incremental parsing. It is also affected by locale, +which is bad. +.IP \(bu 4 +Do not use \fBstrtol()\fR or \fBstrtoul()\fR +.Sp +Use \fBgrok_atoUV()\fR instead. \fBstrtol()\fR or \fBstrtoul()\fR (or their IV/UV\-friendly +macro disguises, \fBStrtol()\fR and \fBStrtoul()\fR, or \fBAtol()\fR and \fBAtoul()\fR) are +affected by locale, which is bad. +.SH DEBUGGING +.IX Header "DEBUGGING" +You can compile a special debugging version of Perl, which allows you +to use the \f(CW\*(C`\-D\*(C'\fR option of Perl to tell more about what Perl is doing.
+But sometimes there is no alternative but to dive in with a debugger, +either to see the stack trace of a core dump (very useful in a bug +report), or to figure out what went wrong before the core dump +happened, or how we ended up with wrong or unexpected results. +.SS "Poking at Perl" +.IX Subsection "Poking at Perl" +To really poke around with Perl, you'll probably want to build Perl for +debugging, like this: +.PP +.Vb 2 +\& ./Configure \-d \-DDEBUGGING +\& make +.Ve +.PP +\&\f(CW\*(C`\-DDEBUGGING\*(C'\fR turns on the C compiler's \f(CW\*(C`\-g\*(C'\fR flag to have it produce +debugging information which will allow us to step through a running +program, and to see which C function we are in (without the debugging +information we might see only the numerical addresses of the functions, +which is not very helpful). It will also turn on the \f(CW\*(C`DEBUGGING\*(C'\fR +compilation symbol which enables all the internal debugging code in Perl. +There are a whole bunch of things you can debug with this: +perlrun lists them all, and the best way to find out +about them is to play about with them. The most useful options are +probably +.PP +.Vb 5 +\& l Context (loop) stack processing +\& s Stack snapshots (with v, displays all stacks) +\& t Trace execution +\& o Method and overloading resolution +\& c String/numeric conversions +.Ve +.PP +For example +.PP +.Vb 8 +\& $ perl \-Dst \-e \*(Aq$a + 1\*(Aq +\& .... +\& (\-e:1) gvsv(main::a) +\& => UNDEF +\& (\-e:1) const(IV(1)) +\& => UNDEF IV(1) +\& (\-e:1) add +\& => NV(1) +.Ve +.PP +Some of the functionality of the debugging code can be achieved with a +non-debugging perl by using XS modules: +.PP +.Vb 2 +\& \-Dr => use re \*(Aqdebug\*(Aq +\& \-Dx => use O \*(AqDebug\*(Aq +.Ve +.SS "Using a source-level debugger" +.IX Subsection "Using a source-level debugger" +If the debugging output of \f(CW\*(C`\-D\*(C'\fR doesn't help you, it's time to step +through perl's execution with a source-level debugger.
+.IP \(bu 3 +We'll use \f(CW\*(C`gdb\*(C'\fR for our examples here; the principles will apply to +any debugger (many vendors call their debugger \f(CW\*(C`dbx\*(C'\fR), but check the +manual of the one you're using. +.PP +To fire up the debugger, type +.PP +.Vb 1 +\& gdb ./perl +.Ve +.PP +Or if you have a core dump: +.PP +.Vb 1 +\& gdb ./perl core +.Ve +.PP +You'll want to do that in your Perl source tree so the debugger can +read the source code. You should see the copyright message, followed by +the prompt. +.PP +.Vb 1 +\& (gdb) +.Ve +.PP +\&\f(CW\*(C`help\*(C'\fR will get you into the documentation, but here are the most +useful commands: +.IP \(bu 3 +run [args] +.Sp +Run the program with the given arguments. +.IP \(bu 3 +break function_name +.IP \(bu 3 +break source.c:xxx +.Sp +Tells the debugger that we'll want to pause execution when we reach +either the named function (but see "Internal Functions" in perlguts!) or +the given line in the named source file. +.IP \(bu 3 +step +.Sp +Steps through the program a line at a time. +.IP \(bu 3 +next +.Sp +Steps through the program a line at a time, without descending into +functions. +.IP \(bu 3 +continue +.Sp +Run until the next breakpoint. +.IP \(bu 3 +finish +.Sp +Run until the end of the current function, then stop again. +.IP \(bu 3 +\&'enter' +.Sp +Just pressing Enter will do the most recent operation again \- it's a +blessing when stepping through miles of source code. +.IP \(bu 3 +ptype +.Sp +Prints the C definition of the argument given. 
+.Sp +.Vb 10 +\& (gdb) ptype PL_op +\& type = struct op { +\& OP *op_next; +\& OP *op_sibparent; +\& OP *(*op_ppaddr)(void); +\& PADOFFSET op_targ; +\& unsigned int op_type : 9; +\& unsigned int op_opt : 1; +\& unsigned int op_slabbed : 1; +\& unsigned int op_savefree : 1; +\& unsigned int op_static : 1; +\& unsigned int op_folded : 1; +\& unsigned int op_spare : 2; +\& U8 op_flags; +\& U8 op_private; +\& } * +.Ve +.IP \(bu 3 +print +.Sp +Execute the given C code and print its results. \fBWARNING\fR: Perl makes +heavy use of macros, and \fIgdb\fR does not necessarily support macros +(see later "gdb macro support"). You'll have to substitute them +yourself, or to invoke cpp on the source code files (see "The .i +Targets") So, for instance, you can't say +.Sp +.Vb 1 +\& print SvPV_nolen(sv) +.Ve +.Sp +but you have to say +.Sp +.Vb 1 +\& print Perl_sv_2pv_nolen(sv) +.Ve +.PP +You may find it helpful to have a "macro dictionary", which you can +produce by saying \f(CW\*(C`cpp \-dM perl.c | sort\*(C'\fR. Even then, \fIcpp\fR won't +recursively apply those macros for you. +.SS "gdb macro support" +.IX Subsection "gdb macro support" +Recent versions of \fIgdb\fR have fairly good macro support, but in order +to use it you'll need to compile perl with macro definitions included +in the debugging information. Using \fIgcc\fR version 3.1, this means +configuring with \f(CW\*(C`\-Doptimize=\-g3\*(C'\fR. Other compilers might use a +different switch (if they support debugging macros at all). +.SS "Dumping Perl Data Structures" +.IX Subsection "Dumping Perl Data Structures" +One way to get around this macro hell is to use the dumping functions +in \fIdump.c\fR; these work a little like an internal +Devel::Peek, but they also cover OPs and other +structures that you can't get at from Perl. Let's take an example. +We'll use the \f(CW\*(C`$a = $b + $c\*(C'\fR we used before, but give it a bit of +context: \f(CW\*(C`$b = "6XXXX"; $c = 2.3;\*(C'\fR. 
Where's a good place to stop and +poke around? +.PP +What about \f(CW\*(C`pp_add\*(C'\fR, the function we examined earlier to implement the +\&\f(CW\*(C`+\*(C'\fR operator: +.PP +.Vb 2 +\& (gdb) break Perl_pp_add +\& Breakpoint 1 at 0x46249f: file pp_hot.c, line 309. +.Ve +.PP +Notice we use \f(CW\*(C`Perl_pp_add\*(C'\fR and not \f(CW\*(C`pp_add\*(C'\fR \- see +"Internal Functions" in perlguts. With the breakpoint in place, we can +run our program: +.PP +.Vb 1 +\& (gdb) run \-e \*(Aq$b = "6XXXX"; $c = 2.3; $a = $b + $c\*(Aq +.Ve +.PP +Lots of junk will go past as gdb reads in the relevant source files and +libraries, and then: +.PP +.Vb 5 +\& Breakpoint 1, Perl_pp_add () at pp_hot.c:309 +\& 1396 dSP; dATARGET; bool useleft; SV *svl, *svr; +\& (gdb) step +\& 311 dPOPTOPnnrl_ul; +\& (gdb) +.Ve +.PP +We looked at this bit of code before, and we said that +\&\f(CW\*(C`dPOPTOPnnrl_ul\*(C'\fR arranges for two \f(CW\*(C`NV\*(C'\fRs to be placed into \f(CW\*(C`left\*(C'\fR and +\&\f(CW\*(C`right\*(C'\fR \- let's slightly expand it: +.PP +.Vb 3 +\& #define dPOPTOPnnrl_ul NV right = POPn; \e +\& SV *leftsv = TOPs; \e +\& NV left = USE_LEFT(leftsv) ? SvNV(leftsv) : 0.0 +.Ve +.PP +\&\f(CW\*(C`POPn\*(C'\fR takes the SV from the top of the stack and obtains its NV +either directly (if \f(CW\*(C`SvNOK\*(C'\fR is set) or by calling the \f(CW\*(C`sv_2nv\*(C'\fR +function. \f(CW\*(C`TOPs\*(C'\fR takes the next SV from the top of the stack \- yes, +\&\f(CW\*(C`POPn\*(C'\fR uses \f(CW\*(C`TOPs\*(C'\fR \- but doesn't remove it. We then use \f(CW\*(C`SvNV\*(C'\fR to +get the NV from \f(CW\*(C`leftsv\*(C'\fR in the same way as before \- yes, \f(CW\*(C`POPn\*(C'\fR uses +\&\f(CW\*(C`SvNV\*(C'\fR. +.PP +Since we don't have an NV for \f(CW$b\fR, we'll have to use \f(CW\*(C`sv_2nv\*(C'\fR to +convert it. 
If we step again, we'll find ourselves there: +.PP +.Vb 4 +\& (gdb) step +\& Perl_sv_2nv (sv=0xa0675d0) at sv.c:1669 +\& 1669 if (!sv) +\& (gdb) +.Ve +.PP +We can now use \f(CW\*(C`Perl_sv_dump\*(C'\fR to investigate the SV: +.PP +.Vb 8 +\& (gdb) print Perl_sv_dump(sv) +\& SV = PV(0xa057cc0) at 0xa0675d0 +\& REFCNT = 1 +\& FLAGS = (POK,pPOK) +\& PV = 0xa06a510 "6XXXX"\e0 +\& CUR = 5 +\& LEN = 6 +\& $1 = void +.Ve +.PP +We know we're going to get \f(CW6\fR from this, so let's finish the +subroutine: +.PP +.Vb 4 +\& (gdb) finish +\& Run till exit from #0 Perl_sv_2nv (sv=0xa0675d0) at sv.c:1671 +\& 0x462669 in Perl_pp_add () at pp_hot.c:311 +\& 311 dPOPTOPnnrl_ul; +.Ve +.PP +We can also dump out this op: the current op is always stored in +\&\f(CW\*(C`PL_op\*(C'\fR, and we can dump it with \f(CW\*(C`Perl_op_dump\*(C'\fR. This'll give us +similar output to CPAN module B::Debug. +.PP +.Vb 10 +\& (gdb) print Perl_op_dump(PL_op) +\& { +\& 13 TYPE = add ===> 14 +\& TARG = 1 +\& FLAGS = (SCALAR,KIDS) +\& { +\& TYPE = null ===> (12) +\& (was rv2sv) +\& FLAGS = (SCALAR,KIDS) +\& { +\& 11 TYPE = gvsv ===> 12 +\& FLAGS = (SCALAR) +\& GV = main::b +\& } +\& } +.Ve +.PP +# finish this later # +.SS "Using gdb to look at specific parts of a program" +.IX Subsection "Using gdb to look at specific parts of a program" +With the example above, you knew to look for \f(CW\*(C`Perl_pp_add\*(C'\fR, but what if +there were multiple calls to it all over the place, or you didn't know what +the op was you were looking for? +.PP +One way to do this is to inject a rare call somewhere near what you're looking +for. For example, you could add \f(CW\*(C`study\*(C'\fR before your method: +.PP +.Vb 1 +\& study; +.Ve +.PP +And in gdb do: +.PP +.Vb 1 +\& (gdb) break Perl_pp_study +.Ve +.PP +And then step until you hit what you're +looking for. 
This works well in a loop +if you want to only break at certain iterations: +.PP +.Vb 3 +\& for my $c (1..100) { +\& study if $c == 50; +\& } +.Ve +.SS "Using gdb to look at what the parser/lexer are doing" +.IX Subsection "Using gdb to look at what the parser/lexer are doing" +If you want to see what perl is doing when parsing/lexing your code, you can +use \f(CW\*(C`BEGIN {}\*(C'\fR: +.PP +.Vb 3 +\& print "Before\en"; +\& BEGIN { study; } +\& print "After\en"; +.Ve +.PP +And in gdb: +.PP +.Vb 1 +\& (gdb) break Perl_pp_study +.Ve +.PP +If you want to see what the parser/lexer is doing inside of \f(CW\*(C`if\*(C'\fR blocks and +the like you need to be a little trickier: +.PP +.Vb 1 +\& if ($a && $b && do { BEGIN { study } 1 } && $c) { ... } +.Ve +.SH "SOURCE CODE STATIC ANALYSIS" +.IX Header "SOURCE CODE STATIC ANALYSIS" +Various tools exist for analysing C source code \fBstatically\fR, as +opposed to \fBdynamically\fR, that is, without executing the code. It is +possible to detect resource leaks, undefined behaviour, type +mismatches, portability problems, code paths that would cause illegal +memory accesses, and other similar problems by just parsing the C code +and looking at the resulting graph, what does it tell about the +execution and data flows. As a matter of fact, this is exactly how C +compilers know to give warnings about dubious code. +.SS lint +.IX Subsection "lint" +The good old C code quality inspector, \f(CW\*(C`lint\*(C'\fR, is available in several +platforms, but please be aware that there are several different +implementations of it by different vendors, which means that the flags +are not identical across different platforms. +.PP +There is a \f(CW\*(C`lint\*(C'\fR target in Makefile, but you may have to +diddle with the flags (see above). 
+.SS Coverity +.IX Subsection "Coverity" +Coverity (<http://www.coverity.com/>) is a product similar to lint. As +a testbed for their product they periodically check several open source +projects, and they give open source developers accounts on the +defect databases. +.PP +There is a Coverity setup for the perl5 project: +<https://scan.coverity.com/projects/perl5> +.SS "HP-UX cadvise (Code Advisor)" +.IX Subsection "HP-UX cadvise (Code Advisor)" +HP has a C/C++ static analyzer product for HP-UX called Code Advisor. +(Link not given here because the URL is horribly long and seems horribly +unstable; use the search engine of your choice to find it.) The use of +the \f(CW\*(C`cadvise_cc\*(C'\fR recipe with \f(CW\*(C`Configure ... \-Dcc=./cadvise_cc\*(C'\fR +(see cadvise "User Guide") is recommended, as is the use of \f(CW\*(C`+wall\*(C'\fR. +.SS "cpd (cut-and-paste detector)" +.IX Subsection "cpd (cut-and-paste detector)" +The cpd tool detects cut-and-paste coding. If one instance of the +cut-and-pasted code changes, all the other spots should probably be +changed, too. Therefore such code should probably be turned into a +subroutine or a macro. +.PP +cpd (<https://pmd.github.io/latest/pmd_userdocs_cpd.html>) is part of the pmd project +(<https://pmd.github.io/>). pmd was originally written for static +analysis of Java code, but later the cpd part of it was extended to +also parse C and C++. +.PP +Download the pmd\-bin\-X.Y.zip from the SourceForge site, extract the +pmd\-X.Y.jar from it, and then run that on source code thusly: +.PP +.Vb 2 +\& java \-cp pmd\-X.Y.jar net.sourceforge.pmd.cpd.CPD \e +\& \-\-minimum\-tokens 100 \-\-files /some/where/src \-\-language c > cpd.txt +.Ve +.PP +You may run into memory limits, in which case you should use the \-Xmx +option: +.PP +.Vb 1 +\& java \-Xmx512M ...
+.Ve +.SS "gcc warnings" +.IX Subsection "gcc warnings" +Though much can be written about the inconsistency and coverage +problems of gcc warnings (like \f(CW\*(C`\-Wall\*(C'\fR not meaning "all the warnings", +or some common portability problems not being covered by \f(CW\*(C`\-Wall\*(C'\fR, or +\&\f(CW\*(C`\-ansi\*(C'\fR and \f(CW\*(C`\-pedantic\*(C'\fR both being a poorly defined collection of +warnings, and so forth), gcc is still a useful tool in keeping our +coding nose clean. +.PP +\&\f(CW\*(C`\-Wall\*(C'\fR is on by default. +.PP +It would be nice for \f(CW\*(C`\-pedantic\*(C'\fR to be on always, but unfortunately it is not +safe on all platforms \- for example, it can cause fatal conflicts with the system headers +(Solaris being a prime example). If Configure \f(CW\*(C`\-Dgccansipedantic\*(C'\fR is used, +the \f(CW\*(C`cflags\*(C'\fR frontend selects \f(CW\*(C`\-pedantic\*(C'\fR for the platforms where it is known +to be safe. +.PP +The following extra flags are added: +.IP \(bu 4 +\&\f(CW\*(C`\-Wendif\-labels\*(C'\fR +.IP \(bu 4 +\&\f(CW\*(C`\-Wextra\*(C'\fR +.IP \(bu 4 +\&\f(CW\*(C`\-Wc++\-compat\*(C'\fR +.IP \(bu 4 +\&\f(CW\*(C`\-Wwrite\-strings\*(C'\fR +.IP \(bu 4 +\&\f(CW\*(C`\-Werror=pointer\-arith\*(C'\fR +.IP \(bu 4 +\&\f(CW\*(C`\-Werror=vla\*(C'\fR +.PP +The following flags would be nice to have but they would first need +their own Augean stablemaster: +.IP \(bu 4 +\&\f(CW\*(C`\-Wshadow\*(C'\fR +.IP \(bu 4 +\&\f(CW\*(C`\-Wstrict\-prototypes\*(C'\fR +.PP +The \f(CW\*(C`\-Wtraditional\*(C'\fR is another example of the annoying tendency of gcc +to bundle a lot of warnings under one switch (it would be impossible to +deploy in practice because it would complain a lot) but it does contain +some warnings that would be beneficial to have available on their own, +such as the warning about string constants inside macros containing the +macro arguments: this behaved differently pre-ANSI than it does in +ANSI, and some C compilers are still in transition, AIX being an
+example. +.SS "Warnings of other C compilers" +.IX Subsection "Warnings of other C compilers" +Other C compilers (yes, there \fBare\fR other C compilers than gcc) often +have their "strict ANSI" or "strict ANSI with some portability +extensions" modes on: for example, Sun Workshop has its \f(CW\*(C`\-Xa\*(C'\fR +mode on (though implicitly), and the DEC (these days, HP...) compiler has its +\&\f(CW\*(C`\-std1\*(C'\fR mode on. +.SH "MEMORY DEBUGGERS" +.IX Header "MEMORY DEBUGGERS" +\&\fBNOTE 1\fR: Running under older memory debuggers such as Purify, +valgrind or Third Degree greatly slows down the execution: seconds +become minutes, minutes become hours. For example as of Perl 5.8.1, the +ext/Encode/t/Unicode.t takes extraordinarily long to complete under +e.g. Purify, Third Degree, and valgrind. Under valgrind it takes more +than six hours, even on a snappy computer. The said test must be doing +something that is quite unfriendly for memory debuggers. If you don't +feel like waiting, you can simply kill the perl process. +Roughly, valgrind slows down execution by a factor of 10, AddressSanitizer by +a factor of 2. +.PP +\&\fBNOTE 2\fR: To minimize the number of memory leak false alarms (see +"PERL_DESTRUCT_LEVEL" for more information), you have to set the +environment variable PERL_DESTRUCT_LEVEL to 2. For example, like this: +.PP +.Vb 1 +\& env PERL_DESTRUCT_LEVEL=2 valgrind ./perl \-Ilib ... +.Ve +.PP +\&\fBNOTE 3\fR: There are known memory leaks when there are compile-time +errors within eval or require; seeing \f(CW\*(C`S_doeval\*(C'\fR in the call stack is +a good sign of these. Fixing these leaks is non-trivial, unfortunately, +but they must be fixed eventually. +.PP +\&\fBNOTE 4\fR: DynaLoader will not clean up after itself completely +unless Perl is built with the Configure option +\&\f(CW\*(C`\-Accflags=\-DDL_UNLOAD_ALL_AT_EXIT\*(C'\fR.
+.SS valgrind +.IX Subsection "valgrind" +The valgrind tool can be used to find both memory leaks and illegal +heap memory accesses. As of version 3.3.0, Valgrind only supports Linux +on x86, x86\-64 and PowerPC, and Darwin (OS X) on x86 and x86\-64. The +special "test.valgrind" target can be used to run the tests under +valgrind. Found errors and memory leaks are logged in files named +\&\fItestfile.valgrind\fR and by default output is displayed inline. +.PP +Example usage: +.PP +.Vb 1 +\& make test.valgrind +.Ve +.PP +Since valgrind adds significant overhead, tests will take much longer to +run. The valgrind tests support being run in parallel to help with this: +.PP +.Vb 1 +\& TEST_JOBS=9 make test.valgrind +.Ve +.PP +Note that the above two invocations will be very verbose as reachable-memory +reporting and leak checking are enabled by default. If you want to just see +pure errors, try: +.PP +.Vb 2 +\& VG_OPTS=\*(Aq\-q \-\-leak\-check=no \-\-show\-reachable=no\*(Aq TEST_JOBS=9 \e +\& make test.valgrind +.Ve +.PP +Valgrind also provides a cachegrind tool, invoked on perl as: +.PP +.Vb 1 +\& VG_OPTS=\-\-tool=cachegrind make test.valgrind +.Ve +.PP +Because system libraries (most notably glibc) also trigger errors, +valgrind lets you suppress such errors using suppression files. The +default suppression file that comes with valgrind already catches a lot +of them. Some additional suppressions are defined in \fIt/perl.supp\fR. +.PP +To get valgrind and for more information see +.PP +.Vb 1 +\& http://valgrind.org/ +.Ve +.SS AddressSanitizer +.IX Subsection "AddressSanitizer" +AddressSanitizer ("ASan") consists of a compiler instrumentation module +and a run-time \f(CW\*(C`malloc\*(C'\fR library. ASan is available for a variety of +architectures, operating systems, and compilers (see project link below).
+It checks for unsafe memory usage, such as use after free and buffer +overflow conditions, and is fast enough that you can easily compile your +debugging or optimized perl with it. Modern versions of ASan check for +memory leaks by default on most platforms, otherwise (e.g. x86_64 OS X) +this feature can be enabled via \f(CW\*(C`ASAN_OPTIONS=detect_leaks=1\*(C'\fR. +.PP +To build perl with AddressSanitizer, your Configure invocation should +look like: +.PP +.Vb 4 +\& sh Configure \-des \-Dcc=clang \e +\& \-Accflags=\-fsanitize=address \-Aldflags=\-fsanitize=address \e +\& \-Alddlflags=\-shared\e \-fsanitize=address \e +\& \-fsanitize\-blacklist=\`pwd\`/asan_ignore +.Ve +.PP +where these arguments mean: +.IP \(bu 4 +\&\-Dcc=clang +.Sp +This should be replaced by the full path to your clang executable if it +is not in your path. +.IP \(bu 4 +\&\-Accflags=\-fsanitize=address +.Sp +Compile perl and extensions sources with AddressSanitizer. +.IP \(bu 4 +\&\-Aldflags=\-fsanitize=address +.Sp +Link the perl executable with AddressSanitizer. +.IP \(bu 4 +\&\-Alddlflags=\-shared\e \-fsanitize=address +.Sp +Link dynamic extensions with AddressSanitizer. You must manually +specify \f(CW\*(C`\-shared\*(C'\fR because using \f(CW\*(C`\-Alddlflags=\-shared\*(C'\fR will prevent +Configure from setting a default value for \f(CW\*(C`lddlflags\*(C'\fR, which usually +contains \f(CW\*(C`\-shared\*(C'\fR (at least on Linux). +.IP \(bu 4 +\&\-fsanitize\-blacklist=`pwd`/asan_ignore +.Sp +AddressSanitizer will ignore functions listed in the \f(CW\*(C`asan_ignore\*(C'\fR +file. (This file should contain a short explanation of why each of +the functions is listed.) +.PP +See also +<https://github.com/google/sanitizers/wiki/AddressSanitizer>. +.SH PROFILING +.IX Header "PROFILING" +Depending on your platform there are various ways of profiling Perl. +.PP +There are two commonly used techniques of profiling executables: +\&\fIstatistical time-sampling\fR and \fIbasic-block counting\fR. 
+.PP +The first method periodically takes samples of the CPU program counter, +and since the program counter can be correlated with the code generated +for functions, we get a statistical view of which functions the +program is spending its time in. The caveats are that very small/fast +functions have a lower probability of showing up in the profile, and that +periodically interrupting the program (this is usually done rather +frequently, on the scale of milliseconds) imposes an additional +overhead that may skew the results. The first problem can be alleviated +by running the code for longer (in general this is a good idea for +profiling); the second problem is usually kept in check by the +profiling tools themselves. +.PP +The second method divides up the generated code into \fIbasic blocks\fR. +Basic blocks are sections of code that are entered only at the +beginning and exited only at the end. For example, a conditional jump +starts a basic block. Basic block profiling usually works by +\&\fIinstrumenting\fR the code by adding \fIenter basic block #nnnn\fR +book-keeping code to the generated code. During the execution of the +code the basic block counters are then updated appropriately. The +caveat is that the added extra code can skew the results: again, the +profiling tools usually try to factor their own effects out of the +results. +.SS "Gprof Profiling" +.IX Subsection "Gprof Profiling" +\&\fIgprof\fR is a profiling tool available on many Unix platforms which +uses \fIstatistical time-sampling\fR. You can build a profiled version of +\&\fIperl\fR by compiling using gcc with the flag \f(CW\*(C`\-pg\*(C'\fR. Either edit +\&\fIconfig.sh\fR or re-run \fIConfigure\fR. Running the profiled version of +Perl will create an output file called \fIgmon.out\fR which contains the +profiling data collected during the execution.
+.PP +Quick hint: +.PP +.Vb 6 +\& $ sh Configure \-des \-Dusedevel \-Accflags=\*(Aq\-pg\*(Aq \e +\& \-Aldflags=\*(Aq\-pg\*(Aq \-Alddlflags=\*(Aq\-pg \-shared\*(Aq \e +\& && make perl +\& $ ./perl ... # creates gmon.out in current directory +\& $ gprof ./perl > out +\& $ less out +.Ve +.PP +(you probably need to add \f(CW\*(C`\-shared\*(C'\fR to the \f(CW\*(C`\-Alddlflags\*(C'\fR line until RT +#118199 is resolved) +.PP +The \fIgprof\fR tool can then display the collected data in various ways. +Usually \fIgprof\fR understands the following options: +.IP \(bu 4 +\&\-a +.Sp +Suppress statically defined functions from the profile. +.IP \(bu 4 +\&\-b +.Sp +Suppress the verbose descriptions in the profile. +.IP \(bu 4 +\&\-e routine +.Sp +Exclude the given routine and its descendants from the profile. +.IP \(bu 4 +\&\-f routine +.Sp +Display only the given routine and its descendants in the profile. +.IP \(bu 4 +\&\-s +.Sp +Generate a summary file called \fIgmon.sum\fR which then may be given to +subsequent gprof runs to accumulate data over several runs. +.IP \(bu 4 +\&\-z +.Sp +Display routines that have zero usage. +.PP +For a more detailed explanation of the available commands and output +formats, see your local documentation for \fIgprof\fR. +.SS "GCC gcov Profiling" +.IX Subsection "GCC gcov Profiling" +\&\fIBasic block profiling\fR is officially available in gcc 3.0 and later. +You can build a profiled version of \fIperl\fR by compiling using gcc with +the flags \f(CW\*(C`\-fprofile\-arcs \-ftest\-coverage\*(C'\fR. Either edit \fIconfig.sh\fR +or re-run \fIConfigure\fR. +.PP +Quick hint: +.PP +.Vb 9 +\& $ sh Configure \-des \-Dusedevel \-Doptimize=\*(Aq\-g\*(Aq \e +\& \-Accflags=\*(Aq\-fprofile\-arcs \-ftest\-coverage\*(Aq \e +\& \-Aldflags=\*(Aq\-fprofile\-arcs \-ftest\-coverage\*(Aq \e +\& \-Alddlflags=\*(Aq\-fprofile\-arcs \-ftest\-coverage \-shared\*(Aq \e +\& && make perl +\& $ rm \-f regexec.c.gcov regexec.gcda +\& $ ./perl ...
+\& $ gcov regexec.c +\& $ less regexec.c.gcov +.Ve +.PP +(you probably need to add \f(CW\*(C`\-shared\*(C'\fR to the \f(CW\*(C`\-Alddlflags\*(C'\fR line until RT +#118199 is resolved) +.PP +Running the profiled version of Perl will cause profile output to be +generated. For each source file an accompanying \fI.gcda\fR file will be +created. +.PP +To display the results you use the \fIgcov\fR utility (which should be +installed if you have gcc 3.0 or newer installed). \fIgcov\fR is run on +source code files, like this: +.PP +.Vb 1 +\& gcov sv.c +.Ve +.PP +which will cause \fIsv.c.gcov\fR to be created. The \fI.gcov\fR files contain +the source code annotated with relative frequencies of execution +indicated by "#" markers. If you want to generate \fI.gcov\fR files for +all profiled object files, you can run something like this: +.PP +.Vb 3 +\& for file in \`find . \-name \e*.gcno\` +\& do sh \-c "cd \`dirname $file\` && gcov \`basename $file .gcno\`" +\& done +.Ve +.PP +Useful options of \fIgcov\fR include \f(CW\*(C`\-b\*(C'\fR, which will summarise the basic +block, branch, and function call coverage, and \f(CW\*(C`\-c\*(C'\fR, which will use the +actual counts instead of relative frequencies. For more information +on the use of \fIgcov\fR and basic block profiling with gcc, see the +latest GNU CC manual. As of gcc 4.8, this is at +<http://gcc.gnu.org/onlinedocs/gcc/Gcov\-Intro.html#Gcov\-Intro> +.SS "callgrind profiling" +.IX Subsection "callgrind profiling" +callgrind is a valgrind tool for profiling source code. Paired +with kcachegrind (a Qt\-based UI), it gives you an overview of +where code is taking up time, as well as the ability +to examine callers, call trees, and more. One of its benefits +is that you can use it on perl and XS modules that have not been +compiled with debugging symbols. +.PP +If perl is compiled with debugging symbols (\f(CW\*(C`\-g\*(C'\fR), you can view +the annotated source and click around, much like Devel::NYTProf's +HTML output.
+.PP +For basic usage: +.PP +.Vb 1 +\& valgrind \-\-tool=callgrind ./perl ... +.Ve +.PP +By default it will write output to \fIcallgrind.out.PID\fR, but you +can change that with \f(CW\*(C`\-\-callgrind\-out\-file=...\*(C'\fR +.PP +To view the data, do: +.PP +.Vb 1 +\& kcachegrind callgrind.out.PID +.Ve +.PP +If you'd prefer to view the data in a terminal, you can use +\&\fIcallgrind_annotate\fR. In its basic form: +.PP +.Vb 1 +\& callgrind_annotate callgrind.out.PID | less +.Ve +.PP +Some useful options are: +.IP \(bu 4 +\&\-\-threshold +.Sp +Percentage of counts (of primary sort event) we are interested in. +The default is 99%; 100% might show things that seem to be missing. +.IP \(bu 4 +\&\-\-auto +.Sp +Annotate all source files containing functions that helped reach +the event count threshold. +.SH "MISCELLANEOUS TRICKS" +.IX Header "MISCELLANEOUS TRICKS" +.SS PERL_DESTRUCT_LEVEL +.IX Subsection "PERL_DESTRUCT_LEVEL" +If you want to run any of the tests yourself manually using e.g. +valgrind, please note that by default perl \fBdoes not\fR explicitly +clean up all the memory it has allocated (such as global memory arenas) +but instead lets the \fBexit()\fR of the whole program "take care" of such +allocations, also known as "global destruction of objects". +.PP +There is a way to tell perl to do complete cleanup: set the environment +variable PERL_DESTRUCT_LEVEL to a non-zero value. The t/TEST wrapper +sets this to 2, and this is what you need to do too if you don't +want to see the "global leaks". For example, to run under valgrind: +.PP +.Vb 1 +\& env PERL_DESTRUCT_LEVEL=2 valgrind ./perl \-Ilib t/foo/bar.t +.Ve +.PP +(Note: the mod_perl apache module also uses this environment variable +for its own purposes and extends its semantics. Refer to the mod_perl +documentation for more information. Also, spawned threads do the +equivalent of setting this variable to the value 1.)
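To see why a leak checker reports "global leaks" unless full cleanup is requested, here is a minimal sketch in C of the general pattern. It is not perl's implementation: the names demo_arena, destruct_level(), demo_init() and demo_shutdown() are hypothetical, and the environment variable handling is simplified to a plain integer parse.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a global memory arena. */
static char *demo_arena;

/* Turn an environment value into a cleanup level; unset means 0. */
static int destruct_level(const char *value)
{
    return value ? atoi(value) : 0;
}

/* Allocate the arena once at start-up. Returns 1 on success. */
static int demo_init(void)
{
    assert(demo_arena == NULL);   /* must not be initialised twice */
    demo_arena = malloc(4096);
    return demo_arena != NULL;
}

/* Free the arena only when a non-zero level asks for it.
 * Returns 1 if the arena was freed, 0 if it was left for exit(). */
static int demo_shutdown(int level)
{
    if (level <= 0)
        return 0;   /* a leak checker would still see the arena at exit */
    free(demo_arena);
    demo_arena = NULL;
    return 1;
}
```

A program using this sketch would call demo_shutdown(destruct_level(getenv("DEMO_DESTRUCT_LEVEL"))) just before exiting; with the (hypothetical) variable unset, a tool such as valgrind would be expected to list the 4096-byte arena as memory still allocated at exit.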
+.PP +If, at the end of a run, you get the message \fIN scalars leaked\fR, you +can recompile with \f(CW\*(C`\-DDEBUG_LEAKING_SCALARS\*(C'\fR, +(\f(CW\*(C`Configure \-Accflags=\-DDEBUG_LEAKING_SCALARS\*(C'\fR), which will cause the +addresses of all those leaked SVs to be dumped along with details as to +where each SV was originally allocated. This information is also +displayed by Devel::Peek. Note that the extra details recorded with +each SV increase memory usage, so it shouldn't be used in production +environments. It also converts \f(CWnew_SV()\fR from a macro into a real +function, so you can use your favourite debugger to discover where +those pesky SVs were allocated. +.PP +If you see that you're leaking memory at runtime, but neither valgrind +nor \f(CW\*(C`\-DDEBUG_LEAKING_SCALARS\*(C'\fR will find anything, you're probably +leaking SVs that are still reachable and will be properly cleaned up +during destruction of the interpreter. In such cases, using the \f(CW\*(C`\-Dm\*(C'\fR +switch can point you to the source of the leak. If the executable was +built with \f(CW\*(C`\-DDEBUG_LEAKING_SCALARS\*(C'\fR, \f(CW\*(C`\-Dm\*(C'\fR will output SV +allocations in addition to memory allocations. Each SV allocation has a +distinct serial number that will be written on creation and destruction +of the SV. So if you're executing the leaking code in a loop, you need +to look for SVs that are created, but never destroyed between each +cycle. If such an SV is found, set a conditional breakpoint within +\&\f(CWnew_SV()\fR and make it break only when \f(CW\*(C`PL_sv_serial\*(C'\fR is equal to the +serial number of the leaking SV. Then you will catch the interpreter in +exactly the state where the leaking SV is allocated, which is +sufficient in many cases to find the source of the leak. +.PP +As \f(CW\*(C`\-Dm\*(C'\fR is using the PerlIO layer for output, it will by itself +allocate quite a few SVs, which are hidden to avoid recursion.
You +can bypass the PerlIO layer if you use the SV logging provided by +\&\f(CW\*(C`\-DPERL_MEM_LOG\*(C'\fR instead. +.SS PERL_MEM_LOG +.IX Subsection "PERL_MEM_LOG" +If compiled with \f(CW\*(C`\-DPERL_MEM_LOG\*(C'\fR (\f(CW\*(C`\-Accflags=\-DPERL_MEM_LOG\*(C'\fR), both +memory and SV allocations go through logging functions, which is +handy for breakpoint setting. +.PP +Unless \f(CW\*(C`\-DPERL_MEM_LOG_NOIMPL\*(C'\fR (\f(CW\*(C`\-Accflags=\-DPERL_MEM_LOG_NOIMPL\*(C'\fR) is +also defined, the logging functions read \f(CW$ENV\fR{PERL_MEM_LOG} to +determine whether to log the event, and if so how: +.PP +.Vb 6 +\& $ENV{PERL_MEM_LOG} =~ /m/ Log all memory ops +\& $ENV{PERL_MEM_LOG} =~ /s/ Log all SV ops +\& $ENV{PERL_MEM_LOG} =~ /c/ Additionally log C backtrace for +\& new_SV events +\& $ENV{PERL_MEM_LOG} =~ /t/ Include timestamp in log +\& $ENV{PERL_MEM_LOG} =~ /^(\ed+)/ Write to FD given (default is 2) +.Ve +.PP +Memory logging is somewhat similar to \f(CW\*(C`\-Dm\*(C'\fR but is independent of +\&\f(CW\*(C`\-DDEBUGGING\*(C'\fR, and at a higher level; all uses of \fBNewx()\fR, \fBRenew()\fR, and +\&\fBSafefree()\fR are logged with the caller's source code file and line +number (and C function name, if supported by the C compiler). In +contrast, \f(CW\*(C`\-Dm\*(C'\fR is directly at the point of \f(CWmalloc()\fR. SV logging is +similar. +.PP +Since the logging doesn't use PerlIO, all SV allocations are logged and +no extra SV allocations are introduced by enabling the logging. If +compiled with \f(CW\*(C`\-DDEBUG_LEAKING_SCALARS\*(C'\fR, the serial number for each SV +allocation is also logged. +.PP +The \f(CW\*(C`c\*(C'\fR option uses the \f(CW\*(C`Perl_c_backtrace\*(C'\fR facility, and therefore +additionally requires the Configure \f(CW\*(C`\-Dusecbacktrace\*(C'\fR compile flag in +order to access it.
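The table of PERL_MEM_LOG patterns above can be read as a small decoding routine. The following C sketch shows one way such a value might be decoded; it is an illustration only, not the code perl uses, and struct mem_log_opts and parse_mem_log() are hypothetical names. Treating an unset variable as "log nothing" is an assumption of this sketch.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical settings record; not a structure from the perl source. */
struct mem_log_opts {
    int log_mem;     /* value contains 'm': log memory ops            */
    int log_sv;      /* value contains 's': log SV ops                */
    int backtrace;   /* value contains 'c': C backtrace for new_SV    */
    int timestamp;   /* value contains 't': include a timestamp       */
    int fd;          /* leading digits: descriptor to write to        */
};

/* Decode a PERL_MEM_LOG-style string according to the table above.
 * A NULL value (variable unset) leaves every flag off and fd at 2. */
static void parse_mem_log(const char *value, struct mem_log_opts *out)
{
    assert(out != NULL);
    memset(out, 0, sizeof *out);
    out->fd = 2;                       /* default is stderr */
    if (value == NULL)
        return;
    if (isdigit((unsigned char)value[0]))
        out->fd = atoi(value);         /* atoi stops at the first non-digit */
    out->log_mem   = strchr(value, 'm') != NULL;
    out->log_sv    = strchr(value, 's') != NULL;
    out->backtrace = strchr(value, 'c') != NULL;
    out->timestamp = strchr(value, 't') != NULL;
}
```

Under this reading, a value such as 3ms selects memory and SV logging on descriptor 3, so a perl built with \-DPERL_MEM_LOG would presumably be run with that descriptor redirected to a file by the shell.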
+.SS "DDD over gdb" +.IX Subsection "DDD over gdb" +Those debugging perl with the DDD frontend over gdb may find the +following useful: +.PP +You can extend the data conversion shortcuts menu, so for example you +can display an SV's IV value with one click, without doing any typing. +To do that, simply edit the ~/.ddd/init file and add, after: +.PP +.Vb 6 +\& ! Display shortcuts. +\& Ddd*gdbDisplayShortcuts: \e +\& /t () // Convert to Bin\en\e +\& /d () // Convert to Dec\en\e +\& /x () // Convert to Hex\en\e +\& /o () // Convert to Oct\en\e +.Ve +.PP +the following two lines: +.PP +.Vb 2 +\& ((XPV*) (())\->sv_any )\->xpv_pv // 2pvx\en\e +\& ((XPVIV*) (())\->sv_any )\->xiv_iv // 2ivx +.Ve +.PP +so now you can do ivx and pvx lookups, or you can plug in the sv_peek +"conversion": +.PP +.Vb 1 +\& Perl_sv_peek(my_perl, (SV*)()) // sv_peek +.Ve +.PP +(The my_perl is for threaded builds.) Just remember that every line +except the last one should end with \en\e +.PP +Alternatively edit the init file interactively via: 3rd mouse button \-> +New Display \-> Edit Menu +.PP +Note: you can define up to 20 conversion shortcuts in the gdb section. +.SS "C backtrace" +.IX Subsection "C backtrace" +On some platforms Perl supports retrieving the C level backtrace +(similar to what symbolic debuggers like gdb do). +.PP +The backtrace returns the stack trace of the C call frames, +with the symbol names (function names), the object names (like "perl"), +and if it can, also the source code locations (file:line). +.PP +The supported platforms are Linux and OS X (some *BSD might +work at least partly, but they have not yet been tested). +.PP +This feature hasn't been tested with multiple threads, but it will +only show the backtrace of the thread doing the backtracing. +.PP +The feature needs to be enabled with \f(CW\*(C`Configure \-Dusecbacktrace\*(C'\fR.
+.PP +The \f(CW\*(C`\-Dusecbacktrace\*(C'\fR option also enables keeping the debug information when +compiling/linking (often: \f(CW\*(C`\-g\*(C'\fR). Many compilers/linkers support +combining optimization with keeping the debug information. The debug +information is needed for the symbol names and the source locations. +.PP +Static functions might not be visible for the backtrace. +.PP +Source code locations, even if available, can often be missing or +misleading if the compiler has e.g. inlined code. The optimizer can +make matching the source code and the object code quite challenging. +.IP Linux 4 +.IX Item "Linux" +You \fBmust\fR have the BFD (\-lbfd) library installed, otherwise \f(CW\*(C`perl\*(C'\fR will +fail to link. The BFD is usually distributed as part of the GNU binutils. +.Sp +Summary: \f(CW\*(C`Configure ... \-Dusecbacktrace\*(C'\fR +and you need \f(CW\*(C`\-lbfd\*(C'\fR. +.IP "OS X" 4 +.IX Item "OS X" +The source code locations are supported \fBonly\fR if you have +the Developer Tools installed. (BFD is \fBnot\fR needed.) +.Sp +Summary: \f(CW\*(C`Configure ... \-Dusecbacktrace\*(C'\fR +and installing the Developer Tools would be good. +.PP +Optionally, for trying out the feature, you may want to enable +automatic dumping of the backtrace just before a warning or croak (die) +message is emitted, by adding \f(CW\*(C`\-Accflags=\-DUSE_C_BACKTRACE_ON_ERROR\*(C'\fR +for Configure. +.PP +Unless the above additional feature is enabled, nothing about the +backtrace functionality is visible, except for the Perl/XS level. +.PP +Furthermore, even if you have enabled this feature to be compiled, +you need to enable it at runtime with an environment variable: +\&\f(CW\*(C`PERL_C_BACKTRACE_ON_ERROR=10\*(C'\fR. It must be an integer greater +than zero, giving the desired frame count.
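For readers unfamiliar with C level backtraces, the following sketch shows the general idea of dumping a limited number of frames, with the limit taken from an integer greater than zero. It is not perl's implementation (which also resolves source locations): it relies on backtrace() and backtrace_symbols_fd() from <execinfo.h>, which are provided by glibc and OS X but not by every libc, and parse_frame_count(), dump_frames() and DEMO_MAX_FRAMES are hypothetical names. Rejecting non-numeric values is an assumption of this sketch.

```c
#include <assert.h>
#include <execinfo.h>   /* backtrace(), backtrace_symbols_fd() */
#include <stdlib.h>
#include <unistd.h>     /* STDERR_FILENO */

#define DEMO_MAX_FRAMES 64   /* arbitrary cap chosen for this sketch */

/* Turn an environment value into a frame count.
 * Returns 0 (disabled) unless the value is an integer greater than zero;
 * larger values are clamped to DEMO_MAX_FRAMES. */
static int parse_frame_count(const char *value)
{
    char *end;
    long n;

    if (value == NULL || *value == '\0')
        return 0;
    n = strtol(value, &end, 10);
    if (*end != '\0' || n <= 0)
        return 0;
    return n > DEMO_MAX_FRAMES ? DEMO_MAX_FRAMES : (int)n;
}

/* Write up to `wanted` frames of the calling thread's C stack to stderr.
 * Returns the number of frames actually captured. */
static int dump_frames(int wanted)
{
    void *frames[DEMO_MAX_FRAMES];
    int got;

    assert(wanted <= DEMO_MAX_FRAMES);
    if (wanted <= 0)
        return 0;
    got = backtrace(frames, wanted);
    backtrace_symbols_fd(frames, got, STDERR_FILENO);
    return got;
}
```

A caller might write dump_frames(parse_frame_count(getenv("DEMO_BACKTRACE_FRAMES"))) just before emitting an error message. As noted above for perl itself, static functions may not show up by name in such output.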
+.PP +Retrieving the backtrace from the Perl level (using for example an XS +extension) would be much less exciting than one would hope: normally +you would see \f(CW\*(C`runops\*(C'\fR, \f(CW\*(C`entersub\*(C'\fR, and not much else. This API is +intended to be called \fBfrom within\fR the Perl implementation, not from +Perl level execution. +.PP +The C API for the backtrace is as follows: +.IP get_c_backtrace 4 +.IX Item "get_c_backtrace" +.PD 0 +.IP free_c_backtrace 4 +.IX Item "free_c_backtrace" +.IP get_c_backtrace_dump 4 +.IX Item "get_c_backtrace_dump" +.IP dump_c_backtrace 4 +.IX Item "dump_c_backtrace" +.PD +.SS Poison +.IX Subsection "Poison" +If you see in a debugger a memory area mysteriously full of 0xABABABAB +or 0xEFEFEFEF, you may be seeing the effect of the \fBPoison()\fR macros, see +perlclib. +.SS "Read-only optrees" +.IX Subsection "Read-only optrees" +Under ithreads the optree is read-only. If you want to enforce this, to +check for write accesses from buggy code, compile with +\&\f(CW\*(C`\-Accflags=\-DPERL_DEBUG_READONLY_OPS\*(C'\fR +to enable code that allocates op memory +via \f(CW\*(C`mmap\*(C'\fR, and sets it read-only when it is attached to a subroutine. +Any write access to an op results in a \f(CW\*(C`SIGBUS\*(C'\fR and abort. +.PP +This code is intended for development only, and may not be portable +even to all Unix variants. Also, it is an 80% solution, in that it +isn't able to make all ops read-only. Specifically it does not apply to +op slabs belonging to \f(CW\*(C`BEGIN\*(C'\fR blocks. +.PP +However, as an 80% solution it is still effective, as it has caught +bugs in the past. +.SS "When is a bool not a bool?" +.IX Subsection "When is a bool not a bool?" +There wasn't necessarily a standard \f(CW\*(C`bool\*(C'\fR type on compilers prior to +C99, and so some workarounds were created. The \f(CW\*(C`TRUE\*(C'\fR and \f(CW\*(C`FALSE\*(C'\fR +macros are still available as alternatives for \f(CW\*(C`true\*(C'\fR and \f(CW\*(C`false\*(C'\fR.
+And the \f(CW\*(C`cBOOL\*(C'\fR macro was created to correctly cast to a true/false +value in all circumstances, but should no longer be necessary. +Using \f(CW\*(C`(bool)\*(C'\fR\ \fIexpr\fR should now always work. +.PP +There are no plans to remove any of \f(CW\*(C`TRUE\*(C'\fR, \f(CW\*(C`FALSE\*(C'\fR, or \f(CW\*(C`cBOOL\*(C'\fR. +.SS "Finding unsafe truncations" +.IX Subsection "Finding unsafe truncations" +You may wish to run \f(CW\*(C`Configure\*(C'\fR with something like +.PP +.Vb 1 +\& \-Accflags=\*(Aq\-Wconversion \-Wno\-sign\-conversion \-Wno\-shorten\-64\-to\-32\*(Aq +.Ve +.PP +or your compiler's equivalent to make it easier to spot any unsafe truncations +that show up. +.SS "The .i Targets" +.IX Subsection "The .i Targets" +You can expand the macros in a \fIfoo.c\fR file by saying +.PP +.Vb 1 +\& make foo.i +.Ve +.PP +which will expand the macros using cpp. Don't be scared by the +results. +.SH AUTHOR +.IX Header "AUTHOR" +This document was originally written by Nathan Torkington, and is +maintained by the perl5\-porters mailing list.