Diffstat (limited to 'upstream/debian-unstable/man1/perlhacktips.1')
-rw-r--r-- upstream/debian-unstable/man1/perlhacktips.1 2067
1 files changed, 2067 insertions, 0 deletions
diff --git a/upstream/debian-unstable/man1/perlhacktips.1 b/upstream/debian-unstable/man1/perlhacktips.1
new file mode 100644
index 00000000..a2011627
--- /dev/null
+++ b/upstream/debian-unstable/man1/perlhacktips.1
@@ -0,0 +1,2067 @@
+.\" -*- mode: troff; coding: utf-8 -*-
+.\" Automatically generated by Pod::Man 5.01 (Pod::Simple 3.43)
+.\"
+.\" Standard preamble:
+.\" ========================================================================
+.de Sp \" Vertical space (when we can't use .PP)
+.if t .sp .5v
+.if n .sp
+..
+.de Vb \" Begin verbatim text
+.ft CW
+.nf
+.ne \\$1
+..
+.de Ve \" End verbatim text
+.ft R
+.fi
+..
+.\" \*(C` and \*(C' are quotes in nroff, nothing in troff, for use with C<>.
+.ie n \{\
+. ds C` ""
+. ds C' ""
+'br\}
+.el\{\
+. ds C`
+. ds C'
+'br\}
+.\"
+.\" Escape single quotes in literal strings from groff's Unicode transform.
+.ie \n(.g .ds Aq \(aq
+.el .ds Aq '
+.\"
+.\" If the F register is >0, we'll generate index entries on stderr for
+.\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index
+.\" entries marked with X<> in POD. Of course, you'll have to process the
+.\" output yourself in some meaningful fashion.
+.\"
+.\" Avoid warning from groff about undefined register 'F'.
+.de IX
+..
+.nr rF 0
+.if \n(.g .if rF .nr rF 1
+.if (\n(rF:(\n(.g==0)) \{\
+. if \nF \{\
+. de IX
+. tm Index:\\$1\t\\n%\t"\\$2"
+..
+. if !\nF==2 \{\
+. nr % 0
+. nr F 2
+. \}
+. \}
+.\}
+.rr rF
+.\" ========================================================================
+.\"
+.IX Title "PERLHACKTIPS 1"
+.TH PERLHACKTIPS 1 2024-01-12 "perl v5.38.2" "Perl Programmers Reference Guide"
+.\" For nroff, turn off justification. Always turn off hyphenation; it makes
+.\" way too many mistakes in technical documents.
+.if n .ad l
+.nh
+.SH NAME
+perlhacktips \- Tips for Perl core C code hacking
+.SH DESCRIPTION
+.IX Header "DESCRIPTION"
+This document will help you learn the best way to go about hacking on
+the Perl core C code. It covers common problems, debugging, profiling,
+and more.
+.PP
+If you haven't read perlhack and perlhacktut yet, you might want
+to do that first.
+.SH "COMMON PROBLEMS"
+.IX Header "COMMON PROBLEMS"
+Perl source now permits some specific C99 features which we know are
+supported by all platforms, but mostly plays by ANSI C89 rules.
+You don't care about some particular platform having broken Perl? I
+hear there is still a strong demand for J2EE programmers.
+.SS "Perl environment problems"
+.IX Subsection "Perl environment problems"
+.IP \(bu 4
+Not compiling with threading
+.Sp
+Compiling with threading (\-Duseithreads) completely rewrites the
+function prototypes of Perl. You had better try your changes with that.
+Related to this is the difference between "Perl_\-less" and "Perl_\-ly"
+APIs, for example:
+.Sp
+.Vb 2
+\& Perl_sv_setiv(aTHX_ ...);
+\& sv_setiv(...);
+.Ve
+.Sp
+The first one explicitly passes in the context, which is needed for
+e.g. threaded builds. The second one does that implicitly; do not mix
+them up. If you are not passing in an aTHX_, you will need to do a
+dTHX as the first thing in the function.
+.Sp
+See "How multiple interpreters and concurrency are
+supported" in perlguts for further discussion about context.
+.IP \(bu 4
+Not compiling with \-DDEBUGGING
+.Sp
+The DEBUGGING define exposes more code to the compiler, therefore more
+ways for things to go wrong. You should try it.
+.IP \(bu 4
+Introducing (non-read-only) globals
+.Sp
+Do not introduce any modifiable globals, truly global or file static.
+They are bad form and complicate multithreading and other forms of
+concurrency. The right way is to introduce them as new interpreter
+variables, see \fIintrpvar.h\fR (at the very end for binary
+compatibility).
+.Sp
+Introducing read-only (const) globals is okay, as long as you verify
+with e.g. \f(CW\*(C`nm libperl.a|egrep \-v \*(Aq [TURtr] \*(Aq\*(C'\fR (if your \f(CW\*(C`nm\*(C'\fR has
+BSD-style output) that the data you added really is read-only. (If it
+is, it shouldn't show up in the output of that command.)
+.Sp
+If you want to have static strings, make them constant:
+.Sp
+.Vb 1
+\& static const char etc[] = "...";
+.Ve
+.Sp
+If you want to have arrays of constant strings, note carefully the
+right combination of \f(CW\*(C`const\*(C'\fRs:
+.Sp
+.Vb 2
+\& static const char * const yippee[] =
+\& {"hi", "ho", "silver"};
+.Ve
+.IP \(bu 4
+Not exporting your new function
+.Sp
+Some platforms (Win32, AIX, VMS, OS/2, to name a few) require any
+function that is part of the public API (the shared Perl library) to be
+explicitly marked as exported. See the discussion about \fIembed.pl\fR in
+perlguts.
+.IP \(bu 4
+Exporting your new function
+.Sp
+The new shiny result of either genuine new functionality or your
+arduous refactoring is now ready and correctly exported. So what could
+possibly go wrong?
+.Sp
+Maybe simply that your function did not need to be exported in the
+first place. Perl has a long and not so glorious history of exporting
+functions that it should not have.
+.Sp
+If the function is used only inside one source code file, make it
+static. See the discussion about \fIembed.pl\fR in perlguts.
+.Sp
+If the function is used across several files, but intended only for
+Perl's internal use (and this should be the common case), do not export
+it to the public API. See the discussion about \fIembed.pl\fR in
+perlguts.
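+.PP
+A toy sketch in plain C can make the "Perl_\-ly" versus "Perl_\-less" distinction more concrete.
+Every name in it is hypothetical, including \f(CW\*(C`pCTX_\*(C'\fR, \f(CW\*(C`aCTX_\*(C'\fR,
+\&\f(CW\*(C`dCTX\*(C'\fR and \f(CW\*(C`Toy_set_iv\*(C'\fR.
+Perl's real \f(CW\*(C`pTHX_\*(C'\fR/\f(CW\*(C`aTHX_\*(C'\fR/\f(CW\*(C`dTHX\*(C'\fR machinery is more involved,
+and in non-threaded builds it expands to little or nothing.
+The sketch assumes a threaded-style build, in which the context is always a real pointer.

```c
/* Toy model of an explicit/implicit context convention.
 * All names are hypothetical stand-ins, not perl's real macros. */
struct interp {
    long iv_slot;                 /* some per-interpreter state */
};

/* Stand-in for looking up the current thread's interpreter. */
static struct interp the_interp;
static struct interp *get_context(void) { return &the_interp; }

#define pCTX_ struct interp *my_ctx,                 /* formal parameter  */
#define aCTX_ my_ctx,                                /* actual argument   */
#define dCTX  struct interp *my_ctx = get_context()  /* declare and fetch */

/* Explicit form: the context is the first argument. */
static void Toy_set_iv(pCTX_ long v)
{
    my_ctx->iv_slot = v;
}

/* Implicit form: quietly passes along whatever my_ctx is in scope,
 * so it only compiles where such a variable exists. */
#define set_iv(v) Toy_set_iv(aCTX_ (v))

/* A function that was not handed a context must fetch one first. */
static long set_and_read(long v)
{
    dCTX;
    set_iv(v);
    return my_ctx->iv_slot;
}
```
+.PP
+Calling \f(CW\*(C`set_iv\*(C'\fR in a function that lacks both a \f(CW\*(C`pCTX_\*(C'\fR parameter and a \f(CW\*(C`dCTX\*(C'\fR declaration fails to compile.
+That failure mode is the one the real macros produce.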
+.SS C99
+.IX Subsection "C99"
+Starting from 5.35.5 we now permit some C99 features in the core C source.
+However, code in dual life extensions still needs to be C89 only, because it
+needs to compile against earlier versions of Perl running on older platforms.
+Also note that our headers need to also be valid as C++, because XS extensions
+written in C++ need to include them, hence \fImember structure initialisers\fR
+can't be used in headers.
+.PP
+C99 support is still far from complete on all platforms we currently support.
+As a baseline we can only assume C89 semantics with the specific C99 features
+described below, which we've verified work everywhere. It's fine to probe for
+additional C99 features and use them where available, provided there is also a
+fallback for compilers that don't support the feature. For example, we use C11
+thread local storage when available, but fall back to POSIX thread specific
+APIs otherwise, and we use \f(CW\*(C`char\*(C'\fR for booleans if \f(CW\*(C`<stdbool.h>\*(C'\fR isn't
+available.
+.PP
+Code can use (and rely on) the following C99 features being present:
+.IP \(bu 4
+mixed declarations and code
+.IP \(bu 4
+64 bit integer types
+.Sp
+For consistency with the existing source code, use the typedefs \f(CW\*(C`I64\*(C'\fR and
+\&\f(CW\*(C`U64\*(C'\fR, instead of using \f(CW\*(C`long long\*(C'\fR and \f(CW\*(C`unsigned long long\*(C'\fR directly.
+.IP \(bu 4
+variadic macros
+.Sp
+.Vb 2
+\& void greet(const char *file, unsigned int line, const char *format, ...);
+\& #define logged_greet(...) greet(_\|_FILE_\|_, _\|_LINE_\|_, _\|_VA_ARGS_\|_)
+.Ve
+.Sp
+Note that \f(CW\*(C`_\|_VA_OPT_\|_\*(C'\fR is not part of C99; it was only standardized in C++20 and C23, so it can't be relied on here.
+.IP \(bu 4
+declarations in for loops
+.Sp
+.Vb 3
+\& for (const char *p = message; *p; ++p) {
+\& putchar(*p);
+\& }
+.Ve
+.IP \(bu 4
+member structure initialisers
+.Sp
+But not in headers, as support was only added to C++ relatively recently.
+.Sp
+Hence this is fine in C and XS code, but not headers:
+.Sp
+.Vb 4
+\& struct message {
+\& char *action;
+\& char *target;
+\& };
+\&
+\& struct message mcguffin = {
+\& .target = "member structure initialisers",
+\& .action = "Built"
+\& };
+.Ve
+.IP \(bu 4
+flexible array members
+.Sp
+This is standards conformant:
+.Sp
+.Vb 4
+\& struct greeting {
+\& unsigned int len;
+\& char message[];
+\& };
+.Ve
+.Sp
+However, the source code already uses the "unwarranted chumminess with the
+compiler" hack in many places:
+.Sp
+.Vb 4
+\& struct greeting {
+\& unsigned int len;
+\& char message[1];
+\& };
+.Ve
+.Sp
+Strictly it \fBis\fR undefined behaviour accessing beyond \f(CW\*(C`message[0]\*(C'\fR, but this
+has been a commonly used hack since K&R times, and using it hasn't been a
+practical issue anywhere (in the perl source or any other common C code).
+Hence it's unclear what we would gain from actively changing to the C99
+approach.
+.IP \(bu 4
+\&\f(CW\*(C`//\*(C'\fR comments
+.Sp
+All compilers we tested support their use. Not all humans we tested support
+their use.
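+.PP
+The permitted features above can be seen together in one small, self-contained \fI.c\fR sketch.
+The names \f(CW\*(C`greeting_new\*(C'\fR, \f(CW\*(C`count_vowels\*(C'\fR and \f(CW\*(C`mcguffin\*(C'\fR are made up for illustration.
+As noted above, the designated initialiser would have to stay out of any header.

```c
#include <stdlib.h>
#include <string.h>

/* Flexible array member: the struct ends in an array of unspecified size. */
struct greeting {
    unsigned int len;
    char message[];
};

/* Allocate the header plus room for the text and its terminating NUL. */
static struct greeting *greeting_new(const char *text)
{
    size_t len = strlen(text);
    struct greeting *g = malloc(sizeof *g + len + 1);
    if (g == NULL)
        return NULL;
    g->len = (unsigned int)len;
    memcpy(g->message, text, len + 1);
    return g;
}

static unsigned int count_vowels(const struct greeting *g)
{
    if (g == NULL)
        return 0;
    unsigned int n = 0;                           /* declaration after a statement */
    for (unsigned int i = 0; i < g->len; ++i) {   /* declaration in a for loop */
        if (strchr("aeiouAEIOU", g->message[i]) != NULL)
            ++n;
    }
    return n;
}

/* Member structure (designated) initialisers: fine in .c files only. */
struct message {
    const char *action;
    const char *target;
};

static const struct message mcguffin = {
    .target = "member structure initialisers",
    .action = "Built",
};
```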
+.PP
+Code explicitly should not use any other C99 features. For example:
+.IP \(bu 4
+variable length arrays
+.Sp
+Not supported by \fBany\fR MSVC, and this is not going to change.
+.Sp
+Even "variable" length arrays where the variable is a constant expression
+are syntax errors under MSVC.
+.IP \(bu 4
+C99 types in \f(CW\*(C`<stdint.h>\*(C'\fR
+.Sp
+Use \f(CW\*(C`PERL_INT_FAST8_T\*(C'\fR etc as defined in \fIhandy.h\fR
+.IP \(bu 4
+C99 format strings in \f(CW\*(C`<inttypes.h>\*(C'\fR
+.Sp
+\&\f(CW\*(C`snprintf\*(C'\fR in the VMS libc only added support for \f(CW\*(C`PRIdN\*(C'\fR etc very recently,
+meaning that there are live supported installations without this, or formats
+such as \f(CW%zu\fR.
+.Sp
+(perl's \f(CW\*(C`sv_catpvf\*(C'\fR etc use parser code in \f(CW\*(C`sv.c\*(C'\fR, which supports the
+\&\f(CW\*(C`z\*(C'\fR modifier, along with perl-specific formats such as \f(CW\*(C`SVf\*(C'\fR.)
+.PP
+If you want to use a C99 feature not listed above then you need to do one of the following:
+.IP \(bu 4
+Probe for it in \fIConfigure\fR, set a variable in \fIconfig.sh\fR, and add fallback logic in the headers for platforms which don't have it.
+.IP \(bu 4
+Write test code and verify that it works on platforms we need to support, before relying on it unconditionally.
+.PP
+Likely you want to repeat the same plan as we used to get the current C99
+feature set. See the message at https://markmail.org/thread/odr4fjrn72u2fkpz
+for the C99 probes we used before. Note that the two most "fussy" compilers
+appear to be MSVC and the vendor compiler on VMS. To date all the *nix
+compilers have been far more flexible in what they support.
+.PP
+On *nix platforms, \fIConfigure\fR attempts to set compiler flags appropriately.
+All vendor compilers that we tested defaulted to C99 (or C11) support.
+However, older versions of gcc default to C89, or permit \fImost\fR C99 (with
+warnings), but forbid \fIdeclarations in for loops\fR unless \f(CW\*(C`\-std=gnu99\*(C'\fR is
+added. The alternative \f(CW\*(C`\-std=c99\*(C'\fR \fBmight\fR seem better, but using it on some
+platforms can prevent \f(CW\*(C`<unistd.h>\*(C'\fR from declaring some prototypes,
+which breaks the build. gcc's \f(CW\*(C`\-ansi\*(C'\fR flag implies \f(CW\*(C`\-std=c89\*(C'\fR so we
+can no longer set that, hence the Configure option \f(CW\*(C`\-gccansipedantic\*(C'\fR now only
+adds \f(CW\*(C`\-pedantic\*(C'\fR.
+.PP
+The Perl core source code files (the ones at the top level of the source code
+distribution) are automatically compiled with as many as possible of
+\&\f(CW\*(C`\-std=gnu99\*(C'\fR, \f(CW\*(C`\-pedantic\*(C'\fR, and a selection of \f(CW\*(C`\-W\*(C'\fR flags (see
+\fIcflags.SH\fR). Files in \fIext/\fR, \fIdist/\fR, \fIcpan/\fR, etc. are compiled with the same
+flags as the installed perl would use to compile XS extensions.
+.PP
+Basically, it's safe to assume that \fIConfigure\fR and \fIcflags.SH\fR have
+picked the best combination of flags for the version of gcc on the platform,
+and attempting to add more flags related to enforcing a C dialect will
+cause problems either locally, or on other systems that the code is shipped
+to.
+.PP
+We believe that the C99 support in gcc 3.1 is good enough for us, but we don't
+have a 19-year-old gcc handy to check this :\-)
+If you have ancient vendor compilers that don't default to C99, the flags
+you might want to try are
+.IP AIX 4
+.IX Item "AIX"
+\&\f(CW\*(C`\-qlanglvl=stdc99\*(C'\fR
+.IP HP/UX 4
+.IX Item "HP/UX"
+\&\f(CW\*(C`\-AC99\*(C'\fR
+.IP Solaris 4
+.IX Item "Solaris"
+\&\f(CW\*(C`\-xc99\*(C'\fR
+.SS "Symbol Names and Namespace Pollution"
+.IX Subsection "Symbol Names and Namespace Pollution"
+\fIChoosing legal symbol names\fR
+.IX Subsection "Choosing legal symbol names"
+.PP
+C reserves for its implementation any symbol whose name begins with an
+underscore followed immediately by either an uppercase letter \f(CW\*(C`[A\-Z]\*(C'\fR
+or another underscore. C++ further reserves any symbol containing two
+consecutive underscores, and further reserves in the global name space any
+symbol beginning with an underscore, not just ones followed by a
+capital. We care about C++ because \f(CW\*(C`hdr\*(C'\fR files need to be compilable by
+it, and some people do all their development using a C++ compiler.
+.PP
+The consequences of failing to do this are probably none. Unless you
+stumble on a name that the implementation uses, things will work.
+Indeed, the perl core has more than a few instances of using
+implementation-reserved symbols. (These are gradually being changed.)
+But your code might stop working any time that the implementation
+decides to use a name you already had chosen, potentially many years
+before.
+.PP
+It's best then to:
+.ie n .IP "\fBDon't begin a symbol name with an underscore\fR; (\fIe.g.\fR, don't use: ""_FOOBAR"")" 4
+.el .IP "\fBDon't begin a symbol name with an underscore\fR; (\fIe.g.\fR, don't use: \f(CW_FOOBAR\fR)" 4
+.IX Item "Don't begin a symbol name with an underscore; (e.g., don't use: _FOOBAR)"
+.PD 0
+.ie n .IP "\fBDon't use two consecutive underscores in a symbol name\fR; (\fIe.g.\fR, don't use ""FOO_\|_BAR"")" 4
+.el .IP "\fBDon't use two consecutive underscores in a symbol name\fR; (\fIe.g.\fR, don't use \f(CWFOO_\|_BAR\fR)" 4
+.IX Item "Don't use two consecutive underscores in a symbol name; (e.g., don't use FOO__BAR)"
+.PD
+.PP
+POSIX also reserves many symbols. See Section 2.2.2 in
+<http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html>.
+Perl also has conflicts with that.
+.PP
+Perl reserves for its use any symbol beginning with \f(CW\*(C`Perl\*(C'\fR, \f(CW\*(C`perl\*(C'\fR, or
+\&\f(CW\*(C`PL_\*(C'\fR. Any time you introduce a macro into a \f(CW\*(C`hdr\*(C'\fR file that doesn't
+follow that convention, you are creating the possibility of a namespace
+clash with an existing XS module, unless you restrict it by, say,
+.PP
+.Vb 3
+\& #ifdef PERL_CORE
+\& # define my_symbol
+\& #endif
+.Ve
+.PP
+There are many symbols in \f(CW\*(C`hdr\*(C'\fR files that aren't of this form, and
+which are accessible from XS code, intentionally or not; just about
+anything in \fIconfig.h\fR, for example.
+.PP
+Having to use one of these prefixes detracts from the readability of the
+code, and hasn't been an actual issue for non-trivial names. Things
+like perl defining its own \f(CW\*(C`MAX\*(C'\fR macro have been problematic, but they
+were quickly discovered, and a \f(CW\*(C`#ifdef\ PERL_CORE\*(C'\fR guard added.
+.PP
+So there's no rule imposed about using such symbols, just be aware of
+the issues.
+.PP
+\fIChoosing good symbol names\fR
+.IX Subsection "Choosing good symbol names"
+.PP
+Ideally, a symbol name should correctly and precisely describe its
+intended purpose. But there is a tension between that and getting names
+that are overly long and hence awkward to type and read. Metaphors
+could be helpful (a poetic name), but those tend to be culturally
+specific, and may not translate for someone whose native language isn't
+English, or even comes from a different cultural background. Besides,
+the talent of writing poetry seems to be rare in programmers.
+.PP
+Certain symbol names don't reflect their purpose, but are nonetheless
+fine to use because of long-standing conventions. These often
+originated in the field of Mathematics, where \f(CW\*(C`i\*(C'\fR and \f(CW\*(C`j\*(C'\fR are
+frequently used as subscripts, and \f(CW\*(C`n\*(C'\fR as a population count. Since at
+least the 1950s, computer programs have used \f(CW\*(C`i\*(C'\fR, \fIetc.\fR as loop
+variables.
+.PP
+Our guidance is to choose a name that reasonably describes the purpose,
+and to comment its declaration more precisely.
+.PP
+One certainly shouldn't use misleading or ambiguous names. \f(CW\*(C`last_foo\*(C'\fR
+could mean either the final \f(CW\*(C`foo\*(C'\fR or the previous \f(CW\*(C`foo\*(C'\fR, and so could
+be confusing to the reader, or even to the writer coming back to the
+code after a few months of working on something else. Sometimes the
+programmer has a particular line of thought in mind, and it doesn't
+occur to them that ambiguity is present.
+.PP
+There are probably still many off\-by\-1 bugs around because the name
+"\f(CW\*(C`av_len\*(C'\fR" in perlapi doesn't correspond to what other \fI\-len\fR constructs
+mean, such as "\f(CW\*(C`sv_len\*(C'\fR" in perlapi. Awkward (and controversial)
+synonyms that conveyed its true meaning were created for use instead
+("\f(CW\*(C`av_top_index\*(C'\fR" in perlapi). Eventually, though, someone had the better
+idea to create a new name to signify what most people think \f(CW\*(C`\-len\*(C'\fR
+signifies. So "\f(CW\*(C`av_count\*(C'\fR" in perlapi was born. And we wish it had been
+thought up much earlier.
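+.PP
+A toy illustration of the off-by-one trap follows.
+It assumes, as perlapi documents, that \f(CW\*(C`av_len\*(C'\fR and \f(CW\*(C`av_top_index\*(C'\fR return the highest index, while \f(CW\*(C`av_count\*(C'\fR returns the number of elements.
+\&\f(CW\*(C`struct toy_av\*(C'\fR and its functions are hypothetical stand-ins, not the AV API.

```c
/* Hypothetical toy array, loosely modelled on an AV.
 * "fill" holds the highest valid index, or -1 when empty. */
struct toy_av {
    int items[8];
    int fill;
};

/* Named like av_top_index: returns the highest index, not a length. */
static int toy_av_top_index(const struct toy_av *av) { return av->fill; }

/* Named like av_count: returns the number of elements. */
static int toy_av_count(const struct toy_av *av) { return av->fill + 1; }

static int toy_av_sum(const struct toy_av *av)
{
    int total = 0;
    /* "i < count" and "i <= top_index" visit the same elements;
     * writing "i < top_index" is the classic off-by-one. */
    for (int i = 0; i < toy_av_count(av); ++i)
        total += av->items[i];
    return total;
}
```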
+.SS "Writing safer macros"
+.IX Subsection "Writing safer macros"
+Macros are used extensively in the Perl core for such things as hiding
+internal details from the caller, so that it doesn't have to be
+concerned about them. For example, most lines of code don't need
+to know if they are running on a threaded versus unthreaded perl. That
+detail is mostly hidden automatically.
+.PP
+It is often better to use an inline function instead of a macro. They
+are immune to name collisions with the caller, and don't magnify
+problems when called with parameters that are expressions with side
+effects. There was a time when one might choose a macro over an inline
+function because compiler support for inline functions was quite
+limited. Some would actually only inline the first two or three
+encountered in a compilation. But those days are long gone, and inline
+functions are fully supported in modern compilers.
+.PP
+Nevertheless, there are situations where a function won't do, and a
+macro is required. One example is when a parameter can be any of
+several types. A function has to be declared with a single explicit
+type for each parameter.
+.PP
+Or maybe the code involved is so trivial that a function would be just
+complicating overkill, such as when the macro simply creates a mnemonic
+name for some constant value.
+.PP
+If you do choose to use a non-trivial macro, be aware that there are
+several avoidable pitfalls that can occur. Keep in mind that a macro is
+expanded within the lexical context of each place in the source it is
+called. If you have a token \f(CW\*(C`foo\*(C'\fR in the macro and the source happens
+also to have \f(CW\*(C`foo\*(C'\fR, the meaning of the macro's \f(CW\*(C`foo\*(C'\fR will become that
+of the caller's. Sometimes that is exactly the behavior you want, but
+be aware that this tends to be confusing later on. It effectively turns
+\&\f(CW\*(C`foo\*(C'\fR into a reserved word for any code that calls the macro, and this
+fact is usually neither documented nor considered. It is safer to pass
+\&\f(CW\*(C`foo\*(C'\fR as a parameter, so that \f(CW\*(C`foo\*(C'\fR remains freely available to the
+caller and the macro interface is explicitly specified.
+.PP
+Worse is when the equivalence between the two \f(CW\*(C`foo\*(C'\fR's is coincidental.
+Suppose for example, that the macro declares a variable
+.PP
+.Vb 1
+\& int foo
+.Ve
+.PP
+That works fine as long as the caller doesn't define the string \f(CW\*(C`foo\*(C'\fR
+in some way. And it might not be until years later that someone comes
+along with an instance where \f(CW\*(C`foo\*(C'\fR is used. For example a future
+caller could do this:
+.PP
+.Vb 1
+\& #define foo bar
+.Ve
+.PP
+Then that declaration of \f(CW\*(C`foo\*(C'\fR in the macro suddenly becomes
+.PP
+.Vb 1
+\& int bar
+.Ve
+.PP
+That could mean that something completely different happens than
+intended. It is hard to debug; the macro and call may not even be in
+the same file, so it would require some digging and gnashing of teeth to
+figure out.
+.PP
+Therefore, if a macro does use variables, their names should be such
+that it is very unlikely that they would collide with any caller, now or
+forever. One way to do that, now being used in the perl source, is to
+include the name of the macro itself as part of the name of each
+variable in the macro. Suppose the macro is named \f(CW\*(C`SvPV\*(C'\fR. Then we
+could have
+.PP
+.Vb 1
+\& int foo_svpv_ = 0;
+.Ve
+.PP
+This is harder to read than plain \f(CW\*(C`foo\*(C'\fR, but it is pretty much
+guaranteed that a caller will never naively use \f(CW\*(C`foo_svpv_\*(C'\fR (and run
+into problems). (The lowercasing makes it clearer that this is a
+variable, but assumes that there won't be two elements whose names
+differ only in the case of their letters.) The trailing underscore
+makes it even more unlikely to clash, as those, by convention, signify a
+private variable name. (See "Choosing legal symbol names" for
+restrictions on what names you can use.)
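+.PP
+The collision can be reproduced with a hypothetical macro whose local variable is simply called \f(CW\*(C`acc\*(C'\fR.
+The macros and calling functions below are invented for illustration.
+The naive version silently captures the caller's own \f(CW\*(C`acc\*(C'\fR.
+The second version follows the naming scheme above: the local's name, then the lowercased macro name, then a trailing underscore.

```c
/* Adds 1 to EXPR and stores the sum in OUT.
 * The local is named plainly "acc", so a caller's "acc" gets captured. */
#define ADD_ONE_NAIVE(expr, out) \
    do { int acc = 1; acc += (expr); (out) = acc; } while (0)

/* Same macro, with the local named after the macro plus a trailing
 * underscore, so it cannot plausibly clash with a caller's variable. */
#define ADD_ONE_SAFE(expr, out) \
    do { int acc_add_one_safe_ = 1; acc_add_one_safe_ += (expr); \
         (out) = acc_add_one_safe_; } while (0)

static int call_naive(void)
{
    int acc = 41, result;
    ADD_ONE_NAIVE(acc, result);   /* expands to: int acc = 1; acc += (acc); ... */
    return result;                /* 2, not the intended 42 */
}

static int call_safe(void)
{
    int acc = 41, result;
    ADD_ONE_SAFE(acc, result);
    return result;                /* 42 */
}
```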
+.PP
+This kind of name collision doesn't happen with the macro's formal
+parameters, so they don't need to have complicated names. But there are
+pitfalls when a parameter is an expression, or has some Perl magic
+attached. When calling a function, C will evaluate the parameter once,
+and pass the result to the function. But when calling a macro, the
+parameter is copied as-is by the C preprocessor to each instance inside
+the macro. This means that when evaluating a parameter having side
+effects, the function and macro results differ. This is particularly
+fraught when a parameter has overload magic, say it is a tied variable
+that reads the next line in a file upon each evaluation. Having it read
+multiple lines per call is probably not what the caller intended. If a
+macro refers to a potentially overloadable parameter more than once, it
+should first make a copy and then use that copy the rest of the time.
+There are macros in the perl core that violate this, but are gradually
+being converted, usually by changing to use inline functions instead.
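+.PP
+A minimal demonstration of the double-evaluation difference follows.
+\&\f(CW\*(C`SQUARE_MACRO\*(C'\fR, \f(CW\*(C`square_inline\*(C'\fR and \f(CW\*(C`next_value\*(C'\fR are hypothetical names.
+\&\f(CW\*(C`next_value\*(C'\fR merely stands in for something like a tied variable that yields a new value each time it is evaluated.

```c
/* Macro: the argument text is pasted in twice, so it is evaluated twice. */
#define SQUARE_MACRO(x) ((x) * (x))

/* Inline function: the argument is evaluated once, before the call. */
static inline long square_inline(long x)
{
    return x * x;
}

/* A side-effecting "read" that counts how often it is evaluated. */
static long reads;
static long next_value(void)
{
    return ++reads;
}
```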
+.PP
+Above we said "first make a copy". In a macro, that is easier said than
+done, because macros are normally expressions, and declarations aren't
+allowed in expressions. But the \f(CW\*(C`STMT_START\*(C'\fR\ ..\ \f(CW\*(C`STMT_END\*(C'\fR
+construct, described in perlapi, allows you to
+have declarations in most contexts, as long as you don't need a return
+value. If you do need a value returned, you can make the interface such
+that a pointer is passed to the construct, which then stores its result
+there. (Or you can use GCC brace groups. But these require a fallback
+if the code will ever get executed on a platform that lacks this
+non-standard extension to C. And that fallback would be another code
+path, which can get out-of-sync with the brace group one, so doing this
+isn't advisable.) In situations where there's no other way, Perl does
+furnish "\f(CW\*(C`PL_Sv\*(C'\fR" in perlintern and "\f(CW\*(C`PL_na\*(C'\fR" in perlapi to use (with a
+slight performance penalty) for some such common cases. But beware that
+a call chain involving multiple macros that use them can clobber each
+other's values. These have been very difficult to debug.
+.PP
+For a concrete example of these pitfalls in action, see
+<https://perlmonks.org/?node_id=11144355>
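+.PP
+One way to combine these points is sketched below under stated assumptions.
+It is a statement macro that copies each argument exactly once into a local named after the macro, and it hands its result back through a pointer.
+The sketch defines its own \f(CW\*(C`TOY_STMT_START\*(C'\fR/\f(CW\*(C`TOY_STMT_END\*(C'\fR, assuming they expand to \f(CW\*(C`do {\ ...\ } while (0)\*(C'\fR as current perl's \f(CW\*(C`STMT_START\*(C'\fR/\f(CW\*(C`STMT_END\*(C'\fR are believed to; check \fIperl.h\fR for the real definitions.
+\&\f(CW\*(C`CLAMP_INTO\*(C'\fR itself is hypothetical.

```c
/* Stand-ins for perl's STMT_START/STMT_END (assumed do/while(0) form). */
#define TOY_STMT_START do
#define TOY_STMT_END   while (0)

/* Clamp VAL into [LO, HI] and store it in *RESULTP.  Each argument is
 * evaluated exactly once, into locals whose names embed the macro's
 * name; the result goes through a pointer because a statement macro
 * has no value of its own. */
#define CLAMP_INTO(resultp, val, lo, hi)                          \
    TOY_STMT_START {                                              \
        long val_clamp_into_ = (val);                             \
        long lo_clamp_into_  = (lo);                              \
        long hi_clamp_into_  = (hi);                              \
        *(resultp) = val_clamp_into_ < lo_clamp_into_             \
                         ? lo_clamp_into_                         \
                         : val_clamp_into_ > hi_clamp_into_       \
                               ? hi_clamp_into_                   \
                               : val_clamp_into_;                 \
    } TOY_STMT_END
```
+.PP
+Because the body is a single \f(CW\*(C`do\*(C'\fR statement, \f(CW\*(C`if (x) CLAMP_INTO(...); else ...\*(C'\fR parses as intended.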
+.SS "Portability problems"
+.IX Subsection "Portability problems"
+The following are common causes of compilation and/or execution
+failures, not common to Perl as such. The C FAQ is good bedtime
+reading. Please test your changes with as many C compilers and
+platforms as possible; we will, anyway, and it's nice to save oneself
+from public embarrassment.
+.PP
+Also study perlport carefully to avoid any bad assumptions about the
+operating system, filesystems, character set, and so forth.
+.PP
+Do not assume an operating system indicates a certain compiler.
+.IP \(bu 4
+Casting pointers to integers or casting integers to pointers
+.Sp
+.Vb 3
+\& void castaway(U8* p)
+\& {
+\& IV i = p;
+.Ve
+.Sp
+or
+.Sp
+.Vb 3
+\& void castaway(U8* p)
+\& {
+\& IV i = (IV)p;
+.Ve
+.Sp
+Both are bad, and broken, and unportable. Use the \fBPTR2IV()\fR macro that
+does it right. (Likewise, there are \fBPTR2UV()\fR, \fBPTR2NV()\fR, \fBINT2PTR()\fR, and
+\&\fBNUM2PTR()\fR.)
+.IP \(bu 4
+Casting between function pointers and data pointers
+.Sp
+Technically speaking casting between function pointers and data
+pointers is unportable and undefined. Practically speaking it seems
+to work, but you should use the \fBFPTR2DPTR()\fR and \fBDPTR2FPTR()\fR macros.
+Sometimes you can also play games with unions.
+.IP \(bu 4
+Assuming sizeof(int) == sizeof(long)
+.Sp
+There are platforms where longs are 64 bits, and platforms where ints
+are 64 bits, and while we are out to shock you, even platforms where
+shorts are 64 bits. This is all legal according to the C standard. (In
+other words, "long long" is not a portable way to specify 64 bits, and
+"long long" is not even guaranteed to be any wider than "long".)
+.Sp
+Instead, use the definitions IV, UV, IVSIZE, I32SIZE, and so forth.
+Avoid assuming that types like I32 are \fIexactly\fR 32 bits: they are
+only guaranteed to be \fIat least\fR 32 bits, and they are \fBnot\fR
+guaranteed to be \fBint\fR or \fBlong\fR either. If you explicitly need
+64\-bit variables, use I64 and U64.
+.IP \(bu 4
+Assuming one can dereference any type of pointer for any type of data
+.Sp
+.Vb 2
+\& char *p = ...;
+\& long pony = *(long *)p; /* BAD */
+.Ve
+.Sp
+Many platforms, quite rightly so, will give you a core dump instead of
+a pony if the p happens not to be correctly aligned.
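+.Sp
+If you really need to read a \f(CW\*(C`long\*(C'\fR from an arbitrary byte position, one portable approach is to copy the bytes into a properly aligned object.
+\&\f(CW\*(C`memcpy()\*(C'\fR has no alignment requirements, so it can do the copy.
+\&\f(CW\*(C`read_long\*(C'\fR is a hypothetical helper, shown as a standalone sketch.

```c
#include <string.h>

/* Read a long stored at an arbitrary (possibly misaligned) address.
 * memcpy() copies byte by byte as far as alignment is concerned, so
 * this is portable, unlike dereferencing *(long *)p. */
static long read_long(const char *p)
{
    long value;
    memcpy(&value, p, sizeof value);
    return value;
}
```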
+.IP \(bu 4
+Lvalue casts
+.Sp
+.Vb 1
+\& (int)*p = ...; /* BAD */
+.Ve
+.Sp
+Simply not portable. Get your lvalue to be of the right type, or maybe
+use temporary variables, or dirty tricks with unions.
+.IP \(bu 4
+Assuming \fBanything\fR about structs (especially the ones you don't
+control, like the ones coming from the system headers)
+.RS 4
+.IP \(bu 8
+That a certain field exists in a struct
+.IP \(bu 8
+That no other fields exist besides the ones you know of
+.IP \(bu 8
+That a field is of certain signedness, sizeof, or type
+.IP \(bu 8
+That the fields are in a certain order
+.RS 8
+.IP \(bu 8
+While C guarantees the ordering specified in the struct definition,
+between different platforms the definitions might differ
+.RE
+.RS 8
+.RE
+.IP \(bu 8
+That the sizeof(struct) or the alignments are the same everywhere
+.RS 8
+.IP \(bu 8
+There might be padding bytes between the fields to align the fields \-
+the bytes can be anything
+.IP \(bu 8
+Structs are required to be aligned to the maximum alignment required by
+the fields \- which for native types is usually equivalent to
+\&\fBsizeof()\fR of the field
+.RE
+.RS 8
+.RE
+.RE
+.RS 4
+.RE
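+.Sp
+Rather than hard-coding offsets or sizes, ask the compiler with \f(CW\*(C`offsetof()\*(C'\fR and \f(CW\*(C`sizeof()\*(C'\fR.
+\&\f(CW\*(C`struct sample\*(C'\fR is hypothetical.
+The exact padding it receives varies by platform, which is the point.

```c
#include <stddef.h>

/* The compiler may insert padding after "tag" so that "value" is
 * suitably aligned; how much is platform-dependent. */
struct sample {
    char tag;
    long value;
};

/* Ask the compiler for the layout instead of assuming it. */
static size_t value_offset(void)
{
    return offsetof(struct sample, value);
}
```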
+.IP \(bu 4
+Assuming the character set is ASCIIish
+.Sp
+Perl can compile and run under EBCDIC platforms. See perlebcdic.
+This is transparent for the most part, but because the character sets
+differ, you shouldn't use numeric (decimal, octal, nor hex) constants
+to refer to characters. You can safely say \f(CW\*(AqA\*(Aq\fR, but not \f(CW0x41\fR.
+You can safely say \f(CW\*(Aq\en\*(Aq\fR, but not \f(CW\*(C`\e012\*(C'\fR. However, you can use
+macros defined in \fIutf8.h\fR to specify any code point portably.
+\&\f(CWLATIN1_TO_NATIVE(0xDF)\fR is going to be the code point that means
+LATIN SMALL LETTER SHARP S on whatever platform you are running on (on
+ASCII platforms it compiles without adding any extra code, so there is
+zero performance hit on those). The acceptable inputs to
+\&\f(CW\*(C`LATIN1_TO_NATIVE\*(C'\fR are from \f(CW0x00\fR through \f(CW0xFF\fR. If your input
+isn't guaranteed to be in that range, use \f(CW\*(C`UNICODE_TO_NATIVE\*(C'\fR instead.
+\&\f(CW\*(C`NATIVE_TO_LATIN1\*(C'\fR and \f(CW\*(C`NATIVE_TO_UNICODE\*(C'\fR translate the opposite
+direction.
+.Sp
+If you need the string representation of a character that doesn't have a
+mnemonic name in C, you should add it to the list in
+\&\fIregen/unicode_constants.pl\fR, and have Perl create \f(CW\*(C`#define\*(C'\fR's for you,
+based on the current platform.
+.Sp
+Note that the \f(CW\*(C`is\fR\f(CIFOO\fR\f(CW\*(C'\fR and \f(CW\*(C`to\fR\f(CIFOO\fR\f(CW\*(C'\fR macros in \fIhandy.h\fR work
+properly on native code points and strings.
+.Sp
+Also, the range 'A' \- 'Z' in ASCII is an unbroken sequence of 26 upper
+case alphabetic characters. That is not true in EBCDIC. Nor for 'a' to
+\&'z'. But '0' \- '9' is an unbroken range in both systems. Don't assume
+anything about other ranges. (Note that special handling of ranges in
+regular expression patterns and transliterations makes it appear to Perl
+code that the aforementioned ranges are all unbroken.)
+.Sp
+Many of the comments in the existing code ignore the possibility of
+EBCDIC, and may be wrong therefore, even if the code works. This is
+actually a tribute to how transparently EBCDIC support was added,
+without having to change pre-existing code.
+.Sp
+UTF\-8 and UTF-EBCDIC are two different encodings used to represent
+Unicode code points as sequences of bytes. Macros with the same names
+(but different definitions) in \fIutf8.h\fR and \fIutfebcdic.h\fR are used to
+allow the calling code to think that there is only one such encoding.
+This is almost always referred to as \f(CW\*(C`utf8\*(C'\fR, but it means the EBCDIC
+version as well. Again, comments in the code may well be wrong even if
+the code itself is right. For example, the concept of UTF\-8 \f(CW\*(C`invariant
+characters\*(C'\fR differs between ASCII and EBCDIC. On ASCII platforms, only
+characters that do not have the high-order bit set (i.e. whose ordinals
+are strict ASCII, 0 \- 127) are invariant, and the documentation and
+comments in the code may assume that, often referring to something
+like, say, \f(CW\*(C`hibit\*(C'\fR. The situation differs and is not so simple on
+EBCDIC machines, but as long as the code itself uses the
+\&\f(CWNATIVE_IS_INVARIANT()\fR macro appropriately, it works, even if the
+comments are wrong.
+.Sp
+As noted in "TESTING" in perlhack, when writing test scripts, the file
+\&\fIt/charset_tools.pl\fR contains some helpful functions for writing tests
+valid on both ASCII and EBCDIC platforms. Sometimes, though, a test
+can't use a function and it's inconvenient to have different test
+versions depending on the platform. There are 20 code points that are
+the same in all 4 character sets currently recognized by Perl (the 3
+EBCDIC code pages plus ISO 8859\-1 (ASCII/Latin1)). These can be used in
+such tests, though there is a small possibility that Perl will become
+available in yet another character set, breaking your test. All but one
+of these code points are C0 control characters. The most significant
+controls that are the same are \f(CW\*(C`\e0\*(C'\fR, \f(CW\*(C`\er\*(C'\fR, and \f(CW\*(C`\eN{VT}\*(C'\fR (also
+specifiable as \f(CW\*(C`\ecK\*(C'\fR, \f(CW\*(C`\ex0B\*(C'\fR, \f(CW\*(C`\eN{U+0B}\*(C'\fR, or \f(CW\*(C`\e013\*(C'\fR). The single
+non-control is U+00B6 PILCROW SIGN. The controls that are the same have
+the same bit pattern in all 4 character sets, regardless of the UTF8ness
+of the string containing them. The bit pattern for U+B6 is the same in
+all 4 for non\-UTF8 strings, but differs in each when its containing
+string is UTF\-8 encoded. The only other code points that have some sort
+of sameness across all 4 character sets are the pair 0xDC and 0xFC.
+Together these represent upper\- and lowercase LATIN LETTER U WITH
+DIAERESIS, but which is upper and which is lower may be reversed: 0xDC
+is the capital in Latin1 and 0xFC is the small letter, while 0xFC is the
+capital in EBCDIC and 0xDC is the small one. This factoid may be
+exploited in writing case\-insensitive tests that are the same across all
+4 character sets.
+.IP \(bu 4
+Assuming the character set is just ASCII
+.Sp
+ASCII is a 7\-bit encoding, but bytes have 8 bits in them. The 128 extra
+characters have different meanings depending on the locale. Absent a
+locale, these extra characters were historically considered to be
+unassigned, which presented some problems. Starting in 5.12, this has
+been changed so that these characters can be considered to
+be Latin\-1 (ISO\-8859\-1).
+.IP \(bu 4
+Mixing #define and #ifdef
+.Sp
+.Vb 6
+\& #define BURGLE(x) ... \e
+\& #ifdef BURGLE_OLD_STYLE /* BAD */
+\& ... do it the old way ... \e
+\& #else
+\& ... do it the new way ... \e
+\& #endif
+.Ve
+.Sp
+You cannot portably "stack" cpp directives. For example, in the above
+you need two separate \fBBURGLE()\fR #defines, one for each #ifdef branch.
+.IP \(bu 4
+Adding non-comment stuff after #endif or #else
+.Sp
+.Vb 5
+\& #ifdef SNOSH
+\& ...
+\& #else !SNOSH /* BAD */
+\& ...
+\& #endif SNOSH /* BAD */
+.Ve
+.Sp
+The #endif and #else cannot portably have anything non-comment after
+them. If you want to document what is going on (which is a good idea,
+especially if the branches are long), use (C) comments:
+.Sp
+.Vb 5
+\& #ifdef SNOSH
+\& ...
+\& #else /* !SNOSH */
+\& ...
+\& #endif /* SNOSH */
+.Ve
+.Sp
+The gcc option \f(CW\*(C`\-Wendif\-labels\*(C'\fR warns about the bad variant (on by
+default starting from Perl 5.9.4).
+.IP \(bu 4
+Having a comma after the last element of an enum list
+.Sp
+.Vb 5
+\& enum color {
+\& CERULEAN,
+\& CHARTREUSE,
+\& CINNABAR, /* BAD */
+\& };
+.Ve
+.Sp
+is not portable. Leave out the last comma.
+.Sp
+Also note that whether enums are implicitly convertible to ints varies
+between compilers; you might need an explicit (int) cast.
+.IP \(bu 4
+Mixing signed char pointers with unsigned char pointers
+.Sp
+.Vb 4
+\& int foo(char *s) { ... }
+\& ...
+\& unsigned char *t = ...; /* Or U8* t = ... */
+\& foo(t); /* BAD */
+.Ve
+.Sp
+While many compilers accept this (perhaps with a warning), it is certainly
+dubious, and downright fatal on at least one platform: VMS cc, for
+example, considers this a fatal error. One reason people often make this
+mistake is that a "naked char" (and therefore the result of dereferencing
+a "naked char pointer") has implementation\-defined signedness: whether it
+is signed or unsigned depends on the compiler, the compiler flags, and the
+underlying platform. For this very same reason, using a 'char' as an
+array index is bad.
+.IP \(bu 4
+Macros that have string constants and their arguments as substrings of
+the string constants
+.Sp
+.Vb 2
+\& #define FOO(n) printf("number = %d\en", n) /* BAD */
+\& FOO(10);
+.Ve
+.Sp
+Pre-ANSI semantics for that were equivalent to
+.Sp
+.Vb 1
+\& printf("10umber = %d\e10", 10);
+.Ve
+.Sp
+which is probably not what you were expecting. Unfortunately, at least
+one reasonably common and modern C compiler does "real backward
+compatibility" here: in AIX, that is what still happens, even though the
+rest of the AIX compiler is very happily C89.
+.IP \(bu 4
+Using printf formats for non-basic C types
+.Sp
+.Vb 2
+\& IV i = ...;
+\& printf("i = %d\en", i); /* BAD */
+.Ve
+.Sp
+While this might work by accident on some platforms (where IV happens to
+be an \f(CW\*(C`int\*(C'\fR), in general it will not: IV might be something larger. The
+situation is even worse with more specific types (defined by Perl's
+configuration step in \fIconfig.h\fR):
+.Sp
+.Vb 2
+\& Uid_t who = ...;
+\& printf("who = %d\en", who); /* BAD */
+.Ve
+.Sp
+The problem here is that Uid_t might not only differ in width from \f(CW\*(C`int\*(C'\fR but it
+might also be unsigned, in which case large uids would be printed as
+negative values.
+.Sp
+There is no simple solution to this because of \fBprintf()\fR's limited
+intelligence, but for many types the right format is available as a
+macro with either an 'f' or a '_f' suffix, for example:
+.Sp
+.Vb 8
+\& IVdf /* IV in decimal */
+\& UVxf /* UV in hexadecimal */
+\&
+\& printf("i = %"IVdf"\en", i); /* The IVdf is a string constant. */
+\&
+\& Uid_t_f /* Uid_t in decimal */
+\&
+\& printf("who = %"Uid_t_f"\en", who);
+.Ve
+.Sp
+Or you can try casting to a "wide enough" type:
+.Sp
+.Vb 1
+\& printf("i = %"IVdf"\en", (IV)something_very_small_and_signed);
+.Ve
+.Sp
+See "Formatted Printing of Size_t and SSize_t" in perlguts for how to
+print those.
+.Sp
+Also remember that the \f(CW%p\fR format really does require a void pointer:
+.Sp
+.Vb 2
+\& U8* p = ...;
+\& printf("p = %p\en", (void*)p);
+.Ve
+.Sp
+The gcc option \f(CW\*(C`\-Wformat\*(C'\fR scans for such problems.
+.IP \(bu 4
+Blindly passing va_list
+.Sp
+Not all platforms support passing va_list to further varargs (stdarg)
+functions. The right thing to do is to copy the va_list using
+\&\fBPerl_va_copy()\fR if NEED_VA_COPY is defined.
+.IP \(bu 4
+Using gcc statement expressions
+.Sp
+.Vb 1
+\& val = ({...;...;...}); /* BAD */
+.Ve
+.Sp
+While a nice extension, it's not portable. Historically, Perl used
+them in macros if available to gain some extra speed (essentially
+as a funky form of inlining), but we now support (or emulate) C99
+\&\f(CW\*(C`static inline\*(C'\fR functions, so use them instead. Declare functions as
+\&\f(CW\*(C`PERL_STATIC_INLINE\*(C'\fR to transparently fall back to emulation where needed.
+.IP \(bu 4
+Binding together several statements in a macro
+.Sp
+Use the macros \f(CW\*(C`STMT_START\*(C'\fR and \f(CW\*(C`STMT_END\*(C'\fR.
+.Sp
+.Vb 3
+\& STMT_START {
+\& ...
+\& } STMT_END
+.Ve
+.Sp
+But there can be subtle (but avoidable if you do it right) bugs
+introduced with these; see "\f(CW\*(C`STMT_START\*(C'\fR" in perlapi for best practices
+for their use.
+.IP \(bu 4
+Testing for operating systems or versions when you should be testing for
+features
+.Sp
+.Vb 3
+\& #ifdef _\|_FOONIX_\|_ /* BAD */
+\& foo = quux();
+\& #endif
+.Ve
+.Sp
+Unless you know with 100% certainty that \fBquux()\fR is only ever available
+for the "Foonix" operating system \fBand\fR that it is available \fBand\fR
+correctly working for \fBall\fR past, present, \fBand\fR future versions of
+"Foonix", the above is very wrong. This is more correct (though still
+not perfect, because the below is a compile-time check):
+.Sp
+.Vb 3
+\& #ifdef HAS_QUUX
+\& foo = quux();
+\& #endif
+.Ve
+.Sp
+How does HAS_QUUX become defined where it needs to be? Well, if
+Foonix happens to be Unixy enough to be able to run the Configure
+script, and Configure has been taught about detecting and testing
+\&\fBquux()\fR, HAS_QUUX will be correctly defined. On other platforms, the
+corresponding configuration step will hopefully do the same.
+.Sp
+In a pinch, if you cannot wait for Configure to be educated, or if you
+have a good hunch of where \fBquux()\fR might be available, you can
+temporarily try the following:
+.Sp
+.Vb 3
+\& #if (defined(_\|_FOONIX_\|_) || defined(_\|_BARNIX_\|_))
+\& # define HAS_QUUX
+\& #endif
+\&
+\& ...
+\&
+\& #ifdef HAS_QUUX
+\& foo = quux();
+\& #endif
+.Ve
+.Sp
+But in any case, try to keep the features and operating systems
+separate.
+.Sp
+A good resource on the predefined macros for various operating
+systems, compilers, and so forth is
+<http://sourceforge.net/p/predef/wiki/Home/>
+.IP \(bu 4
+Assuming that the contents of static memory pointed to by the return values
+of Perl wrappers for C library functions don't change. Many C library
+functions return pointers to static storage that can be overwritten by
+subsequent calls to the same or related functions. Perl has wrappers
+for some of these functions. Originally many of those wrappers returned
+those volatile pointers, but over time almost all of them have evolved
+to return stable copies. To cope with the remaining ones, use
+"savepv" in perlapi to make a copy, thus avoiding these problems. You
+will have to free the copy when you're done to avoid memory leaks. If
+you don't have control over when it gets freed, you'll need to make the
+copy in a mortal scalar, like so:
+.Sp
+.Vb 1
+\& SvPVX(sv_2mortal(newSVpv(volatile_string, 0)))
+.Ve
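+.PP
+The format\-macro technique from the \fBprintf()\fR item above can be tried out
+in standalone C. This is a minimal sketch, assuming a 64\-bit IV:
+\&\f(CW\*(C`IV_SKETCH\*(C'\fR and \f(CW\*(C`IVdf_SKETCH\*(C'\fR are hypothetical stand\-ins for
+Perl's \f(CW\*(C`IV\*(C'\fR and \f(CW\*(C`IVdf\*(C'\fR (whose real definitions are generated by
+Configure), built here on the standard \f(CW\*(C`PRId64\*(C'\fR macro from \fIinttypes.h\fR.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-ins for perl's IV and IVdf, assuming a 64-bit IV.
 * In a real perl build these come from config.h and perl.h. */
typedef int64_t IV_SKETCH;
#define IVdf_SKETCH PRId64

/* Format an IV without assuming it is an int. Adjacent string literals
 * are concatenated at compile time, so the format string becomes
 * "i = %" PRId64, which carries the correct length modifier. */
int format_iv(char *buf, size_t len, IV_SKETCH i)
{
    return snprintf(buf, len, "i = %" IVdf_SKETCH, i);
}
```

+Passing the same value through a plain \f(CW%d\fR would be undefined
+behaviour whenever IV is wider than \f(CW\*(C`int\*(C'\fR.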
+.SS "Problematic System Interfaces"
+.IX Subsection "Problematic System Interfaces"
+.IP \(bu 4
+Perl strings are NOT the same as C strings: They may contain \f(CW\*(C`NUL\*(C'\fR
+characters, whereas a C string is terminated by the first \f(CW\*(C`NUL\*(C'\fR.
+That is why Perl API functions that deal with strings generally take a
+pointer to the first byte and either a length or a pointer to the byte
+just beyond the final one.
+.Sp
+And this is the reason that many of the C library string handling
+functions should not be used: they don't cope with the full generality
+of Perl strings. Your test cases may not have embedded
+\&\f(CW\*(C`NUL\*(C'\fRs, and so the tests pass, yet real-world cases where the
+code fails may well eventually arise. A lesson here is to include \f(CW\*(C`NUL\*(C'\fRs
+in your tests. Since \f(CW\*(C`NUL\*(C'\fRs are fairly rare in most real-world data,
+your code may seem to work, until one day a \f(CW\*(C`NUL\*(C'\fR comes
+along.
+.Sp
+Here's an example. For decades, it was a common paradigm in the
+perl core to use \f(CW\*(C`strchr("list",\ c)\*(C'\fR to see if the character \f(CW\*(C`c\*(C'\fR is
+any of the ones given in \f(CW"list"\fR, a double-quote-enclosed string
+containing the set of characters that \f(CW\*(C`c\*(C'\fR is being checked against. As long as
+\&\f(CW\*(C`c\*(C'\fR isn't a \f(CW\*(C`NUL\*(C'\fR, it works. But when \f(CW\*(C`c\*(C'\fR is a \f(CW\*(C`NUL\*(C'\fR, \f(CW\*(C`strchr\*(C'\fR
+returns a pointer to the terminating \f(CW\*(C`NUL\*(C'\fR in \f(CW"list"\fR. This will likely
+result in a segfault or a security issue when the caller uses that
+end pointer as the starting point to read from.
+.Sp
+A solution to this and many similar issues is to use the \f(CW\*(C`mem\*(C'\fR\fI\-foo\fR C
+library functions instead. In this case \f(CW\*(C`memchr\*(C'\fR can be used to see if
+\&\f(CW\*(C`c\*(C'\fR is in \f(CW"list"\fR and works even if \f(CW\*(C`c\*(C'\fR is \f(CW\*(C`NUL\*(C'\fR. These functions
+need an additional parameter to give the string length.
+In the case of literal string parameters, perl has defined macros that
+calculate the length for you. See "String Handling" in perlapi.
+.IP \(bu 4
+\&\fBmalloc\fR\|(0), \fBrealloc\fR\|(0), calloc(0, 0) are non-portable. To be portable
+allocate at least one byte. (In general you should rarely need to work
+at this low level, but instead use the various malloc wrappers.)
+.IP \(bu 4
+\&\fBsnprintf()\fR \- the return value is unportable. Use \fBmy_snprintf()\fR instead.
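+.PP
+The \fBstrchr()\fR pitfall described above can be reproduced in a few lines of
+standalone C. This is a minimal sketch: \f(CW\*(C`in_list_strchr\*(C'\fR and
+\&\f(CW\*(C`in_list_memchr\*(C'\fR are hypothetical helper names, and computing the
+length of a literal with \f(CW\*(C`sizeof\*(C'\fR only illustrates what perl's own
+literal\-string macros (see "String Handling" in perlapi) take care of for you.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* BAD: strchr() treats the terminating NUL as part of the string,
 * so c == '\0' is reported as a member of every list. */
bool in_list_strchr(const char *list, char c)
{
    return strchr(list, c) != NULL;
}

/* Better: memchr() searches exactly len bytes, so the terminating NUL
 * is only found if the caller deliberately includes it in len. */
bool in_list_memchr(const char *list, size_t len, char c)
{
    return memchr(list, (unsigned char)c, len) != NULL;
}

/* For a string literal, sizeof counts the trailing NUL; subtract it. */
#define LIST "aeiou"
#define LIST_LEN (sizeof(LIST) - 1)
```

+With these definitions \f(CW\*(C`in_list_strchr(LIST,\ \*(Aq\e0\*(Aq)\*(C'\fR is true while
+\&\f(CW\*(C`in_list_memchr(LIST,\ LIST_LEN,\ \*(Aq\e0\*(Aq)\*(C'\fR is false.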
+.SS "Security problems"
+.IX Subsection "Security problems"
+Last but not least, here are various tips for safer coding.
+See also perlclib for libc/stdio replacements one should use.
+.IP \(bu 4
+Do not use \fBgets()\fR
+.Sp
+Or we will publicly ridicule you. Seriously.
+.IP \(bu 4
+Do not use \fBtmpfile()\fR
+.Sp
+Use \fBmkstemp()\fR instead.
+.IP \(bu 4
+Do not use \fBstrcpy()\fR or \fBstrcat()\fR or \fBstrncpy()\fR or \fBstrncat()\fR
+.Sp
+Use \fBmy_strlcpy()\fR and \fBmy_strlcat()\fR instead: they either use the native
+implementation, or Perl's own implementation (borrowed from the public
+domain implementation of INN).
+.IP \(bu 4
+Do not use \fBsprintf()\fR or \fBvsprintf()\fR
+.Sp
+If you really want just plain byte strings, use \fBmy_snprintf()\fR and
+\&\fBmy_vsnprintf()\fR instead, which will try to use \fBsnprintf()\fR and
+\&\fBvsnprintf()\fR if those safer APIs are available. If you want something
+fancier than a plain byte string, use
+\&\f(CW\*(C`Perl_form\*(C'\fR() or SVs and
+\&\f(CWPerl_sv_catpvf()\fR.
+.Sp
+Note that glibc \f(CWprintf()\fR, \f(CWsprintf()\fR, etc. are buggy before glibc
+version 2.17. They won't allow a \f(CW\*(C`%.s\*(C'\fR format with a precision to
+create a string that isn't valid UTF\-8 if the current underlying locale
+of the program is UTF\-8. What happens is that the \f(CW%s\fR and its operand are
+simply skipped without any notice.
+<https://sourceware.org/bugzilla/show_bug.cgi?id=6530>.
+.IP \(bu 4
+Do not use \fBatoi()\fR
+.Sp
+Use \fBgrok_atoUV()\fR instead. \fBatoi()\fR has ill-defined behavior on overflows,
+and cannot be used for incremental parsing. It is also affected by locale,
+which is bad.
+.IP \(bu 4
+Do not use \fBstrtol()\fR or \fBstrtoul()\fR
+.Sp
+Use \fBgrok_atoUV()\fR instead. \fBstrtol()\fR and \fBstrtoul()\fR (and their IV/UV\-friendly
+macro disguises, \fBStrtol()\fR and \fBStrtoul()\fR, or \fBAtol()\fR and \fBAtoul()\fR) are
+affected by locale, which is bad.
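+.PP
+The problems listed for \fBatoi()\fR above (ill\-defined overflow behaviour, no
+support for incremental parsing, locale sensitivity) can be illustrated with a
+small standalone parser. This is a sketch only: \f(CW\*(C`parse_uv_sketch\*(C'\fR is a
+hypothetical name, \f(CW\*(C`unsigned long long\*(C'\fR stands in for UV, and neither the
+code nor its exact contract is that of perl's \fBgrok_atoUV()\fR, which is what
+core code should actually call.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical sketch, not perl's grok_atoUV(): parse an unsigned
 * decimal number at s. Only the digits '0'..'9' are accepted (C
 * guarantees they are contiguous, and no locale is consulted). On
 * overflow it fails instead of wrapping, and on success it stores the
 * first unparsed position in *endp so the caller can keep parsing.
 * Returns false if there are no digits or the value exceeds ULLONG_MAX. */
bool parse_uv_sketch(const char *s, unsigned long long *valp,
                     const char **endp)
{
    unsigned long long val = 0;
    const char *p = s;

    if (*p < '0' || *p > '9')
        return false;                   /* no digits at all */

    while (*p >= '0' && *p <= '9') {
        unsigned digit = (unsigned)(*p - '0');
        /* val * 10 + digit <= ULLONG_MAX  <=>  val <= (MAX - digit) / 10 */
        if (val > (ULLONG_MAX - digit) / 10)
            return false;               /* would overflow */
        val = val * 10 + digit;
        p++;
    }

    *valp = val;
    *endp = p;
    return true;
}
```

+Parsing \f(CW"123,456"\fR yields 123 with \f(CW\*(C`*endp\*(C'\fR pointing at the comma,
+so a second call starting just past it yields 456.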
+.SH DEBUGGING
+.IX Header "DEBUGGING"
+You can compile a special debugging version of Perl, which allows you
+to use the \f(CW\*(C`\-D\*(C'\fR option of Perl to tell more about what Perl is doing.
+But sometimes there is no alternative but to dive in with a debugger,
+either to see the stack trace of a core dump (very useful in a bug
+report), or to figure out what went wrong before the core dump
+happened, or how we ended up with wrong or unexpected results.
+.SS "Poking at Perl"
+.IX Subsection "Poking at Perl"
+To really poke around with Perl, you'll probably want to build Perl for
+debugging, like this:
+.PP
+.Vb 2
+\& ./Configure \-d \-DDEBUGGING
+\& make
+.Ve
+.PP
+\&\f(CW\*(C`\-DDEBUGGING\*(C'\fR turns on the C compiler's \f(CW\*(C`\-g\*(C'\fR flag to have it produce
+debugging information which will allow us to step through a running
+program, and to see which C function we are in (without the debugging
+information we might see only the numerical addresses of the functions,
+which is not very helpful). It will also turn on the \f(CW\*(C`DEBUGGING\*(C'\fR
+compilation symbol which enables all the internal debugging code in Perl.
+There are a whole bunch of things you can debug with this:
+perlrun lists them all, and the best way to find out
+about them is to play about with them. The most useful options are
+probably
+.PP
+.Vb 5
+\& l Context (loop) stack processing
+\& s Stack snapshots (with v, displays all stacks)
+\& t Trace execution
+\& o Method and overloading resolution
+\& c String/numeric conversions
+.Ve
+.PP
+For example
+.PP
+.Vb 8
+\& $ perl \-Dst \-e \*(Aq$a + 1\*(Aq
+\& ....
+\& (\-e:1) gvsv(main::a)
+\& => UNDEF
+\& (\-e:1) const(IV(1))
+\& => UNDEF IV(1)
+\& (\-e:1) add
+\& => NV(1)
+.Ve
+.PP
+Some of the functionality of the debugging code can be achieved with a
+non-debugging perl by using XS modules:
+.PP
+.Vb 2
+\& \-Dr => use re \*(Aqdebug\*(Aq
+\& \-Dx => use O \*(AqDebug\*(Aq
+.Ve
+.SS "Using a source-level debugger"
+.IX Subsection "Using a source-level debugger"
+If the debugging output of \f(CW\*(C`\-D\*(C'\fR doesn't help you, it's time to step
+through perl's execution with a source-level debugger.
+.IP \(bu 3
+We'll use \f(CW\*(C`gdb\*(C'\fR for our examples here; the principles will apply to
+any debugger (many vendors call their debugger \f(CW\*(C`dbx\*(C'\fR), but check the
+manual of the one you're using.
+.PP
+To fire up the debugger, type
+.PP
+.Vb 1
+\& gdb ./perl
+.Ve
+.PP
+Or if you have a core dump:
+.PP
+.Vb 1
+\& gdb ./perl core
+.Ve
+.PP
+You'll want to do that in your Perl source tree so the debugger can
+read the source code. You should see the copyright message, followed by
+the prompt.
+.PP
+.Vb 1
+\& (gdb)
+.Ve
+.PP
+\&\f(CW\*(C`help\*(C'\fR will get you into the documentation, but here are the most
+useful commands:
+.IP \(bu 3
+run [args]
+.Sp
+Run the program with the given arguments.
+.IP \(bu 3
+break function_name
+.IP \(bu 3
+break source.c:xxx
+.Sp
+Tells the debugger that we'll want to pause execution when we reach
+either the named function (but see "Internal Functions" in perlguts!) or
+the given line in the named source file.
+.IP \(bu 3
+step
+.Sp
+Steps through the program a line at a time.
+.IP \(bu 3
+next
+.Sp
+Steps through the program a line at a time, without descending into
+functions.
+.IP \(bu 3
+continue
+.Sp
+Run until the next breakpoint.
+.IP \(bu 3
+finish
+.Sp
+Run until the end of the current function, then stop again.
+.IP \(bu 3
+\&'enter'
+.Sp
+Just pressing Enter will do the most recent operation again \- it's a
+blessing when stepping through miles of source code.
+.IP \(bu 3
+ptype
+.Sp
+Prints the C definition of the argument given.
+.Sp
+.Vb 10
+\& (gdb) ptype PL_op
+\& type = struct op {
+\& OP *op_next;
+\& OP *op_sibparent;
+\& OP *(*op_ppaddr)(void);
+\& PADOFFSET op_targ;
+\& unsigned int op_type : 9;
+\& unsigned int op_opt : 1;
+\& unsigned int op_slabbed : 1;
+\& unsigned int op_savefree : 1;
+\& unsigned int op_static : 1;
+\& unsigned int op_folded : 1;
+\& unsigned int op_spare : 2;
+\& U8 op_flags;
+\& U8 op_private;
+\& } *
+.Ve
+.IP \(bu 3
+print
+.Sp
+Execute the given C code and print its results. \fBWARNING\fR: Perl makes
+heavy use of macros, and \fIgdb\fR does not necessarily support macros
+(see "gdb macro support" below). You'll have to substitute them
+yourself, or invoke cpp on the source code files (see "The .i
+Targets"). So, for instance, you can't say
+.Sp
+.Vb 1
+\& print SvPV_nolen(sv)
+.Ve
+.Sp
+but you have to say
+.Sp
+.Vb 1
+\& print Perl_sv_2pv_nolen(sv)
+.Ve
+.PP
+You may find it helpful to have a "macro dictionary", which you can
+produce by saying \f(CW\*(C`cpp \-dM perl.c | sort\*(C'\fR. Even then, \fIcpp\fR won't
+recursively apply those macros for you.
+.SS "gdb macro support"
+.IX Subsection "gdb macro support"
+Recent versions of \fIgdb\fR have fairly good macro support, but in order
+to use it you'll need to compile perl with macro definitions included
+in the debugging information. With \fIgcc\fR version 3.1 or later, this means
+configuring with \f(CW\*(C`\-Doptimize=\-g3\*(C'\fR. Other compilers might use a
+different switch (if they support debugging macros at all).
+.SS "Dumping Perl Data Structures"
+.IX Subsection "Dumping Perl Data Structures"
+One way to get around this macro hell is to use the dumping functions
+in \fIdump.c\fR; these work a little like an internal
+Devel::Peek, but they also cover OPs and other
+structures that you can't get at from Perl. Let's take an example.
+We'll use the \f(CW\*(C`$a = $b + $c\*(C'\fR we used before, but give it a bit of
+context: \f(CW\*(C`$b = "6XXXX"; $c = 2.3;\*(C'\fR. Where's a good place to stop and
+poke around?
+.PP
+What about \f(CW\*(C`pp_add\*(C'\fR, the function we examined earlier that implements the
+\&\f(CW\*(C`+\*(C'\fR operator?
+.PP
+.Vb 2
+\& (gdb) break Perl_pp_add
+\& Breakpoint 1 at 0x46249f: file pp_hot.c, line 309.
+.Ve
+.PP
+Notice we use \f(CW\*(C`Perl_pp_add\*(C'\fR and not \f(CW\*(C`pp_add\*(C'\fR \- see
+"Internal Functions" in perlguts. With the breakpoint in place, we can
+run our program:
+.PP
+.Vb 1
+\& (gdb) run \-e \*(Aq$b = "6XXXX"; $c = 2.3; $a = $b + $c\*(Aq
+.Ve
+.PP
+Lots of junk will go past as gdb reads in the relevant source files and
+libraries, and then:
+.PP
+.Vb 5
+\& Breakpoint 1, Perl_pp_add () at pp_hot.c:309
+\& 1396 dSP; dATARGET; bool useleft; SV *svl, *svr;
+\& (gdb) step
+\& 311 dPOPTOPnnrl_ul;
+\& (gdb)
+.Ve
+.PP
+We looked at this bit of code before, and we said that
+\&\f(CW\*(C`dPOPTOPnnrl_ul\*(C'\fR arranges for two \f(CW\*(C`NV\*(C'\fRs to be placed into \f(CW\*(C`left\*(C'\fR and
+\&\f(CW\*(C`right\*(C'\fR \- let's slightly expand it:
+.PP
+.Vb 3
+\& #define dPOPTOPnnrl_ul NV right = POPn; \e
+\& SV *leftsv = TOPs; \e
+\& NV left = USE_LEFT(leftsv) ? SvNV(leftsv) : 0.0
+.Ve
+.PP
+\&\f(CW\*(C`POPn\*(C'\fR takes the SV from the top of the stack and obtains its NV
+either directly (if \f(CW\*(C`SvNOK\*(C'\fR is set) or by calling the \f(CW\*(C`sv_2nv\*(C'\fR
+function. \f(CW\*(C`TOPs\*(C'\fR takes the next SV from the top of the stack \- yes,
+\&\f(CW\*(C`POPn\*(C'\fR uses \f(CW\*(C`TOPs\*(C'\fR \- but doesn't remove it. We then use \f(CW\*(C`SvNV\*(C'\fR to
+get the NV from \f(CW\*(C`leftsv\*(C'\fR in the same way as before \- yes, \f(CW\*(C`POPn\*(C'\fR uses
+\&\f(CW\*(C`SvNV\*(C'\fR.
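+.PP
+The pop\-then\-peek behaviour described above can be modelled with a toy value
+stack in standalone C. This is a hypothetical illustration only:
+\&\f(CW\*(C`ToyStack\*(C'\fR, \f(CW\*(C`toy_pop\*(C'\fR and \f(CW\*(C`toy_top\*(C'\fR are invented stand\-ins for
+perl's argument stack, \f(CW\*(C`POPn\*(C'\fR and \f(CW\*(C`TOPs\*(C'\fR, and plain doubles replace
+SVs, so no \f(CW\*(C`sv_2nv\*(C'\fR\-style conversion takes place.

```c
#include <stddef.h>

/* Toy model of perl's argument stack: doubles stand in for SVs/NVs. */
typedef struct {
    double items[16];
    size_t sp;              /* number of items currently on the stack */
} ToyStack;

/* Like POPn: take the top value and remove it from the stack. */
static double toy_pop(ToyStack *st)
{
    return st->items[--st->sp];
}

/* Like TOPs: look at the (new) top slot without removing it. */
static double *toy_top(ToyStack *st)
{
    return &st->items[st->sp - 1];
}

/* A toy "add": pop the right operand, peek at the left one, and
 * overwrite the left slot with the sum, leaving one result on the stack. */
void toy_pp_add(ToyStack *st)
{
    double right = toy_pop(st);
    double *left = toy_top(st);
    *left = *left + right;
}
```

+Starting from a stack holding 6.0 and 2.5, \f(CW\*(C`toy_pp_add\*(C'\fR leaves a single
+item, 8.5, on the stack.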
+.PP
+Since we don't have an NV for \f(CW$b\fR, we'll have to use \f(CW\*(C`sv_2nv\*(C'\fR to
+convert it. If we step again, we'll find ourselves there:
+.PP
+.Vb 4
+\& (gdb) step
+\& Perl_sv_2nv (sv=0xa0675d0) at sv.c:1669
+\& 1669 if (!sv)
+\& (gdb)
+.Ve
+.PP
+We can now use \f(CW\*(C`Perl_sv_dump\*(C'\fR to investigate the SV:
+.PP
+.Vb 8
+\& (gdb) print Perl_sv_dump(sv)
+\& SV = PV(0xa057cc0) at 0xa0675d0
+\& REFCNT = 1
+\& FLAGS = (POK,pPOK)
+\& PV = 0xa06a510 "6XXXX"\e0
+\& CUR = 5
+\& LEN = 6
+\& $1 = void
+.Ve
+.PP
+We know we're going to get \f(CW6\fR from this, so let's finish the
+subroutine:
+.PP
+.Vb 4
+\& (gdb) finish
+\& Run till exit from #0 Perl_sv_2nv (sv=0xa0675d0) at sv.c:1671
+\& 0x462669 in Perl_pp_add () at pp_hot.c:311
+\& 311 dPOPTOPnnrl_ul;
+.Ve
+.PP
+We can also dump out this op: the current op is always stored in
+\&\f(CW\*(C`PL_op\*(C'\fR, and we can dump it with \f(CW\*(C`Perl_op_dump\*(C'\fR. This gives
+output similar to that of the CPAN module B::Debug.
+.PP
+.Vb 10
+\& (gdb) print Perl_op_dump(PL_op)
+\& {
+\& 13 TYPE = add ===> 14
+\& TARG = 1
+\& FLAGS = (SCALAR,KIDS)
+\& {
+\& TYPE = null ===> (12)
+\& (was rv2sv)
+\& FLAGS = (SCALAR,KIDS)
+\& {
+\& 11 TYPE = gvsv ===> 12
+\& FLAGS = (SCALAR)
+\& GV = main::b
+\& }
+\& }
+.Ve
+.SS "Using gdb to look at specific parts of a program"
+.IX Subsection "Using gdb to look at specific parts of a program"
+With the example above, you knew to look for \f(CW\*(C`Perl_pp_add\*(C'\fR, but what if
+there were multiple calls to it all over the place, or you didn't know what
+the op was you were looking for?
+.PP
+One way to do this is to inject a call to a rarely used op somewhere near what you're looking
+for. For example, you could add \f(CW\*(C`study\*(C'\fR before your method:
+.PP
+.Vb 1
+\& study;
+.Ve
+.PP
+And in gdb do:
+.PP
+.Vb 1
+\& (gdb) break Perl_pp_study
+.Ve
+.PP
+And then step until you hit what you're
+looking for. This works well in a loop
+if you want to only break at certain iterations:
+.PP
+.Vb 3
+\& for my $c (1..100) {
+\& study if $c == 50;
+\& }
+.Ve
+.SS "Using gdb to look at what the parser/lexer are doing"
+.IX Subsection "Using gdb to look at what the parser/lexer are doing"
+If you want to see what perl is doing when parsing/lexing your code, you can
+use \f(CW\*(C`BEGIN {}\*(C'\fR:
+.PP
+.Vb 3
+\& print "Before\en";
+\& BEGIN { study; }
+\& print "After\en";
+.Ve
+.PP
+And in gdb:
+.PP
+.Vb 1
+\& (gdb) break Perl_pp_study
+.Ve
+.PP
+If you want to see what the parser/lexer is doing inside of \f(CW\*(C`if\*(C'\fR blocks and
+the like you need to be a little trickier:
+.PP
+.Vb 1
+\& if ($a && $b && do { BEGIN { study } 1 } && $c) { ... }
+.Ve
+.SH "SOURCE CODE STATIC ANALYSIS"
+.IX Header "SOURCE CODE STATIC ANALYSIS"
+Various tools exist for analysing C source code \fBstatically\fR, as
+opposed to \fBdynamically\fR, that is, without executing the code. It is
+possible to detect resource leaks, undefined behaviour, type
+mismatches, portability problems, code paths that would cause illegal
+memory accesses, and other similar problems just by parsing the C code
+and examining what the resulting graph tells about the
+execution and data flows. As a matter of fact, this is exactly how C
+compilers know to give warnings about dubious code.
+.SS lint
+.IX Subsection "lint"
+The good old C code quality inspector, \f(CW\*(C`lint\*(C'\fR, is available on several
+platforms, but please be aware that there are several different
+implementations of it by different vendors, which means that the flags
+are not identical across different platforms.
+.PP
+There is a \f(CW\*(C`lint\*(C'\fR target in Makefile, but you may have to
+diddle with the flags (see above).
+.SS Coverity
+.IX Subsection "Coverity"
+Coverity (<http://www.coverity.com/>) is a product similar to lint. As
+a testbed for their product they periodically check several open source
+projects, and they give open source developers accounts on the
+defect databases.
+.PP
+There is a Coverity setup for the perl5 project:
+<https://scan.coverity.com/projects/perl5>
+.SS "HP-UX cadvise (Code Advisor)"
+.IX Subsection "HP-UX cadvise (Code Advisor)"
+HP has a C/C++ static analyzer product for HP-UX called Code Advisor.
+(Link not given here because the URL is horribly long and seems horribly
+unstable; use the search engine of your choice to find it.) The use of
+the \f(CW\*(C`cadvise_cc\*(C'\fR recipe with \f(CW\*(C`Configure ... \-Dcc=./cadvise_cc\*(C'\fR
+(see cadvise "User Guide") is recommended; as is the use of \f(CW\*(C`+wall\*(C'\fR.
+.SS "cpd (cut-and-paste detector)"
+.IX Subsection "cpd (cut-and-paste detector)"
+The cpd tool detects cut-and-paste coding. If one instance of the
+cut-and-pasted code changes, all the other spots should probably be
+changed, too. Therefore such code should probably be turned into a
+subroutine or a macro.
+.PP
+cpd (<https://pmd.github.io/latest/pmd_userdocs_cpd.html>) is part of the pmd project
+(<https://pmd.github.io/>). pmd was originally written for static
+analysis of Java code, but later the cpd part of it was extended to
+parse C and C++ as well.
+.PP
+Download the pmd\-bin\-X.Y.zip from the SourceForge site, extract the
+pmd\-X.Y.jar from it, and then run that on source code thusly:
+.PP
+.Vb 2
+\& java \-cp pmd\-X.Y.jar net.sourceforge.pmd.cpd.CPD \e
+\& \-\-minimum\-tokens 100 \-\-files /some/where/src \-\-language c > cpd.txt
+.Ve
+.PP
+You may run into memory limits, in which case you should use the \-Xmx
+option:
+.PP
+.Vb 1
+\& java \-Xmx512M ...
+.Ve
+.SS "gcc warnings"
+.IX Subsection "gcc warnings"
+Though much can be written about the inconsistency and coverage
+problems of gcc warnings (like \f(CW\*(C`\-Wall\*(C'\fR not meaning "all the warnings",
+or some common portability problems not being covered by \f(CW\*(C`\-Wall\*(C'\fR, or
+\&\f(CW\*(C`\-ansi\*(C'\fR and \f(CW\*(C`\-pedantic\*(C'\fR both being a poorly defined collection of
+warnings, and so forth), gcc is still a useful tool in keeping our
+coding nose clean.
+.PP
+The \f(CW\*(C`\-Wall\*(C'\fR option is on by default.
+.PP
+It would be nice for \f(CW\*(C`\-pedantic\*(C'\fR to be on always, but unfortunately it is not
+safe on all platforms: for example, it causes fatal conflicts with the system headers
+on some of them (Solaris being a prime example). If Configure \f(CW\*(C`\-Dgccansipedantic\*(C'\fR is used,
+the \f(CW\*(C`cflags\*(C'\fR frontend selects \f(CW\*(C`\-pedantic\*(C'\fR for the platforms where it is known
+to be safe.
+.PP
+The following extra flags are added:
+.IP \(bu 4
+\&\f(CW\*(C`\-Wendif\-labels\*(C'\fR
+.IP \(bu 4
+\&\f(CW\*(C`\-Wextra\*(C'\fR
+.IP \(bu 4
+\&\f(CW\*(C`\-Wc++\-compat\*(C'\fR
+.IP \(bu 4
+\&\f(CW\*(C`\-Wwrite\-strings\*(C'\fR
+.IP \(bu 4
+\&\f(CW\*(C`\-Werror=pointer\-arith\*(C'\fR
+.IP \(bu 4
+\&\f(CW\*(C`\-Werror=vla\*(C'\fR
+.PP
+The following flags would be nice to have but they would first need
+their own Augean stablemaster:
+.IP \(bu 4
+\&\f(CW\*(C`\-Wshadow\*(C'\fR
+.IP \(bu 4
+\&\f(CW\*(C`\-Wstrict\-prototypes\*(C'\fR
+.PP
+The \f(CW\*(C`\-Wtraditional\*(C'\fR option is another example of the annoying tendency of gcc
+to bundle a lot of warnings under one switch (it would be impossible to
+deploy in practice because it would complain a lot), but it does contain
+some warnings that would be beneficial to have available on their own,
+such as the warning about string constants inside macros containing the
+macro arguments: this behaved differently pre-ANSI than it does in
+ANSI, and some C compilers are still in transition, AIX being an
+example.
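+.PP
+The ANSI behaviour can be checked with a small standalone C sketch: under
+ANSI C a macro parameter is never substituted inside a string literal, and
+the \f(CW\*(C`#\*(C'\fR (stringizing) operator is the portable way to get an argument
+into a string. \f(CW\*(C`FOO_LABEL\*(C'\fR, \f(CW\*(C`SHOW_INT\*(C'\fR and the two helper functions
+are hypothetical names chosen for illustration.

```c
#include <stddef.h>
#include <stdio.h>

/* Under ANSI C the "n" inside the literal is left alone, so this always
 * expands to "number = n". Traditional (pre-ANSI) preprocessors could
 * substitute the argument inside the literal, which -Wtraditional flags. */
#define FOO_LABEL(n) "number = n"

/* Portable way to put an argument into a string: #n turns the tokens of
 * the argument into a string literal, which is then concatenated. */
#define SHOW_INT(buf, len, n) snprintf((buf), (len), #n " = %d", (n))

const char *foo_label(void)
{
    return FOO_LABEL(10);
}

int show_sum(char *buf, size_t len)
{
    return SHOW_INT(buf, len, 4 + 6);   /* writes "4 + 6 = 10" */
}
```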
+.SS "Warnings of other C compilers"
+.IX Subsection "Warnings of other C compilers"
+Other C compilers (yes, there \fBare\fR other C compilers than gcc) often
+have their "strict ANSI" or "strict ANSI with some portability
+extensions" modes on: for example, the Sun Workshop compiler has its \f(CW\*(C`\-Xa\*(C'\fR
+mode on (though implicitly), and the DEC (these days, HP...) compiler has its
+\&\f(CW\*(C`\-std1\*(C'\fR mode on.
+.SH "MEMORY DEBUGGERS"
+.IX Header "MEMORY DEBUGGERS"
+\&\fBNOTE 1\fR: Running under older memory debuggers such as Purify,
+valgrind or Third Degree greatly slows down the execution: seconds
+become minutes, minutes become hours. For example, as of Perl 5.8.1,
+ext/Encode/t/Unicode.t takes extraordinarily long to complete under
+e.g. Purify, Third Degree, and valgrind. Under valgrind it takes more
+than six hours, even on a snappy computer. The said test must be doing
+something that is quite unfriendly for memory debuggers. If you don't
+feel like waiting, you can simply kill the perl process.
+Roughly, valgrind slows down execution by a factor of 10, AddressSanitizer by
+a factor of 2.
+.PP
+\&\fBNOTE 2\fR: To minimize the number of memory leak false alarms (see
+"PERL_DESTRUCT_LEVEL" for more information), you have to set the
+environment variable PERL_DESTRUCT_LEVEL to 2. For example, like this:
+.PP
+.Vb 1
+\& env PERL_DESTRUCT_LEVEL=2 valgrind ./perl \-Ilib ...
+.Ve
+.PP
+\&\fBNOTE 3\fR: There are known memory leaks when there are compile-time
+errors within eval or require; seeing \f(CW\*(C`S_doeval\*(C'\fR in the call stack is
+a good sign of these. Fixing these leaks is non-trivial, unfortunately,
+but they must be fixed eventually.
+.PP
+\&\fBNOTE 4\fR: DynaLoader will not clean up after itself completely
+unless Perl is built with the Configure option
+\&\f(CW\*(C`\-Accflags=\-DDL_UNLOAD_ALL_AT_EXIT\*(C'\fR.
+.SS valgrind
+.IX Subsection "valgrind"
+The valgrind tool can be used to find out both memory leaks and illegal
+heap memory accesses. As of version 3.3.0, Valgrind only supports Linux
+on x86, x86\-64 and PowerPC and Darwin (OS X) on x86 and x86\-64. The
+special "test.valgrind" target can be used to run the tests under
+valgrind. Found errors and memory leaks are logged in files named
+\&\fItestfile.valgrind\fR and by default output is displayed inline.
+.PP
+Example usage:
+.PP
+.Vb 1
+\& make test.valgrind
+.Ve
+.PP
+Since valgrind adds significant overhead, tests will take much longer to
+run. The valgrind tests support being run in parallel to help with this:
+.PP
+.Vb 1
+\& TEST_JOBS=9 make test.valgrind
+.Ve
+.PP
+Note that the above two invocations will be very verbose, as reporting of reachable
+memory and leak checking are enabled by default. If you want to just see
+pure errors, try:
+.PP
+.Vb 2
+\& VG_OPTS=\*(Aq\-q \-\-leak\-check=no \-\-show\-reachable=no\*(Aq TEST_JOBS=9 \e
+\& make test.valgrind
+.Ve
+.PP
+Valgrind also provides a cachegrind tool, invoked on perl as:
+.PP
+.Vb 1
+\& VG_OPTS=\-\-tool=cachegrind make test.valgrind
+.Ve
+.PP
+As system libraries (most notably glibc) also trigger errors,
+valgrind allows you to suppress such errors using suppression files. The
+default suppression file that comes with valgrind already catches a lot
+of them. Some additional suppressions are defined in \fIt/perl.supp\fR.
+.PP
+To get valgrind and for more information see
+.PP
+.Vb 1
+\& http://valgrind.org/
+.Ve
+.SS AddressSanitizer
+.IX Subsection "AddressSanitizer"
+AddressSanitizer ("ASan") consists of a compiler instrumentation module
+and a run-time \f(CW\*(C`malloc\*(C'\fR library. ASan is available for a variety of
+architectures, operating systems, and compilers (see project link below).
+It checks for unsafe memory usage, such as use after free and buffer
+overflow conditions, and is fast enough that you can easily compile your
+debugging or optimized perl with it. Modern versions of ASan check for
+memory leaks by default on most platforms; elsewhere (e.g. x86_64 OS X)
+this feature can be enabled via \f(CW\*(C`ASAN_OPTIONS=detect_leaks=1\*(C'\fR.
+.PP
+To build perl with AddressSanitizer, your Configure invocation should
+look like:
+.PP
+.Vb 4
+\& sh Configure \-des \-Dcc=clang \e
+\&    \-Accflags="\-fsanitize=address \-fsanitize\-blacklist=\`pwd\`/asan_ignore" \e
+\&    \-Aldflags=\-fsanitize=address \e
+\&    \-Alddlflags=\-shared\e \-fsanitize=address
+.Ve
+.PP
+where these arguments mean:
+.IP \(bu 4
+\&\-Dcc=clang
+.Sp
+This should be replaced by the full path to your clang executable if it
+is not in your path.
+.IP \(bu 4
+\&\-Accflags=\-fsanitize=address
+.Sp
+Compile perl and extensions sources with AddressSanitizer.
+.IP \(bu 4
+\&\-Aldflags=\-fsanitize=address
+.Sp
+Link the perl executable with AddressSanitizer.
+.IP \(bu 4
+\&\-Alddlflags=\-shared\e \-fsanitize=address
+.Sp
+Link dynamic extensions with AddressSanitizer. You must manually
+specify \f(CW\*(C`\-shared\*(C'\fR because passing any \f(CW\*(C`\-Alddlflags\*(C'\fR value will prevent
+Configure from setting a default value for \f(CW\*(C`lddlflags\*(C'\fR, which usually
+contains \f(CW\*(C`\-shared\*(C'\fR (at least on Linux).
+.IP \(bu 4
+\&\-fsanitize\-blacklist=\`pwd\`/asan_ignore
+.Sp
+AddressSanitizer will ignore functions listed in the \f(CW\*(C`asan_ignore\*(C'\fR
+file. (This file should contain a short explanation of why each of
+the functions is listed.)
+.PP
+See also
+<https://github.com/google/sanitizers/wiki/AddressSanitizer>.
+.SH PROFILING
+.IX Header "PROFILING"
+Depending on your platform there are various ways of profiling Perl.
+.PP
+There are two commonly used techniques of profiling executables:
+\&\fIstatistical time-sampling\fR and \fIbasic-block counting\fR.
+.PP
+The first method periodically samples the CPU program counter, and
+since the program counter can be correlated with the code generated
+for functions, we get a statistical view of which functions the
+program is spending its time in. The caveats are that very small/fast
+functions have a lower probability of showing up in the profile, and that
+periodically interrupting the program (this is usually done rather
+frequently, on the scale of milliseconds) imposes an additional
+overhead that may skew the results. The first problem can be alleviated
+by running the code for longer (in general this is a good idea for
+profiling); the second problem is usually kept in check by the
+profiling tools themselves.
+.PP
+The second method divides up the generated code into \fIbasic blocks\fR.
+Basic blocks are sections of code that are entered only at the
+beginning and exited only at the end. For example, a conditional jump
+ends a basic block, and each of its possible destinations starts a new
+one. Basic block profiling usually works by
+\&\fIinstrumenting\fR the code, that is, adding \fIenter basic block #nnnn\fR
+bookkeeping code to the generated code. During the execution of the
+code the basic block counters are then updated appropriately. The
+caveat is that the added extra code can skew the results: again, the
+profiling tools usually try to factor their own effects out of the
+results.
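+.PP
+As a rough illustration of the bookkeeping described above, the following
+C sketch instruments a small function by hand: each basic block increments
+its own counter, much as compiler-inserted \fIenter basic block #nnnn\fR
+code would. The \f(CW\*(C`bb_count\*(C'\fR array and the block numbering are
+hypothetical; tools such as gcc's \f(CW\*(C`\-fprofile\-arcs\*(C'\fR generate comparable
+counting code automatically.
+.PP
+.Vb 28
+\& #include <stdio.h>
+\&
+\& /* Hypothetical per\-block counters, standing in for the bookkeeping
+\&    code that a basic\-block profiler would insert automatically. */
+\& static unsigned long bb_count[3];
+\&
+\& static int classify(int x)
+\& {
+\&     bb_count[0]++;      /* block 0: entry, ends at the conditional jump */
+\&     if (x < 0) {
+\&         bb_count[1]++;  /* block 1: the x < 0 branch */
+\&         return \-1;
+\&     }
+\&     bb_count[2]++;      /* block 2: the fall\-through branch */
+\&     return 1;
+\& }
+\&
+\& int main(void)
+\& {
+\&     int i;
+\&
+\&     for (i = \-3; i < 7; i++)
+\&         classify(i);
+\&     /* prints "block 0: 10, block 1: 3, block 2: 7" */
+\&     printf("block 0: %lu, block 1: %lu, block 2: %lu\en",
+\&            bb_count[0], bb_count[1], bb_count[2]);
+\&     return 0;
+\& }
+.Ve
+.PP
+Calling \f(CW\*(C`classify()\*(C'\fR for the ten values \-3 to 6 leaves the counters at
+10, 3 and 7, which is the kind of per-block execution count that
+basic block profilers report.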
+.SS "Gprof Profiling"
+.IX Subsection "Gprof Profiling"
+\&\fIgprof\fR is a profiling tool available on many Unix platforms which
+uses \fIstatistical time-sampling\fR. You can build a profiled version of
+\&\fIperl\fR by compiling with gcc using the flag \f(CW\*(C`\-pg\*(C'\fR. Either edit
+\&\fIconfig.sh\fR or re-run \fIConfigure\fR. Running the profiled version of
+Perl will create an output file called \fIgmon.out\fR which contains the
+profiling data collected during the execution.
+.PP
+Quick hint:
+.PP
+.Vb 6
+\& $ sh Configure \-des \-Dusedevel \-Accflags=\*(Aq\-pg\*(Aq \e
+\& \-Aldflags=\*(Aq\-pg\*(Aq \-Alddlflags=\*(Aq\-pg \-shared\*(Aq \e
+\& && make perl
+\& $ ./perl ... # creates gmon.out in current directory
+\& $ gprof ./perl > out
+\& $ less out
+.Ve
+.PP
+(\f(CW\*(C`\-shared\*(C'\fR is included in the \f(CW\*(C`\-Alddlflags\*(C'\fR value above because
+it is probably needed until RT #118199 is resolved.)
+.PP
+The \fIgprof\fR tool can then display the collected data in various ways.
+Usually \fIgprof\fR understands the following options:
+.IP \(bu 4
+\&\-a
+.Sp
+Suppress statically defined functions from the profile.
+.IP \(bu 4
+\&\-b
+.Sp
+Suppress the verbose descriptions in the profile.
+.IP \(bu 4
+\&\-e routine
+.Sp
+Exclude the given routine and its descendants from the profile.
+.IP \(bu 4
+\&\-f routine
+.Sp
+Display only the given routine and its descendants in the profile.
+.IP \(bu 4
+\&\-s
+.Sp
+Generate a summary file called \fIgmon.sum\fR which then may be given to
+subsequent gprof runs to accumulate data over several runs.
+.IP \(bu 4
+\&\-z
+.Sp
+Display routines that have zero usage.
+.PP
+For a more detailed explanation of the available commands and output
+formats, see your own local documentation of \fIgprof\fR.
+.SS "GCC gcov Profiling"
+.IX Subsection "GCC gcov Profiling"
+\&\fIbasic block profiling\fR is officially available in gcc 3.0 and later.
+You can build a profiled version of \fIperl\fR by compiling using gcc with
+the flags \f(CW\*(C`\-fprofile\-arcs \-ftest\-coverage\*(C'\fR. Either edit \fIconfig.sh\fR
+or re-run \fIConfigure\fR.
+.PP
+Quick hint:
+.PP
+.Vb 9
+\& $ sh Configure \-des \-Dusedevel \-Doptimize=\*(Aq\-g\*(Aq \e
+\& \-Accflags=\*(Aq\-fprofile\-arcs \-ftest\-coverage\*(Aq \e
+\& \-Aldflags=\*(Aq\-fprofile\-arcs \-ftest\-coverage\*(Aq \e
+\& \-Alddlflags=\*(Aq\-fprofile\-arcs \-ftest\-coverage \-shared\*(Aq \e
+\& && make perl
+\& $ rm \-f regexec.c.gcov regexec.gcda
+\& $ ./perl ...
+\& $ gcov regexec.c
+\& $ less regexec.c.gcov
+.Ve
+.PP
+(\f(CW\*(C`\-shared\*(C'\fR is included in the \f(CW\*(C`\-Alddlflags\*(C'\fR value above because
+it is probably needed until RT #118199 is resolved.)
+.PP
+Running the profiled version of Perl will cause profile output to be
+generated. For each source file an accompanying \fI.gcda\fR file will be
+created.
+.PP
+To display the results you use the \fIgcov\fR utility (which should be
+installed if you have gcc 3.0 or newer installed). \fIgcov\fR is run on
+source code files, like this:
+.PP
+.Vb 1
+\& gcov sv.c
+.Ve
+.PP
+which will cause \fIsv.c.gcov\fR to be created. The \fI.gcov\fR files contain
+the source code annotated with per-line execution counts; lines that were
+never executed are marked with "#####". If you want to generate \fI.gcov\fR
+files for all profiled object files, you can run something like this:
+.PP
+.Vb 3
+\& for file in \`find . \-name \e*.gcno\`
+\& do sh \-c "cd \`dirname $file\` && gcov \`basename $file .gcno\`"
+\& done
+.Ve
+.PP
+Useful options of \fIgcov\fR include \f(CW\*(C`\-b\*(C'\fR, which writes branch frequencies
+to the output file and summarises branch and function call coverage, and
+\&\f(CW\*(C`\-c\*(C'\fR, which reports those branch frequencies as actual counts rather
+than percentages. For more information
+on the use of \fIgcov\fR and basic block profiling with gcc, see the
+latest GNU CC manual. As of gcc 4.8, this is at
+<http://gcc.gnu.org/onlinedocs/gcc/Gcov\-Intro.html#Gcov\-Intro>
+.SS "callgrind profiling"
+.IX Subsection "callgrind profiling"
+callgrind is a valgrind tool for profiling source code. Paired
+with kcachegrind (a Qt-based UI), it gives you an overview of
+where code is taking up time, as well as the ability
+to examine callers, call trees, and more. One of its benefits
+is that you can use it on perl and XS modules that have not been
+compiled with debugging symbols.
+.PP
+If perl is compiled with debugging symbols (\f(CW\*(C`\-g\*(C'\fR), you can view
+the annotated source and click around, much like Devel::NYTProf's
+HTML output.
+.PP
+For basic usage:
+.PP
+.Vb 1
+\& valgrind \-\-tool=callgrind ./perl ...
+.Ve
+.PP
+By default it will write output to \fIcallgrind.out.PID\fR, but you
+can change that with \f(CW\*(C`\-\-callgrind\-out\-file=...\*(C'\fR.
+.PP
+To view the data, do:
+.PP
+.Vb 1
+\& kcachegrind callgrind.out.PID
+.Ve
+.PP
+If you'd prefer to view the data in a terminal, you can use
+\&\fIcallgrind_annotate\fR. In its basic form:
+.PP
+.Vb 1
+\& callgrind_annotate callgrind.out.PID | less
+.Ve
+.PP
+Some useful options are:
+.IP \(bu 4
+\&\-\-threshold
+.Sp
+The percentage of counts (of the primary sort event) you are interested in.
+The default is 99%; 100% might show things that otherwise seem to be missing.
+.IP \(bu 4
+\&\-\-auto
+.Sp
+Annotate all source files containing functions that helped reach
+the event count threshold.
+.SH "MISCELLANEOUS TRICKS"
+.IX Header "MISCELLANEOUS TRICKS"
+.SS PERL_DESTRUCT_LEVEL
+.IX Subsection "PERL_DESTRUCT_LEVEL"
+If you want to run any of the tests yourself manually using e.g.
+valgrind, please note that by default perl \fBdoes not\fR explicitly
+clean up all the memory it has allocated (such as global memory arenas)
+but instead lets the \fBexit()\fR of the whole program "take care" of such
+allocations, also known as "global destruction of objects".
+.PP
+There is a way to tell perl to do complete cleanup: set the environment
+variable PERL_DESTRUCT_LEVEL to a non-zero value. The t/TEST wrapper
+sets this to 2, and you should do the same if you don't want to see
+the "global leaks". For example, when running under valgrind:
+.PP
+.Vb 1
+\& env PERL_DESTRUCT_LEVEL=2 valgrind ./perl \-Ilib t/foo/bar.t
+.Ve
+.PP
+(Note: the mod_perl apache module also uses this environment variable
+for its own purposes and extends its semantics. Refer to the mod_perl
+documentation for more information. Also, spawned threads do the
+equivalent of setting this variable to the value 1.)
+.PP
+If, at the end of a run, you get the message \fIN scalars leaked\fR, you
+can recompile with \f(CW\*(C`\-DDEBUG_LEAKING_SCALARS\*(C'\fR
+(\f(CW\*(C`Configure \-Accflags=\-DDEBUG_LEAKING_SCALARS\*(C'\fR), which will cause the
+addresses of all those leaked SVs to be dumped along with details as to
+where each SV was originally allocated. This information is also
+displayed by Devel::Peek. Note that the extra details recorded with
+each SV increase memory usage, so it shouldn't be used in production
+environments. It also converts \f(CWnew_SV()\fR from a macro into a real
+function, so you can use your favourite debugger to discover where
+those pesky SVs were allocated.
+.PP
+If you see that you're leaking memory at runtime, but neither valgrind
+nor \f(CW\*(C`\-DDEBUG_LEAKING_SCALARS\*(C'\fR will find anything, you're probably
+leaking SVs that are still reachable and will be properly cleaned up
+during destruction of the interpreter. In such cases, using the \f(CW\*(C`\-Dm\*(C'\fR
+switch can point you to the source of the leak. If the executable was
+built with \f(CW\*(C`\-DDEBUG_LEAKING_SCALARS\*(C'\fR, \f(CW\*(C`\-Dm\*(C'\fR will output SV
+allocations in addition to memory allocations. Each SV allocation has a
+distinct serial number that will be written on creation and destruction
+of the SV. So if you're executing the leaking code in a loop, you need
+to look for SVs that are created, but never destroyed, within each
+cycle. If such an SV is found, set a conditional breakpoint within
+\&\f(CWnew_SV()\fR and make it break only when \f(CW\*(C`PL_sv_serial\*(C'\fR is equal to the
+serial number of the leaking SV. Then you will catch the interpreter in
+exactly the state where the leaking SV is allocated, which is
+sufficient in many cases to find the source of the leak.
+.PP
+Because \f(CW\*(C`\-Dm\*(C'\fR uses the PerlIO layer for output, it will itself
+allocate quite a few SVs, which are hidden to avoid recursion. You
+can bypass the PerlIO layer if you use the SV logging provided by
+\&\f(CW\*(C`\-DPERL_MEM_LOG\*(C'\fR instead.
+.SS PERL_MEM_LOG
+.IX Subsection "PERL_MEM_LOG"
+If compiled with \f(CW\*(C`\-DPERL_MEM_LOG\*(C'\fR (\f(CW\*(C`\-Accflags=\-DPERL_MEM_LOG\*(C'\fR), both
+memory and SV allocations go through logging functions, which is
+handy for breakpoint setting.
+.PP
+Unless \f(CW\*(C`\-DPERL_MEM_LOG_NOIMPL\*(C'\fR (\f(CW\*(C`\-Accflags=\-DPERL_MEM_LOG_NOIMPL\*(C'\fR) is
+also defined, the logging functions read \f(CW$ENV{PERL_MEM_LOG}\fR to
+determine whether to log the event, and if so how:
+.PP
+.Vb 6
+\&  $ENV{PERL_MEM_LOG} =~ /m/           Log all memory ops
+\&  $ENV{PERL_MEM_LOG} =~ /s/           Log all SV ops
+\&  $ENV{PERL_MEM_LOG} =~ /c/           Additionally log C backtrace for
+\&                                      new_SV events
+\&  $ENV{PERL_MEM_LOG} =~ /t/           Include timestamp in log
+\&  $ENV{PERL_MEM_LOG} =~ /^(\ed+)/      Write to the given FD (default is 2)
+.Ve
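+.PP
+To make the table above more concrete, here is a minimal C sketch of how a
+value such as \f(CW\*(C`PERL_MEM_LOG=3mt\*(C'\fR could be decoded according to those
+rules: leading digits select the file descriptor and letters select what
+to log. It is not perl's own implementation, and the
+\&\f(CW\*(C`mem_log_opts\*(C'\fR and \f(CW\*(C`parse_mem_log\*(C'\fR names are hypothetical.
+.PP
+.Vb 37
+\& #include <ctype.h>
+\& #include <stdio.h>
+\& #include <stdlib.h>
+\& #include <string.h>
+\&
+\& /* Hypothetical decoded form of a PERL_MEM_LOG value. */
+\& struct mem_log_opts {
+\&     int fd;         /* leading digits; default 2 (stderr) */
+\&     int log_mem;    /* m: log all memory ops */
+\&     int log_sv;     /* s: log all SV ops */
+\&     int backtrace;  /* c: C backtrace for new_SV events */
+\&     int timestamp;  /* t: include timestamps */
+\& };
+\&
+\& static struct mem_log_opts parse_mem_log(const char *val)
+\& {
+\&     struct mem_log_opts o = { 2, 0, 0, 0, 0 };
+\&
+\&     if (val == NULL)
+\&         return o;                      /* unset: log nothing */
+\&     if (isdigit((unsigned char)val[0]))
+\&         o.fd = atoi(val);              /* like /^(\ed+)/ */
+\&     o.log_mem   = strchr(val, \*(Aqm\*(Aq) != NULL;
+\&     o.log_sv    = strchr(val, \*(Aqs\*(Aq) != NULL;
+\&     o.backtrace = strchr(val, \*(Aqc\*(Aq) != NULL;
+\&     o.timestamp = strchr(val, \*(Aqt\*(Aq) != NULL;
+\&     return o;
+\& }
+\&
+\& int main(void)
+\& {
+\&     struct mem_log_opts o = parse_mem_log(getenv("PERL_MEM_LOG"));
+\&
+\&     printf("fd=%d m=%d s=%d c=%d t=%d\en",
+\&            o.fd, o.log_mem, o.log_sv, o.backtrace, o.timestamp);
+\&     return 0;
+\& }
+.Ve
+.PP
+Run with \f(CW\*(C`PERL_MEM_LOG=3mt\*(C'\fR in the environment, the sketch prints
+\&\f(CW\*(C`fd=3 m=1 s=0 c=0 t=1\*(C'\fR, i.e. memory operations would be logged, with
+timestamps, to file descriptor 3.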
+.PP
+Memory logging is somewhat similar to \f(CW\*(C`\-Dm\*(C'\fR but is independent of
+\&\f(CW\*(C`\-DDEBUGGING\*(C'\fR, and at a higher level; all uses of \fBNewx()\fR, \fBRenew()\fR, and
+\&\fBSafefree()\fR are logged with the caller's source code file and line
+number (and C function name, if supported by the C compiler). In
+contrast, \f(CW\*(C`\-Dm\*(C'\fR logs directly at the point of \f(CWmalloc()\fR. SV logging is
+similar.
+.PP
+Since the logging doesn't use PerlIO, all SV allocations are logged and
+no extra SV allocations are introduced by enabling the logging. If
+compiled with \f(CW\*(C`\-DDEBUG_LEAKING_SCALARS\*(C'\fR, the serial number for each SV
+allocation is also logged.
+.PP
+The \f(CW\*(C`c\*(C'\fR option uses the \f(CW\*(C`Perl_c_backtrace\*(C'\fR facility, and therefore
+additionally requires the Configure \f(CW\*(C`\-Dusecbacktrace\*(C'\fR compile flag in
+order to access it.
+.SS "DDD over gdb"
+.IX Subsection "DDD over gdb"
+Those debugging perl with the DDD frontend over gdb may find the
+following useful:
+.PP
+You can extend the data conversion shortcuts menu, so for example you
+can display an SV's IV value with one click, without doing any typing.
+To do that, simply edit the ~/.ddd/init file and add, after:
+.PP
+.Vb 6
+\& ! Display shortcuts.
+\& Ddd*gdbDisplayShortcuts: \e
+\& /t () // Convert to Bin\en\e
+\& /d () // Convert to Dec\en\e
+\& /x () // Convert to Hex\en\e
+\& /o () // Convert to Oct\en\e
+.Ve
+.PP
+the following two lines:
+.PP
+.Vb 2
+\& ((XPV*) (())\->sv_any )\->xpv_pv // 2pvx\en\e
+\& ((XPVIV*) (())\->sv_any )\->xiv_iv // 2ivx
+.Ve
+.PP
+so that you can now do ivx and pvx lookups, or you can add the sv_peek
+"conversion" there:
+.PP
+.Vb 1
+\& Perl_sv_peek(my_perl, (SV*)()) // sv_peek
+.Ve
+.PP
+(The my_perl is for threaded builds.) Just remember that every line
+except the last one should end with \f(CW\*(C`\en\e\*(C'\fR.
+.PP
+Alternatively edit the init file interactively via: 3rd mouse button \->
+New Display \-> Edit Menu
+.PP
+Note: you can define up to 20 conversion shortcuts in the gdb section.
+.SS "C backtrace"
+.IX Subsection "C backtrace"
+On some platforms Perl supports retrieving the C level backtrace
+(similar to what symbolic debuggers like gdb do).
+.PP
+The backtrace returns the stack trace of the C call frames,
+with the symbol names (function names), the object names (like "perl"),
+and if it can, also the source code locations (file:line).
+.PP
+The supported platforms are Linux and OS X (some *BSD systems might
+work at least partly, but they have not yet been tested).
+.PP
+This feature hasn't been tested with multiple threads, but it will
+only show the backtrace of the thread doing the backtracing.
+.PP
+The feature needs to be enabled with \f(CW\*(C`Configure \-Dusecbacktrace\*(C'\fR.
+.PP
+The \f(CW\*(C`\-Dusecbacktrace\*(C'\fR option also enables keeping the debug information when
+compiling/linking (often: \f(CW\*(C`\-g\*(C'\fR). Many compilers/linkers do support
+having both optimization and keeping the debug information. The debug
+information is needed for the symbol names and the source locations.
+.PP
+Static functions might not be visible in the backtrace.
+.PP
+Source code locations, even if available, can often be missing or
+misleading if the compiler has e.g. inlined code. The optimizer can
+make matching the source code and the object code quite challenging.
+.IP Linux 4
+.IX Item "Linux"
+You \fBmust\fR have the BFD (\-lbfd) library installed, otherwise \f(CW\*(C`perl\*(C'\fR will
+fail to link. The BFD is usually distributed as part of the GNU binutils.
+.Sp
+Summary: \f(CW\*(C`Configure ... \-Dusecbacktrace\*(C'\fR
+and you need \f(CW\*(C`\-lbfd\*(C'\fR.
+.IP "OS X" 4
+.IX Item "OS X"
+The source code locations are supported \fBonly\fR if you have
+the Developer Tools installed. (BFD is \fBnot\fR needed.)
+.Sp
+Summary: \f(CW\*(C`Configure ... \-Dusecbacktrace\*(C'\fR
+and installing the Developer Tools would be good.
+.PP
+Optionally, for trying out the feature, you may want to enable
+automatic dumping of the backtrace just before a warning or croak (die)
+message is emitted, by adding \f(CW\*(C`\-Accflags=\-DUSE_C_BACKTRACE_ON_ERROR\*(C'\fR
+for Configure.
+.PP
+Unless the above additional feature is enabled, nothing about the
+backtrace functionality is visible, except for the Perl/XS level.
+.PP
+Furthermore, even if you have compiled this feature in,
+you need to enable it at runtime with an environment variable, for example
+\&\f(CW\*(C`PERL_C_BACKTRACE_ON_ERROR=10\*(C'\fR. The value must be an integer greater
+than zero, giving the desired number of frames.
+.PP
+Retrieving the backtrace from Perl level (using for example an XS
+extension) would be much less exciting than one would hope: normally
+you would see \f(CW\*(C`runops\*(C'\fR, \f(CW\*(C`entersub\*(C'\fR, and not much else. This API is
+intended to be called \fBfrom within\fR the Perl implementation, not from
+Perl level execution.
+.PP
+The C API for the backtrace is as follows:
+.IP get_c_backtrace 4
+.IX Item "get_c_backtrace"
+.PD 0
+.IP free_c_backtrace 4
+.IX Item "free_c_backtrace"
+.IP get_c_backtrace_dump 4
+.IX Item "get_c_backtrace_dump"
+.IP dump_c_backtrace 4
+.IX Item "dump_c_backtrace"
+.PD
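+.PP
+Purely to illustrate what a C level backtrace contains, the sketch below
+uses the \f(CW\*(C`backtrace()\*(C'\fR and \f(CW\*(C`backtrace_symbols_fd()\*(C'\fR functions from
+\&\fIexecinfo.h\fR (available in glibc on Linux and on OS X). This is a separate,
+lower-level facility, not the perl API listed above, and not necessarily
+how perl implements it; the helper and frame function names are
+hypothetical. Building with something like \f(CW\*(C`cc \-g \-O0 \-rdynamic\*(C'\fR keeps
+the frames from being inlined and lets non-static function names appear;
+as noted above, static functions may still show up only as addresses.
+.PP
+.Vb 34
+\& #include <execinfo.h>
+\& #include <stdio.h>
+\& #include <unistd.h>
+\&
+\& #define MAX_FRAMES 64
+\&
+\& /* Capture up to max_depth return addresses of the calling thread,
+\&    and write one line per frame to stderr. Returns the number of
+\&    frames captured. */
+\& static int show_backtrace(int max_depth)
+\& {
+\&     void *frames[MAX_FRAMES];
+\&     int n;
+\&
+\&     if (max_depth > MAX_FRAMES)
+\&         max_depth = MAX_FRAMES;
+\&     if (max_depth < 1)
+\&         return 0;
+\&     n = backtrace(frames, max_depth);
+\&     backtrace_symbols_fd(frames, n, STDERR_FILENO);
+\&     return n;
+\& }
+\&
+\& /* Not static, so that \-rdynamic can export their names. */
+\& int inner_frame(void) { return show_backtrace(10); }
+\& int outer_frame(void) { return inner_frame(); }
+\&
+\& int main(void)
+\& {
+\&     int n = outer_frame();
+\&
+\&     printf("captured %d frames\en", n);
+\&     return n > 0 ? 0 : 1;
+\& }
+.Ve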
+.SS Poison
+.IX Subsection "Poison"
+If, in a debugger, you see a memory area mysteriously full of 0xABABABAB
+or 0xEFEFEFEF, you may be seeing the effect of the \fBPoison()\fR macros; see
+perlclib.
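+.PP
+Poisoning simply fills memory with a recognisable byte pattern, so that
+reads of uninitialised or already freed memory stand out in a debugger.
+A minimal C sketch using plain \f(CW\*(C`memset()\*(C'\fR, assuming that 0xAB marks
+newly allocated memory and 0xEF freed memory (see perlclib for the actual
+macros); the \f(CW\*(C`poison_with()\*(C'\fR helper is hypothetical and only imitates them:
+.PP
+.Vb 22
+\& #include <stdint.h>
+\& #include <stdio.h>
+\& #include <string.h>
+\&
+\& /* Hypothetical helper: fill n objects of size sz with the byte b,
+\&    in the spirit of the Poison macros. */
+\& static void poison_with(void *d, size_t n, size_t sz, unsigned char b)
+\& {
+\&     memset(d, b, n * sz);
+\& }
+\&
+\& int main(void)
+\& {
+\&     uint32_t words[4];
+\&
+\&     poison_with(words, 4, sizeof words[0], 0xAB);  /* "new" memory */
+\&     printf("0x%08X\en", (unsigned)words[0]);       /* 0xABABABAB */
+\&
+\&     poison_with(words, 4, sizeof words[0], 0xEF);  /* "freed" memory */
+\&     printf("0x%08X\en", (unsigned)words[3]);       /* 0xEFEFEFEF */
+\&     return 0;
+\& }
+.Ve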
+.SS "Read-only optrees"
+.IX Subsection "Read-only optrees"
+Under ithreads the optree is read-only. If you want to enforce this, and
+check for write accesses from buggy code, compile with
+\&\f(CW\*(C`\-Accflags=\-DPERL_DEBUG_READONLY_OPS\*(C'\fR
+to enable code that allocates op memory
+via \f(CW\*(C`mmap\*(C'\fR, and sets it read-only when it is attached to a subroutine.
+Any write access to an op results in a \f(CW\*(C`SIGBUS\*(C'\fR and abort.
+.PP
+This code is intended for development only, and may not be portable
+even to all Unix variants. Also, it is an 80% solution, in that it
+isn't able to make all ops read-only. Specifically it does not apply to
+op slabs belonging to \f(CW\*(C`BEGIN\*(C'\fR blocks.
+.PP
+However, as an 80% solution it is still effective, as it has caught
+bugs in the past.
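+.PP
+The underlying technique can be sketched in standalone C: allocate memory
+with \f(CW\*(C`mmap\*(C'\fR, fill it while it is still writable, then call
+\&\f(CW\*(C`mprotect\*(C'\fR to make it read-only so that any later write faults. This
+is a POSIX illustration (assuming anonymous mappings and \f(CW\*(C`fork()\*(C'\fR are
+available), not perl's op slab code, and the function names are
+hypothetical. The faulting write happens in a child process so that the
+parent can report the signal, which varies by platform (on Linux it is
+normally \f(CW\*(C`SIGSEGV\*(C'\fR).
+.PP
+.Vb 53
+\& #define _DEFAULT_SOURCE  /* for MAP_ANONYMOUS with glibc */
+\& #include <signal.h>
+\& #include <stdio.h>
+\& #include <string.h>
+\& #include <sys/mman.h>
+\& #include <sys/types.h>
+\& #include <sys/wait.h>
+\& #include <unistd.h>
+\&
+\& /* Map len bytes, fill them while writable, then make them read\-only. */
+\& static char *make_readonly_page(size_t len)
+\& {
+\&     char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
+\&                    MAP_PRIVATE | MAP_ANONYMOUS, \-1, 0);
+\&
+\&     if (p == MAP_FAILED)
+\&         return NULL;
+\&     memset(p, 42, len);                      /* "build the ops" */
+\&     if (mprotect(p, len, PROT_READ) != 0) {  /* then freeze them */
+\&         munmap(p, len);
+\&         return NULL;
+\&     }
+\&     return p;
+\& }
+\&
+\& /* Write to p in a child process; return the signal that killed it, or 0. */
+\& static int signal_on_write(char *p)
+\& {
+\&     int status;
+\&     pid_t pid = fork();
+\&
+\&     if (pid == 0) {
+\&         *(volatile char *)p = 1;  /* the "buggy" write */
+\&         _exit(0);                 /* reached only if the write succeeded */
+\&     }
+\&     if (pid < 0 || waitpid(pid, &status, 0) != pid)
+\&         return 0;
+\&     return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
+\& }
+\&
+\& int main(void)
+\& {
+\&     size_t len = (size_t)sysconf(_SC_PAGESIZE);
+\&     char *p = make_readonly_page(len);
+\&     int sig;
+\&
+\&     if (p == NULL)
+\&         return 1;
+\&     sig = signal_on_write(p);
+\&     printf("read: %d, write raised signal %d\en", p[0], sig);
+\&     munmap(p, len);
+\&     return (sig == SIGSEGV || sig == SIGBUS) ? 0 : 1;
+\& }
+.Ve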
+.SS "When is a bool not a bool?"
+.IX Subsection "When is a bool not a bool?"
+There wasn't necessarily a standard \f(CW\*(C`bool\*(C'\fR type on compilers prior to
+C99, and so some workarounds were created. The \f(CW\*(C`TRUE\*(C'\fR and \f(CW\*(C`FALSE\*(C'\fR
+macros are still available as alternatives for \f(CW\*(C`true\*(C'\fR and \f(CW\*(C`false\*(C'\fR.
+The \f(CW\*(C`cBOOL\*(C'\fR macro was created to cast correctly to a true/false
+value in all circumstances, but it should no longer be necessary:
+using \f(CW\*(C`(bool)\*(C'\fR\ \fIexpr\fR should now always work.
+.PP
+There are no plans to remove any of \f(CW\*(C`TRUE\*(C'\fR, \f(CW\*(C`FALSE\*(C'\fR, nor \f(CW\*(C`cBOOL\*(C'\fR.
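+.PP
+The pitfall that \f(CW\*(C`cBOOL\*(C'\fR was designed to avoid can be shown in a few
+lines of C. The sketch assumes a pre-C99 style workaround in which a
+boolean is just an \f(CW\*(C`unsigned char\*(C'\fR (the \f(CW\*(C`old_bool\*(C'\fR typedef and the
+function names are hypothetical): casting a value whose only set bit lies
+above the low byte truncates it to 0, while mapping it through a
+conditional expression first, or casting to C99 \f(CW\*(C`bool\*(C'\fR, yields 1.
+.PP
+.Vb 23
+\& #include <stdbool.h>
+\& #include <stdio.h>
+\&
+\& typedef unsigned char old_bool;  /* hypothetical pre\-C99 "bool" */
+\&
+\& /* Plain cast: keeps only the low byte of v. */
+\& static old_bool narrow_cast(unsigned v) { return (old_bool)v; }
+\&
+\& /* cBOOL\-style: map any non\-zero value to 1 first. */
+\& static old_bool cbool_like(unsigned v) { return v ? 1 : 0; }
+\&
+\& /* C99 bool: converting any non\-zero value yields 1. */
+\& static bool c99_cast(unsigned v) { return (bool)v; }
+\&
+\& int main(void)
+\& {
+\&     unsigned flags = 0x100;  /* only bit 8 is set */
+\&
+\&     /* prints "0 1 1" */
+\&     printf("%d %d %d\en", narrow_cast(flags & 0x100),
+\&            cbool_like(flags & 0x100), c99_cast(flags & 0x100));
+\&     return 0;
+\& }
+.Ve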
+.SS "Finding unsafe truncations"
+.IX Subsection "Finding unsafe truncations"
+You may wish to run \f(CW\*(C`Configure\*(C'\fR with something like
+.PP
+.Vb 1
+\& \-Accflags=\*(Aq\-Wconversion \-Wno\-sign\-conversion \-Wno\-shorten\-64\-to\-32\*(Aq
+.Ve
+.PP
+or your compiler's equivalent to make it easier to spot any unsafe truncations
+that show up.
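+.PP
+For example, with warnings like those enabled, gcc and clang will typically
+flag the implicit \f(CW\*(C`int\*(C'\fR to \f(CW\*(C`short\*(C'\fR conversion in the first function
+of the sketch below; the second function shows one way of making the
+narrowing explicit and range-checked. The function names are hypothetical.
+.PP
+.Vb 30
+\& #include <limits.h>
+\& #include <stdio.h>
+\&
+\& /* \-Wconversion typically warns here: int to short may lose bits. */
+\& static short to_short_unchecked(int v)
+\& {
+\&     return v;
+\& }
+\&
+\& /* Checked alternative: narrow only values that fit, with an explicit cast. */
+\& static int to_short_checked(int v, short *out)
+\& {
+\&     if (v < SHRT_MIN || v > SHRT_MAX)
+\&         return 0;
+\&     *out = (short)v;
+\&     return 1;
+\& }
+\&
+\& int main(void)
+\& {
+\&     short s = 0;
+\&     int ok;
+\&
+\&     ok = to_short_checked(70000, &s);
+\&     printf("70000: ok=%d\en", ok);               /* ok=0 */
+\&     ok = to_short_checked(1234, &s);
+\&     printf("1234: ok=%d s=%d\en", ok, s);        /* ok=1 s=1234 */
+\&     printf("unchecked: %d\en", to_short_unchecked(1234));
+\&     return 0;
+\& }
+.Ve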
+.SS "The .i Targets"
+.IX Subsection "The .i Targets"
+You can expand the macros in a \fIfoo.c\fR file by saying
+.PP
+.Vb 1
+\& make foo.i
+.Ve
+.PP
+which will expand the macros using cpp. Don't be scared by the
+results.
+.SH AUTHOR
+.IX Header "AUTHOR"
+This document was originally written by Nathan Torkington, and is
+maintained by the perl5\-porters mailing list.