Diffstat (limited to 'upstream/fedora-40/man1/perldebguts.1')
-rw-r--r-- | upstream/fedora-40/man1/perldebguts.1 | 1131 |
1 files changed, 1131 insertions, 0 deletions
diff --git a/upstream/fedora-40/man1/perldebguts.1 b/upstream/fedora-40/man1/perldebguts.1 new file mode 100644 index 00000000..f85a1937 --- /dev/null +++ b/upstream/fedora-40/man1/perldebguts.1 @@ -0,0 +1,1131 @@ +.\" -*- mode: troff; coding: utf-8 -*- +.\" Automatically generated by Pod::Man 5.01 (Pod::Simple 3.43) +.\" +.\" Standard preamble: +.\" ======================================================================== +.de Sp \" Vertical space (when we can't use .PP) +.if t .sp .5v +.if n .sp +.. +.de Vb \" Begin verbatim text +.ft CW +.nf +.ne \\$1 +.. +.de Ve \" End verbatim text +.ft R +.fi +.. +.\" \*(C` and \*(C' are quotes in nroff, nothing in troff, for use with C<>. +.ie n \{\ +. ds C` "" +. ds C' "" +'br\} +.el\{\ +. ds C` +. ds C' +'br\} +.\" +.\" Escape single quotes in literal strings from groff's Unicode transform. +.ie \n(.g .ds Aq \(aq +.el .ds Aq ' +.\" +.\" If the F register is >0, we'll generate index entries on stderr for +.\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index +.\" entries marked with X<> in POD. Of course, you'll have to process the +.\" output yourself in some meaningful fashion. +.\" +.\" Avoid warning from groff about undefined register 'F'. +.de IX +.. +.nr rF 0 +.if \n(.g .if rF .nr rF 1 +.if (\n(rF:(\n(.g==0)) \{\ +. if \nF \{\ +. de IX +. tm Index:\\$1\t\\n%\t"\\$2" +.. +. if !\nF==2 \{\ +. nr % 0 +. nr F 2 +. \} +. \} +.\} +.rr rF +.\" ======================================================================== +.\" +.IX Title "PERLDEBGUTS 1" +.TH PERLDEBGUTS 1 2024-01-25 "perl v5.38.2" "Perl Programmers Reference Guide" +.\" For nroff, turn off justification. Always turn off hyphenation; it makes +.\" way too many mistakes in technical documents. +.if n .ad l +.nh +.SH NAME +perldebguts \- Guts of Perl debugging +.SH DESCRIPTION +.IX Header "DESCRIPTION" +This is not perldebug, which tells you how to use +the debugger. 
This manpage describes low-level details concerning +the debugger's internals, which range from difficult to impossible +to understand for anyone who isn't incredibly intimate with Perl's guts. +Caveat lector. +.SH "Debugger Internals" +.IX Header "Debugger Internals" +Perl has special debugging hooks at compile-time and run-time used +to create debugging environments. These hooks are not to be confused +with the \fIperl \-Dxxx\fR command described in perlrun, +which is usable only if a special Perl is built per the instructions in +the \fIINSTALL\fR file in the Perl source tree. +.PP +For example, whenever you call Perl's built-in \f(CW\*(C`caller\*(C'\fR function +from the package \f(CW\*(C`DB\*(C'\fR, the arguments that the corresponding stack +frame was called with are copied to the \f(CW@DB::args\fR array. These +mechanisms are enabled by calling Perl with the \fB\-d\fR switch. +Specifically, the following additional features are enabled +(cf. "$^P" in perlvar): +.IP \(bu 4 +Perl inserts the contents of \f(CW$ENV{PERL5DB}\fR (or \f(CW\*(C`BEGIN {require +\&\*(Aqperl5db.pl\*(Aq}\*(C'\fR if not present) before the first line of your program. +.IP \(bu 4 +Each array \f(CW\*(C`@{"_<$filename"}\*(C'\fR holds the lines of \f(CW$filename\fR for a +file compiled by Perl. The same is also true for \f(CW\*(C`eval\*(C'\fRed strings +that contain subroutines, or which are currently being executed. +The \f(CW$filename\fR for \f(CW\*(C`eval\*(C'\fRed strings looks like \f(CW\*(C`(eval 34)\*(C'\fR. +.Sp +Values in this array are magical in numeric context: they compare +equal to zero only if the line is not breakable. +.IP \(bu 4 +Each hash \f(CW\*(C`%{"_<$filename"}\*(C'\fR contains breakpoints and actions keyed +by line number. Individual entries (as opposed to the whole hash) +are settable. Perl only cares about Boolean true here, although +the values used by \fIperl5db.pl\fR have the form +\&\f(CW"$break_condition\e0$action"\fR. 
+.Sp +The same holds for evaluated strings that contain subroutines, or +which are currently being executed. The \f(CW$filename\fR for \f(CW\*(C`eval\*(C'\fRed strings +looks like \f(CW\*(C`(eval 34)\*(C'\fR. +.IP \(bu 4 +Each scalar \f(CW\*(C`${"_<$filename"}\*(C'\fR contains \f(CW$filename\fR. This is +also the case for evaluated strings that contain subroutines, or +which are currently being executed. The \f(CW$filename\fR for \f(CW\*(C`eval\*(C'\fRed +strings looks like \f(CW\*(C`(eval 34)\*(C'\fR. +.IP \(bu 4 +After each \f(CW\*(C`require\*(C'\fRd file is compiled, but before it is executed, +\&\f(CWDB::postponed(*{"_<$filename"})\fR is called if the subroutine +\&\f(CW\*(C`DB::postponed\*(C'\fR exists. Here, the \f(CW$filename\fR is the expanded name of +the \f(CW\*(C`require\*(C'\fRd file, as found in the values of \f(CW%INC\fR. +.IP \(bu 4 +After each subroutine \f(CW\*(C`subname\*(C'\fR is compiled, the existence of +\&\f(CW$DB::postponed{subname}\fR is checked. If this key exists, +\&\f(CWDB::postponed(subname)\fR is called if the \f(CW\*(C`DB::postponed\*(C'\fR subroutine +also exists. +.IP \(bu 4 +A hash \f(CW%DB::sub\fR is maintained, whose keys are subroutine names +and whose values have the form \f(CW\*(C`filename:startline\-endline\*(C'\fR. +\&\f(CW\*(C`filename\*(C'\fR has the form \f(CW\*(C`(eval 34)\*(C'\fR for subroutines defined inside +\&\f(CW\*(C`eval\*(C'\fRs. +.IP \(bu 4 +When the execution of your program reaches a point that can hold a +breakpoint, the \f(CWDB::DB()\fR subroutine is called if any of the variables +\&\f(CW$DB::trace\fR, \f(CW$DB::single\fR, or \f(CW$DB::signal\fR is true. These variables +are not \f(CW\*(C`local\*(C'\fRizable. This feature is disabled when executing +inside \f(CWDB::DB()\fR, including functions called from it +unless \f(CW\*(C`$^D & (1<<30)\*(C'\fR is true. 
+.IP \(bu 4 +When execution of the program reaches a subroutine call, a call to +\&\f(CW&DB::sub\fR(\fIargs\fR) is made instead, with \f(CW$DB::sub\fR set to identify +the called subroutine. (This doesn't happen if the calling subroutine +was compiled in the \f(CW\*(C`DB\*(C'\fR package.) \f(CW$DB::sub\fR normally holds the name +of the called subroutine, if it has a name by which it can be looked up. +Failing that, \f(CW$DB::sub\fR will hold a reference to the called subroutine. +Either way, the \f(CW&DB::sub\fR subroutine can use \f(CW$DB::sub\fR as a reference +by which to call the called subroutine, which it will normally want to do. +.Sp +If the call is to an lvalue subroutine, and \f(CW&DB::lsub\fR +is defined \f(CW&DB::lsub\fR(\fIargs\fR) is called instead, otherwise falling +back to \f(CW&DB::sub\fR(\fIargs\fR). +.IX Xref "&DB::lsub" +.IP \(bu 4 +When execution of the program uses \f(CW\*(C`goto\*(C'\fR to enter a non-XS subroutine +and the 0x80 bit is set in \f(CW$^P\fR, a call to \f(CW&DB::goto\fR is made, with +\&\f(CW$DB::sub\fR set to identify the subroutine being entered. The call to +\&\f(CW&DB::goto\fR does not replace the \f(CW\*(C`goto\*(C'\fR; the requested subroutine will +still be entered once \f(CW&DB::goto\fR has returned. \f(CW$DB::sub\fR normally +holds the name of the subroutine being entered, if it has one. Failing +that, \f(CW$DB::sub\fR will hold a reference to the subroutine being entered. +Unlike when \f(CW&DB::sub\fR is called, it is not guaranteed that \f(CW$DB::sub\fR +can be used as a reference to operate on the subroutine being entered. +.PP +Note that if \f(CW&DB::sub\fR needs external data for it to work, no +subroutine call is possible without it. As an example, the standard +debugger's \f(CW&DB::sub\fR depends on the \f(CW$DB::deep\fR variable +(it defines how many levels of recursion deep into the debugger you can go +before a mandatory break). 
If \f(CW$DB::deep\fR is not defined, subroutine +calls are not possible, even though \f(CW&DB::sub\fR exists. +.SS "Writing Your Own Debugger" +.IX Subsection "Writing Your Own Debugger" +\fIEnvironment Variables\fR +.IX Subsection "Environment Variables" +.PP +The \f(CW\*(C`PERL5DB\*(C'\fR environment variable can be used to define a debugger. +For example, the minimal "working" debugger (it actually doesn't do anything) +consists of one line: +.PP +.Vb 1 +\& sub DB::DB {} +.Ve +.PP +It can easily be defined like this: +.PP +.Vb 1 +\& $ PERL5DB="sub DB::DB {}" perl \-d your\-script +.Ve +.PP +Another brief debugger, slightly more useful, can be created +with only the line: +.PP +.Vb 1 +\& sub DB::DB {print ++$i; scalar <STDIN>} +.Ve +.PP +This debugger prints a number which increments for each statement +encountered and waits for you to hit a newline before continuing +to the next statement. +.PP +The following debugger is actually useful: +.PP +.Vb 5 +\& { +\& package DB; +\& sub DB {} +\& sub sub {print ++$i, " $sub\en"; &$sub} +\& } +.Ve +.PP +It prints the sequence number of each subroutine call and the name of the +called subroutine. Note that \f(CW&DB::sub\fR is being compiled into the +package \f(CW\*(C`DB\*(C'\fR through the use of the \f(CW\*(C`package\*(C'\fR directive. +.PP +When it starts, the debugger reads your rc file (\fI./.perldb\fR or +\&\fI~/.perldb\fR under Unix), which can set important options. +(A subroutine (\f(CW&afterinit\fR) can be defined here as well; it is executed +after the debugger completes its own initialization.) +.PP +After the rc file is read, the debugger reads the PERLDB_OPTS +environment variable and uses it to set debugger options. The +contents of this variable are treated as if they were the argument +of an \f(CW\*(C`o ...\*(C'\fR debugger command (q.v. in "Configurable Options" in perldebug). 
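As a hedged illustration of the PERL5DB mechanism described above, the following Perl sketch hands a small call-tracing debugger to a child perl through PERL5DB and captures what it prints. The `$depth` variable, the traced one-liner, and its `inner`/`outer` subroutines are assumptions invented for this example; none of them are part of perl5db.pl.

```perl
use strict;
use warnings;

# Debugger source for PERL5DB, modelled on the "{ package DB; ... }"
# example above. $depth is a hypothetical helper used only to indent
# the trace; it is not a variable known to perl or perl5db.pl.
my $debugger = '{ package DB; sub DB {} '
             . 'sub sub { print "  " x ($depth || 0), "$sub\n"; '
             . 'local $depth = ($depth || 0) + 1; &$sub } }';

# Hypothetical program to trace: outer() calls inner().
my $program = 'sub inner { 42 } sub outer { inner() } outer();';

# Run a child perl under -d with our debugger in place of perl5db.pl.
# The list form of open avoids any shell quoting of the one-liner.
local $ENV{PERL5DB} = $debugger;
open(my $child, '-|', $^X, '-d', '-e', $program)
    or die "cannot start $^X: $!";
my $trace = do { local $/; <$child> };
close $child;

print $trace;   # one line per subroutine call, indented by call depth
```

Because `&DB::sub` is compiled in package `DB`, its own `&$sub` call is not intercepted again, so only calls made by the traced program appear in the output.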
+.PP +\fIDebugger Internal Variables\fR +.IX Subsection "Debugger Internal Variables" +.PP +In addition to the file and subroutine-related variables mentioned above, +the debugger also maintains various magical internal variables. +.IP \(bu 4 +\&\f(CW@DB::dbline\fR is an alias for \f(CW\*(C`@{"::_<current_file"}\*(C'\fR, which +holds the lines of the currently-selected file (compiled by Perl), either +explicitly chosen with the debugger's \f(CW\*(C`f\*(C'\fR command, or implicitly by flow +of execution. +.Sp +Values in this array are magical in numeric context: they compare +equal to zero only if the line is not breakable. +.IP \(bu 4 +\&\f(CW%DB::dbline\fR is an alias for \f(CW\*(C`%{"::_<current_file"}\*(C'\fR, which +contains breakpoints and actions keyed by line number in +the currently-selected file, either explicitly chosen with the +debugger's \f(CW\*(C`f\*(C'\fR command, or implicitly by flow of execution. +.Sp +As previously noted, individual entries (as opposed to the whole hash) +are settable. Perl only cares about Boolean true here, although +the values used by \fIperl5db.pl\fR have the form +\&\f(CW"$break_condition\e0$action"\fR. +.PP +\fIDebugger Customization Functions\fR +.IX Subsection "Debugger Customization Functions" +.PP +Some functions are provided to simplify customization. +.IP \(bu 4 +See "Configurable Options" in perldebug for a description of options parsed by +\&\f(CWDB::parse_options(string)\fR. +.IP \(bu 4 +\&\f(CW\*(C`DB::dump_trace(skip[,count])\*(C'\fR skips the specified number of frames +and returns a list containing information about the calling frames (all +of them, if \f(CW\*(C`count\*(C'\fR is missing). 
Each entry is a reference to a hash +with keys \f(CW\*(C`context\*(C'\fR (either \f(CW\*(C`.\*(C'\fR, \f(CW\*(C`$\*(C'\fR, or \f(CW\*(C`@\*(C'\fR), \f(CW\*(C`sub\*(C'\fR (subroutine +name, or info about \f(CW\*(C`eval\*(C'\fR), \f(CW\*(C`args\*(C'\fR (\f(CW\*(C`undef\*(C'\fR or a reference to +an array), \f(CW\*(C`file\*(C'\fR, and \f(CW\*(C`line\*(C'\fR. +.IP \(bu 4 +\&\f(CW\*(C`DB::print_trace(FH, skip[, count[, short]])\*(C'\fR prints +formatted info about caller frames. The last two functions may be +convenient as arguments to \f(CW\*(C`<\*(C'\fR, \f(CW\*(C`<<\*(C'\fR commands. +.PP +Note that any variables and functions that are not documented in +this manpage (or in perldebug) are considered for internal +use only, and as such are subject to change without notice. +.SH "Frame Listing Output Examples" +.IX Header "Frame Listing Output Examples" +The \f(CW\*(C`frame\*(C'\fR option can be used to control the output of frame +information. For example, contrast this expression trace: +.PP +.Vb 2 +\& $ perl \-de 42 +\& Stack dump during die enabled outside of evals. +\& +\& Loading DB routines from perl5db.pl patch level 0.94 +\& Emacs support available. +\& +\& Enter h or \*(Aqh h\*(Aq for help. 
+\& +\& main::(\-e:1): 0 +\& DB<1> sub foo { 14 } +\& +\& DB<2> sub bar { 3 } +\& +\& DB<3> t print foo() * bar() +\& main::((eval 172):3): print foo() * bar(); +\& main::foo((eval 168):2): +\& main::bar((eval 170):2): +\& 42 +.Ve +.PP +with this one, once the \f(CW\*(C`o\*(C'\fRption \f(CW\*(C`frame=2\*(C'\fR has been set: +.PP +.Vb 11 +\& DB<4> o f=2 +\& frame = \*(Aq2\*(Aq +\& DB<5> t print foo() * bar() +\& 3: foo() * bar() +\& entering main::foo +\& 2: sub foo { 14 }; +\& exited main::foo +\& entering main::bar +\& 2: sub bar { 3 }; +\& exited main::bar +\& 42 +.Ve +.PP +By way of demonstration, we present below a laborious listing +resulting from setting your \f(CW\*(C`PERLDB_OPTS\*(C'\fR environment variable to +the value \f(CW\*(C`f=n N\*(C'\fR, and running \fIperl \-d \-V\fR from the command line. +Examples using various values of \f(CW\*(C`n\*(C'\fR are shown to give you a feel +for the difference between settings. Long though it may be, this +is not a complete listing, but only excerpts. +.IP 1. 4 +.Vb 10 +\& entering main::BEGIN +\& entering Config::BEGIN +\& Package lib/Exporter.pm. +\& Package lib/Carp.pm. +\& Package lib/Config.pm. +\& entering Config::TIEHASH +\& entering Exporter::import +\& entering Exporter::export +\& entering Config::myconfig +\& entering Config::FETCH +\& entering Config::FETCH +\& entering Config::FETCH +\& entering Config::FETCH +.Ve +.IP 2. 4 +.Vb 10 +\& entering main::BEGIN +\& entering Config::BEGIN +\& Package lib/Exporter.pm. +\& Package lib/Carp.pm. +\& exited Config::BEGIN +\& Package lib/Config.pm. +\& entering Config::TIEHASH +\& exited Config::TIEHASH +\& entering Exporter::import +\& entering Exporter::export +\& exited Exporter::export +\& exited Exporter::import +\& exited main::BEGIN +\& entering Config::myconfig +\& entering Config::FETCH +\& exited Config::FETCH +\& entering Config::FETCH +\& exited Config::FETCH +\& entering Config::FETCH +.Ve +.IP 3. 
4 +.Vb 10 +\& in $=main::BEGIN() from /dev/null:0 +\& in $=Config::BEGIN() from lib/Config.pm:2 +\& Package lib/Exporter.pm. +\& Package lib/Carp.pm. +\& Package lib/Config.pm. +\& in $=Config::TIEHASH(\*(AqConfig\*(Aq) from lib/Config.pm:644 +\& in $=Exporter::import(\*(AqConfig\*(Aq, \*(Aqmyconfig\*(Aq, \*(Aqconfig_vars\*(Aq) from /dev/null:0 +\& in $=Exporter::export(\*(AqConfig\*(Aq, \*(Aqmain\*(Aq, \*(Aqmyconfig\*(Aq, \*(Aqconfig_vars\*(Aq) from li +\& in @=Config::myconfig() from /dev/null:0 +\& in $=Config::FETCH(ref(Config), \*(Aqpackage\*(Aq) from lib/Config.pm:574 +\& in $=Config::FETCH(ref(Config), \*(Aqbaserev\*(Aq) from lib/Config.pm:574 +\& in $=Config::FETCH(ref(Config), \*(AqPERL_VERSION\*(Aq) from lib/Config.pm:574 +\& in $=Config::FETCH(ref(Config), \*(AqPERL_SUBVERSION\*(Aq) from lib/Config.pm:574 +\& in $=Config::FETCH(ref(Config), \*(Aqosname\*(Aq) from lib/Config.pm:574 +\& in $=Config::FETCH(ref(Config), \*(Aqosvers\*(Aq) from lib/Config.pm:574 +.Ve +.IP 4. 4 +.Vb 10 +\& in $=main::BEGIN() from /dev/null:0 +\& in $=Config::BEGIN() from lib/Config.pm:2 +\& Package lib/Exporter.pm. +\& Package lib/Carp.pm. +\& out $=Config::BEGIN() from lib/Config.pm:0 +\& Package lib/Config.pm. 
+\& in $=Config::TIEHASH(\*(AqConfig\*(Aq) from lib/Config.pm:644 +\& out $=Config::TIEHASH(\*(AqConfig\*(Aq) from lib/Config.pm:644 +\& in $=Exporter::import(\*(AqConfig\*(Aq, \*(Aqmyconfig\*(Aq, \*(Aqconfig_vars\*(Aq) from /dev/null:0 +\& in $=Exporter::export(\*(AqConfig\*(Aq, \*(Aqmain\*(Aq, \*(Aqmyconfig\*(Aq, \*(Aqconfig_vars\*(Aq) from lib/ +\& out $=Exporter::export(\*(AqConfig\*(Aq, \*(Aqmain\*(Aq, \*(Aqmyconfig\*(Aq, \*(Aqconfig_vars\*(Aq) from lib/ +\& out $=Exporter::import(\*(AqConfig\*(Aq, \*(Aqmyconfig\*(Aq, \*(Aqconfig_vars\*(Aq) from /dev/null:0 +\& out $=main::BEGIN() from /dev/null:0 +\& in @=Config::myconfig() from /dev/null:0 +\& in $=Config::FETCH(ref(Config), \*(Aqpackage\*(Aq) from lib/Config.pm:574 +\& out $=Config::FETCH(ref(Config), \*(Aqpackage\*(Aq) from lib/Config.pm:574 +\& in $=Config::FETCH(ref(Config), \*(Aqbaserev\*(Aq) from lib/Config.pm:574 +\& out $=Config::FETCH(ref(Config), \*(Aqbaserev\*(Aq) from lib/Config.pm:574 +\& in $=Config::FETCH(ref(Config), \*(AqPERL_VERSION\*(Aq) from lib/Config.pm:574 +\& out $=Config::FETCH(ref(Config), \*(AqPERL_VERSION\*(Aq) from lib/Config.pm:574 +\& in $=Config::FETCH(ref(Config), \*(AqPERL_SUBVERSION\*(Aq) from lib/Config.pm:574 +.Ve +.IP 5. 4 +.Vb 10 +\& in $=main::BEGIN() from /dev/null:0 +\& in $=Config::BEGIN() from lib/Config.pm:2 +\& Package lib/Exporter.pm. +\& Package lib/Carp.pm. +\& out $=Config::BEGIN() from lib/Config.pm:0 +\& Package lib/Config.pm. 
+\& in $=Config::TIEHASH(\*(AqConfig\*(Aq) from lib/Config.pm:644 +\& out $=Config::TIEHASH(\*(AqConfig\*(Aq) from lib/Config.pm:644 +\& in $=Exporter::import(\*(AqConfig\*(Aq, \*(Aqmyconfig\*(Aq, \*(Aqconfig_vars\*(Aq) from /dev/null:0 +\& in $=Exporter::export(\*(AqConfig\*(Aq, \*(Aqmain\*(Aq, \*(Aqmyconfig\*(Aq, \*(Aqconfig_vars\*(Aq) from lib/E +\& out $=Exporter::export(\*(AqConfig\*(Aq, \*(Aqmain\*(Aq, \*(Aqmyconfig\*(Aq, \*(Aqconfig_vars\*(Aq) from lib/E +\& out $=Exporter::import(\*(AqConfig\*(Aq, \*(Aqmyconfig\*(Aq, \*(Aqconfig_vars\*(Aq) from /dev/null:0 +\& out $=main::BEGIN() from /dev/null:0 +\& in @=Config::myconfig() from /dev/null:0 +\& in $=Config::FETCH(\*(AqConfig=HASH(0x1aa444)\*(Aq, \*(Aqpackage\*(Aq) from lib/Config.pm:574 +\& out $=Config::FETCH(\*(AqConfig=HASH(0x1aa444)\*(Aq, \*(Aqpackage\*(Aq) from lib/Config.pm:574 +\& in $=Config::FETCH(\*(AqConfig=HASH(0x1aa444)\*(Aq, \*(Aqbaserev\*(Aq) from lib/Config.pm:574 +\& out $=Config::FETCH(\*(AqConfig=HASH(0x1aa444)\*(Aq, \*(Aqbaserev\*(Aq) from lib/Config.pm:574 +.Ve +.IP 6. 4 +.Vb 10 +\& in $=CODE(0x15eca4)() from /dev/null:0 +\& in $=CODE(0x182528)() from lib/Config.pm:2 +\& Package lib/Exporter.pm. +\& out $=CODE(0x182528)() from lib/Config.pm:0 +\& scalar context return from CODE(0x182528): undef +\& Package lib/Config.pm. 
+\& in $=Config::TIEHASH(\*(AqConfig\*(Aq) from lib/Config.pm:628 +\& out $=Config::TIEHASH(\*(AqConfig\*(Aq) from lib/Config.pm:628 +\& scalar context return from Config::TIEHASH: empty hash +\& in $=Exporter::import(\*(AqConfig\*(Aq, \*(Aqmyconfig\*(Aq, \*(Aqconfig_vars\*(Aq) from /dev/null:0 +\& in $=Exporter::export(\*(AqConfig\*(Aq, \*(Aqmain\*(Aq, \*(Aqmyconfig\*(Aq, \*(Aqconfig_vars\*(Aq) from lib/Exporter.pm:171 +\& out $=Exporter::export(\*(AqConfig\*(Aq, \*(Aqmain\*(Aq, \*(Aqmyconfig\*(Aq, \*(Aqconfig_vars\*(Aq) from lib/Exporter.pm:171 +\& scalar context return from Exporter::export: \*(Aq\*(Aq +\& out $=Exporter::import(\*(AqConfig\*(Aq, \*(Aqmyconfig\*(Aq, \*(Aqconfig_vars\*(Aq) from /dev/null:0 +\& scalar context return from Exporter::import: \*(Aq\*(Aq +.Ve +.PP +In all cases shown above, the line indentation shows the call tree. +If bit 2 of \f(CW\*(C`frame\*(C'\fR is set, a line is printed on exit from a +subroutine as well. If bit 4 is set, the arguments are printed +along with the caller info. If bit 8 is set, the arguments are +printed even if they are tied or references. If bit 16 is set, the +return value is printed, too. +.PP +When a package is compiled, a line like this +.PP +.Vb 1 +\& Package lib/Carp.pm. +.Ve +.PP +is printed with proper indentation. +.SH "Debugging Regular Expressions" +.IX Header "Debugging Regular Expressions" +There are two ways to enable debugging output for regular expressions. +.PP +If your perl is compiled with \f(CW\*(C`\-DDEBUGGING\*(C'\fR, you may use the +\&\fB\-Dr\fR flag on the command line, and \f(CW\*(C`\-Drv\*(C'\fR for more verbose +information. +.PP +Otherwise, one can \f(CW\*(C`use re \*(Aqdebug\*(Aq\*(C'\fR, which has effects at both +compile time and run time. Since Perl 5.9.5, this pragma is lexically +scoped. 
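A minimal sketch of the lexical scoping mentioned above, assuming a perl recent enough (5.9.5 or later) for the pragma to be block-scoped. The sample strings and variable names are invented for illustration; the pattern is the one used in the compile-time listing of this manpage.

```perl
use strict;
use warnings;

my ($matched, $last_group);
{
    # Debugging output (written to STDERR) is produced only for
    # regexes compiled and run inside this block.
    use re 'debug';
    $matched    = 'bdefgefffghik' =~ /[bc]d(ef*g)+h[ij]k$/ ? 1 : 0;
    $last_group = $1;   # text captured by the last iteration of (ef*g)+
}

# Outside the block the pragma is no longer in effect, so this
# match runs without any debugging output.
my $quiet = 'abc' =~ /b/ ? 1 : 0;

print "matched=$matched last_group=$last_group quiet=$quiet\n";
```

Keeping the pragma inside a bare block limits the (often very long) debugging output to the one pattern under investigation.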
+.SS "Compile-time Output" +.IX Subsection "Compile-time Output" +The debugging output at compile time looks like this: +.PP +.Vb 10 +\& Compiling REx \*(Aq[bc]d(ef*g)+h[ij]k$\*(Aq +\& size 45 Got 364 bytes for offset annotations. +\& first at 1 +\& rarest char g at 0 +\& rarest char d at 0 +\& 1: ANYOF[bc](12) +\& 12: EXACT <d>(14) +\& 14: CURLYX[0] {1,32767}(28) +\& 16: OPEN1(18) +\& 18: EXACT <e>(20) +\& 20: STAR(23) +\& 21: EXACT <f>(0) +\& 23: EXACT <g>(25) +\& 25: CLOSE1(27) +\& 27: WHILEM[1/1](0) +\& 28: NOTHING(29) +\& 29: EXACT <h>(31) +\& 31: ANYOF[ij](42) +\& 42: EXACT <k>(44) +\& 44: EOL(45) +\& 45: END(0) +\& anchored \*(Aqde\*(Aq at 1 floating \*(Aqgh\*(Aq at 3..2147483647 (checking floating) +\& stclass \*(AqANYOF[bc]\*(Aq minlen 7 +\& Offsets: [45] +\& 1[4] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 5[1] +\& 0[0] 12[1] 0[0] 6[1] 0[0] 7[1] 0[0] 9[1] 8[1] 0[0] 10[1] 0[0] +\& 11[1] 0[0] 12[0] 12[0] 13[1] 0[0] 14[4] 0[0] 0[0] 0[0] 0[0] +\& 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 18[1] 0[0] 19[1] 20[0] +\& Omitting $\` $& $\*(Aq support. +.Ve +.PP +The first line shows the pre-compiled form of the regex. The second +shows the size of the compiled form (in arbitrary units, usually +4\-byte words) and the total number of bytes allocated for the +offset/length table, usually 4+\f(CW\*(C`size\*(C'\fR*8. The next line shows the +label \fIid\fR of the first node that does a match. +.PP +The +.PP +.Vb 2 +\& anchored \*(Aqde\*(Aq at 1 floating \*(Aqgh\*(Aq at 3..2147483647 (checking floating) +\& stclass \*(AqANYOF[bc]\*(Aq minlen 7 +.Ve +.PP +line (split into two lines above) contains optimizer +information. In the example shown, the optimizer found that the match +should contain a substring \f(CW\*(C`de\*(C'\fR at offset 1, plus substring \f(CW\*(C`gh\*(C'\fR +at some offset between 3 and infinity. 
Moreover, when checking for +these substrings (to abandon impossible matches quickly), Perl will check +for the substring \f(CW\*(C`gh\*(C'\fR before checking for the substring \f(CW\*(C`de\*(C'\fR. The +optimizer may also use the knowledge that the match starts (at the +\&\f(CW\*(C`first\*(C'\fR \fIid\fR) with a character class, and no string +shorter than 7 characters can possibly match. +.PP +The fields of interest which may appear in this line are +.ie n .IP """anchored"" \fISTRING\fR ""at"" \fIPOS\fR" 4 +.el .IP "\f(CWanchored\fR \fISTRING\fR \f(CWat\fR \fIPOS\fR" 4 +.IX Item "anchored STRING at POS" +.PD 0 +.ie n .IP """floating"" \fISTRING\fR ""at"" \fIPOS1..POS2\fR" 4 +.el .IP "\f(CWfloating\fR \fISTRING\fR \f(CWat\fR \fIPOS1..POS2\fR" 4 +.IX Item "floating STRING at POS1..POS2" +.PD +See above. +.ie n .IP """matching floating/anchored""" 4 +.el .IP "\f(CWmatching floating/anchored\fR" 4 +.IX Item "matching floating/anchored" +Which substring to check first. +.ie n .IP """minlen""" 4 +.el .IP \f(CWminlen\fR 4 +.IX Item "minlen" +The minimal length of the match. +.ie n .IP """stclass"" \fITYPE\fR" 4 +.el .IP "\f(CWstclass\fR \fITYPE\fR" 4 +.IX Item "stclass TYPE" +Type of first matching node. +.ie n .IP """noscan""" 4 +.el .IP \f(CWnoscan\fR 4 +.IX Item "noscan" +Don't scan for the found substrings. +.ie n .IP """isall""" 4 +.el .IP \f(CWisall\fR 4 +.IX Item "isall" +Means that the optimizer information is all that the regular +expression contains, and thus one does not need to enter the regex engine at +all. +.ie n .IP """GPOS""" 4 +.el .IP \f(CWGPOS\fR 4 +.IX Item "GPOS" +Set if the pattern contains \f(CW\*(C`\eG\*(C'\fR. +.ie n .IP """plus""" 4 +.el .IP \f(CWplus\fR 4 +.IX Item "plus" +Set if the pattern starts with a repeated char (as in \f(CW\*(C`x+y\*(C'\fR). +.ie n .IP """implicit""" 4 +.el .IP \f(CWimplicit\fR 4 +.IX Item "implicit" +Set if the pattern starts with \f(CW\*(C`.*\*(C'\fR. 
+.ie n .IP """with eval""" 4 +.el .IP "\f(CWwith eval\fR" 4 +.IX Item "with eval" +Set if the pattern contains eval-groups, such as \f(CW\*(C`(?{ code })\*(C'\fR and +\&\f(CW\*(C`(??{ code })\*(C'\fR. +.ie n .IP anchored(TYPE) 4 +.el .IP \f(CWanchored(TYPE)\fR 4 +.IX Item "anchored(TYPE)" +Set if the pattern may match only at a handful of places, with \f(CW\*(C`TYPE\*(C'\fR +being \f(CW\*(C`SBOL\*(C'\fR, \f(CW\*(C`MBOL\*(C'\fR, or \f(CW\*(C`GPOS\*(C'\fR. See the table below. +.PP +If a substring is known to match at end-of-line only, it may be +followed by \f(CW\*(C`$\*(C'\fR, as in \f(CW\*(C`floating \*(Aqk\*(Aq$\*(C'\fR. +.PP +The optimizer-specific information is used to avoid entering the (slow) regex +engine on strings that definitely will not match. If the \f(CW\*(C`isall\*(C'\fR flag +is set, a call to the regex engine may be avoided even when the optimizer +found an appropriate place for the match. +.PP +Above the optimizer section is the list of \fInodes\fR of the compiled +form of the regex. Each line has the format +.PP +\&\f(CW\*(C` \*(C'\fR\fIid\fR: \fITYPE\fR \fIOPTIONAL-INFO\fR (\fInext-id\fR) +.SS "Types of Nodes" +.IX Subsection "Types of Nodes" +Here are the current possible types, with short descriptions: +.PP +.Vb 1 +\& # TYPE arg\-description [regnode\-struct\-suffix] [longjump\-len] DESCRIPTION +\& +\& # Exit points +\& +\& END no End of program. +\& SUCCEED no Return from a subroutine, basically. +\& +\& # Line Start Anchors: +\& SBOL no Match "" at beginning of line: /^/, /\eA/ +\& MBOL no Same, assuming multiline: /^/m +\& +\& # Line End Anchors: +\& SEOL no Match "" at end of line: /$/ +\& MEOL no Same, assuming multiline: /$/m +\& EOS no Match "" at end of string: /\ez/ +\& +\& # Match Start Anchors: +\& GPOS no Matches where last m//g left off. 
+\& +\& # Word Boundary Opcodes: +\& BOUND no Like BOUNDA for non\-utf8, otherwise like +\& BOUNDU +\& BOUNDL no Like BOUND/BOUNDU, but \ew and \eW are +\& defined by current locale +\& BOUNDU no Match "" at any boundary of a given type +\& using /u rules. +\& BOUNDA no Match "" at any boundary between \ew\eW or +\& \eW\ew, where \ew is [_a\-zA\-Z0\-9] +\& NBOUND no Like NBOUNDA for non\-utf8, otherwise like +\& NBOUNDU +\& NBOUNDL no Like NBOUND/NBOUNDU, but \ew and \eW are +\& defined by current locale +\& NBOUNDU no Match "" at any non\-boundary of a given +\& type using /u rules. +\& NBOUNDA no Match "" between any \ew\ew or \eW\eW, where +\& \ew is [_a\-zA\-Z0\-9] +\& +\& # [Special] alternatives: +\& REG_ANY no Match any one character (except newline). +\& SANY no Match any one character. +\& ANYOF sv Match character in (or not in) this class, +\& charclass single char match only +\& ANYOFD sv Like ANYOF, but /d is in effect +\& charclass +\& ANYOFL sv Like ANYOF, but /l is in effect +\& charclass +\& ANYOFPOSIXL sv Like ANYOFL, but matches [[:posix:]] +\& charclass_ classes +\& posixl +\& +\& ANYOFH sv 1 Like ANYOF, but only has "High" matches, +\& none in the bitmap; the flags field +\& contains the lowest matchable UTF\-8 start +\& byte +\& ANYOFHb sv 1 Like ANYOFH, but all matches share the same +\& UTF\-8 start byte, given in the flags field +\& ANYOFHr sv 1 Like ANYOFH, but the flags field contains +\& packed bounds for all matchable UTF\-8 start +\& bytes. 
+\& ANYOFHs sv:str 1 Like ANYOFHb, but has a string field that +\& gives the leading matchable UTF\-8 bytes; +\& flags field is len +\& ANYOFR packed 1 Matches any character in the range given by +\& its packed args: upper 12 bits is the max +\& delta from the base lower 20; the flags +\& field contains the lowest matchable UTF\-8 +\& start byte +\& ANYOFRb packed 1 Like ANYOFR, but all matches share the same +\& UTF\-8 start byte, given in the flags field +\& +\& ANYOFHbbm none bbm Like ANYOFHb, but only for 2\-byte UTF\-8 +\& characters; uses a bitmap to match the +\& continuation byte +\& +\& ANYOFM byte 1 Like ANYOF, but matches an invariant byte +\& as determined by the mask and arg +\& NANYOFM byte 1 complement of ANYOFM +\& +\& # POSIX Character Classes: +\& POSIXD none Some [[:class:]] under /d; the FLAGS field +\& gives which one +\& POSIXL none Some [[:class:]] under /l; the FLAGS field +\& gives which one +\& POSIXU none Some [[:class:]] under /u; the FLAGS field +\& gives which one +\& POSIXA none Some [[:class:]] under /a; the FLAGS field +\& gives which one +\& NPOSIXD none complement of POSIXD, [[:^class:]] +\& NPOSIXL none complement of POSIXL, [[:^class:]] +\& NPOSIXU none complement of POSIXU, [[:^class:]] +\& NPOSIXA none complement of POSIXA, [[:^class:]] +\& +\& CLUMP no Match any extended grapheme cluster +\& sequence +\& +\& # Alternation +\& +\& # BRANCH The set of branches constituting a single choice are +\& # hooked together with their "next" pointers, since +\& # precedence prevents anything being concatenated to +\& # any individual branch. The "next" pointer of the last +\& # BRANCH in a choice points to the thing following the +\& # whole choice. This is also where the final "next" +\& # pointer of each individual branch points; each branch +\& # starts with the operand node of a BRANCH node. +\& # +\& BRANCH node 1 Match this alternative, or the next... 
+\& +\& # Literals +\& +\& EXACT str Match this string (flags field is the +\& length). +\& +\& # In a long string node, the U32 argument is the length, and is +\& # immediately followed by the string. +\& LEXACT len:str 1 Match this long string (preceded by length; +\& flags unused). +\& EXACTL str Like EXACT, but /l is in effect (used so +\& locale\-related warnings can be checked for) +\& EXACTF str Like EXACT, but match using /id rules; +\& (string not UTF\-8, ASCII folded; non\-ASCII +\& not) +\& EXACTFL str Like EXACT, but match using /il rules; +\& (string not likely to be folded) +\& EXACTFU str Like EXACT, but match using /iu rules; +\& (string folded) +\& +\& EXACTFAA str Like EXACT, but match using /iaa rules; +\& (string folded except MICRO in non\-UTF8 +\& patterns; doesn\*(Aqt contain SHARP S unless +\& UTF\-8; folded length <= unfolded) +\& EXACTFAA_NO_TRIE str Like EXACTFAA, (string not UTF\-8, folded +\& except: MICRO, SHARP S; folded length <= +\& unfolded, not currently trie\-able) +\& +\& EXACTFUP str Like EXACT, but match using /iu rules; +\& (string not UTF\-8, folded except MICRO: +\& hence Problematic) +\& +\& EXACTFLU8 str Like EXACTFU, but use /il, UTF\-8, (string +\& is folded, and everything in it is above +\& 255) +\& EXACT_REQ8 str Like EXACT, but only UTF\-8 encoded targets +\& can match +\& LEXACT_REQ8 len:str 1 Like LEXACT, but only UTF\-8 encoded targets +\& can match +\& EXACTFU_REQ8 str Like EXACTFU, but only UTF\-8 encoded +\& targets can match +\& +\& EXACTFU_S_EDGE str /di rules, but nothing in it precludes /ui, +\& except begins and/or ends with [Ss]; +\& (string not UTF\-8; compile\-time only) +\& +\& # New charclass like patterns +\& LNBREAK none generic newline pattern +\& +\& # Trie Related +\& +\& # Behave the same as A|LIST|OF|WORDS would. The \*(Aq..C\*(Aq variants +\& # have inline charclass data (ascii only), the \*(AqC\*(Aq store it in the +\& # structure. +\& +\& TRIE trie 1 Match many EXACT(F[ALU]?)? at once. 
+\& flags==type +\& TRIEC trie Same as TRIE, but with embedded charclass +\& charclass data +\& +\& AHOCORASICK trie 1 Aho Corasick stclass. flags==type +\& AHOCORASICKC trie Same as AHOCORASICK, but with embedded +\& charclass charclass data +\& +\& # Do nothing types +\& +\& NOTHING no Match empty string. +\& # A variant of above which delimits a group, thus stops optimizations +\& TAIL no Match empty string. Can jump here from +\& outside. +\& +\& # Loops +\& +\& # STAR,PLUS \*(Aq?\*(Aq, and complex \*(Aq*\*(Aq and \*(Aq+\*(Aq, are implemented as +\& # circular BRANCH structures. Simple cases +\& # (one character per match) are implemented with STAR +\& # and PLUS for speed and to minimize recursive plunges. +\& # +\& STAR node Match this (simple) thing 0 or more times: +\& /A{0,}B/ where A is width 1 char +\& PLUS node Match this (simple) thing 1 or more times: +\& /A{1,}B/ where A is width 1 char +\& +\& CURLY sv 3 Match this (simple) thing {n,m} times: +\& /A{m,n}B/ where A is width 1 char +\& CURLYN no 3 Capture next\-after\-this simple thing: +\& /(A){m,n}B/ where A is width 1 char +\& CURLYM no 3 Capture this medium\-complex thing {n,m} +\& times: /(A){m,n}B/ where A is fixed\-length +\& CURLYX sv 3 Match/Capture this complex thing {n,m} +\& times. +\& +\& # This terminator creates a loop structure for CURLYX +\& WHILEM no Do curly processing and see if rest +\& matches. +\& +\& # Buffer related +\& +\& # OPEN,CLOSE,GROUPP ...are numbered at compile time. +\& OPEN num 1 Mark this point in input as start of #n. +\& CLOSE num 1 Close corresponding OPEN of #n. +\& SROPEN none Same as OPEN, but for script run +\& SRCLOSE none Close preceding SROPEN +\& +\& REF num 2 Match some already matched string +\& REFF num 2 Match already matched string, using /di +\& rules. +\& REFFL num 2 Match already matched string, using /li +\& rules. +\& REFFU num 2 Match already matched string, using /ui rules. +\& REFFA num 2 Match already matched string, using /aai +\& rules. 
+\& +\& # Named references. Code in regcomp.c assumes that these all are after +\& # the numbered references +\& REFN no\-sv 2 Match some already matched string +\& REFFN no\-sv 2 Match already matched string, using /di +\& rules. +\& REFFLN no\-sv 2 Match already matched string, using /li +\& rules. +\& REFFUN num 2 Match already matched string, using /ui +\& rules. +\& REFFAN num 2 Match already matched string, using /aai +\& rules. +\& +\& # Support for long RE +\& LONGJMP off 1 1 Jump far away. +\& BRANCHJ off 2 1 BRANCH with long offset. +\& +\& # Special Case Regops +\& IFMATCH off 1 1 Succeeds if the following matches; non\-zero +\& flags "f", next_off "o" means lookbehind +\& assertion starting "f..(f\-o)" characters +\& before current +\& UNLESSM off 1 1 Fails if the following matches; non\-zero +\& flags "f", next_off "o" means lookbehind +\& assertion starting "f..(f\-o)" characters +\& before current +\& SUSPEND off 1 1 "Independent" sub\-RE. +\& IFTHEN off 1 1 Switch, should be preceded by switcher. +\& GROUPP num 1 Whether the group matched. +\& +\& # The heavy worker +\& +\& EVAL evl/flags Execute some Perl code. +\& 2 +\& +\& # Modifiers +\& +\& MINMOD no Next operator is not greedy. +\& LOGICAL no Next opcode should set the flag only. +\& +\& # This is not used yet +\& RENUM off 1 1 Group with independently numbered parens. +\& +\& # Regex Subroutines +\& GOSUB num/ofs 2 recurse to paren arg1 at (signed) ofs arg2 +\& +\& # Special conditionals +\& GROUPPN no\-sv 1 Whether the group matched. +\& INSUBP num 1 Whether we are in a specific recurse. +\& DEFINEP none 1 Never execute directly. 
+\& +\& # Backtracking Verbs +\& ENDLIKE none Used only for the type field of verbs +\& OPFAIL no\-sv 1 Same as (?!), but with verb arg +\& ACCEPT no\-sv/num Accepts the current matched string, with +\& 2 verbar +\& +\& # Verbs With Arguments +\& VERB no\-sv 1 Used only for the type field of verbs +\& PRUNE no\-sv 1 Pattern fails at this startpoint if no\- +\& backtracking through this +\& MARKPOINT no\-sv 1 Push the current location for rollback by +\& cut. +\& SKIP no\-sv 1 On failure skip forward (to the mark) +\& before retrying +\& COMMIT no\-sv 1 Pattern fails outright if backtracking +\& through this +\& CUTGROUP no\-sv 1 On failure go to the next alternation in +\& the group +\& +\& # Control what to keep in $&. +\& KEEPS no $& begins here. +\& +\& # Validate that lookbehind IFMATCH and UNLESSM end at the right place +\& LOOKBEHIND_END no Return from lookbehind (IFMATCH/UNLESSM) +\& and validate position +\& +\& # SPECIAL REGOPS +\& +\& # This is not really a node, but an optimized away piece of a "long" +\& # node. To simplify debugging output, we mark it as if it were a node +\& OPTIMIZED off Placeholder for dump. +\& +\& # Special opcode with the property that no opcode in a compiled program +\& # will ever be of this type. Thus it can be used as a flag value that +\& # no other opcode has been seen. END is used similarly, in that an END +\& # node can\*(Aqt be optimized. So END implies "unoptimizable" and PSEUDO +\& # means "not seen anything to optimize yet". +\& PSEUDO off Pseudo opcode for internal use.
+\& +\& REGEX_SET depth p Regex set, temporary node used in pre\- +\& optimization compilation +.Ve +.PP +Following the optimizer information is a dump of the offset/length +table, here split across several lines: +.PP +.Vb 5 +\& Offsets: [45] +\& 1[4] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 5[1] +\& 0[0] 12[1] 0[0] 6[1] 0[0] 7[1] 0[0] 9[1] 8[1] 0[0] 10[1] 0[0] +\& 11[1] 0[0] 12[0] 12[0] 13[1] 0[0] 14[4] 0[0] 0[0] 0[0] 0[0] +\& 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 18[1] 0[0] 19[1] 20[0] +.Ve +.PP +The first line here indicates that the offset/length table contains 45 +entries. Each entry is a pair of integers, denoted by \f(CW\*(C`offset[length]\*(C'\fR. +Entries are numbered starting with 1, so entry #1 here is \f(CW\*(C`1[4]\*(C'\fR and +entry #12 is \f(CW\*(C`5[1]\*(C'\fR. \f(CW\*(C`1[4]\*(C'\fR indicates that the node labeled \f(CW\*(C`1:\*(C'\fR +(the \f(CW\*(C`1: ANYOF[bc]\*(C'\fR) begins at character position 1 in the +pre-compiled form of the regex, and has a length of 4 characters. +\&\f(CW\*(C`5[1]\*(C'\fR in position 12 +indicates that the node labeled \f(CW\*(C`12:\*(C'\fR +(the \f(CW\*(C`12: EXACT <d>\*(C'\fR) begins at character position 5 in the +pre-compiled form of the regex, and has a length of 1 character. +\&\f(CW\*(C`12[1]\*(C'\fR in position 14 +indicates that the node labeled \f(CW\*(C`14:\*(C'\fR +(the \f(CW\*(C`14: CURLYX[0] {1,32767}\*(C'\fR) begins at character position 12 in the +pre-compiled form of the regex, and has a length of 1 character\-\-\-that +is, it corresponds to the \f(CW\*(C`+\*(C'\fR symbol in the precompiled regex. +.PP +\&\f(CW\*(C`0[0]\*(C'\fR items indicate that there is no corresponding node. +.SS "Run-time Output" +.IX Subsection "Run-time Output" +First of all, when doing a match, one may get no run-time output even +if debugging is enabled. This means that the regex engine was never +entered and that all of the job was therefore done by the optimizer. 
+.PP +If the regex engine was entered, the output may look like this: +.PP +.Vb 10 +\& Matching \*(Aq[bc]d(ef*g)+h[ij]k$\*(Aq against \*(Aqabcdefg_\|_gh_\|_\*(Aq +\& Setting an EVAL scope, savestack=3 +\& 2 <ab> <cdefg_\|_gh_> | 1: ANYOF +\& 3 <abc> <defg_\|_gh_> | 11: EXACT <d> +\& 4 <abcd> <efg_\|_gh_> | 13: CURLYX {1,32767} +\& 4 <abcd> <efg_\|_gh_> | 26: WHILEM +\& 0 out of 1..32767 cc=effff31c +\& 4 <abcd> <efg_\|_gh_> | 15: OPEN1 +\& 4 <abcd> <efg_\|_gh_> | 17: EXACT <e> +\& 5 <abcde> <fg_\|_gh_> | 19: STAR +\& EXACT <f> can match 1 times out of 32767... +\& Setting an EVAL scope, savestack=3 +\& 6 <bcdef> <g_\|_gh_\|_> | 22: EXACT <g> +\& 7 <bcdefg> <_\|_gh_\|_> | 24: CLOSE1 +\& 7 <bcdefg> <_\|_gh_\|_> | 26: WHILEM +\& 1 out of 1..32767 cc=effff31c +\& Setting an EVAL scope, savestack=12 +\& 7 <bcdefg> <_\|_gh_\|_> | 15: OPEN1 +\& 7 <bcdefg> <_\|_gh_\|_> | 17: EXACT <e> +\& restoring \e1 to 4(4)..7 +\& failed, try continuation... +\& 7 <bcdefg> <_\|_gh_\|_> | 27: NOTHING +\& 7 <bcdefg> <_\|_gh_\|_> | 28: EXACT <h> +\& failed... +\& failed... +.Ve +.PP +The most significant information in the output is about the particular \fInode\fR +of the compiled regex that is currently being tested against the target string. +The format of these lines is +.PP +\&\f(CW\*(C` \*(C'\fR\fISTRING-OFFSET\fR <\fIPRE-STRING\fR> <\fIPOST-STRING\fR> |\fIID\fR: \fITYPE\fR +.PP +The \fITYPE\fR info is indented with respect to the backtracking level. +Other incidental information appears interspersed within. +.SH "Debugging Perl Memory Usage" +.IX Header "Debugging Perl Memory Usage" +Perl is a profligate wastrel when it comes to memory use. There +is a saying that to estimate memory usage of Perl, assume a reasonable +algorithm for memory allocation, multiply that estimate by 10, and +while you still may miss the mark, at least you won't be quite so +astonished. This is not absolutely true, but may provide a good +grasp of what happens. 
+.PP +Assume that an integer cannot take less than 20 bytes of memory, a +float cannot take less than 24 bytes, a string cannot take less +than 32 bytes (all these examples assume 32\-bit architectures; the +results are quite a bit worse on 64\-bit architectures). If a variable +is accessed in two of three different ways (which require an integer, +a float, or a string), the memory footprint may increase yet another +20 bytes. A sloppy \fBmalloc\fR\|(3) implementation can inflate these +numbers dramatically. +.PP +On the opposite end of the scale, a declaration like +.PP +.Vb 1 +\& sub foo; +.Ve +.PP +may take up to 500 bytes of memory, depending on which release of Perl +you're running. +.PP +Anecdotal estimates of source-to-compiled code bloat suggest an +eightfold increase. This means that the compiled form of reasonable +(normally commented, properly indented etc.) code will take +about eight times more space in memory than the code took +on disk. +.PP +The \fB\-DL\fR command-line switch has been obsolete since circa Perl 5.6.0 +(it was available only if Perl was built with \f(CW\*(C`\-DDEBUGGING\*(C'\fR). +The switch was used to track Perl's memory allocations and possible +memory leaks. These days the use of malloc debugging tools like +\&\fIPurify\fR or \fIvalgrind\fR is suggested instead. See also +"PERL_MEM_LOG" in perlhacktips. +.PP +One way to find out how much memory is being used by Perl data +structures is to install the Devel::Size module from CPAN: it gives +you the minimum number of bytes required to store a particular data +structure. Please be mindful of the difference between the \fBsize()\fR +and \fBtotal_size()\fR. +.PP +If Perl has been compiled using Perl's malloc you can analyze Perl +memory usage by setting \f(CW$ENV\fR{PERL_DEBUG_MSTATS}.
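The per-value lower bounds and the eightfold source-to-compiled figure quoted earlier in this section can be combined into a rough back-of-the-envelope estimate. The following is only a sketch, written in Python purely for illustration: the names `estimate_min_bytes` and `estimate_compiled_bytes` are hypothetical, and the constants are just the 32-bit figures stated above, not measurements of any particular perl.

```python
# Lower bounds quoted in this section (32-bit build assumed; real usage
# is typically higher, and worse on 64-bit architectures).
MIN_BYTES = {"integer": 20, "float": 24, "string": 32}
DUAL_ACCESS_EXTRA = 20  # extra bytes when a value is accessed in two ways


def estimate_min_bytes(counts, dual_access=0):
    """Return a lower-bound byte estimate.

    counts maps 'integer' / 'float' / 'string' to a number of values;
    dual_access is how many of those values are accessed in two ways.
    """
    base = sum(MIN_BYTES[kind] * n for kind, n in counts.items())
    return base + DUAL_ACCESS_EXTRA * dual_access


def estimate_compiled_bytes(source_bytes, factor=8):
    """Apply the anecdotal eightfold source-to-compiled ratio."""
    return source_bytes * factor


if __name__ == "__main__":
    print(estimate_min_bytes({"integer": 1000, "string": 500}, dual_access=100))
    print(estimate_compiled_bytes(10_000))
```

These numbers are floors, not predictions; measuring a real data structure with Devel::Size remains the more reliable approach.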
+.ie n .SS "Using $ENV{PERL_DEBUG_MSTATS}" +.el .SS "Using \f(CW$ENV{PERL_DEBUG_MSTATS}\fP" +.IX Subsection "Using $ENV{PERL_DEBUG_MSTATS}" +If your perl is using Perl's \fBmalloc()\fR and was compiled with the +necessary switches (this is the default), then it will print memory +usage statistics after compiling your code when \f(CW\*(C`$ENV{PERL_DEBUG_MSTATS} +> 1\*(C'\fR, and before termination of the program when \f(CW\*(C`$ENV{PERL_DEBUG_MSTATS} >= 1\*(C'\fR. The report format is similar to +the following example: +.PP +.Vb 10 +\& $ PERL_DEBUG_MSTATS=2 perl \-e "require Carp" +\& Memory allocation statistics after compilation: (buckets 4(4)..8188(8192) +\& 14216 free: 130 117 28 7 9 0 2 2 1 0 0 +\& 437 61 36 0 5 +\& 60924 used: 125 137 161 55 7 8 6 16 2 0 1 +\& 74 109 304 84 20 +\& Total sbrk(): 77824/21:119. Odd ends: pad+heads+chain+tail: 0+636+0+2048. +\& Memory allocation statistics after execution: (buckets 4(4)..8188(8192) +\& 30888 free: 245 78 85 13 6 2 1 3 2 0 1 +\& 315 162 39 42 11 +\& 175816 used: 265 176 1112 111 26 22 11 27 2 1 1 +\& 196 178 1066 798 39 +\& Total sbrk(): 215040/47:145. Odd ends: pad+heads+chain+tail: 0+2192+0+6144. +.Ve +.PP +It is possible to ask for such a statistic at arbitrary points in +your execution using the \fBmstat()\fR function out of the standard +Devel::Peek module. +.PP +Here is some explanation of that format: +.ie n .IP """buckets SMALLEST(APPROX)..GREATEST(APPROX)""" 4 +.el .IP "\f(CWbuckets SMALLEST(APPROX)..GREATEST(APPROX)\fR" 4 +.IX Item "buckets SMALLEST(APPROX)..GREATEST(APPROX)" +Perl's \fBmalloc()\fR uses bucketed allocations. Every request is rounded +up to the closest bucket size available, and a bucket is taken from +the pool of buckets of that size. +.Sp +The line above describes the limits of buckets currently in use. +Each bucket has two sizes: memory footprint and the maximal size +of user data that can fit into this bucket. Suppose in the above +example that the smallest bucket were size 4. 
The biggest bucket +would have usable size 8188, and the memory footprint would be 8192. +.Sp +In a Perl built for debugging, some buckets may have negative usable +size. This means that these buckets cannot (and will not) be used. +For larger buckets, the memory footprint may be one page greater +than a power of 2. If so, the corresponding power of two is +printed in the \f(CW\*(C`APPROX\*(C'\fR field above. +.IP Free/Used 4 +.IX Item "Free/Used" +The 1 or 2 rows of numbers following that correspond to the number +of buckets of each size between \f(CW\*(C`SMALLEST\*(C'\fR and \f(CW\*(C`GREATEST\*(C'\fR. In +the first row, the sizes (memory footprints) of buckets are powers +of two\-\-or possibly one page greater. In the second row, if present, +the memory footprints of the buckets are between the memory footprints +of two buckets "above". +.Sp +For example, suppose under the previous example, the memory footprints +were +.Sp +.Vb 2 +\& free: 8 16 32 64 128 256 512 1024 2048 4096 8192 +\& 4 12 24 48 80 +.Ve +.Sp +With a non\-\f(CW\*(C`DEBUGGING\*(C'\fR perl, the buckets starting from \f(CW128\fR have +a 4\-byte overhead, and thus an 8192\-long bucket may take up to +8188\-byte allocations. +.ie n .IP """Total sbrk(): SBRKed/SBRKs:CONTINUOUS""" 4 +.el .IP "\f(CWTotal sbrk(): SBRKed/SBRKs:CONTINUOUS\fR" 4 +.IX Item "Total sbrk(): SBRKed/SBRKs:CONTINUOUS" +The first two fields give the total amount of memory perl \fBsbrk\fR\|(2)ed +(ess-broken? :\-) and number of \fBsbrk\fR\|(2)s used. The third number is +what perl thinks about continuity of returned chunks. So long as +this number is positive, \fBmalloc()\fR will assume that it is probable +that \fBsbrk\fR\|(2) will provide continuous memory. +.Sp +Memory allocated by external libraries is not counted. +.ie n .IP """pad: 0""" 4 +.el .IP "\f(CWpad: 0\fR" 4 +.IX Item "pad: 0" +The amount of \fBsbrk\fR\|(2)ed memory needed to keep buckets aligned. 
+.ie n .IP """heads: 2192""" 4 +.el .IP "\f(CWheads: 2192\fR" 4 +.IX Item "heads: 2192" +Although memory overhead of bigger buckets is kept inside the bucket, for +smaller buckets, it is kept in separate areas. This field gives the +total size of these areas. +.ie n .IP """chain: 0""" 4 +.el .IP "\f(CWchain: 0\fR" 4 +.IX Item "chain: 0" +\&\fBmalloc()\fR may want to subdivide a bigger bucket into smaller buckets. +If only a part of the deceased bucket is left unsubdivided, the rest +is kept as an element of a linked list. This field gives the total +size of these chunks. +.ie n .IP """tail: 6144""" 4 +.el .IP "\f(CWtail: 6144\fR" 4 +.IX Item "tail: 6144" +To minimize the number of \fBsbrk\fR\|(2)s, \fBmalloc()\fR asks for more memory. This +field gives the size of the yet unused part, which is \fBsbrk\fR\|(2)ed, but +never touched. +.SH "SEE ALSO" +.IX Header "SEE ALSO" +perldebug, +perl5db.pl, +perlguts, +perlrun, +re, +and +Devel::DProf.
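As a closing illustration of the PERL_DEBUG_MSTATS report format described above, the fields can be pulled apart mechanically. This is a minimal sketch in Python, not part of perl or Devel::Peek: the function name `parse_mstats` and the dictionary keys are hypothetical, and the sample text is the "after execution" block from the example report with its whitespace reconstructed.

```python
import re

# "After execution" block from the sample report (whitespace reconstructed).
REPORT = """\
Memory allocation statistics after execution: (buckets 4(4)..8188(8192)
 30888 free: 245 78 85 13 6 2 1 3 2 0 1
 315 162 39 42 11
 175816 used: 265 176 1112 111 26 22 11 27 2 1 1
 196 178 1066 798 39
Total sbrk(): 215040/47:145. Odd ends: pad+heads+chain+tail: 0+2192+0+6144.
"""


def parse_mstats(text):
    """Extract the headline numbers from one statistics block."""
    # buckets SMALLEST(APPROX)..GREATEST(APPROX); usable sizes may be
    # negative in a perl built for debugging, hence the optional sign.
    buckets = re.search(r"buckets (-?\d+)\((\d+)\)\.\.(-?\d+)\((\d+)\)", text)
    free = re.search(r"(\d+) free:", text)
    used = re.search(r"(\d+) used:", text)
    total = re.search(
        r"Total sbrk\(\): (\d+)/(\d+):(-?\d+)\. "
        r"Odd ends: pad\+heads\+chain\+tail: (\d+)\+(\d+)\+(\d+)\+(\d+)\.",
        text,
    )
    if not (buckets and free and used and total):
        raise ValueError("text does not look like an mstats block")
    keys = ("sbrked", "sbrks", "continuous", "pad", "heads", "chain", "tail")
    result = dict(zip(keys, map(int, total.groups())))
    result["smallest"] = int(buckets.group(1))
    result["greatest"] = int(buckets.group(3))
    result["free"] = int(free.group(1))
    result["used"] = int(used.group(1))
    return result


if __name__ == "__main__":
    stats = parse_mstats(REPORT)
    accounted = sum(
        stats[k] for k in ("free", "used", "pad", "heads", "chain", "tail")
    )
    print(stats["sbrked"], accounted)
```

In this sample, free plus used plus the four "odd ends" adds up exactly to the sbrk() total; the sketch only reports that sum and does not assume it holds for every report.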