Diffstat (limited to 'upstream/fedora-40/man1/perlperf.1')
-rw-r--r-- | upstream/fedora-40/man1/perlperf.1 | 1292 |
1 files changed, 1292 insertions, 0 deletions
diff --git a/upstream/fedora-40/man1/perlperf.1 b/upstream/fedora-40/man1/perlperf.1 new file mode 100644 index 00000000..ec602ed3 --- /dev/null +++ b/upstream/fedora-40/man1/perlperf.1 @@ -0,0 +1,1292 @@ +.\" -*- mode: troff; coding: utf-8 -*- +.\" Automatically generated by Pod::Man 5.01 (Pod::Simple 3.43) +.\" +.\" Standard preamble: +.\" ======================================================================== +.de Sp \" Vertical space (when we can't use .PP) +.if t .sp .5v +.if n .sp +.. +.de Vb \" Begin verbatim text +.ft CW +.nf +.ne \\$1 +.. +.de Ve \" End verbatim text +.ft R +.fi +.. +.\" \*(C` and \*(C' are quotes in nroff, nothing in troff, for use with C<>. +.ie n \{\ +. ds C` "" +. ds C' "" +'br\} +.el\{\ +. ds C` +. ds C' +'br\} +.\" +.\" Escape single quotes in literal strings from groff's Unicode transform. +.ie \n(.g .ds Aq \(aq +.el .ds Aq ' +.\" +.\" If the F register is >0, we'll generate index entries on stderr for +.\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index +.\" entries marked with X<> in POD. Of course, you'll have to process the +.\" output yourself in some meaningful fashion. +.\" +.\" Avoid warning from groff about undefined register 'F'. +.de IX +.. +.nr rF 0 +.if \n(.g .if rF .nr rF 1 +.if (\n(rF:(\n(.g==0)) \{\ +. if \nF \{\ +. de IX +. tm Index:\\$1\t\\n%\t"\\$2" +.. +. if !\nF==2 \{\ +. nr % 0 +. nr F 2 +. \} +. \} +.\} +.rr rF +.\" ======================================================================== +.\" +.IX Title "PERLPERF 1" +.TH PERLPERF 1 2024-01-25 "perl v5.38.2" "Perl Programmers Reference Guide" +.\" For nroff, turn off justification. Always turn off hyphenation; it makes +.\" way too many mistakes in technical documents. +.if n .ad l +.nh +.SH NAME +perlperf \- Perl Performance and Optimization Techniques +.SH DESCRIPTION +.IX Header "DESCRIPTION" +This is an introduction to the use of performance and optimization techniques +which can be used with particular reference to perl programs. While many perl +developers have come from other languages, and can use their prior knowledge +where appropriate, there are many other people who might benefit from a few +perl specific pointers. If you want the condensed version, perhaps the best +advice comes from the renowned Japanese Samurai, Miyamoto Musashi, who said: +.PP +.Vb 1 +\& "Do Not Engage in Useless Activity" +.Ve +.PP +in 1645. +.SH OVERVIEW +.IX Header "OVERVIEW" +Perhaps the most common mistake programmers make is to attempt to optimize +their code before a program actually does anything useful \- this is a bad idea. +There's no point in having an extremely fast program that doesn't work. The +first job is to get a program to \fIcorrectly\fR do something \fBuseful\fR, (not to +mention ensuring the test suite is fully functional), and only then to consider +optimizing it. Having decided to optimize existing working code, there are +several simple but essential steps to consider which are intrinsic to any +optimization process. +.SS "ONE STEP SIDEWAYS" +.IX Subsection "ONE STEP SIDEWAYS" +Firstly, you need to establish a baseline time for the existing code, which +timing needs to be reliable and repeatable. You'll probably want to use the +\&\f(CW\*(C`Benchmark\*(C'\fR or \f(CW\*(C`Devel::NYTProf\*(C'\fR modules, or something similar, for this step, +or perhaps the Unix system \f(CW\*(C`time\*(C'\fR utility, whichever is appropriate. See the +base of this document for a longer list of benchmarking and profiling modules, +and recommended further reading. 
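+.PP
+For example, a baseline harness need only be a few lines of \f(CW\*(C`Benchmark\*(C'\fR
+code. In this sketch, \f(CWprocess_file()\fR is simply a stand-in for whichever
+routine you actually intend to optimize (it is not part of any real program),
+and the negative count asks \f(CW\*(C`Benchmark\*(C'\fR to keep running the code for at
+least three CPU seconds:
+.PP
+.Vb 1
+\& #!/usr/bin/perl
+\&
+\& use v5.36;
+\&
+\& use Benchmark;
+\&
+\& # stand\-in for the routine you actually want to measure
+\& sub process_file {
+\&     my $total = 0;
+\&     $total += $_ for 1 .. 10_000;
+\&     return $total;
+\& }
+\&
+\& # a negative count means "run for at least that many CPU seconds"
+\& timethese( \-3, {
+\&     \*(Aqbaseline\*(Aq => sub { process_file() },
+\& });
+.Ve
+.PP
+Keeping such a harness alongside the code means exactly the same measurement
+can be repeated after every change.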
+.SS "ONE STEP FORWARD" +.IX Subsection "ONE STEP FORWARD" +Next, having examined the program for \fIhot spots\fR, (places where the code +seems to run slowly), change the code with the intention of making it run +faster. Using version control software, like \f(CW\*(C`subversion\*(C'\fR, will ensure no +changes are irreversible. It's too easy to fiddle here and fiddle there \- +don't change too much at any one time or you might not discover which piece of +code \fBreally\fR was the slow bit. +.SS "ANOTHER STEP SIDEWAYS" +.IX Subsection "ANOTHER STEP SIDEWAYS" +It's not enough to say: "that will make it run faster", you have to check it. +Rerun the code under control of the benchmarking or profiling modules, from the +first step above, and check that the new code executed the \fBsame task\fR in +\&\fIless time\fR. Save your work and repeat... +.SH "GENERAL GUIDELINES" +.IX Header "GENERAL GUIDELINES" +The critical thing when considering performance is to remember there is no such +thing as a \f(CW\*(C`Golden Bullet\*(C'\fR, which is why there are no rules, only guidelines. +.PP +It is clear that inline code is going to be faster than subroutine or method +calls, because there is less overhead, but this approach has the disadvantage +of being less maintainable and comes at the cost of greater memory usage \- +there is no such thing as a free lunch. If you are searching for an element in +a list, it can be more efficient to store the data in a hash structure, and +then simply look to see whether the key is defined, rather than to loop through +the entire array using \fBgrep()\fR for instance. \fBsubstr()\fR may be (a lot) faster +than \fBgrep()\fR but not as flexible, so you have another trade-off to access. Your +code may contain a line which takes 0.01 of a second to execute which if you +call it 1,000 times, quite likely in a program parsing even medium sized files +for instance, you already have a 10 second delay, in just one single code +location, and if you call that line 100,000 times, your entire program will +slow down to an unbearable crawl. +.PP +Using a subroutine as part of your sort is a powerful way to get exactly what +you want, but will usually be slower than the built-in \fIalphabetic\fR \f(CW\*(C`cmp\*(C'\fR and +\&\fInumeric\fR \f(CW\*(C`<=>\*(C'\fR sort operators. It is possible to make multiple +passes over your data, building indices to make the upcoming sort more +efficient, and to use what is known as the \f(CW\*(C`OM\*(C'\fR (Orcish Maneuver) to cache the +sort keys in advance. The cache lookup, while a good idea, can itself be a +source of slowdown by enforcing a double pass over the data \- once to setup the +cache, and once to sort the data. Using \f(CWpack()\fR to extract the required sort +key into a consistent string can be an efficient way to build a single string +to compare, instead of using multiple sort keys, which makes it possible to use +the standard, written in \f(CW\*(C`c\*(C'\fR and fast, perl \f(CWsort()\fR function on the output, +and is the basis of the \f(CW\*(C`GRT\*(C'\fR (Guttman Rossler Transform). Some string +combinations can slow the \f(CW\*(C`GRT\*(C'\fR down, by just being too plain complex for its +own good. +.PP +For applications using database backends, the standard \f(CW\*(C`DBIx\*(C'\fR namespace has +tries to help with keeping things nippy, not least because it tries to \fInot\fR +query the database until the latest possible moment, but always read the docs +which come with your choice of libraries. 
+.PP
+For applications using database backends, the standard \f(CW\*(C`DBIx\*(C'\fR namespace
+tries to help with keeping things nippy, not least because it tries to \fInot\fR
+query the database until the latest possible moment, but always read the docs
+which come with your choice of libraries. Among the many issues developers
+dealing with databases should remain aware of is the need to always use
+\&\f(CW\*(C`SQL\*(C'\fR placeholders and to consider pre-fetching data sets when this might
+prove advantageous. Splitting up a large file by assigning multiple processes
+to parse a single file, using say \f(CW\*(C`POE\*(C'\fR, \f(CW\*(C`threads\*(C'\fR or \f(CW\*(C`fork\*(C'\fR, can also be a
+useful way of optimizing your usage of the available \f(CW\*(C`CPU\*(C'\fR resources, though
+this technique is fraught with concurrency issues and demands high attention to
+detail.
+.PP
+Every case has a specific application and one or more exceptions, and there is
+no replacement for running a few tests and finding out which method works best
+for your particular environment; this is why writing optimal code is not an
+exact science, and why we love using Perl so much \- TMTOWTDI.
+.SH BENCHMARKS
+.IX Header "BENCHMARKS"
+Here are a few examples to demonstrate usage of Perl's benchmarking tools.
+.SS "Assigning and Dereferencing Variables."
+.IX Subsection "Assigning and Dereferencing Variables."
+I'm sure most of us have seen code which looks like, (or worse than), this:
+.PP
+.Vb 2
+\& if ( $obj\->{_ref}\->{_myscore} >= $obj\->{_ref}\->{_yourscore} ) {
+\&     ...
+.Ve
+.PP
+This sort of code can be a real eyesore to read, as well as being very
+sensitive to typos, and it's much clearer to dereference the variable
+explicitly. We're side-stepping the issue of working with object-oriented
+programming techniques to encapsulate variable access via methods, only
+accessible through an object. Here we're just discussing the technical
+implementation of choice, and whether this has an effect on performance. We
+can see whether this dereferencing operation has any overhead by putting
+comparative code in a file and running a \f(CW\*(C`Benchmark\*(C'\fR test.
+.PP
+# dereference
+.PP
+.Vb 1
+\& #!/usr/bin/perl
+\&
+\& use v5.36;
+\&
+\& use Benchmark;
+\&
+\& my $ref = {
+\&     \*(Aqref\*(Aq => {
+\&         _myscore   => \*(Aq100 + 1\*(Aq,
+\&         _yourscore => \*(Aq102 \- 1\*(Aq,
+\&     },
+\& };
+\&
+\& timethese(1000000, {
+\&     \*(Aqdirect\*(Aq => sub {
+\&         my $x = $ref\->{ref}\->{_myscore} . $ref\->{ref}\->{_yourscore} ;
+\&     },
+\&     \*(Aqdereference\*(Aq => sub {
+\&         my $ref       = $ref\->{ref};
+\&         my $myscore   = $ref\->{_myscore};
+\&         my $yourscore = $ref\->{_yourscore};
+\&         my $x = $myscore . $yourscore;
+\&     },
+\& });
+.Ve
+.PP
+It's essential to run any timing measurements a sufficient number of times so
+the numbers settle on a numerical average; otherwise each run will naturally
+fluctuate due to variations in the environment, contention for \f(CW\*(C`CPU\*(C'\fR
+resources and network bandwidth for instance. Running
+the above code for one million iterations, we can take a look at the report
+output by the \f(CW\*(C`Benchmark\*(C'\fR module, to see which approach is the most effective.
+.PP
+.Vb 1
+\& $> perl dereference
+\&
+\& Benchmark: timing 1000000 iterations of dereference, direct...
+\& dereference: 2 wallclock secs ( 1.59 usr + 0.00 sys = 1.59 CPU) @ 628930.82/s (n=1000000)
+\& direct: 1 wallclock secs ( 1.20 usr + 0.00 sys = 1.20 CPU) @ 833333.33/s (n=1000000)
+.Ve
+.PP
+The difference is clear to see and the dereferencing approach is slower. While
+it managed to execute an average of 628,930 times a second during our test, the
+direct approach managed to run an additional 204,403 times, unfortunately.
+Unfortunately so, because there are many examples of code written using the
+multiple layer direct variable access, and it's usually horrible. It is,
+however, fractionally faster. The question remains whether the minute gain is
+actually worth the eyestrain, or the loss of maintainability.
+.SS "Search and replace or tr"
+.IX Subsection "Search and replace or tr"
+If we have a string which needs to be modified, while a regex will almost
+always be much more flexible, \f(CW\*(C`tr\*(C'\fR, an oft underused tool, can still be
+useful. One scenario might be to replace all vowels with another character. The
+regex solution might look like this:
+.PP
+.Vb 1
+\& $str =~ s/[aeiou]/x/g
+.Ve
+.PP
+The \f(CW\*(C`tr\*(C'\fR alternative might look like this:
+.PP
+.Vb 1
+\& $str =~ tr/aeiou/xxxxx/
+.Ve
+.PP
+We can put that into a test file which we can run to check which approach is
+the fastest, using a global \f(CW$STR\fR variable to assign to the \f(CW\*(C`my $str\*(C'\fR
+variable so as to avoid perl trying to optimize any of the work away by
+noticing it's assigned only the once.
+.PP
+# regex-transliterate
+.PP
+.Vb 1
+\& #!/usr/bin/perl
+\&
+\& use v5.36;
+\&
+\& use Benchmark;
+\&
+\& my $STR = "$$\-this and that";
+\&
+\& timethese( 1000000, {
+\&     \*(Aqsr\*(Aq => sub { my $str = $STR; $str =~ s/[aeiou]/x/g; return $str; },
+\&     \*(Aqtr\*(Aq => sub { my $str = $STR; $str =~ tr/aeiou/xxxxx/; return $str; },
+\& });
+.Ve
+.PP
+Running the code gives us our results:
+.PP
+.Vb 1
+\& $> perl regex\-transliterate
+\&
+\& Benchmark: timing 1000000 iterations of sr, tr...
+\& sr: 2 wallclock secs ( 1.19 usr + 0.00 sys = 1.19 CPU) @ 840336.13/s (n=1000000)
+\& tr: 0 wallclock secs ( 0.49 usr + 0.00 sys = 0.49 CPU) @ 2040816.33/s (n=1000000)
+.Ve
+.PP
+The \f(CW\*(C`tr\*(C'\fR version is a clear winner. One solution is flexible, the other is
+fast \- and it's appropriately the programmer's choice which to use.
+.PP
+Check the \f(CW\*(C`Benchmark\*(C'\fR docs for further useful techniques.
+.SH "PROFILING TOOLS"
+.IX Header "PROFILING TOOLS"
+A slightly larger piece of code will provide something on which a profiler can
+produce more extensive reporting statistics. This example uses the simplistic
+\&\f(CW\*(C`wordmatch\*(C'\fR program which parses a given input file and spews out a short
+report on the contents.
+.PP
+# wordmatch
+.PP
+.Vb 1
+\& #!/usr/bin/perl
+\&
+\& use v5.36;
+\&
+\& =head1 NAME
+\&
+\& filewords \- word analysis of input file
+\&
+\& =head1 SYNOPSIS
+\&
+\& filewords \-f inputfilename [\-d]
+\&
+\& =head1 DESCRIPTION
+\&
+\& This program parses the given filename, specified with C<\-f>, and
+\& displays a simple analysis of the words found therein. Use the C<\-d>
+\& switch to enable debugging messages.
+\&
+\& =cut
+\&
+\& use FileHandle;
+\& use Getopt::Long;
+\&
+\& my $debug = 0;
+\& my $file = \*(Aq\*(Aq;
+\&
+\& my $result = GetOptions (
+\&     \*(Aqdebug\*(Aq  => \e$debug,
+\&     \*(Aqfile=s\*(Aq => \e$file,
+\& );
+\& die("invalid args") unless $result;
+\&
+\& unless ( \-f $file ) {
+\&     die("Usage: $0 \-f filename [\-d]");
+\& }
+\& my $FH = FileHandle\->new("< $file")
+\&     or die("unable to open file($file): $!");
+\&
+\& my $i_LINES = 0;
+\& my $i_WORDS = 0;
+\& my %count = ();
+\&
+\& my @lines = <$FH>;
+\& foreach my $line ( @lines ) {
+\&     $i_LINES++;
+\&     $line =~ s/\en//;
+\&     my @words = split(/ +/, $line);
+\&     my $i_words = scalar(@words);
+\&     $i_WORDS = $i_WORDS + $i_words;
+\&     debug("line: $i_LINES supplying $i_words words: @words");
+\&     my $i_word = 0;
+\&     foreach my $word ( @words ) {
+\&         $i_word++;
+\&         $count{$i_LINES}{spec} += matches($i_word, $word,
+\&             \*(Aq[^a\-zA\-Z0\-9]\*(Aq);
+\&         $count{$i_LINES}{only} += matches($i_word, $word,
+\&             \*(Aq^[^a\-zA\-Z0\-9]+$\*(Aq);
+\&         $count{$i_LINES}{cons} += matches($i_word, $word,
+\&             \*(Aq^[(?i:bcdfghjklmnpqrstvwxyz)]+$\*(Aq);
+\&         $count{$i_LINES}{vows} += matches($i_word, $word,
+\&             \*(Aq^[(?i:aeiou)]+$\*(Aq);
+\&         $count{$i_LINES}{caps} += matches($i_word, $word,
+\&             \*(Aq^[(A\-Z)]+$\*(Aq);
+\&     }
+\& }
+\&
+\& print report( %count );
+\&
+\& sub matches {
+\&     my $i_wd  = shift;
+\&     my $word  = shift;
+\&     my $regex = shift;
+\&     my $has = 0;
+\&
+\&     if ( $word =~ /($regex)/ ) {
+\&         $has++ if $1;
+\&     }
+\&
+\&     debug( "word: $i_wd "
+\&         . ($has ? \*(Aqmatches\*(Aq : \*(Aqdoes not match\*(Aq)
+\&         . " chars: /$regex/");
+\&
+\&     return $has;
+\& }
+\&
+\& sub report {
+\&     my %report = @_;
+\&     my %rep;
+\&
+\&     foreach my $line ( keys %report ) {
+\&         foreach my $key ( keys $report{$line}\->%* ) {
+\&             $rep{$key} += $report{$line}{$key};
+\&         }
+\&     }
+\&
+\&     # the per\-category totals summed above live in %rep
+\&     my $report = qq|
+\& $0 report for $file:
+\& lines in file: $i_LINES
+\& words in file: $i_WORDS
+\& words with special (non\-word) characters: $rep{spec}
+\& words with only special (non\-word) characters: $rep{only}
+\& words with only consonants: $rep{cons}
+\& words with only capital letters: $rep{caps}
+\& words with only vowels: $rep{vows}
+\& |;
+\&
+\&     return $report;
+\& }
+\&
+\& sub debug {
+\&     my $message = shift;
+\&
+\&     if ( $debug ) {
+\&         print STDERR "DBG: $message\en";
+\&     }
+\& }
+\&
+\& exit 0;
+.Ve
+.SS Devel::DProf
+.IX Subsection "Devel::DProf"
+This venerable module has been the de-facto standard for Perl code profiling
+for more than a decade, but has been replaced by a number of other modules
+which have brought us back to the 21st century. Although you're encouraged to
+evaluate your tool from the several mentioned here and from the CPAN list at
+the base of this document, (and currently Devel::NYTProf seems to be the
+weapon of choice \- see below), we'll take a quick look at the output from
+Devel::DProf first, to set a baseline for Perl profiling tools. Run the
+above program under the control of \f(CW\*(C`Devel::DProf\*(C'\fR by using the \f(CW\*(C`\-d\*(C'\fR switch on
+the command-line.
+.PP +.Vb 1 +\& $> perl \-d:DProf wordmatch \-f perl5db.pl +\& +\& <...multiple lines snipped...> +\& +\& wordmatch report for perl5db.pl: +\& lines in file: 9428 +\& words in file: 50243 +\& words with special (non\-word) characters: 20480 +\& words with only special (non\-word) characters: 7790 +\& words with only consonants: 4801 +\& words with only capital letters: 1316 +\& words with only vowels: 1701 +.Ve +.PP +\&\f(CW\*(C`Devel::DProf\*(C'\fR produces a special file, called \fItmon.out\fR by default, and +this file is read by the \f(CW\*(C`dprofpp\*(C'\fR program, which is already installed as part +of the \f(CW\*(C`Devel::DProf\*(C'\fR distribution. If you call \f(CW\*(C`dprofpp\*(C'\fR with no options, +it will read the \fItmon.out\fR file in the current directory and produce a human +readable statistics report of the run of your program. Note that this may take +a little time. +.PP +.Vb 1 +\& $> dprofpp +\& +\& Total Elapsed Time = 2.951677 Seconds +\& User+System Time = 2.871677 Seconds +\& Exclusive Times +\& %Time ExclSec CumulS #Calls sec/call Csec/c Name +\& 102. 2.945 3.003 251215 0.0000 0.0000 main::matches +\& 2.40 0.069 0.069 260643 0.0000 0.0000 main::debug +\& 1.74 0.050 0.050 1 0.0500 0.0500 main::report +\& 1.04 0.030 0.049 4 0.0075 0.0123 main::BEGIN +\& 0.35 0.010 0.010 3 0.0033 0.0033 Exporter::as_heavy +\& 0.35 0.010 0.010 7 0.0014 0.0014 IO::File::BEGIN +\& 0.00 \- \-0.000 1 \- \- Getopt::Long::FindOption +\& 0.00 \- \-0.000 1 \- \- Symbol::BEGIN +\& 0.00 \- \-0.000 1 \- \- Fcntl::BEGIN +\& 0.00 \- \-0.000 1 \- \- Fcntl::bootstrap +\& 0.00 \- \-0.000 1 \- \- warnings::BEGIN +\& 0.00 \- \-0.000 1 \- \- IO::bootstrap +\& 0.00 \- \-0.000 1 \- \- Getopt::Long::ConfigDefaults +\& 0.00 \- \-0.000 1 \- \- Getopt::Long::Configure +\& 0.00 \- \-0.000 1 \- \- Symbol::gensym +.Ve +.PP +\&\f(CW\*(C`dprofpp\*(C'\fR will produce some quite detailed reporting on the activity of the +\&\f(CW\*(C`wordmatch\*(C'\fR program. The wallclock, user and system, times are at the top of +the analysis, and after this are the main columns defining which define the +report. Check the \f(CW\*(C`dprofpp\*(C'\fR docs for details of the many options it supports. +.PP +See also \f(CW\*(C`Apache::DProf\*(C'\fR which hooks \f(CW\*(C`Devel::DProf\*(C'\fR into \f(CW\*(C`mod_perl\*(C'\fR. +.SS Devel::Profiler +.IX Subsection "Devel::Profiler" +Let's take a look at the same program using a different profiler: +\&\f(CW\*(C`Devel::Profiler\*(C'\fR, a drop-in Perl-only replacement for \f(CW\*(C`Devel::DProf\*(C'\fR. The +usage is very slightly different in that instead of using the special \f(CW\*(C`\-d:\*(C'\fR +flag, you pull \f(CW\*(C`Devel::Profiler\*(C'\fR in directly as a module using \f(CW\*(C`\-M\*(C'\fR. +.PP +.Vb 1 +\& $> perl \-MDevel::Profiler wordmatch \-f perl5db.pl +\& +\& <...multiple lines snipped...> +\& +\& wordmatch report for perl5db.pl: +\& lines in file: 9428 +\& words in file: 50243 +\& words with special (non\-word) characters: 20480 +\& words with only special (non\-word) characters: 7790 +\& words with only consonants: 4801 +\& words with only capital letters: 1316 +\& words with only vowels: 1701 +.Ve +.PP +\&\f(CW\*(C`Devel::Profiler\*(C'\fR generates a tmon.out file which is compatible with the +\&\f(CW\*(C`dprofpp\*(C'\fR program, thus saving the construction of a dedicated statistics +reader program. \f(CW\*(C`dprofpp\*(C'\fR usage is therefore identical to the above example. 
+.PP +.Vb 1 +\& $> dprofpp +\& +\& Total Elapsed Time = 20.984 Seconds +\& User+System Time = 19.981 Seconds +\& Exclusive Times +\& %Time ExclSec CumulS #Calls sec/call Csec/c Name +\& 49.0 9.792 14.509 251215 0.0000 0.0001 main::matches +\& 24.4 4.887 4.887 260643 0.0000 0.0000 main::debug +\& 0.25 0.049 0.049 1 0.0490 0.0490 main::report +\& 0.00 0.000 0.000 1 0.0000 0.0000 Getopt::Long::GetOptions +\& 0.00 0.000 0.000 2 0.0000 0.0000 Getopt::Long::ParseOptionSpec +\& 0.00 0.000 0.000 1 0.0000 0.0000 Getopt::Long::FindOption +\& 0.00 0.000 0.000 1 0.0000 0.0000 IO::File::new +\& 0.00 0.000 0.000 1 0.0000 0.0000 IO::Handle::new +\& 0.00 0.000 0.000 1 0.0000 0.0000 Symbol::gensym +\& 0.00 0.000 0.000 1 0.0000 0.0000 IO::File::open +.Ve +.PP +Interestingly we get slightly different results, which is mostly because the +algorithm which generates the report is different, even though the output file +format was allegedly identical. The elapsed, user and system times are clearly +showing the time it took for \f(CW\*(C`Devel::Profiler\*(C'\fR to execute its own run, but +the column listings feel more accurate somehow than the ones we had earlier +from \f(CW\*(C`Devel::DProf\*(C'\fR. The 102% figure has disappeared, for example. This is +where we have to use the tools at our disposal, and recognise their pros and +cons, before using them. Interestingly, the numbers of calls for each +subroutine are identical in the two reports, it's the percentages which differ. +As the author of \f(CW\*(C`Devel::Profiler\*(C'\fR writes: +.PP +.Vb 6 +\& ...running HTML::Template\*(Aqs test suite under Devel::DProf shows +\& output() taking NO time but Devel::Profiler shows around 10% of the +\& time is in output(). I don\*(Aqt know which to trust but my gut tells me +\& something is wrong with Devel::DProf. HTML::Template::output() is a +\& big routine that\*(Aqs called for every test. Either way, something needs +\& fixing. +.Ve +.PP +YMMV. +.PP +See also \f(CW\*(C`Devel::Apache::Profiler\*(C'\fR which hooks \f(CW\*(C`Devel::Profiler\*(C'\fR +into \f(CW\*(C`mod_perl\*(C'\fR. +.SS Devel::SmallProf +.IX Subsection "Devel::SmallProf" +The \f(CW\*(C`Devel::SmallProf\*(C'\fR profiler examines the runtime of your Perl program and +produces a line-by-line listing to show how many times each line was called, +and how long each line took to execute. It is called by supplying the familiar +\&\f(CW\*(C`\-d\*(C'\fR flag to Perl at runtime. +.PP +.Vb 1 +\& $> perl \-d:SmallProf wordmatch \-f perl5db.pl +\& +\& <...multiple lines snipped...> +\& +\& wordmatch report for perl5db.pl: +\& lines in file: 9428 +\& words in file: 50243 +\& words with special (non\-word) characters: 20480 +\& words with only special (non\-word) characters: 7790 +\& words with only consonants: 4801 +\& words with only capital letters: 1316 +\& words with only vowels: 1701 +.Ve +.PP +\&\f(CW\*(C`Devel::SmallProf\*(C'\fR writes its output into a file called \fIsmallprof.out\fR, by +default. The format of the file looks like this: +.PP +.Vb 1 +\& <num> <time> <ctime> <line>:<text> +.Ve +.PP +When the program has terminated, the output may be examined and sorted using +any standard text filtering utilities. Something like the following may be +sufficient: +.PP +.Vb 1 +\& $> cat smallprof.out | grep \ed*: | sort \-k3 | tac | head \-n20 +\& +\& 251215 1.65674 7.68000 75: if ( $word =~ /($regex)/ ) { +\& 251215 0.03264 4.40000 79: debug("word: $i_wd ".($has ? 
\*(Aqmatches\*(Aq : +\& 251215 0.02693 4.10000 81: return $has; +\& 260643 0.02841 4.07000 128: if ( $debug ) { +\& 260643 0.02601 4.04000 126: my $message = shift; +\& 251215 0.02641 3.91000 73: my $has = 0; +\& 251215 0.03311 3.71000 70: my $i_wd = shift; +\& 251215 0.02699 3.69000 72: my $regex = shift; +\& 251215 0.02766 3.68000 71: my $word = shift; +\& 50243 0.59726 1.00000 59: $count{$i_LINES}{cons} = +\& 50243 0.48175 0.92000 61: $count{$i_LINES}{spec} = +\& 50243 0.00644 0.89000 56: my $i_cons = matches($i_word, $word, +\& 50243 0.48837 0.88000 63: $count{$i_LINES}{caps} = +\& 50243 0.00516 0.88000 58: my $i_caps = matches($i_word, $word, \*(Aq^[(A\- +\& 50243 0.00631 0.81000 54: my $i_spec = matches($i_word, $word, \*(Aq[^a\- +\& 50243 0.00496 0.80000 57: my $i_vows = matches($i_word, $word, +\& 50243 0.00688 0.80000 53: $i_word++; +\& 50243 0.48469 0.79000 62: $count{$i_LINES}{only} = +\& 50243 0.48928 0.77000 60: $count{$i_LINES}{vows} = +\& 50243 0.00683 0.75000 55: my $i_only = matches($i_word, $word, \*(Aq^[^a\- +.Ve +.PP +You can immediately see a slightly different focus to the subroutine profiling +modules, and we start to see exactly which line of code is taking the most +time. That regex line is looking a bit suspicious, for example. Remember that +these tools are supposed to be used together, there is no single best way to +profile your code, you need to use the best tools for the job. +.PP +See also \f(CW\*(C`Apache::SmallProf\*(C'\fR which hooks \f(CW\*(C`Devel::SmallProf\*(C'\fR into +\&\f(CW\*(C`mod_perl\*(C'\fR. +.SS Devel::FastProf +.IX Subsection "Devel::FastProf" +\&\f(CW\*(C`Devel::FastProf\*(C'\fR is another Perl line profiler. This was written with a view +to getting a faster line profiler, than is possible with for example +\&\f(CW\*(C`Devel::SmallProf\*(C'\fR, because it's written in \f(CW\*(C`C\*(C'\fR. To use \f(CW\*(C`Devel::FastProf\*(C'\fR, +supply the \f(CW\*(C`\-d\*(C'\fR argument to Perl: +.PP +.Vb 1 +\& $> perl \-d:FastProf wordmatch \-f perl5db.pl +\& +\& <...multiple lines snipped...> +\& +\& wordmatch report for perl5db.pl: +\& lines in file: 9428 +\& words in file: 50243 +\& words with special (non\-word) characters: 20480 +\& words with only special (non\-word) characters: 7790 +\& words with only consonants: 4801 +\& words with only capital letters: 1316 +\& words with only vowels: 1701 +.Ve +.PP +\&\f(CW\*(C`Devel::FastProf\*(C'\fR writes statistics to the file \fIfastprof.out\fR in the current +directory. The output file, which can be specified, can be interpreted by using +the \f(CW\*(C`fprofpp\*(C'\fR command-line program. +.PP +.Vb 1 +\& $> fprofpp | head \-n20 +\& +\& # fprofpp output format is: +\& # filename:line time count: source +\& wordmatch:75 3.93338 251215: if ( $word =~ /($regex)/ ) { +\& wordmatch:79 1.77774 251215: debug("word: $i_wd ".($has ? \*(Aqmatches\*(Aq : \*(Aqdoes not match\*(Aq)." 
chars: /$regex/");
+\& wordmatch:81 1.47604 251215: return $has;
+\& wordmatch:126 1.43441 260643: my $message = shift;
+\& wordmatch:128 1.42156 260643: if ( $debug ) {
+\& wordmatch:70 1.36824 251215: my $i_wd = shift;
+\& wordmatch:71 1.36739 251215: my $word = shift;
+\& wordmatch:72 1.35939 251215: my $regex = shift;
+.Ve
+.PP
+Straightaway we can see that the number of times each line has been called is
+identical to the \f(CW\*(C`Devel::SmallProf\*(C'\fR output, and the sequence is only very
+slightly different based on the ordering of the amount of time each line took
+to execute, \f(CW\*(C`if ( $debug ) { \*(C'\fR and \f(CW\*(C`my $message = shift;\*(C'\fR, for example. The
+differences in the actual times recorded might lie in the algorithm used
+internally, or could be due to system resource limitations or contention.
+.PP
+See also \f(CW\*(C`DBIx::Profile\*(C'\fR which will profile database queries running
+under the \f(CW\*(C`DBIx::*\*(C'\fR namespace.
+.SS Devel::NYTProf
+.IX Subsection "Devel::NYTProf"
+\&\f(CW\*(C`Devel::NYTProf\*(C'\fR is the \fBnext generation\fR of Perl code profiler, fixing many
+shortcomings in other tools and implementing many cool features. First of all it
+can be used as a \fIline\fR profiler, a \fIblock\fR or a \fIsubroutine\fR
+profiler, all at once. It can also use sub-microsecond (100ns) resolution on
+systems which provide \f(CWclock_gettime()\fR. It can be started and stopped even
+by the program being profiled. It's a one-line entry to profile \f(CW\*(C`mod_perl\*(C'\fR
+applications. It's written in \f(CW\*(C`c\*(C'\fR and is probably the fastest profiler
+available for Perl. The list of coolness just goes on. Enough of that, let's
+see how it works \- just use the familiar \f(CW\*(C`\-d\*(C'\fR switch to plug it in and run
+the code.
+.PP
+.Vb 1
+\& $> perl \-d:NYTProf wordmatch \-f perl5db.pl
+\&
+\& wordmatch report for perl5db.pl:
+\& lines in file: 9427
+\& words in file: 50243
+\& words with special (non\-word) characters: 20480
+\& words with only special (non\-word) characters: 7790
+\& words with only consonants: 4801
+\& words with only capital letters: 1316
+\& words with only vowels: 1701
+.Ve
+.PP
+\&\f(CW\*(C`NYTProf\*(C'\fR will generate a report database into the file \fInytprof.out\fR by
+default. Human readable reports can be generated from here by using the
+supplied \f(CW\*(C`nytprofhtml\*(C'\fR (HTML output) and \f(CW\*(C`nytprofcsv\*(C'\fR (CSV output) programs.
+We've used the Unix system \f(CW\*(C`html2text\*(C'\fR utility to convert the
+\&\fInytprof/index.html\fR file for convenience here.
+.PP +.Vb 1 +\& $> html2text nytprof/index.html +\& +\& Performance Profile Index +\& For wordmatch +\& Run on Fri Sep 26 13:46:39 2008 +\& Reported on Fri Sep 26 13:47:23 2008 +\& +\& Top 15 Subroutines \-\- ordered by exclusive time +\& |Calls |P |F |Inclusive|Exclusive|Subroutine | +\& | | | |Time |Time | | +\& |251215|5 |1 |13.09263 |10.47692 |main:: |matches | +\& |260642|2 |1 |2.71199 |2.71199 |main:: |debug | +\& |1 |1 |1 |0.21404 |0.21404 |main:: |report | +\& |2 |2 |2 |0.00511 |0.00511 |XSLoader:: |load (xsub) | +\& |14 |14|7 |0.00304 |0.00298 |Exporter:: |import | +\& |3 |1 |1 |0.00265 |0.00254 |Exporter:: |as_heavy | +\& |10 |10|4 |0.00140 |0.00140 |vars:: |import | +\& |13 |13|1 |0.00129 |0.00109 |constant:: |import | +\& |1 |1 |1 |0.00360 |0.00096 |FileHandle:: |import | +\& |3 |3 |3 |0.00086 |0.00074 |warnings::register::|import | +\& |9 |3 |1 |0.00036 |0.00036 |strict:: |bits | +\& |13 |13|13|0.00032 |0.00029 |strict:: |import | +\& |2 |2 |2 |0.00020 |0.00020 |warnings:: |import | +\& |2 |1 |1 |0.00020 |0.00020 |Getopt::Long:: |ParseOptionSpec| +\& |7 |7 |6 |0.00043 |0.00020 |strict:: |unimport | +\& +\& For more information see the full list of 189 subroutines. +.Ve +.PP +The first part of the report already shows the critical information regarding +which subroutines are using the most time. The next gives some statistics +about the source files profiled. +.PP +.Vb 10 +\& Source Code Files \-\- ordered by exclusive time then name +\& |Stmts |Exclusive|Avg. |Reports |Source File | +\& | |Time | | | | +\& |2699761|15.66654 |6e\-06 |line . block . sub|wordmatch | +\& |35 |0.02187 |0.00062|line . block . sub|IO/Handle.pm | +\& |274 |0.01525 |0.00006|line . block . sub|Getopt/Long.pm | +\& |20 |0.00585 |0.00029|line . block . sub|Fcntl.pm | +\& |128 |0.00340 |0.00003|line . block . sub|Exporter/Heavy.pm | +\& |42 |0.00332 |0.00008|line . block . sub|IO/File.pm | +\& |261 |0.00308 |0.00001|line . block . sub|Exporter.pm | +\& |323 |0.00248 |8e\-06 |line . block . sub|constant.pm | +\& |12 |0.00246 |0.00021|line . block . sub|File/Spec/Unix.pm | +\& |191 |0.00240 |0.00001|line . block . sub|vars.pm | +\& |77 |0.00201 |0.00003|line . block . sub|FileHandle.pm | +\& |12 |0.00198 |0.00016|line . block . sub|Carp.pm | +\& |14 |0.00175 |0.00013|line . block . sub|Symbol.pm | +\& |15 |0.00130 |0.00009|line . block . sub|IO.pm | +\& |22 |0.00120 |0.00005|line . block . sub|IO/Seekable.pm | +\& |198 |0.00085 |4e\-06 |line . block . sub|warnings/register.pm| +\& |114 |0.00080 |7e\-06 |line . block . sub|strict.pm | +\& |47 |0.00068 |0.00001|line . block . sub|warnings.pm | +\& |27 |0.00054 |0.00002|line . block . sub|overload.pm | +\& |9 |0.00047 |0.00005|line . block . sub|SelectSaver.pm | +\& |13 |0.00045 |0.00003|line . block . sub|File/Spec.pm | +\& |2701595|15.73869 | |Total | +\& |128647 |0.74946 | |Average | +\& | |0.00201 |0.00003|Median | +\& | |0.00121 |0.00003|Deviation | +\& +\& Report produced by the NYTProf 2.03 Perl profiler, developed by Tim Bunce and +\& Adam Kaplan. +.Ve +.PP +At this point, if you're using the \fIhtml\fR report, you can click through the +various links to bore down into each subroutine and each line of code. Because +we're using the text reporting here, and there's a whole directory full of +reports built for each source file, we'll just display a part of the +corresponding \fIwordmatch\-line.html\fR file, sufficient to give an idea of the +sort of output you can expect from this cool tool. 
+.PP +.Vb 1 +\& $> html2text nytprof/wordmatch\-line.html +\& +\& Performance Profile \-\- \-block view\-.\-line view\-.\-sub view\- +\& For wordmatch +\& Run on Fri Sep 26 13:46:39 2008 +\& Reported on Fri Sep 26 13:47:22 2008 +\& +\& File wordmatch +\& +\& Subroutines \-\- ordered by exclusive time +\& |Calls |P|F|Inclusive|Exclusive|Subroutine | +\& | | | |Time |Time | | +\& |251215|5|1|13.09263 |10.47692 |main::|matches| +\& |260642|2|1|2.71199 |2.71199 |main::|debug | +\& |1 |1|1|0.21404 |0.21404 |main::|report | +\& |0 |0|0|0 |0 |main::|BEGIN | +\& +\& +\& |Line|Stmts.|Exclusive|Avg. |Code | +\& | | |Time | | | +\& |1 | | | |#!/usr/bin/perl | +\& |2 | | | | | +\& | | | | |use strict; | +\& |3 |3 |0.00086 |0.00029|# spent 0.00003s making 1 calls to strict:: | +\& | | | | |import | +\& | | | | |use warnings; | +\& |4 |3 |0.01563 |0.00521|# spent 0.00012s making 1 calls to warnings:: | +\& | | | | |import | +\& |5 | | | | | +\& |6 | | | |=head1 NAME | +\& |7 | | | | | +\& |8 | | | |filewords \- word analysis of input file | +\& <...snip...> +\& |62 |1 |0.00445 |0.00445|print report( %count ); | +\& | | | | |# spent 0.21404s making 1 calls to main::report| +\& |63 | | | | | +\& | | | | |# spent 23.56955s (10.47692+2.61571) within | +\& | | | | |main::matches which was called 251215 times, | +\& | | | | |avg 0.00005s/call: # 50243 times | +\& | | | | |(2.12134+0.51939s) at line 57 of wordmatch, avg| +\& | | | | |0.00005s/call # 50243 times (2.17735+0.54550s) | +\& |64 | | | |at line 56 of wordmatch, avg 0.00005s/call # | +\& | | | | |50243 times (2.10992+0.51797s) at line 58 of | +\& | | | | |wordmatch, avg 0.00005s/call # 50243 times | +\& | | | | |(2.12696+0.51598s) at line 55 of wordmatch, avg| +\& | | | | |0.00005s/call # 50243 times (1.94134+0.51687s) | +\& | | | | |at line 54 of wordmatch, avg 0.00005s/call | +\& | | | | |sub matches { | +\& <...snip...> +\& |102 | | | | | +\& | | | | |# spent 2.71199s within main::debug which was | +\& | | | | |called 260642 times, avg 0.00001s/call: # | +\& | | | | |251215 times (2.61571+0s) by main::matches at | +\& |103 | | | |line 74 of wordmatch, avg 0.00001s/call # 9427 | +\& | | | | |times (0.09628+0s) at line 50 of wordmatch, avg| +\& | | | | |0.00001s/call | +\& | | | | |sub debug { | +\& |104 |260642|0.58496 |2e\-06 |my $message = shift; | +\& |105 | | | | | +\& |106 |260642|1.09917 |4e\-06 |if ( $debug ) { | +\& |107 | | | |print STDERR "DBG: $message\en"; | +\& |108 | | | |} | +\& |109 | | | |} | +\& |110 | | | | | +\& |111 |1 |0.01501 |0.01501|exit 0; | +\& |112 | | | | | +.Ve +.PP +Oodles of very useful information in there \- this seems to be the way forward. +.PP +See also \f(CW\*(C`Devel::NYTProf::Apache\*(C'\fR which hooks \f(CW\*(C`Devel::NYTProf\*(C'\fR into +\&\f(CW\*(C`mod_perl\*(C'\fR. +.SH SORTING +.IX Header "SORTING" +Perl modules are not the only tools a performance analyst has at their +disposal, system tools like \f(CW\*(C`time\*(C'\fR should not be overlooked as the next +example shows, where we take a quick look at sorting. Many books, theses and +articles, have been written about efficient sorting algorithms, and this is not +the place to repeat such work, there's several good sorting modules which +deserve taking a look at too: \f(CW\*(C`Sort::Maker\*(C'\fR, \f(CW\*(C`Sort::Key\*(C'\fR spring to mind. 
+However, it's still possible to make some observations on certain Perl specific +interpretations on issues relating to sorting data sets and give an example or +two with regard to how sorting large data volumes can effect performance. +Firstly, an often overlooked point when sorting large amounts of data, one can +attempt to reduce the data set to be dealt with and in many cases \f(CWgrep()\fR can +be quite useful as a simple filter: +.PP +.Vb 1 +\& @data = sort grep { /$filter/ } @incoming +.Ve +.PP +A command such as this can vastly reduce the volume of material to actually +sort through in the first place, and should not be too lightly disregarded +purely on the basis of its simplicity. The \f(CW\*(C`KISS\*(C'\fR principle is too often +overlooked \- the next example uses the simple system \f(CW\*(C`time\*(C'\fR utility to +demonstrate. Let's take a look at an actual example of sorting the contents of +a large file, an apache logfile would do. This one has over a quarter of a +million lines, is 50M in size, and a snippet of it looks like this: +.PP +# logfile +.PP +.Vb 10 +\& 188.209\-65\-87.adsl\-dyn.isp.belgacom.be \- \- [08/Feb/2007:12:57:16 +0000] "GET /favicon.ico HTTP/1.1" 404 209 "\-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)" +\& 188.209\-65\-87.adsl\-dyn.isp.belgacom.be \- \- [08/Feb/2007:12:57:16 +0000] "GET /favicon.ico HTTP/1.1" 404 209 "\-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)" +\& 151.56.71.198 \- \- [08/Feb/2007:12:57:41 +0000] "GET /suse\-on\-vaio.html HTTP/1.1" 200 2858 "http://www.linux\-on\-laptops.com/sony.html" "Mozilla/5.0 (Windows; U; Windows NT 5.2; en\-US; rv:1.8.1.1) Gecko/20061204 Firefox/2.0.0.1" +\& 151.56.71.198 \- \- [08/Feb/2007:12:57:42 +0000] "GET /data/css HTTP/1.1" 404 206 "http://www.rfi.net/suse\-on\-vaio.html" "Mozilla/5.0 (Windows; U; Windows NT 5.2; en\-US; rv:1.8.1.1) Gecko/20061204 Firefox/2.0.0.1" +\& 151.56.71.198 \- \- [08/Feb/2007:12:57:43 +0000] "GET /favicon.ico HTTP/1.1" 404 209 "\-" "Mozilla/5.0 (Windows; U; Windows NT 5.2; en\-US; rv:1.8.1.1) Gecko/20061204 Firefox/2.0.0.1" +\& 217.113.68.60 \- \- [08/Feb/2007:13:02:15 +0000] "GET / HTTP/1.1" 304 \- "\-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)" +\& 217.113.68.60 \- \- [08/Feb/2007:13:02:16 +0000] "GET /data/css HTTP/1.1" 404 206 "http://www.rfi.net/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)" +\& debora.to.isac.cnr.it \- \- [08/Feb/2007:13:03:58 +0000] "GET /suse\-on\-vaio.html HTTP/1.1" 200 2858 "http://www.linux\-on\-laptops.com/sony.html" "Mozilla/5.0 (compatible; Konqueror/3.4; Linux) KHTML/3.4.0 (like Gecko)" +\& debora.to.isac.cnr.it \- \- [08/Feb/2007:13:03:58 +0000] "GET /data/css HTTP/1.1" 404 206 "http://www.rfi.net/suse\-on\-vaio.html" "Mozilla/5.0 (compatible; Konqueror/3.4; Linux) KHTML/3.4.0 (like Gecko)" +\& debora.to.isac.cnr.it \- \- [08/Feb/2007:13:03:58 +0000] "GET /favicon.ico HTTP/1.1" 404 209 "\-" "Mozilla/5.0 (compatible; Konqueror/3.4; Linux) KHTML/3.4.0 (like Gecko)" +\& 195.24.196.99 \- \- [08/Feb/2007:13:26:48 +0000] "GET / HTTP/1.0" 200 3309 "\-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; fr; rv:1.8.0.9) Gecko/20061206 Firefox/1.5.0.9" +\& 195.24.196.99 \- \- [08/Feb/2007:13:26:58 +0000] "GET /data/css HTTP/1.0" 404 206 "http://www.rfi.net/" "Mozilla/5.0 (Windows; U; Windows NT 5.1; fr; rv:1.8.0.9) Gecko/20061206 Firefox/1.5.0.9" +\& 195.24.196.99 \- \- [08/Feb/2007:13:26:59 +0000] "GET /favicon.ico HTTP/1.0" 404 209 "\-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; fr; rv:1.8.0.9) 
Gecko/20061206 Firefox/1.5.0.9" +\& crawl1.cosmixcorp.com \- \- [08/Feb/2007:13:27:57 +0000] "GET /robots.txt HTTP/1.0" 200 179 "\-" "voyager/1.0" +\& crawl1.cosmixcorp.com \- \- [08/Feb/2007:13:28:25 +0000] "GET /links.html HTTP/1.0" 200 3413 "\-" "voyager/1.0" +\& fhm226.internetdsl.tpnet.pl \- \- [08/Feb/2007:13:37:32 +0000] "GET /suse\-on\-vaio.html HTTP/1.1" 200 2858 "http://www.linux\-on\-laptops.com/sony.html" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)" +\& fhm226.internetdsl.tpnet.pl \- \- [08/Feb/2007:13:37:34 +0000] "GET /data/css HTTP/1.1" 404 206 "http://www.rfi.net/suse\-on\-vaio.html" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)" +\& 80.247.140.134 \- \- [08/Feb/2007:13:57:35 +0000] "GET / HTTP/1.1" 200 3309 "\-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)" +\& 80.247.140.134 \- \- [08/Feb/2007:13:57:37 +0000] "GET /data/css HTTP/1.1" 404 206 "http://www.rfi.net" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)" +\& pop.compuscan.co.za \- \- [08/Feb/2007:14:10:43 +0000] "GET / HTTP/1.1" 200 3309 "\-" "www.clamav.net" +\& livebot\-207\-46\-98\-57.search.live.com \- \- [08/Feb/2007:14:12:04 +0000] "GET /robots.txt HTTP/1.0" 200 179 "\-" "msnbot/1.0 (+http://search.msn.com/msnbot.htm)" +\& livebot\-207\-46\-98\-57.search.live.com \- \- [08/Feb/2007:14:12:04 +0000] "GET /html/oracle.html HTTP/1.0" 404 214 "\-" "msnbot/1.0 (+http://search.msn.com/msnbot.htm)" +\& dslb\-088\-064\-005\-154.pools.arcor\-ip.net \- \- [08/Feb/2007:14:12:15 +0000] "GET / HTTP/1.1" 200 3309 "\-" "www.clamav.net" +\& 196.201.92.41 \- \- [08/Feb/2007:14:15:01 +0000] "GET / HTTP/1.1" 200 3309 "\-" "MOT\-L7/08.B7.DCR MIB/2.2.1 Profile/MIDP\-2.0 Configuration/CLDC\-1.1" +.Ve +.PP +The specific task here is to sort the 286,525 lines of this file by Response +Code, Query, Browser, Referring Url, and lastly Date. One solution might be to +use the following code, which iterates over the files given on the +command-line. +.PP +# sort-apache-log +.PP +.Vb 1 +\& #!/usr/bin/perl \-n +\& +\& use v5.36; +\& +\& my @data; +\& +\& LINE: +\& while ( <> ) { +\& my $line = $_; +\& if ( +\& $line =~ m/^( +\& ([\ew\e.\e\-]+) # client +\& \es*\-\es*\-\es*\e[ +\& ([^]]+) # date +\& \e]\es*"\ew+\es* +\& (\eS+) # query +\& [^"]+"\es* +\& (\ed+) # status +\& \es+\eS+\es+"[^"]*"\es+" +\& ([^"]*) # browser +\& " +\& .* +\& )$/x +\& ) { +\& my @chunks = split(/ +/, $line); +\& my $ip = $1; +\& my $date = $2; +\& my $query = $3; +\& my $status = $4; +\& my $browser = $5; +\& +\& push(@data, [$ip, $date, $query, $status, $browser, $line]); +\& } +\& } +\& +\& my @sorted = sort { +\& $a\->[3] cmp $b\->[3] +\& || +\& $a\->[2] cmp $b\->[2] +\& || +\& $a\->[0] cmp $b\->[0] +\& || +\& $a\->[1] cmp $b\->[1] +\& || +\& $a\->[4] cmp $b\->[4] +\& } @data; +\& +\& foreach my $data ( @sorted ) { +\& print $data\->[5]; +\& } +\& +\& exit 0; +.Ve +.PP +When running this program, redirect \f(CW\*(C`STDOUT\*(C'\fR so it is possible to check the +output is correct from following test runs and use the system \f(CW\*(C`time\*(C'\fR utility +to check the overall runtime. +.PP +.Vb 1 +\& $> time ./sort\-apache\-log logfile > out\-sort +\& +\& real 0m17.371s +\& user 0m15.757s +\& sys 0m0.592s +.Ve +.PP +The program took just over 17 wallclock seconds to run. Note the different +values \f(CW\*(C`time\*(C'\fR outputs, it's important to always use the same one, and to not +confuse what each one means. 
+.IP "Elapsed Real Time" 4 +.IX Item "Elapsed Real Time" +The overall, or wallclock, time between when \f(CW\*(C`time\*(C'\fR was called, and when it +terminates. The elapsed time includes both user and system times, and time +spent waiting for other users and processes on the system. Inevitably, this is +the most approximate of the measurements given. +.IP "User CPU Time" 4 +.IX Item "User CPU Time" +The user time is the amount of time the entire process spent on behalf of the +user on this system executing this program. +.IP "System CPU Time" 4 +.IX Item "System CPU Time" +The system time is the amount of time the kernel itself spent executing +routines, or system calls, on behalf of this process user. +.PP +Running this same process as a \f(CW\*(C`Schwarzian Transform\*(C'\fR it is possible to +eliminate the input and output arrays for storing all the data, and work on the +input directly as it arrives too. Otherwise, the code looks fairly similar: +.PP +# sort-apache-log-schwarzian +.PP +.Vb 1 +\& #!/usr/bin/perl \-n +\& +\& use v5.36; +\& +\& print +\& +\& map $_\->[0] => +\& +\& sort { +\& $a\->[4] cmp $b\->[4] +\& || +\& $a\->[3] cmp $b\->[3] +\& || +\& $a\->[1] cmp $b\->[1] +\& || +\& $a\->[2] cmp $b\->[2] +\& || +\& $a\->[5] cmp $b\->[5] +\& } +\& map [ $_, m/^( +\& ([\ew\e.\e\-]+) # client +\& \es*\-\es*\-\es*\e[ +\& ([^]]+) # date +\& \e]\es*"\ew+\es* +\& (\eS+) # query +\& [^"]+"\es* +\& (\ed+) # status +\& \es+\eS+\es+"[^"]*"\es+" +\& ([^"]*) # browser +\& " +\& .* +\& )$/xo ] +\& +\& => <>; +\& +\& exit 0; +.Ve +.PP +Run the new code against the same logfile, as above, to check the new time. +.PP +.Vb 1 +\& $> time ./sort\-apache\-log\-schwarzian logfile > out\-schwarz +\& +\& real 0m9.664s +\& user 0m8.873s +\& sys 0m0.704s +.Ve +.PP +The time has been cut in half, which is a respectable speed improvement by any +standard. Naturally, it is important to check the output is consistent with +the first program run, this is where the Unix system \f(CW\*(C`cksum\*(C'\fR utility comes in. +.PP +.Vb 3 +\& $> cksum out\-sort out\-schwarz +\& 3044173777 52029194 out\-sort +\& 3044173777 52029194 out\-schwarz +.Ve +.PP +BTW. Beware too of pressure from managers who see you speed a program up by 50% +of the runtime once, only to get a request one month later to do the same again +(true story) \- you'll just have to point out you're only human, even if you are a +Perl programmer, and you'll see what you can do... +.SH LOGGING +.IX Header "LOGGING" +An essential part of any good development process is appropriate error handling +with appropriately informative messages, however there exists a school of +thought which suggests that log files should be \fIchatty\fR, as if the chain of +unbroken output somehow ensures the survival of the program. If speed is in +any way an issue, this approach is wrong. +.PP +A common sight is code which looks something like this: +.PP +.Vb 2 +\& logger\->debug( "A logging message via process\-id: $$ INC: " +\& . Dumper(\e%INC) ) +.Ve +.PP +The problem is that this code will always be parsed and executed, even when the +debug level set in the logging configuration file is zero. Once the \fBdebug()\fR +subroutine has been entered, and the internal \f(CW$debug\fR variable confirmed to +be zero, for example, the message which has been sent in will be discarded and +the program will continue. 
In the example given though, the \f(CW\*(C`\e%INC\*(C'\fR hash will +already have been dumped, and the message string constructed, all of which work +could be bypassed by a debug variable at the statement level, like this: +.PP +.Vb 2 +\& logger\->debug( "A logging message via process\-id: $$ INC: " +\& . Dumper(\e%INC) ) if $DEBUG; +.Ve +.PP +This effect can be demonstrated by setting up a test script with both forms, +including a \f(CWdebug()\fR subroutine to emulate typical \f(CWlogger()\fR functionality. +.PP +# ifdebug +.PP +.Vb 1 +\& #!/usr/bin/perl +\& +\& use v5.36; +\& +\& use Benchmark; +\& use Data::Dumper; +\& my $DEBUG = 0; +\& +\& sub debug { +\& my $msg = shift; +\& +\& if ( $DEBUG ) { +\& print "DEBUG: $msg\en"; +\& } +\& }; +\& +\& timethese(100000, { +\& \*(Aqdebug\*(Aq => sub { +\& debug( "A $0 logging message via process\-id: $$" . Dumper(\e%INC) ) +\& }, +\& \*(Aqifdebug\*(Aq => sub { +\& debug( "A $0 logging message via process\-id: $$" . Dumper(\e%INC) ) if $DEBUG +\& }, +\& }); +.Ve +.PP +Let's see what \f(CW\*(C`Benchmark\*(C'\fR makes of this: +.PP +.Vb 5 +\& $> perl ifdebug +\& Benchmark: timing 100000 iterations of constant, sub... +\& ifdebug: 0 wallclock secs ( 0.01 usr + 0.00 sys = 0.01 CPU) @ 10000000.00/s (n=100000) +\& (warning: too few iterations for a reliable count) +\& debug: 14 wallclock secs (13.18 usr + 0.04 sys = 13.22 CPU) @ 7564.30/s (n=100000) +.Ve +.PP +In the one case the code, which does exactly the same thing as far as +outputting any debugging information is concerned, in other words nothing, +takes 14 seconds, and in the other case the code takes one hundredth of a +second. Looks fairly definitive. Use a \f(CW$DEBUG\fR variable BEFORE you call the +subroutine, rather than relying on the smart functionality inside it. +.SS "Logging if DEBUG (constant)" +.IX Subsection "Logging if DEBUG (constant)" +It's possible to take the previous idea a little further, by using a compile +time \f(CW\*(C`DEBUG\*(C'\fR constant. +.PP +# ifdebug-constant +.PP +.Vb 1 +\& #!/usr/bin/perl +\& +\& use v5.36; +\& +\& use Benchmark; +\& use Data::Dumper; +\& use constant +\& DEBUG => 0 +\& ; +\& +\& sub debug { +\& if ( DEBUG ) { +\& my $msg = shift; +\& print "DEBUG: $msg\en"; +\& } +\& }; +\& +\& timethese(100000, { +\& \*(Aqdebug\*(Aq => sub { +\& debug( "A $0 logging message via process\-id: $$" . Dumper(\e%INC) ) +\& }, +\& \*(Aqconstant\*(Aq => sub { +\& debug( "A $0 logging message via process\-id: $$" . Dumper(\e%INC) ) if DEBUG +\& }, +\& }); +.Ve +.PP +Running this program produces the following output: +.PP +.Vb 5 +\& $> perl ifdebug\-constant +\& Benchmark: timing 100000 iterations of constant, sub... +\& constant: 0 wallclock secs (\-0.00 usr + 0.00 sys = \-0.00 CPU) @ \-7205759403792793600000.00/s (n=100000) +\& (warning: too few iterations for a reliable count) +\& sub: 14 wallclock secs (13.09 usr + 0.00 sys = 13.09 CPU) @ 7639.42/s (n=100000) +.Ve +.PP +The \f(CW\*(C`DEBUG\*(C'\fR constant wipes the floor with even the \f(CW$debug\fR variable, +clocking in at minus zero seconds, and generates a "warning: too few iterations +for a reliable count" message into the bargain. 
To see what is really going
+on, and why we had too few iterations when we thought we asked for 100000, we
+can use the very useful \f(CW\*(C`B::Deparse\*(C'\fR to inspect the new code:
+.PP
+.Vb 1
+\& $> perl \-MO=Deparse ifdebug\-constant
+\&
+\& use Benchmark;
+\& use Data::Dumper;
+\& use constant (\*(AqDEBUG\*(Aq, 0);
+\& sub debug {
+\&     use warnings;
+\&     use strict \*(Aqrefs\*(Aq;
+\&     0;
+\& }
+\& use warnings;
+\& use strict \*(Aqrefs\*(Aq;
+\& timethese(100000, {\*(Aqsub\*(Aq, sub {
+\&     debug "A $0 logging message via process\-id: $$" . Dumper(\e%INC);
+\& }
+\& , \*(Aqconstant\*(Aq, sub {
+\&     0;
+\& }
+\& });
+\& ifdebug\-constant syntax OK
+.Ve
+.PP
+The output shows the \fBconstant()\fR subroutine we're testing being replaced with
+the value of the \f(CW\*(C`DEBUG\*(C'\fR constant: zero. The line to be tested has been
+completely optimized away, and you can't get much more efficient than that.
+.SH POSTSCRIPT
+.IX Header "POSTSCRIPT"
+This document has provided several ways to go about identifying hot-spots, and
+checking whether any modifications have improved the runtime of the code.
+.PP
+As a final thought, remember that it's not (at the time of writing) possible to
+produce a useful program which will run in zero or negative time, and this basic
+principle can be written as: \fIuseful programs are slow\fR by their very
+definition. It is of course possible to write a nearly instantaneous program,
+but it's not going to do very much; here's a very efficient one:
+.PP
+.Vb 1
+\& $> perl \-e 0
+.Ve
+.PP
+Optimizing that any further is a job for \f(CW\*(C`p5p\*(C'\fR.
+.SH "SEE ALSO"
+.IX Header "SEE ALSO"
+Further reading can be found using the modules and links below.
+.SS PERLDOCS
+.IX Subsection "PERLDOCS"
+For example: \f(CW\*(C`perldoc \-f sort\*(C'\fR.
+.PP
+perlfaq4.
+.PP
+perlfork, perlfunc, perlretut, perlthrtut.
+.PP
+threads.
+.SS "MAN PAGES"
+.IX Subsection "MAN PAGES"
+\&\f(CW\*(C`time\*(C'\fR.
+.SS MODULES
+.IX Subsection "MODULES"
+It's not possible to individually showcase all the performance-related code for
+Perl here, naturally, but here's a short list of modules from the CPAN which
+deserve further attention.
+.PP
+.Vb 10
+\& Apache::DProf
+\& Apache::SmallProf
+\& Benchmark
+\& DBIx::Profile
+\& Devel::AutoProfiler
+\& Devel::DProf
+\& Devel::DProfLB
+\& Devel::FastProf
+\& Devel::GraphVizProf
+\& Devel::NYTProf
+\& Devel::NYTProf::Apache
+\& Devel::Profiler
+\& Devel::Profile
+\& Devel::Profit
+\& Devel::SmallProf
+\& Devel::WxProf
+\& POE::Devel::Profiler
+\& Sort::Key
+\& Sort::Maker
+.Ve
+.SS URLS
+.IX Subsection "URLS"
+Very useful online reference material:
+.PP
+.Vb 1
+\& https://web.archive.org/web/20120515021937/http://www.ccl4.org/~nick/P/Fast_Enough/
+\&
+\& https://web.archive.org/web/20050706081718/http://www\-106.ibm.com/developerworks/library/l\-optperl.html
+\&
+\& https://perlbuzz.com/2007/11/14/bind_output_variables_in_dbi_for_speed_and_safety/
+\&
+\& http://en.wikipedia.org/wiki/Performance_analysis
+\&
+\& http://apache.perl.org/docs/1.0/guide/performance.html
+\&
+\& http://perlgolf.sourceforge.net/
+\&
+\& http://www.sysarch.com/Perl/sort_paper.html
+.Ve
+.SH AUTHOR
+.IX Header "AUTHOR"
+Richard Foley <richard.foley@rfi.net> Copyright (c) 2008