diff options
Diffstat (limited to 'upstream/mageia-cauldron/man3pm/Memoize.3pm')
-rw-r--r-- | upstream/mageia-cauldron/man3pm/Memoize.3pm | 823 |
1 files changed, 823 insertions, 0 deletions
diff --git a/upstream/mageia-cauldron/man3pm/Memoize.3pm b/upstream/mageia-cauldron/man3pm/Memoize.3pm new file mode 100644 index 00000000..138f2407 --- /dev/null +++ b/upstream/mageia-cauldron/man3pm/Memoize.3pm @@ -0,0 +1,823 @@ +.\" -*- mode: troff; coding: utf-8 -*- +.\" Automatically generated by Pod::Man 5.01 (Pod::Simple 3.43) +.\" +.\" Standard preamble: +.\" ======================================================================== +.de Sp \" Vertical space (when we can't use .PP) +.if t .sp .5v +.if n .sp +.. +.de Vb \" Begin verbatim text +.ft CW +.nf +.ne \\$1 +.. +.de Ve \" End verbatim text +.ft R +.fi +.. +.\" \*(C` and \*(C' are quotes in nroff, nothing in troff, for use with C<>. +.ie n \{\ +. ds C` "" +. ds C' "" +'br\} +.el\{\ +. ds C` +. ds C' +'br\} +.\" +.\" Escape single quotes in literal strings from groff's Unicode transform. +.ie \n(.g .ds Aq \(aq +.el .ds Aq ' +.\" +.\" If the F register is >0, we'll generate index entries on stderr for +.\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index +.\" entries marked with X<> in POD. Of course, you'll have to process the +.\" output yourself in some meaningful fashion. +.\" +.\" Avoid warning from groff about undefined register 'F'. +.de IX +.. +.nr rF 0 +.if \n(.g .if rF .nr rF 1 +.if (\n(rF:(\n(.g==0)) \{\ +. if \nF \{\ +. de IX +. tm Index:\\$1\t\\n%\t"\\$2" +.. +. if !\nF==2 \{\ +. nr % 0 +. nr F 2 +. \} +. \} +.\} +.rr rF +.\" ======================================================================== +.\" +.IX Title "Memoize 3pm" +.TH Memoize 3pm 2023-11-28 "perl v5.38.2" "Perl Programmers Reference Guide" +.\" For nroff, turn off justification. Always turn off hyphenation; it makes +.\" way too many mistakes in technical documents. +.if n .ad l +.nh +.SH NAME +Memoize \- Make functions faster by trading space for time +.SH SYNOPSIS +.IX Header "SYNOPSIS" +.Vb 3 +\& use Memoize; +\& memoize(\*(Aqslow_function\*(Aq); +\& slow_function(arguments); # Is faster than it was before +.Ve +.PP +This is normally all you need to know. However, many options are available: +.PP +.Vb 1 +\& memoize(function, options...); +.Ve +.PP +Options include: +.PP +.Vb 2 +\& NORMALIZER => function +\& INSTALL => new_name +\& +\& SCALAR_CACHE => \*(AqMEMORY\*(Aq +\& SCALAR_CACHE => [\*(AqHASH\*(Aq, \e%cache_hash ] +\& SCALAR_CACHE => \*(AqFAULT\*(Aq +\& SCALAR_CACHE => \*(AqMERGE\*(Aq +\& +\& LIST_CACHE => \*(AqMEMORY\*(Aq +\& LIST_CACHE => [\*(AqHASH\*(Aq, \e%cache_hash ] +\& LIST_CACHE => \*(AqFAULT\*(Aq +\& LIST_CACHE => \*(AqMERGE\*(Aq +.Ve +.SH DESCRIPTION +.IX Header "DESCRIPTION" +\&\fIMemoizing\fR a function makes it faster by trading space for time. It +does this by caching the return values of the function in a table. +If you call the function again with the same arguments, \f(CW\*(C`memoize\*(C'\fR +jumps in and gives you the value out of the table, instead of letting +the function compute the value all over again. +.SH EXAMPLE +.IX Header "EXAMPLE" +Here is an extreme example. Consider the Fibonacci sequence, defined +by the following function: +.PP +.Vb 6 +\& # Compute Fibonacci numbers +\& sub fib { +\& my $n = shift; +\& return $n if $n < 2; +\& fib($n\-1) + fib($n\-2); +\& } +.Ve +.PP +This function is very slow. Why? To compute fib(14), it first wants +to compute fib(13) and fib(12), and add the results. But to compute +fib(13), it first has to compute fib(12) and fib(11), and then it +comes back and computes fib(12) all over again even though the answer +is the same. And both of the times that it wants to compute fib(12), +it has to compute fib(11) from scratch, and then it has to do it +again each time it wants to compute fib(13). This function does so +much recomputing of old results that it takes a really long time to +run\-\-\-fib(14) makes 1,200 extra recursive calls to itself, to compute +and recompute things that it already computed. +.PP +This function is a good candidate for memoization. If you memoize the +\&\f(CW\*(C`fib\*(C'\fR function above, it will compute fib(14) exactly once, the first +time it needs to, and then save the result in a table. Then if you +ask for fib(14) again, it gives you the result out of the table. +While computing fib(14), instead of computing fib(12) twice, it does +it once; the second time it needs the value it gets it from the table. +It doesn't compute fib(11) four times; it computes it once, getting it +from the table the next three times. Instead of making 1,200 +recursive calls to \f(CW\*(C`fib\*(C'\fR, it makes 15. This makes the function about +150 times faster. +.PP +You could do the memoization yourself, by rewriting the function, like +this: +.PP +.Vb 9 +\& # Compute Fibonacci numbers, memoized version +\& { my @fib; +\& sub fib { +\& my $n = shift; +\& return $fib[$n] if defined $fib[$n]; +\& return $fib[$n] = $n if $n < 2; +\& $fib[$n] = fib($n\-1) + fib($n\-2); +\& } +\& } +.Ve +.PP +Or you could use this module, like this: +.PP +.Vb 2 +\& use Memoize; +\& memoize(\*(Aqfib\*(Aq); +\& +\& # Rest of the fib function just like the original version. +.Ve +.PP +This makes it easy to turn memoizing on and off. +.PP +Here's an even simpler example: I wrote a simple ray tracer; the +program would look in a certain direction, figure out what it was +looking at, and then convert the \f(CW\*(C`color\*(C'\fR value (typically a string +like \f(CW\*(C`red\*(C'\fR) of that object to a red, green, and blue pixel value, like +this: +.PP +.Vb 6 +\& for ($direction = 0; $direction < 300; $direction++) { +\& # Figure out which object is in direction $direction +\& $color = $object\->{color}; +\& ($r, $g, $b) = @{&ColorToRGB($color)}; +\& ... +\& } +.Ve +.PP +Since there are relatively few objects in a picture, there are only a +few colors, which get looked up over and over again. Memoizing +\&\f(CW\*(C`ColorToRGB\*(C'\fR sped up the program by several percent. +.SH DETAILS +.IX Header "DETAILS" +This module exports exactly one function, \f(CW\*(C`memoize\*(C'\fR. The rest of the +functions in this package are None of Your Business. +.PP +You should say +.PP +.Vb 1 +\& memoize(function) +.Ve +.PP +where \f(CW\*(C`function\*(C'\fR is the name of the function you want to memoize, or +a reference to it. \f(CW\*(C`memoize\*(C'\fR returns a reference to the new, +memoized version of the function, or \f(CW\*(C`undef\*(C'\fR on a non-fatal error. +At present, there are no non-fatal errors, but there might be some in +the future. +.PP +If \f(CW\*(C`function\*(C'\fR was the name of a function, then \f(CW\*(C`memoize\*(C'\fR hides the +old version and installs the new memoized version under the old name, +so that \f(CW&function(...)\fR actually invokes the memoized version. +.SH OPTIONS +.IX Header "OPTIONS" +There are some optional options you can pass to \f(CW\*(C`memoize\*(C'\fR to change +the way it behaves a little. To supply options, invoke \f(CW\*(C`memoize\*(C'\fR +like this: +.PP +.Vb 5 +\& memoize(function, NORMALIZER => function, +\& INSTALL => newname, +\& SCALAR_CACHE => option, +\& LIST_CACHE => option +\& ); +.Ve +.PP +Each of these options is optional; you can include some, all, or none +of them. +.SS INSTALL +.IX Subsection "INSTALL" +If you supply a function name with \f(CW\*(C`INSTALL\*(C'\fR, memoize will install +the new, memoized version of the function under the name you give. +For example, +.PP +.Vb 1 +\& memoize(\*(Aqfib\*(Aq, INSTALL => \*(Aqfastfib\*(Aq) +.Ve +.PP +installs the memoized version of \f(CW\*(C`fib\*(C'\fR as \f(CW\*(C`fastfib\*(C'\fR; without the +\&\f(CW\*(C`INSTALL\*(C'\fR option it would have replaced the old \f(CW\*(C`fib\*(C'\fR with the +memoized version. +.PP +To prevent \f(CW\*(C`memoize\*(C'\fR from installing the memoized version anywhere, use +\&\f(CW\*(C`INSTALL => undef\*(C'\fR. +.SS NORMALIZER +.IX Subsection "NORMALIZER" +Suppose your function looks like this: +.PP +.Vb 6 +\& # Typical call: f(\*(Aqaha!\*(Aq, A => 11, B => 12); +\& sub f { +\& my $a = shift; +\& my %hash = @_; +\& $hash{B} ||= 2; # B defaults to 2 +\& $hash{C} ||= 7; # C defaults to 7 +\& +\& # Do something with $a, %hash +\& } +.Ve +.PP +Now, the following calls to your function are all completely equivalent: +.PP +.Vb 6 +\& f(OUCH); +\& f(OUCH, B => 2); +\& f(OUCH, C => 7); +\& f(OUCH, B => 2, C => 7); +\& f(OUCH, C => 7, B => 2); +\& (etc.) +.Ve +.PP +However, unless you tell \f(CW\*(C`Memoize\*(C'\fR that these calls are equivalent, +it will not know that, and it will compute the values for these +invocations of your function separately, and store them separately. +.PP +To prevent this, supply a \f(CW\*(C`NORMALIZER\*(C'\fR function that turns the +program arguments into a string in a way that equivalent arguments +turn into the same string. A \f(CW\*(C`NORMALIZER\*(C'\fR function for \f(CW\*(C`f\*(C'\fR above +might look like this: +.PP +.Vb 5 +\& sub normalize_f { +\& my $a = shift; +\& my %hash = @_; +\& $hash{B} ||= 2; +\& $hash{C} ||= 7; +\& +\& join(\*(Aq,\*(Aq, $a, map ($_ => $hash{$_}) sort keys %hash); +\& } +.Ve +.PP +Each of the argument lists above comes out of the \f(CW\*(C`normalize_f\*(C'\fR +function looking exactly the same, like this: +.PP +.Vb 1 +\& OUCH,B,2,C,7 +.Ve +.PP +You would tell \f(CW\*(C`Memoize\*(C'\fR to use this normalizer this way: +.PP +.Vb 1 +\& memoize(\*(Aqf\*(Aq, NORMALIZER => \*(Aqnormalize_f\*(Aq); +.Ve +.PP +\&\f(CW\*(C`memoize\*(C'\fR knows that if the normalized version of the arguments is +the same for two argument lists, then it can safely look up the value +that it computed for one argument list and return it as the result of +calling the function with the other argument list, even if the +argument lists look different. +.PP +The default normalizer just concatenates the arguments with character +28 in between. (In ASCII, this is called FS or control\-\e.) This +always works correctly for functions with only one string argument, +and also when the arguments never contain character 28. However, it +can confuse certain argument lists: +.PP +.Vb 3 +\& normalizer("a\e034", "b") +\& normalizer("a", "\e034b") +\& normalizer("a\e034\e034b") +.Ve +.PP +for example. +.PP +Since hash keys are strings, the default normalizer will not +distinguish between \f(CW\*(C`undef\*(C'\fR and the empty string. It also won't work +when the function's arguments are references. For example, consider a +function \f(CW\*(C`g\*(C'\fR which gets two arguments: A number, and a reference to +an array of numbers: +.PP +.Vb 1 +\& g(13, [1,2,3,4,5,6,7]); +.Ve +.PP +The default normalizer will turn this into something like +\&\f(CW"13\e034ARRAY(0x436c1f)"\fR. That would be all right, except that a +subsequent array of numbers might be stored at a different location +even though it contains the same data. If this happens, \f(CW\*(C`Memoize\*(C'\fR +will think that the arguments are different, even though they are +equivalent. In this case, a normalizer like this is appropriate: +.PP +.Vb 1 +\& sub normalize { join \*(Aq \*(Aq, $_[0], @{$_[1]} } +.Ve +.PP +For the example above, this produces the key "13 1 2 3 4 5 6 7". +.PP +Another use for normalizers is when the function depends on data other +than those in its arguments. Suppose you have a function which +returns a value which depends on the current hour of the day: +.PP +.Vb 10 +\& sub on_duty { +\& my ($problem_type) = @_; +\& my $hour = (localtime)[2]; +\& open my $fh, "$DIR/$problem_type" or die...; +\& my $line; +\& while ($hour\-\- > 0) +\& $line = <$fh>; +\& } +\& return $line; +\& } +.Ve +.PP +At 10:23, this function generates the 10th line of a data file; at +3:45 PM it generates the 15th line instead. By default, \f(CW\*(C`Memoize\*(C'\fR +will only see the \f(CW$problem_type\fR argument. To fix this, include the +current hour in the normalizer: +.PP +.Vb 1 +\& sub normalize { join \*(Aq \*(Aq, (localtime)[2], @_ } +.Ve +.PP +The calling context of the function (scalar or list context) is +propagated to the normalizer. This means that if the memoized +function will treat its arguments differently in list context than it +would in scalar context, you can have the normalizer function select +its behavior based on the results of \f(CW\*(C`wantarray\*(C'\fR. Even if called in +a list context, a normalizer should still return a single string. +.ie n .SS """SCALAR_CACHE"", ""LIST_CACHE""" +.el .SS "\f(CWSCALAR_CACHE\fP, \f(CWLIST_CACHE\fP" +.IX Subsection "SCALAR_CACHE, LIST_CACHE" +Normally, \f(CW\*(C`Memoize\*(C'\fR caches your function's return values into an +ordinary Perl hash variable. However, you might like to have the +values cached on the disk, so that they persist from one run of your +program to the next, or you might like to associate some other +interesting semantics with the cached values. +.PP +There's a slight complication under the hood of \f(CW\*(C`Memoize\*(C'\fR: There are +actually \fItwo\fR caches, one for scalar values and one for list values. +When your function is called in scalar context, its return value is +cached in one hash, and when your function is called in list context, +its value is cached in the other hash. You can control the caching +behavior of both contexts independently with these options. +.PP +The argument to \f(CW\*(C`LIST_CACHE\*(C'\fR or \f(CW\*(C`SCALAR_CACHE\*(C'\fR must either be one of +the following four strings: +.PP +.Vb 4 +\& MEMORY +\& FAULT +\& MERGE +\& HASH +.Ve +.PP +or else it must be a reference to an array whose first element is one of +these four strings, such as \f(CW\*(C`[HASH, arguments...]\*(C'\fR. +.ie n .IP """MEMORY""" 4 +.el .IP \f(CWMEMORY\fR 4 +.IX Item "MEMORY" +\&\f(CW\*(C`MEMORY\*(C'\fR means that return values from the function will be cached in +an ordinary Perl hash variable. The hash variable will not persist +after the program exits. This is the default. +.ie n .IP """HASH""" 4 +.el .IP \f(CWHASH\fR 4 +.IX Item "HASH" +\&\f(CW\*(C`HASH\*(C'\fR allows you to specify that a particular hash that you supply +will be used as the cache. You can tie this hash beforehand to give +it any behavior you want. +.Sp +A tied hash can have any semantics at all. It is typically tied to an +on-disk database, so that cached values are stored in the database and +retrieved from it again when needed, and the disk file typically +persists after your program has exited. See \f(CW\*(C`perltie\*(C'\fR for more +complete details about \f(CW\*(C`tie\*(C'\fR. +.Sp +A typical example is: +.Sp +.Vb 3 +\& use DB_File; +\& tie my %cache => \*(AqDB_File\*(Aq, $filename, O_RDWR|O_CREAT, 0666; +\& memoize \*(Aqfunction\*(Aq, SCALAR_CACHE => [HASH => \e%cache]; +.Ve +.Sp +This has the effect of storing the cache in a \f(CW\*(C`DB_File\*(C'\fR database +whose name is in \f(CW$filename\fR. The cache will persist after the +program has exited. Next time the program runs, it will find the +cache already populated from the previous run of the program. Or you +can forcibly populate the cache by constructing a batch program that +runs in the background and populates the cache file. Then when you +come to run your real program the memoized function will be fast +because all its results have been precomputed. +.Sp +Another reason to use \f(CW\*(C`HASH\*(C'\fR is to provide your own hash variable. +You can then inspect or modify the contents of the hash to gain finer +control over the cache management. +.ie n .IP """TIE""" 4 +.el .IP \f(CWTIE\fR 4 +.IX Item "TIE" +This option is no longer supported. It is still documented only to +aid in the debugging of old programs that use it. Old programs should +be converted to use the \f(CW\*(C`HASH\*(C'\fR option instead. +.Sp +.Vb 1 +\& memoize ... [\*(AqTIE\*(Aq, PACKAGE, ARGS...] +.Ve +.Sp +is merely a shortcut for +.Sp +.Vb 4 +\& require PACKAGE; +\& { tie my %cache, PACKAGE, ARGS...; +\& memoize ... [HASH => \e%cache]; +\& } +.Ve +.ie n .IP """FAULT""" 4 +.el .IP \f(CWFAULT\fR 4 +.IX Item "FAULT" +\&\f(CW\*(C`FAULT\*(C'\fR means that you never expect to call the function in scalar +(or list) context, and that if \f(CW\*(C`Memoize\*(C'\fR detects such a call, it +should abort the program. The error message is one of +.Sp +.Vb 2 +\& \`foo\*(Aq function called in forbidden list context at line ... +\& \`foo\*(Aq function called in forbidden scalar context at line ... +.Ve +.ie n .IP """MERGE""" 4 +.el .IP \f(CWMERGE\fR 4 +.IX Item "MERGE" +\&\f(CW\*(C`MERGE\*(C'\fR normally means that the memoized function does not +distinguish between list and scalar context, and that return values in +both contexts should be stored together. Both \f(CW\*(C`LIST_CACHE => +MERGE\*(C'\fR and \f(CW\*(C`SCALAR_CACHE => MERGE\*(C'\fR mean the same thing. +.Sp +Consider this function: +.Sp +.Vb 4 +\& sub complicated { +\& # ... time\-consuming calculation of $result +\& return $result; +\& } +.Ve +.Sp +The \f(CW\*(C`complicated\*(C'\fR function will return the same numeric \f(CW$result\fR +regardless of whether it is called in list or in scalar context. +.Sp +Normally, the following code will result in two calls to \f(CW\*(C`complicated\*(C'\fR, even +if \f(CW\*(C`complicated\*(C'\fR is memoized: +.Sp +.Vb 3 +\& $x = complicated(142); +\& ($y) = complicated(142); +\& $z = complicated(142); +.Ve +.Sp +The first call will cache the result, say 37, in the scalar cache; the +second will cache the list \f(CW\*(C`(37)\*(C'\fR in the list cache. The third call +doesn't call the real \f(CW\*(C`complicated\*(C'\fR function; it gets the value 37 +from the scalar cache. +.Sp +Obviously, the second call to \f(CW\*(C`complicated\*(C'\fR is a waste of time, and +storing its return value is a waste of space. Specifying \f(CW\*(C`LIST_CACHE +=> MERGE\*(C'\fR will make \f(CW\*(C`memoize\*(C'\fR use the same cache for scalar and +list context return values, so that the second call uses the scalar +cache that was populated by the first call. \f(CW\*(C`complicated\*(C'\fR ends up +being called only once, and both subsequent calls return \f(CW37\fR from the +cache, regardless of the calling context. +.PP +\fIList values in scalar context\fR +.IX Subsection "List values in scalar context" +.PP +Consider this function: +.PP +.Vb 1 +\& sub iota { return reverse (1..$_[0]) } +.Ve +.PP +This function normally returns a list. Suppose you memoize it and +merge the caches: +.PP +.Vb 1 +\& memoize \*(Aqiota\*(Aq, SCALAR_CACHE => \*(AqMERGE\*(Aq; +\& +\& @i7 = iota(7); +\& $i7 = iota(7); +.Ve +.PP +Here the first call caches the list (1,2,3,4,5,6,7). The second call +does not really make sense. \f(CW\*(C`Memoize\*(C'\fR cannot guess what behavior +\&\f(CW\*(C`iota\*(C'\fR should have in scalar context without actually calling it in +scalar context. Normally \f(CW\*(C`Memoize\*(C'\fR \fIwould\fR call \f(CW\*(C`iota\*(C'\fR in scalar +context and cache the result, but the \f(CW\*(C`SCALAR_CACHE => \*(AqMERGE\*(Aq\*(C'\fR +option says not to do that, but to use the cache list-context value +instead. But it cannot return a list of seven elements in a scalar +context. In this case \f(CW$i7\fR will receive the \fBfirst element\fR of the +cached list value, namely 7. +.PP +\fIMerged disk caches\fR +.IX Subsection "Merged disk caches" +.PP +Another use for \f(CW\*(C`MERGE\*(C'\fR is when you want both kinds of return values +stored in the same disk file; this saves you from having to deal with +two disk files instead of one. You can use a normalizer function to +keep the two sets of return values separate. For example: +.PP +.Vb 2 +\& local $MLDBM::UseDB = \*(AqDB_File\*(Aq; +\& tie my %cache => \*(AqMLDBM\*(Aq, $filename, ...; +\& +\& memoize \*(Aqmyfunc\*(Aq, +\& NORMALIZER => \*(Aqn\*(Aq, +\& SCALAR_CACHE => [HASH => \e%cache], +\& LIST_CACHE => \*(AqMERGE\*(Aq, +\& ; +\& +\& sub n { +\& my $context = wantarray() ? \*(AqL\*(Aq : \*(AqS\*(Aq; +\& # ... now compute the hash key from the arguments ... +\& $hashkey = "$context:$hashkey"; +\& } +.Ve +.PP +This normalizer function will store scalar context return values in +the disk file under keys that begin with \f(CW\*(C`S:\*(C'\fR, and list context +return values under keys that begin with \f(CW\*(C`L:\*(C'\fR. +.SH "OTHER FACILITIES" +.IX Header "OTHER FACILITIES" +.ie n .SS """unmemoize""" +.el .SS \f(CWunmemoize\fP +.IX Subsection "unmemoize" +There's an \f(CW\*(C`unmemoize\*(C'\fR function that you can import if you want to. +Why would you want to? Here's an example: Suppose you have your cache +tied to a DBM file, and you want to make sure that the cache is +written out to disk if someone interrupts the program. If the program +exits normally, this will happen anyway, but if someone types +control-C or something then the program will terminate immediately +without synchronizing the database. So what you can do instead is +.PP +.Vb 1 +\& $SIG{INT} = sub { unmemoize \*(Aqfunction\*(Aq }; +.Ve +.PP +\&\f(CW\*(C`unmemoize\*(C'\fR accepts a reference to, or the name of a previously +memoized function, and undoes whatever it did to provide the memoized +version in the first place, including making the name refer to the +unmemoized version if appropriate. It returns a reference to the +unmemoized version of the function. +.PP +If you ask it to unmemoize a function that was never memoized, it +croaks. +.ie n .SS """flush_cache""" +.el .SS \f(CWflush_cache\fP +.IX Subsection "flush_cache" +\&\f(CWflush_cache(function)\fR will flush out the caches, discarding \fIall\fR +the cached data. The argument may be a function name or a reference +to a function. For finer control over when data is discarded or +expired, see the documentation for \f(CW\*(C`Memoize::Expire\*(C'\fR, included in +this package. +.PP +Note that if the cache is a tied hash, \f(CW\*(C`flush_cache\*(C'\fR will attempt to +invoke the \f(CW\*(C`CLEAR\*(C'\fR method on the hash. If there is no \f(CW\*(C`CLEAR\*(C'\fR +method, this will cause a run-time error. +.PP +An alternative approach to cache flushing is to use the \f(CW\*(C`HASH\*(C'\fR option +(see above) to request that \f(CW\*(C`Memoize\*(C'\fR use a particular hash variable +as its cache. Then you can examine or modify the hash at any time in +any way you desire. You may flush the cache by using \f(CW\*(C`%hash = ()\*(C'\fR. +.SH CAVEATS +.IX Header "CAVEATS" +Memoization is not a cure-all: +.IP \(bu 4 +Do not memoize a function whose behavior depends on program +state other than its own arguments, such as global variables, the time +of day, or file input. These functions will not produce correct +results when memoized. For a particularly easy example: +.Sp +.Vb 3 +\& sub f { +\& time; +\& } +.Ve +.Sp +This function takes no arguments, and as far as \f(CW\*(C`Memoize\*(C'\fR is +concerned, it always returns the same result. \f(CW\*(C`Memoize\*(C'\fR is wrong, of +course, and the memoized version of this function will call \f(CW\*(C`time\*(C'\fR once +to get the current time, and it will return that same time +every time you call it after that. +.IP \(bu 4 +Do not memoize a function with side effects. +.Sp +.Vb 5 +\& sub f { +\& my ($a, $b) = @_; +\& my $s = $a + $b; +\& print "$a + $b = $s.\en"; +\& } +.Ve +.Sp +This function accepts two arguments, adds them, and prints their sum. +Its return value is the number of characters it printed, but you +probably didn't care about that. But \f(CW\*(C`Memoize\*(C'\fR doesn't understand +that. If you memoize this function, you will get the result you +expect the first time you ask it to print the sum of 2 and 3, but +subsequent calls will return 1 (the return value of +\&\f(CW\*(C`print\*(C'\fR) without actually printing anything. +.IP \(bu 4 +Do not memoize a function that returns a data structure that is +modified by its caller. +.Sp +Consider these functions: \f(CW\*(C`getusers\*(C'\fR returns a list of users somehow, +and then \f(CW\*(C`main\*(C'\fR throws away the first user on the list and prints the +rest: +.Sp +.Vb 7 +\& sub main { +\& my $userlist = getusers(); +\& shift @$userlist; +\& foreach $u (@$userlist) { +\& print "User $u\en"; +\& } +\& } +\& +\& sub getusers { +\& my @users; +\& # Do something to get a list of users; +\& \e@users; # Return reference to list. +\& } +.Ve +.Sp +If you memoize \f(CW\*(C`getusers\*(C'\fR here, it will work right exactly once. The +reference to the users list will be stored in the memo table. \f(CW\*(C`main\*(C'\fR +will discard the first element from the referenced list. The next +time you invoke \f(CW\*(C`main\*(C'\fR, \f(CW\*(C`Memoize\*(C'\fR will not call \f(CW\*(C`getusers\*(C'\fR; it will +just return the same reference to the same list it got last time. But +this time the list has already had its head removed; \f(CW\*(C`main\*(C'\fR will +erroneously remove another element from it. The list will get shorter +and shorter every time you call \f(CW\*(C`main\*(C'\fR. +.Sp +Similarly, this: +.Sp +.Vb 3 +\& $u1 = getusers(); +\& $u2 = getusers(); +\& pop @$u1; +.Ve +.Sp +will modify \f(CW$u2\fR as well as \f(CW$u1\fR, because both variables are references +to the same array. Had \f(CW\*(C`getusers\*(C'\fR not been memoized, \f(CW$u1\fR and \f(CW$u2\fR +would have referred to different arrays. +.IP \(bu 4 +Do not memoize a very simple function. +.Sp +Recently someone mentioned to me that the Memoize module made his +program run slower instead of faster. It turned out that he was +memoizing the following function: +.Sp +.Vb 3 +\& sub square { +\& $_[0] * $_[0]; +\& } +.Ve +.Sp +I pointed out that \f(CW\*(C`Memoize\*(C'\fR uses a hash, and that looking up a +number in the hash is necessarily going to take a lot longer than a +single multiplication. There really is no way to speed up the +\&\f(CW\*(C`square\*(C'\fR function. +.Sp +Memoization is not magical. +.SH "PERSISTENT CACHE SUPPORT" +.IX Header "PERSISTENT CACHE SUPPORT" +You can tie the cache tables to any sort of tied hash that you want +to, as long as it supports \f(CW\*(C`TIEHASH\*(C'\fR, \f(CW\*(C`FETCH\*(C'\fR, \f(CW\*(C`STORE\*(C'\fR, and +\&\f(CW\*(C`EXISTS\*(C'\fR. For example, +.PP +.Vb 2 +\& tie my %cache => \*(AqGDBM_File\*(Aq, $filename, O_RDWR|O_CREAT, 0666; +\& memoize \*(Aqfunction\*(Aq, SCALAR_CACHE => [HASH => \e%cache]; +.Ve +.PP +works just fine. For some storage methods, you need a little glue. +.PP +\&\f(CW\*(C`SDBM_File\*(C'\fR doesn't supply an \f(CW\*(C`EXISTS\*(C'\fR method, so included in this +package is a glue module called \f(CW\*(C`Memoize::SDBM_File\*(C'\fR which does +provide one. Use this instead of plain \f(CW\*(C`SDBM_File\*(C'\fR to store your +cache table on disk in an \f(CW\*(C`SDBM_File\*(C'\fR database: +.PP +.Vb 2 +\& tie my %cache => \*(AqMemoize::SDBM_File\*(Aq, $filename, O_RDWR|O_CREAT, 0666; +\& memoize \*(Aqfunction\*(Aq, SCALAR_CACHE => [HASH => \e%cache]; +.Ve +.PP +\&\f(CW\*(C`NDBM_File\*(C'\fR has the same problem and the same solution. (Use +\&\f(CW\*(C`Memoize::NDBM_File instead of plain NDBM_File.\*(C'\fR) +.PP +\&\f(CW\*(C`Storable\*(C'\fR isn't a tied hash class at all. You can use it to store a +hash to disk and retrieve it again, but you can't modify the hash while +it's on the disk. So if you want to store your cache table in a +\&\f(CW\*(C`Storable\*(C'\fR database, use \f(CW\*(C`Memoize::Storable\*(C'\fR, which puts a hashlike +front-end onto \f(CW\*(C`Storable\*(C'\fR. The hash table is actually kept in +memory, and is loaded from your \f(CW\*(C`Storable\*(C'\fR file at the time you +memoize the function, and stored back at the time you unmemoize the +function (or when your program exits): +.PP +.Vb 2 +\& tie my %cache => \*(AqMemoize::Storable\*(Aq, $filename; +\& memoize \*(Aqfunction\*(Aq, SCALAR_CACHE => [HASH => \e%cache]; +\& +\& tie my %cache => \*(AqMemoize::Storable\*(Aq, $filename, \*(Aqnstore\*(Aq; +\& memoize \*(Aqfunction\*(Aq, SCALAR_CACHE => [HASH => \e%cache]; +.Ve +.PP +Include the \f(CW\*(C`nstore\*(C'\fR option to have the \f(CW\*(C`Storable\*(C'\fR database written +in \fInetwork order\fR. (See Storable for more details about this.) +.PP +The \f(CWflush_cache()\fR function will raise a run-time error unless the +tied package provides a \f(CW\*(C`CLEAR\*(C'\fR method. +.SH "EXPIRATION SUPPORT" +.IX Header "EXPIRATION SUPPORT" +See Memoize::Expire, which is a plug-in module that adds expiration +functionality to Memoize. If you don't like the kinds of policies +that Memoize::Expire implements, it is easy to write your own plug-in +module to implement whatever policy you desire. Memoize comes with +several examples. An expiration manager that implements a LRU policy +is available on CPAN as Memoize::ExpireLRU. +.SH BUGS +.IX Header "BUGS" +The test suite is much better, but always needs improvement. +.PP +There is some problem with the way \f(CW\*(C`goto &f\*(C'\fR works under threaded +Perl, perhaps because of the lexical scoping of \f(CW@_\fR. This is a bug +in Perl, and until it is resolved, memoized functions will see a +slightly different \f(CWcaller()\fR and will perform a little more slowly +on threaded perls than unthreaded perls. +.PP +Some versions of \f(CW\*(C`DB_File\*(C'\fR won't let you store data under a key of +length 0. That means that if you have a function \f(CW\*(C`f\*(C'\fR which you +memoized and the cache is in a \f(CW\*(C`DB_File\*(C'\fR database, then the value of +\&\f(CWf()\fR (\f(CW\*(C`f\*(C'\fR called with no arguments) will not be memoized. If this +is a big problem, you can supply a normalizer function that prepends +\&\f(CW"x"\fR to every key. +.SH "SEE ALSO" +.IX Header "SEE ALSO" +At <https://perl.plover.com/MiniMemoize/> there is an article about +memoization and about the internals of Memoize that appeared in The +Perl Journal, issue #13. +.PP +Mark-Jason Dominus's book \fIHigher-Order Perl\fR (2005, ISBN 1558607013, +published +by Morgan Kaufmann) discusses memoization (and many other +topics) in tremendous detail. It is available on-line for free. +For more information, visit <https://hop.perl.plover.com/>. +.SH "THANK YOU" +.IX Header "THANK YOU" +Many thanks to Florian Ragwitz for administration and packaging +assistance, to John Tromp for bug reports, to Jonathan Roy for bug reports +and suggestions, to Michael Schwern for other bug reports and patches, +to Mike Cariaso for helping me to figure out the Right Thing to Do +About Expiration, to Joshua Gerth, Joshua Chamas, Jonathan Roy +(again), Mark D. Anderson, and Andrew Johnson for more suggestions +about expiration, to Brent Powers for the Memoize::ExpireLRU module, +to Ariel Scolnicov for delightful messages about the Fibonacci +function, to Dion Almaer for thought-provoking suggestions about the +default normalizer, to Walt Mankowski and Kurt Starsinic for much help +investigating problems under threaded Perl, to Alex Dudkevich for +reporting the bug in prototyped functions and for checking my patch, +to Tony Bass for many helpful suggestions, to Jonathan Roy (again) for +finding a use for \f(CWunmemoize()\fR, to Philippe Verdret for enlightening +discussion of \f(CW\*(C`Hook::PrePostCall\*(C'\fR, to Nat Torkington for advice I +ignored, to Chris Nandor for portability advice, to Randal Schwartz +for suggesting the '\f(CW\*(C`flush_cache\*(C'\fR function, and to Jenda Krynicky for +being a light in the world. +.PP +Special thanks to Jarkko Hietaniemi, the 5.8.0 pumpking, for including +this module in the core and for his patient and helpful guidance +during the integration process. +.SH AUTHOR +.IX Header "AUTHOR" +Mark Jason Dominus +.SH "COPYRIGHT AND LICENSE" +.IX Header "COPYRIGHT AND LICENSE" +This software is copyright (c) 2012 by Mark Jason Dominus. +.PP +This is free software; you can redistribute it and/or modify it under +the same terms as the Perl 5 programming language system itself. |