summaryrefslogtreecommitdiffstats
path: root/doc/src/sgml/html/geqo-pg-intro.html
diff options
context:
space:
mode:
authorDaniel Baumann <daniel.baumann@progress-linux.org>2024-05-04 12:15:05 +0000
committerDaniel Baumann <daniel.baumann@progress-linux.org>2024-05-04 12:15:05 +0000
commit46651ce6fe013220ed397add242004d764fc0153 (patch)
tree6e5299f990f88e60174a1d3ae6e48eedd2688b2b /doc/src/sgml/html/geqo-pg-intro.html
parentInitial commit. (diff)
downloadpostgresql-14-upstream.tar.xz
postgresql-14-upstream.zip
Adding upstream version 14.5.upstream/14.5upstream
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'doc/src/sgml/html/geqo-pg-intro.html')
-rw-r--r--doc/src/sgml/html/geqo-pg-intro.html107
1 files changed, 107 insertions, 0 deletions
diff --git a/doc/src/sgml/html/geqo-pg-intro.html b/doc/src/sgml/html/geqo-pg-intro.html
new file mode 100644
index 0000000..d85697c
--- /dev/null
+++ b/doc/src/sgml/html/geqo-pg-intro.html
@@ -0,0 +1,107 @@
+<?xml version="1.0" encoding="UTF-8" standalone="no"?>
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><title>60.3. Genetic Query Optimization (GEQO) in PostgreSQL</title><link rel="stylesheet" type="text/css" href="stylesheet.css" /><link rev="made" href="pgsql-docs@lists.postgresql.org" /><meta name="generator" content="DocBook XSL Stylesheets Vsnapshot" /><link rel="prev" href="geqo-intro2.html" title="60.2. Genetic Algorithms" /><link rel="next" href="geqo-biblio.html" title="60.4. Further Reading" /></head><body id="docContent" class="container-fluid col-10"><div xmlns="http://www.w3.org/TR/xhtml1/transitional" class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="5" align="center">60.3. Genetic Query Optimization (<acronym xmlns="http://www.w3.org/1999/xhtml" class="acronym">GEQO</acronym>) in PostgreSQL</th></tr><tr><td width="10%" align="left"><a accesskey="p" href="geqo-intro2.html" title="60.2. Genetic Algorithms">Prev</a> </td><td width="10%" align="left"><a accesskey="u" href="geqo.html" title="Chapter 60. Genetic Query Optimizer">Up</a></td><th width="60%" align="center">Chapter 60. Genetic Query Optimizer</th><td width="10%" align="right"><a accesskey="h" href="index.html" title="PostgreSQL 14.5 Documentation">Home</a></td><td width="10%" align="right"> <a accesskey="n" href="geqo-biblio.html" title="60.4. Further Reading">Next</a></td></tr></table><hr></hr></div><div class="sect1" id="GEQO-PG-INTRO"><div class="titlepage"><div><div><h2 class="title" style="clear: both">60.3. Genetic Query Optimization (<acronym class="acronym">GEQO</acronym>) in PostgreSQL</h2></div></div></div><div class="toc"><dl class="toc"><dt><span class="sect2"><a href="geqo-pg-intro.html#id-1.10.12.5.6">60.3.1. Generating Possible Plans with <acronym class="acronym">GEQO</acronym></a></span></dt><dt><span class="sect2"><a href="geqo-pg-intro.html#GEQO-FUTURE">60.3.2. Future Implementation Tasks for
+ <span class="productname">PostgreSQL</span> <acronym class="acronym">GEQO</acronym></a></span></dt></dl></div><p>
+ The <acronym class="acronym">GEQO</acronym> module approaches the query
+ optimization problem as though it were the well-known traveling salesman
+ problem (<acronym class="acronym">TSP</acronym>).
+ Possible query plans are encoded as integer strings. Each string
+ represents the join order from one relation of the query to the next.
+ For example, the join tree
+</p><pre class="literallayout">
+ /\
+ /\ 2
+ /\ 3
+4 1
+</pre><p>
+ is encoded by the integer string '4-1-3-2',
+ which means, first join relation '4' and '1', then '3', and
+ then '2', where 1, 2, 3, 4 are relation IDs within the
+ <span class="productname">PostgreSQL</span> optimizer.
+ </p><p>
+ Specific characteristics of the <acronym class="acronym">GEQO</acronym>
+ implementation in <span class="productname">PostgreSQL</span>
+ are:
+
+ </p><div class="itemizedlist"><ul class="itemizedlist compact" style="list-style-type: bullet; "><li class="listitem" style="list-style-type: disc"><p>
+ Usage of a <em class="firstterm">steady state</em> <acronym class="acronym">GA</acronym> (replacement of the least fit
+ individuals in a population, not whole-generational replacement)
+ allows fast convergence towards improved query plans. This is
+ essential for query handling with reasonable time;
+ </p></li><li class="listitem" style="list-style-type: disc"><p>
+ Usage of <em class="firstterm">edge recombination crossover</em>
+ which is especially suited to keep edge losses low for the
+ solution of the <acronym class="acronym">TSP</acronym> by means of a
+ <acronym class="acronym">GA</acronym>;
+ </p></li><li class="listitem" style="list-style-type: disc"><p>
+ Mutation as genetic operator is deprecated so that no repair
+ mechanisms are needed to generate legal <acronym class="acronym">TSP</acronym> tours.
+ </p></li></ul></div><p>
+ </p><p>
+ Parts of the <acronym class="acronym">GEQO</acronym> module are adapted from D. Whitley's
+ Genitor algorithm.
+ </p><p>
+ The <acronym class="acronym">GEQO</acronym> module allows
+ the <span class="productname">PostgreSQL</span> query optimizer to
+ support large join queries effectively through
+ non-exhaustive search.
+ </p><div class="sect2" id="id-1.10.12.5.6"><div class="titlepage"><div><div><h3 class="title">60.3.1. Generating Possible Plans with <acronym class="acronym">GEQO</acronym></h3></div></div></div><p>
+ The <acronym class="acronym">GEQO</acronym> planning process uses the standard planner
+ code to generate plans for scans of individual relations. Then join
+ plans are developed using the genetic approach. As shown above, each
+ candidate join plan is represented by a sequence in which to join
+ the base relations. In the initial stage, the <acronym class="acronym">GEQO</acronym>
+ code simply generates some possible join sequences at random. For each
+ join sequence considered, the standard planner code is invoked to
+ estimate the cost of performing the query using that join sequence.
+ (For each step of the join sequence, all three possible join strategies
+ are considered; and all the initially-determined relation scan plans
+ are available. The estimated cost is the cheapest of these
+ possibilities.) Join sequences with lower estimated cost are considered
+ <span class="quote">“<span class="quote">more fit</span>”</span> than those with higher cost. The genetic algorithm
+ discards the least fit candidates. Then new candidates are generated
+ by combining genes of more-fit candidates — that is, by using
+ randomly-chosen portions of known low-cost join sequences to create
+ new sequences for consideration. This process is repeated until a
+ preset number of join sequences have been considered; then the best
+ one found at any time during the search is used to generate the finished
+ plan.
+ </p><p>
+ This process is inherently nondeterministic, because of the randomized
+ choices made during both the initial population selection and subsequent
+ <span class="quote">“<span class="quote">mutation</span>”</span> of the best candidates. To avoid surprising changes
+ of the selected plan, each run of the GEQO algorithm restarts its
+ random number generator with the current <a class="xref" href="runtime-config-query.html#GUC-GEQO-SEED">geqo_seed</a>
+ parameter setting. As long as <code class="varname">geqo_seed</code> and the other
+ GEQO parameters are kept fixed, the same plan will be generated for a
+ given query (and other planner inputs such as statistics). To experiment
+ with different search paths, try changing <code class="varname">geqo_seed</code>.
+ </p></div><div class="sect2" id="GEQO-FUTURE"><div class="titlepage"><div><div><h3 class="title">60.3.2. Future Implementation Tasks for
+ <span class="productname">PostgreSQL</span> <acronym class="acronym">GEQO</acronym></h3></div></div></div><p>
+ Work is still needed to improve the genetic algorithm parameter
+ settings.
+ In file <code class="filename">src/backend/optimizer/geqo/geqo_main.c</code>,
+ routines
+ <code class="function">gimme_pool_size</code> and <code class="function">gimme_number_generations</code>,
+ we have to find a compromise for the parameter settings
+ to satisfy two competing demands:
+ </p><div class="itemizedlist"><ul class="itemizedlist compact" style="list-style-type: disc; "><li class="listitem"><p>
+ Optimality of the query plan
+ </p></li><li class="listitem"><p>
+ Computing time
+ </p></li></ul></div><p>
+ </p><p>
+ In the current implementation, the fitness of each candidate join
+ sequence is estimated by running the standard planner's join selection
+ and cost estimation code from scratch. To the extent that different
+ candidates use similar sub-sequences of joins, a great deal of work
+ will be repeated. This could be made significantly faster by retaining
+ cost estimates for sub-joins. The problem is to avoid expending
+ unreasonable amounts of memory on retaining that state.
+ </p><p>
+ At a more basic level, it is not clear that solving query optimization
+ with a GA algorithm designed for TSP is appropriate. In the TSP case,
+ the cost associated with any substring (partial tour) is independent
+ of the rest of the tour, but this is certainly not true for query
+ optimization. Thus it is questionable whether edge recombination
+ crossover is the most effective mutation procedure.
+ </p></div></div><div xmlns="http://www.w3.org/TR/xhtml1/transitional" class="navfooter"><hr></hr><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="geqo-intro2.html" title="60.2. Genetic Algorithms">Prev</a> </td><td width="20%" align="center"><a accesskey="u" href="geqo.html" title="Chapter 60. Genetic Query Optimizer">Up</a></td><td width="40%" align="right"> <a accesskey="n" href="geqo-biblio.html" title="60.4. Further Reading">Next</a></td></tr><tr><td width="40%" align="left" valign="top">60.2. Genetic Algorithms </td><td width="20%" align="center"><a accesskey="h" href="index.html" title="PostgreSQL 14.5 Documentation">Home</a></td><td width="40%" align="right" valign="top"> 60.4. Further Reading</td></tr></table></div></body></html> \ No newline at end of file