<!-- doc/src/sgml/bki.sgml -->

<chapter id="bki">
 <title>System Catalog Declarations and Initial Contents</title>

 <para>
  <productname>PostgreSQL</productname> uses many different system catalogs
  to keep track of the existence and properties of database objects, such as
  tables and functions.  Physically there is no difference between a system
  catalog and a plain user table, but the backend C code knows the structure
  and properties of each catalog, and can manipulate it directly at a low
  level.  Thus, for example, it is inadvisable to attempt to alter the
  structure of a catalog on-the-fly; that would break assumptions built into
  the C code about how rows of the catalog are laid out.  But the structure
  of the catalogs can change between major versions.
 </para>

 <para>
  The structures of the catalogs are declared in specially formatted C
  header files in the <filename>src/include/catalog/</filename> directory of
  the source tree.  For each catalog there is a header file
  named after the catalog (e.g., <filename>pg_class.h</filename>
  for <structname>pg_class</structname>), which defines the set of columns
  the catalog has, as well as some other basic properties such as its OID.
 </para>

 <para>
  Many of the catalogs have initial data that must be loaded into them
  during the <quote>bootstrap</quote> phase
  of <application>initdb</application>, to bring the system up to a point
  where it is capable of executing SQL commands.  (For
  example, <structname>pg_class</structname> must contain an entry for itself,
  as well as one for each other system catalog and index.)  This
  initial data is kept in editable form in data files that are also stored
  in the <filename>src/include/catalog/</filename> directory.  For example,
  <filename>pg_proc.dat</filename> describes all the initial rows that must
  be inserted into the <structname>pg_proc</structname> catalog.
 </para>

 <para>
  To create the catalog files and load this initial data into them, a
  backend running in bootstrap mode reads a <acronym>BKI</acronym>
  (Backend Interface) file containing commands and initial data.
  The <filename>postgres.bki</filename> file used in this mode is prepared
  from the aforementioned header and data files, while building
  a <productname>PostgreSQL</productname> distribution, by a Perl script
  named <filename>genbki.pl</filename>.
  Although it's specific to a particular <productname>PostgreSQL</productname>
  release, <filename>postgres.bki</filename> is platform-independent and is
  installed in the <filename>share</filename> subdirectory of the
  installation tree.
 </para>

 <para>
  <filename>genbki.pl</filename> also produces a derived header file for
  each catalog, for example <filename>pg_class_d.h</filename> for
  the <structname>pg_class</structname> catalog.  This file contains
  automatically-generated macro definitions, and may contain other macros,
  enum declarations, and so on that can be useful for client C code that
  reads a particular catalog.
 </para>

 <para>
  Most <productname>PostgreSQL</productname> developers don't need to be
  directly concerned with
  the <acronym>BKI</acronym> file, but almost any nontrivial feature
  addition in the backend will require modifying the catalog header files
  and/or initial data files.  The rest of this chapter gives some
  information about that, and for completeness describes
  the <acronym>BKI</acronym> file format.
 </para>

 <sect1 id="system-catalog-declarations">
  <title>System Catalog Declaration Rules</title>

  <para>
   The key part of a catalog header file is a C structure definition
   describing the layout of each row of the catalog.  This begins with
   a <literal>CATALOG</literal> macro, which so far as the C compiler is
   concerned is just shorthand for <literal>typedef struct
   FormData_<replaceable>catalogname</replaceable></literal>.
   Each field in the struct gives rise to a catalog column.
   Fields can be annotated using the BKI property macros described
   in <filename>genbki.h</filename>, for example to define a default value
   for a field or mark it as nullable or not nullable.
   The <literal>CATALOG</literal> line can also be annotated, with some
   other BKI property macros described in <filename>genbki.h</filename>, to
   define other properties of the catalog as a whole, such as whether
   it is a shared relation.
  </para>
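
  <para>
   As a minimal sketch of how such a declaration looks to the C compiler,
   consider the following stand-alone fragment.  The
   catalog <structname>pg_example</structname>, its OID 9990, its columns,
   and the helper <function>example_owner()</function> are all hypothetical,
   and the macro and type definitions are simplified stand-ins for what the
   real headers provide, written out here only so that the fragment compiles
   on its own.
  </para>

<programlisting><![CDATA[
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Stand-ins for declarations that PostgreSQL's own headers provide.  They
 * are simplified assumptions so that this sketch compiles by itself.
 */
typedef uint32_t Oid;
#define NAMEDATALEN 64
typedef struct NameData
{
    char        data[NAMEDATALEN];
} NameData;

/* To the C compiler, CATALOG is shorthand for "typedef struct FormData_x". */
#define CATALOG(name, oid, oidmacro) typedef struct FormData_##name

/* The BKI annotations expand to nothing here; only genbki.pl reads them. */
#define BKI_SHARED_RELATION
#define BKI_DEFAULT(value)
#define BKI_LOOKUP(catalog)

/* Hypothetical catalog: the name, OID 9990, and columns are invented. */
CATALOG(pg_example,9990,ExampleRelationId) BKI_SHARED_RELATION
{
    Oid         oid;
    NameData    exname;
    Oid         exowner BKI_DEFAULT(POSTGRES) BKI_LOOKUP(pg_authid);
    bool        exenabled BKI_DEFAULT(t);
} FormData_pg_example;

typedef FormData_pg_example *Form_pg_example;

/* Each struct field is a catalog column, read through the row pointer. */
Oid
example_owner(Form_pg_example row)
{
    assert(row != NULL);
    return row->exowner;
}
```
]]></programlisting>

  <para>
   Because the annotations vanish during preprocessing, the struct layout
   depends only on the field types and their order.
  </para>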

  <para>
   The system catalog cache code (and most catalog-munging code in general)
   assumes that the fixed-length portions of all system catalog tuples are
   in fact present, because it maps this C struct declaration onto them.
   Thus, all variable-length fields and nullable fields must be placed at
   the end, and they cannot be accessed as struct fields.
   For example, if you tried to
   set <structname>pg_type</structname>.<structfield>typrelid</structfield>
   to be NULL, it would fail when some piece of code tried to reference
   <literal>typetup-&gt;typrelid</literal> (or worse,
   <literal>typetup-&gt;typelem</literal>, because that follows
   <structfield>typrelid</structfield>).  This would result in
   random errors or even segmentation violations.
  </para>

  <para>
   As a partial guard against this type of error, variable-length or
   nullable fields should not be made directly visible to the C compiler.
   This is accomplished by wrapping them in <literal>#ifdef
   CATALOG_VARLEN</literal> ... <literal>#endif</literal> (where
   <literal>CATALOG_VARLEN</literal> is a symbol that is never defined).
   This prevents C code from carelessly trying to access fields that might
   not be there or might be at some other offset.
   As an independent guard against creating incorrect rows, we
   require all columns that should be non-nullable to be marked so
   in <structname>pg_attribute</structname>.  The bootstrap code will
   automatically mark catalog columns as <literal>NOT NULL</literal>
   if they are fixed-width and are not preceded by any nullable or
   variable-width column.
   Where this rule is inadequate, you can force correct marking by using
   <literal>BKI_FORCE_NOT_NULL</literal>
   and <literal>BKI_FORCE_NULL</literal> annotations as needed.
  </para>
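
  <para>
   The effect of the <literal>CATALOG_VARLEN</literal> guard can be sketched
   with a stand-alone fragment.  The catalog, its OID, its columns, and the
   two helper functions are hypothetical, and the macro and type definitions
   are simplified stand-ins for the real ones.  Since the symbol is never
   defined, the compiler sees only the fixed-width columns.
  </para>

<programlisting><![CDATA[
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for PostgreSQL's own definitions (assumptions). */
typedef uint32_t Oid;
typedef int32_t int32;
#define CATALOG(name, oid, oidmacro) typedef struct FormData_##name
#define BKI_FORCE_NOT_NULL

/* Hypothetical catalog; CATALOG_VARLEN is deliberately never defined. */
CATALOG(pg_example,9990,ExampleRelationId)
{
    Oid         oid;
    Oid         exnamespace;
    int32       exlimit;

#ifdef CATALOG_VARLEN           /* variable-length fields start here */
    text        exsource BKI_FORCE_NOT_NULL;
    text        excomment;      /* nullable */
#endif
} FormData_pg_example;

/* Size of the portion of a row that C code may access as struct fields. */
size_t
example_fixed_part_size(void)
{
    return sizeof(FormData_pg_example);
}

/* Fixed-width columns remain ordinary struct members. */
int32
example_limit(const FormData_pg_example *row)
{
    assert(row != NULL);
    return row->exlimit;
    /* Writing row->exsource here would be rejected by the compiler. */
}
```
]]></programlisting>

  <para>
   The hidden columns still exist in the catalog; in this sketch they are
   simply not part of the struct that C code can dereference.
  </para>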

  <para>
   Frontend code should not include any <filename>pg_xxx.h</filename>
   catalog header file, as these files may contain C code that won't compile
   outside the backend.  (Typically, that happens because these files also
   contain declarations for functions
   in <filename>src/backend/catalog/</filename> files.)
   Instead, frontend code may include the corresponding
   generated <filename>pg_xxx_d.h</filename> header, which will contain
   OID <literal>#define</literal>s and any other data that might be of use
   on the client side.  If you want macros or other code in a catalog header
   to be visible to frontend code, write <literal>#ifdef
   EXPOSE_TO_CLIENT_CODE</literal> ... <literal>#endif</literal> around that
   section to instruct <filename>genbki.pl</filename> to copy that section
   to the <filename>pg_xxx_d.h</filename> header.
  </para>

  <para>
   A few of the catalogs are so fundamental that they can't even be created
   by the <acronym>BKI</acronym> <literal>create</literal> command that's
   used for most catalogs, because that command needs to write information
   into these catalogs to describe the new catalog.  These are
   called <firstterm>bootstrap</firstterm> catalogs, and defining one takes
   a lot of extra work: you have to manually prepare appropriate entries for
   them in the pre-loaded contents of <structname>pg_class</structname>
   and <structname>pg_type</structname>, and those entries will need to be
   updated for subsequent changes to the catalog's structure.
   (Bootstrap catalogs also need pre-loaded entries
   in <structname>pg_attribute</structname>, but
   fortunately <filename>genbki.pl</filename> handles that chore nowadays.)
   Avoid making new catalogs be bootstrap catalogs if at all possible.
  </para>
 </sect1>

 <sect1 id="system-catalog-initial-data">
  <title>System Catalog Initial Data</title>

  <para>
   Each catalog that has any manually-created initial data (some do not)
   has a corresponding <literal>.dat</literal> file that contains its
   initial data in an editable format.
  </para>

  <sect2 id="system-catalog-initial-data-format">
   <title>Data File Format</title>

   <para>
    Each <literal>.dat</literal> file contains Perl data structure literals
    that are simply eval'd to produce an in-memory data structure consisting
    of an array of hash references, one per catalog row.
    A slightly modified excerpt from <filename>pg_database.dat</filename>
    will demonstrate the key features:
   </para>

<!-- The "slight modification" is the apostrophe in the description. -->
<programlisting><![CDATA[
[

# A comment could appear here.
{ oid => '1', oid_symbol => 'Template1DbOid',
  descr => 'database\'s default template',
  datname => 'template1', encoding => 'ENCODING',
  datlocprovider => 'LOCALE_PROVIDER', datistemplate => 't',
  datallowconn => 't', datconnlimit => '-1', datfrozenxid => '0',
  datminmxid => '1', dattablespace => 'pg_default', datcollate => 'LC_COLLATE',
  datctype => 'LC_CTYPE', daticulocale => 'ICU_LOCALE', datacl => '_null_' },

]
]]></programlisting>

   <para>
    Points to note:
   </para>

   <itemizedlist>

    <listitem>
     <para>
      The overall file layout is: open square bracket, one or more sets of
      curly braces each of which represents a catalog row, close square
      bracket.  Write a comma after each closing curly brace.
     </para>
    </listitem>

    <listitem>
     <para>
      Within each catalog row, write comma-separated
      <replaceable>key</replaceable> <literal>=&gt;</literal>
      <replaceable>value</replaceable> pairs.  The
      allowed <replaceable>key</replaceable>s are the names of the catalog's
      columns, plus the metadata keys <literal>oid</literal>,
      <literal>oid_symbol</literal>,
      <literal>array_type_oid</literal>, and <literal>descr</literal>.
      (The use of <literal>oid</literal> and <literal>oid_symbol</literal>
      is described in <xref linkend="system-catalog-oid-assignment"/> below,
      while <literal>array_type_oid</literal> is described in
      <xref linkend="system-catalog-auto-array-types"/>.
      <literal>descr</literal> supplies a description string for the object,
      which will be inserted into <structname>pg_description</structname>
      or <structname>pg_shdescription</structname> as appropriate.)
      While the metadata keys are optional, the catalog's defined columns
      must all be provided, except when the catalog's <literal>.h</literal>
      file specifies a default value for the column.
      (In the example above, the <structfield>datdba</structfield> field has
      been omitted because <filename>pg_database.h</filename> supplies a
      suitable default value for it.)
     </para>
    </listitem>

    <listitem>
     <para>
      All values must be single-quoted.  Escape single quotes used within a
      value with a backslash.  Backslashes meant as data can, but need not,
      be doubled; this follows Perl's rules for simple quoted literals.
      Note that backslashes appearing as data will be treated as escapes by
      the bootstrap scanner, according to the same rules as for escape string
      constants (see <xref linkend="sql-syntax-strings-escape"/>); for
      example <literal>\t</literal> converts to a tab character.  If you
      actually want a backslash in the final value, you will need to write
      four of them: Perl strips two, leaving <literal>\\</literal> for the
      bootstrap scanner to see.
     </para>
    </listitem>

    <listitem>
     <para>
      Null values are represented by <literal>_null_</literal>.
      (Note that there is no way to create a value that is just that
      string.)
     </para>
    </listitem>

    <listitem>
     <para>
      Comments are preceded by <literal>#</literal>, and must be on their
      own lines.
     </para>
    </listitem>

    <listitem>
     <para>
      Field values that are OIDs of other catalog entries should be
      represented by symbolic names rather than actual numeric OIDs.
      (In the example above, <structfield>dattablespace</structfield>
      contains such a reference.)
      This is described in <xref linkend="system-catalog-oid-references"/>
      below.
     </para>
    </listitem>

    <listitem>
     <para>
      Since hashes are unordered data structures, field order and line
      layout aren't semantically significant.  However, to maintain a
      consistent appearance, we set a few rules that are applied by the
      formatting script <filename>reformat_dat_file.pl</filename>:

      <itemizedlist>

       <listitem>
        <para>
         Within each pair of curly braces, the metadata
         fields <literal>oid</literal>, <literal>oid_symbol</literal>,
         <literal>array_type_oid</literal>, and <literal>descr</literal>
         (if present) come first, in that order, then the catalog's own
         fields appear in their defined order.
        </para>
       </listitem>

       <listitem>
        <para>
         Newlines are inserted between fields as needed to limit line length
         to 80 characters, if possible.  A newline is also inserted between
         the metadata fields and the regular fields.
        </para>
       </listitem>

       <listitem>
        <para>
         If the catalog's <literal>.h</literal> file specifies a default
         value for a column, and a data entry has that same
         value, <filename>reformat_dat_file.pl</filename> will omit it from
         the data file.  This keeps the data representation compact.
        </para>
       </listitem>

       <listitem>
        <para>
         <filename>reformat_dat_file.pl</filename> preserves blank lines
         and comment lines as-is.
        </para>
       </listitem>

      </itemizedlist>

      It's recommended to run <filename>reformat_dat_file.pl</filename>
      before submitting catalog data patches.  For convenience, you can
      simply change to <filename>src/include/catalog/</filename> and
      run <literal>make reformat-dat-files</literal>.
     </para>
    </listitem>

    <listitem>
     <para>
      If you want to add a new method of making the data representation
      smaller, you must implement it
      in <filename>reformat_dat_file.pl</filename> and also
      teach <function>Catalog::ParseData()</function> how to expand the
      data back into the full representation.
     </para>
    </listitem>

   </itemizedlist>
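
   <para>
    The two rounds of backslash processing described in the list above can
    be imitated with a small C sketch.  This is an illustration only:
    the real work is done by Perl and by the bootstrap scanner, the function
    names are hypothetical, and the second stage handles just a small subset
    of the escape-string rules (<literal>\t</literal>, <literal>\n</literal>,
    and a backslash followed by any other character).
   </para>

<programlisting><![CDATA[
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stage 1: Perl single-quoted literal; only \\ and \' are escapes. */
void
perl_single_quoted(const char *src, char *dst)
{
    while (*src)
    {
        if (src[0] == '\\' && (src[1] == '\\' || src[1] == '\''))
        {
            *dst++ = src[1];
            src += 2;
        }
        else
            *dst++ = *src++;
    }
    *dst = '\0';
}

/* Stage 2: a small subset of escape-string processing. */
void
bootstrap_unescape(const char *src, char *dst)
{
    while (*src)
    {
        if (src[0] == '\\' && src[1] != '\0')
        {
            switch (src[1])
            {
                case 't':
                    *dst++ = '\t';
                    break;
                case 'n':
                    *dst++ = '\n';
                    break;
                default:        /* includes \\ becoming a single \ */
                    *dst++ = src[1];
                    break;
            }
            src += 2;
        }
        else
            *dst++ = *src++;
    }
    *dst = '\0';
}

/* Apply both stages to the text written between the quotes in a .dat file. */
void
dat_value_to_final(const char *dat_text, char *out, size_t outsize)
{
    char        stage1[256];

    assert(strlen(dat_text) < sizeof(stage1));
    assert(strlen(dat_text) < outsize);     /* output never grows */
    perl_single_quoted(dat_text, stage1);
    bootstrap_unescape(stage1, out);
}
```
]]></programlisting>

   <para>
    Feeding four backslashes through both stages leaves one backslash,
    while a single backslash followed by <literal>t</literal> passes through
    the first stage unchanged and becomes a tab in the second.
   </para>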
  </sect2>

  <sect2 id="system-catalog-oid-assignment">
   <title>OID Assignment</title>

   <para>
    A catalog row appearing in the initial data can be given a
    manually-assigned OID by writing an <literal>oid
    =&gt; <replaceable>nnnn</replaceable></literal> metadata field.
    Furthermore, if an OID is assigned, a C macro for that OID can be
    created by writing an <literal>oid_symbol
    =&gt; <replaceable>name</replaceable></literal> metadata field.
   </para>

   <para>
    Pre-loaded catalog rows must have preassigned OIDs if there are OID
    references to them in other pre-loaded rows.  A preassigned OID is
    also needed if the row's OID must be referenced from C code.
    If neither case applies, the <literal>oid</literal> metadata field can
    be omitted, in which case the bootstrap code assigns an OID
    automatically.
    In practice we usually preassign OIDs for all or none of the pre-loaded
    rows in a given catalog, even if only some of them are actually
    cross-referenced.
   </para>

   <para>
    Writing the actual numeric value of any OID in C code is considered
    very bad form; always use a macro, instead.  Direct references
    to <structname>pg_proc</structname> OIDs are common enough that there's
    a special mechanism to create the necessary macros automatically;
    see <filename>src/backend/utils/Gen_fmgrtab.pl</filename>.  Similarly
    &mdash; but, for historical reasons, not done the same way &mdash;
    there's an automatic method for creating macros
    for <structname>pg_type</structname>
    OIDs.  <literal>oid_symbol</literal> entries are therefore not
    necessary in those two catalogs.  Likewise, macros for
    the <structname>pg_class</structname> OIDs of system catalogs and
    indexes are set up automatically.  For all other system catalogs, you
    have to manually specify any macros you need
    via <literal>oid_symbol</literal> entries.
   </para>

   <para>
    To find an available OID for a new pre-loaded row, run the
    script <filename>src/include/catalog/unused_oids</filename>.
    It prints inclusive ranges of unused OIDs (e.g., the output
    line <literal>45-900</literal> means OIDs 45 through 900 have not been
    allocated yet).  Currently, OIDs 1&ndash;9999 are reserved for manual
    assignment; the <filename>unused_oids</filename> script simply looks
    through the catalog headers and <filename>.dat</filename> files
    to see which ones do not appear.  You can also use
    the <filename>duplicate_oids</filename> script to check for mistakes.
    (<filename>genbki.pl</filename> will assign OIDs for any rows that
    didn't get one hand-assigned to them, and it will also detect duplicate
    OIDs at compile time.)
   </para>

   <para>
    When choosing OIDs for a patch that is not expected to be committed
    immediately, best practice is to use a group of more-or-less
    consecutive OIDs starting with some random choice in the range
    8000&ndash;9999.  This minimizes the risk of OID collisions with other
    patches being developed concurrently.  To keep the 8000&ndash;9999
    range free for development purposes, after a patch has been committed
    to the master git repository its OIDs should be renumbered into
    available space below that range.  Typically, this will be done
    near the end of each development cycle, moving all OIDs consumed by
    patches committed in that cycle at the same time.  The script
    <filename>renumber_oids.pl</filename> can be used for this purpose.
    If an uncommitted patch is found to have OID conflicts with some
    recently-committed patch, <filename>renumber_oids.pl</filename> may
    also be useful for recovering from that situation.
   </para>

   <para>
    Because of this convention of possibly renumbering OIDs assigned by
    patches, the OIDs assigned by a patch should not be considered stable
    until the patch has been included in an official release.  We do not
    change manually-assigned object OIDs once released, however, as that
    would create assorted compatibility problems.
   </para>

   <para>
    If <filename>genbki.pl</filename> needs to assign an OID to a catalog
    entry that does not have a manually-assigned OID, it will use a value in
    the range 10000&ndash;11999.  The server's OID counter is set to 10000
    at the start of a bootstrap run, so that any objects created on-the-fly
    during bootstrap processing also receive OIDs in this range.  (The
    usual OID assignment mechanism takes care of preventing any conflicts.)
   </para>

   <para>
    Objects with OIDs below <symbol>FirstUnpinnedObjectId</symbol> (12000)
    are considered <quote>pinned</quote>, preventing them from being
    deleted.  (There are a small number of exceptions, which are
    hard-wired into <function>IsPinnedObject()</function>.)
    <application>initdb</application> forces the OID counter up
    to <symbol>FirstUnpinnedObjectId</symbol> as soon as it's ready to
    create unpinned objects.  Thus objects created during the later phases
    of <application>initdb</application>, such as objects created while
    running the <filename>information_schema.sql</filename> script, will
    not be pinned, while all objects known
    to <filename>genbki.pl</filename> will be.
   </para>

   <para>
    OIDs assigned during normal database operation are constrained to be
    16384 or higher.  This ensures that the range 10000&ndash;16383 is free
    for OIDs assigned automatically by <filename>genbki.pl</filename> or
    during <application>initdb</application>.  These
    automatically-assigned OIDs are not considered stable, and may change
    from one installation to another.
   </para>
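
   <para>
    The OID ranges described in this section can be summarized in a short
    C sketch.  The boundary values are the ones quoted above.  The macros
    are defined locally so that the fragment stands alone
    (<symbol>FirstGenbkiObjectId</symbol>
    and <symbol>FirstNormalObjectId</symbol> are assumed to match the names
    used in the backend headers), and the enum and the function
    <function>classify_oid()</function> are hypothetical.  The sketch
    deliberately ignores the hard-wired exceptions
    in <function>IsPinnedObject()</function>.
   </para>

<programlisting><![CDATA[
```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t Oid;

/* Range boundaries as quoted in the text; defined locally for this sketch. */
#define FirstGenbkiObjectId     10000
#define FirstUnpinnedObjectId   12000
#define FirstNormalObjectId     16384

typedef enum OidRange
{
    OID_RANGE_INVALID,          /* zero is not a valid OID */
    OID_RANGE_MANUAL,           /* 1-9999: assigned by hand */
    OID_RANGE_GENBKI,           /* 10000-11999: genbki.pl and bootstrap */
    OID_RANGE_INITDB,           /* 12000-16383: later phases of initdb */
    OID_RANGE_NORMAL            /* 16384 and up: normal operation */
} OidRange;

OidRange
classify_oid(Oid oid)
{
    if (oid == 0)
        return OID_RANGE_INVALID;
    if (oid < FirstGenbkiObjectId)
        return OID_RANGE_MANUAL;
    if (oid < FirstUnpinnedObjectId)
        return OID_RANGE_GENBKI;
    if (oid < FirstNormalObjectId)
        return OID_RANGE_INITDB;
    return OID_RANGE_NORMAL;
}

/* True if the OID lies below the pinning boundary (exceptions ignored). */
bool
oid_in_pinned_range(Oid oid)
{
    OidRange    range = classify_oid(oid);

    assert(oid != 0);
    return range == OID_RANGE_MANUAL || range == OID_RANGE_GENBKI;
}
```
]]></programlisting>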
  </sect2>

  <sect2 id="system-catalog-oid-references">
   <title>OID Reference Lookup</title>

   <para>
    In principle, cross-references from one initial catalog row to another
    could be written just by writing the preassigned OID of the referenced
    row in the referencing field.  However, that is against project
    policy, because it is error-prone, hard to read, and subject to
    breakage if a newly-assigned OID is renumbered.  Therefore
    <filename>genbki.pl</filename> provides mechanisms to write
    symbolic references instead.
    The rules are as follows:
   </para>

   <itemizedlist>

    <listitem>
     <para>
      Use of symbolic references is enabled in a particular catalog column
      by attaching <literal>BKI_LOOKUP(<replaceable>lookuprule</replaceable>)</literal>
      to the column's definition, where <replaceable>lookuprule</replaceable>
      is the name of the referenced catalog, e.g., <literal>pg_proc</literal>.
      <literal>BKI_LOOKUP</literal> can be attached to columns of
      type <type>Oid</type>, <type>regproc</type>, <type>oidvector</type>,
      or <type>Oid[]</type>; in the latter two cases it implies performing a
      lookup on each element of the array.
     </para>
    </listitem>

    <listitem>
     <para>
      It's also permissible to attach <literal>BKI_LOOKUP(encoding)</literal>
      to integer columns to reference character set encodings, which are
      not currently represented as catalog OIDs, but have a set of values
      known to <filename>genbki.pl</filename>.
     </para>
    </listitem>

    <listitem>
     <para>
      In some catalog columns, it's allowed for entries to be zero instead
      of a valid reference.  If this is allowed, write
      <literal>BKI_LOOKUP_OPT</literal> instead
      of <literal>BKI_LOOKUP</literal>.  Then you can
      write <literal>0</literal> for an entry.  (If the column is
      declared <type>regproc</type>, you can optionally
      write <literal>-</literal> instead of <literal>0</literal>.)
      Except for this special case, all entries in
      a <literal>BKI_LOOKUP</literal> column must be symbolic references.
      <filename>genbki.pl</filename> will warn about unrecognized names.
     </para>
    </listitem>

    <listitem>
     <para>
      Most kinds of catalog objects are simply referenced by their names.
      Note that type names must exactly match the
      referenced <structname>pg_type</structname>
      entry's <structfield>typname</structfield>; you do not get to use
      any aliases such as <literal>integer</literal>
      for <literal>int4</literal>.
     </para>
    </listitem>

    <listitem>
     <para>
      A function can be represented by
      its <structfield>proname</structfield>, if that is unique among
      the <filename>pg_proc.dat</filename> entries (this works like regproc
      input).  Otherwise, write it
      as <replaceable>proname(argtypename,argtypename,...)</replaceable>,
      like regprocedure.  The argument type names must be spelled exactly as
      they are in the <filename>pg_proc.dat</filename> entry's
      <structfield>proargtypes</structfield> field.  Do not insert any
      spaces.
     </para>
    </listitem>

    <listitem>
     <para>
      Operators are represented
      by <replaceable>oprname(lefttype,righttype)</replaceable>,
      writing the type names exactly as they appear in
      the <filename>pg_operator.dat</filename>
      entry's <structfield>oprleft</structfield>
      and <structfield>oprright</structfield> fields.
      (Write <literal>0</literal> for the omitted operand of a unary
      operator.)
     </para>
    </listitem>

    <listitem>
     <para>
      The names of opclasses and opfamilies are only unique within an
      access method, so they are represented
      by <replaceable>access_method_name</replaceable><literal>/</literal><replaceable>object_name</replaceable>.
     </para>
    </listitem>

    <listitem>
     <para>
      In none of these cases is there any provision for
      schema-qualification; all objects created during bootstrap are
      expected to be in the <literal>pg_catalog</literal> schema.
     </para>
    </listitem>
   </itemizedlist>

   <para>
    <filename>genbki.pl</filename> resolves all symbolic references while it
    runs, and puts simple numeric OIDs into the emitted BKI file.  There is
    therefore no need for the bootstrap backend to deal with symbolic
    references.
   </para>
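
   <para>
    The following C sketch imitates the lookup rules for a column that
    references <structname>pg_type</structname>.  It is not
    how <filename>genbki.pl</filename> is implemented (that is a Perl
    script); the table, the struct, and the function
    <function>resolve_type_reference()</function> are hypothetical and cover
    only three well-known types.  The <literal>optional</literal> argument
    plays the role of <literal>BKI_LOOKUP_OPT</literal>.
   </para>

<programlisting><![CDATA[
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t Oid;

typedef struct TypeEntry
{
    const char *typname;        /* must match pg_type.typname exactly */
    Oid         oid;
} TypeEntry;

/* A tiny excerpt of preassigned type OIDs, enough for illustration. */
static const TypeEntry type_entries[] = {
    {"bool", 16},
    {"int4", 23},
    {"text", 25},
};

/*
 * Resolve a symbolic reference to a numeric OID.  Returns false for an
 * unrecognized name.  "0" is accepted only when the column is optional.
 */
bool
resolve_type_reference(const char *value, bool optional, Oid *result)
{
    size_t      n = sizeof(type_entries) / sizeof(type_entries[0]);

    assert(value != NULL && result != NULL);

    if (optional && strcmp(value, "0") == 0)
    {
        *result = 0;
        return true;
    }
    for (size_t i = 0; i < n; i++)
    {
        if (strcmp(value, type_entries[i].typname) == 0)
        {
            *result = type_entries[i].oid;
            return true;
        }
    }
    return false;               /* aliases such as "integer" end up here */
}
```
]]></programlisting>

   <para>
    In this sketch <literal>int4</literal> resolves to 23,
    while <literal>integer</literal> is rejected because only the
    exact <structfield>typname</structfield> is recognized.
   </para>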

   <para>
    It's desirable to mark OID reference columns
    with <literal>BKI_LOOKUP</literal> or <literal>BKI_LOOKUP_OPT</literal>
    even if the catalog has no initial data that requires lookup.  This
    allows <filename>genbki.pl</filename> to record the foreign key
    relationships that exist in the system catalogs.  That information is
    used in the regression tests to check for incorrect entries.  See also
    the macros <literal>DECLARE_FOREIGN_KEY</literal>,
    <literal>DECLARE_FOREIGN_KEY_OPT</literal>,
    <literal>DECLARE_ARRAY_FOREIGN_KEY</literal>,
    and <literal>DECLARE_ARRAY_FOREIGN_KEY_OPT</literal>, which are
    used to declare foreign key relationships that are too complex
    for <literal>BKI_LOOKUP</literal> (typically, multi-column foreign
    keys).
   </para>
  </sect2>

  <sect2 id="system-catalog-auto-array-types">
   <title>Automatic Creation of Array Types</title>

   <para>
    Most scalar data types should have a corresponding array type (that is,
    a standard varlena array type whose element type is the scalar type, and
    which is referenced by the <structfield>typarray</structfield> field of
    the scalar type's <structname>pg_type</structname>
    entry).  <filename>genbki.pl</filename> is able to generate
    the <structname>pg_type</structname> entry for the array type
    automatically in most cases.
   </para>

   <para>
    To use this facility, just write an <literal>array_type_oid
    =&gt; <replaceable>nnnn</replaceable></literal> metadata field in the
    scalar type's <structname>pg_type</structname> entry, specifying the OID
    to use for the array type.  You may then omit
    the <structfield>typarray</structfield> field, since it will be filled
    automatically with that OID.
   </para>

   <para>
    The generated array type's name is the scalar type's name with an
    underscore prepended.  The array entry's other fields are filled from
    <literal>BKI_ARRAY_DEFAULT(<replaceable>value</replaceable>)</literal>
    annotations in <filename>pg_type.h</filename>, or if there isn't one,
    copied from the scalar type.  (There's also a special case
    for <structfield>typalign</structfield>.)  Then
    the <structfield>typelem</structfield>
    and <structfield>typarray</structfield> fields of the two entries are
    set to cross-reference each other.
   </para>
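
   <para>
    As a rough sketch of the naming and cross-referencing step only, the
    following C fragment builds an array entry from a scalar entry.  The
    struct <literal>TypeRow</literal> and the function
    <function>make_array_type()</function> are hypothetical, just a few
    fields are modeled, and the handling
    of <literal>BKI_ARRAY_DEFAULT</literal>
    and <structfield>typalign</structfield> is left out.
   </para>

<programlisting><![CDATA[
```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef uint32_t Oid;
#define NAMEDATALEN 64

/* A drastically reduced stand-in for a pg_type row. */
typedef struct TypeRow
{
    char        typname[NAMEDATALEN];
    Oid         oid;
    Oid         typelem;        /* element type, or 0 */
    Oid         typarray;       /* corresponding array type, or 0 */
} TypeRow;

/*
 * Build the array entry for a scalar type and make the two entries
 * cross-reference each other.
 */
TypeRow
make_array_type(TypeRow *scalar, Oid array_type_oid)
{
    TypeRow     array;

    assert(scalar != NULL);
    assert(strlen(scalar->typname) + 1 < NAMEDATALEN);

    memset(&array, 0, sizeof(array));
    /* The array type's name is the scalar name with "_" prepended. */
    snprintf(array.typname, sizeof(array.typname), "_%s", scalar->typname);
    array.oid = array_type_oid;
    array.typelem = scalar->oid;
    scalar->typarray = array_type_oid;
    return array;
}
```
]]></programlisting>

   <para>
    For example, a scalar entry named <literal>int4</literal> with OID 23
    and an <literal>array_type_oid</literal> of 1007 yields an array entry
    named <literal>_int4</literal>.
   </para>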
  </sect2>

  <sect2 id="system-catalog-recipes">
   <title>Recipes for Editing Data Files</title>

   <para>
    Here are some suggestions about the easiest ways to perform common tasks
    when updating catalog data files.
   </para>

   <formalpara>
    <title>Add a new column with a default to a catalog:</title>
    <para>
     Add the column to the header file with
     a <literal>BKI_DEFAULT(<replaceable>value</replaceable>)</literal>
     annotation.  The data file need only be adjusted by adding the field
     in existing rows where a non-default value is needed.
    </para>
   </formalpara>

   <formalpara>
    <title>Add a default value to an existing column that doesn't have
     one:</title>
    <para>
     Add a <literal>BKI_DEFAULT</literal> annotation to the header file,
     then run <literal>make reformat-dat-files</literal> to remove
     now-redundant field entries.
    </para>
   </formalpara>

   <formalpara>
    <title>Remove a column, whether it has a default or not:</title>
    <para>
     Remove the column from the header, then run <literal>make
     reformat-dat-files</literal> to remove now-useless field entries.
    </para>
   </formalpara>

   <formalpara>
    <title>Change or remove an existing default value:</title>
    <para>
     You cannot simply change the header file, since that will cause the
     current data to be interpreted incorrectly.  First run <literal>make
     expand-dat-files</literal> to rewrite the data files with all
     default values inserted explicitly, then change or remove
     the <literal>BKI_DEFAULT</literal> annotation, then run <literal>make
     reformat-dat-files</literal> to remove superfluous fields again.
    </para>
   </formalpara>
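
   <para>
    The sequence for changing or removing a default can be sketched as the
    following shell session, assuming the make targets are run
    from <filename>src/include/catalog</filename>:
<programlisting>
$ cd src/include/catalog
$ make expand-dat-files
# now edit the header file to change or remove the BKI_DEFAULT annotation
$ make reformat-dat-files
</programlisting>
   </para>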

   <formalpara>
    <title>Ad-hoc bulk editing:</title>
    <para>
     <filename>reformat_dat_file.pl</filename> can be adapted to perform
     many kinds of bulk changes.  Look for its block comments showing where
     one-off code can be inserted.  In the following example, we are going
     to consolidate two Boolean fields in <structname>pg_proc</structname>
     into a char field:

     <orderedlist>
      <listitem>
       <para>
        Add the new column, with a default,
        to <filename>pg_proc.h</filename>:
<programlisting>
+    /* see PROKIND_ categories below */
+    char        prokind BKI_DEFAULT(f);
</programlisting>
       </para>
      </listitem>

      <listitem>
       <para>
        Create a new script based on <filename>reformat_dat_file.pl</filename>
        to insert appropriate values on the fly:
<programlisting>
-           # At this point we have the full row in memory as a hash
-           # and can do any operations we want. As written, it only
-           # removes default values, but this script can be adapted to
-           # do one-off bulk-editing.
+           # One-off change to migrate to prokind
+           # Default has already been filled in by now, so change to other
+           # values as appropriate
+           if ($values{proisagg} eq 't')
+           {
+               $values{prokind} = 'a';
+           }
+           elsif ($values{proiswindow} eq 't')
+           {
+               $values{prokind} = 'w';
+           }
</programlisting>
       </para>
      </listitem>

      <listitem>
       <para>
        Run the new script:
<programlisting>
$ cd src/include/catalog
$ perl rewrite_dat_with_prokind.pl pg_proc.dat
</programlisting>
        At this point <filename>pg_proc.dat</filename> has all three
        columns, <structfield>prokind</structfield>,
        <structfield>proisagg</structfield>,
        and <structfield>proiswindow</structfield>, though they will appear
        only in rows where they have non-default values.
       </para>
      </listitem>

      <listitem>
       <para>
        Remove the old columns from <filename>pg_proc.h</filename>:
<programlisting>
-    /* is it an aggregate? */
-    bool        proisagg BKI_DEFAULT(f);
-
-    /* is it a window function? */
-    bool        proiswindow BKI_DEFAULT(f);
</programlisting>
       </para>
      </listitem>

      <listitem>
       <para>
        Finally, run <literal>make reformat-dat-files</literal> to remove
        the useless old entries from <filename>pg_proc.dat</filename>.
       </para>
      </listitem>
     </orderedlist>

     For further examples of scripts used for bulk editing, see
     <filename>convert_oid2name.pl</filename>
     and <filename>remove_pg_type_oid_symbols.pl</filename> attached to this
     message:
     <ulink url="https://www.postgresql.org/message-id/CAJVSVGVX8gXnPm+Xa=DxR7kFYprcQ1tNcCT5D0O3ShfnM6jehA@mail.gmail.com"></ulink>
    </para>
   </formalpara>
  </sect2>
 </sect1>

 <sect1 id="bki-format">
  <title><acronym>BKI</acronym> File Format</title>

  <para>
   This section describes how the <productname>PostgreSQL</productname>
   backend interprets <acronym>BKI</acronym> files.  This description
   will be easier to understand if the <filename>postgres.bki</filename>
   file is at hand as an example.
  </para>

  <para>
   <acronym>BKI</acronym> input consists of a sequence of commands.  Commands are made up
   of a number of tokens, depending on the syntax of the command.
   Tokens are usually separated by whitespace, but need not be if
   there is no ambiguity.  There is no special command separator; the
   next token that syntactically cannot belong to the preceding
   command starts a new one.  (Usually you would put a new command on
   a new line, for clarity.)  Tokens can be certain key words, special
   characters (parentheses, commas, etc.), identifiers, numbers, or
   single-quoted strings.  Everything is case sensitive.
  </para>

  <para>
   Lines starting with <literal>#</literal> are ignored.
  </para>
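
  <para>
   As a small illustration of these rules, the following input uses a
   hypothetical table <literal>test_table</literal> with made-up OIDs.  The
   first line is ignored, and the three commands on the last line should be
   recognized as separate commands even though nothing but whitespace
   separates them:
<programlisting>
# Lines like this one are ignored.
create test_table 420 (oid = oid, cola = int4, colb = text)
open test_table insert ( 421 1 'value 1' ) close test_table
</programlisting>
   Writing each command on its own line remains the clearer style.
  </para>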

 </sect1>

 <sect1 id="bki-commands">
  <title><acronym>BKI</acronym> Commands</title>

  <variablelist>
   <varlistentry>
    <term>
     <literal>create</literal>
     <replaceable class="parameter">tablename</replaceable>
     <replaceable class="parameter">tableoid</replaceable>
     <optional><literal>bootstrap</literal></optional>
     <optional><literal>shared_relation</literal></optional>
     <optional><literal>rowtype_oid</literal> <replaceable>oid</replaceable></optional>
     (<replaceable class="parameter">name1</replaceable> =
     <replaceable class="parameter">type1</replaceable>
     <optional><literal>FORCE NOT NULL</literal> | <literal>FORCE NULL</literal> </optional> <optional>,
     <replaceable class="parameter">name2</replaceable> =
     <replaceable class="parameter">type2</replaceable>
     <optional><literal>FORCE NOT NULL</literal> | <literal>FORCE NULL</literal> </optional>,
     ...</optional>)
    </term>

    <listitem>
     <para>
      Create a table named <replaceable
      class="parameter">tablename</replaceable>, and having the OID
      <replaceable class="parameter">tableoid</replaceable>,
      with the columns given in parentheses.
     </para>

     <para>
      The following column types are supported directly by
      <filename>bootstrap.c</filename>: <type>bool</type>,
      <type>bytea</type>, <type>char</type> (1 byte),
      <type>name</type>, <type>int2</type>,
      <type>int4</type>, <type>regproc</type>, <type>regclass</type>,
      <type>regtype</type>, <type>text</type>,
      <type>oid</type>, <type>tid</type>, <type>xid</type>,
      <type>cid</type>, <type>int2vector</type>, <type>oidvector</type>,
      <type>_int4</type> (array), <type>_text</type> (array),
      <type>_oid</type> (array), <type>_char</type> (array),
      <type>_aclitem</type> (array).  Although it is possible to create
      tables containing columns of other types, this cannot be done until
      after <structname>pg_type</structname> has been created and filled with
      appropriate entries.  (That effectively means that only these
      column types can be used in bootstrap catalogs, but non-bootstrap
      catalogs can contain any built-in type.)
     </para>

     <para>
      When <literal>bootstrap</literal> is specified,
      the table will only be created on disk; nothing is entered into
      <structname>pg_class</structname>,
      <structname>pg_attribute</structname>, etc., for it.  Thus the
      table will not be accessible by ordinary SQL operations until
      such entries are made the hard way (with <literal>insert</literal>
      commands).  This option is used for creating
      <structname>pg_class</structname> etc. themselves.
     </para>

     <para>
      The table is created as shared if <literal>shared_relation</literal> is
      specified.
      The table's row type OID (<structname>pg_type</structname> OID) can optionally
      be specified via the <literal>rowtype_oid</literal> clause; if not specified,
      an OID is automatically generated for it.  (The <literal>rowtype_oid</literal>
      clause is useless if <literal>bootstrap</literal> is specified, but it can be
      provided anyway for documentation.)
     </para>
    </listitem>
   </varlistentry>

   <varlistentry>
    <term>
     <literal>open</literal> <replaceable class="parameter">tablename</replaceable>
    </term>

    <listitem>
     <para>
      Open the table named
      <replaceable class="parameter">tablename</replaceable>
      for insertion of data.  Any currently open table is closed.
     </para>
    </listitem>
   </varlistentry>

   <varlistentry>
    <term>
     <literal>close</literal> <replaceable class="parameter">tablename</replaceable>
    </term>

    <listitem>
     <para>
      Close the open table.  The name of the table must be given as a
      cross-check.
     </para>
    </listitem>
   </varlistentry>

   <varlistentry>
    <term>
     <literal>insert</literal> <literal>(</literal> <optional><replaceable class="parameter">oid_value</replaceable></optional> <replaceable class="parameter">value1</replaceable> <replaceable class="parameter">value2</replaceable> ... <literal>)</literal>
    </term>

    <listitem>
     <para>
      Insert a new row into the open table using <replaceable
      class="parameter">value1</replaceable>, <replaceable
      class="parameter">value2</replaceable>, etc., for its column
      values.
     </para>

     <para>
      NULL values can be specified using the special key word
      <literal>_null_</literal>.  Values that do not look like
      identifiers or digit strings must be single-quoted.
      (To include a single quote in a value, write it twice.
      Escape-string-style backslash escapes are allowed in the string, too.)
     </para>
    </listitem>
   </varlistentry>

   <varlistentry>
    <term>
     <literal>declare</literal> <optional><literal>unique</literal></optional>
     <literal>index</literal> <replaceable class="parameter">indexname</replaceable>
     <replaceable class="parameter">indexoid</replaceable>
     <literal>on</literal> <replaceable class="parameter">tablename</replaceable>
     <literal>using</literal> <replaceable class="parameter">amname</replaceable>
     <literal>(</literal> <replaceable class="parameter">name1</replaceable>
     <replaceable class="parameter">opclass1</replaceable>
     <optional>, ...</optional> <literal>)</literal>
    </term>

    <listitem>
     <para>
      Create an index named <replaceable
      class="parameter">indexname</replaceable>, having OID
      <replaceable class="parameter">indexoid</replaceable>,
      on the table named
      <replaceable class="parameter">tablename</replaceable>, using the
      <replaceable class="parameter">amname</replaceable> access
      method.  The fields to index are called <replaceable
      class="parameter">name1</replaceable>, <replaceable
      class="parameter">name2</replaceable> etc., and the operator
      classes to use are <replaceable
      class="parameter">opclass1</replaceable>, <replaceable
      class="parameter">opclass2</replaceable> etc., respectively.
      The index file is created and appropriate catalog entries are
      made for it, but the index contents are not initialized by this command.
     </para>
    </listitem>
   </varlistentry>

   <varlistentry>
    <term>
     <literal>declare toast</literal>
     <replaceable class="parameter">toasttableoid</replaceable>
     <replaceable class="parameter">toastindexoid</replaceable>
     <literal>on</literal> <replaceable class="parameter">tablename</replaceable>
    </term>

    <listitem>
     <para>
      Create a TOAST table for the table named
      <replaceable class="parameter">tablename</replaceable>.
      The TOAST table is assigned OID
      <replaceable class="parameter">toasttableoid</replaceable>
      and its index is assigned OID
      <replaceable class="parameter">toastindexoid</replaceable>.
      As with <literal>declare index</literal>, filling of the index
      is postponed.
     </para>
    </listitem>
   </varlistentry>

   <varlistentry>
    <term><literal>build indices</literal></term>

    <listitem>
     <para>
      Fill in the indices that have previously been declared.
     </para>
    </listitem>
   </varlistentry>
  </variablelist>
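
  <para>
   The following sketch shows how these commands might be combined for a
   single table.  The table name, the index name, and all OIDs are made up
   for illustration.  Each index column is written as a column name followed
   by an operator class, which is the form that appears in
   generated <filename>postgres.bki</filename> files:
<programlisting>
create test_table 420 (oid = oid, cola = int4, colb = text)
open test_table
insert ( 421 1 'it''s quoted' )
insert ( 422 2 _null_ )
close test_table
declare unique index test_table_oid_index 423 on test_table using btree(oid oid_ops)
declare toast 424 425 on test_table
build indices
</programlisting>
   The first <literal>insert</literal> shows a single quote doubled inside a
   quoted value, and the second uses <literal>_null_</literal> for a NULL
   column value.
  </para>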

 </sect1>

 <sect1 id="bki-structure">
  <title>Structure of the Bootstrap <acronym>BKI</acronym> File</title>

  <para>
   The <literal>open</literal> command cannot be used until the catalogs it
   relies on exist and contain entries describing the table that is to be opened.
   (These minimum tables are <structname>pg_class</structname>,
   <structname>pg_attribute</structname>, <structname>pg_proc</structname>, and
   <structname>pg_type</structname>.)   To allow those tables themselves to be filled,
   <literal>create</literal> with the <literal>bootstrap</literal> option implicitly opens
   the created table for data insertion.
  </para>

  <para>
   Also, the <literal>declare index</literal> and <literal>declare toast</literal>
   commands cannot be used until the system catalogs they need have been
   created and filled in.
  </para>

  <para>
   Thus, the structure of the <filename>postgres.bki</filename> file has to
   be:
   <orderedlist>
    <listitem>
     <para>
      <literal>create bootstrap</literal> one of the critical tables
     </para>
    </listitem>
    <listitem>
     <para>
      <literal>insert</literal> data describing at least the critical tables
     </para>
    </listitem>
    <listitem>
     <para>
      <literal>close</literal>
     </para>
    </listitem>
    <listitem>
     <para>
      Repeat for the other critical tables.
     </para>
    </listitem>
    <listitem>
     <para>
      <literal>create</literal> (without <literal>bootstrap</literal>) a noncritical table
     </para>
    </listitem>
    <listitem>
     <para>
      <literal>open</literal>
     </para>
    </listitem>
    <listitem>
     <para>
      <literal>insert</literal> desired data
     </para>
    </listitem>
    <listitem>
     <para>
      <literal>close</literal>
     </para>
    </listitem>
    <listitem>
     <para>
      Repeat for the other noncritical tables.
     </para>
    </listitem>
    <listitem>
     <para>
      Define indexes and TOAST tables.
     </para>
    </listitem>
    <listitem>
     <para>
      <literal>build indices</literal>
     </para>
    </listitem>
   </orderedlist>
  </para>
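
  <para>
   An abridged outline of that structure is sketched below.  The
   <literal>...</literal> ellipses stand for omitted columns and values and
   are not valid <acronym>BKI</acronym>.  The catalog names and OIDs are
   meant to match the standard catalogs but may differ between versions:
<programlisting>
# 1. bootstrap-mode creation of a critical catalog (implicitly opened)
create pg_proc 1255 bootstrap rowtype_oid 81
 (
 oid = oid ,
 proname = name ,
 ...
 )
insert ( ... )
close pg_proc
# ... likewise for pg_type, pg_attribute and pg_class ...

# 2. an ordinary catalog must be opened explicitly
create pg_aggregate 2600
 (
 aggfnoid = regproc ,
 ...
 )
open pg_aggregate
insert ( ... )
close pg_aggregate

# 3. TOAST tables and indexes are declared last, then the indexes are built
declare toast 2836 2837 on pg_proc
declare unique index pg_aggregate_fnoid_index 2650 on pg_aggregate using btree(aggfnoid oid_ops)
build indices
</programlisting>
  </para>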

  <para>
   There are doubtless other, undocumented ordering dependencies.
  </para>
 </sect1>

 <sect1 id="bki-example">
  <title><acronym>BKI</acronym> Example</title>

  <para>
   The following sequence of commands will create the table
   <literal>test_table</literal> with OID 420, having three columns
   <literal>oid</literal>, <literal>cola</literal> and <literal>colb</literal>
   of type <type>oid</type>, <type>int4</type> and <type>text</type>,
   respectively, and insert two rows into the table:
<programlisting>
create test_table 420 (oid = oid, cola = int4, colb = text)
open test_table
insert ( 421 1 'value 1' )
insert ( 422 2 _null_ )
close test_table
</programlisting>
  </para>
 </sect1>
</chapter>