summaryrefslogtreecommitdiffstats
path: root/src/encoding/gob/doc.go
diff options
context:
space:
mode:
Diffstat (limited to 'src/encoding/gob/doc.go')
-rw-r--r--src/encoding/gob/doc.go423
1 files changed, 423 insertions, 0 deletions
diff --git a/src/encoding/gob/doc.go b/src/encoding/gob/doc.go
new file mode 100644
index 0000000..3f26ed8
--- /dev/null
+++ b/src/encoding/gob/doc.go
@@ -0,0 +1,423 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+/*
+Package gob manages streams of gobs - binary values exchanged between an
+[Encoder] (transmitter) and a [Decoder] (receiver). A typical use is transporting
+arguments and results of remote procedure calls (RPCs) such as those provided by
+[net/rpc].
+
+The implementation compiles a custom codec for each data type in the stream and
+is most efficient when a single [Encoder] is used to transmit a stream of values,
+amortizing the cost of compilation.
+
+# Basics
+
+A stream of gobs is self-describing. Each data item in the stream is preceded by
+a specification of its type, expressed in terms of a small set of predefined
+types. Pointers are not transmitted, but the things they point to are
+transmitted; that is, the values are flattened. Nil pointers are not permitted,
+as they have no value. Recursive types work fine, but
+recursive values (data with cycles) are problematic. This may change.
+
+To use gobs, create an [Encoder] and present it with a series of data items as
+values or addresses that can be dereferenced to values. The [Encoder] makes sure
+all type information is sent before it is needed. At the receive side, a
+[Decoder] retrieves values from the encoded stream and unpacks them into local
+variables.
+
+# Types and Values
+
+The source and destination values/types need not correspond exactly. For structs,
+fields (identified by name) that are in the source but absent from the receiving
+variable will be ignored. Fields that are in the receiving variable but missing
+from the transmitted type or value will be ignored in the destination. If a field
+with the same name is present in both, their types must be compatible. Both the
+receiver and transmitter will do all necessary indirection and dereferencing to
+convert between gobs and actual Go values. For instance, a gob type that is
+schematically,
+
+ struct { A, B int }
+
+can be sent from or received into any of these Go types:
+
+ struct { A, B int } // the same
+ *struct { A, B int } // extra indirection of the struct
+ struct { *A, **B int } // extra indirection of the fields
+ struct { A, B int64 } // different concrete value type; see below
+
+It may also be received into any of these:
+
+ struct { A, B int } // the same
+ struct { B, A int } // ordering doesn't matter; matching is by name
+ struct { A, B, C int } // extra field (C) ignored
+ struct { B int } // missing field (A) ignored; data will be dropped
+ struct { B, C int } // missing field (A) ignored; extra field (C) ignored.
+
+Attempting to receive into these types will draw a decode error:
+
+ struct { A int; B uint } // change of signedness for B
+ struct { A int; B float } // change of type for B
+ struct { } // no field names in common
+ struct { C, D int } // no field names in common
+
+Integers are transmitted two ways: arbitrary precision signed integers or
+arbitrary precision unsigned integers. There is no int8, int16 etc.
+discrimination in the gob format; there are only signed and unsigned integers. As
+described below, the transmitter sends the value in a variable-length encoding;
+the receiver accepts the value and stores it in the destination variable.
+Floating-point numbers are always sent using IEEE-754 64-bit precision (see
+below).
+
+Signed integers may be received into any signed integer variable: int, int16, etc.;
+unsigned integers may be received into any unsigned integer variable; and floating
+point values may be received into any floating point variable. However,
+the destination variable must be able to represent the value or the decode
+operation will fail.
+
+Structs, arrays and slices are also supported. Structs encode and decode only
+exported fields. Strings and arrays of bytes are supported with a special,
+efficient representation (see below). When a slice is decoded, if the existing
+slice has capacity the slice will be extended in place; if not, a new array is
+allocated. Regardless, the length of the resulting slice reports the number of
+elements decoded.
+
+In general, if allocation is required, the decoder will allocate memory. If not,
+it will update the destination variables with values read from the stream. It does
+not initialize them first, so if the destination is a compound value such as a
+map, struct, or slice, the decoded values will be merged elementwise into the
+existing variables.
+
+Functions and channels will not be sent in a gob. Attempting to encode such a value
+at the top level will fail. A struct field of chan or func type is treated exactly
+like an unexported field and is ignored.
+
+Gob can encode a value of any type implementing the [GobEncoder] or
+[encoding.BinaryMarshaler] interfaces by calling the corresponding method,
+in that order of preference.
+
+Gob can decode a value of any type implementing the [GobDecoder] or
+[encoding.BinaryUnmarshaler] interfaces by calling the corresponding method,
+again in that order of preference.
+
+# Encoding Details
+
+This section documents the encoding, details that are not important for most
+users. Details are presented bottom-up.
+
+An unsigned integer is sent one of two ways. If it is less than 128, it is sent
+as a byte with that value. Otherwise it is sent as a minimal-length big-endian
+(high byte first) byte stream holding the value, preceded by one byte holding the
+byte count, negated. Thus 0 is transmitted as (00), 7 is transmitted as (07) and
+256 is transmitted as (FE 01 00).
+
+A boolean is encoded within an unsigned integer: 0 for false, 1 for true.
+
+A signed integer, i, is encoded within an unsigned integer, u. Within u, bits 1
+upward contain the value; bit 0 says whether they should be complemented upon
+receipt. The encode algorithm looks like this:
+
+ var u uint
+ if i < 0 {
+ u = (^uint(i) << 1) | 1 // complement i, bit 0 is 1
+ } else {
+ u = (uint(i) << 1) // do not complement i, bit 0 is 0
+ }
+ encodeUnsigned(u)
+
+The low bit is therefore analogous to a sign bit, but making it the complement bit
+instead guarantees that the largest negative integer is not a special case. For
+example, -129=^128=(^256>>1) encodes as (FE 01 01).
+
+Floating-point numbers are always sent as a representation of a float64 value.
+That value is converted to a uint64 using [math.Float64bits]. The uint64 is then
+byte-reversed and sent as a regular unsigned integer. The byte-reversal means the
+exponent and high-precision part of the mantissa go first. Since the low bits are
+often zero, this can save encoding bytes. For instance, 17.0 is encoded in only
+three bytes (FE 31 40).
+
+Strings and slices of bytes are sent as an unsigned count followed by that many
+uninterpreted bytes of the value.
+
+All other slices and arrays are sent as an unsigned count followed by that many
+elements using the standard gob encoding for their type, recursively.
+
+Maps are sent as an unsigned count followed by that many key, element
+pairs. Empty but non-nil maps are sent, so if the receiver has not allocated
+one already, one will always be allocated on receipt unless the transmitted map
+is nil and not at the top level.
+
+In slices and arrays, as well as maps, all elements, even zero-valued elements,
+are transmitted, even if all the elements are zero.
+
+Structs are sent as a sequence of (field number, field value) pairs. The field
+value is sent using the standard gob encoding for its type, recursively. If a
+field has the zero value for its type (except for arrays; see above), it is omitted
+from the transmission. The field number is defined by the type of the encoded
+struct: the first field of the encoded type is field 0, the second is field 1,
+etc. When encoding a value, the field numbers are delta encoded for efficiency
+and the fields are always sent in order of increasing field number; the deltas are
+therefore unsigned. The initialization for the delta encoding sets the field
+number to -1, so an unsigned integer field 0 with value 7 is transmitted as unsigned
+delta = 1, unsigned value = 7 or (01 07). Finally, after all the fields have been
+sent a terminating mark denotes the end of the struct. That mark is a delta=0
+value, which has representation (00).
+
+Interface types are not checked for compatibility; all interface types are
+treated, for transmission, as members of a single "interface" type, analogous to
+int or []byte - in effect they're all treated as interface{}. Interface values
+are transmitted as a string identifying the concrete type being sent (a name
+that must be pre-defined by calling [Register]), followed by a byte count of the
+length of the following data (so the value can be skipped if it cannot be
+stored), followed by the usual encoding of concrete (dynamic) value stored in
+the interface value. (A nil interface value is identified by the empty string
+and transmits no value.) Upon receipt, the decoder verifies that the unpacked
+concrete item satisfies the interface of the receiving variable.
+
+If a value is passed to [Encoder.Encode] and the type is not a struct (or pointer to struct,
+etc.), for simplicity of processing it is represented as a struct of one field.
+The only visible effect of this is to encode a zero byte after the value, just as
+after the last field of an encoded struct, so that the decode algorithm knows when
+the top-level value is complete.
+
+The representation of types is described below. When a type is defined on a given
+connection between an [Encoder] and [Decoder], it is assigned a signed integer type
+id. When [Encoder.Encode](v) is called, it makes sure there is an id assigned for
+the type of v and all its elements and then it sends the pair (typeid, encoded-v)
+where typeid is the type id of the encoded type of v and encoded-v is the gob
+encoding of the value v.
+
+To define a type, the encoder chooses an unused, positive type id and sends the
+pair (-type id, encoded-type) where encoded-type is the gob encoding of a wireType
+description, constructed from these types:
+
+ type wireType struct {
+ ArrayT *ArrayType
+ SliceT *SliceType
+ StructT *StructType
+ MapT *MapType
+ GobEncoderT *gobEncoderType
+ BinaryMarshalerT *gobEncoderType
+ TextMarshalerT *gobEncoderType
+
+ }
+ type arrayType struct {
+ CommonType
+ Elem typeId
+ Len int
+ }
+ type CommonType struct {
+ Name string // the name of the struct type
+ Id int // the id of the type, repeated so it's inside the type
+ }
+ type sliceType struct {
+ CommonType
+ Elem typeId
+ }
+ type structType struct {
+ CommonType
+ Field []*fieldType // the fields of the struct.
+ }
+ type fieldType struct {
+ Name string // the name of the field.
+ Id int // the type id of the field, which must be already defined
+ }
+ type mapType struct {
+ CommonType
+ Key typeId
+ Elem typeId
+ }
+ type gobEncoderType struct {
+ CommonType
+ }
+
+If there are nested type ids, the types for all inner type ids must be defined
+before the top-level type id is used to describe an encoded-v.
+
+For simplicity in setup, the connection is defined to understand these types a
+priori, as well as the basic gob types int, uint, etc. Their ids are:
+
+ bool 1
+ int 2
+ uint 3
+ float 4
+ []byte 5
+ string 6
+ complex 7
+ interface 8
+ // gap for reserved ids.
+ WireType 16
+ ArrayType 17
+ CommonType 18
+ SliceType 19
+ StructType 20
+ FieldType 21
+ // 22 is slice of fieldType.
+ MapType 23
+
+Finally, each message created by a call to Encode is preceded by an encoded
+unsigned integer count of the number of bytes remaining in the message. After
+the initial type name, interface values are wrapped the same way; in effect, the
+interface value acts like a recursive invocation of Encode.
+
+In summary, a gob stream looks like
+
+ (byteCount (-type id, encoding of a wireType)* (type id, encoding of a value))*
+
+where * signifies zero or more repetitions and the type id of a value must
+be predefined or be defined before the value in the stream.
+
+Compatibility: Any future changes to the package will endeavor to maintain
+compatibility with streams encoded using previous versions. That is, any released
+version of this package should be able to decode data written with any previously
+released version, subject to issues such as security fixes. See the Go compatibility
+document for background: https://golang.org/doc/go1compat
+
+See "Gobs of data" for a design discussion of the gob wire format:
+https://blog.golang.org/gobs-of-data
+
+# Security
+
+This package is not designed to be hardened against adversarial inputs, and is
+outside the scope of https://go.dev/security/policy. In particular, the [Decoder]
+does only basic sanity checking on decoded input sizes, and its limits are not
+configurable. Care should be taken when decoding gob data from untrusted
+sources, which may consume significant resources.
+*/
+package gob
+
+/*
+Grammar:
+
+Tokens starting with a lower case letter are terminals; int(n)
+and uint(n) represent the signed/unsigned encodings of the value n.
+
+GobStream:
+ DelimitedMessage*
+DelimitedMessage:
+ uint(lengthOfMessage) Message
+Message:
+ TypeSequence TypedValue
+TypeSequence
+ (TypeDefinition DelimitedTypeDefinition*)?
+DelimitedTypeDefinition:
+ uint(lengthOfTypeDefinition) TypeDefinition
+TypedValue:
+ int(typeId) Value
+TypeDefinition:
+ int(-typeId) encodingOfWireType
+Value:
+ SingletonValue | StructValue
+SingletonValue:
+ uint(0) FieldValue
+FieldValue:
+ builtinValue | ArrayValue | MapValue | SliceValue | StructValue | InterfaceValue
+InterfaceValue:
+ NilInterfaceValue | NonNilInterfaceValue
+NilInterfaceValue:
+ uint(0)
+NonNilInterfaceValue:
+ ConcreteTypeName TypeSequence InterfaceContents
+ConcreteTypeName:
+ uint(lengthOfName) [already read=n] name
+InterfaceContents:
+ int(concreteTypeId) DelimitedValue
+DelimitedValue:
+ uint(length) Value
+ArrayValue:
+ uint(n) FieldValue*n [n elements]
+MapValue:
+ uint(n) (FieldValue FieldValue)*n [n (key, value) pairs]
+SliceValue:
+ uint(n) FieldValue*n [n elements]
+StructValue:
+ (uint(fieldDelta) FieldValue)*
+*/
+
+/*
+For implementers and the curious, here is an encoded example. Given
+ type Point struct {X, Y int}
+and the value
+ p := Point{22, 33}
+the bytes transmitted that encode p will be:
+ 1f ff 81 03 01 01 05 50 6f 69 6e 74 01 ff 82 00
+ 01 02 01 01 58 01 04 00 01 01 59 01 04 00 00 00
+ 07 ff 82 01 2c 01 42 00
+They are determined as follows.
+
+Since this is the first transmission of type Point, the type descriptor
+for Point itself must be sent before the value. This is the first type
+we've sent on this Encoder, so it has type id 65 (0 through 64 are
+reserved).
+
+ 1f // This item (a type descriptor) is 31 bytes long.
+ ff 81 // The negative of the id for the type we're defining, -65.
+ // This is one byte (indicated by FF = -1) followed by
+ // ^-65<<1 | 1. The low 1 bit signals to complement the
+ // rest upon receipt.
+
+ // Now we send a type descriptor, which is itself a struct (wireType).
+ // The type of wireType itself is known (it's built in, as is the type of
+ // all its components), so we just need to send a *value* of type wireType
+ // that represents type "Point".
+ // Here starts the encoding of that value.
+ // Set the field number implicitly to -1; this is done at the beginning
+ // of every struct, including nested structs.
+ 03 // Add 3 to field number; now 2 (wireType.structType; this is a struct).
+ // structType starts with an embedded CommonType, which appears
+ // as a regular structure here too.
+ 01 // add 1 to field number (now 0); start of embedded CommonType.
+ 01 // add 1 to field number (now 0, the name of the type)
+ 05 // string is (unsigned) 5 bytes long
+ 50 6f 69 6e 74 // wireType.structType.CommonType.name = "Point"
+ 01 // add 1 to field number (now 1, the id of the type)
+ ff 82 // wireType.structType.CommonType._id = 65
+ 00 // end of embedded wiretype.structType.CommonType struct
+ 01 // add 1 to field number (now 1, the field array in wireType.structType)
+ 02 // There are two fields in the type (len(structType.field))
+ 01 // Start of first field structure; add 1 to get field number 0: field[0].name
+ 01 // 1 byte
+ 58 // structType.field[0].name = "X"
+ 01 // Add 1 to get field number 1: field[0].id
+ 04 // structType.field[0].typeId is 2 (signed int).
+ 00 // End of structType.field[0]; start structType.field[1]; set field number to -1.
+ 01 // Add 1 to get field number 0: field[1].name
+ 01 // 1 byte
+ 59 // structType.field[1].name = "Y"
+ 01 // Add 1 to get field number 1: field[1].id
+ 04 // struct.Type.field[1].typeId is 2 (signed int).
+ 00 // End of structType.field[1]; end of structType.field.
+ 00 // end of wireType.structType structure
+ 00 // end of wireType structure
+
+Now we can send the Point value. Again the field number resets to -1:
+
+ 07 // this value is 7 bytes long
+ ff 82 // the type number, 65 (1 byte (-FF) followed by 65<<1)
+ 01 // add one to field number, yielding field 0
+ 2c // encoding of signed "22" (0x2c = 44 = 22<<1); Point.x = 22
+ 01 // add one to field number, yielding field 1
+ 42 // encoding of signed "33" (0x42 = 66 = 33<<1); Point.y = 33
+ 00 // end of structure
+
+The type encoding is long and fairly intricate but we send it only once.
+If p is transmitted a second time, the type is already known so the
+output will be just:
+
+ 07 ff 82 01 2c 01 42 00
+
+A single non-struct value at top level is transmitted like a field with
+delta tag 0. For instance, a signed integer with value 3 presented as
+the argument to Encode will emit:
+
+ 03 04 00 06
+
+Which represents:
+
+ 03 // this value is 3 bytes long
+ 04 // the type number, 2, represents an integer
+ 00 // tag delta 0
+ 06 // value 3
+
+*/