1 files changed, 254 insertions, 0 deletions
diff --git a/implementation-notes/MILTER b/implementation-notes/MILTER
new file mode 100644
index 0000000..f67fb90
--- /dev/null
+++ b/implementation-notes/MILTER
@@ -0,0 +1,254 @@
+Distribution of Milter responsibility
+=====================================
+
+Milters look at the SMTP commands as well as the message content.
+In Postfix these are handled by different processes:
+
+- smtpd(8) (the SMTP server) focuses on the SMTP commands, strips
+  the SMTP encapsulation, and passes envelope information and message
+  content to the cleanup server.
+
+- the cleanup(8) server parses the message content (it understands
+  headers, body, and MIME structure), and creates a queue file with
+  envelope and content information. The cleanup server adds additional
+  envelope records, such as when to send a "delayed mail" notice.
+
+If we want to support message modifications (add/delete recipient,
+add/delete/replace header, replace body) then it pretty much has
+to be implemented in the cleanup server, if we want to avoid extra
+temporary files.
+
+Network versus local submission
+===============================
+
+As of Sendmail 8.12, all mail is received via SMTP, so all mail is
+subject to Miltering (local submissions are queued in a submission
+queue and then delivered via SMTP to the main MTA, or appended to
+$HOME/dead.letter). In Postfix, local submissions are received by
+the pickup server, which feeds the mail into the cleanup server
+after doing basic sanity checks.
+
+How do we set up the Milters with SMTP mail versus local submissions?
+
+- SMTP mail: smtpd creates Milter contexts, and sends them, including
+  their sockets, to the cleanup server. The smtpd is responsible
+  for sending the Milter abort and close messages. Both smtpd and
+  cleanup are responsible for closing their Milter socket. Since
+  smtpd and cleanup inspect mail at different times, there is no
+  conflict with access to the Milter socket.
+
+- Local submission: the cleanup server creates Milter contexts.
+  The cleanup server provides dummy connect and helo information,
+  or perhaps none at all, and provides sender and recipient events.
+  The cleanup server is responsible for sending the Milter abort
+  and close messages, and for closing the Milter socket.
+
+A special case of local submission is "sendmail -t". This creates
+a record stream in which recipients appear after content. However,
+Milters expect to receive envelope information before content, not
+after.  This is not a problem: just like a queue manager, the
+cleanup-side Milter client can jump around through the queue file
+and send the information to the Milter in the expected order.
+
+Interaction with XCLIENT, "postsuper -r", and external content filters
+======================================================================
+
+Milter applications expect that the MTA supplies context information
+in the form of Sendmail-like macros (j=hostname, {client_name}=the
+SMTP client hostname, etc.). Not all these macros have a Postfix
+equivalent. Postfix 2.3 makes a subset available.
+
+If Postfix does not implement a specific macro, people can usually
+work around it. But we should avoid inconsistency. If Postfix can
+make macro X available at Milter protocol stage Y, then it must
+also be able to make that macro available at all later Milter
+protocol stages, even when some of those stages are handled by a
+different Postfix process.
+
+Thus, when adding Milter support for a specific Sendmail-like macro    
+to the SMTP server:
+
+- We may have to update the XCLIENT protocol, so that Milter
+  applications can be tested with XCLIENT. If not, then we must
+  prominently document everywhere that XCLIENT does not provide
+  100% accurate simulation for Milters. An additional complication
+  is that the SMTP command length is limited, and that each XCLIENT
+  command resets the SMTP server to the 220 stage and generates
+  "connect" events for anvil(8) and for Milters.
+
+- The SMTP server has to send the corresponding attribute to the
+  cleanup server.  The cleanup server then stores the attribute in
+  the queue file, so that Milters produce consistent results when
+  mail is re-queued with "postsuper -r".
+
+But wait, there is more. If mail is filtered by an external content
+filter, then it needs to preserve all the Milter attributes so that
+after "postsuper -r", Milters produce the exact same result as when
+mail was received originally by Postfix. Specifically, after
+"postsuper -r" a signing Milter must not sign mail that it did not
+sign on the first pass through Postfix, and it must not reject mail
+that it accepted on the first pass through Postfix.
+
+Instead of trying to re-create the Milter execution environment
+after "postsuper -r" we simply disable Milter processing. The
+rationale for this is: if mail was Miltered before it was written
+to queue file, then there is no need to Milter it again.
+
+We might want to take a similar approach with external (signing or
+blocking) content filters: don't filter mail that has already been
+filtered, and don't filter mail that didn't need to be filtered.
+Such mail can be recognized by the absence of a "content_filter"
+record. To make the implementation efficient, the cleanup server
+would have to record the presence of a "content_filter" record in
+the queue file header.
+
+Message envelope or content modifications
+=========================================
+
+Milters can send modification requests after receiving the end of
+the message body.  If we can implement all the header/body-related
+Milter operations in the cleanup server, then we can try to edit
+the queue file in place, without ever having to make a temporary
+copy. Once a Milter is done editing, the queue file can be used as
+input for the next Milter, and so on. Finally, the cleanup server
+calls fsync() and waits for successful return.
+
+To implement in-place queue file edits, we need to introduce
+surprisingly little change to the existing Postfix queue file
+structure.  All we need is a way to specify a jump from one place
+in the file to another.
+
+Postfix does not store queue files as plain text files. Instead all
+information is stored in records with an explicit type and length
+for sender, recipient, arrival time, and so on.  Even the content
+that makes up the message header and body is stored as records with
+an explicit type and length.  This organization makes it very easy
+to introduce pointer records, which is what we will use to jump
+from one place in a queue file to another place.
+
+- Deleting a recipient or header record is easy - just mark the
+  record as killed.  When deleting a recipient, we must kill all
+  recipient records that result from virtual alias expansion of the
+  original recipient address. When deleting a very long header or
+  body line, multiple queue file records may need to be killed. We
+  won't try to reuse the deleted space for other purposes.
+
+- Replacing header or body records involves pointer records.
+  Basically, a record is replaced by overwriting it with a forward
+  pointer to space after the end of the queue file, putting the new
+  record there, followed by a reverse pointer to the record that
+  follows the replaced information. If the replaced record is shorter
+  than a pointer record, we relocate the records that follow it to
+  the new area, until we have enough space for the forward pointer
+  record. See below for a discussion on what it takes to make this
+  safe.
+
+  Postfix queue files are segmented. The first segment is for
+  envelope records, the second for message header and body content,
+  and the third segment is for information that was extracted or
+  generated from the message header and body content.  Each segment
+  is terminated by a marker record. For now we don't want to change
+  their location. In particular, we want to avoid moving the start
+  of a segment.
+
+  To ensure that we can always replace a header or body record by
+  a pointer record, without having to relocate a marker record, the
+  cleanup server always places a dummy pointer record at the end
+  of the headers and at the end of the body.
+
+  When a Milter wants to replace an entire body, we have the option
+  to overwrite existing body records until we run out of space, and
+  then writing a pointer to space at the end of the queue file,
+  followed by the remainder of the body, and a pointer to the marker
+  that ends the message content segment.
+
+- Appending a recipient or header record involves pointer records
+  as well. This requires that the queue file already contains a
+  dummy pointer record at the place where we want to append recipient
+  or header content (Milters currently do not replace individual
+  body records, but we could add this if need be).  To append,
+  change the dummy pointer into a forward pointer to space after
+  the end of a message, put the new record there, followed by a
+  reverse pointer to the record that follows the forward pointer.
+
+  To append another record, replace the reverse pointer by a forward
+  pointer to space after the end of a message, put the new record
+  there, followed by the value of the reverse pointer that we
+  replace. Thus, there is no one-to-one correspondence between
+  forward and backward pointers! In fact, there can be multiple
+  forward pointers for one reverse pointer.
+
+When relocating a record we must not relocate the target of a jump
+==================================================================
+
+As discussed above, when replacing an existing record, we overwrite
+it with a forward pointer to the new information. If the old record
+is too small we relocate one or more records that follow the record
+that's being replaced, until we have enough space for the forward
+pointer record.
+
+Now we have to become really careful. Could we end up relocating a
+record that is the target of a forward or reverse pointer, and thus
+corrupt the queue file? The answer is NO.
+
+- We never relocate end-of-segment marker records. Instead, the
+  cleanup server writes dummy pointer records to guarantee that
+  there is always space for a pointer.
+
+- When a record is the target of a forward pointer, it is "edited"
+  information that is preceded either by the end-of-queue-file
+  marker record, or it is preceded by the reverse pointer at the
+  end of earlier written "edited" information. Thus, the target of
+  a forward pointer will not be relocated to make space for a pointer
+  record.
+
+- When a record is the target of a reverse pointer, it is always
+  preceded by a forward pointer record (or by a forward pointer
+  record followed by some unused space). Thus, the target of a
+  reverse pointer will not be relocated to make space for a pointer
+  record.
+
+Could we end up relocating a pointer record?  Yes, but that is OK,
+as long as pointers contain absolute offsets.
+
+Pointer records introduce the possibility of loops
+==================================================
+
+When a queue file is damaged, a bogus pointer value may send Postfix
+into a loop. This must not happen.
+
+Detecting loops is not trivial:
+
+- A sequence of multiple forward pointers may be followed by one
+  legitimate reverse pointer to the location after the first forward
+  pointer. See above for a discussion of how to append a record to
+  an appended record.
+
+- We do know, however, that there will not be more reverse pointers
+  than forward pointers. But this does not help much.
+
+Perhaps we can include a record count at the start of the queue
+file, so that the record walking code knows that it's looking at
+some records more than once, and return an error indication.
+
+How many bytes do we need for a pointer record?
+===============================================
+
+A pointer record would look like this:
+
+    type (1 byte)
+    offset (see below)
+
+Postfix uses long for queue file size/offset information, and stores
+them as %15ld in the SIZE record at the start of the queue file.
+This is somewhat less than a 64-bit long, but it is enough for a
+some time to come, and it is easily changed without breaking forward
+or backward compatibility.
+
+It does mean, however, that a pointer record can easily exceed the
+length of a header record. This is why we go through the trouble
+of record relocation and dummy records. 
+
+In Postfix 2.4 we fixed this by adding padding to short message
+header records so that we can always write a pointer record over a
+message header.  This immensly simplifies the code.