DESKMAIL, DNS based IMAP server selection

What is DESKMAIL?

The deskmail concept provides a way to distribute users among several IMAP servers. With dynamic names, the users can be redistributed among the IMAP servers at any time. There is no need to update individual configuration files on the users' PCs or even to tell the users on which server their mail resides.

This is accomplished by creating an entry in the DNS name space for each user. The entry points to the user's specific IMAP server and all incoming mail and IMAP connections use this name when talking to the server host. For the University of Washington, the DNS entry is of the form user.deskmail.washington.edu.

Maintaining these names with the standard named configuration files would be impossible. Instead, we have a modified version of the named dæmon that looks the name up in an alternative database. This is done by special-casing the string xxhomexx in a cname directive. The DNS servers for the deskmail.washington.edu subdomain are franklin01 and franklin02; both run the modified named dæmon. The configuration file for the deskmail domain looks like:

;
; Mail server forwarding info for user.deskmail.washington.edu
;
$ORIGIN deskmail.washington.edu.
@               IN      SOA     franklin01.u.washington.edu. (
                        9610042         ; Serial
                        10800           ; Refresh 
                        1800            ; Retry   
                        3600000         ; Expire  
                        86400 )         ; Minimum 
                IN      NS      franklin01.u.washington.edu.
                IN      NS      franklin02.u.washington.edu.
        1       IN      A       140.142.13.104   ; franklin01
                IN      MX      10      franklin01.u.washington.edu.
*       1       IN      CNAME   XXHOMEXX.u.washington.edu.


When a request comes in for user.deskmail.washington.edu, the standard name server code generates a response with a single cname directive with a value of xxhomexx.u.washington.edu. As it is building this response, the special hack in the name server code kicks in when it sees the xxhomexx keyword and looks user up in a DBM database. The text found in the DBM database is inserted into the response in place of the xxhomexx construct.
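
The substitution step can be illustrated with a minimal sketch. This is not the named patch itself. It assumes an in-memory dict in place of the DBM file, a hypothetical function name (expand_cname), hypothetical host names, and that the first host in a user's list is the one substituted; the text above does not say which entry is used.

```python
PLACEHOLDER = "xxhomexx"
ZONE = "deskmail.washington.edu"


def expand_cname(query_name, cname_template, mailhome):
    """Return the CNAME target for query_name, or None if the user is unknown.

    mailhome stands in for the DBM database: it maps a user name to a
    colon-separated host list.
    """
    name = query_name.rstrip(".").lower()
    suffix = "." + ZONE
    if not name.endswith(suffix):
        return None
    user = name[: -len(suffix)]
    hosts = mailhome.get(user)
    if not hosts:
        return None
    # Assumption: the first host in the list is the one placed in the answer.
    first_host = hosts.split(":")[0]
    label, _, rest = cname_template.partition(".")
    if label.lower() != PLACEHOLDER:
        # Not the special keyword, so the record is returned unchanged.
        return cname_template
    return first_host + "." + rest
```

With a database entry of "homer05:dante02" for user jdoe, a query for jdoe.deskmail.washington.edu against the wildcard record above would yield homer05.u.washington.edu. under these assumptions.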

In the future I plan to replace the CNAME with an MX and A record. The tricky part is that the hack to the name server code is not in a pleasant place to be looking up A records. I do not want to hard code all the A records for all the hosts that might appear in the DBM database. Rather, I believe I will have a separate process maintain a cache of these names and have named do a lookup on each reference.

The Deskmail Client

There is a client utility on each of the various systems in the homer, dante and aagaard galaxies called /usr/local/etc/deskmail. This utility is used to update the mapping between the user.deskmail.washington.edu DNS name and the actual server that holds that user's mail. The updates are passed to a server dæmon that updates the DBM database. The deskmail client utility takes commands of the form:

Add user newhost oldhost
Set user host1:host2:...:hostn
Get user [full]
Delete user oldhost
These commands can either appear on stdin, or a single quoted command can be given on the command line following a -c keyword. Each entry in the DBM database consists of a list of hosts that are capable of accepting mail for that user. The set command establishes the entire list for a user. The add command redirects mail forwarding from "oldhost" to "newhost" for the specified user: the newhost replaces the oldhost exactly where it appears in the list. If the oldhost parameter is not specified, or if the specified oldhost is not in the existing host list, the newhost is added to the end of the list. If the oldhost parameter is specified as the keyword "*", the newhost is added to the beginning of the list.
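
The list-editing rules for the add command can be sketched as follows. The function name apply_add is hypothetical, and the sketch covers only the three cases described above; what happens when newhost is already in the list is not specified in this document and is not handled.

```python
def apply_add(hosts, newhost, oldhost=None):
    """Return a new host list after an 'Add user newhost oldhost' command."""
    hosts = list(hosts)  # work on a copy so the caller's list is untouched
    if oldhost == "*":
        # The keyword "*" puts the new host at the beginning of the list.
        return [newhost] + hosts
    if oldhost is not None and oldhost in hosts:
        # Replace the old host exactly where it appears.
        hosts[hosts.index(oldhost)] = newhost
        return hosts
    # No oldhost given, or oldhost not in the list: append to the end.
    return hosts + [newhost]
```

For example, applying an add of "x" with oldhost "b" to the list a:b:c gives a:x:c, while the same add with no oldhost gives a:b:c:x.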

When a new account is created on aagaard, dante or homer, the deskmail utility is invoked to add an entry to the end of the list for that user. After an account expires and is deleted, a reconciliation process will remove the corresponding entry from the list.

The DMD Dæmon

The DBM database is maintained by a dæmon called dmd (deskmail dæmon). Dmd runs on each system that is an authoritative DNS server for the deskmail.washington.edu domain. This would be franklin01 and franklin02. The dmd dæmons communicate with each other and their clients via TCP/IP connections. One of the dmd dæmons acts as the master server and the rest act as slave servers. The individual clients connect to the master server and the master server passes all database updates to all the slave servers.

The Deskmail Configuration File

The deskmail clients and dæmons use a configuration file, /usr/local/lib/deskmail/deskmail.cf, that contains the following parameters:

Password primary_pw previous_pw
Server host1 server_id1
Server host2 server_id2

Server hostn server_idn
The password directive lists the primary and previous passwords. The dmd dæmon will accept either of these passwords; clients send the primary password when connecting to a server. The purpose of having two passwords is to allow changing the password without having to update all server and client configuration files in one atomic operation.
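
A minimal sketch of reading this file and applying the two-password rule is shown below. The function names are hypothetical, and the sketch assumes whitespace-separated fields, case-insensitive keywords and numeric server ids; comment syntax and any other directives in the real file are not covered.

```python
def parse_config(text):
    """Parse deskmail.cf-style text into (passwords, servers)."""
    passwords = []
    servers = []
    for raw in text.splitlines():
        fields = raw.split()
        if not fields:
            continue  # skip blank lines
        keyword = fields[0].lower()
        if keyword == "password" and len(fields) == 3:
            passwords = fields[1:3]  # [primary, previous]
        elif keyword == "server" and len(fields) == 3:
            servers.append((fields[1], int(fields[2])))
        else:
            raise ValueError("unrecognized line: " + raw)
    return passwords, servers


def password_ok(offered, passwords):
    """A server accepts either the primary or the previous password."""
    return offered in passwords
```

Because the server accepts both passwords, a password change can be rolled out one machine at a time: a client still sending the old primary is accepted as long as that value is listed as the previous password on the server.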

The server directives list the hosts that are configured to be servers. Any one of these can be the master server. When a client wants to connect to the master server it sends a "P" datagram to each of the listed servers and waits for one to answer. Only the active master will respond to a "P" datagram. All active servers will respond to an "S" datagram with their current state. The server id field is a unique number for each server so that transactions can be tagged with the server that made them.

The configuration file is readable only by root, to protect the password. The deskmail client is not suid to root, so only the superuser or another suid root utility can use the client. The deskmail client does no verification on its commands before it sends them to the dmd servers. If it is modified to allow users to access and update their own deskmail forwarding, it will have to verify that the user is not setting someone else's forwarding as a "denial of service" attack.

The Deskmail Database

The database that the dmd dæmon maintains consists of three files:

/usr/local/lib/deskmail/mailhome
/usr/local/lib/deskmail/mailhome.pag
/usr/local/lib/deskmail/mailhome.dir
The first is a transaction log, the other two contain the actual DBM database that the name server accesses. The transaction log consists of lines of the form:
$sequence server_id Set user host1:host2:…hostn
The sequence number is an ever-increasing integer number. The log file must be in continuous sequence with no missing or duplicated transactions. All servers should list the same set of transactions.
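
The continuity requirement can be checked with a short sketch like the one below. The function names are hypothetical. The sketch assumes five whitespace-separated fields per line and treats the leading "$" in the format above as optional notation, since the text does not say whether it appears literally in the log.

```python
def parse_log_line(line):
    """Split one transaction log line into (sequence, server_id, user, hosts)."""
    seq, server_id, command, user, hostlist = line.split()
    if command.lower() != "set":
        raise ValueError("log entries are expected to be Set commands")
    return int(seq.lstrip("$")), int(server_id), user, hostlist.split(":")


def check_continuity(lines):
    """Return the last sequence number, or raise on a gap or duplicate."""
    previous = None
    for line in lines:
        seq = parse_log_line(line)[0]
        if previous is not None and seq != previous + 1:
            raise ValueError(
                "sequence break: %d followed by %d" % (previous, seq)
            )
        previous = seq
    return previous
```

The value returned by check_continuity is the sequence number a dæmon would pick up from the tail of its log at startup.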

The DMD Protocol

The dmd dæmons communicate using their own protocol. When first started, dmd sets up two pseudo-clients: client zero handles incoming UDP "Argus" packets and client one listens for incoming TCP connections. It then initializes by reading the tail of the transaction log to get its current sequence number and enters search mode, state 1.

While in search mode, dmd sends an "I am here" datagram to each of the other configured servers every ten seconds and increments its state. The "I am here" packet also contains the server's last known sequence number. If a master server gets an "I am here" query it will respond, telling the requesting server to connect as a slave. Once connected to the master, dmd enters slave mode, state 50. If a searching server gets an "I am here" from another searching server, the one with the lower sequence number will reset its state back to 1.

If the searching server gets up to state 3 before it hears from a master server, it will declare itself the master and enter master mode, state 100. While in master mode, dmd sends an "I am the master" datagram every ten seconds to all configured servers that are not currently connected as slaves.
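
The state transitions described above can be sketched as a small state machine. The class and method names are hypothetical, networking is left out entirely, and the exact tick on which a searching server promotes itself is one reading of the description. The case of two searching servers with equal sequence numbers is not covered by the text, so here neither resets.

```python
SEARCH_START = 1   # initial search state
SEARCH_LIMIT = 3   # reaching this state without a master means self-promotion
SLAVE = 50
MASTER = 100


class ElectionState:
    def __init__(self, sequence):
        self.sequence = sequence      # last sequence number in the local log
        self.state = SEARCH_START

    def searching(self):
        return self.state < SLAVE

    def on_timer(self):
        """Ten-second tick; returns the datagram to send, or None."""
        if self.state == MASTER:
            return "I am the master"
        if self.state == SLAVE:
            return None
        self.state += 1
        if self.state >= SEARCH_LIMIT:
            self.state = MASTER       # no master answered in time
        return "I am here"

    def on_peer_searching(self, peer_sequence):
        """Another searching server sent 'I am here'."""
        if self.searching() and self.sequence < peer_sequence:
            self.state = SEARCH_START  # the lower sequence number backs off

    def on_master_reply(self):
        """A master told this server to connect as a slave."""
        if self.searching():
            self.state = SLAVE
```

Resetting the server with the lower sequence number gives the more up-to-date server the better chance of reaching state 3 first and becoming the master.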

When a server connects to the master server as a slave it asks the master to send all updates since its latest sequence number. All updates passed between the master and slave servers are in the form of "set" commands rather than "add" commands. There should never be a case where two servers get out of sync (famous last words), but if they do, an update to the offending user's record will put all the servers back into agreement.

In the case of the ever-worrisome "partitioned network," it is possible for two servers to become masters at the same time. If this happens, they will eventually detect it. For as long as they are separated they will be attempting to send "I am the master" datagrams to each other. When they are reunited, each will hear the messages from the other. The master with the lower sequence number will become a slave. It is possible that the new slave has processed transactions that the master has not yet heard about. The server id on each transaction allows this condition to be detected. When this occurs, the slave and master servers will search backwards through their transaction logs until they find a transaction that has a matching sequence number and server id. The slave will then:

  1. Save all transactions in its own log past that point that have its own server id,
  2. Apply all the master's transactions after that point and then
  3. Forward the transactions it saved to the master to be given new transaction sequence numbers.
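
The three steps above can be sketched as follows. The function name reconcile and the in-memory representation of a log as a list of (sequence, server_id, command) tuples are assumptions made for illustration; the real dæmons exchange this information over their TCP connection.

```python
def reconcile(slave_log, master_log, slave_id):
    """Return (new_slave_log, saved_commands) after a partition heals.

    saved_commands are the slave's own transactions that must be forwarded
    to the master, which will assign them new sequence numbers.
    """
    master_index = {
        (seq, sid): pos for pos, (seq, sid, _) in enumerate(master_log)
    }
    # Search backwards for the last transaction both logs agree on.
    for i in range(len(slave_log) - 1, -1, -1):
        seq, sid, _ = slave_log[i]
        if (seq, sid) in master_index:
            j = master_index[(seq, sid)]
            break
    else:
        raise ValueError("logs share no common transaction")
    # Step 1: save the slave's own transactions past the common point.
    saved = [cmd for (_, sid, cmd) in slave_log[i + 1:] if sid == slave_id]
    # Step 2: apply the master's transactions after that point.
    new_log = slave_log[: i + 1] + master_log[j + 1:]
    # Step 3 (forwarding 'saved' to the master) is left to the caller.
    return new_log, saved
```

Matching on both the sequence number and the server id is what distinguishes a shared transaction from two different transactions that happened to receive the same sequence number on opposite sides of the partition.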

Client/Server Messages

Messages sent from the clients to a server are in the form:

seq command
Where "seq" is an arbitrary sequence number to associate a response with the command that generated it. Responses from the server are in the form:
rc id ACK seq reason
rc id NAK seq reason
where "rc" is a response code, "id" is the server's id number, "seq" is the numeric sequence number on the command that generated the response and "reason" is a text string. The response codes are defined in the include file /tulsa/src/deskmail/deskmail.h, but in general are in the following ranges:
200-299: Success
300-399: Failed, but should be retried (e.g. no new clients allowed)
400-499: Non-fatal type error (e.g. no such user on a 'get')
500-599: Fatal client error (e.g. bad command)
600-699: Fatal server error (e.g. database not available)
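
A client might parse and classify a response line roughly as sketched below. The function names and the short category labels are hypothetical; the authoritative code values are those in deskmail.h, which is not reproduced here.

```python
RANGES = [
    (200, 299, "success"),
    (300, 399, "retry"),
    (400, 499, "non-fatal"),
    (500, 599, "fatal-client"),
    (600, 699, "fatal-server"),
]


def parse_response(line):
    """Split 'rc id ACK|NAK seq reason' into a dict."""
    parts = line.strip().split(None, 4)
    if len(parts) < 4 or parts[2] not in ("ACK", "NAK"):
        raise ValueError("malformed response: " + line)
    return {
        "rc": int(parts[0]),
        "id": int(parts[1]),
        "ack": parts[2] == "ACK",
        "seq": int(parts[3]),
        # The reason is free text and may contain spaces.
        "reason": parts[4] if len(parts) == 5 else "",
    }


def classify(rc):
    """Map a response code to the general range it falls in."""
    for low, high, meaning in RANGES:
        if low <= rc <= high:
            return meaning
    return "unknown"
```

The "seq" field is what lets a client with several commands outstanding match each response to the command that produced it.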

The actual commands are those permitted by the deskmail client above plus a number of internal commands (like "password pw" for signing on). For a complete list of commands, see the code.

Testing and Debugging

The dmd dæmons will produce oodles of debugging messages via the syslog mechanism. All the syslog messages also show up in the Argus status display. The debug level can be set via an Argus "ding" of the form "dxxx" where "xxx" is a hexadecimal debug level. The first three characters of each debug message are the debug bit that triggered the message, so you can tell which bits to leave out the next time you get too much junk. For more documentation on the debug messages, see the code.
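
As a rough illustration of the "dxxx" form, the sketch below parses a ding into an integer bit mask and tests whether a given bit is selected. The function names are hypothetical and the meaning of individual bits is defined only in the dmd source, so the bit values used with this sketch are placeholders.

```python
def parse_ding(ding):
    """Convert a 'dxxx' ding into an integer debug level (xxx is hex)."""
    if not ding.startswith("d"):
        raise ValueError("not a debug ding: " + ding)
    return int(ding[1:], 16)


def bit_selected(level, bit):
    """True if the given debug bit is set in the debug level."""
    return bool(level & bit)
```

For example, the ding "d005" selects bits 0x001 and 0x004 and leaves 0x002 off.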

In addition to deskmail.washington.edu, we also have the desktop.washington.edu domain. The server for the desktop domain is miro.u.washington.edu. Miro is also in the deskmail configuration file as the third host that is configured to run the dmd dæmon. Changes to the name server, such as switching from a single CNAME record to an MX/A record pair, will be tested on miro for the desktop.washington.edu domain prior to being put into production on franklin01 and franklin02 for the deskmail.washington.edu domain.