A New Passwd Synchronization Dæmon

The Incentive

There is a need to get the passwords for a particular user synchronized across all the Uniform Access galaxies of clusters of computers. When you change your password on Homer, for example, it should also change on Mead, Saul, Hawking and the rest.

We have a number of problems with the current cpwsync process. Firstly, it is very CPU and memory intensive. When a user changes a password, the password change is sent to the mxfer dæmon, which gives it to the mxferd dæmon. Mxferd then rebuilds the /cluster/etc/passwd file by sucking the whole thing (up to 6 megabytes on Homer just prior to the Homer-Dante split) into memory, making the change and then writing the whole thing out to disk again. Mxferd then passes just the updates back to all the mxfer dæmons via a single broadcast. Each of those mxfer dæmons in turn has an image of the /cluster/etc/passwd file that it rebuilds in a shared memory segment and then gives to cpwsync. Cpwsync then sucks /etc/passwd into memory, does a line-by-line comparison of it with /cluster/etc/passwd to determine what, if anything, has changed, and writes a new /etc/passwd file.

The final result is that for a single password change on an 'n' member galaxy, the 6 megabyte file has been copied 1+2n times (plus 1 for Tolstoy where shared memory doesn't work) and there are 2+3n (plus 1 for Tolstoy) copies of the passwd file left in memory. Luckily, the communication between mxfer and mxferd allows two or more simultaneous password updates to occur in a single pass.

The goal of my pwsync redesign is to utilize the transactional nature of passwd file updates to allow handling only the data that is actually affected by the update. This means rewriting hunks of the files in place rather than copying or comparing whole files for differences. The new pwsync dæmon performs the duties previously performed by the cpwsync, mxfer and mxferd dæmons combined.

The PWSYNC Dæmon

The pwsync dæmon uses the transaction engine designed for the deskmail dmd dæmon to maintain the data used to build the /etc/passwd file (and related shadow files and their DBM counterparts). Three or more computers are designated 'servers' while the rest of the computers are designated 'clients'. One of the servers acts as the master while the others act as chiefs. The client computers always act as slaves. All updates to the database are performed by the master. Each of these transactions is given a transaction sequence number and is written to a log file. The master feeds all the transactions to the chiefs who maintain duplicates of the entire database.

The transactions are also funneled to the slaves, who maintain enough of the data to build their own passwd file. When a dæmon comes online it will tell the master its latest sequence number. The master will take the new client on as a child or will forward the connection request to one of its existing children that is capable of updating the new dæmon to the current level. In this way a somewhat balanced tree of TCP connections will be built. If a dæmon dies, is shut down or simply quits responding, its children will go through the connection process again to maintain the connectivity. Thus any chief or slave process can have additional slaves as children that it is responsible for. Each dæmon forwards every transaction coming in from the root of the tree to each of its children before making the update to its own database.

The Database

The main database resides in /usr/local/lib/pwsync on each computer. The file 'data' contains one entry for each active UID. Each entry consists of a 128 byte structure:

typedef struct PGlobal {
  char pwg_acct[10];            /* Account field                */
  char pwg_name[10];            /* User name                    */
  char pwg_passwd[16];          /* Encrypted password           */
  char pwg_gecos[80];           /* Personal id information      */
  int  pwg_gid;                 /* Unix gid                     */
  int  pwg_gflag;               /* Galaxy flags                 */
  int  pwg_cflag;               /* Cluster flags                */
} pwg_t;                        /* Note: Exactly 128=2^7 bytes  */

that is indexed by UID. There is also a data.x file for every cluster 'x'. The data.x files contain the cluster specific information (namely, the shell and home directory) of each user. Each entry in the data.x file consists of a 32 byte structure:

typedef struct PLocal {
  int  pwl_shell;               /* Shell ordinal and flags      */
  char pwl_dir[28];             /* Home directory               */
} pwl_t;                        /* Note: Exactly 32=2^5 bytes   */

that is, again, indexed by UID. Every computer maintains the global data file. Only the servers and the individual member computers maintain the data.x file for a particular cluster.
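
Because both record types are fixed size and indexed by UID, finding a user's record is a single multiplication and seek. The following is a minimal sketch of that lookup, not the actual pwsync code. It assumes a 4-byte int with no structure padding (checked at compile time), and the helper names pwg_offset, pwl_offset and pwg_read are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

typedef struct PGlobal {
  char pwg_acct[10];            /* Account field                */
  char pwg_name[10];            /* User name                    */
  char pwg_passwd[16];          /* Encrypted password           */
  char pwg_gecos[80];           /* Personal id information      */
  int  pwg_gid;                 /* Unix gid                     */
  int  pwg_gflag;               /* Galaxy flags                 */
  int  pwg_cflag;               /* Cluster flags                */
} pwg_t;

typedef struct PLocal {
  int  pwl_shell;               /* Shell ordinal and flags      */
  char pwl_dir[28];             /* Home directory               */
} pwl_t;

/* Fail the build if this platform does not give the sizes the text relies on. */
_Static_assert(sizeof(pwg_t) == 128, "pwg_t must be exactly 128 bytes");
_Static_assert(sizeof(pwl_t) == 32, "pwl_t must be exactly 32 bytes");

/* Byte offset of a UID's record in the 'data' file (hypothetical helper). */
static long pwg_offset(unsigned uid)
{
  return (long)uid * (long)sizeof(pwg_t);
}

/* Byte offset of a UID's record in a 'data.x' file (hypothetical helper). */
static long pwl_offset(unsigned uid)
{
  return (long)uid * (long)sizeof(pwl_t);
}

/* Read one global record. Returns 0 on success, -1 on seek error or short read. */
static int pwg_read(FILE *fp, unsigned uid, pwg_t *out)
{
  if (fseek(fp, pwg_offset(uid), SEEK_SET) != 0)
    return -1;
  return fread(out, sizeof *out, 1, fp) == 1 ? 0 : -1;
}
```

With these sizes, UID 1000 lives at byte 128000 of the data file and byte 32000 of each data.x file, so an update touches one record rather than the whole file.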

There is nothing really magical about the sizes of these structures. They're merely powers of two in an attempt to help performance. With 128 bytes per user the global data file will be about 12 megabytes if we have 100,000 users. 128 bytes seems to be about the minimum; it's tough to cram everyone's gecos entry into 80 bytes as it is.

The galaxy flags, pwg_gflag, contains one bit for each of a maximum of 32 clusters. If the bit for cluster x is set then the user has an account on cluster x and will have a legitimate entry in the data.x file. If the bit is not set then the user does not have an /etc/passwd entry on that cluster and the record in the data.x file for that UID is undefined.
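
The membership test described above is a one-bit check. As a rough sketch, assuming that cluster number i (0 to 31) corresponds to bit i of pwg_gflag (the text does not say which bit belongs to which cluster), it might look like the following. The function names are hypothetical, and the flag word is handled as an unsigned 32-bit value to keep the shifts well defined.

```c
#include <assert.h>
#include <stdint.h>

#define PW_MAX_CLUSTERS 32      /* from the text: at most 32 clusters */

/* Assumed mapping: cluster index i (0..31) <-> bit i of pwg_gflag. */

/* Non-zero if the user has an account on the given cluster. */
static int has_account(uint32_t gflag, unsigned cluster)
{
  assert(cluster < PW_MAX_CLUSTERS);
  return (int)((gflag >> cluster) & 1u);
}

/* Return gflag with the given cluster's bit set. */
static uint32_t grant_account(uint32_t gflag, unsigned cluster)
{
  assert(cluster < PW_MAX_CLUSTERS);
  return gflag | (UINT32_C(1) << cluster);
}

/* Return gflag with the given cluster's bit cleared. */
static uint32_t revoke_account(uint32_t gflag, unsigned cluster)
{
  assert(cluster < PW_MAX_CLUSTERS);
  return gflag & ~(UINT32_C(1) << cluster);
}
```

A dæmon would consult has_account before trusting the data.x record for a UID, since that record is undefined when the bit is clear.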

The cluster flags, pwg_cflag, contains miscellaneous flags. One bit indicates whether the user is "super" (i.e.: those of us with valid accounts on jrr1, hugo2, etc), another indicates whether the user is "research" (i.e.: those with valid mead accounts).

The shell flags, pwl_shell, contains flags and a shell ordinal. The various shells are defined in the pwsync configuration file /usr/local/lib/pwsync/conf. The flags specify shell modifiers such as password change required and/or account expired. Using an integer field rather than spelling out the shells repeatedly greatly reduces the space required, especially when you consider shells like:

/usr/local/etc/expired#/usr/local/etc/pwchange#/usr/local/bin/psh.

The translation from hexadecimal shell ordinal to text is in the configuration file /usr/local/lib/pwsync/conf.

The /etc/passwd File

When pwsync starts up, it will build the /etc/passwd and the appropriate shadow file from the information in the /usr/local/lib/pwsync/data and data.x files. As the files are being built, an in-memory table is generated that keeps track of the location and length of each UID's entry in the passwd and shadow files. When an entry changes due to an incoming transaction, the entries for these files are regenerated and then rewritten in place in the files. If the new entry is shorter than the previous one, the gecos field is padded with an extra comma and blanks. If the new entry is longer, a dummy entry, "X:*:99:9:filler:/:", is written in its place and the real entry is appended to the end of the file. Periodically the files will be rebuilt from the data and data.x files, which will clear out the bogus entries.

In the interest of efficiency, no validation is made on the previous text in /etc/passwd prior to overwriting it with new data. This is not a problem, as no one other than pwsync should be modifying the /etc/passwd file and, in a perfect world, the DBM file is used rather than the flat text file anyway. This could be a problem on AIX for the shadow file as it does not have a DBM database, but hopefully not. In any case, sending a HUP signal to pwsync will cause it to rebuild the passwd and shadow files from scratch.
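
The in-place rewrite rule can be sketched as a function that, given the length of the existing slot, produces exactly that many bytes to write over it. This is an illustration under stated assumptions rather than the real pwsync logic: the entry is passed as three pieces (the text before the gecos field, the gecos field, and the text after it), the dummy entry is padded in its own gecos field when the slot is longer than the dummy, and a slot too short to hold even the dummy is reported as an error, a case the text does not address. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum slot_action {
  SLOT_REWRITE,    /* out holds the real entry, padded to the slot length  */
  SLOT_RELOCATE,   /* out holds the filler; append the real entry to file  */
  SLOT_ERROR       /* nothing fits, or the output buffer is too small      */
};

/* Build head + gecos [+ ',' + blanks] + tail so the result is exactly
   slot_len bytes. Returns 0 on success, -1 if it cannot fit. */
static int build_padded(const char *head, const char *gecos, const char *tail,
                        size_t slot_len, char *out, size_t out_size)
{
  size_t hl = strlen(head), gl = strlen(gecos), tl = strlen(tail);
  size_t need = hl + gl + tl;
  size_t pad, n = 0;

  if (need > slot_len || slot_len >= out_size)
    return -1;
  pad = slot_len - need;              /* bytes left over in the slot */
  memcpy(out + n, head, hl);   n += hl;
  memcpy(out + n, gecos, gl);  n += gl;
  if (pad > 0) {                      /* extra comma, then blanks */
    out[n++] = ',';
    memset(out + n, ' ', pad - 1);
    n += pad - 1;
  }
  memcpy(out + n, tail, tl + 1);      /* copies the terminating NUL too */
  return 0;
}

/* Decide what to write over a slot of slot_len bytes (newline excluded). */
static enum slot_action fit_entry(const char *head, const char *gecos,
                                  const char *tail, size_t slot_len,
                                  char *out, size_t out_size)
{
  assert(out != NULL && out_size > 0);
  if (build_padded(head, gecos, tail, slot_len, out, out_size) == 0)
    return SLOT_REWRITE;
  /* New entry is longer than the slot: overwrite with the dummy entry. */
  if (build_padded("X:*:99:9:", "filler", ":/:", slot_len, out, out_size) == 0)
    return SLOT_RELOCATE;
  return SLOT_ERROR;
}
```

On SLOT_RELOCATE the caller would write the filler over the old slot, append the real entry to the end of the file, and record the new location and length in the in-memory table.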

Updates

The pwsync dæmon takes commands via TCP connections from client processes. All requests use Jim Fox's new Lightweight Secure Connection Library (LSC) so that the client and the server can authenticate each other at a particular security level. Each host will be trusted at one of sixteen levels to give a setuid root process on that system access to some subset of the commands. For example, root on Strunk and White will have full access while root on Becker will be able to change a password only if they can prove that they know the existing password for the entry they're changing.

The pwsync dæmon accepts the following external commands via TCP connections:

The set and setpw commands accept the following field names which modify the corresponding fields in the pwg or pwl data structures:
  Field   Value     Modifies
  A:      text      pwg_acct
  C:      +/-hex    pwg_cflag
  F:      +/-hex    pwg_gflag
  G:      int       pwg_gid
  I:      text      pwg_gecos
  N:      text      pwg_name
  P:      text      pwg_passwd
  XxD:    text      pwl_dir (for data.x)
  XxS:    +/-hex    pwl_shell (for data.x)

The flag codes set with +/-hex take a hexadecimal constant prefixed by a plus for a bit-wise OR, a minus to clear the specified bits (with a bit-wise AND of the inverse) or an asterisk to set the full flag word. If the shell flag is specified as +value and the shell ordinal of the new value is non-zero, the original shell ordinal will be cleared prior to doing the bit-wise OR. Thus, the shell ordinal can be changed with +012 without affecting the shell modifier bits.
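
The three flag operators can be illustrated with a small parser. This is only a sketch: the text does not say which bits of pwl_shell hold the shell ordinal, so the low 12 bits are assumed here (SHELL_ORDINAL_MASK is a hypothetical name and value), and apply_flag is a hypothetical helper. Passing a mask of zero gives the plain behaviour used for pwg_cflag and pwg_gflag.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>

/* ASSUMPTION: the shell ordinal occupies the low 12 bits of pwl_shell. */
#define SHELL_ORDINAL_MASK 0x00000FFFu

/* Apply "+hex", "-hex" or "*hex" to old and store the result in *out.
   ordinal_mask is SHELL_ORDINAL_MASK for pwl_shell, 0 for other flag words.
   Returns 0 on success, -1 on a malformed argument. */
static int apply_flag(uint32_t old, const char *arg, uint32_t ordinal_mask,
                      uint32_t *out)
{
  char *end;
  unsigned long v;

  if (arg[0] == '\0' || !isxdigit((unsigned char)arg[1]))
    return -1;
  v = strtoul(arg + 1, &end, 16);
  if (*end != '\0' || v > 0xFFFFFFFFul)
    return -1;

  switch (arg[0]) {
  case '+':                       /* bit-wise OR */
    if ((uint32_t)v & ordinal_mask)
      old &= ~ordinal_mask;       /* new ordinal given: clear the old one */
    *out = old | (uint32_t)v;
    return 0;
  case '-':                       /* clear bits: AND with the inverse */
    *out = old & ~(uint32_t)v;
    return 0;
  case '*':                       /* replace the whole flag word */
    *out = (uint32_t)v;
    return 0;
  default:
    return -1;
  }
}
```

Under the assumed mask, applying +012 to a shell word of 0x10005 clears the old ordinal 0x005 first and yields 0x10012, leaving the modifier bit untouched.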

The 'pwedit' utility can be used to pass the above commands to the master server. It will also take a list of users or UIDs, write their information to a flat text file, invoke an editor and then read the data back in, converting it to the appropriate set commands. These utilities will only be available on secured trusted systems such as melville or daffy. There is also an API in a new library, libuwpw.a, that replaces the current API in the sy99 library that communicates with the mxfer dæmon.

Phasing in the new pwsync dæmon

Pick a day, call it June 18th, to switch to the new dæmon. The current /cluster/etc/passwd files from each galaxy will be loaded into the pwsync database with a utility I've got called 'pwload'. If a UID has an entry on two different galaxies with different gecos fields, pwload will generate a new gecos entry from the two using the shortest non-empty text from each of the subfields: name, office, work phone and home phone. This scheme gets "D. Smith" rather than "Diana Smith" if we've got someone who wanted to be anonymous. It will also tend to discard nicknames if one was entered on one galaxy but not another (there are some nicknames on becker that the owner probably doesn't want put on homer). If the resulting field is too long, the office field will be truncated first and then the rest. Most of the long gecos entries I've seen belong to medical staff specifying three or four separate offices with their fax numbers.

Pwload will not set the password in the pwsync database. When pwsync on a particular system creates an entry that does not have a password specified, it will use the password from the current /etc/passwd file on the local system and set a flag in pwg_cflag indicating that the password should not be propagated. Thus each /etc/passwd will retain its original password. The first time the password is changed on any galaxy member via the pwsync master dæmon, the flag will be cleared and the new password will propagate to all the galaxies. From then on the passwords will remain in sync.
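
The gecos merge that pwload performs when two galaxies disagree can be sketched as below. This is an illustration, not pwload itself: it assumes the subfields are comma-separated in the order name, office, work phone, home phone, breaks length ties in favour of the first galaxy, and does not attempt the truncation step. The function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define GECOS_SUBFIELDS 4       /* name, office, work phone, home phone */
#define GECOS_MAX 80            /* size of pwg_gecos                    */

/* Copy comma-separated subfield idx (0-based) of gecos into out.
   A missing subfield yields the empty string. */
static void subfield(const char *gecos, int idx, char *out, size_t out_size)
{
  const char *p = gecos;
  size_t len;

  while (idx-- > 0) {
    p = strchr(p, ',');
    if (p == NULL) {
      out[0] = '\0';
      return;
    }
    p++;
  }
  len = strcspn(p, ",");
  if (len >= out_size)
    len = out_size - 1;
  memcpy(out, p, len);
  out[len] = '\0';
}

/* The shorter of two strings, ignoring empty ones; ties go to a. */
static const char *shortest_nonempty(const char *a, const char *b)
{
  if (*a == '\0') return b;
  if (*b == '\0') return a;
  return strlen(b) < strlen(a) ? b : a;
}

/* Merge two gecos strings subfield by subfield into out.
   Returns 0 on success, -1 if the result does not fit in out_size bytes. */
static int merge_gecos(const char *g1, const char *g2,
                       char *out, size_t out_size)
{
  char a[GECOS_MAX], b[GECOS_MAX];
  size_t used = 0;
  int i, n;

  assert(out_size > 0);
  out[0] = '\0';
  for (i = 0; i < GECOS_SUBFIELDS; i++) {
    subfield(g1, i, a, sizeof a);
    subfield(g2, i, b, sizeof b);
    n = snprintf(out + used, out_size - used, "%s%s",
                 i ? "," : "", shortest_nonempty(a, b));
    if (n < 0 || (size_t)n >= out_size - used)
      return -1;
    used += (size_t)n;
  }
  return 0;
}
```

A return value of -1 with an 80-byte buffer marks the case where the merged field is too long and the office subfield would need truncating first.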