XLA - Extended Load Average design for Sendmail R6
--------------------------------------------------

Christophe Wolfhugel - Herve Schauer Consultants
wolf@grasp.insa-lyon.fr, wolf@hsc-sec.fr


WARNING: this extension is supplied as a contribution to Sendmail.
         Should you have trouble or questions, please contact me
         directly, and *not* the Sendmail development team.


ABSTRACT

   Sendmail currently furnishes a limitation mechanism based on the
   system load average, when available. Experience has proven that
   this is not sufficient in some situations, for example if you have
   slow and/or overloaded links. Sendmail may then have to handle a
   large number of simultaneous sessions on the same overloaded link,
   which easily congests both system and network and causes most of
   the SMTP sessions to time out after a long time. The system load
   average is also generally too slow to react when your system gets
   a burst of incoming or outgoing SMTP sessions, which on some
   stations can easily make the system unavailable.

   The extended load average module has been designed to furnish a
   way of limiting the load generated by Sendmail on both your system
   and your network. It can be used either alone or as a complement
   to the system load average if your system supports it.

   Limitation is based on the number of incoming/outgoing SMTP
   sessions, and remote hosts are grouped in classes. The system
   administrator defines, for each class of hosts, a maximum number
   of incoming SMTP sessions as well as a maximum total (incoming +
   outgoing) number of sessions. A class can be either an individual
   machine or a network.

   When the limit is reached for a given class, all incoming SMTP
   connections for that class are politely refused. When the limit is
   reached for all classes, SMTP connections are refused by the
   system (which one could consider less polite :)). On outgoing
   mail, messages are queued for delayed processing.

   The extended load average parameters are given in the Sendmail
   configuration file; when they are not present, Sendmail behaves
   the usual way.


COMPILATION

   Copy the xla.c module into the src sub-directory and edit the
   Makefile to define XLA (-DXLA). Also add the xla.[co] module name
   to the list of files so that it gets compiled. Regenerate sendmail
   after removing all objects, or at least those containing
   references to XLA (this list may vary, so use grep to get the
   module list).

   This will generate a new sendmail executable containing the xla
   code. Debugging level 59 has been assigned to this module and,
   when used, provides some output (sendmail -d59.x). Please check
   the source code to see which levels are supported.


CONFIGURATION

   The extended load average uses a new set of configuration lines in
   the sendmail.cf file. All newly introduced lines begin with the
   letter L (capital L).

   Before detailing the syntax, first an example (it can be placed in
   any section of the sendmail.cf file, but note that the order of
   the lines is important). Fields are separated by one or more
   tabs/spaces.

   # File name used to store the counters
   L/etc/sendmail.la
   # Classes definition
   # Lname              #queue  #reject
   L*.insa-lyon.fr        8       3
   L*.univ-lyon1.fr       6       4
   L*                    15      16

   The first line defines the working file that the running Sendmail
   processes use to read and update the counters. The format of this
   file is described in the "Design" section. This line is mandatory
   and the file name must be an absolute path (i.e. begin with a
   slash).

   Then you can specify one or more classes. The last class (*) is
   also mandatory and should be in last position: the first match
   stops the search, and if there is no match at all the behavior of
   Sendmail is undefined.

   Each class has three fields separated by one or more tabs/spaces.

   L{mask} {queue_#} {refuse_#}

   The {mask} is a simple mask. It can be either an explicit host
   name (like grasp.insa-lyon.fr), a mask starting with "*.", or just
   "*". No other variants are allowed.
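
   The syntax and matching rules can be sketched as follows. This is
   only an illustration in Python, not the code of xla.c: the
   function names (parse_xla_lines, mask_matches, find_class) are
   hypothetical, and the case-insensitive comparison of host names is
   an assumption. A "*." mask is taken to match the bare domain as
   well as any host below it.

```python
def parse_xla_lines(cf_lines):
    """Return (working_file, classes) from the L lines of a sendmail.cf.

    classes is a list of (mask, queue_max, refuse_max) in file order.
    Hypothetical helper: xla.c does its own parsing in C.
    """
    working_file = None
    classes = []
    for raw in cf_lines:
        line = raw.rstrip()
        if not line.startswith("L"):
            continue  # comments and all other configuration lines
        fields = line[1:].split()  # one or more tabs/spaces
        if working_file is None:
            # The first L line names the working file (absolute path).
            if len(fields) != 1 or not fields[0].startswith("/"):
                raise ValueError("first L line must be an absolute file name")
            working_file = fields[0]
        else:
            if len(fields) != 3:
                raise ValueError("class line needs mask, queue_#, refuse_#")
            mask, queue_max, refuse_max = fields
            classes.append((mask, int(queue_max), int(refuse_max)))
    if working_file is None or not classes or classes[-1][0] != "*":
        raise ValueError("need a working file line and a final L* class")
    return working_file, classes


def mask_matches(mask, host):
    """Apply the three allowed mask forms: "*", "*.domain", exact name."""
    mask, host = mask.lower(), host.lower()  # assumption: case-insensitive
    if mask == "*":
        return True
    if mask.startswith("*."):
        domain = mask[2:]
        return host == domain or host.endswith("." + domain)
    return host == mask


def find_class(classes, host):
    """Index of the first class whose mask matches; the first match wins."""
    for index, (mask, _queue_max, _refuse_max) in enumerate(classes):
        if mask_matches(mask, host):
            return index
    return None  # cannot happen when the mandatory L* line is last


example_cf = [
    "# File name used to store the counters",
    "L/etc/sendmail.la",
    "L*.insa-lyon.fr\t8\t3",
    "L*.univ-lyon1.fr  6  4",
    "L*  15  16",
]
working_file, classes = parse_xla_lines(example_cf)
print(working_file, find_class(classes, "grasp.insa-lyon.fr"))
```

   Because the search stops at the first match, placing L* anywhere
   but last would hide every class listed after it.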

   Lgrasp.insa-lyon.fr  will match exactly any session to/from this
                        host.

   L*.insa-lyon.fr      will match any session to/from any machine in
                        the insa-lyon.fr domain, as well as the
                        machine named "insa-lyon.fr" if it exists.

   L*                   will match any session, and thus should be
                        last in the list to act as a catch-all line.

   The {queue_#} is the maximum number of SMTP sessions in the given
   class, counting both incoming and outgoing messages. The
   {refuse_#} indicates when to refuse incoming messages for this
   class. The interaction between those counters is somewhat subtle.
   It seems logical that a standard configuration has {queue_#} >=
   {refuse_#}, and in fact in most configurations they can be equal
   (that is what I use in my environment). However, this is not
   mandatory. If {queue_#} < {refuse_#}, outgoing messages get lower
   priority than incoming messages: once a class gets loaded, the
   outgoing messages are blocked first.

   I use very low values in some situations. For example, I have a
   customer connected to the Internet via a 9600 bps line, who also
   has internal users sending bursts of messages (10 or 20 small
   messages arriving within one or two seconds). Both situations were
   unsupportable: the line is too slow to handle many simultaneous
   connections, and the mail server does not have the resources to
   handle such a heavy load (it is a 12 MB Sun 3 also doing Usenet
   news).

   I have defined the following section in the configuration file,
   and experience shows the benefits for everyone. Fake domain for
   the example: customer.fr.

   L/etc/sendmail.la
   L*.customer.fr          8       8
   L*                      3       3

   This means that there cannot be more than 8 simultaneous SMTP
   sessions between the mail server and the internal hosts. This
   protects the station from heavy loads such as users (or
   applications!) sending several tens of messages in just a few
   seconds. No more than 3 SMTP sessions are authorized with any
   other host, which spares the slow 9600 bps line to the Internet.
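
   The way the two thresholds interact can be sketched as follows.
   This is an illustration under assumptions, not the logic of xla.c:
   the names (HostClass, incoming_decision, outgoing_decision,
   session_closed) are hypothetical, the counter lives in memory
   rather than in the sendmail.la file, and "limit reached" is taken
   to mean "active sessions >= threshold".

```python
from dataclasses import dataclass


@dataclass
class HostClass:
    """One L line: a mask, its two thresholds, and a session counter."""
    mask: str
    queue_max: int    # {queue_#}: limit checked for outgoing sessions
    refuse_max: int   # {refuse_#}: limit checked for incoming sessions
    active: int = 0   # sessions currently counted in this class


def incoming_decision(cls):
    """Refuse an incoming session once {refuse_#} is reached."""
    if cls.active >= cls.refuse_max:   # assumption: >= means "reached"
        return "refuse"
    cls.active += 1
    return "accept"


def outgoing_decision(cls):
    """Queue an outgoing message once {queue_#} is reached."""
    if cls.active >= cls.queue_max:    # assumption: >= means "reached"
        return "queue"
    cls.active += 1
    return "connect"


def session_closed(cls):
    """Release the slot of a finished session."""
    cls.active = max(0, cls.active - 1)


# The catch-all class of the customer.fr example: L* 3 3
outside = HostClass("*", queue_max=3, refuse_max=3)
print([incoming_decision(outside) for _ in range(4)])
print(outgoing_decision(outside))   # all three slots are taken
session_closed(outside)
print(outgoing_decision(outside))   # one slot has been released
```

   With {queue_#} lower than {refuse_#}, for example
   HostClass("*", 2, 4) holding two active sessions, the same
   functions queue outgoing mail while still accepting incoming
   sessions, which is the priority effect described above.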

   The drawback is that if you have three sessions of 2 megabytes
   each established from/to the outside, all other mail for that
   class is held until one slot becomes available. On a 9600 bps
   line, do the arithmetic: at about 1200 bytes per second at best,
   6 megabytes block your line for well over one hour.


DESIGN

   Sendmail analyzes the "L" lines in the configuration file during
   startup (or reads the initialized structure from the frozen file).

   When started in daemon mode (and only then), any existing working
   file is cleared and a new one is created. Each class gets a record
   in the sendmail.la work file. The size of this record is a short
   integer (generally two bytes) and it represents the count of
   active sessions in the given class. Read/write operations on this
   file are done in one operation (the size is anyway far below one
   disk sector). The file is locked with Sendmail's lockfile()
   function prior to any access.

   Handling incoming SMTP sessions.

   There is interaction at two points in the Sendmail source code.
   First on the listen system call: if all slots in all classes are
   in use, a listen(0) is done so that the system rejects any
   incoming SMTP session. This avoids forking only to reject the
   connection. If there are some free slots, there is nothing better
   to do than accept the connection, and then the fork can be done.
   The child process then checks whether the adequate class is full
   or not. If it is full, it rejects the connection with a "421 Too
   many sessions" diagnostic to the sender (which should then appear
   when the remote users do a mailq). If the threshold {refuse_#} is
   not reached, the connection is accepted and the counter in
   sendmail.la is updated.

   Handling outgoing SMTP sessions.

   As soon as Sendmail needs to connect to a distant host, the
   adequate class is checked against {queue_#} and, if no slots are
   available, the message is queued for further processing.

   Sendmail's connection caching.

   Sendmail-R6 introduces a new design: connection caching, i.e.
   several SMTP sessions can be open at the same time.
   This could cause some problems when sending mail: after a few
   connections have been opened, all slots could be in use, resulting
   in a partial delivery of the message. To deal with this, xla.c
   uses the following design: "for a given sendmail process, only the
   first connection in a given class is counted". This can be done
   because sendmail does not send messages in parallel on the
   different channels.

   End of connection.

   As soon as a connection is closed, the counters are automatically
   updated.

   Please look at the code to understand how all this works.
   Comments, suggestions and questions are welcome.


Christophe Wolfhugel
Herve Schauer Consultants
Paris, France
May 23, 1993