Justification in Heirloom Troff

Gunnar Ritter
11/11/06

Heirloom Documentation Tools
http://n-t-roff.github.io/heirloom/doctools.html

Line-by-line adjustment

When determining line breaks, troff traditionally uses a simple
method: Words are accumulated from input as long as they fit on the
current output line. Once a word consumes more space than available,
it is hyphenated. If a feasible breakpoint results, it is chosen;
otherwise the current output line ends with the previous word. If the
adjustment mode is “b” (adjust both margins) and the line is shorter
than the desired line length, interword spaces are widened to make the
line fit. The line is then printed either to intermediate output or to
the current diversion, if any. Afterwards, the process starts again.

This method works reasonably well but has two shortcomings: First,
interword spaces cannot be compressed by even the slightest amount,
even if the breakpoint obtainable by widening is unacceptable. Second,
if a line that fits perfectly is followed by a line with very loose
spacing, it might be better to move the last word of the first line to
the next one; both lines might then have less than perfect spacing,
but the result might nevertheless be more acceptable than one line
with very wide spaces.

Both problems are addressed in Heirloom troff. As usual, the default
behavior has not changed, though, so identical output will be produced
for existing documents unless they are modified accordingly.

Shrinking interword spaces

The “.minss” request specifies a minimum interword space. It is only
effective when adjusting both margins. It accepts an argument with the
same semantics as the “.ss” request, i.e. a numeric value that is
multiplied by 1/12 of the standard interword space. The space size
configured with “.ss” is taken as the optimum setting.
Thus with the default “.ss 12”, “.minss 9” specifies that spaces may
be shrunk to 75 percent.

The line breaking process is then changed such that when a word no
longer fits on the current output line, troff is allowed to shrink
interword spaces to make it fit instead of deferring the word to the
next line and expanding the interword spaces on the current one. troff
has a slight built-in preference for shrinking, so if shrinking and
expanding are equally far away from the optimum, shrinking is chosen.
If a line can be set with the optimum setting, no shrinking is
performed.

[Sample: Standard adjustment settings]
[Sample: Shrink to 67% (.minss 8)]

While the second setting is certainly not perfect, it is much better
than the first one.

Paragraph-at-once adjustment

Adjusting paragraph-at-once distributes the word spaces more evenly:

    Line-by-line adjustment (.ad b)

        Harmony, liberal
        intercourse with
        all nations, are
        recommended by
        policy, humanity,
        and interest.
        But even our com-
        mercial policy
        should hold an
        equal and impar-
        tial hand; nei-
        ther seeking nor

    Paragraph-at-once adjustment (.ad p)

        Harmony, liberal
        intercourse with
        all nations, are
        recommended by
        policy, humani-
        ty, and interest.
        But even our
        commercial poli-
        cy should hold an
        equal and impar-
        tial hand; nei-
        ther seeking nor

To address the problem of an unnecessarily loose line, it is obviously
necessary to look ahead to following text.
Actually, the best solution may involve multiple lines: the line with
sufficiently tight spacing might occur several lines before the loose
one, and each line in between simply starts one word earlier but
contains the same number of words.

For this reason, troff collects the words of an entire paragraph and
computes optimal breakpoints when the paragraph is ended by the next
request causing a break. Breakpoints are considered optimal if all
interword spaces in the paragraph are as close to the optimum setting
as possible. Once the optimal breakpoints have been computed, the
resulting lines are output. At this time, traps become effective. When
the entire paragraph has been printed, execution continues with the
request that initially caused the break at the end of the paragraph.

Paragraph-at-once adjustment is enabled per paragraph with “.ad p”;
the forms “.ad pc”, “.ad pl”, and “.ad pr” are also supported and
apply the method to centered, left-adjusted, and right-adjusted text,
respectively.

The request “.padj” globally enables paragraph-at-once adjustment
across all environments; it is especially useful for converting
existing documents to this mode.

Paragraph-at-once adjustment is compatible with almost all existing
troff code. Most importantly, it works in combination with the tbl,
eqn, refer, and pic preprocessors as well as with the standard “-mm”,
“-ms”, “-me”, and “-man” macro sets.

The “.in”, “.ti”, and “.ll” requests should only be used to set
indenting and line length for an entire paragraph. If they are used
within a paragraph, breakpoints must be recomputed, and previous
breakpoints are suboptimal. Documents that use such methods, e.g. for
inline pictures, should be adapted to achieve optimum results with
paragraph-at-once adjustment.
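
The requests described so far might be combined as in the following
sketch. The numeric values and the choice of indent and line length
are illustrative assumptions, not recommendations; only the request
names and the relation between “.ss 12” and “.minss 9” are taken from
the text above.

```troff
.\" A sketch with illustrative values.
.\" Optimum interword space (the default setting).
.ss 12
.\" Interword spaces may shrink to 9/12 = 75 percent of the optimum.
.minss 9
.\" Set indent and line length before the paragraph begins,
.\" not inside it.
.in 0
.ll 20n
.\" Adjust both margins, computing the breakpoints for the
.\" whole paragraph at once.
.ad p
Harmony, liberal intercourse with all nations, are recommended by
policy, humanity, and interest.
.\" A request causing a break ends the paragraph; the optimal
.\" breakpoints are then computed and the lines are output.
.br
```

For an existing document, a single “.padj” near its beginning may be
a simpler starting point than changing each “.ad” request.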
Since positions on the output line are not computed until the entire
paragraph has been collected, the “.k” and “.x” number registers
cannot contain meaningful values in paragraph-at-once adjustment mode.
Macros that test “.k” only to determine if there is text present will
work, though, since it is ensured that “.k” is never zero in this
case.

[Sample: Standard adjustment settings]
[Sample: Allow word spaces to be shrunk to 83%]
[Sample: Adjust paragraph-at-once and allow word spaces to be shrunk
to 83%]

The number of the current page in the “%” register can be lower than
the number of the page on which the current input word will actually
be printed in paragraph-at-once adjustment mode. Thus, when preparing
words for indexing, for example, it is not possible to associate them
with page numbers at the time the input is read. Output-line traps
have been introduced to address this issue: A “\P[xx]” in input is
passed through all formatting and diversion processing along with the
word it has been attached to.
When the line containing it has actually been printed, the macro “xx”
is executed. The behavior is then similar to a page trap. Multiple
output-line traps may occur on a single line.

An index macro can use this mechanism to defer the processing of an
index term until after the position of the word it refers to has been
determined:

    .nr IXcount 0 1
    .de IX
    .	de IX-\\n+[IXcount]
    .		write index \\\\n% \\$1
    \\..
    \\P[IX-\\n[IXcount]]\c
    ..
    An
    .IX "index term"
    index term is contained in this sample text.

This example macro takes the index term as a single argument. It
creates a separate macro on each invocation and prepends an
output-line trap calling it to the following word. The created macro
then prints the current page number (processed in this macro, thus
preceded by four backslashes) and the index term argument (processed
in the surrounding macro, thus preceded by two backslashes).

Microtypography

To further enlarge the range available for adjustment while reducing
the amount by which interword spaces are affected, troff also allows
varying the size of interletter spaces and the shape of glyphs with
the “.letadj” request. This process is called “microtypography”.

Microtypography must be applied with care. While the eye is accustomed
to varying interword spaces, which leave the individual words intact,
varying interletter spaces and letter shapes distort the typeface as
soon as they become noticeable. This is best demonstrated by using
them as an exclusive adjustment mechanism:

[Sample: Adjusting by letter spacing only (.letadj 96 100 12 110 100)]
[Sample: Adjusting by glyph reshaping only (.letadj 100 96 100 110)]

In combination with adjustment of interword spaces, and if applied
with rather strict limits, microtypography can have positive effects,
though. This is especially true when lines are short; the sample text
used so far can be formatted acceptably only if shrinking of interword
spaces, paragraph-at-once adjustment, and microtypography are all
combined:

[Sample: Paragraph-at-once adjustment, word spacing 75% to 150%,
letter spacing 95% to 105%, no glyph reshaping
(.letadj 95 100 18 105 100)]
[Sample: Paragraph-at-once adjustment, word spacing 75% to 150%,
letter spacing 95% to 105%, glyph reshaping 98% to 102%
(.letadj 95 98 18 105 102)]

For layouts with longer lines than in this example, best results are
normally achieved with even smaller ranges for letter spacing and
glyph reshaping.
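
The settings behind the last combined sample might look like the
following sketch. The reading of the five “.letadj” arguments
(minimum letter spacing, minimum glyph width, maximum interword space
in “.ss” units, maximum letter spacing, maximum glyph width) is an
assumption inferred from the sample captions, and “.minss 9” is
assumed to be the setting that yields the 75% lower limit for word
spacing.

```troff
.\" A sketch; the argument meanings are inferred from the captions.
.\" Optimum interword space (the default setting).
.ss 12
.\" Word spaces may shrink to 9/12 = 75 percent (assumed setting).
.minss 9
.\" Adjust both margins, paragraph-at-once.
.ad p
.\" Letter spacing 95% to 105%, glyph reshaping 98% to 102%,
.\" word spaces may grow to 18/12 = 150 percent.
.letadj 95 98 18 105 102
```

Narrowing the ranges, as suggested for longer lines, would then mean
moving the first, second, fourth, and fifth values closer to 100.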

Penalties for line breaks and hyphenation

It is possible to tell troff that a line break after a certain word
(or hyphenated word part) is preferred or discouraged by writing
“\j'N'”. A positive N makes a breakpoint less likely, a negative N
makes it more likely.

By default, troff hyphenates words whenever necessary in
paragraph-at-once mode to minimize the adjustment. To reduce the
number of hyphenations, hyphenation penalties can be configured. Then
whenever a breakpoint involves a hyphenation, it is treated as less
optimal, and another breakpoint that does not require a hyphenated
word may be preferred even though more adjustment may become
necessary.

Additional penalties can be configured for breakpoints that involve
two successive hyphens, and for hyphenating the last word of a
paragraph.

The “.hypp” request takes the single hyphen penalty as its first
argument, the penalty for consecutive hyphens as its second, and the
penalty for hyphenating the last word as its third. Effective
penalties are between 10 and 200.

Hyphenation penalties only make hyphens less likely but do not limit
them forcibly. The “.hlm” request imposes a strict limit on the number
of consecutive hyphens. It causes a certain breakpoint to be disabled
completely and can thus result in a non-optimal adjustment. It is
recommended to use it in combination with “.spreadwarn” to detect such
problems.

Paragraph shapes

The standard requests for setting indent and line length cannot be
used inside a paragraph in paragraph-at-once adjustment mode. It is
possible, however, to define the shape of an entire paragraph
line-by-line with the “.pshape” request. It takes a list of indent and
line length pairs as arguments; the first pair applies to the first
line of the paragraph, the second pair to the second line, and so
forth.
If the paragraph has more lines than pairs are given, the last pair is
used for the remaining lines; if it has fewer lines, the excess pairs
are discarded. A paragraph shape is applied to a single paragraph
only; it overrides the standard indent, temporary indent, and line
length settings, of which indent and line length become effective
again for the next paragraph.

[In the typeset document, the following three paragraphs are set as a
single paragraph in the shape of a circle with a hole.]

For example, it is possible to create a paragraph whose shape forms a
circle. To create a holey shape like this, define an indent and line
length pair for each contiguous part and use traps to move the
resulting lines in vertical direction to the desired position. This is
best done in a diversion so that the whole structure is kept together;
diversion traps are the mechanism of choice then.

Admittedly, creating a circle with a hole is hardly a serious
application of the “.pshape” request. Complicated shapes almost always
require careful wording of the content, so paragraph-at-once
formatting is only a limited aid when creating them. But “.pshape” is
also needed to flow text around an image, even if it simply has a
rectangular shape.

If you know the “\parshape” command from TeX, note that the indent is
included in the line length in troff, so you have to add every first
value to every second one for reusing such shapes.

Notes

For paragraph-at-once adjustment, troff uses a variation of the
algorithm originally developed by Donald Knuth and Michael Plass for
the TeX system [2]. The criteria for the quality of a line differ:
There is no explicit stretchability setting, and the total
shrinkability is used to determine whether a breakpoint is feasible,
but not for computing its optimality. troff currently has a slight
preference for tight lines.
It might make sense to make this configurable, but it seems that the
fact that a line may be stretched or shrunk by a large amount does not
necessarily indicate that doing so is optimal.

troff does not generate “overfull boxes”, i.e. unadjustable text
extending beyond the margin, unless the width of a single word exceeds
that of the line. Its warning mechanism can report unacceptable
adjustments.

troff makes no use of fitness classes and does not prefer to group
lines of similar non-optimal spacing. This is because doing so may
lead to a more even appearance of the lines of a paragraph, but at the
expense of a less even appearance in the context of the whole
document. For example, consider the case of multiple consecutive lines
with loose spacing: If these lines are viewed in isolation, their
spacing looks harmonious. If they are viewed as part of a document,
they look brighter than the rest. It is not clear how to solve this
without optimizing the spacing globally for a document, which is not a
realistic option.

Breakpoints that might occur at different lines are currently not
evaluated separately for each such line. Future evaluation may
indicate that doing so is necessary in practice.

troff implements “microtypography” in a way similar to that described
by Hàn Thế Thành for TeX [3]. In particular, it performs a function
like “level 2 font expansion” (p. 70), i.e. it considers the
possibility of shrinking interletter spaces and character shapes when
computing breakpoints. Both stretchability and shrinkability are taken
into account for computing the optimality of a breakpoint, but only as
far as the width of the possible line is concerned; the percentage
adjustment limits do not influence optimality.

The sample text is an excerpt from George Washington’s 1796 Farewell
Address. It was chosen for this purpose because of the examples in
James Felici’s Complete Manual of Typography [1].

References

[1] J. Felici, The Complete Manual of Typography, Berkeley, CA, 2003,
    pp. 147–149.

[2] D. E. Knuth, M. F. Plass, “Breaking paragraphs into lines”,
    Software—Practice and Experience, Vol. 11, Issue 12 (1981),
    pp. 1119–1184; also in D. E. Knuth, Digital Typography, Stanford,
    1999 (CSLI lecture notes no. 78), pp. 67–155.

[3] Hàn Thế Thành, Micro-typographic extensions to the TeX typesetting
    system, Masaryk University Brno, 2000.