Hyphenation in LaTeX: why it kills + how to survive
Intro
I recently got bitten by LaTeX's hyphenation and learned a few things in return. I noticed that two other LaTeX users around me were running into similar hyphenation trouble, or were about to start a larger text document without knowing enough about hyphenation. So I would like to summarize a few essentials here.
Why care about hyphenation?
By default, LaTeX does hyphenation for you. If it gets things wrong the reader will attribute that error to you, not to LaTeX. Sometimes bad or wrong hyphenation even affects the layout of the page and makes text go into the margin area. Maybe a reader attributes that to LaTeX; however, it still doesn't look very professional.
Why hyphenation needs to be correct from the very start (and for every word)
Imagine you work on a document 20+ pages long. You pass a snapshot of the document to someone to read, get some corrections back, and start integrating them. If you modify content on page 10, hyphenation on later pages may change: it may even happen that the sets of hyphenated words before and after the change do not share a single word. In other words, as long as you keep working on the document, the places of hyphenation keep changing. That means new errors can appear that no reviewer had a chance to see before. Ouch.
Common pitfalls with hyphenation
Words containing hyphens, e.g. "well-understood"
If you use words that contain hyphens, e.g. "well-understood", LaTeX breaks them only at the existing hyphens. If the part after the hyphen is too long to fit on the line, LaTeX may even let it stick out into the margin area of the page. The following example illustrates the issue. Let's look at a document made of the text
Hyphenation is well-understood. Maybe not.
six times. Without manual work you get output as shown in the following excerpt. Focus on the right border.
If you look closely, you can see that "well-understood" at the end of the first
line goes beyond the text area. To solve the issue, the command \hyp of the
package hyphenat comes to the rescue.
\documentclass{article}
\usepackage{hyphenat}
\begin{document}
Hyphenation is well\hyp{}understood. Maybe not. [..]
\end{document}
This time the output respects the size of the text area. Again, focus on the right border.
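To try this yourself, the following is a minimal sketch that typesets both variants in one document so the right border can be compared directly. The macro names \plainsample and \hypsample are made up for this illustration, and whether the first variant actually overflows depends on the font and text width in use.

```latex
\documentclass{article}
\usepackage{hyphenat} % provides \hyp

% Hypothetical helper macros, only used for this comparison.
\newcommand{\plainsample}{Hyphenation is well-understood. Maybe not. }
\newcommand{\hypsample}{Hyphenation is well\hyp{}understood. Maybe not. }

\begin{document}

% Variant 1: literal hyphen, so the only break point is at the hyphen.
\plainsample\plainsample\plainsample\plainsample\plainsample\plainsample

% Variant 2: \hyp{} prints a hyphen but still allows the usual
% break points in the rest of the word.
\hypsample\hypsample\hypsample\hypsample\hypsample\hypsample

\end{document}
```

Lines that stick out into the margin are also reported as "Overfull \hbox" warnings in the log file, which gives a second way to spot them besides looking at the page.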
Mixing languages, e.g. English and German
If you mix two or more languages within the same document you have to tell
LaTeX which words belong to which language. (Once you get used to it, it's
bearable; there isn't much of a way around it.) Otherwise you end up with
things like German hyphenation applied to English words, i.e. wrong
hyphenation. In a case where the main text is written in language A it makes
sense to mark selected words as language B. To mark English words in an
otherwise German document I use a simple self-made command \ENG:
\usepackage{babel}
[..]
\newcommand{\ENG}[1]{\foreignlanguage{english}{#1}}
An example use would be
UnionFS ist ein \ENG{Stackable file system}.
Mapping that to other combinations is left to the reader :-)
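As a sketch of how the pieces might fit together, here is a complete minimal document for the German-with-English case. The babel options are an assumption here, since the snippet above elides them: with babel, the last language listed becomes the main language, so ngerman is the default and english is only loaded for the marked parts. The otherlanguage environment is shown as one option for longer passages.

```latex
\documentclass{article}
% Assumption: english is loaded as a secondary language;
% ngerman is listed last and therefore becomes the main language.
\usepackage[english,ngerman]{babel}

% Mark a few words as English so English hyphenation patterns apply.
\newcommand{\ENG}[1]{\foreignlanguage{english}{#1}}

\begin{document}
UnionFS ist ein \ENG{stackable file system}.

% For longer passages, babel's otherlanguage environment can be used.
\begin{otherlanguage}{english}
A longer English passage can go here.
\end{otherlanguage}
\end{document}
```

For an English document with German words, the same pattern should work with the language names swapped.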
Debugging tool \showhyphens
The command \showhyphens can be used to query all the places where LaTeX
dares to hyphenate a word (or compound word). The output, however, does not go
into the actual document but to the shell and the log file. If you feed the
following document to LaTeX
\documentclass{article}
\usepackage{hyphenat}
\begin{document}
\showhyphens{well-defined}
\showhyphens{well\hyp{}defined}
Dummy
\end{document}
you can spot this output on the shell:
...
[] \OT1/cmr/m/n/10 well-defined
...
[] \OT1/cmr/m/n/10 well-de-fined
...
In case you have a script that extracts all words used in a LaTeX document for
another view on spelling mistakes, you could combine that with the results of
\showhyphens to make a single list of all words and their hyphenation for
review.
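A minimal sketch of that idea, assuming the word list has already been extracted by some script (the words below are placeholders, not taken from a real document): \showhyphens accepts several space-separated words in one call, so a generated file can hand the whole list to LaTeX at once.

```latex
\documentclass{article}
\usepackage{hyphenat}
\begin{document}
% Placeholder word list; in practice a script would generate this line.
\showhyphens{hyphenation document professional well\hyp{}understood}
Dummy
\end{document}
```

As before, the result ends up on the shell and in the log file, from where it can be collected for review. In a document that mixes languages, the words would have to be checked with the matching language active, otherwise the list shows the wrong break points.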