
Hyphenation in LaTeX: why it kills + how to survive

Intro

I recently got bitten by LaTeX's hyphenation and learned a few things in return. I noticed that two other LaTeX users around me were running into similar hyphenation trouble, or were about to start a bigger text document without knowing enough about hyphenation. So I would like to summarize a few essentials here.

Why care about hyphenation?

By default, LaTeX does hyphenation for you. If it gets things wrong, the reader will attribute that error to you, not to LaTeX. Sometimes bad or wrong hyphenation even affects the page layout and pushes text into the margin. A reader might blame that on LaTeX, but it still doesn't look very professional.

Why hyphenation needs to be correct from the very start (and for every word)

Imagine you work on a document 20+ pages long. You pass a snapshot of the document to someone to read, get some corrections back, and start integrating them. If you modify content on page 10, the line breaks on all later pages can shift, and with them the hyphenations: it may happen that the sets of hyphenated words before and after the change do not share a single word. In other words, as long as you keep working on the document, the places of hyphenation keep changing. That means errors that no reviewer had a chance to see before. Ouch.

Common pitfalls with hyphenation

Words containing hyphens, e.g. "well-understood"

If you use words that contain hyphens, e.g. "well-understood", LaTeX breaks them only at the hyphens that are already there. If the part after the hyphen is too long to fit, LaTeX may even let the end of the word stick out into the margin area of the page. The following example illustrates the issue. Let's look at a document made of the text

Hyphenation is well-understood. Maybe not.

six times. Without manual work you get output as shown in the following excerpt. Focus on the right border.

If you look closely, you can see that "well-understood" at the end of the first line goes beyond the text area. To solve the issue, the command \hyp from the hyphenat package comes to the rescue.

\documentclass{article}
\usepackage{hyphenat}
\begin{document}
Hyphenation is well\hyp{}understood. Maybe not.
[..]
\end{document}

This time the output respects the size of the text area. Again, focus on the right border.
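For anyone who wants to reproduce this, here is a minimal self-contained sketch of such a test document. The helper macro \demotext is made up for this example and is not part of hyphenat; it only saves typing the sentence six times. Where exactly the lines break depends on your document class and fonts, so treat it as a starting point.

```latex
\documentclass{article}
\usepackage{hyphenat}
% Hypothetical helper (not from hyphenat): the test sentence plus a trailing space.
\newcommand{\demotext}{Hyphenation is well\hyp{}understood. Maybe not. }
\begin{document}
% The sentence six times, as described above.
\demotext\demotext\demotext\demotext\demotext\demotext
\end{document}
```

Swapping \hyp{} back to a plain hyphen inside \demotext lets you compare both behaviours with a one-character change.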

Mixing languages, e.g. English and German

If you mix two or more languages within the same document you have to tell LaTeX which words belong to which language. (Once you get used to it, it's bearable; there isn't much of a way around it.) Otherwise you end up with things like German hyphenation applied to English words, i.e. wrong hyphenation. In a case where the main text is written in language A it makes sense to mark selected words as language B. To mark English words in an otherwise German document I use a simple self-made command \ENG:

\usepackage{babel}
[..]
\newcommand{\ENG}[1]{\foreignlanguage{english}{#1}}

An example use would be

UnionFS ist ein \ENG{Stackable file system}.

Mapping that to other combinations is left to the reader :-)
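As one possible mapping, here is a sketch of the reverse case: an English document with a few German words marked. The command name \GER is made up for this example, and it assumes that the ngerman hyphenation patterns are installed and that babel treats the last language listed in its options as the main one.

```latex
\documentclass{article}
\usepackage[T1]{fontenc}
% Assumption: the last language listed (english) becomes the main language.
\usepackage[ngerman,english]{babel}
% Hypothetical counterpart to \ENG: marks German words in English text.
\newcommand{\GER}[1]{\foreignlanguage{ngerman}{#1}}
\begin{document}
The German word for hyphenation is \GER{Silbentrennung}.
\end{document}
```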

Debugging tool \showhyphens

The command \showhyphens can be used to query all the places where LaTeX dares to hyphenate a word (or compound word). The output, however, does not go into the actual document but to the shell and the log file. If you feed the following document to LaTeX

\documentclass{article}
\usepackage{hyphenat}
\begin{document}
\showhyphens{well-defined}
\showhyphens{well\hyp{}defined}
Dummy
\end{document}

you can spot this output on the shell:

...
[] \OT1/cmr/m/n/10 well-defined
...
[] \OT1/cmr/m/n/10 well-de-fined
...

In case you have a script that extracts all used words from a LaTeX document for another view on spelling mistakes, you could combine it with the results of \showhyphens to make a single list of all words and their hyphenation for review.
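A minimal sketch of the LaTeX side of such a review list: \showhyphens accepts several space-separated words in one call, so a generated word list can be pasted in as-is. The words below are placeholders, not the output of any real script.

```latex
\documentclass{article}
\usepackage{hyphenat}
\begin{document}
% Several words in one call; the result goes to the shell and the log file,
% not into the typeset document.
\showhyphens{hyphenation well\hyp{}understood well\hyp{}defined}
Dummy
\end{document}
```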

Further reading and sources