TLS fatal Alert error on SMTP and Certificate failures

Problem Description

Incoming SMTP connections are failing with a “TLS fatal alert” error.
Also, sympl-ssl says the certificate has expired when I’ve just renewed it.
The web browser says the certificate is fine!

Any Error Messages

In /var/log/exim4/mainlog, when a remote incoming SMTP connection is attempted

... (gnutls_handshake): A TLS fatal alert has been received

This is what happens when I renew certificates (I did this because I thought the TLS error was due to an expired certificate):

sympl@h1:/srv/treewind.co.uk/config$ sudo sympl-ssl -v treewind.co.uk
* Examining certificates for treewind.co.uk
	No valid certificate sets found.
	Fetching a new certificate from LetsEncrypt.
	Requesting verification for treewind.co.uk from https://acme-v02.api.letsencrypt.org/directory
	Successfully verified treewind.co.uk
	Requesting verification for www.treewind.co.uk from https://acme-v02.api.letsencrypt.org/directory
	Successfully verified www.treewind.co.uk
	Successfully fetched new certificate and created set 0
	Rolled over to SSL set 0
sympl@h1:/srv/treewind.co.uk/config$ sudo sympl-ssl -v treewind.co.uk
* Examining certificates for treewind.co.uk
	SSL set 0: Not valid for treewind.co.uk -- certificate has expired (10)
	SSL set 0: Not valid for treewind.co.uk -- certificate has expired (10)
	Current SSL set 0: signed by /C=US/O=Let's Encrypt/CN=R3, expires 2021-12-30 14:52:36 UTC
	SSL set 0: Not valid for treewind.co.uk -- certificate has expired (10)
	SSL set 0: Not valid for treewind.co.uk -- certificate has expired (10)
	The current set is no longer valid for this domain.
	No valid certificate sets found.
	Fetching a new certificate from LetsEncrypt.


Environment

  • Sympl Version [10.0]:
  • Sympl Testing Version? [No]
  • Debian Version [Bullseye]:
  • Hardware Type? [Virtual]
  • Hosted On? [Bitfolk]

Looks like there are two distinct but related problems here, both probably related to the retiring of the old Let’s Encrypt certificates…

I’m not seeing the SMTP problem on any of my test machines, and I can send and receive mail normally on them at the moment, but you may want to check that your client/OS isn’t affected and that the server’s ca-certificates package is up to date, especially if you’ve upgraded the server rather than doing a clean install.

The sympl-ssl failure looks to be a bug in the way the code checks the issued certs. The signing is a little complex now, and the old ‘DST Root CA X3’ cert has expired. The old sympl-ssl code looks to be flagging the whole thing as invalid because a second root cert, signed by that old expired root, is also being sent for compatibility.
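To make the difference concrete, here is a toy model in Python. It is not the sympl-ssl or Ruby OpenSSL code, just a sketch of two ways a checker might walk the chain. The `Cert` class, the function names `strict_path_ok` and `trusted_first_ok`, and most of the dates are illustrative assumptions; only the DST Root CA X3 expiry and the leaf expiry come from this thread.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Cert:
    subject: str
    issuer: str
    not_after: datetime


def utc(year, month, day):
    return datetime(year, month, day, tzinfo=timezone.utc)


def strict_path_ok(sent_chain, trust_store, now):
    """Follow the chain exactly as sent, append the root that signed the
    last cert, and fail if anything on that path has expired."""
    path = list(sent_chain)
    root = trust_store.get(path[-1].issuer)
    if root is None:
        return False
    path.append(root)
    return all(cert.not_after > now for cert in path)


def trusted_first_ok(sent_chain, trust_store, now):
    """Stop as soon as a cert's issuer is an unexpired trusted root,
    ignoring any extra certs sent after that point."""
    for cert in sent_chain:
        if cert.not_after <= now:
            return False
        anchor = trust_store.get(cert.issuer)
        if anchor is not None and anchor.not_after > now:
            return True
    return False


# Chain as sent by the server: leaf, intermediate, cross-signed extra cert.
# Dates other than the leaf expiry are illustrative.
sent_chain = [
    Cert("treewind.co.uk", "R3", utc(2021, 12, 30)),
    Cert("R3", "ISRG Root X1", utc(2025, 9, 15)),
    Cert("ISRG Root X1", "DST Root CA X3", utc(2024, 9, 30)),
]

# Local trust store, keyed by subject. The DST root expired on 2021-09-30.
trust_store = {
    "ISRG Root X1": Cert("ISRG Root X1", "ISRG Root X1", utc(2035, 6, 4)),
    "DST Root CA X3": Cert("DST Root CA X3", "DST Root CA X3", utc(2021, 9, 30)),
}

now = utc(2021, 10, 2)
print("strict path check: ", strict_path_ok(sent_chain, trust_store, now))
print("trusted-first check:", trusted_first_ok(sent_chain, trust_store, now))
```

In this model the strict check ends up at the expired DST root and rejects everything, while the trusted-first check stops at the current root and never looks at the extra cert.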

Browsers and most other things should be fine and ignore the extra path, but sympl-ssl is being rather picky. I’ll have a look and see if I can push a fix for this as a priority.

As it stands, the certs being picked up are fine (and they will be refreshed each night at the moment), but you should avoid running sympl-ssl manually: it’ll consider all the certs invalid at present, so you’ll end up hitting rate limits with Let’s Encrypt.

Agreed, it seems a separate problem. I have reached a rate limit for one domain, but there’s a good cert there so it shouldn’t be a problem.

I got a message saying the server needed a reboot (I think that usually means a kernel upgrade.) I’ve done that now and need to check whether the SMTP problem is still happening.

This from SANS Newsbites may be the problem

Let’s Encrypt Root Certificate Expiration Causes Problems

(September 30, 2021)

A Let’s Encrypt root certificate expired, disrupting some popular websites and services. There has been advance warning that the IdenTrust DST Root CA X3 certificate would expire on September 30.

Editor’s Note

[Pescatore]
Certificate management has long been overlooked – expired certificates are a continual source of self-inflicted denial-of-service attacks. This used to be just an internet-facing web server problem, but the increased use of SSL everywhere (both internally and with more than browser to server connections) it becomes more critical. Discovering what certificates are in use and when they will expire is the first step – should be considered a required function within asset inventory and vulnerability management processes.

[Neely]
While you’ve been focused on getting all your sites to be TLS only, and implementing processes and automation to keep those current, don’t overlook the processes needed to keep your root certificate stores current. While you’re working to judiciously apply patches such as browser and OS updates which include updated certificates, don’t overlook application server/service updates which may also include local root certificate stores.

[Frost]
This Let’s Encrypt issue is a good lesson in how vendors and manufacturers think about technology. Deploying certificates is a great and helpful idea. The part of the challenge that I believe many companies or technologists miss is day 2. How do you handle updates for maintenance items on devices that are not general-purpose computers? TVs, printers, light bulbs, Internet connected toasters? How do you revoke or update a certificate on a printer? An intermediary certificate with ten years of life on a device with a ten-year life span is ideal. Using that same certificate on a device created six months ago? Probably not ideal unless you can update it.

Read more in:

  • www.zdnet.com: Fortinet, Shopify and more report issues after root CA certificate from Lets Encrypt expires
  • www.theregister.com: Xero, Slack suffer outages just as Let’s Encrypt root cert expiry downs other websites, services

Thank you - that fits perfectly with the timing of this problem starting.
However, it’s not clear to me what I can do about it.
I can send mail, and no others using my server for SMTP have reported a problem yet.
I have one user unable to connect for SMTP because of this. He’s sending automated emails from a Red Hat server via a firewall which seems to be doing some sort of NAT. Does he need to do some kind of security upgrade?
After my server reboot yesterday, the problem persists.

Also, is there a way to get more debugging info on the exact cause of the “TLS fatal alert”?
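One way to get more detail might be to reproduce the handshake from the client side, since the alert in the Exim log is sent by the remote end. Below is a minimal Python sketch, assuming the server offers STARTTLS on port 25. The function name `probe_starttls` and its `cafile` parameter are my own; pointing `cafile` at a different CA bundle is a rough way to imitate a client with an out-of-date trust store.

```python
import smtplib
import ssl


def probe_starttls(host, port=25, cafile=None, timeout=10):
    """Attempt an SMTP STARTTLS handshake and describe the outcome.

    With cafile=None the system trust store is used; otherwise only the
    CA certificates in that file are trusted.
    """
    context = ssl.create_default_context(cafile=cafile)
    try:
        with smtplib.SMTP(host, port, timeout=timeout) as smtp:
            smtp.ehlo()
            smtp.starttls(context=context)
            # After STARTTLS the socket is an SSLSocket.
            return "ok: negotiated " + str(smtp.sock.version())
    except ssl.SSLCertVerificationError as exc:
        # Raised when the client rejects the server's certificate chain.
        return f"verify failed: {exc.verify_message} (code {exc.verify_code})"
    except (ssl.SSLError, smtplib.SMTPException, OSError) as exc:
        # Other TLS, SMTP or connection-level failures.
        return f"failed: {exc!r}"
```

Calling something like `print(probe_starttls("mail.example.org"))` from the affected machine (the hostname is a placeholder) should show whether the client is rejecting the chain and why. It only shows the client's view, not what Exim itself logged.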

So, I wondered if this affected me and here’s the result of the detective work:

So the DST_Root_CA_X3.crt has expired and things should be using ISRG_Root_X1.pem. All the Debian systems I’ve checked have this installed (buster, bullseye and RPi OS).

There is a posting from Red Hat on the topic, which seems to contain updated certificate packages that your user needs to install.

So, after some late night/early morning investigation, the problem with Sympl is down to the Ruby OpenSSL library and the code which checks whether certs are valid. It sees the extra cross-signed certificate, associates it with the expired root cert, and considers the whole chain expired, even though the cert itself hasn’t expired.

There’s a workaround for this which involves removing the extra intermediate/chain from config/sets/current/ssl.bundle, which isn’t particularly tidy, but may be the best option in the short term if a proper solution can’t be found.
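For anyone wanting to try that by hand, here is a rough Python sketch of the idea. It assumes ssl.bundle is a plain PEM file with the normal intermediate first and the extra cross-signed cert after it, which is worth confirming by inspecting the file before changing anything. The helper names `split_pem_bundle` and `keep_first_certs` are made up for this example.

```python
import re

# Matches one PEM certificate block, including its BEGIN/END lines.
PEM_BLOCK = re.compile(
    r"-----BEGIN CERTIFICATE-----.*?-----END CERTIFICATE-----",
    re.DOTALL,
)


def split_pem_bundle(text):
    """Return the certificate blocks found in a PEM bundle, in order."""
    return PEM_BLOCK.findall(text)


def keep_first_certs(text, keep=1):
    """Return a bundle containing only the first `keep` certificates.

    If the bundle has no more than `keep` certificates it is returned
    unchanged.
    """
    blocks = split_pem_bundle(text)
    if len(blocks) <= keep:
        return text
    return "\n".join(blocks[:keep]) + "\n"
```

To use it, read the contents of config/sets/current/ssl.bundle, pass them through `keep_first_certs`, and write the result back, keeping a backup copy of the original file first.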


All of my certs seem to have been updated @ 0650 this am…

I did wonder why all my certificates got updated this morning!

Unfortunately fixing this properly will involve quite a bit of work, so I’ve pushed a short term fix for Sympl now (sympl-core x.20211003.0) which simply removes the affected intermediate cert when it’s following the normal intermediate.

The side effect of this is that if a device supports the old DST root certificate but not the current ones (i.e. it hasn’t had an update for >5 years) and is fine with expired root certificates (which is a very bad idea), then it’ll now not like things signed with the updated cleaner certificate. Overall, this should be a very small subset of devices, and it means those which would otherwise dislike the extra invalid intermediate will now work as expected again.
