The Brittleness of the SSL/TLS Certificate System

Despite the time and inconvenience Heartbleed has cost the industry, its impact does provide some impetus for examining the underlying certificate hierarchy. (As a historical example, in the wake of CA certificate misissuances, the industry looked at one set of flaws: any one of the many trusted CAs can issue certificates for any site, even if the owner of that site hasn't asked it to; that link is also a quick primer on the certificate hierarchy.)

Three years later, one outcome of the uncertainty around Heartbleed - that the private key behind any certificate on an OpenSSL server *might* have been compromised - is the mass revocation of thousands of otherwise valid certificates.  But, as Adam Langley has pointed out, the revocation process hasn't really worked well for years, and it isn't about to start working any better now.

Revocation is Hard

The core of the problem is that revocation wasn't designed for an epochal event like this; it has never really had the scalability to deal with more than a small number of actively revoked certificates.  The original revocation model was organized around each CA publishing a certificate revocation list (CRL): the list of all non-expired certs the CA would like to revoke.  In theory, a user's browser should download the CRL before trusting the certificate presented to it, and check that the presented certificate isn't on the CRL.  In practice, most don't, partly because HTTPS isn't really a standalone protocol: it is the HTTP protocol tunneled over the TLS protocol.  The signaling between these two protocols is limited, so the revocation check must happen inside the TLS handshake.  That makes it a performance challenge for the web, because the browser has to wait for a CA response before it continues communicating with the web server.
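To make the CRL model concrete, here is a minimal sketch in Python of the check a browser is supposed to perform. The `Crl` type, the CA name, the serial numbers, and the date are all hypothetical stand-ins; a real client would also have to download the list, parse it, and verify the CA's signature on it, none of which is shown here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Crl:
    """Toy stand-in for a parsed CRL (hypothetical type, not a real library's)."""
    issuer: str
    next_update: datetime        # after this time the list must be re-downloaded
    revoked_serials: frozenset   # serial numbers of every revoked, non-expired cert


def is_revoked(crl: Crl, issuer: str, serial: int, now: datetime) -> bool:
    """Return True if `serial` appears on the issuing CA's revocation list."""
    if crl.issuer != issuer:
        raise ValueError("CRL was published by a different CA")
    if now > crl.next_update:
        raise ValueError("CRL is stale; a fresh copy must be downloaded")
    return serial in crl.revoked_serials


# Illustrative values only.
now = datetime(2014, 4, 20, tzinfo=timezone.utc)
crl = Crl("Example CA", now + timedelta(days=1), frozenset({0x1001, 0x1002}))

print(is_revoked(crl, "Example CA", 0x1001, now))  # a revoked serial
print(is_revoked(crl, "Example CA", 0x2001, now))  # a serial not on the list
```

The lookup itself is trivial. The cost is in everything around it: the browser needs the complete, current list from the CA before it can answer a question about one certificate.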

CRLs are a problem not only for the browser, which has to pull the entire CRL when it visits a website, but also for the CA, which has to deliver the entire CRL even when a user visits just one site.  This led to the development of the Online Certificate Status Protocol (OCSP).  OCSP allows a browser to ask a CA "Is this specific cert still good?" and get an answer "That certificate is still good (and you may cache this message for 60 minutes)."  Unfortunately, while OCSP is a huge step forward from CRLs, it still leaves in place the need to not only trust *all* of the possible CAs, but also make a real-time call to one during the initial HTTPS connection.  As Adam notes, the closest thing we have in the near term to operationally "revocable" certs might be OCSP-Must-Staple, in which the OCSP response (signed by the CA) is actually sent to the browser from the HTTPS server alongside the server's certificate.
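The "you may cache this message for 60 minutes" part of that exchange can be sketched as a small time-limited cache. This is only an illustration in Python: `OcspCache` and `fake_responder` are hypothetical names, the responder is a stub rather than a real OCSP client, and signature checking on the response is omitted.

```python
from typing import Callable, Dict, Tuple


class OcspCache:
    """Remember each per-certificate answer until its cache lifetime runs out."""

    def __init__(self, ask_ca: Callable[[int], str], ttl_seconds: float = 60 * 60) -> None:
        self._ask_ca = ask_ca            # the real-time call to the CA's responder
        self._ttl = ttl_seconds          # 60 minutes, as in the example answer
        self._entries: Dict[int, Tuple[str, float]] = {}

    def status(self, serial: int, now: float) -> str:
        cached = self._entries.get(serial)
        if cached is not None and now < cached[1]:
            return cached[0]             # still fresh: no network call needed
        answer = self._ask_ca(serial)    # cache miss or expired: ask the CA again
        self._entries[serial] = (answer, now + self._ttl)
        return answer


calls = []


def fake_responder(serial: int) -> str:
    """Stub standing in for the CA; records each query it receives."""
    calls.append(serial)
    return "good"


cache = OcspCache(fake_responder)
cache.status(0x1001, now=0.0)      # first visit: asks the CA
cache.status(0x1001, now=1800.0)   # 30 minutes later: served from cache
cache.status(0x1001, now=3600.0)   # lifetime elapsed: asks the CA again
print(len(calls))
```

Every cache miss is still a blocking call to the CA in the middle of connection setup. Stapling moves that call to the web server, which fetches the signed response ahead of time and hands it to the browser itself.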

One Possible Future

A different option entirely might be to move to DANE (DNS-based Authentication of Named Entities).  In DANE, an enterprise places a record into its DNS zone file which specifies the exact certificate (or set of certificates, or CA which can issue certificates) which is valid for a given hostname.  This record is then signed with DNSSEC, and a client would then only trust that specific certificate for that hostname. (This is similar to, but slightly more scalable than, Google's certificate pinning initiative.)
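As a rough sketch of the client-side comparison, the Python below checks a presented certificate against the fields of a DANE record (a TLSA record, per RFC 6698). The selector and matching-type values follow that RFC; the function name and the placeholder byte strings are assumptions for illustration. It assumes the record has already been validated with DNSSEC, and it ignores the record's certificate usage field, which says whether the data pins the server's own certificate or an issuing CA.

```python
import hashlib
import hmac

# Selector values from RFC 6698: which part of the certificate is compared.
FULL_CERT, SPKI = 0, 1
# Matching-type values from RFC 6698: how that part is represented in DNS.
EXACT, SHA256, SHA512 = 0, 1, 2


def tlsa_matches(selector: int, matching_type: int, association_data: bytes,
                 cert_der: bytes, spki_der: bytes) -> bool:
    """Compare a presented certificate against one TLSA record's data.

    `cert_der` is the DER-encoded certificate and `spki_der` its DER-encoded
    SubjectPublicKeyInfo; extracting the latter is left to the caller.
    """
    if selector == FULL_CERT:
        selected = cert_der
    elif selector == SPKI:
        selected = spki_der
    else:
        raise ValueError("unknown selector")

    if matching_type == EXACT:
        candidate = selected
    elif matching_type == SHA256:
        candidate = hashlib.sha256(selected).digest()
    elif matching_type == SHA512:
        candidate = hashlib.sha512(selected).digest()
    else:
        raise ValueError("unknown matching type")

    return hmac.compare_digest(candidate, association_data)


# Placeholder bytes stand in for real DER data.
cert = b"placeholder certificate bytes"
spki = b"placeholder public key bytes"
published = hashlib.sha256(cert).digest()   # what the zone owner would publish

print(tlsa_matches(FULL_CERT, SHA256, published, cert, spki))
print(tlsa_matches(FULL_CERT, SHA256, published, b"some other cert", spki))
```

Under this model, replacing a compromised certificate amounts to publishing a new record in the zone, so the domain owner controls what clients will accept without going through a CA's revocation machinery.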

DANE puts more trust into the DNSSEC hierarchy, but removes all trust from the CA hierarchy.  That might be the right tradeoff.  Either way, the current system doesn't work and, as Heartbleed has made evident, doesn't meet the web's current or future needs.

(Footnote:  This post doesn't discuss Certificate Transparency or HSTS, both of which are somewhat orthogonal to this problem.)

This entry is crossposted at blogs.akamai.com.