SSL Certificates

The SSL/TLS protocol, the 's' in 'https', underpins secure communication all over the internet. SSL certificates are small data files that cryptographically bind a web server's identity to its public key, so that a browser can establish an encrypted link with the right server. This post explores the role of SSL certificates in trusted and secure communication.

Anish Koulgi

08 January 2021 • 15 mins read

Cyber Security

Cryptography

Math

Introduction

HTTPS Protocol

You might have noticed that most website URLs today start with 'https://'. What is the 's' in 'https'? The Hypertext Transfer Protocol (HTTP) is an application-layer protocol for transmitting hypermedia documents, such as HTML. It was designed primarily for communication between web browsers and web servers. In terms of security, plain HTTP may be acceptable for reading public pages, but it becomes a serious problem when you enter sensitive information into form fields on a website. This information is transmitted as cleartext and can be read by anyone on the network path.
So, to transmit sensitive data securely, HTTPS was introduced. The 's' in 'https' stands for 'secure': HTTPS is HTTP carried over the Transport Layer Security (TLS) protocol, the successor to Secure Sockets Layer (SSL).

Importance of HTTPS

HTTPS prevents a website's traffic from being read by anyone on the network. The data in HTTPS is encrypted and cannot be viewed as plain text.
You might have seen random advertisements pop up on HTTP websites and redirect you to other websites. On websites without HTTPS, Internet service providers (ISPs) or other intermediaries can inject content into webpages without the consent of the website owner. This commonly takes the form of advertising, where ISPs looking to increase revenue inject paid advertising into the webpages their customers visit. HTTPS removes the ability of untrusted third parties to tamper with the web content.
Finally, HTTPS guarantees the authenticity of the web server by validating it. Imagine going to www.google.com. Your computer (assume IP 172.110.5.1) sends a request to the Google server, which is at, let's assume, IP 110.54.4.1. What if someone (assume IP 154.72.122.5) on the path from 172.110.5.1 to 110.54.4.1 intercepts your request and replies 'I am 110.54.4.1', which is the same as saying 'I am the Google server'? How can you check that you are actually talking to the Google server and not to some stranger in the middle? This type of attack is called a Man In The Middle (MITM) attack. HTTPS solves this by using what are known as SSL certificates.

In this post, we'll discuss how your browser (the client) verifies that the other party (the server) it is talking to is actually legitimate. This verification process is called Authentication in SSL/TLS.

Pre-requisites

There are a few pre-requisites in order to fully understand this process and we'll go over these in brief.

Public Key Cryptography (PKC)

Public Key Cryptography, also called Asymmetric Key Cryptography, uses two different keys to encrypt communications between two parties:

Private key - this key is controlled by the owner of a website and it’s kept private. This key lives on a web server and is used to decrypt information encrypted by the public key.
Public key - this key is available to everyone who wants to interact with the server in a way that’s secure. Information that’s encrypted by the public key can only be decrypted by the private key.

The key point to note here is that a message encrypted with a public key can be decrypted only with its corresponding private key, and vice versa, because the two keys are mathematically related. The mathematics of PKC is outside the scope of this post.

PKI Image
So, for example, if Alice wants to send a message to Bob, she encrypts it with Bob's public key, which is available to all users on the network. The encrypted message can now be decrypted only with Bob's private key. Once Bob receives the message, he decrypts it with his private key, which only he knows.
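To make the Alice and Bob exchange concrete, here is a minimal sketch using textbook RSA with tiny primes. The primes (61 and 53), the exponent 17 and the helper names are illustrative assumptions, not anything a real server would use, and `pow(e, -1, phi)` needs Python 3.8 or later.

```python
# Textbook RSA with tiny primes: for illustration only, never for real use.

def make_keypair(p, q, e=17):
    """Build a toy key pair from two primes p and q."""
    n = p * q                      # modulus shared by both keys
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)            # private exponent: inverse of e modulo phi
    return (e, n), (d, n)          # (public key, private key)

def encrypt(message, public_key):
    e, n = public_key
    return pow(message, e, n)

def decrypt(ciphertext, private_key):
    d, n = private_key
    return pow(ciphertext, d, n)

bob_public, bob_private = make_keypair(61, 53)

message = 65                                  # Alice's message, encoded as a number below n
ciphertext = encrypt(message, bob_public)     # Alice uses Bob's public key
recovered = decrypt(ciphertext, bob_private)  # only Bob's private key undoes it
print(ciphertext, recovered)
```

Real systems use keys of 2048 bits or more together with padding schemes; the sketch only shows that the two keys undo each other.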

Cryptographic Hash Functions

Hashing is the transformation of a string of characters into a usually shorter, fixed-length value or key that represents the original string. Hashing is used to index and retrieve values because searching by the shorter hashed key is faster than searching by the original value. The hashing algorithm is called a hash function.
Cryptographic hash functions transform data of arbitrary length into fixed-size hash values. These hash values are called digests. It is easy to generate the hash of the data, but very difficult to recover the data from the hash. Thus, these functions are called one-way functions, that is, functions which are practically infeasible to invert. Ideally, a cryptographic hash function should have these characteristics:

Collisions should be practically impossible to find. Because arbitrarily long inputs map to a fixed-size output, two different strings can in principle share a hash, but it must be computationally infeasible to find such a pair.
A small change in the data should change the hash completely, i.e. if s1 = "This is my data." and s2 = "This is my data", then hash(s1) should be completely different from hash(s2). Even if only one character changes in the first string, the new hash should bear no visible relation to the old one.
For example, let s1 = "This is a beautiful sentence" and s2 = "This is a beautiful sentence.". The SHA-256 hash of s1 is 8858020974e9901cb223501187eb20286a6970c965a8a2f6bb2f4da8fb2820c9 and that of s2 is b94e9ea6de8bb46f4e58165e964352028714b555c854e04b206ebe08624d1986. As you can see, even the addition of a single '.' (full stop) completely changes the hash value. You can play with this function with your own examples here.
The hash value of a particular string must be deterministic. This means that hashing a string s1 with the same function always gives the same result, no matter how many times or when it is hashed.
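These properties are easy to try out with Python's standard `hashlib` module. The sketch below hashes the two example sentences, counts how many output bits differ, and re-hashes the first one to show determinism. The UTF-8 encoding is an assumption; a tool that hashes different bytes (for instance with a trailing newline) will print different digests.

```python
import hashlib

def sha256_hex(text):
    """SHA-256 digest of the UTF-8 bytes of text, as a hex string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

s1 = "This is a beautiful sentence"
s2 = "This is a beautiful sentence."

h1, h2 = sha256_hex(s1), sha256_hex(s2)
print(h1)
print(h2)

# Count how many of the 256 output bits differ between the two digests.
differing_bits = bin(int(h1, 16) ^ int(h2, 16)).count("1")
print(differing_bits, "of 256 bits differ")

# Hashing the same input again gives the same digest.
print(sha256_hex(s1) == h1)
```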

One such set of cryptographic hash functions is the Secure Hash Algorithm 2 (SHA-2) family.
SHA-2 is a set of cryptographic hash functions designed by the United States National Security Agency (NSA) and first published in 2001. The family consists of six hash functions with digests (hash values) that are 224, 256, 384 or 512 bits long: SHA-224, SHA-256, SHA-384, SHA-512, SHA-512/224 and SHA-512/256.

SHA image

Need for Authentication

As always, the best way to understand a technology is to first understand the reason why it was made. Why do we need authentication?
Let's say Alice wants to send a message to Bob. So what does Alice do?

First she asks Bob to send her his public key.
Bob sends Alice his public key.
Alice then encrypts the message with Bob's public key and sends it to him.
Bob then decrypts the message with his private key.

Pretty straightforward, right? Well, it's not so simple. Consider the scenario shown in the image below:

MITM image

Alice sends a message to Bob asking for his public key.
Jack (a man in the middle on the routing path between Alice and Bob) intercepts Alice's message before it reaches Bob and forwards it to Bob, the only difference being that the source address is now Jack's rather than Alice's.
Bob sends his public key to Jack, thinking that he is sending it to Alice.
Jack keeps Bob's public key and sends his own public key to Alice instead.
Alice encrypts her message with Jack's public key, thinking it is Bob's, and sends it to Bob.
Jack intercepts Alice's message again and, guess what? Jack decrypts the message with his private key and is able to read the whole message!
To make matters worse, Jack can modify the message and inject malicious content. He then encrypts the message with Bob's public key and sends it to him.
Bob receives the malicious message from Jack, thinking he has received it from Alice!

The whole process described above is called a Man In The Middle (MITM) attack and, as described, it is quite damaging. Throughout the whole communication, Alice and Bob had no inkling that their messages were being read and modified by a third party.
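The key-substitution attack can be replayed in a few lines. This is a toy sketch using textbook RSA with tiny primes; the key sizes, the numeric "message" and the variable names are all illustrative assumptions.

```python
# Toy replay of the key-substitution attack, using textbook RSA with tiny primes.

def make_keypair(p, q, e=17):
    n, phi = p * q, (p - 1) * (q - 1)
    return (e, n), (pow(e, -1, phi), n)   # (public key, private key)

def encrypt(message, public_key):
    e, n = public_key
    return pow(message, e, n)

def decrypt(ciphertext, private_key):
    d, n = private_key
    return pow(ciphertext, d, n)

bob_public, bob_private = make_keypair(61, 53)
jack_public, jack_private = make_keypair(89, 97)

# Alice asks for Bob's key, but Jack answers with his own.
key_alice_received = jack_public

original = 42
sent_by_alice = encrypt(original, key_alice_received)

# Jack reads the message, alters it, and re-encrypts it for Bob.
read_by_jack = decrypt(sent_by_alice, jack_private)
tampered = read_by_jack + 1
forwarded = encrypt(tampered, bob_public)

received_by_bob = decrypt(forwarded, bob_private)
print(read_by_jack, received_by_bob)   # prints: 42 43
```

Nothing in the exchange tells Alice that `key_alice_received` is not Bob's key, which is exactly the gap that authentication has to close.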
So how do we solve the problem?
In the fourth step, Alice got the key from Jack. Presuming it was Bob's key, she blindly encrypted the message with it. What if Alice verified by some method that the key actually belongs to Bob? What if there were a protocol or a rule to verify the public key even before starting data transfer? The man in the middle could no longer pass off his own public key, because Alice would detect that it is not Bob's and stop the communication immediately. This is what SSL certificates are all about. Formally, they help to authenticate the public key of the server.
Now that we understand the problem, let us look at the solution.

Certificate Authorities (CAs)

Need for a third party in authentication

From the previous problem, it is clear that Alice needs to verify that the public key indeed belongs to Bob. How can she achieve that?

She asks Bob to send proof that he owns the public key.
What does Bob do? Bob asks a third party (named CA), whom both Alice and Bob trust, to sign a document which contains his public key and his details.
CA then signs the document and sends it to Bob.
Bob now sends this document to Alice.
Alice gets the document and verifies that it is signed by CA, whom she trusts; therefore, Alice can be sure that she has the correct key.

Thus there is a need for an external third party in authentication to verify the credentials of the server. The document is called a certificate, and the third party, whom everyone on the internet trusts, is called a Certificate Authority (CA). The CA, like everyone on the network, has its own pair of public and private keys.

To sign a certificate means, informally, that the CA encrypts the certificate data (in practice, a hash of it) with its private key, so that anyone holding its public key can decrypt the result and check it.

Root Certificate Authorities (Root CAs)

Root CAs are the CAs which browsers and operating systems trust directly. Examples of CAs are Symantec, RapidSSL, GeoTrust and DigiCert.
There is no one above the root CAs in the chain of trust (explained below). The certificates of the root CAs are signed by themselves (self-signed), as they are trusted directly. Every browser ultimately relies on root certificates to verify server certificates. To become a Certificate Authority, one has to get verified by all the pertinent authorities, as described here.

What if a MITM attack happens while verifying a certificate with the Certificate Authority?
Ans: Most devices on the internet are shipped with a set of globally trusted root CA certificates as part of the operating system. Also, most browsers keep their own list of trusted CAs so that they do not have to rely on the OS for verifying certificates. So there is no need to ask the CA for its certificate over the network; the certificates are already available on every device!

On Linux machines, you can typically view the root certificates which your machine trusts in the /etc/ssl/certs folder.

cd /etc/ssl/certs
ls
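The same trust store can be inspected from code. The sketch below uses Python's standard `ssl` module to show where the local OpenSSL build looks for roots and to list a few of them. What it prints depends on the machine, and roots kept in a certificate directory are loaded lazily, so the count may be zero on some systems.

```python
import ssl

# Where this Python/OpenSSL build looks for trusted roots by default.
paths = ssl.get_default_verify_paths()
print("CA file:", paths.cafile)
print("CA directory:", paths.capath)

# Load the default trust store and list the subjects of the roots it contains.
context = ssl.create_default_context()
roots = context.get_ca_certs()
print(len(roots), "root certificates loaded")
for cert in roots[:5]:
    print(cert.get("subject"))
```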

Intermediate Certificate Authorities

The root CAs run under extremely strict guidelines. In addition to the regulations and restrictions put forth by the CA/B Forum’s Baseline Requirements, some root programs – for instance, Mozilla’s – add even more stringent requirements on top.

The reason for this is simple: trust.

A root certificate is invaluable, because any certificate signed by it will be automatically trusted by the browsers.

That is why root Certificate Authorities do not issue server certificates (end-user SSL certificates) directly. The root keys are too valuable and there's just too much risk, as a single compromise of a root CA's private key would be damaging for the whole internet (explained in the vulnerabilities section).

So, to insulate themselves, CAs generally issue what is called an intermediate root. The CA signs the intermediate root with its root private key, which makes it trusted. Then the CA uses the intermediate certificate's private key to sign and issue end-user SSL certificates. This process can play out several times, where an intermediate root signs another intermediate and the CA then uses that one to sign certificates. These links, from root to intermediate to leaf, are the certificate chain or the chain of trust.
Certificate chain image

Certificate Signing Request (CSR)

For a web server to obtain an SSL certificate, it has to request one of the Certificate Authorities to sign a certificate. The server does so by sending the CA a document containing the server's information (domain name, organization name, email address, locality, country) and, finally, its public key. This document is called a Certificate Signing Request (CSR).

Formally, a CSR or Certificate Signing Request is a block of encoded text that is given to a Certificate Authority when applying for an SSL certificate. It is usually generated on the server where the certificate will be installed and contains information that will be included in the certificate, such as the organization name, common name (domain name), locality, and country.

Most CSRs are created in the Base-64 encoded PEM format. This format includes the "-----BEGIN CERTIFICATE REQUEST-----" and "-----END CERTIFICATE REQUEST-----" lines at the beginning and end of the CSR. A PEM format CSR can be opened in a text editor, where it appears as a block of Base-64 text between these two lines.
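The PEM wrapping itself is simple enough to sketch with the standard library. The snippet below wraps arbitrary bytes in the BEGIN/END lines and unwraps them again. The payload is placeholder bytes standing in for an encoded CSR, not a real request, and the helper names are made up for this example.

```python
import base64

def pem_wrap(raw_bytes, label="CERTIFICATE REQUEST"):
    """Wrap raw bytes in PEM armor: Base-64 text, 64 characters per line."""
    b64 = base64.b64encode(raw_bytes).decode("ascii")
    body = "\n".join(b64[i:i + 64] for i in range(0, len(b64), 64))
    return f"-----BEGIN {label}-----\n{body}\n-----END {label}-----\n"

def pem_unwrap(pem_text):
    """Recover the raw bytes by dropping the BEGIN/END lines and decoding."""
    lines = [ln for ln in pem_text.strip().splitlines() if not ln.startswith("-----")]
    return base64.b64decode("".join(lines))

# Placeholder bytes standing in for an encoded CSR; NOT a real request.
fake_csr = bytes(range(100))
pem = pem_wrap(fake_csr)
print(pem)
```

A real CSR would normally be produced by a tool such as OpenSSL; only the outer text format is shown here.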

SSL Certificate

After receiving the CSR, the CA performs routine checks, such as verifying the domain name provided in the CSR and checking the locality, address and business type of the server.
After performing these checks and validating the server, the CA issues a certificate to the server.

CSR

The certificate contains all the information about the server from the CSR, plus additional information:

The details of the CA (name of CA, company name, organization name).
The signature. The CA computes the cryptographic hash of the entire certificate content containing the above information. It then encrypts the hash with its private key and adds the result to the certificate. This is called the signature (or, loosely, the hash) of the certificate and is different for every certificate.

Why does the CA hash the data?
Hashing condenses the data into a fixed-size string, so the value that has to be signed and attached stays small no matter how large the certificate is.
Different certificate contents give different hashes. So if some man in the middle tries to modify the certificate content, its hash will change, the hash in the certificate (provided by the CA) will no longer match the hash of the modified certificate, and hence the certificate is invalid.

Why does the CA encrypt the hash?
Let's assume the hash is sent as plain text. Some man in the middle intercepts the certificate. He then changes the public key inside the certificate to his own key (to carry out a MITM attack). This changes the hash of the certificate, so the certificate is invalid.
But since the hash is sent as plain text, the man in the middle can simply calculate the new hash using the same hash function and put it in place of the hash issued by the CA! By doing so, he has modified the server certificate to contain his own public key and the certificate still looks completely valid. He can then carry out the MITM attack successfully.
To prevent this, the CA encrypts the hash (informally, signs the hash) with its private key. Now even if the man in the middle changes the public key and calculates the new hash using the same hash function, he has to encrypt the hash, and he cannot do so because he would need the CA's private key! Encrypting the hash with his own private key is futile, as the client (browser) will decrypt it using the CA's public key, get a meaningless value that does not match the certificate's hash, and reject the certificate immediately.
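The argument above can be checked with a small experiment. This is a toy sketch, assuming textbook RSA over SHA-256 with keys built from small known primes; the certificate strings and helper names are invented for the example, and real certificates use much larger keys and padded signature schemes.

```python
import hashlib

def make_keypair(p, q, e=65537):
    n = p * q
    return (e, n), (pow(e, -1, (p - 1) * (q - 1)), n)   # (public key, private key)

# Toy keys built from known Mersenne primes; far too small for real use.
ca_public, ca_private = make_keypair(2**61 - 1, 2**31 - 1)
_, attacker_private = make_keypair(2**89 - 1, 2**19 - 1)

def digest(content):
    """SHA-256 of the content as an integer, reduced to fit the CA's toy modulus."""
    full = int.from_bytes(hashlib.sha256(content.encode()).digest(), "big")
    return full % ca_public[1]

def sign(content, private_key):
    d, n = private_key
    return pow(digest(content), d, n)      # "encrypt" the hash with a private key

def verify(content, signature, public_key):
    e, n = public_key
    h1 = pow(signature, e, n)              # "decrypt" the signature with the public key
    h2 = digest(content)                   # hash the received content
    return h1 == h2

def plain_check(content, attached_hash):
    """What a client could do if only an unencrypted hash were attached."""
    return digest(content) == attached_hash

genuine = "domain=example.com;public_key=SERVER_KEY"
forged = "domain=example.com;public_key=ATTACKER_KEY"

# Case 1: plain hash. The attacker swaps the key and simply recomputes the hash.
plain_forgery_accepted = plain_check(forged, digest(forged))

# Case 2: hash encrypted with the CA's private key.
ca_signature = sign(genuine, ca_private)
genuine_accepted = verify(genuine, ca_signature, ca_public)
reused_signature_accepted = verify(forged, ca_signature, ca_public)
attacker_signature = sign(forged, attacker_private)
resigned_forgery_accepted = verify(forged, attacker_signature, ca_public)

print(plain_forgery_accepted, genuine_accepted,
      reused_signature_accepted, resigned_forgery_accepted)
# prints: True True False False
```

The forgery passes when only a plain hash is attached, and fails in both signed cases because the attacker does not hold the CA's private key.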

These two properties of the signature, hashing and encryption, are what primarily ensure security in SSL certificate validation.
The server receives the certificate and stores it on the machine, as it will be presented on every connection made to it.

You can check the certificate of this website by clicking on the lock icon in the Chrome address bar and then clicking on Certificate in the drop-down.
Lock
You can see the certificate details (Server and CA details).

Click on the details tab to expand each certificate in the chain of trust.

Certificate Validation process

Now that we have understood the certificate structure and how a certificate is issued to the server, we can look at how a client verifies it.
The validation process relies on the chain of trust of certificates.
Let's assume that the server (S) was issued a certificate by an intermediate root (I), which was in turn issued a certificate by the root CA (R).
So the chain of trust is: $R \rightarrow I \rightarrow S$.
The validation steps are as follows:

Do we trust S ? NO, check the certificate of S.
Certificate of S is valid and is issued by I. Do we trust I ? NO, check the certificate of I.
Certificate of I is valid and is issued by R. Do we trust R? YES, R is a root certificate and is trusted by the machine / browser.
R is trusted $\implies$ I is trusted $\implies$ S is trusted (Chain of trust!).

Note: The certificates of the intermediate roots are sent to the client by the server itself. The root certificate may or may not be sent, as the client is expected to have the root certificates stored locally.

In this way, the server certificate is validated and hence the server is authenticated.

How does the client validate a certificate?
Ans: Recall that the signature of the certificate is obtained by hashing the entire content of the certificate and then encrypting the hash with the CA's private key. So the client uses the CA's public key to decrypt the signature and recover the hash ( $H_1$ ). It then calculates the hash of the certificate content ( $H_2$ ), excluding the signature itself, using the same hash function. If $H_1 = H_2$, the certificate is valid; otherwise it is invalid.

Apart from checking the signature,

The client also checks the domain name: the domain name in the certificate should match the domain entered in the address bar of the browser.
It also ensures that the certificate has not expired by checking the validity end date in the certificate.
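These two extra checks can be sketched as a small function. The certificate here is a plain dictionary with made-up field names (`domains`, `not_after`), and only exact name matches are handled; real clients parse these fields from the certificate and also handle cases such as wildcard names, which are left out.

```python
from datetime import datetime, timezone

def check_name_and_expiry(cert, hostname, now=None):
    """Return a list of problems found; an empty list means both checks pass."""
    now = now or datetime.now(timezone.utc)
    problems = []
    if hostname.lower() not in [name.lower() for name in cert["domains"]]:
        problems.append("domain mismatch")
    if now > cert["not_after"]:
        problems.append("expired")
    return problems

cert = {
    "domains": ["example.com", "www.example.com"],
    "not_after": datetime(2022, 1, 1, tzinfo=timezone.utc),
}
during = datetime(2021, 6, 1, tzinfo=timezone.utc)
after = datetime(2022, 6, 1, tzinfo=timezone.utc)

print(check_name_and_expiry(cert, "www.example.com", during))   # []
print(check_name_and_expiry(cert, "evil.example.net", during))  # ['domain mismatch']
print(check_name_and_expiry(cert, "example.com", after))        # ['expired']
```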

Now that we have the intuition, let's look at the authentication process in detail. Refer to the diagram below for the process.

CSR

Process: Assume that the certificates sent by the server form an array
A = [server_cert, intermediate_cert, root_cert]

Server certificate: The client (browser) first processes the server certificate. To validate it, the client needs to obtain and verify the public key of the issuer (intermediate). So it moves to the issuer certificate.
Issuer (intermediate) certificate: To validate it, the client needs to obtain and verify the public key of the root. So it moves to the root certificate.
Root certificate: It is locally available, self-signed and trusted, so the public key of the root is verified.
Back to the issuer (intermediate) certificate: As the public key of the root is verified and available, the signature in the certificate is decrypted to get the hash ( $H_1$ ). Then the hash ( $H_2$ ) of the entire certificate content excluding the signature is calculated, and if $H_1 = H_2$ the certificate is valid.
Back to the server (end-user) certificate: As the public key of the issuer is verified and available, the signature of the certificate is decrypted and validated using the same method as in the previous point.

This is how your browser verifies the server every time you connect to any website using HTTPS. The browser really has to do a lot of work! Furthermore, there need not be only one intermediate certificate in the chain. There can be many intermediates, and the browser will verify each of them up to the root.
To sum up, this process can be thought of as a recursive function which goes up to the root and then verifies the certificates one by one on the way back.
A sample C++ sketch of the validation:

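#include <string>

struct Certificate { std::string name, issuer, public_key; };

// Assumed helpers, declared but not defined here: cert(s) returns the certificate of s
// from the array sent by the server (or the local root store); verify(c, key) checks
// the signature on c with the issuer's public key.
const Certificate* cert(const std::string& name);
bool verify(const Certificate& c, const std::string& issuer_public_key);

bool valid = true;

const Certificate* validate(const std::string& name) {
    const Certificate* current = cert(name);  // get current certificate
    if (current->issuer != current->name) {   // not self-signed, so not yet at the root
        const Certificate* parent = validate(current->issuer);  // validate the issuer first
        if (!verify(*current, parent->public_key))  // check current against the parent's public key
            valid = false;
    }
    return current;  // returns the certificate
}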
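For a version that can be run end to end, here is a rough Python counterpart of the recursive walk, with toy RSA signatures so that tampering can actually be detected. Everything in it is an assumption made for illustration: the dictionary layout, the names R, I and S, the tiny keys built from known primes, and the depth limit.

```python
import hashlib

def make_keypair(p, q, e=65537):
    n = p * q
    return (e, n), (pow(e, -1, (p - 1) * (q - 1)), n)   # (public, private)

def digest(cert, n):
    """Hash everything except the signature, reduced to fit the signer's toy modulus."""
    content = f"{cert['name']}|{cert['issuer']}|{cert['public_key']}"
    return int.from_bytes(hashlib.sha256(content.encode()).digest(), "big") % n

def issue(name, public_key, issuer_name, issuer_private):
    cert = {"name": name, "issuer": issuer_name, "public_key": public_key}
    d, n = issuer_private
    cert["signature"] = pow(digest(cert, n), d, n)
    return cert

def verify(cert, issuer_public):
    e, n = issuer_public
    return pow(cert["signature"], e, n) == digest(cert, n)   # H1 == H2

def validate(name, sent, trusted_roots, depth=0):
    """Return the validated certificate for name, or None if the chain breaks."""
    if name in trusted_roots:
        return trusted_roots[name]          # roots are trusted because they are local
    cert = sent.get(name)
    if cert is None or cert["issuer"] == name or depth > 8:
        return None                         # unknown, untrusted self-signed, or too deep
    parent = validate(cert["issuer"], sent, trusted_roots, depth + 1)
    if parent is None or not verify(cert, parent["public_key"]):
        return None
    return cert

root_pub, root_priv = make_keypair(2**61 - 1, 2**31 - 1)
inter_pub, inter_priv = make_keypair(2**89 - 1, 2**19 - 1)
server_pub, _ = make_keypair(2**107 - 1, 2**17 - 1)

root = issue("R", root_pub, "R", root_priv)          # self-signed root
inter = issue("I", inter_pub, "R", root_priv)        # intermediate signed by the root
server = issue("S", server_pub, "I", inter_priv)     # server signed by the intermediate

trusted_roots = {"R": root}                          # stored locally on the client
sent = {"S": server, "I": inter}                     # sent by the server

chain_ok = validate("S", sent, trusted_roots) is not None
forged = dict(server, public_key="ATTACKER_KEY")     # key swapped, signature unchanged
forged_ok = validate("S", {"S": forged, "I": inter}, trusted_roots) is not None
print(chain_ok, forged_ok)   # prints: True False
```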

Now that we have understood how SSL certificates work, let's see where they fit in an SSL/TLS connection.

The SSL/TLS Handshake

The SSL/TLS handshake plays a role similar to the TCP handshake, but it is used to establish an encrypted connection between the client and the server. The TLS handshake takes place over an already established, reliable TCP connection, as shown in the diagram below.
The following is a brief explanation of the handshake; you can learn more about it here.

The client sends a hello message to the server, asking the server for its certificate and other parameters.
The server replies with a server hello message containing its certificate issued by a CA and its CipherSpec parameters.
The client first verifies the server by validating its certificate. If the certificate is invalid, it aborts the connection immediately. Otherwise, the client continues the handshake by sending the cipher parameters for the key exchange and symmetric encryption of the data.
The server then completes the key exchange, so that both sides hold the same symmetric key; now the client and server can securely exchange data.
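From a program's point of view, all of these steps sit behind a single call. The sketch below uses Python's standard `ssl` and `socket` modules; the helper names are made up for this example, and the function needs network access, so it is defined here but not called.

```python
import socket
import ssl

def make_client_context():
    """Default client settings: verify the certificate chain and the hostname."""
    return ssl.create_default_context()

def fetch_peer_certificate(host, port=443, timeout=5.0):
    """Run a TLS handshake with host and return (protocol version, certificate dict)."""
    context = make_client_context()
    with socket.create_connection((host, port), timeout=timeout) as tcp:
        # wrap_socket performs the handshake; it raises ssl.SSLCertVerificationError
        # if the certificate chain or the hostname does not check out.
        with context.wrap_socket(tcp, server_hostname=host) as tls:
            return tls.version(), tls.getpeercert()

context = make_client_context()
print(context.verify_mode == ssl.CERT_REQUIRED, context.check_hostname)
```

Calling something like `fetch_peer_certificate("example.com")` (the hostname is just a placeholder) would return the negotiated protocol version and the server's certificate fields, or raise an error if validation fails.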

Thus, every TLS handshake first verifies the server and only then goes ahead. In this way, SSL certificates help to authenticate the servers and prevent attacks. But every system has some vulnerabilities. We will see the SSL vulnerabilities in the next section.

Vulnerabilities

The security of an SSL certificate rests on two things: the hash function used and the encryption of the hash. Correspondingly, there are two types of attack possible:

Hash Function collision

If an attacker manages to find a modified certificate whose hash is the same as the original hash, a collision occurs. The original signature then still matches, so the attacker can send the modified certificate and it will be accepted as valid. In its early days, SSL used the SHA-1 hash function for certificates, for which practical collision attacks have been demonstrated in past years.
Currently, SSL/TLS certificates use the SHA-256 hash function, which is considered secure and has not been broken yet. You can read about the SHA-1 attack here.
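To get a feel for why the output size of the hash matters, the sketch below deliberately weakens SHA-256 by keeping only its first 2 bytes (16 bits) and then searches for two different inputs with the same truncated hash. The truncation and the input strings are assumptions made purely for the demonstration; this is not how SHA-1 was attacked.

```python
import hashlib

def truncated_hash(text, n_bytes=2):
    """First n_bytes of SHA-256: a deliberately weak 16-bit hash by default."""
    return hashlib.sha256(text.encode()).digest()[:n_bytes]

def find_collision(n_bytes=2):
    """Try inputs until two different ones share a truncated hash."""
    seen = {}
    counter = 0
    while True:
        text = f"certificate-{counter}"
        h = truncated_hash(text, n_bytes)
        if h in seen:
            return seen[h], text
        seen[h] = text
        counter += 1

a, b = find_collision()
print(a, b, truncated_hash(a).hex())
```

With only 65,536 possible outputs a collision is guaranteed within 65,537 attempts and usually turns up after a few hundred. With the full 256 bits the same search is far out of reach.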

Compromised CA private key

If the CA can be subverted, then the security of the entire system is lost, potentially subverting all the entities that trust the compromised CA. If the private key of a Certificate Authority is leaked, any person in possession of the key can sign a certificate for any domain, any organisation, any location! This can have a damaging effect on the whole network.

A notable case of CA subversion occurred in 2001, when the certificate authority VeriSign issued two certificates to a person claiming to represent Microsoft. Here the private key was not leaked; the CA was tricked into mis-issuing certificates. The certificates had the name "Microsoft Corporation", so they could be used to fool someone into believing that updates to Microsoft software came from Microsoft when they actually did not. The fraud was detected in early 2001. Microsoft and VeriSign took steps to limit the impact of the problem.

Conclusion

SSL/TLS is an amazing protocol which we rely on for day-to-day communication, and it makes even more sense once we understand the problems with insecure, unencrypted communication.
SSL certificates are only one part of the SSL/TLS protocol, but they play a significant role in authenticating the server before data transfer begins.

Thank you very much for reading this far. I know it was a long post, but I had to include all the pertinent content so that you can get a good understanding of SSL certificates. I hope you liked this post; please feel free to comment your thoughts, suggestions or any doubts below.