Monday, February 18, 2013

Certificates

Introduction
Certificates are a way to prove identity. Unfortunately, this topic gets little public attention and remains a mystery for the average Internet user. I would like to share some knowledge about certificates, their structure, and the way I understand them. This article covers very basic questions that users may have about certificates.

Certificate structure (simplified)
What are certificates for? Their main purpose is to verify identity, that is, to prove that the information written in a certificate belongs to the certificate owner.

A certificate consists of four main parts:
  • Information - everything that you can see when you open a certificate in the MMC console or another certificate viewer;
  • Public key - a unique sequence that belongs to the certificate owner;
  • Signature Algorithm - the method used to create the certificate signature (usually a combination of a signing algorithm and a digest algorithm, for example SHA1 with RSA);
  • Signature - another unique sequence: the result of applying the digest and signing algorithms to the certificate data.
Let's take a closer look at each part.

Information: 
Here we can add any information we would like to present and prove.
According to the Internet standard RFC 2459 (since superseded by RFC 5280), a certificate contains the following main fields:
- issuer - the identity of the authority that created the signature;
- subject (owner) - the owner's name or other information that uniquely identifies the owner;
- serial number - the certificate's serial number, assigned by the issuer. The issuer/serial number pair uniquely identifies a certificate;
- not before, not after - the effective date and the expiration date;
- constraints - the intended purpose of the certificate (an optional extension, such as key usage). For example, if a certificate is restricted to mail exchange, it should not be used for SSL, etc.
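Two of these fields lend themselves to a tiny sketch: the validity window and the issuer/serial pair used as an identifier. The function names, dates and serial number below are hypothetical, chosen only for illustration; real certificate validation checks far more than this.

```python
from datetime import datetime, timezone
from typing import Tuple

def is_within_validity(not_before: datetime, not_after: datetime, now: datetime) -> bool:
    # Usable only from the effective date (not before) up to the expiration date (not after).
    return not_before <= now <= not_after

def certificate_id(issuer: str, serial_number: int) -> Tuple[str, int]:
    # The issuer name plus the issuer-assigned serial number identifies one certificate.
    return (issuer, serial_number)

# Hypothetical dates and serial number, for illustration only.
not_before = datetime(2013, 1, 1, tzinfo=timezone.utc)
not_after = datetime(2014, 1, 1, tzinfo=timezone.utc)

print(is_within_validity(not_before, not_after, datetime(2013, 2, 18, tzinfo=timezone.utc)))  # True
print(is_within_validity(not_before, not_after, datetime(2015, 1, 1, tzinfo=timezone.utc)))   # False
print(certificate_id("Government", 1001))  # ('Government', 1001)
```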

Public key:
This section contains the owner's public key for an asymmetric cryptographic system. Well-known asymmetric systems include RSA and Diffie–Hellman.

The information plus the public key is called the TBSCertificate ("to be signed" certificate). As RFC 2459 puts it: "Every TBSCertificate contains the names of the subject and issuer, a public key associated with the subject, a validity period, a version number, and a serial number; some may contain optional unique identifier fields."
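The split between the signed part (TBSCertificate) and the signature can be modelled as two nested records. This is a minimal sketch: the class names mirror the RFC terms, but the `to_bytes` encoding is a hypothetical stand-in, since real certificates are encoded with ASN.1 DER.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TBSCertificate:
    """The "to be signed" part: the information fields plus the public key."""
    version: str
    serial_number: int
    issuer: str
    subject: str
    not_before: str
    not_after: str
    public_key: bytes

    def to_bytes(self) -> bytes:
        # Hypothetical, simplified encoding; real certificates use ASN.1 DER.
        fields = [self.version, str(self.serial_number), self.issuer, self.subject,
                  self.not_before, self.not_after, self.public_key.hex()]
        return "|".join(fields).encode("utf-8")

@dataclass(frozen=True)
class Certificate:
    tbs: TBSCertificate
    signature_algorithm: str  # e.g. "sha1WithRSAEncryption"
    signature: bytes          # computed by the issuer over tbs.to_bytes()

# The simplified root certificate from the example later in this post.
tbs = TBSCertificate("v3", 1001, "Government", "Government",
                     "2013-01-01", "2014-01-01", bytes([0x11, 0x22, 0x33, 0x44]))
cert = Certificate(tbs, "sha1WithRSAEncryption", bytes([0x55, 0x44]))
print(cert.tbs.to_bytes())
```

Only `tbs` is covered by the signature; the algorithm name and the signature itself sit outside it.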


Signature algorithm and Signature:
In order to verify a certificate, a verifier needs to know how its signature was created. For example, if the algorithm is SHA1 with RSA, the verifier computes the SHA1 digest of the certificate data (the TBSCertificate) and checks that it matches the value obtained by decrypting the signature with the issuer's public key. Well-known message digest algorithms include SHA1, MD5, MD4 and MD6; of these, SHA1 and MD5 are the ones commonly used in certificates.
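The first thing a verifier does with the signature algorithm is split it into its digest part and its signing part. The sketch below assumes a small hand-written lookup table (the table is hypothetical; the algorithm names follow the common OID names) and uses Python's `hashlib` to compute the digest.

```python
import hashlib

# Hypothetical lookup table: signature algorithm name -> (digest name, signing scheme).
SIGNATURE_ALGORITHMS = {
    "sha1WithRSAEncryption": ("sha1", "rsa"),
    "sha256WithRSAEncryption": ("sha256", "rsa"),
    "md5WithRSAEncryption": ("md5", "rsa"),
}

def certificate_digest(algorithm: str, tbs_bytes: bytes) -> bytes:
    """Return the digest a verifier compares with the decrypted signature."""
    digest_name, _signing_scheme = SIGNATURE_ALGORITHMS[algorithm]
    return hashlib.new(digest_name, tbs_bytes).digest()

d = certificate_digest("sha1WithRSAEncryption", b"TBSCertificate Library")
print(len(d), d.hex())  # SHA1 always yields 20 bytes
```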


How to create a signature?
The issuer is the authority that signs certificates.
To create its signature, the issuer computes a digest of the certificate data (for example, using the SHA1 algorithm) and then encrypts the digest with its private key. The result is an encrypted certificate digest, which is the signature.
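The two signing steps can be sketched with "textbook" RSA. This is a toy under loud assumptions: the key is built from the tiny primes 61 and 53, and the SHA1 digest is simply reduced modulo n so it fits. Real RSA signatures use large keys and a padding scheme (PKCS #1) instead, so this is insecure and for illustration only.

```python
import hashlib

# Toy textbook RSA key from tiny primes (insecure, illustration only).
P, Q, E = 61, 53, 17
N = P * Q                           # public modulus: 3233
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent: 2753 (needs Python 3.8+)

def digest_as_int(data: bytes, modulus: int) -> int:
    # SHA1 digest reduced modulo n so it fits the toy key (real RSA uses padding instead).
    return int.from_bytes(hashlib.sha1(data).digest(), "big") % modulus

def sign(tbs_bytes: bytes, private_exp: int, modulus: int) -> int:
    """Issuer side: digest the certificate data, then "encrypt" the digest with the private key."""
    return pow(digest_as_int(tbs_bytes, modulus), private_exp, modulus)

signature = sign(b"TBSCertificate Library", D, N)
print(signature)
```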

How to verify a signature?
Anyone can verify a certificate signature to confirm that the certificate content belongs to the owner.
To verify a certificate signature, we:
- Decrypt the signature using the issuer's public key;
- Compute the digest of the certificate data;
- Compare the results: the decrypted signature on one hand and the computed digest on the other. They must be the same, because the issuer created the signature from the same digest.
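The three verification steps map directly onto a few lines of code. This sketch reuses a toy textbook RSA key (primes 61 and 53, no padding, insecure); the private exponent appears only to produce a signature to check.

```python
import hashlib

# Toy RSA key from primes 61 and 53 (insecure, illustration only).
N, E = 3233, 17  # issuer's public key
D = 2753         # issuer's private key, used here only to create a signature to verify

def digest_as_int(data: bytes, modulus: int) -> int:
    return int.from_bytes(hashlib.sha1(data).digest(), "big") % modulus

def verify(tbs_bytes: bytes, signature: int, public_exp: int, modulus: int) -> bool:
    decrypted = pow(signature, public_exp, modulus)  # step 1: decrypt with the issuer's public key
    expected = digest_as_int(tbs_bytes, modulus)     # step 2: digest the certificate data
    return decrypted == expected                     # step 3: compare the two results

tbs = b"TBSCertificate Library"
sig = pow(digest_as_int(tbs, N), D, N)       # what the issuer would have produced
print(verify(tbs, sig, E, N))                # True
print(verify(tbs, (sig + 1) % N, E, N))      # False: a modified signature no longer matches
```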


This verification relies on the assumption that no one except the issuer has the issuer's private key.

Let's look at an example:
Root or self-signed certificate

Private key        : 44 33 22 11
(The private key is stored separately from the certificate and is kept confidential.)


Certificate Root
Issuer             : Government
Subject            : Government
PublicKey          : 11 22 33 44
Signature Algorithm: RSA with SHA1
Signature          : 55 44

This is a simplified version of a root, or self-signed, certificate. Issuer and Subject here are the same. The signature in this case is the digest encrypted with the issuer's own private key, which matches the public key in the same certificate. Because the issuer and the owner are the same, the certificate cannot prove by itself that the private/public key pair belongs to the owner.

Anyone could forge such a certificate by generating their own key pair and replacing the public key and the signature. That is why we have many trusted authorities that can vouch for each other's identities. The problem is partially solved by distributing root certificates together with firmware or the operating system.
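Why a self-signed certificate proves nothing on its own can be shown directly: an impostor who generates a fresh key pair produces a certificate with the same names that verifies just as well. The helper names and dictionary layout below are hypothetical, and the keys are toy textbook RSA keys from tiny primes (insecure).

```python
import hashlib

def make_toy_key(p: int, q: int, e: int = 17):
    """Return (public, private) toy RSA keys from two small primes (insecure)."""
    n = p * q
    d = pow(e, -1, (p - 1) * (q - 1))
    return (n, e), (n, d)

def digest_as_int(data: bytes, n: int) -> int:
    return int.from_bytes(hashlib.sha1(data).digest(), "big") % n

def self_signed(name: str, public, private) -> dict:
    """Issuer == Subject, and the signature is made with the key pair in the certificate itself."""
    tbs = f"issuer={name};subject={name};key={public}".encode()
    n, d = private
    return {"tbs": tbs, "public_key": public, "signature": pow(digest_as_int(tbs, n), d, n)}

def verify_self_signed(cert: dict) -> bool:
    n, e = cert["public_key"]  # the only key available is the one inside the certificate
    return pow(cert["signature"], e, n) == digest_as_int(cert["tbs"], n)

genuine = self_signed("Government", *make_toy_key(61, 53))
impostor = self_signed("Government", *make_toy_key(47, 59))  # attacker's own key pair
print(verify_self_signed(genuine), verify_self_signed(impostor))  # True True
```

Both checks pass, which is why the genuine root has to reach us through a trusted channel, such as the operating system.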

Another example:

Private key        : DD CC BB AA
(The private key is stored separately from the certificate and is kept confidential.)


Certificate Library
Issuer             : Government
Subject            : Library
PublicKey          : AA BB CC DD
Signature Algorithm: RSA with SHA1
Signature          : EE FF


If we trust "Certificate Root", we can verify that the information in "Certificate Library" has not been tampered with and that it belongs to the Library.

Why can we trust "Certificate Library"?

Because the root we trust has verified the Library and proved with its signature that "Certificate Library" belongs to the Library. Since we trust "Certificate Root", we can also trust "Certificate Library".


Let's find out what the Library certificate's signature is in this case:
Signature          : EE FF


This is supposed to be an encrypted digest: the digest of the "Certificate Library" body, encrypted with the private key of the issuer (Certificate Root).
In our example, the signature EE FF is the result of
RSA([issuer's private key], SHA1(Certificate Library's TBSCertificate))


[EE FF] = RSA([44 33 22 11], SHA1("TBSCertificate Library"))
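The formula RSA([issuer's private key], SHA1(TBSCertificate)) can be evaluated with toy numbers. In this sketch the Root and Library keys are hypothetical textbook RSA keys from tiny primes (not the byte values of the example above, which are only placeholders), and the TBSCertificate is a made-up string; no padding is used, so it is insecure.

```python
import hashlib

def digest_as_int(data: bytes, n: int) -> int:
    return int.from_bytes(hashlib.sha1(data).digest(), "big") % n

# Hypothetical toy keys (insecure). Root: primes 61 and 53; Library: primes 47 and 59.
ROOT_N, ROOT_E = 3233, 17
ROOT_D = pow(ROOT_E, -1, 60 * 52)  # Root's private key, known only to the Root
LIBRARY_PUBLIC = (2773, 17)        # the public key placed in the Library certificate

library_tbs = f"issuer=Government;subject=Library;key={LIBRARY_PUBLIC}".encode()

# Signature = RSA(Root private key, SHA1(Library TBSCertificate))
library_signature = pow(digest_as_int(library_tbs, ROOT_N), ROOT_D, ROOT_N)

def verified_by_root(tbs: bytes, signature: int) -> bool:
    # Anyone holding the trusted Root certificate checks with the Root's *public* key.
    return pow(signature, ROOT_E, ROOT_N) == digest_as_int(tbs, ROOT_N)

print(verified_by_root(library_tbs, library_signature))  # True
```

Note that the Library's own private key plays no part here; only the issuer's key pair is involved in signing and checking the Library certificate.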


Using a similar approach, we can build a chain of trust. For example, you may not trust a certificate's issuer, nor the issuer's issuer, but if you do trust the issuer's issuer's issuer, you can verify each signature down the chain and trust the whole chain.
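The chain idea can be sketched as a loop: each certificate must verify with the public key of the certificate above it, and the top one with the key of the root we already trust. All names, the dictionary layout and the toy RSA keys below are hypothetical; real path validation (RFC 5280) also checks names, validity dates, constraints and revocation, which this sketch skips.

```python
import hashlib

def make_toy_key(p, q, e=17):
    """Toy textbook RSA (public, private) keys from small primes (insecure)."""
    n = p * q
    return (n, e), (n, pow(e, -1, (p - 1) * (q - 1)))

def digest_as_int(data: bytes, n: int) -> int:
    return int.from_bytes(hashlib.sha1(data).digest(), "big") % n

def issue(subject: str, subject_public, issuer: str, issuer_private) -> dict:
    tbs = f"issuer={issuer};subject={subject};key={subject_public}".encode()
    n, d = issuer_private
    return {"subject": subject, "issuer": issuer, "public_key": subject_public,
            "tbs": tbs, "signature": pow(digest_as_int(tbs, n), d, n)}

def verify_chain(chain: list, trusted_root_public) -> bool:
    """chain[0] is the end certificate; chain[-1] was issued by the trusted root."""
    issuer_keys = [c["public_key"] for c in chain[1:]] + [trusted_root_public]
    for cert, (n, e) in zip(chain, issuer_keys):
        # Each signature must verify with the public key of the certificate above it.
        if pow(cert["signature"], e, n) != digest_as_int(cert["tbs"], n):
            return False
    return True

root_pub, root_priv = make_toy_key(61, 53)  # the issuer's issuer's issuer: trusted
ca_pub, ca_priv = make_toy_key(47, 59)      # the issuer's issuer
sub_pub, sub_priv = make_toy_key(43, 59)    # the issuer
leaf_pub, _ = make_toy_key(41, 59)          # the end certificate's key

ca = issue("Regional CA", ca_pub, "Government", root_priv)
sub = issue("Sub CA", sub_pub, "Regional CA", ca_priv)
leaf = issue("Library", leaf_pub, "Sub CA", sub_priv)
print(verify_chain([leaf, sub, ca], root_pub))  # True
```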


Why does it work?
Signature verification works because of a few assumptions.

The first (and most important) is that, for an asymmetric key pair (public/private keys), there is no easy way (mathematicians have not found one yet) to calculate the private key from the public key.
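For RSA specifically, computing the private key from the public key comes down to factoring the modulus n. The sketch below (the function name is hypothetical) does this by trial division, which is instant for a toy 12-bit modulus; the assumption is that no comparably easy method exists for the very large moduli used in real certificates.

```python
import math

def recover_private_exponent(n: int, e: int) -> int:
    """Break a *toy* RSA public key (n, e) by factoring n with trial division."""
    p = next(f for f in range(2, math.isqrt(n) + 1) if n % f == 0)
    q = n // p
    return pow(e, -1, (p - 1) * (q - 1))

print(recover_private_exponent(3233, 17))  # 2753
```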

The nature of asymmetric systems such as RSA allows you to encrypt a message with the private key and decrypt it with the public key. The same is true in the opposite direction.
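With RSA this symmetry is easy to check on small numbers. The sketch uses the toy textbook key from primes 61 and 53 and the message 65 (both chosen only for illustration, insecure). Note that this two-way property is specific to RSA-like systems; it is not true of every asymmetric scheme.

```python
# Toy textbook RSA key (p=61, q=53); insecure, illustration only.
N, E, D = 3233, 17, 2753

message = 65  # a message encoded as an integer smaller than N

# Private key "encrypts", public key "decrypts": this is how a signature is checked.
signed = pow(message, D, N)
assert pow(signed, E, N) == message

# The opposite direction also works: public key encrypts, private key decrypts.
encrypted = pow(message, E, N)
assert pow(encrypted, D, N) == message
print(encrypted)  # 2790
```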

The second assumption concerns the one-way hash functions used to create the certificate digest. In theory many inputs share the same digest, but with a long enough digest it is practically impossible to find two inputs that produce the same one, and there is no easy way to recover the input data from a digest.
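Two visible properties of a hash function such as SHA1 can be demonstrated with Python's `hashlib`: the digest has a fixed length whatever the input size, and a one-character change in the input yields a completely different digest. The input strings are arbitrary examples. (SHA1 has since been shown to be weak against deliberately constructed collisions, which is why newer certificates use SHA-256.)

```python
import hashlib

a = hashlib.sha1(b"TBSCertificate Library").hexdigest()
b = hashlib.sha1(b"TBSCertificate Librarz").hexdigest()  # one character changed
long_input = hashlib.sha1(b"x" * 1_000_000).hexdigest()

# Every SHA1 digest is 160 bits (40 hex characters), whatever the input size.
print(len(a), len(b), len(long_input))  # 40 40 40
# A tiny change in the input produces an unrelated-looking digest.
print(a)
print(b)
```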


