Trigger warning: Depending on how much you know about encryption, this may be a tough read and will not always make sense. I try to give the briefest possible introduction to some topics, which is honestly not that great. I am not good at keeping things short, as you must have noticed from this lengthy preamble. Actually, I could write half a book about the topic of this post. So please pause reading and research anything that feels like a missing piece of the puzzle. I will add some links every once in a while. If you are a frequent user of IPsec, it is good to know how these things work, and it will be worth it in the end. I promise.
Alright, I have spoiled it a bit by putting it in the headline already: IPFire is able to run VPN connections at up to 10 GBit/s. This is achieved by using AES with GCM on the latest Intel processors, which bring CPU optimisations that make this high throughput possible. Although most IPFire users do not need this yet, some are already hitting limits at one gigabit per second. That is why we are taking care of this early, so that IPFire is ready when you are.
Before diving into the implementation and some benchmarks let us quickly learn what encryption modes are and how they work:
ECB (Electronic Codebook) is the simplest encryption mode there is. When data is transmitted over a network, it is cut into chunks, which I will call messages here. When using a VPN, they are then encrypted with a block cipher like AES or Camellia. These ciphers are deterministic: the same pair of input message and key always produces the same output. Hence an encrypted bitmap image would look like this:
Every group of pixels that is just white will generate the same output after encryption. The encrypted result leaks enough information for someone who was able to capture all the transmitted packets to recognise the patterns of the original input. So ECB is not suitable at all for IPsec and similar applications.
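To make the problem with the white pixel groups concrete, here is a minimal sketch. Python's standard library has no AES, so I am assuming a toy 4-round Feistel cipher (built from SHA-256) as a stand-in block cipher; the key and the "image" are made up for illustration, and none of this is secure.

```python
import hashlib

BLOCK = 16  # bytes per block, the same block size AES uses


def toy_encrypt_block(key, block):
    """Toy 4-round Feistel cipher standing in for AES. NOT secure."""
    left, right = block[:8], block[8:]
    for i in range(4):
        f = hashlib.sha256(key + bytes([i]) + right).digest()[:8]
        left, right = right, bytes(a ^ b for a, b in zip(left, f))
    return left + right


def ecb_encrypt(key, data):
    """Encrypt every 16-byte block on its own; len(data) must be a multiple of 16."""
    return [toy_encrypt_block(key, data[i:i + BLOCK])
            for i in range(0, len(data), BLOCK)]


key = b"hypothetical key"
white = b"\xff" * BLOCK  # a group of white pixels
black = b"\x00" * BLOCK  # a group of black pixels
ciphertext = ecb_encrypt(key, white + white + black + white)

# Blocks 0, 1 and 3 print identically: the pattern of the image survives.
for block in ciphertext:
    print(block.hex())
```

Anyone who captures the ciphertext can see which blocks were equal in the plaintext without knowing the key, and that is exactly what makes the outline of the image visible.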
At this point I hope you will get why encryption modes are required to ensure the security of the data that is sent through the VPN tunnel.
It is obvious that we need something that disguises the input data much better, so that at no time is it possible to infer from the ciphertext what the plaintext message originally was. The de facto standard for this over the past decade has been CBC (Cipher Block Chaining). IPFire has been using it for IPsec as well.
In CBC, each plaintext block is XORed with the previous ciphertext block before it is encrypted. A random initialization vector (IV) takes the place of the previous block for the very first one and provides the initial randomness. If the same image from above is encrypted, the output is pseudo-random data and looks like this:
Of course this mode of operation has some disadvantages. The first is that any additional operation requires CPU cycles and therefore reduces the throughput of the VPN connection. The second is that CBC requires the previous encrypted chunk of data to compute the next one. I will explain later why this is important.
If you want to know more details about what is happening here, please head over to Wikipedia.
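The chaining can be sketched in a few lines. Again I am assuming a toy Feistel cipher in place of AES (not secure, just invertible), and the key and the all-white "image" are hypothetical.

```python
import hashlib
import os

BLOCK = 16  # bytes per block


def _f(key, i, half):
    """Round function of the toy Feistel cipher (stand-in for AES)."""
    return hashlib.sha256(key + bytes([i]) + half).digest()[:8]


def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def toy_encrypt_block(key, block):
    left, right = block[:8], block[8:]
    for i in range(4):
        left, right = right, xor(left, _f(key, i, right))
    return left + right


def toy_decrypt_block(key, block):
    left, right = block[:8], block[8:]
    for i in reversed(range(4)):
        left, right = xor(right, _f(key, i, left)), left
    return left + right


def cbc_encrypt(key, iv, data):
    prev, out = iv, []
    for i in range(0, len(data), BLOCK):
        # Each block depends on the previous ciphertext block.
        prev = toy_encrypt_block(key, xor(data[i:i + BLOCK], prev))
        out.append(prev)
    return out


def cbc_decrypt(key, iv, blocks):
    prev, out = iv, b""
    for block in blocks:
        out += xor(toy_decrypt_block(key, block), prev)
        prev = block
    return out


key = b"hypothetical key"
iv = os.urandom(BLOCK)          # fresh random IV for every message
image = b"\xff" * (4 * BLOCK)   # four identical groups of white pixels
ciphertext = cbc_encrypt(key, iv, image)

for block in ciphertext:
    print(block.hex())          # the identical input blocks no longer look alike
print(cbc_decrypt(key, iv, ciphertext) == image)
```

Note how the loop in cbc_encrypt cannot start on block n before block n-1 is finished. That dependency is the second disadvantage mentioned above.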
GCM (Galois/Counter Mode) does not only encrypt a message, but also computes a Message Authentication Code (MAC) that protects the authenticity of the sent data. If Alice sends a message to Bob, Alice computes the MAC and transmits it with the message. When Bob receives the message, he recomputes the MAC. If the transmitted MAC and the computed MAC are equal, Bob can be sure that the message actually comes from Alice and was not modified on the way. For this process, a chain of Galois field multiplications is performed, each of which is a comparatively cheap operation.
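Here is a rough sketch of what such a chain of Galois field multiplications looks like. The multiplication follows the bit-by-bit algorithm from the GCM specification; everything around it is simplified. Real GCM derives the value h from the key, also mixes the message lengths into the chain and encrypts the final result, all of which I leave out. The constant h and the messages are hypothetical.

```python
R = 0xE1 << 120  # reduction constant from the GCM specification


def gf_mult(x, y):
    """Multiply two 128-bit values in GF(2^128) using GCM's bit order."""
    z, v = 0, y
    for i in range(128):
        if (x >> (127 - i)) & 1:  # bit i of x, counted from the left
            z ^= v
        v = (v >> 1) ^ R if v & 1 else v >> 1
    return z


def ghash(h, blocks):
    """Chain one multiplication per 16-byte block: y = (y XOR block) * h."""
    y = 0
    for block in blocks:
        y = gf_mult(y ^ int.from_bytes(block, "big"), h)
    return y


h = 0x0123456789ABCDEF0123456789ABCDEF  # hypothetical hash subkey
message = [b"sixteen byte blk", b"another 16 bytes"]
tampered = [b"sixteen byte blk", b"another 16 byteZ"]

tag_sent = ghash(h, message)                # Alice computes this
print(ghash(h, message) == tag_sent)        # Bob recomputes: matches
print(ghash(h, tampered) == tag_sent)       # modified in transit: no match
```

The loop in gf_mult is only shifts and XORs, which is why it maps so well onto a dedicated CPU instruction.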
The encryption phase of GCM also has some very interesting optimisations: it does not run the message that is to be sent through the block cipher. Instead, it encrypts a value derived from the IV, called the counter, and XORs the result with the plaintext message. This is the same approach as in the Counter (CTR) encryption mode and has the advantage that the counter can be predicted (unlike the previous messages in CBC).
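A minimal sketch of this counter idea, under a few assumptions of mine: HMAC-SHA256 truncated to 16 bytes stands in for AES encrypting the counter block, the counter block is a 12-byte nonce followed by a 4-byte counter, and the key, nonce and message are made up.

```python
import hashlib
import hmac


def keystream_block(key, nonce, counter):
    """Stand-in for AES: derive 16 keystream bytes from nonce || counter."""
    counter_block = nonce + counter.to_bytes(4, "big")
    return hmac.new(key, counter_block, hashlib.sha256).digest()[:16]


def ctr_crypt(key, nonce, data):
    """Encrypt or decrypt: XOR the data with the keystream, block by block."""
    out = bytearray()
    for counter, i in enumerate(range(0, len(data), 16)):
        stream = keystream_block(key, nonce, counter)
        out += bytes(a ^ b for a, b in zip(data[i:i + 16], stream))
    return bytes(out)


key = b"hypothetical key"
nonce = b"unique nonce"  # 12 bytes; must never repeat under the same key
message = b"The plaintext itself never passes through the cipher."

ciphertext = ctr_crypt(key, nonce, message)
print(ciphertext.hex())
print(ctr_crypt(key, nonce, ciphertext) == message)  # same function decrypts
```

Because only the counter goes through the cipher, encryption and decryption are the very same operation, and the message does not need to be padded to a full block.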
Being able to predict the next value is a very important factor for good performance. Modern processors come with many cores, which cannot all be put to work on a single CBC stream because each block needs the previous result before the next one can be computed, and the next, and the next. At no time is there a chance to have two cores working on it at the same time. The CTR and GCM modes can actually use multiple cores and even compute some parts in advance. Only the relatively fast XOR operation is left when data actually has to be transmitted.
Even when you can use multiple processors at the same time, performing cryptographic operations still takes a long time. Hence recent generations of Intel processors implement CPU instructions that encrypt and decrypt data using the AES block cipher. The marketing term for these instructions is AES-NI.
Modern processors are able to encrypt or decrypt four to five times the amount of data per second when they can use AES-NI. This can of course be used with all encryption modes. CTR and CBC require an additional XOR operation, which is not particularly expensive, but still costs some CPU cycles when there is a lot of data to encrypt.
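The following sketch illustrates why counter blocks can be farmed out to several workers. The assumptions are mine: HMAC-SHA256 stands in for AES, and a Python thread pool stands in for CPU cores. Python threads will not actually make this faster; the point is only that every block can be computed independently and before the data exists.

```python
import hashlib
import hmac
from concurrent.futures import ThreadPoolExecutor


def keystream_block(key, nonce, counter):
    """Stand-in for AES encrypting the counter block nonce || counter."""
    counter_block = nonce + counter.to_bytes(4, "big")
    return hmac.new(key, counter_block, hashlib.sha256).digest()[:16]


key = b"hypothetical key"
nonce = b"unique nonce"  # 12 bytes, made up for this example
counters = range(64)

# One worker, one block after the other.
sequential = [keystream_block(key, nonce, c) for c in counters]

# Four workers, each picking whatever counter is next. No block needs
# the result of any other block, so the order of work does not matter.
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = list(pool.map(lambda c: keystream_block(key, nonce, c), counters))

print(parallel == sequential)

# The keystream is ready before any packet shows up ...
keystream = b"".join(parallel)

# ... so only the cheap XOR is left when data has to be sent.
packet = b"data that shows up later"
ciphertext = bytes(a ^ b for a, b in zip(packet, keystream))
print(ciphertext.hex())
```

Try to restructure the CBC loop the same way and you will find there is no way to do it, because every block has to wait for the one before it.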
GCM requires the rather expensive computation of the MAC, which is very desirable to have because data authenticity is important, too. Luckily, the same Intel processors that come with AES-NI also provide an additional instruction: PCLMULQDQ, which performs the carry-less multiplication that the Galois field multiplication is built on. This makes it much cheaper to compute the MAC and to use the advantages of GCM over CBC or CTR.
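If you are curious whether your own box has both instruction sets, the following sketch checks the CPU flags. It assumes an x86 Linux system, where /proc/cpuinfo lists them as "aes" and "pclmulqdq"; the helper names and the sample text are made up for illustration.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of CPU flags from the first 'flags' line."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            return set(line.split(":", 1)[1].split())
    return set()


def has_gcm_acceleration(flags):
    """True if both AES-NI and PCLMULQDQ are advertised by the CPU."""
    return {"aes", "pclmulqdq"} <= flags


# A shortened, made-up sample in the format of /proc/cpuinfo.
sample = "processor\t: 0\nflags\t\t: fpu sse2 aes pclmulqdq avx\n"
print(has_gcm_acceleration(cpu_flags(sample)))

# On a real x86 Linux system, read the actual file instead.
try:
    with open("/proc/cpuinfo") as f:
        print(has_gcm_acceleration(cpu_flags(f.read())))
except OSError:
    print("/proc/cpuinfo is not available on this system")
```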
We did not have the equipment to run real-world benchmarks with two IPFire appliances installed in two different locations and connected to each other with 10G over the Internet. If you have this, we would be happy to publish your results. We could only do this in the lab, with a setup that Intel also used to write a whitepaper after all this had been implemented. You can download the whitepaper here, which will give you much more detail on the topic. I will simply summarise it in this blog post.
Two boxes with processors that meet the requirements were set up and connected to each other with a 10G Ethernet connection. One to twelve IPsec VPN connections were set up between them, depending on the benchmark that was performed. A traffic generator was used to send data packets through the VPN connections and measure the throughput of the connection (figure 3).
Back in 2010 when this benchmark was done, they used strongSwan 4.5.3. IPFire uses the same software to run IPsec, just in the latest version, 5.3.0. The GCM encryption mode was used in conjunction with the AES block cipher with a key length of 128 bits.
The first test deliberately limited the Linux kernel to only use one processor for the benchmark. Figure 4 already shows remarkable differences between an implementation using no AES-NI whatsoever and an optimised version that used AES-NI and PCLMULQDQ. To help put the results into context, they changed the entire setup to use no cipher (the NULL cipher) and plotted this as the blue line. This shows just the overhead of the IPsec stack itself as there are no CPU cycles used for any encryption operation.
Figure 8 shows the results for twelve VPN connections at the same time, where all six available CPU cores were used as well. The most remarkable thing is that the throughput of a VPN tunnel that uses AES-128-GCM is almost exactly the same as that of a tunnel that uses no encryption. That means that, with the help of the new optimised instructions, the processor can perform the encryption tasks so quickly that they do not reduce the throughput at all.
The peak performance in that graph is approximately 9.6 GBit/s, which is pretty close to, if not exactly, the theoretical throughput of a 10G Ethernet connection once the Ethernet overhead is subtracted.
Thanks to the great guys from the strongSwan project, the great guys at Intel who built the processors and who contributed highly optimised assembly code to the Linux kernel to use it in the best possible manner, using IPsec tunnels with very high throughput is possible in IPFire, too.
This feature will be included in IPFire 2.17 – Core Update 90
Posted: April 30, 2015