Posts
-
People, not software, first
During my first year at the university, I remember asking my Algorithms and Data Structures professor how to properly handle user input in C. “If we are going to store this input, how can we be sure that the user only types an integer and not a string?”. I don’t actually recall his exact answer, but it was the first time someone mentioned to me that handling user input is a recurring problem in Computer Science. And this was way before I learned about related problems like database input sanitization.
Unfortunately, I learned the wrong lesson that day. To me, the problem was that we can’t trust the user, so we have to babysit every step of their human-computer interaction. If we don’t, they will crash our beloved hand-crafted software, which will make us - software developers - sad. I couldn’t have been more wrong. When a piece of software crashes, no one is sadder than the users who faced the error screen themselves. After all, they needed to use our software and weren’t able to.
It took me a few years to figure this out. Even for a while after I graduated, I thought that every piece of software that goes to production should be perfect: every exception that can be raised in its code should be handled, and it should never crash in an unpredictable way. But in the real world, things don’t work like this. And guess what? Users don’t care if your software isn’t perfect, as long as it suits their needs. All you have to care about is offering a friendly UI and giving proper feedback when things don’t work as expected.
A couple of weeks ago I found (in this post by Julia Evans - her blog is awesome, you should check it out) a series of tweets by Kelsey Hightower. He talks about how his working life got a lot more meaningful when he started to put people first. The part I like most is when he mentions that computers are just machines waiting to break and that software is worse, because it’s always broken! Accepting that software isn’t just imperfect, but broken by design, may be the best way to deal with the issues we face every day in this industry.
See, I’m not saying that we should be narcissist/nihilist professionals who don’t care about the quality of the work we publish to whomever our users are. I think that if we treat problems with their proper importance (e.g. pretty serious when they break user experience, and not so much for a known problem that isn’t user-visible and can be ignored), we can feel a little more proud of the systems we maintain. Otherwise we’ll be doing a Sisyphean job, which can only lead to an eternity of useless effort and unending frustration.
-
Software engineering tips from a DevOps
There’s a pattern that I’ve observed in nearly every software development company I’ve worked for. When discussing solutions with developer teams, two distinct things can happen: if it’s something related to operating systems, networking or infrastructure in general, they agree with me, almost never arguing against anything. Sometimes this is bad, as I could be missing an important detail which they assumed was already figured out. On the other hand, when the matter is related to programming and software engineering, they ignore me virtually every time. It’s almost as if I hadn’t said anything meaningful.
Based on this behavior, I’ve compiled a list of software engineering tips (or best practices) that I urge every developer to follow (and every DevOps/SysAdmin to remind them about). None of these items were invented by me, nor are they merely theoretical. These are battle-worn recommendations from someone who has been part of this industry for quite a few years and has seen many things go wrong when bad decisions are made. They are not hard to follow, but I’ve seen even experienced developers making the mistakes mentioned here.
Do not hardcode configuration parameters
This one is tempting, and a source of “works on my machine” symptoms. We know that developers love to have control over what they are creating and are sometimes afraid of someone running their software with badly formatted input. This is even worse with configuration parameters: “you put an ‘http://’ in there, please just use the hostname”. So this may be difficult to ask, but please, read configuration parameters from environment variables not only when you should, but every time you can. A default value can not only be useful for a specific environment (e.g. development), but also work as documentation for the expected format. See this Python example:
import os

database_url = os.environ.get('DATABASE_URL', 'postgres://user:pwd@localhost/app_db')
redis_host = os.environ.get('REDIS_HOST', 'localhost')
You can assume some things, like “this web app will only run on port 80”, but there isn’t a way to know this for sure before it goes to production. It can be inside a container that has its ports forwarded, or not. A reverse proxy can be in front of it and the app server will have to bind to a higher port. If the application can dynamically read this kind of information from its environment, you’ll be making both of our jobs way easier. I won’t have to ask you for changes and you won’t have to change the implementation (or worse, force me to do this).
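Following the same pattern, the bind port itself can come from the environment. A minimal sketch of that idea - the `PORT` variable name and the `8000` default are just illustrative conventions here, not from any specific framework:

```python
import os

def get_port(env=os.environ):
    # Environment variables are always strings, so convert explicitly.
    # Falls back to a development-friendly default when PORT is unset.
    return int(env.get('PORT', '8000'))

print(get_port({}))              # 8000 (the development default)
print(get_port({'PORT': '80'}))  # 80 (whatever the environment dictates)
```

Passing a plain dict instead of `os.environ` also makes this trivial to unit test.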
Do not try to reinvent the wheel
We all know: software development is great. You can ask a computer to do anything you want and it will do it over and over again. Because of this, you may be attracted to the idea that not only can you do anything, but also reach the best possible solution without thinking too much about it. The reality is that this is not going to happen, maybe not even if you are truly a genius. I’ve seen this happen multiple times, especially with parsers for structured text, “things that can be solved with a clever regex” and the like.
The advice I can give to avoid this is: be humble. Assume that your solution may not work for every case. Actually, the most important part is to realize that if you can’t think of a use case, it doesn’t mean that it doesn’t exist or won’t ever appear. A few weeks from now an edge case can come back to bite you. Look for open source frameworks and libraries that can do what you need. Learn to appreciate the work of people who have been polishing these pieces of software for years - and allowing you to use them for free. Maybe you can even make a significant contribution to improve them.
Gerald Sussman, the legendary MIT professor who co-authored the SICP book, was once asked about the switch from Scheme (a Lisp dialect) to Python in the Computer Science undergraduate program. His answer was that it made sense, because these days programming is very different from what it was in the 80s and 90s. Today it’s “more like science”, where you grab some libraries and figure out whether they can do what you want. So, stand on the shoulders of giants and only write from scratch what you really need to.
Opt for rock-solid battle-tested solutions
This one is related to “not reinventing the wheel”, but it’s more about choosing mature implementations that have a greater chance of working. Sometimes you may be tempted to pick the new framework that everyone is talking about (some of them without ever having touched it), but maybe it isn’t really ready to be used. Of course someone will have to start using it to prove whether it works or not, but you probably don’t want to face the problems these early adopters will hit. At least not in a production environment.
The same can be said when you need, for instance, a network protocol. Using raw sockets can be fast and fun, but when your application grows a little you’ll realize you need a real protocol. Then you’ll be implementing compression to trade data efficiently, defining a format for passing arguments, and so on. In the end, something like gRPC was the answer to all the problems you were trying to solve, but what you got is a stripped-down version of it that wasn’t tested by thousands of other developers around the globe.
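To illustrate how this escalates, here’s a sketch of the kind of length-prefixed framing you inevitably end up hand-rolling on top of raw sockets - exactly the plumbing a mature protocol already provides. The message format is made up for the example:

```python
import json
import struct

def pack_message(payload):
    # 4-byte big-endian length prefix followed by a JSON body: the
    # minimal framing needed to split a byte stream into messages.
    body = json.dumps(payload).encode('utf-8')
    return struct.pack('>I', len(body)) + body

def unpack_message(data):
    # Read the length prefix, then decode exactly that many bytes.
    (size,) = struct.unpack('>I', data[:4])
    return json.loads(data[4:4 + size].decode('utf-8'))

msg = pack_message({'method': 'ping'})
print(unpack_message(msg))  # {'method': 'ping'}
```

And this still says nothing about compression, versioning or error handling - each one more wheel to reinvent.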
Closing thoughts
I could list a few more things on this subject, but it’s better to keep this post short. Unfortunately, I’ve experienced some of the mentioned problems more than once in the last couple of months. I’m not an expert in managing people, but one of the causes seems to be closely related to ego. Sometimes I think it may be naivety, when the person in question can’t see the broader picture of the problem they are facing. At the same time, it’s funny that this also happens with individuals who have years of experience.
If you are a developer who identified with at least part of this list, you don’t really need to listen to me. Think about yourself, as a professional in our field, and what you can do to write software that is easy and robust to run in different environments. Always remember that your responsibility doesn’t end when you push code to a repository. You are at least as responsible as I am for running your code in production. In the end, it’s your creation, right?
-
How SSH authentication works
A great friend of mine, Diego “Diegão” Guimarães (who also happens to be one of the best programmers I’ve ever met), recently asked me: “why do I have to specify the private key when connecting to an SSH server, and not the public one?”. I found this question quite interesting, as it reminds us that even seasoned developers may have doubts about things they use every day. And, of course, it’s always better to ask than to accept that “this is just the way it works”.
Before explaining how SSH authentication is performed, we have to understand how public key cryptography (also known as asymmetric cryptography) takes part in this process. It’s worth mentioning that SSH authentication can also be done using passwords (and this is usually the default setting), but this won’t be discussed here. If you have a machine whose SSH server accepts passwords as an authentication method, you should probably disable that anyway.
Public key cryptography works with pairs of keys. The public one, as its name suggests, can be known by anyone. It can be sent to an internet forum or published as part of a blog post. The only thing you have to worry about when facing a public key is whether its owner is really who they claim to be. Its trustworthiness relies on the fact that no one can impersonate the key owner, i.e., hold the private counterpart of the key pair - but anyone can generate a key and tell you that it belongs to someone else.
But how can you be sure who the owner is by looking at the key itself? The problem is that you can’t. A public (Ed25519) key looks like this:
ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAII+kwSB7I6nTJ+mqhNMlEzofxa4UaGsU09A8Yk95MU1n [email protected]
This format is pretty straightforward, based on {key algorithm} {base64 representation of the key itself} {comment}. The problem lies in the fact that key pairs can be easily generated with ssh-keygen, and the “comment” part can even be altered after the key was created. You can only be sure about who the owner is by the way you got the public key from them. Did they send it to you via e-mail? Can you call them to confirm that this is the right key? Did you get it on a thumb drive brought to you personally?

The core functionality of this whole system is that a piece of information encrypted with a public key can only be decrypted by its private key. The opposite is true as well, and this is how signing works. Based on this cryptographic principle, the authentication process of an SSH connection works (in a simplified view) as follows:
- The client sends an authentication request, informing the username that will be used to log in.
- The server responds, telling the client which authentication methods are accepted (e.g. “publickey”, in this case).
- A message signed with the private key (a “signature”) is sent by the client to the server along with its corresponding public key.
- If this public key is listed as acceptable for authentication (usually as an entry under ~/.ssh/authorized_keys) and the signature is correct, the process is considered successful.
It’s important to point out that the only way to generate a valid signature that can be checked with a given public key is by having the private key itself. That’s the very reason why private keys shouldn’t be shared in any way, even with people you trust. There’s no need for two or more people to have access to the same private key. Everyone should have their own, and the server should be configured to accept all of them.
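To make the sign/verify relationship tangible, here’s a toy RSA-style sketch with deliberately tiny numbers. Real SSH keys use Ed25519 or RSA with 2048+ bit moduli, and real signature schemes add padding, but the principle is the same: only the holder of the private exponent can produce a signature that the public exponent validates.

```python
import hashlib

# Toy RSA parameters -- far too small to be secure, chosen so the
# math is easy to follow. n = p * q, and d is the modular inverse
# of e mod (p-1)*(q-1).
p, q = 61, 53
n = p * q   # 3233, the public modulus
e = 17      # public exponent
d = 2753    # private exponent

def sign(message, d, n):
    # Only the holder of d can compute this value.
    digest = int(hashlib.sha256(message).hexdigest(), 16) % n
    return pow(digest, d, n)

def verify(message, signature, e, n):
    # Anyone with the public pair (e, n) can check the signature.
    digest = int(hashlib.sha256(message).hexdigest(), 16) % n
    return pow(signature, e, n) == digest

msg = b'ssh-userauth challenge'
sig = sign(msg, d, n)
print(verify(msg, sig, e, n))            # True
print(verify(msg, (sig + 1) % n, e, n))  # False: any change breaks it
```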
We should be aware that there’s a case where the user can be confused about whether authentication is being done using passwords or public keys: when the private key has a passphrase. To safely store a key, a passphrase can be set to save it in a (symmetrically) encrypted format. This also works as another layer of security, as someone who happens to obtain the keyfile would also have to know the passphrase to unlock it. In this situation, ssh-agent can be used to cache this information and avoid typing it every time.
-
Benchmarking IP and Unix domain sockets (for real)
In a previous post, an artificial benchmark was done to measure the performance difference between IP and Unix domain sockets. The results were somewhat impressive, with Unix sockets performing at least twice as fast as IP sockets. But how do these two forms of communication behave in the real world, using a battle-tested application protocol? Would the throughput really double just by switching between them? We’ll be using a Flask app served by Gunicorn behind an nginx reverse proxy to find out.
The following tests were executed on a c4.large (2 Cores, 3.75GB RAM) instance on Amazon Web Services (AWS). None of the multi-threading/multi-process options offered by Gunicorn were used, so what we’ve got here is really what it can serve using a single CPU core. This way, we also have the benefit of a free core to run both nginx and the benchmarking tool (wrk) itself.

The application is pretty close to the standard Flask “hello world” example:
requirements.txt
Flask==0.12
gunicorn==19.6.0
server.py
from flask import Flask

app = Flask(__name__)

@app.route("/")
def hello():
    return "Hello there!"

if __name__ == "__main__":
    app.run()
Gunicorn was used to serve the application with no other option besides --bind:

- IP: gunicorn --bind 0.0.0.0:8000 server:app
- Unix domain socket: gunicorn --bind unix:/tmp/gunicorn.sock server:app
This is the nginx virtual host configuration for both Gunicorn instances:
/etc/nginx/sites-available/gunicorn
server {
    listen 80;
    server_name bench-ip.myhro.info;

    location / {
        proxy_pass http://127.0.0.1:8000;
        proxy_redirect off;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}

server {
    listen 80;
    server_name bench-unix.myhro.info;

    location / {
        proxy_pass http://unix:/tmp/gunicorn.sock;
        proxy_redirect off;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
We’ll have to append both hostnames to our /etc/hosts, in order to avoid the need for a DNS server:

(...)
127.0.0.1 bench-ip.myhro.info
127.0.0.1 bench-unix.myhro.info
The parameters used in this benchmark were pretty much what wrk offers by default. Experimenting with more threads or connections didn’t result in a significant difference, so the only parameter set was -d5s, which means “send the maximum number of requests you can during five seconds”.

IP benchmark
Running 5s test @ http://bench-ip.myhro.info/
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     5.44ms  303.33us  11.56ms   99.02%
    Req/Sec     0.92k    16.21     0.96k    66.00%
  9191 requests in 5.00s, 1.60MB read
Requests/sec:   1837.29
Transfer/sec:    328.26KB
Unix domain socket benchmark
Running 5s test @ http://bench-unix.myhro.info/
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     4.95ms  283.81us  11.25ms   97.96%
    Req/Sec     1.01k    24.75     1.04k    90.00%
  10107 requests in 5.00s, 1.76MB read
Requests/sec:   2019.39
Transfer/sec:    360.79KB
During multiple runs, these numbers were consistent. The Unix socket virtual host answered around 5 to 10% more requests on average. This number is small, but can be significant, especially when dealing with high-traffic web servers answering thousands of requests per minute. Anyway, this is nowhere near the 100% performance improvement we saw when comparing raw sockets rather than a real protocol like HTTP.
It would still be interesting to compare how this application performs running inside a Docker container. Docker is known for having network overhead when using forwarded ports, so we’ll see how much it matters in this case. Two files will be used to create our application image and its containers:
Dockerfile
FROM ubuntu:xenial

RUN apt-get update
RUN apt-get install -y python-pip

ADD . /app
RUN pip install -r /app/requirements.txt
WORKDIR /app
docker-compose.yml
version: "2"

services:
  base:
    build: .
    image: flask
  ip:
    image: flask
    command: gunicorn --bind 0.0.0.0:8000 server:app
    ports:
      - "8000:8000"
    volumes:
      - .:/app
  uds:
    image: flask
    command: gunicorn --bind unix:/tmp/gunicorn.sock server:app
    volumes:
      - .:/app
      - /tmp:/tmp
Let’s run wrk again, after docker-compose build and docker-compose up:

Docker IP benchmark
$ wrk -d5s http://bench-ip.myhro.info/
Running 5s test @ http://bench-ip.myhro.info/
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     7.03ms  791.63us  16.84ms   93.51%
    Req/Sec    713.54     20.21   747.00     70.00%
  7109 requests in 5.01s, 1.24MB read
Requests/sec:   1420.17
Transfer/sec:    253.73KB
Docker Unix domain socket benchmark
$ wrk -d5s http://bench-unix.myhro.info/
Running 5s test @ http://bench-unix.myhro.info/
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     4.94ms  266.67us  10.74ms   97.24%
    Req/Sec     1.02k    29.87     1.04k    95.00%
  10116 requests in 5.00s, 1.76MB read
Requests/sec:   2022.18
Transfer/sec:    361.29KB
The difference between IP sockets over forwarded ports and Unix sockets via shared volumes was huge under Docker. 40-45% is a pretty big number when considering web server performance penalties. With a setup like this one, almost twice the hardware resources would be needed to serve the same number of clients, which would directly reflect on infrastructure and project costs as a whole.
A few conclusions can be drawn from this experiment:
- Avoid Docker forwarded ports in production environments. Use either Unix sockets or the host network mode in this case, as they introduce virtually no overhead.
- Ports can be easier to manage than a bunch of socket files when dealing with multiple processes - whether that means many applications or scaling a single one. If you can afford a small drop in throughput, go for IP sockets.
- If you have to extract every drop of performance available, use Unix domain sockets where possible.
-
How fast are Unix domain sockets?
Warning: this is my first post written in English, after over five years writing only in Portuguese. After reading many technical articles written in English by non-native speakers, I wondered: imagine how much information I would be missing if they had written those posts in French or Russian. Following their example, this blog can now reach a much wider audience as well.
It has probably happened more than once: you ask your team how a reverse proxy should talk to the application backend server. “Unix sockets. They are faster.”, they’ll say. But how much faster will this communication be? And why is a Unix domain socket faster than an IP socket when multiple processes are talking to each other on the same machine? Before answering those questions, we should figure out what Unix sockets really are.
Unix sockets are a form of inter-process communication (IPC) that allows data exchange between processes on the same machine. They are special files, in the sense that they exist in the file system like a regular file (hence they have an inode and metadata like ownership and permissions associated with them), but are read and written using recv() and send() syscalls instead of read() and write(). When binding and connecting to a Unix socket, we use file paths instead of IP addresses and ports.

In order to determine how fast a Unix socket is compared to an IP socket, two proofs of concept (POCs) will be used. They were written in Python, due to being small and easy to understand. Their implementation details will be clarified when needed.
IP POC
ip_server.py
#!/usr/bin/env python
import socket

server_addr = '127.0.0.1'
server_port = 5000

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
sock.bind((server_addr, server_port))
sock.listen(0)

print 'Server ready.'

while True:
    conn, _ = sock.accept()
    conn.send('Hello there!')
    conn.close()
ip_client.py
#!/usr/bin/env python
import socket
import time

server_addr = '127.0.0.1'
server_port = 5000
duration = 1
end = time.time() + duration
msgs = 0

print 'Receiving messages...'

while time.time() < end:
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.connect((server_addr, server_port))
    data = sock.recv(32)
    msgs += 1
    sock.close()

print 'Received {} messages in {} second(s).'.format(msgs, duration)
Unix domain socket POC
uds_server.py
#!/usr/bin/env python
import os
import socket

server_addr = '/tmp/uds_server.sock'

if os.path.exists(server_addr):
    os.unlink(server_addr)

sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
sock.bind(server_addr)
sock.listen(0)

print 'Server ready.'

while True:
    conn, _ = sock.accept()
    conn.send('Hello there!')
    conn.close()
uds_client.py
#!/usr/bin/env python
import socket
import time

server_addr = '/tmp/uds_server.sock'
duration = 1
end = time.time() + duration
msgs = 0

print 'Receiving messages...'

while time.time() < end:
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.connect(server_addr)
    data = sock.recv(32)
    msgs += 1
    sock.close()

print 'Received {} messages in {} second(s).'.format(msgs, duration)
As we can see from those code snippets, both implementations are as close to each other as possible. The differences between them are:
- Their address family: socket.AF_INET (IP) and socket.AF_UNIX (Unix sockets).
- To bind a process using socket.AF_UNIX, the socket file should be removed and created again if it already exists.
- When using socket.AF_INET, the socket.SO_REUSEADDR flag has to be set in order to avoid socket.error: [Errno 98] Address already in use errors that may occur even when the socket is properly closed. This option tells the kernel to reuse the same port if there are connections in the TIME_WAIT state.
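The effect of that flag can be observed directly - a quick sketch that sets SO_REUSEADDR and reads it back with getsockopt():

```python
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# The option is off by default, so getsockopt() returns 0 here.
print(sock.getsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR))  # 0

sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)

# Once set, a non-zero value is returned.
print(sock.getsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR))
sock.close()
```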
Both POCs were executed on a Core i3 laptop running Ubuntu 16.04 (Xenial) with the stock kernel. There is no output at every loop iteration, to avoid the huge performance penalty of writing to the screen. Let’s take a look at their performance.
IP POC
First terminal:
$ python ip_server.py
Server ready.
Second terminal:
$ python ip_client.py
Receiving messages...
Received 10159 messages in 1 second(s).
Unix domain socket POC
First terminal:
$ python uds_server.py
Server ready.
Second terminal:
$ python uds_client.py
Receiving messages...
Received 22067 messages in 1 second(s).
The Unix socket implementation can send and receive more than twice the number of messages, over the course of a second, compared to the IP one. During multiple runs, this proportion is consistent, varying by around 10% either way on both of them. Now that we’ve figured out the performance difference, let’s find out why Unix sockets are so much faster.
It’s important to notice that both the IP and Unix socket implementations are using TCP (socket.SOCK_STREAM), so the answer isn’t related to how TCP performs in comparison to another transport protocol like UDP, for instance (see update 1). What happens is that when Unix sockets are used, the entire IP stack of the operating system is bypassed. There are no headers being added, no checksums being calculated (see update 2), no encapsulation and decapsulation of packets being done, nor routing being performed. Although those tasks are performed really fast by the OS, there is still a visible difference in benchmarks like this one.

There’s much room for real-world comparisons beyond the synthetic measurement demonstrated here. What will the throughput difference be when a reverse proxy like nginx is communicating with a Gunicorn backend server using IP or Unix sockets? Will it impact latency as well? What about transferring big chunks of data, like huge binary files, instead of small messages? Can Unix sockets be used to avoid Docker network overhead when forwarding ports from the host to a container?
References:
- Beej’s Guide to Unix IPC, 11. Unix Sockets (Brian “Beej Jorgensen” Hall)
- Programmation Systèmes, Cours 9 - UNIX Domain Sockets (Stefano Zacchiroli)
- Unix Domain Sockets - Python Module of the Week (Doug Hellmann)
- unix domain sockets vs. internet sockets (Robert Watson)
- What exactly does SO_REUSEADDR do? (Hermelito Go)
Updates:
- John-Mark Gurney and Justin Cormack pointed out that SOCK_STREAM doesn’t mean TCP under Unix domain sockets. This makes sense, but I couldn’t find any reference confirming or denying it.
- Justin Cormack also mentioned that there’s no checksumming on local interfaces by default. Looking at the source code of the Linux loopback driver, this seems to have been present in the kernel since version 2.6.12-r2.