Updated: Dec 24, 2019
When it comes to authorization, we constantly find ourselves having to balance between consistency and availability. On one hand, we do not want to make too many trips to our database just to confirm sessions, but on the other, we want to have as much control over the user’s state as we can possibly afford.
In this article we’re going to explore the standard practices in the world of access control, and suggest a hybrid approach for session revocation that both ensures immediate propagation of change while still saving us most (if not all) of our database queries — with just a tiny bit of probabilistic magic…
Let me begin with a short story: it's Saturday night, and you’re about to throw the greatest party of all time. Since your party is so great, many many people have signed up for it, and you know for certain, that many others might try to crash it without an invitation. In order to make sure that things don’t go out of hand, you decide that some sort of authorization should be in place.
Your first idea is to manage a guest list, and use it to verify every guest at the entrance. Your excitement is short lived, though, as a friend of yours points out that so many people have RSVP’d, that it would take some good amount of time for the bouncer to verify each and every one of their names, especially if many of them come together all at once. You decide that he might be right, and instead come up with another method.
The other method you can think of, consists of handing out invitations to your guests, each containing their name. That way, while the invitations are not directly managed by you, you can still ensure their authenticity and authorize your guests on a name basis without managing a guest list…
Handling user authorization in web applications often looks much like throwing a party. We must find a way to make sure that only our guests get in, and when they do, we have to make sure that they don’t step out of line. How do we do that?
Note: if you’re already familiar with modern-day authorization methods, feel free to skip to the next section.
There are two main approaches to solving this problem. The first is the centralized approach, where every access request is being verified against a user (or session) database:
From a security point of view, this approach is as fail-proof as it gets, since every request is being verified with a single source of truth, prior to performing any kind of action. This is our guest list.
But as our friend mentioned, this approach has its downsides. At the very least, it makes requests take longer, as we add another stop between our users and what they want to do. Also, by making them go through the database with every request, we introduce another point of failure — even if our server is up and running, any issue with our database would lead to denial of service. The root of the problem here, is the fact that we introduce a server-side state to the authorization process, which means that we have to maintain some kind of data store to contain and check that state, whenever we need to let guests in.
The second approach is more distributed: rather than saving our user access permissions in our servers, we tell them what their permissions are, and ask them to remind this to us with every request:
In order to prevent them from lying to us about what permissions we gave them in the first place, we sign the permission we give them cryptographically, using a secret only known to us. That way, we can make sure that they have not tampered with their session. (Note: the same thing goes with the database approach — we sign their session identification, so they won’t be able to try and guess some other user’s session).
Once we are done with identifying the user for their log in process, we don’t need the database anymore! We can locally and efficiently ensure that they really are who they say they are, and have the permissions they claim to have!
“But wait!”, says that annoying friend of yours. “What will you do if you change your mind about one of the guests? Once you sign their invitation, there’s no way for you to take it back!”
And he would be entirely right. Since we sign the user’s identity at the time of login, we only know about their state at that time. If we later on decide that their set of permissions have changed or, god forbid, that they’re banned from our party, we won’t be able to tell that simply by looking at their invitation!
The most popular approach to solving the issue of changes in user state, is to issue each session (or invitation to the party) with an inherent expiration date. That way, every now and then, the user will have to renew their session, at which point we’ll be able to update their state, or block them entirely from our service. One very common implementation of that method can be found in the JWT (or JSON Web Token) standard. JWT lets you store user data in the form of a signed (or encrypted) JSON object, which contains an expiration date as part of its metadata (or “headers”, as they call it).
The main drawback of this method, of course, is the fact that it is time based, and the size of the time frames can only ensure one of two things: make it too frequent, and you find yourself making too many trips to the database, but make it too sparse, and it will take too long for changes in authorization to propagate, which greatly reduces their effectiveness.
We are now approaching the point of this article: there is another approach that might help us both maintain consistency and save us a lot of trips to the database; an approach based on chance…
Back in 1970, a man named Burton Howard Bloom came up with a rather sophisticated idea. He wanted to maintain a set of an unreasonably large number of items (for that time) that required an amount of memory that he could not possibly get. Bloom then thought of the idea of a probabilistic data structure: a set that could tell you whether or not it probably contains an item. That data structure, is called the Bloom filter.
The way a Bloom filter works is rather simple: you have an m sized bit-array in which you want to store n elements. (Note: bit, an 8th of a byte, is a storage unit that can basically hold just a boolean value: 0 or 1). For each element we want to put in our set, we run it through a group of hash functions, each function giving us an index inside said array. For every index i we come up with, we turn the ith bit in the array on:
We then want to check for the existence of an item in the set. We do that by passing the new item through the same group of hash functions, and then we go on to check the array. Hash functions are deterministic by nature, and will always map a given input (Ice Cream) to the same output (1, 4, 13), so if at least one of the bits is turned off, then we can safely assume that this item is not in the set. If it were, all the bits should have been turned on before. But what happens when the test comes back positive? Can we say for certain that the item exists within the set? The answer is no.
Due to the fact that bits are arbitrarily assigned to every element, there might be collisions. In the example above, the value “Ice Cream” is assigned with the 1st, 4th and 13th indices. But nothing ensures us that those indices were not covered by one or more elements that are not Ice Cream.
Now let’s go back to our case of user sessions. Since we want to reduce the amount of database access for the average authorization process, we forfeit the reliability of our state. But that doesn’t mean we cannot rely on chance, in the form of a probable set of access denial, stored locally on our server.
Let’s say that we have a database consisting of one million users. Whenever a user logs in, we save their data on a session and sign it, so we can easily trust it later on. One of the things we would probably store in the session is some sort of a session ID.
Now let’s say that we want to revoke a session. If we stored the session ID of the user in the database when we issued that session (as we should have), we can restore it from there. We then put the session in the set of access denial, a Bloom filter that contains all the IDs of the sessions that we have previously revoked.
When a user next tries to gain access, we run that session’s ID through the set. If it returned a negative result, then that session was certainly not revoked, and that’s a key factor here — unauthorized access is not possible with this approach. On the other hand, if we find it to be positive, then the session was probably revoked, and the user will not be able to carry on.
The problem is that a user might be denied access even if we didn’t mean to block them. As with the Ice Cream example, their set of bits in the array might be accidentally covered by others users. We can ease that by referring to a general rule of thumb regarding Bloom filters: if we have 10 bits per expected item in the set, we have a probability of less than 1% for a collision. For a user base consisting of 1 million users, this is roughly 1 MB of space. Not too shabby, huh?
But then a new problem arises. Gradually but surely, we will exhaust our set. As more and more sessions are revoked, more and more bits are turned on, and the chance of false positivity rises sharply. We can prepare ourselves for 1 million session revokes, but what happens when we hit 10 million users? Or 50 million?
The answer is something we’ve already discussed — expiration. Let’s say, once again, that we have one million users, out of which 10,000 are daily active. Say that their sessions are good for a limited amount of time (for the sake of our example, let’s make it a day), after which they require renewal against the database. In that case, we can also empty the set of access denial once a day, as we can shut down the session upon the mandatory renewal anyways. Furthermore, we can even shrink the allocated memory for the set down to match the amount of expected daily users (in our case, it would take 12.5 kb for 10,000 daily users instead of 1 MB for all of our users), and scale it according to traffic. When we hit 50 million users, we’ll still be good to go, as we change the set’s size on a day-to-day basis.
And what will happen if the user was falsely logged out? They’ll have to log in again. When you rely on chance you might make mistakes, but if you keep the collision rate below a certain ratio, this will be so rare, that it won’t be more than a minor inconvenience.
To sum it all up, by using the traditional method of session expiration, we can dramatically reduce the amount of database queries required for the average authorization process. Combined with a Bloom filter, we can further reduce them, as we can make the expiration interval less frequent while still maintaining an immediate consistency of session revocation and access denial. Probably.
This is, of course, theoretically speaking, and it is important to state that this article does not reflect how authorization currently works at Wix in any way. In real life, there are many other concerns which we have not yet discussed, such as how this system would work in a distributed environment, or within a serverless framework. That is a story for another day, though.
The party, by the way, was a hit.
This post was written by Roy Sommer
You can follow him on Medium
For more engineering updates and insights: