
Jarle Aase

How I ended up with a separate back-end service for signing up users and their devices


When I started on NextApp, I knew that it would be multi-tenant and multi-user. The database tables for tenants and users were among the first I made. Literally. This is from the code for the first iteration of the SQL database for nextappd.

static constexpr auto v1_bootstrap = to_array<string_view>({
        "CREATE TABLE nextapp (id INTEGER NOT NULL, version INTEGER NOT NULL, serverid VARCHAR(37) NOT NULL DEFAULT UUID()) ",

        "INSERT INTO nextapp (id, version) values(1, 0)",

        R"(CREATE TABLE tenant (
              id UUID not NULL default UUID() PRIMARY KEY,
              name VARCHAR(128) NOT NULL,
              kind ENUM('super', 'regular') NOT NULL DEFAULT 'regular',
              descr TEXT,
              active TINYINT(1) NOT NULL DEFAULT 1))",

        "INSERT INTO tenant (id, name, kind) VALUES ('a5e7bafc-9cba-11ee-a971-978657e51f0c', 'nextapp', 'super')",

        R"(CREATE TABLE user (
              id UUID not NULL default UUID() PRIMARY KEY,
              tenant UUID NOT NULL,
              name VARCHAR(128) NOT NULL,
              kind ENUM('super', 'regular', 'guest') NOT NULL DEFAULT 'regular',
              descr TEXT,
              active TINYINT(1) NOT NULL DEFAULT 1,
        FOREIGN KEY(tenant) REFERENCES tenant(id)))",

As you can see, after a table to store the version of the database schema and a unique server id, tenant and user were next.

The admin tenant and admin user are created during the bootstrap process the first time the database is initialized. And for more than 6 months, this was the only tenant and user the system could handle. I focused on creating a great app to match my needs. I started to use the app after less than one month of development, when the first feature, Green Days, was ready. Then I continued to use new features as they became ready. In July 2024 the app covered my basic needs. It was also way better than the various apps I used before, like VikingGTD and WHID. So I started to think about how to actually design the multi-user functionality.

I am a very public person in some regards, but also a very private person with a strong focus on privacy and security. The two common approaches to multi-tenant apps, Single Sign On (for example via Apple, Google or GitHub authentication) and traditional username/password, do not fit my preferences. Single Sign On is convenient, but insecure, and it leaks meta-information to mega-corporations. For example, if your Single Sign On login is compromised, someone may access all your apps that use that authentication. If you are kicked out by the mega-corporation (which can do that for whatever reason - just look at their EULA), you lose access to all the apps that use that authentication. So it is not safe to use for the app that organizes your life. Username and password is actually a great way to handle this, provided that you use a password manager and cryptographically random passwords for the service. Most people do neither. So username/password is - for most users - utterly insecure. Some users will use good, secure passwords, and then forget them. Others will use trusted passwords that they have used for decades and that they remember - like "secure beer 123" passwords that any brute force attack will crack sooner rather than later.

I wanted something better. Something that doesn't use any 3rd party authentication, and that also doesn't give the user the opportunity to shoot themselves in the foot. Because if you have a public service, and there is any chance at all for users to shoot themselves in the foot, you can be pretty sure they will figure out how to do that.

The gRPC implementation from Google supports TLS. gRPC uses HTTP/2 as the transport layer, and TLS is an important feature of HTTP. But in addition to the usual verification, there are some very useful options you can set on a server-side connection. The strictest TLS mode is GRPC_SSL_REQUEST_AND_REQUIRE_CLIENT_CERTIFICATE_AND_VERIFY. This requires that the client presents a signed client TLS certificate, and that the certificate is valid. If I give gRPC a self-signed X509 root certificate, and set this option, a connection will only be visible to the application layer if the client has a client certificate signed by that root certificate. This provides very strong security. And this was the route I chose.

Nextappd, the back-end server for NextApp, now creates a self-signed root certificate the first time it starts up. It also creates a server cert with a DNS name for each valid FQDN (Fully Qualified Domain Name) the server can be addressed by, and signs it with the root certificate. When it starts accepting gRPC requests, it only sees requests that have a valid client cert signed by its root certificate. This gives us the strongest possible authentication security one can get by using public-key cryptography. Normally on the Internet, servers have an X509 certificate that is signed by a "trusted" entity, and the client has no signed public key at all. This allows the client to validate the server's cert, by checking that the signer of its cert is in the application's list of trusted entities. However, some nation states are in that list. That means that there is a real possibility that, for example, their intelligence agencies can forge X509 certificates that will look valid to the application, and launch a man-in-the-middle attack, or just hijack a service altogether. The same goes for hackers who manage to steal trusted root certs or convince a "trusted" entity that they are in fact you! There are some mitigations for this. But if you use a self-signed cert, and the client application has the public key for that certificate, there is no way to hijack the service or launch a man-in-the-middle attack unless you are hacked, or there is a bug in the OpenSSL libraries used by the application. Such a bug would affect all users of X509 certs, including nation states and intelligence agencies themselves. When both sides of the TLS connection have certificates signed by a self-signed root certificate, and both sides have a copy of the root certificate's public key, you have the gold standard for authentication.

This is all well and good. But unless the NextApp client app is shipped with the public key of the server's root certificate, and a signed client cert, it will be unable to connect to the server. Shipping these with the app is generally not a good idea. For example, if somebody other than the actual client creates the client's X509 certificate and private key, that somebody could impersonate the client. So when I drafted all this on paper, I very soon ran into a chicken-and-egg problem.

My solution was to introduce a new back-end server dedicated to handling new client devices, signupd. This is a service written in C++20, just like nextappd, and also using gRPC for communication. I added a command-line option to nextappd to create a signed client certificate and private key with admin rights for signupd. I consider this safe, as it will be done by whoever manages the back-end services. Those people are normally competent and vetted, and they have access to all the infrastructure, certificates and shared secrets for a system anyway. They are the ones responsible for protecting the secrets from everybody else.

Signupd uses a normal X509 certificate, signed by a trusted 'entity'. That means that when the client connects to signupd, it has the same level of security as a normal TLS connection on the web. Not ideal, but good enough for now. When the client connects to signupd, it can create a new tenant account or add a new device for an existing user. In both cases the client app generates a private key and a certificate signing request. It sends the request, along with an email address, a name for the device (so the user can recognize it in a list of their devices), and the operating system and its version. Signupd forwards the request to nextappd. If nextappd accepts the request, it signs the certificate request and returns a valid certificate together with the public key for the root certificate. The private key remains on the client device. Ideally, it will never leave it. When the app now connects directly to nextappd, nextappd can identify it from information in the certificate. The client can validate that the server is who it says it is. And the server will just ignore any connections from anyone who doesn't have a valid, signed client cert.
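To make the relay step concrete, here is a minimal sketch of the data that travels through signupd. All type, field and function names are hypothetical and chosen for illustration; the real services exchange gRPC messages, and the callback below merely stands in for the call to nextappd.

```cpp
#include <functional>
#include <optional>
#include <string>

// Hypothetical shape of what the client sends. Note what is absent:
// the private key is generated on the device and is never transmitted.
struct SignupRequest {
    std::string csr_pem;      // certificate signing request
    std::string email;
    std::string device_name;  // shown in the user's list of devices
    std::string os;
    std::string os_version;
};

// Hypothetical shape of what comes back when the request is accepted.
struct SignupReply {
    std::string client_cert_pem;  // the signed client certificate
    std::string root_ca_pem;      // lets the client validate nextappd later
};

// Stand-in for the call to nextappd; an empty optional means "rejected".
using Signer = std::function<std::optional<SignupReply>(const SignupRequest&)>;

// Sketch of signupd's role: reject obviously incomplete requests,
// otherwise forward the request unchanged and relay the answer.
std::optional<SignupReply> relay_signup(const SignupRequest& req,
                                        const Signer& nextappd) {
    if (req.csr_pem.empty() || req.email.empty() || req.device_name.empty()) {
        return std::nullopt;
    }
    return nextappd(req);
}
```

The point of the sketch is the asymmetry: the request carries only public material, and the reply gives the client everything it needs to talk to nextappd directly from then on.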

What do we do if the user wants to add another device? For example, if the user signed up with the mobile app, they may want to use their laptop as well, as a laptop is way more efficient to work with. I was very tempted to do like Telegram and scan a QR code on the mobile to authenticate the laptop. Unfortunately, the user may choose to sign up with a laptop, and possibly one without a camera. So a QR code, as safe and convenient as it may be, cannot be the only way to authenticate a new device. At the time I write this, the app uses a One Time Password (OTP) to authenticate a new device. The user installs the app on the device, and when it starts for the first time it discovers that it is not initialized with a signed certificate. It therefore shows the sign-up workflow instead of the normal user interface. The user can create a new tenant, or add a new device to an existing user. When adding a device, the app asks for an OTP and the user's email address. The user then uses a registered device to get a new OTP. The OTP is an 8-digit random number. When the app provides a new OTP, it also reminds the user about the email address that is associated with the user account. This last thing is quite useful for people like me who create unique email addresses for any online service that requires an email to sign up. (Why do I do this? I don't like to be tracked by hundreds of privacy-invading companies that sell any information about me and my habits that they can possibly collect.)
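Generating an 8-digit OTP like the one described above could look roughly like this. It is a sketch only: `make_otp` is a hypothetical name, and it assumes `std::random_device` is backed by the operating system's entropy source, which the C++ standard does not guarantee.

```cpp
#include <cstddef>
#include <random>
#include <string>

// Hypothetical helper: returns the OTP as text so that leading zeros
// survive ("00042137" is a perfectly valid 8-digit code).
// Assumption: std::random_device draws from the OS entropy source on the
// target platform. A production service would more likely use a vetted
// CSPRNG from its crypto library.
std::string make_otp(std::size_t digits = 8) {
    std::random_device rd;
    std::uniform_int_distribution<int> digit(0, 9);  // each digit uniform in 0..9

    std::string otp;
    otp.reserve(digits);
    for (std::size_t i = 0; i < digits; ++i) {
        otp.push_back(static_cast<char>('0' + digit(rd)));
    }
    return otp;
}
```

Drawing digit by digit, rather than formatting one random integer, keeps the code at a fixed width without any padding logic.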

When the signup server gets the email and the OTP, it forwards them to nextappd. Nextappd computes a hash from the email, the OTP and a seed value, and compares it with the hash it computed when the OTP was created. If the hashes are the same, the device is accepted and the user can immediately start using it. We don't store the OTP in clear text in the database, to prevent bad actors from using that information if the data is ever stolen. Unless you have lived under a rock in recent years, you probably know that breaches do happen. When designing data systems and authentication workflows, it is therefore crucial to make everything as secure as possible, and assume that bad things will happen.
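The hash-and-compare flow can be sketched as follows. To keep the example free of dependencies it uses `std::hash` as a stand-in digest, which is NOT cryptographic; a real service would use a cryptographic hash or key derivation function from a crypto library. The function names and the way the fields are combined are assumptions, not taken from nextappd.

```cpp
#include <cstddef>
#include <functional>
#include <initializer_list>
#include <string>
#include <string_view>

// Stand-in digest type. std::hash is NOT a cryptographic hash; it is used
// here only so the sketch compiles with the standard library alone.
using Digest = std::size_t;

// Combine seed, email and OTP into one digest. Each field is length-prefixed
// so that ("ab", "c") and ("a", "bc") cannot produce the same input.
Digest otp_digest(std::string_view email, std::string_view otp,
                  std::string_view seed) {
    std::string buf;
    for (std::string_view part : {seed, email, otp}) {
        buf += std::to_string(part.size());
        buf += ':';
        buf += part;
    }
    return std::hash<std::string>{}(buf);
}

// When the OTP is created, only otp_digest(...) is stored. When a device
// presents an email and an OTP, the digest is recomputed and compared,
// so the clear-text OTP never needs to be kept in the database.
bool verify_otp(Digest stored, std::string_view email, std::string_view otp,
                std::string_view seed) {
    return stored == otp_digest(email, otp, seed);
}
```

Binding the email into the digest means an OTP is only useful together with the address it was issued for.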

With this architecture, the client application authenticates itself to the server using a cryptographically strong secret (its private key), and it validates that the server is authentic. The user, however, just starts and uses the app without entering any credentials. So it is as convenient to use as any local application, and still very secure. Unless an adversary has local access to a device, they can't realistically guess a password or find one in any of the many dumps of usernames and passwords on the dark web.

Is this the only reason I chose to add another service for this? I mean, it would have been simpler to just add another gRPC endpoint to the existing service and be done with it. So of course there was another reason as well.

Using a separate service for signup makes it much simpler to scale up the back-end infrastructure in the future by assigning users to one of several available back-ends (sharding). If/when NextApp becomes so popular that a single server cannot handle the load and provide good response times, we can add as many new servers as we need. It also allows us to put users on servers close to their physical location, to reduce latency. Currently signupd does neither, but both scenarios are relatively trivial to add. By breaking out signup as a separate workflow already at this phase, we avoid making scaling incrementally harder as we add features and improvements to the system.
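Since signupd does not do any of this yet, the following is purely a hypothetical sketch of how it might one day pick a back-end for a new user: prefer a server in the user's region, and within that choice take the one with the fewest users. The `Backend` struct, the region strings and the addresses are invented for the example.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical description of one nextappd instance.
struct Backend {
    std::string address;  // e.g. "eu1.example.com:443"
    std::string region;   // e.g. "eu"
    std::size_t users;    // current number of assigned users
};

// Prefer the least-loaded back-end in the requested region. If the region
// has no back-end, fall back to the least-loaded one overall.
std::optional<Backend> pick_backend(const std::vector<Backend>& backends,
                                    std::string_view region) {
    const Backend* best = nullptr;
    bool best_in_region = false;
    for (const auto& b : backends) {
        const bool in_region = (b.region == region);
        const bool better_class = in_region && !best_in_region;
        const bool same_class_less_load =
            best && in_region == best_in_region && b.users < best->users;
        if (!best || better_class || same_class_less_load) {
            best = &b;
            best_in_region = in_region;
        }
    }
    if (!best) {
        return std::nullopt;  // no back-ends configured
    }
    return *best;
}
```

Because the decision is made once, at signup, the client only needs to remember which server it was given; nextappd itself does not have to know about the other shards.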

Here is a video showing what the sign-on workflow looks like just after it was implemented.