The dark age of authentication

It's no secret that authentication is an unsolved problem. Over time we have managed to make it more secure, but at the expense of user experience. The new generation of email codes and authenticator apps has moved us from the ease of one-click browser autocomplete to complex ordeals involving multiple steps and sometimes multiple devices.

Last month, Notion logged me out automatically, and as I logged back in I couldn't help but think: "It feels like I'm logging in here every other week; maybe I'm doing something wrong." After a long look through the settings, I decided to open a ticket asking whether the session length was really that short. The response from Notion's team was prompt and specific, a great example of customer service. The content of the answer, however, was less pleasing.

Notion response

Notion is not alone in this; many other services enforce similarly short sessions and uncomfortable methods. This got me pondering the evolution of our authentication methods, from their ancient beginnings to their modern complexities. Let's take a look at the history of authentication and rate each method on two scales: user experience and security.

The first recorded password in Western history appears in the Book of Judges. In the text, Gileadite soldiers used the word "shibboleth" to detect their enemies, the Ephraimites, who spoke a different dialect and would say "sibboleth" instead. Experience ★★★★★: you just had to say a word. Security ☆☆☆☆☆: a single word authenticates multiple users, and it can be cracked by learning how to pronounce it.

The ancient Romans relied on passwords in a similar manner, calling them "watchwords". Every night, a wooden tablet with the watchword inscribed made its rounds, and each unit marked its initials until every encampment had seen it. During night patrols, soldiers would whisper the watchword to identify allies. Experience ★★★☆☆: you just had to say a word, but you had to memorize a new one every day. Security ★☆☆☆☆: it changes daily, but it's still a single word, and without a "forgot password" button, a wrong answer could mean a spear in the gut.

Fast forward to the 1920s: alcohol became illegal in the US, and speakeasies (illegal drinking establishments) were born. To get in, patrons had to quietly whisper a code word so law enforcement wouldn't find out. The code words were ridiculous, to say the least: coffin varnish, monkey rum, panther sweat, and tarantula juice, to name a few. Experience ★★★★☆: you just had to say a word, and they were made to be memorable. Security ★☆☆☆☆: it's a single word, and not even much of a secret, but at least you don't get stabbed for getting it wrong.

The first recorded use of a password in the digital age is attributed to Dr. Fernando Corbató. In the 1960s, monolithic machines could only work on one problem at a time, which meant the queue of jobs waiting to be processed was huge and a lot of processing time was wasted. He developed an operating system, the Compatible Time-Sharing System (CTSS), that broke large processing tasks into smaller components and gave each one a small slice of time. Since multiple users shared one computer, files had to be assigned to individual researchers and be available only to them, so every user got a unique name and password to access their files. However, those passwords were stored in a plaintext file on the machine, and there were a few cases of accidental and intentional password leaks. Experience ★★★☆☆: you have to remember a username and password. Security ★★☆☆☆: there's one per user, but they're stored in plaintext.

To solve the problem of plaintext passwords, Robert Morris and Ken Thompson developed a simulation of a World War 2 crypto machine that scrambled each password before storing it. The system could then ask for the password, scramble it, and compare the result against the scrambled version on disk, a process called one-way hashing. The scheme shipped with 6th Edition Unix in 1974 and has been improved many times since, but the basic idea remains the same. Experience ★★★☆☆: you have to remember a username and password. Security ★★★☆☆: it's no longer plaintext, but stealing the password itself would still give you access to the system.

A Hagelin rotor crypto machine
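
The modern descendants of that scheme look something like this: a minimal sketch using Python's standard library, where the salting and iteration count are today's practice rather than the 1974 original, and the function names are mine.

import hashlib, hmac, os

def store(password: str) -> tuple[bytes, bytes]:
    # Scramble the password with a per-user salt; only the digest is stored.
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return salt, digest

def verify(password: str, salt: bytes, digest: bytes) -> bool:
    # Scramble the attempt the same way and compare the two digests.
    attempt = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return hmac.compare_digest(attempt, digest)

salt, digest = store("correct horse battery staple")
print(verify("correct horse battery staple", salt, digest))  # True
print(verify("wrong guess", salt, digest))                   # False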

Over time, all sorts of problems arose from people reusing the same password across services, so the industry started to push for a unique password per service. That was a burden for users, who now had to remember dozens of passwords, and so password managers were born. Bruce Schneier developed the first one in 1997, and nowadays every major browser ships with a built-in manager, often with an option to generate strong passwords and store them for you. Experience ★★★★☆: you have to remember a master password, and the browser remembers the rest. Security ★★★★☆: nothing is stored in plaintext, but the master password is the weakest link in the chain.
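
At its core, a password manager is exactly this: one master secret that encrypts everything else. A minimal sketch, assuming the third-party cryptography package (pip install cryptography); the salt and the stored entry are made up.

import base64, hashlib
from cryptography.fernet import Fernet

def vault_key(master_password: str, salt: bytes) -> Fernet:
    # Derive a 32-byte key from the master password; the key is never stored.
    raw = hashlib.pbkdf2_hmac("sha256", master_password.encode(), salt, 600_000)
    return Fernet(base64.urlsafe_b64encode(raw))

vault = vault_key("the one password to rule them all", b"per-vault-salt")
token = vault.encrypt(b"gmail: hunter2")  # what actually sits on disk
print(vault.decrypt(token))               # b'gmail: hunter2'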

Phishing attacks and data breaches have turned passwords into a liability, so the industry has been pushing multi-factor authentication (MFA) for a while now. Two-factor authentication (2FA) requires two different factors to verify your identity: the first is usually something you know, like a password, and the second something you have, like a phone. Even if someone steals your password, they still need your phone to log in. There is a myriad of ways to implement 2FA, but the most common are SMS codes, authenticator apps, and email codes, often combined with very short session lengths. Experience ☆☆☆☆☆: you have to remember something, have a phone or mail app at hand, and go through multiple steps. Security ★★★★☆: it's no longer a single factor, but it's still vulnerable to phishing.
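
Authenticator apps typically implement TOTP (RFC 6238): server and phone share a secret, and both derive a short-lived code from the current time. A rough sketch; the base32 secret below is the usual documentation placeholder, not a real one.

import base64, hmac, struct, time

def totp(secret_b32: str, digits: int = 6, step: int = 30) -> str:
    # HMAC the current 30-second counter with the shared secret (RFC 6238).
    key = base64.b32decode(secret_b32)
    counter = struct.pack(">Q", int(time.time()) // step)
    mac = hmac.new(key, counter, "sha1").digest()
    offset = mac[-1] & 0x0F  # dynamic truncation (RFC 4226)
    code = (struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF) % 10 ** digits
    return str(code).zfill(digits)

print(totp("JBSWY3DPEHPK3PXP"))  # the same 6 digits your authenticator shows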

I, like most people, hate passwords and every other piece of authentication bureaucracy. And it looks like we're now at the lowest point in history in terms of UX. There is still hope in the rise of Single Sign-On (SSO) and biometrics. And passkeys, which have been getting a lot of traction lately, are certainly a step in the right direction. But only time will tell whether their adoption becomes widespread enough to make a difference, or whether we'll be stuck in this dark age of authentication experience for a while.

The Bono Sevilla

A practice recently adopted in some cities is the so-called commerce voucher (bono comercio). It is a voucher with a limited number of units sold by the city council, which can be used as payment in some small brick-and-mortar shops to encourage local shopping.

The current scheme in Seville offers these vouchers at €15 each, and they can be used for purchases worth €25. Since only 55,000 units were available, they sold out within hours, a common pattern with this kind of voucher. The dynamic this creates is quite interesting: the city council, and by extension every citizen, is essentially giving away €0.55 million to encourage €8.25 million of spending in local shops. The benefit goes especially to those citizens quick enough to buy the vouchers online, with a limit of five per person (in other words, €125 in purchasing value for €75 invested). It's a peculiar way of promoting local commerce.

Having bought mine, my only complaint is that the web-based shop search is quite limited, and the app has a map that doesn't help much. When I want to spend them on something, either I already know the shop (there is a text search for shops), or I know the category of what I want (the map would need category filters), or I'm just exploring shops (the map would need to show establishments at lower zoom levels).

The Bono Sevilla search engine

I've noticed on Twitter that most users who mention their voucher have used it mainly to buy books, comics, and games. This could be because the most tech-savvy (usually the geekiest) were the fastest to get them, or perhaps because the current search engine doesn't make it easy to discover new shops.

Bono Sevilla in Google Earth

So I decided to scrape the shop data from the website and convert it into a KML file, available through the link in the image above. This format makes it easy to load the information into platforms like Google Earth Online by simply dragging the file in. In the end, a small touch of technology can open new doors to the commercial corners of our city.
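
The conversion itself is straightforward. A minimal sketch of the idea, assuming the scraped shops are already a list of names and coordinates (the field names and sample entry are mine, not the site's):

import xml.etree.ElementTree as ET

# Hypothetical scraped records; the real fields come from the Bono Sevilla site.
shops = [
    {"name": "Librería Ejemplo", "lat": 37.3891, "lon": -5.9845},
]

kml = ET.Element("kml", xmlns="http://www.opengis.net/kml/2.2")
doc = ET.SubElement(kml, "Document")
for shop in shops:
    placemark = ET.SubElement(doc, "Placemark")
    ET.SubElement(placemark, "name").text = shop["name"]
    point = ET.SubElement(placemark, "Point")
    # KML expects "longitude,latitude" order.
    ET.SubElement(point, "coordinates").text = f'{shop["lon"]},{shop["lat"]}'

ET.ElementTree(kml).write("bono_sevilla.kml", xml_declaration=True, encoding="UTF-8")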

Advent of Rust

I'm doing Advent of Code again this year, and this time I'm using it as an excuse to dive into Rust. Rust is a modern general-purpose programming language focused on performance and type safety: it guarantees memory safety without relying on a garbage collector, a key feature behind its popularity.

Gingerbread boy ASCII with title

However, the adjective modern is what interests me most in all of this. Until now, C++ has been my go-to compiled language when I need performance. But having grown accustomed to Python as my default choice for general-purpose work, returning to C++ feels increasingly tedious.

C++ itself isn't the issue; in fact, I think of it as the vanilla flavor of programming languages. It's the surrounding ecosystem that feels outdated: scrolling endlessly through 90's websites from the era of long attention spans, dependency installs that are never just one command away, unintuitive build chains… all of this made me look for a fresh compiled language.

Realistically, one can only excel at so many programming languages. I used to be a fairly good Java programmer, yet today I barely remember how to read from a file. That's why I'm taking this search very seriously: I want a new default compiled language I can use for the next 10 years without worrying about the next new kid in the neighborhood.

When looking for alternatives, I started by filtering candidates through the top of the TIOBE index. I know it's not a very scientific source, but it gives a good sense of which languages are on top, and a language can be as cool as you want: if I can't find an easy answer to some obscure error, or there's no connector for the lesser-known database I want to use, I'm out. Through that filter, Rust and Go were the only real contenders.

Initially I went with Go, as the syntax looked much simpler, and I believe a programmer's efficiency is inversely proportional to how many things they have to remember. There's a reason a language roughly 64 times slower than C sits at #1: syntax is as important as semantics. Rookies often focus solely on syntax, while seasoned programmers sometimes dismiss its importance. In that same spirit, I had ruled out Rust because I deemed its syntax too alien. I mean, this is normal Rust code:

fn main() {
    (1..=5).filter(|&x| x % 2 != 0)
           .for_each(|x| println!("{} is odd", x));
}

However, I'm intrigued by why Rust consistently ranks as the most loved language in the yearly Stack Overflow survey. I really enjoy coding in Go, how unified all the tooling is, and how readable everything ends up being (iota aside), but I'm still not convinced about committing to it for the long term.

Hence, I'll be doing Advent of Code in Rust with Copilot disabled, partly to assess how reliant I've become on AI over the last two years. There is something about the Christmas spirit in doing things the old-fashioned, human way for a change.

Quanto: a price-based Wordle

At the beginning of the year, I designed Quanto, a game following the Wordle formula (one global game every day, with statistics) in which you guess the prices of six products from Spanish supermarkets.

Quanto • Six daily products

The game has had more impact than I expected and has already passed 50,000 games played, a number that made me wonder whether my humble home server would cope. So far, it works without problems (the load doesn't even reach 0.01% at the nightly peak), and I can keep offering it without any advertising.

The products are randomly chosen every Sunday from the websites of Carrefour, Lidl, Alcampo, and Mercadona. Besides the photo, each comes with its name and extra information such as weight or volume. For each product you get two attempts to guess the price, with a clue after the first one to help you get closer:

Message                    Deviation (your guess vs. the real price)
Right on the first try!    0%
Almost, a little more      Less than 15% below
Almost, a little less      Less than 15% above
Way more                   At least 15% below
Way less                   At least 15% above

At the end of the game, some statistics are shown, like your average deviation and the typical "you are above 60% of the players" message. To compute that message, the scores of all games have to be stored and turned into percentiles: something like ordering the scores from lowest to highest and cutting the list into 100 similarly sized pieces.
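
A sketch of the idea with Python's standard library and made-up deviation scores; since a lower deviation is better, your percentile is the share of stored games with a higher deviation than yours.

from bisect import bisect_right

def players_beaten(deviations: list[float], mine: float) -> float:
    # Share of stored games with a worse (higher) average deviation.
    ordered = sorted(deviations)
    worse = len(ordered) - bisect_right(ordered, mine)
    return 100 * worse / len(ordered)

scores = [4.0, 10.5, 22.0, 37.5, 80.0]  # hypothetical stored deviations, in %
print(f"You are above {players_beaten(scores, 12.0):.0f}% of the players")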

Quanto results capture

Watching these percentiles and computing per-product statistics is very interesting, and the only conclusion I can draw is that most of us have no idea what the things we buy actually cost. For uncommon products, the deviations are usually enormous.

For example, half of the users would pay €7 for a dye that costs €2.50. If supermarkets carried more varieties and brands of dye, we might have a better sense of its real price, but many chains stock a single product for each specific need.

If I feel like it someday, I'll use the accumulated game statistics to publish a more detailed analysis of our perception of prices here, but for now I want to use this post to ask for your suggestions. If you've read this far: what else would you like to see in Quanto?

After six months with Copilot

For the past six months, I've been using GitHub Copilot daily: an AI-powered code autocompletion tool that seems to have a life of its own.

A real case searching for a currency conversion API.

GitHub Copilot is available as an extension for Visual Studio Code, Neovim, and JetBrains IDEs. It sends your code and its context (programming language, project code, file names) to OpenAI's servers, which complete it and send it back as a suggestion. Pressing Tab accepts the suggestion; Esc discards it.

It's based on an AI model called Codex, a descendant of GPT-3 tailored to programming languages and trained on the public repositories on GitHub: code, comments, and documentation in every imaginable programming language.

Using it for the first time is exciting. Writing a function name and having it generate not only the code but also a valid link to a currency conversion API feels like something out of a sci-fi novel. After the initial hype, though, it's easy to see the tool isn't perfect: it needs guidance to reach the desired result, and you need to understand what it writes to stay in control.

For example, at 0:30 in the previous video it tries to use a paid service, so I type a European Central Bank URL to give it a clue about which tools to use. Even then, the generated code isn't perfect, and I have to correct the regex that parses the data by hand.
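
For context, the end result was in the spirit of this sketch. It's not the exact code from the video, though the URL is the ECB's public daily reference-rates feed, and the regex is merely illustrative.

import re
import urllib.request

# The ECB publishes daily euro reference rates as a small public XML file.
ECB_URL = "https://www.ecb.europa.eu/stats/eurofxref/eurofxref-daily.xml"

xml = urllib.request.urlopen(ECB_URL).read().decode()

# Each entry looks like <Cube currency='USD' rate='1.08'/>.
rates = {cur: float(val) for cur, val in
         re.findall(r"currency=['\"]([A-Z]{3})['\"] rate=['\"]([\d.]+)", xml)}

print(rates["USD"])  # euro-to-dollar reference rate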

The pyramid of doom is one of its most typical errors.

Still, I kept the plugin active for the following five months and tried it with many different languages: Python, JS, CSS, C++, PHP, SQL, Arduino, VBS, OpenSCAD, GLSL… It has radically changed my usual way of programming, one of those leaps that happen only a few times in a lifetime.

When I was young, my father bought Computer Hoy every month, and I read it cover to cover. One issue included a guide that said something like "learn to optimize your tasks with Visual Basic," and for some reason I followed it. When I saw the magic of thinking up an idea, describing it to a computer, and watching it come to life, I felt the first of those leaps.

From then on, I programmed in VBS with nothing more than that guide and some photocopies someone got me. My way of solving any problem was to go over those 20 photocopied pages again and again until I found an answer. By the time the sheets were worn and yellowed, the internet arrived at home, and the next leap happened.

Programming became something very different. My knowledge was no longer limited to a few sheets but extended to the complete reference for Visual Basic, or for any other language I wanted to learn. Between online references, forums, and email threads, I kept programming until I started university, which is when I learned what an IDE with a debugger was.

That leap was also significant. Now I could write in an editor that colored my code, autocompleted methods, let me navigate the language reference with just Ctrl+Space, and showed me line by line how my application's state changed. And on top of that, the forum was Stack Overflow.

Today I still program like that, but with another ace up my sleeve: I can skip looking up an API when I don't remember something, let the IDE finish my code in the same style as my codebase, and in general focus on the fun part of programming instead of wrestling with the APIs of Matplotlib, Puppeteer, or PyTorch.

Programming image recognition with a webcam on a Raspberry Pi just by kindly asking for it in a comment.

When Copilot came out, every opinion written after a week of use claimed it would replace programmers and that it was a magical tool, so I wanted to test it for a good while before judging its place in the programmer's toolbox. My conclusion is that it's very, very useful, if used with the right mindset.

It's easy to get carried away by the solutions Copilot proposes (after all, there's no Oscar for best code). But since it isn't perfect, accepting code without understanding it makes it easy to end up with dysfunctional code. Or worse: functional code with hard-to-detect vulnerabilities. Something similar to indiscriminate copy-pasting from Stack Overflow.

It's clear that the trend is towards increasingly assisted programming. And if the future lies in tools like Copilot, we'll have to tackle challenges like avoiding the distraction of constant recommendations, or promoting critical thinking in the face of suggestions that embody bad practices and outdated approaches; something similar to what happens with AIs trained on real-world datasets and the biases they acquire.

For now, we can try it out to better understand its implications. If you've read this far and are interested, you'll be glad to know that telemetry is only used to see which suggestions get accepted; the code you write does not feed the model. Being a Microsoft product, they're probably fishing now to hook us later with a subscription, so you can sign up for the beta before it's too late.

Every video in this post was converted to webm like this, without looking at the FFmpeg reference a hundred times.
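
For the curious, the accepted suggestion looked something like the following sketch; the flags are a typical VP9 conversion and the file names are made up, so don't take it as the exact invocation from the video.

import subprocess

# Convert a screen recording to webm, assuming ffmpeg is on PATH.
subprocess.run([
    "ffmpeg", "-i", "screencast.mp4",  # hypothetical input file
    "-c:v", "libvpx-vp9",              # VP9 video codec for webm
    "-crf", "33", "-b:v", "0",         # constant-quality mode
    "-an",                             # drop the audio track
    "screencast.webm",
], check=True)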
