Email Tokenization — Identity Without Sensitive Information

tl;dr

Don’t store emails next to identity in one users table. Keep real emails in their own table keyed by an opaque token, and reference that token from the identity table. Identity never sees the email itself, so a read-only leak of one table doesn’t expose PII tied to logins.

Why Separate Email From Identity?

When you build a login system or any website with users, the common approach is simple: collect things like the email, password, and username, then store them in a single table.

Something like this:

users
--------------------------------
id | email | password | username

Looks normal, right?

The problem is that the email address is sitting directly next to the user identity and password. Anyone with access to the database — even if it’s just read-only access — can immediately see sensitive data.

Even worse, this violates a core principle related to PII (Personally Identifiable Information).

You generally don’t want to directly tie a user’s identity to sensitive personal data like this.

So What’s the Solution?

The idea is surprisingly simple.

Instead of storing everything together, we separate responsibilities:

One place stores the real email address
Another place stores the login identity
The connection between them happens through a token, not the email itself

This small architectural change gives us several advantages:

Isolation of sensitive information
Reduced risk if one table leaks
Better control over who can access what
Cleaner system design since identity is no longer tied directly to sensitive fields

How Do We Implement This?

Let’s look at a simple example.

1. A table that stores real emails, but indexed by a token

email_tokens
--------------------------------
token (PRIMARY KEY) | email | created_at

2. A table that stores user identities

identities
--------------------------------
id | username | password_hash | email_token (FK -> token) | created_at

Now the identity table does not know the real email address.

It only knows the token, and that token points to the actual email record.

In other words:

“This identity is linked to that email.”

But the relationship is indirect, which is exactly what we want.

Why Go Through All This Trouble?

Because good systems are designed for the future, not just for something that works today.

As a developer, you should think about:

Designing proper system boundaries
Separating identity from PII
Understanding exposure risk and attack surfaces

Technically, this approach costs almost nothing to implement.

But architecturally, it gives you a cleaner and safer system design.

I also made a small example project showing how this approach can be implemented in Python if you’re curious:

https://github.com/v0id-user/emtok

This English version of the article was translated with the help of AI. Because the original post was written in Arabic, some nuances and details may not have carried over perfectly into the translation. For the most accurate version, you may prefer reading the original Arabic post.