UUIDs for User IDs

Integers By Default 🔢

Phoenix's generators save a ton of time writing boilerplate code, and Pow is an Elixir package and Phoenix extension that gets user authentication up and running in very little time. But by default, Phoenix's generators use auto-incrementing integers for user IDs.

What's Wrong with Integer IDs 🤔?

I've been bitten nastily by integer IDs multiple times in my career.

One time, a backup failure reset the counter on user IDs. The account records were erased while their data still existed, unattached. When new accounts were created, those IDs were re-used, and the orphaned data got attached to the new accounts. Super bad news. (This is also an argument for using a SQL store with strong relational guarantees, rather than the NoSQL store the company had adopted at the time.)

On another occasion, user IDs were being randomly generated, but they were still rather small integers, and the likelihood of a collision was high given the rate at which new accounts were being created. A simple oversight in the application code meant we were performing an upsert on newly generated user accounts instead of an insert. The database would have detected a conflict and thrown an error on an insert with a re-used user ID, but since the app was upserting, re-used user IDs just clobbered old accounts with new credentials. Yikes!

Neither of these problems was caused by integer IDs directly, but in both cases, a longer, richer user ID would have massively reduced the likelihood of a bad outcome.

Switching to UUIDs 👷🏿‍♀️

Let's say you've already gone through the process of generating a new Phoenix app, you've followed all the guides from Pow on getting set up, and then you realize that your user IDs are integers. Stop! Don't throw away all of your code and start over. I found myself in that situation this morning, but thanks to Cam Stuart on GitHub, I got my IDs switched over in no time. Here are the steps.

Change the Migration

Open up priv/repo/migrations/<timestamp>_create_users.exs and change

  def change do
    create table(:users) do
...

to

  def change do
    create table(:users, primary_key: false) do
...

This tells the migration not to create the default auto-incrementing ID column. Then add a line to create your own ID column:

  def change do
    create table(:users, primary_key: false) do
      add :id, :uuid, primary_key: true
...

Change the Model

Next, open up your user schema (mine lives at lib/boots/users/user.ex) and add two module attributes above the "schema" declaration:

  @primary_key {:id, :binary_id, autogenerate: true} # Add this
  @foreign_key_type :binary_id # And add this
  schema "users" do
    ...

Those module attributes tell the schema that it should use UUIDs (represented here as :binary_id) for the ID column and that it should auto-generate them. They also tell any other schemas that when they create a foreign key referencing this table, they should use UUIDs too.
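For example, if you later add a (hypothetical) posts table that belongs to a user, its migration has to declare a matching foreign key type:

  def change do
    create table(:posts, primary_key: false) do
      add :id, :binary_id, primary_key: true
      # The reference type must match the users table's UUID primary key
      add :user_id, references(:users, type: :binary_id)

      timestamps()
    end
  end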

Update the Test

If you've followed the Pow guide on adding API tokens, you'll have a failing test now. Open up test/your_app_web/api_auth_plug_test.exs and change the lines that create a test user with an integer ID:

    user =
      Repo.insert!(%User{id: 1, email: "test@example.com"})

To use a UUID instead:

    user =
      Repo.insert!(%User{id: "e2c54c31-e574-4c9f-8672-dad23449d4cf", email: "test@example.com"})

Change Your Generators

Now that you've fixed your users, let's make it so that any new models you generate will use UUIDs by default. Open up config/config.exs and add the following lines to the end of the file:

config :your_app, :generators,
  migration: true,
  binary_id: true,
  sample_binary_id: "11111111-1111-1111-1111-111111111111"

🚨 Don't forget to change :your_app to the actual name of your app!

This tells Phoenix that whenever it generates a new schema, it should use binary IDs by default.
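
For example, generating a hypothetical comments schema after this change should produce a migration that looks roughly like this, without passing --binary-id or any other flags:

  def change do
    create table(:comments, primary_key: false) do
      add :id, :binary_id, primary_key: true
      add :body, :text

      timestamps()
    end
  end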

All Done!

Congratulations! Your app is now a little more robust 👏. It's worth noting that there has been some discussion about whether random UUIDs significantly hurt Postgres's performance by causing more random seeks. In my research, the evidence against UUIDs looked weak enough that I'd rather have the safety than worry about a possible performance loss. I'd love to hear from you if you have more information on this, though!

On Pagination and Sort Order

The Bug 🐞

We've been testing the next generation of our sync engine at Day One ahead of public release, and we found a funny bug. We have an endpoint for fetching a big list of entries that have changed since the last fetch; it's also how a new phone with no content pulls down all the entries in a journal. But when we tried to sync a full journal down to a new device, we were missing entries!

Enter Sherlock 🔎

We investigated the internal workings of the endpoint to see what was up. The missing entry was in the database, so that wasn't the issue. We tested the query that fetched the rows and the entry showed up, so that wasn't the issue either. Internally, we fetch entries from the database in pages and stream them out through the endpoint, so we figured there must be some flawed logic in the pagination. We logged the pagination pattern, but everything looked perfect.

Then we looked closer at the response returned by the server.

We were getting the right number of entries pulled down from that endpoint, but it turned out some of the entries were missing from the feed, and others were repeated!

This was a big indicator that we had a problem with sorting. We compared the entry that showed up twice with the entry that didn't show up at all. Sure enough, we found that they had the exact same "sync date", which is the column that our query sorted by.

    builder->orderBy([|"sync_date asc"|]);

The Fix 🛠

All we had to do was add a unique column to the sort clause:

    builder->orderBy([|"sync_date asc, id asc"|]);

And the problem was fixed.

But Why? 🤔

When paginating, the same query is executed again each time a new page is fetched. Each time, a small window of the results is fetched by skipping over the number of items returned by previous pages. For this to work, though, the ordering of the overall result set has to be the same for every page. If the ordering changes as new pages are fetched, we run the risk of dropping or repeating items.
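To make the mechanics concrete, here's a minimal sketch of offset pagination, written with Elixir's Ecto purely for illustration (the Entry schema, Repo, and page size are hypothetical; our real server code is the query builder shown above):

    import Ecto.Query

    # Hypothetical paging helper: page 0 is rows 0..99, page 1 is rows 100..199, etc.
    # The unique, tie-breaking `id` column is what makes each window deterministic.
    def fetch_page(page, page_size \\ 100) do
      from(e in Entry,
        order_by: [asc: e.sync_date, asc: e.id],
        limit: ^page_size,
        offset: ^(page * page_size)
      )
      |> Repo.all()
    end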

In our case, the sync date on an entry was pretty close to unique. It has millisecond accuracy, and most of the time, two entries aren't synced in the exact same millisecond. But, as we now know from testing on real journal data, there are entries with identical sync dates. When that happens, the order in which those two entries come back in the result set is undefined: our query had an unstable sort.

So as we skipped through the items in our result set, Postgres was free to order those two tied rows differently on each execution of the query. A page boundary fell between them, so one of them was missed and the other repeated.

Once we added a unique column to the sort, the ordering became deterministic, which means Postgres always orders the results the same way, no matter how much skipping or limiting we do. With that change in place, paging no longer shuffled the rows, and all the entries were properly included in the result set.

Don’t Make Me Tap Twice

I've started rough work on a new app for digital "morning baskets". While I used plain Expo when starting Storytime (another app I have in progress), this time I decided to give the famous Ignite boilerplate from Infinite Red a chance. In short, it's fantastic. In just a few days of side-work time, I've almost completed a fully functioning minimum viable product (MVP) of the new app.

But I quickly ran into that annoying situation where you're editing a text field and want to press the "submit" button, but you have to tap it twice: the first tap only dismisses the keyboard, and the second actually presses the button.

Thanks to a detailed blog post, I found a quick solution. In a project generated by Ignite 6.x, we can open up app/components/screen/screen.tsx and find the place where ScrollView is used:

...
<ScrollView
    style={[preset.outer, backgroundStyle]}
    contentContainerStyle={[preset.inner, style]}
>
    {props.children}
</ScrollView>
...

All we have to do is add keyboardShouldPersistTaps="handled" to the props of ScrollView.

...
<ScrollView
    style={[preset.outer, backgroundStyle]}
    contentContainerStyle={[preset.inner, style]}
    keyboardShouldPersistTaps="handled" // <- Here!
>
    {props.children}
</ScrollView>
...

This instructs the ScrollView to dismiss the keyboard if it receives a tap gesture that none of its children have handled, but to leave the keyboard alone if a child handles the event. In my case, I navigate right after the button press, and that navigation dismisses the keyboard automatically anyway. So it resolves the problem for me!

Debugging Tricky Parameterized Types

Parameterized functions are fantastic. I'm talking about functions that operate on some abstract piece of data without doing anything that would require specific knowledge about that data. Take, for example, the following fictional function:

Example

let maybeSaveThing = (maybeThing: option<'a>): option<'a> => {
  switch maybeThing {
  | Some(thing) => saveThing(thing)
  | None => None
  }
}

The code above is designed to take an option of something, save it somewhere if the option contains data, and return the result. We assume, for this example, that the saveThing function knows how to save anything. We can tell that the function shouldn't care what's contained in the option type by looking at the function signature: it takes an option<'a> and returns an option<'a>.

A Sneaky Bug

But this code has a sneaky bug. We can see it if we look at the function's inferred type:

option<int> => option<int>

The compiler is telling us that the function takes an option of an integer, even though we explicitly stated in the type signature that the function should be generic. Why is that?

Why Don't We Get a Type Error?

Generics unify with all other types, that's why. For example, an option<int> is an option<'a> as far as the type system is concerned, and so is an option<string> or an option<blah>. If part of the code inside a function returns a specific value where a generic value was expected, the type inference algorithm quietly turns the function's type parameter into that specific type. The compiler won't tell you that your type annotation is wrong, because it isn't: an option<int> => option<int> is a valid option<'a> => option<'a>.
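
Here's a tiny made-up function that shows the same silent specialization in action:

// The 'a annotations unify with int because of the `+ 1`,
// so this compiles without complaint, but the inferred
// type is int => int, not 'a => 'a
let addOneQuietly = (x: 'a): 'a => x + 1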

Why Are We Getting an int?

Spoiler alert: there's a call inside our function that returns an option<int> where the generic type was expected. Stick around and we'll see how to find it in a moment.

Discovering the Mismatch

We intended the type to be generic, so it's troublesome that we don't discover this breakage until we try to use the function to save some type other than an int. Only then will the compiler tell us that maybeSaveThing takes an option<int> while we're trying to pass in something like an option<string>.
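
A far-away call site (hypothetical, with the error text paraphrased) would blow up roughly like this:

let result = maybeSaveThing(Some("hello"))

// This has type: option<string>
// Somewhere wanted: option<int>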

How can we discover these unintentional losses of genericism closer to the site of definition?

Enforcing Genericism

Adding an interface (.resi) file with the generic signature for the function will help the compiler out. Instead of inferring the function's type to be specific, the compiler will know that you intend to keep the function generic, and it'll tell you that your implementation doesn't match the interface.
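
For our function, the interface only needs the one generic signature. Assuming the implementation lives in a file like Thing.res (a hypothetical name), the matching Thing.resi would contain:

// Thing.resi (hypothetical file name)
// Pinning the signature here forces the implementation to stay generic
let maybeSaveThing: option<'a> => option<'a>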

Finding the Mistake

Unfortunately, an error saying that the two definitions don't match doesn't go far toward helping us find the location of the mistake. When we're dealing with more than mere example code, and we have a complex function with lots of operations, it's tempting to abort the whole effort and scroll through Twitter instead.

But here's a trick that can speed the process up:

type debug

We've added a new type to our code that is the opposite of generic. It's so specific that no value fits it. And it's explicitly for debugging, so we know that no other area of the code will rely on this type for business logic.

If we swap out the generic argument in our function definition with debug temporarily:

type debug

let maybeSaveThing = (maybeThing: option<debug>): option<debug> => {
  switch maybeThing {
  | Some(thing) => saveThing(thing)
  | None => None
  }
}

It'll show us the line of code that is returning a specific, non-generic value:

   8 │ let maybeSaveThing = (maybeThing: option<debug>): option<debug> => {
   9 │   switch maybeThing {
  10 │   | Some(thing) => saveThing(thing)
  11 │   | None => None
  12 │   }

  This has type: option<int>
  Somewhere wanted: option<debug>

  The incompatible parts:
    int vs debug

Aha! The call to saveThing is returning an option<int>. If it had been a generic function, its return type would have been option<debug>. Let's look at its definition:

let saveThing = _thing => {
  Some(21)
}

There's the source of the problem. If we modify it to do something generic instead:

let saveThing = thing => {
  // Pretend to do some saving in the background
  let _ = Js.Json.stringifyAny(thing)
  Some(thing)
}

Then the compiler is happy, and we know our code is generic again. We can remove the debug type and put the real type parameters back in place.