Rewriting History: A Post-Mortem on Fixing a Wrong Event-Stream
When we opened vaamo to the public last summer, we consciously left out some features. We assumed that only a few people would actually need those features right away, and that we’d therefore have time to build them later.
If you’re a regular reader of our blog, you might know that we use both EventSourcing and CQRS as architectural patterns in our application (read more about our tech stack here). The crucial point is that one really shouldn’t change Readmodels by touching them manually, as they are to be created solely from the events that happened in our domain.
But we didn’t really abide by this rule from the beginning, which brought some complications. They are covered in the following lines.
What We Did
One of the seemingly unimportant features I mentioned earlier was deleting a user account. Early in the development of our application, it seemed like a lot of work for a rare (or rather unpleasant) scenario. So we went ahead and just invalidated the corresponding row for a user in one of our Readmodel tables by manually changing two of its fields.
We didn’t touch the rules to derive this table for a long time, so we didn’t care about implementing an actual UserDeleted event and continued updating the two fields to prevent users from receiving mails and logging in.
 email                          | login_allowed | password
--------------------------------+---------------+---------------
 email@example.com              | f             | **PROTECTED**
 firstname.lastname@example.org | t             | **PROTECTED**
 email@example.com              | f             | **PROTECTED**
 firstname.lastname@example.org | t             | **PROTECTED**
The example above shows a typical excerpt from our database back in January. Here, Marc had us delete his account, and so did John, who, in contrast to Marc, decided to reregister later with the same email address.
The important thing to note here is that not only did we manually change a Readmodel (which is a bad thing to start with), but we also didn’t reflect the change in our event-stream. We basically made our Readmodel an additional Source Of Truth in our application, and its content differed severely from our event-stream, the original Source Of Truth.
Where This Approach Falls Short
This turned out to be a problem once we had to make a change to the Readmodel and had to rebuild it using the event-stream:
Because we hadn’t yet updated the rules by which the table is derived from the event-stream, rebuilding the Readmodel would fail for users like John: his new and active account couldn’t be inserted due to a unique constraint on the email_address column. And for users like Marc, who had their account deleted at some point in the past, it meant they would be able to log in again and would still receive system mails from us.
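To make the failure mode concrete, here is a minimal sketch of such a rebuild using an in-memory SQLite table. Everything except the email_address column and the login_allowed field is an assumption made for illustration: the UserRegistered event name, the users table, the user ids and the addresses are hypothetical and not taken from our actual code base.

```python
import sqlite3

# Hypothetical event stream: only registrations were ever recorded,
# the manual deletions never made it into the stream.
EVENTS = [
    ("UserRegistered", {"user_id": 1, "email_address": "marc@example.org"}),
    ("UserRegistered", {"user_id": 2, "email_address": "john@example.org"}),
    # John registered again after his manual deletion, with the same address.
    ("UserRegistered", {"user_id": 3, "email_address": "john@example.org"}),
]


def rebuild(events):
    """Replay all events into a fresh read model; return (db, errors)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE users ("
        " user_id INTEGER PRIMARY KEY,"
        " email_address TEXT UNIQUE,"
        " login_allowed INTEGER)"
    )
    errors = []
    for name, data in events:
        if name == "UserRegistered":
            try:
                db.execute(
                    "INSERT INTO users VALUES (?, ?, 1)",
                    (data["user_id"], data["email_address"]),
                )
            except sqlite3.IntegrityError as exc:
                # The unique constraint rejects John's second account.
                errors.append((data["user_id"], str(exc)))
    return db, errors


db, errors = rebuild(EVENTS)
failed_user_ids = [user_id for user_id, _ in errors]
marc_login_allowed = db.execute(
    "SELECT login_allowed FROM users WHERE user_id = 1"
).fetchone()[0]
```

In this sketch, failed_user_ids contains John's new account, and Marc's row comes back with login_allowed set, even though his account was deleted by hand long ago. Both symptoms come from the same root cause: the deletion exists only in the old table, not in the events.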
Reflecting Real World Changes In Our Event-Stream
The most crucial point of EventSourcing is that you record events that happened in the real world and derive the current state of your application from them.
Obviously, we were missing the UserDeleted event for around 50 users, although it had happened in reality. This left us with two options to solve the issue of not being able to rebuild our Readmodel:
The first option was a dedicated event recording the deletion as a belated, manual one-time fix. This event would’ve perfectly reflected the reality of our application: we knew the user was deleted at some point, but we hadn’t reflected it up until now. I understand that this is quite a meta level to argue on, but if the fact that this was a manual one-time fix is worth anything to your business, you should probably implement this event. In our case, it wasn’t worth the effort, and it also didn’t accurately reflect what had happened in our Readmodel at some point in time.
UserDeleted Event at the most probable point in time
This is what we eventually decided to do: we implemented the UserDeleted event, hooked it up to our Readmodel and then picked out all users who had their account deleted. For those who did not reregister, we simply appended a UserDeleted event to their event-stream with the then-current timestamp. For those users who had already signed up with us again, we had to find the time range between the last event of their old account and the first event of their new account, and appended the UserDeleted event with a timestamp from that range.
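The timestamp selection described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code we ran: the deletion_timestamp function and the (timestamp, event name) tuples are hypothetical, and picking the midpoint of the gap is just one plausible way to choose a point inside the range.

```python
from datetime import datetime, timezone


def deletion_timestamp(old_account_events, new_account_events=None, now=None):
    """Pick a timestamp for a backfilled UserDeleted event.

    Both event arguments are lists of (timestamp, event_name) tuples.
    """
    last_old = max(ts for ts, _ in old_account_events)
    if not new_account_events:
        # No second registration: use the time of the backfill itself.
        return now or datetime.now(timezone.utc)
    first_new = min(ts for ts, _ in new_account_events)
    if first_new <= last_old:
        raise ValueError("accounts overlap; no gap to place the deletion in")
    # Assumption: the midpoint of the gap stands in for the unknown real time.
    return last_old + (first_new - last_old) / 2


utc = timezone.utc
# Hypothetical streams for a user who came back with the same address.
old_account = [
    (datetime(2015, 1, 2, tzinfo=utc), "UserRegistered"),
    (datetime(2015, 1, 10, tzinfo=utc), "UserLoggedIn"),
]
new_account = [(datetime(2015, 1, 20, tzinfo=utc), "UserRegistered")]

backfilled = deletion_timestamp(old_account, new_account)
```

Whatever point in the gap is chosen, the property that matters is that the UserDeleted event sorts after every event of the old account and before the first event of the new one, so a replay frees the email address before it is inserted again.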
TL;DR Don’t Try This at Home
Trust me, it’s a pure joy to see a Readmodel once so broken replay just fine, without any exceptions. We promptly made the DeleteUser command available to our Customer Service colleagues so they could easily delete users themselves in the future. All without fiddling with Readmodels and the mess that once was this