This repository has been archived by the owner on Apr 26, 2024. It is now read-only.
Updated Synapse versions should fail to start if required background updates have not run yet. #16047
Labels
O-Occasional
Affects or can be seen by some users regularly or most users rarely
S-Major
Major functionality / product severely impaired, no satisfactory workaround.
T-Enhancement
New features, changes in functionality, improvements in performance, or user-facing enhancements.
We (Beeper) just had a fun adventure upgrading to 1.88.
Back in 1.74 a background update, populate_user_directory_process_users, was added, and unbeknownst to us we have been grinding away at it ever since we upgraded in January. That is mostly because we have 28 million users in our users table thanks to all the appservice users, and because https://github.com/matrix-org/synapse/pull/15435/files#r1281053946 caused us to miss an index, so we could only process a single user every few seconds. It also means that none of the more recently added background updates have run, most importantly profiles_full_user_id_key_idx and room_membership_user_room_index. We updated to 1.85 in mid-June, but the background update was still chugging along and we were still missing the indexes. When we upgraded to 1.88 earlier today it blew up our DB, since queries like SELECT displayname FROM profiles WHERE full_user_id = '@brad:beeper.com' would just scan the 28 million row table. We resolved the issue by adding the index by hand, which only took a couple of minutes, after which our Synapse instance was usable again.
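To illustrate why the missing index hurt so much, here is a minimal sketch that compares the query plan for that lookup with and without an index. It uses an in-memory SQLite database as a stand-in for our PostgreSQL one, a cut-down two-column profiles table, made-up example.org user IDs, and a plain index on full_user_id; the real index definition in Synapse may well differ, so treat all of that as assumptions.

```python
import sqlite3

# The lookup from the issue, with the user ID as a bound parameter.
LOOKUP = "SELECT displayname FROM profiles WHERE full_user_id = ?"


def query_plan(conn, sql, params=()):
    """Return the human-readable steps of SQLite's EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # Each row has four columns; the last one is the textual description.
    return [row[3] for row in rows]


def build_profiles(conn, n_users=1000):
    """Create a simplified, hypothetical profiles table with fake users."""
    conn.execute("CREATE TABLE profiles (full_user_id TEXT, displayname TEXT)")
    conn.executemany(
        "INSERT INTO profiles VALUES (?, ?)",
        ((f"@user{i}:example.org", f"User {i}") for i in range(n_users)),
    )


conn = sqlite3.connect(":memory:")
build_profiles(conn)

# Without the index the planner has to walk the whole table.
before = query_plan(conn, LOOKUP, ("@user42:example.org",))

# Assumed definition: a plain index on full_user_id alone.
conn.execute(
    "CREATE INDEX profiles_full_user_id_key_idx ON profiles (full_user_id)"
)

# With the index the planner can search for the row directly.
after = query_plan(conn, LOOKUP, ("@user42:example.org",))

print("before:", before)
print("after: ", after)
```

On a thousand rows the difference is invisible, but the plan changes from a full scan to an index search, which is the same shift that made our 28 million row table usable again.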
At the time of the update, our background_updates table looked like this:
After some discussion in #synapse-dev:matrix.org, it's clear that having these indexes added by a background update is a good thing, as they can be added without a long blocking migration and in a release before they're required. However, this leaves a gap: if for whatever reason the background updates that were assumed to be done haven't completed before the upgrade that depends on them, you end up with an extremely poorly performing or, worse, broken Synapse installation.
In an ideal world, Synapse could check at startup that the background updates it needs have completed and explode gracefully with a nice error message. Ideally ideally, this would happen before migrations are applied, to make it easier to roll back the upgrade. This would require completed background updates to remain in the table with something like a completed_at column, but that's probably nice for auditability after the fact.
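A rough sketch of what such a startup check might look like, written against SQLite so it runs on its own. Everything here is hypothetical rather than existing Synapse code: the REQUIRED_UPDATES list, the function and exception names, and in particular the completed_at column, which is the schema change proposed above and not something the table has today.

```python
import sqlite3

# Hypothetical: background updates this release assumes have finished.
REQUIRED_UPDATES = (
    "profiles_full_user_id_key_idx",
    "room_membership_user_room_index",
)


class MissingBackgroundUpdatesError(RuntimeError):
    """Raised when required background updates have not completed."""


def find_incomplete_updates(conn, required):
    """Return the required update names that have no completed_at recorded."""
    rows = conn.execute(
        "SELECT update_name FROM background_updates "
        "WHERE completed_at IS NOT NULL"
    ).fetchall()
    completed = {name for (name,) in rows}
    return [name for name in required if name not in completed]


def check_before_migrating(conn, required=REQUIRED_UPDATES):
    """Fail with a readable message before any schema migration is applied."""
    missing = find_incomplete_updates(conn, required)
    if missing:
        raise MissingBackgroundUpdatesError(
            "Refusing to start: these background updates have not completed: "
            + ", ".join(missing)
            + ". Roll back to the previous version and let them finish."
        )


def make_demo_db():
    """Build an in-memory table in the proposed (assumed) shape."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE background_updates ("
        "update_name TEXT PRIMARY KEY, completed_at INTEGER)"
    )
    # Mirrors our situation: one slow update is blocking the two behind it.
    conn.executemany(
        "INSERT INTO background_updates VALUES (?, ?)",
        [
            ("populate_user_directory_process_users", None),
            ("profiles_full_user_id_key_idx", None),
            ("room_membership_user_room_index", None),
        ],
    )
    return conn


if __name__ == "__main__":
    try:
        check_before_migrating(make_demo_db())
    except MissingBackgroundUpdatesError as exc:
        print(exc)
```

The important design choice is the ordering: the check would have to run before any schema migration touches the database, so that downgrading is still a simple rollback.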