Testing Entity Framework Migrations

“We demand rigidly defined areas of doubt and uncertainty!” — Douglas Adams

This post describes a method to rigorously test Entity Framework Core migrations in a fast, automated, and repeatable way, to build confidence when pushing a migration to a production database.

The sample code for this solution is on GitHub — any and all feedback is welcome!

In my opinion, applying database migrations to a production environment is the highest-risk part of a product update. There’s only really one shot at it, and correcting errors after the event can be extremely difficult. An error in a migration is often an ‘emergency’ that requires teams to drop everything and deal with it. The way to avoid this potential chaos is repeatable, automated tests that are executed before pushing the migration, but this can be tricky to do.

In this post I’ll present one way of testing data migrations that I think would be great at keeping developer heart rates in the healthy range during a database update.

I’m assuming a decent amount of familiarity with database development, .NET Core, and Entity Framework migrations here. I am also assuming a good understanding of TDD, unit testing and integration testing.

Dummy Domain

To illustrate my point, I have set up a dummy database. I’ll start with an initial migration, and then do a subsequent migration that modifies the schema and also some data that has previously been stored in the database.

This database structure allows a user to create a blog, and then publish posts to that blog. Let’s assume that this is in production when someone points out that storing an ‘AuthorName’ as a string is a bit of a normalisation fail, and that it should really be stored in a separate table.

Making this change requires a schema change, and also a data migration.
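For illustration, the dummy domain might look something like this (the class and property names are my assumptions, not necessarily those in the sample code):

```csharp
using System.Collections.Generic;

// Hypothetical entity classes for the dummy blogging domain.
// Before the change, the author is stored as a free-text string on Post.
public class Blog
{
    public int Id { get; set; }
    public string Name { get; set; }
    public List<Post> Posts { get; set; } = new List<Post>();
}

public class Post
{
    public int Id { get; set; }
    public string Title { get; set; }
    public string AuthorName { get; set; } // the normalisation fail: free-text author
    public int BlogId { get; set; }
}

// After the migration, AuthorName would be replaced by a foreign key
// to a new Author table along these lines.
public class Author
{
    public int Id { get; set; }
    public string Name { get; set; }
}
```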

What happens ‘out of the box’

These kinds of schema transformations are easily handled by Entity Framework, which will spit out automatically generated migration files. These auto-generated migrations will apply the schema transformation, but unfortunately there’s not really any way to automate the data transformations. This has to be done by hand, moving the data in the ‘AuthorName’ column into an ‘Author’ table, and correctly wiring up the foreign keys.
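As a rough sketch, the hand-written data step can be spliced into the generated migration using `MigrationBuilder.Sql`, between the schema changes EF generates. The table, column, and migration names below are illustrative assumptions based on the example domain, not the exact sample code:

```csharp
using Microsoft.EntityFrameworkCore.Migrations;

public partial class ExtractAuthorTable : Migration
{
    protected override void Up(MigrationBuilder migrationBuilder)
    {
        // 1. Schema: EF generates the new Authors table and the
        //    Posts.AuthorId foreign key column (omitted here).

        // 2. Data: populate Authors from the distinct AuthorName values,
        //    then wire up the foreign keys — this part is written by hand.
        migrationBuilder.Sql(@"
            INSERT INTO Authors (Name)
            SELECT DISTINCT AuthorName FROM Posts;

            UPDATE p SET p.AuthorId = a.Id
            FROM Posts p
            JOIN Authors a ON a.Name = p.AuthorName;
        ");

        // 3. Schema: finally drop the old column (generated by EF).
        migrationBuilder.DropColumn(name: "AuthorName", table: "Posts");
    }

    protected override void Down(MigrationBuilder migrationBuilder)
    {
        // The reverse transformation is hand-written too: restore the
        // AuthorName strings before the Authors table is dropped.
        migrationBuilder.AddColumn<string>(name: "AuthorName", table: "Posts", nullable: true);
        migrationBuilder.Sql(@"
            UPDATE p SET p.AuthorName = a.Name
            FROM Posts p
            JOIN Authors a ON a.Id = p.AuthorId;
        ");
    }
}
```

Note that it is exactly this hand-written SQL — invisible to EF’s model — that the rest of this post is trying to get under test.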

During the development process, these migrations will be run first on a dev machine, and then probably (hopefully) in a pipeline and a pre-prod environment. The data in these dev / test databases is often minimal and / or idealised, and is unlikely to represent the data found in a production database. This can result in migrations that pass in the test environments and then fail in production — a bit of a nightmare scenario!

To have confidence that your migration will run in production, you need fast, automated, and reproducible tests for your migration scripts which run against production-like data.

Migration Testing Utopia


The properties that I’m looking for in these tests are:

  • All testing must be automated, from set up to tear down, and capable of running in CI/CD.
  • The tests should run quickly.
  • The results should be repeatable.
  • The tests should execute against data that is ‘production-like’.
  • It should be easy to replicate any data issues that are found in production in the test suite.

Here is a sample of a set of unit tests that ensure that the above migrations will execute against a defined set of data, and (I think) meet the above criteria. (Full code here)

A closer look

This is a series of tests, one for every migration that has been added. Each test comes with a SQL script that inserts data that is relevant at that migration. The tests all follow the same format:

  • Set up the data to represent the state of the database prior to the migration.
  • Apply the database ‘Up’ migration. If any exceptions are thrown during the migration, the test will fail.
  • Assuming the migration has passed, test that the data has been correctly transformed after the migration has executed.
  • Apply the ‘Down’ migration.
  • Test that the data is correct after the ‘Down’ migration has been applied.
  • Apply the ‘Up’ migration again, ready for the next test to run.
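As a sketch, a test following the steps above might look like this. The fixture, helper names, script paths, and migration IDs are my assumptions for illustration — the real implementation is in the linked sample code:

```csharp
using System.IO;
using Microsoft.EntityFrameworkCore;
using Microsoft.EntityFrameworkCore.Infrastructure;
using Microsoft.EntityFrameworkCore.Migrations;
using Xunit;

// Hypothetical fixture supplying a BlogContext pointed at a disposable
// test database; tests in this collection must run sequentially.
[Collection("MigrationTests")]
public class ExtractAuthorTableTests
{
    private readonly BlogContext _context;

    public ExtractAuthorTableTests(MigrationTestFixture fixture)
    {
        _context = fixture.Context;
    }

    [Fact]
    public void Migration_002_ExtractAuthorTable()
    {
        var migrator = _context.GetService<IMigrator>();

        // 1. Seed data representing the pre-migration state.
        _context.Database.ExecuteSqlRaw(File.ReadAllText("Scripts/002_Seed.sql"));

        // 2. Apply the 'Up' migration — any exception fails the test.
        migrator.Migrate("002_ExtractAuthorTable");

        // 3. Assert the data has been transformed correctly.
        //    (CountRows is an assumed helper that runs SELECT COUNT(*).)
        Assert.Equal(2, CountRows("Authors"));

        // 4. + 5. Apply the 'Down' migration and check the data again.
        migrator.Migrate("001_Initial");
        Assert.Equal(0, CountRows("Authors"));

        // 6. Re-apply 'Up', ready for the next test in the sequence.
        migrator.Migrate("002_ExtractAuthorTable");
    }
}
```

The key trick is `IMigrator.Migrate(targetMigration)`, which lets the test step the database forwards and backwards one migration at a time rather than always jumping to the latest.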

I am using xUnit in a bit of an unusual way. The ‘unit’ tests are not really unit tests as such. They have to run in order, and each subsequent test relies on the previous one completing. They are not unit tests, and not really integration tests either. Migration tests? Timeline tests? I’m unsure, and unsure if it matters too much.

Running these tests like this builds quite a lot of confidence that the data will be transformed correctly. It is also very easy to set up edge cases, and prove that they will work, by setting up the data in the SQL scripts that are run during the testing.

Like the migrations themselves, these tests are built up one on top of the other. Every time a new migration is added, a new test is added. Data that is modelled for a previous migration has to be able to flow through all of the migrations to eventually arrive at the latest. The benefit of this is a bit subtle, but it means that you should never face the situation where some old data lingering about in prod catches you out many deployments down the line.

If an issue slips through the net, and a problem does arise when migrating a live database, it will be trivial to add the specific data that caused it to the appropriate test script, and then solve it by red/green looping the test.

Conclusion

I am really interested to hear the opinions of the community on this technique. I think there is real value in being able to quickly spin up and test your migrations like this. But, I’m certainly not going to claim that I have happened upon the absolute best way to do this!

A couple of concerns that I have:

  • I don’t really like that the tests have to run in a set order. They are not really unit tests.
  • This could be a lot of work to maintain. Is it worth it?
  • This will end up requiring a lot of SQL scripts. This is fine by me, but I note a ‘hesitance’ in the community to deal with SQL directly these days.

Overall, I think this might be a worthwhile technique for a small to medium sized project. I’m planning on giving this a go on a more complex example, and maybe seeing about applying it on a live system. I’m very interested to know what y’all think of this, so let me know in the comments!

Andy

Technical Lead and co-founder at Avalone Consultants. Angular, .NET, and blockchain developer.