Databases in developers environment – Versioning and migrations
One of the most annoying things I get in touch with software development beside the question How can I format this date is How can I put my database under version control?
If you have not already read K. Scott Allens Database series, do it now.
At first follow his three rules:
- Never use a shared database
serverfor development work.
- Always Have a Single, Authoritative Source For Your Schema
- Always Version Your Database
Ask your customer which database and which environment is used. Most of these questions should be answered by a well worked out requirement questionnaire.
- Which operating system is used
- How is the operating system localized
- Which database management system is installed
- Which version and service packs of the DBMS are used
- Which language version of DBMS is used
It might sound irrelavant but database management systems differs from version to version. Setup your development database server with exactly the same settings your customer uses.
The ideal situation is that every developer owns at least two database instances on this server. One for unittests, one for integration tests.
Your continuous integration server should also own his two databases.
So you could have these databases:
- $PROJECT_CI_integration (used for Selenium)
The last database should be used for database dumps which came back from your customer for debugging purposes.
Okay, most of this should be clear. Never use a shared database for development work.
Why should you now versionize your database? Because of the same reason you versionize your code: Going back in time and retrace occuring bugs of your customer who uses an old but stable version of your program.
In my opinion there are two approaches to halfway achieve this goal:
- Store every database object (table, view, stored procedure, function) in its own .sql file
- Do database migrations like Ruby on Rails does
1 on 1
Storing every database object means that you have a directory structure like
You get it. Every SQL script contains the CREATE statement for your database object.
This very charmant, because you can track the stuctural changes via your commit log.
The problem begins when you want to migrate your developer integration databases and the database of your customer. Stored procedures or functions can be easily replaced because you can use CREATE OR REPLACE syntax. Views can be replaced by DROP VIEW view1; CREATE VIEW view1 AS…
But – and that’s a big but – you have to compare the current database of your customer with your current table definition and write the ALTER TABLE statements by hand. Not so funny.
Additionally you must delete unused views, stored procedures or functions by hand. Otherwise you’ll get a grave with dead database objects which are no longer used.
I know none database tool which can automatically generate a 100% working update script without any pitfalls. And writing a DBMS independent tool on your own is a lot of work.
1 on n (migrations)
The second approach is to create database migration scripts for every completed database change.
At first the negative aspect: You wil get a lot of files and you can not easily make a git/svn diff of the structural change of your database object because the changes are distributed over many files.
The good news is: You don’t have to write additional ALTER statements on deploy time because they are already there. That means, that you just give your customer your migration SQL scripts, he executes the SQL scripts in a given order and will get the latest database version.
To weaken the problem of diffing different database versions, you can migrate one database to version n and another database to n+1. After that you make an SQL export and use diff (n).sql (n+1).sql.
A few days ago I came to the conclusion that there is no other effective way to achieve the goal, so I decided to further speculate about the second approach.
There were some requirements I thought about
- The database development process and file naming have to be standardized
- The current installed version of your database must be stored inside the database itself
- The migration process must be ran easily
- The migration process must support pre/post scripts
- The migration process must support different scenarios (deploying on customer, no need for migrations on developer site – just install latest version, different databases)
Standardization was easy. Every project contains a folder named db which contains four sub folders
- /coredata contains the core data which every database must contain.
- /fixtures/[integrationtest|unittest] contains the fixtures which will be used in test cases
- /handlers can be executed before or after migrations
- /migrations itself contains the migration scripts
It is important that all files in /coredata, /fixtures and /migrations have an ongoing number. These must not be identical to your SVN revision. It is only important that they can be executed successively.
You can see that that I use the format YYYYMMDD_$DAYVERSION as ongoging identifier.
The database must store their migration version. This is needed for doing migrations. Every migration file greater than $DATABASE_VERSION must be executed.
In my structure above the schema migration table is executed first (20111219_001_schema_migration.sql) and contains only one statement
CREATE TABLE schema_migration (id int not null auto_increment, migration_on DATETIME, version char(20) NOT NULL, filename LONGTEXT, PRIMARY KEY(id));
After these point I started to write a small prototype migration definition file which supported different scenarios (supporting different databases and servers, migration process, recreate process).
For this definition (see my first draft) I had to write my own parser or use Xtext, so I transformed it into an XML (see my template) sample. It seemed to be ideal for xsd.exe and C#/Visual Studio.
But after I had struggled with serialization problems I took a closer look to the XML file. It looked like… Ant. The processes were more or less the same.
I did a research and came to the conclusion that almost every requirement could be solved with Ant. I implemented a halfway working solution.
The build.xml currently does not insert the migration version after a file is applied. I will fix it ASAP.
The build.xml itself is designed for MySQL. You need to customize the build.xml for every DBMS you use. But this should be easy to implement.
Update: Jens Schauder published a post about Tips for Testing Database Code.