One of the most annoying questions I come across in software development, besides "How can I format this date?", is "How can I put my database under version control?"

If you have not already read K. Scott Allen's database series, do it now.
First of all, follow his three rules:

  1. Never use a shared database server for development work.
  2. Always have a single, authoritative source for your schema.
  3. Always version your database.

Requirements

Ask your customer which database and which environment are used. Most of these questions should be answered by a well-worked-out requirements questionnaire.
Ask:

  • Which operating system is used?
  • How is the operating system localized?
  • Which database management system is installed?
  • Which version and service packs of the DBMS are used?
  • Which language version of the DBMS is used?

It might sound irrelevant, but database management systems differ from version to version. Set up your development database server with exactly the same settings your customer uses.

Database environment

The ideal situation is that every developer owns at least two database instances on this server: one for unit tests, one for integration tests.
Your continuous integration server should also own two databases of its own.
So you could have these databases:

  • $PROJECT_$DEVELOPER_unittest
  • $PROJECT_$DEVELOPER_integration
  • $PROJECT_CI_unittest
  • $PROJECT_CI_integration (used for Selenium)
  • $PROJECT_customer_integration

The last database should be used for database dumps which come back from your customer for debugging purposes.

Okay, most of this should be clear. Never use a shared database for development work.

Versioning

Why should you version your database? For the same reason you version your code: to go back in time and retrace bugs reported by a customer who uses an old but stable version of your program.

In my opinion there are two approaches that halfway achieve this goal:

  • Store every database object (table, view, stored procedure, function) in its own .sql file
  • Do database migrations like Ruby on Rails does

1 on 1

Storing every database object in its own file means that you have a directory structure like this:

/tables/tbl1.sql
/tables/tbl2.sql
/views/view1.sql
/views/view2.sql
/sps/stored_procedure1.sql
/functions/function1.sql

You get the idea. Every SQL script contains the CREATE statement for one database object.
This is very charming, because you can track structural changes via your commit log.
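For example, /tables/tbl1.sql could look like this (a minimal sketch; the columns are made up):

-- /tables/tbl1.sql
-- Complete CREATE statement for one table (hypothetical columns).
CREATE TABLE tbl1 (
  id INT NOT NULL AUTO_INCREMENT,
  name VARCHAR(255) NOT NULL,
  PRIMARY KEY (id)
);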

The problem begins when you want to migrate your developers' integration databases and the database of your customer. Stored procedures or functions can easily be replaced because you can use the CREATE OR REPLACE syntax. Views can be replaced with DROP VIEW view1; CREATE VIEW view1 AS…
But – and that's a big but – you have to compare your customer's current database with your current table definitions and write the ALTER TABLE statements by hand. Not so funny.
Additionally, you must delete unused views, stored procedures, and functions by hand. Otherwise you'll end up with a graveyard of dead database objects which are no longer used.
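To illustrate the difference, here is a hedged sketch with hypothetical objects (CREATE OR REPLACE is available e.g. in Oracle and PostgreSQL; MySQL needs DROP plus CREATE):

-- Views and code objects can simply be re-created:
DROP VIEW IF EXISTS view1;
CREATE VIEW view1 AS SELECT id, name FROM tbl1;

-- Tables are the painful part: this ALTER has to be written by hand
-- after comparing the customer's schema with your current definition.
ALTER TABLE tbl1 ADD COLUMN email VARCHAR(255);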

I know of no database tool which can automatically generate a 100% working update script without any pitfalls. And writing a DBMS-independent tool on your own is a lot of work.

1 on n (migrations)

The second approach is to create database migration scripts for every completed database change.
First the negative aspect: you will get a lot of files, and you cannot easily make a git/svn diff of the structural changes of a database object because the changes are distributed over many files.

The good news is: you don't have to write additional ALTER statements at deploy time because they are already there. That means you just give your customer your migration SQL scripts, he executes them in the given order, and he gets the latest database version.
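A single migration script then contains exactly one delta; a hypothetical example (file name and content are made up):

-- 20111220_001_create_invoice.sql
CREATE TABLE invoice (
  id INT NOT NULL AUTO_INCREMENT,
  amount DECIMAL(10,2) NOT NULL,
  created_on DATETIME,
  PRIMARY KEY (id)
);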

To mitigate the problem of diffing different database versions, you can migrate one database to version n and another database to version n+1. After that you make an SQL export of both and run diff (n).sql (n+1).sql.

A few days ago I came to the conclusion that there is no other effective way to achieve the goal, so I decided to explore the second approach further.

Implementing migrations

There were some requirements I thought about:

  • The database development process and file naming have to be standardized
  • The current installed version of your database must be stored inside the database itself
  • The migration process must be easy to run
  • The migration process must support pre/post scripts
  • The migration process must support different scenarios (deploying at the customer's site; no need for migrations on the developer's side – just install the latest version; different databases)

Standardization was easy. Every project contains a folder named db, which contains four subfolders:

Directory structure of database migrations

  • /coredata contains the core data which every database must contain.
  • /fixtures/[integrationtest|unittest] contains the fixtures which will be used in test cases.
  • /handlers contains handler scripts which can be executed before or after migrations.
  • /migrations contains the migration scripts themselves.

It is important that all files in /coredata, /fixtures and /migrations carry an increasing sequence number. It does not have to be identical to your SVN revision; it is only important that the files can be executed one after another.
You can see that I use the format YYYYMMDD_$DAYVERSION as the running identifier.
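The migrations folder could then look like this (the first file appears below; the later ones are hypothetical examples):

/migrations/20111219_001_schema_migration.sql
/migrations/20111220_001_create_invoice.sql
/migrations/20111221_001_add_email_to_tbl1.sql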

The database must store its migration version; this is needed for doing migrations. Every migration file with a version greater than $DATABASE_VERSION must be executed.
In my structure above, the schema migration script is executed first (20111219_001_schema_migration.sql) and contains only one statement:

CREATE TABLE schema_migration (
  id INT NOT NULL AUTO_INCREMENT,
  migration_on DATETIME,
  version CHAR(20) NOT NULL,
  filename LONGTEXT,
  PRIMARY KEY (id)
);
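A migration runner can then determine the installed version with a simple query; a minimal sketch (MAX() works here because the fixed-width YYYYMMDD_$DAYVERSION format sorts lexicographically):

-- Latest applied migration; every file with a greater version
-- still has to be executed.
SELECT MAX(version) FROM schema_migration;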

At this point I started to write a small prototype migration definition file which supported different scenarios (different databases and servers, the migration process, the recreate process).

For this definition (see my first draft) I had to write my own parser or use Xtext, so I transformed it into an XML sample (see my template). It seemed to be ideal for xsd.exe and C#/Visual Studio.
But after I had struggled with serialization problems, I took a closer look at the XML file. It looked like… Ant. The processes were more or less the same.
I did some research and came to the conclusion that almost every requirement could be solved with Ant. I implemented a halfway working solution.
The build.xml currently does not insert the migration version after a file is applied. I will fix that ASAP.
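The missing bookkeeping step itself is a single statement; a sketch of what would have to run after each applied file (the values are hypothetical):

-- Record an applied migration file (hypothetical values).
INSERT INTO schema_migration (migration_on, version, filename)
VALUES (NOW(), '20111219_001', '20111219_001_schema_migration.sql');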
Please note that you need js.jar and bsf.jar (both for JavaScript), commons-logging.jar, and ant-contrib.jar (for its additional tasks) in your Ant classpath to use the build.xml.
The build.xml itself is designed for MySQL. You need to customize it for every DBMS you use, but this should be easy to do.

Update: Jens Schauder published a post about Tips for Testing Database Code.


2 Comments

Jens Schauder · August 14, 2012 at 2:53 pm

Thanks for referencing my blog.

I noted a typo in the very first sentence: “One one the” probably should be “One of the”

Feel free to remove this otherwise completely useless comment.

enjoy

admin · August 15, 2012 at 9:42 am

Thanks Jens. I fixed the typo.

