Agile database schema migration tool for Java

Update: Grails Migration Plugin Forum

About a month into building Gauntlet we found ourselves in a situation where it was impossible for us to keep our development databases up to date with the latest changes to the schema. We were sending around emails telling each other what changes need to be made alongside our check-ins. In order to get around this problem I spent a few hours building a rudimentary database schema migration tool. When you made a change to the database schema you would have to build a DDL file that would make the change and then update the version numbers for the Gauntlet software and within the database. A few short months later and I discovered that we had built something very much like ‘rake migrate’ from Ruby on Rails — in fact it was almost exactly the same except that ours worked at runtime rather than only from the command-line. Fast forward a couple years later and I no longer own the Gauntlet source code, so yesterday I set out to rebuild the schema migration tool from scratch .

The basic concept is very simple. The first time your program connects to its database it calls the schema migration code to make sure that the code and the database are at the same version so that you are always using data objects and queries that match the schema that is present in the database. When you make the call you need to pass the tool all the details about the database that you are connecting to, the place to find and classes or scripts to do the migration, and the current version of the client. Then, if the database version is less than the client version, the migration tool systematically searches for first database type specific DDL classes/scripts then generic versions, executing them in turn to migrate the schema forward until the database version and the client version match. If it encounters a case where the client version is less than the database version it has no recourse but to fail. As a bonus, it also offers a version 0 transition where it will bootstrap you from a completely empty database to your initial schema.

For instance, lets say in version 1 of your database you have the following table:

mysql> describe event;
+---------------+---------------+------+-----+-------------------+----------------+
| Field         | Type          | Null | Key | Default           | Extra          |
+---------------+---------------+------+-----+-------------------+----------------+
| id            | bigint(20)    | NO   | PRI | NULL              | auto_increment |
| time          | timestamp     | NO   |     | CURRENT_TIMESTAMP |                |
| z             | bigint(20)    | NO   | MUL |                   |                |
| ip_address    | int(11)       | YES  |     | NULL              |                |
| user_agent_id | int(11)       | YES  |     | NULL              |                |
| referrer      | varchar(256)  | YES  |     | NULL              |                |
+---------------+---------------+------+-----+-------------------+----------------+

Then you decide that you want to change the name of the referrer field to url. In order to do that you would create a new migration script that updates the field name and the database version:

ALTER TABLE event CHANGE referrer (url varchar(256));
UPDATE db_version SET version = 2;

You would name that script migrate1.sql and put it in the mysql specific database migration scripts. You would also then update the client version to 2 as well. Once that was done anyone who uses the new client code against their database will automatically get the schema changes required for the client to work with the database. This drastically cuts down on the amount of communication that needs to occur in typical database development situations. You can find the project that implements this db schema migration here. It has one dependency that is included with the project, my cli-parser .

This entry was posted in Java and tagged , , , , . Bookmark the permalink.
  • chris
    Hi Sam,

    You may want to checkout ChronicDB, which applies database schema updates without downtime: http://chronicdb.com
  • Looks interesting but is ridiculously expensive.
  • sam
    Hi Miguel,

    I'm pretty sure that MySQL doesn't support DDL + transactions like PostgreSQL does. If it does, I will fix it but that is my understanding.

    Sam
  • harry
    The same thing with Oracle, every DDL operation includes an implicit commit so the ability to roll back a schema change using traditional means is not possible. Although that specific RDBMS includes other ways to possibly overcome the problem of reverting back changes that go beyond transaction level using a feature called flashback database.

    I also agree that the ability to revert back failed migration step is crucial for actually testing the steps and ensuring that they all work correctly before applying them to any production environment. The problem currently is just how to actually do it in a foolproof way.
  • it seems that the migration tool doesn't do the migration as a transaction. i haven't looked into the code, but got an error while it was running, but part of the script still ran. this was on mysql5 and it was jsut a create table command, it failed because there was no version table yet.
  • I have created and released under the GPL a versioning framework for
    database release patching. This has been tested and used in an Agile
    development environment with multiple users throughout all stages of
    the development lifecycle.

    You can find details here: http://www.andrew-eells.com/database-versioning...

    I would like to make a section of the development community aware of
    this, really to see what response I might get, and possibly to help
    anyone who needs a framework for similar purposes.

    Please feel free to download and use it within your project team, and
    I'd be really grateful for any comments on improvements you think
    could be made.

    Thanks and regards, Andrew
  • sam
    The code is maintained at Google Code. Codehaus only has the code for the plugin because thats the way you distribute plugins using grails -- they are uploaded to the codehaus SVN server. Anyone who is serious about working on the project only need contact myself and I will add them to the project with their first good patch.
  • Arnon
    Good work Sam.
    Where do you maintain it now? There seems to be confusion as to whether it's in google code or codehaus...
  • sam

    Sounds very similar to autopatch (http://autopatch.sourceforge.net), which i’ve been using for a couple of months on a java web project. It’s more web app focused, with some nice features like a startup servlet to trigger the migration when you deploy a new version of your app, and a locking table to prevent multiple nodes in a cluster from trying to update simultaneously. So far the biggest annoyance of autopatch has been that the migrations are one way. I like the fact that rake migrate lets you downgrade your database so you can re-run the upgrade as you test your app before checking in.




    Autopatch actually looks like a really cool generic solution to this problem. If it had been easy to find perhaps I would not have built this though its probably a bit too generic for most people (but not me). I've never had an application setup so that I would need to downgrade, instead I always use original data in the tests and have it get upgraded up to current. That makes it a waste to write all that downgrade code, especially when a true downgrade may be impossible (loss of information during migration).
  • sam

    Hi Sam,

    Just been looking at dbmigrate to handle our database migration issues during development. Looks like a very promising product.



    Thanks, I am obviously still working on it but its working well in its current implementation.



    One area that has me scratching my head a bit is the processing of the scripts. It’s a bit out of sync with the doco, so I’ve been running some tests to see how it works and looking through the source code. I’d like some feedback on the reasoning behind the mirgratefrom- and migrateto- seperating of the classes/scripts. The code says that it looks for the first file from the list and runs just that one. The search order being (ignoring database specific ones which are the same):


    (Class)from(version)

    (Sql)from(Version)


    (Groovy)from(version)


    (Class)to(version+1)


    (Sql)to(Version+1)


    (Groovy)to(version+1)


    I found this a bit confusing and wondered what the logic behind this is. I also found that if I create a from1, from2, and from3, dbmigrate only runs 1 and 2 when I specify version 3. I tried changing from3 to to3, but because of the from2 script it would not run that either.


    It appears that the logic is that from scripts update from the prior version to the next number, I.e. from2 should contain updates from version 2 to version 3 of the database and the to scripts work in the opposite. But that you must code either from or to scripts, but not both. In which case I’m thinking why not just stick with what the doco says.


    It appears that the best way to code things is to create to1, to2, and to3 as most programmer think in terms of the version number of a file matching the version number in the database.



    So here is the deal. The original one had the semantics of 'migratefromN' which is what I need in order to skip versions for when you go GA and you want to update the database in one fell swoop from the previously released version to the latest development version. The problem is that if you keep track of the version you are going to you have this problem that you must assume that you are coming from the most immediate previous version. For all those people out there that want the version of the file to match up with the version of the database you can use 'migratetoN' scripts/classes. I should really update the documentation but this was after feedback like yours where you want the script version to match the database version it is setting rather than the version it finds in the database.



    Regards,

    Derek


    P.S. What’s the status of the code in terms of contributions from other programmers ? How would we go about it ? and how much time do you expect to put into this project ?



    Right now it is just myself and a couple of others that have worked on before. If you or someone from your team wants to contribute we can discuss it. Send me email (sam at sampullara dot com). In terms of time I am going to put into the project -- I would like to limit the scope of it so that in its current form it is almost done and if there is more to it, perhaps make another system that sits on top of it.

  • Lorin
    Sounds very similar to autopatch (http://autopatch.sourceforge.net), which i've been using for a couple of months on a java web project. It's more web app focused, with some nice features like a startup servlet to trigger the migration when you deploy a new version of your app, and a locking table to prevent multiple nodes in a cluster from trying to update simultaneously. So far the biggest annoyance of autopatch has been that the migrations are one way. I like the fact that rake migrate lets you downgrade your database so you can re-run the upgrade as you test your app before checking in.
  • Derek Clarkson
    Hi Sam,
    Just been looking at dbmigrate to handle our database migration issues during development. Looks like a very promising product.

    One area that has me scratching my head a bit is the processing of the scripts. It's a bit out of sync with the doco, so I've been running some tests to see how it works and looking through the source code. I'd like some feedback on the reasoning behind the mirgratefrom- and migrateto- seperating of the classes/scripts. The code says that it looks for the first file from the list and runs just that one. The search order being (ignoring database specific ones which are the same):

    (Class)from(version)
    (Sql)from(Version)
    (Groovy)from(version)
    (Class)to(version+1)
    (Sql)to(Version+1)
    (Groovy)to(version+1)

    I found this a bit confusing and wondered what the logic behind this is. I also found that if I create a from1, from2, and from3, dbmigrate only runs 1 and 2 when I specify version 3. I tried changing from3 to to3, but because of the from2 script it would not run that either.

    It appears that the logic is that from scripts update from the prior version to the next number, I.e. from2 should contain updates from version 2 to version 3 of the database and the to scripts work in the opposite. But that you must code either from or to scripts, but not both. In which case I'm thinking why not just stick with what the doco says.

    It appears that the best way to code things is to create to1, to2, and to3 as most programmer think in terms of the version number of a file matching the version number in the database.

    Regards,
    Derek

    P.S. What's the status of the code in terms of contributions from other programmers ? How would we go about it ? and how much time do you expect to put into this project ?
  • sam
    Thanks for the feedback! My responses to your excellent suggestions:

    1) I like this idea.
    2) Seems reasonable. That might make it easier to read and understand what would get executed.
    3) Now that I have integration with Grails I might add the ability to specify the environment even in the Java version.
    4) Actually as of subversion version 40 or so I support Groovy scripts (named like the sql scripts), Java classes (camelCased), and SQL scripts.

    Sam
  • John
    Wow, pretty cool. I work a lot with Rails and migrations, but also still work with projects on a Java stack. I was considering developing something very similar, some day, but for now have just organized SQL snippets by database, environment, etc. This doesn't quite do everything I will need, and I actually don't need much yet since I am only dealing with a couple instances. I will stay tuned for sure.

    A couple of thoughts for you moving forward:
    - I think you might want to consider breaking up schema from data. The data itself might change based on the environment, especially for initial loads.
    - I think separating by database vendor could be done optionally in the naming of the file itself, e.g. migratefrom01_postgres.sql or migratefrom01.sql. I think it fairly likely that the sql will be independent.
    - I use something like this for a structure
    - schema
    - baseline_schema.sql
    - schema_updates.sql (could be migrations here, further separated by vendor)
    - data
    - dev
    - baseline_data.sql (could also be migrations, the system always does schema then data)
    - data_updates.sql
    - production
    ...
    - A final nice feature would be to have the system pick up SQL OR Groovy scripts, in case you want to do some serious logic for a migration.

    Once again, nice work.

    - John
  • Nathan Voxland
    Checkout Liquibase: http://www.liquibase.org/
    “LiquiBase is an open source (LGPL), DBMS-independent library for tracking, managing and applying database changes. “

    I like the backend of this library. Not too fond of creating XML to do the changes though. Though one of the uses I see for this is adding migrations to Grails. Then creating a builder that has a nice DSL and generates the XML configuration files might not be that bad. I’ll check it out, thanks for the link!

    Hi, I'm one of the developers of LiquiBase and ran across this blog today. I had actually started making a LiquiBase Grails plug-in as a way to start learing the platform. It's not very far along so far, but I don't see it as being particularly difficult (time is the main reason it's not far along). I've had requests for a non-XML DSL for changes in the past and am not against the idea. We've gone with XML mainly so we can leverage the existing XML parsers and because IDEs have built in XML auto-completion. There may be a way to combine the two, however.

    I agree that some sort of schema migration tool would be a great addition to Grails, but I don't want to double up on the effort. Let me know if you'd be interested in working together.

    Nathan
  • If anyone knows of a DDL abstraction layer for Java that would be an awesome addition to the framework.


    Wow, funny timing. I stumbled upon this blog entry and realized I have exactly what you're looking for. We (Jive Software) made an abstraction layer for our schema about a year ago. We were struggling to maintain a bunch of different schemas, so we ended up representing the schema as XML. Our library parses the XML and produces database-specific SQL. Besides having a syntax for tables, columns, etc it can do alter statements as well. Besides all this, we build an upgrade framework which uses this and keys off a database table -- so it's really similar to what you did.

    We've been meaning to open source this stuff so maybe now's the time to do it. :) Shoot me an email (bill at jivesoftware dot com) and I'll show you the source.

    Cheers,
    --Bill
  • sam

    Sam - I’m trying to your dbmigrate package from the Google code site. I’ve tried running Migrate from cmd line with little success - I’m guess the args I’m passing are not correct. Can you email me a sample of running it from cmd line? I looked through your test cases and couldn’t fine one. Thanks - James.




    Hi James,

    I've updated the source with a test for the command line along with actually printing out the usage message when you try and run it on the command line and it fails. I would definitely appreciate any feedback you have about the library.

    Sam
  • James
    Sam - I'm trying to your dbmigrate package from the Google code site. I've tried running Migrate from cmd line with little success - I'm guess the args I'm passing are not correct. Can you email me a sample of running it from cmd line? I looked through your test cases and couldn't fine one. Thanks - James.
  • sam

    Checkout Liquibase: http://www.liquibase.org/


    “LiquiBase is an open source (LGPL), DBMS-independent library for tracking, managing and applying database changes. “




    I like the backend of this library. Not too fond of creating XML to do the changes though. Though one of the uses I see for this is adding migrations to Grails. Then creating a builder that has a nice DSL and generates the XML configuration files might not be that bad. I'll check it out, thanks for the link!
  • Checkout Liquibase: http://www.liquibase.org/

    "LiquiBase is an open source (LGPL), DBMS-independent library for tracking, managing and applying database changes. "
  • sam
    If anyone knows of a DDL abstraction layer for Java that would be an awesome addition to the framework.
blog comments powered by Disqus