Track Subversion (SVN) changes with an RSS feed

I was a bit annoyed the other day that some of the Subversion repositories out there don’t have a way to notify interested parties when a new change is made. There are a few services on the web that claim to do this, but when I went looking I didn’t have much luck with any of them. Since I am not one to shrink from a challenge that can be solved with a little bit of coding, I threw a new Subversion feed service together in a couple of hours. About a quarter of that time was spent trying to get the CSS on the homepage right.

If it gets any use at all (besides me) I might add some additional features to track which projects are being watched the most and other interesting features of the repositories.  If you have any great ideas you can hand them over to me in the comments :)

Posted in Java, Technology

Agile database schema migration tool for Java

Update: Grails Migration Plugin Forum

About a month into building Gauntlet we found ourselves in a situation where it was impossible for us to keep our development databases up to date with the latest changes to the schema. We were sending around emails telling each other what changes needed to be made alongside our check-ins. To get around this problem I spent a few hours building a rudimentary database schema migration tool. When you made a change to the database schema you had to build a DDL file that made the change and then update the version numbers in the Gauntlet software and within the database. A few short months later I discovered that we had built something very much like ‘rake migrate’ from Ruby on Rails; in fact it was almost exactly the same, except that ours worked at runtime rather than only from the command line. Fast forward a couple of years and I no longer own the Gauntlet source code, so yesterday I set out to rebuild the schema migration tool from scratch.

The basic concept is very simple. The first time your program connects to its database it calls the schema migration code to make sure that the code and the database are at the same version, so that you are always using data objects and queries that match the schema present in the database. When you make the call you need to pass the tool all the details about the database that you are connecting to, the place to find the classes or scripts that do the migration, and the current version of the client. Then, if the database version is less than the client version, the migration tool searches first for DDL classes/scripts specific to the database type and then for generic versions, executing them in turn to migrate the schema forward until the database version and the client version match. If it encounters a case where the client version is less than the database version it has no recourse but to fail. As a bonus, it also offers a version 0 transition where it will bootstrap you from a completely empty database to your initial schema.

For instance, let’s say in version 1 of your database you have the following table:

mysql> describe event;
+---------------+---------------+------+-----+-------------------+----------------+
| Field         | Type          | Null | Key | Default           | Extra          |
+---------------+---------------+------+-----+-------------------+----------------+
| id            | bigint(20)    | NO   | PRI | NULL              | auto_increment |
| time          | timestamp     | NO   |     | CURRENT_TIMESTAMP |                |
| z             | bigint(20)    | NO   | MUL |                   |                |
| ip_address    | int(11)       | YES  |     | NULL              |                |
| user_agent_id | int(11)       | YES  |     | NULL              |                |
| referrer      | varchar(256)  | YES  |     | NULL              |                |
+---------------+---------------+------+-----+-------------------+----------------+

Then you decide that you want to change the name of the referrer field to url. In order to do that you would create a new migration script that updates the field name and the database version:

ALTER TABLE event CHANGE referrer url varchar(256);
UPDATE db_version SET version = 2;

You would name that script migrate1.sql and put it with the MySQL-specific database migration scripts. You would then update the client version to 2 as well. Once that is done, anyone who uses the new client code against their database will automatically get the schema changes required for the client to work with the database. This drastically cuts down on the amount of communication that needs to occur in typical database development situations. You can find the project that implements this db schema migration here. It has one dependency that is included with the project, my cli-parser.
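The version-matching loop described above can be sketched in a few lines of Java. This is only an illustration and not the actual tool: the class name MigrationPlanner, the generic/ directory name, and the idea of passing the available scripts in as a set are assumptions made for the example. Only the migrateN.sql naming and the "database-specific first, then generic" rule come from the description above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: decides which migration scripts to run, in order.
class MigrationPlanner {

    /**
     * Returns the scripts needed to move the schema from dbVersion up to
     * clientVersion, preferring a database-specific script over a generic one.
     */
    static List<String> plan(int dbVersion, int clientVersion,
                             String dbType, Set<String> available) {
        if (clientVersion < dbVersion) {
            // The client is older than the schema: the only option is to fail.
            throw new IllegalStateException("Client version " + clientVersion
                    + " is older than database version " + dbVersion);
        }
        List<String> scripts = new ArrayList<String>();
        for (int v = dbVersion; v < clientVersion; v++) {
            // migrateN.sql moves the schema from version N to version N + 1.
            String specific = dbType + "/migrate" + v + ".sql";
            String generic = "generic/migrate" + v + ".sql"; // assumed layout
            if (available.contains(specific)) {
                scripts.add(specific);
            } else if (available.contains(generic)) {
                scripts.add(generic);
            } else {
                throw new IllegalStateException("No script migrates version " + v);
            }
        }
        return scripts;
    }

    public static void main(String[] args) {
        Set<String> available = new HashSet<String>(Arrays.asList(
                "mysql/migrate1.sql", "generic/migrate1.sql", "generic/migrate2.sql"));
        // Database at version 1, client at version 3.
        System.out.println(plan(1, 3, "mysql", available));
        // prints [mysql/migrate1.sql, generic/migrate2.sql]
    }
}
```

A real implementation would execute each script against the database as it goes, rather than just returning the list, and would rely on each script to bump db_version as in the example above.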

Posted in Java

Using the Yahoo! Mail SOAP API 1.1 from Java’s JAX-WS 2.1

On YDN they have samples and documentation on how to use the Yahoo! Mail SOAP API from Axis2. I’m not a big fan of that method so I went ahead and used JAX-WS to do my dirty work. As an example, I will build an RSS feed of the user’s unread messages.

First get JAX-WS 2.1.x or possibly just JDK 1.6 (though I did all my testing with the former and JDK 1.5). Since it’s easiest to work with typed APIs, we are first going to generate the classes needed to talk to the Yahoo! Mail web service. From their site, we find the latest WSDL URL. To generate the API from the WSDL we use the ‘wsimport’ command from JAX-WS like this:

wsimport.sh -extension -s src -p com.yahoo.mail http://mail.yahooapis.com/ws/mail/v1.1/wsdl

The key command line option there is the ‘-extension’ option as the Yahoo! Mail WSDL has a few things that by default would get named the same thing by the schema compiler. By using Sun’s extensions we can automatically rename them rather than making our own binding. This will generate 120+ classes representing each part of the complex API along with some special classes like ObjectFactory. This gives us the foundation for accessing Y! Mail but there are still a few quirks that we need to understand. Normally when you use JAX-WS you would simply access the web service using the main Ymws class that was generated like this:

             // Instantiate the SOAP proxy
             Ymws service = new Ymws();
             YmwsPortType stub = service.getYmws();

Then you would be able to make calls on that stub directly like this:

            UserData userData = stub.getUserData();

If you tried to do this, though, you would find that you are not authenticated with the service, nor is there any way to select the user on whose behalf you are making the call. Yahoo! Mail’s web services have their own authentication scheme that is built on Yahoo’s BBAuth, a third-party authentication system. In order to make use of BBAuth you will need to register your application with Yahoo. Make sure that you use a publicly available URL for your application and also select the third option at the bottom: “Yahoo! Mail (via BBAuth) with Read/Write access” so that you will be able to use this application ID to access the Y! Mail API. Once you have registered you will be asked to authenticate your URL by placing a special file at the root of the domain of your URL. This means you need write access to the root of the web server, so don’t attempt this on a domain that you don’t control in that way. After this is complete you continue to the success page, where it provides you with your application ID (appid) and your shared secret (secret). These will be required for you to access the Y! Mail APIs on behalf of Y! users.

BBAuth works by having you redirect the user to Yahoo’s server when you want to authenticate them; Yahoo then sends the token required to access Y! as that user back to the registered application URL. Here is the code that we can use to generate the URL required to get authenticated:

             // Timestamp in seconds since the epoch
             long ts = date.getTime() / 1000;
             String uri = "/WSLogin/V1/wslogin?send_userhash=1&appid=" + URLEncoder.encode(appid, "UTF-8") + "&ts=" + ts;
             MessageDigest md = MessageDigest.getInstance("MD5");
             byte[] digest = md.digest((uri + secret).getBytes("UTF-8"));
             // Zero-pad to 32 hex digits; BigInteger.toString(16) alone drops leading zeros
             String sig = String.format("%032x", new BigInteger(1, digest));
             return LOGIN_URL + uri + "&sig=" + sig;

Essentially this code creates a URI with our appid and signs it with the secret which is then used to create a URL that includes the signature. This ensures that only someone with the secret can work on behalf of the application that we registered. Once the user is authenticated you will receive a callback at the registered URL that includes the information passed here plus a token that can be used to retrieve a WSSID and cookie that can then be used to construct authenticated web service requests. There is an additional login URL parameter that you can pass called appdata that will be returned to you when Y! redirects the user back to your application. The token that is returned is generally good for 2 weeks of authenticated access. The send_userhash=1 option tells Yahoo to return to us a unique identifier tied to your application id that will always be the same so you can use it to tie back to a particular user in your application.
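One detail that is easy to get wrong when hex-encoding a digest through BigInteger is that leading zeros disappear, which would produce a signature shorter than the usual 32 hex digits for roughly one request in sixteen. Here is a minimal, self-contained sketch of the signing step with the padding made explicit. The class name SignatureSketch and the sample appid and secret are placeholders, and it assumes the server compares against a conventional zero-padded MD5 hex string. It uses StandardCharsets, so it needs Java 7 or later.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: hex-encodes an MD5 digest for use as a URL signature.
class SignatureSketch {

    /** Returns the MD5 of (uri + secret) as exactly 32 lowercase hex digits. */
    static String sign(String uri, String secret) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest((uri + secret).getBytes(StandardCharsets.UTF_8));
            // %032x left-pads with zeros; BigInteger.toString(16) would drop them.
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            // MD5 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Placeholder values, not a real application ID or secret.
        String uri = "/WSLogin/V1/wslogin?send_userhash=1&appid=example&ts=0";
        System.out.println(sign(uri, "not-a-real-secret"));
    }
}
```

Pinning the character set also keeps the signature stable across machines whose default encodings differ.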

Now that we have gotten the token back from Yahoo we can get our WSSID and Cookie. Included with the Yahoo! Mail sample code they have an inner class called BrowserBasedAuthManager. We’ll just use that rather than rewrite it but basically it does something similar to the code above and retrieves the two values from an XML document returned from a URL:

             // Instantiate the auth manager and set it up with the date, application ID,
             // shared secret and the user token.
             BrowserBasedAuthManager authManager = new BrowserBasedAuthManager(date, appid, secret, token);

These two values, wssid and cookie, are then needed to construct our web service requests. JAX-WS doesn’t expose this functionality directly in the API but instead allows you to set various properties on the request context:

         Map<String, Object> requestContext = ((BindingProvider) stub).getRequestContext();
         requestContext.put(BindingProvider.ENDPOINT_ADDRESS_PROPERTY,
                 "http://mail.yahooapis.com/ws/mail/v1.1/soap?appid=" +
                         URLEncoder.encode(appid, "UTF-8") + "&wssid=" +
                         URLEncoder.encode(authManager.getWssid(), "UTF-8"));
         Map<String, List<String>> cookies = new HashMap<String, List<String>>();
         cookies.put("Cookie", Arrays.asList(authManager.getCookie()));
         requestContext.put(MessageContext.HTTP_REQUEST_HEADERS, cookies);

The first property allows us to change the actual endpoint of the web service call to include the WSSID that we got from our authentication request. The second call allows us to set cookies on the HTTP request that is used to make the specific call. Together these will give us the access we need in order to use the stub securely. Actually making use of the API is quite easy now that we are authenticated. For instance, we can trivially discover whether or not the user is a Y! Mail Plus subscriber and has access to the full API functionality (non-premium users can’t get the contents of messages for instance):

         UserData userData = stub.getUserData();
         boolean isPremium = userData.getUserFeaturePref().isIsPremium();

Here is the code to pull all the unread messages from a folder:

         // List out up to 100 new messages in the folder
         ListMessages lm = new ListMessages();
         Flag flag = new Flag();
         flag.setIsRead(FALSE);
         lm.setFilterBy(of.createListMessagesFilterBy(flag));
         lm.setFid(folder.getFid());
         lm.setNumInfo(BigInteger.valueOf(NUM_MESSAGES));
         ListMessagesResponse lmResp = stub.listMessages(lm);
         return lmResp.getMessageInfo();

Notice how we use the ObjectFactory to create the filterBy element. Whenever you see a reference like JAXBElement<Flag> in the API you will likely want to use one of the convenience APIs within ObjectFactory to create the argument.

Now let’s get to our RSS feed of unread messages example. To create our RSS feeds we could have just written out the feed directly, but instead I’m going to use ROME, as we might want to extend the example to a real application later. ROME has one dependency, JDOM-1.0, so we will have to get that as well. We can encapsulate the application into a single servlet that serves both the authentication feed and the mail feed. Here is the core servlet method:

        try {
             String token = httpServletRequest.getParameter("token");
             if (token == null) {
                 // If there is no token we are not authenticated
                 throw new AuthException("No token");
             }
             // Current date, needed for many API calls
             Date date = new Date();
             // Instantiate the SOAP proxy
             Ymws service = new Ymws();
             YmwsPortType stub = service.getYmws();
             // Instantiate the auth manager and set it up with the date, application ID,
             // shared secret and the user token.
             BrowserBasedAuthManager authManager = new BrowserBasedAuthManager(date, appid, secret, token);
             // Set up the web service call
             setupWebServiceCall(authManager, stub);
              // Create the feed
             SyndFeed sf = createFeed(date);
             // Go and get the list of folders and pull out the inbox
             Fid inbox = getInbox(stub);
             if (inbox != null) {
                 List<MessageInfo> messages = listUnreadMessagesInFolder(stub, inbox);
                 List<SyndEntry> entries = new ArrayList<SyndEntry>();
                 for (MessageInfo message : messages) {
                     SyndEntry se = createEntry(message);
                     entries.add(se);
                 }
                 sf.setEntries(entries);
             }
             writeFeed(httpServletResponse, sf);
         } catch (AuthException e) {
             unauthorizedFeedResponse(httpServletResponse);
         } 

This code should be fairly self-explanatory and it builds on all the work that we have done so far. The only new things are the actual calls into the Y! Mail API that retrieve the messages from Y! Mail, convert them into an RSS 2.0 feed, and write that feed to the wire. There are a couple of issues with this code that we don’t address, like generating a permanent URL for the user to use. Right now, whenever their authentication is reset (at least every 2 weeks), they will have to get a new feed URL from the application.

Throughout these simple examples there are opportunities for optimization. Many pieces of data, like the token, the wssid and the cookie, can be cached for some amount of time and regenerated only when the user fails to authenticate. The user hash can, of course, be used forever as a unique identifier for the user. Other opportunities for optimization include batching multiple requests to the API using batchExecute. Our example is an unoptimized version so that you can see how everything works before it’s made more complicated through the optimization process.

Here is a link to the full application; the source code is under WEB-INF/src. You will need to configure the web.xml with your own appid and secret, but otherwise it should run in a standard JEE Servlet 2.4 container out of the box.

Posted in Technology

Versus: Argue why Foo is better than Bar

Versus is a research project by some Bix developers (my friend John Beatty is one of them) at Yahoo that is trying to aggregate the arguments for choosing between two or more competitive offerings.

It was just released near the beginning of the month and is building momentum but there are already very good comparisons with many arguments on both sides:

MySQL vs PostgreSQL
Static typing vs Duck typing
Squeezebox vs AirTunes
Yelp vs Zagat

The other cool thing about it is that anyone can add new arguments, people can vote for arguments on both sides, there are comments associated with each argument, and anyone can introduce a new competition. With luck Versus will become the place to settle arguments of all kinds!


Web 2.0 Expo

I’ll be at the Web 2.0 Expo starting tomorrow night at the Ignite event.

I was a bit disappointed with the regular Web 2.0 conference this year. Not enough technology for me. I think that this conference will be much better. Send me an email if you are going to be at the conference and want to meet up.

Java Performance Optimization

Last night I decided to revive my poker hand evaluator library and look at it from a performance perspective and do some optimizations if need be. Some of my findings give insight into what kinds of things are optimized in JDK 1.6 vs JDK 1.5 and how things vary between Mac OS X, Windows, and Linux.

So the first stage of any optimization project is to create benchmarks that accurately reflect the usage of the system in the real world — with some nod to the worst case scenario. The two benchmarks I produced to test the poker library were the following:

1) Run an entire 10 hand Texas Hold’em poker game from beginning to end with no one folding and determine a winner. This should reflect the worst case scenario for a poker server that is trying to serve games to users.
2) Evaluate random hands with random boards. This should reflect what would be required to do Monte Carlo simulations or full solution space searches.

The benchmarks will be reported on three different systems:

A) MacPro, Mac OS X 10.4.9, 2x Intel Xeon 5160 (dual core 3 GHz), 8 GB RAM, JDK 1.5.0_07
B) MacPro, Windows XP SP2, 2x Intel Xeon 5160 (dual core 3 GHz), 8 GB RAM, JDK 1.5.0_11, JDK 1.6.0, JRockit 5.0 R27.2
C) Dell 1850, 2x Intel Xeon 2.8 GHz (1st gen dual core, hyperthreading enabled), 4 GB RAM, JDK 1.5.0_11, JDK 1.6.0

If you run the benchmark on another system, please send me the results or post them in the comments. The second thing that I did was go and get a profiler. I tried a bunch of different profilers, but the one that had the best integration with my IDE and also performed quite well was JProfiler 4.0 (integrated with IntelliJ IDEA). The cheapest (free) and most barebones one that worked was JIP-1.0.7. I also think it was more accurate for methods that get inlined at runtime, but the two basically showed identical results. JIP is $499 cheaper, though it doesn’t have a nice runtime graphical display of the progress. Another advantage of JIP was that programs executed about twice as fast as under JProfiler. Looks like they need an IntelliJ IDEA plugin :)

The starting point for the poker engine was written using the Java collection classes and leveraged them quite a bit to make things clean and easy to understand. I knew at the time though that there were probably many optimizations that could be done either with custom collections or by using arrays when appropriate. So our base benchmarks look like this (best of 3):

[ java -jar bench.jar 1] bench-3687.jar

Environment              Threads  Games / second  Ranks / second
Mac OS X, 1.5 client VM  1        11093           130975

So what do the profiles show? It turns out that with the collections libraries, even if you use those without concurrency overhead and choose implementations carefully, you still end up spending tons of time within them rather than doing the real work of your program, especially for something as data intensive as this application. I spent a couple of hours painstakingly moving the collections usage over to arrays in all the hotspots that I found in the code. One consequence of this is that I found a few bugs and added a few new tests to the system, so it was a very useful exercise even apart from the performance optimizations. Making these changes, without changing the library’s quite simple interface, netted us quite a profit:

[ java -jar bench.jar 1] bench-3785.jar

Environment              Threads  Games / second  Ranks / second
Mac OS X, 1.5 client VM  1        26525           389610
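To give a feel for the kind of change involved, here is a small illustration of replacing a collection with an array in a hot path. It is not taken from the poker engine: the class name RankCountSketch, the two method names, and the convention that ranks are encoded as 0 through 12 are all assumptions for the example. Both methods compute the same thing, the largest number of cards sharing a rank.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; not taken from the poker engine itself.
class RankCountSketch {

    /** Collection-based version: boxes every rank and allocates a map per call. */
    static int maxOfAKindWithMap(List<Integer> ranks) {
        Map<Integer, Integer> counts = new HashMap<Integer, Integer>();
        int best = 0;
        for (Integer rank : ranks) {
            Integer old = counts.get(rank);
            int now = (old == null) ? 1 : old + 1;
            counts.put(rank, now);
            best = Math.max(best, now);
        }
        return best;
    }

    /** Array-based version: ranks are assumed to be 0..12, so 13 slots suffice. */
    static int maxOfAKindWithArray(int[] ranks) {
        int[] counts = new int[13];
        int best = 0;
        for (int rank : ranks) {
            int now = ++counts[rank];
            if (now > best) {
                best = now;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Two pair plus a kicker: ranks 3, 3, 9, 9, 12.
        System.out.println(maxOfAKindWithMap(Arrays.asList(3, 3, 9, 9, 12)));
        System.out.println(maxOfAKindWithArray(new int[] {3, 3, 9, 9, 12}));
    }
}
```

The array version avoids hashing, boxing, and most of the per-call allocation. How much that matters depends on the VM and the workload, so it is worth confirming with a profiler before and after, as was done here.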

You’ll notice that we are executing these benchmarks with the absolute defaults as far as tuning the Java VM goes. It turns out that tuning the VM is absolutely critical with Sun’s VM if you want the best performance, and it isn’t a small difference either. It is very easy to get into pathological GC conditions where you are very close to the memory limit and the VM does not increase the heap size but instead drastically increases the frequency of collections, causing the performance of the benchmark to plummet. I have even seen conditions where it is nearly at a standstill. For this benchmark we find that increasing the minimum heap size well above this GC pathology threshold helps tremendously, as does using the server VM, so the following benchmarks use:

[ java -server -Xmx256m -Xms256m -jar bench.jar 1/2/4]

Environment              Threads  Games / second  Ranks / second
Mac OS X, 1.5 server VM  1        52714           696055
Mac OS X, 1.5 server VM  2        90753           1264974
Mac OS X, 1.5 server VM  4        120054          2238467
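When experimenting with heap flags like these it helps to confirm what the VM actually ended up with. The following is a small sketch using the standard Runtime API; the class name HeapSettingsSketch is just a placeholder, and it is not part of the benchmark.

```java
// Hypothetical helper for checking what heap limits the running VM actually has.
class HeapSettingsSketch {

    static long toMegabytes(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory() reflects the -Xmx limit; totalMemory() is the heap currently
        // reserved, which starts near -Xms and can grow toward the maximum.
        System.out.println("max heap MB:     " + toMegabytes(rt.maxMemory()));
        System.out.println("current heap MB: " + toMegabytes(rt.totalMemory()));
        System.out.println("free heap MB:    " + toMegabytes(rt.freeMemory()));
    }
}
```

Run with the same -Xms and -Xmx values as the benchmark, the first two numbers should come out close to each other, though the exact figures vary a little by VM and collector. Adding -verbose:gc to the command line is another way to see how often collections are happening.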

I’ve also done these benchmarks on a full suite of systems. Unfortunately, I’m not at liberty to say which VM performed the best (the aqua line) due to an NDA I’ve signed with a large computer company but as you can see, there is widely varying behavior from the various JVMs:

As it turns out, the current set of Java VMs still cannot completely tune themselves, especially when it comes to choosing the amount of memory they should allocate for the best performance. Certainly more innovation around self-tuning has been done in the 1.6 and JRockit VMs, but I believe, based on my limited results, that there is still a lot of room for improvement. The other takeaway is that the newer processors, even with approximately the same clock rate, have much better performance characteristics and scale far better than their predecessors. Finally, it appears that Mac OS X crushes Windows for running Java applications on the same hardware, especially when running heavily multithreaded applications. Of course, as with any benchmark, this is only applicable to applications quite similar to the poker engine. Other application behavior may vary as the different strengths of the systems are exercised.

Here is the current version of the Poker Engine under an attribution, non-commercial use creative commons license:
PokerEngine.zip


Search by number

Just added a cool little feature to Yahoo! Search with Greasemonkey. Using this you can just press the number of the search result rather than clicking on it so your hands don’t have to leave the keyboard.

Here is the script; you need to have Greasemonkey installed on Firefox to use it:

Search By Number

You could probably do something similar with Google, but they don’t number their search results so it’s not quite as easy, though the script could be adapted.

UPDATE:

Found this really cool Greasemonkey compiler that turns my script (with some more features like sponsored links) into a plugin.


Yahoo! Pipes just launched

A really cool application, Pipes, from Yahoo! launched the other day and has the potential to make it really easy to mash up web applications and repurpose data.

It is loosely based on the Unix pipes model of using small programs chained together to process data, where the output of one program is the input to another program. With the Pipes application you use RSS (or Atom) feeds as your source data rather than text files, and then use the available transforms and other data sources to mash up a new feed that can be displayed using a number of different renderers, including a map output. Here is an example of how you can build a pipe with variables and multiple stages:

The whole project reminds me a little bit of my XQuery project I did so long ago though much better executed and more approachable. Though as soon as they open up the ability to create more transformation types I will probably drop XQuery in there to do the dirty work.


Bix Widget

Putting external content on your site and/or profile is hot right now; see my newly installed Yahoo! Search Link badge, my Flickr badge, my Yahoo! Finance badge, and my MyBlogLog widget. Bix has a really cool one as well.

Bix, if you haven’t heard, is a contest site that lets you run your own contests. In fact, you can even make the whole contest yourself and then post it to your blog or MySpace account. Here is an example embedded on this page:


Go to this contest on Bix

Pretty cool, eh? Certainly much more engaging than a poll. What kinds of external applications like these do you find the most useful?


Crossover (WINE) vs. Parallels vs. Bootcamp

Another entrant has joined the race to capture the dollars of those who wish to run Windows software on their Intel Macs. Codeweavers has now released a beta of Crossover Mac which is a repackaging of WINE with additional compatibility modifications and more user friendly tools.

Installation of Crossover Mac couldn’t be simpler. Just download the DMG, mount it, and drag the Crossover application to your Applications directory. If you did not install X11 on your computer you will have to install a library (quartz-wm) from your Mac OS X install disk; Crossover will guide you through the steps. After you have it installed, what can you do with it? First let’s compare it with the other options:

Bootcamp: The real deal. A true Windows install with full access to all the hardware and software though only Windows XP SP2 is currently supported.
Parallels: Virtual machine. A true Windows install but with only partial access to the hardware leading to poor 3D performance, slightly reduced CPU performance, and much lower disk performance. Nearly any x86 OS.
Crossover Mac: Runs Windows software as native Mac OS X apps with emulation libraries in place of Microsoft libraries. Can pretend to be Win98, Win2000, or WinXP. Can run some 3D games.

Now let’s imagine the perfect piece of software and see how these line up:

1) Runs nearly all Windows programs at full speed
2) Executes them as native Mac OS X applications without a container
3) Does not require a Windows license

Bootcamp performs very well at 1) but fails utterly on points 2) and 3). A reboot is currently required to get into Bootcamp, so you can’t run Mac and Windows applications at the same time, and a Windows XP SP2 license is required. Parallels runs nearly all Windows programs that do not require 3D graphics with a reasonable performance hit. They do not, however, look like Mac OS X programs, and they all run inside a window that contains the entire Windows instance, but at least you can run Windows applications at the same time as Mac OS X applications. Parallels, like Bootcamp, requires a Windows license, though it could be an old, cheap one instead of an expensive XP SP2 license. Finally, Crossover currently runs only a small number of Windows programs, but those it does run execute at near full speed for CPU operations, though some 2D graphics operations in PowerPoint appear to be much slower than their Windows equivalents. Those applications run right alongside your Mac OS X applications and use far less memory than a Parallels install, but they still don’t look quite native due to the reliance on Crossover’s X11 server. For instance, applications do not each get their own Dock icon. The one thing that gives Crossover a price edge is that no Windows license is required at all.

Take all my following observations about Crossover as a review of their beta and not a final product. The Linux version of their product apparently has far fewer compatibility problems, and the Mac version should get better and better as the product becomes more baked. First up, straight from the name, is Office 2003. I’ve installed it in both the Win2000 and WinXP ‘bottles’. A bottle is like a fake operating system environment that looks like its namesake to the installed applications. I highly suggest that you first install IE 6 to the WinXP and Win2000 bottles, since many applications appear to depend on it and may not say so. After that is done, insert your Office 2003 install disk and Crossover should prompt you to install it automatically. I chose not to install Access (Crossover doesn’t support it), Infopath, or some other random application, and just installed Word, Excel, PowerPoint, and Outlook. Once installed, they appear in the Crossover Programs menu and can be launched from there, from the shortcuts on disk, or even from the native binaries that you can find if you look around where they are installed under ~/Library/Application Support/Crossover. The aliases are placed in ~/Applications.

Excel: Looked and performed just like expected for me.
PowerPoint: Fonts a little off. Very fast for everything but graphics.
Word: I don’t use Word but it opened documents well enough.
Outlook: Connectivity not working yet for IMAP. Going to try Exchange when I get the chance.

I also tried Steam (the game downloading and execution software for Half-Life et al.) and it worked well for most things. When I downloaded and launched CounterStrike: Source, I wouldn’t say it failed completely, but it was futile to try to play it. Firefox worked perfectly fine out of the box. IE had issues and was basically unusable for me. In the end, I can only recommend the beta as a test bed for the Windows applications you want to run and as a way to send feedback to Codeweavers. What they have right now is a good technology demonstration, but I don’t think I could use it for real work yet, at least not in PowerPoint or Outlook, the two applications I was most interested in running until Microsoft ships Universal binaries of their Mac Office suite. For only $40 you can pre-order it and bet that it’s going to do what you need after it launches; compared to Bootcamp or Parallels, that’s cheaper because no Windows license is needed.
