Java Performance Optimization
Last night I decided to revive my poker hand
evaluator library and look at it from a performance perspective and do some
optimizations if need be. Some of my findings give insight into what kinds of
things are optimized in JDK 1.6 vs JDK 1.5 and how things vary between Mac OS X,
Windows, and Linux.
The first stage of any optimization project is to create benchmarks that accurately reflect the usage of the system in the real world, with some nod to the worst case scenario. The two benchmarks I produced to test the poker library were the following:

1) Run an entire 10 hand Texas Hold'em poker game from beginning to end with no one folding and determine a winner. This should reflect the worst case scenario for a poker server that is trying to serve games to users.
2) Evaluate random hands with random boards. This should reflect what would be required to do Monte Carlo simulations or full solution space searches.

The benchmarks will be reported on three different systems:

A) MacPro, Mac OS X 10.4.9, 2x Intel Xeon 5160 (dual core 3 GHz), 8G RAM, JDK 1.5.0_07
B) MacPro, Windows XP SP2, 2x Intel Xeon 5160 (dual core 3 GHz), 8G RAM, JDK 1.5.0_11 + JDK 1.6.0 + JRockit 5.0 R27.2
C) Dell 1850, 2x Intel Xeon 2.8 GHz (1st gen dual core, hyperthreading enabled), 4G RAM, JDK 1.5.0_11, JDK 1.6.0

If you run the benchmark on another system, please send me the results or post them in the comments.

The second thing that I did was go and get a profiler. I tried a bunch of different profilers. The one with the best integration with my IDE, which also performs quite well, was JProfiler 4.0 (integrated with IntelliJ IDEA). The cheapest (free) and most barebones one that worked was JIP-1.0.7. I also think JIP was more accurate for methods that get inlined at runtime, but the two showed basically identical results. JIP is $499 cheaper, though it doesn't have a nice runtime graphical display of the progress. Another advantage of JIP was that programs execute about twice as fast under it as under JProfiler. Looks like they need an IntelliJ IDEA plugin :)
The starting point for the poker engine was written using the Java collection classes and leveraged them quite a bit to make things clean and easy to understand. I knew at the time, though, that there were probably many optimizations that could be done, either with custom collections or by using arrays where appropriate. So our base benchmarks look like this (best of 3):

[ java -jar bench.jar 1]
bench-3687.jar
| Environment | Threads | Games / second | Ranks / second |
| Mac OS X, 1.5 client VM | 1 | 11093 | 130975 |
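For readers who want to reproduce this kind of number, here is a minimal sketch of a "best of 3" throughput measurement. It is not the code inside bench.jar. The class name `BestOfThree`, the `measure` and `bestOf` helpers, and the stand-in workload are all hypothetical, and a real run would call into the poker library where the stand-in loop is.

```java
public class BestOfThree {
    // Written by the workload so the JIT cannot throw the computation away.
    static volatile long sink;

    /** Runs the workload repeatedly for about durationMillis and returns runs per second. */
    static double measure(Runnable workload, long durationMillis) {
        long start = System.nanoTime();
        long deadline = start + durationMillis * 1000000L;
        long runs = 0;
        do {
            workload.run();
            runs++;
        } while (System.nanoTime() < deadline);
        double seconds = (System.nanoTime() - start) / 1e9;
        return runs / seconds;
    }

    /** Repeats the measurement and keeps the best (highest) rate. */
    static double bestOf(int attempts, Runnable workload, long durationMillis) {
        double best = 0.0;
        for (int i = 0; i < attempts; i++) {
            best = Math.max(best, measure(workload, durationMillis));
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for "play one game" or "rank one hand".
        Runnable standIn = new Runnable() {
            public void run() {
                long h = 17;
                for (int i = 0; i < 1000; i++) {
                    h = h * 31 + i;
                }
                sink = h;
            }
        };
        System.out.println("best of 3: " + (long) bestOf(3, standIn, 500) + " runs / second");
    }
}
```

Taking the best of several attempts also means the earlier attempts act as warm-up for the JIT compiler, which matters a lot for short benchmarks like this one.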
So what do the profiles show? It turns out that when you use the collections libraries, even the ones without concurrency and even when you choose implementations carefully, you still end up spending tons of time within them rather than doing the real work of your program. That is especially true for something as data intensive as this application. I spent a couple of hours painstakingly moving the collections usage over to arrays in all the hotspots that I found in the code. One consequence is that I found a few bugs and added a few new tests to the system, so it was a very useful exercise even apart from the performance optimizations. Making these changes, without changing the interface to the library (which was quite simple), netted us quite a profit:

[ java -jar bench.jar 1]
bench-3785.jar
| Environment | Threads | Games / second | Ranks / second |
| Mac OS X, 1.5 client VM | 1 | 26525 | 389610 |
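To give a flavor of the kind of change involved, here is a small sketch that counts how many cards of each rank are in a hand, once with a `HashMap` and once with a plain `int[]`. This is not code from the poker engine. The `RankCounts` class and the card encoding (cards numbered 0 to 51, rank = card % 13) are assumptions made just for this illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class RankCounts {
    // Assumed encoding for this sketch only: cards are 0..51 and rank = card % 13.

    /** Collection version: boxes every rank and count, and hashes on every update. */
    static Map<Integer, Integer> countWithMap(int[] cards) {
        Map<Integer, Integer> counts = new HashMap<Integer, Integer>();
        for (int card : cards) {
            Integer rank = card % 13;
            Integer old = counts.get(rank);
            counts.put(rank, old == null ? 1 : old + 1);
        }
        return counts;
    }

    /** Array version: one primitive array, no boxing, no hashing. */
    static int[] countWithArray(int[] cards) {
        int[] counts = new int[13];
        for (int card : cards) {
            counts[card % 13]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Two hole cards plus a five card board, in the assumed encoding.
        int[] sevenCards = {0, 13, 5, 20, 51, 7, 33};
        int[] counts = countWithArray(sevenCards);
        Map<Integer, Integer> mapCounts = countWithMap(sevenCards);
        for (int rank = 0; rank < counts.length; rank++) {
            if (counts[rank] > 0) {
                System.out.println("rank " + rank + ": array=" + counts[rank]
                        + " map=" + mapCounts.get(rank));
            }
        }
    }
}
```

Both methods compute the same counts. The array version avoids allocating `Integer` objects and map entries in the inner loop, and that is the sort of overhead that tends to show up in a profile when the loop runs hundreds of thousands of times per second.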
You'll notice that we are executing these benchmarks with the absolute defaults as far as tuning the Java VM goes. It turns out that tuning the VM is absolutely critical with Sun's VM if you want the best performance, and it isn't a small difference either. It is very easy to get into pathological GC conditions where you are very close to the memory limit: the VM does not increase the heap size but instead drastically increases the frequency of collections, causing the performance of the benchmark to plummet. I have even seen conditions where it is nearly at a standstill. For this benchmark we find that increasing the minimum heap size well above this GC pathology threshold helps tremendously, as does using the server VM, so the following benchmarks use:

[ java -server -Xmx256m -Xms256m -jar bench 1/2/4]
| Environment | Threads | Games / second | Ranks / second |
| Mac OS X, 1.5 server VM | 1 | 52714 | 696055 |
| Mac OS X, 1.5 server VM | 2 | 90753 | 1264974 |
| Mac OS X, 1.5 server VM | 4 | 120054 | 2238467 |
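If you want to check whether a run is sliding into that kind of GC trouble, one option is to ask the VM itself how big the heap is and how many collections it has done. The sketch below uses the standard `Runtime` and `java.lang.management` APIs. The `HeapReport` class and the garbage-producing loop are hypothetical stand-ins and are not part of the benchmark.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class HeapReport {
    /** Sums the collection counts reported by every garbage collector in this VM. */
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 when the collector does not report it
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    /** Describes the committed and maximum heap, which reflect the -Xms / -Xmx settings. */
    static String describeHeap() {
        Runtime rt = Runtime.getRuntime();
        long mb = 1024L * 1024L;
        return "heap committed=" + (rt.totalMemory() / mb) + "MB, max="
                + (rt.maxMemory() / mb) + "MB";
    }

    public static void main(String[] args) {
        System.out.println(describeHeap());
        long before = totalCollections();

        // Hypothetical stand-in workload that produces short-lived garbage.
        long checksum = 0;
        for (int i = 0; i < 200000; i++) {
            int[] scratch = new int[256];
            scratch[i % 256] = i;
            checksum += scratch[i % 256];
        }

        System.out.println("checksum=" + checksum);
        System.out.println("collections during workload=" + (totalCollections() - before));
    }
}
```

Running it once with the defaults and once with `-server -Xmx256m -Xms256m` should show the committed heap change. A collection count that climbs quickly while throughput drops is a hint that you are in the situation described above. The `-verbose:gc` flag is another way to watch collections as they happen.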
I've also done these benchmarks on a full suite of systems. Unfortunately, I'm not at liberty to say which VM performed the best (the aqua line) due to an NDA I've signed with a large computer company, but as you can see, there is widely varying behavior from the various JVMs.

As it turns out, the current set of Java VMs still cannot completely self-tune, especially when it comes to choosing the amount of memory they should allocate for the best performance. Certainly more innovation around self-tuning has been done in the 1.6 and JRockit VMs, but I believe, based on my limited results, that there is still a lot of room for improvement. The other takeaway is that the newer processors, even with approximately the same clock rate, have much better performance characteristics and scale far better than their predecessors. Finally, it appears that Mac OS X crushes Windows for running Java applications on the same hardware, especially when running heavily multithreaded applications. Of course, as with any benchmark, this is only applicable to applications quite similar to the poker engine. Other applications may behave differently as the different strengths of the systems are exercised.

Here is the current version of the Poker Engine under an attribution, non-commercial use Creative Commons license: PokerEngine.zip
Posted: Sat - March 17, 2007 at 11:11 AM
Published On: Aug 27, 2007 05:57 PM