Rants about Java and other internet technologies by Sam Pullara

Time Machine vs. ZFS + rsync

Update: I actually got the fslogger thing at the end of this entry working so I can do incremental backups. Not really a product yet but it isn’t hard to do. Here is the super rough version of it.

I can’t stand inefficiency. Time Machine is fundamentally a very inefficient mechanism for backing up large files that change. It is so bad, in fact, that products like Parallels and VMware disable backups of your disk images. Here is the basic algorithm:
Continue Reading »

Yahoo! Application Platform (YAP) and Yahoo! Query Language (YQL) launch today

Since I moved into the platform group at the beginning of the year I had worked with the YAP and YQL teams to help them define their strategy and direction but without being part of the day-to-day operations. In August, the head of the Y!OS project asked me to step in to take them through their final run to launch. It has been a great couple of months working with the teams. They both had an amazing showing at Hack Day and now today we are launching the platforms worldwide.
Continue Reading »

2008 Olympic Medal Counts by Population

There are obviously a lot of ways to measure how well a country did at the Olympics. This post takes the view that we should look at how many people the country had to draw on in order to send its athletes to China to compete. There are a lot of problems with this, including expats competing for their home country, the vast disparity in wealth between countries and the differing levels of interest in the Olympic games across cultures. One of the things that jumps out immediately is that island nations that draw on a larger related population do very well in the games. They have likely inherited the interest in the competition and are also wealthy enough to train and compete in the games.
Continue Reading »

Yuil is dead! 4hoursearch is now online.

As this was really just a demonstration of the power of Yahoo! BOSS, I have brought the site back as a demonstration site. Additionally, Yahoo! is making the source code to the new site available so anyone with a knack for Python, HTML and CSS can take a swipe at making a better search experience.  In order to make a nice UI I teamed up with another Sam, Sam Lind.  I put together the skeleton using Yahoo!’s amazing YUI tools and he created the look and feel.  Please try it out and take advantage of Yahoo!’s open search API:
Continue Reading »

Yahoo! BOSS is easy — meet Yuil

Updated Yet Again: Relaunched as 4hoursearch including the source code. See this blog entry.

Updated Again: Yuil is dead. However, you can always get the same great search results here.

Updated: Using Glue I was able to add some simple category functionality.

I’m sure everyone saw the recent announcement of a new search engine, Cuil. I thought I would have a little fun with it and put together a quick parody of it by mashing up their UI and Yahoo!’s search results. As usual, the biggest problems I had were related to my pathetic Python skills. I’d love to add the category stuff in (Yahoo! has that info as you can see in search assist) but BOSS doesn’t yet have that in the API.  But it does have web and image search and even search suggestions. Here is the one, the only, the amazing:
Continue Reading »

Better Javadoc results using SearchMonkey

When you are searching for things like java.util.HashMap one of the issues that you run into is that it will give you the result with the highest rank which more often than not is the 1.4.2 version of the documentation.  I’ve moved on from that version of Java and would much rather see results for version 6.  I actually did this plugin back in December for the first SearchMonkey hackday and won “most useful” as it could be extended to any type of versioned documentation you might find on the web.  Today I’ll also include my plugin for MySQL but I’ll use Java as the example.
Continue Reading »

Idiomatic Python?

I’ve been working my way through compiling Java into Python code but my own Python back end (my brain) isn’t that good. I would call my stage of Python development the “magic incantation” stage. This is the stage where you really aren’t comfortable yet with the way things work in a new language but you can still get things done by miming other developers. I’ve also had some help from some friends on Twitter: @lhl, @precipice and @jkwatson.  My distributed information system is now getting some redundancy.  Little did they know that I was doing parallel invocations of identical requests for reliability and incrementally higher performance, with the results verified using a quorum of responders.
Continue Reading »

Tivo targeted advertising



Tivo targeted advertising, originally uploaded by Sam Pullara.

This looks like it might be both effective and also something that TV advertisers would like to buy.

Using Google App Engine to Extend Yahoo! Pipes

Update: A commenter pointed out that you can

from django.utils import simplejson

instead of including it. Makes this even easier.

Yahoo! Pipes has always been a great tool for manipulating data but often you have to go through great contortions to get it to do what you want because of its very simple data flow programming model.  Google’s App Engine opens up the possibility of extending Yahoo! Pipes in very interesting ways through Pipes’ Web Service operator.  Currently this operator sees little use because it requires you to run an external server somewhere on the internet that is always available for the Pipe’s execution, which is quite a high barrier to entry for the typical Pipes developer. Here is what a Pipe that uses the Web Service operator looks like, using our example pipe:

Web Service Pipes Example 

With the launch of Google App Engine there is now a very simple way to get code up on the internet quickly in order to include arbitrary processing in the interior of your Pipes.

To demonstrate how this works, let’s first build a very simple web service that mirrors the data it receives from Pipes.  If you don’t have a Google App Engine account you can still follow along by downloading the SDK and running everything locally, though your machine will have to be accessible from the public internet if you want Pipes to send you requests.

First create a new application directory:

mkdir pipes-mirror
cd pipes-mirror 

Now create an application descriptor called app.yaml:

application: javarants
version: 1
runtime: python
api_version: 1

handlers:
- url: /.*
  script: pipes.py

This application descriptor basically tells Google how to deploy your application. Your application name should match an application name that you create within the GAE administration tool:

Application Name

Now we need to process the data coming from Pipes. Pipes is going to pass this web service some data in JSON format and we need to parse it. GAE doesn’t include ‘simplejson’ in the Python container so you are going to have to include it with your application. I downloaded simplejson-1.8.1 and symbolically linked its simplejson directory into my application directory. When the request comes in, the JSON data will be in the ‘data’ parameter, so in pipes.py we pull it out, parse it, grab the items array and write it back over the wire:

import simplejson
import wsgiref.handlers

from google.appengine.ext import webapp


class MirrorPipesWebService(webapp.RequestHandler):
    def post(self):
        # Pipes sends its JSON payload in the "data" form parameter
        data = self.request.get("data")
        obj = simplejson.loads(data)
        # Mirror back just the items array
        items = obj["items"]
        # webapp sets response headers through the headers mapping
        self.response.headers["Content-Type"] = "application/json"
        simplejson.dump(items, self.response.out)


def main():
    application = webapp.WSGIApplication([('/mirror', MirrorPipesWebService)],
                                         debug=True)
    wsgiref.handlers.CGIHandler().run(application)


if __name__ == "__main__":
    main()

Now you should have a directory structure that looks a lot like this:

-rw-r--r--@ 1 sam  sam  106 Apr 13 18:55 app.yaml
-rw-r--r--  1 sam  sam  559 Apr 13 19:28 pipes.py
lrwxr-xr-x  1 sam  sam   47 Apr 13 17:40 simplejson -> /Users/sam/Software/simplejson-1.8.1/simplejson

Now that we have all the pieces we can deploy the application to GAE with a simple command from the GAE SDK:

appcfg.py update .

At this point you should be able to replace my web service URL that you find in my example Pipe with your application URL which will be

http://[application name].appspot.com/mirror

and get the same results as mine.
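
Before pointing a Pipe at the service, it can help to check the mirroring logic on its own. The following is only a rough sketch, not part of the deployed application: it uses the Python 3 standard library `json` and `urllib.parse` modules instead of simplejson and webapp, and the helper names `build_pipes_body` and `mirror` are made up for illustration. The only things it assumes about the request are what is described above: a form-encoded `data` parameter holding JSON with an `items` array.

```python
import json
from urllib.parse import parse_qs, urlencode


def build_pipes_body(items):
    """Form-encode a request body with the JSON payload in a 'data' field.

    Hypothetical helper: it imitates the request shape described in this
    post, not a documented Pipes wire format.
    """
    return urlencode({"data": json.dumps({"items": items})})


def mirror(body):
    """Do what the handler's post() does: read 'data', parse it, return items as JSON."""
    data = parse_qs(body)["data"][0]
    return json.dumps(json.loads(data)["items"])


if __name__ == "__main__":
    sample = [{"title": "First entry"}, {"title": "Second entry"}]
    print(mirror(build_pipes_body(sample)))
```

If the printed JSON matches the items you passed in, the parsing side of the handler is doing what the Pipe expects.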

What kind of uses can you put this great power to? I currently run a web service on my own server that combines RSS entries from the same day into a single entry. I will likely port that to GAE as it doesn’t require a lot of CPU and it is a pain having to administer it. In fact, most of the functionality that you see in a service like FeedBurner would be easy to build on top of this framework. More exotic use cases can be found on Y! Pipes itself where at least one person uses web services to pass in photo URLs and return the coordinates of human faces in the images.
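
As a rough idea of what the same-day combining step could look like, here is a minimal sketch. It is not the code running on my server: the function name `combine_by_day` is hypothetical, and it assumes each item carries a `title` and an RFC 822 style `pubDate` string, which may not match the fields your feed actually delivers. It is written against the Python 3 standard library.

```python
from collections import OrderedDict
from email.utils import parsedate_to_datetime


def combine_by_day(items):
    """Merge items that share a calendar day into one entry per day.

    Assumed item shape (hypothetical): {"title": ..., "pubDate": ...}
    where pubDate is an RFC 822 date string.
    """
    days = OrderedDict()
    for item in items:
        # The day is taken in the timezone given in the date string itself
        day = parsedate_to_datetime(item["pubDate"]).date().isoformat()
        days.setdefault(day, []).append(item)

    combined = []
    for day, entries in days.items():
        combined.append({
            "title": "Entries for %s" % day,
            # One line per original entry, in the order they arrived
            "description": "\n".join(entry["title"] for entry in entries),
        })
    return combined
```

Dropped into a handler like the mirror one, this would take the parsed items array as input and its return value would be what gets written back to Pipes.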

JPA 2.0 with Criteria

(see: JSR 317 Persistently Improving)

I love the idea of adding a criteria API to JPA. The only thing I hope they do differently than Hibernate is to make that API work in addition to string queries, so the two can be combined on the same query.  In Gauntlet we had issues where we wanted to use EJB-QL for selecting the right data and then a criteria-like API for applying security and filtering constraints on the query.  We ended up writing a criteria-like API that augmented the WHERE clause of the query to get the behavior that we needed (as described here).  For example, you could do this:

Query q = em.createQuery("SELECT p FROM Project p");
q.addExpression(Expression.notEqual("id", 2));

Or something like that. This would give you the best of both worlds, where you have the expressiveness of the textual query and the ability to further hone that query programmatically.
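
To make the WHERE-clause augmentation idea concrete, here is a toy sketch in Python (to match the other code on this blog). It is not the Gauntlet implementation and not a JPA API: `add_expression` is a made-up helper that works on plain query strings and assumes a flat query with no subqueries and no ORDER BY or GROUP BY after the WHERE clause.

```python
import re


def add_expression(query, expression):
    """Return the query with an extra condition ANDed onto its WHERE clause.

    Hypothetical helper for illustration only; it does naive string
    handling and does not parse the query language.
    """
    parts = re.split(r"\s+WHERE\s+", query, maxsplit=1, flags=re.IGNORECASE)
    if len(parts) == 2:
        head, condition = parts
        # Parenthesize both sides so an existing OR keeps its meaning
        return "%s WHERE (%s) AND (%s)" % (head, condition, expression)
    # No WHERE clause yet, so start one
    return "%s WHERE %s" % (query, expression)


if __name__ == "__main__":
    print(add_expression("SELECT p FROM Project p", "p.id <> 2"))
```

The point is only that the textual query stays as the author wrote it while security or filtering constraints are layered on afterwards.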
