The laundry list of repositories that are filling the POM in Maven projects has to go. The ideal of having a central store of all artifacts is clearly dead and we have to move on. My proposal is twofold. First, a repository to end all repositories: one that doesn’t actually store the files but simply redirects to a known good location for the group id / artifact id / version combination. Second, we create an ad hoc standard for artifact discovery based on the group id. For example, if the group id is com.sampullara.cli-parser then you should be able to find a repository that stores all artifacts at http://cli-parser.sampullara.com/repository/. Perhaps we could even put some sort of discovery file at the root to find them. This would allow anyone to distribute Maven artifacts to developers without having to publish to any central location. It would also drastically reduce the amount of repository cruft that is creeping into so many of the POM files I have seen recently. Further, it would make it really easy for GitHub, Google Code, Apache, etc. to help their developers publish automatically, with just the right naming conventions in place.
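To make the group id convention concrete, here is a minimal sketch in Python. The function name repository_url is hypothetical, and the rule it applies (reverse the dotted segments of the group id and append /repository/) is just one reading of the example above, not an existing Maven feature.

```python
def repository_url(group_id: str) -> str:
    """Map a group id to its conventional repository URL.

    Assumption: the host is the group id with its dotted segments
    reversed, and artifacts live under /repository/ on that host.
    """
    segments = group_id.split(".")
    if len(segments) < 2 or not all(segments):
        raise ValueError(f"not a usable group id: {group_id!r}")
    host = ".".join(reversed(segments))
    return f"http://{host}/repository/"


if __name__ == "__main__":
    print(repository_url("com.sampullara.cli-parser"))
```

A group id that does not follow the reversed-domain habit would need the discovery file or the redirecting repository as a fallback.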
Come to think of it, maybe the whole thing is a little suspect. Maybe we should have been using URLs the entire time. How much value are we getting out of the layer of indirection? This is intended to be a discussion…
It's a similar argument to dealing with Java packages. The “standard” was always to reverse your domain (e.g. com.foo.bar.Baz), but there was no guarantee that going to bar.foo.com would help you find the JAR containing Baz.class. Ultimately, you don't search for artifacts by group name and artifact name. Often what you want is to find the artifact that contains a particular class, because you found some code that uses com.foo.bar.Baz but you have no idea what artifact that's even a part of. It would be excellent if an ad hoc system for locating classes existed. Let some compiler tooling trawl through my code, notice I have a dependency on com.foo.bar.Baz, and make requests to http://bar.foo.com/artifacts/Baz.class and http://foo.com/artifacts/bar/Baz.class until it finds an artifact that contains that particular class. Maybe even have that resource issue a redirect to a JAR resource like http://foo.com/artifacts/bar.jar. Then the compiler can resolve all of the class dependencies to JAR redirects and download each unique JAR once.
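The lookup order described above could be sketched like this in Python. The function name candidate_urls is made up for illustration, and the decision to stop at a two-segment host (so that a bare "com" is never queried) is an assumption, as is the /artifacts/ path taken from the example URLs.

```python
from typing import List


def candidate_urls(class_name: str) -> List[str]:
    """List the URLs to probe for a fully qualified class name.

    Tries the most specific host first (the whole package reversed),
    then moves package segments from the host into the path.
    """
    *package, simple_name = class_name.split(".")
    urls = []
    # k is how many leading package segments form the host name.
    for k in range(len(package), 1, -1):
        host = ".".join(reversed(package[:k]))
        path = "/".join(package[k:] + [simple_name + ".class"])
        urls.append(f"http://{host}/artifacts/{path}")
    return urls


if __name__ == "__main__":
    for url in candidate_urls("com.foo.bar.Baz"):
        print(url)
```

Tooling would request each URL in turn and follow the first redirect it gets to a JAR.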
I don't know, maybe a tad too automated. But at the end of the day, I spend a lot of time searching in our artifact repository for classes, finding out what artifact it's in, adding the artifact to my ivy.xml file (we use Ivy, but it's a similar experience with Maven) and then letting the build system download the artifact. Why not just let me declare my dependency on particular classes and have the tools resolve those to JARs and download them?
I think it would be far easier if dependencies were simply specified as URLs and there were standard resource names under a root path.
For example, http://junit.org/releases/4.8.2 would be the base URL that you specify as a dependency, and the following resources would exist:
http://junit.org/releases/4.8.2/junit-4.8.2.jar
http://junit.org/releases/4.8.2/junit-4.8.2-src...
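As a rough sketch of how a tool might derive the JAR resource from such a base URL (Python; the function name jar_url is hypothetical, and it assumes the caller supplies the artifact name and that the last path segment of the base URL is the version):

```python
def jar_url(base_url: str, artifact: str) -> str:
    """Build the JAR URL for a dependency given as a base URL.

    Assumption: the base URL ends in the version, and the JAR is
    named <artifact>-<version>.jar directly under it.
    """
    base = base_url.rstrip("/")
    version = base.rsplit("/", 1)[-1]
    return f"{base}/{artifact}-{version}.jar"


if __name__ == "__main__":
    print(jar_url("http://junit.org/releases/4.8.2", "junit"))
```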
I love this idea! It certainly solves the resolution problem implicitly when you point to a URL.
I'm liking the use of URLs instead of dependency group/artifact/version tuples (particularly as projects often change their group/artifact ids or move stuff to different repos, which causes pain in POMs).
Then you could just run (on your local machine or on your WAN/LAN for your team) a regular caching http proxy to speed up mvn builds.
It is nice having a global mirrored repo as a cache in case sampullara.com is down for example – but central repos could just rsync known repos and act as a backup http proxy cache.
It'd be nice to be able to do imports using URIs too in code…
import http://cli-parser.sampullara.com/4.5
and the JVM could auto-download stuff on the fly…
As I've always argued, this is not a technical problem. Your argument is predicated on a supply of healthy Maven repositories, which is simply not the current reality. What you're proposing is not lost on us, however. It starts with having healthy Maven repositories around the world. Sonatype has made a serious effort to get Nexus Pro instances into all of the major OSS forges around the world, and we also host an open instance (http://oss.sonatype.org) that any project can use. The system can be more distributed for certain, but there are certain things that need to be done first. We have people working on this effort full-time.
Organizations that use Maven effectively manage their own switchboard by using a repository manager like Nexus. There's just a lot of inertia, and I would argue that some centralized form of store, switchboard, or discovery mechanism will always be desired to aggregate everything that exists into one manageable form.
I guess my question is why can't we also introduce an automatic discovery mechanism that doesn't require the use of software beyond web servers and proxies. As for the central mechanism, I think that a crawl + search model would work that doesn't actually store any artifacts.
The issue with relying on URL structure is that not all institutions / projects serve their own artifacts / control their domain / control their infrastructure / can sacrifice carving out a particular URL namespace like this. The indirection – whether people like it or not – serves a necessary purpose. As for the uber repository that supports redirection, that's what things like Nexus do.
For the record, the single best (as in complete, accurate, well maintained metadata) Maven / Ivy repo to date is the SpringSource repo, hands down. Try it. http://www.springsource.com/repository/app/