Quick (re-)introduction: My task for Gentoo/Google Summer of Code 2009 is to give Gentoo a Debian popcon equivalent, a tool to collect statistics on "what package is installed how often". To achieve this goal I'm extending Smolt (a tool currently doing similar things with hardware information) with fine-tunable gathering of software statistics. The plan we have for Smolt is to make it cross-distro, not just a fit for Gentoo or Fedora. One place where the consequences and benefits of such an approach can be seen clearly is counting packages from different distros into the same buckets.
What do I mean by that? An installation of Debian's Git counts the same as one of Gentoo's Git or Fedora's, you know the list. With packages counted across distros we can suddenly answer questions that we currently cannot answer, among them:
What globally popular packages are missing in distro X? Let's say we don't have a package for product P. Do other distros have one? If they do, maybe we need one, too. If they don't, maybe P is not that important then.
How many Linux users are approximately using program X in total? Not just on Ubuntu or Arch - all across Linux, BSD, Solaris!
Does distro X have 10 times the packages of Y or is it just different splitting?
To count into the same bucket we use global identifiers for the "products" that fall out of a package. The Gentoo package "dev-util/git" can produce the product "cpe://a:git:git", and so can Debian's "git-core". That string is a CPE name, a concept close to package naming in Java. This "intermediate language" allows us to relate package names from distro X to those of distro Y and to answer various questions from that data.
To do such mapping we need code (or a "service") that does the mapping for us and a base of collected data that the service can operate on. Together, these two make up the project "PackageMap". I have started populating the database with packages (currently 312 in number) made from information extracted from the Gentoo tree and the National Vulnerability Database. The latter holds many CPEs.
Let me state clearly that PackageMap is not about Gentoo in particular. Sure, the initial data has lots of Gentoo in it, but the whole point of the project is to get information and people from different distros together.
To see what these 312 package maps look like at the moment, it is best to click through the database folder yourself: http://git.goodpoint.de/?p=packagemap.git;a=tree;f=database
Also, there are a Relax NG schema and a DTD for validation, more documentation than I usually write, and a few scripts: http://git.goodpoint.de/?p=packagemap.git;a=tree
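To make the bucket idea a bit more concrete, here is a minimal Python sketch of counting into shared buckets. It is not code from PackageMap or Smolt: the in-memory table, the function names `count_products` and `products_missing_in`, and the shape of the input reports are all assumptions made for illustration. Only the two Git entries and the CPE string are taken from the text above.

```python
from collections import Counter

# Hypothetical mapping table: (distro, package name) -> product identifier.
# The real PackageMap database is a set of files in the Git repository;
# this dict is only a stand-in for illustration.
PACKAGE_TO_PRODUCT = {
    ("gentoo", "dev-util/git"): "cpe://a:git:git",
    ("debian", "git-core"): "cpe://a:git:git",
}


def count_products(reports, mapping=PACKAGE_TO_PRODUCT):
    """Count installations per product across distros.

    `reports` is an iterable of (distro, package) pairs, one pair per
    installed package per reporting machine (an assumed input format).
    Returns (buckets, unmapped): two Counters, the first keyed by
    product identifier, the second by the pairs that have no mapping yet.
    """
    buckets = Counter()
    unmapped = Counter()
    for distro, package in reports:
        product = mapping.get((distro, package))
        if product is None:
            unmapped[(distro, package)] += 1
        else:
            buckets[product] += 1
    return buckets, unmapped


def products_missing_in(distro, mapping=PACKAGE_TO_PRODUCT):
    """Return products known from other distros but not mapped for `distro`."""
    all_products = set(mapping.values())
    covered = {prod for (d, _pkg), prod in mapping.items() if d == distro}
    return all_products - covered


if __name__ == "__main__":
    reports = [
        ("gentoo", "dev-util/git"),
        ("debian", "git-core"),
        ("debian", "git-core"),
    ]
    buckets, unmapped = count_products(reports)
    print(buckets)
    print(products_missing_in("fedora"))
```

With such a table, the Gentoo and Debian installations land in one bucket, and a distro that has no entry for a product shows up as missing it. That answer is only as good as the mapping data, which is why the collected database matters as much as the code.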
By now I hope you have gained interest in what this can become. Your active participation is highly appreciated. A few minutes from everyone can make a huge difference here. If you want write access to the repo - mail me: email@example.com. Please have a look at the Git repository and ask questions. Thanks for reading up to this point.
PS: I'm aware "hartwork.org" might not make a good long-term location for DTDs, XML namespaces and such for a cross-distro project. Any ideas where best to put them?