Inviting you to project “PackageMap” 2009-06-12

Quick (re-)introduction:  My task for Gentoo/Google Summer of Code 2009 is to give Gentoo a Debian popcon equivalent, a tool to collect statistics on “what package is installed how often”.  To achieve this goal I’m extending Smolt (a tool that currently does similar things with hardware information) with fine-tunable gathering of software statistics.

The plan we have for Smolt is to make it cross-distro, not just fit Gentoo or Fedora.  The consequences and benefits of such an approach show most clearly when counting packages from different distros into the same buckets.

What do I mean by that?  Debian’s Git counts the same as Gentoo’s Git, which counts the same as Fedora’s; you know the list.  With packages counted across distros we can suddenly answer questions that we currently cannot answer, among them:

  • What globally popular packages are missing in distro X? Let’s say we don’t have a package for product P. Do other distros have one? If they do, maybe we need one, too.  If they don’t, maybe P is not that important.
  • Approximately how many users run program X in total? Not just on Ubuntu or Arch – all across Linux, BSD, Solaris!
  • Does distro X have 10 times the packages of Y or is it just different splitting?

To count into the same bucket we use global identifiers for the “products” that fall out of a package.  Gentoo package “dev-util/git” can produce product “cpe://a:git:git”, and Debian’s “git-core” can, too.  The string “cpe://a:git:git” is a CPE (Common Platform Enumeration) name, a concept close to package naming in Java.  This “intermediate language” allows us to relate package names from distro X to those of distro Y and answer various questions from that data.
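To make the bucket idea concrete, here is a minimal Python sketch.  It is only an illustration: the lookup table, the helper name count_products and all install counts are made up; only the two Git package names and the product string come from the paragraph above.

```python
from collections import Counter

# Hypothetical lookup table: (distro, package name) -> product identifier.
# Only these two Git entries are taken from the post.
PACKAGE_TO_PRODUCT = {
    ("gentoo", "dev-util/git"): "cpe://a:git:git",
    ("debian", "git-core"): "cpe://a:git:git",
}


def count_products(installs):
    """Sum per-distro install counts into per-product buckets.

    installs: iterable of (distro, package, count) tuples.
    Packages without a known mapping are skipped.
    """
    buckets = Counter()
    for distro, package, count in installs:
        product = PACKAGE_TO_PRODUCT.get((distro, package))
        if product is not None:
            buckets[product] += count
    return buckets


# Made-up numbers, purely for illustration.
sample = [
    ("gentoo", "dev-util/git", 120),
    ("debian", "git-core", 300),
    ("debian", "some-unmapped-package", 5),
]
totals = count_products(sample)
print(totals["cpe://a:git:git"])  # 420
```

Both Git packages land in the same bucket, while the unmapped package is simply left out of the totals.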

To do such mapping we need code (or a “service”) that does the mapping for us and a base of collected data that the service can operate on.  Together, these two make up project “PackageMap”.
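As a rough sketch of how such a service could sit on top of the collected data, consider the Python snippet below.  It does not reflect PackageMap’s actual storage format or code: the in-memory dictionary, the function names translate and missing_in, and the product “cpe://a:example:tool” are all assumptions made for this example.

```python
# Hypothetical data base: product identifier -> {distro: package name}.
# The Git entry uses names from the post; the second entry is invented.
DATABASE = {
    "cpe://a:git:git": {"gentoo": "dev-util/git", "debian": "git-core"},
    "cpe://a:example:tool": {"debian": "example-tool"},
}


def translate(package, source, target, database=DATABASE):
    """Return the target distro's package names for a source distro package."""
    results = []
    for names in database.values():
        if names.get(source) == package and target in names:
            results.append(names[target])
    return results


def missing_in(distro, database=DATABASE):
    """List products that some distro packages but `distro` does not."""
    return sorted(
        product for product, names in database.items() if distro not in names
    )


print(translate("git-core", "debian", "gentoo"))
print(missing_in("gentoo"))
```

The first call relates a Debian package name to its Gentoo counterpart through the shared product; the second lists products that the data knows about but that have no Gentoo package.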

I have started populating the database with packages (currently 312 in number) made from information extracted from the Gentoo tree and the National Vulnerability Database.  The latter holds many CPEs.  Let me state clearly that PackageMap is not about Gentoo in particular.  Sure, the initial data has lots of Gentoo in it, but the whole point of the project is to get information and people from different distros together.

To see what these 312 package maps look like at the moment, it is best to click through the database folder yourself:
http://git.goodpoint.de/?p=packagemap.git;a=tree;f=database

Also, there are a RELAX NG schema and a DTD for validation, more documentation than I usually write, and a few scripts:
http://git.goodpoint.de/?p=packagemap.git;a=tree

By now I hope you have gained interest in what this can become.
Your active participation is highly appreciated.
A few minutes from everyone can make a huge difference here.
If you want write access to the repo – mail me: sebastian@pipping.org.

Please have a look at the Git repository and ask questions.

Thanks for reading up to this point.

PS: I’m aware “hartwork.org” might not make a good long-term location for DTDs, XML namespaces and such for a cross-distro project.  Any ideas where best to put them?

Creative Commons License
The Inviting you to project “PackageMap” by Sebastian Pipping, unless otherwise expressly stated, is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.

7 Comments
Scott Shawcroft June 12th, 2009

Sebastian,
This is a problem I have also. I’m doing analysis of the time it takes distributions to add new versions of a particular package into their repos. I have a bit of data for this but intend on generating much more. We should coordinate. Check out the research website oswatershed.org.
~Scott

James Le Cuirot June 12th, 2009

This may be obvious but I want to say just in case. Make sure there’s an easy option to temporarily turn this off because you wouldn’t want developers skewing the results when creating and testing ebuilds.

Fabio Erculiani June 12th, 2009

You may want to look at the Entropy project, in particular at its user generated content framework and RPC infrastructure.

http://gitweb.sabayon.org/?p=entropy.git;a=summary

Warbo June 12th, 2009

If this is taken up across distros then it will DEFINITELY be a good thing! I’ve made a few packaging tools, and whilst the formats for Deb, RPM, etc. may be formalised and thus relatively easy to convert between (like alien does), it is currently impossible to preserve any dependencies, recommendations, ‘provides’, conflicts, etc. simply due to the diversity of package splitting and naming between distros. The only way to convert between them would be for a database to map equivalents between distros, which is a large undertaking. It looks like PackageMap could do that, if it gets widely accepted.

Thank you :)

[...] introducing on code level. After all one of the long term goals is to feed Smolt statistics with PackageMapped installation data from across distros, not just Gentoo. So if you read this and feel like bringing [...]

Silvio Cesare January 19th, 2011

I have been automatically generating equivalencies between packages in Linux distributions. A post I made to Debian-devel which in a response referenced PackageMap – http://lists.debian.org/debian-devel/2011/01/msg00646.html

Would you find such a list of equivalent packages useful?


Silvio Cesare

[...] though there are several projects that might help in automating this, including distromatch, PackageMap, and whohas.) And it should allow the user to customize the build as much as the source formats do, [...]
