Fast find_cat

I have had my nose down for the last couple of days, coding. A while back, I reported on fast conversion of Debian/Ubuntu package database information to Puppy-db-format:
http://bkhome.org/blog2/?viewDetailed=00028

The application 'debdb2pupdb' is called in Woof, and also from PPM when updating the repo databases.

There is another utility, 'find_cat', that is mentioned in the above post. It is called by the other database conversions in 0setup, for example for Slackware. For Debian/Ubuntu, 'find_cat' was migrated inside 'debdb2pupdb' for speed.

Forum member noryb009 took a good look at find_cat.c and rewrote it to work in a different way, such that it no longer has to be inside debdb2pupdb, and for all distros (Debian, Ubuntu, Slackware, etc.) find_cat now works fast. Noryb009 also modified '0setup'.

One problem with noryb009's changes is that he changed the format of /usr/local/petget/categories.dat, to suit easy reading by his find_cat.c.
However, categories.dat is intended to be included into shell scripts if required, and in fact is already read by /usr/local/petget/installpkg.sh.
So, the format of categories.dat must be as it currently is -- instead, I have modified it slightly to be more generic, and added keyword lists to aid in assigning a category (these were previously hard-coded inside find_cat).
Here is the latest categories.dat:
http://bkhome.org/fossil/woof2.cgi/finfo?name=woof-code/rootfs-skeleton/usr/local/petget/categories.dat

I decided to rewrite find_cat from scratch, in BaCon, reading the new-format categories.dat and incorporating noryb009's idea for faster conversion by reading the entire pup-db file (or reading from stdin) in one run. Good, it processes all of the Ubuntu 'main' repo pup-db file in 16 seconds on my laptop.
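
To illustrate the whole-file idea, here is a rough sketch in Python (not BaCon, and not the actual find_cat code). It loads the name lists once, then streams every pup-db line through a single lookup table. The field positions, the '|' separator handling, and the sample category names are assumptions made for the example; the real lists live in categories.dat.

```python
import sys

# Hypothetical stand-in for the name lists kept in categories.dat.
CATEGORY_NAMES = {
    "Multimedia": ["amarok", "vlc"],
    "Document": ["abiword"],
}

NAME_FIELD = 1      # assumed position of the generic package name
CATEGORY_FIELD = 4  # assumed position of the category field


def build_lookup(category_names):
    """Invert {category: [names]} into {name: category}, built once per run."""
    lookup = {}
    for category, names in category_names.items():
        for name in names:
            lookup[name] = category
    return lookup


def assign_categories(lines, lookup):
    """Yield each '|'-separated line with its category field filled in.

    Lines whose name is not in the lookup keep whatever category they had.
    """
    for line in lines:
        fields = line.rstrip("\n").split("|")
        if len(fields) > CATEGORY_FIELD:
            current = fields[CATEGORY_FIELD]
            fields[CATEGORY_FIELD] = lookup.get(fields[NAME_FIELD], current)
        yield "|".join(fields)


def main():
    """Filter mode: read the whole database from stdin, write it to stdout."""
    lookup = build_lookup(CATEGORY_NAMES)
    for out_line in assign_categories(sys.stdin, lookup):
        print(out_line)
```

The point of the design is that the start-up cost (launching the program, parsing the category lists) is paid once for the whole database rather than once per package.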

Here is my new find_cat.bac:
http://bkhome.org/fossil/woof2.cgi/finfo?name=woof-code/support/find_cat.bac

...the find_cat binary is in Woof at support/find_cat and in a running Puppy at /usr/local/petget/find_cat.

I took a cautious approach to modifying '0setup', making only basic changes to support the faster mode of find_cat, plus noryb009's changes for Arch Linux.
Note, some other distros, such as Scientific and Mageia, are not yet using find_cat -- in the case of Mageia, a script 'mageia2ppm' performs the equivalent of 'debdb2pupdb' and 'find_cat', although its equivalent of find_cat is primitive.

Woof commit:
http://bkhome.org/fossil/woof2.cgi/info/0da19912b4


Posted on 26 Jan 2013, 18:01


Comments:

Posted on 26 Jan 2013, 18:21 by BarryK
categories.dat
A note about 'categories.dat'. Most of the entries in this file were populated thanks to work done by L18L:
http://bkhome.org/blog2/?viewDetailed=00029

If anyone else wants to add application names into it, please do!

But be sure to get the latest 'categories.dat' first, see the link in the post above.

When inserting application names, note that the lists are in alphabetical order.

All application names are lower-case.

All application names are the generic names, not a distro-specific name. What I mean by this is that some distros (Debian/Ubuntu, Mageia) split packages up into two or more package files -- for example 'amarok' is the generic, or proper, name for a media player, but Debian/Ubuntu have package names 'amarok', 'amarok-common', 'amarok-help-*', 'amarok-utils' -- notice, one of those matches the generic name, but that is not always the case.
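
To show why generic names are enough, here is one possible way a split package name could be matched back to its generic name, sketched in Python. This is only an illustration, not necessarily how find_cat does it: the function name 'generic_name' and the "longest hyphen-separated prefix" rule are my assumptions.

```python
def generic_name(package_name, known_names):
    """Return the longest hyphen-separated prefix found in known_names.

    package_name is lower-cased first, since the lists are all lower-case.
    Returns None when no prefix matches.
    """
    parts = package_name.lower().split("-")
    # Try the full name first, then drop one trailing component at a time.
    for end in range(len(parts), 0, -1):
        candidate = "-".join(parts[:end])
        if candidate in known_names:
            return candidate
    return None


# Hypothetical sample of generic names, as they might appear in a list.
known = {"amarok", "gnome-terminal"}

for pkg in ["amarok", "amarok-common", "amarok-help-en", "amarok-utils"]:
    print(pkg, "->", generic_name(pkg, known))
```

With a rule like this, only 'amarok' needs to be in the list for all of the split packages to land in the same category.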