Fast conversion deb to pup db

This is something that has been on my to-do list for a long time. Woof has a script '0setup' that downloads the package database files and converts them to Puppy-standard format. In the case of Ubuntu (Precise Puppy), the bzipped files are about 10MB total, and much bigger after being expanded.

The '0setup' script takes a long time to convert them to Puppy format after they have been downloaded: somewhere in the region of half an hour as I vaguely recall, though I haven't timed it recently.

The same process happens for users of the Puppy Package Manager (PPM). In the 'Configure' window, there is a button to download and update the package database. Note, the script that does this in your running pup is at /usr/local/petget/0setup (called by the PPM).

I have written an application in BaCon, named 'debdb2pupdb', to speed up the conversion. On my laptop it takes 3 minutes and 40 seconds to convert all of the Ubuntu db files, including the 'precise-update' files.

This new application is at support/debdb2pupdb in Woof, and at /usr/local/petget/debdb2pupdb in a running Puppy -- or rather will be in future builds of Puppy.

The PPM calls '0setup', which in turn calls 'debdb2pupdb'. The latter calls 'find_cat' to determine the category of each package. Here is an example of a Puppy-format DB entry:

amarok_2.5.0|amarok|2.5.0|0ubuntu6|Multimedia;mediaplayer|26458K|pool/main/a/amarok|amarok_2.5.0-0ubuntu6_i386.deb|+amarok-common&eq2.5.0,+amarok-utils&eq2.5.0,+kde-runtime,+libc6&ge2.8,+libcurl3-gnutls&ge7.16.2-1,+libgcc1&ge4.1.1,+libgcrypt11&ge1.4.5,+libgdk-pixbuf2.0-0&ge2.22.0,+libgl1-mesa-glx,+libglib2.0-0&ge2.14.0,+libgpod4-nogtk&ge0.7.0,+libkcmutils4&ge4.5.86,+libkdecore5&ge4.5,+libkdeui5&ge4.5.2,+libkdewebkit5&ge4.5,+libkdnssd4&ge4.5,+libkfile4&ge4.5,+libkio5&ge4.5,+libknewstuff3-4&ge4.5,+liblastfm0&ge0.4.0~really0.3.3,+libloudmouth1-0&ge1.1.4-2,+libmtp9&ge1.1.0,+libmygpo-qt1&ge1.0.2,+libmysqlclient18&ge5.5.13-1,+libphonon4&ge4.7.0really4.3.80,+libplasma3&ge4.5.86,+libqjson0&ge0.7.1,+libqt4-dbus&ge4.6.1,+libqt4-network&ge4.5.3,+libqt4-opengl&ge4.5.3,+libqt4-script&ge4.5.3,+libqt4-sql&ge4.5.3,+libqt4-svg&ge4.5.3,+libqt4-xml&ge4.5.3,+libqtcore4&ge4.8.0,+libqtgui4&ge4.8.0,+libqtwebkit4&ge2.2~2011week36,+libsolid4&ge4.5,+libstdc++6&ge4.6,+libtag-extras1&ge1.0.0,+libtag1c2a&ge1.6.1,+libthreadweaver4&ge4.5,+libx11-6,+libxml2&ge2.7.4,+phonon,+zlib1g&ge1.2.0,+libqtscript4-core,+libqtscript4-gui,+libqtscript4-network,+libqtscript4-xml,+libqtscript4-sql,+libqtscript4-uitools|easy to use media player based on the KDE Platform|ubuntu|precise|
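
For anyone who wants to poke at entries like the one above, here is a minimal Python sketch that splits a line on '|'. The field names are labels I have assumed from the example for illustration, not official Puppy names, and the dependencies in the sample are abbreviated.

```python
# Field names are illustrative labels inferred from the example entry,
# not official Puppy names.
FIELDS = [
    "pkgname", "nameonly", "version", "pkgrelease", "category", "size",
    "path", "fullfilename", "dependencies", "description",
    "compileddistro", "compiledrelease",
]

def parse_entry(line):
    """Split one '|'-delimited DB line into a dict keyed by FIELDS."""
    parts = line.rstrip("\n").split("|")
    # The example entry ends with a trailing '|', which yields one empty
    # final element; pad or trim so there are exactly len(FIELDS) items.
    parts = (parts + [""] * len(FIELDS))[:len(FIELDS)]
    entry = dict(zip(FIELDS, parts))
    # The category may carry a sub-category after ';'.
    main, _, sub = entry["category"].partition(";")
    entry["main_category"] = main
    entry["sub_category"] = sub
    return entry

# Abbreviated copy of the amarok entry (dependency list shortened).
sample = ("amarok_2.5.0|amarok|2.5.0|0ubuntu6|Multimedia;mediaplayer|26458K|"
          "pool/main/a/amarok|amarok_2.5.0-0ubuntu6_i386.deb|"
          "+amarok-common&eq2.5.0,+kde-runtime|"
          "easy to use media player based on the KDE Platform|ubuntu|precise|")
entry = parse_entry(sample)
print(entry["nameonly"], entry["main_category"], entry["sub_category"])
```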

Note the category field "Multimedia;mediaplayer" -- the sub-category was introduced recently and is used for displaying icons alongside each entry in the PPM -- some more usefulness is planned for the future. Documentation of this can be found in earlier blog posts.

Note that the dependencies field now has versioning, for example "+libqtcore4&ge4.8.0". This was introduced a while back and is fully supported by Woof and the PPM; however, this is the first time that the DB files converted from Ubuntu/Debian have had this extra versioning information.
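
As a rough illustration of the versioned dependencies field, the sketch below splits it into (name, operator, version) tuples. It assumes the operator is always a two-letter code, as in the 'eq' and 'ge' seen in the example; this is not the code used by Woof or the PPM.

```python
def parse_deps(field):
    """Return a list of (name, operator, version) tuples from a dependencies field."""
    deps = []
    for item in field.split(","):
        item = item.strip()
        if not item:
            continue
        # Drop the leading '+' seen on every dependency in the example.
        name, _, constraint = item.lstrip("+").partition("&")
        if constraint:
            # Assumption: the operator is always two letters ('eq', 'ge', ...).
            op, version = constraint[:2], constraint[2:]
        else:
            # Unversioned dependency such as '+kde-runtime'.
            op, version = None, None
        deps.append((name, op, version))
    return deps

deps = parse_deps("+amarok-common&eq2.5.0,+kde-runtime,+libc6&ge2.8")
for dep in deps:
    print(dep)
```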

If anyone would like to play with this in the PPM, the Woof commit link is below. Grab the latest '0setup' and 'debdb2pupdb' and copy them to /usr/local/petget.

If you are interested in compiling 'debdb2pupdb.bac', you need BaCon 1.0.27 (Precise Puppy has 1.0.26) as it fixes a REGEX bug.

Note, I haven't tested this in Woof yet. I have only tested with the PPM. Seems to work OK.

Woof commit:
http://bkhome.org/fossil/woof2.cgi/info/80c46ff5a1

Geany
One other thing. In Precise Puppy there is no colour syntax highlighting for BaCon code in Geany. To get it, install the 'z_geany_bacon_hack' PET package from the PPM.


Posted on 11 Nov 2012, 23:45


Comments:

Posted on 11 Nov 2012, 24:08 by BarryK
find_cat slow
The bottle-neck now is the 'find_cat' utility. It gets called for each package.

find_cat was originally a shell script, then I wrote it in Genie. A while back someone converted it to C code... I can't recall his name -- he was working on the Arch Linux support in Woof for a while, but I haven't had any communication from him for several months. Woof currently has the C version.

Anyway, as find_cat is now the bottleneck, if we want to reduce that 3 minutes and 40 seconds further, that would be the place to start.
I know the code in find_cat is very inefficient, so there is certainly room for improvement.



Posted on 12 Nov 2012, 2:00 by mavrothal
visual feedback
debdb2pupdb is certainly faster!
However, 3-6 minutes is a fairly long time without visual feedback.
You may want to add some "progress dots" in rxvt during processing. Alternatively, run it in the background and notify when finished, though the PPM windows being "frozen" during conversion will be a bit awkward for such a long time.
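
The "progress dots" idea could look something like this Python sketch. The function name and the every-1000-lines interval are made up for illustration; debdb2pupdb itself is written in BaCon and may do this differently.

```python
import io
import sys

def convert_with_progress(lines, convert, every=1000, out=sys.stderr):
    """Apply convert() to each line, writing a dot to `out` every `every` lines."""
    results = []
    for count, line in enumerate(lines, start=1):
        results.append(convert(line))
        if count % every == 0:
            out.write(".")
            out.flush()  # make the dot appear immediately
    out.write("\n")
    return results

# Demo: 2500 dummy lines, with str.upper standing in for the real conversion.
progress = io.StringIO()
converted = convert_with_progress(["pkg"] * 2500, str.upper, every=1000, out=progress)
print(repr(progress.getvalue()))
```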


Posted on 12 Nov 2012, 2:24 by technosaurus
database processing
This form of pipeline is typically faster than compiled C, because awk can work on data streams at or above the speed of most people's bandwidth (so can compiled C, but it gets more complicated to write):

wget -O- ... |bzcat |awk ...


1. use wget or curl to download a file and output to stdout
2. pipe it through tar, xzcat, bzcat, gzcat etc... to decompress the stream
3. use awk to process the datastream

awk is architecture independent, easier to update and typically smaller with very comparable speeds to compiled C
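
For comparison, the same stream-processing steps can be sketched in Python instead of awk. This is only an illustration under assumptions: the compressed chunks are generated locally as a stand-in for a download, and the processing step just picks out the "Package:" lines of a Debian-style package list.

```python
import bz2

def stream_lines(chunks):
    """Yield decoded text lines from an iterable of bzip2-compressed byte chunks."""
    decomp = bz2.BZ2Decompressor()
    pending = b""
    for chunk in chunks:
        pending += decomp.decompress(chunk)
        # Emit every complete line; keep the trailing partial line for later.
        *complete, pending = pending.split(b"\n")
        for raw in complete:
            yield raw.decode("utf-8", errors="replace")
    if pending:
        yield pending.decode("utf-8", errors="replace")

# Stand-in for a download: compress sample text and feed it in small chunks.
text = "Package: amarok\nVersion: 2.5.0\nPackage: zlib1g\n"
blob = bz2.compress(text.encode("utf-8"))
chunks = [blob[i:i + 16] for i in range(0, len(blob), 16)]

# Process the stream line by line without writing an expanded file to disk.
package_names = [line.split(": ", 1)[1]
                 for line in stream_lines(chunks)
                 if line.startswith("Package: ")]
print(package_names)
```

The point is the same as with the awk pipeline: nothing has to be fully downloaded or expanded before processing starts.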

I recently wrote an automatic web page generator that used awk to process the $HOME/.pet-packages in ~20 lines to do all the stuff ppm does in a web page. Though it still needs tweaks for the web page, I can post the parsing bits when I return home tomorrow (it is pretty fast).

Basically the only thing you need to work on pet packages is BEGIN{FS="|"}....



Posted on 12 Nov 2012, 8:30 by BarryK
find_cat really is bottleneck
To clarify this a bit more, I tested converting just the Ubuntu 'main' repository. Running 'debdb2pupdb' to convert it took 62 seconds.

Then I took out the call to 'find_cat' from 'debdb2pupdb'. So now the conversion is being performed, except the Category field is not getting filled. This time it took 3 seconds.

So, find_cat needs attention. I will look at it tonight.



Posted on 12 Nov 2012, 8:19 by beejaye
Register new user
Welcome to new user 'beejaye', also known as 'B.K. Johnson' on the Puppy Forum.



Posted on 12 Nov 2012, 9:10 by technosaurus
Speedup findcat
Perhaps I am mistaken, but if we know the main category already we can pass it as an arg to find_cat, so that it can skip the strstr for other main categories (currently it checks all of them).
Another way to speed up the process would be to make it capable of handling all calls at once rather than calling it for every entry.
P.S. I wrote a replacement strstr and strcasestr in the programming section under C macros... I don't know if it's any faster, but it is small and simple enough that it could be modified to compare several strings at once.
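
A rough Python sketch of both ideas follows: an optional main-category hint that narrows the search, and one call that handles every entry. The keyword table, the function names and the "Unknown" fallback are all hypothetical; the real find_cat rules are not shown in this post.

```python
# Hypothetical keyword table; the real find_cat rules are not shown in this post.
KEYWORDS = {
    "Multimedia": ("media player", "audio", "video"),
    "Network": ("browser", "ftp", "email"),
}

def find_category(description, main_hint=None):
    """Return the first category whose keyword occurs in the description.

    If main_hint names a known category, only that category is searched,
    which skips the substring tests for all the others.
    """
    text = description.lower()
    candidates = [main_hint] if main_hint in KEYWORDS else list(KEYWORDS)
    for category in candidates:
        if any(word in text for word in KEYWORDS[category]):
            return category
    return "Unknown"  # hypothetical fallback value

def categorise_all(descriptions):
    """Handle every entry in one call instead of one process per entry."""
    return [find_category(d) for d in descriptions]

categories = categorise_all([
    "easy to use media player based on the KDE Platform",
    "lightweight ftp client",
    "compression library",
])
print(categories)
```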


Posted on 12 Nov 2012, 9:49 by bark_bark_bark
Please update....
Hello, can you update the info for creating a Debian pup? The Debian repos don't get downloaded, and I want to create an 'official' dpup.


Posted on 12 Nov 2012, 16:54 by BarryK
Re Dpup
bark_bark_bark,
I don't know when I will be able to get onto that, however, contact 'pemasu' in the Puppy Forum -- he has been using Woof to build Dpups and might be able to help you.