Manipulating columns

February 20, 2009 — BarryK

I got up this morning all ready to work on the package manager in Woof, but I realised that with the new standardised database file format, it would be nice if there was a very efficient way of manipulating columns. That is, I may want to extract the 'package name', 'description', 'size' and 'dependencies' columns, in that order, for display in the main GUI window -- however, the fields are not in that order in the files.

So, I hunted around on the Internet for such a utility. I do recall finding a column-manipulator recently, but now of course it is nowhere to be found.

I decided to write it myself, and the first attempt was written in Genie.

I then decided to write it in C. My C coding sure is rusty... took many hours to create what is a quite small program. Well, it works. Compiled and stripped, it's just 4KB. It is written specifically for the Woof package database files, with the '|' character as column delimiter. It is run like this:

# printcols Packages-slackware-12.2-official 1 10 9 6

It will print columns 1, 10, 9 and 6, in that order, on stdout. It also handles empty fields, and lines that have fewer fields (columns) than the others.

If you want to see the source, it will be in Woof alpha2, coming soon.

Comments

Cut?
Username: Dougal
Why don't you just use "cut"? cut -d'|' -f1,10,9,6 Packages-slackware-12.2-official

reinventing wheels
Username: prehistoric
"This kind of thing has happened to me all the time, in the past. Unix/Linux has any number of cryptic solutions to common problems with names that don't spring to mind, or show up in searches. One liners like this would be a good subject for a wiki page, or an index to existing documentation. Is there already such a page out there? On a different subject, how about this idea for not only improving security, but also getting the benefit of free labor? [url=http://features.csmonitor.com/innovation/2009/02/19/recaptcha-how-to-turn-blather-into-books/]recaptcha

printcols vs cut
Username: BarryK
"Dougal, The reason that I can't use cut is the primary reason why I wrote the utility -- to print the columns in the order specified. 'cut -f 1,10,9,6' will actually print them in order 1,6,9,10. Another thing that I wanted my 'printcols' utility to do, that 'cut' doesn't, is print empty fields if they don't exist. For example: printcols examplefile 1 12 13 14 However, if any lines in the file have less fields than 14, say only 9 fields, I want the output to look like this: text-field-1||||

recaptcha
Username: BarryK
"prehistoric, That's really amazing!

Try noSQL
Username: Springer
"You can use cut and paste, but a better set of tools for this sort of thing would be one of the /rdb-style text-based databases that use regular files for tables, and the Unix shell itself as the basis of its 4GL. Check out nosql (http://www.linux.it/~carlos/nosql/) for a whole collection of very serious text/column/record power tools. These are smart about handling field-based data with tables based on one record per line, with or without headers describing field names. There are also links to other /rdb-like implementations from the page above, but nosql is a very good one, and is the most actively used and developed package of this type. IMO, this is a vital toolset for poweruser text-file munging, even if you don't want to use it as a really nice, shell-friendly database.

or awk, of course...
Username: Springer
"I forgot to add, of course, that this sort of thing is also trivial with awk: '{print $1,$10,$9,$6}' (and of course, you can tweak the input and output field separators with IFS and OFS, so you can make sure you'll see those empty fields.) I'm no awk expert (but even basic awk fluency is VERY worthwhile), but awk does a few things easily that are really hard in many other languages, so I wind up using it for things this on a fairly regular basis. One cool thing about it is that almost every awk script I've ever written (excepting a few that require gawk(yuk!) or mawk) still works just like it did 20 years ago - and does so on Cygwin, U/Win, or Linux, even though some of my awk/shell scripts date back to a proprietary Unix Version 7 system from the mid-80's!

awk
Username: BarryK
"springer, I've never really gotten into using awk, but from the little understanding I do have I guessed that it could probably do what I wanted -- which you have confirmed.

awk / grep
Username: lwill
"I sort of did a similar thing for unleashed that I posted once to make reading the packages.txt file easier. http://www.murga-linux.com/puppy/viewtopic.php?t=33301&sid=71b661410acd27fcd7c3275bdddd134a It was a good learning experience. Maybe it could be modified?

Cut
Username: Dougal
"That's strange... I thought I used "cut" that way in the past and it worked. I guess it was awk.

moving columns
Username: prehistoric
"Most of the time in the past I've used AWK, grep and/or sed for columns. Cut is very limited. I didn't object to Dougal's statement because there have been a number of implementations of cut. For all I know there could be some which will permute the order as BarryK wants, if you supply the correct magic parameters. There are large numbers of tricks I avoid to keep portability. My experience is not current enough to make definitive statements on the subject. We could use some reminders on portable/non-portable scripts, if anyone has a good reference. AWK is underrated as a useful language. I have even seen a package of gawk scripts which do common networking tasks in a very compact form. [url=http://www.gnu.org/software/gawk/manual/gawkinet/html_node/index.html]gawkinet

GhostScript (GS) also underrated
Username: GreatnessGuru
"by prehistoric: "AWK is underrated as a useful language." So is GhostScript/PostScript (GS/PS). Now that GS development is fully Open Source, perhaps GS will, also, come to be fully appreciated for its full potential in scripts, etc. And with that, we now return you to, "Manipulating columns". Thank you, Eddie Maddox Inwood IA USA

Tags: woof