Manipulating columns

I got up this morning all ready to work on the package manager in Woof, but I realised that with the new standardised database file format, it would be nice if there was a very efficient way of manipulating columns. That is, I may want to extract the 'package name', 'description', 'size' and 'dependencies' columns, in that order, for display in the main GUI window -- however, the fields are not in that order in the files.

So, I hunted around on the Internet for such a utility. I do recall finding a column-manipulator recently, but now of course it is nowhere to be found.

I decided to write it myself, and the first attempt was written in Genie.

I then decided to write it in C. My C coding sure is rusty... took many hours to create what is a quite small program. Well, it works. Compiled and stripped, it's just 4KB. It is written specifically for the Woof package database files, with the '|' character as column delimiter. It is run like this:

# printcols Packages-slackware-12.2-official 1 10 9 6

It will print columns 1, 10, 9 and 6 in that order on stdout. It also handles empty fields, and lines that have fewer fields (columns) than the others.

If you want to see the source, it will be in Woof alpha2, coming soon.


Posted on 20 Feb 2009, 17:45


Comments:

Posted on 20 Feb 2009, 18:29 by Dougal
Cut?
Why don't you just use "cut"?

cut -d'|' -f1,10,9,6 Packages-slackware-12.2-official


Posted on 21 Feb 2009, 4:18 by prehistoric
reinventing wheels
This kind of thing has happened to me all the time in the past. Unix/Linux has any number of cryptic solutions to common problems, with names that don't spring to mind or show up in searches. One-liners like this would be a good subject for a wiki page, or an index to existing documentation. Is there already such a page out there?

On a different subject, how about this idea for not only improving security, but also getting the benefit of free labor? recaptcha




Posted on 21 Feb 2009, 7:31 by BarryK
printcols vs cut
Dougal,
The reason that I can't use cut is the primary reason why I wrote the utility -- to print the columns in the order specified. 'cut -f 1,10,9,6' will actually print them in order 1,6,9,10.
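
A quick way to check this ordering behaviour, as a small sketch. The sample line is made up for illustration, with the letters a to j standing in for the ten fields, and the function name cut_demo is hypothetical:

```shell
# Hypothetical one-line sample: fields 1..10 hold the letters a..j.
sample='a|b|c|d|e|f|g|h|i|j'

# cut selects fields 1, 6, 9 and 10 but emits them in file order,
# whatever order they were listed in after -f.
cut_demo() {
    printf '%s\n' "$sample" | cut -d'|' -f1,10,9,6
}

cut_demo
```

This prints a|f|i|j rather than a|j|i|f.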

Another thing that I wanted my 'printcols' utility to do, that 'cut' doesn't, is print empty fields for columns that don't exist. For example:

printcols examplefile 1 12 13 14

If any lines in the file have fewer than 14 fields, say only 9, I want the output for those lines to look like this:

text-field-1||||
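
For illustration only, here is a minimal shell sketch of that padding behaviour. It is not the C source of printcols. It leans on awk, where a reference to a field beyond the end of a line simply yields an empty string. The function name printcols_sketch and the sample line are made up, and "-" is passed as the file name so that awk reads standard input:

```shell
# Sketch only: mimics the behaviour described above, not the real printcols.
# Usage: printcols_sketch FILE COL [COL ...]
printcols_sketch() {
    file=$1
    shift
    awk -F'|' -v cols="$*" '
        BEGIN { n = split(cols, want, " ") }
        {
            out = ""
            # Fields past the end of the line come back empty, so short
            # lines are padded automatically. Each field is followed by "|".
            for (i = 1; i <= n; i++)
                out = out $(want[i]) "|"
            print out
        }
    ' "$file"
}

# Hypothetical nine-field line, each field terminated by "|".
printf 'text-field-1|b|c|d|e|f|g|h|i|\n' | printcols_sketch - 1 12 13 14
```

Because the columns are emitted in the order given on the command line, the same function also covers the reordering case.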



Posted on 21 Feb 2009, 7:41 by BarryK
recaptcha
prehistoric,
That's really amazing!



Posted on 21 Feb 2009, 8:34 by Springer
Try noSQL
You can use cut and paste, but a better set of tools for this sort of thing would be one of the /rdb-style text-based databases that use regular files for tables, and the Unix shell itself as the basis of their 4GL.

Check out nosql (http://www.linux.it/~carlos/nosql/) for a whole collection of very serious text/column/record power tools. These are smart about handling field-based data with tables based on one record per line, with or without headers describing field names. There are also links to other /rdb-like implementations from the page above, but nosql is a very good one, and is the most actively used and developed package of this type.

IMO, this is a vital toolset for poweruser text-file munging, even if you don't want to use it as a really nice, shell-friendly database.


Posted on 21 Feb 2009, 8:45 by Springer
or awk, of course...
I forgot to add, of course, that this sort of thing is also trivial with awk:

awk -F'|' -v OFS='|' '{print $1,$10,$9,$6}' Packages-slackware-12.2-official
(and of course, you can tweak the input and output field separators with FS and OFS, as above, so you can make sure you'll see those empty fields.)

I'm no awk expert (but even basic awk fluency is VERY worthwhile), but awk does a few things easily that are really hard in many other languages, so I wind up using it for things like this on a fairly regular basis. One cool thing about it is that almost every awk script I've ever written (excepting a few that require gawk (yuk!) or mawk) still works just like it did 20 years ago - and does so on Cygwin, U/Win, or Linux, even though some of my awk/shell scripts date back to a proprietary Unix Version 7 system from the mid-80s!


Posted on 21 Feb 2009, 8:04 by BarryK
awk
springer,
I've never really gotten into using awk, but from the little understanding I do have I guessed that it could probably do what I wanted -- which you have confirmed.


Posted on 21 Feb 2009, 10:14 by PaulBx1
spreadsheet
I wonder if it would make sense to carry around this "database" in a spreadsheet, since that is set up for manipulating columns and such, and easily handles a regular structure like this one. I don't know if scripts have ready access to the contents though. Maybe an export command can generate the output you'd want for a script. Or something...


Posted on 21 Feb 2009, 11:45 by lwill
awk / grep
I sort of did a similar thing for unleashed that I posted once to make reading the packages.txt file easier.
http://www.murga-linux.com/puppy/viewtopic.php?t=33301&sid=71b661410acd27fcd7c3275bdddd134a
It was a good learning experience. Maybe it could be modified?


Posted on 21 Feb 2009, 18:46 by Dougal
Cut
That's strange... I thought I used "cut" that way in the past and it worked. I guess it was awk.



Posted on 21 Feb 2009, 22:36 by prehistoric
moving columns
Most of the time in the past I've used AWK, grep and/or sed for columns. Cut is very limited. I didn't object to Dougal's statement because there have been a number of implementations of cut. For all I know there could be some which will permute the order as BarryK wants, if you supply the correct magic parameters.
There are large numbers of tricks I avoid to keep portability. My experience is not current enough to make definitive statements on the subject. We could use some reminders on portable/non-portable scripts, if anyone has a good reference.

AWK is underrated as a useful language. I have even seen a package of gawk scripts which do common networking tasks in a very compact form: gawkinet (http://www.gnu.org/software/gawk/manual/gawkinet/html_node/index.html)


Posted on 21 Feb 2009, 23:08 by zygo
sed
sed would be ok except that it gets messy after 9 submatches. There are 10 fields.
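
To illustrate the limit with a small sketch: POSIX sed only offers the back-references \1 through \9, so reordering the first few fields is manageable, but a tenth captured field cannot be named directly. The sample line and the function name third_first are invented for the example:

```shell
# Moves the third field to the front of each line; later fields are left alone.
# Each \([^|]*\) captures one field, so at most nine can be captured this way.
third_first() {
    sed 's/^\([^|]*\)|\([^|]*\)|\([^|]*\)|/\3|\1|\2|/'
}

printf 'a|b|c|d\n' | third_first
```

This prints c|a|b|d. Reaching field 10 would need a second pass or a different tool.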


Posted on 21 Feb 2009, 23:27 by GreatnessGuru
GhostScript (GS) also underrated
by prehistoric:
"AWK is underrated as a
useful language."

So is GhostScript/PostScript
(GS/PS).

Now that GS development is
fully Open Source, perhaps
GS will, also, come to be
fully appreciated for its
full potential in scripts, etc.

And with that, we now return
you to, "Manipulating columns".

Thank you,
Eddie Maddox
Inwood IA USA