What's the best command and set of options to use if I'm working with a text file that has spaces inside pretty much every description field? The fields are not a uniform width in each column, and I need to extract them. I've already gotten rid of the tabs with the tr command, so it's a lot cleaner now, but I'm stuck on getting UNIX to recognize where one field ends and the next one begins, because each field contains a different number of spaces.

Thanks.

Hey There,

If there's any sort of delimiter at all between the fields (for instance, a colon), you can use awk's -F option, like:

awk -F":" '{print $1, $2, $3}'

and so forth,

Best wishes,

Mike

It's just whitespace in between.

> I've already gotten rid of the tabs with the tr command, so it's a lot cleaner now
And no doubt destroyed any sense of where the columns are along with them.

If the result is variable-width fields with variable content that now run together, then you're stuck:

Fred Flintstone 37 Wilma Pebbles
Barney Rubble 39 Betty Bam Bam

If you'd expanded the tabs with 'expand', and not gotten rid of them with 'tr', then maybe you would have multiple spaces between columns, and some sense of still having a table.

Fred Flintstone     37  Wilma   Pebbles
Barney Rubble       39  Betty   Bam Bam

At least then various utilities can count characters to establish where the fields are.
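
To make that concrete, here is a minimal sketch using the two sample rows above. It assumes the columns line up at the character positions shown (the age in columns 21-22), which only holds for this sample, and the variable names are made up for illustration.

```shell
# Sample table with space-padded columns, as 'expand' would leave it.
table='Fred Flintstone     37  Wilma   Pebbles
Barney Rubble       39  Betty   Bam Bam'

# Option 1: count characters. Columns 21-22 hold the age in this sample.
ages=$(printf '%s\n' "$table" | cut -c21-22)

# Option 2: treat any run of two or more spaces as the field separator,
# so single spaces inside a field ("Bam Bam") stay where they are.
first_last=$(printf '%s\n' "$table" | awk -F'  +' '{ print $1 " / " $4 }')

printf '%s\n' "$ages" "$first_last"
```

The second option only works while every gap between columns is at least two spaces wide and no field contains a double space itself.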

If you've still got tabs, then extracting individual fields is dead easy, e.g.

awk -F'\t' '{ print $1,$4 }'

Save removing the tabs until it's absolutely necessary, say for display purposes. Even then, don't mess about with your original data to do it.

This is what about half of the file I have to work with looks like.


11 LB 11 LB. WEIGHT FOR GM-8, G-9 and G-11 MOUNTS 67.50
11 LB TD 11 LB. WEIGHT FOR GM 100 and HGM 200 MOUNTS 97.50
2.5 BW EXTRA 2.5 LB. WEIGHT FOR DDWS, DWS or WS SYSTEMS 22.50
21 LB 21 LB. WEIGHT FOR GM-8, G-9 and G-11 MOUNTS 90.00
5.0 BW EXTRA 5.0 LB. WEIGHT FOR DDWS, DWS or WS SYSTEMS 30.00
7 LB 7 LB. WEIGHT FOR GM-8, G-9 and G-11 MOUNTS 45.00
ACDC CAPPED POWER ADAPTER - 120VAC TO 13.8VDC@5Amps 75.00
AMC ALUMINUM MOTOR COVERS, FULLY MACHINED 90.00
AP RISER BLOCK SET FOR ASTROPHYSICS RINGS 60.00
APP ADAPTER PLATE FOR MOUNTING G-11 SADDLE 52.50
BP LARGE DIAMETER TRIPOD FOOT FOR HD TRIPOD - SET OF 3 100.00


One of the tasks is to show just the descriptions and prices. They are all pretty much different lengths, and so are the item codes ahead of them.

I'm in agreement with Salem: run that file through a quick awk statement to parse it into fields based on the tabs and see what you get. It might be the answer you're looking for.
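
Something along these lines, as a rough sketch. It assumes the untouched file has three tab-separated columns (item code, description, price); the thread never shows the raw file, so that layout and the split between code and description in the sample rows are guesses.

```shell
# Hypothetical tab-separated original: item code <TAB> description <TAB> price.
# The three-column layout is an assumption, not something shown in the thread.
sample=$(printf '%s\t%s\t%s\n' \
    '11 LB TD' '11 LB. WEIGHT FOR GM 100 and HGM 200 MOUNTS' '97.50' \
    'AMC' 'ALUMINUM MOTOR COVERS, FULLY MACHINED' '90.00')

# Split on tabs only, so the spaces inside each field are left alone.
desc_price=$(printf '%s\n' "$sample" | awk -F'\t' '{ print $2 " " $3 }')

printf '%s\n' "$desc_price"
```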

If you have spaces and/or tabs scattered randomly between the items on each line, regardless of the content and how it relates to the other fields, you might want to split on words like "WEIGHT" and then put them back in your awk/sed statement.
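
Even with only spaces left, one part is recoverable without guessing: in the listing above the price is always the last whitespace-separated field. A minimal sketch, with made-up variable names and assuming no trailing spaces on the lines:

```shell
# Two lines copied from the space-only listing in this thread.
listing='2.5 BW EXTRA 2.5 LB. WEIGHT FOR DDWS, DWS or WS SYSTEMS 22.50
AMC ALUMINUM MOTOR COVERS, FULLY MACHINED 90.00'

# The price is the last field on every line, so $NF picks it out.
prices=$(printf '%s\n' "$listing" | awk '{ print $NF }')

# Everything before the price: strip the final field and the spaces before it.
rest=$(printf '%s\n' "$listing" | awk '{ sub(/ +[^ ]+$/, ""); print }')

printf '%s\n' "$prices" "$rest"
```

What this cannot do is tell the item code from the description, since both contain single spaces. That boundary still needs some marker, such as the original tabs or a known keyword.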

Hope that helps,

Mike

That did it!

Thanks guys!!!

:)

cool :)

Glad to help out in whatever capacity I did ;)

Mike
