How do I use grep/wc/uniq for max line length and word count?

Question

dragonflyheli 0 Newbie Poster

16 Years Ago

Hello everyone,
I'm looking for some help with these simple tasks. I only need this for linguistic analysis, so sorry if these are dumb questions. :)

I have a simple script that uses grep to find the lines in a file that contain a certain word:

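```shell
# number of lines in "$1/file.txt" that contain "someword"
linecount=$(grep "someword" "$1"/file.txt | wc -l)
echo "$linecount"
# words on those lines, dropping the first tab-separated field (presumably the markup tag)
wordcount=$(grep "someword" "$1"/file.txt | cut -f2- | wc -w)
echo "$wordcount"
echo 'avg words per line:'
echo "scale=2; $wordcount / $linecount" | bc
```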
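If it helps, the same three numbers can come out of a single pass with awk, which reads the matching lines once instead of running grep twice and skips the division when nothing matches. This is a minimal sketch, assuming the tab-separated layout that `cut -f2-` implies (markup in the first field); the function name `avg_stats` is hypothetical. Note that awk's `%.2f` rounds, while `bc` with `scale=2` truncates.

```shell
#!/bin/sh
# avg_stats FILE WORD (hypothetical helper): one pass over the matching lines.
# Assumes tab-separated lines whose first field is markup, as cut -f2- implies.
avg_stats() {
  grep -- "$2" "$1" | cut -f2- | awk '
    { lines++; words += NF }          # NF = number of words on this line
    END {
      print "lines:", lines + 0
      print "words:", words + 0
      if (lines > 0)                  # avoid dividing by zero
        printf "avg words per line: %.2f\n", words / lines
    }'
}

# Usage, with the same path as the original script:
# avg_stats "$1/file.txt" someword
```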

What would be the simplest way to:
- find the maximum line length (in words)? Should wc -L be used somehow?
- count the vocabulary size (simply the number of distinct tokens) across all the matched lines? I could only get uniq -c to work on whole lines, not on words.
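One possible way to answer both questions, sketched under the same assumptions as the script above (tab-separated lines, markup in the first field); `max_words` and `vocab_size` are hypothetical names. GNU `wc -L` reports the longest line in characters, not words, so the sketch counts words per line with awk's `NF` instead. `uniq` only compares whole, adjacent lines, so the tokens are first split one per line and sorted.

```shell
#!/bin/sh
# Hypothetical helpers; same FILE/WORD arguments and tab layout as above.

# max_words FILE WORD: most words on any single matching line.
max_words() {
  grep -- "$2" "$1" | cut -f2- |
    awk 'NF > max { max = NF } END { print max + 0 }'
}

# vocab_size FILE WORD: number of distinct tokens over all matching lines.
vocab_size() {
  grep -- "$2" "$1" | cut -f2- |
    awk '{ for (i = 1; i <= NF; i++) print $i }' |  # one token per line
    sort -u |                                       # sort, drop duplicates
    grep -c .                                       # count what is left
}
```

For a frequency list rather than a single count, `sort | uniq -c | sort -rn` can replace `sort -u | grep -c .`. Tokens are case-sensitive and keep attached punctuation, so "The" and "the" count separately unless you add something like `tr '[:upper:]' '[:lower:]'` before sorting.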

I really appreciate any help. Many thanks in advance!

shell-scripting


Salem 5,265 Posting Sage

16 Years Ago

If you have the line
"their therapist is over there"

and you're searching for "the", which count would you want?
0 - nothing matches the whole word "the"
1 - the line contains "the" somewhere
3 - "the" appears in three places
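To make the three options concrete, here is a minimal sketch of how each count could be obtained with grep for that sample line. The function names are made up for illustration, and it assumes a grep with `-w` and `-o` (GNU and BSD grep both have them; neither option is in POSIX).

```shell
#!/bin/sh
# Matching lines where WORD stands alone as a whole word (-w)
lines_with_whole_word() { printf '%s\n' "$1" | grep -cw -- "$2"; }

# Matching lines where WORD appears anywhere, even inside another word
lines_with_substring() { printf '%s\n' "$1" | grep -c -- "$2"; }

# Every occurrence of WORD: -o prints each match on its own line
occurrences() { printf '%s\n' "$1" | grep -o -- "$2" | grep -c .; }

line='their therapist is over there'
printf 'whole word: %s\n' "$(lines_with_whole_word "$line" the)"  # 0
printf 'substring: %s\n' "$(lines_with_substring "$line" the)"    # 1
printf 'occurrences: %s\n' "$(occurrences "$line" the)"           # 3
```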


dragonflyheli 0 Newbie Poster · Answer 1 · 2009-05-19T11:45:19+00:00

Actually, the words I'm searching for are technical markup tags, each of which unambiguously identifies one line of natural language that needs to be parsed. So in this case the usual search ambiguities can be ignored.
