Hello!

I need to:
1. Prompt the user for a text file.
2. Analyze the file and graph (with a bar plot) the 25 most frequent words with length greater than 4.
3. The x axis needs to show the words. The y axis is the frequency.

I've built the bulk of the program so far, but am having some trouble narrowing it down to the 25 most frequent words with length greater than 4.
Also, my plot is missing a couple of things.

Thanks for your help!

import matplotlib.pyplot as plot

def bar_plot(x_axis,y_axis):
    plot.plot(x_axis,y_axis, marker='o')
    plot.xlabel('Words')
    plot.ylabel('Frequency')
    plot.legend()
    plot.grid()
    plot.show()

    
def main():
    import string
    ofile=open(raw_input("Please enter the name of a text file :"))
    s=ofile.read()
    word_freq={}
    
    word_list=s.split()
    word_list=[s.translate(None, string.punctuation) for s in word_list] 


    for word in word_list:
        count=word_freq.get(word.lower(),0)
        word_freq[word.lower()]=count+1
        
    keys=word_freq.keys()
    keys.sort()
    print "Word ---> Frequency"
    for word in keys:
        bar_plot(word, keys)
        
main()

I would suggest that you print word_freq to check what it holds. You could also use setdefault instead of .get, although without a description of what this part of the code is meant to do there is no way to tell for sure.

for word in word_list:
    word_freq.setdefault(word.lower(), 0)
    word_freq[word.lower()] += 1
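
As far as I can tell, the .get and setdefault idioms end up with the same counts, so either should work here. A quick way to check is a small sketch like the one below. The word_list contents are made-up sample data, not taken from your file, and the names freq_get and freq_setdefault are only for illustration.

```python
# Hypothetical sample data, standing in for the words read from the file.
word_list = ["The", "cat", "and", "the", "hat"]

# Counting with .get, as in the original post.
freq_get = {}
for word in word_list:
    key = word.lower()
    freq_get[key] = freq_get.get(key, 0) + 1

# Counting with .setdefault, as suggested above.
freq_setdefault = {}
for word in word_list:
    key = word.lower()
    freq_setdefault.setdefault(key, 0)
    freq_setdefault[key] += 1

# Both dictionaries should hold identical counts.
print(freq_get == freq_setdefault)
print(sorted(freq_get.items()))
```

Printing the dictionary like this before plotting makes it easier to see whether the counting step or the plotting step is the one going wrong.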

2. Analyze file and graph (with bar plot) the 25 most frequent words with length greater than 4

I would test for word length > 4, place each match in a list of [frequency, word] pairs, sort that list in reverse order, and print/plot the first 25.
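
The steps above could be sketched roughly as follows. This is only one way to do it, not a drop-in fix for your program. The names top_words, limit and min_length are my own, the sample dictionary is invented, and the reworked bar_plot assumes matplotlib is installed. It uses plot.bar rather than plot.plot, with the words as x-axis tick labels.

```python
def top_words(word_freq, limit=25, min_length=4):
    """Return up to `limit` [frequency, word] pairs, most frequent first.

    Only words longer than `min_length` characters are kept.
    """
    pairs = [[freq, word] for word, freq in word_freq.items()
             if len(word) > min_length]
    # Reverse sort puts the highest frequency first; ties fall back
    # to reverse alphabetical order of the word.
    pairs.sort(reverse=True)
    return pairs[:limit]


def bar_plot(pairs):
    """Draw a bar chart from [frequency, word] pairs (needs matplotlib)."""
    # Imported here so top_words can be tried without matplotlib.
    import matplotlib.pyplot as plot
    words = [word for freq, word in pairs]
    freqs = [freq for freq, word in pairs]
    positions = list(range(len(words)))
    plot.bar(positions, freqs)
    # Label each bar with its word, rotated so 25 labels stay readable.
    plot.xticks(positions, words, rotation=90)
    plot.xlabel('Words')
    plot.ylabel('Frequency')
    plot.tight_layout()
    plot.show()


# Hypothetical counts, standing in for the word_freq built from the file.
sample = {"hello": 3, "world": 5, "the": 9, "python": 5, "tree": 2}
print(top_words(sample))
```

In your main() you would call bar_plot(top_words(word_freq)) once, after the counting loop has finished, rather than calling the plot function inside a loop over the keys.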
