DaniWeb IT Discussion Community
Python forum (Software Development category)
How to sort words (from a file) by frequency in decreasing order? I need help

#1 alivip, Mar 14th, 2008
I want to find the 10 most frequent words in a specific file, and for that I have written this code:


import sys
import string
import re
file = open("corpora.txt", "r")
text = file.read()
file.close()

word_freq = {}

word_list = string.split(text)

for word in word_list:
    count = word_freq.get(string.lower(word), 0)
    word_freq[string.lower(word)] = count + 1

keys = word_freq.keys()
keys.sort()
i = 0
while i < 10:
    for word in keys:
        print word, word_freq[word]
        i = 1 + 1




But it only gets each word and its frequency.
Sample output:

blanklines 2
blanklines, 1
characters 1
console 1
count 3
blanklines 2
blanklines, 1
characters 1
console 1
count 3

It also reads the file again and again, and it does not sort the output, as you can see.

How can I sort the output in decreasing order so that I can stop printing after 10 words?

Please help me ASAP.

#2 ZZucker, Mar 14th, 2008
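There are mistakes. Your last line should be i = i + 1: as written, i is always 2, so the while loop never ends and the whole list is printed over and over. Even with that fixed, keys.sort() orders the words alphabetically, not by frequency, and the inner for loop prints every word before the while condition is checked again. Also, string methods have been built in since version 2.0, so the string module is not needed, and neither are sys and re.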
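Putting those fixes together, the question's own approach can be sketched like this in Python 3 syntax. A short literal string stands in for the contents of corpora.txt so the example runs on its own, and sorted() with key=word_freq.get is one way (not the only one) to rank the keys by count:

```python
# The question's approach with the fixes applied: rank the words by
# their counts (not alphabetically) and stop after at most ten of them.
# Assumption: this literal string stands in for the contents of corpora.txt.
text = "the cat and the dog and the bird"

word_freq = {}
for word in text.split():
    word = word.lower()
    word_freq[word] = word_freq.get(word, 0) + 1

# sort the words by their frequency, highest first
keys = sorted(word_freq, key=word_freq.get, reverse=True)

i = 0
while i < 10 and i < len(keys):
    word = keys[i]
    print(word, word_freq[word])
    i = i + 1  # not i = 1 + 1, which always sets i to 2
```

The extra i < len(keys) check stops the loop cleanly when the file has fewer than ten distinct words.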

Here is one way to do this with version 2.5
# count words in a text and show the first ten items
# by decreasing frequency

# sample text for testing
text = """\
My name is Fred Flintstone and I am a famous TV
star. I have as much authority as the Pope, I
just don't have as many people who believe it.
"""

word_freq = {}

word_list = text.split()

for word in word_list:
    # word all lower case
    word = word.lower()
    # strip any trailing period or comma
    word = word.rstrip('.,')
    # build the dictionary
    count = word_freq.get(word, 0)
    word_freq[word] = count + 1

# create a list of (freq, word) tuples
freq_list = [(freq, word) for word, freq in word_freq.items()]

# sort the list by the first element in each tuple (default);
# reverse=True puts the highest counts first
freq_list.sort(reverse=True)

for n, tup in enumerate(freq_list):
    # print the first ten items
    if n < 10:
        freq, word = tup
        print("%d %s" % (freq, word))
        # or
        #print("%s %d" % (word, freq))

"""
my output -->
3 i
3 as
2 have
1 who
1 tv
1 the
1 star
1 pope
1 people
1 name
"""
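From Python 2.7 on (so not in the 2.5 used above), collections.Counter can do both the counting and the ranking. A sketch in Python 3 syntax with the same sample text and the same lowercasing and trailing period/comma stripping. One difference: Counter.most_common breaks ties by first appearance, so words with equal counts come out in a different order than with the reverse-sorted tuples above.

```python
from collections import Counter

# same sample text as in the example above
text = """\
My name is Fred Flintstone and I am a famous TV
star. I have as much authority as the Pope, I
just don't have as many people who believe it.
"""

# lowercase each word and strip any trailing period or comma
words = [word.lower().rstrip('.,') for word in text.split()]

# Counter builds the word -> count dictionary in one step
word_freq = Counter(words)

# most_common(10) returns the ten (word, count) pairs with the highest
# counts; equal counts keep the order in which the words first appeared
top_ten = word_freq.most_common(10)
for word, freq in top_ten:
    print(freq, word)
```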
Last edited by ZZucker : Mar 14th, 2008 at 12:17 pm.
Never argue with idiots, they'll just bring you down to their level and beat you with their experience.

#3 alivip, Mar 15th, 2008
Thank you very much, it was very helpful.
But is there a way to let the user choose the number of words? For example, rather than the 10 most frequent words, they could enter 11, 50 or 44 most frequent words, etc.

And can I remove marks like ? "" [] ( ) etc.? I need only words.

And can Python build a user interface (buttons, text boxes, etc.), and how?

If not, how can I integrate the code into a user interface (buttons, text boxes, etc.)?

#4 ZZucker, Mar 16th, 2008
... can I remove the marks like (? "" [] ) .. etc ineed only words

Instead of
word = word.rstrip('.,')
use
word = word.rstrip('.,?"[]()')
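Note that rstrip only removes characters from the right-hand end, so a token like "(example" keeps its opening bracket. A sketch in Python 3 that strips from both ends with strip() and the standard string.punctuation constant, and drops tokens that were nothing but punctuation; the function name clean_words is only illustrative:

```python
import string


def clean_words(text):
    """Return the lowercased words of text with surrounding punctuation removed."""
    words = []
    for word in text.split():
        # strip() removes the given characters from both ends;
        # string.punctuation is !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
        word = word.lower().strip(string.punctuation)
        if word:  # skip tokens that were only punctuation, such as "--"
            words.append(word)
    return words


print(clean_words('He said "hello" (twice) -- [really]?'))
```

Punctuation inside a word, such as the apostrophe in "don't", is left alone.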

... control number of word to be enter by user

Where you now have
if n < 10:
use
if n < select:
where variable select is an integer from the user's input
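Turning the user's typed answer into that integer is a job for int(). A sketch in Python 3, where input() returns a string (in Python 2 the equivalent call is raw_input()); the helper name parse_count and the fallback to 10 are assumptions for illustration:

```python
def parse_count(answer, default=10):
    """Turn the user's answer into a positive int, falling back to default."""
    try:
        select = int(answer.strip())
    except ValueError:
        # the answer was not a whole number, e.g. "ten" or ""
        return default
    # zero or negative counts make no sense here, so use the default
    return select if select > 0 else default


print(parse_count("44"))   # 44
print(parse_count("abc"))  # 10 (falls back to the default)
```

In the word-count script this would be used as select = parse_count(input("How many words? ")) before the loop, together with if n < select:.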

... can python bult user interfac (buttun ,text box etc) and how ?

Python comes with a simple GUI toolkit called Tkinter that can do all of that. You will need to study up on it, since it's a whole new ball of wax. Here is a typical example:
# a look at the Tkinter Text widget
# use ctrl+c to copy, ctrl+x to cut selected text,
# ctrl+v to paste, and ctrl+/ to select all

try:
    import Tkinter as tk  # Python 2
except ImportError:
    import tkinter as tk  # Python 3

def get_text():
    # get text widget contents between start_index and end_index
    # start_index = "%d.%d" % (line, column) here "1.0"
    # line starts with 1 and column with 0
    # here end_index = tk.END
    # set the label text to the typed-in text
    v1.set(text1.get(1.0, tk.END))
    # clear the text
    text1.delete(1.0, tk.END)
    text1.insert(tk.INSERT, ' new text')
    text1.insert(tk.INSERT, '\n and more text')

# this sets the window title caption too
# without the leading space Text will be text!?
root = tk.Tk(className=" Text, Button, Label ...")

# text entry field, width=width chars, height=lines text
text1 = tk.Text(root, width=50, height=2, bg='yellow')
text1.pack()

# function listed in command will be executed on button click
button1 = tk.Button(root, text='get the text', command=get_text)
button1.pack(pady=5)

# define a variable to hold the label text
v1 = tk.StringVar()

# label text will always be the textvariable's value
# width/height in char size
label1 = tk.Label(root, textvariable=v1, width=50, height=2)
label1.pack(pady=5)

# do some calculation and format result
pi_approx = 355/113.0
str1 = "%.4f" % (pi_approx)  # 3.1416
# show result in text widget
text1.insert(tk.INSERT, str1)

# start cursor in text1
text1.focus()

root.mainloop()
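The Tkinter example above can be combined with the word counter and a user-chosen count. A sketch assuming Python 3 (where the module is named tkinter) and collections.Counter; the names top_words, build_gui and show_result are illustrative, and the widget layout is just one possibility. The counting is kept in a separate function so it can be used without a window:

```python
import string
from collections import Counter


def top_words(text, select):
    """Return the select most frequent (word, count) pairs in text."""
    words = (w.lower().strip(string.punctuation) for w in text.split())
    return Counter(w for w in words if w).most_common(select)


def build_gui():
    """Hypothetical window: paste text, type a count, click the button."""
    import tkinter as tk  # Python 3 name; it is Tkinter in Python 2

    root = tk.Tk()
    root.title("Most frequent words")

    # multi-line box for the text to analyse
    text_box = tk.Text(root, width=50, height=10)
    text_box.pack()

    # one-line entry for the number of words, pre-filled with 10
    count_entry = tk.Entry(root, width=5)
    count_entry.insert(0, "10")
    count_entry.pack(pady=5)

    # box that receives the results
    result_box = tk.Text(root, width=50, height=10, bg="yellow")
    result_box.pack()

    def show_result():
        # read the requested count; fall back to 10 if it is not a number
        try:
            select = int(count_entry.get())
        except ValueError:
            select = 10
        pairs = top_words(text_box.get("1.0", tk.END), select)
        result_box.delete("1.0", tk.END)
        for word, freq in pairs:
            result_box.insert(tk.END, "%d %s\n" % (freq, word))

    tk.Button(root, text="count words", command=show_result).pack(pady=5)
    text_box.focus()
    root.mainloop()
```

Calling build_gui() opens the window; top_words("a b a", 1) on its own returns [("a", 2)].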
Last edited by ZZucker : Mar 16th, 2008 at 12:48 am.

#5 alivip, Mar 16th, 2008
Your reply was very helpful.
But how can I get an integer from the user's input (select)?

Does Python provide a way to search a directory that contains subfolders and files? For example, the top folder is cars and its subfolders are Toyota, Honda and BMW; Toyota contains folders named camry and corolla, Honda contains accord, and BMW contains a folder named X5.

Is there a way to enter the name of the parent folder (cars) and search all of its subfolders (Toyota, Honda and BMW)?
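Python's os.walk covers this: it visits a folder and every subfolder below it. A sketch in Python 3 that searches a whole tree (such as cars) for file or folder names containing a search term; the function name find_names and the case-insensitive substring match are assumptions, and the demo builds a throwaway copy of the cars tree in a temporary folder:

```python
import os
import tempfile


def find_names(top, term):
    """Return paths under top whose file or folder name contains term."""
    matches = []
    # os.walk yields (dirpath, dirnames, filenames) for top and every
    # subfolder below it, so Toyota/camry is reached as well as Toyota
    for dirpath, dirnames, filenames in os.walk(top):
        for name in dirnames + filenames:
            if term.lower() in name.lower():
                matches.append(os.path.join(dirpath, name))
    return matches


# demo: build a small cars tree in a temporary folder and search it
with tempfile.TemporaryDirectory() as tmp:
    for sub in ("Toyota/camry", "Toyota/corolla", "Honda/accord", "BMW/X5"):
        os.makedirs(os.path.join(tmp, "cars", sub))
    found = find_names(os.path.join(tmp, "cars"), "or")
    names = sorted(os.path.basename(path) for path in found)
    print(names)  # ['accord', 'corolla']
```

To search the contents of files rather than their names, each path in filenames could instead be opened and read inside the same loop.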
Last edited by alivip : Mar 16th, 2008 at 5:19 am.

#6 alivip, Mar 16th, 2008
This is the modified code:
# a look at the Tkinter Text widget
# use ctrl+c to copy, ctrl+x to cut selected text,
# ctrl+v to paste, and ctrl+/ to select all
try:
    import Tkinter as tk  # Python 2
except ImportError:
    import tkinter as tk  # Python 3


def most_frequant_word():
    # count the words in arb.txt and show the first ten items
    # by decreasing frequency
    v1.set(text1.get(1.0, tk.END))
    text1.delete(1.0, tk.END)
    file = open("arb.txt", "r")
    text = file.read()
    file.close()

    word_freq = {}

    word_list = text.split()

    for word in word_list:
        # word all lower case
        word = word.lower()
        # strip trailing punctuation (the backslash itself is written as \\)
        word = word.rstrip('.,/"-_;\\[]()')
        # build the dictionary
        count = word_freq.get(word, 0)
        word_freq[word] = count + 1

    # create a list of (freq, word) tuples
    freq_list = [(freq, word) for word, freq in word_freq.items()]

    # sort the list by the first element in each tuple (default)
    freq_list.sort(reverse=True)

    for n, tup in enumerate(freq_list):
        # show the first ten items in the text widget
        if n < 10:
            # unpack the tuple first, so freq and word belong to this item
            freq, word = tup
            text1.insert(tk.INSERT, freq)
            text1.insert(tk.INSERT, word)
            text1.insert(tk.INSERT, "\n")
            print("%d %s" % (freq, word))


root = tk.Tk(className=" most_frequant_word")

# text entry field, width=width chars, height=lines text
text1 = tk.Text(root, width=50, height=20, bg='green')
text1.pack()
# function listed in command will be executed on button click
button1 = tk.Button(root, text='result', command=most_frequant_word)
button1.pack(pady=5)

# define a variable to hold the label text
v1 = tk.StringVar()
# label text will always be the textvariable's value
# width/height in char size
label1 = tk.Label(root, textvariable=v1, width=50, height=20)
label1.pack(pady=5)

# start cursor in text1
text1.focus()
root.mainloop()

But unfortunately, when I search a non-English file (for example, Arabic), it does not read it properly; it prints text like:
3ÇáäíÇÈÉ
28Ýí
11Úáì
11ÊÜÊÜãÜÉ
10ãä
10Úä
7Ãä
6ÈÓÈÈ
5ÎÈÑ
5ÇáãÓáãæä

The sample file is attached.

I use

text1.insert(tk.INSERT, freq)
text1.insert(tk.INSERT, word)
text1.insert(tk.INSERT, "\n")

to insert the results into the text widget.
Please, I need your help with this and the previous question.
Attached file: arb.txt (7.7 KB)
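The garbled words look like Arabic stored in the Windows-1256 (cp1256) code page and then shown as Latin-1: converted back that way, ÇáäíÇÈÉ becomes النيابة, Ýí becomes في and Úáì becomes على. That the file is cp1256 is inferred from the output, not confirmed. Under that assumption, a Python 3 sketch passes the encoding to open() so the file is decoded into Unicode strings before splitting; the helper name read_words is hypothetical:

```python
import os
import tempfile
from collections import Counter


def read_words(path, encoding="cp1256"):
    """Read a text file with an explicit encoding and count its words.

    Assumption: the file was saved as Windows-1256 (cp1256). A file saved
    as UTF-8 would need encoding="utf-8" instead.
    """
    with open(path, encoding=encoding) as f:
        text = f.read()  # a str of Unicode characters, not raw bytes
    return Counter(text.split())


# The garbled words from the question, turned back into their original
# bytes and decoded as cp1256, become readable Arabic again.
fixed = [g.encode("latin-1").decode("cp1256") for g in ("ÇáäíÇÈÉ", "Ýí", "Úáì")]

# demo: write a small cp1256 file, then count its words
with tempfile.NamedTemporaryFile("w", encoding="cp1256", suffix=".txt",
                                 delete=False) as tmp:
    tmp.write("في النيابة في")
counts = read_words(tmp.name)
os.remove(tmp.name)
```

Once the words are Unicode strings rather than raw bytes, they should show up as Arabic letters when inserted into the Text widget.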
All times are GMT -4.
©2003 - 2008 DaniWeb® LLC