Hello,

I am working on the following exercise:
Write a function called most_frequent that takes a string and prints the letters in decreasing order of frequency. Find text samples from several different languages and see how letter frequency varies between languages. Compare your results with the tables at wikipedia.org/wiki/Letter_frequencies.
I have tried my best, but I am not sure whether my output matches the required result, since no answer is provided that I could check it against to see whether my method deviates much from the intended one.
The result might look like the list below; if so, I don't know how to remove the duplicates among the tuples. Can you please help me?
[(2, 'o'), (2, 'n'), (2, 'i'), (2, 'a'),(1, 'v'), (1, 's'), (1, 'p'),...]
My result is like this:
['o', 'n', 'i', 'a', 'v', 's', 'p', 'k', 'j', 'e']
but we can't see the frequencies...

What would be the most appropriate result?

Thank you very much!

Vlady

def most_frequent(s):
    t=s.split()
    delimiter= ''
    s=delimiter.join(t)
    l=list(s) #['j', 'a', 'n', 'k', 'o', 's', 'i', 'e', 'n', 'a', 'p', 'i', 'v', 'o']
    f=[]
    for i in l:
        f.append(l.count(i)) # [1, 2, 2, 1, 2, 1, 2, 1, 2, 2, 1, 2, 1, 2]
    tup=zip(f,l)
    tup.sort(reverse=True)
    res=[]
    for freq,letter in tup:
        if letter not in res:
            res.append(letter)
    print res # ['o', 'n', 'i', 'a', 'v', 's', 'p', 'k', 'j', 'e']


def main():
    s='janko sie na pivo'
    most_frequent(s)

if __name__ == '__main__':
    main()

I would include the letter in the tuple appended to the list at line 8, or loop over the (count, letter) tuples of frequency and letter prepared by the zip function. I would definitely not use l as a variable name; use full, understandable words.
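One way to read this suggestion, as a minimal sketch assuming Python 3 (the code in the question is Python 2). The function name letter_counts is hypothetical and not from the thread; it stores the count and the letter together and skips pairs that were already collected, which removes the duplicates:

```python
def letter_counts(text):
    # Hypothetical helper: build (count, letter) tuples without duplicates.
    letters = [ch for ch in text if not ch.isspace()]
    pairs = []
    for letter in letters:
        pair = (letters.count(letter), letter)
        if pair not in pairs:      # skip letters we have already recorded
            pairs.append(pair)
    pairs.sort(reverse=True)       # highest count first, ties in reverse alphabetical order
    return pairs

print(letter_counts("janko sie na pivo"))
```

For the sample string this should start with (2, 'o'), (2, 'n'), (2, 'i'), (2, 'a'), the shape asked for in the question.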

Edited 5 Years Ago by pyTony: n/a

Why are you going through the following gyrations instead of using "t"? Also, please do not use "i", "l", or "O" as variable names, as they can look like the numbers 1 and 0.

t=s.split()
delimiter= ''
s=delimiter.join(t)
l=list(s)

The nice thing about using a list of (char_frequency, char) tuples is that they also sort the characters that have matching frequencies ...

# create a list of (char_frequency, char) tuples

import pprint

text = "supercalifragilisticexpialidocious"

# create a character list of the text
ch_list = list(text)

# create a list of (letter_freq, letter) tuples
# set(ch_list) creates a set of unique characters
# c.isalpha() is True for letters only
ltc = [(ch_list.count(c), c) for c in set(ch_list) if c.isalpha()]

# sort by increasing frequency
# also sorts the letters with matching frequencies
ltc.sort()

# pretty print the result
pprint.pprint(ltc)

''' my result ...
[(1, 'd'),
 (1, 'f'),
 (1, 'g'),
 (1, 't'),
 (1, 'x'),
 (2, 'e'),
 (2, 'o'),
 (2, 'p'),
 (2, 'r'),
 (2, 'u'),
 (3, 'a'),
 (3, 'c'),
 (3, 'l'),
 (3, 's'),
 (7, 'i')]
'''

Just a note that can help.
New in Python 2.7 is collections.Counter.

>>> from collections import Counter
>>> text = ['j', 'a', 'n', 'k', 'o', 's', 'i', 'e', 'n', 'a', 'p', 'i', 'v', 'o']
>>> Counter(text)
Counter({'a': 2, 'i': 2, 'o': 2, 'n': 2, 'e': 1, 'k': 1, 'j': 1, 'p': 1, 's': 1, 'v': 1})

It also has a most_common feature.

>>> Counter(text).most_common(5)
[('a', 2), ('i', 2), ('o', 2), ('n', 2), ('e', 1)]
>>>
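Building on Counter, the whole exercise could be sketched roughly as follows. This assumes Python 3, and the choice to lowercase the text and keep only alphabetic characters is an assumption, not something stated in the exercise:

```python
from collections import Counter

def most_frequent(text):
    # Count letters only, ignoring case, spaces and punctuation (an assumption).
    counts = Counter(ch for ch in text.lower() if ch.isalpha())
    # most_common() with no argument returns every (letter, count) pair,
    # ordered from the highest count to the lowest.
    return counts.most_common()

for letter, count in most_frequent("janko sie na pivo"):
    print(letter, count)
```

The relative order of letters that share the same count depends on the Python version, so it is safer not to rely on it.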

Edited 5 Years Ago by snippsat: n/a

The original problem is that the OP is iterating through the list

l=list(s) #['j', 'a', 'n', 'k', 'o', 's', 'i', 'e', 'n', 'a', 'p', 'i', 'v', 'o']
f=[]
for i in l:   ## <-- iterating through the input list
    f.append(l.count(i)) # [1, 2, 2, 1, 2, 1

instead of through a list of letters, which would eliminate the duplicate problem.

import string

l=list(s) #['j', 'a', 'n', 'k', 'o', 's', 'i', 'e', 'n', 'a', 'p', 'i', 'v', 'o']
f=[]
##for i in l:
## string.lowercase is Python 2; in Python 3 it is string.ascii_lowercase
for i in string.lowercase:   ## "abcdef...etc" --> count each letter once
    f.append(l.count(i)) # [2, 0, 0, 0, 1, ...

but since this is homework, no one wants to give out a complete solution.
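To illustrate only the idea of looping over the alphabet instead of over the input, here is a small sketch assuming Python 3. The name alphabet_counts is hypothetical, and using string.ascii_lowercase means accented letters from other languages are not counted:

```python
import string

def alphabet_counts(text):
    # Hypothetical helper: visit each letter of the alphabet exactly once,
    # so no letter can appear twice in the result.
    text = text.lower()
    pairs = []
    for letter in string.ascii_lowercase:
        count = text.count(letter)
        if count:                      # leave out letters that never occur
            pairs.append((count, letter))
    return pairs

print(alphabet_counts("janko sie na pivo"))
```

The pairs come out in alphabetical order, so they still need sorting by count.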

Edited 5 Years Ago by woooee: n/a

Thank you, all of you! Here is my version.

def most_frequent(s):
    t=s.split()
    delimiter= ''
    s=delimiter.join(t) # jankosielnapivo
    slabiky=list(s) #['j', 'a', 'n', 'k', 'o', 's', 'i', 'e', 'l', 'n', 'a', 'p', 'i', 'v', 'o']
    frekvencia=[]
    for i in slabiky:
        frekvencia.append(slabiky.count(i)) # [1, 2, 2, 1, 2, 1, 2, 1, 1, 2, 2, 1, 2, 1, 2]
    tup=zip(slabiky,frekvencia)

    vysledok=[]
    for i in tup:
        if i not in vysledok:
            vysledok.append(i)
    vysledok.sort(reverse=False)
    return vysledok

def main():
    s='janko siel na pivo'
    print most_frequent(s)

if __name__ == '__main__':
    main()
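Note that this version returns (letter, count) pairs sorted alphabetically by letter, while the exercise asks for decreasing order of frequency. A possible adjustment, sketched in Python 3 with a hypothetical helper name sort_by_frequency, is to sort with a key that puts the count first:

```python
def sort_by_frequency(pairs):
    # Hypothetical helper: highest count first; letters with the same
    # count stay in alphabetical order.
    return sorted(pairs, key=lambda pair: (-pair[1], pair[0]))

pairs = [('a', 2), ('e', 1), ('i', 2), ('j', 1)]
print(sort_by_frequency(pairs))
# prints [('a', 2), ('i', 2), ('e', 1), ('j', 1)]
```

Negating the count in the key avoids reverse=True, which would also reverse the alphabetical order of the ties.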