Hello,

I am doing the following exercise:
Write a function called most_frequent that takes a string and prints the letters in decreasing order of frequency. Find text samples from several different languages and see how letter frequency varies between languages. Compare your results with the tables at wikipedia.org/wiki/Letter_frequencies.
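The Wikipedia tables list each letter's share of all letters as a percentage, so raw counts are easier to compare across languages once they are normalized. Below is a minimal sketch. It assumes Python 3, ignores case, and counts only alphabetic characters (`str.isalpha()` also accepts accented letters such as č or á). `letter_percentages` is a hypothetical helper name and is not part of the exercise:

```python
from collections import Counter


def letter_percentages(text):
    """Return (letter, percent) pairs, most frequent first.

    Hypothetical helper: assumes case is ignored and only
    alphabetic characters (including accented ones) are counted.
    """
    letters = [c.lower() for c in text if c.isalpha()]
    total = len(letters)
    if total == 0:
        return []  # no letters, nothing to report
    counts = Counter(letters)
    # decreasing count first, then alphabetical order to break ties
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(letter, 100.0 * n / total) for letter, n in ordered]


for letter, pct in letter_percentages("Janko siel na pivo"):
    print("%s %.1f%%" % (letter, pct))
```

If you run it on a longer text in each language, you get percentages you can set side by side with the Wikipedia columns. A one-sentence sample like this one is far too short to be representative.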
I tried my best, but I am not sure whether my result is what the exercise requires, because no answer is provided to check it against (to see whether my method deviates much from the intended one).
Maybe the result should look like this (below), but if so, I don't know how to remove the duplicates from the tuples. Can you please help me?
[(2, 'o'), (2, 'n'), (2, 'i'), (2, 'a'), (1, 'v'), (1, 's'), (1, 'p'),...]
My result looks like this:
['o', 'n', 'i', 'a', 'v', 's', 'p', 'k', 'j', 'e']
but we can't see the frequencies...

What would be the most appropriate result?

Thank you very much!

Vlady

def most_frequent(s):
    t=s.split()
    delimiter= ''
    s=delimiter.join(t)
    l=list(s) #['j', 'a', 'n', 'k', 'o', 's', 'i', 'e', 'n', 'a', 'p', 'i', 'v', 'o']
    f=[]
    for i in l:
        f.append(l.count(i)) # [1, 2, 2, 1, 2, 1, 2, 1, 2, 2, 1, 2, 1, 2]
    tup=sorted(zip(f,l), reverse=True) # sorted() also works where zip() returns an iterator
    res=[]
    for freq,letter in tup:
        if letter not in res:
            res.append(letter)
    print(res) # ['o', 'n', 'i', 'a', 'v', 's', 'p', 'k', 'j', 'e']


def main():
    s='janko sie na pivo'
    most_frequent(s)

if __name__ == '__main__':
    main()
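One way to get the tuple list shown in the question is to keep the sorted (count, letter) pairs but track which letters have already been taken, instead of collecting bare letters. This is a minimal sketch. It assumes Python 3, counts only alphabetic characters, and returns the list instead of printing it inside the function:

```python
def most_frequent(s):
    # assumption: only letters are counted; spaces and punctuation are skipped
    letters = [c for c in s if c.isalpha()]
    # one (count, letter) pair per character, so repeated letters repeat here
    pairs = sorted(((letters.count(c), c) for c in letters), reverse=True)
    result = []
    seen = set()  # letters that already have a tuple in result
    for freq, letter in pairs:
        if letter not in seen:
            seen.add(letter)
            result.append((freq, letter))
    return result


print(most_frequent('janko sie na pivo'))
# [(2, 'o'), (2, 'n'), (2, 'i'), (2, 'a'), (1, 'v'), (1, 's'), (1, 'p'), (1, 'k'), (1, 'j'), (1, 'e')]
```

Ties come out in reverse alphabetical order (o, n, i, a) because reverse=True reverses the whole tuple comparison, not just the count.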


I would include the letter in the tuple appended to the list at line 8 (f.append(l.count(i))), or loop over the (count, letter) tuples prepared by the zip function. I definitely would not use l as a variable name; use full, understandable words.

Why are you going through the following gyrations instead of just using "t"? Also, please do not use "i", "l", or "O" as variable names, because in many fonts they are hard to tell apart from other characters such as the digits 1 and 0.

t=s.split()
delimiter= ''
s=delimiter.join(t)
l=list(s)
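For comparison, the four lines above reduce to a single expression. Strings are also iterable and have their own count() method, so converting to a list is optional. Here is a small sketch. It assumes Python 3 and that the only goal is to drop whitespace or, in the second variant, everything that is not a letter:

```python
s = 'janko sie na pivo'

# split() + join() in one expression: removes every run of whitespace
no_spaces = "".join(s.split())

# alternatively, keep only alphabetic characters while building the list
letters = [c for c in s if c.isalpha()]

print(no_spaces)                    # jankosienapivo
print(letters == list(no_spaces))   # True for this sample (it has no punctuation)
print(no_spaces.count('a'))         # 2: strings have their own count() method
```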

The nice thing about using a list of (char_frequency, char) tuples is that sorting it also orders the characters with matching frequencies alphabetically, because tuples are compared element by element:

# create a list of (char_frequency, char) tuples

import pprint

text = "supercalifragilisticexpialidocious"

# create a character list of the text
ch_list = list(text)

# create a list of (letter_freq, letter) tuples
# set(ch_list) creates a set of unique characters
# c.isalpha() is True for letters only
ltc = [(ch_list.count(c), c) for c in set(ch_list) if c.isalpha()]

# sort by increasing frequency
# also sorts the letters with matching frequencies
ltc.sort()

# pretty print the result
pprint.pprint(ltc)

''' my result ...
[(1, 'd'),
 (1, 'f'),
 (1, 'g'),
 (1, 't'),
 (1, 'x'),
 (2, 'e'),
 (2, 'o'),
 (2, 'p'),
 (2, 'r'),
 (2, 'u'),
 (3, 'a'),
 (3, 'c'),
 (3, 'l'),
 (3, 's'),
 (7, 'i')]
'''
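The exercise asks for decreasing frequency, though. Calling ltc.sort(reverse=True) would also reverse the order of the tied letters (s, l, c, a). A key function that negates the count keeps ties alphabetical. This is a sketch that assumes Python 3 and alphabetical tie-breaking:

```python
text = "supercalifragilisticexpialidocious"

# one (count, letter) tuple per distinct letter
ltc = [(text.count(c), c) for c in set(text) if c.isalpha()]

# decreasing count; the letter is the tie-breaker, in alphabetical order
ltc.sort(key=lambda pair: (-pair[0], pair[1]))

print([letter for count, letter in ltc])
# ['i', 'a', 'c', 'l', 's', 'e', 'o', 'p', 'r', 'u', 'd', 'f', 'g', 't', 'x']
```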

Just a note that may help.
New in Python 2.7 is collections.Counter:

>>> from collections import Counter
>>> l = ['j', 'a', 'n', 'k', 'o', 's', 'i', 'e', 'n', 'a', 'p', 'i', 'v', 'o']
>>> Counter(l)
Counter({'a': 2, 'i': 2, 'o': 2, 'n': 2, 'e': 1, 'k': 1, 'j': 1, 'p': 1, 's': 1, 'v': 1})

It also has a most_common() method.

>>> Counter(l).most_common(5)
[('a', 2), ('i', 2), ('o', 2), ('n', 2), ('e', 1)]
>>>
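Applied to the exercise itself, Counter might be used as in the following minimal sketch. It assumes Python 3 and treats upper and lower case as the same letter. Without an argument, most_common() returns every element, but equal counts are not ordered alphabetically, so the extra sorted() call is what makes the tie order predictable:

```python
from collections import Counter


def most_frequent(s):
    # assumption: upper and lower case count as the same letter
    counts = Counter(c for c in s.lower() if c.isalpha())
    # most_common() orders by count; the extra sort makes ties alphabetical
    ordered = sorted(counts.most_common(), key=lambda item: (-item[1], item[0]))
    for letter, n in ordered:
        print(letter, n)
    return ordered


most_frequent('Janko siel na pivo')
```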

The original problem is that the OP is iterating through the list

l=list(s) #['j', 'a', 'n', 'k', 'o', 's', 'i', 'e', 'n', 'a', 'p', 'i', 'v', 'o']
f=[]
for i in l:   ## <-- iterating through the input list
    f.append(l.count(i)) # [1, 2, 2, 1, 2, 1

instead of through a list of letters, which would eliminate the duplicate problem.

import string

l=list(s) #['j', 'a', 'n', 'k', 'o', 's', 'i', 'e', 'n', 'a', 'p', 'i', 'v', 'o']
f=[]
##for i in l:
for i in string.ascii_lowercase:   ## "abcdef...etc" --> count each letter once
    f.append(l.count(i)) # one count per letter a-z: [2, 0, 0, 0, 1, ...]
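Taken one step further, counting over the alphabet gives one (count, letter) pair per letter, so no deduplication is needed. This is a minimal sketch that assumes Python 3 and case-insensitive counting. Note that string.ascii_lowercase covers only a to z, so accented letters (for example Slovak č or á) are silently skipped. That matters for the multi-language part of the exercise:

```python
import string


def most_frequent(s):
    s = s.lower()
    # count each letter of the alphabet exactly once, so no duplicates appear
    counts = [(s.count(letter), letter) for letter in string.ascii_lowercase]
    # drop letters that do not occur at all
    counts = [pair for pair in counts if pair[0] > 0]
    # decreasing count, alphabetical among ties
    counts.sort(key=lambda pair: (-pair[0], pair[1]))
    return counts


print(most_frequent('janko sie na pivo'))
```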

But since this is homework, no one wants to give out a complete solution.

Thank you all! Here is my version.

def most_frequent(s):
    t=s.split()
    delimiter= ''
    s=delimiter.join(t) # jankosielnapivo
    # slabiky ("syllables") holds the individual characters
    slabiky=list(s) #['j', 'a', 'n', 'k', 'o', 's', 'i', 'e', 'l', 'n', 'a', 'p', 'i', 'v', 'o']
    frekvencia=[] # frekvencia = "frequency"
    for i in slabiky:
        frekvencia.append(slabiky.count(i)) # [1, 2, 2, 1, 2, 1, 2, 1, 1, 2, 2, 1, 2, 1, 2]
    tup=zip(slabiky,frekvencia) # (letter, count) pairs

    vysledok=[] # vysledok = "result"
    for i in tup:
        if i not in vysledok: # skip repeated (letter, count) pairs
            vysledok.append(i)
    vysledok.sort(reverse=False) # tuples compare the letter first, so this sorts alphabetically
    return vysledok

def main():
    s='janko siel na pivo'
    print(most_frequent(s))

if __name__ == '__main__':
    main()
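One caveat about this version: the tuples are (letter, count), so vysledok.sort(reverse=False) compares the letter first. The pairs therefore come back in alphabetical order, not in decreasing order of frequency as the exercise asks. A sort key fixes that. Here is a minimal sketch that assumes Python 3 and alphabetical tie-breaking, with English names in place of slabiky/vysledok:

```python
def most_frequent(s):
    # same input handling as the version above: drop whitespace only
    letters = [c for c in s if not c.isspace()]
    # (letter, count) pairs; the set removes the repeated pairs
    pairs = set((letter, letters.count(letter)) for letter in letters)
    # highest count first, then alphabetical among equal counts
    return sorted(pairs, key=lambda pair: (-pair[1], pair[0]))


print(most_frequent('janko siel na pivo'))
# [('a', 2), ('i', 2), ('n', 2), ('o', 2), ('e', 1), ('j', 1), ('k', 1), ('l', 1), ('p', 1), ('s', 1), ('v', 1)]
```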