Hello,

I am working on an exercise from the book Think Python: How to Think Like a Computer Scientist by Allen B. Downey, Version 1.1.22, the random words exercise (Exercise 7).

I wrote a script (see below), but how can I build the list just once, so that each later run of the script makes a random choice from that list (if I understand the task correctly)? The book also gives alternatives, and I don't know which one is more efficient; can you tell me?

``````import random

def random_word(h):
    k = histogram(h)
    ##    print k
    t = []
    for word, freq in k.items():
        t.extend([word] * freq)  # output is a list, but it is rebuilt on every run of the script
    return random.choice(t)

def histogram(h):
    d = dict()
    for c in h:
        if c not in d:
            d[c] = 1
        else:
            d[c] += 1
    return d

print random_word('I went out for a walk')``````

Thank you

3 Contributors · 12 Replies · 13 Views · 8 Years Discussion Span · Last Post by pyTony

Letters have different frequencies, and the list is built from the input source to represent the frequency by multiplication.

Can you give me an example, please?

Can you give us the text of exercise 7?

13.7 Random words

To choose a random word from the histogram, the simplest algorithm is to build a list with multiple copies of each word, according to the observed frequency, and then choose from the list:

``````def random_word(h):
    t = []
    for word, freq in h.items():
        t.extend([word] * freq)

    return random.choice(t)``````

The expression [word] * freq creates a list with freq copies of the string word. The extend method is similar to append except that the argument is a sequence.
Exercise 7

This algorithm works, but it is not very efficient; each time you choose a random word, it rebuilds the list, which is as big as the original book. An obvious improvement is to build the list once and then make multiple selections, but the list is still big.

An alternative is:

1. Use keys to get a list of the words in the book.
2. Build a list that contains the cumulative sum of the word frequencies (see Exercise 10.1). The last item in this list is the total number of words in the book, n.
3. Choose a random number from 1 to n. Use a bisection search (See Exercise 10.8) to find the index where the random number would be inserted in the cumulative sum.
4. Use the index to find the corresponding word in the word list.

Write a program that uses this algorithm to choose a random word from the book.
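The four steps above can be sketched like this (the `histogram` helper and the variable names are my own, not from the book):

```python
import random
import bisect

def histogram(text):
    # word:frequency dictionary of the text
    d = {}
    for word in text.split():
        d[word] = d.get(word, 0) + 1
    return d

def random_word(hist):
    # step 1: a list of the words in the book
    words = list(hist.keys())
    # step 2: cumulative sum of the word frequencies;
    # the last item is the total number of words, n
    cumulative = []
    total = 0
    for w in words:
        total += hist[w]
        cumulative.append(total)
    # step 3: random number from 1 to n, located with a bisection search
    x = random.randint(1, total)
    index = bisect.bisect_left(cumulative, x)
    # step 4: the corresponding word in the word list
    return words[index]

print(random_word(histogram('the quick brown fox jumps over the lazy dog')))
```

A word with frequency f covers f consecutive positions in the cumulative list, so `bisect_left` returns each word with the right probability without ever expanding the histogram.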

Looks like the author is talking about the fact that a word:frequency dictionary of the book would be a lot shorter than a word list. Efficiency would be to get a representative random word from the book, that also reflects the frequency of the word, directly from the dictionary without expanding it into a long word list.

Imagine a book of 1 million words. A word list of every word in the book would have 1 million items. A word:frequency dictionary might only have 5,000 to 10,000 items, since duplicate words are taken care of in the frequency value.
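A toy illustration of that size difference (the text and counts here are made up):

```python
# duplicate words collapse into a single dictionary entry
text = 'spam ' * 1000 + 'eggs ' * 10 + 'ham'
words = text.split()          # one item per word occurrence
hist = {}
for w in words:
    hist[w] = hist.get(w, 0) + 1

print(len(words))   # 1011 items in the word list
print(len(hist))    # 3 items in the word:frequency dictionary
```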


This exercise requires a histogram of words, not letters.

Did you do that (Exercise 10.1)? You must remove unwanted characters (punctuation etc.) to get clean words at the end of a sentence, for example:

``````import string
sentences='It was a dark, cold night. Commander said: "I will not tolerate any insubordinates."'

print [i.strip(string.punctuation) for i in sentences.lower().split()]``````


I've corrected the script a bit because it wasn't doing what I'd expected.

``````import random

def random_words(t):
    k = hist(t)
    d = dict()
    t = []
    a = 1
    for i in k:
        d[i] = a
        a += 1
    for w, f in d.items():
        t.extend([w] * f)
    return random.choice(t)

def hist(t):
    l = t.split()
    return l

print random_words('vlady went for a beer')``````

but still, I don't know how I would proceed with Exercise 7. Is it worth learning?

Looks like the author is talking about the fact that a word:frequency dictionary of the book would be a lot shorter than a word list. Efficiency would be to get a representative random word from the book, that also reflects the frequency of the word, directly from the dictionary without expanding it into a long word list.

Imagine a book of 1 million words. A word list of every word in the book would have 1 million items. A word:frequency dictionary might only have 5,000 to 10,000 items, since duplicate words are taken care of in the frequency value.

Do you mean something like this?

``````import random

def random_words(t):
    g = words(t)
    t = []
    for w, f in g.items():
        t.extend([w] * f)
    return random.choice(t)

def words(t):
    k = hist(t)
    ##    print k
    d = dict()
    t = []
    for i in k:
        if i not in d:
            d[i] = 1
        else:
            d[i] += 1
    return d  # {'a': 1, 'went': 1, 'vlady': 1, 'beer': 2, 'for': 1}

def hist(t):
    l = t.split()  # ['vlady-went', 'for', 'a', 'beer', 'beer']
    for i in l:
        i = t.replace('-', ' ')  # vlady went for a beer beer
        l = i.split()
    return l  # ['vlady', 'went', 'for', 'a', 'beer', 'beer']

print random_words('vlady-went for a beer beer')``````

No, because
t.extend([w]*f)
extends into the large word list again!
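For reference, one way to pick a weighted random word straight from the word:frequency dictionary, without expanding it into a list at all (a sketch of the idea, not the book's bisection method):

```python
import random

def random_word(hist):
    # pick a position from 1 to the total word count, then walk the
    # dictionary subtracting frequencies until the position falls
    # inside some word's span
    total = sum(hist.values())
    x = random.randint(1, total)
    for word, freq in hist.items():
        x -= freq
        if x <= 0:
            return word

print(random_word({'vlady': 1, 'went': 1, 'for': 1, 'a': 1, 'beer': 2}))
```

This linear scan is O(number of distinct words) per choice; the book's cumulative-sum-plus-bisection version gets that down to O(log n) after a one-time setup.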

vegaseat means that you must reduce f to a reasonable size by choosing a scaling factor (in the case of normal book-sized input material).

The biggest problem is that some rare words are going to be multiplied by zero, which means they never appear in the output even though they are in the input.

vegaseat means that you must reduce f to a reasonable size by choosing a scaling factor (in the case of normal book-sized input material).

The biggest problem is that some rare words are going to be multiplied by zero, which means they never appear in the output even though they are in the input.

I am not sure I am able to do it; I found it difficult.

You can find the maximum of the counts and the total number of words. If you want to prepare a list of 1000 elements, you would divide each count by the total count of words and multiply by the desired number of elements, here 1000. With int(x) you can drop the decimals from the results, or use integer division // (after multiplying by 1000).

Edited by pyTony: Multiplication before integer division
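A sketch of that scaling (the histogram and counts are invented for illustration; note how integer division drops the rarest words to zero copies, which is the problem mentioned above):

```python
hist = {'the': 6000, 'of': 3500, 'unicorn': 2, 'zyzzyva': 1}
total = sum(hist.values())       # 9503 words in all
size = 1000                      # desired length of the scaled list

scaled = []
for word, count in hist.items():
    copies = count * size // total   # multiply first, then integer-divide
    scaled.extend([word] * copies)

print(len(scaled))               # 999: close to the requested 1000
print(scaled.count('zyzzyva'))   # 0: the rare word disappeared entirely
```

A common workaround is `max(1, copies)`, so every word keeps at least one slot, at the cost of slightly distorting the proportions.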