Python + NLTK Question

Question

rmbrown09 0 Newbie Poster

11 Years Ago

Hello,

I am trying to generate word frequencies using n-grams. I have taken the Brown corpus from NLTK and adapted it for n-gram calculations by adding <s> at the beginning of each sentence and </s> at the end (in place of the period). I need to calculate the frequencies from this file but am unsure how to go about it. My end goal is to generate random n-grams based on bigrams, trigrams and quadgrams.

How should I go about these calculations? Thank you.

from nltk.corpus import brown

def alter_list(row):
    # Wrap a tokenized sentence in <s> ... </s>, replacing a final period.
    row = list(row)  # copy, so the corpus data itself is not modified
    if row and row[-1] == '.':
        row[-1] = '</s>'
    else:
        row.append('</s>')
    return ['<s>'] + row

# All sentences of the 'editorial' category, each a list of tokens.
news = brown.sents(categories='editorial')
print(len(news), '\n')

for row in news:
    print(alter_list(row))

nltk python

Edited 11 Years Ago by TrustyTony because: Unindented the body text


All 3 Replies

TrustyTony 888 pyMod

11 Years Ago

mark_sentance is undefined, and alter_list is never called. Why do you take only a slice of 5 items from news, and how long are they? You call them row.


rmbrown09 0 Newbie Poster · Answer 1 · 2012-10-25T05:59:39+00:00

Sorry, it should be fixed now and show the whole corpus!

rmbrown09 0 Newbie Poster · Answer 2 · 2012-10-25T16:45:24+00:00

After some more looking around, I think this equation will do just fine; I just need a little help implementing it. What would be the best way to go about doing this?

Equation image here: http://cl.ly/image/2R0G3B2q1v0S
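
The linked image is not reproduced here, so the following is only a sketch under an assumption: that the equation is the usual relative-frequency estimate, P(word | context) = count(context + word) / count(context). The function names (build_model, probability, generate) and the toy sentences are hypothetical. Sentences are assumed to carry a single <s> and </s> as produced by alter_list, and n is assumed to be at least 2; for n above 2 extra <s> tokens are prepended so that the first real word has a full context.

```python
import random
from collections import Counter, defaultdict

def build_model(sentences, n):
    """Map each (n-1)-token context to a Counter of the words that follow it."""
    model = defaultdict(Counter)
    for sent in sentences:
        # Add extra <s> padding so the first word has a full context.
        padded = ['<s>'] * (n - 2) + list(sent)
        for i in range(len(padded) - n + 1):
            context = tuple(padded[i:i + n - 1])
            model[context][padded[i + n - 1]] += 1
    return model

def probability(model, context, word):
    """Relative-frequency estimate: count(context + word) / count(context)."""
    followers = model.get(tuple(context))
    if not followers:
        return 0.0
    return followers[word] / sum(followers.values())

def generate(model, n, rng=random, max_len=30):
    """Sample words one at a time until </s> or max_len words are produced."""
    out = ['<s>'] * (n - 1)
    while len(out) < max_len + n - 1:
        followers = model.get(tuple(out[len(out) - (n - 1):]))
        if not followers:
            break  # unseen context: nothing to sample from
        words = list(followers)
        weights = [followers[w] for w in words]
        word = rng.choices(words, weights=weights)[0]
        if word == '</s>':
            break
        out.append(word)
    return out[n - 1:]  # drop the leading <s> padding

# Toy stand-in for the padded Brown sentences (hypothetical data).
toy_sentences = [
    ['<s>', 'the', 'cat', 'sat', '</s>'],
    ['<s>', 'the', 'dog', 'sat', '</s>'],
]

bigram_model = build_model(toy_sentences, 2)
trigram_model = build_model(toy_sentences, 3)
sample = generate(trigram_model, 3, random.Random(0))
```

This does no smoothing, so any context that never occurs in the corpus simply ends the generated sentence; with quadgrams on a corpus of this size that will happen often.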
