I am using this code to get all synonyms for the words in a text document named "answer_tokens.txt", but it is only listing the words in the document without the synonyms. Can someone check it out?
from nltk.corpus import wordnet
with open('answer_tokens.txt') as a: #opening the tokenised answer file
    wn_tokens = (a.read())
    #printing the answer tokens word by word as opened
    print('==========================================')
    synonyms = []
    for b in word_tokenize(wn_tokens):
            print (str (b))

for syn in wordnet.synsets (b):
                    for l in syn.lemmas(b):
                            synonyms.append(l.name())


It seems to me that there is an indentation error at line 11. Line 11 should be inside the for b loop.
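
In other words, something like this (a minimal sketch; it assumes word_tokenize is imported from nltk, which the original snippet omits):

from nltk.corpus import wordnet
from nltk import word_tokenize   # missing from the original snippet

with open('answer_tokens.txt') as a:
    wn_tokens = a.read()

print('==========================================')
synonyms = []
for b in word_tokenize(wn_tokens):
    print(str(b))
    for syn in wordnet.synsets(b):    # moved inside the for b loop
        for l in syn.lemmas():        # lemmas() takes no word argument
            synonyms.append(l.name())

print('==========================================')
print(set(synonyms))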

@Gribouillis I have tried the proposed change; there is still no synonym output. This is what it's giving:

    [
    ,
    'Compare
    '
    ,
    'dynamic
    '
    ,
    'case
    '
    ,
    'data
    '
    ,
    'changing
    '
    ,
    '
    ,
    '
    ,
    'example
    '
    ,
    'watching
    '
    ,
    'video
    '
    ]
    ===================================================
    set()
    ==================================================

If you can attach answer_token.txt to a post, I could try the code.

This is the expected output: each word in the text document, followed by its synonyms.

[
    ,
    'Compare
    '
    ,
    'dynamic
    '
    ,
    'case
    '
    ,
    'data
    '
    ,
    'changing
    '
    ,
    '
    ,
    '
    ,
    'example
    '
    ,
    'watching
    '
    ,
    'video
    '
    ]
    ===================================================
  'Compare'{'equate', 'comparison', 'compare', 'comparability', 'equivalence', 'liken'}

  'dynamic'{'dynamic', 'active', 'dynamical', 'moral_force'}

'case' {'display_case', 'grammatical_case', 'example', 'event', 'causa', 'shell', 'pillow_slip', 'encase', 'character', 'cause', 'font', 'instance', 'type', 'casing', 'guinea_pig', 'slip', 'suit', "typesetter's_case", 'sheath', 'vitrine', 'typeface', 'eccentric', 'lawsuit', 'showcase', 'caseful', 'fount', 'subject', 'pillowcase', "compositor's_case", 'face', 'incase', 'case'}

'data' {'data', 'information', 'datum', 'data_point'}

'changing'{'modify', 'interchange', 'convert', 'alter', 'switch', 'transfer', 'commute', 'change', 'vary', 'deepen', 'changing', 'ever-changing', 'shift', 'exchange'}

'example ' {'example', 'exemplar', 'object_lesson', 'representative', 'good_example', 'exercise', 'instance', 'deterrent_example', 'lesson', 'case', 'illustration', 'model'}

'watching' {'watch', 'observation', 'view', 'watching', 'watch_out', 'check', 'look_on', 'ascertain', 'learn', 'watch_over', 'observe', 'follow', 'observance', 'take_in', 'look_out', 'find_out', 'keep_an_eye_on', 'catch', 'determine', 'see'}

 'video' {'video_recording', 'video', 'television', 'picture', 'TV', 'telecasting'}
    ==================================================

But what is the input file?

Find the attachment answer_token.txt.

The previous code did not import word_tokenize because I had used the module in an earlier script. This should give the same output as shown above:

from nltk.corpus import wordnet
from nltk import word_tokenize

with open('answer_tokens.txt') as a:  # opening the tokenised answer file
    wn_tokens = a.read()
    # printing the answer tokens word by word as opened
    print('==========================================')

synonyms = []
for b in word_tokenize(wn_tokens):
    print(str(b))
    for syn in wordnet.synsets(b):  # separate loop variable so b is not shadowed
        for l in syn.lemmas():
            synonyms.append(l.name())

print('==========================================')
print(set(synonyms))

I'm getting this output on Kubuntu with Python 2.7, with almost the same code:

==========================================
Compare
this
with
the
dynamic
case
when
the
data
is
changing
,
for
example
watching
a
video.
==========================================
set(['represent', 'deoxyadenosine_monophosphate', 'ever-changing', 'dynamic', 'pillowcase', 'commute', 'follow', 'font', 'guinea_pig', 'object_lesson', 'equivalence', 'character', 'look_on', 'alter', 'watching', 'datum', 'watch', 'keep_an_eye_on', 'take_in', 'amp', 'fount', 'comparability', 'ascertain', 'view', 'observance', "typesetter's_case", 'constitute', 'see', 'cost', 'grammatical_case', 'angstrom_unit', 'event', 'subject', 'deterrent_example', 'showcase', 'antiophthalmic_factor', 'equate', 'causa', 'cause', 'exercise', 'be', 'exchange', 'modify', 'illustration', 'watch_over', 'active', 'lawsuit', 'change', 'comparison', 'convert', 'casing', 'shift', 'typeface', 'equal', 'data_point', 'personify', 'liken', 'look_out', 'determine', 'lesson', 'interchange', 'moral_force', 'transfer', 'caseful', 'vitrine', 'live', 'suit', 'type', 'shell', 'observe', 'representative', 'catch', 'deepen', 'axerophthol', 'dynamical', 'case', 'adenine', "compositor's_case", 'encase', 'exemplar', 'learn', 'embody', 'good_example', 'example', 'vary', 'compare', 'vitamin_A', 'exist', 'slip', 'sheath', 'check', 'information', 'make_up', 'instance', 'group_A', 'angstrom', 'display_case', 'A', 'eccentric', 'pillow_slip', 'find_out', 'changing', 'data', 'ampere', 'a', 'type_A', 'observation', 'incase', 'watch_out', 'face', 'switch', 'model', 'comprise'])

The code is this one, with NLTK version 2.0b9:

from nltk.corpus import wordnet
from nltk import word_tokenize

with open('answer_tokens.txt') as a:  # opening the tokenised answer file
    wn_tokens = a.read()
    # printing the answer tokens word by word as opened
    print('==========================================')

synonyms = []
for b in word_tokenize(wn_tokens):
    print(str(b))
    for syn in wordnet.synsets(b):
        for l in syn.lemmas:    # attributes, not methods, in nltk 2.0b9
            synonyms.append(l.name)

print('==========================================')
print(set(synonyms))

Edit: Code update
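
For reference, under NLTK 3 Synset.lemmas and Lemma.name are methods rather than attributes, so they need call parentheses. A minimal check, assuming an NLTK 3.x install:

from nltk.corpus import wordnet

# NLTK 2.x: syn.lemmas and l.name are attributes (as in the code above).
# NLTK 3.x: they are methods and must be called.
for syn in wordnet.synsets('case'):
    for l in syn.lemmas():
        print(l.name())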

Thank you. This is exactly the output we are looking for. I am on Windows 10, NLTK 3.2.1, and Python 3.4. Do we have a way to work it out on Windows?
