Hey, I have a really big file on my computer that contains only words. Because the file is so big, I'm sure some words are repeated. Is there a way to delete every repeated word while keeping one copy of each? Each word is on its own line, so the script will have to compare lines and delete the repeated ones, leaving one of each. Thanks, Dan08.
Dan08
Recommended Answers
Something like this?
def uniquelines(lineslist):
    # Keep the first occurrence of each line; lines are compared with
    # surrounding whitespace stripped, so "word" and "word\n" match.
    unique = set()
    result = []
    for item in lineslist:
        key = item.strip()
        if key in unique:
            continue  # already seen: drop the duplicate
        unique.add(key)
        result.append(item)
    return result

with open("wordlist.txt", "r") as file1:
    filelines = file1.readlines()

with open("wordlist_unique.txt", "w") as output:
    output.writelines(uniquelines(filelines))
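For a really big file, `readlines()` loads every line into memory at once. Here is a minimal sketch of the same first-occurrence idea as `uniquelines` that instead reads the file lazily, one line at a time. It assumes one word per line, and the names `first_occurrences` and `dedupe_file` are hypothetical. Only the set of distinct words stays in memory, and the original order is kept.

```python
def first_occurrences(lines):
    """Yield each word the first time it appears, preserving input order.

    Words are compared with surrounding whitespace stripped, and every
    yielded line ends with a newline (even if the input's last line did not).
    """
    seen = set()
    for line in lines:
        word = line.strip()
        if word not in seen:
            seen.add(word)
            yield word + "\n"


def dedupe_file(src_path, dst_path):
    # Iterating over a file object reads it one line at a time instead of
    # loading the whole file the way readlines() does.
    with open(src_path, "r") as src, open(dst_path, "w") as dst:
        dst.writelines(first_occurrences(src))
```

Usage would be something like `dedupe_file("wordlist.txt", "wordlist_unique.txt")`.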
Fast method, using sets (note that a set does not keep the original line order):
lines = open(myfile, 'r').readlines()
uniquelines = set(lines)
open(outfile, 'w').writelines(uniquelines)
which can be done in a single line:
open(outfile,'w').writelines(set(open(myfile,'r').readlines()))
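Two details are worth knowing about the set version. First, a set has no order, so the output lines come out in arbitrary order. Second, if the file's last line has no trailing newline, `"word"` and `"word\n"` count as different lines, and the newline-less one can end up glued to the next line in the output. The following sketch addresses both points by stripping each line before building the set and then sorting the result. The function names `unique_sorted` and `dedupe_sorted_file` are hypothetical.

```python
def unique_sorted(lines):
    # Strip whitespace (including the newline) so "word" and "word\n" match
    words = {line.strip() for line in lines}
    # Sets are unordered; sorting gives a deterministic output file
    return [word + "\n" for word in sorted(words)]


def dedupe_sorted_file(myfile, outfile):
    with open(myfile, "r") as src:
        result = unique_sorted(src)
    with open(outfile, "w") as dst:
        dst.writelines(result)
```

Sorting costs O(n log n) in the number of distinct words. If the original order matters more than a sorted result, an order-preserving first-occurrence approach like `uniquelines` is the better fit.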
All 6 Replies (from jcao219, tbone2sk, jice, Gribouillis, ultimatebuster, TrustyTony)