Hey, I have a really big file on my computer that contains only words. Because the file is so big, I'm sure there are repeated words in it. Is there a way to delete every repeated word, leaving one copy of each? Each word is on its own line, so the script will have to compare lines and delete the duplicates, leaving one of each. Thanks, Dan08.
Dan08 8
Junior Poster
Recommended Answers
Something like this?
def uniquelines(lineslist):
    unique = {}
    result = []
    for item in lineslist:
        if item.strip() in unique:
            continue
        unique[item.strip()] = 1
        result.append(item)
    return result

file1 = open("wordlist.txt","r")
filelines = file1.readlines()
file1.close()
output = open("wordlist_unique.txt","w")
output.writelines(uniquelines(filelines))
output.close()
Fast method, using sets:
lines=open(myfile,'r').readlines()
uniquelines=set(lines)
open(outfile,'w').writelines(uniquelines)
which can be done in a single line:
open(outfile,'w').writelines(set(open(myfile,'r').readlines()))
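One thing to keep in mind with the set versions: a set does not keep the original order of the words, and it compares the raw lines, so a last line with no trailing newline would not match an earlier copy that has one. For a really big file it may also help to avoid readlines() and read the file line by line instead. Below is a minimal sketch of that idea, not the method from either answer above. It assumes the distinct words still fit in memory, and the names dedupe_lines, dedupe_file, and the file names are made up for illustration. Unlike the first answer, it strips only the newline, so words that differ in other whitespace stay distinct.

```python
def dedupe_lines(lines):
    """Yield each line the first time it appears, keeping the input order."""
    seen = set()
    for line in lines:
        # Compare without the trailing newline so "word" and "word\n" match.
        key = line.rstrip("\n")
        if key in seen:
            continue
        seen.add(key)
        yield key + "\n"


def dedupe_file(src_path, dst_path):
    """Stream src_path into dst_path, dropping repeated lines."""
    # Iterating over the file object reads one line at a time,
    # so only the set of distinct words is held in memory.
    with open(src_path, "r") as src, open(dst_path, "w") as dst:
        dst.writelines(dedupe_lines(src))


if __name__ == "__main__":
    sample = ["apple\n", "pear\n", "apple\n", "pear"]
    print(list(dedupe_lines(sample)))  # ['apple\n', 'pear\n']
```

To run it on the file from the first answer, something like dedupe_file("wordlist.txt", "wordlist_unique.txt") should do. The with statement also closes both files, which the one-liner leaves to the interpreter.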
All 6 Replies
jcao219 18
Posting Pro in Training
tbone2sk 14
Junior Poster in Training
jice 53
Posting Whiz in Training
Gribouillis 1,391
Programming Explorer Team Colleague
ultimatebuster 14
Posting Whiz in Training
TrustyTony 888
pyMod Team Colleague Featured Poster