I've built a Counter (which is not available in Python 2.6) for reading a sequence file of strings into a dictionary and returning the sequences that are unique to that file (or to a number of files). I use sequences of X characters as keys in the dictionary and store how many times each key has been read (if key in mydictionary: mydictionary[key] += 1, else: mydictionary[key] = 1).
After reading the whole file I check which keys have the value 1 in the dictionary and save those entries to another dictionary of unique sequences. The problem is that the program consumes more than 2 GB of memory, and usage keeps growing until everything has been put into the dictionary. Is this common for dictionaries in Python, or could I have a memory leak in my code? The program consumes 2.5 GB for three files of 5.1 MB each.
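The counting scheme described above can be sketched like this; the sequence length `k`, the helper names, and the assumption of one record per line are mine, not the poster's actual code:

```python
def count_kmers(lines, k=8):
    # Count every k-character subsequence across all input lines.
    counts = {}
    for line in lines:
        seq = line.strip()
        for i in range(len(seq) - k + 1):
            key = seq[i:i + k]
            if key in counts:
                counts[key] += 1
            else:
                counts[key] = 1
    return counts

def unique_keys(counts):
    # Keep only the sequences that were seen exactly once.
    return dict((key, 1) for key in counts if counts[key] == 1)
```

Note that every distinct k-mer key stays in `counts` until the end, so memory grows with the number of distinct sequences, which matches the behaviour described above.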
mossa
Recommended Answers
Which version of Python are you using?
First, do not read the entire file at one time, i.e. readlines(), instead:
for rec in open(file_name, "r"):
    key = rec.strip()
    if key in mydictionary:
        mydictionary[key] += 1
    else:
        mydictionary[key] = 1
If that doesn't help then you will have to switch to an SQL file on disk. SQLite is simple to use …
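The SQLite suggestion can be sketched as follows; the table layout, function name, and one-sequence-per-line assumption are my own, not part of the original reply:

```python
import sqlite3

def count_on_disk(file_names, db_path="counts.db"):
    # Keep the counts in an on-disk SQLite table instead of an
    # in-memory dictionary, so memory use stays flat.
    con = sqlite3.connect(db_path)
    con.execute(
        "CREATE TABLE IF NOT EXISTS counts (seq TEXT PRIMARY KEY, n INTEGER)"
    )
    for file_name in file_names:
        for rec in open(file_name, "r"):
            key = rec.strip()  # assumes one sequence per line
            cur = con.execute(
                "UPDATE counts SET n = n + 1 WHERE seq = ?", (key,)
            )
            if cur.rowcount == 0:
                con.execute(
                    "INSERT INTO counts (seq, n) VALUES (?, 1)", (key,)
                )
    con.commit()
    # Return the sequences that occurred exactly once.
    return [row[0] for row in con.execute("SELECT seq FROM counts WHERE n = 1")]
```

Using `db_path=":memory:"` is handy for testing; pointing it at a real file moves the whole working set out of RAM.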