I've built a counter (collections.Counter is not available in Python 2.6) that reads a sequence file of strings into a dictionary and tries to return the sequences that are unique to that file, or to a number of files. I use sequences of length X characters as keys in the dictionary and store how many times each key has been read (if key in mydict: mydict[key] += 1 else: mydict[key] = 1).
After I have read the whole file, I check which keys have 1 as their value and save those entries to another dictionary of unique sequences. The problem is that the program consumes more than 2 GB of memory, and it keeps growing until everything has been put into the dictionary. Is this normal for dictionaries in Python, or could I have a memory leak in my code? The program consumes 2.5 GB for three files of 5.1 MB each.
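A minimal sketch of the approach described above, assuming fixed-length windows of X characters slid over each line (one plausible reading of the description); the names count_sequences and unique_sequences are illustrative, not from the original code:

```python
def count_sequences(lines, x):
    """Count every length-x substring seen across the input lines."""
    counts = {}
    for line in lines:
        seq = line.strip()
        # slide a window of length x over the sequence
        for i in range(len(seq) - x + 1):
            key = seq[i:i + x]
            if key in counts:
                counts[key] += 1
            else:
                counts[key] = 1
    return counts

def unique_sequences(counts):
    """Keep only the sequences that occurred exactly once."""
    return dict((k, v) for k, v in counts.items() if v == 1)
```

For example, count_sequences(["ABCAB"], 2) yields {"AB": 2, "BC": 1, "CA": 1}, and unique_sequences then keeps only "BC" and "CA".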
First, do not read the entire file at one time, i.e. readlines(); iterate over it line by line instead:

    mydict = {}
    for rec in open(file_name, "r"):
        key = rec.strip()
        if key in mydict:
            mydict[key] += 1
        else:
            mydict[key] = 1
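The same streaming loop can be written with collections.defaultdict, which removes the key-existence check and is available from Python 2.5 onward, so it works where Counter does not. The function name is illustrative:

```python
from collections import defaultdict

def count_records(records):
    """Count occurrences of each stripped record in an iterable of lines."""
    counts = defaultdict(int)
    for rec in records:
        counts[rec.strip()] += 1
    return counts
```

Passing open(file_name, "r") as records streams the file one line at a time, so only the dictionary itself occupies memory.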
If that doesn't help, then you will have to switch to an SQL database on disk. SQLite is simple to use …
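A hedged sketch of the SQLite suggestion: keep the counts in a table on disk rather than in a Python dict, so memory stays flat regardless of how many distinct sequences appear. Table and column names here are illustrative, and the two-statement insert/update pattern is used because it works on any SQLite version:

```python
import sqlite3

def count_to_sqlite(records, db_path=":memory:"):
    """Stream records into an on-disk (or in-memory) counts table."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS counts (seq TEXT PRIMARY KEY, n INTEGER)"
    )
    for rec in records:
        seq = rec.strip()
        # create the row with n = 0 if it is new, then bump the count
        conn.execute("INSERT OR IGNORE INTO counts VALUES (?, 0)", (seq,))
        conn.execute("UPDATE counts SET n = n + 1 WHERE seq = ?", (seq,))
    conn.commit()
    return conn

def unique_from_sqlite(conn):
    """Return the sequences that were seen exactly once."""
    return [row[0] for row in conn.execute("SELECT seq FROM counts WHERE n = 1")]
```

Using a real file path instead of ":memory:" keeps the whole table on disk; wrapping the loop in a single transaction, as above, keeps the per-row overhead tolerable.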