Hi, I have a huge file (over 60 GB) which has lines in the following consistent format.
The problem is that a few lines in this file have a line break right after the 3rd entry, like this:
I need to remove that extra newline and join the line below it onto the broken line so that it becomes a complete line again. I've come up with the following so far.
```python
in_file = open('myhugetextfile.txt')
out_file = open('mycleaneduptextfile.txt', 'w')

#go through the file line by line
for line in in_file:
    #split on ;; and check if the length is less than 11 entries long
    #if length is less than 11, it means the line has an unnecessary newline in it
    if len(line.split(';;')) < 11:
        #strip the unneeded newline char off the end of the line
        line = line.strip('\n')
        #Read the next line (incomplete) and store it
        newline = in_file.readline()
        #now join the original broken line and the next line
        repaired_line = line + newline
        #Write it to a new file
        out_file.write(repaired_line)
    #If there are no breaks in the line, just write it out to the new file
    else:
        out_file.write(line)
```
However, when I run this, I get the following error:
"ValueError: Mixing iteration and read methods would lose data"
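For reference, this message comes from Python 2's built-in file object: `for line in in_file` reads ahead into an internal buffer, so calling `in_file.readline()` inside that loop would skip the buffered data, and Python refuses. (As far as I know, Python 3's `io` file objects do not raise this particular error, but the fix below works in both.) A minimal sketch of the same loop that avoids it by advancing the *same* iterator with `next()` instead of `readline()`; the function name `repair_file` and its parameters are hypothetical, and it assumes, as the question does, that a broken record is split across exactly two physical lines.

```python
def repair_file(src_path, dst_path, expected_fields=11, sep=';;'):
    """Copy src_path to dst_path, re-joining records split by a stray newline.

    Hypothetical helper. Assumes a record is complete when it has at least
    `expected_fields` fields separated by `sep`, and that a broken record
    continues on exactly one following line.
    """
    with open(src_path) as in_file, open(dst_path, 'w') as out_file:
        for line in in_file:
            if len(line.split(sep)) < expected_fields:
                # Drop the stray newline, then pull the rest of the record from
                # the same iterator the for loop is using. next() is safe to mix
                # with iteration, unlike readline() on a Python 2 file object.
                # The '' default avoids StopIteration if the file ends here.
                line = line.rstrip('\n') + next(in_file, '')
            out_file.write(line)
```

Using `rstrip('\n')` rather than `strip('\n')` only touches the end of the line, and the `with` statement closes both files even if an exception is raised partway through a 60 GB run.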
Is my program logic correct or am I doing this the wrong way?
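The logic itself (count the `;;`-separated fields and, if there are fewer than 11, glue on the next physical line) works as long as every broken record is split exactly once. A slightly more defensive sketch, which goes beyond what the question states and is only an assumption about what might be useful, keeps appending fragments until the field count is reached, and is written as a generator over any iterable of lines so it can be tried on a small list before pointing it at the real file. The name `repair_lines` is hypothetical.

```python
def repair_lines(lines, expected_fields=11, sep=';;'):
    """Yield complete records from an iterable of physical lines.

    Hypothetical generalisation: fragments are joined until the record has
    at least `expected_fields` fields, so a record broken more than once is
    also repaired. Every yielded record ends with exactly one '\\n'.
    """
    buffer = ''
    for line in lines:
        # Append this physical line (minus its newline) to any unfinished record.
        buffer += line.rstrip('\n')
        if len(buffer.split(sep)) >= expected_fields:
            yield buffer + '\n'
            buffer = ''
    if buffer:
        # Input ended mid-record: emit what we have instead of silently dropping it.
        yield buffer + '\n'
```

Because it only ever holds one record in memory, it can stream the whole file, e.g. `out_file.writelines(repair_lines(in_file))` inside the same `with open(...)` block as above.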
Any help would be appreciated.