Hi,
I'm currently writing a program in Python to parse a log file.
The log file is expected to be very big (70 MB to 250 MB).
It contains a whole bunch of lines of internet activity.
I'm supposed to parse each line and extract some information.

I don't want to read all the lines in the file at once, and I'm pretty sure there is a better way to do it.

Is there any site or set of guidelines for dealing with large text files?
Could someone post a reasonable tutorial or an efficient algorithm for parsing a very big text file?

Or, if anyone here has already had experience dealing with files like this, some tips and tricks would be much appreciated.


I really am a newbie in Python. Thanks.

I am a newbie at Python as well and have never worked with a file of that size. Out of curiosity, I would try reading the entire file first to see if Python and the PC can handle it. If they cannot, I would try file.read(N) to read a certain number of bytes from the main file and then save that to a new file. Someone more experienced than I am will probably have a better solution. I am interested in finding out, since some of the files I am working with are getting larger.

David

Yeah, I wouldn't know for sure whether I could read it all at once. Even if I could, I'd rather not if there is a better way, because the file will grow and I've been told not to read it all at once. I just need someone to point out the most efficient way to handle things like this, unless reading the whole file at once really is the most efficient way of doing it in Python.

I'm pretty sure there is a common practice for handling this type of problem. I'm not familiar with it, and I don't know the term for "big text parsing/processing" either, if there is one.

I really need some help, so any advice that points me in the right direction would be welcome.

Thanks for the advice, though.

This will read one line at a time and you can process each line and put the result in a list or file:

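# open both files; "with" closes them automatically, even on an error
with open("LogFile2.txt", "r") as fin, open("ProcFile2.txt", "a") as fout:
    for line in fin:
        # process each line (whatever you need)
        # for example, keep only the first two characters
        process = line[:2]
        fout.write(process)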

Hmm, I already have that sort of thing in my code.
What I need is some tips and tricks on the common mechanisms for dealing with a big file,
i.e. more like what you would do, step by step:
breaking the text up, or reading it randomly, and maybe using a certain type of algorithm to process the file so that it uses less memory and the program runs faster.
What would be the best way to read a 70 MB to 250 MB text file?

Thanks.

What is it that you need to do with the input? Ene Uran already gave you one of the most efficient ways to read a large text file: one line at a time. It doesn't take much memory if you simply process each line and then throw it away. I routinely read files larger than 1 GB this way, keeping stats on specific details of interest.
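For what it's worth, here is a minimal sketch of the "process the line, then throw it away" idea. It only keeps a running count in memory, never the lines themselves. The function name count_first_field is made up for illustration, and it assumes each log line is whitespace-separated with the item of interest (say, a host or IP address) in the first field. Your log format may well differ.

```python
from collections import Counter


def count_first_field(path):
    """Count how often each first whitespace-separated field occurs.

    Hypothetical helper: assumes the field of interest is the first
    whitespace-separated token on each line.
    """
    counts = Counter()
    # errors="replace" keeps a stray undecodable byte from aborting the run
    with open(path, "r", errors="replace") as log:
        for line in log:  # the file object yields one line at a time
            fields = line.split()
            if fields:  # skip blank lines
                counts[fields[0]] += 1
    return counts
```

Something like count_first_field("LogFile2.txt").most_common(10) would then list the ten most frequent first fields. Memory use depends on the number of distinct values, not on the size of the file.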

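You can also use read(size) or readlines(size). In both cases, size limits how much is read per call: bytes in binary mode, characters in text mode on Python 3. For read() it is an upper limit; for readlines() it is only approximate, because readlines() always returns complete lines.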
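A rough sketch of both calls, in case it helps. The generator names iter_line_batches and iter_chunks and the default sizes are my own choices, not anything standard. Note that read(size) can cut a line in half at a chunk boundary, so for line-oriented logs the readlines() version (or plain line-by-line iteration) is usually easier to work with.

```python
def iter_line_batches(path, hint=1024 * 1024):
    """Yield lists of complete lines, each list roughly `hint` in size."""
    with open(path, "r", errors="replace") as f:
        while True:
            batch = f.readlines(hint)  # hint must be positive to limit the read
            if not batch:  # an empty list means end of file
                break
            yield batch


def iter_chunks(path, size=64 * 1024):
    """Yield raw byte chunks of at most `size` bytes; lines may be split."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(size)
            if not chunk:  # empty bytes means end of file
                break
            yield chunk
```

With either generator, only one batch or chunk is held in memory at a time, as long as you don't collect the results into a list.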

Jeff
