Hi Everyone,

I have a program that takes an HTML file as an argument, parses it, and outputs the data to a CSV file. It does this without any problem. However, I need it to take more than one HTML file, parse them all, and put all the collected data into one CSV file.

I have tried reusing the code that creates the CSV file, replacing .write with .append, but this throws an error.

The following is the code for reading the html file and writing the CSV file:

if __name__ == "__main__":
    try: # Put getopt in place for future usage.
        opts, args = getopt.getopt(sys.argv[1:],None)
    except getopt.GetoptError:
        print usage(sys.argv[0])  # print help information and exit:
        sys.exit(2)
    if len(args) == 0:
        print usage(sys.argv[0])  # print help information and exit:
        sys.exit(2)
    html_files = glob.glob(args[0])
    for htmlfilename in html_files:
        outputfilename = os.path.splitext(htmlfilename)[0]+'.csv'
        parser = html2csv()
        print 'Reading %s, writing %s...' % (htmlfilename, outputfilename)
        try:
            htmlfile = open(htmlfilename, 'rb')
            csvfile = open( outputfilename, 'w+b')
            data = htmlfile.read(8192)
            while data:
                parser.feed( data )
                csvfile.write( parser.getCSV() )
                sys.stdout.write('%d CSV rows written.\r' % parser.rowCount)
                data = htmlfile.read(8192)
            csvfile.write( parser.getCSV(True) )
            csvfile.close()
            htmlfile.close()
        except:
            print 'Error converting %s        ' % htmlfilename
            try:    htmlfile.close()
            except: pass
            try:    csvfile.close()
            except: pass
    print 'All done.                                      '

Does anyone have any advice on how to get the program to take more arguments, process them in the same way as above, and then append the data to the end of the CSV file?

Thanks in advance for any help. It is really appreciated!

Shaun

Why not open the csv file outside of the loop? That would result in having only one csv output.
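As a rough sketch of that suggestion (written in Python 3 syntax, whereas the code in the question is Python 2): open the output file once before the loop and write every converted input into it. The function html_to_csv_text below is a hypothetical stand-in for the html2csv parser from the question, and the name combined.csv is likewise just an assumption for illustration.

```python
import glob
import os
import tempfile


def html_to_csv_text(html_text):
    """Hypothetical stand-in for the html2csv parser used in the question.

    Here every non-empty input line simply becomes one CSV row.
    """
    lines = (line.strip() for line in html_text.splitlines())
    return ''.join(line + '\n' for line in lines if line)


def convert_all(html_files, outputfilename):
    """Write the converted rows of every input file into one CSV file."""
    # The output is opened once, before the loop, so it is truncated only once.
    with open(outputfilename, 'w') as csvfile:
        for htmlfilename in html_files:
            with open(htmlfilename, 'r') as htmlfile:
                csvfile.write(html_to_csv_text(htmlfile.read()))


if __name__ == '__main__':
    # Small demonstration with two throwaway input files.
    workdir = tempfile.mkdtemp()
    for name, text in (('a.html', '1,2\n'), ('b.html', '3,4\n')):
        with open(os.path.join(workdir, name), 'w') as f:
            f.write(text)
    inputs = sorted(glob.glob(os.path.join(workdir, '*.html')))
    convert_all(inputs, os.path.join(workdir, 'combined.csv'))
```

Because the file is opened only once, the rows from every input end up in the same output, in the order the inputs are processed.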

Hi, thank you for the reply.

When the code is running, there is only one output file; the problem is that it gets overwritten by the next .html file that is processed.

When I run python filename.py *.html

it processes all the .html files in the folder, but the one CSV file that is written only contains the data read from the last .html file. When I try to use .append instead of .write, the program doesn't run and throws an attribute error.

Does anyone have any ideas how I could do this?

Thanks

Shaun
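The attribute error described above is most likely because Python file objects provide a write method but no append method; whether data is appended is decided by the mode passed to open, not by the method called. A minimal sketch in Python 3 syntax (the helper name try_append is made up for this illustration):

```python
import os
import tempfile


def try_append(path):
    """Return the error message raised by calling .append on a file object."""
    with open(path, 'w+b') as csvfile:
        try:
            # File objects have write(), but no append() method.
            csvfile.append(b'a,b\n')
        except AttributeError as exc:
            return str(exc)
    return None


if __name__ == '__main__':
    fd, path = tempfile.mkstemp(suffix='.csv')
    os.close(fd)
    print(try_append(path))
    os.remove(path)
```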

When you open your CSV file, you use the mode "w+b". Any reason you're opening the file in binary mode instead of as a regular text file? Anyway, if you want to append to a file you have to open it in an append mode such as "a" or "a+" (or "ab" / "a+b" for binary).

If you take a look at http://docs.python.org/library/functions.html#open it says:

The most commonly-used values of mode are 'r' for reading, 'w' for writing (truncating the file if it already exists), and 'a' for appending (which on some Unix systems means that all writes append to the end of the file regardless of the current seek position)

I.e., when you open the file with "w+b", you truncate it (delete its existing content).
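To see the difference between the two modes, here is a small sketch in Python 3 syntax that opens the same file twice and writes one row each time. The helper name write_twice is made up for this example, and it only handles text modes such as 'w' and 'a'.

```python
import os
import tempfile


def write_twice(mode):
    """Open a fresh file twice in the given text mode; return its final content."""
    fd, path = tempfile.mkstemp(suffix='.csv')
    os.close(fd)
    try:
        for row in ('first\n', 'second\n'):
            # Each open() with 'w' truncates the file; 'a' keeps what is there.
            with open(path, mode) as f:
                f.write(row)
        with open(path, 'r') as f:
            return f.read()
    finally:
        os.remove(path)


if __name__ == '__main__':
    print(repr(write_twice('w')))  # only the second row survives
    print(repr(write_twice('a')))  # both rows are kept
```

With 'w' the first row is lost when the file is reopened, which matches the behaviour described in the question; with 'a' the second write lands after the first.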

Hope this can help

That's why I suggested you open the CSV file outside of the loop. Why open the CSV every time you open an HTML file? You only need to open it once (and close it once).

You can still take vidaj's advice if you use the same output file and want to maintain the data over multiple runs of your program. But you should also take the advice above. There's no need to open the file more than once.
