Hello,
I have been looking at using the BeautifulSoup Python module to make changes to some static HTML files (167 files in total). Being new to programming, I was wondering how to open a file, read and process it, write it, close it, and then open the next file, looping over all of them.

Can someone post a simple example of this loop for a given directory?
Thanks
David

This code walks recursively through all the subdirectories of your rootdir. It opens each file, reads its lines, calls a function to process each line (here, it does nothing), and overwrites the file with the new lines.

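import os

# Raw string so the backslashes are not treated as escape sequences
rootdir = r'c:\Your\Path'

def doWhatYouWant(line):
    # Placeholder: returns the line unchanged
    return line

for subdir, dirs, files in os.walk(rootdir):
    for file in files:
        # os.walk yields bare file names, so join each one with its directory
        path = os.path.join(subdir, file)
        f = open(path, 'r')
        lines = f.readlines()
        f.close()
        f = open(path, 'w')
        for line in lines:
            newline = doWhatYouWant(line)
            f.write(newline)
        f.close()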
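
One thing to keep in mind for HTML: a parser such as BeautifulSoup needs the whole document, not one line at a time, because a tag can span several lines. Here is a rough variation on the loop that reads each file whole and only touches HTML files. The function name rewrite_html_files, the transform callback, the extension list and the UTF-8 encoding are all assumptions for illustration (Python 3), so adjust them to your files.

```python
import os

def rewrite_html_files(rootdir, transform, extensions=('.html', '.htm')):
    """Apply transform(text) -> text to every HTML file under rootdir.

    Hypothetical helper: returns the paths that were actually rewritten.
    """
    changed = []
    for subdir, dirs, files in os.walk(rootdir):
        for name in files:
            # Skip anything that does not look like an HTML file
            if not name.lower().endswith(extensions):
                continue
            path = os.path.join(subdir, name)
            # Assumption: the files are UTF-8 encoded
            with open(path, 'r', encoding='utf-8') as f:
                original = f.read()
            updated = transform(original)
            # Only write the file back if the transform changed something
            if updated != original:
                with open(path, 'w', encoding='utf-8') as f:
                    f.write(updated)
                changed.append(path)
    return changed
```

You would call it as rewrite_html_files(rootdir, some_function), where some_function takes the page text and returns the modified text. It is worth trying it on a copy of the directory first, since the files are overwritten in place.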

Thank you for your reply.

Can you show me how to actually make a small change using BeautifulSoup? For example, each page has this HTML:

<table align="center" cellpadding="0" cellspacing="0" class="tbl_left_inside" >
<tr >
<td class="tbl_left_report_name">PV Clad Analysis</td>
</tr>
<tr>
<td class="table_package">200</td>
</tr>
<tr>
<td class="tbl_price">$79.99</td>
</tr>
</table>

So I would like to first reduce the price by 15%, then change it in the file, and then move on to the next file.

from BeautifulSoup import BeautifulSoup
doc =
soup = BeautifulSoup(''.join(doc))

How do I load the actual file into the doc list here? Then I guess I need to find the <td class="tbl_price"> and reduce the value by 15%.

Thanks
David
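
On the loading part: BeautifulSoup accepts the file's contents as one string (or the open file object), so there is no need to build a list first. For the 15% reduction itself, here is a minimal sketch using only the standard library. The name reduce_price and its percent parameter are made up for illustration, and it assumes the cell text looks like $79.99.

```python
from decimal import Decimal, ROUND_HALF_UP

def reduce_price(text, percent=15):
    """Turn a price string such as '$79.99' into the discounted price string.

    Hypothetical helper: assumes a leading '$' and a plain decimal amount.
    """
    amount = Decimal(text.strip().lstrip('$'))
    factor = (Decimal(100) - Decimal(percent)) / Decimal(100)  # 0.85 for 15%
    # Round to whole cents; Decimal avoids binary floating-point rounding issues
    reduced = (amount * factor).quantize(Decimal('0.01'), rounding=ROUND_HALF_UP)
    return '$%s' % reduced

print(reduce_price('$79.99'))  # 79.99 * 0.85 = 67.9915, so this prints $67.99
```

Using Decimal instead of float is a design choice here, not a requirement: it keeps the rounding to cents predictable for money values.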

Hello again,
can someone please help me with this, as I am stuck?

This is what I have so far:

>>> from BeautifulSoup import BeautifulSoup
>>> path = '/home/david/test/stack.html'
>>> html = open(path, 'r')
>>> html
<open file '/home/david/test/stack.html', mode 'r' at 0x5343c8>
>>> soup = BeautifulSoup(html)
>>> ord_tbl_price = soup.findAll('td', {'class': 'order_tbl_price'})
>>> ord_tbl_price
<td class="order_tbl_price"><span class="order_table_price_small">From</span>$23.60</td>
<td class="order_tbl_price">$79.99</td>
<td class="order_tbl_price">$39.99</td>
<td class="order_tbl_price"><span class="order_table_price"><span 
class="order_table_price_small">1 Blister Pack  - </span> $65.95
                </span></td>
>>>
>>> for x in ord_tbl_price:
...     price = float(x.contents[1].lstrip(' $'))
...     x.contents[1].replaceWith('$%0.2f' % (price * 0.85))

>>> # but here I get an error
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
IndexError: list index out of range
>>> # due to the extra <span>

So I have a <td> that contains a <span> with class="order_table_price".

Is there a way to get to the price field, reduce it by 15%, and then write the file out?

Thanks, David
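
The IndexError comes from x.contents[1]. A cell like <td class="order_tbl_price">$79.99</td> has only one child (the text), so there is no index 1. A cell where the price sits inside a <span> has the same problem. One way around it is to stop indexing and instead visit every text node inside the cell that contains a dollar amount. The sketch below assumes BeautifulSoup 4 (the bs4 package) rather than the BeautifulSoup 3 import used above, and the names PRICE_RE, discount_text and discount_html are hypothetical.

```python
import re
from decimal import Decimal, ROUND_HALF_UP

# Matches amounts such as $79.99 or $200 inside a piece of text
PRICE_RE = re.compile(r'\$(\d+(?:\.\d{1,2})?)')

def discount_text(text, percent=15):
    """Return text with every $amount in it reduced by percent."""
    factor = (Decimal(100) - Decimal(percent)) / Decimal(100)

    def _reduce(match):
        amount = Decimal(match.group(1)) * factor
        return '$%s' % amount.quantize(Decimal('0.01'), rounding=ROUND_HALF_UP)

    return PRICE_RE.sub(_reduce, text)

def discount_html(html, css_class='order_tbl_price', percent=15):
    """Discount every price found in <td class=css_class> cells of html."""
    # Assumption: BeautifulSoup 4 is installed (pip install beautifulsoup4).
    # Imported here so discount_text can be used without it.
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(html, 'html.parser')
    for td in soup.find_all('td', class_=css_class):
        # find_all(string=...) returns matching text nodes at any depth,
        # so prices nested inside <span> tags are found as well
        for node in td.find_all(string=PRICE_RE):
            node.replace_with(discount_text(node, percent))
    return str(soup)
```

To update a file, read it into a string, pass it to discount_html, and write the returned string back to the same path. Note that the parser may normalise the markup slightly when it is written out, so compare one file by hand before running it over all of them.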

I used to get this error a lot, and I was always able to fix it using range(len(list)). Example:

for i in range(len(list)):

I have only really noticed this with lists, though. I think adding range where you use the list (path) should solve this issue.

Whoops, just noticed this was originally posted 3 years ago :D
