Loop through files in a directory and modify them on the fly

Question

david_wislon 0 Newbie Poster

16 Years Ago

Hello,
I have been looking at using the BeautifulSoup Python module to make changes to some static HTML files, 167 files in total. Being new to programming, I was wondering how to open a file, read and process it, write it, close it, and then open the next file, so that the whole thing runs as a loop.

Can someone post a simple example of this loop for a given directory?
Thanks
David

python

jice 53 Posting Whiz in Training

16 Years Ago

This code walks recursively through all the subdirectories of your rootdir. It opens each file, reads the lines, calls a function to process each line (here, it does nothing), and overwrites the file with the new lines.

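import os

# Raw string so the backslashes in the Windows path are not treated as escapes
rootdir = r'c:\Your\Path'

def doWhatYouWant(line):
    # Placeholder: returns the line unchanged
    return line

for subdir, dirs, files in os.walk(rootdir):
    for file in files:
        # os.walk yields bare file names, so join each with its directory
        path = os.path.join(subdir, file)
        f = open(path, 'r')
        lines = f.readlines()
        f.close()
        f = open(path, 'w')
        for line in lines:
            newline = doWhatYouWant(line)
            f.write(newline)
        f.close()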
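
For the HTML files in the question, it may be easier to process each file as a whole rather than line by line, and to skip anything that is not an .html file. The following is a minimal sketch using only the standard library. The names process_html and rewrite_html_files are hypothetical, and UTF-8 is an assumption about how the files are encoded.

```python
from pathlib import Path


def process_html(text):
    """Hypothetical placeholder: returns the file contents unchanged."""
    return text


def rewrite_html_files(rootdir, process=process_html, encoding="utf-8"):
    """Rewrite every .html file under rootdir in place; return the paths changed."""
    changed = []
    # rglob descends into subdirectories and yields full paths
    for path in sorted(Path(rootdir).rglob("*.html")):
        if not path.is_file():
            continue  # skip directories whose names happen to end in .html
        original = path.read_text(encoding=encoding)
        updated = process(original)
        # Only write when the content differs, so other files stay untouched
        if updated != original:
            path.write_text(updated, encoding=encoding)
            changed.append(path)
    return changed
```

Since the files are overwritten in place, running this on a copy of the directory first is a sensible precaution.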

jice 53 Posting Whiz in Training

16 Years Ago

I don't know BeautifulSoup, as I don't parse XML or HTML, so I won't be able to help you with that. But if you google "beautifulsoup tutorial", you'll find plenty of examples.

There's another thread about a similar problem here:
http://www.daniweb.com/forums/thread90256.html

david_wislon 0 Newbie Poster · Answer 1 · 2008-05-29T13:43:24+00:00

Thank you for your reply.

Can you show me how to actually make a small change using BeautifulSoup? For example, each page contains this HTML:

<table align="center" cellpadding="0" cellspacing="0" class="tbl_left_inside" >
<tr >
<td class="tbl_left_report_name">PV Clad Analysis</td>
</tr>
<tr>
<td class="table_package">200</td>
</tr>
<tr>
<td class="tbl_price">$79.99</td>
</tr>
</table>

So I would like to reduce the price by 15%, write the change back to the file, and then move on to the next file.

from BeautifulSoup import BeautifulSoup
doc =
soup = BeautifulSoup(''.join(doc))

How do I load the actual file into the doc list here? Then I guess I need to find the <td class="tbl_price"> and reduce the value by 15%.

Thanks
David
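
The 15% reduction itself does not depend on the HTML parser. Here is a minimal sketch of that step, assuming prices always appear as a dollar sign followed by digits with up to two decimals, as in $79.99. The name discount_price is hypothetical, and Decimal is used instead of float so that rounding to cents is predictable.

```python
import re
from decimal import Decimal, ROUND_HALF_UP

# A dollar sign followed by digits and an optional one- or two-digit fraction
PRICE_RE = re.compile(r"\$(\d+(?:\.\d{1,2})?)")


def discount_price(text, percent=15):
    """Return text with every $ amount reduced by percent, rounded to cents."""
    factor = (Decimal(100) - Decimal(percent)) / Decimal(100)

    def repl(match):
        reduced = Decimal(match.group(1)) * factor
        cents = reduced.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
        return "$" + str(cents)

    # Text around the price (labels, whitespace) is left exactly as it was
    return PRICE_RE.sub(repl, text)


print(discount_price("$79.99"))  # prints $67.99
```

Because the helper works on plain strings, it can be applied to whatever text a parser hands back for the price cell.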

david_wislon 0 Newbie Poster · Answer 2 · 2008-06-03T13:09:53+00:00

Hello again,
Can someone please help me with this? I am stuck.

This is what I have so far:

>>> from BeautifulSoup import BeautifulSoup
>>> path = '/home/david/test/stack.html'
>>> html = open(path, 'r')
>>> html
<open file '/home/david/test/stack.html', mode 'r' at 0x5343c8>
>>> soup = BeautifulSoup(html)
>>> ord_tbl_price = soup.findAll('td', {'class': 'order_tbl_price'})
>>> ord_tbl_price
<td class="order_tbl_price"><span class="order_table_price_small">From</span>$23.60</td>
<td class="order_tbl_price">$79.99</td>
<td class="order_tbl_price">$39.99</td>
<td class="order_tbl_price"><span class="order_table_price"><span 
class="order_table_price_small">1 Blister Pack  - </span> $65.95
                </span></td>
>>>
>>> for x in ord_tbl_price:
...     price = float(x.contents[1].lstrip(' $'))
...     x.contents[1].replaceWith('$%0.2f' % (price * 0.85))

>>> # but here I get an error
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
IndexError: list index out of range
>>> # due to the extra <span>

So I have a <td> that contains a nested <span> with class="order_table_price".

Is there a way to get to the price field, reduce it by 15%, and then write the file out?

Thanks david
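
The IndexError in the session above comes from x.contents[1]: a cell such as <td class="order_tbl_price">$79.99</td> has only one child, so index 1 does not exist. One possible way around it is to search each cell for text nodes that contain a dollar amount, at any nesting depth, instead of indexing contents. The sketch below assumes the current bs4 package (the session above uses the older BeautifulSoup 3 import) and the built-in html.parser backend; reduce_prices is a hypothetical name.

```python
import re
from bs4 import BeautifulSoup

# A dollar sign followed by digits and an optional decimal part
PRICE_RE = re.compile(r"\$(\d+(?:\.\d+)?)")


def reduce_prices(html, factor=0.85, css_class="order_tbl_price"):
    """Return html with each $ amount inside matching <td> cells scaled by factor."""
    soup = BeautifulSoup(html, "html.parser")
    for cell in soup.find_all("td", class_=css_class):
        # string=<regex> matches text nodes at any depth, so a price wrapped
        # in one or more <span> elements is found as well
        for node in cell.find_all(string=PRICE_RE):
            new_text = PRICE_RE.sub(
                lambda m: "$%0.2f" % (float(m.group(1)) * factor), str(node)
            )
            node.replace_with(new_text)
    return str(soup)
```

To update a file, read its contents into a string, pass it to reduce_prices, and write the returned string back. The parser may normalize some markup on output (for example, stray whitespace inside tags), so comparing one converted file against its original before running all 167 is worthwhile.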

paxton91 0 Newbie Poster · Answer 3 · 2011-09-24T00:19:07+00:00

I used to get this error a lot, and I was always able to fix it using range(len(list)). Example:

for i in range(len(list)):

I have really only noticed this with lists, though. I think adding range where you use the list (path) should solve this issue.

paxton91 0 Newbie Poster · Answer 4 · 2011-09-24T00:20:15+00:00

Whoops, just noticed this was originally posted 3 years ago :D