Here is my code.
If I use a for loop to write, I get this error:
Traceback (most recent call last):
  File "m.py", line 20, in <module>
    f.write(tag[i].string)
TypeError: argument 1 must be string or read-only character buffer, not None


However, if I skip the for loop and instead
write f.write(tag[1].string)
it works. Why?


from BeautifulSoup import BeautifulSoup
import re
import os
import os.path
import urllib2

outfile = 'out.txt'

# Download the review page.
page = urllib2.urlopen("http://www.tripadvisor.com/ShowUserReviews-g294217-d305813-r49043831-Langham_Place_Hong_Kong-Hong_Kong_Hong_Kong_Region.html#REVIEWS")

soup = BeautifulSoup(page)
# Collect every element whose id starts with "review".
tag = soup.findAll(id=re.compile("^review"))

# Start from an empty output file.
if os.path.isfile(outfile):
  os.remove(outfile)
f = open(outfile, 'a')

# This is the loop that raises the TypeError.
for i in range(0, 5):
  f.write(tag[i].string)

All 4 Replies

Try inserting a print tag trace statement right before the for loop to make sure all elements in tag are actually strings...

I added print tag before the loop,
and it prints fine...

How about print ' || '.join([each.string for each in tag]) ... I've never used BeautifulSoup, so I don't know about the whole tag.string thing that you're doing...

I solved it.
Some of the children are not strings; they are HTML tags.

Because I am grabbing the data from the URL,
there are 5 tags of the form <p id=""> ..... in the HTML. I grab them with BeautifulSoup and store them in an object called "tag".

The length of tag is 5.
print tag is OK
write tag[0].string is OK
write tag[1].string is OK
write tag[2].string is NOT OK!!

Why? See the BeautifulSoup documentation:

string

For your convenience, if a tag has only one child node, and that child node is a string, the child node is made available as tag.string, as well as tag.contents[0]. In the example above, soup.b.string is a NavigableString representing the Unicode string "one". That's the string contained in the first <B> Tag in the parse tree.

soup.b.string
# u'one'
soup.b.contents[0]
# u'one'
But soup.p.string is None, because the first <P> Tag in the parse tree has more than one child. soup.head.string is also None, even though the <HEAD> Tag has only one child, because that child is a Tag (the <TITLE> Tag), not a NavigableString.

soup.p.string == None
# True
soup.head.string == None
# True
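
The rule quoted above can be modelled in a few lines. This is only a toy sketch written for Python 3: FakeTag is a hypothetical stand-in class, not BeautifulSoup's own code, and it assumes children are either plain strings or other FakeTag objects.

```python
# Toy model of the quoted rule; this is NOT BeautifulSoup's own code.
class FakeTag(object):
    """Hypothetical stand-in for an HTML tag with a list of children."""

    def __init__(self, name, contents):
        self.name = name
        self.contents = contents  # mix of str and FakeTag objects

    @property
    def string(self):
        # Only a tag with exactly one child, which is itself a string,
        # exposes that child as .string; otherwise the result is None.
        if len(self.contents) == 1 and isinstance(self.contents[0], str):
            return self.contents[0]
        return None


b = FakeTag("b", ["one"])
p = FakeTag("p", ["Great location", FakeTag("br", []), "Rooftop pool."])
head = FakeTag("head", [FakeTag("title", ["Page title"])])

print(b.string)     # one
print(p.string)     # None
print(head.string)  # None
```

The p case has the same shape as a review paragraph containing <br/> tags: several children, so there is no single string to return.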

So tag is actually a list, and each member holds a list of its own!
That inner list is called contents,
and some of its members may not be strings at all.
Look back at the 3rd paragraph on my web page:

<p id="review_49014668">This hotel is just superb. I'd had it recommended to me by a work colleague and decided that even at the price, I should give it a go. We arrived late from Australia (11pm) and were immediately whisked through the very interesting street level lobby, into a lift to Reception, checked in and taken to our room. From the moment we arrived to the time we were in our room would have been no more than 5 minutes and were passed from staff member to staff memeber who NEVER stopped smiling. Our whole stay was made so comfortable by the staff, the quality of the hotel and fantastic food in the Langham Place Shopping Centre. <br/><br/>These are the dot points I wrote while I was there.<br/><br/>Great location in Mongkok<br/>Close to ladies markets, sneaker St, sportswear St, temple St night markets. <br/>Next to Langham Place Shopping Centre, <br/>MTR located under shopping centre. <br/>Rooftop pool. <br/>Very helpful and attentive staff.</p>

It has not only the opening <p> tag but also <br/> tags inside it.
That is why the write fails: with more than one child, tag[2].string is None, and f.write cannot write None.
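
The failure itself can be reproduced without the web page. The sketch below is written for Python 3 and uses an in-memory io.StringIO as a stand-in for the output file; safe_write is a hypothetical helper name, and the exact error message differs from the Python 2 traceback in the question.

```python
import io

buf = io.StringIO()  # in-memory stand-in for the output file


def safe_write(out, text):
    """Write text only when it is a real string; report whether it wrote."""
    if text is None:
        return False
    out.write(text)
    return True


# Writing None is rejected, just like f.write(tag[2].string) in the loop.
try:
    buf.write(None)
    raised = False
except TypeError:
    raised = True

print(raised)
print(safe_write(buf, None))
print(safe_write(buf, "ok"))
```

Guarding against None like this avoids the crash, but it silently skips the paragraph, so it is not a full fix when the text is spread over several children.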

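So this is the solution:

f = open(outfile, 'a')

for i in range(0, 5):
  # Number each review so the output stays readable.
  f.write("\n" + str(i) + "\n")
  # Walk the children of the <p> tag and write only the text nodes,
  # skipping child tags such as <br/>.
  for child in tag[i].contents:
    if isinstance(child, unicode):
      f.write(child)

f.close()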
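
For comparison, here is a rough sketch of the same idea (keep the text, skip the inner tags) using only the Python 3 standard library instead of BeautifulSoup. The ParagraphText class name, the id "review_1", and the shortened sample HTML are made up for illustration.

```python
from html.parser import HTMLParser


class ParagraphText(HTMLParser):
    """Collect the text inside <p> tags, ignoring nested tags like <br/>."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # greater than 0 while inside a <p>
        self.pieces = []  # text chunks in document order

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "p" and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        # Text between tags arrives here; <br/> itself produces no data.
        if self.depth > 0:
            self.pieces.append(data)


# Shortened, hypothetical sample in the same shape as the review paragraph.
html = ('<p id="review_1">Great location in Mongkok<br/>'
        'Rooftop pool. <br/>Very helpful staff.</p>')

parser = ParagraphText()
parser.feed(html)
parser.close()

text = "\n".join(piece.strip() for piece in parser.pieces)
print(text)
```

Each chunk between <br/> tags ends up as its own list entry, which plays the same role as the string members of contents in the loop above.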