Can't send this list from terminal to txt file / drop each index onto a new line

Question

Niloofar24 15 Posting Whiz

10 Years Ago

Hello, me again :)
With this code:

>>> from BeautifulSoup import BeautifulSoup
>>> import urllib2
>>> url = urllib2.urlopen('http://www.python.org').read()
>>> soup = BeautifulSoup(url)
>>> links = soup('a')
>>> print links

This prints a list of links to the terminal. I want to write the list to a text file, so I tried this:

>>> with open('python-links.txt.', 'w') as f:
...     f.write(links)

But there was an error:

  File "<stdin>", line 2, in <module>
TypeError: expected a character buffer object

What is the problem? How can I fix that?

And one more question. The list looks like this (I will copy only a small part of it):

[<a href="#content" title="Skip to content">Skip to content</a>, <a id="close-python-network" class="jump-link" href="#python-network" aria-hidden="true">
<span aria-hidden="true" class="icon-arrow-down"><span>&#9660;</span></span> Close
                </a>, <a href="/" title="The Python Programming Language" class="current_item selectedcurrent_branch selected">Python</a>, <a href="/psf-landing/" title="The Python Software Foundation">PSF</a>,

So how can I put each link on a new line?
I tried this:

>>> text = '\n'.join(links)

But I got this error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: sequence item 0: expected string, Tag found

How can I do that?

python seo

All 3 Replies

Gribouillis 1,391 Programming Explorer

10 Years Ago

Python complains because the file's write() method needs a string argument, and links is a list of Tag objects, not a string. The correct way to handle this is to extract the values of the href= attributes, which contain the link targets. More generally, if you want to write any object to the file, you can use write(str(anything)).
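As a minimal sketch of the str() conversion described above, the snippet below joins non-string items one per line and writes them to a file. FakeTag is a hypothetical stand-in for a parsed link (it is not a BeautifulSoup class), and the file goes to a temporary directory so the example runs without a network connection or extra packages.

```python
import os
import tempfile


class FakeTag(object):
    """Hypothetical stand-in for a parsed <a> tag; not a BeautifulSoup class."""

    def __init__(self, href, text):
        self.href = href
        self.text = text

    def __str__(self):
        return '<a href="{}">{}</a>'.format(self.href, self.text)


links = [
    FakeTag('#content', 'Skip to content'),
    FakeTag('/psf-landing/', 'PSF'),
]

# '\n'.join(links) would raise TypeError because the items are not strings.
# Converting each item with str() first makes the join work.
text = '\n'.join(str(link) for link in links)

# Write to a temporary directory so the sketch leaves no stray files behind.
out_path = os.path.join(tempfile.mkdtemp(), 'python-links.txt')
with open(out_path, 'w') as f:
    f.write(text + '\n')

# Read the file back to confirm there is one link per line.
with open(out_path) as f:
    lines = f.read().splitlines()
```

The same pattern should apply to the real list: convert each Tag with str() before joining, or write the items one at a time inside a loop.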

snippsat 661 Master Poster · Answer 1 · 2015-02-27T12:00:50+00:00

Use the new bs4; do not use the old BeautifulSoup.
Do not call read(); BeautifulSoup detects the encoding and converts to Unicode.

As mentioned, you need to take out the href attributes,
and you must learn to study a webpage with Firebug or Chrome DevTools.
Then you will see that you only need the <a> tags that have an href attribute whose address starts with http.

from bs4 import BeautifulSoup  # Use bs4
import urllib2

url = urllib2.urlopen('http://www.python.org')  # Do not call read()
soup = BeautifulSoup(url, 'html.parser')  # Name the parser explicitly
with open('python-links.txt', 'w') as f:
    # href=True skips <a> tags that have no href attribute
    for link in soup.find_all('a', href=True):
        if link['href'].startswith('http'):
            f.write('{}\n'.format(link['href']))
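For readers without bs4 installed, here is a rough sketch of the same filtering idea using only the standard library. It swaps in Python 3's html.parser in place of BeautifulSoup, so it is not the method from the answer above. HrefCollector and sample_html are hypothetical names, and the markup is made up so that nothing is fetched from a live page.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collects href values of <a> tags that start with 'http'."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != 'a':
            return
        href = dict(attrs).get('href')  # None when the tag has no href
        if href and href.startswith('http'):
            self.hrefs.append(href)


# Made-up sample markup, used here instead of fetching a live page.
sample_html = (
    '<a href="#content">Skip to content</a>'
    '<a id="close-python-network" class="jump-link">Close</a>'
    '<a href="https://docs.python.org">Docs</a>'
    '<a href="http://pypi.python.org/">PyPI</a>'
)

collector = HrefCollector()
collector.feed(sample_html)

# One address per line, ready to be written with f.write(output).
output = ''.join('{}\n'.format(href) for href in collector.hrefs)
```

The anchor without an href and the in-page "#content" link are both skipped, which mirrors the two conditions in the answer: the tag must have an href, and the address must start with http.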

Niloofar24 15 Posting Whiz · Answer 2 · 2015-02-27T23:02:35+00:00

Thank you @Gribouillis.

Thank you @snippsat.
