I'm writing a script that downloads files for web design, and this line produces the following error:

Traceback (most recent call last):
  File "bootlace.py", line 60, in <module>
    download(data['jquery'], 'Downloading jquery. (File size %s)', 'js/')
  File "bootlace.py", line 11, in download
    file_size = int(meta.getheaders("Content-Length")[0])
IndexError: list index out of range

That call is repeated almost exactly further up the script. The JSON that is loaded into the data dictionary is:

{
      ...
      "jquery": "http://cdnjs.cloudflare.com/ajax/libs/jquery/2.0.3/jquery.min.js",
      ...
}

As you can see, the jquery key exists in that JSON. So what is happening?
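One quick way to confirm that the JSON lookup is not the problem: a missing key raises KeyError, not IndexError, so reaching line 11 of download() means data['jquery'] already succeeded. A minimal sketch, using only the jquery entry shown above (the other entries are elided in the question); the "bootstrap" key is a hypothetical missing key used for comparison:

```python
import json

# Only the "jquery" entry from the question; the other entries are elided there
raw = '{"jquery": "http://cdnjs.cloudflare.com/ajax/libs/jquery/2.0.3/jquery.min.js"}'
data = json.loads(raw)

print("jquery" in data)  # True: the key lookup itself works

try:
    data["bootstrap"]  # hypothetical missing key, for comparison only
except KeyError as exc:
    # A missing key fails with KeyError, not "list index out of range"
    print("missing key raises KeyError:", exc)
```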

Here is how to make a list index out of range error:

>>> lst = [1,2,3]
>>> lst[2]
3
>>> lst[3]
Traceback (most recent call last):
  File "<interactive input>", line 1, in <module>
IndexError: list index out of range

So it should be easy to understand if you look at the list (print it or use repr()).
Then you will see that the index you are trying to use is outside the list's range.
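For example, printing repr() of the list before indexing it shows the problem at once. A small sketch; first() is a hypothetical helper name, not part of any library, and returning a default is just one way to avoid the exception:

```python
def first(items, default=None):
    """Return items[0], or default if items is empty (instead of raising IndexError)."""
    print(repr(items))  # inspect the list before indexing it
    return items[0] if items else default


print(first([1, 2, 3]))  # prints [1, 2, 3] then 1
print(first([]))         # prints [] then None
```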

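This means that meta.getheaders("Content-Length") is returning an empty list.
If you then index it with [0], the empty list raises the list index out of range error.
That might happen if something went wrong in the urlopen call, or simply because the server did not send a Content-Length header at all (HTTP allows this, for example when the response uses chunked transfer encoding).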
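A sketch of guarding against the missing header, assuming Python 3, where u.info() returns an http.client.HTTPMessage (a subclass of email.message.Message) whose .get() returns None instead of failing. content_length() is a hypothetical helper name, and the header objects are built by hand so no network call is needed:

```python
import http.client


def content_length(meta):
    """Return Content-Length as an int, or None if the server did not send it."""
    value = meta.get("Content-Length")  # case-insensitive lookup, None if absent
    return int(value) if value is not None else None


# Stand-ins for u.info(): one with the header, one without
with_len = http.client.HTTPMessage()
with_len["Content-Length"] = "443469519"
print(content_length(with_len))                   # 443469519
print(content_length(http.client.HTTPMessage()))  # None
```

With a real response this would be used as size = content_length(u.info()). In Python 2 the same idea is meta.getheader("Content-Length"), which returns None rather than an empty list when the header is missing.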

Edited 2 Years Ago by snippsat

Not sure why that would be. The file does exist and I'm pulling something else from cdnjs, so it's not an issue with access rights.

This works ...

''' urllib2_info_getheaders.py

tested with Python275
'''

import urllib2

def download(url, download_dir):
    file_name = url.split('/')[-1]
    u = urllib2.urlopen(url)
    meta = u.info()  # the response headers
    # getheaders() returns a list of values for that header; the list
    # is empty (and [0] fails) if the server sent no Content-Length
    file_size = int(meta.getheaders("Content-Length")[0])
    print "Downloading: %s Bytes: %s" % (file_name, file_size)
    u.close()
    # saving into download_dir is a future option; opening the output
    # file here without writing to it would only leave an empty file


url = "http://www.cs.cmu.edu/~enron/enron_mail_20110402.tgz"
download_dir = "C:/Python27/Atest27/Bull" # future download option

download(url, download_dir)

''' result ...
Downloading: enron_mail_20110402.tgz Bytes: 443469519
'''

Try printing meta and type(meta) to see which headers the server actually sent back.
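A Python 3 sketch of that check, assuming urllib.request in place of urllib2; dump_headers() is a hypothetical helper. In Python 3, get_all() plays the role of Python 2's getheaders(), but it returns None rather than an empty list when the header is missing. The header object is built by hand here so the sketch runs without network access:

```python
import http.client


def dump_headers(meta):
    """Print the header object's type and each header; return the Content-Length values."""
    print(type(meta))
    for name, value in meta.items():
        print("{}: {}".format(name, value))
    # Python 3 counterpart of getheaders(): a list of values, or None if absent
    return meta.get_all("Content-Length")


# Stand-in for u.info(), with no Content-Length header
meta = http.client.HTTPMessage()
meta["Content-Type"] = "application/javascript"
print(dump_headers(meta))  # prints the type, the Content-Type header, then None
```

Against a live server, pass the real object instead: with urllib.request.urlopen(url) as u: dump_headers(u.info()).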

See:
http://nbviewer.ipython.org/github/ptwobrussell/Mining-the-Social-Web-2nd-Edition/blob/master/ipynb/Chapter%206%20-%20Mining%20Mailboxes.ipynb

Edited 2 Years Ago by vegaseat

Here is one with Requests.
It is so popular that it may end up in the standard library in the future, maybe in Python 3.4.

When the file is as big as here (over 400 MB), it is a good idea not to load it all into memory.
So this version downloads it in chunks of 4096 bytes.

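import requests

url = "http://www.cs.cmu.edu/~enron/enron_mail_20110402.tgz"
r = requests.get(url, stream=True)  # stream=True: don't read the body yet
r.raise_for_status()
file_name = url.split('/')[-1]
# .get() avoids a KeyError if the server sends no Content-Length
file_size = r.headers.get("Content-Length", "unknown")
print("Downloading: {} Bytes: {}".format(file_name, file_size))

with open(file_name, "wb") as f_out:
    # read and write the body 4096 bytes at a time
    for block in r.iter_content(4096):
        if block:  # skip empty keep-alive chunks
            f_out.write(block)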
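The chunked loop above is not specific to Requests. A standard-library sketch of the same idea, assuming any readable binary file-like object as the source (for example the object returned by urllib.request.urlopen()); copy_in_chunks() is a hypothetical name, and io.BytesIO stands in for the network and the output file:

```python
import io


def copy_in_chunks(src, dst, chunk_size=4096):
    """Copy src to dst chunk_size bytes at a time; return the number of bytes copied."""
    total = 0
    while True:
        block = src.read(chunk_size)
        if not block:  # read() returns b"" only at end of stream
            break
        dst.write(block)
        total += len(block)
    return total


# In-memory demo: 10000 bytes arrive in chunks of 4096, 4096 and 1808 bytes
src = io.BytesIO(b"x" * 10000)
dst = io.BytesIO()
print(copy_in_chunks(src, dst))  # 10000
```

Only one chunk is held in memory at a time, which is the point for a 400 MB file. The standard library's shutil.copyfileobj(src, dst, 4096) does essentially the same thing.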

Edited 2 Years Ago by snippsat
