This is my code for creating a URL list, and I'm getting an index out of range error. Please help!

#! /usr/bin/env python

import re

TOTAL_PAGES = 619
idFile = open("listOfURLs2", "r")
outFile = open("output", "w")
idList = []
urlList = []
temp = ""
tempStr = ""
currentList = []
index = 1


for line in idFile:
        idList.append(line)

for i in range(TOTAL_PAGES): #iterates through all input files
        try:
                f = open('detailedData/exchange_view.php?id=' + str(idList[index][:-1]), 'r')
                fullFile = f.read()
        except IndexError,e:
                print e

        temp = re.search('Publics_mPage=\d*">\d*</a> of.*?(\d*) ', fullFile, re.S)
        if temp is not None:
                i = 1
                while i <= int(temp.group(1)):
                        tempStr = "https://www.example.com/private/exchange_view.php?id=" + \
                                idList[index][:-1] + "&peerParticipantsPublics_mPage=" + str(i)
                        currentList.append(tempStr)
                        i += 1
        else:
                pass



        index += 1


for line in currentList:
        outFile.write(line)
        outFile.write(' ')


As already posted, the full traceback tells you where in the code you get the error.

Im getting an index out of range error? Please help!

So, here is how to produce an index out of range error:

>>> l = [1,2,3,4]
>>> l[3]
4
>>> l[4]
Traceback (most recent call last):
  File "<interactive input>", line 1, in <module>
IndexError: list index out of range

The print statement explains what happened.

>>> try:
...     l[4]
... except IndexError:
...     print "Trying to access an item in a list that isn't there"
...
Trying to access an item in a list that isn't there
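
To connect this to the posted script, here is a minimal sketch. It assumes the ID file holds exactly as many lines as the loop runs, which the thread does not confirm. The list `id_list` and its values are made up for illustration; only the pattern (start `index` at 1, add 1 per iteration) comes from the original code.

```python
# Hypothetical stand-in for idList; the real one is read from a file.
id_list = ["101\n", "102\n", "103\n"]
total_pages = len(id_list)  # assumption: loop count equals list length

index = 1          # same starting value as in the posted script
failed_at = None   # will hold (loop counter, index) at the failure

for i in range(total_pages):
    try:
        id_list[index]
    except IndexError:
        failed_at = (i, index)
        break
    index += 1

print(failed_at)  # (2, 3): the last iteration asks for index 3, valid indexes are 0..2
```

If that assumption holds for the real file, starting `index` at 0 (or looping over the list directly) would keep every lookup inside the list.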

Traceback (most recent call last):
  File "createURLList.py", line 21, in <module>
    f = open('detailedData/exchange_view.php?id=' + str(idList[index][:-1]), 'r')
IndexError: list index out of range

This is the error message I get. I tried putting in a try block and saw that my code is dying at the file open operation.

Looking at the code, you are doing some things that are not so good.
str(idList[index][:-1]) is not necessary, and don't do it inside open().
Look at this.

""" url.txt
ttp://www.youtube.com/
ttp://www.google.no/
ttp://www.sol.no/
""" 

with open('url.txt') as f:
    idList = [item.strip() for item in f]

print idList
#--> ['ttp://www.youtube.com/', 'ttp://www.google.no/', 'ttp://www.sol.no/']

As you can see, I get a list without the trailing \n.
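
A small sketch of why strip() is safer than slicing with [:-1]. The sample lines are made up; the point is only that the last line of a file may not end with a newline.

```python
# Hypothetical file contents; note the last line has no trailing "\n".
lines = ["101\n", "102\n", "103"]

sliced = [item[:-1] for item in lines]        # always drops the last character
stripped = [item.strip() for item in lines]   # drops surrounding whitespace only

print(sliced)    # ['101', '102', '10']  <- last ID lost a digit
print(stripped)  # ['101', '102', '103']
```

Also, the items are already strings, so the extra str() call adds nothing.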
Some test code you can look at:

>>> index = 0
>>> for i in range(3):
...     i, idList[index]
...     index +=1
...     
(0, 'ttp://www.youtube.com/')
(1, 'ttp://www.google.no/')
(2, 'ttp://www.sol.no/')
>>> for i in range(4):
...     i, idList[index]
...     index +=1
...     
Traceback (most recent call last):
  File "<interactive input>", line 2, in <module>
IndexError: list index out of range

This can be shorter; look at this.

>>> for i in range(3):
...     i, idList[i]
...     
...     
(0, 'ttp://www.youtube.com/')
(1, 'ttp://www.google.no/')
(2, 'ttp://www.sol.no/')
>>> for i in range(4):
...     i, idList[i]
...     
...     
(0, 'ttp://www.youtube.com/')
(1, 'ttp://www.google.no/')
(2, 'ttp://www.sol.no/')
Traceback (most recent call last):
  File "<interactive input>", line 2, in <module>
IndexError: list index out of range
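
Going one step further, you can drop the counter entirely and loop over the list itself, so there is no index that can run past the end. The following is only a sketch under assumptions: `build_urls` and the `pages_for_id` mapping are hypothetical names standing in for the regex search over the saved pages in the original script, and the IDs are made up. The URL pieces are copied from the posted code.

```python
def build_urls(ids, pages_for_id):
    """Return one URL per (id, page number) pair.

    pages_for_id is a hypothetical dict {id: page count}; in the original
    script the page count comes from a regex search on each saved page.
    """
    base = "https://www.example.com/private/exchange_view.php?id="
    urls = []
    for item_id in ids:  # iterate over the items, no manual index
        page_count = pages_for_id.get(item_id, 0)
        for page in range(1, page_count + 1):
            urls.append(base + item_id +
                        "&peerParticipantsPublics_mPage=" + str(page))
    return urls

# Made-up IDs, cleaned with strip() as shown earlier.
ids = [line.strip() for line in ["101\n", "102\n"]]
urls = build_urls(ids, {"101": 2, "102": 1})
for url in urls:
    print(url)
```

Because the loop runs once per item actually in the list, a constant like TOTAL_PAGES that can disagree with the file length is not needed.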
