Several weeks ago, I wrote a simple web crawler designed to retrieve robots.txt from a range of IP addresses. It was entirely CLI-based, and I had it working perfectly. As my Python experience has increased, I decided to start learning GUI programming, and for my first GUI application I decided to just 'improve' upon my existing web crawler. However, I broke something.

What's happening is that my itertools.product code works exactly as it did before, but for some reason the portion of the try block responsible for actually making the HTTP request does not execute until the entire range has been iterated, and then it only attempts to connect to the last IP address in the range. I determined this from a keenly placed print statement. The strange thing is that I've changed none of the original CLI-based code, except for the portions which receive input from the GUI screen instead of the command line. I'm guessing the problem has something to do with PyQt syntax that I'm overlooking. Does anyone with more experience with PyQt have any ideas? BTW, this is with Python 3.1.1 and PyQt4. Code is as follows:

#! /usr/bin/python3.1

import urllib
import urllib.request
import sys
import itertools
import time

from PyQt4 import QtCore, QtGui
from robo_ui import Ui_robominer


class Start(QtGui.QMainWindow):
    def __init__(self, parent=None):
        QtGui.QMainWindow.__init__(self, parent)
        self.ui = Ui_robominer()
        self.ui.setupUi(self)

        QtCore.QObject.connect(self.ui.mine_button, QtCore.SIGNAL("clicked()"), self.go_mining)
        QtCore.QObject.connect(self.ui.quit_button, QtCore.SIGNAL("clicked()"), self.quit_mining)
   
    def go_mining(self):
        ip_parts = []
        for part in self.ui.robo_target.text().split("."):
            if "-" in part:
                # low/high instead of min/max, which shadow the builtins
                low, high = part.split("-")
                ip_parts.append(range(int(low), int(high)+1))
            else:
                ip_parts.append([int(part)])
        
        ip_addresses = itertools.product(*ip_parts)
        for ip_addy in ip_addresses:
            # ip_addy is a 4-tuple, so a single %s would raise TypeError
            self.ui.textBrowser.append("Trying to fetch from %d.%d.%d.%d..." %ip_addy)
            try:
                time.sleep(2)        
                roboFile = urllib.request.urlopen("http://%d.%d.%d.%d/robots.txt" %ip_addy, timeout=5)
                fetched = roboFile.read()
                decoded = fetched.decode("utf8")
                self.ui.textBrowser.append(decoded)
               

            except Exception as err:
                e = str(err)
                self.ui.textBrowser.append(e)

    def quit_mining(self):
        sys.exit()


if __name__ == "__main__":
    app = QtGui.QApplication(sys.argv)
    myapp = Start()
    myapp.show()
    sys.exit(app.exec_())

All 4 Replies

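My guess is that time.sleep(2) interferes with the PyQt event loop. While go_mining is running, Qt cannot process paint events, so the text browser will not update until the loop finishes.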

You may have to use QTimer to accomplish your delay.
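
A minimal sketch of how the loop could be restructured so that a timer drives it one address at a time. StepwiseMiner, fetch, report and on_tick are hypothetical names, not part of the original program, and the PyQt4 wiring is only shown in comments as an assumption about how it would be hooked up. The stepping logic itself is plain Python:

```python
import itertools


class StepwiseMiner:
    """Handles one address per call to step(), so the GUI event loop
    can run between calls. Hypothetical helper for illustration."""

    def __init__(self, ip_parts, fetch, report):
        # ip_parts: one iterable of ints per octet, as built in go_mining
        self._addresses = itertools.product(*ip_parts)
        self._fetch = fetch      # callable(url) -> str, e.g. wrapping urlopen
        self._report = report    # callable(str), e.g. self.ui.textBrowser.append

    def step(self):
        """Process the next address; return False once the range is exhausted."""
        try:
            ip = next(self._addresses)
        except StopIteration:
            return False
        self._report("Trying to fetch from %d.%d.%d.%d..." % ip)
        try:
            self._report(self._fetch("http://%d.%d.%d.%d/robots.txt" % ip))
        except Exception as err:
            # Same behaviour as the original: show the error and carry on
            self._report(str(err))
        return True


# Assumed wiring inside the Start class (not exercised here):
#     self.timer = QtCore.QTimer(self)
#     QtCore.QObject.connect(self.timer, QtCore.SIGNAL("timeout()"), self.on_tick)
#     self.timer.start(2000)   # 2000 ms takes the place of time.sleep(2)
# where on_tick calls self.miner.step() and stops the timer when it returns False.
```

Each step() call still blocks while the request is in flight (up to the urlopen timeout), but control returns to the event loop between addresses, so the window gets a chance to repaint.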

That didn't work. I removed the timer code altogether, and it's still doing the same thing.

Hard to guess without knowing the content of the designer .ui file.

Actually, you were right about the QTimer thing. It wasn't working without any timer in it. I put the QTimer in just to see what would happen... at least I'm getting QTimer-related syntax errors now, and it is iterating the IP addresses correctly, so I just need to fool around with the QTimer. Thanks.
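
For anyone wanting to check the iteration order independently of the GUI, the range expansion from go_mining can be pulled out into a plain function. This is only a sketch: expand_targets is a hypothetical name, and it assumes the input always has four dot-separated parts.

```python
import itertools


def expand_targets(spec):
    """Expand a spec such as "10.0.0-1.1-2" into dotted-quad strings."""
    ip_parts = []
    for part in spec.split("."):
        if "-" in part:
            low, high = part.split("-")
            # The upper bound is inclusive, hence the +1
            ip_parts.append(range(int(low), int(high) + 1))
        else:
            ip_parts.append([int(part)])
    # product varies the last octet fastest, like nested for loops
    return ["%d.%d.%d.%d" % ip for ip in itertools.product(*ip_parts)]


if __name__ == "__main__":
    for address in expand_targets("10.0.0-1.1-2"):
        print(address)
```

Running it from the command line makes it easy to confirm that the addresses come out in the expected order before any GUI code is involved.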
