Hi everyone, I wrote a Python thread that opens a file and reads it. The problem is that I don't know how to return the data, since the thread will not return it from the run() method. I tried writing another method that gets the file content, but when I call it, it returns nothing. What should I do?

My code:

import threading, time

class LoadHtml(threading.Thread):

     def __init__(self,file):
         self.file=file
         self.html=""
         threading.Thread.__init__(self)

     def run(self):
         
         self.f=open(self.file,"r")
         self.data=self.f.read()
         self.f.close()
         self.html=self.html+self.data
         
     
     def getHTML(self):
         return self.html

loadhtml=LoadHtml("testindex.ind")
loadhtml.start()
print loadhtml.getHTML()

Wow, this is a head scratcher. When I started working through this I added a few print statements to the functions in the class so I could trace the execution of the program. What I found was that if I kept the print statements in place, the program would work as expected, and if I removed them then it stopped working.

Below is the code that is working properly on my machine. The only difference between your code and mine is the print statements. Not sure what to tell you.

import threading, time

class LoadHtml(threading.Thread):

     def __init__(self,file):
         print "BEGIN: self.__init__"
         self.file=file
         self.html=""
         threading.Thread.__init__(self)

     def run(self):
         print "BEGIN: self.run"
         self.f=open(self.file,"r")
         self.data=self.f.read()
         self.f.close()
         self.html=self.html+self.data
         
     
     def getHTML(self):
         return self.html

loadhtml=LoadHtml("testindex.ind")
loadhtml.start()
print loadhtml.getHTML()

I see two solutions. The first is to wait until the thread has finished before calling getHTML(). This is done with the join() method, like this:

...
loadhtml=LoadHtml("testindex.ind")
loadhtml.start()
loadhtml.join()
print loadhtml.getHTML()

The other method is to add a variable, done, to the thread object, which becomes True when the thread has finished loading the HTML content. We then wait until this variable becomes True, using a threading.Condition object, like this:

import threading, time

class LoadHtml(threading.Thread):

     def __init__(self,file):
         self.file=file
         self.html=""
         threading.Thread.__init__(self)
         self.cond = threading.Condition()
         self.done = False

     def run(self):
         self.cond.acquire()
         self.f=open(self.file,"r")
         self.data=self.f.read()
         self.f.close()
         self.html=self.html+self.data
         self.done = True # <-- we're setting self.done to True when the data is ready
         self.cond.notify() # <- and we notify waiters
         self.cond.release()


     def getHTML(self):
         self.cond.acquire() # <--
         while not self.done: #  <--
             self.cond.wait() #  <--  wait until self.done becomes True
         self.cond.release() #  <--
         return self.html

loadhtml=LoadHtml("testindex.ind")
loadhtml.start()
print loadhtml.getHTML()

In all of the code here, I don't really see the point of the threading. Unless you've simplified the code to demonstrate your question, reading from a file doesn't usually take very long (unless it's really big), and in any case there is nothing else going on, so we end up waiting for it anyway.

Now if the point was to read from a website instead of a file, that's an operation that can take some time, but unless we have something else to do, there is still no point to the threading.

If you were trying to collect data from multiple websites at once, then it makes more sense to have the threading. You could start all (or several) of the requests at once and wait for them all to be complete before doing something with the data. (The requests would all run at the same time instead of having to wait for the first site to respond before the second request was sent.)
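To make that concrete, here is a minimal sketch of starting several requests at once and waiting for all of them with join(). It is written for Python 3, and fake_fetch, Fetcher, and the example.com URLs are made-up names for illustration; fake_fetch just sleeps instead of making a real web request.

```python
import threading
import time


def fake_fetch(url):
    """Hypothetical stand-in for a slow web request (not a real download)."""
    time.sleep(0.1)  # pretend the network takes a while
    return "<html>content of %s</html>" % url


class Fetcher(threading.Thread):
    def __init__(self, url):
        threading.Thread.__init__(self)
        self.url = url
        self.html = ""

    def run(self):
        # Runs in the new thread; the result is stored on the object.
        self.html = fake_fetch(self.url)


urls = ["http://example.com/a", "http://example.com/b", "http://example.com/c"]
threads = [Fetcher(url) for url in urls]

for t in threads:
    t.start()   # all requests are now in flight at the same time
for t in threads:
    t.join()    # wait until every thread has finished

# Safe to read the results now, because every thread is done.
pages = [t.html for t in threads]
print("fetched %d pages" % len(pages))
```

Because the three simulated requests overlap, the whole thing should take roughly as long as one request rather than three.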

If your intent is something like the last case, then either of the two scenarios that Gribouillis presented would be good.

Another option would be to have the threaded object collect the data and then 'deliver' it through a callback. If you pass a function to the threaded object before you call start(), the threaded object can collect the data and then call the function, passing it the data. The callback could add the data to a list or do other processing on it, but any object accessed or modified by more than one thread (the main thread is still a thread) requires some form of locking to prevent two or more threads from modifying it at the same time. Without locking, the object could become corrupted: if one thread starts an update and is interrupted by another thread that also performs an update, then when the first thread resumes, the object is no longer in the state it expects.
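A rough sketch of the callback idea might look like the following. It targets Python 3, and Loader, store_result, results, and results_lock are names invented for this example; the "data" is a placeholder string rather than a real file or web read.

```python
import threading

results = []                      # shared between the worker threads
results_lock = threading.Lock()   # guards every access to results


def store_result(source, data):
    """Callback; it runs inside the worker thread, so it locks first."""
    with results_lock:
        results.append((source, data))


class Loader(threading.Thread):
    def __init__(self, source, callback):
        threading.Thread.__init__(self)
        self.source = source
        self.callback = callback

    def run(self):
        data = "data from %s" % self.source   # placeholder for the real read
        self.callback(self.source, data)      # deliver the result


loaders = [Loader("site%d" % i, store_result) for i in range(5)]
for loader in loaders:
    loader.start()
for loader in loaders:
    loader.join()

# The threads may finish in any order, so sort before displaying.
with results_lock:
    collected = sorted(results)
print(collected)
```

Note that the callback executes in the worker thread, not the main thread, which is why it takes the lock before touching the shared list.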

I found a couple of articles that discuss using Queue (from Queue import Queue), which has all of the multi-threading support built in. The articles demonstrate a couple of ways to use a Queue to manage a small collection of threads. This would allow you, for example, to reuse 3 threads to collect the data from 40 websites. (You could also attempt to collect data from all 40 sites at once, but the overhead of starting and running 40 threads and web requests simultaneously might actually make it slower than reusing 3 threads.)
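This is not the code from those articles, just a minimal sketch of the pattern as I understand it: 3 worker threads pull URLs from one queue and put results on another. The module is named queue in Python 3 and Queue in Python 2, so the import below tries both. fake_fetch, worker, NUM_WORKERS, and the URLs are hypothetical names for illustration.

```python
import threading
import time

try:
    import queue             # Python 3 module name
except ImportError:
    import Queue as queue    # Python 2 name for the same module

NUM_WORKERS = 3


def fake_fetch(url):
    """Hypothetical stand-in for a real web request."""
    time.sleep(0.01)
    return "content of %s" % url


def worker(tasks, results):
    while True:
        url = tasks.get()          # blocks until an item is available
        if url is None:            # sentinel value: no more work
            tasks.task_done()
            break
        results.put((url, fake_fetch(url)))
        tasks.task_done()


tasks = queue.Queue()
results = queue.Queue()

workers = [threading.Thread(target=worker, args=(tasks, results))
           for _ in range(NUM_WORKERS)]
for w in workers:
    w.start()

urls = ["http://example.com/page%d" % i for i in range(40)]
for url in urls:
    tasks.put(url)
for _ in workers:
    tasks.put(None)                # one sentinel per worker

tasks.join()                       # every queued item has been processed
for w in workers:
    w.join()

# All workers are finished, so draining the results queue is safe.
pages = {}
while not results.empty():
    url, html = results.get()
    pages[url] = html
print("collected %d pages" % len(pages))
```

The queues do their own locking internally, so the workers never need an explicit Lock here.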

The two articles I found were:

Basic Threading in Python
Multi-threading in Python

I found them by googling "python threading".

Thanks, guys. Gribouillis's examples helped a lot.

@murtan
You are definitely right. My main purpose for threads is to read websites and process them. Thanks for those links, now I know about Queues.
