I thought it would be interesting to look at threads and queues, so I've written two scripts: one breaks a file up and encrypts each chunk in a thread, the other does it sequentially. I'm still very new to Python and don't really know why the threaded script takes so much longer.

Threaded Script:

#!/usr/bin/env python

from Crypto.Cipher import AES
import os, time, threading, Queue


TFILE = 'mytestfile.bin'
CHUNK_SIZE = 2048 * 2048
KEY = os.urandom(32)
IV = os.urandom(16)   # CFB mode needs a 16-byte IV

class DataSplit(object):
    def __init__(self, fileObj, chunkSize):
        self.fileObj = fileObj
        self.chunkSize = chunkSize
        
    def split(self):
        while True:
            data = self.fileObj.read(self.chunkSize)
            if not data:
                break
            yield data
            
class encThread(threading.Thread):
    def __init__(self, seg_queue, result_queue, key):
        threading.Thread.__init__(self)
        self.seg_queue = seg_queue
        self.result_queue = result_queue
        # Give each thread its own cipher: a CFB cipher object is
        # stateful, so sharing one across threads corrupts the stream
        self.cipher = AES.new(key, AES.MODE_CFB, IV)
    
    def run(self):
        while True:
            #Grab a data segment from the queue
            data = self.seg_queue.get()
            encSegment = []
            # NB: iterating over a string yields one byte at a time,
            # so every chunk costs one encrypt() call per byte
            for lines in data:
                encSegment.append(self.cipher.encrypt(lines))
            self.result_queue.put(encSegment)
            print("Segment Encrypted")
            self.seg_queue.task_done()

start = time.time()
def main():
    seg_queue = Queue.Queue()
    result_queue = Queue.Queue()
    estSegCount = (os.path.getsize(TFILE)/CHUNK_SIZE)+1
    #Spawn threads (one for each segment at the moment)
    for i in range(estSegCount):
        eT = encThread(seg_queue, result_queue, KEY)
        eT.setDaemon(True)
        eT.start()
        print ("thread spawned")
    
    fileObj = open(TFILE, "rb")
    splitter = DataSplit(fileObj, CHUNK_SIZE)
    for data in splitter.split():
        seg_queue.put(data)
        print ("Data sent to thread")
 
    seg_queue.join()
    #result_queue.join()
    print ("Seg Q: {0}".format(seg_queue.qsize()))
    print ("Res Q: {0}".format(result_queue.qsize()))
    
main()
print ("Elapsed Time: {0}".format(time.time()-start))

Serial Version:

#!/usr/bin/env python

from Crypto.Cipher import AES
import os, time

TFILE = 'mytestfile.bin'
CHUNK_SIZE = 2048 * 2048

class EncSeries(object):
    def __init__(self):
        pass
    
    def loadFile(self, path):
        openFile = open(path, "rb")
        #fileData = openFile.readlines()
        # NB: read(CHUNK_SIZE) only pulls in the first chunk of the
        # file, not the whole thing
        fileData = openFile.read(CHUNK_SIZE)
        openFile.close()
        return fileData
    
    def encryptData(self, key, data):
        cipher = AES.new(key, AES.MODE_CFB, os.urandom(16))
        newData = []
        # NB: same byte-at-a-time loop as the threaded version
        for lines in data:
            newData.append(cipher.encrypt(lines))
        return newData
    

start = time.time()
def main():
    print ("Start")
    key = os.urandom(32)
    run = EncSeries()
    fileData = run.loadFile(TFILE)
    
    encFileData = run.encryptData(key, fileData)
    print("Finish")
    
main()
print ("Elapsed Time: {0}".format(time.time()-start))

Using readlines() instead of read() seems to speed things up considerably on the serial version too, but it's already much faster than the threaded version.


Perhaps you could use the cProfile module to find out where the time is spent?
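
For example, a minimal way to profile either script without touching the code (the filename threaded_enc.py is just a placeholder for whatever you saved it as):

python -m cProfile -s cumulative threaded_enc.py

or, from inside the script, wrap the entry point:

import cProfile
cProfile.run('main()', sort='cumulative')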

The second version should use readlines(), not read(): as written, it never reads the whole file!
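
A sketch of the fix, keeping the rest of the class as it is:

    def loadFile(self, path):
        openFile = open(path, "rb")
        fileData = openFile.readlines()   # whole file, as a list of lines
        openFile.close()
        return fileData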

Apparently I won't get a performance increase here: because of Python's Global Interpreter Lock, threads only help when there's a significant wait on I/O, and they don't actually run CPU-bound code in parallel (I thought they did!).

You could replace threads with processes to get parallelism, using the multiprocessing module.
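
A rough sketch of what that could look like with a Pool (the two-process count, chunk size, and fresh key/IV here are my assumptions, and each chunk is encrypted independently with the same IV, which is fine for timing but not something you'd do for real):

from multiprocessing import Pool
from Crypto.Cipher import AES
import os

def enc_chunk(args):
    # cipher objects can't be shipped between processes, so each
    # call rebuilds one from the raw key material
    key, iv, data = args
    cipher = AES.new(key, AES.MODE_CFB, iv)
    return cipher.encrypt(data)

def chunks(fileObj, size):
    while True:
        data = fileObj.read(size)
        if not data:
            break
        yield data

if __name__ == '__main__':
    key, iv = os.urandom(32), os.urandom(16)
    pool = Pool(processes=2)
    f = open('mytestfile.bin', 'rb')
    results = pool.map(enc_chunk, ((key, iv, c) for c in chunks(f, 2048 * 2048)))
    f.close()
    pool.close()
    pool.join()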

That's what I'm planning to do; I just don't want to lose any performance gain by passing heavy data chunks around. I was thinking one thread for file I/O and two processes for encryption? Something like the sketch below.
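
A minimal sketch of that layout (here the main process plays the role of the I/O thread, and the worker count, sentinel scheme, and chunk size are assumptions):

from multiprocessing import Process, Queue
from Crypto.Cipher import AES
import os

def enc_worker(key, iv, in_q, out_q):
    # each worker builds its own cipher and pulls chunks until
    # it sees the None sentinel
    cipher = AES.new(key, AES.MODE_CFB, iv)
    while True:
        data = in_q.get()
        if data is None:
            break
        out_q.put(cipher.encrypt(data))

if __name__ == '__main__':
    key, iv = os.urandom(32), os.urandom(16)
    in_q, out_q = Queue(), Queue()
    workers = [Process(target=enc_worker, args=(key, iv, in_q, out_q))
               for _ in range(2)]
    for w in workers:
        w.start()
    # the main process just reads and feeds, like the I/O thread would
    sent = 0
    f = open('mytestfile.bin', 'rb')
    while True:
        data = f.read(2048 * 2048)
        if not data:
            break
        in_q.put(data)
        sent += 1
    f.close()
    for _ in workers:
        in_q.put(None)   # one sentinel per worker
    # drain the results before joining, so the queue's feeder
    # threads can flush and the workers can exit cleanly
    results = [out_q.get() for _ in range(sent)]
    for w in workers:
        w.join()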

It seems worth trying. In any case, if there is a significant difference in performance, it's a good idea to find out where it comes from. The cProfile module allows you to spot the function calls which slow down the program.
