Function coinT() tests whether two time series are stationary using the ADF test and the Hurst exponent. The time series are stored in CSV files (1511x6 each), but for testing, stock() returns only a vector of the 5th column; there are 50 files in total. The program seems to use too much memory: it makes the PC crash after running for ~30 seconds. It works fine on 15 files but crashes on larger sets (>50).

Can somebody please help me find the memory leak? I've tried splitting the computations into multiple functions and deleting objects, but it didn't help much.

import numpy as np
import pandas as pd
import statsmodels.tsa.stattools as ts
import csv
import timeit
from numpy import log, polyfit, sqrt, std, subtract
from pandas.stats.api import ols
import os

src = 'C:/Users/PC/Desktop/Magistr/Ibpython/testing/'
filenames = next(os.walk(src))[2] #load all stock file names into array
cointegratedPairs = []

def hurst(ts):
    """Returns the Hurst exponent of the time series vector ts.
    H < 0.5 - the time series is mean reverting
    H = 0.5 - the time series is a geometric Brownian motion
    H > 0.5 - the time series is trending"""

    # Create the range of lag values
    lags = range(2, 100)

    # Calculate the array of the variances of the lagged differences
    tau = [sqrt(std(subtract(ts[lag:], ts[:-lag]))) for lag in lags]

    # Use a linear fit to estimate the Hurst Exponent
    poly = polyfit(log(lags), log(tau), 1)

    del lags
    del tau

    # Return the Hurst exponent from the polyfit output
    return poly[0]*2.0

#Convert file into an array
def stock(filename):
    #read the file into an array
    delimiter = ","
    with open(src + filename,'r') as dest_f:
        data_iter = csv.reader(dest_f, 
                            delimiter = delimiter, 
                            quotechar = '"')
        data = [data for data in data_iter]
    data_array = np.asarray(data)[:,5]
    return data_array


#Check if two time series are cointegrated
def coinTest(itemX, itemY):
    indVar = map(float, stock(itemX)[0:1000]) #2009.05.22 - 2013.05.14
    depVar = map(float, stock(itemY)[0:1000]) #2009.05.22 - 2013.05.14

    #Calculate optimal hedge ratio "beta"
    df = pd.DataFrame()
    df[itemX] = indVar
    df[itemY] = depVar

    res = ols(y=df[itemY], x=df[itemX])
    beta_hr = res.beta.x
    alpha = res.beta.intercept
    df["res"] = df[itemY] - beta_hr*df[itemX] - alpha

    #Calculate the CADF test on the residuals
    cadf = ts.adfuller(df["res"])

    #Reject the null hypothesis at the 1% significance level
    if cadf[4]['1%'] > cadf[0]:
        #Hurst exponent test: are the residuals mean reverting?
        if hurst(df["res"]) < 0.4:
            cointegratedPairs.append((itemY, itemX))
    del indVar
    del depVar  
    del df[itemX]
    del df[itemY]
    del df["res"]   
    del cadf  

#Main function
def coinT():
    limit = 0
    TotalPairs = 0

    for itemX in filenames:
        limit += 1  # advance inside the loop so each pair is tested only once
        for itemY in filenames[limit:]:
            TotalPairs += 1
            if itemX == itemY:
                continue  # a bare 'next' was a no-op here
            else:
                coinTest(itemX, itemY)

All the del statements are useless: local variables in functions are destroyed when the function exits. Without running the code, I see only one growing structure: the list cointegratedPairs. That is the most probable source of the memory leak.
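To see that the del statements buy nothing, here is a small sketch of my own (not from the code above) using the standard tracemalloc module: memory allocated by a local variable is released as soon as the function returns, with no del needed.

```python
import tracemalloc

def build_big_list():
    data = list(range(1000000))  # large local object
    return len(data)             # no 'del' needed anywhere

tracemalloc.start()
build_big_list()
current, peak = tracemalloc.get_traced_memory()

# 'peak' reflects the temporary list inside the function; 'current' drops
# back down because the local was released when the function returned.
print(peak > 1000000, current < peak)
```

The same applies to the locals in coinTest() and stock(): they are freed on return whether or not you del them.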

The cointegratedPairs list does not grow beyond 10-20 tuples. I'm really puzzled; perhaps there is some issue with the PC handling a task at 100% CPU.

It is very strange that the program crashes the PC. Did you discover a bug in one of the imported modules? One thing I would try if I were you is to run coinTest() in a subprocess (by means of the multiprocessing module). Each coinTest() would then run in its own process, with all of its data destroyed when that process terminates. You could fill a multiprocessing.Queue instead of the list cointegratedPairs.
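A minimal sketch of that idea (the function names and the dummy pair test are mine, standing in for the real OLS/ADF/Hurst work): each pair is checked in a short-lived child process, and results come back through a multiprocessing.Queue, so whatever memory the worker allocates is returned to the OS when the child exits.

```python
import multiprocessing as mp

def coin_test_worker(item_x, item_y, queue):
    # Stand-in for the real coinTest(): the OLS fit, ADF test and Hurst
    # check would go here. A dummy predicate replaces them so the sketch
    # is runnable.
    result = (item_y, item_x) if item_x != item_y else None
    queue.put(result)  # always put exactly one item per pair

def run_all_pairs(filenames):
    queue = mp.Queue()
    results = []
    for i, item_x in enumerate(filenames):
        for item_y in filenames[i + 1:]:
            # one short-lived child per pair; its memory is freed on exit
            p = mp.Process(target=coin_test_worker,
                           args=(item_x, item_y, queue))
            p.start()
            res = queue.get()  # read before join() so the pipe never fills
            p.join()
            if res is not None:
                results.append(res)
    return results

if __name__ == "__main__":
    print(run_all_pairs(["a.csv", "b.csv", "c.csv"]))
```

Spawning one process per pair is slow; a multiprocessing.Pool would batch the work, but the one-process-per-pair version isolates memory most completely.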

Edit: another interesting idea is to call gc.collect() after the call to coinTest() in your code, to see if it changes anything.
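For completeness, a tiny sketch of that second idea (the processing function is a made-up stand-in for coinTest()): force a full collection after each pair and check gc.garbage for objects the collector could not free.

```python
import gc

def process_pair(item_x, item_y, results):
    # stand-in for coinTest(): allocate something sizable, record the pair
    residuals = [float(i) for i in range(100000)]
    results.append((item_y, item_x))

results = []
for item_x, item_y in [("a.csv", "b.csv"), ("a.csv", "c.csv")]:
    process_pair(item_x, item_y, results)
    freed = gc.collect()  # full collection; returns a count of collected objects
    # gc.garbage holds uncollectable objects; it should normally stay empty

print(len(results), len(gc.garbage))
```

If gc.garbage grows, something is creating uncollectable reference cycles; if it stays empty and memory still climbs, the leak is more likely in a C extension or outside Python entirely.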

Edited 1 Year Ago by Gribouillis

I've tested the same program on a laptop with the same amount of RAM but a somewhat more modern processor (an i5), and it worked fine. The longer the script runs, the faster the cooling fan spins. I've added sleep(0.1), which brought CPU usage down from 90-100% to ~50%, but it didn't help. I've also cleaned the fan, which was a total mess, but that didn't help either. It's a hardware issue, but I can't figure out which one; maybe I should reapply thermal paste to the processor, which I haven't done for the last three years.
