Dear all,

I am implementing a small program and I'd like to use threads. First of all, I should say that I am certainly not an expert C++ user (I usually use Perl, ouch, but in this case Perl was far too slow for what I wanted to do).

That being said, my program is divided into two main parts:
a) reading the input and filling a set of vectors (with functions)
b) running one function to compute a score (using the vectors loaded in part a) and then running it again on a vector of randomized vectors of Elem (a bootstrap to estimate the significance of the score).


The vectors loaded in the first part are declared as:

vector <string> datasetNames; // vector of string
vector <Elem> rankedData; // vector of Elem. An elem is an object that contains one int and one double
vector <vector <bool> > datasets;// vector of vectors of booleans
vector <vector <Elem> > shuffledRankedVectors; // vector of numerous (at least 5000) vectors of Elem

In the beginning, I was using a function es (see below) that computed the score once on the real values (vector <Elem> rankedData) and a given number of times on the randomized values (vector <vector <Elem> > shuffledRankedVectors).

double es (vector <Elem> &rankedData, int rankedSize, vector <bool> &dataset, vector <double> ess)

Even though this was far faster than the Perl version, the bootstrap part was still far too slow, so I wanted to use threads. Unfortunately, I read that the function started by a thread can take only ONE argument. I thus wrote a function run_es_pval (see below) that takes a struct containing integers and pointers to the objects I need to run my calculations.

struct input_struct {
  int start; 
  int end; 
  vector <string> *datasetNames; 
  int rankedSize; 
  vector <Elem> *rankedData; 
  vector <vector <bool> > *datasets; 
  vector <vector <Elem> > *shuffledRankedVectors;
};
void* run_es_pval(void *inptr) {
  input_struct in = *((input_struct*)(inptr));
  
  int start = in.start;
  int end = in.end;
  
  vector <string> *datasetNames_ptr = in.datasetNames; 
  vector <string> datasetNames = *datasetNames_ptr;
  int rankedSize = in.rankedSize; 
  
  vector <Elem> *rankedData_ptr =  in.rankedData; 
  vector <Elem> rankedData = *rankedData_ptr;
  vector <vector <bool> > *datasets_ptr = in.datasets; 
  
  vector <vector <bool> > datasets = *datasets_ptr;
  vector <vector <Elem> > *shuffledRankedVectors_ptr = in.shuffledRankedVectors;
  vector <vector <Elem> > shuffledRankedVectors = *shuffledRankedVectors_ptr;
  
  for (int i = start; i < end; i++) {
    
    string datasetName = datasetNames[i];
    vector <double> esResults;
    esResults.resize(rankedSize);
    // real computations
    double esi = es(rankedData, rankedSize, datasets[i], esResults);
    // random controls (bootstrapping)
    double pval = getpval (shuffledRankedVectors, rankedSize, datasets[i], esi);
    // print pvalues
    cout << datasetName << "\t" <<   esi << "\t"<< pval<< endl;
  }
}
pthread_t* h1 = new pthread_t;
pthread_attr_t* atr = new pthread_attr_t;
pthread_attr_init(atr);
pthread_attr_setscope(atr, PTHREAD_SCOPE_SYSTEM);
pthread_create(h1, atr, run_es_pval, (void *) &thisthreadvalues);

However, I observe two main issues:

  • when calling the function run_es_pval directly, it takes twice as much memory (which means that with more than 1 thread, the memory usage would increase again) ... but at least it works!
  • when running the function run_es_pval through pthread_create, I get a segmentation fault at run time.

So my question is twofold:

  • How can I avoid increasing the memory usage, even if I use 1000 threads?
  • Why do I get a segmentation fault?

At the moment, the only solution I see would be to store all my vectors as global variables, but I don't find that very elegant!
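For reference, here is a minimal sketch of one way to share the vectors between pthreads without globals and without copying them. It rests on assumptions, since the full program is not shown: the names Elem (as laid out here), WorkerArgs, toy_score, worker and score_all are hypothetical, and toy_score is only a placeholder, not the real es function. The sketch binds references to the shared vectors instead of assigning them to local vector variables (an assignment such as `vector <Elem> rankedData = *rankedData_ptr;` makes a full copy in every call, which would explain the doubled memory). It also returns a value from the thread function, keeps the argument structs alive until the threads finish, and joins every thread. A missing return, an argument struct that goes out of scope, or a missing join are all plausible causes of a crash, but which one applies here cannot be confirmed from the post.

```cpp
#include <pthread.h>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for Elem (one int and one double).
struct Elem { int id; double value; };

// Per-thread arguments: the shared data is only pointed to, never copied.
struct WorkerArgs {
  std::size_t start;                                // first dataset index (inclusive)
  std::size_t end;                                  // last dataset index (exclusive)
  const std::vector<Elem>* rankedData;              // shared, read-only
  const std::vector<std::vector<bool> >* datasets;  // shared, read-only
  std::vector<double>* scores;                      // shared output, one slot per dataset
};

// Placeholder score, NOT the real es(): sums the values of the flagged elements.
double toy_score(const std::vector<Elem>& ranked, const std::vector<bool>& flags) {
  double sum = 0.0;
  for (std::size_t k = 0; k < ranked.size() && k < flags.size(); ++k)
    if (flags[k]) sum += ranked[k].value;
  return sum;
}

void* worker(void* p) {
  const WorkerArgs* a = static_cast<const WorkerArgs*>(p);
  // References to the shared vectors: no per-thread copy is made.
  const std::vector<Elem>& ranked = *a->rankedData;
  const std::vector<std::vector<bool> >& datasets = *a->datasets;
  for (std::size_t i = a->start; i < a->end; ++i)
    (*a->scores)[i] = toy_score(ranked, datasets[i]);  // each index is written by one thread only
  return NULL;  // a function declared void* must return something
}

std::vector<double> score_all(const std::vector<Elem>& ranked,
                              const std::vector<std::vector<bool> >& datasets,
                              std::size_t nThreads) {
  if (nThreads == 0) nThreads = 1;
  const std::size_t n = datasets.size();
  std::vector<double> scores(n, 0.0);
  std::vector<pthread_t> threads(nThreads);
  std::vector<WorkerArgs> args(nThreads);  // stays alive until all threads are joined
  std::vector<bool> started(nThreads, false);

  for (std::size_t t = 0; t < nThreads; ++t) {
    args[t].start = n * t / nThreads;      // contiguous, non-overlapping slices
    args[t].end = n * (t + 1) / nThreads;
    args[t].rankedData = &ranked;
    args[t].datasets = &datasets;
    args[t].scores = &scores;
    started[t] = (pthread_create(&threads[t], NULL, worker, &args[t]) == 0);
    if (!started[t]) worker(&args[t]);     // fallback: do this slice in the calling thread
  }
  for (std::size_t t = 0; t < nThreads; ++t)
    if (started[t]) pthread_join(threads[t], NULL);  // wait before args and scores go away
  return scores;
}
```

A call such as `score_all(rankedData, datasets, 4)` would return one score per dataset. Collecting the results in a vector and printing them after the joins also avoids interleaved output from several threads writing to cout at once. With this layout the memory cost per extra thread is one small WorkerArgs struct plus the thread's stack, not another copy of the data.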

Thank you very much for the time you spent reading about my issue.

Regards,

Sylvain

I am indeed using Linux, and I may give your idea a try, but I'd like my code to be as portable as possible.

So, I'd like to find another solution!

Thanks for the hint...

Thanks a lot for the clarifications! I'll give it a try tomorrow!

Other solutions (linked to pthreads) are welcome.

Cheers
