Hello All,

I was experimenting with creating a large number of threads and checking how fast the code runs on my machine (details provided later).
This is the code I'm using:

#include <iostream>
#include <thread>
#include <ctime>

#define THREADED 1

static const int num_threads = 100;
static int mult_array[100][1000000];

void multResult(int *arr)
{
    // Accumulate every element of the row into its last slot.
    for (int i = 0; i < 1000000; i++)
        *(arr+(1000000-1)) += *(arr+i);
}

int main() 
{
    std::thread t[num_threads];

    for (int i = 0; i < 100; i++)
        for (int j = 0; j < 1000000; j++)
            mult_array[i][j] = 1;

    clock_t begin = clock();    
    {

#ifdef THREADED
        for (int i = 0; i < num_threads; ++i) {
            t[i] = std::thread(multResult, mult_array[i]);
        }    

        for (int i = 0; i < num_threads; ++i) {
            t[i].join();        
        }

        clock_t end = clock();  
        double traceTime = double(end - begin) / CLOCKS_PER_SEC;
        std::cout << "\nTime:" << traceTime;    
#else        
        for (int i = 0; i < 100; i++)
            multResult(mult_array[i]);

        clock_t end = clock();  
        double traceTime = double(end - begin) / CLOCKS_PER_SEC;
        std::cout << "\nTime:" << traceTime;        
#endif          

    }
    return 0;
}

I was expecting the threaded version to be faster than the non-threaded version. But I see the same time reported: about
0.23 seconds. Since I have multiple cores on my machine, I was expecting it to be much faster.

I'm not sure if my measurement is wrong, but I'm confused as to why the times are almost the same (sometimes the threaded version
is actually slightly slower!).

Could anyone please point out what I'm doing wrong?
Thanks!

Details:
Processor: Intel i5 2300(quad core)
OS: Ubuntu 12.04
Compiler: GCC 4.6.3

First, I have to say that I'm quite the novice in C++, but I believe my input is
relevant and might have something to do with your question.

It seems to me that what you are really measuring there is how fast you can create and destroy threads, which is what is taking the time in your test.

For a real test, you would need a much longer test period, comparing the work carried out in multiple threads against the same amount of work carried out in a single thread.

I hope I made sense.

The thread creation is what takes the time.

Edited 3 Years Ago by Suzie999

I could be mistaken, but if you have a quad-core processor you should be able to run 4 threads at the same time. With the way you have your code written right now, you start all 100 threads and then right after that you call each thread and wait for it to finish before you go to the next thread. If you have a lot of other processes running on your system, then this could effectively keep everything running on a single core. You might be able to poll the threads using joinable(), but that could also lead to problems.

@Suzie999:
Thanks for the input. Ah, yes, I didn't think of the time taken to create the threads. So I tried with more computation: I added a nested for loop in multResult(), which pushed the times above 200 seconds.
The times I measured are:
Threaded: 254.63
Non-Threaded: 254.65

which, I feel, is too small a difference, and it looks like everything just ran on a single core :(

@NathanOliver:
Thanks! I created 100 threads assuming they would be distributed across multiple cores. Also, I have a small question.

With the way you have your code written right now you start all 100 threads and then right after that you call each thread and wait for it to finish before you go to the next thread

I'm not sure I understand you correctly. Doesn't the main() thread wait for the completion of all the other threads, rather than each thread waiting for another thread's completion?
Please correct me if I'm wrong.

Thanks!

In your second for loop you are calling join(). join() blocks the current thread and waits for the thread you called join() on to complete. I'm pretty sure this is where you are getting slowed down.

Edited 3 Years Ago by NathanOliver

@NathanOliver:

Ah, it looks like the code to calculate the time just didn't work properly. I measured using a normal clock (on my phone).

For the same calculations, I got these results:
Threaded: ~8 seconds
Non-Threaded: ~27 seconds

This is way better!
Also, I saw the cores being used as follows:

[Screenshots of per-core CPU usage: threaded vs. non-threaded]

So, it is some problem with my time measurement! I'm not sure what, though.
Thanks!

I wish I had seen this thread earlier, because I knew the issue right away when I saw "clock()" in your code. The clock() function basically returns the processor time consumed by your program (the overall process). If you have one thread running on one core and another running on another core, both threads belong to the same process, so the process's "clock ticks" counter is the sum of the time spent on both cores. In other words, your initial test program was bound to produce nearly equal times, with slightly higher times for the threaded version because of the overhead of creating all those threads.

To keep track of time, you need to use a "real" (wall-clock) time function, such as time(), std::chrono::system_clock, std::chrono::high_resolution_clock, or Boost.Date-Time.

If you have 4 cores, you should expect the time to be roughly 4 times shorter at best (going from ~27s to ~8s seems reasonable).
