Hi everyone

I've been looking through Boost and the C++ reference, as well as doing some googling, and I cannot find a way to get a thread's execution time. All the options I found are related to system time, and I need only the execution time for the thread (inside the thread's context). Does anyone know how to get this?

Thanks in advance

depends on the thread, but many times execution time is not measurable because it happens too quickly.

But generally you would call time functions when the thread starts and again when it ends; the execution time is then the difference between those two times. Call clock() and divide the result by CLOCKS_PER_SEC to get seconds (the length of a clock tick varies between operating systems). On MS-Windows you can use QueryPerformanceCounter().

You are in a multi-threaded environment. Other threads in your application are taking time slices. Other applications are pre-empting and taking time slices. What I usually do is bump my application priority up to real time, set that thread's priority to real time, and then call the function N = 10,000 times or so and divide the total by N to get an approximate time.

The RDTSC assembly instruction can be called before and after and the difference taken!

typedef unsigned int uint;   // 32-bit unsigned; the __asm blocks below are MSVC x86 inline assembly

static uint tclkl, tclkh;

void CpuDelaySet(void)
{
	__asm {
         rdtsc                       ; Read time-stamp counter

         mov      tclkl,eax          ; Save low 32bits
         mov      tclkh,edx          ; Save high 32bits
	};
}


uint CpuDelayCalc(void)
{
	uint v;

	__asm {
         rdtsc                       ; Read time-stamp counter

         sub      eax,tclkl
         sbb      edx,tclkh          ; edx:eax = total elapsed interval
		 mov	v,eax
	};

	return v;
}

Like wildgoose pointed out, in a multi-threaded environment system functions such as clock or time cannot measure a single thread's execution time, only the global (process) time.

Wildgoose, I can't understand your method. Is it possible to explain a bit more?

Tks

In its simplest form....

uint nRepeat = 10000;
uint nTotTime;
uint nOnce;
double fTime;

CpuDelaySet();

for (uint n = 0; n < nRepeat; n++)
{
        vD = MyFunc( vA, vB );
}
nTotTime = CpuDelayCalc();

nOnce = nTotTime / nRepeat;                     // truncated average
//          or
nOnce = (nTotTime + (nRepeat>>1)) / nRepeat;    // rounded average
//          or
fTime = ((double)nTotTime) / ((double)nRepeat); // floating-point average

Note that I'm only returning a 32-bit value so the idea is to not overflow it! 0xffffffff ( 4,294,967,295)

The idea is to repeat the same test N times. In this example I chose 10,000 times. But I recommend starting around 1000 and working up only as long as the total time still fits in a 32-bit unsigned value!

This is essentially an average result. You are still being pre-empted, so the numbers will be all over the place from run to run. But they'll be mostly in the ballpark. I use this technique to see if optimizations to my function make the function's time increase or decrease!
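For anyone who cannot use MSVC inline assembly, here is a minimal sketch of the same averaging idea. It swaps RDTSC for std::chrono::steady_clock, so it measures wall-clock nanoseconds rather than clock cycles and is still subject to pre-emption. MyFunc here is only a hypothetical stand-in for the function under test, and AverageCallNs is a name made up for this example.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical stand-in for the function being benchmarked.
inline std::uint64_t MyFunc(std::uint64_t a, std::uint64_t b)
{
    return a * 31u + b;
}

// Average wall-clock time of one call, in nanoseconds, over nRepeat calls.
// Assumes nRepeat > 0. Uses steady_clock instead of RDTSC (an assumption
// made for portability, not the method from the post above).
inline double AverageCallNs(unsigned nRepeat)
{
    assert(nRepeat > 0);
    volatile std::uint64_t sink = 0;   // keeps the optimizer from deleting the loop

    const auto start = std::chrono::steady_clock::now();
    for (unsigned n = 0; n < nRepeat; ++n)
        sink = MyFunc(sink, n);
    const auto stop = std::chrono::steady_clock::now();

    // Total elapsed time as a floating-point count of nanoseconds.
    const std::chrono::duration<double, std::nano> total = stop - start;
    return total.count() / nRepeat;
}
```

Calling AverageCallNs(10000) a few times and comparing the results before and after a change to MyFunc gives the same kind of ballpark comparison. A 64-bit duration also sidesteps the 32-bit overflow concern.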

The RDTSC instruction reads a 64-bit counter that contains the number of clock cycles that have elapsed. It is accessible from the application ring, meaning non-system software has access to it! Win32 hasn't blocked access, so it is available for reading. The counter is set to zero at processor reset and merely rolls over to zero when the maximum count is reached!

Hi,

Ok, I don't need to totally understand how you do it to use it, but my purposes are different from benchmarking. This averaging is a very clever idea indeed, but I will not run the same function for N times. I have N threads running the same function and every thread needs to know for how long it has executed already. I'm thinking that this method is not applicable to such case...

AFAIK it is not possible to time a specific thread as if that thread were the only thing running on the operating system (Windows in this case). One reason is that the OS will perform thousands of context switches while the thread is running, so any time you try to calculate will include the time spent on all those other things as well. So any profiling you attempt will only give approximations, not absolutes.

Anything you try will be ballpark. One workaround is to run the test like I said, then encode into your program the approximate average, which gets added to a bucket for each worker thread doing the same task. It won't be accurate, but it will be in the ballpark. I think it's the best you're going to do.

But keep in mind that it won't be accurate. Don't forget to take the samples in a release build with your optimizations turned on, but make sure the timing calls sit outside the scope being measured, or the optimizer will re-arrange your code and the tracking tags won't be where you think they are!

If you're trying to monitor worker thread usage, then keeping a task count per thread would be just as effective!

You mentioned several threads doing the same job, which suggests worker threads. I'm assuming you have a number-crunching task, so find the number of CPUs you have and multiply by two. That is the number of worker threads you'll need for that one task to be most efficient and to run your processor dry. You can request which processor a thread is spawned on, but the operating system decides, though you can override it: over-request your threads, then read which CPU each one is running on. Once you have the distribution you want, release the ones you don't want! Kind of crude, but it's the only way I know to override the operating system's logic, because, as I mentioned, you're only requesting a CPU; that doesn't mean it has to give it to you!

A task counter would not solve the problem because what I want is the workers to stop after x elapsed time.

In the meanwhile I found this thread:
http://www.linuxforums.org/forum/linux-programming-scripting/101371-pthreads-thread-time-vs-process-time.html

which obviously is for Linux only. I don't want to have to read the /proc... file every time I need the execution time, so I am trying to use the clock_gettime(...) method. There is still the issue of system versus user time. Assuming this will not make a difference for me, I then tried the POSIX method, but sometimes the first value read is bigger than the second (the difference is negative), which does not make much sense. I tried to find the reference for this function in order to learn why this happens, but I didn't find it. Any idea (or link)?

From what I could understand there is no such (or similar) thing for windows, so I am still limited, since I'm building a supposedly cross-platform library...

When you spawn the thread you pass a void user value!
Why not pass a boolean pointer! Loop while it is set and when false fall out of the thread loop. Don't forget your thread exit function for proper cleanup.

void MyWorker( void *foo )
{
    volatile bool *pbSignal = (volatile bool *)foo;

   while (*pbSignal)   // test the flag itself, not the pointer
   {


   }

}

Or pass in a local index for that worker thread.

void MyWorker( void *foo )
{
    uint idx = (uint)foo;

   while ( gbAppActive == true )
   {


   }

   gThreadActive[ idx ] = false;  // Tell root that this thread is shut down

}

In application root cleanup.
Loop until all gThreadActive[] entries become clear or the clock runs out, whichever comes first.

If you use semaphores, use a single gThreadActive parameter and merely clear the bit!

LOCK();
gThreadActive &= ~(1 << idx);   // clear this thread's bit
UNLOCK();
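The stop-flag idea above can be sketched portably as follows. This is only an illustration under some assumptions: it uses std::thread and std::atomic<bool> instead of a plain bool and a platform thread API, because an unsynchronized bool shared between threads is a data race in C++11 and later. The names RunWorkerFor and iterations are made up for the example.

```cpp
#include <atomic>
#include <chrono>
#include <functional>
#include <thread>

// Worker loops until the shared flag is cleared. The counter is a
// placeholder for real work (hypothetical, for illustration only).
inline void MyWorker(const std::atomic<bool>& keepRunning,
                     std::atomic<unsigned long>& iterations)
{
    while (keepRunning.load(std::memory_order_acquire))
    {
        iterations.fetch_add(1, std::memory_order_relaxed);
        std::this_thread::yield();
    }
}

// Starts one worker, lets it run for roughly runFor of wall-clock time,
// clears the flag, and waits for the worker to exit.
// Returns the number of loop iterations the worker completed.
inline unsigned long RunWorkerFor(std::chrono::milliseconds runFor)
{
    std::atomic<bool> keepRunning{true};
    std::atomic<unsigned long> iterations{0};

    std::thread worker(MyWorker, std::cref(keepRunning), std::ref(iterations));
    std::this_thread::sleep_for(runFor);
    keepRunning.store(false, std::memory_order_release);   // signal the worker
    worker.join();                                         // proper cleanup

    return iterations.load();
}
```

Note that this stops the worker after a wall-clock interval chosen by the root thread, not after a given amount of CPU time used by the worker itself.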

When you spawn the thread you pass a void user value!
Why not pass a boolean pointer!

When you spawn the thread you pass a void pointer, which means you pass a pointer to something (or nothing). I am already using this to pass some arguments to the threads.
Wildgoose, I don't get what this last post has to offer to the discussion.

I don't understand what's so difficult about thread timing. If you want the thread to quit after a specified amount of time, then just compare current time with original thread entry time and exit when that difference is greater than some pre-determined amount of time.

I don't understand what's so difficult about thread timing. If you want the thread to quit after a specified amount of time, then just compare current time with original thread entry time and exit when that difference is greater than some pre-determined amount of time.

I don't understand what's so difficult about multi-threading for you. Threads yield execution to other threads, gain it back, yield again... etc. When they yield they are not running anymore, so any "raw" difference between starting time and current time is *not* what you want. Stop insisting please... I would flag your last post as a bad post...

NOTE: as mentioned in a previous post, this issue can be solved on UNIX by using clock_gettime with the clock ID CLOCK_THREAD_CPUTIME_ID. I don't even need to save the start point because each thread's clock is set to 0 when it starts. I ran the tests, 10 threads executing for 50 seconds each on a Core Duo processor, giving a total time ~= 250 seconds, perfect for me!
I am still looking for a solution which can be applied cross-platform, or at least a similar solution for windows.
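For reference, a minimal sketch of the POSIX approach, assuming a system that provides CLOCK_THREAD_CPUTIME_ID. The helper names ThreadCpuSeconds and WorkUntilCpuBudget are hypothetical. Both timespec fields are folded into a single value before subtracting; comparing only tv_nsec may be one cause of the negative differences mentioned earlier, since that field wraps every second.

```cpp
#include <time.h>   // clock_gettime, CLOCK_THREAD_CPUTIME_ID (POSIX)

// CPU time consumed so far by the calling thread, in seconds.
// Returns a negative value if the clock is unavailable.
inline double ThreadCpuSeconds()
{
    timespec ts;
    if (clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts) != 0)
        return -1.0;
    return static_cast<double>(ts.tv_sec)
         + static_cast<double>(ts.tv_nsec) * 1e-9;
}

// Hypothetical worker body: keeps working until this thread has used
// budgetSeconds of CPU time, then returns the CPU time actually used
// (or a negative value if the clock could not be read).
inline double WorkUntilCpuBudget(double budgetSeconds)
{
    const double start = ThreadCpuSeconds();
    if (start < 0.0)
        return -1.0;

    volatile unsigned long sink = 0;
    double used = 0.0;
    while (used < budgetSeconds)
    {
        for (int i = 0; i < 10000; ++i)
            sink = sink + 1;               // placeholder for real work

        const double now = ThreadCpuSeconds();
        if (now < 0.0)
            return -1.0;
        used = now - start;                // time spent sleeping or pre-empted is not counted
    }
    return used;
}
```

Each thread calls ThreadCpuSeconds() on itself, so N workers running the same function each see only their own CPU time.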

Thanks, this looks like what I want for windows. As soon as I have tried it, I will post comments on it.

BOOL WINAPI GetThreadTimes(
__in HANDLE hThread,
__out LPFILETIME lpCreationTime,
__out LPFILETIME lpExitTime,
__out LPFILETIME lpKernelTime,
__out LPFILETIME lpUserTime
);

http://msdn.microsoft.com/en-us/library/ms683237(v=vs.85).aspx

It is only updated on context switch so a Sleep(1) is needed before calling it. THERE ARE OTHER CAVEATS so that it may not be suitable for measuring very small increments.

IMPORTANT CAVEATS here!

http://blog.kalmbachnet.de/?postid=28

Vista provides

BOOL WINAPI QueryThreadCycleTime(
__in HANDLE ThreadHandle,
__out PULONG64 CycleTime
);

http://msdn.microsoft.com/en-us/library/ms684943(v=vs.85).aspx

I don't know what caveats apply to this call.
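A rough sketch of how GetThreadTimes could be wrapped, for illustration only. The kernel and user times come back as FILETIME values, which count 100-nanosecond intervals split into two 32-bit halves. FileTimeToSeconds and CurrentThreadCpuSeconds are hypothetical helper names, and the Windows-specific part is guarded so the conversion can be compiled anywhere. The caveats above about update granularity still apply.

```cpp
#include <cstdint>
#ifdef _WIN32
#include <windows.h>
#endif

// Converts the two 32-bit halves of a FILETIME duration
// (100-nanosecond units) into seconds.
inline double FileTimeToSeconds(std::uint32_t high, std::uint32_t low)
{
    const std::uint64_t ticks = (static_cast<std::uint64_t>(high) << 32) | low;
    return static_cast<double>(ticks) / 1e7;   // 10,000,000 ticks per second
}

#ifdef _WIN32
// Kernel + user CPU time of the calling thread, in seconds,
// or -1.0 if GetThreadTimes fails.
inline double CurrentThreadCpuSeconds()
{
    FILETIME creationTime, exitTime, kernelTime, userTime;
    if (!GetThreadTimes(GetCurrentThread(), &creationTime, &exitTime,
                        &kernelTime, &userTime))
        return -1.0;

    return FileTimeToSeconds(kernelTime.dwHighDateTime, kernelTime.dwLowDateTime)
         + FileTimeToSeconds(userTime.dwHighDateTime, userTime.dwLowDateTime);
}
#endif
```

A cross-platform library could expose one function and select this implementation on Windows and a clock_gettime-based one elsewhere.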

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

void thread_function( void* arg )
{
   time_t   start, finish;
   double   elapsed_time;
  
   time( &start );
    
   // Your thread code

   time( &finish );

   elapsed_time = difftime( finish, start );
   printf( "\nThread takes %6.0f seconds.\n", elapsed_time );
}

This code measures the elapsed (wall-clock) time of one thread with one-second resolution; it counts the whole interval whether the thread is actually running or not.

I did some testing on this. As suggested in other posts, these aren't all that accurate, they tend to run low due to the thread accounting, context switches, etc. They are much more useful to locate long-running piggy threads.

QueryPerformanceCounter is the way to go, or RDTSC, both as noted here. QPC uses RDTSC, with some code around it to make sure the value is reasonable.

http://en.wikipedia.org/wiki/Time_Stamp_Counter

The Performance counter is very high res, you don't need to run something 10,000 times. Unless it is really short you can measure it directly. Just watch for context switches in the middle of the measurement. Those will be obvious because the elapsed time will jump to milliseconds and beyond. I routinely measure code timing.
