I have written a program for a Linux box running on a DaVinci (ARM processor). The program communicates with a server over the LAN, calls some scripts, and delays for different amounts of time depending on what the server tells it to do. It all works great most of the time. Every once in a while, though, a machine (2% of the machines, once every 16 hours) will freeze at the delay command.

sleep(DisplayDuration);

DisplayDuration is an int.

It seems that when this happens, the command isn't just blocking; the whole phone locks up. It does not respond to pings or to serial input, and it must be power cycled to recover.
Has anyone else seen this? The percentage is very low, but it's too high for production, and we need to know whether this is a software problem that we can fix or a genuine hardware failure.

So, what I'm asking is: is there something wrong with the C function "sleep(int)" in Linux? Has anyone else experienced a device freezing when calling this function? Can someone explain why this happens, or suggest a solution?

Thanks.

The man page for sleep() lists this under BUGS:

BUGS
sleep() may be implemented using SIGALRM; mixing calls to alarm(2) and
sleep() is a bad idea.

Using longjmp(3) from a signal handler or modifying the handling of
SIGALRM while sleeping will cause undefined results.
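
If SIGALRM interaction is a worry, one option is to call nanosleep(2) instead, which the Linux man page describes as not interacting with signals the way a SIGALRM-based sleep() might. Below is a minimal sketch, not a confirmed fix for the freeze; the helper name sleep_seconds is made up for illustration.

```c
#define _POSIX_C_SOURCE 199309L  /* expose nanosleep() under strict C modes */
#include <errno.h>
#include <time.h>

/* Hypothetical helper (name is an assumption, not from the original post).
   Sleeps for `seconds` using nanosleep(2) and resumes with the remaining
   time if a signal interrupts the call. Returns 0 on success, -1 on error. */
static int sleep_seconds(unsigned int seconds)
{
    struct timespec req;
    struct timespec rem;

    req.tv_sec = (time_t)seconds;
    req.tv_nsec = 0;

    while (nanosleep(&req, &rem) == -1) {
        if (errno != EINTR)
            return -1;      /* a real error, e.g. EINVAL */
        req = rem;          /* interrupted: sleep for what is left */
    }
    return 0;
}
```

Calling sleep_seconds(DisplayDuration) in place of sleep(DisplayDuration) would at least rule out the SIGALRM caveat quoted above.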

I am not sure if this will help, but
could it be that the int you are passing to the sleep function is too big (that is, the value has overflowed)?
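
For what it's worth, sleep() takes an unsigned int, so a negative int passed to it is converted to a very large value. A rough sketch of a guard is below; the function name clamp_duration and the one-hour upper bound are assumptions for illustration, not something from the original program.

```c
#include <limits.h>

/* Assumed upper bound on a sane display time, in seconds (hypothetical). */
#define MAX_DISPLAY_SECONDS 3600

/* Hypothetical helper: turn an int received from the server into a value
   that is safe to pass to sleep(). Negative values become 0 and anything
   above the assumed bound is capped. */
static unsigned int clamp_duration(int seconds)
{
    if (seconds < 0)
        return 0;
    if (seconds > MAX_DISPLAY_SECONDS)
        return MAX_DISPLAY_SECONDS;
    return (unsigned int)seconds;
}
```

With this in place the call would read sleep(clamp_duration(DisplayDuration)); logging whenever the value gets clamped would also show whether a bad duration ever arrives.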

I doubt it; the server is configured to return a 5. Plus, it works over 99% of the time. The packet structure is always the same, so I don't think that could be the problem, unless of course something is wrong on the server and it's resending the last message, which gets parsed wrong or not parsed at all, and an uninitialized variable gets passed to sleep(). But that would only cause my program to block, whereas all the other processes on the device lock up too.
Also, I personally never call alarm() or longjmp(), and definitely never at the same time as sleep(). My program is not multi-threaded, so that's pretty much out of the question. I have no idea what other processes are doing, though. There shouldn't be much else running, but the programs that are running at the same time were written by the engineering department, so I have no clue what they are doing. I'm also not sure it would be an issue even if another process called one of those functions.
