Hi All,

I'm using a C++ client and a server written in a different language, with sockets for communication. After 10 - 15 hours of successful exchange of information, a problem occurred. The client seemed to send messages to the server, but the server didn't respond. After a few hours, the client was killed. Suddenly, the server started dumping the messages towards the client. I'm puzzled by this. What could be the problem? Traceroute between the machines hosting the client and server has not indicated any problem with the connection. Any pointers will be of great help.

Thanks,
amelie


It sounds to me like it isn't a connection problem. To confirm this, you could try running the server and client on the same system (i.e. localhost). If the same problem occurs, then the cause is in either your server or your client. In that case, try debugging your client and server; here is a good debugger that I've been using: http://www.ollydbg.de/

If it is either, was an exception thrown? Maybe somewhere in your client and/or server, slowly but surely there are memory leaks. Just to make sure, check all your allocations and array usage (i.e. buffers). Also, if you're using Winsock version 1, I'd recommend converting to Winsock version 2. Most of the functions are still valid; just pass 2,2 to MAKEWORD when you initialize WSA and check for deprecated functions.

Good luck, LamaBot

Hi,

If the server does not respond to factory within a time limit, the client should throw a timeout exception. But in this case no such exception was observed.

Thanks,
amelie

This sounds more like a bug in the application than a general problem (something many people run into, or a socket API error).
Check whether this happened when a particular type of message was sent by the client (or server).

>> Suddenly, the server started dumping the messages towards client.
My guess would be that server had already sent the messages but due to some bug client didn't retrieve them (read them from socket). Finally when the client restarted it read all the messages from the socket (because during the restart the problem with client was resolved).
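One way to test this guess from the client side is to ask the kernel whether data is already queued on the socket, using a bounded wait instead of relying on a separate timeout thread. The following is only a minimal sketch, assuming a POSIX system with `poll()`. The names `wait_readable` and `demo_wait_readable` are hypothetical and not taken from amelie's code, and the demo uses a local `socketpair()` in place of the real connection.

```cpp
#include <poll.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cerrno>

// Hypothetical helper. Waits up to timeout_ms for fd to become readable.
// Returns 1 if a read would not block (data, EOF or a socket error is pending),
// 0 if the timeout expired, and -1 if poll() itself failed.
int wait_readable(int fd, int timeout_ms) {
    struct pollfd pfd;
    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    int rc;
    do {
        // Note: after EINTR the full timeout starts again in this sketch.
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);

    if (rc < 0) return -1;   // poll() failed
    if (rc == 0) return 0;   // nothing arrived within timeout_ms
    return (pfd.revents & (POLLIN | POLLHUP | POLLERR)) ? 1 : -1;
}

// Demonstration on a local socket pair (stands in for client and server).
bool demo_wait_readable() {
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) return false;

    bool ok = (wait_readable(sv[0], 50) == 0);        // nothing sent yet: timeout

    const char msg[] = "ping";
    ok = ok && (write(sv[1], msg, sizeof msg) == static_cast<ssize_t>(sizeof msg));
    ok = ok && (wait_readable(sv[0], 50) == 1);       // data is queued: readable

    close(sv[0]);
    close(sv[1]);
    return ok;
}
```

If a call like this reports readable data while the reader thread is not consuming it, the problem is more likely in the client's read path than in the server.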

>> but surely there are memory leaks.
I'm not too sure about that. But if you wish to check, use Purify or Sun Workshop (enable memory checks). Neither is freeware.

Without knowing the actual socket API or the kind of socket that you are using, it is difficult to pinpoint the exact error. But as the replies above say, a memory leak seems to be the best guess. The tools that you can use to detect a memory leak also depend on the OS you are using, so without that it is hard to guide you on that aspect. If you are using Windows, then without any extra tools you can look at the Task Manager --> Performance --> Memory Usage window. If the memory usage line rises as time passes and stays at a maximum for a while before the client crashes, then you can be fairly sure that it is a memory leak.

By the way, come to think of it, are you writing the data that you receive to the hard disk, or are you keeping it all in memory? If you are keeping it in memory, then your system must have run out of memory. Memory leak or not.

Posting the OS, Socket API and code if possible will certainly help in getting more accurate answers.

>> but surely there are memory leaks.
>> I'm not too sure about that. But if you wish to check, use Purify or Sun Workshop (enable memory checks). Neither is freeware.

What if, when she buffers the data read from the socket, it overwrites memory on the stack, perhaps causing the instruction pointer to return to a false address after a function call or to an address containing overwritten data, or overwrites data in the heap segment? It might even be a factor in what is ultimately causing the problem. As you deceptively failed to mention in the quote, it MIGHT be slowly but surely causing memory leaks, because I haven't seen the source code. I'm merely trying to give her ideas that she perhaps hadn't thought of yet.

Good luck amelie, LamaBot

>> What if when she buffers the data read from the socket, it overwrites memory on the stack perhaps causing the instruction pointer to return to a false address after a function call, or to an address containing overwritten data, or in data being overwritten in the heap segment? It might even be a factor of what is ultimately causing the problem.

Precisely. Just that what you just described is more of memory corruption than memory leak. Anyway, my logic in thinking it won't be a memory leak is simply that if there is a memory leak, client would keep working normally till it crashes. Whereas in the scenario described client stopped working for a few hours and then crashed.

>> As you deceptively failed to mention in the quote, it MIGHT be slowly but surely causing memory leaks, because I haven't seen the source code(s). I'm merely just trying to give her ideas that she perhaps hadn't thought of yet.

That was completely unintentional. :)

Oh.... sorry. Anyway, when she says "killed", that could mean a crash, but it depends. My logic is that if there is a point in a program where an input string or similar is not checked against the size of its designated buffer, then it overflows the buffer (as in buffer overflow) if it is larger, or it might overwrite the end-of-string character, which makes one wonder what a print function will output. Think of a pipe with a loose cap: if you pump water with a certain amount of pressure, water may start leaking out of the cap, slowly perhaps, and then might cause the cap to burst open. The whole scenario is a memory leak, not just when you allocate data on the heap and lose the pointer to it.
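To make the buffer check concrete, here is a minimal sketch of a read that can never write past the end of its buffer and always leaves room for the end-of-string character. It assumes POSIX `recv()` on a stream socket. The names `recv_text` and `demo_recv_text` are hypothetical (they are not from amelie's code), and the demo again uses a local `socketpair()`.

```cpp
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cerrno>
#include <cstddef>
#include <cstring>

// Hypothetical helper. Reads at most cap - 1 bytes into buf and always
// NUL-terminates it, so later string functions cannot run off the end.
// Returns the number of bytes read, 0 if the peer closed the connection,
// or -1 on error (including a buffer too small to hold any data).
ssize_t recv_text(int fd, char* buf, std::size_t cap) {
    if (buf == nullptr || cap < 2) return -1;

    ssize_t n;
    do {
        n = recv(fd, buf, cap - 1, 0);   // never ask for more than fits
    } while (n < 0 && errno == EINTR);

    buf[n > 0 ? n : 0] = '\0';           // terminator is always inside buf
    return n;
}

// Demonstration: the sender writes more than the receiver's buffer can hold.
bool demo_recv_text() {
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) return false;

    const char msg[] = "hello world";    // 11 characters plus terminator
    bool ok = (send(sv[1], msg, std::strlen(msg), 0) ==
               static_cast<ssize_t>(std::strlen(msg)));

    char buf[6];                         // room for 5 characters + '\0'
    ssize_t n = recv_text(sv[0], buf, sizeof buf);
    ok = ok && (n == 5) && (std::strcmp(buf, "hello") == 0);

    close(sv[0]);
    close(sv[1]);
    return ok;
}
```

The rest of the message stays queued in the socket and is returned by the next call, so nothing is lost by reading in bounded pieces.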

LamaBot

Hi,

Thanks a lot for all your suggestions. I've started the test again, with the client and server on the same host. I have to wait for the result.

-----------------------------------------------------
Here is some information that was asked for in the posts above.

Regarding the environment: the OS is Solaris, and I'm using AF_INET, SOCK_STREAM sockets and POSIX threads.

Also, neither the client nor the server crashed. The client was killed after a few hours of inactivity, and then the gush of messages from the server occurred.

>> By the way, come to think of it, are you writing the data that you receive to the hard disk, or are you keeping it all in memory?
The data read from the socket is displayed on the screen, and the screen output is redirected to a file. The socket data is not written directly to any file on the hard disk.

>> If you are keeping it in memory, then your system must have run out of memory. Memory leak or not.
Could you please explain this? Do you mean that the data written to the socket would exceed the RAM capacity?
-----------------------------------------------------

I have 2 threads in the client: a reader, and one that waits for the message from the server. In the failure scenarios, the timeout has not occurred. That means the timeout thread has either been blocked or was never created. Is there any maximum limit on the number of threads that can be created by the application?

Thanks a lot for your inputs.
-amelie

>>If you are keeping it in memory, then your system must have run out of memory. Memory leak or not. Could you please explain this?

What LamaBot meant was that if you're keeping data in process memory (say in some local/global/member variables), then as new messages keep coming, the memory occupied by all these variables will increase. Sooner or later this will go above the memory limit the OS allows per process, and the process will be killed.
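If received messages are kept in a container, one way to stop that growth is to cap the number of entries and discard the oldest ones. This is only a sketch under that assumption. `BoundedMessageList` is a hypothetical class, not something from the client being discussed, and whether dropping old messages is acceptable depends on the application.

```cpp
#include <cstddef>
#include <list>
#include <string>

// Hypothetical container: keeps at most max_items messages in memory and
// drops the oldest one whenever a new message would exceed that cap.
class BoundedMessageList {
public:
    explicit BoundedMessageList(std::size_t max_items) : max_items_(max_items) {}

    void push(const std::string& msg) {
        if (max_items_ == 0) return;          // a cap of zero stores nothing
        messages_.push_back(msg);
        while (messages_.size() > max_items_) {
            messages_.pop_front();            // discard the oldest message
        }
    }

    std::size_t size() const { return messages_.size(); }
    bool empty() const { return messages_.empty(); }

    // Oldest message still held. Only valid when the list is not empty.
    const std::string& oldest() const { return messages_.front(); }

private:
    std::size_t max_items_;
    std::list<std::string> messages_;
};
```

For example, with a cap of 3, pushing "m0" through "m4" leaves three entries, the oldest being "m2". Memory use then stays bounded no matter how long the client runs.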

>> Is there any max limit on the threads that can be created from the application?

Yes. But it's much higher than 2, so that won't be a problem for you. FYI see "man getrlimit"
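The limits can also be read from inside the process with `getrlimit()`. The sketch below is hedged: as far as I know, POSIX `getrlimit()` has no portable resource for the thread count itself, so it prints limits that bound thread creation only indirectly (stack size and data segment) plus the open-descriptor limit, which matters for sockets. The function names `print_limit` and `print_common_limits` are hypothetical.

```cpp
#include <sys/resource.h>
#include <cstdio>

// Hypothetical helper: print the soft and hard limit for one resource.
// Returns false if getrlimit() fails.
bool print_limit(const char* name, int resource) {
    struct rlimit rl;
    if (getrlimit(resource, &rl) != 0) {
        std::perror(name);
        return false;
    }

    std::printf("%-14s soft=", name);
    if (rl.rlim_cur == RLIM_INFINITY) {
        std::printf("unlimited");
    } else {
        std::printf("%llu", static_cast<unsigned long long>(rl.rlim_cur));
    }

    std::printf(" hard=");
    if (rl.rlim_max == RLIM_INFINITY) {
        std::printf("unlimited\n");
    } else {
        std::printf("%llu\n", static_cast<unsigned long long>(rl.rlim_max));
    }
    return true;
}

// Print a few limits relevant to a long-running, multi-threaded socket client.
bool print_common_limits() {
    bool ok = print_limit("RLIMIT_NOFILE", RLIMIT_NOFILE);  // open descriptors
    ok = print_limit("RLIMIT_STACK", RLIMIT_STACK) && ok;   // stack size
    ok = print_limit("RLIMIT_DATA", RLIMIT_DATA) && ok;     // data segment (heap)
    return ok;
}
```

Calling `print_common_limits()` at client start-up and logging the output gives a baseline to compare against when the process later stalls.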

>> Client has been killed after few hours of inactivity, then the gush of messages from server has occurred.

If the client was indeed killed after a few hours of inactivity, you can use the following to get more information:
1. Which signal killed the process? Once you know which signal, see "man signals".
2. The call stack at the time it was killed (pstack core).
3. While your client is idle/inactive (running but doing nothing, before it is killed), you can periodically check the stack to see where in the code each thread is waiting (pstack <pid>).

Last but not least, I still say the same thing: it seems like memory corruption. Run your code under Purify.

>> What LamaBot meant was if you're keeping data in process memory (say some local/global/member variables) then as new messages keep coming space/memory occupied by all these variable will increase.

Each message is inserted into an STL list.

>> Sooner or later this will go above the allowed (by OS per process) memory limit and proc will be killed.
How can I detect this? My client was not killed accidentally; I killed it myself after a few hours of inactivity.

>> Yes. But it's much higher than 2, so that won't be a problem for you. FYI see "man getrlimit"

In fact, I'm creating 2 threads per object, and there are other threads that are created for temporary purposes such as timing out.

>> Last but not the least, I still say the same thing, it seems like memory corruption. Run your code under purify.
It has already been through Purify a few times.
