0

OK, now we are getting somewhere. Why is there an established connection hanging around even after 5-10 minutes?

java 3610 root 8u IPv6 15008 TCP 192.168.2.15:webcache->63.92.48.60.kmr02-home.net.my:55006 (ESTABLISHED)

Now, three more things:

  • Find out what exactly that connection stands for. AFAICT, it seems to be a socket opened by the server as part of accept(), but just make sure it isn't something else by talking with your admin.
  • Do the same test, but with 10-20 continuous clients. Make sure you *restart* the server before beginning the test. If after the test we have around 10-20 established connections lying around after 10 minutes, we would be damn sure it *is* a leak.
  • If the above step shows a leak, close everything in your last "finally" statement. By everything, I mean the 'r', 'w' & 'socketConn1'. This isn't a very big change, and it is required because there isn't much of a correctness check we can do in this small code except closing all open resources (which isn't happening now).
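For that last bullet, here is a minimal sketch of the close-everything pattern. The names 'r', 'w' and 'socketConn1' come from your server code, which I haven't seen in full, so the snippet uses stand-in in-memory streams to stay self-contained; on the real server you would pass the actual reader, writer and client socket (on Java 7+ a Socket is also Closeable; on Java 6 close it with the same null-checked pattern):

```java
import java.io.BufferedReader;
import java.io.Closeable;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.StringReader;
import java.io.StringWriter;

public class CloseAll {
    // Null-safe close helper: never throws, never masks the original exception
    static void closeQuietly(Closeable c) {
        if (c != null) {
            try {
                c.close();
            } catch (IOException e) {
                System.out.println("MyError: exception while closing " + c.getClass().getSimpleName());
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-ins for the server's per-client 'r' and 'w'
        BufferedReader r = new BufferedReader(new StringReader("hello"));
        PrintWriter w = new PrintWriter(new StringWriter());
        try {
            System.out.println("read: " + r.readLine());
        } finally {
            closeQuietly(r);   // close *every* per-client resource here,
            closeQuietly(w);   // including socketConn1 on the real server
        }
        System.out.println("all closed");
    }
}
```

The point is that each close is attempted independently, so a failure closing one resource cannot leak the others.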

Edited by ~s.o.s~: n/a

0

Dear Sos,
The reason it is hanging is because I did not close my telnet terminal. So this time I will open the telnet terminal, send data and close the terminal. Is that OK, or do you want me to leave the terminal open? Give me some time to prepare for all this. Thank you, and I really appreciate your time.
Besides that, I have increased the ulimit to 2048 and this has helped today. What I saw was that at certain times of the day CLOSE_WAIT connections show up, while at other times there are none. Below are the results of the command from the time when I saw the increase in lsof; the highest it went was 1284, then it slowly dropped back to around 400 to 500. In the last command you will notice zero CLOSE_WAIT. Is this any indication? I have also noticed that at the moment when there are too many established connections, the database gives the error 'too many connections'.

[root@localhost /]# netstat -nat |grep 9000 | awk '{print $6}' | sort | uniq -c | sort -n
      1 LISTEN
      5 SYN_RECV
      7 LAST_ACK
     16 CLOSE_WAIT
    275 ESTABLISHED
 netstat -nat |grep 9000 | awk '{print $6}' | sort | uniq -c | sort -n
      1 LISTEN
      6 SYN_RECV
      8 LAST_ACK
    147 CLOSE_WAIT
    306 ESTABLISHED
[root@localhost /]# netstat -nat |grep 9000 | awk '{print $6}' | sort | uniq -c | sort -n
      1 LISTEN
      5 SYN_RECV
      8 LAST_ACK
    157 CLOSE_WAIT
    309 ESTABLISHED
netstat -nat |grep 9000 | awk '{print $6}' | sort | uniq -c | sort -n

      1 LISTEN
      3 SYN_RECV
      8 CLOSE_WAIT
     24 LAST_ACK
    324 ESTABLISHED

netstat -nat |grep 9000 | awk '{print $6}' | sort | uniq -c | sort -n
      1 LISTEN
      7 SYN_RECV
     10 LAST_ACK
    163 ESTABLISHED

[root@localhost /]# lsof | grep java | wc -l
1215
[root@localhost /]# lsof | grep java | wc -l
1261
[root@localhost /]# lsof | grep java | wc -l
1284

Edited by mike_2000_17: Fixed formatting

1

The reason it is hanging is because I did not close my telnet terminal.

IMO, testing this with a standalone Java client would be much more efficient, since in that case you can also test multiple client connections in a loop. The code would be as simple as opening a socket to the server, sending data to the server, reading the response and closing the socket. This kind of test can also be run from your *local* machine (local Windows box).

So this time I will open the telnet terminal, send data and close the terminal. Is that OK, or do you want me to leave the terminal open? Give me some time to prepare for all this.

No; when I say test a client connection to your server, you should make sure that it is a normal communication: client sends a request, server responds to that request, client exits and server continues to serve other clients. That way we can be sure that the client has disconnected, the server has moved on to a new client, and any further resources held by the server are resource leaks.

Besides that, I have increased the ulimit to 2048 and this has helped today.

Yes, if nothing, it would at least delay the occurrence of that IOException.

[root@localhost /]# netstat -nat |grep 9000 | awk '{print $6}' | sort | uniq -c | sort -n
1 LISTEN
5 SYN_RECV
7 LAST_ACK
16 CLOSE_WAIT
275 ESTABLISHED

BTW, what exactly is this "9000" you are grepping for? Also, 275 *established* connections are too many. Instead of printing out just the count, can you verify what those established connections are?

In the last command you will notice zero CLOSE_WAIT. Is this any indication?

Yes, read this. http://j2eedebug.blogspot.com/2008/12/difference-between-closewait-and.html
Also, just to confirm, your client code (the one run by GPS devices) is cleaning up all the resources opened (like Sockets), right?

I have also noticed that at the moment when there are too many established connections, the database gives the error 'too many connections'.

"Too many connections" error given by the code when opening the connection? Is this database on the same machine or a different machine?

Last but not least, make the small change so that you close that 'r' and 'socketConn1'.

Edited by ~s.o.s~: n/a

Votes + Comments
Good discussion. Learn something new.
0

Dear Sos,
Here is the sample client. Is this code enough? How do you want me to run the 20 clients: just put a for loop here, or run them separately one after another? About the 9000: it is the port which the server is listening on. About the database: it is on another machine. I guess there is a connection burst to the database, am I right?

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.Socket;
import java.net.UnknownHostException;

public class basicClient {
    public static void main(String[] args) {

        Socket echoSocket = null;
        PrintWriter out = null;
        BufferedReader in = null;
        try {
            echoSocket = new Socket("192.168.2.1", 9000);
            out = new PrintWriter(echoSocket.getOutputStream(), true);
            in = new BufferedReader(new InputStreamReader(echoSocket.getInputStream()));
            String outputToServer = ""; // here is the string to the server
            out.println(outputToServer);
            System.out.println("Server replied: " + in.readLine()); // read the reply before closing
        } catch (UnknownHostException e) {
            System.err.println("Don't know about host.");
            System.exit(1);
        } catch (IOException e) {
            System.err.println("Couldn't get I/O for the connection.");
            System.exit(1);
        } finally {
            try {
                if (in != null) {
                    in.close();
                } else {
                    System.out.println("MyError: in is null in finally close");
                }
            } catch (Exception ex) {
                System.out.println("MyError: exception caught closing in in finally");
                ex.printStackTrace(System.out);
            }
            try {
                if (out != null) {
                    out.close();
                } else {
                    System.out.println("MyError: out is null in finally close");
                }
            } catch (Exception ex) {
                System.out.println("MyError: exception caught closing out in finally");
                ex.printStackTrace(System.out);
            }
            try {
                if (echoSocket != null) {
                    echoSocket.close();
                } else {
                    System.out.println("MyError: echoSocket is null in finally close");
                }
            } catch (Exception ex) {
                System.out.println("MyError: exception caught closing echoSocket in finally");
                ex.printStackTrace(System.out);
            }
        }
    }
}
0

Yes, just put the entire client-server communication in a loop. Also, make sure you read the output from the server to ensure that you are getting the correct result.
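Something like the sketch below: one full connect/send/read/close cycle per iteration, with the reply verified each time. So that the sketch runs anywhere, a throwaway echo server thread stands in for your real server; against your actual server you would drop that thread and connect to 192.168.2.1:9000 instead:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;

public class LoopTest {
    public static void main(String[] args) throws Exception {
        // Tiny stand-in echo server so the sketch is self-contained.
        final ServerSocket server = new ServerSocket(0);
        Thread t = new Thread(new Runnable() {
            public void run() {
                try {
                    for (int i = 0; i < 20; i++) {
                        Socket c = server.accept();
                        BufferedReader r = new BufferedReader(new InputStreamReader(c.getInputStream()));
                        PrintWriter w = new PrintWriter(c.getOutputStream(), true);
                        w.println(r.readLine());   // echo the request back
                        w.close();
                        r.close();
                        c.close();                 // close everything, as on the real server
                    }
                } catch (Exception e) { e.printStackTrace(); }
            }
        });
        t.start();

        int ok = 0;
        for (int i = 0; i < 20; i++) {             // one full client life-cycle per iteration
            Socket s = new Socket("127.0.0.1", server.getLocalPort());
            PrintWriter out = new PrintWriter(s.getOutputStream(), true);
            BufferedReader in = new BufferedReader(new InputStreamReader(s.getInputStream()));
            out.println("request " + i);
            if (("request " + i).equals(in.readLine())) ok++;  // verify the reply
            in.close();
            out.close();
            s.close();
        }
        t.join();
        server.close();
        System.out.println("successful round trips: " + ok);
    }
}
```

After a run like this against the real server, the netstat/lsof counts should return to their baseline within a few minutes if nothing is leaking.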

Regarding the database, every database server has a configuration wherein you can configure the maximum number of connections allowed. Plus, it also allows you to view the total number of database connections in use at any given point in time. Look into these two things for your MySQL installation or ask your DBA. This is necessary to determine whether there is a connection leak or the maximum allowed connections is really low for your application.

Before running the test, make sure that the code changes to the server which I suggested are in place (closing 'r' and client socket).

Edited by ~s.o.s~: n/a

0

Dear Sos,
Here are the results. I closed the 'r' and socket in the server program. I ran 20 clients in a loop. I have cross-checked; yes, the client receives the output from the server before I ask the client to close its connection. What I notice is that there are some extra lines in the after.txt. Thank you.

Before

java 3449 root cwd DIR 253,0 4096 23987283 /root/Desktop/lsof
java 3449 root rtd DIR 253,0 4096 2 /
java 3449 root txt REG 253,0 47308 30574179 /usr/java/jdk1.6.0_21/bin/java
java 3449 root mem REG 253,0 125736 5604597 /lib/ld-2.5.so
java 3449 root mem REG 253,0 1611564 5604598 /lib/libc-2.5.so
java 3449 root mem REG 253,0 208352 5604605 /lib/libm-2.5.so
java 3449 root mem REG 253,0 16428 5604599 /lib/libdl-2.5.so
java 3449 root mem REG 253,0 129716 5604600 /lib/libpthread-2.5.so
java 3449 root mem REG 253,0 44060 5604601 /lib/librt-2.5.so
java 3449 root mem REG 253,0 101404 5604611 /lib/libnsl-2.5.so
java 3449 root mem REG 253,0 96117 100342 /usr/java/jdk1.6.0_21/jre/lib/i386/libnet.so
java 3449 root mem REG 253,0 732695 100368 /usr/java/jdk1.6.0_21/jre/lib/ext/mysql-connector-java-5.1.12-bin.jar
java 3449 root mem REG 253,0 56417984 29593755 /usr/lib/locale/locale-archive
java 3449 root mem REG 253,0 52079342 100367 /usr/java/jdk1.6.0_21/jre/lib/rt.jar
java 3449 root mem REG 253,0 117867 131660 /usr/java/jdk1.6.0_21/jre/lib/ext/comm.jar
java 3449 root mem REG 253,0 75985 100351 /usr/java/jdk1.6.0_21/jre/lib/i386/libzip.so
java 3449 root mem REG 253,0 46680 5603368 /lib/libnss_files-2.5.so
java 3449 root mem REG 253,0 32768 19759167 /tmp/hsperfdata_root/3449
java 3449 root mem REG 253,0 188954 100327 /usr/java/jdk1.6.0_21/jre/lib/i386/libjava.so
java 3449 root mem REG 253,0 56701 100350 /usr/java/jdk1.6.0_21/jre/lib/i386/libverify.so
java 3449 root mem REG 253,0 9679468 131094 /usr/java/jdk1.6.0_21/jre/lib/i386/server/libjvm.so
java 3449 root mem REG 253,0 38142 131086 /usr/java/jdk1.6.0_21/jre/lib/i386/jli/libjli.so
java 3449 root mem REG 253,0 38694 131090 /usr/java/jdk1.6.0_21/jre/lib/i386/native_threads/libhpi.so
java 3449 root 0u CHR 136,1 3 /dev/pts/1
java 3449 root 1u CHR 136,1 3 /dev/pts/1
java 3449 root 2u CHR 136,1 3 /dev/pts/1
java 3449 root 3r REG 253,0 52079342 100367 /usr/java/jdk1.6.0_21/jre/lib/rt.jar
java 3449 root 4r REG 253,0 732695 100368 /usr/java/jdk1.6.0_21/jre/lib/ext/mysql-connector-java-5.1.12-bin.jar
java 3449 root 5r REG 253,0 117867 131660 /usr/java/jdk1.6.0_21/jre/lib/ext/comm.jar
java 3449 root 6u IPv6 11975 TCP *:cslistener (LISTEN)
java 3449 root 7u sock 0,5 11973 can't identify protocol

After

java 3449 root cwd DIR 253,0 4096 23987283 /root/Desktop/lsof
java 3449 root rtd DIR 253,0 4096 2 /
java 3449 root txt REG 253,0 47308 30574179 /usr/java/jdk1.6.0_21/bin/java
java 3449 root mem REG 253,0 125736 5604597 /lib/ld-2.5.so
java 3449 root mem REG 253,0 1611564 5604598 /lib/libc-2.5.so
java 3449 root mem REG 253,0 208352 5604605 /lib/libm-2.5.so
java 3449 root mem REG 253,0 16428 5604599 /lib/libdl-2.5.so
java 3449 root mem REG 253,0 129716 5604600 /lib/libpthread-2.5.so
java 3449 root mem REG 253,0 44060 5604601 /lib/librt-2.5.so
java 3449 root mem REG 253,0 101404 5604611 /lib/libnsl-2.5.so
java 3449 root mem REG 253,0 1163700 100364 /usr/java/jdk1.6.0_21/jre/lib/resources.jar
java 3449 root mem REG 253,0 96117 100342 /usr/java/jdk1.6.0_21/jre/lib/i386/libnet.so
java 3449 root mem REG 253,0 732695 100368 /usr/java/jdk1.6.0_21/jre/lib/ext/mysql-connector-java-5.1.12-bin.jar
java 3449 root mem REG 253,0 56417984 29593755 /usr/lib/locale/locale-archive
java 3449 root mem REG 253,0 52079342 100367 /usr/java/jdk1.6.0_21/jre/lib/rt.jar
java 3449 root mem REG 253,0 117867 131660 /usr/java/jdk1.6.0_21/jre/lib/ext/comm.jar
java 3449 root mem REG 253,0 75985 100351 /usr/java/jdk1.6.0_21/jre/lib/i386/libzip.so
java 3449 root mem REG 253,0 46680 5603368 /lib/libnss_files-2.5.so
java 3449 root mem REG 253,0 32768 19759167 /tmp/hsperfdata_root/3449
java 3449 root mem REG 253,0 188954 100327 /usr/java/jdk1.6.0_21/jre/lib/i386/libjava.so
java 3449 root mem REG 253,0 56701 100350 /usr/java/jdk1.6.0_21/jre/lib/i386/libverify.so
java 3449 root mem REG 253,0 9679468 131094 /usr/java/jdk1.6.0_21/jre/lib/i386/server/libjvm.so
java 3449 root mem REG 253,0 38142 131086 /usr/java/jdk1.6.0_21/jre/lib/i386/jli/libjli.so
java 3449 root mem REG 253,0 38694 131090 /usr/java/jdk1.6.0_21/jre/lib/i386/native_threads/libhpi.so
java 3449 root 0u CHR 136,1 3 /dev/pts/1
java 3449 root 1u CHR 136,1 3 /dev/pts/1
java 3449 root 2u CHR 136,1 3 /dev/pts/1
java 3449 root 3r REG 253,0 52079342 100367 /usr/java/jdk1.6.0_21/jre/lib/rt.jar
java 3449 root 4r REG 253,0 732695 100368 /usr/java/jdk1.6.0_21/jre/lib/ext/mysql-connector-java-5.1.12-bin.jar
java 3449 root 5r REG 253,0 117867 131660 /usr/java/jdk1.6.0_21/jre/lib/ext/comm.jar
java 3449 root 6u IPv6 11975 TCP *:cslistener (LISTEN)
java 3449 root 7u sock 0,5 11973 can't identify protocol
java 3449 root 10r REG 253,0 1163700 100364 /usr/java/jdk1.6.0_21/jre/lib/resources.jar
java 3449 root 14r CHR 1,8 1890 /dev/random
java 3449 root 17r CHR 1,9 770 /dev/urandom

0

The results look good; at least there are no open streams/resources lying around. After fixing the code and increasing the file descriptor limit, have you faced any more "too many open files" errors?

Also, have you checked the connection limit for your MySQL server or have you stopped getting those "out of connection" exceptions? Are you still getting super high values of CLOSE_WAIT after the clients start hitting the server?

Edited by ~s.o.s~: n/a

0

Dear Sos,
Actually, I have only tested the fixed code on the test box; so far I have not moved it to the real server. There is no problem yet, but you never know when it can happen, right? About MySQL, I have not had time yet to get into it; I will once I settle this server. Yes, I still see CLOSE_WAIT sometimes: when the number of established connections touches around 240 I notice CLOSE_WAIT connections; at other times there are none. This is what worries me. What shall we do about it? Thank you very much for your time, effort and knowledge; I don't know how to say thank you to you.

0

Actually, I have only tested the fixed code on the test box; so far I have not moved it to the real server. There is no problem yet, but you never know when it can happen, right?

Yes, but with the increased number of file descriptors, it should take a while for that to happen. If you are redeploying, make sure the latest changes go in.

Yes, I still see CLOSE_WAIT sometimes: when the number of established connections touches around 240 I notice CLOSE_WAIT connections; at other times there are none. This is what worries me. What shall we do about it?

This needs more investigation. Now that we have seen that the output of lsof is satisfactory, we need to carry on the same exercise on your testing box for CLOSE_WAIT connections.

Restart the application on the test box and get the netstat output just after starting the server (not just the count, but the entire netstat -nat output). Now run the same client test code which hits the server 20 times, and then again get the netstat output after some time (around 1 minute). Paste both the outputs, and the difference if you can. I'm not sure why there are CLOSE_WAIT connections here, since the code makes sure that all the resources are released after a request is processed.

Thank you very much for your time and effort and knowledge I dont know how to say thank you to you.

You are of course welcome. I just do what I can; if you want to say thanks, just help out someone else in need and that would be good enough for me. :-)

0

Dear Sos,
Below are the results before and after 1 minute. What can you notice? I don't see anything hanging, right? Thank you.

Before

netstat -nat
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 127.0.0.1:2208 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:902 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:3306 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:111 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:631 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:25 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:2207 0.0.0.0:* LISTEN
tcp 0 0 :::9000 :::* LISTEN
tcp 0 0 :::22 :::* LISTEN
tcp 0 0 :::8888 :::* LISTEN
tcp 0 0 :::443 :::* LISTEN
tcp 0 0 ::ffff:192.168.2.15:22 ::ffff:124.13.144.137:3201 ESTABLISHED
tcp 0 0 ::ffff:192.168.2.15:22 ::ffff:124.13.144.137:3206 ESTABLISHED
tcp 0 0 ::1:36978 ::1:38592 TIME_WAIT

After

netstat -nat
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 127.0.0.1:2208 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:902 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:3306 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:111 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:631 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:25 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:2207 0.0.0.0:* LISTEN
tcp 0 0 :::9000 :::* LISTEN
tcp 0 0 :::22 :::* LISTEN
tcp 0 0 :::8888 :::* LISTEN
tcp 0 0 :::443 :::* LISTEN
tcp 0 0 ::ffff:192.168.2.15:22 ::ffff:124.13.144.137:3201 ESTABLISHED
tcp 0 0 ::ffff:192.168.2.15:22 ::ffff:124.13.144.137:3206 ESTABLISHED
tcp 0 0 ::1:36978 ::1:38592 TIME_WAIT

Edited by ~s.o.s~: n/a

0

Yup, I guess we are all good to go; there are no stray connections here. BTW, have you changed the ulimit on this test machine, or is it still the default 1024?

Now all that remains is for the code on your production box to blow up so that you can deploy this new code. If it doesn't blow up, you are good to go anyway. :-)

0

Dear Sos,
No, on the test machine I did not change anything; it is the default 1024. Another thing: I ran all the tests without the wrapper. Can it be that the wrapper is the problem? Another thing, just to bring to your attention: since I changed the ulimit on the server I see I get more connections, around 300 plus, without any problem. Can it be that this is a tuning factor?

0

Another thing: I ran all the tests without the wrapper. Can it be that the wrapper is the problem?

It shouldn't be, but I can't say for sure; your next target should be repeating the same tests (you know the usual routine) *with* the wrapper on the test machine.

Since I changed the ulimit on the server I see I get more connections, around 300 plus, without any problem. Can it be that this is a tuning factor?

Since the original code was missing an explicit close for the Socket on the server side, that might be the reason for all those CLOSE_WAIT sockets. Since the limit is now 2048 (or whatever you have set), the CLOSE_WAIT connections get recycled *before* the process runs out of file descriptors (since the limit is now twice the original). But I'm not really an accomplished sysadmin :-); I'm just thinking through all the possibilities of what *might* have happened.
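To see how CLOSE_WAIT arises, here is a small self-contained sketch using two local sockets: once the peer closes its end, our side reads end-of-stream (-1), and until we call close() ourselves the kernel keeps that socket in CLOSE_WAIT, pinning a file descriptor:

```java
import java.net.ServerSocket;
import java.net.Socket;

public class CloseWaitDemo {
    public static void main(String[] args) throws Exception {
        ServerSocket server = new ServerSocket(0);               // ephemeral local port
        Socket client = new Socket("127.0.0.1", server.getLocalPort());
        Socket accepted = server.accept();

        client.close();                                          // peer sends FIN
        int eof = accepted.getInputStream().read();              // -1 = end of stream
        System.out.println("read after peer close: " + eof);

        // Until this close() runs, 'accepted' would show as CLOSE_WAIT in netstat
        accepted.close();
        server.close();
        System.out.println("closed: " + accepted.isClosed());
    }
}
```

So a burst of CLOSE_WAIT entries on the server side is the classic signature of the application receiving the client's FIN but not (yet) closing its own socket, which matches the missing close in the original server code.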

Edited by ~s.o.s~: n/a

0

Dear Sos,
OK, just to confirm: I will set up the wrapper and perform another round of the entire tests, before and after. It will take some time, then I will update you here. I think maybe your logic about the increase of the ulimit could be true too.
