Hello!
I would like some help with a piece of Java code that I'm having problems with.

Every x seconds, I have to open simultaneous TCP socket connections to multiple machines in order to get something like a status update packet.

I use a Callable class: the main thread creates one Callable per machine and submits it to a thread pool, and each resulting Future task connects to its machine, sends a query packet, and receives a reply, which is returned to the main thread.

My socket connection class is:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.concurrent.Callable;

public class ClientConnect implements Callable<String>
{
    private static final int PORT = 2223;
    private static final int CONNECT_TIMEOUT_MS = 10000;

    private final String hostName;
    private final String hostIp;

    ClientConnect(String hostName, String hostIp)
    {
        this.hostName = hostName;
        this.hostIp = hostIp;
    }

    @Override
    public String call()
    {
        return getData();
    }

    private String getData()
    {
        // try-with-resources closes the socket (and with it both streams) on every path
        try (Socket so = new Socket())
        {
            // getByName() throws UnknownHostException (an IOException), so a bad
            // address is handled by the catch below instead of connecting to null
            so.connect(new InetSocketAddress(InetAddress.getByName(hostIp), PORT), CONNECT_TIMEOUT_MS);

            PrintWriter out = new PrintWriter(so.getOutputStream(), true);
            BufferedReader in = new BufferedReader(new InputStreamReader(so.getInputStream()));
            out.println("\1IDC_UPDATE\1");

            // readLine() returns null if the server closes the connection without replying
            String line = in.readLine();
            if (line == null)
            {
                return hostName + "|-1";
            }
            String[] response = line.split("\1");
            if (response.length < 3)
            {
                return hostName + "|-1"; // reply too short to contain the status field
            }

            try
            {
                Integer.parseInt(response[2]);
            }
            catch (NumberFormatException e)
            {
                System.out.println("Number format exception");
                return hostName + "|-1";
            }

            return hostName + "|" + response[2];
        }
        catch (IOException e)
        {
            // connect timeout, refused connection, or any other I/O failure
            return hostName + "|-1";
        }
    }
}
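The reply handling in ClientConnect assumes the server answers with a single line whose fields are separated by the \1 (SOH) character and whose third field (index 2) is a number; that format is inferred from the split("\1") and response[2] in the code, not from any server documentation. A minimal sketch that pulls this logic into a pure method so it can be tested without a network; the StatusReply class and parse method are hypothetical names:

```java
/** Hypothetical helper that turns one raw reply line into "hostname|value". */
final class StatusReply {
    private StatusReply() {}

    /**
     * Assumes fields are separated by '\1' (SOH) and that the field at index 2
     * holds an integer. Returns "hostname|-1" for a missing, short, or
     * non-numeric reply; otherwise returns the field unchanged, as ClientConnect does.
     */
    static String parse(String hostName, String line) {
        if (line == null) {
            return hostName + "|-1"; // server closed the connection without replying
        }
        String[] fields = line.split("\1");
        if (fields.length < 3) {
            return hostName + "|-1"; // reply too short to contain a status field
        }
        try {
            Integer.parseInt(fields[2]); // validate only; the raw field is returned
        } catch (NumberFormatException e) {
            return hostName + "|-1";
        }
        return hostName + "|" + fields[2];
    }
}
```

Inside ClientConnect, the whole block after readLine() could then shrink to return StatusReply.parse(hostName, in.readLine());.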

And this is how I create the thread pool in my main class:

// In the main class (needs java.util.* and java.util.concurrent.* imports)
private void startThreadPool()
{
    ExecutorService pool = Executors.newFixedThreadPool(30);
    List<Future<String>> list = new ArrayList<>();
    for (Map.Entry<String, String> entry : pc_nameip.entrySet())
    {
        Callable<String> worker = new ClientConnect(entry.getKey(), entry.getValue());
        list.add(pool.submit(worker));
    }
    // No more tasks will be submitted. A new pool is created on every call,
    // so without shutdown() each call would leave its worker threads alive.
    pool.shutdown();

    for (Future<String> future : list)
    {
        try
        {
            String threadResult = future.get();
            //........ PROCESS DATA HERE!..........//
        }
        catch (InterruptedException e)
        {
            Thread.currentThread().interrupt(); // restore the interrupt flag and stop
            return;
        }
        catch (ExecutionException e)
        {
            e.printStackTrace();
        }
    }
}
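The loop above can block on future.get() for as long as a worker is stuck. One alternative, sketched here as an assumption rather than a drop-in fix, is ExecutorService.invokeAll(tasks, timeout, unit): it returns once every task has finished or the deadline has passed, cancelling any task still running at that point. The StatusPoller class, the "timeout" marker string, and the deadline value are hypothetical. Cancellation works by interrupting the worker, and a thread blocked in a plain java.net.Socket read does not react to interruption, so the deadline bounds how long the main thread waits, not how long a stuck worker lives.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CancellationException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: run all status queries with one overall deadline. */
final class StatusPoller {
    private StatusPoller() {}

    static List<String> pollAll(List<? extends Callable<String>> tasks,
                                long deadline, TimeUnit unit, int threads)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Waits until every task is done or the deadline passes;
            // tasks still running at the deadline are cancelled.
            List<Future<String>> futures = pool.invokeAll(tasks, deadline, unit);
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) { // same order as the task list
                try {
                    results.add(f.get()); // does not block: every future is done here
                } catch (CancellationException e) {
                    results.add("timeout"); // hypothetical marker for a missed deadline
                } catch (ExecutionException e) {
                    results.add("error: " + e.getCause());
                }
            }
            return results;
        } finally {
            // Release the pool; shutdownNow() also interrupts any stragglers.
            pool.shutdownNow();
        }
    }
}
```

For example, with the ClientConnect objects for all entries of pc_nameip collected into a list workers, pollAll(workers, 15, TimeUnit.SECONDS, 30) would wait at most about 15 seconds per polling round; 15 seconds is an arbitrary value chosen to exceed the 10-second connect timeout.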

The pc_nameip map contains <hostname, hostip> pairs, and for every entry I create a ClientConnect object.

My problem is that when my list of machines contains, say, 10 PCs (most of which are not alive), I get a lot of timeout exceptions on the live PCs, even though my timeout limit is set to 10 seconds.

If I force the list to contain a single working PC, I have no problem.

The timeouts seem random, and I have no clue what's causing them.

I have also seen a complete freeze on the future.get() line a couple of times.

Am I missing something, or could it be an OS network restriction?
I am testing this code on Windows XP SP3.
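On the OS question, one possibility worth checking (an assumption, not a confirmed diagnosis): Windows XP SP2 introduced a limit on concurrent incomplete ("half-open") outbound TCP connection attempts, commonly cited as 10, beyond which further attempts are queued. With most hosts dead, each connect() to a dead host stays half-open for the full timeout, which could delay attempts to live hosts. A sketch that caps how many connect attempts are in flight at once using java.util.concurrent.Semaphore; the ConnectGate class and the limit value are hypothetical:

```java
import java.io.IOException;
import java.net.Socket;
import java.net.SocketAddress;
import java.util.concurrent.Semaphore;

/**
 * Hypothetical gate that caps how many TCP connect attempts are in progress
 * at once. One instance would be shared between all ClientConnect tasks.
 */
final class ConnectGate {
    private final Semaphore permits;

    ConnectGate(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent, true); // fair: waiters served in FIFO order
    }

    /**
     * Connects the socket while holding a permit. The permit is released as
     * soon as connect() returns or throws, so it covers only the handshake,
     * not the request/reply exchange that follows.
     */
    void connect(Socket socket, SocketAddress address, int timeoutMs)
            throws IOException, InterruptedException {
        permits.acquire(); // waits while maxConcurrent attempts are already in flight
        try {
            socket.connect(address, timeoutMs);
        } finally {
            permits.release(); // always hand the permit back, even on timeout
        }
    }

    /** Free permits; equals maxConcurrent when no attempt is in progress. */
    int available() {
        return permits.availablePermits();
    }
}
```

In ClientConnect, a shared static instance (for example new ConnectGate(8), a value picked arbitrarily below 10) would replace the direct so.connect(...) call with gate.connect(so, address, 10000). Time spent waiting for a permit does not count toward the 10-second connect timeout. Simply shrinking newFixedThreadPool(30) has a similar effect, at the cost of also limiting the reply phase.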

P.S. All machines are on a local network.

Thanks in advance!

Edited 3 Years Ago by ktsangop: Clarification

Minor comments aside:

I get a lot of timeout exceptions on the live PCs, even though my timeout limit is set to 10 seconds.

You mean you get a timeout exception before 10 seconds on some live PCs?

If I force the list to contain a single working PC, I have no problem.

What happens if the list contains one non-working PC?

I have also seen a complete freeze on the future.get() line a couple of times.

That shouldn't happen unless the client and the server don't coordinate properly and one of them ends up waiting indefinitely for data that never arrives.
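In the ClientConnect code above, the 10-second limit applies only to connect(); readLine() has no timeout, so if a server accepts the connection but never sends a complete line, that worker blocks forever and so does future.get() in the main thread. A minimal sketch of bounding the read with Socket.setSoTimeout, which makes a blocked read throw java.net.SocketTimeoutException; the TimedQuery class, its query method, and the timeout values are hypothetical:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.SocketTimeoutException;

/** Hypothetical variant of the status query with both a connect and a read timeout. */
final class TimedQuery {
    private TimedQuery() {}

    /**
     * Sends one request line and returns the reply line, or null if the server
     * closed the connection or did not reply within readTimeoutMs.
     */
    static String query(InetSocketAddress address, String request,
                        int connectTimeoutMs, int readTimeoutMs) throws IOException {
        try (Socket so = new Socket()) { // closing the socket also closes its streams
            so.connect(address, connectTimeoutMs);
            so.setSoTimeout(readTimeoutMs); // each blocking read now waits at most this long
            PrintWriter out = new PrintWriter(so.getOutputStream(), true);
            BufferedReader in = new BufferedReader(new InputStreamReader(so.getInputStream()));
            out.println(request);
            try {
                return in.readLine();
            } catch (SocketTimeoutException e) {
                return null; // server accepted the connection but never replied
            }
        }
    }
}
```

setSoTimeout applies to each blocking read, so a server that trickles bytes slowly could still keep readLine() busy for longer than readTimeoutMs.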

Are these servers also written by you? Do all the machines run the same server software? Also, I believe your code needs a bit more logging: log something when the connection succeeds, log the raw response from the server, and so on. That way you know exactly where the code fails.
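One low-effort way to get per-host logging is to wrap each Callable before submitting it. A sketch using java.util.logging; the LoggedTask class is hypothetical, and it only sees each task's start, result, and duration, so logging the successful connection and the raw response themselves would still have to go inside ClientConnect:

```java
import java.util.concurrent.Callable;
import java.util.logging.Level;
import java.util.logging.Logger;

/**
 * Hypothetical wrapper that logs when a per-host task starts, what it
 * returned or threw, and how long it took, so slow hosts stand out.
 */
final class LoggedTask<T> implements Callable<T> {
    private static final Logger LOG = Logger.getLogger(LoggedTask.class.getName());

    private final String host;
    private final Callable<T> task;

    LoggedTask(String host, Callable<T> task) {
        this.host = host;
        this.task = task;
    }

    @Override
    public T call() throws Exception {
        long start = System.nanoTime();
        LOG.info(host + ": query started");
        try {
            T result = task.call();
            LOG.info(host + ": returned \"" + result + "\" after " + elapsedMs(start) + " ms");
            return result;
        } catch (Exception e) {
            LOG.log(Level.WARNING, host + ": failed after " + elapsedMs(start) + " ms", e);
            throw e; // keep the original failure visible to future.get()
        }
    }

    private static long elapsedMs(long startNanos) {
        return (System.nanoTime() - startNanos) / 1_000_000;
    }
}
```

Submitting new LoggedTask<String>(entry.getKey(), worker) instead of worker would log, for every host, whether a timeout result really took the full 10 seconds.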

Edited 3 Years Ago by ~s.o.s~

You mean you get a timeout exception before 10 seconds on some live PCs?

No, I get a normal timeout after 10 seconds, even though the remote socket is accepting connections.

What happens if the list contains one non-working PC?

Well, I'll have to look at it and post back.
EDIT: Just a normal timeout after 10 seconds, with a typical timeout exception.

The servers are written by me, but they have been working for more than 2 years without any problems. I'm trying to add extra functionality to my client with these extra threads.

Also, I believe your code needs a bit more logging

I removed the logging from my post (to keep it cleaner), but it exists in my source file.

Thanks, I will post back some more results soon.

Edited 3 Years Ago by ktsangop: typo/update

Using SocketSniff, I monitored my remote server to see whether the packets were arriving there correctly.

What I got was something like this
(my Java app runs on 192.168.1.7, the remote server on 192.168.1.159).
A normal send/receive call (the data is not shown here, but it is correct):

==================================================
Socket            : 0x000006A0
Index             : 3
Type              : TCP
Local Address     : 192.168.1.159
Local Port        : 2223
Remote Address    : 192.168.1.7
Remote Port       : 6376
Send Calls        : 1
Receive Calls     : 1
Sent              : 22
Received          : 12
Closed            : Yes
==================================================

A call that timed out (no data here):

==================================================
Socket            : 0x000006E0
Index             : 5
Type              : TCP
Local Address     : 0.0.0.0
Local Port        : 2223
Remote Address    : 192.168.1.7
Remote Port       : 6403
Send Calls        : 0
Receive Calls     : 1
Sent              : 0
Received          : 0
Closed            : Yes
==================================================

So the obvious difference is that the local address is shown as 0.0.0.0. Is this normal?

Can anybody see anything abnormal that I can't?

Thanks.

Edited 3 Years Ago by ktsangop

The 0.0.0.0 is indeed problematic since it means that the server is not connected to the network. Are you sure you are not using the loopback adapter (localhost or 127.0.0.1) when starting the server? Are all the servers spawned in the same manner?

The server is written in C and uses INADDR_ANY as its local address, which I always thought was 127.0.0.1, right?
Could this be a problem?

The servers are all identical.

The server is written in C and uses INADDR_ANY as its local address, which I always thought was 127.0.0.1, right?

I don't do C, but after reading a bit, it seems that INADDR_ANY binds to all available interfaces on the host, which should make the server reachable from any machine on the network. If you are sure that all the servers are "started" the same way and all the configurations are the same, I see no reason why some hosts should work while others don't. Really strange...
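For comparison, Java has the same distinction: a ServerSocket created without a bind address listens on the wildcard address (the equivalent of INADDR_ANY), which is not the loopback address 127.0.0.1. A small sketch that checks this from Java's side only; it says nothing about the C server itself, and the WildcardDemo class is hypothetical:

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.UnknownHostException;

/** Hypothetical demo: the wildcard address (INADDR_ANY, 0.0.0.0) is not the loopback address. */
final class WildcardDemo {
    private WildcardDemo() {}

    /** True if the IP literal is the wildcard ("any local") address. */
    static boolean isWildcard(String ipLiteral) throws UnknownHostException {
        // getByName() with an IP literal only parses it; no DNS lookup happens
        return InetAddress.getByName(ipLiteral).isAnyLocalAddress();
    }

    /** True if the IP literal is a loopback address (reachable only from the same host). */
    static boolean isLoopback(String ipLiteral) throws UnknownHostException {
        return InetAddress.getByName(ipLiteral).isLoopbackAddress();
    }

    /** True if a ServerSocket created without a bind address listens on the wildcard address. */
    static boolean defaultServerBindIsWildcard() throws IOException {
        try (ServerSocket ss = new ServerSocket(0)) { // port 0: the OS picks a free port
            return ss.getInetAddress().isAnyLocalAddress();
        }
    }
}
```

A server bound to 127.0.0.1 would accept connections only from its own machine, whereas one bound to the wildcard address accepts them on every interface, so a listening socket showing 0.0.0.0 as its local address is what INADDR_ANY is expected to produce.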

Thank you ~s.o.s~ for your effort.
I have tried almost everything over the past 3 days, and I can't figure it out either. I will try to set up another dummy server to test this.
It seems impossible for an application that has been running since 2010 to show such strange behaviour, but everything else has been ruled out. It might be the server...
:-(

UPDATE:

After creating two new server machines, and keeping one that was getting a lot of timeouts, I have the following results:

For 100 thread runs over 20 minutes :

NEW_SERVER1: 99 successful connections / 1 timeout
NEW_SERVER2: 94 successful connections / 6 timeouts
OLD_SERVER: 57 successful connections / 43 timeouts
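Tallies like these can be collected by the client itself. A sketch assuming the "hostname|value" strings returned by ClientConnect, where a value of -1 marks a failed query; since ClientConnect returns -1 for timeouts and for every other error alike, the failure count is an upper bound on timeouts. The PollStats class is hypothetical:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.LongAdder;

/**
 * Hypothetical per-host tally of poll results, assuming each result is a
 * "hostname|value" string where "-1" marks a failed query.
 */
final class PollStats {
    private final ConcurrentMap<String, LongAdder> ok = new ConcurrentHashMap<>();
    private final ConcurrentMap<String, LongAdder> failed = new ConcurrentHashMap<>();

    /** Records one result string such as "pc1|42" or "pc1|-1". */
    void record(String result) {
        int bar = result.lastIndexOf('|');
        String host = bar >= 0 ? result.substring(0, bar) : result;
        String value = bar >= 0 ? result.substring(bar + 1) : "-1";
        ConcurrentMap<String, LongAdder> target = "-1".equals(value) ? failed : ok;
        target.computeIfAbsent(host, h -> new LongAdder()).increment();
    }

    long successes(String host) {
        return count(ok, host);
    }

    long failures(String host) {
        return count(failed, host);
    }

    /** Failures as a percentage of all recorded runs for the host (0 if none). */
    double failurePercent(String host) {
        long total = successes(host) + failures(host);
        return total == 0 ? 0.0 : 100.0 * failures(host) / total;
    }

    private static long count(ConcurrentMap<String, LongAdder> map, String host) {
        LongAdder adder = map.get(host);
        return adder == null ? 0 : adder.sum();
    }
}
```

Calling stats.record(threadResult) at the "PROCESS DATA HERE" point would reproduce the figures above automatically; with OLD_SERVER's 57 successes and 43 failures, failurePercent("OLD_SERVER") returns 43.0.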

Other info:
- I experienced a JRE crash (EXCEPTION_ACCESS_VIOLATION (0xc0000005)) once and had to restart the application.
- I noticed that while the app was running, my network connection was struggling when I browsed the internet. I have no idea whether this is expected, but I don't think having at most 15 threads is that much.

So, first of all, my old servers had some kind of problem. I have no idea what it was, since my new servers were created from the same OS image.

Secondly, although the timeout percentage has dropped dramatically, I still think it is unusual to get even one timeout on a small LAN like ours. But this could be a problem in the server application.

Finally, my view is that, apart from the old servers' problem (I still can't believe I lost so much time on that!), there must be either a bug in the server application or a JDK-related bug (since I experienced that JRE crash).

P.S. I use Eclipse as my IDE, and my JRE is the latest version.

If any of the above rings any bells, please comment.
Thank you.
