Hi Dw

I have a multi-threaded server that thousands of clients connect to. The clients send in data that the server must write to a text file. Because there are thousands of clients sending data, the server has about 60 text files, and the data sent by each client tells the server which text file to write to. Suppose that on the server side I have these two text files: "test1.txt" and "test2.txt".

Let's say the client sends <test2, testing>. The server knows that it should write "testing" to test2.txt. With a thousand clients, it can happen that maybe 50 clients send the same data to the server at the same time, which means the threads will write to the same file simultaneously.

So how can I make sure that all the data written by all the threads at the same time is saved to the file? In other words, how do I allow multiple threads to write to the same file simultaneously without overwriting the existing text?

Thanks in advance.

Sample code of this server: www.daniweb.com/software-development/java/threads/481619/server-split-message-and-save-data-on-a-text-file

All 43 Replies

Not overwriting the existing text is the least of your problems. Opening the file with the append flag specified solves the issue of overwriting. That makes sure the new data is added to the end of the file. In Java it is usually a boolean passed as the last argument when you open the file, for example to the FileWriter or FileOutputStream constructor.
Concurrency is by far the bigger issue. IO is slow and you could have a lot of threads all waiting for the threads before them to open, write and close the file. At least, if that is all each thread does, no other operations, you should be spared from deadlocks.
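
For reference, a minimal sketch of opening a file in append mode, assuming the java.nio.file API. The names AppendExample and appendLine are made up for illustration. It only addresses overwriting, not what happens when several threads write at once.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not taken from the original server code.
class AppendExample {
    // Appends one line to the end of the file, creating the file if needed.
    static void appendLine(Path file, String line) throws IOException {
        byte[] bytes = (line + System.lineSeparator()).getBytes(StandardCharsets.UTF_8);
        Files.write(file, bytes,
                StandardOpenOption.CREATE,   // create the file when it is missing
                StandardOpenOption.APPEND);  // add to the end instead of truncating
    }
}
```

The older java.io equivalent would be new FileWriter(fileName, true), where true is the append flag.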

Well, I don't think it will be slow, because each write is only 3 bytes. It just writes a 1 and nothing else, with each 1 on its own line.

You could:
1. Have each thread synchronise its file I/O using the File object as the lock. This is simple, but may block the client/server connections for too long.
2. As each client sends its data, append that data to an in-memory queue (one queue for each file), then have one thread for each file taking the data from its queue and writing it to file. Apart from never blocking the client, this has the huge advantage of allowing the transactions for each file to be batched up for more efficient I/O.

Opening the file with the append flag specified solves the issue of overwriting. That makes sure the new data is added to the end of the file.

Be careful, this idea does not guarantee that the concurrency issue is prevented. Appending to a file is not a solution in any way (regardless of how much data is being written to the file). There is still a possibility that two or more threads open the same file to append. As a result, the data from the last thread that successfully writes to the file is saved, but not the others.

I would suggest #2 from JamesCherrill's solutions. The queue would be able to deal with the 'simultaneous' condition per se. I am not sure whether the Queue classes in the Java library already synchronize adding data to the queue for you. If not, you may need to implement one.

The obvious choice is a LinkedBlockingQueue. As the API doc says:

BlockingQueue implementations are thread-safe. All queuing methods achieve their effects atomically using internal locks or other forms of concurrency control. ... a BlockingQueue can safely be used with multiple producers and multiple consumers.

so there's no need for any additional synchronisation when you add or remove items from a BlockingQueue. Neat, don't you think?

I see. Yes, it is neat. ;)

Ok thanks but I'm way lost on how I will go about doing this.

Create a LinkedBlockingQueue for each file. When a client sends data, just add it to the appropriate queue.
For each queue, create a thread that loops, taking the head item from the queue and writing it to its file. (take() will simply wait if the queue is empty, and return when someone puts something into the queue.)

For starters, just do a little proof-of-concept, with two hard-wired files. Get that working, then generalise.
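
A proof-of-concept along those lines could look like the sketch below: one queue plus one writer thread per file. The class name QueuedFileWriter, its methods, and the STOP sentinel used for shutdown are assumptions for illustration, not part of the original server.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical helper: owns one file, one queue and one writer thread.
class QueuedFileWriter {
    // Distinct String instance used only as an end-of-work marker.
    private static final String STOP = new String("STOP");
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final Thread writer;

    QueuedFileWriter(Path file) {
        writer = new Thread(() -> drain(file));
        writer.start();
    }

    // Called by any client thread; it never touches the file itself.
    void submit(String line) {
        queue.add(line);
    }

    // Lets the writer finish everything queued so far, then waits for it.
    void close() throws InterruptedException {
        queue.add(STOP);
        writer.join();
    }

    private void drain(Path file) {
        try (BufferedWriter out = Files.newBufferedWriter(file, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            while (true) {
                String line = queue.take();   // blocks while the queue is empty
                if (line == STOP) break;      // identity check against the marker
                out.write(line);
                out.newLine();
                out.flush();
            }
        } catch (IOException e) {
            e.printStackTrace();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

For the two hard-wired files you would create one QueuedFileWriter for test1.txt and one for test2.txt, and have each client thread call submit on the right one.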

When you say "Create a LinkedBlockingQueue for each file", do you mean that, as I have almost 1000 files, I have to create one for each file?

What I'm also not clear about is how to use the "Producer/Consumer" approach, i.e. the LinkedBlockingQueue, in my server. If you remember, the client sends one message that contains two messages. I couldn't manage to take the data that was split from each message portion and assign it to its own unique variable, like fyear for the year in the first portion. Instead I took advantage of the split loop: once the first message portion has been split, I assign its data to the associated variables and save the data to the corresponding files. The loop then moves on to the second portion, reuses the same variables that it used for the first portion, and saves the second portion's data to the corresponding files correctly. So what's not clear now is how I would use the "Producer/Consumer" in this situation, and, if it manages to load the data into memory, how I would retrieve it given that all this happens inside a loop. Please refer to this: http://www.daniweb.com/software-development/java/threads/481619/server-split-message-and-save-data-on-a-text-file

Thank you.

I am not certain how you designed your program... Is it as below?

/*
1)
 Client1    Client2    Client3   ...    Clientn
    |          |          |                |
    v          v          v                v
MsgThread  MsgThread  MsgThread  ...   MsgThread
    \          \          /                /
     \          \        /                /
       ----------------------------------
                     |
                     v
              Producer/Consumer


                     OR

2)
 Client1    Client2    Client3   ...    Clientn
    \          \          /                /
     \          \        /                /
       ----------------------------------
                     |
                     v
              Producer/Consumer
*/

I can't read that clearly as I'm on a mobile at the moment; your post shows up as a mixture of characters. To keep it simple, I haven't added the "Producer/Consumer" yet. I've used your code to split the messages and JamesCherrill's code for the multi-threading, and I haven't made any changes so far because I'm not sure where to add the "Producer/Consumer". I'm also not too familiar with Java; the only thing that forced me to change from VB.NET to Java was the ability to support multi-threading. But I think it would be better if the server receives the data from the clients and, if for instance 10 clients send the same message at the same time, sorts it in memory by adding all the related data like this:
1
1
1
1
1
1
1
1
1
1
Once that has been sorted it can be written to the file, and while the data is being written to the file the memory can receive more data. Once the sorting and writing to the file are done, the server sends "done" to each client to say that the data it sent has been processed (sorted) and written to the file. I'm open to any suggestion that simplifies this and still lets the server write every piece of data that the clients send, with nothing left unwritten. I think it would be better if the server sends a "done" response to the client once the client's data has been written to the file. Note that this is multi-threaded, and the server may send a "failed" response if it ran into any difficulties.

the server has about 60 texts files

later...

I have almost 1000 files

which is it?

Because I'm also not too familiar with java the only thing that forced me to change from VB.NET to Java was the ability to support multi-threading

Not sure why you say that... The CLR (on which VB.NET runs) is perfectly capable of creating threads. But I agree, since you are talking about a "server" application, Java fits the role much better than VB.NET.

From my understanding, you mean messages from clients sent via their mobile? I updated the diagram below. Is it what you are looking for?

/*
----------------- Client Side ------------------
 Client1    Client2    Client3   ...    Clientn
    \          \          /                /
     \          \        /                /
       ----------------------------------
                     |
----------------- Server Side ------------------
                     |
                     v
                   ?????
       Who/What is handling this part?
                     |
                     v
                Split Message
                     |
                     v
             Add message to queue
               (using thread)
                     |
                     v
            Write messages to file
               (using thread)
*/

You don't need to create a queue per file if you don't want to, but you need to keep track of the queue and which message should be written to which file. In other words, the object added to/removed from the queue would be your own customized class that contains the message and the file it should be written to.

As I understand it 11/12 is a standard Java server socket, creating a thread to handle each client socket as it connects. This is all about the threading - with 1000 clients you can't have one waiting on the other 999, similarly you need to serialise the file writes, preferably on a per-file basis.

@Taywin. I meant that I'm viewing DaniWeb on a mobile phone, but the program is computer based. The clients send a message to the server, and the server is currently multi-threaded. I've just created two clients using Visual Basic 5.0, and I made each client's Winsock able to receive the incoming response from the server. At the moment, when a client sends a message, the server receives it and sends it back to that client as feedback. I think that gives me a clue on how to pause that particular client when the "Producer/Consumer" queue is full. The problem that is still there is the split loop, or should I say the "for" loop: I need to load the message data into memory so that I can access it just like an array, "something[0], something[1]" and so on.

Oh, and @Taywin, about that 3rd post of yours, the one I answered with something about a mobile phone, where you said you were not certain about the design of my system: I've just looked at it on a computer, and the answer is that the system is like the first diagram, where the clients are connected to threads and the threads should connect to the Producer/Consumer.

@JamesCherrill, about the 60 files: those are text files in one folder, but combining all 10 folders gives me 600, and there are other files in the year folder, almost 400 of them, so that gives 1000. Sorry for not making that clear at first, but all my samples focus on just one folder to get the idea across and for simplicity.

That changes everything. Thousands of clients to 60 files means massive contention for each file, and FIFO queueing is appropriate. But if it's a thousand files then there will be little contention, and it would be OK just for each client to gain a lock on the file for the duration of its write.

Yes, that's what I'm trying to achieve here. One client can get access and write, but what about the other, let's say, 600 clients that are also trying to write to this file at the same time? How do I make sure that the data written by the first client that got access stays in the file while the other clients are also writing to it, and that the data those clients wrote is kept as well? I think it will be better if we forget about the number of files the server has for a moment and just focus on one file for simplicity. Let's say the file is test1.txt, and I have a multi-threaded server and as many clients as possible writing data to test1.txt at the same time. How would that be solved?

I've read about locks, but I didn't know where to put a lock in my server.

Thank you.

You simply need some arbitrary object xxx associated in some way with that file, then your file writing method needs a

synchronized (xxx) {   // the Java keyword is spelled with a z
   // update the file: only one thread at a time per xxx object
}

so that multiple threads can't do the // update the file simultaneously.
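
One way to get "some arbitrary object associated with that file" is a map from file name to lock object. The sketch below assumes every thread refers to a given file by exactly the same string; PerFileLockWriter and appendLine are hypothetical names.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical helper: one lock object per file name.
class PerFileLockWriter {
    // Lock objects are created on first use and then shared by all threads.
    private static final ConcurrentMap<String, Object> LOCKS = new ConcurrentHashMap<>();

    static void appendLine(String fileName, String line) throws IOException {
        // Assumption: the same file is always named by the same string.
        Object lock = LOCKS.computeIfAbsent(fileName, name -> new Object());
        synchronized (lock) {
            // Only one thread per file gets here at a time;
            // threads writing to other files are not blocked.
            byte[] bytes = (line + System.lineSeparator()).getBytes(StandardCharsets.UTF_8);
            Files.write(Paths.get(fileName), bytes,
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        }
    }
}
```

Each client thread would simply call PerFileLockWriter.appendLine("test1.txt", "1") after it has split the message.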

However, there's still a real decision about whether to use locks like that, or queue the transactions for each file. Which is best depends on how many files, and how many clients are trying to access any one file at once (etc). Before you go too far in either direction you really need to get a handle on the complete requirements, and throughput and ratio numbers, so you can make the right choice.

OK, I will give you 3 ideas of how to implement this.

The first idea is the diagram below. The ClientMessage class contains the file path/name that the message will be written to. The Producer Class creates the object and adds it to the queue. The Producer may make use of multi-threading.

/*
 Client1    Client2    Client3   ...    Clientn
    |          |          |                |
    v          v          v                v
MsgThread  MsgThread  MsgThread  ...   MsgThread
    \          \          /                /
     \          \        /                /
       ----------------------------------
                     |
                     v
              Producer Class
      create ClientMessage object and
          add the message data to
      LinkedBlockingQueue<ClientMessage>
                     |
                     v
              Consumer Class
    remove ClientMessage from the queue and
              write-to-file
*/

The first idea does not use multiple threads to do the write-to-file; it uses only one thread/process to deal with writing to file. In other words, only one thread/process removes a message to write at a time, and it always appends the message to the specified file.

Now, if you want to use multiple threads in the Consumer Class, the thread method (write-to-file) must be synchronized in order to prevent missing content when writing to file.

The pro is that you do not need to worry much about ordering messages because the LinkedBlockingQueue takes care of that for you. The con could be that it may not scale: past a certain point, when a lot of messages come in at the same time, the queue gets extremely large and could cause side effects (e.g. bottleneck, low memory, etc.).
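
A rough sketch of the first idea, with one shared queue and a single consumer thread. The ClientMessage fields, the SingleQueueWriter class and the capacity of 10,000 are assumptions for illustration. Giving the queue a capacity is one possible answer to the memory concern: put() makes producers wait when the queue is full instead of letting it grow without limit.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical message type: the text plus the file it belongs to.
class ClientMessage {
    final String fileName;
    final String text;

    ClientMessage(String fileName, String text) {
        this.fileName = fileName;
        this.text = text;
    }
}

// Hypothetical consumer: the only thread that ever writes to the files.
class SingleQueueWriter {
    private static final ClientMessage STOP = new ClientMessage("", "");
    // Bounded queue (capacity is an arbitrary example value).
    private final BlockingQueue<ClientMessage> queue = new LinkedBlockingQueue<>(10_000);
    private final Thread consumer = new Thread(this::consume);

    void start() {
        consumer.start();
    }

    // Producer side: called from the client-handling threads.
    void submit(String fileName, String text) throws InterruptedException {
        queue.put(new ClientMessage(fileName, text)); // waits if the queue is full
    }

    // Processes everything queued so far, then stops the consumer.
    void stop() throws InterruptedException {
        queue.put(STOP);
        consumer.join();
    }

    private void consume() {
        try {
            while (true) {
                ClientMessage m = queue.take(); // waits if the queue is empty
                if (m == STOP) return;
                byte[] bytes = (m.text + System.lineSeparator()).getBytes(StandardCharsets.UTF_8);
                Files.write(Paths.get(m.fileName), bytes,
                        StandardOpenOption.CREATE, StandardOpenOption.APPEND);
            }
        } catch (IOException e) {
            e.printStackTrace();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```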

The second idea (diagram below) is to get rid of the Consumer Class and utilize multi-thread to write inside Producer Class (which becomes Producer/Consumer Class).

/*
 Client1    Client2    Client3   ...    Clientn
    |          |          |                |
    v          v          v                v
MsgThread  MsgThread  MsgThread  ...   MsgThread
    \          \          /                /
     \          \        /                /
       ----------------------------------
                     |
                     v
          Producer/Consumer Class
            write data to file
*/

What you need to do here is to synchronize the write-to-file method. That's all you need to do.

The pro is that it is very simple to implement. You do not need to create anything else, only control the write to file.

The con is that it does not take full advantage of multi-threading. In other words, even though each write may be going to a different file, every write has to wait its turn because write-to-file allows only one write at a time.
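
In code, the second idea could be as small as the sketch below (GlobalLockWriter and writeToFile are hypothetical names). A static synchronized method locks on the class itself, which is why writes to different files still queue up behind each other.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Hypothetical helper for the second idea: one global lock for all files.
class GlobalLockWriter {
    // "static synchronized" uses GlobalLockWriter.class as the lock, so at most
    // one thread is inside this method at any time, whatever the file name.
    static synchronized void writeToFile(String fileName, String line) throws IOException {
        byte[] bytes = (line + System.lineSeparator()).getBytes(StandardCharsets.UTF_8);
        Files.write(Paths.get(fileName), bytes,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }
}
```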

The third idea is to lock the file and synchronize across multiple threads. The diagram is the same as for the second idea, but the implementation of write-to-file is a bit different. The synchronized part is where you check whether the file is locked. If it is locked, exit the synchronized section and go to sleep. If the file is not locked, take the lock, exit the synchronized section, write to file, release the lock, and notify other threads.

The pro is that you make maximum use of multi-threading. If you implement this correctly, you should not have a scalability issue.

The con is the difficulty and complexity of implementation. You must ensure that you lock and release the file at the correct step. If you do it wrong, you could create a deadlock in your system. Also, in theory, you could create starvation (a message waits forever and will never be written to file). Starvation is extremely rare but could still happen. Anyway, if you do everything correctly, there will not be any problem. However, maintenance (code updates) may affect the stability and could create a deadlock again. In other words, this type of implementation needs a lot of care.
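
To make the steps of the third idea concrete, here is one possible sketch using a set of "busy" file names with wait/notifyAll. FileLockTable and its members are hypothetical names, and this is only one way to read the description above. The finally block is what guards against the forgotten-release mistake.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper for the third idea: explicit per-file lock flags.
class FileLockTable {
    // Names of the files that some thread is currently writing to.
    private final Set<String> busy = new HashSet<>();

    void writeToFile(String fileName, String line) throws IOException, InterruptedException {
        synchronized (busy) {
            // Check whether the file is locked; sleep until it is free.
            while (busy.contains(fileName)) {
                busy.wait(); // gives up the monitor while sleeping
            }
            busy.add(fileName); // take the lock for this file
        }
        try {
            // Outside the synchronized section: other files can be written in parallel.
            byte[] bytes = (line + System.lineSeparator()).getBytes(StandardCharsets.UTF_8);
            Files.write(Paths.get(fileName), bytes,
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } finally {
            synchronized (busy) {
                busy.remove(fileName); // release the lock even if the write failed
                busy.notifyAll();      // wake up the threads waiting for a file
            }
        }
    }
}
```

All client threads would have to share a single FileLockTable instance for the locking to have any effect.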

The synchronized part is where you check whether the file is locked. If it is locked, exit the synchronized section and go to sleep. If the file is not locked, take the lock, exit the synchronized section, write to file, release the lock, and notify other threads.

Why all that complexity? Why not just write to the file inside a synchronised block, with an object associated with each file as the lock object? The synchronised block will automatically do the waiting and notifying etc. No risk of deadlock etc., and trivially easy to implement.

Why not just write to the file inside a synchronised block, with an object associated with each file as the lock object? The synchronised block will automatically do the waiting and notifying etc. No risk of deadlock etc., and trivially easy to implement.

Yes he could, but that does not make full use of multi-threading (it simply becomes the 2nd idea). In other words, synchronised will allow only one write-to-file at a time (when the method is being called from multiple threads). :)

I did say there is a synchronising lock associated with each file. That's essential to avoid concurrent updates to one file. But it's not a global lock, and won't prevent concurrent update of two different files - two threads writing to different files will have different lock objects. Maybe you were thinking of synchronised methods rather than synchronised blocks?

Ah OK, yes, I was thinking about the method rather than the block. The idea I proposed does not need an associated object tied to each file but could be its own lock. It is definitely more complex. I just threw it in with no intention for the OP to use it, because it could easily be too big for his project. :P

Hi guys, sorry to take so long. I'm still stuck here. I thought of leaving the project as is, but the very same problem that I was trying to avoid occurred: many clients sent the same data at the same time and a lot of the data wasn't written to the file. Is there a way to show how I can implement the solution as you have stated? I'm not a Java guru.

To simplify the project, let's use 1 text file to write to and picture as many clients as possible trying to write to this file at the same time.

The answers remain the same. Either post all the updates to a linked blocking queue and have a thread that takes entries from the queue one at a time and writes them to the file
Or
Put the file update code in a synchronised block using some object associated with the file as the lock object, so the file updates for any given file will automatically be serialised.

With only one file I prefer the queue, but with many files I prefer the synch blocks.

You mean that with many files accessed simultaneously I should use synch blocks?
