Preventing out of memory errors while threading
Greetings all,
I'm actually proud to have a question I've never seen asked here. I've got a 1.5GB text file (YES, it's really 1.5GB) where each line needs a set of modifications. I've got the application running on a single thread: each line is read in, the modification is made, and the line is written back out.
This is fine, but it's slow. It took about an hour for the application to traverse this file. The IO itself is very fast; it's the processing of each line that takes time. So here's what I'd like to try:
While the reader has text in it, read a line.
Start a new thread to process the line, write the modified line to a file.
Sounds simple enough. The only thing is that each "line" could be anywhere from 1KB to a couple hundred MB apiece. This isn't a problem with the single-threaded application, but it is an issue when you're dealing with multiple threads, even on a quad-proc machine with 4GB of RAM.
Basically, I'd like to dynamically create a new thread to deal with each line that is in the TextReader. Once the thread finishes, it should write its data to file and release its resources. Beyond that, I'd like to find a way to do it without using too much memory, or to deal with OutOfMemory exceptions "cleanly", i.e., without losing data from the TextReader.
This might sound kind of complicated, I don't know. Can anyone suggest a direction to roll, or even a better strategy?
Alex Cavnar, aka alc6379
A dynamic thread count isn't always a good idea; in fact, it's often a bad one. Since you have a quad-core machine, you should start with 4 threads (one per core), then try 8, 16, 20, and so on until you see diminishing returns from too many threads. Adding more and more threads won't always make things faster, because context switching at the CPU level isn't cheap.
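To make the fixed-thread-count suggestion concrete, here is a minimal sketch, not the poster's code. It assumes .NET 4 or later (for `BlockingCollection<T>` and `File.ReadLines`), and `Transform` is a hypothetical stand-in for the real per-line modification. The bounded queue makes the reader wait whenever the workers fall behind, so only a limited number of lines are held in memory at once.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading;

class BoundedLineProcessor
{
    // Hypothetical stand-in for the per-line modification described in the question.
    static string Transform(string line)
    {
        return line.ToUpperInvariant();
    }

    static void Main(string[] args)
    {
        if (args.Length < 2)
        {
            Console.WriteLine("usage: BoundedLineProcessor <input> <output>");
            return;
        }

        // Starting point of one thread per core; tune upward and measure.
        int workerCount = Environment.ProcessorCount;

        // Bounded queue: Add() blocks once workerCount lines are waiting,
        // which caps how many unprocessed lines sit in memory.
        using (var queue = new BlockingCollection<string>(workerCount))
        using (var writer = new StreamWriter(args[1]))
        {
            object writeLock = new object();
            var workers = new Thread[workerCount];

            for (int i = 0; i < workerCount; i++)
            {
                workers[i] = new Thread(() =>
                {
                    // Loop ends when the queue is empty and marked complete.
                    foreach (string line in queue.GetConsumingEnumerable())
                    {
                        string result = Transform(line);
                        lock (writeLock) { writer.WriteLine(result); }
                    }
                });
                workers[i].Start();
            }

            // Producer: a single reader feeds the queue lazily, line by line.
            foreach (string line in File.ReadLines(args[0]))
            {
                queue.Add(line);
            }
            queue.CompleteAdding();

            // Wait for all workers before the writer is disposed.
            foreach (Thread t in workers) t.Join();
        }
    }
}
```

Note that this caps the number of lines in flight, not the number of bytes. With lines ranging from 1KB to a couple hundred MB, a handful of very large lines processed together could still exhaust memory, so the worker count would need to be chosen with the largest lines in mind.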
What's interesting is that I did take that approach, and it did seem to work. However, I had some concerns. What if, say, 20 threads per core worked on one machine but not on another?
I wound up using the ThreadPool. With a 4-core processor it made 1000 threads available. Oddly enough, it worked like a charm: I didn't run into any OutOfMemory exceptions. I know it could have been a fluke, so I'm still going to investigate a good way to manage memory in this environment. I don't want to have to write my own TextReader to properly seek when an exception happens, but if I must, I must...
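For anyone who wants to check the pool limits on their own machine, a small sketch using `ThreadPool.GetMaxThreads` and `ThreadPool.SetMaxThreads` follows. The cap of twice the processor count is only an assumed example value, not a recommendation from this thread. Keep in mind that the maximum is a ceiling, not the number of threads actually running at once.

```csharp
using System;
using System.Threading;

class PoolLimits
{
    static void Main()
    {
        int workerThreads, completionPortThreads;

        // Report the current ceiling on pool threads.
        ThreadPool.GetMaxThreads(out workerThreads, out completionPortThreads);
        Console.WriteLine("Max worker threads: {0}", workerThreads);
        Console.WriteLine("Max I/O completion threads: {0}", completionPortThreads);

        // Assumed example cap: two worker threads per core.
        int cap = Environment.ProcessorCount * 2;

        // SetMaxThreads returns false if the runtime rejects the values
        // (for example, a value below the number of processors).
        bool accepted = ThreadPool.SetMaxThreads(cap, completionPortThreads);
        Console.WriteLine("Cap of {0} accepted: {1}", cap, accepted);
    }
}
```

Lowering the ceiling limits how many work items run at the same time, which in turn limits how many lines are being processed in memory at once.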
Alex Cavnar, aka alc6379
How do you manage threads? Are you just starting them and then joining them?
This link may help: http://www.cs.cf.ac.uk/Dave/C/node29.html I also work with threads; in my case the threads need to run concurrently (and they share information with each other), just like small daemons inside the process.
Good luck!
The only thing my threads really share is a single text reader and a single text writer. I deal with that using the Synchronized() methods on each of those. The order in which I write the processed lines to the file doesn't matter much; it just matters that they all get in.
I'm managing the threads "Automagically", I guess you could say. I'm using ThreadPool.QueueUserWorkItem to add jobs to the thread pool. Then the thread pool runs jobs as the threads become available. It's actually working now, which I found kind of funny... Still doesn't seem like the memory management issue is being addressed. But hey, in this case I'm just going to run with it for now, because it's showing results that are satisfactory enough... for now.
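One way the open memory question could be approached while still using `ThreadPool.QueueUserWorkItem` is to throttle how many lines are in flight. The sketch below is an assumption-laden illustration, not the code from this thread: `Transform` and `maxInFlight` are hypothetical, `SemaphoreSlim` requires .NET 4 or later, and here only the main thread reads, so just the writer is wrapped with `TextWriter.Synchronized`.

```csharp
using System;
using System.IO;
using System.Threading;

class ThrottledPoolProcessor
{
    // Hypothetical stand-in for the real per-line modification.
    static string Transform(string line)
    {
        return line.Trim();
    }

    static void Main(string[] args)
    {
        if (args.Length < 2)
        {
            Console.WriteLine("usage: ThrottledPoolProcessor <input> <output>");
            return;
        }

        const int maxInFlight = 4; // assumed cap on lines held in memory at once

        using (var slots = new SemaphoreSlim(maxInFlight, maxInFlight))
        using (var reader = new StreamReader(args[0]))
        using (TextWriter writer = TextWriter.Synchronized(new StreamWriter(args[1])))
        {
            while (true)
            {
                // Take a slot BEFORE reading, so the next line is not loaded
                // until a previous one has been written and released.
                slots.Wait();

                string line = reader.ReadLine();
                if (line == null)
                {
                    slots.Release(); // nothing was queued for this slot
                    break;
                }

                ThreadPool.QueueUserWorkItem(_ =>
                {
                    try
                    {
                        writer.WriteLine(Transform(line));
                    }
                    finally
                    {
                        slots.Release(); // free the slot even if Transform throws
                    }
                });
            }

            // Drain: reacquiring every slot means all queued items have finished,
            // so it is safe to dispose the writer.
            for (int i = 0; i < maxInFlight; i++)
            {
                slots.Wait();
            }
        }
    }
}
```

Because a line is only read once a slot is free, no data is pulled from the reader and then dropped. As with any line-count limit, `maxInFlight` would need to be sized against the largest expected lines rather than the average.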
Alex Cavnar, aka alc6379