Hi! I am working on a program that basically needs a large list. The program works fine until I completely fill up the "List<Object>". And no, it is not due to a lack of RAM: the program tries to add more than System.Int32.MaxValue objects. I have thought about simply having an array (or list) of lists, but that is not tempting because of how much of the rest of the code I would have to change to make it work.

Any ideas as to what I should do?

Cheers

Figure out a different method to use. I can see no use for having over 2 billion objects. Maybe if you told us what you were trying to do, better methods would suggest themselves.

Basically, I use the ridiculous list to check whether I have already handled an object, so it is sort of critical. It is a crawler.

To clarify: it does not _need_ to be a "List". Any other sort of collection with a 64-bit address space would do the trick :)
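The array-of-lists idea from the first post can be hidden behind a small wrapper so that the rest of the code only sees a collection with a `long` index. The following is a minimal sketch of that idea, not a tested production class: the name `BigList<T>` and the default chunk size are hypothetical, and it assumes the process really does have enough memory for the chunks.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: a list addressed by a 64-bit index,
// stored internally as fixed-size chunks (a "list of arrays").
public sealed class BigList<T>
{
    private readonly int _chunkSize;
    private readonly List<T[]> _chunks = new List<T[]>();
    private long _count;

    // chunkSize is an arbitrary assumption; tune it for the workload.
    public BigList(int chunkSize = 1 << 20)
    {
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));
        _chunkSize = chunkSize;
    }

    public long Count => _count;

    public void Add(T item)
    {
        int offset = (int)(_count % _chunkSize);
        // Offset 0 means every existing chunk is full (or there is none yet).
        if (offset == 0) _chunks.Add(new T[_chunkSize]);
        _chunks[_chunks.Count - 1][offset] = item;
        _count++;
    }

    public T this[long index]
    {
        get
        {
            CheckIndex(index);
            return _chunks[(int)(index / _chunkSize)][(int)(index % _chunkSize)];
        }
        set
        {
            CheckIndex(index);
            _chunks[(int)(index / _chunkSize)][(int)(index % _chunkSize)] = value;
        }
    }

    private void CheckIndex(long index)
    {
        if (index < 0 || index >= _count) throw new ArgumentOutOfRangeException(nameof(index));
    }
}
```

With a wrapper like this, only the code that declares the collection or stores an index in an `int` would need to change. Note that it does nothing to make a "have I seen this?" lookup fast; scanning billions of entries would still be slow.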

Hehe. You see, I do this. But the performance I get when I extract info from the DB is terrible. I wish to store all this info in RAM simply for the speed gain. I could of course implement some sort of filtering/garbage collection in the list, but simply having lots more objects to work with would be wonderful.

The performance of the database is infinitely faster than your RAM-only solution, since the DB actually works. I suspect you need to optimize your DB and use stored procedures, indexing, etc. There is still zero reason to have 2 billion data items in memory at once.

Yeah, I'm going to have to agree with Momerath here. How is your database structured (and no, a flat file is not a database!)? Even with 2 billion rows, a database (even SQL Server Express) should be able to easily handle what you're doing.

If you're going to be looking up an object multiple times, consider using a List as a cache, rather than the main data store. If you're checking to see if something has been processed before, first check the list. If it's not there, check the database. If it isn't in either place, process it, then write a record of processing to the List. At some point, have a method flush the List down to the database, so it's always stored that you processed it. If you really wanted to get fancy, you could add a Time Added property to whatever object you're storing in the List, so you could remove items that haven't been touched in a certain period of time. That would also help deal with RAM usage.
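The cache-in-front-of-the-database flow described above could look roughly like the sketch below. Everything named here is hypothetical: `IProcessedStore` stands in for whatever database access code the crawler already has, `InMemoryStore` exists only so the sketch runs without a database, and keys are assumed to be strings such as URLs. It also swaps the List for a `Dictionary` keyed by the item, so that the lookup does not have to scan, and it stores the last-touched time as the "Time Added" value.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the database; a real version would run SQL.
public interface IProcessedStore
{
    bool Contains(string key);
    void AddRange(IEnumerable<string> keys);
}

// In-memory fake so the sketch can run without a database.
public sealed class InMemoryStore : IProcessedStore
{
    private readonly HashSet<string> _keys = new HashSet<string>();
    public bool Contains(string key) => _keys.Contains(key);
    public void AddRange(IEnumerable<string> keys) => _keys.UnionWith(keys);
}

public sealed class ProcessedCache
{
    private readonly IProcessedStore _store;
    // Key -> last time it was touched, used for eviction.
    private readonly Dictionary<string, DateTime> _lastTouched = new Dictionary<string, DateTime>();
    // Keys recorded in memory that the store has not seen yet.
    private readonly List<string> _unflushed = new List<string>();

    public ProcessedCache(IProcessedStore store) { _store = store; }

    public int CachedCount => _lastTouched.Count;

    // Returns true if the key was already handled. If it was not,
    // records it, and the caller is expected to process the item.
    public bool CheckAndMark(string key, DateTime now)
    {
        if (_lastTouched.ContainsKey(key))
        {
            _lastTouched[key] = now;           // cache hit: refresh timestamp
            return true;
        }
        bool known = _store.Contains(key);     // cache miss: ask the store
        _lastTouched[key] = now;
        if (!known) _unflushed.Add(key);       // new item: remember to persist it
        return known;
    }

    // Writes all pending keys to the store in one batch.
    public void Flush()
    {
        if (_unflushed.Count == 0) return;
        _store.AddRange(_unflushed);
        _unflushed.Clear();
    }

    // Flushes first, then drops entries not touched within maxAge.
    public void Evict(DateTime now, TimeSpan maxAge)
    {
        Flush();
        var stale = new List<string>();
        foreach (var pair in _lastTouched)
            if (now - pair.Value > maxAge) stale.Add(pair.Key);
        foreach (var key in stale) _lastTouched.Remove(key);
    }
}
```

A crawler loop would call `CheckAndMark(url, DateTime.UtcNow)` and process the page only when it returns false, calling `Flush` and `Evict` on a timer or every N items. Every cache miss still costs one store lookup, so this mainly helps when the same items come up repeatedly.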

Indeed, that would help. I use MySQL, by the way, and I have tried a database-only version where I check with the DB whether what I am about to do is already in it. This does _work_, indeed, but it is sluggish. I really like the idea you gave me here, alc6379, where I have a list of (up to) 2 billion items and check the DB if I don't find the item there. But basically, that would not really speed anything up, because most of the stuff I find is not there. Hmm...

That would result in lots of redundant data checks. I have to think some more. Maybe writing elements that are not used much to a file?

What kind of stuff are you downloading? I promise you, there's GOT to be something you could be doing to the database, either by indexing or something (or a memory cache there), that can definitely increase your speed. People are using MySQL to handle sites with MILLIONS of transactions per second, so it's definitely going to handle what you're doing.

Maybe it's not the database itself; maybe it's the overhead of opening and closing SQL connections? You could have a process manage that connection, or keep it open for the duration of your processing. That way, you only have to open it once and don't incur that performance hit every time.

Those are probably the places you're really getting your performance drags from: IO and opening DB connections. If you manage those properly, you'll probably get a big speed increase.

I already have: I have one connection per worker thread. The issue, I think, is that the DB I use is external and it is a student DB, so it probably has low priority on the network. I have to put this on hold for now because of exams coming up, but I will definitely try to fix this issue when I have the time. It is obviously more time-consuming than I first thought. I should be able to make some sort of DB arrangement that performs adequately. The whole program is getting an overhaul soon. I might discover some other issues that might "aid" the performance hit the DB-only solution has suffered.

And it is a web-crawling program, essentially.

It's external as in possibly on a different subnet from you, or even a different local network segment? That's DEFINITELY going to be a big issue.

Can you possibly try experimenting with running a local instance of MySQL on your machine? Sure, you won't have tons of RAM, but you may not even need it, since it's just your machine needing connections to the database.

I transfer over the net, but the machine I am running on is connected to the same, ehh, I don't really know the right words for it. I am running a high-speed connection on the same main net as the ISP, and the server is on that same net. (It is a university net.) I have something like a 100Mb connection to the server, but there is more latency than there would be if it were on the local machine. So you see, that's why I want the crazy big list :) I have enough RAM to hold it all (for some time at least). But eventually I need to make a more reliable solution, and a database-reliant version of my current software would fit the bill :)

As I mentioned, though, there will be some major changes to the high-level structure soon (after the exams), where I will use a much better structure that could potentially perform much better than the current one. And it is quite possibly more compatible with the database solution too.
