I need to build an app that deals with mail address management, among other things.
Say the user has an Excel file with 2 million rows of email addresses. I read it the OleDb way, and my first mistake was putting ~500k rows into a DataGridView — bad, bad mistake. The tiny app ballooned to ~700 MB of RAM.
I've ditched the DataGridView for now (I'll reintroduce it later with virtualization and on-demand pages). With only the DataSet, memory climbs to about 170 MB and then settles at around 100 MB.
I'd really appreciate some advice on the best way to deal with these kinds of files (Excel, text, CSV, all around 2 million rows), keeping in mind that I need to validate each row against a regular expression, delete duplicates, and export to Excel, CSV, or text files.
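For a sense of scale: the validate/dedupe/export job can run in modest memory by streaming the file row by row instead of materializing a DataSet. A minimal sketch of that shape, in Python for brevity (the same loop translates to a .NET StreamReader; the email pattern is a deliberately loose placeholder, not a full RFC-compliant validator):

```python
import csv
import re

# Loose illustrative pattern; real-world email validation is more involved.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def clean_emails(in_path, out_path):
    """Stream a CSV of addresses, keeping only valid, previously unseen ones.

    Returns the number of rows written. Memory use is dominated by the
    `seen` set: ~2M short strings is tens of MB, far below a DataGridView.
    """
    seen = set()
    kept = 0
    with open(in_path, newline="", encoding="utf-8") as src, \
         open(out_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            if not row:
                continue
            addr = row[0].strip().lower()  # normalize before deduping
            if EMAIL_RE.match(addr) and addr not in seen:
                seen.add(addr)
                writer.writerow([addr])
                kept += 1
    return kept
```

Because the output is written as it is produced, switching the export target (CSV, plain text, or an Excel writer) only changes the `writer` line, not the pipeline.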
— ogrishmania (Newbie Poster)
Recommended Answers
Are you stuck with this technology choice? Because from your description, this project has outgrown Excel as a useful backing data store. I'd consider a database-oriented solution like MySQL (which does incorporate regular …
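That database route could look roughly like the sketch below. It uses SQLite as a stand-in for MySQL purely so the example is self-contained (an assumption, not what the answer prescribes): unlike MySQL, SQLite has no built-in REGEXP operator, so one is registered by hand, and the pattern is again illustrative only.

```python
import re
import sqlite3

def dedupe_valid(addresses, pattern=r"^[^@\s]+@[^@\s]+\.[^@\s]+$"):
    """Load addresses into a database, then filter and dedupe in SQL."""
    con = sqlite3.connect(":memory:")  # use a file path for 2M real rows
    # SQLite calls regexp(pattern, value) for `value REGEXP pattern`;
    # MySQL ships REGEXP natively, so this step would be unnecessary there.
    con.create_function("REGEXP", 2,
                        lambda pat, s: re.match(pat, s) is not None)
    con.execute("CREATE TABLE emails (addr TEXT)")
    con.executemany("INSERT INTO emails VALUES (?)",
                    ((a.strip().lower(),) for a in addresses))
    rows = con.execute(
        "SELECT DISTINCT addr FROM emails WHERE addr REGEXP ? ORDER BY addr",
        (pattern,))
    return [r[0] for r in rows]
```

With a real database the validation, deduplication, and even the export query all become one-liners, and indexes keep them fast at 2M rows.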
I think that, if possible, the data could be read in chunks of, say, 2K rows. Each time a chunk is read, first look for duplicates inside the chunk and, then, compare to the other 2M – 2K. If a row is present in the chunk of 2K and in …
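One way to realize that chunked idea is sketched below, assuming a running set of already-seen addresses stands in for "compare to the other 2M – 2K" (so each row is checked against everything already emitted, without re-scanning the file). The helper name and chunk size are illustrative:

```python
import itertools

def iter_unique_chunks(lines, chunk_size=2000):
    """Yield chunks with duplicates removed, both inside each chunk and
    against everything already emitted from earlier chunks."""
    seen = set()
    it = iter(lines)
    while True:
        chunk = list(itertools.islice(it, chunk_size))  # next 2K rows
        if not chunk:
            break
        unique = []
        for addr in chunk:
            addr = addr.strip().lower()
            if addr and addr not in seen:  # in-chunk and cross-chunk dedup
                seen.add(addr)
                unique.append(addr)
        yield unique
```

Processing chunk by chunk also gives natural points to report progress or flush output, which matters with a 2M-row job.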
All 8 Replies