Hey everyone,

Great forum. Glad there is so much help out there.
I am currently working in VS2010, Visual Basic.
I am working with a fairly large array: RawData(0 To 3799, 0 To 1259, 0 To 4).
I am importing the data from an XML file, which takes several hours.
I only need to import it once. Is there a quick way to save the multi-dimensional array and reload it later, rather than rebuilding it every time the program loads?
I tried IO.File.WriteAllLines(FileName, GlobalVariables.RawData), but this apparently expects a one-dimensional array of strings and fails.
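One candidate I am looking at is BinaryFormatter, which as far as I can tell can serialize rank > 1 arrays directly. A rough sketch of what I mean (untested; the Double element type and file name are just my assumptions):

```vb
Imports System.IO
Imports System.Runtime.Serialization.Formatters.Binary

Module ArrayCache
    ' Save the whole array in one shot as a binary blob.
    Sub SaveRawData(ByVal path As String, ByVal data As Double(,,))
        Using fs As New FileStream(path, FileMode.Create)
            Dim bf As New BinaryFormatter()
            bf.Serialize(fs, data)
        End Using
    End Sub

    ' Reload it later without re-parsing any XML.
    Function LoadRawData(ByVal path As String) As Double(,,)
        Using fs As New FileStream(path, FileMode.Open)
            Dim bf As New BinaryFormatter()
            Return CType(bf.Deserialize(fs), Double(,,))
        End Using
    End Function
End Module
```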

Thanks for the help!!

All 17 Replies

Hi

Wow, several hours to load, how big is this file?

If you only need to import it once but reuse it as and when, have you considered using a database? It would be a lot more performant than working with an XML file, especially a very large one.

If you can provide some information on what it is you are trying to do and why you are using the approach that you currently are, maybe we can offer a better solution or speed up your loading process.

Well, it is actually about 500 different XML files. Each one contains about 6000 data points that must be read.

I think a database is probably where I should be headed, but I don't know anything about them. Do you have any good references for getting started?

Hi

I have a beginner's tutorial on working with Access using C# (there is a VB.NET download linked there as well) and ADO.NET which will teach you the very basics of communicating with a database in order to read, create, update, and delete data.

Once you understand the basics, you could then look at how you might go about creating a new database that is structured for your data (maybe on a small scale to ensure that it is the right approach) and how you would modify your program to use the database.

If you get stuck, just shout.
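The core pattern the tutorial walks through looks roughly like this in VB.NET (the connection string, table, and column names here are placeholders, not taken from your data):

```vb
Imports System.Data.OleDb

Module AccessDemo
    Sub ReadQuotes()
        ' Placeholder connection string for an Access 2007+ (.accdb) file.
        Dim connStr As String = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=Quotes.accdb"
        Using conn As New OleDbConnection(connStr)
            conn.Open()
            Using cmd As New OleDbCommand("SELECT Symbol, TradeDate, ClosePrice FROM Quotes", conn)
                Using reader As OleDbDataReader = cmd.ExecuteReader()
                    While reader.Read()
                        ' Process each row as it streams in.
                        Console.WriteLine("{0} {1} {2}", reader("Symbol"), reader("TradeDate"), reader("ClosePrice"))
                    End While
                End Using
            End Using
        End Using
    End Sub
End Module
```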

Okay, thank you, I will absolutely look into this.
What I am making is a stock quote database.
So my massive array was [COMPANY SYMBOL] [DATE] [OPEN CLOSE DIVIDEND SPLIT]
So I have 500 companies, going back 5 years is 1260 days (252 trading days per year), and 4 data points per day.

Can you confirm that a database is the best option for that much data?

Are you going to store exactly 500 companies, or up to 500? And is the rest of the data fixed in size as well? Please elaborate.

Do not use Access. For that much data you are much better off using SQL. If you want to go Microsoft then I suggest MS-SQL. I believe you can have up to 10 gig of data with the free version. There are better tools available for maintenance (free - SQL Server Management Studio 2008 Express). Also, Access is prone to "breaking" with large amounts of data. I had to maintain an application that was written on top of Access and it required a database rebuild once a week. Unfortunately the developer used native Access calls rather than ADODB which would have allowed easy porting to MS-SQL.

Depending on the format of the XML, it may be possible to bulk import the XML files directly into your database rather than having to parse every file manually. This would save a lot of time and effort.
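For instance, one route from .NET is DataSet.ReadXml plus SqlBulkCopy. This is only a sketch, and it assumes your XML maps cleanly onto a single table and that a matching destination table (here called "Quotes") already exists in the database:

```vb
Imports System.Data
Imports System.Data.SqlClient

Module BulkImport
    Sub ImportXmlFile(ByVal xmlPath As String, ByVal connStr As String)
        ' Infer a DataTable from the XML file's structure.
        Dim ds As New DataSet()
        ds.ReadXml(xmlPath)

        ' Stream all rows to the server in one batch instead of row-by-row inserts.
        Using bulk As New SqlBulkCopy(connStr)
            bulk.DestinationTableName = "Quotes"
            bulk.WriteToServer(ds.Tables(0))
        End Using
    End Sub
End Module
```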

Yes, a database is appropriate for this, and as suggested, MS-SQL would be a good option if the data is very large (greater than 2gb). While Access is not the best database, it works fine if it is designed well and has few users; but since you can get SQL Express for free you may as well go this route. If the data is large or you are going to have a lot of users then definitely avoid Access for production use.

The tutorial I pointed you to uses Access but the principles are the same, the only difference is that instead of OleDb you will use the SqlClient objects.
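For example, a simple SqlClient read looks like this (the server, database, table, and column names are placeholders):

```vb
Imports System.Data.SqlClient

Module SqlDemo
    Sub ReadQuotes()
        ' Placeholder connection string for a local SQL Server Express instance.
        Dim connStr As String = "Server=.\SQLEXPRESS;Database=Quotes;Integrated Security=True"
        Using conn As New SqlConnection(connStr)
            conn.Open()
            Using cmd As New SqlCommand("SELECT Symbol, ClosePrice FROM Quotes", conn)
                Using reader As SqlDataReader = cmd.ExecuteReader()
                    While reader.Read()
                        Console.WriteLine("{0}: {1}", reader("Symbol"), reader("ClosePrice"))
                    End While
                End Using
            End Using
        End Using
    End Sub
End Module
```

The structure is identical to the OleDb version in the tutorial; only the class names and connection string change.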

Finally, if you do go MS-SQL then you can create some SSIS packages to bulk import data although I don't believe this feature is available to Express versions.

There are several examples of database code in the Code Snippets section and I can answer most questions unless you get complicated. If you are willing to put in the effort I'll make the time to help.

Sorry, is posting links to external tutorials frowned upon?

Not as far as I'm concerned. There is no point in duplicating effort so if a good tutorial is available then, by all means, post a link to it.

How are you currently reading/"importing" the XML files? Perhaps you can post one or two of the XML files as well.

If you do follow the code snippets, please make sure they are using ADO.NET and not the ADODB references from older versions of Visual Basic.

Okay, thank you for your efforts everyone. I have done quite a bit of digging and realized there is no way my data is going to be less than the 2 GB limit. So I am using MemoryMappedFiles, which appears to solve my problem. I won't be able to use my big giant variable; instead I will import the values as they are needed.
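Roughly what I'm doing is this (a sketch only; the flat [company, day, field] layout, the constants, and all names are my own choices, matching my array's second and third dimensions):

```vb
Imports System.IO.MemoryMappedFiles

Module MappedData
    Const Days As Integer = 1260   ' second array dimension (0 To 1259)
    Const Fields As Integer = 5    ' third array dimension (0 To 4)

    ' Read a single Double on demand instead of holding the whole array in memory.
    Function ReadValue(ByVal mmf As MemoryMappedFile, ByVal company As Integer,
                       ByVal day As Integer, ByVal field As Integer) As Double
        ' Compute the flat element index from the three coordinates.
        Dim index As Long = (CLng(company) * Days + day) * Fields + field
        Using accessor As MemoryMappedViewAccessor = mmf.CreateViewAccessor()
            Return accessor.ReadDouble(index * 8L) ' 8 bytes per Double
        End Using
    End Function
End Module
```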

See here for more details. The maximum database size for the free edition is 10 gig, not 2 gig.

SQL Server 2008 R2 Data Types

nvarchar(n): Variable-length Unicode data with a length of 1 to 4000 characters. Default length = 1. Storage size, in bytes, is two times the number of characters entered.

DateTime: ...Stored as two 4-byte integers...

money: Storage size is 8 bytes.

I'm not well-versed on the stock market, so I'm not sure what "Open close dividend split" is. Is this a monetary value? Also, what is the maximum length of a stock symbol?


Someone correct me if I'm wrong, but I think you can compute the space required by doing something similar to the following:

Example:

Stock Symbol (nvarchar(10)): 10 x 2 = 20 bytes
Trading Date (DateTime): 8 bytes
Opening Price (money): 8 bytes
Closing Price (money): 8 bytes
High Price (money): 8 bytes
Low Price (money): 8 bytes

Total bytes per record: 20 + 8 + 8 + 8 + 8 + 8 = 60 bytes / record

Total records (1 company): 5 yrs x (252 trading days / 1 yr) = 1,260 records (each record holds all four price fields for one day)

Total records (500 companies): (1,260 records / 1 company) x 500 companies = 630,000 records

Total (Bytes): 630,000 records x (60 bytes / 1 record) = 37,800,000 Bytes

Total (KB): 37,800,000 bytes / (1024 bytes / 1 KB) = 36,914.06 KB

Total (MB): 36,914.06 KB / 1024 = approx. 36.05 MB

Be a part of the DaniWeb community
