Data from Text File into Dynamic Array

Question

DavidB 44 Junior Poster

10 Years Ago

For the past while, I have gotten comfortable doing things a particular way when it comes to inputting data:
i) Create a text file whose first entry is N, an integer giving the number of data entries that follow, followed by the actual data entries. For example, for 101 data values, the text file might look as follows:

101
0.276706347633276
0.486631466856855
0.63439588641705
0.740012950708629
. . .

ii) In C++, open the file, read the first entry, use this number to create a dynamic array of that size, and fill the array:
e.g. -

. . .
vector<double> raw_Array;
ifstream in("Fourier_Forward_Data.txt", ios::in);
in >> N; //Input the vector dimension from the file
  // Beginning of try block, for dynamic vector and array allocation
 try {
     // Resize raw_Array to accommodate the N data values
     raw_Array.resize(N); 
 } // End of try block

 catch (bad_alloc& xa) { // Catch block, for exceptions
     in.close();
     out.close();
     cerr << "In catch block, so an exception occurred: " << xa.what() << "\n";
     cout << "\nEnter any key to continue. \n";
     cin >> rflag;
     return 0;
 } // End of catch block

for (int i = 0; i < N; i++){ //Input the raw_Array elements from the text file
     in >> raw_Array[i];
 }//End for i

. . .

However, I am starting a brand-new program and would like to try something a little different: I would like to avoid the need for the first entry in the input text file (the integer stating how many data entries are to be input). So the modified input file would be as follows:

0.276706347633276
0.486631466856855
0.63439588641705
0.740012950708629
. . .

Note that the first value (101) is now not included.

The input data file could include an arbitrary number of entries (e.g., 101, 1037, 10967, etc.). Whatever the number, I would like the C++ program to determine that number on its own, size the vector appropriately, and input the data as usual.

I am rusty when it comes to file manipulation, but I think the above code snippet would be modified as follows:

vector<double> raw_Array;
int N = 0;   //The number of data entries which will be input from the file
ifstream in("Fourier_Forward_Data.txt", ios::in);

  // Beginning of try block, for dynamic vector and array allocation
 try {
    while(!in.eof()){
        N++;
        // Resize raw_Array to accommodate the next data value
        raw_Array.resize(N); 
        in >> raw_Array[N-1];
    } // End while(!in.eof())
 } // End of try block

 catch (bad_alloc& xa) { // Catch block, for exceptions
     in.close();
     out.close();
     cerr << "In catch block, so an exception occurred: " << xa.what() << "\n";
     cout << "\nEnter any key to continue. \n";
     cin >> rflag;
     return 0;
 } // End of catch block

. . .

Several questions:
Will this work?

The while(!in.eof()) loop is within the try-block. Is this okay? Or should the try-block be within the while loop? Does it matter?

Either way, the array is resized each time through the loop. Is there a better way to do this?

What is the "best practices" way of accomplishing this task?

Any advice and suggestions are appreciated.


NathanOliver 429 Veteran Poster

10 Years Ago

When I read an unknown amount of data from a file, I use the following:

#include <fstream>
#include <vector>

std::vector<int> data;
std::ifstream fin("sampleData.txt");
int temp;  // used to store the data from the file temporarily

// use the >> operator to control the loop so you end on the last good read
while (fin >> temp)
{
    data.push_back(temp);
}

Alternatively, you can use std::copy with istream iterators:

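#include <algorithm>  // std::copy
#include <iterator>   // std::istream_iterator, std::back_inserter

std::vector<int> data;
std::ifstream fin("sampleData.txt");

std::copy(std::istream_iterator<int>(fin), std::istream_iterator<int>(), std::back_inserter(data));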

mike_2000_17 2,669 21st Century Viking

10 Years Ago

Will this work?

Yes. However, notice that there is a difference between that program and your original one. In the original program, if there is not enough memory (a bad_alloc exception) to store all the numbers, the program will not read any number at all. In the second version, the program will read numbers up to the maximum it can hold before it runs out of memory (if it ever runs out of memory). That could make a difference if you expect the amount of data to be large enough to exhaust the available memory. For example, if you are running this within a multi-threaded application, the second version could exhaust the memory and cause a bad_alloc exception in a concurrent thread, where you might not expect such an exception to occur (and thus might not have try-catch blocks to handle it).

The while(!in.eof()) loop is within the try-block. Is this okay? Or should the try-block be within the while loop? Does it matter?

It is better to put the try block outside the while loop, as you did. The reason why it matters might not be obvious at first, but it's an interesting one. In general, with loops, you want to keep the inside of the loop as short as possible because this is code that gets repeated over and over again. If the code for a single iteration fits within the instruction cache, then the entire loop can run over and over again without cache misses, which can result in a significant speed improvement (unless there are other bottlenecks involved). In this particular case, the file reading is most likely to be the bottleneck, so it probably won't make much of a difference. Still, it's a good habit to keep the inside-the-loop code limited to the things that need to be done at each iteration and to move any other code (like exception handling or termination code) outside the loop, if possible. But then again, a decent compiler will generally do this work for you by removing such code from the loop as an optimization.

In general, the rule for try-blocks is to make them as large as possible. In fact, in this case, it would be even better to move the creation of the input stream inside the try-block, which removes the need for the in.close(); call inside the catch-block: the input stream object is destroyed during the stack unwinding out of the try-block, and its destructor closes the file. It would look like this:

std::vector<double> raw_Array;
// Beginning of try block, for dynamic vector and array allocation
try {
  int N = 0;   //The number of data entries
  std::ifstream in("Fourier_Forward_Data.txt", std::ios::in);
  double value;  // holds each value until we know the read succeeded
  while( in >> value ) {  // read first, then test, so no extra element at the end
    N++;
    // Resize raw_Array to accommodate the next data value
    raw_Array.resize(N);
    raw_Array[N-1] = value;
  } // End while( in >> value )
} // End of try block
catch (std::bad_alloc& xa) { // Catch block, for exceptions
  std::cerr << "In catch block, so an exception occurred: " << xa.what() << "\n";
  std::cout << "\nEnter any key to continue. \n";
  std::cin >> rflag;
  return 0;
} // End of catch block

Either way, the array is resized each time through the loop. Is there a better way to do this?

It is true that whichever way you do it, the array is resized each time through the loop. However, you should know that calling resize on the vector will not, in general, cause a copy of all the data: the resize function is only required to accommodate at least that size, but is free to allocate more capacity. Most standard library implementations of the vector class use a geometrically progressing allocation scheme, so capacity is only increased on rare occasions (e.g., doubling the capacity every time it's exhausted). Basically, whatever you do (e.g., using one of NathanOliver's codes), the effect will be the same.

But this is not all, because the term "better" is subjective. One of the ways you can consider something to be better than something else is by how error-prone the code is or how easy it is to maintain. If you look at even that simple while-loop, there are a number of places where "stupid" mistakes could be made that could cause annoying bugs. For example, if you forget the ! in the !in.eof(), you would end up not doing anything. If you forget to write the N++;, you might end up in an infinite loop and some access violations (segmentation faults) when accessing the element at raw_Array[N-1], or worse. Also, if you forget the -1 in that access of the last element, you could end up accessing values past the end of the vector. If you look at the versions proposed by NathanOliver, you will notice that none of those problems are present in them, i.e., there are virtually no opportunities to make a dumb mistake.

In the same vein, the versions from NathanOliver are much more explicit about what the loop does. The first version clearly says: try to get a value from the file and add it to the vector, until you fail to read a new value from the file. The second version clearly says: copy the values from the file-stream to the vector. In other words, the code is self-explanatory and therefore easier to maintain. In your version, the code is much less obvious because it requires more decoding of what it does: "ok, I'm going to eventually reach the end of the file... ok, I'm incrementing the 'N' value... oh, I'm resizing the vector with it, so I guess 'N' is the size of the vector... and now, I'm reading a value from the file into somewhere in the array.. at the 'N-1' position, which, if I get it right, is the last element of the vector that was just grown by one..". This kind of decoding of the meaning of the code is what creates problems during maintenance, because if you don't understand it correctly, you might edit it in a way that breaks the code.

What is the "best practices" way of accomplishing this task?

Simply put, if there's an STL algorithm to do it, then use it. And for try-catch blocks, make them as wide as possible, and use the RAII features of the classes to reduce the amount of clean-up code (just let RAII destructors do the work during stack unwinding).

Here is a pretty robust way to do it, which sort of reproduces your original code's behavior:

#include <algorithm>
#include <exception>
#include <fstream>
#include <iostream>
#include <iterator>
#include <vector>

std::vector<double> data;
try {
  std::vector<double> tmp_data;
  std::ifstream fin("sampleData.txt");
  std::copy(std::istream_iterator<double>(fin),
            std::istream_iterator<double>(),
            std::back_inserter(tmp_data));
  data.swap(tmp_data);  // commit only after the whole read has finished
} catch(std::exception& e) {
  std::cerr << "In catch block, an exception occurred: " << e.what() << "\n";
}

where a temporary data vector is used to store the data. If a bad_alloc exception (or any other) is thrown during the operation, the file will be closed (as fin is destroyed) and the data will be destroyed (as the temporary vector is destroyed), thus restoring everything to how it was, as if nothing ever happened (no partial data read). If everything goes well, the temporary vector is swapped into the definitive vector data, which is a very cheap (and no-throw) operation (it just swaps some internal pointers). This is a good trick to minimize the collateral damage of an exception being thrown in this kind of situation.


David W 131 Practically a Posting Shark · Answer 1 · 2014-08-06T07:36:23+00:00

@mike_2000_17 ... I thought your code above looked pretty cool ... and so thought that I would test it out, when I had a chance ... using a C++11 compiler.

I had a little trouble getting it to handle both kinds of errors ... (file open errors and running out of memory errors) ... as it was coded above.

This is what I came up with that actually did handle both kinds of errors the way expected ... and I thought you might like to see it.

// loadFromFile.cpp //

#include <algorithm>  // std::copy
#include <exception>
#include <iostream>
#include <vector>
#include <fstream>
#include <iterator>


const char* FNAME = "sample.dat"; // or a nonexistent file, e.g. "sample2.txt", to test the open failure

// contents of test file "sample.dat":
/*
0 1 2 3 4 5 6 7 8 9
*/



int main()
{
    std::vector< double > data;

    try
    {
        std::ifstream fin( FNAME );

        if( fin )
            std::cout << FNAME << " opened ok ...\n";
        else
            fin.exceptions( std::ifstream::failbit );


        std::vector< double > data2;
        //data2.reserve( 10000000000 ); // uncomment to test the allocation-failure path
        std::copy
        (
            std::istream_iterator< double >( fin ),
            std::istream_iterator< double >(),
            std::back_inserter( data2 )
        );

        data.swap( data2 );
    }
    catch( std::ios_base::failure &e )
    {
        std::cerr << "Catch block open file '" << FNAME
                  << "', had exception: '"
                  << e.what() << "'\n";
        return -1;
    }
    catch( std::exception& e )
    {
        std::cerr << "Catch block re. filling vector"
                  << ", had exception: '"
                  << e.what() << "'\n";
        return -2;
    }


    std::cout << "\nShowing contents of 'data' vector ... \n";
    for( auto e : data ) std::cout << e << ' ';

    std::cout << "\n\nPress 'Enter' to continue/exit ... "
              << std::flush;
    std::cin.get();
}

mike_2000_17 2,669 21st Century Viking Team Colleague Featured Poster · Answer 2 · 2014-08-06T18:42:18+00:00

so thought that I would test it out, when I had a chance ... using a C++11 compiler.

Why? My code does not require any C++11 features. It is purely C++98 code.

This is what I came up with that actually did handle both kinds of errors the way expected ... and I thought you might like to see it.

Ah.. yes, this has been a pet peeve of mine with the iostream library, i.e., the lack of commitment to using exceptions. Despite a lot of nice design patterns used in the iostream library, there are also some glaring problems. One such problem is that it was largely written at a time when people were still very reluctant to adopt exceptions as a primary error-handling mechanism. So, even though they made all the iostream classes into well-designed RAII classes that can be gracefully used alongside exception throwing and handling, they made them use error-codes (or error-states, which are the OOP equivalent of error-codes) as the default error-handling method and made exceptions second-class citizens in the iostream classes. I find the option to report iostream errors via exceptions so poor that you can hardly use it. If I have to, I just check the error-states and throw an exception if needed, like this:

try
{
    std::ifstream fin( FNAME );
    if( !fin.is_open() )
        throw file_not_openable(FNAME);  // user-defined exception type, not shown here
    std::vector< double > data2;
    std::copy
    (
        std::istream_iterator< double >( fin ),
        std::istream_iterator< double >(),
        std::back_inserter( data2 )
    );
    data.swap( data2 );
}
catch( file_not_openable& e )
{
    std::cerr << "Catch block open file '" << FNAME
              << "', had exception: '"
              << e.what() << "'\n";
    return -1;
}
catch( std::exception& e )
{
    std::cerr << "Catch block re. filling vector"
              << ", had exception: '"
              << e.what() << "'\n";
    return -2;
}

I'm pretty sure that if the iostream library were written from scratch today, it would be done very differently. There are basically two big criticisms of the iostream library: it's too bloated, and its error-handling is too coarse / simplistic / C-style. The weird thing is, the reason it does not use exceptions is mainly that some people want to be able to avoid exceptions altogether (N.B.: I don't think they are justified in thinking that, but they do think that), and making iostream rely too heavily on exceptions would have made the iostream library essentially unusable for these people. However, most of these people avoid exceptions because they feel they have too much overhead (not really true), and people who are that concerned with overhead (e.g., for embedded / resource-deprived systems) will definitely not be using the iostream library, because it's so bloated. I have never seen a project that uses the iostream library but not exceptions, but you often see the opposite (using exceptions, but not the iostream library). If the iostream library were rewritten today, they would probably take a much leaner approach, have more mature support for exceptions, and use them as the primary (default) error-handling mechanism.

For example, I wish it were possible to have different exceptions for the different causes of errors (cannot open, failed conversion, end-of-file, etc.) instead of a single general "failure" exception. This, by itself, makes iostream / exceptions totally impractical without creating your own exceptions and throwing them upon checking the error-states on the stream, which is a bloated anti-pattern.

David W 131 Practically a Posting Shark · Answer 3 · 2014-08-07T01:28:01+00:00

Again @mike_2000_17, thank you for your thoughtful answers.

I have pretty much, to-date, avoided using exceptions in C++, and instead, checked the various values returned ... or the error flag states set, when doing io, memory allocation, etc...

Having enjoyed using HLA ... and then Python exception handling, I have not really 'seen' much of how ... or when ... to 'nicely' handle exceptions ... in C++

Your example code above, however, really looked interesting and useful.

I too, first thought of using:

if( !fin.is_open() )
    throw file_not_openable(FNAME);

as a way to handle that 'problem' ...

but elected to see if there was some other existing/available way that I could use ... thus the code I tried above, which seemed to also do the particular job desired.

But I think the way, you used in your code, makes for clearer code 'logic flow' and clean read-ability.

Thanks again for taking the time to comment ... and all the useful insights.

David

DavidB 44 Junior Poster · Answer 4 · 2014-08-11T23:06:50+00:00

Thanks for all the feedback.

I will edit my program to implement some of these suggestions. The other option I was thinking about was push_back(), to append values to the end of the array. But I guess that won't be necessary after all. Hopefully this all works out; I'd like to have a very robust code block for this task, and plan to use it for future programs. That way, I know it works and don't have to second-guess myself later.