I want to optimize a directory listing input iterator so that the WIN32_FIND_DATA can be written directly to a vector (via emplace) instead of being first written inside the iterator.

This is an idea of the code:

/*
std::vector<WIN32_FIND_DATA> vwfd;
struct listing_it : std::iterator<std::input_iterator_tag, WIN32_FIND_DATA> {
    HANDLE handle;
    WIN32_FIND_DATA wfd;                                   // record is first written here, inside the iterator...
    auto operator*() -> WIN32_FIND_DATA& { return wfd; }   // ...then copied out on dereference
    auto operator++() -> listing_it& { FindNextFile(handle, &wfd); return *this; }
    ...
};
*/

Eventually it would be used in something like: copy(listing_it_begin, listing_it_end, back_inserter(vwfd));
Of course that code is not really correct or good (in reality I use smart pointers for the handle, etc.).
What I really want is for FindNextFile to use the actual location of the WIN32_FIND_DATA in the vector. Is this possible in C++11, or do I have to pay the cost of internal copying and therefore forgo input iterators in favour of a more optimised method (which is actually much simpler)?

The problem with that is that FindFirstFile and FindNextFile know nothing about std::vector, or even C++; they are just C functions. In order to make your suggestion work, you would first have to resize the vector with enough entries to hold all the files those two functions might return, and you have no way of knowing that count without first counting the files. Of course, you could just resize the vector to some arbitrarily large size and hope that it is enough.

Ok, so let me get this straight. Currently, your code ends up doing the following for each iteration:

  1. Fill in the data into a WIN32_FIND_DATA object (within your input iterator) via the FindNextFile function.
  2. Copy-construct a new WIN32_FIND_DATA object as part of "push_back()".

And you want to change it to be:

  1. Fill in the data into a new WIN32_FIND_DATA object (within the vector) via the FindNextFile function.

The problem with doing this is that there is really no way to avoid constructing the WIN32_FIND_DATA object before you can fill it with data from the FindNextFile function. That's by design of the FindNextFile function. The best you could do would be to default-construct a new WIN32_FIND_DATA object in the vector (and because it is a C struct, default-construct is trivial, meaning that it does nothing at all), and then fill it in. In other words, the most efficient version would be this (as a simple loop):

std::vector<WIN32_FIND_DATA> vwfd;
HANDLE handle = /* from FindFirstFile(...) */;
while( /* still have files.. */ ) {
  vwfd.emplace_back();                  // default-construct in place (trivial for a C struct)
  FindNextFile(handle, &vwfd.back());   // fill the element directly in the vector's memory
}
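
For completeness, here is a rough sketch (untested, with error handling and your smart-pointer handle wrapper omitted, and with a hypothetical list_directory name) of how that loop might look once the FindFirstFile bookkeeping is included:

#include <windows.h>
#include <vector>

// Sketch: fill a vector of WIN32_FIND_DATA in place, one emplace_back per entry.
std::vector<WIN32_FIND_DATA> list_directory(const TCHAR* pattern)
{
    std::vector<WIN32_FIND_DATA> vwfd;

    vwfd.emplace_back();                                   // trivial default-construct in place
    HANDLE handle = FindFirstFile(pattern, &vwfd.back());  // first entry written directly into the vector
    if( handle == INVALID_HANDLE_VALUE ) {
        vwfd.pop_back();                                   // nothing found
        return vwfd;
    }
    for(;;) {
        vwfd.emplace_back();                               // next slot, constructed in place
        if( !FindNextFile(handle, &vwfd.back()) ) {
            vwfd.pop_back();                               // last slot was never filled
            break;
        }
    }
    FindClose(handle);
    return vwfd;
}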

As far as finding a way to coerce that into a back-inserter / input-iterator framework, I don't think it's worth the trouble. It could be possible, by using some kind of proxy class.... but really, you might just try to rely on NRVO (Named Return-Value Optimization), with something like this:

struct find_file_iterator {
    HANDLE handle;

    WIN32_FIND_DATA operator*() const {
      WIN32_FIND_DATA wfd;                 // unique named return variable: NRVO can build it at the call site
      FindNextFile(handle, &wfd);
      return wfd;
    }
    find_file_iterator& operator++() { return *this; }   // no-op: dereferencing advances the listing
    ...
};

This may not be what you expected (it may seem very inefficient), but it is likely to be pretty good. Whenever something dereferences the iterator, a new record is created (default-constructed, which is trivial, i.e., a "no-op"), immediately filled, and then returned. Because you have a unique named return variable (the wfd variable), any good compiler should construct that return value directly in the place where it goes on return.

Now, this will not fully solve the problem, because RVO can only really help when the destination is a stack variable, not heap memory: the back-inserter still has to copy the returned record into the vector's storage. This means it would inevitably be copied at least once to get it into the vector's memory.

I'm trying to think of a way to do this with a proxy... but I don't think it's possible to really avoid that one copy.
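
For reference, here is a rough sketch (untested, no error handling, and a raw handle instead of the smart pointer you mentioned, reusing the same name) of what a conventionally completed version of such an input iterator could look like. Note that it keeps the WIN32_FIND_DATA member inside the iterator, so the one copy into the vector is still there:

#include <windows.h>
#include <algorithm>
#include <iterator>
#include <vector>

struct find_file_iterator : std::iterator<std::input_iterator_tag, WIN32_FIND_DATA> {
    HANDLE handle;                     // NULL marks the past-the-end iterator
    WIN32_FIND_DATA wfd;

    find_file_iterator() : handle(NULL) { }                // end iterator
    explicit find_file_iterator(const TCHAR* pattern) {
        handle = FindFirstFile(pattern, &wfd);
        if( handle == INVALID_HANDLE_VALUE )
            handle = NULL;
    }

    const WIN32_FIND_DATA& operator*() const { return wfd; }
    find_file_iterator& operator++() {
        if( !FindNextFile(handle, &wfd) ) {
            FindClose(handle);
            handle = NULL;                                 // listing exhausted: become the end iterator
        }
        return *this;
    }
    bool operator!=(const find_file_iterator& rhs) const { return handle != rhs.handle; }
};

// Usage: each record is still copied once when assigned through the back-inserter.
// std::vector<WIN32_FIND_DATA> vwfd;
// std::copy(find_file_iterator(TEXT("*")), find_file_iterator(), std::back_inserter(vwfd));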

However, I should point out how pointless this whole exercise is. First, the WIN32_FIND_DATA struct is a C struct (trivial, standard-layout type). This means that copying it is just a raw "memory-copy" (memcpy), which is a ridiculously cheap operation to perform, even for a moderately sized chunk of memory. Second, the FindNextFile is obviously a very costly function to execute, compared to copying or appending an object to a vector. In other words, it is extremely unlikely that any effort you put into this is going to pay off in any way. So, for that reason, I say, just go with whatever is more convenient in the broader context (is it simpler to implement the simple while-loop I posted above? Or is it better for you to remain harmonious in your use of iterators?).

Thanks Mike.
You perfectly understood the question, and your reasoning is along the lines of what I was thinking. A proxy might solve it but would unnecessarily complicate everything.
Regarding the speed cost, it may add up, because if you traverse a full directory tree (with > 1M file entries) that is memory-buffered, going from one item to another is just a pointer jump, but copying the structure probably becomes a time-critical event (yes, I know, it is more probably the memory access lag).
But yes, I was thinking in a more general way, as I want to have a generic function that would be optimised for any situation, including much bigger structures. I am somewhat disappointed with iterators in general, as I think there could have been a smarter way to do it that would avoid all this unnecessary copying.
However, I am very enthusiastic about C++11, which I find immensely superior to the previous version (and I can finally use it with VS2013, the first Microsoft version that really works, at last!). I still am unable to compile the more recent, more complete versions of Clang. I gave up after a week of intense trying and total exhaustion with all possible kinds of errors; I think I will reassess going Linux-only in 1-2 years, when they fix things up once and for all and you can use it without thinking about it. I'd really love it if Linux could eventually work out.

Regarding the speed cost, it may add up, because if you traverse a full directory tree (with > 1M file entries) that is memory-buffered, going from one item to another is just a pointer jump, but copying the structure probably becomes a time-critical event.

Uhmm... I'm not sure what you mean here. The cost of that extra copy is not going to become relatively more expensive as the listing gets bigger. It might add up, but it doesn't get worse, and it will always remain an insignificant fraction of the whole execution time, because everything else adds up too.

(yes, I know, it is more probably the memory access lag)

Well, that depends. The memory access lag (i.e., reading the file list from the file system on the hard drive) might not necessarily be that bad, because of caching (temporarily keeping HDD data in RAM for quick access), pre-fetching (anticipating upcoming reads), and other things that streamline that access. Of course, reading a list of files off the HDD is always going to be very slow in terms of memory access. But even if that weren't there as a bottleneck, you would still have the context-switch overhead. The thing here is that reading the file system requires a temporary context switch from user space (where applications run) to kernel space (where the OS runs), and that is expensive enough by itself to render the copying cost insignificant.

Consider this:
- Cost of copying a chunk of memory: a few block copy operations in the cache, usually amounting to a few clock cycles, possibly up to a few hundred if you do a really terrible job of laying out your data structures;
- Cost of a context switch: at least a few thousand clock cycles, up to about a million clock cycles; and,
- Cost of reading something off the hard drive: at least a few million clock cycles, and unbounded (it could be much more).

Basically, the cost of one copy will amount to less than the amount of noise you will measure on the time required for one average call to FindNextFile.

I am somewhat disappointed with iterators in general, as I think there could have been a smarter way to do it that would avoid all this unnecessary copying.

Yes, I agree. Iterators are not perfect. I think they are a bit too close to pointers. By wanting to make iterators behave like pointers, they missed a few opportunities for allowing some optimizations. But like anything else, there are trade-offs, and overall, I think iterators are pretty great.

For example, the reason you cannot implement your optimization is that output iterators (like the back-inserter) are given a value as *it = value; this uses the assignment operator, which can only be overloaded on the receiving end (as a member function of the "receiving" class). But, in your case, what you really need is to be able to overload the assignment behavior on the "giving" end (as a member of a proxy or input iterator). If they had chosen a mechanism that allows overloading from either side, then you would have been able to implement your optimization through a special "back-emplacer" iterator and a proxy for the input iterator. But, then again, this kind of interface would have been a bit of a can of worms for the standards committee and for programmers to maintain and deal with (twice the opportunity for overloading = twice the trouble).
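
To make that concrete, here is a rough sketch (my own illustration, not anything in the standard library) of what such a "back-emplacer" output iterator could look like in C++11. Because the assignment still happens on the receiving end, the value handed over by the input iterator is still copied (or moved) into the vector, which is exactly the limitation described above:

#include <iterator>
#include <utility>

// Hypothetical back-emplacer: forwards whatever is assigned to it to emplace_back.
template <class Container>
struct back_emplace_iterator : std::iterator<std::output_iterator_tag, void, void, void, void> {
    Container* c;
    explicit back_emplace_iterator(Container& cont) : c(&cont) { }

    template <class T>
    back_emplace_iterator& operator=(T&& value) {
        c->emplace_back(std::forward<T>(value));            // still constructs from 'value': one copy/move remains
        return *this;
    }
    back_emplace_iterator& operator*()     { return *this; }   // no-ops, as in std::back_insert_iterator
    back_emplace_iterator& operator++()    { return *this; }
    back_emplace_iterator& operator++(int) { return *this; }
};

// Usage (hypothetical):
// std::copy(first, last, back_emplace_iterator<std::vector<WIN32_FIND_DATA>>(vwfd));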

However, I am very enthusiastic about C++11, which I find immensely superior to the previous version (and I can finally use it with VS2013, the first Microsoft version that really works, at last!). I still am unable to compile the more recent, more complete versions of Clang.

Yeah.. clang on Windows... I don't have high hopes for that. I think they've recently put up a binary installer for Windows. Not sure of the quality though (you don't want to know how messy the Clang code is!). TDM-GCC has also been updated to contain GCC 4.8.1, which is also C++11 feature-complete (and more so than VS2013), with only regex missing in the library (I think). So, that's also an option (try CodeBlocks that comes with MinGW + TDM-GCC).

"- Cost of a context-switch: at least a few thousand clock-cycles, up to about a million clock-cycles"

I wasn't aware of this. Thanks for pointing it out to me.

" the reason you cannot implement your optimization is because the output iterators (like back-inserter) are given a value as *it = value; which uses the assignment operator, which can only overload on the receiving end ...."

That is precisely the problem I have found with them.

"how messy the clang code is"

I am quite surprised by this. I have read just the opposite: that the Clang code (which apparently uses C++) is much cleaner and simpler and supposedly results in much faster compilation.
Have you had much experience with it?

Have you had much experience with it?

Don't pay attention to what I said, I was just venting some repressed anger. I've just started to dig into Clang's code, and yesterday I spent several hours chasing down a resource-leak issue through the maze that is the front-end architecture of Clang, only to find that they intentionally leak most of their high-level data structures, and, even more egregious, they hide the leaks so that memory profilers will give them a "clean bill of health". Basically, they constructed such a messy architecture that they don't know how to deconstruct it cleanly, so instead they just leak it all, including file handles and the whole shebang. Very nasty stuff. As soon as you depart from RAII in C++, you set yourself up for failure in the long run, IMHO.

There are also tons of nasty hacks that I had never even dreamed of in my worst nightmares, and with no obvious reason for using them. And I have barely seen a small cross-section of the code so far.

I'm sure that other compilers are even more messy under-the-hood. Clang is indeed nicer and faster, and more easily adaptable due to being coded in C++. However, that doesn't guarantee that things won't get messy, and they already have. And, it is still very much a work in progress (the code is riddled with "FIXME" and "TODO" stuff everywhere).

But again, I'm just venting here.

Interesting, but isn't Apple using Clang/LLVM as its main toolchain? If Clang is that bad and leaks, how come OS X and its applications appear to have a good reputation as far as reliability goes?

Clang is great. Not perfect, not problem-free, not super-clean, not without its challenges, but still great.
