which STL container?

Question

A Haunted Army 0 Light Poster

12 Years Ago

i'm not sure which STL container to use, what i'm going to be using it for is adding elements at the end and removing elements from any point inside it, i've looked into the different containers but its all a bit confusing so some advice would be great.

c++

Edited 12 Years Ago by A Haunted Army

3 Contributors
12 Replies
300 Views
4 Days Discussion Span
Latest Post 12 Years Ago Latest Post by A Haunted Army

mike_2000_17 2,669 21st Century Viking

12 Years Ago

This sounds like a space partitioning tree. I'm guessing it has something to do with collision detection. In any case, as rubberman said, this is far from being a trivial problem in which picking an STL container is not sufficient (you need something much more elaborate than what the STL can provide off-the-shelf), and, in addition, picking a correct storage strategy (within a more elaborate space partitioning tree) will not be a trivial task either.

The main concept that you must get acquainted with is the concept of "locality of reference" (aka "memory locality"). This is what can make or break a space partitioning structure (and, more generally, any search tree structure). The idea is as follows: the memory accessing patterns that characterize your main tasks (e.g., look-up, nearest-neighbor, etc.) must map well to your underlying storage such that traversals are reasonably sequential and operations are reasonably localized in physical memory. The point is that if you randomly jump back and forth in memory, you are maximizing the amount of cache misses, which, in turn, completely dominates the performance (or lack thereof) because one cache miss is roughly equivalent to a couple of hundred clock cycles (instructions). For example, if every distance computation (between a query point and a node of the search tree) takes about 20 instructions to compute, then having a cache miss at every new node visited is going to make the performance about 10 to 20 times worse than what you could get with a proper memory layout. For that reason, in large data sets, a linear search algorithm that traverses a contiguous storage (e.g., std::vector) is often going to perform better than a space-partitioning that has a very poor memory layout (e.g., a linked-list-style storage). There are a number of efficient storage strategies for search trees, most of them generalize well to a space partitioning problem, such as a Breadth-first layout (analogous to a Heap structure), Cache-Oblivious B-tree, and a few others.

Typically, a naive implementation of a space-partitioning tree (or similar search trees) will have a performance between O(N^0.9) to O(N) in practice (and this has repeatedly been reported in CS literature), and with a fairly large constant factor, which means that it takes a very large data set to see any advantage to using those techniques as compared to a linear search (O(N) but very small constant factor). However, when an appropriate memory layout is used, the performance of the space-partitioning tree comes very close to the theoretically optimal O(log(N)) (usually within a small power, e.g., O(log(N)^1.1) or so). They still have a larger constant factor, but they quickly (as N grows) become much better than linear search. And that's what I mean by the fact that memory locality makes or breaks a space partitioning (or search tree) implementation.

As for the space partitioning itself (i.e., not how to store elements, but how to partition the elements), there are a number of possible techniques. They are all search trees, some binary, some of higher arity, some variable arity (e.g., van Emde Boas trees), and thus, in all cases, the underlying storage strategy is paramount for good performance. But, additionally, you need a good method to partition the elements. The most popular for a "vector space" is the Kd-tree, which is quite simple to implement, but relies on the assumption that you have different coordinates that are suitable to partition the space. There are also more naive solutions like Octrees or Quadtrees, which are specialized for 3-d and 2-d cases, but are generally wasteful and don't perform very well in practice (compared to Kd-trees). For more general spaces, the alternatives include the Vantage-point tree (or multi-vantage point trees), or the M-tree (personally not a big fan of that one, but I haven't tested its performance).

In any case, dynamically updating the space-partitioning is always going to be tricky. The speed of any query (e.g., for collision checking) into a tree is going to depend on the balance of the tree (i.e., how close is the tree to the optimal depth of log(N), if the tree depth is exactly log(N) then it is perfectly balanced, but if it is too deeper then performance will suffer). Maintaining the tree to a nearly balanced configuration at all times, while elements are inserted, deleted or updated, is a delicate task, and there is no free lunch. At one end, you could simply not care for balancing at all, leading to poor query performance. On the other end, you could re-construct the tree at every insertion / deletion of an element, leading to optimal query performance but terrible performance at every insertion / deletion. Algorithms used in Kd-tree, VP-tree, or M-tree implementations usually find a middle ground, which often consist of adding / removing elements without any extra work as long as the tree seems reasonably balanced (upon quick, local inspection around the insertion-/deletion-point), otherwise try and do a bit of local re-construction of the tree, and if it gets really bad, re-construct the whole thing. This usually exhibits amortized performance, often around O(N^0.1 log(N)) or so in practice.

A common technique in computer game applications is to distinguish between the (quasi-)static environment (trees, building, terrain, etc.) and the dynamic environment (characters, vehicles, bullets, etc.). This way, you can have a very good static space partitioning structure to do collision checking with your static environment, and use a linear search for the dynamic elements of the environment (which are usually not too many).

So, as you see, this is not a trivial problem. I would suggest that you first use linear search, which performs best with contiguous storage (e.g., std::vector) and doesn't require house-keeping (re-structuring) nor insertions/deletions from the middle of the array. And, worry about space-partitioning only when the poor performance of the linear search is the limiting factor in your application, not before. It is likely that linear search will be sufficient, for the reasons mentioned above. As rubberman said, the brute-force approach is sometimes the most appropriate (not in theory, but in practice). A naive linear search is usually better than a naive space partitioning (as long as you don't use a linked-list for the linear search! which would be really stupid).

As you might have guessed, I have actually been through this. I used linear search for some nearest-neighbor queries, until it was clear that it was the limiting factor for the performance of the algorithm in which it was used. Then, I set out to create a proper space partitioning implementation, and it was very hard to achieve good performance (and, at first, it was even a challenge to beat the linear search for anything but very large data sets). But, through careful design of the memory layout, it was possible to get really good performance. And when I applied the same memory layout optimizations to the overall algorithm, it made things even better. But this is really hard to do, not for the faint of heart.

mike_2000_17 2,669 21st Century Viking

12 Years Ago

the linked list would point to what's above, below, left, right, previous and next, with this it would be i think fairly easy to move objects around inside the map by jumping from node to node.

Linked-lists perform extremely bad for any kind of traversals. It might seem "easy" to move from node to node, but easy doesn't mean efficient. There are ways to implement a linked-list for efficient traversals, but that's also really hard and rarely worth the trouble.

for computer and player controlled objects i was thinking of just dividing there location by the size of each block globally and locally to get the position though i'm not sure of the computation overhead from a lot of division operations.

This is basically what a Kd-tree is, or similarly, an Octree / Quadtree. A simple solution that could be a good start would be to do a coarse Kd-tree partitioning, i.e., recursively divide the space only up to some coarse resolution, and store an array of objects inside each area. That would make the array of object in local areas quite small, meaning that dealing with them is easy and quick. And because the final division is coarse (not too small in volume), objects won't tend to transfer from one area to another too often, reducing the amount of maintenance to do (remove an object from one area and insert it in another). This certainly won't be optimal or anything like that, but it will be easy to do.

mike_2000_17 2,669 21st Century Viking

12 Years Ago

i've been looking more into kd-trees and they all use pointers to parent nodes which is what i thought a linked list was...

Well, Kd-trees are, first and foremost, trees. So, yes, they are naturally expressed as linked-list-style structure: each node has a parent pointer and a few (2 or more) pointers to children nodes (with the exception of the root (no parent) and the leafs (no children)). But take this with a grain of salt. When you read descriptions of the structure or the algorithms (insertion / deletion, etc.), they will often assume this kind of linked-structure, and you will probably also see that in example implementations (or pseudo-code). That doesn't mean that it should be implemented exactly in that way. There are many ways to actually implement a tree structure (especially with fixed branching factors), and a linked-structure (i.e. pointers to parent-children) is usually the worst way to do it, but it is also the easiest to use in pseudo-code / examples. Unfortunately, some (or many) are misled into thinking that the pseudo-code or the explanation of the algorithm / data-structure actually corresponds directly to how it is implemented in real life. In this domain, these types of explanation must be regarded more as if someone told you "it has a metal frame, two wheels, a set of pedals, and a handle bar" as a way to instruct you on how to build a bicycle. In other words, there are lots of additional implementation details and choices that are not necessarily discussed, and naively following high-level explanations as if they were detailed instructions leads to very poor results. And that is usually what is referred to as a "naive implementation". I wouldn't be surprised if many Kd-tree implementations out there are implemented this way, but certainly none of those perform as well as they should.

what i'm still thinking is a 4D vector array, first is Y position, secound X, third layer and fourth the objects occupying the location, to find the location i take the floating objecs position and divide it by the spacial size, then loop through the objects at the location testing collision, running triggers or just displaying the graphics, i don't think this is a kd-tree but its the simplest way i can think of implementing the map.

Ok, if I understand correctly, you mean something like this:

class Map2D {
  private:
    double m_height, m_width;
    std::size_t m_resolution;

    std::vector< std::vector< std::vector< Object > > > m_objects;
  public:

    Map2D(double aHeight, double aWidth, 
          std::size_t aResolution = 64) : 
          m_height(aHeight), m_width(aWidth),
          m_resolution(aResolution),
          m_objects(m_resolution, std::vector< std::vector< Object > >(m_resolution) ) { };

    std::vector< Object >& getObjectsNearPosition( double x, double y ) {
      // find which square (i,j) the position is in:
      std::size_t i = std::floor(( x / m_width ) * m_resolution);
      std::size_t j = std::floor(( y / m_height ) * m_resolution);
      // return the list of objects in the same square:
      return m_objects[i][j];
    };
};

The first remark with this is that you don't need a two-dimensional array for this kind of purpose, you can easily do the same (and more efficiently) with a single dimensional array (in the same way as you store a matrix in a single-dimensional array):

class Map2D {
  private:
    double m_height, m_width;
    std::size_t m_resolution;

    std::vector< std::vector< Object > > m_objects;
  public:

    Map2D(double aHeight, double aWidth, 
          std::size_t aResolution = 64) : 
          m_height(aHeight), m_width(aWidth),
          m_resolution(aResolution),
          m_objects(m_resolution * m_resolution) { };

    std::vector< Object >& getObjectsNearPosition( double x, double y ) {
      // find which square (i,j) the position is in:
      std::size_t i = std::floor(( x / m_width ) * m_resolution);
      std::size_t j = std::floor(( y / m_height ) * m_resolution);
      // return the list of objects in the same square:
      return m_objects[ j * m_resolution + i ];
    };
};

In other words, all the "squares" of the grid are stored as one horizontal line after another in memory. That's far better and simpler than a 2D array (e.g., vector of vector). An even better storage strategy is to use Morton ordering.

This is, of course, not a Kd-tree or even a Quadtree, this is a fixed partitioning, or a fixed-size discretization of space, or regular grid, or whatever other term you like. Here are some basic issues with this approach:

If the resolution is too high, you're going to end up with very few objects in each square and you will typically have to look at multiple squares (in a neighborhood) in order to do collision checking (or whatever else). It would also mean that moving objects will often cross the boundary of a square and will need to be transfered to the other square. And, of course, a high resolution means higher memory consumption (overhead) which might be wasteful.
On the other hand, if the resolution is too low, each square is very big (in dimension) and has lots of (unsorted) objects making any kind of collision checking or local operations somewhat inefficient (i.e., you are not getting the benefits of space partitioning).
In any case (low or high resolution), you also need the world to fit nicely in a square (width x height), having all areas more or less equally filled with objects (i.e., uniform distribution).

These are precisely the problems that a Kd-tree (or other similar approaches) tries to overcome. Instead of splitting the world uniformly (i.e., as a regular equally-sized grid), it splits it uniformly with respect to the objects in the world, that is, instead of spliting the world in half (by its dimensions), you find a dividing line that puts half of the objects on one side and half on the other. This guarantees that the divisions of the space match the spatial distributions of the objects, that is, the final groups of objects all contain more or less the same number of objects but might differ a lot in the volume / area they cover, depending on how spaced-out the objects are in that area. As for the high versus low resolution problem, you essentially specify roughly how many objects you want in each final square, and the "resolution" (in terms of size of the squares) will adapt to whatever is appropriate to roughly meet that requirement of an average number of objects in each square. Of course, this has a price in complexity (for look-ups and maintainance) and memory overhead (storing the divisions of the space), but it usually pays off.

With that said, if you know that your objects are uniformly distributed all over a reasonably square world, and you can come up with a reasonable fixed-resolution, then that solution will work just fine. The problem is that most of the time this is not the case, and that's why Kd-trees / Quadtrees (2D) / Octrees (3D) are used more often.

for the object structure i'm still thinking of 6 nodes, one pointing to the next spacial location to the left, right, up and down, the other 2 will point to the previous and next objects at the location, if the object moves out of its current spacial location then moving it to the next one is just swapping pointers around.

I don't see any benefit to this, just overhead for no good reason. And unless your objects are all neatly placed on a grid, you're gonna have trouble maintaining this arrangement. If by "spatial location" you mean the squares of the fixed grid, then you have either its (i,j) index or its (x,y) position, or both, and then it is a trivial matter to move left / right, up / down, or whatever, just by incrementing / decrementing those indices or positions and looking up the new square from the grid (which will hopefully be close to the previous one, and by "close" I mean in memory).

Moving the objects between spatial locations is trivial. And this is just the same as always in programming. If the objects are complicated to copy or move around, then just store (shared-)pointers to those objects and move those pointers around (this is called an "indirect indexing"). If they are cheap to copy or move around, then copy or move them around by value, you gain nothing by avoiding a cheap copy / move. In a nutshell, this is why linked-lists are so useless in real life.

Reply to this topic

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.

rubberman 1,355 Nearly a Posting Virtuoso Featured Poster · Answer 1 · 2013-02-23T17:41:03+00:00

More information as to what specifically your use-case is would be helpful. I use vectors and lists for this all the time, but what pattern I utilized depends a lot upon what I am trying to accomplish.

A Haunted Army 0 Light Poster · Answer 2 · 2013-02-23T17:53:04+00:00

i have a 5 dimensional map made up of vectors to store a world, first 2 vectors are for global positions to help narrow down looping instead of looping through the entire thing of potentially thousands of objects, the next 2 vectors are for local location of the objects, the 5 vector is just a layer, differnet layer for different things such as triggers, backgrounds ETC.

at the object location i want to store multiple different objects so multiple objects can occupy the same location, i want to be able to quickly add and removed elements from any point inside the container, this could happen quite a bit if the object is moving so its constantly changing location in the map.

i'm just unsure as to which STL container is best for the job.

rubberman 1,355 Nearly a Posting Virtuoso Featured Poster · Answer 3 · 2013-02-23T18:02:17+00:00

Well, you can have vectors of vectors (to your specified number of dimensions), or you can create a struction with the needed dimensions. Locations have "locality of reference" - ie, a moving body (car/plane/train/bus/etc) will change it's locality, but in a temporal manner - so you might want to model that as well.

This is not a trivial problem. Sometimes, brute-force is the appropriate solution to a problem. My preference in such situations is to track the absolue current location, which requires some processing overhead to keep up-to-date (delete from last known position, and adding to current location). YOU need to determine what will be best for your needs! :-)

Again, this is a GREAT problem, and as I said, there is NO one best answer to deal with it!

A Haunted Army 0 Light Poster · Answer 4 · 2013-02-23T18:07:34+00:00

A Haunted Army 0 Light Poster

12 Years Ago

what exactly is a struction?

A Haunted Army 0 Light Poster · Answer 5 · 2013-02-23T21:15:47+00:00

my head just exploded.... thanks!!

i was thinking of a 6 way linked list to store static objects and reference more floating objects that aren't computer controlled so some physics can be applied. the linked list would point to what's above, below, left, right, previous and next, with this it would be i think fairly easy to move objects around inside the map by jumping from node to node. for computer and player controlled objects i was thinking of just dividing there location by the size of each block globally and locally to get the position though i'm not sure of the computation overhead from a lot of division operations.

i still need to digest that epic post though and that's going to take a while, i've really droped myself in the deep end hehe.

you are probably right though, a liniar search will work fine since this is just a small platformer engine/game not some huge world, though i would to support huge worlds...

A Haunted Army 0 Light Poster · Answer 6 · 2013-02-23T22:00:19+00:00

ok... so should i change my 5D dimensional arry to a 4D one? first 2 are the partitions, 3rd is layer and 4th a list of objects to locally search through?

hm...

i should go read the kd-tree article you linked.

A Haunted Army 0 Light Poster · Answer 7 · 2013-02-24T15:35:38+00:00

i think i'm missunderstanding linked lists, i've been looking more into kd-trees and they all use pointers to parent nodes which is what i thought a linked list was... looking more into kd-tree it looks very close to what i have in mind myself but implemented differently.

what i'm still thinking is a 4D vector array, first is Y position, secound X, third layer and fourth the objects occupying the location, to find the location i take the floating objecs position and divide it by the spacial size, then loop through the objects at the location testing collision, running triggers or just displaying the graphics, i don't think this is a kd-tree but its the simplest way i can think of implementing the map.

for the object structure i'm still thinking of 6 nodes, one pointing to the next spacial location to the left, right, up and down, the other 2 will point to the previous and next objects at the location, if the object moves out of its current spacial location then moving it to the next one is just swapping pointers around.

though, i think what you(mike) have said has gone right over my head, some advice and perhaps practical examples would be great.

A Haunted Army 0 Light Poster · Answer 8 · 2013-02-24T22:05:10+00:00

yup, something like that...

thanks for the explanation on examples, i always thought that examples showed basic implementation of how its done, i never thought it was just there to help explain the concept/algoritm but not an actual good implementation.

i decided 2D array because its easier for me to follow and i was worried about different object positions calculating to the same location in a single 1D array but if 1D is better then can i get a quick explanation on why?

this kd-tree does have me very interested because i would like my map to be more dynamic, as in objects in the map can be any shape and size instead of being uniform which something i've been keeping in mind when designing this and is the main reason behind the list of objects occupieing the same location. is there any good tutorials articles on kd-trees to give me a better idea on how its implemented because everything i can find is just a short write up with a linked list set up and i'm now curious to know how it can be done without a linked list set up..

A Haunted Army 0 Light Poster · Answer 9 · 2013-02-28T03:02:20+00:00

been thinking and i think when loading the map from file i'll store it in a multi-dimensional vector then once its finnished loading move all the objects into a single vetctor using morton ordering and start the game, this should allow me to not require knowing the dimensions of a map a head of time.

is this a good idea? if so should i use std::unique_ptr to store each object and move the responsibility of the objects from the multidimensional vector to the single vector instead of copying each each value inside the object?

thanks for all the advice, been a great help.